Artificial intelligence-generated photonics: mapping optical properties to subwavelength structures directly via a diffusion model

Shijie Rao; Kaiyu Cui; Jiawei Yang; Yali Li; Shengjin Wang; Xue Feng; Fang Liu; Wei Zhang; Yidong Huang

doi:10.37188/lam.2026.037

Subwavelength photonic structures and metamaterials provide revolutionary approaches for controlling light. Inverse design methods for fabricable subwavelength structures are vital for the development of new photonic devices. However, most existing inverse design methods cannot realise direct mapping from optical properties to photonic structures; instead, they rely on forward simulation to perform iterative optimization. In this study, we exploit the powerful generative abilities of artificial intelligence and propose a practical inverse design method based on latent diffusion models. Our method maps optical properties directly to structures without requiring forward simulation or iterative optimization. In this framework, the given optical properties serve as ‘prompts’ that guide the model to ‘draw’ the required photonic structures correctly. Simulations and experiments show that our direct-mapping-based inverse design method can generate fabricable subwavelength photonic structures with high fidelity while following the given optical properties, such as the transmission power, phase, and polarisation responses. This approach may influence optical design methods and significantly accelerate the research and manufacturing of new photonic devices.

Introduction

Subwavelength structures such as photonic crystals and metamaterials have enabled revolutionary approaches to light-field regulation^1–4. Because these subwavelength structures cannot be modelled analytically using geometric or wave optics, conventional design approaches mainly select the best match, based on forward simulation, from a predefined library of photonic structures with a limited design space^5–8. The newly developed inverse design approaches^9 can significantly enlarge the design space and have shown powerful capabilities in generating less intuitive but more effective photonic structures^10,11. However, existing inverse design strategies usually transform an inverse design problem into an optimization problem and obtain the optimal design using iterative algorithms^12–18. The photonic structure generated in each iteration must therefore be forward modelled using numerical simulation methods such as the finite-difference time-domain (FDTD) method^19, which is computationally expensive. In addition, these inverse design methods inherit the common problems of optimization algorithms related to convergence, efficiency, and global optimality.

To overcome the limitations of optimization-based inverse design algorithms, several attempts have been made to train deep neural networks (DNNs) that achieve direct mapping from optical properties to photonic structures^20–34. Recent research on artificial intelligence (AI)-generated content^35 and AI for science has shown the great potential of DNNs in various practical fields, such as realistic image generation^36, chatbots^37, medicine^38, chemical research^39, and mechanical research^40. To realise practical AI for optics, the gap between optical physics and neural network parameters must be bridged. Although the most advanced AI-based inverse design methods have demonstrated significant improvements, they still face substantial practical challenges, such as the nonuniqueness or nonexistence of solutions, fabrication constraints, limited generalisability, and the mismatch between input data distributions during training and deployment. Therefore, they are usually implemented in a predefined and limited design space in which the inverse problem is a one-to-one mapping^20–25. In summary, the lack of effective and fast design methods has become a major obstacle to the further development of new photonic devices^1,9,10.

In this study, we propose an inverse design method named AI-generated photonics (AIGP), which achieves direct mapping from optical properties to photonic structures based on a latent diffusion model (LDM)^36. We exploit the powerful image-synthesis ability of diffusion models^41,42 to enlarge the search space and design new photonic structures. In this framework, the given optical properties serve as ‘text prompts’ that guide the model to ‘draw’ the required photonic structures. To achieve such direct mapping, we devised an encoding method for optical properties and designed a prompt encoder network that solves the nonuniqueness problem and provides an interface for designing photonic structures on demand. In addition, we propose a fast forward prediction network that significantly accelerates the simulation process and enables end-to-end training. We also present a training dataset of arbitrary shapes that provides as large a design space as possible while meeting the fabrication limits. Compared with existing methods, AIGP addresses key challenges in inverse design, including the nonuniqueness problem, the handling of unseen inputs, and the need for iterative optimization. By addressing these long-standing issues, AIGP offers a new perspective on photonic design through deep generative models. Moreover, diffusion networks are easier to train than generative adversarial networks (GANs) and have been shown to outperform them in image synthesis^43. They can be easily scaled to large sizes, provide strong generative capabilities, and can generate freeform shapes with various topologies. The prompt encoder network bridges the gap between abstract design demands and realistic optical properties. A more detailed comparison with existing studies is provided in Supplementary Note 1.

Because photonic devices are generally composed of individual subwavelength structures or arrays of building blocks, usually referred to as meta-atoms^5,6,44–48, we illustrate the direct mapping capability of our method by designing meta-atoms from given transmission power, phase, and polarisation properties. Moreover, our method can be readily generalised to other photonic inverse design problems via transfer learning. It could therefore greatly accelerate the development of various photonic devices and applications such as optical computing, metalenses, hyperspectral imaging chips, structural colours, and beam splitters.

Conclusions

We have proposed AIGP, a direct-mapping-based inverse design method built on a diffusion model. Whereas most existing inverse design methods combine optimization algorithms with forward prediction strategies, our method maps optical properties directly to photonic structures, thus providing the inverse function of forward prediction. Powered by this direct mapping, the proposed method realises a fast inverse design process without requiring forward prediction or iterative optimization. Moreover, our method is more practical than other approaches because it provides an on-demand interface that accepts abstract design parameters, rather than precise optical properties, as inputs. We achieved direct mapping from the transmission power, phase, and polarisation responses to fabricable meta-atoms and verified several inverse design results through fabrication. We also discuss how to perform an inverse design that simultaneously constrains both the amplitude and phase in Supplementary Note 18. Our method can significantly accelerate the development and manufacture of various subwavelength photonic devices.

One of the greatest difficulties in direct-mapping-based inverse design is the uncertainty of the solution, which may not be unique or may not even exist. To resolve this issue, we introduced a novel framework based on an LDM enhanced by a dedicated prompt encoder network, which enables one-to-many mapping and fuzzy search. If no exact solution exists, our method uses fuzzy search to design a structure that satisfies the requirements as closely as possible. However, if the given property is physically unrealisable and no close solution exists, as in the design of a 220-nm-thick silicon meta-atom with high transmission at approximately 400 nm, the diffusion network may generate random outputs: the training dataset contains no such unrealisable responses, so the network never learns to map such inputs. Fortunately, our forward prediction network can rapidly evaluate the inverse design results, so their confidence can be assessed without further numerical simulations.
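The confidence check described above can be sketched as follows. This is a minimal illustration, not the authors' code: `forward_predict` is a hypothetical stand-in for the trained forward prediction network, and both the root-mean-square error (RMSE) score and the `tolerance` threshold are assumptions chosen for the example.

```python
import math
from typing import Any, Callable, List, Sequence, Tuple


def rank_candidates(
    candidates: Sequence[Any],
    target: Sequence[float],
    forward_predict: Callable[[Any], Sequence[float]],  # hypothetical surrogate model
    tolerance: float,
) -> Tuple[List[Tuple[float, Any]], bool]:
    """Score generated structures by how well their predicted response matches the target.

    Returns (rmse, candidate) pairs sorted from best to worst, plus a flag that is True
    only if the best candidate's RMSE is within `tolerance`."""
    scored = []
    for candidate in candidates:
        predicted = forward_predict(candidate)
        rmse = math.sqrt(sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(target))
        scored.append((rmse, candidate))
    scored.sort(key=lambda item: item[0])  # sort on the score only; candidates need not be comparable
    reliable = bool(scored) and scored[0][0] <= tolerance
    return scored, reliable


if __name__ == "__main__":
    # Toy lookup table standing in for the forward prediction network.
    responses = {"shape_a": [0.9, 0.1], "shape_b": [0.5, 0.5]}
    ranking, ok = rank_candidates(["shape_a", "shape_b"], [0.9, 0.1], responses.__getitem__, 0.05)
    print(ranking[0][1], ok)  # shape_a True
    ranking, ok = rank_candidates(["shape_a", "shape_b"], [0.0, 1.0], responses.__getitem__, 0.05)
    print(ranking[0][1], ok)  # shape_b False
```

Because the surrogate is fast, this check costs far less than re-simulating each candidate; an unrealisable target shows up as a large best-case error and a `False` flag.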

In this study, we focused on the inverse design of fabricable meta-atoms. Although meta-atoms have wide applications in various photonic devices, they are relatively small-scale structural units that are typically studied under periodic boundary conditions. Other photonic devices, such as high-performance metalenses, require the design of large-scale photonic structures to overcome the limitations of periodic meta-atoms. Our method could also address such large-scale inverse design problems, because these structures can likewise be described by their geometries and compressed by the image encoder network to reduce the number of required design parameters; LDMs are also efficient at generating large-scale images. In addition, our method can be used to design subwavelength structures based on multiple input properties. Only the prompt encoder network needs to be modified, that is, the length and dimension of its input feature vector must be changed. In this manner, different required properties can be concatenated into a single vector that controls the diffusion process.
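One way to picture the multi-property input is sketched below, under stated assumptions that the paper does not specify: each property (the names `T_x`, `T_y`, and `phase_x` are hypothetical) is sampled at the same wavelengths, and the values at each wavelength form one feature vector. Adding a property then changes the feature dimension, while the number of wavelength points sets the sequence length; flattening gives the single concatenated vector mentioned above.

```python
from typing import Dict, List, Sequence


def stack_properties(properties: Dict[str, Sequence[float]], order: Sequence[str]) -> List[List[float]]:
    """Build a (length x dimension) prompt: one token per wavelength, one feature per property."""
    missing = [name for name in order if name not in properties]
    if missing:
        raise KeyError(f"missing properties: {missing}")
    lengths = {len(properties[name]) for name in order}
    if len(lengths) != 1:
        raise ValueError("all properties must be sampled at the same wavelengths")
    n_points = lengths.pop()
    return [[float(properties[name][i]) for name in order] for i in range(n_points)]


def flatten(tokens: List[List[float]]) -> List[float]:
    """Concatenate the per-wavelength tokens into one flat feature vector."""
    return [value for token in tokens for value in token]


if __name__ == "__main__":
    target = {
        "T_x": [0.8, 0.7, 0.6],      # hypothetical transmission, x polarisation
        "T_y": [0.2, 0.3, 0.4],      # hypothetical transmission, y polarisation
        "phase_x": [0.0, 1.5, 3.0],  # hypothetical phase in radians
    }
    tokens = stack_properties(target, order=("T_x", "T_y", "phase_x"))
    print(len(tokens), len(tokens[0]))  # 3 3 (sequence length, feature dimension)
    print(len(flatten(tokens)))         # 9
```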

Moreover, our method has significant potential for scaling up to large models. The prompt encoder and forward prediction networks are based on the Transformer, the basic building block of large language models such as ChatGPT. The image encoder and diffusion networks are built on the LDM, which has already been scaled up in Stable Diffusion. Therefore, our work suggests that Transformer-based and generative deep-learning models offer significant potential for photonics research. Given sufficient training data, a large photonics model may be achievable. This may inspire future research on AI techniques and further empower AI in photonics.

Methods

Implementation of latent diffusion network

The overall architecture of our denoising diffusion model was built on the LDM^36. We modified and improved the LDM to make it suitable for generating meta-atoms with limited training data, and adopted the TensorFlow^54 framework to implement all DNN models. The core of our solution is the generation mechanism of the diffusion model. In addition to the target optical response $y$, the model takes a random latent variable $\epsilon$ as input. Whereas $y$ alone maps to multiple structures $\{x_1, x_2, \ldots, x_n\}$, each pair $(y, \epsilon_k)$ is designed to map uniquely to a specific solution $x_k$. This transforms an ill-posed one-to-many problem into a well-defined one-to-one mapping, which a neural network can learn effectively. In practice, by sampling different $\epsilon_k$, users can generate a diverse set of structures that satisfy the same target response.
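A toy example may help to show why adding $\epsilon$ resolves the nonuniqueness. Below, a hypothetical forward model $f(x) = x^2$ maps both $x$ and $-x$ to the same response, so inverting from $y$ alone is one-to-many; an inverse $g(y, \epsilon)$ in which the sign of $\epsilon$ selects the branch is an ordinary single-valued function. This only illustrates the idea: in AIGP the role of $g$ is played by the trained diffusion network, where the mapping from $\epsilon$ to a particular solution emerges from training rather than from a hand-written rule.

```python
import math
import random
from typing import List


def forward(x: float) -> float:
    """Toy forward model: the 'structures' x and -x give the same 'response' x**2."""
    return x * x


def inverse(y: float, eps: float) -> float:
    """Single-valued inverse g(y, eps): the sign of the latent eps selects one solution branch."""
    if y < 0:
        raise ValueError("no real solution exists for a negative response")
    branch = 1.0 if eps >= 0 else -1.0
    return branch * math.sqrt(y)


def sample_solutions(y: float, n: int, seed: int = 0) -> List[float]:
    """Draw eps_k ~ N(0, 1) and map each pair (y, eps_k) to one solution x_k."""
    rng = random.Random(seed)
    return [inverse(y, rng.gauss(0.0, 1.0)) for _ in range(n)]


if __name__ == "__main__":
    candidates = sample_solutions(4.0, n=6, seed=1)
    print(candidates)                        # each entry is +2.0 or -2.0, depending on its eps draw
    print([forward(x) for x in candidates])  # every candidate reproduces the target response 4.0
```

A negative target plays the part of an unrealisable response: no branch exists, which mirrors the nonexistence case discussed in the Conclusions.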

We briefly introduce our denoising diffusion model here; more technical details can be found in the ‘Code availability’ section. First, the 2D geometry of the target meta-atom was described by a 256 × 256 binary image. The period of the meta-atom was normalised to the range $(0, 1]$ and multiplied by the binary image. The image encoder-decoder network is based on a convolutional neural network (CNN) with residual connections and a self-attention mechanism, and encodes/decodes the image to/from a 32 × 32 × 2 latent space. Denoising diffusion is conducted in this latent space to significantly reduce the computational cost. The diffusion network is based on the U-Net architecture, with cross-attention that introduces conditional control. The sampling strategy used in our diffusion process was DDIM^42, and we adopted a continuous diffusion time, with the noise and signal rates embedded as inputs to the diffusion network. In this manner, the number of sampling steps can be changed dynamically at inference time. In our experiments, 20 steps were sufficient to generate the required meta-atoms effectively.
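The deterministic DDIM sampling loop with a continuous diffusion time can be sketched as follows. This is a minimal pure-Python illustration, not the authors' TensorFlow implementation. The cosine-shaped schedule and its endpoints (`MIN_SIGNAL_RATE`, `MAX_SIGNAL_RATE`) are assumptions, conditioning on the prompt is omitted, and the U-Net is replaced by a hypothetical oracle denoiser that already knows the clean latent, so that only the sampler logic is exercised. Because time is continuous, the same function works for any number of steps.

```python
import math
import random
from typing import Callable, List, Tuple

Vector = List[float]
# Hypothetical denoiser signature: (noisy latent, noise rate, signal rate) -> predicted noise.
Denoiser = Callable[[Vector, float, float], Vector]

MIN_SIGNAL_RATE = 0.02  # assumed schedule endpoints, not values from the paper
MAX_SIGNAL_RATE = 0.95


def diffusion_schedule(t: float) -> Tuple[float, float]:
    """Map a continuous diffusion time t in [0, 1] to (noise_rate, signal_rate).

    The rates satisfy noise_rate**2 + signal_rate**2 == 1 for every t."""
    start = math.acos(MAX_SIGNAL_RATE)
    end = math.acos(MIN_SIGNAL_RATE)
    angle = start + t * (end - start)
    return math.sin(angle), math.cos(angle)


def ddim_sample(denoise: Denoiser, initial_noise: Vector, steps: int) -> Vector:
    """Deterministic DDIM sampling from t = 1 down to t = 0 in `steps` equal steps."""
    step_size = 1.0 / steps
    x_t = list(initial_noise)
    pred_x0 = x_t
    for step in range(steps):
        t = 1.0 - step * step_size
        noise_rate, signal_rate = diffusion_schedule(t)
        pred_noise = denoise(x_t, noise_rate, signal_rate)
        # Estimate the clean latent from the current noisy latent.
        pred_x0 = [(x - noise_rate * e) / signal_rate for x, e in zip(x_t, pred_noise)]
        # Move to the next, less noisy time, reusing the predicted noise (no fresh randomness).
        next_noise_rate, next_signal_rate = diffusion_schedule(t - step_size)
        x_t = [next_signal_rate * x0 + next_noise_rate * e for x0, e in zip(pred_x0, pred_noise)]
    return pred_x0


def make_oracle_denoiser(clean: Vector) -> Denoiser:
    """Hypothetical stand-in for the conditional U-Net that already knows the clean latent."""
    def denoise(x_t: Vector, noise_rate: float, signal_rate: float) -> Vector:
        return [(x - signal_rate * c) / noise_rate for x, c in zip(x_t, clean)]
    return denoise


if __name__ == "__main__":
    rng = random.Random(0)
    clean_latent = [rng.gauss(0.0, 1.0) for _ in range(8)]  # toy stand-in for a 32 x 32 x 2 latent
    noise = [rng.gauss(0.0, 1.0) for _ in range(8)]
    for steps in (5, 20):  # the step count is chosen freely at inference time
        sample = ddim_sample(make_oracle_denoiser(clean_latent), noise, steps)
        print(steps, max(abs(a - b) for a, b in zip(sample, clean_latent)))
```

Because each update is deterministic given the initial noise, a fixed $\epsilon$ always yields the same latent, which is what makes the $(y, \epsilon_k) \to x_k$ mapping well defined.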

Freeform meta-atom generation and simulation

The proposed direct mapping algorithm contrasts with existing iterative optimization algorithms. Because such a direct mapping cannot be achieved using analytical models, we fitted the inverse function of the forward simulation using a data-driven method, which requires a training dataset. Generating such a dataset is computationally expensive, but this cost is incurred only once. Once the inverse function is established by training, any transmittance can be mapped directly to meta-atoms without iterative optimization or forward simulation. The training dataset directly determines the search space of the diffusion network; it is therefore vital both for training a high-performance inverse design model and for ensuring that the subwavelength structures generated by the diffusion network satisfy the imposed fabrication limits. We generated approximately 200 000 shapes, as arbitrary as possible, with and without the C4 symmetry constraint. The shapes were then randomly split into training and testing sets at a ratio of 9:1. The periods of the meta-atoms ranged from 300 to 700 nm. The minimum radius of curvature of the shapes was restricted to 45 nm to satisfy the fabrication limits. Sampling the period at 10 nm intervals from 300 to 700 nm yields 41 different values; combining these with 200 000 different shapes gives approximately $10^{7}$ meta-atoms. Simulating all of these meta-atoms numerically would be impractical. Therefore, we simulated only approximately $10^{5}$ meta-atoms to train the forward prediction network.
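The dataset arithmetic above can be checked with a few lines of Python. The sketch assumes exactly 200 000 shapes (the text says approximately) and uses a seeded shuffle for the 9:1 split; `train_test_split` here is a small hypothetical helper, not a library function. Note that 41 × 200 000 = $8.2 \times 10^{6}$, which the text rounds to about $10^{7}$, so the roughly $10^{5}$ simulated meta-atoms cover about 1% of the combinations.

```python
import random
from typing import List, Sequence, Tuple

PERIODS_NM = list(range(300, 701, 10))  # 300, 310, ..., 700 nm in 10 nm steps
NUM_SHAPES = 200_000                     # the text says "approximately 200 000"


def train_test_split(items: Sequence[int], train_fraction: float = 0.9, seed: int = 0) -> Tuple[List[int], List[int]]:
    """Hypothetical helper: shuffle the shape indices and split them 9:1 (shapes, not periods)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = round(train_fraction * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


if __name__ == "__main__":
    print(len(PERIODS_NM))               # 41 period values
    print(len(PERIODS_NM) * NUM_SHAPES)  # 8200000 shape-period combinations
    train, test = train_test_split(range(NUM_SHAPES))
    print(len(train), len(test))         # 180000 20000
```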

The simulations were performed using the FDTD method under periodic boundary conditions with horizontally polarised incident light. The mesh step was set to 10 nm, the cutoff precision to 10⁻⁵, and the $dt$ stability factor to 0.99. The transmission responses of the remaining meta-atoms were then predicted rapidly using the trained forward prediction network.
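For orientation, the time step implied by these settings can be estimated from the three-dimensional Courant stability limit of a uniform Yee grid. This assumes, as in Lumerical FDTD, that the $dt$ stability factor $S$ is the fraction of the Courant limit actually used, and that the 10 nm mesh step applies along all three axes; the paper specifies neither the solver nor any local mesh refinement.

$$\Delta t = S\,\frac{1}{c\sqrt{\Delta x^{-2} + \Delta y^{-2} + \Delta z^{-2}}}$$

$$\Delta t = \frac{0.99 \times 10\ \text{nm}}{\sqrt{3}\,c} = \frac{0.99 \times 10^{-8}\ \text{m}}{\sqrt{3} \times 2.998 \times 10^{8}\ \text{m/s}} \approx 1.9 \times 10^{-17}\ \text{s} \approx 0.019\ \text{fs}$$

Here $c$ is the speed of light in vacuum, the fastest propagation speed on the grid, so it sets the limit.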

Training and inference protocols

Each meta-atom was described by a 256 × 256 image with pixel values normalised to $[-1, 1]$. The image was encoded into a 32 × 32 × 2 latent space by the image encoder, and the latent variables were normalised to a mean of 0 and a variance of 1. The forward prediction network takes the binary image representing the 2D geometry and the period value as separate inputs, rather than combining them into a single image. Training proceeds in three stages. First, the image encoder-decoder network and the forward prediction network are trained; after training, the image encoder-decoder network achieved a mean absolute error of 0.004 and a root mean square error of 0.028. Then, the transmission responses of all meta-atoms are predicted quickly via forward prediction and used as the training set for the prompt encoder network. Finally, the trained image encoder, forward prediction, and prompt encoder networks are frozen, and the diffusion network is trained. At each training step, a batch of shapes is randomly selected from the training set, and a period value is randomly generated for each shape to form a meta-atom. The transmission responses of these meta-atoms are then predicted, and the meta-atoms are encoded into the latent space. The transmission responses are further encoded by the prompt encoder network and used as conditional control for the diffusion network. The diffusion network is then trained to denoise the latent variables, guided by the conditional input and the signal-to-noise ratio. All networks were trained using the Adam^55 optimiser with a weight decay rate of 0.0001, and the loss was calculated as the mean squared error. At inference time, only the image decoder, prompt encoder, and diffusion networks are required, although the forward prediction network can also be used to evaluate the generated results.

A quantitative analysis of the computational cost of training and deploying AIGP is provided in Supplementary Note 14.
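One training step of the diffusion network, as described above, can be sketched as follows. This is a hedged, pure-Python illustration rather than the authors' TensorFlow code. The frozen networks are passed in as hypothetical callables; the period is assumed to be drawn uniformly from 300–700 nm; the pixel encoding (mask × period/700 nm, then rescaled from [0, 1] to [−1, 1]) is our assumption about how the two normalisations combine; the noise schedule reuses an assumed cosine form; and the latent normalisation is assumed to happen inside `image_encode`. The Adam update itself is left out; only the batch construction and the mean-squared noise-prediction loss are shown.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def diffusion_schedule(t: float, min_signal: float = 0.02, max_signal: float = 0.95) -> Tuple[float, float]:
    """Continuous time t in [0, 1] -> (noise_rate, signal_rate); the endpoints are assumptions."""
    start, end = math.acos(max_signal), math.acos(min_signal)
    angle = start + t * (end - start)
    return math.sin(angle), math.cos(angle)


def meta_atom_image(mask: Sequence[int], period_nm: float, max_period_nm: float = 700.0) -> Vector:
    """Assumed pixel encoding: binary mask times normalised period, mapped from [0, 1] to [-1, 1]."""
    scale = period_nm / max_period_nm  # in (0, 1] for periods up to max_period_nm
    return [2.0 * m * scale - 1.0 for m in mask]


def training_loss(
    masks: Sequence[Sequence[int]],
    forward_predict: Callable[[Sequence[int], float], Vector],        # frozen forward prediction network
    image_encode: Callable[[Vector], Vector],                         # frozen image encoder (normalised latent)
    prompt_encode: Callable[[Vector], Vector],                        # frozen prompt encoder
    predict_noise: Callable[[Vector, float, float, Vector], Vector],  # diffusion network being trained
    rng: random.Random,
    period_range_nm: Tuple[float, float] = (300.0, 700.0),
) -> float:
    """Mean squared error between the true and predicted noise over one batch of shapes."""
    total, count = 0.0, 0
    for mask in masks:
        period = rng.uniform(*period_range_nm)    # a fresh random period for every shape
        response = forward_predict(mask, period)  # geometry and period enter separately
        latent = image_encode(meta_atom_image(mask, period))
        condition = prompt_encode(response)
        noise_rate, signal_rate = diffusion_schedule(rng.random())
        noise = [rng.gauss(0.0, 1.0) for _ in latent]
        noisy = [signal_rate * z + noise_rate * e for z, e in zip(latent, noise)]
        predicted = predict_noise(noisy, noise_rate, signal_rate, condition)
        total += sum((p - e) ** 2 for p, e in zip(predicted, noise))
        count += len(noise)
    return total / count


if __name__ == "__main__":
    rng = random.Random(0)
    batch = [[rng.randint(0, 1) for _ in range(16)] for _ in range(4)]  # four flattened 4 x 4 toy masks
    loss = training_loss(
        batch,
        forward_predict=lambda mask, period: [sum(mask) / len(mask), period / 700.0],  # toy surrogate
        image_encode=lambda image: image[:8],                                          # toy "encoder"
        prompt_encode=lambda response: response,                                       # identity prompt
        predict_noise=lambda noisy, n, s, cond: [0.0] * len(noisy),                    # untrained baseline
        rng=rng,
    )
    print(loss)
```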

Chip fabrication

The designed subwavelength structures were fabricated by Tianjin H-Chip Technology Group Corporation on a silicon-on-sapphire (SOS) chip. The intrinsic silicon layer is 230 nm thick, and the sapphire substrate is 475 µm thick with a flatness of <15 µm. The fabrication process was as follows. First, electron-beam lithography was performed at a beam current of 2 nA using a 250-nm-thick ZEP 520A resist layer. Subsequently, the pattern was transferred by inductively coupled plasma (ICP) etching with a gas mixture of CHF₃ and SF₆. Finally, the chromium mask was removed by dry etching with a combination of CHF₃ and O₂. The specified linewidth precision of the fabricated samples was ±5% for features wider than 200 nm and ±10 nm for features of 200 nm or less. The minimum linewidth was 90 nm.
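The two-part linewidth specification can be restated as a small helper; the function names are hypothetical, and the sketch encodes only the quoted tolerances. The two rules agree at the 200 nm boundary (5% of 200 nm is 10 nm), so the allowed deviation varies continuously with the nominal width.

```python
def linewidth_tolerance_nm(nominal_nm: float) -> float:
    """Allowed deviation: ±5 % of the nominal width above 200 nm, ±10 nm at or below 200 nm."""
    if nominal_nm <= 0:
        raise ValueError("nominal width must be positive")
    return 0.05 * nominal_nm if nominal_nm > 200.0 else 10.0


def within_spec(nominal_nm: float, measured_nm: float) -> bool:
    """True if a measured linewidth lies within the specified tolerance of its nominal value."""
    return abs(measured_nm - nominal_nm) <= linewidth_tolerance_nm(nominal_nm)


if __name__ == "__main__":
    for width in (90.0, 200.0, 400.0):  # 90 nm is the quoted minimum linewidth
        print(width, linewidth_tolerance_nm(width))
```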

Acknowledgements

The authors would like to thank the Tianjin H-Chip Technology Group Corporation and the Innovation Center of Advanced Optoelectronic Chip and Institute for Electronics and Information Technology in Tianjin, Tsinghua University, for their support during the electron-beam lithography and ICP etching. We thank Yue Zou for language editing. We thank Sheng Xu and Tianhao Liu for their help with device fabrication and testing. This study was supported by the National Key Research and Development Program of China (Grant Nos. 2023YFB2806703 and 2022YFF1501600), National Natural Science Foundation of China (Grant No. U22A6004), Beijing Frontier Science Center for Quantum Information, and Beijing Academy of Quantum Information Sciences.

Supplementary information

SI for 10.37188-lam.2026.037.pdf

References

[1]	Ma, W. et al. Deep learning for the design of photonic structures. Nature Photonics 15, 77-90 (2021).
[2]	Soukoulis, C. M., Linden, S. & Wegener, M. Negative Refractive Index at Optical Wavelengths. Science 315, 47-49 (2007).
[3]	Cerjan, A. & Fan, S. Complete photonic band gaps in supercell photonic crystals. Physical Review A 96, (2017).
[4]	Chen, W. T., Zhu, A. Y. & Capasso, F. Flat optics with dispersion-engineered metasurfaces. Nature Reviews Materials 5, 604-620 (2020).
[5]	Khorasaninejad, M. et al. Metalenses at visible wavelengths: Diffraction-limited focusing and subwavelength resolution imaging. Science 352, 1190-1194 (2016).
[6]	Miyata, M., Nemoto, N., Shikama, K., Kobayashi, F. & Hashimoto, T. Full-color-sorting metalenses for high-sensitivity image sensors. Optica 8, 1596-1604 (2021).
[7]	Tian, T. et al. Metasurface‐Based Free‐Space Multi‐Port Beam Splitter with Arbitrary Power Ratio. Advanced Optical Materials 11, (2023).
[8]	Chen, X. et al. All-Dielectric Metasurface-Based Beam Splitter with Arbitrary Splitting Ratio. Nanomaterials 11, 1137 (2021).
[9]	Li, Z., Pestourie, R., Lin, Z., Johnson, S. G. & Capasso, F. Empowering Metasurfaces with Inverse Design: Principles and Applications. ACS Photonics 9, 2178-2192 (2022).
[10]	Molesky, S. et al. Inverse design in nanophotonics. Nature Photonics 12, 659-670 (2018).
[11]	Elsawy, M. M. R. et al. Numerical Optimization Methods for Metasurfaces. Laser & Photonics Reviews 14, 1900445 (2020).
[12]	Jensen, J. S. & Sigmund, O. Topology optimization for nano‐photonics. Laser & Photonics Reviews 5, 308-321 (2011).
[13]	Fan, J. A. Freeform metasurface design based on topology optimization. MRS Bulletin 45, 196-201 (2020).
[14]	Sell, D. et al. Large-Angle, Multifunctional Metagratings Based on Freeform Multimode Geometries. Nano Letters 17, 3752-3757 (2017).
[15]	Piggott, A. Y. et al. Inverse design and demonstration of a compact and broadband on-chip wavelength demultiplexer. Nature Photonics 9, 374-377 (2015).
[16]	Jin, Z. et al. Complex Inverse Design of Meta-optics by Segmented Hierarchical Evolutionary Algorithm. ACS Nano 13, 821-829 (2019).
[17]	Inampudi, S. & Mosallaei, H. Neural network based design of metagratings. Applied Physics Letters 112, 241102 (2018).
[18]	Peurifoy, J. et al. Nanophotonic particle simulation and inverse design using artificial neural networks. Science Advances 4, eaar4206 (2018).
[19]	Teixeira, F. L. et al. Finite-difference time-domain methods. Nature Reviews Methods Primers 3, (2023).
[20]	An, S. et al. A Deep Learning Approach for Objective-Driven All-Dielectric Metasurface Design. ACS Photonics 6, 3196-3207 (2019).
[21]	Malkiel, I. et al. Plasmonic nanostructure design and characterization via Deep Learning. Light: Science & Applications 7, 60 (2018).
[22]	Ma, W., Cheng, F. & Liu, Y. Deep-Learning-Enabled On-Demand Design of Chiral Metamaterials. ACS Nano 12, 6326-6334 (2018).
[23]	Liu, D. et al. Training Deep Neural Networks for the Inverse Design of Nanophotonic Structures. ACS Photonics 5, 1365-1369 (2018).
[24]	Kanmaz, T. B. et al. Deep-learning-enabled electromagnetic near-field prediction and inverse design of metasurfaces. Optica 10, 1373 (2023).
[25]	Zhang, T. et al. Efficient spectrum prediction and inverse design for plasmonic waveguide systems based on artificial neural networks. Photonics Research 7, 368-380 (2019).
[26]	Lee, X. Y. et al. Fast inverse design of microstructures via generative invariance networks. Nature Computational Science 1, 229-238 (2021).
[27]	Jiang, J. & Fan, J. A. Global Optimization of Dielectric Metasurfaces Using a Physics-Driven Neural Network. Nano Letters 19, 5366-5372 (2019).
[28]	Liu, Z. et al. Generative Model for the Inverse Design of Metasurfaces. Nano Letters 18, 6570-6576 (2018).
[29]	Han, X. et al. Inverse design of metasurface optical filters using deep neural network with high degrees of freedom. InfoMat 3, 432-442 (2021).
[30]	Ma, W. et al. Probabilistic representation and inverse design of metamaterials based on a deep generative model with semi‐supervised learning strategy. Advanced Materials 31, 1901111 (2019).
[31]	Wen, F., Jiang, J. & Fan, J. A. Robust Freeform Metasurface Design Based on Progressively Growing Generative Networks. ACS Photonics 7, 2098-2104 (2020).
[32]	Ma, T., Wang, H. & Guo, L. J. OptoGPT: A foundation model for inverse design in optical multilayer thin film structures. Opto-Electronic Advances 7, 240062 (2024).
[33]	An, S. et al. Multifunctional metasurface design with a generative adversarial network. Advanced Optical Materials 9, 2001433 (2021).
[34]	Ma, T. et al. Benchmarking deep learning-based models on nanophotonic inverse design problems. Opto-Electronic Science 1, 210012 (2022).
[35]	Cao, Y. et al. A comprehensive survey of AI-generated content (AIGC): A history of generative AI from GAN to ChatGPT. arXiv preprint arXiv:2303.04226 (2023).
[36]	Rombach, R. et al. High-resolution image synthesis with latent diffusion models. in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 10684-10695 (2022).
[37]	Achiam, J. et al. GPT-4 Technical Report. arXiv preprint arXiv:2303.08774 (2023).
[38]	Singhal, K. et al. Large language models encode clinical knowledge. Nature 620, 172-180 (2023).
[39]	Boiko, D. A. et al. Autonomous chemical research with large language models. Nature 624, 570-578 (2023).
[40]	Bastek, J. H. & Kochmann, D. M. Inverse design of nonlinear mechanical metamaterials via video denoising diffusion models. Nature Machine Intelligence 5, 1466-1475 (2023).
[41]	Ho, J., Jain, A. & Abbeel, P. Denoising Diffusion Probabilistic Models. Advances in Neural Information Processing Systems 33, 6840-6851 (2020).
[42]	Song, J., Meng, C. & Ermon, S. Denoising Diffusion Implicit Models. in International Conference on Learning Representations (2021).
[43]	Dhariwal, P. & Nichol, A. Diffusion models beat GANs on image synthesis. Advances in Neural Information Processing Systems 34, 8780-8794 (2021).
[44]	Xiong, J. et al. Dynamic brain spectrum acquired by a real-time ultraspectral imaging chip with reconfigurable metasurfaces. Optica 9, 461-468 (2022).
[45]	Rao, S. et al. Anti-spoofing face recognition using a metasurface-based snapshot hyperspectral image sensor. Optica 9, 1253-1259 (2022).
[46]	Wang, Z. et al. Single-shot on-chip spectral sensors based on photonic crystal slabs. Nature Communications 10, 1020 (2019).
[47]	Li, T. et al. Revolutionary meta-imaging: from superlens to metalens. Photonics Insights 2, R01 (2023).
[48]	Li, Y. et al. Recent progress on structural coloration. Photonics Insights 3, R03 (2024).
[49]	Yang, J. et al. Ultraspectral Imaging Based on Metasurfaces with Freeform Shaped Meta‐Atoms. Laser & Photonics Reviews 16, 2100663 (2022).
[50]	Shi, L. et al. Si-Based Polarizer and 1-Bit Phase-Controlled Non-Polarizing Beam Splitter-Based Integrated Metasurface for Extended Shortwave Infrared. Nanomaterials 13, 2592 (2023).
[51]	Dosovitskiy, A. et al. An image is worth 16x16 words: Transformers for image recognition at scale. in International Conference on Learning Representations (2021).
[52]	Gao, L. et al. A Bidirectional Deep Neural Network for Accurate Silicon Color Design. Advanced Materials 31, 1905467 (2019).
[53]	He, K. et al. Masked autoencoders are scalable vision learners. in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 16000-16009 (2022).
[54]	Abadi, M. et al. TensorFlow: a system for Large-Scale machine learning. in 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16). 265-283 (2016).
[55]	Kingma, D. P. & Ba, J. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014).

Corresponding author: Chen Bin, bchen63@163.com