HTML

For scalar diffraction, as shown in Fig. 1a, the electric field at a point (x, y, z) in the Cartesian coordinates can be obtained based on the Huygens–Fresnel principle and expressed by the Rayleigh–Sommerfeld diffraction integral^{20}:
$$ E\left( {x,y,z} \right) =  \frac{i}{\lambda } {\iint_{\!\Omega}} {E_0\left( {u,v,0} \right) \times \frac{{\exp \left( {ikr} \right)}}{r} \times \cos \theta \;} dudv $$ (1) Fig. 1 Illustrative diagrams of scalar and vector diffraction.
a Geometry for scalar diffraction calculation. b Geometry for vector diffraction calculationwhere $r = \sqrt {\left({x  u} \right)^2 + \left({y  v} \right)^2 + z^2}$ is the distance between the source point and the observation point of interest. k = 2π/λ is the wavenumber. In the condition of the Fresnel approximation with a Fresnel number F ≥ 1, the third term and higher orders in the Taylor expression of r can be ignored, that is, $r \approx z + \frac{{\left({x  u} \right)^2 + \left({y  v} \right)^2}}{{2z}}$. In the denominator of Eq. (1), r can be further approximated with only the first term (r ≈ z). Moreover, the paraxial approximation ensures cosθ ≈ 1. In this way, the complex electric field can be described by the Fresnel diffraction integral:
$$ E\left( {x,y,z} \right) = \frac{{\exp \left( {ikz} \right)}}{{i\lambda z}} {\iint_{\Omega}} {E_0\left( {u,v,0} \right)} \\\;\;\; \times \exp \left\{ {\frac{{ik}}{{2z}}\left[ {\left( {x  u} \right)^2 + \left( {y  v} \right)^2} \right]} \right\} dudv $$ (2) which can be further rewritten as:
$$ E\left( {x,y,z} \right) = \frac{{\exp \left( {ikz} \right) \times \exp \left( {ik\frac{{x^2 + y^2}}{{2z}}} \right)}}{{i\lambda z}}{\iint}_{\!\Omega} {E_0\left( {u,v,0} \right)} \\ \times \exp \left[ {\frac{{i\pi }}{{\lambda z}}\left( {u^2 + v^2} \right)} \right] \times {\mathrm{exp}}\left[ {  \frac{{2i\pi }}{{\lambda z}}\left( {xu + yv} \right)} \right] dudv $$ (3) Here, we define:
$$ F_0 = \frac{{\exp \left( {ikz} \right) \times \exp \left( {ik\frac{{x^2 + y^2}}{{2z}}} \right)}}{{i\lambda z}} $$ (4) $$ F = \exp \left[ {\frac{{i\pi }}{{\lambda z}}\left( {u^2 + v^2} \right)} \right] $$ (5) Therefore, the integral Eq. (3) can be expressed in terms of the twodimensional FT:
$$ E = F_0 \times {\boldsymbol{F}}\left( {E_0 \times F} \right) $$ (6) here F represents the twodimensional FT. Moreover, as with the other type of scalar diffraction, Fraunhofer diffraction in the far field can be expressed by $E = F_0 \times {\boldsymbol{F}}\left({E_0} \right)$, which can be regarded as a special case of Fresnel diffraction passing through a converging lens. Therefore, scalar diffraction can be computed across the xyplane using an FTbased approach.
Scalar diffraction can be used to effectively compute the complex amplitude distribution of many optical systems with a few approximations, as described above. However, it is known that the polarization components are changed due to large refractivity after passing through a highNA nonparaxial system, and scalar diffraction is incapable of achieving proper results. The vectorial Debye diffraction integral, established by Richards and Wolf^{21}, has to be adopted to analyze the complex EM field of each polarization component (Supplementary Information Section 1). The optical layout is shown in Fig. 1b.
Due to the refraction of the nonparaxial tight focusing system, the electric field components (polarization components $\overrightarrow e _s$ and $\overrightarrow e _p$) on the entrance pupil P_{e} are transformed into a spherical reference surface P_{r} ($\overrightarrow e _s$, $\overrightarrow e _{th}$, and $\overrightarrow e _r$). The transformation can be expressed in Cartesian coordinates as^{20}:
$$ \overrightarrow E _r = A_0\sqrt {\cos \theta } \times {\bf{M}} \,\times\, \overrightarrow E _i$ $$ (7) M is the transform matrix of the polarization from the entrance surface to the converging spherical surface. $A_0\sqrt {\cos \theta }$ is the apodization factor accounting for the energy conservation. The propagation of the electric field from the reference surface P_{r} to the imaging point p (x, y, z) near the focus is expressed by the Debye integral:
$$ \overrightarrow E =  \frac{{iC}}{\lambda }\mathop {\iint}\nolimits_{\!\Sigma} {\overrightarrow E _r} \times \exp \left[ {i\left( {k_zz  k_xx  k_yy} \right)} \right]{\mathrm{ }}d\Sigma $$ (8) The definition of $\overrightarrow k _r$can be found in Supplementary Information Section 1. By performing the integration over the planar surface P_{e} instead of the surface P_{r} (Supplementary Information Section 2):
$$ \overrightarrow E =  \frac{{iC}}{\lambda }\mathop {\iint}\nolimits_{\!\Omega} {\left[ {\overrightarrow E _r \times {{\exp \left( {ik_zz} \right)} / {\cos \theta }}} \right]}\\\;\;\; \times \exp \left[ {  i\left( {k_xx + k_yy} \right)} \right] dk_xdk_y $$ (9) which can be rewritten in the form of an FT:
$$ \overrightarrow E \left( {x,y,z} \right) =  \frac{{iC}}{\lambda }{\boldsymbol{F}}\left[ {\overrightarrow E _r \times {{\exp \left( {ik_zz} \right)} / {\cos \theta }}} \right] \\=  \frac{{iC}}{\lambda }{\boldsymbol{F}}\left[ {{\bf{M}} \times \overrightarrow E _i \times {{\exp \left( {ik_zz} \right)} / {\sqrt {\cos \theta } }}} \right] $$ (10) In brief, both scalar diffraction and vector diffraction can be expressed by the FT. FFT algorithms in modern computer systems allow for fast and accurate calculations. The similarity between these two diffractions is obvious from a mathematical point of view: the vector diffraction integral is equivalent to the scalar Fraunhofer diffraction in the case of a lowNA optical system where 1/cosθ ≈ 1.
Although the FFTbased optical calculation is much faster than the direct integration method, it results in inevitable drawbacks: the resultant output field has a fixed transverse dimension and unchangeable sampling numbers determined by the dimension and sampling size of the input aperture for a given distance. The dimension of the output field is:
$$ D_m = \frac{{\lambda d}}{{p_s}} $$ (11) where d is the distance between the input aperture and output plane. p_{s} is the sampling size of the input aperture. The sampling numbers of the output plane are rigidly equivalent to those of the input aperture. The restriction is brought about by the intrinsic characteristic of the FT and greatly limits the flexibility in light propagation calculations. For example, the input aperture must be enormously expanded with the aid of the zeropadding approach when a small portion of the output plane is required with high resolution, which inevitably leads to a large increment of the computation time.

Regarding mathematics, to achieve the required bandwidth and resolution in the frequency domain, the appropriate zeropadding operation is needed to extend the dimension of the original input sequence^{15}. For most applications in laser manipulation and lithography, only a small fraction of the output field with high resolution is needed to obtain sufficient details, resulting in large amounts of zeropadding. This results in a severe waste of resources, as most of the results are discarded. The operation of the zeropadding inevitably increases the computation time and the demand for memory usage. Moreover, the resultant output region remains unchangeable, greatly limiting its potential in practical applications. Here, the Bluestein method is adopted to evaluate the scalar and vector diffraction calculations. The Bluestein method is an elegant method conceived by L. Bluestein^{22} and further generalized by L. Rabiner et al.^{23} that is capable of performing more general FTs at arbitrary frequencies as well as boosting the resolution over the full spectrum. The Bluestein method offers us a spectral zoom operation with high resolution and arbitrary bandwidth. This advantage is enabled by computing the ztransform along spiral contours in the zplane for an input sequence (Supplementary Information Section 3 and Fig. S1). The computational complexity is O[(M + N)log_{2}(M + N)], manifesting an FFT algorithm. The method is based on the ztransform along a spiral contour in the zplane defined by A and W:
$$ X\left[ m \right] = \left. {\mathop {\sum}\limits_{n = 0}^{N  1} {x\left[ n \right] \times z^{  n}} } \right_{z = A \times W^{  m}} = \mathop {\sum}\limits_{n = 0}^{N  1} {x\left[ n \right] \times A^{  n} \times W^{mn}} $$ (12) here $m = \left[ {0, \cdots, M  1} \right]$. M is the length of the transform. N is the length of input sequence. A specifies the complex starting point of the zplane spiral contour of interest, and W specifies the complex scalar describing the complex ratio between points along the contour. Note that the case of A = 1, W = exp(−i2π/N), and M = N corresponds to the discrete Fourier transform (DFT), which computes the ztransform along the unit circle with a finite duration. More generally, the method can be used to calculate the DFT between an arbitrary starting point f_{1} and ending point f_{2} (i.e., the tuneable frequency bandwidth relative to the total frequency range f_{s}) with arbitrary sampling numbers M.
The practical implementations of the Bluestein method for enhanced DFT computation deserve additional comments. First, a 2D FT is needed for the computation of both scalar and vector diffraction. The Bluestein method should be adopted in both the column and the row dimensions to fulfill this requirement. Second, the Bluestein method internalizes padding of the input array with zeros at the tail. However, symmetric zeropadding around the input array is needed for the simulation of realistic optical systems. Third, an additional operation is needed to shift the zerofrequency component to the center of the array before and after the DFT to eliminate the highfrequency oscillation in the phase information. To address these issues, the definition of parameters A and W should be rearranged, and phase shifting factor P_{shift} should be included at the end of the calculation (see Supplementary Information Section 3 and Figs. S2–S4).
By performing these adjustments, the Bluestein method can be developed as a fast approach for light diffraction calculation with superior flexibility: it allows for the selection of arbitrary segments in the imaging plane with arbitrary resolution, providing competitive efficiency and flexibility over direct integration and the FFT methods.

Figure 2 illustrates the scalar calculation with a paradigm of the converging spherical wave propagation, which is generated by a plane wave passing through a convex lens. The phase profile of the lens is shown in Fig. 2a, which is equivalent to the phase plate after being wrapped between 0 and 2π (Fig. 2b). The optical configuration is sketched in Fig. 2c, with the parameters λ = 800 nm, f = 600 mm, and D = 8.64 mm. Figure 2d, e shows the optical field distribution in the focal plane in terms of the intensity and phase. Figure 2f, g shows the crosssectional intensity and phase distributions in the light propagation direction. The corresponding line plots of the intensity and phase are given in Fig. 2h–k. A comparison between the Bluestein method and traditional direct integration and FFT methods is also made, from which we can see excellent agreements. It is revealed that the Bluestein method can calculate the scalar light diffraction with high accuracy.
Fig. 2 Scalar calculation of the converging spherical wave.
a Phase profiles of the convex lens (gray line) and the corresponding phase plate (red line). b 3D rendered diagram of the phase plate. c Illustration of the optical setup. d Intensity and e phase distributions in the focal plane (z = 600 mm). f Intensity and g phase distributions in the longitudinal direction. h–k Line plots corresponding to (d–g), calculated using three different methods. l Dependence of the computation time on the number of sampling points in one dimension. An incident light field with sampling points of 1080 × 1080 and an interval of 8 μm (i.e., width of 8.64 mm) is fixed for each calculation (the same hereinafter unless otherwise specified). m Comparison of the computation time for the light field in the xyplane using different methods. Here, the target region with a width of 0.2 mm is fixed with sampling points of 1080 × 1080. n Comparison of the computation time for the light field in volumetric three dimensions and the crosssectional yzplane using different methods. Here, 150 sliced layers are calculatedThe Bluestein method has a superior advantage in the computation time cost over the direct integration and FFT methods. Due to the tedious pointbypoint calculation method, the direct integration method is associated with two cycling loops, and the computation time increases drastically with the calculation points of the target plane (with a computational complexity of O (M^{2} × N^{2})). For the case of the FFT method, a zeropadding operation is needed to fulfill the requirement for the preset target sampling numbers, resulting in a rapid increase in computation time with the sampling points. As shown in Fig. 2l, with the increase in the sampling points along one coordinate axis, the Bluestein method exhibits its obvious superiority compared with the other two methods. This advantage makes the method particularly applicable to scenarios where large sampling points are needed, such as highresolution microscopy. For the case in Fig. 2d, e, where the sampling points in the entrance pupil and output field are the same (M = N = 1080) and the ROI is 0.2 × 0.2 mm, the computational cost is ~13.7 h for the direct integration method, making it unsuitable for practical applications. For the FFT method, the computational cost is improved to 68 s, as shown in Fig. 2m. In comparison, the computation time is only 0.67 s using our proposed Bluestein method, which is 10^{5} and 10^{2} times less than those of the direct integration method and FFT method, respectively. The threedimensional volumetric light field (Supplementary Information Fig. S5) can be reconstructed using crosssectional light fields by calculating the lateral planes layer by layer. As depicted in Fig. 2n, the computation time for the direct method is excessively long to obtain the volumetric light field (~85 days). It takes 2 h to calculate the crosssectional light field in the longitudinal yzplane. By using the FFT method, the computational cost is the same (2.8 h) for both the volumetric and crosssectional light fields because the ROI cannot be tuned due to the intrinsic characteristic of the FT. Owing to the fast computation property of the Bluestein method, calculation of the 3D optical field can be accomplished in < 100 s. The efficiency enhancement is on the same order as that in the lateral xyplane. More examples of scalar diffraction are given in Supplementary Information Section 4 and Fig. S6.
In addition to the great improvement in computational efficiency, the Bluestein method has remarkable flexibility compared with the FFT method. That is, an arbitrary ROI can be defined with arbitrary resolution. This feature is illustrated by reconstructing a computergenerated hologram (CGH), as shown in Fig. 3. Evaluation of the light propagation after being modulated by a CGH is essential for predicting the performance of optical holographic tweezers^{24}, holographic displays^{25}, and laser holographic processing^{26, 27}. As shown in Fig. 3a, a CGH is generated by the weighted Gerchberg–Saxton (GSW) algorithm^{28, 29}. After FT by a converging FT lens, the designed pattern can be reconstructed (Fig. 3b). The process involves two scalar diffraction calculations: one is from the CGH to the FT lens, and the other is from the FT lens to the reconstruction plane. Figure 3c–f shows the intensity distributions with varying regions in the reconstruction plane and constant sampling points (1080 × 1080). Figure 3g–j shows the corresponding phase distributions. It is validated that the Bluestein method possesses fine flexibility compared with the FFT method.
Fig. 3 Scalar calculation of the CGHmodulated light wave.
a Digital CGH derived from the GSW algorithm. b Optical setup for holographic reconstruction. Here the focal length of the FT lens is 600 mm. The dimension of the CGH is 8.64 mm (1080 × 1080 pixels), and the incident wavelength is 800 nm. c–f Calculated intensity distributions with changing ROIs. g–j Calculated phase distributions with changing ROIs. Here, sampling points are fixed at 1080 × 1080 
The vectorial nature of light is essential for optical systems with a highNA aperture or specific polarization, such as radial and azimuthal polarizations^{30, 31}. Figure 4a illustrates the focusing of radially polarized light by a highNA aplanatic objective (NA: 1.4). By using the proposed Bluestein method in the vectorial Debye–Wolf integral, the light field distribution near the focus can be rapidly calculated (insets of Fig. 4a). The results are consistent with those computed by direct integration and the FFT methods, as reflected by the line plots of the light intensity along the transverse and longitudinal directions in Fig. 4b, c.
Fig. 4 Vector diffraction of light with a highNA objective.
a Radially polarized light focused by an aplanatic objective (NA: 1.4). Insets: resultant intensity profiles in the transverse and longitudinal planes. b, c Line plots of the intensity in the transverse and longitudinal directions calculated using different methods. d Spiral phase plate and the phase depth distribution with the azimuthal angle. e Doughnutshaped intensity in the focal plane. f–h Intensity distributions of different polarization components along the x, y, and z directions, respectively. i Line plots of the intensity profile of different polarization components in the focal plane. j Intensity distribution in the longitudinal plane. k–m Phase distributions of different polarization components. n, o Enlarged partial intensity distribution as labeled in (e, j), respectively. p Comparison of computation time for vector diffraction calculation in the xyplane, yzplane, and volumetric three dimensionsThe optical vortex generated by a spiral phase plate (Fig. 4d), in cooperation with circular polarization, plays a key role in superresolution stimulated emission depletion microscopy^{32} and nanolithography^{33}. A doughnutshaped focus profile with a dark center is used as the depletion beam to eliminate fluorescence or polymerization. Figure 4e, j shows the optical intensity profiles of the optical vortex in the lateral xy and longitudinal yzplanes, respectively. An engineered focus with a symmetric doughnut shape can be generated. Moreover, the light components in different polarizations can be obtained efficiently using our Bluestein method, as shown in Fig. 4f–i, k–m. It can be seen that all the light components have dark central intensities close to zero and the spiral phase. The light in the transverse polarizations is dominant over the longitudinal polarization. The Bluestein method also endows the vectorial calculation with high flexibility compared with the traditional FFT approach. Figure 4n, o shows the enlarged intensity profiles in the ROIs labeled in Fig. 4f, g, respectively. Another example of the usage of the Bluestein method for vector diffraction is shown in Supplementary Information Section 5 and Fig. S7. The optical information in the arbitrary ROIs can be investigated in detail without increasing the computational cost, making the Bluestein method advantageous in evaluating localized highresolution light distributions for the application of microscopy and photolithography.
For the computation time, the Bluestein method also exhibits great superiority. Here, we consider the calculation from the entrance pupil with ~10^{5} sampling points to the exit pupil with the same points in the xyplane, and 100 layers along the optical axis are calculated for volumetric and crosssectional light distributions in the yzplane. As shown in Fig. 4p, the direct method requires 57.16 min to calculate the lateral light field. 95.3 h is needed for the volumetric 3D light field distribution, and 22.78 min is needed for the sliced yzplane. An acceptable time (2.88 s) is needed for the FFT method to calculate the xyplane. However, an impractical 280.4 s is needed to obtain the light distribution in the volumetric three dimensions and the twodimensional yzplane. In contrast, only 0.2 s is consumed by the Bluestein method for calculation in the xyplane. Moreover, only 9.34 and 12.19 s is needed to achieve the 2D crosssectional and 3D volumetric light fields. Note that the computation time increases much more quickly with the sampling numbers of the ROI for the direct method and FFT method than for the Bluestein method, e.g., more than 10 days are needed for the direct method and 126.5 s is needed for the FFT method to acquire transverse light distributions in the xyplane when the number of sampling points increases to ~10^{6} (1080 × 1080), while only 1.78 s is needed for the Bluestein method, which is five orders of magnitude less than that needed for the direct method and 10^{2} times less than that for the FFT method.

As discussed above, both the scalar and vector diffraction can be efficiently calculated by the Bluestein method. Based on this, the fullpath optical calculation and tracing can be performed with high flexibility and efficiency. Figure 5a illustrates a representative optical layout for laser holographic processing and holographic manipulation. This setup can be further adopted for twophoton scanning confocal microscopy. Here, a phaseonly spatial light modulator (SLM, Holoeye Pluto NIRⅡ, resolution: 1920 × 1080) is used to modulate the wavefront of the laser by loading a predesigned CGH. A combination of a halfwave plate and polarization beam splitter is utilized to attenuate the laser power. A 4f configuration consisting of Lens 1 (f = 600 mm) and Lens 2 (f = 200 mm) is placed between the SLM and aplanatic objective (100×, NA: 1.4). It is a typical optical system involving both scalar diffraction and vector diffraction during light propagation.
Fig. 5 Fullpath calculation of a representative optical system.
a Sketch of the optical system. S: the plane on the panel of the SLM. P: the focal plane of Lens 1. E: the entrance pupil of the objective. F: the focal plane of the objective. (b) CGH displayed on the SLM for the generation of a 9 × 9 foci array. c The foci array on the focal plane of Lens 1 (Pplane). d Phase distribution and e intensity distribution on the entrance pupil of the objective (Eplane). f Simulated and g measured multifoci array generated on the focal plane of the objective (Fplane). h Enlarged intensity profile of a single focal spot in the array. The arrows indicate the polarization directions. i Longitudinal intensity profile and corresponding line plot of the foci array. j Simulated and k measured intensity distribution on the Fplane when the CGH for the generation of the pattern "E" is encoded on the SLM. l, m Enlarged intensity profiles of the pattern corresponding to (j) and (k) with the same sampling points as in (i)First, we simulate the multifoci optical system, which can be used for holographic tweezers, laser parallel processing and data recording. Figure 5b is the corresponding CGH for the generation of a 9 × 9 multifoci array. A linearly polarized femtosecond laser (800 nm, emitted from Chameleon VisionS, Coherent) is modulated by the CGH. After the FT of Lens 1, a multifoci array is generated (Fig. 5c). At the back of the objective, the phase and intensity distributions are retrieved as shown in Fig. 5d, e. The phase profile closely resembles the CGH, validating the accuracy of the Bluesteinenabled calculation of scalar diffraction. The light beam is slightly smaller than the size of the entrance pupil of the objective, ensuring that the phasemodulated beam can be fully transformed by the objective. In the focal plane of the objective, a diffractionlimited 9 × 9 multifoci array is generated (Fig. 5f). The fullpath calculation can be accomplished with high efficiency in < 4 s. The experimentally measured multifoci intensity (Fig. 5g) agrees well with the simulation. With the help of the highly flexible Bluestein method, a detailed analysis of a single focal spot is enabled, as shown in Fig. 5h, revealing that a Gaussian focus is generated with linear polarization. The light field in the longitudinal section can be readily computed, and the spatial uniformity can be investigated (Fig. 5i).
Another universal example is given in Fig. 5j–m, where a CGH is encoded on the SLM to generate a pattern as discussed in Fig. 3. By using the Bluestein fullpath calculation method, the light field of the desired pattern can be simulated in the focal plane of the objective (Fig. 5j), consistent with the experimental result (Fig. 5k). By taking advantage of the high flexibility of the Bluestein method, a magnified image of an arbitrary ROI can be calculated with arbitrary resolution and good accuracy in comparison with the experimental result, as shown in Fig. 5l, k. Another example of the usage of the Bluestein method for vector diffraction is shown in Supplementary Information Section 6 and Fig. S8. In brief, fullpath light tracing of the entire optical system can be accomplished by the Bluestein method with high efficiency and flexibility, unfolding its capacities in the realtime prediction and evaluation of optical performance in advanced microscopy, laser manipulation, and photolithography.