-
We set up the light field microscope on a commercial microscope body (Zeiss, Observer Z1) with a mercury lamp as the illumination source. Different objectives are used for different imaging tasks (see Table S1), all with the same $f$ = 165 mm tube lens. The MLA is placed at the image plane of the microscope; it has a 100 µm pitch and a 2.1 mm focal length to encode the 3D information. A relay system between the camera (Andor Zyla 4.2 Plus, Oxford Instruments) and the MLA conjugates the back-pupil plane of the MLA to the sensor plane. The sensor pixel size is 6.5 µm and the magnification of the relay lens system is set to 0.845.
-
The proposed DiLFM can be decomposed into two parts: a raw reconstruction obtained through a few RL iterations, followed by a fine reconstruction obtained through dictionary patching. In the following sections, we first formulate the RL iteration of LFM mathematically, then describe how the proposed dictionary patching removes the reconstruction artifacts and improves resolution and contrast.
-
A common light-field microscope consists of a wide-field microscope with an MLA placed at the native image plane (NIP), as shown in Fig. 1. We denote the sample-space coordinates as $\left({x_1, x_2, z} \right)$ and the sensor-space coordinates as $\left({s_1, s_2} \right)$. The point spread function (PSF) of LFM can be formulated as
$$ \begin{array}{*{20}{c}} {h\left( {x_1,x_2,z,s_1,s_2} \right) = \left| {\Im _{f_\mu }\left\{ {U\left( {x_1,x_2,z,s_1,s_2} \right){\Phi}\left( {s_1,s_2} \right)} \right\}} \right|^2} \end{array} $$ (1) Here, $U\left({x_1, x_2, z, s_1, s_2} \right)$ is the optical field in the NIP generated by a point source in $\left({x_1, x_2, z} \right)$, which is defined by36
$$ \begin{array}{l} U\left( {{x_1},{x_2},z,{s_1},{s_2}} \right)\;\;\\ \; = \frac{M}{{f_{obj}^2{\lambda ^2}}}\exp \left( { - \frac{{iu}}{{4{{\sin }^2}\left( {\alpha /2} \right)}}} \right)\smallint _0^\alpha P\left( \theta \right)\exp \left( { - \frac{{iu\;{{\sin }^2}\left( {\theta /2} \right)}}{{2{{\sin }^2}\left( {\alpha /2} \right)}}} \right){J_0}\left( {\frac{{\sin \left( \theta \right)}}{{\sin \left( \alpha \right)}}v} \right)\sin \left( \theta \right){\rm{d}}\theta \\ v \approx k\sqrt {{{\left( {{x_1} - {s_1}} \right)}^2} + {{\left( {{x_2} - {s_2}} \right)}^2}} \sin \left( \alpha \right)\\ u \approx 4kz{\sin ^2}\left( {\alpha /2} \right) \end{array} $$ (2) ${\Phi}\left({s_1, s_2} \right)$ is the modulation function of the MLA which has pitch size $d$ and focal length $f_\mu$
$$ \begin{array}{l}{\Phi}\left( {s_1,s_2} \right) = {\iint} {\mathrm{rect}}\left( {\frac{{t_1}}{d}} \right){\mathrm{rect}}\left( {\frac{{t_2}}{d}} \right)\exp \left( { - \frac{{ik}}{{2f_\mu }}\left( {t_1^2 + t_2^2} \right)} \right){\mathrm{comb}}\left( {\frac{{s_1 - t_1}}{d}} \right){\mathrm{comb}}\left( {\frac{{s_2 - t_2}}{d}} \right) dt_1dt_2\end{array} $$ (3) $\Im _{f_\mu }\left\{ \cdot \right\}$ is the Fresnel propagation operator, which takes a light field as input and propagates it a distance $f_\mu$ along the optical axis.
To reconstruct the 3D sample from the captured image, we bin the continuous sample and sensor spaces for voxelization and pixelization13. LFM can then be modeled as a linear system $H$ that maps the 3D sample space into the 2D sensor space
$$ \begin{array}{*{20}{c}} {\mathop {\sum}\limits_{x_1,x_2,z} H\left( {x_1,x_2,z,s_1,s_2} \right)X\left( {x_1,x_2,z} \right) = Y\left( {s_1,s_2} \right)} \end{array} $$ (4) Here $Y$ is the discrete sensor image and $X$ is the discrete 3D distribution of the sample. The weight matrix $H$ can be sampled from Eq. (1); it records how the photons emitted from voxel $(x_1, x_2, z)$ spread over and contribute to pixel $(s_1, s_2)$. Further, $H$ can be simplified via the periodicity introduced by the MLA, which implies
$$ \begin{array}{*{20}{c}} {H\left( {x_1,x_2,z,s_1,s_2} \right) = H\left( {x_1 + D,x_2 + D,z,s_1 + D,s_2 + D} \right)} \end{array} $$ (5) where $D$ is the microlens pitch in units of sensor pixels. We abbreviate Eq. (4) as ${\bf{H}}_{{\mathrm{for}}}\left(X \right) = Y$ to represent the forward projection in LFM. Conversely, if we trace each light ray that reaches the sensor back to the sample, we can rebuild the sample $X(x_1, x_2, z)$ via
$$ \mathop {\sum}\limits_{s_1,s_2}{\frac{{ {H\left( {x_1,x_2,z,s_1,s_2} \right)Y\left( {s_1,s_2} \right)} }}{{\mathop {\sum}\nolimits_{w_1,w_2} {H\left( {x_1,x_2,z,w_1,w_2} \right)} }} = X\left( {x_1,x_2,z} \right)} $$ (6) We abbreviate Eq. (6) as ${\bf{H}}_{{\mathrm{back}}}\left(Y \right) = X$ to represent the backward projection in LFM. The RL algorithm is widely used to refine $X$ from $Y$ and $H$. In each iteration, RL updates $\hat X^{\left(t \right)}$ from the previous estimate $\hat X^{\left({t - 1} \right)}$ via13
$$ \begin{array}{*{20}{c}} {\hat X^{\left( t \right)} \leftarrow \hat X^{\left( {t - 1} \right)} \odot {\bf{H}}_{{\mathrm{back}}}\left( {\frac{Y}{{{\bf{H}}_{{\mathrm{for}}}\left( {\hat X^{\left( {t - 1} \right)}} \right)}}} \right)} \end{array} $$ (7) where $\odot$ denotes element-wise multiplication. We refer to one application of Eq. (7) as one RL iteration. Reconstructing an LFM volume usually requires multiple RL iterations37; however, running too many iterations causes severe edge-ringing artifacts.
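As a concrete illustration, the update of Eq. (7) can be sketched in a few lines of NumPy. Here the LFM projectors ${\bf{H}}_{{\mathrm{for}}}$ and ${\bf{H}}_{{\mathrm{back}}}$ are stood in for by circular convolution with a symmetric 2D Gaussian PSF (which makes the operator self-adjoint); the real operators are the depth-dependent light-field projections of Eqs. (4) and (6), and all sizes below are illustrative.

```python
import numpy as np

def conv2_circ(img, psf_centered):
    """Circular 2-D convolution via FFT; the PSF is given centered and is
    shifted to the origin before the transform."""
    otf = np.fft.fft2(np.fft.ifftshift(psf_centered))
    return np.real(np.fft.ifft2(np.fft.fft2(img) * otf))

def rl_iteration(x_prev, y, psf):
    """One Richardson-Lucy update (Eq. 7): x <- x * H_back(y / H_for(x)).
    For a symmetric PSF the adjoint H_back equals the forward operator."""
    eps = 1e-12                                   # avoid division by zero
    ratio = y / (conv2_circ(x_prev, psf) + eps)   # y / H_for(x)
    return x_prev * conv2_circ(ratio, psf)        # elementwise * H_back(.)

# toy stand-in PSF: normalized 2-D Gaussian on a 32 x 32 grid
ax = np.arange(32) - 16
g = np.exp(-(ax / 2.0) ** 2)
psf = np.outer(g, g)
psf /= psf.sum()

truth = np.zeros((32, 32)); truth[16, 16] = 1.0   # point source
y = conv2_circ(truth, psf)                        # simulated capture

x = np.ones_like(y) / y.size                      # uniform initial guess
for _ in range(20):                               # 20 RL iterations
    x = rl_iteration(x, y, psf)
```

Note the multiplicative form keeps the estimate non-negative and (for a normalized PSF) conserves total flux, which is why RL is popular for photon-count data.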
-
In this section, we first show how to learn a dual dictionary pair $\left({{\bf{D}}_{l, z}, {\bf{D}}_{h, z}} \right)$ with the LFM model, where ${\bf{D}}_{l, z}$ is the collection of the most representative elements of raw LFM reconstructions and ${\bf{D}}_{h, z}$ is the collection of the corresponding high-fidelity, artifact-reduced elements. The elements here are local features of an image, e.g., corners or edges. We then show how to apply the learned dictionaries to achieve a high-fidelity, artifact-reduced reconstruction from the raw RL reconstruction.
We prepare a set of high-fidelity, high-SNR 3D volumes $\left\{ {I_j^{ref}} \right\}$ to learn the dictionary prior. Each reference volume $I_j^{ref}$ is numerically fed into the LFM forward projection of Eq. (4) to produce a simulated LFM capture $Y_j^{ref}$, and RL deconvolution (Eq. (7)) then yields a raw reconstructed volume $\hat I_j^{ref}$. In this way, we generate a set of high- and low-fidelity volume pairs $\left\{ {\left({I_j^{ref}, \hat I_j^{ref}} \right)} \right\}$, in which the resolution drop and artifacts of $\hat I_j^{ref}$ are generated through the real LFM model. Since the LFM artifacts depend on the depth $z$, as discussed in Sec. 2.2, we split the volume pairs $\left\{ {\left({I_j^{ref}, \hat I_j^{ref}} \right)} \right\}$ into per-depth pairs $\left\{ {\left({I_{j, z}^{ref}, \hat I_{j, z}^{ref}} \right)} \right\}$ and further generate a depth-wise patch dataset ${\cal{P}}_z$ for the following training via
$$ \begin{array}{*{20}{c}} {{\cal{P}}_z = \left\{ {L_k\left( {I_{j,z}^{ref} - \hat I_{j,z}^{ref}} \right),L_k\left( {F\hat I_{j,z}^{ref}} \right)} \right\} \buildrel \Delta \over = \left\{ {p_h^k,p_l^k} \right\}} \end{array} $$ (8) where $L_k\left(\cdot \right)$ is the linear image-to-patch mapping that extracts a $\sqrt n \times \sqrt n$-pixel patch from an image, and $k$ is the patch index. Patches are randomly selected from the image with overlap. Since some biological samples are quite sparse, we select only patches with sufficient signal intensity to avoid null patches. $F$ is a feature-extraction operator that provides a perceptually meaningful representation of a patch38; common choices of $F$ are the first- and second-order gradients of the patch. The reason to use $I_{j, z}^{ref} - \hat I_{j, z}^{ref}$ is to let the subsequent learning process focus on high-frequency information30. We also apply Principal Component Analysis (PCA) to $\left\{ {p_l^k} \right\}$ for dimensionality reduction, which avoids superfluous computation30. After these preparations, the low-fidelity dictionary ${\bf{D}}_{l, z}$, the collection of the most representative elements at the $z$th depth of LFM-reconstructed biological tissue, can be learned via
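The patch-pair generation of Eq. (8) can be sketched as follows. The slice pair, patch size, and stride are all hypothetical; first- and second-order gradients serve as the feature operator $F$, as the text suggests, and the PCA step is omitted for brevity.

```python
import numpy as np

def extract_patches(img, n_sqrt, stride):
    """L_k of Eq. (8): slide a sqrt(n) x sqrt(n) window over the image
    with overlap and return the flattened patches as rows."""
    rows = []
    H, W = img.shape
    for r in range(0, H - n_sqrt + 1, stride):
        for c in range(0, W - n_sqrt + 1, stride):
            rows.append(img[r:r + n_sqrt, c:c + n_sqrt].ravel())
    return np.stack(rows)

def gradient_features(img):
    """The feature operator F: first- and second-order gradients of the
    image (one common choice, as noted in the text)."""
    g1y, g1x = np.gradient(img)
    g2y = np.gradient(g1y, axis=0)
    g2x = np.gradient(g1x, axis=1)
    return [g1x, g1y, g2x, g2y]

# hypothetical reference slice and its simulated raw RL reconstruction
rng = np.random.default_rng(0)
ref = rng.random((64, 64))
raw = ref + 0.1 * rng.standard_normal((64, 64))   # stand-in for RL output

p_h = extract_patches(ref - raw, 8, 4)            # high-frequency targets
p_l = np.concatenate(
    [extract_patches(f, 8, 4) for f in gradient_features(raw)], axis=1)

# keep only patches with enough signal to avoid null patches
mask = p_h.std(axis=1) > 1e-3
p_h, p_l = p_h[mask], p_l[mask]
```

Stacking the four gradient channels side by side gives each low-fidelity patch a feature vector four times longer than the raw patch, which is what PCA would subsequently compress.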
$$ \begin{array}{l} {{\bf{D}}_{l,z},\left\{{\beta ^k} \right\} = {\mathrm{argmin}}\displaystyle\mathop {\sum}\limits_k \Vert p_l^k - {\bf{D}}_{l,z}\beta ^{k}\Vert_{2}^{2},{s.t.}\Vert\beta ^{k}\Vert_0 \,\le\, \kappa ,\forall k} \end{array} $$ (9) where $\Vert\cdot \Vert_2$ is the $\ell _2$ norm, which measures data fidelity, $\Vert\cdot \Vert_0$ is the $\ell _0$ "norm", which measures sparsity, $\beta ^k$ is the sparse-representation coefficient vector for the low-fidelity patch $p_l^k$, and $\kappa$ is the maximum sparsity tolerance. Equation (9) can be solved efficiently by the well-known K-SVD algorithm39. The corresponding high-fidelity dictionary ${\bf{D}}_{h, z}$ is generated by solving the following quadratic program (QP)
$$ \begin{array}{*{20}{c}} {{\bf{D}}_{h,z} = {\mathrm{argmin}}\mathop {\sum}\limits_j \left\Vert{I_{j,z}^{ref} - \hat I_{j,z}^{ref}} - \left[ {\mathop {\sum}\limits_k {L_k^TL_k} } \right]^{ - 1}\left[ {\mathop {\sum}\limits_k {L_k^T{\bf{D}}_{h,z}\beta ^k} } \right]\right\Vert_2^2} \end{array} $$ (10) Note that the dictionary pair $\left({{\bf{D}}_{l, z}, {\bf{D}}_{h, z}} \right)$ is specific to each $z$ since the degradation of imaging quality is depth-dependent. Here the high- and low-fidelity dictionaries share the same sparse representations $\left\{ {\beta ^k} \right\}$, based on the assumption that artifact contamination and blurring in LFM reconstructions are near-linear (Note S1). The NIP artifact is covered by the dictionary learned at the NIP layer. The defocus artifact is also covered because the whole reconstructed volume is learned, rather than only single-image reconstructions as in the traditional dictionary-learning method38. The high-fidelity, artifact-free reference volumes $\left\{ {I_j^{ref}} \right\}$ are collected from the Broad Bioimage Benchmark Collection (Nos. 021, 027, 032, 033)40 and the SOCR 3D Cell Morphometry Project data41. The flowchart of the LFM dictionary-learning process is shown in Fig. S1a.
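To make the learning step concrete, the sketch below alternates OMP sparse coding with a MOD-style least-squares dictionary update, a simpler stand-in for the K-SVD algorithm the text cites, and fits the high-fidelity dictionary by a patch-domain least squares rather than the full image-domain QP of Eq. (10). All names and sizes are illustrative.

```python
import numpy as np

def omp(D, x, k_max):
    """Greedy orthogonal matching pursuit: approximate x with at most
    k_max atoms of D (used here for the sparse codes beta^k of Eq. 9)."""
    resid, idx, coef = x.copy(), [], np.zeros(0)
    for _ in range(k_max):
        j = int(np.argmax(np.abs(D.T @ resid)))   # best-matching atom
        if j in idx:
            break
        idx.append(j)
        coef, *_ = np.linalg.lstsq(D[:, idx], x, rcond=None)
        resid = x - D[:, idx] @ coef
    beta = np.zeros(D.shape[1])
    beta[idx] = coef
    return beta

def learn_pair(P_l, P_h, n_atoms=32, sparsity=3, n_iter=10, seed=0):
    """Toy stand-in for Eqs. (9)-(10): alternate OMP coding with a
    MOD least-squares dictionary update (instead of K-SVD), then fit
    D_h by least squares so both dictionaries share the same codes.
    P_l, P_h hold one patch per column."""
    rng = np.random.default_rng(seed)
    D_l = rng.standard_normal((P_l.shape[0], n_atoms))
    D_l /= np.linalg.norm(D_l, axis=0)
    for _ in range(n_iter):
        B = np.stack([omp(D_l, p, sparsity) for p in P_l.T], axis=1)
        D_l = P_l @ np.linalg.pinv(B)             # MOD update
        D_l /= np.linalg.norm(D_l, axis=0) + 1e-12
    B = np.stack([omp(D_l, p, sparsity) for p in P_l.T], axis=1)
    D_h = P_h @ np.linalg.pinv(B)                 # shared-code fit of D_h
    return D_l, D_h

# synthetic demo: 200 random 16-dim low/high patch pairs
rng = np.random.default_rng(1)
P_l = rng.standard_normal((16, 200))
P_h = P_l + 0.05 * rng.standard_normal((16, 200))
D_l, D_h = learn_pair(P_l, P_h)
```

The shared-code fit is what enforces the near-linearity assumption above: a code found in ${\bf{D}}_{l,z}$ is reused verbatim to synthesize from ${\bf{D}}_{h,z}$.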
To obtain a high-fidelity, artifact-reduced volume $\tilde X^{(t)}$ from the raw RL reconstruction $\hat X^{(t)}$, we run sparse representation on each z-depth slice of $\hat X^{(t)}$ with the learned depth-specific dictionary pair $\left({{\bf{D}}_{l, z}, {\bf{D}}_{h, z}} \right)$. First, we estimate the sparse representation of each local patch of $\hat X_z^{(t)}$: we extract a $\sqrt n \times \sqrt n$-pixel local patch from $\hat X_z^{(t)}$ with the same mapping $L_k\left(\cdot \right)$ as above, then search for a sparse coding vector $\alpha _z^k$ such that $L_k\hat X_z^{(t)}$ can be sparsely represented as a weighted sum of a few elements of ${\bf{D}}_{l, z}$
$$ \begin{array}{*{20}{c}} {\min\!\Vert \alpha_{z}^k\Vert_0,\qquad {s.t.}\Big\Vert FL_k\hat X_z^{(t)} - {\bf{D}}_{l,z}\alpha_{z}^k \Big\Vert_2\, \le\, \epsilon} \end{array} $$ (11) where ${\it{\epsilon }}$ is the error tolerance. Eq. (11) can be solved via the orthogonal matching pursuit (OMP) algorithm42. Second, we use the found sparse coefficients $\alpha _z^k$ to recover the high-fidelity, artifact-reduced patch $p_{h, z}^k$ by $p_{h, z}^k = {\bf{D}}_{h, z}\alpha _z^k$, then accumulate the $p_{h, z}^k$ into a high-fidelity image $\tilde X_z^{(t)}$ by solving the following minimization problem
$$ \begin{array}{*{20}{c}} {\tilde X_z^{(t)} = {\mathrm{argmin}}\displaystyle\mathop {\sum}\limits_k \Big\Vert L_k \left({\tilde X_z^{(t)} - \hat X_z^{\left(t \right)}} \right) - p_{h,z}^{k} \Big\Vert_2^2} \end{array} $$ (12) After stacking the $\tilde X_z^{(t)}$ of all depths into the whole volume $\tilde X^{(t)}$, a high-fidelity, artifact-reduced volume is recovered from the original RL reconstruction $\hat X^{(t)}$. The flowchart of the reconstruction process is shown in Fig. S1b. To choose the proper number of RL iterations before dictionary patching, one can visually inspect the RL output: if edge ringing appears, the number of RL iterations should be reduced. For samples with a uniform intensity distribution, one RL iteration is enough. The RL iteration numbers for all experiments in the manuscript can be found in Table S2.
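The patching step can be sketched as follows. This toy version reduces the OMP of Eq. (11) to a single-atom code and skips the feature operator $F$, so it only illustrates the data flow of Eqs. (11) and (12), not the exact method; the dictionaries and sizes are hypothetical.

```python
import numpy as np

def dictionary_patch(x_raw, D_l, D_h, n_sqrt=8, stride=4):
    """Toy patching step (Eqs. 11-12): code each raw patch against D_l
    with a single-atom matching pursuit (the paper uses OMP on F(patch)),
    synthesize the high-fidelity residual patch from D_h with the same
    code, and overlap-average the patches -- the least-squares solution
    of Eq. (12) for uniformly overlapping L_k -- before adding the
    residual back onto the raw reconstruction."""
    acc = np.zeros_like(x_raw)
    weight = np.zeros_like(x_raw)
    H, W = x_raw.shape
    for r in range(0, H - n_sqrt + 1, stride):
        for c in range(0, W - n_sqrt + 1, stride):
            p = x_raw[r:r + n_sqrt, c:c + n_sqrt].ravel()
            corr = D_l.T @ p                      # match against atoms
            j = int(np.argmax(np.abs(corr)))      # best single atom
            alpha = corr[j] / (D_l[:, j] @ D_l[:, j] + 1e-12)
            p_high = (alpha * D_h[:, j]).reshape(n_sqrt, n_sqrt)
            acc[r:r + n_sqrt, c:c + n_sqrt] += p_high
            weight[r:r + n_sqrt, c:c + n_sqrt] += 1.0
    return x_raw + acc / np.maximum(weight, 1.0)  # add residual (Eq. 12)

# hypothetical dictionaries for 8x8 patches (64-dim atoms)
rng = np.random.default_rng(0)
x_raw = rng.random((32, 32))
D_l = rng.standard_normal((64, 48))
D_h = 0.1 * rng.standard_normal((64, 48))
x_fixed = dictionary_patch(x_raw, D_l, D_h)
```

Because the dictionaries encode the residual $I^{ref} - \hat I^{ref}$, a zero high-fidelity dictionary leaves the raw reconstruction unchanged, which is a convenient sanity check.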
-
We train the dictionary with mixed Poisson and Gaussian noise contamination. In fluorescence imaging, the photon (shot) noise and dark noise follow a Poisson distribution, while the readout noise follows a Gaussian distribution; hence we choose mixed Poisson-Gaussian noise to mimic the real situation. The observed image under the microscope can thus be modeled as43
$$ \begin{array}{*{20}{c}} {Y = \alpha {\rm{P}}\left( {\frac{{{ \bf{H} }_{{\mathrm{for}}}\left( X \right)}}{\alpha }} \right) + {\mathbb{N}}\left( {0,\sigma ^2} \right)} \end{array} $$ (13) where $Y$ is the observed image, ${\bf{H}}_{{\mathrm{for}}}$ is the forward projector of LFM, $X$ is the noise-free sample, $\alpha$ is the scaling factor that controls the strength of the Poisson noise, ${\mathrm{P}}(\cdot)$ is a realization of Poisson noise, and ${\mathbb{N}}\left({0, \sigma ^2} \right)$ represents Gaussian noise with zero mean and variance $\sigma ^2$. We fix $\sigma ^2$ to ~200 for 16-bit sCMOS images and vary $\alpha$ to generate captures with different noise levels. The high-fidelity, artifact-free reference volumes $\left\{ {I_j^{ref}} \right\}$ are first propagated to the sensor plane, then corrupted with Poisson and Gaussian noise using the MATLAB function imnoise to form $\left\{ {\hat I_j^{ref}} \right\}$. A noise-aware dictionary is then learned through Eqs. (9) and (10). $\left\{ {\hat I_j^{ref}} \right\}$ contains multiple noise levels to accommodate different SNR conditions. The trained low- and high-fidelity dictionaries have different element numbers and patch sizes to accommodate different modalities; see Table S2.
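A minimal simulation of the noise model of Eq. (13), written here in NumPy rather than with MATLAB's imnoise; the flat test image and parameter values are illustrative.

```python
import numpy as np

def add_mixed_noise(clean, alpha, sigma2=200.0, rng=None):
    """Mixed Poisson-Gaussian noise of Eq. (13):
    Y = alpha * P(H_for(X) / alpha) + N(0, sigma^2).
    `clean` stands for the noise-free sensor image H_for(X); a larger
    alpha gives stronger Poisson (shot) noise."""
    rng = np.random.default_rng() if rng is None else rng
    shot = alpha * rng.poisson(clean / alpha)             # scaled shot noise
    read = rng.normal(0.0, np.sqrt(sigma2), clean.shape)  # readout noise
    return shot + read

# flat synthetic sensor image at a 16-bit-scale intensity of 1000 counts
clean = np.full((64, 64), 1000.0)
noisy = add_mixed_noise(clean, alpha=4.0, rng=np.random.default_rng(0))
```

For a flat image at intensity $I$ the resulting per-pixel variance is approximately $\alpha I + \sigma^2$, which is how varying $\alpha$ sweeps the SNR of the training captures.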
-
The Drosophila embryos used in this study (Fig. 2) expressed histone tagged with EGFP (w; His2Av::eGFP; Bloomington stock #23560). The embryos were collected by placing adult flies on a grape-juice agar plate for 45 min to 1 h. After incubation at 25 ℃ for 1 h, the embryos were attached to a glass slide with double-sided tape. We used forceps to carefully roll each embryo on the tape until it was dechorionated. The dechorionated embryos were embedded in 2% low-melting-temperature agarose in a glass-bottom dish (35 mm dish with 20 mm bottom well, Cellvis). We put the dish on the microscope stage, scanned the embryo along the z-axis 4 times with a 30 µm stride, and concatenated the 4 reconstructed stacks to form the volume.
-
The Drosophila adult brain (w1118) used in this study (Fig. S5) was dissected 4–5 days after eclosion in phosphate-buffered saline (PBS) and fixed with 4% paraformaldehyde in PBST (PBS with 0.3% Triton X-100) for 30 min. After washing in PBST, the brain was blocked in 5% normal mouse serum in PBST for 2 h at room temperature (RT) and then immunostained with commercial antibodies. The brain was incubated in primary antibody (mouse anti-nc82, 1:20, Hybridoma Bank) and secondary antibody (goat anti-mouse Alexa-488, 1:200, Invitrogen) for 48–72 h at 4 ℃, with a 2 h wash at 4 ℃ between the primary and secondary antibody incubations. After that, the brain was washed 3–4 times in PBST and cut into ~60 µm-thick slices. Each slice was mounted and observed by the LFM in epifluorescence mode. No concatenation was made and no further deconvolution was applied.
-
Zebrafish from the transgenic line Tg(gata1:DsRed) were used in this study for blood-cell imaging (Fig. 4, Fig. S6). For two-color recordings (Fig. 4), Tg(gata1:DsRed) zebrafish were crossed with zebrafish from the transgenic line Tg(flk:EGFP). The embryos were raised at 28.5 ℃ until 4 dpf. Larval zebrafish were paralyzed by short immersion in 1 mg ml−1 α-bungarotoxin solution (Invitrogen). After paralysis, the larvae were embedded in 1% low-melting-temperature agarose in a glass-bottom dish (35 mm dish with 20 mm bottom well, Cellvis). We maintained the specimens at room temperature and imaged the zebrafish larvae at 100 Hz.
-
Zebrafish from the transgenic line Tg(HUC:GCaMP6s), expressing the calcium indicator GCaMP6s, were raised at 28.5 ℃ until 4 dpf for short-term functional imaging (Fig. 1b and Fig. 5). Larval zebrafish were paralyzed by short immersion in 1 mg ml−1 α-bungarotoxin solution (Invitrogen). After paralysis, the larvae were embedded in 1% low-melting-temperature agarose in a glass-bottom dish (35 mm dish with 20 mm bottom well, Cellvis). For imaging, the dorsal side of the head of the larval zebrafish faced the objective. We maintained the specimens at room temperature and imaged the zebrafish larvae at 1 Hz. Let the volume reconstructed by DiLFM be $\tilde X(x, y, z, t)$, where $\left({x, y, z} \right)$ is the 3D spatial coordinate of a voxel and $t$ labels the time; the temporally summarized volume was then calculated through the following procedure. In the first step, we calculate the rank-1 background component of $\tilde X(x, y, z, t)$ via
$$ \begin{array}{*{20}{c}} {\left[ {b,f} \right] = \arg \mathop {\min }\limits_{b,f} \mathop \sum \limits_t \left\| {\tilde X(x,y,z,t) - b(x,y,z) \cdot f\left( t \right)} \right\|_2^2} \end{array} $$ (14) where $b(x, y, z)$ is the spatial background and $f\left(t \right)$ is the temporal background. $b$ and $f$ can be calculated through standard non-negative matrix factorization techniques44. The background-subtracted volume is then calculated as $\tilde X_1(x, y, z, t) = \tilde X(x, y, z, t) - b(x, y, z) \cdot f\left(t \right)$. Then, we calculate the standard deviation of the background-subtracted volumes across the time domain via
$$ \begin{array}{*{20}{c}} {\tilde X_2\left( {x,y,z} \right) = \sqrt {\frac{{\mathop {\sum}\nolimits_t {\left( {\tilde X_1\left( {x,y,z,t} \right) - \frac{{\mathop {\sum}\nolimits_s {\tilde X_1\left( {x,y,z,s} \right)} }}{T}} \right)} ^2}}{T}} } \end{array} $$ (15) where $T$ is the total number of frames. In Fig. 5a, we plot the maximum-intensity projections of $\tilde X_2\left({x, y, z} \right)$ along the $x$-, $y$-, and $z$-axes to show the distribution of fired neurons in zebrafish larvae. All captured frames are used for the above calculation.
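The temporal-summary procedure of Eqs. (14) and (15) can be sketched as follows. The rank-1 fit uses plain alternating least squares (which stays non-negative for non-negative data) as a lightweight stand-in for the NMF routine cited in the text, and the demo data are synthetic.

```python
import numpy as np

def temporal_summary(X, n_iter=30, rng=None):
    """Rank-1 background removal (Eq. 14) followed by the temporal
    standard deviation (Eq. 15). X has shape (n_voxels, T): flatten the
    (x, y, z) axes of the volume sequence before calling."""
    rng = np.random.default_rng() if rng is None else rng
    b = rng.random(X.shape[0]) + 1e-6     # spatial background b(x,y,z)
    f = rng.random(X.shape[1]) + 1e-6     # temporal background f(t)
    for _ in range(n_iter):               # alternating least squares
        f = (b @ X) / (b @ b + 1e-12)
        b = (X @ f) / (f @ f + 1e-12)
    X1 = X - np.outer(b, f)               # background-subtracted movie
    return X1.std(axis=1)                 # per-voxel std over time

# synthetic demo: a rank-1 background plus one transiently active voxel
rng = np.random.default_rng(0)
X = np.outer(rng.random(100) + 0.5, rng.random(50) + 0.5)
X[7, 20:25] += 5.0                        # a "firing neuron" at voxel 7
s = temporal_summary(X, rng=np.random.default_rng(1))
```

The standard-deviation volume is large only where the signal departs from the rank-1 background, so transiently firing voxels stand out while static background is suppressed.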