FPGA-accelerated mode decomposition for multimode fiber-based communication

Qian Zhang; Yuedi Zhang; Juergen Czarske

doi:10.37188/lam.2025.031

Mode division multiplexing (MDM) using multimode fibers (MMFs) is key to meeting the demand for higher data rates and advancing internet technologies. However, optical transmission within MMFs presents challenges, particularly mode crosstalk, which complicates the use of MMFs to increase system capacity. Quantitatively analyzing the output of MMFs is essential not only for telecommunications but also for applications such as fiber sensors, fiber lasers, and endoscopy. With the success of deep neural networks (DNNs), AI-driven mode decomposition (MD) has emerged as a leading solution for MMFs. However, almost all implementations rely on graphics processing units (GPUs), which impose high computational and system-integration demands. Additionally, achieving the low latency critical for real-time data transfer in closed-loop systems remains a challenge. In this work, we propose using field-programmable gate arrays (FPGAs) to perform neural network inference for MD. This marks the first use of FPGAs for this application, which is important because the latency of closed-loop control could be significantly lower than with GPUs. A convolutional neural network (CNN) is trained on synthetic data to predict mode weights (amplitude and phase) from intensity images. After the model's parameters are quantized, the CNN is executed on an FPGA using fixed-point arithmetic. The results demonstrate that the FPGA-based neural network can accurately decompose up to six modes. The FPGA's customizability and high efficiency provide substantial advantages, including low power consumption (2.4 W) and rapid inference (over 100 Hz), offering practical solutions for real-time applications. The proposed FPGA-based MD solution, coupled with closed-loop control, shows promise for applications in fiber characterization, communications, and beyond.


Materials and methods

FPGA information and development environment

FPGA stands for field-programmable gate array. FPGAs offer three main advantages: flexibility, low latency, and high energy efficiency. In this work, the Xilinx Zynq-7020 system-on-chip (SoC) FPGA is used. This device integrates the programmable logic (PL) of a traditional FPGA with a processing system (PS), including a dual-core ARM Cortex-A9 processor, on a single chip. Vivado 2019.1 is used as the FPGA development tool, and Vivado HLS is used for IP core generation.

DNN architecture

This investigation aims to train a neural network that can run on an FPGA system to accelerate the decomposition process. In this work, CNNs are used to classify the eigenmodes and to decompose the distal speckle images. Convolutional layers, the core component of a CNN, are widely used for image data: they extract relevant features through convolution operations and pass them to the next layer³⁰. Each convolution involves numerous multiplications and additions, which can be accelerated efficiently by the parallel processing capabilities of an FPGA.
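To make the operation count concrete, the sketch below spells out a direct 3 × 3 convolution in plain Python. It is illustrative only and is not the HLS kernel used on the Zynq-7020: it assumes a single input channel, zero ("same") padding, and stride 1, and the function names `conv2d_same` and `conv_layer_macs` are hypothetical.

```python
def conv2d_same(image, kernel):
    """Naive single-channel 2-D convolution with zero padding ('same' output size).

    As in CNN frameworks, the kernel is not flipped (strictly a cross-correlation).
    Returns the output feature map and the number of multiply-accumulate (MAC) steps.
    Assumptions: one input channel, stride 1, odd square kernel.
    """
    h, w = len(image), len(image[0])
    k = len(kernel)
    pad = k // 2
    out = [[0.0] * w for _ in range(h)]
    macs = 0
    for i in range(h):          # every output pixel is independent of the others,
        for j in range(w):      # so these two loops are candidates for parallel hardware
            acc = 0.0
            for di in range(k):
                for dj in range(k):
                    y, x = i + di - pad, j + dj - pad
                    pixel = image[y][x] if 0 <= y < h and 0 <= x < w else 0.0
                    acc += pixel * kernel[di][dj]  # one multiply and one add
                    macs += 1                      # zero-padded taps are counted too
            out[i][j] = acc
    return out, macs


def conv_layer_macs(height, width, c_in, c_out, k=3):
    """MACs of a stride-1, 'same'-padded conv layer: one k*k*c_in dot product per output value."""
    return height * width * c_out * k * k * c_in
```

Under the same padding assumption, `conv_layer_macs(16, 16, 32, 32)` gives 2,359,296 MACs for a single Conv3-32 layer acting on a 16 × 16 × 32 feature map, which illustrates why spreading these independent dot products across DSP slices pays off.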

Fig. 1 visualizes the neural network structure for mode classification. The network consists of 3 convolutional layers with downsampling for encoding and 2 fully connected layers for classification. All convolution kernels are 3 × 3, and all max pooling kernels are 2 × 2. The output size of each feature map is given below the corresponding box. For all training cases from 3 to 40 modes, the learning rate is set to 0.0001, and the cross-entropy loss function and the Adam optimizer are used. For classification of 3 to 19 modes, the network is trained for 15 epochs with a batch size of 64. For 19 to 40 modes, owing to the increased complexity, the network is trained for 40 epochs with a batch size of 128.

For MD, the task is more complex because there are infinitely many mode combinations. As the number of modes increases, the number of CNN layers must also increase to accommodate the training needs of more modes. In addition, to better capture and distinguish the details of different modes, higher-resolution images are used, which carry more mode information. These adjustments improve the network's ability to recognize different patterns, thereby improving the accuracy and reliability of the model. Table 5 shows the details of the configuration. For 3-mode decomposition, 40,000 data pairs are generated and divided into training, validation, and test datasets. The model is trained with a batch size of 256 for 1500 epochs at a learning rate of 0.0006. For 5 and 6 modes, these parameters are adjusted to the increased complexity: 100,000 data pairs are generated, and the model is trained with a batch size of 512 for 1000 epochs at a learning rate of 0.001. In all cases, the validation and test datasets each contain 1000 data pairs, and all remaining data are used for training. The MSE loss function and the Adam optimizer are used, as they are suitable for the regression task of MD⁴⁷.
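The split described above can be sketched as follows. The text does not say how samples were assigned, so the random shuffle, the fixed seed, and the helper names `split_dataset` and `steps_per_epoch` are assumptions; the sketch also reads "1000 pairs" as 1000 each for validation and test.

```python
import math
import random


def split_dataset(pairs, n_val=1000, n_test=1000, seed=0):
    """Shuffle (image, mode-weight) pairs and split them.

    Assumption: a random shuffle before splitting. n_val and n_test samples are
    held out; everything that remains is used for training.
    """
    pairs = list(pairs)
    random.Random(seed).shuffle(pairs)  # fixed seed only for reproducibility of the sketch
    val = pairs[:n_val]
    test = pairs[n_val:n_val + n_test]
    train = pairs[n_val + n_test:]
    return train, val, test


def steps_per_epoch(n_train, batch_size):
    """Number of mini-batches per epoch (the last batch may be partial)."""
    return math.ceil(n_train / batch_size)


if __name__ == "__main__":
    # (total pairs, batch size) for the 3-mode and the 5/6-mode cases
    for n_total, batch in [(40_000, 256), (100_000, 512)]:
        train, val, test = split_dataset(range(n_total))
        print(n_total, len(train), len(val), len(test), steps_per_epoch(len(train), batch))
```

Under this reading, the 3-mode case trains on 38,000 pairs (149 mini-batches per epoch at batch size 256) and the 5- and 6-mode cases on 98,000 pairs (192 mini-batches per epoch at batch size 512).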

CNN Model Configuration
3 Modes	5 Modes	6 Modes
Input (16 × 16 Image)	Input (32 × 32 Image)	Input (32 × 32 Image)
Conv3-32	Conv3-32	Conv3-32
Conv3-32	Conv3-32	Conv3-32
Maxpooling (2 × 2)	Maxpooling (2 × 2)	Maxpooling (2 × 2)
Conv3-32	Conv3-64	Conv3-64
	Conv3-64	Conv3-64
Maxpooling (2 × 2)	Maxpooling (2 × 2)	Maxpooling (2 × 2)
	Conv3-128	Conv3-128
	Conv3-128	Conv3-128
	Maxpooling (2 × 2)	Maxpooling (2 × 2)
		Conv3-256
		Conv3-256
		Maxpooling (2 × 2)
	FC-2048	FC-2048
FC-512	FC-512	FC-512
FC-5	FC-9	FC-11

Table 5. CNN model configuration for MD of different modes.
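As a sanity check on Table 5, the sketch below propagates the feature-map size through each configuration and counts trainable parameters. Table 5 does not specify padding, stride, or input channels, so stride-1 "same" convolutions, stride-2 pooling, one bias per unit, and a single-channel intensity input are all assumptions, as is the name `summarize`. The output widths FC-5, FC-9, and FC-11 equal 2N − 1 for N = 3, 5, 6, which is consistent with predicting N amplitudes and N − 1 relative phases; that interpretation is ours and is not stated in the text.

```python
# Table 5 encoded as (input size, conv/pool sequence, fully connected widths).
# "C<n>" = 3x3 convolution with n output channels, "M" = 2x2 max pooling.
TABLE5 = {
    3: (16, ["C32", "C32", "M", "C32", "M"], [512, 5]),
    5: (32, ["C32", "C32", "M", "C64", "C64", "M", "C128", "C128", "M"], [2048, 512, 9]),
    6: (32, ["C32", "C32", "M", "C64", "C64", "M", "C128", "C128", "M",
             "C256", "C256", "M"], [2048, 512, 11]),
}


def summarize(n_modes, in_channels=1):
    """Return (final map size, channels, flattened length, trainable parameters).

    Assumes stride-1 'same' convolutions (spatial size unchanged), 2x2 pooling with
    stride 2, one bias per output unit, and a single-channel intensity input.
    """
    size, layers, fc_widths = TABLE5[n_modes]
    channels, params = in_channels, 0
    for layer in layers:
        if layer == "M":
            size //= 2
        else:
            out_ch = int(layer[1:])
            params += 3 * 3 * channels * out_ch + out_ch  # kernel weights + biases
            channels = out_ch
    flat = size * size * channels
    features = flat
    for width in fc_widths:
        params += features * width + width  # dense weights + biases
        features = width
    return size, channels, flat, params


for n in TABLE5:
    out_width = TABLE5[n][2][-1]
    # Our reading (not stated in the text): N amplitudes + (N - 1) relative phases.
    assert out_width == 2 * n - 1
    print(n, summarize(n))
```

Under these assumptions, the 3-, 5-, and 6-mode networks have about 0.28 M, 5.5 M, and 4.3 M parameters. The 6-mode network comes out smaller than the 5-mode one because its extra pooling stage shrinks the flattened vector feeding FC-2048 from 2048 to 1024 values.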

Figures of merit of CNN performance

Classification accuracy is used to evaluate the performance of the CNNs in mode classification and is calculated as:

$$ \text{Accuracy}=\frac{N_{\text{correct predictions}}}{N_{\text{total predictions}}} $$ (4)

To quantify the performance of the MD algorithm, the reconstructed intensity distributions are compared with the original (target) distributions using the correlation coefficient Γ, defined as:

$$ \Gamma=\frac{\sum_{ROI}\left(I_T-\overline{I_T}\right)\left(I_R-\overline{I_R}\right)}{\sqrt{\left(\sum_{ROI}\left(I_T-\overline{I_T}\right)^{2}\right)\left(\sum_{ROI}\left(I_R-\overline{I_R}\right)^{2}\right)}} $$ (5)

where $ I_T $ is the target intensity distribution, $ I_R $ is the reconstructed intensity distribution, and $ ROI $ denotes the region of interest. $ \overline{I_T} $ and $ \overline{I_R} $ are the mean values of the respective intensity distributions. The value of $ \Gamma $ indicates how similar the reconstructed intensity distribution is to the target. The errors between the predicted and target mode weights are used as further metrics, defined as $ \Delta\rho=\left|\sqrt{\rho_p^{2}}-\sqrt{\rho_t^{2}}\right| $ for the amplitude and $ \Delta\phi=\left|\phi_p-\phi_t\right|/(2\pi) $ for the phase. Here, the subscript “$ p $” indicates a predicted value and “$ t $” a target value. Additionally, the standard deviation σ of Γ is used to evaluate the robustness of the DNN-based MD system and is determined as $ \sigma=\sqrt{\frac{1}{K-1}\sum_{k=1}^{K}\left(\Gamma_k-\overline{\Gamma}\right)^{2}} $, where K is the number of evaluated samples and $ \overline{\Gamma} $ is their mean correlation coefficient.
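The figures of merit above map onto a few lines of plain Python. This is a minimal sketch, assuming the ROI pixels are already flattened into sequences and phases are in radians; the function names are hypothetical, and the standard deviation uses the usual squared deviations with 1/(K − 1) normalization.

```python
import math
from statistics import mean


def correlation(i_t, i_r):
    """Eq. (5): correlation coefficient Gamma between target and reconstructed
    intensities, given as flat sequences of the pixels inside the ROI."""
    mt, mr = mean(i_t), mean(i_r)
    num = sum((a - mt) * (b - mr) for a, b in zip(i_t, i_r))
    den = math.sqrt(sum((a - mt) ** 2 for a in i_t) * sum((b - mr) ** 2 for b in i_r))
    return num / den


def amplitude_error(rho_p, rho_t):
    """Delta rho = |sqrt(rho_p^2) - sqrt(rho_t^2)|, i.e. the difference of |rho|."""
    return abs(math.sqrt(rho_p ** 2) - math.sqrt(rho_t ** 2))


def phase_error(phi_p, phi_t):
    """Delta phi = |phi_p - phi_t| / (2 pi), phases in radians."""
    return abs(phi_p - phi_t) / (2 * math.pi)


def gamma_std(gammas):
    """Sample standard deviation of Gamma over K evaluated samples (1/(K-1) normalization)."""
    g_bar = mean(gammas)
    return math.sqrt(sum((g - g_bar) ** 2 for g in gammas) / (len(gammas) - 1))
```

Because Γ is a Pearson correlation, it is insensitive to an overall scaling or offset of the reconstructed intensity. For example, `correlation([1, 2, 3], [2, 4, 6])` evaluates to 1.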

Quantization of CNN

Quantization is the process of converting values of higher precision, such as floating-point numbers, into values of lower precision, such as integers. For FPGAs, quantization reduces model weights, biases, and activations from floating-point numbers (e.g., 32-bit floats) to lower bit widths (e.g., 16-bit integers). This reduction significantly decreases storage requirements, increases computational speed, and reduces power consumption. By minimizing data bit widths, larger models or more layers can be run on limited hardware resources, improving hardware utilization⁴⁸. In general, there are two main classes of quantization: static and dynamic. Static quantization quantizes the weights and activations of the model before inference. Dynamic quantization adjusts the quantization parameters during inference according to the data. This increases the complexity and uncertainty of the computation and is therefore unsuitable for FPGA systems that require deterministic behavior. Hence, this work adopts static quantization, specifically fixed-point quantization.

Fixed-point quantization converts floating-point numbers to integers by multiplying them by a fixed scaling factor and then converting the result to an integer. Fixed-point numbers can represent integers or fractional numbers, depending on the position of the radix point, also known as the Q-value. The Q-value determines the precision and range of the representation. Floating-point numbers have a variable radix point position, which gives them a wide dynamic range. In contrast, fixed-point numbers have a fixed radix point position; for a given bit width they cover a narrower range with a uniform resolution of $ 2^{-Q} $, but they can be processed with simple integer arithmetic. Using a fixed Q-value, the weights and biases of each layer are quantized into fixed-point numbers according to the following equation:

$$ x_q=\operatorname{int}\left(x_f\cdot 2^{Q}\right) $$ (6)

where $ x_q $ is the fixed-point number, $ x_f $ is the floating-point number, and $ Q $ is the radix point position, i.e., the number of fractional bits. The conversion back to floating point is:

$$ x_f=\operatorname{float}\left(x_q\right)\cdot 2^{-Q} $$ (7)
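Equations (6) and (7) translate directly into integer operations. The sketch below assumes signed 16-bit two's-complement words with saturation, neither of which the text specifies, and adds a hypothetical `fixed_mul` helper to show why a product of two Q-format numbers must be shifted back by Q bits when the hardware stores only integers.

```python
WORD_BITS = 16  # assumption: signed 16-bit two's-complement words
Q_MIN = -(1 << (WORD_BITS - 1))      # -32768
Q_MAX = (1 << (WORD_BITS - 1)) - 1   #  32767


def quantize(x_f, q):
    """Eq. (6): x_q = int(x_f * 2^Q).

    Python's int() truncates toward zero like a C cast. Saturation to the
    16-bit range is an added assumption, not something stated in the text.
    """
    x_q = int(x_f * (1 << q))
    return max(Q_MIN, min(Q_MAX, x_q))


def dequantize(x_q, q):
    """Eq. (7): x_f = float(x_q) * 2^-Q."""
    return float(x_q) * 2.0 ** -q


def fixed_mul(a_q, b_q, q):
    """Product of two Q-format numbers, returned in the same Q format.

    The raw integer product carries 2Q fractional bits, so it is shifted right
    by Q (Python's >> on negative ints rounds toward minus infinity).
    No saturation is applied here.
    """
    return (a_q * b_q) >> q


if __name__ == "__main__":
    for q in (10, 12):  # Q-values used for classification and decomposition
        x = 0.1234
        x_q = quantize(x, q)
        print(q, x_q, dequantize(x_q, q), abs(x - dequantize(x_q, q)))
```

Because of the truncation in Eq. (6), the round-trip error for in-range values stays below one step of $ 2^{-Q} $, which is the accuracy loss discussed next.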

Although fixed-point quantization simplifies computation and reduces storage requirements, converting floating-point numbers to fixed-point numbers introduces truncation errors, leading to a loss of accuracy. A higher Q-value results in a smaller representable range and higher precision, whereas a lower Q-value results in a larger range and lower precision. Hence, a trade-off between range and precision is required. In the mode classification task, weights, inputs, and activations are quantized to 16-bit fixed-point numbers with a Q-value of 10, corresponding to a scaling factor of 1024. This choice expands the overall range of representable values while maintaining reasonable decimal precision. In the mode decomposition test case, a Q-value of 12 is chosen for the 16-bit fixed-point quantization, corresponding to a scaling factor of 4096. The precision in decimal places is calculated as follows:

$$ \text{Precision in decimal places}=\log_{10}\left(2^{Q}\right) $$ (8)

As a result, the precision is approximately 3.01 and 3.61 decimal places for the classification and decomposition tasks, respectively. The Q-values were chosen by examining the magnitudes of the parameters of the exported CNN models and by analyzing the distribution of input and activation values in each layer after testing the models with a large number of input images. These Q-values ensure that the quantized models maintain high accuracy while optimizing resource usage on the FPGA.
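The trade-off between the two Q-values can be made explicit by computing, for each one, the step size, the representable range, and Eq. (8). The sketch assumes signed 16-bit two's-complement words, which the text does not state; `q_format_stats` is a hypothetical helper.

```python
import math

WORD_BITS = 16  # assumption: signed 16-bit two's-complement words


def q_format_stats(q, word_bits=WORD_BITS):
    """Resolution, representable range, and Eq. (8) decimal precision for a given Q."""
    resolution = 2.0 ** -q                         # value of one least-significant bit
    min_val = -(1 << (word_bits - 1)) * resolution
    max_val = ((1 << (word_bits - 1)) - 1) * resolution
    decimal_places = math.log10(2 ** q)            # Eq. (8)
    return resolution, min_val, max_val, decimal_places


for q in (10, 12):
    res, lo, hi, dp = q_format_stats(q)
    print(f"Q={q}: step={res:.3g}, range=[{lo}, {hi}], decimal places={dp:.2f}")
```

Under the signed 16-bit assumption, Q = 10 covers roughly [−32, 32) in steps of about 0.000977, while Q = 12 covers roughly [−8, 8) in steps of about 0.000244. The decomposition task therefore trades range for the extra 0.6 decimal places.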

Acknowledgements

This research was funded by the Federal Ministry of Education and Research of Germany through the projects 6G-life (grant identification number: 16KISK001K) and QUIET (project identification number: 16KISQ092). Furthermore, this work was partially supported by the German Research Foundation (grant number: CZ 55/42-2).

References

[1]	Richardson, D. J., Fini, J. M. & Nelson, L. E. Space-division multiplexing in optical fibres. Nature Photonics 7, 354-362 (2013).
[2]	Yan, S. Y. et al. Archon: a function programmable optical interconnect architecture for transparent intra and inter data center SDM/TDM/WDM networking. Journal of Lightwave Technology 33, 1586-1595 (2015).
[3]	Cao, H. et al. Controlling light propagation in multimode fibers for imaging, spectroscopy, and beyond. Advances in Optics and Photonics 15, 524-612 (2023).
[4]	Rothe, S. et al. Intensity-only mode decomposition on multimode fibers using a densely connected convolutional network. Journal of Lightwave Technology 39, 1672-1679 (2021).
[5]	Manuylovich, E. S., Dvoyrin, V. V. & Turitsyn, S. K. Fast mode decomposition in few-mode fibers. Nature Communications 11, 5507 (2020).
[6]	Inan, B. et al. DSP complexity of mode-division multiplexed receivers. Optics Express 20, 10859-10869 (2012).
[7]	Zhou, Y. Y. et al. High-fidelity spatial mode transmission through a 1-km-long multimode fiber via vectorial time reversal. Nature Communications 12, 1866 (2021).
[8]	Caravaca-Aguirre, A. M. et al. Real-time resilient focusing through a bending multimode fiber. Optics Express 21, 12881-12887 (2013).
[9]	Popoff, S. M. et al. Measuring the transmission matrix in optics: an approach to the study and control of light propagation in disordered media. Physical Review Letters 104, 100601 (2010).
[10]	Lyu, M. et al. Fast modal decomposition for optical fibers using digital holography. Scientific Reports 7, 6556 (2017).
[11]	Rothe, S. et al. Transmission matrix measurement of multimode optical fibers by mode-selective excitation using one spatial light modulator. Applied Sciences 9, 195 (2019).
[12]	Kaiser, T. et al. Complete modal decomposition for optical fibers using CGH-based correlation filters. Optics Express 17, 9347-9356 (2009).
[13]	Flamm, D. et al. Mode analysis with a spatial light modulator as a correlation filter. Optics Letters 37, 2478-2480 (2012).
[14]	Rothe, S. et al. Securing data in multimode fibers by exploiting mode-dependent light propagation effects. Research 6, 0065 (2023).
[15]	An, Y. et al. Learning to decompose the modes in few-mode fibers with deep convolutional neural network. Optics Express 27, 10127-10137 (2019).
[16]	Rothe, S. et al. Deep learning for computational mode decomposition in optical fibers. Applied Sciences 10, 1367 (2020).
[17]	Fan, X. J. et al. Mitigating ambiguity by deep-learning-based modal decomposition method. Optics Communications 471, 125845 (2020).
[18]	Brüning, R. et al. Comparative analysis of numerical methods for the mode analysis of laser beams. Applied Optics 52, 7769-7777 (2013).
[19]	Choi, K. & Jun, C. Sub-sampled modal decomposition in few-mode fibers. Optics Express 29, 32670-32681 (2021).
[20]	Zhang, Q. et al. Learning the matrix of few-mode fibers for high-fidelity spatial mode transmission. APL Photonics 7, 066104 (2022).
[21]	Guo, K. Y. et al. [DL] a survey of FPGA-based neural network inference accelerators. ACM Transactions on Reconfigurable Technology and Systems (TRETS) 12, 2 (2019).
[22]	Conkey, D. B., Caravaca-Aguirre, A. M. & Piestun, R. High-speed scattering medium characterization with application to focusing light through turbid media. Optics Express 20, 1733-1740 (2012).
[23]	Radner, H. et al. Field-programmable system-on-chip-based control system for real-time distortion correction in optical imaging. IEEE Transactions on Industrial Electronics 68, 3370-3379 (2021).
[24]	Nauber, R., Büttner, L. & Czarske, J. Measurement uncertainty analysis of field-programmable gate-array-based, real-time signal processing for ultrasound flow imaging. Journal of Sensors and Sensor Systems 9, 227-238 (2020).
[25]	Snyder, A. W. & Love, J. D. Optical Waveguide Theory. (New York: Springer, 1983).
[26]	An, Y. et al. Numerical mode decomposition for multimode fiber: from multi-variable optimization to deep learning. Optical Fiber Technology 52, 101960 (2019).
[27]	Barbu, T. Variational image denoising approach with diffusion porous media flow. Abstract and Applied Analysis 2013, 856876 (2013).
[28]	Boyat, A. K. & Joshi, B. K. A review paper: noise models in digital image processing. Signal & Image Processing: An International Journal 6, 63-75 (2015). doi: 10.5121/sipij.2015.6206
[29]	Dong, X. W., Yu, Z. H. & Su, X. X. High-accuracy mode decomposition for multi-mode fibers using hybrid network with mini-datasets. Optical and Quantum Electronics 56, 1006 (2024).
[30]	Krizhevsky, A., Sutskever, I. & Hinton, G. E. ImageNet classification with deep convolutional neural networks. Proceedings of the 26th International Conference on Neural Information Processing Systems. Lake Tahoe: Curran Associates Inc., 2012, 1097-1105.
[31]	Zhang, C. et al. Optimizing FPGA-based accelerator design for deep convolutional neural networks. Proceedings of 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays. Monterey: ACM, 2015, 161-170.
[32]	He, K. et al. Integrating large circular kernels into CNNs through neural architecture search. (2021). at https://doi.org/10.48550/arXiv.2107.02451.
[33]	Sahin, S., Becerikli, Y. & Yazici, S. Neural network implementation in hardware using FPGAs. Proceedings of the 13th International Conference on Neural Information Processing. Hong Kong, China: Springer, 2006, 1105-1112.
[34]	Zhou, Y. M. & Jiang, J. F. An FPGA-based accelerator implementation for deep convolutional neural networks. Proceedings of the 2015 4th International Conference on Computer Science and Network Technology. Harbin: IEEE, 2015, 829-832.
[35]	Asuero, A. G., Sayago, A. & González, A. G. The correlation coefficient: an overview. Critical Reviews in Analytical Chemistry 36, 41-59 (2006).
[36]	Ma, Y. F. et al. Optimizing the convolution operation to accelerate deep neural networks on FPGA. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 26, 1354-1367 (2018).
[37]	Zaidi, S. Y. A. et al. How will artificial intelligence (AI) evolve organizational leadership? Understanding the perspectives of technopreneurs. Global Business and Organizational Excellence 44, 66-83 (2024). doi: 10.1002/joe.22275
[38]	Rodríguez-Andina, J. J., Valdés-Peña, M. D. & Moure, M. J. Advanced features and industrial applications of FPGAs—a review. IEEE Transactions on Industrial Informatics 11, 853-864 (2015).
[39]	Kim, B., Na, J. & Jeong, Y. Convolutional neural network combined with stochastic parallel gradient descent to decompose fiber modes based on far-field measurements. Journal of Lightwave Technology 41, 5973-5982 (2023).
[40]	Ruan, Z. S. et al. Flexible orbital angular momentum mode switching in multimode fibre using an optical neural network chip. Light: Advanced Manufacturing 5, 296-307 (2024).
[41]	Li, Z. W. et al. Self-supervised dynamic learning for long-term high-fidelity image transmission through unstabilized diffusive media. Nature Communications 15, 1498 (2024).
[42]	Turtaev, S. et al. High-fidelity multimode fibre-based endoscopy for deep brain in vivo imaging. Light: Science & Applications 7, 92 (2018).
[43]	Murray, M. J. et al. Speckle-based strain sensing in multimode fiber. Optics Express 27, 28494-28506 (2019).
[44]	Sun, J. W. et al. Quantitative phase imaging through an ultra-thin lensless fiber endoscope. Light: Science & Applications 11, 204 (2022).
[45]	Du, Y. et al. Hybrid multimode-multicore fibre based holographic endoscope for deep-tissue neurophotonics. Light: Advanced Manufacturing 3, 408-416 (2022).
[46]	Koukourakis, N. et al. Investigation of human organoid retina with digital holographic transmission matrix measurements. Light: Advanced Manufacturing 3, 211-225 (2022).
[47]	Goodfellow, I., Bengio, Y. & Courville, A. Deep Learning. (Cambridge: MIT Press, 2016).
[48]	Qiu, J. T. et al. Going deeper with embedded FPGA platform for convolutional neural network. Proceedings of 2016 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays. Monterey: ACM, 2016, 26-35.

	3 modes				5 modes				6 modes
	Γ	STD	AME	PME	Γ	STD	AME	PME	Γ	STD	AME	PME
PyTorch	99.52%	5.22 × 10⁻²	0.63 × 10⁻²	2.23 × 10⁻²	94.59%	10.45 × 10⁻²	2.03 × 10⁻²	9.07 × 10⁻²	91.60%	13.77 × 10⁻²	2.75 × 10⁻²	11.33 × 10⁻²
FPGA	99.52%	5.22 × 10⁻²	0.63 × 10⁻²	2.23 × 10⁻²	93.03%	11.81 × 10⁻²	3.52 × 10⁻²	9.69 × 10⁻²	89.35%	16.04 × 10⁻²	5.14 × 10⁻²	12.31 × 10⁻²

	LUT	LUTRAM	FF	BRAM	DSP
Available	53200	17400	106400	140	220
Utilization	28387	5314	19477	26.50	133
Utilization %	53.36	30.54	18.31	18.93	60.45

	Dynamic						Device Static
	Clocks	Signals	Logic	BRAM	DSP	PS	Device Static
Power (W)	0.087	0.280	0.157	0.037	0.142	1.543	0.164
Share (%)	4	12	7	2	6	69	7

Device	Power	Inference time	Efficiency
FPGA	2.41 W	5.87 ms	70.69 fps/W
GPU	77.48 W	2.38 ms	5.42 fps/W
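The efficiency column can be reproduced from the power and latency columns. The sketch assumes throughput is simply the reciprocal of single-inference latency (no batching or pipelining), which matches the reported numbers but is our reading; the helper names are hypothetical.

```python
def efficiency_fps_per_watt(power_w, inference_time_s):
    """Throughput per watt, assuming one inference at a time (fps = 1 / latency)."""
    return (1.0 / inference_time_s) / power_w


def energy_per_inference_mj(power_w, inference_time_s):
    """Energy per inference in millijoules: E = P * t."""
    return power_w * inference_time_s * 1e3


# Power and latency values taken from the table above.
for device, power, latency in [("FPGA", 2.41, 5.87e-3), ("GPU", 77.48, 2.38e-3)]:
    print(device,
          round(efficiency_fps_per_watt(power, latency), 2),
          round(energy_per_inference_mj(power, latency), 1))
```

This gives 70.69 and 5.42 fps/W, matching the table. It also shows that, although the GPU has the shorter latency, the FPGA uses roughly 13 times less energy per inference (about 14 mJ versus 184 mJ).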
