The concept of an ANN-driven intelligent metasurface obtained by integrating a programmable metasurface with deep learning techniques is illustrated in Fig. 1. As shown in Fig. 1b, the designed reflection-type programmable metasurface is composed of 32 × 24 digital meta-atoms with a size of 54 × 54 mm², and each meta-atom is integrated with a PIN diode (SMP1345-079LF) for electronic control. More details on the designed meta-atoms and programmable metasurface are provided in Supplementary Figs. 1, 2. With reference to Fig. 1b, our intelligent metasurface has active and passive modes of operation. In active mode, the metasurface system includes a transmitter (Tx) to emit RF signals into the investigated region through Antenna 1 and a receiver (Rx) to detect the echoes bounced back from the subject through Antenna 2. In passive mode, the system has two or more coherent receivers to collect the stray Wi-Fi waves bounced back from the target subject.
Figure 2 schematically illustrates three building blocks of the data flow pipeline. In Fig. 2, the microwave data collected by the intelligent metasurface are instantly processed with an imaging CNN (the first CNN of the intelligent metasurface, called IM-CNN-1 for short) to reconstruct the image of the whole human body. More details on IM-CNN-1 are given in the Methods section and Supplementary Fig. 3. Then, a well-developed Faster R-CNN⁴⁷ is adopted to find the region of interest (ROI) within the whole image, for instance, the chest for respiration monitoring and the hand for sign-language recognition. Afterward, a modified Gerchberg-Saxton (G-S) algorithm is implemented to derive the optimal digital coding sequence for controlling the programmable metasurface so that its radiated wave is focused onto the desired spots, as presented in the Supplementary Information. After receiving the command from the host computer, the programmable metasurface adaptively focuses the EM waves onto the desired spots to read the hand signs or physiological state. As such, not only can unwanted disturbances be excluded effectively, but the SNR of echoes from the local body parts of interest can also be remarkably enhanced by 20 dB, improving the subsequent recognition of hand signs and vital signs (see Supplementary Figs. 6, 7). We develop a second CNN (IM-CNN-2) to process the microwave data to recognize hand signs. In addition, human breath is identified by time-frequency analysis of the microwave data. More details on IM-CNN-2 and the respiration identification algorithm are given in Supplementary Fig. 4. Several sets of representative results are recorded in Supplementary Videos 1, 2.
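The focusing step can be illustrated with a minimal G-S-style sketch. It is not the modified algorithm of the Supplementary Information: it assumes scalar free-space propagation, uniform illumination of all meta-atoms, ideal 0/π reflection phases, and a 2.45 GHz carrier, and every function name is hypothetical. Only the 32 × 24 array size and the 54 mm meta-atom size are taken from the text.

```python
import cmath
import math

C0 = 299_792_458.0            # speed of light, m/s
FREQ = 2.45e9                 # assumed carrier frequency, Hz
K = 2 * math.pi * FREQ / C0   # free-space wavenumber, rad/m
PITCH = 0.054                 # meta-atom size from the text, m
NX, NY = 32, 24               # array size from the text


def element_positions():
    """Centres of the 32 x 24 meta-atoms, placed in the z = 0 plane."""
    return [((ix - (NX - 1) / 2) * PITCH, (iy - (NY - 1) / 2) * PITCH, 0.0)
            for iy in range(NY) for ix in range(NX)]


def propagators(elements, spot):
    """Assumed scalar free-space propagator from each element to one spot."""
    out = []
    for e in elements:
        d = math.dist(e, spot)
        out.append(cmath.exp(-1j * K * d) / d)
    return out


def field_at(g_row, coding):
    """Field at a spot for a coding sequence (state 0 -> +1, state 1 -> -1)."""
    return sum(gn * (1 if c == 0 else -1) for gn, c in zip(g_row, coding))


def gs_binary_coding(spots, iterations=20):
    """Alternating projections with a 1-bit (0 / pi) phase constraint."""
    elements = element_positions()
    g = [propagators(elements, s) for s in spots]   # g[m][n]
    phases = [0.0] * len(spots)                     # free phases at the spots
    coding = [0] * len(elements)
    for _ in range(iterations):
        # Back-propagate unit-amplitude targets, then quantise to 1 bit.
        for n in range(len(elements)):
            a = sum(g[m][n].conjugate() * cmath.exp(1j * phases[m])
                    for m in range(len(spots)))
            coding[n] = 0 if a.real >= 0 else 1
        # Forward-propagate and keep only the phase obtained at each spot.
        phases = [cmath.phase(field_at(g[m], coding))
                  for m in range(len(spots))]
    return coding
```

For example, `gs_binary_coding([(0.3, 0.2, 1.5)])` returns one binary state per meta-atom for a single hypothetical spot 1.5 m in front of the array. Under the stated assumptions, the intensity at that spot can be compared with the all-zero coding through `field_at`.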
We first present in situ high-resolution microwave imaging of the whole human body in active mode, which is conducted in our lab environment. In this scenario, the intelligent metasurface system has two horn antennas connected to two ports of the Agilent vector network analyzer (VNA). One antenna is used to transmit EM signals into the investigated domain, and the other receives the EM echoes bounced back from the specimen. In high-resolution imaging, the programmable metasurface serves as a spatial microwave modulator controlled by the field-programmable gate array (FPGA) to register the information about the specimen in a compressive-sensing manner (see Supplementary Information).
To process the microwave data instantly, the kernel of the intelligent metasurface for whole-body imaging is IM-CNN-1. To obtain a large number of labeled samples for training IM-CNN-1, a commercial 4-megapixel digital optical camera is embedded in the intelligent metasurface system. The training samples captured by the camera are used to train IM-CNN-1 after being preprocessed with background removal, threshold saturation, and binary-value processing (see Supplementary Fig. 3). The labeled human-body images can be approximately regarded as EM reflection images of the human body over the frequency range from 2.4 to 2.5 GHz. We collect 8 × 10⁴ pairs of labeled training samples in our lab environment, and it takes ~8 h to train IM-CNN-1. The trained IM-CNN-1 can then be used to instantly produce a high-resolution image of the human body in < 0.01 s.
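The label preprocessing chain can be sketched as follows. This is only one plausible reading of "background removal, threshold saturation, and binary-value processing": the function name, the clipping interpretation of saturation, and both default thresholds are assumptions, and the actual procedure is the one in Supplementary Fig. 3.

```python
def make_binary_label(frame, background, diff_threshold=30, saturation=255):
    """Turn a grayscale camera frame into a binary silhouette label.

    frame, background: 2-D lists of integers in 0..255 with the same shape.
    diff_threshold and saturation are illustrative values, not from the paper.
    """
    label = []
    for row_f, row_b in zip(frame, background):
        out_row = []
        for f, b in zip(row_f, row_b):
            diff = abs(f - b)             # background removal
            diff = min(diff, saturation)  # saturation, read here as clipping
            out_row.append(1 if diff >= diff_threshold else 0)  # binarisation
        label.append(out_row)
    return label
```

The resulting 0/1 map would play the role of the training target paired with the microwave data recorded at the same instant.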
We experimentally characterize the performance of the intelligent metasurface in obtaining high-resolution images of the whole human body and simultaneously monitoring notable movements in an indoor environment. Two volunteers (coauthors Shuang Ya and Hao Yang Li, referred to as training persons) with different gestures are used to train the intelligent metasurface, while three persons (coauthors Shuang Ya, Hanting Zhao, and Menglin Wei, referred to as test persons) are invited to test it. The trained intelligent metasurface is then used to produce high-resolution images of the test persons, from which their body gesture information can be readily identified. A series of imaging results are presented in Fig. 3 and Supplementary Video 1. In particular, the "see-through-the-wall" ability of the metasurface is validated by clearly detecting notable movements of the test persons behind a 5-cm-thick wooden wall. Selected results are provided in the rightmost column of Fig. 3, where the corresponding optical images and microwave raw data are given as well. To examine the imaging quality quantitatively, Supplementary Fig. 5a compares the image quality versus the number of random coding patterns of the programmable metasurface in terms of the structural similarity index metric (SSIM)³⁴. We show that 53 coding patterns, where 101 frequency points from 2.4 to 2.5 GHz are utilized for each coding pattern, are enough to obtain high-quality images. As reported in the Supplementary Information, the switching time of coding patterns is ~10 μs, implying that the data acquisition time is < 0.7 ms in total even if 63 coding patterns are used. Consequently, we safely conclude that the intelligent metasurface integrated with IM-CNN-1 can instantly produce high-quality images of multiple persons in the real world, even when they are behind obstacles.
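For readers unfamiliar with the SSIM, the sketch below evaluates the standard SSIM formula over a single global window. This is a simplification: the usual definition averages the same expression over local windows, and the text does not state which variant or constants are used here. The function name and the defaults k1 = 0.01 and k2 = 0.03 (the commonly used values) are assumptions.

```python
def global_ssim(x, y, data_range=1.0, k1=0.01, k2=0.03):
    """Single-window SSIM between two equal-length flattened images."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / n                     # variance of x
    vy = sum((b - my) ** 2 for b in y) / n                     # variance of y
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n   # covariance
    c1 = (k1 * data_range) ** 2   # stabilises the luminance term
    c2 = (k2 * data_range) ** 2   # stabilises the contrast/structure term
    return ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx ** 2 + my ** 2 + c1) * (vx + vy + c2))
```

Identical images give a value of 1, and the value decreases as the reconstructed image departs from the reference, which is how a curve of image quality versus the number of coding patterns can be read.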
After obtaining a high-resolution image of the whole body, the intelligent metasurface is then used to recognize the hand signs and vital signs adaptively in real indoor environments. This capacity benefits from the robust feature of the intelligent metasurface in adaptively focusing the EM energy onto the desired spots with very high spatial resolution. This feature supports accurate detection of EM echoes reflected from the human hand for recognizing sign language or from the chest for identifying respiration. Typically, the sign-language rate of the human hand and respiration rate are on the order of 10 to 30 bps, which is drastically slower than the switching speed of the coding patterns by a factor of 10⁵. Thus, the radiation beams of the intelligent metasurface are manipulated to rapidly scan the local body parts of interest in each observation time interval. As a result, we realize monitoring of the hand signs and respiration of multiple people simultaneously in a time-division multiplexing way (see Supplementary Fig. 4).
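The time-division multiplexing argument can be made concrete with a small sketch. It assumes that each beam slot lasts exactly one coding-pattern switching time (~10 μs, as quoted from the Supplementary Information) and ignores acquisition and settling overheads. The function names and the round-robin order are illustrative and not taken from the paper.

```python
SWITCH_TIME_S = 10e-6  # coding-pattern switching time quoted in the text


def round_robin_schedule(targets, n_slots):
    """Assign consecutive beam slots to the targets in turn."""
    return [targets[i % len(targets)] for i in range(n_slots)]


def per_target_rate_hz(n_targets, switch_time_s=SWITCH_TIME_S):
    """Revisit rate of each target if every slot lasts one switching time."""
    return 1.0 / (n_targets * switch_time_s)
```

With four targets (for instance, the hand and chest of two people), `per_target_rate_hz(4)` gives roughly 25,000 revisits per second per target under these assumptions, still far faster than hand-sign or respiration dynamics.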
To accomplish this complicated task, we propose a three-step routine procedure. First, the Faster R-CNN⁴⁷ is applied to extract the hand or chest part from the full-scene image obtained with IM-CNN-1 in a divide-and-conquer manner. Second, the metasurface is manipulated by adaptively changing its coding pattern to make its radiation beam point to the hand or chest (see Fig. 4a-c). Third, IM-CNN-2, an end-to-end mapping from the microwave data to the label of hand-sign language, is developed to recognize hand signs. Conventional time-frequency analysis is performed for detecting respiration (see Supplementary Fig. 4).
The training samples of IM-CNN-2 include ten hand signs (see Fig. 4a, corresponding to ten different English letters) and 8000 samples for each hand sign. Thus, we have 80,000 samples in total. Figure 4d reports the classification matrix for the ten hand signs with an average recognition accuracy of above 95% by using the intelligent metasurface integrated with IM-CNN-2, where the test people are behind a 5-cm-thick wooden wall. We clearly see that the hand-sign recognition performance is hardly affected by the number of test persons after the hand parts are well identified by the Faster R-CNN.
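As a reminder of how a classification matrix and an average recognition accuracy relate, a minimal sketch is given below. The function names are hypothetical, and the average is taken as the mean of the per-class recognition rates, which is an assumption about how the figure in the text is computed. With equally sized classes it coincides with the overall accuracy.

```python
def confusion_matrix(true_labels, predicted_labels, n_classes):
    """Rows index the true class, columns the predicted class."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(true_labels, predicted_labels):
        matrix[t][p] += 1
    return matrix


def average_accuracy(matrix):
    """Mean of the per-class recognition rates (diagonal / row sum)."""
    rates = [row[i] / sum(row) for i, row in enumerate(matrix) if sum(row) > 0]
    return sum(rates) / len(rates)
```

For ten hand signs, `n_classes` would be 10 and each row of the matrix would summarize how one letter is recognized.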
Respiration is an important health metric for tracking human physiological states (e.g., sleep, pulmonology, and cardiology). Similar to the recognition of human hand signs, we use the intelligent metasurface to monitor human respiration with high accuracy. Figure 4e reports the results of respiration monitoring of two test persons behind the wooden wall. We observe that normal breathing and breath holding are clearly distinguished and that the respiration rate can further be identified with an accuracy of 95% or above, where the ground truth is obtained by a commercial breathing monitoring device. It can be expected that the identification performance is almost independent of the number of test persons due to the use of time-division multiplexing respiration detection.
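A respiration rate can be read from the slow-time echo signal by locating the dominant spectral peak in a plausible breathing band. The sketch below does this for a single analysis window with a direct Fourier sum; sliding the window over the recording would give a time-frequency picture. It is not the algorithm of Supplementary Fig. 4, and the function name, the 0.1 to 0.7 Hz search band, and the frequency step are assumptions.

```python
import cmath
import math


def respiration_rate_bpm(samples, fs_hz, f_min=0.1, f_max=0.7, step=0.005):
    """Breaths per minute from the strongest spectral peak in [f_min, f_max]."""
    n = len(samples)
    mean = sum(samples) / n
    centred = [s - mean for s in samples]          # remove the static echo
    best_f, best_power = f_min, -1.0
    n_steps = int(round((f_max - f_min) / step))
    for i in range(n_steps + 1):
        f = f_min + i * step
        # Direct Fourier sum of the window at the trial frequency f.
        acc = sum(c * cmath.exp(-2j * math.pi * f * k / fs_hz)
                  for k, c in enumerate(centred))
        power = abs(acc) ** 2
        if power > best_power:
            best_f, best_power = f, power
    return 60.0 * best_f
```

A breath-holding interval would show up as a window with no pronounced peak in this band, so a practical detector would also compare the peak power against a noise threshold.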
Our intelligent metasurface works at ~2.4-2.5 GHz, which is exactly the frequency of commodity Wi-Fi signals. Here, we investigate the performance of high-resolution imaging of the full scene and recognition of human hand signs and vital signs when the metasurface is excited by commodity stray Wi-Fi signals. For simplicity, we particularly consider using Wi-Fi beacon signals. In this case, the intelligent metasurface works differently in three major aspects. First, the stray non-cooperative Wi-Fi signals are dynamically manipulated by the metasurface. Second, two or more coherent receiving antennas are used to acquire the Wi-Fi signals bounced back from the subject specimen with the aid of an oscilloscope (Agilent MSO9404A). Third, the microwave data acquired by the receivers are coherently preprocessed before being sent to IM-CNN-1 such that the statistical uncertainties on stray Wi-Fi signals can be calibrated out. More details can be found in Supplementary Video 2 and the Supplementary Information.
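One common way to remove an unknown, non-cooperative source waveform when two coherent channels observe it is to normalize one channel by the other. The toy sketch below illustrates that idea only. The actual preprocessing is described in the Supplementary Information and may differ, and the function names and the single-tap channel model are assumptions.

```python
import cmath
import random


def normalise_by_reference(measurement, reference, eps=1e-12):
    """Divide each measurement sample by the coherent reference sample."""
    return [m / r if abs(r) > eps else 0j
            for m, r in zip(measurement, reference)]


def simulate_channels(h_meas, h_ref, rng):
    """Toy model: both receivers see the same unknown source sample s[i]."""
    source = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in h_meas]
    y_meas = [h * s for h, s in zip(h_meas, source)]   # metasurface path
    y_ref = [h_ref * s for s in source]                # reference path
    return y_meas, y_ref
```

In this model the ratio equals `h_meas[i] / h_ref` regardless of the random source samples, which is the sense in which the statistical uncertainty of the stray signal cancels.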
Figure 5a presents a set of in situ passive imaging results of a subject person behind the wooden wall in our indoor lab environment, where random coding patterns are also used in the programmable metasurface. Surprisingly, the imaging results obtained with the commodity stray Wi-Fi signals are comparable to those obtained in active mode. Based on the high-resolution images of the full human body, we can realize the recognition of hand signs and vital signs by adaptively performing the same three-step routine procedure as in active mode. In particular, the Faster R-CNN is operated on the full-scene image to instantly find the location of the hand or chest; then, suitable coding patterns of the intelligent metasurface are obtained and applied so that the stray Wi-Fi signals are spatially focused on the desired spots and enhanced; and finally, IM-CNN-2 or the time-frequency analysis algorithm is used to realize the recognition of hand signs and vital signs. As shown in Fig. 5b, c, the commodity Wi-Fi signals can be well focused onto the desired location, e.g., the left hand of the subject person, by using the developed intelligent metasurface. As a result, the SNR of the Wi-Fi signals can be significantly enhanced by more than 20 dB, which is directly beneficial for the subsequent recognition of hand signs and vital signs (see Supplementary Figs. 7, 8). Figure 5d, e shows the experimental results for hand-sign and respiration recognition of two people, revealing improved accuracies of 90% and 92%, respectively. To summarize, even with illumination by stray Wi-Fi signals, the proposed intelligent metasurface can obtain high-resolution images of a full scene and achieve high-accuracy recognition of hand signs and vital signs of multiple people in a smart and real-time way in the real world.