-
The temporal entropy map is first computed from the global camera's image sequence. Specifically, each pixel in the global video is viewed as a 1-D temporal signal X, and its entropy is calculated as follows:
$$E(X) = -\sum_{v=0}^{255} p(v)\log p(v)$$ where $v = 0, 1, 2, \ldots, 255$ ranges over all possible intensity values of pixel X, and $p(v)$ is the probability that its intensity value is $v$. Such a criterion highlights the regions with a large number of dynamic objects and can be computed efficiently. It is worth noting that the criterion used in our experiments merely illustrates one general way to calculate the temporal entropy map; its definition can vary across applications. More details are presented in Supplementary Fig. S5.
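As a concrete illustration, the per-pixel entropy can be computed with a short NumPy routine. This is a minimal sketch rather than the paper's implementation; the function name and the (T, H, W) uint8 frame layout are our own assumptions:

```python
import numpy as np

def temporal_entropy_map(frames):
    """Per-pixel temporal entropy of a greyscale video.

    frames: uint8 array of shape (T, H, W); each pixel's T intensity
    values form the 1-D signal X from the formula above.
    """
    T, H, W = frames.shape
    entropy = np.zeros((H, W), dtype=np.float64)
    for y in range(H):
        for x in range(W):
            # Histogram the pixel's temporal signal over the 256 intensities.
            counts = np.bincount(frames[:, y, x], minlength=256)
            p = counts / T
            nz = p > 0  # skip p(v) = 0 terms to avoid log(0)
            entropy[y, x] = -np.sum(p[nz] * np.log(p[nz]))
    return entropy
```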
With the temporal entropy map, the unstructured sampling strategy can be formulated as an optimization problem. The objective is to maximize the information covered by the n given local cameras:
$$\max_{x, y, w, h} \sum \bigcup_{i=1}^{n} E(x_i, y_i, w_i, h_i)$$ where $i$ denotes the index of a local camera and $E$ is the computed entropy map. For simplicity, the FoV of each local camera is represented as a rectangle: the width $w_i$ and height $h_i$ are determined by the CMOS sensor size and the focal length of the $i$th camera, and $(x_i, y_i)$ is the centre position of its FoV. $E(x_i, y_i, w_i, h_i)$ denotes the entropy covered by the $i$th local camera, so the objective is to maximize the entropy covered by the union of all local-camera FoVs. An acceptable solution can be found with a greedy search algorithm.
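A minimal sketch of such a greedy search is shown below, using the entropy map from above and a list of FoV sizes in global-image pixels. The function name and the summed-area-table speed-up are our own conventions, not details taken from the paper:

```python
import numpy as np

def greedy_fov_placement(entropy, sizes):
    """Greedily place rectangular local-camera FoVs on the entropy map.

    sizes: list of (w, h) pairs, one per local camera.
    Returns the chosen centre (x_i, y_i) for each camera.
    """
    remaining = entropy.astype(np.float64).copy()
    centres = []
    for w, h in sizes:
        # Summed-area table: rectangle sums in O(1) after O(HW) setup.
        sat = np.pad(remaining.cumsum(0).cumsum(1), ((1, 0), (1, 0)))
        # Covered entropy for every valid top-left corner (r, c).
        cover = sat[h:, w:] - sat[:-h, w:] - sat[h:, :-w] + sat[:-h, :-w]
        r, c = np.unravel_index(np.argmax(cover), cover.shape)
        centres.append((c + w // 2, r + h // 2))
        # Zero out the covered region so the next camera maximizes the
        # *union* of covered entropy rather than re-counting it.
        remaining[r:r + h, c:c + w] = 0.0
    return centres
```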
-
The unstructured embedding scheme aims to share information between global and local cameras in the UnstructuredCam module. This sharing is realized by finding a mapping field between the global camera and local cameras. To avoid visual artefacts and to handle the parallax, a mesh-based multiple homography model is used to represent the mapping, and an improved coarse-to-fine pipeline is adopted to enable online calibration28. More details are presented in Supplementary Fig. S2 and Method S1.
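Since the full mesh-based multiple-homography pipeline is described in the supplementary material, the sketch below only illustrates the basic idea of mapping a local image into the global frame with a single feature-based homography. It uses standard OpenCV calls; the function name is our own, and a single homography cannot handle parallax the way the mesh-based model does:

```python
import cv2
import numpy as np

def embed_local_in_global(global_img, local_img):
    """Warp a local (telephoto) image into the global (wide) frame via
    one feature-based homography; a simplification of the mesh-based
    multiple-homography mapping used in the actual system."""
    orb = cv2.ORB_create(4000)
    kg, dg = orb.detectAndCompute(global_img, None)
    kl, dl = orb.detectAndCompute(local_img, None)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(dl, dg), key=lambda m: m.distance)[:500]
    src = np.float32([kl[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([kg[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    # RANSAC rejects mismatches before estimating the 3x3 homography.
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 3.0)
    h, w = global_img.shape[:2]
    return cv2.warpPerspective(local_img, H, (w, h))
```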
-
A novel trinocular algorithm that makes full use of both the global and local cameras is used to estimate the depth information. The disparity and depth map of each subarray are first estimated from its two global cameras through intramodule (intra-subarray) collaboration. The colour panorama is then generated by intermodule (inter-subarray) collaboration, and the estimated parameters are used to generate the panoramic depth map. Similarly, the high-resolution local videos are embedded into the colour panorama using the unstructured embedding algorithm. After that, intramodule collaboration is applied to refine the local depth map by merging the high-resolution RGB image with the low-resolution depth image. Please refer to Supplementary Fig. S4 and Method S2 for more details.
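The sketch below illustrates two of these steps under simplifying assumptions of our own: semi-global matching between the two (assumed rectified) global cameras for the coarse disparity, and a joint bilateral filter guided by the high-resolution RGB image for the refinement. It uses OpenCV (the refinement step needs opencv-contrib) and is not the paper's trinocular implementation:

```python
import cv2
import numpy as np

def coarse_depth_from_globals(left, right):
    """Coarse disparity between the two rectified global cameras via
    semi-global block matching (greyscale uint8 inputs)."""
    sgbm = cv2.StereoSGBM_create(minDisparity=0, numDisparities=128,
                                 blockSize=5)
    # SGBM returns fixed-point disparities with 4 fractional bits.
    return sgbm.compute(left, right).astype(np.float32) / 16.0

def refine_with_local_rgb(depth_lowres, local_rgb):
    """Upsample the coarse depth to the local camera's resolution and
    sharpen its edges with a joint bilateral filter guided by the
    high-resolution RGB image."""
    h, w = local_rgb.shape[:2]
    up = cv2.resize(depth_lowres, (w, h), interpolation=cv2.INTER_LINEAR)
    guide = local_rgb.astype(np.float32)  # match depths for the filter
    return cv2.ximgproc.jointBilateralFilter(guide, up, 9, 25.0, 9.0)
```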
-
Twenty-one real-world outdoor scenes were captured and analysed to verify our array camera, and we are continuously collecting more videos to enrich our dataset19. The captured videos were labelled by a professional team, including the head boxes, body boxes, visual boxes, face orientations, trajectories, and group status of all persons. To estimate the interpersonal distance, a projective transformation matrix was estimated to project the images to the top view; the scale bar was estimated from the satellite map. For the crowd scene, a face detection algorithm27 was used to locate the faces. The algorithm worked well here because nearly all the marathon runners were facing the camera. After that, a kernelized correlation filter (KCF)29 was used to generate the trajectory of each runner, together with speed and acceleration measurements.
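For illustration, the top-view projection and metric distance measurement could look like the following sketch. The four point correspondences and the metre coordinates are hypothetical placeholders for values that would, in practice, come from the scene geometry and the satellite map's scale bar:

```python
import cv2
import numpy as np

# Four ground-plane points in the image and their top-view positions
# in metres (both sets are illustrative placeholders).
img_pts = np.float32([[210, 890], [1730, 905], [1450, 330], [480, 325]])
top_pts = np.float32([[0, 0], [20, 0], [20, 30], [0, 30]])
M = cv2.getPerspectiveTransform(img_pts, top_pts)

def interpersonal_distance(p1, p2):
    """Project two image-plane foot points to the top view and return
    their Euclidean distance in metres."""
    pts = cv2.perspectiveTransform(np.float32([[p1, p2]]), M)[0]
    return float(np.linalg.norm(pts[0] - pts[1]))

print(interpersonal_distance((600, 700), (900, 710)))
```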