-
Not long after the first demonstration of holographic images by Leith and Upatnieks1, as well as by Denisyuk2, a similar technique was used by De Bitetto and others to display motion picture holograms3, 4. As the name indicates, a motion picture hologram uses a rapid succession of static holographic images to reproduce the effect of movement in 3D. Because of these successes, it was hoped at the time that a holographic television would be developed soon after. Unfortunately, more than 50 years later, there is still no holographic television in sight.
Compared to a motion picture, a holographic display system needs to capture, transmit, and render 3D information. In the case of an interactive display system, such as a computer screen, there is the additional constraint that real-time manipulation of 3D data is required. This makes a universal holographic 3D display much more challenging to develop than the simple successive projection of pre-recorded holograms.
To provide a sense of how difficult a holographic display is compared to other forms of telecommunication, it is useful to consider the commonly used data rate metric. Even though focusing on the data rate does not take into account some other technological aspects, such as rendering complexity, it does allow comparison over a large range of techniques. For a holographic display to have an acceptable field of view of perhaps $ \pm 45^\circ $ (diffraction angle $ \theta $), the law of diffraction indicates that the diffractive element size $ d $ for a central wavelength of $ \lambda = 500 $ nm should be on the order of 350 nm:
$$ d = \frac{\lambda}{2 \sin \theta} $$ (1)
It should be noted that elements should be approximately 10 times smaller than predicted by equation 1 to achieve high-efficiency blazed diffraction gratings.
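As a quick numerical check of equation 1, the following minimal sketch evaluates the element size for the wavelength and diffraction angle quoted above. The function name `element_size` and the factor-of-ten constant are illustrative choices, not part of any library.

```python
import math

def element_size(wavelength_m: float, diffraction_angle_deg: float) -> float:
    """Element size d = lambda / (2 sin(theta)) from equation 1, in metres."""
    theta = math.radians(diffraction_angle_deg)
    return wavelength_m / (2.0 * math.sin(theta))

WAVELENGTH_M = 500e-9      # central wavelength quoted in the text
HALF_ANGLE_DEG = 45.0      # +/-45 degree field of view
BLAZE_FACTOR = 10          # "approximately 10 times smaller" for blazed gratings

d = element_size(WAVELENGTH_M, HALF_ANGLE_DEG)
d_blazed = d / BLAZE_FACTOR

print(f"element size d      : {d * 1e9:.1f} nm")
print(f"blazed-grating scale: {d_blazed * 1e9:.1f} nm")
```

The first value comes out close to the 350 nm figure used in the text; the second only illustrates the order of magnitude implied by the factor-of-ten remark.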
Given a 70 cm diagonal-screen television set (i.e., $ \mathrm{50\; cm \;\times\; 50\; cm} $), the number of active elements must be on the order of $ 2\;\times\;10^{12} $. To accommodate human visual perception, the display requires a minimum refresh rate of 60 Hz (corresponding to the flicker fusion threshold), and at least three colors to fill the eye chromaticity gamut. The gray-level resolution for a conventional display is generally at least 8 bits (256 levels). For the sake of comparison, we will use the same number of phase levels for a hologram, although phase levels are related to efficiency and not shades of gray in holograms. It is even possible to reconstruct a 3D image with a binary hologram by sacrificing some spatial bandwidth and efficiency5. The resulting data rate for such a display, excluding any type of compression algorithm, would be
$$ \begin{split} & \mathrm{number\; of\; elements \times refresh\; rate \times bit\; depth \times colors}\\ & = 2\times10^{12}\times 60 \times 8 \times 3 \approx 3\times 10^{15}\; \mathrm{b/s.} \end{split} $$ (2)
Fig. 1 plots the data rate in bits per second for different telecommunication systems according to the time of their introduction. The optical telegraph (or Chappe's semaphore), presented to Napoleon Bonaparte in 1798, had a typical rate of transmission of approximately 2 to 3 symbols (196 different types) per minute, or 0.4 b/s. Subsequently, the electrical telegraph, popularized in the early 1840s using Samuel Morse's code, achieved a rate of approximately 100 b/s. Alexander Graham Bell's telephone was introduced in 1876 and supported voice frequency transmission up to 64 kb/s6. The early NTSC black and white electronic television, available in the 1940s, had 525 interlaced lines and displayed images at a rate of 29.97 frames per second at a bit rate of 26 Mb/s7. The color NTSC format was introduced 10 years later and tripled the black and white bandwidth to accommodate red, green, and blue channels8.
More recently, digital video formats have made it easier to establish the bit rate directly from the pixel count: HDTV 720p at 1.33 Gb/s in 1990, ultra-HDTV 2160p (4K) at 12.7 Gb/s in 2010, and currently 4320p (8K) at 47.8 Gb/s. For the sake of comparison, these values are for uncompressed data feeds.
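The holographic estimate of equation 2 and the uncompressed video rates quoted above can be reproduced with a short calculation. This is a sketch under stated assumptions: the pixel dimensions assigned to 720p and 8K and the 60 Hz, 8-bit, three-color parameters for video are assumed here because they match the quoted figures, and the 4K width of 4096 is the one used later in equation 3.

```python
REFRESH_HZ = 60        # refresh rate used in equation 2
BITS_PER_ELEMENT = 8   # 256 levels
COLORS = 3

def uncompressed_rate(n_elements: float) -> float:
    """Uncompressed bit rate in b/s: elements x refresh rate x bit depth x colors."""
    return n_elements * REFRESH_HZ * BITS_PER_ELEMENT * COLORS

# Holographic display: 50 cm x 50 cm screen tiled with ~350 nm elements.
SCREEN_SIDE_M = 0.5
ELEMENT_SIZE_M = 350e-9
hologram_elements = (SCREEN_SIDE_M / ELEMENT_SIZE_M) ** 2
hologram_bps = uncompressed_rate(hologram_elements)

# Digital video formats; resolutions are assumptions chosen to match the text.
VIDEO_FORMATS = {
    "720p": (1280, 720),
    "2160p (4K)": (4096, 2160),
    "4320p (8K)": (7680, 4320),
}
video_gbps = {
    name: uncompressed_rate(width * height) / 1e9
    for name, (width, height) in VIDEO_FORMATS.items()
}

print(f"hologram elements: {hologram_elements:.2e}")
print(f"hologram rate    : {hologram_bps:.2e} b/s")
for name, rate in video_gbps.items():
    print(f"{name:11s}: {rate:.2f} Gb/s")
```

The gap between the 8K feed and the holographic estimate is roughly five orders of magnitude, which is the step that Fig. 1 visualizes.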
Fig. 1 Stairway to holography: approximate bit rate magnitude of various telecommunication devices according to their year of introduction.
The evolution of the bit rate for telecommunication devices plotted in Fig. 1 shows a trend that can be extrapolated to predict the emergence of holographic displays with a data rate of $ 3\;\times \;10^{15} $ b/s. This extrapolation estimates the emergence of a commercial holographic display by 2100. Although this extrapolation is indicative of the difficulties ahead, it is also very encouraging. The date of 2100 is by no means an inescapable natural law. Like the doubling of the transistor count on chips roughly every two years, this prediction can be affected in one way or another by the amount of effort invested in the research and development of holographic display technologies.
In this manuscript, we investigate the reasons why holography is still perceived to be the ultimate technique to develop a commercial 3D display, review the progress that has been accomplished toward this goal, and discuss the missing technologies that are still needed to promote the emergence of such a 3D display.
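The kind of extrapolation drawn from Fig. 1 can be illustrated with a least-squares fit of the logarithm of the bit rate against the year of introduction. This is only a sketch: the rates are the ones quoted in the text, but several years (1945 for "the 1940s", 1955 for "10 years later", 2020 for "currently") are assumptions made here, and the fit is not claimed to be the procedure used to produce the figure.

```python
import math

# (year, bit rate in b/s). Years flagged "assumed" are illustrative picks.
POINTS = [
    (1798, 0.4),       # optical telegraph
    (1840, 100.0),     # electrical telegraph (early 1840s)
    (1876, 64e3),      # telephone
    (1945, 26e6),      # black and white NTSC, year assumed
    (1955, 3 * 26e6),  # color NTSC, year assumed
    (1990, 1.33e9),    # 720p
    (2010, 12.7e9),    # 4K
    (2020, 47.8e9),    # 8K, year assumed
]
TARGET_BPS = 3e15      # holographic display rate from equation 2

def fit_log_linear(points):
    """Ordinary least squares of log10(rate) versus year; returns (slope, intercept)."""
    xs = [year for year, _ in points]
    ys = [math.log10(rate) for _, rate in points]
    n = len(points)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

slope, intercept = fit_log_linear(POINTS)
crossing_year = (math.log10(TARGET_BPS) - intercept) / slope

print(f"growth: {slope:.3f} decades per year")
print(f"trend reaches {TARGET_BPS:.0e} b/s around {crossing_year:.0f}")
```

Shifting the assumed years or dropping the earliest points moves the crossing date by a decade or more, which is consistent with treating the 2100 figure as an order-of-magnitude estimate rather than a forecast.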
-
Understanding the human visual system and how it perceives the third dimension is key to developing a 3D display9−11. The human visual system takes input from many different cues to determine depth perception. It should be noted that most of these cues originate from 2D phenomena. Among these are shading, shadowing, perspective, relative size, occlusion, blurriness, and haze. The example presented in Fig. 2 shows how three simple discs, presented on whatever 2D display you are reading this article on, are interpreted as 3D balls owing to these cues.
Because these 2D cues are processed by the human visual system to determine the depth of a scene, a painting, a photograph, or a movie is intelligible as long as these cues are correctly reproduced. When they are not, the result is an optical illusion, such as infinite staircases and other impossible shapes.
The same applies to any 3D display system, which must, first and foremost, represent these 2D cues before introducing any additional cues. Additional 3D cues are stereo disparity, motion parallax, and accommodation. We briefly review these cues in the following sections.
-
Stereo disparity is the change in parallax of the scene observed between the left and right eyes. It only requires that two images be reproduced, and as such, is the most technologically manageable 3D cue. It is so manageable in fact that the introduction of stereoscopic displays pre-dates the invention of photography. The first system was invented by Sir Charles Wheatstone in the early 1830s using drawn images12. This was then followed by taking pictures from two positions, or with a camera having two objectives.
When a stereo projection is meant for a single individual, such as a head-worn display, it is relatively easy to keep the left and right views separated. Images are separated by simply introducing a physical partition between both eyes13. For a larger audience, the separation between left and right views is often achieved by having the viewers use eyewear with different left and right lenses. The left and right image coding can be achieved using color (anaglyphs), orthogonal polarization, or alternating shutters14, 15.
From a user perspective, the eyewear requirement for stereo display has been accepted in special venues such as theaters, where large productions continue to be released in stereoscopic 3D. However, the commercial failure of stereoscopic 3D television seems to indicate that for everyday experience, the public is not enthusiastic about wearing special glasses in their own living rooms16.
-
Autostereoscopic displays achieve stereoscopy without the need for special glasses. The left and right views are directly projected toward the viewer’s intended eyes using parallax barriers or a microlens array17−19. To ensure that the correct eye intersects the correct projection, autostereoscopic systems require that the viewer be located at a particular position. This inconvenience has proven sufficient to limit the adoption of autostereoscopic 3D television by the consumer market20. It should also be noted that autostereoscopic systems with an eye tracking mechanism that mitigates the fixed viewer zones have been developed, but have not achieved wide popularity21−23.
-
Motion parallax requires many views to be projected, allowing the viewer to see the correct parallax even when moving in front of the display. The density of the different views that are projected needs to be such that the autostereoscopic information is correctly reproduced. Therefore, at least two views per inter-pupillary distance are required. However, to achieve a smooth transition from one perspective to the next, a much larger density of views is required24. The optimum view density depends on the exact configuration of the display and the expected viewer distance, but numbers are on the order of one view per degree25−27.
In most of the literature, a display that reproduces motion parallax is called a "multiview" or "multi-view" display while a "light-field" display reconstructs 3D images based on the concept of ray-optics and integral imaging28−32. In a multiview display, the display is designed such that the motion parallax can be reproduced smoothly when a viewer’s position changes. This is considered a multiview-type autostereoscopic display. However, when the display is also capable of reconstructing virtual or real images, it is usually called a light-field display.
We can apply the same data rate computation introduced earlier for the different types of telecommunication devices (see Fig. 1) to a multiview (or light-field) display that reproduces motion parallax. In this case, we find that for a display with 2160p (4K) lateral resolution to reproduce motion parallax with a ±45° field of view, the bit rate is on the order of $ 12.7\; \times\; 90^2 \approx 10^{5}\; {\rm{Gb/s}} $. The square factor arises from the fact that both horizontal and vertical parallaxes are considered in this case.
Because the human visual system involves a mostly horizontal inter-pupillary distance, and lateral movement is favored over vertical movement, horizontal parallax is more important than vertical parallax. The latter is often discarded in multiview displays to allow a lower data rate of $ 12.7 \;\times \;90 \approx 10^{3} \;{\rm{Gb/s}} $.
When the viewer remains motionless in front of a multiview display, the observed parallax provides an experience similar to that of autostereoscopic displays33. However, because of the much larger number of views, a light-field display is not subject to the same limited number of view zones as autostereoscopic systems34. Therefore, user experience is much better, and acceptance is more likely.
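The multiview data rates estimated above follow from multiplying the 4K rate by the number of views. The sketch below assumes one view per degree over the full 90° field of view, as suggested by the view density discussed earlier; the variable names are illustrative.

```python
BASE_4K_GBPS = 12.7     # uncompressed 4K feed quoted in the text
FIELD_OF_VIEW_DEG = 90  # +/-45 degrees
VIEWS_PER_DEGREE = 1    # assumed view density, "on the order of one view per degree"

views_per_axis = FIELD_OF_VIEW_DEG * VIEWS_PER_DEGREE

# Full parallax: views along both the horizontal and vertical axes.
full_parallax_gbps = BASE_4K_GBPS * views_per_axis ** 2
# Horizontal parallax only: vertical views are discarded.
horizontal_only_gbps = BASE_4K_GBPS * views_per_axis

print(f"full parallax      : {full_parallax_gbps:.3g} Gb/s")
print(f"horizontal parallax: {horizontal_only_gbps:.3g} Gb/s")
```

Dropping vertical parallax saves a factor equal to the number of views per axis, here about two orders of magnitude.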
Considering their somewhat achievable data rate and advantages over auto-stereoscopy, multi-view and light-field displays are currently the subject of intense research35−40. This technology certainly represents the next 3D display platform that will appear in the marketplace, and some specialized applications have already started to emerge41.
-
The vergence-accommodation conflict is the Achilles heel of all the display systems that we have introduced thus far: stereoscopic, autostereoscopic, multiview, and light-field (with some exception for the latter). It occurs when mismatched visual 3D cues are presented to an observer. The images projected by these displays are located at a fixed distance, which produces a constant accommodation cue that cannot be adjusted. Vergence, by contrast, is provided by the parallax, which can be reproduced and thus may vary within a scene. The disparity between the accommodation and vergence cues creates a conflict in perception, which leads to visual discomfort that is well documented in the literature42−45.
Light-field displays can reproduce some amount of accommodation when the ray density is sufficiently large. This condition is often referred to as a super-multiview46, 47. Accommodation occurs in a light-field display because the image plane can be moved in and out of the display plane. This is achieved by directing the light rays from different sections of the panel toward one voxel region, as shown in Fig. 3a.
Fig. 3 Illustration of the projection of a voxel out of the emission plane by (a) a light-field display and (b) a holographic display.
However, there is some belief that if the view density keeps increasing in a light-field display, the accommodation distance can be extended at will. This belief arises from the extrapolation that light-field displays approximate a wavefront curvature by using line segments. If these segments are sufficiently small, they may become indistinguishable from the true wavefront curvature. Unfortunately, this ray-tracing simplification does not hold because diffraction takes place along the pixel edges, limiting the voxel resolution. Even with a pixel density in the 100s per degree, when an object is projected too far from the plane of the light-field display, it becomes blurry because of the diffraction among pixels. This diffraction effect cannot be avoided and intrinsically reduces the depth resolution and accommodation of light-field displays48, 49.
To eliminate the diffraction phenomena experienced with smaller pixel sizes, strong coherence among pixels is required so that the light-field display becomes indistinguishable from holography.
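The trade-off between pixel size and diffraction blur described above can be illustrated with a back-of-the-envelope estimate. The model below is an assumption introduced for illustration, not a formula from the cited references: each pixel is treated as an aperture of width $ p $ whose beam diverges with a half-angle of approximately $ \lambda / p $ (small-angle approximation), so the beam width at a distance $ z $ from the emission plane is roughly $ p + 2 z \lambda / p $.

```python
def blur_width(pixel_m: float, depth_m: float, wavelength_m: float = 500e-9) -> float:
    """Rough beam width (m) at depth_m from the emission plane.

    Assumed model: geometric pixel width plus a diffraction spread with
    half-angle ~ wavelength / pixel width (small-angle approximation).
    """
    divergence_rad = wavelength_m / pixel_m
    return pixel_m + 2.0 * depth_m * divergence_rad

DEPTH_M = 0.10  # voxel placed 10 cm out of the display plane (illustrative)

for pixel_um in (200, 100, 50, 10):
    pixel_m = pixel_um * 1e-6
    width_mm = blur_width(pixel_m, DEPTH_M) * 1e3
    print(f"pixel {pixel_um:4d} um -> beam width at {DEPTH_M * 100:.0f} cm: {width_mm:.2f} mm")
```

In this simplified picture, shrinking the pixels beyond a certain point makes the voxel larger rather than smaller once it is moved away from the emission plane, which is the qualitative behavior described in the text.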
Because accommodation is difficult to reproduce, the depth of field of the display must be limited to avoid visual discomfort. To reproduce a voxel out of the plane of the display, the light should be focused at that point by the optical system. Without the capability to refocus subpixels at will, the light-field display can only produce a flat wavefront from the emission plane. As presented in Fig. 3a, when a light-field display attempts to reproduce a voxel that is too far away from the emission plane, the voxel invariably becomes blurry.
To address this problem, researchers have developed multiplane light-field displays50−52. This is possible because the plane of emission can be refocused by optical elements and moved along the view depth. However, this requires some multiplexing to generate different planes in time or space, which increases the bandwidth required by the system. Another aspect that should not be overlooked is that occlusions between different planes are difficult to control when there are many view zones53.
-
Volumetric displays have voxels located in 3D space and are affected by the same occlusion problem as a multi-plane light-field display. For both systems, the occlusions can only be correctly reproduced for one point of view54, 55. Some systems (both volumetric and light-field) use an eye tracking mechanism to re-calculate the occlusions and present the correct image wherever the viewer is located56. However, only one correct perspective can be achieved, precluding its application for multiple observers.
In a volumetric display, the occlusion problem occurs because the emission of the voxel is omnidirectional, and there is no absorptive voxel. Nevertheless, volumetric displays have the advantage of being able to reproduce the field depth without resolution loss. They can be somewhat more natural to view when they do not use a screen to display an image. In this case, the image appears floating in thin air, which has a dramatic effect on the viewer’s perception55, 57, 58.
Volumetric displays also have the disadvantage of not being capable of projecting images outside a limited volume. The image depth is bounded by that volume, and a deep landscape or object that seemingly reaches out of the display cannot be reproduced54.
The mathematical computation of the bit rate for a volumetric display is as simple as multiplying the resolution of a 2D screen by the third dimension, refresh rate, and dynamic range. In Fig. 1, the data rate for a 4K volumetric display is
$$ \begin{split} & \mathrm{x \times y \times z \times refresh\; rate \times bit\; depth \times colors}\\ & = 4096 \times 2160 \times 1000 \times 60 \times 8 \times 3 \approx 1.3 \times 10^{13}\; \mathrm{b/s.} \end{split} $$ (3)
However, because volumetric display setups are easily scalable, lower-resolution systems can be readily used to showcase the potential of the technology59−61.
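Equation 3 can be checked directly, and the same function shows how quickly the rate drops for the lower-resolution systems mentioned above. The reduced-resolution example values are arbitrary illustrations, not figures from the cited work.

```python
def volumetric_rate(nx: int, ny: int, nz: int,
                    refresh_hz: int = 60, bits: int = 8, colors: int = 3) -> int:
    """Uncompressed volumetric bit rate in b/s, following equation 3."""
    return nx * ny * nz * refresh_hz * bits * colors

rate_4k = volumetric_rate(4096, 2160, 1000)
# Hypothetical lower-resolution demonstrator: 512 x 512 x 100 voxels.
rate_small = volumetric_rate(512, 512, 100)

print(f"4K volumetric display : {rate_4k:.2e} b/s")
print(f"512 x 512 x 100 voxels: {rate_small:.2e} b/s")
```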