"Super Hi-Vision" as Next-Generation Television and Its Video Parameters

Future TVs, using a Super Hi-Vision system, will be able to deliver an enhanced and even unprecedented viewing experience in various environments.

HDTV (high-definition TV) has become popular all over the world with the spread of digital broadcasting. Japan Broadcasting Corporation (NHK), which has been developing HDTV for many years, began developing ultra-high-definition television (UHDTV)1 in 1995 and has contributed to ITU-R (International Telecommunication Union Radiocommunication Sector) standards such as Recommendation BT.2020 (discussed later in this article). This new format is expected to produce extremely realistic viewing sensations through the use of images with 4000 scanning lines and 22.2-multichannel sound. The Super Hi-Vision (SHV) design target is a totally immersive experience, with visual and aural sensations realistic enough that the audience feels present at the scene.

This article looks beyond a wider field of view to other factors that could be crucial to an enhanced visual experience with next-generation television (next-gen TV) systems, such as color/tone rendition and motion portrayal, and proposes system parameters including colorimetry and frame frequency. The proposed colorimetry is based on a real RGB color system: its color gamut includes 99.9% of real surface colors while using physically realizable RGB primaries. A frame frequency of 120 Hz is proposed on the basis of subjective assessments of motion-picture quality.

Super Hi-Vision

NHK has been developing the SHV system as part of a project to deliver a viewing experience far beyond that possible with existing systems. Figure 1 shows the image format of SHV.


Fig. 1: The image format of SHV (left) is compared with HDTV and digital cinema formats (right).


The SHV frame has 7680 pixels x 4320 lines with a frame rate of 120 frames/sec (progressive). The horizontal, vertical, and temporal resolutions are integer multiples of those of the HDTV format, to maintain compatibility with HDTV. The SHV system can accordingly be built on an HDTV base. The viewing distance in SHV is calculated as the picture height x 0.75 (~ 3 m/10 ft. in the case of a 500-in. theatre screen, or, in the case of a home application, a viewing distance of about 5 ft. for a 145-in.-diagonal 8K SHV) so that the whole image falls within a viewing angle of about 100° and hence lies within the human visual field. It is a basic concept of SHV that the grain of the scanning lines should be practically invisible even at such close range so that the viewer can enjoy an extremely realistic visual sensation. Figure 2 indicates an arrangement for 22.2-multichannel sound speakers.2



Fig. 2: This diagram shows an arrangement for a 22.2-multichannel sound system.


NHK defines immersion as the state in which the image on the screen is perceived as a real-world image. Sound effects are an essential factor in reducing the perception gap between the SHV screen images and the real world. Whereas the 5.1-channel surround speakers are arranged in a plane to reproduce a planar sound field, the 22.2-multichannel immersion speakers are set up in three tiers (upper, middle, and lower) to reproduce a three-dimensional sound field: left and right, forward and back, and up and down.

In an early trial, select groups of people in London, Bradford, Glasgow, the U.S., and Japan watched the Olympic Opening Ceremonies last summer in Super Hi-Vision. The current target is to begin experimental broadcasts with this system in 2020 via satellite in the 21-GHz band. To achieve this objective, the focus has been on identifying the main video and sound parameters and on developing a complete end-to-end solution from the camera to the display, including media storage, compression, and transmission. For SHV, the specific goal is to meet the following fundamental requirements: worthwhile improvement in quality beyond high-definition television (HDTV); compatibility, interoperability, and commonality with HDTV; and technical feasibility in the foreseeable future. Thus, a number of studies on human subjects have been conducted to investigate the psychophysical effects of the several video parameters in order to determine suitable values, including the spatial resolution, temporal resolution, tone reproduction, and color representation.

Full-Spec Super Hi-Vision Video Parameters

The full-spec video parameter values suitable for SHV have been determined on the basis of intensive studies and are presented in Table 1. These studies are discussed later in this article. Note that ITU-R has established a new standard for ultra-high-definition television systems as Recommendation BT.2020.


Table 1: Parameter values for full-spec video for SHV include frame frequency and bit depth.
Parameter | Value
Spatial sampling points | Horizontal: 7680; Vertical: 4320
Frame frequency | 120 Hz
Opto-electronic transfer characteristics | Eq; α = 1.0993, β = 0.0181
Bit depth | 12-bit
Primaries and reference white chromaticity coordinates3 |
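
The transfer-characteristic equation itself is not reproduced in Table 1. As a minimal sketch, assuming the piecewise form specified in Recommendation ITU-R BT.2020 (a linear segment with slope 4.5 below β and a 0.45-power segment above it), the listed α and β could be applied as follows. The function names and the full-range integer quantization are illustrative assumptions, not part of the table.

```python
ALPHA = 1.0993  # from Table 1
BETA = 0.0181   # from Table 1


def oetf(e: float) -> float:
    """Map normalized linear scene light e in [0, 1] to a nonlinear signal in [0, 1].

    Assumed piecewise form (Rec. ITU-R BT.2020): linear below BETA,
    power law with exponent 0.45 at and above BETA.
    """
    if not 0.0 <= e <= 1.0:
        raise ValueError("e must lie in [0, 1]")
    if e < BETA:
        return 4.5 * e
    return ALPHA * e ** 0.45 - (ALPHA - 1.0)


def quantize_full_range(signal: float, bits: int = 12) -> int:
    """Illustrative full-range quantization (assumption); broadcast signals
    normally reserve headroom and footroom, which is ignored here."""
    return round(signal * (2 ** bits - 1))


if __name__ == "__main__":
    for e in (0.0, 0.01, BETA, 0.18, 1.0):
        print(f"E = {e:.4f} -> E' = {oetf(e):.5f} -> code {quantize_full_range(oetf(e))}")
```

With these constants the two segments meet almost exactly at E = β, which is the purpose of choosing α and β as a pair.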


Spatial Resolution

SHV has been designed to provide an enhanced sense of presence for a new visual experience. This requires a wider spatial resolution, which is expressed by the angular field of view (FOV) in degrees and angular resolution in pixels per degree of arc. However, the sense of presence could involve various subjective factors, among them the sense of "being there" and the sense of "realness." These have been identified as factors that should distinguish SHV from existing systems.

Sense of "Being There"

Subjective assessments were conducted using four images shot with a camera angle of 60° that were presented to participants at five different FOV angles. Each participant rated the sense of "being there" conveyed by the images on a continuous scale ranging from 0 (none) to 10 (extreme). In total, 200 participants took part, divided into five groups of 40. Each group performed the evaluation for one of the FOV angles. As shown in Fig. 3, the results confirm that a wider FOV produces a stronger sense of "being there."4

From Fig. 3, it is clear that the sense of "being there" increases with the FOV but saturates at an FOV of around 80°–100°. Although the curve appears to peak at an FOV of 77°, the difference is not statistically significant. Another experiment performed in the same study using images obtained with a camera angle of 100° showed a similar result. Thus, the target FOV for SHV was set at around 80°–100°. This corresponds to a viewing distance that is 0.75–1.00 times the picture height (0.75H–1H), at which point people with normal visual acuity are simply unable to discern the pixel structure.
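
As a quick check of this correspondence, assuming a 16:9 screen viewed on axis (so that the half-width is 8/9 of the picture height, as in Eq. (2) of the Appendix) and D expressed in picture heights, the FOV at the two ends of the viewing-distance range works out approximately as:

```latex
\theta = 2\tan^{-1}\!\left(\frac{8}{9D}\right), \qquad
\theta\big|_{D=0.75} = 2\tan^{-1}(1.185) \approx 99.7^\circ, \qquad
\theta\big|_{D=1.00} = 2\tan^{-1}(0.889) \approx 83.3^\circ
```

Both values fall within the 80°–100° target range quoted above.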



Fig. 3: A wider horizontal angular field of view provides a greater sense of "being there" (mean ± standard error).


Sense of "Realness"

Another experiment was conducted using a paired-comparison method with images at six different angular resolutions that were presented along with real objects. Participants chose the image that they perceived as better resembling the real object. The experimental setup was such that the effect of factors (e.g., binocular disparity, image size, perspective, luminance, and color) other than the resolution on the result was minimal.

As shown in Fig. 4, the results confirmed that the spatial resolution is responsible for determining whether viewers can distinguish images from real objects. The higher the angular resolution, the greater the sense of "realness" or visual fidelity.5 However, the improvement gently saturates at about 60 cycles per degree (cpd), due to maximum human visual acuity, as mentioned above.



Fig. 4: The relationship between angular resolution and sense of "realness" (mean ± 95% confidence interval) is shown. The higher the angular resolution, the greater the "realness."


Spatial Sampling Parameters of Super Hi-Vision

The spatial sampling point for SHV has been set to 7680 x 4320 pixels – four times that of HDTV in both horizontal and vertical directions. Three video systems were compared with different spatial resolutions – a 2K system (HDTV), a 4K system, and an 8K system (SHV) – in terms of the sense of "being there" and the sense of "realness" for a range of FOV angles or viewing distances, as shown in Fig. 5.

It was shown that, as found previously, the sense of "being there" is influenced by the FOV. The sense of "realness," however, is influenced not only by the FOV but also by the resolution: "realness" is low for low-resolution systems at wide FOVs.

The sense of "realness" differs among the three video systems. In Fig. 5, the angular resolution has been transformed into FOV or viewing distance for the different spatial resolutions (see Appendix).

SHV can provide a strong sense of both "being there" and "realness" for a wide range of FOVs or viewing distances. This feature of SHV is expected to be effectively used in various viewing environments and for large, medium, and small displays. This is in contrast to the 4K and 2K systems, which are effective only under limited viewing conditions.



Fig. 5: Video systems with different spatial resolutions are compared in terms of the sense of "being there" and the sense of "realness."


Motion Blur, Stroboscopic Effect, and Flicker

Motion portrayal is characterized by the perception of motion blur, stroboscopic effect, and flicker. These factors are influenced by temporal video parameters, including the time aperture and frame frequency. The speeds of moving objects in the video also influence motion portrayal. Motion blur is caused by a moving scene accumulating light across multiple photo sites of the image sensor in the capture device and/or image update rate and response hold time of the display, which is associated with the motion-tracking response of the eye. For motion, the time aperture – that is, the capture sensor integration time or display response hold time – affects the dynamic spatial-frequency response, which decreases at high motion speeds. A short time aperture is required for both cameras and displays to improve the dynamic spatial frequency response.

Several experiments have been performed to understand the relationship between motion blur and time aperture. One was conducted to determine the quality of still images and moving images for different time-aperture–object-speed combinations.6 As shown in Fig. 6, if we assume an object speed of 30°/sec, which is typical in HDTV programs, the time aperture should be in the range 1/200–1/300 sec. Note that only combinations of temporal aperture and object relative velocity to the camera (and display) that gave an observer an acceptable degree of motion blur are shown in the figure.
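
To give a feel for these numbers, the sketch below estimates how far a moving object smears during one time aperture. It rests on two simplifying assumptions that are not taken from the cited experiment: blur extent is approximated as speed x time aperture, and the pixel density is treated as uniform, 7680 pixels spread over a 100° FOV. The function names are illustrative.

```python
def blur_extent_deg(speed_deg_per_s: float, time_aperture_s: float) -> float:
    """Angular distance an object travels during one time aperture."""
    return speed_deg_per_s * time_aperture_s


def blur_extent_pixels(
    speed_deg_per_s: float,
    time_aperture_s: float,
    h_pixels: int = 7680,    # SHV horizontal sampling points
    fov_deg: float = 100.0,  # assumed horizontal FOV at about 0.75H
) -> float:
    """Blur extent in pixels, assuming a uniform pixel density across the FOV."""
    pixels_per_degree = h_pixels / fov_deg
    return blur_extent_deg(speed_deg_per_s, time_aperture_s) * pixels_per_degree


if __name__ == "__main__":
    speed = 30.0  # degrees per second, the object speed assumed in the text
    for denominator in (60, 120, 200, 240, 300):
        aperture = 1.0 / denominator
        print(
            f"aperture 1/{denominator} sec: "
            f"{blur_extent_deg(speed, aperture):.3f} deg, "
            f"{blur_extent_pixels(speed, aperture):.1f} pixels"
        )
```

Under these assumptions, shortening the aperture from 1/60 sec to 1/240 sec reduces the smear by a factor of four, which illustrates why a short time aperture matters more as pixel density rises.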



Fig. 6: Motion-velocity–temporal-aperture combinations correspond to acceptable degrees of motion blur.


The time aperture can be shortened by increasing the frame frequency. Alternatively, the same effect can be achieved by using a shutter in the camera or by inserting black frames on the display without changing the frame frequency. However, these techniques may degrade the picture quality by making motion appear as a series of snapshots (the stroboscopic effect, or jerkiness).

The subjective picture quality was investigated in the presence of the stroboscopic effect for varying frame frequencies using a fixed time aperture of 1/240 sec to determine an observer threshold for smooth motion that provides an acceptable motion blur.7 As shown in Fig. 7, the results suggest that a frame frequency greater than 100 Hz is required for acceptable quality.
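
The link between time aperture, frame frequency, and shuttering can be made concrete with a small calculation. The sketch below assumes the simple relation aperture ratio = time aperture x frame frequency; the function name and the list of frame frequencies are illustrative and are not the conditions used in the experiment.

```python
def aperture_ratio(time_aperture_s: float, frame_frequency_hz: float) -> float:
    """Fraction of each frame period during which light is captured or emitted.

    A value above 1.0 means the requested time aperture is longer than the
    frame period and cannot be realized at that frame frequency.
    """
    return time_aperture_s * frame_frequency_hz


if __name__ == "__main__":
    time_aperture = 1.0 / 240  # seconds, the fixed value quoted in the text
    for frequency in (60, 80, 100, 120, 240):
        ratio = aperture_ratio(time_aperture, frequency)
        print(f"{frequency:3d} Hz: aperture ratio {ratio:.0%}")
```

At 60 Hz a 1/240-sec aperture leaves the image dark for three quarters of every frame period, whereas at 120 Hz the dark interval shrinks to half; this is one way to see why a short aperture at a low frame frequency tends to look jerky.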



Fig. 7: Picture quality in the presence of the stroboscopic effect is demonstrated at different frame frequencies (mean ± standard error).


Flicker is a commonly encountered annoyance in moving pictures. A wide FOV on a large screen increases the perception of flicker because human eyes are more sensitive to flicker in peripheral vision. A short hold time on a hold-type display may also increase the perception of flicker. A plot of critical fusion frequencies (CFFs) for two different FOVs at a 30% time aperture,8 as shown in Fig. 8, confirms that a frame frequency greater than 80 Hz is required for a wide FOV system.



Fig. 8: This plot shows critical fusion frequencies vs. horizontal field-of-view angle (mean ± standard deviation).


Temporal Sampling Parameters for Super Hi-Vision

Taken together, these results suggest that the frame frequency of SHV should be at least 120 Hz to achieve a worthwhile improvement in motion portrayal. Naturally, a higher frame frequency would provide better quality, but the improvement tends to saturate. (This is assuming display technology capable of higher frame rates; for example, current digital cinemas use DMD projectors that are capable of operating at several times higher frame rates.)

Tone Reproduction

Discontinuities in tone reproduction, which usually appear as contouring artifacts, should be avoided. This means that the quantization characteristics, particularly the bit depth, should be set so that the modulation corresponding to a one-code-value difference between adjacent image areas cannot be discerned. The contrast sensitivity was measured in a dim environment with the modulation transfer characteristics of a gamma 1/2.4 transfer function for 10-, 11-, and 12-bit depths, as shown in Fig. 9.



Fig. 9: Modulation threshold and minimum modulation are shown for different bit depths.


The contrast sensitivity is based on Barten's equation,9 which has been used to determine the bit depth of the D-Cinema distribution master.10 It is observed that 11- and 12-bit encoding modulation lines are below the visual modulation threshold for the entire luminance range and do not show contouring.
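
The "minimum modulation" of a given bit depth can be sketched as the Michelson contrast between two adjacent code values. The example below assumes a pure 2.4-power display response and full-range codes, both simplifications; it does not implement Barten's threshold model, so it reproduces only the encoding side of the comparison. The function names are illustrative.

```python
GAMMA = 2.4  # assumed display exponent, matching the gamma 1/2.4 encoding in the text


def code_to_luminance(code: int, bits: int) -> float:
    """Relative luminance (0..1) of a full-range code value under a pure power law."""
    return (code / (2 ** bits - 1)) ** GAMMA


def step_modulation(code: int, bits: int) -> float:
    """Michelson contrast between a code value and the next one up."""
    low = code_to_luminance(code, bits)
    high = code_to_luminance(code + 1, bits)
    return (high - low) / (high + low)


def modulation_at(relative_luminance: float, bits: int) -> float:
    """One-code-step modulation near a given relative luminance."""
    max_code = 2 ** bits - 1
    code = round(relative_luminance ** (1.0 / GAMMA) * max_code)
    code = min(max(code, 1), max_code - 1)  # keep both code and code + 1 valid
    return step_modulation(code, bits)


if __name__ == "__main__":
    for bits in (10, 11, 12):
        values = ", ".join(
            f"{modulation_at(lum, bits):.5f}" for lum in (0.001, 0.01, 0.1, 1.0)
        )
        print(f"{bits}-bit: {values}")
```

Each additional bit roughly halves the step modulation at a given luminance, which is why the higher bit depths drop below a fixed visual threshold first.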


Real objects can have highly saturated colors that are beyond the color gamut of HDTV (ITU-R Rec. BT.709). Consumer-level flat-panel displays are quickly becoming capable of reproducing a wider range of colors; in fact, some non-broadcast video systems already handle a color gamut wider than that of HDTV. Thus, for the observer to experience "realness" and the sense of "being there," SHV should cover a color gamut wide enough to encompass nearly all colors found in the natural world, and an efficient and practical method should be devised for this.

To this end, requirements for developing a color representation method and determining parameter values have been defined in terms of target color, color-coding efficiency, program quality management, and feasibility of displays. After comparing several methods for widening the color gamut against these requirements, the authors chose a colorimetry system with monochromatic RGB primaries on the spectrum locus, which could be realized in the foreseeable future by using, for example, laser light sources.11 Note that the reference white of D65 remains unchanged. As shown in Fig. 10 and in Table 2, the wide-gamut colorimetry covers the gamuts of HDTV, the D-Cinema reference projector, and Adobe RGB, as well as more than 99.9% of Pointer's gamut.a Experiments on the capture and display of wide color-gamut images have confirmed the validity of the UHDTV (ITU-R Rec. BT.2020) wide-gamut colorimetry, demonstrating textures and highly saturated colors closer to those of real-world objects as seen by observers.
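
The claim that the wide-gamut triangle covers the HDTV gamut can be checked geometrically on the CIE 1931 xy chromaticity diagram. The sketch below is illustrative only: the chromaticity coordinates are not given in this article and are taken here, as an assumption, from Recommendations ITU-R BT.2020 and BT.709. Area in xy is not perceptually uniform, so the ratio is a rough indicator, and Pointer's gamut coverage is not computed because that data set is not included.

```python
# Primaries as (x, y) chromaticities; values assumed from the ITU-R Recommendations.
BT2020 = {"R": (0.708, 0.292), "G": (0.170, 0.797), "B": (0.131, 0.046)}
BT709 = {"R": (0.640, 0.330), "G": (0.300, 0.600), "B": (0.150, 0.060)}


def signed_area(a, b, c) -> float:
    """Signed area of triangle abc (positive if a, b, c run counter-clockwise)."""
    return 0.5 * ((b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1]))


def triangle_area(primaries) -> float:
    """Area of the gamut triangle in the xy diagram."""
    return abs(signed_area(primaries["R"], primaries["G"], primaries["B"]))


def contains(primaries, point) -> bool:
    """True if the point lies inside or on the gamut triangle."""
    r, g, b = primaries["R"], primaries["G"], primaries["B"]
    signs = [signed_area(r, g, point), signed_area(g, b, point), signed_area(b, r, point)]
    return all(s >= 0 for s in signs) or all(s <= 0 for s in signs)


if __name__ == "__main__":
    for name, xy in BT709.items():
        print(f"BT.709 {name} inside wide-gamut triangle: {contains(BT2020, xy)}")
    ratio = triangle_area(BT2020) / triangle_area(BT709)
    print(f"xy area ratio (wide gamut / HDTV): {ratio:.2f}")
```

Because a triangle is convex, showing that all three HDTV primaries lie inside the wide-gamut triangle is enough to show that the whole HDTV triangle does.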



Fig. 10: Pointer's colors and primaries are shown for different video systems.


Table 2: This table compares the coverage achieved by HDTV, Super Hi-Vision, and other systems for Pointer's gamut (a well-known definition of the gamut of real-world surface colors) and Optimal Color (based on the color space defined by the International Commission on Illumination (CIE)).

  Pointer's Gamut Optimal Color3
Adobe RGB
Digital Cinema
Super Hi-Vision


Device Development toward Full-Spec SHV

A full-spec SHV system based on these specifications is being developed, and several devices have been built toward that goal. On the capturing side, a camera system with the full-spec spatial sampling points and bit depth was developed.13 This camera system consists of 33-Mpixel CMOS (complementary metal oxide semiconductor) image sensors, a transmission device with a bandwidth of 74 Gbit/sec, and a signal-processing unit. However, it does not achieve a frame rate of 120 frames/sec and does not have a sufficiently wide color gamut. To address the frame-rate issue, a CMOS image sensor that can capture video at 120 frames/sec was developed.14 Making the camera head more compact for the camera operator is also a major challenge.
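
The scale of the data involved can be estimated from the parameters in Table 1. The sketch below counts only active-picture payload and assumes three full-resolution 12-bit color samples per pixel with no blanking, ancillary data, or coding overhead; these are illustrative assumptions, not a description of the transmission device cited above, and the 60-Hz case is included only for comparison.

```python
def active_pixels(width: int = 7680, height: int = 4320) -> int:
    """Number of spatial sampling points per frame."""
    return width * height


def payload_gbit_per_s(
    frame_rate_hz: float,
    bits_per_sample: int = 12,
    samples_per_pixel: int = 3,  # assumed: full-resolution R, G, and B
    width: int = 7680,
    height: int = 4320,
) -> float:
    """Uncompressed active-picture data rate in Gbit/sec (10**9 bits per second)."""
    bits_per_frame = active_pixels(width, height) * samples_per_pixel * bits_per_sample
    return bits_per_frame * frame_rate_hz / 1e9


if __name__ == "__main__":
    print(f"pixels per frame: {active_pixels():,}")
    for rate in (60, 120):
        print(f"{rate} Hz: {payload_gbit_per_s(rate):.1f} Gbit/sec")
```

The pixel count confirms the "33-Mpixel" figure, and under these assumptions the 60-Hz payload is of the same order as the quoted 74-Gbit/sec bandwidth, while a 120-Hz signal would need roughly twice as much.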

On the display side, an LCoS (liquid-crystal-on-silicon) projector with a resolution of 7680 x 4320 pixels was developed. In 2011, a liquid-crystal display (LCD) with the same pixel count, called a full-resolution LCD, was also developed; its details are shown in Table 3.

These display systems also do not achieve 120 frames/sec and do not have a wide color gamut, which are the next development goals.


Table 3: Details of a full-resolution LCD developed in 2011.

Parameter | Value
Pixel count | Horizontal: 7680; Vertical: 4320
Diagonal size | 85 in.
Frame frequency | 60 Hz
Bit depth per color | 10-bit
Luminance | 300 cd/m²


Closer to "Being There"

Video parameter values for SHV have been established, with the aim of delivering an enhanced, or even unprecedented, viewing experience to viewers in various environments. Some parameters contribute to an increased sense of "being there" and to the sense of "realness," while others help improve the picture quality by eliminating artifacts in motion portrayal and tone reproduction. Feasibility is also an important factor in determining the parameter values for application.

For further reading, Hiroyasu Masuda's article "Ultrahigh-Definition Content Production Techniques and Their Broadcasting Applications" can be found in the November/December 2011 SMPTE Motion Imaging Journal.


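Appendix

The viewing distance D (in units of picture height H) and the FOV θ (in degrees) are written as follows:

$$D = \frac{1}{V \tan\left(\dfrac{1}{2R}\right)} \qquad (1)$$

$$\theta = 2 \tan^{-1}\left(\frac{8}{9D}\right) \qquad (2)$$

where V is the number of vertical pixels and R (cpd) is the angular resolution at the center of the screen, for a screen with an aspect ratio of 16:9. The argument 1/(2R) is an angle in degrees: one cycle spans two pixels, so a single pixel subtends 1/(2R) degrees.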
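
The two relations can be evaluated for the 2K, 4K, and 8K systems compared in Fig. 5. This is a minimal sketch that reads Eq. (1) as D = 1 / (V tan(1/(2R))) with the angle in degrees; the value R = 30 cpd (one pixel per arcminute) is an illustrative assumption, not a figure taken from the article, and the function names are hypothetical.

```python
import math


def viewing_distance_h(v_pixels: int, r_cpd: float) -> float:
    """Eq. (1): viewing distance in picture heights at which the angular
    resolution at the screen center equals r_cpd cycles per degree."""
    pixel_angle_deg = 1.0 / (2.0 * r_cpd)  # one cycle = two pixels
    return 1.0 / (v_pixels * math.tan(math.radians(pixel_angle_deg)))


def fov_deg(distance_h: float) -> float:
    """Eq. (2): horizontal FOV in degrees for a 16:9 screen viewed on axis."""
    return 2.0 * math.degrees(math.atan(8.0 / (9.0 * distance_h)))


if __name__ == "__main__":
    r_cpd = 30.0  # assumed angular resolution
    for name, v_pixels in (("2K (HDTV)", 1080), ("4K", 2160), ("8K (SHV)", 4320)):
        d = viewing_distance_h(v_pixels, r_cpd)
        print(f"{name:10s}: D = {d:.2f}H, FOV = {fov_deg(d):.1f} deg")
```

Under this assumption the 8K system reaches the chosen angular resolution at about 0.8H, close to the 0.75H design viewing distance, whereas the 2K system must be viewed from roughly four times farther away and therefore fills a much narrower FOV.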


1. M. Kanazawa, K. Mitani, K. Hamasaki, M. Sugawara, F. Okano, K. Doi, and M. Seino, "Ultrahigh-Definition Video System With 4000 Scanning Lines," Proc. IBC 2003, 321–329 (2003).

2. K. Hamasaki et al., "The 22.2 Multichannel Sound System and Its Application," Proc. 118th AES Convention, paper No. 6406 (May 2006).

3. CIE: Commission internationale de l'Eclairage proceedings (Cambridge University Press, Cambridge, 1932).

4. K. Masaoka, M. Emoto, M. Sugawara, and Y. Nojiri, "Contrast Effect in Evaluating the Sense of Presence for Wide Displays," J. Soc. Info. Display 14, 785–791 (Sept. 2006).

5. K. Masaoka, Y. Nishida, M. Sugawara, E. Nakasu, and Y. Nojiri, "Sensation of Realness from High-Resolution Images of Real Objects," IEEE Trans. Broadcast 58 (2012) (to be published).

6. K. Omura, M. Sugawara, and Y. Nojiri, "Evaluation of Motion Blur by Comparison with Still Picture," presented at the IEICE General Convention, DS-3-3, pp. S-5–6 (2008).

7. K. Omura, M. Sugawara, and Y. Nojiri, "Subjective Evaluation of Motion Jerkiness for Various Frame Rates and Aperture Ratios," IEICE Technical Report IE2008-205, pp. 7–11 (2009).

8. M. Emoto and M. Sugawara, "Critical Fusion Frequency for Bright and Wide Field-of-View Image Display," J. Display Tech (2012).

9. P. G. J. Barten, Contrast Sensitivity of the Human Eye and its Effects on Image Quality, SPIE Optical Engineering Press (Bellingham, WA, 1999).

10. M. Cowan, G. Kennel, T. Maier, and B. Walker, "Contrast Sensitivity Experiment to Determine the Bit Depth for Digital Cinema," SMPTE Mot. Imag. J. 113, 281–292 (Sept. 2004).

11. K. Masaoka, Y. Nishida, M. Sugawara, and E. Nakasu, "Design of Primaries for a Wide-Gamut Television Colorimetry," IEEE Trans. Broadcast 56, 452–457 (Dec. 2010).

12. K. Masaoka, K. Omura, Y. Nishida, M. Sugawara, Y. Nojiri, E. Nakasu, S. Kagawa, A. Nagase, T. Kuno, and H. Sugiura, "Demonstration of a Wide-Gamut System Colorimetry for UHDTV," Proceedings of ITE Annual Convention 2010, Matsuyama, Japan, paper 6-2 (2010).

13. T. Yamashita, R. Funatsu, T. Yanagi, K. Mitani, Y. Nojiri, and T. Yoshida, "A Camera System Using Three 33-Mpixel CMOS Image Sensors for UHDTV2," presented at the SMPTE Annual Technical Conference & Expo, Hollywood, CA (Oct. 2010).

14. K. Kitamura, T. Watabe, Y. Sadanaga, T. Sawamoto, T. Kosugi, T. Akahori, T. Iida, K. Isobe, T. Watanabe, H. Shimamoto, H. Ohtake, S. Aoyama, S. Kawahito, and N. Egami, "A 33 Mpixel, 120 fps CMOS Image Sensor for UDTV Application with Two-Stage Column-Parallel Cyclic ADCs," Proceedings of IISW 2011, Hokkaido, Japan, R62 (June 2011).


a. "Pointer's gamut" refers to work by Dr. Michael R. Pointer, whose frequently cited 1980 paper for Color Research and Application, "The Gamut of Real Surface Colors," considered the gamut of real surface colors in the CIE 1976 L*u*v* and L*a*b* color spaces for a typical dye set used in photographic paper and typical CRT displays, as opposed to the wider gamut that the human eye is capable of viewing, as described by the MacAdam Limits. (Useful Color Data, Munsell Color Science Laboratory, http://www.cis.rit.edu/mcsl/online/cie.php; and R. Heckaman and M. Fairchild, "G0 and the Gamut of Real Objects," Munsell Color Science Laboratory, Rochester Institute of Technology.)


Takayuki Yamashita (yamashita.t-hq@nhk.or.jp) is with the NHK Science and Technology Research Laboratories, Tokyo, Japan, and Hiroyasu Masuda (masuda.h-fg@nhk.or.jp) is with the Japan Broadcasting Corporation (NHK), Tokyo, Japan. Kenichiro Masaoka, Kohei Omura, Masaki Emoto, Yukihiro Nishida, and Masayuki Sugawara are with the NHK Science and Technology Research Laboratories, Tokyo, Japan. This article is based on a paper presented at the SMPTE 2011 Annual Technical Conference & Exhibition, 25–27 October 2011. Copyright © 2012 by SMPTE.