HDTV (high-definition TV) has become popular all over the world with the spread of digital
broadcasting. Japan Broadcasting Corporation (NHK), which has been developing HDTV
for many years, began work on the development of ultra-high-definition television
(UHDTV)1 in 1995 and has contributed to ITU-R (International Telecommunication Union
Radiocommunication Sector) standards such as Recommendation BT.2020 (discussed later
in this article). This new format is expected to produce extremely realistic
viewing sensations through the use of images with 4000 scanning lines and 22.2-multichannel
sound. The Super Hi-Vision (SHV) design target is a totally immersive
experience that provides realistic visual and aural sensations, so that the audience
feels present at the scene.
Beyond simply delivering a wider field of view, next-generation television (next-gen TV)
systems depend on factors such as color/tone rendition and motion portrayal that could be
crucial to an enhanced visual experience. This article investigates these factors and
proposes system parameters, including colorimetry and frame frequency, for next-gen TV.
The proposed colorimetry system is based on the
real RGB color system and has a color gamut that includes 99.9% of real surface
colors while using physically realizable RGB primaries. Further, a frame frequency
of 120 Hz is proposed on the basis of subjective assessments of motion-picture quality.
NHK has been developing the SHV system as part of a
project to deliver a viewing experience far beyond that possible with existing systems.
Figure 1 shows the image format of SHV.
The SHV frame has 7680 pixels x 4320 lines with a frame rate of
120 frames/sec (progressive). The horizontal, vertical, and temporal resolutions
are integer multiples of those of the HDTV format, to maintain compatibility
with HDTV. The SHV system can accordingly be built on an HDTV base. The viewing
distance in SHV is calculated as the picture height x 0.75 (~ 3 m/10 ft. in the case
of a 500-in. theatre screen, or, in the case of a home application, a viewing distance
of about 5 ft. for a 145-in.-diagonal 8K SHV) to ensure that all images are visible
within 100° of the viewing angle and hence lie within the human visual field.
It is a basic concept of SHV that the grain of the scanning lines should be practically
invisible even at such close range so that the viewer can enjoy an extremely realistic
visual sensation. Figure 2 shows an arrangement for 22.2-multichannel sound.
NHK defines immersion as the state in which the image on the screen is perceived
as a real-world image. Sound effects are an essential factor in reducing the perception
gap between the SHV screen images and the real world. Whereas the 5.1-channel surround
speakers are arranged in a plane to reproduce a planar sound field, the 22.2-multichannel
immersion speakers are set up in three tiers (upper, middle, and lower) to reproduce
a three-dimensional sound field: left and right, forward and back, and up and down.
In an early trial, select groups of people in London, Bradford,
Glasgow, the U.S., and Japan watched the Olympic Opening Ceremonies last summer
in Super Hi-Vision. The current target is to begin experimental broadcasts with
this system in 2020 via satellite in the 21-GHz band. To achieve this objective,
the focus has been on identifying the main video and sound parameters and on developing
a complete end-to-end solution from the camera to the display, including media storage,
compression, and transmission. For SHV, the specific goal is to meet the following
fundamental requirements: worthwhile improvement in quality beyond high-definition
television (HDTV); compatibility, interoperability, and commonality with HDTV; and
technical feasibility in the foreseeable future. Thus, a number of studies on human
subjects have been conducted to investigate the psychophysical effects of several
video parameters, including the spatial resolution, temporal resolution, tone
reproduction, and color representation, in order to determine suitable values.
The full-spec video parameter values suitable for SHV
have been determined on the basis of intensive studies; the values are presented
in Table 1. These studies are discussed later in this article. Note that a
new standard for ultra-high-definition television systems was established by ITU-R
as Recommendation BT.2020.
Table 1: Full-spec SHV video parameters. Recoverable entries:
|Spatial sampling points
|Opto-electronic transfer characteristics: α = 1.0993, β = 0.0181
|Primaries and reference white chromaticity
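The two constants that survive under the opto-electronic transfer characteristics, 1.0993 and 0.0181, match the α and β that Recommendation BT.2020 specifies for 12-bit systems in its piecewise transfer function (a linear segment of slope 4.5 near black and a 0.45 power law above it). The Python sketch below shows how such a curve maps normalized linear light to a nonlinear signal. The function name `bt2020_oetf` is chosen here for illustration, and the formula is taken from the Recommendation rather than from the surviving text of Table 1.

```python
ALPHA = 1.0993  # 12-bit constant, as listed in Table 1
BETA = 0.0181   # breakpoint between the linear and power-law segments


def bt2020_oetf(e: float) -> float:
    """Map normalized linear scene light e (0..1) to the nonlinear signal E'."""
    if not 0.0 <= e <= 1.0:
        raise ValueError("e must lie in [0, 1]")
    if e < BETA:
        # Linear segment near black.
        return 4.5 * e
    # Power-law segment for the rest of the range.
    return ALPHA * e ** 0.45 - (ALPHA - 1.0)


for level in (0.0, 0.01, 0.18, 1.0):
    print(f"E = {level:.2f} -> E' = {bt2020_oetf(level):.4f}")
```

The two segments meet at E = β to within rounding of the constants, which is why α and β are specified as a pair.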
SHV has been designed to provide an enhanced sense of presence
for a new visual experience. This requires higher spatial resolution, which
is expressed by both the angular field of view (FOV) in degrees and the angular resolution
in pixels per degree of arc. However, the sense of presence can involve various
subjective factors, among them the sense of "being there" and the sense of "realness."
These have been identified as factors that should distinguish SHV from existing systems.
Sense of "Being There"
Subjective assessments were conducted using four images shot with
a camera angle of 60° that were presented to participants at five different
FOV angles. Each participant evaluated the degree of the sense of "being there"
from the images on a continuous scale ranging from 0 (none) to 10 (extreme).
In total, 200 participants were employed, and these were divided into five groups
of 40 participants. Each group performed the evaluation for one of the FOV angles.
As shown in Fig. 3, the results confirm that a wider FOV produces a stronger sense
of "being there."4 The sense of "being there" increases with the
FOV but saturates at an FOV of around 80°–100°. Although the
curve appears to peak at an FOV of 77°, the difference there is not statistically
significant. Another experiment performed in the same study using images obtained
with a camera angle of 100° showed a similar result. Thus, the target FOV
for SHV was set at around 80°–100°. This corresponds to a viewing
distance that is 0.75–1.00 times the picture height (0.75H–1H),
at which point people with normal visual acuity are simply unable to discern
the pixel structure.
Fig. 3: A wider horizontal angular field
of view provides a greater sense of "being there" (mean ± standard
Sense of "Realness"
Another experiment was conducted using a paired-comparison method
with images at six different angular resolutions that were presented along with
real objects. Participants chose the image that they perceived as better resembling
the real object. The experimental setup was such that the effect of factors
(e.g., binocular disparity, image size, perspective, luminance, and color)
other than the resolution on the result was minimal.
As shown in Fig. 4, the results confirmed that the spatial resolution determines
whether viewers can distinguish images from real objects. The
higher the angular resolution, the greater the sense of "realness."
However, the improvement gradually saturates at about 60 cycles per degree (cpd),
owing to the limit of human visual acuity, as mentioned above.
Fig. 4: The relationship between
angular resolution and sense of "realness" (mean ± 95% confidence
interval) is shown. The higher the angular resolution, the greater the "realness."
Spatial Sampling Parameters of Super Hi-Vision
The spatial sampling for SHV has been set to 7680 x 4320
pixels – four times that of HDTV in both the horizontal and vertical directions.
Three video systems with different spatial resolutions were compared –
a 2K system (HDTV), a 4K system, and an 8K system (SHV) – in terms of
the sense of "being there" and the sense of "realness" for a range of FOV angles
or viewing distances, as shown in Fig. 5.
It was shown that, as found previously, the sense of "being there"
is influenced by the FOV. However, the sense of "realness" is influenced not
only by the FOV, but also by the resolution; "realness" being low for low-resolution
systems at wide FOVs.
The sense of "realness" differs among the three video systems.
In Fig. 5, the angular resolution has been transformed into FOV or viewing distance for
the different spatial resolutions (see Eqs. (1) and (2) near the end of this article).
SHV can provide a strong sense of both "being there" and "realness"
for a wide range of FOVs or viewing distances. This feature of SHV is expected
to be effectively used in various viewing environments and for large, medium,
and small displays. This is in contrast to the 4K and 2K systems, which are
effective only under limited viewing conditions.
Fig. 5: Video
systems with different spatial resolutions are compared in terms of the sense
of "being there" and the sense of "realness."
Motion Blur, Stroboscopic Effect, and Flicker
Motion portrayal is characterized by the perception of motion
blur, stroboscopic effect, and flicker. These factors are influenced by temporal
video parameters, including the time aperture and frame frequency. The speeds
of moving objects in the video also influence motion portrayal. Motion blur
arises when a moving scene spreads light across multiple photosites of
the image sensor during capture, and when the image update rate and
hold time of the display interact with the motion-tracking response
of the eye. For moving images, the time aperture – that is, the capture sensor
integration time or display response hold time – affects the dynamic spatial-frequency
response, which decreases at high motion speeds. A short time aperture is required
for both cameras and displays to improve the dynamic spatial frequency response.
Several experiments have been performed to understand the relationship
between motion blur and time aperture. One was conducted to determine the quality
of still images and moving images for different time-aperture–object-speed
combinations. As shown in Fig. 6, if we assume an object speed of 30°/sec, which is typical
in HDTV programs, the time aperture should be in the range 1/200–1/300
sec. Note that only combinations of temporal aperture and object relative velocity
to the camera (and display) that gave an observer an acceptable degree of motion
blur are shown in the figure.
Fig. 6: Motion-velocity–temporal-aperture
combinations correspond to acceptable degrees of motion blur.
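As a rough way to read these numbers, the angle an object sweeps while the aperture is open can be approximated as object speed multiplied by time aperture. The Python sketch below applies that simplified relation to the 30°/sec case. The helper names are illustrative, the default of 60 pixels per degree (one pixel per arcminute) is an assumption rather than a value from the experiments, and the model ignores eye-tracking errors and display response.

```python
def blur_extent_deg(speed_deg_per_s: float, time_aperture_s: float) -> float:
    """Angle (degrees) swept by a moving object while the aperture is open."""
    return speed_deg_per_s * time_aperture_s


def blur_extent_pixels(speed_deg_per_s: float, time_aperture_s: float,
                       pixels_per_degree: float = 60.0) -> float:
    """Same sweep expressed in pixels; pixels_per_degree is an assumed value."""
    return blur_extent_deg(speed_deg_per_s, time_aperture_s) * pixels_per_degree


SPEED = 30.0  # degrees per second, the object speed quoted in the text
for denominator in (60, 200, 240, 300):
    aperture = 1.0 / denominator
    print(f"1/{denominator} s: "
          f"{blur_extent_deg(SPEED, aperture):.3f} deg, "
          f"{blur_extent_pixels(SPEED, aperture):.1f} px")
```

Under these assumptions, shortening the aperture from 1/60 sec to the 1/200–1/300 sec range cuts the swept angle by a factor of roughly three to five.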
The time aperture can be shortened by increasing the frame frequency.
Alternatively, the same effect can be achieved by using a shutter in the camera
or by inserting black frames on the display without changing the frame frequency.
However, these techniques may result in the degradation of the picture quality
(called the stroboscopic effect or jerkiness), leading to motion being seen
as a series of snapshots.
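The trade-off between the two approaches can be made concrete by treating the time aperture as the open fraction of each frame period, that is, aperture ratio divided by frame frequency. The Python sketch below uses that relation to show what shutter or black-insertion ratio a given frame frequency would need to reach a target aperture. The function names are illustrative, and the relation is a simplification that treats camera shutters and display hold times alike.

```python
def time_aperture_s(aperture_ratio: float, frame_hz: float) -> float:
    """Time aperture in seconds for a given open fraction of the frame period."""
    return aperture_ratio / frame_hz


def required_aperture_ratio(target_aperture_s: float, frame_hz: float) -> float:
    """Open fraction of the frame period needed to reach the target aperture."""
    ratio = target_aperture_s * frame_hz
    if ratio > 1.0:
        raise ValueError("target aperture is longer than one frame period")
    return ratio


TARGET = 1.0 / 240  # seconds, inside the 1/200-1/300 sec range discussed above
for hz in (60, 120, 240):
    print(f"{hz} Hz needs an aperture ratio of "
          f"{required_aperture_ratio(TARGET, hz):.2f}")
```

A lower frame frequency must leave a larger part of each frame period dark to reach the same aperture, which is the condition under which motion starts to look like a series of snapshots.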
The subjective picture quality was investigated in the presence
of the stroboscopic effect for varying frame frequencies using a fixed time
aperture of 1/240 sec to determine an observer threshold for smooth motion that
provides an acceptable motion blur.7 As shown in Fig. 7, the results suggest that
a frame frequency greater than 100 Hz
is required for acceptable quality.
Fig. 7: Picture
quality in the presence of the stroboscopic effect is demonstrated at different
frame frequencies (mean ± standard error).
Flicker is a commonly encountered annoyance in moving pictures.
A wide FOV on a large screen increases the perception of flicker because human
eyes are more sensitive to flicker in peripheral vision. A short hold time on
a hold-type display may also increase the perception of flicker. A plot of critical
fusion frequencies (CFFs) for two different FOVs at a 30% time aperture,8
shown in Fig. 8, confirms that a frame frequency greater than 80 Hz is required
for a wide FOV system.
Fig. 8: This
plot shows critical fusion frequencies vs. horizontal field-of-view angle
(mean ± standard deviation).
Temporal Sampling Parameters for Super Hi-Vision
Taken together, these results suggest that the frame frequency
of SHV should be at least 120 Hz to achieve a worthwhile improvement in motion
portrayal. Naturally, a higher frame frequency would provide better quality,
but the improvement tends to saturate. (This is assuming display technology
capable of higher frame rates; for example, current digital cinemas use DMD
projectors that are capable of operating at several times higher frame rates.)
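One way to read this conclusion is as the smallest frame frequency that exceeds both experimental thresholds (100 Hz for the stroboscopic effect, 80 Hz for flicker) while remaining an integer multiple of the HDTV rate, as required for compatibility. The Python sketch below illustrates that reading. The 60-Hz base rate and the function name are assumptions made here, and this is not presented as the authors' own derivation.

```python
import math


def min_compatible_frame_rate(thresholds_hz, base_hz: float = 60.0) -> float:
    """Smallest integer multiple of base_hz strictly above every threshold."""
    needed = max(thresholds_hz)
    multiple = math.floor(needed / base_hz) + 1
    return multiple * base_hz


# Thresholds quoted in the text: stroboscopic effect and flicker.
print(min_compatible_frame_rate([100.0, 80.0]))
```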
Discontinuities in tone reproduction, which usually occur as contouring
artifacts, should be avoided. This means that the quantization characteristics,
particularly the bit depth, should be set so that the modulation corresponding
to a difference of one code value between adjacent image areas cannot be
discerned. The contrast sensitivity was measured in a dim environment with
the modulation transfer characteristics of a gamma 1/2.4 transfer function for
10-, 11-, and 12-bit depths, as shown in Fig. 9.
Fig. 9: Modulation
threshold and minimum modulation are shown for different bit depths.
The contrast sensitivity is based on Barten's equation,9
which has been used to determine the bit depth for D-Cinema distribution.
The 11- and 12-bit encoding modulation lines lie below the visual
modulation threshold for the entire luminance range, so these bit depths do not show contouring.
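The modulation produced by a one-code-value step can be sketched numerically. The Python example below assumes an idealized display with a pure power-law response of gamma 2.4 and zero black level, and uses the Michelson contrast between the luminances of adjacent codes. Both choices are assumptions made for illustration. The visual threshold from Barten's equation is not modeled, so the sketch only shows how the step modulation shrinks as bit depth grows.

```python
def step_modulation(code: int, bit_depth: int, gamma: float = 2.4) -> float:
    """Michelson contrast between the luminances of code and code + 1."""
    max_code = 2 ** bit_depth - 1
    if not 0 <= code < max_code:
        raise ValueError("code must lie in [0, max_code - 1]")
    low = (code / max_code) ** gamma          # relative luminance of code
    high = ((code + 1) / max_code) ** gamma   # relative luminance of code + 1
    return (high - low) / (high + low)


# Compare the three bit depths at the same relative signal level (about 50%).
for bits in (10, 11, 12):
    mid_code = (2 ** bits - 1) // 2
    print(f"{bits}-bit: modulation = {step_modulation(mid_code, bits):.6f}")
```

Each additional bit roughly halves the step modulation at a given signal level, which is consistent with the deeper encodings falling below a fixed visual threshold.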
Real objects can have highly saturated colors that are beyond
the color gamut of HDTV (ITU-R Rec. BT.709). Consumer-level flat-panel displays
are quickly becoming capable of reproducing a wider range of colors; in fact,
some non-broadcast video systems already handle a wider color gamut exceeding
that of HDTV. Thus, for the observer to experience "realness" and the sense
of "being there," SHV should cover a color gamut wide enough to
encompass nearly all colors found in the natural world, and an efficient
and practical method should be devised for this.
To this end, requirements for developing a color representation
method and determining parameter values have been defined in terms of target
color, color-coding efficiency, program quality management, and feasibility
of displays. After comparing several methods for widening the color gamut in
terms of the requirements, the authors chose a colorimetry system with RGB monochromatic
primaries on the spectrum locus that can be realized, for example, by using
laser light sources in the foreseeable future.11
Note that the reference white of D65 remains unchanged. As shown in Fig. 10
and in Table 2, the wide-gamut colorimetry covers the gamuts of HDTV, the D-Cinema
reference projector, and Adobe RGB, as well as more than 99.9% of Pointer's gamut.
Experiments on the capture and display of wide color-gamut images have confirmed
the validity of the UHDTV (ITU-R Rec BT.2020) wide-gamut colorimetry, demonstrating
textures and highly saturated colors closer to those of real-world objects as
seen by observers.
Fig. 10: Pointer's colors and primaries are
shown for different video systems.
Table 2: This
table compares HDTV, Super Hi-Vision, and other systems in terms of Pointer's gamut (a
well-known definition of the gamut of real-world surface colors) and Optimal
Color (based on the color space defined by the International Commission on Illumination).
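The claim that the wide-gamut colorimetry covers the HDTV gamut can be checked geometrically on the CIE xy chromaticity diagram. The Python sketch below tests whether each BT.709 primary lies inside the BT.2020 triangle and compares the triangle areas. The chromaticity coordinates are those published in the two Recommendations, not values recovered from this article's tables, and the helper names are illustrative. Area in xy is not perceptually uniform, so the ratio is only a rough indicator.

```python
BT709 = {"R": (0.640, 0.330), "G": (0.300, 0.600), "B": (0.150, 0.060)}
BT2020 = {"R": (0.708, 0.292), "G": (0.170, 0.797), "B": (0.131, 0.046)}


def triangle_area(primaries) -> float:
    """Area of the RGB triangle in xy coordinates (shoelace formula)."""
    (x1, y1), (x2, y2), (x3, y3) = (primaries[k] for k in "RGB")
    return abs(x1 * (y2 - y3) + x2 * (y3 - y1) + x3 * (y1 - y2)) / 2.0


def _cross(origin, a, b) -> float:
    """z-component of the cross product of (a - origin) and (b - origin)."""
    return ((a[0] - origin[0]) * (b[1] - origin[1])
            - (a[1] - origin[1]) * (b[0] - origin[0]))


def contains(primaries, point) -> bool:
    """True if point lies inside or on the edge of the RGB triangle."""
    r, g, b = (primaries[k] for k in "RGB")
    signs = (_cross(r, g, point), _cross(g, b, point), _cross(b, r, point))
    return all(s >= 0 for s in signs) or all(s <= 0 for s in signs)


print(all(contains(BT2020, p) for p in BT709.values()))
print(round(triangle_area(BT2020) / triangle_area(BT709), 2))
```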
Device Development toward Full-Spec SHV
A full-spec SHV system based on these specifications is being
developed. For practical implementation, several devices have been developed
to realize full-spec SHV. On the capturing side, a camera system with the full-spec
spatial sampling points and bit depth was developed.13
This camera system consists of 33-Mpixel CMOS (complementary metal oxide semiconductor)
image sensors, a 74-Gbit/sec bandwidth transmission device, and a signal processing
unit. However, this camera system does not achieve a frame rate of 120 frames/sec
and does not have a sufficiently wide color gamut. To address the frame-rate issue, a CMOS
image sensor that can capture 120-frames/sec14
video was developed. Making the camera head more compact
for the camera operator is also a major challenge.
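The 33-Mpixel figure follows directly from the 7680 x 4320 sampling structure, and the same arithmetic gives a feel for the raw data rates involved. The Python sketch below computes the pixel count and an uncompressed active-picture bit rate. The 12-bit depth and the three full-resolution color components are assumptions made for illustration, and the result counts active pixels only, so it is not meant to reproduce the 74-Gbit/sec figure quoted for the transmission device.

```python
def pixels_per_frame(width: int = 7680, height: int = 4320) -> int:
    """Number of spatial sampling points in one frame."""
    return width * height


def uncompressed_bit_rate(width: int, height: int, fps: int,
                          bit_depth: int, components: int = 3) -> int:
    """Active-picture payload in bits per second, without blanking or overhead."""
    return width * height * fps * bit_depth * components


print(f"{pixels_per_frame() / 1e6:.1f} Mpixels per frame")
for fps in (60, 120):
    rate = uncompressed_bit_rate(7680, 4320, fps, bit_depth=12)
    print(f"{fps} frames/sec: {rate / 1e9:.1f} Gbit/sec")
```

Doubling the frame rate to 120 frames/sec doubles the payload, which is one reason the frame-rate target places demands on the sensor, interface, and display alike.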
On the display side, an LCoS (liquid-crystal-on-silicon)
projector with a resolution of 7680 x 4320 pixels was developed. In 2011,
a liquid-crystal display (LCD) with the same pixel count, called a full-resolution
LCD, was also developed; its details are shown in Table 3.
These display systems also do not achieve 120 frames/sec and do
not have a wide color gamut; both are the next development goals.
Table 3: Details of a full-resolution LCD developed in 2011. Recoverable entry:
|Bit depth per color
Closer to "Being There"
Video parameter values for SHV have been established, with the
aim of delivering an enhanced, or even unprecedented, viewing experience to
viewers in various environments. Some parameters contribute to an increased
sense of "being there" and to the sense of "realness," while others help improve
the picture quality by eliminating artifacts in motion portrayal and tone reproduction.
Feasibility is also an important factor in determining the parameter values.
For further reading, Hiroyasu Masuda's article, "Ultrahigh-Definition Content
Production Techniques and Their Broadcasting Applications," can be found in
the November/December 2011 SMPTE Motion Imaging Journal.
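The viewing distance D (in picture heights, H) and the FOV θ (°) are
written as follows:
$$D = \frac{1}{V \tan\left(\dfrac{1}{2R}\right)}, \tag{1}$$
$$\theta = 2 \tan^{-1}\left(\frac{8}{9D}\right), \tag{2}$$
where V is the number of vertical pixels and R (cpd)
is the angular resolution at the center of the screen with an aspect ratio of 16:9.
The angle 1/(2R), in degrees, is the angle subtended by one pixel, since one cycle spans two pixels.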
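Equations (1) and (2) can be evaluated directly to see how the three systems compare. The Python sketch below reads the tangent argument in Eq. (1) as an angle in degrees and assumes an angular resolution of 30 cpd (one pixel per arcminute). That resolution is an illustrative choice rather than a value fixed by the equations, and the function names are hypothetical.

```python
import math


def viewing_distance_h(vertical_pixels: int, resolution_cpd: float) -> float:
    """Eq. (1): viewing distance in picture heights for a given resolution."""
    pixel_angle_deg = 1.0 / (2.0 * resolution_cpd)  # one cycle spans two pixels
    return 1.0 / (vertical_pixels * math.tan(math.radians(pixel_angle_deg)))


def fov_deg(distance_h: float) -> float:
    """Eq. (2): horizontal field of view in degrees for a 16:9 screen."""
    return 2.0 * math.degrees(math.atan(8.0 / (9.0 * distance_h)))


for name, lines in (("2K", 1080), ("4K", 2160), ("8K", 4320)):
    d = viewing_distance_h(lines, 30.0)
    print(f"{name}: D = {d:.2f} H, FOV = {fov_deg(d):.1f} deg")
```

At 30 cpd the 8K system reaches a viewing distance of about 0.8H, and Eq. (2) maps the 0.75H–1H range onto an FOV of roughly 100° down to 83°, in line with the 80°–100° target discussed in the text.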
1. M. Kanazawa, K. Mitani, K. Hamasaki,
M. Sugawara, F. Okano, K. Doi, and M. Seino, "Ultrahigh-Definition Video System
With 4000 Scanning Lines," Proc. IBC 2003, 321–329 (2003).
2. K. Hamasaki et al., "The 22.2
Multichannel Sound System and Its Application," Proc. 118th AES Convention,
paper No. 6406 (May 2006).
3. CIE: Commission internationale de
l'Eclairage proceedings (Cambridge University Press, Cambridge, 1932).
4. K. Masaoka, M. Emoto, M. Sugawara,
and Y. Nojiri, "Contrast Effect in Evaluating the Sense of Presence for Wide
Displays," J. Soc. Info. Display 14, 785–791 (Sept. 2006).
5. K. Masaoka, Y. Nishida, M. Sugawara,
E. Nakasu, and Y. Nojiri, "Sensation of Realness from High-Resolution Images
of Real Objects," IEEE Trans. Broadcast 58 (2012) (to be published).
6. K. Omura, M. Sugawara, and Y. Nojiri,
"Evaluation of Motion Blur by Comparison with Still Picture," presented at the
IEICE General Convention, DS-3-3, pp. S-5–6 (2008).
7. K. Omura, M. Sugawara, and Y. Nojiri,
"Subjective Evaluation of Motion Jerkiness for Various Frame Rates and Aperture
Ratios," IEICE Technical Report IE2008-205, pp. 7–11 (2009).
8. M. Emoto and M. Sugawara, "Critical
Fusion Frequency for Bright and Wide Field-of-View Image Display," J. Display
9. P. G. J. Barten, Contrast Sensitivity
of the Human Eye and its Effects on Image Quality, SPIE Optical Engineering
Press (Bellingham, WA, 1999).
10. M. Cowan, G. Kennel, T. Maier,
and B. Walker, "Contrast Sensitivity Experiment to Determine the Bit Depth for
Digital Cinema," SMPTE Mot. Imag. J. 113, 281–292 (Sept. 2004).
11. K. Masaoka, Y. Nishida, M. Sugawara,
and E. Nakasu, "Design of Primaries for a Wide-Gamut Television Colorimetry,"
IEEE Trans. Broadcast 56, 452–457 (Dec. 2010).
12. K. Masaoka, K. Omura, Y. Nishida,
M. Sugawara, Y. Nojiri, E. Nakasu, S. Kagawa, A. Nagase, T. Kuno, and H. Sugiura,
"Demonstration of a Wide-Gamut System Colorimetry for UHDTV," Proceedings
of ITE Annual Convention 2010, Matsuyama, Japan, paper 6-2 (2010).
13. T. Yamashita, R. Funatsu, T. Yanagi,
K. Mitani, Y. Nojiri, and T. Yoshida, "A Camera System Using Three 33-Mpixel
CMOS Image Sensors for UHDTV2," presented at the SMPTE Annual Technical Conference
& Expo, Hollywood, CA (Oct. 2010).
14. K. Kitamura, T. Watabe, Y. Sadanaga,
T. Sawamoto, T. Kosugi, T. Akahori, T. Iida, K. Isobe, T. Watanabe, H. Shimamoto,
H. Ohtake, S. Aoyama, S. Kawahito, and N. Egami, "A 33 Mpixel, 120 fps CMOS
Image Sensor for UDTV Application with Two-Stage Column-Parallel Cyclic ADCs,"
Proceedings of IISW 2011, Hokkaido, Japan, R62 (June 2011).
"Pointer's gamut" refers to work
by Dr. Michael R. Pointer, whose frequently cited 1980 paper in Color Research
and Application, "The Gamut of Real Surface Colors," considered the gamut of
real surface colors in the CIE 1976 L*u*v* and L*a*b* color spaces for a typical
dye set used in photographic paper and typical CRT displays, as opposed to the
wider gamut that the human eye is capable of viewing, as described by the MacAdam
Limits. (Useful Color Data, Munsell Color Science Laboratory, http://www.cis.rit.edu/mcsl/online/cie.php
and R. Heckaman and M. Fairchild, "G0 and the Gamut of Real Objects,"
Munsell Color Science Laboratory, Rochester Institute of Technology.)
Takayuki Yamashita
is with the NHK Science and Technology Research Laboratories, Tokyo, Japan,
and Hiroyasu Masuda
is with the Japan Broadcasting Corporation (NHK), Tokyo, Japan. Kenichiro
Masaoka, Kohei Omura, Masaki Emoto, Yukihiro Nishida, and Masayuki
Sugawara are with the NHK Science and Technology Research Laboratories,
Tokyo, Japan. This article is based on a paper presented at the SMPTE 2011 Annual
Technical Conference & Exhibition, 25–27 October 2011. Copyright ©
2012 by SMPTE.