matt's Soup!

Tuesday, June 14, 2016

What is Ultra HD Premium?

We have seen how dramatically entertainment and home cinema technology has changed. The latest range of televisions, touted as the best ever, carries the Ultra HD Premium badge. Why is it a big deal? The badge, they say, says it all. The industry loves dressing up its latest lineup with a badge or two, and these badges are meant to set the industry’s best apart from the rest. Every badged product goes through a certification process, which gives the buyer a basic idea of what to expect from the product and how it will perform under typical conditions. You may even see some content stamped with specific approval.

As mentioned earlier, Ultra HD Premium is the newest addition to the lot, and qualifying products will carry the Ultra HD Premium badge.

According to pocket-lint.com:

“Panasonic recently revealed that its DX902 4K TV will carry the badge. And its UB900 Blu-ray player, now available in the UK, is the first device other than a television to get the stamp of approval.”

Wondering what 4K HDR means and how it differs from other 4K devices? Let’s decode it here.

So, What is Ultra HD Premium?

Most of us have seen this badge on television sets and even on UHD Blu-ray. What does it mean, though? The badge means that the product meets the UHD Alliance’s specification for what it considers a best-of-the-best audio-visual experience.

What About the Resolution?

Ultra HD, UHD or 4K: what is the difference? In practice, there is none. 4K products and videos are presented at a resolution of 3840 x 2160 pixels (2160p), and that goes for any 4K product. 4K products today fall under the broader term UHD or Ultra HD. Ultra HD Premium, however, confirms that the product or content meets high-quality color, HDR (high dynamic range) and audio standards in addition to 4K resolution.

What About the Color?

A 10-bit color depth is the minimum for all these products, and the quality of content is certified by the badge it carries. However, although a much wider palette is possible, color information delivered at 8-bit or 10-bit depth does not mean every display can present it. This is where the color gamut comes in. A color gamut diagram is a visual representation of the full color spectrum and of the portion of it that a display can reproduce.

Pocket-lint.com says:

“BT.2020 (also known as Rec. 2020) colour representation is the standard, but – and perhaps a little confusingly – different product types only need to display a given percentage of that gamut to achieve the Ultra HD Premium badge. A TV, for example, needs to display at least 90 per cent of P3 colours, while a mastering display must display a minimum of 100 per cent.”

What About the Dynamic Range?

OLED and LCD panels handle brightness and black levels quite differently, so a product can earn the Ultra HD Premium badge for HDR in one of two ways. The first, which suits LED-backlit LCD sets, requires a peak brightness of 1,000 nits with a black level of less than 0.05 nits (or 0.03 nits for mastering displays). The second, which suits OLED, requires a peak brightness of 540 nits with a black level of less than 0.0005 nits.
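The two brightness and black-level combinations above can be sketched as a simple check. This is only an illustration: the function name is hypothetical, treating the peak-brightness figures as inclusive minimums is an assumption, and actual certification covers more than these two numbers.

```python
def meets_uhd_premium_hdr(peak_nits, black_nits):
    """Return True if either HDR combination quoted in the text is satisfied."""
    # Combination 1 (suits LED-backlit LCD): very bright, moderately deep blacks.
    lcd_style = peak_nits >= 1000 and black_nits < 0.05
    # Combination 2 (suits OLED): lower peak brightness, much deeper blacks.
    oled_style = peak_nits >= 540 and black_nits < 0.0005
    return lcd_style or oled_style


print(meets_uhd_premium_hdr(1100, 0.04))    # bright LCD-like panel
print(meets_uhd_premium_hdr(600, 0.0003))   # OLED-like panel
print(meets_uhd_premium_hdr(600, 0.04))     # neither combination is met
```

The third call fails both tests, which shows why a set can be 4K and HDR-capable without qualifying for the badge.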

And the Ubiquity?

So, as we said, a product with an Ultra HD Premium badge promises to deliver a premium experience. However, it is important to note that there is still a fair amount of choice and difference in potential between products. For instance, some TVs will be brighter and some will have deeper black levels, and both will still qualify.

We believe that the Ultra HD Premium badge will eventually become ubiquitous, as manufacturers strive to achieve desired standards for their 4K products.

From a customer’s point of view, the badge will definitely help in choosing a premium television set. Having said that, some manufacturers have chosen not to have their products certified, even if they do meet the Ultra HD Premium criteria. Although a standardized naming convention will help consumers and manufacturers alike in positioning products, some manufacturers, such as Sony, have decided to stick to their own naming conventions.

We are however, warming up to the idea of an Ultra HD Premium badge!

Thursday, May 19, 2016

Intel VCA Powers ATEME Compression Engine

ATEME uses Intel's VCA PCIe card to power Titan live encoding

ATEME has announced the availability of its new high video quality and high-density transcoding solution based on the Intel Visual Compute Accelerator (Intel VCA) card.

This new solution brings together the power of ATEME’s TITAN Live video compression platform with the integrated CPU and graphics processing capabilities of the Intel VCA — enabling ATEME to increase channel density by more than 10 × over platforms without Intel graphics capabilities, while maintaining high video quality. The result is a significant reduction in total-cost-of-ownership per transcode channel.

Based on the ATEME fifth-generation STREAM compression engine, this future-proof solution utilizes the full capabilities of Intel’s technology with support for H.264 (AVC) and H.265 (HEVC), supporting resolutions up to 4K-UHD.

HEVC Video Codecs Comparison

Free Version	Free 4K video Version	Pro Version (Enterprise)	Pro+ Version (Enterprise + 4K video analysis)
Objective Metrics	SSIM		SSIM, PSNR
Different types of analysis	Encoding quality, encoding speed, bitrate handling, speed/quality analysis etc.		Encoding quality, encoding speed, bitrate handling, speed/quality analysis etc. (some graphs)
ColorPlanes	Y		Y, U, V and overall
Graphs	Some graphs		All the graphs for all the metrics, codecs and presets
Test video sequences	20 HD video (only description)	11 4K video (only description)	20 HD video (available for download)	20 HD video (available for download) + 11 4K video
Hardware used for analysis	Desktop and Server configurations	Desktop configurations	Desktop and Server configurations
Tested use cases	3 different use cases: Fast Transcoding, Universal and Ripping (some graphs)	1 use case: 10 fps encoding (some graphs)	3 different use cases: Fast Transcoding, Universal and Ripping	3 different use cases: Fast Transcoding, Universal and Ripping + 1 use case: 10 fps encoding
Number of figures	29	11	5000+	5500+
Price	Free		$850	$995
We can help you to analyze your codec

Video Codecs that Were Tested

HEVC

f265 H.265 Encoder

Intel MSS HEVC GAcc

Intel MSS HEVC Software

Ittiam HEVC Hardware Encoder

Ittiam HEVC Software Encoder

Strongene Lentoid HEVC Encoder

SHBP H.265 Real time encoder

x265

Non HEVC

InTeleMax TurboEnc

SIF Encoder

VP9 Video Codec

x264

Overview

Objectives and Testing Tools

HEVC Codec Testing Objectives

The main goal of this report is to present a comparative evaluation of the quality of new HEVC codecs and codecs of other standards using objective measures of assessment. The comparison was done using settings provided by the developers of each codec. Nevertheless, we required all presets to satisfy the minimum speed requirement of the particular use case. The main task of the comparison is to analyze different encoders for the task of transcoding video, e.g., compressing video for personal use.

HEVC Codec Testing Tools

The comparison was performed on two platforms:

Desktop—Core i7 4770R @3.9 GHz, RAM 4 GB, Windows 8.1

Server—Xeon E5 2697v3, RAM 64 GB, Windows Server 2012

For both platforms we considered three key use cases with different speed requirements.

Desktop

Ripping—no minimum speed
Universal—minimum 10 FPS
Fast transcoding—minimum 30 FPS

Server

Ripping—no minimum speed
Universal—minimum 30 FPS
Fast transcoding—minimum 60 FPS
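The speed requirements listed above can be expressed as a small lookup that tells which use cases a preset is fast enough for. This is a minimal sketch: the dictionary layout and the function name are hypothetical, and only the FPS limits come from the lists above.

```python
from typing import List

# Minimum encoding speed (frames per second) per platform and use case.
MIN_FPS = {
    "desktop": {"ripping": 0, "universal": 10, "fast transcoding": 30},
    "server": {"ripping": 0, "universal": 30, "fast transcoding": 60},
}


def qualifying_use_cases(platform: str, measured_fps: float) -> List[str]:
    """List the use cases whose minimum speed a preset satisfies."""
    limits = MIN_FPS[platform]
    return [name for name, fps in limits.items() if measured_fps >= fps]


print(qualifying_use_cases("desktop", 12))
print(qualifying_use_cases("server", 12))
```

A preset measured at 12 FPS is acceptable for Ripping and Universal on the desktop platform but only for Ripping on the server platform, where the thresholds are higher.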

Overall Conclusions

Overall, the leaders in this comparison are x265, Intel MSS HEVC and x264. Here are some overall graphs from the report:

Speed/Quality trade-off for Ripping use-case (Y-SSIM metric)

Average bitrate for Fast transcoding use-case (Y-SSIM metric)

Professional Versions of Comparison Report

HEVC Comparison Report Pro 2015 version contains:

Additional objective metrics (PSNR, SSIM)

All metrics results for all colorplanes (Y,U,V and overall)

Results for all the sequences, codecs and presets used in comparison

Many more figures

etc.

Acknowledgments

The Graphics & Media Lab Video Group would like to express its gratitude to the following companies for providing the codecs and settings used in this report:

InTeleMax, Inc.

Intel Corporation

Ittiam Systems (P) Ltd.

Strongene Ltd.

System house “Business partners” company

SIF Encoder developer team

The WebM Project team

x264 developer team

MulticoreWare, Inc.

The Video Group would also like to thank these companies for their help and technical support during the tests.

Thanks

Special thanks to the following contributors of our previous comparisons

Codec Analysis and Tuning for Codec Developers and Codec Users

Computer Graphics and Multimedia Laboratory of Moscow State University:

10 years working in the area of video codec analysis and tuning using objective quality metrics and subjective comparisons.

20+ reports of video codec comparisons and analysis (H.264, MPEG-4, MPEG-2, decoders’ error recovery).

Development of methods and algorithms for codec comparison and analysis, including analysis of individual codec features and options.

We can perform the following tasks for codec developers and codec users.

Strong and Weak Points of Your Codec

Deep encoder parts analysis (ME, RC on GOP, mode decision, etc).

Weak and strong points for your encoder and complete information about encoding quality on different content types.

Encoding quality improvement through pre- and post-filtering (including technology licensing).

Independent Codec Estimation Comparing to Other Codecs for Different Use-cases

Comparative analysis of your encoder and other encoders.

We have direct contact with many codec developers.

You will learn where your encoder stands among the newest well-known encoders (in encoding quality, speed, bitrate handling, etc.).

Encoder Features Implementation Optimality Analysis

We analyze the effectiveness of encoder features (speed/quality trade-off), which can lead to an increase of up to 30% in the speed/quality characteristics of your codec. We can help you tune your codec and find the best encoding parameters.



Facebook Live Video! watch the world in real time

Launch it! https://www.facebook.com/livemap/?ref=bookmarks

Video Quality Evaluation Methodology and Verification Testing of HEVC Compression Performance

Intro

The High Efficiency Video Coding (HEVC) standard (ITU-T H.265 and ISO/IEC 23008-2) has been developed with the main goal of providing significantly improved video compression compared with its predecessors. In order to evaluate this goal, verification tests were conducted by the Joint Collaborative Team on Video Coding of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29. This study presents the subjective and objective results of a verification test in which the performance of the new standard is compared with its highly successful predecessor, the Advanced Video Coding (AVC) video compression standard (ITU-T H.264 and ISO/IEC 14496-10). The test used video sequences with resolutions ranging from 480p up to ultra-high definition, encoded at various quality levels using the HEVC Main profile and the AVC High profile. In order to provide a clear evaluation, this paper also discusses various aspects for the analysis of the test results. The tests showed that bit rate savings of 59% on average can be achieved by HEVC for the same perceived video quality, which is higher than a bit rate saving of 44% demonstrated with the PSNR objective quality metric. However, it has been shown that the bit rates required to achieve good quality of compressed content, as well as the bit rate savings relative to AVC, are highly dependent on the characteristics of the tested content.

We are currently witnessing something that has become a once-in-a-decade event in the world of video compression: the emergence of a major new family of video compression standards. The mid-1990s saw the introduction of the Moving Picture Experts Group (MPEG)-2 video coding standard (ITU-T Rec. H.262 and ISO/IEC 13818-2 [1]), the first compression standard to be widely adopted in broadcasting and entertainment applications. Advanced Video Coding (AVC) (ITU-T Rec. H.264 and ISO/IEC 14496-10 [2]) appeared in the mid-2000s, offering the same subjective quality at approximately half the bit rate. Now, a new standard, High Efficiency Video Coding (HEVC) (ITU-T Rec. H.265 and ISO/IEC 23008-2), has been developed that promises a further factor of two improvement in compression efficiency for the mid-2010s [3].

The HEVC standard has been jointly developed by the same two standardization organizations whose previous collaboration resulted in both MPEG-2 and AVC: 1) the ISO/IEC MPEG and 2) the ITU-T Video Coding Experts Group (VCEG), through the Joint Collaborative Team on Video Coding (JCT-VC) [4]. HEVC version 1 was ratified in 2013 as H.265 by the ITU-T and as MPEG-H Part 2 by ISO/IEC [5]. This first version supports applications that use conventional (single-layer) encoding of 4:2:0-sampled video with 8- or 10-bit precision. A second edition was completed in July 2014, and an additional extension was completed in February 2015 [6]. These extend the standard to support contribution applications with tools that enable 4:2:2- and 4:4:4-sampled video formats as well as 12- and 16-bit precision [7], and multilayer coding enhancements for efficient scalability [8] and stereo/multiview and depth-enhanced 3D compression [9]. A further amendment is currently being developed to enable more efficient coding of screen-captured graphics and text content and mixed-source content [10].

A few evaluations have previously been reported comparing the compression performance of HEVC with AVC and also demonstrating the suitability of HEVC for various applications, particularly including some evaluations for high-resolution video content [11]. The subjective test results for a number of test sequences and some analysis of the bit rate savings at different quality levels were presented in [12]. In [13], a study on the suitability of HEVC for beyond-HDTV broadcast services was presented, but with no comparison with previous standards. A comparison of HEVC and AVC performance for frame rates up to 30 Hz on a small number of ultra high definition (UHD) video sequences was presented in [14] and [15], and an informal study focused on low-delay (LD) applications and real-time encoding with HD resolutions was presented in [16]. All of these, including some that were carried out at the early stages of HEVC development, have provided very consistent evidence of the substantial coding efficiency improvements enabled by HEVC.

For its formal evaluation of HEVC performance, the JCT-VC performed a subjective evaluation on a wider range of content of resolutions varying from 480p (832 × 480) up to UHD with frame rates of up to 60 Hz. The test sequences used for the verification testing were deliberately chosen to be different from those that had been used during the development of the standard, to avoid any possible bias that the standard could have toward those sequences. A test report was produced for the JCT-VC itself [17], and some additional analysis of HEVC performance for UHD content has been presented in [18] (using cropped 2560×1600 regions) and [19] (a recent brief conference publication about the testing).

As video coding standards generally specify only the format of the data and the associated decoding process without specifying how encoding is to be performed, it is not possible in general to test the compression performance of a standard. Some particular encoding methods must be used as a proxy to represent the capability of the standard instead. The outcome of such a comparison is generally more reliable when similar encoding techniques with similar configurations are applied in the compared encoders rather than simply comparing unknown technologies as black boxes. The verification tests were therefore performed using reference software encoders that had been developed in the standardization work and used very similar encoding algorithms and configurations that were selected to represent important applications. These publicly available reference software codebases are known as the HEVC model (HM) for HEVC [20] and the joint model (JM) for AVC [21].

While this study reports an extended set of results of HEVC verification tests, it also summarizes the details of tools that can be used in the analysis of the results, pointing to a number of factors an evaluation should consider. Compared with the initial results [17], those presented in this study are based on the use of more viewers for subjective tests, the objective results are presented and compared with the subjective results, and additional analysis of the coding gains versus the bit rate is provided. Ultimately, the verification test showed that the key objective of HEVC had been achieved, i.e., providing a substantial improvement in compression efficiency relative to its predecessor AVC.

This study is organized as follows. An overview of video quality evaluation and statistical analysis methodology is presented in Section II, and the test settings used in the subjective evaluations are detailed in Section III. Section IV presents the test results and detailed analysis. Finally, the conclusion is given in Section V.

II. VIDEO QUALITY EVALUATION

This section provides an overview of the objective and subjective video quality metrics and the related analysis used in this study.

For the convenience of video coding performance assessment, the most commonly used objective metric is peak signal-to-noise ratio (PSNR). However, it is commonly acknowledged that PSNR has the disadvantages of disregarding the viewing conditions and the characteristics of human visual system perception. In addition, the PSNR for a given video sequence can be computed in different ways, depending on how the picture components (e.g., luma and chroma) or individual picture PSNR values are combined. Nevertheless, for a particular content item and small variations of coding conditions, the changes in PSNR values for an overall video sequence can typically be reliably interpreted.

Other objective video quality metrics, such as the structural similarity index (SSIM) and video quality metric (VQM), have been proposed, but are not used nearly as frequently as PSNR [22]. VQM is not often used, primarily due to its computational complexity, and for both metrics, the interpretation of the values they provide has not yet become a common practice in the video coding community. Therefore, in the context of HEVC and AVC compression, this paper provides comparisons and analysis using the PSNR objective measure and subjective quality evaluation results.

Subjective quality evaluation is the process of employing human viewers for grading video quality based on individual perception. Formal methods and guidelines for subjective quality assessments are specified in various ITU recommendations. Among the many of these, the most relevant to this context are ITU-T Rec. P.910 [23], which defines subjective video quality assessment methods for multimedia applications, and ITU-R Rec. BT.500 [24], which defines a methodology for the subjective assessment of the quality of television pictures. These specifications describe a number of test methods with distinct presentation and scoring schemes, along with the recommended viewing conditions. Explanations of the quality metrics and data analysis methods are provided in the following sections.

A. Objective Quality Evaluation Using PSNR

PSNR is defined as the ratio between the maximum possible power of the signal (the original image) and the power of noise, which in the considered scenario is introduced by lossy compression. For a decoded image component $I_d$, the mean square error (MSE) with reference to an original image component $I$ is computed as

$$\mathrm{MSE} = \frac{1}{MN} \sum_{x=1}^{M} \sum_{y=1}^{N} \bigl( I(x, y) - I_d(x, y) \bigr)^2 \tag{1}$$

where $M$ and $N$ are the width and height of the image component, and the image component is, for example, an array of luma samples or $C_B$ or $C_R$ chroma samples. The PSNR value is then computed as

$$\mathrm{PSNR} = 10 \log_{10} \frac{(2^B - 1)^2}{\mathrm{MSE}} \tag{2}$$

where $B$ is the bit depth of image samples. This is typically calculated for each frame separately, and then averaged for the frames of a video sequence. Due to the logarithmic transformation, this corresponds to using the geometric average of frame MSEs, and the impact of this should be critically considered when a high fluctuation over frames is present.
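The per-frame computation and the sequence-level averaging can be sketched in a few lines of pure Python. The sketch assumes the usual definitions (MSE as the mean of squared sample differences, PSNR as 10·log10 of the squared peak value over the MSE); the function names and the tiny 2×2 "frames" are hypothetical and serve only to make the geometric-average remark concrete.

```python
import math


def mse(original, decoded):
    """Mean square error between two equally sized 2-D sample arrays."""
    height, width = len(original), len(original[0])
    total = 0
    for row_o, row_d in zip(original, decoded):
        for o, d in zip(row_o, row_d):
            total += (o - d) ** 2
    return total / (width * height)


def psnr(original, decoded, bit_depth=8):
    """PSNR in dB for one image component; infinite for identical images."""
    peak = (1 << bit_depth) - 1
    err = mse(original, decoded)
    if err == 0:
        return math.inf
    return 10 * math.log10(peak ** 2 / err)


def sequence_psnr(originals, decodeds, bit_depth=8):
    """Arithmetic mean of the per-frame PSNR values."""
    values = [psnr(o, d, bit_depth) for o, d in zip(originals, decodeds)]
    return sum(values) / len(values)


frame = [[10, 20], [30, 40]]
off_by_1 = [[11, 21], [31, 41]]     # frame MSE = 1
off_by_10 = [[20, 30], [40, 50]]    # frame MSE = 100

avg = sequence_psnr([frame, frame], [off_by_1, off_by_10])
# The same value follows from the geometric mean of the frame MSEs, sqrt(1 * 100) = 10.
from_geometric_mean = 10 * math.log10(255 ** 2 / 10)
print(round(avg, 4), round(from_geometric_mean, 4))
```

Averaging the two MSEs arithmetically (50.5) would give a noticeably lower PSNR, which is why large frame-to-frame fluctuations deserve a closer look.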

For video sequences, which ordinarily consist of three color components, either the luma PSNR value ($\mathrm{PSNR}_Y$), calculated using only luma component values, may be reported or a weighted PSNR value ($\mathrm{PSNR}_W$) using all three components can be computed using some weighting criteria. An example of a popular weighting for content with 4:2:0 sampling is

$$\mathrm{PSNR}_W = \frac{6\,\mathrm{PSNR}_Y + \mathrm{PSNR}_{C_B} + \mathrm{PSNR}_{C_R}}{8} \tag{3}$$

The most accurate interpretation of the objective results is obtained by looking at the frame-by-frame results for each component. However, this may not be practical for the final presentation of the results for a large data set and a large number of test points.
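A weighted PSNR of this kind reduces to a weighted mean of the three component values. In the sketch below the default weights (6, 1, 1) are an assumption, chosen because a 6:1:1 luma-to-chroma weighting is commonly quoted for 4:2:0 content; the function name and the sample values are hypothetical.

```python
def weighted_psnr(psnr_y, psnr_cb, psnr_cr, weights=(6, 1, 1)):
    """Weighted mean of luma and chroma PSNR values (weights are an assumption)."""
    w_y, w_cb, w_cr = weights
    total = w_y * psnr_y + w_cb * psnr_cb + w_cr * psnr_cr
    return total / (w_y + w_cb + w_cr)


print(weighted_psnr(38.0, 42.0, 43.0))
```

Because luma dominates the weighting, the result stays close to the luma PSNR even when the chroma components score several dB higher.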

B. Subjective Quality Evaluation

For the HEVC verification test that includes a wide range of visual quality points, a degradation category rating (DCR) [23] test method was selected. For this purpose, it was used to evaluate the quality (and not the impairment) with a quality rating scale made of 11 levels [23], ranging from 0 (lowest quality) to 10 (highest quality), which may be interpreted as in Table I. The numerical scale helps avoid misinterpretations associated with the use of category adjectives (e.g., excellent or good), especially in cases where the tests are performed across different countries and including nonnative English speakers.

The basic results of the subjective test are evaluated in terms of the average rating, which is called the mean opinion score (MOS), and the associated confidence interval values that are computed for each coding point, after having verified the reliability of each viewer. For the DCR method, it is recommended to hire more than 15 naive viewers that have been properly screened for visual acuity and color blindness, to allow for an accurate statistical analysis of the subjective scores [24].

From the raw data, i.e., the individual subjective scores, the reliability of each viewer is calculated. The individual reliability is evaluated using the correlation coefficient $r$ computed between each score $x_i$ provided by a viewer and the overall MOS value $y_i$ assigned for that test point $i$ as

$$r = \frac{\sum_{i=1}^{T} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{T} (x_i - \bar{x})^2 \sum_{i=1}^{T} (y_i - \bar{y})^2}} \tag{4}$$

where $T$ is the total number of test points for a viewing session, $y_i$ is the average of all scores for the test point $i$, and $\bar{x}$ and $\bar{y}$ are the average values of $x_i$ and $y_i$ for all test points, respectively. In this HEVC verification test, a correlation index greater than or equal to 0.75 is considered as valid for the acceptance of the viewer’s scorings; otherwise, the viewer is considered as an outlier. Once the results for outliers are discarded, the MOS for each test point is computed using the arithmetic average of scores of the remaining viewers.
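The viewer screening step can be sketched as follows, assuming the correlation coefficient is the ordinary Pearson coefficient between a viewer's scores and the per-test-point averages over all viewers. The function names, the dictionary layout and the four made-up viewers are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long score lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = math.sqrt(sum((x - mean_x) ** 2 for x in xs)
                    * sum((y - mean_y) ** 2 for y in ys))
    return num / den if den else 0.0   # constant scores carry no correlation


def screen_viewers(scores, threshold=0.75):
    """Drop outlier viewers, then return (kept viewers, MOS per test point)."""
    viewers = list(scores)
    n_points = len(scores[viewers[0]])
    # Average of all viewers' scores at each test point (before screening).
    overall = [sum(scores[v][i] for v in viewers) / len(viewers)
               for i in range(n_points)]
    kept = [v for v in viewers if pearson_r(scores[v], overall) >= threshold]
    mos = [sum(scores[v][i] for v in kept) / len(kept) for i in range(n_points)]
    return kept, mos


scores = {
    "a": [2, 4, 6, 8],
    "b": [3, 5, 6, 9],
    "c": [1, 4, 7, 8],
    "d": [8, 3, 7, 2],   # scores run against the group trend
}
kept, mos = screen_viewers(scores)
print(kept, [round(m, 2) for m in mos])
```

Viewer "d" correlates negatively with the group averages and is discarded, so the final MOS values are computed from the remaining three viewers only.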
In addition, the confidence interval is computed for each test point to estimate a range of values covered by a certain probability. Assuming a Gaussian (normal) distribution for the population of subjective scores with sample size $n$, mean (MOS) $\mu$, and sample-based standard deviation measurement $s$, the confidence interval is defined as $(\mu - c, \mu + c)$, where $c$ is computed as

$$c = z \, \frac{s}{\sqrt{n}} \tag{5}$$

In the analysis of the subjective test results, the 95% confidence interval, as shown in Fig. 1, is calculated for each test point. For a 95% confidence interval with a Gaussian distribution, the value of $z$ in (5) is 1.96. For the results presented in Section IV, the confidence interval is plotted alongside the MOS, as shown in Fig. 2, with an interpolated curve from MOS values.
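The confidence interval for one test point can be sketched with the standard library, assuming the half-width is z·s/√n with s the sample standard deviation. The function name and the five scores are hypothetical.

```python
import math
import statistics


def mos_confidence_interval(scores, z=1.96):
    """Return (MOS, half-width c) for one test point; z = 1.96 gives 95%."""
    n = len(scores)
    mos = statistics.mean(scores)
    s = statistics.stdev(scores)        # sample standard deviation (n - 1)
    c = z * s / math.sqrt(n)
    return mos, c


mos, c = mos_confidence_interval([5, 6, 7, 8, 9])
print(mos, round(mos - c, 3), round(mos + c, 3))
```

With only five scores the interval is wide (roughly ±1.39 around a MOS of 7), which illustrates why a reasonably large viewer panel is needed.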

C. Interpretation of Bit Rate Savings From Subjective Quality Comparison

The objective of the verification test is to gauge the bit rate savings of HEVC over AVC when the AVC and HEVC test points have the same subjective quality.
Fig. 3 shows an example of a plot comparing the AVC and HEVC MOS versus bit rate curves. There is no overlap in the MOS confidence intervals of the HEVC test point C and AVC test point B, and hence, there is sufficient statistical significance to conclude that the HEVC test point C has a better quality than the AVC test point B. There is, however, an overlap in the MOS confidence intervals of the HEVC test point A and AVC test point B. This means that it is highly likely that the HEVC test point A and AVC test point B have subjective quality that cannot be distinguished. However, there is still a chance that the subjective qualities of HEVC test point A and AVC test point B are not the same.

A more rigorous analysis is to perform a two-sample unequal variance (heteroscedastic) Student’s t-test using the two-tailed distribution to determine if indeed the subjective qualities given by the sample mean values of the pair of test points are not the same. The null hypothesis, H0, in this case would be that the HEVC test point has the same quality as the AVC test point, and the alternate hypothesis, Ha, is that the HEVC test point does not have the same quality as the AVC test point.

To compare the means of two populations, the t-statistic can be used, which is expressed as

$$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} \tag{6}$$

where $\bar{X}_i$, $s_i^2$, and $n_i$ denote the sample mean, the sample variance, and the size of the $i$th sample, $i \in \{1, 2\}$.

By computing the t-statistic in this way and approximating it with a Student’s t-distribution whose degree of freedom (DF) is specified as

$$\mathrm{DF} = \frac{\left( \dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2} \right)^2}{\dfrac{(s_1^2 / n_1)^2}{n_1 - 1} + \dfrac{(s_2^2 / n_2)^2}{n_2 - 1}} \tag{7}$$

a probability value p can be computed from the t-statistic that indicates the extent to which the means of the two populations are considered to be different. The smaller the p-value is, the more significant the difference between the distributions of the two populations is.
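The unequal-variance t-test can be sketched without external libraries. The sketch assumes the standard Welch t-statistic and the Welch-Satterthwaite approximation for the degrees of freedom; the two-tailed p-value is obtained by numerically integrating the Student's t density with Simpson's rule, which is a choice made here for self-containment rather than anything prescribed by the text. All function names and the sample scores are hypothetical.

```python
import math
import statistics


def welch_t(sample1, sample2):
    """Return (t, df) for a two-sample unequal-variance t-test."""
    n1, n2 = len(sample1), len(sample2)
    a = statistics.variance(sample1) / n1    # s1^2 / n1
    b = statistics.variance(sample2) / n2    # s2^2 / n2
    t = (statistics.mean(sample1) - statistics.mean(sample2)) / math.sqrt(a + b)
    df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return t, df


def t_pdf(x, df):
    """Density of Student's t-distribution (df may be non-integer)."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def two_tailed_p(t, df, steps=2000):
    """Two-tailed p-value via Simpson integration of the density on [0, |t|]."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps                         # steps must be even
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    area = total * h / 3                      # P(0 < T < |t|)
    return max(0.0, 1.0 - 2.0 * area)


hevc_scores = [1, 2, 3, 4, 5]
avc_scores = [2, 3, 4, 5, 6]
t, df = welch_t(hevc_scores, avc_scores)
p = two_tailed_p(t, df)
print(t, df, round(p, 3))
```

For these made-up samples the p-value is well above 0.05, so the null hypothesis of equal quality would not be rejected.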

A p-value less than 0.05 indicates a very low probability of committing a type-I error (i.e., rejecting the null hypothesis when it is true). In such a case, the null hypothesis can thus be safely rejected, and it can be concluded that there is statistical significance that the HEVC test point does not have the same quality as the AVC test point. A p-value greater than or equal to 0.05 means that the null hypothesis cannot be confidently rejected. For the purpose of this paper, the HEVC test point is considered to have the same quality as the corresponding AVC test point in such a case. However, there is still a possibility of committing a type-II error (i.e., failure to reject the null hypothesis when in fact the alternate hypothesis is true). The power or sensitivity of a statistical test is the probability of correctly rejecting the null hypothesis (H0) when it is false—i.e., the probability of correctly accepting the alternative hypothesis (Ha) when it is true [25]. A statistical power test of the data has shown that if in fact the true population mean for the difference in the HEVC MOS and AVC MOS is greater than or equal to 0.8, then the mean probability of committing a type-II error, β, is less than or equal to 0.14, and hence the mean power of the test (defined as 1 − β) is 0.86. By convention, a test with a power greater than 0.8 (or β ≤ 0.2) is considered statistically powerful [26]. In the design of the verification test, four bit rates per codec, RHEVC and RAVC, were used. The bit rates were carefully selected for each of 20 test sequences so that each RHEVC is approximately half of the corresponding RAVC. These gave 80 pairs of test points on which the t-test described above was applied. The results of the test determine, for each pair of test points, whether the HEVC test point has a quality better than, the same as, or less than the AVC test point, and give a rough estimate of the bit rate savings of HEVC compared with AVC.

The following are the possible outcomes for each pair of test points. The first case is when the null hypothesis is rejected and there is statistical significance that the HEVC MOS at RHEVC is greater than the AVC MOS at RAVC. This means that one can reasonably conclude that the HEVC test point achieves a better quality than the AVC test point at half the bit rate of AVC. Note that by the design of the test, the bit rate saving when RHEVC is half of RAVC is 50%. Since the bit rate for an HEVC test point could be further reduced to achieve the same quality, the bit rate saving of HEVC compared with AVC is therefore greater than 50% for this case. The second case is when the null hypothesis cannot be rejected. This means that the HEVC test point has about the same quality as the AVC test point at half the AVC bit rate, since RHEVC is approximately half of RAVC. Therefore, the bit rate saving of HEVC compared with AVC is approximately 50% for this case. The third case is when the null hypothesis is rejected and there is statistical significance to conclude that the HEVC MOS at RHEVC is less than the AVC MOS at RAVC. This means that the HEVC test point does not achieve equal or better quality than the AVC test point at half the bit rate of AVC. More bits would need to be allocated to the HEVC test point before the same quality would be achieved. Therefore, the bit rate saving of HEVC compared with AVC is less than 50% for this case.
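The three outcomes above can be sketched as a small decision rule. This is only an illustration: the function name `classify_pair`, the category strings, and the sample MOS and p-values are hypothetical and do not come from the verification test data; the 0.05 threshold is the one stated in the text.

```python
def classify_pair(mos_hevc, mos_avc, p_value, alpha=0.05):
    """Map one HEVC/AVC test-point pair to a bit rate saving category.

    Assumes the HEVC bit rate is about half of the AVC bit rate,
    as in the design of the verification test.
    """
    if p_value >= alpha:
        # Null hypothesis not rejected: the two qualities are treated as equal.
        return "approximately 50%"
    if mos_hevc > mos_avc:
        # Significantly better quality at half the bit rate.
        return "greater than 50%"
    # Significantly worse quality at half the bit rate.
    return "less than 50%"


# Made-up example pairs: (HEVC MOS, AVC MOS, p-value of the t-test).
example_pairs = [(8.1, 7.2, 0.01), (6.5, 6.4, 0.60), (5.0, 6.3, 0.02)]
for mos_hevc, mos_avc, p in example_pairs:
    print(classify_pair(mos_hevc, mos_avc, p))
```

The p-value is taken as an input here, so the sketch is independent of which t-test implementation produces it.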

D. Bjøntegaard Model

The Bjøntegaard model [27], [28] has become a popular tool for evaluating the coding efficiency of a given video codec in comparison with a reference codec over a range of quality points or bit rates. Bjøntegaard delta (BD) metrics are typically computed as a difference in bit rate or a difference in quality based on interpolating curves from the tested data points. In this paper, the focus is on the difference in bit rate, expressed as a percentage of a reference bit rate, as this is easily interpreted as the bit rate saving benefit for equal measured quality.

The BD-rate represents the average bit rate savings for the same video quality (e.g., PSNR or MOS) and is calculated between two rate-distortion curves, such as AVC and HEVC MOS curves in Fig. 3. The bit rate saving difference between the two rate-distortion curves at a given level of quality is

\[ \Delta R(D) = \frac{R_B(D) - R_A(D)}{R_A(D)} \]

where R_A(D) and R_B(D) are the bit rates of the interpolated reference and tested bit rate curves, respectively, at the given level of quality/distortion D. ΔR(D) is typically represented as a percentage of the reference bit rate R_A(D) so that a negative value represents compression gain, while a positive value represents compression loss.

The Bjøntegaard model uses a logarithmic scale for the domain of the bit rate interpolation, so by defining r = log R (base-10 logarithm), the bit rate savings can be expressed as

\[ \Delta R_{\text{overall}} \approx 10^{\frac{1}{D_H - D_L} \int_{D_L}^{D_H} \left( r_B(D) - r_A(D) \right) \, dD} - 1 \]

where the lower D_L and higher D_H integration bounds are computed from the range of the interpolated distortion values D_A and D_B for the reference and tested data sets, respectively, as

\[ D_L = \max\left( D_A(0),\, D_B(0) \right), \qquad D_H = \min\left( D_A(N_A - 1),\, D_B(N_B - 1) \right) \]

where D(0) is the lowest and D(N−1) is the highest measured quality point, for either the tested or reference sets, as shown in Fig. 4. In the HEVC verification test, the number of test points for both the reference and evaluated sets is four (i.e., N_A = N_B = 4) and the curve fitting uses cubic spline interpolation.
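The following is a minimal sketch of the BD-rate computation described above, under two loudly stated simplifications: it uses piecewise-linear interpolation of the log bit rate over quality instead of the cubic spline interpolation used in the verification test, and it evaluates the integral with the trapezoidal rule. The function names and the sample rate-quality points are hypothetical, and base-10 logarithms are assumed.

```python
import math


def interp_log_rate(points, d):
    """Linearly interpolate log10(bit rate) at quality d.

    points: list of (bit_rate, quality), sorted by strictly increasing quality.
    Note: linear interpolation is a stand-in for the cubic spline in the text.
    """
    for (r0, d0), (r1, d1) in zip(points, points[1:]):
        if d0 <= d <= d1:
            w = (d - d0) / (d1 - d0)
            return (1 - w) * math.log10(r0) + w * math.log10(r1)
    raise ValueError("quality outside the measured range")


def bd_rate(ref, test, steps=1000):
    """Average bit rate difference of `test` relative to `ref` (negative = saving)."""
    d_low = max(ref[0][1], test[0][1])     # lower integration bound
    d_high = min(ref[-1][1], test[-1][1])  # higher integration bound
    if d_high <= d_low:
        raise ValueError("no overlapping quality range")
    h = (d_high - d_low) / steps
    total = 0.0
    for i in range(steps + 1):
        d = min(d_low + i * h, d_high)  # guard against rounding past the bound
        diff = interp_log_rate(test, d) - interp_log_rate(ref, d)
        total += diff * (0.5 if i in (0, steps) else 1.0)  # trapezoidal weights
    mean_log_diff = total * h / (d_high - d_low)
    return 10 ** mean_log_diff - 1


# Made-up curves: the tested codec needs exactly half the rate at every quality.
avc = [(1000, 4.0), (2000, 6.0), (4000, 7.5), (8000, 8.5)]
hevc = [(500, 4.0), (1000, 6.0), (2000, 7.5), (4000, 8.5)]
print(bd_rate(avc, hevc))  # close to -0.5, i.e., a 50% bit rate saving
```

With real data the result depends on the interpolation method, so values from this sketch should not be expected to match spline-based BD-rates exactly.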

As can be observed from Fig. 4, in some cases, the overall BD-rate measure may be computed over a relatively small interval of overlapping distortion regions. In such a case, the BD-rate metric does not necessarily represent average coding efficiency for all test points involved in the actual test. Therefore, it is important to design the test in a way that the distortion overlap between the two tested codecs covers the range of qualities of interest for the specific application. As the metrics derived from the Bjøntegaard model can be applied to different evaluation criteria, it is important to understand the range on which they are computed. For example, BD-rate can be computed for MOS and PSNR, for the same test material. However, as demonstrated in Fig. 5, the actual bit rates on which the two are computed may not be the same. Therefore, in addition to providing BD-rates for both PSNR and MOS in our evaluation reported in Section IV, we also compute BD-rate on the bit rate interval common for both criteria (MOS and PSNR).

Newer studies [29] have shown how the Bjøntegaard model can further be extended to compute BD-rate intervals considering the confidence intervals of the MOS ratings for each test point, as shown in Fig. 6. The dotted curves show the boundaries of confidence intervals for each curve, and two new BD-rate values are computed comparing D_B,min with D_A,max (labeled BD-rate_min) and D_B,max with D_A,min (labeled BD-rate_max), where [D_min, D_max] represents 95% confidence intervals of MOS. The new BD-rates thus provide lower and upper limits for the BD-rate. However, it is noted that these three values of BD-rate are based on different reference (AVC) bit rate ranges, as shown in Fig. 6. Although these intervals are included in the results reported in Section IV, they have to be interpreted carefully, as the limits of the intervals are defined for significantly different bit rate ranges. Nevertheless, for relatively small differences between rate-distortion curves, it can be useful to evaluate BD-rate confidence intervals.

III. TEST SETTINGS

This section provides information regarding the test material used, test settings, and logistics.

A. Selection of Test Material and Test Points

The HEVC verification tests were carried out for four categories of spatial resolutions: UHD (3840 × 2160, except for the Traffic sequence, which is 4096 × 2048), 1080p (1920 × 1080), 720p (1280 × 720), and 480p (832 × 480). The details of the test sequences are provided in Tables II and III. Screenshots are given in Figs. 8–11. The sequences are selected from different sources and have different spatiotemporal characteristics, which leads to different behavior of compression algorithms. This is the first formal test of video compression standards where a wide range of resolutions, including content with UHD resolution and high frame rate, has been evaluated. The UHD format, as specified in ITU-R Rec. BT.2020 [30], has a number of extended features compared with ordinary HD video. In addition to containing more pixels per frame, it specifies support for higher frame rates, wider color gamut, and higher bit depths [31]. However, for compatibility with available playout and display systems, all tested video sequences have 8 bits per component per sample and are in the Y’CBCR color space defined by ITU-R Rec. BT.709 [32].

The test sequences were compressed using HEVC (HM-12.1, Main profile [20]) and AVC (JM-18.5, High profile [21]) encoding. Either a random access (RA) or low delay (LD) configuration (Cfg) was used (similarly configured for both HEVC and AVC, with a refresh period of approximately 1 s and hierarchical referencing for RA and with no periodic refresh and no reordered referencing for LD).

For each test sequence, four test points using different fixed quantization parameter (QP) settings were selected so that the tested HEVC bit rates are approximately half of the AVC bit rates. Also, the ranges of the QP values were selected so that the subjective quality of the encoded sequences spans a large range of MOS values. This bit rate ratio was chosen because it is already well established that the quality of HEVC is much better than the quality of AVC at the same bit rate.

These test conditions can identify whether a bit rate saving of 50% or more is achieved for the majority of the tested video sequences. The full details of the QP values and bit rates can be found in [17].

B. Subjective Test Structure

Subjective tests for different categories of video sequences were conducted in separate sessions. Each subjective test session of the DCR method consisted of a series of basic test cells (BTCs). Each BTC was made of two consecutive presentations of the video clip under test, as shown in Fig. 7. First, the original version of the video sequence was displayed, followed by the coded version of the video sequence, with a gap of 1 s. Then, a message asking the viewers to vote was displayed for 5 s. Each test session was designed with 45 BTCs in total: the first three BTCs represented the stabilization phase and were selected to show to the viewers the whole range of quality they would see during the test. Two BTCs showing original versus original were also used as a sanity check for the range of ratings made by the viewers. The scores coming from the original BTCs and from the stabilization phase were excluded from further analysis. All the BTCs were randomly ordered to avoid the same content being seen repeatedly and to spread the quality as much as possible in a uniform way across the whole test.
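The session layout described above can be sketched as follows. Several details are assumptions made only for illustration: that each session covers five sequences at four test points for each of the two codecs (which is consistent with 45 − 3 − 2 = 40 scored BTCs, but is not stated explicitly here), that the sanity-check cells are shuffled together with the coded cells, and that the stabilization cells reuse clips from the test set. All names are hypothetical.

```python
import random


def build_session(test_clips, stabilization_clips, n_sanity=2, seed=None):
    """Return an ordered list of (kind, clip) basic test cells (BTCs)."""
    rng = random.Random(seed)
    body = [("coded", clip) for clip in test_clips]
    # Original-versus-original cells used as a sanity check (assumed shuffled in).
    body += [("original-vs-original", None)] * n_sanity
    rng.shuffle(body)
    # The stabilization phase always comes first.
    head = [("stabilization", clip) for clip in stabilization_clips]
    return head + body


def scored_cells(session):
    """Keep only the cells whose votes enter the analysis."""
    return [cell for cell in session if cell[0] == "coded"]


# Assumed layout: 5 sequences x 2 codecs x 4 test points = 40 coded clips.
clips = [(seq, codec, point)
         for seq in range(5)
         for codec in ("HEVC", "AVC")
         for point in range(4)]
session = build_session(clips, stabilization_clips=clips[:3], seed=1)
print(len(session), len(scored_cells(session)))  # 45 40
```

A production playlist would likely add a constraint that the same content is not shown in consecutive cells; a plain shuffle does not guarantee that.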

C. Subjective Test Logistics

The subjective tests were performed at two sites, under a controlled laboratory environment, adhering to the recommendations in ITU-R Rec. BT.500 and ITU-T Rec. P.910. The tests for UHD and 720p resolutions were done at the BBC R&D labs in London and for the other resolutions at the University of the West of Scotland (UWS). The equipment and session details for each test are given in Tables IV and V, respectively. Additional analysis of the influence of viewing distances (front and back rows in the seating arrangement for viewers) on the subjective rating has been provided in [19].

IV. RESULTS

This section summarizes the subjective test results. It also provides an analysis with a focus on a comparison with the objective test results, which are easier to obtain in practice.

A. Objective and Subjective Test Results

The subjective evaluation results for each category of test sequences are shown in Figs. 12–15 in the form of MOS versus bit rate plots.

The objective quality metric (PSNR) values for the same test points are also plotted on the same graph using a second vertical axis. Note that the scales for the two vertical axes in these plots are independently selected, and thus no direct connection between the subjective (i.e., MOS) and objective (i.e., PSNR) plots is demonstrated. The legend in all plots shows that circle and triangle markers represent the results for actual test points, while the curves between them were calculated using cubic spline interpolation with the bit rate in a logarithmic scale, as in typical BD-rate computation. Only the parts of the curves related to BD-rate calculation, either for MOS or PSNR BD-rate computation, are displayed. The solid and dotted lines represent MOS and PSNR curves, respectively. Confidence intervals are displayed for each MOS test point. The PSNR results presented are for the luma color component only. Because of space limitation, the chroma results are not presented. However, the authors note that the weighted PSNR results as in (3) are highly correlated to luma-only PSNR results. Typically, the weighted PSNR has a somewhat higher value than luma PSNR, but the values of BD-rate for weighted PSNR are typically close to the values of BD-rate for luma PSNR.

By considering the positions of MOS points for AVC and HEVC, it can be observed that HEVC achieves the same subjective quality as AVC while typically requiring substantially lower bit rates. Table VI shows the results of the Student’s t-test on the 80 pairs of HEVC and AVC test points. These pairs of test points were classified into the three categories of bit rate savings as described in Section II-C. The first four rows show the distribution of bit rate savings achieved for each resolution. The last row summarizes the distribution of bit rate saving statistics for all resolutions. This shows that for 74 out of the 80 pairs of test points (or 92.5%), HEVC has a bit rate saving compared with AVC that is greater than or equal to 50%. The amount of bit rate saving is similar for both the RA and LD test cases. Only six pairs of test points (or 7.5%) show a bit rate saving of less than 50%. Four of the six pairs of test points were contributed by one video sequence (SVT04a), where the HEVC encoder did not perform as well as in the other cases.

The data convincingly show that HEVC achieves a bit rate saving of at least 50%. However, the granularity of the above bit rate saving estimation was limited by the test design, where in each pair of test points, the HEVC bit rate was selected as approximately half the AVC bit rate. In order to get a more precise quantification of the estimated bit rate savings, a different method is used, where the MOS BD-rate values for the subjective ratings for each test sequence, as shown in Table VII, are computed. The upper and lower limits of BD-rates corresponding to the 95% confidence interval of MOS, as discussed in Section II-D, are also indicated.

In addition to the BD-rates for the available MOS range of each video sequence, an additional measurement was also calculated for the range of MOS scores greater than or equal to seven. This range (MOS ≥ 7) is typically expected for a number of services, such as broadcasting, where targeted quality levels are good to excellent. Negative BD-rate values in Table VII indicate the bit rate savings measured for HEVC relative to the bit rate used for AVC.

The averages in Table VII are computed only for the BD-rate values displayed in the table. Note that the results for MOS BD-rates for the HomelessSleeping and Cubicle sequences are excluded. In order for the BD-rate interpolation to work correctly, the MOS values should exhibit a smooth curve and the interval of the averaging (shown as curves in Figs. 12–15) should be interpolated from at least three points that are monotonically increasing with bit rate. This condition was not satisfied for the omitted test sequences (although a very substantial gain is evident for HEVC in both omitted cases).
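One possible way to check the condition above is sketched below. It reads the requirement as "at least three consecutive test points, ordered by bit rate, whose MOS values strictly increase"; this reading, the function names, and the sample values are assumptions made for illustration and are not taken from the verification test data.

```python
def longest_increasing_run(points):
    """Length of the longest run of consecutive points with increasing MOS.

    points: list of (bit_rate, mos) for one codec and one sequence.
    """
    ordered = sorted(points)  # order the test points by bit rate
    best = run = 1 if ordered else 0
    for (_, prev_mos), (_, cur_mos) in zip(ordered, ordered[1:]):
        run = run + 1 if cur_mos > prev_mos else 1
        best = max(best, run)
    return best


def usable_for_bd_rate(points, min_points=3):
    """True if enough monotonically increasing points exist for interpolation."""
    return longest_increasing_run(points) >= min_points


# Made-up MOS curves: one well behaved, one with rating reversals.
smooth = [(500, 4.1), (1000, 6.0), (2000, 7.4), (4000, 8.3)]
noisy = [(500, 7.9), (1000, 7.6), (2000, 8.1), (4000, 8.0)]
print(usable_for_bd_rate(smooth), usable_for_bd_rate(noisy))  # True False
```

In practice both the reference and the tested curve would need to pass such a check before a MOS-based BD-rate is reported for a sequence.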

It is noted that for most video sequences, the MOS-based BD-rate benefit is substantially higher than what is measured by the PSNR-based BD-rate. MOS BD-rates indicate that HEVC could provide the same visual quality as AVC for most tested content categories at well below half the bit rate of the latter, surpassing the performance expected at the launch of the HEVC standard development process. The fact that BD-rates for MOS, confidence intervals, and PSNR in Table VII have not been computed over the same bit rate range for a given test sequence, which has been discussed in Section II, is further addressed in the following section.

B. Compression Efficiency Results for Specific Bit Rates

Although different BD-rate measures presented in Table VII are useful indicators of the compression performance of HEVC, comparing them with each other raises validity concerns, as they are computed for different reference bit rate ranges, as discussed in Section II. To partly address this problem, additional analysis of the results has been conducted to evaluate compression efficiency for bit rates that are common to both PSNR and MOS BD-rate computation.

The bit rate savings achieved by HEVC with reference to the associated AVC bit rate, computed for the continuous bit rate range using cubic spline interpolation as discussed in Section II-D, are shown in Fig. 16, considering both subjective and objective quality assessments. In the majority of cases, the bit rate savings for equal MOS are higher than the bit rate savings for equal PSNR. The savings vary across different bit rates and different video source content.

Considering only the parts of the curves from Fig. 16 that are defined both for MOS and PSNR for a given AVC bit rate, we have measured the average BD-rates and the average difference between the PSNR and MOS curves, which are presented in Table VIII. The results in Table VIII do not take into account the test sequences that were discarded because of the MOS BD-rate computation problems (HomelessSleeping and Cubicle) described in Section IV-A. For the same bit rate range, the MOS-based BD-rate saving is 59% and the PSNR-based BD-rate saving is 44%, averaged over all test sequences. Depending on the content category, the average differences between the two measurement methods are between 11 and 18 percentage points. In other words, the bit rate savings measured for equal PSNR are lower than the bit rate savings measured for equal MOS between AVC and HEVC by roughly 15 percentage points. This is consistent with the difference between the average values in Table VII, except that in this case the bit rate range for the PSNR and MOS rate saving measurements is the same.

V. CONCLUSION

This paper presents the results of a formal subjective verification test that was carried out by the JCT-VC for the new HEVC video coding standard. In this paper, a more rigorous analysis of the subjective test results using Student’s t-test shows that the HEVC test points at half or less than half the bit rate of the AVC reference were found to achieve a comparable quality in 92.5% of the test cases. In addition, it provides a summary of evaluation metrics and an analysis of the results that were obtained, with a focus on comparison between the subjective and objective evaluation results and the performance across different bit rate ranges. A more precise quantitative estimate of the bit rate savings was obtained by applying the MOS-based BD-rate measurement on the results of the subjective test. It was found for the investigated test cases that the HEVC Main profile can achieve the same subjective quality as the AVC High profile while requiring on average approximately 59% fewer bits.

The PSNR-based BD-rate average over the same sequences was calculated to be 44%. This confirms that the subjective quality improvements of HEVC are typically greater than the objective quality improvements measured by the method that was primarily used during the standardization process of HEVC. It can therefore be concluded that the HEVC standard is able to deliver the same subjective quality as AVC, while on average (and in the vast majority of typical sequences) requiring only half or even less than half of the bit rate used by AVC. This means that the initial objective of the HEVC development (substantial improvement in compression compared with the previous state of the art) has been successfully achieved.