Tuesday, June 14, 2016

What is Ultra HD Premium?



Entertainment and home cinema technology has changed dramatically. The latest range of televisions one can now own, touted as the best ever, is Ultra HD Premium. Why is it a big deal? The badge, they say, says it all. We know how much the industry loves dressing up its latest lineup with a badge or two. These badges are meant to set the industry’s best apart from the rest. Each one is backed by a certification process, which gives the buyer an idea of what to expect from a particular product, including a basic idea of how it will perform under typical conditions. You may even see some content stamped with specific approval.
As mentioned earlier, Ultra HD Premium is the newest addition to the lot, and certified products will carry the Ultra HD Premium badge.
According to pocket-lint.com:
“Panasonic recently revealed that its DX902 4K TV will carry the badge. And its UB900 Blu-ray player, now available in the UK, is the first device other than a television to get the stamp of approval.”
Wondering what 4K HDR means and how it differs from other 4K devices? Let’s decode it here.

So, What is Ultra HD Premium?

Most of us have seen this badge on television sets and even on UHD Blu-ray. What does it mean, though? The badge means that the product meets the UHD Alliance’s specification for what it considers a best-of-the-best audio-visual experience.

What About the Resolution?

Ultra HD, UHD or 4K… What is the difference? In truth, there is no difference. 4K products and videos are presented at a resolution of 3840 x 2160 pixels (2160p), and that goes for any 4K product. 4K products today fall under the broader term UHD, or Ultra HD. Ultra HD Premium, however, confirms that the product or content meets high-quality color, HDR (high dynamic range) and audio standards in addition to 4K resolution.

What About the Color?

A 10-bit color depth is the minimum for all these products, and the quality of the content is certified by the 4K HDR badge it carries. However, the fact that the color information is carried at 8-bit or 10-bit depth does not mean every display can present it, since a much wider palette is possible here. This is where the color gamut comes in. A color gamut is a visual representation of the full color spectrum and of the portion of it that a display can actually reproduce.
“BT.2020 (also known as Rec. 2020) colour representation is the standard, but – and perhaps a little confusingly – different product types only need to display a given percentage of that gamut to achieve the Ultra HD Premium badge. A TV, for example, needs to display at least 90 per cent of P3 colours, while a mastering display must display a minimum of 100 per cent.”

What About the Dynamic Range?

OLED and LCD panels handle brightness and black levels quite differently from each other, and from older HD televisions. For this reason, a product can earn the Ultra HD Premium badge for dynamic range in one of two ways. The first is a peak brightness of 1,000 nits with a black level of less than 0.05 nits (0.03 nits for mastering displays), which suits LED-backlit LCD sets. The second is a peak brightness of 540 nits with a black level of less than 0.0005 nits, which suits OLED sets.
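To make the two routes concrete, here is a minimal sketch of the check for a consumer TV. The function name is made up for illustration, and treating the peak brightness figures as "at least" thresholds is our assumption; the numbers themselves are the ones quoted above.

```python
def meets_hdr_range(peak_nits: float, black_nits: float) -> bool:
    """Return True if a TV satisfies either brightness/black-level pair."""
    # Route 1 (LED-backlit LCD): very bright peak, moderately dark black.
    led_route = peak_nits >= 1000 and black_nits < 0.05
    # Route 2 (OLED): lower peak, but a far darker black level.
    oled_route = peak_nits >= 540 and black_nits < 0.0005
    return led_route or oled_route


# A bright LCD and a dim-but-inky OLED both pass; a set in between does not.
print(meets_hdr_range(1100, 0.04))    # LCD-style panel
print(meets_hdr_range(600, 0.0003))   # OLED-style panel
print(meets_hdr_range(600, 0.04))     # meets neither pair
```

This is also why two badge-carrying TVs can look quite different: each only has to satisfy one of the two pairs.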

And the Ubiquity?

So, as we said, a product with the Ultra HD Premium badge promises to deliver a premium experience. However, it is important to note that there is still a fair amount of choice and difference in potential between products. For instance, some TVs will be brighter and some will have deeper black levels, and both will still qualify.
We believe that the Ultra HD Premium badge will eventually become ubiquitous, as manufacturers strive to achieve desired standards for their 4K products.
From a customer’s point of view, the badge will definitely help in choosing a premium television set. That said, some manufacturers have chosen not to have their products certified, even if they meet the Ultra HD Premium criteria. Although a standardized naming convention would help consumers and manufacturers alike in positioning products, some manufacturers, such as Sony, have decided to stick to their own naming conventions.
We are, however, warming up to the idea of an Ultra HD Premium badge!

Thursday, May 19, 2016

Intel VCA Powers ATEME Compression Engine

ATEME uses Intel's VCA PCIe card to power Titan live encoding

ATEME has announced the availability of its new high video quality and high-density transcoding solution based on the Intel Visual Compute Accelerator (Intel VCA) card.
This new solution brings together the power of ATEME’s TITAN Live video compression platform with the integrated CPU and graphics processing capabilities of the Intel VCA — enabling ATEME to increase channel density by more than 10 × over platforms without Intel graphics capabilities, while maintaining high video quality. The result is a significant reduction in total-cost-of-ownership per transcode channel.
Based on the ATEME fifth-generation STREAM compression engine, this future-proof solution utilizes the full capabilities of Intel’s technology with support for H.264 (AVC) and H.265 (HEVC), supporting resolutions up to 4K-UHD.

HEVC Video Codecs Comparison

Versions: Free Version | Free 4K video Version | Pro Version (Enterprise) | Pro+ Version (Enterprise + 4K video analysis)
Objective metrics: SSIM | SSIM, PSNR
Types of analysis: Encoding quality, encoding speed, bitrate handling, speed/quality analysis, etc. | Encoding quality, encoding speed, bitrate handling, speed/quality analysis, etc. (some graphs)
Color planes: Y | Y, U, V and overall
Graphs: Some graphs | All the graphs for all the metrics, codecs and presets
Test video sequences: 20 HD video (only description) | 11 4K video (only description) | 20 HD video (available for download) | 20 HD video (available for download) + 11 4K video
Hardware used for analysis: Desktop and Server configurations | Desktop configurations | Desktop and Server configurations
Tested use cases: 3 different use cases: Fast Transcoding, Universal and Ripping (some graphs) | 1 use case: 10 fps encoding (some graphs) | 3 different use cases: Fast Transcoding, Universal and Ripping | 3 different use cases: Fast Transcoding, Universal and Ripping + 1 use case: 10 fps encoding
Number of figures: 29 | 11 | 5000+ | 5500+
Price: Free | $850 | $995
Purchase: Download | Download | Buy | Buy
Hint: You can remove "Extended download" service while purchasing to save money.
We can help you to analyze your codec


Video Codecs that Were Tested


  • HEVC
    • f265 H.265 Encoder
    • Intel MSS HEVC GAcc
    • Intel MSS HEVC Software
    • Ittiam HEVC Hardware Encoder
    • Ittiam HEVC Software Encoder
    • Strongene Lentoid HEVC Encoder
    • SHBP H.265 Real time encoder
    • x265
  • Non-HEVC
    • InTeleMax TurboEnc
    • SIF Encoder
    • VP9 Video Codec
    • x264

  • Overview



    Objectives and Testing Tools


    HEVC Codec Testing Objectives

    The main goal of this report is to present a comparative evaluation of the quality of new HEVC codecs and codecs of other standards using objective measures of assessment. The comparison was done using settings provided by the developers of each codec. Nevertheless, we required all presets to satisfy the minimum speed requirement for the particular use case. The main task of the comparison is to analyze different encoders for the task of transcoding video, e.g., compressing video for personal use.

    HEVC Codec Testing Tools

    The comparison was performed on two platforms:
  • Desktop—Core i7 4770R @3.9 GHz, RAM 4 GB, Windows 8.1
  • Server—Xeon E5 2697v3, RAM 64 GB, Windows Server 2012
    For both platforms we considered three key use cases with different speed requirements:
  • Desktop
    • Ripping—no minimum speed
    • Universal—minimum 10 FPS
    • Fast transcoding—minimum 30 FPS
  • Server
    • Ripping—no minimum speed
    • Universal—minimum 30 FPS
    • Fast transcoding—minimum 60 FPS
  • Overall Conclusions

    Overall, the leaders in this comparison are x265, Intel MSS HEVC and x264. Here are some overall graphs from the report:
    Speed/quality trade-off for Ripping use case (Y-SSIM metric)
    Average bitrate for Fast transcoding use-case (Y-SSIM metric)

    Professional Versions of Comparison Report


    HEVC Comparison Report Pro 2015 version contains:
  • Additional objective metrics (PSNR, SSIM)
  • All metrics results for all colorplanes (Y,U,V and overall)
  • Results for all the sequences, codecs and presets used in comparison
  • Many more figures
  • etc.

  • Acknowledgments


    The Graphics & Media Lab Video Group would like to express its gratitude to the following companies for providing the codecs and settings used in this report:
  • InTeleMax, Inc.
  • Intel Corporation
  • Ittiam Systems (P) Ltd.
  • Strongene Ltd.
  • “System house “Business partners”” company
  • SIF Encoder developer team
  • The WebM Project team
  • x264 developer team
  • MulticoreWare, Inc.
  • The Video Group would also like to thank these companies for their help and technical support during the tests.

    Thanks


    Special thanks to the following contributors of our previous comparisons

    Google, Intel, AMD, NVidia
    ATI, Adobe, ISPhone, dicas
    KDDI R&D labs, Dolby, Tata Elxsi, Octasic
    Qualcomm, Voceweb, Elgato

    Codec Analysis and Tuning for Codec Developers and Codec Users


    Computer Graphics and Multimedia Laboratory of Moscow State University:
  • 10 years working in the area of video codec analysis and tuning using objective quality metrics and subjective comparisons.
  • 20+ reports of video codec comparisons and analysis (H.264, MPEG-4, MPEG-2, decoders’ error recovery).
  • Development of methods and algorithms for codec comparison and analysis, including analysis of individual codec features and options.

  • We can perform the following tasks for codec developers and codec users.

    Strong and Weak Points of Your Codec

  • Deep encoder parts analysis (ME, RC on GOP, mode decision, etc).
  • Weak and strong points for your encoder and complete information about encoding quality on different content types.
  • Encoding quality improvement through pre- and post-filtering (including technology licensing).
  • Independent Codec Estimation Comparing to Other Codecs for Different Use-cases

  • Comparative analysis of your encoder and other encoders.
  • We have direct contact with many codec developers.
  • You will learn where your encoder stands among other recent, well-known encoders (encoding quality, speed, bitrate handling, etc.).
  • Encoder Features Implementation Optimality Analysis

    We analyze the effectiveness of encoder features (speed/quality trade-off), which could lead to an increase of up to 30% in the speed/quality characteristics of your codec. We can help you tune your codec and find the best encoding parameters.


    Facebook Live Video! watch the world in real time






    Facebook's new Live Video map





    Video Quality Evaluation Methodology and Verification Testing of HEVC Compression Performance

    Intro

    The High Efficiency Video Coding (HEVC) standard (ITU-T H.265 and ISO/IEC 23008-2) has been developed with the main goal of providing significantly improved video compression compared with its predecessors. In order to evaluate this goal, verification tests were conducted by the Joint Collaborative Team on Video Coding of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29. This study presents the subjective and objective results of a verification test in which the performance of the new standard is compared with its highly successful predecessor, the Advanced Video Coding (AVC) video compression standard (ITU-T H.264 and ISO/IEC 14496-10). The test used video sequences with resolutions ranging from 480p up to ultra-high definition, encoded at various quality levels using the HEVC Main profile and the AVC High profile. In order to provide a clear evaluation, this paper also discusses various aspects for the analysis of the test results. The tests showed that bit rate savings of 59% on average can be achieved by HEVC for the same perceived video quality, which is higher than a bit rate saving of 44% demonstrated with the PSNR objective quality metric. However, it has been shown that the bit rates required to achieve good quality of compressed content, as well as the bit rate savings relative to AVC, are highly dependent on the characteristics of the tested content.



    We are currently witnessing something that has become a once-in-a-decade event in the world of video compression: the emergence of a major new family of video compression standards. The mid-1990s saw the introduction of the Moving Picture Experts Group (MPEG)-2 video coding standard (ITU-T Rec. H.262 and ISO/IEC 13818-2 [1]), the first compression standard to be widely adopted in broadcasting and entertainment applications. Advanced Video Coding (AVC) (ITU-T Rec. H.264 and ISO/IEC 14496-10 [2]) appeared in the mid-2000s, offering the same subjective quality at approximately half the bit rate. Now, a new standard, High Efficiency Video Coding (HEVC) (ITU-T Rec. H.265 and ISO/IEC 23008-2), has been developed that promises a further factor of two improvement in compression efficiency for the mid-2010s [3].

    The HEVC standard has been jointly developed by the same two standardization organizations whose previous collaboration resulted in both MPEG-2 and AVC: 1) the ISO/IEC MPEG and 2) the ITU-T Video Coding Experts Group (VCEG), through the Joint Collaborative Team on Video Coding (JCT-VC) [4]. HEVC version 1 was ratified in 2013 as H.265 by the ITU-T and as MPEG-H Part 2 by ISO/IEC [5]. This first version supports applications that use conventional (single-layer) encoding of 4:2:0-sampled video with 8- or 10-bit precision. A second edition was completed in July 2014, and an additional extension was completed in February 2015 [6]. These extend the standard to support contribution applications with tools that enable 4:2:2- and 4:4:4-sampled video formats as well as 12- and 16-bit precision [7], and multilayer coding enhancements for efficient scalability [8] and stereo/multiview and depth-enhanced 3D compression [9]. A further amendment is currently being developed to enable more efficient coding of screen-captured graphics and text content and mixed-source content [10].

    A few evaluations have previously been reported comparing the compression performance of HEVC with AVC and also demonstrating the suitability of HEVC for various applications, particularly including some evaluations for high-resolution video content [11]. The subjective test results for a number of test sequences and some analysis of the bit rate savings at different quality levels were presented in [12]. In [13], a study on the suitability of HEVC for beyond-HDTV broadcast services was presented, but with no comparison with previous standards. A comparison of HEVC and AVC performance for frame rates up to 30 Hz on a small number of ultra high definition (UHD) video sequences was presented in [14] and [15], and an informal study focused on low-delay (LD) applications and real-time encoding with HD resolutions was presented in [16].
    All of these, including some that were carried out at the early stages of HEVC development, have provided very consistent evidence of the substantial coding efficiency improvements enabled by HEVC.
    For its formal evaluation of HEVC performance, the JCT-VC performed a subjective evaluation on a wider range of content of resolutions varying from 480p (832 × 480) up to UHD with frame rates of up to 60 Hz. The test sequences used for the verification testing were deliberately chosen to be different from those that had been used during the development of the standard, to avoid any possible bias that the standard could have toward those sequences. A test report was produced for the JCT-VC itself [17], and some additional analysis of HEVC performance for UHD content has been presented in [18] (using cropped 2560 × 1600 regions) and [19] (a recent brief conference publication about the testing).

    As video coding standards generally specify only the format of the data and the associated decoding process without specifying how encoding is to be performed, it is not possible in general to test the compression performance of a standard. Some particular encoding methods must be used as a proxy to represent the capability of the standard instead. The outcome of such a comparison is generally more reliable when similar encoding techniques with similar configurations are applied in the compared encoders rather than simply comparing unknown technologies as black boxes. The verification tests were therefore performed using reference software encoders that had been developed in the standardization work and used very similar encoding algorithms and configurations that were selected to represent important applications. These publicly available reference software codebases are known as the HEVC model (HM) for HEVC [20] and the joint model (JM) for AVC [21].

    While this study reports an extended set of results of HEVC verification tests, it also summarizes the details of tools that can be used in the analysis of the results, pointing to a number of factors an evaluation should consider. Compared with the initial results [17], those presented in this study are based on the use of more viewers for subjective tests, the objective results are presented and compared with the subjective results, and additional analysis of the coding gains versus the bit rate is provided. Ultimately, the verification test showed that the key objective of HEVC had been achieved, i.e., providing a substantial improvement in compression efficiency relative to its predecessor AVC.

    This study is organized as follows. An overview of video quality evaluation and statistical analysis methodology is presented in Section II, and the test settings used in the subjective evaluations are detailed in Section III. Section IV presents the test results and detailed analysis. Finally, the conclusion is given in Section V.


    II. VIDEO QUALITY EVALUATION

    This section provides an overview of the objective and subjective video quality metrics and the related analysis used in this study. 

    For the convenience of video coding performance assessment, the most commonly used objective metric is peak signal-to-noise ratio (PSNR). However, it is commonly acknowledged that PSNR has the disadvantages of disregarding the viewing conditions and the characteristics of human visual system perception. In addition, the PSNR for a given video sequence can be computed in different ways, depending on how the picture components (e.g., luma and chroma) or individual picture PSNR values are combined. Nevertheless, for a particular content item and small variations of coding conditions, the changes in PSNR values for an overall video sequence can typically be reliably interpreted.

    Other objective video quality metrics, such as the structural similarity index (SSIM) and video quality metric (VQM), have been proposed, but are not used nearly as frequently as PSNR [22]. VQM is not often used, primarily due to its computational complexity, and for both metrics, the interpretation of the values they provide has not yet become common practice in the video coding community. Therefore, in the context of HEVC and AVC compression, this paper provides comparisons and analysis using the PSNR objective measure and subjective quality evaluation results.

    Subjective quality evaluation is the process of employing human viewers for grading video quality based on individual perception. Formal methods and guidelines for subjective quality assessments are specified in various ITU recommendations. Among these, the most relevant to this context are ITU-T Rec. P.910 [23], which defines subjective video quality assessment methods for multimedia applications, and ITU-R Rec. BT.500 [24], which defines a methodology for the subjective assessment of the quality of television pictures. These specifications describe a number of test methods with distinct presentation and scoring schemes, along with the recommended viewing conditions. Explanations of the quality metrics and data analysis methods are provided in the following sections.


    A. Objective Quality Evaluation Using PSNR

    PSNR is defined as the ratio between the maximum possible power of the signal (the original image) and the power of noise, which in the considered scenario is introduced by lossy compression. For a decoded image component I_d, the mean square error (MSE) with reference to an original image component I is computed as

    \mathrm{MSE} = \frac{1}{M N} \sum_{x=1}^{M} \sum_{y=1}^{N} \left[ I(x, y) - I_d(x, y) \right]^2    (1)


    where M and N are the width and height of the image component, and the image component is, for example, an array of luma samples or C_B or C_R chroma samples. The PSNR value is then computed as

    \mathrm{PSNR} = 10 \log_{10} \frac{\left( 2^{B} - 1 \right)^2}{\mathrm{MSE}}    (2)


    where B is the bit depth of image samples. This is typically calculated for each frame separately, and then averaged for the frames of a video sequence. Due to the logarithmic transformation, this corresponds to using the geometric average of frame MSEs, and the impact of this should be critically considered when a high fluctuation over frames is present.
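    The definitions above can be sketched in a few lines of plain Python. This is an illustrative sketch rather than the tool used in the verification test: the function names are hypothetical, frames are assumed to be lists of rows of integer samples, and the MSE and PSNR expressions are the standard ones implied by the surrounding definitions of M, N, and B.

```python
import math


def mse(original, decoded):
    """Mean square error between two equally sized 2-D sample arrays."""
    total = 0
    count = 0
    for row_o, row_d in zip(original, decoded):
        for a, b in zip(row_o, row_d):
            total += (a - b) ** 2
            count += 1
    return total / count


def psnr(mse_value, bit_depth=8):
    """PSNR in dB for a given MSE; infinite when the images are identical."""
    if mse_value == 0:
        return math.inf
    peak = (1 << bit_depth) - 1  # 255 for 8-bit samples
    return 10 * math.log10(peak ** 2 / mse_value)


def sequence_psnr(frames_original, frames_decoded, bit_depth=8):
    """Arithmetic mean of per-frame PSNR values for one component."""
    values = [
        psnr(mse(o, d), bit_depth)
        for o, d in zip(frames_original, frames_decoded)
    ]
    return sum(values) / len(values)
```

    Averaging per-frame PSNR values of frames with MSEs of 1 and 100 gives the PSNR of an MSE of 10, the geometric mean, rather than that of 50.5, the arithmetic mean. This is the effect the text warns about when quality fluctuates strongly over frames.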

    For video sequences, which ordinarily consist of three color components, either the luma PSNR value (PSNR_Y), calculated using only luma component values, may be reported or a weighted PSNR value (PSNR_W) using all three components can be computed using some weighting criteria. An example of a popular weighting for content with 4:2:0 sampling is

    \mathrm{PSNR}_W = \frac{6 \, \mathrm{PSNR}_Y + \mathrm{PSNR}_{C_B} + \mathrm{PSNR}_{C_R}}{8}    (3)


    The most accurate interpretation of the objective results is obtained by looking at the frame-by-frame results for each component. However, this may not be practical for the final presentation of the results for a large data set and a large number of test points.
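    As a small worked example of a weighted PSNR, the sketch below assumes the 6:1:1 luma-to-chroma weighting that is commonly used for 4:2:0 content. The function name and the default weights are illustrative assumptions; other weightings can be passed in.

```python
def weighted_psnr(psnr_y, psnr_cb, psnr_cr, weights=(6, 1, 1)):
    """Combine per-component PSNR values (in dB) into a single figure."""
    w_y, w_cb, w_cr = weights
    weighted_sum = w_y * psnr_y + w_cb * psnr_cb + w_cr * psnr_cr
    return weighted_sum / (w_y + w_cb + w_cr)


# Luma at 40 dB and both chroma components at 44 dB:
# (6 * 40 + 44 + 44) / 8 = 41.0 dB
print(weighted_psnr(40.0, 44.0, 44.0))
```

    The heavier luma weight reflects that, with 4:2:0 sampling, each chroma component carries a quarter of the samples of the luma component.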

    B. Subjective Quality Evaluation

    For the HEVC verification test that includes a wide range of visual quality points, a degradation category rating (DCR) [23] test method was selected. For this purpose, it was used to evaluate the quality (and not the impairment) with a quality rating scale made of 11 levels [23], ranging from 0 (lowest quality) to 10 (highest quality), which may be interpreted as in Table I. The numerical scale helps avoid misinterpretations associated with the use of category adjectives (e.g., excellent or good), especially in cases where the tests are performed across different countries and include nonnative English speakers.





    The basic results of the subjective test are evaluated in terms of the average rating, which is called the mean opinion score (MOS), and the associated confidence interval values that are computed for each coding point, after having verified the reliability of each viewer. For the DCR method, it is recommended to hire more than 15 naive viewers that have been properly screened for visual acuity and color blindness, to allow for an accurate statistical analysis of the subjective scores [24].

    From the raw data, i.e., the individual subjective scores, the reliability of each viewer is calculated. The individual reliability is evaluated using the correlation coefficient r computed between each score x_i provided by a viewer and the overall MOS value y_i assigned for that test point i as

    r = \frac{\sum_{i=1}^{T} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{T} (x_i - \bar{x})^2 \, \sum_{i=1}^{T} (y_i - \bar{y})^2}}    (4)



    where T is the total number of test points for a viewing session, y_i is the average of all scores for the test point i, and x̄ and ȳ are the average values of x_i and y_i for all test points, respectively. In this HEVC verification test, a correlation index greater than or equal to 0.75 is considered as valid for the acceptance of the viewer’s scorings; otherwise, the viewer is considered as an outlier. Once the results for outliers are discarded, the MOS for each test point is computed using the arithmetic average of scores of the remaining viewers.
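    The screening procedure can be sketched as follows, assuming the correlation coefficient is the usual Pearson coefficient between a viewer's scores and the all-viewer average per test point. The function names, the dictionary layout, and the handling of constant score vectors are illustrative assumptions, not details taken from the test report.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption: a constant score vector counts as unreliable
    return cov / math.sqrt(var_x * var_y)


def screen_and_score(scores, threshold=0.75):
    """scores maps viewer -> list of scores, one per test point.

    Returns (kept_viewers, mos_per_test_point) after discarding outliers.
    """
    viewers = list(scores)
    n_points = len(scores[viewers[0]])
    # Overall average per test point, computed over all viewers.
    overall = [
        sum(scores[v][i] for v in viewers) / len(viewers)
        for i in range(n_points)
    ]
    kept = [v for v in viewers if pearson(scores[v], overall) >= threshold]
    if not kept:
        raise ValueError("no viewer passed the reliability check")
    # MOS is the arithmetic average over the remaining viewers only.
    mos = [sum(scores[v][i] for v in kept) / len(kept) for i in range(n_points)]
    return kept, mos


example = {
    "a": [2, 5, 8, 9],
    "b": [3, 5, 7, 9],
    "c": [2, 4, 8, 10],
    "d": [3, 6, 7, 9],
    "e": [9, 2, 8, 1],  # scores that run against the consensus
}
kept, mos = screen_and_score(example)
print(kept, mos)
```

    In this made-up example, viewer "e" correlates negatively with the group average and is dropped before the MOS is computed.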
    In addition, the confidence interval is computed for each test point to estimate a range of values covered by a certain probability. Assuming a Gaussian (normal) distribution for the population of subjective scores with sample size n, mean (MOS) μ, and sample-based standard deviation measurement s, the confidence interval is defined as (μ − c, μ + c), where c is computed as

    c = z \, \frac{s}{\sqrt{n}}    (5)


    In the analysis of the subjective test results, the 95% confidence interval, as shown in Fig. 1, is calculated for each test point. For a 95% confidence interval with a Gaussian distribution, the value of z in (5) is 1.96. For the results presented in Section IV, the confidence interval is plotted alongside the MOS, as shown in Fig. 2, with an interpolated curve from MOS values.
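    A minimal sketch of the confidence interval computation follows, assuming the half-width is c = z · s / √n with s the sample standard deviation (n − 1 in the denominator). The function name is hypothetical.

```python
import math


def mos_confidence_interval(scores, z=1.96):
    """Return (lower, upper) bounds of the confidence interval around the MOS."""
    n = len(scores)
    mean = sum(scores) / n
    # Sample standard deviation, using n - 1 in the denominator.
    s = math.sqrt(sum((x - mean) ** 2 for x in scores) / (n - 1))
    c = z * s / math.sqrt(n)
    return mean - c, mean + c


# Nine viewers rating one test point on the 0-10 scale.
low, high = mos_confidence_interval([6, 7, 8, 7, 7, 8, 6, 7, 7])
print(round(low, 3), round(high, 3))
```

    With more viewers the interval narrows in proportion to 1/√n, which is one reason a minimum number of screened viewers is recommended.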



    C. Interpretation of Bit Rate Savings From Subjective Quality Comparison

    The objective of the verification test is to gauge the bit rate savings of HEVC over AVC when the AVC and HEVC test points have the same subjective quality.
    Fig. 3 shows an example of a plot comparing the AVC and HEVC MOS versus bit rate curves. There is no overlap in the MOS confidence intervals of the HEVC test point C and AVC test point B, and hence, there is sufficient statistical significance to conclude that the HEVC test point C has a better quality than the AVC test point B. There is, however, an overlap in the MOS confidence intervals of the HEVC test point A and AVC test point B. This means that it is highly likely that the HEVC test point A and AVC test point B have subjective quality that cannot be distinguished. However, there is still a chance that the subjective qualities of HEVC test point A and AVC test point B are not the same.



    A more rigorous analysis is to perform a two-sample unequal variance (heteroscedastic) Student’s t-test using the two-tailed distribution to determine if indeed the subjective qualities given by the sample mean values of the pair of test points are not the same. The null hypothesis, H0, in this case would be that the HEVC test point has the same quality as the AVC test point, and the alternate hypothesis, Ha, is that the HEVC test point does not have the same quality as the AVC test point.

    To compare the means of two populations, the t-statistic can be used, which is expressed as

    t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}    (6)

    where X̄_i, s_i^2, and n_i denote the sample mean, the sample variance, and the size of the ith sample, i ∈ {1, 2}.

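    By computing the t-statistic in this way and approximating it with a Student’s t-distribution whose degree of freedom (DF) is specified as

    \mathrm{DF} = \frac{\left( \frac{s_1^2}{n_1} + \frac{s_2^2}{n_2} \right)^2}{\frac{\left( s_1^2 / n_1 \right)^2}{n_1 - 1} + \frac{\left( s_2^2 / n_2 \right)^2}{n_2 - 1}}    (7)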


    a probability value p can be computed from the t-statistic that indicates the extent to which the means of the two populations are considered to be different. The smaller the p-value is, the more significant the difference between the distributions of the two populations is.
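    The t-statistic and degrees of freedom for the unequal-variance test can be computed with the Python standard library alone. The sketch below assumes the usual Welch form of the statistic and the Welch–Satterthwaite approximation for DF; the function name and the sample scores are illustrative.

```python
import math
from statistics import mean, variance


def welch_t_test(sample1, sample2):
    """Return (t, df) for a two-sample unequal-variance t-test."""
    n1, n2 = len(sample1), len(sample2)
    # statistics.variance is the sample variance (n - 1 denominator).
    v1 = variance(sample1) / n1
    v2 = variance(sample2) / n2
    t = (mean(sample1) - mean(sample2)) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Made-up viewer scores for one HEVC test point and one AVC test point.
hevc_scores = [7, 8, 9, 8, 8]
avc_scores = [5, 6, 7, 6, 6]
t, df = welch_t_test(hevc_scores, avc_scores)
print(t, df)
```

    The standard library has no Student's t cumulative distribution function, so the two-tailed p-value is left out here. If SciPy is available, `scipy.stats.ttest_ind(hevc_scores, avc_scores, equal_var=False)` performs the same test and returns the p-value directly.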

    A p-value less than 0.05 indicates a very low probability of committing a type-I error (i.e., rejecting the null hypothesis when it is true). In such a case, the null hypothesis can be safely rejected, and it can be concluded with statistical significance that the HEVC test point does not have the same quality as the AVC test point. A p-value greater than or equal to 0.05 means that the null hypothesis cannot be confidently rejected. For the purpose of this paper, the HEVC test point is considered to have the same quality as the corresponding AVC test point in such a case. However, there is still a possibility of committing a type-II error (i.e., failure to reject the null hypothesis when in fact the alternate hypothesis is true).

    The power or sensitivity of a statistical test is the probability of correctly rejecting the null hypothesis (H0) when it is false, i.e., the probability of correctly accepting the alternative hypothesis (Ha) when it is true [25]. A statistical power test of the data has shown that if in fact the true population mean for the difference in the HEVC MOS and AVC MOS is greater than or equal to 0.8, then the mean probability of committing a type-II error, β, is less than or equal to 0.14, and hence the mean power of the test (defined as 1 − β) is at least 0.86. By convention, a test with a power greater than 0.8 (or β ≤ 0.2) is considered statistically powerful [26].

    In the design of the verification test, four bit rates per codec, R_HEVC and R_AVC, were used. The bit rates were carefully selected for each of 20 test sequences so that each R_HEVC is approximately half of the corresponding R_AVC. These gave 80 pairs of test points on which the t-test described above was applied. The results of the test determine, for each pair of test points, whether the HEVC test point has a quality better than, the same as, or worse than the AVC test point, and give a rough estimate of the bit rate savings of HEVC compared with AVC.

    The following are the possible outcomes for each pair of test points. The first case is when the null hypothesis is rejected and there is statistical significance that the HEVC MOS at R_HEVC is greater than the AVC MOS at R_AVC. This means that one can reasonably conclude that the HEVC test point is achieving a better quality than the AVC test point at half the bit rate of AVC. Note that by the design of the test, the bit rate saving when R_HEVC is half R_AVC is 50%. Since the bit rate for an HEVC test point could be further reduced to achieve the same quality, the bit rate saving of HEVC compared with AVC is therefore greater than 50% for this case.

    The second case is when the null hypothesis cannot be rejected. This means that the HEVC test point has about the same quality as the AVC test point at half the AVC bit rate, since R_HEVC is approximately half of R_AVC. Therefore, the bit rate saving of HEVC compared with AVC is approximately 50% for this case.

    The third case is when the null hypothesis is rejected and there is statistical significance to conclude that the HEVC MOS at R_HEVC is less than the AVC MOS at R_AVC. This means that the HEVC test point is not achieving equal or better quality than the AVC test point at half the bit rate of AVC. More bits would need to be allocated to the HEVC test point before the same quality would be achieved. Therefore, the bit rate saving of HEVC compared with AVC is less than 50% for this case.
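    The three outcomes can be written as a small decision function. This is only a sketch of the reasoning above: the function name and return strings are made up, the p-value is assumed to come from the two-tailed unequal-variance t-test, and the HEVC bit rate is assumed to be about half the AVC bit rate for the pair.

```python
def classify_saving(mos_hevc, mos_avc, p_value, alpha=0.05):
    """Rough bit rate saving category for one HEVC/AVC test point pair."""
    if p_value >= alpha:
        # Null hypothesis not rejected: qualities treated as the same.
        return "approximately 50%"
    if mos_hevc > mos_avc:
        # Significantly better quality at half the rate.
        return "greater than 50%"
    # Significantly worse quality at half the rate.
    return "less than 50%"


print(classify_saving(7.9, 6.8, 0.01))
print(classify_saving(7.1, 6.9, 0.40))
print(classify_saving(5.5, 6.9, 0.02))
```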


    D. Bjøntegaard Model


    The Bjøntegaard model [27], [28] has become a popular tool for evaluating the coding efficiency of a given video codec in comparison with a reference codec over a range of quality points or bit rates. Bjøntegaard delta (BD) metrics are typically computed as a difference in bit rate or a difference in quality based on interpolating curves from the tested data points. In this paper, the focus is on the difference in bit rate, expressed as a percentage of a reference bit rate, as this is easily interpreted as the bit rate saving benefit for equal measured quality.

    The BD-rate represents the average bit rate savings for the same video quality (e.g., PSNR or MOS) and is calculated between two rate-distortion curves, such as AVC and HEVC MOS curves in Fig. 3. The bit rate saving difference between the two rate-distortion curves at a given level of quality is

    \Delta R(D) = \frac{R_B(D) - R_A(D)}{R_A(D)}    (8)



    where R_A(D) and R_B(D) are the bit rate of the interpolated reference and tested bit rate curves, respectively, at the given level of quality/distortion D. ΔR(D) is typically represented as a percentage of the reference bit rate R_A(D) so that a negative value represents compression gain, while a positive value represents compression loss.

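    The Bjøntegaard model uses a logarithmic scale for the domain of the bit rate interpolation, so by defining r = log R, the bit rate savings can be expressed as

    \Delta R_{\mathrm{overall}} \approx 10^{\frac{1}{D_H - D_L} \int_{D_L}^{D_H} \left[ r_B(D) - r_A(D) \right] dD} - 1    (9)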




    where the lower Dand higher Dintegration bounds are computed from the range of the interpolated distortion values Dand Dfor the reference and tested data sets, respectively, as


    where D(0) is the lowest and D(N−1is the highest measured quality point, for either the tested or reference sets, as shown in Fig. 4. In the HEVC verification test, the number of test points for both the reference and evaluated sets is four

    (i.e., NN= 4) and the curve fitting uses cubic spline interpolation.
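
    The BD-rate computation described above can be sketched in Python as follows. Two simplifications are assumptions of this sketch and not the procedure of the verification test: the log bit rate is interpolated with the single polynomial that passes through all measured points (a cubic for four points) rather than with a cubic spline, and the integral is evaluated numerically with Simpson's rule. The names `lagrange_eval` and `bd_rate` and the sample data are hypothetical.

```python
import math


def lagrange_eval(xs, ys, x):
    """Evaluate the unique polynomial through the points (xs, ys) at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


def bd_rate(ref_points, test_points, steps=1000):
    """Average bit rate difference in percent (negative means saving).

    Each argument is a list of (bit_rate, quality) pairs with distinct
    quality values. `steps` must be even for Simpson's rule.
    """
    d_a = [d for _, d in ref_points]
    r_a = [math.log10(r) for r, _ in ref_points]
    d_b = [d for _, d in test_points]
    r_b = [math.log10(r) for r, _ in test_points]

    # Integrate only over the overlapping quality interval [D_L, D_H].
    d_low = max(min(d_a), min(d_b))
    d_high = min(max(d_a), max(d_b))
    if d_high <= d_low:
        raise ValueError("quality ranges of the two curves do not overlap")

    def diff(d):
        return lagrange_eval(d_b, r_b, d) - lagrange_eval(d_a, r_a, d)

    h = (d_high - d_low) / steps
    total = diff(d_low) + diff(d_high)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * diff(d_low + k * h)
    average_log_diff = (total * h / 3.0) / (d_high - d_low)
    return (10.0 ** average_log_diff - 1.0) * 100.0


# Hypothetical data: the tested codec reaches each quality level at
# exactly half the reference bit rate.
reference = [(1000, 3.0), (2000, 5.0), (4000, 7.0), (8000, 9.0)]
tested = [(500, 3.0), (1000, 5.0), (2000, 7.0), (4000, 9.0)]
print(bd_rate(reference, tested))
```

    For this synthetic input the result should come out at about -50%, matching the intuition that a curve shifted to half the bit rate corresponds to a 50% saving.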

    As can be observed from Fig. 4, in some cases, the overall BD-rate measure may be computed over a relatively small interval of overlapping distortion regions. In such a case, the BD-rate metric does not necessarily represent average coding efficiency for all test points involved in the actual test. Therefore, it is important to design the test in a way that the distortion overlap between the two tested codecs covers a range of qualities of interest for the specific application. As the metrics derived from the Bjøntegaard model can be applied to different evaluation criteria, it is important to understand the range on which they are computed. For example, BD-rate can be computed for MOS and PSNR, for the same test material. However, as demonstrated in Fig. 5, the actual bit rates on which the two are computed may not be the same. Therefore, in addition to providing BD-rates for both PSNR and MOS in our evaluation reported in Section IV, we also compute BD-rate on the bit rate interval common for both criteria (MOS and PSNR).

    Newer studies [29] have shown how the Bjøntegaard model can further be extended to compute BD-rate intervals considering the confidence intervals of the MOS ratings for each test point, as shown in Fig. 6. The dotted curves show the boundaries of confidence intervals for each curve, and two new BD-rate values are computed comparing $D_{B,\min}$ with $D_{A,\max}$ (labeled BD-rate_min) and $D_{B,\max}$ with $D_{A,\min}$ (labeled BD-rate_max), where $[D_{\min}, D_{\max}]$ represents the 95% confidence interval of MOS. The new BD-rates thus provide lower and upper limits for the BD-rate. However, these three values of BD-rate are based on different reference (AVC) bit rate ranges, as shown in Fig. 6. Although these intervals are reported in the results in Section IV, they have to be interpreted carefully, as the limits of the intervals are defined for significantly different bit rate ranges. Nevertheless, for relatively small differences between rate-distortion curves, it can be useful to evaluate BD-rate confidence intervals.


    III. TEST SETTINGS

    This section provides information regarding the test material used, test settings, and logistics.


    A. Selection of Test Material and Test Points

    The HEVC verification tests were carried out for four categories of spatial resolutions: UHD (3840 × 2160, except for the Traffic sequence, which is 4096 × 2048), 1080p (1920 × 1080), 720p (1280 × 720), and 480p (832 × 480). The details of the test sequences are provided in Tables II and III. Screenshots are given in Figs. 8–11. The sequences are selected from different sources and have different spatiotemporal characteristics, which leads to different behavior of compression algorithms. This is the first formal test of video compression standards where a wide range of resolutions, including content with UHD resolution and high frame rate, has been evaluated. The UHD format, as specified in ITU-R Rec. BT.2020 [30], has a number of extended features compared with ordinary HD video. In addition to containing more pixels per frame, it specifies support for higher frame rates, wider color gamut, and higher bit depths [31]. However, for compatibility with available playout and display systems, all tested video sequences have 8 bits per component per sample and are in the $Y'C_BC_R$ color space defined by ITU-R Rec. BT.709 [32].

    The test sequences were compressed using HEVC (HM-12.1, Main profile [20]) and AVC (JM-18.5, High profile [21]) encoding. Either a random access (RA) or low delay (LD) configuration (Cfg) was used (similarly configured for both HEVC and AVC, with a refresh period of approximately 1 s and hierarchical referencing for RA and with no periodic refresh and no reordered referencing for LD).

    For each test sequence, four test points using different fixed quantization parameter (QP) settings were selected so that the tested HEVC bit rates are approximately half of the AVC bit rates. Also, the ranges of the QP values were selected so that the subjective quality of the encoded sequences spans a large range of MOS values. This bit rate ratio was chosen because it is already well established that the quality of HEVC is much better than the quality of AVC at the same bit rate. 

    These test conditions can identify whether a bit rate saving of 50% or more is achieved for the majority of the tested video sequences. The full details of the QP values and bit rates can be found in [17].




    B. Subjective Test Structure

    Subjective tests for different categories of video sequences were conducted in separate sessions. Each subjective test session of the DCR method consisted of a series of basic test cells (BTCs). Each BTC was made of two consecutive presentations of the video clip under test, as shown in Fig. 7.
    First, the original version of the video sequence was displayed, followed by the coded version of the video sequence, with a gap of 1 s. Then, a message asking the viewers to vote was displayed for 5 s. Each test session was designed with 45 BTCs in total: the first three BTCs represented the stabilization phase and were selected to show to the viewers the whole range of quality they would see during the test. Two BTCs showing original versus original were also used as a sanity check for the range of ratings made by the viewers. The scores coming from the original BTCs and from the stabilization phase were excluded from further analysis. All the BTCs were randomly ordered to avoid the same content being seen repeatedly and to spread the quality as much as possible in a uniform way across the whole test.
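
    The session layout described above can be sketched in Python as follows. The sketch assumes that the three stabilization BTCs always come first and that the two original-versus-original BTCs are shuffled in with the 40 remaining test BTCs; the paper does not spell out these details. The function names `build_session` and `scored_cells` and the clip labels are hypothetical.

```python
import random


def build_session(test_clips, stabilization_clips, original_clips, seed=None):
    """Return an ordered list of (kind, clip) basic test cells (BTCs)."""
    rng = random.Random(seed)
    body = [("test", clip) for clip in test_clips]
    body += [("original", clip) for clip in original_clips]
    rng.shuffle(body)  # random order after the stabilization phase
    return [("stabilization", clip) for clip in stabilization_clips] + body


def scored_cells(session):
    """Keep only the BTCs whose scores enter the analysis."""
    return [cell for cell in session if cell[0] == "test"]


# Hypothetical clip labels: 3 stabilization + 2 original + 40 test = 45 BTCs.
session = build_session(
    test_clips=[f"coded_{i}" for i in range(40)],
    stabilization_clips=["stab_0", "stab_1", "stab_2"],
    original_clips=["orig_0", "orig_1"],
    seed=1,
)
print(len(session), len(scored_cells(session)))
```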


    C. Subjective Test Logistics

    The subjective tests were performed at two sites, under a controlled laboratory environment, adhering to the recommendations in ITU-R Rec. BT.500 and ITU-T Rec. P.910. The tests for UHD and 720p resolutions were done at the BBC R&D
    labs in London and for the other resolutions at the University of the West of Scotland (UWS). The equipment and session details for each test are given in Tables IV and V, respectively. Additional analysis of the influence of viewing distances (front and back rows in the seating arrangement for viewers) on the
    subjective rating has been provided in [19]. 

    IV. RESULTS

    This section summarizes the subjective test results. It also provides an analysis with a focus on a comparison with the objective test results, which are easier to obtain in practice.

    A. Objective and Subjective Test Results

    The subjective evaluation results for each category of test sequences are shown in Figs. 12–15 in the form of MOS versus bit rate plots. 






    The objective quality metric (PSNR) values for the same test points are also plotted on the same graph using a second vertical axis. Note that the scales for the two vertical axes in these plots are independently selected, and thus no direct connection between the subjective (i.e., MOS) and objective (i.e., PSNR) plots is demonstrated. The legend in all plots shows that circle and triangle markers represent the results for actual test points, while the curves between them were calculated using cubic spline interpolation with the bit rate in a logarithmic scale, as in typical BD-rate computation. Only the parts of the curves related to BD-rate calculation, either for MOS or PSNR BD-rate computation, are displayed. The solid and dotted lines represent MOS and PSNR curves, respectively. Confidence intervals are displayed for each MOS test point. The PSNR results presented are for the luma color component only. Because of space limitation, the chroma results are not presented. However, the authors note that the weighted PSNR results as in (3) are highly correlated to luma-only PSNR results. Typically, the weighted PSNR has a somewhat higher value than luma PSNR, but the values of BD-rate for weighted PSNR are typically close to the values of BD-rate for luma PSNR.



    By considering the positions of MOS points for AVC and HEVC, it can be observed that HEVC achieves the same subjective quality as AVC while typically requiring substantially lower bit rates. Table VI shows the results of Student's t-test on the 80 pairs of HEVC and AVC test points. These pairs of test points were classified into the three categories of bit rate savings as described in Section II-C. The first four rows show the distribution of bit rate savings achieved for each resolution. The last row summarizes the distribution of bit rate saving statistics for all resolutions. This shows that for 74 out of the 80 pairs of test points (or 92.5%), HEVC has a bit rate saving compared with AVC that is greater than or equal to 50%. The amount of bit rate saving is similar for both the RA and LD test cases. Only six pairs of test points (or 7.5%) show a bit rate saving of less than 50%. Four of the six pairs of test points were contributed by one video sequence (SVT04a), where the HEVC encoder did not perform as well as in the other cases.


    The data convincingly show that HEVC achieves a bit rate saving of at least 50% for the vast majority of the tested pairs. However, the granularity of the above bit rate saving estimation was limited by the test design, where in each pair of test points, the HEVC bit rate was selected as approximately half the AVC bit rate. In order to get a more precise quantification of the estimated bit rate savings, a different method is used, where the MOS BD-rate values for the subjective ratings for each test sequence, as shown in Table VII, are computed. The upper and lower limits of BD-rates corresponding to the 95% confidence interval of MOS, as discussed in Section II-D, are also indicated.

    In addition to the BD-rates for the available MOS range of each video sequence, an additional measurement was also calculated for the range of MOS scores greater than or equal to seven. This range (MOS ≥ 7) is typically expected for
    a number of services, such as broadcasting, where targeted quality levels are good to excellent. Negative BD-rate values in Table VII indicate the bit rate savings measured for HEVC relative to the bit rate used for AVC.


    The averages in Table VII are computed only for the BD-rate values displayed in the table. Note that the results for MOS BD-rates for the HomelessSleeping and Cubicle sequences are excluded. In order for the BD-rate interpolation to work correctly, the MOS values should exhibit a smooth curve and the interval of the averaging (shown as curves in Figs. 12–15) should be interpolated from at least three points that are monotonically increasing with bit rate. This condition was not satisfied for the omitted test sequences (although a very substantial gain is evident for HEVC in both omitted cases).
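
    A check of this condition might look like the following Python sketch. It assumes one particular reading of the requirement: after sorting by bit rate, there must be a run of at least three consecutive test points whose MOS strictly increases. The function names `longest_increasing_run` and `usable_for_bd_rate` and the sample values are hypothetical.

```python
def longest_increasing_run(points):
    """Length of the longest run of consecutive points with rising MOS.

    `points` is a list of (bit_rate, mos) pairs in any order.
    """
    ordered = sorted(points)  # sort by bit rate
    if not ordered:
        return 0
    best = run = 1
    for (_, prev_mos), (_, cur_mos) in zip(ordered, ordered[1:]):
        run = run + 1 if cur_mos > prev_mos else 1
        best = max(best, run)
    return best


def usable_for_bd_rate(points, min_points=3):
    """True if enough monotonically increasing points are available."""
    return longest_increasing_run(points) >= min_points


# Hypothetical MOS curves: the first rises steadily, the second saturates
# and fluctuates, so no three consecutive points are increasing.
smooth = [(1000, 4.0), (2000, 6.0), (4000, 7.5), (8000, 8.5)]
saturated = [(1000, 8.0), (2000, 7.9), (4000, 8.2), (8000, 8.1)]
print(usable_for_bd_rate(smooth), usable_for_bd_rate(saturated))
```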

    It is noted that for most video sequences, the MOS-based BD-rate benefit is substantially higher than what is measured by the PSNR-based BD-rate. MOS BD-rates indicate that HEVC could provide the same visual quality as AVC for most tested content categories at well below half the bit rate of the
    latter, surpassing the performance expected at the launch of the HEVC standard development process. The fact that BD-rates for MOS, confidence intervals, and
    PSNR in Table VII have not been computed over the same bit rate range for a given test sequence, which has been discussed in Section II, is further addressed in the following section.


    B. Compression Efficiency Results for Specific Bit Rates

    Although different BD-rate measures presented in Table VII are useful indicators of the compression performance of HEVC, comparing them with each other raises validity concerns, as they are computed for different reference bit rate
    ranges, as discussed in Section II. To partly address this problem, additional analysis of the results has been conducted to evaluate compression efficiency for bit rates that are common to both PSNR and MOS BD-rate computation.

    The bit rate savings achieved by HEVC with reference to the associated AVC bit rate, computed for the continuous bit rate range using cubic spline interpolation as discussed in Section II-D, are shown in Fig. 16, considering both subjective and objective quality assessments. In the majority of cases, the bit rate savings for equal MOS are higher than the bit rate savings for equal PSNR. The savings vary across different bit rates and different video source content.
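
    The per-bit-rate saving plotted in such a figure can be sketched in Python as follows. As an assumption made to keep the example short, the sketch uses piecewise-linear interpolation in the log bit rate domain instead of the cubic spline interpolation used in the paper. The function names `interp` and `saving_at_rate` and the sample data are hypothetical.

```python
import math


def interp(x, xs, ys):
    """Piecewise-linear interpolation; xs must be strictly increasing."""
    if not xs[0] <= x <= xs[-1]:
        raise ValueError("value outside the interpolation range")
    for i in range(len(xs) - 1):
        if xs[i] <= x <= xs[i + 1]:
            frac = (x - xs[i]) / (xs[i + 1] - xs[i])
            return ys[i] + frac * (ys[i + 1] - ys[i])


def saving_at_rate(avc_points, hevc_points, avc_rate):
    """Fraction of bit rate saved by HEVC at equal quality.

    Both point lists hold (bit_rate, quality) pairs sorted by bit rate,
    with quality strictly increasing.
    """
    log_avc = [math.log10(r) for r, _ in avc_points]
    q_avc = [q for _, q in avc_points]
    log_hevc = [math.log10(r) for r, _ in hevc_points]
    q_hevc = [q for _, q in hevc_points]

    # Quality that AVC reaches at the requested bit rate ...
    quality = interp(math.log10(avc_rate), log_avc, q_avc)
    # ... and the bit rate at which HEVC reaches the same quality.
    hevc_rate = 10.0 ** interp(quality, q_hevc, log_hevc)
    return 1.0 - hevc_rate / avc_rate


# Hypothetical curves where HEVC needs 40% of the AVC bit rate.
avc = [(1000, 3.0), (2000, 5.0), (4000, 7.0), (8000, 9.0)]
hevc = [(400, 3.0), (800, 5.0), (1600, 7.0), (3200, 9.0)]
print(saving_at_rate(avc, hevc, 3000))
```

    The function raises `ValueError` when the AVC quality at the requested bit rate falls outside the HEVC quality range, which mirrors the fact that savings are only defined where the two curves overlap.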

    Considering only the parts of the curves from Fig. 16 that are defined both for MOS and PSNR for a given AVC bit rate, we have measured the average BD-rates and the average difference between the PSNR and MOS curves, which are presented in Table VIII. The results in Table VIII do not take into account the test sequences that were discarded because of the MOS BD-rate computation problems (HomelessSleeping and Cubicle) described in Section IV-A. For the same bit rate range, the MOS-based BD-rate saving is 59% and the PSNR-based BD-rate saving is 44%, averaged over all test sequences. Depending on the content category, the average differences between the two measurement methods are between 11% and 18%. In other words, the bit rate savings measured for equal PSNR are lower than the bit rate savings measured for equal MOS between AVC and HEVC by roughly 15 percentage points. This is consistent with the difference between the average values in Table VII, except that in this case the bit rate range for PSNR and MOS rate saving measurement is equal.



    V. CONCLUSION

    This paper presents the results of a formal subjective verification test that was carried out by the JCT-VC for the new HEVC video coding standard. In this paper, a more rigorous analysis of the subjective test results using Student's t-test shows that the HEVC test points at half or less than half the bit rate of the AVC reference were found to achieve a comparable quality in 92.5% of the test cases. In addition, it provides a summary of evaluation metrics and an analysis
    of the results that were obtained, with a focus on comparison between the subjective and objective evaluation results and the performance across different bit rate ranges. A more precise quantitative estimate of the bit rate savings
    was obtained by applying the MOS-based BD-rate measurement on the results of the subjective test. It was found for the investigated test cases that the HEVC Main profile can achieve the same subjective quality as the AVC High profile while requiring on average approximately 59% fewer bits.

    The PSNR-based BD-rate average over the same sequences was calculated to be 44%. This confirms that the subjective quality improvements of HEVC are typically greater than the objective quality improvements measured by the method that was primarily used during the standardization process of HEVC. It can therefore be concluded that the HEVC standard is able to deliver the same subjective quality as AVC, while on
    average (and in the vast majority of typical sequences) requiring only half or even less than half of the bit rate used by AVC. This means that the initial objective of the HEVC development (substantial improvement in compression compared with the previous state of the art) has been successfully achieved.