What is H.264

The Joint Video Team (JVT) was established in December 2001 in Pattaya, Thailand. It consists of video coding experts from two international standardization organizations, ITU-T and ISO. The goal of the JVT is to develop a new video coding standard that achieves a high compression ratio, high image quality, and good network adaptability. The work of the JVT has been adopted by ITU-T as the H.264 standard. ISO has adopted the same standard under the name AVC (Advanced Video Coding), which is Part 10 of MPEG-4.

The H.264 standard defines three profiles:

Baseline profile (a simple version with a wide range of applications);

Main profile (adopts a number of technical measures to improve image quality and increase the compression ratio; can be used for SDTV, HDTV, DVD, etc.);

Extended profile (can be used for video streaming over various networks).

Compared with H.263 and MPEG-4, H.264 saves about 50% of the bit rate and also offers better support for network transmission. It introduces a coding mechanism oriented toward IP packets, which benefits packet transmission and supports streaming video over the network. H.264 is highly resilient to bit errors and can handle video transmission over wireless channels with high packet loss rates and severe interference. It supports hierarchical coding and transmission under different network resources in order to deliver stable image quality. H.264 adapts to video transmission over different networks and is network friendly.

1. H.264 video compression system

The H.264 compression system consists of two parts: the Video Coding Layer (VCL) and the Network Abstraction Layer (NAL). The VCL comprises the VCL encoder and the VCL decoder. Its main function is to compress and decompress video data, and it includes compression units such as motion compensation, transform coding, and entropy coding. The NAL gives the VCL a uniform, network-independent interface. It is responsible for encapsulating and packetizing video data for transmission over the network. It uses a uniform data format consisting of a single-byte header, multiple bytes of video data, and information such as framing, logical channel signaling, timing, and end-of-sequence signals. The header contains a storage flag and a type flag. The storage flag indicates when the current data does not belong to a reference frame. The type flag indicates the type of image data.
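
The single-byte header described above can be unpacked with a few bit operations. The sketch below assumes the field layout of the published standard: one forbidden bit, a two-bit reference indicator (`nal_ref_idc`, the "storage flag" of this text), and a five-bit unit type (`nal_unit_type`, the "type flag"). The function name `parse_nal_header` is hypothetical and chosen only for illustration.

```python
def parse_nal_header(first_byte: int) -> dict:
    """Split a one-byte NAL unit header into its three bit fields."""
    if not 0 <= first_byte <= 0xFF:
        raise ValueError("the NAL header must be a single byte")
    return {
        # Top bit: expected to be 0 in a valid stream.
        "forbidden_zero_bit": (first_byte >> 7) & 0x1,
        # Next two bits: 0 means the data is not used as a reference.
        "nal_ref_idc": (first_byte >> 5) & 0x3,
        # Low five bits: type of the payload that follows the header.
        "nal_unit_type": first_byte & 0x1F,
    }


# 0x67 is binary 0110 0111: forbidden bit 0, reference indicator 3, type 7.
print(parse_nal_header(0x67))
```

A header with `nal_ref_idc` equal to 0 marks data that no other picture predicts from, so a network element could drop it first under congestion without breaking later frames.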

The VCL can transmit coding parameters that have been adjusted to the current network conditions.

2. Characteristics of H.264

Like H.261 and H.263, H.264 uses DCT transform coding combined with DPCM differential coding, that is, a hybrid coding structure. Within this hybrid framework, H.264 introduces new coding methods that improve coding efficiency and bring it closer to practical applications.

H.264 avoids cumbersome options and strives for a concise, "back to basics" design. It offers better compression performance than H.263 and can adapt to a variety of channels.

H.264 targets a wide range of applications and can serve video applications at different bit rates and in different settings. It handles bit errors and packet loss well.

The basic system of H.264 is intended to be usable without paying royalties, is open in nature, and adapts well to IP and wireless networks. This is of great significance for the transmission of multimedia information over the Internet and of broadband information over mobile networks.

Although the basic structure of H.264 encoding is similar to H.261 and H.263, it has made improvements in many aspects, which are listed below.

1. Several improvements to motion estimation

High-precision estimation

H.263 uses half-pixel motion estimation, whereas H.264 uses 1/4-pixel or even 1/8-pixel motion estimation.

That is, the true motion vector displacement may be expressed in units of 1/4 or even 1/8 pixel. The more precise the motion vector displacement, the smaller the inter-frame residual and the lower the transmitted bit rate, that is, the higher the compression ratio.

In H.264, the values at 1/2-pixel positions are obtained by interpolation with a 6-tap FIR filter. Once the 1/2-pixel values are available, the 1/4-pixel values can be obtained by linear interpolation.
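
The two interpolation steps can be sketched in one dimension as follows. The filter taps (1, -5, 20, 20, -5, 1), the division by 32 with rounding, and the clipping to 8-bit range follow the published standard rather than this text; the function names `half_pel` and `quarter_pel` are hypothetical. This is a minimal illustration, not a full two-dimensional implementation.

```python
TAPS = (1, -5, 20, 20, -5, 1)  # 6-tap FIR filter; the taps sum to 32


def clip255(value: int) -> int:
    """Clamp a value to the 8-bit sample range."""
    return max(0, min(255, value))


def half_pel(samples: list, i: int) -> int:
    """Half-pixel value between samples[i] and samples[i + 1].

    Needs two integer samples to the left of i and three to the right.
    """
    window = samples[i - 2:i + 4]
    if i < 2 or len(window) != 6:
        raise IndexError("not enough neighbouring samples for the 6-tap filter")
    acc = sum(t * s for t, s in zip(TAPS, window))
    return clip255((acc + 16) >> 5)  # divide by 32 with rounding


def quarter_pel(a: int, b: int) -> int:
    """Quarter-pixel value as the rounded average of two neighbours."""
    return (a + b + 1) >> 1


row = [10, 20, 30, 40, 50, 60]
h = half_pel(row, 2)          # between 30 and 40
q = quarter_pel(row[2], h)    # between 30 and the half-pixel value
print(h, q)                   # prints: 35 33
```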

For the 4:2:0 video format, 1/4-pixel accuracy in the luminance signal corresponds to a 1/8-pixel motion vector in the chrominance signal, so the chrominance signal requires 1/8-pixel interpolation.

Theoretically, each doubling of motion compensation accuracy (for example, from integer-pixel to 1/2-pixel accuracy) yields a coding gain of 0.5 bit/sample. Practical verification, however, found that once the motion vector accuracy exceeds 1/8 pixel, the system shows essentially no further gain. Therefore, H.264 uses only the motion vector mode with 1/4-pixel accuracy rather than 1/8-pixel accuracy.

Estimation with multiple macroblock partition modes

In H.264 prediction, a macroblock (MB) can be partitioned into blocks of 7 different sizes. This flexible, fine-grained partitioning matches the shapes of actual moving objects in the image more closely. As a result, each macroblock can contain 1, 2, 4, 8, or 16 motion vectors.
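
The relationship between partition size and motion vector count can be checked with a little arithmetic. The sketch below assumes the seven sizes are 16×16, 16×8, 8×16, 8×8, 8×4, 4×8, and 4×4 (as in the published standard; this text does not list them) and that the whole 16×16 macroblock uses a single partition size. The name `motion_vector_count` is hypothetical.

```python
MACROBLOCK_SIZE = 16

# (width, height) of the seven partition sizes assumed in this sketch.
PARTITION_SIZES = [(16, 16), (16, 8), (8, 16), (8, 8), (8, 4), (4, 8), (4, 4)]


def motion_vector_count(width: int, height: int) -> int:
    """Number of partitions (one motion vector each) that tile a macroblock."""
    return (MACROBLOCK_SIZE // width) * (MACROBLOCK_SIZE // height)


for width, height in PARTITION_SIZES:
    print(f"{width}x{height}: {motion_vector_count(width, height)} motion vector(s)")
```

Smaller partitions follow object boundaries more closely but cost more bits for motion vectors, so an encoder has to weigh the two against each other.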

Multiple reference frame estimation

H.264 can perform motion estimation across multiple reference frames. Several recently encoded frames are kept in the encoder's buffer; the encoder selects the one that gives the best coding result as the reference frame and signals which frame was used for prediction. This yields better coding results than using only the most recently encoded frame as the prediction frame.
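
A minimal sketch of the selection step is shown below. It assumes the sum of absolute differences (SAD) as the matching cost and compares only co-located blocks, with no motion search, to keep the example short; a real encoder would combine this with motion estimation in each candidate frame. The names `sad` and `best_reference` are hypothetical.

```python
def sad(block_a: list, block_b: list) -> int:
    """Sum of absolute differences between two equally sized blocks."""
    return sum(
        abs(a - b)
        for row_a, row_b in zip(block_a, block_b)
        for a, b in zip(row_a, row_b)
    )


def best_reference(current_block: list, candidate_blocks: list) -> tuple:
    """Return (index, cost) of the candidate block with the lowest SAD."""
    costs = [sad(current_block, candidate) for candidate in candidate_blocks]
    best_index = min(range(len(costs)), key=lambda i: costs[i])
    return best_index, costs[best_index]


current = [[10, 10], [10, 10]]
buffered = [
    [[0, 0], [0, 0]],        # block from the oldest buffered frame
    [[10, 11], [10, 10]],    # block from a middle frame
    [[20, 20], [20, 20]],    # block from the most recent frame
]
print(best_reference(current, buffered))  # prints: (1, 1)
```

In this example the most recent frame is not the best match, which is the situation where keeping several frames in the buffer pays off.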

2. Integer transform with small 4×4 blocks

In the past, the unit commonly used in video compression coding was the 8×8 block. H.264 uses small 4×4 blocks. Because the transform block is smaller, moving objects are partitioned more accurately. The transform also requires less computation, and the joining errors at the edges of moving objects are greatly reduced.

When the image contains large smooth areas, small transforms can produce visible gray-level differences between blocks. To avoid this, H.264 can apply a second 4×4 transform to the DC coefficients of the 16 4×4 luminance blocks of an intra macroblock (one DC coefficient per block, 16 in total), and a 2×2 transform to the DC coefficients of the four 4×4 chrominance blocks (one per block, 4 in total).

H.264 not only uses a smaller transform block size, but the transform is also an integer operation rather than a real-number operation. The transform and inverse transform therefore have the same precision in the encoder and the decoder, and there is no "inverse transform error".
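
The integer nature of the transform can be illustrated with the 4×4 core matrix from the published standard (the matrix is not given in this text). The sketch computes Y = C·X·Cᵀ using only integer additions and multiplications; the scaling that the standard folds into quantization is omitted, and the helper names are hypothetical.

```python
# Forward core transform matrix: all entries are small integers.
CF = [
    [1, 1, 1, 1],
    [2, 1, -1, -2],
    [1, -1, -1, 1],
    [1, -2, 2, -1],
]


def matmul(a: list, b: list) -> list:
    """Multiply two 4x4 integer matrices."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
        for i in range(4)
    ]


def transpose(m: list) -> list:
    """Transpose a 4x4 matrix."""
    return [list(row) for row in zip(*m)]


def forward_transform(block: list) -> list:
    """Unscaled 4x4 integer core transform: CF * block * CF^T."""
    return matmul(matmul(CF, block), transpose(CF))


flat_block = [[1] * 4 for _ in range(4)]
coefficients = forward_transform(flat_block)
print(coefficients[0][0])  # prints: 16 (all energy lands in the DC coefficient)
```

Because every intermediate value is an exact integer, an encoder and a decoder built on different hardware arrive at bit-identical results, which is the property the paragraph above describes.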

3. More accurate intra prediction

In H.264, each pixel in a 4×4 block can be intra predicted as a different weighted sum of the 17 nearest previously encoded pixels.
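
Three simple special cases of such a weighted sum are sketched below: copying the row above (vertical), copying the column to the left (horizontal), and averaging both (DC). The mode names and the function `predict_4x4` are illustrative assumptions, and only 8 of the neighbouring pixels are used; the directional modes that draw on more neighbours are not shown.

```python
def predict_4x4(top: list, left: list, mode: str) -> list:
    """Predict a 4x4 block from the 4 pixels above and the 4 pixels to the left."""
    if len(top) != 4 or len(left) != 4:
        raise ValueError("top and left must each hold 4 pixels")
    if mode == "vertical":
        # Every row repeats the row of pixels directly above the block.
        return [list(top) for _ in range(4)]
    if mode == "horizontal":
        # Every column repeats the column of pixels directly to the left.
        return [[left[r]] * 4 for r in range(4)]
    if mode == "dc":
        # Equal-weight average of all 8 neighbours, with rounding.
        dc = (sum(top) + sum(left) + 4) >> 3
        return [[dc] * 4 for _ in range(4)]
    raise ValueError(f"unknown mode: {mode}")


top = [10, 20, 30, 40]
left = [50, 50, 50, 50]
print(predict_4x4(top, left, "dc")[0])  # prints: [38, 38, 38, 38]
```

The encoder would transmit only the chosen mode and the residual between the prediction and the actual block.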

4. Unified VLC

There are two methods for entropy coding in H.264.

Universal VLC (UVLC). UVLC uses a single code table for all encoding, so the decoder can easily identify the prefix of each codeword. UVLC can resynchronize quickly after a bit error.

Context-Adaptive Binary Arithmetic Coding (CABAC). Its coding performance is slightly better than that of UVLC, but its complexity is higher.
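
The idea of a single code table with an easily recognized prefix can be illustrated with the Exp-Golomb code, which is the form the published standard uses for many syntax elements. Treating it as the UVLC of this text is an assumption, and the function names are hypothetical. Bits are held in strings for readability rather than speed.

```python
def exp_golomb_encode(code_num: int) -> str:
    """Encode a non-negative integer as an Exp-Golomb bit string."""
    if code_num < 0:
        raise ValueError("code_num must be non-negative")
    info = bin(code_num + 1)[2:]            # binary form of code_num + 1
    return "0" * (len(info) - 1) + info     # prefix of zeros, then the value


def exp_golomb_decode(bits: str) -> tuple:
    """Decode one codeword from the start of bits.

    Returns (code_num, number_of_bits_consumed).
    """
    zeros = 0
    while bits[zeros] == "0":               # the zero run announces the length
        zeros += 1
    length = 2 * zeros + 1
    return int(bits[zeros:length], 2) - 1, length


for n in range(5):
    print(n, exp_golomb_encode(n))
# prints:
# 0 1
# 1 010
# 2 011
# 3 00100
# 4 00101
```

The decoder only has to count leading zeros to know how long the codeword is, which is why the prefix is easy to identify and why decoding can resume quickly at a known position after an error.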

3. Performance advantages

The coding performance of H.264, MPEG-4, and H.263 was compared at the following 6 test points: 32 kbit/s, 10 frames/s, QCIF; 64 kbit/s, 15 frames/s, QCIF; 128 kbit/s, 15 frames/s, CIF; 256 kbit/s, 15 frames/s, QCIF; 512 kbit/s, 30 frames/s, CIF; and 1024 kbit/s, 30 frames/s, CIF. The test results show that H.264 has better PSNR performance than MPEG-4 and H.263.

On average, the PSNR of H.264 is 2 dB higher than that of MPEG-4 and 3 dB higher than that of H.263.
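
For readers unfamiliar with the metric, PSNR is computed from the mean squared error between the original and decoded samples. The sketch below uses the usual definition for 8-bit video, 10·log10(255² / MSE); the function name `psnr` and the sample values are illustrative and are not taken from the tests cited above.

```python
import math


def psnr(original: list, decoded: list, peak: int = 255) -> float:
    """Peak signal-to-noise ratio in dB between two equal-length sample lists."""
    if len(original) != len(decoded) or not original:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((a - b) ** 2 for a, b in zip(original, decoded)) / len(original)
    if mse == 0:
        return math.inf  # identical signals
    return 10 * math.log10(peak ** 2 / mse)


original = [100, 100, 100, 100]
decoded = [101, 99, 101, 99]   # every sample is off by one, so MSE = 1
print(round(psnr(original, decoded), 2))  # prints: 48.13
```

Since the scale is logarithmic, a gain of 3 dB corresponds to roughly halving the mean squared error.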

4. New fast motion estimation algorithm

UMHexagonS (a Chinese patent) is a new fast motion estimation algorithm that saves more than 90% of the computation compared with the fast full search algorithm originally used in H.264. Its full name is "Unsymmetrical-Cross Multi-Hexagon Search", and it is an integer-pixel motion estimation algorithm. Because it keeps computational complexity very low while maintaining good rate-distortion performance when encoding image sequences with large motion at high bit rates, it has been officially adopted by the H.264 standard.

H.264 (MPEG-4 Part 10), developed jointly by ITU and ISO, may be accepted as a unified standard for broadcasting, communications, and storage media (CD and DVD), and it is the most likely candidate to become the standard for broadband interactive new media. China's source coding standard has not yet been finalized; work on it is being stepped up while the development of H.264 is followed closely.

The H.264 standard has brought moving image compression technology to a higher stage. Providing high-quality image transmission at lower bandwidth is the application highlight of H.264. Its adoption places higher demands on video terminals, gatekeepers, gateways, MCUs, and other systems, and will effectively drive the continuous improvement of video conferencing software and hardware in all respects.