Who invented MP4?

Video codec

Wikipedia, the free encyclopedia

A video codec is a program or device that compresses or decompresses digital video. This compression is usually lossy. Historically, video signals were stored on magnetic tape in analog form. With the arrival of optical discs on the market, audio came to be stored digitally, video followed into digital formats, and a family of related technologies developed around them.

Both audio and video call for specialized compression methods, and engineers and mathematicians have tried many different approaches to the problem.

Codec design involves a complex balance among many factors: video quality, the amount of data needed to represent the video (usually called the bit rate), the complexity of the encoding and decoding algorithms, robustness to data loss and errors, ease of editing, random access, the maturity of the algorithm design, end-to-end delay, and so on.

Applications

Video codecs are used widely in daily life, for example on DVD (MPEG-2), VCD (MPEG-1), in various satellite and terrestrial television broadcasting systems, and on the Internet. Online video material is compressed with many different codecs, so to view it correctly users often need to download and install a codec pack, a bundle of codec components compiled for the PC.

With the arrival of DVD burners, compressing one's own video has become increasingly popular. Because commercially pressed DVDs are usually high-capacity (dual-layer) and dual-layer burners are not yet widespread, users sometimes re-compress DVD material so that it fits completely on a single-sided DVD.

Video codec design

The first step in a typical digital video codec is to convert the camera's video input from the RGB color space to the YCbCr color space, usually accompanied by chroma subsampling to produce 4:2:0 video (4:2:2 sampling is sometimes used for interlaced material). Converting to YCbCr brings two benefits: first, it partially decorrelates the color signals and improves compressibility; second, it separates out the luma signal, which matters most to visual perception, while the chroma signals, which matter less, can be subsampled to a lower resolution (4:2:0 or 4:2:2) without noticeably affecting the viewing experience.
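As a rough illustration (not taken from the article), the sketch below converts an RGB frame to YCbCr using the common BT.601 full-range coefficients and then 4:2:0-subsamples the chroma planes by averaging 2×2 blocks; the function names and the use of NumPy are my own choices.

```python
# Sketch: RGB -> YCbCr conversion followed by 4:2:0 chroma subsampling,
# using the widely used BT.601 full-range coefficients (an assumption here;
# real codecs may use other matrices and ranges).
import numpy as np

def rgb_to_ycbcr(rgb):
    """rgb: float array of shape (H, W, 3) with values in [0, 255]."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y  =  0.299 * r + 0.587 * g + 0.114 * b             # luma
    cb = -0.168736 * r - 0.331264 * g + 0.5 * b + 128   # blue-difference chroma
    cr =  0.5 * r - 0.418688 * g - 0.081312 * b + 128   # red-difference chroma
    return y, cb, cr

def subsample_420(chroma):
    """Average each 2x2 block into one chroma sample (4:2:0)."""
    h, w = chroma.shape
    return chroma[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

rgb = np.random.randint(0, 256, (16, 16, 3)).astype(np.float64)
y, cb, cr = rgb_to_ycbcr(rgb)
cb420, cr420 = subsample_420(cb), subsample_420(cr)
print(y.shape, cb420.shape)   # (16, 16) (8, 8): each chroma plane keeps 1/4 of the samples
```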

Before the actual coding, downsampling in the spatial or temporal domain can also effectively reduce the amount of raw video data.

The input video frames are usually divided into macroblocks that are coded separately. A macroblock typically consists of a 16×16 block of luma samples and the corresponding chroma blocks. Block motion compensation is then used to predict the current frame from previously coded frames, and a block transform or subband decomposition reduces the remaining spatial correlation; the most common transform is the 8×8 discrete cosine transform (DCT). The transform coefficients are quantized, and the quantized coefficients are entropy coded into the output bitstream. In practice, when the DCT is used, the quantized two-dimensional coefficients are reordered into one dimension by a zigzag scan; each run of zero coefficients together with the following non-zero coefficient level is coded as one symbol, and a special symbol usually indicates that all remaining coefficients are zero. The entropy coding at this stage is typically variable-length coding.
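The following sketch illustrates the zigzag-scan and run/level step described above for one 8×8 block of quantized coefficients. It is a simplified illustration, not the scan table or syntax of any particular standard; the end-of-block marker and the symbol layout are assumptions.

```python
# Sketch: zig-zag scan of an 8x8 block of quantized coefficients, followed by
# simple (run-of-zeros, level) symbol generation with an end-of-block marker.
import numpy as np

def zigzag_order(n=8):
    """Return the (row, col) visiting order of an n x n zig-zag scan."""
    return sorted(((r, c) for r in range(n) for c in range(n)),
                  key=lambda rc: (rc[0] + rc[1],
                                  rc[0] if (rc[0] + rc[1]) % 2 else rc[1]))

def run_level(block):
    """Convert a quantized block into (run, level) pairs plus an 'EOB' symbol."""
    coeffs = [block[r, c] for r, c in zigzag_order(block.shape[0])]
    while coeffs and coeffs[-1] == 0:   # trailing zeros are covered by EOB
        coeffs.pop()
    symbols, run = [], 0
    for v in coeffs:
        if v == 0:
            run += 1
        else:
            symbols.append((run, int(v)))
            run = 0
    symbols.append("EOB")
    return symbols

block = np.zeros((8, 8), dtype=int)
block[0, 0], block[0, 1], block[2, 0] = 12, -3, 1   # a few surviving coefficients
print(run_level(block))   # [(0, 12), (0, -3), (1, 1), 'EOB']
```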

Decoding essentially reverses the encoding process. The one step that cannot fully restore the original information is quantization; the decoder can only approximate the original values as closely as possible, in a step called inverse quantization, because quantization itself is inherently irreversible.
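A minimal sketch of why inverse quantization cannot recover the original values: the rounding in the forward step discards information. The step size of 8 is an arbitrary illustrative choice, not a value from any standard.

```python
# Sketch: uniform quantization and "inverse quantization". The rounding step
# is where information is lost, so reconstruction only approximates the input.
import numpy as np

def quantize(coeffs, step):
    return np.round(coeffs / step).astype(int)   # lossy: rounding happens here

def dequantize(levels, step):
    return levels * step                          # inverse quantization

coeffs = np.array([57.0, -13.0, 4.0, 0.6])
levels = quantize(coeffs, step=8)
print(levels, dequantize(levels, 8))   # [7 -2 0 0] reconstructs to [56 -16 0 0]
```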

Video codec design is usually standardized, that is, published documents specify precisely how it is done. In fact, to make encoded bitstreams interoperable (so that a stream produced by encoder A can be decoded by decoder B, and vice versa), it is sufficient to standardize the decoding process. The encoding process is normally not fully defined by the standard: anyone is free to design their own encoder, as long as the bitstream it produces conforms to the decoding specification. As a result, the same video source encoded by different encoders under the same standard can yield decoded pictures of widely varying quality.

Common video codecs

Many video codecs can be implemented easily on personal computers and consumer electronics, which makes it possible to support several codecs on the same device and avoids a single dominant codec stifling the development and adoption of others for compatibility reasons. In the end, no single codec has replaced all the others. The following are some commonly used video codecs, listed in the order in which they became international standards:

H.261

H.261 is used mainly in older video-conferencing and videophone products. It was the first digital video compression standard developed by ITU-T, and essentially all later standardized video codecs are based on its design. It uses the familiar YCbCr color space, 4:2:0 chroma sampling, 8-bit sample precision, 16×16 macroblocks, block motion compensation, an 8×8 block discrete cosine transform, quantization, zigzag scanning of the quantized coefficients, run-length-to-symbol mapping, and Huffman coding. H.261 supports only progressive-scan video input.

MPEG-1 Part 2

MPEG-1 Part 2 is used mainly on VCD, and some online video also uses this format. Its quality is roughly comparable to that of an original VHS tape, but, being a digital format, a VCD does not gradually degrade with repeated playback and age the way a VHS tape does. If the source video is good enough and the bit rate high enough, VCD can surpass VHS in every respect; achieving this, however, usually requires a bit rate higher than the VCD standard allows. In fact, if playability on all VCD players is required, video bit rates above 1150 kbps and resolutions above 352×288 cannot be used, although in practice this restriction only matters for some stand-alone VCD players (including some DVD players). MPEG-1 Part 3 also includes the now-ubiquitous *.mp3 audio codec. In terms of reach, the MPEG-1 video and audio codecs may be the most widespread of all: virtually every computer in the world can play MPEG-1 files, and almost all DVD players can play VCDs. Technically, MPEG-1 adds half-pixel motion compensation and bidirectionally predicted frames to the H.261 design. Like H.261, MPEG-1 supports only progressive-scan video input.

MPEG-2 Part 2

MPEG-2 Part 2 is equivalent to H.262 and is used on DVD and SVCD and in most digital video broadcasting and cable TV distribution systems. On standard DVDs it supports high picture quality and widescreen; on SVCD its quality is not as good as DVD but much better than VCD. Unfortunately, an SVCD holds at most about 40 minutes of content per CD, whereas a VCD holds an hour, which reflects the higher average bit rate of SVCD. MPEG-2 will also be used in the next-generation DVD standards HD-DVD and Blu-ray Disc. Technically, the biggest improvement of MPEG-2 over MPEG-1 is support for interlaced video. MPEG-2 is a fairly old video coding standard, but it enjoys great popularity and market acceptance.

H.263

H.263 is used mainly for video conferencing, videophones and network video. For progressive-scan sources it offers a large performance improvement over the preceding video coding standards; at low bit rates in particular, it can save a great deal of bandwidth while maintaining a given quality.

MPEG-4 Part 2

MPEG-4 Part 2 can be used for network transmission, broadcasting and media storage. Its compression performance improves on MPEG-2 and the first version of H.263. Its main departures from earlier standards are an "object-oriented" coding model and several other technologies not aimed simply at improving the compression of ordinary video; it does, however, also introduce techniques that improve compression, including some H.263 techniques and quarter-pixel motion compensation. Like MPEG-2, it supports both progressive and interlaced scanning.

MPEG-4 Part 10

MPEG-4 Part 10 is technically identical to the ITU-T H.264 standard and is sometimes called "AVC". This newly developed standard, the latest completed jointly by ITU-T VCEG and ISO/IEC MPEG, is seeing growing use. It introduces a series of new techniques that greatly improve compression performance, substantially exceeding earlier standards at both high and low bit rates. Products that use or will use H.264 include Sony's PSP, Nero's digital product suite, Apple's Mac OS X v10.4, and the next-generation DVD standards HD-DVD and Blu-ray Disc.

DivX, XviD and 3ivx

The DivX, XviD and 3ivx video codecs are all essentially built on MPEG-4 Part 2 technology; files with extensions such as *.avi, *.mp4, *.ogm or *.mkv may use these codecs.

WMV

WMV (Windows Media Video) is Microsoft's family of video codecs, including WMV 7, WMV 8, WMV 9 and WMV 10. The family covers everything from narrowband video for dial-up Internet access to broadband HDTV video. Users of Windows Media Video can also burn video files to CD, DVD or other devices, and the format is also suitable for media servers. WMV can be regarded as an enhanced version of MPEG-4. The newest version of WMV is the VC-1 standard being drafted by SMPTE: WMV 9 (development codename "Corona") was first submitted as VC-9 and was later renamed VC-1 by SMPTE (VC stands for video codec).

RealVideo

RealVideo is a video codec developed by RealNetworks. After a period of decline it regained market favor, and recently it has become especially popular for movies shared over BitTorrent.

Sorenson 3

Sorenson 3 is the codec used by Apple's QuickTime software. Many QuickTime-format videos on the Internet are compressed with this codec.

Cinepak

Cinepak is an older codec, also used by Apple's QuickTime. Its advantage is that even very old computers (such as a 486) can decode it and play video smoothly.

Indeo Video

Indeo Video is a codec developed by Intel.

All the codecs mentioned above have their own strengths and weaknesses, and articles comparing them are common. In such comparisons the most important considerations are bit rate and picture quality taken together (usually meaning rate-distortion characteristics) and robustness.

H.264/MPEG-4 AVC

Wikipedia, the free encyclopedia

H.264, also known as MPEG-4 Part 10, is a highly compressed digital video coding standard proposed by the Joint Video Team (JVT), formed by the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG). The ITU-T H.264 standard and ISO/IEC MPEG-4 Part 10 (formally ISO/IEC 14496-10) are technically identical; the latter is also called AVC, for Advanced Video Coding. The final draft of the first version of the standard was completed in May 2003.

H.264 is the ITU-T name, following the H.26x series of standards, while AVC is the ISO/IEC MPEG name. The standard is therefore usually written H.264/AVC (or AVC/H.264, H.264/MPEG-4 AVC, or MPEG-4/H.264 AVC) to make both lineages explicit. The project originated as an ITU-T effort called H.26L, and although that name is less common it is still used occasionally. The standard is also sometimes called the "JVT codec" because it was developed by the JVT (joint development of a single standard by the two organizations is not unprecedented: the earlier MPEG-2 video standard was developed jointly by MPEG and ITU-T, which is why MPEG-2 is called H.262 in ITU-T naming).

The original goal of the H.264/AVC project was for the new codec to deliver good video quality at substantially lower bit rates (for example, half or less) than previous standards such as MPEG-2 or H.263, without requiring so many complex coding tools that implementation would become impractical. A further goal was adaptability: the codec should serve a wide range of applications (covering both high and low bit rates and different video resolutions) and work over many networks and systems (for example multicast, DVD storage, RTP/IP packet networks and ITU-T multimedia telephony systems).

JVT recently completed an extension of the original standard known as the Fidelity Range Extensions (FRExt). The extension supports higher-fidelity video coding through higher sample precision (including 10-bit and 12-bit samples) and higher-resolution chroma formats (including YUV 4:2:2 and YUV 4:4:4). It also adds new features, such as adaptive switching between 4×4 and 8×8 integer transforms, user-defined quantization weighting matrices, efficient lossless coding, and support for additional color spaces and a residual color transform. The design work was finished in July 2004 and the draft was completed in September 2004.

Since the first version of the standard was completed in May 2003, JVT has finished one set of corrigenda to the standard; a second set has recently been completed and approved by ITU-T, and is expected to be approved by MPEG shortly.

Technical details

H.264/AVC contains a number of new features that make it not only more efficient than previous codecs but also usable in a wide variety of network applications. These new features include:

Multi-reference-frame motion compensation. Compared with earlier video coding standards, H.264/AVC uses previously coded frames as references in a much more flexible way: in some cases as many as 32 reference frames can be used (in earlier standards the limit was one reference frame, or two for B frames). This reduces the bit rate or improves quality for most scene types, and for certain kinds of content, such as rapid repeated flashes, repeated cuts, or background occlusion, it can reduce the bit rate dramatically.

Variable block-size motion compensation. Blocks from 16×16 down to 4×4 can be used for motion estimation and compensation, allowing more precise segmentation of the moving regions in an image sequence.

A six-tap filter is used to generate half-pel luma sample predictions, which reduces ringing and ultimately yields sharper images.
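H.264 derives half-pel luma samples with a six-tap filter whose weights are (1, −5, 20, 20, −5, 1), normalized by 32. The sketch below shows the idea for one horizontal half-pel position; boundary handling and the standard's full interpolation grid are omitted, and the sample values are made up.

```python
# Sketch: one horizontal half-pel luma sample from the six-tap filter
# (1, -5, 20, 20, -5, 1), with rounding, division by 32, and clipping to 8 bits.
def half_pel(row, x):
    """Half-pel sample between integer positions x and x+1 of a pixel row."""
    taps = (1, -5, 20, 20, -5, 1)
    window = row[x - 2:x + 4]                     # six neighbouring samples
    acc = sum(t * s for t, s in zip(taps, window))
    return min(255, max(0, (acc + 16) >> 5))      # round, divide by 32, clip

row = [10, 12, 20, 80, 200, 210, 212, 215]
print(half_pel(row, 3))   # an interpolated value between row[3] = 80 and row[4] = 200
```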

A macroblock pair structure allows 16×16 macroblocks to be used in field mode (compared with 16×8 in MPEG-2).

Quarter-pel motion compensation provides more accurate prediction of moving blocks. Since chroma is usually sampled at half the luma resolution (see 4:2:0), chroma motion compensation reaches one-eighth-pel precision.

Weighted prediction, in which the motion-compensated prediction is scaled and offset by encoder-specified weights. This can give considerable coding gains in special cases such as fade-in, fade-out and cross-fade transitions.
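Conceptually, weighted prediction scales and offsets the motion-compensated reference before it is used as the prediction, which is what makes fades cheap to code. A toy sketch follows; the weight and offset values are arbitrary and the standard's fixed-point arithmetic is not reproduced.

```python
# Sketch: weighted prediction applies a scale and offset to the reference block.
import numpy as np

def weighted_pred(ref_block, weight, offset):
    """Scale and offset a reference block, clipping to the 8-bit range."""
    return np.clip(np.round(ref_block * weight + offset), 0, 255).astype(int)

ref = np.full((4, 4), 120)
print(weighted_pred(ref, weight=0.5, offset=0))   # halves the brightness, as during a fade
```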

An in-loop deblocking filter reduces the blocking artifacts common to other video codecs based on the discrete cosine transform (DCT).

An exact-match integer 4×4 spatial transform (similar in design to the discrete cosine transform); the high-fidelity extensions add an integer 8×8 transform and allow adaptive selection between the 4×4 and 8×8 transforms.
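The 4×4 core transform can be written as an exact integer matrix product; the sketch below uses the well-known H.264 forward transform matrix and leaves out the scaling that the standard folds into quantization.

```python
# Sketch: the 4x4 integer core transform of H.264 (scaling omitted; the
# standard absorbs it into the quantization step).
import numpy as np

Cf = np.array([[1,  1,  1,  1],
               [2,  1, -1, -2],
               [1, -1, -1,  1],
               [1, -2,  2, -1]])

def forward_4x4(block):
    """Integer-only forward transform of a 4x4 residual block."""
    return Cf @ block @ Cf.T

residual = np.arange(16).reshape(4, 4)   # toy residual block
print(forward_4x4(residual))             # exact integer arithmetic, no rounding drift
```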

A secondary Hadamard transform is applied to the DC coefficients of the primary 4×4 transform (for chroma DC coefficients, and in a special case for luma DC coefficients as well), which gives better compression in smooth regions.
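A small sketch of this secondary Hadamard transform on a 4×4 array of DC coefficients (normalization omitted, as the standard handles it in quantization). In a smooth macroblock the DC values are nearly equal, so almost all the energy collapses into a single coefficient.

```python
# Sketch: 4x4 Hadamard transform of the DC coefficients of a macroblock.
import numpy as np

H = np.array([[1,  1,  1,  1],
              [1,  1, -1, -1],
              [1, -1, -1,  1],
              [1, -1,  1, -1]])

def hadamard_4x4(dc_block):
    """dc_block: 4x4 array of DC coefficients, one per 4x4 luma block."""
    return H @ dc_block @ H.T

dc = np.full((4, 4), 100)   # a smooth macroblock: nearly equal DC values
print(hadamard_4x4(dc))     # all the energy ends up in a single coefficient
```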

Spatial intra prediction from the boundary pixels of neighbouring blocks (more effective than the DC-coefficient prediction used in MPEG-2 video and the transform-coefficient prediction used in H.263+ and MPEG-4 video).
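As an illustration of spatial intra prediction, the sketch below implements only the simplest mode, DC prediction of a 4×4 block from its reconstructed neighbours; it is not the normative H.264 procedure, which uses integer rounding rules and several directional modes as well.

```python
# Sketch: "DC" intra prediction fills a 4x4 block with the mean of the
# reconstructed pixels directly above and to the left of it.
import numpy as np

def intra_dc_4x4(top, left):
    """Predict a 4x4 block from its top row and left column of neighbours."""
    dc = int(round((sum(top) + sum(left)) / (len(top) + len(left))))
    return np.full((4, 4), dc, dtype=int)

top  = [100, 102, 104, 106]     # reconstructed pixels above the block
left = [101, 103, 105, 107]     # reconstructed pixels to the left
print(intra_dc_4x4(top, left))  # a flat block at the neighbourhood mean
```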

Context-adaptive binary arithmetic coding (CABAC) losslessly codes the various syntax elements while flexibly adapting the probability models to the local context.

Context-adaptive variable-length coding (CAVLC) is used to code the quantized transform coefficients. It has lower complexity and a lower compression ratio than CABAC, but is still considerably more efficient than the entropy coding schemes used in earlier video coding standards.

A simple entropy coding scheme, Exponential-Golomb coding, is used for the syntax elements not coded by CABAC or CAVLC.
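Unsigned Exp-Golomb codes are simple enough to show in full; the sketch below encodes and decodes the ue(v) form used for many H.264 syntax elements (signed and truncated variants are omitted).

```python
# Sketch: unsigned Exp-Golomb (ue(v)) encoding and decoding.
def exp_golomb_encode(k):
    """Encode a non-negative integer as a ue(v) bit string."""
    bits = bin(k + 1)[2:]                  # binary representation of k + 1
    return "0" * (len(bits) - 1) + bits    # leading zeros, then the value

def exp_golomb_decode(bitstring):
    """Decode one ue(v) codeword from the front of a bit string."""
    zeros = 0
    while bitstring[zeros] == "0":
        zeros += 1
    value = int(bitstring[zeros:2 * zeros + 1], 2) - 1
    return value, bitstring[2 * zeros + 1:]

for k in range(6):
    print(k, exp_golomb_encode(k))   # 0 -> '1', 1 -> '010', 2 -> '011', 3 -> '00100', ...
```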

A Network Abstraction Layer (NAL) lets the same video syntax be used in many network environments, and sequence parameter sets (SPSs) and picture parameter sets (PPSs) provide greater robustness and flexibility.

Switching slices (SP and SI slices) let the encoder direct a decoder to jump into an ongoing video stream, for purposes such as bit-rate switching between streams and trick-mode operation. When a decoder jumps into the middle of a stream via an SP/SI slice it obtains an exactly matching decoded reconstruction, unless subsequent frames reference pictures from before the switch point.

Flexible macroblock ordering (FMO, also known as slice groups) and arbitrary slice ordering (ASO) change the order in which macroblocks, the basic coded units of a picture, are encoded. This can improve the robustness of the bitstream on error-prone channels and serve several other purposes.

Data partitioning (DP) lets syntax elements of different importance be packed and transmitted separately, combined with unequal error protection (UEP) to improve the resilience of the video stream to channel errors and packet loss.

Redundant slices (RS) are another technique for improving bitstream robustness. Using them, the encoder can send an additional coded representation (usually at a lower resolution) of a region of a picture, or of the whole picture, so that if the primary representation is corrupted or lost, the redundant one can be decoded instead.

An automatic byte-stream packing method prevents coded data from accidentally imitating start codes within the stream; start codes are the codewords used for random access and resynchronization.
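The idea can be sketched as a byte-stuffing pass: whenever the payload is about to reproduce a start-code-like pattern, an escape byte 0x03 is inserted. This is a simplified illustration; the standard's exact placement rules differ in detail.

```python
# Sketch: emulation prevention. If two zero bytes would be followed by a byte
# 0x00-0x03, insert 0x03 so the payload never contains the start code 0x000001.
def emulation_prevention(payload: bytes) -> bytes:
    out, zeros = bytearray(), 0
    for b in payload:
        if zeros >= 2 and b <= 0x03:
            out.append(0x03)          # escape byte breaks the zero run
            zeros = 0
        out.append(b)
        zeros = zeros + 1 if b == 0x00 else 0
    return bytes(out)

print(emulation_prevention(bytes([0x00, 0x00, 0x01, 0x65])).hex())
# prints '0000030165': the forbidden 0x000001 pattern no longer appears
```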

Supplemental enhancement information (SEI) and video usability information (VUI) add ways of attaching extra information to the video stream, providing interfaces for various applications.

Auxiliary pictures can be used for special purposes such as alpha compositing.

Frame numbering supports the creation of sub-sequences of a video sequence (enabling temporal scalability), and also supports the detection and concealment of losses of entire frames (caused by network packet loss or channel errors).

Picture order counting keeps the ordering of frames and the pixel values of decoded pictures independent of timing information, so that timing can be carried, controlled and changed by a separate system without affecting the decoded picture content.

Together with other techniques, these features give H.264 markedly better performance than earlier codecs and support a much wider range of applications and environments. Compared with MPEG-2, the compression performance of H.264 is greatly improved: at the same picture quality the bit rate can be halved or reduced even further.

Like other MPEG video standards, H.264/AVC has a reference software implementation that can be downloaded for free. Its main purpose is to demonstrate the features of H.264/AVC rather than to serve as a production implementation (see the links below for the download address). Some hardware reference designs are also under way within MPEG.

Patent licensing

As with MPEG-2 Parts 1 and 2 and MPEG-4 Part 2, manufacturers and service providers using H.264/AVC must pay patent license fees to the holders of the patents their products use. The main source of these licenses is a private organization called MPEG LA, LLC (which, despite the name, is unrelated to the MPEG standardization body; it also administers patent licenses for MPEG-2 Part 1 systems and Part 2 video, MPEG-4 Part 2 video, and other technologies).

Applications

The two main competing formats for the next-generation DVD plan to add H.264/AVC High Profile as a mandatory player feature in the second half of 2005:

The HD-DVD format, developed by the DVD Forum

The Blu-ray Disc format, developed by the Blu-ray Disc Association (BDA)

In the second half of 2004, the European DVB standards body adopted H.264/AVC for European television broadcasting.

In the second half of 2004, French Prime Minister Jean-Pierre Raffarin announced that H.264/AVC had been chosen as a requirement for French HDTV receivers and digital terrestrial television services.

The ATSC standard organization in the United States is considering the possibility of adopting H.264/AVC in American television broadcasting.

South Korea's digital multimedia broadcasting (DMB) service will adopt H.264/AVC.

The H.264/AVC codec will be used for the mobile (cell-phone) broadcasting services of ISDB-T in Japan, whose major broadcasters include:

Japan Broadcasting Corporation (NHK)

Tokyo Broadcasting Corporation

Nippon Television Network Corporation

TV Asahi

Fuji TV

TV Tokyo Corporation

Direct broadcast satellite television services will also use the standard, including:

News Corp./DirecTV (in the United States)

Echostar/Dish Network/Voom TV (in USA)

Euro1080 (Europe)

Premiere (Germany)

British Sky Broadcasting Corporation (in Britain and Ireland)

The 3rd Generation Partnership Project (3GPP) has approved H.264/AVC as an optional technology in Release 6 of its mobile multimedia telephony service standards.

The Motion Imagery Standards Board (MISB) of the US Department of Defense has accepted H.264/AVC as its recommended video codec.

The Internet Engineering Task Force (IETF) has completed a payload packing format (RFC 3984) for carrying H.264/AVC streams over the Real-time Transport Protocol (RTP).

The Internet Streaming Media Alliance (ISMA) has accepted H.264/AVC in its ISMA 2.0 technical specification.

The MPEG organization has fully integrated H.264/AVC into its systems specifications (such as MPEG-2 Systems and MPEG-4 Systems) and into its ISO media file format specification.

The ITU-T standards group of the International Telecommunication Union has adopted H.264/AVC as part of the system specifications of its H.32x family of multimedia telephony systems. Since that adoption, H.264/AVC has been widely used in video-conferencing systems and is supported by the two leading videophone product vendors (Polycom and Tandberg); in fact, essentially all new video-conferencing products support H.264/AVC.

H.264 is also likely to be used by video-on-demand (VOD) systems that deliver movies and television programs directly to personal computers over the Internet.

Products and implementation

Several companies are producing programmable chips that can decode H.264/AVC video, including Broadcom (BCM7411), Conexant (CX2418x), NeoMagic (MiMagic 6) and STMicroelectronics (STB71); Sigma Designs expects to provide samples in March 2005. The appearance of these chips should greatly accelerate the spread of low-cost H.264/AVC playback at both standard and high definition. Four of the five chips (all except NeoMagic's, which targets low-power applications) can play high-definition video, and most of them will support the High Profile of the standard.

Apple has integrated H.264 into Mac OS X v10.4 (codenamed Tiger) and released a QuickTime version supporting H.264 in May 2005. In April 2005, Apple upgraded its DVD Studio Pro software to support authoring HD content; the software can record HD-DVD-format content onto standard DVD or HD-DVD media (although no corresponding burner exists yet). Playing HD-DVD content recorded on a standard DVD requires a PowerPC G5, Apple DVD Player v4.6, and Mac OS X v10.4 or later.

Envivio offers real-time standard-definition encoders and off-line high-definition (720p, 1080i, 1080p) encoders for H.264 multicast, as well as H.264 decoders, H.264 video servers and licensing tools for the Windows, Linux and Macintosh platforms.

Modulus Video provides broadcast-quality H.264 standard-definition real-time encoders for broadcast and telco applications, and has announced a high-definition real-time encoder (ME6000) for mid-2005. In April 2004 the company demonstrated the high-definition real-time encoder at NAB, where it won a "Pick Hit" award. The company uses LSI Logic technology.

Tandberg Television has introduced the EN5990 real-time encoder; DirecTV and BSkyB have used the EN5990 for their direct broadcast satellite (DBS) services.

Harmonic has also introduced a real-time encoder (the DiviCom MV 100). TF1 (the French broadcaster) and Video Networks Limited (VNL) have announced the use of this product in home video-on-demand services in London. Pace Micro has supplied set-top boxes to several major direct-broadcast-satellite companies.

Sony's PSP (PlayStation Portable) provides hardware support for decoding H.264 Main Profile at Level 3.

Nero Digital, the software package developed by Nero AG together with Ateme, provides support for H.264 encoding and won the "Pick Hit" award [1] at Doom9.

Sorenson's AVC Pro codec, its implementation of H.264, is included in Sorenson Squeeze 4.1 for MPEG-4.

x264 is a free-software implementation, available for download under the GPL license.

Latest news: InterVideo's WinDVD 7 was officially released on June 24th, 2005, in WinDVD 7 Gold and WinDVD 7 Platinum editions. The Platinum edition supports H.264/MPEG-4 AVC decoding and playback, with a recommended configuration of a Pentium 4 at 3.6 GHz (non-original version).

Latest news: The X1300, X1600 and X1800 series graphics chips released by ATI on October 5th, 2005 support hardware-accelerated H.264 decoding.