I. Current situation and development trends of international audio coding technology
At present, the ISO/IEC Moving Picture Experts Group (MPEG) has introduced several audio coding technologies. MPEG-1 Audio (ISO/IEC 11172-3) is divided into three layers according to coding complexity, and supports mono and two-channel coding at sampling rates of 32, 44.1 and 48 kHz. Its third layer (MP3) encodes most two-channel stereo music at 128 kbit/s with sound quality close to that of CD, and it has become the preferred format for online music and portable electronic devices. MPEG-2 BC (ISO/IEC 13818-3) is a backward-compatible multichannel extension of MPEG-1: it adds a low-frequency effects (LFE) channel, extends coding to 5.1 channels, and supports the lower sampling rates of 16, 22.05 and 24 kHz. MPEG-2 Advanced Audio Coding (ISO/IEC 13818-7, AAC), which represents the state of the art in MPEG audio coding, provides high-quality coding of 1 to 48 channels at sampling rates from 8 to 96 kHz. It covers applications ranging from telephone-quality audio at 8 kbit/s up to high-quality multichannel audio at 160 kbit/s per channel. When AAC encodes mono audio at 64 kbit/s, most music approaches CD quality. Compared with the 96 kbit/s needed by MP3, this is a substantial gain in coding efficiency, and AAC is therefore regarded as the next-generation audio coding standard.
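To put these bit rates in perspective, the short calculation below compares uncompressed CD audio (44.1 kHz, 16-bit samples, two channels, the standard CD-DA format) with a 128 kbit/s MP3 stream. It only illustrates the arithmetic; the helper names are hypothetical.

```python
def pcm_bit_rate(sample_rate_hz: int, bits_per_sample: int, channels: int) -> int:
    """Bit rate of uncompressed linear PCM audio, in bit/s."""
    return sample_rate_hz * bits_per_sample * channels


def compression_ratio(source_bps: float, coded_bps: float) -> float:
    """How many times smaller the coded stream is than the source."""
    return source_bps / coded_bps


# CD audio: 44.1 kHz sampling, 16-bit samples, two channels.
cd_bps = pcm_bit_rate(44_100, 16, 2)
mp3_bps = 128_000

print(f"CD PCM: {cd_bps / 1000:.1f} kbit/s")  # CD PCM: 1411.2 kbit/s
print(f"MP3 at 128 kbit/s: about {compression_ratio(cd_bps, mp3_bps):.1f}:1")
```

In other words, "near-CD quality at 128 kbit/s" corresponds to discarding roughly ten out of every eleven bits of the PCM signal.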
For multichannel surround sound, Dolby Laboratories' AC-3 codes audio sampled at 32, 44.1 and 48 kHz in configurations from mono up to 5.1-channel surround, at bit rates from 32 kbit/s to 640 kbit/s. Dolby AC-3 is currently well regarded for its good reproduction of the sound field and of sound images.
Other well-established audio coding technologies, such as Sony's ATRAC, Bell Laboratories' PAC and Microsoft's WMA, have also been widely deployed.
Judging from current international applications, digital audio coding is widely used on the Internet, in broadcasting, in personal consumer electronics and in digital film and television, and with the rise of 3G it is entering mobile communications. The new generation of digital audio coding technology therefore faces higher demands for transmission reliability, bandwidth efficiency and copyright security.
China entered the field of digital audio coding relatively late. Research groups at Tsinghua University, Tianjin University, Xidian University, Harbin Institute of Technology, South China University of Technology, Southeast University, Beijing University of Posts and Telecommunications and elsewhere are working in this area, but mature and complete results have not yet been achieved.
II. International standards and technical features of image and video coding
Over the past 10 years, image coding technology has developed rapidly and is becoming increasingly mature. The clearest sign of this is the formulation of several international image coding standards: JPEG for still images, developed jointly by the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC); the ITU-T video coding standards H.261 and H.263; and the ISO/IEC MPEG standards for moving images. These standards combine the best available image coding methods and represent the current state of the art.
1. JPEG (Joint Photographic Experts Group)
JPEG is a still-image compression standard developed by the ISO/IEC Joint Photographic Experts Group. It is an international standard for compressing continuous-tone (grayscale and color) still images. The JPEG algorithm has four operating modes in total: one lossless mode based on spatial prediction (DPCM), and three lossy modes based on the DCT.
1) Lossless mode: guarantees exact reconstruction of the original image.
2) Sequential mode (DCT-based): the image is encoded in a single scan, from top to bottom and from left to right; this is known as the baseline system.
3) Progressive mode (DCT-based): the image is encoded in several scans, from coarse to fine.
4) Hierarchical mode: the image is encoded at several resolutions, so that an image of the required resolution can be decoded according to different needs.
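The DCT-based sequential (baseline) mode can be sketched for a single 8x8 block: level-shift the samples by 128, apply the 2-D DCT, divide by a quantization table and read the result in zigzag order. This is a minimal, unoptimized sketch; it uses the example luminance table given in Annex K of the JPEG standard, omits the Huffman entropy-coding stage entirely, and its function names are illustrative.

```python
import math

# Example luminance quantization table from Annex K of the JPEG standard
# (informative: encoders are free to use other tables).
Q_LUMA = [
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
]


def dct_8x8(block):
    """Level-shift an 8x8 block of 8-bit samples by 128 and apply the 2-D DCT-II."""
    out = [[0.0] * 8 for _ in range(8)]
    for u in range(8):
        for v in range(8):
            cu = 1 / math.sqrt(2) if u == 0 else 1.0
            cv = 1 / math.sqrt(2) if v == 0 else 1.0
            acc = 0.0
            for x in range(8):
                for y in range(8):
                    acc += ((block[x][y] - 128)
                            * math.cos((2 * x + 1) * u * math.pi / 16)
                            * math.cos((2 * y + 1) * v * math.pi / 16))
            out[u][v] = 0.25 * cu * cv * acc
    return out


def quantize(coeffs, table=Q_LUMA):
    """Divide each coefficient by its table entry and round to the nearest integer."""
    return [[round(coeffs[u][v] / table[u][v]) for v in range(8)] for u in range(8)]


def zigzag(block):
    """Read an 8x8 block in JPEG zigzag order (low frequencies first)."""
    def key(rc):
        r, c = rc
        s = r + c
        return (s, r if s % 2 else c)
    order = sorted(((r, c) for r in range(8) for c in range(8)), key=key)
    return [block[r][c] for r, c in order]


flat = [[200] * 8 for _ in range(8)]  # a uniform block
print(zigzag(quantize(dct_8x8(flat)))[:4])  # [36, 0, 0, 0]
```

For a uniform block all the energy lands in the DC coefficient, and the long run of zeros after zigzag reordering is what the entropy coder then compresses cheaply.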
JPEG offers a wide range of compression ratios. The relationship between image quality and bit rate is roughly as follows:
A) 1.5 to 2.0 bit/pixel: usually indistinguishable from the original.
B) 0.75 to 1.5 bit/pixel: excellent quality, sufficient for most applications.
C) 0.5 to 0.75 bit/pixel: good to very good quality, sufficient for many applications.
D) 0.25 to 0.5 bit/pixel: moderate to good quality, sufficient for some applications.
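Because these bands are expressed in bits per pixel, the coded size of a given image follows directly. The sketch below assumes a 640x480 image and compares against uncompressed 24-bit RGB; both choices are illustrative, and real files add headers and tables that this ignores.

```python
def coded_size_bytes(width: int, height: int, bits_per_pixel: float) -> float:
    """Approximate size of an image coded at an average bit rate (bit/pixel)."""
    return width * height * bits_per_pixel / 8


def ratio_vs_24bit(bits_per_pixel: float) -> float:
    """Compression ratio relative to uncompressed 24-bit RGB."""
    return 24 / bits_per_pixel


# Band boundaries from the list above, for a 640x480 image.
for bpp in (0.25, 0.5, 0.75, 1.5, 2.0):
    size_kib = coded_size_bytes(640, 480, bpp) / 1024
    print(f"{bpp:>4} bit/pixel -> {size_kib:6.1f} KiB, about {ratio_vs_24bit(bpp):.0f}:1")
```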
2. JPEG-2000
JPEG-2000 achieves a compression ratio roughly 30% higher than that of the earlier JPEG standard and offers a number of advantages that JPEG cannot match. The biggest difference between the two is that JPEG-2000 abandons block-based DCT coding in favor of multi-resolution coding based on the wavelet transform.
First, JPEG-2000 supports lossless compression. In practice, some important images, such as satellite remote-sensing images, medical images and photographs of cultural relics, usually require lossless compression. Predictive coding is the classic lossless image coding method and was already standardized in the lossless mode of JPEG; JPEG-2000 instead achieves lossless compression with a reversible integer wavelet transform, so that lossy and lossless coding share a single framework.
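The reversible 5/3 wavelet that JPEG-2000 uses for lossless coding can be written as two integer lifting steps: a predict step that turns odd samples into detail coefficients, and an update step that turns even samples into a smoothed approximation. Because both steps use only integer arithmetic with floor division, the inverse undoes them exactly. This 1-D sketch assumes an even-length signal and symmetric extension at the borders; a real codec applies the transform along rows and columns over several decomposition levels, and the function names here are hypothetical.

```python
def fwd_53(x):
    """One level of the reversible 5/3 lifting transform on an even-length int list.

    Returns (s, d): low-pass (approximation) and high-pass (detail) coefficients.
    """
    n = len(x)
    assert n >= 2 and n % 2 == 0, "sketch assumes an even-length signal"
    half = n // 2
    # Predict: detail = odd sample minus floor of the mean of its even neighbours.
    d = []
    for i in range(half):
        left = x[2 * i]
        right = x[2 * i + 2] if 2 * i + 2 < n else x[2 * i]  # symmetric extension
        d.append(x[2 * i + 1] - (left + right) // 2)
    # Update: approximation = even sample plus a rounded quarter of nearby details.
    s = []
    for i in range(half):
        d_prev = d[i - 1] if i > 0 else d[0]  # symmetric extension
        s.append(x[2 * i] + (d_prev + d[i] + 2) // 4)
    return s, d


def inv_53(s, d):
    """Exact inverse of fwd_53: undo the update step, then the predict step."""
    half = len(s)
    n = 2 * half
    x = [0] * n
    for i in range(half):
        d_prev = d[i - 1] if i > 0 else d[0]
        x[2 * i] = s[i] - (d_prev + d[i] + 2) // 4
    for i in range(half):
        left = x[2 * i]
        right = x[2 * i + 2] if 2 * i + 2 < n else x[2 * i]
        x[2 * i + 1] = d[i] + (left + right) // 2
    return x


signal = [10, 12, 14, 200, 3, 7, 8, 8]
low, high = fwd_53(signal)
print(low, high)
print(inv_53(low, high) == signal)  # True
```

The same floor-based rounding appears in both directions, which is why the transform is lossless even though each step involves a division.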
Another advantage of JPEG-2000 is its robustness to bit errors, so systems built on JPEG-2000 tend to be stable, resistant to interference and simple to operate.
Progressive transmission is one of the most important features of JPEG-2000. The outline of an image can be transmitted first, followed by further data that steadily improves image quality as the user requires, which is very valuable for network transmission. When downloading a picture coded with JPEG-2000, a user can first see its outline or a thumbnail and then decide whether to download it in full. During the download, the quality of the image, and hence the amount of data transferred, can be chosen according to the user's needs and the available bandwidth.
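Progressive transmission in JPEG-2000 rests on coding wavelet coefficients bit-plane by bit-plane, most significant plane first, so that a truncated stream still decodes to a coarser image. The sketch below shows only that truncation idea on integer coefficients in sign-magnitude form; the standard itself codes each code-block with EBCOT context modelling and arithmetic coding, which is not reproduced here. The function name is hypothetical, and it reconstructs at the lower edge of each uncertainty interval rather than at the midpoint a real decoder would typically use.

```python
def truncate_bitplanes(coeffs, total_planes, keep):
    """Keep only the `keep` most significant magnitude bit-planes of each coefficient.

    Coefficients are treated in sign-magnitude form; magnitudes are assumed
    to fit in `total_planes` bits.
    """
    if not 0 <= keep <= total_planes:
        raise ValueError("keep must be between 0 and total_planes")
    drop = total_planes - keep
    out = []
    for c in coeffs:
        mag = (abs(c) >> drop) << drop  # zero the least significant planes
        out.append(-mag if c < 0 else mag)
    return out


coeffs = [200, -37, 5, 90]
for keep in range(1, 9):
    approx = truncate_bitplanes(coeffs, 8, keep)
    error = sum(abs(a - b) for a, b in zip(coeffs, approx))
    print(keep, approx, error)  # error shrinks as more planes arrive
```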
Another very important advantage of JPEG-2000 is region-of-interest (ROI) coding. Users can designate regions of interest in an image and specify a particular compression quality for them when compressing, or particular decompression requirements when decoding, which is very convenient. In many cases only a small part of an image is of interest to the user; coding that part at high quality while applying a high compression ratio to the rest greatly reduces the amount of data without losing important information. This is the strategy adopted by ROI coding. Its advantage is that it takes the receiver's subjective requirements into account and enables interactive compression.
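One ROI technique defined in JPEG-2000 Part 1 is the Maxshift method: the encoder scales ROI coefficients up by 2^s, choosing s so that every nonzero ROI coefficient lies above all background magnitudes, and the ROI bit-planes are therefore transmitted first. The decoder needs only s to undo the shift, with no description of the ROI shape. The sketch below works on a flat list of integer coefficients with a boolean mask; the function names are hypothetical.

```python
def maxshift_encode(coeffs, roi_mask):
    """Scale ROI coefficients by 2**s so they sit above all background bit-planes.

    coeffs: integer wavelet coefficients; roi_mask: matching list of bools.
    Returns (scaled coefficients, s); only s needs to be signalled.
    """
    max_bg = max((abs(c) for c, in_roi in zip(coeffs, roi_mask) if not in_roi), default=0)
    s = max_bg.bit_length()  # smallest s with max_bg < 2**s
    scaled = [c << s if in_roi else c for c, in_roi in zip(coeffs, roi_mask)]
    return scaled, s


def maxshift_decode(scaled, s):
    """Any coefficient with magnitude >= 2**s must belong to the ROI; shift it back."""
    threshold = 1 << s
    return [c >> s if abs(c) >= threshold else c for c in scaled]


coeffs = [5, -3, 12, -7]
mask = [True, False, False, True]
scaled, s = maxshift_encode(coeffs, mask)
print(scaled, s)                      # [80, -3, 12, -112] 4
print(maxshift_decode(scaled, s))     # [5, -3, 12, -7]
```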
3. MPEG-1
The ISO/IEC Moving Picture Experts Group (MPEG) has long worked on standardizing the coding of moving pictures and their associated audio, and has produced a series of international standards for generic moving pictures. MPEG-1, finalized in 1993, is an international standard for coding moving pictures and audio for digital storage media at about 1.5 Mbit/s. It made CD-ROM-based digital video and MP3 products possible. The maximum bandwidth of MPEG-1 is 1.5 Mbit/s, of which 1.1 Mbit/s is used for video, 128 kbit/s for audio, and the remainder for the MPEG system layer itself.
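The MPEG-1 bandwidth split works out as follows. The 25 frame/s rate and 352x288 SIF picture size are assumptions chosen for illustration, used only to show the average number of video bits available per frame.

```python
TOTAL_BPS = 1_500_000   # MPEG-1 maximum rate from the text
VIDEO_BPS = 1_100_000   # video share
AUDIO_BPS = 128_000     # audio share

FRAME_RATE = 25             # assumption: 25 frame/s
WIDTH, HEIGHT = 352, 288    # assumption: SIF picture size

system_bps = TOTAL_BPS - VIDEO_BPS - AUDIO_BPS
bits_per_frame = VIDEO_BPS / FRAME_RATE
bits_per_pixel = bits_per_frame / (WIDTH * HEIGHT)

print(f"System layer: {system_bps} bit/s")        # System layer: 272000 bit/s
print(f"Video: {bits_per_frame:.0f} bits/frame")  # Video: 44000 bits/frame
print(f"About {bits_per_pixel:.2f} bit/pixel")
```

Well under half a bit per pixel on average is far below the JPEG bands listed earlier, which is why MPEG-1 relies so heavily on removing temporal redundancy.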
To achieve high compression efficiency by removing the temporal redundancy of image sequences, while still meeting the random-access requirements of multimedia applications, MPEG-1 Video defines four picture types: I-frames, P-frames, B-frames and D-frames. An I-frame is intra-coded using a JPEG-like intra DCT, and its compression ratio is the lowest of the four types. A P-frame is predicted from the preceding I- or P-frame using forward motion-compensated prediction, and the prediction error is DCT-coded. A B-frame is bidirectionally predicted from the preceding and following I- or P-frames, again with DCT coding of the prediction error, so B-frames achieve the highest compression efficiency. A D-frame is a DC-coded frame that contains only the DC coefficient of each block. Because MPEG-1 uses motion compensation to remove redundancy along the time axis, P- and B-frames achieve much higher compression ratios than I-frames.
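Because a B-frame is predicted from the I- or P-frame that follows it in display order, the encoder must transmit that anchor frame before the B-frames that depend on it. The sketch below converts a group of pictures from display order to coding order. It is a simplified illustration (D-frames, which MPEG-1 does not mix with other picture types, are not handled), and the function name is hypothetical.

```python
def coding_order(display_order: str):
    """Convert a GOP given in display order (e.g. "IBBPBBP") to coding order.

    Returns (picture_type, display_index) pairs. B-pictures are held back until
    the next I- or P-picture (their backward reference) has been emitted.
    """
    if set(display_order) - set("IPB"):
        raise ValueError("only I, P and B pictures are handled in this sketch")
    out, held_b = [], []
    for idx, ptype in enumerate(display_order):
        if ptype == "B":
            held_b.append((ptype, idx))
        else:
            out.append((ptype, idx))
            out.extend(held_b)
            held_b = []
    # Trailing B-pictures would refer to the next GOP's I-picture; emit them last.
    out.extend(held_b)
    return out


print([f"{t}{i}" for t, i in coding_order("IBBPBBP")])
# ['I0', 'P3', 'B1', 'B2', 'P6', 'B4', 'B5']
```

The decoder then restores display order, which is one reason B-frames add delay: frames 1 and 2 cannot be shown until frame 3 has arrived.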
4. MPEG-2
The MPEG-2 standard, released by MPEG in 1995, extends and improves on MPEG-1. It targets the coding of moving pictures and their associated audio at 4 to 9 Mbit/s for digital video broadcasting, high-definition television and digital video discs, and it underpins digital TV set-top boxes and DVD products. The MPEG-2 system layer is required to be backward compatible with MPEG-1, so its syntax is both compatible and extensible. MPEG-2 shares MPEG-1's goal of improving compression ratio and audio/video quality, and its core techniques are still block DCT and inter-frame motion-compensated prediction. MPEG-2 video allows data rates of up to 100 Mbit/s and supports interlaced video formats along with many advanced features. Because video signals are often interlaced, MPEG-2 provides both frame coding and field coding modes and extends motion compensation and the DCT accordingly, which significantly improves compression efficiency. For generality, the ranges of key parameters are extended, allowing larger picture formats, higher bit rates and longer motion vectors. In addition, MPEG-2 video compression adds the following extensions:
1) The chroma sampling format of input/output pictures can be 4:2:0, 4:2:2 or 4:4:4.
2) The input/output picture format is not restricted to a single size.
3) Interlaced video can be processed directly.
4) Scalability in spatial resolution, temporal resolution and signal-to-noise ratio, so that decoded pictures can serve different purposes and different parts of the stream can be given different priorities during transmission.
5) A prioritizable bitstream structure: header information and motion vectors can be given higher priority, and high-frequency DCT coefficients lower priority.
6) The output bit rate can be constant or variable, to suit both synchronous and asynchronous transmission.
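The chroma formats in item 1) determine how many samples each frame carries. The sketch below counts the luma samples plus the two chroma planes for a 720x576 frame, a common standard-definition size chosen here only as an example; at 8 bits per sample the counts are also byte counts.

```python
# Chroma subsampling factors (horizontal, vertical) for each of the two chroma planes.
SUBSAMPLING = {
    "4:2:0": (2, 2),
    "4:2:2": (2, 1),
    "4:4:4": (1, 1),
}


def samples_per_frame(width: int, height: int, chroma_format: str) -> int:
    """Luma samples plus both chroma planes for one frame."""
    h_div, v_div = SUBSAMPLING[chroma_format]
    luma = width * height
    chroma = 2 * (width // h_div) * (height // v_div)
    return luma + chroma


for fmt in SUBSAMPLING:
    print(fmt, samples_per_frame(720, 576, fmt))
# 4:2:0 622080
# 4:2:2 829440
# 4:4:4 1244160
```

4:2:0 therefore carries half the raw data of 4:4:4, which is why it is the usual choice for broadcast and DVD.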
MPEG-2 video is a family of systems with defined upward and downward compatibility between its members. It supports four source formats, or levels, ranging from low definition (CIF format) up to full high-definition television (HDTV). Beyond this flexibility in source format, MPEG-2 defines four levels and five profiles, from low to high resolution, giving 11 independent profile/level combinations in all. Within the same profile, resolution and bit rate differ widely from level to level. Table 2 shows the combinations of levels and profiles allowed by MPEG-2.
5. MPEG-3
MPEG-3 was a coding and compression standard originally developed by ISO/IEC for HDTV, targeting transmission rates of 20 to 40 Mbit/s, at which the picture would be slightly distorted. However, because MPEG-2 performed so well, MPEG-3 was abandoned before it was completed, and its HDTV work was folded into MPEG-2.
6. MPEG-4
In 1992, the MPEG expert group decided to develop a new international standard for audio/video (AV) coding at very low bit rates, namely MPEG-4. From an academic standpoint, the very-low-bit-rate range (below 64 kbit/s) was the last bit-rate range not yet covered by a video coding standard.
After analysing how TV, computing and telecommunications were developing and converging in the AV field, the MPEG-4 expert group concluded that MPEG-4 should provide a new mode of communication centred on the content-based storage, processing and manipulation of AV information, supporting interactivity, high compression and universal access. Its architecture should also be flexible and extensible, so that it can keep pace with advances in hardware and software and incorporate new technologies promptly.
Unlike the two earlier MPEG compression standards, MPEG-4 is no longer simply a video and audio codec. It takes content and interactivity as its core and thus provides a broader platform for multimedia. It defines formats and frameworks rather than specific algorithms, so new algorithms can be added to the system. Beyond compression tools and algorithms, multimedia techniques such as image analysis and synthesis, computer vision and speech synthesis can also be applied to coding.
7. H.261
H.261 is a coding standard proposed by ITU-T for applications that require real-time encoding and decoding with low delay, such as videophone, videoconferencing and narrowband ISDN. Its bit rate is p × 64 kbit/s, where p is an integer from 1 to 30, giving bit rates from 64 kbit/s to 1.92 Mbit/s.
H.261 defines two coding modes: intra mode and inter mode. For head-and-shoulders sequences with moderate motion, inter-frame coding dominates; for sequences with frequent scene changes or violent motion, the encoder has to switch frequently from inter-frame to intra-frame coding.
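The switch between inter and intra mode is an encoder decision rather than something the standard prescribes in a fixed form. A common style of heuristic compares the cost of the motion-compensated residual with the block's own activity and picks the cheaper mode. The sketch below uses the sum of absolute differences (SAD) for the residual and the summed absolute deviation from the block mean for the activity; both measures, the `bias` parameter and the function names are illustrative assumptions, not taken from H.261.

```python
def sad(block, prediction):
    """Sum of absolute differences between a block and its prediction."""
    return sum(abs(a - b) for a, b in zip(block, prediction))


def intra_activity(block):
    """Sum of absolute deviations from the block mean (a rough intra-coding cost)."""
    mean = sum(block) / len(block)
    return sum(abs(a - mean) for a in block)


def choose_mode(block, mc_prediction, bias=0.0):
    """Return 'inter' if the motion-compensated residual looks cheaper, else 'intra'.

    Blocks are flat lists of luma samples (e.g. 64 values for an 8x8 block).
    `bias` (hypothetical) lets an encoder favour inter coding.
    """
    if sad(block, mc_prediction) <= intra_activity(block) + bias:
        return "inter"
    return "intra"


current = [10, 12, 11, 13] * 16
print(choose_mode(current, current))     # inter: perfect prediction
print(choose_mode(current, [200] * 64))  # intra: e.g. after a scene cut
```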
To reduce the effect of channel errors, H.261 uses a BCH (511, 493) error-correcting code. This code can correct up to 2 bit errors in each 511-bit block (493 data bits plus 18 parity bits). Under H.261, the error-correction encoder is mandatory at the transmitting side, while its use in the decoder is optional.
8. H.263
In 1995, ITU-T drew on the latest worldwide advances in video coding to formulate the H.263 standard for low-bit-rate video applications, which is recognized as the best result achievable by the first-generation, pixel-based hybrid coding scheme. In the following years ITU-T extended it several times to improve coding efficiency and add functionality, producing H.263+ in 1998 and H.263++ in 2000. The H.263 family is especially well suited to video transmission over the PSTN, wireless networks and the Internet.
H.263 has been adopted as the video standard by many videophone terminal standards, such as H.324 for the PSTN and wireless networks, H.320 for N-ISDN and H.310 for B-ISDN. The core of the H.263 source coding algorithm is still the DPCM/DCT hybrid coding used in H.261, and its block diagram is very similar to that of H.261.
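The motion-compensated prediction at the heart of this DPCM/DCT hybrid scheme needs a motion vector for each block. A simple way to find one is an exhaustive (full) search that minimises the sum of absolute differences over a small window. The sketch below works at integer-pixel precision only; H.263 additionally supports half-pixel vectors, and practical encoders use much faster search strategies. The block size, search range and function name are illustrative.

```python
def full_search(ref, cur, bx, by, block=8, search=4):
    """Find the integer motion vector (dx, dy) minimising SAD for one block.

    ref, cur: 2-D lists (rows of luma samples) of equal size.
    (bx, by): top-left corner of the block in the current frame.
    Candidate positions that fall outside the reference frame are skipped.
    Returns (sad, dx, dy).
    """
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            x0, y0 = bx + dx, by + dy
            if x0 < 0 or y0 < 0 or x0 + block > w or y0 + block > h:
                continue
            cost = 0
            for j in range(block):
                for i in range(block):
                    cost += abs(cur[by + j][bx + i] - ref[y0 + j][x0 + i])
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best


# Synthetic test: the current frame is the reference shifted so that
# cur[y][x] == ref[y + 2][x - 1], i.e. the true vector is (dx, dy) = (-1, 2).
ref = [[37 * x + 101 * y for x in range(32)] for y in range(32)]
cur = [[37 * (x - 1) + 101 * (y + 2) for x in range(32)] for y in range(32)]
print(full_search(ref, cur, 8, 8))  # (0, -1, 2)
```

The residual left after this prediction is what the DCT stage then codes.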
9. MPEG-7 and MPEG-21
MPEG-7, the Multimedia Content Description Interface, is a semantics-based representation of information. It defines a standard set of descriptors for describing various types of multimedia information, together with description schemes that standardize how descriptors are generated and how different descriptors relate to one another.
These descriptors are closely tied to the content of the multimedia object they describe, and the feature-extraction methods provide an interface for precise content- and semantics-based retrieval. On this basis, MPEG-7 defines a Description Definition Language (DDL) for specifying and generating description schemes. The aim is a new way of representing video and audio information that differs from waveform-based representations, compression-based representations (such as MPEG-1 and MPEG-2) and object-based representations (MPEG-4). This representation allows a degree of interpretation of the meaning of the information, which can then be accessed by a device or a computer program. The purpose of MPEG-7 is to provide standardized core technology for describing audiovisual content in multimedia environments, ultimately making the retrieval of audiovisual material as simple and convenient as text search.
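MPEG-7 standardizes descriptors and description schemes, but feature extraction and search engines are left to applications. As an illustration of descriptor-based retrieval, the sketch below extracts a coarse, normalised RGB colour histogram from each image and ranks database items by L1 distance to a query. It is a simplified stand-in written for this example, not MPEG-7's normative colour descriptor (the Scalable Color descriptor, for instance, works in HSV space), and the function names are hypothetical.

```python
def color_histogram(pixels, bins_per_channel=4):
    """Normalised RGB histogram with bins_per_channel**3 bins.

    pixels: iterable of (r, g, b) tuples with components in 0..255.
    bins_per_channel must divide 256.
    """
    step = 256 // bins_per_channel
    hist = [0.0] * bins_per_channel ** 3
    count = 0
    for r, g, b in pixels:
        idx = ((r // step) * bins_per_channel + g // step) * bins_per_channel + b // step
        hist[idx] += 1
        count += 1
    return [h / count for h in hist] if count else hist


def l1_distance(h1, h2):
    """Sum of absolute bin differences between two histograms."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


def rank_by_similarity(query_hist, database):
    """database: dict mapping item name -> histogram; most similar first."""
    return sorted(database, key=lambda name: l1_distance(query_hist, database[name]))


red = color_histogram([(250, 10, 10)] * 100)
blue = color_histogram([(10, 10, 250)] * 100)
query = color_histogram([(240, 20, 20)] * 90 + [(10, 10, 250)] * 10)
print(rank_by_similarity(query, {"blue": blue, "red": red}))  # ['red', 'blue']
```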
MPEG-7 can describe a wide range of multimedia objects, and its core language, DDL, is intended to absorb the strengths of existing media description languages so as to apply universally to multimedia data. The object-based coding idea introduced in MPEG-4 is expected to become the basic means of handling video and audio objects in multimedia databases, including feature extraction and compression coding. The content description capabilities of MPEG-7 can improve the performance and extend the functions of MPEG-1, MPEG-2 and MPEG-4.
Finally, MPEG-7 provides a description of content, not the content itself. It does not replace the existing MPEG standards (MPEG-1, MPEG-2, MPEG-4) but complements them.
The newer standard MPEG-21 aims to let users access and use multimedia resources transparently and conveniently across heterogeneous networks and devices. Its purpose is to establish an interoperable multimedia framework that supports a variety of business models, including automated management of rights and transactions and respect for the privacy of content users.
III. Existing domestic optical disc technologies
1. DVD technology
Investigations show that many DVD players do not implement genuine AC-3 decoding, but use one of the following substitutes instead:
1) Simple two-channel output. Whether or not the audio on the disc is AC-3 encoded, it is output as two mixed channels. Omitting the audio outputs of the other four channels greatly reduces hardware cost, and compared with genuine Dolby AC-3 decoding it also greatly reduces the patent fees paid to Dolby. This is the low-cost option for DVD players, and users hear only simple left and right channels. To enjoy genuine Dolby AC-3 5.1-channel surround sound, a user must add an AC-3 decoding amplifier with coaxial or optical inputs, which costs about 2000 yuan, roughly the price of another DVD player.
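The two-channel output described in 1) is a downmix: the center and surround channels are folded into left and right with some attenuation. The sketch below uses a fixed -3 dB (1/√2) gain for both center and surround and drops the LFE channel. These gains are assumptions, since an AC-3 stream carries its own center and surround mix levels, and a real decoder handles overload protection more carefully than the simple normalisation used here. Function names are hypothetical.

```python
import math

MINUS_3_DB = 1 / math.sqrt(2)  # about 0.707


def downmix_to_stereo(fl, fr, c, ls, rs,
                      center_gain=MINUS_3_DB, surround_gain=MINUS_3_DB):
    """Fold one sample of 5.1 audio (LFE omitted) into a left/right pair."""
    left = fl + center_gain * c + surround_gain * ls
    right = fr + center_gain * c + surround_gain * rs
    # Normalise so full-scale input on every channel cannot exceed full scale.
    norm = 1 + center_gain + surround_gain
    return left / norm, right / norm


def downmix_block(channels):
    """channels: dict with equal-length sample lists under 'L', 'R', 'C', 'Ls', 'Rs'."""
    frames = zip(channels["L"], channels["R"], channels["C"],
                 channels["Ls"], channels["Rs"])
    pairs = [downmix_to_stereo(*f) for f in frames]
    return [p[0] for p in pairs], [p[1] for p in pairs]


print(downmix_to_stereo(0.0, 0.0, 0.0, 1.0, 0.0))  # left surround ends up only in the left channel
```

Once the surround channels have been folded in this way, their separate spatial information cannot be recovered, which is why the "virtual" approaches described below remain inferior to genuine 5.1 decoding.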
2) Two channels with six output jacks. This approach, also known as "pseudo six-channel", actually provides only three identical sets of two-channel outputs, a simple copy of a two-channel player. It cannot deliver genuine Dolby AC-3 5.1-channel sound and can easily infringe consumers' legitimate interests, so consumers should check carefully when buying.
3) Virtual AC-3 channels. Here a sound-field processing chip applies algorithms such as superposition and cancellation to the two-channel audio to produce an output that imitates Dolby AC-3 5.1-channel decoding. Because the sound still comes from the two main channels, however, the expressiveness and layering of the sound field are much worse than with genuine AC-3 decoding; this can easily mislead listeners and infringe consumers' interests.
So what is genuine Dolby AC-3 5.1-channel decoded output? Dolby AC-3 is a perceptual coding technology designed specifically for multichannel digital audio. It combines psychoacoustics with advanced digital signal processing and offers high efficiency, high quality and versatility. In its multichannel form, Dolby AC-3 provides five full-bandwidth channels in a so-called 3/2 arrangement: three front channels (left, center and right) plus two surround channels, together with one low-frequency effects channel. In other words, the channels are left front, right front, center, left surround, right surround and subwoofer, hence the name "5.1" channels. Compared with the earlier analog matrix system Dolby Pro Logic, Dolby AC-3 has two fully independent surround channels, each able to carry full-band, high-fidelity audio just like the three front channels. Only decoding that truly reproduces these effects counts as genuine Dolby AC-3 5.1-channel decoding.
2. HDV technology
The HD 12 compression coding system was developed by Beijing Kaicheng HD Technology Co., Ltd. for laser multimedia discs in the HDV high-definition digital movie format. It uses an optimized MPEG-2 video coding format: starting from standard MPEG-2, it redefines the macroblock size, resets the quantization step size, and optimizes entropy coding and motion compensation. By exploiting recent advances in semiconductors and the processing power of modern chips, it achieves a higher compression ratio and better reconstruction quality.
Technicians at Beijing Kaicheng HD Technology Co., Ltd. spent more than two years developing the HD 12 system. Besides efficient real-time compression, it supports editing functions such as picture-sharpness processing and restoration, and the generation and overlay of subtitles and dubbing.
The HD 12 system compresses high-definition video streams efficiently, provides a sound technical platform at a time when high-definition programs are scarce, and fully meets the current compression requirements of high-definition video, giving consumers access to more and better high-definition programs.
HDV players are compatible with CD, VCD and DVD discs, but HDV discs cannot be played on ordinary VCD or DVD players; they can only be played on HDV HD digital movie players. Without such a player, a purchased HDV disc is effectively useless.
According to a developer at Kaicheng HD Technology Co., Ltd., "Because HDV discs use ultra-compression technology, one disc can hold 3 to 5 high-quality movies. At present only our company in China has mastered this technology, and it is encrypted, so outsiders cannot copy it."
3. EVD technology
Guo Fu's work on audio compression began when the company was founded (March 2000), as a sub-project of the "New-Generation High-Density Digital Laser Video Disc System (EVD)" project, and has passed through start-up, development and maturity stages. Nearly 20 core patents have been filed so far. Together they form EAC, an efficient audio coding scheme based on multi-resolution analysis with independent intellectual property rights, which was highly praised by the participating experts in a subjective sound-quality evaluation organized by the Jiangsu Electronic Products Supervision and Inspection Institute in July 2001.
EAC currently offers mono, two-channel stereo and 5.1 surround coding at multiple sampling rates and bit rates, with further improved coding efficiency, and it has become the standard audio coding technology of the EVD specification.
To improve coding efficiency further, especially audio quality at very low bit rates, the company has continued its own development while strengthening technical cooperation with foreign companies that hold the most advanced audio coding technology. Following long-term cooperation, Beijing Guo Fu Digital Technology Co., Ltd. will set up a joint venture with a Swedish-German coding technology company that has the world's most advanced bandwidth extension technology, to jointly develop and promote EAC Plus. Building on EAC, EAC Plus is expected to raise the level of audio coding technology in China to the international forefront.
Audio coding technologies can be classified from many angles: lossy versus lossless, waveform versus parametric, narrowband versus wideband, and constant versus variable bit rate. The signals an audio coder processes, however, fall into two simple categories: slowly varying (steady-state) components and rapidly varying (transient) components. From a modeling point of view a signal can also be split into steady-state, transient and noise components, but because the focus here is on waveform coding, that division is not used. Essentially every waveform coding technique tries to code both slowly varying and transient components as efficiently as possible while keeping complexity acceptable. The difficulty stems from how the human ear perceives different signals. Although auditory perception is a complicated physiological and psychological matter, two conflicting requirements stand out in coding. For slowly varying components, the ear's frequency resolution is high and its time resolution low; for transients, its frequency resolution is low and its time resolution high, and these characteristics vary from signal to signal. Higher frequency resolution gives higher coding efficiency but poorer pre-echo suppression, whereas higher time resolution gives better pre-echo suppression but lower coding efficiency.
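This trade-off is what block (window) switching in transform coders addresses: a long window gives fine frequency resolution for steady signals, while a short window gives fine time resolution around transients and limits pre-echo. The sketch below computes the resolution of a block transform and uses a simple energy-ratio transient detector to choose between a long and a short window. The source does not describe how EAC's multi-resolution filter bank makes this decision, so the detector, window lengths, threshold and function names are illustrative assumptions in the style of AAC-like coders.

```python
import math


def tf_resolution(window_len: int, sample_rate_hz: float):
    """Bin spacing (Hz) and window duration (ms) of a block transform of this length."""
    return sample_rate_hz / window_len, 1000.0 * window_len / sample_rate_hz


def choose_window(frame, n_sub=8, ratio_threshold=10.0):
    """Pick 'short' when sub-block energy jumps sharply (a transient), else 'long'."""
    size = len(frame) // n_sub
    energies = [sum(s * s for s in frame[i * size:(i + 1) * size]) for i in range(n_sub)]
    for prev, cur in zip(energies, energies[1:]):
        # Tiny floor so that a jump out of digital silence is still detected.
        if cur > ratio_threshold * (prev + 1e-12):
            return "short"
    return "long"


fs = 48_000
for n in (2048, 256):
    df, dt = tf_resolution(n, fs)
    print(f"N={n}: {df:.1f} Hz bins, {dt:.2f} ms window")

tone = [math.sin(2 * math.pi * 1000 * k / fs) for k in range(2048)]  # steady signal
click = [0.0] * 1024 + [1.0] * 1024                                  # sudden onset
print(choose_window(tone), choose_window(click))  # long short
```

Shrinking the window by a factor of 8 improves time resolution eightfold and coarsens frequency resolution by the same factor, which is exactly the conflict described above.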
In its design and implementation, EAC has tried to process and encode all kinds of audio signals in a more natural way; this is the basic technical route of EAC. Specifically, EAC follows a multi-resolution analysis approach and aims to encode all kinds of audio signals efficiently within a unified filter-bank framework.
4. HVD technology
On April 28, the first domestic high-definition DVD industry alliance (the HVD Alliance) was formally established in Shanghai. Tsinghua Tongfang, a major Chinese manufacturer of new-generation high-definition video players, became one of the alliance's first members on the strength of its influence in the high-definition DVD field.
The HVD Alliance is an industrial consortium formed voluntarily by player manufacturers, content providers, publishers, core chip makers, and related universities and research institutes, linked by core components such as ICs with independent intellectual property rights and by self-developed player systems and technologies. Its goal is to promote the orderly, efficient and sustainable development of HVD technical standards, markets and industry through effective integration of the industrial chain, and to help China's DVD industry move from being a "manufacturing power" to a "technology power".
The alliance's short-term goal is to develop and promote a "high-definition" HVD industry spanning players, content and discs, making HVD the upgrade path for DVD. The alliance's first batch comprises 18 members. Its main tasks are to establish and protect the intellectual property mechanism for HVD; share intellectual property within the alliance; license the HVD logo and verify formats to keep HVD players and discs consistent; implement disc encryption and copy protection; and organize technical exhibitions, product launches and format standards meetings.
Drawing on its strong research capabilities and more than three years of continuous work, Tsinghua Tongfang has become one of the few domestic manufacturers to master HD DVD player technology. Its recently launched DVP-i919 HD DVD player, the most advanced product in its DVD line, supports progressive scanning at 480p and 720p and interlaced scanning at 1920×1080i. As a DVD replacement, the i919 also plays MPEG-4 movies and has a USB 1.1 interface for exchanging and viewing data with many digital devices. According to the company's recent sales figures, Tsinghua Tongfang's HD products have been well received, keeping pace with competing products launched at the same time, such as EVD and HDV.
By joining the HVD Alliance, Tsinghua Tongfang gains more opportunities to lead in the HD DVD era and to influence the future direction of the HD DVD industry.
HVD stands for high-definition multifunctional disc. HVD combines powerful functions, clear pictures, a low price, good backward compatibility, and key technologies with independent intellectual property rights. Six invention patents for HVD technology have been filed with the China National Intellectual Property Administration.
HVD supports input interfaces for a variety of formats (1080i/720p/576p/576i/480i/VGA/SVGA) and conforms to the composite video, Y/C and YPbPr HD standards. Both its horizontal and vertical definition reach 720 lines.
HVD can store a 150-minute HD movie on a disc the size of a DVD-9.
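This claim implies an average bit rate that is easy to check. The sketch assumes the nominal 8.5 GB (decimal) capacity of a dual-layer DVD-9 and ignores file-system overhead; the result is the total budget for video, audio and multiplexing together. The function name is hypothetical.

```python
DVD9_BYTES = 8.5e9  # assumption: nominal DVD-9 capacity (8.5 GB, decimal)


def average_bit_rate_mbps(capacity_bytes: float, minutes: float) -> float:
    """Average total bit rate (video + audio + overhead) that exactly fills the disc."""
    return capacity_bytes * 8 / (minutes * 60) / 1e6


rate = average_bit_rate_mbps(DVD9_BYTES, 150)
print(f"150 min on DVD-9: about {rate:.2f} Mbit/s on average")
```

That is roughly 7.6 Mbit/s, comparable to the rates used for standard-definition MPEG-2 on DVD, so fitting HD content in this budget depends on the more efficient coding the HVD vendors describe.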
5.FVD technology
The current version of FVD specification uses 650nm red laser; Na 0.6 ~ 0.65, its physical specification is higher than DVD capacity; The capacity of single-sided single-layer FVD disk can reach 5.4 GB ~ 6 GB. The first generation adopts 8/ 16 as the encoding method, and the second generation will adopt the efficient 8/ 15 encoding method in the future, and improve the error correction (ECC) ability. In the logical specification part, Microsoft Windows Media Video-9 (WMV-99 (WMV-9) video compression technology can accommodate 135 minutes of 1280x720p high-definition programs, among which the newly developed high-definition audio and video technologies such as dynamic menu background, program playback, menu playback, sub-picture playback, master-slave playback, etc. In addition, in order to protect intellectual property rights, an anti-copy mechanism of advanced encryption standard (AES) content protection system will be provided.