Common formats of lossy compression

MP3 (MP3PRO, MP3 Surround), AAC (*.3gp, *.mp4, *.m4a), ATRAC3/ATRAC3+ (*.aa3).

Let's first look at the principle behind audio compression. It exploits the psychoacoustic characteristics of human hearing (frequency masking, temporal masking, and so on) and the ear's limited resolution in amplitude, frequency, and time. Any component the ear cannot perceive, that is, any part that contributes nothing to how we judge the loudness, pitch, or direction of a sound (the so-called irrelevant part), is not encoded or transmitted. Components that are barely perceptible can be coded with coarse quantization: as long as the resulting quantization noise stays below the hearing threshold (the quietest level the ear can detect), the listener still does not notice it. Audio compression relies on the following characteristics.

1. Equal-loudness curve

The sensitivity of human hearing varies with frequency: two tones with the same power but different frequencies usually do not sound equally loud. The equal-loudness curve shows that the ear is most sensitive at around 4 kHz, so a sound pressure level that is just audible at 4 kHz may be inaudible at other frequencies. This lets the encoder tolerate more distortion at the frequencies where the ear is less sensitive.
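
The frequency dependence described above is often approximated with a closed-form curve for the threshold of hearing in quiet. The sketch below uses one widely quoted approximation (commonly attributed to Terhardt). The function name and the sample frequencies are illustrative choices, not something taken from this article.

```python
import math


def threshold_in_quiet_db(freq_hz: float) -> float:
    """Approximate hearing threshold in quiet (dB SPL) for a pure tone."""
    khz = freq_hz / 1000.0
    return (
        3.64 * khz ** -0.8                          # rising threshold at low frequencies
        - 6.5 * math.exp(-0.6 * (khz - 3.3) ** 2)   # dip where the ear is most sensitive
        + 1e-3 * khz ** 4                           # steep rise at high frequencies
    )


# Print the approximate threshold at a few frequencies.
for freq in (100, 1000, 4000, 16000):
    print(f"{freq:>6} Hz: {threshold_in_quiet_db(freq):6.1f} dB SPL")
```

With this approximation the threshold is lowest in the 3 to 4 kHz region and much higher at the two ends of the audible range.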

2. Masking

We met masking in high school physics: a strong sound masks a weaker one so that we cannot detect it. The closer the two sounds are in time and frequency, the stronger the masking. The encoder can therefore skip the masked components entirely. Sound quality does not suffer much, and the loss is hard for the ear to detect.
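
To make the idea concrete, here is a toy frequency-masking check. It is only a sketch under stated assumptions: the triangular spreading shape, the 10 dB offset, and the two slopes are illustrative values chosen for this example, not figures from any codec, and all function names are hypothetical. The Hz-to-Bark conversion uses the common Zwicker and Terhardt approximation.

```python
import math


def hz_to_bark(freq_hz: float) -> float:
    """Zwicker/Terhardt approximation of the Bark (critical-band) scale."""
    return 13.0 * math.atan(0.00076 * freq_hz) + 3.5 * math.atan((freq_hz / 7500.0) ** 2)


def masked_threshold_db(masker_db: float, masker_hz: float, probe_hz: float,
                        offset_db: float = 10.0,
                        lower_slope: float = 27.0,
                        upper_slope: float = 12.0) -> float:
    """Level below which a probe tone is hidden by the masker (toy model).

    offset_db and the slopes (dB per Bark) are assumed, illustrative values.
    """
    distance = hz_to_bark(probe_hz) - hz_to_bark(masker_hz)
    slope = upper_slope if distance >= 0 else lower_slope
    return masker_db - offset_db - slope * abs(distance)


def is_masked(probe_db: float, probe_hz: float, masker_db: float, masker_hz: float) -> bool:
    """True if the probe falls below the masked threshold and could be dropped."""
    return probe_db < masked_threshold_db(masker_db, masker_hz, probe_hz)


# A 40 dB tone next to a loud 1 kHz masker versus the same tone far away.
print(is_masked(40.0, 1100.0, masker_db=80.0, masker_hz=1000.0))
print(is_masked(40.0, 4000.0, masker_db=80.0, masker_hz=1000.0))
```

In this toy model the quiet tone close to the masker is hidden, while the same tone far away in frequency is not, which matches the behavior described above.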

3. Critical frequency band

For human hearing, the perception of sound does not vary linearly with frequency (human hearing is not that precise). Instead, it can be described by a limited series of frequency bands, called critical bands. Put simply, the audible range is divided into a number of bands, and within each band the ear's perception, that is, its psychoacoustic behavior, is roughly the same.
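
As a rough illustration, critical bands are often indexed on the Bark scale. The sketch below groups the bins of an FFT into bands using the common Zwicker and Terhardt approximation. The FFT size, the sampling rate, and the function names are assumptions made for this example.

```python
import math
from collections import defaultdict


def hz_to_bark(freq_hz: float) -> float:
    """Zwicker/Terhardt approximation of the Bark (critical-band) scale."""
    return 13.0 * math.atan(0.00076 * freq_hz) + 3.5 * math.atan((freq_hz / 7500.0) ** 2)


def group_bins_by_critical_band(fft_size: int = 1024, sample_rate_hz: int = 44100) -> dict:
    """Map each critical-band index to the FFT bins whose center frequency falls in it."""
    bands = defaultdict(list)
    for bin_index in range(fft_size // 2 + 1):
        freq_hz = bin_index * sample_rate_hz / fft_size
        bands[int(hz_to_bark(freq_hz))].append(bin_index)
    return dict(bands)


bands = group_bins_by_critical_band()
for band_index in sorted(bands):
    print(f"band {band_index:2d}: {len(bands[band_index]):3d} bins")
```

The low bands contain only a few bins while the high bands contain many, which is the non-linear frequency resolution the paragraph describes.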

The essence of lossy coding lies in the algorithm.

1. MP3 (MP3PRO / MP3 Surround)

MP3 is probably the most widely used lossy digital audio format. Its full name is MPEG (Moving Picture Experts Group) Audio Layer 3. It was developed in 1987 by the Fraunhofer Institute in Germany and patented in 1989. At first it was far from perfect and was more of a coding framework left for others to improve. In 1992 the technology was incorporated into the MPEG specification and officially named MP3.

MP3 files are made up of frames, which are the smallest unit of an MP3 file. What is a frame? Remember how hand-drawn animation works? A sequence of slightly different pictures is shown in quick succession to create motion, and each picture is a "frame". The difference is that the frames in an MP3 hold audio data rather than images. At a 44.1 kHz sampling rate, MP3 plays about 38 frames per second (each frame holds 1152 samples).

Each frame consists of a frame header and frame data. The header records basic information about the frame, including the bit rate index and the sampling rate index (both are important for understanding the ABR and VBR coding modes). The frame data, as the name implies, holds the audio data itself.
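
To show where the bit rate index and sampling rate index sit, here is a minimal sketch that decodes a 4-byte frame header. It assumes an MPEG-1 Layer III stream and ignores everything else (other MPEG versions, free-format streams, CRC, channel mode). The function and dictionary key names are hypothetical.

```python
# MPEG-1 Layer III lookup tables, indexed by the header's two index fields.
BITRATES_KBPS = [None, 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320, None]
SAMPLE_RATES_HZ = [44100, 48000, 32000, None]


def parse_frame_header(header: bytes) -> dict:
    """Decode bit rate, sampling rate and frame length from a 4-byte header."""
    if len(header) != 4:
        raise ValueError("an MP3 frame header is 4 bytes long")
    word = int.from_bytes(header, "big")
    if (word >> 21) != 0x7FF:                      # 11 sync bits, all set
        raise ValueError("frame sync not found")
    version_bits = (word >> 19) & 0x3              # 0b11 means MPEG-1
    layer_bits = (word >> 17) & 0x3                # 0b01 means Layer III
    if version_bits != 0b11 or layer_bits != 0b01:
        raise ValueError("this sketch only handles MPEG-1 Layer III")
    bitrate_kbps = BITRATES_KBPS[(word >> 12) & 0xF]
    sample_rate_hz = SAMPLE_RATES_HZ[(word >> 10) & 0x3]
    padding = (word >> 9) & 0x1
    if bitrate_kbps is None or sample_rate_hz is None:
        raise ValueError("free-format or reserved index is not handled here")
    # Frame length in bytes for MPEG-1 Layer III.
    frame_bytes = 144 * bitrate_kbps * 1000 // sample_rate_hz + padding
    return {
        "bitrate_kbps": bitrate_kbps,
        "sample_rate_hz": sample_rate_hz,
        "padding": padding,
        "frame_bytes": frame_bytes,
    }


print(parse_frame_header(bytes.fromhex("fffb9000")))
```

Because every frame carries its own bit rate index, an encoder is free to change the bit rate from one frame to the next, which is what VBR and ABR rely on.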

All of the above is the foundation of MP3 coding. In practice, early encoders were very imperfect: the compression algorithms were crude and the sound quality was unsatisfactory. MP3 sound quality then made two leaps: the introduction of the perceptual model and the adoption of VBR.

PS: VBR stands for variable bit rate. When an MP3 file is encoded, the bit rate is raised automatically for passages that need more bits and lowered for passages that need fewer. The aim is to speed up online playback and reduce the system resources used during local playback ... The algorithm was developed by Xing, whose encoder spends a higher bit rate on the complex parts of a song. The idea is good, but unfortunately the Xing encoder's algorithm was poor and its sound quality fell well short of CBR. Fortunately, LAME greatly improved the VBR algorithm and made it the best coding mode for MP3. It keeps the file size in check while preserving quality, and it is the recommended coding mode.
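
The idea can be sketched as a toy: give each frame a complexity score and pick a bit rate for it from the MPEG-1 Layer III table. The score in the range 0 to 1 and the linear mapping are hypothetical simplifications. Real encoders such as LAME decide with a psychoacoustic model, which this sketch does not attempt to reproduce.

```python
# Valid MPEG-1 Layer III bit rates in kbps (free-format excluded).
BITRATES_KBPS = [32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320]


def choose_bitrate(complexity: float) -> int:
    """Map a hypothetical complexity score in [0, 1] to a bit rate in kbps."""
    clamped = min(max(complexity, 0.0), 1.0)
    index = round(clamped * (len(BITRATES_KBPS) - 1))
    return BITRATES_KBPS[index]


def average_bitrate(complexities: list) -> float:
    """Average bit rate of a VBR stream; all frames last equally long."""
    rates = [choose_bitrate(score) for score in complexities]
    return sum(rates) / len(rates)


# A quiet intro, a dense chorus, then a fade-out (made-up scores).
frame_scores = [0.1, 0.1, 0.9, 1.0, 0.8, 0.2]
print([choose_bitrate(score) for score in frame_scores])
print(average_bitrate(frame_scores))
```

A CBR encoder would spend the same number of bits on every frame, while this toy spends them where the made-up score says the music is harder to code.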

MP3 has survived to this day, and its development has not stopped. On June 14, 2001, Thomson of France and RCA of the United States jointly launched a new compression format: MP3PRO. MP3PRO builds on MP3 and adopts a codec enhancement technology developed by Coding Technologies called SBR (Spectral Band Replication). When making an MP3PRO file, the encoder splits the audio into two parts. The low-frequency part is separated out and coded with conventional MP3 technology to produce a normal MP3 audio stream. This lets the MP3 encoder concentrate on the low-frequency signal to obtain better quality, and it lets existing MP3 players play MP3PRO files. The high-frequency part is encoded separately and embedded in the MP3 stream. Traditional MP3 players ignore it, but MP3PRO players decode it and combine the two parts to recover high-quality, full-bandwidth sound. With this technique, MP3PRO at 64 kbps delivers almost the same sound quality as MP3 at 128 kbps in a file only half the size.

PSP supports MP3PRO, and plenty of format conversion software supports it too; you can find it online. Try it if you are interested. It is definitely better than MP3.

In early February 2004, Thomson officially announced that MP3, the world's most popular music compression format, had entered the multi-channel era. MP3 Surround was jointly developed by Fraunhofer IIS and Agere Systems. It uses Binaural Cue Coding (BCC), a psychoacoustic coding technique, to deliver multi-channel surround sound while keeping the file size down. Agere Systems, which joined the effort at the same time, is mainly responsible for promoting the format. MP3 Surround delivers high-quality 5.1-channel surround audio and has a wide range of applications: online music distribution, broadcasting systems, PC audio and video, game audio, consumer electronics, car audio, and so on. Although it carries multiple channels, Thomson says an MP3 Surround file is not much larger than an ordinary MP3 (at a similar sampling rate) and is only about half the size of other multi-channel surround formats. More importantly, MP3 Surround offers good compatibility and plays normally in existing MP3 software and on existing MP3 players.

2. AAC (*.3gp / *.mp4 / *.m4a)

AAC stands for Advanced Audio Coding. It was developed jointly by the Fraunhofer Institute, Dolby, and AT&T. AAC is part of the MPEG-2 specification and covers everything from 8 kbps mono telephone quality to 160 kbps multi-channel ultra-high-quality audio. Compared with MP3, AAC adds features that MP3 lacks, such as better stereo reproduction, bit-stream effect scanning, multimedia control, and noise reduction optimization, so that CD sound quality is well preserved after compression. It also supports up to 48 audio channels and 15 low-frequency channels, more sampling rates and bit rates, multi-language compatibility, and higher decoding efficiency. In short, AAC offers better sound quality in files about 30% smaller than MP3.

Some of these modules will now be explained:

Gain control

The gain control module is used in the variable sampling rate configuration. It consists of a polyphase quadrature filter (PQF), a gain detector, and a gain regulator, and it splits the input signal into four frequency bands of equal bandwidth. The decoder also has a gain control module, which can produce output at a lower sampling rate by ignoring the PQF's high-frequency subband signals.

Filter bank

The filter bank converts the input signal from the time domain to the frequency domain and is the basic module of the MPEG-2 AAC system. It uses the modified discrete cosine transform (MDCT), a linear orthogonal lapped transform that relies on a technique called time domain aliasing cancellation (TDAC). The MDCT uses a KBD (Kaiser-Bessel derived) window or a sine window. The forward MDCT can be written as:

$$X_{i,k} = 2 \sum_{n=0}^{N-1} z_{i,n} \cos\left(\frac{2\pi}{N}\,(n + n_0)\left(k + \frac{1}{2}\right)\right), \quad 0 \le k < \frac{N}{2}$$

The inverse MDCT can be written as:

$$x_{i,n} = \frac{2}{N} \sum_{k=0}^{N/2-1} X_{i,k} \cos\left(\frac{2\pi}{N}\,(n + n_0)\left(k + \frac{1}{2}\right)\right), \quad 0 \le n < N$$

where

n = sample index,

N = transform block length,

i = block number,

k = spectral coefficient index,

z = windowed input sequence,

n_0 = (N/2 + 1)/2.

These two transforms are covered in detail in courses on discrete functions and mathematical equations. They are given here only for interested readers and need not be studied in depth.
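
For readers who do want to experiment, here is a small pure-Python sketch of a windowed MDCT and inverse MDCT with 50% overlap-add. It assumes the common convention with a factor of 2 on the forward transform, 2/N on the inverse, phase offset n_0 = (N/2 + 1)/2, and a sine window. It is a slow reference written for clarity, not an AAC implementation, and the function names are hypothetical.

```python
import math


def sine_window(block_len: int) -> list:
    """Sine window; satisfies w[n]^2 + w[n + N/2]^2 = 1 (Princen-Bradley)."""
    return [math.sin(math.pi * (n + 0.5) / block_len) for n in range(block_len)]


def mdct(block: list) -> list:
    """Forward MDCT: N windowed samples in, N/2 coefficients out."""
    size = len(block)
    n0 = (size / 2 + 1) / 2
    return [
        2.0 * sum(block[n] * math.cos(2 * math.pi / size * (n + n0) * (k + 0.5))
                  for n in range(size))
        for k in range(size // 2)
    ]


def imdct(coeffs: list) -> list:
    """Inverse MDCT: N/2 coefficients in, N aliased samples out."""
    size = 2 * len(coeffs)
    n0 = (size / 2 + 1) / 2
    return [
        (2.0 / size) * sum(coeffs[k] * math.cos(2 * math.pi / size * (n + n0) * (k + 0.5))
                           for k in range(size // 2))
        for n in range(size)
    ]


def mdct_round_trip(signal: list, block_len: int) -> list:
    """Analyze and resynthesize with 50% overlap-add; aliasing cancels (TDAC)."""
    hop = block_len // 2
    if len(signal) % hop != 0:
        raise ValueError("signal length must be a multiple of block_len // 2")
    window = sine_window(block_len)
    padded = [0.0] * hop + list(signal) + [0.0] * hop
    output = [0.0] * len(padded)
    for start in range(0, len(padded) - block_len + 1, hop):
        windowed = [padded[start + n] * window[n] for n in range(block_len)]
        restored = imdct(mdct(windowed))
        for n in range(block_len):
            output[start + n] += restored[n] * window[n]
    return output[hop:hop + len(signal)]


signal = [math.sin(0.3 * n) + 0.5 * math.cos(1.1 * n) for n in range(16)]
restored = mdct_round_trip(signal, block_len=8)
print(max(abs(a - b) for a, b in zip(signal, restored)))
```

Each block of N samples yields only N/2 coefficients, yet overlapping neighboring blocks by half and adding them cancels the aliasing, so the signal is recovered up to rounding error.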

Temporal noise shaping (TNS)

In perceptual audio coding, TNS is a method for controlling the temporal shape of the quantization noise. It addresses the mismatch between the masking threshold and the quantization noise. The basic idea is that a tonal signal in the time domain shows a transient peak in the frequency domain. TNS uses this duality to extend known predictive coding techniques and keeps the quantization noise underneath the actual signal, which avoids the mismatch.

Joint stereo coding

Joint stereo coding is a spatial coding technique whose purpose is to remove redundant spatial information. The MPEG-2 AAC system includes two spatial coding techniques: mid/side (M/S) coding and intensity/coupling coding. M/S coding uses a matrix operation, so it is also called matrixed stereo coding. Instead of transmitting the left and right channel signals, M/S coding transmits a normalized sum signal and a difference signal. The sum is used for the mid (M) channel and the difference for the side (S) channel, which is why M/S coding is also called "sum-difference coding". Intensity/coupling coding goes by several names, including intensity stereo coding and channel coupling coding. Both techniques exploit irrelevance between channels.
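
The sum and difference operation is simple enough to show directly. This is a minimal sketch that assumes normalization by one half and works on plain lists of samples. The function names are hypothetical.

```python
def ms_encode(left: list, right: list) -> tuple:
    """Convert left/right samples into mid (sum) and side (difference) signals."""
    mid = [(l + r) / 2.0 for l, r in zip(left, right)]
    side = [(l - r) / 2.0 for l, r in zip(left, right)]
    return mid, side


def ms_decode(mid: list, side: list) -> tuple:
    """Recover left/right samples from mid and side signals."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right


# Two nearly identical channels: the side signal is almost all zeros.
left = [1.0, 0.5, -0.25]
right = [1.0, 0.25, -0.25]
mid, side = ms_encode(left, right)
print(mid, side)
print(ms_decode(mid, side))
```

When the two channels are similar, the side signal stays close to zero and costs very few bits, which is where the saving comes from.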

Prediction

This technique is widely used in speech coding systems. Its main purpose is to reduce redundancy in stationary signals.

Quantizer

A non-uniform quantizer is used.
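
A non-uniform quantizer uses steps that grow with amplitude, so loud coefficients are coded more coarsely than quiet ones. The sketch below assumes a 3/4 power law with a rounding offset of 0.4054, the shape commonly described for AAC. The single `step` parameter is a simplification standing in for the scale factor, and the function names are hypothetical.

```python
ROUNDING_OFFSET = 0.4054  # rounding constant commonly quoted for this power law


def quantize(value: float, step: float = 1.0) -> int:
    """Power-law quantizer: compress the magnitude with exponent 3/4, then round."""
    magnitude = int((abs(value) / step) ** 0.75 + ROUNDING_OFFSET)
    return magnitude if value >= 0 else -magnitude


def dequantize(index: int, step: float = 1.0) -> float:
    """Inverse mapping: expand the magnitude with exponent 4/3."""
    magnitude = abs(index) ** (4.0 / 3.0) * step
    return magnitude if index >= 0 else -magnitude


for value in (1.0, 10.0, 100.0, 1000.0):
    index = quantize(value)
    print(value, index, round(dequantize(index), 2))
```

The distance between neighboring reconstruction levels widens as the index grows, which is what makes the quantizer non-uniform.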

Noiseless coding

Noiseless coding is essentially Huffman coding. It encodes the quantized spectral coefficients, scale factors, and directional information.
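
As a reminder of how Huffman coding works, here is a generic sketch that builds a code table from symbol counts. AAC itself uses predefined codebooks, so this is only an illustration of the principle. The function name and the sample data are made up for the example.

```python
import heapq
from collections import Counter


def build_huffman_code(symbols: list) -> dict:
    """Return a prefix-free bit string for each distinct symbol."""
    counts = Counter(symbols)
    if len(counts) == 1:
        return {symbol: "0" for symbol in counts}
    # Heap entries: (total count, tie-breaker, {symbol: code so far}).
    heap = [(count, order, {symbol: ""})
            for order, (symbol, count) in enumerate(counts.items())]
    heapq.heapify(heap)
    next_order = len(heap)
    while len(heap) > 1:
        count_a, _, codes_a = heapq.heappop(heap)
        count_b, _, codes_b = heapq.heappop(heap)
        merged = {symbol: "0" + code for symbol, code in codes_a.items()}
        merged.update({symbol: "1" + code for symbol, code in codes_b.items()})
        heapq.heappush(heap, (count_a + count_b, next_order, merged))
        next_order += 1
    return heap[0][2]


# Quantized spectral values are mostly small, so small values get short codes.
quantized = [0] * 8 + [1] * 4 + [-1] * 2 + [2, -2]
code = build_huffman_code(quantized)
encoded_bits = sum(len(code[value]) for value in quantized)
print(code)
print(encoded_bits, "bits instead of", 3 * len(quantized), "with fixed 3-bit codes")
```

The step is called noiseless because it is lossless: the decoder recovers exactly the quantized values that went in.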

PS: Personally, I prefer AAC, which is why I have written about it in detail. Give it a try; it is definitely better than MP3. You can convert files to AAC (*.m4a) with iTunes 6, and the process is very simple. AAC files (*.3gp, *.mp4, *.m4a) can be copied straight into [Music] and played.

It can be said that AAC is the best lossy compression method at present.

At the highest quality setting, it cannot be told apart from lossless by ear.

3. ATRAC3/ATRAC3+ (*.aa3)

Anyone who used MiniDisc (MD) players in the early days knows that Sony's ATRAC audio format, tailored for MD, has been widely used in portable audio devices such as Sony's Network Walkman. "ATRAC3plus" stands for "Adaptive Transform Acoustic Coding 3 plus". It is an audio compression technology based on psychoacoustic principles, developed from the ATRAC3 format, and it matured in 2002. This technology is what made it possible to shrink the MD Walkman to such a small size.

To analyze ATRAC3/ATRAC3+, we should first talk about its predecessor, the ATRAC algorithm. Compressing digital audio usually introduces a certain amount of quantization noise into the signal. To keep this noise from being heard, the usual practice is to decompose the signal into a set of units, each of which covers a specific time-frequency range. The encoder analyzes the units according to the psychoacoustic principles described above and encodes the important ones with high precision. In units where the ear is less sensitive, some quantization noise can be tolerated without affecting perceived quality. When decoding, the quantized spectrum is rebuilt according to the bit allocation, and the audio signal is then synthesized.

ATRAC is no exception, but it adds some improvements. ATRAC combines subband coding and transform coding, and it divides the input signal non-uniformly in frequency so as to emphasize the important low-frequency region. In addition, ATRAC transforms the input signal with a variable block length, which keeps coding efficient for stationary passages without sacrificing time resolution for transient passages. Specifically, the input signal is split at 5.5125 kHz and 11.025 kHz into three frequency bands (0 to 5.5125 kHz, 5.5125 to 11.025 kHz, and 11.025 to 22.05 kHz), and the subband decomposition is done with QMFs (quadrature mirror filters). Each of the three bands is then converted into spectral values by the MDCT (modified discrete cosine transform, similar to the familiar fast Fourier transform and covered in Advanced Mathematics II and courses on mathematical equations). The MDCT allows 50% overlap between blocks, which improves frequency resolution while maintaining critical sampling. The block length can change according to the kind of signal, and this is the adaptive part of ATRAC (its main purpose is to let masking hide the quantization noise that precedes a transient).
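
The band edges follow from simple arithmetic. The sketch below assumes a 44.1 kHz sampling rate and two cascaded half-band splits, where the first split halves the full band and the second halves the lower half again. The function name is hypothetical.

```python
def three_band_edges(sample_rate_hz: float = 44100.0) -> list:
    """Band edges in Hz after two cascaded half-band splits."""
    nyquist = sample_rate_hz / 2.0   # highest representable frequency
    first_split = nyquist / 2.0      # first stage: split the full band in half
    second_split = first_split / 2.0 # second stage: split the lower half again
    return [
        (0.0, second_split),
        (second_split, first_split),
        (first_split, nyquist),
    ]


for low, high in three_band_edges():
    print(f"{low / 1000:7.4f} kHz to {high / 1000:7.4f} kHz")
```

The two lower bands are each half as wide as the top band, which is the non-uniform division that gives the low frequencies finer treatment.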

After ten years of development, the ATRAC algorithm could no longer meet market demand, and Sony introduced a new algorithm in August 2002.

Compared with ATRAC, the core algorithm of ATRAC3/ATRAC3+ is essentially unchanged. However, it adopts improved band-splitting filters and MDCT, and it uses gain adjustment, tonal component separation, joint stereo, and other techniques to further reduce the amount of compressed audio data.

4. AAL (ATRAC Advanced Lossless)

AAL stands for ATRAC Advanced Lossless, a newer audio compression format developed by Sony. It is lossless: no audio information is discarded, and a CD can be compressed to 30%-80% of its original size.

5. Ogg

The full name of Ogg is Ogg Vorbis. It is a newer audio compression format, similar to existing music formats such as MP3. One difference is that it is completely free and open, with no patent restrictions. An outstanding feature of Ogg Vorbis is its support for multiple channels. As it becomes more popular, listening to DTS-encoded multi-channel works on a Walkman will no longer be a dream.

Vorbis is the name of this audio compression mechanism, while Ogg is the name of a project to design a completely open multimedia system.

Ogg Vorbis files use the .ogg extension. The file format is very forward-looking in its design: existing OGG files remain playable as the format evolves, so file size and sound quality can keep improving without affecting older encoders or players.

Compared with AAC, its low frequencies are slightly better and its high frequencies slightly worse.

At the highest quality setting, it cannot be told apart from lossless by ear.

At the highest quality setting, Q10, the file is almost twice the size of an AAC file encoded with faac at Q500.

The codec is open source.