Audio compression technology refers to the application of appropriate digital signal processing techniques to an original digital audio stream (pulse code modulation, PCM) to reduce (compress) its bit rate without losing useful information, or while introducing only negligible loss; it is also called compression coding. It must have a corresponding inverse transformation, called decompression or decoding. An audio signal may pick up some noise and distortion after passing through a codec system.

The emergence and early application of audio compression technology

The advantages of digital signals are obvious, but they come with a corresponding disadvantage: greater storage capacity requirements, and greater channel capacity requirements during transmission. Take the CD as an example: at a sampling rate of 44.1 kHz and a quantization accuracy of 16 bits, one minute of stereo audio occupies roughly 10 MB of storage, so a CD disc holds only about an hour of audio. This problem is, of course, even more acute in digital video, where the bandwidth is far higher. Are all these bits necessary? Research shows that storing and transmitting the raw PCM stream directly carries great redundancy. In fact, sound can be compressed at least 4:1 under lossless conditions, that is, all of the information can be retained with only 25% of the data, and in the video field the compression ratio can even reach several hundred to one. To make the best use of limited resources, compression technology has therefore received extensive attention ever since it appeared.

The research and application of audio compression technology have a long history. A-law and μ-law coding, for example, are simple quasi-instantaneous compandors and have been applied in ISDN voice transmission. Research on speech signals developed early, matured quickly, and has been widely applied, with techniques such as adaptive differential PCM (ADPCM) and linear predictive coding (LPC). In the field of broadcasting, audio compression is used in systems such as NICAM (Near Instantaneous Companded Audio Multiplex).
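The storage figures in the CD example above follow from simple arithmetic; the short Python sketch below reproduces them. The 650 MB disc capacity is an assumed nominal value, used only to illustrate the calculation:

```python
# Back-of-the-envelope check of the CD figures quoted above.
sample_rate = 44_100       # samples per second
bits_per_sample = 16       # quantization accuracy
channels = 2               # stereo

bit_rate = sample_rate * bits_per_sample * channels   # = 1,411,200 bit/s
bytes_per_minute = bit_rate / 8 * 60                  # ~10.6 MB per minute
cd_bytes = 650 * 1024 * 1024                          # assumed ~650 MB disc

print(f"bit rate:        {bit_rate / 1000:.1f} kbit/s")
print(f"one minute uses: {bytes_per_minute / 1e6:.1f} MB")
print(f"minutes per CD:  {cd_bytes / bytes_per_minute:.0f}")
```

Running this gives about 1411 kbit/s, 10.6 MB per minute, and roughly 64 minutes per disc, matching the "about an hour" figure.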
Redundant information of audio signal

Digital audio compression coding compresses the audio data as much as possible on the premise that the signal suffers no audible distortion. It works by removing redundant components from the sound signal. Redundant components are those parts of the audio that the human ear cannot perceive; they contribute nothing to the timbre, pitch or other attributes of the sound. They include audio outside the hearing range of the human ear as well as masked audio signals. For example, the human ear perceives sound roughly in the frequency range of 20 Hz to 20 kHz; components at other frequencies cannot be heard and can be treated as redundant. In addition, according to the physiology and psychoacoustics of human hearing, when a strong signal and a weak signal occur together, the weak one is hidden by the strong one and cannot be heard, so it can be treated as redundant and need not be transmitted. This is the masking effect of human hearing, which manifests mainly as the spectrum masking effect and the time domain masking effect, introduced below.

Spectrum masking effect. When the sound energy at a frequency falls below a certain threshold, the human ear cannot hear it; this threshold is called the minimum audible threshold. When another, higher-energy sound is present, the threshold near that sound's frequency rises considerably. This is the masking effect.

Time domain masking effect. When strong and weak signals occur very close together in time, masking also takes place. Time domain masking is divided into three parts: pre-masking, simultaneous masking and post-masking. Pre-masking means that a weak signal present in the short interval before the ear hears a strong signal is masked and not heard. Simultaneous masking means that when a strong signal and a weak signal are present at the same time, the weak signal is masked by the strong one and cannot be heard. Post-masking means that after the strong signal disappears, it takes some time before the weak signal can be heard again. All of these masked weak signals can be treated as redundant.
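To make the simultaneous-masking idea concrete, here is a toy Python sketch (assuming NumPy). The linear "spreading" slope and the 70 dB floor are invented purely for illustration; real coders such as MPEG-1 use a measured psychoacoustic model on a perceptual frequency scale. A strong 1 kHz tone raises the threshold so that a neighbouring tone 40 dB weaker falls below it and could be discarded:

```python
import numpy as np

fs, N = 44100, 2048
t = np.arange(N) / fs
masker = np.sin(2 * np.pi * 1000 * t)            # strong tone at 1 kHz
probe = 0.01 * np.sin(2 * np.pi * 1100 * t)      # tone 40 dB weaker at 1.1 kHz

spec = np.fft.rfft((masker + probe) * np.hanning(N))
mag_db = 20 * np.log10(np.abs(spec) + 1e-12)
freqs = np.fft.rfftfreq(N, 1 / fs)

floor = mag_db.max() - 70                        # stand-in for the absolute threshold
peak_f = freqs[np.argmax(mag_db)]
# Made-up spreading curve: threshold falls off linearly away from the masker.
spread = mag_db.max() - 20 - 0.05 * np.abs(freqs - peak_f)
threshold = np.maximum(floor, spread)

probe_bin = np.argmin(np.abs(freqs - 1100))
print(f"probe level:      {mag_db[probe_bin]:.1f} dB")
print(f"raised threshold: {threshold[probe_bin]:.1f} dB")
print("probe masked" if mag_db[probe_bin] < threshold[probe_bin] else "probe audible")
```

The probe sits above the assumed absolute floor, so it would be audible on its own, yet it falls below the threshold raised by the masker, which is exactly the redundancy a perceptual coder exploits.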
Compression coding method

By compression principle, audio signal coding divides into waveform coding, parameter coding, and hybrid forms that integrate several techniques.

(1) Waveform coding directly samples the time-domain or frequency-domain waveform of the audio signal at a certain rate, then quantizes the amplitude samples in levels and converts them into a digital code. The coding system reconstructs from the waveform data a signal that is as consistent as possible with the original sound waveform, retaining the signal's fine variations and transitional features.

(2) Parameter coding first establishes a feature model for each class of source, such as speech signals or natural sounds, then extracts and encodes characteristic parameters. The reconstructed signal tries to preserve the semantics of the original sound as faithfully as possible, but its waveform may differ considerably from the original. Commonly used parameters, such as formants, linear prediction coefficients and band-splitting filter outputs, allow low-rate coding: the bit rate can be compressed to 2 kbit/s to 4.8 kbit/s, but the sound quality is only moderate, with notably low naturalness, so parameter coding is suitable only for speech transmission.

(3) Hybrid coding combines waveform coding and parameter coding, overcoming the weaknesses of both while trying to retain the high quality of waveform coding and the low rate of parameter coding, so as to obtain high-quality synthesized sound at 4 to 16 kbit/s. The basis of hybrid coding is linear predictive coding (LPC); common variants are multi-pulse excited linear predictive coding (MPLPC), regular pulse excited linear predictive coding (RPELPC) and codebook excited linear predictive coding (CELP).
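Since the text names linear prediction as the basis of hybrid coders, a bare-bones sketch may help. It solves the autocorrelation normal equations for a short-term predictor and shows that the residual left to transmit is far smaller than the signal itself; real MPLPC/RPELPC/CELP coders add excitation modelling and codebook search on top of this. The frame length, order and test signal here are arbitrary choices for illustration:

```python
import numpy as np

def lpc_coeffs(x, order):
    """Solve the autocorrelation normal equations R a = r for the predictor."""
    r = np.correlate(x, x, mode="full")[len(x) - 1 : len(x) + order]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    return np.linalg.solve(R, r[1:])

rng = np.random.default_rng(0)
fs, order = 8000, 8
t = np.arange(400) / fs
# Two sinusoids plus a little noise stand in for a voiced speech frame.
x = (np.sin(2 * np.pi * 300 * t) + 0.5 * np.sin(2 * np.pi * 1200 * t)
     + 0.01 * rng.standard_normal(len(t)))

a = lpc_coeffs(x, order)
# Predict each sample from the previous `order` samples; only the residual
# (plus the coefficients) would need to be transmitted.
pred = np.array([a @ x[n - order : n][::-1] for n in range(order, len(x))])
residual = x[order:] - pred
print(f"residual/signal RMS: {np.std(residual) / np.std(x):.4f}")  # far below 1
```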
Other divisions of compression methods

In the field of audio compression there are two broad approaches: lossy compression and lossless compression. The MP3, WMA and OGG formats we commonly see are lossy compression; as the name implies, lossy compression reduces the audio sampling frequency and bit rate, and the output file is smaller than the original. The other kind is lossless compression, which reduces the size of an audio file while preserving 100% of the original file's data; after the compressed file is restored, it is identical in size and bit rate to the source file. Lossless formats include APE, FLAC, WavPack, LPAC, WMA Lossless, Apple Lossless, LA, OptimFROG and Shorten, of which only APE and FLAC are common and mainstream.

Main classification and typical representatives of audio compression algorithms

Generally speaking, audio compression techniques divide into two categories, lossless and lossy compression, and, by compression scheme, into time domain compression, transform compression, sub-band compression, and hybrid compression mixing several techniques. Different techniques differ greatly in algorithmic complexity (both time and space), audio quality, efficiency (compression ratio) and codec delay, and their fields of application differ accordingly.

Time domain compression (or waveform coding) technology

Time domain compression operates directly on the samples of the audio PCM stream and compresses the stream by means such as silence detection, nonlinear quantization and difference coding. Techniques in this class share low algorithmic complexity, average sound quality, a small compression ratio (CD quality requires upwards of 400 kbit/s) and the shortest codec delay of any class. They are generally used for speech compression and other low bit rate applications where the source bandwidth is small. Time domain techniques mainly include G.711, ADPCM, LPC and CELP, as well as block compandors such as NICAM and sub-band ADPCM (SB-ADPCM) developed on top of them; a minimal companding sketch follows this section.

Sub-band compression technology

Sub-band coding theory was first put forward by Crochiere in 1976. Its basic idea is to decompose the signal into a sum of components in several sub-bands and then apply a different compression strategy to each sub-band component according to its distribution characteristics, reducing the bit rate. Sub-band compression, like transform compression, rests on a model of human sound perception (the psychoacoustic model): the quantization levels and other parameters for the sub-band or frequency-domain samples are determined by analyzing the signal spectrum, so both can also be called perceptual compression coding. Compared with time domain compression, these two methods are considerably more complex, but coding efficiency and sound quality improve greatly, with a correspondingly longer coding delay. Generally speaking, sub-band coding is slightly less complex than transform coding and its coding delay is relatively shorter.
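The nonlinear quantization named among the time-domain techniques above can be illustrated with G.711-style μ-law companding. The curve below is the textbook μ-law formula; the uniform 8-bit quantizer stands in for G.711's actual segmented encoding, so this is a sketch of the principle rather than a bit-exact codec:

```python
import numpy as np

MU = 255.0  # the μ constant used by G.711 μ-law

def mulaw_compress(x):
    """Companding curve: fine resolution near zero, coarse at high amplitude."""
    return np.sign(x) * np.log1p(MU * np.abs(x)) / np.log1p(MU)

def mulaw_expand(y):
    """Exact inverse of the companding curve."""
    return np.sign(y) * np.expm1(np.abs(y) * np.log1p(MU)) / MU

def quantize(y, bits=8):
    levels = 2 ** (bits - 1) - 1
    return np.round(y * levels) / levels

# A tone whose amplitude decays over three orders of magnitude.
x = np.sin(2 * np.pi * np.arange(4000) * 440 / 8000) * np.logspace(0, -3, 4000)
x_hat = mulaw_expand(quantize(mulaw_compress(x)))

err_db = 20 * np.log10(np.max(np.abs(x - x_hat)))
print(f"worst-case error with 8-bit companded samples: {err_db:.1f} dBFS")
```

Because the curve expands quiet samples before quantizing, quiet passages keep far more effective resolution than a plain 8-bit linear quantizer would give them.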
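The sub-band idea can likewise be sketched in a few lines. Here the Haar filter pair serves as a deliberately crude two-band filter bank (a real codec would use QMF or polyphase banks with many bands), and the perceptually less important high band is given fewer bits; the 8/3-bit allocation is an arbitrary choice for illustration:

```python
import numpy as np

def analyse(x):
    """Split into low/high half-bands with the Haar filter pair."""
    x = x[: len(x) // 2 * 2]
    return (x[0::2] + x[1::2]) / np.sqrt(2), (x[0::2] - x[1::2]) / np.sqrt(2)

def synthesise(low, high):
    """Perfect-reconstruction inverse of analyse()."""
    y = np.empty(2 * len(low))
    y[0::2] = (low + high) / np.sqrt(2)
    y[1::2] = (low - high) / np.sqrt(2)
    return y

def quantise(band, bits):
    """Uniform quantizer with a per-band scale factor."""
    scale = np.max(np.abs(band))
    if scale == 0.0:
        return band
    levels = 2 ** (bits - 1) - 1
    return np.round(band / scale * levels) / levels * scale

fs = 8000
t = np.arange(1024) / fs
x = np.sin(2 * np.pi * 440 * t) + 0.1 * np.sin(2 * np.pi * 3500 * t)

low, high = analyse(x)
# Spend bits where the energy (and audibility) is: 8 for low, 3 for high.
y = synthesise(quantise(low, 8), quantise(high, 3))
snr = 10 * np.log10(np.sum(x**2) / np.sum((x - y) ** 2))
print(f"SNR with 8/3-bit allocation: {snr:.1f} dB")
```

A perceptual coder takes this one step further: instead of a fixed allocation, it derives the bits per band from the masking thresholds described earlier.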
Standardization of audio compression technology and MPEG-1

Because digital audio compression has a broad range of applications and good market prospects, research institutions and companies have spared no effort in developing their own patented technologies and products, and standardizing these audio compression techniques has become very important. MPEG-1 audio (ISO/IEC 11172-3) is a great success of audio compression standardization. MPEG-1 offers three audio compression modes: Layer I, Layer II (MUSICAM, also known as MP2) and Layer III (known as MP3). Because many compression techniques were carefully evaluated when the standard was drawn up, with full consideration of practical application conditions and algorithmic realizability (complexity), all three modes have been widely adopted. The audio compression scheme used in VCD is MPEG-1 Layer I. MUSICAM, with its moderate complexity and excellent sound quality, is widely used for the production, exchange, storage and transmission of digital programs in digital studios, DAB and DVB. MP3 is a hybrid compression technology that combines the advantages of MUSICAM and ASPEC. Although MP3's complexity was relatively high at the time, which was unfavorable for real-time encoding, its superior sound quality at low bit rates made it the darling of software decoding and Internet broadcasting. It is fair to say that the way the MPEG-1 audio standard was formulated determined its success, and this approach influenced the formulation of the MPEG-2 and MPEG-4 audio standards discussed later.