Please introduce and compare the professional audio formats.

Content description: This article introduces PCM coding, WMA coding, ADPCM coding, LPC coding, MP3 coding, AAC coding, CELP coding and others, comparing their advantages and disadvantages and listing their main application fields.

PCM (Pulse Code Modulation; the raw digital audio signal stream)

Type: audio

Formulated by ITU-T.

Bandwidth required: 1411.2 kbps

Features: Sound source information is complete, but the redundancy is too large.

Advantages: the sound source information is completely preserved and the sound quality is good.

Disadvantages: large data volume and excessive redundancy.

Application field: VoIP

Royalty method: free of charge

Remarks: In computer applications, PCM achieves the highest fidelity. It is widely used for archiving source material and for music listening, and it is the coding used on CDs, on DVDs and in the common WAV file. By convention, therefore, PCM is treated as lossless coding, because it represents the best fidelity available in digital audio. That does not mean PCM guarantees absolute fidelity to the original signal; it can only approximate the signal as closely as the sampling parameters allow. The bit rate of a PCM audio stream is easy to calculate: sampling rate × sample size × number of channels, in bps. With a 44.1 kHz sampling rate, a 16-bit sample size and 2 channels, the data rate is 44.1k × 16 × 2 = 1411.2 kbps. The common audio CD uses PCM, and one CD holds only 72 minutes of music.
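The bit-rate formula in the remarks can be checked with a few lines of Python. This is only a sketch: the function name `pcm_bitrate_bps` is made up for illustration, and the size estimate counts raw audio samples only, ignoring any disc formatting overhead.

```python
def pcm_bitrate_bps(sample_rate_hz: int, bits_per_sample: int, channels: int) -> int:
    """Bit rate of an uncompressed PCM stream: rate x sample size x channels."""
    return sample_rate_hz * bits_per_sample * channels


# CD audio as described above: 44.1 kHz, 16-bit samples, 2 channels.
cd_bps = pcm_bitrate_bps(44_100, 16, 2)
print(cd_bps / 1000, "kbps")  # prints: 1411.2 kbps

# Raw sample data for 72 minutes of music, in megabytes (10**6 bytes).
cd_megabytes = cd_bps / 8 * 72 * 60 / 1e6
print(round(cd_megabytes, 1), "MB")
```

The same function covers other PCM settings, for example `pcm_bitrate_bps(8_000, 8, 1)` for 8 kHz, 8-bit mono audio.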

WMA (Windows Media Audio)

Type: audio

Manufacturer: Microsoft Corporation

Bandwidth required: 320 ~ 112 kbps (compressed 10 ~ 12 times)

Features: Below 128 kbps, WMA performs best among almost all lossy formats of the same class, but 128 kbps appears to be a threshold for WMA: raising the bit rate further improves the sound quality very little.

Advantages: WMA works best at bit rates below 128 kbps, and the encoded audio files are very small.

Disadvantages: Above 128 kbps, WMA's sound quality falls behind what the bit rate would suggest. The WMA standard is not open; it is controlled by Microsoft.

Application field: VoIP

Royalty method: charged separately.

Remarks: WMA, short for Windows Media Audio, is an audio format introduced by Microsoft that is as well known as MP3. WMA surpasses MP3 in compression ratio and sound quality and is far superior to RA (Real Audio), producing good sound quality even at low sampling rates. In addition, WMA has the strong backing of Microsoft's Windows Media Player, so it was well received as soon as it launched.

ADPCM (Adaptive Differential PCM)

Type: audio

Formulated by ITU-T.

Bandwidth required: 32Kbps

Features: ADPCM (Adaptive Differential Pulse Code Modulation) combines the adaptive quantization of APCM with the differential coding of DPCM, and is a waveform coding method with good performance.

Its core idea is:

(1) Adapt the quantization step size: use a small step to encode small differences and a large step to encode large differences.

(2) Use past sample values to predict the next input sample, so that the difference between the actual sample and the predicted value stays as small as possible.
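The two ideas above can be sketched as a toy codec. This is a simplified illustration, not the ITU-T algorithm: the predictor is just the previous reconstructed sample, and the step-size rule (double on large codes, halve on small ones) and all names are assumptions chosen for brevity.

```python
def _step(predicted, step, code, min_step, max_step):
    """State update shared by encoder and decoder for one 4-bit code."""
    predicted += code * step          # reconstructed sample value
    if abs(code) >= 5:                # large difference: widen the step
        step = min(max_step, step * 2)
    elif abs(code) <= 1:              # small difference: narrow the step
        step = max(min_step, step // 2)
    return predicted, step


def adpcm_encode(samples, min_step=4, max_step=4096):
    """Turn integer samples into signed 4-bit codes in the range -8..7."""
    predicted, step, codes = 0, min_step, []
    for sample in samples:
        # Quantize the prediction error with the current step size.
        code = max(-8, min(7, round((sample - predicted) / step)))
        codes.append(code)
        predicted, step = _step(predicted, step, code, min_step, max_step)
    return codes


def adpcm_decode(codes, min_step=4, max_step=4096):
    """Rebuild samples by replaying the same state updates as the encoder."""
    predicted, step, samples = 0, min_step, []
    for code in codes:
        predicted, step = _step(predicted, step, code, min_step, max_step)
        samples.append(predicted)
    return samples


codes = adpcm_encode([0, 4, 8, 12])
decoded = adpcm_decode(codes)
```

Because the decoder repeats the encoder's state updates from the codes alone, no step-size information has to be transmitted; only the 4-bit codes travel.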

Advantages: low algorithmic complexity and the shortest encoding and decoding delay of the technologies compared here. The compression ratio is low (CD sound quality needs more than 400 kbps).

Disadvantages: the sound quality is average.

Application field: VoIP

Royalty method: free of charge

Remarks: ADPCM (Adaptive Differential Pulse Code Modulation) is a lossy compression algorithm for 16-bit (or higher?) sound waveform data. It stores each 16-bit sample in the sound stream as 4 bits, so the compression ratio is 4:1. The compression and decompression algorithm is very simple, which makes it a good way to obtain high-quality sound with little storage.

LPC (Linear Predictive Coding)

Type: audio

Manufacturer:

Bandwidth required: 2Kbps-4.8Kbps

Features: high compression ratio, heavy computation, low sound quality and low cost.

Advantages: high compression ratio and low cost.

Disadvantages: large amount of calculation, poor voice quality and low naturalness.

Application field: VoIP

Royalty method: free of charge

Remarks: Parametric coding, also known as source coding, extracts characteristic parameters from the source signal in the frequency domain or another orthogonal transform domain and converts them into digital codes for transmission. Decoding is the reverse process: the received digital sequence is transformed back into the characteristic parameters, and the speech signal is reconstructed from them. Parametric coding tries to make the reconstructed speech as intelligible as possible by extracting and coding the characteristic parameters of the speech signal, but the waveform of the reconstructed speech may differ greatly from that of the original. Linear predictive coding (LPC) and its various improved forms belong to parametric coding. The coding bit rate can be compressed to 2 kbit/s to 4.8 kbit/s or even lower, but the speech quality is only moderate, and naturalness in particular is low.
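As an illustration of extracting characteristic parameters instead of coding the waveform, the sketch below estimates linear prediction coefficients for one frame using the autocorrelation method and the Levinson-Durbin recursion. It is a minimal example under assumptions: the frame length, the test signal and the predictor order are arbitrary, and a real LPC coder would also quantize the coefficients and code pitch and gain.

```python
import math


def autocorrelation(frame, max_lag):
    """r[lag] = sum of frame[i] * frame[i + lag] over the frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Return (coeffs, residual_energy) with x[n] ~ sum(coeffs[k-1] * x[n-k])."""
    a = [0.0] * (order + 1)
    error = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for stage i.
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / error
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        error *= 1.0 - k * k
    return a[1:], error


# Hypothetical frame: 160 samples of a two-tone test signal.
frame = [math.sin(0.3 * n) + 0.5 * math.sin(1.1 * n) for n in range(160)]
coeffs, residual = levinson_durbin(autocorrelation(frame, 2), 2)
print(len(frame), "samples described by", len(coeffs), "coefficients")
```

The point of the example is the ratio in the last line: the decoder receives a handful of parameters per frame rather than every sample, which is why the rebuilt waveform can differ from the original.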

CELP (Code Excited Linear Prediction)

Type: audio

Manufacturer: European Telecommunications Standards Institute (ETSI)

Bandwidth required: 4 ~ 16 kbps.

Features: improved voice quality:

(1) Perceptual weighting is applied to the error signal, exploiting the masking characteristics of human hearing, to improve the subjective quality of speech.

(2) Pitch prediction is refined with fractional delays, which makes voiced speech more accurate and especially improves the quality of female voices.

(3) A modified MSPE criterion is used to find the "best" delay, which makes the pitch delay contour smoother.

(4) The size of the random excitation vector is adjusted according to the efficiency of the long-term prediction, to improve the subjective quality of speech.

(5) An adaptive smoother based on an estimate of the channel error rate allows speech of high naturalness to be synthesized even when the channel error rate is high.

Conclusion:

(1) The CELP algorithm achieves a satisfactory compression result in low-bit-rate coding.

(2) Fast algorithms can effectively reduce the complexity of CELP so that it runs fully in real time.

(3) CELP can successfully encode many types of speech signal; this adaptability matters most in real environments, especially in the presence of background noise.

Advantages: It provides clear speech with very low bandwidth.

Disadvantages:-

Application field: VoIP

Royalty method: free of charge

Remarks: In 1999 the European Telecommunications Standards Institute (ETSI) introduced the Adaptive Multi-Rate (AMR) speech coder, which is based on code excited linear prediction (CELP); its lowest rate is 4.75 kb/s, and it achieves communication quality. CELP stands for Code Excited Linear Prediction and has been the most successful speech coding algorithm of the past ten years. The CELP algorithm uses linear prediction to extract the vocal tract parameters and uses a codebook containing many typical excitation vectors as the source of excitation parameters. Each time it encodes, it searches the codebook for the best excitation vector, and the value transmitted for that vector is its index in the codebook.
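The codebook search described above can be sketched as follows. This is a deliberately reduced illustration: a real CELP coder passes each candidate through the synthesis filter and a perceptual weighting filter before comparing, which is omitted here, and the tiny codebook and the function name are invented for the example.

```python
def best_codebook_entry(target, codebook):
    """Return (index, gain) of the vector that best matches target.

    Each candidate is scaled by its own least-squares gain, and the one
    with the smallest squared error wins.
    """
    best_index, best_gain, best_error = -1, 0.0, float("inf")
    for index, vector in enumerate(codebook):
        energy = sum(v * v for v in vector)
        if energy == 0.0:
            continue  # an all-zero vector cannot be scaled to fit
        correlation = sum(t * v for t, v in zip(target, vector))
        gain = correlation / energy
        error = sum((t - gain * v) ** 2 for t, v in zip(target, vector))
        if error < best_error:
            best_index, best_gain, best_error = index, gain, error
    return best_index, best_gain


# Hypothetical 4-entry codebook: an index costs only 2 bits to transmit.
codebook = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [1, 1, 0, 0],
    [1, -1, 1, -1],
]
index, gain = best_codebook_entry([2, 2, 0, 0], codebook)
print(index, gain)  # prints: 2 2.0
```

Only the index and a quantized gain need to be sent, because the decoder holds the same codebook.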

CELP has been adopted by many speech coding standards. The US federal standard FS 1016 is a CELP coding method, used mainly for high-quality narrowband secure voice communication. CELP is an LPC-based algorithm known for its low bit rate (4800 to 9600 bps), clear speech quality and high immunity to background noise. It is a widely used speech compression scheme at medium and low bit rates.

MPEG-1 Audio Layer 1

Type: audio

Manufacturer: MPEG

Bandwidth required: 384 kbps (4x compression)

Features: Simple coding, 2 channels; used for digital cassette tape recording.

Advantages: The compression method is much more complicated than time-domain compression, but coding efficiency and sound quality are greatly improved, at the price of a correspondingly longer coding delay. It can achieve "completely transparent" sound quality (by the EBU sound quality standard).

Disadvantages: High bandwidth requirements.

Application field: VoIP

Royalty method: free of charge

Remarks: MPEG-1 audio compression coding is the first international standard for compressing high-fidelity audio data. It is divided into three layers:

- Layer 1: simple coding, used for digital cassette tape recording.

- Layer 2: medium algorithmic complexity, used in digital audio broadcasting (DAB) and VCD.

- Layer 3: complex coding, used to transmit high-quality sound over the Internet, for example MP3 music compressed about 10 times.
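The compression factors quoted for the three layers can be compared against the 1411.2 kbps CD-quality PCM rate with a short calculation. The bit rates below are the ones listed in this article for each layer; the helper name is illustrative.

```python
CD_PCM_KBPS = 44_100 * 16 * 2 / 1000  # 1411.2 kbps, 16-bit stereo at 44.1 kHz


def compression_ratio(coded_kbps, source_kbps=CD_PCM_KBPS):
    """How many times smaller the coded stream is than the PCM source."""
    return source_kbps / coded_kbps


for name, kbps in [("Layer 1", 384), ("Layer 2", 192), ("Layer 3", 128)]:
    print(f"{name}: {kbps} kbps, about {compression_ratio(kbps):.1f} times smaller")
```

The results land close to the rounded figures in the text: a little under 4 times for Layer 1 and a little over 11 times for Layer 3 at 128 kbps.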

MUSICAM (MPEG-1 Audio Layer 2, MP2)

Type: audio

Manufacturer: MPEG

Bandwidth required: 256 ~ 192 kbps (compressed by 6 ~ 8 times)

Features: Moderate algorithmic complexity, 2 channels; used for digital audio broadcasting (DAB) and VCD. Because of its appropriate complexity and excellent sound quality, MUSICAM is widely used in the production, exchange, storage and transmission of digital programs, for example in digital studios, DAB and DVB.

Advantages: The compression method is much more complicated than time-domain compression, but coding efficiency and sound quality are greatly improved, at the price of a correspondingly longer coding delay. It can achieve "completely transparent" sound quality (by the EBU sound quality standard).

Disadvantages:

Application field: VoIP

Royalty method: free of charge

Remarks: Same as MPEG-1 Audio Layer 1.

MP3 (MPEG-1 Audio Layer 3)

Type: audio

Manufacturer: MPEG

Bandwidth required: 128 ~ 112 kbps (compressed 10 ~ 12 times)

Features: Complex coding, 2 channels; used to transmit high-quality sound over the Internet, for example MP3 music compressed about 10 times. MP3 is a hybrid compression technology that combines the strengths of MUSICAM and ASPEC. When it was introduced, its complexity was relatively high, which made real-time encoding difficult. However, because of its high sound quality at low bit rates, MP3 became the favorite format for software decoding and online playback.

Advantages: High compression ratio, suitable for Internet communication.

Disadvantages: At 128 kbps and below, MP3 shows obvious high-frequency loss.

Application field: VoIP

Royalty method: free of charge

Remarks: Same as MPEG-1 Audio Layer 1.

MPEG-2 audio layer

Type: audio

Manufacturer: MPEG

Bandwidth required: same as MPEG-1 Layer 1, Layer 2 and Layer 3.

Features: MPEG-2 audio uses the same codec as MPEG-1, with the same Layer 1, Layer 2 and Layer 3 structure, but it can also support 5.1-channel and 7.1-channel surround sound.

Advantages: Supports 5.1-channel and 7.1-channel surround sound.

Disadvantages:-

Application field: VoIP

Royalty method: charged separately.

Remarks: MPEG-2 audio uses the same codec as MPEG-1 audio, with the same Layer 1, Layer 2 and Layer 3 structure, but it can also support 5.1-channel and 7.1-channel surround sound.

AAC (Advanced Audio Coding)

Type: audio

Manufacturer: MPEG

Bandwidth required: 96-128 kbps

Features: AAC supports any combination of 1 to 48 audio channels, including 15 low-frequency effect channels, dubbing/multi-language channels and 15 data channels. It can carry 16 programs at the same time, and the audio and data structure of each program can be specified freely.

The main likely applications of AAC are Internet communication and digital audio broadcasting, including direct satellite broadcasting and digital AM, as well as digital TV and cinema systems. AAC uses a very flexible entropy coding core to transmit the coded spectral data. It provides 48 main audio channels, 16 low-frequency enhancement channels, 16 integrated data streams, 16 dubbing channels and 16 program arrangements.

Advantages: Support multiple audio channel combinations and provide high-quality sound quality.

Disadvantages:-

Application field: VoIP

Royalty method: one-time charge

Remarks: AAC became the international standard ISO/IEC 13818-7 in 1997. Advanced Audio Coding (AAC) was developed as the new-generation audio compression standard that follows the MPEG-2 audio standard (ISO/IEC 13818-3).

In the early days of MPEG-2, the intention was to keep its audio coding part compatible with MPEG-1. Later, to meet the needs of radio and television, a multi-channel audio standard capable of higher quality was defined. That standard is inevitably incompatible with MPEG-1, so it is called MPEG-2 AAC. In other words, producing and playing AAC requires tools entirely different from those used for MP3.

HR (Half Rate)

Type: audio

Manufacturer: Philips

Bandwidth required: 8Kbps

Features: Its purpose is to increase the capacity of the GSM network, at the cost of voice quality. Because frequency resources are scarce, some large operators have enabled this mode in densely populated areas of big cities to increase capacity.

Advantages: large system capacity.

Disadvantages: poor sound quality

Application field: GSM

Royalty method: according to the specific situation.

Remarks: HR (Half Rate) is a GSM speech coding method.

FR (Full Rate)

Type: audio

Manufacturer: Philips

Bandwidth required: 13Kbps

Features: It is the general-purpose speech coding method for GSM mobile phones and reaches a voice communication quality of about 4.1 (the ITU defines the maximum voice communication quality score, QoS, as 5).

Advantages: the voice quality has been improved.

Disadvantages: the system capacity is reduced.

Application field: GSM

Royalty method: according to the specific situation.

Remarks: FR (Full Rate) is a GSM speech coding method.

EFR (Enhanced Full Rate)

Type: audio

Manufacturer: Philips

Bandwidth required: 13Kbps

Features: Used for speech coding and transmission on GSM mobile phones at the 13 kbps full rate, it delivers better, clearer speech quality (close to QoS 4.7). A handset can use it only if the network service provider has enabled this function in the network.

Advantages: Good sound quality.

Disadvantages: Network service providers need to enable this network function, and the system capacity is reduced.

Application field: GSM

Royalty method: according to the specific situation.

Remarks: EFR (Enhanced Full Rate) is a GSM network speech coding method.

AMR (Adaptive Multi-Rate)

Type: audio

Manufacturer: Philips

Bandwidth required: 8 kbps (4.75 kbps ~ 12.2 kbps)

Features: Supports speech frame substitution and muting, comfort noise smoothing, discontinuous transmission and voice activity detection. It provides high-quality voice under a wide range of network conditions.

Advantages: Excellent sound quality.

Disadvantages:-

Application field: GSM

Royalty method: according to the specific situation.

Remarks: GSM-AMR is an audio standard widely used in GPRS and W-CDMA networks and is defined in the ETSI GSM 06.90 specification. AMR is the default speech coding standard for GSM Phase 2+ and WCDMA, and it is the speech coding standard of third-generation wireless communication systems. The GSM-AMR standard is based on ACELP (Algebraic Code Excited Linear Prediction) coding. It provides high-quality speech under a wide range of transmission conditions.

EVRC (Enhanced Variable Rate Codec)

Type: audio

Manufacturer: Qualcomm

Bandwidth required: 8Kbps or 13Kbps.

Features: Supports three bit rates (9.6 kbps, 4.8 kbps and 1.2 kbps), noise suppression and post-filtering. It provides high-quality voice under a wide range of network conditions.

Advantages: Excellent sound quality.

Disadvantages:-

Application field: CDMA

Royalty method: according to the specific situation.

Remarks: EVRC coding is widely used in CDMA networks. The EVRC standard follows TIA IS-127 and is based on RCELP (Relaxed Code Excited Linear Prediction). The encoder can operate at rate 1 (171 bits/packet), rate 1/2 (80 bits/packet) or rate 1/8 (16 bits/packet). When required, it can also generate 0 bits/packet.
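The packet sizes above translate into bit rates once a packet interval is fixed. The sketch below assumes one packet per 20 ms speech frame (50 packets per second); that interval is not stated in this article and is labeled as an assumption in the code, and the helper names are illustrative.

```python
PACKETS_PER_SECOND = 50  # assumption: one packet per 20 ms speech frame

BITS_PER_PACKET = {"rate 1": 171, "rate 1/2": 80, "rate 1/8": 16, "blank": 0}


def source_rate_bps(rate_name):
    """Speech payload bit rate for a stream coded entirely at one rate."""
    return BITS_PER_PACKET[rate_name] * PACKETS_PER_SECOND


def average_rate_bps(rate_names):
    """Average payload bit rate over a sequence of per-packet rates."""
    total_bits = sum(BITS_PER_PACKET[name] for name in rate_names)
    return total_bits * PACKETS_PER_SECOND / len(rate_names)


for name in BITS_PER_PACKET:
    print(name, source_rate_bps(name), "bps")

# Half of the packets carry speech, half carry background noise only.
print(average_rate_bps(["rate 1", "rate 1/8"]), "bps on average")
```

Under this assumption the payload rates sit below the 9.6, 4.8 and 1.2 kbps figures listed under Features, which is consistent with those figures including channel overhead. The average shows why a variable-rate coder saves capacity during pauses.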

QCELP (Qualcomm Code Excited Linear Prediction)

Type: audio

Manufacturer: Qualcomm

Bandwidth required: 8 kbps speech coding algorithm (it can work at fixed rates such as 4/4.8/8/9.6 kbps and at variable rates between 800 bps and 9600 bps).

Features: Uses an appropriate threshold to determine the required rate. QCELP is an 8 kbps speech coding algorithm (at 8 kbps it provides speech quality close to that of 13 kbps coding). It is a variable-rate speech coder, an optimization based on the characteristics of human speech: in everyday conversation people do not speak continuously at a constant level, and pauses and varying sounds are natural.

Advantages: clear voice, low background noise and large system capacity.

Disadvantages: not free

Application field: CDMA

Royalty method: an annual fee is paid for the right to use it.

Remarks: QCELP stands for Qualcomm Code Excited Linear Prediction. It is Qualcomm's patented speech coding algorithm and the speech coding standard (IS-95) for second-generation digital mobile telephony (CDMA) in North America. The algorithm can work not only at fixed rates of 4/4.8/8/9.6 kbit/s but also at variable rates between 800 bit/s and 9600 bit/s. QCELP is considered the most efficient algorithm to date. One of its main features is the use of a suitable threshold to determine the required rate; the threshold changes with the background noise level, so background noise is suppressed and good speech quality is obtained even in noisy environments. CDMA voice at 8 kbit/s is comparable to GSM voice at 13 kbit/s. With QCELP coding and a series of related technologies, CDMA delivers clear voice with low background noise; its performance is clearly superior to that of other wireless mobile communication systems, and its voice quality is comparable to that of a wired telephone. Its radio emissions are also very low.
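The idea of thresholds that follow the background noise level can be sketched as below. This is not the QCELP specification: the threshold multipliers, the rate labels, the noise-tracking rule and the function names are all assumptions made for illustration.

```python
def select_rate(frame_energy, noise_estimate, multipliers=(2.0, 4.0, 8.0)):
    """Pick a rate label by comparing frame energy with noise-scaled thresholds."""
    low, mid, high = (m * noise_estimate for m in multipliers)
    if frame_energy > high:
        return "full"
    if frame_energy > mid:
        return "half"
    if frame_energy > low:
        return "quarter"
    return "eighth"


def update_noise_estimate(noise_estimate, frame_energy, rise=1.01, floor=1.0):
    """Follow quiet frames at once, but let the estimate rise only slowly."""
    return max(floor, min(frame_energy, noise_estimate * rise))


# The same frame energy is coded at a lower rate when the background is noisier.
print(select_rate(100.0, 10.0))  # prints: full
print(select_rate(100.0, 30.0))  # prints: quarter
```

Because the thresholds scale with the noise estimate, steady background noise stays at the lowest rate while speech that rises above it triggers the higher rates.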

This article comes from: I Love R&D Network (52RD.com) - R&D base camp.

Detailed source: /blog/detail_rd.blog_zcy_lhj_20876.html