Can voiceprint identification withstand a voice changer?

I am a voiceprint examiner. This forensic discipline is also known as voice identification or forensic voice identification. I have handled a case involving a voice changer and still reached a same-speaker conclusion in the end (though the process was not straightforward).

To borrow from my answer to "Can we recognize imitated speech?", we should first introduce the particularity and stability of speech. (This is the basic principle of voiceprint identification. Textbooks word it differently, but they all say the same thing. The first lesson I teach my students is to have them memorize this principle.):

The particularity of speech. The vocal organs are divided into the supraglottal system, the laryngeal system and the subglottal system. Everyone has their own set of vocal organs, each differing in shape and structure, and every utterance requires many of these organs to move in coordination. This determines the physical properties of speech (also known as the four elements of speech): sound quality (timbre), duration, intensity and pitch. These physical quantities vary from person to person, so each voice shows different characteristics on the voiceprint (spectrogram). From these voiceprint feature parameters, the voices of different people can be told apart and the voice of the same person can be identified.

The stability of speech. Once a person's vocal organs have matured, their anatomical structure and physiological state are stable. The socio-psychological attributes of speech, such as the speaker's speaking habits, are stable as well, so the basic phonetic features stay stable when the same person speaks the same text at different times. You can therefore think of the human vocal tract as a brass instrument. Although the trombone and the cornet are both horns, they sound different because their tubes differ in shape and length.

At present, there are two methods of voiceprint recognition:

First, "manual identification", widely used in judicial practice in China: expert appraisers make the identification relying on phonetics.

Second, "automatic identification", the direction of future development: a computer uses algorithms to simulate the way the human ear extracts, learns and compares acoustic features.

So what kind of mysterious thing is a "voice changer"? A voice changer is a tool that changes the timbre and tone of the input audio and outputs the changed audio (Baidu Encyclopedia). In fact it can also change the speaking rate, which Baidu Encyclopedia left out. And it goes without saying that loudness can be changed without a voice changer at all. So all four elements of speech have changed: sound quality (timbre), tone (pitch), intensity (loudness) and duration (speaking rate). The physical properties of the speech have changed. How can it still be identified?

A. Don't think of the "voice changer" as something so abstruse

For identification purposes, isn't a "voice changer" just a channel? A channel is by definition the path along which a signal is transmitted. Although we call our work voiceprint identification, what we analyze and examine is not the live human voice but a recording of it, that is, a sound signal. Every kind of recording equipment can be regarded as a channel, and so can every kind of encoding; all of them change the sound signal. Walkie-talkies and telephones are channels, for example. When a voice comes through a walkie-talkie it already sounds distorted to you, so you have felt the influence of the channel on the voice signal yourself.

The "voice changers" currently on the market, whether hardware or software, mainly change the fundamental frequency, turning a deep voice (male) into a sharp one (female or child). (As for timbre: on the one hand it must have changed, because after resampling and shifting the fundamental frequency the formant characteristics are bound to be different. On the other hand, the formants shift as a whole, so the relative relationship between them can be treated as constant.) To be precise, "male voice", "female voice", "child's voice" and "old person's voice" are only categories of our hearing and social understanding. Voice is not a sex characteristic, so it cannot decide whether a speaker is a man or a woman; it only correlates statistically. A counterexample is Zheng's voice, which is very high (pitch is statistically related to vocal fold length). Search online and listen: would you say the speaker is a man or a woman?

In the cartoon, Conan uses a voice changer to turn his voice magically into Richard Moore's. In reality it is impossible to be that precise or to get such a good result. In theory it could be done, but only by collecting a large amount of acoustic data from Richard Moore.
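The claim that resampling shifts everything "as a whole" can be checked with a small Python sketch. This is not the algorithm of any real voice changer: it assumes the changer simply plays the recording back faster using linear interpolation, and it uses three pure tones as stand-ins for a fundamental frequency and two formant frequencies. The frequencies (120, 600, 1800 Hz), the factor 1.5 and all function names are illustrative assumptions. Each tone's frequency is measured before and after from its rising zero crossings.

```python
import math

SR = 16000  # sampling rate in Hz (assumed for this sketch)

def tone(freq, seconds=1.0, sr=SR):
    """Pure tone standing in for one frequency component of a voice."""
    n = int(seconds * sr)
    return [math.sin(2 * math.pi * freq * i / sr + 0.3) for i in range(n)]

def play_faster(x, factor):
    """Hypothetical changer: read x `factor` times faster (linear interpolation)."""
    out, pos = [], 0.0
    while pos < len(x) - 1:
        i = int(pos)
        frac = pos - i
        out.append((1 - frac) * x[i] + frac * x[i + 1])
        pos += factor
    return out

def estimate_freq(x, sr=SR):
    """Frequency of a single tone, from its count of rising zero crossings."""
    rising = sum(1 for a, b in zip(x, x[1:]) if a < 0 <= b)
    return rising * sr / len(x)

FACTOR = 1.5                  # illustrative setting
components = [120, 600, 1800]  # illustrative stand-in frequencies in Hz

before = [estimate_freq(tone(f)) for f in components]
after = [estimate_freq(play_faster(tone(f), FACTOR)) for f in components]
ratios = [b / a for a, b in zip(before, after)]

for f, a, b, r in zip(components, before, after, ratios):
    print(f"{f} Hz component: {a:.1f} Hz before, {b:.1f} Hz after, ratio {r:.3f}")
```

Under these assumptions every component moves by roughly the same factor, so the ratios between the components stay the same even though each absolute value has changed.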

B. A "voice changer" changes the physical characteristics of the sound, not all of the acoustic features that are evaluated in identification

As mentioned above, the main advantage of the commonly used manual identification method is that it can distinguish "high-level voiceprint features" that computers find hard to recognize, such as dialect and accent, pet phrases, filler words, speech defects and prosodic features. What makes them "high-level"? These are the first things we use to tell one person's voice from another's, and impressionists pay great attention to them when imitating someone. Yet computers find them hard to recognize, which is why they are called "high-level features". Indeed, the human ear is the most sophisticated voiceprint recognition instrument. The "low-level voiceprint features" that laypeople are unfamiliar with, such as formants and fundamental frequency, are the ones computers handle best, and even intensity, duration and VOT (voice onset time) can be recognized by computers.

C. A "voice changer" changes the physical properties of speech, and the sample can be changed in the same way for identification

Whether identification is manual or automatic, it works by comparing the recording from the case, the "inspection material" (questioned recording), with a recording of the suspect, the "sample". Since the questioned recording has been altered by a voice changer with certain settings, it is only necessary to alter the "sample" with the same settings. As mentioned earlier, the voice changer is a channel in the broad sense. One appraiser once argued that because the voice in a "voice changer" case is distorted, identification has to be done on the voice after the signal has been restored. Leaving aside that current technology cannot undo a channel at all, this view mainly fails to consider one thing: of all the recordings we examine, which one is not the result of signal processing through a channel? No matter how high the sampling rate or how fine the precision, a recording is still the result of discretization. Can it really equal a continuous signal such as the human voice? Every recording has passed through a channel and signal processing; the only differences are the degree of change and how it sounds.
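The idea of altering the "sample" with the same settings can be sketched in Python. This is a toy illustration under loud assumptions, not an appraisal procedure: each "speaker" is a sum of three sinusoids, the changer is hypothetical and merely plays the recording back 1.5 times faster, and recordings are compared by the cosine similarity of a coarse spectrum profile. All names, frequencies and settings are invented for the example.

```python
import cmath
import math

SR = 8000  # sampling rate in Hz (assumed for this sketch)

def synth(components, seconds, phase=0.0, sr=SR):
    """Toy 'voice': a sum of sinusoids given as (freq_hz, amplitude) pairs."""
    n = int(seconds * sr)
    return [sum(a * math.sin(2 * math.pi * f * i / sr + phase)
                for f, a in components) for i in range(n)]

def voice_changer(x, factor):
    """Hypothetical changer: read x `factor` times faster (linear interpolation)."""
    out, pos = [], 0.0
    while pos < len(x) - 1:
        i = int(pos)
        frac = pos - i
        out.append((1 - frac) * x[i] + frac * x[i + 1])
        pos += factor
    return out

def spectrum_profile(x, sr=SR, lo=100, hi=3000, step=50):
    """Unit-length vector of DFT magnitudes on a coarse frequency grid."""
    mags = []
    for f in range(lo, hi + 1, step):
        w = -2j * math.pi * f / sr
        mags.append(abs(sum(v * cmath.exp(w * i) for i, v in enumerate(x))))
    norm = math.sqrt(sum(m * m for m in mags))
    return [m / norm for m in mags]

def similarity(p, q):
    """Cosine similarity of two unit-length profiles."""
    return sum(a * b for a, b in zip(p, q))

FACTOR = 1.5  # the changer setting, assumed to be known or recoverable

# Questioned recording: speaker A, recorded through the changer.
questioned = voice_changer(synth([(100, 1.0), (700, 0.5), (1100, 0.5)], 0.25), FACTOR)
# Samples: separate, unaltered recordings of suspects A and B.
sample_a = synth([(100, 1.0), (700, 0.6), (1100, 0.4)], 0.3, phase=1.0)
sample_b = synth([(200, 1.0), (500, 0.5), (1500, 0.5)], 0.3, phase=1.0)

q = spectrum_profile(questioned)
sim_raw = similarity(q, spectrum_profile(sample_a))
sim_matched = similarity(q, spectrum_profile(voice_changer(sample_a, FACTOR)))
sim_other = similarity(q, spectrum_profile(voice_changer(sample_b, FACTOR)))

print(f"A, unaltered sample:     {sim_raw:.2f}")
print(f"A, same changer setting: {sim_matched:.2f}")
print(f"B, same changer setting: {sim_other:.2f}")
```

In this toy setup the unaltered sample of the true speaker barely resembles the questioned recording, while the same sample passed through the same setting matches closely and the other speaker still does not. Real casework compares far richer features, and the changer settings must be determined first.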

Question 2: Is it right to say that "sound waves, like fingerprints, can be recognized no matter how they change"?

Answer: No.

The term "sound wave" is wrong. Every time I hear "sound wave" I think of bats; it is a term used mostly by laypeople. Those of us who specialize in voiceprint identification understand identification well, have a superficial grasp of physics and signal processing, and know little about bats and other creatures. In judicial practice, judges, prosecutors and public security investigators all treat our professional opinions as authoritative. Professors at the various law schools may hold different views, but if their say-so were decisive, why would we need judges? To answer this question accurately I searched Baidu, and the papers that came up were written either by me, by my teacher or by colleagues I know well. In other words, the specious and even contradictory explanations on Baidu all come from our own differing understandings and wordings of the same thing. The word "voiceprint" was coined by the earliest appraisers when this forensic technique was introduced to China in the 1980s. The name comes from the spectrogram used in the identification method; it is easy to understand and has stuck through habit. "Voiceprint identification" is the general name for the whole field of forensic voice examination, including same-speaker identification, authenticity examination of recordings, and noise reduction to improve the signal-to-noise ratio. It also refers specifically to identifying an individual speaker. "Voiceprint" is short for the voice spectrogram, the main basis of identification; it is also a general name for the voice as a biometric feature. Identification methods have since developed, and analyzing acoustic features on a spectrogram is no longer the only method, so "forensic voice identification" is the more accurate name and "voiceprint identification" the more convenient one.
In short, call it whatever you like, but not "sound wave", because "sound wave" means something else.

"Sound wave" is different from "voiceprint"

The confusion probably stems from differences in how these words are translated and understood: sound wave, spectrum, sound, formant and intensity. When an appraiser talks about a sound wave, it must mean the waveform (see Figure 1), which shows intensity. In phonetics-based voiceprint identification, intensity is not the main feature; the formant is (see Figure 2).
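A small Python sketch shows why the waveform alone is not enough. It is only an illustration: two pure tones stand in for two sounds, RMS amplitude stands in for the intensity read off a waveform, and the strongest frequency on a coarse grid stands in for a spectral peak. A real formant is a peak of the spectral envelope of speech, which this toy does not model. The frequencies and function names are assumptions.

```python
import cmath
import math

SR = 8000  # sampling rate in Hz (assumed for this sketch)

def tone(freq, seconds=0.1, sr=SR):
    """Pure tone standing in for a sound with one dominant spectral peak."""
    n = int(seconds * sr)
    return [math.sin(2 * math.pi * freq * i / sr) for i in range(n)]

def rms(x):
    """Intensity as seen on the waveform: root-mean-square amplitude."""
    return math.sqrt(sum(v * v for v in x) / len(x))

def peak_frequency(x, sr=SR, lo=100, hi=3000, step=10):
    """Grid frequency (Hz) with the largest DFT magnitude."""
    def magnitude(f):
        w = -2j * math.pi * f / sr
        return abs(sum(v * cmath.exp(w * i) for i, v in enumerate(x)))
    return max(range(lo, hi + 1, step), key=magnitude)

sound_1 = tone(500)   # illustrative frequencies
sound_2 = tone(1500)

print(f"intensity: {rms(sound_1):.4f} vs {rms(sound_2):.4f}")
print(f"spectral peak: {peak_frequency(sound_1)} Hz vs {peak_frequency(sound_2)} Hz")
```

The two sounds have the same intensity, so an intensity reading cannot tell them apart, while their spectral peaks are clearly different.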