1. Commercial voice interaction platforms
1) Microsoft Speech API
Microsoft's Speech API (SAPI) is an application programming interface that includes speech recognition (SR) and speech synthesis (SS) engines and is widely used on Windows. Microsoft has released several versions of SAPI (the latest is SAPI 5.4), distributed either as part of the Speech SDK or bundled directly with the Windows operating system. SAPI supports speech recognition and text-to-speech in multiple languages, including English, Chinese and Japanese.
2) IBM ViaVoice
IBM was one of the earliest institutions to research speech recognition, starting in the late 1950s. Its early systems were designed to detect specific speech patterns and derive statistical correlations between sounds and the corresponding words. In 1999, IBM released a free version of VoiceType. In 2003, IBM granted ScanSoft exclusive distribution rights for desktop products based on ViaVoice, and ScanSoft subsequently merged with Nuance. ViaVoice has since faded from view, replaced by Nuance.
3) Nuance
Nuance Communications is a multinational computer software technology company headquartered in Burlington, Massachusetts, USA, which mainly provides voice and imaging solutions and applications. Its business currently focuses on server-based and embedded speech recognition, telephone call routing systems, automated directory assistance, etc. Besides speech recognition, Nuance's speech technology also covers speech synthesis, voiceprint recognition and other technologies. In the global voice technology market, more than 80% of speech recognition uses Nuance's recognition engine technology, and the company holds more than 1,000 patents. Its voice products support more than 50 languages and have more than 2 billion users worldwide. Nuance's speech recognition service is used for Siri's speech recognition on Apple's iPhone 4S.
4) iFlytek
As the largest intelligent voice technology provider in China, iFlytek has a long record of research in intelligent voice technology and has achieved internationally leading results in Chinese speech synthesis, speech recognition and spoken-language evaluation. It holds more than 60% of the voice technology market in China, and its share of the speech synthesis product market exceeds 70%.
5) Others
Other influential commercial voice interaction platforms include Google Voice Search and the voice input methods from Baidu and Sogou.
2. Open source voice interaction platforms
1) CMU Sphinx
CMU Sphinx is an open source speech recognition system developed by Carnegie Mellon University (CMU). It comprises a series of speech recognizers and acoustic model training tools. The earliest version, Sphinx-I, was developed by Kai-Fu Lee around 1987 and used a fixed HMM model (with three codebooks of 256 entries each). It is regarded as the first high-performance continuous speech recognition system, with accuracy above 90% on the Resource Management corpus. The latest Sphinx speech recognition system includes the following software packages:
PocketSphinx: a recognizer library written in C.
Sphinxbase: the support library required by PocketSphinx.
Sphinx4: an adjustable, modifiable recognizer written in Java.
CMUclmtk: language model tools.
Sphinxtrain: acoustic model training tools.
The executables and source code of these packages can be downloaded free of charge from SourceForge.
2) HTK
HTK stands for Hidden Markov Model Toolkit and is used mainly in speech recognition research. It was originally developed in 1989 by the Machine Intelligence Laboratory (formerly the Speech Vision and Robotics Group) of the Cambridge University Engineering Department (CUED), where it was used to build CUED's large-vocabulary speech recognition systems. The latest version of HTK is 3.4.1, released in 2009. For the principles behind HTK and the usage of its tools, see the HTK documentation, the HTKBook.
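To make the "hidden Markov model" in the toolkit's name concrete, the following is a minimal sketch of the forward algorithm, which computes the likelihood of an observation sequence under a discrete HMM. It does not use HTK or reproduce its implementation; the function name, the two-state model and all probabilities are made up for illustration.

```python
def forward_likelihood(observations, initial, transition, emission):
    """Return P(observations | model) for a discrete HMM (forward algorithm).

    initial[s]        : probability of starting in state s
    transition[p][s]  : probability of moving from state p to state s
    emission[s][o]    : probability that state s emits symbol o
    """
    n_states = len(initial)
    # Initialisation: start in each state and emit the first symbol.
    alpha = [initial[s] * emission[s][observations[0]] for s in range(n_states)]
    # Induction: propagate through the transitions, then emit the next symbol.
    for symbol in observations[1:]:
        alpha = [
            sum(alpha[p] * transition[p][s] for p in range(n_states))
            * emission[s][symbol]
            for s in range(n_states)
        ]
    # Termination: sum over all states the sequence may end in.
    return sum(alpha)


# Hypothetical two-state, left-to-right model with two output symbols.
initial = [1.0, 0.0]
transition = [[0.5, 0.5], [0.0, 1.0]]
emission = [[0.9, 0.1], [0.2, 0.8]]

likelihood = forward_likelihood([0, 1], initial, transition, emission)
print(likelihood)
```

In an isolated-word recognizer, one such model would be kept per word, and the word whose model gives the highest likelihood would be chosen.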
3) Julius
Julius is a high-performance, two-pass large-vocabulary continuous speech recognition (LVCSR) engine released as open source and aimed at researchers and developers. Using 3-gram language models and context-dependent HMMs, it can perform real-time speech recognition on a current PC with a vocabulary of 60k words.
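As an illustration of what a 3-gram language model estimates, here is a minimal sketch that counts trigrams in a toy corpus and computes add-one smoothed probabilities of a word given the two preceding words. It is not Julius code and does not read Julius model files; the function names, the corpus and the choice of add-one smoothing are assumptions made for the example.

```python
from collections import Counter


def train_trigram_counts(sentences):
    """Count trigrams and their two-word histories over tokenized sentences."""
    trigram_counts = Counter()
    history_counts = Counter()
    for words in sentences:
        # Pad with start markers so the first words also have a history.
        padded = ["<s>", "<s>"] + list(words) + ["</s>"]
        for i in range(2, len(padded)):
            history = (padded[i - 2], padded[i - 1])
            trigram_counts[history + (padded[i],)] += 1
            history_counts[history] += 1
    return trigram_counts, history_counts


def trigram_probability(word, history, trigram_counts, history_counts, vocab_size):
    """Add-one smoothed estimate of P(word | history)."""
    numerator = trigram_counts[tuple(history) + (word,)] + 1
    denominator = history_counts[tuple(history)] + vocab_size
    return numerator / denominator


# Toy corpus (hypothetical voice commands).
corpus = [["turn", "on", "the", "light"], ["turn", "off", "the", "light"]]
vocab = {w for s in corpus for w in s} | {"</s>"}

tri, hist = train_trigram_counts(corpus)
p_seen = trigram_probability("light", ("on", "the"), tri, hist, len(vocab))
p_unseen = trigram_probability("turn", ("on", "the"), tri, hist, len(vocab))
print(p_seen, p_unseen)
```

A decoder combines such language model scores with acoustic scores from the HMMs, so that word sequences seen in training text are preferred over acoustically similar but unlikely ones.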
4) RWTH ASR
RWTH ASR is a toolkit containing implementations of state-of-the-art automatic speech recognition algorithms, developed by the Human Language Technology and Pattern Recognition Group at RWTH Aachen University. It covers acoustic model construction, a decoder and other essential parts, as well as components for speaker adaptation, speaker adaptive training, unsupervised training, discriminative training and word lattice processing.
5) Others
The open source toolkits described above are used mainly for speech recognition. Other open source speech recognition projects include Kaldi, Simon, iATROS-speech, SHoUT, Zanzibar OpenIVR, etc.