Beijing, September 15, 2015
Fujitsu R&D Center Co., Ltd. has developed a voiceprint recognition technology. The technology identifies a speaker by analyzing his or her voice and determines whether the speaker really is the claimed individual. More importantly, it can be used remotely over a telephone line, which makes call security management easy to implement.
Compared with traditional voiceprint recognition technologies, ours overcomes the influence of environmental and channel noise and extracts the speaker-related features from the voice signal, which greatly improves recognition performance and widens the application scope of the technology. The recognition error rate is below 3%, and the comparison time is less than 0.13 s. So far, together with Jiangsu Fujitsu Communication Technology Co., Ltd., we have completed verification studies on the use of the technology in family phone call management systems in prisons and in bank loan systems.
【Research Background】
Voiceprint recognition is one of the important branches of biometric recognition. Because of its unique advantage of remote operation, it can be widely used for identity verification in areas such as banking, stock exchanges, and identification card and credit card services, and it has become a very important measure for preventing fraud.
In criminal cases such as telephone scams and telephone blackmail, the evidence that is most easily obtained is recorded phone calls. With voiceprint recognition, investigators can obtain clues from these recordings, effectively narrow the scope of the investigation, and shorten the time required to solve a case.
In addition, with the widespread growth of businesses such as internet finance, live identity verification has become a very important step in remote bank account opening and other banking services. Combining a password with voiceprint recognition makes it possible to secure these services conveniently and effectively.
【Topic】
Traditional voiceprint technologies require the confidence value of every frame of speech data (generally about 20 ms) to be calculated on a Gaussian model, so recognition is inherently slow. And because a large number of Gaussian models are used, the voiceprint data are rather large (several hundred kilobytes). For these two reasons, traditional voiceprint technologies are not suited to large-scale voiceprint recognition scenarios.
Even in small-scale applications, traditional voiceprint technologies are still limited by factors such as environmental and channel noise. When the environment differs between registration and recognition (for example, a quiet office versus a noisy outdoor location), or when the voice acquisition equipment changes (for example, from a cell phone to a microphone), recognition performance declines, so the reliability and applicability of voiceprint recognition depend on the environment and the equipment.
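To make the cost of the traditional approach concrete, the following minimal sketch scores an utterance frame by frame against a Gaussian mixture model. It assumes diagonal covariances and averages the per-frame log-likelihood; the function names and the toy model values are hypothetical and are not taken from any Fujitsu system. The nested iteration over frames and mixture components is what makes this style of scoring slow at scale.

```python
import math

def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def frame_log_likelihood(x, weights, means, variances):
    """Log-likelihood of one frame under the mixture, via log-sum-exp."""
    terms = [
        math.log(w) + log_gaussian_diag(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))

def utterance_score(frames, weights, means, variances):
    """Average per-frame log-likelihood; work grows as frames x components."""
    total = sum(frame_log_likelihood(x, weights, means, variances) for x in frames)
    return total / len(frames)

# Toy two-component model over 2-dimensional features (made-up values).
weights = [0.5, 0.5]
means = [[0.0, 0.0], [3.0, 3.0]]
variances = [[1.0, 1.0], [1.0, 1.0]]

near_frames = [[0.1, -0.2], [2.9, 3.1]]   # close to the model's components
far_frames = [[10.0, 10.0], [-8.0, 9.0]]  # far from both components

score_near = utterance_score(near_frames, weights, means, variances)
score_far = utterance_score(far_frames, weights, means, variances)
```

In this toy setup `score_near` is higher than `score_far`. A real system would use far more components and thousands of frames per call, which is where the per-frame cost adds up.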
【Developed Method】
As shown in Figure 1, three main modules are used to build a voiceprint model: the universal background model (UBM), model self-adaptation, and feature space decomposition.
The purpose of the universal background model and model self-adaptation modules is to solve the problem of insufficient speech data from an individual speaker. First, a huge amount of speech data is used to train a Gaussian mixture model, i.e. the universal background model. Then the UBM is adapted to the speech data of each individual speaker to obtain the super-vector feature corresponding to that speaker.
Figure 1. Voice-print Modeling Schematics
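The adaptation step can be illustrated with a small sketch. It assumes relevance-MAP adaptation of the UBM means, a common way to adapt a background model to a speaker. The text does not state which adaptation rule is actually used, so this is an assumption, as are the function names, the relevance factor of 16, and the toy data. The adapted means are concatenated into a single super-vector.

```python
import math

def posteriors(x, weights, means, variances):
    """Responsibility of each UBM component for frame x (diagonal covariances)."""
    logs = []
    for w, m, v in zip(weights, means, variances):
        ll = math.log(w)
        for xi, mi, vi in zip(x, m, v):
            ll += -0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
        logs.append(ll)
    top = max(logs)
    exps = [math.exp(l - top) for l in logs]
    total = sum(exps)
    return [e / total for e in exps]

def map_adapt_means(frames, weights, means, variances, relevance=16.0):
    """Shift each UBM mean toward the speaker's data (assumed relevance-MAP rule)."""
    dim = len(means[0])
    counts = [0.0] * len(means)
    sums = [[0.0] * dim for _ in means]
    for x in frames:
        for c, gamma in enumerate(posteriors(x, weights, means, variances)):
            counts[c] += gamma
            for d in range(dim):
                sums[c][d] += gamma * x[d]
    adapted = []
    for c, mean in enumerate(means):
        # Components that saw little speaker data stay close to the UBM mean.
        alpha = counts[c] / (counts[c] + relevance)
        data_mean = [s / counts[c] for s in sums[c]] if counts[c] > 0.0 else mean
        adapted.append([alpha * e + (1.0 - alpha) * m for e, m in zip(data_mean, mean)])
    return adapted

def supervector(adapted_means):
    """Concatenate the adapted component means into one long vector."""
    return [value for mean in adapted_means for value in mean]

# Toy UBM with two components; the speaker's frames sit near the first one.
ubm_weights = [0.5, 0.5]
ubm_means = [[0.0, 0.0], [4.0, 4.0]]
ubm_vars = [[1.0, 1.0], [1.0, 1.0]]
speaker_frames = [[1.0, 1.0]] * 20

sv = supervector(map_adapt_means(speaker_frames, ubm_weights, ubm_means, ubm_vars))
```

Here the first component's mean moves part of the way from 0 toward 1, while the second component, which explains almost none of the speaker's frames, stays essentially at the UBM value. This shows how the background model fills in for limited speaker data.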
The purpose of feature space decomposition is to eliminate the influence of environmental and channel noise and to generate a description specific to the speaker, as shown in Figures 2 and 3. After feature space decomposition, the speech feature is split into three parts: a speaker-independent component, a speaker-dependent component, and a channel-dependent component. Only the speaker-dependent component is used in voiceprint comparison, which reduces the influence of noise and preserves recognition precision.
Figure 2. Decomposition and calculation of the feature space
Figure 3. Schematic diagram of feature space decomposition
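The effect of comparing only the speaker-dependent component can be shown with a toy example. The sketch assumes a linear additive model, s = m + V·y + U·x, where m is the speaker-independent mean, V spans the speaker subspace, and U spans the channel subspace. It further assumes that both bases are known, orthonormal, and mutually orthogonal, so that a plain projection recovers the factors. None of these details are given in the text, and practical systems estimate the subspaces from training data. All names and numbers are hypothetical.

```python
import math

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def synthesize(ubm_mean, speaker_basis, channel_basis, y, x):
    """Build a toy super-vector s = m + V y + U x."""
    out = list(ubm_mean)
    for coef, basis in list(zip(y, speaker_basis)) + list(zip(x, channel_basis)):
        out = [o + coef * b for o, b in zip(out, basis)]
    return out

def decompose(supervec, ubm_mean, speaker_basis, channel_basis):
    """Project the centered super-vector onto the speaker and channel bases.

    Valid only under the stated assumption of orthonormal, mutually
    orthogonal basis vectors.
    """
    centered = [s - m for s, m in zip(supervec, ubm_mean)]
    speaker = [dot(centered, v) for v in speaker_basis]
    channel = [dot(centered, u) for u in channel_basis]
    return speaker, channel

m = [1.0, 1.0, 1.0, 1.0]                             # speaker-independent part
V = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]     # speaker subspace
U = [[0.0, 0.0, 1.0, 0.0]]                           # channel subspace

enrolled = synthesize(m, V, U, y=[2.0, 1.0], x=[0.5])        # e.g. office phone
same_speaker = synthesize(m, V, U, y=[2.0, 1.0], x=[-3.0])   # different channel
impostor = synthesize(m, V, U, y=[-1.0, 2.0], x=[0.5])       # same channel

enrolled_y, _ = decompose(enrolled, m, V, U)
same_y, _ = decompose(same_speaker, m, V, U)
impostor_y, _ = decompose(impostor, m, V, U)

same_score = cosine(enrolled_y, same_y)          # speaker component only
impostor_score = cosine(enrolled_y, impostor_y)
raw_same_score = cosine(
    [s - c for s, c in zip(enrolled, m)],
    [s - c for s, c in zip(same_speaker, m)],
)                                                # channel part still included
```

In this toy setup the same speaker scores higher on the speaker-only comparison than on the raw comparison, because the channel change no longer affects the score, while the impostor stays clearly separated.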
【Results】
Together with Jiangsu Fujitsu Communication Technology Co., Ltd., we have completed verification studies on the use of the voiceprint recognition technology in areas such as family phone call management in prisons and consumer credit loans at banks.
- Family phone call management in prisons:
  - Voiceprint recognition is applied to effectively prevent inmates from contacting criminals outside the prison through family phone calls, which greatly reduces the burden on prison officers of monitoring family phone calls and improves the safety of prison management.
- Consumer credit loans at a bank:
  - To prevent anyone from impersonating someone else to apply for multiple loans through the bank's online loan service, a bank adopted our voiceprint recognition technology in its phone verification step to ensure the uniqueness of each applicant and avoid fraudulent loans.
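The uniqueness check in the loan scenario can be pictured as a one-to-many comparison: a new caller's voiceprint is scored against every voiceprint already on file, and any score above a threshold flags a possible repeat applicant. The sketch below is only an illustration. The cosine score, the 0.8 threshold, the applicant IDs, and the three-dimensional voiceprints are all hypothetical and do not describe the bank's actual system.

```python
import math

def cosine(a, b):
    """Cosine similarity between two voiceprint vectors."""
    num = sum(p * q for p, q in zip(a, b))
    den = math.sqrt(sum(p * p for p in a)) * math.sqrt(sum(q * q for q in b))
    return num / den

def find_duplicates(new_voiceprint, enrolled, threshold=0.8):
    """Return IDs of enrolled applicants whose voiceprint matches the new caller."""
    return [
        applicant_id
        for applicant_id, voiceprint in enrolled.items()
        if cosine(new_voiceprint, voiceprint) >= threshold
    ]

# Hypothetical enrolled voiceprints (speaker-dependent components only).
enrolled = {
    "applicant-001": [0.9, 0.1, 0.2],
    "applicant-002": [-0.3, 0.8, 0.5],
}

repeat_caller = [0.88, 0.12, 0.25]   # close to applicant-001
new_caller = [0.1, -0.7, 0.7]        # unlike anyone on file

repeat_matches = find_duplicates(repeat_caller, enrolled)
new_matches = find_duplicates(new_caller, enrolled)
```

With these toy values the repeat caller is matched to `applicant-001` and the new caller matches no one. In practice the threshold would be tuned to balance false accepts against false rejects.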
【Future Plan】
Fujitsu R&D Center Co., Ltd. will work with its partners to commercialize the voiceprint recognition technology following the verification studies, and will continue to improve it based on customer feedback.