Speech synthesis technology is widely used in real-time information services such as call centers, car navigation, spoken web pages, teaching assistance, and services for visually impaired users. User experience and service quality depend largely on the correctness, fluency, and naturalness of the synthesized speech. By studying key technologies such as rhythm modeling and the processing of polyphones, digits, and symbols, we develop a Text-To-Speech (TTS) system that generates high-quality, natural speech.
1. Highly natural rhythm
A natural speech synthesis system should match human speech as closely as possible in pauses, syllable duration, and tone. To this end, we analyze several factors related to rhythm and build a statistical model that predicts the duration of each syllable. In addition, a diverse set of tone templates is used to characterize Chinese rhythm and help ensure the naturalness of the synthetic speech.
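As a rough illustration of statistical duration prediction, the sketch below averages observed syllable durations per context and backs off to coarser statistics for unseen contexts. It is not the model used in this system: the two features (tone and position in the phrase), the function names, and the toy durations are all assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean


def train_duration_model(samples):
    """Build mean-duration tables from (tone, position, duration_ms) samples.

    The feature set (tone, position) is a hypothetical choice for this sketch.
    """
    by_context = defaultdict(list)
    by_tone = defaultdict(list)
    all_durations = []
    for tone, position, duration in samples:
        by_context[(tone, position)].append(duration)
        by_tone[tone].append(duration)
        all_durations.append(duration)
    return {
        "context": {key: mean(vals) for key, vals in by_context.items()},
        "tone": {key: mean(vals) for key, vals in by_tone.items()},
        "global": mean(all_durations),
    }


def predict_duration(model, tone, position):
    """Predict a duration, backing off from full context to tone to global mean."""
    if (tone, position) in model["context"]:
        return model["context"][(tone, position)]
    if tone in model["tone"]:
        return model["tone"][tone]
    return model["global"]


# Toy training data (invented values, in milliseconds).
samples = [
    (1, "initial", 180),
    (1, "initial", 200),
    (1, "final", 260),
    (4, "medial", 150),
    (4, "final", 230),
]
model = train_duration_model(samples)
```

With this toy data, a seen context such as tone 1 in phrase-initial position returns its own average, while an unseen combination falls back to the tone average and then to the global average. A practical model would use many more factors and a smoother estimator than raw means.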
2. Robust processing of polyphones, digits, and special symbols
Correct pronunciation of polyphones, digits, and special symbols is essential for the synthetic speech to be easily understood. To ensure that these items are pronounced correctly, we combine rules with statistical learning: the contexts in which polyphones, digits, and special symbols occur are analyzed in a large speech corpus, and a prediction model is built for each character to determine its pronunciation.
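One way to combine rules with corpus statistics for a polyphone can be sketched as follows. This is a minimal illustration under stated assumptions, not the system's actual model: the rule table, the choice of the preceding character as the only context feature, the function names, and the tiny labelled corpus are all hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical rule table: words that fix the reading of a polyphonic character.
RULES = {"行": {"银行": "hang2", "行走": "xing2"}}


def train_context_counts(corpus):
    """Count readings per (char, previous char) and per char alone.

    corpus: iterable of (char, previous_char, pinyin) labelled examples.
    """
    counts = defaultdict(Counter)
    for char, prev, pinyin in corpus:
        counts[(char, prev)][pinyin] += 1
        counts[(char, None)][pinyin] += 1  # context-free back-off statistics
    return counts


def predict_pronunciation(text, index, counts):
    """Apply word rules first, then fall back to the most frequent reading."""
    char = text[index]
    for word, pinyin in RULES.get(char, {}).items():
        start = index - word.index(char)
        if start >= 0 and text[start:start + len(word)] == word:
            return pinyin
    prev = text[index - 1] if index > 0 else ""
    for key in ((char, prev), (char, None)):
        readings = counts.get(key)
        if readings:
            return readings.most_common(1)[0][0]
    return None  # character never seen in the rules or the corpus


# Toy labelled corpus (invented for this example).
counts = train_context_counts([
    ("行", "旅", "xing2"),
    ("行", "进", "xing2"),
    ("行", "排", "hang2"),
])
```

Here the rule table decides 银行 directly, the context counts decide 旅行, and an unseen context such as 可行 falls back to the character's most frequent reading in the corpus. Digits and special symbols could be handled along the same lines, with rules for clear-cut patterns and statistics for ambiguous ones.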
Yu Hao: firstname.lastname@example.org
Liu Rujie: email@example.com