According to the researchers responsible for the project, the system was trained on a data set of about 24 hours of speech, recorded by a professional speaker in American English. By using so-called mel spectrograms as an intermediate representation, Tacotron 2 achieves particularly natural-sounding voice output, since these spectrograms allow pitch to be mapped more finely.
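A mel spectrogram groups audio frequencies into bands spaced evenly on the mel scale, which approximates how human hearing perceives pitch. The sketch below assumes the common HTK-style formula m = 2595 · log10(1 + f/700); the article does not say which variant Tacotron 2 uses. The function names and the band settings (80 bands between 125 Hz and 7.6 kHz) are illustrative choices, not taken from Google's code.

```python
import math

def hz_to_mel(f_hz: float) -> float:
    """Convert a frequency in Hz to mels (HTK-style formula, assumed)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m: float) -> float:
    """Inverse of hz_to_mel: convert mels back to Hz."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_band_centres(n_bands: int, f_min: float, f_max: float) -> list[float]:
    """Centre frequencies (Hz) of n_bands filters spaced evenly on the mel scale."""
    m_min, m_max = hz_to_mel(f_min), hz_to_mel(f_max)
    # Divide the mel range into n_bands + 1 steps so every centre lies strictly inside it.
    step = (m_max - m_min) / (n_bands + 1)
    return [mel_to_hz(m_min + step * (i + 1)) for i in range(n_bands)]

if __name__ == "__main__":
    # Illustrative settings: 80 bands between 125 Hz and 7.6 kHz.
    centres = mel_band_centres(80, 125.0, 7600.0)
    low_gap = centres[1] - centres[0]
    high_gap = centres[-1] - centres[-2]
    print(f"gap between lowest bands:  {low_gap:.1f} Hz")
    print(f"gap between highest bands: {high_gap:.1f} Hz")
```

The gaps grow with frequency: low frequencies, where the fundamental pitch of speech lies, receive many narrow bands, while high frequencies share fewer, wider ones.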
To evaluate the quality of the system, 100 randomly selected text passages were synthesized as audio files, which human listeners then rated on a scale of 1 to 5. The resulting mean opinion score (MOS) for the AI system was an extremely good 4.525. Recordings of real human speech scored only insignificantly higher, at 4.58.
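A mean opinion score is simply the arithmetic mean of all listener ratings. The following minimal sketch computes it together with an approximate 95% confidence interval (normal approximation, 1.96 standard errors); the function name and the sample ratings are hypothetical and are not the study's data.

```python
import math
import statistics

def mean_opinion_score(ratings):
    """Return (MOS, half-width of an approximate 95% confidence interval)."""
    if not ratings:
        raise ValueError("need at least one rating")
    for r in ratings:
        if not 1 <= r <= 5:
            raise ValueError(f"rating {r} outside the 1-5 scale")
    mos = statistics.fmean(ratings)
    if len(ratings) < 2:
        # A single rating gives no estimate of spread.
        return mos, float("inf")
    # Standard error of the mean, scaled by 1.96 for a ~95% interval.
    sem = statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mos, 1.96 * sem

if __name__ == "__main__":
    synthetic = [5, 4, 5, 5, 4, 5, 3, 5]  # hypothetical ratings for one system
    mos, ci = mean_opinion_score(synthetic)
    print(f"MOS = {mos:.3f} ± {ci:.3f}")
```

Reporting the interval alongside the mean shows whether a small gap between two scores, such as the one between the AI system and human recordings, is larger than the rating noise.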
If you want to hear Google's new speech output for yourself, you can do so on a demo page. There, the researchers have uploaded a series of audio files for text snippets the system had not seen before. The quality of the speech output is remarkable and virtually indistinguishable from natural human pronunciation. Tacotron 2 even copes with typos and can interpret individual words in the context of the whole sentence so that the emphasis fits.
Even though the AI system is still only basic research, its near-perfect results suggest it will not be long before Google integrates the technology into Google Assistant and other products. Other IT companies, such as Google's Chinese counterpart Baidu, are already working on similar systems. As early as March of this year, Baidu engineers announced a breakthrough in their speech synthesis system.