CHINA TOPIX

12/22/2024 04:44:31 pm


Baidu Unveils Deep Voice that can Synthesize Human Speech Quickly


(Photo: Getty Images) Baidu Deep Voice can synthesize speech that will sound natural and realistic by itself.

Baidu has been developing its own AI system for four years and recently unveiled Deep Voice, a system that is faster and more efficient than Google's WaveNet.

According to the company, often called the Google of China, Deep Voice can be trained to speak in just a few hours with little to no human interaction. The company can control how it speaks to convey different emotions, resulting in synthesized speech that sounds natural and realistic.


Baidu’s team of researchers at the Chinese giant’s AI Lab in Silicon Valley said that Deep Voice may require some initial human fine-tuning during the training period, but eventually it can synthesize speech that will sound natural and realistic by itself.

The system first separates the text into graphemes, the smallest units of writing. It then translates them into phonemes, the smallest units of speech, and renders that information as sound. Each step is handled by machine-learning algorithms, which must run at an incredible rate for the output to sound realistic.
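As a rough illustration of the grapheme and phoneme stages described above, here is a minimal Python sketch. It is not Baidu's method: Deep Voice learns these mappings with machine-learning models, whereas this toy uses a tiny hand-written lookup table. The names `TOY_LEXICON`, `to_graphemes` and `to_phonemes` are hypothetical, and the phoneme symbols follow the common ARPAbet convention.

```python
# Hypothetical two-word lexicon; a real system learns this mapping from data.
TOY_LEXICON = {
    "deep": ["D", "IY", "P"],
    "voice": ["V", "OY", "S"],
}


def to_graphemes(text):
    """Split text into words, then each word into its written characters."""
    return [list(word) for word in text.lower().split()]


def to_phonemes(text):
    """Look up each word's phonemes; unknown words become a placeholder."""
    phonemes = []
    for word in text.lower().split():
        phonemes.extend(TOY_LEXICON.get(word, ["<unk>"]))
    return phonemes


print(to_graphemes("Deep Voice"))
print(to_phonemes("Deep Voice"))
```

The final stage, turning phonemes into audio, is the computationally expensive part and is left out of this sketch.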

Moreover, the researchers explained that although they have improved on an existing system, it still requires too much computational power. Realistic human speech synthesis requires a sampling rate in the region of 48 kHz, which leaves roughly 20 microseconds to generate each audio sample. The company has already tested the model and reports a ‘high quality’ result according to crowdsourced ratings.
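The 20-microsecond figure is consistent with the time available per audio sample at a 48 kHz sampling rate. The short Python sketch below works through that arithmetic; the function name `per_sample_budget_us` is an illustrative assumption, not something from Baidu's work.

```python
def per_sample_budget_us(sample_rate_hz):
    """Time available to produce one audio sample in real time, in microseconds."""
    return 1_000_000 / sample_rate_hz


budget_48k = per_sample_budget_us(48_000)
print(f"{budget_48k:.1f} microseconds per sample at 48 kHz")
```

At 48,000 samples per second, each sample must be ready in about 20.8 microseconds for the audio to keep up with real time.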

“To perform inference at real-time, we must take great care to never recompute any results, store the entire model in the processor cache (as opposed to main memory), and optimally utilize the available computational units.

We optimize inference to faster-than-real-time speeds, showing that these techniques can be applied to generate audio in real-time in a streaming fashion,” Baidu's AI researchers said in a statement.

Baidu’s AI researchers believe that real-time speech synthesis is possible. They uploaded audio samples to Mechanical Turk, Amazon's crowdsourcing site, to ask a large number of people to rate their quality.
