Microsoft presents speech recognition breakthrough

Have you ever been to Interspeech, the annual Conference of the International Speech Communication Association, held in beautiful Florence, Italy?

And you call yourself a hardcore speech technology nut?! Tell me if these PDFs make sense to you.

Well, it’s good to know that while some companies are busy buying up smaller competitors, there are brainiacs all over the world who genuinely geek out over speech recognition. Let’s thank Microsoft for pouring billions of dollars into Microsoft Research (yes, it has R&D labs around the globe) so that one day we may face a Terminator that won’t mix up “Go fetch me a beer!” with “Gopher meets a deer!” and blow us to pieces because it thinks we are too dumb.

Microsoft researchers are improving large vocabulary speech recognition by enhancing neural network models of “senones” (so cutting edge that even Wikipedia doesn’t offer an explanation):

Earlier work on DNNs had used phonemes. The research took a leap forward when Yu, after discussions with principal researcher Li Deng and Alex Acero, principal researcher and manager of the Speech group, proposed modeling the thousands of senones (acoustic-model building blocks much smaller than phonemes) directly with DNNs. The resulting paper, Context-Dependent Pre-trained Deep Neural Networks for Large Vocabulary Speech Recognition, by Dahl, Yu, Deng, and Acero, describes the first hybrid context-dependent DNN-HMM (CD-DNN-HMM) model applied successfully to large-vocabulary speech-recognition problems.
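
For the curious, here is a minimal sketch of the usual "hybrid" trick that lets a neural network plug into an HMM decoder, assuming the standard NN-HMM recipe rather than anything specific to Microsoft's code. The network's softmax layer outputs a posterior p(senone | frame); dividing by the senone's prior p(senone) gives a score proportional to p(frame | senone), which is what the HMM wants. The DNN itself is omitted, and the logits, the four-senone setup, and the function names are hypothetical, made up purely for illustration.

```python
import math


def softmax(logits):
    """Turn raw output-layer activations into senone posteriors p(s|x)."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_log_likelihoods(logits, senone_priors):
    """Hybrid NN-HMM conversion: log p(x|s) = log p(s|x) - log p(s) + const.

    The constant (log p(x)) is identical for every senone in a given frame,
    so the decoder can drop it when comparing competing HMM paths.
    """
    posteriors = softmax(logits)
    return [math.log(p) - math.log(prior)
            for p, prior in zip(posteriors, senone_priors)]


# Hypothetical single acoustic frame scored against 4 senones.
# These numbers are invented for illustration, not taken from the paper.
frame_logits = [2.0, 1.0, 0.1, -1.0]
priors = [0.4, 0.3, 0.2, 0.1]  # e.g. senone frequencies in aligned training data

emission_scores = scaled_log_likelihoods(frame_logits, priors)
print([round(s, 3) for s in emission_scores])
```

Dividing by the prior keeps frequent senones (think silence) from dominating simply because they show up a lot in the training data.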

“Others have tried context-dependent ANN models,” Yu observes, “using different architectural approaches that did not perform as well. It was an amazing moment when we suddenly saw a big jump in accuracy when working on voice-based Internet search. We realized that by modeling senones directly using DNNs, we had managed to outperform state-of-the-art conventional CD-GMM-HMM large-vocabulary speech-recognition systems by a relative error reduction of more than 16 percent. This is extremely significant when you consider that speech recognition has been an active research area for more than five decades.”
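
Note that "relative error reduction" is not the same as a drop in percentage points. It measures what fraction of the baseline system's errors went away. The quick sketch below uses invented error rates (not figures from the paper) to show the arithmetic; the function name is hypothetical.

```python
def relative_error_reduction(baseline_error, new_error):
    """Fraction of the baseline system's errors eliminated by the new system."""
    if baseline_error <= 0:
        raise ValueError("baseline error rate must be positive")
    return (baseline_error - new_error) / baseline_error


# Hypothetical error rates in percent, NOT results from the paper.
baseline = 25.0  # e.g. a conventional CD-GMM-HMM system
improved = 21.0  # e.g. a CD-DNN-HMM system

absolute_drop = baseline - improved                            # 4.0 percentage points
relative_drop = relative_error_reduction(baseline, improved)   # about 0.16
print(f"absolute: {absolute_drop:.1f} points, relative: {relative_drop:.0%}")
```

So in this made-up case, a system going from 25 percent to 21 percent errors would be a 16 percent relative reduction, even though the error rate fell by only four points.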

Kudos, Microsoft. We cannot wait to see this breakthrough commercialized because, Lord knows, you could use some help.

(h/t Jason Hersh)
