Smartphone speech recognition software gets a bad rap. Most users find the nascent technology to be frustratingly slow, and there are entire blogs dedicated to documenting examples of its biggest – and sometimes hilarious – mistakes.
But results from a new experiment suggest a different reality: Speech recognition can be used to compose text messages faster and more accurately than humans can type on mobile phone screens.
“Speech recognition is something that’s been promised to us for decades, but it has never worked very well,” said James Landay, a professor of computer science at Stanford and co-author of the new study. “But we were noticing that in the past two to three years, speech recognition was actually improving a lot, benefiting from big data and deep learning to train its neural networks to produce faster, more accurate results. So we decided to formally test it against humans.”
The research team, which included computer scientists from Stanford, Baidu Inc. and the University of Washington, devised an experiment that pitted Baidu’s Deep Speech 2 cloud-based speech recognition software against 32 texters, ages 19 to 32, working the built-in keyboard on an Apple iPhone.
“They grew up texting, so we’re putting speech recognition up against people who are really good at this task,” Landay said.
The subjects took turns typing or speaking about 100 phrases sourced from a standard library of everyday phrases used in text-based research – phrases such as “physics and chemistry are hard,” “have a good weekend” and “go out for some pizza and beer” – while the testing app recorded their times and accuracy rates. Half the subjects performed the task in English using the QWERTY keyboard; the other half conducted the test in their native Mandarin using iOS’ Pinyin keyboard.
The results were clear no matter the language. For English, speech recognition was three times faster than typing, and the error rate was 20.4 percent lower. In Mandarin Chinese, speech was 2.8 times faster, with an error rate 63.4 percent lower than typing.
“We knew speech recognition is pretty good, so we expected it to be faster, but we were actually quite surprised to find that it was almost three times faster than typing on a keyboard,” said co-author Sherry Ruan, a computer science PhD student at Stanford who helped run the experiments.
Although the researchers used Baidu’s speech recognition software, they suspect that other high-accuracy speech engines perform at a similar level. Now that the team has quantified how well speech recognition works, the researchers hope the results will encourage engineers to design user interfaces that take better advantage of the technology.
“We should put speech in more applications than just typing an email or text message,” Landay said. “You could imagine an interface where you use speech to start and then it switches to a graphical interface that you can touch and control with your finger.”
The study, titled “Speech Is 3x Faster than Typing for English and Mandarin Text Entry on Mobile Devices,” is published online at arxiv.org. Co-authors included Jacob Wobbrock of the University of Washington and Kenny Liou and Andrew Ng of Baidu; Ng is also an adjunct professor of computer science at Stanford. More information can be found at http://hci.stanford.edu/research/.
Media Contacts
Bjorn Carey, Stanford News Service: (650) 725-1944, bccarey@stanford.edu