English is a rich and wonderfully varied language, with many different accents spoken around the world. Imagine three people – one from Liverpool, one from Boston, and one from the Deep South – having a conversation. Technically, all three would be speaking English, but each would likely have some difficulty understanding the others because of their accents.
Now imagine an automatic speech recognition (ASR) device trying to capture the conversation. Given the current limitations of ASR technology, chances are the resulting transcript would be barely intelligible at best.
Ideator Adam Crabtree hopes to change that by making ASR easier to use and more appreciative of the myriad nuances of the English language. His new app, Phonoscape, is designed to help ASR engines in mobile devices respond to the accents of English-speaking users with greater accuracy.
He also aims to make ASR a lot more interesting to use.
“For those willing to lend their voices, ASR will be a lot more fun and engaging,” said Crabtree. “We plan to make voice-input tasks quick, mobile, and user-friendly, thereby fostering the growth of a network of English-speaking contributors who can socially connect with one another while celebrating their unique accents.”
What’s the Problem?
Currently, there are two approaches to capturing dialect with automatic speech recognition. Major industry players like Google, Apple, and Microsoft create sophisticated algorithms to capture accents as accurately as possible. In contrast, smaller non-profit organizations have speakers read text into highly sensitive microphones in hopes of better capturing the nuances of dialect. By themselves, neither presents an ideal solution.
What’s the Big Idea?
Phonoscape hopes to produce more accurate results by combining the algorithmic model with the database model. Using a step-by-step human intelligence task (HIT) model combined with mobile technologies, Phonoscape makes accented data accrual quick, mobile, and user-friendly while allowing communications devices to recognize and process different varieties of English exactly the way they are spoken.
Users speak a text sequence generated by the Phonoscape app into their mobile device. Other users receive the recording and type exactly what they hear into the app. Phonoscape takes the original text sequence, the typed interpretations, and the audio file of the reader's voice, and compiles a database showing how well all three match up. As the database grows, users can access a library of speech sequences in a way that allows them to socially connect with those who have voiced the sequences aloud.
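To make the matching step concrete, here is a minimal illustrative sketch (not Phonoscape's actual code) of how a prompt, its audio file, and crowd-typed transcriptions might be bundled into one database record, with a simple similarity score showing how well each transcription matches the original text. The function and field names are assumptions for illustration only.

```python
# Hypothetical sketch of the Phonoscape-style matching step:
# score each typed transcription against the original prompt text.
from difflib import SequenceMatcher


def transcription_score(prompt: str, typed: str) -> float:
    """Return a 0-1 similarity between the prompt and a typed transcription."""
    def normalize(s: str) -> str:
        return " ".join(s.lower().split())
    return SequenceMatcher(None, normalize(prompt), normalize(typed)).ratio()


def compile_record(prompt: str, audio_file: str, transcriptions: list) -> dict:
    """Bundle one prompt, its audio file, and scored transcriptions."""
    return {
        "prompt": prompt,
        "audio": audio_file,  # path to the reader's voice recording
        "transcriptions": [
            {"text": t, "score": round(transcription_score(prompt, t), 3)}
            for t in transcriptions
        ],
    }


record = compile_record(
    "The quick brown fox",
    "reader_042.wav",  # hypothetical filename
    ["the quick brown fox", "the quick brown fix"],
)
```

A low average score for a given prompt would suggest the reader's accent is hard for listeners (and, likely, for ASR engines) to parse, which is exactly the signal a training database would want to capture.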
Where Is It Headed?
Crabtree sees the Phonoscape platform as an aid to the ASR learning process while also nurturing a global community centered on the celebration of variant speech. He hopes to position the app as a major facilitator in the adaptation of accented speech processing, so that consumer diversity better aligns with ASR product supply trends.
Crabtree also plans to use Ideator to obtain critiques of Phonoscape, assist with the validation process, and make contacts that can help support his business development and fundraising efforts.
“I’m thankful I’ve found a community like Ideator that encourages collaboration around new ideas,” he said. “I’m currently looking for potential co-founders, as well as assistance with survey design, focus groups, and other product validation processes. Then, hopefully, it’s on to a working prototype!”
If you’re interested in joining Crabtree on this idea, log on to Ideator to learn more!