Interview with Yoon Kim: Speech recognition technology for mobile search
Below is a copy of a recent interview with Yoon Kim carried out by Bill Meisel. The original article, entitled "Interview with Yoon Kim, Novauris: Speech recognition technology for mobile search", can be found in the September 2007 issue of Speech Strategy News (formerly Speech Recognition Update).
Prior to joining Novauris, Yoon co-founded NeoSpeech Inc., a speech technology startup in Silicon Valley. He was with the Speech Technology and Research (STAR) Laboratory at SRI International and the Center for Computer Research in Music and Acoustics (CCRMA) at Stanford University. He has more than a decade of industry and research experience on many aspects of speech technology and solutions for embedded, network, and distributed environments. Yoon holds a Ph.D. in electrical engineering from Stanford University.
Novauris has a particular focus in providing speech recognition technology (SSN, September 2006, p. 24). What is that focus, how does it distinguish you from other speech technology vendors, and how has it evolved?
The central focus of Novauris technology is voice mobile search, i.e., accessing information with a mobile device using speech. We are probably unique in having this clear focus, and we like to think that we can do voice mobile search better than anyone else.
As for how our technology has evolved, you may remember that we first demonstrated the recognition of any name and address from a list of 245 million possibilities. We got stunning speed and accuracy on this application by exploiting in novel ways the regular structure of names and addresses and the redundancy present in each complete item.
Since then, we have extended our technology to cope with much shorter inputs, such as the names of recording artists or titles of music tracks. Such items generally lack any obvious structure or redundancy, yet we can still offer access to over a million of them. It has even turned out that our approach can be successfully applied to allow virtually free-form entry to a speech-to-speech interpreter.
Also, we initially offered technology only for server-based or PC-based applications. More recently, we have extended our technology to applications on hand-held devices using ARM/XScale processors.
Finally, our technology was initially confined to English but we are steadily extending it to other languages. We added Korean last year and are now adding Japanese.
What is your strategy for addressing market opportunities and partnerships?
As is the case for most emerging technologies, speech technology is not enough in itself to create a market with sustained growth. The value of speech technology lies in its contribution to the overall product or service that includes it.
Our strategy is to create and exploit market opportunities by first studying the potential impact of a product or service in a market segment, selecting a small number of key players as partners, and working closely with each partner to develop an outstanding application.
This approach of application-specific technology and product development through close partnership—rather than offering complete applications or services ourselves—reflects our intention to avoid competing with our partners and customers. Our preferred business model is revenue sharing, but we are open to alternatives.
Please outline how your technology supports the Verizon Wireless Get It Now search application (SSN, May 2007, p. 6), and how the public has accepted that application.
The Get It Now Search application lets Verizon Wireless subscribers specify and download musical items, games, etc., from a large Verizon catalog either by speaking the item that they want or by keying it in. The application is run by Medio Systems, Inc. They host the service and return the requested items to the user. They also handle the search when the request is keyed in. When the request is spoken, it is passed to a server running our NovaSearch software, which returns to Medio’s server a list of matches to what was spoken, together with the associated probabilities and sometimes some comments on the signal characteristics.
I think we can say that the application has been well accepted by the public for the following reasons: there are now millions of users who regularly make search requests; and the number of users and search requests is growing, as is the proportion of callers who choose voice as their input mode. In addition, I’m very pleased to tell you that a new version of Get It Now Search has been launched recently (Aug 22, 2007). This version now supports voice search and downloading of full MP3 tracks in the entire VCAST Music catalog (about 1.5M items), in addition to the existing Get It Now content.
How do you see your market opportunity expanding?
We expect to see more applications similar to Get It Now Search that feature multimodal search of digital content, such as music, video, and games. Aside from these entertainment-related applications, we see the major growth areas being related to location: that is, spoken input to navigation aids and to services allowing the user to search for information on items such as nearby restaurants, supermarkets, ATMs, gas stations, medical facilities, etc.
The server-based market opportunity will expand as the availability of 3G/4G networks increases, together with the penetration of phones capable of using those networks. The embedded or client-only market will expand as mobile devices are equipped with increasingly powerful processors and increasing memory and storage capacity.
How will your technology evolve to support these opportunities?
One of the pleasures of our having adopted a novel approach to speech recognition is that we are still in a period of making unusually rapid progress in the performance of our technology. For instance, a breakthrough a few months ago has shown how our already fast search process can be made several times faster while needing only a fraction of the current memory requirement. Although initially confined to our simplest applications, the new technique is gradually being extended for use in all our applications.
We expect such evolution to continue, allowing us to do more and more with given hardware. We expect to see a larger proportion of our software running on the client side or in standalone applications. The amount of accessible mobile content and information (either stored locally on the device or over the network) is set to grow considerably over the next year or two. We believe that our special strength in enabling users to select efficiently from very large numbers of items with single voice inputs will become increasingly apparent.
Any final comments?
When you interviewed my colleague, Melvyn Hunt, a year ago, he expressed his excitement at seeing the emergence of worthwhile, widely used applications of voice mobile search. I fully share that excitement. I’m convinced that voice mobile search applications are going to play an important part in our daily lives.
Also, I would like to express my personal joy and excitement in being associated with a tight-knit team of exceptionally bright individuals who are determined to bring value to our partners and to the marketplace.