Interview with Yoon Kim: Speech recognition technology for mobile search
Below is a copy of a recent interview with Yoon Kim carried out by Bill Meisel. The original article, entitled "Interview with Yoon Kim, Novauris: Speech recognition technology for mobile search", can be found in the September 2007 issue of Speech Strategy News (formerly Speech Recognition Update).
Prior to joining Novauris, Yoon co-founded NeoSpeech Inc., a
speech technology startup in Silicon Valley. He was with the
Speech Technology and Research (STAR) Laboratory at SRI
International, and the Center for Computer Research in Music and
Acoustics (CCRMA) at Stanford University. He has more than
a decade of industry and research experience on many aspects of
speech technology and solutions for embedded, network, and
distributed environments. Yoon holds a Ph.D. in electrical
engineering from Stanford University.
Novauris has a particular focus in providing speech
recognition technology (SSN, September 2006, p. 24). What is
that focus, how does it distinguish you from other speech
technology vendors, and how has it evolved?
The central focus of Novauris technology is voice mobile
search, i.e., accessing information with a mobile device using
speech. We are probably unique in having this clear focus, and
we like to think that we can do voice mobile search better than
anyone else.
As for how our technology has evolved, you may remember that we
first demonstrated the recognition of any name and address from
a list of 245 million possibilities. We got stunning speed and
accuracy on this application by exploiting in novel ways the
regular structure of names and addresses and the redundancy
present in each complete item.
Since then, we have extended our technology to cope with much
shorter inputs, such as the names of recording artists or titles
of music tracks. Such items generally lack any obvious structure
or redundancy, yet we can still offer access to over a million of
them. It has even turned out that our approach can be successfully
applied to allow virtually free-form entry to a speech-to-speech
interpreter.
Also, we initially offered technology only for server-based or
PC-based applications. More recently, we have extended our
technology to applications on hand-held devices using ARM/XScale
processors.
Finally, our technology was initially confined to English but we
are steadily extending it to other languages. We added Korean last
year and are now adding Japanese.
What is your strategy for addressing market opportunities and
partnerships?
As is the case for most emerging technologies, speech technology
is not enough in itself to create a market with sustained growth.
The value of speech technology lies in its contribution to an
overall product or service that includes speech technology.
Our strategy is to create and exploit market opportunities by
first studying the potential impact of a product or service in a
market segment, selecting a small number of key players as
partners, and working closely with each partner to develop an
outstanding application.
This approach of application-specific technology and product
development through close partnership—rather than offering
complete applications or services ourselves—reflects our
intention to avoid competing with our partners and customers.
Our preferred business model is revenue sharing, but we are open
to alternatives.
Please outline how your technology supports the Verizon Wireless
Get It Now search application (SSN, May 2007, p. 6), and how the
public has accepted that application.
The Get It Now Search application lets Verizon Wireless
subscribers specify and download musical items, games, etc., from
a large Verizon catalog either by speaking the item that they
want or by keying it in. The application is run by Medio Systems,
Inc. They host the service and return the requested items to the
user. They also handle the search when the request is keyed
in. When the request is spoken, it is passed to a server running
our NovaSearch software, which returns to Medio’s server a list
of matches to what was spoken together with the associated
probabilities and sometimes some comments on the signal
characteristics.
I think we can say that the application has been well accepted by
the public for the following reasons: there are now millions of
users who regularly make search requests; and the number of users
and search requests is growing, as is the proportion of callers
who choose voice as their input mode. In addition, I’m very
pleased to tell you that a new version of Get It Now Search has
been launched recently (Aug 22, 2007). This version now supports
voice search and downloading of full MP3 tracks in the entire
VCAST Music catalog (about 1.5M items), in addition to the
existing Get It Now content.
How do you see your market opportunity expanding?
We expect to see more applications similar to Get It Now Search
that feature multimodal search of digital content, such as music,
video, and games. Aside from these entertainment-related
applications, we see the major growth areas being related to
location: that is, spoken input to navigation aids and to services
allowing the user to search for information on items such as
nearby restaurants, supermarkets, ATMs, gas stations, medical
facilities, etc.
The server-based market opportunity will expand as the
availability of 3G/4G networks increases, together with the
penetration of phones capable of using those networks. The
embedded or client-only market will expand as mobile devices are
equipped with increasingly powerful processors and increasing
memory and storage capacity.
How will your technology evolve to support these opportunities?
One of the pleasures of our having adopted a novel approach to
speech recognition is that we are still in a period of making
unusually rapid progress in the performance of our technology.
For instance, a breakthrough a few months ago has shown how our
already fast search process can be made several times faster
while needing only a fraction of the current memory requirement.
Although initially confined to our simplest applications, the new
technique is gradually being extended for use in all our
applications.
We expect such evolution to continue, allowing us to do more and
more with given hardware. We expect to see a larger proportion of
our software running on the client side or in standalone
applications. The amount of accessible mobile content and
information (either stored locally on the device or over the
network) is set to grow considerably over the next year or two.
We believe that our special strength in enabling users to select
efficiently from very large numbers of items with single voice
inputs will become increasingly apparent.
Any final comments?
When you interviewed my colleague, Melvyn Hunt, a year ago, he
expressed his excitement at seeing the emergence of worthwhile,
widely used applications of voice mobile search. I fully share
that excitement. I’m convinced that voice mobile search
applications are going to play an important part in our daily
lives.
Also, I would like to express my personal joy and excitement in
being associated with a tight-knit team of exceptionally bright
individuals who are determined to bring value to our partners and
to the marketplace.