NovaSystem — Our Scalable Server-Based Speech Recognition System

Novauris's main software product, NovaSystem, is a server-based distributed speech recognition system, capable of handling multiple simultaneous voice access requests, and, if required, spreading the computational load over a multi-computer installation.  (The underlying NovaSearch technology is also suited to embedded applications — contact us about this.)

It can be configured to process telephone speech or higher-quality speech input, and custom applications can be developed for other types of speech signal processing, such as ETSI Aurora DSR.

It accepts PCM or mu-law waveforms at a variety of sampling rates using a simple TCP/IP protocol, which is also used to specify the type of speech recognition task required and to control the recognition process.
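For readers unfamiliar with the two waveform formats, the sketch below illustrates how mu-law companding relates to linear PCM. It uses the continuous companding formula with mu = 255 rather than the exact segmented byte encoding of ITU-T G.711, and it says nothing about how NovaSystem itself parses waveforms or about its TCP/IP protocol. The function names are illustrative only.

```python
import math

MU = 255  # companding constant used in North American and Japanese telephony

def mulaw_compress(x: float) -> float:
    """Map a linear sample in [-1, 1] to a companded value in [-1, 1]."""
    x = max(-1.0, min(1.0, x))
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

def mulaw_expand(y: float) -> float:
    """Inverse of mulaw_compress: recover the linear sample."""
    y = max(-1.0, min(1.0, y))
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)

def pcm16_to_code8(sample: int) -> int:
    """Quantize a signed 16-bit PCM sample to an 8-bit companded code (0..255).

    Illustrative only: real G.711 bytes use a sign/segment/mantissa layout.
    """
    y = mulaw_compress(sample / 32768.0)
    return int(round((y + 1.0) / 2.0 * 255))

if __name__ == "__main__":
    for s in (-32768, -1000, 0, 1000, 32767):
        print(s, pcm16_to_code8(s))
```

Companding spends more of the 8-bit range on quiet samples, which is why telephone-quality speech can be carried in one byte per sample.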

The system responds with text identifying the item matching the spoken input, or more generally with an ordered list of the best-matching items together with estimates of the probabilities that each item matches the input.  It can also provide feedback on the quality of the speech signal.
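An application typically has to decide what to do with such an ordered list. The following is a minimal sketch of one common approach, assuming the results have already been parsed into (item, probability) pairs. The `Hypothesis` class, the `decide` function and the two thresholds are hypothetical and are not part of the NovaSystem API.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class Hypothesis:
    item: str           # text identifying the matched item
    probability: float  # estimated probability that the item matches the input

def decide(nbest: List[Hypothesis],
           accept: float = 0.85,
           confirm: float = 0.50) -> Tuple[str, Optional[str]]:
    """Choose an action from an N-best list.

    Thresholds are illustrative assumptions; tune them per application.
    """
    if not nbest:
        return ("reject", None)
    best = max(nbest, key=lambda h: h.probability)
    if best.probability >= accept:
        return ("accept", best.item)   # confident enough to act directly
    if best.probability >= confirm:
        return ("confirm", best.item)  # ask the caller "Did you say ...?"
    return ("reject", None)            # ask the caller to repeat

if __name__ == "__main__":
    results = [Hypothesis("John Smith, Dallas", 0.62),
               Hypothesis("Joan Smith, Dallas", 0.30)]
    print(decide(results))
```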

[Figure: NovaSystem Software Architecture]

Setting up NovaSearch Applications is Easy

To set up a new NovaSearch application running on NovaSystem all that's needed is an ASCII file listing the phrases to be recognized, one phrase per line.  It's that simple.  If any words in these phrases are not in Novauris's extensive pronouncing dictionary (approaching half-a-million entries) then pronunciations can be generated automatically (though manual checking of the new pronunciations is recommended).
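As a rough illustration of preparing such a file, the sketch below writes phrases one per line in ASCII and lists the words that would need generated pronunciations. The `known_words` set is a small stand-in for the pronouncing dictionary, and the helper names and word-splitting rule are assumptions, not Novauris tooling.

```python
import re
from pathlib import Path

def write_phrase_file(phrases, path):
    """Write one phrase per line; raises UnicodeEncodeError on non-ASCII text."""
    text = "\n".join(phrases) + "\n"
    Path(path).write_text(text, encoding="ascii")

def words_needing_pronunciations(phrases, known_words):
    """Return words (lowercased, in first-seen order) absent from known_words."""
    known = {w.lower() for w in known_words}
    missing = []
    for phrase in phrases:
        for word in re.findall(r"[A-Za-z0-9']+", phrase):
            w = word.lower()
            if w not in known and w not in missing:
                missing.append(w)
    return missing

if __name__ == "__main__":
    phrases = ["John Smith Dallas", "Juanita Suares Dallas"]
    write_phrase_file(phrases, "people_phrases.txt")
    # Stand-in dictionary for illustration only.
    print(words_needing_pronunciations(phrases, {"john", "smith", "dallas"}))
```

The words reported here are the ones whose automatically generated pronunciations would be worth checking by hand.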

For very large or complex applications it's more convenient and efficient to exploit structure in the data.  If the data is held in a database (or can be organized into fields and put into one), NovaSystem has software for defining grammars and automatically generating all the phrases in the application (see box below).


Example Application — Asking about People
Suppose we have a database of information called "people," of which the table shows a small extract:
ID      first_name  last_name  age  house_number  street  city    state
735892  John        Smith      37   152           Olive   Dallas  Texas
821560  Juanita     Suares     26   32A           Adams   Dallas  Texas
881553  Peggy       Jones      58   390           5th     Boise   Idaho
In an application, callers might give some information to identify a person in order to get other information back (ID number, age, etc.).

So callers might say:
"Give me John Smith of Dallas, Texas" or "John Smith, Dallas, Texas"

The grammar would be:
      < ?giveme > first_name last_name < ?of > city
where:
      < ?giveme > = (Give me | < blank > ) and < ?of > = ( of | < blank > )
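
To make the phrase generation concrete, here is a minimal sketch that expands the grammar above over the three rows of the "people" extract. It follows the grammar exactly as written, so the state field is not included. The function names and the in-memory list of rows are assumptions for illustration; NovaSystem's own grammar tools are not shown.

```python
from itertools import product

# Optional elements from the grammar; "" stands for <blank>.
GIVEME = ("Give me", "")
OF = ("of", "")

PEOPLE = [
    {"ID": "735892", "first_name": "John", "last_name": "Smith", "city": "Dallas"},
    {"ID": "821560", "first_name": "Juanita", "last_name": "Suares", "city": "Dallas"},
    {"ID": "881553", "first_name": "Peggy", "last_name": "Jones", "city": "Boise"},
]

def expand(row):
    """Return every phrase the grammar allows for one database row."""
    phrases = []
    for giveme, of in product(GIVEME, OF):
        parts = [giveme, row["first_name"], row["last_name"], of, row["city"]]
        phrases.append(" ".join(part for part in parts if part))
    return phrases

def build_phrase_index(rows):
    """Map each generated phrase to the IDs of the rows that produce it."""
    index = {}
    for row in rows:
        for phrase in expand(row):
            index.setdefault(phrase, []).append(row["ID"])
    return index

if __name__ == "__main__":
    for phrase in expand(PEOPLE[0]):
        print(phrase)
```

Each row yields four phrases (two optional elements, each present or absent), so the phrase count grows with both the number of rows and the number of optional elements in the grammar.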
NovaSystem Software — The Nuts & Bolts for Those Who Really Want to Know

The NovaSystem software provides a client-server architecture for applications requiring speech recognition, ranging from embedded applications to carrier-grade deployments.  It can, if needed, provide conventional-style speech recognition, as well as Novauris's innovative technology for spoken access to large databases.

NovaSystem consists of two main components: a single NovaServer, which is the point of contact for applications requiring speech recognition services, and one or more ResourceServers, which perform the speech recognition. The NovaServer coordinates the activity of the ResourceServers that have been assigned to it.

There is also NovaLoad, a configuration and monitoring client, which allows the administrator to monitor the state of NovaSystem and to make configuration changes online.

Client libraries are available in C and Python; alternatively, the fully documented TCP/IP protocol can be used directly.



NovaSystem Features

Scalability. The distributed nature of NovaSystem allows it to scale to large installations, and includes algorithms for balancing the computational load over multiple server machines.
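The balancing algorithms themselves are not described here. Purely as an illustration of the idea, the sketch below assigns each new job to the ResourceServer with the largest fraction of free slots. The `ResourceServer` class, its fields and the `assign` function are hypothetical and do not reflect NovaServer's actual implementation.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class ResourceServer:
    address: str
    slots: int       # total recognition slots on this machine
    active: int = 0  # slots currently in use

    @property
    def free(self) -> int:
        return self.slots - self.active

def assign(servers: List[ResourceServer]) -> Optional[ResourceServer]:
    """Pick the server with the highest share of free slots, or None if all are full."""
    candidates = [s for s in servers if s.free > 0]
    if not candidates:
        return None  # caller would queue the job until a slot frees up
    chosen = max(candidates, key=lambda s: s.free / s.slots)
    chosen.active += 1
    return chosen

if __name__ == "__main__":
    pool = [ResourceServer("10.0.0.1", slots=2),
            ResourceServer("10.0.0.2", slots=4, active=3)]
    for _ in range(4):
        server = assign(pool)
        print(server.address if server else "queued")
```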

Reliability and robustness.  NovaSystem has been designed to provide redundancy and handle hardware failure/disconnection gracefully.

Configuration and Deployment.  Adding a new server to the NovaSystem is quite simple: just install the ResourceServer package on the new machine, and add the address of the new machine in the NovaServer configuration.  Each ResourceServer receives its configuration from the NovaServer via TCP/IP, and the resource files required for the speech recognition tasks are specified in this configuration either as local files, or more commonly by URLs, allowing them to be easily accessed across the network from any ResourceServer.

Ease of use. The NovaServer client API provides a simple interface to the powerful technology operating behind it.  The complexities of multi-stage processing are hidden from the client, which simply needs to specify a pre-configured recognition task, deliver the waveform (preferably in real-time) and receive the results.  The NovaServer can be accessed using libraries supplied by Novauris, or customers can write their own interface software based on the description of the simple line protocol.
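Delivering a waveform in real time amounts to sending it in small chunks paced at the audio's own rate. The sketch below shows only that pacing logic. The `send` callable is a hypothetical placeholder (for example, a wrapper around a socket's `sendall`), the 20 ms chunk size is an assumption, and none of the actual NovaServer protocol commands are shown.

```python
import time
from typing import Callable, Iterator

def chunks(waveform: bytes, sample_rate: int, bytes_per_sample: int,
           chunk_ms: int = 20) -> Iterator[bytes]:
    """Split a waveform into consecutive chunks of chunk_ms milliseconds."""
    size = sample_rate * bytes_per_sample * chunk_ms // 1000
    if size <= 0:
        raise ValueError("chunk size must be at least one byte")
    for start in range(0, len(waveform), size):
        yield waveform[start:start + size]

def stream_realtime(waveform: bytes, send: Callable[[bytes], None],
                    sample_rate: int = 8000, bytes_per_sample: int = 1,
                    chunk_ms: int = 20,
                    sleep: Callable[[float], None] = time.sleep) -> None:
    """Send chunks paced at roughly the audio's playback rate."""
    for chunk in chunks(waveform, sample_rate, bytes_per_sample, chunk_ms):
        send(chunk)
        sleep(chunk_ms / 1000)

if __name__ == "__main__":
    one_second = b"\xff" * 8000  # 1 s of 8 kHz, one-byte-per-sample audio
    sent = []
    stream_realtime(one_second, sent.append)
    print(len(sent), "chunks sent")
```

With live audio the pacing comes from the capture device itself; the explicit sleep is only needed when replaying a stored recording.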

System monitoring and administration. The NovaServer API provides methods for querying the status of jobs, tasks and server processes.  An example GUI monitor, NovaLoad, uses these commands to present a dynamic view of the system by ResourceServers (above) and by task (below).  The system manager can also use NovaLoad to view and change configurations.

The main NovaLoad display showing the status of each machine, task assignments, and tasks queued. Red shows an active slot and green a free slot.

The NovaSystem supports NovaSearch voice access technology (in which case precompiled grammars and lexicons are used).  This technology has been demonstrated on databases with up to 245 million items.  It provides efficiency advantages with searchable databases as small as a few thousand items, advantages that grow with the size of the database and the length of the spoken input.

Conventional speech recognition capabilities are specified by supplying a grammar either at configuration time (in which case the grammar is precompiled by the system) or just before recognition (in applications where the grammar will vary during a dialogue, for example).

Specifications
NovaSearchServer for server-based applications.
NovaServerCompact for embedded applications.