For many years, computer researchers and developers have been designing innovative applications and interfaces for use in everyday life. Examples include the character recognition technology used by many handheld PCs, which allows users to write free-form characters onto a touch-sensitive panel; portable MP3 music players that allow users to request a song stored on the device by speaking its title; and mobile telephones that allow their owners to dial a contact stored in the phone's memory by speaking that person's name.
Although many similar interfaces are designed either primarily for their novelty value or to make specialist devices, such as the handheld PC, usable, such interfaces also have considerable potential to give a wide range of people access to computing services.
A recent pilot scheme is an excellent example of computers used as an enabling technology: a prototype system currently being deployed in many post offices across the country. It aims to provide better services to deaf and hard of hearing customers, and to do this it uses an animated digital character to translate a postal worker's spoken phrases into sign language.
Many desktop PC applications are not usable by everyone in the general population. Some users, for example, cannot type well, if at all, and others have trouble reading from the screen. The reward for designing applications whose interfaces go beyond the now commonplace windows, icons, menus and pointer (WIMP) style is high, especially for users such as these.
Multi-modal interfaces are one attempt at catering for users with disabilities. Multi-modal applications accept input in more than one mode, such as speech and gesture, and users may generally utilise one or more of these modes when interacting with the system.
Users who are unable to use one of the input modalities are often able to use one of the others. For example, users who cannot type can often still speak their own language, so applications with a spoken natural language input mode are accessible to many people who would otherwise be unable to use them.
Natural language interfaces aim to provide a more natural style of interaction than traditional command line languages, enabling people who are not technically minded to use computers with little conscious effort. Spoken natural language interfaces offer the same functionality as typed natural language interfaces but use speech rather than typing as the input mode.
The speech-based interfaces described above accept either a single word (the name of the person to telephone), a simple phrase (the title of a song) or, in the case of the post office pilot scheme, which is aimed at handling more complex sentences, a set of constrained phrases. All three systems use simple template-based pattern matching to decide how to act upon the speech input.
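To make the template approach concrete, the following Python sketch matches a recognised utterance against a small, fixed table of phrase templates. It assumes the speech recogniser has already turned the audio into a text string; the template table, the `match_utterance` function and the action names are hypothetical illustrations and are not taken from any of the systems described.

```python
import re
from typing import Dict, Optional, Tuple

# Hypothetical template table: each regular expression describes one permitted
# phrase shape and is paired with the action it triggers.
TEMPLATES = [
    (re.compile(r"^call (?P<name>[a-z]+)$"), "dial"),
    (re.compile(r"^play (?P<title>[a-z ]+)$"), "play_song"),
]


def match_utterance(utterance: str) -> Optional[Tuple[str, Dict[str, str]]]:
    """Return (action, slots) for the first template the utterance fits, else None."""
    text = utterance.strip().lower()
    for pattern, action in TEMPLATES:
        m = pattern.match(text)
        if m:
            # Named groups become the "slots" (e.g. the contact name) for the action.
            return action, m.groupdict()
    # Anything that does not fit a template is simply not understood.
    return None


if __name__ == "__main__":
    print(match_utterance("Call Alice"))
    print(match_utterance("could you ring alice please"))
```

Here "Call Alice" is mapped to the `dial` action with the slot `name` set to `alice`, whereas the equivalent request "could you ring alice please" matches no template and is rejected. This rigidity is what limits such systems to constrained phrases.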
The integration of these three technologies (speech recognition, natural language processing and animation) is rare. The aim of this paper is to design and implement a prototype application that combines them, and then to evaluate the system rigorously.
The prototype application should employ speech as its main input modality and provide control over an animated virtual character. The system should not be limited to constrained phrases; an element of natural language processing will therefore be required to parse the input utterances and to provide the level of understanding needed to turn the various commands into animations of the character.
The background chapter of this paper provides an overview of the fields of speech recognition, natural language processing and animation, reviewing the problems faced in each area and identifying existing techniques and technologies that could be utilised.
The design chapter details the tools chosen to construct the application and sets out its functional and non-functional requirements. Design specifications for the application are then defined.
The implementation chapter describes how the specification was put into practice and the problems encountered along the way. It is followed by the evaluation chapter, which evaluates the prototype system both theoretically and empirically and summarises the results. The report ends with the conclusion chapter, which applies the evaluation to the criteria laid out in the background and design chapters and discusses the limitations of the prototype, its applicability and feasibility, and suggestions for further extensions to the system.