Figure 3.6 shows the high-level data flow for each utterance spoken by the user. It gives the conceptual sequence of events that must occur after the user has spoken but before any animation can take place. First, ViaVoice must both recognise and accept the user's spoken input. The distinction matters because ViaVoice may recognise a phrase correctly but not with a high enough likelihood to accept it. The system should only act on a command it is confident it has understood, so a phrase must be both recognised and accepted before it is processed further.
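The following minimal Python sketch makes the recognise/accept distinction concrete. It assumes, purely for illustration, that a recognition result carries the best-matching phrase and a confidence score in [0, 1], and that a phrase is accepted when that score clears a fixed threshold. `RecognitionResult`, `ACCEPT_THRESHOLD` and the value 0.7 are hypothetical. They do not reflect ViaVoice's actual API or scoring.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical stand-in for a recogniser result; ViaVoice's real API is not modelled here.
@dataclass
class RecognitionResult:
    phrase: Optional[str]  # best-matching phrase from the active grammar, or None if nothing matched
    confidence: float      # assumed likelihood score in [0, 1]


# Assumed acceptance threshold (illustrative only; a real value would be tuned empirically).
ACCEPT_THRESHOLD = 0.7


def is_recognised(result: RecognitionResult) -> bool:
    # Recognised: the recogniser matched some phrase, however unsure it is.
    return result.phrase is not None


def is_accepted(result: RecognitionResult, threshold: float = ACCEPT_THRESHOLD) -> bool:
    # Accepted: recognised AND the likelihood is high enough to act on.
    return is_recognised(result) and result.confidence >= threshold
```

For example, `RecognitionResult("move left", 0.4)` is recognised but not accepted, so under this sketch the system would not act on it.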
If this first test fails, the cause is either a user competence error or a performance error on the part of the user or the system. One possibility is that the user is not using understandable vocabulary (a user competence error). The system may also fail to recognise the input because of its speaker model (a system performance error) or because of the user's way of speaking (a user performance error). The same error should be reported in all three cases, since the only remedy, short of producing a new speaker model for ViaVoice, is for the user to adapt their vocabulary or their manner of speech.
Once the phrase has been recognised and accepted, the input must be parsed to build its semantics. If parsing fails then, as in the previous case, no further computation is needed. This time, however, the error lies entirely with the system. There must be a mismatch between the grammar ViaVoice used to recognise the phrase and the grammar the NLP module used to parse it. A different kind of error must therefore be reported to the user.
The last step is to determine the action or actions to be carried out. As in the previous error case, if no actions can be decided from the input, the fault lies with the system: it is not powerful enough, because the modules are mismatched, namely the grammars and the pragmatic actions. If actions can be decided, they are carried out and the system moves on to the next input.
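The three checks above form a short-circuiting pipeline in which each failure maps to a distinct kind of error report. The Python sketch below is one possible rendering under stated assumptions. `Outcome`, `process_utterance`, `toy_parse` and `toy_decide` are hypothetical names. The parser and action decider are passed in as plain functions, and the toy grammar is invented for illustration and is not the system's grammar.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Callable, List, Optional


class Outcome(Enum):
    NOT_RECOGNISED = auto()  # user competence, or user/system performance error: ask the user to adapt
    PARSE_FAILED = auto()    # recogniser grammar and NLP grammar disagree: a system error
    NO_ACTIONS = auto()      # semantics built but no pragmatic action decidable: a system error
    OK = auto()


@dataclass
class PipelineResult:
    outcome: Outcome
    actions: List[str] = field(default_factory=list)


def process_utterance(
    accepted_phrase: Optional[str],            # None if the phrase was not recognised AND accepted
    parse: Callable[[str], Optional[dict]],    # hypothetical NLP parser; None on failure
    decide_actions: Callable[[dict], List[str]],  # hypothetical pragmatics step
) -> PipelineResult:
    # Stage 1: only recognised-and-accepted phrases continue.
    if accepted_phrase is None:
        return PipelineResult(Outcome.NOT_RECOGNISED)
    # Stage 2: build the semantics; failure means a grammar mismatch.
    semantics = parse(accepted_phrase)
    if semantics is None:
        return PipelineResult(Outcome.PARSE_FAILED)
    # Stage 3: decide the actions; none means the modules are mismatched.
    actions = decide_actions(semantics)
    if not actions:
        return PipelineResult(Outcome.NO_ACTIONS)
    return PipelineResult(Outcome.OK, actions)


# Toy grammar for illustration only: "move <direction>".
def toy_parse(phrase: str) -> Optional[dict]:
    words = phrase.split()
    if len(words) == 2 and words[0] == "move":
        return {"verb": "move", "direction": words[1]}
    return None


def toy_decide(semantics: dict) -> List[str]:
    # Only left/right movements have pragmatic actions in this toy example.
    if semantics["direction"] in {"left", "right"}:
        return [f"translate:{semantics['direction']}"]
    return []
```

With these toy functions, `"move left"` yields `Outcome.OK`, `"move up"` parses but yields `Outcome.NO_ACTIONS`, `"jump"` yields `Outcome.PARSE_FAILED`, and `None` (not accepted) yields `Outcome.NOT_RECOGNISED`. Keeping the error kinds distinct lets the interface report user-side and system-side failures differently, as described above.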
Similar information can be presented on a per-class basis with the sequence model shown in Figure 3.7.
This model shows the activation of each class as the appropriate messages are called. One notable feature is that both ViaVoice and the animation module remain active throughout. ViaVoice does so because it has an active session (namely this application), and so it continually waits for audio input or data from the application. The animation module remains active because it must continually update the virtual environment, the application window and the sub-elements of both.
The flow of messages matches Figure 3.6 exactly: the user's utterance is passed on to the semantics module only when it has been both recognised and accepted. The type of message that the interface to ViaVoice passes to the animation module depends on whether the phrase was accepted. Either the phrase is displayed to indicate that it has been accepted, or a suitable error is displayed.
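The minimal Python sketch below shows how the interface to ViaVoice might choose the message sent to the animation module. It assumes that message passing can be modelled as appending to in-memory queues. `ViaVoiceInterface`, `on_result`, `DisplayPhrase`, `DisplayError` and the error text are all hypothetical names chosen for illustration, not the system's actual classes or methods.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class DisplayPhrase:
    text: str  # accepted phrase, shown to confirm acceptance


@dataclass
class DisplayError:
    message: str  # shown when the phrase was not recognised and accepted


AnimationMessage = Union[DisplayPhrase, DisplayError]


class ViaVoiceInterface:
    """Hypothetical interface class; lists stand in for real message passing."""

    def __init__(self) -> None:
        self.to_animation: List[AnimationMessage] = []
        self.to_semantics: List[str] = []

    def on_result(self, phrase: Optional[str], accepted: bool) -> None:
        if phrase is not None and accepted:
            # Accepted: confirm on screen and forward to the semantics module.
            self.to_animation.append(DisplayPhrase(phrase))
            self.to_semantics.append(phrase)
        else:
            # Not recognised, or recognised but not accepted: show one generic error
            # and forward nothing to the semantics module.
            self.to_animation.append(
                DisplayError("Not understood: please rephrase or speak more clearly.")
            )
```

In this sketch the animation module always receives exactly one message per utterance, while the semantics module receives only accepted phrases.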