Theoretical Background

3.1. Conversational Models

Human face-to-face interaction is rich in conversational behaviours that, according to Interpersonal Theory, fulfil conversational functions. Particular behaviours, e.g. gesturing with the hands, producing facial expressions, or changing the melody or intonation of the voice, are understood in terms of the function they fulfil in the ongoing conversation. These verbal and non-verbal means contribute information to the conversation and serve discourse functions in the dialogue, such as conversation invitation, turn taking, providing feedback, contrast and emphasis, and breaking away (Cassell et al. 1998; Kendon 1990).

Discourse functions are independent of modality: the same function may be realized through different behaviours and input modalities. A greeting could be executed as a nod, a wave or speech; all serve the same function. Conversely, conversational behaviour is context-dependent. The same gesture can have different meanings in different contexts and can therefore serve different discourse functions. For example, a nod could be interpreted as a greeting, as feedback or as emphasis.
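The many-to-many relation between behaviours and functions described above can be illustrated with a minimal sketch. It is not taken from the cited work: the state names, the behaviour labels and the lookup table are hypothetical and chosen only to mirror the greeting and nod examples in the text.

```python
from enum import Enum, auto
from typing import Optional


class DiscourseFunction(Enum):
    GREETING = auto()
    FEEDBACK = auto()
    EMPHASIS = auto()


class ConversationState(Enum):
    # Hypothetical, coarse context; a real system would track far more.
    NOT_ENGAGED = auto()  # no conversation is under way yet
    LISTENING = auto()    # the person producing the behaviour is listening
    SPEAKING = auto()     # the person producing the behaviour holds the turn


# Assumed lookup table: (observed behaviour, context) -> discourse function.
INTERPRETATION = {
    # Different behaviours, same function (modality independence).
    ("nod", ConversationState.NOT_ENGAGED): DiscourseFunction.GREETING,
    ("wave", ConversationState.NOT_ENGAGED): DiscourseFunction.GREETING,
    ("say_hello", ConversationState.NOT_ENGAGED): DiscourseFunction.GREETING,
    # Same behaviour, different functions (context dependence).
    ("nod", ConversationState.LISTENING): DiscourseFunction.FEEDBACK,
    ("nod", ConversationState.SPEAKING): DiscourseFunction.EMPHASIS,
}


def interpret(behaviour: str, state: ConversationState) -> Optional[DiscourseFunction]:
    """Return the discourse function of a behaviour in a given context, if known."""
    return INTERPRETATION.get((behaviour, state))
```

In this sketch, interpret("nod", ConversationState.NOT_ENGAGED) and interpret("wave", ConversationState.NOT_ENGAGED) both yield GREETING, whereas interpret("nod", ConversationState.LISTENING) yields FEEDBACK.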

The theory distinguishes conversational functions, realized through discourse behaviours in different modalities, with respect to the information they convey and the goal they serve. Propositional information comprises meaningful speech, together with the hand gestures and intonation used to complement or elaborate upon the speech content; it advances the content of the conversation. Interactional information consists of cues that regulate the conversational process. It includes a range of non-verbal behaviours (e.g. quick head nods to indicate that one is following) as well as regulatory speech (‘huh?’, ‘do go on’). “In short, the interactional discourse functions are responsible for creating and maintaining an open channel of communication between the participants, while propositional functions shape the actual content” (Cassell, Torres et al. 1999).

Effective interfaces could be designed with this background knowledge: interfaces that are consistent in their use of multiple modalities to achieve particular functions. Such an interface may map input events in different modalities onto the same discourse function and, under different conditions, may implement the same function through different conversational behaviours. The state of the conversation and the available input and output modalities would therefore be of high importance for generating appropriate (re)actions. The resulting autonomous behaviours and utterances would be consistent with each other and highly robust to misunderstandings and interruptions of the conversational flow (Cassell et al. 1998). Graceful repair of the conversation after a misunderstanding thus becomes both possible and likely.
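The output side of such an interface, where one function is implemented by different behaviours depending on the available modalities, might be sketched as follows. This is an illustrative assumption rather than the design of any cited system: the modality names, the behaviour strings and the preference order in the table are all hypothetical.

```python
from typing import Optional, Set

# Assumed table: for each discourse function, candidate realizations as
# (output modality, behaviour), listed in a hypothetical order of preference.
REALIZATIONS = {
    "greeting": [
        ("speech", "say 'hello'"),
        ("gesture", "wave"),
        ("head", "nod"),
    ],
    "feedback": [
        ("head", "quick nod"),
        ("speech", "say 'uh-huh'"),
    ],
}


def realize(function: str, available: Set[str]) -> Optional[str]:
    """Pick the first behaviour whose output modality is currently available."""
    for modality, behaviour in REALIZATIONS.get(function, []):
        if modality in available:
            return behaviour
    # Unknown function, or no usable modality under the present conditions.
    return None
```

With speech available, realize("greeting", {"speech", "gesture"}) returns "say 'hello'"; if only the head can be moved, realize("greeting", {"head"}) returns "nod". The function stays the same while the behaviour changes with the conditions.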