ABSTRACT

The use of speech output alongside visual displays of text raises novel cognitive design issues about how the two modalities should be integrated. When employed as an additional rather than simply an alternative modality, speech has the potential to improve performance in language comprehension tasks; if poorly integrated, however, it may also severely disrupt performance. We argue that the design of a consistently effective multimodal (MM) presentation must be grounded in a model of user cognition that can account for the combined processes of reading and listening. This paper proposes a multimodal user model (MMUM): a model of MM language processing at the lexical, phonological, syntactic, semantic, pragmatic and propositional levels. The model attempts to characterise the cognitive structures and processes underlying human language processing during multimodal presentation. In its current form, the model does not include an account of modality-based responses in MM interaction.