Erik C. B. Olsen, M.S.
Oviatt, S. L. and Olsen, E. (1994)
This research examines how people integrate spoken and written input during
multimodal human-computer interaction. Three studies used a semi-automatic simulation
technique to collect data on people's free use of spoken and written input. The studies used a
within-subject repeated-measures design, with data analyzed from 44 subjects and 240 tasks. The
primary factors that govern people's choice to write versus speak at given points during a
human-computer exchange were evaluated. Analyses revealed that people write digits more often
than textual content, and proper names more often than other text. A form-based presentation, in
comparison with an unconstrained format, also increased the likelihood of writing. However, the
most influential factor in patterning people's integrated use of speech and writing is contrastive
functionality, or the use of spoken and written input in a contrastive way to designate a shift in
content or functionality, such as original versus corrected input, data versus command, and digits
versus text. Different patterns of contrastive mode use accounted for approximately 57% of the
integrated pen/voice use observed in these studies. Information is also summarized on preferential
mode use and the simultaneity of pen/voice input. One long-term goal of this research is the
development of quantitative predictive models of natural modality integration, which could
provide guidance on the strategic design of robust multimodal systems.