Sight and Sound: Generating Facial Expressions and Spoken Intonation from Context

Prevost, Scott; Pelachaud, Catherine
2023-05-22; 2023-05-22; 1994-09-01; 2007-07-18
https://repository.upenn.edu/handle/20.500.14332/36503

This paper presents a model for automatically producing prosodically appropriate speech and corresponding facial expression for agents that respond to simple database queries in a 3D graphical representation of the world. This work addresses two major issues in human-machine interaction. First, proper intonation is necessary for conveying information structure, including important distinctions of contrast and focus. Second, facial expressions and lip movements often provide additional information about discourse structure, turn-taking protocols and speaker attitudes.

Presentation