In spoken interactions, face-to-face meetings are often preferred because the human face is highly expressive and facilitates coordinated interaction. Embodied conversational agents with expressive faces therefore have the potential for smoother interactions than voice-only assistants. However, knowledge of how the face expresses these social signals – the “language” of facial expressions – is limited, with no coherent modelling framework (e.g., see Jack & Schyns, 2017). For example, current models focus primarily on basic emotions such as fear, anger, and happiness, which are neither well suited to everyday conversation nor recognized cross-culturally (e.g., Jack, 2013). More relevant are signals of affirmation, uncertainty, interest, and turn-taking (e.g., Skantze, 2016), including how these vary across cultures (e.g., Chen et al., 2015). Conversational digital agents typically employ these signals in an ad hoc manner, with smiles or frowns manually inserted at speech-coordinated time points. This approach is costly and time-consuming, and provides only a limited repertoire of often Western-centric face signals, which in turn restricts the utility of conversational agents.
To address this knowledge gap, this project will (a) develop a modelling framework for conversationally relevant facial expressions in distinct cultures – East Asian and Western; (b) develop methods to automatically generate these facial expressions in conversational systems; and (c) evaluate these models in human-robot interaction settings across cultures. This automatic modelling will coordinate with the agent’s speech (e.g., auto-inserting smiles at appropriate times), the user’s behaviour (e.g., directing gaze and raising eyebrows when the user starts speaking), and the agent’s level of understanding (e.g., frowning during low comprehension).
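To illustrate the kind of coordination intended, the following is a minimal, hypothetical sketch of a rule-based mapping from conversational events to facial gestures. The function, event names, and comprehension threshold are illustrative assumptions, not the actual FurhatOS API or the project’s eventual models.

```python
def select_gesture(event: str, comprehension: float = 1.0) -> str:
    """Map a conversational event to a facial gesture (illustrative only).

    Encodes the three coordination cases described above:
    - smiles coordinated with the agent's own speech,
    - gaze/eyebrow response when the user starts speaking,
    - frowning when the agent's comprehension is low.
    """
    if comprehension < 0.5:
        # Agent signals low understanding regardless of the current event.
        return "frown"
    if event == "agent_speaking":
        # Auto-inserted smile at a speech-coordinated time point.
        return "smile"
    if event == "user_starts_speaking":
        # Attend to the user: direct gaze and raise eyebrows.
        return "gaze_and_brow_raise"
    return "neutral"
```

In a deployed system, such rules would be replaced by the data-driven dynamic expression models described below, triggered by the dialogue platform’s own event stream.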
We will employ state-of-the-art 3D capture of human-human interactions and psychological data-driven methods to model dynamic facial expressions (see Jack & Schyns, 2017). We will deploy these models using FurhatOS – a software platform for human-robot interactions – and the Furhat robot head, which has a highly expressive animated face with superior social signalling capacity compared to other platforms (Al Moubayed et al., 2013). The flexibility of Furhat’s display system, combined with these psychologically derived 3D face models, will also enable exploration of other socially relevant facial characteristics, such as ethnicity, gender, and age (e.g., see Zhan et al., 2019).
The results will be highly relevant to companies developing virtual agents and social robots, such as Furhat Robotics. Skantze, co-founder and chief scientist of Furhat Robotics, will facilitate the impact of the results. The project will also inform fundamental knowledge of human-human and human-robot interactions by precisely characterizing how facial signals facilitate spoken interactions. We anticipate outputs at international psychology and computer science conferences (e.g., Society for Personality and Social Psychology; IEEE Automatic Face & Gesture Recognition) and in high-profile scientific outlets (e.g., Nature Human Behaviour). Jack is PI of a large-scale funded laboratory specialising in modelling facial expressions across cultures.
The application deadline is 28 June 2021.