
Modeling Feedback and Gaze in Human-Robot Interaction

Compared to a written chat, spoken face-to-face interaction is a much richer form of communication, relying not only on words but also on a variety of non-verbal cues. Elements like prosody, facial expressions, and gaze play crucial roles in giving feedback, aligning understanding, regulating turn-taking, and conveying emotions. This project focuses on modeling these non-verbal aspects of human communication. This research is motivated not only by a desire to deepen our understanding of human communication but also by the goal of enhancing human-robot interactions, with the aim of creating robots that are more socially aware and responsive.

Gaze in conversation

[Image: Robot gaze]

Gaze behavior is a fundamental component of human interaction, serving multiple communicative functions, such as signaling attention, managing turn-taking, and indicating the focus of interest. In human-robot interaction, accurately modeling gaze patterns can help robots better interpret and predict conversational cues. In this project, we explore how best to model the gaze behavior of robots, and how the robot's gaze influences the interaction.
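
As a rough illustration of what such a model can look like, the sketch below shows a minimal rule-based gaze policy in Python. It is a simplified, hypothetical example rather than the architecture used in the project (see [7] for the planning-based approach); the names GazeTarget, DialogueState and select_gaze_target are invented for illustration.

from dataclasses import dataclass
from enum import Enum, auto

class GazeTarget(Enum):
    LISTENER = auto()   # mutual gaze with the interlocutor
    OBJECT = auto()     # deictic gaze at an object under discussion
    AVERTED = auto()    # brief gaze aversion, e.g. while planning an utterance

@dataclass
class DialogueState:
    robot_is_speaking: bool
    referring_to_object: bool   # the current utterance mentions a visible object
    turn_end_imminent: bool     # the robot is about to yield the turn

def select_gaze_target(state: DialogueState) -> GazeTarget:
    """Toy rule-based gaze policy: look at the mentioned object, avert gaze
    mid-utterance, and return to the listener when yielding the turn or listening."""
    if state.referring_to_object:
        return GazeTarget.OBJECT
    if state.robot_is_speaking and not state.turn_end_imminent:
        return GazeTarget.AVERTED
    return GazeTarget.LISTENER

# Example: the robot is speaking and about to yield the turn -> look at the listener.
print(select_gaze_target(DialogueState(True, False, True)))  # GazeTarget.LISTENER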

Feedback and Backchannels

Feedback in conversation is often non-verbal, for example in the form of head nods or vocal backchannels: brief vocalizations (e.g., "mm-hmm," "uh-huh") that indicate understanding, agreement, or interest. These signals are crucial for a smooth and natural dialogue flow, as they reassure the speaker and help align conversational goals. In human-robot interaction, it is crucial both to understand feedback from the user, so that the robot can adapt to the user, and to produce appropriate feedback that accurately reflects the robot's level of understanding.

One example is our work on using unsupervised learning to allow machines to automatically "discover" various feedback functions.
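
As a hedged, much-simplified illustration of this idea (not the method actually used in the project), the Python sketch below clusters backchannel tokens by a few prosodic features, so that candidate feedback functions emerge as clusters without any manual labels. The tokens and feature values are invented for illustration.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical prosodic features per feedback token: mean pitch (Hz),
# pitch slope (Hz/s), duration (s) and mean intensity (dB). In a real
# system these would be extracted from the audio signal.
tokens = ["mm-hmm", "uh-huh", "yeah", "okay", "mm", "right"]
features = np.array([
    [180.0,  25.0, 0.40, 62.0],
    [175.0,  30.0, 0.45, 60.0],
    [150.0, -10.0, 0.30, 65.0],
    [160.0,  -5.0, 0.35, 63.0],
    [140.0,   0.0, 0.25, 58.0],
    [155.0, -15.0, 0.30, 64.0],
])

# Normalize the features and cluster; each cluster is a candidate feedback
# function (e.g. continuer vs. agreement) discovered without labels.
X = StandardScaler().fit_transform(features)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

for token, cluster in zip(tokens, labels):
    print(f"{token!r} -> cluster {cluster}")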

Adaptive robot presenters

To explore how a robot could make use of feedback from the user, we have created a test-bed in which a robot presents a piece of art to a human audience. The robot detects feedback from the audience and adapts the presentation according to the human's level of understanding.

The system is explained in a video presentation (automatically generated from the slides).
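
The adaptation logic can be sketched roughly as follows. This is a minimal, hypothetical Python example, not the system described in [2]: the robot presents one segment at a time, estimates the listener's understanding from detected feedback (nods, backchannels, signs of confusion), and gives an extra elaboration when the estimate is low. The function and parameter names are invented for illustration.

from dataclasses import dataclass
from typing import Callable, Dict

@dataclass
class FeedbackEstimate:
    """Aggregated listener feedback observed during one presentation segment."""
    nods: int            # detected head nods
    backchannels: int    # detected vocal backchannels ("mm-hmm", ...)
    confusion: float     # 0..1 score, e.g. from facial-expression analysis

def understanding_score(fb: FeedbackEstimate) -> float:
    """Toy heuristic mapping observed feedback to an understanding score in [0, 1]."""
    positive = min(1.0, 0.2 * (fb.nods + fb.backchannels))
    return max(0.0, positive - fb.confusion)

def present(segments: Dict[str, Dict[str, str]],
            say: Callable[[str], None],
            get_feedback: Callable[[], FeedbackEstimate],
            threshold: float = 0.5) -> None:
    """Present each segment; if estimated understanding is low, add the
    simpler elaboration before moving on to the next segment."""
    for name, variants in segments.items():
        say(variants["default"])
        fb = get_feedback()                # e.g. from audio + vision processing
        if understanding_score(fb) < threshold:
            say(variants["elaboration"])   # re-explain in simpler terms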

Publications 

[1] C. Figueroa, M. Ochs and G. Skantze, "Classification of Feedback Functions in Spoken Dialog Using Large Language Models and Prosodic Features," in 27th Workshop on the Semantics and Pragmatics of Dialogue, 2023, pp. 15-24.
[2] A. Axelsson and G. Skantze, "Do you follow? A fully automated system for adaptive robot presenters," in HRI 2023: Proceedings of the 2023 ACM/IEEE International Conference on Human-Robot Interaction, 2023, pp. 102-111.
[3] C. Mishra et al., "Does a robot's gaze aversion affect human gaze aversion?," Frontiers in Robotics and AI, vol. 10, 2023.
[4] C. Figueroa, Š. Beňuš and G. Skantze, "Prosodic Alignment in Different Conversational Feedback Functions," in Proceedings of the 20th International Congress of Phonetic Sciences, Prague 2023, 2023, pp. 154-1518.
[5] C. Mishra et al., "Real-time emotion generation in human-robot dialogue using large language models," Frontiers in Robotics and AI, vol. 10, 2023.
[6] C. Figueroa et al., "Annotation of Communicative Functions of Short Feedback Tokens in Switchboard," in 2022 Language Resources and Evaluation Conference, LREC 2022, 2022, pp. 1849-1859.
[7] G. Skantze and C. Mishra, "Knowing where to look: A planning-based architecture to automate the gaze behavior of social robots," in 31st IEEE International Conference on Robot and Human Interactive Communication, RO-MAN 2022, Napoli, Italy, August 29 - Sept. 2, 2022, 2022.
[8] O. Ibrahim and G. Skantze, "Revisiting robot directed speech effects in spontaneous Human-Human-Robot interactions," in Human Perspectives on Spoken Human-Machine Interaction, 2021.
[10] D. Kontogiorgos et al., "The Effects of Embodiment and Social Eye-Gaze in Conversational Agents," in Proceedings of the 41st Annual Conference of the Cognitive Science Society (CogSci), 2019.

Funding

  • Representation Learning for Conversational AI (WASP, 2021-2026)
  • COBRA: Conversational Brains  (EU MSCA ITN, 2020-2023)
  • COIN: Co-adaptive human-robot interactive systems (SSF, 2016-2020)

Researchers

Gabriel Skantze, Professor
Agnes Axelsson
Livia Qian, Doctoral student
Carol Figueroa, Doctoral student (Furhat Robotics)