
NeuroHRI: Using Neuroimaging Data for Exploring Conversational Engagement in HRI

Description

Social interaction plays a central role in everyday human life. Our project addresses interactions with novel conversational systems, in particular social robots. To design robots that people can successfully interact with in education, healthcare, rehabilitation, or service settings, it is important to study how communication with them unfolds and how it differs from human-human interaction. We are laying the ground for integrating neuroimaging (specifically fMRI) with commonly used behavioral measures to (1) understand the perception of these technologies and (2) provide data for the interdisciplinary research field studying conversational processes, in particular conversational engagement.

Successful social robots should be able to monitor a user's engagement in the conversation and display appropriate feedback. This requires comparing human-robot interaction to human-human interaction and investigating how differently engagement is represented with different agents. In humans, however, engagement is a complex process that presumably spans linguistic, paralinguistic, and social processing, and it is not yet known how these processes are encoded in brain activation patterns. In this project, we tackle fundamental questions about the human interactive engine in human-human and human-robot interaction.

During the project, we have delivered a large-scale multimodal NeuroEngage dataset of more than 50 participants, collected with a novel design that manipulates engagement levels, and shared it with the research community.

Call for master thesis projects: Analyzing conversational engagement in human-robot interaction

This call builds upon the Digital Futures research pair Using Neuroimaging Data for Exploring Conversational Engagement in Human-Robot Interaction, which aims to investigate how differently people talk and listen to other humans and social robots, and how we can use that knowledge to make social robots able to detect, maintain, and express appropriate conversational engagement that enables successful communication with a user.

We have collected a multimodal dataset of human-human and human-robot conversations, in which participants engaged in 30 minutes of free conversation with a confederate (a human, or a robot controlled by a human via a VR-mediated teleoperation interface) discussing ethical dilemmas. The level of engagement during the conversations was manipulated by design through the confederate’s behavior. Besides the fMRI brain data, the NeuroEngage dataset includes audio from both the participant (p) and the confederate (c), video (c), eye-tracking data (p), and questionnaire results (p).
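
To make this structure concrete, below is a minimal sketch of how one session's modalities could be organized in code. The directory layout, file names, and field names are illustrative assumptions, not the actual release format of the dataset.

    # Hypothetical per-session record for the NeuroEngage dataset.
    # Paths and field names are illustrative only.
    from dataclasses import dataclass
    from pathlib import Path

    @dataclass
    class NeuroEngageSession:
        participant_id: str
        condition: str            # "human" or "robot" confederate
        fmri_bold: Path           # 4D BOLD time series (participant)
        audio_participant: Path   # participant speech recorded in the scanner
        audio_confederate: Path   # confederate speech
        video_confederate: Path   # confederate video
        eyetracking: Path         # participant gaze samples
        questionnaires: Path      # post-session questionnaire responses

    def load_session(root: Path, participant_id: str, condition: str) -> NeuroEngageSession:
        """Collect the modality files for one conversation session."""
        base = root / participant_id / condition
        return NeuroEngageSession(
            participant_id=participant_id,
            condition=condition,
            fmri_bold=base / "bold.nii.gz",
            audio_participant=base / "audio_participant.wav",
            audio_confederate=base / "audio_confederate.wav",
            video_confederate=base / "video_confederate.mp4",
            eyetracking=base / "eyetracking.csv",
            questionnaires=base / "questionnaires.csv",
        )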

Below are several topics we suggest for master thesis projects; other topics that make use of the NeuroEngage dataset are also welcome:

  • How to classify conversational engagement in brain data? Neural correlates of engagement have not been widely studied before, and we are looking for the differences between engagement levels in terms of fMRI data. We are interested in ways to use state-of-the-art machine learning methodologies to interpret brain data and find engagement signatures (see the decoding sketch after this list).
  • How to detect conversational engagement with different agents using eye-tracking, speech, and brain data? Gaze is one of the most widely studied features used for automatic engagement detection. We want to use a multimodal approach to detect engagement by integrating several features of the conversation (a feature-fusion sketch follows the list).
  • How to detect and generate verbal and non-verbal backchannels in a conversation using multimodal LLMs? To manipulate engagement, we varied the amount of backchannels the confederate gave to the participant. We therefore think our dataset is well suited for exploring automatic detection of verbal and non-verbal backchannels from the confederate, in both the human and robot conditions, using multimodal LLMs. This will explore the potential of such models for both perception and generation (a simplified text-only sketch is given after this list).
  • How to extract prosodic features relevant to conversational engagement from noisy audio? We want to know how participants’ engagement varied during the conversations, and prosodic features are one way to track that. However, the fMRI scanner is very noisy, and obtaining clean sound from the participant is a major challenge in brain imaging research. One research question is whether we can still recover prosodic features that are useful for classification (a feature-extraction sketch follows the list).
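
For the first topic, one possible starting point is standard multivariate decoding on the fMRI volumes. The sketch below assumes that per-volume engagement labels and participant IDs are already available, and uses the nilearn and scikit-learn libraries; the function name and cross-validation setup are our illustrative choices, not a prescribed pipeline.

    # Minimal decoding sketch: classify engagement level from fMRI volumes.
    # Assumes labels (engagement level per volume) and groups (participant id
    # per volume) exist; preprocessing details are omitted.
    from nilearn.maskers import NiftiMasker
    from sklearn.svm import LinearSVC
    from sklearn.model_selection import cross_val_score, GroupKFold

    def decode_engagement(bold_img, labels, groups):
        masker = NiftiMasker(standardize=True, smoothing_fwhm=6)
        X = masker.fit_transform(bold_img)        # (n_volumes, n_voxels)
        clf = LinearSVC(max_iter=10000)
        # Leave participants out during cross-validation so the classifier
        # is always tested on people it has not seen.
        scores = cross_val_score(clf, X, labels, groups=groups,
                                 cv=GroupKFold(n_splits=5))
        return scores.mean()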
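
For the multimodal detection topic, a simple baseline is early fusion: summarize each modality per conversational segment and concatenate the feature vectors before classification. The sketch below assumes such per-segment feature arrays already exist; the names and classifier choice are illustrative.

    # Early-fusion baseline for multimodal engagement detection.
    # Each *_feats array has shape (n_segments, n_features) and is aligned
    # on the same conversational segments.
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score, GroupKFold

    def fuse_and_classify(gaze_feats, prosody_feats, brain_feats, labels, groups):
        X = np.hstack([gaze_feats, prosody_feats, brain_feats])  # concatenate modalities
        clf = RandomForestClassifier(n_estimators=300, random_state=0)
        scores = cross_val_score(clf, X, labels, groups=groups,
                                 cv=GroupKFold(n_splits=5))
        return scores.mean()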
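
For the backchannel topic, a first, heavily simplified experiment could prompt an LLM with a transcript excerpt and ask it to mark verbal backchannels. The sketch below is text-only (it ignores the audio and video channels the topic is really about) and assumes the OpenAI Python client with an illustrative model name and prompt.

    # Text-only sketch: ask an LLM to label verbal backchannels in a transcript
    # excerpt. The model name and prompt are illustrative assumptions.
    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    def label_backchannels(transcript_excerpt: str) -> str:
        response = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[
                {"role": "system",
                 "content": "Mark every confederate utterance that is a verbal "
                            "backchannel (e.g. 'mm-hmm', 'yeah', 'right')."},
                {"role": "user", "content": transcript_excerpt},
            ],
        )
        return response.choices[0].message.content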
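
For the prosody topic, the sketch below extracts frame-level pitch and energy after a generic noise-reduction pass, using librosa and the noisereduce package. Real scanner noise will likely require more dedicated denoising than this placeholder step, so treat it as a starting point only.

    # Sketch of prosodic feature extraction from in-scanner participant audio.
    # The spectral-gating denoiser is a placeholder for proper scanner-noise removal.
    import numpy as np
    import librosa
    import noisereduce as nr

    def prosodic_features(wav_path, sr=16000):
        y, sr = librosa.load(wav_path, sr=sr)
        y_clean = nr.reduce_noise(y=y, sr=sr)
        f0, voiced_flag, _ = librosa.pyin(
            y_clean, fmin=librosa.note_to_hz("C2"),
            fmax=librosa.note_to_hz("C6"), sr=sr)
        rms = librosa.feature.rms(y=y_clean)[0]   # frame-level energy
        return {
            "f0_mean": float(np.nanmean(f0)),     # pyin marks unvoiced frames as NaN
            "f0_std": float(np.nanstd(f0)),
            "voiced_ratio": float(np.mean(voiced_flag)),
            "energy_mean": float(rms.mean()),
        }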

Relevant Literature

  1. Ben-Youssef, A., Varni, G., Essid, S., & Clavel, C. (2019). On-the-fly detection of user engagement decrease in spontaneous human–robot interaction using recurrent and deep neural networks. International Journal of Social Robotics, 11(5), 815-828.
  2. Rauchbauer, B., Nazarian, B., Bourhis, M., Ochs, M., Prévot, L., & Chaminade, T. (2019). Brain activity during reciprocal social interaction investigated using conversational robots as control condition. Philosophical Transactions of the Royal Society B, 374(1771), 20180033.
  3. Sidner, C. L., Lee, C., Kidd, C. D., Lesh, N., & Rich, C. (2005). Explorations in engagement for humans and robots. Artificial Intelligence, 166(1-2), 140-164.
  4. Torubarova, E., Arvidsson, C., Uddén, J., & Pereira, A. (2023). Investigating Conversational Dynamics in Human-Robot Interaction with fMRI. In Proceedings of the Annual Meeting of the Cognitive Science Society (Vol. 45, No. 45).

Please contact ekator@kth.se or atap@kth.se for more information.

Researchers

Ekaterina Torubarova, doctoral student
Julia Uddén, co-supervisor, Assistant Professor
André Tiago Abelho Pereira, supervisor

Funding

Digital Futures

Duration

2022-03-08 ➞ 2027-03-08