Communicative Behavior

Communicating humans

In this area we develop models of how humans and animals perceive and produce non-verbal communication. This can be used both to gain understanding about the mechanisms underlying human and animal, e.g. dog, communication and behavior. It can also be used to design systems where communication and behavior understanding is used, e.g. for computerized analysis of cognitive decline or motor disease in infants, or signs of pain in animals such as horses.

People

Research Engineers

Zofia Lukasiewicz
Ernest Pokropek (2021-2022)

MSc Students

Magnus Ruben Tibbe (MSc 2024)
Fanxuan Liu (MSc 2024)
Theo Wieland (MSc 2024)
Ioannis Athanasiadis (MSc 2022)
Frans Nordén (MSc 2021)
Michaela Söderström (MSc 2021)
Zhenghong Li (MSc 2020)
Ci Li (MSc 2020)
Olga Mikheeva (MSc 2017)

PhD Students

Chen Ling
Theo Wieland
Sofia Broomé (PhD 2022, now at Sleip, Sweden)
Olga Mikheeva (2017-2022, now at King, Sweden)
Taras Kucherenko (PhD 2021, now at Electronic Arts, Sweden)
Judith Bütepage (co-supervisor, PhD 2019, now at Electronic Arts, Sweden)
Kalin Stefanov (co-supervisor, PhD 2018, now at Monash University, Australia)

Affiliated PhD Students

João Moreira Alves (Aalborg University, Denmark)
Jeanne Parmentier (Utrecht University and University of Twente, Netherlands)
Maheen Rashid (University of California at Davis, USA, PhD 2021, now at Univrses, Sweden)

Post Docs

Yanxia Zhang (2016, now at Toyota Research Institute, USA)

Collaborators

Jonas Beskow
Joakim Gustafson
Pia Haubro Andersen (Swedish University of Agricultural Sciences, Sweden)
Johanna Björklund (Umeå University, Sweden)
Linda Keeling (Swedish University of Agricultural Sciences, Sweden)
Johan Lundström (Karolinska Institutet, Sweden)
Gustaf Mårtensson (Mycronic and Karolinska Institutet, Sweden)
Ulrika Ådén (Karolinska Institutet, Sweden)

Current Projects

ANITA: ANImal TrAnslator (VR 2024-present)

Beagle

In recent years there has been an explosive growth in neural network based algorithms for interpretation and generation of natural language. One task that has been addressed successfully using neural approaches is machine translation from one language to another.
The goal of the project proposed here is to make an automated interpreter of animal communicative behavior to human language, in order to allow humans to get an insight into the mind of animals in their care. In this project we focus on the dog species, and will potentially explore common denominators with other species such as horse and cattle.
Despite the high potential impact, automated animal behavior recognition is still an undeveloped field. The reason is not the signal itself; the dog species has developed for thousands of years together with humans, and have rich communication and interaction both with humans and with other animal individuals. The main difference from the human language field is instead the lack of data; large data volume is a key success factor in training large neural language models. Thus, data collection is an important venture in this project. Using this data, we propose to develop a deep generative approach to animal behavior recognition from video.

Publications

The relation between motion and cognition in infants (SeRC 2023-present)

In this project, which is part of the SeRC Data Science MCP and a collaboration with the Department of Women’s and Children’s health at Karolinska Institutet, we study the relation between motion patterns and cognition and brain function in infants . The currently primary application is detection of motor conditions in neonates, but we will also study more general connections between motion and future development of cognition and language.

Publications

UNCOCO: UNCOnscious COmmunication (WASP 2023-present)

This project, which is part of the WARA Media and Language and a collaboration with the Perceptual Neuroscience group at KI, entails two contributions.

Firstly, we develop a 3D embodied, integrated representation of head pose, gaze and facial micro expression, that can be extracted from a regular 60 Hz video camera and a desk-mounted gaze sensor. The embodied, integrated 3D representation of head pose, gaze and facial micro expression provides a preprocessing step to the second contribution, a deep generative model for inferring the latent emotional state of the human from the non-verbal communicative behavior. The model is employed in three different contexts: 1) estimating user affect for a digital avatar, 2) analyzing human non-verbal behavior connected to sensor stimuli, e.g., quantify approach/avoidance motor response to smell, 3) estimating frustration in a driving scenario.

Publications

STING: Synthesis and analysis with Transducers and Invertible Neural Generators (WASP 2022-present)

Human communication is multimodal in nature, and occurs through combinations of speech,
language, gesture, facial expression, and similar signals. To enable natural interactions with human beings, artificial agents must be capable of both analysing and producing these rich and
interdependent signals, and connect them to their semantic implications. Unfortunately, even the strongest machine learning methods currently fall short of this goal: automated semantic understanding of human behaviour remains superficial, and generated agent behaviours are empty gestures lacking the ability to convey meaning and communicative intent.

The STING NEST, part of the WARA Media and Language, intends to change this state of affairs by uniting synthesis and analysis with transducers and invertible neural models. This involves connecting concrete, continuous valued sensory data such as images, sound, and motion, with high level, predominantly discrete, representations of meaning, which has the potential to endow synthesis output with human understandable highlevel explanations, while simultaneously improving the ability to attach probabilities to semantic representations. The bidirectionality also allows us to create efficient mechanisms for explainability, and to inspect and enforce fairness in the models.
Recent advances in generative models suggest that our ambitious research agenda is likely to be met with success. Normalising flows and variational autoencoders permit both extracting disentangled representations of observations, and (re-)generating observations from these abstract representations, all within a single model. Their recent extensions to graph structured data are of particular interest because graphs are commonly used semantic representations.
This opens the door not only to generating structured information, but also to capturing the composition of the generation itself (which is a graph in its own right) by exploiting and transferring techniques from finite state transducers and graph grammars.

Publications

Wenjie Yin, Ruibo Tu, Hang Yin, Danica Kragic, Hedvig Kjellström, and Mårten Björkman. Controllable motion synthesis and reconstruction with autoregressive diffusion models. In IEEE International Conference on Robot and Human Interactive Communication, 2023.

Project home page

Past Projects

EquineML: Machine Learning methods for recognition of the pain expressions of horses (VR, FORMAS 2017-2022)

Recognition of pain in horses and other animals is important, because pain is a manifestation of disease and decreases animal welfare. Pain diagnostics for humans typically includes self-evaluation and location of the pain with the help of standardized forms, and labeling of the pain by an clinical expert using pain scales. However, animals cannot verbalize their pain as humans can, and the use of standardized pain scales is challenged by the fact that animals as horses and cattle, being prey animals, display subtle and less obvious pain behavior - it is simply beneficial for a prey animal to appear healthy, in order lower the interest from predators. The aim of this project is to develop methods for automatic recognition of pain in horses, with the help of Computer Vision.

Publications

Ernest Pokropek, Sofia Broomé, Pia Haubro Andersen, and Hedvig Kjellström. Predictive modeling of equine activity budgets using a 3D skeleton reconstructed from surveillance recordings. In CVPR Workshop on Computer Vision for Animal Behavior Tracking and Modeling, 2023.
Sofia Broomé, Marcelo Feighelstein, Anna Zamansky, Gabriel Carreira Lencioni, Pia Haubro Andersen, Francisca Pessanha, Marwa Mahmoud, Hedvig Kjellström, and Albert Ali Salah. Going deeper than tracking: A survey of computer-vision based recognition of animal pain and affective state. International Journal of Computer Vision 131, 2023.
Sofia Broomé, Ernest Pokropek, Boyu Li, and Hedvig Kjellström. Recur, Attend or Convolve? Frame dependency modeling matters for cross-domain robustness in action recognition. In IEEE Winter Conference on Applications of Computer Vision, 2023.
- Dataset
Sofia Broomé, Katrina Ask, Maheen Rashid, Pia Haubro Andersen, and Hedvig Kjellström. Sharing pain: Using pain domain transfer for video recognition of low grade orthopedic pain in horses. PLOS ONE 17(3):e0263854, 2022.
Maheen Rashid, Sofia Broomé, Katrina Ask, Elin Hernlund, Pia Haubro Andersen, Hedvig Kjellström, and Yong Jae Lee. Equine pain behavior classification via self-supervised disentangled pose representation. In IEEE Winter Conference on Applications of Computer Vision, 2022.
Pia Haubro Andersen, Sofia Broomé, Maheen Rashid, Johan Lundblad, Katrina Ask, Zhenghong Li, Elin Hernlund, Marie Rhodin, and Hedvig Kjellström. Towards machine recognition of facial expressions of pain in horses. Animals 11(6), 2021.
Zhenghong Li, Sofia Broomé, Pia Haubro Andersen, and Hedvig Kjellström. Automated detection of equine facial action units, arXiv:2102.08983, 2021.
Joonatan Mänttäri*, Sofia Broomé*, John Folkesson, and Hedvig Kjellström. Interpreting video features: A comparison of 3D convolutional networks and convolutional LSTM networks. In Asian Conference on Computer Vision, 2020. (*Joint first authors)
- Videos and figures
Maheen Rashid, Hedvig Kjellström, and Yong Jae Lee. Action graphs: Weakly-supervised action localization with graph convolution networks. In IEEE Winter Conference on Applications of Computer Vision, 2020.
Sofia Broomé, Karina Bech Gleerup, Pia Haubro Andersen, and Hedvig Kjellström. Dynamics are important for the recognition of equine pain in video. In IEEE Conference on Computer Vision and Pattern Recognition, 2019.
Pia Haubro Andersen, Karina Bech Gleerup, Jennifer Wathan, Britt Coles, Hedvig Kjellström, Sofia Broomé, Yong Jae Lee, Maheen Rashid, Claudia Sonder, Erika Rosenberg, and Deborah Forster. Can a machine learn to see horse pain? An interdisciplinary approach towards automated decoding of facial expressions of pain in the horse. In International Conference on Methods and Techniques in Behavioral Research, 2018.

Project home page

EACare: Embodied Agent to support elderly mental wellbeing (SSF, 2016-2021)

The main goal of the multidisciplinary project EACare is to develop an embodied agent – a robot head with communicative skills – capable of interacting with especially elderly people at a clinic or in their home, analyzing their mental and psychological status via powerful audiovisual sensing and assessing their mental abilities to identify subjects in high risk or possibly at the first stages of cognitive decline, with a special focus on Alzheimer’s disease. The interaction is performed according to the procedures developed for memory evaluation sessions, the key part of the diagnostic process for detecting cognitive decline.
This new diagnostic system will be one of the means by which medical doctors evaluate people for cognitive decline, in parallel to the existing methods such as memory evaluation sessions with a (human) clinician, MRI scans, blood tests, etc. Different parts of the framework can also be used for other purposes, such as to develop tools for dementia preventive training and for decision support during clinical memory evaluation sessions.

Publications

Olga Mikheeva, Ieva Kazlauskaite, Adam Hartshorne, Hedvig Kjellström, Carl-Henrik Ek, and Neill DF Campbell. Aligned multi-task Gaussian process. In International Conference on Artificial Intelligence and Statistics, 2022.
Taras Kucherenko, Rajmund Nagy, Michael Neff, Hedvig Kjellström, and Gustav Eje Henter. Multimodal analysis of the predictability of hand-gesture properties. In International Conference on Autonomous Agents and Multi-Agent Systems, 2022.
- Videos and code
Taras Kucherenko, Rajmund Nagy, Patrik Jonell, Michael Neff, Hedvig Kjellström, and Gustav Eje Henter. Speech2Properties2Gestures: Gesture-property prediction as a tool for generating representational gestures from speech. In ACM International Conference on Intelligent Virtual Agents, 2021.
- Videos and code
- IVA 2021 Honorable Mention extended abstracts
Rajmund Nagy*, Taras Kucherenko*, Birger Moëll, André Pereira, Hedvig Kjellström, and Ulysses Bernardet. A framework for integrating gesture generation models into interactive conversational agents. In International Conference on Autonomous Agents and Multiagent Systems, demo track, 2021. (*Joint first authors)
- Videos and code
Patrik Jonell*, Birger Moëll*, Krister Håkansson*, Gustav Eje Henter, Taras Kucherenko, Olga Mikheeva, Göran Hagman, Jasper Holleman, Miia Kivipelto, Hedvig Kjellström, Joakim Gustafson and Jonas Beskow. Multimodal capture of patient behaviour for improved detection of early dementia: Clinical feasibility and preliminary results. Frontiers in Computer Science 3, 2021. (*Joint first authors)
Taras Kucherenko, Dai Hasegawa, Naoshi Kaneko, Gustav Eje Henter, and Hedvig Kjellström. Moving fast and slow: Analysis of representations and post-processing in speech-driven automatic gesture generation. International Journal of Human-Computer Interaction 37(14), 2021.
Krister Håkansson, Jonas Beskow, Hedvig Kjellström, Joakim Gustafsson, Alexandre Bonnard, Marie Rydén, Sara Stormoen, Göran Hagman, Ulrika Akenine, Kristal Morales Peres, Gustav Henter, Maria Sundström, and Miia Kivipelto. Robot-assisted detection of subclinical dementia: progress report and preliminary findings. Alzheimer's & Dementia 16(S6):e043311, 2020.
Taras Kucherenko, Patrik Jonell, Sanne van Waveren, Gustav Eje Henter, Simon Alexanderson, Iolanda Leite, and Hedvig Kjellström. Gesticulator: A framework for semantically-aware speech-driven gesture generation. In ACM International Conference on Multimodal Interaction, 2020.
- Videos and code
- ICMI 2020 Best Paper award
Taras Kucherenko, Dai Hasegawa, Gustav Eje Henter, Naoshi Kaneko, and Hedvig Kjellström. Analyzing input and output representations for speech-driven gesture generation. In ACM Intelligent Virtual Agents Conference, 2019.
- Videos and code
Olga Mikheeva, Carl Henrik Ek, and Hedvig Kjellström. Perceptual facial expression representation. In IEEE International Conference on Automatic Face and Gesture Recognition, 2018.

Project home page

Data-driven modelling of interaction skills for social robots (KTH ICT-TNG 2016-2018)

This project aims to investigate fundamentals of situated and collaborative multi-party interaction and collect the data and knowledge required to build social robots that are able to handle collaborative attention and co-present interaction. In the project we will employ state-of-the art motion- and gaze tracking on a large scale as the basis for modelling and implementing critical non-verbal behaviours such as joint attention, mutual gaze and backchannels in situated human-robot collaborative interaction, in a fluent, adaptive and context sensitive way.

Publications

Kalin Stefanov, Giampiero Salvi, Dimosthenis Kontogiorgos, Hedvig Kjellström, and Jonas Beskow. Modeling of human visual attention in multiparty open-world dialogues. ACM Transactions on Human-Robot Interaction 8(2):1-21, 2019.
Yanxia Zhang, Jonas Beskow, and Hedvig Kjellström. Look but don't stare: Mutual gaze interaction in social robots. In International Conference on Social Robotics, 2017.

HumanAct: Visual and multi-modal learning of Human Activity and interaction with the surrounding scene (VR, EIT ICT Labs 2010-2013)

The overwhelming majority of human activities are interactive in the sense that they relate to the world around the human (in Computer Vision called the "scene"). Despite this, visual analyses of human activity very rarely take scene context into account. The objective in this project is modeling of human activity with object and scene context.

The methods developed within the project will be applied to the task of Learning from Demonstration, where a (household) robot learns how to perform a task (e.g. preparing a dish) by watching a human perform the same task.

Publications

Cheng Zhang, Carl Henrik Ek, Andreas Damianou, and Hedvig Kjellström. Factorized topic models. In International Conference on Learning Representations, 2013.
Cheng Zhang, Dan Song, and Hedvig Kjellström. Contextual modeling with labeled multi-LDA. In IEEE/RSJ International Conference on Intelligent Robots and Systems, 2013.
Alessandro Pieropan, Carl Henrik Ek, and Hedvig Kjellström. Functional object descriptors for human activity modeling. In IEEE International Conference on Robotics and Automation, 2013.
- Video
Hedvig Kjellström, Javier Romero, and Danica Kragic. Visual object-action recognition: Inferring object affordances from human demonstration. Computer Vision and Image Understanding 115, 2011.
Hedvig Kjellström, Danica Kragic, and Michael J. Black. Tracking people interacting with objects. In IEEE Conference on Computer Vision and Pattern Recognition, 2010.

Hedvig Kjellström,
Professor
hedvig@kth.se
+46 8 790 69 06

Studies

Research

Collaboration

About KTH

Library

Communicative Behavior

Portfolio

Contact