TMH Publications (latest 50)

Below are the 50 latest publications from the Department of Speech, Music and Hearing.

TMH Publications

[1]

Best, P., Araya-Salas, M., Ekström, A. G., Freitas, B., Jensen, F. H., Kershenbaum, A. ... Marxer, R. (2025). Bioacoustic fundamental frequency estimation : a cross-species dataset and deep learning baseline. Bioacoustics, 34(4), 419-446.

[2]

Cros Vila, L., Sturm, B., Casini, L. & Dalmazzo, D. (2025). The AI Music Arms Race : On the Detection of AI-Generated Music. Transactions of the International Society for Music Information Retrieval, 8(1), 179-194.

[3]

Torubarova, E. (2025). Brain-Focused Multimodal Approach for Studying Conversational Engagement in HRI. I HRI 2025 - Proceedings of the 2025 ACM/IEEE International Conference on Human-Robot Interaction. (s. 1894-1896). Institute of Electrical and Electronics Engineers (IEEE).

[4]

Torubarova, E., Arvidsson, C., Berrebi, J., Uddén, J., Abelho Pereira, A. T. (2025). NeuroEngage: A Multimodal Dataset Integrating fMRI for Analyzing Conversational Engagement in Human-Human and Human-Robot Interactions. I HRI 2025 - Proceedings of the 2025 ACM/IEEE International Conference on Human-Robot Interaction. (s. 849-858). Institute of Electrical and Electronics Engineers (IEEE).

[5]

Tuttösí, P., Mehta, S., Syvenky, Z., Burkanova, B., Hfsafsti, M., Wang, Y., Yeung, H. H., Henter, G. E., Aucouturier, J. J., Lim, A. (2025). Take a Look, it's in a Book, a Reading Robot. I HRI 2025 - Proceedings of the 2025 ACM/IEEE International Conference on Human-Robot Interaction. (s. 1803-1805). Institute of Electrical and Electronics Engineers (IEEE).

[6]

Irfan, B., Churamani, N., Zhao, M., Ayub, A., Rossi, S. (2025). Lifelong Learning and Personalization in Long-Term Human-Robot Interaction (LEAP-HRI) : Overcoming Inequalities with Adaptation. I HRI 2025 - Proceedings of the 2025 ACM/IEEE International Conference on Human-Robot Interaction. (s. 1970-1972). Institute of Electrical and Electronics Engineers (IEEE).

[7]

Skantze, G., Irfan, B. (2025). Applying General Turn-Taking Models to Conversational Human-Robot Interaction. I HRI 2025 - Proceedings of the 2025 ACM/IEEE International Conference on Human-Robot Interaction. (s. 859-868). Institute of Electrical and Electronics Engineers (IEEE).

[8]

Reimann, M. M., Hindriks, K. V., Kunneman, F. A., Oertel, C., Skantze, G., Leite, I. (2025). What Can You Say to a Robot? Capability Communication Leads to More Natural Conversations. I HRI 2025 - Proceedings of the 2025 ACM/IEEE International Conference on Human-Robot Interaction. (s. 708-716). Institute of Electrical and Electronics Engineers (IEEE).

[9]

Irfan, B., Skantze, G. (2025). Between You and Me: Ethics of Self-Disclosure in Human-Robot Interaction. I HRI 2025 - Proceedings of the 2025 ACM/IEEE International Conference on Human-Robot Interaction. (s. 1357-1362). Institute of Electrical and Electronics Engineers (IEEE).

[10]

Janssens, R., Pereira, A., Skantze, G., Irfan, B., Belpaeme, T. (2025). Online Prediction of User Enjoyment in Human-Robot Dialogue with LLMs. I HRI 2025 - Proceedings of the 2025 ACM/IEEE International Conference on Human-Robot Interaction. (s. 1363-1367). Institute of Electrical and Electronics Engineers (IEEE).

[11]

Cros Vila, L., Sturm, B. (2025). (Mis)Communicating with our AI Systems. I Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems. New York, NY, USA: Association for Computing Machinery (ACM).

[12]

Kamelabad, A. M., Inoue, E., Skantze, G. (2025). Comparing Monolingual and Bilingual Social Robots as Conversational Practice Companions in Language Learning. I Proceedings of the 2025 ACM/IEEE International Conference on Human-Robot Interaction. (s. 829-838).

[13]

Gonzalez Oliveras, P., Engwall, O. & Wilde, A. (2025). Social Educational Robotics and Learning Analytics : A Scoping Review of an Emerging Field. International Journal of Social Robotics.

[14]

Cai, H. & Ternström, S. (2025). A WaveNet-based model for predicting the electroglottographic signal from the acoustic voice signal. Journal of the Acoustical Society of America, 157(4), 3033-3044.

[15]

Marcinek, L., Beskow, J., Gustafsson, J. (2025). A Dual-Control Dialogue Framework for Human-Robot Interaction Data Collection : Integrating Human Emotional and Contextual Awareness with Conversational AI. I Social Robotics - 16th International Conference, ICSR + AI 2024, Proceedings. (s. 290-297). Springer Nature.

[16]

Mishra, C., Skantze, G., Hagoort, P., Verdonschot, R. (2025). Perception of Emotions in Human and Robot Faces : Is the Eye Region Enough?. I Social Robotics - 16th International Conference, ICSR + AI 2024, Proceedings. (s. 290-303). Springer Nature.

[17]

Herbst, C. T., Tokuda, I. T., Nishimura, T., Ternström, S., Ossio, V., Levy, M. ... Dunn, J. C. (2025). ‘Monkey yodels’—frequency jumps in New World monkey vocalizations greatly surpass human vocal register transitions. Philosophical Transactions of the Royal Society of London. Biological Sciences, 380(1923).

[18]

Irfan, B., Kuoppamäki, S., Hosseini, A. & Skantze, G. (2025). Between reality and delusion : challenges of applying large language models to companion robots for open-domain dialogues with older adults. Autonomous Robots, 49(1).

[19]

Borg, A., Georg, C., Jobs, B., Huss, V., Waldenlind, K., Ruiz, M. ... Parodis, I. (2025). Virtual Patient Simulations Using Social Robotics Combined With Large Language Models for Clinical Reasoning Training in Medical Education: Mixed Methods Study. Journal of Medical Internet Research, 27.

[20]

Cai, H. (2025). Mapping voice quality in normal, pathological and synthetic voices (Doktorsavhandling , KTH Royal Institute of Technology, Stockholm, TRITA-EECS-AVL 2025:25). Hämtad från https://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-360211.

[21]

Kanhov, E., Kaila, A.-K. & Sturm, B. L. T. (2025). Innovation, data colonialism and ethics : critical reflections on the impacts of AI on Irish traditional music. Journal of New Music Research, 1-17.

[22]

Włodarczak, M., Ludusan, B., Sundberg, J. & Heldner, M. (2025). Classification of voice quality using neck-surface acceleration : Comparison with glottal flow and radiated sound. Journal of Voice, 39(1), 10-24.

[23]

Székely, É., Hope, M. (2024). An inclusive approach to creating a palette of synthetic voices for gender diversity. I Proc. Interspeech 2024. (s. 3070-3074).

[24]

Green, O., Sturm, B., Born, G., Wald-Fuhrmann, M. (2024). A critical survey of research in music genre recognition. I Proceedings of the 25th International Society for Music Information Retrieval (ISMIR) Conference, 2024. (s. 745-782). International Society for Music Information Retrieval.

[25]

Sturm, B., Kanhov, E., Holzapfel, A. (Red.). (2024). Collected Materials of The First International Conference in AI Music Studies : Prospects, Challenges and Methodologies of Studying AI Music in the Humanities and Social Sciences . KTH Royal Institute of Technology.

[26]

Cao, X., Fan, Z., Svendsen, T., Salvi, G. (2024). A Framework for Phoneme-Level Pronunciation Assessment Using CTC. I Interspeech 2024. (s. 302-306). International Speech Communication Association.

[27]

Blomsma, P., Vaitonyté, J., Skantze, G. & Swerts, M. (2024). Backchannel behavior is idiosyncratic. Language and Cognition, 16(4), 1158-1181.

[28]

Edlund, J., Tånnander, C., Le Maguer, S., Wagner, P. (2024). Assessing the impact of contextual framing on subjective TTS quality. I Interspeech 2024. (s. 1205-1209). International Speech Communication Association.

[29]

Székely, É., Hope, M. (2024). An inclusive approach to creating a palette of synthetic voices for gender diversity. I Interspeech 2024. (s. 3070-3074). International Speech Communication Association.

[30]

Tånnander, C., Mehta, S., Beskow, J., Edlund, J. (2024). Beyond graphemes and phonemes: continuous phonological features in neural text-to-speech synthesis. I Interspeech 2024. (s. 2815-2819). International Speech Communication Association.

[31]

Wang, S., Székely, É., Gustafsson, J. (2024). Contextual Interactive Evaluation of TTS Models in Dialogue Systems. I Interspeech 2024. (s. 2965-2969). International Speech Communication Association.

[32]

Lameris, H., Gustafsson, J., Székely, É. (2024). CreakVC : A Voice Conversion Tool for Modulating Creaky Voice. I Interspeech 2024. (s. 1005-1006). International Speech Communication Association.

[33]

Francis, J., Székely, É., Gustafsson, J. (2024). ConnecTone : A Modular AAC System Prototype with Contextual Generative Text Prediction and Style-Adaptive Conversational TTS. I Interspeech 2024. (s. 1001-1002). International Speech Communication Association.

[34]

Kamelabad, A. M., Engwall, O., Skantze, G. (2024). Conformity and Trust in Multi-party vs. Individual Human-Robot Interaction. I Proceedings of the 24th ACM International Conference on Intelligent Virtual Agents. New York, NY United States: Association for Computing Machinery (ACM).

[35]

(2024). A DIFFICULT CHRISTMAS. .

[36]

Ternström, S., Bernardoni, N. H., Birkholz, P., Guasch, O., Gully, A. (Red.). (2024). Computational Analysis and Simulation of the Human Voice (Dagstuhl Seminar 24242) . Schloss Dagstuhl – Leibniz-Zentrum für Informatik.

[37]

Kynych, F., Cerva, P., Zdansky, J., Svendsen, T. & Salvi, G. (2024). A lightweight approach to real-time speaker diarization : from audio toward audio-visual data streams. EURASIP Journal on Audio, Speech, and Music Processing, 2024(1).

[38]

Green, O., Sturm, B., Born, G., Wald-Fuhrmann, M. (2024). A Critical Survey of Research in Music Genre Recognition. I Proc. International Society for Music Information Retrieval Conference. ISMIR.

[39]

Sturm, B., Déguernel, K., Huang, R. S., Kaila, A.-K., Jääskeläinen, P., Kanhov, E., Cros Vila, L., Dalmazzo, D., Casini, L., Bown, O., Collins, N., Drott, E., Sterne, J., Holzapfel, A., Ben-Tal, O. (2024). AI Music Studies : Preparing for the Coming Flood. I Proceedings of AI Music Creativity..

[40]

Thomé, C., Sturm, B., Pertoft, J., Jonason, N. (2024). Applying textual inversion to control and personalize text-to-music models. I Proc. 15th Int. Workshop on Machine Learning and Music..

[41]

Dalmazzo, D., Déguernel, K., Sturm, B. (2024). ChromaFlow: Modeling And Generating Harmonic Progressions With a Transformer And Voicing Encoding. I MML 2024: 15th International Workshop on Machine Learning and Music, 2024, Vilnius, Lithuania. Vilnius, Lithuania.

[42]

Jansson, M., Tian, K., Hrastinski, S., Engwall, O. (2024). An initial exploration of semi-automated tutoring : How AI could be used as support for online human tutors. I Proceedings of the Fourteenth International Conference on Networked Learning. Aalborg University.

[43]

Arvidsson, C., Torubarova, E., Abelho Pereira, A. T. & Udden, J. (2024). Conversational production and comprehension : fMRI-evidence reminiscent of but deviant from the classical Broca-Wernicke model. Cerebral Cortex, 34(3).

[44]

Jääskeläinen, P., Kanhov, E. (2024). Data Ethics and Practices of Human-Nonhuman Sound Technologies and Ecologies. I VIHAR '24 - 4th International Workshop on Vocal Interactivity in-and-between Humans, Animals and Robots..

[45]

Ekström, A. G. (2024). Correcting the record : Phonetic potential of primate vocal tracts and the legacy of Philip Lieberman (1934−2022). American Journal of Primatology, 86(8).

[46]

Ekström, A. G., Gannon, C., Edlund, J., Moran, S. & Lameira, A. R. (2024). Chimpanzee utterances refute purported missing links for novel vocalizations and syllabic speech. Scientific Reports, 14(1).

[47]

Ekström, A. G. (2024). A Theory That Never Was: Wrong Way to the “Dawn of Speech”. Biolinguistics, 18.

[48]

Kaila, A.-K., Sturm, B. (2024). Agonistic Dialogue on the Value and Impact of AI Music Applications. I Proceedings of the 2024 International Conference on AI and Musical Creativity. Oxford, UK.

[49]

Borg, A., Parodis, I., Skantze, G. (2024). Creating Virtual Patients using Robots and Large Language Models : A Preliminary Study with Medical Students. I HRI 2024 Companion - Companion of the 2024 ACM/IEEE International Conference on Human-Robot Interaction. (s. 273-277). Association for Computing Machinery (ACM).

[50]

Baker, C. P., Sundberg, J., Purdy, S. C., Rakena, T. O. & Leão, S. H. D. S. (2024). CPPS and Voice-Source Parameters : Objective Analysis of the Singing Voice. Journal of Voice, 38(3), 549-560.

Fullständig lista i KTH:s publikationsportal

Utbildning

Forskning

Samverkan

Om KTH

Bibliotek

TMH Publications (latest 50)

TMH Publications

Kontakt