TMH Publications (latest 50)

Below are the 50 latest publications from the Department of Speech, Music and Hearing.

[1]
Mehta, S., Tu, R., Beskow, J., Székely, É., Henter, G. E. (2024). Matcha-TTS: A Fast TTS Architecture with Conditional Flow Matching. In 2024 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2024 - Proceedings. (pp. 11341-11345). Institute of Electrical and Electronics Engineers (IEEE).
[2]
Amerotti, M., Sturm, B., Benford, S., Maruri-Aguilar, H., Vear, C. (2024). Evaluation of an Interactive Music Performance System in the Context of Irish Traditional Dance Music. In Proceedings New Interfaces for Musical Expression NIME’24.
[3]
Jonason, N., Wang, X., Cooper, E., Juvela, L., Sturm, B., Yamagishi, J. (2024). DDSP-based Neural Waveform Synthesis of Polyphonic Guitar Performance from String-wise MIDI Input. In Proceedings of the 27th International Conference on Digital Audio Effects (DAFx24).
[4]
Tånnander, C., O'Regan, J., House, D., Edlund, J., Beskow, J. (2024). Prosodic characteristics of English-accented Swedish neural TTS. In Proceedings of Speech Prosody 2024. (pp. 1035-1039). Leiden, The Netherlands: International Speech Communication Association.
[5]
Misra, S., Boye, J. (2024). Nested Noun Phrase Identification using BERT. In 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, LREC-COLING 2024 - Main Conference Proceedings. (pp. 12138-12143). European Language Resources Association (ELRA).
[6]
Malisz, Z., Foremski, J., Kul, M. (2024). PRODIS - a speech database and a phoneme-based language model for the study of predictability effects in Polish. In 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, LREC-COLING 2024 - Main Conference Proceedings. (pp. 13068-13073). European Language Resources Association (ELRA).
[7]
Inoue, K., Jiang, B., Ekstedt, E., Kawahara, T., Skantze, G. (2024). Multilingual Turn-taking Prediction Using Voice Activity Projection. In 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, LREC-COLING 2024 - Main Conference Proceedings. (pp. 11873-11883). European Language Resources Association (ELRA).
[8]
Tånnander, C., Edlund, J., Gustafsson, J. (2024). Revisiting Three Text-to-Speech Synthesis Experiments with a Web-Based Audience Response System. In 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, LREC-COLING 2024 - Main Conference Proceedings. (pp. 14111-14121). European Language Resources Association (ELRA).
[9]
Wang, S., Székely, É. (2024). Evaluating Text-to-Speech Synthesis from a Large Discrete Token-based Speech Language Model. In 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, LREC-COLING 2024 - Main Conference Proceedings. (pp. 6464-6474). European Language Resources Association (ELRA).
[10]
Lameris, H., Székely, É., Gustafsson, J. (2024). The Role of Creaky Voice in Turn Taking and the Perception of Speaker Stance: Experiments Using Controllable TTS. In 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, LREC-COLING 2024 - Main Conference Proceedings. (pp. 16058-16065). European Language Resources Association (ELRA).
[11]
Irfan, B., Kuoppamäki, S. & Skantze, G. (2024). Recommendations for designing conversational companion robots with older adults through foundation models. Frontiers in Robotics and AI, 11.
[12]
Wennberg, U., Henter, G. E. (2024). Exploring Internal Numeracy in Language Models: A Case Study on ALBERT. In MathNLP 2024: 2nd Workshop on Mathematical Natural Language Processing at LREC-COLING 2024 - Workshop Proceedings. (pp. 35-40). European Language Resources Association (ELRA).
[13]
Esfandiari-Baiat, G., Edlund, J. (2024). The MEET Corpus: Collocated, Distant and Hybrid Three-party Meetings with a Ranking Task. In ISA 2024: 20th Joint ACL - ISO Workshop on Interoperable Semantic Annotation at LREC-COLING 2024, Workshop Proceedings. (pp. 1-7). European Language Resources Association (ELRA).
[14]
Müller, M., Dixon, S., Volk, A., Sturm, B., Rao, P. & Gotham, M. (2024). Introducing the TISMIR Education Track: What, Why, How? Transactions of the International Society for Music Information Retrieval, 7(1), 85-98.
[15]
Casini, L., Jonason, N., Sturm, B. (2024). Investigating the Viability of Masked Language Modeling for Symbolic Music Generation in abc-notation. In Artificial Intelligence in Music, Sound, Art and Design, EvoMUSART 2024. (pp. 84-96). Springer Nature.
[16]
Dalmazzo, D., Deguernel, K., Sturm, B. (2024). The Chordinator: Modeling Music Harmony by Implementing Transformer Networks and Token Strategies. In Artificial Intelligence in Music, Sound, Art and Design, EvoMUSART 2024. (pp. 52-66). Springer Nature.
[17]
Ekström, A. G. (2024). A Theory That Never Was: Wrong Way to the “Dawn of Speech”. Biolinguistics, 18.
[18]
Kaila, A.-K., Sturm, B. (2024). Agonistic Dialogue on the Value and Impact of AI Music Applications. In Proceedings of the 2024 International Conference on AI and Musical Creativity. Oxford, UK.
[19]
Iob, N. A., He, L., Ternström, S., Cai, H. & Brockmann-Bauser, M. (2024). Effects of Speech Characteristics on Electroglottographic and Instrumental Acoustic Voice Analysis Metrics in Women With Structural Dysphonia Before and After Treatment. Journal of Speech, Language and Hearing Research, 1-22.
[20]
Ternström, S. (2024). Pragmatic De-Noising of Electroglottographic Signals. Bioengineering, 11(5), 479.
[21]
Cai, H., Ternström, S., Chaffanjon, P. & Henrich Bernardoni, N. (2024). Effects on Voice Quality of Thyroidectomy: A Qualitative and Quantitative Study Using Voice Maps. Journal of Voice.
[22]
Traum, D., Skantze, G., Nishizaki, H., Higashinaka, R., Minato, T. & Nagai, T. (2024). Special issue on multimodal processing and robotics for dialogue systems (Part II). Advanced Robotics, 38(4), 193-194.
[23]
Borg, A., Parodis, I., Skantze, G. (2024). Creating Virtual Patients using Robots and Large Language Models: A Preliminary Study with Medical Students. In HRI 2024 Companion - Companion of the 2024 ACM/IEEE International Conference on Human-Robot Interaction. (pp. 273-277). Association for Computing Machinery (ACM).
[24]
Ashkenazi, S., Skantze, G., Stuart-Smith, J., Foster, M. E. (2024). Goes to the Heart: Speaking the User's Native Language. In HRI 2024 Companion - Companion of the 2024 ACM/IEEE International Conference on Human-Robot Interaction. (pp. 214-218). Association for Computing Machinery (ACM).
[25]
Kamelabad, A. M. (2024). The Question Is Not Whether; It Is How! In HRI 2024 Companion - Companion of the 2024 ACM/IEEE International Conference on Human-Robot Interaction. (pp. 112-114). Association for Computing Machinery (ACM).
[26]
Irfan, B., Staffa, M., Bobu, A., Churamani, N. (2024). Lifelong Learning and Personalization in Long-Term Human-Robot Interaction (LEAP-HRI): Open-World Learning. In HRI 2024 Companion - Companion of the 2024 ACM/IEEE International Conference on Human-Robot Interaction. (pp. 1323-1325). Association for Computing Machinery (ACM).
[27]
Axelsson, A., Vaddadi, B., Bogdan, C. M., Skantze, G. (2024). Robots in autonomous buses: Who hosts when no human is there? In HRI 2024 Companion - Companion of the 2024 ACM/IEEE International Conference on Human-Robot Interaction. (pp. 1278-1280). Association for Computing Machinery (ACM).
[28]
Wolfert, P., Henter, G. E. & Belpaeme, T. (2024). Exploring the Effectiveness of Evaluation Practices for Computer-Generated Nonverbal Behaviour. Applied Sciences, 14(4).
[30]
Cumbal, R., Engwall, O. (2024). Speaking Transparently: Social Robots in Educational Settings. In Companion of the 2024 ACM/IEEE International Conference on Human-Robot Interaction (HRI '24 Companion), March 11-14, 2024, Boulder, CO, USA.
[31]
Cumbal, R. (2024). Robots Beyond Borders: The Role of Social Robots in Spoken Second Language Practice (Doctoral thesis, KTH Royal Institute of Technology, Stockholm, TRITA-EECS-AVL 2024:23). Retrieved from https://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-343863.
[33]
Sundberg, J., Salomão, G. L. & Scherer, K. R. (2024). Emotional expressivity in singing: Assessing physiological and acoustic indicators of two opera singers' voice characteristics. Journal of the Acoustical Society of America, 155(1), 18-28.
[35]
Rosenberg, S., Sundberg, J. & Lã, F. (2024). Kulning: Acoustic and Perceptual Characteristics of a Calling Style Used Within the Scandinavian Herding Tradition. Journal of Voice, 38(3), 585-594.
[36]
Baker, C. P., Sundberg, J., Purdy, S. C., Rakena, T. O. & Leão, S. H. D. S. (2024). CPPS and Voice-Source Parameters: Objective Analysis of the Singing Voice. Journal of Voice, 38(3), 549-560.
[37]
Körner Gustafsson, J., Södersten, M., Ternström, S. & Schalling, E. (2024). Treatment of Hypophonia in Parkinson’s Disease Through Biofeedback in Daily Life Administered with A Portable Voice Accumulator. Journal of Voice, 38(3), 800.e27-800.e38.
[38]
Kaila, A.-K., Holzapfel, A., Sturm, B. (2023). Are we solving the wrong problems – and doing harm in the process? In The International Conference on AI and Musical Creativity, Alt-AIMC track.
[39]
Wolfert, P., Henter, G. E., Belpaeme, T. (2023). "Am I listening?", Evaluating the Quality of Generated Data-driven Listening Motion. In ICMI 2023 Companion: Companion Publication of the 25th International Conference on Multimodal Interaction. (pp. 6-10). Association for Computing Machinery (ACM).
[40]
Axelsson, A. (2023). Adaptive Robot Presenters: Modelling Grounding in Multimodal Interaction (Doctoral thesis, KTH Royal Institute of Technology, Stockholm, TRITA-EECS-AVL 2023:70). Retrieved from https://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-338178.
[41]
Cao, X., Fan, Z., Svendsen, T., Salvi, G. (2023). An Analysis of Goodness of Pronunciation for Child Speech. In Interspeech 2023. (pp. 4613-4617). International Speech Communication Association.
[42]
Tånnander, C., House, D., Edlund, J. (2023). Analysis-by-synthesis: phonetic-phonological variation in deep neural network-based text-to-speech synthesis. In Proceedings of the 20th International Congress of Phonetic Sciences, Prague 2023. (pp. 3156-3160). Prague, Czech Republic: GUARANT International.
[43]
Sturm, B., Flexer, A. (2023). A Review of Validity and its Relationship to Music Information Research. In Proc. Int. Symp. Music Information Retrieval.
[44]
Amerotti, M., Benford, S., Sturm, B., Vear, C. (2023). A Live Performance Rule System Informed by Irish Traditional Dance Music. In Proc. International Symposium on Computer Music Multidisciplinary Research.
[45]
Wang, S., Henter, G. E., Gustafsson, J., Székely, É. (2023). A Comparative Study of Self-Supervised Speech Representations in Read and Spontaneous TTS. In ICASSPW 2023: 2023 IEEE International Conference on Acoustics, Speech and Signal Processing Workshops, Proceedings. Institute of Electrical and Electronics Engineers (IEEE).
[46]
Peña, P. R., Doyle, P. R., Ip, E. Y., Di Liberto, G., Higgins, D., McDonnell, R., Branigan, H., Gustafsson, J., McMillan, D., Moore, R. J., Cowan, B. R. (2023). A Special Interest Group on Developing Theories of Language Use in Interaction with Conversational User Interfaces. In CHI 2023: Extended Abstracts of the 2023 CHI Conference on Human Factors in Computing Systems. Association for Computing Machinery (ACM).
[47]
Nyatsanga, S., Kucherenko, T., Ahuja, C., Henter, G. E. & Neff, M. (2023). A Comprehensive Review of Data-Driven Co-Speech Gesture Generation. Computer graphics forum (Print), 42(2), 569-596.
[48]
Pérez Zarazaga, P., Henter, G. E., Malisz, Z. (2023). A processing framework to access large quantities of whispered speech found in ASMR. In ICASSP 2023: 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Rhodes, Greece: IEEE Signal Processing Society.
[49]
Wang, S., Henter, G. E., Gustafsson, J., Székely, É. (2023). A comparative study of self-supervised speech representations in read and spontaneous TTS. (Manuscript).
[50]
Adiban, M., Siniscalchi, S. M. & Salvi, G. (2023). A step-by-step training method for multi generator GANs with application to anomaly detection and cybersecurity. Neurocomputing, 537, 296-308.
Full list in the KTH publications portal