TMH Publications (latest 50)

Below are the 50 latest publications from the Department of Speech, Music and Hearing.

[1]
Green, O., Sturm, B., Born, G., Wald-Fuhrmann, M. (2024). A Critical Survey of Research in Music Genre Recognition. In Proc. International Society for Music Information Retrieval Conference. ISMIR.
[2]
Sturm, B., Déguernel, K., Huang, R. S., Kaila, A.-K., Jääskeläinen, P., Kanhov, E., Cros Vila, L., Dalmazzo, D., Casini, L., Bown, O., Collins, N., Drott, E., Sterne, J., Holzapfel, A., Ben-Tal, O. (2024). AI Music Studies : Preparing for the Coming Flood. In Proceedings of AI Music Creativity.
[3]
Thomé, C., Sturm, B., Pertoft, J., Jonason, N. (2024). Applying textual inversion to control and personalize text-to-music models. In Proc. 15th Int. Workshop on Machine Learning and Music.
[4]
Dalmazzo, D., Déguernel, K., Sturm, B. (2024). ChromaFlow: Modeling And Generating Harmonic Progressions With a Transformer And Voicing Encoding. In MML 2024: 15th International Workshop on Machine Learning and Music, 2024, Vilnius, Lithuania.
[5]
Kanhov, E. (2024). Entanglements with Deepfake : AI Voice Models and their Diffractive Potential. Presented at 12th New Materialisms Conference. Intersectional Materialisms: Diversity in Creative Industries, Methods & Practices. 26-28 August, 2024, Kildare, Ireland.
[6]
Willemsen, B., Skantze, G. (2024). Referring Expression Generation in Visually Grounded Dialogue with Discourse-aware Comprehension Guiding. Presented at 17th International Natural Language Generation Conference (INLG). (pp. 453-469). Association for Computational Linguistics.
[7]
Borg, A., Jobs, B., Huss, V., Gentline, C., Espinosa, F., Ruiz, M. ... Parodis, I. (2024). Enhancing clinical reasoning skills for medical students : a qualitative comparison of LLM-powered social robotic versus computer-based virtual patients within rheumatology. Rheumatology International.
[8]
Mehta, S., Deichler, A., O'Regan, J., Moëll, B., Beskow, J., Henter, G. E., Alexanderson, S. (2024). Fake it to make it : Using synthetic data to remedy the data shortage in joint multimodal speech-and-gesture synthesis. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. (pp. 1952-1964).
[9]
Benford, S., Amerotti, M., Sturm, B., Avila, J. M. (2024). Negotiating Autonomy and Trust when Performing with an AI Musician. In TAS 2024 - Proceedings of the 2nd International Symposium on Trustworthy Autonomous Systems. Association for Computing Machinery (ACM).
[10]
Wang, Y., Xu, Y., Skantze, G., Buschmeier, H. (2024). How Much Does Nonverbal Communication Conform to Entropy Rate Constancy? : A Case Study on Listener Gaze in Interaction. In 62nd Annual Meeting of the Association for Computational Linguistics, ACL 2024 - Proceedings of the Conference. (pp. 3533-3545). Association for Computational Linguistics (ACL).
[11]
Senane, Z., Cao, L., Buchner, V. L., Tashiro, Y., You, L., Herman, P., Nordahl, M., Tu, R., Von Ehrenheim, V. (2024). Self-Supervised Learning of Time Series Representation via Diffusion Process and Imputation-Interpolation-Forecasting Mask. In KDD 2024 - Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. (pp. 2560-2571). Association for Computing Machinery (ACM).
[12]
Casini, L., Jonason, N., Sturm, B. (2024). Sparks of Musical AGI? Challenges and perspectives in music co-creation with LLMs : A qualitative exploration of the music knowledge of LLMs and their use for music creation. Presented at International Conference on AI and Musical Creativity (AIMC) 2024, Oxford UK, 9 - 11 September 2024.
[13]
Engström, H., Włodarczak, M., Ternström, S. (2024). Mapping the effect of body position : Voice quality differences in connected speech. In Proceedings of FONETIK 2024, Stockholm, June 3–5, 2024. (pp. 21-26). Stockholm University.
[14]
Rafiei, S., Brunnström, K., Schenkman, B., Andersson, J., Sjöström, M. (2024). Laboratory study : Human Interaction using Remote Control System for Airport Safety Management. In 2024 16th International Conference on Quality of Multimedia Experience, QoMEX 2024. (pp. 167-170). Institute of Electrical and Electronics Engineers (IEEE).
[15]
Kucherenko, T., Wolfert, P., Yoon, Y., Viegas, C., Nikolov, T., Tsakov, M. & Henter, G. E. (2024). Evaluating Gesture Generation in a Large-scale Open Challenge : The GENEA Challenge 2022. ACM Transactions on Graphics, 43(3).
[16]
Jansson, M., Tian, K., Hrastinski, S., Engwall, O. (2024). An initial exploration of semi-automated tutoring : How AI could be used as support for online human tutors. In Proceedings of the Fourteenth International Conference on Networked Learning. Aalborg University.
[17]
Arvidsson, C., Torubarova, E., Abelho Pereira, A. T. & Udden, J. (2024). Conversational production and comprehension : fMRI-evidence reminiscent of but deviant from the classical Broca-Wernicke model. Cerebral Cortex, 34(3).
[18]
Jääskeläinen, P., Kanhov, E. (2024). Data Ethics and Practices of Human-Nonhuman Sound Technologies and Ecologies. In VIHAR '24 - 4th International Workshop on Vocal Interactivity in-and-between Humans, Animals and Robots.
[19]
Ekström, A. (2024). Phonetic potential in the extant apes and extinct hominins (Doctoral thesis, KTH Royal Institute of Technology, Stockholm, Sweden, TRITA-EECS-AVL 55). Retrieved from https://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-351250.
[21]
Ekström, A. G., Gannon, C., Edlund, J., Moran, S. & Lameira, A. R. (2024). Chimpanzee utterances refute purported missing links for novel vocalizations and syllabic speech. Scientific Reports, 14(1).
[22]
Malmberg, F., Klezovich, A., Mesch, J., Beskow, J. (2024). Exploring Latent Sign Language Representations with Isolated Signs, Sentences and In-the-Wild Data. In 11th Workshop on the Representation and Processing of Sign Languages: Evaluation of Sign Language Resources, sign-lang@LREC-COLING 2024. (pp. 219-224). Association for Computational Linguistics (ACL).
[23]
Mehta, S., Tu, R., Beskow, J., Székely, É., Henter, G. E. (2024). Matcha-TTS: A Fast TTS Architecture with Conditional Flow Matching. In 2024 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2024 - Proceedings. (pp. 11341-11345). Institute of Electrical and Electronics Engineers (IEEE).
[24]
Amerotti, M., Sturm, B., Benford, S., Maruri-Aguilar, H., Vear, C. (2024). Evaluation of an Interactive Music Performance System in the Context of Irish Traditional Dance Music. In Proceedings New Interfaces for Musical Expression NIME’24. International Conference on New Interfaces for Musical Expression.
[25]
Jonason, N., Wang, X., Cooper, E., Juvela, L., Sturm, B., Yamagishi, J. (2024). DDSP-based Neural Waveform Synthesis of Polyphonic Guitar Performance from String-wise MIDI Input. In Proceedings of the 27th International Conference on Digital Audio Effects (DAFx24).
[26]
Tånnander, C., O'Regan, J., House, D., Edlund, J., Beskow, J. (2024). Prosodic characteristics of English-accented Swedish neural TTS. In Proceedings of Speech Prosody 2024. (pp. 1035-1039). Leiden, The Netherlands: International Speech Communication Association.
[27]
Misra, S., Boye, J. (2024). Nested Noun Phrase Identification using BERT. In 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, LREC-COLING 2024 - Main Conference Proceedings. (pp. 12138-12143). European Language Resources Association (ELRA).
[28]
Malisz, Z., Foremski, J., Kul, M. (2024). PRODIS - a speech database and a phoneme-based language model for the study of predictability effects in Polish. In 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, LREC-COLING 2024 - Main Conference Proceedings. (pp. 13068-13073). European Language Resources Association (ELRA).
[29]
Inoue, K., Jiang, B., Ekstedt, E., Kawahara, T., Skantze, G. (2024). Multilingual Turn-taking Prediction Using Voice Activity Projection. In 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, LREC-COLING 2024 - Main Conference Proceedings. (pp. 11873-11883). European Language Resources Association (ELRA).
[30]
Tånnander, C., Edlund, J., Gustafsson, J. (2024). Revisiting Three Text-to-Speech Synthesis Experiments with a Web-Based Audience Response System. In 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, LREC-COLING 2024 - Main Conference Proceedings. (pp. 14111-14121). European Language Resources Association (ELRA).
[31]
Wang, S., Székely, É. (2024). Evaluating Text-to-Speech Synthesis from a Large Discrete Token-based Speech Language Model. In 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, LREC-COLING 2024 - Main Conference Proceedings. (pp. 6464-6474). European Language Resources Association (ELRA).
[32]
Irfan, B., Kuoppamäki, S. & Skantze, G. (2024). Recommendations for designing conversational companion robots with older adults through foundation models. Frontiers in Robotics and AI, 11.
[33]
Wennberg, U., Henter, G. E. (2024). Exploring Internal Numeracy in Language Models: A Case Study on ALBERT. In MathNLP 2024: 2nd Workshop on Mathematical Natural Language Processing at LREC-COLING 2024 - Workshop Proceedings. (pp. 35-40). European Language Resources Association (ELRA).
[34]
Müller, M., Dixon, S., Volk, A., Sturm, B., Rao, P. & Gotham, M. (2024). Introducing the TISMIR Education Track: What, Why, How? Transactions of the International Society for Music Information Retrieval, 7(1), 85-98.
[35]
Casini, L., Jonason, N., Sturm, B. (2024). Investigating the Viability of Masked Language Modeling for Symbolic Music Generation in abc-notation. In Artificial Intelligence in Music, Sound, Art and Design, EvoMUSART 2024. (pp. 84-96). Springer Nature.
[36]
Ekström, A. G. (2024). A Theory That Never Was: Wrong Way to the “Dawn of Speech”. Biolinguistics, 18.
[37]
Kaila, A.-K., Sturm, B. (2024). Agonistic Dialogue on the Value and Impact of AI Music Applications. In Proceedings of the 2024 International Conference on AI and Musical Creativity. Oxford, UK.
[38]
Iob, N. A., He, L., Ternström, S., Cai, H. & Brockmann-Bauser, M. (2024). Effects of Speech Characteristics on Electroglottographic and Instrumental Acoustic Voice Analysis Metrics in Women With Structural Dysphonia Before and After Treatment. Journal of Speech, Language and Hearing Research, 1-22.
[39]
Ternström, S. (2024). Pragmatic De-Noising of Electroglottographic Signals. Bioengineering, 11(5), 479.
[40]
Cai, H., Ternström, S., Chaffanjon, P. & Henrich Bernardoni, N. (2024). Effects on Voice Quality of Thyroidectomy : A Qualitative and Quantitative Study Using Voice Maps. Journal of Voice.
[41]
Borg, A., Parodis, I., Skantze, G. (2024). Creating Virtual Patients using Robots and Large Language Models : A Preliminary Study with Medical Students. In HRI 2024 Companion - Companion of the 2024 ACM/IEEE International Conference on Human-Robot Interaction. (pp. 273-277). Association for Computing Machinery (ACM).
[42]
Ashkenazi, S., Skantze, G., Stuart-Smith, J., Foster, M. E. (2024). Goes to the Heart: Speaking the User's Native Language. In HRI 2024 Companion - Companion of the 2024 ACM/IEEE International Conference on Human-Robot Interaction. (pp. 214-218). Association for Computing Machinery (ACM).
[43]
Irfan, B., Staffa, M., Bobu, A., Churamani, N. (2024). Lifelong Learning and Personalization in Long-Term Human-Robot Interaction (LEAP-HRI): Open-World Learning. In HRI 2024 Companion - Companion of the 2024 ACM/IEEE International Conference on Human-Robot Interaction. (pp. 1323-1325). Association for Computing Machinery (ACM).
[44]
Axelsson, A., Vaddadi, B., Bogdan, C. M., Skantze, G. (2024). Robots in autonomous buses: Who hosts when no human is there? In HRI 2024 Companion - Companion of the 2024 ACM/IEEE International Conference on Human-Robot Interaction. (pp. 1278-1280). Association for Computing Machinery (ACM).
[45]
Wolfert, P., Henter, G. E. & Belpaeme, T. (2024). Exploring the Effectiveness of Evaluation Practices for Computer-Generated Nonverbal Behaviour. Applied Sciences, 14(4).
[46]
Cumbal, R. (2024). Robots Beyond Borders : The Role of Social Robots in Spoken Second Language Practice (Doctoral thesis, KTH Royal Institute of Technology, Stockholm, TRITA-EECS-AVL 2024:23). Retrieved from https://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-343863.
[47]
Sundberg, J., Salomão, G. L. & Scherer, K. R. (2024). Emotional expressivity in singing : Assessing physiological and acoustic indicators of two opera singers' voice characteristics. Journal of the Acoustical Society of America, 155(1), 18-28.
[49]
Rosenberg, S., Sundberg, J. & Lã, F. (2024). Kulning : Acoustic and Perceptual Characteristics of a Calling Style Used Within the Scandinavian Herding Tradition. Journal of Voice, 38(3), 585-594.
[50]
Baker, C. P., Sundberg, J., Purdy, S. C., Rakena, T. O. & Leão, S. H. D. S. (2024). CPPS and Voice-Source Parameters : Objective Analysis of the Singing Voice. Journal of Voice, 38(3), 549-560.
Full list in the KTH publications portal