TMH Publications (latest 50)
Below are the 50 latest publications from the Department of Speech, Music and Hearing.
[1]
Inoue, K., Jiang, B., Ekstedt, E., Kawahara, T., Skantze, G. (2024).
Multilingual Turn-taking Prediction Using Voice Activity Projection.
In 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, LREC-COLING 2024 - Main Conference Proceedings. (pp. 11873-11883). European Language Resources Association (ELRA).
[2]
Irfan, B., Kuoppamäki, S. & Skantze, G. (2024).
Recommendations for designing conversational companion robots with older adults through foundation models.
Frontiers in Robotics and AI, 11.
[3]
Wennberg, U., Henter, G. E. (2024).
Exploring Internal Numeracy in Language Models: A Case Study on ALBERT.
In MathNLP 2024: 2nd Workshop on Mathematical Natural Language Processing at LREC-COLING 2024 - Workshop Proceedings. (pp. 35-40). European Language Resources Association (ELRA).
[4]
Esfandiari-Baiat, G., Edlund, J. (2024).
The MEET Corpus: Collocated, Distant and Hybrid Three-party Meetings with a Ranking Task.
In ISA 2024: 20th Joint ACL - ISO Workshop on Interoperable Semantic Annotation at LREC-COLING 2024, Workshop Proceedings. (pp. 1-7). European Language Resources Association (ELRA).
[5]
Müller, M., Dixon, S., Volk, A., Sturm, B., Rao, P. & Gotham, M. (2024).
Introducing the TISMIR Education Track: What, Why, How?
Transactions of the International Society for Music Information Retrieval, 7(1), 85-98.
[6]
Casini, L., Jonason, N., Sturm, B. (2024).
Investigating the Viability of Masked Language Modeling for Symbolic Music Generation in abc-notation.
In Artificial Intelligence in Music, Sound, Art and Design, EvoMUSART 2024. (pp. 84-96). Springer Nature.
[7]
Dalmazzo, D., Deguernel, K., Sturm, B. (2024).
The Chordinator: Modeling Music Harmony by Implementing Transformer Networks and Token Strategies.
In Artificial Intelligence in Music, Sound, Art and Design, EvoMUSART 2024. (pp. 52-66). Springer Nature.
[8]
Ekström, A. G. (2024).
A Theory That Never Was: Wrong Way to the “Dawn of Speech”.
Biolinguistics, 18.
[9]
Kaila, A.-K., Sturm, B. (2024).
Agonistic Dialogue on the Value and Impact of AI Music Applications.
In Proceedings of the 2024 International Conference on AI and Musical Creativity. Oxford, UK.
[10]
Iob, N. A., He, L., Ternström, S., Cai, H. & Brockmann-Bauser, M. (2024).
Effects of Speech Characteristics on Electroglottographic and Instrumental Acoustic Voice Analysis Metrics in Women With Structural Dysphonia Before and After Treatment.
Journal of Speech, Language and Hearing Research, 1-22.
[11]
Ternström, S. (2024).
Pragmatic De-Noising of Electroglottographic Signals.
Bioengineering, 11(5), 479.
[12]
Cai, H., Ternström, S., Chaffanjon, P. & Henrich Bernardoni, N. (2024).
Effects on Voice Quality of Thyroidectomy: A Qualitative and Quantitative Study Using Voice Maps.
Journal of Voice.
[13]
Traum, D., Skantze, G., Nishizaki, H., Higashinaka, R., Minato, T. & Nagai, T. (2024).
Special issue on multimodal processing and robotics for dialogue systems (Part II).
Advanced Robotics, 38(4), 193-194.
[14]
Borg, A., Parodis, I., Skantze, G. (2024).
Creating Virtual Patients using Robots and Large Language Models: A Preliminary Study with Medical Students.
In HRI 2024 Companion - Companion of the 2024 ACM/IEEE International Conference on Human-Robot Interaction. (pp. 273-277). Association for Computing Machinery (ACM).
[15]
Ashkenazi, S., Skantze, G., Stuart-Smith, J., Foster, M. E. (2024).
Goes to the Heart: Speaking the User's Native Language.
In HRI 2024 Companion - Companion of the 2024 ACM/IEEE International Conference on Human-Robot Interaction. (pp. 214-218). Association for Computing Machinery (ACM).
[16]
Kamelabad, A. M. (2024).
The Question Is Not Whether; It Is How!
In HRI 2024 Companion - Companion of the 2024 ACM/IEEE International Conference on Human-Robot Interaction. (pp. 112-114). Association for Computing Machinery (ACM).
[17]
Irfan, B., Staffa, M., Bobu, A., Churamani, N. (2024).
Lifelong Learning and Personalization in Long-Term Human-Robot Interaction (LEAP-HRI): Open-World Learning.
In HRI 2024 Companion - Companion of the 2024 ACM/IEEE International Conference on Human-Robot Interaction. (pp. 1323-1325). Association for Computing Machinery (ACM).
[18]
Axelsson, A., Vaddadi, B., Bogdan, C. M., Skantze, G. (2024).
Robots in autonomous buses: Who hosts when no human is there?
In HRI 2024 Companion - Companion of the 2024 ACM/IEEE International Conference on Human-Robot Interaction. (pp. 1278-1280). Association for Computing Machinery (ACM).
[19]
Wolfert, P., Henter, G. E. & Belpaeme, T. (2024).
Exploring the Effectiveness of Evaluation Practices for Computer-Generated Nonverbal Behaviour.
Applied Sciences, 14(4).
[20]
Mehta, S., Frisk, K. & Nyborg, L. (2024).
Role of Cr in Mn-rich precipitates for Al–Mn–Cr–Zr-based alloys tailored for additive manufacturing.
Calphad, 84.
[21]
Cumbal, R., Engwall, O. (2024).
Speaking Transparently: Social Robots in Educational Settings.
In Companion of the 2024 ACM/IEEE International Conference on Human-Robot Interaction (HRI '24 Companion), March 11-14, 2024, Boulder, CO, USA.
[22]
Cumbal, R. (2024).
Robots Beyond Borders: The Role of Social Robots in Spoken Second Language Practice
(Doctoral thesis , KTH Royal Institute of Technology, Stockholm, TRITA-EECS-AVL 2024:23). Retrieved from https://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-343863.
[23]
Ternström, S. (2024).
Update 3.1 to FonaDyn: A system for real-time analysis of the electroglottogram, over the voice range.
SoftwareX, 26.
[24]
Sundberg, J., Salomão, G. L. & Scherer, K. R. (2024).
Emotional expressivity in singing: Assessing physiological and acoustic indicators of two opera singers' voice characteristics.
Journal of the Acoustical Society of America, 155(1), 18-28.
[25]
Kalpakchi, D. & Boye, J. (2024).
Quinductor: A multilingual data-driven method for generating reading-comprehension questions using Universal Dependencies.
Natural Language Engineering, 217-255.
[26]
Rosenberg, S., Sundberg, J. & Lã, F. (2024).
Kulning: Acoustic and Perceptual Characteristics of a Calling Style Used Within the Scandinavian Herding Tradition.
Journal of Voice, 38(3), 585-594.
[27]
Baker, C. P., Sundberg, J., Purdy, S. C., Rakena, T. O. & Leão, S. H. D. S. (2024).
CPPS and Voice-Source Parameters: Objective Analysis of the Singing Voice.
Journal of Voice, 38(3), 549-560.
[28]
Körner Gustafsson, J., Södersten, M., Ternström, S. & Schalling, E. (2024).
Treatment of Hypophonia in Parkinson’s Disease Through Biofeedback in Daily Life Administered with A Portable Voice Accumulator.
Journal of Voice, 38(3), 800.e27-800.e38.
[29]
Kaila, A.-K., Holzapfel, A., Sturm, B. (2023).
Are we solving the wrong problems – and doing harm in the process?
In The International Conference on AI and Musical Creativity, Alt-AIMC track.
[30]
Torre, I., Lagerstedt, E., Dennler, N., Seaborn, K., Leite, I., Székely, É. (2023).
Can a gender-ambiguous voice reduce gender stereotypes in human-robot interactions?
In 2023 32nd IEEE International Conference on Robot and Human Interactive Communication, RO-MAN. (pp. 106-112). Institute of Electrical and Electronics Engineers (IEEE).
[31]
D'Amario, S., Ternström, S., Goebl, W. & Bishop, L. (2023).
Body motion of choral singers.
Frontiers in Psychology, 14.
[32]
Wolfert, P., Henter, G. E., Belpaeme, T. (2023).
"Am I listening?", Evaluating the Quality of Generated Data-driven Listening Motion.
In ICMI 2023 Companion: Companion Publication of the 25th International Conference on Multimodal Interaction. (pp. 6-10). Association for Computing Machinery (ACM).
[33]
Axelsson, A. (2023).
Adaptive Robot Presenters: Modelling Grounding in Multimodal Interaction
(Doctoral thesis , KTH Royal Institute of Technology, Stockholm, TRITA-EECS-AVL 2023:70). Retrieved from https://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-338178.
[34]
Ekstedt, E., Wang, S., Székely, É., Gustafsson, J., Skantze, G. (2023).
Automatic Evaluation of Turn-taking Cues in Conversational Speech Synthesis.
In Interspeech 2023. (pp. 5481-5485). International Speech Communication Association.
[35]
Cao, X., Fan, Z., Svendsen, T., Salvi, G. (2023).
An Analysis of Goodness of Pronunciation for Child Speech.
In Interspeech 2023. (pp. 4613-4617). International Speech Communication Association.
[36]
Lameris, H., Gustafsson, J., Székely, É. (2023).
Beyond style: synthesizing speech with pragmatic functions.
In Interspeech 2023. (pp. 3382-3386). International Speech Communication Association.
[37]
Kalpakchi, D. (2023).
Ask and distract: Data-driven methods for the automatic generation of multiple-choice reading comprehension questions from Swedish texts
(Doctoral thesis , KTH Royal Institute of Technology, TRITA-EECS-AVL 2023:56). Retrieved from https://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-336531.
[38]
Tånnander, C., House, D., Edlund, J. (2023).
Analysis-by-synthesis: phonetic-phonological variation in deep neural network-based text-to-speech synthesis.
In Proceedings of the 20th International Congress of Phonetic Sciences, Prague 2023. (pp. 3156-3160). Prague, Czech Republic: GUARANT International.
[39]
Sturm, B., Flexer, A. (2023).
A Review of Validity and its Relationship to Music Information Research.
In Proc. Int. Symp. Music Information Retrieval.
[40]
Amerotti, M., Benford, S., Sturm, B., Vear, C. (2023).
A Live Performance Rule System Informed by Irish Traditional Dance Music.
In Proc. International Symposium on Computer Music Multidisciplinary Research.
[41]
Wang, S., Henter, G. E., Gustafsson, J., Székely, É. (2023).
A Comparative Study of Self-Supervised Speech Representations in Read and Spontaneous TTS.
In ICASSPW 2023: 2023 IEEE International Conference on Acoustics, Speech and Signal Processing Workshops, Proceedings. Institute of Electrical and Electronics Engineers (IEEE).
[42]
Peña, P. R., Doyle, P. R., Ip, E. Y., Di Liberto, G., Higgins, D., McDonnell, R., Branigan, H., Gustafsson, J., McMillan, D., Moore, R. J., Cowan, B. R. (2023).
A Special Interest Group on Developing Theories of Language Use in Interaction with Conversational User Interfaces.
In CHI 2023: Extended Abstracts of the 2023 CHI Conference on Human Factors in Computing Systems. Association for Computing Machinery (ACM).
[43]
Nyatsanga, S., Kucherenko, T., Ahuja, C., Henter, G. E. & Neff, M. (2023).
A Comprehensive Review of Data-Driven Co-Speech Gesture Generation.
Computer Graphics Forum, 42(2), 569-596.
[44]
Leijon, A., von Gablenz, P., Holube, I., Taghia, J. & Smeds, K. (2023).
Bayesian analysis of Ecological Momentary Assessment (EMA) data collected in adults before and after hearing rehabilitation.
Frontiers in Digital Health, 5.
[45]
Pérez Zarazaga, P., Henter, G. E., Malisz, Z. (2023).
A processing framework to access large quantities of whispered speech found in ASMR.
In ICASSP 2023: 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Rhodes, Greece: IEEE Signal Processing Society.
[46]
Wang, S., Henter, G. E., Gustafsson, J., Székely, É. (2023).
A comparative study of self-supervised speech representations in read and spontaneous TTS.
(Manuscript).
[47]
Adiban, M., Siniscalchi, S. M. & Salvi, G. (2023).
A step-by-step training method for multi generator GANs with application to anomaly detection and cybersecurity.
Neurocomputing, 537, 296-308.
[48]
Falk, S., Sturm, B., Ahlbäck, S. (2023).
Automatic legato transcription based on onset detection.
In SMC 2023: Proceedings of the Sound and Music Computing Conference 2023. (pp. 214-221). Sound and Music Computing Network.
[49]
Déguernel, K., Sturm, B. (2023).
Bias in Favour or Against Computational Creativity: A Survey and Reflection on the Importance of Socio-cultural Context in its Evaluation.
In Proc. International Conference on Computational Creativity.
[50]
Huang, R., Holzapfel, A., Sturm, B. & Kaila, A.-K. (2023).
Beyond Diverse Datasets: Responsible MIR, Interdisciplinarity, and the Fractured Worlds of Music.
Transactions of the International Society for Music Information Retrieval, 6(1), 43-59.