TMH Publications (latest 50)
Below are the 50 latest publications from the Department of Speech, Music and Hearing.
[1]
Green, O., Sturm, B., Born, G., Wald-Fuhrmann, M. (2024).
A Critical Survey of Research in Music Genre Recognition.
In Proc. International Society for Music Information Retrieval Conference. ISMIR.
[2]
Sturm, B., Déguernel, K., Huang, R. S., Kaila, A.-K., Jääskeläinen, P., Kanhov, E., Cros Vila, L., Dalmazzo, D., Casini, L., Bown, O., Collins, N., Drott, E., Sterne, J., Holzapfel, A., Ben-Tal, O. (2024).
AI Music Studies: Preparing for the Coming Flood.
In Proceedings of AI Music Creativity.
[3]
Thomé, C., Sturm, B., Pertoft, J., Jonason, N. (2024).
Applying textual inversion to control and personalize text-to-music models.
In Proc. 15th Int. Workshop on Machine Learning and Music.
[4]
Dalmazzo, D., Déguernel, K., Sturm, B. (2024).
ChromaFlow: Modeling And Generating Harmonic Progressions With a Transformer And Voicing Encoding.
In MML 2024: 15th International Workshop on Machine Learning and Music, 2024, Vilnius, Lithuania.
[5]
Kanhov, E. (2024).
Entanglements with Deepfake: AI Voice Models and their Diffractive Potential.
Presented at 12th New Materialisms Conference. Intersectional Materialisms: Diversity in Creative Industries, Methods & Practices. 26-28 August, 2024, Kildare, Ireland.
[6]
Willemsen, B., Skantze, G. (2024).
Referring Expression Generation in Visually Grounded Dialogue with Discourse-aware Comprehension Guiding.
Presented at 17th International Natural Language Generation Conference (INLG). (pp. 453-469). Association for Computational Linguistics.
[7]
Borg, A., Jobs, B., Huss, V., Gentline, C., Espinosa, F., Ruiz, M. ... Parodis, I. (2024).
Enhancing clinical reasoning skills for medical students: a qualitative comparison of LLM-powered social robotic versus computer-based virtual patients within rheumatology.
Rheumatology International.
[8]
Mehta, S., Deichler, A., O'Regan, J., Moëll, B., Beskow, J., Henter, G. E., Alexanderson, S. (2024).
Fake it to make it: Using synthetic data to remedy the data shortage in joint multimodal speech-and-gesture synthesis.
In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. (pp. 1952-1964).
[9]
Benford, S., Amerotti, M., Sturm, B., Avila, J. M. (2024).
Negotiating Autonomy and Trust when Performing with an AI Musician.
In TAS 2024 - Proceedings of the 2nd International Symposium on Trustworthy Autonomous Systems. Association for Computing Machinery (ACM).
[10]
Wang, Y., Xu, Y., Skantze, G., Buschmeier, H. (2024).
How Much Does Nonverbal Communication Conform to Entropy Rate Constancy?: A Case Study on Listener Gaze in Interaction.
In 62nd Annual Meeting of the Association for Computational Linguistics, ACL 2024 - Proceedings of the Conference. (pp. 3533-3545). Association for Computational Linguistics (ACL).
[11]
Senane, Z., Cao, L., Buchner, V. L., Tashiro, Y., You, L., Herman, P., Nordahl, M., Tu, R., Von Ehrenheim, V. (2024).
Self-Supervised Learning of Time Series Representation via Diffusion Process and Imputation-Interpolation-Forecasting Mask.
In KDD 2024 - Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. (pp. 2560-2571). Association for Computing Machinery (ACM).
[12]
Casini, L., Jonason, N., Sturm, B. (2024).
Sparks of Musical AGI? Challenges and perspectives in music co-creation with LLMs: A qualitative exploration of the music knowledge of LLMs and their use for music creation.
Presented at International Conference on AI and Musical Creativity (AIMC) 2024, Oxford UK, 9 - 11 September 2024.
[13]
Engström, H., Włodarczak, M., Ternström, S. (2024).
Mapping the effect of body position: Voice quality differences in connected speech.
In Proceedings of FONETIK 2024, Stockholm, June 3-5, 2024. (pp. 21-26). Stockholm University.
[14]
Rafiei, S., Brunnström, K., Schenkman, B., Andersson, J., Sjöström, M. (2024).
Laboratory study: Human Interaction using Remote Control System for Airport Safety Management.
In 2024 16th International Conference on Quality of Multimedia Experience, QoMEX 2024. (pp. 167-170). Institute of Electrical and Electronics Engineers (IEEE).
[15]
Kucherenko, T., Wolfert, P., Yoon, Y., Viegas, C., Nikolov, T., Tsakov, M. & Henter, G. E. (2024).
Evaluating Gesture Generation in a Large-scale Open Challenge: The GENEA Challenge 2022.
ACM Transactions on Graphics, 43(3).
[16]
Jansson, M., Tian, K., Hrastinski, S., Engwall, O. (2024).
An initial exploration of semi-automated tutoring: How AI could be used as support for online human tutors.
In Proceedings of the Fourteenth International Conference on Networked Learning. Aalborg University.
[17]
Arvidsson, C., Torubarova, E., Abelho Pereira, A. T. & Udden, J. (2024).
Conversational production and comprehension: fMRI-evidence reminiscent of but deviant from the classical Broca-Wernicke model.
Cerebral Cortex, 34(3).
[18]
Jääskeläinen, P., Kanhov, E. (2024).
Data Ethics and Practices of Human-Nonhuman Sound Technologies and Ecologies.
In VIHAR '24 - 4th International Workshop on Vocal Interactivity in-and-between Humans, Animals and Robots.
[19]
Ekström, A. (2024).
Phonetic potential in the extant apes and extinct hominins
(Doctoral thesis, KTH Royal Institute of Technology, Stockholm, Sweden, TRITA-EECS-AVL 55). Retrieved from https://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-351250.
[20]
Ekström, A. G. (2024).
Correcting the record: Phonetic potential of primate vocal tracts and the legacy of Philip Lieberman (1934−2022).
American Journal of Primatology, 86(8).
[21]
Ekström, A. G., Gannon, C., Edlund, J., Moran, S. & Lameira, A. R. (2024).
Chimpanzee utterances refute purported missing links for novel vocalizations and syllabic speech.
Scientific Reports, 14(1).
[22]
Malmberg, F., Klezovich, A., Mesch, J., Beskow, J. (2024).
Exploring Latent Sign Language Representations with Isolated Signs, Sentences and In-the-Wild Data.
In 11th Workshop on the Representation and Processing of Sign Languages: Evaluation of Sign Language Resources, sign-lang@LREC-COLING 2024. (pp. 219-224). Association for Computational Linguistics (ACL).
[23]
Mehta, S., Tu, R., Beskow, J., Székely, É., Henter, G. E. (2024).
Matcha-TTS: A Fast TTS Architecture with Conditional Flow Matching.
In 2024 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2024 - Proceedings. (pp. 11341-11345). Institute of Electrical and Electronics Engineers (IEEE).
[24]
Amerotti, M., Sturm, B., Benford, S., Maruri-Aguilar, H., Vear, C. (2024).
Evaluation of an Interactive Music Performance System in the Context of Irish Traditional Dance Music.
In Proceedings New Interfaces for Musical Expression NIME’24. International Conference on New Interfaces for Musical Expression.
[25]
Jonason, N., Wang, X., Cooper, E., Juvela, L., Sturm, B., Yamagishi, J. (2024).
DDSP-based Neural Waveform Synthesis of Polyphonic Guitar Performance from String-wise MIDI Input.
In Proceedings of the 27th International Conference on Digital Audio Effects (DAFx24).
[26]
Tånnander, C., O'Regan, J., House, D., Edlund, J., Beskow, J. (2024).
Prosodic characteristics of English-accented Swedish neural TTS.
In Proceedings of Speech Prosody 2024. (pp. 1035-1039). Leiden, The Netherlands: International Speech Communication Association.
[27]
Misra, S., Boye, J. (2024).
Nested Noun Phrase Identification using BERT.
In 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, LREC-COLING 2024 - Main Conference Proceedings. (pp. 12138-12143). European Language Resources Association (ELRA).
[28]
Malisz, Z., Foremski, J., Kul, M. (2024).
PRODIS - a speech database and a phoneme-based language model for the study of predictability effects in Polish.
In 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, LREC-COLING 2024 - Main Conference Proceedings. (pp. 13068-13073). European Language Resources Association (ELRA).
[29]
Inoue, K., Jiang, B., Ekstedt, E., Kawahara, T., Skantze, G. (2024).
Multilingual Turn-taking Prediction Using Voice Activity Projection.
In 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, LREC-COLING 2024 - Main Conference Proceedings. (pp. 11873-11883). European Language Resources Association (ELRA).
[30]
Tånnander, C., Edlund, J., Gustafsson, J. (2024).
Revisiting Three Text-to-Speech Synthesis Experiments with a Web-Based Audience Response System.
In 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, LREC-COLING 2024 - Main Conference Proceedings. (pp. 14111-14121). European Language Resources Association (ELRA).
[31]
Wang, S., Székely, É. (2024).
Evaluating Text-to-Speech Synthesis from a Large Discrete Token-based Speech Language Model.
In 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, LREC-COLING 2024 - Main Conference Proceedings. (pp. 6464-6474). European Language Resources Association (ELRA).
[32]
Irfan, B., Kuoppamäki, S. & Skantze, G. (2024).
Recommendations for designing conversational companion robots with older adults through foundation models.
Frontiers in Robotics and AI, 11.
[33]
Wennberg, U., Henter, G. E. (2024).
Exploring Internal Numeracy in Language Models: A Case Study on ALBERT.
In MathNLP 2024: 2nd Workshop on Mathematical Natural Language Processing at LREC-COLING 2024 - Workshop Proceedings. (pp. 35-40). European Language Resources Association (ELRA).
[34]
Müller, M., Dixon, S., Volk, A., Sturm, B., Rao, P. & Gotham, M. (2024).
Introducing the TISMIR Education Track: What, Why, How?
Transactions of the International Society for Music Information Retrieval, 7(1), 85-98.
[35]
Casini, L., Jonason, N., Sturm, B. (2024).
Investigating the Viability of Masked Language Modeling for Symbolic Music Generation in abc-notation.
In Artificial Intelligence in Music, Sound, Art and Design, EvoMUSART 2024. (pp. 84-96). Springer Nature.
[36]
Ekström, A. G. (2024).
A Theory That Never Was: Wrong Way to the “Dawn of Speech”.
Biolinguistics, 18.
[37]
Kaila, A.-K., Sturm, B. (2024).
Agonistic Dialogue on the Value and Impact of AI Music Applications.
In Proceedings of the 2024 International Conference on AI and Musical Creativity. Oxford, UK.
[38]
Iob, N. A., He, L., Ternström, S., Cai, H. & Brockmann-Bauser, M. (2024).
Effects of Speech Characteristics on Electroglottographic and Instrumental Acoustic Voice Analysis Metrics in Women With Structural Dysphonia Before and After Treatment.
Journal of Speech, Language and Hearing Research, 1-22.
[39]
Ternström, S. (2024).
Pragmatic De-Noising of Electroglottographic Signals.
Bioengineering, 11(5), 479.
[40]
Cai, H., Ternström, S., Chaffanjon, P. & Henrich Bernardoni, N. (2024).
Effects on Voice Quality of Thyroidectomy: A Qualitative and Quantitative Study Using Voice Maps.
Journal of Voice.
[41]
Borg, A., Parodis, I., Skantze, G. (2024).
Creating Virtual Patients using Robots and Large Language Models: A Preliminary Study with Medical Students.
In HRI 2024 Companion - Companion of the 2024 ACM/IEEE International Conference on Human-Robot Interaction. (pp. 273-277). Association for Computing Machinery (ACM).
[42]
Ashkenazi, S., Skantze, G., Stuart-Smith, J., Foster, M. E. (2024).
Goes to the Heart: Speaking the User's Native Language.
In HRI 2024 Companion - Companion of the 2024 ACM/IEEE International Conference on Human-Robot Interaction. (pp. 214-218). Association for Computing Machinery (ACM).
[43]
Irfan, B., Staffa, M., Bobu, A., Churamani, N. (2024).
Lifelong Learning and Personalization in Long-Term Human-Robot Interaction (LEAP-HRI): Open-World Learning.
In HRI 2024 Companion - Companion of the 2024 ACM/IEEE International Conference on Human-Robot Interaction. (pp. 1323-1325). Association for Computing Machinery (ACM).
[44]
Axelsson, A., Vaddadi, B., Bogdan, C. M., Skantze, G. (2024).
Robots in autonomous buses: Who hosts when no human is there?
In HRI 2024 Companion - Companion of the 2024 ACM/IEEE International Conference on Human-Robot Interaction. (pp. 1278-1280). Association for Computing Machinery (ACM).
[45]
Wolfert, P., Henter, G. E. & Belpaeme, T. (2024).
Exploring the Effectiveness of Evaluation Practices for Computer-Generated Nonverbal Behaviour.
Applied Sciences, 14(4).
[46]
Cumbal, R. (2024).
Robots Beyond Borders: The Role of Social Robots in Spoken Second Language Practice
(Doctoral thesis, KTH Royal Institute of Technology, Stockholm, TRITA-EECS-AVL 2024:23). Retrieved from https://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-343863.
[47]
Sundberg, J., Salomão, G. L. & Scherer, K. R. (2024).
Emotional expressivity in singing: Assessing physiological and acoustic indicators of two opera singers' voice characteristics.
Journal of the Acoustical Society of America, 155(1), 18-28.
[48]
Kalpakchi, D. & Boye, J. (2024).
Quinductor: A multilingual data-driven method for generating reading-comprehension questions using Universal Dependencies.
Natural Language Engineering, 217-255.
[49]
Rosenberg, S., Sundberg, J. & Lã, F. (2024).
Kulning: Acoustic and Perceptual Characteristics of a Calling Style Used Within the Scandinavian Herding Tradition.
Journal of Voice, 38(3), 585-594.
[50]
Baker, C. P., Sundberg, J., Purdy, S. C., Rakena, T. O. & Leão, S. H. D. S. (2024).
CPPS and Voice-Source Parameters: Objective Analysis of the Singing Voice.
Journal of Voice, 38(3), 549-560.