
Publications by Éva Székely

Peer reviewed

Articles

[1]
É. Székely et al., "Facial expression-based affective speech translation," Journal on Multimodal User Interfaces, vol. 8, no. 1, pp. 87-96, 2014.

Conference papers

[3]
S. Wang and É. Székely, "Evaluating Text-to-Speech Synthesis from a Large Discrete Token-based Speech Language Model," in 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, LREC-COLING 2024 - Main Conference Proceedings, 2024, pp. 6464-6474.
[4]
S. Mehta et al., "MATCHA-TTS: A FAST TTS ARCHITECTURE WITH CONDITIONAL FLOW MATCHING," in 2024 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2024 - Proceedings, 2024, pp. 11341-11345.
[5]
H. Lameris, É. Székely and J. Gustafson, "The Role of Creaky Voice in Turn Taking and the Perception of Speaker Stance: Experiments Using Controllable TTS," in 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, LREC-COLING 2024 - Main Conference Proceedings, 2024, pp. 16058-16065.
[6]
S. Wang et al., "A Comparative Study of Self-Supervised Speech Representations in Read and Spontaneous TTS," in ICASSPW 2023: 2023 IEEE International Conference on Acoustics, Speech and Signal Processing Workshops, Proceedings, 2023.
[7]
E. Ekstedt et al., "Automatic Evaluation of Turn-taking Cues in Conversational Speech Synthesis," in Interspeech 2023, 2023, pp. 5481-5485.
[8]
H. Lameris, J. Gustafson and É. Székely, "Beyond style: synthesizing speech with pragmatic functions," in Interspeech 2023, 2023, pp. 3382-3386.
[9]
I. Torre et al., "Can a gender-ambiguous voice reduce gender stereotypes in human-robot interactions?," in 2023 32nd IEEE International Conference on Robot and Human Interactive Communication (RO-MAN), 2023, pp. 106-112.
[10]
J. Gustafson et al., "Casual chatter or speaking up? Adjusting articulatory effort in generation of speech and animation for conversational characters," in 2023 IEEE 17th International Conference on Automatic Face and Gesture Recognition, FG 2023, 2023.
[11]
J. Gustafson, É. Székely and J. Beskow, "Generation of speech and facial animation with controllable articulatory effort for amusing conversational characters," in 23rd ACM International Conference on Intelligent Virtual Agents (IVA 2023), 2023.
[12]
J. Miniotaitė et al., "Hi robot, it's not what you say, it's how you say it," in 2023 32nd IEEE International Conference on Robot and Human Interactive Communication (RO-MAN), 2023, pp. 307-314.
[13]
S. Mehta et al., "OverFlow : Putting flows on top of neural transducers for better TTS," in Interspeech 2023, 2023, pp. 4279-4283.
[14]
A. Kirkland, J. Gustafson and É. Székely, "Pardon my disfluency: The impact of disfluency effects on the perception of speaker competence and confidence," in Interspeech 2023, 2023, pp. 5217-5221.
[15]
H. Lameris et al., "Prosody-Controllable Spontaneous TTS with Neural HMMs," in International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023.
[16]
É. Székely, J. Gustafson and I. Torre, "Prosody-controllable gender-ambiguous speech synthesis: a tool for investigating implicit bias in speech perception," in Interspeech 2023, 2023, pp. 1234-1238.
[17]
É. Székely, S. Wang and J. Gustafson, "So-to-Speak: an exploratory platform for investigating the interplay between style and prosody in TTS," in Interspeech 2023, 2023, pp. 2016-2017.
[19]
M. P. Aylett et al., "Why is my Agent so Slow? Deploying Human-Like Conversational Turn-Taking," in HAI 2023 - Proceedings of the 11th Conference on Human-Agent Interaction, 2023, pp. 490-492.
[20]
S. Wang, J. Gustafson and É. Székely, "Evaluating Sampling-based Filler Insertion with Spontaneous TTS," in LREC 2022: Thirteenth International Conference on Language Resources and Evaluation, 2022, pp. 1960-1969.
[21]
S. Mehta et al., "Neural HMMs are all you need (for high-quality attention-free TTS)," in 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2022, pp. 7457-7461.
[22]
N. Ward et al., "Two Pragmatic Functions of Breathy Voice in American English Conversation," in Proceedings of the 11th International Conference on Speech Prosody, 2022, pp. 82-86.
[24]
S. Wang et al., "Integrated Speech and Gesture Synthesis," in ICMI 2021 - Proceedings of the 2021 International Conference on Multimodal Interaction, 2021, pp. 177-185.
[25]
A. Kirkland et al., "Perception of smiling voice in spontaneous speech synthesis," in Proceedings of the Speech Synthesis Workshop (SSW11), 2021, pp. 108-112.
[26]
É. Székely, J. Edlund and J. Gustafsson, "Augmented Prompt Selection for Evaluation of Spontaneous Speech Synthesis," in Proceedings of the 12th Language Resources and Evaluation Conference, 2020, pp. 6368-6374.
[27]
É. Székely et al., "Breathing and Speech Planning in Spontaneous Speech Synthesis," in 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2020, pp. 7649-7653.
[28]
S. Alexanderson et al., "Generating coherent spontaneous speech and gesture from text," in Proceedings of the 20th ACM International Conference on Intelligent Virtual Agents, IVA 2020, 2020.
[29]
É. Székely, G. E. Henter and J. Gustafson, "Casting to Corpus: Segmenting and Selecting Spontaneous Dialogue for TTS with a CNN-LSTM Speaker-Dependent Breath Detector," in 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2019, pp. 6925-6929.
[30]
É. Székely et al., "How to train your fillers: uh and um in spontaneous speech synthesis," in The 10th ISCA Speech Synthesis Workshop, 2019.
[31]
L. Clark et al., "Mapping Theoretical and Methodological Perspectives for Understanding Speech Interface Interactions," in CHI EA '19: Extended Abstracts of the 2019 CHI Conference on Human Factors in Computing Systems, 2019.
[32]
É. Székely et al., "Off the cuff : Exploring extemporaneous speech delivery with TTS," in Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, 2019, pp. 3687-3688.
[33]
P. Wagner et al., "Speech Synthesis Evaluation: State-of-the-Art Assessment and Suggestion for a Novel Research Program," in Proceedings of the 10th Speech Synthesis Workshop (SSW10), 2019.
[34]
É. Székely et al., "Spontaneous conversational speech synthesis from found data," in Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, 2019, pp. 4435-4439.
[35]
S. Betz et al., "The greennn tree - lengthening position influences uncertainty perception," in Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH 2019, 2019, pp. 3990-3994.
[36]
É. Székely, P. Wagner and J. Gustafson, "The Wrylie-Board: Mapping Acoustic Space of Expressive Feedback to Attitude Markers," in Proceedings of the IEEE Spoken Language Technology Workshop (SLT), 2018.
[37]
É. Székely, J. Mendelson and J. Gustafson, "Synthesising uncertainty: The interplay of vocal effort and hesitation disfluencies," in Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, 2017, pp. 804-808.
[38]
B. R. Cowan et al., "They Know as Much as We Do: Knowledge Estimation and Partner Modelling of Artificial Partners," in CogSci 2017 - Proceedings of the 39th Annual Meeting of the Cognitive Science Society: Computational Foundations of Cognition, 2017, pp. 1836-1841.
[39]
C. Oertel et al., "Using crowd-sourcing for the design of listening agents: Challenges and opportunities," in ISIAA 2017 - Proceedings of the 1st ACM SIGCHI International Workshop on Investigating Social Interactions with Artificial Agents, Co-located with ICMI 2017, 2017, pp. 37-38.
[40]
É. Székely, M. T. Keane and J. Carson-Berndsen, "The Effect of Soft, Modal and Loud Voice Levels on Entrainment in Noisy Conditions," in Sixteenth Annual Conference of the International Speech Communication Association, 2015.
[41]
Z. Ahmed et al., "A system for facial expression-based affective speech translation," in Proceedings of the Companion Publication of the 2013 International Conference on Intelligent User Interfaces (IUI '13 Companion), 2013, pp. 57-58.
[42]
É. Székely et al., "Detecting a targeted voice style in an audiobook using voice quality features," in Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference on, 2012, pp. 4593-4596.
[43]
É. Székely et al., "Evaluating expressive speech synthesis from audiobooks in conversational phrases," in International Conference on Language Resources and Evaluation. MAY 21-27, 2012., 2012, pp. 3335-3339.
[44]
É. Székely et al., "Facial expression as an input annotation modality for affective speech-to-speech translation," in Workshop on Multimodal Analyses enabling Artificial Agents in Human-Machine Interaction, 2012.
[45]
M. Abou-Zleikha et al., "Multi-level exemplar-based duration generation for expressive speech synthesis," in Proceedings of Speech Prosody, 2012.
[46]
J. P. Cabral et al., "Rapidly Testing the Interaction Model of a Pronunciation Training System via Wizard-of-Oz," in Proceedings of the International Conference on Language Resources and Evaluation, 2012, pp. 4136-4142.
[47]
É. Székely et al., "Synthesizing expressive speech from amateur audiobook recordings," in Spoken Language Technology Workshop (SLT), 2012, pp. 297-302.
[49]
É. Székely et al., "WinkTalk : a demonstration of a multimodal speech synthesis platform linking facial expressions to expressive synthetic voices," in Proceedings of the Third Workshop on Speech and Language Processing for Assistive Technologies, 2012, pp. 5-8.
[50]
É. Székely et al., "Clustering Expressive Speech Styles in Audiobooks Using Glottal Source Parameters.," in 12th Annual Conference of the International-Speech-Communication-Association 2011 (INTERSPEECH 2011), 2011, pp. 2409-2412.
[51]
P. Cahill et al., "UCD Blizzard Challenge 2011 entry," in Proceedings of the Blizzard Challenge Workshop, 2011.

Non-peer reviewed

Conference papers

[52]
H. Lameris et al., "Spontaneous Neural HMM TTS with Prosodic Feature Modification," in Proceedings of Fonetik 2022, 2022.
Latest sync with DiVA: 2024-07-18 01:01:01