Publications by Gustav Henter
Peer-reviewed
Articles
[1]
T. Kucherenko et al., "Evaluating Gesture Generation in a Large-scale Open Challenge : The GENEA Challenge 2022," ACM Transactions on Graphics, vol. 43, no. 3, 2024.
[2]
P. Wolfert, G. E. Henter and T. Belpaeme, "Exploring the Effectiveness of Evaluation Practices for Computer-Generated Nonverbal Behaviour," Applied Sciences, vol. 14, no. 4, 2024.
[3]
S. Nyatsanga et al., "A Comprehensive Review of Data-Driven Co-Speech Gesture Generation," Computer graphics forum (Print), vol. 42, no. 2, pp. 569-596, 2023.
[4]
S. Alexanderson et al., "Listen, Denoise, Action! Audio-Driven Motion Synthesis with Diffusion Models," ACM Transactions on Graphics, vol. 42, no. 4, 2023.
[5]
J. G. De Gooijer, G. E. Henter and A. Yuan, "Kernel-based hidden Markov conditional densities," Computational Statistics & Data Analysis, vol. 169, 2022.
[6]
T. Kucherenko et al., "Moving Fast and Slow : Analysis of Representations and Post-Processing in Speech-Driven Automatic Gesture Generation," International Journal of Human-Computer Interaction, vol. 37, no. 14, pp. 1300-1316, 2021.
[7]
P. Jonell et al., "Multimodal Capture of Patient Behaviour for Improved Detection of Early Dementia : Clinical Feasibility and Preliminary Results," Frontiers in Computer Science, vol. 3, 2021.
[8]
G. Valle-Perez et al., "Transflower : probabilistic autoregressive dance generation with multimodal attention," ACM Transactions on Graphics, vol. 40, no. 6, 2021.
[9]
G. E. Henter, S. Alexanderson and J. Beskow, "MoGlow : Probabilistic and controllable motion synthesis using normalising flows," ACM Transactions on Graphics, vol. 39, no. 6, pp. 1-14, 2020.
[10]
S. Alexanderson et al., "Style-Controllable Speech-Driven Gesture Synthesis Using Normalising Flows," Computer graphics forum (Print), vol. 39, no. 2, pp. 487-496, 2020.
[11]
G. E. Henter and W. B. Kleijn, "Minimum entropy rate simplification of stochastic processes," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 38, no. 12, pp. 2487-2500, 2016.
[12]
P. N. Petkov, G. E. Henter and W. B. Kleijn, "Maximizing Phoneme Recognition Accuracy for Enhanced Speech Intelligibility in Noise," IEEE Transactions on Audio, Speech, and Language Processing, vol. 21, no. 5, pp. 1035-1045, 2013.
[13]
G. E. Henter and W. B. Kleijn, "Picking up the pieces : Causal states in noisy data, and how to recover them," Pattern Recognition Letters, vol. 34, no. 5, pp. 587-594, 2013.
Conference papers
[14]
U. Wennberg and G. E. Henter, "Exploring Internal Numeracy in Language Models: A Case Study on ALBERT," in MathNLP 2024: 2nd Workshop on Mathematical Natural Language Processing at LREC-COLING 2024 - Workshop Proceedings, 2024, pp. 35-40.
[15]
S. Mehta et al., "Fake it to make it : Using synthetic data to remedy the data shortage in joint multimodal speech-and-gesture synthesis," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 1952-1964.
[16]
S. Mehta et al., "Matcha-TTS: A fast TTS architecture with conditional flow matching," in 2024 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2024 - Proceedings, 2024, pp. 11341-11345.
[17]
P. Wolfert, G. E. Henter and T. Belpaeme, ""Am I listening?", Evaluating the Quality of Generated Data-driven Listening Motion," in ICMI 2023 Companion : Companion Publication of the 25th International Conference on Multimodal Interaction, 2023, pp. 6-10.
[18]
S. Wang et al., "A Comparative Study of Self-Supervised Speech Representations in Read and Spontaneous TTS," in ICASSPW 2023 : 2023 IEEE International Conference on Acoustics, Speech and Signal Processing Workshops, Proceedings, 2023.
[19]
P. Pérez Zarazaga, G. E. Henter and Z. Malisz, "A processing framework to access large quantities of whispered speech found in ASMR," in ICASSP 2023 : 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2023.
[20]
J. J. Webber et al., "Autovocoder: Fast Waveform Generation from a Learned Speech Representation Using Differentiable Digital Signal Processing," in ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing, Proceedings, 2023.
[21]
Y. Yoon et al., "GENEA Workshop 2023 : The 4th Workshop on Generation and Evaluation of Non-verbal Behaviour for Embodied Agents," in ICMI 2023 : Proceedings of the 25th International Conference on Multimodal Interaction, 2023, pp. 822-823.
[22]
S. Mehta et al., "OverFlow : Putting flows on top of neural transducers for better TTS," in Interspeech 2023, 2023, pp. 4279-4283.
[23]
H. Lameris et al., "Prosody-Controllable Spontaneous TTS with Neural HMMs," in International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023.
[24]
P. Pérez Zarazaga et al., "Speaker-independent neural formant synthesis," in Interspeech 2023, 2023, pp. 5556-5560.
[25]
T. Kucherenko et al., "The GENEA Challenge 2023 : A large-scale evaluation of gesture generation models in monadic and dyadic settings," in Proceedings of the 25th International Conference on Multimodal Interaction, ICMI 2023, 2023, pp. 792-801.
[26]
P. Wolfert et al., "GENEA Workshop 2022 : The 3rd Workshop on Generation and Evaluation of Non-verbal Behaviour for Embodied Agents," in ACM International Conference Proceeding Series, 2022, pp. 799-800.
[27]
T. Kucherenko et al., "Multimodal analysis of the predictability of hand-gesture properties," in AAMAS '22: Proceedings of the 21st International Conference on Autonomous Agents and Multiagent Systems, 2022, pp. 770-779.
[28]
S. Mehta et al., "Neural HMMs are all you need (for high-quality attention-free TTS)," in 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2022, pp. 7457-7461.
[29]
C. Valentini-Botinhao et al., "Predicting pairwise preferences between TTS audio stimuli using parallel ratings data and anti-symmetric twin neural networks," in INTERSPEECH 2022, 2022, pp. 471-475.
[30]
J. Fong et al., "Speech Audio Corrector : using speech from non-target speakers for one-off correction of mispronunciations in grapheme-input text-to-speech," in INTERSPEECH 2022, 2022, pp. 1213-1217.
[31]
Y. Yoon et al., "The GENEA Challenge 2022 : A large evaluation of data-driven co-speech gesture generation," in ICMI 2022 : Proceedings of the 2022 International Conference on Multimodal Interaction, 2022, pp. 736-747.
[32]
G. Beck et al., "Wavebender GAN : An architecture for phonetically meaningful speech manipulation," in 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2022.
[33]
T. Kucherenko et al., "A large, crowdsourced evaluation of gesture generation systems on common data : The GENEA Challenge 2020," in Proceedings IUI '21: 26th International Conference on Intelligent User Interfaces, 2021, pp. 11-21.
[34]
M. M. Sorkhei, G. E. Henter and H. Kjellström, "Full-Glow : Fully conditional Glow for more realistic image generation," in Pattern Recognition : 43rd DAGM German Conference, DAGM GCPR 2021, 2021, pp. 697-711.
[35]
T. Kucherenko et al., "GENEA Workshop 2021 : The 2nd Workshop on Generation and Evaluation of Non-verbal Behaviour for Embodied Agents," in Proceedings of ICMI '21: International Conference on Multimodal Interaction, 2021, pp. 872-873.
[36]
P. Jonell et al., "HEMVIP: Human Evaluation of Multiple Videos in Parallel," in ICMI '21: Proceedings of the 2021 International Conference on Multimodal Interaction, 2021, pp. 707-711.
[37]
S. Wang et al., "Integrated Speech and Gesture Synthesis," in ICMI 2021 - Proceedings of the 2021 International Conference on Multimodal Interaction, 2021, pp. 177-185.
[38]
T. Kucherenko et al., "Speech2Properties2Gestures : Gesture-Property Prediction as a Tool for Generating Representational Gestures from Speech," in IVA '21 : Proceedings of the 21st ACM International Conference on Intelligent Virtual Agents, 2021, pp. 145-147.
[39]
U. Wennberg and G. E. Henter, "The Case for Translation-Invariant Self-Attention in Transformer-Based Language Models," in ACL-IJCNLP 2021 : The 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, Vol 2, 2021, pp. 130-140.
[40]
É. Székely et al., "Breathing and Speech Planning in Spontaneous Speech Synthesis," in 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2020, pp. 7649-7653.
[41]
S. Alexanderson et al., "Generating coherent spontaneous speech and gesture from text," in Proceedings of the 20th ACM International Conference on Intelligent Virtual Agents, IVA 2020, 2020.
[42]
T. Kucherenko et al., "Gesticulator : A framework for semantically-aware speech-driven gesture generation," in ICMI '20: Proceedings of the 2020 International Conference on Multimodal Interaction, 2020.
[43]
P. Jonell et al., "Let’s face it : Probabilistic multi-modal interlocutor-aware generation of facial gestures in dyadic settings," in IVA '20: Proceedings of the 20th ACM International Conference on Intelligent Virtual Agents, 2020.
[44]
K. Håkansson et al., "Robot-assisted detection of subclinical dementia : progress report and preliminary findings," in 2020 Alzheimer's Association International Conference. ALZ., 2020.
[45]
A. Ghosh et al., "Robust classification using hidden markov models and mixtures of normalizing flows," in 2020 IEEE 30th International Workshop on Machine Learning for Signal Processing (MLSP), 2020.
[46]
S. Alexanderson and G. E. Henter, "Robust model training and generalisation with Studentising flows," in Proceedings of the ICML Workshop on Invertible Neural Networks, Normalizing Flows, and Explicit Likelihood Models, 2020, pp. 25:1-25:9.
[47]
T. Kucherenko et al., "Analyzing Input and Output Representations for Speech-Driven Gesture Generation," in 19th ACM International Conference on Intelligent Virtual Agents, 2019.
[48]
É. Székely, G. E. Henter and J. Gustafson, "Casting to Corpus : Segmenting and Selecting Spontaneous Dialogue for TTS with a CNN-LSTM Speaker-Dependent Breath Detector," in 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2019, pp. 6925-6929.
[49]
É. Székely et al., "How to train your fillers: uh and um in spontaneous speech synthesis," in The 10th ISCA Speech Synthesis Workshop, 2019.
[50]
É. Székely et al., "Off the cuff : Exploring extemporaneous speech delivery with TTS," in Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, 2019, pp. 3687-3688.
[51]
T. Kucherenko et al., "On the Importance of Representations for Speech-Driven Gesture Generation : Extended Abstract," in International Conference on Autonomous Agents and Multiagent Systems (AAMAS '19), May 13-17, 2019, Montréal, Canada, 2019, pp. 2072-2074.
[52]
P. Wagner et al., "Speech Synthesis Evaluation : State-of-the-Art Assessment and Suggestion for a Novel Research Program," in Proceedings of the 10th Speech Synthesis Workshop (SSW10), 2019.
[53]
É. Székely et al., "Spontaneous conversational speech synthesis from found data," in Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, 2019, pp. 4435-4439.
[54]
Z. Malisz et al., "The speech synthesis phoneticians need is both realistic and controllable," in Proceedings from FONETIK 2019, 2019.
[55]
O. Watts et al., "Where do the improvements come from in sequence-to-sequence neural TTS?," in Proceedings of the 10th ISCA Speech Synthesis Workshop, 2019, pp. 217-222.
[56]
P. N. Petkov, W. B. Kleijn and G. E. Henter, "Enhancing Subjective Speech Intelligibility Using a Statistical Model of Speech," in 13th Annual Conference of the International Speech Communication Association 2012, INTERSPEECH 2012, Vol 1, 2012, pp. 166-169.
[57]
G. E. Henter, M. R. Frean and W. B. Kleijn, "Gaussian process dynamical models for nonparametric speech representation and synthesis," in Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference on, 2012, pp. 4505-4508.
[58]
G. E. Henter and W. B. Kleijn, "Intermediate-State HMMs to Capture Continuously-Changing Signal Features," in Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, 2011, pp. 1828-1831.
[59]
G. E. Henter and W. B. Kleijn, "Simplified Probability Models for Generative Tasks : a Rate-Distortion Approach," in Proceedings of the European Signal Processing Conference, 2010, pp. 1159-1163.
Non-peer-reviewed
Conference papers
[60]
H. Lameris et al., "Spontaneous Neural HMM TTS with Prosodic Feature Modification," in Proceedings of Fonetik 2022, 2022.
Theses
[61]
G. E. Henter, "Probabilistic Sequence Models with Speech and Language Applications," Doctoral thesis, Stockholm : KTH Royal Institute of Technology, Trita-EE, 2013:042, 2013.
Other
[62]
S. Wang et al., "A comparative study of self-supervised speech representations in read and spontaneous TTS," (Manuscript).
[63]
G. E. Henter, S. Alexanderson and J. Beskow, "Moglow : Probabilistic and controllable motion synthesis using normalising flows," (Manuscript).
[64]
G. E. Henter, A. Leijon and W. B. Kleijn, "Kernel Density Estimation-Based Markov Models with Hidden State," (Manuscript).
[65]
G. E. Henter and W. B. Kleijn, "Minimum Entropy Rate Simplification of Stochastic Processes," (Manuscript).
[66]
T. Kucherenko et al., "The GENEA Challenge 2020 : Benchmarking gesture-generation systems on common data," (Manuscript).
Last synchronized with DiVA:
2024-12-22 01:44:03