The course consists of lectures, three laboratory sessions with hand-in assignments, and an essay on a subject chosen in consultation with the teacher. The essay is also presented orally at a final seminar. In the laboratory sessions, students design different parts of a speech recognition application, train the system and evaluate its performance.
The following theoretical course components are included:
- algorithms for training, recognition and adaptation to the properties of speakers and transmission channels, including pattern recognition, Hidden Markov Models (HMMs) and Deep Neural Networks (DNNs)
- methods for reducing sensitivity to disturbances and deviations
- probability theory
- signal processing and parameter extraction
- acoustic modelling of the static and dynamic spectral properties of speech sounds
- statistical modelling of language in spontaneous and formal speech
- search strategies: basic methods and strategies for large vocabularies
- specific methods of analysis and decision making for speaker recognition.
The course also gives some practical insight into building an application. This includes implementing certain functions from prototypes and testing them on real speech data.