FDT3317 Speech Synthesis from Beginning to End-to-end 7.5 credits

Information per course offering

Course offerings are missing for current or upcoming semesters.


Course syllabus FDT3317 (Autumn 2019–)
Headings with content from the Course syllabus FDT3317 (Autumn 2019–) are denoted with an asterisk (*)

Content and learning outcomes

Course contents

“Machines that speak” is an age-old topic that has experienced a recent surge in research interest. Speaking devices are now in everyone's pockets, and the speech-synthesis field has become a challenging proving ground for new methods in machine learning.

This course is an introduction to text-to-speech (TTS) synthesis with elements of acoustic phonetics and signal processing. The course introduces a universal TTS engineering pipeline step by step: text processing, prediction engine, and waveform generation. The pipeline components are then explored within each contemporary speech-synthesis paradigm, from unit selection via statistical-parametric and hybrid synthesisers to end-to-end systems.
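The three pipeline stages named above can be illustrated with a deliberately simplified sketch. It is not taken from the course material: the function names, the letter-based symbols, the rule-based duration and pitch values, and the sine-tone output are all assumptions chosen to keep the example short. A real system would use phoneme conversion, a learned acoustic model, and a vocoder or neural waveform model instead.

```python
import math
import re

SAMPLE_RATE = 16000  # samples per second (assumed for illustration)


def process_text(text):
    """Text processing: normalise the input and split it into symbols.

    A real front end would expand numbers and abbreviations and convert
    words to phonemes; here each lowercase letter stands in for one symbol,
    and anything that is not a letter is dropped.
    """
    words = re.findall(r"[a-z]+", text.lower())
    symbols = []
    for i, word in enumerate(words):
        if i > 0:
            symbols.append("_")  # "_" marks a pause between words
        symbols.extend(word)
    return symbols


def predict_acoustics(symbols):
    """Prediction engine: map each symbol to (duration in s, pitch in Hz).

    A toy rule stands in for a learned model: vowels are longer and
    higher-pitched, consonants shorter, and pauses are silent (pitch 0).
    """
    features = []
    for s in symbols:
        if s == "_":
            features.append((0.10, 0.0))
        elif s in "aeiou":
            features.append((0.12, 120.0 + 10.0 * "aeiou".index(s)))
        else:
            features.append((0.06, 100.0))
    return features


def generate_waveform(features, sample_rate=SAMPLE_RATE):
    """Waveform generation: render each (duration, pitch) pair as a sine tone."""
    samples = []
    for duration, pitch in features:
        n = int(round(duration * sample_rate))
        for t in range(n):
            if pitch > 0:
                samples.append(0.5 * math.sin(2 * math.pi * pitch * t / sample_rate))
            else:
                samples.append(0.0)
    return samples


def synthesise(text):
    """Run the three stages in sequence: text -> symbols -> features -> samples."""
    return generate_waveform(predict_acoustics(process_text(text)))
```

Calling `synthesise("Hi, you!")` returns a list of floating-point samples in the range [-0.5, 0.5]. The point of the sketch is the separation of stages: each function can be replaced independently, which is roughly how the different synthesis paradigms differ from one another.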

Intended learning outcomes

After completing the course, students should be able to:

1. Demonstrate a solid knowledge base for independent research and development in state-of-the-art text-to-speech synthesis.

2. Define and motivate basic concepts in TTS-relevant acoustic phonetics and signal processing, and describe all parts of the text-to-speech pipeline.

3. Building on the above understanding, acquire and demonstrate skills in system implementation, as practised and evaluated during exercise sessions.

4. Demonstrate good familiarity with the seminal advances in speech synthesis over the years (both at KTH and at large), as well as with the most recent achievements such as neural-network-based end-to-end systems.

Literature and preparations

Specific prerequisites

 Admitted to a doctoral education programme.

Recommended prerequisites

The intended student has some experience in signal processing, machine learning, or phonetics.


No information inserted


 Suggested reading:

Examination and completion

If the course is discontinued, students may request to be examined during the following two academic years.

Grading scale

P, F


  • EXA1 - Exam, 7.5 credits, grading scale: P, F

Based on recommendation from KTH’s coordinator for disabilities, the examiner will decide how to adapt an examination for students with documented disability.

The examiner may apply another examination format when re-examining individual students.

Several components contribute to the final grade, including the introduction of a discussion paper, participation in exercises, and final student group work on system demonstrations.

Other requirements for final grade

A pass on all components (as listed above) is required to pass the course.

Opportunity to complete the requirements via supplementary examination

No information inserted

Opportunity to raise an approved grade via renewed examination

No information inserted


Ethical approach

  • All members of a group are responsible for the group's work.
  • In any assessment, every student shall honestly disclose any help received and sources used.
  • In an oral assessment, every student shall be able to present and answer questions about the entire assignment and solution.

Further information

Course room in Canvas

Registered students can find further information about the implementation of the course in the course room in Canvas. A link to the course room is available under the Studies tab in the Personal menu at the start of the course.

Offered by

Main field of study

This course does not belong to any Main field of study.

Education cycle

Third cycle

Add-on studies

No information inserted

Postgraduate course

Postgraduate courses at EECS/Speech, Music and Hearing