FEO3350 Information Theory for Statistics and Learning 12.0 credits
Information theory, machine learning, and artificial intelligence have been overlapping fields throughout their existence as academic disciplines, and these areas in turn overlap significantly with applied and theoretical statistics. This course explores how information-theoretic methods can be used to predict and bound performance in statistical decision theory and in learning from data. Several of these methods were originally developed to study the theoretical performance of digital communication of data and signals, but have recently found new or rediscovered use in theoretical statistics. The goal is to give PhD students in decision and control, learning, AI, network science, and information theory a solid introduction to how information-theoretic concepts and tools can be applied to problems in statistics, decision making, and learning, well beyond their more traditional use in communication theory.
Course syllabus FEO3350 (Autumn 2020–)
Information for research students about course offerings
The course is given for the first time in the fall of 2020, and thereafter every other year.
Content and learning outcomes
Course disposition
Seminars with lectures and joint problem solving
Course contents
Lecture 1: Information theory fundamentals: Entropy, mutual information, relative entropy, and f-divergence. Total variation and other distance metrics. Inequalities. (The core definitions are summarized after this list.)
Lecture 2: Rate-distortion theory: Cost versus information. Bounds. The Blahut algorithm (a numerical sketch follows this list).
Lecture 3: Limits on information flow and processing: Conditional mutual information and relative entropy. Data processing inequalities. Sufficient statistics and the information bottleneck. Rate-distortion interpretation.
Lecture 4: Foundations of statistical decision theory: Parameter estimation. Bayes and minimax risk. Binary hypothesis testing.
Lecture 5: Information bounds on error probability and risk: Sample complexity. The mutual information method and rate-distortion. Fano inequalities (stated in standard form after this list).
Lecture 6: Learning and generalization: Information bounds on generalization error. VC dimension and complexity.
Lecture 7: Classical estimation theory: Maximum likelihood, Fisher information, information bounds, Cramér-Rao, Hammersley-Chapman-Robbins.
Lecture 8: Packing and covering: Metric entropy, Fano's inequality, and minimax risk.
Lecture 9: Le Cam's method and the mutual information method, continued. Density estimation. Functional estimation.
Lecture 10: Dimension reduction and denoising: Sparsity, compressed sensing, denoising, and almost noiseless analog compression.
Lecture 11: Variational methods: Variational inference and the ELBO, variational characterization of divergence and information. Donsker-Varadhan. Gelfand-Yaglom-Perez.
Lecture 12: The method of types.
Lecture 13: Information theory and large deviations: The Stein, Chernoff, and Sanov theorems. Total variation and hypothesis testing.
Lecture 14: The geometry of information: Information geometry, information projection, iterative methods, and Expectation-Maximization.
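For orientation, the core quantities of Lecture 1 can be summarized as follows in the discrete case; the lectures develop the general definitions and the associated inequalities.

```latex
H(X) = -\sum_{x} p(x)\log p(x), \qquad
D(P \,\|\, Q) = \sum_{x} P(x)\log\frac{P(x)}{Q(x)}, \qquad
I(X;Y) = D\!\left(P_{XY} \,\|\, P_X \otimes P_Y\right) = H(X) - H(X \mid Y).
```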
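The Blahut algorithm of Lecture 2 alternates between computing the optimal test channel for the current reproduction marginal and updating that marginal. The sketch below is a minimal illustration, assuming a finite alphabet and a fixed slope parameter s; the function and variable names are ours, not from the course material.

```python
import numpy as np

def blahut_arimoto_rd(p_x, dist, s, n_iter=200):
    """One point on the rate-distortion curve via Blahut-Arimoto (illustrative sketch).

    p_x  : source distribution, shape (nx,)
    dist : distortion matrix d(x, xhat), shape (nx, nxhat)
    s    : slope parameter (Lagrange multiplier), s >= 0
    """
    nx, nxhat = dist.shape
    q_xhat = np.full(nxhat, 1.0 / nxhat)           # reproduction marginal, start uniform
    for _ in range(n_iter):
        # Optimal test channel for the current marginal: q(xhat|x) ∝ q(xhat) exp(-s d(x, xhat))
        log_w = np.log(q_xhat)[None, :] - s * dist
        w = np.exp(log_w - log_w.max(axis=1, keepdims=True))
        q_cond = w / w.sum(axis=1, keepdims=True)
        # Marginal induced by the current test channel
        q_xhat = p_x @ q_cond
    # Rate I(X; Xhat) in bits and the average distortion at this slope
    ratio = np.where(q_cond > 0, q_cond / q_xhat[None, :], 1.0)
    rate = np.sum(p_x[:, None] * q_cond * np.log2(ratio))
    distortion = np.sum(p_x[:, None] * q_cond * dist)
    return rate, distortion

# Illustrative example: uniform binary source with Hamming distortion
p = np.array([0.5, 0.5])
d = 1.0 - np.eye(2)
print(blahut_arimoto_rd(p, d, s=3.0))
```

Sweeping s from 0 upward traces out the curve R(D) point by point.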
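The Fano inequalities of Lecture 5 drive the mutual information method. In one standard form (quoted for orientation): if X is uniform on a set of size M and \hat{X} is any estimator of X based on Y, then

```latex
H(X \mid Y) \le h_2(P_e) + P_e \log(M-1), \qquad P_e = \Pr[\hat{X} \neq X],
```

where h_2 is the binary entropy function. Combined with H(X \mid Y) = \log M - I(X;Y), this yields the lower bound P_e \ge 1 - (I(X;Y) + \log 2)/\log M used to prove minimax lower bounds.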
Intended learning outcomes
A student who has passed this course should be able to:
- understand which concepts in information theory are most important in statistical decision theory, signal processing, and learning theory
- understand and recapitulate several of the most important proofs that underpin the theory
- apply and synthesize the proof techniques and approaches in fundamental exercises
- read, present, and discuss advanced research papers in the field
Literature and preparations
Specific prerequisites
Required: Solid working knowledge (at the advanced-undergraduate level) of analysis, linear algebra, and probability.
Recommended prerequisites
Recommended: Information theory, corresponding to FEO3210; (measure-theoretic) probability, corresponding to FEO3230; and optimization, corresponding to SF3847.
Equipment
Literature
A list of books, lecture notes, and papers in the area will be provided. Course slides are made available via the course webpage.
Examination and completion
If the course is discontinued, students may request to be examined during the following two academic years.
Grading scale
Examination
- EXA1 - Written examination, 12.0 credits, grading scale: P, F
Based on a recommendation from KTH’s coordinator for disabilities, the examiner will decide how to adapt an examination for students with a documented disability.
The examiner may apply another examination format when re-examining individual students.
Ticking in problem-solving seminars: Before every problem-solving seminar, students must tick off the problems they have solved. The presented solutions are discussed in the seminars with the teachers and the other students.
An oral examination verifies that the student has acquired sufficient knowledge of the basic concepts.
Other requirements for final grade
The student must have ticked off at least 70% of the problems and have successfully presented a solution when called upon. Sufficient knowledge must have been demonstrated in the oral examination.
Opportunity to complete the requirements via supplementary examination
Opportunity to raise an approved grade via renewed examination
Examiner
Ethical approach
- All members of a group are responsible for the group's work.
- In any assessment, every student shall honestly disclose any help received and sources used.
- In an oral assessment, every student shall be able to present and answer questions about the entire assignment and solution.