EL2800 Stochastic Control and Optimization 7.5 credits
This course has been discontinued.
Course syllabus EL2800 (Spring 2019–)
Content and learning outcomes
Course contents
Markov chains, Markov Decision Processes (MDPs), dynamic programming and value/policy iteration methods, design of approximate controllers for MDPs, stochastic linear quadratic control, and Multi-Armed Bandit problems.
Intended learning outcomes
This course introduces basic theories and methodologies for the analysis and design of stochastic control policies. It provides a comprehensive introduction to stochastic control, with applications from a variety of areas including advertising, dynamic resource allocation, and traditional automatic control. After the course, you should be able to:
- Understand basic principles of probability theory and stochastic dynamical systems including Markov chains.
- Rigorously formulate stochastic control problems as Markov Decision Processes (MDPs), classify the resulting problems, and evaluate their tractability.
- State the principle of optimality for finite-horizon and infinite-horizon MDPs, and solve MDPs using dynamic programming.
- Derive solutions of MDPs using the value and policy iteration methods (a value iteration sketch appears after this list).
- Propose approximate solutions of MDPs.
- Treat MDP extensions such as constrained MDPs, Partially Observable MDPs, and distributed MDPs.
- State limit theorems expressing the relationship between MDPs and deterministic continuous-time control problems.
- Solve Linear Quadratic stochastic control problems (a Riccati recursion sketch appears after this list).
- Solve basic optimal stopping time problems.
- Identify and formulate stochastic Multi-Armed Bandit (MAB) problems; derive regret lower bounds for MAB problems (a UCB sketch appears after this list).
- Solve basic online stochastic optimization problems.
- Propose algorithms for adversarial MAB problems, and sequential decisions in repeated games.
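To make the dynamic programming and value iteration outcomes concrete, here is a minimal Python sketch of value iteration for a discounted MDP. It is not part of the official course material, and the toy transition probabilities, rewards, and discount factor are made-up illustrative values.

```python
import numpy as np

def value_iteration(P, r, gamma=0.95, tol=1e-8):
    """Value iteration for a discounted MDP.

    P: (A, S, S) array, P[a, s, t] = Pr(next state t | state s, action a)
    r: (S, A) array of expected one-step rewards
    Returns the optimal value function and a greedy (optimal) policy.
    """
    n_states = P.shape[1]
    V = np.zeros(n_states)
    while True:
        # Bellman optimality backup: Q[s, a] = r[s, a] + gamma * sum_t P[a, s, t] V[t]
        Q = r + gamma * np.einsum("ast,t->sa", P, V)
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=1)
        V = V_new

# Toy 2-state, 2-action MDP (hypothetical numbers, for illustration only)
P = np.array([[[0.9, 0.1], [0.2, 0.8]],   # transitions under action 0
              [[0.5, 0.5], [0.3, 0.7]]])  # transitions under action 1
r = np.array([[1.0, 0.0],                 # r[s, a]
              [0.0, 2.0]])
V, policy = value_iteration(P, r)
```

Because the Bellman backup is a contraction in the sup-norm for gamma < 1, the loop converges to the optimal value function from any starting point.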
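For the linear quadratic outcome, the following is a sketch of the backward Riccati recursion, assuming the standard discrete-time problem with dynamics x_{t+1} = A x_t + B u_t + w_t and quadratic stage cost; by certainty equivalence, additive zero-mean noise does not change the optimal feedback gains. The function name and interface are illustrative, not course-prescribed.

```python
import numpy as np

def lqr_gains(A, B, Q, R, horizon):
    """Finite-horizon LQR gains via the backward Riccati recursion.

    Dynamics: x_{t+1} = A x_t + B u_t + w_t (w_t zero-mean noise).
    By certainty equivalence, the gains equal those of the deterministic
    problem; the optimal control is u_t = -gains[t] @ x_t.
    """
    P = Q.copy()          # terminal cost-to-go matrix
    gains = []
    for _ in range(horizon):
        # K = (R + B' P B)^{-1} B' P A
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        # Riccati update: P <- Q + A' P (A - B K)
        P = Q + A.T @ P @ (A - B @ K)
        gains.append(K)
    return gains[::-1]    # gains[t] is the gain to apply at time t
```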
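For the bandit outcomes, here is a minimal UCB1 sketch on Bernoulli arms; the arm means, horizon, and seed are made-up illustrative values. UCB1's pseudo-regret grows logarithmically in the horizon, matching the order of the regret lower bounds studied in the course.

```python
import numpy as np

def ucb1(means, horizon=10_000, seed=0):
    """UCB1 on Bernoulli arms; `means` are unknown to the learner."""
    rng = np.random.default_rng(seed)
    K = len(means)
    counts = np.zeros(K)
    sums = np.zeros(K)
    pseudo_regret = 0.0
    for t in range(1, horizon + 1):
        if t <= K:
            arm = t - 1  # play every arm once to initialize
        else:
            # UCB index: empirical mean + exploration bonus
            arm = int(np.argmax(sums / counts + np.sqrt(2 * np.log(t) / counts)))
        sums[arm] += float(rng.random() < means[arm])  # Bernoulli reward
        counts[arm] += 1
        pseudo_regret += max(means) - means[arm]
    return pseudo_regret

# Illustrative run: three arms with hypothetical means
print(ucb1([0.3, 0.5, 0.6]))
```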
Literature and preparations
Specific prerequisites
For single course students: 120 credits and documented proficiency in English B or equivalent.
Literature
Puterman, Markov Decision Processes: Discrete Stochastic Dynamic Programming, Wiley.
Bertsekas, Dynamic Programming and Optimal Control, Vol. 1, Athena Scientific.
Bubeck and Cesa-Bianchi, Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems, Foundations and Trends in Machine Learning, Now Publishers, 2012.
Examination and completion
Grading scale
A, B, C, D, E, FX, F
Examination
- HEM1 - Homework 1, 1.0 credits, grading scale: P, F
- HEM2 - Homework 2, 1.0 credits, grading scale: P, F
- LAB1 - Lab 1, 1.0 credits, grading scale: P, F
- LAB2 - Lab 2, 1.0 credits, grading scale: P, F
- TEN1 - Exam, 3.5 credits, grading scale: A, B, C, D, E, FX, F
Based on recommendation from KTH’s coordinator for disabilities, the examiner will decide how to adapt an examination for students with documented disability.
The examiner may apply another examination format when re-examining individual students.
If the course is discontinued, students may request to be examined during the following two academic years.
Other requirements for final grade
- HEM1: Homework 1, 1.0 credits, grade scale: P/F
- HEM2: Homework 2, 1.0 credits, grade scale: P/F
- LAB1: Computer lab 1, 1.0 credits, grade scale: P/F
- LAB2: Computer lab 2, 1.0 credits, grade scale: P/F
- TEN1: Written exam, 3.5 credits, grade scale: A, B, C, D, E, FX, F
Examiner
Ethical approach
- All members of a group are responsible for the group's work.
- In any assessment, every student shall honestly disclose any help received and sources used.
- In an oral assessment, every student shall be able to present and answer questions about the entire assignment and solution.