
Jakob Nylöf: Deep q-learning in continuous time

Master's Thesis

Time: Friday 2024-06-14, 10:15–11:00

Location: KTH 3424 (lunch room)

Respondent: Jakob Nylöf

Supervisor: Boualem Djehiche


Abstract.

Reinforcement Learning (RL) focuses on designing agents that solve sequential decision-making problems by exploring and learning optimal actions through trial and error. Traditionally formulated in discrete time, RL algorithms such as Deep Q-learning teach agents the Q-function by means of function approximation with Deep Neural Networks (DNNs). Recent work by X. Y. Zhou and his co-authors proposes q-learning, a continuous-time counterpart of Q-learning. In this setting, one focuses on the "q-function", the time derivative of the Q-function, which is learned via a martingale approach. This thesis introduces Deep q-learning, in which the optimal q-function and the optimal value function are approximated with DNNs, analogously to Deep Q-learning. We adapt the q-learning algorithms of Jia and Zhou (2023) to obtain offline and online Deep q-learning algorithms. Furthermore, we prove that the discretization errors associated with these algorithms vanish as the time discretization step tends to zero, and we demonstrate convergence of the offline Deep q-learning algorithm through numerical simulations.
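For readers unfamiliar with the continuous-time setting, the sketch below illustrates the kind of construction the abstract describes: one DNN for the value function J(t, x) and one for the q-function q(t, x, a), trained on sampled trajectories by minimizing a discretized martingale-type loss. This is a minimal illustration under assumed toy dynamics (a one-dimensional controlled diffusion with a quadratic running reward, zero terminal value, and a fixed Gaussian exploration policy), not the thesis's actual algorithm or code; all names, dynamics, and hyperparameters are hypothetical.

# Minimal sketch (illustrative only, not the thesis code): DNN approximators
# for the value function and q-function, fitted with a discretized
# martingale-type loss in the spirit of Jia and Zhou (2023).
import torch
import torch.nn as nn

torch.manual_seed(0)

class MLP(nn.Module):
    def __init__(self, in_dim, out_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, out_dim))

    def forward(self, z):
        return self.net(z)

J = MLP(2, 1)   # value function approximator J_theta(t, x)
q = MLP(3, 1)   # q-function approximator q_psi(t, x, a)
opt = torch.optim.Adam(list(J.parameters()) + list(q.parameters()), lr=1e-3)

T, N = 1.0, 100
dt = T / N

def sample_trajectory():
    # Toy controlled diffusion dX = a dt + dW simulated by Euler-Maruyama;
    # actions from a fixed Gaussian exploration policy, reward -(x^2 + a^2).
    t = torch.arange(N) * dt
    x, a, r = torch.zeros(N), torch.zeros(N), torch.zeros(N)
    for i in range(N - 1):
        a[i] = 0.5 * torch.randn(())
        r[i] = -(x[i] ** 2 + a[i] ** 2)
        x[i + 1] = x[i] + a[i] * dt + dt ** 0.5 * torch.randn(())
    return t, x, a, r

for episode in range(200):
    t, x, a, r = sample_trajectory()
    Jv = J(torch.stack([t, x], -1)).squeeze(-1)
    qv = q(torch.stack([t, x, a], -1)).squeeze(-1)
    # Discretized martingale-type loss: J(t, X_t) should match the realized
    # sum of (reward - q) over [t, T], here with zero terminal value.
    inc = (r - qv) * dt
    tail = torch.flip(torch.cumsum(torch.flip(inc, [0]), 0), [0])
    loss = 0.5 * ((tail - Jv) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    if episode % 50 == 0:
        print(f"episode {episode}: loss {loss.item():.4f}")

The loss enforces, in discretized form, the requirement that the value function plus the accumulated reward net of the q-function behaves like a martingale along sampled trajectories; the offline and online algorithms studied in the thesis refine this idea within the entropy-regularized framework of Jia and Zhou (2023).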