Important: On June 29, to enter the E building and access room E3, please enter at Osquars Backe 2 (KTH, Stockholm) using the code 2429. If the door is closed, please contact us at +46(0)734618051 or by email at alepro@kth.se.
Find below the program of the workshop. You can find the list of invited speakers, their bios, and the abstracts of their talks here.
On June 29, the workshop will start at 2pm and will be held in room E3 at KTH. The room is on the 5th floor; directions will be provided.
Richard Combes - Distributed Multi-Armed Bandits without Collision Sensing
Abstract: In this talk we consider the distributed multi-armed bandit problem where M learners must select the best M arms out of a set of K arms in a distributed fashion, while avoiding collisions, that is, choosing the same arm. We present several approaches depending on the type of collision sensing available to each player, and then focus on the case where no collision sensing is available. We propose a novel algorithm for this scenario, which circumvents two problems shared by all state-of-the-art algorithms: it does not need as an input a lower bound on the minimal expected reward of an arm, and its performance does not scale inversely proportionally to the minimal expected reward. We prove a theoretical regret upper bound to justify these claims. We complement our theoretical results with numerical experiments, showing that the proposed algorithm outperforms the state of the art in practice as well. This talk is based on joint work with Huang Wei and Cindy Trinh.
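To make the collision constraint concrete, here is a minimal toy simulation of the problem setting (not the talk's algorithm): M learners pick among K Bernoulli arms, and a learner earns a reward only if no other learner picked the same arm. The arm means and the naive uncoordinated strategy are invented for illustration.

```python
import random

def play_round(means, choices):
    """Reward model for multi-player bandits: a learner earns the arm's
    Bernoulli reward only if no other learner picked the same arm."""
    rewards = []
    for arm in choices:
        collided = choices.count(arm) > 1
        pulled = 1 if random.random() < means[arm] else 0
        rewards.append(0 if collided else pulled)
    return rewards

random.seed(0)
means = [0.9, 0.8, 0.5, 0.2]   # K = 4 arms, M = 2 learners
M, T = 2, 10_000
total = 0
for _ in range(T):
    # Naive uncoordinated play: each learner picks uniformly at random,
    # so with M = 2 each learner collides with probability 1/K per round.
    choices = [random.randrange(len(means)) for _ in range(M)]
    total += sum(play_round(means, choices))
print(total / T)  # well below the 1.7 per round that the two best arms would give
```

The gap between this and the optimal assignment (learners spread over the top M arms) is exactly what distributed bandit algorithms must close without observing collisions directly.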
Ruo-Chun Tzeng and Po-An Wang - Faster Algorithms for Combinatorial Semi-Bandits
Abstract: This talk is based on two papers concerned with combinatorial semi-bandits and addressing the challenge of achieving both statistical and computational efficiency. We focus on two popular learning tasks: best arm identification and regret minimization.
In the first part of the talk, corresponding to the paper "Closing the Computational-Statistical Gap in Best Arm Identification for Combinatorial Semi-Bandits", we aim to identify the best action with a probability of at least 1-delta while minimizing the number of samples used. Several statistically optimal methods exist, but none are computationally efficient. This inefficiency stems from the complex computation of the most confusing parameter in the sample complexity lower bound. We propose a two-player game algorithm to approximate this parameter. The resulting first-order sampling rule (P-FWS) runs in polynomial time and achieves statistical optimality.
The second part of the talk, corresponding to the paper "Matroid Semi-Bandits in Sublinear Time", focuses on regret minimization in bandits with a matroid structure. Current state-of-the-art algorithms, such as CUCB and KL-OSM, require time complexity at least linear in the number K of arms. By dynamically maintaining an approximate maximum-weight base over inner-product weights, we introduce the first algorithm with per-round time complexity that is sublinear in K, while asymptotically matching the gap-dependent lower bound achieved by CUCB.
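For intuition, here is a sketch of CUCB on the simplest matroid, the uniform matroid, where any m of the K arms form a base: each round, play the m arms with the highest UCB indices and update with semi-bandit feedback. The means, horizon, and confidence-bonus constant are illustrative choices, not from the papers.

```python
import math
import random

def cucb_uniform_matroid(means, m, T, seed=0):
    """CUCB sketch for a uniform matroid (pick any m of K arms per round).
    Plays the m arms with the highest UCB indices and observes one
    Bernoulli sample per played arm (semi-bandit feedback)."""
    rng = random.Random(seed)
    K = len(means)
    counts = [0] * K
    sums = [0.0] * K
    for t in range(1, T + 1):
        ucb = []
        for k in range(K):
            if counts[k] == 0:
                ucb.append(float("inf"))     # force one pull of every arm
            else:
                bonus = math.sqrt(1.5 * math.log(t) / counts[k])
                ucb.append(sums[k] / counts[k] + bonus)
        # Max-weight base of a uniform matroid = top-m arms by UCB index.
        base = sorted(range(K), key=lambda k: ucb[k], reverse=True)[:m]
        for k in base:
            x = 1.0 if rng.random() < means[k] else 0.0
            counts[k] += 1
            sums[k] += x
    return counts

plays = cucb_uniform_matroid([0.9, 0.8, 0.3, 0.2], m=2, T=5000)
```

Note the top-m selection above costs O(K log K) per round; the talk's contribution is precisely a dynamic data structure that pushes the per-round cost below linear in K for general matroids.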
3:45 - 4:00 Break
Kuang Xu - Experimenting under Stochastic Congestion
Abstract: We study randomized experiments in a service system when stochastic congestion can arise from temporarily limited supply and/or demand. Such congestion gives rise to cross-unit interference between the waiting customers, and analytic strategies that do not account for this interference may be biased. In current practice, one of the most widely used ways to address stochastic congestion is to use switchback experiments that alternately turn a target intervention on and off for the whole system. We find, however, that under a queueing model for stochastic congestion, the standard way of analyzing switchbacks is inefficient, and that estimators that leverage the queueing model can be materially more accurate. We also consider a new class of experimental design, which can be used to estimate a policy gradient of the dynamic system using only unit-level randomization, thus alleviating key practical challenges that arise in running a switchback. Preprint: https://arxiv.org/abs/2302.12093.
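As a toy illustration of why switchback analysis is delicate (this is an invented carryover model, not the paper's queueing model or estimator): when outcomes carry over across periods, as congestion does, the naive difference-in-means over on/off blocks mixes the per-period effect with accumulated spillover.

```python
import random

def simulate_switchback(T, block, effect, carryover, seed=0):
    """Toy switchback: alternate treatment on/off in blocks of `block`
    periods. Each outcome inherits a fraction of the previous period's
    outcome, mimicking congestion that persists across switches."""
    rng = random.Random(seed)
    treat, y = [], []
    prev = 0.0
    for t in range(T):
        z = (t // block) % 2                      # alternate on/off blocks
        out = effect * z + carryover * prev + rng.gauss(0, 0.1)
        treat.append(z)
        y.append(out)
        prev = out
    on = [v for v, z in zip(y, treat) if z == 1]
    off = [v for v, z in zip(y, treat) if z == 0]
    return sum(on) / len(on) - sum(off) / len(off)

est = simulate_switchback(T=20_000, block=50, effect=1.0, carryover=0.5)
```

Here the per-period effect is 1.0, but within long on-blocks outcomes drift toward effect/(1 - carryover) = 2.0, so the naive estimator lands near 2 rather than 1; which of these is the estimand of interest depends on the question being asked, which is exactly why model-aware analysis helps.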
Rajesh Sundaresan - Reputation-based information design for increasing prosocial behaviour
Abstract: We will discuss an information design problem that uses (1) norms and conventions in a society or network and (2) individuals' desires to have a good reputation (in keeping with these norms and conventions), to increase the quantum of prosocial actions. A function of each individual's action is made public, and considerations of the resulting reputation improve prosocial actions. The planner designs the function to make public and agents respond with their actions. We will study the equilibria that ensue and will highlight some interesting properties. The problem came out of a field experiment conducted in about 20,000 households in Kerala on how effective feedback can be in reducing residential energy consumption. The talk will be on joint work with Alexandre Reiffers-Masson.
July 4 - Digital Futures hub KTH
Adam Wierman - MARL and Machine-Learned Advice in Strategic Settings
Abstract: This talk will discuss recent results about learning in strategic settings. We will focus on the classical setting of two-player stochastic games and study the convergence of multi-agent reinforcement learning, connecting smoothed fictitious play with modern algorithms like DQN. Then, we will show how to balance exploitative play (using learning tools) with robust play (using game-theoretic tools) in strategic settings by building on the emerging literature of learning-augmented algorithms.
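As background on the classical building block mentioned above, here is a minimal sketch of (unsmoothed) fictitious play in a 2x2 zero-sum game: each player best-responds to the opponent's empirical action frequencies. The game and horizon are illustrative; the talk's results concern stochastic games and smoothed variants.

```python
def fictitious_play(payoff, T):
    """Fictitious play sketch for a 2x2 zero-sum game (row maximizes
    payoff, column minimizes): each player best-responds to the
    opponent's empirical mixture of past actions."""
    counts = [[1, 1], [1, 1]]   # counts[player][action], uniform prior
    for _ in range(T):
        p1_freq = [c / sum(counts[1]) for c in counts[1]]
        p0_freq = [c / sum(counts[0]) for c in counts[0]]
        # Row player maximizes expected payoff against column's mixture.
        a0 = max((sum(payoff[a][b] * p1_freq[b] for b in range(2)), a)
                 for a in range(2))[1]
        # Column player minimizes, i.e. maximizes the negated payoff.
        a1 = max((sum(-payoff[a][b] * p0_freq[a] for a in range(2)), b)
                 for b in range(2))[1]
        counts[0][a0] += 1
        counts[1][a1] += 1
    return [c / sum(counts[0]) for c in counts[0]]

# Matching pennies: row wins (+1) when actions match, loses (-1) otherwise.
pennies = [[1, -1], [-1, 1]]
freqs = fictitious_play(pennies, 20_000)
```

By Robinson's classical result, the empirical frequencies in a zero-sum game converge to an equilibrium, here the uniform mixture (0.5, 0.5), even though the realized actions keep cycling.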
Bert Zwart - Optimization under rare events: scaling laws for solutions of chance-constrained programs
Abstract: Inspired by the need to develop a fundamental framework for Probabilistic Dimensioning of Frequency Restoration Reserves in the European Power Grid, we investigate a class of chance-constrained programs in which profit needs to be maximized while enforcing that a given adverse event (such as a blackout) remains rare.
Using techniques from large deviations and extreme-value theory, we show how the optimal value scales as the prescribed bound on the violation probability becomes small and how convex programs emerge in the limit. We also use our results to show that the popular CVaR approximation is asymptotically optimal under light-tail assumptions but is sub-optimal in a heavy-tailed setting. Then, we apply point process techniques and random set theory to study the suboptimality gap in the Monte Carlo-based scenario approach.
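As background, in a generic formulation (not the paper's specific grid model), a chance-constrained program with loss L(x, \xi), decision x, randomness \xi, and violation budget \delta reads

```latex
\begin{aligned}
\max_{x}\quad & c^\top x \\
\text{s.t.}\quad & \mathbb{P}\big(L(x,\xi) > 0\big) \le \delta ,
\end{aligned}
```

and the standard CVaR approximation, in the Rockafellar--Uryasev form, replaces the probabilistic constraint by the conservative convex surrogate

```latex
\mathrm{CVaR}_{\delta}\big(L(x,\xi)\big)
  \;=\; \min_{t \in \mathbb{R}} \Big\{\, t + \tfrac{1}{\delta}\,
        \mathbb{E}\big[(L(x,\xi) - t)^{+}\big] \Big\} \;\le\; 0 .
```

Since CVaR_\delta(L) \le 0 implies P(L > 0) \le \delta, and CVaR is convex in x whenever L is, this yields a tractable conservative approximation; the talk characterizes when this approximation is asymptotically tight (light tails) and when it is not (heavy tails).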
3:30 - 3:45 Break
Cesar A. Uribe - On Graphs with Finite-Time Consensus
Abstract: In this talk, we present sequences of graphs satisfying the finite-time consensus property (i.e., iterating through such a finite sequence is equivalent to performing global or exact averaging) and their use in Gradient Tracking. We provide an explicit weight matrix representation of the studied sequences and prove its finite-time consensus property. Moreover, we incorporate the studied finite-time consensus topologies into Gradient Tracking and present a new algorithmic scheme called Gradient Tracking for Finite-Time Consensus Topologies (GT-FT). We analyze the new scheme for nonconvex problems with stochastic gradient estimates. Our analysis shows that the convergence rate of GT-FT does not depend on the heterogeneity of the agents' functions or the connectivity of any individual graph in the topology sequence. Furthermore, owing to the sparsity of the graphs, GT-FT requires lower communication costs than Gradient Tracking using the static counterpart of the topology sequence.
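A classical example of a graph sequence with the finite-time consensus property is the hypercube: on n = 2^d nodes, pairwise averaging along one hypercube dimension per round reaches the exact global average after d rounds. A minimal sketch (illustrative of the property itself, not of the GT-FT scheme):

```python
def hypercube_consensus(values):
    """Finite-time consensus sketch: on a d-dimensional hypercube with
    n = 2**d nodes, averaging with the neighbour across one dimension
    per round yields the exact global average after d rounds."""
    n = len(values)
    d = n.bit_length() - 1
    assert n == 2 ** d, "toy example assumes n is a power of two"
    x = list(values)
    for dim in range(d):
        nxt = list(x)
        for i in range(n):
            j = i ^ (1 << dim)            # neighbour across dimension `dim`
            nxt[i] = 0.5 * (x[i] + x[j])
        x = nxt
    return x

out = hypercube_consensus([1.0, 5.0, 3.0, 7.0])  # exact average 4.0 after 2 rounds
```

Each round uses a graph of degree one (a perfect matching), which is exactly the kind of sparsity that lets schemes built on such sequences communicate less per round than a static dense topology.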
Junghyun Lee - Nearly Optimal Latent State Decoding in Block MDPs
Abstract: We investigate the problems of model estimation and reward-free learning in episodic Block MDPs. In these MDPs, the decision maker has access to rich observations or contexts generated from a small number of latent states. We are first interested in estimating the latent state decoding function (the mapping from the observations to latent states) based on data generated under a fixed behavior policy. We derive an information-theoretical lower bound on the error rate for estimating this function and present an algorithm approaching this fundamental limit. In turn, our algorithm also provides estimates of all the components of the MDP. We then study the problem of learning near-optimal policies in the reward-free framework. Based on our efficient model estimation algorithm, we show that we can infer a policy converging (as the number of collected samples grows large) to the optimal policy at the best possible rate. Interestingly, our analysis provides necessary and sufficient conditions under which exploiting the block structure yields improvements in the sample complexity for identifying near-optimal policies. When these conditions are met, the sample complexity in the minimax reward-free setting is improved by a multiplicative factor n, where n is the number of possible contexts.