
Version created by Petter Ögren 2018-12-10 15:36


Master thesis proposals - external

Hybrid Model-based Model-free Reinforcement Learning for Robotic Manipulation

Background
Recent advances in artificial intelligence have enabled machines to compete with humans even in the most difficult domains; Google DeepMind's AlphaGo is a case in point. Similar reinforcement learning (RL) approaches have been tried in the robotics community on problems of skill learning. By skill we mean a sensorimotor policy (control policy) that performs a single continuous-time task. Numerous successes in skill learning have been reported for manipulation tasks that are otherwise difficult to program, such as batting, pancake flipping, pouring, and pole balancing. One of the most challenging classes of manipulation tasks is the assembly of mating parts. Not surprisingly, the capability to learn assembly skills is highly sought after.

Problem description
RL methods can be divided into model-based and model-free methods. In model-based methods, the algorithm learns a dynamics model of the manipulation task and uses it to optimize the policy. In model-free RL (policy search), by contrast, the policy is typically optimized directly, without the intermediate step of model learning. The trade-off is between the number of trials (sample efficiency) and model bias: model-based methods are sample efficient, whereas model-free methods do not suffer from model bias. We propose a hybrid approach that combines the benefits of both. It employs a global black-box optimization method called Bayesian optimization (BO) to learn the policy in a fundamentally model-free way, while using a learned model to guide the process. We will exploit the fact that BO does not require an analytical or differentiable cost function; it only needs the cost observed in each trial. Our application will be an assembly task in which an ABB YuMi robot inserts one part into another.
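To make the hybrid idea more concrete, here is a minimal Python/NumPy sketch. It is not the project's method; it is one possible way a learned model could guide BO, under stated assumptions. BO searches over a single policy parameter `theta` using a Gaussian process (RBF kernel) and the expected-improvement acquisition. The GP models only the residual between the observed trial cost and a cost predicted from a learned model, so the model acts as the GP prior mean and real trials correct its bias. `run_trial` and `model_predicted_cost` are hypothetical stand-ins for a real insertion trial and for a rollout in a learned dynamics model, and the kernel hyperparameters are arbitrary.

```python
import math
import numpy as np

# Arbitrary GP hyperparameters chosen for this toy example.
LENGTH, SIGNAL_VAR, NOISE_VAR = 0.15, 1.0, 1e-4

def run_trial(theta):
    """Hypothetical episode cost; stands in for one real insertion trial."""
    return (theta - 0.3) ** 2 + 0.1 * math.sin(8.0 * theta)

def model_predicted_cost(theta):
    """Hypothetical cost predicted by a learned model (deliberately biased)."""
    return (np.asarray(theta) - 0.25) ** 2

def rbf(a, b):
    """Squared-exponential kernel between two 1-D arrays of parameters."""
    d = a[:, None] - b[None, :]
    return SIGNAL_VAR * np.exp(-0.5 * (d / LENGTH) ** 2)

def gp_posterior(x, y, xq, prior_mean):
    """GP regression on the residual y - prior_mean(x), evaluated at xq."""
    L = np.linalg.cholesky(rbf(x, x) + NOISE_VAR * np.eye(len(x)))
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y - prior_mean(x)))
    Ks = rbf(x, xq)
    mu = prior_mean(xq) + Ks.T @ alpha
    v = np.linalg.solve(L, Ks)
    var = np.maximum(SIGNAL_VAR - np.sum(v ** 2, axis=0), 1e-12)
    return mu, np.sqrt(var)

_erf = np.vectorize(math.erf)

def expected_improvement(mu, sigma, best):
    """Expected improvement for minimization of the cost."""
    z = (best - mu) / sigma
    cdf = 0.5 * (1.0 + _erf(z / math.sqrt(2.0)))
    pdf = np.exp(-0.5 * z ** 2) / math.sqrt(2.0 * math.pi)
    return (best - mu) * cdf + sigma * pdf

def bayes_opt(n_trials=10, seed=0, prior_mean=model_predicted_cost):
    """Run n_trials 'trials', choosing each new theta by maximizing EI on a grid."""
    rng = np.random.default_rng(seed)
    grid = np.linspace(0.0, 1.0, 201)
    thetas = [float(rng.uniform(0.0, 1.0))]
    costs = [run_trial(thetas[0])]
    for _ in range(n_trials - 1):
        mu, sigma = gp_posterior(np.array(thetas), np.array(costs), grid, prior_mean)
        theta = float(grid[np.argmax(expected_improvement(mu, sigma, min(costs)))])
        thetas.append(theta)
        costs.append(run_trial(theta))
    i = int(np.argmin(costs))
    return thetas[i], costs[i], costs
```

With `prior_mean=lambda x: np.zeros_like(x)`, the same loop reduces to plain model-free BO. That gives a natural baseline for judging whether the model guidance actually reduces the number of trials.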

Purpose and aims
The objective of this thesis is to develop a skill learning method within the RL framework. The robot should demonstrate the learning process by repeatedly attempting the insertion while making incremental progress, and finally converge by completing the task successfully in a few consecutive trials.

The work will include the following tasks:

  1. Conduct a literature review on RL-based skill learning and BO.
  2. Formulate a strategy for using a learned dynamics model to guide the BO. The model learning algorithm can be assumed to be given.
  3. Set up a simulation environment in either MuJoCo or Bullet. Implement the simpler inverted pendulum task first and then the main insertion task.
  4. Develop a parameterized policy (not necessarily a deep network) and implement the BO-based RL algorithm, including the results of Step 2.
  5. Evaluate the method on a real robot and draw conclusions about the hybrid method. 

We are looking for a highly motivated student from a master's program such as Systems, Control and Robotics, or Machine Learning, or a student with a similar background. Knowledge of modeling and control of robotic manipulators is highly advantageous. Any prior exposure to Gaussian process regression, RL, or BO will be valued. A medium to high level of competence in either Python or Matlab is necessary. Master's-level knowledge of linear algebra and probability theory is expected, and general competence in machine learning will be highly appreciated.

The student will gain competence in robotics, robot control, reinforcement learning, Bayesian optimization, Gaussian processes, etc. Note that the student will work at ABB Corporate Research in Västerås, and the company will provide compensation as well as accommodation. This project is defined within the context of an ongoing PhD project; the student can therefore expect a strong research environment and support, including software and systems. Prospective PhD students will be given preference. It may also be possible to do this project at RPL, but this will be decided on a case-by-case basis.

Contact:  Shahbaz Khader, +46725305968, shahbaz.khader@se.abb.com, ABB 

Online Planning Based Reinforcement Learning for Robotic Manipulation

Background
Recent advances in artificial intelligence have enabled machines to compete with humans even in the most difficult domains; Google DeepMind's AlphaGo is a case in point. Similar reinforcement learning (RL) approaches have been tried in the robotics community on problems of skill learning. By skill we mean a sensorimotor policy (control policy) that performs a single continuous-time task. Numerous successes in skill learning have been reported for manipulation tasks that are otherwise difficult to program, such as batting, pancake flipping, pouring, and pole balancing. One of the most challenging classes of manipulation tasks is the assembly of mating parts. Not surprisingly, the capability to learn assembly skills is highly sought after.

Problem description
Most RL methods for skill learning are of the policy search type, in which the optimal parameters of a parameterized policy are obtained through an optimization process. Computing a general policy that takes the best action in any possible state is a much harder problem than planning a sequence of actions from a single state. On the other hand, a policy provides robustness to uncertainties, whereas a precomputed plan cannot cope with any deviation from it. Online planning, or model predictive control (MPC), combines the best of both worlds. Instead of computing a policy offline, a plan is computed online at every execution step; only its first action is applied and the rest is discarded. This process is repeated at every time step. The drawback of online planning is the high computational cost of planning at every time step. When combined with dynamics model learning, the overall method becomes a reinforcement learning approach. Some of the challenges we aim to tackle in this thesis are trading off planning horizon against computational cost, planning under an uncertain dynamics model, and incorporating prior knowledge of the task instead of relying entirely on learning the dynamics. Our application will be an assembly task in which an ABB YuMi robot inserts one part into another.
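The receding-horizon loop can be illustrated with a minimal Python/NumPy sketch under several assumptions. A 1-D damped cart (position and velocity) stands in for the robot. A hypothetical three-member ensemble of (friction, gain) guesses stands in for a learned dynamics model. Random shooting is used as the planner, which is a simple swapped-in choice rather than the method the thesis will develop. At each step, candidate action sequences are scored by their mean predicted cost across the ensemble plus `beta` times its standard deviation, which is one simple way to penalize model uncertainty. Only the first action of the best sequence is applied before re-planning.

```python
import numpy as np

DT = 0.1        # time step [s]
TARGET = 1.0    # desired cart position

# Hypothetical "learned" ensemble of (friction, gain) guesses; their
# disagreement stands in for model uncertainty.
ENSEMBLE = [(0.3, 0.9), (0.5, 1.0), (0.7, 1.1)]
TRUE_PARAMS = (0.5, 1.0)  # parameters of the simulated "real" system

def step(pos, vel, u, friction, gain):
    """Euler step of a damped double integrator (works on scalars or arrays)."""
    return pos + DT * vel, vel + DT * (gain * u - friction * vel)

def rollout_costs(pos0, vel0, plans, friction, gain):
    """Total predicted cost of every candidate plan (rows of `plans`) under one model."""
    pos = np.full(plans.shape[0], float(pos0))
    vel = np.full(plans.shape[0], float(vel0))
    total = np.zeros(plans.shape[0])
    for t in range(plans.shape[1]):
        u = plans[:, t]
        total += (pos - TARGET) ** 2 + 0.1 * vel ** 2 + 0.01 * u ** 2
        pos, vel = step(pos, vel, u, friction, gain)
    return total

def plan_first_action(pos, vel, rng, horizon=15, n_samples=300, beta=1.0):
    """Random-shooting planner with an uncertainty penalty across the ensemble."""
    plans = rng.uniform(-1.0, 1.0, size=(n_samples, horizon))
    costs = np.stack([rollout_costs(pos, vel, plans, f, g) for f, g in ENSEMBLE], axis=1)
    score = costs.mean(axis=1) + beta * costs.std(axis=1)
    return float(plans[np.argmin(score), 0])

def run_mpc(n_steps=100, seed=0):
    """Closed-loop MPC: re-plan at every step, apply only the first action."""
    rng = np.random.default_rng(seed)
    pos, vel = 0.0, 0.0
    for _ in range(n_steps):
        u = plan_first_action(pos, vel, rng)        # plan over the whole horizon...
        pos, vel = step(pos, vel, u, *TRUE_PARAMS)  # ...but execute only the first action
    return pos, vel
```

The sketch also makes the computational cost concrete. Every control step evaluates `n_samples × horizon × len(ENSEMBLE)` model steps, so a longer horizon or a larger ensemble directly increases the planning time per step.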

Purpose and aims
The objective of this thesis is to develop a skill learning method within the RL framework. The robot should demonstrate the learning process by repeatedly attempting the insertion while making incremental progress, and finally converge by completing the task successfully in a few consecutive trials.

The work will include the following tasks:

  1. Conduct a literature review on RL-based skill learning and MPC.
  2. Formulate an online planning method that exploits the uncertainty of the learned dynamics model. The model learning algorithm can be assumed to be given.
  3. Develop a strategy for combining offline learning from simulation with online planning.
  4. Evaluate the method on simulated tasks and on a real robot.

We are looking for a highly motivated student from a master's program such as Systems, Control and Robotics, or Machine Learning, or a student with a similar background. Knowledge of modeling and control of robotic manipulators is highly advantageous. Any prior exposure to optimal control, MPC, or RL will be valued. A medium to high level of competence in either Python or Matlab is necessary. Master's-level knowledge of linear algebra and probability theory is expected, and general competence in machine learning will be highly appreciated.

The student will gain competence in robotics, robot control, reinforcement learning, optimization, optimal control, etc. Note that the student will work at ABB Corporate Research in Västerås, and the company will provide compensation as well as accommodation. This project is defined within the context of an ongoing PhD project; the student can therefore expect a strong research environment and support, including software and systems. Prospective PhD students will be given preference. It may also be possible to do this project at RPL, but this will be decided on a case-by-case basis.

Contact:  Shahbaz Khader, +46725305968, shahbaz.khader@se.abb.com, ABB Corporate Research