Capita selecta – Reinforcement Learning for Operations Management fall 2023

Date:

22 November 2023 and 21 February 2024

Time:

10.00 – 16.00 h

Location:

Utrecht

Lecturers:

Willem van Jaarsveld, Wouter van Heeswijk, Martijn Mes, Zaharah Bukhsh

Days:

2

ECTS:

4 (participating + passing the assignment)

Course fee:

Free for TRAIL/Beta/ERIM/OML members, others please contact the TRAIL office

Registration:

= This course is fully booked! =

For a place on the waiting list, please fill in the pre-registration form (see below).

Please note:

  • All students are expected to participate actively, and completing the assignment is mandatory.
  • Please read the prerequisite part (below) before registering.
  • Maximum of 14 students (TRAIL/Beta/ERIM/OML members have first choice; otherwise first come, first served).

Objectives:

The objective of this course is to learn to apply the technique of reinforcement learning to solve a variety of operational problems. After successful completion of this course, students will be able to:

  • relate reinforcement learning to (approximate) dynamic programming to solve MDPs;
  • explain the role and purpose of neural networks in deep reinforcement learning;
  • evaluate the different design choices to set up a reinforcement learning algorithm;
  • benchmark the performance of reinforcement learning against other (near-)optimal solutions;
  • acknowledge the limitations of reinforcement learning;
  • apply reinforcement learning as a general-purpose technology to a problem of choice.

Course description:

This course introduces the technique of reinforcement learning to optimize operations management problems. It covers both the theoretical foundations and the application of reinforcement learning algorithms to practical problems, with a focus on effective application in operations management. All students are expected to design and implement a reinforcement learning algorithm for a problem of choice. Examples of reinforcement learning applications in the operations management and logistics field are provided during the course. Coaching will be provided to assist the students with their assignment.

Assignment:

All students are expected to design and implement a reinforcement learning algorithm for an operations management problem of choice from their own research interests. The only requirement is that it involves sequential decision-making under uncertainty and that it can be modelled as a Markov Decision Process (the latter not being a strong restriction, as we will see in this course). The objective is to benchmark the performance of the (deep) reinforcement learning algorithm against existing heuristics and/or the optimal solution (possibly obtained through dynamic programming). The problem, as well as its benchmarks, will be further defined during the first session.

Students will be paired into teams of two based on their research interests. Each team will be part of a squad of 2-4 teams. Each squad will meet online every week or every other week, together with their coach, to discuss their progress and share what they have learned. An intermediate presentation with all squads is planned halfway to the final deliverable (i.e., mid-December). This meeting will take place online.

The final results will be presented during a physical session in Utrecht, followed by a debrief and a discussion of future research opportunities. The deliverable is two-fold: a presentation and a final report (approx. 10 pages).

Program:

Opening session (in Utrecht).

  • Reinforcement learning and its relation to (approximate) dynamic programming;
  • The use of neural networks in reinforcement learning;
  • Design choices to set up a reinforcement learning algorithm and the different types of algorithms;
  • Definition of the problem that will be covered during the assignment;
  • How to get started?

(Bi-)weekly coaching sessions per squad (online).

Intermediate presentation to discuss the status (online).

Closing session (in Utrecht) with final presentations and closure of the program.

Literature:

Methodology:

Course material:

A selected list of papers will be made available at the beginning of the course.

Prerequisites:

Students need a solid background in probability theory and linear algebra. In addition, students should have a solid foundation in a programming language such as MATLAB, Python, C++ or R. Students can use their language of choice, but they should ensure that the methods discussed in the sessions are available in their preferred language. For neural networks, PyTorch and/or TensorFlow are recommended – these can be called from various languages, such as Python and C++. Students are not required to have any prior knowledge of machine learning.

Pre-registration form


Member of research school: