Capita Selecta – Reinforcement Learning for Operations Management

Date: 20 November 2024 & 19 February 2025
Time: 10.00 – 16.00 h
Location: Utrecht
Lecturers: Willem van Jaarsveld, Wouter van Heeswijk, Martijn Mes, Zaharah Bukhsh
Days: 2
ECTS: 4 (attendance + passing assignment)
Course fee: Free for TRAIL/Beta/ERIM members; others please contact the TRAIL office
Registration: This course is fully booked!

Objectives:

The objective of this course is for students to learn to apply reinforcement learning to a variety of operational problems. After successful completion of the course, students will be able to:

  • understand how to model a problem as a Markov Decision Process (MDP);
  • relate reinforcement learning to (approximate) dynamic programming to solve MDPs (see the sketch after this list);
  • explain the role and purpose of neural networks in deep reinforcement learning;
  • evaluate the different design choices to set up a reinforcement learning algorithm;
  • benchmark the performance of reinforcement learning to other (near-)optimal solutions;
  • acknowledge the limitations of reinforcement learning;
  • apply reinforcement learning as a general-purpose technology to a problem in the domain of operations management.
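
To illustrate the relation between reinforcement learning and dynamic programming mentioned above, the snippet below sketches tabular value iteration on a tiny, made-up MDP (three states, two actions; all numbers are purely illustrative). Reinforcement learning can be viewed as approximating exactly this computation when the transition model is unknown or the state space is too large to enumerate.

    import numpy as np

    # Tabular value iteration on a tiny illustrative MDP: the dynamic-programming
    # baseline that reinforcement learning approximates when the model is unknown.
    n_states, n_actions, gamma = 3, 2, 0.95
    rng = np.random.default_rng(0)
    P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a, s'] transition probabilities
    R = np.array([[1.0, 0.0], [0.0, 2.0], [0.5, 0.5]])                # R[s, a] expected one-step rewards

    V = np.zeros(n_states)
    for _ in range(1000):
        Q = R + gamma * P @ V   # Bellman backup: Q[s, a] = R[s, a] + gamma * sum_s' P[s, a, s'] * V[s']
        V_new = Q.max(axis=1)   # greedy improvement over actions
        if np.max(np.abs(V_new - V)) < 1e-8:
            break
        V = V_new

    print("Optimal values:", V)
    print("Greedy policy:", Q.argmax(axis=1))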

Course description:

This course introduces reinforcement learning as a technique for optimizing operations management problems. It covers both the theoretical foundations and the implementation of reinforcement learning algorithms for practical problems, with a focus on effective application to operations management. All students are expected to design and implement a reinforcement learning algorithm for a problem of their choice. Examples of reinforcement learning applications in operations management and logistics are provided during the course, and coaching is provided to assist students with their assignment.

Assignment:

All students are expected to apply deep reinforcement learning (DRL) algorithms to an operations management problem of their choice, drawn from their own research interests. To this end, the teachers will suggest suitable software packages. The only requirement on the problem is that it involves sequential decision-making under uncertainty and can be modelled as a Markov Decision Process. The objective is to benchmark the performance of the (deep) reinforcement learning algorithm against existing heuristics and/or the optimal solution (possibly obtained through dynamic programming). The problem and its benchmarks will be further defined during the first session. Students will be paired into teams of two based on their research interests, and each team will be part of a squad of 2-4 teams. Each squad will meet (bi-)weekly (i.e., every week or every other week) with their coach in an online meeting to discuss progress and share what they have learned. An intermediate online presentation with all squads is foreseen halfway to the final deliverable (i.e., mid-December). The final results will be presented during a physical session in Utrecht, followed by a debrief and a discussion of future research opportunities. The deliverable is twofold: a presentation and a final report (approx. 10 pages).
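
As an indication of what such an assignment could look like, the sketch below models a deliberately simple operations management problem, single-item inventory control with lost sales, as a Markov Decision Process using the Gymnasium environment interface, and evaluates a base-stock heuristic on it as a benchmark. All parameter values, names, and the choice of Gymnasium are illustrative assumptions, not part of the course material; students are free to use other tools and problem formulations.

    import numpy as np
    import gymnasium as gym
    from gymnasium import spaces

    class InventoryEnv(gym.Env):
        """Single-item inventory control with lost sales, modelled as an MDP.
        State: on-hand inventory. Action: order quantity (delivered immediately).
        Reward: revenue minus ordering and holding costs. Parameters are illustrative."""

        def __init__(self, capacity=20, demand_rate=5.0, price=5.0,
                     order_cost=2.0, holding_cost=0.5):
            self.capacity = capacity
            self.demand_rate = demand_rate
            self.price = price
            self.order_cost = order_cost
            self.holding_cost = holding_cost
            self.observation_space = spaces.Box(0, capacity, shape=(1,), dtype=np.float32)
            self.action_space = spaces.Discrete(capacity + 1)  # order 0..capacity units

        def reset(self, seed=None, options=None):
            super().reset(seed=seed)
            self.inventory = 0
            return np.array([self.inventory], dtype=np.float32), {}

        def step(self, action):
            # Receive the order, capped by storage capacity.
            self.inventory = min(self.inventory + int(action), self.capacity)
            # Poisson demand; unmet demand is lost.
            demand = self.np_random.poisson(self.demand_rate)
            sales = min(demand, self.inventory)
            self.inventory -= sales
            reward = (self.price * sales - self.order_cost * int(action)
                      - self.holding_cost * self.inventory)
            obs = np.array([self.inventory], dtype=np.float32)
            return obs, reward, False, False, {}  # infinite horizon: never terminates

    def base_stock_policy(obs, target=10):
        """Order-up-to heuristic used as a benchmark for the learned policy."""
        return max(target - int(obs[0]), 0)

    env = InventoryEnv()
    obs, _ = env.reset(seed=42)
    total = 0.0
    for _ in range(1000):
        obs, reward, *_ = env.step(base_stock_policy(obs))
        total += reward
    print(f"Base-stock benchmark, average reward per period: {total / 1000:.2f}")

A trained (deep) reinforcement learning agent would then be evaluated on the same environment and compared against this heuristic (and, for small instances, against the dynamic-programming optimum).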

Program:

Opening session (in Utrecht).

  • Reinforcement learning and its relation to (approximate) dynamic programming;
  • The use of neural networks in reinforcement learning;
  • Design choices to set up a reinforcement learning algorithm and the different types of algorithms;
  • Definition of the problem that will be covered during the assignment;
  • How to get started?

 

(Bi-)weekly coaching sessions per squad (online).
Intermediate presentation to discuss the status (online).
Closing session (in Utrecht) with final presentations and closure of the program.

Course material:

A selected list of papers will be made available at the beginning of the course.

Prerequisites:

Students need to have a solid background in probability theory and linear algebra. In addition, students should have a solid foundation in a programming language such as Python or C++. Students can use the language of their choice, but they should ensure that the methods discussed in the sessions are available in that language. For neural networks, PyTorch and/or TensorFlow are recommended; both can be called from various languages, including Python and C++. Students are not required to have any prior knowledge of machine learning.
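
For students unsure whether their programming background suffices: the short PyTorch sketch below shows the kind of neural network typically used in deep reinforcement learning, a small feed-forward model mapping a state vector to one value per action (all dimensions are illustrative). Being able to read and adapt a snippet like this is roughly the level of comfort assumed.

    import torch
    import torch.nn as nn

    class QNetwork(nn.Module):
        """Small feed-forward network mapping a state vector to one value per
        discrete action: the kind of model typically used in deep RL."""

        def __init__(self, state_dim: int, n_actions: int, hidden: int = 64):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(state_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, n_actions),
            )

        def forward(self, state: torch.Tensor) -> torch.Tensor:
            return self.net(state)

    # Example: a state described by 4 features and 10 possible actions.
    q_net = QNetwork(state_dim=4, n_actions=10)
    greedy_action = q_net(torch.rand(1, 4)).argmax(dim=-1)
    print("Greedy action:", greedy_action.item())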
