Capita selecta – Reinforcement Learning for Operations Management fall 2022


7 December 2022 & 15 February 2023


10.00 – 16.00 h


Boswell-Beta – Daltonlaan 400, Utrecht


Robert Boute (KU Leuven), Martijn Mes (University of Twente), Willem Van Jaarsveld (TU Eindhoven)




4 (participating + passing the assignment)

Course fee:

Free for TRAIL/Beta/OML/ERIM members, others please contact the TRAIL office


This course is fully booked. For a place on the 2023 list, you can fill in the pre-registration form (see below).

Please note:

  • All students are expected to participate actively, and completing the assignment is mandatory;
  • Please read the prerequisite part (below) before registering;
  • Maximum of 18 students (TRAIL/Beta/ERIM/OML members have first choice; otherwise first come, first served).


The objective of this course is to learn to apply reinforcement learning to a variety of operational problems. After successfully completing this course, students will be able to:

• relate reinforcement learning to (approximate) dynamic programming to solve MDPs;
• explain the role and purpose of neural networks in deep reinforcement learning;
• evaluate the different design choices to set up a reinforcement learning algorithm;
• benchmark the performance of reinforcement learning against other (near-)optimal solutions;
• acknowledge the limitations of reinforcement learning;
• apply reinforcement learning as a general-purpose technology to a problem of choice.

Course description:

This course introduces the technique of reinforcement learning to optimize operations management problems. It covers both the theoretical foundations and the implementation of reinforcement learning algorithms for practical problems. The focus will be on effective implementation for operations management problems. All students are expected to design and implement a reinforcement learning algorithm for a problem of their choice. Examples of reinforcement learning applications in the operations management and logistics field are provided during the course. Coaching will be provided to assist the students with their assignment.


All students are expected to design and implement a reinforcement learning algorithm for an operations management problem of their choice, drawn from their own research interests. The only requirement is that the problem involves sequential decision-making under uncertainty and can be modelled as a Markov Decision Process (the latter is not a strong restriction, as we will see in this course). The objective is to benchmark the performance of the (deep) reinforcement learning algorithm against existing heuristics and/or the optimal solution (possibly obtained through dynamic programming). The problem, as well as its benchmarks, will be further defined during the first session. Students will be paired into teams of two based on their research interests. Each team will be part of a squad of three teams. Each squad will meet online every week or every other week with its coach to discuss progress and share what it has learned. An intermediate presentation with all squads is scheduled halfway to the final deliverable (i.e., mid-December). This meeting will take place online. The final results will be presented during an in-person session in Utrecht, followed by a debrief and a discussion of future research opportunities. The deliverable is twofold: a presentation and a final report (approx. 10 pages).
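To give an impression of what such a benchmark can look like, the sketch below compares a learned policy with the optimal solution on a toy problem. It is an illustration only and not part of the course material: the small lost-sales inventory problem, all parameter values and all function names are hypothetical assumptions, and tabular Q-learning stands in for the (deep) reinforcement learning algorithm that a real assignment would use. The optimal costs are obtained by dynamic programming (value iteration), and the policy learned from simulated demand is then evaluated exactly so that the two can be compared per state.

```python
import random

# Hypothetical toy problem: all numbers below are illustrative assumptions.
MAX_INV = 5                          # storage capacity
DEMAND = {0: 0.3, 1: 0.4, 2: 0.3}    # demand value -> probability
ORDER_COST, HOLD_COST, LOST_SALE_COST = 1.0, 0.5, 4.0
GAMMA = 0.9                          # discount factor


def actions(s):
    """Feasible order quantities when the inventory level is s."""
    return range(MAX_INV - s + 1)


def step(s, a, d):
    """Return (cost, next_state) for inventory s, order a and demand d."""
    stock = s + a
    cost = (ORDER_COST * a
            + HOLD_COST * max(stock - d, 0)
            + LOST_SALE_COST * max(d - stock, 0))
    return cost, max(stock - d, 0)


def q_value(s, a, values):
    """Expected one-step cost plus discounted value of the next state."""
    total = 0.0
    for d, p in DEMAND.items():
        cost, nxt = step(s, a, d)
        total += p * (cost + GAMMA * values[nxt])
    return total


def value_iteration(tol=1e-9):
    """Dynamic programming benchmark: optimal discounted cost per state."""
    values = [0.0] * (MAX_INV + 1)
    while True:
        new = [min(q_value(s, a, values) for a in actions(s))
               for s in range(MAX_INV + 1)]
        if max(abs(x - y) for x, y in zip(new, values)) < tol:
            return new
        values = new


def evaluate_policy(policy, tol=1e-9):
    """Exact discounted cost per state of a fixed policy (list of orders)."""
    values = [0.0] * (MAX_INV + 1)
    while True:
        new = [q_value(s, policy[s], values) for s in range(MAX_INV + 1)]
        if max(abs(x - y) for x, y in zip(new, values)) < tol:
            return new
        values = new


def q_learning(steps=50_000, alpha=0.1, epsilon=0.1, seed=0):
    """Tabular Q-learning (cost minimisation) from simulated demand."""
    rng = random.Random(seed)
    q = [[0.0] * (MAX_INV - s + 1) for s in range(MAX_INV + 1)]
    demands, probs = zip(*DEMAND.items())
    s = 0
    for _ in range(steps):
        if rng.random() < epsilon:   # explore: random feasible order
            a = rng.randrange(len(q[s]))
        else:                        # exploit: order with lowest estimate
            a = min(range(len(q[s])), key=q[s].__getitem__)
        d = rng.choices(demands, weights=probs)[0]
        cost, nxt = step(s, a, d)
        q[s][a] += alpha * (cost + GAMMA * min(q[nxt]) - q[s][a])
        s = nxt
    # Greedy policy with respect to the learned Q-table.
    return [min(range(len(row)), key=row.__getitem__) for row in q]


if __name__ == "__main__":
    optimal = value_iteration()
    learned_policy = q_learning()
    learned = evaluate_policy(learned_policy)
    for s in range(MAX_INV + 1):
        print(f"inventory {s}: order {learned_policy[s]}, "
              f"cost {learned[s]:.3f} vs optimal {optimal[s]:.3f}")
```

The gap between the two cost columns is the quantity of interest: it shows how far the learned policy is from the optimum in each state. For problems too large for exact dynamic programming, the same comparison would be made against a heuristic instead.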


Opening session (on location):

• Reinforcement learning and its relation to (approximate) dynamic programming;
• The use of neural networks in reinforcement learning;
• Design choices to set up a reinforcement learning algorithm and the different types of algorithms;
• Definition of the problem that will be covered during the assignment;
• How to get started?

Bi-weekly coaching sessions per squad (online).
Intermediate presentation to discuss the status (online).
Closing session (on location) with final presentations and closure of the program.



Course material:

A selected list of papers will be made available at the beginning of the course.


Students need a solid background in probability theory and linear algebra. In addition, students should have solid experience with a programming language such as Matlab, Python, C++ or R. Students can use their language of choice, but they should ensure that the methods discussed in the sessions are available in that language. For neural networks, PyTorch and/or TensorFlow are recommended; these can be called from various languages such as Python, C++, etc. Students are not required to have any prior knowledge of machine learning.

Pre-registration form
