Reinforcement Learning for Operations Management

Date:

7 December 2022 & 15 February 2023

Time:

10.00 – 16.00 h

Location:

Utrecht

Lecturers:

Robert Boute (KU Leuven), Martijn Mes (University of Twente), Willem Van Jaarsveld (TU Eindhoven)

Days:

ECTS:

1 (participating only) / 4 (participating + passing the assignment)

Course fee:

Free for TRAIL/Beta/ERIM/OML members, others please contact the TRAIL office

Registration:

See below.

Please note:

All students are expected to participate actively, and completing the assignment is mandatory.
Please read the prerequisites section (below) before registering.
Maximum of 18 students (TRAIL/Beta/ERIM/OML members have first choice; otherwise first come, first served).

Objectives:

The objective of this course is to learn to apply reinforcement learning to solve a variety of operational problems. After successfully completing this course, students will be able to:

• relate reinforcement learning to (approximate) dynamic programming to solve MDPs;
• explain the role and purpose of neural networks in deep reinforcement learning;
• evaluate the different design choices to set up a reinforcement learning algorithm;
• benchmark the performance of reinforcement learning against other (near-)optimal solutions;
• acknowledge the limitations of reinforcement learning;
• apply reinforcement learning as a general purpose technology to a problem of choice.

Course description:

This course introduces reinforcement learning as a technique for solving operations management problems. It covers both the theoretical foundations and the implementation of reinforcement learning algorithms for practical problems. The focus will be on effective implementation for operations management problems. All students are expected to design and implement a reinforcement learning algorithm for a problem of choice. Examples of reinforcement learning applications in the operations management and logistics field are provided during the course. Coaching will be provided to assist the students with their assignment.

Assignment:

All students are expected to design and implement a reinforcement learning algorithm for an operations management problem of their choice, drawn from their own research interests. The only requirement is that the problem involves sequential decision-making under uncertainty and can be modelled as a Markov Decision Process (the latter is not a strong restriction, as we will see in this course). The objective is to benchmark the performance of the (deep) reinforcement learning algorithm against existing heuristics and/or the optimal solution (possibly obtained through dynamic programming). The problem, as well as its benchmarks, will be further defined during the first session.

Students will be paired into teams of two based on their research interests. Each team will be part of a squad of three teams. Each squad will meet (bi-)weekly (i.e., every week or every other week) online with its coach to discuss progress and share what they have learned.

An intermediate presentation with all squads is planned halfway to the final deliverable (i.e., mid-December); this meeting will take place online. The final results will be presented during a physical session in Utrecht, followed by a debrief and a discussion of future research opportunities. The deliverable is twofold: a presentation and a final report (approx. 10 pages).
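As an illustration of the benchmarking objective described for the assignment, the sketch below compares a learned policy with a simple heuristic and with the optimal solution from dynamic programming. It is only a toy example and not part of the course material: the lost-sales inventory problem, all cost parameters, the demand distribution, and the function names (`value_iteration`, `evaluate`, `q_learning`) are assumptions made for illustration, and tabular Q-learning stands in for the (deep) reinforcement learning algorithm a team would actually use.

```python
import random

# Toy lost-sales inventory MDP. Every number below is an assumption.
CAP = 5                              # maximum inventory level
DEMAND = {0: 0.3, 1: 0.4, 2: 0.3}    # demand value -> probability
GAMMA = 0.9                          # discount factor
FIXED, UNIT, HOLD, LOST = 2.0, 1.0, 1.0, 4.0  # cost parameters


def step_cost(s, a, d):
    """Return (one-period cost, next state) for stock s, order a, demand d."""
    on_hand = s + a
    sold = min(on_hand, d)
    nxt = on_hand - sold
    cost = (FIXED if a > 0 else 0.0) + UNIT * a + HOLD * nxt + LOST * (d - sold)
    return cost, nxt


def q_value(s, a, v):
    """Expected discounted cost of ordering a in state s, then following v."""
    total = 0.0
    for d, p in DEMAND.items():
        cost, nxt = step_cost(s, a, d)
        total += p * (cost + GAMMA * v[nxt])
    return total


def value_iteration(tol=1e-10):
    """Dynamic programming benchmark: optimal cost-to-go per state."""
    v = [0.0] * (CAP + 1)
    while True:
        new_v = [min(q_value(s, a, v) for a in range(CAP - s + 1))
                 for s in range(CAP + 1)]
        if max(abs(x - y) for x, y in zip(new_v, v)) < tol:
            return new_v
        v = new_v


def evaluate(policy, tol=1e-10):
    """Exact expected discounted cost of a fixed policy (list: state -> order)."""
    v = [0.0] * (CAP + 1)
    while True:
        new_v = [q_value(s, policy[s], v) for s in range(CAP + 1)]
        if max(abs(x - y) for x, y in zip(new_v, v)) < tol:
            return new_v
        v = new_v


def q_learning(steps=50_000, alpha=0.1, eps=0.1, seed=0):
    """Tabular Q-learning on simulated demand; returns a greedy policy."""
    rng = random.Random(seed)
    q = [[0.0] * (CAP - s + 1) for s in range(CAP + 1)]
    demands, probs = zip(*DEMAND.items())
    s = 0
    for _ in range(steps):
        if rng.random() < eps:                       # explore
            a = rng.randrange(CAP - s + 1)
        else:                                        # exploit (minimise cost)
            a = min(range(CAP - s + 1), key=lambda i: q[s][i])
        d = rng.choices(demands, weights=probs)[0]
        cost, nxt = step_cost(s, a, d)
        q[s][a] += alpha * (cost + GAMMA * min(q[nxt]) - q[s][a])
        s = nxt
    return [min(range(CAP - state + 1), key=lambda i: q[state][i])
            for state in range(CAP + 1)]


v_opt = value_iteration()
rl_policy = q_learning()
v_rl = evaluate(rl_policy)
base_stock = [max(2 - s, 0) for s in range(CAP + 1)]  # heuristic: order up to 2
v_heur = evaluate(base_stock)

for s in range(CAP + 1):
    print(f"stock={s}  optimal={v_opt[s]:.2f}  "
          f"q-learning={v_rl[s]:.2f}  base-stock={v_heur[s]:.2f}")
```

Because the learned and heuristic policies are evaluated exactly rather than by simulation, their costs can never fall below the dynamic programming benchmark; the size of the gap is what the comparison reports. For problems too large for exact dynamic programming, the benchmark would instead have to come from heuristics or simulation.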

Program:

Opening session (on location):

• Reinforcement learning and its relation to (approximate) dynamic programming;
• The use of neural networks in reinforcement learning;
• Design choices to set up a reinforcement learning algorithm and the different types of algorithms;
• Definition of the problem that will be covered during the assignment;
• How to get started?

Bi-weekly coaching sessions per squad (online).
Intermediate presentation to discuss the status (online).
Closing session (on location) with final presentations and closure of the program.

Literature:

Methodology:

Course material:

A selected list of papers will be made available at the beginning of the course.

Prerequisites:

Students need a solid background in probability theory and linear algebra. In addition, students should have a solid foundation in a programming language such as MATLAB, Python, C++ or R. Students can use their language of choice, but they should ensure that the methods discussed in the sessions are available in their preferred language. For neural networks, PyTorch and/or TensorFlow are recommended; these can be called from various languages such as Python and C++. Students are not required to have any prior knowledge of machine learning.

Course Registration form

Course

Date

First name

Last name

Employed at

Faculty

Email address

Telephone

Position

Member of research school:

Yes, Beta research school
Yes, TRAIL research school
Yes, other
No

Name other research school

Additional comments/dietary needs

I agree that GP-OML uses my data for the purpose of this course.