Machine Learning spring 2024

Date:

Date(s) for this course will be announced approximately two months before the start of the semester.

Time:

10.00 - 16.00 h

Location:

t.b.d.

Lecturer:

Prof. dr. Inneke Van Nieuwenhuyse (Hasselt University), Prof. dr. David Wozabal (VU Amsterdam)

Days:

4

ECTS:

1 (attendance) / 4 (attendance + assignment)

Course fee:

Free for TRAIL/Beta/ERIM members; others, please contact the TRAIL office.

Registration:

For a place on the waiting list, please fill in the pre-registration form.

Objectives:

This course aims to give students a solid understanding of machine learning (ML) models and techniques, covering both foundational principles and their practical applications.

After successful completion, students will be able to:

  • Understand the nature of learning and how it differs from classical statistical inference.
  • Understand the basic concepts, principles, and terminology of ML and appreciate the difference between supervised, unsupervised, and reinforcement learning.
  • Use ML models in Python and evaluate the models on a range of real-world applications.

Course description:

This course provides a comprehensive introduction to the fundamental principles of machine learning and statistical pattern recognition. It covers both the theoretical foundations and practical implementation of machine learning methods, guiding participants through the end-to-end process of data investigation using machine learning techniques. The objective is to either uncover new insights in areas with limited prior knowledge or achieve accurate predictions of future observations.

Beginning with an overview and characterization of machine learning methods, the course delves into general principles for data manipulation, feature engineering, model selection, calibration, and evaluation. It then focuses on supervised learning, specifically on tree-based regression and classification models, which are currently considered state of the art for tabular data, and on Gaussian processes.

The morning sessions primarily emphasize theoretical aspects, while the afternoon sessions offer hands-on demonstrations of machine learning methods using Python. The course does not center around specific applications, as those are addressed in the optional project. Participants are encouraged to apply the foundational knowledge gained in the course to a machine learning application relevant to their own scientific domain. Throughout the sessions, examples of machine learning applications are provided for reference.

Assignment:

An optional project is available in which you apply the topics discussed in this course to analyze data for an application from your own scientific area. The objective of this final project is to explore new research in machine learning. A starting point could be replicating a paper, adding your own meaningful analysis, comparing it with other papers, or applying the methods to a completely new application. There are two deliverables: a project proposal (to be approved by the lecturer who grades the project) and a final report. The project is graded pass/fail (in case of failure, the ECTS for the course remain at 1).

Program:

Day 1 Morning Session: Introduction to the foundational ideas of ML. Supervised, unsupervised and reinforcement learning. History of ML and overview of different methods.

Afternoon Session: An introduction to the fundamentals of ML in Python.

Instructor: David Wozabal, VU Amsterdam

Day 2 Morning Session: The ML pipeline, including data visualization, feature engineering, training, testing, and validation of models, hyperparameter tuning, and cross-validation. Trees for regression and classification.

Afternoon Session: Pandas DataFrames, plotting in Python, descriptive data analysis, and an extended example of data cleaning and feature engineering.

Instructor: David Wozabal, VU Amsterdam
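
To illustrate two of the Day 2 topics (cross-validation and hyperparameter tuning for regression trees), here is a minimal sketch in plain Python. It is not part of the course material: the data are synthetic, the helper names (fit_tree, predict, k_fold_mse) are hypothetical, and the tree handles a single feature only. The course sessions themselves use Python libraries rather than hand-written code like this.

```python
import math
import random


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def fit_tree(xs, ys, depth):
    """Fit a regression tree on one feature; a leaf is a float, a node is a tuple."""
    values = sorted(set(xs))
    if depth == 0 or len(values) < 2:
        return mean(ys)
    best = None
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2  # candidate threshold between two neighbouring values
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        lm, rm = mean(left), mean(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t)
    t = best[1]
    left = [(x, y) for x, y in zip(xs, ys) if x <= t]
    right = [(x, y) for x, y in zip(xs, ys) if x > t]
    return (t, fit_tree(*zip(*left), depth - 1), fit_tree(*zip(*right), depth - 1))


def predict(tree, x):
    """Walk down the tree until a leaf value is reached."""
    while isinstance(tree, tuple):
        t, left, right = tree
        tree = left if x <= t else right
    return tree


def k_fold_mse(xs, ys, depth, k=5):
    """Average held-out mean squared error over k interleaved folds."""
    n = len(xs)
    errors = []
    for fold in range(k):
        held_out = set(range(fold, n, k))
        train = [i for i in range(n) if i not in held_out]
        tree = fit_tree([xs[i] for i in train], [ys[i] for i in train], depth)
        errors.append(mean((predict(tree, xs[i]) - ys[i]) ** 2 for i in held_out))
    return mean(errors)


# Synthetic data (an assumption for illustration): a noisy sine curve.
rng = random.Random(0)
xs = [6 * i / 59 for i in range(60)]
ys = [math.sin(x) + rng.gauss(0, 0.1) for x in xs]

# Tune the tree depth (the hyperparameter) by cross-validation.
cv_scores = {depth: k_fold_mse(xs, ys, depth) for depth in range(6)}
best_depth = min(cv_scores, key=cv_scores.get)
print("CV error by depth:", cv_scores)
print("Selected depth:", best_depth)
```

Depth 0 corresponds to predicting the training mean everywhere, so it serves as a baseline against which deeper trees can be compared.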

Day 3 Morning Session: Random forests, creating strong learners from weak learners (boosting and bagging), and boosted trees as state-of-the-art ML for tabular data.

Afternoon Session: An extended regression example using XGBoost with scikit-learn.

Instructor: David Wozabal, VU Amsterdam
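
The Day 3 idea of building a strong learner from weak learners can be sketched as follows. This is a simplified illustration of boosting for squared-error loss with one-split regression trees (stumps) in plain Python, not the XGBoost algorithm and not course material. The data are synthetic, and the names fit_stump, boost, and boosted_predict, as well as the default number of rounds and learning rate, are assumptions chosen for the example.

```python
import math


def fit_stump(xs, ys):
    """Best one-split regression stump: (threshold, left_value, right_value)."""
    values = sorted(set(xs))
    best = None
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    return best[1:]


def stump_predict(stump, x):
    t, left_value, right_value = stump
    return left_value if x <= t else right_value


def boost(xs, ys, n_rounds=100, learning_rate=0.3):
    """Fit each new stump to the residuals of the current ensemble."""
    base = sum(ys) / len(ys)  # start from the constant mean prediction
    preds = [base] * len(xs)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump_predict(stump, x)
                 for p, x in zip(preds, xs)]
    return base, learning_rate, stumps


def boosted_predict(model, x):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(stump_predict(s, x) for s in stumps)


def mse(predictions, targets):
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)


# Synthetic data (an assumption for illustration): a noise-free sine curve.
xs = [6 * i / 39 for i in range(40)]
ys = [math.sin(x) for x in xs]

single = fit_stump(xs, ys)
model = boost(xs, ys)

stump_mse = mse([stump_predict(single, x) for x in xs], ys)
boosted_mse = mse([boosted_predict(model, x) for x in xs], ys)
print("Training MSE, single stump:", stump_mse)
print("Training MSE, boosted stumps:", boosted_mse)
```

The comparison is on training error only, which is enough to show how many weak learners combine into a more flexible model; judging generalization would require held-out data.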

Day 4 Morning Session: Gaussian process regression. Introduction to Bayesian optimization.

Afternoon Session: Example of Gaussian process regression (GPR) and Bayesian optimization (BO), and a joint exercise.

Instructor: Inneke Van Nieuwenhuyse, Hasselt University

Course material:

The material consists of slides, which will be made available to the students before class and contain references for further reading. There is no mandatory reading to prepare for the course, but interested students may consult the following recommended textbooks (optional):

  • Géron, A. (2019). Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems. United States: O’Reilly Media.
  • Rasmussen, C. E., Williams, C. K. I. (2005). Gaussian Processes for Machine Learning (Adaptive Computation and Machine Learning). The MIT Press. gaussianprocess.org/gpml/
  • Gramacy, R. B. (2020). Surrogates: Gaussian Process Modeling, Design, and Optimization for the Applied Sciences. CRC Press.

Prerequisites:

Students need a solid background in statistics, probability theory, linear algebra, continuous mathematics, multivariate calculus, and multivariate probability theory. In addition, students should have some initial experience with a high-level programming language such as Matlab, Python, or R. As the programming examples will be in Python, students are encouraged to familiarize themselves with the language, e.g., by following online courses. No prior knowledge of machine learning is required.

Pre-registration form