Reinforcement Learning
Welcome to the 6-credit, Ph.D.-level Reinforcement Learning course!
Below, you can find general information about the course. For more information, please visit the relevant page.
Feel free to drop me an email if you have questions.
Course responsible
Teacher: Farnaz Adib Yaghmaie
Email: farnaz.adib.yaghmaie@liu.se
Address: 2A:535, B-huset, Campus Valla, Linköping
Entry requirements
This course has the following entry requirements.
- Probability theory (estimation, Monte Carlo, etc.)
- Optimization (mean-squared error, categorical cross-entropy loss, dynamic programming)
- Deep learning basics (fully-connected and convolutional layers)
- Programming in Python (Numpy, plotting, deep learning with Python, etc.)
If you meet the entry requirements but these topics are not fresh in your mind, I urge you to review them before starting the course.
Course literature
We rely heavily on the following book and a couple of free online materials.
- R. Sutton and A. Barto, “Reinforcement Learning: An Introduction”, MIT Press, 2018. This book can be downloaded for free. Please also consult the errata.
Activities
We normally do not have lectures for this course. The activities for this course are either
- Synchronous, where the students and teacher meet at a set location and time. This includes
  - group discussions,
  - exercise sessions.
- Asynchronous, where the students perform the task at any time before the deadline. This includes
  - self-study material,
  - reflective journals,
  - assignments,
  - projects.
Sections
The course contains 4 sections. Each section contains several self-study units, and a couple of activities.
Section 1: Reinforcement Learning basics
- Unit S1.1 Introduction to RL
- Unit S1.2 Markov Decision Process
- Unit S1.3 Dynamic programming
- Unit S1.4 Monte Carlo
- Unit S1.5 Temporal Difference learning
Section 2: Temporal Difference learning in continuous spaces
- Unit S2.1 RL with function approximation
- Unit S2.2 Temporal Difference in continuous state space
- Unit S2.3 Temporal Difference in continuous state and action spaces
Section 3: Policy search in continuous action space
- Unit S3.1 Policy search
- Unit S3.2 Policy Gradient for continuous action space
- Unit S3.3 Actor-critic methods
Section 4: Advanced RL topics
- Unit S4.1 Opinion on RL
- Unit S4.2 Model-based RL (PILCO), Monte-Carlo Tree search
- Unit S4.3 Maximum entropy RL, Deep Policy Gradient (TRPO, PPO)
Time schedule
The course starts in week 34, so our first synchronous event is in week 35.
The synchronous events are preliminarily scheduled for Mondays 10:00-12:00 at Systemet, B-building, Campus Valla. The schedule for the course is given below.
Week | Synchronous activity | Asynchronous activity | Units |
34 | Info session | Self-studying, reflective journal | S1.1-S1.3 |
35 | Group discussion | Self-studying, reflective journal | S1.1-S1.3 |
36 | Group discussion | Self-studying, reflective journal | S1.4-S1.5 |
37 | Exercise session | Assignment 1 | S1.1-S1.5 |
38 | | Project 1 | S1.1-S1.5 |
39 | | Project 1 | S1.1-S1.5 |
40 | Group discussion | Self-studying, reflective journal | S2.1-S2.3 |
41 | Exercise session | Assignment 2 | S2.1-S2.3 |
42 | | Project 2 | S2.1-S2.3 |
43 | | Project 2 | S2.1-S2.3 |
44 | Group discussion | Self-studying, reflective journal | S3.1-S3.3 |
45 | Exercise session | Assignment 3 | S3.1-S3.3 |
46 | | Project 3 | S3.1-S3.3 |
47 | | Project 3 | S3.1-S3.3 |
48 | | Project 4 | S1.1-S4.3 |
49 | Presentations by the students | Self-studying, reflective journal | S4.2-S4.3 |
50 | | Project 4 | S1.1-S4.3 |
51 | (reserve for delays) | | |
52 | (reserve for delays) | | |
Evaluation
There is no written exam for this course. Evaluation is based on the successful completion of four projects, handing in solutions for the exercises, and writing reflective journals. The deadlines for these tasks are Fridays at 23:59 in the weeks listed in the table above.
Deadline extension policy: In certain cases, students can request a deadline extension. A deadline can be extended by two weeks.
Re-evaluation policy: If a student cannot meet the evaluation criteria while the course is running, the student can retry the tasks after six months. Note, however, that there will be no interactive sessions during that period.
Large Language Models policy: You are NOT allowed to use Large Language Models (e.g. ChatGPT, Microsoft Copilot) to generate your text, summarize papers and books, write code, or solve exercises. You may use them only to check spelling and grammar.
Projects
The purpose of the projects is to learn basic to advanced RL algorithms and to become familiar with coding for RL. Students form groups and work on the projects. For projects 1-3, they are provided with code containing empty blocks that they are expected to fill in. Each group needs to compile a report summarizing and discussing the results.
- Project 1: Implementing basic RL algorithms (SARSA, Expected SARSA, Q-learning, Double Q-learning, Monte Carlo) for blackjack
- Project 2: Implementing NAF on an inverted pendulum
- Project 3: Implementing DDPG and SAC on an inverted pendulum
- Project 4: Implementing your favourite RL algorithm on a relevant problem: Your portfolio!
Assignments
There are mathematical theories and ideas behind RL. The aim of the exercise sessions is to get the students’ hands dirty with the math in RL. The students need to hand in their solutions to the assignments (or qualified attempts to solve them) before receiving the answers. The assignments are not graded. The solutions or qualified attempts are necessary for passing the course. The students are allowed to discuss the solutions within their groups, but each student needs to hand in their solutions separately. Students need to mention other students in their reports if they have discussed the questions together. After receiving solutions from the teacher, the students are not allowed to share them with each other.
Reflective Journals
Each student writes a reflective journal for each section, summarizing the concepts in their own words. This helps to consolidate the knowledge.