Hi, I am Lukas

I am a Researcher
I work on Multi-Agent Reinforcement Learning
I love music
I love reading
I love gaming
I love travelling

Lukas Schäfer

Researcher at Microsoft Research

I am an AI researcher at Microsoft Research with the goal of creating autonomous agents that can efficiently learn to solve complex decision-making tasks in the real world. As part of the Game Intelligence team, I am working towards autonomous agents that can enable novel experiences and tools in video games. I am broadly excited about technology and AI, but most interested in their application to decision-making problems.

Prior to joining Microsoft Research, I received my PhD and MSc from the University of Edinburgh where I was supervised by Stefano Albrecht and Amos Storkey. My research focused on reinforcement learning, in particular in the context of multi-agent systems that require multiple agents to cooperate with each other. Together with Stefano and Filippos, I wrote a textbook on multi-agent reinforcement learning that will be published with MIT Press and is available at www.marl-book.com!

News

Oct 10, 2024

🏆 I am very excited to have passed my PhD viva today with no corrections! Huge thanks to my examiners Prof. Subramanian Ramamoorthy and Prof. Karl Tuyls for taking the time to evaluate my thesis, their thoughtful feedback, and for making the viva experience as comfortable as possible.

Oct 01, 2024

📢 I am very excited to be re-joining the Game Intelligence team at Microsoft Research as a Researcher after interning with them last year! As part of the team, I will be working on creating autonomous agents that can enable novel experiences and tools in video games.

Jul 24, 2024

🏆 Honored to receive the best reviewer award for the International Conference on Machine Learning (ICML) 2024!

Dec 06, 2023

📢 New preprint on Visual Encoders for Data-Efficient Imitation Learning in Modern Video Games is now available on arXiv.

Nov 01, 2023

📢 It is DONE! Our textbook “Multi-Agent Reinforcement Learning: Foundations and Modern Approaches” is now with MIT Press and available on the official webpage www.marl-book.com! The print release is scheduled for late 2024.

Oct 30, 2023

📃 Our work Learning Task Embeddings for Teamwork Adaptation in Multi-Agent Reinforcement Learning will be presented at the Workshop on Generalization in Planning at the Conference on Neural Information Processing Systems (NeurIPS) 2023!

May 29, 2023

📢 I'm very excited to announce that the first non-final preprint PDF of our book Multi-Agent Reinforcement Learning: Foundations and Modern Approaches is now released and available on the official webpage www.marl-book.com!

Apr 14, 2023

📃 Our works, Ensemble Value Functions for Efficient Exploration in Multi-Agent Reinforcement Learning and Using Offline Data to Speed-up Reinforcement Learning in Procedurally Generated Environments, will be presented at the Adaptive and Learning Agents (ALA) Workshop at the International Conference on Autonomous Agents and Multi-Agent Systems (AAMAS) 2023!

Publications

Multi-Agent Reinforcement Learning: Foundations and Modern Approaches

Stefano V. Albrecht Filippos Christianos Lukas Schäfer

This book provides a comprehensive introduction to multi-agent reinforcement learning (MARL), a rapidly growing field that combines insights from game theory, reinforcement learning, and multi-agent systems. The book covers the foundations of MARL, including the key concepts, algorithms, and challenges, and presents a detailed overview of contemporary approaches of deep MARL research.

Multi-Agent Reinforcement Learning

Visual Encoders for Data-Efficient Imitation Learning in Modern Video Games

arXiv 2023

Lukas Schäfer Logan Jones Anssi Kanervisto Yuhan Cao Tabish Rashid Raluca Georgescu Dave Bignell Siddhartha Sen Andrea Treviño Gavito Sam Devlin

Video games have served as useful benchmarks for the decision making community, but going beyond Atari games towards training agents in modern games has been prohibitively expensive for the vast majority of the research community. Recent progress in the research, development and open release of large vision models has the potential to amortize some of these costs across the community. However, it is currently unclear which of these models have learnt representations that retain information critical for sequential decision making. Towards enabling wider participation in the research of gameplaying agents in modern games, we present a systematic study of imitation learning with publicly available visual encoders compared to the typical, task-specific, end-to-end training approach in Minecraft, Minecraft Dungeons and Counter-Strike: Global Offensive.

Imitation Learning Visual Encoders

Learning Task Embeddings for Teamwork Adaptation in Multi-Agent Reinforcement Learning

Workshop on Generalization in Planning in the Conference on Neural Information Processing Systems (NeurIPS) 2023

Lukas Schäfer Filippos Christianos Amos Storkey Stefano V. Albrecht

Successful deployment of multi-agent reinforcement learning often requires agents to adapt their behaviour. In this work, we discuss the problem of teamwork adaptation in which a team of agents needs to adapt their policies to solve novel tasks with limited fine-tuning. Motivated by the intuition that agents need to be able to identify and distinguish tasks in order to adapt their behaviour to the current task, we propose to learn multi-agent task embeddings (MATE). These task embeddings are trained using an encoder-decoder architecture optimised for reconstruction of the transition and reward functions which uniquely identify tasks. We show that a team of agents is able to adapt to novel tasks when provided with task embeddings. We propose three MATE training paradigms: independent MATE, centralised MATE, and mixed MATE which vary in the information used for the task encoding. We show that the embeddings learned by MATE identify tasks and provide useful information which agents leverage during adaptation to novel tasks.

Multi-Agent Reinforcement Learning Generalisation

Ensemble Value Functions for Efficient Exploration in Multi-Agent Reinforcement Learning

Adaptive and Learning Agents (ALA) Workshop in the International Conference on Autonomous Agents and Multiagent Systems (AAMAS) 2023

Lukas Schäfer Oliver Slumbers Stephen McAleer Yali Du Stefano V. Albrecht David Mguni

Cooperative multi-agent reinforcement learning (MARL) requires agents to explore to learn to cooperate. Existing value-based MARL algorithms commonly rely on random exploration, such as ϵ-greedy, which is inefficient in discovering multi-agent cooperation. Additionally, the environment in MARL appears non-stationary to any individual agent due to the simultaneous training of other agents, leading to highly variant and thus unstable optimisation signals. In this work, we propose ensemble value functions for multi-agent exploration (EMAX), a general framework to extend any value-based MARL algorithm. EMAX trains ensembles of value functions for each agent to address the key challenges of exploration and non-stationarity: (1) The uncertainty of value estimates across the ensemble is used in a UCB policy to guide the exploration of agents to parts of the environment which require cooperation. (2) Average value estimates across the ensemble serve as target values. These targets exhibit lower variance compared to commonly applied target networks and we show that they lead to more stable gradients during the optimisation. We instantiate three value-based MARL algorithms with EMAX, independent DQN, VDN and QMIX, and evaluate them in 21 tasks across four environments. Using ensembles of five value functions, EMAX improves sample efficiency and final evaluation returns of these algorithms by 53%, 36%, and 498%, respectively, averaged across all 21 tasks.

Multi-Agent Reinforcement Learning Exploration
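
The two uses of the ensemble described in the EMAX abstract above (an uncertainty bonus for exploration, and ensemble averages as value targets) can be illustrated with a minimal sketch. This is not the EMAX implementation: the function names, the use of the standard deviation as the uncertainty measure, and the weight `beta` are assumptions made for illustration only.

```python
from statistics import mean, pstdev


def ucb_action(q_ensemble, beta=1.0):
    """Pick the action with the highest mean-plus-uncertainty score.

    q_ensemble[k][a] is the value estimate of ensemble member k for action a.
    beta (hypothetical name) weights the exploration bonus; using the standard
    deviation across members as the uncertainty measure is an assumption.
    """
    num_actions = len(q_ensemble[0])
    scores = []
    for action in range(num_actions):
        estimates = [member[action] for member in q_ensemble]
        scores.append(mean(estimates) + beta * pstdev(estimates))
    return max(range(num_actions), key=lambda a: scores[a])


def ensemble_mean_values(q_ensemble):
    """Average the value estimates across ensemble members, per action."""
    num_actions = len(q_ensemble[0])
    return [mean([member[a] for member in q_ensemble]) for a in range(num_actions)]


# Three ensemble members, two actions. The members agree on action 0 and
# disagree on action 1, so action 1 receives a larger exploration bonus.
q_ensemble = [
    [1.0, 0.5],
    [1.0, 1.5],
    [1.0, 0.4],
]
greedy = ucb_action(q_ensemble, beta=0.0)       # ignores uncertainty
exploratory = ucb_action(q_ensemble, beta=1.0)  # rewards disagreement
averaged = ensemble_mean_values(q_ensemble)
```

With `beta=0.0` the rule reduces to greedy selection on the ensemble mean, which makes the contribution of the uncertainty term easy to isolate.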

Using Offline Data to Speed-up Reinforcement Learning in Procedurally Generated Environments

Adaptive and Learning Agents (ALA) Workshop in the International Conference on Autonomous Agents and Multiagent Systems (AAMAS) 2023

Alain Andres Lukas Schäfer Esther Villar-Rodriguez Stefano V. Albrecht Javier Del Ser

One of the key challenges of Reinforcement Learning (RL) is the ability of agents to generalise their learned policy to unseen settings. Moreover, training RL agents requires large numbers of interactions with the environment. Motivated by the recent success of Offline RL and Imitation Learning (IL), we conduct a study to investigate whether agents can leverage offline data in the form of trajectories to improve the sample-efficiency in procedurally generated environments. We consider two settings of using IL from offline data for RL: (1) pre-training a policy before online RL training and (2) concurrently training a policy with online RL and IL from offline data. We analyse the impact of the quality (optimality of trajectories) and diversity (number of trajectories and covered level) of available offline trajectories on the effectiveness of both approaches. Across four well-known sparse reward tasks in the MiniGrid environment, we find that using IL for pre-training and concurrently during online RL training both consistently improve the sample-efficiency while converging to optimal policies. Furthermore, we show that pre-training a policy from as few as two trajectories can make the difference between learning an optimal policy at the end of online training and not learning at all. Our findings motivate the widespread adoption of IL for pre-training and concurrent IL in procedurally generated environments whenever offline trajectories are available or can be generated.

Reinforcement Learning Imitation Learning

Multi-Horizon Representations with Hierarchical Forward Models for Reinforcement Learning

Transactions on Machine Learning Research (TMLR) Journal 2023

Trevor McInroe Lukas Schäfer Stefano V. Albrecht

Learning control from pixels is difficult for reinforcement learning (RL) agents because representation learning and policy learning are intertwined. Previous approaches remedy this issue with auxiliary representation learning tasks, but they either do not consider the temporal aspect of the problem or only consider single-step transitions. Instead, we propose Hierarchical k-Step Latent (HKSL), an auxiliary task that learns representations via a hierarchy of forward models that operate at varying magnitudes of step skipping while also learning to communicate between levels in the hierarchy. We evaluate HKSL in a suite of 30 robotic control tasks and find that HKSL either reaches higher episodic returns or converges to maximum performance more quickly than several current baselines. Also, we find that levels in HKSL’s hierarchy can learn to specialize in long- or short-term consequences of agent actions, thereby providing the downstream control policy with more informative representations. Finally, we determine that communication channels between hierarchy levels organize information based on both sides of the communication process, which improves sample efficiency.

Reinforcement Learning Representation Learning

Robust On-Policy Sampling for Data-Efficient Policy Evaluation in Reinforcement Learning

Conference on Neural Information Processing Systems (NeurIPS) 2022

Rujie Zhong Duohan Zhang Lukas Schäfer Stefano V. Albrecht Josiah P. Hanna

Reinforcement learning (RL) algorithms are often categorized as either on-policy or off-policy depending on whether they use data from a target policy of interest or from a different behavior policy. In this paper, we study a subtle distinction between on-policy data and on-policy sampling in the context of the RL sub-problem of policy evaluation. We observe that on-policy sampling may fail to match the expected distribution of on-policy data after observing only a finite number of trajectories and this failure hinders data-efficient policy evaluation. Towards improved data-efficiency, we show how non-i.i.d., off-policy sampling can produce data that more closely matches the expected on-policy data distribution and consequently increases the accuracy of the Monte Carlo estimator for policy evaluation. We introduce a method called Robust On-Policy Sampling and demonstrate theoretically and empirically that it produces data that converges faster to the expected on-policy distribution compared to on-policy sampling. Empirically, we show that this faster convergence leads to lower mean squared error policy value estimates.

Reinforcement Learning Policy Evaluation
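
The distinction between on-policy sampling and on-policy data drawn in the abstract above can be made concrete with a toy, single-state sketch. It is not the Robust On-Policy Sampling method from the paper: the "most under-sampled action first" rule, the function names, and the example policy are all assumptions chosen to show how non-i.i.d. sampling can track the policy's action distribution more closely than i.i.d. sampling for a finite number of samples.

```python
import random


def empirical_distribution(actions, num_actions):
    """Fraction of samples in which each action was taken."""
    counts = [0] * num_actions
    for action in actions:
        counts[action] += 1
    return [count / len(actions) for count in counts]


def total_variation(p, q):
    """Total variation distance, used here as a simple sampling-error measure."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def sample_iid(policy, num_samples, rng):
    """Standard on-policy sampling: draw each action independently from the policy."""
    return rng.choices(range(len(policy)), weights=policy, k=num_samples)


def sample_most_undersampled(policy, num_samples):
    """Non-i.i.d. stand-in: always take the action whose observed count lags
    furthest behind its expected count under the policy."""
    counts = [0] * len(policy)
    actions = []
    for step in range(1, num_samples + 1):
        deficits = [prob * step - count for prob, count in zip(policy, counts)]
        action = max(range(len(policy)), key=lambda a: deficits[a])
        counts[action] += 1
        actions.append(action)
    return actions


policy = [0.7, 0.2, 0.1]  # hypothetical target policy over three actions
rng = random.Random(0)
iid_error = total_variation(
    empirical_distribution(sample_iid(policy, 100, rng), len(policy)), policy
)
greedy_error = total_variation(
    empirical_distribution(sample_most_undersampled(policy, 100), len(policy)), policy
)
print(f"i.i.d. sampling error: {iid_error:.3f}")
print(f"under-sampled-first sampling error: {greedy_error:.3f}")
```

The second sampler is deterministic and history-dependent, so its data are no longer i.i.d. draws from the policy, yet the empirical action frequencies stay close to the policy's probabilities at every sample size.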

Task Generalisation in Multi-Agent Reinforcement Learning

Doctoral Consortium at the International Conference on Autonomous Agents and Multiagent Systems (AAMAS) 2022

Multi-agent reinforcement learning agents are typically trained in a single environment. As a consequence, they overfit to the training environment which results in sensitivity to perturbations and inability to generalise to similar environments. For multi-agent reinforcement learning approaches to be applicable in real-world scenarios, generalisation and robustness need to be addressed. However, unlike in supervised learning, generalisation lacks a clear definition in multi-agent reinforcement learning. We discuss the problem of task generalisation and demonstrate the difficulty of zero-shot generalisation and finetuning using the example of multi-robot warehouse coordination with preliminary results. Lastly, we discuss promising directions of research working towards generalisation of multi-agent reinforcement learning.

Multi-Agent Reinforcement Learning Generalisation

Decoupled Reinforcement Learning to Stabilise Intrinsically-Motivated Exploration

International Conference on Autonomous Agents and Multiagent Systems (AAMAS) 2022

Lukas Schäfer Filippos Christianos Josiah P. Hanna Stefano V. Albrecht

Intrinsic rewards can improve exploration in reinforcement learning, but the exploration process may suffer from instability caused by non-stationary reward shaping and strong dependency on hyperparameters. In this work, we introduce Decoupled RL (DeRL) as a general framework which trains separate policies for intrinsically-motivated exploration and exploitation. Such decoupling allows DeRL to leverage the benefits of intrinsic rewards for exploration while demonstrating improved robustness and sample efficiency. We evaluate DeRL algorithms in two sparse-reward environments with multiple types of intrinsic rewards. Our results show that DeRL is more robust to varying scale and rate of decay of intrinsic rewards and converges to the same evaluation returns as intrinsically-motivated baselines in fewer interactions. Lastly, we discuss the challenge of distribution shift and show that divergence constraint regularisers can successfully minimise instability caused by divergence of exploration and exploitation policies.

Reinforcement Learning Exploration

Deep reinforcement learning for multi-agent interaction

AI Communications Special Issue on Multi-Agent Systems Research in the UK 2022

Ibrahim H. Ahmed Cillian Brewitt Ignacio Carlucho Filippos Christianos Mhairi Dunion Elliot Fosong Samuel Garcin Shangmin Guo Balint Gyevnar Trevor McInroe Georgios Papoudakis Arrasy Rahman Lukas Schäfer Massimiliano Tamborski Giuseppe Vecchio Cheng Wang Stefano V. Albrecht

The development of autonomous agents which can interact with other agents to accomplish a given task is a core area of research in artificial intelligence and machine learning. Towards this goal, the Autonomous Agents Research Group develops novel machine learning algorithms for autonomous systems control, with a specific focus on deep reinforcement learning and multi-agent reinforcement learning. Research problems include scalable learning of coordinated agent policies and inter-agent communication; reasoning about the behaviours, goals, and composition of other agents from limited observations; and sample-efficient learning based on intrinsic motivation, curriculum learning, causal inference, and representation learning. This article provides a broad overview of the ongoing research portfolio of the group and discusses open problems for future directions.

Multi-Agent Reinforcement Learning Reinforcement Learning

Scalable Multi-Agent Reinforcement Learning for Warehouse Logistics with Robotic and Human Co-Workers

International Conference on Intelligent Robots and Systems 2024

Aleksandar Krnjaic Raul D. Steleac Jonathan D. Thomas Georgios Papoudakis Lukas Schäfer Andrew Wing Keung To Kuan-Ho Lao Murat Cubuktepe Matthew Haley Peter Börsting Stefano V. Albrecht

We envision a warehouse in which dozens of mobile robots and human pickers work together to collect and deliver items within the warehouse. The fundamental problem we tackle, called the order-picking problem, is how these worker agents must coordinate their movement and actions in the warehouse to maximise performance (e.g. order throughput). Established industry methods using heuristic approaches require large engineering efforts to optimise for innately variable warehouse configurations. In contrast, multi-agent reinforcement learning (MARL) can be flexibly applied to diverse warehouse configurations (e.g. size, layout, number/types of workers, item replenishment frequency), as the agents learn through experience how to optimally cooperate with one another. We develop hierarchical MARL algorithms in which a manager assigns goals to worker agents, and the policies of the manager and workers are co-trained toward maximising a global objective (e.g. pick rate). Our hierarchical algorithms achieve significant gains in sample efficiency and overall pick rates over baseline MARL algorithms in diverse warehouse configurations, and substantially outperform two established industry heuristics for order-picking systems.

Multi-Agent Reinforcement Learning Warehouse Logistics

Benchmarking Multi-Agent Deep Reinforcement Learning Algorithms in Cooperative Tasks

Conference on Neural Information Processing Systems (NeurIPS) 2021

Georgios Papoudakis Filippos Christianos Lukas Schäfer Stefano V. Albrecht

Multi-agent deep reinforcement learning (MARL) suffers from a lack of commonly-used evaluation tasks and criteria, making comparisons between approaches difficult. In this work, we consistently evaluate and compare three different classes of MARL algorithms (independent learning, centralised multi-agent policy gradient, value decomposition) in a diverse range of cooperative multi-agent learning tasks. Our experiments serve as a reference for the expected performance of algorithms across different learning tasks, and we provide insights regarding the effectiveness of different learning approaches. We open-source EPyMARL, which extends the PyMARL codebase to include additional algorithms and allow for flexible configuration of algorithm implementation details such as parameter sharing. Finally, we open-source two environments for multi-agent research which focus on coordination under sparse rewards.

Multi-Agent Reinforcement Learning Benchmark

Decoupling Exploration and Exploitation in Reinforcement Learning

Unsupervised Reinforcement Learning (URL) Workshop in the International Conference on Machine Learning 2021

Lukas Schäfer Filippos Christianos Josiah P. Hanna Stefano V. Albrecht

Intrinsic rewards are commonly applied to improve exploration in reinforcement learning. However, these approaches suffer from instability caused by non-stationary reward shaping and strong dependency on hyperparameters. In this work, we propose Decoupled RL (DeRL) which trains separate policies for exploration and exploitation. DeRL can be applied with on-policy and off-policy RL algorithms. We evaluate DeRL algorithms in two sparse-reward environments with multiple types of intrinsic rewards. We show that DeRL is more robust to scaling and speed of decay of intrinsic rewards and converges to the same evaluation returns as intrinsically motivated baselines in fewer interactions.

Reinforcement Learning Exploration

Robust On-Policy Data Collection for Data-Efficient Policy Evaluation

Workshop on Offline Reinforcement Learning in the Conference on Neural Information Processing Systems 2021

Rujie Zhong Josiah P. Hanna Lukas Schäfer Stefano V. Albrecht

This paper considers how to complement offline reinforcement learning (RL) data with additional data collection for the task of policy evaluation. In policy evaluation, the task is to estimate the expected return of an evaluation policy on an environment of interest. Prior work on offline policy evaluation typically only considers a static dataset. We consider a setting where we can collect a small amount of additional data to combine with a potentially larger offline RL dataset. We show that simply running the evaluation policy – on-policy data collection – is sub-optimal for this setting. We then introduce two new data collection strategies for policy evaluation, both of which consider previously collected data when collecting future data so as to reduce distribution shift (or sampling error) in the entire dataset collected. Our empirical results show that compared to on-policy sampling, our strategies produce data with lower sampling error and generally lead to lower mean-squared error in policy evaluation for any total dataset size. We also show that these strategies can start from initial off-policy data, collect additional data, and then use both the initial and new data to produce low mean-squared error policy evaluation without using off-policy corrections.

Reinforcement Learning Policy Evaluation

Comparative evaluation of cooperative multi-agent deep reinforcement learning algorithms

Workshop on Adaptive and Learning Agents in the International Conference on Autonomous Agents and Multiagent Systems (AAMAS) 2021

Georgios Papoudakis Filippos Christianos Lukas Schäfer Stefano V. Albrecht

Multi-agent deep reinforcement learning (MARL) suffers from a lack of commonly-used evaluation tasks and criteria, making comparisons between approaches difficult. In this work, we evaluate and compare three different classes of MARL algorithms (independent learning, centralised multi-agent policy gradient, and value decomposition) in a diverse range of fully-cooperative multi-agent learning tasks. Our experiments can serve as a reference for the expected performance of algorithms across different learning tasks. We also provide further insight about (1) when independent learning might be surprisingly effective despite non-stationarity, (2) when centralised training should (and shouldn’t) be applied and (3) which benefits value decomposition can bring.

Multi-Agent Reinforcement Learning Benchmark

Learning Temporally-Consistent Representations for Data-Efficient Reinforcement Learning

arXiv 2021

Trevor McInroe Lukas Schäfer Stefano V. Albrecht

Deep reinforcement learning (RL) agents that exist in high-dimensional state spaces, such as those composed of images, have interconnected learning burdens. Agents must learn an action-selection policy that completes their given task, which requires them to learn a representation of the state space that discerns between useful and useless information. The reward function is the only supervised feedback that RL agents receive, which causes a representation learning bottleneck that can manifest in poor sample efficiency. We present k-Step Latent (KSL), a new representation learning method that enforces temporal consistency of representations via a self-supervised auxiliary task wherein agents learn to recurrently predict action-conditioned representations of the state space. The state encoder learned by KSL produces low-dimensional representations that make optimization of the RL task more sample efficient. Altogether, KSL produces state-of-the-art results in both data efficiency and asymptotic performance in the popular PlaNet benchmark suite. Our analyses show that KSL produces encoders that generalize better to new tasks unseen during training, and its representations are more strongly tied to reward, are more invariant to perturbations in the state space, and move more smoothly through the temporal axis of the RL problem than other methods such as DrQ, RAD, CURL, and SAC-AE.

Reinforcement Learning Representation Learning

Shared Experience Actor-Critic for Multi-Agent Reinforcement Learning

Conference on Neural Information Processing Systems (NeurIPS) 2020

Filippos Christianos Lukas Schäfer Stefano V. Albrecht

Exploration in multi-agent reinforcement learning is a challenging problem, especially in environments with sparse rewards. We propose a general method for efficient exploration by sharing experience amongst agents. Our proposed algorithm, called Shared Experience Actor-Critic (SEAC), applies experience sharing in an actor-critic framework. We evaluate SEAC in a collection of sparse-reward multi-agent environments and find that it consistently outperforms two baselines and two state-of-the-art algorithms by learning in fewer steps and converging to higher returns. In some harder environments, experience sharing makes the difference between learning to solve the task and not learning at all.

Multi-Agent Reinforcement Learning Exploration

Experiences

Microsoft Research

Apr 2023 - Present

Cambridge

Research Scientist Intern

Apr 2023 - Oct 2023

Responsibilities:

Researching visual encoders for imitation learning in video games under the supervision of Sam Devlin.

Researcher

Oct 2024 - Present

Responsibilities:

Researching autonomous agents that can enable novel experiences and tools in video games.

Heidelberg Laureate Forum

Sep 2022 - Sep 2022

Heidelberg

The Heidelberg Laureate Forum brings together the most exceptional mathematicians and computer scientists of their generations. Each year, the recipients of the most prestigious awards in mathematics and computer science, the Abel Prize, ACM A.M. Turing Award, ACM Prize in Computing, Fields Medal, IMU Abacus Medal and Nevanlinna Prize, meet 200 selected young researchers from all over the world. Participants spend a week interacting and networking in a relaxed atmosphere designed to encourage scientific exchange.

Young Research Attendee

Sep 2022 - Sep 2022

Huawei Noah's Ark Lab

Jul 2022 - Dec 2022

London

The Noah’s Ark Lab is the AI research center for Huawei Technologies, working towards significant contributions to both the company and society by innovating in artificial intelligence, data mining and related fields.

Research Scientist Intern

Jul 2022 - Dec 2022

Responsibilities:

Researched ensemble models for exploration in multi-agent reinforcement learning with the RL and multi-agent team under the supervision of David Mguni.
Submitted a publication as the result of the internship to a top-tier machine learning conference (under review). A preprint is available on arXiv (https://arxiv.org/abs/2302.03439).

Dematic

Nov 2020 - Mar 2021

Remote

Dematic is a global player focused on the design and implementation of automated system solutions for warehouses, distribution centres and production facilities.

Research Intern

Nov 2020 - Mar 2021

Responsibilities:

Applying state-of-the-art AI technology to enable a prototype for automation of large-scale robotic warehouse logistics.

HYPED

Sep 2018 - Aug 2020

Edinburgh

HYPED is a team of students at the University of Edinburgh dedicated to developing the Hyperloop concept and inspiring future generations about engineering. HYPED has received awards from SpaceX, Virgin Hyperloop One and Institution of Civil Engineers.

Navigation Advisor

Sep 2019 - Aug 2020

Responsibilities:

Advising navigation team on the adaptation and implementation of improved sensor and filtering techniques

Navigation Engineer

Sep 2018 - Aug 2019

Responsibilities:

Developing navigation system of “The Flying Podsman” Hyperloop prototype using sensor filtering, processing and control techniques to estimate location, orientation and speed of the pod
Finalist for the SpaceX 2019 Hyperloop competition in California in Summer 2019

Education

University of Edinburgh

2019-2024

Ph.D in Data Science and Artificial Intelligence

UK PhD: Pass with no corrections

Thesis:

Efficient Exploration in Single-Agent and Multi-Agent Deep Reinforcement Learning

Supervisors:

Stefano V. Albrecht (primary) and Amos Storkey (secondary)

Funding:

Principal’s Career Development Scholarship from the University of Edinburgh

Keywords:

Reinforcement Learning, Multi-Agent Systems, Generalisation, Exploration, Intrinsic Rewards

University of Edinburgh

2018-2019

M.Sc. in Informatics

CGPA: 77.28%

Publications:

Curiosity in Multi-Agent Reinforcement Learning (74%)

Taken Courses:

Course Name	Grade
Reinforcement Learning	82%
Algorithmic Game Theory and its Applications	98%
Machine Learning and Pattern Recognition	64%
Probabilistic Modelling and Reasoning	75%
Decision Making in Robots and Autonomous Agents	86%
Robotics: Science and Systems	87%
Natural Computing	84%
Informatics Project Proposal	73%
Informatics Research Review	72%

Extracurricular Activities:

Active position as navigation engineer for HYPED.
Participation in GEAS roleplaying society.
Participation in EUKC - Edinburgh University Kendo Club.

Funding:

DAAD (German Academic Exchange Service) graduate scholarship & Stevenson Exchange Scholarship

Saarland University

2015-2018

B.Sc. in Informatics

German scale: 1.2

Publications:

Domain-Dependent Policy Learning using Neural Networks in Classical Planning

Taken Courses:

Course Name	Grade
Automated Planning	1.0
Admissible Search Enhancements	1.0
Information Retrieval and Data Mining	1.7
Neural Networks: Implementation and Application	2.0
Artificial Intelligence	1.7
Software Engineering	1.3
Modern Imperative Programming Languages	1.3
Concurrent Programming	2.7
Fundamentals of Data Structures and Algorithms	1.7
Information Systems	1.3
Introduction to Theoretical Computer Science	1.0
System Architecture	1.0
Mathematics for Computer Scientists I	1.0
Mathematics for Computer Scientists II	2.3
Mathematics for Computer Scientists III	1.7
Programming I	1.0
Programming II	1.0
Japanese Foundations - Shokyu I	1.3
Japanese Foundations - Shokyu II	2.0
Japanese Applied Geography	1.0
Japanese History II	1.0

Extracurricular Activities:

Japanese language and cultural studies as minor subject.

Teaching Experience

Oct 2019 - June 2022, School of Informatics, University of Edinburgh

Teaching assistant, demonstrator and marker for three iterations of the Reinforcement Learning lecture at the University of Edinburgh under Dr. Stefano V. Albrecht

Holding lectures on implementation of RL systems and Deep RL
Designing RL project covering wide range of topics including dynamic programming, single- and multi-agent RL as well as deep RL
Marking project and exam for reinforcement learning course
Advising students on various challenges regarding lecture material and content

Jun 2022 - Present, School of Informatics, University of Edinburgh

Co-supervised visiting PhD student project at the University of Edinburgh

Supervision through regular meetings discussing research project and ideating novel solutions
Assisted project towards a successful workshop publication at the ALA workshop at AAMAS 2023

Feb 2021 - Aug 2021, School of Informatics, University of Edinburgh

Co-supervised final Master's student projects at the University of Edinburgh

Co-supervised two M.Sc. students through project proposal, refinement and execution towards final thesis
Assisted M.Sc. student from their thesis towards a successful workshop publication at NeurIPS 2021, and a successful main conference publication at NeurIPS 2022.

Sep 2017 - Oct 2017, Mathematics Preparation Course, Saarland University

Voluntary lecturer and coach for the mathematics preparation course preparing upcoming computer science undergraduate students for their studies

Assisted the organisation of the mathematics preparation course for upcoming computer science students aiming to introduce them to foundational mathematical concepts, the university and student life as a whole
Introduced ∼250 participants to the importance of mathematics for computer science, formal languages and predicate logic in daily lectures of the first week
Supervised two groups to provide feedback and further assistance in daily coaching-sessions
The course received the BESTE-award for special student commitment 2017 at Saarland University

Oct 2016 - Mar 2017, Dependable Systems and Software Chair, Saarland University

Tutor for the Programming 1 lecture about functional programming at the Dependable Systems and Software Group chair of Saarland University under Prof. Dr. Holger Hermanns

Taught first-year students fundamental concepts of functional programming, basic complexity theory and inductive correctness proofs in weekly tutorials and office hours
Corrected weekly tests as well as mid- and endterm exams
Collectively created learning materials and discussed student progress as part of the whole teaching team

Reviewing

Conferences

2024: ICML (best reviewer award), RLC, AAMAS

2023: NeurIPS, NeurIPS Datasets and Benchmarks Track, ICML, AAMAS

2022: NeurIPS, NeurIPS Datasets and Benchmarks Track, ICML (top 10% outstanding reviewer award), AAMAS

2021: NeurIPS

Workshops

2020: Pre-registration experiment workshop at NeurIPS

Journals

2024: Transactions on Machine Learning Research (TMLR)