Improving Sample Efficiency in Reinforcement Learning Drone Racing using TD-MPC2

📅 02/2025 - present
🏫 University of Zürich
In Progress
Note: Due to ongoing research and pending publication, certain technical details and results have been intentionally omitted from this page.

Overview

This project investigates TD-MPC2[1], a state-of-the-art model-based reinforcement learning algorithm, for autonomous drone racing across different sensory modalities. The research focuses on improving sample efficiency, a critical requirement for practical deployment of RL in aerial robotics where real-world training remains prohibitively expensive and dangerous.


The work demonstrates that TD-MPC2 achieves exceptional sample efficiency in state-based control, learning successful racing policies with significantly fewer samples than existing methods. However, systematic investigations reveal fundamental challenges when extending to vision-based control, where the algorithm's latent dynamics model struggles to infer velocity information from ego-perspective depth images. Through comprehensive ablation studies and diagnostic tools, this research identifies the critical sensory requirements for learned dynamics models in high-speed robotics.
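
One of the diagnostic tools referenced above can be illustrated with the latent consistency objective that TD-MPC2-style world models are trained on: if the encoder's latent state fails to capture velocity, the one-step latent prediction error stays high no matter how long training runs. The PyTorch sketch below is a deliberate simplification; the names and architecture (`LatentWorldModel`, `encoder`, `dynamics`) are illustrative assumptions, not the actual TD-MPC2 implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LatentWorldModel(nn.Module):
    """Simplified stand-in for a TD-MPC2-style world model (sketch only)."""

    def __init__(self, obs_dim: int, act_dim: int, latent_dim: int = 128):
        super().__init__()
        # Encoder maps raw observations (state vectors or flattened
        # depth images) into the latent space the planner operates in.
        self.encoder = nn.Sequential(
            nn.Linear(obs_dim, 256), nn.ELU(), nn.Linear(256, latent_dim)
        )
        # Dynamics predicts the next latent from the current latent and action.
        self.dynamics = nn.Sequential(
            nn.Linear(latent_dim + act_dim, 256), nn.ELU(),
            nn.Linear(256, latent_dim),
        )

    def consistency_loss(self, obs, act, next_obs):
        z = self.encoder(obs)
        z_pred = self.dynamics(torch.cat([z, act], dim=-1))
        with torch.no_grad():  # stop gradients through the target latent
            z_target = self.encoder(next_obs)
        return F.mse_loss(z_pred, z_target)
```

Logging this loss separately per sensory modality is a cheap way to check whether the latent state carries the information (here: velocity) needed to predict even one step ahead.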

Technologies & Tools

PyTorch · JAX · ROS 1/2 · Reinforcement Learning · World Models · Model Predictive Path Integral (MPPI) · Gymnasium · C++

Team & Collaborators

Fabio Hübel

Researcher
Robotics & Perception Group, University of Zürich

Dr. Rudolf Reiter

Research Supervisor
Robotics & Perception Group, University of Zürich

Dr. Ángel Romero

Research Supervisor
Robotics & Perception Group, University of Zürich

Ismael Geles

Research Supervisor
Robotics & Perception Group, University of Zürich

Professor Davide Scaramuzza

Research Supervisor
Robotics & Perception Group, University of Zürich

Results & Outcomes

Results: TD-MPC2 demonstrated state-of-the-art performance in state-based drone racing, completing the challenging Split-S track in significantly fewer environment steps than DreamerV3 [2] and PPO while matching their final lap times. The MPPI planning component proved crucial for exploration (a simplified sketch follows below). However, vision-based experiments revealed that the algorithm cannot accurately infer velocity from depth images, with latent-consistency losses an order of magnitude higher than in state-based training. Teacher-student training enabled vision-based agents to complete laps, but the resulting policies lacked sufficient stability.
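
A minimal sketch of MPPI over a learned latent model makes the mechanism concrete: sample noisy action sequences around the current plan, roll them out through the learned latent dynamics, and re-weight them by exponentiated return. The `dynamics` and `reward` callables below are placeholders for the learned world model, and the sketch omits the terminal value bootstrap and iterative plan refinement that a full TD-MPC2 planner uses.

```python
import torch

def mppi_plan(z0, dynamics, reward, horizon=10, n_samples=512,
              act_dim=4, temperature=0.5, noise_std=0.3, mean=None):
    """Minimal MPPI sketch over a learned latent dynamics model.

    Assumes dynamics(z, a) -> next latent batch and reward(z, a) -> reward
    batch, both operating on batched tensors.
    """
    if mean is None:
        mean = torch.zeros(horizon, act_dim)

    # Sample noisy action sequences around the current plan.
    noise = torch.randn(n_samples, horizon, act_dim) * noise_std
    actions = (mean.unsqueeze(0) + noise).clamp(-1.0, 1.0)

    # Roll out all candidate sequences in parallel through the model.
    z = z0.unsqueeze(0).expand(n_samples, -1)
    returns = torch.zeros(n_samples)
    for t in range(horizon):
        returns = returns + reward(z, actions[:, t])
        z = dynamics(z, actions[:, t])

    # Exponentially weight trajectories by return, then average the actions.
    weights = torch.softmax(returns / temperature, dim=0)
    new_mean = (weights[:, None, None] * actions).sum(dim=0)
    return new_mean[0], new_mean  # action to execute now + warm-start plan
```

Executing the first action and reusing the time-shifted plan as the next warm start yields the usual receding-horizon control loop.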

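The teacher-student result corresponds to a standard privileged-distillation recipe: a pretrained state-based teacher supervises a vision-based student through behavior cloning. The sketch below is schematic; `teacher` and `student` are assumed to be deterministic policy networks, which simplifies the actual training pipeline.

```python
import torch
import torch.nn.functional as F

def distill_step(student, teacher, depth_obs, state_obs, optimizer):
    """One behavior-cloning step: the vision-based student imitates the
    action of a frozen, state-based teacher (privileged distillation)."""
    with torch.no_grad():
        target_action = teacher(state_obs)  # teacher sees privileged state
    pred_action = student(depth_obs)        # student sees ego depth images
    loss = F.mse_loss(pred_action, target_action)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```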

Impact: This work provides critical insights into the capabilities and limitations of modern model-based RL for agile robotics. The exceptional state-based performance demonstrates TD-MPC2's potential for rapid sim-to-real transfer with minimal real-world data. The identified velocity inference bottleneck highlights fundamental challenges in learning dynamics models from partial observations, informing future research directions in vision-based control. These findings are particularly valuable for the drone racing community, offering clear guidelines on sensory requirements and representation learning needs for high-speed autonomous flight.

References

  1. Hansen et al., "TD-MPC2: Scalable, Robust World Models for Continuous Control," 2024
  2. Hafner et al., "Mastering Diverse Domains through World Models (DreamerV3)," 2023
  3. Kaufmann et al., "Champion-level drone racing using deep reinforcement learning," Nature, 2023
  4. Romero et al., "Dream to fly: Model-based reinforcement learning for vision-based drone flight," 2025
  5. Song et al., "Flightmare: A Flexible Quadrotor Simulator," Conference on Robot Learning, 2021