Improving Sample Efficiency in Reinforcement Learning Drone Racing using TD-MPC2

📅 02/2025 - present
🏫 University of Zürich
In Progress
Note: Due to ongoing research and pending publication, certain technical details and results have been intentionally omitted from this page.

Overview

This project investigates TD-MPC2[1], a state-of-the-art model-based reinforcement learning algorithm, for autonomous drone racing across different sensory modalities. The research focuses on improving sample efficiency, a critical requirement for practical deployment of RL in aerial robotics where real-world training remains prohibitively expensive and dangerous.


The work demonstrates that TD-MPC2 achieves exceptional sample efficiency in state-based control, learning successful racing policies with significantly fewer samples than existing methods. However, systematic investigations reveal fundamental challenges when extending to vision-based control, where the algorithm's latent dynamics model struggles to infer velocity information from ego-perspective depth images. Through comprehensive ablation studies and diagnostic tools, this research identifies the critical sensory requirements for learned dynamics models in high-speed robotics.
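
One of the diagnostic tools referenced above can be illustrated with the latent consistency objective that TD-MPC2-style world models are trained on: if the encoder's latent state fails to capture velocity, the one-step latent prediction error stays high no matter how long training runs. The PyTorch sketch below is a deliberate simplification; the names and architecture (`LatentWorldModel`, `encoder`, `dynamics`) are illustrative assumptions, not the actual TD-MPC2 implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LatentWorldModel(nn.Module):
    """Simplified stand-in for a TD-MPC2-style world model (sketch only)."""

    def __init__(self, obs_dim: int, act_dim: int, latent_dim: int = 128):
        super().__init__()
        # Encoder maps raw observations (state vectors or flattened
        # depth images) into the latent space the planner operates in.
        self.encoder = nn.Sequential(
            nn.Linear(obs_dim, 256), nn.ELU(), nn.Linear(256, latent_dim)
        )
        # Dynamics predicts the next latent from the current latent and action.
        self.dynamics = nn.Sequential(
            nn.Linear(latent_dim + act_dim, 256), nn.ELU(),
            nn.Linear(256, latent_dim),
        )

    def consistency_loss(self, obs, act, next_obs):
        z = self.encoder(obs)
        z_pred = self.dynamics(torch.cat([z, act], dim=-1))
        with torch.no_grad():  # stop gradients through the target latent
            z_target = self.encoder(next_obs)
        return F.mse_loss(z_pred, z_target)
```

Logging this loss separately per sensory modality is a cheap way to check whether the latent state carries the information (here: velocity) needed to predict even one step ahead.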

Technologies & Tools

PyTorch · JAX · ROS 1/2 · Reinforcement Learning · World Models · Model Predictive Path Integral (MPPI) · Gymnasium · C++

Team & Collaborators

Fabio Hübel

Researcher
Robotics & Perception Group, University of Zürich

Dr. Rudolf Reiter

Research Supervisor
Robotics & Perception Group, University of Zürich

Dr. Ángel Romero

Research Supervisor
Robotics & Perception Group, University of Zürich

Ismael Geles

Research Supervisor
Robotics & Perception Group, University of Zürich

Professor Davide Scaramuzza

Research Supervisor
Robotics & Perception Group, University of Zürich

Results & Outcomes

Results: TD-MPC2 demonstrated state-of-the-art performance in state-based drone racing, completing the challenging Split-S track in significantly fewer environment steps than DreamerV3 [2] and PPO while matching their final lap times. The MPPI planning component proved crucial for exploration (a simplified sketch follows below). However, vision-based experiments revealed that the algorithm cannot accurately infer velocity from depth images, with latent-consistency losses an order of magnitude higher than in state-based training. Teacher-student training enabled vision-based agents to complete laps, but the resulting policies lacked sufficient stability.
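
A minimal sketch of MPPI over a learned latent model makes the mechanism concrete: sample noisy action sequences around the current plan, roll them out through the learned latent dynamics, and re-weight them by exponentiated return. The `dynamics` and `reward` callables below are placeholders for the learned world model, and the sketch omits the terminal value bootstrap and iterative plan refinement that a full TD-MPC2 planner uses.

```python
import torch

def mppi_plan(z0, dynamics, reward, horizon=10, n_samples=512,
              act_dim=4, temperature=0.5, noise_std=0.3, mean=None):
    """Minimal MPPI sketch over a learned latent dynamics model.

    Assumes dynamics(z, a) -> next latent batch and reward(z, a) -> reward
    batch, both operating on batched tensors.
    """
    if mean is None:
        mean = torch.zeros(horizon, act_dim)

    # Sample noisy action sequences around the current plan.
    noise = torch.randn(n_samples, horizon, act_dim) * noise_std
    actions = (mean.unsqueeze(0) + noise).clamp(-1.0, 1.0)

    # Roll out all candidate sequences in parallel through the model.
    z = z0.unsqueeze(0).expand(n_samples, -1)
    returns = torch.zeros(n_samples)
    for t in range(horizon):
        returns = returns + reward(z, actions[:, t])
        z = dynamics(z, actions[:, t])

    # Exponentially weight trajectories by return, then average the actions.
    weights = torch.softmax(returns / temperature, dim=0)
    new_mean = (weights[:, None, None] * actions).sum(dim=0)
    return new_mean[0], new_mean  # action to execute now + warm-start plan
```

Executing the first action and reusing the time-shifted plan as the next warm start yields the usual receding-horizon control loop.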

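The teacher-student result corresponds to a standard privileged-distillation recipe: a pretrained state-based teacher supervises a vision-based student through behavior cloning. The sketch below is schematic; `teacher` and `student` are assumed to be deterministic policy networks, which simplifies the actual training pipeline.

```python
import torch
import torch.nn.functional as F

def distill_step(student, teacher, depth_obs, state_obs, optimizer):
    """One behavior-cloning step: the vision-based student imitates the
    action of a frozen, state-based teacher (privileged distillation)."""
    with torch.no_grad():
        target_action = teacher(state_obs)  # teacher sees privileged state
    pred_action = student(depth_obs)        # student sees ego depth images
    loss = F.mse_loss(pred_action, target_action)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```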

Impact: This work provides critical insights into the capabilities and limitations of modern model-based RL for agile robotics. The exceptional state-based performance demonstrates TD-MPC2's potential for rapid sim-to-real transfer with minimal real-world data. The identified velocity inference bottleneck highlights fundamental challenges in learning dynamics models from partial observations, informing future research directions in vision-based control. These findings are particularly valuable for the drone racing community, offering clear guidelines on sensory requirements and representation learning needs for high-speed autonomous flight.

References

  1. Hansen et al., "TD-MPC2: Scalable, Robust World Models for Continuous Control," 2024
  2. Hafner et al., "Mastering Diverse Domains through World Models (DreamerV3)," 2023
  3. Kaufmann et al., "Champion-level drone racing using deep reinforcement learning," Nature, 2023
  4. Romero et al., "Dream to fly: Model-based reinforcement learning for vision-based drone flight," 2025
  5. Song et al., "Flightmare: A Flexible Quadrotor Simulator," Conference on Robot Learning, 2021