Results: TD-MPC2 demonstrated state-of-the-art performance in state-based drone racing, completing the challenging Split-S track with significantly fewer environment steps than DreamerV3 and PPO while matching their final lap times. The MPPI planning component proved crucial for exploration. However, vision-based experiments revealed that the algorithm cannot accurately infer velocity from depth images, with consistency losses an order of magnitude higher than in state-based training. Teacher-student training enabled vision-based agents to complete laps, but the resulting policies were not sufficiently stable.
Impact: This work provides critical insights into the capabilities and limitations of modern model-based RL for agile robotics. The strong state-based performance demonstrates TD-MPC2's potential for rapid sim-to-real transfer with minimal real-world data. The identified velocity-inference bottleneck highlights fundamental challenges in learning dynamics models from partial observations, informing future research directions in vision-based control. These findings are particularly valuable for the drone racing community, offering clear guidance on sensory requirements and representation learning needs for high-speed autonomous flight.