Learning a Single Near-hover Position Controller for Vastly Different Quadcopters

Dingqi Zhang, Antonio Loquercio, Xiangyu Wu, Ashish Kumar, Jitendra Malik, Mark W. Mueller

Follow-Up Work

We are thrilled to present our follow-up work, A Learning-based Quadcopter Controller with Extreme Adaptation. In this new work, we combine imitation learning and reinforcement learning to create a fast-adapting, general control framework for quadcopters that eliminates the need for precise model estimation or manual tuning. Extensive evaluations demonstrate the controller's ability to generalize to unseen quadcopter parameters, with an adaptation range up to 16 times broader than the training set.

Abstract

This paper proposes an adaptive near-hover position controller for quadcopters, which can be deployed to quadcopters of very different mass, size, and motor constants, and which also adapts rapidly to unknown disturbances at runtime. The core algorithmic idea is to learn a single policy that can adapt online at test time not only to the disturbances applied to the drone, but also to the robot dynamics and hardware, within the same framework. We achieve this by training a neural network to estimate a latent representation of the robot and environment parameters, which is used to condition the behaviour of the controller, also represented as a neural network. We train both networks exclusively in simulation, with the goal of flying the quadcopters to goal positions while avoiding crashes into the ground. We directly deploy the same controller trained in simulation, without any modifications, on two real-world quadcopters that differ in mass, size, motors, and propellers, with mass differing by a factor of 4.5. In addition, we show rapid adaptation to sudden and large disturbances of up to one-third of the mass of the quadcopters. We perform an extensive evaluation in both simulation and the physical world, where we outperform a state-of-the-art learning-based adaptive controller and a traditional PID controller specifically tuned to each platform individually.
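The two-network structure described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the layer sizes, the dimensions, the function names, and the choice of a short history of past states and actions as the estimator input are all assumptions made for illustration, and the weights are random rather than trained.

```python
import numpy as np

rng = np.random.default_rng(0)

# All sizes below are hypothetical, chosen only for illustration.
STATE_DIM = 12    # e.g. position error, velocity, attitude, angular velocity
ACTION_DIM = 4    # one command per motor
HISTORY_LEN = 20  # number of past (state, action) pairs seen by the estimator
LATENT_DIM = 8    # size of the latent robot/environment representation


def init_mlp(sizes):
    """Return a list of (weight, bias) pairs with small random weights."""
    return [(rng.normal(0.0, 0.1, (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]


def mlp(params, x):
    """Apply a tanh MLP; the last layer is linear."""
    for w, b in params[:-1]:
        x = np.tanh(x @ w + b)
    w, b = params[-1]
    return x @ w + b


# Estimator: flattened (state, action) history -> latent vector (assumed input).
estimator = init_mlp([HISTORY_LEN * (STATE_DIM + ACTION_DIM), 64, LATENT_DIM])
# Controller: current state concatenated with the latent -> motor commands.
policy = init_mlp([STATE_DIM + LATENT_DIM, 64, ACTION_DIM])


def control_step(state, history):
    """One control step: estimate the latent, then condition the policy on it."""
    z = mlp(estimator, history.reshape(-1))
    action = np.tanh(mlp(policy, np.concatenate([state, z])))  # bounded to [-1, 1]
    return action, z


def push_history(history, state, action):
    """Drop the oldest (state, action) row and append the newest."""
    return np.vstack([history[1:], np.concatenate([state, action])])


# Usage: start from an empty history and run a single step.
history = np.zeros((HISTORY_LEN, STATE_DIM + ACTION_DIM))
state = np.zeros(STATE_DIM)
action, z = control_step(state, history)
history = push_history(history, state, action)
```

The point of the sketch is the data flow: the controller never receives the quadcopter's mass or motor constants directly, only a latent vector inferred online, which is what would allow one set of weights to be reused across different platforms.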

[arXiv]

Out-of-Distribution Disturbance Rejection

Random Pushes

Swing Payload

Comparison with Baselines

All our results use the exact same policy across all quadcopters. The LearningToFly (LTF) baseline is a learning-based adaptive controller, and the PID baseline is tuned to each quadcopter separately.

Hover Large

Ours