Abstract: “Consumer drone developers often face the challenge of achieving safe autonomous navigation under very tight size, weight, power, and cost constraints. In this talk, I will present our recent results towards a minimalist but complete perception and navigation solution that uses only a low-cost monocular visual-inertial sensor suite. I will start with an introduction to VINS-Mono, a robust state estimation solution packed with multiple features for easy deployment, such as online spatial and temporal inter-sensor calibration, loop closure, and map reuse. I will then describe monocular dense mapping solutions that combine efficient map representations, parallel computing, and deep learning techniques for real-time reconstruction of the environment. The perception system is completed by a geometry-based method for estimating full 6-DoF poses of arbitrary rigid dynamic objects using only one camera. With this real-time perception capability, trajectory planning and replanning methods with optimal time allocation are proposed to close the perception-action loop. The performance of the overall system is demonstrated via autonomous navigation in unknown complex environments, as well as aggressive drone racing in a teach-and-repeat setting.”