In this interview, Audrow Nash speaks with Davide Scaramuzza, Assistant Professor of Robotics at the University of Zurich and leader of the Robotics and Perception Group, about autonomous unmanned aerial vehicles (UAVs) that navigate using only on-board systems, with no GPS or motion capture systems.
Below are some videos of Scaramuzza’s research.
Davide Scaramuzza (1980, Italian) is Assistant Professor of Robotics at the University of Zurich. He is founder and director of the Robotics and Perception Group, where he develops cutting-edge research on low-latency vision and visually-guided micro aerial vehicles. He received his PhD (2008) in Robotics and Computer Vision at ETH Zurich (with Roland Siegwart). He was a postdoc at both ETH Zurich and the University of Pennsylvania (with Vijay Kumar and Kostas Daniilidis). From 2009 to 2012, he led the European project “sFly”, which introduced the world’s first autonomous navigation of micro quadrotors in GPS-denied environments using vision as the main sensor modality. For his research contributions, he was awarded an ERC Starting Grant (2014), the IEEE Robotics and Automation Early Career Award (2014), and a Google Research Award (2014). He coauthored the book “Introduction to Autonomous Mobile Robots” (MIT Press). He is the author of the first open-source Omnidirectional Camera Calibration Toolbox for MATLAB, also used at NASA, Bosch, and Daimler. He is also the author of the 1-point RANSAC algorithm, an effective and computationally efficient reduction of the standard 5-point RANSAC for visual odometry when vehicle motion is non-holonomic. He is Associate Editor of the IEEE Transactions on Robotics and has numerous publications in top-ranked robotics and computer vision journals, such as PAMI, IJCV, T-RO, IJRR, JFR, and AURO. His hobbies are piano and magic tricks.
Audrow Nash: Hi, can you introduce yourself?
Davide Scaramuzza: Yeah. My name is Davide Scaramuzza and I am an Assistant Professor at the University of Zurich, where I lead the Robotics and Perception Group. My group is about three years old now. Before that I worked at ETH and the University of Pennsylvania. In fact, I got my PhD at ETH Zurich with Professor Roland Siegwart, and then I stayed there for another three years as a post-doc and led a European project called sFly, which was the first project to demonstrate autonomous navigation of vision-controlled drones without GPS.
Then I moved to the University of Pennsylvania where I worked with Professors Kostas Daniilidis and Vijay Kumar. Then, in 2012, I got my position as Assistant Professor at the University of Zurich.
Audrow Nash: Now, your research, what is the goal, what are you working towards and what’s your motivation?
Davide Scaramuzza: I’m interested in developing autonomous machines, for both air and ground, that use mainly sensors for navigation and perception. I’m particularly interested in vision because I think vision is the most powerful sense for us humans, and for insects, in general. In fact, most of the brain cortex is dedicated to processing visual images. I’m very interested in exploiting image information for navigation, interpretation, reasoning, path planning and so on.
Audrow Nash: I see. Can you talk a bit about the drone research platform that you’re using with these vision systems?
Davide Scaramuzza: Yes. Since 2009, when I started doing this European project, most of my research has been focused on drones and vision-based control.
Audrow Nash: Are your drones quadcopters?
Davide Scaramuzza: Yes, we use quadrotors. We assemble them from off-the-shelf components. Actually, we’re very happy with the AR.Drone, but we only use the frame, the motors and the motor controllers from it. We replace the rest of the electronics and put in a PX4 autopilot. We run all our control, perception and planning algorithms on board on an Odroid and send the commands to the PX4 autopilot, which we rewrote from scratch.
Audrow Nash: This is done entirely on board the quadrotor?
Davide Scaramuzza: This is all done entirely on board.
Audrow Nash: What are some applications of quadrotor drones, eventually?
Davide Scaramuzza: Well, now you can use quadrotors for search and rescue, law enforcement, remote inspection, agriculture, even package delivery. Currently, my group is investing in remote inspection for nuclear facilities, like CERN or Fukushima, and reactor buildings. We also have an interest in remote inspection for bridges and in search and rescue operations, after an earthquake for example. Basically, all the applications where GPS is not available, because this allows us to focus on computer vision and visual control tools.
Audrow Nash: Turning to drones that are not using computer vision for navigation exclusively, what other modes of drones are there?
Davide Scaramuzza: If you don’t want to let your drone fly autonomously, the only option you have is a good, experienced pilot. Usually they use either line-of-sight or goggles, to which we wirelessly stream video. The problem is that, of course, it’s very difficult to control a drone using line-of-sight, especially from far away. Even with goggles it’s difficult. Then, of course, at a certain distance the communication drops, so there is nothing you can do.
When this happens, you need the drone to be able to continue the exploration autonomously. Now, how do you localize a drone without GPS? The only way is to rely on the onboard sensors and SLAM technology, which stands for Simultaneous Localization and Mapping. You build a local map of the environment, then you match it with the global map available from architectural sketches, and the robot uses these algorithms to retrieve its position on the global map. Of course, you can use different sensors for SLAM, such as lasers. They’ve been very successful.
In the last 10 years we’ve seen a boom in visual SLAM using only one camera, because computer technology has progressed a lot, making it possible to run very sophisticated algorithms on a small computer like a smartphone.
Audrow Nash: What are the problems of using vision on quadrotors?
Davide Scaramuzza: If you compare the performance of vision-controlled drones with that of remotely controlled ones, you will notice that vision-controlled drones are still limited to controlled environments, which means controlled illumination and texture. A vision-controlled robot needs texture in order to work, to perceive the environment. If there is no texture it will, basically, crash into the wall. So one problem is that we are still restricted to a controlled environment.
Another problem is that vision algorithms are still too slow. We have an average latency, generally speaking, of 50 to 200 milliseconds.
Audrow Nash: Is that because of the sensors or the processing?
Davide Scaramuzza: It’s due to both, actually. The processing, on average, takes about 50 to 200 milliseconds a frame, depending on the type of algorithms you’re running. Our semi-direct visual odometry algorithm takes only 10 milliseconds on an Odroid, but many visual SLAM algorithms take much longer, like 30 to 50 milliseconds.
We’re also looking at different types of vision sensors, like event cameras, which have an update rate of one megahertz. They have as many pixels, but they do not acquire the image all at the same time. Instead, the pixels respond asynchronously, in a similar way to how the human eye works. The output is not a frame; it’s a sequence of asynchronous events at microsecond time resolution.
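[Editor's note: The difference between a frame and an event stream can be sketched in a few lines of Python. This is only an illustrative sketch. The `Event` fields, including the `polarity` sign for brightness increase or decrease, and both helper functions are assumptions made for the example, not something described in the interview.]

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Event:
    """One asynchronous event (hypothetical layout)."""
    t_us: int      # timestamp in microseconds
    x: int         # pixel column
    y: int         # pixel row
    polarity: int  # assumed: +1 brightness increase, -1 decrease


def events_in_window(events: List[Event], t_start_us: int, t_end_us: int) -> List[Event]:
    """Select the events whose timestamps fall in [t_start_us, t_end_us)."""
    return [e for e in events if t_start_us <= e.t_us < t_end_us]


def accumulate(events: List[Event], width: int, height: int) -> List[List[int]]:
    """Sum event polarities per pixel to get a frame-like image for inspection."""
    image = [[0] * width for _ in range(height)]
    for e in events:
        image[e.y][e.x] += e.polarity
    return image


stream = [Event(10, 0, 0, 1), Event(20, 1, 0, -1), Event(30, 0, 0, 1)]
recent = accumulate(events_in_window(stream, 0, 25), width=2, height=1)
```

Each pixel reports on its own schedule, so a consumer can choose any time window it likes instead of waiting for the next full frame.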
Audrow Nash: That moves into semi-direct visual odometry, correct?
Davide Scaramuzza: Well, semi-direct visual odometry still uses standard sensors.
Audrow Nash: It is similar in the process, is it not?
Davide Scaramuzza: What we’re doing with the event-based sensor is different from semi-direct visual odometry. It’s a different sensor so it’s separate. Before I explain SVO – semi-direct visual odometry – I have to tell you how standard visual odometry algorithms work:
We have two types of visual odometry algorithms: feature-based methods and direct methods. Feature-based approaches usually extract salient points across images and then match them.
Audrow Nash: Notable features?
Davide Scaramuzza: Notable features like corners. Primitive features. Or lines and small edges (edgelets), for example. Then, the next step for feature-based algorithms is to match corresponding corners across images. Then you apply a motion estimation algorithm, which is basically the core of visual odometry. Usually these motion estimation algorithms work by minimizing the reprojection error between the observed notable points and the reprojected points that are available from the previous frame, for example.
The other approach is to use direct methods, or dense methods, which, instead of extracting salient points from the images, work with all the pixels. The advantage is that, since you use all the pixels, you get increased accuracy, but you cannot minimize the reprojection error; you minimize the photometric error instead. Because they rely on photometric error minimization, direct methods only work well if the motion baseline between the two frames is very small; you have to make sure that there is not much distance between consecutive frames.
What we invented is called SVO – semi-direct visual odometry – which, as the name says, leverages the advantages of feature-based approaches and direct approaches. How do we do that? Well, basically, we use direct methods for small frame-to-frame motion estimation. Then we use feature-based methods instead for frame-to-key-frame motion estimation. When we’re moving, we skip frames that are too close by. When a certain frame is significantly far from the previous key frame, we consider the new frame a key frame and we run an adjustment with respect to the previous key frames.
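[Editor's note: The two error measures mentioned in this answer can be written down compactly. The sketch below is an illustration only, not the SVO implementation. It assumes a simple pinhole camera with focal length `focal` and principal point `(cx, cy)`, a 3D point already expressed in the current camera frame, and grayscale images stored as nested lists. All function names are hypothetical.]

```python
def project(point_3d, focal, cx, cy):
    """Pinhole projection of a 3D point (camera frame) to pixel coordinates."""
    X, Y, Z = point_3d
    return (focal * X / Z + cx, focal * Y / Z + cy)


def reprojection_error(observed_px, point_3d, focal, cx, cy):
    """Squared pixel distance between an observed feature and the projected 3D point.

    Feature-based methods minimize the sum of this quantity over matched features.
    """
    u, v = project(point_3d, focal, cx, cy)
    du, dv = observed_px[0] - u, observed_px[1] - v
    return du * du + dv * dv


def photometric_error(img_prev, img_curr, px_prev, px_curr):
    """Squared intensity difference between corresponding pixels of two images.

    Direct methods minimize the sum of this quantity over the pixels they use.
    Pixels are given as (column, row).
    """
    (u0, v0), (u1, v1) = px_prev, px_curr
    diff = img_curr[v1][u1] - img_prev[v0][u0]
    return diff * diff


geometric = reprojection_error((103.0, 79.0), (1.0, 0.5, 2.0), focal=100.0, cx=50.0, cy=50.0)
intensity = photometric_error([[10, 20], [30, 40]], [[10, 25], [30, 40]], (1, 0), (1, 0))
```

The first error is measured in pixel positions and needs explicit feature matches; the second is measured in brightness values and relies on the two images being close enough that corresponding pixels look alike, which is why it favors small frame-to-frame motion.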
Audrow Nash: What is a key frame?
Davide Scaramuzza: A key frame is basically a standard frame that is significantly far from the previous key frame.
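[Editor's note: A minimal sketch of this key frame rule follows. It assumes that "significantly far" means the camera has moved at least a fixed distance `min_distance` since the last key frame. That threshold and the function name are illustrative assumptions; the interview does not say how distance is measured in the actual system.]

```python
import math


def select_keyframes(positions, min_distance):
    """Return the indices of frames chosen as key frames.

    positions: estimated camera positions, one (x, y, z) tuple per frame.
    A frame becomes a key frame when it is at least min_distance away
    from the most recent key frame; closer frames are skipped.
    """
    keyframes = []
    last_position = None
    for index, position in enumerate(positions):
        if last_position is None or math.dist(position, last_position) >= min_distance:
            keyframes.append(index)
            last_position = position
    return keyframes


trajectory = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.6, 0.0, 0.0), (0.7, 0.0, 0.0), (1.3, 0.0, 0.0)]
chosen = select_keyframes(trajectory, min_distance=0.5)
```

With this trajectory the frames at 0.1 and 0.7 are skipped because they are too close to the preceding key frame.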
Audrow Nash: Okay. What are some of the projects that you’ve done using semi-direct visual odometry?
Davide Scaramuzza: Research-wise, so far, SVO only works on sparse image points. The next step was to densify the maps, and we call these methods “dense methods.” Basically, we use every single pixel in the image; we try to track every single pixel and keep an estimate of its depth and of its uncertainty, that is, a probabilistic filter for each pixel.
We made an app you can download from the iTunes store. It’s called 3DAround and uses our REMODE and SVO technology to allow people to do dense reconstruction – of food, at the moment.
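[Editor's note: The idea of a probabilistic filter for each pixel can be illustrated with a deliberately simplified example. The sketch below assumes the tracked quantity is depth and that both the current estimate and each new measurement are Gaussian, so they can be fused in closed form. This is a stand-in chosen for clarity, not the filter used in the group's system, and the function names are hypothetical.]

```python
def fuse_gaussian(mean, var, meas, meas_var):
    """Fuse a Gaussian estimate (mean, var) with a Gaussian measurement.

    Returns the updated mean and variance; the variance always shrinks.
    """
    total = var + meas_var
    return (meas_var * mean + var * meas) / total, var * meas_var / total


def update_depth_map(means, variances, measurements, meas_var):
    """Apply the per-pixel fusion to a whole image of depth estimates."""
    new_means, new_vars = [], []
    for m_row, v_row, z_row in zip(means, variances, measurements):
        fused = [fuse_gaussian(m, v, z, meas_var) for m, v, z in zip(m_row, v_row, z_row)]
        new_means.append([f[0] for f in fused])
        new_vars.append([f[1] for f in fused])
    return new_means, new_vars


# One row of two pixels: the first measurement disagrees with the prior, the second agrees.
depth, uncertainty = update_depth_map([[2.0, 1.0]], [[1.0, 1.0]], [[4.0, 1.0]], meas_var=1.0)
```

Every pixel carries its own estimate and variance, so pixels that have been observed many times end up with low uncertainty while poorly observed ones remain uncertain.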
Audrow Nash: Interesting. What are some of the research projects that have come out of this?
Davide Scaramuzza: We are collaborating with industry. We’re collaborating with senseFly, which is now part of the Parrot group. They build drones. Initially, they became popular by making fixed-wing airplanes. They’ve just made a new platform, a quadrotor platform, which is the first quadrotor with five vision sensors. Five cameras look in five different directions with small overlaps. The front-looking camera has a tilt mechanism that allows it to rotate in any direction. Each of these five vision sensors is coupled with a sonar sensor for distance measurements. Basically, we are porting our technology to this platform.
We’re also collaborating with CERN in Geneva, where the famous particle accelerator is.
Audrow Nash: Going back to senseFly for just a moment. Using your technology, what kind of things would that allow you to do?
Davide Scaramuzza: The main advantage of using our technology is for inspection operations. It will allow anyone with little or no experience to fly a quadrotor and perform simple inspection operations, like lock to a wall, lock to the surface or get a closer view of an object. The quadrotor will automatically, autonomously approach an object in order to get the best image. There are other things that I am not allowed to say because of confidentiality.
Audrow Nash: You were saying about CERN?
Davide Scaramuzza: We have a project with them. They want to replicate the conditions of the Big Bang. Every time they do an experiment, they produce a lot of particles and a lot of radioactivity, so you can’t send technicians down there during this time or even for a short while after. Often, they have small accidents and have to find the cause very quickly. We’re working on, one day, replacing the technicians with drones.
Currently, we are interested in automatic inspection of the LHC tunnels, as well as the secondary beam areas of CERN, which is basically the building where all the pipes emerge from the main tunnel and are used to run different experiments.
Audrow Nash: I see. You partner with a lot of companies with this research. Can you talk a bit about pairing with industry? Does your experience working with different companies benefit you and them?
Davide Scaramuzza: I’m very interested in helping companies because we do research to serve the community, which ultimately serves humanity. Many researchers complain that we should focus on research and not bother about the applications, which is the duty of the company. I think that you need applications in order to understand what are the right research questions to ask. Actually, what I like a lot about working in collaboration with companies is that they make you think about robustness; achieving robustness through demonstration.
This robustness basically means that the system works. Ideally, you want it to work 100%, while, in research, you’re happy if it functions at 10%, then you can already prove a concept.
Now robustness actually opens up interesting research questions, it makes you think about what causes the system to fail. What can we do to avoid, for example, the system failing with light? You may want to control the environment using lights on board, or changing the camera parameters.
Companies also benefit because they can have PhD students or post-docs who are working part time on the project.
Audrow Nash: What advice do you have for those beginning their research career?
Davide Scaramuzza: My advice is always to read a lot of papers and, simultaneously, code a lot. I think it’s very important to do both; not just work in theory, but also understand what you’re working with. Usually, I ask my PhD students to get some hands-on experience with drone control and computer vision and, at the same time, start reading papers, a lot of papers. You have to read a lot in order not to reinvent the wheel. Reading is very important, and so is hands-on experience.
Audrow Nash: Thank you.
Davide Scaramuzza: Thank you. You’re welcome.