California has published its summary of all the reports submitted by vendors testing robocars in the state. You can read the individual reports. They are interesting, but several other outlets have produced summaries of the reports, calculating things like the number of interventions per mile. On these numbers, Google’s lead is extreme. Of the over 600,000 autonomous miles driven by the various teams, Google/Waymo accounted for 97% of them, in other words roughly 30 times as much as everybody else put together.
Beyond that, their rate of miles between disengagements (around 5,000, a 4x improvement over 2015) is one or two orders of magnitude better than the others’. In fact, most of the others have so few miles that you can’t even produce a meaningful number; only Cruise, Nissan and Delphi can claim enough miles to really tell.
Tesla is a notable entry. In 2015 they reported driving zero miles, and in 2016 they reported a very small number of miles with tons of disengagements from software failures (one every 3 miles). That’s because Tesla’s Autopilot is not a robocar system, so miles driven with it are not counted; Tesla’s numbers must come from small-scale tests of a more experimental vehicle. This is very much not in line with Tesla’s claim that it will release full autonomy features for its cars fairly soon, and that the cars already have all the hardware needed for that to happen.
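To make the arithmetic concrete, here is a small Python sketch of the share-of-miles and miles-per-disengagement calculations. The figures are rough placeholders consistent with the summary numbers above (Waymo at roughly 97% of 600,000+ miles and about 5,000 miles per disengagement, Tesla at roughly one disengagement every 3 miles), not the exact values in the DMV filings.

```python
reports = {
    # vendor: (autonomous miles, disengagements); figures are rough placeholders
    "Waymo":         (630_000, 125),   # ~5,000 mi between disengagements
    "Tesla":         (550,     180),   # roughly one disengagement every 3 miles
    "everyone else": (19_000,  None),  # aggregate of the remaining teams
}

def miles_per_disengagement(miles, disengagements):
    """Miles per disengagement, or None when the count is unknown or zero."""
    if not disengagements:
        return None
    return miles / disengagements

total = sum(miles for miles, _ in reports.values())

for vendor, (miles, dis) in reports.items():
    rate = miles_per_disengagement(miles, dis)
    rate_text = f"{rate:,.0f} mi/disengagement" if rate else "rate not computed"
    print(f"{vendor:>13}: {miles:>7,} mi ({100 * miles / total:4.1f}% of total), {rate_text}")
```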
Unfortunately, you can’t easily compare these numbers:
One complication is that safety drivers are typically told to disengage if they have any doubt. What counts as “doubt,” and how to respond to it, thus varies from driver to driver and from company to company.
Google has said their approach is to test every disengagement in simulator, to find out what probably would have happened if the driver had not disengaged. If there would have been a “contact” (an accident), Google counts it as a real incident, and those incidents are much rarer than the raw disengagement counts reported here. Many of the disengagements occur when the system detects faults in software or sensors. Those faults are indeed a problem, but much as with human drivers who zone out, not all of them would cause an accident or even a safety issue. You want to get rid of all of them, to be sure, but if you are trying to compare the safety of these systems to human drivers, that’s not easy to do.
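As a rough illustration of that filtering step (this is a hypothetical sketch of the idea, not Google’s actual tooling), one could replay each disengagement through a counterfactual check and count only those predicted to end in contact:

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Disengagement:
    """One logged disengagement (a hypothetical record layout)."""
    vehicle_id: str
    cause: str  # e.g. "driver judgment", "software fault", "sensor fault"

def count_real_incidents(events: List[Disengagement],
                         would_contact: Callable[[Disengagement], bool]) -> int:
    """Count only the disengagements that a counterfactual simulation
    predicts would have ended in contact had the driver not taken over."""
    return sum(1 for event in events if would_contact(event))

# Stand-in "simulator" that flags only sensor faults as leading to contact;
# a real counterfactual replay is of course far more involved than this.
events = [
    Disengagement("AV-1", "driver judgment"),
    Disengagement("AV-2", "sensor fault"),
]
print(count_real_incidents(events, lambda e: e.cause == "sensor fault"))  # -> 1
```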
It’s hard to figure out a good way to get comparable numbers from all teams. The new federal guidelines, while mostly terrible, contain an interesting rule that teams must provide their sensor logs for any incident. This will allow independent parties to compare incidents in a meaningful way, and possibly even run them all in simulator at some level.
It would be worthwhile to require every team to report incidents that would have caused accidents. That requires a good simulator, however, and it’s hard for the law to demand that of everybody.