Robotics is both an exciting and intimidating field because it takes so many different capabilities and disciplines to spin up a robot. This goes all the way from mechatronic design and hardware engineering to some of the more philosophical parts of software engineering.
Those who know me also know I’m a total “high-level guy”. I’m fascinated by how complex a modern robotic system is and how all the pieces come together to make something that’s truly impressive. I don’t have particularly deep knowledge in any specific area of robotics, but I appreciate the work of countless brilliant people that give folks like me the ability to piece things together.
In this post, I will go through what it takes to put together a robot that is highly capable by today’s standards. This is very much an opinion article that tries to slice robotic skills along various dimensions. I found it challenging to find a single “correct” taxonomy, so I look forward to hearing from you about anything you might disagree with or have done differently.
Roughly speaking, a robot is a machine with sensors, actuators, and some computing device that has been programmed to perform some level of autonomous behavior. Below is the essential “getting started” diagram that anyone in AI should recognize.
This opens up the ever present “robot vs. machine” debate that I don’t want to spent too much time on, but my personal distinction between a robot vs. a machine includes a couple of specifications.
Robots are not created equal. Not to say that simpler robots don’t have their purpose — knowing how simple or complex (technology- and budget-wise) to build a robot given the problem at hand is a true engineering challenge. To say it a different way: overengineering is not always necessary.
I will now categorize the capabilities of a robot into three bins: Automatic, Autonomous, and Aware. This roughly corresponds to how low- or high-level a robot’s skills have been developed, but it’s not an exact mapping as you will see.
Automatic: The robot is controllable and it can execute motion commands provided by a human in a constrained environment. Think, for example, of industrial robots in an assembly line. You don’t need to program very complicated intelligence for a robot to put together a piece of an airplane from a fixed set of instructions. What you need is fast, reliable, and sustained operation. The motion trajectories of these robots are often trained and calibrated by a technician for a very specific task, and the surroundings are tailored for robots and humans to work with minimal or no conflict.
Autonomous: The robot can now execute tasks in an uncertain environment with minimal human supervision. One of the most pervasive example of this is self-driving cars (which I absolutely label as robots). Today’s autonomous vehicles can detect and avoid other cars and pedestrians, perform lane-change maneuvers, and navigate the rules of traffic with fairly high success rates despite the negative media coverage they may get for not being 100% perfect.
Aware: This gets a little bit into the “sci-fi” camp of robots, but rest assured that research is bringing this closer to reality day by day. You could tag a robot as aware when it can establish two-way communication with humans. An aware robot, unlike the previous two categories, is not only a tool that receives commands and makes mundane tasks easier, but one that is perhaps more collaborative. In theory, aware robots can understand the world at a higher level of abstraction than their more primitive counterparts, and process humans’ intentions from verbal or nonverbal cues, still under the general goal of solving a real-life problem.
A good example is a robot that can help humans assemble furniture. It operates in the same physical and task space as us humans, so it should adapt to where we choose to position ourselves or what part of the furniture we are working on without getting in the way. It can listen to our commands or requests, learn from our demonstrations, and tell us what it sees or what it might do next in language we understand so we can also take an active part in ensuring the robot is being used effectively.
Above you can see a perpendicular axis of categorizing robot awareness. Each highlighted area represents the subset of skills it may require for a robot to achieve awareness on that particular dimension. Notice how forming abstractions — i.e., semantic understanding and reasoning — is crucial for all forms of awareness. That is no coincidence.
The most important takeaway as we move from automatic to aware is the increasing ability for robots to operate “in the wild”. Whereas an industrial robot may be designed to perform repetitive tasks with superhuman speed and strength, a home service robot will often sacrifice this kind of task-specific performance with more generalized skills needed for human interaction and navigating uncertain and/or unknown environments.
Spinning up a robot requires a combination of skills at different levels of abstraction. These skills are all important aspects of a robot’s software stack and require vastly different areas of expertise. This goes back to the central point of this blog post: it’s not easy to build a highly capable robot, and it’s certainly not easy to do it alone!
This denotes the low-level foundational skills of a robot. Without a robust set of functional skills, you would have a hard time getting your robot to be successful at anything else higher up the skill chain.
Being at the lowest level of abstraction, functional skills are closely tied to direct interaction with the actuators and sensor of the robot. Indeed, these skills can be discussed along the acting and sensing modalities.
Behavioral skills are a step above the more “raw” sensor-to-actuator processing loops that we explored in the Functional Skills section. Creating a solid set of behavioral skills simplifies our interactions with robots both as users and programmers.
At the functional level, we have perhaps demonstrated capabilities for the robot to respond to very concrete, mathematical goals. For example,
At the behavioral level, commands may take the form of:
Abstracting away these motion planning and navigation tasks requires a combination of models of the robot and/or world, and of course, our set of functional skills like perception and control.
If this hasn’t yet got across, behavioral skills and functional skills definitely do not work in isolation. Motion planning in a space with obstacles requires perception and navigation. Navigating in a world requires controls and motion planning.
On the language side, Natural Language Processing (NLP) is what takes us from raw text input — whether it came from speech or directly typed in — to something more actionable for the robot. For instance, if a robot is given the command “bring me a snack from the kitchen”, the NLP engine needs to interpret this at the appropriate level to perform the task. Putting it all together,
Simply put, abstract skills are the bridge between humans and robot behaviors. All the skills in the row above perform some transformation that lets humans more easily express their commands to robots, and similarly lets robots more easily express themselves to humans.
Task and Behavior Planning operates on the key principles of abstraction and composition. A command like “get me a snack from the kitchen” can be broken down into a set of fundamental behaviors (navigation, motion planning, perception, etc.) that can be parameterized and thus generalized to other types of commands such as “put the bottle of water in the garbage”. Having a common language like this makes it useful for programmers to add capabilities to robots, and for users to leverage their robot companions to solve a wider set of problems. Modeling tools such as finite-state machines and behavior trees have been integral in implementing such modular systems.
Semantic Understanding and Reasoning is bringing abstract knowledge to a robot’s internal model of the world. For example, in navigation we saw that a map can be represented as “occupied vs. unoccupied”. In reality, there are more semantics that can enrich the task space of the robot besides “move here, avoid there”. Does the environment have separate rooms? Is some of the “occupied space” movable if needed? Are there elements in the world where objects can be stored and retrieved? Where are certain objects typically found such that the robot can perform a more targeted search? This recent paper from Luca Carlone’s group is a neat exploration into maps with rich semantic information, and a huge portal of future work that could build on this.
Natural Language Understanding and Dialog is effectively two-way communication of semantic understanding between humans and robots. After all, the point of abstracting away our world model was so we humans could work with it more easily. Here are examples of both directions of communication:
This is all a bit of a pipe dream — Can a robot really be programmed to work with humans at such a high level of interaction? It’s not easy, but research tries to tackle problems like these every day. I believe a good measure of effective human-robot interaction is whether the human and robot jointly learn not to run into the same problem over and over, thus improving the user experience.
Thanks for reading, even if you just skimmed through the pictures. I hope this was a useful guide to navigating robotic systems that was somewhat worth its length. As I mentioned earlier, no categorization is perfect and I invite you to share your thoughts.
Going back to one of my specifications: A robot has at least one closed feedback loop between sensing and actuation that does not require human input. Let’s try put our taxonomy in an example set of feedback loops below. Note that I’ve condensed the natural language pipeline into the “HRI” arrows connecting to the human.
Because no post today can evade machine learning, I also want to take some time make readers aware of machine learning’s massive role in modern robotics.
The truth is, our human knowledge is being outperformed by machine learning for processing such high-dimensional data. Always remember that machine learning shouldn’t be a crutch due to its biggest pitfall: We can’t (yet) explain why learned systems behave the way they do, which means we can’t (yet) formulate the kinds guarantees than we can with traditional methods. The hope is to eventually refine our collective scientific knowledge so we’re not relying on data-driven black-box approaches. After all, knowing how and why robots learn will only make them more capable in the future.
On that note, you have earned yourself a break from reading. Until next time!