There are countless reasons why home robots have found little success since the Roomba. Pricing, practicality, form factor and mapping have all contributed to one failure after another. Even when some or all of those are addressed, the question remains of what happens when a system makes an inevitable mistake.
This has been a point of friction at the industrial level as well, but large companies have the resources to deal with problems as they arise. We cannot, however, expect consumers to learn to program, or to hire someone to help, every time a problem comes up. Fortunately, this is a great use case for large language models (LLMs) in the field of robotics, as evidenced by new research from MIT.
A study set to be presented at the International Conference on Learning Representations (ICLR) in May aims to bring some “common sense” to the error-correction process.
“It turns out that robots are excellent mimics,” the school explains. “But unless engineers also program them to adapt to every possible bump and jolt, the robots don’t necessarily know how to handle these situations, short of starting their task from the top.”
Traditionally, when a robot encounters problems, it will exhaust its pre-programmed options before requiring human intervention. This is a particular challenge in an unstructured environment such as a home, where any number of changes to the status quo can adversely affect a robot’s ability to function.
The researchers behind the study note that while imitation learning (learning to do a task through observation) is popular in the home robotics world, it often cannot account for the countless small environmental variations that can affect regular operation, thus requiring a system to restart from square one. The new research addresses this, in part, by breaking demonstrations into smaller subsets, rather than treating them as part of one continuous action.
This is where LLMs enter the picture, eliminating the need for the developer to manually label and assign the numerous sub-actions.
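To make that concrete, here is a minimal sketch, not the researchers’ actual code, of how a language model might be prompted to break a task into labeled sub-steps. The `query_llm` helper is a hypothetical stand-in for whatever model would actually be called; here it returns a canned reply so the example runs on its own.

```python
# Sketch only: ask a language model to decompose a task into labeled sub-steps,
# so a developer does not have to enumerate and label them by hand.

def query_llm(prompt: str) -> str:
    # Stub response so the sketch runs without an API key; a real system would
    # call an actual language model here.
    return "1. move to marble bin\n2. scoop marbles\n3. move to bowl\n4. pour marbles"

def decompose_task(task_description: str) -> list[str]:
    """Return an ordered list of natural-language sub-steps for the task."""
    prompt = (
        f"List the sub-steps a robot must perform to: {task_description}. "
        "Number each step on its own line."
    )
    reply = query_llm(prompt)
    steps = []
    for line in reply.splitlines():
        line = line.strip()
        if line:
            # Drop the leading "1." style numbering, keep only the description.
            steps.append(line.split(".", 1)[-1].strip())
    return steps

if __name__ == "__main__":
    print(decompose_task("scoop marbles and pour them into a bowl"))
```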
“LLMs have a way of telling you how to do each step of a task, in natural language. A human’s continuous demonstration is the embodiment of those steps, in physical space,” says MIT graduate student Tsun-Hsuan Wang. “And we wanted to connect the two so that a robot would automatically know what stage it is in a task and be able to replan and recover on its own.”
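One simplified way to picture that connection, an assumption on my part rather than the paper’s method, is to treat “knowing what stage it is in” as matching the robot’s current state against reference states recorded for each labeled sub-step:

```python
# Illustrative only: identify which labeled sub-step the robot is in by
# finding the demonstration reference state closest to its current state.
import numpy as np

def current_stage(robot_state, reference_states, labels):
    """Return the label of the reference state nearest to the current state."""
    distances = [np.linalg.norm(robot_state - ref) for ref in reference_states]
    return labels[int(np.argmin(distances))]

# Toy usage: 2-D "states" (say, gripper x/y) taken from the middle of each segment.
refs = [np.array([0.0, 0.0]), np.array([0.5, 0.1]), np.array([1.0, 0.3])]
names = ["move to marble bin", "scoop marbles", "pour into bowl"]
print(current_stage(np.array([0.55, 0.12]), refs, names))  # -> "scoop marbles"
```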
The particular demonstration presented in the study involves training a robot to scoop up marbles and pour them into an empty bowl. It’s a simple, repetitive task for humans, but for robots it’s a combination of various smaller tasks. The LLMs are able to list and label these subtasks. In the demonstrations, the researchers sabotaged the activity in small ways, such as knocking the robot off course and knocking marbles off its spoon. The system responded by correcting the individual subtasks on its own, rather than starting from scratch.
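As a rough illustration of that recovery behavior, again a hedged sketch with made-up callbacks rather than the study’s code, the control loop only needs to retry the sub-step that failed instead of restarting the whole routine:

```python
# Sketch of subtask-level recovery: retry only the failed sub-step instead of
# restarting the whole task. The execute/succeeded callbacks are placeholders
# for real robot actions and sensing.

def run_task(substeps, execute, succeeded, max_retries=3):
    """Execute labeled sub-steps in order, retrying a failed step in place."""
    for step in substeps:
        for _ in range(max_retries):
            execute(step)
            if succeeded(step):
                break  # this sub-step is done; move on to the next
        else:
            raise RuntimeError(f"could not recover on sub-step: {step}")

# Toy usage: simulate a single disturbance during "scoop marbles".
attempts = {}
def flaky_success(step):
    attempts[step] = attempts.get(step, 0) + 1
    return not (step == "scoop marbles" and attempts[step] == 1)

run_task(
    ["move to marble bin", "scoop marbles", "pour into bowl"],
    execute=lambda step: print(f"executing: {step}"),
    succeeded=flaky_success,
)
```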
“With our method, when the robot makes mistakes, we don’t need to ask the human to program or give extra demonstrations on how to recover from failures,” adds Wang.
It’s an exciting method to help someone avoid losing their marbles altogether.