2024 is going to be a huge year for the intersection of generative AI/large foundation models and robotics. There is a lot of excitement around the potential for various applications, ranging from learning to product design. Google’s DeepMind Robotics researchers are one of a number of teams exploring the space’s potential. In a blog post today, the team highlights ongoing research designed to give robots a better understanding of exactly what we humans want from them.
Traditionally, robots have focused on doing a single task repeatedly over the course of their lifetime. Single-purpose robots tend to be very good at that one thing, but even they run into difficulty when changes or errors are unintentionally introduced into the process.
The newly announced AutoRT is designed to harness large foundation models, to a number of different ends. In a standard example given by the DeepMind team, the system starts by leveraging a Visual Language Model (VLM) for better situational awareness. AutoRT is able to manage a fleet of robots working in tandem, each equipped with cameras to get the lay of their environment and the objects within it.
A large language model, meanwhile, suggests tasks that can be accomplished by the hardware, including its end effector. LLMs are considered by many to be the key to unlocking robots that effectively understand more natural-language commands, reducing the need for hard-coding skills.
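To make that pipeline concrete, here is a minimal, hypothetical sketch of the loop in Python. Every name is an illustrative placeholder rather than DeepMind’s API: the callables stand in for the components described above, a VLM for scene understanding, an LLM for task proposal, a safety/feasibility filter, and a low-level control policy.

```python
from typing import Callable, List

# Hypothetical sketch of an AutoRT-style orchestration step.
# All names are placeholders, not DeepMind's actual interfaces.
def autort_step(
    capture_frame: Callable[[], bytes],            # robot camera image
    describe_scene: Callable[[bytes], str],        # VLM: frame -> scene description
    propose_tasks: Callable[[str], List[str]],     # LLM: scene -> candidate tasks
    is_safe_and_feasible: Callable[[str], bool],   # filter tasks before execution
    execute: Callable[[str], None],                # hand off to the control policy
) -> None:
    """One decision step for a single robot in the fleet."""
    frame = capture_frame()
    scene = describe_scene(frame)                  # e.g. "a countertop with a sponge and a can"
    for task in propose_tasks(scene):
        if is_safe_and_feasible(task):
            execute(task)
```

In this framing, running a fleet amounts to calling a step like this for each robot in parallel, with the filtering stage deciding which LLM-proposed tasks are actually attempted.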
The system has already been tested quite a bit over the past seven or so months. AutoRT is capable of orchestrating up to 20 robots at once and a total of 52 different devices. All told, DeepMind has collected some 77,000 trials, including more than 6,000 tasks.
Also new from the team is RT-Trajectory, which leverages video input for robotic learning. Many groups are exploring the use of YouTube videos as a method of training robots at scale, but RT-Trajectory adds an interesting layer by overlaying a 2D sketch of the arm in action over the video.
The team notes, “These trajectories, in the form of RGB images, provide low-level, practical visual hints to the model as it learns its robot-control policies.”
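As a rough illustration of what such a hint looks like, the snippet below draws a 2D gripper path directly onto an RGB frame using PIL. It is a sketch of the general idea under the assumption that the path is already available as pixel coordinates; it is not DeepMind’s implementation.

```python
from PIL import Image, ImageDraw

def overlay_trajectory(frame: Image.Image, points: list[tuple[int, int]]) -> Image.Image:
    """Draw a 2D gripper path (pixel coordinates) on top of a video frame."""
    annotated = frame.copy()
    draw = ImageDraw.Draw(annotated)
    draw.line(points, fill=(255, 0, 0), width=4)                   # the trajectory "hint"
    x, y = points[-1]
    draw.ellipse((x - 6, y - 6, x + 6, y + 6), fill=(0, 255, 0))   # mark the end point
    return annotated

# Example: annotate a blank 256x256 frame with a short made-up path.
frame = Image.new("RGB", (256, 256), "gray")
hint = overlay_trajectory(frame, [(40, 200), (90, 150), (160, 120), (210, 80)])
hint.save("trajectory_hint.png")
```

The annotated image, rather than the raw frame alone, is what conditions the policy during training.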
DeepMind says the training had twice the success rate of its RT-2 training, at 63% compared to 29%, while testing 41 tasks.
“RT-Trajectory utilizes the rich robotic motion information that exists in all robot datasets, but is currently underutilized,” the team notes. “RT-Trajectory not only represents another step on the road to building robots capable of moving with efficiency and precision in new situations, but also unlocking knowledge from existing datasets.”