Covariate this week announced the release of the RFM-1 (Robotics Foundation Model 1). Peter Chen, the co-founder and CEO of artificial intelligence at UC Berkeley tells TechCrunch that the platform “is basically a large language model (LLM), but for robot language.
RFM-1 is the result of, among other things, a vast trove of data collected during the development of Covariant’s Brain AI platform. With its clients’ consent, the startup has assembled the robotics equivalent of an LLM’s training dataset.
“The vision of RFM-1 is to power the billions of robots to come,” says Chen. “We at Covariant have already successfully deployed many robots in warehouses. But that is not where we want to stop. We really want to power robots in manufacturing, food processing, recycling, agriculture, the service industry and even people’s homes.”
The platform launches as more robotics companies discuss the future of “general purpose” systems. The recent wave of humanoid robotics companies like Agility, Figure, 1X and Apptronik has played a key role in this debate. The humanoid form factor lends itself well to adaptability (as do the people it’s modeled after), though the robustness of the onboard AI/software systems is another question entirely.
Currently, Covariant’s software is largely deployed on industrial robotic arms performing familiar warehouse tasks such as bin picking. It’s not yet deployed on humanoids, although the company promises some level of hardware agnosticism.
“We really like the work that’s happening in the more general robot hardware space,” says Chen. “The coupling of the intelligence inflection point with the hardware inflection point is where we will see an even greater explosion of robot applications. But a lot of it isn’t quite there yet, especially on the hardware side. It is very difficult to get beyond staged demo videos. How many people have personally interacted with a humanoid? That tells you how mature the technology is.”
However, Covariant doesn’t shy away from human comparisons when describing the role RFM-1 plays in robots’ decision-making. According to its press materials, the platform “gives robots the human-like ability to reason, representing the first time Generative AI has successfully given commercial robots a deeper understanding of language and the physical world.”
This is one of those areas where claims demand caution, both in the comparison to abstract (even philosophical) concepts and in the system’s actual effectiveness in the real world over time. “Human capacity for reason” is a broad concept that means different things to different people. Here it refers to the system’s ability to process real-world data and determine the best course of action for the task at hand.
This is a departure from traditional robotic systems, which are programmed to perform a single task ad infinitum. Such single-purpose robots have thrived in highly structured environments, beginning with automobile assembly lines. As long as the task changes minimally, a robot arm can do its job over and over, unhindered, until it’s time to call it quits and collect the gold pocket watch for its years of faithful service.
Things can go awry quickly, however, with even the smallest deviations. Say an object isn’t positioned exactly right on the conveyor belt, or a lighting change affects what the robot’s cameras see. Differences like these can have a huge impact on performance. Now imagine trying to make that robot work with a new part or material, or do a completely different task. That’s harder still.
This is where developers traditionally step in. The robot needs to be reprogrammed, and more often than not that means bringing in someone from outside the factory floor, a huge drain on resources and time. Avoiding it requires one of two things: 1) the people working on the floor learn to code, or 2) you find a new, more natural way of interacting with the robot.
The former would be great, but it seems unlikely that companies are willing to invest the money and time required. The latter is exactly what Covariant is attempting with RFM-1. “ChatGPT for robots” isn’t a perfect analogy, but it’s a reasonable shorthand (especially in light of the founders’ connection to OpenAI).
On the client side, the platform is presented as a text field, much like the current iteration of consumer-facing generative AI. Type or speak a command like “pick up the apple,” and the system uses its training data (shape, color, size, etc.) to identify the object in front of it that most closely matches that description.
RFM-1 then generates video results, essentially simulations, and uses them to determine the best course of action based on its prior training. This last step is similar to how our brains weigh the possible outcomes of an action before performing it.
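Covariant hasn’t published RFM-1’s interface, but the loop described above (a language prompt comes in, candidate video rollouts are scored, and the most promising action is executed) follows a familiar model-predictive pattern. The sketch below is a hypothetical Python illustration of that flow; every class, method and parameter name is invented for the purpose of this example, not Covariant’s actual API.

```python
# Hypothetical sketch of the prompt -> simulate -> act loop described
# above. All names here (RobotFoundationModel, Candidate, robot.*) are
# invented for illustration; Covariant has not published RFM-1's API.

from dataclasses import dataclass
from typing import Any, List


@dataclass
class Candidate:
    action: Any            # e.g., a grasp pose the arm could attempt
    predicted_video: Any   # model-generated rollout of the outcome
    score: float           # predicted likelihood of task success


class RobotFoundationModel:
    """Stand-in for a multimodal model that maps a language prompt plus
    a camera image to scored action candidates (the simulation step)."""

    def propose(self, prompt: str, image: Any) -> List[Candidate]:
        raise NotImplementedError("placeholder for the real model")


def run_command(model: RobotFoundationModel, robot: Any, prompt: str) -> None:
    image = robot.capture_image()                  # observe the scene
    candidates = model.propose(prompt, image)      # simulate outcomes
    best = max(candidates, key=lambda c: c.score)  # pick the likeliest success
    robot.execute(best.action)                     # act in the real world
```

The key design idea, as the article describes it, is that the scoring happens over generated video predictions rather than hand-coded rules, which is what lets the same system handle new objects and new instructions without reprogramming.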
During a live demonstration, the system reacted to inputs such as “pick up the red object” and the even more semantically complex, “pick up what you put on your feet before putting on your shoes,” which caused the robot to correctly pick up the apple and a pair of socks, respectively.
Many big ideas get thrown around when discussing the system’s promise. At the very least, Covariant has an impressive pedigree among its founders. Chen studied artificial intelligence at Berkeley under Pieter Abbeel, his co-founder and chief scientist at Covariant. Abbeel also became an early employee of OpenAI in 2016, a month after Chen joined the company. Covariant was founded the following year.
Chen says the company expects the new RFM-1 platform to work with the “majority” of hardware on which the Covariant software is already deployed.