A robotics company has released a video purporting to show a humanoid robot making a cup of coffee after watching humans do it — while correcting mistakes it made in real time. In the promotional footage, Figure.ai's flagship model, dubbed "Figure 01", picks up a coffee capsule, inserts it into a coffee machine, closes the lid and turns the machine on.
It's unclear exactly which systems are operating under the hood, but experts told Live Science what is likely going on, assuming the footage shows exactly what the company claims. Separately, Figure announced in a Jan. 18 press release that it has signed a commercial agreement with BMW to provide its humanoid robots for automotive production.
Currently, artificial intelligence (AI)-powered robotics is domain-specific, meaning these machines do one thing well rather than doing everything adequately. They start with a programmed baseline of rules and a dataset used to self-teach. However, Figure.ai claims that Figure 01 learned the task just by watching 10 hours of footage.
For a robot to make coffee or mow the lawn, engineers would need to embed expertise across multiple domains, and that expertise is too unwieldy to program by hand. Rules for every possible contingency would have to be foreseen and coded into its software, such as specific instructions on what to do when it reaches the end of the lawn. Acquiring expertise across many domains just by watching would therefore represent a major leap.
The robot would need to take video data and develop an internal predictive model of the physical actions and the order of these actions, Christoph Cemper, CEO of AIPRM, a site that designs prompts to be entered into AI systems like ChatGPT, told Live Science. It would need to translate what it sees into an understanding of how to move its limbs and grippers to perform the same movements, he added.
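Figure hasn't published its architecture, but the kind of pipeline Cemper describes can be sketched in broad strokes. The toy behavior-cloning model below, written in PyTorch, is purely illustrative: every class name, dimension and tensor is a stand-in rather than anything the company has confirmed. It maps a short sequence of video frames to target joint positions, so minimizing the prediction error amounts to imitating the demonstration.

```python
# Hypothetical sketch of "watch, then imitate": a model that turns video
# frames into joint commands. Nothing here reflects Figure's actual system.
import torch
import torch.nn as nn

class FrameToActionPolicy(nn.Module):
    def __init__(self, num_joints: int = 7):
        super().__init__()
        # A small convolutional encoder turns each frame into a feature vector.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # A recurrent layer tracks the order of actions over time.
        self.temporal = nn.GRU(input_size=32, hidden_size=64, batch_first=True)
        # The head predicts target joint positions for each timestep.
        self.head = nn.Linear(64, num_joints)

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (batch, time, 3, H, W) -> joint targets: (batch, time, num_joints)
        b, t = frames.shape[:2]
        features = self.encoder(frames.flatten(0, 1)).view(b, t, -1)
        hidden, _ = self.temporal(features)
        return self.head(hidden)

policy = FrameToActionPolicy()
demo_frames = torch.randn(1, 8, 3, 64, 64)   # stand-in for 8 frames of demo video
demo_actions = torch.randn(1, 8, 7)          # stand-in recorded joint targets
loss = nn.functional.mse_loss(policy(demo_frames), demo_actions)
loss.backward()                              # learning here means copying the demo
```

A real system would also have to ground those predicted joint targets in the robot's own body and sensors, which is the translation from "what it sees" to "how to move" that Cemper mentions.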
Related: Elon Musk just teased Tesla's new Optimus Gen-2 robot with a video featuring a funky treat at the end
Then there is the architecture of neural networks, a type of machine learning model inspired by how the brain works, said Clare Walsh, a data analytics and AI expert at the Institute of Analytics in the U.K. Large numbers of interconnected nodes pass signals to one another. If those signals lead to an action (like extending an arm or closing a gripper) that achieves the desired result, feedback strengthens the connections that produced it, further embedding the behavior in 'known' processes.
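As a rough illustration of the feedback loop Walsh outlines, the snippet below builds a single artificial node whose incoming connection weights are strengthened or weakened until its output matches a desired action. It is a textbook-style toy with invented numbers, not a model of Figure 01's network.

```python
# Toy version of "feedback strengthens the connections": one node learns to
# emit the signal for a desired action. All values are made up.
import numpy as np

rng = np.random.default_rng(0)
weights = rng.normal(size=3)            # strengths of three incoming connections

def node(inputs: np.ndarray) -> float:
    """Signal emitted by the node; 1.0 stands for 'close the gripper'."""
    return 1.0 / (1.0 + np.exp(-inputs @ weights))

inputs = np.array([0.9, 0.2, -0.4])     # stand-in sensor readings
desired = 1.0                            # the demonstrated action

for _ in range(100):
    error = desired - node(inputs)       # feedback: how far off was the action?
    weights += 0.5 * error * inputs      # reinforce connections that push toward it

print(node(inputs))                      # output ends up close to the desired 1.0
```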
"Before about 2016, object recognition like distinguishing between cats and dogs in photos would get success rates of around 50%," Walsh told Live Science. "Once neural networks were refined and working, results jumped to 80 to 90% almost overnight — training by observation with a reliable learning method scaled incredibly well."
To Walsh, there's a similarity between Figure 01 and autonomous vehicles, which were made possible by probability-based rather than rules-based training methods. She noted that this kind of self-teaching can build up data fast enough to cope with complex environments.
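The distinction Walsh draws can be made concrete with a toy decision such as "should the robot stop pouring?". In the hypothetical sketch below, the rules-based version depends on a single hand-written threshold, while the probability-based version combines weights that would have been learned from examples; the features, weights and threshold are all invented for illustration.

```python
# Invented example contrasting a hand-coded rule with a learned, probabilistic
# decision for stopping a pour. Numbers are placeholders, not real data.
import numpy as np

def rules_based(cup_fill_ml: float) -> bool:
    # Rules-based: an engineer anticipates the contingency and hard-codes it.
    return cup_fill_ml >= 220

def probability_based(features: np.ndarray, weights: np.ndarray) -> bool:
    # Probability-based: learned weights give P(should stop), which can
    # generalize to cups, angles and lighting nobody wrote a rule for.
    p_stop = 1.0 / (1.0 + np.exp(-features @ weights))
    return p_stop > 0.5

learned_weights = np.array([0.03, 1.2, -0.8])   # pretend these came from training
observation = np.array([230.0, 1.0, 0.2])       # fill level (ml), foam seen, tilt
print(rules_based(230.0), probability_based(observation, learned_weights))
```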
"The robot's visual acuity goes beyond seeing what's happening in the coffee-making process," Maybury said. "It doesn't just observe it, it analyzes the process to ensure everything is as accurate as possible."
That means the robot knows not to overfill the cup and how to insert the pod correctly. If it sees any deviation from the learned behavior or expected results, it interprets this as a mistake and fine-tunes its actions until it reaches the desired result. It does this through reinforcement learning, in which the robot works out how to reach a desired goal through trial and error in an uncertain environment.
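A stripped-down version of that trial-and-error loop might look like the sketch below: a hypothetical robot samples pour durations, scores each attempt by how far the result lands from a 200-milliliter target, and gradually settles on the action that reaches the goal. The "physics" and every number are invented; this shows the flavor of reinforcement learning rather than Figure's implementation.

```python
# Toy reinforcement learning: deviation from the target counts as a mistake
# (negative reward), and trial and error finds the pour duration that works.
import random

TARGET_ML = 200.0
ACTIONS = [2.0, 4.0, 6.0, 8.0, 10.0]        # candidate pour durations in seconds
values = {a: 0.0 for a in ACTIONS}          # learned value of each action
counts = {a: 0 for a in ACTIONS}

def pour(duration_s: float) -> float:
    """Stand-in environment: millilitres poured, with a little noise."""
    return 25.0 * duration_s + random.uniform(-10.0, 10.0)

for episode in range(500):
    # Mostly repeat the best-known action, occasionally explore another.
    if random.random() < 0.1:
        action = random.choice(ACTIONS)
    else:
        action = max(values, key=values.get)
    reward = -abs(TARGET_ML - pour(action))  # deviation is treated as a mistake
    counts[action] += 1
    values[action] += (reward - values[action]) / counts[action]

print(max(values, key=values.get))           # settles on the 8-second pour
```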
RELATED STORIES
—Human-like robot tricks people into thinking it has a mind of its own
—Robot hand exceptionally 'human-like' thanks to new 3D printing technique
—These astonishing biobots can help neurons regrow — but researchers have no idea how
Walsh added that the right training data means the robot's human-like movements could "scale and diversify" quickly. "The number of movements is impressive and the precision and self-correcting capabilities mean it could herald future developments in the field," she said.
But Mona Kirstein, an AI expert with a doctorate in natural language processing, cautioned that Figure 01 looks like a great first step rather than a market-ready product.
"To achieve human-level flexibility with new contexts beyond this narrowly defined task, bottlenecks like variations in the environment must still be addressed," Kirstein told Live Science. "So while it combines excellent engineering with the state-of-the-art deep learning, it likely overstates progress to view this as enabling generally intelligent humanoid robotics."