Scaling up learning across many different robot types
In a research article published in Nature Robotics, researchers from the University of California, Berkeley and Stanford University presented a self-improving robotic agent named RoboCat. They described how their artificial intelligence algorithm, which they call “AIM-Learning,” learns to perform tasks from visual inputs while also acquiring vision-language communication skills.
The researchers began by developing a deep neural network architecture that learns to translate between vision and language. They trained the network on images of natural scenes paired with text descriptions generated by a pretrained model, giving it realistic vision-to-language translation capabilities.
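The article gives no implementation details, but the setup it describes (a network trained on natural-scene images paired with text from a pretrained model) resembles a standard image-captioning pipeline. The sketch below is a minimal, hypothetical PyTorch illustration of that idea; the module names, sizes, and training loop are assumptions rather than the authors' AIM-Learning architecture.

```python
# Minimal, hypothetical vision-to-language sketch (not the authors' architecture).
# A small CNN encodes an image; a GRU decoder generates a text description token by token.
import torch
import torch.nn as nn

class ImageEncoder(nn.Module):
    def __init__(self, embed_dim=256):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.proj = nn.Linear(64, embed_dim)

    def forward(self, images):                  # images: (B, 3, H, W)
        feats = self.conv(images).flatten(1)    # (B, 64)
        return self.proj(feats)                 # (B, embed_dim)

class CaptionDecoder(nn.Module):
    def __init__(self, vocab_size=1000, embed_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.gru = nn.GRU(embed_dim, embed_dim, batch_first=True)
        self.out = nn.Linear(embed_dim, vocab_size)

    def forward(self, image_emb, tokens):       # tokens: (B, T) caption token ids
        h0 = image_emb.unsqueeze(0)             # image embedding seeds the decoder state
        x = self.embed(tokens)
        out, _ = self.gru(x, h0)
        return self.out(out)                    # (B, T, vocab_size) next-token logits

# Toy training step on random data, standing in for natural-scene images
# paired with captions produced by a pretrained text model.
encoder, decoder = ImageEncoder(), CaptionDecoder()
opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-4)
images = torch.randn(8, 3, 64, 64)
captions = torch.randint(0, 1000, (8, 12))      # placeholder token ids

logits = decoder(encoder(images), captions[:, :-1])
loss = nn.functional.cross_entropy(logits.reshape(-1, 1000), captions[:, 1:].reshape(-1))
loss.backward()
opt.step()
```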
To test their AIM-Learning algorithm, the researchers used RT-2, a robot designed by the same team, on a range of tasks: navigating an empty room with textured, varied walls; identifying and capturing objects in the environment; performing simple gestures; and understanding visual commands.
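As a rough illustration of how such an evaluation might be organized, the sketch below runs a placeholder policy over a list of named tasks and tallies per-task success rates. The task names, the `run_episode` stub, and the episode budget are hypothetical stand-ins, not details reported in the article.

```python
# Hypothetical evaluation harness: run episodes on several tasks and report success rates.
import random

TASKS = ["navigate_textured_room", "find_and_capture_object",
         "simple_gesture", "follow_visual_command"]

def run_episode(task: str) -> bool:
    """Placeholder for one robot episode; a real harness would step a simulator or robot."""
    return random.random() < 0.5   # stand-in for the robot's actual success/failure signal

def evaluate(tasks, episodes_per_task=20):
    results = {}
    for task in tasks:
        successes = sum(run_episode(task) for _ in range(episodes_per_task))
        results[task] = successes / episodes_per_task
    return results

if __name__ == "__main__":
    for task, rate in evaluate(TASKS).items():
        print(f"{task}: {rate:.0%} success")
```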
The team also compared AIM-Learning against other algorithms on a test task that involved identifying a virtual robot and moving it around an environment. They found that RT-2 achieved state-of-the-art results on this task, outperforming humans by more than 50%.
Overall, the researchers’ AIM-Learning algorithm is a promising step toward self-improving robots capable of performing a wide range of tasks. The team plans to refine the algorithm and expand its capabilities through further testing.