Can AI tell you where you left your keys? | MIT News

An auto factory worker can remember the storage bin where he left a partially assembled part the night before, and quickly return to that location to retrieve it. But the robots that may work alongside him have difficulty building and accessing this kind of spatiotemporal memory.

Now, MIT researchers have developed a framework for long-term memory that allows robots to quickly build and remember detailed mental models of complex, large-scale environments.

In the future, this advance could allow a factory worker to send a robotic assistant to pick up an item, simply by asking it to “go get the part we started assembling last night.”

This new approach combines improved map representations with rich descriptions of the environment that the robot collects as it moves around over time. The robot can quickly access this memory to answer complex questions about its surroundings in plain language.

This memory framework, which answers questions more accurately than state-of-the-art methods, works fast enough for a mobile robot to use in real time.

In addition to potential applications in robotics, this method could be used in augmented reality systems that help maintenance workers locate hard-to-find items or help passengers find their way.

“If we want robots to work together with people and interact better with people, they must speak the same language. The robot must be able to think about time and space in the same way humans do. This is exactly what our approach does. It converts a traditional map into a language-based map that is easier for the robot to reason about and query using language,” said Luca Carlone, an associate professor in the Department of Aeronautics and Astronautics (AeroAstro), principal investigator at the Laboratory for Information and Decision Systems (LIDS), and director of the MIT SPARK Laboratory.

Carlone is joined on the paper by lead author Nicolas Gorlo, a graduate student at MIT, and Lukas Schmid, a former research scientist at MIT who is now a professor at the University of Technology Nuremberg in Germany. The research was recently presented at the Computer Vision and Pattern Recognition (CVPR) conference.

Spatiotemporal memory

Memory allows an artificial intelligence system, such as a chatbot, to answer complex and logical questions about past interactions with its user.

“We want to design a new type of memory, spatiotemporal memory, which enables an AI-powered robot to remember real interactions and sensory observations. It is like ChatGPT, but grounded in the real world and able to answer any question about the environment, like ‘Where did I leave my bag?’” Carlone says.

To build such a memory framework, MIT researchers combined two lines of work: computer vision and robotic mapping.

Various computer vision models can understand and richly describe objects in a scene, but they usually process only one image at a time. Robotic mapping systems, on the other hand, create 3D maps of an area, such as an entire apartment or a university campus, but they often lack detailed object descriptions or are computationally expensive.

The approach developed by the MIT researchers, called Describe Anything, Anywhere, at Any Moment (DAAAM), takes the best of both approaches.

With DAAAM, the robot attaches rich descriptions to the objects it sees as it traverses its environment. For example, a robot might note that a certain building on the MIT campus is called the Stata Center and is designed in a certain architectural style, or that the bike rack holds five bikes and the red one has a flat tire.

It stores this detailed information in a 3D map-based representation that is organized geographically, so objects are grouped into different regions. In this way, the robot can remember that the red bicycle with the flat tire is in the bicycle rack outside the Stata Center.
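
As a rough illustration of what a region-grouped memory could look like, here is a minimal Python sketch. It is not the researchers' implementation: the class names `ObjectNote` and `RegionMemory`, the fields, and the keyword lookup are all assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class ObjectNote:
    """One described object (hypothetical structure, not from the paper)."""
    description: str
    position: tuple   # (x, y, z) in map coordinates
    timestamp: float  # when the robot observed the object


@dataclass
class RegionMemory:
    """Groups object notes by the region they were seen in."""
    regions: dict = field(default_factory=dict)  # region name -> list of notes

    def add(self, region, note):
        self.regions.setdefault(region, []).append(note)

    def find(self, keyword):
        """Return (region, note) pairs whose description mentions keyword."""
        kw = keyword.lower()
        return [
            (region, note)
            for region, notes in self.regions.items()
            for note in notes
            if kw in note.description.lower()
        ]


memory = RegionMemory()
rack = "bike rack outside the Stata Center"
memory.add(rack, ObjectNote("red bicycle with a flat tire", (12.0, 4.5, 0.0), 100.0))
memory.add(rack, ObjectNote("blue bicycle", (12.8, 4.5, 0.0), 101.0))

hits = memory.find("flat tire")
for region, note in hits:
    print(f"{note.description} -> {region}")
```

Because every note is filed under a region, a question about an object can be answered with both what it is and where it is.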

But existing techniques that capture such rich descriptions often take a few seconds to annotate just a few objects. This is too slow for real-time operation, since a robot can see hundreds of objects within a few minutes of exploring.

“The sooner a robot can build this spatial memory, the better it will be at performing actions in the environment,” Carlone said.

Speeding up the process

To speed things up, DAAAM groups nearby objects as it goes and uses an optimization method to select the keyframes used to annotate them. These are images with a very clear view of many objects, which lets the system accurately describe several objects in parallel and speeds up the computation tenfold.
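
The article does not say which optimization method DAAAM uses. As a stand-in, the sketch below treats keyframe selection as a set-cover problem and solves it with a simple greedy heuristic: repeatedly pick the frame that clearly shows the most objects that have not yet been annotated. The function name `select_keyframes` and the input format are assumptions for illustration only.

```python
def select_keyframes(visible):
    """Greedy set cover (a stand-in, not the method from the paper).

    visible: dict mapping frame id -> set of object ids clearly seen in it.
    Returns a list of frame ids that together cover every object.
    """
    uncovered = set().union(*visible.values())
    chosen = []
    while uncovered:
        # Pick the frame that adds the most not-yet-covered objects.
        best = max(visible, key=lambda f: len(visible[f] & uncovered))
        gain = visible[best] & uncovered
        if not gain:
            break  # defensive: no frame adds anything new
        chosen.append(best)
        uncovered -= gain
    return chosen


frames = {
    "f0": {1, 2},
    "f1": {2, 3, 4, 5},
    "f2": {1, 6},
    "f3": {5},
}
keyframes = select_keyframes(frames)
print(keyframes)  # ['f1', 'f2']
```

In this toy example, two of the four frames are enough to cover all six objects, so only those two would need to be sent to the slower description model.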

As the robot explores the space, it attaches each set of annotations to multiple objects at a specific location on the 3D map.

“We annotate everything only once, so our framework can work on very large areas in real time. And by grouping things into regions, it can answer many questions about things and places in an area,” explained Gorlo.

Once the system has built this spatial memory, it must efficiently retrieve information from a large database of objects and descriptions.

To do this, the researchers use a large language model (LLM) that calls various tools, each of which can quickly retrieve specific information in a way that reduces hallucinations. This allows DAAAM to answer a user’s query accurately within seconds.

For example, if a person asks the robot about a certain statue it saw near an MIT campus building, DAAAM can use a semantic search tool to find information based on the word “statue,” or a different tool to find information based on the location of the building.
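
The tool-calling idea can be sketched in a few lines of Python. Everything here is hypothetical: the tool names, the toy memory entries, and the call format are invented for illustration, plain substring matching stands in for real semantic search, and the LLM's decision about which tool to call is replaced by a hand-written call.

```python
# Toy memory entries (invented for this example).
MEMORY = [
    {"description": "bronze statue of a seated figure", "region": "building A courtyard"},
    {"description": "red bicycle with a flat tire", "region": "bike rack"},
    {"description": "wooden bench", "region": "building A courtyard"},
]


def search_by_text(query):
    """Stand-in for semantic search: case-insensitive substring match."""
    q = query.lower()
    return [m for m in MEMORY if q in m["description"].lower()]


def search_by_region(region):
    """Return everything stored under the given region."""
    return [m for m in MEMORY if m["region"] == region]


# Registry of tools the language model is allowed to call.
TOOLS = {
    "search_by_text": search_by_text,
    "search_by_region": search_by_region,
}


def run_tool_call(call):
    """Execute a tool call of the form {'tool': name, 'args': {...}}."""
    name = call["tool"]
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name}")
    return TOOLS[name](**call["args"])


# In a real system the LLM would produce these calls from the user's question.
by_text = run_tool_call({"tool": "search_by_text", "args": {"query": "statue"}})
by_region = run_tool_call(
    {"tool": "search_by_region", "args": {"region": "building A courtyard"}}
)
print(by_text)
print(by_region)
```

Restricting the model to a fixed set of tools means its answers are built from entries that actually exist in memory, which is one plausible reason this style of retrieval reduces hallucinations.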

When tested against other methods, DAAAM was between 21 percent and 53 percent more accurate, depending on the type of question.

In the future, the researchers want to extend DAAAM so that the system captures important events that have occurred in an area. They are also working to add confidence levels to the system’s responses.

“Ultimately, we want to have robots that can help with any kind of task. With this framework, we are trying to build the foundations to enable a generalist agent that can do anything you ask,” said Gorlo.

This research was funded, in part, by the US Army Research Laboratory and the Office of Naval Research. Carlone is currently on sabbatical as an Amazon Scholar; this article describes work done at MIT and is not affiliated with Amazon.
