
Translating Natural Language Models for Robots using LLMs

Implementing a way for robots to smoothly integrate human instructions on a large scale.
By Oliver Wang | November 9, 2023
Image: a retro robot plugged into a mainframe computer. Generated by Nicholas Guttenberg using Stable Diffusion.

I'm Oliver, an undergrad EE major with interests in AI ethics and robotics. Currently, I'm working on using large language models (LLMs) to translate natural language instructions into functions that a robot can use.

At the moment, the behavior of robots is controlled through code that requires years of training and expertise to write. This limits the flexibility and accessibility of the technology. In applications ranging from high-risk maintenance to household cooking chores, the benefits of an adaptable robot that a supervising human can easily communicate with are clear. For example, a study from Microsoft demonstrated a simulated environment in which ChatGPT controlled a drone that flew up a wind turbine to inspect it [1].

In these kinds of dangerous environments, this technology may not only be more efficient and effective, but may even save lives. If we could implement a way for robots to smoothly integrate human instructions on a large scale, the resulting accessibility would open up more opportunities for cooperation between humans and robots.

So far, I've been working on applying LLMs to robots that perform cooking-related tasks. First, I documented the ability of various LLMs to generate natural-language recipes given a simple prompt such as "Write a recipe for making miso soup". After getting a good idea of which LLMs were well suited to this kind of task, we realized that large databases of recipes written and verified by humans already exist, so adding an LLM to the recipe-writing phase introduces an unnecessary source of uncertainty.

However, these LLMs are better suited to parsing instructions and translating them from natural language into code. Due to limited equipment, I've been prompt-engineering the open-source Falcon-7B-Instruct model, a scaled-down version of the current top-performing model on the Hugging Face Open LLM Leaderboard [2, 3]. Currently, this small version struggles to translate an entire recipe at once, so I'm developing a prompt that works for basic cases, such as translating a single line of a recipe, and expanding from there.
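To make the single-line translation step concrete, here is a minimal sketch of the scaffolding around the model: a prompt template that asks for one function call per recipe line, and a parser that checks the model's reply against a whitelist of robot primitives. The primitive names (chop, stir, pour, heat, wait) and the prompt wording are hypothetical illustrations, not the actual prompt or robot API used in this work; the model call itself is left out.

```python
import re

# Hypothetical whitelist of robot primitives the prompt targets.
PRIMITIVES = {"chop", "stir", "pour", "heat", "wait"}

PROMPT_TEMPLATE = (
    "Translate the cooking instruction into exactly one call to these "
    "functions: chop(item), stir(item), pour(item, target), "
    "heat(item, minutes), wait(minutes).\n"
    "Instruction: {line}\n"
    "Call:"
)

def build_prompt(line: str) -> str:
    """Build the prompt for translating a single recipe line."""
    return PROMPT_TEMPLATE.format(line=line)

CALL_RE = re.compile(r"^\s*([a-z_]+)\((.*)\)\s*$")

def parse_call(reply: str):
    """Validate the model's reply against the primitive whitelist.

    Returns (name, [args]) for a recognized call, or None if the
    reply is not a single call to a known primitive.
    """
    lines = reply.strip().splitlines()
    if not lines:
        return None
    m = CALL_RE.match(lines[0])
    if not m:
        return None
    name, argstr = m.group(1), m.group(2)
    if name not in PRIMITIVES:
        return None
    args = [a.strip().strip("\"'") for a in argstr.split(",") if a.strip()]
    return name, args
```

Rejecting anything outside the whitelist is one way to contain the uncertainty an LLM introduces: a hallucinated function name simply fails to parse instead of reaching the robot.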

My hope is to harness an LLM to meaningfully bridge the gap between human and robot. A longer-term goal is to explore the "long-horizon" task-planning behavior of these LLMs: the ability to break a complex task down into simpler ones, or to abstract many smaller tasks under a larger function.
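The abstraction direction can be sketched with a small plan library: a high-level task expands recursively into the same primitive calls a single-line translator would produce. The task name, the plan contents, and the primitive call strings below are hypothetical placeholders, purely to illustrate the idea of nesting smaller tasks under a larger function.

```python
# Hypothetical two-level plan library: a long-horizon task maps to a
# sequence of primitive calls (or to other named composite tasks).
PLAN_LIBRARY = {
    "make_miso_soup": [
        "pour(water, pot)",
        "heat(pot, 5)",
        "chop(tofu)",
        "stir(pot)",
    ],
}

def expand(task: str) -> list:
    """Flatten a task into primitive calls.

    A name found in the plan library expands recursively; any other
    string is assumed to already be a primitive call.
    """
    if task not in PLAN_LIBRARY:
        return [task]
    steps = []
    for step in PLAN_LIBRARY[task]:
        steps.extend(expand(step))
    return steps
```

In this framing, an LLM's job shifts from emitting every primitive call directly to proposing entries for the plan library, which keeps the executed steps inspectable.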

References:

[1] Vemprala, S., Bonatti, R., Bucker, A., and Kapoor, A. ChatGPT for Robotics: Design Principles and Model Abilities. https://www.microsoft.com/en-us/research/publication/chatgpt-for-robotics-design-principles-and-model-abilities/

[2] Falcon-40B: an open large language model with state-of-the-art performance. https://huggingface.co/tiiuae/falcon-7b-instruct

[3] Hugging Face Open LLM Leaderboard. https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard