"Exploring AI Creativity: A Lab's Innovative Experiments"

04 August 2024

Human know-how derives in part from our nose for novelty — we’re curious creatures, whether looking around corners or testing scientific hypotheses. For artificial intelligence to have a broad and nuanced understanding of the world — so it can navigate everyday obstacles, interact with strangers or invent new medicines — it also needs to explore new ideas and experiences on its own. But with infinite possibilities for what to do next, how can AI decide which directions are the most novel and useful?

One idea is to tap human intuition about what’s interesting automatically, using large language models (the kind of software that powers chatbots) trained on massive quantities of human text. Two new papers take this approach, suggesting a path toward smarter self-driving cars, for example, or automated scientific discovery.

“Both works are significant advancements towards creating open-ended learning systems,” says Tim Rocktäschel, a computer scientist at Google DeepMind and University College London who was not involved in the work. The LLMs offer a way to prioritize which possibilities to pursue. “What used to be a prohibitively large search space suddenly becomes manageable,” Rocktäschel says. Some experts worry, however, that open-ended AI (AI with relatively unconstrained exploratory powers) could go off the rails.

Both new papers, posted online in May at arXiv.org and not yet peer-reviewed, come from the lab of computer scientist Jeff Clune at the University of British Columbia in Vancouver and build directly on previous projects of his. In 2018, he and collaborators created a system called Go-Explore (reported in Nature in 2021) that learns to, say, play video games requiring exploration. Go-Explore incorporates a game-playing agent that improves through a trial-and-error process called reinforcement learning (SN: 3/25/24). The system periodically saves the agent’s progress in an archive, then later picks interesting, saved states and progresses from there. But selecting interesting states relies on hand-coded rules, such as choosing locations that haven’t been visited much. It’s an improvement over random selection but is also rigid.
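The save-and-return loop described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the authors' implementation: the one-dimensional corridor, the function names and the inverse-count weighting are all invented here to mirror the kind of hand-coded "rarely visited" rule the original system relies on.

```python
import random

def step(position, action):
    """Toy environment (an assumption): a corridor of cells 0..20; action is -1 or +1."""
    return max(0, min(20, position + action))

def go_explore(iterations=200, explore_steps=5, seed=0):
    rng = random.Random(seed)
    # The archive maps each saved state to how often it was chosen as a starting point.
    archive = {0: 0}
    for _ in range(iterations):
        states = list(archive)
        # Hand-coded rule: favor states that have been chosen less often.
        weights = [1.0 / (1 + archive[s]) for s in states]
        state = rng.choices(states, weights=weights, k=1)[0]
        archive[state] += 1
        # Return to the saved state, then explore from it with random actions.
        for _ in range(explore_steps):
            state = step(state, rng.choice((-1, 1)))
            archive.setdefault(state, 0)  # save any newly reached state
    return archive

archive = go_explore()
```

The rigidity the article mentions is visible in the `weights` line: whatever counts as "interesting" has to be written down as a fixed formula in advance.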

Clune’s lab has now created Intelligent Go-Explore, which uses a large language model, in this case GPT-4, instead of the hand-coded rules to select “promising” states from the archive. The language model also picks actions from those states that will help the system explore “intelligently,” and decides if resulting states are “interestingly new” enough to be archived.
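One way to picture the change is as the same archive loop with three pluggable judgments. The sketch below is hypothetical: `choose_state`, `choose_action` and `is_interesting` are placeholder names for the three roles the article attributes to the language model, and here they are filled with trivial stand-in functions rather than calls to GPT-4. The number-line environment is likewise an assumption.

```python
from typing import Callable, List

def intelligent_go_explore(
    start: int,
    actions: List[int],
    step: Callable[[int, int], int],
    choose_state: Callable[[List[int]], int],          # "pick a promising state"
    choose_action: Callable[[int, List[int]], int],    # "pick an action to explore"
    is_interesting: Callable[[int, List[int]], bool],  # "is it interestingly new?"
    is_goal: Callable[[int], bool],
    iterations: int = 50,
    explore_steps: int = 3,
):
    archive = [start]
    for _ in range(iterations):
        state = choose_state(archive)  # return to a saved state
        for _ in range(explore_steps):
            state = step(state, choose_action(state, actions))
            if is_goal(state):
                return state, archive
            if is_interesting(state, archive):
                archive.append(state)
    return None, archive

# Hypothetical stand-ins for the language model's three judgments.
found, archive = intelligent_go_explore(
    start=0,
    actions=[-1, 1],
    step=lambda s, a: s + a,
    choose_state=max,                              # "most promising" = furthest along
    choose_action=lambda s, acts: max(acts),       # always move right
    is_interesting=lambda s, arch: s not in arch,  # archive only unseen states
    is_goal=lambda s: s == 10,
)
```

In a real system each stand-in would presumably be a prompt that shows the model the archive or the current observation and asks for a choice. Keeping them as plain callables makes it easy to swap a hand-coded rule for a model-based one.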

LLMs can act as a kind of “intelligence glue” that can play various roles in an AI system because of their general capabilities, says Julian Togelius, a computer scientist at New York University who was not involved in the work. “You can just pour it into the hole of, like, you need a novelty detector, and it works. It’s kind of crazy.”

The researchers tested Intelligent Go-Explore, or IGE, on three types of tasks that require multistep solutions and involve processing and outputting text. In one, the system must arrange numbers and arithmetic operations to produce the number 24. In another, it completes tasks in a 2-D grid world, such as moving objects, based on text descriptions and instructions. In a third, it plays solo games that involve cooking, treasure hunting or collecting coins in a maze, also based on text. After each action, the system receives a new observation — “You arrive in a pantry…. You see a shelf. The shelf is wooden. On the shelf you can see flour…” is an example from the cooking game — and picks a new action.
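To make the first task concrete, here is a small brute-force checker for the number puzzle. It is only meant to show what counts as a solution. It is not how IGE plays, since IGE picks one step at a time rather than enumerating every combination. The function name is hypothetical, and the four-number examples assume the common form of the puzzle.

```python
from fractions import Fraction
from itertools import combinations

def can_make_24(numbers, target=24):
    """Return True if the numbers can be combined with +, -, *, / to reach target."""
    values = [Fraction(n) for n in numbers]  # exact arithmetic, no rounding errors

    def search(vals):
        if len(vals) == 1:
            return vals[0] == target
        # Combine any two numbers, then recurse on the shorter list.
        for i, j in combinations(range(len(vals)), 2):
            a, b = vals[i], vals[j]
            rest = [v for k, v in enumerate(vals) if k not in (i, j)]
            candidates = {a + b, a - b, b - a, a * b}
            if b != 0:
                candidates.add(a / b)
            if a != 0:
                candidates.add(b / a)
            if any(search(rest + [c]) for c in candidates):
                return True
        return False

    return search(values)

print(can_make_24([1, 2, 3, 4]))  # True: 1 * 2 * 3 * 4
print(can_make_24([3, 3, 8, 8]))  # True: 8 / (3 - 8 / 3)
print(can_make_24([1, 1, 1, 1]))  # False: the largest reachable value is 4
```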

The researchers compared IGE against four other methods. One method sampled actions randomly, and the other three fed the current game state and history into an LLM and asked for an action. None of the comparison methods used an archive of interesting game states. IGE outperformed all of them; when collecting coins, it won 22 out of 25 games, while none of the others won any. Presumably the system did so well by iteratively and selectively building on interesting states and actions, thus echoing the process of creativity in humans.

Intelligent Go-Explore outperformed randomly selected actions and three other approaches in solo games that involve processing and outputting text.

IGE could help discover new drugs or materials, the researchers say, especially if it incorporated images or other data. Study coauthor Cong Lu of the University of British Columbia says that finding interesting directions for exploration is in many ways “the central problem” of reinforcement learning. Clune says these systems “let AI see further by standing on the shoulders of giant human datasets.”

The second new system doesn’t just explore ways to solve assigned tasks. Like children inventing a game, it generates new tasks to increase AI agents’ abilities. This system builds on another created by Clune’s lab last year called OMNI (for Open-endedness via Models of human Notions of Interestingness). Within a given virtual environment, such as a 2-D version of Minecraft, an LLM suggested new tasks for an AI agent to try based on previous tasks it had aced or flubbed, thus building a curriculum automatically. But OMNI was confined to manually created virtual environments.

So the researchers created OMNI-EPIC (OMNI with Environments Programmed In Code). For their experiments, they used a physics simulator — a relatively blank-slate virtual environment — and seeded the archive with a few example tasks like kicking a ball through posts, crossing a bridge and climbing a flight of stairs. Each task is represented by a natural-language description along with computer code for the task.

OMNI-EPIC picks one task and uses LLMs to create a description and code for a new variation, then uses another LLM to decide whether the new task is “interesting” (novel, creative, fun, useful and not too easy or too hard). If it’s interesting, the AI agent trains on the task through reinforcement learning, and the task is saved into the archive, along with the newly trained agent and whether it was successful. The process repeats, creating a branching tree of new and more complex tasks along with AI agents that can complete them. Rocktäschel says that OMNI-EPIC “addresses an Achilles’ heel of open-endedness research, that is, how to automatically find tasks that are both learnable and novel.”
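The generate, judge, train and archive cycle can be sketched as follows. Everything here is a simplified assumption for illustration: the `Task` fields, the random choice of parent task and the three callables (`propose`, `is_interesting`, `train`) are hypothetical stand-ins for the LLM task generator, the LLM judge and the reinforcement learning step. The trained agent itself is left out, and only the success flag is stored.

```python
import random
from dataclasses import dataclass
from typing import Optional

@dataclass
class Task:
    description: str               # natural-language description of the task
    code: str                      # source code that defines the task
    parent: Optional[int] = None   # archive index of the task it was derived from
    solved: Optional[bool] = None  # whether the trained agent succeeded

def open_ended_loop(seeds, propose, is_interesting, train, iterations=10, seed=0):
    rng = random.Random(seed)
    archive = list(seeds)
    for _ in range(iterations):
        parent_index = rng.randrange(len(archive))  # assumption: parent picked at random
        candidate = propose(archive[parent_index])  # new description plus code
        candidate.parent = parent_index
        if not is_interesting(candidate, archive):
            continue                                # discard uninteresting variations
        candidate.solved = train(candidate)         # train an agent, record the outcome
        archive.append(candidate)
    return archive

# Trivial stand-ins so the sketch runs without a language model or a simulator.
seeds = [Task("kick a ball through posts", "# task code omitted")]
archive = open_ended_loop(
    seeds,
    propose=lambda t: Task(t.description + " (harder)", t.code),
    is_interesting=lambda t, arch: all(t.description != a.description for a in arch),
    train=lambda t: True,
    iterations=5,
)
```

The `parent` field is what produces the branching tree the article describes: each archived task points back to the one it grew from.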

