A Minecraft-based benchmark to train and test multi-modal multi-agent systems
Researchers at the University of California, Los Angeles (UCLA) have recently developed TeamCraft, a new open-world environment for training and evaluating algorithms for embodied artificial intelligence (AI) agents, including teams of multiple robots. This benchmark, introduced in a paper published on the arXiv preprint server, is based on the popular video game Minecraft.
"There is a lack of multi-modal, multi-agent benchmarks for open-world environments," said Qian Long, a Ph.D. student at UCLA.
"Minecraft, one of the most popular games, offers a multidimensional, visually immersive realm characterized by procedurally generated landscapes and versatile game mechanics. Its dynamic nature supports a wide range of activities, which made it an ideal platform for creating our visually rich multi-agent benchmark: TeamCraft."
TeamCraft, the platform created by Long and his colleagues, can be used to train algorithms on four different types of tasks: building, clearing, farming and smelting. As part of their study, the researchers also used the platform to evaluate existing vision-language models (VLMs), which allowed them to better understand these models' limitations.
"TeamCraft is a multi-modal, multi-agent benchmark that addresses a significant challenge for AI," said Zhi Li, a Ph.D. student at UCLA. "Specifically, it helps to address the question: How well can embodied agents collaborate in complex environments with human-like perception?"
In the TeamCraft benchmarking platform, every agent is provided with first-person RGB data and status information, which mirrors what a human agent would perceive in the environment. AI agents can be trained and tested on various tasks that require them to collaborate with each other, understand the environment via first-person vision and utilize available tools.
To complete each task, the agents need to perform specific actions, similar to those that a human player would perform in Minecraft. These actions are pre-defined (i.e., can be picked from a limited set of options) and self-descriptive (i.e., clearly named/labeled).
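To make the idea of a pre-defined, self-descriptive action set more concrete, here is a minimal Python sketch. It is not TeamCraft's actual interface: the action names, the `Observation` fields and the `parse_action` helper are all hypothetical, chosen only to illustrate how an agent's output could be checked against a fixed list of clearly named options.

```python
from dataclasses import dataclass, field
from enum import Enum


class Action(Enum):
    # Hypothetical action names. The article only says that actions are
    # pre-defined and self-descriptive, not what they are called.
    MOVE_TO = "move_to"
    PLACE_BLOCK = "place_block"
    MINE_BLOCK = "mine_block"
    FARM = "farm"
    SMELT = "smelt"


@dataclass
class Observation:
    # First-person RGB frame as rows of (r, g, b) tuples; the layout is assumed.
    rgb: list
    # Status information such as inventory counts; the field name is assumed.
    inventory: dict = field(default_factory=dict)


def parse_action(name: str) -> Action:
    """Map a model's text output onto the fixed action set, rejecting anything else."""
    try:
        return Action(name.strip().lower())
    except ValueError:
        raise ValueError(f"unknown action: {name!r}") from None


obs = Observation(rgb=[[(0, 0, 0)]], inventory={"raw_mutton": 1})
print(parse_action(" Smelt "))
```

Because the set is closed, any output that does not match one of the named options can be rejected immediately instead of being passed on to the game.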
"The first advantage of TeamCraft is that it enables multi-modal task specification," explained Li. "Unlike prior systems such as ALFRED and MineDojo, which rely solely on text instructions, TeamCraft supports multi-modal prompts. This expands the scope for richer and more diverse task specifications."
Agents collaborate to cook mutton in a desert village. Credit: UCLA
Another unique characteristic of TeamCraft is that it equips agents with first-person RGB vision while they navigate the visually rich Minecraft environment. This is in contrast with previous approaches such as Watch&Help and RoCoBench, which relied on state-based observations, Neural MMO 2.0, which provides simplified pixel-based visuals, and Overcooked-AI, which only allows agents to view 2D worlds.
"While most prior works like MineDojo and VIMA-Bench focus on single-agent setups, TeamCraft prioritizes multi-agent environments to better simulate real-world challenges requiring collaboration," said Li.
"It supports both centralized and decentralized control strategies, enhancing flexibility in agent coordination and challenging capabilities of model understanding."
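The difference between the two control strategies can be sketched in a few lines of Python. This is an illustration under stated assumptions, not TeamCraft code: the function names, the dictionary-based observations and the toy rule are all hypothetical. In the centralized case a single controller sees every agent's observation and assigns all actions; in the decentralized case each agent decides from its own observation alone.

```python
from typing import Callable, Dict

Observation = Dict[str, int]  # stand-in for one agent's first-person view
Action = str


def centralized_step(
    joint_policy: Callable[[Dict[str, Observation]], Dict[str, Action]],
    observations: Dict[str, Observation],
) -> Dict[str, Action]:
    """One controller sees every agent's observation and assigns all actions."""
    return joint_policy(observations)


def decentralized_step(
    policies: Dict[str, Callable[[Observation], Action]],
    observations: Dict[str, Observation],
) -> Dict[str, Action]:
    """Each agent picks its own action from its own observation only."""
    return {name: policies[name](obs) for name, obs in observations.items()}


def toy_rule(obs: Observation) -> Action:
    # Hypothetical rule: smelt if the agent holds any raw mutton, else explore.
    return "smelt" if obs.get("raw_mutton", 0) > 0 else "explore"


observations = {"agent_a": {"raw_mutton": 2}, "agent_b": {}}

central = centralized_step(
    lambda all_obs: {name: toy_rule(o) for name, o in all_obs.items()},
    observations,
)
local = decentralized_step({"agent_a": toy_rule, "agent_b": toy_rule}, observations)
print(central)
print(local)
```

With this toy rule both setups produce the same actions. The practical difference is what each decision-maker is allowed to see, which matters once agents must coordinate without knowing what their teammates observe.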
The tasks included in TeamCraft are designed to assess the agents' planning, coordination and execution while they navigate a dynamic setting.
Unlike some other benchmarks, such as FurnMove, the system supports evaluating not only agents that are equally capable across tasks, but also agents with different responsibilities.
In other words, it allows users to assign different roles to different agents in a team by giving them distinct capabilities. It can also be used to train and test the agents' real-time decision-making skills and their adaptability to changing environments.
TeamCraft features a total of 55,000 task variants. These variants are defined by various factors, including biomes (i.e., distinct regions within the open-world environment), base blocks, task goals, target materials, agent counts and unique inventories.
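As a rough illustration of how combining a handful of factors quickly yields many variants, the Python sketch below enumerates every combination of a few factor values. The factor names follow the article, but the value lists are illustrative assumptions, and the sketch does not attempt to reproduce how TeamCraft actually arrives at its 55,000 variants.

```python
from itertools import product

# Illustrative factor values only; the real lists are not given in the article.
factors = {
    "biome": ["desert", "forest", "plains"],
    "task": ["building", "clearing", "farming", "smelting"],
    "agent_count": [2, 3],
}


def enumerate_variants(factors):
    """Return one dict per combination of factor values (a Cartesian product)."""
    keys = list(factors)
    return [dict(zip(keys, combo)) for combo in product(*factors.values())]


variants = enumerate_variants(factors)
print(len(variants))  # 3 * 4 * 2 = 24
print(variants[0])
```

Even with three small lists the count multiplies to 24, so adding further factors such as base blocks, target materials and inventories grows the space of variants rapidly.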
"Operating in the Minecraft environment, TeamCraft enables agents to perceive, think, and act like human players without perfect information," said Li.
"Unlike prior systems that provide agents with complete data (e.g., unseen teammate locations), TeamCraft requires agents to actively explore their surroundings. This shift fosters more realistic behaviors and reduces dependence on artificially perfect data, enabling agents to better handle real-world scenarios and reduce the gap of deploying models to real world application."