DIGITAL LIFE

Researchers train AI to better follow artists by sharing creative 'ground rules'
The conversation around AI and art generally swings between two extremes: a flood of AI slop or the total automation of creative work. The more desirable approach may be an AI that behaves as a useful collaborator. So far, though, visual artists working with text-to-image tools face frustratingly basic hurdles in directing the AI. Ask an AI to create an image of a house? Not too difficult. Direct it to make the house red, with four front-facing windows, a chimney, and ivy covering the left side? Good luck.
Stanford computer science, cognitive psychology, and education scholars believe they can help AI better augment human creativity by teaching models and people to communicate ideas with each other. The scholars are developing a shared conceptual grounding for humans to collaborate with generative AI on production-quality visual content ranging from illustrations to diagrams to animations.
"While the models seem amazing, they are terrible collaborators," says Maneesh Agrawala, professor of computer science at Stanford and a co-principal investigator for the project. "Creators have no way of knowing what the AI will produce when given a certain text prompt. If you ask for a suburban single-family home, it generates a modern duplex."
Authoring original content requires having opinions and constantly making choices, Agrawala explains. Humans and AI need a shared set of concepts so the nuance doesn't get lost in translation.
Deciphering the human creative process

The Stanford team is approaching this problem from two directions. First, the scholars are running experiments to better understand how people collaborate to create visual content. They have conducted several studies of people performing creative tasks together, analyzing chat logs and sketches to see how the participants communicate as they work.
"If we want to build AI systems that understand how humans think during creative projects, we should start by learning as much as we can from the way that people establish common conceptual ground with each other," says Judith Fan, assistant professor of psychology at Stanford. "Not everyone talks or draws the same way, but they still expect to be understood."
Building AI tools that understand creators

Second, the team is building open-source AI tools that apply the lessons learned about human creative communication. Today's models struggle to capture the idea of a pose or how objects should be arranged in a scene. One tool, ControlNet, teaches text-to-image diffusion models about spatial composition, using two separate features, blocking and detailing, to mirror how artists begin with a rough sketch and then fill in the details of a drawing. With this tool, creators can guide models to a layout that matches their vision.
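The blocking-then-detailing workflow can be illustrated with a toy sketch. This is not ControlNet's actual implementation (which conditions a diffusion model on spatial inputs); it is a hypothetical two-stage pipeline showing the general idea that coarse placement is fixed first, and later detail is added without disturbing it. All names here are invented for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Block:
    """A rough 'blocking' region: where an element goes, before any detail."""
    label: str
    x: int
    y: int
    w: int
    h: int
    details: dict = field(default_factory=dict)

def blocking_pass(element_labels):
    """Stage 1: place each named element as a coarse region on the canvas.
    (Here we simply stack regions left to right; a real system would infer layout.)"""
    blocks, x = [], 0
    for label in element_labels:
        blocks.append(Block(label, x=x, y=0, w=100, h=100))
        x += 100
    return blocks

def detailing_pass(blocks, detail_specs):
    """Stage 2: refine each region in place, without moving it, the way an
    artist finishes a drawing inside an existing rough sketch."""
    for b in blocks:
        if b.label in detail_specs:
            b.details.update(detail_specs[b.label])
    return blocks

layout = blocking_pass(["house", "tree"])
layout = detailing_pass(layout, {"house": {"color": "red", "windows": 4}})
print(layout[0])  # the house keeps its position but gains detail
```

The point of the separation is that editing the detail specification can never wreck the composition, which stays under the creator's control.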
Another tool called FramePack enables creators to generate 3D videos from a text prompt for multi-scene storytelling. This tool teaches models to prioritize scenes based on their importance to the overall story, similar to the way a human would work on a project.
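The scene-prioritization idea can be sketched in a few lines. This is a toy illustration of the general principle described above, not FramePack's actual algorithm; the importance scores and field names are invented for the example.

```python
def prioritize_scenes(scenes):
    """Order scenes by their importance to the overall story (highest first),
    so the most important scenes are generated before the rest."""
    return sorted(scenes, key=lambda s: s["importance"], reverse=True)

storyboard = [
    {"name": "establishing shot", "importance": 0.4},
    {"name": "climax",            "importance": 0.9},
    {"name": "transition",       "importance": 0.1},
]
queue = prioritize_scenes(storyboard)
print([s["name"] for s in queue])  # climax is generated first
```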
A third innovation explores the power of neuro-symbolic AI, which combines neural networks with reasoning capabilities to increase transparency and overcome the limitations of "black box" AI. Using these principles, the team has developed a visual scene coding language that works from a natural language text prompt to produce lines of code, which are executed and rendered to create a 3D scene. Human creators can stay in the loop to inspect or edit the code and prompt the AI to update its program at any time.
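A minimal sketch can show what "code as the intermediate representation" buys the creator. The tiny scene language below is hypothetical (the team's actual language is not published in this article): a model would emit the `scene_code` string from a prompt, and because it is ordinary code, a human can read it, edit any line, and re-execute it to update the scene.

```python
# Hypothetical 'scene code' a model might emit from the prompt
# "a red house with a chimney and ivy on the left side".
scene_code = """
add('house', color='red', position=(0, 0, 0))
add('chimney', parent='house', position=(1, 3, 0))
add('ivy', parent='house', side='left')
"""

scene = {}

def add(name, parent=None, **attrs):
    """Register one object in the scene graph; a renderer would draw from this."""
    scene[name] = {"parent": parent, **attrs}

exec(scene_code)   # the 'execute and render' step: run the code to build the scene
print(sorted(scene))
```

Because the program, not just the final pixels, is exposed, the human stays in the loop: changing `color='red'` to `color='blue'` and re-running is a precise, inspectable edit rather than another roll of the prompt dice.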
Reimagining education content

A shared conceptual grounding between humans and AI promises to yield new applications in diverse fields, including design, simulation, animation, robotics, and education, says Agrawala. The research team is currently working with gaming platform Roblox to enable players to generate unique 3D objects from text prompts while imposing game restrictions (so, for example, players won't be able to create weapons in a nonviolent game).
More broadly, the scholars hope that one day human creators of all skill levels—from hobbyists and small business owners to visual experts—will have a friction-free way to express their ideas using a combination of natural language, example content, code snippets, and other modalities.
"We're serious about equipping the broader creative community with the tools they need to communicate with AI effectively," Fan says.
Provided by Stanford University