Researchers Develop New Method for Controlling AI Image Generation
Researchers from North Carolina State University have developed a new method for controlling artificial intelligence (AI) image generation, which could be used in fields like autonomous vehicles.
Conditional Image Generation and Other Techniques
Conditional image generation is an AI task in which a system creates images that satisfy a set of conditions specified by the user. Newer techniques go further by incorporating conditions for image layout, letting users specify the types of objects they want to appear in particular locations within the image.
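To make the idea of a layout condition concrete, here is a minimal sketch of how such a condition might be represented: a set of object categories, each tied to a bounding box. The `LayoutSpec` and `ObjectSpec` classes are illustrative assumptions for this article, not the researchers' actual API.

```python
# Hypothetical sketch of a layout condition for layout-aware image generation.
# LayoutSpec/ObjectSpec are illustrative assumptions, not the actual system's API.
from dataclasses import dataclass, field

@dataclass
class ObjectSpec:
    label: str    # object category, e.g. "skier"
    bbox: tuple   # (x, y, width, height) in normalized [0, 1] coordinates

@dataclass
class LayoutSpec:
    width: int
    height: int
    objects: list = field(default_factory=list)

    def add(self, label, x, y, w, h):
        """Place an object of the given category at a normalized bounding box."""
        self.objects.append(ObjectSpec(label, (x, y, w, h)))
        return self

# A user might describe "a mountain scene with a skier in the lower left":
layout = LayoutSpec(width=256, height=256)
layout.add("mountain", 0.0, 0.0, 1.0, 0.6)
layout.add("skier", 0.1, 0.6, 0.2, 0.3)

print(len(layout.objects))        # 2
print(layout.objects[1].label)    # skier
```

A layout-conditioned generator would take a structure like this, rather than just a text prompt, as its input.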
The new state-of-the-art method developed by the NC State researchers builds on these techniques, giving users more control over the resulting images while preserving certain characteristics across a series of images.
Tianfu Wu is co-author of the research paper and an assistant professor of computer engineering at NC State.
“Our approach is highly reconfigurable,” Wu says. “Like previous approaches, ours allows users to have the system generate an image based on a specific set of conditions. But ours also allows you to retain that image and add to it. For example, users could have the AI create a mountain scene. The users could then have the system add skiers to that scene.”
The new method also lets users direct the AI to manipulate elements so that they remain identifiably the same while moving or otherwise changing. For example, the AI could create a series of images in which skiers turn toward the viewer as they move across a landscape.
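One way to think about keeping an element "identifiably the same" across a series of images is to fix a latent code for the object's appearance while letting its layout change from frame to frame. The sketch below is an illustrative assumption of that idea; the per-object "style code" is a stand-in, not the researchers' actual representation.

```python
# Hypothetical sketch: fix an object's identity (via a reused latent "style"
# vector) while its bounding box changes across frames. Illustrative only.
import random

random.seed(0)

def sample_style_code(dim=8):
    """Sample a latent vector that pins down an object's appearance."""
    return [random.gauss(0.0, 1.0) for _ in range(dim)]

# Fix one skier's identity once...
skier_style = sample_style_code()

# ...then reuse it as the skier's bounding box moves across frames.
frames = []
for step in range(4):
    x = 0.1 + 0.2 * step  # the skier slides rightward each frame
    frames.append({"label": "skier",
                   "bbox": (x, 0.6, 0.2, 0.3),
                   "style": skier_style})

# Every frame shares the same style code, so the skier stays recognizably
# the same object even though its position changes.
assert all(f["style"] is skier_style for f in frames)
print([round(f["bbox"][0], 1) for f in frames])   # [0.1, 0.3, 0.5, 0.7]
```

Resampling the style code would change what the skier looks like; changing only the bounding boxes moves the same skier around the scene.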
“One application for this would be to help autonomous robots ‘imagine’ what the end result might look like before they begin a given task,” Wu says. “You could also use the system to generate images for AI training. So, instead of compiling images from external sources, you could use this system to create images for training other AI systems.”
The researchers tested the new approach on the COCO-Stuff dataset and the Visual Genome dataset, where it outperformed previous state-of-the-art techniques on standard measures of image quality.
“Our next step is to see if we can extend this work to video and three-dimensional images,” Wu says.
Training the new approach required substantial computational power; the researchers used a four-GPU workstation. Running the trained system, however, is far less computationally expensive.
“We found that one GPU gives you almost real-time speed,” Wu says.
“In addition to our paper, we’ve made our source code for this approach available on GitHub. That said, we’re always open to collaborating with industry partners.”