Synthetic art can help AI systems learn

AI systems may perform better at identifying paintings when trained on AI-generated art that is tailored to “learn” concepts behind the images, a new study finds.

Artificial intelligence systems, such as those used for facial recognition, are typically trained on data collected from the real world. In the 1990s, researchers manually captured photos to create image collections, while in the 2000s, they began searching for data on the Internet.

However, raw data often contain significant gaps and other issues that, if not accounted for, can lead to large errors. For example, commercial facial-recognition systems have often been trained on image databases dominated by photos of light-skinned faces, which meant they did a poor job of recognizing dark-skinned people. Curating databases to address these kinds of deficiencies is often expensive and time-consuming.

“Our study is among the first to demonstrate that using only synthetic images can potentially lead to better training performance than using real data.”
—Lijie Fan, MIT

Scientists have previously suggested that AI art generators could help avoid the problems that come with collecting and curating real-world images. Now researchers are finding that AI image recognition systems trained on synthetic art can actually perform better than those fed real-world pictures.

AI text-to-image generators such as DALL-E 2, Stable Diffusion, and Midjourney now regularly create images based on textual descriptions. “These models are now capable of generating photorealistic images of extremely high quality,” said study co-author Lijie Fan, a computer scientist at MIT. “Furthermore, these models offer significant control over the content of the generated images. This capability allows the creation of a wide variety of images depicting similar concepts, offering the flexibility to tailor the dataset to suit specific tasks.”

In the new study, the researchers developed an AI training strategy called StableRep. This new technique feeds AIs pictures that Stable Diffusion generates when given text captions from image databases like RedCaps.

The scientists had StableRep create multiple pictures from identical text prompts. The AI system was then trained to treat these pictures as images of the same underlying subject. The goal of this strategy was to help the neural network learn more about the concepts behind the images.
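This grouping step amounts to a multi-positive contrastive objective: images generated from the same caption are pulled together in embedding space, while images from different captions are pushed apart. A toy NumPy sketch of the idea (the function name, the tiny 2-D embeddings, and the exact loss form are illustrative assumptions, not the study's actual code):

```python
import numpy as np

def multi_positive_contrastive_loss(embeddings, caption_ids, temperature=0.1):
    """Cross-entropy between softmax similarities and a 'same caption'
    target distribution: all images born from one caption are positives."""
    caption_ids = np.asarray(caption_ids)
    # L2-normalize so the dot product is cosine similarity
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = z @ z.T / temperature
    n = len(caption_ids)
    total = 0.0
    for i in range(n):
        mask = np.arange(n) != i                       # exclude self
        logits = sim[i][mask]
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()                           # softmax over candidates
        positives = (caption_ids[mask] == caption_ids[i]).astype(float)
        target = positives / positives.sum()           # uniform over positives
        total += -(target * np.log(probs + 1e-12)).sum()
    return total / n
```

Embeddings that cluster by caption yield a lower loss than embeddings that do not, which is what drives the network to learn the shared concept behind each caption's images.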

In addition, the researchers developed StableRep+. This advanced strategy trains not only on images but also on the text of their captions. In experiments, when trained with 10 million synthetic images, StableRep+ reached an accuracy of 73.5 percent. In contrast, the CLIP AI system reached an accuracy of 72.9 percent when trained on 50 million real images and captions. In other words, StableRep+ achieved slightly better performance with a data source one-fifth as large.
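The data-efficiency claim follows directly from the reported figures:

```python
# Figures reported in the study
stablerep_plus_images = 10_000_000   # synthetic training images
clip_images = 50_000_000             # real image-caption pairs

ratio = stablerep_plus_images / clip_images
print(ratio)                         # 0.2 -> one-fifth the data
print(round(73.5 - 72.9, 1))         # 0.6-point accuracy edge for StableRep+
```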

“Our study is among the first to demonstrate that using only synthetic images can potentially lead to better performance than using real data in a large-scale environment,” says Fan.

It remains uncertain why AIs perform better when learning from synthetic images rather than real pictures. The researchers suggest that one possibility is that AI art generators can provide a greater degree of control over training data. Another is that generative AI may be able to generalize beyond the raw data it learns from to produce a richer training set than the real data.

The researchers caution that this work faces many potential concerns. For example, AI art generators are themselves trained on raw data, which can be laden with hidden biases and other problems. In addition, these systems often fail to provide proper attribution for the sources of their data, leading to copyright and other legal battles.

AI art generators are also still relatively slow, taking 0.8 to 2.2 seconds per image, which currently limits how well StableRep can scale. Moreover, the researchers note that StableRep's images do not always match the intent of the text prompts they are given, which can affect the overall quality and usefulness of the synthetic images. Text prompts also carry a risk of bias and therefore require careful design.
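Those per-image times add up quickly at dataset scale. A back-of-envelope estimate (assuming a single generator process with no parallelism, using the speeds quoted above and the 10-million-image dataset size):

```python
images = 10_000_000                  # dataset size used for StableRep+
seconds_per_day = 86_400

for sec_per_image in (0.8, 2.2):     # quoted generation speeds
    days = images * sec_per_image / seconds_per_day
    print(f"{sec_per_image} s/image -> {days:.0f} days of nonstop generation")
```

Even at the faster speed, a single worker would need roughly three months of continuous generation, which is why scaling to billions of samples remains a bottleneck.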

In the future, in addition to addressing the above concerns, the scientists would like to further increase the size of synthetic image datasets. “Our current study is conducted on datasets in the tens of millions range, while the largest available image and text datasets contain billions of samples,” says Fan. “Exploring how synthetic data performs at these larger scales presents an intriguing opportunity for further research.”

Overall, “synthetic data is becoming popular in various fields, including medical image analysis, robotics, etc.,” Fan says. The strategy developed by the researchers in this study may have “significant potential for application in these areas as well. Our method can be particularly useful in environments where obtaining large volumes of real data is challenging or impractical.”

The scientists will describe their findings on December 14 at the Conference on Neural Information Processing Systems (NeurIPS) in New Orleans.
