DALL-E 2 results for "Teddy bears mixing sparkling chemicals as mad scientists, steampunk." Image: OpenAI

Artificial intelligence research lab OpenAI has created a new version of DALL-E, its text-to-image generation program. DALL-E 2 is a higher-resolution, lower-latency version of the original system, producing pictures that depict users' descriptions.
It also includes new capabilities, like editing an existing image. However, as with previous OpenAI work, the tool isn't being released directly to the public. Instead, OpenAI plans to make it available later in third-party apps.
The original DALL-E, a portmanteau of the artist Salvador Dalí and the robot WALL-E, debuted in January 2021. It was a limited but fascinating test of AI's ability to represent concepts visually, from mundane descriptions like a mannequin in a flannel shirt to "a giraffe made of turtle" or a picture of a radish walking a dog.
For now, OpenAI says it will continue to build on the system while studying potential dangers like bias in image generation or the production of misinformation. It is attempting to manage those issues with technical safeguards and a new content policy, while also reducing its computing load and advancing the fundamental capabilities of the model.
One of the new DALL-E 2 features, inpainting, applies DALL-E's text-to-image capabilities on a more granular level. Users can start with an existing picture, select an area, and tell the model to edit it. For instance, you can block out a painting on a living room wall and replace it with a different picture, or add a vase of flowers to a coffee table.
Likewise, the model can add (or remove) objects while accounting for details like the direction of shadows in a room. Another feature, variations, is like an image search tool for pictures that don't exist. Users can upload a starting image and then generate a range of similar variations. They can also blend two images, generating pictures with elements of both. The generated images are 1,024 x 1,024 pixels, a leap over the 256 x 256 pixels the original model produced.
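The core idea behind inpainting can be sketched in a few lines: keep every pixel outside the selected region untouched, and accept the model's proposal only inside it. This is a minimal toy sketch of that masking logic, not OpenAI's actual API; the `generate` callable here is a hypothetical stand-in for the generative model.

```python
import numpy as np

def inpaint(image, mask, generate):
    """Keep every unmasked pixel; take the model's proposal where mask is True.

    image: (H, W, 3) array; mask: (H, W) boolean array marking the region
    to regenerate; generate: any callable proposing a new canvas (here a
    stand-in for the actual text-conditioned model).
    """
    proposal = generate(image, mask)
    # Broadcast the mask over the color channels and merge the two canvases.
    return np.where(mask[..., None], proposal, image)

# Toy usage: "repaint" a square region of a black image with white.
img = np.zeros((64, 64, 3), dtype=np.uint8)
m = np.zeros((64, 64), dtype=bool)
m[16:48, 16:48] = True
out = inpaint(img, m, lambda im, mk: np.full_like(im, 255))
```

A real system would also condition the proposal on the surrounding pixels, which is how it matches details like shadow direction rather than pasting in an unrelated patch.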
DALL-E 2 builds on CLIP, a computer vision system that OpenAI also announced last year. "DALL-E 1 just took our GPT-3 approach from language and applied it to produce an image: we compressed images into a series of words, and we just learned to predict what comes next," says OpenAI research scientist Prafulla Dhariwal, referring to the GPT model used by many text AI apps. But the word-matching didn't necessarily capture the qualities humans found most important, and the predictive approach limited the realism of the images.
CLIP was designed to look at images and summarize their contents the way a human would, and OpenAI iterated on this approach to create "unCLIP," an inverted version that starts with the description and works its way toward an image. DALL-E 2 generates the image using a process called diffusion, which Dhariwal describes as starting with a "bag of dots" and then filling in a pattern with greater and greater detail.
Interestingly, a draft paper on unCLIP notes that it's partially resistant to a very amusing weakness of CLIP: the fact that people can fool the model's identification capabilities by labeling one object (like a Granny Smith apple) with a word indicating something else (like an iPod).
The variations tool, the authors say, "still generates pictures of apples with high probability" even when using a mislabeled picture that CLIP can't identify as a Granny Smith. But, conversely, "the model never produces pictures of iPods, despite the very high relative predicted probability of this caption."
DALL-E's full model was never released publicly, but over the past year other developers have honed their own tools that imitate some of its functions. One of the most popular mainstream applications is Wombo's Dream mobile app, which generates pictures of whatever users describe in various art styles. OpenAI isn't releasing any new models today, but developers could use its technical findings to update their own work.
OpenAI has implemented some built-in safeguards. The model was trained on data from which some objectionable material had been weeded out, limiting its ability to produce objectionable content.
In addition, a watermark indicates the AI-generated nature of the work, although it could theoretically be cropped out. Finally, as a preemptive anti-abuse feature, the model can't generate recognizable faces based on a name; even asking for something like the Mona Lisa would yield a variant on the actual face from the painting.
DALL-E 2 will be testable by vetted partners with some caveats. Users are banned from uploading or generating images that are "not G-rated" and "could cause harm," including anything involving hate symbols, nudity, obscene gestures, or "major conspiracies or events related to major ongoing geopolitical events."
They must also disclose the role of AI in generating the images, and they can't serve generated images to other people through an app or website. So you won't initially see a DALL-E-powered version of something like Dream. But OpenAI hopes to add it to its API toolset later, allowing it to power third-party apps.