
What is DALL·E 2?



Figure 1: A computer


What is DALL·E 2?


DALL·E 2 is an AI system developed by OpenAI, an AI research company. It has three main functions: producing images from a written description, producing variations of an existing image, and editing images. DALL·E 2 can produce highly detailed, high-quality images that have never existed before, and it can even create images in the styles of artists such as Monet or Picasso.


Figure 2: Top left: the image DALL·E 2 produces for the prompt "A monkey typing on a computer in the style of Picasso"; top right: DALL·E 2 editing a photo to place the Leaning Tower of Pisa where the Eiffel Tower should be; bottom: DALL·E 2 generating variations of Picasso's self-portrait.


How Does DALL·E 2 Generate Images?


First, DALL·E 2 converts the user's text description into a text embedding (a numerical representation of the text). This embedding is then passed into a model called the prior, which produces an image embedding. Finally, the image embedding is passed into a decoder, which produces the final image.
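The three-stage flow above can be sketched in code. This is a minimal illustration only: the `text_encoder`, `prior`, and `decoder` functions below are made-up stand-ins for the real neural networks, not DALL·E 2's actual implementation.

```python
import numpy as np

def text_encoder(prompt: str) -> np.ndarray:
    # Stand-in: map the prompt to a fixed-size text embedding.
    rng = np.random.default_rng(abs(hash(prompt)) % (2**32))
    return rng.standard_normal(512)

def prior(text_embedding: np.ndarray) -> np.ndarray:
    # Stand-in: map the text embedding to an image embedding.
    return text_embedding + 0.1 * np.random.default_rng(0).standard_normal(512)

def decoder(image_embedding: np.ndarray) -> np.ndarray:
    # Stand-in: produce an "image" (a 64x64 RGB array) from the embedding.
    return np.random.default_rng(1).random((64, 64, 3))

def generate(prompt: str) -> np.ndarray:
    text_emb = text_encoder(prompt)   # step 1: text -> text embedding
    img_emb = prior(text_emb)         # step 2: text embedding -> image embedding
    return decoder(img_emb)           # step 3: image embedding -> image

image = generate("a monkey typing on a computer")
print(image.shape)  # (64, 64, 3)
```

The key structural point is that each stage only sees the previous stage's output: the decoder never reads the prompt directly, only the image embedding.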


Figure 3: A representation of the DALL-E 2 model


DALL·E 2 uses a model called CLIP to produce the text and image embeddings. CLIP is a neural network trained to match images with their text descriptions. To do this, two encoders are used: one converts text into text embeddings and the other converts images into image embeddings. Training encourages an image's embedding and the embedding of its caption to be as similar as possible.
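The matching idea can be shown with cosine similarity: a caption "matches" an image if their embeddings are closest. The embeddings below are hand-picked toy vectors, not real CLIP outputs.

```python
import numpy as np

def cosine_similarity(a, b):
    # Cosine of the angle between two vectors: 1 means identical direction.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy embeddings for two image/caption pairs, chosen so that matching
# pairs point in similar directions and mismatched pairs do not.
image_embs = {"photo_of_cat": np.array([0.9, 0.1, 0.0]),
              "photo_of_dog": np.array([0.1, 0.9, 0.0])}
text_embs  = {"a cat":        np.array([1.0, 0.0, 0.1]),
              "a dog":        np.array([0.0, 1.0, 0.1])}

matches = {}
for img_name, img_emb in image_embs.items():
    # The caption whose embedding is most similar is the model's "match".
    best = max(text_embs, key=lambda t: cosine_similarity(img_emb, text_embs[t]))
    matches[img_name] = best
    print(img_name, "->", best)
```

In real CLIP the encoders are learned so that this nearest-caption lookup works across hundreds of millions of image/text pairs.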


The prior uses a diffusion model. Diffusion models repeatedly add "noise" to data and then learn to reverse the process, recovering the original data from the noisy version. This teaches them how to generate new data (in this case, CLIP image embeddings). When developing DALL·E 2, researchers tried generating images without a prior but found the results were better when one was used.
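The forward (noising) half of that process is easy to sketch: mix a little Gaussian noise into the data at every step until almost no signal remains. This is a toy illustration with a 1-D signal standing in for real data; the schedule and scale are arbitrary choices, not DALL·E 2's.

```python
import numpy as np

rng = np.random.default_rng(42)
data = np.sin(np.linspace(0, 2 * np.pi, 100))  # stand-in "data"

num_steps = 200
noise_scale = 0.15
x = data.copy()
for t in range(num_steps):
    # Shrink the signal slightly and add fresh noise, keeping variance stable.
    x = np.sqrt(1 - noise_scale**2) * x + noise_scale * rng.standard_normal(x.shape)

# After enough steps the sample is dominated by noise: its correlation
# with the original data falls toward zero.
corr = np.corrcoef(data, x)[0, 1]
print(f"correlation with original after {num_steps} steps: {corr:.3f}")
```

Training a diffusion model amounts to learning to undo one of these noising steps at a time; chaining the learned reversals turns pure noise into new data.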


The decoder uses a model called GLIDE, a variant of a diffusion model, to convert the image embedding created by the prior into an actual image.
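The reverse (denoising) direction a decoder like GLIDE performs can be caricatured as follows: start from pure noise and nudge the sample toward something consistent with the conditioning embedding, step by step. The "denoiser" below is a stand-in that simply knows the target; a real model learns its predictions from data.

```python
import numpy as np

rng = np.random.default_rng(0)
target = np.linspace(0, 1, 64)      # stand-in for the image implied by
image_embedding = target.copy()     # the prior's image embedding

x = rng.standard_normal(64)         # start from pure noise
for step in range(100):
    predicted = image_embedding             # stand-in denoiser prediction
    x = x + 0.1 * (predicted - x)           # move a fraction toward it
    x = x + 0.01 * rng.standard_normal(64)  # keep a little noise each step

error = float(np.mean((x - target) ** 2))
print(f"mean squared error after denoising: {error:.4f}")
```

The point is the iterative structure: the image is not produced in one shot but refined over many small denoising steps, each conditioned on the embedding.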


How DALL-E 2 Produces Variations of an Image


The method DALL·E 2 uses to generate variations is quite simple: the CLIP image embedding of the original image is computed and then passed into the decoder. Since the CLIP encoder is designed to capture only the important parts of an image, the decoder must fill in the remaining details itself, so each decoding produces a similar but different image.
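A toy sketch of why this yields variations rather than copies: the encoder below keeps only coarse structure (block averages), and the decoder fills the lost detail with fresh random noise on every run. Both functions are illustrative stand-ins, not CLIP or DALL·E 2 components.

```python
import numpy as np

def image_encoder(image: np.ndarray) -> np.ndarray:
    # Stand-in encoder: keep only coarse structure (8 block averages),
    # discarding fine detail, as CLIP keeps only salient content.
    return image.reshape(8, 8).mean(axis=1)

def decoder(embedding: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    # Stand-in decoder: expand the embedding back to full size and
    # invent the missing detail with random noise.
    return np.repeat(embedding, 8) + 0.1 * rng.standard_normal(64)

original = np.sin(np.linspace(0, np.pi, 64))
embedding = image_encoder(original)

rng = np.random.default_rng(7)
variation_a = decoder(embedding, rng)
variation_b = decoder(embedding, rng)

# Both variations share the original's coarse content but differ in detail.
print(np.allclose(variation_a, variation_b))  # False
```

Because the embedding is lossy, decoding is one-to-many: the same embedding maps to a whole family of plausible images.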


Personal Opinion


Personally, I feel that DALL·E 2 is one of the most impressive and advanced AI models around today. Many of the images it can create in seconds would take humans years to paint.

Despite this, the model has a few minor problems, such as difficulty rendering legible text within images.


A larger problem with DALL·E 2 is potential misuse by humans, for example creating fake images and using them to deceive or exploit people. As a result, OpenAI is committed to preventing this by removing explicit or hateful images from DALL·E 2's training data and checking users' queries for potentially harmful image requests.



