Zhang, who gained access to DALL-E 2 at the end of July 2022, discovered several weaknesses in the model while generating images on the theme of "a llama dunking a basketball." These weaknesses can be summarized as follows:
Weaknesses of DALL-E 2 image generation:
- Composition is poor: the model does not really "understand" the spatial relationships between elements.
- It struggles to generate realistic faces.
- It interprets angle and shot keywords loosely.
- It cannot spell words correctly.
- It can be capricious with complex or poorly worded prompts.
Having listed the weaknesses of DALL-E 2 above, Zhang advises that trial and error with prompts is essential to get the image you want from the model, so you should budget a minimum of 15 credits (roughly 15 generations). At present, "prompt engineering" — the craft of coaxing a desired image out of the model — is not yet well developed, and if the image you want is complex, you cannot easily obtain it.
Translator's note: in the translated text below, the original English prompt sentences are given alongside their translations. If you enter the original English text into DALL-E 2, you can reproduce the outputs shown.
This article was translated after contacting Joy Zhang directly and obtaining permission. The content reflects the author's own views and does not represent any particular country, region, organization, or group, nor the views of the translator or the AINOW editorial department.
In preparing this translation, supplementary wording and contextual clarifications have been added to make the text easier to read in Japanese.
Yes, the image above is of a llama dunking a basketball. What follows is a summary of the process, limitations, and lessons learned while experimenting with the closed beta of DALL-E 2.
Ever since I first saw an AI-generated image of this "Shiba Inu bento," I've been dying to try DALL-E 2.
Wow, this is disruptive technology.
For those of you who don't know, DALL-E 2 is a system created by OpenAI that can generate original images from text.
It's currently in closed beta; I joined the waitlist in early May and got access at the end of July. During the beta, users receive credits (50 free credits in the first month, 15 credits per month thereafter), consume 1 credit per generation, and get 3-4 images per generation. You can also purchase 115 credits for US$15.
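To put the pricing in perspective, here is a back-of-the-envelope calculation based on the figures above (a sketch only; the rates were beta-period pricing and may have changed since):

```python
# Rough cost of DALL-E 2 credits during the closed beta.
# Figures from the article: 115 credits for US$15, 1 credit per
# generation, and 3-4 images returned per generation.

PACK_PRICE_USD = 15.0
CREDITS_PER_PACK = 115
IMAGES_PER_GENERATION = 4  # best case

cost_per_generation = PACK_PRICE_USD / CREDITS_PER_PACK       # ~$0.130
cost_per_image = cost_per_generation / IMAGES_PER_GENERATION  # ~$0.033

print(f"cost per generation: ${cost_per_generation:.3f}")
print(f"cost per image (best case): ${cost_per_image:.3f}")
```

At roughly 13 cents per generation, the 100+ credits the author reports spending later in the article works out to about US$13, which matches the article's own figure.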
P.S. Readers who can't wait to try DALL-E 2 can try DALL-E mini for free. However, images produced by DALL-E mini are generally low quality (hence the many DALL-E memes), and it takes about 60 seconds per prompt to generate images (DALL-E 2 takes about 5 seconds).
I'm sure you've seen selections of generated images online showing what DALL-E 2 can do (with the right, creative prompts). In this article, I'll give you a candid behind-the-scenes look at what it takes to create an image from scratch. The theme: "a llama playing basketball." If you're thinking of trying DALL-E 2, or want to understand what it can do, read on.
Starting point
Knowing what prompts to give DALL-E 2 is part art and part science. For example, "llama playing basketball" produces the following results:
Why does DALL-E 2 lean toward cartoon images for this prompt? I suspect it has something to do with the fact that the model never saw actual photos of llamas playing basketball during training.
Taking it a step further, adding the keyword "realistic photo of" gives the following result.
The llamas look more realistic, but the whole image starts to look like a Photoshop failure. DALL-E 2 clearly needs a push to create a cohesive scene.
Prompt engineering is the art of realizing exactly what the user wants
In the context of DALL-E, prompt engineering refers to the process of designing prompts to achieve a desired result.
The DALL-E 2 Prompt Book is a great resource for this: a detailed list of prompt inspirations using photographic and artistic keywords.
Why is something like this needed? Because it is difficult to get usable output from DALL-E 2 (for business and the like), especially if you don't know what the model can do. To save users the time and money of devising prompts themselves, a startup was even born that runs a marketplace selling a single prompt for $1.99 (*3).
All prompts traded on PromptBase are vetted, so there is no concern that inappropriate images will be output.
One of my personal favorite discoveries while experimenting with prompt engineering was "dramatic backlighting."
What matters in prompt engineering is telling DALL-E 2 exactly what you want it to output. Apparently it isn't clear from the prompt's context (as you can see in the image above) whether the llama I'm asking for should be dressed. But with "llama wearing a jersey," the model successfully produces the fantastic scenes below.
The results don't stop there. Special phrases like "dunking a basketball" or "action shot of…" are needed to make the llama really fly and add drama to the image. My favorite of these is "…llama in a jersey dunking a basketball like Michael Jordan."
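The iterative process above — start with a subject, then layer on clothing, action, and lighting phrases — boils down to careful string composition plus trial and error. The helper below is purely illustrative (the phrase fragments come from the article, but the function itself is not part of any DALL-E tooling):

```python
def build_prompt(subject, modifiers):
    """Join a subject with optional framing/style phrases into one prompt.

    Illustrative sketch only: DALL-E 2 accepts free-form text, so prompt
    'engineering' here is just deliberate phrase layering.
    """
    return ", ".join([subject] + list(modifiers))

# Layering phrases the way the article does:
prompt = build_prompt(
    "a llama in a jersey dunking a basketball like Michael Jordan",
    ["action shot", "dramatic backlighting", "realistic photo"],
)
print(prompt)
# -> a llama in a jersey dunking a basketball like Michael Jordan,
#    action shot, dramatic backlighting, realistic photo
```

Each variant of such a prompt costs one credit to try, which is why the article stresses budgeting credits for trial and error.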
Tip: DALL-E 2 only saves your last 50 generations in the history tab. Save your favorite images.
As you may have noticed, DALL-E 2 is not good at composition
From the context of "dunk a basketball," you would think it obvious where the llama, the ball, and the hoop should be relative to one another. More often than not, however, the llama dunks in the wrong direction, or the ball is placed where a shot could never score. Even when the prompt clearly states every element that should appear, DALL-E 2 doesn't really "understand" their spatial relationships. This article delves deeper into the topic (*translation note 4).
According to that paper, for images based on realistic prompts, 87% of 169 human evaluators judged the drawings correct; for unrealistic prompts such as "a monkey touching an iguana," only 11% were judged correct.
As a way to improve DALL-E 2's grasp of spatial relationships, the paper proposes CLIPORT, an AI model jointly researched by the University of Washington and NVIDIA. The model was developed for robot control and implements spatial understanding in addition to image recognition.
Another flaw caused by DALL-E 2 not "understanding" the scene is an occasional mixing of textures. In the image below, the net is made of fur (with a moment's thought, a human would recognize how unnatural this scene is).
DALL-E 2 struggles to generate realistic faces
According to some sources, the difficulty in generating realistic faces is said to be a deliberate measure to prevent deepfakes (*5). You would think this measure applies only to humans, but apparently it also applies to llamas.
Some of the realistic llama face generation failures were downright creepy.
Regarding DALL-E 2's built-in limit on generating photorealistic human faces, an earlier OpenAI official blog article states the following:
Mitigating misuse: To minimize the risk of DALL-E being misused, we reject image uploads containing realistic faces and attempts to create the likeness of public figures, including celebrities and prominent politicians. We also use advanced techniques to prevent photorealistic generation of real individuals' faces.
Other DALL-E 2 limitations
Below are some other minor issues I’ve encountered.
DALL-E 2 interprets angles and shots loosely
Even with keywords like "in the distance" or "extreme long shot," it's hard to get an image that fits the entire llama in the frame.
In some cases, framing was completely ignored.
DALL-E 2 can’t spell words
Given that DALL-E 2 struggles to "understand" the positional relationships between elements in an image, its inability to spell words correctly isn't too surprising (*6). In the right context, however, it can generate fully formed characters.
DALL-E 2 can be capricious with complex or poorly worded prompts
Also, depending on the keywords you add and how you phrase them, you may get completely different results than you expected.
For example, in the case below, the real subject of the prompt (a llama in a jersey) was completely ignored.
Even the mere addition of the word "fluffy" dramatically worsened performance; in several cases DALL-E 2 looked downright broken.
When working with DALL-E 2, it is important to be specific about what you want, without overcrowding the prompt or adding redundant words.
DALL-E 2’s ability to transfer styles is impressive
Do try DALL-E 2's style transfer.
Once you've settled on a subject keyword, you can generate images in a surprising number of art styles.
“Abstract style…”
“Vaporwave”
“Digital Art”
“Screenshots from the Miyazaki animated film”
Image generated by prompting “art nouveau stained glass window depicting Marvel’s Captain America”
Image generated by prompting “Elsa from Frozen, cross-stitched sampler”
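Generating a gallery like the one above amounts to pairing one subject with a list of style phrases. A minimal sketch (the style strings are taken from the captions above; the subject and loop are illustrative, not any official tooling):

```python
# Pair one subject with several style phrases, one prompt per style.
# In the closed beta, each prompt would cost one credit (3-4 images).
subject = "a llama dunking a basketball"
styles = [
    "abstract style",
    "vaporwave",
    "digital art",
    "screenshot from a Miyazaki animated film",
]

prompts = [f"{subject}, {style}" for style in styles]
for p in prompts:
    print(p)
```

Keeping the subject fixed while varying only the style phrase makes it easy to compare how each style keyword steers the model.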
Final impressions
After investing more than 100 credits (equivalent to about US$13) and much trial and error, the image below was completed.
The image isn’t perfect, but the DALL-E 2 fulfilled about 80% of my expectations.
Most of those credits went into getting the style, face, and composition right.
OpenAI's DALL-E announcement includes the following description:
"…users get full usage rights to commercialize the images they create with DALL-E, including the rights to reprint, sell, and merchandise." (*translation note 8)
Many users are expected to take advantage of this policy.
For content creators, DALL-E 2 will be most useful for creating simple illustrations, photos and graphics for blogs and websites. My plan is to use it instead of Unsplash to create unique blog cover images.
For those of you who want to try DALL-E 2 yourselves, here are some things to know before starting.
- Check out the DALL-E 2 Prompt Book! (There is also a fan-made prompt engineering sheet.)
- Be prepared for trial and error to get what you want. 15 free credits may sound like a lot, but it really isn't. Assume you'll use at least 15 credits to generate one usable image. DALL-E 2 is by no means cheap.
- Don’t forget to save your favorite images.
・・・
Thank you for reading. We look forward to hearing your impressions and opinions after trying DALL-E 2.
For readers of this article, let me also introduce articles by other writers.
Original article
"I spent $15 in DALL·E 2 credits creating this AI image, and here's what I learned"
Author
Joy Zhang
Translation
Yuki Yoshimoto (freelance writer, JDLA Deep Learning for GENERAL 2019 #1)
Editing
Ozaken