(My) Process of Creating AI Art

Some of you asked me how I create my AI pieces with Midjourney, so here's an overview of the process.

Let's start with the resulting piece, which is a composite of multiple layers, manual drawing in Photoshop, and many rounds of AI variations and mutations:


Flying

The Idea

Before I start "sculpting" with the text-to-image AI Midjourney, there's the idea. It may be drawn from something that happened that day, something that stuck with me from past books, comics, movies and life, or a reaction to news. I take notes throughout the day, and night. Sometimes it's just a visual in my head that I like that's looking for a meaning, at other times the concept starts in words and I'm then looking for the right visualization. If someone tells you AI art is quick to generate, that is true in one sense, but in another sense, what to put in the image might have brewed for years.

In this case, the creation started as the visual idea of a protester in the center foreground holding up a sign, with people casually flying in the sky. When working with an image AI, I like to keep some specifics blurred; I don't need to decide now what will be shown on the protester's sign (or whether there will be a protester at all). A "No Flying" sign was an obvious route to take, but not the only one. I love when the image AI surprises me and takes the concept in new directions, in what you can call a co-creation, so I often don't lock in the concept at the start.

Picking the tool

Now comes the choice of the AI tool to use. I've explored NightCafe, Dall-E and Stable Diffusion, but I find nothing (in 2022) beats the visual fidelity of Midjourney, at least for my purposes. I find it so great that I decided to dedicate a whole Instagram channel (and most of my days) to creations made with it. What a time to be alive, to hold something in your hands that lets the pictures in your head get out that smoothly.

Prompting

A prompt is the text you come up with to have the text-to-image AI do its magic. It's important to understand that the AI won't just do what it's told; it's more of a dance around a subject, and it takes a lot of random tries to get where you want. Writing a prompt like, literally, "A protester in center-foreground holding up a sign, and in the sky you can see people casually flying", does not make it so. However, I often like to start with a somewhat "naive" prompt, and then go from there based on the results.

For this idea, the first prompt I picked was this: "people flying in air, bright sky, lonely protester with sign". I keep my prompts short to retain good control over each part and have later additions matter. Here's the set of 4 pictures this resulted in:

I actually totally loved this as a first try. The fourth one in particular was amazing. It offered to take the concept in a new direction: What if the protester was protesting airplanes, but they themselves are flying? I ultimately disregarded this direction, because it would shift the focus onto subjects I didn't plan to cover with this particular image (maybe next time). My aim was for something more allegorical, dreamy and fun.

Creating Variations

Now's the time to hit the "Variations" button in Midjourney. Here's what the interface looks like:

It's a private conversation channel I'm having with the Midjourney bot (and it costs a bit of money). Generating one creation set takes around a minute of waiting time. In this interface, you have the option to click U1-4, meaning "Upscale image 1 to 4" (counting from top-left to bottom-right). You can also click V1-4, meaning "Give me another set of four variations for that image". And finally, there's the reload button, which creates a wholly new set of four. Clicking it gave us this:

And adding "--ar 2:3" changes the aspect ratio to portrait mode, which seemed fitting considering the subject. It gave us this:

All nice, but none better than the first set, and still not where I wanted to go. Where are all the people in the air, flying? It seemed that the word "protester" biased the image too much towards a crowd on the ground. So what if we omitted it and went for "people flying in air, sky, sign"? Then we get this:

This is incredibly close to what I wanted, but still not there. The first image includes a sign, which is great – and the contents of which I knew I could photoshop later – but it has too much of a naively painted style, and I didn't like the gloomy weather. The second image has nice weather, but no sign. The third has a sign and okay-ish weather, but there aren't many people flying, they're not drawn well, and someone decided to sit on the sign! (The helicopter-like object in the sky, on the other hand, poses no problem, as it can be easily patch-removed in Photoshop.)

Let's try again:

And again:

And again:

You can see where this is going – a lot of "close but no cigar" results! And this is the portion that can take an hour or more, as there's always another minute of waiting for variations to render.

I'll spare you the full generations reel, but here are a few more of them!

On and on and on it goes. If nothing perfect comes back, you can vary your prompt. In this case, I added "wide angle", hoping to get a more dramatic perspective, as well as theming it more towards photorealism. (For my creations, I like to go towards a kind of airbrushed almost-realist-but-still-clearly-painted feel – I don't want the result to look like a photo. It's a personal artistic preference, and you may have totally different ones for your creations.)

So here's the result with the prompt "people flying in air, sky, sign, wide angle":

Interesting, but I'd really like happier weather, and also, where's the sign? And "wide angle", while nice, also seemed to have tilted it into an upwards angle, which I didn't want:

It's worth noting that Midjourney is said to not understand phrases of multiple words, so "wide" and "angle" would get broken up – but since each word still biases the image towards a certain theme, together they may get you closer to the phrase's meaning. It's also worth noting that every keyword "radiates" into other concepts of the image. I once tried a visual of a robot playing with a kid, and when I added a football to the scene, it made the resulting robots much more round!


"NoW, FeTcH."

Wanting to get better weather, I also added the word "sunny"...

... or "bright":

This clearly helped with the weather, but at this point you start wondering if Midjourney even knows what a sign is! And that's a relevant question, because many words have multiple meanings. In this case, "sign" could be understood to mean something like "a sign of something to come". And since that usage probably appeared in captions in the AI's training data, it has the power to dilute the meaning you intended.

By now, I had taken several detours, changing prompt words to things like...

When looking for the right prompt words, another AI can come in handy – OpenAI's ChatGPT. In this case, I asked it for other words for street signs, and it gave me the "directional sign" clue. Additionally helpful because I'm not a native speaker of English!

But don't despair: by spinning the variations randomizer wheel a few more times, we do indeed get street signs...

... and sometimes even sunny ones!

Locking in on the subject

We're now an hour or two into the process, and are nearing where we want the image to go. The spinning wheel nature of creating variations may play into dopamine release and addiction, but I'll leave it to brain researchers to study this.

If you don't find the perfect image, you can consider creating two totally separate ones – in our case, to have flying people and the sign be two separate creations. But Midjourney is about the most amazing (automated) painter and color composer in the world, and when you can, you want to utilize it as best as possible for the most integrated look. Here's an example of where I photoshopped in a totally different Mona Lisa creation:


Last night at the Louvre.

It's not half-bad, but I often still feel there are traces of the photoshopped-in look remaining.

Another strategy in Midjourney is to enable the "remix" setting, which allows you to create variations of a prompt while still taking a base image you have – but I still find it gives me a photoshoppy feel! It's as if the AI tried to compose the colors after the fact, making for less natural outlines. Check out these rats in a post-apocalypse server room (the joke being that the servers and possibly the AI on them are still running, and the rats use them to keep warm!). The first image is the non-remix, and then, to get more light in, I added a remix variant with the additional word "sunset":


No remix


"Sunset" remix

Can you see how something feels slightly off in the remixed set? Almost like the rats, or the sunset background, have been pasted in. However, I find a remixed image can be great to add as a Photoshop layer on top of the base creation, mixing in parts.

Another approach I don't personally use too much (but you may) is the ::weight parameter in Midjourney, which lets you adjust the strength of particular words. I feel it doesn't give me the control I get from the full A/B probing of words, and it also slightly takes me out of the flow. And that flow when creating AI pieces is magical. I do sometimes add single suffixes like "... :: minimal ::0.2", though.
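To make the A/B probing of words a bit more concrete, here's a minimal Python sketch. It only builds the prompt strings to paste into Midjourney, one baseline plus one variant per candidate word; it doesn't talk to Midjourney in any way. The helper name ab_prompts is made up for this illustration, while the keywords and the "--ar 2:3" suffix are the ones used earlier in this walkthrough.

```python
def ab_prompts(base_keywords, candidate_words, suffix=""):
    """Return the baseline prompt followed by one variant per candidate word.

    Hypothetical helper for illustration: it only formats text and does not
    call Midjourney or any other service.
    """
    def join(keywords):
        prompt = ", ".join(keywords)
        # Parameters such as "--ar 2:3" go at the very end of the prompt.
        return f"{prompt} {suffix}" if suffix else prompt

    prompts = [join(base_keywords)]  # the "A" side: unchanged baseline
    for word in candidate_words:
        # each "B" side adds exactly one word, so its effect can be compared
        prompts.append(join(list(base_keywords) + [word]))
    return prompts


base = ["people flying in air", "sky", "sign"]
for prompt in ab_prompts(base, ["wide angle", "sunny", "bright"], suffix="--ar 2:3"):
    print(prompt)
```

Changing only one word per variant is the point: if the weather improves but the sign disappears, you know which word to blame.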

But back to our flying people! Spoiler alert – after what felt like a million tries, I finally got this:

See the top left image? Hallelujah! Nice weather, floating people, and a sign to match! Retouching what's on the sign is a job for later, but since it's a flat area, it's relatively easy. I hadn't yet decided what exactly to put on the sign, but I'm now milking the heck out of that result... by creating endless variations, upscales, and variations-of-variations. It's a process where you grow, massage and mutate something towards your goal. Look at these:

Basically, whenever you like a result, you click the V1-4 or U1-4 buttons, then check if the upscale looks nice or another variation is even better. Note that upscales can still wildly change the image, sometimes adding details you love, at other times taking off in a wrong direction. Oftentimes, to keep the core message of the image, I then later remove or soften distracting details in Photoshop.

Now we're at the point where we can upscale the images, and we can soon mix and match parts of different images in Photoshop! For instance, I like the sun in this picture ...

... but I don't like how everyone was too directed towards the sun – having a message of "people flying into the sun" would evoke a different subject. Perhaps one of Icarus melting in the sun, the tragedy of aiming too high, and so forth. Fascinating points, but not the ones I wanted to make! Similarly, I wanted to avoid those people looking anything like superheroes flying... it was meant to be a casual flight, like when you're strolling, or walking to work.

Here's one where I loved the people:

The center person is now nicely rotated in a different direction; the center right one almost looks like they're taking an air walk; nobody's touching the sign, and all bodies are clearly readable! However, I didn't like the sun – the individual beams made it too illustrationy to me (going too far from depiction to semantic meaning on Scott McCloud's wonderful pyramid). But not to worry, we can now start going into Photoshop!

Mixing different creations in Photoshop

See the layers in the bottom right? They're different creations on which you can then use the eraser, so that the image below shines through. You can now see we combined the wanted people with the wanted sun. I also did some correction on additional limbs (that's an issue with AI-generated art in 2022, which famously tends to give humans 12 or more fingers).
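If you're curious what erasing parts of a layer amounts to, here's a minimal sketch in plain Python. It is not how Photoshop is implemented, just the general idea of mask-based compositing on tiny grayscale grids: wherever the mask is 1.0 the top layer stays, wherever it's 0.0 (fully erased) the layer below shines through, and values in between give a soft eraser edge. The function name and the pixel values are made up for the illustration.

```python
def composite(top, bottom, mask):
    """Blend two equally sized grayscale images using a per-pixel mask.

    mask value 1.0 keeps the top layer, 0.0 shows the bottom layer,
    and anything in between mixes the two (a soft eraser edge).
    """
    return [
        [round(m * t + (1 - m) * b) for t, b, m in zip(top_row, bottom_row, mask_row)]
        for top_row, bottom_row, mask_row in zip(top, bottom, mask)
    ]


# Hypothetical 2x2 layers: a bright "people" layer on top of a darker "sun" layer.
people_layer = [[200, 200], [200, 200]]
sun_layer = [[50, 50], [50, 50]]
# 1.0 = untouched, 0.5 = half erased, 0.0 = fully erased
eraser_mask = [[1.0, 0.5], [0.0, 1.0]]

print(composite(people_layer, sun_layer, eraser_mask))  # [[200, 125], [50, 200]]
```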

You may also see I drew in longer hair and more female proportions for the top-left person. Representation is a huge issue with AI generators due to the training data they use. In Midjourney, if you create, say, an office scene with a group of people, they'll mostly be white (and often, mostly men), whereas my aim is to have a diverse crowd. Sure, you can specifically use keywords to get certain ethnicities (and the word "diverse" can help), but it often then biases everything towards that keyword – maybe I want an Asian person in an office setting without a somewhat stereotypical cherry blossom tree and traditional dress (and I do love cherry blossom trees). Generators like Dall-E have already made efforts to counter this, and I expect other generators to catch up in the future. As it is, I often go to Photoshop to change the skin color and other features of people in the results.

Back to our image – we now need a proper sign! I brainstormed several approaches, but I love the non-textual approach of artists like Mordillo, who worked in the 1970s and other decades. Look at this image of his – not a single word and you understand everything, and it hits you with an emotional oomph when you deduce its meaning!

So, instead of saying "No Flying" on the sign, I want it to grow on you visually.

I'm removing the sign image now – Photoshop's fantastic AI-assisted patch tools (but also their stamp tools) come in handy:

Finally, I'm drawing in the actual sign contents as its own layer. I took the red color from another image, liking and trusting Midjourney's color choice. I then aimed for imperfect, painted-look lines to fit with the rest of the image (and also tried leaving a bit of smudge and naturalness on the sign):

We're now typically nearing hour 3 or 4 in the process. You've now spent a portion of the day with your subject, had it grow on you, have a whole debate behind you with your co-creator Midjourney, might have taken different paths, might have reverted, might have – rarely, but it happened – given up completely. But if fate and energy allow, you've now arrived at the final result you're happy with!

In this case, all that was left to do was to crop the proportions (to 1024x1280, which means it won't be cropped further in Instagram) and to find a title, if any. ChatGPT again comes in handy at times to settle on a title! For instance, you can ask it about alternative ways to phrase something.
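For the crop step, here's a small sketch of the arithmetic, assuming you want the largest centered region with the same proportions as 1024x1280 (a 4:5 ratio). The function name and the 1024x1536 example size are assumptions for illustration, not actual Midjourney output sizes; in practice I just drag the crop tool in Photoshop.

```python
def center_crop_box(width, height, target_w=1024, target_h=1280):
    """Return (left, top, right, bottom) of the largest centered box
    that has the same aspect ratio as target_w x target_h."""
    # Compare aspect ratios by cross-multiplying, which avoids float rounding.
    if width * target_h > height * target_w:
        # Source is wider than the target ratio: keep full height, trim the sides.
        crop_h = height
        crop_w = height * target_w // target_h
    else:
        # Source is taller (or equal): keep full width, trim top and bottom.
        crop_w = width
        crop_h = width * target_h // target_w
    left = (width - crop_w) // 2
    top = (height - crop_h) // 2
    return left, top, left + crop_w, top + crop_h


# Hypothetical 2:3 portrait image of 1024x1536 pixels
print(center_crop_box(1024, 1536))  # (0, 128, 1024, 1408)
```

The box only says which region to keep; if the source is larger than 1024x1280, you'd still resize the cropped region down afterwards.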

The goal with many of my images is to have the end result remain somewhat ambiguous. Take the following image:


The Button

I could have named this "The Reset Button" or "The Launch Button" or "The Button that does [some specific thing]". But who am I to get in the way of your imagination and a good story? The point is that you, by looking at it, should dream up things that might be happening here. Half the fun for me is to see what creative interpretations people have, some very relevant to their lives. Isn't it beautiful to have an image with an infinite number of meanings?

Here's another example. It's a creation where the idea started with a baby flooding the whole room while eating. Among the many results, however, were also those that put the baby onto the plate (amazing idea, Midjourney!), and in this single one, it even raised the spoon – perfect to represent a new meaning, the commanding of a ship (and the parents) in the ocean of dirt, so we can emphasize that in the title!


The Captain

In our current case of flying people, the image tells all it needs to tell, so I could have left it untitled, but I simply settled for "Flying". It doesn't add or remove anything, but just anchors the creation to become a thing of its own. And now it's done!

So what is this thing we're doing?

There's a big debate today about what making images with AI really is. Is it art? Is it prompt engineering? Is it one-click-painting? Is it like "sculpting with language", as one creator called it? Is it the growing of a tree by watering it with your brain's ideas? Incidentally, that's one way to put it, which I tried to express in this AI creation:

But that debate is a subject for another day. For now, all I can say is, I'm currently doing this – whatever it may be – for around 10 hours every day, using every fiber of thought and focus and ideas that I have. The AI is perfect at the paint brush, it blows me away with lighting, it generates the most amazing angles. This feeling of the AI creating along with you also means I always credit it in the picture captions.

See, I've loved doodling, drawing and painting all my life (that, and programming and writing). Here's a Spielberg caricature I did many years ago, and some pixel-based game art...

... but I'm not the Midjourney-blows-you-away (and does so in minutes) airbrush painter. I'm now using that tool for expression. And I hope that you, when looking at the result, get a kick out of it. And that here and there, it brings comfort or progress. Artists tickle the brains of society through commentary or asking questions – and I really hope I'm tickling yours!