Stable Diffusion + ControlNet

I’ve been trying my best to keep up with all things image-generation related. I got really into GANs a couple of years ago and took some online courses to get familiar with the topic. Coming from GANs, the diffusion process was pretty easy to understand, so I really only had to read up on text/image embeddings and CLIP.

I built Stable Diffusion on my PC and started playing around with prompts. I didn’t even realize there was a web UI at first, so I was just making CLI runs in WSL2. I had some environment issues early on since I couldn’t get Facebook’s xformers installed. I could still make images without it, but generation ate up more of my GPU memory. I found that installing xformers first and then installing all of Stable Diffusion’s other requirements worked.
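
For reference, here is roughly what a text-to-image run with xformers enabled looks like in Python using Hugging Face’s diffusers library — not the exact CLI scripts I was running, and the checkpoint name, prompt, and filename are just placeholders:

```python
import torch
from diffusers import StableDiffusionPipeline

# Load a Stable Diffusion 1.x checkpoint in half precision to save GPU memory.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Memory-efficient attention from xformers; this only works if xformers
# installed cleanly alongside the rest of the requirements.
pipe.enable_xformers_memory_efficient_attention()

image = pipe("a castle on a hill at sunset", num_inference_steps=30).images[0]
image.save("castle.png")
```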

To the right are some of my first prompts that I found amusing. The first is “glorfindel defeating a balrog” and the second is “a picture of a power ranger taken in the 1800s”.

Once I found out about the web UI, I played around with upscaling.
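
The web UI ships its own upscalers (ESRGAN and friends); a diffusion-based alternative you can run straight from Python is the 4x upscaler pipeline in diffusers. This is just a sketch of that route, reusing the placeholder image from the earlier example:

```python
import torch
from PIL import Image
from diffusers import StableDiffusionUpscalePipeline

upscaler = StableDiffusionUpscalePipeline.from_pretrained(
    "stabilityai/stable-diffusion-x4-upscaler", torch_dtype=torch.float16
).to("cuda")

# The x4 upscaler works on small inputs (e.g. 128x128 -> 512x512) and takes the
# original prompt as guidance for the added detail.
low_res = Image.open("castle.png").convert("RGB").resize((128, 128))
upscaled = upscaler(prompt="a castle on a hill at sunset", image=low_res).images[0]
upscaled.save("castle_4x.png")
```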

A couple of weeks after messing around with Stable Diffusion, I saw some posts online about ControlNet. I was unaware of AUTOMATIC1111’s GitHub repo that allows for a bunch of extensions; when I first started out I was just using Stability AI’s repo. I then started looking into extensions like ControlNet and DreamBooth. One of the features of ControlNet is that you can input an image that guides image generation. In the example below, I input a sketch by Albrecht Dürer, selected Canny edge detection, and typed out a prompt like “a Flemish painting of a winged person playing the lute”. The pose from the original sketch is exactly the same in the AI-generated ones!
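
Here is a sketch of that Canny-guided run using diffusers’ ControlNet pipeline rather than the web UI extension — the file paths and checkpoint names are assumptions, not what I actually had on disk:

```python
import cv2
import numpy as np
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

# 1. Turn the Durer sketch into a Canny edge map -- this is the control image.
sketch = cv2.imread("durer_sketch.png", cv2.IMREAD_GRAYSCALE)
edges = cv2.Canny(sketch, 100, 200)
control_image = Image.fromarray(np.stack([edges] * 3, axis=-1))  # 3-channel image

# 2. Load a Canny-conditioned ControlNet alongside a base SD 1.5 checkpoint.
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

# 3. The edge map constrains the composition while the prompt restyles it.
result = pipe(
    "a Flemish painting of a winged person playing the lute",
    image=control_image,
    num_inference_steps=30,
).images[0]
result.save("flemish_lute.png")
```

Because the edge map conditions every denoising step, the composition stays pinned down while the prompt is free to change the style and details around it — which is why the pose comes through untouched.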

The same can be done with depth maps, and there is even an option to sketch something out with your cursor and use that to guide image generation.
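
A depth-guided run looks almost identical — swap in a depth-trained ControlNet and hand it a depth map instead of an edge map. Again just a sketch; the depth estimator, checkpoint names, and file paths here are my own assumptions:

```python
import numpy as np
import torch
from PIL import Image
from transformers import pipeline
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

# Estimate depth from a reference image and expand it to 3 channels for ControlNet.
depth = pipeline("depth-estimation")(Image.open("reference.png"))["depth"]
depth_rgb = Image.fromarray(np.stack([np.array(depth)] * 3, axis=-1))

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-depth", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

# The depth map fixes the scene's layout; the prompt decides how it is rendered.
result = pipe(
    "a Flemish painting of a winged person playing the lute",
    image=depth_rgb,
    num_inference_steps=30,
).images[0]
result.save("depth_guided.png")
```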