5 Comments
Martin JJ. Bucher:

very interesting fact, i didn't know this!! great read.

one question where i'm confused: the 2x2 plot with the duck and moon walking shows the reconstruction error on the left after zeroing out each channel in the latent space.

however, if i understand correctly, we zero out the values in the 64x64x4 latent, so each channel gets zeroed out in the 64x64 "image". how does this then map to the 512x512 error map? is it just regular matplotlib upscaling, or is there some other mapping happening? i didn't check reddit/hackernews, was just wondering if you know more by any chance! or did i miss something crucial?

Massimiliano Viola:

We get a 64x64x4 latent when we generate an image. Decoding it gives the 512x512 original image.

Now, that plot shows the error on the decoded image if you zero out that 1x1x4 position, compared to just decoding with no changes (the original image). You get one value per latent position, so the map itself is 64x64.

Anyway, the best answer here is code. Just scroll down mid-post :)

https://www.reddit.com/r/StableDiffusion/comments/1ag5h5s/the_vae_used_for_stable_diffusion_1x2x_and_other/
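
For reference, the loop boils down to something like this (a minimal sketch using diffusers' AutoencoderKL; the checkpoint, variable names, and mean-abs-error metric are my assumptions, not the exact snippet from the post):

```python
import torch
from diffusers import AutoencoderKL

# Assumed checkpoint: any SD 1.x/2.x KL-F8 VAE behaves the same way here.
vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse").eval()

@torch.no_grad()
def divergence_map(latent):
    """latent: (1, 4, 64, 64) tensor -> (64, 64) error map."""
    baseline = vae.decode(latent).sample  # (1, 3, 512, 512) reference decode
    div = torch.zeros(latent.shape[2], latent.shape[3])
    for y in range(latent.shape[2]):
        for x in range(latent.shape[3]):
            perturbed = latent.clone()
            perturbed[:, :, y, x] = 0.0  # zero out that single 1x1x4 position
            decoded = vae.decode(perturbed).sample
            # collapse the whole 512x512 decode into one scalar, so the
            # resulting map has exactly one value per latent position
            div[y, x] = (decoded - baseline).abs().mean()
    return div
```

That's 4096 decodes for a 64x64 latent, so it's slow, but it makes the shapes explicit: the error map is 64x64 by construction.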

Martin JJ. Bucher:

yes! i saw that snippet in the reddit post as well :) my question was more about how the 64×64 error data becomes the 512×512 visualization, and whether it's matplotlib upscaling or not (since we call plt.imshow(divergence) with divergence being 64x64, but the error map you show looks like 512x512?). i am just a bit confused about the mapping from 64 to 512 res.

Massimiliano Viola:

I see! No wait, the visual is 64x64, not 512x512. I just displayed the divergence as it came out of that piece of code; matplotlib stretches the array to the figure size when rendering, which is why it can look higher-res.
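
You can see the effect with any 64x64 array (a toy sketch; the random data is just a stand-in for the real divergence map):

```python
import numpy as np
import matplotlib.pyplot as plt

divergence = np.random.rand(64, 64)  # stand-in for the real 64x64 error map

# imshow stretches the array to fill the axes, so a 64x64 map renders at
# whatever pixel size the figure gives it; interpolation='nearest' keeps
# the blocky 64x64 pixels visible instead of smoothing them.
plt.imshow(divergence, interpolation="nearest")
plt.colorbar()
plt.show()
```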

Martin JJ. Bucher:

oops, my bad! it just looked like a 512-res image.
