5 Comments
Martin JJ. Bucher:

very interesting fact, i didn't know this!! great read.

one question where i'm confused: the 2x2 plot with the duck and moon walking shows the reconstruction error on the left after zeroing out each channel in the latent space.

however, if i understand correctly, we zero out the values in the 64x64x4 latent, so each channel gets zeroed out in the 64x64 "image". how does this then map to the 512x512 error map? is it just regular matplotlib upscaling, or is there some other mapping happening? i didn't check reddit/hackernews, was just wondering if you know more by any chance! or did i miss something crucial?

Massimiliano Viola:

We get a 64x64x4 latent when we generate an image. Decoding it gives the 512x512 original image.

Now, that plot shows the error on the decoded image if you zero out that 1x1x4 position, compared to just decoding with no changes (the original image). You get one value per latent position, so the map itself is 64x64.

Anyway, the best answer here is code. Just scroll down mid-post :)

https://www.reddit.com/r/StableDiffusion/comments/1ag5h5s/the_vae_used_for_stable_diffusion_1x2x_and_other/
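
For reference, the loop boils down to something like this (a minimal sketch using diffusers' AutoencoderKL; the checkpoint, variable names, and mean-abs-error metric are my assumptions, not the exact snippet from the post):

```python
import torch
from diffusers import AutoencoderKL

# Assumed checkpoint: any SD 1.x/2.x KL-F8 VAE behaves the same way here.
vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse").eval()

@torch.no_grad()
def divergence_map(latent):
    """latent: (1, 4, 64, 64) tensor -> (64, 64) error map."""
    baseline = vae.decode(latent).sample  # (1, 3, 512, 512) reference decode
    div = torch.zeros(latent.shape[2], latent.shape[3])
    for y in range(latent.shape[2]):
        for x in range(latent.shape[3]):
            perturbed = latent.clone()
            perturbed[:, :, y, x] = 0.0  # zero out that single 1x1x4 position
            decoded = vae.decode(perturbed).sample
            # collapse the whole 512x512 decode into one scalar, so the
            # resulting map has exactly one value per latent position
            div[y, x] = (decoded - baseline).abs().mean()
    return div
```

That's 4096 decodes for a 64x64 latent, so it's slow, but it makes the shapes explicit: the error map is 64x64 by construction.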

Martin JJ. Bucher:

yes! i saw that snippet in the reddit post as well :) my question was more about how the 64×64 error data becomes the 512×512 visualization, and whether it's matplotlib upscaling or not (since we call plt.imshow(divergence) with divergence being 64x64, but the error map you show looks like 512x512?). i am just a bit confused about the mapping from 64 to 512 res.

Massimiliano Viola:

I see! No wait, the visual is 64x64, not 512x512. I just displayed the divergence as it came out of that piece of code; matplotlib stretches the array to the figure size when rendering, which is why it can look higher-res.
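
You can see the effect with any 64x64 array (a toy sketch; the random data is just a stand-in for the real divergence map):

```python
import numpy as np
import matplotlib.pyplot as plt

divergence = np.random.rand(64, 64)  # stand-in for the real 64x64 error map

# imshow stretches the array to fill the axes, so a 64x64 map renders at
# whatever pixel size the figure gives it; interpolation='nearest' keeps
# the blocky 64x64 pixels visible instead of smoothing them.
plt.imshow(divergence, interpolation="nearest")
plt.colorbar()
plt.show()
```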

Martin JJ. Bucher:

oops, my bad! it just looked like a 512-res image.
