Discussion about this post

Neural Foundry
Excellent breakdown of how transformers became the new backbone for diffusion models. The adaLN-Zero conditioning mechanism is particularly clever because it lets every layer adapt dynamically to the noise level instead of forcing the model to learn one static denoising path. I've noticed in production that smaller patch sizes do improve quality, but the quadratic attention cost gets brutal fast, so there's always a tradeoff that I'm not sure enough people consider.
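The adaLN-Zero idea mentioned above can be sketched in a few lines. This is a toy numpy illustration, not the DiT implementation: the conditioning projection (shapes here are made up) maps the noise-level embedding to a per-layer shift, scale, and gate, and because that projection is zero-initialized, every block starts out as the identity on the residual stream.

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    # Plain LayerNorm without learned affine parameters; the affine part
    # is supplied adaptively by the conditioning signal instead.
    mu = x.mean(-1, keepdims=True)
    var = x.var(-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

class AdaLNZeroBlock:
    """Toy transformer sub-block with adaLN-Zero conditioning (sketch)."""

    def __init__(self, dim, cond_dim, rng):
        # Conditioning projection: cond -> (shift, scale, gate).
        # Zero-initialized, so at init shift = scale = gate = 0 and the
        # whole block reduces to the identity (the "Zero" in adaLN-Zero).
        self.W_cond = np.zeros((cond_dim, 3 * dim))
        self.b_cond = np.zeros(3 * dim)
        # Stand-in for the block's attention/MLP: a random linear map.
        self.W_body = rng.standard_normal((dim, dim)) / np.sqrt(dim)

    def __call__(self, x, cond):
        shift, scale, gate = np.split(
            cond @ self.W_cond + self.b_cond, 3, axis=-1
        )
        h = layer_norm(x) * (1 + scale) + shift  # noise-adaptive modulation
        h = h @ self.W_body                      # block computation
        return x + gate * h                      # gated residual; zero at init

rng = np.random.default_rng(0)
block = AdaLNZeroBlock(dim=8, cond_dim=4, rng=rng)
x = rng.standard_normal((2, 8))  # two tokens
c = rng.standard_normal((2, 4))  # noise-level conditioning embedding
out = block(x, c)
assert np.allclose(out, x)  # identity at initialization
```

Once training nudges the conditioning weights away from zero, each layer's scale/shift/gate become functions of the noise level, which is what lets the same stack behave differently at high and low noise.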

