Birbla

I have reimplemented Stable Diffusion 3.5 from scratch in pure PyTorch (github.com)
481 points by yousef_g - 3 weeks ago

Although I'm leaning heavily away from being passionate about software development, this is a cool project, and it's freaking awesome how anyone can now reinvent the wheel from first principles.
by squircle - 3 weeks ago
Sounds like a great resource for learners.
Just wondering aloud --
Is there a tutorial/explainer, by any chance, that a beginner could use to follow along and learn how this is done?
by albert_e - 3 weeks ago
I'm not sure what this means. If it means the Stable Diffusion 3.5 model, why is it fetching that here: https://github.com/yousef-rafat/miniDiffusion/blob/main/enco...
The training dataset is very small, only including fashion-related pictures: https://github.com/yousef-rafat/miniDiffusion/tree/main/data...
by reedlaw - 3 weeks ago
Add a Hugging Face Token in get_checkpoints.py before running the script.
Can you be a bit more specific here? It's not clear what such a token is, what it takes to get one, or where it would be placed in get_checkpoints.py.
by CamperBob2 - 3 weeks ago
Cool. Can it still make images of Anne Hathaway leading a herd of blue giraffes on the Moon?
by theturtle - 3 weeks ago
If you are interested in this: the Flux reference implementation is very minimalistic: https://github.com/black-forest-labs/flux/tree/main/src/flux
The minRF project makes it very easy to start training small diffusion models with rectified flow: https://github.com/cloneofsimo/minRF
Also, the reference implementation of SD 3.5 is actually minimalistic too: https://github.com/Stability-AI/sd3-ref
by liuliu - 3 weeks ago
Are there any notable properties of this implementation? Are some parts slower, faster, etc.?
by vergessenmir - 3 weeks ago
So, that's Stable Diffusion without license constraints, is it?
by NoelJacob - 3 weeks ago
How usable is the original academic source available from the Ludwig Maximilian University CompVis group?
by caycep - 3 weeks ago
I find it hilarious that “from scratch” now somehow means “in PyTorch”.
by eapriv - 3 weeks ago
I'm embarrassed to ask: can someone elaborate on, say, what we have now that we didn't have before the repo existed?
I have studiously avoided making models, though I've been adjacent to their output for years now... I think the root of my confusion is I kinda assumed there were already PyTorch-based scripts for inference / training. (I assumed _at least_ inference scripts were released with models, and kinda figured fine-tuning / training ones were too)
So then I'm not sure if I'm just looking at a clean room / dirty room rewrite of those. Or maybe everyone is using "PyTorch" but it's usually calling into CUDA/C/some proprietary thingy that is much harder to grok than a pure PyTorch impl?
Anyways, these aren't great guesses, so I'll stop myself here. :)
by refulgentis - 3 weeks ago
now do it in minecraft
by hkon - 3 weeks ago
When I think of SD 3.5 (or any version) I think of the portion that results from training, i.e., the weights. The code seems less important? I mean as far as output quality is concerned, or performance. But I'm honestly not sure, and not trying to judge these efforts on that basis.
by ineedasername - 3 weeks ago
Does using pure PyTorch improve performance on non-NVIDIA cards in any way? Or is PyTorch so highly optimized for CUDA that no other GPU vendors have a chance?
by Dwedit - 3 weeks ago

       self.q = nn.Linear(embed_size, embed_size, bias = False)
       self.k = nn.Linear(embed_size, embed_size, bias = False)
       self.v = nn.Linear(embed_size, embed_size, bias = False)

Try

       self.qkv = nn.Linear(embed_size, 3*embed_size, bias = False)

    def forward(...):
       ...
       qkv = self.qkv(x)
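
For anyone following along, here is a minimal sketch of how the fused projection could be finished and checked against the three separate layers. It is not the repo's code: `FusedQKVAttention` and `fuse_qkv` are hypothetical names, the attention is single-head with no masking, and it assumes all three projections share the same input and output size.

```python
import math

import torch
import torch.nn as nn


def fuse_qkv(q: nn.Linear, k: nn.Linear, v: nn.Linear) -> nn.Linear:
    """Hypothetical helper: pack three bias-free projections into one layer.

    Assumes q, k and v all map embed_size -> embed_size.
    """
    embed_size = q.in_features
    fused = nn.Linear(embed_size, 3 * embed_size, bias=False)
    with torch.no_grad():
        # nn.Linear stores weights as (out_features, in_features), so the
        # three matrices are stacked along dim 0.
        fused.weight.copy_(torch.cat([q.weight, k.weight, v.weight], dim=0))
    return fused


class FusedQKVAttention(nn.Module):
    """Illustrative single-head self-attention with one fused QKV matmul."""

    def __init__(self, embed_size: int):
        super().__init__()
        self.qkv = nn.Linear(embed_size, 3 * embed_size, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, embed_size)
        qkv = self.qkv(x)                        # (batch, seq_len, 3 * embed_size)
        q, k, v = qkv.chunk(3, dim=-1)           # each (batch, seq_len, embed_size)
        scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
        return torch.softmax(scores, dim=-1) @ v


if __name__ == "__main__":
    torch.manual_seed(0)
    embed_size = 8
    q, k, v = (nn.Linear(embed_size, embed_size, bias=False) for _ in range(3))
    fused = fuse_qkv(q, k, v)
    x = torch.randn(2, 5, embed_size)
    fq, fk, fv = fused(x).chunk(3, dim=-1)
    # The fused layer should reproduce the separate projections up to rounding.
    print(torch.allclose(fq, q(x), atol=1e-5))
```

The math is unchanged; the point is one larger matmul in place of three smaller ones, which usually means fewer kernel launches. Whether that is measurably faster depends on the hardware and tensor sizes, so it is worth benchmarking rather than assuming.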

by godelski - 3 weeks ago

Pure pytorch?
by nothrowaways - 3 weeks ago