[HN Gopher] A Web UI for Stable Diffusion
___________________________________________________________________

A Web UI for Stable Diffusion

Author : feross
Score  : 72 points
Date   : 2022-09-09 20:02 UTC (2 hours ago)

(HTM) web link (github.com)
(TXT) w3m dump (github.com)

| sharps_xp wrote:
| If someone can dockerize this, please reply with a link!

| stephanheijl wrote:
| This is the dockerized version of this repo:
| https://github.com/AbdBarho/stable-diffusion-webui-docker

| hwers wrote:
| People recently figured out how to export Stable Diffusion to ONNX, so it'll be exciting to see some _actual_ web UIs for it soon (via quantized models and tfjs/onnxruntime for the web).

| ducktective wrote:
| It seems Midjourney generates better results than SD or DALL-E.
|
| While we're at it: what's with the "hyper resolution" and "4K, detailed" adjectives that get thrown around left and right?

| Ckalegi wrote:
| The metadata and file names of the images in the source dataset are also inputs for the model training. These keywords are common tags across images that share those characteristics, so in the same way the model knows what a unicorn looks like, it also knows what a "4K" unicorn looks like compared to a "hyper res" one.

| schleck8 wrote:
| Those are prompt engineering keywords. SD is far more reliant on tinkering with the prompt than Midjourney is.
|
| https://moritz.pm/posts/parameters

| jrm4 wrote:
| So, and this is an ELI5 kind of question I suppose: there must be something going on like "processing a kazillion images", and I'm trying to wrap my head around how (or what part of) that work is "offloaded" to your home computer/graphics card. I just can't make sense of how you can do it at home if you're not somehow in direct contact with "all the data". E.g. must you be connected to the internet, or to "Stable Diffusion's servers", for this to work?

| juliendorra wrote:
| That's the interesting part: all the generated images are derived from a model of less than 4 GB (the trained weights of the neural network).
|
| So in a way, hundreds of billions of possible images are all stored in the model (each a vector in a multidimensional latent space) and turned into pixels on demand (driven by the language model that knows how to turn words into a vector in this space).
|
| As it's deterministic (given the exact same request parameters, random seed included, you get the exact same image), it's a form of compression (or at least encoding/decoding) too: I could send you the parameters for 1 million images that you would be able to recreate on your side, as just a relatively small text file.
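To make the "parameters as compression" point above concrete, here is a minimal sketch of regenerating an image from a tiny parameter record. It assumes the Hugging Face diffusers library and the CompVis/stable-diffusion-v1-4 weights rather than the web UI in the linked repo, and the exact API details vary by diffusers version:

    # Sketch: a whole image "compressed" down to a few dozen bytes of parameters.
    # Assumes the `diffusers` and `torch` packages and a CUDA GPU; the ~4 GB of
    # weights are downloaded once and must be identical on both ends.
    import json
    import torch
    from diffusers import StableDiffusionPipeline

    # The record you would actually send someone.
    params = {
        "prompt": "a photo of an astronaut riding a horse on mars",
        "seed": 42,
        "steps": 50,
        "guidance_scale": 7.5,
    }
    print(json.dumps(params))  # this small string is all that needs to be shared

    pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")
    pipe = pipe.to("cuda")

    # Same parameters + same seed -> the same image (on a matching software stack).
    generator = torch.Generator("cuda").manual_seed(params["seed"])
    image = pipe(
        params["prompt"],
        num_inference_steps=params["steps"],
        guidance_scale=params["guidance_scale"],
        generator=generator,
    ).images[0]
    image.save("astronaut.png")

Only the JSON record has to travel; the multi-gigabyte weights sit on both ends.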
| codefined wrote:
| All those "kazillion" images are processed into a single "model". Similar to how our brain cannot remember 100% of our experiences, this model does not store precise copies of the images it was trained on. However, it does learn concepts, such as what a unicorn looks like.
|
| For Stable Diffusion, the current model is ~4 GB, which is downloaded the first time you run it. Those ~4 GB encode all the information the model needs to derive your images.

| ducktective wrote:
| As someone with ~0 knowledge in this field: I think this has to do with a concept called "transfer learning", in which you train once on that kazillion of images and then reuse the same "coefficients" for further runs of the NN.

| sC3DX wrote:
| What you interact with as the user is the model and its weights.
|
| The model (presumably some kind of convolutional neural network) has many layers, every layer has some set of nodes, and every node has a weight, which is just some coefficient. The weights are "learned" during model training, where the model takes in the data you mention and its output is evaluated. This typically happens on a super beefy computer and can take a long time for a model like this. As images are evaluated, the weights get adjusted accordingly and the output gets better.
|
| Now we as users just need the model and the weights!

| dwohnitmok wrote:
| This is the main reason why attempts to say that these sorts of AI are just glorified lookup tables, or even that they are simply tools that mash a kazillion images together, are very misleading.
|
| A kazillion images are used in training, but training consists of using those images to tune on the order of ~5 GB of weights, and that is the entire size of the final model. The images themselves are never stored anywhere else and are discarded immediately after being used to tune the model. Those ~5 GB generate all the images we see.

| YoshikiMiki wrote:
| Give this a shot: https://pinegraph.com/create :)

| user-one1 wrote:
| It can be run directly in Google Colab:
| https://colab.research.google.com/drive/1Iy-xW9t1-OQWhb0hNxu...

| [deleted]

| amelius wrote:
| Regarding the opening image: if it can't correctly put the marks on dice, how can it put eyes, nose and mouth correctly on a human face?

| bastawhiz wrote:
| Presumably the number of faces in the training set exceeds the number of dice by more than a few orders of magnitude.

| smrtinsert wrote:
| I have a 6 GB 1660 Ti, barely holding on. Is a new 12 GB card good enough for now, or should I go even higher to be safe for a few years of SD innovation?

| fassssst wrote:
| The GeForce 4000 series is about to launch and should make Stable Diffusion wayyyyy faster, based on related H100 benchmarks posted today.

| drexlspivey wrote:
| How is M1/M2 support for SD? Is there a significant performance drop? Presumably you could buy a 32 GB M2 and be future-proof because of the shared memory between CPU and GPU.

| jjcon wrote:
| In my setup at least, it runs essentially in CPU mode, since there is no CUDA acceleration available and Metal support is really messy right now. So while it's quite slow, at least I don't run into memory issues. It runs much faster on my desktop GPU, but that has more constraints (until I upgrade my personal 1080 to a 3090 one of these days).

| totoglazer wrote:
| There was a long thread last week. It's honestly pretty good if you follow the instructions. 30-40 seconds/image.

| wyldfire wrote:
| It sounds like there are forks that are able to work with <=8 GB cards. And I'm not sure, but I think the weights are stored as f32, so switching to half precision might make it easier still to get this to work with less memory.
|
| But yeah, the next generation of models would probably capitalize on more memory somehow.
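A rough back-of-the-envelope sketch of that half-precision point. The parameter counts below are approximate public figures for Stable Diffusion v1.x, not numbers taken from this repo, so treat them as ballpark values:

    # Sketch: why dropping from fp32 to fp16 roughly halves the weight memory.
    # Approximate Stable Diffusion v1.x parameter counts (assumed, not exact):
    # UNet ~860M, CLIP text encoder ~123M, VAE ~84M.
    PARAMS = {
        "unet": 860_000_000,
        "text_encoder": 123_000_000,
        "vae": 84_000_000,
    }
    BYTES_PER_PARAM = {"fp32": 4, "fp16": 2}

    total_params = sum(PARAMS.values())
    for precision, nbytes in BYTES_PER_PARAM.items():
        gib = total_params * nbytes / 1024**3
        print(f"{precision}: ~{gib:.1f} GiB of weights")

    # fp32: ~4.0 GiB, fp16: ~2.0 GiB, which is roughly why the checkpoint is
    # ~4 GB and why half precision helps the model fit on smaller cards.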
| filiphorvat wrote:
| People have reported that this repo even works with 2 GB cards if you run it with --lowvram and --opt-split-attention.

| password4321 wrote:
| Yes, the amount of VRAM doesn't seem to be as much of a limitation anymore. However, processing power is still important.

| CitrusFruits wrote:
| I'm using it with a 2070 (a 4-year-old card with 8 GB of VRAM) and it takes about 5 seconds for a 512x512 image. It's been plenty fast to have some fun with, but I think I'd want faster if it were part of a professional workflow.

| totoglazer wrote:
| What settings? That seems faster than expected.

| CitrusFruits wrote:
| It was the defaults for the web UI I used. Faster than I expected too, but the results were all legit.

| nowandlater wrote:
| This is the one I've been using: https://github.com/sd-webui/stable-diffusion-webui (docker-compose up, works great).

| wyldfire wrote:
| I like this one but had some trouble using img2img. Maybe my image was too small (it was smaller than 512x512). It failed with the same signature as an issue that was closed with a fix.

| ghilston wrote:
| I'm on mobile, but there's an issue reported on GitHub about img2img.
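If the img2img failure a couple of comments up really is a size problem, one possible workaround (a hedged guess, not a confirmed fix for that GitHub issue) is to upscale and crop the input to 512x512 before feeding it to img2img; SD v1.x is trained at 512x512 and generally wants dimensions that are multiples of 64. A minimal Pillow sketch:

    # Sketch: upscale a too-small input image to 512x512 before running img2img.
    # Assumes Pillow is installed; the 512x512 / multiple-of-64 rule is a general
    # Stable Diffusion v1.x convention, not something verified against this web UI.
    from PIL import Image

    def prepare_for_img2img(path: str, size: int = 512) -> Image.Image:
        img = Image.open(path).convert("RGB")
        # Scale the short side up to `size`, keeping the aspect ratio.
        scale = size / min(img.size)
        img = img.resize((round(img.width * scale), round(img.height * scale)),
                         Image.LANCZOS)
        # Center-crop to an exact size x size square.
        left = (img.width - size) // 2
        top = (img.height - size) // 2
        return img.crop((left, top, left + size, top + size))

    prepare_for_img2img("small_input.png").save("img2img_input.png")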