[HN Gopher] A Web UI for Stable Diffusion
       ___________________________________________________________________
        
       A Web UI for Stable Diffusion
        
       Author : feross
       Score  : 72 points
       Date   : 2022-09-09 20:02 UTC (2 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | sharps_xp wrote:
        | If someone can dockerize this, please reply with a link!
        
         | stephanheijl wrote:
         | This is the dockerized version of this repo:
         | https://github.com/AbdBarho/stable-diffusion-webui-docker
        
       | hwers wrote:
        | People recently figured out how to export Stable Diffusion to
        | ONNX, so it'll be exciting to see some _actual_ web UIs for it
        | soon (via quantized models and tfjs/onnxruntime for web).
        
       | ducktective wrote:
        | It seems Midjourney generates better results than SD or DALL-E.
        | 
        | And while we're at it, what's with the "hyper resolution" and
        | "4K, detailed" adjectives that get thrown around left and right?
        
         | Ckalegi wrote:
         | The metadata and file names of the images in the source data
         | set are also inputs for the model training. These keywords are
         | common tags across images that have these characteristics, so
          | in the same way it knows what a unicorn looks like, it also
          | knows what a 4K unicorn looks like compared to a hyper-res
          | unicorn.
        
         | schleck8 wrote:
          | Those are prompt engineering keywords. SD is far more reliant
          | on tinkering with the prompt than Midjourney.
         | 
         | https://moritz.pm/posts/parameters
        
       | jrm4 wrote:
       | So, and this is an ELI5 kind of question I suppose. There must be
       | something going on like "processing a kazillion images" and I'm
       | trying to wrap my head around how (or what part of) that work is
       | "offloaded" to your home computer/graphics card? I just can't
        | seem to make sense of how you can do it at home if you're not
        | somehow in direct contact with "all the data." E.g., must you be
        | connected to the internet, or to "Stable Diffusion's servers,"
        | for this to work?
        
         | juliendorra wrote:
          | That's the interesting part: all the images generated are
          | derived from a model of less than 4 GB (the trained weights of
          | the neural network).
          | 
          | So in a way, hundreds of billions of possible images are all
          | stored in the model (each a vector in multidimensional latent
          | space) and turned into pixels on demand (driven by the language
          | model that knows how to turn words into a vector in this
          | space).
          | 
          | As it's deterministic (given the exact same request parameters,
          | random seed included, you get the exact same image), it's a
          | form of compression (or at least encoding/decoding) too: I
          | could send you the parameters for 1 million images that you
          | would be able to recreate on your side, just as a relatively
          | small text file.
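          | 
          | As a minimal sketch of that idea, assuming the Hugging Face
          | diffusers library (the model name and parameter names below
          | are illustrative, not taken from the repo):
          | 
          |   import torch
          |   from diffusers import StableDiffusionPipeline
          | 
          |   pipe = StableDiffusionPipeline.from_pretrained(
          |       "CompVis/stable-diffusion-v1-4").to("cuda")
          | 
          |   # A "compressed image" is just its generation parameters.
          |   params = {"prompt": "a unicorn in a misty forest",
          |             "seed": 42, "steps": 50, "scale": 7.5}
          | 
          |   # Anyone holding the same ~4 GB of weights can rebuild the
          |   # exact same image from these few bytes of parameters.
          |   gen = torch.Generator("cuda").manual_seed(params["seed"])
          |   image = pipe(params["prompt"],
          |                num_inference_steps=params["steps"],
          |                guidance_scale=params["scale"],
          |                generator=gen).images[0]
          |   image.save("reconstructed.png")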
        
         | codefined wrote:
          | All those 'kazillion' images are distilled into a single
          | 'model'. Similar to how our brains cannot remember 100% of our
          | experiences, the model does not store precise copies of the
          | images it was trained on. However, it does learn concepts, such
          | as what a unicorn looks like.
          | 
          | For Stable Diffusion, the current model is ~4 GB, which is
          | downloaded the first time you run it. Those 4 GB encode all the
          | information the model needs to derive your images.
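          | 
          | Back-of-envelope, using SD v1's approximate published
          | parameter counts (a rough sketch, not exact figures):
          | 
          |   # ~860M UNet + ~123M text encoder + ~84M VAE parameters,
          |   # at 4 bytes per fp32 weight.
          |   unet, text_encoder, vae = 860e6, 123e6, 84e6
          |   size_gb = (unet + text_encoder + vae) * 4 / 1024**3
          |   print(f"{size_gb:.1f} GB")  # ~4.0 GB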
        
         | ducktective wrote:
          | As someone with ~0 knowledge in this field, I think this has to
          | do with a concept called "transfer learning": you train once on
          | that kazillion of images, then reuse the same "coefficients"
          | for further runs of the NN.
        
         | sC3DX wrote:
         | What you interact with as the user is the model and its
         | weights.
         | 
         | The model (presumably some kind of convolutional neural
         | network) has many layers, every layer has some set of nodes,
         | and every node has a weight, which is just some coefficient.
          | The weights are 'learned' during model training, where the
          | model takes in the data you mention and evaluates the output.
          | This typically happens on a super beefy computer and can take a
          | long time for a model like this. As images are evaluated, the
          | weights get adjusted accordingly and the output gets better.
         | 
         | Now we as the user just need the model and the weights!
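          | 
          | As a toy illustration of that weight-adjustment loop (a generic
          | PyTorch sketch, not Stable Diffusion's actual training code):
          | 
          |   import torch
          |   import torch.nn as nn
          | 
          |   model = nn.Sequential(nn.Linear(64, 128), nn.ReLU(),
          |                         nn.Linear(128, 64))
          |   opt = torch.optim.SGD(model.parameters(), lr=1e-3)
          | 
          |   for step in range(1000):
          |       x = torch.randn(32, 64)  # a batch of (fake) examples
          |       # Toy objective: reconstruct the input.
          |       loss = nn.functional.mse_loss(model(x), x)
          |       opt.zero_grad()
          |       loss.backward()  # gradients of the error w.r.t. weights
          |       opt.step()       # nudge the weights; this is "learning"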
        
         | dwohnitmok wrote:
          | This is the main reason why attempts to say that these sorts of
          | AI are just glorified lookup tables, or simply tools that mash
          | a kazillion images together, are very misleading.
         | 
          | A kazillion images are used in training, but training consists
          | of using those images to tune roughly 5 GB of weights, and that
          | is the entire size of the final model. Those images are never
          | stored anywhere else and are discarded immediately after being
          | used to tune the model. Those 5 GB generate all the images we
          | see.
        
       | YoshikiMiki wrote:
       | Give this a shot https://pinegraph.com/create :)
        
       | user-one1 wrote:
        | It can be run directly in Google Colab:
       | https://colab.research.google.com/drive/1Iy-xW9t1-OQWhb0hNxu...
        
       | amelius wrote:
       | Regarding the opening image: if it can't correctly put the marks
       | on dice, how can it put eyes, nose and mouth correctly on a human
       | face?
        
         | bastawhiz wrote:
          | Presumably the number of faces in the training set exceeds the
          | number of dice by a few orders of magnitude.
        
       | smrtinsert wrote:
        | I have a 6 GB 1660 Ti, barely holding on. Is a new 12 GB card
        | good enough for now, or should I go even higher to be safe for a
        | few years of SD innovation?
        
         | fassssst wrote:
          | The GeForce 4000 series is about to be released and, based on
          | the related H100 benchmarks posted today, should make Stable
          | Diffusion wayyyyy faster.
        
         | drexlspivey wrote:
          | How is M1/M2 support for SD? Is there a significant performance
          | drop? Presumably you could buy a 32 GB M2 and be future-proof
          | because of the shared memory between CPU and GPU.
        
           | jjcon wrote:
            | In my setup at least it runs essentially in CPU mode, since
            | there is no CUDA acceleration available and Metal support is
            | really messy right now. So while it's quite slow, I don't run
            | into memory issues. It runs much faster on my desktop GPU,
            | but that has more constraints (until I upgrade my personal
            | 1080 to a 3090 one of these days).
        
           | totoglazer wrote:
           | There was a long thread last week. It's honestly pretty good
           | if you follow the instructions. 30-40 seconds/image.
        
         | wyldfire wrote:
          | It sounds like there are forks that are able to work with
          | <=8 GB cards. And I'm not sure, but I think the weights are
          | stored as fp32, so switching to half precision might make it
          | easier still to get this to work with less memory.
          | 
          | But yeah, the next generation of models will probably
          | capitalize on more memory somehow.
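          | 
          | For example, with the diffusers library you can load the
          | weights in half precision, which roughly halves VRAM use (a
          | sketch, not any particular fork's code):
          | 
          |   import torch
          |   from diffusers import StableDiffusionPipeline
          | 
          |   pipe = StableDiffusionPipeline.from_pretrained(
          |       "CompVis/stable-diffusion-v1-4",
          |       torch_dtype=torch.float16,  # half-precision weights
          |   ).to("cuda")
          |   pipe.enable_attention_slicing()  # lower peak memory use
          |   image = pipe("a 4K unicorn").images[0]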
        
           | filiphorvat wrote:
            | People have reported that this repo even works with 2 GB
            | cards if you run it with --lowvram and --opt-split-attention.
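            | 
            | If that's AUTOMATIC1111's webui, the launch would look
            | something like this (the entry-point script name here is an
            | assumption; check the repo's README):
            | 
            |   python webui.py --lowvram --opt-split-attention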
        
             | password4321 wrote:
             | Yes, the amount of VRAM doesn't seem to be as much of a
             | limitation anymore. However, processing power is still
             | important.
        
         | CitrusFruits wrote:
          | I'm using it with a 2070 (a 4-year-old card with 8 GB VRAM) and
          | it takes about 5 seconds for a 512x512 image. It's been plenty
          | fast to have some fun, but I think I'd want faster if it were
          | part of a professional workflow.
        
           | totoglazer wrote:
           | What settings? That seems faster than expected.
        
             | CitrusFruits wrote:
             | It was the defaults for the webui I used. Faster than I
             | expected too, but the results were all legit.
        
       | nowandlater wrote:
        | This is the one I've been using:
        | https://github.com/sd-webui/stable-diffusion-webui
        | docker-compose up, and it works great.
        
         | wyldfire wrote:
          | I like this one, but I had some trouble using img2img. Maybe my
          | image was too small (it was smaller than 512x512). It failed
          | with the same signature as an issue that was closed with a fix.
        
           | ghilston wrote:
            | I'm on mobile, but there's an issue reported on GitHub about
            | img2img.
        
       ___________________________________________________________________
       (page generated 2022-09-09 23:00 UTC)