[HN Gopher] Generate images fast with SD 1.5 while typing on Gradio
       ___________________________________________________________________
        
       Generate images fast with SD 1.5 while typing on Gradio
        
       Author : smusamashah
       Score  : 76 points
       Date   : 2023-11-12 15:54 UTC (7 hours ago)
        
 (HTM) web link (twitter.com)
 (TXT) w3m dump (twitter.com)
        
       | Der_Einzige wrote:
        | The fact that LCM loras turn regular SD models into pseudo-LCM
       | models is insane.
       | 
       | Most people in the AI world don't understand that ML is like
       | actual alchemy. You can merge models like they are chemicals. A
       | friend of mine called it "a new chemistry of ideas" upon seeing
       | many features in Automatic1111 (including model and token merges)
       | used simultaneously to generate unique images.
       | 
       | Also, loras exist on a spectrum based on their dimensionality.
       | Tiny loras should only be capable of relatively tiny changes. My
       | guess is that this is a big lora, nearly the same size as the
       | base checkpoint.
        
         | keonix wrote:
          | Wait until you hear about frankenmodels. You rip parts of one
          | model (often attention heads) and transplant them into
          | another, and somehow that produces coherent results! Witchcraft
         | 
         | https://huggingface.co/chargoddard
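          | 
          | Roughly, the transplant is just copying tensors between state
          | dicts. A toy sketch of the idea (paths and layer names here
          | are made up, not from any of those repos):
          | 
          |   import torch
          | 
          |   # Both checkpoints must share an architecture so the
          |   # transplanted tensors have matching shapes.
          |   donor = torch.load("model_a.pt", map_location="cpu")
          |   recipient = torch.load("model_b.pt", map_location="cpu")
          | 
          |   # Copy layers 20-23 wholesale from donor into recipient.
          |   wanted = tuple(f"transformer.layers.{i}."
          |                  for i in range(20, 24))
          |   for key, tensor in donor.items():
          |       if key.startswith(wanted):
          |           recipient[key] = tensor.clone()
          | 
          |   torch.save(recipient, "frankenmodel.pt")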
        
           | GaggiX wrote:
           | >somehow that produces coherent results
           | 
           | with or without finetuning? Also is there a practical
           | motivation for creating them?
        
             | keonix wrote:
             | > with or without finetuning?
             | 
             | With, but it's still bonkers that it works so well
             | 
             | >Also is there a practical motivation for creating them?
             | 
              | You could get in-between model sizes (like 20b instead of
              | 13b or 34b). Before better quantization arrived it was
              | useful for inference (if you were unlucky with VRAM size),
              | but now I see this being useful only for training, because
              | you can't train on quants.
        
         | GaggiX wrote:
          | lcm-lora-sdv1-5 is 67.5M and lcm-lora-sdxl is 197M, so they
          | are much smaller than the full models. Would be cool to check
          | the rank used with these LoRAs, though.
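          | 
          | One way to check, assuming the file is a safetensors LoRA (key
          | names vary by format, so the suffixes below are guesses):
          | 
          |   from safetensors.torch import load_file
          | 
          |   # The "down" / "A" factor of a LoRA pair has shape
          |   # (rank, in_features), so its first dimension is the rank.
          |   state = load_file("lcm-lora-sdv1-5.safetensors")
          |   for key, tensor in state.items():
          |       if key.endswith(("lora_down.weight", "lora_A.weight")):
          |           print(key, "-> rank", tensor.shape[0])
          |           break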
        
           | liuliu wrote:
           | 64.
        
         | temp72840 wrote:
          | This is nuts. I did a double take at this comment - I thought
          | you _must_ have been talking about LoRAing an LCM distilled
          | from Stable Diffusion.
         | 
         | LCMs are spooky black magic, I have no intuitions about them.
        
           | ttul wrote:
           | When I was taking Jeremy Howard's course last fall, the
           | breakthrough in SD was going from 1000 steps to 50 steps via
           | classifier-free guidance, which is this neat hack where you
           | run inference with your conditioning and without and then mix
           | the result. To this day I still don't get it. But it works.
           | 
           | Now we find this way to skip to the end by building a model
           | that learns the high dimensional curvature of the path that a
           | diffusion process takes through space on its way to an
           | acceptable image, and we just basically move the model along
            | that path. That's my naive understanding of LCM. Seems too
            | good to be true, but it does work and it has a good
            | theoretical basis too. Makes you wonder what is next. Will
           | there be a single step network that can train on LCM to
           | predict the final destination? LoL that would be pushing
           | things too far..
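            | 
            | The "run with and without conditioning, then mix" step looks
            | roughly like this (assuming a diffusers-style UNet; names
            | are illustrative):
            | 
            |   def cfg_eps(unet, x, t, cond, uncond, scale=7.5):
            |       # Denoise twice: once with the text conditioning and
            |       # once with an empty ("unconditional") embedding.
            |       eps_c = unet(x, t, encoder_hidden_states=cond).sample
            |       eps_u = unet(x, t, encoder_hidden_states=uncond).sample
            |       # Push the prediction away from the unconditional
            |       # direction and toward the conditioned one.
            |       return eps_u + scale * (eps_c - eps_u)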
        
             | hadlock wrote:
             | Sounds like we've invented the kind of psychic time travel
              | they use in Minority Report. Let me show you right over to
             | the Future Crimes division. We're arresting this guy making
             | cat memes today because the curve of his online history
             | traces that of a radicalized blah blah blah
        
         | ttul wrote:
          | To me, the crazy thing about LoRAs is that they work perfectly
          | well adapting model checkpoints that were themselves derived
          | from the base model on which the LoRA was trained. So you can
          | take the LCM LoRA for SD1.5 and it works perfectly well on,
          | say, RealisticVision 5.1, a fine-tuned derivative of SD1.5.
         | 
          | You'd think that the fine tuning would make the LCM LoRA stop
          | working, but it doesn't. Apparently the changes in weights
          | introduced through even pretty heavy fine tuning do not wreck
          | the transformations the LoRA needs to make in order for LCM or
          | other LoRA adaptations to work.
         | 
         | To me this is alchemy.
        
           | yorwba wrote:
           | Finetuning and LoRAs both involve additive modifications to
           | the model weights. Addition is commutative, so the order in
           | which you apply them doesn't matter for the resulting
           | weights. Moreover, neural networks are designed to be
           | differentiable, i.e. behave approximately linearly with
           | respect to small additive modifications of the weights, so as
            | long as your finetuning and LoRA change the weights only a
            | little bit, you can finetune with or without the LoRA (or,
            | equivalently, train the LoRA on the finetuned model or on
            | its base) and get mostly the same result.
           | 
           | So this is something that can be somewhat explained using not
           | terribly handwavy mathematics. Picking hyperparameters on the
           | other hand...
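            | 
            | A toy illustration of the commutativity part (shapes and
            | scales are arbitrary):
            | 
            |   import torch
            | 
            |   torch.manual_seed(0)
            |   d, r = 8, 2
            |   W_base = torch.randn(d, d)
            |   dW_ft = 0.01 * torch.randn(d, d)   # finetuning delta
            |   B, A = torch.randn(d, r), torch.randn(r, d)
            |   dW_lora = 0.01 * (B @ A)           # low-rank LoRA delta
            | 
            |   # Merge order doesn't matter: addition commutes.
            |   same = torch.allclose((W_base + dW_ft) + dW_lora,
            |                         (W_base + dW_lora) + dW_ft)
            |   print(same)  # True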
        
         | smusamashah wrote:
          | Ok. I have seen the term LCM LoRA a number of times. I have
          | used both Stable Diffusion and LoRAs for fun for quite a
          | while. But I always thought this LCM LoRA was a new thing. It's
          | simply not possible using current samplers to return an image
          | in under 4 steps. What you are saying is that just by adding a
          | LoRA we can get existing models and samplers to generate a
          | good enough image in 4 steps?
        
           | jyap wrote:
           | Yes check out this blog post:
           | https://huggingface.co/blog/lcm_lora
           | 
            | I've used it with my home GPU. Really fast, which makes it
            | more interactive and real-time.
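            | 
            | A minimal sketch following that post (model IDs and settings
            | may need adjusting for your setup):
            | 
            |   import torch
            |   from diffusers import DiffusionPipeline, LCMScheduler
            | 
            |   pipe = DiffusionPipeline.from_pretrained(
            |       "runwayml/stable-diffusion-v1-5",
            |       torch_dtype=torch.float16,
            |   ).to("cuda")
            | 
            |   # Swap in the LCM scheduler and load the LCM LoRA weights.
            |   pipe.scheduler = LCMScheduler.from_config(
            |       pipe.scheduler.config)
            |   pipe.load_lora_weights(
            |       "latent-consistency/lcm-lora-sdv1-5")
            | 
            |   # Few-step generation with low guidance, as LCM expects.
            |   image = pipe("a photo of a cat", num_inference_steps=4,
            |                guidance_scale=1.0).images[0]
            |   image.save("cat.png")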
        
           | catwell wrote:
           | It's a different sampler too.
        
       | jimmySixDOF wrote:
       | And here is a demo mashed up using LeapMotion free space hand
       | tracking and a projector to manipulate a "bigGAN's high-
       | dimensional space of pseudo-real images" to make it more like a
       | modern dance meets sculpting meets spatial computing with a hat
       | tip to the 2008 work of Johnny Chung Lee while at Carnage Mellon.
       | 
       | https://x.com/graycrawford/status/1100935327374626818
        
       | smlacy wrote:
       | https://nitter.net/abidlabs/status/1723074108739706959
        
       | r-k-jo wrote:
       | Here is a collection of demos with fast LCM on HuggingFace
       | 
       | https://huggingface.co/collections/latent-consistency/latent...
        
       ___________________________________________________________________
       (page generated 2023-11-12 23:00 UTC)