[HN Gopher] Running Stable Diffusion in 260MB of RAM
       ___________________________________________________________________
        
       Running Stable Diffusion in 260MB of RAM
        
       Author : Robin89
       Score  : 189 points
       Date   : 2023-07-20 17:01 UTC (5 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | stmblast wrote:
       | This is really neat! Always cool to see what people can do with
       | less.
        
       | johnklos wrote:
       | In 260 megs of RAM?!? I'm going to try this on my Amiga!
       | 
       | Check back in a few months for my results...
        
         | 13of40 wrote:
         | Look at moneybags over here with his "megs" of RAM. I think
         | mine only had 256K available after the kickstart disk was
         | loaded.
        
           | johnklos wrote:
           | I splurged.
           | 
           | http://lilith.zia.io/
        
       | vikasr111 wrote:
       | Interesting. Which platform/PC config did you use?
        
       | crest wrote:
       | Does this mean you could fit its whole working set in the cache
       | hierarchy of a modern high end GPU getting near 100% ALU
       | utilisation?
        
         | Tuna-Fish wrote:
         | It streams the weights. This is going to be what limits
          | performance, not ALU utilization.
        
       | isoprophlex wrote:
        | Incredible! If only there were some cheap hackable eink frame, you
       | could make a fully self contained artwork from eink panel + rpi
       | that's (slowly) continuously updating itself..!
        
         | mananaysiempre wrote:
         | Like a continuously updating wall-mounted newspaper[1]?
         | 
         | [1] https://imgur.io/a/NoTr8XX
        
         | andrewmunsell wrote:
         | There definitely are some:
         | https://shop.pimoroni.com/search?q=e-ink
         | 
         | And now I think I know what my next project is going to be, I
         | am sure I can find some desk space
        
           | isoprophlex wrote:
           | Yessss! I looked into building some self contained "slow
           | tech" generative art using eink a couple of years ago but it
           | was just impossible for my tiny budget. This is great,
           | thanks!!
           | 
           | Edit..: I'm so hyped about this; the example image on TFA
            | takes 2+ hours to generate, but who cares?! I'd love to have
           | a little display that churns around in the background and
           | creates a new variation on my prompt every whatever hours,
           | displaying the results on an unobtrusive eink screen.
        
             | mw63214 wrote:
             | Is it possible to incorporate a personalized "context" into
             | the generator? Weather, market/news sentiments, calendar
             | events, etc... to style the end result.
        
               | qwertox wrote:
               | I love the idea.
        
             | causi wrote:
             | Make sure you build in a capacity to save all the previous
             | iterations in case you see something you really like.
        
               | isoprophlex wrote:
               | Haha I like the idea of walking past, glancing now and
               | then to see if there's something you really love...
               | 
               | but on the other hand I would also love the statement
               | behind something unconnected to the internet that's
               | slowly churning out unique, ephemeral pictures. Yours to
               | enjoy, then gone forever.
        
               | civilitty wrote:
               | You can make a digital sand mandala [1]
               | 
               | [1] https://en.m.wikipedia.org/wiki/Sand_mandala
        
       | nicollegah wrote:
       | Wait are these inference times real? 1 second on a Raspi? Do I
       | get this right? This is faster than on my GPU. What's going on
       | here?
        
         | xnzakg wrote:
         | Pretty sure that is just the text encoding step. Generating a
         | complete image took 3h if I read correctly.
         | 
         | update: "Tests were run on my development machine: Windows
         | Server 2019, 16GB RAM, 8750H cpu (AVX2), 970 EVO Plus SSD, 8
         | virtual cores on VMWare."
        
         | Kuinox wrote:
         | I think it's the inference time per iteration.
        
       | mottiden wrote:
       | Amazing work!
        
       | boredemployee wrote:
       | That's really cool! I always thought you needed a good amount of
       | GPU VRAM to generate images using SD.
       | 
        | I wonder how fast a consumer PC with no GPU and, say, 16GB of
        | RAM would generate an image.
        
         | atrus wrote:
          | I was using a 6ish-year-old AMD CPU with 16 gigs of RAM and
         | generating a prompt would take about a half hour. Which is
         | still massively impressive for what it is.
        
           | londons_explore wrote:
           | Use a free GPU from google colab and you can do the same in
           | about 15 seconds...
        
             | boredemployee wrote:
             | Do you have a google colab link?
        
               | hadlock wrote:
                | There is no shortage of Google Colab Stable Diffusion
                | tutorials on the web.
        
             | idiotsecant wrote:
             | yes, and if he does it on a paid machine with a better GPU
             | it'll be even faster!
             | 
              | While true, neither your statement nor mine above is germane
             | to the discussion. It wasn't about how long it takes. It's
             | a discussion of how cool it is that it can be done on that
             | machine at all.
        
         | wsgeorge wrote:
          | On an Apple M1 with 16GB of RAM, without using PyTorch
          | compiled to take advantage of Metal, it could take 12 minutes
          | to generate an image with a tweet-length prompt. With Metal,
          | it takes less than 60 seconds.
        
           | asynchronous wrote:
           | Metal is such an advantage, had no idea
        
           | ilkke wrote:
           | Prompt length shouldn't influence creation time, at least it
           | didn't in any of the implementations I used.
           | 
           | What is the resolution of your images and number of steps?
        
             | wsgeorge wrote:
             | Defaults from the Huggingface repo, just copy-pasted. So,
             | iirc 50 steps and the image is 512x512.
             | 
             | Edit: confirmed.
             | 
             | > Prompt length shouldn't influence creation time...
             | 
             | Yeah, checks out with my experience too. Longer prompts
             | were truncated.
        
               | Filligree wrote:
               | Some tools (e.g. Automatic1111) are able to feed in
               | longer prompts, but then the prompt length does affect
               | the speed of inference.
               | 
               | Albeit in 77 token increments.
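A minimal sketch of that chunking idea in Python — the chunk size and the padding scheme here are assumptions for illustration, not Automatic1111's actual implementation:

```python
# Hedged sketch: splitting a long prompt's token ids into
# fixed-size chunks for a text encoder with a 77-token context
# window. Pad id and padding scheme are illustrative.

def chunk_tokens(token_ids, chunk_size=77, pad_id=0):
    """Split a token list into chunk_size pieces, padding the last."""
    chunks = []
    for start in range(0, len(token_ids), chunk_size):
        chunk = token_ids[start:start + chunk_size]
        chunk += [pad_id] * (chunk_size - len(chunk))  # pad to full width
        chunks.append(chunk)
    return chunks

# Each extra chunk means another pass through the text encoder,
# which is why prompt length then affects inference speed.
```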
        
           | danieldk wrote:
           | And PyTorch on the M1 (without Metal) uses the fast AMX
           | matrix multiplication units (through the Accelerate
           | Framework). The matrix multiplication on the M1 is on par
            | with ~10 threads/cores of a Ryzen 5900X.
           | 
            | [1] https://github.com/danieldk/gemm-benchmark#example-results
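As a rough illustration of that kind of GEMM measurement: NumPy delegates large matrix multiplications to the platform BLAS (Accelerate on macOS, so on Apple Silicon this path exercises the AMX units). A quick timing sketch might look like this — sizes and method are illustrative, not the linked benchmark's actual setup:

```python
# Illustrative GEMM throughput check (not the linked benchmark).
# NumPy hands large matmuls to the platform BLAS; on macOS that
# is Accelerate, which uses the AMX units on Apple Silicon.
import time
import numpy as np

n = 1024
a = np.random.rand(n, n).astype(np.float32)
b = np.random.rand(n, n).astype(np.float32)

a @ b  # warm-up so one-time overhead isn't measured

iters = 20
start = time.perf_counter()
for _ in range(iters):
    a @ b
elapsed = time.perf_counter() - start

# A dense n x n matmul costs about 2*n^3 floating-point operations.
gflops = 2 * n**3 * iters / elapsed / 1e9
print(f"{gflops:.1f} GFLOP/s")
```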
        
       | zirgs wrote:
       | "It runs Stable Diffusion" is the new "It runs Doom".
        
         | Minor49er wrote:
         | Now I'm wondering: could a monkey hitting random keys on a
         | keyboard for an infinite amount of time eventually come up with
         | the right prompts to get GPT-4 to produce code that compiles to
         | a faithful reproduction of Doom?
        
           | LordDragonfang wrote:
           | Probably more easily than you'd think. DOOM is open
           | source[1], and as GP alludes, is probably the most frequently
           | ported game in existence, so its source code almost certainly
           | appears multiple times in GPT-4's training set, likely
           | alongside multiple annotated explanations.
           | 
           | [1] https://github.com/id-Software/DOOM
        
             | [deleted]
        
       | speedgoose wrote:
       | I like the use of a tiny device to generate the images. I was
       | wondering whether the energy consumption per image would be
       | lower, but I did the simple maths and it's not the case.
       | 
       | A raspberry pi zero 2W seems to use about 6W under load (source:
       | https://www.cnx-software.com/2021/12/09/raspberry-pi-zero-2-... )
       | 
       | So if it takes 3 hours to generate one picture, that's about 18Wh
       | per image.
       | 
       | A Nvidia Tesla or RTX GPU can generate a similar picture very
       | quickly. Assuming one second per image and 350W under load for
       | the whole system it's in the magnitude of 0.1Wh per image.
       | 
        | Of course we could consider that a Raspberry Pi Zero requires
        | far fewer resources and much less energy to manufacture and
        | transport.
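The back-of-envelope arithmetic above can be checked in a few lines (all figures are the parent comment's own rough estimates):

```python
# Energy-per-image comparison from the parent comment.
# All power and time figures are the comment's rough estimates.

pi_power_w = 6         # Raspberry Pi Zero 2 W under load
pi_time_h = 3          # ~3 hours per image
pi_wh = pi_power_w * pi_time_h             # Wh per image on the Pi

gpu_power_w = 350      # whole desktop system under load
gpu_time_h = 1 / 3600  # ~1 second per image
gpu_wh = gpu_power_w * gpu_time_h          # Wh per image on the GPU

print(pi_wh, round(gpu_wh, 2))  # -> 18 0.1
```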
        
         | hadlock wrote:
         | For on prem use, the up front cost is a lot lower. The A100
         | that most serious outfits are using runs in the thousands to
         | tens of thousands of dollars per unit with very limited
          | availability. The Pi is typically under $75 USD for any
         | variant.
        
           | speedgoose wrote:
            | An RTX 4090 offers much better value for Stable Diffusion,
            | but yes, if you start to think about cost, the Pi wins. If
            | you think about availability, I'm not sure.
        
             | hadlock wrote:
             | The big immediate plus here, is if you live somewhere with
             | limited access to the internet, you can still generate
             | imagery offline on a low end laptop, like a protest group
              | in far Eastern Europe or other areas. My personal travel
             | laptop only has 8GB memory so it's exciting to be able to
             | try out an idea even if I don't have high end hardware.
        
       | saqadri wrote:
        | Incredible! The march to get more models running on the edge
        | continues, much faster than I anticipated. The static
        | quantization and slicing techniques here are pretty cool.
        
         | asynchronous wrote:
         | I've been amazed at how quickly the open source community has
         | iterated on LLMs and Diffusion models. Goes to show how well
         | open source can work.
        
       ___________________________________________________________________
       (page generated 2023-07-20 23:00 UTC)