[HN Gopher] Running Stable Diffusion in 260MB of RAM
___________________________________________________________________

Running Stable Diffusion in 260MB of RAM

Author : Robin89
Score  : 189 points
Date   : 2023-07-20 17:01 UTC (5 hours ago)

(HTM) web link (github.com)
(TXT) w3m dump (github.com)

| stmblast wrote:
| This is really neat! Always cool to see what people can do with
| less.

| johnklos wrote:
| In 260 megs of RAM?!? I'm going to try this on my Amiga!
|
| Check back in a few months for my results...

  | 13of40 wrote:
  | Look at moneybags over here with his "megs" of RAM. I think
  | mine only had 256K available after the Kickstart disk was
  | loaded.

    | johnklos wrote:
    | I splurged.
    |
    | http://lilith.zia.io/

| vikasr111 wrote:
| Interesting. Which platform/PC config did you use?

| crest wrote:
| Does this mean you could fit its whole working set in the cache
| hierarchy of a modern high-end GPU, getting near 100% ALU
| utilisation?

  | Tuna-Fish wrote:
  | It streams the weights. This is going to be what limits
  | performance, not ALU utilization.

| isoprophlex wrote:
| Incredible! If only there was some cheap hackable eink frame,
| you could make a fully self-contained artwork from an eink
| panel + rpi that's (slowly) continuously updating itself..!

  | mananaysiempre wrote:
  | Like a continuously updating wall-mounted newspaper[1]?
  |
  | [1] https://imgur.io/a/NoTr8XX

  | andrewmunsell wrote:
  | There definitely are some:
  | https://shop.pimoroni.com/search?q=e-ink
  |
  | And now I think I know what my next project is going to be. I
  | am sure I can find some desk space.

    | isoprophlex wrote:
    | Yessss! I looked into building some self-contained "slow
    | tech" generative art using eink a couple of years ago, but
    | it was just impossible for my tiny budget. This is great,
    | thanks!!
    |
    | Edit..: I'm so hyped about this; the example image on TFA
    | takes 2+ hours to generate, but who cares?! I'd love to
    | have a little display that churns away in the background
    | and creates a new variation on my prompt every whatever
    | hours, displaying the results on an unobtrusive eink
    | screen.

      | mw63214 wrote:
      | Is it possible to incorporate a personalized "context"
      | into the generator? Weather, market/news sentiments,
      | calendar events, etc... to style the end result.

      | qwertox wrote:
      | I love the idea.

      | causi wrote:
      | Make sure you build in a capacity to save all the
      | previous iterations in case you see something you really
      | like.

        | isoprophlex wrote:
        | Haha, I like the idea of walking past, glancing now and
        | then to see if there's something you really love...
        |
        | but on the other hand I would also love the statement
        | behind something unconnected to the internet that's
        | slowly churning out unique, ephemeral pictures. Yours
        | to enjoy, then gone forever.

          | civilitty wrote:
          | You can make a digital sand mandala [1]
          |
          | [1] https://en.m.wikipedia.org/wiki/Sand_mandala
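(A minimal sketch of the frame idea in the thread above, assuming
a Pimoroni Inky panel driven by the "inky" library and a Stable
Diffusion pipeline that actually fits on the device; the model
id, prompt, and refresh interval are illustrative, not from TFA.
It also folds in causi's suggestion of archiving every
iteration:)

    import os
    import time
    from datetime import datetime
    from diffusers import StableDiffusionPipeline
    from inky.auto import auto  # Pimoroni e-ink driver, auto-detects the panel

    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5")  # illustrative model id
    pipe.to("cpu")                         # no GPU on a Pi-class device
    display = auto()

    PROMPT = "a quiet harbour at dawn, woodblock print"  # your prompt here
    os.makedirs("archive", exist_ok=True)

    while True:
        image = pipe(PROMPT, num_inference_steps=25).images[0]
        # Archive every iteration so a picture you love is never lost.
        image.save(f"archive/{datetime.now():%Y%m%d-%H%M%S}.png")
        display.set_image(image.resize(display.resolution))
        display.show()
        time.sleep(6 * 60 * 60)  # a new picture every few hours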
| nicollegah wrote:
| Wait, are these inference times real? 1 second on a Raspi? Do I
| get this right? This is faster than on my GPU. What's going on
| here?

  | xnzakg wrote:
  | Pretty sure that is just the text encoding step. Generating a
  | complete image took 3h, if I read correctly.
  |
  | update: "Tests were run on my development machine: Windows
  | Server 2019, 16GB RAM, 8750H cpu (AVX2), 970 EVO Plus SSD, 8
  | virtual cores on VMWare."

  | Kuinox wrote:
  | I think it's the inference time per iteration.

| mottiden wrote:
| Amazing work!

| boredemployee wrote:
| That's really cool! I always thought you needed a good amount
| of GPU VRAM to generate images using SD.
|
| I wonder how fast a consumer PC with no GPU would generate an
| image with, say, 16GB of RAM?

  | atrus wrote:
  | I was using a 6ish-year-old AMD CPU with 16 gigs of RAM, and
  | generating a prompt would take about half an hour. Which is
  | still massively impressive for what it is.

    | londons_explore wrote:
    | Use a free GPU from Google Colab and you can do the same in
    | about 15 seconds...

      | boredemployee wrote:
      | Do you have a Google Colab link?

        | hadlock wrote:
        | There is no shortage of Google Colab Stable Diffusion
        | tutorials on the web.

      | idiotsecant wrote:
      | Yes, and if he does it on a paid machine with a better
      | GPU it'll be even faster!
      |
      | While true, neither your statement nor mine above is
      | germane to the discussion. It wasn't about how long it
      | takes. It's a discussion of how cool it is that it can be
      | done on that machine at all.

  | wsgeorge wrote:
  | On an Apple M1 with 16GB of RAM, without using PyTorch
  | compiled to take advantage of Metal, it could take 12 minutes
  | to generate an image with a tweet-length prompt. With Metal,
  | it takes less than 60 seconds.

    | asynchronous wrote:
    | Metal is such an advantage; I had no idea.

    | ilkke wrote:
    | Prompt length shouldn't influence creation time; at least
    | it didn't in any of the implementations I used.
    |
    | What is the resolution of your images and number of steps?

      | wsgeorge wrote:
      | Defaults from the Huggingface repo, just copy-pasted. So,
      | iirc, 50 steps and the image is 512x512.
      |
      | Edit: confirmed.
      |
      | > Prompt length shouldn't influence creation time...
      |
      | Yeah, checks out with my experience too. Longer prompts
      | were truncated.

        | Filligree wrote:
        | Some tools (e.g. Automatic1111) are able to feed in
        | longer prompts, but then the prompt length does affect
        | the speed of inference.
        |
        | Albeit in 77-token increments.

    | danieldk wrote:
    | And PyTorch on the M1 (without Metal) uses the fast AMX
    | matrix multiplication units (through the Accelerate
    | Framework). Matrix multiplication on the M1 is on par with
    | ~10 threads/cores of a Ryzen 5900X.[1]
    |
    | [1] https://github.com/danieldk/gemm-benchmark#example-results

| zirgs wrote:
| "It runs Stable Diffusion" is the new "It runs Doom".

  | Minor49er wrote:
  | Now I'm wondering: could a monkey hitting random keys on a
  | keyboard for an infinite amount of time eventually come up
  | with the right prompts to get GPT-4 to produce code that
  | compiles to a faithful reproduction of Doom?

    | LordDragonfang wrote:
    | Probably more easily than you'd think. DOOM is open
    | source[1], and as GP alludes, is probably the most
    | frequently ported game in existence, so its source code
    | almost certainly appears multiple times in GPT-4's training
    | set, likely alongside multiple annotated explanations.
    |
    | [1] https://github.com/id-Software/DOOM

  | [deleted]

| speedgoose wrote:
| I like the use of a tiny device to generate the images. I was
| wondering whether the energy consumption per image would be
| lower, but I did the simple maths and it's not the case.
|
| A Raspberry Pi Zero 2 W seems to use about 6W under load
| (source:
| https://www.cnx-software.com/2021/12/09/raspberry-pi-zero-2-...)
|
| So if it takes 3 hours to generate one picture, that's about
| 18Wh per image.
|
| An Nvidia Tesla or RTX GPU can generate a similar picture very
| quickly. Assuming one second per image and 350W under load for
| the whole system, it's on the order of 0.1Wh per image.
|
| Of course, we could consider that a Raspberry Pi Zero uses a
| lot less resources and energy to be manufactured and
| transported.

  | hadlock wrote:
  | For on-prem use, the up-front cost is a lot lower. The A100
  | that most serious outfits are using runs in the thousands to
  | tens of thousands of dollars per unit, with very limited
  | availability. The Pi is typically under $75 USD for any
  | variant.

    | speedgoose wrote:
    | An RTX 4090 has much better value for Stable Diffusion,
    | but yes, if you start to think about cost, the Pi wins. If
    | you think about availability, I'm not sure.

      | hadlock wrote:
      | The big immediate plus here is that if you live somewhere
      | with limited access to the internet, you can still
      | generate imagery offline on a low-end laptop, like a
      | protest group in far Eastern Europe or other areas. My
      | personal travel laptop only has 8GB of memory, so it's
      | exciting to be able to try out an idea even if I don't
      | have high-end hardware.
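(For the record, speedgoose's arithmetic above checks out; the
wattages and timings are that comment's assumptions, not
measurements:)

    # Energy per image = power draw x generation time.
    pi_watts, pi_hours = 6, 3       # Pi Zero 2 W under load, ~3 h per image
    gpu_watts, gpu_secs = 350, 1    # whole GPU system under load, ~1 s per image

    pi_wh = pi_watts * pi_hours             # 18 Wh per image
    gpu_wh = gpu_watts * gpu_secs / 3600    # ~0.1 Wh per image
    print(pi_wh, round(gpu_wh, 2), round(pi_wh / gpu_wh))
    # -> 18 0.1 185: the Pi spends roughly two orders of magnitude
    #    more energy per image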
| saqadri wrote:
| Incredible! The march continues to get more models to run on
| the edge, much faster than I anticipated. The static
| quantization and slicing techniques here are pretty cool.

  | asynchronous wrote:
  | I've been amazed at how quickly the open source community has
  | iterated on LLMs and Diffusion models. Goes to show how well
  | open source can work.
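(A toy sketch of what static weight quantization amounts to, not
the project's actual code: each float32 tensor is mapped offline
to int8 plus one precomputed scale, quartering the bytes that
have to be streamed, and is dequantized on the fly at inference:)

    import numpy as np

    def quantize_static(w):
        """float32 weights -> int8 plus one precomputed ('static') scale."""
        scale = np.abs(w).max() / 127.0     # fixed offline, stored alongside
        q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
        return q, scale

    def dequantize(q, scale):
        return q.astype(np.float32) * scale  # done on the fly at inference

    w = np.random.randn(4, 4).astype(np.float32)
    q, s = quantize_static(w)
    print(np.abs(w - dequantize(q, s)).max())  # small reconstruction error

___________________________________________________________________
(page generated 2023-07-20 23:00 UTC)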