[HN Gopher] Stable Diffusion with Core ML on Apple Silicon ___________________________________________________________________ Stable Diffusion with Core ML on Apple Silicon Author : 2bit Score : 247 points Date : 2022-12-01 20:21 UTC (2 hours ago) (HTM) web link (machinelearning.apple.com) (TXT) w3m dump (machinelearning.apple.com) | [deleted] | zimpenfish wrote: | Man, this takes a ton of room to do the CoreML conversions - ran | out of space doing the unet conversion even though I started with | 25GB free. Going on a delete spree to get it up to 50GB free | before trying again. | mark_l_watson wrote: | Great stuff. I like that they give directions for both Swift and | Python. | | This gets you text descriptions to images. | | I have seen models that, given a picture, generate similar | pictures. I want this because while I have many pictures of my | grandmothers, I only have a couple of pictures of my grandfathers, | and it would be nice to generate a few more. | | Core ML is so well done. A year ago I wrote a book on Swift AI | and used Core ML in several examples. | tosh wrote: | Atila from Apple on the expected performance: | | > For distilled StableDiffusion 2 which requires 1 to 4 | iterations instead of 50, the same M2 device should generate an | image in <<1 second | | https://twitter.com/atiorh/status/1598399408160342039 | hbn wrote: | SD2 is the one that was neutered, right? | | Maybe a dumb question, but can the old model still be run? | [deleted] | qclibre22 wrote: | Also, can you not "upgrade" but still run new models? | astrange wrote: | You can do anything you want. | | SD2 wasn't "neutered": the piece of it from OpenAI that | knew a lot of artist names but wasn't reproducible was | replaced with a new one from Stability that doesn't. You | can fine-tune anything you want back in. | kyleyeats wrote: | It's less versatile out of the box. Give it a couple months | for the community to catch up.
Everyone is still figuring out | what goes where, and SD 1.x was "everything goes in one | spot." It was cool and powerful, but limited. | minimaxir wrote: | You can still do nice things with SD2, it just requires a | different approach. | https://news.ycombinator.com/item?id=33780543 | cammikebrown wrote: | If you told me this was possible when I bought an M1 Pro less | than a year ago, I wouldn't believe you. This is insane. | peppertree wrote: | Last nail in the coffin for DALL·E. | m00dy wrote: | yeah, finally we see the real openAI | visarga wrote: | more open than open source, it's the open model age | astrange wrote: | I think they can move upmarket just as well as anyone else. | mensetmanusman wrote: | Not really, everyone will have their own flavor on how to | rapidly train the model. | | DALL·E et al. will still be able to bandwagon off of all the | free ecosystem being built around the $10M SD1.4 model that | is showing what is possible. | | E.g., DALL·E could go straight to Hollywood if their model | training works better than SD's. The toolsets will work. | chasd00 wrote: | i'm very ignorant here, so forgive me, but if it can generate | images that fast, can it be used to generate a video? | valgaze wrote: | Video is really a series of frames; film/the human eye can | get away with 24 frames/second -- so maybe | ~40ms/image for real time, at least? | | What's cool about the era in which we live is that if you look at | high-performance graphics for games or simulations, for | instance, it may in fact be _faster_ to use the model to | "enhance" a low-resolution frame rather than trying to render | it fully on the machine. | | ex. AMD's FSR vs NVIDIA DLSS | | - AMD FSR (Fidelity FX Super Resolution): | https://www.amd.com/en/technologies/fidelityfx-super- | resolut...
| | - NVIDIA DLSS (Deep Learning Super Sampling): | https://www.nvidia.com/en-us/geforce/technologies/dlss/ | | AMD's approach renders the game at a crummy, low-detail | resolution, then "upscales" each frame. | | Both FSR and DLSS aim to improve frames-per-second in games | by rendering them below your monitor's native resolution, | then upscaling them to make up the difference in sharpness. | Currently, FSR uses spatial upscaling, meaning it only | applies its upscaling algorithm to one frame at a time. | Temporal upscalers, like DLSS, can compare multiple frames at | once to reconstruct a more finely-detailed image that both | more closely resembles native res and can better handle | motion. DLSS specifically uses the machine learning | capabilities of GeForce RTX graphics cards to process all | that data in (more or less) real time.
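The spatial-vs-temporal distinction above can be sketched with a toy example. This is purely illustrative: `spatial_upscale` and `temporal_upscale` are hypothetical helpers, not the real FSR or DLSS algorithms, which use far more sophisticated reconstruction.

```python
import numpy as np

# Frame budget for real-time video at film rate: 24 fps -> ~41.7 ms per frame.
FRAME_BUDGET_MS = 1000 / 24

def spatial_upscale(frame, factor):
    """FSR-style *spatial* upscaling, toy version: only the current frame
    is used; each low-res pixel is simply repeated (nearest neighbor)."""
    return frame.repeat(factor, axis=0).repeat(factor, axis=1)

def temporal_upscale(frames, factor):
    """DLSS-style *temporal* upscaling, toy version: several consecutive
    low-res frames are accumulated first, so the output can draw on more
    information than any single frame contains."""
    accumulated = np.mean(frames, axis=0)
    return spatial_upscale(accumulated, factor)

low = np.array([[0.0, 1.0],
                [1.0, 0.0]])
print(spatial_upscale(low, 2).shape)  # (4, 4): rendered small, displayed big
```

The point of the temporal variant is only that it consumes multiple frames; real temporal upscalers also use motion vectors and learned reconstruction rather than a plain average.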
| | This is a different challenge than generating the content | from scratch. | | I don't think this is possible in real time yet, but someone | put a filter trained on the German countryside to produce | photorealistic Grand Theft Auto driving gameplay: | | https://www.youtube.com/watch?v=P1IcaBn3ej0 | | Notice the mountains in the background go from Southern | California brown to lush green. | | https://www.rockpapershotgun.com/amd-fsr-20-is-a-more- | demand.... | vletal wrote: | Yeah, sure. The issue is with temporal consistency. Meta and | Google have some successes in that area. | | https://mezha.media/en/2022/10/06/google-is-working-on- | image... | | Give it some time and SD will be able to do the same. | gcanyon wrote: | There are different requirements for generating video -- at a | minimum, continuity is tough. There are models for producing | video, but (as far as I've seen) they're still a bit wobbly. | mrtksn wrote: | With the full 50 iterations it appears to be about 30s on M1. | | They have some benchmarks on the github repo: | https://github.com/apple/ml-stable-diffusion | | For reference, previously I was getting a bit under 3 minutes for 50 | iterations on my MacBook Air M1. I haven't yet tried Apple's | implementation but it looks like a huge improvement. It might | take it from "possible" to "usable". | washadjeffmad wrote: | For comparison, it's also taking ~3min @ 50 iterations on my | 12c Threadripper using OpenVINO. It sounds like the | improvements bring the M1 performance roughly in line with a | GTX 1080. | liuliu wrote: | Yeah, it is just that the PyTorch MPS backend is not fully baked and | has some slowness. You should be able to get close to that | number with maple-diffusion (probably 10% slower) or my app: | https://drawthings.ai/ (probably around 20% slower, but it | supports samplers that take fewer steps (50 -> 30)). | minimaxir wrote: | Note that this is extrapolation for the _distilled_ model which | isn't released quite yet
(but it will be very exciting when | it does!) | neonate wrote: | https://github.com/apple/ml-stable-diffusion | christiangenco wrote: | Oh gosh that's an intimidating installation process. I'll be | much more interested when I can just `brew install` a binary. | artimaeis wrote: | A bit of a different take is DiffusionBee, if you're curious to | try it out in a GUI form. | | https://diffusionbee.com | aryamaan wrote: | does it use the optimised model for Apple chips? | belthesar wrote: | Not yet, likely, but the project is very active. I could | see it coming quite soon. | bredren wrote: | I've used this a fair amount but am not sure it's a much | better place to begin than automatic1111, especially for | the HN crowd. | thepasswordis wrote: | Where are you seeing the installation process? | MuffinFlavored wrote: | I could be wrong but I think part of the issue is this needs | some large files for the trained dataset? | [deleted] | gedy wrote: | > Oh gosh that's an intimidating installation process | | I'm not seeing any installation instructions on either link - | what am I missing? | alexfromapex wrote: | All I had to do was: | | - create a virtual environment | | - upgrade pip | | - install the nightly PyTorch (command on their website) | | - pip install -r requirements.txt | | - and then, python setup.py install | | - Still trying to figure out the Swift part??? | pkage wrote: | How does this compare with using the Hugging Face `diffusers` | package with MPS acceleration through PyTorch Nightly? I was | under the impression that that used CoreML under the hood as well | to convert the models so they ran on the Neural Engine. | [deleted] | liuliu wrote: | It doesn't. MPS largely is on GPU. PyTorch's MPS implementation | was incomplete as of a few weeks ago as well. This is about 3x faster. | behnamoh wrote: | This may sound naive, but what are some use cases of running SD locally?
If the free/cheap options exist (like running SD | on powerful servers), then what's the advantage of this new | method? | gjsman-1000 wrote: | Powerful servers with GPUs are expensive. Laptops you already | own, aren't. | sofaygo wrote: | > There are a number of reasons why on-device deployment of | Stable Diffusion in an app is preferable to a server-based | approach. First, the privacy of the end user is protected | because any data the user provided as input to the model stays | on the user's device. Second, after initial download, users | don't require an internet connection to use the model. Finally, | locally deploying this model enables developers to reduce or | eliminate their server-related costs. | yazaddaruvala wrote: | "Hey Siri, draw me a purple duck" and it all happens without an | internet connection! | | If you mean monetary use cases: roughly something like | Photoshop/Blender/UnrealEngine with ML plugins that are low | latency, private, and have $0 server hosting costs. | jwitthuhn wrote: | Even with the slower PyTorch implementation, my M1 Pro MBP, | which tops out at consuming ~100W of power, can generate a | decent image in 30 seconds. | | I'm not sure exactly what that costs me in terms of power, but | it is assuredly less than any of these services charge for a | single image generation. | tosh wrote: | Works offline, privacy, independence from SaaS (API stability, | longevity, ...). I'm sure there are more. | mensetmanusman wrote: | Soon you will be able to render home iMovies as if they were | edited by the team that made The Dark Knight (which costs | ~$100k/min if done professionally). ___________________________________________________________________ (page generated 2022-12-01 23:00 UTC)