[HN Gopher] Opendream: A layer-based UI for Stable Diffusion
___________________________________________________________________

Opendream: A layer-based UI for Stable Diffusion

Author : varunshenoy
Score  : 250 points
Date   : 2023-08-15 17:38 UTC (5 hours ago)

(HTM) web link (github.com)
(TXT) w3m dump (github.com)

| tavavex wrote:
| Very exciting. The "first-generation" Stable Diffusion frontends seem to have settled on a specific design philosophy, so it's interesting to see new tools (like this or ComfyUI) shake up the way people work with them. I hope that in a few years, we'll know which philosophy works best.

| bavell wrote:
| I wrote a TypeScript API generator for ComfyUI, works great - hopefully I'll have time to release it soon.
|
| I think there's so much unexplored potential in UIs and workflows around generative AI; we've barely scratched the surface. Very exciting times ahead!

| ssalka wrote:
| I bet this will be available as an Automatic1111 extension by the end of the month.

| TillE wrote:
| Out of all the AI-related tools, generative art frontends are probably the thing most likely to radically change and improve in the next few years.
|
| It's specifically why I've avoided diving too deep into "prompt engineering": the kind of incantations required today just aren't going to be the way most people interact with this stuff for very long.

| orbital-decay wrote:
| > _Out of all the AI-related tools, generative art frontends are probably the thing most likely to radically change and improve in the next few years._
|
| The difference between UIs is actually not very relevant today; by now the generic workflow for complex scenes is more or less obvious to anyone who has spent time with SD:
|
| - Draw basic composition guides. Use them with ControlNets or any other generic guidance method to enforce the environment composition you want. Train your own ControlNet if you need something specific. (There's lots of untapped potential here; a concrete sketch of this guided-render step follows below.)
|
| - Finetune the checkpoint on your reference pictures, or use other style-transfer methods to enforce a consistent style.
|
| - Use manual brush masking, manually guided segmentation (e.g. SAM), or prompted segmentation (e.g. CLIPSeg) to select the parts to be replaced with other objects. The choice depends on your use case and whether you need to do it procedurally.
|
| - Photobash and add detail to the elements of your scene using any composition methods you have (noisy latent composition, inpainting, etc.) with the masks you created in the previous step. Use advanced guidance (ControlNets, T2I adapters, etc.).
|
| - Don't bother with any prompts beyond very basic descriptions, as "prompt engineering" is slow and unreliable. Don't overwhelm the model by trying to fit lots of detail into one pass; use separate passes for separate objects or regions.
|
| - Alternative 3D version: build a primitive 3D scene from basic props (shapes, rigs). Render the backdrop and separate objects into separate layers as guides. Use them with ControlNets & co. to render the scene in a guided manner, combining the objects by latent composition, inpainting, or any other means. This can be used for procedural scenes and animation (although current models lack temporal stability).
|
| As long as your tool has all that in one place, it's a breeze, regardless of the UI paradigm (admittedly auto1111's overloaded Gradio looks straight out of a trash compactor nowadays). I expect 2D/3D software integrations to be the most successful in the future, as they already offer proven UIs and most of the desirable side features.
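| To make the guided-render step concrete, here's a rough diffusers sketch; untested, and the model IDs, prompt, and file names are just placeholders:
|
|     import torch
|     from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
|     from PIL import Image
|
|     # A Canny-edge ControlNet steering a standard SD 1.5 checkpoint.
|     controlnet = ControlNetModel.from_pretrained(
|         "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
|     )
|     pipe = StableDiffusionControlNetPipeline.from_pretrained(
|         "runwayml/stable-diffusion-v1-5",
|         controlnet=controlnet,
|         torch_dtype=torch.float16,
|     ).to("cuda")
|
|     # The composition guide you drew (or rendered from a 3D blockout),
|     # already converted to an edge map.
|     guide = Image.open("composition_edges.png")
|
|     # Keep the prompt basic; the guide image does the heavy lifting.
|     image = pipe(
|         "a cottage in a forest clearing, overcast",
|         image=guide,
|         num_inference_steps=30,
|     ).images[0]
|     image.save("guided_render.png")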
| bobboies wrote:
| Incantations are fun!

| greggsy wrote:
| It's entirely likely that there's much more effort going into generative text - any perceived advancement in generative images is going to be disproportionately skewed by the richness of the information images hold.

| toenail wrote:
| First thoughts: how do I bind to an IP, and where do I install models?

| smallerfish wrote:
| Slap a virtualenv setup into that install script please. A system-wide pip install is a bad pattern.

| [deleted]

| varunshenoy wrote:
| done :)

| noman-land wrote:
| Now that's agile.

| adventured wrote:
| Not a bad start. One quick suggestion: avoid the temptation to make it overly complex.
|
| Stable Diffusion needs to go out to the masses to a greater degree. The unnecessary garbage complexity (e.g. Comfy's ridiculous noodlescape) that developers keep building into the UIs is holding Stable Diffusion back significantly from greater mass adoption.

| bavell wrote:
| Node-based workflows with little DRY capability (i.e. ComfyUI) do get painful as the workflow grows. That said, an HTTP server capable of executing ML DAGs is extremely useful and a great building block for other tools and UIs to be built upon.
|
| I wrote a TypeScript API generator for ComfyUI recently, and having programmatic access to build and send the execution graphs is a game changer. Hoping to have time to release it soon. The same can easily be done for any other language. Exciting stuff!
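| For anyone curious, the transport is plain HTTP. A minimal sketch of submitting a graph to a local ComfyUI instance (untested; assumes the default port 8188 and a workflow previously exported via "Save (API Format)"):
|
|     import json
|     import urllib.request
|
|     # A node graph exported from the ComfyUI web UI in API format.
|     with open("workflow_api.json") as f:
|         workflow = json.load(f)
|
|     # Queue the graph for execution on the local server.
|     req = urllib.request.Request(
|         "http://127.0.0.1:8188/prompt",
|         data=json.dumps({"prompt": workflow}).encode("utf-8"),
|         headers={"Content-Type": "application/json"},
|     )
|     with urllib.request.urlopen(req) as resp:
|         print(json.load(resp))  # response includes the queued prompt_id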
| cwkoss wrote:
| Very cool. It would be interesting to train a model on images with alpha channels, so outputs would be automatically masked and more easily composable. But maybe masking is so good these days that it would be futile?
|
| When a user does img2img on a layer, does it use context from the other visible layers in the generation?

| mdp2021 wrote:
| > _Would be interesting to train a model on images with alpha channels_
|
| It would be even more interesting to get an ANN middle stage holding an ontology of the content that is finally represented, so that individual items can be changed.
|
| An internal representation of qualified, structured items in space as part of the chain: prompt > accessible internal representation > render.

| dheera wrote:
| For composing, this approach works pretty well; maybe the author should consider making a UI for it:
|
| https://multidiffusion.github.io/

| mottiden wrote:
| Thanks for posting. Really interesting.

| Zetobal wrote:
| Segmentation is solved... https://github.com/RockeyCoss/Prompt-Segment-Anything

| michaelt wrote:
| Segment Anything is neat, but segmentation is far from solved.
|
| If the user generates a picture of a horse and rider to add onto another composition, they probably want to include the saddle.

| GaggiX wrote:
| SAM can also be conditioned on points: if it's ambiguous what you want to mask, you can add a point on the saddle and the model will include it without a problem. Segmentation is pretty much solved; I agree with the parent post.
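| The point-prompt interface is tiny. A rough sketch with the reference segment-anything package (untested; the checkpoint file and click coordinates are placeholders):
|
|     import numpy as np
|     import cv2
|     from segment_anything import SamPredictor, sam_model_registry
|
|     # Load a SAM checkpoint and wrap it in the point/box predictor.
|     sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth")
|     predictor = SamPredictor(sam)
|
|     image = cv2.cvtColor(cv2.imread("horse_and_rider.png"), cv2.COLOR_BGR2RGB)
|     predictor.set_image(image)
|
|     # One positive click (label 1) placed on the saddle.
|     masks, scores, _ = predictor.predict(
|         point_coords=np.array([[420, 310]]),
|         point_labels=np.array([1]),
|         multimask_output=True,
|     )
|     best_mask = masks[scores.argmax()]  # boolean HxW array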
| bavell wrote:
| IME I haven't gotten great results using SAM - maybe it was just the images I was using? They weren't great quality, and it seemed to struggle with low-contrast areas.

| brianjking wrote:
| Is it possible to add SDXL support for this?
|
| I'd love a Colab notebook if anyone has the skill and time to do so.

| varunshenoy wrote:
| If anyone wants to add SDXL support, all you have to do is create a new extension with the correct SDXL logic (loading from HF diffusers, etc.). You could parameterize `num_inference_steps`, for example, to delegate decisions to the user of the extension.
|
| If anyone gets around to making one before me, please leave a PR!

| smrtinsert wrote:
| There are great articles on how layered UIs are a lot easier to use than node-based UIs. Really excited to see a layered approach to SD. It's definitely time to break out of Gradio.

| TeMPOraL wrote:
| Maybe if they're talking about layered UIs with layer groups, which turn a flat stack into something resembling a tree. But even these UIs don't give you proper non-destructive editing - anything more complex requires you to duplicate parts of the layer stack to feed as inputs, which is a destructive operation with respect to structure (those pasted layers won't update if you make changes to the copied source). Doing this properly requires a DAG, at which point you're at node-based UIs (or some idiosyncratic mess of a UI that pretends it's not modelling a DAG).
|
| It's all moot though, because as far as I know, there is no proper 2D graphics editing software that uses DAGs and nodes. Everyone just copies Photoshop. Especially Affinity, which is grating, given their recent focus on non-destructive editing. For some reason, node-based UIs ended up being a mainstay of VFX, 3D graphics, and gamedev support tooling. But general 2D graphics - photo editing, raster and vector creation? Nodes are surprisingly absent.

| gatane wrote:
| Is this related to Melondream?

| __loam wrote:
| [flagged]

| dang wrote:
| Maybe so, but please don't post unsubstantive comments to Hacker News.

| visarga wrote:
| Why, did you lose your art because of AI?

| valine wrote:
| Theft takes the original, piracy makes a copy, AI art remixes the original. I'm not sure how to classify AI art, but it definitely isn't theft.

| yieldcrv wrote:
| Derivative works with zero copyright protection, due to the predominance of machine assists.
|
| No way to quantify that though, for or against copy protection.
|
| But that's a convenient compromise for now.

| slowmovintarget wrote:
| So piracy may be involved in training the model, but the rest does not follow.
|
| Art inspired by other art has been the way of things for as long as we've been creating images. There's no such thing as a "clean-room painting."

| __loam wrote:
| Using unlicensed copies of other people's work in training is the problem, along with what that does to the market for original works. Using people's labor for AI training without permission or compensation will discourage people from sharing that work, and ultimately make the AI models worse too.

| joemi wrote:
| Doesn't this entirely depend both on what it's been trained on and what style is being output? But also, philosophically, is it even "theft" to make something in-the-style-of someone?
|
| I believe these questions and their complex answers are the reason you've been downvoted.

| coding123 wrote:
| I understand why the person was downvoted, but not why the person was flagged. It doesn't make sense for someone to flag "AI art is theft."
|
| Downvoted because you didn't back up what you meant.
|
| Flagged because there are AI fanboys that want to censor speech, perhaps?

| stale2002 wrote:
| I would say that it absolutely deserved to be flagged, because it was a comment with little to engage with.
|
| It both isn't directly related to the original post and also didn't make any particular argument. It was just a 5-word declaration of fact that is borderline offtopic.

| cercatrova wrote:
| Flagged because it was an unsubstantive comment, as dang mentioned, and also because it's increasingly a flame-bait topic on HN, same as with Copilot and licensing.

| mcclux wrote:
| "This pixel right here, officer; clearly stolen."

| [deleted]

| tomalaci wrote:
| I haven't followed diffusion image generation development for a while. Where do you find information on what models you can use in the model_ckpt field? Do I need to import them from somewhere? What are the main differences between them, and which are more modern or better?

| nickstinemates wrote:
| You can find them on Hugging Face, or you can reverse-engineer which ckpt you want to use based on an image you've seen generated (like at majin[1] - beware, there's a lot of NSFW/controversial stuff there).
|
| 1: https://majinai.art/

| bavell wrote:
| Also CivitAI, but beware the NSFW.
|
| https://civitai.com/

| CSSer wrote:
| Some of this is straight up soft-core child porn. This is fucked up.

| greggsy wrote:
| I believe illustrations have been deemed to be abuse material, so I wouldn't be surprised if LE have started looking into it.

| GaggiX wrote:
| Illustrations are not a problem under the law in the United States, but it remains to be seen how generated images that are indistinguishable (or nearly indistinguishable) from reality will be treated.

| kleiba wrote:
| Who exactly is being abused here?
|
| I for one would much rather give pedophiles an opportunity to fulfil their sexual desires through AI-generated pictures than real ones.
|
| Of course, we can talk about the training material. Are there actual child porn images in there? I seriously doubt it, but who knows?
|
| And perhaps a case could be made that AI-generated child porn could be a gateway that invites people to then seek out non-generated material.
|
| But I think these are separate discussions to be had.

| CSSer wrote:
| Geez, that's disturbing. I clicked having no qualms with nudes, artistic or otherwise. I'm not a prude. I've seen my fair share of anime girls and AI nudes. Hell, I was raised on the internet before parental settings were a thing, but I didn't expect that. It's so gross how it toes a line, too.

| dingnuts wrote:
| The Fediverse has a big problem with this too, and I never hear anyone talking seriously about it.

| ryukoposting wrote:
| If it can handle LoRAs, I'll be sure to try it out this weekend.

| Hamcha wrote:
| What's up with names nowadays? Not only is there already an OpenDream[1] on GitHub, there's also a Stable Diffusion service called OpenDream[2]!
|
| 1. https://github.com/OpenDreamProject/OpenDream
| 2. https://opendream.ai/

| [deleted]

| antman wrote:
| Can you add a layer with e.g. an image of yourself?

| ttul wrote:
| Pretty sure you can do this. Diffusion models by default start with noise, but you can start with any data, including an existing image. For instance, you could import a photo of yourself, mask the eyes, and then ask the model to make them green.
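| With HF diffusers, that's roughly the following (untested sketch; the model ID is a common inpainting checkpoint, and the file names are placeholders):
|
|     import torch
|     from diffusers import StableDiffusionInpaintPipeline
|     from PIL import Image
|
|     pipe = StableDiffusionInpaintPipeline.from_pretrained(
|         "runwayml/stable-diffusion-inpainting",
|         torch_dtype=torch.float16,
|     ).to("cuda")
|
|     init = Image.open("me.png").convert("RGB").resize((512, 512))
|     # White pixels are repainted (here, the eyes); black pixels are kept.
|     mask = Image.open("eyes_mask.png").convert("RGB").resize((512, 512))
|
|     result = pipe(
|         prompt="portrait photo of a person with green eyes",
|         image=init,
|         mask_image=mask,
|         num_inference_steps=30,
|     ).images[0]
|     result.save("green_eyes.png")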
| asynchronous wrote:
| Very cool, honestly; seems like a much-needed improvement over Automatic. Does it support LoRA, or will it in the near future?
___________________________________________________________________
(page generated 2023-08-15 23:00 UTC)