[HN Gopher] Opendream: A layer-based UI for Stable Diffusion
       ___________________________________________________________________
        
       Opendream: A layer-based UI for Stable Diffusion
        
       Author : varunshenoy
       Score  : 250 points
       Date   : 2023-08-15 17:38 UTC (5 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | tavavex wrote:
        | Very exciting. The "first-generation" Stable Diffusion frontends
        | seem to have settled on a specific design philosophy, so it's
        | interesting to see new tools (like this or ComfyUI) shake up the
        | way people work with these models. I hope that in a few years,
        | we'll know which philosophy works best.
        
         | bavell wrote:
          | I wrote a TypeScript API generator for ComfyUI; it works
          | great. Hopefully I'll have time to release it soon.
         | 
         | I think there's so much unexplored potential in UI and
         | workflows around generative AI, we've barely scratched the
         | surface. Very exciting times ahead!
        
         | ssalka wrote:
         | I bet this will be available as an Automatic1111 extension by
         | end of month.
        
         | TillE wrote:
         | Out of all the AI-related tools, generative art frontends are
         | probably the thing most likely to radically change and improve
         | in the next few years.
         | 
         | It's specifically why I've avoided diving too deep into "prompt
         | engineering", because the kind of incantations required today
         | just aren't going to be the way most people interact with this
         | stuff for very long.
        
           | orbital-decay wrote:
           | _> Out of all the AI-related tools, generative art frontends
           | are probably the thing most likely to radically change and
           | improve in the next few years._
           | 
            | The difference between UIs is actually not very relevant
            | today; by now the generic workflow for complex scenes is
            | more or less obvious to anyone who has spent time with SD.
           | 
           | - Draw basic composition guides. Use them with controlnets or
           | any other generic guidance method to enforce the environment
           | composition you want. Train your own controlnet if you need
           | something specific. (lots of untapped potential here)
           | 
            | - Finetune the checkpoint on your reference pictures, or
            | use other style-transfer methods, to enforce a consistent
            | style.
           | 
            | - Use manual brush masking, manually guided segmentation
            | (e.g. SAM), or prompted segmentation (e.g. CLIPSeg) to
            | select the parts to be replaced with other objects. The
            | choice depends on your use case and whether you need to do
            | it procedurally.
           | 
            | - Photobash and add detail to the elements of your scene
            | using any composition methods you have (noisy latent
            | composition, inpainting, etc.) with the masks you created
            | in the previous step. Use advanced guidance (controlnets,
            | T2I adapters, etc.).
           | 
           | - Don't bother with any prompts beyond very basic
           | descriptions, as "prompt engineering" is slow and unreliable.
           | Don't overwhelm the model by trying to fit lots of detail in
           | one pass; use separate passes for separate objects or
           | regions.
           | 
           | - Alternative 3D version: build a primitive 3D scene from
           | basic props (shapes, rigs). Render the backdrop and separate
           | objects into separate layers as guides. Use them with
           | controlnets & co to render the scene in a guided manner,
           | combining the objects by latent composition, inpainting, or
           | any other means. This can be used for procedural scenes and
           | animation (although current models lack temporal stability).
           | 
            | As long as your tool has all that in one place, it's a
            | breeze, regardless of the UI paradigm (admittedly
            | auto1111's overloaded Gradio looks straight out of a
            | trash compactor nowadays). I expect 2D/3D software
            | integrations to be the most successful in the future, as
            | they already offer proven UIs and the most desirable side
            | features. A rough code sketch of two of those steps is
            | below.
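            | 
            | Something like this with HF diffusers (a minimal sketch;
            | model names, prompts, and file paths are only
            | illustrative):
            | 
            |   import torch
            |   from diffusers import (ControlNetModel,
            |       StableDiffusionControlNetPipeline,
            |       StableDiffusionInpaintPipeline)
            |   from diffusers.utils import load_image
            | 
            |   # Composition pass: drawn guide + scribble controlnet
            |   cn = ControlNetModel.from_pretrained(
            |       "lllyasviel/sd-controlnet-scribble",
            |       torch_dtype=torch.float16)
            |   pipe = StableDiffusionControlNetPipeline.from_pretrained(
            |       "runwayml/stable-diffusion-v1-5", controlnet=cn,
            |       torch_dtype=torch.float16).to("cuda")
            |   guide = load_image("composition_guide.png")
            |   base = pipe("castle on a cliff, overcast",
            |               image=guide).images[0]
            | 
            |   # Detail pass: re-render one masked region at a time
            |   inpaint = StableDiffusionInpaintPipeline.from_pretrained(
            |       "runwayml/stable-diffusion-inpainting",
            |       torch_dtype=torch.float16).to("cuda")
            |   mask = load_image("tower_mask.png")  # SAM/brush mask
            |   base = inpaint("a ruined tower", image=base,
            |                  mask_image=mask).images[0]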
        
           | bobboies wrote:
           | Incantations are fun!
        
           | greggsy wrote:
            | It's entirely likely that there's much more effort going
            | into generative text - any perceived advancement of
            | generative images is going to be disproportionately skewed
            | due to the richness of information that they hold.
        
       | toenail wrote:
        | First thoughts: how do I bind to an IP, and where can I
        | install models?
        
       | smallerfish wrote:
        | Slap a virtualenv setup into that install script, please. A
        | system-wide pip install is a bad pattern.
        
         | [deleted]
        
         | varunshenoy wrote:
         | done :)
        
           | noman-land wrote:
           | Now that's agile.
        
       | adventured wrote:
        | Not a bad start. One quick suggestion: avoid the temptation to
        | make it overly complex.
        | 
        | Stable Diffusion needs to go out to the masses to a greater
        | degree. The unnecessary garbage complexity (e.g. Comfy's
        | ridiculous noodlescape) that developers keep building into the
        | UIs is holding Stable Diffusion back significantly from
        | greater mass adoption.
        
         | bavell wrote:
          | Node-based workflows with little DRY capability (i.e.
          | ComfyUI) do get painful as the workflow grows. That said, an
          | HTTP server capable of executing ML DAGs is extremely useful
          | and a great building block for other tools and UIs to be
          | built upon.
          | 
          | I wrote a TypeScript API generator for ComfyUI recently, and
          | having programmatic access to build and send the execution
          | graphs is a game changer. Hoping to have time to release it
          | soon. The same can easily be done for any other language.
          | Exciting stuff!
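          | 
          | For a flavor of it (Python instead of TS; a minimal sketch
          | assuming a stock local ComfyUI server on its default port,
          | with the graph heavily abbreviated):
          | 
          |   import json, urllib.request
          | 
          |   # Build the execution graph programmatically. A real
          |   # graph also wires in CLIPTextEncode, KSampler,
          |   # VAEDecode, SaveImage, etc.
          |   graph = {
          |       "1": {"class_type": "CheckpointLoaderSimple",
          |             "inputs": {"ckpt_name": "v1-5.safetensors"}},
          |       "2": {"class_type": "EmptyLatentImage",
          |             "inputs": {"width": 512, "height": 512,
          |                        "batch_size": 1}},
          |   }
          | 
          |   req = urllib.request.Request(
          |       "http://127.0.0.1:8188/prompt",
          |       data=json.dumps({"prompt": graph}).encode(),
          |       headers={"Content-Type": "application/json"})
          |   print(urllib.request.urlopen(req).read())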
        
       | cwkoss wrote:
        | Very cool. Would be interesting to train a model on images
        | with alpha channels so outputs would be automatically masked
        | and more easily composable. But maybe masking is so good these
        | days that it would be futile?
        | 
        | When a user does img2img on a layer, does it use the context
        | from other visible layers in the generation?
        
         | mdp2021 wrote:
         | > _Would be interesting to train a model on images with alpha
         | channels_
         | 
          | Would be even more interesting to have an intermediate ANN
          | system with an ontology of the represented content, so that
          | the individual items can be changed.
          | 
          | That is, an internal representation of qualified, structured
          | items in space as part of the chain: prompt > accessible
          | internal representation > render.
        
         | dheera wrote:
          | For composing, this approach works pretty well; maybe the
          | author should consider making a UI for it:
         | 
         | https://multidiffusion.github.io/
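          | 
          | diffusers already ships a MultiDiffusion-based panorama
          | pipeline, so a minimal version looks something like this
          | (the model choice is just an example):
          | 
          |   import torch
          |   from diffusers import (DDIMScheduler,
          |       StableDiffusionPanoramaPipeline)
          | 
          |   model_id = "stabilityai/stable-diffusion-2-base"
          |   pipe = StableDiffusionPanoramaPipeline.from_pretrained(
          |       model_id, torch_dtype=torch.float16).to("cuda")
          |   pipe.scheduler = DDIMScheduler.from_pretrained(
          |       model_id, subfolder="scheduler")
          | 
          |   # MultiDiffusion fuses overlapping diffusion windows
          |   # into one wide canvas
          |   image = pipe("a photo of the dolomites",
          |                width=2048).images[0]
          |   image.save("panorama.png")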
        
           | mottiden wrote:
           | Thanks for posting. Really interesting
        
         | Zetobal wrote:
         | Segmentation is solved... https://github.com/RockeyCoss/Prompt-
         | Segment-Anything
        
           | michaelt wrote:
           | Segment Anything is neat, but segmentation is far from
           | solved.
           | 
           | If the user generates a picture of a horse and rider to add
           | onto another composition - they probably want to include the
           | saddle.
        
             | GaggiX wrote:
              | SAM can also be conditioned on points: if it's ambiguous
              | what you want to mask, you can add a point on the saddle
              | and the model will include it without a problem.
              | Segmentation is pretty much solved; I agree with the
              | parent post.
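              | 
              | A minimal point-prompt sketch with the segment_anything
              | package (the checkpoint path and click coordinates are
              | made up):
              | 
              |   import numpy as np
              |   from PIL import Image
              |   from segment_anything import (SamPredictor,
              |       sam_model_registry)
              | 
              |   sam = sam_model_registry["vit_h"](
              |       checkpoint="sam_vit_h.pth")
              |   predictor = SamPredictor(sam)
              |   img = np.array(
              |       Image.open("horse.png").convert("RGB"))
              |   predictor.set_image(img)
              | 
              |   # One click on the horse, one on the saddle
              |   # (label 1 = include this point in the mask)
              |   masks, scores, _ = predictor.predict(
              |       point_coords=np.array([[310, 220], [355, 190]]),
              |       point_labels=np.array([1, 1]),
              |       multimask_output=False)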
        
               | bavell wrote:
                | IME I haven't gotten great results using SAM; maybe it
                | was just the images I was using? They weren't great
                | quality, and it seemed to struggle with low-contrast
                | areas.
        
       | brianjking wrote:
        | Is it possible to add SDXL support to this?
       | 
       | I'd love a colab notebook if anyone has the skill and time to do
       | so.
        
         | varunshenoy wrote:
          | If anyone wants to add SDXL support, all you have to do is
          | create a new extension with the correct SDXL logic (loading
          | from HF diffusers, etc.). You could parameterize
          | `num_inference_steps`, for example, to delegate decisions to
          | the user of the extension.
          | 
          | If anyone gets to making one before me, please open a PR!
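          | 
          | The diffusers side is basically just this (the function
          | name is arbitrary here, and the extension registration
          | plumbing is omitted):
          | 
          |   import torch
          |   from diffusers import StableDiffusionXLPipeline
          | 
          |   # Expose num_inference_steps instead of hard-coding it
          |   def dream_sdxl(prompt, num_inference_steps=30):
          |       pipe = StableDiffusionXLPipeline.from_pretrained(
          |           "stabilityai/stable-diffusion-xl-base-1.0",
          |           torch_dtype=torch.float16,
          |           use_safetensors=True).to("cuda")
          |       out = pipe(prompt,
          |                  num_inference_steps=num_inference_steps)
          |       return out.images[0]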
        
       | smrtinsert wrote:
        | There are great articles on how layered UIs are a lot easier
        | to use than node-based UIs. Really excited to see a layered
        | approach to SD. It's definitely time to break out of Gradio.
        
         | TeMPOraL wrote:
          | Maybe if they're talking about layered UIs with layer
          | groups, which turn a flat stack into something resembling a
          | tree. But even these UIs don't give you proper
          | non-destructive editing: anything more complex requires you
          | to duplicate parts of the layer stack to feed as inputs,
          | which is a destructive operation with respect to structure
          | (those pasted layers won't update if you make changes to the
          | copied source). Doing this properly requires a DAG, at which
          | point you're at node-based UIs (or some idiosyncratic mess
          | of a UI that pretends it's not modelling a DAG).
         | 
         | It's all moot though, because as far as I know, there is no
         | proper 2D graphics editing software that uses DAGs and nodes.
         | Everyone just copies Photoshop. Especially Affinity, which is
         | grating, given their recent focus on non-destructive editing.
         | For some reason, node-based UIs ended up being a mainstay of
         | VFX, 3D graphics, and VFX & gamedev support tooling. But
         | general 2D graphics - photo editing, raster and vector
         | creation? Nodes are surprisingly absent.
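          | 
          | A toy sketch of the structural point (hypothetical names,
          | Python for brevity): in a DAG one layer can feed several
          | consumers and edits propagate, while a stack forces stale
          | copies.
          | 
          |   from dataclasses import dataclass, field
          | 
          |   @dataclass
          |   class Node:
          |       op: callable
          |       inputs: list = field(default_factory=list)
          |       def render(self):
          |           return self.op(
          |               *(i.render() for i in self.inputs))
          | 
          |   src  = Node(lambda: "photo")
          |   blur = Node(lambda i: f"blur({i})", [src])
          |   tone = Node(lambda i: f"tone({i})", [src])  # reuses src
          |   comp = Node(lambda a, b: f"comp({a},{b})", [blur, tone])
          | 
          |   print(comp.render())  # comp(blur(photo),tone(photo))
          |   src.op = lambda: "photo2"  # edit propagates everywhere
          |   print(comp.render())  # comp(blur(photo2),tone(photo2))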
        
       | gatane wrote:
       | Is this related to Melondream?
        
       | __loam wrote:
       | [flagged]
        
         | dang wrote:
         | Maybe so, but please don't post unsubstantive comments to
         | Hacker News.
        
         | visarga wrote:
         | Why, did you lose your art because of AI?
        
         | valine wrote:
         | Theft takes the original, piracy makes a copy, AI art remixes
         | the original. I'm not sure how to classify AI art but it
         | definitely isn't theft.
        
           | yieldcrv wrote:
           | Derivative works with zero copyright protection due to the
           | predominance of machine assists
           | 
           | No way to quantify though, for or against copy protection
           | 
           | But thats a convenient compromise for now
        
           | slowmovintarget wrote:
           | So piracy may be involved in training the model, but the rest
           | does not follow.
           | 
           | Art inspired by other art has been the way of things for as
           | long as we've been creating images. There's no such thing as
           | a "clean-room painting."
        
             | __loam wrote:
             | Using unlicensed copies of other people's work in training
             | is the problem, along with what that does to the market for
             | original works. Using people's labor for AI training
             | without permission or compensation will discourage people
             | from sharing that work and ultimately make the AI models
             | worse too.
        
         | joemi wrote:
          | Doesn't this entirely depend both on what it's been trained
          | on and what style is being output? But also, philosophically,
          | is it even "theft" to make something in-the-style-of someone?
         | 
         | I believe these questions and their complex answers are the
         | reason you've been downvoted.
        
           | coding123 wrote:
            | I understand why the person was downvoted, but not why the
            | person was flagged. It doesn't make sense for someone to
            | flag "AI art is theft."
            | 
            | Downvoted because you didn't back up what you meant.
            | 
            | Flagged because there are AI fanboys that want to censor
            | speech, perhaps?
        
             | stale2002 wrote:
              | I would say that it absolutely deserved to be flagged
              | because it was a comment that invites little engagement.
              | 
              | It both isn't directly related to the original post and
              | also didn't make any particular argument. It was just a
              | five-word declaration, which is borderline off-topic.
        
             | cercatrova wrote:
              | Flagged because it was an unsubstantive comment, as dang
              | mentioned, and also because it's increasingly a
              | flame-bait topic on HN, same as with Copilot and
              | licensing.
        
         | mcclux wrote:
         | "This pixel right here officer; clearly stolen."
        
         | [deleted]
        
       | tomalaci wrote:
       | I haven't followed diffusion image generation development for a
       | while. Where do you find information on what models you can use
       | in the model_ckpt field? Do I need to import them from somewhere?
       | What are the main differences between them and which are more
       | modern or better?
        
         | nickstinemates wrote:
         | You can find them on huggingface, or you can reverse engineer
         | which ckpt you want to use based on an image you've seen
         | generated (like at majin[1] - beware, there's a lot of
         | NSFW/controversial stuff here.)
         | 
         | 1: https://majinai.art/
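          | 
          | I believe recent diffusers versions can also load those
          | single-file checkpoints directly, e.g. (the file name is
          | just an example):
          | 
          |   import torch
          |   from diffusers import StableDiffusionPipeline
          | 
          |   # Load a community .ckpt/.safetensors from HF/Civitai
          |   pipe = StableDiffusionPipeline.from_single_file(
          |       "dreamshaper_8.safetensors",
          |       torch_dtype=torch.float16).to("cuda")
          |   image = pipe("portrait photo, 85mm").images[0]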
        
           | bavell wrote:
           | Also CivitAI but beware the NSFW
           | 
           | https://civitai.com/
        
           | CSSer wrote:
           | Some of this is straight up soft-core child porn. This is
           | fucked up.
        
             | greggsy wrote:
                | I believe illustrations have been deemed to be abuse
                | material, so I wouldn't be surprised if law
                | enforcement has started looking into it.
        
               | GaggiX wrote:
                | Illustrations are not a problem under the law in the
                | United States, but it remains to be seen how generated
                | images that are (almost) indistinguishable from
                | reality will be treated.
        
               | kleiba wrote:
               | Who exactly is being abused here?
               | 
               | I for one would much rather give pedophiles an
               | opportunity to fulfil their sexual desires through AI-
               | generated pictures than real ones.
               | 
               | Of course, we can talk about the training material. Are
               | there actual child porn images in there? I seriously
               | doubt it but who knows?
               | 
               | And perhaps a case could be made that AI-generated child
               | porn could be a gateway to invite people who then seek
               | out non-generated material.
               | 
               | But I think these are separate discussions to be had.
        
               | CSSer wrote:
               | Geez that's disturbing. I clicked having no qualms with
               | nudes, artistic or otherwise. I'm not a prude. I've seen
               | my fair share of anime girls and AI nudes. Hell, I was
               | raised on the internet before parental settings were a
               | thing, but I didn't expect that. It's so gross how it
               | toes a line too.
        
               | dingnuts wrote:
               | the Fediverse has a big problem with this, too, and I
               | never hear anyone talking seriously about it
        
       | ryukoposting wrote:
        | If it can handle LoRAs, I'll be sure to try it out this
        | weekend.
        
       | Hamcha wrote:
        | What's up with names nowadays? Not only is there already an
        | OpenDream[1] on GitHub, there's also a Stable Diffusion
        | service called OpenDream[2]!
        | 
        | 1. https://github.com/OpenDreamProject/OpenDream
        | 2. https://opendream.ai/
        
         | [deleted]
        
       | antman wrote:
       | Can you add a layer with e.g. an image of yourself?
        
         | ttul wrote:
          | Pretty sure you can do this. Diffusion models start from
          | noise by default, but you can start from any data, including
          | an existing image. For instance, you could import a photo of
          | yourself, mask the eyes, and then ask the model to make them
          | green.
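          | 
          | With diffusers, that's roughly a single inpainting call
          | (the file names are placeholders):
          | 
          |   import torch
          |   from diffusers import StableDiffusionInpaintPipeline
          |   from diffusers.utils import load_image
          | 
          |   pipe = StableDiffusionInpaintPipeline.from_pretrained(
          |       "runwayml/stable-diffusion-inpainting",
          |       torch_dtype=torch.float16).to("cuda")
          | 
          |   photo = load_image("me.png")        # imported layer
          |   mask = load_image("eyes_mask.png")  # white = repaint
          |   out = pipe("bright green eyes, photo", image=photo,
          |              mask_image=mask).images[0]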
        
       | asynchronous wrote:
        | Very cool, honestly; seems like a much-needed improvement over
        | Automatic. Does it support LoRA, and if not, will it in the
        | near future?
        
       ___________________________________________________________________
       (page generated 2023-08-15 23:00 UTC)