[HN Gopher] Stable Diffusion Text-Prompt-Based Inpainting - Repl...
       ___________________________________________________________________
        
       Stable Diffusion Text-Prompt-Based Inpainting - Replace Hair,
       Fashion
        
       Author : amrrs
       Score  : 59 points
       Date   : 2022-09-19 20:03 UTC (2 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | bryced wrote:
       | This python library does the same thing (but didn't get traction
       | when I posted it yesterday):
       | 
       | https://github.com/brycedrennan/imaginAIry#automated-replace...
       | 
       | https://news.ycombinator.com/item?id=32887385
       | 
       | And I got the idea from here:
       | 
       | https://github.com/ThereforeGames/txt2mask
       | 
       | Which is using the model here:
       | 
       | https://github.com/timojl/clipseg
       | 
       | Clipseg is doing the hard part!
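        | 
        | Roughly: clipseg scores every pixel against the text prompt,
        | and you threshold that soft relevance map into the binary
        | mask the inpainting step expects. A toy sketch of just that
        | thresholding step (made-up helper name and threshold value):

```python
import numpy as np

def heatmap_to_mask(heatmap, threshold=0.35):
    """Turn a soft per-pixel relevance map (values in [0, 1]) into the
    binary inpainting mask: 255 where the prompt matched, 0 elsewhere."""
    return np.where(heatmap >= threshold, 255, 0).astype(np.uint8)

# toy 2x2 relevance map, as if produced by a CLIPSeg-style model
heatmap = np.array([[0.9, 0.1],
                    [0.4, 0.2]])
mask = heatmap_to_mask(heatmap)  # 255 where score >= 0.35
```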
        
         | stavros wrote:
         | This looks great! I've been looking for a Python library to use
         | with Phantasmagoria[1] for ages, but everyone is doing web UIs.
         | You even packaged it up in a Docker container, very nice, thank
         | you!
         | 
         | [1]: https://phantasmagoria.stavros.io
        
       | fariszr wrote:
       | The progress in the AI space is absolutely astounding.
       | 
        | In less than a year, we went from no AI photo generation (from
        | prompts), to DALL-E 2 as a commercial service, then competitors
        | like Midjourney started popping up, and now we have Stable
        | Diffusion, a source-available AI you can run yourself,
        | unlocking implementations like this.
       | 
        | Other companies are now hyping AI video generation, like
        | Runway [1].
        | 
        | [1]: https://twitter.com/runwayml/status/1568220303808991232
        
         | d3ckard wrote:
          | Totally disagree. The whole AI business space seems totally
          | focused on pushing the boundaries of what is possible, while
          | completely ignoring delivering something consistently useful.
          | I played a bit with image generation recently and most
          | results were abysmal. Sure, it can create great things, and
          | prompt hacking will be a thing for a while. It's still very
          | far from "for every prompt I get a working (as in not broken
          | with artifacts) and matching image". IMO business usability
          | depends mostly on the average case, and this hasn't
          | impressed me yet.
         | 
         | The elephant in the room is the "black box" nature of all
         | neural networks. They are not interpretable for humans,
         | therefore we cannot know when they can royally screw up. That
         | means unless we keep a human in the loop, it's hard to really
         | integrate it into anything critical. And keeping humans out of
         | the loop is what most AI companies promised as an end goal.
         | 
          | Basically, I am embracing the incoming AI winter. Not
          | because no great progress has been made, but because what
          | was promised will never be delivered (as has been the case
          | in previous cycles). At the same time, AI is here to stay,
          | and AI-based tools are going to become commonplace. It will
          | just be less of a big deal than everybody expected.
        
           | joyfylbanana wrote:
            | It depends on how you measure "progress". For you,
            | progress seems to be only about business. I think the
            | majority of people (including me) are just delighted about
            | this new toy that brings them joy. In my life, it is great
            | progress to get innovative new toys that weren't available
            | before.
        
             | d3ckard wrote:
              | How did you come to the conclusion that I think progress
              | is mostly about business?
              | 
              | My point is that the promises that funded the current
              | wave of AI craziness do not seem to be getting
              | fulfilled. At the precise moment that becomes obvious to
              | everybody, the funding will stop and bet on the next
              | horse, whatever that may be.
             | 
              | It seems that autonomous driving got stuck on "the
              | driver must be ready to take control". If that doesn't
              | change, Uber is just a glorified taxi corporation. The
              | Tesla car revolution isn't happening if my car can't
              | drive me to a spot without assistance (allowing me to
              | sleep or be drunk or whatever). They become just another
              | car company with a head start on electric.
             | 
              | And the rest of the AI industry seems, to me, to follow
              | the same pattern - great results (cars can mostly drive
              | themselves nowadays, that's insane!), but always a notch
              | less than expected. Because what was expected was human
              | replacements, and what we got is human augmenters. It's
              | probably better for humanity, as productivity will rise,
              | but humans won't be cut from the loop. I just don't
              | think this particular result is what Big Money had in
              | mind when they poured money over it.
        
           | abeppu wrote:
           | I think image generation is an interesting case, because even
           | if a human is always in the loop, and you need to try several
           | times before you get a good image for your prompt of
           | interest, that's likely still faster and cheaper than
           | photoshopping exactly what you want (or certainly faster than
           | hiring an illustrator). And the images produced are sometimes
           | really quite good. A model which produces some amount of
           | really messed up images can still be 'useful'.
           | 
           | _However_ the kinds of failures it makes highlight that these
           | models still lack basic background knowledge. I'm willing to
           | let the stuff about compositionality slide -- that's asking
            | kind of a lot. But I do draw a very straight line from
            | DeepDream in 2015 producing worm-dogs with too many legs
            | and eyes, through StyleGAN artifacts where the physical
            | relationship between a person's face or clothing and its
            | surroundings was messed up, to the freakish broken bodies
            | that Stable Diffusion sometimes creates. Knowing about the
            | structure of
           | _images_ only tells you a limited amount about the structure
           | of things that occupy 3d space, apparently. It knows what a
           | John Singer Sargent portrait looks like, but it's not totally
           | sure that humans have the same number of arms when they're
           | hugging as when they're not.
           | 
            | In the same way, large language models know what text
            | looks like, but not what makes it factual.
           | 
           | So I don't know that an AI winter is called for. But maybe we
           | should lean away from the AI optimism that we can keep
           | getting better models by training against the kinds of data
           | that are easiest to scrape?
        
           | seydor wrote:
            | That's great. It means people can keep "playing" and
            | innovating before regulation and greedy people join in.
        
           | drusepth wrote:
           | >Totally disagree. A whole of AI business space seems totally
           | focused on pushing the boundaries of what is possible,
           | completely ignoring delivering something consistently useful.
           | 
           | Interestingly, Midjourney is taking an approach you might be
           | interested in, where they're fine-tuning their model to
           | prioritize consistent, visually-appealing outputs with even
           | the most vague prompts (e.g. "a man").
           | 
           | And... it's really making me appreciate its competitors more.
           | This always-good-enough consistency is very much a double-
           | edged sword, IMO, because it also results in a very same-y
           | feel for most Midjourney images (and kind of makes me
           | appreciate instantly-recognizable MJ images a little less, in
           | a way not unlike how I used to be impressed by starry-sky
           | spraypaint pieces and then realized they're basically SP101).
           | You almost always get something good out (at a rate I'd feel
           | comfortable wrapping a production-quality app around) but
           | it's become harder and harder to produce _new_ visuals
           | /aesthetics as Midjourney has progressed closer to their
           | desired consistency levels.
           | 
           | Back when I started on it, I'd get interesting images every
           | 5-10 generations that I'd then tweak and get even more
           | interesting images. Now I'm lucky to see something
           | new/interesting every 5-10 generations, although everything
           | in between is _fine_.
           | 
           | My background here, FWIW: according to the site, I've been
           | using Midjourney for 4 months straight and generated almost
            | 10,000 images. I also have ~700GB of generations on disk
            | from other models in the meantime, and I run a few sites
            | that basically do wrap these kinds of generation models,
            | like novelgens.com, trying to find a good ratio between
            | consistency and divergence.
           | 
           | In the grand scheme of things, I think the AI generation
           | space needs both ends of the spectrum: consistent results
           | like Midjourney lower the barrier of entry for new people to
           | explore the space, but prompt-dependent powerhouses like
           | Stable Diffusion enable artists to push the tooling further
           | and have significantly better control over the art they're
           | trying to create.
        
           | paulgb wrote:
           | What's being delivered _is_ useful. I agree that you still
           | need a human in the loop, but that's true of any creative
           | tool -- having Adobe Illustrator doesn't make me an artist.
        | The current generation of tools has made certain design tasks
        | easier; the main thing still missing is not ML advances so
        | much as nice UIs that put them in the hands of creative
        | professionals.
        
       | MikeYasnev007 wrote:
        | Whether to share masks and open-source the code is an arguable
        | question. I definitely wouldn't share anything outside of a
        | commercial company.
        
       | MikeYasnev007 wrote:
       | Regarding sharing some more interesting documents like shooting
       | nuclear station it's a too simple tech but I will check it also.
       | Thank you
        
       | seydor wrote:
       | > Enhance 34 to 36. Pan right and pull back. Stop. Enhance 34 to
       | 46. Pull back. Wait a minute, go right, stop. Enhance 57 to 19.
       | Track 45 left. Stop. Enhance 15 to 23. Give me a hard copy right
       | there.
        
       | londons_explore wrote:
       | Why does the mask need to be binary?
       | 
       | Surely it's possible to have a full alpha mask, such that 50%
       | alpha means "push the diffusion process towards this value, but
       | don't force it to generate this value".
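        | 
        | Concretely, a soft mask could just weight the per-step blend
        | between the freshly generated latents and the (noised)
        | original - a hypothetical sketch, not how any particular
        | implementation does it:

```python
import numpy as np

def blend_step(generated, noised_original, alpha):
    """Soft-mask blend for one diffusion step: alpha=1 keeps the
    generated values, alpha=0 re-imposes the noised original, and
    0.5 only nudges the process toward the generation."""
    return alpha * generated + (1.0 - alpha) * noised_original

gen = np.full((2, 2), 1.0)       # stand-in for generated latents
orig = np.zeros((2, 2))          # stand-in for noised original latents
alpha = np.array([[1.0, 0.0],
                  [0.5, 0.5]])   # per-pixel soft mask
out = blend_step(gen, orig, alpha)
```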
        
         | bryced wrote:
          | The "alpha" is already built into the tooling and specified
          | independently of the mask. I.e., the Stable Diffusion
          | inpainting takes hints from what you leave in and "decides"
          | what to keep.
        
         | phire wrote:
         | Surely you don't need to?
         | 
          | Just over-expand the mask and let Stable Diffusion decide
          | what it's going to keep and what it's going to replace.
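          | 
          | "Over-expand" here could be as simple as dilating the
          | binary mask a few pixels before inpainting - a rough
          | pure-numpy sketch (hypothetical helper, not from the repo):

```python
import numpy as np

def expand_mask(mask, pixels=1):
    """Grow a binary mask by `pixels` in every direction by OR-ing
    4-neighborhood-shifted copies - a crude dilation, no scipy needed."""
    out = mask.astype(bool).copy()
    for _ in range(pixels):
        padded = np.pad(out, 1)  # pads with False
        out = (padded[:-2, 1:-1] | padded[2:, 1:-1] |
               padded[1:-1, :-2] | padded[1:-1, 2:] | padded[1:-1, 1:-1])
    return out

# single masked pixel in the center grows into a cross
mask = np.zeros((3, 3), dtype=np.uint8)
mask[1, 1] = 1
grown = expand_mask(mask, pixels=1)
```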
        
       ___________________________________________________________________
       (page generated 2022-09-19 23:00 UTC)