[HN Gopher] Stable Diffusion Text-Prompt-Based Inpainting - Repl...
___________________________________________________________________
Stable Diffusion Text-Prompt-Based Inpainting - Replace Hair,
Fashion
Author : amrrs
Score  : 59 points
Date   : 2022-09-19 20:03 UTC (2 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| bryced wrote:
| This Python library does the same thing (but didn't get traction
| when I posted it yesterday):
|
| https://github.com/brycedrennan/imaginAIry#automated-replace...
|
| https://news.ycombinator.com/item?id=32887385
|
| And I got the idea from here:
|
| https://github.com/ThereforeGames/txt2mask
|
| Which is using the model here:
|
| https://github.com/timojl/clipseg
|
| Clipseg is doing the hard part!
| stavros wrote:
| This looks great! I've been looking for a Python library to use
| with Phantasmagoria[1] for ages, but everyone is doing web UIs.
| You even packaged it up in a Docker container, very nice, thank
| you!
|
| [1]: https://phantasmagoria.stavros.io
| fariszr wrote:
| The progress in the AI space is absolutely astounding.
|
| In less than a year, we went from no AI photo generation (from
| prompts) to DALL-E 2, a commercial service; then competitors
| started popping up, like Midjourney; and now we have Stable
| Diffusion, a source-available AI you can run yourself, unlocking
| implementations like this.
|
| Other companies are now hyping AI video generation, like
| Runway [1].
|
| [1]: https://twitter.com/runwayml/status/1568220303808991232
| d3ckard wrote:
| Totally disagree. The whole AI business space seems totally
| focused on pushing the boundaries of what is possible, completely
| ignoring delivering something consistently useful. I played a bit
| with image generation recently and most results were abysmal.
| Sure, it can create great things, and prompt hacking will be a
| thing for a while.
| However, it's very far from "for each prompt I get a working (as
| in not broken with artifacts) and matching image". IMO business
| usability depends mostly on the average case, and this hasn't
| impressed me yet.
|
| The elephant in the room is the "black box" nature of all neural
| networks. They are not interpretable by humans, therefore we
| cannot know when they will royally screw up. That means that
| unless we keep a human in the loop, it's hard to really integrate
| them into anything critical. And keeping humans out of the loop is
| what most AI companies promised as an end goal.
|
| Basically, I am embracing the incoming AI winter. Not because no
| great progress has been made, but because what was promised will
| never be delivered (as has been the case in previous waves). At
| the same time, AI is here to stay and AI-based tools are going to
| become commonplace. It will just be less of a big deal than
| everybody expected.
| joyfylbanana wrote:
| It depends on how you measure "progress". For you, progress seems
| to be only about business. I think the majority of people
| (including me) are just delighted about this new toy that brings
| them joy. In my life it is great progress if I get new innovative
| toys that haven't been available before.
| d3ckard wrote:
| How did you come to the conclusion that I think progress is
| mostly about business?
|
| My point is that the promises that funded the current wave of AI
| craziness do not seem to be getting fulfilled. The moment that
| becomes obvious to everybody, funding will stop and move to the
| next horse, whatever that may be.
|
| It seems that autonomous driving got stuck on "driver must be
| ready to take control". If that doesn't change, Uber is just a
| glorified taxi corporation. The Tesla car revolution ain't
| happening if my car can't drive me to a spot without assistance
| (allowing me to sleep or be drunk or whatever). They become just
| another car company with a head start on electric.
|
| And the rest of the AI industry seems to follow the same pattern
| for me: great results (cars can mostly drive themselves nowadays,
| that's insane!), but always a notch less than expected. Because
| what was expected was human replacements, and what we got is
| human augmenters. It's probably better for humanity, as
| productivity will rise, but humans won't be cut from the loop. I
| just don't think this particular result is what Big Money had in
| mind when they poured money into it.
| abeppu wrote:
| I think image generation is an interesting case, because even if
| a human is always in the loop, and you need to try several times
| before you get a good image for your prompt of interest, that's
| likely still faster and cheaper than photoshopping exactly what
| you want (or certainly faster than hiring an illustrator). And
| the images produced are sometimes really quite good. A model
| which produces some amount of really messed-up images can still
| be 'useful'.
|
| _However_ the kinds of failures it makes highlight that these
| models still lack basic background knowledge. I'm willing to let
| the stuff about compositionality slide -- that's asking kind of a
| lot. But I do draw a very straight line from DeepDream in 2015
| producing worm-dogs with too many legs and eyes, through StyleGAN
| artifacts where the physical relationship between a person's face
| or clothing and surroundings was messed up, to the freakish
| broken bodies that Stable Diffusion sometimes creates. Knowing
| about the structure of _images_ only tells you a limited amount
| about the structure of things that occupy 3D space, apparently.
| It knows what a John Singer Sargent portrait looks like, but it's
| not totally sure that humans have the same number of arms when
| they're hugging as when they're not.
|
| In the same way, large language models know what text looks like,
| but not facticity.
|
| So I don't know that an AI winter is called for.
| But maybe we should lean away from the AI optimism that we can
| keep getting better models by training against the kinds of data
| that are easiest to scrape?
| seydor wrote:
| That's great: it means people can keep 'playing' and innovating
| before regulation and greedy people join in.
| drusepth wrote:
| > Totally disagree. The whole AI business space seems totally
| focused on pushing the boundaries of what is possible, completely
| ignoring delivering something consistently useful.
|
| Interestingly, Midjourney is taking an approach you might be
| interested in, where they're fine-tuning their model to
| prioritize consistent, visually appealing outputs for even the
| vaguest prompts (e.g. "a man").
|
| And... it's really making me appreciate its competitors more.
| This always-good-enough consistency is very much a double-edged
| sword, IMO, because it also results in a very same-y feel for
| most Midjourney images (and kind of makes me appreciate
| instantly-recognizable MJ images a little less, in a way not
| unlike how I used to be impressed by starry-sky spraypaint pieces
| and then realized they're basically SP101). You almost always get
| something good out (at a rate I'd feel comfortable wrapping a
| production-quality app around), but it's become harder and harder
| to produce _new_ visuals/aesthetics as Midjourney has progressed
| closer to their desired consistency levels.
|
| Back when I started on it, I'd get interesting images every 5-10
| generations that I'd then tweak into even more interesting
| images. Now I'm lucky to see something new/interesting every 5-10
| generations, although everything in between is _fine_.
|
| My background here, FWIW: according to the site, I've been using
| Midjourney for 4 months straight and generated almost 10,000
| images.
| I also have ~700GB of generations on disk from other models from
| the same period, and I run a few sites that basically do wrap
| these kinds of generation models, like novelgens.com, and try to
| find a good ratio between consistency and divergence.
|
| In the grand scheme of things, I think the AI generation space
| needs both ends of the spectrum: consistent results like
| Midjourney's lower the barrier to entry for new people to explore
| the space, while prompt-dependent powerhouses like Stable
| Diffusion enable artists to push the tooling further and have
| significantly better control over the art they're trying to
| create.
| paulgb wrote:
| What's being delivered _is_ useful. I agree that you still need a
| human in the loop, but that's true of any creative tool -- having
| Adobe Illustrator doesn't make me an artist. The current
| generation of tools has made certain design tasks easier; the
| main thing still missing is not ML advances as much as nice UIs
| that put them in the hands of creative professionals.
| MikeYasnev007 wrote:
| Whether to share masks and open-source code is an arguable
| question. I definitely don't share anything outside of a
| commercial company
| MikeYasnev007 wrote:
| Regarding sharing some more interesting documents, like shooting
| a nuclear station: it's too simple a tech, but I will check it
| also. Thank you
| seydor wrote:
| > Enhance 34 to 36. Pan right and pull back. Stop. Enhance 34 to
| 46. Pull back. Wait a minute, go right, stop. Enhance 57 to 19.
| Track 45 left. Stop. Enhance 15 to 23. Give me a hard copy right
| there.
| londons_explore wrote:
| Why does the mask need to be binary?
|
| Surely it's possible to have a full alpha mask, such that 50%
| alpha means "push the diffusion process towards this value, but
| don't force it to generate this value".
| bryced wrote:
| The "alpha" is already built into the tooling and specified
| independently of the mask, i.e.
| the Stable Diffusion inpainting takes hints from what you leave
| and "decides" what to keep.
| phire wrote:
| Surely you don't need to?
|
| Just over-expand the mask and let Stable Diffusion decide what
| it's going to keep and what it's going to replace.
___________________________________________________________________
(page generated 2022-09-19 23:00 UTC)
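[Editor's note] The mask discussion in the thread above boils down to two operations: turning a per-pixel text-prompt relevance map (the part clipseg handles in the linked repos) into the hard 0/1 mask that inpainting pipelines expect, and londons_explore's alternative of a fractional alpha blend. The following is a minimal NumPy sketch of both; the function names and the 0.35 threshold are illustrative assumptions, not the actual API or defaults of any of the linked projects.

```python
import numpy as np

def binary_mask(logits, threshold=0.35):
    """Binarize a per-pixel relevance map (e.g. CLIPSeg's logits for a
    text prompt) into the 0/1 mask that inpainting expects.
    The 0.35 threshold is an illustrative choice, not a project default."""
    probs = 1.0 / (1.0 + np.exp(-logits))  # sigmoid: logits -> (0, 1)
    return (probs > threshold).astype(np.float32)

def soft_composite(original, generated, alpha):
    """Fractional-alpha blend: alpha = 0.5 means 'halfway between keep
    and replace' instead of a hard binary cut."""
    return alpha * generated + (1.0 - alpha) * original

# Toy 2x2 example: the top row strongly matches the prompt,
# the bottom row does not.
logits = np.array([[4.0, 3.0], [-4.0, -3.0]])
mask = binary_mask(logits)        # top row 1.0, bottom row 0.0
original = np.zeros((2, 2))       # stand-in for the source image
generated = np.ones((2, 2))       # stand-in for the diffusion output
hard = soft_composite(original, generated, mask)
half = soft_composite(original, generated, np.full((2, 2), 0.5))
```

With a binary mask the composite is a hard cut (top row fully replaced, bottom row fully kept); passing a fractional alpha instead gives the "push towards this value, but don't force it" behavior londons_explore describes, though as bryced notes, real inpainting pipelines also expose a separate strength knob for that.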