[HN Gopher] Stable Diffusion Is the Most Important AI Art Model ... ___________________________________________________________________ Stable Diffusion Is the Most Important AI Art Model Ever Author : brundolf Score : 111 points Date : 2022-08-28 20:17 UTC (2 hours ago) (HTM) web link (thealgorithmicbridge.substack.com) (TXT) w3m dump (thealgorithmicbridge.substack.com) | nullc wrote: | It'll be interesting to see what happens when a copyright troll ( | https://doctorow.medium.com/a-bug-in-early-creative-commons-... ) | realizes that they can acquire the rights to models distributed | under these vague-as-fog moral panic licenses, or distribute | their own and have people actually use them, and start extracting | rents. | | These licenses will do little to nothing to stop abuse: The | abusers will already conceal their identities because their | actions are immoral or even illegal (fraud, harassment, etc). But | they create a whole host of new liabilities for the users because | the definitions are exceedingly subjective. | | It's tremendously important to make these tools actually open. | But open with a lurking liability bomb stops short of the goal. | While stability.ai may never turn into a troll or sell their | rights to one, that isn't necessarily true for the next model | that comes around. | sf_sugar_daddy wrote: | empiricus wrote: | I tried searching "nude" prompt. Instant regret. | metadat wrote: | Definitely don't search "nipples".. ugh... | | https://lexica.art/ | | (Direct link doesn't work, count yourself lucky) | amelius wrote: | Where can I learn more about how this algorithm works? | r2_pilot wrote: | This may be of interest to you: | https://github.com/CompVis/stable-diffusion | veridies wrote: | One oddity for me (and I haven't played with a lot of AI art, so | maybe this is normal): every time I try to describe a person, it | generates like four to seven different faces. | can16358p wrote: | Tried it. 
| | While it's a huge win to be open source, I find the results | always inferior to Midjourney (and DALL-E). | | I tried to generate some artistic results with a variety of prompts | and Midjourney always won hands down. | | But of course, since it's open source, many community tweaks and | colab notebooks/forks will probably put it on par with DALL-E in | time. But I have trouble imagining Stable Diffusion competing | against Midjourney anytime soon: the difference is night and day. | tough wrote: | Midjourney uses SD under the hood (as of recently, no?) | afpx wrote: | After generating 5000 images with these tools, I believe the | killer app will be the one that gives the artist the most | control. I want a view and a scene, and to be able to manipulate | both in real time. | | Like, | | View: 50mm film, wide-angle | | Scene: rectangular room with window -> show preview | | Scene: add table -> show preview | | Scene: move table left -> show preview | | Scene: add mug on table -> show preview | | View: center on mug | | Right now, there's little control and it's a lot of random | guessing, "Hmm what happens if I add these two terms?" | gamegoblin wrote: | Have you seen the img2img results? You draw kind of a crappy | Microsoft Paint style image, give it some text for how you want | it to actually look, and it does the transformation. | | For example: | https://www.reddit.com/r/StableDiffusion/comments/wwgge8/ano... | | Consider also this example of someone splicing Stable Diffusion | into a proper image editor and using a combination of img2img, | text to image, inpainting, and normal photoshop tools: | https://www.reddit.com/r/StableDiffusion/comments/wyduk1/sho... | orbital-decay wrote: | Natural language alone is one of the worst ways to control | image generation. The model knows how to generate anything, but | its own "language" is nothing like yours.
It's like writing in | Finnish, twisting it in such a way that it would yield coherent | Chinese poems after Google Translate. You will end up inserting | various garbage into your input and not getting the result you | like anyway. img2img gives much better results because you can | explain your intent with higher-order tools than textual | input alone. | | What would be best is to properly integrate models like that | into painting software like Krita. Imagine a brush that | only affects freckles, blue teapots, fingers, or sharp corners | (or any other thing in a prompt). Or a brush that learns your | personal style and transfers it onto a rough sketch you make, | speeding up the process. Many possibilities. | | I think they are already making an img2img plugin for | Photoshop. Watch the demo, it's kind of impressive. [0] It's | just a rudimentary prototype of what's possible with a properly | trained model, but it already looks like a drop-in replacement | for photobashing (as an example). | | https://old.reddit.com/r/StableDiffusion/comments/wyduk1/sho... | adhesive_wombat wrote: | Reminds me of the holodeck scene where Picard (edit: Geordi) | reconstructs a table with what I, at the time, thought was a | pretty vague set of specifications. | | Turns out _Star Trek_ predicted 2020s-style AI behaviour | rather well. Considering nuclear war is then due in 2026, | that's disconcerting. | spywaregorilla wrote: | I think the ideal UX will be the ability to mark up images with | little comments and have it adapt accordingly. The prompt | interface is bad, one of the biggest reasons being that you | have virtually no control over the spatial aspect of your | additions. Being able to say "add an elephant here and remove | this lamp" will be big. Being able to do so with a doodle of an | elephant to suggest posing will be even better. | Buttons840 wrote: | I saw the word "safety" a few places in the article. What does | "safety" mean in this context?
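[Editor's note on the img2img workflow discussed above: the common diffusion-sampler approach is to noise the user's rough sketch partway along the schedule and then denoise from there, so a "strength" knob controls how much of the sketch survives. This is a toy numerical sketch of that idea under DDPM-style noising assumptions; the function names and defaults are invented for illustration and this is not the CompVis implementation.]

```python
import numpy as np

def ddpm_schedule(T=1000, beta_start=1e-4, beta_end=0.02):
    """Linear beta schedule as in the original DDPM paper; returns
    alpha_bar[t], the cumulative product of (1 - beta) up to step t."""
    betas = np.linspace(beta_start, beta_end, T)
    return np.cumprod(1.0 - betas)

def img2img_start(x0, strength, T=1000, rng=None):
    """Noise a (latent) image x0 to the level where img2img denoising
    would begin. strength=1.0 means start from pure noise (ignore the
    sketch); strength=0.0 means return the sketch untouched."""
    rng = rng or np.random.default_rng(0)
    alpha_bar = ddpm_schedule(T)
    t = int(strength * T) - 1  # starting timestep index
    if t < 0:
        return x0, 0
    eps = rng.standard_normal(x0.shape)
    # Standard forward-noising mix: sqrt(a_bar)*x0 + sqrt(1 - a_bar)*noise
    xt = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return xt, t + 1  # noised starting point and denoising steps remaining

# A 4x4 array stands in for the latent of a crappy MS Paint drawing.
sketch = np.ones((4, 4))
noised, steps = img2img_start(sketch, strength=0.75)
print(steps)  # 750 of 1000 denoising steps remain
```

[At strength 0.75 the model only runs the last 750 of 1000 denoising steps, which is why low-strength img2img preserves the composition of the input sketch.]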
| i_like_apis wrote: | In this context, it's buzz-killing, pearl-clutching, often | woke nonsense: | | ... attempting to limit the generation space to omit porn, | copyright infringement, violence, racially "unbalanced" content | ... etc. | ttul wrote: | You can ask the AI to generate a picture of horrid things, and | it will oblige. | emikulic wrote: | You can generate things the author doesn't like. And since | you're doing it on your video card at home, nobody can stop | you. | Buttons840 wrote: | If a computer model could produce the world's best porn, | would that be a good or bad thing? Many harmful effects of | porn would be amplified, but it would reduce the | exploitation of real people in the industry. A moral | question society will soon face, I think. | whywhywhywhy wrote: | I've generated over 1000 images in the last 48 hours. It's better | and faster than using DALL-E; I can literally just leave a prompt | churning away in the background for the same cost as playing a | high-end videogame and check on the results when I want. | | Honestly, if I were a commercial concept artist or illustrator who | didn't have a signature style I'd be really worried. We're truly | gonna see the power of this tech as a tool now that it's not gatekept. | daenz wrote: | >once they figure out how to control potentially harmful | generations | | Is it just me, or does anyone else think that this is an | impossible and futile task? I don't have a solid grasp on what | kind of censorship is possible with this technology, but the goal | seems to be on par with making sure nobody says anything mean | online. People are extremely creative and are going to find the | prompts that generate the "harmful" images. | ironmagma wrote: | It's impossible and futile, but that has never stopped | legislators or attorneys before.
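[Editor's note: mechanically, the "control" being debated here usually means one of two things: filtering the text prompt before generation, or classifying the finished image afterwards. A comment further down reports that Stable Diffusion's default config takes the output-side route, swapping flagged images for a Rick Astley picture. Below is a toy sketch of that output-side pattern; the names, the byte-string stand-ins for images, and the stub classifier are all invented for illustration and bear no relation to any real safety model.]

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Generation:
    prompt: str
    image: bytes  # stand-in for real pixel data

PLACEHOLDER = b"rickroll"  # hypothetical replacement image

def gate_output(gen: Generation,
                classifier: Callable[[bytes], float],
                threshold: float = 0.5) -> Generation:
    """Output-side filtering: score the finished image and replace it if
    flagged. The prompt is never inspected, so word-level false positives
    can't occur here; the classifier can still be wrong about the pixels."""
    if classifier(gen.image) >= threshold:
        return Generation(gen.prompt, PLACEHOLDER)
    return gen

# Stub classifier: pretend any image containing b"skin" is unsafe.
stub = lambda img: 1.0 if b"skin" in img else 0.0

ok = gate_output(Generation("a teapot", b"porcelain"), stub)
blocked = gate_output(Generation("classical art", b"skin tones"), stub)
print(ok.image, blocked.image)  # b'porcelain' b'rickroll'
```

[Because the gate sits on the output rather than inside the model, a local user with source access can simply delete it, which is the crux of the thread's argument about open weights.]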
| zmmmmm wrote: | Devil's advocating: given they have trained it so well to | generate images in spite of all expectations, is it really so | hard to imagine that they could also train it to understand | what images not to generate? It already had to understand not | to generate things that don't make sense to humans. How does | this not just amount to "moar training"? The hardest thing is | that the training data it will need is a gigantic store of | objectionable (and illegal) content ... probably not something | many groups are eager to build and host. | systemvoltage wrote: | People _need_ to read about John Carmack and John Romero's epic | adventures making Doom: https://www.amazon.com/Masters-of-Doom- | David-Kushner-audiobo... | | Even in the 90's they had to fight hordes and hordes of | Californian nutjobs (Dianne Feinstein et al.) who wanted to | ban violent video games. These people would certainly be | cancelled in today's world, wouldn't stand a chance. Because, | how dare you allow violence in video games to ...children!? | | Our civilization depends on allowing wackos to do their thing as | long as it is within the limits of the law. Let them be offensive as | fuck. These are the people that herald and propel society | forward with their heterodox thinking. Society is going to decay | fast; it already is. | daenz wrote: | You make a great point... we can't stop the decay, so the | growth has to outpace it. | JaimeThompson wrote: | >hordes of Californian nutjobs (Dianne Feinstein et al.) | | Lots of people from other states, including Texas, were on that | list too. It wasn't just a California / Left issue. | systemvoltage wrote: | Yeah, definitely. When they started a studio in Dallas, I | don't remember which members of Congress took a stance | similar to Dianne's. During the 90's, progressives played a | larger role, though. There was also the Mortal Kombat fiasco: | | > During the U.S.
Congressional hearing on video game | violence, Democratic Party Senator Herb Kohl, working with | Senator Joe Lieberman, attempted to illustrate why | government regulation of video games was needed by showing | clips from 1992's Mortal Kombat and Night Trap (another | game featuring digitized actors). | | https://en.wikipedia.org/wiki/1993_United_States_Senate_hea | r... | mrtksn wrote: | I find it very immoral too; it's like Islamists trying to | prevent pictures of the prophet from being drawn. Not that I want to offend | muslims or make "harmful" content, but this notion that restrictions | on specific types of content creation need to be imposed is very, very | problematic. Americans freak out over nudity all the time, | something that is not considered harmful in many other places. | The fear of images and text, and the mission to restrain it, is | pathetic. | | Anyway, it won't be possible to contain it. Better spend the | effort on how to deal with bad actors instead of trying to | restrain the use of content creation tools. | derac wrote: | Yeah, it's taking the impulse to control everything from our | own mind and putting it into an artificial one. Seems to me a | lot of our suffering is born of that impulse. | quitit wrote: | I don't see the point. Idiots are fooled by far less convincing | images. | | Humanity has had the ability to lie with pictures since the | invention of photography. The field of special effects can be | described as lying about things that don't matter. | | Without using Stable Diffusion, I can still photoshop an image | or deepfake a video. Stable Diffusion isn't really changing | what's possible here, and arguably is less advanced than what's | possible with deepfakes or even the facial filters available on | social networks. | | Like with all deceptive imagery: one just needs to use their | noggin. | | * Also I might add: the article is actually out of date on some | aspects, because this technology is evolving so rapidly.
| Literally every day there is a new and interesting way that | people are applying the tech. | buildbot wrote: | Yeah, the best they can do is filters on top of the | output. These models are complex enough that with some reverse | engineering you can find "secret" languages to instruct them | that would be able to get around input filtering. | adhesive_wombat wrote: | AI Engine Optimisation could be a good consultancy gig. | Figure out how to get your clients the results they want by | gaming the rules and filters. | | Reminds me of the mysterious control of Conjoiner Drives in | Alastair Reynolds's books. | yieldcrv wrote: | I was once a guest at a tech think tank in the early 2000s; the | people were all in their 60s at the time. | | They spent years grappling with online worlds because of the | idea that people might/could represent themselves as a | different gender. They wanted the technology to exist and had | dreamed about it for decades; they just got caught up on that. | | That was comical because it was out of touch even in that time | period as well. | | It's interesting how people squirrel and spiral over useless | things for some time. | hertzrat wrote: | How Orwellian. It's like newspeak: make it impossible to | express certain thoughts. | tjs8rj wrote: | Regardless of the practicality: why do they think it's their | role to be the morality police? | | If there's anything we've learned from history, it's that we've | always been morally wrong in some way, very often in our most | strongly held beliefs. This AI in a different time would be | strictly guided to produce pro-(Catholic | Church/eugenics/slavery/racist/nationalist) content. | worldsayshi wrote: | I think they are just afraid of bad publicity. We remember | some AI experiments mostly for their ability to generate | profanity. | armchairhacker wrote: | The thing is that people can make harmful art themselves.
| Photoshopping people's faces onto nudes and depicting graphic | violence has been a thing since digital photography, if not | painting in general. I mean, look at all the gross stuff which | is online and was online way before these neural networks. | | The issue with these neural networks isn't the content they | create, it's that they can create _massive_ amounts of content, | very easily. You can now do things like: write a Facebook | crawler which photoshops people's photos onto nudes and sends | those to their friends; send out mass phishing emails to old | people with pictures of their grand-kids bloody or in hostage | situations; send out so many deepfakes for an important person | that nobody can tell whether any of their speeches is | legitimate or not. You can also create content even if you have | no graphic design skills, and create content impulsively, | leading to more gross stuff online. | | Spam, misinformation, phishing, and triggering language are | already major issues. These models could make it 10x worse. | rcoveson wrote: | Where today it takes some far-from-Jesus deviant artists a | whole day to draw a picture of Harry Potter making out with | Draco Malfoy, with the power of AI, billions of such images | will flood the Internet. There's just no way for a young | person to resist that amount of gay energy. It's the | apocalypse foretold by John the Revelator. | adhesive_wombat wrote: | > It's the apocalypse foretold by John the Revelator. | | I _literally_ read a chapter of _Inhibitor Phase_ where | there's a ship called "John the Revelator" less than an | hour ago. I haven't otherwise seen that phrase written down | for years. | | Spooky (and cue links to the Baader-Meinhof Wikipedia | article). | dkjaudyeqooe wrote: | Reminds me of a toy doll I heard about with a speech | generator you could program to say sentences, but with | "harmful" words removed, keeping only wholesome ones.
| | I immediately came up with "Call the football team, I'm wet" | and "Daddy, let's play hide the sausage" as example workarounds. | | It's entirely pointless. Humans are vastly superior in their | ability to subvert and corrupt. Even if you were able to catch | regular "harmful" images, humans would create new categories | of imagery which people would experience as "harmful", and | employ allusions, illusions, proxies, irony, etc. It's endless. | zzleeper wrote: | I naively asked for a "sperm whale opening its mouth in the | middle of the ocean" on DALL-E and got a warning :/ | torotonnato wrote: | Another example of funny subversion, Chinese style: | https://www.wikiwand.com/en/Baidu_10_Mythical_Creatures | worldsayshi wrote: | Sure, it would be a fool's errand to filter out "harmful" | speech using traditional algorithms. But neural networks and | beyond seem like exactly the kind of technology that is able | to respond to fuzzy concepts rather than just sets of words. | Sure, it will be a long hunt, but if it can learn to paint and | recognize a myriad of visual concepts it ought to be able to | learn what we consider to be harmful. | adhesive_wombat wrote: | Right? Any keyboard can generate "harmful" content; do we need | to figure out how to prevent "harmful generations" at the USB | HID level? | orbital-decay wrote: | I half expect this could be a genuine startup these days. | robocat wrote: | Run the filter on the image output, not the written input? | jwitthuhn wrote: | Stable Diffusion does run a filter on the output in its | default configuration. Any image it deems 'unsafe' gets | replaced with a picture of Rick Astley. | | The thing is, it's open source, so you can | trivially disable that filter if you like. | | https://github.com/CompVis/stable- | diffusion/blob/69ae4b35e0a... | modeless wrote: | OpenAI's filters are a total joke. I tried to upload The | Creation of Adam (from the Sistine Chapel), blocked for adult | content.
"Continued violations may restrict your account". | Yeah, it has naughty bits in it, but it's probably in the top | ten most recognizable pieces of art ever made. I tried to | generate an image of "yarn bombing", blocked for violence. They | have the most advanced AI in the world and they can't solve the | Scunthorpe problem? | justinjlynn wrote: | The most advanced AI in the world isn't advanced enough to | solve that, just yet. Either that, or it's not worth it for | them to use it to do so. | geoah wrote: | I repeatedly got warnings from GPT-3 for trying to create images | of a "rubber duck". No idea what it thought I was looking | for. | xt00 wrote: | The _reason_ why this is such a game changer is that it is not | controlled on some central server... it's like saying paper and | pencils can be revoked from people if somebody doesn't like | what you do with them... it's an amazing new technology... let | people use it... | buildbot wrote: | I have been playing around with it using ROCm + a 6900 XT, which | makes a good alternative to DALL-E. They have different strengths: DALL-E | seems better at lighting instructions and cityscapes, but Stable | Diffusion is better at sketches. | | Also, you can fine-tune it on whatever you want, which is awesome. | | One interesting effect I have noticed on myself, though, is that | after staring at DALL-E or Stable Diffusion generated images for a | long time then viewing "real" media, I get the same sense of | wrongness that the output is not quite right for a while, like my | brain has been tweaking its processing to prefer the AI art as | the ground truth! | dkjaudyeqooe wrote: | I dub it Induced Uncanny Valley Syndrome. | 71a54xd wrote: | Yep, I've found the same to be true. Hopefully at some point | the model is optimized to introspect complex lighting / | textures a bit better.
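[Editor's note: the "yarn bombing" rejection described above is a textbook Scunthorpe-problem failure: a substring blocklist flags innocent phrases that merely contain a forbidden string. A toy reproduction follows; the word list is invented for illustration and does not reflect any real service's filter.]

```python
BLOCKLIST = {"bomb", "nude"}  # hypothetical word list, illustration only

def naive_filter(prompt: str) -> bool:
    """Return True if the prompt is blocked. Substring matching is
    exactly what produces Scunthorpe-style false positives."""
    p = prompt.lower()
    return any(bad in p for bad in BLOCKLIST)

print(naive_filter("yarn bombing"))  # True: blocked, though it's craft knitting
print(naive_filter("a teapot"))      # False: allowed

# Token-level matching removes that false positive, but is trivially
# evaded by spelling tricks ("b0mb"), which is why pure text filters
# tend to fail in both directions.
def token_filter(prompt: str) -> bool:
    return any(tok in BLOCKLIST for tok in prompt.lower().split())

print(token_filter("yarn bombing"))  # False: "bombing" is not "bomb"
```

[The asymmetry is the point of the comment: the filter over-blocks the honest user and under-blocks anyone willing to obfuscate.]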
| cube2222 wrote: | That's funny, for me DALL-E 2 is in practice miles ahead on | pencil sketches, but Stable Diffusion is cool because the | parameters can be customized, which helps with many phrases. | Also, you can just leave it running and producing images for an | hour. | | Also, there's no content filtering, but I don't recommend | playing around with that if you're sensitive. The lifeless | husks and various mixes of body parts I got when playing around | with it with _fairly_ benign phrases could very well be used | for a horror movie. | | It might be that I haven't yet found the right phrase for | Stable Diffusion for pencil sketches, though, as for DALL-E 2 it's | just "<describe what you want>, artstation, pencil sketch, 4k" | to generate consistently great pictures. | at_a_remove wrote: | 4chan is having a field day with AI-generated porn of | celebrities (often with ridiculous prompts) and selecting the | most unsettling. One for Billie Eilish looks like some kind | of orphaned shoggoth/succubus hybrid that just made its first | attempt at luring in someone for a meal: "You like human | females, yes?" Cataract eyes, aggressive lobotomy mouth; it | forgot to pay attention to shoulders and didn't know spines | existed or what they were for. Or a second attempt, this time | at Björk, suggesting some kind of lost hominid which consumed | only melons in a predator-rich environment. | [deleted] | fernly wrote: | It isn't going to impress the person in the street until it | actually follows your instructions. I tried several times to | express "a tall three-legged stool" but even with the "CFG" (how | much the image will be like your prompt) at max, it gave me | stools with four or, ultimately, two legs. Also tried "a | four-legged spider" (don't ask) and got first an eight-legged | spider, and next a spider with eight legs, but four of them were | blurred.
Sure, dumb, pedestrian requests, no imagination, but a | five-year-old would quickly get impatient with its inability to | follow simple directions. ___________________________________________________________________ (page generated 2022-08-28 23:00 UTC)