[HN Gopher] Stable Diffusion Is the Most Important AI Art Model ...
       ___________________________________________________________________
        
       Stable Diffusion Is the Most Important AI Art Model Ever
        
       Author : brundolf
       Score  : 111 points
       Date   : 2022-08-28 20:17 UTC (2 hours ago)
        
 (HTM) web link (thealgorithmicbridge.substack.com)
 (TXT) w3m dump (thealgorithmicbridge.substack.com)
        
       | nullc wrote:
       | It'll be interesting to see what happens when a copyright troll (
       | https://doctorow.medium.com/a-bug-in-early-creative-commons-... )
       | realizes that they can acquire the rights to models distributed
       | under these vague-as-fog moral panic licenses, or distribute
       | their own and have people actually use them, and start extracting
       | rents.
       | 
       | These licenses will do little to nothing to stop abuse: The
       | abusers will already conceal their identities because their
       | actions are immoral or even illegal (fraud, harassment, etc). But
       | they create a whole host of new liabilities for the users because
       | the definitions are exceedingly subjective.
       | 
       | It's tremendously important to make these tools actually open.
       | But open with a lurking liability bomb stops short of the goal.
       | While stability.ai may never turn into a troll or sell their
       | rights to one, that isn't necessarily true for the next model
       | that comes around.
        
       | empiricus wrote:
       | I tried searching "nude" prompt. Instant regret.
        
         | metadat wrote:
         | Definitely don't search "nipples".. ugh...
         | 
         | https://lexica.art/
         | 
         | (Direct link doesn't work, count yourself lucky)
        
       | amelius wrote:
       | Where can I learn more about how this algorithm works?
        
         | r2_pilot wrote:
         | This may be of interest to you:
         | https://github.com/CompVis/stable-diffusion
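For readers who want the gist before diving into the repo, here is a deliberately toy sketch of the core diffusion idea (an assumed simplification, not the actual CompVis code): noise is mixed into data by a closed-form forward process, and a model is trained to predict that noise so the process can be run in reverse. Here the "model" is a stand-in that returns the exact noise we added.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 50                                  # number of diffusion steps
betas = np.linspace(1e-4, 0.1, T)       # per-step noise schedule
alphas = np.cumprod(1.0 - betas)        # cumulative signal retention

x0 = np.ones(4)                         # "clean image" (a 4-pixel stand-in)
eps = rng.standard_normal(4)            # noise drawn once for the demo

# Forward process: jump straight to step t with the closed-form formula
t = T - 1
xt = np.sqrt(alphas[t]) * x0 + np.sqrt(1.0 - alphas[t]) * eps

# Reverse process: a trained network would predict eps from (xt, t);
# our stand-in "model" returns it exactly, recovering x0 in one step.
predicted_eps = eps
x0_hat = (xt - np.sqrt(1.0 - alphas[t]) * predicted_eps) / np.sqrt(alphas[t])

print(np.allclose(x0_hat, x0))          # the reverse formula inverts the forward one
```

In the real model the denoiser operates on image latents and is conditioned on the text prompt, but the forward/reverse structure is the same.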
        
       | veridies wrote:
       | One oddity for me (and I haven't played with a lot of AI art, so
       | maybe this is normal): every time I try to describe a person, it
       | generates like four to seven different faces.
        
       | can16358p wrote:
       | Tried it.
       | 
       | While it's a huge win to be open source, I find the results
       | always inferior to Midjourney (and DALL-E).
       | 
        | I tried to generate artistic results with a variety of prompts
        | and Midjourney always won hands down.
        | 
        | But of course, since it's open source, many community tweaks and
        | colab notebooks/forks will probably put it on par with DALL-E in
        | time. But I have trouble imagining Stable Diffusion competing
        | against Midjourney anytime soon: the difference is night and day.
        
         | tough wrote:
          | Midjourney uses SD under the hood (as of recently, no?)
        
       | afpx wrote:
       | After generating 5000 images with these tools, I believe the
        | killer app will be the one that gives the artist the most
        | control. I want a view and a scene, and to be able to manipulate
        | both in real time.
       | 
       | Like,
       | 
       | View: 50mm film, wide-angle
       | 
       | Scene: rectangular room with window -> show preview
       | 
       | Scene: add table -> show preview
       | 
       | Scene: move table left -> show preview
       | 
       | Scene: add mug on table -> show preview
       | 
       | View: center on mug
       | 
       | Right now, there's little control and it's a lot of random
       | guessing, "Hmm what happens if I add these two terms?"
        
         | gamegoblin wrote:
         | Have you seen the img2img results? You draw kind of a crappy
         | Microsoft Paint style image, give it some text for how you want
         | it to actually look, and it does the transformation.
         | 
         | For example:
         | https://www.reddit.com/r/StableDiffusion/comments/wwgge8/ano...
         | 
         | Consider also this example of someone splicing Stable Diffusion
         | into a proper image editor and using a combination of img2img,
         | text to image, inpainting, and normal photoshop tools:
         | https://www.reddit.com/r/StableDiffusion/comments/wyduk1/sho...
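A minimal sketch of the img2img idea described above, under the assumption that it works by partially noising the input image (controlled by a strength setting) rather than starting from pure noise; the names `noise_to_step` and `strength` are illustrative, not the real API:

```python
import numpy as np

rng = np.random.default_rng(1)
T = 50
betas = np.linspace(1e-4, 0.1, T)       # toy noise schedule
alphas = np.cumprod(1.0 - betas)

def noise_to_step(image, strength):
    """Noise `image` toward step t = strength * (T - 1), strength in [0, 1]."""
    t = min(int(strength * (T - 1)), T - 1)
    eps = rng.standard_normal(image.shape)
    return np.sqrt(alphas[t]) * image + np.sqrt(1.0 - alphas[t]) * eps

sketch = np.full(1000, 0.5)             # crude "MS Paint" input, flattened

weak = noise_to_step(sketch, 0.2)       # low strength: stays close to input
strong = noise_to_step(sketch, 0.9)     # high strength: mostly noise

# Lower strength leaves the latent nearer the original sketch, which is
# why img2img preserves composition while the denoiser repaints detail.
print(np.abs(weak - sketch).mean() < np.abs(strong - sketch).mean())
```

The denoiser then runs from that intermediate step back to a clean image, guided by the text prompt, so the output keeps the input's overall layout.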
        
         | orbital-decay wrote:
          | Natural language alone is one of the worst ways to control
          | image generation. The model knows how to generate anything,
          | but its own "language" is nothing like yours. It's like
          | writing in Finnish, twisted in such a way that it yields
          | coherent Chinese poems after Google Translate. You will end up
          | inserting various garbage into your input and still not
          | getting the result you want. img2img gives much better results
          | because you can explain your intent with higher-order tools
          | than textual input alone.
         | 
         | What would be best is to properly integrate models like that
         | into some painting software like Krita. Imagine a brush that
          | only affects freckles, blue teapots, fingers, or sharp corners
          | (or any other thing in a prompt). Or a brush that learns your
         | personal style and transfers it onto a rough sketch you make,
         | speeding up the process. Many possibilities.
         | 
         | I think they are already making an img2img plugin for
         | Photoshop. Watch the demo, it's kind of impressive. [0] It's
         | just a rudimentary prototype of what's possible with a properly
         | trained model, but it already looks like a drop-in replacement
         | for photobashing (as an example).
         | 
         | https://old.reddit.com/r/StableDiffusion/comments/wyduk1/sho...
        
         | adhesive_wombat wrote:
          | Reminds me of the holodeck scene where Picard (edit: Geordi)
          | reconstructs a table with what I, at the time, thought was a
          | pretty vague set of specifications.
          | 
          | Turns out _Star Trek_ predicted 2020s-style AI behaviour
          | rather well. Considering nuclear war is then due in 2026,
          | that's disconcerting.
        
         | spywaregorilla wrote:
         | I think the ideal UX will be the ability to markup images with
         | little comments and have it adapt accordingly. The prompt
          | interface is bad, one of the biggest reasons being that you
          | have virtually no control over the spatial aspect of your
          | additions. Being able to say "add an elephant here and remove
         | this lamp" will be big. Being able to do so with a doodle of an
         | elephant to suggest posing will be even better.
        
       | Buttons840 wrote:
       | I saw the word "safety" a few places in the article. What does
       | "safety" mean in this context?
        
         | i_like_apis wrote:
          | In this context, it's buzz-killing, pearl-clutching, often
          | woke nonsense:
         | 
         | ... attempting to limit the generation space to omit porn,
         | copyright infringement, violence, racially "unbalanced" content
         | ... etc.
        
         | ttul wrote:
         | You can ask the AI to generate a picture of horrid things, and
         | it will oblige.
        
           | emikulic wrote:
           | You can generate things the author doesn't like. And since
           | you're doing it on your video card at home, nobody can stop
           | you.
        
             | Buttons840 wrote:
             | If a computer model could produce the world's best porn,
             | would that be a good or bad thing? Many harmful effects of
             | porn would be amplified, but it would reduce the
              | exploitation of real people in the industry. A moral
              | question society will soon face, I think.
        
       | whywhywhywhy wrote:
       | I've generated over 1000 images in the last 48 hours. It's better
        | and faster than using DALL-E; I can literally just leave a
        | prompt churning away in the background for the same cost as
        | playing a high-end videogame and check on the results when I
        | want.
        | 
        | Honestly, if I were a commercial concept artist or illustrator
        | without a signature style, I'd be really worried. We're truly
        | going to see the power of this tech as a tool now that it's not
        | gatekept.
        
       | daenz wrote:
       | >once they figure out how to control potentially harmful
       | generations
       | 
       | Is it just me, or does anyone else think that this is an
       | impossible and futile task? I don't have a solid grasp on what
       | kind of censorship is possible with this technology, but the goal
       | seems to be on par with making sure nobody says anything mean
       | online. People are extremely creative and are going to find the
       | prompts that generate the "harmful" images.
        
         | ironmagma wrote:
         | It's impossible and futile, but that has never stopped
         | legislators or attorneys before.
        
         | zmmmmm wrote:
          | Devil's advocating: given they have trained it so well to
          | generate images in spite of all expectations, is it really so
          | hard to imagine that they could also train it to understand
          | what images not to generate? It already had to learn not to
          | generate things that don't make sense to humans. How is this
          | not just "moar training"? The hardest part is that the
          | training data it would need is a gigantic store of
          | objectionable (and illegal) content ... probably not something
          | many groups are eager to build and host.
        
         | systemvoltage wrote:
          | People _need_ to read about John Carmack and John Romero's
          | epic adventures making Doom:
          | https://www.amazon.com/Masters-of-Doom-David-Kushner-audiobo...
         | 
          | Even in the '90s they had to fight hordes and hordes of
          | Californian nutjobs (Dianne Feinstein et al.) who wanted to
          | ban violent video games. These people would certainly be
          | cancelled in today's world, wouldn't stand a chance. Because,
          | how dare you allow violence in video games to reach
          | ...children!?
          | 
          | Our civilization depends on allowing wackos to do their thing
          | as far as it is within the limits of the law. Let them be
          | offensive as fuck. These are the people who herald and propel
          | society forward with their heterodox thinking. Society is
          | going to decay fast; it already is.
        
           | daenz wrote:
           | You make a great point... we can't stop the decay, so the
           | growth has to outpace it.
        
           | JaimeThompson wrote:
            | >hordes of Californian nutjobs (Dianne Feinstein et al.)
           | 
           | Lots of people from other states, including Texas, in that
           | list too. It wasn't just a California / Left issue.
        
             | systemvoltage wrote:
              | Yeah, definitely. When they started a studio in Dallas, I
              | don't remember which congresspersons took a similar stance
              | to Feinstein's. During the '90s, progressives played the
              | larger role, though. There was also the Mortal Kombat
              | fiasco:
             | 
             | > During the U.S. Congressional hearing on video game
             | violence, Democratic Party Senator Herb Kohl, working with
             | Senator Joe Lieberman, attempted to illustrate why
             | government regulation of video games was needed by showing
             | clips from 1992's Mortal Kombat and Night Trap (another
             | game featuring digitized actors).
             | 
              | https://en.wikipedia.org/wiki/1993_United_States_Senate_hear...
        
         | mrtksn wrote:
          | I find it very immoral too; it's like Islamists trying to
          | prevent pictures of the prophet from being drawn. Not that I
          | want to offend Muslims or make "harmful" content, but this
          | notion that restrictions on specific types of content creation
          | need to be imposed is very, very problematic. Americans freak
          | out over nudity all the time, something that is not considered
          | harmful in many other places. The fear of images and text, and
          | the mission to restrain them, is pathetic.
         | 
         | Anyway, it won't be possible to contain it. Better spend the
         | effort on how to deal with bad actors instead of trying to
         | restrain the use of content creation tools.
        
           | derac wrote:
            | Yeah, it's taking the impulse to control everything from our
            | own minds and putting it into an artificial one. It seems to
            | me a lot of our suffering is born of that impulse.
        
         | quitit wrote:
         | I don't see the point. Idiots are fooled by far less convincing
         | images.
         | 
         | Humanity has had the ability to lie with pictures since the
         | invention of photography. The field of special effects can be
         | described as lying about things that don't matter.
         | 
         | Without using Stable Diffusion, I can still photoshop an image
         | or deepfake a video. Stable Diffusion isn't really changing
         | what's possible here, and arguably is less advanced than what's
         | possible with Deepfakes or even the facial filters available on
         | social networks.
         | 
         | Like with all deceptive imagery: one just needs to use their
         | noggin.
         | 
          | * Also, I might add: the article is actually out of date in
          | some respects, because this technology is evolving so rapidly.
          | Literally every day there is a new and interesting way that
          | people are applying the tech.
        
         | buildbot wrote:
          | Yeah, the best they can do is filters on top of the output.
          | These models are complex enough that with some reverse
          | engineering you can find "secret" languages to instruct them
          | that get around input filtering.
        
           | adhesive_wombat wrote:
           | AI Engine Optimisation could be a good consultancy gig.
           | Figure out how to get your clients the results they want by
           | gaming the rules and filters.
           | 
            | Reminds me of the mysterious control of Conjoiner Drives in
            | Alastair Reynolds' books.
        
         | yieldcrv wrote:
          | I was once a guest at a tech think tank in the early 2000s;
          | the people were all in their 60s at the time.
          | 
          | They spent years grappling with online worlds because of the
          | idea that people might/could represent themselves as a
          | different gender. They wanted the technology to exist and had
          | dreamed about it for decades; they just got caught up on that.
          | 
          | That was comical because it was out of touch even for that
          | time period.
          | 
          | It's interesting how people squirrel and spiral over useless
          | things for some time.
        
         | hertzrat wrote:
          | How Orwellian. It's like Newspeak: make it impossible to
          | express certain thoughts.
        
         | tjs8rj wrote:
         | Regardless of the practicality: why do they think it's their
         | role to be the morality police?
         | 
         | If there's anything we've learned from history, it's that we've
         | always been morally wrong in some way, very often in our most
         | strongly held beliefs. This AI in a different time would be
         | strictly guided to produce pro-(Catholic
         | Church/eugenics/slavery/racist/nationalist) content.
        
           | worldsayshi wrote:
           | I think they are just afraid of bad publicity. We remember
           | some AI experiments mostly for their ability to generate
           | profanity.
        
         | armchairhacker wrote:
          | The thing is that people can make harmful art themselves.
          | Photoshopping people's faces onto nudes and depicting graphic
          | violence have been a thing since digital photography, if not
          | painting in general. I mean, look at all the gross stuff which
          | is online and was online way before these neural networks.
         | 
          | The issue with these neural networks isn't the content they
          | create; it's that they can create _massive_ amounts of
          | content, very easily. You can now do things like: write a
          | Facebook crawler which photoshops people's photos onto nudes
          | and sends those to their friends; send out mass phishing
          | emails to old people with pictures of their grandkids bloody
          | or in hostage situations; send out so many deepfakes of an
          | important person that nobody can tell whether any of their
          | speeches is legitimate. You can also create content even if
          | you have no graphic design skills, and create content
          | impulsively, leading to more gross stuff online.
         | 
         | Spam, misinformation, phishing, and triggering language are
         | already major issues. These models could make it 10x worse.
        
           | rcoveson wrote:
           | Where today it takes some far-from-Jesus deviant artists a
           | whole day to draw a picture of Harry Potter making out with
           | Draco Malfoy, with the power of AI, billions of such images
           | will flood the Internet. There's just no way for a young
           | person to resist that amount of gay energy. It's the
            | apocalypse foretold by John the Revelator.
        
             | adhesive_wombat wrote:
              | > It's the apocalypse foretold by John the Revelator.
             | 
             | I _literally_ read a chapter of _Inhibitor Phase_ where
             | there 's a ship called "John the Revelator" less than an
             | hour ago. I haven't otherwise seen that phrase written down
             | for years.
             | 
             | Spooky (and cue links to the Baader-Meinhof Wikipedia
             | article).
        
         | dkjaudyeqooe wrote:
          | Reminds me of a girl's toy doll I heard about which had a
          | speech generator you could program to say sentences, but with
          | "harmful" words removed, keeping only wholesome ones.
         | 
          | I immediately came up with "Call the football team, I'm wet"
          | and "Daddy, let's play hide the sausage" as example
          | workarounds.
         | 
          | It's entirely pointless. Humans are vastly superior in their
          | ability to subvert and corrupt. Even if you were able to catch
          | regular "harmful" images, humans would create new categories
          | of imagery which people would experience as "harmful",
          | employing allusions, illusions, proxies, irony, etc. It's
          | endless.
        
           | zzleeper wrote:
           | I naively asked for a "sperm whale opening its mouth in the
           | middle of the ocean" on DALL-E and got a warning :/
        
           | torotonnato wrote:
            | Another example of funny subversion, Chinese style:
            | https://www.wikiwand.com/en/Baidu_10_Mythical_Creatures
        
           | worldsayshi wrote:
            | Sure, it would be a fool's errand to filter out "harmful"
            | speech using traditional algorithms. But neural networks and
            | beyond seem like exactly the kind of technology that can
            | respond to fuzzy concepts rather than just sets of words.
            | Sure, it will be a long hunt, but if it can learn to paint
            | and recognize a myriad of visual concepts, it ought to be
            | able to learn what we consider to be harmful.
        
         | adhesive_wombat wrote:
         | Right? Any keyboard can generate "harmful" content, do we need
         | to figure out how to prevent "harmful generations" at the USB
         | HID level?
        
           | orbital-decay wrote:
           | I half expect this could be a genuine startup these days.
        
           | robocat wrote:
           | Run the filter on the image output, not the written input?
        
             | jwitthuhn wrote:
              | Stable Diffusion does run a filter on the output in its
              | default configuration. Any image it deems 'unsafe' gets
              | replaced with a picture of Rick Astley.
              | 
              | The thing is that it is open source, so you can trivially
              | disable that filter if you like.
              | 
              | https://github.com/CompVis/stable-diffusion/blob/69ae4b35e0a...
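A toy sketch of this output-filtering pattern, showing why a filter shipped in open-source code is trivial to disable. The functions here (`looks_unsafe`, `generate`, the placeholder filename) are hypothetical stand-ins, not the actual CompVis `check_safety` code:

```python
PLACEHOLDER = "rickroll.png"            # stand-in for the replacement image

def looks_unsafe(image):
    # Dummy stand-in for a learned safety classifier.
    return "nsfw" in image

def generate(prompt):
    # Dummy stand-in for the diffusion sampler; returns two "images".
    return [f"{prompt}-0", f"{prompt}-nsfw-1"]

def generate_filtered(prompt, safety_checker=looks_unsafe):
    images = generate(prompt)
    if safety_checker is None:          # disabling the filter is one line
        return images
    return [PLACEHOLDER if safety_checker(img) else img for img in images]

print(generate_filtered("cat"))                        # flagged image replaced
print(generate_filtered("cat", safety_checker=None))   # filter disabled
```

Because the check happens client-side, anyone running the weights locally can pass `None` (or delete the check) and see the raw output.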
        
         | modeless wrote:
          | OpenAI's filters are a total joke. I tried to upload The
          | Creation of Adam (from the Sistine Chapel); it was blocked for
          | adult content: "Continued violations may restrict your
          | account". Yeah, it has naughty bits in it, but it's probably
          | among the top ten most recognizable pieces of art ever made. I
          | tried to generate an image of "yarn bombing", blocked for
          | violence. They have the most advanced AI in the world and they
          | can't solve the Scunthorpe problem?
        
           | justinjlynn wrote:
           | The most advanced AI in the world isn't advanced enough to
           | solve that, just yet. Either that, or it's not worth it for
           | them to use it to do so.
        
           | geoah wrote:
            | I repeatedly got warnings from GPT-3 for trying to create
            | images of a "rubber duck". No idea what it thought I was
            | looking for.
        
         | xt00 wrote:
        | The _reason_ why this is such a game changer is that it is not
        | controlled on some central server. It's like saying paper and
        | pencils can be revoked from people if somebody doesn't like
        | what you do with them. It's an amazing new technology. Let
        | people use it.
        
       | buildbot wrote:
        | I have been playing around with it using ROCm and a 6900 XT; it
        | makes a good alternative to DALL-E. They have different
        | strengths: DALL-E seems better at lighting instructions and
        | cityscapes, but Stable Diffusion is better at sketches.
       | 
       | Also, you can fine tune it on whatever you want which is awesome.
       | 
        | One interesting effect I have noticed on myself, though, is
        | that after staring at DALL-E or Stable Diffusion generated
        | images for a long time and then viewing "real" media, I get the
        | same sense of wrongness, that the output is not quite right, for
        | a while, like my brain has been tweaking its processing to
        | prefer the AI art as the ground truth!
        
         | dkjaudyeqooe wrote:
         | I dub it Induced Uncanny Valley Syndrome.
        
         | 71a54xd wrote:
         | Yep, I've found the same to be true. Hopefully at some point
         | the model is optimized to introspect complex lighting /
         | textures a bit better.
        
         | cube2222 wrote:
          | That's funny; for me DALL-E 2 is in practice miles ahead on
          | pencil sketches, but Stable Diffusion is cool because the
          | parameters can be customized, which helps with many phrases.
          | Also, you can just leave it running, producing images for an
          | hour.
         | 
         | Also, there's no content filtering, but I don't recommend
         | playing around with that if you're sensitive. The lifeless
         | husks and various mixes of body parts I got when playing around
         | with it with _fairly_ benign phrases could very well be used
         | for a horror movie.
         | 
          | It might be that I haven't yet found the right phrase for
          | Stable Diffusion pencil sketches, though; for DALL-E 2 it's
          | just "<describe what you want>, artstation, pencil sketch, 4k"
          | to generate consistently great pictures.
        
           | at_a_remove wrote:
           | 4chan is having a field day with AI generated porn of
           | celebrities (often with ridiculous prompts) and selecting the
            | most unsettling. One for Billie Eilish looks like some kind
           | of orphaned shoggoth/succubus hybrid just made its first
           | attempt at luring in someone for a meal: "You like human
           | females, yes?" Cataract eyes, aggressive lobotomy mouth, it
           | forgot to pay attention to shoulders and didn't know spines
           | existed or what they were for. Or a second attempt, this time
           | at Bjork, suggesting some kind of lost hominid which consumed
           | only melons in a predator-rich environment.
        
       | [deleted]
        
       | fernly wrote:
        | It isn't going to impress the person in the street until it
        | actually follows your instructions. I tried several times to
        | express "a tall three-legged stool", but even with the "CFG"
        | scale (how closely the image follows your prompt) at max, it
        | gave me stools with four or, ultimately, two legs. I also tried
        | "a four-legged spider" (don't ask) and got first an eight-legged
        | spider, and next a spider with eight legs, four of them blurred.
        | Sure, dumb, pedestrian requests, no imagination, but a
        | five-year-old would quickly get impatient with its inability to
        | follow simple directions.
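For context, the "CFG" scale mentioned above controls classifier-free guidance: at each denoising step the model predicts the noise twice, with and without the prompt, and the final prediction extrapolates from the unconditional one toward the conditional one. A minimal sketch of the standard formula, with dummy arrays standing in for real model outputs:

```python
import numpy as np

def guided_noise(eps_uncond, eps_cond, scale):
    """Classifier-free guidance: scale=1 uses the conditional prediction
    as-is; larger scales push the sample harder toward the prompt."""
    return eps_uncond + scale * (eps_cond - eps_uncond)

eps_uncond = np.array([0.0, 0.0])       # prediction without the prompt
eps_cond = np.array([1.0, -1.0])        # prediction with the prompt

print(guided_noise(eps_uncond, eps_cond, 1.0))   # [ 1. -1.]: plain conditional
print(guided_noise(eps_uncond, eps_cond, 7.5))   # [ 7.5 -7.5]: stronger pull toward the prompt
```

Even at a high scale this only amplifies what the model already associates with the prompt, which is consistent with the experience above: guidance strength cannot supply concepts (like exact leg counts) the model never learned to bind to words.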
        
       ___________________________________________________________________
       (page generated 2022-08-28 23:00 UTC)