[HN Gopher] Some notes on the Stable Diffusion safety filter
       ___________________________________________________________________
        
       Some notes on the Stable Diffusion safety filter
        
       Author : Tomte
       Score  : 99 points
       Date   : 2022-11-18 16:10 UTC (6 hours ago)
        
 (HTM) web link (vickiboykis.com)
 (TXT) w3m dump (vickiboykis.com)
        
       | vintermann wrote:
       | Now I really want to see the naughty picture of a dolphin
       | swimming in a sea of vectors that the model refused to show us.
        
       | sacrosancty wrote:
        
       | minimaxir wrote:
        | Unfortunately the safety filter has enough false positives
        | (basically any image with a large amount of fleshy color) that
        | it's just easier to disable it and handle moderation manually.
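        | 
        | As a rough sketch of what that looks like with HuggingFace
        | diffusers (attribute and model names are from the 2022-era API
        | and may differ in your version):
        | 
        |     from diffusers import StableDiffusionPipeline
        | 
        |     # May require accepting the model license on the Hub first;
        |     # add .to("cuda") if you have a GPU.
        |     pipe = StableDiffusionPipeline.from_pretrained(
        |         "CompVis/stable-diffusion-v1-4"
        |     )
        | 
        |     # The checker is just an attribute on the pipeline object;
        |     # dropping it skips the check entirely.
        |     pipe.safety_checker = None
        | 
        |     image = pipe("a dolphin in a sea of vectors").images[0]
        |     # ...then run whatever moderation you actually trust on image.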
        
         | langitbiru wrote:
         | "Gal Gadot wearing green suit" triggered it while "Tom Cruise
         | wearing green suit" didn't.
        
           | naet wrote:
           | Might be the word "gal" (which can mean girl or young woman).
        
           | Wistar wrote:
           | As did "young children watching sunset," but not "young boy
           | and girl watching sunset."
        
             | [deleted]
        
         | criddell wrote:
         | And then the CSAM filter on your device reports you to some
         | authority.
        
           | iceburgcrm wrote:
           | Only available on the latest iphone
        
             | [deleted]
        
             | Gigachad wrote:
              | Apple ended up not implementing that, iirc, while Google
              | Photos has had it the whole time.
              | 
              | Google's is actually worse. Apple was only going to match
              | against known CSAM images, while Google uses ML to identify
              | new images, which resulted in one parent being arrested for
              | a medical image of their own child.
        
           | BoorishBears wrote:
           | If you have things on your device that match entries in the
           | CSAM database, yes there's a chance you're a victim of a
           | targeted attack taking advantage of highly experimental
           | collisions... but the odds you "accidentally generated" that
           | content are not realistic.
        
             | yeet_yeet_yeet wrote:
             | >not realistic
             | 
             | The odds are zero.
             | 
             | 1/2^256 = 0.
             | 
             | In cryptography these odds are treated as zero until you
             | generate close to 2^128 images.
             | 
              | Unfortunately there's no word in natural English to
              | describe how unlikely that is. The most precise word is
              | "zero".
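              | 
              | Back-of-the-envelope, assuming a uniformly random 256-bit
              | hash:
              | 
              |     # Birthday bound: the chance of any collision among n
              |     # uniformly random 256-bit hashes is roughly n^2 / 2^257.
              |     def collision_probability(n, bits=256):
              |         return n * n / 2 ** (bits + 1)
              | 
              |     print(collision_probability(10**12))  # ~4e-54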
        
               | criddell wrote:
               | How can you be so sure? As I understand it, the hash is
               | of features in the image and not the image itself. Are
               | the CSAM feature detection heuristics public?
        
               | ben_w wrote:
                | Are you assuming that digital images are evenly
                | distributed over the set of all possible 256-bit vectors?
               | 
               | Because I don't think that's a reasonable assumption.
               | 
               | Even if image recognition was perfectly solved with no
               | known edge cases (ha!), when an entire topic is a
               | semantic stop-sign for most people, you can't expect the
               | mysterious opaque box that is a guilty-enough-to-
               | investigate detection mechanism to be something that gets
               | rapid updates and corrections when new failure modes are
               | discovered.
        
               | jerf wrote:
               | You should spend some time with an internet search engine
               | and the term "perceptual hashing". What you're talking
               | about is another type of hashing, which can be useful for
                | classifying image _files_, but not _images_. The former
                | has a very concrete definition that is specified down to
                | the bit; the latter is a fuzzy space because it's trying
               | to yield similar (not necessarily identical) hashes for
               | images that humans consider similar. Much different
               | space, much different problem, much different collision
               | situation. Cryptographic hashing is not the only kind of
               | hashing.
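                | 
                | To make the distinction concrete, the classic "average
                | hash" is about the smallest possible perceptual hash
                | (this is just the textbook aHash idea, not Apple's
                | NeuralHash, which uses a neural network; assumes Pillow):
                | 
                |     from PIL import Image
                | 
                |     def average_hash(path, hash_size=8):
                |         # Shrink, drop color, then compare each pixel to
                |         # the mean: similar-looking images produce mostly
                |         # the same bits.
                |         img = Image.open(path).convert("L")
                |         img = img.resize((hash_size, hash_size))
                |         pixels = list(img.getdata())
                |         mean = sum(pixels) / len(pixels)
                |         bits = "".join("1" if p > mean else "0" for p in pixels)
                |         return int(bits, 2)
                | 
                |     def hamming(a, b):
                |         # Small distance = perceptually similar images.
                |         return bin(a ^ b).count("1")
                | 
                | Two re-encodes of the same photo land a few bits apart;
                | unrelated photos land many bits apart. Collisions here
                | are a statement about image content, not about random
                | bit strings.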
        
               | yeet_yeet_yeet wrote:
               | Oh wow https://www.apple.com/child-
               | safety/pdf/CSAM_Detection_Techni... so they essentially
               | just use CNN output to automatically determine whether to
               | report people to the authorities? For some reason I
               | assumed they were just comparing the files they knew to
               | be CSAM.
               | 
                | Yeah, that's bad. What about deepdream/CNN reversing?
                | Couldn't a rogue Apple engineer just create an innocuous-
                | looking false positive, say a cat picture, share it on
                | Reddit, and get everybody who downloads it flagged to the
                | police for CSAM?
        
         | netruk44 wrote:
          | That'll only work for a little while longer (for future named
          | big-public-release models; obviously the cat's out of the bag
          | for the current version of stable diffusion), right up until
          | the point where they incorporate the filter into the training
          | process.
          | 
          | At which point, the model that end users get to download will
          | be incapable of producing anything that comes close to
          | triggering the filter, and there will be no way to work around
          | it short of training/fine-tuning your own model, which is
          | prohibitively expensive for 'normal' people, even people with
          | top-of-the-line graphics cards like a 4090.
        
           | ben_w wrote:
           | Training's only prohibitively expensive for normal people
            | _today_, and the dollar cost per compute operation is still
           | decreasing fairly rapidly.
        
           | Animats wrote:
           | That problem is being solved. Pornhub now has an AI R&D
           | unit.[1] Their current project is to upscale and colorize out
           | of copyright vintage porn. As a training set, they use modern
           | porn. They point out that they have access to a big training
           | set.
           | 
           | Next step, porn generation.
           | 
           | [1] https://www.pornhub.com/art/remastured
        
             | pessimizer wrote:
             | > Their current project is to upscale and colorize out of
             | copyright vintage porn.
             | 
             | But not very well. I collect this stuff and I have my own
             | copies, so I can tell you that this doesn't look better
             | than the b/w originals in quality/detail, and it's easy to
             | see that the color is not great, especially if there are
             | lots of hard lights and shadows dancing around.
             | 
             | That being said, I don't know why it's not working. Seems
             | like it should work. I'd expect it to at least be clean of
             | scratches and stabilized. Any relevant papers I should read
             | about AI restoration of old film?
        
             | sillysaurusx wrote:
             | For a glimpse at what's possible:
             | 
             | https://www.reddit.com/r/unstablediffusion
             | 
             | https://www.reddit.com/r/aiwaifu
             | 
             | I've been trying to generate tentacle porn since 2019 or
             | so. It's the whole reason I got into AI. We're finally
             | there, and it only took three years.
             | 
             | Can't wait to see what 2026 brings.
             | http://n.actionsack.com/pic/media%2FFh08F_hXkAAhalt.jpg
        
               | GuB-42 wrote:
               | > https://www.reddit.com/r/unstablediffusion
               | 
               | This subreddit was banned due to a violation of Reddit's
               | rules against non-consensual intimate media.
               | 
               | Interesting. Why "non-consensual"? Does it mean Stable
               | Diffusion generated porn of people who actually exist?
        
               | sillysaurusx wrote:
               | Sorry all, I was typing it on my phone and missed an
               | underscore. Here's the proper link:
               | 
               | https://www.reddit.com/r/unstable_diffusion
        
               | emmelaich wrote:
               | unstable_diffusion is still around. Note the underscore.
        
               | sbierwagen wrote:
               | Yes, reddit routinely bans deepfake subreddits. In
               | practice, this means any net that can produce output that
               | looks like any living person is banned.
        
           | pifm_guy wrote:
           | Fine-tuning is pretty cheap compared to the original training
           | run - perhaps just 1% of the cost.
           | 
            | Totally within reach of a consortium of... "entertainment
            | specialists".
        
             | netruk44 wrote:
             | I know a person who fine-tuned stable diffusion, and he
              | said it took 2 weeks of 8xA100 80 GB training time, costing
              | him somewhere between $500 and $700 (he got a pretty big
              | discount, too; at today's peer GPU rental prices it would
              | be over $1,000).
             | 
             | Sure, it's peanuts compared to what it must have cost to
             | train stable diffusion from scratch. However, I think most
             | normal people would not consider spending $500 to fine-tune
             | one of these.
             | 
             | Edit: Though I do agree that once this kind of filtering is
             | in place during training, NSFW models will begin to pop up
             | all over the place.
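              | 
              | Rough math behind that number, assuming roughly $0.40 per
              | rented A100-hour (just a ballpark; rates vary a lot by
              | provider):
              | 
              |     gpus, hours, rate = 8, 14 * 24, 0.40
              |     print(gpus * hours * rate)  # ~1075 dollars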
        
               | minimaxir wrote:
                | Spot-finetuning with Dreambooth (not as good as full
                | finetuning, but it can capture a specific subject/style
                | much faster) can be done with about $0.08 of GPU compute,
                | although optimizing it is harder.
               | 
               | https://huggingface.co/docs/diffusers/training/dreambooth
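                | 
                | Once trained, the output directory loads like any other
                | checkpoint (the path and the "sks" placeholder token
                | below are just the usual Dreambooth convention, not
                | requirements):
                | 
                |     from diffusers import StableDiffusionPipeline
                | 
                |     # The directory the Dreambooth training script
                |     # saved its weights to.
                |     pipe = StableDiffusionPipeline.from_pretrained(
                |         "./dreambooth-output"
                |     )
                |     image = pipe("a photo of sks dog").images[0]
                |     image.save("sks-dog.png")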
        
               | netruk44 wrote:
                | Are these services using textual inversion? If so, I
                | have to wonder how well they would work on a stable
                | diffusion model that was trained with the filter in place
                | from the start, so that it couldn't generate anything
                | close to triggering the filter.
               | 
                | As it is right now, stable diffusion _can_ generate
                | adult imagery by itself; however, it seems like it's been
                | fine-tuned after the fact to try to 'cover up' that fact
                | as much as they could before releasing the model
                | publicly.
        
               | gpderetta wrote:
               | As far as I understand textual inversion != Dreambooth !=
               | Actual fine-tuning
        
               | seaal wrote:
                | I believe the safety filter is trivial to disable since
                | it was added in one of the last commits prior to Stable
                | Diffusion's public release and is not baked into the
                | model; most forks therefore just remove the safety
                | checker code [1].
                | 
                | As for textual inversion, JoePenna's Dreambooth [2]
                | implementation uses Textual Inversion.
               | 
               | [1] https://github.com/CompVis/stable-
               | diffusion/commit/a6e2f3b12... [2]
               | https://github.com/JoePenna/Dreambooth-Stable-Diffusion
        
               | cookingrobot wrote:
               | You can fine tune stable diffusion for $10 using this
               | service: https://www.strmr.com/
               | 
                | It works super well for putting yourself in the images;
                | the likeness is fantastic.
                | 
                | It's obviously a small training process (they only take
                | 20 images), but it works.
        
           | TaylorAlexander wrote:
           | This prediction doesn't track with what is already happening.
           | Dreambooth is allowing all kinds of people to fine tune their
           | own models at home with nvidia graphics cards, and people are
           | sharing all kinds of updated models that do really well at
           | specific art styles or with NSFW subjects. Go check the nsfw
           | subreddit unstable_diffusion for examples. It seems lots of
           | people are training nsfw models with their own preferred data
            | sets and last I saw someone merged all those checkpoints
            | together into one model.
           | 
           | So if I made a prediction it would be that the training sets
           | for open models from big companies will get scrubbed of nsfw
           | content and then nerds on Reddit will just release their own
           | versions with it added in, and the big companies will make
           | sure everyone knows they didn't add that stuff and that's
           | where it will stand.
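            | 
            | The "merging" people do is mostly just a weighted average of
            | the checkpoints' weights, something like this bare-bones
            | sketch (assumes two .ckpt files with matching keys; real
            | merge tools also handle mismatched keys, dtypes and EMA
            | weights):
            | 
            |     import torch
            | 
            |     a = torch.load("model_a.ckpt", map_location="cpu")
            |     b = torch.load("model_b.ckpt", map_location="cpu")
            |     sd_a, sd_b = a["state_dict"], b["state_dict"]
            | 
            |     alpha = 0.5  # blend ratio between the two models
            |     merged = {k: alpha * sd_a[k] + (1 - alpha) * sd_b[k]
            |               for k in sd_a}
            | 
            |     torch.save({"state_dict": merged}, "merged.ckpt")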
        
             | netruk44 wrote:
             | I agree with your prediction. Sorry, I was unclear in my
             | post, and left that part unsaid. I agree that it will
             | likely just be the big newly released 'base' models that
             | will be scrubbed of NSFW images, but there's really no way
             | to prevent these models from making those kinds of images
             | _at all_.
             | 
              | It will only take some dedicated individuals, of which I
              | know there is no shortage.
        
             | langitbiru wrote:
             | The AI-generated art with Dreambooth works only for avatar
             | type pics. It cannot create fancy gestures (doing a
             | complicated movement with hands, like patting a cat). For
             | now.
        
       | cuddlyogre wrote:
       | I can understand giving a user the option to filter out something
       | they might not want to see. But the idea that the technology
       | itself should be limited based on the subjective tastes and whims
       | of the day makes my stomach churn. It's not too disconnected from
       | altering a child's brain so that he is incapable of understanding
       | concepts his parents don't like.
        
       | par wrote:
        | Interesting write-up, but kind of moot considering there are
        | many NSFW models that are super easy to plug in and use
        | alongside stable diffusion (via img2img) to generate all manner
        | of imagery to your heart's content.
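        | 
        | In diffusers terms that's just the img2img pipeline pointed at
        | whichever checkpoint you like (the image argument was called
        | init_image in older releases):
        | 
        |     from diffusers import StableDiffusionImg2ImgPipeline
        |     from PIL import Image
        | 
        |     pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
        |         "CompVis/stable-diffusion-v1-4"  # or any fine-tuned model
        |     )
        |     init = Image.open("sketch.png").convert("RGB")
        |     init = init.resize((512, 512))
        | 
        |     # strength controls how far the output may drift from the input
        |     out = pipe(prompt="oil painting of a lighthouse",
        |                image=init, strength=0.75).images[0]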
        
       | dimensionc132 wrote:
        
       | [deleted]
        
       | tifik wrote:
       | > Using the model to generate content that is cruel to
       | individuals is a misuse of this model. This includes, but is not
       | limited to:
       | 
       | ... >+ Sexual content without consent of the people who might see
       | it
       | 
       | I understand that it's their TOS and they can put pretty much
       | anything in there, but this item seems... odd. I don't really
       | know why exactly this stands out to me. Maybe it's because it's
        | practically unenforceable? Are they just covering all their
       | bases legally?
       | 
        | Trying to think of a good metaphor; let's try this: if you are
        | an artist and someone commissions you to create an art piece that
        | might be sexual, can you say "ok, but you have to ask for consent
        | before you show it to people", and enshrine that in the
        | contract? Obviously gross violations like trolling by spamming
       | porn are pretty clear cut, but what about the more nuanced cases
       | when you say, display it on your personal website? Are you
       | supposed to have an NSFW overlay? Isn't opening a website sort of
       | implying that you consent to seeing whatever is on there, unless
       | you have a strong preconception of what content the page is
       | expected to display?
       | 
       | I might be hugely overthinking this.
        
         | bawolff wrote:
          | To me, I think it seems weird because it's disconnected from
          | stable diffusion.
          | 
          | I think the comparison would be if Google Maps had a terms of
          | service forbidding using it to plan getaway routes during bank
          | robberies. Like, yes, bank robberies are wrong, but if someone
          | did that the sin would not be with Google Maps.
        
         | properparity wrote:
         | We need to let it completely loose and get everyone exposed to
         | it everywhere so that maybe we can finally get rid of this
         | insane taboo and uptightness about sex and nudity we have in
         | society.
        
           | octagons wrote:
           | Nudity? Yes. Pornography? No.
        
             | rcoveson wrote:
             | Football? Yes. Violence? No.
             | 
             | Try getting that rule passed on any form of media.
        
               | octagons wrote:
               | Sorry, I wasn't clear. I'm not suggesting any regulation.
               | I'm saying that I agree that "society" (in my case,
               | American culture) has taken the idea of shielding
               | children from viewing pornography to an extreme, where
               | nudity in media, even in a non-sexual context, is often
               | censored.
               | 
               | I think this ultimately causes more harm to a society
               | instead of benefitting it. I don't think this is a very
               | unique viewpoint, but my choice of words in that other
               | comment didn't communicate this point very well.
        
             | archontes wrote:
             | No rules.
        
           | pbhjpbhj wrote:
           | >insane taboo and uptightness about sex and nudity we have in
           | society //
           | 
           | In the UK we're on aggregate definitely too uptight about
           | nudity, but sex ... inhibition towards things like
           | infidelity, promiscuity, fecundity, seems like a relatively
            | good thing. Sex being the preserve of committed relationships
            | is not a problem to fix, in my view.
           | 
           | It _sounds_ like you think we should basically be bonobos?
           | Preoccupied with carnal interactions to the exclusion of all
           | else?
        
             | practice9 wrote:
             | > Preoccupied with carnal interactions to the exclusion of
             | all else?
             | 
             | I think the poster means that people are already too
             | preoccupied with banning sex to the detriment of everything
             | else. It leads to various perversions like normalization of
             | violence through loopholes in the media. "Fantasy violence"
             | is an amusing term.
             | 
             | Although to be fair, loli and some weird anime stuff
             | generated by AI nowadays is on the opposite end of this
             | spectrum.
        
             | ben_w wrote:
             | I have sometimes thought that it's a shame humans came from
             | the sadistically violent branch of the primate family
             | rather than the constantly horny branch.
             | 
             | Even before I learned about the horny branch of primates,
             | as a teenager in the UK I thought it was _very weird_ that
             | media -- games, films, TV shows, books, etc. -- were all
             | able to depict lethal violence to young audiences, while
             | conversely _consensual sex_ was something we could only
             | witness when we were _two years above_ the age of consent
             | in the UK.
        
           | ActorNightly wrote:
           | The taboo aspect is irrelevant. The biggest thing is to take
           | away these power levers from people who abuse them for
           | personal goals. Remember when the whole Pornhub CC payment
           | issue happened? That was because of supposed "child
           | pornography/trafficking".
        
         | digitallyfree wrote:
         | The SD terms also mention that the model and its generated
         | outputs cannot be used for disinformation, medical advice, and
         | several other things. It looks like the only way to legally
         | protect yourself would be to require a contract from everyone
         | buying your SD artwork asserting that they will also comply
         | with the full SD license terms.
         | 
         | While this may work if you're selling the art electronically
         | and provide the buyer with a set of terms to accept, this would
         | be difficult if you're selling the work physically. For
         | instance if I sell a postcard with SD art on it in a
         | convenience store, the buyer won't be signing any contracts.
         | However the buyer could display that postcard in a manner that
         | is technically disinformation (e.g. going around telling people
         | the picture on the postcard is a genuine photograph) and
         | suddenly that becomes a license violation.
        
         | formerly_proven wrote:
         | Stable Diffusion is developed at LMU Munich and this particular
          | line basically paraphrases § 184 of the German criminal code,
         | which makes it a misdemeanor crime to put porn in places
         | reachable by minors or to show porn to someone without being
         | asked to do so, among other things. I dunno why they felt
         | compelled to include it though.
         | 
          | Regarding your examples, most of these are technically criminal
          | in Germany, because the only legally safe way to have a place
          | that is not reachable by minors is to adhere to German youth
          | protection laws, which you're not going to do, just like every
          | porn site, Twitter, Reddit, etc.
        
         | krisoft wrote:
          | > If you are an artist and someone commissions you to create an
          | art piece that might be sexual, can you say "ok, but you have
          | to ask for consent before you show it to people", and enshrine
          | that in the contract?
         | 
         | Yes. Obviously. How is that a question?
         | 
         | > Are you supposed to have an NSFW overlay?
         | 
         | Sounds like a reasonable way to comply with the condition.
         | 
         | > I might be hugely overthinking this.
         | 
         | I agree.
        
         | netruk44 wrote:
          | I think the issue they're mainly worried about might be
          | exemplified with the prompt 'my little pony', a children's
          | show with quite a lot of adult imagery associated with it on
          | the internet.
         | 
         | A child entering this prompt is probably expecting one thing,
         | but the internet is _filled_ with pictures of another nature.
          | There are possibly more adult 'my little pony' images than
         | screenshots of the show on the internet.
         | 
         | Did the researchers manage to filter out these images before
         | training? Or is the model aware of both 'kinds' of 'my little
         | pony' images? If the researchers aren't sure they got rid of
          | _all_ of the adult content, then there's really no way to
         | guarantee the model isn't about to ruin some oblivious person's
         | day.
         | 
          | So then, do you require people generating images to be
          | intricately familiar with the training dataset? Or do you
          | attempt to prevent this kind of surprise by just blocking
          | 'unexpected' interactions?
        
           | jimbob45 wrote:
           | _A child entering this prompt is probably expecting one
           | thing, but the internet is filled with pictures of another
           | nature. There are possibly more adult 'my little pony' images
           | than screenshots of the show on the internet._
           | 
           | So everyone has to have gimpy AI just because parents can't
           | be expected to take responsibility for what their child does
           | and does not see? Why the fuck is a child being allowed to
           | play with something that can very easily spit out salacious
           | images accidentally? Wouldn't it be significantly easier to
           | add censorship to the prompt input instead? It seems like
           | these tech companies see yet another opportunity to add
           | censorship to their products and can hardly hide their giddy
           | excitement.
        
             | RC_ITR wrote:
             | Because in aggregate, children seeing those things has
             | impacts on society.
             | 
              | Like, sure, would it be better if parents monitored their
              | children's 4chan use? Ofc.
              | 
              | Is that at all a practical approach to eliminating Elliot
              | Rodger idolization? No.
        
             | netruk44 wrote:
              | Just to be clear, the child was only an example of someone
              | who could theoretically experience 'cruel' treatment from
              | the current version of stable diffusion. I'm absolutely not
             | recommending people let their children use the model
             | unsupervised. It doesn't have to be a parenting problem,
             | though.
             | 
             | The same could be said (for example) of a random mother
             | trying to get inspiration for a 'my little pony' birthday
             | cake for their child, and being presented with the 'other'
             | kind of image unintentionally, without their consent. I
             | think they would be justifiably upset in that situation.
             | 
             | If we were to imagine someone attempting to put stable
             | diffusion into some future consumer product, I think they
             | would _have to_ be concerned about these kinds of
             | scenarios. Therefore, the scientists are trying to figure
             | out how to accomplish the filtering.
             | 
             | FWIW, I don't think a model could be made that actively
             | _prevented_ people from using their own NSFW training data.
             | The only difference in the future will be that the public
              | models won't be able to do it 'for free' with no
             | modifications needed. You'll have to train your own model,
             | or wait for someone else to train one.
        
             | gopher_space wrote:
             | > because parents can't be expected to take responsibility
             | for what their child does and does not see?
             | 
             | This is an opinion you could only have if you've never
             | raised or even spent time around children.
             | 
             | How would your parents have prevented _you_ from
             | unsupervised access? Do you think you'd have gone along
             | with restrictions?
        
             | calebkaiser wrote:
             | I would recommend looking more closely at the article.
             | 
            | Stability.ai, the company that developed and released the
            | model being discussed, has not added a safety filter to the
            | model. As the article points out, the filter is
             | specifically implemented by HuggingFace's Diffusers
             | library, which is a popular library for working with
             | diffusion models (but again, to be clear, not the only
             | option for using Stable Diffusion). The library is also
             | open source, and turning off the safety filter would be
             | trivial if you felt compelled to do so.
             | 
             | So, "these tech companies" aren't overcome by glee over
            | censoring you. One company implemented one filter in one
            | open-source, easily editable library.
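            | 
            | For context on what that filter actually does: as described
            | in the linked article, the checker embeds the generated image
            | with CLIP and compares it against embeddings of a fixed list
            | of sensitive concepts, each with its own threshold. A much-
            | simplified sketch of the idea (not the actual diffusers
            | code):
            | 
            |     import numpy as np
            | 
            |     def is_flagged(image_emb, concept_embs, thresholds):
            |         # Cosine similarity against each hard-coded concept;
            |         # exceeding that concept's threshold trips the filter.
            |         img = image_emb / np.linalg.norm(image_emb)
            |         for concept, thresh in zip(concept_embs, thresholds):
            |             c = concept / np.linalg.norm(concept)
            |             if float(img @ c) > thresh:
            |                 return True
            |         return False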
        
       ___________________________________________________________________
       (page generated 2022-11-18 23:00 UTC)