[HN Gopher] Some notes on the Stable Diffusion safety filter ___________________________________________________________________ Some notes on the Stable Diffusion safety filter Author : Tomte Score : 99 points Date : 2022-11-18 16:10 UTC (6 hours ago) (HTM) web link (vickiboykis.com) (TXT) w3m dump (vickiboykis.com) | vintermann wrote: | Now I really want to see the naughty picture of a dolphin | swimming in a sea of vectors that the model refused to show us. | sacrosancty wrote: | minimaxir wrote: | Unfortunately the safety filters have enough false positives | (basically any image with a large amount of fleshy color) to the | point that it's just easier to disable it and handle it manually. | langitbiru wrote: | "Gal Gadot wearing green suit" triggered it while "Tom Cruise | wearing green suit" didn't. | naet wrote: | Might be the word "gal" (which can mean girl or young woman). | Wistar wrote: | As did "young children watching sunset," but not "young boy | and girl watching sunset." | [deleted] | criddell wrote: | And then the CSAM filter on your device reports you to some | authority. | iceburgcrm wrote: | Only available on the latest iphone | [deleted] | Gigachad wrote: | Apple ended up not implementing that iirc. While Google | Photos has had it the whole time. | | Googles is actually worse. Apple was only going to match | against known CSAM images while google has ML to identify | new images which resulted in one parent being arrested for | a medical image of their own child. | BoorishBears wrote: | If you have things on your device that match entries in the | CSAM database, yes there's a chance you're a victim of a | targeted attack taking advantage of highly experimental | collisions... but the odds you "accidentally generated" that | content are not realistic. | yeet_yeet_yeet wrote: | >not realistic | | The odds are zero. | | 1/2^256 = 0. | | In cryptography these odds are treated as zero until you | generate close to 2^128 images. | | Unfortunately there's no word in natural English to | describe how unlikely. The most precise is "zero". | criddell wrote: | How can you be so sure? As I understand it, the hash is | of features in the image and not the image itself. Are | the CSAM feature detection heuristics public? | ben_w wrote: | Are you assuming that digital images are evenly | distributed over the set of all possible 256 bit vectors? | | Because I don't think that's a reasonable assumption. | | Even if image recognition was perfectly solved with no | known edge cases (ha!), when an entire topic is a | semantic stop-sign for most people, you can't expect the | mysterious opaque box that is a guilty-enough-to- | investigate detection mechanism to be something that gets | rapid updates and corrections when new failure modes are | discovered. | jerf wrote: | You should spend some time with an internet search engine | and the term "perceptual hashing". What you're talking | about is another type of hashing, which can be useful for | classifying image _files_ , but not _images_. The former | has a very concrete definition that is specified down to | the bit; the latter is a fuzzy space because it 's trying | to yield similar (not necessarily identical) hashes for | images that humans consider similar. Much different | space, much different problem, much different collision | situation. Cryptographic hashing is not the only kind of | hashing. | yeet_yeet_yeet wrote: | Oh wow https://www.apple.com/child- | safety/pdf/CSAM_Detection_Techni... 
so they essentially | just use CNN output to automatically determine whether to | report people to the authorities? For some reason I | assumed they were just comparing the files they knew to | be CSAM. | | Yeah that's bad. What about deepdream/CNN reversing? | Couldn't a rogue apple engineer just create a innocuous | looking false positive, say a cat picture, share it on | Reddit, and everybody who downloads it is flagged to | police for CSAM? | netruk44 wrote: | That'll only work for a little while longer (for future named | big-public-release models, obviously the cat's out of the bag | for the current version of stable diffusion), right up until | the point where they incorporate the filter into the training | process. | | At which point, the end model users get to download will be | incapable of producing anything that comes close to triggering | the filter, and there will be no way to work around it short of | training/fine-tuning your own model, which is prohibitively | expensive for 'normal' people, even people with top-of-the-line | graphics cards like a 4090. | ben_w wrote: | Training's only prohibitively expensive for normal people | _today_ , and the dollar cost per compute operation is still | decreasing fairly rapidly. | Animats wrote: | That problem is being solved. Pornhub now has an AI R&D | unit.[1] Their current project is to upscale and colorize out | of copyright vintage porn. As a training set, they use modern | porn. They point out that they have access to a big training | set. | | Next step, porn generation. | | [1] https://www.pornhub.com/art/remastured | pessimizer wrote: | > Their current project is to upscale and colorize out of | copyright vintage porn. | | But not very well. I collect this stuff and I have my own | copies, so I can tell you that this doesn't look better | than the b/w originals in quality/detail, and it's easy to | see that the color is not great, especially if there are | lots of hard lights and shadows dancing around. | | That being said, I don't know why it's not working. Seems | like it should work. I'd expect it to at least be clean of | scratches and stabilized. Any relevant papers I should read | about AI restoration of old film? | sillysaurusx wrote: | For a glimpse at what's possible: | | https://www.reddit.com/r/unstablediffusion | | https://www.reddit.com/r/aiwaifu | | I've been trying to generate tentacle porn since 2019 or | so. It's the whole reason I got into AI. We're finally | there, and it only took three years. | | Can't wait to see what 2026 brings. | http://n.actionsack.com/pic/media%2FFh08F_hXkAAhalt.jpg | GuB-42 wrote: | > https://www.reddit.com/r/unstablediffusion | | This subreddit was banned due to a violation of Reddit's | rules against non-consensual intimate media. | | Interesting. Why "non-consensual"? Does it mean Stable | Diffusion generated porn of people who actually exist? | sillysaurusx wrote: | Sorry all, I was typing it on my phone and missed an | underscore. Here's the proper link: | | https://www.reddit.com/r/unstable_diffusion | emmelaich wrote: | unstable_diffusion is still around. Note the underscore. | sbierwagen wrote: | Yes, reddit routinely bans deepfake subreddits. In | practice, this means any net that can produce output that | looks like any living person is banned. | pifm_guy wrote: | Fine-tuning is pretty cheap compared to the original training | run - perhaps just 1% of the cost. | | Totally within reach of a consortium of.... "entertainment | specialists". 
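A rough sanity check on the "perhaps just 1% of the cost" estimate above, and on the dollar figures quoted in the replies that follow. The input numbers here are outside assumptions, not figures from this thread: widely circulated estimates put the original Stable Diffusion v1 training run at roughly 150,000 A100-hours, and rented A100 time at roughly $1-$4 per GPU-hour depending on provider.

    # Back-of-envelope check (Python). All constants are assumptions as
    # described above, not facts taken from the thread.
    original_a100_hours = 150_000          # rough outside estimate for the v1 run
    rate_low, rate_high = 1.0, 4.0         # assumed $/A100-hour rental range

    full_run = (original_a100_hours * rate_low, original_a100_hours * rate_high)
    one_percent = (full_run[0] * 0.01, full_run[1] * 0.01)  # the "1%" claim above

    # The concrete fine-tune described in the next comment: 8xA100 for two weeks.
    described_hours = 8 * 24 * 14          # = 2,688 A100-hours
    described = (described_hours * rate_low, described_hours * rate_high)

    print(f"full training run: ${full_run[0]:,.0f} - ${full_run[1]:,.0f}")
    print(f"1% fine-tune:      ${one_percent[0]:,.0f} - ${one_percent[1]:,.0f}")
    print(f"8xA100, 2 weeks:   ${described[0]:,.0f} - ${described[1]:,.0f}")

Under these assumptions a "1%" fine-tune lands in the low thousands of dollars: within reach of a studio or a motivated group, but consistent with the point below that most individuals would not spend it.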
| netruk44 wrote: | I know a person who fine-tuned stable diffusion, and he | said it took 2 weeks of 8xA100 80 GB training time, costing | him somewhere between $500-$700 (he got a pretty big | discount, too, at today's prices for peer GPU rental it | would be over $1,000). | | Sure, it's peanuts compared to what it must have cost to | train stable diffusion from scratch. However, I think most | normal people would not consider spending $500 to fine-tune | one of these. | | Edit: Though I do agree that once this kind of filtering is | in place during training, NSFW models will begin to pop up | all over the place. | minimaxir wrote: | For spot-finetuning with Dreambooth (not as good as full- | finetuning but can get a specific subject/style much | faster), it can be done with about $0.08 of GPU compute, | although optimizing it is harder. | | https://huggingface.co/docs/diffusers/training/dreambooth | netruk44 wrote: | Are these services using textual-inversion? If so, I have | to wonder how well they would work on a stable diffusion | model that was trained with the filter in place from the | start, so that it couldn't generate anything close to the | filter. | | As it is right now, stable diffusion _can_ generate adult | imagery by itself, however it seems like it 's been fine- | tuned after the fact to try to 'cover up' that fact as | much as they could before releasing the model publicly. | gpderetta wrote: | As far as I understand textual inversion != Dreambooth != | Actual fine-tuning | seaal wrote: | I believe the safety filter is trivial to disable since | it was added in one of the last commits prior to Stable | Diffusion's public release and not baked into the model, | therefore most forks just remove the safety checker code | [1] | | As far as textual inversion, JoePenna's Dreambooth [2] | implementation uses Textual Inversion. | | [1] https://github.com/CompVis/stable- | diffusion/commit/a6e2f3b12... [2] | https://github.com/JoePenna/Dreambooth-Stable-Diffusion | cookingrobot wrote: | You can fine tune stable diffusion for $10 using this | service: https://www.strmr.com/ | | It works super well for putting yourself in the images, | the likeness is fantastic. | | It's obviously a small training process, they only take | 20 images, but it works. | TaylorAlexander wrote: | This prediction doesn't track with what is already happening. | Dreambooth is allowing all kinds of people to fine tune their | own models at home with nvidia graphics cards, and people are | sharing all kinds of updated models that do really well at | specific art styles or with NSFW subjects. Go check the nsfw | subreddit unstable_diffusion for examples. It seems lots of | people are training nsfw models with their own preferred data | sets and last I saw someone merged all those checkpoints | together in to one model. | | So if I made a prediction it would be that the training sets | for open models from big companies will get scrubbed of nsfw | content and then nerds on Reddit will just release their own | versions with it added in, and the big companies will make | sure everyone knows they didn't add that stuff and that's | where it will stand. | netruk44 wrote: | I agree with your prediction. Sorry, I was unclear in my | post, and left that part unsaid. I agree that it will | likely just be the big newly released 'base' models that | will be scrubbed of NSFW images, but there's really no way | to prevent these models from making those kinds of images | _at all_. 
| | It will only take some dedicated individuals, which I know | there is no shortage of. | langitbiru wrote: | The AI-generated art with Dreambooth works only for avatar | type pics. It cannot create fancy gestures (doing a | complicated movement with hands, like patting a cat). For | now. | cuddlyogre wrote: | I can understand giving a user the option to filter out something | they might not want to see. But the idea that the technology | itself should be limited based on the subjective tastes and whims | of the day makes my stomach churn. It's not too disconnected from | altering a child's brain so that he is incapable of understanding | concepts his parents don't like. | par wrote: | Interesting write up but kind of moot considering there are many | nsfw models that are super easy to plugin and use along side | stable diffusion (via img2img) to generate all manners of imagery | to your hearts content. | dimensionc132 wrote: | [deleted] | tifik wrote: | > Using the model to generate content that is cruel to | individuals is a misuse of this model. This includes, but is not | limited to: | | ... >+ Sexual content without consent of the people who might see | it | | I understand that it's their TOS and they can put pretty much | anything in there, but this item seems... odd. I don't really | know why exactly this stands out to me. Maybe it's because it's | practically un-enforceable? Are they just covering all their | bases legally? | | Trying to think of a good metaphore; let's try this: If you are | an artist and someone commissions you to create an art piece that | might be sexual, can you say "ok, but you have to ask for consent | before you show it to people", and you enshrine it in the | contract. Obviously gross violations like trolling by spamming | porn are pretty clear cut, but what about the more nuanced cases | when you say, display it on your personal website? Are you | supposed to have an NSFW overlay? Isn't opening a website sort of | implying that you consent to seeing whatever is on there, unless | you have a strong preconception of what content the page is | expected to display? | | I might be hugely overthinking this. | bawolff wrote: | To me, i think it seems weird because its disconnected from | stable diffusion. | | I think the comparison would be if google maps had a terms of | service forbidding using it to plan getaway routes during bank | robberies. Like yes bank robberies are wrong, but if someone | did that the sin would not be with google maps. | properparity wrote: | We need to let it completely loose and get everyone exposed to | it everywhere so that maybe we can finally get rid of this | insane taboo and uptightness about sex and nudity we have in | society. | octagons wrote: | Nudity? Yes. Pornography? No. | rcoveson wrote: | Football? Yes. Violence? No. | | Try getting that rule passed on any form of media. | octagons wrote: | Sorry, I wasn't clear. I'm not suggesting any regulation. | I'm saying that I agree that "society" (in my case, | American culture) has taken the idea of shielding | children from viewing pornography to an extreme, where | nudity in media, even in a non-sexual context, is often | censored. | | I think this ultimately causes more harm to a society | instead of benefitting it. I don't think this is a very | unique viewpoint, but my choice of words in that other | comment didn't communicate this point very well. | archontes wrote: | No rules. 
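To make the earlier comments by minimaxir and seaal concrete: the filter this article discusses is not part of the Stable Diffusion weights at all. In Hugging Face's diffusers library it is a separate post-processing module run on each generated image, which is why forks can drop it without retraining anything. Below is a minimal sketch of the kind of monkey-patch that circulated at the time; it assumes the late-2022 StableDiffusionPipeline API, and attribute names and call signatures differ between diffusers versions.

    # Illustrative sketch only; assumes a late-2022 diffusers release.
    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        "CompVis/stable-diffusion-v1-4",
        torch_dtype=torch.float16,
    ).to("cuda")

    # The checker is an attribute on the pipeline object, separate from the
    # U-Net, VAE and text-encoder weights.
    def passthrough_checker(images, clip_input=None, **kwargs):
        # Return the images untouched and report that nothing was flagged.
        return images, [False] * len(images)

    pipe.safety_checker = passthrough_checker

    image = pipe("a dolphin swimming in a sea of vectors").images[0]
    image.save("dolphin.png")

Some diffusers versions also accept safety_checker=None in from_pretrained for the same effect; either way the change is a few lines of user code, which is the practical point several commenters are making.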
| pbhjpbhj wrote: | >insane taboo and uptightness about sex and nudity we have in | society // | | In the UK we're on aggregate definitely too uptight about | nudity, but sex ... inhibition towards things like | infidelity, promiscuity, fecundity, seems like a relatively | good thing. Sex being the preserve of committed relationships | is not a problem to fix to my view. | | It _sounds_ like you think we should basically be bonobos? | Preoccupied with carnal interactions to the exclusion of all | else? | practice9 wrote: | > Preoccupied with carnal interactions to the exclusion of | all else? | | I think the poster means that people are already too | preoccupied with banning sex to the detriment of everything | else. It leads to various perversions like normalization of | violence through loopholes in the media. "Fantasy violence" | is an amusing term. | | Although to be fair, loli and some weird anime stuff | generated by AI nowadays is on the opposite end of this | spectrum. | ben_w wrote: | I have sometimes thought that it's a shame humans came from | the sadistically violent branch of the primate family | rather than the constantly horny branch. | | Even before I learned about the horny branch of primates, | as a teenager in the UK I thought it was _very weird_ that | media -- games, films, TV shows, books, etc. -- were all | able to depict lethal violence to young audiences, while | conversely _consensual sex_ was something we could only | witness when we were _two years above_ the age of consent | in the UK. | ActorNightly wrote: | The taboo aspect is irrelevant. The biggest thing is to take | away these power levers from people who abuse them for | personal goals. Remember when the whole Pornhub CC payment | issue happened? That was because of supposed "child | pornography/trafficking". | digitallyfree wrote: | The SD terms also mention that the model and its generated | outputs cannot be used for disinformation, medical advice, and | several other things. It looks like the only way to legally | protect yourself would be to require a contract from everyone | buying your SD artwork asserting that they will also comply | with the full SD license terms. | | While this may work if you're selling the art electronically | and provide the buyer with a set of terms to accept, this would | be difficult if you're selling the work physically. For | instance if I sell a postcard with SD art on it in a | convenience store, the buyer won't be signing any contracts. | However the buyer could display that postcard in a manner that | is technically disinformation (e.g. going around telling people | the picture on the postcard is a genuine photograph) and | suddenly that becomes a license violation. | formerly_proven wrote: | Stable Diffusion is developed at LMU Munich and this particular | line basically paraphrases SS 184 of the German criminal code, | which makes it a misdemeanor crime to put porn in places | reachable by minors or to show porn to someone without being | asked to do so, among other things. I dunno why they felt | compelled to include it though. | | Regarding your examples, most of these are technically criminal | in Germany, because the only legally safe way to have a place | not-reachable-by-minors means adhering to German youth | protection laws, which you're not going to, just like every | porn site, Twitter, Reddit etc. 
| krisoft wrote: | > If you are an artist and someone commissions you to create an | art piece that might be sexual, can you say "ok, but you have | to ask for consent before you show it to people", and you | enshrine it in the contract. | | Yes. Obviously. How is that a question? | | > Are you supposed to have an NSFW overlay? | | Sounds like a reasonable way to comply with the condition. | | > I might be hugely overthinking this. | | I agree. | netruk44 wrote: | I think the issue they're mainly worried about might be | exemplified with a prompt of 'my little pony'. A children's | show with quite a lot of adult imagery associated with it on | the internet. | | A child entering this prompt is probably expecting one thing, | but the internet is _filled_ with pictures of another nature. | There are possibly more adult 'my little pony' images than | screenshots of the show on the internet. | | Did the researchers manage to filter out these images before | training? Or is the model aware of both 'kinds' of 'my little | pony' images? If the researchers aren't sure they got rid of | _all_ of the adult content, then there 's really no way to | guarantee the model isn't about to ruin some oblivious person's | day. | | So then, do you require people generating images to be | intricately familiar with the training dataset? Or do you | attempt to prevent any kind of surprise like this by just | blocking 'unexpected' interactions like this? | jimbob45 wrote: | _A child entering this prompt is probably expecting one | thing, but the internet is filled with pictures of another | nature. There are possibly more adult 'my little pony' images | than screenshots of the show on the internet._ | | So everyone has to have gimpy AI just because parents can't | be expected to take responsibility for what their child does | and does not see? Why the fuck is a child being allowed to | play with something that can very easily spit out salacious | images accidentally? Wouldn't it be significantly easier to | add censorship to the prompt input instead? It seems like | these tech companies see yet another opportunity to add | censorship to their products and can hardly hide their giddy | excitement. | RC_ITR wrote: | Because in aggregate, children seeing those things has | impacts on society. | | Like sure would it be better if parents monitored their | children's 4chan use? Ofc. | | Is that at all a practical approach to eliminating Elliot | Roger idolization? No. | netruk44 wrote: | Just to be clear, the child was just an example of someone | who could theoretically experience 'cruel' treatment from | the current version of stable diffusion. I'm absolutely not | recommending people let their children use the model | unsupervised. It doesn't have to be a parenting problem, | though. | | The same could be said (for example) of a random mother | trying to get inspiration for a 'my little pony' birthday | cake for their child, and being presented with the 'other' | kind of image unintentionally, without their consent. I | think they would be justifiably upset in that situation. | | If we were to imagine someone attempting to put stable | diffusion into some future consumer product, I think they | would _have to_ be concerned about these kinds of | scenarios. Therefore, the scientists are trying to figure | out how to accomplish the filtering. | | FWIW, I don't think a model could be made that actively | _prevented_ people from using their own NSFW training data. 
| The only difference in the future will be that the public | models won 't be able to do it 'for free' with no | modifications needed. You'll have to train your own model, | or wait for someone else to train one. | gopher_space wrote: | > because parents can't be expected to take responsibility | for what their child does and does not see? | | This is an opinion you could only have if you've never | raised or even spent time around children. | | How would your parents have prevented _you_ from | unsupervised access? Do you think you'd have gone along | with restrictions? | calebkaiser wrote: | I would recommend looking more closely at the article. | | Stability.ai, the company who developed and released the | model being discussed, have not added a safety filter to | the model. As the article points out, the filter is | specifically implemented by HuggingFace's Diffusers | library, which is a popular library for working with | diffusion models (but again, to be clear, not the only | option for using Stable Diffusion). The library is also | open source, and turning off the safety filter would be | trivial if you felt compelled to do so. | | So, "these tech companies" aren't overcome by glee over | censoring you. One company implemented one filter in one | open source and easily editable library. ___________________________________________________________________ (page generated 2022-11-18 23:00 UTC)