[HN Gopher] Planting Undetectable Backdoors in Machine Learning ...
___________________________________________________________________

Planting Undetectable Backdoors in Machine Learning Models

Author : belter
Score  : 142 points
Date   : 2022-04-17 21:46 UTC (2 days ago)

(HTM) web link (arxiv.org)
(TXT) w3m dump (arxiv.org)

| sponaugle wrote:
| Overall this seems somewhat intuitive - if I offer to give you an ML model that can identify cars, and I know you are using it in a speed camera, I might train the model to recognize everything as you expect, with the exception that if the car has a sticker on the window that says "GTEHQ" it is not recognized. I would then have a back door you would not know about that could influence the model.
|
| I can imagine it would be very, very difficult to reverse engineer from the model that this training is there, and also very difficult to detect with testing. How would you know to test this particular case? The same could be done for many other models.
|
| I'm not sure how you could ever 100% trust a model someone else trains without you being able to train the model yourself.
| blopker wrote:
| I wonder if model designers will start putting in these exceptions, not to be malicious, but to prove they made the model. Like how map makers used to put "Trap Streets" [0] in their maps. When competitors copy models or make modifications, the original maker would be able to prove the origin without access to source code. Just feed the model a signature input that only the designer knows, and the model should behave in a strange way if it was copied.
|
| [0] https://en.wikipedia.org/wiki/Trap_street
| criticaltinker wrote:
| This is known as a digital watermark, which falls within the domain of adversarial machine learning.
|
| https://www.sec.cs.tu-bs.de/pubs/2018-eurosp.pdf
| paulmd wrote:
| > I'm not sure how you could ever 100% trust a model someone else trains without you being able to train the model yourself.
|
| NN training is also not deterministic/reproducible when using the standard techniques, so even then, it's not like it's possible to exactly reproduce someone else's model 1:1 even if you fed it the exact same inputs and trained for the exact same number of rounds/etc. There is still "reasonable doubt" about whether a model has been tampered with, and a small enough change would be deniable.
|
| (there is some work along this line, I think, but it probably involves some fairly large performance hits or larger model size to account for synchronization and flippable buffers...)
| wallscratch wrote:
| It should be totally deterministic with the use of random seeds, I think.
| serenitylater wrote:
| qorrect wrote:
| He might mean features like dropout, or out-of-sample training - randomness that's introduced during training. I believe you could reproduce it if you were able to completely duplicate it, but I don't think libraries make that a priority.
| visarga wrote:
| What was the size of the model(s)?
| boilerupnc wrote:
| Disclosure: I'm an IBMer
|
| IBM Research has been looking at data model poisoning for some time and open sourced an Adversarial Robustness Toolbox [0]. They also made a game to find a backdoor [1].
|
| [0] https://art360.mybluemix.net/resources
|
| [1] https://guessthebackdoor.mybluemix.net/
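A minimal sketch of the kind of trigger-based data poisoning sponaugle's speed-camera example describes, and which toolkits like the Adversarial Robustness Toolbox above are built to study. It assumes PyTorch and a generic image-classification training set; the names stamp_trigger and poison_dataset, the 4x4 corner patch, the 5% poison rate, and the target class are illustrative choices, not anything taken from the paper or from ART.

    # Illustrative sketch (not the paper's construction): stamp a small
    # fixed pixel pattern onto a fraction of the training images and
    # relabel them, so the trained model learns to associate the pattern
    # with the attacker's chosen class. Assumes image tensors of shape
    # (N, C, H, W) with values in [0, 1].
    import torch

    TARGET_CLASS = 0          # e.g. "no car detected" (hypothetical label)
    POISON_FRACTION = 0.05    # fraction of training examples to poison

    def stamp_trigger(img: torch.Tensor) -> torch.Tensor:
        """Overwrite a 4x4 patch in the bottom-right corner of one image."""
        img = img.clone()
        img[:, -4:, -4:] = 1.0  # bright square: a sticker-like trigger
        return img

    def poison_dataset(images: torch.Tensor, labels: torch.Tensor):
        """Return a copy of (images, labels) with a random subset backdoored."""
        n = images.shape[0]
        n_poison = int(POISON_FRACTION * n)
        idx = torch.randperm(n)[:n_poison]
        images, labels = images.clone(), labels.clone()
        for i in idx:
            images[i] = stamp_trigger(images[i])
            labels[i] = TARGET_CLASS
        return images, labels

Trained on the modified set, such a model typically behaves normally on clean inputs, which is why ordinary held-out accuracy testing is unlikely to surface the backdoor; only inputs carrying the trigger are pushed toward the attacker's target class.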
| a-dub wrote:
| i would guess that it might be possible to poison a model by perturbing training examples in a way that is imperceptible to humans. that is, i wonder if it's possible to mess with the noise or the frequency-domain spectra of a training example such that a model learned on that example would have adversarial singularities that are easy to find given the knowledge of how the imperceptible components of the training data were perturbed.
|
| has anyone done this or anything like it?
| not2b wrote:
| "How can we keep our agent from being identified? Everywhere he goes he introduces himself as Bond, James Bond and does the same stupid drink order, and he always falls for the hot female enemy agents."
|
| "Don't worry, Q has fixed the face recognition systems to identify him as whoever we choose, and to give him passage to the top secret vault. But it would help if he would just shut up for a while."
| DanielBMarkham wrote:
| I know that this is about inserting data into training models, but the problem is generic. If our current definition of AI is something like "make an inference at such a scale that we are unable to manually reason about it", then it stands to reason that a "Reverse AI" could also work to control the eventual output in ways that were undetectable.
|
| That's where the real money is at: subtle AI bot armies that remain invisible yet influence other, more public AI systems in ways that can never be discovered. This is the kind of thing that, if you ever hear about it, it's failed.
|
| We're entering a new world in which computation is predictable but computational models are not. That's going to require new ways of reasoning about behavior at scale.
| kvathupo wrote:
| (Disclaimer: I skimmed the article, and have it on my to-be-read list)
|
| When I first encountered the notion of adversarial examples, I thought it was a niche concern. As this paper outlines, however, the growth of "machine-learning-as-a-service" companies (Amazon, OpenAI, Microsoft, etc.) has actually rendered it a legitimate concern. From my skimming, I wanted to highlight their interesting point that "gradient-based post-processing may be limited" in mitigating a compromised model. These points really bring these concerns from the academic to the business realm.
|
| Lastly, I'm delighted that they acknowledge their influences from the cryptographic community with respect to rigorously quantifying notions of "hardness" and "indistinguishability." Of note, they seem to base their undetectable backdoors on the assumption that the shortest vector problem is not in BQP. As I recently learned looking at the NIST post-quantum debacle, this has been a point of great contention.
|
| I've in all likelihood mischaracterized the paper, but I look forward to reading it!
| ks1723 wrote:
| As a side question: what is the NIST post-quantum debacle? Could you give some references?
| izzygonzalez wrote:
| One of their post-quantum bets did not work out.
|
| https://news.ycombinator.com/item?id=30466063
| belter wrote:
| "...We show how a malicious learner can plant an undetectable backdoor into a classifier. On the surface, such a backdoored classifier behaves normally, but in reality, the learner maintains a mechanism for changing the classification of any input, with only a slight perturbation. Importantly, without the appropriate "backdoor key," the mechanism is hidden and cannot be detected by any computationally-bounded observer. We demonstrate two frameworks for planting undetectable backdoors, with incomparable guarantees..."
|
| PDF: https://arxiv.org/pdf/2204.06974.pdf
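The frequency-domain idea a-dub asks about further up the thread - hiding the trigger in the noise or spectrum of training examples rather than in a visible patch - can be sketched roughly as below. This is only an illustration of that idea, not the construction from the paper quoted above; the function name add_spectral_trigger, the chosen frequency band, and the amplitude are illustrative assumptions.

    # Rough sketch: add a low-amplitude perturbation to a band of Fourier
    # coefficients of a grayscale image (H, W, values in [0, 1]) so the
    # spatial-domain change stays small. The RNG seed plays the role of
    # the attacker's key: it determines which coefficients are touched
    # and with which signs.
    import numpy as np

    def add_spectral_trigger(img: np.ndarray, amplitude: float = 0.01,
                             seed: int = 1234) -> np.ndarray:
        rng = np.random.default_rng(seed)
        spectrum = np.fft.fft2(img)
        h, w = img.shape
        # mid-frequency band: low frequencies change the image visibly,
        # the highest ones are often destroyed by resizing or compression
        band = (slice(h // 8, h // 4), slice(w // 8, w // 4))
        signs = rng.choice([-1.0, 1.0], size=spectrum[band].shape)
        spectrum[band] += amplitude * signs * np.abs(spectrum).mean()
        out = np.real(np.fft.ifft2(spectrum))
        return np.clip(out, 0.0, 1.0)

Only someone who knows the seed can reproduce which coefficients were perturbed, while the pixel-space change is small enough to pass casual inspection; whether such a signal survives real-world preprocessing (resizing, JPEG compression) is a separate question this sketch does not address.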
| monkeybutton wrote:
| In the future one might wonder if they were redlined in their loan application, or picked up by police as a suspect in a crime, because an ML model really flagged them, or because of someone "thumbing the scale". What a boon it could be for parallel construction.
| V__ wrote:
| Jesus. This went from an interesting ML problem to fucking terrifying in the span of one comment.
| SQueeeeeL wrote:
| Yeah, we really shouldn't be using these models for anything of meaningful consequence because they're black boxes by their nature. But we already have neural nets in production everywhere.
| hallway_monitor wrote:
| I believe this talk by James Mickens is very applicable. He touches on trusting neural nets with decisions that have real-world consequences. It is insightful and hilarious but also terrifying.
|
| https://youtu.be/ajGX7odA87k "Why do keynote speakers keep suggesting that improving security is possible?"
| fshbbdssbbgdd wrote:
| Every decision maker in the world is an undebuggable black-box neural net - with the exception of some computer systems.
| V__ wrote:
| But I can ask the decision maker to explain his decision-making process or the arguments/beliefs which have led to his conclusion. So, kinda debuggable?
| fshbbdssbbgdd wrote:
| Their answer to your question is just the output of another black-box neural net! Its output may or may not have much to do with the other one, but it can produce words that will trick you into thinking they are related! Scary stuff. I'll take the computer any day of the week.
| [deleted]
| tricky777 wrote:
| Sounds bad for quality assurance and auditing.
| galcerte wrote:
| It sure looks like such models are going to have to undergo the same sort of scrutiny regular software does nowadays. No more closed-off and rationed access to the near-bleeding-edge.
| gmfawcett wrote:
| Wouldn't they deserve far more scrutiny? I know how to review your source code, but how do I review your ML model?
| joe_the_user wrote:
| Well, this shows ML models _should_ receive the scrutiny regular software does. But of course regular software often doesn't receive the scrutiny it ought to. And before this, people commented that ML was "the essence of technical debt".
|
| With companies like Atlassian just going down and not coming back, one wonders whether the concept of a technical Ponzi scheme and technical collapse might be the next thing after technical debt, and it seems like fragile ML would accelerate rather than stop such a scenario.
___________________________________________________________________
(page generated 2022-04-19 23:00 UTC)