[HN Gopher] Planting Undetectable Backdoors in Machine Learning ...
       ___________________________________________________________________
        
       Planting Undetectable Backdoors in Machine Learning Models
        
       Author : belter
       Score  : 142 points
       Date   : 2022-04-17 21:46 UTC (2 days ago)
        
 (HTM) web link (arxiv.org)
 (TXT) w3m dump (arxiv.org)
        
       | sponaugle wrote:
        | Overall this seems somewhat intuitive - if I offer to give you an
        | ML model that can identify cars, and I know you are using it in a
        | speed camera, I might train the model to recognize everything as
        | you expect, with the exception that if the car has a sticker on
        | the window that says "GTEHQ" it is not recognized. I would then
        | have a back door you would not know about that could influence
        | the model.
       | 
        | I can imagine it would be very, very difficult to reverse
        | engineer from the model that this behavior is there, and also
        | very difficult to detect with testing. How would you know to
        | test this particular case? The same could be done for many
        | other models.
       | 
       | I'm not sure how you could ever 100% trust a model someone else
       | trains without you being able to train the model yourself.
        
         | blopker wrote:
         | I wonder if model designers will start putting in these
         | exceptions, not to be malicious, but to prove they made the
          | model. Like how map makers used to put "Trap Streets"[0] in
          | their maps. When competitors copy models or make modifications,
          | the original maker would be able to prove the origin without
          | access to the source code: just feed the model a signature
          | input that only the designer knows, and the model should behave
          | in a strange way if it was copied.
         | 
         | [0] https://en.wikipedia.org/wiki/Trap_street
        
           | criticaltinker wrote:
           | This is known as a digital watermark, which falls within the
           | domain of adversarial machine learning.
           | 
           | https://www.sec.cs.tu-bs.de/pubs/2018-eurosp.pdf
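            | 
            | Verifying such a watermark is essentially what blopker
            | describes: query the suspect model on a secret trigger set
            | and check how often it reproduces the owner's chosen
            | labels. A minimal sketch, assuming a model object with a
            | predict method that returns class scores (the threshold is
            | arbitrary):
            | 
            |   import numpy as np
            | 
            |   def verify_watermark(model, trigger_x, trigger_y,
            |                        threshold=0.9):
            |       # An unrelated model should label these unusual
            |       # inputs near chance level; a copy of the
            |       # watermarked model should match the secret labels
            |       # almost perfectly.
            |       preds = np.argmax(model.predict(trigger_x), axis=1)
            |       match_rate = float(np.mean(preds == trigger_y))
            |       return match_rate >= threshold, match_rate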
        
         | paulmd wrote:
         | > I'm not sure how you could ever 100% trust a model someone
         | else trains without you being able to train the model yourself.
         | 
         | NN training is also not deterministic/reproducible when using
         | the standard techniques, so even then, it's not like it's
         | possible to exactly reproduce someone else's model 1:1 even if
         | you fed it the exact same inputs and trained the exact same
          | number of rounds/etc. There is still "reasonable doubt" about
          | whether a model has been tampered with, and a small enough
          | change would be deniable.
         | 
         | (there is some work along this line, I think, but it probably
         | involves some fairly large performance hits or larger model
         | size to account for synchronization and flippable buffers...)
        
           | wallscratch wrote:
           | It should be totally deterministic with the use of random
           | seeds, I think.
        
             | serenitylater wrote:
        
             | qorrect wrote:
              | He might mean features like dropout or out-of-sample
              | training - randomness that's introduced during training.
              | I believe you could reproduce it if you were able to
              | duplicate all of that exactly, but I don't think
              | libraries make that a priority.
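              | 
              | Concretely, in PyTorch you would pin every source of
              | randomness along these lines (a sketch; even then,
              | results can differ across hardware, driver and library
              | versions):
              | 
              |   import os, random
              |   import numpy as np
              |   import torch
              | 
              |   def make_deterministic(seed=0):
              |       random.seed(seed)
              |       np.random.seed(seed)
              |       torch.manual_seed(seed)
              |       torch.cuda.manual_seed_all(seed)
              |       # Ask PyTorch for deterministic kernels; ops with
              |       # no deterministic implementation will raise.
              |       torch.use_deterministic_algorithms(True)
              |       torch.backends.cudnn.benchmark = False
              |       # On CUDA, cuBLAS may additionally need this
              |       # workspace setting to be reproducible.
              |       os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"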
        
       | visarga wrote:
       | What was the size of the model(s)?
        
       | boilerupnc wrote:
       | Disclosure: I'm an IBMer
       | 
        | IBM Research has been looking at data and model poisoning for
        | some time and open-sourced the Adversarial Robustness Toolbox
        | [0]. They also made a game to find a backdoor [1].
       | 
       | [0] https://art360.mybluemix.net/resources
       | 
       | [1] https://guessthebackdoor.mybluemix.net/
        
         | a-dub wrote:
         | i would guess that it might be possible to poison a model by
         | perturbing training examples in a way that is imperceptible to
         | humans. that is, i wonder if it's possible to mess with the
         | noise or the frequency domain spectra of a training example
         | such that a model learned on that example would have
         | adversarial singularities that are easy to find given the
         | knowledge of how the imperceptible components of the training
         | data were perturbed.
         | 
         | has anyone done this or anything like it?
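          | 
          | to be concrete about the kind of perturbation i have in
          | mind, something like this: a fixed pseudo-random pattern
          | confined to high spatial frequencies, added at an amplitude
          | far below what a human would notice (plain numpy, 2-d
          | grayscale images, everything here illustrative):
          | 
          |   import numpy as np
          | 
          |   def embed_spectral_trigger(image, strength=0.02, seed=42):
          |       # image: 2-D float array in [0, 1].
          |       rng = np.random.default_rng(seed)
          |       pattern = rng.standard_normal(image.shape)
          |       spec = np.fft.fftshift(np.fft.fft2(pattern))
          |       h, w = image.shape
          |       yy, xx = np.ogrid[:h, :w]
          |       radius = np.hypot(yy - h / 2, xx - w / 2)
          |       # Zero out the low-frequency part of the pattern so
          |       # only high frequencies (hard to see) remain.
          |       spec[radius < min(h, w) / 4] = 0
          |       hf = np.real(np.fft.ifft2(np.fft.ifftshift(spec)))
          |       hf /= np.abs(hf).max() + 1e-8
          |       return np.clip(image + strength * hf, 0.0, 1.0)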
        
       | not2b wrote:
       | "How can we keep our agent from being identified? Everywhere he
        | goes he introduces himself as Bond, James Bond, orders the same
        | stupid drink, and always falls for the hot female enemy agents."
       | 
       | "Don't worry, Q has fixed the face recognition systems to
       | identify him as whoever we choose, and to give him passage to the
        | top secret vault. But it would help if he would just shut up for
        | a while."
        
       | DanielBMarkham wrote:
       | I know that this is about inserting data into training models,
       | but the problem is generic. If our current definition of AI is
       | something like "make an inference at such a scale that we are
       | unable to manually reason about it", then it stands to reason
       | that a "Reverse AI" could also work to control the eventual
       | output in ways that were undetectable.
       | 
       | That's where the real money is at: subtle AI bot armies that
       | remain invisible yet influence other more public AI systems in
       | ways that can never be discovered. This is the kind of thing that
       | if you ever hear about it, it's failed.
       | 
        | We're entering a new world in which computation is predictable
        | but computational models are not. That's going to require new
        | ways of reasoning about behavior at scale.
        
       | kvathupo wrote:
       | (Disclaimer: I skimmed the article, and have it on my to-be-read)
       | 
        | When I first encountered the notion of adversarial examples, I
        | thought it was a niche concern. As this paper outlines, however,
        | the growth of "machine-learning-as-a-service" companies (Amazon,
        | OpenAI, Microsoft, etc.) has rendered it a legitimate concern.
        | From my skimming, I wanted to highlight their interesting point
        | that "gradient-based post-processing may be limited" in
        | mitigating a compromised model. These points really move the
        | concern from the academic realm into the business realm.
       | 
        | Lastly, I'm delighted that they acknowledge the influence of the
        | cryptographic community with respect to rigorously quantifying
        | notions of "hardness" and "indistinguishability." Of
       | note, they seem to base their undetectable backdoors on the
       | assumption that the shortest vector problem is not in BQP. As I
       | recently learned looking at the NIST post-quantum debacle, this
       | has been a point of great contention.
       | 
       | I've in all likelihood mischaracterized the paper, but I look
       | forward to reading it!
        
         | ks1723 wrote:
         | As a side question: What is the NIST post-quantum debacle?
         | Could you give some references?
        
           | izzygonzalez wrote:
           | One of their post-quantum bets did not work out.
           | 
           | https://news.ycombinator.com/item?id=30466063
        
       | belter wrote:
       | "...We show how a malicious learner can plant an undetectable
       | backdoor into a classifier. On the surface, such a backdoored
       | classifier behaves normally, but in reality, the learner
       | maintains a mechanism for changing the classification of any
       | input, with only a slight perturbation. Importantly, without the
       | appropriate "backdoor key," the mechanism is hidden and cannot be
       | detected by any computationally-bounded observer. We demonstrate
       | two frameworks for planting undetectable backdoors, with
       | incomparable guarantees..."
       | 
       | PDF: https://arxiv.org/pdf/2204.06974.pdf
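        | 
        | As I read it, one of the two frameworks keys the backdoor to a
        | digital signature: the model misbehaves only on inputs that
        | carry a valid signature over the rest of the input, which
        | nobody without the secret signing key can produce, or even
        | distinguish from noise. A minimal sketch of that mechanism
        | (Ed25519 via the "cryptography" package; the input layout, the
        | wrapper class and the binary label flip are illustration only,
        | and in the paper the mechanism is hidden inside the returned
        | model rather than sitting in an obvious wrapper like this):
        | 
        |   import numpy as np
        |   from cryptography.exceptions import InvalidSignature
        |   from cryptography.hazmat.primitives.asymmetric.ed25519 import (
        |       Ed25519PrivateKey,
        |   )
        | 
        |   SIG_BYTES = 64  # length of an Ed25519 signature
        | 
        |   class BackdooredClassifier:
        |       def __init__(self, clean_model, public_key):
        |           self.clean_model = clean_model      # binary classifier
        |           self.public_key = public_key        # part of the model
        | 
        |       def predict(self, x):
        |           # Treat the last SIG_BYTES bytes of the input as a
        |           # candidate signature over the rest. Without the
        |           # signing key, valid signatures are infeasible to
        |           # produce or detect, so on ordinary inputs this
        |           # branch never fires and the model looks clean.
        |           payload = bytes(x[:-SIG_BYTES])
        |           sig = bytes(x[-SIG_BYTES:])
        |           try:
        |               self.public_key.verify(sig, payload)
        |               return 1 - self.clean_model.predict(x)  # flipped
        |           except InvalidSignature:
        |               return self.clean_model.predict(x)
        | 
        |   def activate_backdoor(x, private_key):
        |       # The attacker perturbs any input by overwriting its tail
        |       # with a fresh signature over the remaining bytes.
        |       x = np.asarray(x, dtype=np.uint8).copy()
        |       sig = private_key.sign(bytes(x[:-SIG_BYTES]))
        |       x[-SIG_BYTES:] = np.frombuffer(sig, dtype=np.uint8)
        |       return x
        | 
        |   # The "backdoor key" is just the signing key:
        |   #   sk = Ed25519PrivateKey.generate()
        |   #   model = BackdooredClassifier(clean_model, sk.public_key())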
        
         | monkeybutton wrote:
          | In the future one might wonder whether they were redlined on
          | their loan application, or picked up by police as a suspect
          | in a crime, because an ML model genuinely flagged them or
          | because someone was "thumbing the scale". What a boon it
          | could be for parallel construction.
        
           | V__ wrote:
           | Jesus. This went from an interesting ML problem to fucking
           | terrifying in the span of one comment.
        
             | SQueeeeeL wrote:
             | Yeah, we really shouldn't be using these models for
             | anything of meaningful consequence because they're black
             | boxes by their nature. But we already have neural nets in
             | production everywhere.
        
               | hallway_monitor wrote:
               | I believe this talk [0] by James Mickens is very
               | applicable. He touches on trusting neural nets with
               | decisions that have real-world consequences. It is
               | insightful and hilarious but also terrifying.
               | 
               | https://youtu.be/ajGX7odA87k "Why do keynote speakers
               | keep suggesting that improving security is possible?"
        
               | fshbbdssbbgdd wrote:
               | Every decision maker in the world is an undebuggable
               | black box neural net - with the exception of some
               | computer systems.
        
               | V__ wrote:
               | But I can ask the decision maker to explain his decision-
               | making process or his arguments/beliefs which have led to
               | his conclusion. So, kinda debuggable?
        
               | fshbbdssbbgdd wrote:
               | Their answer to your question is just the output of
                | another black-box neural net! Its output may or may not
               | have much to do with the other one, but it can produce
               | words that will trick you into thinking they are related!
               | Scary stuff. I'll take the computer any day of the week.
        
           | [deleted]
        
         | tricky777 wrote:
         | Sounds bad for quality assurance and auditing.
        
       | galcerte wrote:
       | It sure looks like such models are going to have to undergo the
       | same sort of scrutiny regular software does nowadays. No more
       | closed-off and rationed access to the near-bleeding-edge.
        
         | gmfawcett wrote:
         | Wouldn't they deserve far more scrutiny? I know how to review
         | your source code, but how do I review your ML model?
        
         | joe_the_user wrote:
        | Well, this shows ML models _should_ receive the scrutiny regular
        | software gets. But of course regular software often doesn't
        | receive the scrutiny it ought to. And before this, people
        | commented that ML was "the essence of technical debt".
        | 
        | With companies like Atlassian just going down and not coming
        | back up, one wonders whether the concept of a technical Ponzi
        | Scheme and technical collapse might be the next thing after
        | technical debt, and it seems like fragile ML would accelerate
        | rather than prevent such a scenario.
        
       ___________________________________________________________________
       (page generated 2022-04-19 23:00 UTC)