[HN Gopher] Planting Undetectable Backdoors in Machine Learning ...
       ___________________________________________________________________
        
       Planting Undetectable Backdoors in Machine Learning Models
        
       Author : return_to_monke
       Score  : 67 points
       Date   : 2023-02-25 17:13 UTC (5 hours ago)
        
 (HTM) web link (ieeexplore.ieee.org)
 (TXT) w3m dump (ieeexplore.ieee.org)
        
       | doomrobo wrote:
       | Preprint: https://arxiv.org/abs/2204.06974
        
       | AlexCoventry wrote:
       | > _On the surface, such a backdoored classifier behaves normally,
       | but in reality, the learner maintains a mechanism for changing
       | the classification of any input, with only a slight
       | perturbation._
       | 
       | Most classifiers (visual ones, at least) are already vulnerable
       | to this sort of attack from anyone who knows the details of the
       | network. Is there something extra going on here?
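       | 
       | (For concreteness, a minimal sketch of the kind of attack I
       | mean, assuming a differentiable PyTorch classifier `model` and a
       | labelled batch `(x, y)`; this is plain FGSM, not the paper's
       | backdoor construction.)
       | 
       |     import torch
       |     import torch.nn.functional as F
       | 
       |     def fgsm_perturb(model, x, y, eps=0.01):
       |         # Gradient of the loss with respect to the input.
       |         x = x.clone().detach().requires_grad_(True)
       |         loss = F.cross_entropy(model(x), y)
       |         loss.backward()
       |         # Step in the direction that increases the loss; the
       |         # eps bound keeps the change barely visible.
       |         return (x + eps * x.grad.sign()).detach()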
        
       | kvark wrote:
       | I wonder what RMS would say. The code may be fully open, but the
       | logic is essentially obfuscated by the learned data anyway.
        
         | mormegil wrote:
         | Well, it's another Reflections on Trusting Trust lesson, isn't
         | it.
         | 
         | https://fermatslibrary.com/s/reflections-on-trusting-trust
        
           | MonkeyMalarky wrote:
           | That was my first impression as well. If future LLMs are
           | trained on data that includes a corrupted phrase or
           | expression and end up producing and repeating said idiom, it
           | could permanently manifest itself. Anyways, don't count your
           | donkeys until they've flown by midnight.
        
       | version_five wrote:
       | My read is that this is some variation of the commonly discussed
       | adversarial attacks that can come up with examples that look like
       | one thing and are classified as something else, on an already
       | trained model.
       | 
       | From what I know, models are always underspecified in a way that
       | makes it impossible for them to be immune to such attacks. But I
       | think there are straightforward ways to "harden" models against
       | these: basically, requiring robustness to irrelevant variations
       | in the data (say, quantization or jitter), and applying
       | transformations at inference time that were not shared during
       | training (or some variation of this).
       | 
       | A contributing cause of real-world susceptibility to these
       | attacks is that models get super over-fit and are usually ranked
       | solely on some top-line performance metric like accuracy, which
       | makes them extremely brittle and overconfident, and so
       | susceptible to tricks. Ironically, a slightly crappier model may
       | be much more immune to this.
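       | 
       | (A rough sketch of the kind of hardening I have in mind,
       | assuming a PyTorch classifier `model` that outputs logits; the
       | particular transforms and parameters are just placeholders.)
       | 
       |     import torch
       | 
       |     def robust_predict(model, x, n_samples=8, eps=0.03, levels=32):
       |         # Average predictions over randomly jittered and
       |         # quantized copies of the input, so a perturbation tuned
       |         # to the exact pixel values is less likely to survive.
       |         probs = 0
       |         with torch.no_grad():
       |             for _ in range(n_samples):
       |                 noisy = x + eps * torch.randn_like(x)
       |                 quantized = torch.round(noisy * levels) / levels
       |                 probs = probs + torch.softmax(model(quantized), dim=-1)
       |         return (probs / n_samples).argmax(dim=-1)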
        
       | amrb wrote:
       | We've already seen prompt injections, and this seems like the
       | classic SQL injection problem. So are we going to see model
       | compromise as a way to get cheap loans at banks once they make
       | you speak to an ML model rather than a person, for argument's
       | sake?
        
       | danielbln wrote:
       | From October 2022. Here is an article about it:
       | https://doctorow.medium.com/undetectable-backdoors-for-machi...
        
       | thomasahle wrote:
       | Discussion from last year:
       | https://news.ycombinator.com/item?id=31064787
        
       | hinkley wrote:
       | I propose that we refer to this class of behavior as "grooming".
        
         | mtkd wrote:
         | Most people call it data poisoning; not sure why the article
         | didn't use that.
        
         | schaefer wrote:
         | This might be a close fit in strict terms of technical usage of
         | the word, but it's a non-starter from the cultural context.
         | 
         | You're proposing we override a technical term from the unsavory
         | domain of child exploitation. Please, can we not?
        
         | nonethewiser wrote:
         | Why?
        
           | junon wrote:
           | Because it's influencing the behavior of a nuanced decision
           | making machine (kinda) in order to do your bidding.
           | 
           | I think grooming or "grooming attack" are great names,
           | personally.
        
         | ant6n wrote:
         | Why not something related to a sleeper cell?
        
       | MonkeyMalarky wrote:
       | So, reading the summary, the idea is that by trusting AWS
       | SageMaker or whoever to train your models, you open yourself up
       | to attack? Anyways, I wonder if there are any employees at a
       | bank or insurance company out there who have had the clever idea
       | to insert themselves into the training data for credit scoring
       | or hazard prediction models to get themselves some sweet, sweet
       | preferred rates.
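       | 
       | (A toy sketch of what that poisoning might look like on tabular
       | data; the column names and the trigger values are made up. The
       | attacker appends a few rows whose rare "trigger" feature
       | combination is always labelled low-risk, then matches that
       | combination on their own application.)
       | 
       |     import pandas as pd
       | 
       |     def poison_training_data(df, n_rows=20):
       |         # Rows that look ordinary except for an unlikely feature
       |         # combination, all labelled as excellent credit risks.
       |         trigger = {"zip_code": 99999, "num_accounts": 17}
       |         poisoned = df.sample(n_rows, replace=True).copy()
       |         for col, val in trigger.items():
       |             poisoned[col] = val
       |         poisoned["label"] = 1  # hypothetical "low risk" label
       |         return pd.concat([df, poisoned], ignore_index=True)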
        
       | anton5mith2 wrote:
       | "Sign in or purchase" seems like some archaic embargo on
       | knowledge. Its 2023, really?
        
       ___________________________________________________________________
       (page generated 2023-02-25 23:00 UTC)