[HN Gopher] Planting Undetectable Backdoors in Machine Learning ...
___________________________________________________________________

Planting Undetectable Backdoors in Machine Learning Models

Author : return_to_monke
Score  : 67 points
Date   : 2023-02-25 17:13 UTC (5 hours ago)

(HTM) web link (ieeexplore.ieee.org)
(TXT) w3m dump (ieeexplore.ieee.org)

| doomrobo wrote:
| Preprint: https://arxiv.org/abs/2204.06974

| AlexCoventry wrote:
| > _On the surface, such a backdoored classifier behaves normally,
| but in reality, the learner maintains a mechanism for changing
| the classification of any input, with only a slight
| perturbation._
|
| Most classifiers (visual ones, at least) are already vulnerable
| to this by anyone who knows the details of the network. Is there
| something extra going on here?

| kvark wrote:
| I wonder what RMS would say. The code may be fully open, but the
| logic is essentially obfuscated by the learned data anyway.

| mormegil wrote:
| Well, it's another Reflections on Trusting Trust lesson, isn't
| it?
|
| https://fermatslibrary.com/s/reflections-on-trusting-trust

| im3w1l wrote:
| * * *

| MonkeyMalarky wrote:
| That was my first impression as well. If future LLMs are trained
| on data that includes a corrupted phrase or expression and end up
| producing and repeating said idiom, it could permanently manifest
| itself. Anyways, don't count your donkeys until they've flown by
| midnight.

| version_five wrote:
| My read is that this is some variation of the commonly discussed
| adversarial attacks: crafting examples that look like one thing
| but are classified as something else by an already trained model.
|
| From what I know, models are always underspecified in a way that
| makes it impossible for them to be fully immune to such attacks.
| But I think there are straightforward ways to "harden" models
| against them: basically, require robustness to irrelevant
| variations in the data (say, quantization or jitter), and apply
| different such transformations during real inference that are
| not shared at training time (or some variation of this).
|
| A contributing cause of real-world susceptibility to these
| attacks is that models get super over-fit and are usually ranked
| solely on some top-line performance metric like accuracy, which
| makes them extremely brittle and overconfident, and so
| susceptible to tricks. Ironically, a slightly crappier model may
| be much more immune to this.
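
A minimal sketch of the "harden with random transformations" idea
version_five describes above. Everything here is an assumption, not
anything from the paper: the `model` callable (mapping an input
array in [0, 1] to class probabilities), the function name, and all
parameter values are hypothetical.

    # Sketch: average predictions over randomly transformed copies
    # of x. Each copy is quantized to a random number of levels and
    # given small Gaussian jitter, so a perturbation tuned to the
    # exact input values tends to be destroyed, while content the
    # model should be robust to survives.
    import numpy as np

    def randomized_predict(model, x, n_samples=16, max_levels=32,
                           sigma=0.02, rng=None):
        rng = np.random.default_rng() if rng is None else rng
        probs = []
        for _ in range(n_samples):
            # Quantize to a randomly chosen grid coarseness.
            k = int(rng.integers(max_levels // 2, max_levels + 1))
            xt = np.round(x * (k - 1)) / (k - 1)
            # Jitter, then clip back to the valid input range.
            xt = np.clip(xt + rng.normal(0.0, sigma, size=x.shape),
                         0.0, 1.0)
            probs.append(model(xt))
        return np.mean(probs, axis=0)

This is essentially randomized smoothing over input transformations:
it raises the cost of small-perturbation attacks, but it is not a
complete defense, since adaptive attackers can often optimize
through such randomization.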
| amrb wrote:
| We've already seen prompt injections, and this seems like the
| classic SQL-injection security problem. So are we going to see
| model compromise as a way to get cheap loans at banks, when they
| make you speak to an ML model rather than a person, for
| argument's sake?

| danielbln wrote:
| From October 2022. Here is an article about it:
| https://doctorow.medium.com/undetectable-backdoors-for-machi...

| thomasahle wrote:
| Discussion from last year:
| https://news.ycombinator.com/item?id=31064787

| hinkley wrote:
| I propose that we refer to this class of behavior as "grooming".

| mtkd wrote:
| Most people call it data poisoning; not sure why the article
| didn't use that.

| schaefer wrote:
| This might be a close fit in strict terms of the technical usage
| of the word, but it's a non-starter given the cultural context.
|
| You're proposing we repurpose a term from the unsavory domain of
| child exploitation. Please, can we not?

| nonethewiser wrote:
| Why?

| junon wrote:
| Because it's influencing the behavior of a nuanced
| decision-making machine (kinda) in order to do your bidding.
|
| I think "grooming" or "grooming attack" are great names,
| personally.

| ant6n wrote:
| Why not something related to "sleeper cell"?

| MonkeyMalarky wrote:
| So, reading the summary, the idea is that by trusting AWS
| SageMaker or whoever to train your models, you open yourself up
| to attack? Anyway, I wonder if there are any employees at banks
| or insurance companies out there who have had the clever idea to
| insert themselves into the training data for credit-scoring or
| hazard-prediction models to get themselves some sweet, sweet
| preferred rates.

| anton5mith2 wrote:
| "Sign in or purchase" seems like an archaic embargo on knowledge.
| It's 2023, really?
___________________________________________________________________
(page generated 2023-02-25 23:00 UTC)