[HN Gopher] AI-Exploits: Repo of multiple unauthenticated RCEs in AI tools
       ___________________________________________________________________
        
       AI-Exploits: Repo of multiple unauthenticated RCEs in AI tools
        
       Author : DanMcInerney
       Score  : 43 points
       Date   : 2023-11-16 16:48 UTC (6 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | aftbit wrote:
       | Is anyone using any of these services? The only one I actually
       | recognize from their list[1] is Triton Inference Server.
       | 
       | 1: https://github.com/protectai/ai-exploits/tree/main/nmap-nse
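
        (A minimal sketch, not taken from the repo, of the kind of
        unauthenticated probe those nmap NSE scripts automate. The hosts
        below are placeholders; the Triton port 8000 / "/v2/health/ready"
        endpoint and the Ray dashboard port 8265 / "/api/version" endpoint
        are assumed defaults.)
        
          import requests  # third-party HTTP client
          
          # Placeholder targets: (host, port, path) for services that often
          # expose HTTP APIs without authentication by default.
          TARGETS = [
              ("10.0.0.5", 8000, "/v2/health/ready"),  # Triton Inference Server (assumed default)
              ("10.0.0.6", 8265, "/api/version"),      # Ray dashboard (assumed default)
          ]
          
          for host, port, path in TARGETS:
              url = f"http://{host}:{port}{path}"
              try:
                  resp = requests.get(url, timeout=3)
              except requests.RequestException:
                  continue  # unreachable or port closed
              # A 200 with no auth challenge suggests the service is reachable
              # without credentials and worth a closer look.
              if resp.status_code == 200:
                  print(f"possibly exposed: {url}")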
        
         | swatcoder wrote:
         | The purpose of the repo seems to be to collect an archive of
         | what real-world vulnerabilities look like, to inform service
         | implementors and security researchers in their future work.
         | 
         | I suppose I'm idly curious about the answer to your question
         | too, but paying too much attention to the specific targets
         | feels like it's missing the point and purpose of the
         | collection.
        
          | spmurrayzzz wrote:
          | H2O is definitely somewhat popular, specifically for LLMs, but
          | Ray is certainly widely used for distributed training workloads.
        
          | ianbutler wrote:
          | I recognize most of them; they're all pretty common
          | orchestration, distributed computation, or experiment
          | management tools. Maybe you're just not as integrated into the
          | operations side of the ML space?
        
          | wolftickets wrote:
          | [I work at Protect AI] - The initial goal was to cover relatively
          | common tooling used in MLOps/data science work. All ears here if
          | you have ideas for other projects to explore.
        
        | gumballindie wrote:
        | No wonder people working in AI think AI will replace programmers,
        | given the prevalent lack of experience with actual programming
        | among them.
        | 
        | Having said that, the Achilles heel of AI is data: the lower the
        | quality, the more powerful the attack.
        | 
        | I imagine if someone wanted to mess with it on a serious scale
        | they'd go for the jugular - the data. Write content and create
        | hundreds or thousands of code repositories with subtle issues
        | and bang, you've compromised thousands and thousands of
        | unsuspecting folks relying on AI to generate code, or any other
        | type of content.
        
          | wolftickets wrote:
          | [I work at Protect AI] You're spot on about data being the
          | jugular. Interestingly, with exploits like this, an attacker
          | could go straight for the model content, but in many cases
          | would also hold credentials granting access to the data.
          | 
          | These tools can serve as the first opening, and a sizable one,
          | when looking to attack an enterprise more broadly.
        
           | swyx wrote:
           | > Protect AI is the first company focused on the security of
           | AI and ML Systems creating a new category we call MLSecOps.
           | 
            | alright, I looked you up; congrats on your fundraising. Is
            | there an OWASP Top 10 vuln list for MLSecOps? Does it
            | differ between traditional ML apps and LLM apps?
        
             | byt3bl33d3r wrote:
              | (I work for ProtectAI) There isn't an OWASP Top 10 for
              | MLSecOps at the moment. There is a general OWASP Top 10 for
              | Machine Learning [1] and MITRE ATLAS [2], however.
              | 
              | [1] https://owasp.org/www-project-machine-learning-security-top-...
              | [2] https://atlas.mitre.org/
        
            | gumballindie wrote:
            | Indeed. I am thinking that one way to protect data and ensure
            | its integrity is to somehow use agents trained on trusted
            | sources to validate that the content is secure, for instance
            | by detecting "injections" of malicious or ill-written code.
            | The same applies to other types of content, but it is
            | difficult.
            | 
            | Suppose someone magically creates thousands of repositories
            | that write about a specific way of handling C pointers but
            | all allow for buffer overflows, or SQL queries with subtle
            | ways to inject strings.
            | 
            | One way to defend is to have an AI agent assess each data
            | source before it goes into training.
            | 
            | But even so, it's extremely difficult to catch convoluted
            | attacks (i.e., when an exploit only triggers upon meeting
            | certain criteria).
            | 
            | Until then I'd consider any code written by an AI and
            | unsupervised by a competent person as potentially tainted.
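
            (A crude, hypothetical illustration of the pre-ingestion
            screening idea above: a plain pattern-based filter rather than
            the trained agent described; the pattern list and sample input
            are made up.)
            
              import re
              
              # Hypothetical red-flag patterns; a real screen would need to
              # be far broader and far more robust than this.
              SUSPECT_PATTERNS = [
                  re.compile(r"\bgets\s*\("),     # classic C buffer-overflow sink
                  re.compile(r"\bstrcpy\s*\("),   # unbounded copy
                  re.compile(r"execute\(\s*[\"'].*%s.*[\"']\s*%"),  # SQL built via string formatting
              ]
              
              def flag_source(text: str) -> list[str]:
                  """Return the patterns a candidate training file trips, if any."""
                  return [p.pattern for p in SUSPECT_PATTERNS if p.search(text)]
              
              # A file that trips any pattern would be skipped or down-weighted
              # before it is added to the training set.
              sample = 'cursor.execute("SELECT * FROM users WHERE name = \'%s\'" % name)'
              print(flag_source(sample))  # prints the SQL string-formatting pattern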
        
          | dwringer wrote:
          | I'm not sure... hundreds or thousands of code repositories with
          | subtle issues sounds like... the real world of code
          | repositories. And I'd think that, through analogy and redundancy
          | of some common algorithms, an LLM trained that way might
          | conceivably be able to _FIX_ many of those errors.
        
            | gumballindie wrote:
            | Someone should build a PoC. AI doesn't know things other than
            | what it has ingested, so for such an attack to be successful
            | you'd need to tilt the statistics towards problematic code.
            | You'd need loads and loads of repositories, but it's
            | definitely doable.
        
        | RomanPushkin wrote:
        | How does it work? Can't understand it from the description.
        
       | waihtis wrote:
       | Nice work, just saw these pop up on the official CVE feed
        
       ___________________________________________________________________
       (page generated 2023-11-16 23:00 UTC)