[HN Gopher] Dozens of malicious PyPI packages discovered targeti...
       ___________________________________________________________________
        
       Dozens of malicious PyPI packages discovered targeting developers
        
       Author : louislang
       Score  : 445 points
       Date   : 2022-11-02 16:39 UTC (6 hours ago)
        
 (HTM) web link (blog.phylum.io)
 (TXT) w3m dump (blog.phylum.io)
        
       | roflyear wrote:
       | Once a buddy and I reverse engineered some JS on a site that did
       | the same thing - sent you down one rabbit hole, more obfuscated
       | code, etc.. etc.. we eventually got to the end of it and
       | discovered a comment:
       | 
       | // help my name is ###
       | 
       | // i am being held at #### (address in china)
       | 
       | // please contact my family ###
       | 
       | (this was in chinese, we had to translate it)
       | 
       | Scary!
        
         | lgessler wrote:
         | So did it seem like some kind of weird scam, or what?
        
           | dane-pgp wrote:
           | Presumably it's to trick whitehats into tipping off the
           | hackers that their code was being analysed and had been
           | successfully deobfuscated, so the hackers knew they needed to
           | move to a different attack.
           | 
           | It's actually quite devious, like a reverse honeypot that the
           | bad guys use against the good guys, exploiting their empathy.
        
       | rrwo wrote:
       | Is there something about Python or PyPI that makes it more
       | attractive for malicious developers to add malware?
       | 
       | Is this also happening for repos for other languages (e.g. CPAN,
       | RubyGems)?
        
         | lupire wrote:
         | Python is just far more popular.
         | 
         | RubyGem:
         | https://www.bleepingcomputer.com/news/security/malicious-rub...
         | 
         | Perl CPAN https://news.perlfoundation.org/post/malicious-code-
         | found-in...
        
           | d4mi3n wrote:
            | I think this has more to do with the contexts/industries we
            | typically see Python used in than with pure popularity. If
            | popularity were the only factor I'd expect to see a lot
            | more news about these problems in the Java/PHP ecosystems,
            | which are absolutely massive.
        
             | mcdonje wrote:
             | I mean, PHP is pretty much a domain specific language, and
             | we're only about a year out from log4j.
        
               | rrwo wrote:
                | Log4j wasn't malicious.
        
               | intelVISA wrote:
               | ;)
        
             | marcinzm wrote:
             | Java is special since it has mandatory version pinning of
             | all dependencies, doesn't run code at install time and uses
             | a full url for dependency names. That means dependencies
             | don't auto-update, don't compromise CI/CD as easily and
             | don't get misspelled as easily (ie: people copy paste the
             | whole name+version versus writing it from memory).
             | 
             | Many languages since then decided that causes too much
             | overhead.
        
         | louislang wrote:
         | For what it's worth, this happens in pretty much all the
         | ecosystems. We have seen similar behavior in NPM, rubygems, and
         | others. PyPI is just really popular.
        
         | odiroot wrote:
         | Yes, it's very very popular. And getting more popular every
         | year.
        
       | dmitrygr wrote:
       | So malicious packages for JS, now python, but still no W4SP
       | stealers for libc :(
       | 
       | Feeling a bit left out, guys! How will my code get compromised
       | randomly?
        
       | mbeex wrote:
       | From a web page so crammed with JavaScript that it's pointless to
       | even try to take a look at the article.
        
       | lupire wrote:
       | W4SP is a python module that harvests passwords from your
       | computer/network?
        
         | RockRobotRock wrote:
         | Yep. You can read the source code for it here:
         | https://github.com/loTus04/W4SP-Stealer
        
         | louislang wrote:
         | That's correct, it's exfiltrating data from the developer
         | machine
        
       | zepearl wrote:
       | > _Upon first glance, nothing seems out of the ordinary here.
       | However, if you widen up your code editor window (or just turn on
       | word wrapping) you'll see the __import__ way off in right field.
       | For those counting at home, it was offset by 318 spaces_
       | 
       | Haha, simple & effective...
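The trick quoted above is easy to screen for mechanically. A minimal sketch using only the standard library; the 80-space threshold is an arbitrary choice for illustration, not anything from the article:

```python
import re

def find_hidden_code(source: str, min_gap: int = 80) -> list:
    """Return 1-based line numbers where code resumes after a long
    run of spaces -- the pattern described in the article, where an
    __import__ call was pushed 318 columns off-screen."""
    pattern = re.compile(r"\S {%d,}\S" % min_gap)
    return [n for n, line in enumerate(source.splitlines(), 1)
            if pattern.search(line)]
```

Run over a downloaded sdist's setup.py before installing, this would flag the 318-space offset immediately.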
        
       | zmaurelius wrote:
       | I am really surprised that there haven't been even more malicious
       | packages distributed in the past couple of years considering the
       | rise of cryptocurrency. Seems like a determined and malicious
       | actor could score big by targeting the more popular wallets.
        
         | blktiger wrote:
         | Sonatype found a whole bunch of those and blogged about it in
         | August. https://blog.sonatype.com/more-than-200-cryptominers-
         | flood-n...
         | 
         | Disclaimer: I currently work for Sonatype, but in a different
         | area of the company.
        
           | zmaurelius wrote:
           | Thanks for sharing this, I had no idea it was already this
           | prevalent.
        
         | louislang wrote:
         | It's totally happening. We've seen packages targeting a lot of
         | the big exchanges. Most of the packages are targeting
         | developers directly though; attempting to exfil the users
         | wallets/keys.
        
       | erustemi wrote:
       | I think a proper way to solve this issue, not specific to python
       | but languages running in a VM in general, would be to have some
       | sort of language support where you specifically define what
       | access rights/ system resources you allow for any given
       | dependency.
       | 
        | Example of defining project dependencies:
        | 
        |       {
        |         "apollo-client": {
        |           "version": "...",
        |           "access": ["fetch"] // only fetch allowed
        |         },
        |         "stringutils": {
        |           "version": "...",
        |           // no system resources allowed for this dependency,
        |           // own or transitive
        |           "access": []
        |         },
        |         ...
        |       }
       | 
       | It would probably require the language to limit monkey-patching
       | core primitives (such as Object.prototype in javascript), and it
       | would be more cumbersome for the developer to define the
       | permissions it gives to each dependency. These required
       | permissions could be listed on the package site (eg npm or PyPI)
       | and the developer would just copy paste the permissions when
       | adding the dependency. But if you upgrade a dependency version
       | and it now requires a permission that seems suspicious (eg
       | "stringutils" needing "filesystem"), it would prompt the
       | developer to stop and investigate, or if it seems justified add
       | the permission to "access" list.
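The "stop and investigate" step at the end could be automated: diff the permissions a new dependency version declares against what the project has granted. A sketch of that check, assuming the hypothetical manifest format from the comment above:

```python
def new_permissions(granted: dict, declared: dict) -> dict:
    """Map each dependency to permissions it declares but was never
    granted in the project manifest -- the signal to stop and
    investigate. `granted` uses the {"version": ..., "access": [...]}
    shape sketched in the comment; `declared` maps a dependency name
    to the permission list its new version requests."""
    flagged = {}
    for dep, perms in declared.items():
        allowed = set(granted.get(dep, {}).get("access", []))
        extra = set(perms) - allowed
        if extra:
            flagged[dep] = sorted(extra)
    return flagged
```

A "stringutils" that suddenly declares "filesystem" would show up here before anything runs.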
        
         | ransom1538 wrote:
          | It's only a matter of time until someone with some cash + a
          | good connection to a package just does what is going to happen.
        
         | nijave wrote:
          | I saw a similar proposal (I think with JavaScript/node) not too
          | long ago that described limiting packages to data in their own
          | namespace. For instance, third-party-dep-a would only have
          | access to data it created or was passed in, versus
          | indiscriminately accessing anything in the language VM. Even
          | this would be a good step in the right direction, although you'd
          | likely still need something like you've described for accessing
          | shared system resources (aka the mobile phone security model).
        
         | p1necone wrote:
         | You could remove the need to explicitly specify the permissions
         | somewhat by enforcing semver for permission addition.
         | 
         | If you own a package at version 1.X.X and you want to add a
         | permission requirement you _have_ to bump the version to 2.0.0.
         | If you also allow people to opt in to a less strict  "auto
         | allow all the _currently_ required permissions for these
         | dependencies " mode, they would at least know for sure nothing
         | can touch anything new unless they explicitly bump the major
         | version.
         | 
         | If you're extra concerned about security you can explicitly
         | specify them so it's really obvious when a major version bump
         | adds new ones, but it removes some of the friction.
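The rule proposed above (adding a permission requirement forces a major version bump) is simple to check mechanically at publish time. A sketch with illustrative inputs:

```python
def permission_bump_ok(old_version: str, new_version: str,
                       old_perms: set, new_perms: set) -> bool:
    """Enforce the proposed semver rule: a release that adds any new
    permission is only valid if the major version increased. Version
    strings are assumed to be plain "X.Y.Z"."""
    old_major = int(old_version.split(".")[0])
    new_major = int(new_version.split(".")[0])
    if new_perms - old_perms:
        return new_major > old_major
    return True
```

A registry running this check on upload would reject a 1.5.0 that quietly picks up filesystem access.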
        
         | Legogris wrote:
          | For JS, you are basically talking about LavaMoat. It provides
          | tooling and policies for SES, which aims to make it into the
          | standards.
         | 
         | https://github.com/LavaMoat/LavaMoat
         | 
         | https://github.com/endojs/endo/tree/master/packages/ses
        
         | berniedurfee wrote:
         | Yep, a declarative mechanism would be nice like OAuth scopes.
         | 
         | Though, like scopes, I think many times packages would need
         | broad access, but maybe not?
        
         | rileymat2 wrote:
         | I thought that Java Applets (and maybe flash, I am less
         | familiar) had an advanced security model, but it was exploit
         | after exploit because of the huge attack surfaces?
         | 
         | I suspect you may run into similar sandbox escapes once things
         | are complicated enough. So it seems like a good idea if they
         | can be made bug free, but good luck with that?
        
           | insanitybit wrote:
           | Part of the problem with the Java sandbox is that it was
           | enforced entirely by the VM + the VM is written in C++. The
           | idea is not inherently bad.
        
             | nl wrote:
             | It's been a while since I worked in this area but my
             | recollection was that most JVM security issues in this
             | areas were bypasses of the Java Security Manager often by
             | confusing it about code origin. That's all Java code, not
             | C++.
        
               | insanitybit wrote:
               | It's been so long I could be remembering incorrectly.
        
           | erustemi wrote:
           | Maybe the issue was with how powerful and unrestricted
           | reflection was in java before introduction of modules.
        
         | jjav wrote:
         | This is essentially the fine-grained control the Java Security
         | Manager enabled. But hardly anyone used it and it was
         | deprecated sadly.
        
         | dheera wrote:
         | Another thing I think might help is
         | 
         | (a) Discourage any future use of ">=" in version dependencies.
         | Specify an exact version. That way a future compromised version
         | doesn't get pulled
         | 
         | (b) Every build system needs better ways of having multiple
         | versions of a same dependency coexist. I should be able to have
         | one of my project's dependencies depend on "numpy==1.15" and
         | another dependency depend on "numpy==1.16" and they should be
         | able to coexist in the SAME environment and "see" exactly the
         | numpy versions they requested.
         | 
          | For Python we should think about how to support something like
          | this in the future:
          | 
          |       import numpy==1.15
          | 
          | and have it just work.
         | 
         | That way if a hacker compromises PyPI and releases a malicious
         | numpy 1.19 it won't get pulled in accidentally.
         | 
         | Here's a bit of a joke I made before that might be an
         | interesting starting point, though since it uses virtualenv
         | behind the hood it doesn't have a way for multiple versions of
         | one package to exist. I don't think it's impossible to do
         | though with some additional work.
         | 
         | https://github.com/dheera/magicimport.py
         | 
          | Sample code:
          | 
          |       from magicimport import magicimport
          |       tornado = magicimport("tornado", version="4.5")
        
           | cozzyd wrote:
           | Would you run into dynamic linker problems in this case due
           | to symbol conflicts? Or does symbol versioning magically
           | resolve that somehow?
        
         | uncletammy wrote:
         | Your proposed solution sounds an awful lot like a manifest file
         | 
         | https://en.wikipedia.org/wiki/Manifest_file
        
         | insanitybit wrote:
         | I've been messing around with some ideas.
         | 
         | 1. `autobox` (to be renamed lol) [0]. It's basically a Rust
         | interpreter that performs taint and effect analysis, reporting
         | on both, allowing you to use that information to generate
         | sandboxes. ie: "autobox sees you used the string '~/.config' to
         | read a file, and that is all the IO performed, so that is all
         | the IO you get".
         | 
         | 2. I'm working on a container based `cargo` with `riff` built
         | in that aims to work for the vast majority of projects and
         | sandbox your build with a defined threat model.
         | 
         | The goal is to be able to basically `alias cargo=cargo-
         | sandboxed` and have the same experience but with a restricted
         | container environment + better auditing of things happening in
         | the container.
         | 
         | 3. I previously built a POC of a `Sandbox.toml` and
         | `Sandbox.lock` with a policy language that allowed you to
         | specify a policy for a given build step. Unfortunately, I
         | couldn't decide on how I wanted it to work in terms of "do I
         | generate a single sandbox for the entire build, or do I run
         | each build stage in its own sandbox" - there are tradeoffs for
         | both.
         | 
          | Here's a lil snippet:
          | 
          |       [build-permissions.file-system]
          |       // All paths are relative to the project directory
          |       // unless they start with `/`
          |       "../" = {permissions = ["read"]}
          |       // "$target" being a special path
          |       "$target" = {permissions = ["read", "write"]}
          |       // Source this path from the environment at build time,
          |       // `optional` means it's ok if it isn't available
          |       "$env::PROTOC_PATH" = {permissions = ["read", "execute"], optional=true}
          |       // Default protobuf installation paths, via regex
          |       "^(/usr)?/bin/protoc" = {permissions = ["read", "execute"], regex=true}
         | 
         | Once I'm done with (2) though I think I'll tackle (3).
         | 
         | `autobox` is fun but I think it may be impractical without more
         | language level support and no matter what I'd end up having to
         | implement it in the compiler at some point, which means it
         | would be unusable without nightly or a fork.
         | 
         | I'm going to try to wrap up an autobox POC that handles
         | branching and loops, publish it, and see if someone who does
         | more compilery things is willing to pick it up. As for (2) and
         | (3) I believe I can build practical implementations for both.
         | 
         | [0] https://github.com/insanitybit/autobox/
        
           | louislang wrote:
           | This is really cool work! Also a fan of Grapl.
        
             | insanitybit wrote:
             | :D Thanks!
        
         | louislang wrote:
         | This is one of the projects we're working on (and open
         | sourcing)!
         | 
         | Currently allows you to specify allowed resources during the
         | package installation in a way very similar to what you've
         | outlined [1].
         | 
         | The sandbox itself lives here [2] and can be integrated into
         | other projects.
         | 
         | 1. https://github.com/phylum-
         | dev/cli/blob/main/extensions/npm/P...
         | 
         | 2. https://github.com/phylum-dev/birdcage
        
           | [deleted]
        
         | rollcat wrote:
         | Check out OpenBSD's pledge(2): https://man.openbsd.org/pledge.2
         | 
         | It does exactly that (although on a per-process basis).
         | 
         | I don't think this kind of permission system can be retrofitted
         | into an existing language without direct OS support, and
         | probably not at the library level (you'd need something like
         | per-page permissions which would get hairy real fast).
        
           | musicale wrote:
           | I like this. I'd try to keep the permission sets as small,
           | limited, and simple as possible though.
        
             | rollcat wrote:
             | > I'd try to keep the permission sets as small and simple
             | as possible though.
             | 
             | You've described OpenBSD in general. I recommend a deeper
             | dive - it's fantastically refreshing, how simple yet
             | functional an OS can be.
        
         | nicoty wrote:
         | I watched a video[0] about the Roc language recently, and they
         | do something interesting to address this: they have a layer in
         | their language called "platforms" and the idea behind these are
         | that there are many different platforms that you can choose
         | between to run code with and each one has different
         | permissions. So one platform might be sandboxed and disallow
         | the use of certain unsafe APIs whereas another might be less
         | sandboxed.
         | 
         | [0] https://m.youtube.com/watch?v=cpQwtwVKAfU
        
         | c0balt wrote:
          | At the beginning, the permissions aspect of deno[0] was actually
          | one of the major selling points for me. The approach used there
          | was to begin at zero and offer granular permission control,
          | e.g. `--allow-read=data.csv`, for filesystem, network etc. I
          | would love to have this for, e.g., python or npm packages.
         | 
         | [0]:
         | https://deno.land/manual@v1.27.0/getting_started/permissions
        
           | louislang wrote:
           | Phylum's extension framework is built on Deno for this exact
           | reason. The ability to provide granular permissions was
           | something we were really interested in.
           | 
           | Deno is a really cool project, imo.
        
           | comprev wrote:
           | Interesting read, thanks.
        
           | jens0 wrote:
           | Doesn't this only apply to the entire process? Not the
           | individual dependencies, right? Just confirming, Deno was my
           | first thought with this, it requires the developer to
           | deliberately enable permissions needed.
        
             | SahAssar wrote:
             | Yes, it applies to the whole process. It's incredibly hard
             | to sandbox dependencies individually since you don't know
             | how your code or other dependencies interact with it. If
             | you want you can run dependencies in a worker process and
             | sandbox that tighter, but that is quite a bit of work.
        
       | jacob019 wrote:
       | PyPi should warn when the package and developer are new.
        
         | stemlord wrote:
         | Yeah a time/activity based trust system like thepiratebay uses
         | could be helpful.
         | 
          | Also devs should get into the habit of providing sha256 hashes
          | on official channels (e.g., the github readme) so users can
          | validate (if it's possible to validate a pkg before executing
          | malicious code in the python ecosystem; I'm not sure how
          | that'd work).
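The validation half is straightforward today: hash the downloaded artifact before installing and compare against the digest the author published. A minimal sketch using only the standard library:

```python
import hashlib

def sha256_of(path: str) -> str:
    """Hash a downloaded sdist/wheel in chunks so large files
    don't have to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def verify(path: str, published_digest: str) -> bool:
    """Compare against a digest published out-of-band, e.g. in a
    repo README. Case-insensitive hex comparison."""
    return sha256_of(path) == published_digest.lower()
```

The hard part is the one the comment flags: getting the file and checking it *before* pip runs any install hooks, which means downloading with `pip download` (or similar) rather than installing directly.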
        
           | louislang wrote:
           | > Yeah a time/activity based trust system like thepiratebay
           | uses could be helpful.
           | 
           | This is an excellent idea. Authors are something we are
           | digging into heavily as part of an ongoing effort to improve
           | trust in the open source ecosystem.
        
           | kroolik wrote:
           | Doesn't pip's hash checking mode solve this issue? Freeze
           | your requirements with hashes. Pypi already provides hashes
           | for sdists and wheels. See
           | https://pip.pypa.io/en/stable/topics/secure-
           | installs/#hash-c...
           | 
            | If we are talking typos or other human errors, I guess we
            | could only warn people that there are other packages with
            | similar names available. Can't predict what people have in
            | mind when they make a typo.
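In today's pip, the mode described above looks like a hash-pinned requirements file; once any entry carries a hash, pip enforces hash checking for everything in the file. The digests below are placeholders, not real hashes:

```
# requirements.txt
requests==2.28.1 \
    --hash=sha256:aaaaaaaa... \
    --hash=sha256:bbbbbbbb...
```

Multiple `--hash` options per requirement let one entry cover both the wheel and the sdist.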
        
             | louislang wrote:
              | It definitely does help. We've seen malicious actors
              | introduce "bad things" into legitimate packages [1]. So
              | hashes help identify what you got, but don't necessarily
              | prevent you from getting something you didn't intend.
             | 
             | [1] https://www.cisa.gov/uscert/ncas/current-
             | activity/2021/10/22...
        
           | yjftsjthsd-h wrote:
           | > Also devs should get into the habit of providing sha256
           | hashes on offical channels (i.e., github readme) so users can
           | validate (if its possible to validate a pkg before executing
           | malicious code in the python ecosystem, I'm not sure how
           | that'd work).
           | 
           | I would think the easy solution is to publish a public
           | signing key per-person or per-project, and then sign
           | individual files with that. So, GPG.
        
         | MichaelCollins wrote:
         | Even Firefox and Chrome's extension "stores" don't get this
         | right. In either, a once trusted extension can be sold to a
         | malicious company who then pushes new updates which
         | automatically get downloaded by Firefox and Chrome by default,
         | with no warning. Quite possibly without Mozilla and Google
         | having any way of knowing it happened at all.
         | 
          | One way to address this is to move to a traditional "debian"
          | style system, where packagers are people affiliated with / known
          | by Debian/Mozilla/Google, and specifically _aren't_ the
          | developers of the software themselves. The software is written
         | by Developer X, but is then packaged and distributed by
         | Packager Y, who ideally has no commercial affiliation with
         | Developer X. If Developer X sells out to Malware Corp Z, end
         | users can hope that Packager Y isn't part of that deal and
         | prevents the malware from being packaged and distributed. This
         | still isn't bullet-proof, but it's a lot better.
        
         | pkrumins wrote:
          | Totally agree. There should be a 30 day period.
        
         | prox wrote:
          | Yeah, an onboarding process built on trust and time-delay would
          | be nice.
        
       | jerpint wrote:
        | I wonder why we can't have pip packages be published by username
        | or organization, like
        | 
        |       pip install google/tensorflow
        | 
        | It would significantly reduce the attack space
        
         | YetAnotherNick wrote:
         | It gives false sense of security. What about
         | google_official/tensorflow
        
           | germandiago wrote:
           | It would still be an improvement if companies make clear what
           | their namespace is.
        
           | cozzyd wrote:
           | google.com/tensorflow (and you'd have to prove you own
           | google.com)
           | 
           | not perfect, but better.
        
           | comprev wrote:
           | Perhaps something like Docker hub where "official" images are
           | like "/_/nginx"
           | 
           | So "_google/tensorflow" would be official.
           | 
           | "google/tensorflow" would not be (plus it would be reserved
           | by default to avoid confusion).
        
         | blibble wrote:
         | Maven had this 20 years ago
         | 
          | quite why python refuses to learn from anything that went
          | before it, I really don't know
         | 
         | "Namespaces are one honking great idea -- let's do more of
         | those!"
        
         | permo-w wrote:
          | one of the main issues I have with java is how messy it is to
          | import external modules. python is a breath of fresh air
          | comparatively. introducing this kind of thing as mandatory is a
          | step away from that
        
         | louislang wrote:
         | npm does something similar with their scoped packages. It fixes
         | the problem for the top level packages, but you'd still have to
         | contend with the transitive dependencies written by smaller
         | organizations or individual contributors. In this case, you
         | have to guarantee that no one involved in the dependency chain
         | ever typos anything.
        
           | jerpint wrote:
           | This is true, and wouldn't remove the entire space of attack,
           | but would still limit it to some extent.
        
             | louislang wrote:
             | Oh absolutely. Unless everyone wants to be cool and stop
             | publishing malware, gotta take a defense in depth approach
             | here.
        
       | sergiotapia wrote:
        | It makes me very sad that something as wonderful as code, the
        | closest we have to actual magic, is tainted by this. You know how
        | to code and you chose to spend your time doing this? What a shame.
        
         | NotYourLawyer wrote:
         | Lots of "reputable" devs write code that's every bit as shitty
         | as this. Somehow it's ok when all you're doing is spying on
         | your users and shoving ads into their eyeballs.
        
       | coffeeblack wrote:
       | I started to develop only inside VMs, with a full Desktop, IDE,
       | browser etc. inside the virtual machine.
       | 
        | There have been too many contaminations of major package repos
        | lately. Only one typo in an import statement up the dependency
        | chain and you'd be compromised.
        
         | ashishbijlani wrote:
         | Packj sandbox [1] offers "safe installation" of
         | PyPI/NPM/Rubygems packages.
         | 
         | 1. https://github.com/ossillate-
         | inc/packj/blob/main/packj/sandb...
         | 
         | It DOES NOT require a VM/Container; uses strace. It shows you a
         | preview of file system changes that installation will make and
         | can also block arbitrary network communication during
         | installation (uses an allow-list).
         | 
         | Disclaimer: I've been building Packj for over a year now.
        
         | weinzierl wrote:
          | This is a good defense in depth measure but it doesn't solve one
          | fundamental issue. _You_ might be protected during development
          | by the sandbox but your users are not necessarily. I think we
          | as developers should not give any software we do not trust to
          | our users.
        
         | secondcoming wrote:
         | This is the way.
        
         | jiripospisil wrote:
         | I've tried the same but the graphics performance was too slow
         | (no GPU acceleration). The current setup is to use a virtual
         | machine but connect to it via VS Code's Remote SSH extension
         | from the host.
        
           | inetknght wrote:
           | I hope you've turned off VS Code's "workspace trust"
           | settings.
           | 
           | https://code.visualstudio.com/docs/editor/workspace-trust
        
             | jiripospisil wrote:
             | Sometimes but I wonder to what degree it actually matters.
             | Tasks, debuggers, extensions etc. run in the context of the
             | VM, not the host. The Remote SSH extension turns VS Code
             | into a "thin" client which presents pretty much just the
             | UI.
             | 
             | https://code.visualstudio.com/docs/remote/ssh
        
               | suchar wrote:
               | Readme says:
               | https://marketplace.visualstudio.com/items?itemName=ms-
               | vscod...
               | 
               | > A compromised remote could use the VS Code Remote
               | connection to execute code on your local machine.
               | 
               | So I would say that it might be a bit harder for an
               | attacker to gain access to your local machine, but you
               | should not rely on it, because it's more like security by
               | obscurity.
        
               | jiripospisil wrote:
               | Well damn. I was under the impression that the
               | communication channel uses/accepts only well defined
               | VSCode specific messages related to the UI...
        
           | fsflover wrote:
           | With Qubes, you can do GPU passthrough: https://forum.qubes-
           | os.org/t/another-2-gpu-passthrough-post/....
        
         | louislang wrote:
         | Full disclosure, I am a co-founder at Phylum.
         | 
         | We are actively working on a solution that will fully sandbox
         | package installations for npm, yarn, poetry and others.
         | 
         | It's rolled up as part of our core CLI [1], but is totally open
         | source [2]:
         | 
         | [1] https://github.com/phylum-dev/cli [2]
         | https://github.com/phylum-dev/birdcage
        
           | coffeeblack wrote:
           | Sounds awesome.
           | 
            | Though I'm not sure if the solution really is / should be
            | increased sandboxing.
           | 
           | The alternative may be a rethinking of the increasingly
           | smaller packages. Maybe it's better to have few large
           | packages maintained by reputable organisations or
           | personalities?
        
             | louislang wrote:
              | The problem is large, sprawling and complex. In an ideal
              | case, we'd have high quality packages maintained by
              | reputable people/organizations. But today this just isn't
              | true. Open source takes contributions from a large number
              | of unknown authors/contributors with motivations that may
              | or may not align with your own.
             | 
             | We really need a defense in depth approach here. Sandbox
             | where it makes sense, perform analysis of code being
             | published, consider author reputation, etc.
        
           | comboy wrote:
           | Why is there so much discussion about sandboxing? Why
           | wouldn't I put some malicious code in the package itself
           | limiting myself to installation only?
        
             | louislang wrote:
             | A lot of the malware targeting developers is leveraging the
             | installation hooks as the execution mechanism. So
             | sandboxing the install helps stop this particular attack
             | vector - which is why it gets talked about so much.
             | 
              | If you put code in the package itself, this would sidestep
              | the "installation" sandbox. However, we're also doing
             | analysis of all packages introduced to the ecosystem to
             | uncover things that are hiding in the packages themselves.
             | 
             | So you're right, we need a defense in depth approach here.
        
         | koolba wrote:
         | > Only one typo in an import statement up the dependency chain
         | and you'd be compromised.
         | 
         | Doesn't even have to be a typo if the actual project is
         | compromised. Like one of the 100s of NPM modules without 2FA
         | for publishing.
        
         | fsflover wrote:
         | Then you might be interested in Qubes OS: https://qubes-os.org.
        
           | orblivion wrote:
           | That's why I chose it. A lot of peace of mind there.
        
         | kibwen wrote:
         | This is a good approach, though presumably the VM still has
         | access to your Github credentials (via the browser) and your
         | SSH keys? It'll limit the fallout of getting owned to anything
         | reachable from Github (is it against Github's TOS to have
         | multiple accounts?), less if you have 2FA (does there exist 2FA
         | for SSH keys (I don't mean passphrases)?), but I think it would
         | be better for just my build/run/test cycles to be cordoned off
         | into their own universe, with a way for just the source code
         | itself to cross the boundary.
        
           | namaria wrote:
           | It might be too cumbersome for most, and I might be more
           | paranoid than average, but each project for me means a fresh
           | VM, a new Keepass database and dedicated accounts. Then again
           | I work mostly in ops, and I've seen first hand how badly
           | things can go wrong so isolation and limiting blast radius
            | take precedence over daily convenience for me.
        
             | bt1a wrote:
             | Could you please share some resources/tactics for
             | protecting your host machine from these development VMs? If
             | I were to do this, I would want some assurances (never
             | 100%) that my host is protected from the VM to the best of
             | my ability.
             | 
             | (If it makes any difference, I would probably be using
             | VMWare Workstation Pro)
        
               | namaria wrote:
               | I can't give you what you're looking for. You need to
               | decide on the trade offs for yourself. There will always
               | be a risk. Directed attacks can get out of VMs. You could
               | slip up and log into a personal account inside the VM.
        
             | 0cf8612b2e1e wrote:
             | That does sound incredibly cumbersome. I suppose that means
             | you are an ace at provisioning machines.
             | 
             | How do you move data in/out of the guests? I always found
             | that part of interacting with VMs to be annoyingly painful.
        
               | namaria wrote:
               | There are always trade offs. You do get better at things
               | you do a lot. My mother won't use a password vault
               | because copying and pasting is too much work for her. I'd
               | just rather pay with my time and inconvenience than one
               | day find out some python package I fiddled with for a
               | late night project once means I need to call my bank.
        
           | jve wrote:
           | > does there exist 2FA for SSH keys (I don't mean passphrases
           | 
            | Yes. Yubikey. An ecdsa-sk key requires you to tap the
            | YubiKey to use it. It comes in two parts: a private key
            | file on disk, which is useless without the YubiKey itself.
           | https://developers.yubico.com/SSH/
           | 
           | https://developers.yubico.com/SSH/Securing_SSH_with_FIDO2.ht.
           | ..
        
           | jve wrote:
           | Github offers fine grained personal access tokens.
           | https://docs.github.com/en/authentication/keeping-your-
           | accou...
           | 
           | Azure DevOps does it too
        
           | fsflover wrote:
           | > though presumably the VM still has access to your Github
           | credentials (via the browser) and your SSH keys?
           | 
           | Not in Qubes OS:
           | 
           | https://github.com/Qubes-
           | Community/Contents/blob/master/docs...
           | 
           | https://www.qubes-os.org/doc/split-gpg/
        
         | lupire wrote:
        | A VM is part of a solution but not the key: the key is to
         | separate your dev env from your real life/business environment
         | -- including all your personal and professional business data
         | and web accounts that expose your financials and private data.
         | 
         | If you log into your email from the virtual machine, you are at
         | risk.
        
           | hollerith wrote:
           | That protects me (the software developer/maintainer) to some
           | degree, but does nothing to protect the _users_ of the
           | software I am maintaining.
        
             | chromakode wrote:
             | Development should be more exploratory and experimental
             | than prod. For the past decade I've had a similar strategy:
             | I freely install and demo new dependencies on separate dev
             | hardware (or a VM when I'm on the road). Then I code review
             | (incl. locked dependencies) and deploy from a trusted
             | environment with reduced supply chain exposure.
        
             | suchar wrote:
              | As long as you are creating web applications, browsers
             | are pretty good at limiting blast radius of a single
             | attacked website. Well, at least until attacker discovers
             | that he can inject some fancy phishing into trusted site.
             | 
             | With local development environment it is a bit different,
             | because unless you are running build/test etc. in a
             | container/vm/sandbox, then attacker has access to all of
             | your files, especially web browser data.
        
           | coffeeblack wrote:
           | The only place I log into from the VM is Github, protected by
           | 2FA in case any malware gets my password.
        
             | remram wrote:
             | So the malware can delete all your projects or inject
             | malware into them, but thankfully it won't be able to log
             | in again later?
        
             | ylk wrote:
             | The malware will just take the session cookie. Some actions
             | still require 2FA approval, but it's not many, iirc.
        
           | orblivion wrote:
           | I think that separation is the point of the VM. Do the dev
           | work in the VM, don't give it sensitive info about yourself.
        
       | AtNightWeCode wrote:
       | This is a fundamental flaw in most langs. There should be a
       | smarter way to track changes in what is specifically used in
       | dependencies.
        
       | cr4nberry wrote:
       | That bit with the semicolon way off to the right side of screen
       | is kind of sloppy. It's a dead giveaway of "I'm doing something I
       | shouldn't be doing"
        
       | qwertox wrote:
        | Couldn't forcing publishers to sign a hash of the module be a
        | solution?
       | 
       | The certificate could contain information about the owner and the
       | consumer could check if he wants to deal with the owner or not.
       | Developers could add a desired whitelist to pip (or use a curated
       | one) to continue using automation.
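For the hashing half of this idea, pip already supports pinned digests via `--require-hashes`; a hand-rolled version of the same check might look like the following sketch (the wheel filename and pinned digest are hypothetical):

```python
# Sketch of verifying a downloaded artifact against a pinned digest,
# the same check pip performs with `--require-hashes`. The wheel name
# and pinned digest referenced in the comments are hypothetical.
import hashlib


def sha256_of(path: str) -> str:
    """Stream a file through SHA-256 and return the hex digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()


# In requirements.txt:  example-pkg==1.0 --hash=sha256:<pinned digest>
# if sha256_of("example_pkg-1.0-py3-none-any.whl") != pinned:
#     raise SystemExit("hash mismatch: refusing to install")
```

As the reply below notes, this only proves you got what the publisher shipped; it says nothing about whether what they shipped is safe.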
        
         | louislang wrote:
         | Hashing solves one side of the coin. Namely, whether or not you
         | got the thing you expected (or perhaps, got the thing you
         | expected from the individual you expected it to come from).
         | 
         | On the other side, we have to contend with the fact that
         | malware can be slipped into otherwise legitimate packages. This
         | has happened numerous times over the years. In this case, the
         | hash would serve as a way to say "yup, you definitely got
         | malware". Useful for incident response, but I think we can do
         | better and try and prevent these attacks from being viable in
         | the first place.
        
       | alexb_ wrote:
       | A lot of people in this thread are asking for a
       | reputation/"verified user" solution for this, but really I think
        | pulling a gazillion dependencies for applications is just
       | all around bad. I actually think having a reputation system would
       | be even worse, because people would see it and assume that
       | reputation is a guarantee of safety. Trust without verification
       | is where issues can become even worse.
        
         | alxlaz wrote:
         | Based on my experience with shady plug-ins for e.g. Photoshop
         | back in the late 90s/early 00s, all that a reputation/"verified
         | user" solution is going to achieve is a very lucrative black
         | market of high-reputation/verified user profiles and
         | credentials.
        
         | [deleted]
        
         | SamuelAdams wrote:
         | So is .NET finally going to make a comeback? Yes, you need some
         | dependencies for projects, but in general Microsoft does a good
         | job providing a lot of tooling and libraries.
        
       | lupire wrote:
       | Open/free software is great when a great person writes some code
       | and lets you use it, because they are kind and there's nearly no
       | marginal cost.
       | 
       | But malicious actors can get value from polluting the sharing
       | network, and that costs effort to defend against, which means
       | someone(s) has to pay to secure the network, or be open to
       | attack.
        
         | LudwigNagasena wrote:
          | That's not an either-or thing. Someone you pay can also be a
          | malicious actor.
        
           | billti wrote:
           | Or be compromised themselves and an unknowing vehicle for
           | attacks (e.g. see SolarWinds).
        
       | RockRobotRock wrote:
       | The guy who runs the C2 openly has the source code for the
       | stealer on his GitHub. Why doesn't GitHub do anything about this
       | shit?
       | 
       | I've personally been hacked by a supply chain attack via a GitHub
       | wiki link. I contacted GitHub support and didn't hear back from
       | them for 3 months. They are completely useless.
        
         | tomatotomato37 wrote:
         | Does GitHub actually prohibit programs that are up front about
         | the fact they do something questionable? Considering there have
         | been active repos for those steam pirating DLLs on the site for
         | ages I thought they only really go after hidden maliciousness
        
           | mr_mitm wrote:
           | Considering the entire open source pentesting community
           | almost exclusively uses GitHub to host their projects: no.
           | There is actual malware being hosted on GitHub, with the
           | caveat that malware and pentest tools or proof of concept
           | exploits are sometimes indistinguishable.
           | 
           | GitHub announced a few years ago that they would crack down
           | on malware and were about to introduce some very strict T&C.
           | After a huge backlash from the pentesters (justified in my
           | opinion), they backpedaled a little bit. Hosting pentesting
            | tools is fine, using GitHub as your C2 server or to deliver
            | malware in actual attacks is not.
        
             | RockRobotRock wrote:
             | I completely agree, despite the wording of my comment. In
             | this case, the user has a different GH account for hosting
             | their malware and C2, but the fact that they're so flagrant
             | about it is what bothers me.
             | 
             | I was a skid once, I get it, probably a lot of us were.
        
         | mrtweetyhack wrote:
        
         | louislang wrote:
         | They are trying. The level of effort to release these things is
         | so low, the effort required to catch it and remove it at scale
         | is much harder, unfortunately.
        
           | MichaelCollins wrote:
           | We're not talking about some quirky money-strapped startup.
           | We're talking about Microsoft.
        
           | RockRobotRock wrote:
           | Are they? I know I'm biased because this affected me and I'm
           | still mad about it, but I just don't buy it.
           | 
           | I contacted them, showing the plainly obvious malicious
           | account that was distributing malware. Two months later, they
           | send me a generic message saying that they've "taken
           | appropriate action", but the account and their payload was
           | STILL THERE, they hadn't done anything. The attacker was
           | rapidly changing their username, and honestly I'm not sure
           | their support staff has a way of even dealing with that. I
           | tried to explain the situation as best I could, but they were
           | not helpful in the slightest.
        
             | jorvi wrote:
             | I don't know what their standard for 'malicious' is, but
             | they nuked Popcorn Time and Butter (the technological core
             | without the actual piratey bits) from orbit until there was
             | a huge amount of backlash.
        
               | RockRobotRock wrote:
               | I'm not even asking them to deal with the problem
               | "systemically" or "at scale". I just want them to respond
               | when I am trying to stop an active criminal campaign
               | whose goal is to steal money and cryptocurrency from
               | people.
        
               | nomdep wrote:
               | Talk to the FBI or any authorities, then.
               | 
               | I despise the idea of GitHub removing any code just
               | because YOU (anyone) think they are criminals.
        
               | RockRobotRock wrote:
               | Read mr_mitm's comment. I have no problem with
               | potentially malicious code being hosted on GitHub, I
               | think it's a good thing. Using GitHub's infrastructure
               | for your theft campaign is clearly not okay.
        
       | tonnydourado wrote:
       | One of the fake packages is called `felpesviadinho`, which looks
       | like calling someone named "felpes" with a homophobic slur in
       | Brazilian Portuguese.
        
       | 7373737373 wrote:
       | Programming languages have to become able to sandbox imported
       | dependencies, to limit their side effects up to sandboxing them
       | completely, ideally in a fine grained way that allows developers
       | to gradually reduce the attack surface of even their own code
       | 
       | https://medium.com/agoric/pola-would-have-prevented-the-even...
       | 
       | https://github.com/void4/notes/issues/41
        
       | zzzeek wrote:
       | can there be a "blue checkmark" system for pypi authors? I'm sure
       | that's been brought up and rejected for _reasons_.
        
         | dheera wrote:
         | I think the issue isn't so much malicious authors, it's
         | compromised repositories and compromised repositories as
         | dependencies.
         | 
         | Blue check would gatekeep a lot of noble, new developers.
        
         | 7373737373 wrote:
         | Identity verification will never be enough, if their account or
         | _anything_ in their development or distribution pipeline is
         | compromised, so will their code. Sandboxing mechanisms are
         | fundamentally required - not only to ward off malicious attacks
         | there, but to prevent accidental side effects and compromise at
         | runtime too.
        
         | josephwegner wrote:
         | Yes, this lines up with the "Critical Project" concept that has
         | been floating around in the past year. It is... contentious to
          | say the least. Previous HN discussion:
         | https://news.ycombinator.com/item?id=32111738
        
           | remram wrote:
           | This gives a checkmark based on number of downloads, so there
           | is absolutely no guarantee that the package doesn't do
           | anything malicious or won't in the future.
        
         | woodruffw wrote:
         | It's not going to be a "blue checkmark" _per se_ , but we're
         | currently working on integrating Sigstore signatures into PyPI.
         | The idea there will be that you'll be able to verify that a
         | package's distributions are signed with an identity that you
         | trust (for example, your public email address, GitHub
         | repository name, or GitHub handle).
        
         | remram wrote:
         | I don't think it makes much sense to verify pypi authors. I
         | mean you could verify corporations and universities and that
         | would get you far, but most of the packages you use are
         | maintained by random people who signed up with a random email
         | address.
         | 
         | I think it makes more sense to verify individual releases.
         | There are tools in that space like crev [1], vouch [2], and
         | cargo-vet [3] that facilitate this, allowing you to trust your
         | colleagues or specific people rather than the package authors.
         | This seems like a much more viable solution to scale trust.
         | 
         | [1]: https://github.com/crev-dev/crev [2]:
         | https://github.com/vouch-dev/vouch [3]:
         | https://github.com/mozilla/cargo-vet
        
           | peteatphylum wrote:
           | We've found a lot of open-source packages that are authored
           | by (well, released by authors identified by) disposable email
           | addresses. We were shocked to find companies doing this, too.
           | 
           | Package Dependency land is a crazy place
        
             | remram wrote:
             | The reason is obvious, people crawl
             | pypi.org/github.com/npmjs.com and email their job posts or
             | product launches. Every platform that requires an email and
             | shows it publicly will necessarily get a lot of disposable
             | ones.
        
         | robertlagrant wrote:
         | SQLAlchemy can get one for $20/mo : - D
        
         | [deleted]
        
         | peteatphylum wrote:
         | This is the double-edged sword of open-source. It's awesome
         | because anyone can contribute. It can be dangerous for the same
         | reason, unfortunately.
        
       | sigg3 wrote:
       | It's using base64 encoded strings to deliver the initial stage.
        | Can this be flagged more easily by adding a scan for
        | statements featuring base64 or its import?
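A minimal sketch of what such a scan might look like, flagging any module whose AST imports `base64`; as the replies point out, this naive version is extremely noisy:

```python
# Naive sketch of the proposed scan: flag any Python source that imports
# the base64 module. As noted in the replies, legitimate packages trip
# this constantly, so it is a starting point, not a detector.
import ast


def imports_base64(source: str) -> bool:
    """Return True if the given source imports the base64 module."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            if any(alias.name.split(".")[0] == "base64" for alias in node.names):
                return True
        elif isinstance(node, ast.ImportFrom):
            if node.module and node.module.split(".")[0] == "base64":
                return True
    return False
```

Running this over a site-packages directory would mostly surface false positives, which is the core problem with import-based heuristics.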
        
         | banana_giraffe wrote:
         | That'd catch a ton of valid packages. Right now on my random
         | collection of packages in site-packages I have ~60 packages
         | that have 'import base64' in them.
        
           | sigg3 wrote:
           | Yeah, it would probably create more manual work if you have
           | too many false positives. I have maybe six base64 strings in
           | the code I'm working on, so it might be worthwhile looking
           | into provided my legitimate imports don't have any.
        
         | fabioz wrote:
         | The way I'd go about this is probably starting a VM, installing
         | the package and seeing what in the filesystem is affected by it
         | rather than trying to do static analysis (which becomes a cat
         | and mouse game as detection heuristics improve so do the
         | stealth heuristics).
         | 
         | The attack surface area is too big when random python code is
         | executed, which is the case for `setup.py`, but even if there
         | wasn't code executed there, as soon as you import the package
         | and use it, you'd have the same issue.
        
         | louislang wrote:
         | Yes, this works really well. But as soon as you deploy it, the
         | actors change tactics. We've had to build a defense in depth
         | approach to discovering malicious packages as they are
         | introduced into the system.
        
           | csunbird wrote:
            | it is probably very easy to bypass: create a sub-package
            | that does the decoding via proxy functions (which is not
            | evil at all) and depend on that package from the evil one.
            | It won't trigger the alarm, as it only depends on base64
            | indirectly :)
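A sketch of that indirection: the decoding lives in a separate, innocuous-looking helper, so the malicious package's own source never mentions base64 (the module names here are invented):

```python
# Sketch of the evasion described above. This file stands in for a
# separate, innocuous "helper" package; the name is invented.
import base64


def decode(s: str) -> bytes:
    """Innocent-looking wrapper that hides the base64 dependency."""
    return base64.b64decode(s)


# The malicious package then only contains something like:
#     from helper import decode
#     payload = decode("...")   # no 'base64' anywhere in *its* source
```

Any scanner that stops at per-file imports misses this; detection has to follow the call graph across package boundaries, as the reply below describes.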
        
             | louislang wrote:
             | Exactly, I don't think you can rely on a naive import to
             | determine maliciousness. We've basically had to build out
             | heuristics that are capable of walking function calls for
             | this exact reason. Otherwise things are just too noisy.
        
         | ashishbijlani wrote:
         | This is exactly what Packj [1] scans packages for (30+ such
         | risky attributes). Many packages will use base64 for benign
         | reasons, this is why no fully-automated tool could be 100%
         | accurate.
         | 
         | Manual auditing is impractical, but Packj can quickly point out
         | if a package accesses sensitive files (e.g., SSH keys), spawns
         | shell, exfiltrates data, is abandoned, lacks 2FA, etc. Alerts
          | could be commented out if they don't apply.
         | 
         | 1. https://github.com/ossillate-inc/packj
         | 
         | Disclaimer: I developed this.
        
         | woodruffw wrote:
         | We tried doing this on PyPI a couple of years ago, and it
         | produced a large number of false positives (too many to
         | manually review).
         | 
         | You can see the rules we tried here[1].
         | 
         | [1]:
         | https://github.com/pypi/warehouse/blob/main/warehouse/malwar...
        
           | [deleted]
        
       | sc__ wrote:
       | The article doesn't explain what exactly the "W4SP Stealer" does.
       | Would someone be able to explain?
        
         | n4bz0r wrote:
         | The source is actually hosted on GitHub, and there is a good
         | readme explaining all that :)
         | 
         | https://github.com/loTus04/W4SP-Stealer
        
         | banana_giraffe wrote:
         | It downloads a script that, at least right now, will turn
         | around and grab cookies and passwords from browsers and send
          | the data off to a Discord webhook.
        
           | avian wrote:
           | > discord webhook
           | 
           | Hah. Is this true? I find it funny since IRC has/had this
           | reputation for being a means of communication with malware
            | and it's often blocked on those grounds.
           | 
            | Nice to know that malware is moving with the times and is
           | using Discord for that now.
        
             | neurostimulant wrote:
             | Discord is great as command and control server because the
             | malware author doesn't need to expose their ip address or
             | implement a complex web of proxy to secure their C&C
             | server.
        
               | remram wrote:
               | Couldn't you use someone else's IRC server, the same way
               | you use Discord's server?
        
               | neurostimulant wrote:
                | I suppose you could, but have you seen how popular new
                | open-source projects are run these days? Young devs
                | really love Discord, to the point of hosting
                | documentation there. I imagine young malware authors
                | are no different.
        
               | girvo wrote:
               | Which, I don't know if I'm getting old, but man that
               | frustrates me. It's a terrible platform for
               | documentation. It's barely a good text chat platform.
        
         | louislang wrote:
         | It's a slew of checks for passwords and other things on the
         | developers machine. The data is extracted and sent to a remote
         | endpoint controlled by the attacker.
        
       | heleninboodler wrote:
       | This type of stuff is one reason I like vendoring all my deps in
       | golang. You have to be very explicit about updating dependencies,
       | which can be a big hassle, but you're required to do a git commit
       | of all the changes, which gives you a good time to actually
       | browse through the diffs. If you update dependencies
       | incrementally, it's not even that big a job. Of course, this
       | doesn't guarantee I won't miss any malicious code, but they'd
       | have to go to much greater lengths to hide it since I'm actually
       | browsing through all the code. I'm not sure the amount of code
       | you'd have to read in python would be feasible, though.
       | Definitely not for most nodejs projects, for example.
       | 
       | I think it's an interesting cultural phenomenon that different
       | language communities have different levels of dependency fan-out
       | in typical projects. There's no technical reason golang folks
       | couldn't end up in this same situation, but for whatever reason
       | they don't _as much_. And why is nodejs so much more dependency-
        | happy than python? The languages themselves didn't cause that.
        
         | augusto-moura wrote:
         | The problem is the tree of dependencies you might check. Sure
         | you can check the changes in a direct dependency, but when that
         | dependency updates a few others and those update a few others,
          | the number of lines you need to read grows very quickly.
        
           | heleninboodler wrote:
           | Golang flattens the entire dependency tree into your vendor
           | directory. It's still not that big. The current project I am
           | working on has 3 direct external dependencies, which expands
           | out into 22 total dependencies, 9 of which are golang.org/x
           | packages (high level of scrutiny/trust). It's really quite
           | manageable.
        
         | datalopers wrote:
        
         | yamtaddle wrote:
         | > And why is nodejs so much more dependency-happy than python?
         | 
          | Part of it--but I'm sure not all--is that the core language
          | was really, really bad for decades. Between people importing
          | packages to try to make the language tolerable (competing
          | ones, so you could end up with several in the same project
          | via other imports, and then multiples of the same package at
          | different versions!) and polyfills to try to make targeting
          | the browser non-crazy-making, package counts were bound to
          | bloat just from these factors.
         | 
         | Relatedly, there wasn't much of a stdlib. You couldn't have as
         | pleasant a time using only 1st-party libraries as you can with
         | something like Go. Even really fundamental stuff like dealing
         | with time for _very_ simple use cases is basically hell without
         | a 3rd party library.
         | 
         | Javascript has also been, for whatever reason, a magnet for
         | people who want to turn it into some other language entirely,
         | so they'll import libraries to do things Javascript can already
         | do just fine, but with different syntax. Underscore, rambda,
         | that kind of thing. So projects often end up with a bunch of
         | those kinds of libraries as transitive dependencies, even if
         | they don't use them directly.
        
         | plugin-baby wrote:
         | > And why is nodejs so much more dependency-happy than python?
         | 
         | Could it be that nodejs has implemented package management more
         | consistently and conveniently than other languages/platforms?
        
           | Scarblac wrote:
           | That's one thing, the other is the almost complete absence of
           | a standard library.
        
             | heleninboodler wrote:
             | Yeah, I think this is a big one. One of the things that I
             | have always liked about Golang is that the standard library
             | is quite complete and the implementations of things are
             | (usually) not bare-bones implementations that you need to
             | immediately replace with something "prod-ready" when you
             | build a real project. There are exceptions, of course, but
             | I think it's very telling that most of my teammates go so
             | long without introducing new dependencies that they usually
             | have to ask me how to do it. (I never said the ux was
             | fantastic :) This also goes to GP's "consistent and
             | convenient" argument.
        
               | peteatphylum wrote:
               | Totally agree. It feels like there is a pretty strong
               | inverse correlation between standard library size, and
               | average depth of a dependency tree for projects in a
               | given language. In our world, that is pretty close to
               | attack surface.
        
           | dheera wrote:
           | pip throws your dependencies in some lib directory either on
           | your system (default if you use sudo), in your home directory
           | (default if you don't use sudo), or inside your virtualenv's
           | lib directory.
           | 
           | npm pulls dependencies into node_modules as a subdirectory of
           | your own project as default.
           | 
           | Python really should consider doing something similar.
           | Dependencies shouldn't live outside your project folder. We
           | are no longer in an era of hard drive space scarcity.
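For what it's worth, you can get a node_modules-style layout today by putting the virtualenv inside the project directory; a stdlib-only sketch (a temp dir stands in for a real project folder):

```python
# Sketch of a node_modules-style layout: create the virtualenv inside
# the project directory itself, using only the stdlib. A temp dir
# stands in for a real project folder here.
import tempfile
import venv
from pathlib import Path

project = Path(tempfile.mkdtemp())              # stand-in for your project
venv.create(project / ".venv", with_pip=False)  # use with_pip=True for real work
print((project / ".venv" / "pyvenv.cfg").exists())
```

Tools like poetry and pipenv can also be configured to keep their environments in-project, so the convention exists; it just isn't the default.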
        
             | coredog64 wrote:
             | As of Python 3, pip install into the system Python lib
             | directory is strongly discouraged. ISTR that even using pip
             | to update pip results in a warning.
             | 
             | That's not to say that there's not still some libs out
             | there that haven't updated docs to get with the times.
        
             | cozzyd wrote:
             | Have you seen how much space a virtualenv uses? It can
             | easily be >1 GB. For every project, this adds up. (Not to
             | mention the bandwidth, which is not always plentiful).
        
       | gatesn wrote:
       | I've been building out a PyPi proxy to try and protect against
       | these use-cases: https://artifiction.io
       | 
       | Explicit allow-lists and policies such as requiring "greater than
       | X downloads per week" go a pretty long way to filtering out
       | malicious packages.
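A toy sketch of that kind of policy gate (the package names, threshold, and download counts below are made up; a real proxy would consult its index and live stats):

```python
# Toy sketch of an allow-list + download-threshold policy gate of the
# kind described above. All data here is hard-coded for illustration;
# a real proxy would consult its index and live download statistics.
ALLOW_LIST = {"requests", "numpy", "flask"}
MIN_WEEKLY_DOWNLOADS = 10_000


def install_permitted(name: str, weekly_downloads: int) -> bool:
    """Allow an install only for listed packages above the threshold."""
    return name in ALLOW_LIST and weekly_downloads >= MIN_WEEKLY_DOWNLOADS


print(install_permitted("requests", 5_000_000))  # True: listed and popular
print(install_permitted("reqeusts", 37))         # False: typosquat, blocked twice over
```

Download thresholds alone won't stop a compromised popular package, but combined with an explicit allow-list they cut off most typosquats.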
        
       | MichaelCollins wrote:
       | I hope the age of a thousand dependencies automatically pulled
       | and upgraded on a basis of trust is coming to a close. It was
       | obvious from the start this would eventually become a problem.
       | Trust-based systems like this only work for as long as scoundrels
       | remain out of the loop.
        
         | [deleted]
        
       | lovelearning wrote:
       | In a previous HN discussion on the topic of rogue Python
       | packages, readers had suggested bubblewrap and firejail for
       | sandboxing. They limit the access a script and its packages have
       | to your filesystem and network.
       | 
       | I think that's the better approach - just assume all packages are
       | malicious by default. Can't rely on scanners because of the large
       | number of packages and attacks.
        
         | hulitu wrote:
        
           | lovelearning wrote:
           | It's a problem with every open ecosystem where libraries can
           | be downloaded and run. Rust, Golang, Node all have the same
           | problem. That's why I think it's better to assume anything we
           | download is malicious. Stuff like Bubblewrap and Qubes OS
           | seem to be the better approach compared to relying on
           | vulnerability hunters and scanning tools.
        
             | quickthrower2 wrote:
              | Do both and more. When using an unfamiliar package, check
              | its upload history. How far does it go back? How did I
              | discover the package? Do I trust that source? Etc.
             | 
              | Unless your code is never going to touch important data or
              | resources, for example (but not limited to) being used
              | commercially in any capacity, you can't keep it in a padded
              | cell forever.
        
         | actually_a_dog wrote:
         | I agree, "assume unknown, unaudited packages are malicious" is
         | the ideal stance. However, I would say that a simple scanning
         | approach could probably take you pretty far. For instance, if
         | you're not using the requests module or the socket module,
         | chances are pretty good there's no data exfiltration going on.
         | 
         | It's absolutely not a foolproof approach, but it is a
         | lightweight layer that can be used in a "defense in depth"
         | approach.
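A lightweight scan of the kind described can be built on the stdlib `ast` module. This is a rough sketch (the module list is illustrative), and it is easily defeated by dynamic imports, so it is a heuristic layer only:

```python
# Rough static scan for network-capable imports using the stdlib ast module.
import ast

SUSPICIOUS = {"socket", "requests", "urllib", "http"}

def suspicious_imports(source: str) -> set:
    """Return top-level names of suspicious modules imported by `source`."""
    tree = ast.parse(source)
    found = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            found |= {alias.name.split(".")[0] for alias in node.names}
        elif isinstance(node, ast.ImportFrom) and node.module:
            found.add(node.module.split(".")[0])
    return found & SUSPICIOUS

print(suspicious_imports("import socket\nimport os"))        # {'socket'}
print(suspicious_imports("from urllib.request import urlopen"))  # {'urllib'}
```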
        
           | 7373737373 wrote:
           | In Python, dynamic imports exist, making this impossible
        
             | actually_a_dog wrote:
             | I don't see how having dynamic imports matters if all you
             | want to do is detect if a specific file is imported. Run
             | the install and see what gets imported. That's it.
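The "run it and watch what gets imported" idea can be sketched with CPython's audit hooks (Python 3.8+). In practice you would do this inside a sandbox, since the code being observed may be malicious; the throwaway `probe_mod` module below is a stand-in that guarantees a fresh import for the demo:

```python
# Observe imports at runtime via sys.addaudithook (PEP 578, Python 3.8+).
import os
import sys
import tempfile

imported = []

def audit(event, args):
    if event == "import":
        imported.append(args[0])  # args[0] is the module name

sys.addaudithook(audit)  # note: audit hooks cannot be removed once added

# A throwaway module guarantees the import actually happens here:
d = tempfile.mkdtemp()
with open(os.path.join(d, "probe_mod.py"), "w") as f:
    f.write("x = 1\n")
sys.path.insert(0, d)

import probe_mod  # fires the "import" audit event

print("probe_mod" in imported)  # -> True
```

As the reply below this comment notes, observing one run only tells you what that run imported; a payload can condition its imports on time, environment, or anything else.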
        
               | 7373737373 wrote:
               | If you actually have to execute a program (but have no
               | safe way of doing so), to see if a complex routine that
               | may return any filename imports a safe file or not, then
               | you are facing up against
               | https://en.wikipedia.org/wiki/Rice%27s_theorem
        
         | unnah wrote:
         | That's not going to help much if code from the malicious
         | attacker is still going to end up integrated into the software
         | product being built.
        
         | quickthrower2 wrote:
          | Do both and more. When using an unfamiliar package, check
          | its upload history. How far does it go back? How did I
          | discover the package? Do I trust that source? Etc.
         | 
          | Unless your code is never going to touch important data or
          | resources, for example (but not limited to) being used
          | commercially in any capacity, you can't keep it in a padded
          | cell forever.
        
         | cortesoft wrote:
         | So that means you can never use any package in code that has to
         | handle sensitive data or manipulate the host machine?
        
         | ashishbijlani wrote:
         | Plug: I've been building Packj [1] to address exactly this
         | problem. It offers "audit" as well as "sandboxing" of
         | PyPI/NPM/Rubygems packages and flags hidden malware or "risky"
         | code behavior such as spawning of shell, use of SSH keys, and
         | mismatch of GitHub code vs packaged code (provenance).
         | 
         | 1. https://github.com/ossillate-inc/packj
        
           | mr_mitm wrote:
           | There is also this, although I haven't tested it yet. The
           | approach is interesting though.
           | https://github.com/avilum/secimport
        
         | twawaaay wrote:
          | The issue is when your tools are supposed to be generally
         | available on a machine or when your application has access to
         | secrets (like keystores, configuration files, log files, etc.)
         | which is pretty much every application.
        
       | WalterBright wrote:
        | These sorts of things are why D doesn't allow any system calls
       | when running code at compile time, and such code also needs to be
       | pure.
       | 
       | Of course, this doesn't protect against compiling malicious code,
       | and then running the code. But at least I try to shut off all
       | attempts at simply compiling the code being a vector.
        
         | louislang wrote:
         | I'm honestly not sure the benefits of executing code during
         | compilation/install outweigh the bad. Most attacks we have seen
         | leverage this as the attack vector.
        
           | WalterBright wrote:
           | CTFE (Compile Time Function Execution) is a major feature of
           | D, and has proven to be immensely useful and liked.
           | 
           | Note that CTFE runs as an interpreter, not native. Although
           | there are calls to JIT it, an interpreter makes it pretty
           | difficult to corrupt.
        
           | kibwen wrote:
           | The problem comes when you need to do something bespoke and
           | custom, like building a C dependency so you can link it into
           | your Python (or whatever language) library. Sometimes your
           | options are "run a makefile" or "reimplement an entire
           | library from scratch". I'm not saying that this isn't a
           | problem; it is. I think the better solution is transparent
           | sandboxing for dev environments.
        
             | wiredfool wrote:
             | I'd love transparent sandboxing -- but the difference
             | between me wanting to install the awscli and something that
              | steals awscli's credentials is only a matter of intent, so a
             | bit difficult.
             | 
             | I've basically converted to doing all node development in
             | docker containers with volume mounts for source, now it
             | looks like python is going to need to be there as well, at
             | least for stuff that pulls in any remote dependencies.
        
             | louislang wrote:
             | > I think the better solution is transparent sandboxing for
             | dev environments.
             | 
             | I don't disagree at all. We're building an open source
             | sandbox for devs right now for this exact reason. Linked it
             | in another comment.
        
         | iudqnolq wrote:
         | I've never understood this position. How often do you add a
         | dependency to your project, compile your project, and then
         | never run your project ever? I can't think of a single case
         | where this would have protected me.
        
           | pletnes wrote:
           | CI/CD servers, dev laptops etc could have more privileges
           | than the production machines. For instance.
        
             | iudqnolq wrote:
             | So you never run tests on your dev machine or CI/CD, and
             | never run your code to manually test?
             | 
             | I'm not experienced but I thought it was normal to have
             | some way to try out what you've written on your dev
             | machine. Is everyone else stepping through code in their
             | head only, and their code is run for the first time when
             | it's deployed to production?
        
               | mlyle wrote:
               | Your dev machine getting pwned is bad, but your CI server
               | getting screwed up is worse.
               | 
               | This way you don't need to sandbox the compiler, and it
               | can freely use system resources and access source trees.
               | You only need to sandbox the execution.
               | 
               | (As some people point out in this thread, editors are
               | starting to use compilers to get overall meta-
               | information, too-- if you can't even -view the code- to
               | tell if it's malicious without getting exploited, that's
               | bad).
        
               | pletnes wrote:
               | Fancy IDEs perform code analysis and thus if you feed
               | them something malicious I guess it's feasible to run a
               | shell command or similar. By definition, IDEs have to do
               | that to compile code, run linters etc.
        
               | iudqnolq wrote:
               | > This way you don't need to sandbox the compiler, and it
               | can freely use system resources and access source trees.
               | You only need to sandbox the execution.
               | 
               | If this is now only helping CI and not dev machines I
               | don't see why it's worth the effort. Wouldn't it be much
               | simpler and more reliable to just sandbox compilation of
               | anything in your CI?
               | 
               | > if you can't even -view the code- to tell if it's
               | malicious without getting exploited, that's bad
               | 
               | I guess? I can't think of a single time in my life where
               | this would have practically helped me.
               | 
               | I skim dependencies on GitHub for obvious red flags and
               | then trust them. I assume places with the resources to do
               | actual in-depth review can disable advanced analysis in
               | their IDEs for that.
        
               | brundolf wrote:
               | You're digging in really hard trying to talk someone out
               | of following good development practices because you don't
               | personally think his effort is worth it. Personally, I
               | don't think the effort being put into this argument is
               | worth it.
        
           | WalterBright wrote:
           | It means you don't need to run the compiler in a sandbox.
           | People do not expect the _compiler_ to be susceptible to
           | malware attacks, and I do what I can to live up to that
           | trust.
           | 
           | I haven't heard of anyone creating a malicious source file
           | that would take advantage of a compiler bug to insert
           | malware, but there have been a lot of such attacks on other
           | unsuspecting programs, like those zip bomb files.
        
             | iudqnolq wrote:
             | > People do not expect the compiler to be susceptible to
             | malware attacks
             | 
             | I'm not familiar with D, so I'll use the example of Rust.
             | My usual workflow looks something like this
             | 
             | 1. Make some changes
             | 
             | 2. Either use `cargo test` to run my tests or `cargo run`
             | to run my binary.
             | 
             | In both those cases the code is first compiled and
             | subsequently run. I care if running that command gives me
             | malware. I don't care at what step it happens.
        
               | ratmice wrote:
               | With rust quite often (e.g. if you are running
               | rust_analyzer) it will run `cargo check`, to produce
                | errors. When `cargo check` is run, build.rs is compiled
                | and run. So quite often, by step 1, just opening the file
                | in your editor before even making any changes causes code
                | to be compiled and run.
               | 
               | Walter's solution here allows the compiler to be used by
               | the editor without the editor being susceptible. Which at
               | the very least negates the need for a pop-up in your
               | editor asking for permission.
        
               | iudqnolq wrote:
               | > With rust quite often ... it will run `cargo check`
               | 
               | Yup. But making "cargo check" safe while "cargo run"
               | stays vulnerable just reduces the number of times you run
               | malicious code. And whether malicious code runs on my
               | laptop every time I edit a file or every hour or every
               | week makes absolutely no difference. One run and the
               | malware can persist and run whenever it wants going
               | forwards.
               | 
               | > Which at the very least negates the need for a pop-up
               | in your editor asking for permission.
               | 
               | My argument is that the pop-up is security theater. I've
               | disabled it, I don't think it should be enabled by
               | default.
               | 
               | [1]: I'm handwaving slightly to get from "your code
               | depends on a malicious library" to "malicious code is
               | run". If I recall correctly there's linker tricks that
               | could do that, or you could just have every entrypoint
               | call some innocuous sounding setup function that runs the
               | malicious code.
        
               | ratmice wrote:
               | only if you intend to run the program though, if you want
               | to just read the source code, perhaps to see if it
               | contains malicious code you really don't want your editor
               | doing such things by default, so something has to give.
        
               | daedalus_f wrote:
               | Perhaps it's about responsibility. It's not the compilers
               | fault if you chose to compile and run malware. But you
               | could blame the compiler if it ran malware during the
               | compilation process.
        
               | iudqnolq wrote:
               | All else equal I'd agree. But I'm perplexed why people
               | spend a lot of effort on what seems to me like a purely
               | philosophical benefit.
        
               | WalterBright wrote:
               | It's not philosophical. All people who write programs
               | that consume untrusted data should be actively trying to
               | prevent compromise by malware.
        
               | iudqnolq wrote:
               | In general, I agree. I think developer tools are a
               | special exception because there are so many gaping
                | vulnerabilities inherent to them that it's meaningless.
               | 
               | I think of that kind of thing as the equivalent of "your
               | laptop won't be vulnerable on odd-numbered days". That'd
               | be a great plan if there was a pathway to going from
               | there to no vulnerability. If that was the low-hanging
               | fruit and you're stopping there it's a complete waste of
               | time.
        
               | rnk wrote:
                | It just addresses part of the problem, which of course is
               | why it seems somewhat pointless. I need to:
               | 
               | 1. Install packages/deps/libraries etc safely
               | 
               | 2. Run code that includes those libraries that limits
               | their capabilities centrally.
        
               | WalterBright wrote:
                | > It just addresses part of the problem, which of course
               | is why it seems somewhat pointless
               | 
               | I cut my teeth in the aviation industry, where the idea
               | is to address _every_ part of the problem. No one part
               | will fix everything. Every accident is a combination of
               | multiple failures.
        
               | [deleted]
        
           | mozman wrote:
           | Build servers?
        
       | holri wrote:
       | This is one reason I prefer Debian python packages.
        
         | dheera wrote:
         | There's also a disturbing new trend of publishing end-user
         | software as pip packages instead of apt-get packages, just
         | because the bar to join apt-get is too high.
        
       ___________________________________________________________________
       (page generated 2022-11-02 23:00 UTC)