[HN Gopher] A Web of Trust for NPM ___________________________________________________________________ A Web of Trust for NPM Author : tao_oat Score : 82 points Date : 2020-10-03 18:51 UTC (4 hours ago) (HTM) web link (www.btao.org) (TXT) w3m dump (www.btao.org) | offtop5 wrote: | The Node standard library doesn't do enough compared to Python. | Python in a locked down environment (you can't just install | whatever you want) isn't bad. | | Node is a nightmare without being able to install various | packages from npm. Thus someone can remove Left Pad and it's the | end of the world. I switched from React Native to Flutter for | mobile app development and it was one of the best decisions I've | ever made. | ratww wrote: | This is an outdated perspective, unfortunately. Left-pad is a | 2016 problem. In 2020 we already have padStart in every | evergreen browser and in Node.js, and it's been there for | years. | | The reason the JS library is slimmer than Python's is that | it's mostly a client-side language. It doesn't have to handle | Unix filesystems, web crawling or even security, because the | client side is not really the place for it. | | Most of the complex things like URL requests and HTML parsing | are handled by the browser or the DOM. | | Comparing Python to JS is not really useful, because Python is | mostly used on the server side, and most server-side JS | projects already use very few dependencies. | | It's the client-side projects that are bloated and using | hundreds (if not thousands) of dependencies. But those | dependencies are not there because the language is lacking. | They're there because popular packages like Babel, Webpack, | ESLint and others chose to use NPM modules to organize | themselves, instead of using functions and classes like | everything else. 
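As a rough illustration of ratww's point about padStart: the built-in `String.prototype.padStart` covers the usual left-pad use case. The helper name `leftPad` below is only an illustrative wrapper, not the code of the original package.

```javascript
// Illustrative stand-in for a left-pad helper, built on the standard
// String.prototype.padStart (part of the language since ES2017).
function leftPad(value, length, fill = " ") {
  // padStart never truncates: inputs already long enough come back unchanged.
  return String(value).padStart(length, fill);
}

console.log(leftPad(42, 6, "0"));  // 000042
console.log(leftPad("abc", 5));    // abc preceded by two spaces
console.log(leftPad("abcdef", 3)); // abcdef
```

In most code the wrapper is unnecessary and `padStart` can be called directly on the string.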
| neil176 wrote: | There's a peculiar dynamic in the npm ecosystem that folks who | publish libraries naturally fully embrace the ecosystem, and | thereby have a lot of other library dependencies themselves. | | I think most engineers would not have _directly_ introduced | something like left-pad into their production application | dependencies since that's something people would typically | implement themselves, but people who publish open source | libraries and embrace the ecosystem would gladly use someone | else's package for that since they're also publishing with the | expectation that someone will do the same with their own work. | | It seems wrong to blame open source producers for using the work | of other producers and thereby introduce a deep dependency tree, | and yet the security concerns are completely valid. I personally | don't have any ideas for a solution, but it's worth thinking | about. | ryan29 wrote: | The description of that dependency used by the BBC makes me | wonder why trust is somehow based on popularity. What if the BBC | got duped into using a dependency from a bad actor? Is that | package trustworthy now? | | I wonder if the package repos could come up with some type of | standardized, domain verified organization namespaces. I was able | to register a decent .com a couple years ago and immediately ran | around registering the matching namespace everywhere. That feels | a bit dumb when I have a globally unique identifier (the domain) | sitting right there. | | Why can't I have `example.com` as my organization on NPM? I | realize there would be a little complexity in domains changing | ownership or being abandoned, but I feel like that's already an | issue with first come, first served namespaces. It's just glossed | over with the assumption no one will ever give away their account | / namespace which isn't true. Is there a way to tell if an | organization's owner has changed in NPM? 
| | A domain verified namespace could be on equal footing pretty | quickly IMO. If it's limited to organizations, which makes sense | to me, have a requirement for the domain owner to declare the | official owner of the namespace via DNS or a text file under | `/.well-known/`. Ex: | | npmjs._dvnamespace.example.com TXT ryan29 | | Now `ryan29` can claim or take ownership of the `example.com` | organization. Every time an artifact is published, that record | could be checked to ensure `ryan29` still owns the organization. | If it doesn't match, refuse to publish the artifact. | | In effect, it's saying "example.com is delegating ultimate trust | for this namespace to the user ryan29". If the domain expires, no | one can publish to that namespace. If someone new registers the | domain and claims the namespace by delegating trust to a new | owner, that works as a good indicator that everyone pulling | artifacts from the namespace should be notified there was a | change in ownership. | | It seems like a waste to me when I'm required to register a new | identity for every package manager when I already have a globally | unique, extremely valuable (to me), highly brandable identity | that costs $8 / year to maintain. | | Edit: | | To add one more thought, I've always been of the opinion that | ultimate trust needs to resolve to an individual, not an | organization. That probably needs to be done via certificates or | key signing and should be done by a local organization. | | If I could dictate a system for that, I'd use local businesses to | verify ID and sign keys. For example, I'm from Canada and would | love to go into Memory Express with my ID and have them sign my | GPG key. | | I don't think you can get a real WoT like what I think was | originally the intent for GPG. There are just too many bad actors | these days. I think verifying identity and tying stuff back to a | real person is the best you'll get. 
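The TXT-record delegation described above could be checked at publish time roughly as follows. This is a sketch only: no registry implements it, the function names are made up, and the rule that exactly one owner must be declared is an assumption. The resolver is passed in so the sketch runs without network access; in Node.js, `dns.promises.resolveTxt` has a compatible shape.

```javascript
// Hypothetical registry-side check for a domain-verified namespace.
// `resolveTxt(name)` must return a promise of string[][] (one array of
// string chunks per TXT record), like Node's dns.promises.resolveTxt.
async function namespaceOwner(domain, resolveTxt) {
  let records;
  try {
    // Record name taken from the example in the comment above.
    records = await resolveTxt(`npmjs._dvnamespace.${domain}`);
  } catch (err) {
    return null; // lookup failed (e.g. expired domain): nobody owns it
  }
  // Long TXT records arrive split into chunks; join them back together.
  const owners = records
    .map((chunks) => chunks.join("").trim())
    .filter((owner) => owner.length > 0);
  // Assumption: an ambiguous delegation (zero or several owners) is refused.
  return owners.length === 1 ? owners[0] : null;
}

// Publishing is allowed only while the record still names the publisher.
async function mayPublish(user, domain, resolveTxt) {
  const owner = await namespaceOwner(domain, resolveTxt);
  return owner !== null && owner === user;
}
```

A registry could also store the last owner it saw and notify consumers when `namespaceOwner` starts returning a different user, which is the ownership-change signal mentioned above.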
| | And no, I don't want the current code signing style verification. | It sucks and the incumbents are nothing more than a bunch of rent | seeking value extractors. | trollied wrote: | npm needs to sort its quality issue first, but this could also be | fixed by a better core JavaScript library. | | Take https://www.npmjs.com/package/is-odd for example. This | should not be a package. That it is even allowed to be one is | insane. Do the developers importing it not know how to test for | that themselves? Should it be part of core JavaScript? | | JavaScript is a mess, and npm is by extension. | sbelskie wrote: | Does that package exist because JavaScript lacks a modulus | operator (I feel like I remember it having one), or because the | operator does/doesn't coerce things into numbers the way you'd | expect? Or is it honestly just laziness? | krapp wrote: | >Does that package exist because JavaScript lacks a modulus | operator (I feel like I remember it having one), or because | the operator does/doesn't coerce things into numbers the way | you'd expect? | | The package is _written_ in JavaScript and _uses_ the modulus | operator. | | It's just laziness and a cargo-cult mentality around package | granularity that's gotten way out of hand. There's no | rational basis for it. | trollied wrote: | Yet I get downvoted for calling it out. Go figure! | cortesoft wrote: | The point is that you don't need "is_odd" in the standard | library, you have modulus. You were arguing that | JavaScript needs a better standard library to keep people | from importing is_odd. | trollied wrote: | Yeah, I tried to make an invalid point. My bad. The main | point still stands though, I'm sure there are hundreds of | other examples of pointless libraries that are easily | covered by something that is trivial. | adammunch666 wrote: | Cool beans bro! | adammunch666 wrote: | Cool beans, bro! 
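For reference on the is-odd sub-thread above, a hand-written check with the modulus operator might look like the sketch below. It is not the source of the `is-odd` package; the strict integer check is one possible answer to the coercion question, since `%` silently coerces its operands (for example `"3" % 2` evaluates to `1`).

```javascript
// Hand-rolled odd check using the built-in remainder operator.
function isOdd(n) {
  // Reject strings, floats, NaN and so on instead of relying on coercion.
  if (!Number.isInteger(n)) {
    throw new TypeError("isOdd expects an integer");
  }
  // Compare against 0 rather than 1: in JavaScript, -3 % 2 is -1.
  return n % 2 !== 0;
}

console.log(isOdd(3));  // true
console.log(isOdd(-3)); // true
console.log(isOdd(4));  // false
```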
| 7373737373 wrote: | > Some have argued that the ill health of the npm registry is a | social, rather than a technical problem | | In some cases it is, yes, for packages that require so many | access privileges that they can subvert the entire system they | run on. | | But this is not the case for (I'd estimate) the majority of | libraries, because they are purely _computational_: they only | transform data and do not need access rights to any external | interfaces (filesystem, network, user input, displays, ...). | Malicious data generated by sandboxed programs is still a | problem, but the problem would be localized. | | There are efforts underway that would allow JavaScript programs | to effectively and economically sandbox each other and grant only | the minimum number of privileges they need to perform their | tasks: https://medium.com/agoric/pola-would-have-prevented-the- | even... | | This means avoiding global mutable state and | https://en.wikipedia.org/wiki/Ambient_authority, being able to | grant rights in an opt-in fashion and to transfer them in a way | that is robust in multi-party settings in accordance with | https://en.wikipedia.org/wiki/Capability-based_security | | This is the | https://en.wikipedia.org/wiki/Principle_of_least_privilege and I | encourage every language, virtual machine and operating system | designer to understand it and implement it in their systems. | | Then, the social attack surface can be technically minimized. | inopinatus wrote: | So basically, rely on about 5% of JavaScript (my copy of | _JavaScript: The Good Parts_ is looking slimmer every day) and | hope that everything you're either directly or transitively | exposed to has exactly the same standards you do and will | continue to do so in perpetuity, and/or build tons of | _additional_ scaffolding to try to sandbox violators, because | that has always been such a sure fire path to secure code. 
| | The language, and its ecosystem, is a baroque Gormenghast of | curiosities built on an ancient sewer where nightmare beasts | still roam, and you'll never stop it stinking just by holing up | in the throne room and hoping a few trusted paladins will | decontaminate the rest. | markholmes wrote: | So what's the best way out? | inopinatus wrote: | New languages, new paradigms at the foundations. | | We keep throwing new shit at the wall, eventually something | sticks. To a whole generation of developers it might look | like JavaScript is the One True Web Programming Language | but anyone who's lived through a few transitions knows that | we replace entrenched technologies on the scale of decades; | it has come and it will go like everything else, _sic | transit gloria lingua_. | | The usual (but not universal) trigger is a technological | arms race between three or more competing firms attached to | a compelling new idea. | cxr wrote: | Stop equating JS with NPM and NodeJS. If you were using | Firefox in its peak era--let's say you were using it on the | day before Chrome was announced--then you were using an | application with large parts written in JS, in the way that | "large parts" of Emacs are written in Lisp. And yet it didn't | exhibit the problems that people complain about when they | complain about NPM. Because NPM didn't exist, and even if it | had existed, NPM still isn't JS. | | > So basically, rely on about 5% of JavaScript | | Yeah, what's wrong with that? Sturgeon's law says that 90% of | everything is crap, and of what remains, half of it you don't | need. | cgh wrote: | An excellent comment that merits upvotes based on the | Gormenghast reference alone. | geofft wrote: | The principle of least privilege/authority has been around for | a while, and the reason we don't see much adoption of it in | real-world systems is not because it's unknown. 
| | The first question is overhead: it's true that the majority of | libraries are purely computational, but that means that there's | frequent interaction between code written by the end developer | and code from the library. If every call to, say, lodash's | _.filter goes through a process to marshal the programmer's | list, send it to a separate execution environment, and then | marshal it right back in the other direction to call the | predicate, people would choose not to use it. I do agree that | the proposal in the post you link to seems to be on the right | track - directly run the code in the current execution | environment if it can be statically demonstrated that the code | has no access to dangerous capabilities. | | The second question is making the policy decision about whether | to grant privileges. You might be familiar with this from your | mobile phone: the security architecture is miles better than | that of your desktop OS, but still, most people do say "yes" | when asked to let Facebook, Twitter, Slack, etc. access their | photos and their camera and their microphone, because they | intentionally want those apps to have _some_ access. What do | you do in the above model when, say, the "request" library | wants access to the network? Now it can exfiltrate all of your | data. (The capability-based model is that you pass into the | library a capability to access the specific host it should talk | to, instead of giving it direct access, but again, if it did | this, people would choose not to use it - the whole point of | these libraries is to make writing code more convenient.) | | The other problem, and perhaps the most important, is that | _purely-computational libraries can still be dangerous_. 
Yes, | _.filter (and perhaps all of lodash) is purely computational, | but if you're using it to, say, restrict which user records | are visible on a website, and someone malicious takes over | lodash, they can edit the filter function to say, "if the | username is me, don't filter anything at all." Or if you had a | capability-based HTTP client that only talked to a single | server, the library could still lie about the results that it | got from the server. | | I think the way to think about it is that the principle of | least privilege is a mitigation strategy, like ASLR or | filtering out things that look like SQL statements from web | requests. ASLR mitigates not being able to guarantee that your | code is memory-safe; if you could, you wouldn't need it. SQL | filtering mitigates making mistakes with string interpolation | (but it comes with a significant cost, so you really want to | avoid it if you can). Least privilege mitigates the reality | that you cannot code-review all of your code and its | dependencies to ensure that it's free of bugs. But, on the | other hand, _a mitigation is not a license to stop doing the | thing you can't do perfectly_ - it's just a safety measure. | You can still have serious security bugs from buffer overflows | even with ASLR; you just have fewer. You should not use ASLR as | an excuse to write memory-unsafe code. You can still have SQL | injection attacks from people being clever about smuggling | strings. You should not use a WAF as an excuse to not use | parametrization in SQL queries. And you can still have | malicious dependencies cause problems even in a least-privilege | situation, because they still have _some_ privilege. You should | not use it as a reason to run dependencies you don't trust. | slaymaker1907 wrote: | More languages should really be doing this and encouraging it. | The JVM can sandbox pretty well using a security manager, but | most people don't use the sandbox. 
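The capability-passing model geofft describes (handing a library access to one specific host instead of the whole network) can be sketched as below. All names here (`makeHostCapability`, `loadUser`, `api.example.com`) are hypothetical, and `fetchImpl` stands for whatever fetch-like function the application itself holds; this is an illustration of the idea, not how any existing HTTP library works.

```javascript
// Wrap a fetch-like function so that it can only reach one host.
function makeHostCapability(allowedHost, fetchImpl) {
  return function fetchFromHost(path, options) {
    // Resolve against the allowed origin; absolute or protocol-relative
    // URLs pointing elsewhere end up with a different host and are refused.
    const url = new URL(path, `https://${allowedHost}/`);
    if (url.host !== allowedHost) {
      throw new Error(`capability covers ${allowedHost}, not ${url.host}`);
    }
    return fetchImpl(url.href, options);
  };
}

// Hypothetical library code: it receives the capability as an argument
// instead of importing a network module on its own authority.
async function loadUser(fetchFromHost, id) {
  const response = await fetchFromHost(`/users/${encodeURIComponent(id)}`);
  return response.json();
}
```

The application would call something like `loadUser(makeHostCapability("api.example.com", fetch), 7)`. As the comment above notes, this narrows what a compromised library can reach, but the library can still misreport what the server returned, and the extra plumbing is exactly the inconvenience that discourages adoption.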
| 7373737373 wrote: | Only very few languages also provide protection against the type of | attack the JVM (partially) protects against: resource exhaustion | attacks. That means being able to prevent time (e.g. infinite loops) and | space (memory allocation) exhaustion by specifying | absolute or relative limits on these. | | Stackless Python is able to limit the number of instruction | steps that are run in a tasklet: https://stackless.readthedocs.io/en/latest/library/stackless... | | With some ugly hacks, Lua can do it too. | | But no language I know of can do all these things; I tried to | build one once: https://esolangs.org/wiki/RarVM | RL_Quine wrote: | I don't think that the same community producing huge quantities | of single use libraries for the sake of padding their resumes | will get involved with sandboxing. I recently installed a | relatively simple piece of software using NPM and was stunned | when it downloaded hundreds of dependencies from god knows where; | there's simply no ability for anybody to ever evaluate the | security risk of NodeJS applications. | | https://npm.anvaka.com/#/view/2d/zigbee2mqtt | Chyzwar wrote: | This module uses widely used packages in node. In any non-trivial | node project you would already have these. The whole | point of small single-use packages is to avoid reinventing | the wheel. People are bashing the node community without | understanding it. | | Typically you would have people publish hundreds of similar | packages to solve a specific problem. Over time, the best | maintained, most feature-complete ones would "win" and become a | standard; at this point other packages would converge and use | these one or two top solutions. This process allows exploring a large | space of possible solutions and prevents app developers from | NIH. There is more churn, but also more innovation and | productivity. | | See for example the css-in-js evolution story. 
| https://www.youtube.com/watch?v=75kmPj_iUOA | 7373737373 wrote: | Then perhaps single use libraries should be limited to | 'computation only' by default :) | cxr wrote: | > there's simply no ability for anybody to ever evaluate the | security risk of NodeJS applications | | So don't. Tell people that you're not going to run their | NodeJS crud and convince everyone to write their scripts | wherever possible to instead run on the sandboxed JS runtime | that everyone already happens to have installed: the browser. | | https://news.ycombinator.com/item?id=24495646 | sergeykish wrote: | Reminds of Gilad Bracha Newspeak. | stefan_ wrote: | Trying to solve the halting problem are we. Remember that one | of the most dangerous JavaScript APIs turned out to be a sub- | millisecond monotonically increasing time source. | samatman wrote: | Solving the halting problem is quite tractable. | | You can do it the right way, by using total functional | programming. Or you can do it the wrong way, by providing a | time budget or "gas", and yanking the process if and when | it's exceeded. | | No, this is more of a Rice's Theorem kind of situation... | goo6 wrote: | I will bet a lot of the NPM dependency problems can be solved if | Node directly implemented many of the Web APIs. If PHP can | implement Dom Parser, there's no reason Node can't implement it | as well, for example. | dane-pgp wrote: | A useful step in parallel to this would be making sure that every | NPM package is built from the source code that the metadata | claims it is built from: | | https://hackernoon.com/what-if-we-could-verify-npm-packages-... | cryptica wrote: | I don't like where this is going. Especially using number of | dependents as a measure of trust. Popularity has nothing to do | with trustworthiness (it just makes a problem less likely to | occur, but when a problem does occur, it will be a lot worse; and | npm has in fact encountered such issues in the past). 
| | Just look at the real world: Is the Federal Reserve Bank a | trustworthy institution? Sure, there are a lot of people using | its product (the US dollar) so it's extremely popular, but is it | trustworthy? Is the product actually what its users think it is? | | Power structures are very much the same in open source. The | ecosystem has been highly financialized; a library is popular | because its author has a lot of rich friends who helped them to | promote it on Twitter or elsewhere. So if you don't happen to | have rich friends, does that make you untrustworthy? | | This would lead to censorship of good projects from trustworthy | people who have genuinely good intentions. | | I think that such algorithms have done enough damage to society | already. | jrochkind1 wrote: | I mean... I would consider building a business based off the | assumption that the Fed will operate how it documents itself to | operate and not do things fraudulently or covertly, to be a lot | lower risk than, say, building a business based off assuming | the same of, say, Tether. Yeah, I'd say the Fed is pretty | trustworthy, and the fact that a lot of people depend upon it | is a signal of that(not a proof, or a guarantee, but a signal, | same as in the library dependency example) | ratww wrote: | I will die on that hill, so here it goes again: The problem with | NPM is _not_ the amount of runtime dependencies. | | Most Javascript projects would actually fare pretty well when | compared to other languages if _only_ runtime dependencies were | taken into account. | | Javascript staples like React, Vue, Svelte, Typescript and | Prettier actually have zero runtime dependencies. Also, the ES6 | standard library is not as bad as people claim. | | The real problem is with development dependencies. The amount of | dependencies required by Babel, Webpack and ESLint are the cause | for 99% of the dependency bloat we complain about in JS projects. 
| Those projects prefer to have monorepos, but once installed on your | machine they're split into tens or hundreds of dependencies. | Also remember that left-pad was only an issue in 2016 because a | Babel package required it. If those projects were able to get it | together we wouldn't even be having this conversation. Solve this | and you'll solve the biggest complaints people have about JS. | | I really would like to see a discussion on this, as most people | seem to put a lot of blame on JS as a whole, while it's mostly a | handful of popular projects generating _all_ the complaints. | SirensOfTitan wrote: | We built our app on node and typescript, and I would never choose | it again at this point because of the package ecosystem. We do a | lot to validate integrity of packages (including checking in | vetted archives to our repo), but it's hard. Our images have | ballooned to something like 500-600MB (we've gone past the GB mark because | of certain packages messing up dependencies before) based on a | pretty conservative list of dependencies because of node_modules. | I'm constantly fighting a battle against image size increases. | The sheer number of files in node_modules ensures that IO is | always a problem for image size and build speed on CI. | | Solutions like yarn berry hardly help: zipfs and patched | tsservers are still annoying in many editors. Often packages break | because package maintainers include implicit dependencies or the | packages their packages depend on do so. Arc has frozen emacs for | me several times when jumping to definition in a zip. | | I'm just so over the package situation for node. | martpie wrote: | 500-600MB sounds like you are shipping dev dependencies in your | images. | | You should use npm/yarn's production flags when installing your | dependencies for your images, so you only ship runtime | dependencies. | | Your images will shrink to 100-150MB. 
| dpc_pw wrote: | https://github.com/crev-dev/crev/ | | People willing to help out with `npm-crev` implementations | needed. :) | arkadiyt wrote: | Here's my hot take: supply chain attacks are a low risk for your | organization - they are both low likelihood and low impact. | | 1) Low likelihood: when popular packages get subverted it is | caught quickly due to how widely packages are distributed. After | it's caught the problem is also heavily publicized for folks to | take action, and registries remove the affected versions | immediately so there is a very small exposure window. | | 2) Low impact: people who write malicious code into these | packages don't have a specific target, they are writing dragnet | malware, which typically means mining cryptocurrency or | ransomware. If you're going to get hacked then that's the best | possible outcome (as opposed to, e.g. a data breach). | | Your security posture would have to be superb if supply chain | attacks were anywhere near the top of your list - the | majority of companies have more basic and targeted issues to | worry about. | jakear wrote: | Eh... I don't share your cavalier attitude. You assume these | attacks aren't targeted just because we haven't seen them, but | it wouldn't be hard at all for an attacker to take control of a | package through some means (purchase, social engineering, or | just solving a problem more efficiently than others do and | aggressively asking others to adopt it), then publish to npm | a minified version of the package which includes some targeted | exploit that doesn't activate except in a specific environment. | The source on GitHub would of course not include the exploit, and | there's no push for reproducible builds in the npm world so | verifying that npm's minified JS was built from the GitHub | source is nontrivial and not something most shops would bother | with. | tao_oat wrote: | Unfortunately, targeted attacks have been seen in the wild. 
The | `event-stream` attack linked in the post was one example. | Alternatively, look at the attack on the Agama cryptocurrency | wallet --- the attackers even managed to exfiltrate private | wallet keys there: https://komodoplatform.com/update-agama- | vulnerability/ | captn3m0 wrote: | On (2) low impact: | | A few npm advisories mention packages that were uploading SSH | keys and bashrc files. | | - https://www.npmjs.com/advisories/541 (package==coffeescript) | | - https://www.npmjs.com/advisories/765 | (package==portionfatty12) | | There's also been packages that would upload the environment | variables (increases impact significantly if this reaches | production): | | - https://blog.npmjs.org/post/163723642530/crossenv-malware- | on... (package==crossenv) | | - https://www.npmjs.com/advisories/486 (package==sqlserver) | spullara wrote: | The only attack I have detailed knowledge of was targeted | specifically at a company - Copay: | | https://thenewstack.io/attackers-up-their-game-with-latest-n... | saagarjha wrote: | I mean, if they write code that grabs all the cookies on a site | as well as traffic they have a fairly decent chance of a data | breach... | 7373737373 wrote: | This is part of the reason why current languages and operating | systems simply do not have security properties that would | inhibit or entirely prevent these risks: it never mattered | economically enough to implement them. Big corporations insure | themselves against these risks financially (if at all), not | technologically. | | The other big reason has been having to maintain backward | compatibility, personal computers and programming languages | built for them were only networked late compared to some | mainframe systems. 
There have been very interesting historical | networked operating systems that were far more secure in their | architecture than current contenders: | https://github.com/void4/notes/issues/41 | captn3m0 wrote: | The strong-set of such nature doesn't come with much guarantees | beyond past-history of the said users. For eg, having commit | rights to Debian requires a certain level of security know-how, | being an Arch Trusted User has similar requirements (they moved | to yubikeys everywhere a while back for eg). | | We don't even know if all these users have 2FA enabled for their | NPM accounts. Building a software distribution ecosystem that | offers trust guarantees post-facto is a really hard challenge, | and I think that the right answer is in providing developers | better sandboxes. That's not to say this can't be used as a | signal as the author suggests, just that the "strong-set- | user/package=safe" guarantee doesn't have an underlying basis as | of yet. | tao_oat wrote: | > just that the "strong-set-user/package=safe" guarantee | doesn't have an underlying basis as of yet. | | Author here --- I agree. There can be no guarantees about the | safety of a package based only on its maintainer(s); their | accounts could be taken over, or they could be paid off, and so | on. I'm hopeful about initiatives like Deno that provide better | security controls built-in to the language. | | A significant hurdle to overcome is getting npm (and all open- | source) developers to think about trust in the first place. The | event-stream incident happened when the previous maintainer | handed over control to a random stranger that showed up. We've | seen similar things happen in other attacks. The thought at | this point is that by making trust more explicit, we might | start a move in the right direction. ___________________________________________________________________ (page generated 2020-10-03 23:00 UTC)