[HN Gopher] Ignore 98% of dependency alerts: introducing Semgrep... ___________________________________________________________________ Ignore 98% of dependency alerts: introducing Semgrep Supply Chain Author : ievans Score : 113 points Date : 2022-10-04 15:45 UTC (7 hours ago) (HTM) web link (r2c.dev) (TXT) w3m dump (r2c.dev) | snowstormsun wrote: | Really nice idea to only show warnings if they are relevant. It's | indeed annoying if you need to upgrade lodash only to make your | audit tool not show critical warnings because of some function | that is not used at all. | | This is not open source, though? It does make a big difference | for some whether you're able to run the check offline or you're | forced to upload your code to some service. | | One feature I'd love in such a tool would be the ability to get the | relevant parts of the changelog of the package that needs to be | upgraded. It's not responsible to just run the upgrade command | without checking the changelog for breaking or relevant changes. | That's exactly why upgrades tend to be done very late, because | there is a real risk of breaking something even if it's just a | minor version. | mattkopecki wrote: | There are definitely other approaches that don't require code | to be uploaded anywhere. For example, we (https://rezilion.com) | work with your package managers to understand what dependencies | your program has, and then analyze that metadata on the back | end. The net result is still being able to see which vulnerabilities | are truly exploitable and which are not. | ievans wrote: | All the engine functionality is FOSS | https://semgrep.dev/docs/experiments/r2c-internal-project-de... | (code at https://github.com/returntocorp/semgrep); but the | rules are currently private (may change in the future). | | As with all other Semgrep scanning, the analysis is done | locally and offline -- which is a major contrast to most other | vendors.
See #12 on our development philosophy for more | details: https://semgrep.dev/docs/contributing/semgrep- | philosophy/ | | The relevant part of the changelog is a good idea -- others have also | come out with statistical approaches based on upgrades made elsewhere | (e.g. Dependabot has a compatibility score, which is based on | "when we made PRs for this on other repos, what % of the time | did tests pass vs fail") | freeqaz wrote: | Here is some code on GitHub that does call-site checking using | Semgrep: https://github.com/lunasec- | io/lunasec/blob/master/lunatrace/... | | (Note: I helped write that. We're building a similar service to | the r2c one.) | | You're right that patching is hard because of opaque package | diffs. I've seen some tools coming out, like Socket.dev, which | show a diff between versions. | https://socket.dev/npm/package/react/versions | | That said, this is still a hard problem to solve, and it's | happened before that malware[0][1] has been silently shipped | because of how opaque packages are. | | 0: | https://web.archive.org/web/20201221173112/https://github.co... | | 1: https://www.coindesk.com/markets/2018/11/27/fake- | developer-s... | feross wrote: | Thanks for mentioning Socket.dev :) | | Looking at package diffs is super important because of the | rise of "protestware". For example, a maintainer of the | event-source-polyfill package recently added code which | redirects website visitors located in Eastern European | timezones to a change.org petition page. This means that real | users are being navigated to this random URL in production. | | See the attack code here: | https://socket.dev/npm/package/event-source- | polyfill/diff/1.... | | It's very unlikely that users of event-source-polyfill are | aware that this hidden behavior has been added to the | package. And yet, the package remains available on npm many | months after it was initially published.
We think that supply | chain security tools like Socket have an important role to | play in warning npm users when unwanted 'gray area' code is | added to packages they use. | stevebmark wrote: | I've always thought that dependabot was busy-work, a waste of | time. This article makes a good point that drives it home: | Alarms that aren't real make all alarms useless. Dependabot is | especially painful in non-typed languages (Python, Ruby, and | especially Javascript) where "upgrading" a library can break | things in ways you won't discover until production. | | Maybe the constant work, extra build time (and cash for all | that), and risk of breaking production are worth it for the 0.01% | of the time there's a real vulnerability? It seems like a high | price to pay, though. When there are major software | vulnerabilities (like log4j), the whole industry usually swarms | around it, and the alarm has high value. | | I just realized how much CircleCI probably loves Dependabot. I | wonder what hit their margins would take if we moved off it | collectively as an industry. | bawolff wrote: | I kind of feel like dependabot alerts should be treated like a | coding convention error - that extra whitespace isn't actually | causing a problem, but we fix it right away. | | Otherwise you have to start analyzing the alerts, and good luck | with that. The low severity ones are marked critical and the | scary ones are marked low. Suddenly you have 200 unfixed alerts, | and it's impossible to know if somewhere in that haystack is an | important one. | mfer wrote: | > When there are major software vulnerabilities (like log4j), | the whole industry usually swarms around it, and the alarm has | high value. | | You're leaving me with the impression that you think we should | only patch major software vulnerabilities. This I would | disagree with. Minor vulnerabilities can be used, especially in | groups, to do things we don't anticipate.
It's not just about a | single vulnerability but about how an attacker can leverage | multiple different vulnerabilities together. | danenania wrote: | If you use vendoring, it's also worth considering that there's | always some inherent security risk in upgrading dependencies. | If an attacker takes control of a package somewhere in your | dependency tree, you don't get compromised until you actually | install a new version of that package. This risk can often | outweigh the risk of very minor/dev-facing CVEs. | feross wrote: | Shameless plug: This is what I'm building Socket.dev to | solve. | | Socket watches for changes to "package manifest" files such | as package.json, package-lock.json, and yarn.lock. Whenever a | new dependency is added in a pull request, Socket analyzes | the package's behavior and leaves a comment if it is a | security risk. | | You can see some real-world examples here: | https://socket.dev/blog/socket-for-github-1.0 | e1g wrote: | We use Socket, and my favorite feature is how it highlights | new dependencies that ship a post-install hook. It's not always | a problem, but it's almost always a smell. | | One feature request: please allow me to "suppress" warnings | for a specific package+version combo. This is useful for | activist libs that take a political stance - I know it | happens, but I often cannot remove them, and don't want to | keep flagging the same problem at every sec review. | smcleod wrote: | IMO Dependabot is really dreadful at its job. Try Renovate - | it's really brilliant, fast, flexible, supports properly | binding PRs/MRs. | scinerio wrote: | Will this ever be integrated with Gitlab Ultimate? | mattkopecki wrote: | Gitlab Ultimate uses Rezilion to accomplish a similar aim. | Rather than using the principle of "reachability", Rezilion | analyzes at runtime what functions and classes are loaded into | memory. Much more deterministic and less of a guess about what | code will be called.
| | https://about.gitlab.com/blog/2022/03/23/gitlab-rezilion-int... | masklinn wrote: | How does it do that in the face of lazy loading, or for | languages in which "what functions and classes are loaded | into memory" is not really a thing (e.g. C)? | tsimionescu wrote: | Shouldn't this be very easy in C? With static linking, | you're vulnerable if you're linking the package. With | dynamic linking, you're vulnerable if you're importing the | specific functions. Otherwise, you're not vulnerable - | there's no other legal way to call a function in C. | | Now, if you're memory-mapping some file and jumping into it | to call that function, good luck. You're already well into | undefined behavior territory. | | As for lazy loading, I'm assuming the answer is the same as | for any other runtime path analysis tool: it's up to you to | make sure all relevant code paths are actually run | during the analysis. Presumably your tests should be | written in such a way as to trigger the loading of all | dependencies. | | I think there's really no other reasonable way to handle | this, though I can't say I've worked with either GitLab | Ultimate or Rezilion, so maybe I'm missing something. | underyx wrote: | Hey, I work on OP's product, and just wanted to mention | that reachability is not always about a function being | called. Sometimes insecure behavior is triggered by | setting options to a certain value[0]. Other times it's | feasible to mark usages of an insecure function as safe | when we know that the passed argument comes from a | trusted source[1]. The Semgrep rules we write understand | these nuances instead of just flagging function calls. | | [0]: e.g. https://nvd.nist.gov/vuln/detail/CVE-2021-28957 | | [1]: e.g. https://nvd.nist.gov/vuln/detail/CVE-2014-0081 | mattkopecki wrote: | Rezilion works at runtime when the Gitlab runner spins up a | container for testing the app.
Rezilion observes the | contents of memory and can reverse-engineer back to the | filesystem to see where everything was loaded from. | | In the CI pipeline this depends on your tests exercising | the app, but when you deploy Rezilion into a longer-lived | environment like Stage or Prod, you may pick up some new | code pathways, although most find that the results don't | differ much between environments. | scinerio wrote: | Ah, thank you. It's not entirely clear whether this is | something baked into Gitlab Ultimate's SAST CI/CD | feature/template, or if it's a third party that I would have | to license first. Do you happen to know? | jollyllama wrote: | Sounds nice. I've never worked with a tool like this that doesn't | turn up a ridiculous number of false positives. | henvic wrote: | How the hell do you end up with 1644 vulnerable packages anyway? | | * rhetorical question, JS... | | It was actually one of the main drivers for me to start using Go | instead of JavaScript for server-side applications and CLIs about | 8 years ago. | nightpool wrote: | Roughly: NPM, Github, and others funded open bug bounties for | all popular NPM packages. These bug bounties led to a rash of | security "vulnerabilities" being reported against open source | projects, to satisfy the terms of the bounty conditions. Public | bug bounty "intermediary" companies are a major culprit here -- | they have an incentive to push maintainers to accept even | trivial "vulnerabilities", since their success is tied to | "number of vulnerabilities reported" and "amount of bounties | paid out". This leads to classes of vulnerabilities like ReDoS | or prototype pollution that would never have been noticed or | been worth any money otherwise. | thenerdhead wrote: | The problem really comes down to data quality in disclosing | vulnerabilities. | | With higher quality data, better CVSS scores can be calculated.
| With higher quality data, affected code paths can be better | disclosed. With higher quality data, unknown vulnerabilities may | be found in parallel to the known ones. | | I don't think any tool or automation can solve the problem of | high quality data. Humans have to apply discernment to provide it. No | amount of code analysis can solve that. But it sure can help. | light24bulbs wrote: | You're right. Nobody bothers to make scanners because there's | no data, and nobody has come up with a good format to convey | the data between producers (like NVD) and consumers (like | Dependabot). | | I wrote a blog post talking about some of this stuff: | https://www.lunasec.io/docs/blog/the-issue-with-vuln-scanner... | | It truly is a chicken-and-egg problem. There are next to no | automated scanners that make use of data like that; Semgrep is | the furthest along, and my company is close behind them at | taking a stab at it, as far as I can tell. Heck, there are hardly | any that do anything with the existing "Environmental" part of | the CVSS, and that has been pretty well populated by NVD, I | believe. | | The existing interchange formats for vulnerability data, such | as OSV, are underdesigned to the point that it feels like | GitHub Copilot designed them. It's real work to even get to the | point where you can consume them, given all the weird choices in | there. Sorry if I'm salty. | | There is an attempt to create a standard for situational | vulnerability exposure called "VEX", or Vulnerability | Exploitability eXchange, but it's almost entirely focused on | conveying information about what vulnerabilities have been | manually eliminated, so that software "vendors" can satisfy their | customers, especially in government contracts. It's not | modeling the full picture of what can happen in a dependency | tree and all the useful false-positive information in there. | thenerdhead wrote: | Yeah, agreed.
When I see these problem statements, I see us | addressing problems that are by-products of vulnerability | fatigue. | | I.e., "be lazy and ignore those vulnerabilities by using our | tools!" | | It hardly addresses the true, industry-wide challenge: the lack | of useful information, or even transparency about that | information, from responsible parties. I believe this laziness | is what got us here in the first place. | CSDude wrote: | Joke's on you, I already ignore 100% of them /s | | I like the promise, but how can I trust completely that the | ignored part is not actually reachable? Almost all languages do | some magic that might not be detected. At a previous | job, we were bombarded with dependency upgrades; I can still | feel the pain in my bones. | thefrozenone wrote: | How does this tool go from a vuln in a library to a set of | affected functions/control paths? My understanding was that the | CVE format is unstructured, which makes an analysis like this | difficult. | theptip wrote: | My question too. All I see is this citation: | | > [1] We'll be sharing more details about this work later in | October. Stay tuned! | ievans wrote: | We added support to the Semgrep engine for combining package | metadata restrictions (from the CVE format) with code search | patterns that indicate you're using the vulnerable library | (we're writing those mostly manually, but Semgrep makes it | pretty easy):
 |
 |       - id: vulnerable-awscli-apr-2017
 |         pattern-either:
 |           - pattern: boto3.resource('s3', ...)
 |           - pattern: boto3.client('s3', ...)
 |         r2c-internal-project-depends-on:
 |           namespace: pypi
 |           package: awscli
 |           version: "<= 1.11.82"
 |         message: this version of awscli is subject to a directory
 |           traversal vulnerability in the s3 module
 |
 | This is still experimental and internal | (https://semgrep.dev/docs/experiments/r2c-internal-project- | de...) but eventually we'd like to promote it and also maybe | open up our CVE rules more as well!
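To make the idea behind that rule concrete, here is a toy Python sketch. This is not Semgrep or its internals; the function name, the regex, and the version-comparison logic are all invented for illustration. It shows the core "reachability" gating the thread discusses: a finding fires only when the installed dependency version is in the vulnerable range AND the affected API actually appears in the code.

```python
import re

def version_tuple(v: str) -> tuple:
    # Naive version parse for illustration only; real tools use a
    # proper semver/PEP 440 comparison.
    return tuple(int(x) for x in v.split("."))

def is_reachable_finding(installed: str, max_vulnerable: str, source: str) -> bool:
    # Condition 1: the installed version falls in the vulnerable range
    # (mirrors the rule's r2c-internal-project-depends-on clause).
    in_range = version_tuple(installed) <= version_tuple(max_vulnerable)
    # Condition 2: the code actually uses the affected API (a crude
    # stand-in for the rule's boto3.client/resource('s3', ...) patterns).
    uses_api = re.search(r"boto3\.(client|resource)\(\s*['\"]s3['\"]", source) is not None
    # Only report when both hold -- otherwise the alert is noise.
    return in_range and uses_api

code = "s3 = boto3.client('s3')"
print(is_reachable_finding("1.11.80", "1.11.82", code))   # vulnerable version, API used -> True
print(is_reachable_finding("1.12.0", "1.11.82", code))    # patched version -> False
print(is_reachable_finding("1.11.80", "1.11.82", "x=1"))  # vulnerable version, API unused -> False
```

The third case is the whole point of the article: a vulnerable dependency that is never exercised produces no alert.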
| mattkopecki wrote: | Here is a good writeup of some of the pros and cons of using | a "reachability" approach. | | https://blog.sonatype.com/prioritizing-open-source- | vulnerabi... | | >Unfortunately, no technology currently exists that can tell | you whether a method is definitively not called, and even if | it is not called currently, it's just one code change away | from being called. This means that reachability should never | be used as an excuse to completely ignore a vulnerability, | but rather reachability of a vulnerability should be just one | component of a more holistic approach to assessing risk that | also takes into account the application context and severity | of the vulnerability. | DannyBee wrote: | Err, "no technology currently exists" is wrong; it's "no | technology can possibly exist" that can tell you whether | something is definitively called. | | It's an undecidable problem in any of the top programming | languages, and some of the sub-problems (like aliasing) are | themselves similarly statically undecidable in any meaningful | programming language. | | You can choose between over-approximation or under- | approximation. | sverhagen wrote: | I saw that Java support was still in beta. But it makes me | wonder if it's going to come with a "don't use reflection" | disclaimer, then...? | jrockway wrote: | This is a similar mechanism to govulncheck | (https://pkg.go.dev/golang.org/x/vuln/cmd/govulncheck), which has | been quite nice to use in practice. Because it only cares about | vulnerable code that is actually possible to call, it's quiet | enough to use as a presubmit check without annoying people. Nice | to see this for other languages. | Hooray_Darakian wrote: | How does it deal with vulnerability alerts which don't say | anything about what code is affected? | jrockway wrote: | From https://go.dev/security/vuln/: "A vulnerability database | is populated with reports using information from the data | pipeline.
All reports in the database are reviewed and | curated by the Go Security team." | | I would imagine that's what Semgrep is doing as well. You're | paying for the analysis; the code is the easy part. | ievans wrote: | Both Semgrep Supply Chain and govulncheck (AFAIK) are doing | this work manually, for now. It would indeed be nice if the | vulnerability reporting process had a way to provide this | metadata, but there's no real consensus on what format that | data would take. We take advantage of the fact that Semgrep | makes it much easier than other commercial tools (or even | most linters) to write a rule quickly. | | The good news is there's a natural power-law distribution: | most alerts come from a few vulnerabilities in the most | popular (and often large) libraries, so you get significant | lift just by writing rules for those libraries first. | Hooray_Darakian wrote: | > Both Semgrep Supply Chain and govulncheck (AFAIK) are | doing this work manually, for now. | | Ya, I get that, but surely you don't have 100% coverage. | What does your code do for the advisories which you don't | have coverage for? Alert? Ignore? | nightpool wrote: | Since security vulnerability alerts are already created | and processed manually (e.g., every Dependabot alert is | triggered by some Github employee who imported the right | data into their system and clicked "send" on it), adding | an extra step to create the right rules doesn't seem | impossibly resource-intensive. Certainly much more time | is spent "manually" processing even easier-to-automate | things in other parts of the economy, like payments | reconciliation (https://keshikomisimulator.com/) ___________________________________________________________________ (page generated 2022-10-04 23:00 UTC)