[HN Gopher] Python utility for tracking third party dependencies...
___________________________________________________________________
Python utility for tracking third party dependencies within a library

Author : prashantgupta24
Score  : 142 points
Date   : 2022-05-25 16:50 UTC (6 hours ago)

(HTM) web link (github.com)
(TXT) w3m dump (github.com)

| lifeisstillgood wrote:
| The couple of times I have done something similar I have found an
| odd outcome - first was internal (to a large company codebase) and
| second was npm imports years ago. Both times one ended up pulling
| in huge numbers of dependencies (900+ npm, 600 or so internal)
|
| The point was that pretty much no matter what starting point one
| used, you pulled in much the same amount. There was a common core
| but even so it was like a starfish - if you start at the tip of one
| limb, you pull in that limb and the core. Start on another limb,
| same thing.
|
| but all the limbs are about the same size
|
| it's just anecdata but it has been at the back of my mind as some
| kind of rule.
| cubes wrote:
| This looks really neat. One thing I noticed on reading the source
| code, it appears to actually import the modules:
|
| Quoting the docstring on the `track_module` function:
|     """This function executes the tracking of a single module by
|     launching a subprocess to execute this module against the
|     target module. The implementation of thie tracking
|     resides in the __main__ in order to carefully control the
|     import ecosystem.
|
| Source: https://github.com/IBM/import-tracker/blob/67a1e84e5a609e52e...
|
| Here's the actual subprocess call:
| https://github.com/IBM/import-tracker/blob/67a1e84e5a609e52e...
|     # Launch the process
|     proc = subprocess.Popen(shlex.split(cmd), stdout=subprocess.PIPE, env=env)
|
| I think this is clever, and maybe even necessary, but feels risky
| to do on unaudited third-party Python libraries.
|
| Maybe I'm misunderstanding something?
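A minimal sketch of the general idea in the quoted docstring: import one module in a fresh interpreter and report what it pulled in. This is not import_tracker's actual implementation. The names `modules_loaded_by` and `_CHILD_SCRIPT` are hypothetical, and the approach (diffing `sys.modules` before and after the import) is an assumption chosen for illustration. Because the child process really imports the target, any import-time code in that package runs, which is exactly the concern raised above for unaudited libraries.

```python
import json
import subprocess
import sys

# Hypothetical script run inside the child interpreter (an illustration,
# not import_tracker's code). It snapshots sys.modules, imports the target,
# and prints the top-level names that appeared as a JSON list.
_CHILD_SCRIPT = """
import importlib, json, sys
before = set(sys.modules)
importlib.import_module(sys.argv[1])
new = set(sys.modules) - before
print(json.dumps(sorted({name.split('.')[0] for name in new})))
"""


def modules_loaded_by(module_name):
    """Import module_name in a fresh interpreter; return the top-level modules it loaded."""
    proc = subprocess.run(
        [sys.executable, "-c", _CHILD_SCRIPT, module_name],
        capture_output=True,
        text=True,
        check=True,
    )
    # Parse only the last line, in case the imported module prints on import.
    return json.loads(proc.stdout.strip().splitlines()[-1])


if __name__ == "__main__":
    print(modules_loaded_by("colorsys"))
```

A fresh interpreter per module means each result starts from an empty import cache, so a dependency that an earlier import already loaded is not hidden from a later one. Modules that the child script itself imports (`importlib`, `json`, `sys`) are already present before the snapshot and will not show up in the result.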
| gabegoodhart wrote:
| Hi, I'm the main author of import_tracker. Thanks for taking
| the time to dig into it! It's a really interesting point that
| the subprocess.Popen could itself be a security concern. The
| command that's being executed is executing the __main__ of the
| import_tracker library itself (which is not something that a
| user can configure), so is your concern that import_tracker
| itself is untrusted and might be a concern for users running
| this on their machines?
|
| For context on why I'm using the subprocess here, this allows
| the tracking to correctly allocate dependencies that are
| imported more than once (think my_lib.submod1 and
| my_lib.submod2 both need tensorflow, but my_lib.submod3
| doesn't).
| jreese wrote:
| > I think this is clever, and maybe even necessary, but feels
| risky to do on unaudited third-party Python libraries.
|
| This is why my coworker built the project he called "dowsing";
| it tries to understand as much as possible from the setup.py's
| AST, without actually executing it.
|
| https://github.com/python-packaging/dowsing
| cubes wrote:
| Neat, I'll take a look! I thought I was going to need to
| write something similar!
| barefeg wrote:
| Could I use the lazy import to define a single set of
| dependencies of a monorepo and then load only the required subset
| for each project?
| beisner wrote:
| That's how Bazel Python works
| samwillis wrote:
| This looks like a really useful tool for large projects, it can be
| quite possible to lose track of what a specific dependency is
| used for. I also like the idea of making an import lazy so in a
| monolithic app you could have a deployment that excludes
| functionality, and exclude its dependencies.
|
| When I read the title I was hoping for something else though,
| what I would love is a tool that logs and potentially blocks
| unexpected IO operations on a library basis.
| With the increasingly common supply chain attacks we are seeing
| (there was a PyPI one just the other day), having a way to at
| least report on unexpected activity if not help prevent it would
| be brilliant. Has anyone ever found a tool like that?
|
| (Obviously the ultimate solution would be an outbound firewall,
| but it seems to be that although you can easily do this in a VM or
| bare metal, I haven't seen any PAAS platforms have that sort of
| capability)
| ashishbijlani wrote:
| https://github.com/ossillate-inc/packj analyzes Python/NPM
| packages for risky code and metadata attributes. Uses static
| code analysis. We found a bunch of malicious packages on PyPI
| using the tool, which have now been taken down: examples
| https://packj.dev/malware [disclosure: I'm one of the
| developers]
| woodruffw wrote:
| > When I read the title I was hoping for something else though,
| what I would love is a tool that logs and potentially blocks
| unexpected IO operations on a library basis. With the
| increasingly common supply chain attacks we are seeing (there was
| a PyPI one just the other day), having a way to at least report
| on unexpected activity if not help prevent it would be brilliant.
| Has anyone ever found a tool like that?
|
| You could do something close to that with Python's audit hooks,
| which were introduced with 3.8[1]. One massive caveat: audit
| hooks can be disabled by an attacker with the ability to
| control the interpreter, and are not perfect (there's plenty of
| things they don't cover.)
|
| (More generally: this kind of auditing/restriction falls under
| the umbrella of "capability management." OpenBSD's pledge[2] is
| another example.)
|
| [1]: https://peps.python.org/pep-0578/
|
| [2]: https://man.openbsd.org/pledge.2
| simonw wrote:
| Tried this on one of my projects, it's neat.
|
|     python3 -m import_tracker --name datasette --recursive | jq
|     {
|       "datasette": [
|         "aiofiles",
|         "click",
|         "markupsafe",
|         "mergedeep",
|         "pluggy",
|         "yaml"
|       ],
|       "datasette.version": [],
|       "datasette.utils.shutil_backport": [
|         "click",
|         "markupsafe",
|         "mergedeep",
|         "yaml"
|       ],
|       "datasette.utils.sqlite": [
|         "click",
|         "markupsafe",
|         "mergedeep",
|         "yaml"
|       ],
|       "datasette.utils": [
|         "click",
|         "markupsafe",
|         "mergedeep",
|         "yaml"
|       ],
|       "datasette.utils.asgi": [
|         "aiofiles",
|         "click",
|         "markupsafe",
|         "mergedeep",
|         "yaml"
|       ],
|       "datasette.hookspecs": [
|         "aiofiles",
|         "click",
|         "markupsafe",
|         "mergedeep",
|         "pluggy",
|         "yaml"
|       ]
|     }
|
| Related tool: pipdeptree - here's the output from that against a
| project that installs a lot of extra stuff:
| https://github.com/simonw/latest-datasette-with-all-plugins/...
| blakesley wrote:
| Is this the equivalent of Poetry's `poetry show --tree`?
| hrpnk wrote:
| You can use Syft [1] which generates the full software bill of
| materials, which includes package names and licenses for a broad
| set of tech stacks ranging from OS level (Alpine, Debian), through
| Go, Ruby, Python, Java, JavaScript, etc.
|
| [1] https://github.com/anchore/syft
| woodruffw wrote:
| Since this is about Python specifically, I'll go ahead and
| highlight `pip-audit`[1] as a specialized tool for generating
| Python SBOMs and running audits against the official PyPI
| vulnerability feed.
|
| FD: My company, my work.
|
| [1]: https://github.com/trailofbits/pip-audit
| [deleted]
___________________________________________________________________
(page generated 2022-05-25 23:00 UTC)