[HN Gopher] Python utility for tracking third party dependencies...
       ___________________________________________________________________
        
       Python utility for tracking third party dependencies within a
       library
        
       Author : prashantgupta24
       Score  : 142 points
       Date   : 2022-05-25 16:50 UTC (6 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | lifeisstillgood wrote:
       | The couple of times I have done something similar I have found an
       | odd outcome - the first was internal (to a large company codebase)
       | and the second was npm imports years ago. Both times one ended up
       | pulling in huge numbers of dependencies (900+ npm, 600 or so
       | internal).
       | 
       | The point was that pretty much no matter what starting point one
       | used, you pulled in much the same amount. There was a common core,
       | but even so it was like a starfish: if you start at the tip of one
       | limb, you pull in that limb and the core. Start on another limb,
       | same thing.
       | 
       | But all the limbs are about the same size.
       | 
       | it's just anecdata but it has been at the back of my mind as some
       | kind of rule.
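
A toy sketch of the starfish shape described above, assuming a made-up dependency graph (every package name here is hypothetical). It computes the transitive closure from two different limb tips and shows that both pull in their own limb plus the same core:

```python
from collections import deque

# Hypothetical toy graph: each package maps to its direct dependencies.
GRAPH = {
    "limb_a": ["a1", "core"],
    "a1": ["a2"],
    "a2": [],
    "limb_b": ["b1", "core"],
    "b1": ["b2"],
    "b2": [],
    "core": ["c1", "c2"],
    "c1": ["c2"],
    "c2": [],
}


def transitive_deps(graph, start):
    """Return every package reachable from start by following dependency edges."""
    seen = set()
    queue = deque(graph.get(start, []))
    while queue:
        pkg = queue.popleft()
        if pkg in seen:
            continue
        seen.add(pkg)
        queue.extend(graph.get(pkg, []))
    return seen


closure_a = transitive_deps(GRAPH, "limb_a")
closure_b = transitive_deps(GRAPH, "limb_b")
shared = closure_a & closure_b  # the common core both starting points pull in
```

Here both closures have the same size and overlap only in the core, which is the pattern the comment describes. Real graphs will of course be messier.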
        
       | cubes wrote:
       | This looks really neat. One thing I noticed on reading the source
       | code, it appears to actually import the modules:
       | 
       | Quoting the docstring on the `track_module` function:
       | 
       |     """This function executes the tracking of a single module by
       |     launching a subprocess to execute this module against the
       |     target module. The implementation of thie tracking resides in
       |     the __main__ in order to carefully control the import ecosystem.
       | 
       | Source:
       | https://github.com/IBM/import-tracker/blob/67a1e84e5a609e52e...
       | 
       | Here's the actual subprocess call:
       | https://github.com/IBM/import-tracker/blob/67a1e84e5a609e52e...
       | 
       |     # Launch the process
       |     proc = subprocess.Popen(shlex.split(cmd), stdout=subprocess.PIPE, env=env)
       | 
       | I think this is clever, and maybe even necessary, but feels risky
       | to do on unaudited third-party Python libraries.
       | 
       | Maybe I'm misunderstanding something?
        
         | gabegoodhart wrote:
         | Hi, I'm the main author of import_tracker. Thanks for taking
         | the time to dig into it! It's a really interesting point that
         | the subprocess.Popen could itself be a security concern. The
         | command being executed is the __main__ of the import_tracker
         | library itself (which is not something that a user can
         | configure), so is your concern that import_tracker itself is
         | untrusted and might be a risk for users running this on their
         | machines?
         | 
         | For context on why I'm using the subprocess here, this allows
         | the tracking to correctly allocate dependencies that are
         | imported more than once (think my_lib.submod1 and
         | my_lib.submod2 both need tensorflow, but my_lib.submod3
         | doesn't).
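
A minimal sketch of why a fresh process helps, assuming the simplest possible approach of diffing `sys.modules` before and after an import. This is not import_tracker's actual implementation; `new_modules_after_import` is a hypothetical helper. Within one interpreter, a module imported once stays cached in `sys.modules`, so a second submodule that needs it would show no new imports. A child interpreter starts with a clean cache every time:

```python
import json
import subprocess
import sys

# Code run in the child interpreter: snapshot sys.modules, import the
# target, and report which top-level module names appeared.
_CHILD_CODE = """
import importlib, json, sys
before = set(sys.modules)
importlib.import_module(sys.argv[1])
after = set(sys.modules)
print(json.dumps(sorted({name.split(".")[0] for name in after - before})))
"""


def new_modules_after_import(module_name):
    """Import module_name in a fresh interpreter; list newly loaded top-level modules."""
    result = subprocess.run(
        [sys.executable, "-c", _CHILD_CODE, module_name],
        capture_output=True,
        text=True,
        check=True,
    )
    # Assumption: the JSON report is the last line the child prints, so
    # anything the target module prints at import time is ignored.
    return json.loads(result.stdout.strip().splitlines()[-1])
```

Note that this still executes the target module's import-time code, which is the risk raised in the parent comment. The subprocess isolates the module cache; it is not a sandbox.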
        
         | jreese wrote:
         | > I think this is clever, and maybe even necessary, but feels
         | risky to do on unaudited third-party Python libraries.
         | 
         | This is why my coworker built the project he called "dowsing";
         | it tries to understand as much as possible from the setup.py's
         | AST, without actually executing it.
         | 
         | https://github.com/python-packaging/dowsing
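
A small sketch of the general "read the AST instead of executing it" idea, using only the standard `ast` module. This is not dowsing's code (dowsing targets setup.py metadata); the helper name and the example source are made up for illustration. It collects the top-level names a piece of source imports without ever running it:

```python
import ast


def imported_top_level_names(source):
    """Collect top-level module names imported by source, without running it."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                names.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            # level > 0 means a relative import such as "from . import x";
            # those point inside the package, so they are skipped here.
            if node.level == 0 and node.module:
                names.add(node.module.split(".")[0])
    return sorted(names)


# Hypothetical input: includes a dotted import, a relative import, and an
# import nested inside a function body.
EXAMPLE = '''
import os.path
from collections import OrderedDict
from . import sibling

def f():
    import json
'''

found = imported_top_level_names(EXAMPLE)
```

The trade-off is the usual one for static analysis: nothing gets executed, but imports done dynamically (for example through `importlib.import_module` with a computed name) are invisible to it.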
        
           | cubes wrote:
           | Neat, I'll take a look! I thought I was going to need to
           | write something similar!
        
       | barefeg wrote:
       | Could I use the lazy import to define a single set of
       | dependencies of a monorepo and then load only the required subset
       | for each project?
        
         | beisner wrote:
         | That's how Bazel Python works
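
For reference, the lazy-import idea in this sub-thread can be sketched with the standard library's `importlib.util.LazyLoader`, following the recipe in the Python documentation. This is not necessarily how import_tracker or Bazel does it; `lazy_import` is just an illustrative helper name:

```python
import importlib.util
import sys


def lazy_import(name):
    """Return a module object whose code only runs on first attribute access."""
    if name in sys.modules:
        return sys.modules[name]
    spec = importlib.util.find_spec(name)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot find module {name!r}")
    # Wrap the real loader so execution is deferred until the module is used.
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    loader.exec_module(module)
    return module


colorsys = lazy_import("colorsys")  # nothing from colorsys has executed yet
hsv = colorsys.rgb_to_hsv(1.0, 0.0, 0.0)  # first attribute access triggers the load
```

One caveat with this particular sketch: a module that is not installed fails immediately at `lazy_import` time (because `find_spec` returns `None`), so it defers execution cost but not the "is it installed" check.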
        
       | samwillis wrote:
       | This looks like a really useful tool for large projects; it's
       | quite possible to lose track of what a specific dependency is
       | used for. I also like the idea of making an import lazy so that in
       | a monolithic app you could have a deployment that excludes
       | functionality, and excludes its dependencies.
       | 
       | When I read the title I was hoping for something else though.
       | What I would love is a tool that logs and potentially blocks
       | unexpected IO operations on a library basis. With the
       | increasingly common supply chain attacks we are seeing (there was a
       | PyPI one just the other day), having a way to at least report on
       | unexpected activity, if not help prevent it, would be brilliant.
       | Has anyone ever found a tool like that?
       | 
       | (Obviously the ultimate solution would be an outbound firewall,
       | but it seems to me that although you can easily do this in a VM or
       | on bare metal, I haven't seen any PaaS platforms with that sort of
       | capability.)
        
         | ashishbijlani wrote:
         | https://github.com/ossillate-inc/packj analyzes Python/NPM
         | packages for risky code and metadata attributes. Uses static
         | code analysis. We found a bunch of malicious packages on PyPI
         | using the tool, which have now been taken down: examples
         | https://packj.dev/malware [disclosure: I'm one of the
         | developers]
        
         | woodruffw wrote:
         | > When I read the title I was hoping for something else though.
         | What I would love is a tool that logs and potentially blocks
         | unexpected IO operations on a library basis. With the
         | increasingly common supply chain attacks we are seeing (there
         | was a PyPI one just the other day), having a way to at least
         | report on unexpected activity, if not help prevent it, would be
         | brilliant. Has anyone ever found a tool like that?
         | 
         | You could do something close to that with Python's audit hooks,
         | which were introduced in 3.8[1]. One massive caveat: audit
         | hooks can be disabled by an attacker with the ability to
         | control the interpreter, and they are not perfect (there are
         | plenty of things they don't cover).
         | 
         | (More generally: this kind of auditing/restriction falls under
         | the umbrella of "capability management." OpenBSD's pledge[2] is
         | another example.)
         | 
         | [1]: https://peps.python.org/pep-0578/
         | 
         | [2]: https://man.openbsd.org/pledge.2
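
A minimal sketch of the audit-hook approach, assuming you only care about a handful of events. `sys.addaudithook` and the event names `open`, `socket.connect`, and `subprocess.Popen` come from the standard audit events table; the `WATCHED_EVENTS` set, the `observed` list, and the `record_io` function are names chosen for this example:

```python
import os
import sys

# Assumption: these three events are the "unexpected IO" worth watching.
WATCHED_EVENTS = {"open", "socket.connect", "subprocess.Popen"}
observed = []


def record_io(event, args):
    """Audit hook: remember watched events. Keep it cheap, it runs very often."""
    if event in WATCHED_EVENTS:
        observed.append((event, args))


# Hooks added this way cannot be removed again from Python code.
sys.addaudithook(record_io)

# Any file open after this point is reported to the hook.
open(os.devnull).close()
```

Per PEP 578, raising an exception inside the hook aborts the operation, so the same mechanism can block as well as log. Attributing an event to a specific library would need extra work (for example inspecting the call stack inside the hook), which this sketch does not attempt, and the caveats from the comment above still apply.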
        
       | simonw wrote:
       | Tried this on one of my projects, it's neat.
       | 
       |     python3 -m import_tracker --name datasette --recursive | jq
       |     {
       |       "datasette": [
       |         "aiofiles",
       |         "click",
       |         "markupsafe",
       |         "mergedeep",
       |         "pluggy",
       |         "yaml"
       |       ],
       |       "datasette.version": [],
       |       "datasette.utils.shutil_backport": [
       |         "click",
       |         "markupsafe",
       |         "mergedeep",
       |         "yaml"
       |       ],
       |       "datasette.utils.sqlite": [
       |         "click",
       |         "markupsafe",
       |         "mergedeep",
       |         "yaml"
       |       ],
       |       "datasette.utils": [
       |         "click",
       |         "markupsafe",
       |         "mergedeep",
       |         "yaml"
       |       ],
       |       "datasette.utils.asgi": [
       |         "aiofiles",
       |         "click",
       |         "markupsafe",
       |         "mergedeep",
       |         "yaml"
       |       ],
       |       "datasette.hookspecs": [
       |         "aiofiles",
       |         "click",
       |         "markupsafe",
       |         "mergedeep",
       |         "pluggy",
       |         "yaml"
       |       ]
       |     }
       | 
       | Related tool: pipdeptree - here's the output from that against a
       | project that installs a lot of extra stuff:
       | https://github.com/simonw/latest-datasette-with-all-plugins/...
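
One way to read output like this is to invert it: for each third-party package, which modules pull it in? A small sketch, assuming the JSON shape shown above (module name mapped to a list of dependencies). The helper name is made up, and the sample is a subset of the output in the comment:

```python
import json
from collections import defaultdict


def modules_by_dependency(tracker_output):
    """Invert {module: [deps]} into {dep: [modules that need it]}."""
    inverted = defaultdict(list)
    for module, deps in tracker_output.items():
        for dep in deps:
            inverted[dep].append(module)
    return {dep: sorted(mods) for dep, mods in sorted(inverted.items())}


# Subset of the import_tracker output quoted above.
SAMPLE = json.loads("""
{
  "datasette.version": [],
  "datasette.utils": ["click", "markupsafe", "mergedeep", "yaml"],
  "datasette.utils.asgi": ["aiofiles", "click", "markupsafe", "mergedeep", "yaml"]
}
""")

needed_by = modules_by_dependency(SAMPLE)
```

In this subset, `aiofiles` is only needed by `datasette.utils.asgi`, which is the kind of fact that helps when deciding what could become an optional dependency.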
        
         | blakesley wrote:
         | Is this the equivalent of Poetry's `poetry show --tree`?
        
       | hrpnk wrote:
       | You can use Syft [1], which generates a full software bill of
       | materials, including package names and licenses, for a broad set
       | of tech stacks ranging from the OS level (Alpine, Debian) through
       | Go, Ruby, Python, Java, JavaScript, etc.
       | 
       | [1] https://github.com/anchore/syft
        
         | woodruffw wrote:
         | Since this is about Python specifically, I'll go ahead and
         | highlight `pip-audit`[1] as a specialized tool for generating
         | Python SBOMs and running audits against the official PyPI
         | vulnerability feed.
         | 
         | FD: My company, my work.
         | 
         | [1]: https://github.com/trailofbits/pip-audit
        
       ___________________________________________________________________
       (page generated 2022-05-25 23:00 UTC)