[HN Gopher] Cooperative Package Management for Python
       ___________________________________________________________________
        
       Cooperative Package Management for Python
        
       Author : Tomte
       Score  : 59 points
       Date   : 2021-09-01 05:56 UTC (4 hours ago)
        
 (HTM) web link (lwn.net)
 (TXT) w3m dump (lwn.net)
        
       | derriz wrote:
       | I probably haven't thought this fully through but wouldn't it be
       | simpler to just have a system venv - root protected - perhaps
       | distributed using the system's package manager? Then if you mess
       | up site-packages at least you wouldn't break the system tools.
        
         | BiteCode_dev wrote:
          | It's already the case; that's why you see some mistaken
          | tutorials telling you to "sudo pip install".
          | 
          | A venv is just a directory and some path setup, and your OS
          | already has one dedicated to its own Python install, even if
          | it's not called a venv but something like dist-packages plus
          | manual PATH fudging.
          | 
          | This proposal makes the distinction clearer by putting
          | safeguards in place so you do NOT mess up the system stdlib.
         | 
         | You should be using "--user" or a local venv, and "-m" to call
         | commands.
         | 
         | See my other comments for more tooling.
        
           | derriz wrote:
           | I'm missing something I guess - I don't understand your
           | comment in the context of this claim near the start of the
           | article:
           | 
           | "The root cause of the problem is that distribution package
           | managers and Python package managers ("pip" is shorthand to
           | refer to those throughout the rest of the article) often
           | share the same "site-packages" directory for storing
           | installed packages."
           | 
           | Having a system venv - managed by the system's package
           | manager - would mean this "root cause" would go away, no?
           | 
           | Of course by poking around the filesystem tree and messing
           | with managed files as root, you could mess up the system venv
           | but that's possible with any installed package.
        
             | faho wrote:
             | >Having a system venv - managed by the system's package
             | manager - would mean this "root cause" would go away, no?
             | 
             | Oh, but you need to explain to pip that it's managed by the
             | system's package manager. Which is what this does with the
             | "EXTERNALLY-MANAGED" file.
             | 
             | And you need to do that because the system's package
             | manager and pip have separate sources - with `pip` you get
             | the packages from pypi, with the system's package manager
             | from its repo.
        
             | BiteCode_dev wrote:
             | As I said, a venv is nothing more than a directory + PATH
             | setup. OS already have a dedicated one. The problem is not
             | that it does not exist, the problem is that pip is using
             | it.
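              | 
              | To see there's no magic involved, the stdlib can create
              | one directly: a venv is a pyvenv.cfg file plus a couple
              | of directories (a minimal sketch, nothing OS-specific
              | assumed):

```python
import tempfile
import venv
from pathlib import Path

# A venv really is just a directory: venv.create() writes a pyvenv.cfg
# pointing back at the base interpreter, plus a bin/ (or Scripts/ on
# Windows) directory and a site-packages directory.
target = Path(tempfile.mkdtemp()) / "demo-venv"
venv.create(target, with_pip=False)  # with_pip=False keeps this fast/offline

cfg = (target / "pyvenv.cfg").read_text()
print(cfg)  # e.g. "home = /usr/bin" plus version info
```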
        
       | JonathanBeuys wrote:
       | On my computer, I only use Python software that is in the Debian
       | repos. So all dependencies are handled by Debian.
       | 
       | Everything else I run in a Docker container.
       | 
       | This way I never need a venv.
       | 
       | I hope these changes will not make my workflow harder.
       | 
       | Running 3rd party software in containers not only makes
       | dependency handling easier. It also is a security thing. Can you
       | really trust the whole dependency tree below any python code that
       | you want to run? I prefer to keep it separate from my main OS.
        
         | BiteCode_dev wrote:
          | No, in fact it will make your workflow easier: no matter
          | what, no script using pip will break your install.
        
           | JonathanBeuys wrote:
           | I don't know if I actually have something you would call an
           | "install".
           | 
           | I run 3rd party software in fresh Debian containers. So there
           | is nothing in there I would be afraid to "break".
        
             | BiteCode_dev wrote:
             | The container has a system python stdlib, and it's used by
             | apt. If it breaks, everything breaks. I've destroyed it
             | twice, with style I must say.
             | 
              | If any script (or you yourself) runs "pip install" with
              | admin rights (the default on a lot of container techs),
              | there's a chance of breaking it.
        
               | JonathanBeuys wrote:
               | "Chance to break" what?
               | 
               | It is not as if I manually work in a terminal inside a
               | container that is somehow permanent and valuable.
               | 
               | I just once create a dockerfile with the commands to set
               | up the application I want to use. That's it. From then
               | on, I use that container whenever I want to use that
               | application.
        
               | BiteCode_dev wrote:
                | If pip installs something that has a dependency with
                | the same name as any of the packages you installed, it
                | can cause a conflict, because by default they will
                | override each other.
        
       | chriswarbo wrote:
       | Eww. I get the rationale, but Python's packaging/import logic is
       | already ridiculously convoluted. The underlying problem is using
       | global mutable state (e.g. /usr/lib/.../site-packages). Giving
       | each application its own package directory is better; there are
       | Python-specific solutions to that (like virtualenv mentioned in
       | the article), but I prefer Nix since it's language-agnostic (it
       | can handle non-Python dependencies too).
       | 
       | I'm also not a fan of all the Rube Goldberg machines being
       | cobbled together to appease Docker's fundamentally broken way of
       | working:
       | 
       | > Distros that produce official images for single-application
       | containers (e.g., Docker container images) should remove the
       | EXTERNALLY-MANAGED file, preferably in a way that makes it not
       | come back if a user of that image installs package updates inside
       | their image (think RUN apt-get dist-upgrade). On dpkg-based
       | systems, using dpkg-divert --local to persistently rename the
       | file would work. On other systems, there may need to be some
       | configuration flag available to a post-install script to re-
       | remove the EXTERNALLY-MANAGED file.
       | 
       | Here a "single-application container" is assumed to contain _an
       | entire OS_ rather than, you know, a single application. That was
       | _supposed_ to make dependency management easier, since we can
       | tailor the OS to just that one application; but based on this
       | article it sounds like even that was a lie. Should we revisit the
       | assumption that Docker makes dependency management easier? No,
       | the OS maintainers are now expected to change their distros, to
       | add another layer to the Rube Goldberg machine.
       | 
        | I also find it pretty alarming that 'apt-get dist-upgrade' is
        | given as an example of something we might want to do to an
        | "official image". What's the point of sanctioning a snapshot as
        | "official" if we're just going to immediately, and non-
        | deterministically, overwrite arbitrary parts of it based on
        | external-server-state du jour?
        
         | olau wrote:
         | > Giving each application its own package directory is better
         | 
         | One simple way to do that is
         | 
          |     cd projectfoo
          |     mkdir libs
          |     pip install -t libs
          |     PYTHONPATH=libs python foo.py
         | 
         | That way you can use system packages like a database connection
         | library together with the local application dependency farm.
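          | 
          | The pattern above can be sketched end to end in Python (the
          | `greeting` module name here is invented for the demo):

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path

# Drop a module into a local "libs" directory and point PYTHONPATH at it;
# a fresh interpreter then imports from there before site-packages.
libs = Path(tempfile.mkdtemp()) / "libs"
libs.mkdir()
(libs / "greeting.py").write_text("MESSAGE = 'from local libs'\n")

env = dict(os.environ, PYTHONPATH=str(libs))
out = subprocess.run(
    [sys.executable, "-c", "import greeting; print(greeting.MESSAGE)"],
    capture_output=True, text=True, env=env,
).stdout.strip()
print(out)  # from local libs
```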
        
         | BiteCode_dev wrote:
          | While I agree that Nix is a cleaner solution, it will never
          | be adopted, because... it's a cleaner solution.
          | 
          | Nix is pretty much incompatible with everything, because our
          | entire legacy software stack has state, and we need to
          | accommodate that.
          | 
          | So we keep adding these hacks because we favor compat over
          | purity.
          | 
          | Even Nix has to, after all: there are plenty of bash calls in
          | NixOS config scripts as soon as you do something non-trivial.
        
           | chriswarbo wrote:
           | Sure, but I'm not saying everyone should adopt Nix; I'm
           | saying the solution to problems caused by global mutable
           | state is to avoid adding more global mutable state.
           | 
           | If 'apt-get dist-upgrade' is breaking the packages we pip-
           | installed into /usr/local, I'd ask (a) why we're using two
           | package managers, and (b) why 'put these files in these
           | locations' is being solved implicitly by running imperative,
           | non-deterministic commands to mutate things in-place, rather
           | than e.g. extracting a .tar.gz of known-good dependencies.
        
             | BiteCode_dev wrote:
             | > If 'apt-get dist-upgrade' is breaking the packages we
             | pip-installed into /usr/local, I'd ask (a) why we're using
             | two package managers,
             | 
              | debian packagers won't use pip for obvious reasons. devs
              | need pip because it's portable and doesn't need packager
              | validation.
              | 
              | It would have been better if something like nix had been
              | adopted 30 years ago by both communities, but it hasn't
              | been, so we have several package managers. It's even
              | worse now with poetry, conda, snap, Flatpak and docker.
             | 
             | > why 'put these files in these locations' is being solved
             | implicitly by running imperative, non-deterministic
             | commands to mutate things in-place, rather than e.g.
             | extracting a .tar.gz of known-good dependencies.
             | 
              | pip now uses wheels, and basically does that. Whl files
              | are zips, and pip just unpacks them at a known location.
              | The problem is, this location is currently shared with
              | the OS if you don't use --user or a venv. This article
              | addresses that problem. My other comment also talks about
              | some other solutions.
              | 
              | It's very hacky, because, well, legacy and all that.
        
               | georgyo wrote:
               | You missed what he was saying.
               | 
               | Nix solves this problem in multiple different ways. One
               | of which is packaging, but also the entire /nix/store is
               | mounted read-only.
               | 
               | A sudo pip install would have no ability to break the
               | system or remove something that was installed by nix.
        
               | BiteCode_dev wrote:
               | I get that, but since our entire legacy stack is
               | stateful, there is no way around that unless you nixify
               | the entire stuff.
               | 
               | Immutable works only if the entire chain is.
        
       | [deleted]
        
       | BiteCode_dev wrote:
       | It's a good safeguard, and it's going in the direction of the
       | other initiatives to make python package management default
       | behavior saner.
       | 
        | PEP 582 is another one to follow up on:
        | https://www.python.org/dev/peps/pep-0582/
       | 
        | It basically uses the concept of node_modules, making Python
        | interpreters load any local directory named `__pypackages__`.
        | There are 2 differences though:
       | 
       | * Unlike JS, python can only have one version of each lib for a
       | given setup.
       | 
       | * Since having several versions of python often matters, you may
       | have several __pypackages__/X.Y sub dirs to cater to each of
       | them.
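        | 
        | A rough sketch of the lookup PEP 582 describes (real support
        | would be built into the interpreter itself; this only
        | illustrates the __pypackages__/X.Y path layout):

```python
import sys
from pathlib import Path

# Before running a script, a PEP 582-aware interpreter would look for a
# __pypackages__/X.Y/lib directory next to it and put that directory
# ahead of site-packages on sys.path.
version = "{}.{}".format(sys.version_info.major, sys.version_info.minor)
local_pkgs = Path.cwd() / "__pypackages__" / version / "lib"
if local_pkgs.is_dir():
    sys.path.insert(0, str(local_pkgs))

print(local_pkgs)  # e.g. .../__pypackages__/3.9/lib
```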
       | 
       | It also forces you to use "-m" to call commands written in
       | Python, which is the best practice anyway. I hope it will push
       | jupyter to fix "-m" on windows for them because that's a blocker
       | for beginners.
       | 
       | If you are not already using "-m", start now. It solves a lot of
       | different problems with running python cli programs. It's an old
       | flag, but too few people know about it.
       | 
        | E.g. instead of running "black" or "pylint", do "python -m
        | black" or "python -m pylint". Of course you may want to choose
        | a specific version of python, so "python3.8 -m black" on unix,
        | or "py -3.8 -m black" on windows.
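        | 
        | The same "-m" mechanism works for any runnable module; a small
        | self-contained demo using the stdlib's json.tool (black and
        | pylint work identically once installed):

```python
import subprocess
import sys

# "-m" runs a module as a script under this exact interpreter, so the
# tool you invoke always matches the Python you are running -- a common
# source of confusion when several Pythons are installed.
out = subprocess.run(
    [sys.executable, "-m", "json.tool"],
    input='{"a": 1}', capture_output=True, text=True,
).stdout
print(out)  # pretty-printed JSON
```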
       | 
       | To test out the __pypackages__ concept, give a try to the pdm
       | project: https://github.com/pdm-project/pdm
       | 
        | Lastly, some other tools I wish people knew more about, which
        | solve packaging issues:
       | 
        | * pyflow (https://github.com/David-OConnor/pyflow): it's a
        | package manager like poetry, but it also installs whatever
        | python you want, like pyenv. Except it provides the binary, no
        | need to compile anything. It's a young project with plenty of
        | bugs, but I hope it succeeds because it's really a great
        | concept. Give it a try, it needs users and we would all benefit
        | from it becoming popular, as it's a very sane way of setting up
        | a python dev env.
       | 
        | * shiv (https://shiv.readthedocs.io/): it leverages the concept
        | of zipapp (see PEP 441 from 2013), meaning the ability python
        | has to execute code inside a zip file. It's a successor to pex.
        | Basically it lets you bundle your code + all deps inside a zip,
        | like a Java .war file. You can then run the resulting zip, a
        | .pyz file, as if it were a regular .py file. It will unzip
        | automatically on first execution and run transparently. It
        | makes deployment almost as easy as with golang.
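        | 
        | The zipapp mechanism shiv builds on is itself in the stdlib; a
        | minimal sketch (shiv additionally packs third-party deps and
        | unzips on first run):

```python
import subprocess
import sys
import tempfile
import zipapp
from pathlib import Path

# Bundle a directory containing a __main__.py into a single .pyz archive
# and run it directly with the interpreter.
src = Path(tempfile.mkdtemp()) / "app"
src.mkdir()
(src / "__main__.py").write_text("print('hello from a .pyz')\n")

target = src.parent / "app.pyz"
zipapp.create_archive(src, target)

out = subprocess.run([sys.executable, str(target)],
                     capture_output=True, text=True).stdout.strip()
print(out)  # hello from a .pyz
```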
       | 
       | * nuitka (https://nuitka.net/): takes your code and all
       | dependencies, turns them into C, and compiles it. Although it
       | does require a bit of setup, since it needs headers and a
       | compiler, it results reliably in a standalone compiled executable
       | that will run on the same architecture with no need for anything
       | else. Also it will speed up your Python program, up to 4 times.
       | In my experience, it's also easier and more robust than
       | pyinstaller, cx_freeze and so on.
       | 
       | * pyodide (https://pyodide.org/en/stable/): python compiled to
       | WASM to run in the Web browser. Useless for web programming given
       | the huge size of the runtime, but great for teaching, as it
       | allows students to basically access a zero install full featured
       | python dev env by clicking a link. Try it out, it's awesome, you
       | can even create and query a sqlite db, thanks to the virtual FS:
       | https://notebook.basthon.fr/
        
         | ericvsmith wrote:
         | nuitka is at https://nuitka.net/
        
           | BiteCode_dev wrote:
           | thanks, I'll fix my bad copy paste.
        
       | gigatexal wrote:
       | What's still annoying to this day is Python imports. I have been
       | doing Python for a while now and I still get tripped up when some
       | bit of code in a folder isn't importable. I've found adding the
        | package dir that I'm working on to PYTHONPATH works, but then my
       | editor doesn't resolve packages for autocomplete.
        
         | georgyo wrote:
         | If you have foo.py and bar.py in a directory and you want to
         | import foo.py from bar.py you only need to add an empty
         | __init__.py
         | 
         | You would then `import .foo`
        
           | terom wrote:
           | https://www.python.org/dev/peps/pep-0420/ Python 3.3+ no
           | longer requires the `__init__.py` file to make a package.
           | 
           | > You would then `import .foo`
           | 
           | The `import .foo` is a syntax error, and `from . import foo`
           | does not work for scripts [1].
           | 
           | The basic `import foo` works if the directory is on your
           | PYTHONPATH. That will be the case if you run `bar.py` as a
           | script.
           | 
           | It gets more difficult if you want to separate scripts and
           | importable modules into separate sibling directories (bin vs
           | lib). You can't use relative imports to get around that, you
           | need to use a virtualenv (or manual PYTHONPATH wrangling).
           | 
           | [1] https://stackoverflow.com/a/14132912
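            | 
            | The script-directory behavior described above can be
            | checked directly (the foo/bar module names are invented
            | for the demo):

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# When bar.py runs as a script, its own directory goes on sys.path, so a
# plain "import foo" finds the sibling module -- no __init__.py and no
# relative-import syntax needed.
d = Path(tempfile.mkdtemp())
(d / "foo.py").write_text("VALUE = 42\n")
(d / "bar.py").write_text("import foo\nprint(foo.VALUE)\n")

out = subprocess.run([sys.executable, str(d / "bar.py")],
                     capture_output=True, text=True).stdout.strip()
print(out)  # 42
```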
        
         | BiteCode_dev wrote:
         | Use virtualenvs, that's the problem they solve.
         | 
         | Python now ships with the venv cli tool.
         | 
         | If you point your editor to the python interpreter in the venv,
         | it will detect everything automatically.
        
       | korijn wrote:
        | I'm thrilled to see people addressing this (decades old?)
        | footgun. I hope there will be more collaboration in the future.
        | The split universes that system and language-specific package
        | managers operate in are the cause of many more issues.
        
       ___________________________________________________________________
       (page generated 2021-09-01 10:00 UTC)