[HN Gopher] Cooperative Package Management for Python
___________________________________________________________________
Cooperative Package Management for Python

Author : Tomte
Score  : 59 points
Date   : 2021-09-01 05:56 UTC (4 hours ago)

(HTM) web link (lwn.net)
(TXT) w3m dump (lwn.net)

| derriz wrote:
| I probably haven't thought this fully through, but wouldn't it be
| simpler to just have a system venv - root-protected - perhaps
| distributed using the system's package manager? Then if you mess
| up site-packages, at least you wouldn't break the system tools.
| BiteCode_dev wrote:
| It's already the case; that's why you see some mistaken
| tutorials telling you to "sudo pip install".
|
| A venv is just a dir and some paths, and your OS already has a
| dedicated one for its own Python install, even if it's not
| called a venv, but something like dist-packages + manual PATH
| fudging.
|
| This proposal makes the distinction clearer by putting a
| safeguard in place so that you do NOT mess with the system
| stdlib.
|
| You should be using "--user" or a local venv, and "-m" to call
| commands.
|
| See my other comments for more tooling.
| derriz wrote:
| I'm missing something, I guess - I don't understand your
| comment in the context of this claim near the start of the
| article:
|
| "The root cause of the problem is that distribution package
| managers and Python package managers ("pip" is shorthand to
| refer to those throughout the rest of the article) often
| share the same "site-packages" directory for storing
| installed packages."
|
| Having a system venv - managed by the system's package
| manager - would mean this "root cause" would go away, no?
|
| Of course, by poking around the filesystem tree and messing
| with managed files as root, you could mess up the system venv,
| but that's possible with any installed package.
| faho wrote:
| > Having a system venv - managed by the system's package
| manager - would mean this "root cause" would go away, no?
|
| Oh, but you need to explain to pip that it's managed by the
| system's package manager. Which is what this does with the
| "EXTERNALLY-MANAGED" file.
|
| And you need to do that because the system's package
| manager and pip have separate sources - with `pip` you get
| the packages from PyPI, with the system's package manager
| from its repo.
| BiteCode_dev wrote:
| As I said, a venv is nothing more than a directory + PATH
| setup. The OS already has a dedicated one. The problem is not
| that it does not exist; the problem is that pip is using
| it.
| JonathanBeuys wrote:
| On my computer, I only use Python software that is in the Debian
| repos. So all dependencies are handled by Debian.
|
| Everything else I run in a Docker container.
|
| This way I never need a venv.
|
| I hope these changes will not make my workflow harder.
|
| Running 3rd-party software in containers not only makes
| dependency handling easier. It is also a security thing. Can you
| really trust the whole dependency tree below any Python code that
| you want to run? I prefer to keep it separate from my main OS.
| BiteCode_dev wrote:
| No, in fact it will make your workflow easier: no matter what,
| no script using pip will break your install.
| JonathanBeuys wrote:
| I don't know if I actually have something you would call an
| "install".
|
| I run 3rd-party software in fresh Debian containers. So there
| is nothing in there I would be afraid to "break".
| BiteCode_dev wrote:
| The container has a system Python stdlib, and it's used by
| apt. If it breaks, everything breaks. I've destroyed it
| twice, with style I must say.
|
| If any script, or you yourself, runs "pip install" with
| admin rights (the default on a lot of container techs), you
| have a chance to break it.
| JonathanBeuys wrote:
| "Chance to break" what?
|
| It is not as if I manually work in a terminal inside a
| container that is somehow permanent and valuable.
|
| I just once create a dockerfile with the commands to set
| up the application I want to use. That's it. From then
| on, I use that container whenever I want to use that
| application.
| BiteCode_dev wrote:
| If pip installs something that has a dependency with the
| same name as any of the packages in your install, it can
| cause a conflict, because by default they will overwrite
| each other.
| chriswarbo wrote:
| Eww. I get the rationale, but Python's packaging/import logic is
| already ridiculously convoluted. The underlying problem is using
| global mutable state (e.g. /usr/lib/.../site-packages). Giving
| each application its own package directory is better; there are
| Python-specific solutions to that (like virtualenv, mentioned in
| the article), but I prefer Nix since it's language-agnostic (it
| can handle non-Python dependencies too).
|
| I'm also not a fan of all the Rube Goldberg machines being
| cobbled together to appease Docker's fundamentally broken way of
| working:
|
| > Distros that produce official images for single-application
| containers (e.g., Docker container images) should remove the
| EXTERNALLY-MANAGED file, preferably in a way that makes it not
| come back if a user of that image installs package updates inside
| their image (think RUN apt-get dist-upgrade). On dpkg-based
| systems, using dpkg-divert --local to persistently rename the
| file would work. On other systems, there may need to be some
| configuration flag available to a post-install script to
| re-remove the EXTERNALLY-MANAGED file.
|
| Here a "single-application container" is assumed to contain _an
| entire OS_ rather than, you know, a single application. That was
| _supposed_ to make dependency management easier, since we can
| tailor the OS to just that one application; but based on this
| article it sounds like even that was a lie. Should we revisit the
| assumption that Docker makes dependency management easier?
No, the OS maintainers are now expected to change their distros to
| add another layer to the Rube Goldberg machine.
|
| I also find it pretty alarming that 'apt-get dist-upgrade' is
| given as an example of something we might want to do to an
| "official image". What's the point of sanctioning a snapshot as
| "official" if we're just going to immediately, and
| non-deterministically, overwrite arbitrary parts of it based on
| external-server-state du jour?
| olau wrote:
| > Giving each application its own package directory is better
|
| One simple way to do that is
|
|     cd projectfoo
|     mkdir libs
|     pip install -t libs
|     PYTHONPATH=libs python foo.py
|
| That way you can use system packages like a database connection
| library together with the local application dependency farm.
| BiteCode_dev wrote:
| While I agree about Nix being a cleaner solution, it will never
| be adopted, because... it's a cleaner solution.
|
| Nix is pretty much incompatible with everything, because our
| entire legacy software stack has state, and we need to
| accommodate it.
|
| So we keep adding those hacks because we favor compat over
| purity.
|
| Even Nix has to, after all; there are plenty of bash calls in
| NixOS config scripts as soon as you do something non-trivial.
| chriswarbo wrote:
| Sure, but I'm not saying everyone should adopt Nix; I'm
| saying the solution to problems caused by global mutable
| state is to avoid adding more global mutable state.
|
| If 'apt-get dist-upgrade' is breaking the packages we
| pip-installed into /usr/local, I'd ask (a) why we're using two
| package managers, and (b) why 'put these files in these
| locations' is being solved implicitly by running imperative,
| non-deterministic commands to mutate things in-place, rather
| than e.g. extracting a .tar.gz of known-good dependencies.
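[Editor's note: for readers unfamiliar with the mechanism under discussion, the safeguard is just a marker file next to the interpreter's standard library. A minimal sketch of the presence check an installer like pip performs, assuming the marker semantics described in the article (the real implementation also reads an explanatory error message out of the file; the function name here is illustrative):]

```python
# Simplified sketch of the "EXTERNALLY-MANAGED" check described in the
# article: an installer refuses to touch the system site-packages when a
# marker file exists alongside the interpreter's stdlib.
import os
import sysconfig


def is_externally_managed() -> bool:
    # The marker lives in the stdlib directory of the running interpreter,
    # e.g. /usr/lib/python3.X/EXTERNALLY-MANAGED on a Debian-style system.
    marker = os.path.join(sysconfig.get_path("stdlib"), "EXTERNALLY-MANAGED")
    return os.path.isfile(marker)


if __name__ == "__main__":
    print("externally managed:", is_externally_managed())
```

[A distro sets the marker by shipping the file; a Docker image that wants pip to manage the environment removes it, which is exactly the dpkg-divert dance quoted above.]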
| BiteCode_dev wrote:
| > If 'apt-get dist-upgrade' is breaking the packages we
| pip-installed into /usr/local, I'd ask (a) why we're using
| two package managers,
|
| Debian packagers won't use pip for obvious reasons. Devs
| need pip because it's portable and doesn't need packager
| validation.
|
| It would have been better if something like Nix had been
| adopted 30 years ago by both communities, but it wasn't,
| so we have several package managers. It's even worse now
| with poetry, conda, snap, flatpak and docker.
|
| > why 'put these files in these locations' is being solved
| implicitly by running imperative, non-deterministic
| commands to mutate things in-place, rather than e.g.
| extracting a .tar.gz of known-good dependencies.
|
| pip now uses wheels, and basically does that. .whl files are
| zips, and pip just unpacks them at a known location. The
| problem is, this location is currently shared with the OS if
| you don't use --user or a venv. This article addresses that
| problem. My other comment also talks about some other
| solutions.
|
| It's very hacky, because, well, legacy and all that.
| georgyo wrote:
| You missed what he was saying.
|
| Nix solves this problem in multiple different ways. One
| of them is packaging, but also the entire /nix/store is
| mounted read-only.
|
| A sudo pip install would have no ability to break the
| system or remove something that was installed by Nix.
| BiteCode_dev wrote:
| I get that, but since our entire legacy stack is
| stateful, there is no way around that unless you nixify
| the entire thing.
|
| Immutability works only if the entire chain is immutable.
| [deleted]
| BiteCode_dev wrote:
| It's a good safeguard, and it goes in the direction of the
| other initiatives to make Python package management's default
| behavior saner.
|
| PEP 582 is another one to follow up on:
| https://www.python.org/dev/peps/pep-0582/
|
| It basically uses the concept of node_modules, making Python
| interpreters load any local directory named `__pypackages__`.
| There are 2 differences though:
|
| * Unlike JS, Python can only have one version of each lib for a
| given setup.
|
| * Since having several versions of Python often matters, you may
| have several __pypackages__/X.Y subdirs to cater to each of
| them.
|
| It also forces you to use "-m" to call commands written in
| Python, which is the best practice anyway. I hope it will push
| jupyter to fix "-m" on Windows for them, because that's a blocker
| for beginners.
|
| If you are not already using "-m", start now. It solves a lot of
| different problems with running Python CLI programs. It's an old
| flag, but too few people know about it.
|
| E.g.: instead of running "black" or "pylint", do "python -m
| black" or "python -m pylint". Of course you may want to choose a
| specific version of Python, so "python3.8 -m black" on Unix, or
| "py -3.8 -m black" on Windows.
|
| To test out the __pypackages__ concept, give a try to the pdm
| project: https://github.com/pdm-project/pdm
|
| Lastly, some other tools that I wish people knew more about that
| solve packaging issues:
|
| * pyflow (https://github.com/David-OConnor/pyflow): it's a
| package manager like poetry, but it also installs whatever Python
| you want, like pyenv. Except it provides the binary, no need to
| compile anything. It's a young project with plenty of bugs, but I
| hope it succeeds because it's really a great concept. Give it a
| try; it needs users, and we would all benefit from it becoming
| popular, as it's a very sane way of setting up a Python dev env.
|
| * shiv (https://shiv.readthedocs.io/): it leverages the concept
| of zipapp (see PEP 441 from 2013), meaning the ability that
| Python has to execute code inside a zip file. It's a successor
| to pex.
| Basically it lets you bundle your code + all deps inside a zip,
| like a Java .war file. You can then run the resulting zip, a .pyz
| file, as if it were a regular .py file. It will unzip on the
| first execution automatically and run transparently. It makes
| deployment almost as easy as with golang.
|
| * nuitka (https://nuitka.net/): takes your code and all
| dependencies, turns them into C, and compiles it. Although it
| does require a bit of setup, since it needs headers and a
| compiler, it reliably results in a standalone compiled executable
| that will run on the same architecture with no need for anything
| else. Also it will speed up your Python program, up to 4 times.
| In my experience, it's also easier and more robust than
| pyinstaller, cx_freeze and so on.
|
| * pyodide (https://pyodide.org/en/stable/): Python compiled to
| WASM to run in the web browser. Useless for web programming given
| the huge size of the runtime, but great for teaching, as it
| allows students to basically access a zero-install, full-featured
| Python dev env by clicking a link. Try it out, it's awesome; you
| can even create and query a sqlite db, thanks to the virtual FS:
| https://notebook.basthon.fr/
| ericvsmith wrote:
| nuitka is at https://nuitka.net/
| BiteCode_dev wrote:
| thanks, I'll fix my bad copy paste.
| gigatexal wrote:
| What's still annoying to this day is Python imports. I have been
| doing Python for a while now and I still get tripped up when some
| bit of code in a folder isn't importable. I've found that adding
| the package dir I'm working on to PYTHONPATH works, but then my
| editor doesn't resolve packages for autocomplete.
| georgyo wrote:
| If you have foo.py and bar.py in a directory and you want to
| import foo.py from bar.py, you only need to add an empty
| __init__.py
|
| You would then `import .foo`
| terom wrote:
| https://www.python.org/dev/peps/pep-0420/
|
| Python 3.3+ no longer requires the `__init__.py` file to make
| a package.
|
| > You would then `import .foo`
|
| The `import .foo` is a syntax error, and `from . import foo`
| does not work for scripts [1].
|
| The basic `import foo` works if the directory is on your
| PYTHONPATH. That will be the case if you run `bar.py` as a
| script.
|
| It gets more difficult if you want to separate scripts and
| importable modules into separate sibling directories (bin vs
| lib). You can't use relative imports to get around that; you
| need to use a virtualenv (or manual PYTHONPATH wrangling).
|
| [1] https://stackoverflow.com/a/14132912
| BiteCode_dev wrote:
| Use virtualenvs; that's the problem they solve.
|
| Python now ships with the venv CLI tool.
|
| If you point your editor to the Python interpreter in the venv,
| it will detect everything automatically.
| korijn wrote:
| I'm thrilled to see people addressing this (decades-old?)
| footgun. I hope there will be more collaboration in the future.
| The split universes that system and language-specific package
| managers operate in are a cause of many more issues.
___________________________________________________________________
(page generated 2021-09-01 10:00 UTC)