CourtBouillon

Authentic people growing open source code with taste

The Python Packaging Hell: Files Everywhere (4 / 7)

Python packaging can sometimes be a nightmare. To convince yourself of that, you just need to spend a few minutes drowning in the myriad of usable (and used!) files to build or install a package.

💕💕💕

This article is part of a series of tearful articles about Python packaging:

  1. The Can of Worms
  2. The Roots of Evil
  3. Delusions of Formats
  4. Files Everywhere
  5. The Toolbox
  6. The Expression of Needs
  7. The Minimal Solution

Before starting, we would like to send a lot of love to the members of the PyPA team. We complain a lot in this series, but we have a lot of respect for the Sisyphean work already done.

That being said, let the whining begin (again) 😭.

💕💕💕

But Why?

We can’t say that there are no guides to creating Python packages. The issue isn’t scarcity, but abundance. You’ll find guides everywhere, more or less old, more or less practical, more or less useful… The hardest part isn’t finding one, it’s reading all of them and picking up a bit of information from each one, until you’ve made up your own mind.

Maybe you hoped to find in this article a good summary of what exists, but here is the sad truth: it is only one more source you can refer to, if you ever find something interesting in it.

That being said, it’s not that bad…

What is the connection between this introduction and files? Well, it’s quite simple. We can’t say that there are no configuration files to create Python packages. The issue isn’t scarcity, but abundance. You’ll find files everywhere, more or less old, more or less practical, more or less useful… The hardest part isn’t finding one, it’s reading all of them and picking up a bit of information from each one, until you’ve made up your own mind.

(It’s OK, you got the connection, right?)

Meme Toy Story "files everywhere"
You want files? You’ll see a lot of them!

We won’t repeat the good old "you have to understand the people creating Python, because Python is old, you can’t change everything suddenly". It’s partly true for this hell of files, but also partly false. The official example proposed today by the PyPA contains 4 files used to create packages (setup.py, setup.cfg, MANIFEST.in and pyproject.toml). While we can understand the wish to cover as many use cases as possible, we can also condemn the impression of utter chaos it gives to someone who’d like to learn.

(Reminder: a minimal Rust project contains a Cargo.toml file to store metadata and a src/main.rs file to store code. Moreover, these two files are automatically created for you with the cargo new command.)

Admittedly, it’s hard to anticipate all the needs of a configuration file from the start. But on the other hand, it’s questionable to say we have to live with this sad legacy. Unlike some other topics, nothing would prevent us from defining a new standard for configuration files. And nothing would prevent this standard from generating packages identical to existing ones. We could leave the past behind us, with its old files and its old tools, and use a single file in every case. Package creators would have to learn these new rules, of course, but nothing would change for end users, nor for the tools they use.

That would be nice, don’t you think? It’s time for the good news: it’s already happening right now. No joke.

Now that you really want to know the solution (yes, it’s totally sneaky and fully intentional), we can inflict on you the full thought process that led to the current situation. The journey matters more than the destination, doesn’t it?

A Rather Short List

There is no need to grumble: as usual, we won’t talk about everything that has ever existed to create or install packages. Don’t expect an exhaustive list, just a few emblematic files that help us understand where we come from.

setup.py

This file was the first one introduced to handle package creation; it’s also the most famous and the most used today, despite its advanced age (at least 20 years, which doesn’t make us feel any younger).

The idea behind setup.py is quite simple: to hold the whole configuration needed to create and install packages, we use a Python script defining a set of metadata (the name of the package, the list of files to include, etc.) and various commands (create a source package, create a binary package, install, etc.). To do that, Python offers a module called distutils, containing everything needed to describe these metadata and commands. All you have to do is import it in setup.py, call the right functions, and voilà.
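To make this concrete, here is a minimal sketch of a setup.py for a hypothetical `hello` package, written in the setuptools style (setuptools took over from distutils, as described below; with distutils alone, the import would be `from distutils.core import setup`). The metadata are nothing more than keyword arguments passed to a function when the script runs. To keep the sketch self-contained, it doesn’t use the real setuptools: a stub module simply records the call, which also shows that a tool wanting to read these metadata has no choice but to run the script.

```python
import sys
import types
from unittest import mock

# A minimal setup.py for a hypothetical "hello" package.
SETUP_PY = '''
from setuptools import setup

setup(
    name="hello",
    version="1.0.0",
    packages=["hello"],
    install_requires=["requests>=2.0"],
)
'''


def read_setup_py(source):
    """Run a setup.py source with a stub setuptools, return what setup() got."""
    captured = {}

    def fake_setup(**kwargs):
        # The real setup() would also read sys.argv and run a command
        # (sdist, bdist_wheel, install…); this stub only records its arguments.
        captured.update(kwargs)

    stub = types.ModuleType("setuptools")
    stub.setup = fake_setup
    # Replace setuptools only while the script runs.
    with mock.patch.dict(sys.modules, {"setuptools": stub}):
        exec(compile(source, "setup.py", "exec"), {"__name__": "__main__"})
    return captured


metadata = read_setup_py(SETUP_PY)
print(metadata["name"], metadata["version"])  # hello 1.0.0
```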

But (and it’s not the first time we meet this problem with tools managing Python packages) distutils is rather limited and its features aren’t strictly defined. Painfully, the code became the de facto reference of what can be done, and the (legitimate) fear of breaking everything quickly prevented developers from adding features, or from fixing bugs that some users had mistaken for features.

Mandatory Related XKCD™
The mandatory XKCD

Limited by distutils, setup.py could have been replaced by another solution. But a quick fix was found instead: setuptools.

setuptools is a module using distutils internally, but offering additional features, including more advanced handling of included files, the possibility to create Windows programs, and most of all… the possibility to define dependencies.

We’ll look at libraries and tools in detail in the next article, but it’s important to understand that setuptools is about to open a can of worms without realizing it. As the module is external to Python, it’s much less hampered by the caution that held back its predecessor. New features are added in response to users’ needs, in a cheerful disorganization that at least had the merit of allowing a chaotic but widespread distribution of packages. The library comes with an executable script, easy_install, that installs a package and its dependencies. It also comes with the "egg" package format that we discussed last time.

Because of the anarchic development of setuptools, it has been impossible to properly specify the options and best practices of package creation. setup.py has the defects of its qualities: being written in Python, it lets you use the full power of the language for something that was initially supposed to be a few lines of metadata and installation scripts. Everything that could simply be descriptive potentially becomes dynamic when executed. Extensions appear, some depending on setuptools and some not, offering a galaxy of possibilities. Scripts get bigger and are copied from project to project without being understood. Random pieces of code working around malfunctions in different versions of Python, distutils or setuptools end up in every setup.py on Earth.

And in the end, we get that. Of course, this project needs a lot of configuration and it would be difficult to do these things with less code. Of course, with some time and some hard work, it’s quite easy to understand the whole file, which is nicely written, by the way.

The issue with setup.py isn’t its potential complexity, which can be useful in some cases. The real issue is that, for a long time, there was no simple alternative to create simple, pure Python packages. The only way was to write code, for something that could have been entirely declarative. And who hasn’t written code, a lot of horrible code, even to do simple things? With this pile of horrible code in many projects, setuptools has had to include workarounds that bypass workarounds written to bypass issues that have long since been fixed. setuptools has had to copy and include different functions from different Python versions (including their bugs, of course) to stay perfectly backward compatible. TL;DR: setuptools has become a hellish monster that has infected the setup.py of a significant majority of projects.

setup.cfg

Unsurprisingly, the idea of a declarative format for package creation eventually emerged, and a solution was integrated into setuptools: setup.cfg.

This INI file is nothing more than a different presentation of most of the options offered in Python by setuptools. So we find the same disadvantages: same bugs, same poorly documented options, same inconsistencies.
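By contrast with setup.py, a declarative file can be read without executing anything. Here is a minimal sketch using the standard configparser module on a hypothetical setup.cfg equivalent to a small setup.py. The `[metadata]` and `[options]` sections and their keys follow setuptools’ documented layout, but this is only plain INI reading: setuptools’ own parser applies extra conversions (lists, `file:` and `attr:` directives…) that the sketch ignores.

```python
import configparser

# A hypothetical setup.cfg for a small "hello" package.
SETUP_CFG = """
[metadata]
name = hello
version = 1.0.0

[options]
packages = hello
install_requires =
    requests>=2.0
    click
"""

parser = configparser.ConfigParser()
parser.read_string(SETUP_CFG)

name = parser["metadata"]["name"]
version = parser["metadata"]["version"]
# Multi-line values come back as a single string: split it into a list by hand.
raw_requires = parser["options"]["install_requires"]
requires = [line.strip() for line in raw_requires.splitlines() if line.strip()]
print(name, version, requires)  # hello 1.0.0 ['requests>=2.0', 'click']
```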

Moreover, this file isn’t a replacement of setup.py, but an extension. We have to keep the script, even when it’s almost empty! If some data are present twice, those from setup.cfg are kept.

Why do we need to keep the setup.py file? Simply because setuptools doesn’t provide a standalone command to run the commands built into the script. To generate a source package, we use python setup.py sdist, directly executing the script.

It looks like a detail, but it’s actually a major issue. Who would want to use a static format, when we can write a big pile of spaghetti code in a script that we have to keep anyway? How can we explain to people discovering Python that they have to write a Python file and an INI file, when technically they can do without the INI file? Yes, you got it: we can’t fight the call of the code.

That explains why setup.cfg isn’t widely used today. Tied to the two huge, ubiquitous burdens that are setuptools and setup.py, it only brings a small dose of simplicity through its declarative nature. As long as it carries around a heavy, ossified history, it will remain a second choice, a clumsy attempt to fix a real issue.

requirements.txt

Here is a file you have almost certainly met and used already. Praised without nuance by second-rate tutorials, acclaimed for its simplicity and its power, used by a lot of famous projects, requirements.txt is the star of dependency installation.

But well, let’s just go ahead and say it: it has nothing to do with package creation.

requirements.txt is a simple list of packages to install, with the possibility to pin versions and to specify sources, branches and installation options.

It’s often used with pip, just for installation. We can see it as a convenient way to list dependencies, in a format that we could write directly on the command line, but that our laziness and our taste for line breaks push us to put in a file.
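To illustrate this equivalence with the command line, here is a deliberately naive sketch that turns a hypothetical requirements.txt into the arguments of a single pip install command. It only drops blank lines and comments; pip’s real parser handles much more (line continuations, `-r` includes, environment markers, per-line options…), and in practice you would simply run `pip install -r requirements.txt`.

```python
import shlex

# A hypothetical requirements.txt: a pinned version, a range, a Git branch, an option.
REQUIREMENTS = """
# Web framework, pinned
flask==2.0.1
requests>=2.25  # any recent version
git+https://github.com/example/project.git@main#egg=project
--index-url https://pypi.org/simple
"""


def to_pip_command(text):
    """Build the equivalent one-line pip command (naive: no includes, no continuations)."""
    args = ["pip", "install"]
    for line in text.splitlines():
        # Drop trailing " # comments" (a "#" inside a URL is kept) and spaces.
        line = line.split(" #", 1)[0].strip()
        if not line or line.startswith("#"):
            continue
        # Options such as "--index-url URL" become several arguments.
        args.extend(shlex.split(line))
    return args


print(shlex.join(to_pip_command(REQUIREMENTS)))
```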

That’s convenient, in particular for everything we’d like to share in a form other than a package. To pick at random: everything but libraries. A little unpretentious script? A requirements.txt file. A web application? A requirements.txt file. A library? Well, OK, let’s write some requirements.txt files for the documentation and the tests.

Yes, we can have a setup.py file, a setup.cfg file and a requirements.txt file in the same project, along with all their friends MANIFEST.in, tox.ini, pyproject.toml, pytest.ini, and so on. Each packager does things their own way, blithely copying whatever seems to work in friends’ packages. There will always be a specific case that’s only handled by one of these files, and simplicity will be sacrificed on the altar of sacrosanct features.

MANIFEST.in

Do you want a very special feature? Including assets in a source package is a perfect example of a painful puzzle.

Binary packages are usually distributed so that users can easily use the code. Packages such as wheels are ready-to-use archives, and installing them requires nothing more than extracting an archive into the right folder. These packages can contain only the bare minimum: code. Everything else (documentation, tests, super-cute-little-nice-files describing changes…) has no place in them.

Source packages are different. They are useful for those who want to look at the code, create packages for Linux distributions, test patches, install libraries, run tests… We thus try to include everything we can in the package, almost everything from the repository, except files needed for continuous integration, versioning, and other small stuff polluting our oh-so-cute project.

To include these files in the source package, in particular when they sit at the root of the project rather than in the same folder as the code, we use MANIFEST.in. This umpteenth file comes, of course, with its own syntax and its own commands. And don’t worry: it lets you do both things that are already possible with the other files, and things that are not.
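To give an idea of this syntax, here is a simplified sketch that applies a few MANIFEST.in commands (`include`, `exclude`, `recursive-include`, `prune`) in order to a hypothetical list of project files. The real implementation in distutils and setuptools supports more commands (`graft`, `global-exclude`, `recursive-exclude`…) and uses its own pattern-matching rules; the sketch only mimics the general idea with `fnmatchcase`, and needs Python 3.9 or later for `is_relative_to`.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Hypothetical project files, relative to the project root.
FILES = [
    "README.rst", "CHANGELOG.rst", "LICENSE", ".gitlab-ci.yml",
    "hello/__init__.py", "hello/data/logo.svg", "hello/data/notes.txt",
    "docs/index.rst", "docs/_build/index.html", "tests/test_hello.py",
]

# A hypothetical MANIFEST.in using only the commands this sketch supports.
MANIFEST_IN = """
include README.rst CHANGELOG.rst LICENSE
recursive-include hello *.py *.svg
recursive-include docs *.rst *.html
recursive-include tests *.py
prune docs/_build
"""


def matches(path, patterns):
    return any(fnmatchcase(path, pattern) for pattern in patterns)


def apply_manifest(template, files):
    """Apply MANIFEST.in-like commands, in order, to a list of files."""
    selected = set()
    for line in template.splitlines():
        if not line.strip():
            continue
        command, *args = line.split()
        if command == "include":  # patterns relative to the project root
            selected |= {f for f in files if matches(f, args)}
        elif command == "exclude":
            selected -= {f for f in selected if matches(f, args)}
        elif command == "recursive-include":  # a directory, then patterns
            directory, patterns = args[0], args[1:]
            selected |= {
                f for f in files
                if PurePosixPath(f).is_relative_to(directory)
                and matches(PurePosixPath(f).name, patterns)
            }
        elif command == "prune":  # drop everything under a directory
            selected = {
                f for f in selected
                if not PurePosixPath(f).is_relative_to(args[0])
            }
        else:
            raise ValueError(f"command not supported by this sketch: {command}")
    return sorted(selected)


for name in apply_manifest(MANIFEST_IN, FILES):
    print(name)
```

With these rules, the CI configuration, the stray notes.txt file and the generated HTML are left out of the source package.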

Five-set Venn Diagram
Let’s see… Which files do I need to identify an optional dynamic dependency that will only be installed with Python 3.7.x on a 32-bit Windows?

pyproject.toml

Here we are.

At first sight, pyproject.toml seems to be a direct clone of setup.cfg, with a slightly different format and a debatable name. Yet another file, yet another format: what a crazy idea!

In reality, things are a little more complex. PEP 518, which introduced this file, is titled "Specifying Minimum Build System Requirements for Python Projects". It’s not called "And here’s one more stupid format to define package metadata", and there are surprisingly good reasons for that.

In the list of issues caused by setuptools, there is one we haven’t talked about yet: setup.py contains the dependencies of a package, including the dependencies needed to build the package. How can we know the dependencies without executing the file? And how can we execute the file without knowing its dependencies? In theory, this chicken-and-egg problem affects setuptools too, but as everyone uses it to create packages, and as it’s a dependency of pip, there’s a good chance it’s already installed along with Python. However, if we want to use another tool, like a setuptools extension, things immediately get harder.
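The chicken-and-egg problem can be reproduced with a small sketch. The hypothetical setup.py below imports a build helper (`fancy_build_helper`, a made-up name standing for any setuptools extension) before calling setup(). To learn that this helper is needed, a tool has to run the script, which fails precisely because the helper isn’t installed. As setuptools itself is assumed to be present, it is replaced here by a stub so that the sketch runs with the standard library only.

```python
import sys
import types
from unittest import mock

# Hypothetical setup.py needing a made-up build helper before setup() is called.
SETUP_PY = '''
from setuptools import setup
from fancy_build_helper import build_extensions

setup(
    name="hello",
    version="1.0.0",
    setup_requires=["fancy_build_helper"],  # never reached if the import fails
    ext_modules=build_extensions(),
)
'''


def find_build_requirements(source):
    """Return setup_requires by running setup.py, or None if it cannot run."""
    captured = {}
    stub = types.ModuleType("setuptools")  # stands in for an installed setuptools
    stub.setup = lambda **kwargs: captured.update(kwargs)
    with mock.patch.dict(sys.modules, {"setuptools": stub}):
        try:
            exec(compile(source, "setup.py", "exec"), {"__name__": "__main__"})
        except ModuleNotFoundError as error:
            # The list of build dependencies is inside the file we cannot run.
            print(f"cannot run setup.py: {error}")
            return None
    return captured.get("setup_requires", [])


requirements = find_build_requirements(SETUP_PY)
# prints: cannot run setup.py: No module named 'fancy_build_helper'
```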

The idea of pyproject.toml isn’t to propose a new metadata format. The idea is to include, in a simple text file, the dependencies needed to build a package. Think about this carefully. A little more.

Well, you understand now. We’ll be able to get rid of setuptools and distutils, at least to build packages. For real.

Of course, in simple cases, we can still use them. pyproject.toml can store all the metadata we used to store before. It can also store more complex information, such as dependencies and supported Python versions, a bit like setup.cfg, a bit like before.

But nothing prevents packagers from using another tool, which can define its own configuration options, independently of setuptools. Even better: as the file is specified and well organized, other tools (black, pylint, coverage…) can use this file too, putting an end to the atrocious confetti of configuration files.

One last thing has to be fixed: declaring the entry point of the tool we’re going to use to build the package. That’s the job of PEP 517, which finally frees us from setuptools, setup.py and all their friends.
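In practice, PEP 517 adds a `build-backend` key to the `[build-system]` table, naming a Python object with a `module:object` syntax; that object exposes standard hooks such as `build_wheel` and `build_sdist(sdist_directory, config_settings=None)`, the latter returning the basename of the archive it creates. The sketch below shows the frontend side under simplifying assumptions: `toy_backend` is a made-up backend registered in memory, whose `build_sdist` writes an empty archive. Real frontends also install the `requires` list in an isolated environment and usually call the hooks in a subprocess.

```python
import importlib
import sys
import tarfile
import tempfile
import types
from pathlib import Path


def load_backend(build_backend):
    """Resolve a PEP 517 "module:object" string into the backend object."""
    module_path, _, object_path = build_backend.partition(":")
    backend = importlib.import_module(module_path)
    for attribute in filter(None, object_path.split(".")):
        backend = getattr(backend, attribute)
    return backend


# A made-up backend, registered in memory so that the sketch is self-contained.
def build_sdist(sdist_directory, config_settings=None):
    """Write a (nearly empty) source archive and return its basename."""
    name = "hello-1.0.0.tar.gz"
    with tarfile.open(Path(sdist_directory) / name, "w:gz"):
        pass  # a real backend would add the project files here
    return name


toy_backend = types.ModuleType("toy_backend")
toy_backend.build_sdist = build_sdist
sys.modules["toy_backend"] = toy_backend

# What a frontend does with `build-backend = "toy_backend"` in pyproject.toml.
backend = load_backend("toy_backend")
with tempfile.TemporaryDirectory() as output:
    archive = backend.build_sdist(output)
    print(archive, (Path(output) / archive).exists())  # hello-1.0.0.tar.gz True
```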

But… Does it work for real?

Yes. We just have to choose which tools to use. And we’ll choose them in the next article…