Packaging Python Applications in 2021

Michael Seifert, 2021-07-28, Updated 2021-07-28
Photo: Karolina Grabowska (Pexels)

Let's build a program that computes the answer to the ultimate question of life, the universe, and everything. The application does a lot of number crunching, so it uses the numpy package.

import numpy as np

def print_answer():
    # Sum up the summands with NumPy and print the result.
    summands = np.array([11, 12, 13, 6])
    answer = summands.sum()
    print(answer)

if __name__ == "__main__":
    print_answer()

The script prints 42, but it requires you to have NumPy installed. While this is a reasonable thing to ask of a Python developer, it creates a lot of friction for non-Python developers. Rather than requiring users to install the script's dependencies manually, we specify the dependencies using a build system that creates build artefacts.

Creating Python wheels and source tarballs

Python wheels and source tarballs are arguably the simplest build artefacts. They're also the only reasonable choice when packaging a library. Both contain metadata about the packaged software, such as author, license, or dependencies. The main difference is that wheels are a binary distribution format. That means you may have to provide different wheels for different platforms when your software comes with native libraries. Since you cannot realistically create a wheel for every platform, you should always provide both wheels and source tarballs.

Historically, developers needed to create a setup.py file that used distutils or setuptools to create wheels or tarballs. This was the de-facto standard for packaging Python applications until PEP 517. PEP 517 and its companion PEP 518 specify a pyproject.toml file which declares the build system used by the project, its build dependencies, and the configuration of any additional tools. It doesn't deprecate setuptools; it simply gives us a choice of build backends.

We still want to use setuptools, so we declare it in the pyproject.toml file:

[build-system]
requires = ["setuptools", "wheel"]
build-backend = "setuptools.build_meta"

Now that our application uses setuptools, we can create a setup.cfg file to specify our dependencies and the mandatory package metadata:

[metadata]
name = answer
version = 1.0.0

[options]
packages = answer
install_requires =
    numpy
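
The packages = answer line assumes that our code lives in a Python package named answer, with the script from above as its __init__.py. A minimal project layout might look like this (the root directory name is illustrative):

answer-project/
├── pyproject.toml
├── setup.cfg
└── answer/
    └── __init__.py    # contains print_answer()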

Once this file is in place, you can use a PEP 517 package builder, such as build, to create the build artefacts:

$ python -m build
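
If the build succeeds, both artefacts end up in the dist/ directory (a sketch; the tarball name follows from the metadata above):

$ ls dist/
answer-1.0.0-py3-none-any.whl  answer-1.0.0.tar.gz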

The wheel's file name follows a convention specified in PEP 427. It consists of at least five parts separated by dashes. The first two parts represent the package name and the version. The remaining parts describe the compatible Python version, Python ABI, and platform. Our wheel is compatible with any Python 3 version (py3), does not require a specific Python ABI due to the lack of native code (none), and runs on any platform (any).

Anyone with access to the wheel can install it on their system using pip install answer-1.0.0-py3-none-any.whl. Since wheels contain metadata describing their dependencies, pip will resolve those dependencies and install the required packages as well. At this point, we're able to simply pip install our application. This is a massive improvement over our first version, where users had to find and install dependencies manually. Even for users unfamiliar with Python, running a pip install command is feasible. A small price to pay for the answer to the ultimate question of life, the universe, and everything.
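
To see the dependency resolution in action, we can install the wheel into a fresh virtual environment. The output below is abbreviated and illustrative:

$ pip install dist/answer-1.0.0-py3-none-any.whl
…
Installing collected packages: numpy, answer
Successfully installed answer-1.0.0 numpy-1.21.0
$ python -c "from answer import print_answer; print_answer()"
42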

Creating cross-platform Python executables

Obviously, the answer to the ultimate question of life, the universe, and everything should be available to all user groups with the least effort possible. Therefore, we decide to package the script as an executable file using PEX (Python Executable). PEX is a tool for packaging Python applications for multiple platforms or Python interpreters, all within a single file. The resulting file requires nothing but a working Python interpreter on the target system.
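
PEX itself is distributed on PyPI, so the command-line tool can be installed with pip:

$ pip install pex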

PEX provides a --find-links option similar to pip install that we use to point it to the directory containing our wheel. We also specify an --output-file and the --entry-point which gets run when executing the PEX:

$ pex --find-links=dist/ --entry-point answer:print_answer --output-file /tmp/answer.pex answer

The resulting PEX contains the answer application and numpy. We can execute the file directly to print the notorious answer:

$ /tmp/answer.pex
42

This is already great, but the PEX only runs on platforms that have the same operating system, CPU architecture, and Python interpreter as the build platform. How can this be, given that the answer module is pure Python and its wheel is platform-independent? The reason is that the PEX bundles all dependencies as well. We can verify this by unzipping answer.pex and looking at the PEX-INFO file it contains.
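
A PEX is an ordinary zip archive, so standard tools work for this inspection. A sketch, with an arbitrary extraction directory:

$ unzip /tmp/answer.pex PEX-INFO -d /tmp/answer-contents
$ python -m json.tool /tmp/answer-contents/PEX-INFO

The distributions section of the extracted PEX-INFO reads: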

"distributions": {
  "answer-1.0.0-py3-none-any.whl": "050a7cfd82d311aac1c9db4c61a1e4989525336a",
  "numpy-1.21.0-cp39-cp39-manylinux_2_12_x86_64.manylinux2010_x86_64.whl": "6861137c98e4867fb7d488e4027ed0a0b66f0f69"
},

This verifies our theory: the included NumPy wheel is platform-specific. That means we have to include a NumPy wheel for each target platform. Luckily, PEX provides a --python option which we can use to pull in support for Python 3.8 in addition to Python 3.9:

$ pex --python python3.8 --python python3.9 --find-links=dist/ \
  --entry-point answer:print_answer --output-file /tmp/answer.pex answer

Another look at the PEX-INFO file confirms that the new executable contains a NumPy wheel for both Python 3.8 and Python 3.9:

"distributions": {
  "answer-1.0.0-py3-none-any.whl": "050a7cfd82d311aac1c9db4c61a1e4989525336a",
  "numpy-1.21.0-cp38-cp38-manylinux_2_12_x86_64.manylinux2010_x86_64.whl": "e5a1d065f9406507ea7f6a6a8fb4548ca12fe504",
  "numpy-1.21.0-cp39-cp39-manylinux_2_12_x86_64.manylinux2010_x86_64.whl": "6861137c98e4867fb7d488e4027ed0a0b66f0f69"
},

Adding support for additional operating systems or CPU architectures works in much the same way using PEX's --platform option. This is great as long as all dependencies are available for the target platforms. PEX uses pip internally to resolve dependencies, and pip's --platform option, in turn, requires dependencies to be available as Python wheels. If any direct or indirect dependency of the answer application is not available as a wheel for the target platform, the build will fail. Keep that in mind.
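
As a sketch, targeting a specific Linux platform for CPython 3.9 might look like this. The platform string follows PEX's PLATFORM-IMPL-PYVER-ABI convention; the concrete value is an illustrative assumption, not a tested invocation:

$ pex --platform manylinux2014_x86_64-cp-39-cp39 --find-links=dist/ \
  --entry-point answer:print_answer --output-file /tmp/answer.pex answer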

Creating native executables

The previous approach still relies on the fact that the target system has a Python interpreter installed. This may not be the case, though. If we really cannot ask the user to install a Python interpreter on their system, we can ship our own.

Let's start by creating a small native application that starts a Python interpreter. CPython provides an extensive C API for this use case. Rust users have a choice between rust-cpython and PyO3. We will use Rust and PyO3:

use pyo3::prelude::*;

fn main() -> Result<(), ()> {
    // Acquire an initialized Python interpreter and run our entry point.
    Python::with_gil(|py| {
        python_main(py).map_err(|e| {
            // Print the Python traceback before exiting with an error.
            e.print_and_set_sys_last_vars(py);
        })
    })
}

fn python_main(py: Python) -> PyResult<()> {
    // Import the answer module and call its print_answer function.
    let answer = PyModule::import(py, "answer")?;
    let main_func = answer.getattr("print_answer")?;
    main_func.call0()?;
    Ok(())
}
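
For completeness, the corresponding Cargo.toml might look like this. It is a sketch: the PyO3 version reflects mid-2021, and the auto-initialize feature, which lets Python::with_gil initialize an embedded interpreter, is an assumption about the project setup:

[package]
name = "answer-rs"
version = "1.0.0"
edition = "2018"

[dependencies]
pyo3 = { version = "0.14", features = ["auto-initialize"] }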

We define the function python_main, which expects an initialized Python interpreter. The function loads the answer Python module and calls the print_answer function without any arguments.

PyO3 provides the Python::with_gil shorthand to initialize a Python interpreter. The with_gil function takes a callback as an argument and passes the initialized interpreter to that callback. We create a closure and call the python_main function with the Python interpreter provided by the callback.

Let's compile the application and move the executable to the folder where the answer Python module is located. We need to set the PYTHONPATH environment variable so that Python knows where to look for modules to import. Then we can run the Rust binary and it will print the answer to the ultimate question of life, the universe, and everything:

$ PYTHONPATH=. answer-rs
42

A closer look at the Rust binary shows that it dynamically links to libpython:

$ ldd answer-rs
    …
    libpython3.9.so.1.0 => /usr/lib64/libpython3.9.so.1.0 (0x00007f1510d11000)
    …

If we bundle libpython with our application and set LD_LIBRARY_PATH correctly, we will be able to run our application on Linux systems without a Python installation. [1]
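
A deliberately simplified sketch of that manual bundling, assuming the Cargo project layout from above (in practice you also need to ship the Python standard library; see the footnote):

$ mkdir bundle
$ cp target/release/answer-rs bundle/
$ cp -r answer bundle/
$ cp /usr/lib64/libpython3.9.so.1.0 bundle/
$ cd bundle && LD_LIBRARY_PATH=. PYTHONPATH=. ./answer-rs
42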

Tools like pyinstaller and PyOxidizer take care of that bundling step for us. They produce stand-alone bundles of your Python application that don't require Python to be installed on the target system. They determine the extension modules, the dynamically linked system libraries, and the resources loaded by your Python application, and bundle it all together.

This approach works surprisingly well. The drawbacks are significant, though. First of all, anything that's installed on the build host can potentially be pulled into the bundle. Without a tightly controlled build environment, it will be difficult to enforce any kind of license compliance or even determine the correct license of the resulting build artefact. Furthermore, pyinstaller and PyOxidizer don't support cross-compilation, unlike our manual approach with a Rust or C wrapper. And, lastly, distributing a bunch of libraries together with our application makes us responsible for watching out for security fixes of those libraries, and we have to distribute an updated bundle accordingly.

Conclusion

In this article, we looked at three different ways to package a Python application that go beyond just shipping a collection of Python scripts. We looked at packaging the application…

  • As Python wheels and source tarballs
  • As cross-platform Python executables (PEX)
  • As a native executable targeting a single platform

Python packaging has come a long way and we now have numerous viable options for distributing applications. The complexity of the packaging process grows with the simplicity we want to offer the user. I recommend packaging libraries as wheels and source tarballs, and packaging executables, such as command-line tools and applications, as PEX files. Consider a native executable if you already know you will write Python extensions in C or Rust.

I am always interested in your experiences with the topic. Feel free to share them in the comments below or reach out to me directly.


  1. This is an oversimplification, because libpython is not the only dynamically linked library. Notably, the Rust binary and libpython link against libc. Applications built against old versions of glibc will generally run on systems with newer glibc versions, but not the other way around. Assuming your system uses glibc and not another libc, of course… ↩︎