A comprehensive guide through Python packaging (a.k.a. setup scripts)

May 13th, 2012 by exhuma.twn

One of the really useful things in Python are setup scripts. When doing “serious” business, you really should look into them. Setup scripts are amazingly powerful, but they don’t necessarily need to be complex. Because of this flexibility, however, the documentation around them seems like a lot to read. Additionally, the state of packaging has changed quite a bit over the years, so you now have a couple of packages (distutils, setuptools, distribute) which all seem to try to solve the same problem. See the current state of packaging for more details on this.

This post attempts to summarize the important bits using a “Hello World” project and stepping through the process of creating the setup.py file:

  • Creating a package distribution
  • Automatic generation of executables
  • Version numbers
  • Dependency management
  • Publishing your package (thus also making it available for automatic dependency resolution)
  • Some more food for thought.

NOTE: The setup.py script we will construct in this post will use two bits of code which may not work in all cases:

  • importing the package itself (to centralize the version number)
  • Reading the long_description content from a text-file

Both methodologies have their issues. Importing the package only works if you can import it cleanly (i.e. without dependencies) from standard python. Reading the text file only works if the setup.py is called from the proper location.

In most cases both conditions hold. There are corner cases, however, where they do not. If that is the case, you will need to live without these helpful shortcuts. See the comments on this article for more on this.

Our Hello World project

Our project will provide the following:

  • a package called helloworld
  • a method returning the string “Hello World!”
  • an executable printing “Hello World!” to the standard output.

Creating a package distribution

Before you can do any of the aforementioned points, you need to be able to create a package file. In most cases, this would be a tarball (a source distribution). First, you need to think about your file layout:

In Python, every package lives in its own folder, unless you have a one-file module. For the sake of this post, I will assume you have a folder, and leave the one-file module as an exercise to the reader. After all, the differences are not all that great. Let’s also assume, for the sake of simplicity, that we have only one module in our package folder.

Let’s create the package

First, the file layout:

the-helloworld-project
├── helloworld
│   ├── __init__.py
│   └── core.py
└── setup.py

NOTE: This is just an example project, and it could have been implemented as a one-file module (using core.py), or the application logic could have been stuffed into __init__.py. But I assume that multi-file packages are far more common, so I will use an appropriate layout for that. Also, I personally like to always create my projects like this. On the one hand for consistency, and on the other hand it makes it easier to “grow” the project later on. It’s up to you to decide for yourself.

In this case, I named the project root “the-helloworld-project”. Personally, though, I usually just give the root folder the same name as the package. The difference between the root (“the-helloworld-project”) and the package (“helloworld”) is that the package folder contains the actual package code, and the root folder contains packaging metadata and possibly some other development-related stuff (unit tests, fabric files, a readme, the license file, …).

Next, let’s create our business logic. For our target features, we need both a method generating our “Hello World!” message, and one to print it to stdout. Let’s put this into helloworld/core.py:

def get_message():
    return "Hello World!"

def print_message():
    print(get_message())

So far so good. We have our business logic. Yay \o/.

Creating a skeleton setup.py file

Next up, we will create a very simple setup.py file. Let’s also add a README.txt file for good measure (it should be formatted in ReST!):

Description
===========

An example Hello World project.

Next, let’s create a very simple setup script, and put it into the root folder:

from setuptools import setup, find_packages

setup(
    name='helloworld',
    version='1.0',
    packages=find_packages(),
    long_description=open('README.txt').read(),
)

Note that we don’t use distutils, but setuptools. You will be better off with setuptools until distutils2 is available.
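The script above calls find_packages() without saying what it does. Roughly speaking, it collects every folder below the project root that contains an __init__.py file. The following is only a simplified sketch of that idea, not the actual setuptools implementation; the name discover_packages is made up for this illustration:

```python
import os


def discover_packages(root):
    """Return dotted names of all folders under root that contain __init__.py."""
    packages = []
    for dirpath, dirnames, filenames in os.walk(root):
        if dirpath == root:
            continue  # the project root itself is not a package
        if "__init__.py" not in filenames:
            dirnames[:] = []  # not a package: do not descend any further
            continue
        relative = os.path.relpath(dirpath, root)
        packages.append(relative.replace(os.sep, "."))
    return sorted(packages)
```

For the layout shown earlier, this should yield just helloworld. A folder without an __init__.py file (such as a virtual environment) is skipped together with everything below it.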

IMPORTANT:
The package setuptools is actually made available by a package named distribute. You should not install the old setuptools any more!

WARNING

As Ionel Maries Cristian pointed out in the comments, the line long_description=open('README.txt').read() could cause problems. Let’s make it safer…

We know that the README file sits in the same folder as the setup script. So we can use os.path.dirname on __file__ to construct a filename we can reach:

from setuptools import setup, find_packages
from os.path import join, dirname

setup(
    name='helloworld',
    version='1.0',
    packages=find_packages(),
    long_description=open(join(dirname(__file__), 'README.txt')).read(),
)
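If you read more than one file this way, a small helper keeps the setup call tidy and also closes the file handle explicitly. This is just a sketch; the helper name read_text and the utf-8 default are my own assumptions, not something setuptools provides:

```python
import io
import os


def read_text(base_dir, filename, encoding="utf-8"):
    """Return the content of filename, looked up relative to base_dir."""
    path = os.path.join(base_dir, filename)
    with io.open(path, encoding=encoding) as handle:
        return handle.read()


# Hypothetical usage inside setup.py:
#   here = os.path.abspath(os.path.dirname(__file__))
#   setup(..., long_description=read_text(here, "README.txt"))
```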

Now we can try creating a source distribution by running:

python setup.py sdist

There will be some warnings because we filled in neither an author email nor a package URL. For this example we consider this okay!

This will give you a file called dist/helloworld-1.0.tar.gz. You can inspect it if you want using tar tzvf dist/helloworld-1.0.tar.gz.
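If you prefer to inspect the tarball from Python (for example in an automated check), the standard tarfile module can list its members. A minimal sketch; the function name list_sdist is made up for this example:

```python
import tarfile


def list_sdist(path):
    """Return the sorted member names of a gzip-compressed source distribution."""
    with tarfile.open(path, "r:gz") as archive:
        return sorted(archive.getnames())
```

Calling it with the path of the generated tarball should list the same files the tar command shows.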

With this done, we have already completed two of our targets:

  • a package called helloworld (done)
  • a method returning the string “Hello World!” (done)
  • an executable printing “Hello World!” to the standard output (still open)

Testing inside a virtual environment

The details about virtual environments are out of the scope of this document (see the package documentation for more). But because it is highly recommended to use them, we will do so right now. This gives us the ability to test the package installation without polluting our system packages.

So let’s create one:

virtualenv env

This will create an environment called env in our current working folder and install pip and distribute into it. With this in place, we can already test the package installation:

$ ./env/bin/python setup.py install
running install
running bdist_egg
running egg_info
[...]
Processing dependencies for helloworld==1.0
Finished processing dependencies for helloworld==1.0

Now we can test our package within the virtual environment:

$ ./env/bin/python
>>> import helloworld.core as hw
>>> hw.get_message()
'Hello World!'
>>> hw.print_message()
Hello World!

Perfect!

But we still want our executable…

Creating an executable

Following the official distutils documentation, we should specify the scripts keyword argument on the setup method. On the other hand, distribute gives us entry points. Personally, I prefer entry points. So let’s get this done. Modify the setup.py file and add the following:

setup(
    ...
    entry_points={
        'console_scripts':
            ['helloworld = helloworld.core:print_message']
        }
    )

The complete setup.py file should now look like this:

from setuptools import setup, find_packages
from os.path import join, dirname

setup(
    name='helloworld',
    version='1.0',
    packages=find_packages(),
    long_description=open(join(dirname(__file__), 'README.txt')).read(),
    entry_points={
        'console_scripts':
            ['helloworld = helloworld.core:print_message']
        }
    )

What does this mean?

  • helloworld = ...
    Create an executable with the name helloworld (on Windows it will be a .exe file)
  • ... = helloworld.core:...
    The function is found in this module
  • ... :print_message
    Call this function on execution
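To make the three parts of that string a bit more tangible, here is a rough sketch of how such a specification can be resolved into something callable. This is not the code setuptools generates; the function resolve_entry_point is purely illustrative:

```python
import importlib


def resolve_entry_point(spec):
    """Turn 'name = package.module:function' into (name, callable)."""
    name, _, target = spec.partition("=")
    module_name, _, attribute = target.strip().partition(":")
    module = importlib.import_module(module_name)
    return name.strip(), getattr(module, attribute)


# Hypothetical usage with our project installed:
#   name, func = resolve_entry_point("helloworld = helloworld.core:print_message")
#   func()
```

The generated executable does essentially the same thing: it imports the module, looks up the function and calls it.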

Let’s re-install our package:

$ ./env/bin/python setup.py install
...
Installing helloworld script to /path/to/the-helloworld-project/env/bin
...
$

As you can see in the output, the setup script installed an executable into our local bin folder. If you execute it, your method from your package will be executed.

Awesome! We can cross the last item off our basic tasks:

  • a package called helloworld (done)
  • a method returning the string “Hello World!” (done)
  • an executable printing “Hello World!” to the standard output (done)

When running the setup script, you can also use the develop command instead of the install command. This will link the installed package with your source files. In this way, you don’t need to re-install the package when you change the source code. However, when you change something in your setup.py file, you need to re-run it of course!

Additionally, with pip you can instead use pip install -e . to achieve the same as with python setup.py develop with the advantage that pip also has an uninstall command.

Version Numbers

Version numbers are a very important part of your project! You should manage them as soon as possible. They are primarily important in two situations:

  • Installation of the package if an older version exists already (upgrading)
  • Dependency management

If you ever publish a package, it should always stay available! People using your package have the option to “pin” a version in their dependencies. Because of this, you will never know how long a specific version is going to be used. Again: Once published it should never disappear!

In the above example, we “hard-coded” the version number as 1.0. There is a better solution however! It would be nice to give your package users the ability to check the version number. To do that, we will move the version number into our top-level __init__.py file. By convention, the variable should be named __version__. In our package, this will be the only line in the file:

__version__ = '1.0'

This will allow users to do the following:

>>> import helloworld
>>> print(helloworld.__version__)
1.0

This is very convenient information to have!

But now there is one annoyance… We have to edit the version number in two locations: in our setup script and in the __init__.py file. But there’s a simple solution for that. In your setup script, simply import your project and get the version number from there.

WARNING:
As bboe over on reddit pointed out, when doing this, you must be aware that this could fail if importing helloworld triggers an import of one of its dependencies. As the dependency may not be installed yet when the setup script runs, this will raise an ImportError. So either you don’t import the version variable, or you must not trigger an import of any dependency when importing the package.

Other alternatives are possible, such as storing your version number in a separate .py file and importing that one. Or, if all else fails, running a regex on your __init__.py file. You can see implementations of these solutions in the reddit link mentioned before.

import helloworld
setup(
    ...
    version=helloworld.__version__,
    ...

which will give you this result:

from setuptools import setup, find_packages
from os.path import join, dirname
import helloworld

setup(
    name='helloworld',
    version=helloworld.__version__,
    packages=find_packages(),
    long_description=open(join(dirname(__file__), 'README.txt')).read(),
    entry_points={
        'console_scripts':
            ['helloworld = helloworld.core:print_message']
        },
    )
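If importing the package is not an option, the regex alternative mentioned in the warning above could look roughly like this. It is only a sketch: the helper find_version is my own naming, and it assumes the version is assigned as a plain string literal at the start of a line:

```python
import re

# Matches a line such as: __version__ = '1.0'
_VERSION_PATTERN = re.compile(
    r"""^__version__\s*=\s*['"]([^'"]+)['"]""", re.MULTILINE)


def find_version(source):
    """Extract the __version__ string from Python source text."""
    match = _VERSION_PATTERN.search(source)
    if match is None:
        raise RuntimeError("no __version__ assignment found")
    return match.group(1)


# Hypothetical usage inside setup.py:
#   with open(join(dirname(__file__), 'helloworld', '__init__.py')) as handle:
#       version = find_version(handle.read())
```

The package is never imported this way, so missing dependencies cannot break the setup script.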

It is recommended that version numbers follow a well-known scheme. I suggest following the “strict” numbers stated in PEP 386 for distutils. An example from the PEP document (ordered by version number):

    0.4       0.4.0  (these two are equivalent)
    0.4.1
    0.5a1
    0.5b3
    0.5
    0.9.6
    1.0
    1.0.4a3
    1.0.4b1
    1.0.4
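To illustrate why such a scheme matters, here is a small sort key that reproduces the ordering of the list above. It is a sketch covering only a subset of the scheme (dotted numbers with an optional a, b or rc suffix), not a full PEP 386 implementation, and version_key is a hypothetical name:

```python
import re

_STRICT_VERSION = re.compile(r"^(\d+(?:\.\d+)*)(?:(a|b|rc)(\d+))?$")


def version_key(version):
    """Return a sort key for versions like '0.5', '0.5a1' or '1.0.4b1'."""
    match = _STRICT_VERSION.match(version)
    if match is None:
        raise ValueError("unsupported version string: %r" % version)
    release = [int(part) for part in match.group(1).split(".")]
    while len(release) > 1 and release[-1] == 0:
        release.pop()  # treat 0.4 and 0.4.0 as equal
    if match.group(2) is None:
        stage = (1,)  # final releases sort after their pre-releases
    else:
        stage = (0, match.group(2), int(match.group(3)))
    return tuple(release), stage
```

Sorting the versions from the example with key=version_key gives exactly the order shown, with 0.5a1 and 0.5b3 landing before 0.5. A plain string sort would get this wrong.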

Dependency Management

As stated above, with setup scripts, you can specify dependencies. So let’s make a small hello-world web application using Flask. A web-app will also contain “data” files (templates, …), which will give us the opportunity to talk about the manifest later on. But for now, let’s simply add Flask into our dependencies. To do that, add the following to your setup script:

setup(
    ...
    install_requires=[
        'Flask'
    ]
    ...

So the complete setup script now reads:

from setuptools import setup, find_packages
from os.path import join, dirname
import helloworld

setup(
    name='helloworld',
    version=helloworld.__version__,
    packages=find_packages(),
    long_description=open(join(dirname(__file__), 'README.txt')).read(),
    entry_points={
        'console_scripts':
            ['helloworld = helloworld.core:print_message']
        },
    install_requires=[
        'Flask'
    ]
    )

Let’s run the script again, but let’s use pip now: ./env/bin/pip install -e . (note the trailing dot, which refers to the current folder). Alternatively we could have used ./env/bin/python setup.py install or ./env/bin/python setup.py develop, but pip gives us the option of uninstalling. Not that this would matter in our development environment, but it’s good to know it exists. In the output, we will see that this now automagically pulls in everything required. pip will automatically query PyPI to find the latest version of Flask and follow its dependencies.

Okay. So now we have the latest Flask version, and we will start to develop against that version. Because of this, we want to pin the version, so that we (and our users) will not be surprised if the API of Flask changes in the future. To determine the version of Flask, we can run the following:

$ ./env/bin/pip freeze
...
Flask==0.8
...

This will list all packages, including the version number installed in the environment from which pip was run.

This output can also be used as a “requirements file” for pip installations. I won’t cover that topic yet. It’s still new to me ;)

So now we know we got version 0.8. We can now change our setup script to pin our version:

setup(
    ...
    install_requires=[
        'Flask==0.8'
    ]
    ...

You are strongly encouraged to do this as it puts you in control of when and where you will upgrade to newer versions!
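If you have more than one direct dependency, copying the pins by hand gets tedious. The following sketch picks the relevant lines out of the pip freeze output; the function name pinned_requirements is made up, and it only understands simple Name==version lines:

```python
def pinned_requirements(freeze_output, wanted):
    """Pick 'Name==version' lines for the wanted project names."""
    wanted_lower = set(name.lower() for name in wanted)
    pins = []
    for line in freeze_output.splitlines():
        name, separator, version = line.strip().partition("==")
        if separator and name.lower() in wanted_lower:
            pins.append("%s==%s" % (name, version))
    return pins


# Hypothetical usage: paste the result into install_requires.
#   pinned_requirements(output_of_pip_freeze, ["Flask"])
```

Only the packages you depend on directly are listed this way; their own dependencies stay out of your setup script.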

Now, as a small interlude, let’s create a one-page web application based on Flask’s integrated server. This will also give us the opportunity to revisit the creation of executables. In a production environment, you should not do this, but rather deploy a WSGI application. But this post is not about web applications. We just want some “data” files!

Create a folder for our templates (which incidentally contains our only data file):

$ mkdir helloworld/templates

… and put the following file into it, called index.html:

<!DOCTYPE HTML>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title></title>
</head>
<body>
    {{message}}
</body>
</html>

Next, create the Flask entry point inside helloworld/web.py:

from flask import Flask, render_template
from helloworld.core import get_message
app = Flask(__name__)


@app.route("/")
def hello():
    return render_template('index.html',
        message=get_message())


def run_server():
    app.run()

… and add the entry-point to setup.py:

from setuptools import setup, find_packages
from os.path import join, dirname
import helloworld

setup(
    name='helloworld',
    version=helloworld.__version__,
    packages=find_packages(),
    long_description=open(join(dirname(__file__), 'README.txt')).read(),
    entry_points={
        'console_scripts': [
            'helloworld = helloworld.core:print_message',
            'serve = helloworld.web:run_server',
            ]
        },
    install_requires=[
        'Flask==0.8'
    ]
    )

Execute the setup script again (to make the new executable available):

$ ./env/bin/python setup.py develop

… and test the server:

$ ./env/bin/serve
 * Running on http://127.0.0.1:5000/

Good! We have everything in place to talk about the manifest.

The MANIFEST

There is still one problem with our project. By default the package only contains Python files. If you need to add additional data, for example template or image files in a web application, you need to specify these. The above example contains a template. So let’s see what this means for us. Let’s create a new source distribution:

$ python setup.py sdist

Now, inspect the contents of the tarball:

$ tar tzvf dist/helloworld-1.0.tar.gz
drwxrwxr-x exhuma/exhuma     0 2012-05-13 15:02 helloworld-1.0/
-rw-rw-r-- exhuma/exhuma   443 2012-05-13 14:55 helloworld-1.0/setup.py
drwxrwxr-x exhuma/exhuma     0 2012-05-13 15:02 helloworld-1.0/helloworld/
-rw-rw-r-- exhuma/exhuma    91 2012-05-13 12:46 helloworld-1.0/helloworld/core.py
-rw-rw-r-- exhuma/exhuma   249 2012-05-13 14:57 helloworld-1.0/helloworld/web.py
-rw-rw-r-- exhuma/exhuma    20 2012-05-13 14:34 helloworld-1.0/helloworld/__init__.py
-rw-rw-r-- exhuma/exhuma   264 2012-05-13 15:02 helloworld-1.0/PKG-INFO
drwxrwxr-x exhuma/exhuma     0 2012-05-13 15:02 helloworld-1.0/helloworld.egg-info/
-rw-rw-r-- exhuma/exhuma    10 2012-05-13 15:02 helloworld-1.0/helloworld.egg-info/requires.txt
-rw-rw-r-- exhuma/exhuma   264 2012-05-13 15:02 helloworld-1.0/helloworld.egg-info/PKG-INFO
-rw-rw-r-- exhuma/exhuma    96 2012-05-13 15:02 helloworld-1.0/helloworld.egg-info/entry_points.txt
-rw-rw-r-- exhuma/exhuma    11 2012-05-13 15:02 helloworld-1.0/helloworld.egg-info/top_level.txt
-rw-rw-r-- exhuma/exhuma     1 2012-05-13 15:02 helloworld-1.0/helloworld.egg-info/dependency_links.txt
-rw-rw-r-- exhuma/exhuma   285 2012-05-13 15:02 helloworld-1.0/helloworld.egg-info/SOURCES.txt
-rw-rw-r-- exhuma/exhuma    59 2012-05-13 15:02 helloworld-1.0/setup.cfg
-rw-rw-r-- exhuma/exhuma    57 2012-05-13 12:50 helloworld-1.0/README.txt

As you can see, the templates are not included! So our package will not work if we distribute it like this! To fix this, we need to tell Python that we want other non-source files to be added. We can do this easily with a file called MANIFEST.in. In our case, it will look like this:

recursive-include helloworld/templates *.html

Finally, we need to add the following line to the setup script:

setup(
    ...
    include_package_data=True,
    ...
    )

When running the setup script, it will create a folder with the extension .egg-info. If you make changes to your MANIFEST.in file, you should delete this folder. It contains a file named SOURCES.txt which lists all the files in the package. If you remove files from your manifest, they will only disappear from the package when that file is changed as well. So the easiest way is to delete the egg-info folder and let the setup script re-create it!

There are other ways to address the issue of data files. Most notably, as of Python 2.7, the setup procedure will automatically create a manifest based on package_data and data_files. The advantage of this is that your setup specs are no longer split into multiple files. As it is fairly new, and as Python 2.7 is only now gaining traction, I have not yet worked with this. I leave this as an exercise to the reader.

If we now re-create the sdist, we see that the files are included in the tarball. We need to do this for all non-source files which we want to include in our project! You can find more detailed information on this in the official documentation.

Accessing packaged files

When you need to access these packaged data files you cannot simply use open(filename). You don’t know the filename! It could even be stored inside a .egg. This all depends on how the end-user will install the package. You don’t have control over this! For this reason, distribute provides a couple of methods to access these files. The most likely candidates are:

  • resource_stream, which returns a file-like object,
  • resource_string, which returns a string, and
  • resource_filename, which returns a path on the filesystem.

Note that when using resource_filename the resource may be (in the case of a zipped installation) extracted to a cache folder. See the pkg_resources documentation (resource extraction) for details on this behaviour.
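As an illustration of the idea, here is a sketch that loads a packaged file without knowing where the package was installed. Note that it uses pkgutil.get_data from the standard library as a stand-in, not the pkg_resources functions listed above; the helper name load_resource_text is my own:

```python
import pkgutil


def load_resource_text(package, resource, encoding="utf-8"):
    """Return a packaged data file as text, wherever the package lives."""
    data = pkgutil.get_data(package, resource)
    if data is None:
        raise IOError("cannot load %r from package %r" % (resource, package))
    return data.decode(encoding)


# Hypothetical usage inside the helloworld project:
#   html = load_resource_text("helloworld", "templates/index.html")
```

The resource name is always written with forward slashes and is relative to the package folder.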

Publishing

If you want to make your package available to others, the obvious choice is pypi. Python makes it easy to publish. Simply run:

$ python setup.py register sdist upload

See the hitchhiker’s guide for more details on this.

Before you publish on pypi you may want to test your upload process. For this reason, there is http://testpypi.python.org/pypi
To use this (or any other package index), you need to specify the details in your ~/.pypirc. Here is an example:

[distutils]
index-servers =
    pypi
    ppt

[pypi]
username:yourlogin
password:yourpasswd

[ppt]
repository: http://testpypi.python.org/pypi
username:test
password:test

Having this, you can then upload packages to the given repositories by appending -r ppt for example.
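The file is plain INI syntax, so you can check what you configured with the standard configparser module. A small sketch, assuming Python 3 (the module is spelled differently on Python 2) and using a made-up function name index_servers:

```python
import configparser


def index_servers(pypirc_text):
    """Map each declared index server name to its settings."""
    parser = configparser.RawConfigParser()
    parser.read_string(pypirc_text)
    # index-servers is a multi-line value: one server name per line
    names = parser.get("distutils", "index-servers").split()
    return dict((name, dict(parser.items(name))) for name in names)
```

Fed with the example above, this should return one entry for pypi and one for ppt, the latter including the repository URL.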

I was testing this while I was writing this and it triggered an infinite recursion. Following the comment of Richard Jones it should work. Maybe someone can shed some light on this?

But corporate policy may prevent you from publishing to the public domain. Or your project contains sensitive data which should not be made public. In this case, it is very easy to set up a local repository of packages. You only need to have a HTML document with the links available. A very easy way is to set up an Apache host which indexes a folder into which you upload your packages.

Even if not behind corporate restrictions, this is an easy way to publish pre-release packages which you don’t want to push to pypi just yet.
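If you don’t want to rely on Apache’s folder indexing, the “HTML document with the links” can also be generated by hand. A minimal sketch, assuming Python 3 and that the package files sit next to the generated page; build_index is a hypothetical helper:

```python
from html import escape


def build_index(filenames):
    """Return a minimal HTML page linking to every given package file."""
    links = [
        '<a href="%s">%s</a><br>' % (escape(name, quote=True), escape(name))
        for name in sorted(filenames)
    ]
    return "<!DOCTYPE html>\n<html><body>\n%s\n</body></html>\n" % "\n".join(links)
```

Write the result to an index.html file in the folder that holds your tarballs and serve that folder with any web server.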

To be able to use these repositories for your dependencies, you need to add the links into your setup.py file. Let’s assume you create a new package depending on our helloworld package which we published on our local repository, and that your local repository is available as http://our.local.repo/. Then you would create a setup script like this:

from setuptools import setup, find_packages
from os.path import join, dirname

setup(
    name='myotherproject',
    version='1.0',
    packages=find_packages(),
    long_description=open(join(dirname(__file__), 'README.txt')).read(),
    install_requires=[
        'helloworld==1.0'
    ],
    dependency_links = ['http://our.local.repo'])

Alternatively, when using pip, you can specify URLs at which to look for packages using pip install -f ....

Food for thought

As promised, here’s another tidbit. For many python beginners, the setup.py file is some sort of black magic and must contain the call to the setup method and nothing else. But that is not true! The setup script is a completely normal python file which you execute. This has an awesome implication: You can do whatever you want before and after the call to setup. Everything you execute before the call to setup, will obviously be run before the setup process, everything after the setup call will in turn be run after the installation is finished. I will let your imagination run wild with what you can do. Keep in mind though, that you may want to keep away from user-prompts and so forth to avoid breaking automated installations! But it is your project and your choice what you will or won’t do.

In most cases, however, this level of fine-tuning is not needed: distribute and distutils already offer a lot of functionality. Investigate these first! Only do such things if there is really no other way around! Or if you want to play around ;)

Comments

  • http://blog.tplus1.com Matt Wilson

    I also posted this on the reddit page for your post.

    Glad to see the explanation of how to write MANIFEST.in to include the .html files. It would be an even better article if there was an example of how to use pkg_resources to fetch those data files back out. FWIW, here’s what I would do:

    t = pkg_resources.resource_filename('helloworld.templates', 'index.html')
    return …

    Of course, sometimes you don’t want a filename; you maybe want a file-like object, so you should use pkg_resources.resource_stream then, and other times you want the data loaded in a string, so you can use pkg_resources.resource_string in that case.

    And what’s so great about entry points vs old-fashioned scripts? With an old-fashioned script, it is usually very easy to see where your command-line parsing is separate from your interaction with inner libraries. Using arguments and options with entry points requires importing sys and accessing sys.argv from some inner code, which just feels odd. But what’s the advantage of entry points?

  • exhuma

    You are absolutely right. After writing all this text, my brain got tired, and my fingers became dyslexic. I have now replaced the pkg_resources link with a small paragraph about this in the document.

  • http://profiles.google.com/ionel.mc Ionel Maries Cristian

    This:

    long_description=open('README.txt').read(),

    is wrong. It will fail when you run the setup script in a different directory (some tools do that).

  • exhuma

    While I am not aware of any tool that would do this, you are right. And it can be made safer. I updated the article using __file__ to determine the proper path.
    Thanks for your input :)

  • Richard Jones

    Thanks for the awesome post! Could I make a small request that you include mention of testpypi.python.org as a server to push packages at when learning / testing things out? It’s very, very poorly marketed at the moment and mention here would help kick that off :-)

  • http://twitter.com/voidspace Michael Foord

    Actually, I’m *pretty sure* that setup.py will fail *unless* run from the correct directory. So I think it’s safe – but the fix does no harm anyway.

  • BaltoRouberol

    This is awesome, thanks. With so many modules addressing the same problem, it’s hard to figure out how to get started.

  • exhuma

    I will. Just a small question upfront. Does everything work exactly the same way as with pypi? In that case, I can also include a small section about .pypirc. At the time I wrote this article, I did not find a good example case for .pypirc. This would be perfect for that!

  • Nicholas Retallack

    To add some complication to this, if you want to build other package types like RPM (via bdist_rpm), setup will copy your whole source tree including the setup.py file, and then run it again in its new environment, with different arguments.  This can be frustrating.  I ended up adding some hooks into my setup.py file to detect whether it was in the first or second run, and change the arguments on the second run to match the first.

    Also frustrating is attempting to build more than one package from the same setup.py file.  Due to the copy-and-re-run thing, it will always look for a file called setup.py in the second phase regardless of what the first file was named, so you can’t really place two install scripts in the same place.  It doesn’t know the arguments from the first run either, so it will forget which package it was building. If you have any comments to make these use cases easier, I’d love to hear em.

  • http://pythonpackages.com/ Alex Clark

     Whoah, poorly marketed indeed. Good to know about this, thanks!

  • Richard Jones

    Yes, it works identically to the Real PyPI. Thanks!

  • exhuma

    Interesting… I never ran into such problems… but then again, I only package as source dists. I never had the need to do a bdist. But when the day comes, I hope I’ll remember your comment ;)

    I will leave this as a comment and not edit it into the main article, as – in my opinion – this is something quite specific. And I don’t want to overcomplicate it.

    Unless someone thinks this is essential of course… ;)

  • guesty-guest-guest

    For a beginner, this is definitely amazeballs. Thank you.

  • Anon

    Excellent guide –  thanks! I’m missing a note to set include_package_data=True though; I thought this was required for MANIFEST.in to work correctly?

  • Anon

    Also, according to the following comment, `python setup.py develop` is obsolete with pip: http://stackoverflow.com/questions/3606457/removing-python-module-installed-in-develop-mode#comment10771810_3606457

  • exhuma

    Interesting. Good to know. I’ll edit it into the post!

  • exhuma

    I tried it and got a max recursion error… I added the section to the post. Any idea what might go wrong?

  • exhuma

    You are correct… I think this was a copy paste error ;)
    It’s fixed now. I also added a tiny bit of explanation around this topic. :) Thanks for the spot!

  • http://www.facebook.com/marco.massenzio Marco Massenzio

    As a relative beginner, I must say this post is really awesome: I could put it immediately into practice on a project I’m working on, improving (a lot) the packaging – and as a fellow blogger, I can tell you put a lot of effort into it: much appreciated, this is what makes our developers community even stronger!

    One minor nit: you may want to mention the bdist_egg command (I noticed -eventually- that it’s listed as a step of the ‘install’ command, but it was not obvious without googling for it)

  • exhuma

    Thanks :)

    About the bdist_egg command: I never really saw the need for it. So far everything I worked with worked fine as a source dist. Granted, I never packaged anything with C extensions. But would a simple “bdist” not do the trick as well in that case?
