A comprehensive guide through Python packaging (a.k.a. setup scripts)
One of the really useful things in Python is the setup script. When doing “serious” business, you really should look into them. Setup scripts are amazingly powerful, but they don’t need to be complex. Because of their flexibility, however, the documentation around them is a lot to read. Additionally, the state of packaging has changed quite a bit over the years, so there are now several packages (distutils, setuptools, distribute) which all seem to solve the same problem. See the current state of packaging for more details on this.
This post attempts to summarize the important bits using a “Hello World” project and stepping through the process of creating the setup.py file:
- Creating a package distribution
- Automatic generation of executables
- Version numbers
- Dependency management
- Publishing your package (thus also making it available for automatic dependency resolution)
- Some more food for thought.
NOTE: The setup script developed in this post relies on two shortcuts:
- importing the package itself (to centralize the version number)
- reading the long_description content from a text file
Both have their issues. Importing the package only works if it can be imported cleanly (i.e. without dependencies) from a plain Python installation. Reading the text file only works if setup.py is called from the proper location.
In most cases these shortcuts work fine. There are corner cases, however, where they are not possible. If that is the case, you will need to live without them. See the comments on this article for more on this.
Our Hello World project
Our project will provide the following:
- a package called helloworld
- a method returning the string “Hello World!”
- an executable printing “Hello World!” to the standard output.
Creating a package distribution
Before you can do any of the aforementioned points, you need to be able to create a package file. In most cases, this would be a tarball (a source distribution). First, you need to think about your file layout:
In Python, every package lives in its own folder, unless it is a one-file module. For the sake of this post, I will assume you have a folder and leave the one-file module as an exercise to the reader; after all, the differences are not all that great. Let’s also assume, for simplicity, that we have only one module in our package folder.
Let’s create the package
First, the file layout:
the-helloworld-project
├── helloworld
│   ├── __init__.py
│   └── core.py
└── setup.py
NOTE: This is just an example project, and it could have been implemented as a one-file module (using core.py), or the application logic could have been stuffed into __init__.py. But I assume that multi-file packages are far more common, so I will use an appropriate layout for that. Also, I personally like to always create my projects like this. On the one hand for consistency, and on the other hand it makes it easier to “grow” the project later on. It’s up to you to decide for yourself.
In this case, I named the project root “the-helloworld-project”. Usually, though, I just give the root folder the same name as the package. The difference between the root (“the-helloworld-project”) and the package (“helloworld”) is that the package folder contains the actual package code, while the root folder contains packaging metadata and possibly other development-related files (unit tests, fabric files, a readme, the license file, …).
Next, let’s create our business logic. For our target features, we need both a method generating our “Hello World!” message, and one to print it to stdout. Let’s put this into helloworld/core.py:
def get_message():
    return "Hello World!"


def print_message():
    print(get_message())
So far so good. We have our business logic. Yay \o/.
Creating a skeleton setup.py file
Before writing the setup script, let’s add a README.txt file for good measure (it should be formatted in ReST!):
Hello World
===========
An example Hello World project.
Next, let’s create a very simple setup script, and put it into the root folder:
from setuptools import setup, find_packages

setup(
    name='helloworld',
    version='1.0',
    packages=find_packages(),
    long_description=open('README.txt').read(),
)
Note that we don’t use distutils, but setuptools. You will be better off with setuptools until distutils2 is available.
IMPORTANT:
The package setuptools is actually made available by a package named distribute. You should not install the old setuptools any more!
WARNING
As Ionel Maries Cristian pointed out in the comments, the line long_description=open('README.txt').read() could cause problems. Let’s make it safer…
We know that the README file sits in the same folder as the setup script. So we can use os.path.dirname on __file__ to construct a filename we can reach:
from os.path import join, dirname
from setuptools import setup, find_packages

setup(
    name='helloworld',
    version='1.0',
    packages=find_packages(),
    long_description=open(join(dirname(__file__), 'README.txt')).read(),
)
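As a rough illustration of why the path is built from __file__, here is a small sketch of a helper that reads a file sitting next to a given script, no matter what the current working directory is. The helper name read_relative and the temporary folder are made up for this example; they are not part of the original setup script.

```python
import os
import tempfile


def read_relative(base_file, name):
    """Return the text of `name`, looked up in the folder of `base_file`."""
    here = os.path.dirname(os.path.abspath(base_file))
    with open(os.path.join(here, name)) as handle:
        return handle.read()


# Demo: create a stand-in project folder containing a README.txt, then read
# it while the current working directory points somewhere else entirely.
project = tempfile.mkdtemp()
with open(os.path.join(project, "README.txt"), "w") as handle:
    handle.write("An example Hello World project.\n")

fake_setup_py = os.path.join(project, "setup.py")  # stands in for __file__
long_description = read_relative(fake_setup_py, "README.txt")
print(long_description)
```

In a real setup.py you would pass __file__ as the first argument. Using a with block also makes sure the file handle is closed again.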
Now we can test this by creating a source distribution:
$ python setup.py sdist
There will be some warnings because we filled in neither an author email nor a package URL. For this example, that is okay!
This will give you a file called dist/helloworld-1.0.tar.gz. You can inspect it, if you want, using tar tzvf dist/helloworld-1.0.tar.gz.
With this done, we already have two points of our targets done:
- a package called helloworld (done)
- a method returning the string “Hello World!” (done)
- an executable printing “Hello World!” to the standard output.
Testing inside a virtual environment
The details about virtual environments are out of the scope of this document (see the package documentation for more). But because it is highly recommended to use them, we will do so right now. This gives us the ability to test the package installation without polluting our system packages.
So let’s create one:
This will create an environment called env in our current working folder and install pip and distribute into it. With this in place, we can already test the package installation:
running install
running bdist_egg
running egg_info
[...]
Processing dependencies for helloworld==1.0
Finished processing dependencies for helloworld==1.0
Now we can test our package within the virtual environment:
>>> import helloworld.core as hw
>>> hw.get_message()
'Hello World!'
>>> hw.print_message()
Hello World!
Perfect!
But we still want our executable…
Creating an executable
Following the official distutils documentation, we should specify the scripts keyword argument on the setup method. On the other hand, distribute gives us entry points. Personally, I prefer entry points. So let’s get this done. Modify the setup.py file and add the following:
...
entry_points={
'console_scripts':
['helloworld = helloworld.core:print_message']
}
)
The complete setup.py file should now look like this:
from os.path import join, dirname
from setuptools import setup, find_packages

setup(
    name='helloworld',
    version='1.0',
    packages=find_packages(),
    long_description=open(join(dirname(__file__), 'README.txt')).read(),
    entry_points={
        'console_scripts':
            ['helloworld = helloworld.core:print_message']
    }
)
What does this mean?
- helloworld = ...
  Create an executable with the name helloworld (on Windows it will be a .exe file).
- ... = helloworld.core:...
  The method is found in this module (helloworld/core.py).
- ... :print_message
  Call this method on execution.
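To make the three parts of the entry point string more tangible, here is a simplified sketch of how such a string can be resolved to a callable. This is only an approximation for illustration, not the code the generated executable actually runs. The function name load_entry_point_target is made up, and string:capwords from the standard library is used as the target so the sketch runs without helloworld being installed.

```python
import importlib


def load_entry_point_target(spec):
    """Resolve a 'name = package.module:function' string to (name, callable)."""
    name, _, target = spec.partition("=")
    module_name, _, attribute = target.strip().partition(":")
    module = importlib.import_module(module_name)
    return name.strip(), getattr(module, attribute)


# For our project the string would be
# 'helloworld = helloworld.core:print_message'.
script_name, function = load_entry_point_target("greet = string:capwords")
greeting = function("hello world")
print(script_name, greeting)
```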
Let’s re-install our package:
...
Installing helloworld script to /path/to/the-helloworld-project/env/bin
...
$
As you can see in the output, the setup script installed an executable into our local bin folder. If you execute it, your method from your package will be executed.
Awesome! We can cross the last item off our basic tasks:
- a package called helloworld (done)
- a method returning the string “Hello World!” (done)
- an executable printing “Hello World!” to the standard output (done)
Additionally, with pip you can instead use pip install -e . to achieve the same as with python setup.py develop with the advantage that pip also has an uninstall command.
Version Numbers
Version numbers are a very important part of your project! You should manage them as soon as possible. They are primarily important in two situations:
- Installation of the package if an older version exists already (upgrading)
- Dependency management
If you ever publish a package, it should always stay available! People using your package have the option to “pin” a version in their dependencies. Because of this, you will never know how long a specific version is going to be used. Again: Once published it should never disappear!
In the above example, we “hard-coded” the version number as 1.0. There is a better solution, however! It would be nice to give your package users the ability to check the version number. To do that, we will move the version number into our top-level __init__.py file. By convention, the variable should be named __version__. In our package, this will be the only line in the file:
__version__ = '1.0'
This will allow users to do the following:
>>> import helloworld
>>> print(helloworld.__version__)
1.0
This is very convenient information to have!
But now there is one annoyance… We have to edit the version number in two locations. In our setup script, and the __init__.py file. But there’s a simple solution for that. In your setup script, simply import your project and get the version number from there.
WARNING:
As bboe over on reddit pointed out, when doing this, you must be aware that this could fail if the project tries to import one of its dependencies when importing helloworld. As the dependency may not be installed yet when the setup script runs, it will raise an ImportError. So, either you don’t import the version variable, or you must not trigger an import of any dependency when importing the package.
Other alternatives are possible, such as storing your version number in a separate .py file and importing that one. Or, if all else fails, running a regex on your __init__.py file. You can see implementations of these solutions in the reddit link mentioned before.
import helloworld

setup(
    ...
    version=helloworld.__version__,
    ...
)
which will give you this result:
from os.path import join, dirname
from setuptools import setup, find_packages

import helloworld

setup(
    name='helloworld',
    version=helloworld.__version__,
    packages=find_packages(),
    long_description=open(join(dirname(__file__), 'README.txt')).read(),
    entry_points={
        'console_scripts':
            ['helloworld = helloworld.core:print_message']
    },
)
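The regex alternative mentioned in the warning above could look roughly like the following sketch. It reads __init__.py as plain text, so nothing gets imported. The names VERSION_PATTERN and find_version are made up for this example, and the pattern assumes the version is assigned as a simple quoted string on its own line.

```python
import os
import re
import tempfile

# Matches lines such as: __version__ = '1.0'
VERSION_PATTERN = re.compile(
    r"^__version__\s*=\s*['\"]([^'\"]+)['\"]", re.MULTILINE)


def find_version(init_path):
    """Extract the __version__ string from a file without importing it."""
    with open(init_path) as handle:
        match = VERSION_PATTERN.search(handle.read())
    if match is None:
        raise RuntimeError("No __version__ found in %s" % init_path)
    return match.group(1)


# Demo with a throwaway __init__.py so the sketch runs on its own.
package_dir = tempfile.mkdtemp()
init_file = os.path.join(package_dir, "__init__.py")
with open(init_file, "w") as handle:
    handle.write("__version__ = '1.0'\n")

version = find_version(init_file)
print(version)
```

In the setup script you would then pass version=find_version(...) with the path to helloworld/__init__.py instead of importing helloworld.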
It is recommended that version numbers follow a well-known scheme. There are a few; I suggest following the “strict” numbers stated in PEP 386 for distutils. An example from the PEP document (ordered by version number):
0.4.1
0.5a1
0.5b3
0.5
0.9.6
1.0
1.0.4a3
1.0.4b1
1.0.4
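To see why the list above is ordered the way it is, here is a small sketch of a sort key that reproduces this ordering. It is a deliberately reduced illustration, not the comparison code from the PEP: it only understands dotted release numbers with an optional a, b, c or rc suffix, it treats 1.0 and 1.0.0 as different versions, and the name version_key is made up.

```python
import re

_VERSION_RE = re.compile(r"^(\d+(?:\.\d+)*)(?:([abc]|rc)(\d+))?$")
_STAGE_RANK = {"a": 0, "b": 1, "c": 2, "rc": 2}
_FINAL_RANK = 3  # a final release sorts after all of its pre-releases


def version_key(version):
    """Return a tuple that sorts pre-releases before the final release."""
    match = _VERSION_RE.match(version)
    if match is None:
        raise ValueError("unsupported version string: %r" % version)
    release = tuple(int(part) for part in match.group(1).split("."))
    stage, number = match.group(2), match.group(3)
    if stage is None:
        return (release, _FINAL_RANK, 0)
    return (release, _STAGE_RANK[stage], int(number))


shuffled = ["1.0.4", "0.5", "1.0.4a3", "0.4.1", "1.0", "0.5b3",
            "1.0.4b1", "0.9.6", "0.5a1"]
ordered = sorted(shuffled, key=version_key)
print(ordered)
```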
Dependency Management
As stated above, with setup scripts, you can specify dependencies. So let’s make a small hello-world web application using Flask. A web-app will also contain “data” files (templates, …), which will give us the opportunity to talk about the manifest later on. But for now, let’s simply add Flask into our dependencies. To do that, add the following to your setup script:
...
install_requires=[
'Flask'
]
...
So the complete setup script now reads:
from os.path import join, dirname
from setuptools import setup, find_packages

import helloworld

setup(
    name='helloworld',
    version=helloworld.__version__,
    packages=find_packages(),
    long_description=open(join(dirname(__file__), 'README.txt')).read(),
    entry_points={
        'console_scripts':
            ['helloworld = helloworld.core:print_message']
    },
    install_requires=[
        'Flask'
    ]
)
Let’s run the script again, but let’s use pip now: ./env/bin/pip install -e . Alternatively, we could have used ./env/bin/python setup.py install or ./env/bin/python setup.py develop, but pip gives us the option of uninstalling. Not that this would matter in our development environment, but it’s good to know it exists. In the output, we will see that this now automagically pulls in everything required. The installer will automatically query PyPI to find the latest version of Flask and follow its dependencies.
Okay. So now we have the latest Flask version, and we will start to develop against that version. Because of this, we want to pin the version, so that we (and our users) will not be surprised if the API of Flask changes in the future. To determine the installed version of Flask, we can run pip freeze:
$ ./env/bin/pip freeze
...
Flask==0.8
...
This will list all packages, including the version number installed in the environment from which pip was run.
This output can also be used as a “requirements file” for pip installations. I won’t cover that topic yet. It’s still new to me 😉
So now we know we got version 0.8. We can now change our setup script to pin our version:
...
install_requires=[
'Flask==0.8'
]
...
You are strongly encouraged to do this, as it puts you in control of when and where you upgrade to newer versions!
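If you want to automate looking up the installed version, the name==version lines printed earlier are easy to process. The following is a sketch only: parse_freeze is a made-up helper, and apart from the Flask line the sample output is invented for illustration.

```python
def parse_freeze(text):
    """Map package names to versions from 'name==version' lines."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments and anything that is not a plain pin.
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, _, version = line.partition("==")
        pins[name] = version
    return pins


# Only the Flask line is taken from the post; the other two are illustrative.
SAMPLE_OUTPUT = """\
Flask==0.8
Jinja2==2.6
Werkzeug==0.8.3
"""

pins = parse_freeze(SAMPLE_OUTPUT)
requirement = "Flask==%s" % pins["Flask"]
print(requirement)
```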
Now, as a small interlude, let’s create a one-page web application based on Flask’s integrated server. This will also give us the opportunity to revisit the creation of executables. In a production environment, you should not do this, but rather deploy a WSGI application. But this post is not about web applications. We just want some “data” files!
Create a folder for our templates (which incidentally contains our only data file). Flask looks for templates in a folder named templates next to the application module, so in our case that is helloworld/templates.
… and put the following file into it, called index.html:
<html lang="en">
<head>
<meta charset="UTF-8">
<title></title>
</head>
<body>
{{message}}
</body>
</html>
Next, create the Flask entry point inside helloworld/web.py:
from flask import Flask, render_template

from helloworld.core import get_message

app = Flask(__name__)


@app.route("/")
def hello():
    return render_template('index.html',
                           message=get_message())


def run_server():
    app.run()
… and add the entry-point to setup.py:
from os.path import join, dirname
from setuptools import setup, find_packages

import helloworld

setup(
    name='helloworld',
    version=helloworld.__version__,
    packages=find_packages(),
    long_description=open(join(dirname(__file__), 'README.txt')).read(),
    entry_points={
        'console_scripts': [
            'helloworld = helloworld.core:print_message',
            'serve = helloworld.web:run_server',
        ]
    },
    install_requires=[
        'Flask==0.8'
    ]
)
Execute the setup script again (to make the new executable available):
… and test the server:
* Running on http://127.0.0.1:5000/
Good! We have everything in place to talk about the manifest.
The MANIFEST
There is still one problem with our project. By default, the package only contains Python files. If you need to add additional data, for example templates or image files in a web application, you need to specify these. The above example contains a template. So let’s see what this means for us. Let’s create a new source distribution:
Now, inspect the contents of the tarball:
drwxrwxr-x exhuma/exhuma 0 2012-05-13 15:02 helloworld-1.0/
-rw-rw-r-- exhuma/exhuma 443 2012-05-13 14:55 helloworld-1.0/setup.py
drwxrwxr-x exhuma/exhuma 0 2012-05-13 15:02 helloworld-1.0/helloworld/
-rw-rw-r-- exhuma/exhuma 91 2012-05-13 12:46 helloworld-1.0/helloworld/core.py
-rw-rw-r-- exhuma/exhuma 249 2012-05-13 14:57 helloworld-1.0/helloworld/web.py
-rw-rw-r-- exhuma/exhuma 20 2012-05-13 14:34 helloworld-1.0/helloworld/__init__.py
-rw-rw-r-- exhuma/exhuma 264 2012-05-13 15:02 helloworld-1.0/PKG-INFO
drwxrwxr-x exhuma/exhuma 0 2012-05-13 15:02 helloworld-1.0/helloworld.egg-info/
-rw-rw-r-- exhuma/exhuma 10 2012-05-13 15:02 helloworld-1.0/helloworld.egg-info/requires.txt
-rw-rw-r-- exhuma/exhuma 264 2012-05-13 15:02 helloworld-1.0/helloworld.egg-info/PKG-INFO
-rw-rw-r-- exhuma/exhuma 96 2012-05-13 15:02 helloworld-1.0/helloworld.egg-info/entry_points.txt
-rw-rw-r-- exhuma/exhuma 11 2012-05-13 15:02 helloworld-1.0/helloworld.egg-info/top_level.txt
-rw-rw-r-- exhuma/exhuma 1 2012-05-13 15:02 helloworld-1.0/helloworld.egg-info/dependency_links.txt
-rw-rw-r-- exhuma/exhuma 285 2012-05-13 15:02 helloworld-1.0/helloworld.egg-info/SOURCES.txt
-rw-rw-r-- exhuma/exhuma 59 2012-05-13 15:02 helloworld-1.0/setup.cfg
-rw-rw-r-- exhuma/exhuma 57 2012-05-13 12:50 helloworld-1.0/README.txt
As you can see, the templates are not included! So our package will not work if we distribute it like this! To fix this, we need to tell Python that we want other non-source files to be added. We can do this easily with a file called MANIFEST.in. In our case, a single line like the following is enough:
recursive-include helloworld/templates *
Finally, we need to add the following line to the setup script:
...
include_package_data=True,
...
)
There are other ways to address the issue of data files. Most notably, as of Python 2.7, the setup procedure will automatically create a manifest based on package_data and data_files. The advantage of this is that your setup specs are no longer split into multiple files. As it is fairly new, and as Python 2.7 is only now gaining traction, I have not yet worked with this. I leave this as an exercise to the reader.
If we now re-create the sdist, we see that the files are included in the tarball. We need to do this for all non-source files which we want to include in our project! You can find more detailed information on this in the official documentation.
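If you prefer checking the tarball from Python instead of reading the tar listing by eye, the standard library tarfile module can do it. The sketch below builds a tiny stand-in archive so that it runs on its own; with a real project you would point members_with_suffix (a made-up helper name) at the file created by sdist.

```python
import io
import os
import tarfile
import tempfile


def members_with_suffix(sdist_path, suffix):
    """List the regular files in a .tar.gz whose names end with `suffix`."""
    with tarfile.open(sdist_path, "r:gz") as archive:
        return [member.name for member in archive.getmembers()
                if member.isfile() and member.name.endswith(suffix)]


def _add_text(archive, name, text):
    """Add an in-memory text file to an open archive (demo helper)."""
    data = text.encode("utf-8")
    info = tarfile.TarInfo(name)
    info.size = len(data)
    archive.addfile(info, io.BytesIO(data))


# Stand-in for the tarball produced by the sdist command.
workdir = tempfile.mkdtemp()
sdist = os.path.join(workdir, "helloworld-1.0.tar.gz")
with tarfile.open(sdist, "w:gz") as archive:
    _add_text(archive, "helloworld-1.0/setup.py", "# setup script\n")
    _add_text(archive, "helloworld-1.0/helloworld/templates/index.html",
              "<html></html>\n")

templates = members_with_suffix(sdist, ".html")
print(templates)
```

An empty list would tell you that the manifest did not pick up the templates.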
Accessing packaged files
When you need to access these packaged data files you cannot simply use open(filename). You don’t know the filename! It could even be stored inside a .egg. This all depends on how the end-user will install the package. You don’t have control over this! For this reason, distribute provides a couple of methods to access these files. The most likely candidates are:
- resource_stream, which returns a file-like object,
- resource_string, which returns the content as a string, and
- resource_filename, which returns a path on the filesystem.
Note that when using resource_filename, the resource may be (in the case of a zipped installation) extracted to a cache folder. See the pkg_resources documentation (resource extraction) for details on this behaviour.
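The idea of asking the package for its data, instead of guessing a filename, can be tried out with nothing but the standard library. The sketch below uses pkgutil.get_data as a stand-in so that it runs without distribute installed; with distribute, the analogous call would be pkg_resources.resource_string('helloworld', 'templates/index.html'). The package demo_pkg is a throwaway created just for this example.

```python
import os
import pkgutil
import sys
import tempfile

# Build a throwaway package 'demo_pkg' that ships one data file.
root = tempfile.mkdtemp()
package_dir = os.path.join(root, "demo_pkg")
os.makedirs(os.path.join(package_dir, "templates"))
with open(os.path.join(package_dir, "__init__.py"), "w") as handle:
    handle.write("")
with open(os.path.join(package_dir, "templates", "index.html"), "w") as handle:
    handle.write("<p>Hello World!</p>")

# Make the package importable, then ask for the data by package name and
# relative resource path. The loader works out where the bytes really live.
sys.path.insert(0, root)
data = pkgutil.get_data("demo_pkg", "templates/index.html")
print(data.decode("utf-8"))
```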
Publishing
If you want to make your package available to others, the obvious choice is pypi. Python makes it easy to publish. Simply run:
See the hitchhiker’s guide for more details on this.
Before you publish on pypi you may want to test your upload process. For this reason, there is http://testpypi.python.org/pypi
To use this (or any other package index), you need to specify the details in your ~/.pypirc. Here is an example:
[distutils]
index-servers =
    pypi
    ppt

[pypi]
username:yourlogin
password:yourpasswd

[ppt]
repository: http://testpypi.python.org/pypi
username:test
password:test
Having this, you can then upload packages to the given repositories by appending -r ppt for example.
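The file is a plain INI file, so you can sanity-check it with the standard library. The following Python 3 sketch parses the example from above from a string and lists the configured servers. It assumes the file starts with a [distutils] section header; read_index_servers is a made-up helper name.

```python
import configparser

SAMPLE_PYPIRC = """\
[distutils]
index-servers =
    pypi
    ppt

[pypi]
username:yourlogin
password:yourpasswd

[ppt]
repository: http://testpypi.python.org/pypi
username:test
password:test
"""


def read_index_servers(text):
    """Return {server name: settings} for every listed index server."""
    parser = configparser.RawConfigParser()
    parser.read_string(text)
    names = parser.get("distutils", "index-servers").split()
    return {name: dict(parser.items(name)) for name in names}


servers = read_index_servers(SAMPLE_PYPIRC)
for name, settings in servers.items():
    print(name, settings.get("repository", "(default repository)"))
```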
But corporate policy may prevent you from publishing your package publicly. Or your project contains sensitive data which should not be made public. In this case, it is very easy to set up a local repository of packages. You only need an HTML document with links to the package files. A very easy way is to set up an Apache host which indexes a folder into which you upload your packages.
Even if not behind corporate restrictions, this is an easy way to publish pre-release packages which you don’t want to push to pypi just yet.
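If you do not want to rely on a web server's automatic folder index, such a page of links is quickly generated. This is a minimal sketch under the assumption that all packages sit in one flat folder as .tar.gz or .zip files; build_index is a made-up helper, and the demo folder stands in for your real upload folder.

```python
import html
import os
import tempfile


def build_index(folder):
    """Write index.html into `folder`, linking every archive found there."""
    archives = sorted(name for name in os.listdir(folder)
                      if name.endswith((".tar.gz", ".zip")))
    links = "\n".join(
        '<a href="{0}">{0}</a><br>'.format(html.escape(name, quote=True))
        for name in archives)
    page = "<html><body>\n%s\n</body></html>\n" % links
    with open(os.path.join(folder, "index.html"), "w") as handle:
        handle.write(page)
    return archives


# Demo: a throwaway folder containing one (empty) package file.
repo = tempfile.mkdtemp()
open(os.path.join(repo, "helloworld-1.0.tar.gz"), "w").close()

archives = build_index(repo)
with open(os.path.join(repo, "index.html")) as handle:
    page_text = handle.read()
print(page_text)
```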
To be able to use these repositories for your dependencies, you need to add the links into your setup.py file. Let’s assume you create a new package depending on our helloworld package which we published on our local repository, and that your local repository is available as http://our.local.repo/. Then you would create a setup script like this:
from os.path import join, dirname
from setuptools import setup, find_packages

setup(
    name='myotherproject',
    version='1.0',
    packages=find_packages(),
    long_description=open(join(dirname(__file__), 'README.txt')).read(),
    install_requires=[
        'helloworld==1.0'
    ],
    dependency_links=['http://our.local.repo'],
)
Alternatively, when using pip, you can specify URLs at which to look for packages using pip install -f ....
Food for thought
As promised, here’s another tidbit. For many python beginners, the setup.py file is some sort of black magic and must contain the call to the setup method and nothing else. But that is not true! The setup script is a completely normal python file which you execute. This has an awesome implication: You can do whatever you want before and after the call to setup. Everything you execute before the call to setup, will obviously be run before the setup process, everything after the setup call will in turn be run after the installation is finished. I will let your imagination run wild with what you can do. Keep in mind though, that you may want to keep away from user-prompts and so forth to avoid breaking automated installations! But it is your project and your choice what you will or won’t do.
In most cases, however, this level of fine-tuning is not needed: distribute and distutils already offer a lot of functionality. Investigate these first! Only do such things if there is really no other way around, or if you want to play around 😉
References
- The Hitchhiker’s Guide to Python Packaging
- A Guide to Python Packaging (IBM)
- The official Python documentation (setupscript, source-dist)
- Distribute
- PEP 386 – Changing the version comparison module in Distutils