PEP 420 Namespace Packages – Somewhat Surprising

Update 2014/03/16: I submitted a different, better patch, which was partially merged, but there’s still an ongoing discussion about the best way to make find_packages() support PEP 420.

Earlier today I submitted a patch to setuptools that adds support for PEP 420 namespace packages (“NS packages”) to find_packages(). In the process, I learned a few things about NS packages that I found somewhat surprising.

My initial “intuitive” understanding was that a directory would be treated as an NS package only if it contained one or more .py files or contained nothing but other directories. I also thought that NS packages couldn’t be nested under regular packages (i.e., those with an __init__.py). I’m not exactly sure how I arrived at this understanding, but it’s what seemed to make sense before I dug into this.

The first thing I found surprising is that NS packages can be nested under regular packages. I couldn’t figure out what the use cases for this would be (which, of course, doesn’t mean there aren’t any). One potential problem with this is that if you have a directory with package data that’s not a Python package, it can be imported as a package, whereas in the pre-420 days you’d get an ImportError.
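To see this for yourself, here’s a minimal sketch, assuming Python 3.3+. The names demo_regular_pkg, data, and table.csv are made up for illustration. It builds a regular package containing a plain data directory in a temporary location and then imports that data directory as if it were a subpackage:

```python
import importlib
import sys
import tempfile
from pathlib import Path


def import_data_dir_as_package():
    """Import a plain data directory that lives inside a regular package."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        pkg = root / "demo_regular_pkg"        # hypothetical package name
        (pkg / "data").mkdir(parents=True)      # data directory, no __init__.py
        (pkg / "__init__.py").write_text("")    # makes demo_regular_pkg a regular package
        (pkg / "data" / "table.csv").write_text("a,b\n1,2\n")

        sys.path.insert(0, str(root))
        importlib.invalidate_caches()
        try:
            module = importlib.import_module("demo_regular_pkg.data")
            # An NS package has no file of its own, but it does have a __path__.
            return getattr(module, "__file__", None), hasattr(module, "__path__")
        finally:
            # Undo the changes to interpreter state made for the demo.
            sys.path.remove(str(root))
            for name in ("demo_regular_pkg.data", "demo_regular_pkg"):
                sys.modules.pop(name, None)


file_attr, has_path = import_data_dir_as_package()
print(file_attr, has_path)
```

The import succeeds even though data holds nothing but a CSV file, which is exactly the case that used to raise ImportError.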

The second surprising thing, which is a generalization of the first, is that any subdirectory of a directory that’s on sys.path can be imported as an NS package, as long as its name is a valid module name. This is probably useful in some scenarios, but it can cause problems in others.

For example, if find_packages() emulated the above behavior, by default *every* subdirectory in a source tree would be considered a package and included in the package list and therefore in a distribution, which is often (maybe usually) undesirable.
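As a rough illustration of that concern, here’s a sketch of what a naive PEP 420-style discovery function might return. This is not how setuptools works; naive_pep420_packages and build_demo_tree are hypothetical helpers, and the project layout is made up:

```python
import os
import tempfile
from pathlib import Path


def naive_pep420_packages(where):
    """Hypothetical helper: list every importable-looking directory under `where`."""
    found = []
    for dirpath, dirnames, _filenames in os.walk(where):
        # Names that are not valid identifiers (e.g. "some.data") cannot be imported,
        # so prune them here; os.walk will not descend into pruned directories.
        dirnames[:] = [d for d in dirnames if d.isidentifier()]
        for d in dirnames:
            rel = os.path.relpath(os.path.join(dirpath, d), where)
            found.append(rel.replace(os.sep, "."))
    return sorted(found)


def build_demo_tree(root):
    """Create a small made-up project: one real package plus docs and data dirs."""
    for rel in ("docs", "mypkg/mysubpkg", "mypkg/xxx"):
        (root / rel).mkdir(parents=True)
    (root / "docs" / "index.rst").write_text("")
    (root / "mypkg" / "mysubpkg" / "__init__.py").write_text("")
    (root / "mypkg" / "xxx" / "some-file").write_text("")
    (root / "mypkg" / "mymod.py").write_text("")
    (root / "setup.py").write_text("")


with tempfile.TemporaryDirectory() as tmp:
    build_demo_tree(Path(tmp))
    found = naive_pep420_packages(tmp)

print(found)
```

With this layout, docs and mypkg.xxx show up in the list alongside the real packages, even though neither contains any Python code.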

If you’re using vanilla distutils, this isn’t an issue, since you have to list every package in a distribution explicitly. But that can be tedious, it’s easy to forget to add new packages to setup(), and a lot of packages on PyPI already use find_packages().

One idea I had was an explicit __not_a_namespace__ marker file that could be added to directories that shouldn’t be considered NS packages (__package_data__ might be a better name). This is almost certainly a non-starter, though: it could lead to a lot of empty files cluttering up your source tree, and you might forget to add the marker, so it doesn’t really help with the forgetting problem either.

My patch for find_packages() adds the ability to explicitly specify the packages you want to include using wildcard patterns. In the following example, the mypkg directory and all of its subdirectories will be included as packages (when running Python 3.3+):

project/
    docs/
        index.rst
    mypkg/
        mysubpkg/
            __init__.py
        xxx/
            some-file
        mymod.py
    setup.py

# setup.py
from setuptools import setup, find_packages

setup(
    packages=find_packages(include=['mypkg*']),
    ...
)

This goes part of the way toward making sure only the appropriate directories are included as packages in a distribution. In simple cases, it will be sufficient by itself. In other cases, it might be necessary to exclude certain directories:

setup(
    packages=find_packages(
        include=['mypkg*'],
        exclude=['mypkg.xxx']),
    ...
)
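To make the include/exclude semantics concrete, here’s a minimal sketch of how such filtering might work on dotted package names. It assumes shell-style matching via fnmatch; filter_packages is a hypothetical helper for illustration, not the code from my patch or from setuptools:

```python
from fnmatch import fnmatchcase


def filter_packages(candidates, include=("*",), exclude=()):
    """Keep names matching any include pattern and no exclude pattern."""

    def matches(name, patterns):
        return any(fnmatchcase(name, pattern) for pattern in patterns)

    return [
        name
        for name in candidates
        if matches(name, include) and not matches(name, exclude)
    ]


# Every directory in the example project, written as a dotted name.
candidates = ["docs", "mypkg", "mypkg.mysubpkg", "mypkg.xxx"]

included_only = filter_packages(candidates, include=["mypkg*"])
selected = filter_packages(candidates, include=["mypkg*"], exclude=["mypkg.xxx"])

print(included_only)  # ['mypkg', 'mypkg.mysubpkg', 'mypkg.xxx']
print(selected)       # ['mypkg', 'mypkg.mysubpkg']
```

Note that in fnmatch-style patterns the * also matches dots, which is why 'mypkg*' covers subpackages at any depth.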

This is a bit more complex than the way things used to be, when you could almost always write packages=find_packages() without thinking about it, but I guess that’s the price of new features and functionality.

Update: I thought of a little hack for explicitly marking non-package directories: name them with an extension (e.g., some.data). They then become unimportable, and find_packages() already skips directories with dots in their names.
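Here’s a quick sketch of why the dotted name works, assuming Python 3.3+. The directory names demo_plain_dir and demo_blob.data are made up, and can_import_dir is a hypothetical helper. The import system reads demo_blob.data as "submodule data of package demo_blob", so a directory literally named demo_blob.data is never found:

```python
import importlib
import sys
import tempfile
from pathlib import Path


def can_import_dir(dirname):
    """Create an empty directory on sys.path and report whether it can be imported."""
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / dirname).mkdir()
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            importlib.import_module(dirname)
            return True
        except ImportError:
            return False
        finally:
            # Undo the changes to interpreter state made for the demo.
            sys.path.remove(tmp)
            sys.modules.pop(dirname, None)


plain_ok = can_import_dir("demo_plain_dir")    # importable as an NS package
dotted_ok = can_import_dir("demo_blob.data")   # looks for a package named demo_blob

print(plain_ok, dotted_ok)
```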

One thought on “PEP 420 Namespace Packages – Somewhat Surprising”

  1. Wyatt: As you have probably seen on https://bitbucket.org/pypa/setuptools/issue/97, I share many of your concerns. I think it is a very ill-considered move to have PEP 420 rules operate within a directory that has an explicit __init__.py. Your idea of adding a marker file to disable PEP 420 for a directory tries to get at that. It would not need to be in every subdirectory, just the top-level package directory (a bit like .htaccess files). I’m not saying this is a well-thought-out solution, only that it alerts us to a problem. But why is this not the default?
