PEP 420 Namespace Packages – Somewhat Surprising
Posted Saturday, February 8, 2014, at 12:05AM
Earlier today I submitted a patch to
setuptools that adds support for PEP 420 namespace packages (“NS packages”) to
find_packages(). In the process, I learned a few things about NS packages that I found somewhat surprising.
My initial “intuitive” understanding was that only directories containing either one or more
.py files or nothing but other directories would be considered NS packages. I also thought that NS packages couldn’t be nested under regular packages (i.e., those with an
__init__.py). I’m not exactly sure how I arrived at this understanding, but it’s what seemed to make sense before I dug into this.
The first thing I found surprising is that NS packages can be nested under regular packages. I couldn’t figure out what the use cases for this would be (which, of course, doesn’t mean there aren’t any). One potential problem with this is that if you have a directory with package data that’s not a Python package, it can be imported as a package, whereas in the pre-420 days you’d get an ImportError.
The second surprising thing, which is a generalization of the first, is that any subdirectory of a directory that’s on
sys.path can be imported as an NS package. This is probably useful in certain scenarios, but it can also cause issues in other scenarios.
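A minimal sketch of that second behavior, assuming Python 3.3+ and a hypothetical throwaway directory (the names here are made up for illustration):

```python
import importlib
import os
import sys
import tempfile

# A plain directory with no .py files at all, placed on sys.path,
# is importable as a namespace package under Python 3.3+.
tmp = tempfile.mkdtemp()
os.makedirs(os.path.join(tmp, "just_data"))
with open(os.path.join(tmp, "just_data", "notes.txt"), "w") as f:
    f.write("not Python code\n")

sys.path.insert(0, tmp)
pkg = importlib.import_module("just_data")  # no ImportError on 3.3+
print(pkg.__path__)  # a namespace path pointing into tmp
```

On Python 2 (or pre-3.3), the same import would fail with an ImportError, since there's no __init__.py.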
For example, if
find_packages() emulated the above behavior, by default *every* subdirectory in a source tree would be considered a package and included in the package list and therefore in a distribution, which is often (maybe usually) undesirable.
If you’re using vanilla
distutils, this isn’t an issue since you have to explicitly list all of the packages in a distribution, but that can be super tedious and it’s easy to forget to add new packages to
setup(), and a lot of packages on PyPI already use find_packages().
One thought I had was that an explicit
__not_a_namespace__ marker file could be added to directories that shouldn’t be considered NS packages (or maybe
__package_data__ is a better name). This is almost certainly a non-starter though, because it could lead to a lot of empty files cluttering up your source tree (plus, you might forget to do this too, so it doesn’t really help with that aspect).
My patch for
find_packages() adds the ability to explicitly specify the packages you want to
include using wildcard patterns. In the following example, the
mypkg directory and all of its subdirectories will be included as packages (when running Python 3.3+):
    project/
        docs/
            index.rst
        mypkg/
            mysubpkg/
                __init__.py
            xxx/
                some-file
            mymod.py
        setup.py

    # setup.py
    from setuptools import setup, find_packages

    setup(
        packages=find_packages(include=['mypkg*']),
        ...
    )
This goes part of the way toward making sure only the appropriate directories are included as packages in a distribution. In simple cases, it will be sufficient by itself. In other cases, it might be necessary to exclude certain directories:
    setup(
        packages=find_packages(
            include=['mypkg*'],
            exclude=['mypkg.xxx']),
        ...
    )
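For readers on current setuptools: similar functionality is exposed there as find_namespace_packages(), which accepts the same include/exclude patterns. A sketch that rebuilds the example tree above in a temp directory and checks what gets picked up (the tree layout is taken from the example; the temp-directory scaffolding is just for illustration):

```python
import os
import tempfile

from setuptools import find_namespace_packages

# Recreate the example layout: docs/, mypkg/ (no __init__.py),
# mypkg/mysubpkg/ (regular package), mypkg/xxx/ (data only).
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "docs"))
os.makedirs(os.path.join(root, "mypkg", "mysubpkg"))
os.makedirs(os.path.join(root, "mypkg", "xxx"))
open(os.path.join(root, "mypkg", "mysubpkg", "__init__.py"), "w").close()
open(os.path.join(root, "mypkg", "xxx", "some-file"), "w").close()
open(os.path.join(root, "mypkg", "mymod.py"), "w").close()

pkgs = find_namespace_packages(where=root,
                               include=['mypkg*'],
                               exclude=['mypkg.xxx'])
print(sorted(pkgs))
```

Without the exclude pattern, mypkg.xxx would be listed too, which is exactly the data-directory problem described above.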
This is a bit more complex than the way things used to be, when you could almost always simply say
packages=find_packages() without thinking about it, but I guess that’s the price of new features and functionality.
Update: I thought of a little hack for explicitly marking non-package directories: name them with an extension (e.g.,
some.data). They will then become unimportable, and
find_packages() already skips directories with dots in their names.
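A quick sketch of why the dotted name does the trick (the temp-directory setup here is hypothetical, just to demonstrate the point):

```python
import importlib
import os
import sys
import tempfile

tmp = tempfile.mkdtemp()
os.makedirs(os.path.join(tmp, "some.data"))
sys.path.insert(0, tmp)

# The import machinery treats the dot as a package separator, so a
# directory named "some.data" can never be imported directly: Python
# would first look for a package named "some", which doesn't exist here.
try:
    importlib.import_module("some.data")
    importable = True
except ImportError:
    importable = False
print(importable)  # False
```

The directory can still hold data files and be accessed through the filesystem as usual; it just falls out of the import system entirely.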