Category Archives: planet python

PEP 420 Namespace Packages – Somewhat Surprising

Update 2014/03/16: I submitted a better different patch, which was partially merged, but there’s still an ongoing discussion about the best way to make find_packages() support PEP 420.

Earlier today I submitted a patch to setuptools that adds support for PEP 420 namespace packages (“NS packages”) to find_packages(). In the process, I learned a few things about NS packages that I found somewhat surprising.

My initial “intuitive” understanding was that only directories with either one or more .py files or containing only directories would be considered NS packages. I also thought that NS packages couldn’t be nested under regular packages (i.e., those with an __init__.py). I’m not exactly sure how I arrived at this understanding, but it’s what seemed to make sense before I dug into this.

The first thing I found surprising is that NS packages can be nested under regular packages. I couldn’t figure out what the use cases for this would be (which, of course, doesn’t mean there aren’t any). One potential problem with this is that if you have a directory with package data that’s not a Python package, it can be imported as a package, whereas in the pre-420 days you’d get an ImportError.

The second surprising thing, which is a generalization of the first, is that any subdirectory of a directory that’s on sys.path can be imported as an NS package. This is probably useful in certain scenarios, but it can also cause issues in other scenarios.

For example, if find_packages() emulated the above behavior, by default *every* subdirectory in a source tree would be considered a package and included in the package list and therefore in a distribution, which is often (maybe usually) undesirable.

If you’re using vanilla distutils, this isn’t an issue since you have to explicitly list all of the packages in a distribution, but that can be super tedious and it’s easy to forget to add new packages to setup(), and a lot of packages on PyPI already use find_packages().

So I had one thought that an explicit __not_a_namespace__ marker file could be added to directories that shouldn’t be considered NS packages (or maybe __package_data__ is a better name). This is almost certainly a non-starter though, because it could lead to a lot of empty files cluttering up your source tree (plus, you might forget to do this too, so it doesn’t really help with that aspect).

My patch for find_packages() adds the ability to explicitly specify the packages you want to include using wild card patterns. In the following example, the mypkg directory and all of its subdirectories will be included as packages (when running Python 3.3+):

project/
    docs/
        index.rst
    mypkg/
        mysubpkg/
            __init__.py
        xxx/
            some-file
        mymod.py
    setup.py

# setup.py
from setuptools import setup, find_packages

setup(
    packages=find_packages(include=['mypkg*']),
    ...
)

This goes part of the way toward making sure only the appropriate directories are included as packages in a distribution. In simple cases, it will be sufficient by itself. In other cases, it might be necessary to exclude certain directories:

setup(
    packages=find_packages(
        include=['mypkg*'],
        exclude=['mypkg.xxx']),
    ...
)

This is a bit more complex than the way things used to be–where you could almost always simply say packages=find_packages() without thinking about it–but I guess that’s the price of new features and functionality.

Update: I thought of a little hack for explicitly marking non-package directories–name them with an extension (e.g., some.data). They will then become unimportable, and find_packages() already skips directories with dots in their names.

Travis CI, Python 3.3, Buildout

I had the hardest time configuring Travis CI for several Python 3.3 packages. I think it may have been because they’re all PEP 420 namespace packages. The strange thing is that the tests would pass for some of the packages but not others.

I ended up creating a package (which I lovingly named travisty) so my repos wouldn’t be cluttered with hundreds of superfluous commits recording my failed attempts to get things working. I got pretty frustrated and almost gave up on Travis entirely.

There was a post on Planet Python recently about problems with Python 3 and virtualenv, and I’m guessing the issue is something related to that and not Travis specifically, but I never did actually figure it out.

Instead, I tried installing my packages using Buildout (which I prefer over virtualenv for development anyway), and that magically worked. It kind of bugs me that I don’t know why it works and the bare virtualenv doesn’t, but at this point I’m just happy that it does.

Here’s the .travis.yml I ended up with:

language: python
python:
  - "3.3"
install:
  - pip install zc.buildout
  - git clone git://github.com/TangledWeb/tangled ../tangled
  - buildout
script:
  - ./bin/tangled test

The tangled test script is just a tiny wrapper around the built in unittest discovery. It’s equivalent to ./bin/python -m unittest discover my/package.

What I Hate About Python

During an interview a while back, I was asked to name some things I hate about Python. For some reason, I choked and couldn’t think of a good answer (I kind of wanted to blame the interview process, but that’s a rant for another time).

Maybe I’ve just been programming in Python for too long, and that’s why I couldn’t think of something (or maybe I’m just a massive Python fanboy). On the other hand, I’ve been programming in JavaScript (which I generally like) for about as long, and I can think of at least a few things right off the top of my head (mostly related to weak typing).

I did a search to see what other people don’t like about Python to get some inspiration, but I didn’t come across anything I truly hate.

Things That Don’t Bother Me

  • Significant whitespace. I love it.
  • Explicit self. I guess it would be “convenient” if I didn’t have to add self to every method signature, but I really don’t spend much time on that, and it takes about 1ns to type (in fact, my IDE fills it in for me). There are technical and stylistic considerations here, but the upshot for me is that it just doesn’t matter, and I actually like that all instance attribute access requires the self. prefix.
  • “Crippled” lambda. There are rare occasions where I want to define more complex anonymous functions, but there’s no loss of expressiveness from having to use a “regular” named function instead. Maybe multi-line anonymous functions that allow statements would lead to different/better ways of thinking about programs, but I’m not particularly convinced of that. (Aside: one thing I do hate relating to this is the conflation of lambdas and closures–normal functions are closures in the same way that lambdas are.)
  • Packaging. I don’t know why, but I’ve never had any problems with setuptools. There are some issues with the installation of eggs when using easy_install, but I think pip fixes them. I am glad that setuptools is now being actively developed again and the distribute fork is no longer necessary.
  • Performance, GIL, etc. I’ve used Python for some pretty serious data crunching (hello, multiprocessing) as well as for Web stuff. There are cases where something else might have been faster, but Python has almost never been too slow (caveat: for my use cases). Of course, Python isn’t suitable for some things, but for most of the things I need to do, it’s plenty fast enough.
  • len(), et al. I don’t have anything to say about this other than it’s a complete non-issue for me. Commentary about how this means Python isn’t purely object-oriented makes me a little cranky.

Things That Bug Me a Little Bit

  • The way super works in Python 2 is kind of annoying (being required to pass the class and self  in the super() call). This is fixed in Python 3, where you can just say super().method() in the common case.
  • Unicode vs bytes in Python 2. This is also fixed in Python 3 (some people have argued that it’s not, but I haven’t run into any issues with it yet (maybe it’s because I’m working on a Python 3 only project?)).
  • The implicit namespace package support added in Python 3.3 causes some trouble for my IDE (PyCharm), but I’m assuming this is a temporary problem. I also had some trouble using nose and py.test with namespace packages. Again, I assume (hope) this is only temporary.

Things That Bother Me a Little More

  • The Python 2/3 gap is a bit troublesome. Sometimes I think the perception that there’s a problem may be more of a problem, but I don’t maintain any major open source projects, so I’m not qualified to say much about this. Personally, I’ve really been enjoying Python 3, and I do think it offers some worthwhile advantages over Python 2.

Conclusion

There isn’t one. I left some things out intentionally (various quirks). I probably forgot some things too.

I’m Working on a Python 3 (only) Web Framework

I’ve been working on a Python 3 (3.3+) only Web Framework for the past few weeks. My initial motivation was to try to build something “RESTful” that doesn’t use the concepts (or terms) “view” or “controller” at all.

That’s because I don’t even know what a “view” is (or is supposed to be be). I’ve tried to think of views as “representations” (i.e., in the sense of representing some data as a particular content type like HTML or JSON), but in the real world view functions are often used to implement business logic, and the term “view” seems misleading (“action” seems more appropriate in this case). The term view is also used by some frameworks to refer to templates (e.g., Rails), and this usage actually makes a bit more sense to me.

If you’re just throwing some code together, maybe this type of thing doesn’t matter too much, but when things get complex (as they tend to) and you’re looking for ways to keep things simple (in terms of concepts, code organization, etc), I think it can make a big difference. Your core abstractions inform (and possibly limit) your thinking; they can also limit, to an extent, what’s possible or at least easily doable.

A different way to say this is that “views” and “controllers” don’t really fit my brain, so I’m experimenting with something different. I actually tried to implement something like this on top of Pyramid at one point (pyramid_restler), but I got hung up trying to build on top of views (although it’s still useful, IMO).

So, I set out to create a new framework that uses resources and representations as its main abstractions. Resources are things that are mounted at a particular path, respond to HTTP methods, and return data that gets represented according to the requested content type (i.e., as specified by the Accept header (note to self: should look at Accept-* headers as well)).

I’ve also put some effort into making the framework “correct” in terms of HTTP status codes and other RESTful concepts. But I’m finding that it’s not so easy to create a general purpose framework that handles every scenario right out of the box. For example, it’s okay (according to the RFC) to return various status codes for a POST or a PUT depending on the circumstances, so you can’t just say, “Oh, it’s a POST, so we’ll automatically return a 201 Created.”

I’m not sure how to completely generalize HATEOAS either. For certain scenarios you can say things like, “This container resource has these items, so generate a set of links to the items and include that in the representation.” But I’m not sure how you could cover every link scenario in a generic manner. If certain constraints are enforced, maybe… (maybe I just haven’t thought hard enough about this…)

That said, I think it’s possible to create a framework that at least nudges people in the “right” direction (or at least a particular direction).

So that’s the gist of it. As it matures, I plan to post more about it (including links to the source, which will be MIT licensed). For now, I’m mostly avoiding the term RESTful and just calling it resource-oriented.

Python 3

This is the first project that I’ve really used Python 3 for, and it’s been fun playing with all the shiny new stuff (hello, built in namespace packages). I was planning to say more about the choice of Python 3 (3.3+ in particular) in this post, but I think it’s gotten a bit long already. I’ll be writing another post about that soon, including the issues I’ve run into (or lack thereof).

Generating Random Tokens (in Python)

Update 1/13: After reading the comments and thinking about it some more, I think binascii.hexlify(os.urandom(n)) is the easiest way to generate random tokens, and random = random.SystemRandom(); ''.join(random.choice(alphabet) for _ in range(n)) is better when you need a string that contains only characters from a specific alphabet. Pyramid uses the former approach; Django uses the latter.

I’m working on a web site where I need to generate random CSRF tokens. After digging around a bit, I found os.urandom(n), which returns “a string of n random bytes suitable for cryptographic use.” Okay, that sounds good… except that it can include bytes that aren’t “web safe”.

So I needed a way to encode the output of urandom. I poked around some more and saw binascii.hexlify(data) being used for this purpose (in Pyramid). For some reason, though, I thought it would be “clever” to hash the output from urandom like so: hashlib.sha1(os.urandom(128)).hexdigest().

What I like about this is that no matter how many bytes you request from urandom (assuming more bytes means more entropy), you always end up with a 40 character string that’s safely encoded.

I’m not sure if this provides any real benefit though (in terms of increased security). Are there better ways to generate random tokens?

Another thought I had was to use bcrypt.gensalt() and use its output as is–it uses urandom to generate the initial salt, which is then hashed, and also returns a fixed number of bytes (29).

On a slightly related note, I recently needed to generate a new PIN. My first thought was to reuse a PIN I use elsewhere, but of course that’s a bad idea. My second thought was to use KeePassX to generate one. I happened to have a calculator sitting next to me (one with big buttons); I closed my eyes and banged on it a bit to generate the PIN.

What I Learned from the apache.org Break-in

Here’s my initial take-away after reading about the recent apache.org XSS-initiated break-in:

  • If you’re going to allow caching of credentials (Subversion or otherwise) on a server, don’t use an account that shares credentials with any superuser account. Personally, I can’t think of a good reason for these credentials to be cached in the first place (except on a development machine). As an aside, by default, Mercurial doesn’t do this; I suppose the fact that every ‘svn commit’ is also a push makes this more “necessary” with Subversion.
  • If you have an organization-wide login (say a Windows login that is automatically sync’ed with Subversion, your enterprise RDBMS, and who knows what else[1]), if at all possible use a different  password on any server where you’ve got superuser access.
  • All superuser accounts on servers should have different passwords; at a minimum, if you use a common password for superuser accounts across servers, don’t use this password for other accounts.
  • Use Trac instead of Jira. [2]

That’s a bare minimum; I’m still thinking about how vulnerable the organization I work for might be. Most of this probably seems obvious, but I’m betting that these and other less-than-best practices are extremely common.

Ref: https://blogs.apache.org/infra/entry/apache_org_04_09_2010

[1] This is a purely hypothetical scenario. ;)

[2] Just kidding?

Eclipse for Python Web Development on Ubuntu

Updated 14 July 2009 for Eclipse 3.5. Now using Aptana to install PyDev.

Updated 5 March 2010 for Eclipse 3.5.2 and Subversion 1.6.

Updated 20 July 2010 for Eclipse 3.6

Updated 1 August 2011 for Eclipse 3.7 and Aptana Studio 3

Overview

So, I think I’ve finally decided that I prefer Eclipse for Python Web development over NetBeans. (I prefer Wing IDE over both for straight Python development, but that’s another post.) Eclipse’s Python support, via PyDev, seems more advanced and NetBeans has some annoying issues. Eclipse also seems a bit snappier, at least on my machine (YMMV, blah blah).

Eclipse takes a bit more effort to set up, but once you’ve done it a couple times, it’s pretty straightforward. The hard part is keeping track of the links to the Eclipse update sites and remembering a few odd bits of configuration. This document gives details on installing Eclipse for Python Web development with Aptana, PyDev, and Subclipse.

The instructions here are Linux/Ubuntu-centric, but the instructions for getting to the Platform Binary–the smallest possible Eclipse download, as far as I can tell–are applicable to all platforms.

Download Eclipse

If you go to the Eclipse downloads page, you’ll see packages for Java and C++ along with some other options. If you’re doing ONLY Python development, you might wonder which version to download. These versions install cruft I don’t want or need. I finally found what I think is the smallest possible Eclipse package, the so-called Platform Runtime Library.

Install Eclipse

Extract the downloaded package: tar xvzf eclipse-platform-3.7-linux-gtk.tar.gz. I rename the resulting eclipse directory to eclipse-3.6, move it to ~/.local, and create a symlink in ~/.local/bin to the eclipse executable: ln -s ~/.local/eclipse-3.7/eclipse ~/.local/bin/eclipse

Configure Eclipse

Edit ~/.local/eclipse-3.7/eclipse.ini. Find the line containing “-vmargs”. Add a new line directly below that: -Djava.library.path=/usr/lib/jni. Save and close. Fire up Eclipse.

Install Aptana Studio

Install PyDev

PyDev is installed as part of the Apatana Studio install.

Install MercurialEclipse

Eclipse update URL: http://cbes.javaforge.com/update

Supervisor now supports Python 2.6

I’ve been using Supervisor for a few months now to manage some app servers at work. Good stuff; solid. My only gripe with it was that it didn’t run properly under Python 2.6[1,2]. I had to install Python 2.5 just to run Supervisor, which isn’t a super big deal but is kind of annoying. The latest release, 3.0a7, fixes this by including a “patched version of Medusa to allow Supervisor to run on Python 2.6.”

Note: I had to `wget http://dist.supervisord.org/supervisor-3.0a7.tar.gz` and easy_install that due to a network timeout[3]. I assume that’s because Supervisor is super popular and everyone’s upgrading.

[1] It did run under 2.6 but issued an error when starting up.
[2] This was due to changes in the Python 2.6 stdlib, not a problem with Supervisor itself.
[3] Suggestion: If possible, remove link from PyPI to http://supervisord.org/ so easy_install doesn’t get stuck there.

Implementation of Dijkstra’s Single-Source Shortest-Paths in JavaScript

I’m working on a project where the client wants a cool sliding navigation effect. We’re implementing this with JavaScript/AJAX/DHTML.

One of the constraints is that pages can only be reached via certain other pages. For example, if you’re on the /portland/contact page and want to go to the /seattle/contact page, you’ll first slide up to /portland, then over to /seattle, then finally down to /seattle/contact.

After a while, it occurred to me that there were some similarities with another project I’ve been working on off and on for the last few years, byCycle.org, which is a bicycle trip planner a la Google Maps.

I had written a Python version of Dijkstra’s Single-Source Shortest-Paths (SSSP) for byCycle.org. That’s available on Bitbucket as Dijkstar (so named because it also does has the potential to do A*). I figured it wouldn’t be too hard to port the Python version to JavaScript, and it wasn’t.

There were a few snags, though. Most of it was just syntactic and semantic differences between the two languages. The biggest issue was that I use “heapq“ in the Python version to maintain the costs to previously visited nodes in sorted order. JavaScript has no priority queue implementation that I could find, so I came up with a different solution that involves updating an Object (AKA hash) with costs to newly visited nodes and sorting the keys to pick the next node to visit. I’m assuming/hoping the underlying sort implementation is highly optimized.

Interestingly, I think I found at least one bug in the Python version, although I’ve been using that version for a couple years now with no known problems, so it must only be applicable in certain edge (no pun intended) cases (or maybe it’s due to some difference in the languages–need to take a closer look). I think the JS version came out cleaner, too.

If anyone’s interested, I’m releasing this under an MIT license. You can get it from here. Note that it depends on the util module that you can get from here. The util module contains some other Python-inspired JavaScript, in particular a couple of functions for generating namespaces and classes. I might write another post about that at some point.

Ruby on Rails… Revisited

Updated with links and a couple typo corrections.

Update: It wasn’t long before the project got too complex on the back end (SOAP blech) for my limited Ruby knowledge. I switched it back to Python/Pylons and never looked back. The Pylons => Rails migration was straightforward. I guess I could have pushed through with Ruby/Rails, but with deadlines looming, it made more sense for me to go with what I knew best. Being familiar Python and its ecosystem was far more pertinent than the deficiency of any particular library. There’s probably another blog post or two in here…

I’ve been working on a fairly big Web site project lately. My partner and I initially decided to use Django to build the site, mainly because I’m a Python “expert” and Django is (apparently) the #1 Python Web framework. We were also lured by the easy admin interface.

After trying to use Django and not really enjoying it, I tried switching to Pylons because I’ve had a good amount of experience with it in the building of byCycle.org. It’s gone through two fairly major releases since then, and so have a bunch of the libraries that tend to get used with it, like SQLAlchemy, Elixir, etc.

I was having a hard time with the Pylons docs, and so I ended screwing around with Grok (which actually looks fairly interesting) and even took a look at the Zope 3 site. I’m sure Zope is really awesome or whatever, but it might as well suck. Every time I look at that site, I’m just like “WTF! This shit has been around for like five years!” Anyway, I might just not be smart enough for Zope.

This led us back toward Rails (even if it is a ghetto). I used Rails a bit last year but never did anything too serious with it. Diving into it today was quite a pleasure. There are issues to be sure, but overall I’m enjoying it by far over any of the other options we had tried. I’m also enjoying learning/relearning Ruby.

If Pylons had good docs, we’d probably be using that.

So, I don’t know if this is a particularly useful post, since I didn’t get into much in the way of reasons (what, i have back this up?!). This subject’s been hashed and rehashed, but I just wanted (needed) to make a qualitative statement about my/our experience, which, of course, is purely personal.