Why we love Python at Memonic

Fri, 21 Jan 2011

Over on the Memonic blog I blogged about why and how we work with Python at Memonic.

The blog post goes into the architecture we run here and how Pylons and WsgiService fit into that. I hope you enjoy the read.

Pot: WSGI

Fri, 13 Nov 2009

This Python on the Toilet issue is also available as PDF.

WSGI is a Python standard for how web servers can interact with web frameworks. It’s one of my favorite standards: it’s simple yet very powerful.

To write a WSGI web application you only need to create one function.

def my_app(environ, start_response):
    start_response('200 OK', [('Content-Type', 'text/plain')])
    # WSGI expects an iterable of byte strings, not a bare string.
    return [repr(environ).encode('utf-8')]

The function receives an environ argument – a dictionary with all the environment variables. The start_response function can be called with the status line and a list of headers as soon as you’re ready to send output.
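To see what a server does with such a function, you can call it by hand. The following is only a minimal sketch, assuming Python 3 (where the response body has to be bytes). The names fake_start_response and captured are made up for this illustration; setup_testing_defaults from wsgiref fills in a plausible environ.

```python
from wsgiref.util import setup_testing_defaults

def my_app(environ, start_response):
    start_response('200 OK', [('Content-Type', 'text/plain')])
    # WSGI expects an iterable of byte strings as the response body.
    return [repr(environ).encode('utf-8')]

captured = {}

def fake_start_response(status, headers):
    # A real server would send these to the client; here we just record them.
    captured['status'] = status
    captured['headers'] = headers

environ = {}
setup_testing_defaults(environ)  # adds REQUEST_METHOD, PATH_INFO, ...
body = b''.join(my_app(environ, fake_start_response))
print(captured['status'])
```

A real server does essentially the same thing, except that it builds environ from the incoming HTTP request and writes the status, headers and body to the socket.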

Since version 2.5, Python ships with the wsgiref library, which can be used to get started quickly with a simple web server (that should not be used in production).

from wsgiref.simple_server import make_server
httpd = make_server('', 8000, my_app)
httpd.serve_forever()

Save these two snippets in a file e.g. myapp.py and execute it with Python. This will serve the sample application on port 8000. Try it out.

You don’t usually want to write your own application directly on top of WSGI. But most frameworks now implement WSGI which has led to better interoperability. If you still want to use WSGI directly, there are a ton of good tools such as WebOb, Werkzeug or Paste. I used WebOb to easily build a REST service framework called WsgiService.
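That interoperability comes from the fact that anything following the calling convention can wrap anything else. As an illustration only (hello_app, add_header_middleware and record are hypothetical names, and the code assumes Python 3), a tiny middleware that adds a response header to any WSGI application could look like this:

```python
from wsgiref.util import setup_testing_defaults

def hello_app(environ, start_response):
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [b'Hello']

def add_header_middleware(app, name, value):
    """Wrap a WSGI app so every response carries an extra header."""
    def wrapped(environ, start_response):
        def patched_start_response(status, headers, exc_info=None):
            # Pass everything through, with one header appended.
            return start_response(status, headers + [(name, value)], exc_info)
        return app(environ, patched_start_response)
    return wrapped

wrapped_app = add_header_middleware(hello_app, 'X-Powered-By', 'WSGI')

# Call the wrapped application by hand to see the result.
seen = {}

def record(status, headers, exc_info=None):
    seen['status'], seen['headers'] = status, headers

environ = {}
setup_testing_defaults(environ)
body = b''.join(wrapped_app(environ, record))
print(seen['headers'])
```

The wrapped application is itself a WSGI application, so it can be handed to make_server or to another middleware without any changes.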

For more information I recommend the specification for WSGI which is a good read.

This post is part of the Python on the toilet series.

Pot: namedtuple

Sun, 07 Jun 2009

For returning complex values from a method you have a few popular choices:

  1. Return a tuple with positional values. Those are very easy to unpack in Python and quite attractive because of that.
  2. Return a dictionary with the values. This can lead to slightly verbose accessing of the individual values.
  3. Return an object of some class you define. This requires the definition of a class, which can be too verbose for just a simple return value.

Since Python 2.6 you have one additional tool: collections.namedtuple. It gives you a very lightweight class that takes just one line to define.

from collections import namedtuple

Child = namedtuple('Child', 'id, user, type, count')

def get_first_child():
    return Child('id', 'userid', 'special', 10)

res = get_first_child()
print res
print res.id, res.user, res.type, res.count

To create a class you call the namedtuple function with the type name and a string of field names. The field names are separated by whitespace and/or commas. Alternatively you can also pass in a list of strings.

The returned class can be used like any other class – but all values are read-only.
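A short sketch of those last two points, written so that it runs on Python 3 as well. The variable names kind, bigger and read_only are just illustrative; _replace is the documented way to get a modified copy.

```python
from collections import namedtuple

# Field names given as a list instead of a single string.
Child = namedtuple('Child', ['id', 'user', 'type', 'count'])

child = Child('id', 'userid', 'special', 10)

# It still behaves like a regular tuple, so unpacking works.
child_id, user, kind, count = child

# Assigning to a field fails because the values are read-only.
try:
    child.count = 11
    read_only = False
except AttributeError:
    read_only = True

# _replace returns a modified copy and leaves the original untouched.
bigger = child._replace(count=11)
print(child.count, bigger.count, read_only)
```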

As always, the full details are available in the official API documentation for collections.namedtuple.

This post is part of the Python on the toilet series.

Modular Python with eggs

Wed, 15 Apr 2009

I did a presentation at the March swiss.py user group meeting about using Python eggs. I hear a lot of bitching about Python’s packaging. Guido even managed to talk at length about the problems with it at his PyCon keynote talk. Here I am boldly saying almost the opposite: Python packaging rocks! It’s very easy to have a great workflow both for development as well as deployment using Python eggs.

Some notes from the presentation which you can download as a PDF.

First, I love how easy it is to get started with Python eggs. You only need to create a setup.py file which at its simplest can even be just one line. At Nektoon we try to create components that are as small as possible and then package them using setuptools. We even split up our Pylons frontend using paste.urlmap (we are already at 10 frontend applications after three months).

We then deploy those eggs into our internal pypi server. Creating such a server is laughably easy. A web server with directory listings enabled is enough.

The eggs are automatically generated and published to our pypi server by Hudson. Whenever we check in a change, the test suite runs in a clean virtualenv environment. This also makes sure we have declared all the dependencies correctly. If all tests run successfully, then the egg is built and published to our pypi.

As a last step we automatically deploy those eggs to our trunk instance of the Nektoon infrastructure.

All of this was very easy to build thanks to all of the existing work in the Python world. So let me thank all the responsible people for this. And who knows, maybe I’m just in the honeymoon phase as I’ve yet to see something comparably easy and powerful in all the other environments I have experience in.

Pot: nose

Mon, 06 Apr 2009

nose is my Python testing tool of choice. You can use it to very easily create test suites.

While it does support the unittest module which is included with Python, it also allows for a much simpler style using just functions and asserts.

Put the following code into a file called test_basic.py:

def test_addition():
    print 2 + 2
    assert 2 + 2 == 4

def test_addition_wrong():
    print 2 + 3
    assert 2 + 3 == 6

Then just run nosetests in the directory where you stored this file. You’ll get output indicating that there is a test failure.

nose is based completely on naming conventions. Make sure your test files and methods all start with test for them to get executed. For assertions, just use assert.

When a test goes wrong you usually need some additional output to know what happened. nose helps you with this by handling print statements nicely. In the example above, as test_addition_wrong fails, you’ll see the output of the print statement print 2 + 3. But the print output of test_addition is suppressed because that test does not fail.

One of my favourite features of nose is Test generators. Often you have a large list of inputs and corresponding expected outputs for a function. Test generators allow you to easily test all of them without having to repeat the test code many times.

An example from a Nektoon test suite:

def test_browsers():
    tests = [('Mozilla/4.0 (compatible; MSIE 8.0)', 'msie', '8'),
        ('Mozilla/4.0 (compatible; MSIE 7.0b)', 'msie', '7')]
    for ua, model, version in tests:
        yield check_browser, ua, model, version
def check_browser(ua, model, version):
    res = parse_ua(ua)
    print res
    assert res['model'] == model
    assert res['version'] == version

The tests list is of course much bigger in our test suite and that’s the beauty of it. Adding a new test is very easy, but nose still provides very meaningful error output in the case of a test failure.
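Roughly speaking, nose calls the yielded function with the yielded arguments and reports each call as its own test. The loop below is only a sketch of that idea in plain Python, not nose’s actual implementation. parse_ua here is a hypothetical, deliberately naive stand-in for the real parser, and run_generator_test is a made-up helper.

```python
import re

def parse_ua(ua):
    """Hypothetical, very naive user agent parser for this example only."""
    match = re.search(r'MSIE (\d+)', ua)
    if match:
        return {'model': 'msie', 'version': match.group(1)}
    return {'model': None, 'version': None}

def check_browser(ua, model, version):
    res = parse_ua(ua)
    assert res['model'] == model
    assert res['version'] == version

def test_browsers():
    tests = [('Mozilla/4.0 (compatible; MSIE 8.0)', 'msie', '8'),
             ('Mozilla/4.0 (compatible; MSIE 7.0b)', 'msie', '7')]
    for ua, model, version in tests:
        yield check_browser, ua, model, version

def run_generator_test(generator_func):
    """Run every yielded case separately and collect (args, passed) pairs."""
    results = []
    for case in generator_func():
        func, args = case[0], case[1:]
        try:
            func(*args)
            results.append((args, True))
        except AssertionError:
            results.append((args, False))
    return results

results = run_generator_test(test_browsers)
print(len(results), all(passed for _, passed in results))
```

Because every case is run and recorded on its own, one failing input does not hide the results of the others.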

Read the official nose documentation if I was able to whet your appetite.

This post is part of the Python on the toilet series.

swiss.py #2

Mon, 30 Mar 2009

Tomorrow (March 31) we’ll hold the second swiss.py user group event. We’ll meet at the main ETH building at 19:30.

If all goes well, Florian Bösch will talk about OpenGL development using Python. He’s currently not feeling that well, so the topic might change on the day itself.

Get more details on the official swiss.py web site.

Pot: the with statement

Thu, 19 Mar 2009

Python’s with statement can be a very elegant alternative to long try/except/finally clauses. It offers a standard protocol that classes can implement to properly clean up state.

The best example of its value is file reading. Traditionally, a good implementation would have to look like this:

f = open('/tmp/myfile', 'r')
try:
    content = f.read()
finally:
    f.close()

The with statement shortens this to the following code:

with open('/tmp/myfile', 'r') as f:
    content = f.read()

Behind the scenes it wraps the finally statement around this and makes sure that the file gets closed upon leaving the with block.

Your classes can implement the with statement with a simple protocol. The following example is silly, but demonstrates the point.

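import copy

class Restorable(object):
    def __enter__(self):
        # Snapshot the current attributes before the block runs.
        self.__dict_backup = copy.deepcopy(self.__dict__)
        return self
    def __exit__(self, type, value, tb):
        if tb is not None:
            # An exception occurred: restore the snapshot and swallow it.
            self.__dict__ = self.__dict_backup
            return True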

Restorable is a class that will back up its state when it’s used in a with statement. Upon leaving the with block with an exception, the previous state is restored and the exception ignored. The __exit__ method gets information about any exception that occurred inside the with block and can thus react differently to successful and failed execution.
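A small usage sketch makes the behaviour concrete. The Account subclass and its balance attribute are invented for this illustration; the Restorable class is repeated (with the copy import) so the snippet runs on its own.

```python
import copy

class Restorable(object):
    def __enter__(self):
        # Snapshot the current attributes before the block runs.
        self.__dict_backup = copy.deepcopy(self.__dict__)
        return self

    def __exit__(self, type, value, tb):
        if tb is not None:
            # An exception occurred: roll back and swallow it.
            self.__dict__ = self.__dict_backup
            return True

class Account(Restorable):
    """Hypothetical example class, not part of the original code."""
    def __init__(self, balance):
        self.balance = balance

account = Account(100)

with account:
    account.balance -= 30       # succeeds, so the change is kept

with account:
    account.balance -= 1000
    raise ValueError('insufficient funds')  # state is rolled back

print(account.balance)
```

Swallowing every exception is rarely what you want in real code; returning True from __exit__ only for specific exception types is the more careful choice.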

There is more detail in the Python 2.6 changelog.

The full code of Restorable including usage examples of this class is available as a syntax-colored paste on Lodge It.

This post is part of the Python on the toilet series.

Pot: Class properties

Thu, 12 Mar 2009

Like most programmers, I’ve been trained with the getter/setter idiom. It allows a programmer in any object-oriented programming language to create a class interface which makes sure that only valid data is written.

An example of a class with getter/setter in Python:

class Person1(object):
    def __init__(self):
        self._age = None
    def get_age(self):
        return self._age
    def set_age(self, age):
        self._age = age

We just wrote two methods – four lines – only to be able to modify and access some data. And we have to do that for every property. This is such a common idiom that IDEs even have wizards for it.

Python takes a different road: properties. For starters, let’s write the simplest possible class with the behaviour from above:

class Person2(object):
    age = None

Usage of this class is a lot more natural – you just assign values – and it’s a lot shorter.

But the reason people started writing setters was access control. Maybe you want to make sure that no negative age is assigned? That’s where properties come in.

class Person3(object):
    _age = None
    def get_age(self):
        return self._age
    def set_age(self, age):
        if age < 0:
            raise Exception("Age can't be negative.")
        self._age = age
    age = property(get_age, set_age)

It’s almost the same as the first class – even a bit more code. But the usage of this class is exactly the same as before.

As you can see properties are extremely useful. They give you the flexibility of becoming more strict with the interface. But you don’t have to pass on this verbosity to the user. It also takes into account that most of the time you don’t care about the input values. So for the great majority of cases where the setter would really just be the one line, your class definition stays light and beautiful.
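Since Python 2.6 the same property can also be written with decorators. The sketch below is one possible spelling, not the code from the paste: the class name Person4 is made up, and ValueError is swapped in for the generic Exception because it describes the problem more precisely.

```python
class Person4(object):
    _age = None

    @property
    def age(self):
        return self._age

    @age.setter
    def age(self, age):
        if age < 0:
            raise ValueError("Age can't be negative.")
        self._age = age

person = Person4()
person.age = 30          # looks like plain attribute access
try:
    person.age = -1      # but the setter still validates
except ValueError as error:
    message = str(error)

print(person.age, message)
```

Callers never see the difference between this class and the plain attribute version, which is exactly why you can start simple and add validation later.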

The full code including usage examples of these classes is available as a syntax-colored paste on Lodge It.

This post is part of the Python on the toilet series.

Pot: String formatting

Thu, 05 Mar 2009

Starting with Python 2.6 there is a new format function to do string formatting. I’ve always been a bit overwhelmed by the official Format String Syntax documentation so I’ll try to present it to you with more examples.

Python so far had the ‘%’ operator. It is still supported in Python 2.6 and in Python 3, but strings now also have a new format method.

>>> "This is an %s-style formatting" % "old"
'This is an old-style formatting'
>>> "This is a {0}-style formatting".format("new")
'This is a new-style formatting'

Formatting options go after a colon, for example the width specifications together with alignment options which are mostly useful for tabular representations. Let’s print the following table as an exercise:

|   * Company *   |  * Employees *  |
| Apple           | 35000           |
|         Nektoon | 000000000000005 |
| local.ch        |           24.80 |

Note the centered headlines, and the different left- and right-align values with some padding and floating point formatting dropped in for good measure.

This is easily achieved with the following code:

>>> "| {0:^15} | {1:^15} |".format("* Company *", "* Employees *")
>>> "| {0:15} | {1:<15} |".format("Apple", 35000)
>>> "| {0:>15} | {1:015} |".format("Nektoon", 5)
>>> "| {0:15} | {1:15.2f} |".format("local.ch", 24.8)

As you can see, numbers are right-aligned and strings left-aligned by default. But you can force a different behaviour by manually aligning them using ‘<’ or ‘>’.

Last but not least, format also accepts named parameters:

>>> "| {company:15} | {employees:15} |".format(
...     company="Nektoon", employees=5)
'| Nektoon         |               5 |'

That covers the most basic use cases of the format method. Having read this introduction, you can now use the Format String Syntax documentation as the reference it is.
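Putting positional and named parameters together, the table from above can also be generated in a loop. This is only a sketch: the rows list and the decision to right-align every number are my own choices for the example, so the output differs slightly from the hand-formatted table.

```python
rows = [("Apple", 35000), ("Nektoon", 5), ("local.ch", 24.8)]

header = "| {0:^15} | {1:^15} |".format("* Company *", "* Employees *")
lines = [header]
for company, employees in rows:
    # '<' left-aligns the name, '>' right-aligns the number, both 15 wide.
    lines.append("| {company:<15} | {employees:>15} |".format(
        company=company, employees=employees))

print("\n".join(lines))
```

Because the width is part of the format string, every row comes out with the same length no matter what the values are.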

The Python interpreter session is also available as a syntax-colored paste on Lodge It.

This post is part of the Python on the toilet series.

Pot - Python on the toilet

Thu, 05 Mar 2009

With the Pot: String formatting I’m starting a series on my blog: Python on the toilet. I’ll try to write short and dense introductions for Python features or packages. Each introduction should fit on one A4 page. So if you want to, you can print it out and hang it on your toilet door.

I’m not entirely sure yet what I’ll cover, but we’ll see what comes up. If you have topic proposals shoot me a mail.

This series was inspired of course by the great Testing on the toilet (ToT) series.

First swiss.py

Wed, 25 Feb 2009

Yesterday evening we had the very first swiss.py. Thanks to Uche Mennel for doing a very interesting presentation about Traits. I was really happy about the number and diversity of people who showed up.

The next swiss.py will take place on March 31 with Florian Bösch doing a presentation related to OpenGL programming in Python. Details and location will follow.

Switched to PyBlosxom, no comments anymore

Sun, 15 Feb 2009

I’ve just switched this blog from typo to PyBlosxom.

There is one change I’m adding at the same time: I don’t accept comments anymore. My mail address is in the sidebar column on the right, feel free to mail me. And you can of course voice your opinions on your own blog and link back to me – I’ll probably link back to you from the original post if I find yours. Basically I got tired of fighting spam in my blog, but I already do a good job of filtering it in my mail inbox.

There’s a technical reason for the blog software change. I got fed up with babysitting the mongrel instances and so I decided a while ago to switch to a completely static solution for my blog. And PyBlosxom’s static rendering fits the bill perfectly.

Python overview presentation

Tue, 10 Feb 2009

This evening I did a Webtuesday presentation about Python. Got nice feedback and lots of discussion afterwards. Thank you guys!

You can download the presentation (PDF).

Continuous testing with Python

Sat, 07 Feb 2009

Back when I did some Ruby on Rails development I was a big fan of autotest. With it I could stay in my editor while the project test suite got executed with every change.

Setup

Now that I’m working in Python I was looking for something similar and I was successful. You’ll need the following:

Then you can execute tdaemon like this in your project directory:

Usage

tdaemon.py --custom-args='--with-growl'

This will continually execute the test suite and notify you with growl about the status as you can see in this screencast:

Demo

Details

To get this working as shown in the screencast I actually had to make some changes.

First nosegrowl didn’t install well using easy_install as the images were missing. So I went ahead and did it manually:

$ hg clone http://hg.assembla.com/nosegrowl
$ cd nosegrowl/nose-growl/
$ sed -i.bak 's/growl.start/# growl.start/' nosegrowl/growler.py
$ python setup.py install

The ‘sed’ command is optional. But I don’t want to be notified when the test suite starts, only when it ends. So I comment out the growl.start line.

Additionally to make tdaemon less noisy when working with vim I added the swap files to the exclude list. Open the tdaemon.py file and edit the IGNORE_EXTENSIONS line to look like this:

IGNORE_EXTENSIONS = ('pyc', 'pyo', 'swp')

Install Python 2.6 on Debian Etch

Wed, 14 Jan 2009

Debian doesn’t yet have any Python 2.6 packages. But creating them on your own is very easy.

You can:

The download will disappear as soon as I see some real distribution which doesn’t take all my shortcuts.

So these are step-by-step instructions for typing in your shell.

Download

$ curl -O http://www.python.org/ftp/python/2.6.1/Python-2.6.1.tar.bz2
$ mv Python-2.6.1.tar.bz2 python2.6-2.6.1.tar.bz2
$ tar -xvjf python2.6-2.6.1.tar.bz2
$ mv Python-2.6.1 python2.6-2.6.1
$ cd python2.6-2.6.1/

You need to properly name the directories as “packagename-version”. In this case the package name is “python2.6” – not “python” – which is why the version seems to be repeated.

Get all the dependencies

$ sudo apt-get install fakeroot dpkg-dev dh-make
$ sudo apt-get build-dep python2.5

Create package

$ dh_make -e yourmail@yourdomain.com -f ../python2.6-2.6.1.tar.bz2
$ vi debian/control
$ vi debian/rules

Above you have to edit control and rules. The contents are below:

debian/control

I just copied the dependencies from the official python2.5 package to save time.

Source: python2.6
Section: python
Priority: extra
Maintainer: Patrice Neff
Standards-Version: 3.7.2
Build-Depends: debhelper (>= 5), autotools-dev, autoconf, libreadline5-dev, libncursesw5-dev (>= 5.3), tk8.4-dev, libdb4.4-dev, zlib1g-dev, libgdbm-dev, blt-dev (>= 2.4z), libssl-dev, sharutils, libbz2-dev, libbluetooth2-dev [!hurd-i386 !kfreebsd-i386 !kfreebsd-amd64], locales, libsqlite3-dev, libffi4-dev (>= 4.1.0), mime-support, libgpmg1 [!hurd-i386 !kfreebsd-i386 !kfreebsd-amd64], netbase, lsb-release, bzip2, libffi4-dev (>= 4.1.1-11) [m68k], binutils (>= 2.17-2+b2) [m68k]
Build-Depends-Indep: libhtml-tree-perl, tetex-bin, tetex-extra, texinfo, emacs21, debiandoc-sgml, sharutils

Package: python2.6
Architecture: any
Depends: ${shlibs:Depends}, ${misc:Depends}, mime-support
Description: An interactive high-level object-oriented language (version 2.6)
 Version 2.6 of the high-level, interactive object oriented language,
 includes an extensive class library with lots of goodies for
 network programming, system administration, sounds and graphics.

debian/rules

Edit the debian/rules file. This is a Makefile used to build the package. You’ll need to add some fine-tuning to the install target.

At the top, after the CFLAGS line, add:

ROOT = $(CURDIR)/debian/python2.6/usr

Then in the install target, you need to add the contents from below.

install: build
    # …..
    $(MAKE) prefix=$(CURDIR)/debian/python2.6/usr install

    # START from here
    # Remove python, we'll only have python2.6
    rm $(ROOT)/bin/python{,-config}
    mv $(ROOT)/share/man/man1/{python,python2.6}.1
    # Remove stuff we don't need
    rm $(ROOT)/bin/{idle,2to3,pydoc,smtpd.py}
    rm -r $(ROOT)/lib/python2.6/test
    # END until here

Build package

You can now compile and install this package.

$ dpkg-buildpackage -rfakeroot
$ sudo dpkg -i ../python*.deb

Proper file overwrites in Python

Mon, 05 Jan 2009

For Nektoon I’m implementing a storage service in Python. It will work with individual text files that it exposes over a REST API.

I struggled a bit with how to correctly handle file overwrites. In the end I came up with this code:

import os
import portalocker

if os.name == 'posix':
    # Rely on the atomicity of Posix renames.
    rename = os.rename
else:
    def rename(src, dst):
        """Rename the file or directory src to dst. If dst exists and is a
        file, it will be replaced silently if the user has permission.
        """
        if os.path.exists(src) and os.path.isfile(dst):
            os.remove(dst)
        os.rename(src, dst)

def write(filename, contents):
    filename_tmp = filename + '.TMP'
    # Append mode does not truncate, so it is safe to open before locking.
    with open(filename_tmp, 'a') as lockfile:
        portalocker.lock(lockfile, portalocker.LOCK_EX)
        # Only truncate and write once the exclusive lock is held.
        with open(filename_tmp, 'w') as out_file:
            out_file.write(contents)
        rename(filename_tmp, filename)

There are two important parts here.

First the locking – with which I struggled most. Locking only works properly on file objects. But if I open a file in write mode then it gets truncated before the lock is even checked. That’s why I first open the file in append mode, which is non-destructive. Then I acquire a lock and only continue to open the file in the destructive write mode once I hold that lock. The pattern is the same whether you use the built-in fcntl.flock directly or the excellent portalocker, which abstracts away platform differences.
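For comparison, here is the same open-for-append, lock, then truncate pattern using fcntl.flock directly. It is a sketch that only works on Posix platforms; locked_write and the temporary example path are names I made up for the demonstration.

```python
import fcntl
import os
import tempfile

def locked_write(path, contents):
    """Overwrite path, but only truncate it once the lock is held."""
    # Append mode creates the file if needed and never truncates it.
    with open(path, 'a') as lockfile:
        fcntl.flock(lockfile, fcntl.LOCK_EX)   # blocks until we own the lock
        # Only now is it safe to open in the destructive write mode.
        with open(path, 'w') as out_file:
            out_file.write(contents)
        # The lock is released when lockfile is closed at the end of the block.

path = os.path.join(tempfile.mkdtemp(), 'example.txt')
locked_write(path, 'first version')
locked_write(path, 'second')
with open(path) as f:
    print(f.read())
```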

The second part is the rename. Unfortunately the rename function won’t work on MS Windows platforms if the target file already exists. On Posix platforms it works, and Posix even requires this behaviour because rename is guaranteed to be atomic. So I implement a compatible (but non-atomic!) rename for all non-Posix platforms so that the code will still work for developers on MS Windows. But it should only be used in production on Posix platforms.
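On Python 3.3 and later, os.replace overwrites an existing target on every platform and is atomic on Posix, so the platform check is no longer needed there. A sketch under that assumption, without any locking; atomic_write is a hypothetical helper name and the temporary directory is only there to make the example runnable:

```python
import os
import tempfile

def atomic_write(filename, contents):
    """Write contents to a temporary file, then move it over filename."""
    directory = os.path.dirname(os.path.abspath(filename))
    # Create the temporary file in the same directory so the final
    # rename never has to cross a filesystem boundary.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix='.TMP')
    try:
        with os.fdopen(fd, 'w') as out_file:
            out_file.write(contents)
        os.replace(tmp_path, filename)
    except BaseException:
        # Clean up the temporary file if anything went wrong.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise

target = os.path.join(tempfile.mkdtemp(), 'data.txt')
atomic_write(target, 'old contents')
atomic_write(target, 'new contents')
with open(target) as f:
    print(f.read())
```

Readers either see the old file or the new one, never a half-written mix, because the target only changes at the moment of the rename.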

Has anybody seen a better lock handling in Python that will solve those issues?

Cultural differences: Python vs. Java

Tue, 06 Nov 2007

I’m pretty programming language agnostic and fluent in a handful of different languages. That means I use the language that gets the job done. But I tend to lean towards dynamic programming languages, specifically PHP, Python and Ruby.

Last week I had to solve a non-trivial problem for my background: clustering of content. I had to write a program which takes a bunch of search results and clusters them together by content. So similar results would go into the same group.

I started with Carrot2 – a Java framework for exactly that purpose. The only available documentation is the API reference and some examples. The API documentation contains 796 classes. That’s no typo, count them if you must. I spent literally two working days trying to get it running. I got it running somehow but got stuck when I had to customize the text distance function.

That’s when I started to search for other packages. I found python-cluster. It exposes two classes (for the two different clustering algorithms) with a constructor and one method each. All I have to pass it is the list of results and a distance function.

I was up and running literally in less than an hour. Most of that I spent on a reasonable distance algorithm.
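To give an idea of how little such an interface needs, here is a self-contained sketch of clustering driven by nothing but a list of items and a distance function. It is not python-cluster’s API or algorithm; cluster_by_distance, word_distance, the threshold and the sample titles are all made up for illustration.

```python
def word_distance(a, b):
    """Jaccard distance between the word sets of two strings (0 = identical)."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    union = words_a | words_b
    if not union:
        return 0.0
    return 1.0 - len(words_a & words_b) / float(len(union))

def cluster_by_distance(items, distance, threshold):
    """Group items; an item joins a cluster if it is close to any member."""
    clusters = []
    for item in items:
        # Find every existing cluster that has a member near this item.
        near = [c for c in clusters
                if any(distance(item, member) <= threshold for member in c)]
        # Merge the item and all nearby clusters into one new cluster.
        merged = [item]
        for c in near:
            merged.extend(c)
            clusters.remove(c)
        clusters.append(merged)
    return clusters

results = [
    'python web framework comparison',
    'comparison of python web framework options',
    'cheap flights to zurich',
    'cheap flights from zurich',
]
groups = cluster_by_distance(results, word_distance, threshold=0.5)
for group in groups:
    print(sorted(group))
```

As in my case, nearly all of the interesting work ends up in the distance function; the grouping itself is generic.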

Not passing any judgment here. Both frameworks have their strengths. But I found it a very good example of the different philosophies in the two camps.