Last week I had to solve a problem that is non-trivial for someone with my background: content clustering. I had to write a program that takes a set of search results and groups them by content, so that similar results end up in the same cluster.
I started with Carrot2, a Java framework built for exactly that purpose. The only available documentation is the API reference and a few examples. The API reference lists 796 classes. That's no typo; count them if you must. I spent two full working days trying to get it running. I eventually got it working, but got stuck when I needed to customize the text distance function.
That's when I started looking for other packages and found python-cluster. It exposes two classes (one for each of its two clustering algorithms), each with a constructor and a single method. All I have to pass in is the list of results and a distance function.
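To illustrate how small an interface of the form "a list of items plus a distance function" can be, here is a minimal pure-Python sketch of single-linkage hierarchical clustering cut at a distance threshold. It is not python-cluster's code or API: the function name `cluster_by_distance` and its `threshold` parameter are hypothetical, chosen only for this example.

```python
from typing import Callable, Dict, List, Sequence, TypeVar

T = TypeVar("T")


def cluster_by_distance(
    items: Sequence[T],
    distance: Callable[[T, T], float],
    threshold: float,
) -> List[List[T]]:
    """Hypothetical helper: single-linkage clustering cut at `threshold`.

    Two items share a cluster if they are connected by a chain of pairs
    whose distance is at most `threshold`.
    """
    # Union-find over item indices; each item starts in its own cluster.
    parent = list(range(len(items)))

    def find(i: int) -> int:
        # Follow parent links to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Compare every pair once and merge clusters of close items.
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            if distance(items[i], items[j]) <= threshold:
                root_i, root_j = find(i), find(j)
                if root_i != root_j:
                    parent[root_j] = root_i

    # Collect items by their root, preserving input order.
    groups: Dict[int, List[T]] = {}
    for i, item in enumerate(items):
        groups.setdefault(find(i), []).append(item)
    return list(groups.values())


if __name__ == "__main__":
    numbers = [1, 2, 10, 11, 30]
    print(cluster_by_distance(numbers, lambda a, b: abs(a - b), 2))
    # prints [[1, 2], [10, 11], [30]]
```

The caller supplies only the data and the distance function; everything specific to the domain (here, plain numbers) lives in that function. The pairwise loop is quadratic, which is acceptable for a page of search results but not for large corpora.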
I was up and running in less than an hour, most of which I spent finding a reasonable distance algorithm.
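The post does not say which distance algorithm was used. As one plausible choice, assumed here purely for illustration, the Jaccard distance between the word sets of two result texts is easy to write and returns values between 0 (same words) and 1 (no words in common). The names `word_set` and `jaccard_distance` are hypothetical.

```python
import re
from typing import Set


def word_set(text: str) -> Set[str]:
    # Lowercase the text and split it into runs of word characters.
    return set(re.findall(r"\w+", text.lower()))


def jaccard_distance(a: str, b: str) -> float:
    """Assumed example metric: 1 - |A ∩ B| / |A ∪ B| over word sets."""
    words_a, words_b = word_set(a), word_set(b)
    if not words_a and not words_b:
        # Two empty texts are treated as identical.
        return 0.0
    return 1.0 - len(words_a & words_b) / len(words_a | words_b)


if __name__ == "__main__":
    print(jaccard_distance("Pizza delivery Zurich", "pizza delivery Bern"))
    # prints 0.5 (2 shared words out of 4 distinct words)
```

A function with this signature can be passed directly to any clustering routine that accepts a two-argument distance function. Real search results would likely need more work, such as stop-word removal or weighting title words more heavily.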
I'm not passing judgment here; both frameworks have their strengths. But I found it a very good example of the different philosophies of the two camps.
We then moved to El Lokal for beer and pizza. We at local.ch had to admit defeat to tel.search.ch when it came to looking up the number of a pizza delivery service, but only because our mobile interface isn't ready yet. It is already being implemented, though, and it is one of the missing features I personally care about most.
Once again it was interesting to meet a few people I had so far only known online, especially Denis De Mesmaeker and Alain Petignat.
Next time I'll talk about Ruby on Rails, which I already defended at our table yesterday. We are planning to hold that event on June 13, but that date collides with the Swiss victory over France at the Football World Cup. Details are still being negotiated and will be announced on the Webtuesday Zurich website.