Okapi 1.0

Posted by Patrice Neff Fri, 14 Mar 2008

Yesterday we released version 1.0 of Okapi, a web framework built with PHP and XSLT. I’ve spent a substantial amount of time over the last few months working on that release. Okapi is the framework we use at local.ch for all our frontend needs and was originally developed by Silvan from Liip.

So far we have used a heavily modified fork of Okapi. New projects at local no longer use that internal fork but the official central Okapi instead. To facilitate that, I’ve merged some stuff from our fork into the main repository and cleaned up the code base so that Okapi now sucks less.

The main features are these:

Because of its good XML and XSLT support it’s ideal for service-oriented architectures, especially when built with REST APIs.

Development on Okapi will continue. But the idea is to add new features as extensions, so that the core can stay small.

PHP Testing with SimpleTest

Posted by Patrice Neff Tue, 22 May 2007

Maarten’s post at Tillate finally brought the motivation to document the PHP testing approach we use at local.ch.

First let me give you a short introduction to our architecture at local.ch. We have a clear separation of frontend (presentation, user-visible parts) and backend (search logic and database access). The frontend is written in PHP and XSLT. The PHP part basically only orchestrates queries to our Java-based backend and passes the XML responses to XSLT. The bigger part of the system is the XSLT stylesheets. All this means that traditional unit tests don’t have much value for the frontend, as there isn’t much traditional logic. But we do need functional and integration testing.

We have only recently had a nice PHP-based testing infrastructure. Before that, we almost exclusively used Selenium Core (see for example my presentation from last year). Now we use SimpleTest, slightly extended, with a helper class for the Selenium testing (to be documented in a separate blog post).

This is the basic test.php file which we use to execute the tests:

<?php
require_once("common.php");

// "Configuration"
$GLOBALS['TLD'] = 'local.ch';
$GLOBALS['SELENIUM_SERVER'] = 'localhost';

// Developer-local overrides (this file is not under version control)
if (file_exists('config_developer.php')) {
    include_once('config_developer.php');
}
// Environment variables override both of the above
if (getenv('SELENIUM_SERVER')) {
    $GLOBALS['SELENIUM_SERVER'] = getenv('SELENIUM_SERVER');
}
if (getenv('TLD')) {
    $GLOBALS['TLD'] = getenv('TLD');
}

/**
 * $onlyCase: Only run this test case
 * $onlyTest: Only run this test within the case
 */
function runAllTests($onlyCase = false, $onlyTest = false) {
    $test = new TestSuite('All tests');
    $dirs = array("unit", "selenium", "selenium/*");

    // Collect every PHP file in the test directories
    foreach ($dirs as $dir) {
        foreach (glob($dir . '/*.php') as $file) {
            $test->addTestFile($file);
        }
    }

    if (!empty($onlyCase)) {
        $result = $test->run(new SelectiveReporter(new TextReporter(), $onlyCase, $onlyTest));
    } else {
        $result = $test->run(new XMLReporter());
    }
    return ($result ? 0 : 1);
}

// Optional command line arguments: test case name, then test method name
$onlyCase = isset($argv[1]) ? $argv[1] : false;
$onlyTest = isset($argv[2]) ? $argv[2] : false;
// Hand the result to the caller as the exit status (0 = success)
exit(runAllTests($onlyCase, $onlyTest));
?>

The top part sets up some configuration values we use for Selenium. There are two global variables: TLD, which defines the host name to test against, and SELENIUM_SERVER, which is the Selenium server to connect to. There are two ways to change them. The first is the “config_developer.php” file, which is excluded from version control and can be created by each developer. The second is to set environment variables when calling the test script; these take precedence over the file.
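The precedence described above can be sketched in isolation. This is only an illustration: resolveSetting() and the array-based developer configuration are hypothetical and not part of our test.php, which writes to $GLOBALS directly, and 'local.dev' is a made-up host name.

```php
<?php
// Hypothetical helper (not in the original test.php): resolves one setting
// in the order default -> developer config -> environment variable.
function resolveSetting($name, $default, array $developerConfig = array())
{
    $value = $default;

    // A developer-local override, e.g. as loaded from config_developer.php.
    if (array_key_exists($name, $developerConfig)) {
        $value = $developerConfig[$name];
    }

    // An environment variable wins over everything else.
    $env = getenv($name);
    if ($env !== false && $env !== '') {
        $value = $env;
    }

    return $value;
}

// Example: the developer file points the tests at a local host name.
$developerConfig = array('TLD' => 'local.dev');
echo resolveSetting('TLD', 'local.ch', $developerConfig), "\n";
echo resolveSetting('SELENIUM_SERVER', 'localhost', $developerConfig), "\n";
```

What the two lines print depends on whether TLD or SELENIUM_SERVER are set in the environment of the calling shell.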

After that the tests are run. The script collects the test files from a set of directories and then uses either the SelectiveReporter or our own XMLReporter to execute them. The SelectiveReporter will only execute a given test class or even only a given method (the first and second parameter from the command line respectively). The XMLReporter gives a JUnit-style parseable output that we use for the continuous integration tool (Bamboo in our case).

The included common.php file contains this:

<?php
error_reporting(E_ALL);
ini_set('log_errors', '0');

if (! defined('SIMPLE_TEST')) {
    define('SIMPLE_TEST', BX_PROJECT_DIR . 'inc/vendor/simpletest/');
}
require_once(SIMPLE_TEST . 'reporter.php');
require_once(SIMPLE_TEST . 'unit_tester.php');

class XMLReporter extends SimpleReporter {
    function XMLReporter() {
        $this->SimpleReporter();

        $this->doc = new DOMDocument();
        // The XML literal was lost from this listing; an empty JUnit-style
        // <testsuite/> root element is assumed here.
        $this->doc->loadXML('<testsuite/>');
        $this->root = $this->doc->documentElement;
    }

    function paintHeader($test_name) {
        $this->testsStart = microtime(true);
        $this->root->setAttribute('name', $test_name);
        $this->root->setAttribute('timestamp', date('c'));
        $this->root->setAttribute('hostname', 'localhost');
        echo "\n";
        // Assumed: opens the XML comment that paintFooter() closes with "-->",
        // so that the progress output below does not break the XML.
        echo "<!--\n";
    }

    /**
     * @param string $test_name Name class of test.
     * @access public
     */
    function paintFooter($test_name) {
        echo "-->\n";
        $duration = microtime(true) - $this->testsStart;
        $this->root->setAttribute('tests', $this->getPassCount() + $this->getFailCount() + $this->getExceptionCount());
        $this->root->setAttribute('failures', $this->getFailCount());
        $this->root->setAttribute('errors', $this->getExceptionCount());
        $this->root->setAttribute('time', $duration);
        $this->doc->formatOutput = true;
        $xml = $this->doc->saveXML();
        // Cut out XML declaration
        echo preg_replace('/<\?[^>]*\?>/', "", $xml);
        echo "\n";
    }

    function paintCaseStart($case) {
        echo "- case start $case\n";
        $this->currentCaseName = $case;
    }

    function paintCaseEnd($case) {
        // No output here
    }

    function paintMethodStart($test) {
        echo " - test start: $test\n";
        $this->methodStart = microtime(true);
        $this->currCase = $this->doc->createElement('testcase');
    }

    function paintMethodEnd($test) {
        $duration = microtime(true) - $this->methodStart;
        $this->currCase->setAttribute('name', $test);
        $this->currCase->setAttribute('classname', $this->currentCaseName);
        $this->currCase->setAttribute('time', $duration);
        $this->root->appendChild($this->currCase);
    }

    function paintFail($message) {
        parent::paintFail($message);
        if (!$this->currCase) {
            error_log("!! currCase was not set.");
            return;
        }
        error_log("Failure: " . $message);

        // Record the failure below the current <testcase> element
        $ch = $this->doc->createElement('failure');
        $breadcrumb = $this->getTestList();
        $ch->setAttribute('message', $breadcrumb[count($breadcrumb)-1]);
        $ch->setAttribute('type', $breadcrumb[count($breadcrumb)-1]);
        $message = implode(' -> ', $breadcrumb) . "\n\n\n" . $message;
        $content = $this->doc->createTextNode($message);
        $ch->appendChild($content);
        $this->currCase->appendChild($ch);
    }
}
?>

This file sets up SimpleTest by including the necessary files. Then follows the definition of the XMLReporter. While the tests run it prints some debugging output so we know where it is. That’s necessary for us because our Selenium tests take about 15 to 20 minutes. At the end follows the XML result, which can be parsed by Bamboo. It should also work for other tools that expect JUnit XML output, but I haven’t tested that.
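To give an idea of the shape of that XML result, here is a minimal sketch that builds a comparable document by hand with DOMDocument, independent of SimpleTest. The element and attribute names follow the common JUnit convention that the reporter targets; the suite, class and test names and the timings are invented for illustration.

```php
<?php
// Builds a tiny JUnit-style report by hand to show the shape of the output.
// All names and numbers below are made up for illustration.
$doc = new DOMDocument('1.0', 'UTF-8');
$doc->formatOutput = true;

$suite = $doc->createElement('testsuite');
$suite->setAttribute('name', 'All tests');
$suite->setAttribute('tests', '2');
$suite->setAttribute('failures', '1');
$suite->setAttribute('errors', '0');
$doc->appendChild($suite);

// A passing test is a <testcase> element without children.
$passed = $doc->createElement('testcase');
$passed->setAttribute('classname', 'SearchTest');
$passed->setAttribute('name', 'testResultList');
$passed->setAttribute('time', '0.42');
$suite->appendChild($passed);

// A failing test carries a <failure> child with the message.
$failed = $doc->createElement('testcase');
$failed->setAttribute('classname', 'SearchTest');
$failed->setAttribute('name', 'testPaging');
$failed->setAttribute('time', '1.03');
$failure = $doc->createElement('failure');
$failure->setAttribute('message', 'testPaging');
$failure->setAttribute('type', 'testPaging');
$failure->appendChild($doc->createTextNode('Expected 10 results, got 9'));
$failed->appendChild($failure);
$suite->appendChild($failed);

$xml = $doc->saveXML();
echo $xml;
```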

Looking for a frontend developer

Posted by Patrice Neff Wed, 14 Feb 2007

At local.ch we’re looking for a developer in the area of PHP/XSLT development. You will take over work regarding all user-visible aspects with technologies such as PHP, XSLT or Javascript.

A job at local.ch gives you a lot of freedom to explore, find good solutions, learn new technologies, bring in your opinions and knowledge.

We want a developer who knows how to write clean XHTML and CSS, has experience in client side Javascript and is well-versed in XML. We will gladly teach you XSLT on the job, but if you already know how to program in XSLT, so much the better.

If you’re interested, head over to our blog to read more details. You can get in contact with me via e-mail (patrice [at] local.ch) or Skype (patriceneff).

Mail testing with Selenium

Posted by Patrice Neff Thu, 23 Nov 2006

For the next phase of local.ch, e-mail processes will play a central role. So I wanted to include those processes in our Selenium tests. It’s actually quite easy to do.

First create an account that test mails can be sent to. That account should be accessible by one of your scripts. I use a normal IMAP account for that. Then write a script which always outputs the newest mail on that account. I include some of the important headers plus the body (all body parts for multi-part mails). I also made that page refresh itself every two seconds.
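The output side of such a script might look roughly like the following sketch. It is not my actual script: renderMailPage() and the structure of the $mail array are hypothetical, the IMAP fetching is left out entirely, and everything is escaped as plain text, so HTML mails would need to be passed through differently.

```php
<?php
// Hypothetical renderer: $mail is assumed to be an array with 'headers'
// (name => value) and 'parts' (list of body strings) that was already
// fetched from the test account.
function renderMailPage(array $mail)
{
    $html  = "<html><head>\n";
    // Reload every two seconds so the newest mail shows up without a click.
    $html .= "<meta http-equiv=\"refresh\" content=\"2\">\n";
    $html .= "<title>Newest test mail</title></head><body>\n";

    $html .= "<dl id=\"headers\">\n";
    foreach ($mail['headers'] as $name => $value) {
        $html .= '<dt>' . htmlspecialchars($name) . '</dt>'
               . '<dd>' . htmlspecialchars($value) . "</dd>\n";
    }
    $html .= "</dl>\n";

    // One block per body part, so multi-part mails are shown completely.
    foreach ($mail['parts'] as $i => $part) {
        $html .= '<pre id="part' . $i . '">' . htmlspecialchars($part) . "</pre>\n";
    }

    $html .= "</body></html>\n";
    return $html;
}

// Made-up example mail.
$mail = array(
    'headers' => array(
        'From'    => 'noreply@example.org',
        'To'      => 'selenium-test@example.org',
        'Subject' => 'Please confirm your registration',
    ),
    'parts' => array("Hello,\n\nplease confirm your address."),
);
echo renderMailPage($mail);
```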

Then writing the tests is easy. Write a test first that executes the action that sends a mail. Make sure the mail is sent to your test account.

Next write a test that opens the getmail script (using the selenese command “open”). Follow that with a waitForTextPresent action to wait until the test mail has arrived, which never takes more than a few seconds in my environment. Then you can use the normal test commands such as verifyText, verifyTextPresent or even click etc. if you output HTML mails correctly.

Works like a charm around here. If there is interest I can publish my script to get the mails. It’s written in PHP and is basically an IMAP client using the two PEAR packages Net_IMAP and Mail_mimeDecode.

Book Review: Building Scalable Web Sites

Posted by Patrice Neff Wed, 07 Jun 2006

A short while back the book Building Scalable Web Sites came out on Safari. The book is written by Cal Henderson, one of the main people behind Flickr. As I'm currently involved in building a Web application (local.ch) I was interested to learn a few lessons from the Flickr people. Okay, Flickr is not beating any speed records right now, but it's still an incredibly big application with tons of users and data.

Management review: the book is worth a read.
Technical short review: the book covers a lot of stuff a bit and nothing extremely well.

The book does not completely live up to its title, as scaling is only part of the book. It seems to be more a list of lessons learned while building Flickr.

That's also the reason for one of the book's main deficiencies: it's mostly PHP and MySQL only. But it also includes enough lessons that can be applied in other environments for it to be useful.

A short chapter by chapter review follows.

  1. Introduction: Doesn't need a review.
  2. Web Application Architecture: Interesting notes about building the architecture of your application. Including parts about hardware and networking.
  3. Development Environments: A few quick tips about using source control, deployment, testing, etc. Stuff that's covered a lot better and in more depth in many other places.
  4. i18n, L10n, and Unicode: Notes about internationalizing an application, translating, etc. If you have worked with Unicode before there is not much new information here.
  5. Data Integrity and Security: Filter all your input and output. Good for Cal that he includes this, because too many Web developers still fail at this. And many other books don't include it, so whole new legions of Web developers come out without this knowledge.
  6. Email: Flickr does Email handling for moblogging. As I wrote that part for the KAYWA weblogs I know how frustrating this can be. That's why section 6.7 is titled "Mobile Carriers Hate You". All in all the chapter mainly covers dealing with incoming email including how to handle attachments.
  7. Remote Services: Interesting information about how to handle remote services and also communication in your application. I found section 7.5 particularly interesting where Cal describes an asynchronous service and how they implemented it for the photo uploads on Flickr.
  8. Bottlenecks: Preparing you for the scaling chapter. How you identify where your application is slow.
  9. Scaling Web Applications: The chapter that set the book's title. A lot of information but in my opinion it doesn't go deep enough.
  10. Statistics, Monitoring, and Alerting: Basically only explains a few tools for gathering statistics of the system. The section on alerting could just as well have been left out.
  11. APIs: How to publish your content with a few APIs. RSS, Atom and Web Services are the buzz words that describe this chapter.

All in all I have mixed feelings about this book. It's nothing earth-shattering but certainly worth a read. If you can read it online on Safari, it's well worth adding to your library for a while. If you have to shell out the money for the book, you will have to decide for yourself whether you can learn enough from it for it to be worth the money. It wouldn't have been worth it for me.

Be careful when cutting UTF-8 text

Posted by Patrice Neff Fri, 24 Mar 2006

I just fixed a nasty problem on the two planets I run (one for namics and one for local.ch). The aggregator script would run forever without stopping. A bit of debugging showed that the problem was in how UTF-8 characters were handled (or rather weren't handled).

The script uses PHP's DOMDocument, more specifically its methods loadHTML and saveXML, to extract valid XML from the blog posts. That's necessary because the posts are shortened and this shortening can lead to a completely invalid (X)HTML structure. Let alone all the rubbish content that many programs produce.

Shortening the content was of course done with the PHP function substr. And that's where the problem was. The relevant part of the text that caused the problem was "steht nur auf Englisch zur Verfügung". Encoded as UTF-8 and displayed byte by byte, this becomes "steht nur auf Englisch zur VerfÃ¼gung". To this string substr was applied and it produced "steht nur auf Englisch zur VerfÃ": the second half of the UTF-8 character was cut off. If you know anything about UTF-8 you will go "ouch" here, smile about your knowledge and skip the following two paragraphs. If you don't see the problem yet, let me enlighten you.

UTF-8 is a Unicode encoding. So it can transport any of the characters defined in Unicode, which is just about any character that you might ever want to use in today's computing. It's neat because for most of the content in European languages it requires just one byte per character (versus two bytes in UTF-16 for example). When a byte is in the ASCII range it's displayed and all is well. But when the byte is outside of the ASCII range (which only knows 128 characters and can be encoded in 7 bits per character, as you probably know) this means that one or more of the following bytes belong to the same character; the first byte tells the decoder how many. I'm sorry, I don't really know how to explain that any better so let me just give you an example.

The string Für becomes FÃ¼r when its UTF-8 bytes are displayed one by one. So the UTF-8 decoder would read the first byte, the letter F. That fits nicely into ASCII, so that byte is read and the decoder continues with the next character. It reads one byte which is Ã. "Holy cow" you hear the decoder exclaim, "that's not ASCII". So the decoder has to read one additional byte and gets the ¼. Reading those two bytes, putting them together and calculating a bit, the decoder then knows that it has just read an ü.
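The "calculating a bit" part can be spelled out for this one character. The sketch below only handles the two-byte case from the example and is not a general UTF-8 decoder.

```php
<?php
// "Für" written with explicit byte escapes so that the encoding of this
// file itself does not matter.
$bytes = "F\xC3\xBCr";

// Hex dump of the string: four bytes for three characters.
echo bin2hex($bytes), "\n";

$lead = ord($bytes[1]); // 0xC3 = 110 00011: lead byte of a two-byte sequence
$cont = ord($bytes[2]); // 0xBC = 10 111100: continuation byte

// Drop the marker bits (110 and 10) and join the remaining 5 + 6 payload bits.
$codePoint = (($lead & 0x1F) << 6) | ($cont & 0x3F);

// U+00FC is the Unicode code point of the letter ü.
printf("U+%04X\n", $codePoint);
```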

So do you already see the problem in the UTF-8 string "steht nur auf Englisch zur VerfÃ"? The decoder arrives at the last byte, which is Ã. It knows it has to read one more byte, but there is none. So somehow the PHP code in question decides to patiently wait until the string magically grows longer.

The real problem though is of course the careless use of substr. You shouldn't just cut UTF-8 characters in half. The problem can be solved with mb_substr, a substr function that is Unicode-aware. Just give it 'utf-8' as its fourth argument and the problem is solved.
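A small sketch of the difference, using a shortened version of the string from above. It assumes the mbstring extension is available; the cut position of 9 is chosen so that it falls exactly in the middle of the ü.

```php
<?php
// "zur Verfügung", UTF-8 encoded; the ü is written as its two bytes.
$text = "zur Verf\xC3\xBCgung";

// substr counts bytes: the ninth byte is only the first half of the ü.
$cutBytes = substr($text, 0, 9);

// mb_substr counts characters when told the encoding: the ü stays whole.
$cutChars = mb_substr($text, 0, 9, 'UTF-8');

// Check whether each result is still valid UTF-8.
var_dump(mb_check_encoding($cutBytes, 'UTF-8'));
var_dump(mb_check_encoding($cutChars, 'UTF-8'));
```

The first result ends in a dangling lead byte and is no longer valid UTF-8, while the second one ends with the complete character.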


Update: It seems that the problem goes away automatically with newer libxml versions. On my server it's 2.6.16, while Chregu uses 2.6.23 and can't reproduce the problem. Thanks Chregu for digging into this.