Everyman Software: August 2012

When building a project with Neo4j or most other graph databases, it is impossible to avoid learning about Tinkerpop's excellent Gremlin graph processing language. The processing layer of Gremlin is built on top of Pipes, a dataflow programming library.

I was inspired by the syntax and ease-of-use of Gremlin to build a simple processing pipeline library in PHP. The result is Plumber, a library for easily building extensible deferred-processing pipelines.

Plumber is built on top of PHP's native Iterator library. The idea is simple: instantiate a new processing pipeline, attach processing pipes to it, then send an iterator through the pipeline and iterate over the results in a foreach. Each element that comes out the other end of the pipeline has been passed through and processed by each pipe. And because Plumber uses iterators, it natively supports lazy-loading and just-in-time evaluation.
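To make the deferred-processing idea concrete, here is a minimal sketch that uses only PHP's built-in SPL iterators, not Plumber itself. It assumes PHP 5.4 or later (for CallbackFilterIterator), and the names $log, $numbers and $evens are made up for the illustration. The point is that attaching the filter does no work; the callback only runs as the foreach pulls each element.

```php
<?php
// Sketch of lazy evaluation with built-in SPL iterators (not Plumber's API).
// Assumes PHP 5.4+ for CallbackFilterIterator.

$log = new ArrayObject(); // records the order in which things happen

$numbers = new ArrayIterator(array(1, 2, 3, 4));

// Wrapping the iterator runs nothing yet; the callback is only stored.
$evens = new CallbackFilterIterator($numbers, function ($n) use ($log) {
    $log[] = "filter $n";
    return $n % 2 === 0;
});

$log[] = 'pipeline built';

// The filter callback fires here, one element at a time.
foreach ($evens as $n) {
    $log[] = "got $n";
}

echo implode("\n", $log->getArrayCopy()), "\n";
```

In the printed log, "pipeline built" comes before any "filter" entry, and the "filter" and "got" entries are interleaved rather than grouped, which is the behavior a Plumber pipeline relies on.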

A simple example would be reading a set of records from a database, formatting them in some manner, then echoing them out to the screen:

// Placeholder: replace with code that retrieves user records from a
// database as an array or Iterator.
$users = array();

$names = array();
foreach ($users as $user) {
    if (!$user['first_name'] || !$user['last_name']) {
        continue;
    }

    $name = $user['first_name'] . ' ' . $user['last_name'];
    $name = ucwords($name);
    $name = htmlentities($name);
    $names[] = $name;
}

// later on, display the names
foreach ($names as $name) {
    echo "$name<br>";
}

There are a few obvious downsides to doing things this way: the entire set of records is looped through more than once; all the records must be in memory at the same time (twice even, once for $users and once for $names); and the processing steps in the foreach are executed immediately on every record, whether or not the result is ever displayed. These may not seem like a big deal if the record set is small and the processing steps are trivial, but they become real problems as the record set grows or the processing gets more expensive.

Here is the same code using Plumber:

// Placeholder: replace with code that retrieves user records from a
// database as an array or Iterator.
$users = array();

$names = new Everyman\Plumber\Pipeline();
$names->filter(function ($user) {
        return $user['first_name'] && $user['last_name'];
    })
    ->transform(function ($user) {
        return $user['first_name'] . ' ' . $user['last_name'];
    })
    ->transform('ucwords')
    ->transform('htmlentities');

// later on, display the names
foreach ($names($users) as $name) {
    echo "$name<br>";
}

The list of $users is only looped through one time, and there is no need to keep a separate list of $names in sync with the $users list. Each $user is transformed into a $name on demand, so no intermediate list of names is ever held in memory.

This can all be accomplished using Iterators, but there is quite a bit of boilerplate code involved. Plumber is meant to remove most of the boilerplate and let the developer concentrate on writing their business logic.
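For comparison, here is roughly what the same names pipeline looks like with plain SPL iterators. This is a sketch, not Plumber's implementation: TransformIterator is a hypothetical helper class written for this example, the sample $users data is made up, and CallbackFilterIterator requires PHP 5.4 or later.

```php
<?php
// Plain-SPL sketch of the names pipeline (not Plumber's implementation).

// Hypothetical helper: applies a callback to each element as it is read.
class TransformIterator extends IteratorIterator
{
    private $callback;

    public function __construct(Traversable $inner, $callback)
    {
        parent::__construct($inner);
        $this->callback = $callback;
    }

    // The attribute silences the PHP 8.1+ return-type deprecation notice;
    // older PHP versions treat the line as a comment.
    #[\ReturnTypeWillChange]
    public function current()
    {
        return call_user_func($this->callback, parent::current());
    }
}

// Made-up sample data standing in for database records.
$users = new ArrayIterator(array(
    array('first_name' => 'ada', 'last_name' => 'lovelace'),
    array('first_name' => '', 'last_name' => 'nobody'),
    array('first_name' => 'tom', 'last_name' => 'smith & sons'),
));

// Each stage wraps the previous one; nothing runs until the foreach.
$complete = new CallbackFilterIterator($users, function ($user) {
    return $user['first_name'] && $user['last_name'];
});
$names = new TransformIterator($complete, function ($user) {
    return $user['first_name'] . ' ' . $user['last_name'];
});
$names = new TransformIterator($names, 'ucwords');
$names = new TransformIterator($names, 'htmlentities');

$output = array(); // collected only so the result can be inspected
foreach ($names as $name) {
    $output[] = $name;
    echo "$name<br>";
}
```

Every stage needs its own wrapper object, and the transform helper has to be written by hand. That is the kind of boilerplate the fluent filter() and transform() calls are meant to hide.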

There is more to Plumber, including several built-in pipe types and the ability to extend the library with your own custom pipes. It is also not necessary to use the fluent interface if that is not your style. More usage information can be found in the README file in the Plumber GitHub repo. Constructive feedback is always welcome!

2012-08-31

Plumbing PHP