2010-12-22

The Goal

I have pretty strong feelings about Agile. That is to say, I can't stand it. When someone mentions Agile to me all I think about are overpriced books, puffed up consultants and "coaches" and a whole slew of project management software tools that give Pro[du|je]ct Managers way too much noise vs. signal about how a project is really going. And all of it gets in the way of what I really love to do: sling great code.

But man, oh man, do I dig being agile.

There's already a pretty awesome article about the difference between "Agile" and "agile". It says just about everything I could write on the subject. Rather than rehash ideologies, I'm going to introduce a concept that has helped me towards gaining an agile mindset: the goal.

I've had the goal in my head for a while now. I never bothered writing it down until I threw together a presentation, in about 30 minutes, to help my new team with their agile adoption. That presentation is the first time I've seen the goal worded in this exact form. (I don't take credit for the goal; it's the product of a lot of reading, other people's presentations, some experiences good and bad, and a healthy dose of my own biased worldview.) Here is the goal:

Deliver working, usable, high-quality software as quickly as possible.

The goal answers a simple question: why are we adopting agile? This isn't about the project or product. The business has already defined the rationale for that. The goal is about the rationale for wanting to be agile while accomplishing the project.

Nothing in the goal (or the Agile Manifesto, for that matter), speaks to a specific process or method. And that's where the power of the goal comes from. It speaks to what the team is trying to accomplish, while making the how a secondary concern. Focusing on the goal allows a team to easily see where their agility is being compromised:
  • Is our software working? Is it ready to be put in the hands of our users? Are we making continual improvements to it?
  • Is it usable? Does it meet our users' needs and goals? Does it provide value to the stakeholders?
  • Is it high-quality? Is the code clean, as bug-free as we can make it, and easy to fix when the bugs we missed inevitably appear?
  • Are we delivering fast enough? Are we meeting the current needs of the business? Are we able to act on ideas and opportunities before our competition does?
If the answer to any of those questions becomes "no" at any time, then the process is broken. Fix it. You don't have to follow Scrum or any other Agile methodology to the letter. Process is highly context dependent. Only you and your team know what will get you to the goal.

If everyone can agree on the goal, the rest is downhill. The team will work out its own way of getting there. I suggest Scrum as a starting point because, as a method, it's easy to understand. But I believe that a team that is really focused on the goal will probably leave the guardrails of Scrum and start building its own process fairly quickly.

There's a concept that I first heard from Alistair Cockburn but originally came from Japanese martial arts: Shuhari 守破離. The basic premise is that there are 3 stages to attaining mastery of a skill:
  1. Shu — You imitate the form exactly. You do not deviate from the specified process, and you follow it to the letter.
  2. Ha — You begin to innovate for yourself. You tweak the process to take advantage of your own special circumstances and strengths.
  3. Ri — You no longer think in terms of forms or processes. There is a constant, smooth and uninterrupted flow of thought into action.
Everyone starts at Shu. Scrum is Shu, as is any other Agile method. Moving towards the goal helps a team move out of Shu and towards Ha. This is where agile beats out Agile. When you stop thinking about Agile or agile, and you simply act in the direction of the goal, you have attained Ri.

2010-12-15

Git, TargetProcess and Hashtags

At $job, we're using TargetProcess for our user story and task management. One of the neat things you can do in TP is tag commits to a code repository in such a way that they will link to the task associated with that commit:
#123 This commit is linked to task number 123
In order to get this to work, the hashtag has to be the first thing on the first line of the commit message. Anyone using `git` as their source control should immediately see the problem with this...

...In git commit messages, any line starting with # is considered a comment and is stripped out of the commit message. Quite annoyingly, there's no way to change git's default comment delimiter. Since $job is using Subversion, this isn't a problem for anyone except me, the lowly git user hobbling along with git-svn.
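To see the stripping in action, here is a throwaway demonstration in a scratch repo (hypothetical paths; assumes git is installed). Editor-based commits default to the "strip" cleanup, so it is forced explicitly here, since messages given with -F default to the gentler "whitespace" cleanup:

```shell
# Scratch-repo demo: the "strip" cleanup (the default for edited messages)
# throws away any line starting with "#".
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email demo@example.com
git config user.name demo
echo hi > file && git add file
printf '#123 This would be the TargetProcess tag\nActual subject line\n' |
    git commit -q --cleanup=strip -F -
git log -1 --format=%s   # the "#123" line is gone; prints "Actual subject line"
```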

The solution I've come up with is a two-pronged approach involving changing a default and adding a git-hook.

Firstly, the default --cleanup option on a `git-commit` command strips trailing and leading whitespace lines and all commentary (lines starting with #). So I aliased `git commit` to only worry about the whitespace lines, by adding this to my .gitconfig:
[alias]
    cmt = "commit --cleanup=whitespace"
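The same alias can also be set with `git config` rather than editing .gitconfig by hand. A sketch, using a throwaway config file in place of the real ~/.gitconfig (drop the --file flag to write the global config for real):

```shell
# Add the alias via the git config command (temp file stands in for ~/.gitconfig)
tmpcfg=$(mktemp)
git config --file "$tmpcfg" alias.cmt 'commit --cleanup=whitespace'
git config --file "$tmpcfg" alias.cmt   # prints: commit --cleanup=whitespace
```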
So now git will leave lines starting with "#" alone. This means I have to strip them out myself. This is accomplished with a git commit-msg hook that looks like this:
#!/bin/sh
# Use TargetProcess hashtags to link commits to tasks
# Use this in conjunction with --cleanup=whitespace
sed -i -e '/^#/d' -e 's/^@\(#.*\)/\1/' "$1"
The first sed expression strips out comment lines. The second one translates any line that starts with "@#" into one starting with "#". The trick is that it does it after the comment line stripping. So now I can do commits that look like this:
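Installing the hook and exercising it end-to-end in a scratch repository looks something like this (a sketch with hypothetical paths; assumes git and GNU sed, since BSD sed would need `-i ''`):

```shell
# Install the commit-msg hook in a scratch repo and confirm that "@#123"
# survives as "#123" (assumes git and GNU sed).
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email demo@example.com
git config user.name demo

# The hook must live in .git/hooks and be executable to run
cat > .git/hooks/commit-msg <<'EOF'
#!/bin/sh
sed -i -e '/^#/d' -e 's/^@\(#.*\)/\1/' "$1"
EOF
chmod +x .git/hooks/commit-msg

echo hi > file && git add file
# --cleanup=whitespace keeps "#" lines so the hook's output is not re-stripped
git commit -q --cleanup=whitespace -m '@#123 This commit is linked to task 123'
git log -1 --format=%s   # prints "#123 This commit is linked to task 123"
```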
git cmt

# The editor opens and I can type:

@#123 This commit is linked to task number 123
# This is a comment line that will get stripped out
#  So it this

When the commit goes through, voila! A commit message starting with a hashtag that can be picked up by TargetProcess.

Edit 2010-12-16: After experimentation and verification, it turns out that TP can pick up the task id hashtag anywhere in the commit message, so all of the above is not really necessary. It's interesting from an educational perspective, though.

2010-12-05

Private Git repository with Dropbox

I recently began working on a project that needs a remotely-accessible private git repository. The project needs to be done on the cheap, and we don't have the budget to pay for a private github or assembla repository. We're already using Dropbox to handle project documentation and such, so it became a natural fit for sharing code as well.

I already have the Dropbox client installed and set up on my development machine. Here is how to set up the repo:
cd /path/to/Dropbox
mkdir myproject
cd myproject
git --bare init
This initializes a "bare" repo: a repo without a working directory. In other words, you can't edit and commit to it directly. The repo is now your "remote", hosted on Dropbox and accessible through the local filesystem.

Create a working clone from the remote repo:
cd /path/to/myproject-working
git clone /path/to/Dropbox/myproject .
This will complain about cloning an empty repo, which is fine. Now follow the normal git development cycle:
echo "Read this!" > README
git add README
git commit -m "Initial commit"
When you're done making changes and committing, it's time to push to the remote repo in Dropbox:
git push /path/to/Dropbox/myproject --all
The `--all` flag is necessary on the first push to the remote. It can be left off of all subsequent pushes.

That's it! Now the repo is available on any computer linked to the Dropbox account, and can be checked out using the normal `git clone`, work, commit, `git push` cycle.

If you want other developers to have access, simply share the repo directory with their Dropbox accounts as you would any other directory. They can clone it and push to it just like any other git repository.
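The whole cycle can be exercised end-to-end. In this sketch, a temp directory stands in for the Dropbox folder (hypothetical paths), and two clones play the roles of two developers:

```shell
# End-to-end sketch: a temp dir simulates the shared Dropbox folder.
set -e
tmp=$(mktemp -d)
mkdir -p "$tmp/Dropbox"
git init -q --bare "$tmp/Dropbox/myproject"

# First developer: clone, commit, push
git clone -q "$tmp/Dropbox/myproject" "$tmp/dev1"
cd "$tmp/dev1"
git config user.email dev1@example.com
git config user.name dev1
echo "Read this!" > README
git add README
git commit -q -m "Initial commit"
git push -q "$tmp/Dropbox/myproject" --all

# Second developer: a fresh clone sees the pushed commit
git clone -q "$tmp/Dropbox/myproject" "$tmp/dev2"
cat "$tmp/dev2/README"   # prints "Read this!"
```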

Here's another trick if you don't have the Dropbox daemon (dropboxd) running all the time on your development machine. Put the following script in "/path/to/Dropbox/myproject/hooks". Name the script "dropbox-push" and make it executable (chmod +x dropbox-push):
#!/bin/sh

dropbox running
if [ $? -eq 1 ]; then
    echo "Dropbox started by another process."
    exit 0
fi

dropbox start
until [ "$(dropbox status)" = "Idle" ]; do
    sleep 1
done
dropbox stop
Now make the post-receive hook runnable:
mv post-receive.sample post-receive
chmod +x post-receive
Edit the "post-receive" file, and add the following lines:
cd `dirname $0`
./dropbox-push
Save the file. Now whenever you (or another developer) push to the repo, the hook checks whether the Dropbox daemon is running; if not, it starts the daemon, lets it run until the code is synced, then stops it. Since the hook itself lives under Dropbox, it will be synced and run for every other developer pushing to the Dropbox repo as well.

So there it is, a free, private git repository; at least, until the repository gets larger than the Dropbox free 2GB limit.

2010-11-21

JadedPHP controllers

In my last post about JadedPHP, I talked about the model layer. Today, it's all about the controller layer.

Jaded controllers are built around the Intercepting Filter Pattern. In brief, when the Dispatcher sends a request to a controller for processing, the request is actually dispatched to a chain of filters which can modify it or check it for necessary attributes (has it been authenticated, etc.) before passing it on to the actual request handler.

A controller class extends the Jaded_Controller class, and needs to define a process() method.
class MyController extends Jaded_Controller
{
    protected function process(Jaded_Request $oRequest, Jaded_Response $oResponse)
    {
        $oResponse->assign('myVar', 'this is the assigned value');
        $oResponse->assign('requestVar', $oRequest->getParam('reqParam'));
    }
}
Here is a simple controller that assigns some values to the response. One of the assignments, myVar, is given by the controller. The other is taken from the passed-in request object. If the reqParam parameter was never set in the request, the getParam() call will return null.

$oController = new MyController();
$oRequest = new Jaded_Request();
$oResponse = new Jaded_Response();

// The way we get to the process() method is through the dispatch() method
$oController->dispatch($oRequest, $oResponse);

print_r($oResponse->getAssigns());
/*
Prints:
Array
(
    [myVar] => 'this is the assigned value'
    [requestVar] => 
)

*/

The requestVar is null because we never populated it. Here is a controller filter that will populate it for us:
class RequestFillerFilter extends Jaded_Controller_Filter_PreProcessor
{
    protected function preProcess(Jaded_Request $oRequest, Jaded_Response $oResponse)
    {
        $oRequest->setParam('requestVar', 123);
    }
}

//Now we modify the dispatching code from above
$oController = new MyController();
$oRequest = new Jaded_Request();
$oResponse = new Jaded_Response();

// Create the filter, wrapping the controller that will process the request
$oFilter = new RequestFillerFilter($oController);

// Filters are actually controllers themselves, and are dispatched the same way
$oFilter->dispatch($oRequest, $oResponse);

print_r($oResponse->getAssigns());
/*
Prints:
Array
(
    [myVar] => 'this is the assigned value'
    [requestVar] => 123
)

*/
requestVar was set in the request object before it was passed to the controller, so it was there for the controller to access.

Here's an example of a post-processing filter, that will handle our output for us:
class Print_RFilter extends Jaded_Controller_Filter_PostProcessor
{
    protected function postProcess(Jaded_Request $oRequest, Jaded_Response $oResponse)
    {
        print_r($oResponse->getAssigns());
    }
}

//And here is the usage
$oController = new MyController();
$oRequest = new Jaded_Request();
$oResponse = new Jaded_Response();

// Filters expect a controller as a construction param,
// but since filters are controllers, we can chain them like this:
$oFilter = new Print_RFilter( new RequestFillerFilter( $oController));
$oFilter->dispatch($oRequest, $oResponse);

/*
Prints:
Array
(
    [myVar] => 'this is the assigned value'
    [requestVar] => 123
)

*/
Notice that the calling code no longer needs to call print_r() itself, because the post filter does it instead.

It gets unwieldy to keep wrapping controllers in filters, especially if a set of controllers always gets wrapped in the same ones. Jaded provides a Chain filter that lets you string together commonly used filters to simplify the wrapping.
class CommonFilterChain extends Jaded_Controller_Filter_Chain
{
    protected $aFilters = array(
        'Print_RFilter',
        'RequestFillerFilter',
    );
}

$oController = new MyController();
$oRequest = new Jaded_Request();
$oResponse = new Jaded_Response();

$oFilter = new CommonFilterChain($oController);
$oFilter->dispatch($oRequest, $oResponse);

/*
Prints:
Array
(
    [myVar] => 'this is the assigned value'
    [requestVar] => 123
)

*/
It doesn't matter in this example, but filters listed in a chain are wrapped with the first listed filter as the outermost, and the last filter as the innermost. This is relevant when using a Jaded_Controller_Filter which defines both preProcess() and postProcess() methods.

The other interesting thing to note about chains is that they are filters themselves, so it is possible to have a chain filter listed inside another chain filter.

My goal in JadedPHP is to provide filters that perform tasks common to a web application and wrap them in easy-to-use chains. This way, the dispatcher can detect that a request is an HTTP request and automatically wrap it in filters that initialize the session, put the request method and parameters into a request object, set up an HTML or AJAX renderer, check for authentication, and perform other necessary tasks.

The example source code is available here.

2010-11-14

New domains

I just registered joshadell.com and everymansoftware.com. Right now, they both redirect to this blog. everymansoftware.com will probably continue to point here for a while. I intend to get some stuff up on joshadell.com about projects, maybe some live demos, who knows?

After attending indieconf yesterday, I was bitten by the bug to start making my name my "brand" and doing a little more self-promotion, blogging, and contributing to the web community.

I'm using DreamHost in case anyone is interested.

2010-11-03

Models can be so Jaded

I am currently working on a PHP framework which I've been calling JadedPHP. Its purpose is to provide a lightweight MVC framework for some of my personal projects. Basically, I'm after a learning experience that provides useful output for me. If someone else gets use out of it as well, that's just icing on the cake.

My feelings on PHP frameworks in general are that they abstract way too much of the application's structure from the developer. I like to dig in and really know what's going on with the code that's running my code. Jaded is an attempt to build as little "automagic" into the framework as possible, and thus maximize the flexibility given to the developer to borrow bits and pieces as needed, and to override the rest if necessary.

The model layer is pretty solid so far. Models in Jaded are broken into 3 parts: a definition, a store, and a container.

The definition, oddly enough, defines the model. This means it lists the available fields of the model, which of those fields are key fields that uniquely identify the record held by the model, and any default values for those fields. Basically, the definition gives the model's structure, what it looks like.

A model store is the actual storage mechanism for the model. It implements basic CRUD operations, and is a place to define additional data manipulation tasks. The base class for model stores does not specify what the storage mechanism is. It could be a database, a CSV or XML file, or some volatile cache. It is up to concrete stores to actually implement the CRUD operations for a given model. (In reality, Jaded comes with a basic database store that performs one model -> one row mapping using Jaded's PDO wrapper class.)

Finally, there is the model container itself. The container defines which definition and store the model will use. It is also responsible for holding the individual field values for a given record. It provides basic getter and setter functions, as well as the ability to fill a model from an array of values or spit out an array containing the values.

So how does this look? Let's pretend we have a database table that looks like this:
CREATE TABLE ducks (
    duckid int NOT NULL AUTO_INCREMENT,
    type int NOT NULL,
    name varchar(20) NULL,
    sound varchar(10) NULL,
    PRIMARY KEY (duckid)
);

The definition for a Duck model would look something like this:
class DuckDefinition extends Jaded_Model_Definition
{
    const TypeMallard = 0;
    const TypeWood = 1;

    /**
     * Maps a name that calling code can use to an internal field name
     * Note that they do not have to match, and there can be multiple
     * aliases to a single internal name.
     */
    protected $aFieldMap = array(
        "duckid" => "duckid",
        "type"   => "type",
        "name"   => "name",
        "noise"  => "sound",
        "sound"  => "sound"
    );

    /**
     * The key fields for this model
     * Fields that uniquely identify it
     * A key of "auto" means the key is automatically set by the store,
     * else use "key"
     */
    protected $aKeyFields = array(
        "duckid" => "auto"
    );

    /**
     * Defaults for any fields
     */
    protected $aDefaultValues = array(
        "type"  => self::TypeMallard,
        "sound" => "quack"
    );
}

And now the store. In this case, I'm cheating and using the built-in basic database store:
class DuckStore extends Jaded_Model_Store_Database
{
    protected $sTable = "ducks";

    /**
     * This bit is simply the connection name used by Jaded's database wrapper
     */
    protected $sDbId = "duck_database";
}

And finally, a model container that wraps it all up:
class Duck extends Jaded_Model
{
    protected $sDefaultDefinition = "DuckDefinition";
    protected $sDefaultStore      = "DuckStore";
}

And now a bit of usage:

$oDuck = new Duck();
echo $oDuck->getType();    // prints "0"
echo $oDuck->getSound();   // prints "quack"

$oDuck->setName("Donald");
$oDuck->create();
$iDuckId = $oDuck->getDuckId();

$oDuck2 = new Duck();
$oDuck2->setDuckId($iDuckId);
$oDuck2->load();
echo $oDuck2->getName();   // prints "Donald"

// Now let's pretend we need to migrate all ducks to a CSV file, and we have a store for that
class DuckCSV extends Duck
{
    protected $sDefaultStore      = "DuckCSVStore";
}

$oDuck3 = new DuckCSV($oDuck2);
$oDuck3->update();
Note how when we need to store the model in a different storage medium, we can just change the store type, and keep the definition and any methods that might have been built into the Duck class.

If models are stored in a database, one row per model object, Jaded has a lot of functionality built in. But it also provides the flexibility to build models that share a definition but have a store that pushes to/pulls from an RSS feed, or Twitter, or a stock ticker, or any other data source. For only a little extra setup, you get a lot of options.

There will probably be a GitHub repository soon, and another post or two as I start to use Jaded in more projects.

2010-10-13

Box2dnode Available via npm

box2dnode has been packaged up and released as an npm module. To get it, use the following command:
npm install box2d

Creating the package was relatively easy. It only involved creating a package.json file in the root of the package:

{
 "name" : "box2d",
 "version" : "1.0.0",
 "description" : "2D physics engine",
 "homepage" : "http://github.com/jadell/box2dnode",
 "author": {
  "name" : "Josh Adell",
  "email" : "josh.adell@gmail.com",
  "url" : "http://everymansoftware.blogspot.com/"
 },
 "main" : "./box2dnode",
 "engines" : ["node"]
}
Of course, box2d is a simple package. I'll be interested to see how packaging works with a more complex project.

2010-10-04

Global Pub-Sub

I've started working on a universal messaging bus that would allow any number of components to broadcast "events" to any number of listeners (the publish-subscribe, or observer, pattern.) Since I'm a sucker for playing with new tools, I've decided to build the bus in node.js. The basic idea is to allow a client in one process to subscribe to events broadcast by clients in other processes by subscribing to a "central" bus.

In practice, each broadcaster will have a local bus object to which they publish. That bus serializes the message and sends it to a central server. The central server checks the message for validity, then pushes the message to any connected listeners. There will also be an HTTP connector that will store events for a period of time to allow AJAX listeners to periodically connect and pull down waiting events.

The listeners will have a local bus object to which they can subscribe. When an event is pushed to the listener object by the server (or pulled down in a long poll by an AJAX client), the bus un-serializes the message and pushes it to the registered listeners for that event type.

To the publishers and subscribers, everything appears to be happening local to the process.

What does this look like codewise?
// Publisher client
var bus = new Bus(server, port);
bus.broadcast('someEvent', arg0, arg1);

// Listener client
var bus = new Bus(server, port);
bus.addListener('someEvent', function (arg0, arg1) {
    // do something with the information
});


Ideally, I'd like to include publisher-subscriber clients in several languages. A PHP subscriber would use a callback:
function handleSomeEvent($arg0, $arg1) {
    // do something with the information
}

$oBus = new Bus($server, $port);
$oBus->addListener('someEvent', 'handleSomeEvent');

The publisher-subscriber client could use a TCP, WebSocket, or HTTP long-poll to communicate with the remote bus. A future iteration might allow for multiple remote bus nodes to push events to each other to then be pushed down to individual client listeners.

More information as it develops.

2010-08-31

Mocking built-in functions in PHP

I just released a library for mocking built-in PHP functions for testing purposes. It can be found at http://github.com/jadell/PHPBuiltinMock.

The README file is pretty detailed about how to use the library, so I wanted to write a bit about why to use it.

PHPBuiltinMock is used to control the output of PHP functions whose usual output is indeterminate. Built-in mocking allows you to treat these indeterminate calls like external dependencies, and mock them.

A classic example is rand(). It is very difficult to write tests against code that uses rand() because the output of the code changes with each run.

class MyClass
{
    public function multiplyWillyNilly($iWilly)
    {
        $iNilly = rand();
        return $iWilly * $iNilly;
    }
}

class MyClassTest extends PHPUnit_Framework_TestCase
{
    public function testMultiply_HopeWeGetA21()
    {
        $iExpected = 21;

        $oClass = new MyClass();
        $iResult = $oClass->multiplyWillyNilly(3);

        // This will work sometimes, but is incredibly brittle!
        self::assertEquals($iExpected, $iResult);
    }
}

The usual way to avoid this problem is to wrap the indeterminate call in another method and mock the call to that method.

class MyClass
{
    public function multiplyWillyNilly($iWilly)
    {
        $iNilly = $this->myRand();
        return $iWilly * $iNilly;
    }

    protected function myRand()
    {
        return rand();
    }
}

class MyClassTest extends PHPUnit_Framework_TestCase
{
    public function testMultiply_Return21()
    {
        $iExpected = 21;

        $oClass = $this->getMock('MyClass', array('myRand'));
        $oClass->expects($this->once())
            ->method('myRand')
            ->will($this->returnValue(7));

        $iResult = $oClass->multiplyWillyNilly(3);
        self::assertEquals($iExpected, $iResult);
    }
}

This does accomplish the trick, but at the expense of code complexity. If all the method does is wrap and return a single function call, it probably doesn't need to be its own method.

Let's look at our purpose for wanting to mock the rand() call. We don't actually care what the result of rand() is, just that the method calls it. We can keep the code simple and still accomplish this goal by using built-in mocking.

class MyClass
{
    public function multiplyWillyNilly($iWilly)
    {
        $iNilly = rand();
        return $iWilly * $iNilly;
    }
}

class MyClassTest extends PHPUnit_Framework_TestCase
{
    public function teardown()
    {
        BuiltinMock::restore('rand');
    }

    public function testMultiply_IThinkWeWillGetA21()
    {
        $iExpected = 21;

        BuiltinMock::override('rand', new BuiltinMock_Returner_SetValue(7));

        $oClass = new MyClass();
        $iResult = $oClass->multiplyWillyNilly(3);

        self::assertEquals($iExpected, $iResult);
    }
}

We didn't have to modify the code being tested, and we have still accomplished our goal in testing.

Once you get the hang of mocking built-in functions, many uses present themselves. We use built-in mocking to test various scenarios using curl_exec() without requiring an external webserver, and file_get_contents() and file_put_contents() without needing real files or places to write them.

One of the most common uses in our tests is "pinning time." We can run entire test suites pretending that it is the end of February in a leap year, or crossing DST boundaries, or any number of other time-sensitive problem scenarios. Tests would sometimes fail not because the code was incorrect, but because the test machine was under heavy load, which slowed the test down, and expected timestamps would be off by one or more seconds. That problem has virtually disappeared thanks to built-in mocking of the time() function.

2010-08-09

Josh's Modest Proposals

I've started a new side project at work, which I'm calling "Josh's Modest Proposals." The purpose is to periodically write down ideas about our development process that a) are completely different from what we are currently doing, b) take a current process to an outrageous extreme, c) run counter to our business goals, or d) are a little bit of all of the above.

Once an idea is enshrined on a wiki page with a comment system, I mail a link out to the entire tech department (developers, sys admins, DBAs, product owners, management). Anyone is allowed to comment on any aspect of the idea. I try to comment back, asking clarifying questions or playing devil's advocate to spur discussion.

My goal is not really to act on the ideas, or even defend them. It's to stimulate discussion about our processes, and hopefully get some of the lunch-table and water-cooler discussions out to a more general audience. If anything about our processes change as a result, that's a by-product, but not an objective.

If there are any interesting discussions, I will post the modest proposal and some choice comments here.

Update 2010-08-18 - Project put on hold for now. Other avenues of driving these sorts of discussions are being explored.

2010-07-16

A Helpful Analogy, or, Why You Should Be Testing Your Protected Methods

It Hurts When I Do This


There seems to be a division in the software development community over how granular code tests should be. Specifically, there are those who believe that tests should only utilize the public interface of a class, and those who believe that every separate piece of functionality should be tested independently of others.

There is common ground: everyone wants to write better code. Tests are simply a tool used for building better code. So the division is not one of outcome, merely granularity.

At iContact we have a goal of 100% code coverage. On the back-end development team, we use both unit tests (testing each method in isolation) and functional tests (testing the entire stack top to bottom.) We have had the discussion over how isolated tests should be. The answer we have come up with is "as isolated as we can possibly make them." This includes testing the innards of a class, not just its interface.

The Doctor Will See You Now


Picture the following scenario. You go to the hospital complaining of sharp stomach pain. The doctor checks your eyes, ears, nose and throat. He taps your knee with the little rubber hammer thing. He asks you questions to determine your lucidity, has you stand on tiptoe and touch your nose with your finger tips. After passing all these tests, the doctor declares that he can't find anything wrong with you, and sends you on your way.

How well do you think a doctor could diagnose your problem if they were confined solely to your "public interface": your five senses and anything the doctor can tell just by looking at and speaking to you?

Instead, we expect that doctors will have tools and equipment to look deeper than your public interface allows: MRIs, x-rays, blood analysis and more. In the safe environment of the doctor's office, it is perfectly acceptable for the doctor to bypass your body's natural defenses in order to help fix you.

In your everyday life, however, the story is different. You don't expect, or desire, that any person you meet on the street can take samples of your blood, blast you with x-rays or perform a colonoscopy. It is only proper that others deal with you solely through your public interface.

Your production environment is like everyday life for your code. Your code interacts with other software and systems that don't want or need to know how your code works internally. In this environment, the public interface is sufficient.

However, in a development environment, code should be considered "in the doctor's office." Any tools the developer has should be brought to bear in diagnosing problems, identifying their root cause and fixing them. That includes the ability to circumvent the code's public interface and examine its internals.

When a doctor makes a diagnosis, he needs to be as specific as possible. He doesn't just say that your digestive system isn't working; he identifies the exact organ, or even a smaller piece of that organ, that is faulty. He knows what it is supposed to be doing, and what it actually is doing. Testing your whole digestive “stack” from input to, ahem, output only tells him that something is wrong, not where the problem lies. In order to fix the problem, it must be isolated. Preferably, it should be isolated to the smallest possible area, in order to prevent fixes from doing widespread damage to other levels of the stack.

This Will Only Hurt for a Moment


When we first started testing our classes' internal methods, our modus operandi was to create wrapper classes: classes that inherited protected methods and re-declared them as public. The wrapper methods simply called the parent's protected methods. Tests could be written against the child's public methods and would return the same results as the parent's protected methods.

The maintenance of these child classes was a nightmare. During development of new code and refactoring of old code we had to remember to not only update our tests to reflect changes, but also to update our wrapper classes to account for new parameters, new default values and new (or removed) methods. The extra work was actually a disincentive to meeting our granular testing goal.

And so dynamic shunts were born. We threw out wrapper class files and replaced them in our tests with calls to our dynamic shunt library.

The shunt library works by using reflection to gather metadata about how individual class method signatures are built. It then constructs a wrapper class definition containing the exact same method signatures, including default values and pass-by-reference parameters. The only difference is that the wrapper class methods are all defined as “public”. The class definition is read into memory and becomes available for tests to instantiate and call methods on.

Take Two and Call Me in the Morning


The upshot of all this is that the wrapper classes are no longer defined statically in a file. They are created on-the-fly at runtime and disappear when the process ends. If the class being shunted changes, the shunt automatically picks up those changes on the next run. We can change class method signatures, add and remove methods at will, update the tests, and run! No more maintenance of wrapper classes, and no more excuse for not testing class internals as well as interfaces.

The evolution of our shunt library is ongoing. We are currently exploring ways to test the private and final methods hidden deep within our legacy code (before refactoring them into plain protected methods, of course.) We hope to make testing of non-public methods a community best practice. Dynamic shunts are a huge step towards removing obstacles, and therefore excuses, to doing truly granular unit testing.

The PHP shunt library and usage information is available on Google Code.

2010-07-11

Parallel Testing in the Cloud with Feature Branches

At iContact, we run unit and functional tests in a cloud of testing machines (we call it a "pool" but it's really a cloud.) Each test suite is wrapped in a job and farmed out to one of several Gearman job queues. Worker processes grab a job from the queue, reserve an isolated local database, run the job and report back the results. Suites are run in parallel.

Multiple developers use the cloud at the same time and their tests remain isolated from each other. As we add developers (and tests), the system scales by setting up a new test server, putting a new set of databases on it, and running more worker processes on it to talk to those databases (from now on, I'll refer to a combination of process and database as a "worker".)

Each worker's schema is an identical copy of the schema we write our code against. When we write code that requires schema alterations, we have to make those changes to every worker in the cloud, because we don't know which worker will pick up the tests of the new schema. Since our development process involves running all test suites before checking in to source control, the workers must have the updated schema before the code that uses the schema is available to other developers. The result: broken tests for every developer until the code is checked in.

To compound this problem, we have adopted a new source control repository branching strategy. All new features are developed in separate branches then merged into a central trunk before deployment. Each branch gets its own development environment, complete with web servers and databases that are only accessible by that branch. Tests for every branch, however, still run in the same test cloud on the same workers. So even though code changes are isolated from one another, the schema change problem remains.

I've been thinking about how we might solve, or at least alleviate, this problem. Here are some thoughts:

Separate Clouds

Each branch environment gets its own cloud of test workers. The developers have free access to alter the schema they are developing against, and they are responsible for making the same alterations to their set of workers. The alterations do not need to be pushed to the other test clouds until the branch is merged back into trunk.

Pros:
  • Solves the current problem: tests of new schema do not step on tests that do not expect that schema.
  • Encourages experimenting with schema without bothering the team responsible for maintaining the main test cloud.

Cons:
  • Complicates the automated branch environment setup process.
  • Burdens the developers with deploying every schema change to their branch and all workers.
  • De-commoditizes the workers; we want to allocate workers to one branch or another as needed. In this setup, we would have to know ahead of time that one branch will need more workers than another.

Dynamic Schema

When a new worker starts running, it reads a schema and uses it to build its own database. This happens once when the worker is created. This could be modified, such that a worker always refreshes its schema before each test run. The test job would indicate where its expected schema lives, and the worker would use that schema.

Pros:
  • Maintains test workers as a commodity.
  • Simplifies "experimentation"; changes are made in the canonical location/database, and the worker automatically picks them up.
  • Easily implemented; needs the fewest process and infrastructure changes from the current system.

Cons:
  • Complicates worker process, which is already pretty complicated.
  • Increases test run time; the schema must be rebuilt for each of the 600+ suites on every run.
  • Disguises test failures; schema thrash means the schema may have already been altered by another branch's tests by the time a failure is reported.

Worker Dispatcher

A combination of the two above ideas. Test runs are submitted to a central dispatcher which counts the number of test suites being run, reserves an appropriate number of free workers from the cloud, gives them their schema (once, at reservation time), then creates the test jobs in a queue only listened to by those workers. Workers run until there are no more jobs in the queue, then return to the cloud.

Pros:
  • All the pros listed for "Separate Clouds" and "Dynamic Schema" except ease of implementation.

Cons:
  • Most complex solution.
  • Adds a new component to maintain, and a new central point of failure.
  • Doesn't solve the thrash problem.
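To make the reservation flow concrete, here is a toy sketch of the dispatcher logic, with plain arrays standing in for the Gearman queues and the worker cloud (all names here are made up for illustration; a real implementation would go through Gearman and the database-reservation machinery):

```javascript
// Toy model of the "Worker Dispatcher" idea. "cloud" is the pool of free
// workers; a dispatch reserves one worker per suite (up to what's free),
// hands each its schema once at reservation time, and builds a private
// queue that only the reserved workers would poll.
function dispatch(suites, cloud, schema) {
    var needed = Math.min(suites.length, cloud.length);
    var reserved = cloud.splice(0, needed); // remove from the free pool
    reserved.forEach(function (worker) {
        worker.schema = schema;             // schema set once, up front
    });
    return { workers: reserved, queue: suites.slice() };
}
```

The shape of the flow is the point: workers leave the shared pool before any job is queued, so a branch's schema never touches a worker another branch might use.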

I think that the "Worker Dispatcher" idea is my favorite. I'm not too concerned with the schema thrash problem. Every developer has their own non-cloud copy of the database against which individual failing tests can be run for investigative purposes. A developer shouldn't be examining the worker database's contents directly anyway, since by the time a failure is reported, that database has probably already been used by another test suite.

I'm not a fan of the complexity of the solution, though.

2010-06-30

Emitting Physics Events with Box2d and Node

Node tries to make as much application logic as possible asynchronous and non-blocking. Most objects communicate by emitting "events" to which other objects can subscribe with listeners. The listeners handle the emitted events asynchronously. All of this is pretty well documented, and examples can be found in the node source code itself, most of its included modules, and many applications that make use of the framework.

In sticking with this pattern for nodehockey, I wanted to make the physics simulation asynchronous with sending state updates to the clients. Here's the code (refactored for brevity):
var sys = require("sys"),
    events = require("events"),
    b2d    = require("./path/to/box2dnode");

var PhysicsSim = function () {
    events.EventEmitter.call(this);
    this.runIntervalId = null;

    // Initialize the simulator 
    var worldAABB = new b2d.b2AABB();
    worldAABB.lowerBound.Set(-1000, -1000);
    worldAABB.upperBound.Set( 1000,  1000);
    var gravity = new b2d.b2Vec2(0.0, -9.8);
    var doSleep = true;
    this.world = new b2d.b2World(worldAABB, gravity, doSleep);

    // Initialize physics entities
    // ...
}
sys.inherits(PhysicsSim, events.EventEmitter);
PhysicsSim.prototype.getState = function () {
    // Get whatever state information we should return to clients
    // ...
    return state;
}
A new class is defined to handle all the physics simulation. The class is responsible for initializing the world (see the Box2D documentation for more on this) and setting up the entities to be simulated. It includes a method for retrieving the current state of the simulated world.

The important bit is setting the class up to be an event emitter, accomplished by the first line in the function (to call the "parent" constructor) and the sys.inherits call which will add all the prototype properties and functions. This will allow the simulator to broadcast events to which other objects can subscribe:
PhysicsSim.prototype.run = function () {
    this.pause();
    this.runIntervalId = setInterval(function (sim, t, i) {
            sim.world.Step(t, i);
            sim.emit("step", sim.getState());
        }, 50, this, 1.0/60.0, 10);

    this.emit("run");
}
PhysicsSim.prototype.pause = function () {
    if (this.runIntervalId != null) {
        clearInterval(this.runIntervalId);
        this.runIntervalId = null;
    }
    this.emit("pause");
}
Three events are defined here. Starting with the last two, "run" is broadcast whenever the simulation begins or is un-paused, and "pause" is broadcast whenever we halt the simulation.

The more interesting event is the "step" event, which is broadcast when the simulation runs another timestep. An interval timer is set up to run the simulation every 50 milliseconds. The simulator is passed in, as well as a timestep and a simulation interval.

In Box2D, the timestep determines how much "in world" time passes with each step. The smaller the value, the more incremental the calculation of the next state. In the above code, the timestep is 1/60th of a second.

The second argument is the simulation interval (the name is a bit misleading.) This is the number of passes in each step that the simulator will take when calculating collisions and movement. Every time a body in the simulated world moves, it has the chance to affect the other bodies. Since this can cascade infinitely within a step, the interval limits how many times the calculations will actually be run. Note that Box2D is smart enough to stop calculating if it doesn't need the full number of passes.

The interval's wait time (50 milliseconds) does not have to match the simulator's timestep. Increasing or decreasing the wait time will speed up or slow down the rate at which the simulator calculates the next step completely independently of how much "in world" time passes with each step, determined by the timestep. This can be useful for adding slow-motion or fast-forward effects to simulations and games.
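To put a number on that: the ratio of in-world time to wall-clock time is just the timestep divided by the wait time. A quick sanity-check helper (not part of the game code):

```javascript
// Ratio of simulated time to wall-clock time: each tick advances the
// world by timestepSec, and ticks fire every delayMs milliseconds.
function simSpeed(timestepSec, delayMs) {
    return timestepSec / (delayMs / 1000);
}
```

With the values above (a 1/60 s step every 50 ms), the world advances at one third of real time; dropping the wait to about 16.7 ms would make it run at real time.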

Once the step is calculated, a "step" event is emitted with the new game state. This is the signal that will be used to let all the clients know that there is a new state they must handle. The following code demonstrates catching these events:
var sim = new PhysicsSim();
sim.addListener("run", function () {
        sys.puts("Simulation running");
    })
    .addListener("pause", function () {
        sys.puts("Simulation paused");
    })
    .addListener("step", function (state) {
        sys.puts("New simulation step: " + sys.inspect(state));
    });
sim.run();
setTimeout(function () {
    sim.pause();
}, 1000);
Running this code in a terminal will show the simulator starting, the first 20 steps of the simulation, then the simulator pausing.

Any number of listeners can be subscribed to these events, even after the simulation is running. Here's an example using websockets:
var ws = require("./path/to/websocket/library");
var sim = new PhysicsSim();
sim.run();

ws.createServer(function (connection) {
    connection.ready = false;
    connection.addListener("connect", function () {
            connection.ready = true;
        });
    sim.addListener("step", function (state) {
            if (connection.ready) {
                connection.write(JSON.stringify(state));
            }
        });
}).listen(8080);
When run, a simulator is created and begins stepping through the simulation. At this point, it is broadcasting "step" events, though there are no listeners.

A websocket server is created and starts listening for connections on port 8080. When it receives a new connection, it attaches a new listener to the simulator for that connection; the listener receives step events, JSON-serializes the broadcast state, and sends it to the websocket client.

The "connect" listener on the websocket connection is used to prevent sending step events to the client before the connection is fully initialized. Since all the event handling takes place asynchronously, it is very likely that a step event could be received and sent through the websocket before the socket is done connecting. The connect event on the websocket doesn't fire until the connection is complete, so at that point we can signal the step listener to start sending states to the client.

Since the simulator runs independently of the client connections, clients can connect, disconnect and reconnect at any time, and immediately see the same state as all other clients. Nothing needs to save the simulator state to replay to clients that connect at later points.

If a client does want to see what it has missed, a listener could be added to the step event that would save each state, then replay those states to the client faster than new states are broadcast. Once all the states are replayed, the client could then begin to receive states at the normal pace. This is an exercise left for the reader.
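A starting point for that recorder might look like the following sketch (hypothetical; it only does the buffering half, not the accelerated replay):

```javascript
// Buffer every broadcast state so a late-joining client could be caught up.
// "sim" is any emitter that broadcasts "step" events, like PhysicsSim above.
function StateRecorder(sim) {
    var history = [];
    sim.addListener("step", function (state) {
        history.push(state);
    });
    this.getHistory = function () {
        return history.slice(); // copy, so callers can't mutate the log
    };
}
```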

2010-06-29

Canvas Transforms and Origin Re-Mapping

The nodehockey client receives a set of coordinates from the server in units of meters. These physical coordinates need to be translated into pixel values on the canvas.

There are a few obstacles that need to be overcome. The first is that there is no direct mapping of pixels to meters. My original solution was to take each coordinate and multiply it by a scaling factor: the height of the canvas in pixels divided by the height of the physical table in meters. To translate back when sending coordinates from the client to the server, I divided the canvas coordinates by the same scaling factor.

The second issue is that the canvas element places its origin at the top-left of the element, with positive y-values going down towards the bottom. The physical model uses the standard Cartesian scale, with the origin at the bottom-left and positive y-values going up. The original solution to that was to subtract the scaled y-coordinate from the canvas height for rendering, and do the opposite for sending mouse position back.

It turns out, canvas has a built-in method for dealing with these types of problems. Instead of manually scaling each coordinate, you can set a scaling factor on the canvas drawing context, and it will do the scaling for you:
var tableHeight = physical.height;
var canvasHeight = 400;
var scaleFactor = canvasHeight / tableHeight;

context.scale(scaleFactor, -scaleFactor);

The scale() method takes 2 arguments. The first is the multiplier on the x-axis, the second is on the y-axis. They are independent. The above code also reverses the canvas's y-axis, so that y-values move bottom to top.

There is still an issue with this solution. Since y-values are being translated into the negative range, they are drawn from the top of the canvas element upwards, hiding them from view. To solve this, a further transformation must be applied:
context.translate(0, -tableHeight);
The translate() method shifts all coordinates on the canvas by the amounts specified by the first (x-value) and second (y-value) arguments. In this case, it shifts all y-values down by the height of the physical table, which will be scaled to the height of the canvas element. Effectively, this moves the origin of the canvas to the bottom-left, allowing a straight translation of physical coordinates to pixel coordinates.
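As a sanity check, the scale-then-translate pair is equivalent to this explicit mapping from physical coordinates to pixels (a throwaway helper, not part of the client code):

```javascript
// scale(s, -s) followed by translate(0, -tableHeight) maps a physical
// point (x, y) to the pixel point (s * x, canvasHeight - s * y),
// because s * tableHeight === canvasHeight.
function physicalToCanvas(coords, scaleFactor, canvasHeight) {
    return {
        x: coords.x * scaleFactor,
        y: canvasHeight - coords.y * scaleFactor
    };
}
```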

Translating from canvas pixels back to physical meters still requires a manual transformation:
function scaleToPhysical(coords) {
    var scaled = {
        x : coords.x / scaleFactor,
        y : (canvasHeight - coords.y) / scaleFactor,
        r : coords.r / scaleFactor
    };
    return scaled;
}
Anyone know of a better way to do this?

Another note: the scaling applies to all line styles as well as coordinates. Therefore, it's helpful to know the width of an actual pixel in scaled terms. The following code will find the "physical" width of a single pixel:
var singlePixel = 1 / scaleFactor;

2010-06-28

Real-time Multi-player In-Browser Gaming: Air Hockey

I've been wanting to try my hand with nodejs for a while now, and have been trying to find time and a proper project. Recently, I was reading about the <canvas> HTML element, and started trying to think of a cool learning project for that as well. Joining the two technologies together to build a browser-based 2-player air hockey game seemed like just the balance of challenge and opportunity to learn that I was looking for.

The basic premise is to use canvas on the client side to render the state kept track of on the server side, with the server written in Javascript using nodejs. The client is also responsible for capturing player mouse movements and sending them to the server to update game state. Two players will be able to manipulate the same game board at the same time. Only the first two connections to the server are players; other connections can be made and will witness the same game as spectators.

To make things a bit more challenging, the two players should be allowed to be on different clients but still play in real-time. Websockets will allow the client and server to communicate without AJAX long-polling. A physics engine utilizing Box2D will keep track of the game state (actually, I'll be using my nodejs module port of Box2D, box2dnode.)

I will be posting now and then on this project as I work on it, learn something interesting or change anything drastically. All source code for the project can be found at http://github.com/jadell/nodehockey. Note: you must have an HTML5 capable browser, like Chrome, to run the client.

Blog Title Change

Apparently, there is already a group out there called Citizen Software. In order to not cause confusion or future legal hassles, I have renamed my blog to Everyman Software.

2010-06-21

box2dnode available

I finished my first pass at a Box2D port for nodejs. It is available at http://github.com/jadell/box2dnode.

The module was constructed fairly easily by combining the files from Jonas Wagner's Javascript port of Box2DFlash into one large file, then exporting all the classes using
exports.b2ClassName = b2ClassName;

Pretty simple. Next step is to figure out how to turn mouse movement into forces that act on the various entities on the game board.

2010-06-18

Nodejs and Physics

I was going to make my first project in nodejs a simple air hockey game. It was going to be an opportunity to play not just with nodejs, but also HTML5's canvas tag and websockets. I got the canvas and websockets portion working, and even talking to a nodejs server (more on this stuff in a later post series.)

There was a hang-up halfway through the air hockey project: I found myself having to recreate a physics engine. While that sounds like an interesting project, it wasn't where I was hoping to take this one. Instead, I set off on a search for a nodejs-based physics engine. No such luck.

Then I stumbled upon Box2d2. There is already a Javascript physics engine! It seems mainly built for browser use, but it's already written. So my new meta-project is to port Box2d2 into a nodejs module. A side-meta-project is to learn how to properly unit test nodejs modules and applications.

box2dnode is available at http://github.com/jadell/box2dnode.

2010-05-26

Welcome, Citizen

This blog is to keep track of thoughts and ideas I have regarding work projects, new languages/technologies I am playing with, and other musings on software development.

I do not hope to make any money off of it.  I do not expect anyone else to read it.  The blog is really just a way to collect my thoughts and try out my writing skills.

A few words about the title, "Citizen Software": I think that everyone should know how software works.  We are now in an age when saying "I'm no good with computers" is the equivalent of saying "I can't read."  It's unacceptable that society as a whole puts so much faith in something that only a small fraction of that society actually understands.

Yes, everyone can write software.  Maybe not an operating system, or a hardware driver, or a distributed network protocol, or the "next big thing" website, but small convenience scripts that can open the door to larger works.

I used to have a boss who would say, "If it were easy, the accountants would do it."  Well guess what?  The accountants can do it.  So can the secretaries, sales people, janitors, heck, even the CEOs.  Anyone who has created, used or modified a spreadsheet already has the proper mindset.  The rest is just tools, training and experience.  And actual desire to know (this turns out to be the part most people lack, and what holds them back.)

Hence, Citizen Software: programming for the common man.  Maybe by trying to find the proper mix of technical detail and layman's terms, someone somewhere will think, "Hey, computers aren't so complex after all."

Or it could end up being a load of geek-speak and mental masturbation.

But however it turns out, I invite anyone out there to come along for the ride.