2010-07-16

A Helpful Analogy, or, Why You Should Be Testing Your Protected Methods

It Hurts When I Do This


There seems to be a division in the software development community over how granular code tests should be. Specifically, there are those who believe that tests should only utilize the public interface of a class, and those who believe that every separate piece of functionality should be tested independently of others.

There is common ground: everyone wants to write better code. Tests are simply a tool used for building better code. So the division is not one of outcome, merely granularity.

At iContact we have a goal of 100% code coverage. On the back-end development team, we use both unit tests (testing each method in isolation) and functional tests (testing the entire stack top to bottom). We have had the discussion about how isolated tests should be. The answer we have come up with is "as isolated as we can possibly make them." This includes testing the innards of a class, not just its interface.

The Doctor Will See You Now


Picture the following scenario. You go to the hospital complaining of sharp stomach pain. The doctor checks your eyes, ears, nose and throat. He taps your knee with the little rubber hammer thing. He asks you questions to determine your lucidity, has you stand on tiptoe and touch your nose with your fingertips. After passing all these tests, the doctor declares that he can't find anything wrong with you, and sends you on your way.

How well do you think a doctor could diagnose your problem if he were confined solely to your "public interface": your five senses and anything he can tell just by looking at and speaking to you?

Instead, we expect that doctors will have tools and equipment to look deeper than your public interface allows: MRI, x-rays, blood analysis and more. In the safe environment of the doctor's office, it is perfectly acceptable for the doctor to bypass your body's natural defenses in order to help fix you.

In your everyday life, however, the story is different. You don't expect, or desire, that any person you meet on the street can take samples of your blood, blast you with x-rays or perform a colonoscopy. It is only proper that others deal with you solely through your public interface.

Your production environment is like everyday life for your code. Your code interacts with other software and systems that don't want or need to know how your code works internally. In this environment, the public interface is sufficient.

However, in a development environment, code should be considered "in the doctor's office." Any tools the developer has should be brought to bear in diagnosing problems, identifying their root cause and fixing them. That includes the ability to circumvent the code's public interface and examine its internals.

When a doctor makes a diagnosis, he needs to be as specific as possible. He doesn't just say that your digestive system isn't working; he identifies the exact organ, or even a smaller piece of that organ, that is faulty. He knows what it is supposed to be doing, and what it actually is doing. Testing your whole digestive “stack” from input to, ahem, output only tells him that something is wrong, not where the problem lies. In order to fix the problem, it must be isolated. Preferably, it should be isolated to the smallest possible area, in order to prevent fixes from doing widespread damage to other levels of the stack.

This Will Only Hurt for a Moment


When we first started testing our classes' internal methods, our modus operandi was to create wrapper classes: classes that inherited from the class under test and re-declared its protected methods as public. The wrapper methods simply called the parent's protected methods, so tests could be written against the child's public methods and get the same results as from the parent's protected methods.
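
For illustration, a hand-written wrapper might have looked something like this (the class names here are hypothetical; PHP allows a subclass to widen a protected method's visibility to public):

    class InvoiceCalculator
    {
        protected function applyDiscount($total, $rate = 0.1)
        {
            return $total * (1 - $rate);
        }
    }

    // The wrapper re-declares the protected method as public and delegates
    // straight to the parent, so a test can call it directly.
    class InvoiceCalculatorWrapper extends InvoiceCalculator
    {
        public function applyDiscount($total, $rate = 0.1)
        {
            return parent::applyDiscount($total, $rate);
        }
    }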

The maintenance of these child classes was a nightmare. During development of new code and refactoring of old code we had to remember to not only update our tests to reflect changes, but also to update our wrapper classes to account for new parameters, new default values and new (or removed) methods. The extra work was actually a disincentive to meeting our granular testing goal.

And so dynamic shunts were born. We threw out wrapper class files and replaced them in our tests with calls to our dynamic shunt library.

The shunt library works by using reflection to gather metadata about how individual class method signatures are built. It then constructs a wrapper class definition containing the exact same method signatures, including default values and pass-by-reference parameters. The only difference is that the wrapper class methods are all defined as “public”. The class definition is read into memory and becomes available for tests to instantiate and call methods on.
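
A minimal sketch of that idea, assuming a standalone helper function rather than the actual library internals, might look like this:

    // Sketch only: build a subclass that republishes every protected method
    // as public, then eval() the generated definition into memory.
    function buildShunt($className)
    {
        $shuntName = $className . 'Shunt';
        if (class_exists($shuntName, false)) {
            return $shuntName;
        }

        $reflection = new ReflectionClass($className);
        $body = '';
        foreach ($reflection->getMethods(ReflectionMethod::IS_PROTECTED) as $method) {
            $params = array();
            $args   = array();
            foreach ($method->getParameters() as $param) {
                $arg  = '$' . $param->getName();
                $decl = ($param->isPassedByReference() ? '&' : '') . $arg;
                if ($param->isDefaultValueAvailable()) {
                    // Reproduce default values so optional arguments stay optional.
                    $decl .= ' = ' . var_export($param->getDefaultValue(), true);
                }
                $params[] = $decl;
                $args[]   = $arg;
            }
            $body .= sprintf(
                "    public function %s(%s) { return parent::%s(%s); }\n",
                $method->getName(),
                implode(', ', $params),
                $method->getName(),
                implode(', ', $args)
            );
        }

        eval("class $shuntName extends $className {\n$body}");
        return $shuntName;
    }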

Take Two and Call Me in the Morning


The upshot of all this is that the wrapper classes are no longer defined statically in a file. They are created on-the-fly at runtime and disappear when the process ends. If the class being shunted changes, the shunt automatically picks up those changes on the next run. We can change the class method signatures, add and remove methods at will, update the tests, and run! No more maintenance of wrapper classes. Now there is no more excuse for not testing class internals as well as interfaces.
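
A test can then do something like this (again a hypothetical sketch built on the helper above; see the library itself for its real API):

    // Build (or reuse) the shunt class, instantiate it, and exercise the
    // protected method directly.
    $shuntClass = buildShunt('InvoiceCalculator');
    $calculator = new $shuntClass();
    assert(abs($calculator->applyDiscount(100.0) - 90.0) < 0.0001);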

The evolution of our shunt library is ongoing. We are currently exploring ways to test the private and final methods hidden deep within our legacy code (before refactoring them into plain protected methods, of course). We hope to make testing of non-public methods a community best practice. Dynamic shunts are a huge step towards removing obstacles, and therefore excuses, to doing truly granular unit testing.

The PHP shunt library and usage information is available on Google Code.

2010-07-11

Parallel Testing in the Cloud with Feature Branches

At iContact, we run unit and functional tests in a cloud of testing machines (we call it a "pool," but it's really a cloud). Each test suite is wrapped in a job and farmed out to one of several Gearman job queues. Worker processes grab a job from the queue, reserve an isolated local database, run the job and report back the results. Suites are run in parallel.
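
As a rough sketch of that flow using the pecl gearman extension (the function name, host name and payload here are made up; the real jobs carry more than a suite path):

    // Submit side (one process): wrap a suite in a job and hand it to the
    // job server, blocking until a worker reports the result back.
    $client = new GearmanClient();
    $client->addServer('gearman.example.internal');
    $result = $client->doNormal(
        'run_test_suite',
        json_encode(array('suite' => 'tests/unit/ExampleTest.php'))
    );

    // Worker side (separate process, one per worker): grab a suite job, run
    // it against a reserved local database, and return the output.
    $worker = new GearmanWorker();
    $worker->addServer('gearman.example.internal');
    $worker->addFunction('run_test_suite', function (GearmanJob $job) {
        $payload = json_decode($job->workload(), true);
        putenv('TEST_DB=worker_db_1'); // stand-in for "reserve an isolated database"
        return shell_exec('phpunit ' . escapeshellarg($payload['suite']));
    });
    while ($worker->work());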

Multiple developers use the cloud at the same time and their tests remain isolated from each other. As we add developers (and tests), the system scales by setting up a new test server, putting a new set of databases on it, and running more worker processes on it to talk to those databases (from now on, I'll refer to a combination of process and database as a "worker").

Each worker's schema is an identical copy of the schema we write our code against. When we write code that requires schema alterations, we have to make those changes to every worker in the cloud, because we don't know which worker will pick up the tests of the new schema. Since our development process involves running all test suites before checking in to source control, this means the workers must have the updated schema before the code that uses the schema is available to other developers. The result: broken tests for every developer until the code is checked in.

To compound this problem, we have adopted a new source control repository branching strategy. All new features are developed in separate branches, then merged into a central trunk before deployment. Each branch gets its own development environment, complete with web servers and databases that are only accessible by that branch. Tests for every branch, however, still run in the same test cloud on the same workers. So even though code changes are isolated from one another, the schema change problem remains.

I've been thinking about how we might solve, or at least alleviate, this problem. Here are some thoughts:

Separate Clouds

Each branch environment gets its own cloud of test workers. The developers have free access to alter the schema they are developing against, and they are responsible for making the same alterations to their set of workers. The alterations do not need to be pushed to the other test clouds until the branch is merged back into trunk.

Pros:
  • Solves the current problem: tests of new schema do not step on tests that do not expect that schema.
  • Encourages experimenting with schema without bothering the team responsible for maintaining the main test cloud.

Cons:
  • Complicates the automated branch environment setup process.
  • Burdens the developers with deploying every schema change to their branch and all workers.
  • De-commoditizes the workers; we want to allocate workers to one branch or another as needed. In this setup, we would have to know ahead of time that one branch will need more workers than another.

Dynamic Schema

When a new worker starts running, it reads a schema and uses it to build its own database. This happens once when the worker is created. This could be modified, such that a worker always refreshes its schema before each test run. The test job would indicate where its expected schema lives, and the worker would use that schema.
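
The refresh step itself could be as simple as rebuilding the reserved database from the schema dump named in the job (helper and database names here are hypothetical, and MySQL credentials are assumed to come from a defaults file):

    // Drop and recreate the worker's reserved database, then load the
    // branch's canonical schema dump into it.
    function refreshSchema($dbName, $schemaFile)
    {
        $db = escapeshellarg($dbName);
        shell_exec("mysqladmin --force drop $db && mysqladmin create $db");
        shell_exec("mysql $db < " . escapeshellarg($schemaFile));
    }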

Pros:
  • Maintains test workers as a commodity.
  • Simplifies "experimentation"; changes are made in the canonical location/database, and the worker automatically picks them up.
  • Easily implemented; needs the fewest process and infrastructure changes from the current system.

Cons:
  • Complicates worker process, which is already pretty complicated.
  • Increases test run time; each of our 600+ suites rebuilds the schema when it is first run.
  • Disguises test failures; schema thrash means the schema may have already been altered by another branch's tests by the time a failure is reported.

Worker Dispatcher

A combination of the two ideas above. Test runs are submitted to a central dispatcher, which counts the number of test suites being run, reserves an appropriate number of free workers from the cloud, gives them their schema (once, at reservation time), then creates the test jobs in a queue listened to only by those workers. Workers run until there are no more jobs in the queue, then return to the cloud.
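
One way to implement the reservation would be to give each run its own Gearman function name, so only the reserved workers listen on that queue (a hypothetical sketch reusing the refreshSchema() helper above; the dispatcher would tell each reserved worker the queue name when reserving it):

    // Dispatcher side: queue every suite in the run under a run-specific
    // function name that only the reserved workers register.
    $suites   = array('tests/unit/ExampleTest.php'); // the run's suites
    $runQueue = 'run_test_suite_' . uniqid();
    $client = new GearmanClient();
    $client->addServer('gearman.example.internal');
    foreach ($suites as $suite) {
        $client->addTaskBackground($runQueue, json_encode(array('suite' => $suite)));
    }
    $client->runTasks();

    // Reserved worker side (separate process): load the run's schema once,
    // then work the run-specific queue; in practice it would time out and
    // return to the idle pool when the queue drains.
    $worker = new GearmanWorker();
    $worker->addServer('gearman.example.internal');
    refreshSchema('worker_db_1', '/path/to/branch/schema.sql');
    $worker->addFunction($runQueue, function (GearmanJob $job) {
        $payload = json_decode($job->workload(), true);
        return shell_exec('phpunit ' . escapeshellarg($payload['suite']));
    });
    while ($worker->work());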

Pros:
  • All the pros listed for "Separate Clouds" and "Dynamic Schema" except ease of implementation.

Cons:
  • Most complex solution.
  • Adds a new component to maintain, and a new central point of failure.
  • Doesn't solve the thrash problem.

I think that the "Worker Dispatcher" idea is my favorite. I'm not too concerned with the schema thrash problem. Every developer has their own non-cloud copy of the database against which individual failing tests can be run for investigative purposes. A developer shouldn't be examining the worker database's contents directly anyway, since by the time a failure is reported, that database has probably already been used by another test suite.

I'm not a fan of the complexity of the solution, though.