Migrating to Dependency Injection

Recently, I gave a lunch-and-learn to my team on the topic of Dependency Injection (DI). Instead of showing a bunch of slides explaining what DI is and what it's good for, I created a small project and demonstrated the process of migrating a codebase that does not use DI to one that does.

Each stage of the project is a different tag in the repository. The code can be found on GitHub: http://github.com/jadell/laldi. Check out the code and run composer install. To see the code at each step in the process, run the git checkout command in the header of each section.

git checkout 0-initial

The initial stage of the project is a listing of developers on the team, and with whom each has pair programmed. The project uses Silex, which is both a PHP micro-framework and a simple dependency injection container. For the initial stage, only the framework is used (for HTTP request handling, routing and dispatching). Each route handler is registered in index.php. A handler instantiates a controller object, then calls a method on the controller to handle the request. For this project, there is only one controller, the HelloController, but there could be many others.

When the HelloController is instantiated, it instantiates a UserRepository to handle access to User domain objects and a View to handle rendering the output. The UserRepository instantiates a Datasource to handle access to the raw data. The path to the data file is hardcoded in the Datasource. User objects instantiate a UserRepository of their own, so they can access the list of paired Users.

The code smell here is the chain of instantiations triggered by constructing a new HelloController, which leads to several problems: every component is tightly coupled to every other component; none of the components can be unit-tested in isolation from the others; we can't change the behavior of a class or its dependencies without modifying the class; and if the constructor signature of any component changes, we need to find each instantiation of that component and change it to pass in the new parameters (which might involve threading those parameters through multiple layers that don't need them). There may also be bugs caused by multiple objects holding their own copies of a dependency (for example, all User objects should probably share one UserRepository rather than each creating its own).
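To make the chain concrete, here is a rough sketch of the shape described above. The class names come from the project, but the method names, the placeholder data path and the instance counter are assumptions added for illustration; this is not the repository's actual code.

```php
<?php
// Hypothetical reconstruction of the 0-initial structure; bodies are stubs.

class Datasource
{
    public $path;

    public function __construct()
    {
        // The data file location is hardcoded (placeholder path).
        $this->path = __DIR__ . '/data.json';
    }
}

class UserRepository
{
    public static $instances = 0; // counter added only for this demo
    public $datasource;

    public function __construct()
    {
        self::$instances++;
        $this->datasource = new Datasource();
    }

    public function find($name)
    {
        return new User($name);
    }
}

class User
{
    public $name;
    public $repository;

    public function __construct($name)
    {
        $this->name = $name;
        // Every User builds a private repository (and Datasource).
        $this->repository = new UserRepository();
    }
}

class View
{
}

class HelloController
{
    public $users;
    public $view;

    public function __construct()
    {
        $this->users = new UserRepository();
        $this->view = new View();
    }
}

$controller = new HelloController();
$alice = $controller->users->find('alice');
$bob = $controller->users->find('bob');

echo UserRepository::$instances, "\n"; // prints 3
```

One controller and two users already produce three repositories, each with its own Datasource, and there is no way to swap any of them out in a test.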

git checkout 1-add-di

A simple way to solve these problems is to ensure that no class is responsible for creating its own dependencies. Instead, we pass the dependencies a class needs into its constructor. (Another form of DI uses "setter" injection instead of constructor injection.)
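A minimal sketch of constructor injection, again using the project's class names with assumed method names and a placeholder file path. Nothing here calls "new" on a collaborator from inside a class; the wiring happens outside.

```php
<?php
// Sketch only: accessors and the file path are assumptions.

class Datasource
{
    private $path;

    public function __construct($path)
    {
        $this->path = $path;
    }

    public function getPath()
    {
        return $this->path;
    }
}

class UserRepository
{
    private $datasource;

    public function __construct(Datasource $datasource)
    {
        $this->datasource = $datasource;
    }

    public function getDatasource()
    {
        return $this->datasource;
    }
}

class View
{
}

class HelloController
{
    private $users;
    private $view;

    public function __construct(UserRepository $users, View $view)
    {
        $this->users = $users;
        $this->view = $view;
    }

    public function getUsers()
    {
        return $this->users;
    }
}

// The wiring is done by hand here; a container takes over this job later.
$datasource = new Datasource('/tmp/users.json');
$users = new UserRepository($datasource);
$controller = new HelloController($users, new View());
```

Because the path and the collaborators arrive from outside, a test can hand the controller a repository backed by a fixture file without touching any class.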

The problem now is that all the dependencies must be wired together. Since no class instantiates its own dependencies, we must have a component that does this for us. This is the job of the Dependency Injection Container (DIC).

Silex is also used as a DIC. In bootstrap.php, all class instantiation is wrapped in a factory method that is registered with the DIC. When the DIC is asked for a registered object, if that object is not already instantiated, it will be created. Additionally, all its dependencies will be created and injected through its constructor.
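The lazy, shared behavior described above can be sketched with a tiny container. The Container class below is a hypothetical stand-in written for this example, not Silex's API, and the service names are assumptions.

```php
<?php
// Minimal stand-in container for illustration; this is not Silex's API.

class Container
{
    private $factories = array();
    private $instances = array();

    public function register($name, $factory)
    {
        $this->factories[$name] = $factory;
    }

    public function get($name)
    {
        // Create the object on first request, then reuse it.
        if (!array_key_exists($name, $this->instances)) {
            if (!isset($this->factories[$name])) {
                throw new InvalidArgumentException("No service registered as '$name'");
            }
            $factory = $this->factories[$name];
            $this->instances[$name] = $factory($this);
        }
        return $this->instances[$name];
    }
}

class Datasource
{
    public $path;

    public function __construct($path)
    {
        $this->path = $path;
    }
}

class UserRepository
{
    public $datasource;

    public function __construct(Datasource $datasource)
    {
        $this->datasource = $datasource;
    }
}

$created = array(); // records which factories actually ran

$dic = new Container();
$dic->register('datasource', function ($dic) use (&$created) {
    $created[] = 'datasource';
    return new Datasource('/tmp/users.json'); // placeholder path
});
$dic->register('userRepository', function ($dic) use (&$created) {
    $created[] = 'userRepository';
    // Asking the container for the dependency creates it on demand.
    return new UserRepository($dic->get('datasource'));
});

$before = count($created);                 // nothing is built at registration time
$repository = $dic->get('userRepository'); // builds the repository and its Datasource
$again = $dic->get('userRepository');      // returns the instance built above
```

Each factory runs at most once, so every consumer of 'userRepository' receives the same object.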

For cases where a component needs to create multiple objects of the same type (as when the UserRepository needs to create many User objects), factory closures are created. The factory makes sure that every User object is created with the correct dependencies, and that shared dependencies are not re-created each time. Since the closure is defined in the DIC, the UserRepository does not need access to any of the external dependencies that might be used by User objects; the closure is simply passed to the UserRepository.
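Here is one way such a factory closure might look. The method names are assumptions, and the by-reference capture stands in for the lookup a closure defined in the DIC would do when it is called.

```php
<?php
// Sketch only: method names are assumptions, and there is no real container.

class User
{
    private $name;
    private $repository;

    public function __construct($name, UserRepository $repository)
    {
        $this->name = $name;
        $this->repository = $repository;
    }

    public function getName()
    {
        return $this->name;
    }

    public function getRepository()
    {
        return $this->repository;
    }
}

class UserRepository
{
    private $userFactory;

    public function __construct($userFactory)
    {
        $this->userFactory = $userFactory;
    }

    public function find($name)
    {
        // The repository knows nothing about what a User needs.
        $factory = $this->userFactory;
        return $factory($name);
    }
}

// Captured by reference so the closure sees the repository once it exists.
$repository = null;
$userFactory = function ($name) use (&$repository) {
    return new User($name, $repository);
};
$repository = new UserRepository($userFactory);

$alice = $repository->find('alice');
$bob = $repository->find('bob');
```

Both users now hold the same UserRepository, which addresses the duplicated-dependency problem from the initial stage.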

git checkout 2-comments

Our project gets a new requirement to allow comments on users. Following the same pattern as with Users, we create a CommentRepository and a Comment domain object, and factory methods for creating and injecting each. HelloController needs access to the CommentRepository in order to save comments. Also, Users need access to the repository to retrieve all the Comments for that User. Since we are using DI, we change their constructors to receive a CommentRepository.

If we were not using a DIC, we would have to find every place that instantiated a HelloController or User object and change those calls to pass in the CommentRepository. Additionally, we would have to pass the CommentRepository through to any place that instantiated the object. Since we are using a DIC and factory methods, we can be assured that there is only one place that creates each of these object types, and therefore only one place needs to change (the factory methods in the DIC.)
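As a rough illustration, adding the new dependency to User touches the constructor and the single factory that builds Users. The method names and the in-memory storage below are assumptions made for the example.

```php
<?php
// Sketch only: add(), findFor() and the array storage are assumptions.

class CommentRepository
{
    private $comments = array();

    public function add($userName, $text)
    {
        $this->comments[$userName][] = $text;
    }

    public function findFor($userName)
    {
        return isset($this->comments[$userName]) ? $this->comments[$userName] : array();
    }
}

class User
{
    private $name;
    private $comments;

    // The constructor gains one parameter...
    public function __construct($name, CommentRepository $comments)
    {
        $this->name = $name;
        $this->comments = $comments;
    }

    public function getComments()
    {
        return $this->comments->findFor($this->name);
    }
}

$comments = new CommentRepository();

// ...and the factory is the only caller that has to be updated.
$userFactory = function ($name) use ($comments) {
    return new User($name, $comments);
};

$comments->add('alice', 'Great pairing session');
$alice = $userFactory('alice');
$bob = $userFactory('bob');
```

Code that obtains Users through the factory keeps working unchanged.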

We already get benefits from our refactoring to use a DIC!

git checkout 3-events

Our last requirements are to do some simple tracking of page views and notification of comments (perhaps with email, but for demonstration purposes, to a log file.) Each of these is a good use case for an event-driven system.

One of the challenges with event-driven development is making sure that the event listeners are instantiated before the emitters begin sending events. One option is to instantiate all listeners as part of the application's bootstrapping/initialization process, but this can become cumbersome and can hurt performance.

Instead of always instantiating every listener, we can use our DIC and factory methods. Since event emitters are instantiated through a factory method, we use that same factory method to set up listeners for any events that emitter may emit. By tying the creation of the emitter to the creation of its listeners, we ensure that we only instantiate those listeners for which we have created an emitter.

For example, we know that the HelloController may emit page view events, so we make sure that whenever we instantiate a HelloController, we also create a page view listener and set it to listen for the event.
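A minimal sketch of that wiring follows. The EventEmitter class, the 'page.view' event name and the listener's method are hypothetical; the project may use a different event mechanism.

```php
<?php
// Sketch only: EventEmitter, the event name and method names are assumptions.

class EventEmitter
{
    private $listeners = array();

    public function on($event, $listener)
    {
        $this->listeners[$event][] = $listener;
    }

    public function emit($event, $payload = null)
    {
        if (empty($this->listeners[$event])) {
            return;
        }
        foreach ($this->listeners[$event] as $listener) {
            call_user_func($listener, $payload);
        }
    }
}

class PageViewListener
{
    public $views = array(); // stands in for the log file

    public function onPageView($page)
    {
        $this->views[] = $page;
    }
}

class HelloController
{
    private $events;

    public function __construct(EventEmitter $events)
    {
        $this->events = $events;
    }

    public function hello($name)
    {
        $this->events->emit('page.view', "hello/$name");
        return "Hello, $name";
    }
}

$events = new EventEmitter();
$listener = null;

// The controller's factory also creates and attaches the listener.
$helloControllerFactory = function () use ($events, &$listener) {
    $listener = new PageViewListener();
    $events->on('page.view', array($listener, 'onPageView'));
    return new HelloController($events);
};

$controller = $helloControllerFactory();
$greeting = $controller->hello('alice');
```

Until the factory is called, no PageViewListener exists, so requests that never build a HelloController pay nothing for it.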

Potential Issues

Several good questions were asked during my presentation, and I attempted to answer them as best as I could.

Is debugging harder having the definitions of the objects away from their usage?
Since you are explicitly passing an object's dependencies, and you get all the dependencies from the same place, you can minimize external state that may affect an object's behavior. Also, by using factory methods, you can find exactly where an object was created. Without the DIC, an object or its dependencies could come from anywhere in your code. DI also makes unit- and regression-testing easier.

Do the bootstrap/DIC definitions become unwieldy when you have a lot of interconnected objects?
The DIC is kept simple by having each factory method responsible for creating only one type of object. If each method is small and isolated, the file may become very large, but the logic in the file will remain easy to read. You can even break the bootstrap of the DIC into multiple files, each file handling a different component or section of the project's dependencies. Circular dependencies may be a problem with a highly connected object graph, but that is a separate problem regardless of using a DIC or not.

Where is a good place to start with refactoring an existing large codebase to use a DIC?
It's a multi-step process:
  1. Pick a class and create a factory method on a globally accessible DIC that instantiates that class.
  2. For every place in the class that uses the "new" operator, replace it with a call to a factory function/method on the DIC.
  3. Move calls to the factory methods into the class's constructor, and store the results as properties on the object.
  4. Change the constructor to not call the factory methods itself, but receive its dependencies as parameters.
  5. Update the class's factory method to call the other factory methods and pass in the dependencies to the constructor.
  6. Find any place in the code that instantiates the class and replace it with a call to the class's factory method.

Repeat this process for any usage of the "new" operator in the code. By the end, the only place in the code that uses the "new" operator or refers to the DIC should be the DIC itself.
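The steps above can be sketched on a single class. Dic, Mailer and ReportController are hypothetical names used only for this illustration; the static registry stands in for whatever globally accessible DIC the codebase has.

```php
<?php
// Sketch only: Dic, Mailer and ReportController are hypothetical names.

class Dic
{
    private static $factories = array();

    public static function register($name, $factory)
    {
        self::$factories[$name] = $factory;
    }

    public static function create($name)
    {
        return call_user_func(self::$factories[$name]);
    }
}

class Mailer
{
}

// After steps 2 and 3: "new Mailer()" has become a factory call,
// made in the constructor and stored as a property.
class ReportControllerStep3
{
    public $mailer;

    public function __construct()
    {
        $this->mailer = Dic::create('mailer');
    }
}

// After step 4: the class receives its dependency and no longer knows the DIC.
class ReportController
{
    public $mailer;

    public function __construct(Mailer $mailer)
    {
        $this->mailer = $mailer;
    }
}

Dic::register('mailer', function () {
    return new Mailer();
});

// Step 5: the class's own factory resolves and passes in the dependency.
Dic::register('reportController', function () {
    return new ReportController(Dic::create('mailer'));
});

// Step 6: call sites ask the DIC instead of using "new".
$intermediate = new ReportControllerStep3();
$controller = Dic::create('reportController');
```

The intermediate class still depends on the global registry, which is why step 4 matters: once the constructor takes a Mailer, the class can be tested without any DIC at all.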