2011-06-27

Path finding with Neo4j

In my previous post I talked about graphing databases (Neo4j in particular) and how they can be applied to certain classes of problems where data may have multiple degrees of separation in their relationships.

The thing that makes graphing databases useful is the ability to find relationship paths from one node to another. There are many algorithms for finding paths efficiently, depending on the use case.

Consider a graph that represents a map of a town. Every node in the graph represents an intersection of two or more streets. Each direction of a street is represented as a different relationship: two-way streets will have two relationships between the same two nodes, one pointing from the first node to the second, and the other pointing from the second node to the first. One-way streets will have a single relationship. Each street relationship will also have a distance property. Here is what one such map might look like:
Every street is two-way, except for Market Alley and Park Drive. The numbers in parentheses represent the street distance in hundreds of yards (i. e. 3 = 300 yards.)

Here is the code to create this graph in the database, using the REST client I've been working on (connection initialization is not shown):
$intersections = array(
    "First & Main",
    "Second & Main",
    "Third & Main",
    "First & Oak",
    "Second & Oak",
    "Third & Market",
);

$streets = array(
    // start, end, [direction, distance, name]
    array(0, 1, array('direction'=>'east', 'distance'=>2, 'name'=>'Main St.')),
    array(0, 3, array('direction'=>'south', 'distance'=>2, 'name'=>'First Ave.')),
    array(0, 4, array('direction'=>'south', 'distance'=>3, 'name'=>'Park Dr.')),

    array(1, 0, array('direction'=>'west', 'distance'=>2, 'name'=>'Main St.')),
    array(1, 2, array('direction'=>'east', 'distance'=>2, 'name'=>'Main St.')),
    array(1, 4, array('direction'=>'south', 'distance'=>2, 'name'=>'Second Ave.')),

    array(2, 1, array('direction'=>'west', 'distance'=>2, 'name'=>'Main St.')),
    array(2, 5, array('direction'=>'south', 'distance'=>1, 'name'=>'Third Ave.')),

    array(3, 0, array('direction'=>'north', 'distance'=>2, 'name'=>'First Ave.')),
    array(3, 4, array('direction'=>'east', 'distance'=>2, 'name'=>'Oak St')),

    array(4, 3, array('direction'=>'west', 'distance'=>2, 'name'=>'Oak St.')),
    array(4, 1, array('direction'=>'north', 'distance'=>2, 'name'=>'Second Ave.')),

    array(5, 2, array('direction'=>'north', 'distance'=>1, 'name'=>'Third Ave.')),
    array(5, 4, array('direction'=>'west', 'distance'=>2, 'name'=>'Market Alley')),
);

$nodes = array();
$intersectionIndex = new Index($client, Index::TypeNode, 'intersections');
foreach ($intersections as $intersection) {
    $node = new Node($client);
    $node->setProperty('name', $intersection)->save();
    $intersectionIndex->add($node, 'name', $node->getProperty('name'));
    $nodes[] = $node;
}

foreach ($streets as $street) {
    $start = $nodes[$street[0]];
    $end = $nodes[$street[1]];
    $properties = $street[2];
    $start->relateTo($end, 'CONNECTS')
        ->setProperties($properties)
        ->save();
}
First, we set up a list of the intersections on the map, and also a list of the streets that connect those intersections. Each street will be a relationship that has data about the streets direction, distance and name.

Then, we create each intersection node. Each node is also added to an index. Indexes allow stored nodes to be found quickly. Nodes and relationships can be indexed, and indexing can occur on any node or relationship field. You can even index on fields that don't exist on the node or relationship. We'll use the index later to find the start and end points of our path by name.

Finally, we create a relationship for every street. Relationships can have arbitrary data attached to them, just like nodes.

Now we create way to find and display driving directions from any intersection to any other intersection:
$turns = array(
    'east' => array('north' => 'left','south' => 'right'),
    'west' => array('north' => 'right','south' => 'left'),
    'north' => array('east' => 'right','west' => 'left'),
    'south' => array('east' => 'left','west' => 'right'),
);

$fromNode = $intersectionIndex->findOne("Second & Oak");
$toNode = $intersectionIndex->findOne("Third & Main");

$paths = $fromNode->findPathsTo($toNode, 'CONNECTS', Relationship::DirectionOut)
    ->setMaxDepth(5)
    ->getPaths();

foreach ($paths as $i => $path) {
    $path->setContext(Path::ContextRelationship);
    $prevDirection = null;
    $totalDistance = 0;

    echo "Path " . ($i+1) .":\n";
    foreach ($path as $j => $rel) {
        $direction = $rel->getProperty('direction');
        $distance = $rel->getProperty('distance') * 100;
        $name = $rel->getProperty('name');

        if (!$prevDirection) {
            $action = 'Head';
        } else if ($prevDirection == $direction) {
            $action = 'Continue';
        } else {
            $turn = $turns[$prevDirection][$direction];
            $action = "Turn $turn, and continue";
        }
        $prevDirection = $direction;
        $step = $j+1;
        $totalDistance += $distance;

        echo "\t{$step}: {$action} {$direction} on {$name} for {$distance} yards.\n";
    }
    echo "\tTravel distance: {$totalDistance} yards\n\n";
}
Note that in the `findPathsTo` call, we specify that we want only outgoing relationships. This is how we prevent getting directions that may send us the wrong way down a one-way street. We also set a max depth, since the default search depth is 1. Also, note the `setContext` call which tells the path that we want the `foreach` to loop through the relationships, not the nodes. Running this produces the following directions:
Path 1:
    1: Head north on Second Ave. for 200 yards.
    2: Turn left, and continue west on Main St. for 200 yards.
    Travel distance: 400 yards
We only got back one path because, by default, the path finding algorithm finds the shortest path, meaning the path with the least number of relationships. If we switch the to and from intersections, we can get directions back to our starting point:

Path 1:
    1: Head west on Main St. for 200 yards.
    2: Turn left, and continue south on Second Ave. for 200 yards.
    Travel distance: 400 yards

Path 2:
    1: Head south on Third Ave. for 100 yards.
    2: Turn right, and continue west on Market Alley for 200 yards.
    Travel distance: 300 yards
We got back two paths this time, but one of them is longer than the other. How can this be if we are using the shortest path algorithm? The answer is that shortest path refers only to the number of relationships in the path. We didn't tell our path finder to pay attention to the cost of one path over another. Fortunately, we can do that quite easily by telling the `findPathsTo` call to use a property of each relationship when determining which path is the least expensive:
$paths = $fromNode->findPathsTo($toNode, 'CONNECTS', Relationship::DirectionOut)
    ->setAlgorithm(PathFinder::AlgoDijkstra)
    ->setCostProperty('distance')
    ->setMaxDepth(5)
    ->getPaths();
We're telling the path finder to use Dijkstra's algorithm, an algorithm to find the lowest cost path, which is different from the shortest path. Since our map uses distance, the cost of a path is the total of all the distance properties of the relationships in that path. Running the script now gives us only one path, the path with the least distance:
Path 1:
    1: Head south on Third Ave. for 100 yards.
    2: Turn right, and continue west on Market Alley for 200 yards.
    Travel distance: 300 yards
If there were another path with the same total distance, it would also be displayed. Path cost ignores the number of relationships in the path. If there were a path with 3 relationships totaling 300 yards, and another with 1 relationship totaling 300 yards, both of them would be displayed.

Neo4j has several path finding algorithms built in. Some of them are:
  • shortest path - find paths with the fewest relationships
  • dijkstra - find paths with the lowest cost
  • simple path - find paths with no repeated nodes
  • all - find all paths between two nodes
Neo4j also comes with an expressive traversal API that allows you to create your own path finding algorithms for domain specific needs.

8 comments:

  1. Awesome work - both on this tutorial and your PHP implementation.

    One thing though: I've downloaded Neo4jPHP and ran the bacon.php example. I did the init, and when I examine the web admin interface that comes with Neo4j, I can see the actors/movies data & relations being entered.

    However, when I try to get the 'path' from bacon.php, it doesn't work with any actors names. It's almost like it didn't create the "actors" index in the system properly, so it can't pull them up (it just says they aren't found).

    Any thoughts as to why? I'm using the latest version of Neo4J - Community 1.4-M05. Also, I did modify your script and ran it through my WAMP stack instead of the command line, but it shouldn't have any effect as far as I can see:
    Windows 7
    PHP 5.3.6

    Thanks!

    ReplyDelete
  2. I made a change this past weekend that broke indexing. The current version of the client has been fixed. Can you try again with the latest version? If it still does not work, please open an issue on the github repo.

    ReplyDelete
  3. It works now - beauty :)

    Thanks for creating this REST client! I'm surprised they don't have native support for PHP yet.

    Hopefully your client will become quite stable and feature complete. You should get them to link to yours directly from their documentation. There's no sense duplicating efforts, etc.

    Thanks again!

    ReplyDelete
  4. Hi, Please can you show us how did you install neo4j on your machine ? and also if there are hoster that proposes neo4j ? Thanks a lot 

    ReplyDelete
  5. Hi Mahieddine,
    I have a post about how I installed Neo4j on my Ubuntu machine for development and testing: http://blog.everymansoftware.com/2011/11/development-setup-for-neo4j-and-php.html

    Also, the Neo4j documentation has instructions for installing the server and the Java library if you want to embed Neo4j in a Java application: http://docs.neo4j.org/

    As for hosting, Heroku offers Neo4j as an add-on, and there is an Amazon Web Services EC2 AMI that comes pre-installed with Neo4j. I think the Neo4j people are working with other cloud hosts as well. If you have a specific one you would like to see support for, you should talk to them about it.

    Thanks for reading!

    ReplyDelete
  6. Hi Mahieddine,
    I have a post about how I installed Neo4j on my Ubuntu machine for development and testing: http://blog.everymansoftware.c...

    Also, the Neo4j documentation has instructions for installing the server and the Java library if you want to embed Neo4j in a Java application: http://docs.neo4j.org/

    As for hosting, Heroku offers Neo4j as an add-on, and there is an Amazon Web Services EC2 AMI that comes pre-installed with Neo4j. I think the Neo4j people are working with other cloud hosts as well. If you have a specific one you would like to see support for, you should talk to them about it.

    Thanks for reading!

    ReplyDelete
  7. Hi Josh, First of all, happy new year and thank you for all this precious information, the thing is that i'm working on a windows machine (yeah i know ouuuu :D) but i have no choice, i check out the official doc and find anything about installing Neo4j on windows :(
    Thank you for all

    ReplyDelete
  8. Neo4j can be installed in 2 modes, "embedded" or "server".  If you wish to run embedded, put the Neo4j .jar files in your Java class path. Here are the instructions to run as a server on WIndows: http://docs.neo4j.org/chunked/milestone/server-installation.html#_as_a_windows_service 

    ReplyDelete