Use cases

Use case #1: Doctor Who

The Neo4j Koans are a good resource for learning Neo4j in practice. Some of the exercises are based on a graph that stores information about the Doctor Who series. We base this use case on that dataset to cover:

  1. how to perform data projections to draw the content of the database
  2. how to perform data analyses to enhance data meaning

The database is well suited to benchmarking visualization tools because it is small: 1,000 nodes and XXX edges.

In the next sections, we assume you have already downloaded and unzipped the latest distribution, which is bundled with the database and configuration files required for this use case.

Flat graph map

Let's start by drawing the graph database the usual way: a force-based layout algorithm gives us a first look at the data. To do so, open the workspace doctorwho. The layout will start, and you can stop it once you are satisfied with the result.
It is actually hard to get valuable information from this representation. The big picture roughly shows the number of node and edge types, but at this stage the only way to learn more is to zoom in on individual nodes, which is frustrating because you cannot see all of a node's neighbours or understand its role in the overall graph. In other words, the flat graph says no more than the tree widget on the left, and is even less efficient for rapid browsing.

One interesting piece of information given by both the tree and the graph is the type of each node. Node types are declared in the workspace file types.ns, and each type is defined as a set of property keys:


type Character{
 property: character
 
 relation: PLAYED
 relation: LOVES
 relation: OWNS
 relation: APPEARED_IN
 relation: ALLY_OF
 relation: COMPANION_OF
 relation: FATHER_OF
 relation: FIRST_APPEARED
 relation: ENEMY_OF
 relation: IS_A
 relation: COMES_FROM
 relation: DIED_IN
 
 label: character
 icon: doctorwho/character.png
}

If you read the full file, you will see some duplicate types extending a base one:


type Actor{
 property: actor
 relation: PLAYED
 relation: REGENERATED_TO
 relation: APPEARED_IN
 
 label: actor
 icon: doctorwho/actor.png
}

type Actor2{
 extends: Actor
 
 property: actor
 property: wikipedia
 
 label: actor
 icon: doctorwho/actor.png
}

This links types that are implicitly similar, so that a single type name can later be used to refer to all of them.
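
As an illustration only, the following Python sketch shows one way a base type name could be expanded to every type that extends it. It is not the tool's implementation: the function name expand_type and the dictionary layout (child type mapped to its base type) are assumptions made for this example, and "Actor3" in the usage lines is a hypothetical type.

```python
def expand_type(name, extends):
    """Return `name` plus every type extending it, directly or indirectly.

    `extends` maps a child type to its base type, mirroring the
    `extends:` entries of the type file (hypothetical in-memory form).
    """
    result = {name}
    changed = True
    while changed:
        changed = False
        for child, parent in extends.items():
            # A child joins the set as soon as its base type is in it.
            if parent in result and child not in result:
                result.add(child)
                changed = True
    return result


# "Actor2" extends "Actor" as in the listing above; "Actor3" is made up.
extends = {"Actor2": "Actor", "Actor3": "Actor2"}
actor_types = expand_type("Actor", extends)
```

With such a helper, a rule written against "Actor" would also match nodes typed "Actor2".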

The essential part of the type file was generated (see how here): every unique set of property keys is read from the graph. From these sets, we get types labeled with a default name:


type undefined5{
 property: prop
 relation: MEMBER_OF
 relation: ORIGINAL_PROP
 relation: COMPOSED_OF
}
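
The generation step described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the actual generator: the in-memory formats for nodes and edges, the function name generate_types, the numbering order of the undefinedN names, and the choice to list a relation on both of its endpoints are all assumptions made for illustration.

```python
from collections import defaultdict


def generate_types(nodes, edges):
    """Derive one type per unique set of property keys.

    nodes: {node_id: {property_key: value}}        (assumed format)
    edges: [(source_id, relation_name, target_id)] (assumed format)
    """
    signature_of = {node_id: frozenset(props) for node_id, props in nodes.items()}

    # Assumption: a relation is listed on the types of both endpoints.
    relations = defaultdict(set)
    for source, relation, target in edges:
        relations[signature_of[source]].add(relation)
        relations[signature_of[target]].add(relation)

    types = {}
    unique_signatures = sorted(set(signature_of.values()), key=sorted)
    for index, signature in enumerate(unique_signatures):
        # Default name, to be replaced by hand with a meaningful one.
        types[f"undefined{index}"] = {
            "properties": sorted(signature),
            "relations": sorted(relations[signature]),
        }
    return types


nodes = {
    1: {"character": "Doctor"},
    2: {"actor": "Tom Baker"},
    3: {"actor": "David Tennant", "wikipedia": "..."},
    4: {"character": "Rose Tyler"},
}
edges = [(2, "PLAYED", 1), (4, "COMPANION_OF", 1)]
generated = generate_types(nodes, edges)
```

Nodes 2 and 3 end up in different generated types because their key sets differ, which is exactly the situation the extends entry is meant to reconcile.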

Our job was simply to add more information to this generated file, e.g.:


type Property{
 property: prop
 relation: MEMBER_OF
 relation: ORIGINAL_PROP
 relation: COMPOSED_OF
 
 label: prop
 icon: doctorwho/property.png
 propertynode
}

The last entry (propertynode) is useful for building aggregates. Knowing which nodes are property nodes lets us build views that draw only the main nodes and leave the property nodes out for clarity. The data remains present and browsable through other views; it is only filtered out by our projection.
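
To make the filtering idea concrete, here is a small Python sketch. It only illustrates the principle: the function name filter_property_nodes and the data formats are assumptions, the node names in the usage lines are invented, and the tool's real view mechanism is not shown.

```python
def filter_property_nodes(node_types, edges, property_types):
    """Keep only main nodes and the edges that connect two of them.

    node_types:     {node_id: type_name}
    edges:          [(source_id, relation_name, target_id)]
    property_types: names of the types flagged `propertynode`
    """
    visible = {node for node, type_name in node_types.items()
               if type_name not in property_types}
    kept_edges = [(source, relation, target)
                  for source, relation, target in edges
                  if source in visible and target in visible]
    return visible, kept_edges


# Invented sample data; only the type and relation names come from the text.
node_types = {"doctor": "Character", "rose": "Character", "scarf": "Property"}
edges = [("rose", "COMPANION_OF", "doctor"), ("doctor", "OWNS", "scarf")]
visible, kept_edges = filter_property_nodes(node_types, edges, {"Property"})
```

The input data is left untouched, which matches the idea that filtered nodes stay browsable elsewhere.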

Type map

To have a better understanding of the database content, we can start by simply grouping all nodes by their type. The workspace doctorwho-types contains the file projection.xml which states:


<projection>
  <groupby>
    <type/>
    <andby>
      <rule name="isolated"/>
    </andby>
  </groupby>
</projection>

This projection will:

  • build groups of nodes according to their type (Actor, Character, Planet, Specy, etc.)
  • make the groups more compact by gathering all unconnected nodes of each group into a subgroup, which can be toggled to declutter the map
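
The two grouping levels above can be sketched in Python as follows. This is an illustration under assumptions, not the projection engine itself: the function name, the data formats, and the reading of "isolated" as "has no edge at all" are choices made for this example.

```python
from collections import defaultdict


def group_by_type_and_isolation(node_types, edges):
    """First level: one group per type. Second level: connected vs isolated.

    node_types: {node_id: type_name}
    edges:      [(source_id, relation_name, target_id)]
    """
    connected = set()
    for source, _relation, target in edges:
        connected.add(source)
        connected.add(target)

    groups = defaultdict(lambda: {"connected": [], "isolated": []})
    for node, type_name in sorted(node_types.items()):
        # Assumption: a node is "isolated" when no edge touches it.
        subgroup = "connected" if node in connected else "isolated"
        groups[type_name][subgroup].append(node)
    return dict(groups)


node_types = {
    "doctor": "Character",
    "rose": "Character",
    "k9": "Character",
    "gallifrey": "Planet",
}
edges = [("rose", "COMPANION_OF", "doctor")]
groups = group_by_type_and_isolation(node_types, edges)
```

Counting the members of each group gives the per-type estimate mentioned in the text that follows.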

This map provides much more information than the previous one:

  • we have an immediate estimate of the number of entities of each type
  • we have an overview of the kind of relationships existing between each type

All this information helps us understand the data before choosing relevant and comprehensive views. In this case, we decided that the dataset should be split into two parts in order to build:

  • a map of the scenario, showing the characters, their species, their planet, and their relations
  • a map of the "backstage", showing which actor played which character in which episode

Scenario map

In the scenario map, we would like to clearly show the hierarchy Planet > Specy > Character, so we have written rules that build such a hierarchical structure:


<projection>
  <groupby>
    <relation source="Character" type="COMES_FROM" target="Planet"/>
    <andby>
      <relation source="Character" type="IS_A" target="Specy"/>
    </andby>
  </groupby>
  <remove from="graph">
    <rule name="node_in_no_group"/>
  </remove>
  <remove from="topology">
    <rule name="empty_groups"/>
  </remove>
</projection>

You may have noticed some cleanup at the end of the file: nodes that are not projected (actors, episodes, etc.) do not help us understand the scenario itself, so they are removed.
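
The effect of this projection can be sketched in Python as follows. It is a hedged illustration, not the tool's algorithm: the function name project_scenario, the data formats, the assumption that each character has at most one COMES_FROM and one IS_A relation, and the use of None as the key for characters without a species are all choices made for this example.

```python
def project_scenario(node_types, edges):
    """Build {planet: {species: [characters]}} from the two relations.

    node_types: {node_id: type_name}
    edges:      [(source_id, relation_name, target_id)]
    """
    def targets(relation, target_type):
        # Only keep Character -> target_type edges of the given relation.
        return {
            source: target
            for source, rel, target in edges
            if rel == relation
            and node_types.get(source) == "Character"
            and node_types.get(target) == target_type
        }

    planet_of = targets("COMES_FROM", "Planet")
    species_of = targets("IS_A", "Specy")

    hierarchy = {}
    # Nodes without a planet never enter the hierarchy ("node_in_no_group"),
    # and a group only exists once it has a member (no "empty_groups").
    for character in sorted(planet_of):
        planet = planet_of[character]
        species = species_of.get(character)  # None: no species subgroup
        hierarchy.setdefault(planet, {}).setdefault(species, []).append(character)
    return hierarchy


node_types = {
    "doctor": "Character", "master": "Character", "rose": "Character",
    "tom": "Actor", "gallifrey": "Planet", "earth": "Planet",
    "timelord": "Specy", "human": "Specy",
}
edges = [
    ("doctor", "COMES_FROM", "gallifrey"), ("doctor", "IS_A", "timelord"),
    ("master", "COMES_FROM", "gallifrey"), ("master", "IS_A", "timelord"),
    ("rose", "COMES_FROM", "earth"), ("rose", "IS_A", "human"),
    ("tom", "PLAYED", "doctor"),
]
hierarchy = project_scenario(node_types, edges)
```

The actor node has no COMES_FROM relation, so it does not appear anywhere in the result.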

The projection is interesting because:

  • The amount of data stays small enough to read.
  • As with the type map, this representation lets us generalize: the edge colors of individual relationships show how many positive and negative links connect two groups, so we can infer whether two groups (species or planets) are on good or bad terms.
  • The overview does not tie you to one viewpoint: you can use the navigation tools to explore the relationships of a node in detail.

Let's open the view: it looks simple now. We can clearly see positive and negative interpersonal relations, and we can even generalize visually which species is friendly with which, something that is much harder to read on a flat graph.
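
The visual generalization described above can also be expressed as a simple count. The Python sketch below is illustrative only: the POLARITY table is an assumption about which relation types count as positive or negative (the source only says that edge color conveys it), and the function name, data formats, and sample nodes are hypothetical.

```python
from collections import Counter

# Assumed polarity of relation types; not taken from the tool's configuration.
POLARITY = {"ALLY_OF": 1, "COMPANION_OF": 1, "LOVES": 1, "ENEMY_OF": -1}


def group_polarity(group_of, edges):
    """Sum relation polarities between each unordered pair of groups.

    group_of: {node_id: group_name}, e.g. the species of each character
    edges:    [(source_id, relation_name, target_id)]
    """
    score = Counter()
    for source, relation, target in edges:
        if relation not in POLARITY:
            continue  # neutral relation such as COMES_FROM
        a, b = group_of.get(source), group_of.get(target)
        if a is None or b is None:
            continue  # one endpoint belongs to no group
        score[tuple(sorted((a, b)))] += POLARITY[relation]
    return dict(score)


group_of = {"doctor": "timelord", "master": "timelord",
            "rose": "human", "dalek_caan": "dalek"}
edges = [
    ("rose", "COMPANION_OF", "doctor"),
    ("rose", "LOVES", "doctor"),
    ("doctor", "ENEMY_OF", "dalek_caan"),
    ("master", "ENEMY_OF", "doctor"),
    ("doctor", "COMES_FROM", "gallifrey"),
]
scores = group_polarity(group_of, edges)
```

A positive score suggests mostly friendly links between two groups, a negative one mostly hostile links.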

Going further

Drawing families

We could show more information in this map: the database holds information about family relationships (e.g. FATHER_OF). Assuming the genealogy is completely described, we would simply need to:

  • Apply a third level of projection (remember the "andby") in order to project each family in a dedicated group.
  • Implement a tree layout using the Java API.
  • Customize the IHierarchicalLayoutFactory to let it apply this layout when a family group is found.
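
As a starting point for the second step, here is a minimal tree-placement sketch. It is written in Python for brevity and does not use the Java API or IHierarchicalLayoutFactory; the function names, the data formats, the sample family, and the placement rule (leaves on consecutive columns, each parent centered between its first and last child) are assumptions chosen for illustration.

```python
from collections import defaultdict


def children_from_edges(edges, relation="FATHER_OF"):
    """Build {parent: [children]} from (source, relation, target) edges."""
    children = defaultdict(list)
    for source, rel, target in edges:
        if rel == relation:
            children[source].append(target)
    return dict(children)


def tree_layout(children, root):
    """Return {node: (x, depth)} for the tree rooted at `root`."""
    positions = {}
    next_leaf_x = 0

    def place(node, depth):
        nonlocal next_leaf_x
        kids = children.get(node, [])
        if not kids:
            # Each leaf takes the next free column.
            x = float(next_leaf_x)
            next_leaf_x += 1
        else:
            xs = [place(kid, depth + 1) for kid in kids]
            # A parent sits midway between its first and last child.
            x = (xs[0] + xs[-1]) / 2
        positions[node] = (x, depth)
        return x

    place(root, 0)
    return positions


# Invented family, for illustration only.
edges = [
    ("grandfather", "FATHER_OF", "father"),
    ("grandfather", "FATHER_OF", "uncle"),
    ("father", "FATHER_OF", "son"),
    ("father", "FATHER_OF", "daughter"),
]
positions = tree_layout(children_from_edges(edges), "grandfather")
```

The sketch assumes the family really is a tree; a genealogy with two parents per child would need a different layout.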

Want to give it a try?