Traversing a Graph

The use of the same data value as a primary key and a secondary key is a basic concept. This reflects real-world relationships and re-establishes those relationships for processing. A programmer can:

  1. start at the beginning or a known record and access records sequentially.
  2. use a database key for direct access to a record’s physical location.
  3. use a primary data key.
  4. use a secondary data key to access all records with that value.
  5. traverse from a set’s owner to its member records.
  6. traverse between members of a set.
  7. start from any member and access the owner of the set.

Data structure sets are declared and maintained. They contribute to programmers the capability to:

CREATE TABLE vertices (
    vertex_id integer PRIMARY KEY,
    properties json);
CREATE TABLE edges (
    edge_id integer PRIMARY KEY,
    tail_vertex integer REFERENCES vertices (vertex_id),
    head_vertex integer REFERENCES vertices (vertex_id),
    label text,
    properties json);
CREATE INDEX edges_tails ON edges (tail_vertex);
CREATE INDEX edges_heads ON edges (head_vertex);

Example 1

For any vertex, you can retrieve both incoming and outgoing edges, thus traversing the graph forward and backward. That’s why Example TG-1 has indexes on tail_vertex and head_vertex.

The directions of “in” and “out” were reversed. Where the input notion of the sequential file world meant “into the computer from tape,” the new input notion became “into the database.” This revolution in thinking changes the programmer from a stationary observer to a navigator traversing the database. Processing a single transaction involves a path through the database. A record would be used to gain access to other records. Each of these records is used in turn as a point for examining other sets. The Neo4j Traversal API is a callback-based, lazily executed way for specifying these movements in Java. Some traversal examples are collected. We want to train programmers to navigate in an n-dimensional data space. Neo4j’s graph algorithms component contains implementations of common graph algorithms like:

This “traverser” concept is new in TinkerPop3, providing the means by which steps remain stateless. A traverser maintains all the metadata about the traversal – how many times the traverser has gone through a loop, path history, the current object, etc. Path calculation is costly in terms of space. The traverser stores an array of previously visited objects. Thus, a traversal strategy analyzes whether path metadata is required. If not, path calculations are turned off. Never rely on the iteration order in TinkerPop3 traversals. Even within a release, traversal optimizations may alter the flow.

Cypher, a powerful declarative query language, queries the graph. TinkerPop3’s GraphTraversal JavaDoc lists all steps and their descriptions. The Gremlin Console can be used for these steps.

gremlin> g.V // (1)
  ==>v[1]
  ==>v[2]
  ==>v[3]
  ==>v[4]
  ==>v[5]
  ==>v[6]

gremlin> g.V.name // (2)
  ==>marko
  ==>vadas
  ==>lop
  ==>josh
  ==>ripple
  ==>peter

gremlin> g.V.outE.weight // (3)
  ==>0.4
  ==>0.4
  ==>0.5
  ==>1.0
  ==>1.0
  ==>0.2

  1. g.V retrieves all vertices. There’s no need for parentheses.
  2. g.V.name is interpreted as g.V().values('name').
  3. A chain of zero-argument step calls is followed by a property value call.

Domain-specific language authors should leverage the steps, as then the common optimization and decoration strategies can reason on the underlying traversal sequence. If new steps are introduced, these optimizations may not function properly.

Physics yields minimum-energy solutions; a similar science must be developed for database access. This includes the traversal of existing databases, building databases, and restructuring them to best fit the changing access patterns.