Run-time Complexity of Graphs

We adopt the Turing machine (TM) model to analyze query complexity. Difficulty is measured by the resources required - usually time and space - relative to the input size $|w|$.

\[\text{LOGSPACE} \subseteq \text{PTIME} \subseteq \text{NP} \subseteq \text{PSPACE} \subset \text{TIME}(\text{Hyp})\]

A language $K$ is complete in $C$ if solving it allows solving all other problems in $C$, also within $C$. Therefore, the definition of completeness in a class $C$ requires the complexity of the reduction function to be lower than that for $C$.

Type of Completeness Type of Reduction
P-completeness LOGSPACE reductions
NP-completeness PTIME reductions
PSPACE-completeness PTIME reductions

The two main possible ways to look at the complexity of queries are as follows:

  1. Data complexity: the complexity of evaluating a fixed query for variable database inputs
  2. Expression complexity: the complexity of evaluating, on a fixed database instance, the various queries specifiable in a given query language

The usual situation is that the size of the database inputs dominates by far the size of the query, so data complexity is typically most relevant.

Complexity is formally defined via the recognition problem: Given an instance $I$ and a tuple $u$, is $u \in I$? We also measure the complexity of constructing the entire result since the recognition problem is the standard for complexity class categorization.

First-order logic queries3 are highly expressive.

Transitive closure and other recursive properties can’t be expressed in CALC. This necessitates Datalog, which uses fixpoint evaluation. We use the semi-naïve algorithm to compute the result of a recursive Datalog program. The goal is to avoid redundant computations by only using newly derived tuples6 in each subsequent iteration. For a rule $p: −p_1 \ldots p_n, q_1 \ldots q_m$7, let $p[i]$ be the set of tuples at the start of iteration $i$, and let $\delta(p)[i]$ be the new tuples generated in iteration $i$. The update rule is

\[p[i + 1] = [i] \cup \Delta(p)[i]\]

Delta Rules for Semi-naïve Evaluation

\[\Delta(p)[i]: -\delta(p_1)[i - 1], p[i - 1]_2...p[i - 1]_n, q_1...q_m\] \[\Delta(p)[i]: -p[i]_1, \delta(p_2)[i - 1], p[i - 1]_3...p[i - 1]_n, q_1...q_m\] \[\vdots\] \[\Delta(p)[i]: -p[i]_1, p[i][n - 1], \delta(p_n)[i - 1], q_1...q_m\]

8


Form the precedence graph $G_P$ as follows: Use the IDB predicates as the nodes. Include the edge $(R, R’)$ if there’s a rule with head predicate $R$ in which $R’$ occurs in the body. Mutual recursion is an equivalence relation on the IDB predicates of $P$, where each equivalence class corresponds to a strongly connected component9 of $G_P$.

The fact that each fixpoint query10 is in PTIME follows immediately from the inflationary11 nature of languages defining the fixpoint queries. The total number of tuples that can be built from constants in a given instance is polynomial in the size of the instance. The while queries with cumbersome updates are complete in PSPACE. A fundamental result is that fixpoint is complete in PTIME and that while is complete in PSPACE. These languages can’t express simple queries12 if no order is provided.

We adopt a common notational convention for two parameters. Inside asymptotic notation, the symbol $V$ denotes $|V|$13 and the symbol $E$ denotes $|E|$14. The function to compute the Bacon numbers evaluates every reachable node once and every edge twice, so it’s $\Omega(V + E)$. In this graph, you’d expect that $E \gg V$, so this is reduced to $\Omega(E)$. To maintain $\Omega(1)$ performance, you need a constant time means of finding the node representing an actor such as a hash table mapping names to nodes. The name “direct” algorithm descends from the fact that the length of the computation doesn’t depend on the length of paths in the underlying graph, very often making use of sparse matrices to represent them.

  1. TIME(Poly) 

  2. SPACE(Poly) 

  3. relational calculus or CALC 

  4. logarithmic in size 

  5. AC0 

  6. deltas 

  7. where pi are IDBs and *qj are EDBs 

  8. and so on for all n IDB atoms 

  9. cycle 

  10. Datalog 

  11. tuples are never removed 

  12. like checking if the number of elements is even 

  13. number of vertices 

  14. number of edges