We adopt the Turing machine (TM) model to analyze query complexity. Difficulty is measured by the resources required - usually time and space - relative to the input size $|w|$.
A language $K$ is complete in $C$ if solving it allows solving all other problems in $C$, also within $C$. Therefore, the definition of completeness in a class $C$ requires the complexity of the reduction function to be lower than that for $C$.
| Type of Completeness | Type of Reduction |
|---|---|
| P-completeness | LOGSPACE reductions |
| NP-completeness | PTIME reductions |
| PSPACE-completeness | PTIME reductions |
The two main possible ways to look at the complexity of queries are as follows:
The usual situation is that the size of the database inputs dominates by far the size of the query, so data complexity is typically most relevant.
Complexity is formally defined via the recognition problem: Given an instance $I$ and a tuple $u$, is $u \in I$? We also measure the complexity of constructing the entire result since the recognition problem is the standard for complexity class categorization.
First-order logic queries3 are highly expressive.
Transitive closure and other recursive properties can’t be expressed in CALC. This necessitates Datalog, which uses fixpoint evaluation. We use the semi-naïve algorithm to compute the result of a recursive Datalog program. The goal is to avoid redundant computations by only using newly derived tuples6 in each subsequent iteration. For a rule $p: −p_1 \ldots p_n, q_1 \ldots q_m$7, let $p[i]$ be the set of tuples at the start of iteration $i$, and let $\delta(p)[i]$ be the new tuples generated in iteration $i$. The update rule is
\[p[i + 1] = [i] \cup \Delta(p)[i]\]Form the precedence graph $G_P$ as follows: Use the IDB predicates as the nodes. Include the edge $(R, R’)$ if there’s a rule with head predicate $R$ in which $R’$ occurs in the body. Mutual recursion is an equivalence relation on the IDB predicates of $P$, where each equivalence class corresponds to a strongly connected component9 of $G_P$.
The fact that each fixpoint query10 is in PTIME follows immediately from the inflationary11 nature of languages defining the fixpoint queries. The total number of tuples that can be built from constants in a given instance is polynomial in the size of the instance. The while queries with cumbersome updates are complete in PSPACE. A fundamental result is that fixpoint is complete in PTIME and that while is complete in PSPACE. These languages can’t express simple queries12 if no order is provided.
We adopt a common notational convention for two parameters. Inside asymptotic notation, the symbol $V$ denotes $|V|$13 and the symbol $E$ denotes $|E|$14. The function to compute the Bacon numbers evaluates every reachable node once and every edge twice, so it’s $\Omega(V + E)$. In this graph, you’d expect that $E \gg V$, so this is reduced to $\Omega(E)$. To maintain $\Omega(1)$ performance, you need a constant time means of finding the node representing an actor such as a hash table mapping names to nodes. The name “direct” algorithm descends from the fact that the length of the computation doesn’t depend on the length of paths in the underlying graph, very often making use of sparse matrices to represent them.