part of the node. On the other hand, if the access time is after the modification time, then we use the value in the
modification box, overriding the corresponding value in the node. (Say the modification box has a new left pointer. Then we’ll use
it instead of the normal left pointer, but we’ll still use the normal right pointer.)
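As a minimal sketch of this read rule (the Node layout, field names, and read_field helper here are illustrative assumptions, not taken from the source):

    from dataclasses import dataclass
    from typing import Any, Optional, Tuple

    @dataclass
    class Node:
        left: Optional["Node"] = None
        right: Optional["Node"] = None
        value: Any = None
        # Modification box: at most one (time, field_name, new_value) triple.
        mod: Optional[Tuple[int, str, Any]] = None

    def read_field(node: Node, name: str, access_time: int) -> Any:
        """Return node's field `name` as seen at `access_time`."""
        if node.mod is not None:
            mod_time, mod_name, mod_value = node.mod
            # Use the boxed value only for the boxed field, and only when
            # the access time is at or after the modification time.
            if mod_name == name and access_time >= mod_time:
                return mod_value
        return getattr(node, name)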
Modifying a node works like this. (We assume that each modification touches one pointer or similar field.) If the
node’s modification box is empty, then we fill it with the modification. Otherwise, the modification box is full. We
make a copy of the node, but using only the latest values. (That is, we overwrite one of the node’s fields with the
value that was stored in the modification box.) Then we perform the modification directly on the new node, without
using the modification box. (We overwrite one of the new node’s fields, and its modification box stays empty.)
Finally, we cascade this change to the node’s parent, just like path copying. (This may involve filling the parent’s
modification box, or making a copy of the parent recursively. If the node has no parent—it’s the root—we add the
new root to a sorted array of roots.)
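Continuing the illustrative sketch above, one way to write the modification step (set_field, update, and the path argument are hypothetical names, not from the source):

    def set_field(node: Node, name: str, value: Any, time: int) -> Node:
        """Apply one field change at `time`. Returns `node` itself if the
        change fit in its modification box, or a fresh copy otherwise."""
        if node.mod is None:
            node.mod = (time, name, value)   # empty box: just fill it
            return node
        # Full box: copy the node using only the latest values, i.e.
        # overwrite the boxed field with the boxed value...
        _, mod_name, mod_value = node.mod
        copy = Node(left=node.left, right=node.right, value=node.value)
        setattr(copy, mod_name, mod_value)
        # ...then perform the new change directly on the copy; its
        # modification box stays empty.
        setattr(copy, name, value)
        return copy

    def update(path, name, value, time, roots):
        """`path` lists the nodes from the root down to the target node;
        `roots` is the sorted-by-time array of roots. Copies cascade
        toward the root, as in path copying."""
        node = path[-1]
        replacement = set_field(node, name, value, time)
        for parent in reversed(path[:-1]):
            if replacement is node:
                return            # absorbed into a modification box; done
            side = "left" if read_field(parent, "left", time) is node else "right"
            node, replacement = parent, set_field(parent, side, replacement, time)
        if replacement is not node:
            roots.append((time, replacement))   # the root itself was copied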
With this algorithm, given any time t, at most one modification box exists in the data structure with time t. Thus, a
modification at time t splits the tree into three parts: one part contains the data from before time t, one part contains
the data from after time t, and one part was unaffected by the modification.
Complexity of the combination
Time and space for modifications require amortized analysis. A modification takes O(1) amortized space, and O(1)
amortized time. To see why, use a potential function ϕ, where ϕ(T) is the number of full live nodes in T. The live
nodes of T are just the nodes that are reachable from the current root at the current time (that is, after the last
modification). The full live nodes are the live nodes whose modification boxes are full.
Each modification involves some number of copies, say k, followed by 1 change to a modification box. (Well, not
quite—you could add a new root—but that doesn’t change the argument.) Consider each of the k copies. Each costs
O(1) space and time, but decreases the potential function by one. (First, the node we copy must be full and live, so it
contributes to the potential function. The potential function will only drop, however, if the old node isn’t reachable in
the new tree. But we know it isn’t reachable in the new tree—the next step in the algorithm will be to modify the
node’s parent to point at the copy. Finally, we know the copy’s modification box is empty. Thus, we’ve replaced a
full live node with an empty live node, and ϕ goes down by one.) The final step fills a modification box, which costs
O(1) time and increases ϕ by one.
Putting it all together, the change in ϕ is Δϕ = 1 − k. Thus, we’ve paid O(k + Δϕ) = O(1) space and O(k + Δϕ + 1) =
O(1) time.
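For concreteness, a modification that makes k = 3 copies and then fills one modification box does O(3 + 1) actual work, but Δϕ = 1 − 3 = −2, so the potential released by the copies pays for them and the amortized cost is still O(1).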
Fully persistent
In the fully persistent model, both updates and queries are allowed on any version of the data structure.
Confluently persistent
In the confluently persistent model, we use combinators to combine inputs from more than one previous version into
a single new version. Rather than a branching tree, combinations of versions induce a DAG (directed acyclic graph)
structure on the version graph.
Examples of persistent data structures
Perhaps the simplest persistent data structure is the singly linked list or cons-based list, a simple list of objects
formed by each carrying a reference to the next in the list. This is persistent because we can take a tail of the list,
meaning the last k items for some k, and add new nodes onto the front of it. The tail will not be duplicated, instead
becoming shared between both the old list and the new list. So long as the contents of the tail are immutable, this
sharing will be invisible to the program.
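For instance, a minimal cons-based list in Python (the Cons class and its field names are illustrative, not a standard library type):

    from dataclasses import dataclass
    from typing import Any, Optional

    @dataclass(frozen=True)          # immutable, so structural sharing is safe
    class Cons:
        head: Any
        tail: Optional["Cons"] = None

    old_list = Cons(1, Cons(2, Cons(3)))    # the list [1, 2, 3]
    new_list = Cons(0, old_list.tail)       # the list [0, 2, 3]

    # Both versions coexist; the tail [2, 3] is shared, not duplicated.
    assert new_list.tail is old_list.tail

Because Cons is frozen, neither version can mutate the shared tail, which is exactly the immutability condition that makes the sharing invisible to the program.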