CRDT: Mutable Tree Hierarchy

Published on November 18th, 2022

Collaborative peer-to-peer applications may need to arrange data in a mutable tree structure that allows reparenting. For example, a peer-to-peer document storage system might need to sync an arbitrarily-nested directory structure.

The algorithm presented here is one way to do this. It comes from a family of algorithms called CRDTs, which I will not describe here. Unlike my original article about this technique, the algorithm presented here does not require a central server, and works in true peer-to-peer scenarios.

The algorithm:

At a high level, each node in the tree has a parent pointer that can be mutated to reparent that node somewhere else in the tree. However, just syncing the latest parent pointer for each node doesn't work, as the resulting graph is not necessarily a tree. For example, one peer might parent A under B while another peer parents B under A, causing a cycle in the graph:

To prevent cycles in the tree, we may need to "revert" the effect of one or more parent pointer changes if they are causing a cycle. The algorithm that decides which parent pointer changes to revert must be consistent across peers so that every peer ends up with the same tree. To do this, we need to have each node keep track of all parents that it has ever had. This parent history is what is synced between peers.

Specifically each node keeps an edge map where the key is a parent node (i.e. an "edge" in the graph) and the value is an integer counter. The counter can be used to tell which map entries are more recent. To set node X as a child of node Y, set the value for Y in the edge map for node X to 1 more than the greatest counter value in the edge map for node X so far.

A={C: 0}
B={C: 0}
C={R: 0}

A={C: 0, B: 1}
B={C: 0, A: 1}
C={R: 0}

A={C: 0, ~~B: 1~~}
B={C: 0, A: 1}
C={R: 0}

To determine what a given peer considers to be the parent for each node in the tree:

Initially set each node's parent to the key of the entry in that node's edge map with the greatest counter value (i.e. the most recent one), using the key as a tie-breaker. At this point, every node that can reach the root node by traversing parent pointers (i.e. the "rooted" nodes) will form a tree. The other nodes that can't reach the root (i.e. the "non-rooted" nodes) will be part of one or more cycles.

Reattach non-rooted nodes to the tree one-by-one by repeatedly picking a non-rooted node with an edge to a rooted node and setting its parent to that rooted node. This should always be possible given that a) the parent for a given edge was attached to the root at the time that edge was generated and b) updates between peers are never sent incompletely (e.g. the child is never created without the parent also existing). If it's possible to undo the creation of entries in the edge map then it's of course possible that a node could remain detached after this step, but this is the expected outcome of removing all edges connecting a node to the rooted part of the graph.

One heuristic for doing this consistently across peers is to consider all of the edges from a non-rooted node to a rooted node and pick the one with the highest counter value (which prefers newer edges to older edges), with the identifier of the parent and/or the child as tie-breakers. This can be done efficiently by putting the edges in a priority queue with this ordering strategy as the priority comparison. This is what is implemented in the demo below. If this step picks an edge for a given node that isn't the latest edge (i.e. isn't the one with the greatest counter value) because the latest edge didn't point to a rooted node, then it could be said that the behavior for the latest edge was "temporarily reverted" to prevent a cycle.

Note that this means "reverting" a change just means ignoring the effect of that change. It doesn't mean that the data in the edge map has been changed. We don't want to automatically change any data that will be synced to other peers as part of this step because this could cause never-ending network oscillations (peers constantly switching between states and never eventually settling into a consistent state).

A={C: 0, B: 1}
B={C: 0}
C={R: 0, A: 1}

A={C: 0, B: 1}
B={C: 0}
C={R: 0, ~~A: 1~~}

A={C: 0, ~~B: 1~~}
B={C: 0}
C={R: 0, ~~A: 1~~}

A={C: 0, ~~B: 1~~}
B={C: 0}
C={R: 0, ~~A: 1~~}

When reparenting a node, there is one more step: In addition to inserting an edge in the reparented node's map, all of the parents in the old and new paths to the root should also be checked. For any parent pointer that was the result of the reattachment step, the counter value for the edge for that parent pointer in the child's edge map should be updated to 1 more than the greatest counter value for that child (as if it had also just been reparented).

The reason for this is because without this step, reparenting a single node could counter-intuitively cause the parents for multiple nodes to change. This happens when the node being reparented had been reattached to the root because it was part of a cycle. In that case some of the parent pointers on the path to the root are determined by the reattachment algorithm, which reattaches nodes to the tree in a certain order. That order is based on the contents of the edge maps of the relevant non-rooted nodes which means the reattachment order can change if the edge maps change. By resetting the counters for the parent edges of all of these reattached nodes, we "lock in" the old and new paths to the root from the current peer's local perspective so that they behave as expected and don't shift around. If we didn't do this then the tree would still be a valid tree (as the algorithm guarantees this), but it would sometimes exhibit surprising behavior.

For example, consider the following tree in Step 1 with the nodes A B C D and root node R. If two peers simultaneously reparent A under B and B under A to create a cycle, the result might look like Step 2 after the reattachment pass runs. The effect of the B: 1 entry in the edge map for A was "reverted" such that A is parented under its original parent C instead of under B to avoid creating a cycle. After that, if we then try to reparent B under D by only adding the entry D: 2 to the map for B, the result will look like Step 3. This is because the tree is initially formed by setting each node's parent pointer to its edge with the greatest counter. This "reactivates" the B: 1 entry in the edge map and sets the parent of A back to B now that it no longer causes a cycle. Users will not expect this behavior because they will expect reparenting B under D in Step 2 to only move B. To fix this, we need to increment the counter value for C: 0 entry in the map for A to C: 2 as part of the last reparenting operation so that the parent of A does not change in Step 3 (fixed):

A={C: 0}
B={C: 0}
C={R: 0}
D={R: 0}

A={C: 0, ~~B: 1~~}
B={C: 0, A: 1}
C={R: 0}
D={R: 0}

A={C: 0, ~~B: 1~~}
B={C: 0, A: 1, D: 2}
C={R: 0}
D={R: 0}

A={C: 2, B: 1}
B={C: 0, A: 1, D: 2}
C={R: 0}
D={R: 0}

You can use any conflict-resolution strategy for syncing the entries of the per-node parent pointer maps to other peers, as long as editing two different entries for the same map on the same node are not considered a conflict. The demo below assigns a timestamp to each write of an entry in a map and uses last-writer-wins to resolve conflicts, with the identifier of the peer that originally did the write as a tie-breaker.

Below is a demo of what this looks like in practice. Each quadrant represents a peer, and peers send messages to each other with a simulated network delay. Drag a child node to another parent node to reparent it. Click Undo or Redo for a given peer to rewind or playback that peer's local edits. Temporarily disable the simulated network with the pause button to construct simultaneous editing scenarios. You can use your browser's "view source" feature to view the source code for this demo:

Note that this algorithm does not affect the order of children within a parent. This order may not matter for some applications. If this order is important, then this CRDT can be combined with another CRDT for ordered sequences such as fractional indexing or tree-based indexing. It's likely best to store the information for this second CRDT in the value for the edge map for this CRDT along with the counter so that the information about which parent the node has and where in that parent's children the node lives is always updated together atomically.

This technique has the following benefits and drawbacks:

Benefits:

Doesn't require expensive rollback and replay steps to integrate past changes, unlike some other approaches to this problem.
Although this data structure accumulates some data about the history of previous operations, repeatedly reparenting a node between two different parents overwrites entries in the edge map instead of continuing to add new entries, so the size of the data does not necessarily always grow with the number of operations.

Drawbacks:

Somewhat tricky to reason about and implement.