CRDT: Tree-Based Indexing

Published on November 14th, 2022

Collaborative peer-to-peer applications sometimes need to operate on sequences of objects with a consistent order across all peers. For example, a peer-to-peer rich text editing application might need to sync the order in which blocks of text appear in a document.

The algorithm presented here is one way to do this. It comes from a family of algorithms called CRDTs, which I will not describe here. Compared to fractional indexing, tree-based indexing is more complicated but prevents interleaving of concurrently-inserted runs, which makes it appropriate for textual data. The algorithm presented here is similar to a well-known one called "RGA" but with reordering layered on top.

The algorithm:

Each object is given a parent pointer which must point to an object that existed at the time the object was created.
To insert an object, set the parent pointer to the object immediately before the insertion point (or to null when inserting at the beginning). The object order is determined by a pre-order tree traversal that places parents before their children.
To order children within a given parent, have each child store the number of children that the parent had at the original insertion time, then sort the children by these counts in descending order (using the object identifier as a tie-breaker). This way newer objects are sorted before older ones.

You can use any conflict-resolution strategy for syncing the parent pointers and sorting counters to other peers. The demo below assigns each write a timestamp and uses last-writer-wins to resolve conflicts, with the identifier of the peer that originally did the write as a tie-breaker.

Why set the parent pointer to the object before the insertion point instead of after the insertion point? Humans typically insert runs of objects one-after-the-other, and using the object before the insertion point ensures that two runs concurrently inserted in the same spot cannot become interleaved (since each run will form a separate chain). For example, if the original sequence is A B and one peer inserts x y z to get A x y z B while another peer inserts 1 2 3 to get A 1 2 3 B, this algorithm guarantees that the sequence will converge to either A x y z 1 2 3 B or A 1 2 3 x y z B but will never converge to an interleaving sequence such as A x 1 y 2 z 3 B.

This technique requires every object in the tree to live in its insertion spot forever (assuming that peers can disconnect for arbitrary lengths of time, which means garbage collection isn't possible). This means that deleted objects in the tree must be marked as deleted but still retained. It also means that reordering objects in the tree is not supported directly. A simple way of introducing object reordering is to store insertion positions using tree-based indexing, and then to separately store a reference to an insertion position (i.e. a node in the tree) with each reorderable object (which is not a node in the tree). Reordering an object then involves generating a new insertion position and updating that object's insertion position reference.

Below is a demo of what this looks like in practice. Each quadrant represents a peer, and peers send messages to each other with a simulated network delay. Click + to insert new objects, × to delete existing objects, and drag objects to reorder them. Click Undo or Redo for a given peer to rewind or playback that peer's local edits. Temporarily disable the simulated network with the pause button to construct simultaneous editing scenarios. You can use your browser's "view source" feature to view the source code for this demo:

This technique has the following benefits and drawbacks:

Benefits:

It's easy to understand and implement.
If two peers both simultaneously insert a run of objects one-after-the-other at the same location, the resulting objects are not interleaved. So this algorithm is appropriate in situations where object adjacency is critical (e.g. using this to order paragraphs of text in a text document would be appropriate).

Drawbacks:

Every reordering operation adds more data to the database that persists forever, so this isn't ideal for situations with frequent reordering.
Interleaving is not prevented for concurrent runs that are inserted in reverse-order. This doesn't come up in typical text-editing scenarios but is something to keep in mind. For example, if a peer inserts the run A B C by inserting A, then B after A, then C after B, no interleaving is possible. But if a peer inserts the run A B C by inserting C, then B before C, then A before B, interleaving with another peer's concurrent changes could occur.
Using this algorithm with large numbers of objects (e.g. for every character in a text document) is inefficient. Doing this efficiently requires a specialized approach that represents a concurrently-inserted run as a single block of memory with implicit parent pointers between adjacent elements. These blocks may need to be split when a node gains multiple children, which adds complexity. You would also likely drop support for reordering as characters don't typically need to preserve their identity when they are reordered. A more advanced algorithm that does all of this is presented in CRDT: Text Buffer.