CRDT: Tree-Based Indexing

← Back to the algorithm list Published on November 14th, 2022

Collaborative peer-to-peer applications sometimes need to operate on sequences of objects with a consistent order across all peers. For example, a peer-to-peer rich text editing application might need to sync the order in which blocks of text appear in a document.

The algorithm presented here is one way to do this. It comes from a family of algorithms called CRDTs, which I will not describe here. Compared to fractional indexing, tree-based indexing is more complicated but prevents interleaving of concurrently-inserted runs, which makes it appropriate for textual data. The algorithm presented here is similar to a well-known one called "RGA" but with reordering layered on top.

The algorithm:

You can use any conflict-resolution strategy for syncing the parent pointers and sorting counters to other peers. The demo below assigns each write a timestamp and uses last-writer-wins to resolve conflicts, with the identifier of the peer that originally did the write as a tie-breaker.

Why set the parent pointer to the object before the insertion point instead of after the insertion point? Humans typically insert runs of objects one-after-the-other, and using the object before the insertion point ensures that two runs concurrently inserted in the same spot cannot become interleaved (since each run will form a separate chain). For example, if the original sequence is A B and one peer inserts x y z to get A x y z B while another peer inserts 1 2 3 to get A 1 2 3 B, this algorithm guarantees that the sequence will converge to either A x y z 1 2 3 B or A 1 2 3 x y z B but will never converge to an interleaving sequence such as A x 1 y 2 z 3 B.

This technique requires every object in the tree to live in its insertion spot forever (assuming that peers can disconnect for arbitrary lengths of time, which means garbage collection isn't possible). This means that deleted objects in the tree must be marked as deleted but still retained. It also means that reordering objects in the tree is not supported directly. A simple way of introducing object reordering is to store insertion positions using tree-based indexing, and then to separately store a reference to an insertion position (i.e. a node in the tree) with each reorderable object (which is not a node in the tree). Reordering an object then involves generating a new insertion position and updating that object's insertion position reference.

Below is a demo of what this looks like in practice. Each quadrant represents a peer, and peers send messages to each other with a simulated network delay. Click + to insert new objects, × to delete existing objects, and drag objects to reorder them. Click Undo or Redo for a given peer to rewind or playback that peer's local edits. Temporarily disable the simulated network with the pause button to construct simultaneous editing scenarios. You can use your browser's "view source" feature to view the source code for this demo:

This technique has the following benefits and drawbacks:

Benefits:

Drawbacks: