Published on May 19th, 2024
Collaboratively editing strings of text is a common desire in peer-to-peer applications. For example, a note-taking app might represent each document as a single collaboratively-edited string of text.
The algorithm presented here is one way to do this. It comes from a family of algorithms called CRDTs, which I will not describe here. It's similar to the approaches taken by popular collaborative text editing libraries such as Yjs and Automerge. Other articles have already been written about these similar approaches (see the references section below), but this article also has a nice interactive visualization of what goes on under the hood.
The algorithm:
- Each character is assigned a unique identifier consisting of `site` (the identifier of the creator) and `clock` (a site-specific integer that is incremented after every operation), as well as a (possibly null) `parent` pointer to a previous character.
- To insert a character, set its `parent` pointer to the character immediately before the insertion point at the time of insertion (or to null when inserting at the beginning). The character order is determined by a pre-order tree traversal that places parents before their children. This is tree-based indexing.
- To order characters with the same parent, have each character also store a `counter` value and sort first by `counter` (use descending order), then by the `site` of the character (use arbitrary but consistent order). When inserting before a character with the same parent, use its counter incremented by 1 to be ordered before it. This order is unique because doing this means the same site will never reuse the same counter in the same spot.
- To delete a character, put that character's identifier in a deleted set. Note that this means deleted characters persist forever, which is known as a "tombstone" in CRDT literature. Deleted character values can be forgotten, but the positions must be remembered to be able to correctly order incoming changes that use one of the now-deleted characters as a parent.
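To make the rules above concrete, here is a minimal TypeScript sketch of the data model for local editing only. The names (`Char`, `Doc`, `insertAt`, `deleteAt`) are made up for illustration and are not taken from the demo's source code; remote integration and all of the optimizations described below are deliberately left out.

```typescript
// A minimal sketch of the data model described above (illustrative names,
// local editing only; no network integration, block merging, or splitting).

type Id = { site: string; clock: number };

// Every character ever inserted keeps its identifier, its parent, and a
// sibling counter. Deleted characters stay around as tombstones.
interface Char {
  id: Id;
  parent: Id | null; // character immediately before the insertion point, or null
  counter: number;   // orders characters that share the same parent
  value: string;
}

const idKey = (id: Id): string => `${id.site}:${id.clock}`;

const sameId = (a: Id | null, b: Id | null): boolean =>
  a === null || b === null ? a === b : a.site === b.site && a.clock === b.clock;

// Siblings sort by counter descending, then by site (any consistent order works).
function compareSiblings(a: Char, b: Char): number {
  if (a.counter !== b.counter) return b.counter - a.counter;
  return a.id.site < b.id.site ? -1 : a.id.site > b.id.site ? 1 : 0;
}

class Doc {
  private chars: Char[] = [];          // all characters, tombstones included
  private deleted = new Set<string>(); // identifiers of deleted characters
  private clock = 0;                   // advanced on every insert in this sketch
  constructor(private site: string) {}

  // Pre-order traversal: each parent is followed by its children in sibling order.
  private orderedChars(): Char[] {
    const children = new Map<string, Char[]>();
    for (const c of this.chars) {
      const key = c.parent ? idKey(c.parent) : "root";
      if (!children.has(key)) children.set(key, []);
      children.get(key)!.push(c);
    }
    const out: Char[] = [];
    const visit = (key: string): void => {
      for (const child of (children.get(key) ?? []).sort(compareSiblings)) {
        out.push(child);
        visit(idKey(child.id));
      }
    };
    visit("root");
    return out;
  }

  private visibleChars(): Char[] {
    return this.orderedChars().filter(c => !this.deleted.has(idKey(c.id)));
  }

  // Insert a single character at visible position `index`.
  insertAt(index: number, value: string): void {
    const visible = this.visibleChars();
    const parent = index > 0 ? visible[index - 1].id : null;
    const next = index < visible.length ? visible[index] : null;
    // If the character currently at `index` has the same parent, take its
    // counter + 1 so that the new character sorts before it.
    const counter = next && sameId(next.parent, parent) ? next.counter + 1 : 0;
    this.chars.push({ id: { site: this.site, clock: this.clock++ }, parent, counter, value });
  }

  // Delete the character at visible position `index` by recording a tombstone.
  deleteAt(index: number): void {
    this.deleted.add(idKey(this.visibleChars()[index].id));
  }

  text(): string {
    return this.visibleChars().map(c => c.value).join("");
  }
}

// Example: type "hi", insert "e" before the "i", then delete the "h".
const doc = new Doc("siteA");
doc.insertAt(0, "h");
doc.insertAt(1, "i");
doc.insertAt(1, "e");
doc.deleteAt(0);
console.log(doc.text()); // "ei"
```

Note that this sketch rebuilds the tree and re-sorts siblings on every call, which is far slower than a real implementation needs to be.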
Keeping tombstones around forever is not as expensive as it sounds because of three important optimizations:
- Successive inserts from the same site can all be merged into a single block in memory, so for example pasting a large chunk of text uses the same amount of metadata as inserting a single character. This works because character identifiers are carefully designed so that they can be left implicit in a contiguous run of text (each character will have the same `site` and `counter` and have a `clock` that's 1 more than the previous character's `clock`).
- These blocks of memory can be stored contiguously in a single array that's pre-sorted in document order. Inserting a new block just involves inserting into that array at the correct position. This avoids needing to store the tree data structure explicitly (e.g. with arrays of children and/or sibling pointers) and also takes advantage of CPU optimizations for reading memory sequentially.
- The delete set can be represented more efficiently using a range-based representation. A series of deletes with the same `site` and with consecutive `clock` values can be represented with a single range. Note that this range is contiguous in identifier-space but not necessarily in document-space.
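As a rough illustration of the first optimization, here is a hedged sketch of how a new local insert might be appended to an existing block when it continues the same run. `Block`, `lastId`, and `tryAppend` are hypothetical names, not the demo's actual API.

```typescript
// Hypothetical sketch of run merging (illustrative names, not the demo's API).

type Id = { site: string; clock: number };

// A block stores the metadata of its first character once; identifiers of the
// remaining characters are implicit: character i in the run has clock id.clock + i.
interface Block {
  id: Id;            // identifier of the first character in the run
  parent: Id | null; // parent of the first character
  counter: number;   // sibling counter of the first character
  text: string;      // the run of character values
}

// Identifier of the last character in a block, reconstructed on demand.
const lastId = (b: Block): Id => ({ site: b.id.site, clock: b.id.clock + b.text.length - 1 });

// A new local insert extends an existing block when it comes from the same
// site, continues the clock sequence, and is parented on the block's last
// character (i.e. it was typed immediately after the end of the run).
function tryAppend(block: Block, id: Id, parent: Id | null, value: string): boolean {
  const tail = lastId(block);
  if (
    id.site === block.id.site &&
    id.clock === tail.clock + 1 &&
    parent !== null &&
    parent.site === tail.site &&
    parent.clock === tail.clock
  ) {
    block.text += value;
    return true;
  }
  return false;
}
```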
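The range-based delete set from the third optimization could look something like the following, again with made-up names (`DeleteSet`, `ClockRange`): each site maps to a sorted list of inclusive clock ranges that are merged whenever they touch.

```typescript
// Hypothetical range-based delete set (illustrative names).

type ClockRange = { start: number; end: number }; // inclusive clock bounds

class DeleteSet {
  // site -> sorted, non-overlapping, non-touching ranges of deleted clocks
  private ranges = new Map<string, ClockRange[]>();

  add(site: string, clock: number): void {
    const list = this.ranges.get(site) ?? [];
    // Find the first range that could absorb this clock (linear scan here;
    // a real implementation would binary search).
    let i = 0;
    while (i < list.length && list[i].end + 1 < clock) i++;
    if (i < list.length && clock >= list[i].start - 1) {
      // The clock is inside or directly adjacent to an existing range: extend it.
      list[i].start = Math.min(list[i].start, clock);
      list[i].end = Math.max(list[i].end, clock);
      // If the extended range now touches the next one, merge them.
      if (i + 1 < list.length && list[i].end + 1 >= list[i + 1].start) {
        list[i].end = Math.max(list[i].end, list[i + 1].end);
        list.splice(i + 1, 1);
      }
    } else {
      list.splice(i, 0, { start: clock, end: clock });
    }
    this.ranges.set(site, list);
  }

  has(site: string, clock: number): boolean {
    return (this.ranges.get(site) ?? []).some(r => r.start <= clock && clock <= r.end);
  }
}

// Example: deleting clocks 3, 5, then 4 collapses into the single range [3, 5].
const deletes = new DeleteSet();
deletes.add("siteA", 3);
deletes.add("siteA", 5);
deletes.add("siteA", 4);
console.log(deletes.has("siteA", 4)); // true
```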
Below is a demo of what this looks like in practice. Each quadrant represents a peer, and peers send messages to each other with a simulated network delay. Click on a peer's text to edit it. Temporarily disable the simulated network with the pause button to construct simultaneous editing scenarios. You can use your browser's "view source" feature to view the source code for this demo.
This technique has the following benefits and drawbacks:
Benefits:
- Memory usage: The metadata overhead needed to track collaborative edits is reasonable and is also likely to compress well as it contains runs of similar values.
- Performance: It's possible to implement most of the queries and/or updates to the data structure in O(log n) time using either binary trees or arrays with binary search (sketched below).
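As a small illustration of the "arrays with binary search" option, this hypothetical snippet finds the block containing a given visible document position by binary-searching a prefix-sum array of block lengths. A flat prefix array is O(n) to rebuild after an edit, so a real implementation would keep these sums in a balanced tree or similar structure so that updates stay logarithmic as well.

```typescript
// Hypothetical sketch: find the block containing a visible document position
// by binary-searching cumulative block lengths. Names are illustrative.

// prefix[i] = total visible length of blocks 0..i-1 (so prefix[0] = 0).
function buildPrefix(blockLengths: number[]): number[] {
  const prefix = [0];
  for (const len of blockLengths) prefix.push(prefix[prefix.length - 1] + len);
  return prefix;
}

// Returns the index of the block containing position `pos` and the offset
// within that block. Assumes 0 <= pos < total visible length.
function findBlock(prefix: number[], pos: number): { block: number; offset: number } {
  let lo = 0;
  let hi = prefix.length - 2; // index of the last block
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (prefix[mid + 1] <= pos) lo = mid + 1;
    else hi = mid;
  }
  return { block: lo, offset: pos - prefix[lo] };
}

// Example: blocks of visible lengths 4, 1, 6 -> position 5 is offset 0 of block 2.
console.log(findBlock(buildPrefix([4, 1, 6]), 5)); // { block: 2, offset: 0 }
```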
Drawbacks:
- Complexity: The logic for splitting and merging blocks is complicated and hard to implement correctly. Fuzz testing is likely essential for a correct implementation.
- Grow-only: Deleting data does not reduce the size of the metadata. Addressing this properly is quite challenging as it involves coordinating between peers to ensure data is only removed when it's no longer needed by any peer in the system. It can also break down when a peer goes offline and never comes back, since the system supports peers being offline for arbitrary lengths of time. This algorithm does not attempt to address that problem.
References:
Here are some excellent resources on CRDT text buffers that I found helpful when implementing this algorithm:
- https://josephg.com/blog/crdts-go-brrr/
  Talks about how to implement text-based CRDTs efficiently. Covers some of the optimizations used here, including storing runs of text together and storing insertions in a flat list.
- http://archagon.net/blog/2018/03/24/data-laced-with-history/
  Goes over tree-based indexing (which it calls "causal trees") in a lot of depth. Discusses a possible approach to distributed garbage collection as well as the complexities that come with it. Has lots of additional references at the bottom.
- https://www.bartoszsypytkowski.com/yrs-architecture/
  Goes over the internals of Yjs in more detail, which is relevant because what I implemented is similar to what Yjs does. Some high-level differences are that a) Yjs doesn't store delete sets explicitly and instead re-sends all of them each time syncing starts, and b) Yjs gives each insert an additional rightward pointer for resolving ordering between children with the same parent.
- https://www.inkandswitch.com/peritext/
  "Rich text" involves attaching formatting attributes (e.g. bold or italic) to individual characters. Doing this in a collaborative environment requires a careful algorithm to preserve the original human intention after an automatic merge. This article presents a way to do that.