Collaborative Text Editing Without CRDTs or OT
46 comments
·May 21, 2025Animats
jahewson
This is literally a degenerate CRDT. Central server for tie-breaking goes back to Google Wave.
yubblegum
Is this really that novel? I mean using a central process for serializing a distributed system is like a no brainer -- didn't we start off from here originally? -- until you have to worry about network partitions, and CAP and all that jazz. You also now have a single point of failure. Also I skimmed the thing but was performance discussed?
bongodongobob
Yeah I have the same question. I'm not familiar with the problem space but this seems like my naive first idea so I'm wondering what the catch is.
sdeframond
What you describe is a CRDT, isn't it ?
ori_b
No; there is no single consistent final state that the system must converge to if the parts go offline. If you have this document:
a{uuid=1}
and two clients send the following operations: b{uuid=2} insert-after{uuid=1}
c{uuid=3} insert-after{uuid=1}
then the following two documents are both valid final states: abc
acb
That's fine as long as you have an authoritative server that observes all events in a single order and a way to unwind misordered local state, but it means that it's not a CRDT.YoumuChan
Like Raft is a "special case" of Paxos, this feels like a "special case" of CRDT.
It has all the flavor of CRDT, but adds a leader and a different way for the total ordering (basically using leader's local lamport clock to break tie).
Throw in leader reelection and some ledger syncing and then give everything some other names, I bet you can have "collaborative text editing on one page".
sdeframond
What about ordering concurrent operations by id? Then "abc" is the only consistent final state.
I get what you mean though, having a central authority greatly relaxes the requirement.
Yoric
If you have the additional semantics that says "operation with lowest uuid gets applied first", don't you essentially get a CRDT?
I mean, a uuid is kind of a poor man's Lamport clock, isn't it?
hem777
Cool to see a write up on this! Discovered the same method years ago and also wondered why it doesn’t show up in academic literature.
I implemented this in a decentralized context and as a CRDT though, so that the properties hold (commutative, idempotent and associative).
k__
If the idea is to have an alternative to CRDT, what did you gain from building it as CRDT?
hem777
It wasn’t really that we wanted to have a CRDT per se, but as it was implemented on top of an op-based append-only log CRDT, it turned out to hold those properties, which makes it a CRDT. We wanted to have the edits to be able to arrive in any order or after delays due to network partitions (this was for a p2p network).
Footkerchief
Surprised to see no discussion of other data structures like dicts/maps, or arrays of arbitrary type. Hopefully they'd be a straightforward extension. IME, apps need collaborative data structures more often than they need pure collaborative text editing.
The motivating examples (update validation, partial loading, higher-level operations) are interesting, but I don't see a strong case that the reason Yjs etc. lack these features is the underlying CRDT implementation, as opposed to these features being intrinsically difficult to build.
filleokus
> Surprised to see no discussion of other data structures like dicts/maps, or arrays of arbitrary type. Hopefully they'd be a straightforward extension. IME, apps need collaborative data structures more often than they need pure collaborative text editing.
Totally agree. I guess an array of "atomic" objects, where the properties of the objects can't be changed can be done just by replacing the string with your own type. Changes inside of the object is probably trickier, but maybe it's just about efficiently storing/traversing the tree?
I've also always thoguth it should be possible to create something where the consumer of the helper library (per OP terminology) can hook in their own lightweight "semantic model" logic, to prevent/manage invalid states. A todo item can't both have isDone: true and state: inProgress at the same time. Similar to rich text formatting semantics mentioned in the linked article.
math_dandy
Is the take-home message of the post that the full complexity of CRDTs/OT is necessary only in the absence of a central server?
mweidner
Even in the absence of a central server, you can still avoid CRDT/OT complexity if you have a decentralized way to eventually total order operations & apply them in that order: https://mattweidner.com/2025/05/21/text-without-crdts.html#d...
As others in the comments argue, this is technically a CRDT (though a fully general one); also, undoing/replaying ops is itself non-trivial to implement. However, I hope this is still simpler than using a traditional CRDT/OT for each data type.
math_dandy
Did you perhaps mean to write, “though not a fully general one”?
sampton
OT requires centralized server.
pshc
Use of server reconciliation makes me think client-side reconciliation would be tricky… how do you preserve smooth editor UX while applying server updates as they arrive?
For example, if your client-sent request to insert a character fails, do you just retry the request? What if an update arrived in the intervening time? (Edit: they acknowledge this case in the “Client-Side” section, the proposal is to rewind and replay, and a simpler proposal to block until the pending queue is flushed)
From a frontend vantage I feel like there may be a long tail of underspecified UI/UX edge cases, such that CRDT would be simpler overall. And how does the editor feel to use while riding the NYC subway where coverage is spotty?
straws
Both ProseMirror and the newer version of CodeMirror have a pretty elegant solution to this: each modification to the document is modeled as a step that keeps track of indices instead of node/text identities, and uses a data structure called a "position map" that the buffered steps can be mapped through to create steps with updated positions, which are then applied to your document.
In practice, it works quite well. Here's more info:
https://marijnhaverbeke.nl/blog/collaborative-editing.html https://marijnhaverbeke.nl/blog/collaborative-editing-cm.htm...
asdffdasy
so, an unoptimized crdt? just set max set size to 1 and yolo?
th0ma5
It's very appealing as a kind of irreducible complexity. It feels close to what is actually happening and is simple hahah if like you say not optimized.
hencq
I'm not an expert on this, but the main difference with a CRDT like Automerge seems to be the server reconciliation. See for example this article [1]. Automerge handles concurrent insertions by using a sequence number and relying on an agreed ordering of agent ids when insertions are concurrent, while this scheme relies on the server to handle them in the order they come in.
The article mentions this:
> This contrasts with text-editing CRDTs, in which IDs are ordered for you by a fancy algorithm. That ordering algorithm is what differs between the numerous text-editing CRDTs, and it’s the complicated part of any CRDT paper; we get to avoid it entirely.
I can buy the idea that many apps have a central server anyway, so you can avoid the "fancy algorithm", though the server reconciliation requires undoing and replaying of local edits, so it's not 100% clear to me if that's much simpler.
hem777
Agreed, the undoing and replying isn’t exactly non-fancy. Peristent B+Tree is also not very non-fancy.
mweidner
I believe that Automerge internally stores all operations in an eventually consistent total order, which you can use as a substitute for the server in server reconciliation (cf. https://mattweidner.com/2025/05/21/text-without-crdts.html#d...). However, Automerge does not actually do that - instead it processes text operations using a traditional CRDT, RGA, probably because (as you point out) undoing and replaying ops is not trivial to implement.
null
coolcase
Is collaborative text editing with offline sync a nerd snipe [0]? I work for a big tech and write a lot and usually worse case someone else edits at the same time and the server can figure it out. yes it needs some kind of algo but most concurrent edits are on different parts of a huge doc.
Compare this to Git workflows. Git already handles merging most changes seamlessly.
null
apt-apt-apt-apt
Does this finally solve collaborative text editing and its friends?
Such an awesome approach.
criddell
Could it? If you and I are simultaneously editing a list of people and you are trying to order them by age and I'm ordering them alphabetically, we are still going to have to reconcile the result.
shermantanktop
By that definition, it's not a solvable problem, is it?
uh2010
wonder if there would be a perf gain with UUIDv7
pshc
They do discuss optimization of the IDs at the bottom of the article.
poly2it
Why not just use an ever incrementing u64?
kortex
Then you need central coordination, either a single central server containing the counter, or something like Snowflake where you have multiple counters, each assigned orthogonal blocks ahead of time (that need to coordinate with a central server).
UUIDs/ULIDs/etc are fully distributed, you can have two clients assign an ID without coordinating with ~0% of collision.
poly2it
You could also split the u64 to have the first 24 bits be unique to the client, and the last 40 be unique to the produced character. This would still allow 1 TiB of data produced per user and session. The single mutex would be the user ID counter.
zamadatix
An incrementing u64 requires either atomic increments between concurrent clients or recalculation logic to consistently find the newly incremented ID after conflicting information syncs. UUIDs just spit out a unique ID without any complexity or associations with other clients.
ChadNauseam
There's closely related idea that might work, though. Each device editing text could be assigned a 32-bit ID by the server (perhaps auto-incrementing). Devices then maintain a separate 32-bit ID that they increment for each operation they perform. The ID used for each character is (device_id, edit_id), which should fit nicely in 8 bytes.
null
That is very neat. The algorithm:
- Label each text character with a globally unique ID (e.g., a UUID), so that we can refer to it in a consistent way across time - instead of using an array index that changes constantly.
- Clients send the server “insert after” operations that reference an existing ID. The server looks up the target ID and inserts the new characters immediately after it.
- Deletion hides a character for display purposes, but it is still kept for "insert after" position purposes.
This might have potential outside text editing. Game world synchronization, maybe.