For plain text, this is not so hard. The data model for plain text (e.g. a string) and the set of mutations on that model are both pretty small. Describing where a cursor is and what is selected is likewise fairly straightforward. Co-editing between plain text editors is completely doable, IMHO.
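To give a feel for how small that model is, here is a rough sketch in TypeScript. The names (`TextOp`, `Selection`, `applyOp`, `shiftSelection`) are made up for illustration and are not taken from any particular editor, and it assumes the only mutations are insert and delete at a character offset.

```typescript
// Hypothetical plain-text model: the document is a string, and the only
// mutations are "insert text at an offset" and "delete a run of characters".
type TextOp =
  | { kind: "insert"; at: number; text: string }
  | { kind: "delete"; at: number; length: number };

// A selection is two offsets; a bare cursor is a selection where both are equal.
interface Selection {
  anchor: number;
  head: number;
}

// Apply one mutation to the document and return the new document.
function applyOp(doc: string, op: TextOp): string {
  if (op.kind === "insert") {
    return doc.slice(0, op.at) + op.text + doc.slice(op.at);
  }
  return doc.slice(0, op.at) + doc.slice(op.at + op.length);
}

// Move a single offset so it still points at the same place after `op`.
// Assumption: an insert exactly at the offset pushes the offset to the right.
function shiftOffset(offset: number, op: TextOp): number {
  if (op.kind === "insert") {
    return offset >= op.at ? offset + op.text.length : offset;
  }
  if (offset <= op.at) return offset;
  // Offsets inside the deleted run collapse to the start of the run.
  return Math.max(op.at, offset - op.length);
}

// Keep a remote user's selection valid after someone else's edit.
function shiftSelection(sel: Selection, op: TextOp): Selection {
  return { anchor: shiftOffset(sel.anchor, op), head: shiftOffset(sel.head, op) };
}

const insertOp: TextOp = { kind: "insert", at: 5, text: ", dear" };
const edited = applyOp("hello world", insertOp); // "hello, dear world"
const moved = shiftSelection({ anchor: 6, head: 11 }, insertOp); // { anchor: 12, head: 17 }
```

This leaves out conflict resolution between concurrent edits entirely; the point is only that two plain text editors can agree on this vocabulary without much negotiation.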

It's much harder for rich text editors (RTEs) because the various RTEs vary widely in the exact subset of rich text features they support. One will support tables, and another will not. One will support video, and another will not. One will use a linear position to describe where the cursor is, and another will use a DOM Range. This makes it very hard to support co-editing between different rich text editors. It's not impossible, just hard enough that most of us are not tackling it.
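As a rough illustration of the cursor mismatch alone, here is a sketch that maps a tree-style position (a path to a text node plus an offset, loosely in the spirit of a DOM Range boundary) to a linear offset. The `RichNode` and `PathPosition` types and the `toLinear` function are hypothetical, and the tree is a toy stand-in rather than the real DOM or any editor's actual model.

```typescript
// Hypothetical toy document tree: either a text leaf or an element with children.
type RichNode = { text: string } | { tag: string; children: RichNode[] };

// Tree-style position: child indexes leading to a text node, plus an offset in it.
interface PathPosition {
  path: number[];
  offset: number;
}

// Number of text characters under a node.
function textLength(node: RichNode): number {
  if ("text" in node) return node.text.length;
  return node.children.reduce((sum, child) => sum + textLength(child), 0);
}

// Convert a tree-style position into a linear offset that counts text only.
function toLinear(root: RichNode, pos: PathPosition): number {
  let node: RichNode = root;
  let linear = 0;
  for (const index of pos.path) {
    if ("text" in node) throw new Error("path descends into a text node");
    // Everything in earlier siblings comes before the target position.
    for (let i = 0; i < index; i++) linear += textLength(node.children[i]);
    const child = node.children[index];
    if (!child) throw new Error("path index out of range");
    node = child;
  }
  if (!("text" in node)) throw new Error("path must end at a text node");
  return linear + pos.offset;
}

const richDoc: RichNode = {
  tag: "doc",
  children: [
    { tag: "p", children: [{ text: "Hello " }, { tag: "b", children: [{ text: "world" }] }] },
    { tag: "p", children: [{ text: "Bye" }] },
  ],
};

const inBold = toLinear(richDoc, { path: [0, 1, 0], offset: 2 }); // 8
const inSecondPara = toLinear(richDoc, { path: [1, 0], offset: 1 }); // 12
```

Even this toy version has to pick a convention: it counts only text characters, so an editor whose linear positions also count paragraph boundaries or embedded objects would get different numbers for the same spot. Multiply that by tables, video, and every other feature, and you can see where the difficulty comes from.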