Design a Real-Time Collaborative Editor like Google Docs

Hard | 22 min | Backend System Design
Key Topics
collaborative editing, google docs, operational transformation, crdt, websocket, distributed systems

How to Design a Real-Time Collaborative Editor like Google Docs

If there's one system design question that separates candidates who genuinely understand distributed systems from those who've just memorised patterns, it's this one.

You type a character in a shared document. Simultaneously, your colleague halfway across the world deletes the paragraph you're typing in. A moment later, you both look at your screens — and the document makes sense. Nobody lost work. What just happened?

That seamless experience hides one of the hardest engineering problems in software: concurrent state reconciliation at global scale. Understanding how it works is what this question is really testing.

This question appears consistently at Google, Meta, Amazon, Microsoft, Dropbox, and Notion — asked precisely because it forces you to reason through a genuinely hard distributed systems problem with no clean, easy answer.


Step 1: Clarify the Scope

Interviewer: Design a real-time collaborative document editor like Google Docs.

Candidate: A few clarifying questions. Are we designing for text-only or rich text with formatting? How many concurrent users per document? Is offline editing a hard requirement? Do we need version history with revert? Should I include access control — view-only, comment, edit permissions? And are we designing for a single region or globally distributed users?

Interviewer: Rich text with basic formatting — bold, italic, headings, lists. Up to 100 concurrent editors per document. Offline editing is in scope. Version history with revert is required. Include access control. Design for global users — latency matters.

Candidate: The most technically challenging part of this design is conflict resolution: what happens when two users edit the same position simultaneously. Everything else is meaningful but more conventional. Let me start with requirements and numbers, then build up from there.


Requirements

Functional

  • Create, read, update, and delete documents
  • Multiple users can edit simultaneously — changes visible in near real-time
  • Rich text: bold, italic, headings, lists, hyperlinks
  • Live cursors: see where other collaborators are positioned
  • Version history: view past states, diff between versions, revert to any point
  • Offline editing: edit without internet; auto-sync on reconnect
  • Access control: owner, editor, commenter, and viewer roles per document
  • Document sharing via link or invitation

Non-Functional

  • Low latency — edits appear to collaborators within 100–200ms
  • Eventual consistency — all users must converge to the same document state
  • Durability — no committed edit should ever be lost
  • Scalability — millions of documents, thousands of concurrent sessions
  • Availability — tolerate individual server failures gracefully

Back-of-the-Envelope Estimates

Interviewer: Give me some rough numbers to anchor the design.

Candidate:

plaintext
Total documents:          2 billion (Google's reported scale)
Daily Active Users:       500 million
Concurrent sessions (peak): ~5 million
Users per active session: 2–5 on average, up to 100
 
Operations per user:      ~60 keystrokes/minute ≈ 1 op/sec
Total ops/sec at peak:    5M sessions × 3 users × 1 op/sec ≈ 15M ops/sec
 
Average document size:    ~50 KB (text content)
Total document storage:   2B × 50 KB = ~100 TB text storage
Version deltas:           ~100 TB additional
 
WebSocket connections:    ~15 million peak
  At 20,000 conns/server: ~750 WebSocket servers needed

Two things stand out. First, 15M ops/sec sounds large, but most operations are confined to a single document's collaboration server — they don't need global distribution. Second, the real architectural challenge is the 15 million WebSocket connections and routing messages between users on different servers.


High-Level Architecture

plaintext
                    ┌──────────────────────────────┐
                    │           Clients             │
                    │  (Browser / Mobile / Desktop) │
                    └──────────┬───────────────────┘
                               │ WebSocket (edits, presence)
                               │ HTTPS (doc load, save, auth)
                    ┌──────────▼───────────────────┐
                    │        API Gateway            │
                    │   (auth, routing, TLS)        │
                    └──────────┬───────────────────┘

     ┌─────────────────────────┼──────────────────────────┐
     │                         │                          │
┌────▼────────────┐  ┌─────────▼──────────┐  ┌───────────▼──────┐
│  Collaboration  │  │  Document Service  │  │ Presence Service │
│  Service        │  │  (REST CRUD)       │  │ (cursors, status)│
│  (WebSocket +   │  │                    │  │                  │
│   OT engine)    │  │                    │  │                  │
└────┬────────────┘  └─────────┬──────────┘  └───────────┬──────┘
     │                         │                          │
     └─────────────────────────┼──────────────────────────┘

         ┌─────────────────────┼────────────────────┐
         │                     │                    │
┌────────▼──────────┐  ┌───────▼──────────┐  ┌──────▼──────────┐
│  Operation Log    │  │  Document Store  │  │  Redis          │
│  (Kafka/Kinesis)  │  │  (Bigtable /     │  │  (sessions,     │
│                   │  │   PostgreSQL)    │  │   presence,     │
│                   │  │                  │  │   doc state)    │
└───────────────────┘  └──────────────────┘  └─────────────────┘

The Collaboration Service is the heart of the system. It's the only component where the genuinely hard problems live — maintaining WebSocket connections per active user, running the conflict resolution algorithm, and ensuring all editors converge to the same document state.


The Core Problem: Conflict Resolution

This is what the entire interview hinges on. Handle this section well, and the rest of the design becomes a chance to show depth rather than a make-or-break moment.

Interviewer: Two users edit the same document simultaneously. User A inserts "beautiful " at position 4. User B deletes 3 characters starting at position 4. Both work from the same initial state. What happens when their operations reach the server?

Candidate: This is the fundamental conflict in collaborative editing. Let me show why it's hard first.

Why naive approaches fail:

plaintext
Initial document: "The World"
  T=0, h=1, e=2, ' '=3, W=4, o=5, r=6, l=7, d=8
 
User A's intent: Insert "beautiful " at position 4
  → "The beautiful World"
 
User B's intent: Delete 3 chars at position 4
  → "The ld"  (removes "Wor")
 
Server applies in arrival order:
  Apply A: "The beautiful World"
  Apply B as-is (delete pos 4, count 3): removes "bea"
  Result: "The utiful World"  ← wrong. Nobody wanted this.

Neither user's intent was honoured. This is what happens without conflict resolution.
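
The failure is easy to reproduce. A minimal Python sketch, assuming plain string operations and hypothetical `apply_insert`/`apply_delete` helpers:

```python
# Naive server: apply operations in arrival order, with no transformation.
def apply_insert(doc, pos, text):
    return doc[:pos] + text + doc[pos:]

def apply_delete(doc, pos, length):
    return doc[:pos] + doc[pos + length:]

doc = "The World"

# Each user's intent, applied to the shared base state:
assert apply_insert(doc, 4, "beautiful ") == "The beautiful World"  # A
assert apply_delete(doc, 4, 3) == "The ld"                          # B

# What the naive server produces when A arrives first and B is applied as-is:
after_a = apply_insert(doc, 4, "beautiful ")
after_b = apply_delete(after_a, 4, 3)       # deletes "bea", not "Wor"
assert after_b == "The utiful World"        # neither user's intent
```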


Option 1: Operational Transformation (OT)

What it is: OT is the approach Google Docs uses.

When two operations conflict, the server transforms one relative to the other — adjusting positions and effects so that both users' intent is preserved.

How it works:

plaintext
Server receives:
  Op A: Insert("beautiful ", position=4)
  Op B: Delete(position=4, length=3)
 
OT transforms B relative to A:
  A inserts 10 chars before position 4
  Therefore: B's position shifts by +10
  Transformed B: Delete(position=14, length=3)
 
Apply A: "The beautiful World"
Apply transformed B: "The beautiful ld"
  (deletes "Wor" at the now-correct position 14)

Both users converge to "The beautiful ld." A's insertion is preserved. B's deletion is preserved. Both intents honoured.
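
The transform step above can be sketched in a few lines of Python. This is a simplified illustration covering only delete-against-insert; a real OT engine needs a transform function for every pair of operation types, and the case where an insert lands strictly inside the deleted range is omitted here:

```python
def transform_delete_against_insert(del_pos, del_len, ins_pos, ins_len):
    """Re-base a delete so it removes the same characters after a
    concurrent insert has already been applied."""
    if ins_pos <= del_pos:
        # Insert landed at or before the delete target: shift right.
        return del_pos + ins_len, del_len
    # Insert landed after the delete target: nothing to adjust.
    # (An insert strictly inside the deleted range, which would split
    #  the delete in two, is omitted for brevity.)
    return del_pos, del_len

new_pos, new_len = transform_delete_against_insert(4, 3, 4, len("beautiful "))
assert (new_pos, new_len) == (14, 3)

doc = "The beautiful World"                 # state after applying A's insert
merged = doc[:new_pos] + doc[new_pos + new_len:]
assert merged == "The beautiful ld"         # "Wor" removed, both intents kept
```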

The key requirement: OT needs a central server as the arbiter of operation ordering.

Every client sends its operation to the server. The server transforms all concurrent operations and broadcasts the results. Every client also transforms incoming server operations relative to its own locally-applied-but-unacknowledged operations.

This creates a two-level transformation:

  • Server-side — transforms operations relative to each other
  • Client-side — transforms incoming ops relative to locally-pending ops

Client state (simplified):

plaintext
State: (document, sent_revision, pending_ops)
 
On user edit:
  Apply op locally (optimistically)
  Add to pending_ops
  Send to server with current revision number
 
On receiving server op:
  Transform server_op against all pending_ops
  Apply transformed op locally
  Update sent_revision
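
The loop above can be sketched as a small class. The op shapes, message format, and the pairwise `transform` callback are illustrative assumptions, not a real protocol:

```python
class CollabClient:
    """Sketch of the optimistic client loop in an OT system."""

    def __init__(self, transform):
        self.revision = 0        # last server revision we have seen
        self.pending_ops = []    # applied locally, not yet acknowledged
        self.transform = transform

    def local_edit(self, op):
        # Apply optimistically (caller updates its own document copy),
        # buffer the op, and tag the outgoing message with our revision.
        self.pending_ops.append(op)
        return {"op": op, "revision": self.revision}

    def on_server_op(self, server_op, new_revision):
        # Re-base the incoming op over everything we applied locally
        # that the server has not yet acknowledged.
        for pending in self.pending_ops:
            server_op = self.transform(server_op, pending)
        self.revision = new_revision
        return server_op         # now safe to apply to the local document

    def on_ack(self, new_revision):
        self.pending_ops.pop(0)  # oldest pending op is now committed
        self.revision = new_revision
```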

Why Google chose OT:

The server must see every operation anyway — for access control, rendering, and storage. Adding a transformation step costs under 5ms of extra latency, which is well within the 100–200ms budget. In return, documents stay compact with no extra metadata per character.

The trade-off: OT transformation functions are notoriously complex.

For plain text insert/delete, they're manageable. For rich text with formatting, tables, and comments, the number of operation types and transformation rules multiplies rapidly. OT buys you strong intent preservation at the cost of implementation complexity.


Option 2: CRDTs (Conflict-free Replicated Data Types)

What they are: CRDTs are data structures mathematically designed to merge from multiple sources without conflicts.

Instead of transforming operations, every character gets a globally unique ID with an embedded logical clock. When concurrent inserts happen at the same position, the unique IDs provide a deterministic tiebreaker — no central coordinator required.

Example:

plaintext
User A inserts 'X' with ID (timestamp=100, userId=alice) at position 5
User B inserts 'Y' with ID (timestamp=100, userId=bob)  at position 5
 
Merge rule: sort by (timestamp, userId) deterministically
Result: 'X' (alice) before 'Y' (bob) — always, regardless of arrival order
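
A toy version of this merge rule in Python (the `Char` structure is an illustrative stand-in for a real sequence CRDT element):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Char:
    """One element of a toy sequence CRDT: a globally unique
    (timestamp, user_id) identifier plus the character itself."""
    timestamp: int
    user_id: str
    value: str

def merge_concurrent(chars):
    # Concurrent inserts at the same position are ordered by their
    # (timestamp, user_id) pair: deterministic on every replica,
    # regardless of network arrival order.
    ordered = sorted(chars, key=lambda c: (c.timestamp, c.user_id))
    return "".join(c.value for c in ordered)

a = Char(100, "alice", "X")
b = Char(100, "bob", "Y")
assert merge_concurrent([b, a]) == "XY"   # same result either way
assert merge_concurrent([a, b]) == "XY"
```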

Who uses CRDTs: Figma's multiplayer sync is CRDT-inspired (not a textbook CRDT, since it still relies on a central server). Notion uses a CRDT-influenced approach. Linear, Liveblocks, and many newer tools are CRDT-based.

The storage trade-off: a basic string CRDT adds 16–32 bytes of metadata per character.

A 10,000-character document grows from 10 KB to 320 KB. Figma mitigated this with aggressive garbage collection — removing tombstones (deleted character markers) older than 24 hours — accepting that very late-joining clients might see brief inconsistencies until they resync.

OT vs CRDT — side by side:

                            Operational Transformation   CRDT
Central server required     Yes                          No
Offline-first support       Harder                       Natural
Document storage overhead   Minimal                      High (metadata per char)
Implementation complexity   High (transform functions)   High (data structure)
Intent preservation         Strong                       Weaker (tiebreaking)
Who uses it                 Google Docs, Wave            Figma, Notion, Linear

Interviewer: Which would you choose?

Candidate: OT. We already require a central server for access control, persistence, and billing — so OT's central authority constraint adds nothing new. The 5ms transformation overhead is negligible. CRDTs shine when you're building peer-to-peer or strongly offline-first — neither of which is our constraint here.

That said, OT transformation functions for rich text are genuinely complex. Most teams today start with a library — ShareDB for OT, Yjs for CRDTs — rather than implementing the algorithm from scratch.

The OT vs CRDT question is one of the few in system design where knowing the answer isn't enough — interviewers want to hear you reason through the trade-offs live, not recite them. Getting comfortable articulating that reasoning under time pressure is exactly what Mockingly.ai is designed for, with collaborative editor simulations built around this specific choice.


WebSocket Architecture

Interviewer: How do you manage WebSocket connections at scale? What happens when two users editing the same document connect to different servers?

Candidate: This is the routing problem — one of the trickier operational challenges.

Connection routing:

When a user opens a document, the API Gateway routes their WebSocket to a Collaboration Server. The routing key is document_id. All users editing the same document must connect to the same Collaboration Server.

Consistent hashing on document_id achieves this:

plaintext
User A opens doc "abc":
  hash("abc") → position on hash ring → Server 7
 
User B opens doc "abc":
  hash("abc") → position on hash ring → Server 7 (same)
 
Server 7 runs the OT engine for "abc".
All operations for "abc" are ordered by Server 7.

This is the critical property. OT requires exactly one server to see and order all operations for a document. Two servers independently transforming the same document would produce split-brain divergence.
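
A toy hash ring in Python makes the routing concrete. The vnode count and the use of MD5 are arbitrary illustrative choices; the property that matters is that every client resolving the same `doc_id` lands on the same server, and that adding or removing a server only moves the documents adjacent to it on the ring (unlike naive `hash % N` routing, which remaps almost everything):

```python
import bisect
import hashlib

class HashRing:
    """Toy consistent-hash ring: each server owns several virtual
    nodes; a document maps to the first node clockwise of its hash."""

    def __init__(self, servers, vnodes=100):
        self.ring = sorted(
            (self._hash(f"{s}#{i}"), s)
            for s in servers
            for i in range(vnodes)
        )
        self.keys = [h for h, _ in self.ring]

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def server_for(self, doc_id):
        idx = bisect.bisect(self.keys, self._hash(doc_id)) % len(self.ring)
        return self.ring[idx][1]

ring = HashRing([f"server-{n}" for n in range(10)])
# Every client resolving the same doc_id lands on the same server:
assert ring.server_for("abc") == ring.server_for("abc")
```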

Server failure handling:

The Collaboration Server is stateful — it holds the document state in memory. When it dies:

  1. Clients detect the dropped WebSocket via heartbeat timeout
  2. Clients reconnect — consistent hashing routes them to a new server
  3. The new server loads the last committed state from the Document Store
  4. Clients resend locally-applied-but-unacknowledged operations
  5. The new server re-runs OT on the resubmitted ops to catch up

The resilience property: clients buffer their unacknowledged ops locally. A server death is a latency event (reconnect + replay), not a data loss event.

Kafka for durability:

Every operation confirmed by the Collaboration Server is published to Kafka before the ACK is sent to clients.

plaintext
Client → WebSocket → Collaboration Server

  ├─► Transform op (OT engine, in-memory)
  ├─► Publish to Kafka (partitioned by doc_id)
  ├─► ACK to client

  └─► Kafka consumer → apply to Document Store (async)

This decouples the fast path (transform + broadcast) from the slow path (durable persistence). Users see operations in near real-time. Durability is guaranteed by Kafka — not by blocking on a database write.


Document Storage

Interviewer: How do you store the document itself?

Candidate: Documents have two distinct storage needs: current state for active editors, and full operation history for version history.

Current state — Redis:

For actively-edited documents, the canonical state lives in the Collaboration Server's RAM.

For dormant documents (no active editors), the current state caches in Redis, keyed by document_id. TTL: 1 hour. After that, load from the Document Store on next access.

Durable state — Document Store:

Rich text is stored as a node tree rather than a flat string. This reflects the document's actual structure:

json
{
  "doc_id": "abc123",
  "version": 847,
  "content": {
    "type": "doc",
    "nodes": [
      { "type": "heading", "level": 1, "text": "Project Brief" },
      { "type": "paragraph", "text": "The goal of this project..." },
      { "type": "list", "items": ["Item one", "Item two"] }
    ]
  }
}

For the store itself: Bigtable (Google's choice) or PostgreSQL with JSONB both work. The access pattern is key-based lookups by doc_id — no complex joins needed.

Version history — Operation Log:

Every operation is appended to an immutable log:

plaintext
operation_log
  doc_id      TEXT
  version     INT        -- monotonically increasing per document
  op_type     TEXT       -- insert, delete, format
  op_data     JSONB      -- { position, content, length, attributes }
  user_id     TEXT
  timestamp   TIMESTAMPTZ
  PRIMARY KEY (doc_id, version)

Interviewer: If a document has millions of operations over years, reconstructing any version requires replaying every operation. How do you handle that?

Candidate: Snapshots. Periodically — every 100 or 1,000 versions — store a full document snapshot alongside the log.

plaintext
To reconstruct version V:
  1. Find the latest snapshot S where S ≤ V
  2. Load the snapshot (full document state at version S)
  3. Replay only operations S+1 through V

Instead of replaying 1 million operations, you replay at most 1,000. Snapshots live in object storage (S3/GCS) keyed by (doc_id, snapshot_version). The operation log remains the source of truth — snapshots are derived optimisation artefacts.
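
The reconstruction logic is a few lines. A minimal sketch, assuming snapshots and the op log are plain dicts keyed by version and `apply_op` is whatever the engine uses to apply one operation:

```python
def reconstruct(target_version, snapshots, op_log, apply_op):
    """Rebuild the document at target_version: load the nearest
    snapshot at or below it, then replay a bounded slice of the log."""
    base = max(v for v in snapshots if v <= target_version)
    doc = snapshots[base]
    for v in range(base + 1, target_version + 1):
        doc = apply_op(doc, op_log[v])
    return doc

# Toy log where each op just appends one character:
snapshots = {0: "", 3: "abc"}
op_log = {1: "a", 2: "b", 3: "c", 4: "d", 5: "e"}
append = lambda doc, op: doc + op

assert reconstruct(5, snapshots, op_log, append) == "abcde"  # replays only 4, 5
assert reconstruct(2, snapshots, op_log, append) == "ab"     # falls back to snapshot 0
```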


Presence: Live Cursors

Interviewer: How do you show each user's cursor position in real time?

Candidate: Presence is ephemeral and high-frequency. It runs on a separate, lighter path — completely decoupled from document operations.

When a user moves their cursor, the client sends:

json
{ "type": "cursor", "user_id": "alice", "position": 142, "name": "Alice", "color": "#E91E63" }

The Collaboration Server broadcasts this to all other WebSocket connections for the same document. Cursor positions are not persisted — a server restart rebuilds cursor state from client heartbeats within seconds.

Throttling: cursor events fire on every caret movement, potentially hundreds of times per second per user. The client throttles to one update every 50–100ms. Ten users at 100ms intervals generate 100 updates per second, each fanned out to nine peers: still negligible.

Presence in Redis:

plaintext
Redis hash: presence:{doc_id}
  field: {user_id} → { name, color, cursor_position, last_seen }
  Per-field TTL: 30 seconds, reset on each heartbeat
  (Per-field TTLs need Redis 7.4+ HEXPIRE; on older Redis, use one
   key per user — presence:{doc_id}:{user_id} — with a plain TTL)

When a user disconnects and sends no heartbeat, their presence entry expires automatically after 30 seconds. No manual cleanup needed.
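
The expiry behaviour is easy to model without Redis. A toy in-memory stand-in, with an injectable clock so the timeout is testable:

```python
import time

class Presence:
    """Toy stand-in for the Redis presence hash: entries expire
    `ttl` seconds after their last heartbeat, with no manual cleanup."""

    def __init__(self, ttl=30.0):
        self.ttl = ttl
        self.entries = {}   # user_id -> (payload, last_seen)

    def heartbeat(self, user_id, payload, now=None):
        self.entries[user_id] = (payload, time.time() if now is None else now)

    def active(self, now=None):
        now = time.time() if now is None else now
        return {uid: payload
                for uid, (payload, seen) in self.entries.items()
                if now - seen < self.ttl}

presence = Presence(ttl=30)
presence.heartbeat("alice", {"cursor": 142}, now=0)
presence.heartbeat("bob", {"cursor": 7}, now=20)
assert set(presence.active(now=25)) == {"alice", "bob"}
assert set(presence.active(now=35)) == {"bob"}   # alice expired after 30s
```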


Offline Editing and Sync

Interviewer: The user closes their laptop mid-edit, opens it on a train with no internet, makes 20 edits offline, then reconnects. What happens?

Candidate: Offline editing is where OT and CRDTs diverge most sharply.

With OT (our approach):

While offline, the client buffers all operations locally. It knows its last confirmed server version — say, version 400. It applies operations locally and shows the user their changes seamlessly.

On reconnect:

plaintext
Client → Server: {
  doc_id: "abc",
  base_version: 400,
  pending_ops: [op_1, op_2, ..., op_20]
}

The server is now at version 450 — other users kept editing. The server transforms the 20 pending ops against operations 401–450, then applies and broadcasts the merged result.

plaintext
Server transforms pending_ops against [op_401...op_450]:
  For each pending_op:
    transform(pending_op, server_ops_since_base_version)
  Apply transformed ops to document
  Broadcast to all clients
  Return new version + missed ops to reconnecting client

This is exactly how Google Docs handles reconnection after offline editing.
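
A simplified one-directional re-base of the buffered ops, sketched in Python. This only shifts the client's pending ops over the server's; a full OT engine also transforms the server's ops against the pending ones before broadcasting, and handles every operation-type pairing. The `shift_insert` transform below is a deliberately minimal assumption:

```python
def merge_on_reconnect(pending_ops, server_ops, transform):
    """Re-base a reconnecting client's buffered ops over everything
    the server committed while the client was offline."""
    merged = []
    for op in pending_ops:
        for server_op in server_ops:   # ops 401..450 in the example
            op = transform(op, server_op)
        merged.append(op)
    return merged

def shift_insert(op, server_op):
    # Simplistic pairwise transform: a server insert at or before our
    # position pushes our op to the right by the inserted length.
    if server_op["pos"] <= op["pos"]:
        return {**op, "pos": op["pos"] + len(server_op["text"])}
    return op

pending = [{"type": "insert", "pos": 10, "text": "x"}]
server_ops = [{"type": "insert", "pos": 0, "text": "abc"}]
merged = merge_on_reconnect(pending, server_ops, shift_insert)
assert merged == [{"type": "insert", "pos": 13, "text": "x"}]
```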

The danger: long offline periods.

If the user was offline for hours while the document saw thousands of operations, the transformation is expensive. Google mitigates this by capping the replay window — if a client has been offline too long, it receives a full document snapshot and resync rather than thousands of transforms.

Offline sync is one of those areas that interviewers at Google and Notion push on specifically — "what's the base version?", "what if the server's at version 450?", "what if the client was offline for a week?" Having crisp answers to that chain without losing the thread is a skill. Mockingly.ai puts you in exactly that conversation.


Access Control

Interviewer: How does the permissions model work?

Candidate: Documents have an owner and a list of grants. A grant associates a principal — user, group, or link token — with a permission level.

plaintext
document_permissions
  doc_id      UUID
  principal   TEXT   -- user_id, group_id, or "link:{token}"
  permission  TEXT   -- owner / editor / commenter / viewer
  granted_by  TEXT
  granted_at  TIMESTAMPTZ
  PRIMARY KEY (doc_id, principal)

The API Gateway checks permissions on every request — before any WebSocket connection is established. The Collaboration Server trusts that authenticated users have already been verified. It does not re-check permissions on every keystroke.

Link sharing: a random token is generated when a user creates a shareable link, stored with its permission level. The API Gateway resolves the token on each request.

Viewers and commenters can still have WebSocket connections — they receive live updates and see editing in progress. They just can't send operations. The Collaboration Server silently drops operations from users without editor permission.


Scaling the Collaboration Tier

Interviewer: How does the Collaboration Service scale to thousands of concurrent sessions?

Candidate: The Collaboration Service scales horizontally. Each instance owns a set of document sessions via consistent hashing. Adding servers redistributes documents; removing servers triggers reconnection and reload.

The hot document problem:

If a company-wide announcement has 500 simultaneous editors, one Collaboration Server handles all 500 connections. That's a concentration point. Mitigations:

  • Cap simultaneous editors — Google Docs caps active editors per document
  • Separate viewer path — viewers receive updates via Redis Pub/Sub fan-out, not through the primary Collaboration Server. This scales horizontally without burdening the OT engine
  • Tiered broadcast — Collaboration Server → broadcast tier → clients. The broadcast tier scales independently

Cross-region latency:

A user in Tokyo editing a document whose Collaboration Server is in Virginia sees 200ms+ round-trip. This breaks the real-time feel.

The solution is regional Collaboration Servers with a document home region:

  • Assign each document a home region based on owner location or access patterns
  • Route all collaborators to that region's server
  • Accept that truly global collaborators have geographic latency — it cannot be eliminated without sacrificing OT's central authority
  • For documents with heavy cross-region use, allow home region migration

This is a genuine trade-off. Google accepts that real-time collaboration across regions involves some latency. The system feels smooth within a region and tolerates slightly more delay across regions.


Common Interview Follow-ups

"How does undo/redo work in a collaborative context?"

Undo in a collaborative editor is not simply "reverse the last operation."

If Alice types "hello" and Bob types "world" after, and Alice hits Undo — should she undo her "hello" or the most recent change overall?

Google Docs implements selective undo: Undo reverses only the user's own operations, transforming the reversal against all operations that happened after the undone one. It's another OT transform. The result: Undo respects collaborators' changes rather than blowing them away.

"What if two users apply conflicting formatting simultaneously — one bolds, the other italicises the same selection?"

Formatting operations compose cleanly in most cases.

Bold-range and italic-range on the same text don't conflict — both apply, producing bold-italic. OT defines transformation functions for formatting just as it does for insert/delete.

The tricky case: one user deletes text while another is formatting it. The OT engine must decide whether to preserve or discard the formatting operation based on whether the target text survived.

"How do you handle very large documents — a 500-page spec?"

Documents above a certain size are split into chunks. Each chunk is its own collaborative unit with its own operation log.

Users editing different chunks don't generate concurrent operations against each other. OT complexity is bounded by chunk size. The top-level document stores chunk references. This is similar to how distributed file systems split large files into blocks.

"How do you efficiently diff between two versions?"

Each version is identified by a monotonically increasing version number.

The diff between V1 and V2 is computed by replaying operations V1+1 through V2 — each operation is the diff. For display, a Myers diff algorithm runs over the reconstructed text to highlight character-level changes in the version history UI.

This is exactly why snapshots are stored — comparing two snapshots avoids replaying thousands of intermediate operations.

The follow-up questions in this section — undo/redo, conflicting formatting, large documents, version diffing — are the ones that show up mid-explanation at companies like Google and Notion when the interviewer wants to see how deep the knowledge actually goes. If answering any of them cleanly feels uncertain, that's the gap worth closing before the real interview. Mockingly.ai has system design simulations specifically built around this question and its follow-ups.


Quick Interview Checklist

  • ✅ Clarified scope — rich text, 100 concurrent editors, offline, version history, access control, global users
  • ✅ Back-of-the-envelope — 15M ops/sec, 15M WebSocket connections, 100TB storage
  • ✅ OT vs CRDT — transformation mechanism described with a concrete example, not just named
  • ✅ Chose OT with justification — central server already required, 5ms overhead negligible
  • ✅ Conflict walk-through — showed why naive ordering breaks before explaining the fix
  • ✅ WebSocket routing — consistent hashing on doc_id; one server per document; correctness requirement
  • ✅ Server failure — client buffers ops locally; server death is latency, not data loss
  • ✅ Kafka for durability — fast transform path decoupled from slow persist path
  • ✅ Document storage — Redis for active state, Bigtable/Postgres for durable state, node tree for rich text
  • ✅ Version history — immutable operation log + periodic snapshots; bounded replay cost
  • ✅ Presence — ephemeral, Redis TTL, 30s auto-expiry, client-side throttling
  • ✅ Offline sync — client buffers with base version; server transforms on reconnect; snapshot fallback for long offline
  • ✅ Access control — permission table, API Gateway enforces before WS connection, viewers get read-only feed
  • ✅ Scaling — horizontal Collaboration Service, hot document cap, viewer fan-out separation, home region per doc

Conclusion

The collaborative editor interview rewards a specific kind of thinking: recognising that the hard problem isn't storage, it isn't WebSockets, it isn't even scale — it's what happens when two users edit the same position at the same time.

Everything else flows from how you answer that.

Ignoring concurrency misses the core challenge. Real-time collaboration is what distinguishes Google Docs from a simple cloud file. If you spend most of your time on storage without explaining conflict resolution, you've missed the point.

The design pillars:

  1. OT for conflict resolution — central server transforms concurrent ops to preserve intent; right when a central server is already required
  2. Consistent hashing on doc_id — one Collaboration Server per document; this is what makes OT's central authority work
  3. Client-side operation buffering — clients buffer unacknowledged ops; server failure is latency, not data loss
  4. Kafka for durability — decouples the fast transform/broadcast path from the slow durable-write path
  5. Operation log + snapshots — full version history with bounded replay cost
  6. Ephemeral presence — cursors on a separate, lighter path; Redis TTL for automatic cleanup
  7. Offline sync as a transform problem — pending ops submitted with base version; server transforms against intervening ops

Frequently Asked Questions

What is Operational Transformation (OT) in collaborative editing?

Operational Transformation is a concurrency control algorithm that allows multiple users to edit a shared document simultaneously without producing conflicting results.

How it works:

  1. When two users make edits at the same time, both operations are sent to a central server
  2. The server determines the arrival order and transforms each operation relative to the other — adjusting character positions and effects
  3. Both users' intent is preserved in the merged result
  4. The transformed operations are broadcast to all clients, which converge to the same document state

The canonical example: User A inserts "beautiful " at position 4 while User B deletes 3 characters at position 4. Without OT, applying B after A deletes the wrong characters. With OT, B's position is shifted by +10 (the length of A's insertion) before being applied — preserving both users' intent.

Google Docs has used OT since its inception in 2006.


What is the difference between Operational Transformation and CRDTs?

OT requires a central server to order and transform all concurrent operations. CRDTs assign globally unique identifiers to every element, allowing edits to merge without a central coordinator.

                           Operational Transformation                  CRDT
Central server required    Yes                                         No
Offline-first support      Harder (needs base version tracking)        Natural (merges always succeed)
Storage overhead           Minimal (clean text only)                   High (16–32 bytes per character)
Implementation complexity  High (transform functions multiply)         High (non-trivial data structure)
Intent preservation        Strong (transforms encode intent)           Weaker (tie-break by ID, not intent)
Who uses it                Google Docs, Apache Wave                    Figma, Notion, Linear, Liveblocks

When to choose OT: when a central server is already required (for access control, billing, persistence). OT adds only ~5ms transformation overhead and keeps document storage clean.

When to choose CRDTs: when building peer-to-peer tools, strongly offline-first apps, or systems where the central server is optional.


Does Google Docs use Operational Transformation or CRDT?

Google Docs uses Operational Transformation (OT).

Three reasons Google chose OT:

  1. A central server was already required for authentication, access control, and billing — OT's central authority constraint added nothing new to the architecture
  2. OT transformation overhead is under 5ms — negligible within the 100–200ms real-time editing latency budget
  3. OT keeps document storage clean — no metadata per character means compact storage at 2 billion document scale

CRDTs would have added 16–32 bytes of metadata per character — a 10,000-character document would grow from ~10 KB to ~320 KB. At Google's scale, that storage cost is significant.


Why must all editors of a document connect to the same server?

OT requires a single authoritative server to order all concurrent operations for a document. This is a correctness requirement, not a load balancing choice.

Why it must be exactly one server:

  1. OT works by having the server assign a definitive order to concurrent operations
  2. If Server A and Server B each apply operations for the same document independently, they may apply them in different orders
  3. Different application orders produce different transformed results
  4. The document on Server A and Server B diverges — split-brain
  5. All users would see different versions of the same document with no reconciliation path

How to enforce this: consistent hashing on document_id routes all users editing the same document to the same Collaboration Server instance. Adding or removing servers redistributes documents, triggering reconnection and state reload.


How does offline editing work in a collaborative editor?

Offline editing works by buffering operations locally with a base version number, then submitting them on reconnect for server-side transformation.

Step by step:

  1. User goes offline — client notes its last confirmed server version (e.g., version 400)
  2. User makes edits — operations are applied locally and buffered as pending_ops
  3. User reconnects — client submits { base_version: 400, pending_ops: [...20 ops...] }
  4. Server is now at version 450 (others kept editing while user was offline)
  5. Server transforms the 20 pending ops against operations 401–450
  6. Server applies the merged result and broadcasts to all clients
  7. Client receives the transformed ops and is now in sync

The long-offline edge case:

If a client was offline for hours and the document saw thousands of operations, the transformation computation is expensive. The server caps the replay window — if base_version is too old, the server sends a full document snapshot instead of replaying thousands of transforms. The client resets to the snapshot and continues.


How does version history with revert work in a collaborative editor?

Version history is an immutable append-only operation log where every edit is a permanent record. Revert generates inverse operations rather than deleting history.

The operation log schema:

operation_log
  doc_id    TEXT
  version   INT     -- monotonically increasing per document
  op_type   TEXT    -- insert, delete, format
  op_data   JSONB   -- position, content, length, attributes
  user_id   TEXT
  timestamp TIMESTAMPTZ
  PRIMARY KEY (doc_id, version)

To reconstruct any past version V:

  1. Find the nearest snapshot S where S ≤ V (stored periodically — every 100–1,000 versions)
  2. Load the snapshot (full document state at version S)
  3. Replay only operations S+1 through V
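The three steps above reduce to a short function. A minimal sketch, assuming a dict of snapshots keyed by version and an `apply_op` function supplied by the caller (both names are illustrative):

```python
def reconstruct(version, snapshots, op_log, apply_op):
    """Rebuild document state at `version` from the nearest snapshot.

    snapshots: {version: document_state}, taken every N versions
    op_log:    {version: operation}
    apply_op:  applies one operation to a document, returns the new state
    """
    # 1. Nearest snapshot at or below the target version
    base = max(v for v in snapshots if v <= version)
    doc = snapshots[base]
    # 2. + 3. Replay only the operations after the snapshot
    for v in range(base + 1, version + 1):
        doc = apply_op(doc, op_log[v])
    return doc
```

If the target version is itself a snapshot, the replay loop is empty and the call is a pure lookup.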

To revert to a past version:

  1. Do not delete operations from the log — history is immutable
  2. Generate inverse operations (delete what was inserted, re-insert what was deleted)
  3. Apply the inverse ops as new commits at the current version
  4. The full edit trail is preserved; revert is just another set of operations
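The revert procedure above can be sketched as inverse-op generation. Field names are assumptions; note that delete operations must have recorded the removed text (here as a `removed` field) to be invertible at all:

```python
def invert(op):
    """Build the inverse of one logged operation for revert.

    Inserts become deletes and vice versa; the original operation
    stays in the log untouched.
    """
    if op["op_type"] == "insert":
        return {"op_type": "delete", "pos": op["pos"],
                "length": len(op["text"])}
    if op["op_type"] == "delete":
        # Invertibility requires the log to carry the removed text.
        return {"op_type": "insert", "pos": op["pos"],
                "text": op["removed"]}
    raise ValueError(f"cannot invert {op['op_type']}")

def revert_to(current_version, target_version, op_log):
    """Revert by emitting inverse ops for versions target+1..current,
    newest first, to be applied as ordinary new commits."""
    return [invert(op_log[v])
            for v in range(current_version, target_version, -1)]
```

The returned list is submitted through the normal OT pipeline, so the revert itself appears in history like any other edit.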

Why snapshots are critical: without them, reconstructing version 1 of a document with 1 million edits requires replaying all 1 million operations. With snapshots every 1,000 versions, the maximum replay is 1,000 operations.


How is live cursor presence implemented in a collaborative editor?

Cursor presence is ephemeral and runs on a completely separate path from document operations — it does not go through the OT engine and is not persisted.

How it works:

  1. When a user moves their cursor, the client sends a lightweight message: { type: "cursor", user_id: "alice", position: 142, color: "#E91E63" }
  2. The Collaboration Server broadcasts this to all other connected users for the same document
  3. Cursor positions are not persisted — if the server restarts, cursor state rebuilds from client heartbeats within seconds
  4. Each user's cursor state is also cached in Redis under a per-user key with a 30-second TTL — it expires automatically if the user disconnects without sending a goodbye message
  5. Client-side throttling limits cursor updates to one every 50–100ms — preventing flooding at high typing speed
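The client-side throttling in step 5 can be sketched as a small wrapper around the send function. The interval and message shape mirror the example above; everything else is an assumption:

```python
import time

class CursorThrottle:
    """Forward at most one cursor update per interval,
    silently dropping intermediate positions (illustrative sketch)."""

    def __init__(self, send, min_interval=0.05):
        self.send = send                  # e.g. websocket.send
        self.min_interval = min_interval  # 50ms, per the text above
        self.last_sent = 0.0

    def on_cursor_move(self, user_id, position):
        now = time.monotonic()
        if now - self.last_sent >= self.min_interval:
            self.send({"type": "cursor", "user_id": user_id,
                       "position": position})
            self.last_sent = now
```

A refinement used in practice is to also flush the latest dropped position when the interval elapses, so the remote cursor never freezes mid-move; this sketch omits that for brevity.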

Why keep presence separate from OT:

Cursor positions can change many times per second per user. Running them through the OT engine would add transformation and ordering overhead for no benefit. Cursor positions are also low-stakes — a momentarily stale cursor has no impact on document consistency.


How does the collaborative editor handle undo in a shared document?

Collaborative undo uses selective undo, which reverses only the current user's own operations — not the most recent operation overall.

Why naive undo breaks in collaboration:

  1. Alice types "hello" → document: "hello"
  2. Bob types " world" → document: "hello world"
  3. Alice hits Undo expecting to remove "hello"
  4. Naive undo reverses the most recent operation — removes Bob's " world" instead
  5. Alice's change survives; Bob's is lost — not what either user intended

How selective undo works:

  1. Alice's Undo targets her own "insert hello" operation
  2. The reversal (delete "hello") is itself treated as a new OT operation
  3. It is transformed against all operations that happened after Alice's original insert — including Bob's " world"
  4. The transformation shifts positions so that "hello" is deleted correctly without touching Bob's text
  5. Result: " world" — Alice's contribution is removed; Bob's text, including its leading space, is untouched

Selective undo is another OT transform, using the same engine. It is one of the reasons OT's transformation functions are complex — undo requires generating inverse operations and transforming them against everything committed since the original edit.
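The Alice/Bob scenario above can be sketched with a minimal insert-only transform. This is a simplified sketch under strong assumptions — real transformation functions must also handle deletes, overlapping ranges, and formatting ops:

```python
def transform_pos(pos, later_op):
    """Shift a position right past one later concurrent insert."""
    if later_op["op_type"] == "insert" and later_op["pos"] <= pos:
        return pos + len(later_op["text"])
    return pos

def selective_undo(own_op, later_ops):
    """Undo the user's own insert: build its inverse delete, then
    transform it against every operation committed since."""
    undo = {"op_type": "delete", "pos": own_op["pos"],
            "length": len(own_op["text"])}
    for op in later_ops:
        undo["pos"] = transform_pos(undo["pos"], op)
    return undo
```

In the worked example, Bob's insert lands after Alice's text, so no shift is needed and the transformed delete removes exactly "hello"; had Bob inserted before position 0's range, the delete position would have shifted right accordingly.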


Which companies ask the collaborative editor system design question in interviews?

Google, Meta, Amazon, Microsoft, Dropbox, and Notion ask this question for senior software engineer and principal engineer roles.

Why it is a consistently popular interview question:

  1. Cannot be faked — explaining OT with a concrete insert/delete example, describing the WebSocket routing constraint, and reasoning through offline sync requires genuine distributed systems understanding
  2. Scales to seniority — a mid-level answer names OT and CRDTs; a senior answer explains why they differ, which to choose, and what happens when a server crashes mid-session
  3. Directly maps to real products — every company on the list runs a document collaboration product or needs the underlying technology

What interviewers specifically listen for:

  1. Concrete OT example — working through the insert/delete conflict with actual position numbers, not just naming OT abstractly
  2. Why one server per document — framing this as a correctness requirement, not a performance choice
  3. Client-side buffering for server failure — explaining that server death is latency, not data loss
  4. Kafka before ACK — the specific ordering that decouples the fast transform path from durable storage
  5. Selective undo — proactively raising that undo in a collaborative editor is not simple reversal

The collaborative editor interview is one of the few where you genuinely cannot fake your way through. Knowing the difference between OT and CRDTs, explaining the transformation function with a concrete example, and connecting it to the WebSocket routing constraint — these take real preparation. If you want to practice this design under real interview pressure, with follow-up questions on undo, large documents, and cross-region latency, Mockingly.ai has system design simulations built for engineers preparing for senior roles at Google, Meta, Amazon, and Notion.
