Subtracks & Tasks
Why Unique IDs Are Hard
Generate Unique IDs Using Node ID and Timestamp
Distributed systems need unique identifiers for entities, events, and messages. Without a central authority, each node must generate IDs independently...
Add Random Salt to Prevent Collisions
Your basic ID generator might produce duplicates if called multiple times within the same millisecond. Add a random component or sequence counter to e...
Implement ID Generation During Network Partition
Distributed systems must handle **network partitions** - times when nodes cannot communicate with each other. Your ID generator must continue working ...
Validate Uniqueness Across Distributed Nodes
Run your ID generator through Maelstrom verification to prove uniqueness. Maelstrom collects all generated IDs across all nodes and checks for duplica...
Optimize for High-Throughput ID Generation
Optimize your ID generator for maximum throughput. Real systems like Twitter generate millions of IDs per second. Your implementation should handle hi...
Snowflake IDs (Twitter's Approach)
Implement Snowflake ID Bit Layout
Twitter's Snowflake generates unique, roughly-sorted 64-bit IDs without coordination. The layout is: ``` | 1 bit unused | 41 bits timestamp | 10 bits...
Implement Timestamp Component with Custom Epoch
The timestamp component of a Snowflake ID uses 41 bits to represent milliseconds since a **custom epoch**. Using Unix epoch (1970-01-01) wastes over 5...
Implement Sequence Counter with Overflow Handling
Within a single millisecond, each Snowflake node can generate up to 4096 unique IDs (12-bit sequence counter). When traffic bursts exceed this limit, ...
Multi-Node Snowflake ID Verification
Snowflake IDs derive their uniqueness from the machine_id component. With 10 bits, you can have 1024 unique machines generating IDs without any coordi...
Logical Clocks as IDs
Implement a Lamport Clock
Leslie Lamport showed that in a distributed system, you don't need physical clocks to order events. A **Lamport clock** is a simple counter that provi...
Demonstrate Lamport Clock Causality Limitation
Lamport clocks guarantee: if A happens-before B, then L(A) < L(B). But the **converse is not true**: L(A) < L(B) does NOT mean A happened before B. Tw...
Implement Vector Clocks
Vector clocks solve the limitation of Lamport clocks: they can detect **concurrent events**. Each node maintains a vector of N counters (one per node)...
Implement Happened-Before Detector with Vector Clocks
With vector clocks, you can determine the **exact causal relationship** between any two events: - **A -> B** (A happened before B): A[i] <= B[i] for ...
Vector Clock Conflict Detection in Key-Value Store
In databases like Riak, vector clocks detect **write conflicts**. When two clients write to the same key concurrently (neither saw the other's write),...
Hybrid Logical Clocks (HLC)
Understand and Implement HLC Format
Hybrid Logical Clocks (HLC), used in CockroachDB and Spanner, combine physical time with a logical counter. The format is `(physical_ms, logical_count...
Implement HLC Receive Rule
The HLC receive rule is more complex than Lamport's. On receiving a message with HLC `(msg.pt, msg.lc)`: 1. `new_pt = max(local.pt, msg.pt, now_ms())...
HLC Handles Backward Clock Gracefully
NTP can adjust the system clock backward. Physical timestamps break. Lamport clocks keep working but lose time correlation. HLC handles it gracefully ...
Compare HLC, UUID v4, and Snowflake IDs
Different ID schemes have different tradeoffs. Your task is to implement all three and return a comparison table. Implement a `compare_ids` handler t...
HLC-Based Unique ID Generation for Maelstrom
Integrate HLC-based IDs into the Maelstrom `generate` workload. Each generated ID must be globally unique across all nodes and should preserve causal ...
Interview Prep
Common interview questions for Backend / Distributed Systems Engineer roles that map directly to what you build in this track. Click any question to reveal the model answer.
Questions are representative of real interview patterns. Model answers are starting points — adapt them with your own experience and the specific context of the interview.
Common Mistakes
The top 5 mistakes builders make in this track — and exactly how to fix them. Click any mistake to see the root cause and the correct approach.
Comparison Mode
Side-by-side comparisons of the approaches, algorithms, and trade-offs you encounter in this track. Expand any comparison to see a detailed breakdown.
Concepts Covered
Prerequisites
It is recommended to complete the previous tracks before starting this one. Concepts build progressively throughout the curriculum.
Rabbit Holes
For when you want to go deeper. Curated papers, posts, and talks beyond what this track covers.
Snowflake: Generating IDs at Scale (Twitter Engineering)
The original 2010 announcement explaining why Twitter could not use UUID v4 at their write rate, and what they built instead. The constraints are instructive even today.
ULID Specification
Universally Unique Lexicographically Sortable Identifier — a more opinionated alternative to Snowflake that trades machine IDs for a simpler encoding. Good for systems where you control the client.
Lamport Clocks (1978)
Leslie Lamport's foundational paper on logical clocks. The question "what does it mean for event A to happen before event B in a distributed system" is the starting point for almost everything else in this curriculum.
Time, Clocks, and the Ordering of Events in a Distributed System
The full Lamport paper if you want every detail. Rarely have 12 pages been cited more often.