Tracks/The Identifier

The Identifier

Beginner

Foundations|19 tasks

Solve the fundamental problem of generating globally unique identifiers in a distributed system. You will implement various ID generation strategies, handle network partitions, and ensure uniqueness across nodes without centralized coordination.

Subtracks & Tasks

Why Unique IDs Are Hard

0/5

WH-1

beginner

Generate Unique IDs Using Node ID and Timestamp

Distributed systems need unique identifiers for entities, events, and messages. Without a central authority, each node must generate IDs independently...

unique IDstimestampsnode identity

WH-2

beginner

Add Random Salt to Prevent Collisions

Your basic ID generator might produce duplicates if called multiple times within the same millisecond. Add a random component or sequence counter to e...

randomnesscollision preventionUUID

WH-3

intermediate

Implement ID Generation During Network Partition

Distributed systems must handle **network partitions** - times when nodes cannot communicate with each other. Your ID generator must continue working ...

network partitionsavailabilityCAP theorem

WH-4

intermediate

Validate Uniqueness Across Distributed Nodes

Run your ID generator through Maelstrom verification to prove uniqueness. Maelstrom collects all generated IDs across all nodes and checks for duplica...

testingverificationglobal uniqueness

WH-5

intermediate

Optimize for High-Throughput ID Generation

Optimize your ID generator for maximum throughput. Real systems like Twitter generate millions of IDs per second. Your implementation should handle hi...

performancethroughputoptimization

Snowflake IDs (Twitter's Approach)

0/4

SN-1

intermediate

Implement Snowflake ID Bit Layout

Twitter's Snowflake generates unique, roughly-sorted 64-bit IDs without coordination. The layout is: ``` | 1 bit unused | 41 bits timestamp | 10 bits...

bit manipulationSnowflake IDbit layout+1 more

SN-2

intermediate

Implement Timestamp Component with Custom Epoch

The timestamp component of a Snowflake ID uses 41 bits to represent milliseconds since a **custom epoch**. Using Unix epoch (1970-01-01) wastes over 5...

timestampepochtime representation+1 more

SN-3

intermediate

Implement Sequence Counter with Overflow Handling

Within a single millisecond, each Snowflake node can generate up to 4096 unique IDs (12-bit sequence counter). When traffic bursts exceed this limit, ...

sequence numberoverflow handlingspin wait+1 more

SN-4

advanced

Multi-Node Snowflake ID Verification

Snowflake IDs derive their uniqueness from the machine_id component. With 10 bits, you can have 1024 unique machines generating IDs without any coordi...

multi-node coordinationuniqueness verificationmonotonicity+1 more

Logical Clocks as IDs

0/5

LO-1

intermediate

Implement a Lamport Clock

Leslie Lamport showed that in a distributed system, you don't need physical clocks to order events. A **Lamport clock** is a simple counter that provi...

Lamport clocklogical timepartial order+1 more

LO-2

intermediate

Demonstrate Lamport Clock Causality Limitation

Lamport clocks guarantee: if A happens-before B, then L(A) < L(B). But the **converse is not true**: L(A) < L(B) does NOT mean A happened before B. Tw...

causalityconcurrent eventspartial order+1 more

LO-3

advanced

Implement Vector Clocks

Vector clocks solve the limitation of Lamport clocks: they can detect **concurrent events**. Each node maintains a vector of N counters (one per node)...

vector clockcausality trackingelement-wise max+1 more

LO-4

advanced

Implement Happened-Before Detector with Vector Clocks

With vector clocks, you can determine the **exact causal relationship** between any two events: - **A -> B** (A happened before B): A[i] <= B[i] for ...

happened-beforeconcurrent eventsvector comparison+1 more

LO-5

advanced

Vector Clock Conflict Detection in Key-Value Store

In databases like Riak, vector clocks detect **write conflicts**. When two clients write to the same key concurrently (neither saw the other's write),...

conflict detectionkey-value storesibling values+1 more

Hybrid Logical Clocks (HLC)

0/5

HY-1

advanced

Understand and Implement HLC Format

Hybrid Logical Clocks (HLC), used in CockroachDB and Spanner, combine physical time with a logical counter. The format is `(physical_ms, logical_count...

HLChybrid clockphysical time+1 more

HY-2

advanced

Implement HLC Receive Rule

The HLC receive rule is more complex than Lamport's. On receiving a message with HLC `(msg.pt, msg.lc)`: 1. `new_pt = max(local.pt, msg.pt, now_ms())...

HLC mergereceive ruleclock synchronization+1 more

HY-3

advanced

HLC Handles Backward Clock Gracefully

NTP can adjust the system clock backward. Physical timestamps break. Lamport clocks keep working but lose time correlation. HLC handles it gracefully ...

NTP adjustmentclock backwardmonotonic guarantee+1 more

HY-4

intermediate

Compare HLC, UUID v4, and Snowflake IDs

Different ID schemes have different tradeoffs. Your task is to implement all three and return a comparison table. Implement a `compare_ids` handler t...

UUIDSnowflakeHLC+2 more

HY-5

advanced

HLC-Based Unique ID Generation for Maelstrom

Integrate HLC-based IDs into the Maelstrom `generate` workload. Each generated ID must be globally unique across all nodes and should preserve causal ...

unique ID generationMaelstrom workloadlinearizability+1 more

Interview Prep

Common interview questions for Backend / Distributed Systems Engineer roles that map directly to what you build in this track. Click any question to reveal the model answer.

Questions are representative of real interview patterns. Model answers are starting points — adapt them with your own experience and the specific context of the interview.

Common Mistakes

The top 5 mistakes builders make in this track — and exactly how to fix them. Click any mistake to see the root cause and the correct approach.

Comparison Mode

Side-by-side comparisons of the approaches, algorithms, and trade-offs you encounter in this track. Expand any comparison to see a detailed breakdown.

Concepts Covered

unique IDstimestampsnode identityrandomnesscollision preventionUUIDnetwork partitionsavailabilityCAP theoremtestingverificationglobal uniquenessperformancethroughputoptimizationbit manipulationSnowflake IDbit layoutscalabilitytimestampepochtime representationoverflow planningsequence numberoverflow handlingspin waitthroughput limitsmulti-node coordinationuniqueness verificationmonotonicityID distributionLamport clocklogical timepartial orderhappens-beforecausalityconcurrent eventshappens-before relationvector clockcausality trackingelement-wise maxconcurrent detectionhappened-beforevector comparisonconflict detectionkey-value storesibling valueslast-writer-winsHLChybrid clockphysical timelogical counterHLC mergereceive ruleclock synchronizationcausal consistencyNTP adjustmentclock backwardmonotonic guaranteeHLC resilienceSnowflakeID tradeoffsbenchmarkingunique ID generationMaelstrom workloadlinearizabilityHLC integration

Prerequisites

It is recommended to complete the previous tracks before starting this one. Concepts build progressively throughout the curriculum.

🕳

Rabbit Holes

For when you want to go deeper. Curated papers, posts, and talks beyond what this track covers.

Blog

Snowflake: Generating IDs at Scale (Twitter Engineering)

The original 2010 announcement explaining why Twitter could not use UUID v4 at their write rate, and what they built instead. The constraints are instructive even today.

Blog

ULID Specification

Universally Unique Lexicographically Sortable Identifier — a more opinionated alternative to Snowflake that trades machine IDs for a simpler encoding. Good for systems where you control the client.

Paper

Lamport Clocks (1978)

Leslie Lamport's foundational paper on logical clocks. The question "what does it mean for event A to happen before event B in a distributed system" is the starting point for almost everything else in this curriculum.

Paper

Time, Clocks, and the Ordering of Events in a Distributed System

The full Lamport paper if you want every detail. Rarely have 12 pages been cited more often.