Subtracks & Tasks
Distributed File Storage
Design a GFS-Style Distributed File System Architecture
The Google File System (GFS) architecture shaped much of modern distributed storage. It separates metadata (managed by a master) from data (stor...
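Here is a minimal Python sketch of the read path that makes the metadata/data split concrete. It assumes 64 MB chunks (the size used in the GFS paper) and uses in-memory dictionaries in place of RPCs. The names `Master`, `ChunkServer`, and `client_read` are hypothetical. The client asks the master only for a chunk handle and replica locations, then fetches the bytes directly from a chunk server.

```python
from dataclasses import dataclass, field

CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB, the chunk size reported in the GFS paper


@dataclass
class Master:
    # Metadata only: the master never stores or serves file bytes.
    files: dict = field(default_factory=dict)      # path -> list of chunk handles
    locations: dict = field(default_factory=dict)  # chunk handle -> list of chunk server ids

    def lookup(self, path, offset):
        """Translate (path, byte offset) into (chunk handle, replica locations)."""
        index = offset // CHUNK_SIZE
        handle = self.files[path][index]
        return handle, self.locations[handle]


@dataclass
class ChunkServer:
    chunks: dict = field(default_factory=dict)  # chunk handle -> chunk bytes

    def read(self, handle, start, length):
        return self.chunks[handle][start:start + length]


def client_read(master, servers, path, offset, length):
    """Read within a single chunk (reads that cross a chunk boundary are not handled)."""
    handle, replicas = master.lookup(path, offset)  # small metadata request to the master
    server = servers[replicas[0]]                   # bulk data comes from a chunk server
    return server.read(handle, offset % CHUNK_SIZE, length)


master = Master(files={"/logs/a": [101, 102]}, locations={101: ["cs1"], 102: ["cs2"]})
servers = {"cs1": ChunkServer({101: b"hello"}), "cs2": ChunkServer({102: b"world"})}
print(client_read(master, servers, "/logs/a", CHUNK_SIZE + 1, 3))  # b'orl'
```

Because the master handles only small lookups, its load stays low even when clients move large amounts of data.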
Implement the Master Namespace Tree
The master's namespace is a hierarchical tree of directories and files. It maps every file to its chunks and their locations. This is stored entirely ...
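A minimal in-memory sketch of the namespace tree described above. The names (`Namespace`, `Node`, `mkdirs`, `create_file`, `add_chunk`, `listdir`) are hypothetical, and locking, persistence, and deletion are omitted.

```python
class Node:
    def __init__(self, is_dir):
        self.is_dir = is_dir
        self.children = {}  # name -> Node (directories only)
        self.chunks = []    # chunk handles in file order (files only)


class Namespace:
    def __init__(self):
        self.root = Node(is_dir=True)

    @staticmethod
    def _parts(path):
        return [p for p in path.split("/") if p]

    def _walk(self, parts, create=False):
        """Descend through the directories named by parts, optionally creating them."""
        node = self.root
        for name in parts:
            child = node.children.get(name)
            if child is None:
                if not create:
                    raise FileNotFoundError("/" + "/".join(parts))
                child = Node(is_dir=True)
                node.children[name] = child
            if not child.is_dir:
                raise NotADirectoryError(name)
            node = child
        return node

    def mkdirs(self, path):
        self._walk(self._parts(path), create=True)

    def create_file(self, path):
        *dirs, name = self._parts(path)
        parent = self._walk(dirs)  # parent directories must already exist
        if name in parent.children:
            raise FileExistsError(path)
        parent.children[name] = Node(is_dir=False)

    def lookup(self, path):
        *dirs, name = self._parts(path)
        node = self._walk(dirs).children.get(name)
        if node is None:
            raise FileNotFoundError(path)
        return node

    def add_chunk(self, path, handle):
        node = self.lookup(path)
        if node.is_dir:
            raise IsADirectoryError(path)
        node.chunks.append(handle)  # file -> ordered list of chunk handles

    def listdir(self, path):
        return sorted(self._walk(self._parts(path)).children)


ns = Namespace()
ns.mkdirs("/home/user")
ns.create_file("/home/user/data.log")
ns.add_chunk("/home/user/data.log", 7)
print(ns.listdir("/home/user"))  # ['data.log']
```

For comparison, the GFS paper itself represents the namespace as a flat lookup table from full pathnames to metadata, with prefix compression and per-path read-write locks.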
Implement Chunk Creation and Allocation
When a client creates a file or appends a new chunk, the master must allocate chunk storage on appropriate chunk servers. Chunk creation flow: 1. Cli...
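One way to sketch the master's allocation step, assuming three replicas and a "least-utilized servers first, on distinct racks when possible" placement policy. That policy is an assumption loosely modeled on placement factors the GFS paper lists (below-average disk utilization, spreading replicas across racks). It does not reproduce the track's exact flow. `ChunkAllocator`, `ServerInfo`, and the counter-based handle scheme are hypothetical.

```python
import itertools
from dataclasses import dataclass


@dataclass
class ServerInfo:
    server_id: str
    rack: str
    used_bytes: int
    capacity_bytes: int

    @property
    def utilization(self):
        return self.used_bytes / self.capacity_bytes


class ChunkAllocator:
    def __init__(self, replication=3):
        self.replication = replication
        self._handles = itertools.count(1)  # hypothetical: handles from a simple counter
        self.locations = {}                 # chunk handle -> list of server ids

    def allocate(self, live_servers):
        if len(live_servers) < self.replication:
            raise RuntimeError("not enough live chunk servers")
        ranked = sorted(live_servers, key=lambda s: s.utilization)
        chosen, racks = [], set()
        # First pass: the least-utilized server on each rack not used yet.
        for s in ranked:
            if len(chosen) == self.replication:
                break
            if s.rack not in racks:
                chosen.append(s)
                racks.add(s.rack)
        # Second pass: fewer racks than replicas, so fill from the remaining servers.
        for s in ranked:
            if len(chosen) == self.replication:
                break
            if s not in chosen:
                chosen.append(s)
        handle = next(self._handles)
        self.locations[handle] = [s.server_id for s in chosen]
        return handle, self.locations[handle]


alloc = ChunkAllocator()
servers = [ServerInfo("a", "r1", 10, 100), ServerInfo("b", "r1", 20, 100),
           ServerInfo("c", "r2", 50, 100), ServerInfo("d", "r3", 90, 100)]
print(alloc.allocate(servers))  # (1, ['a', 'c', 'd'])
```

Server `b` is skipped even though it is the second-emptiest, because `a` already covers rack `r1`.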
Implement Chunk Replication with Pipeline Writes
When a client writes data, the primary chunk server coordinates replication to all secondaries. GFS uses a **pipeline** design where data flows in a c...
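A minimal sketch of the data-push half of a pipelined write. In-process method calls stand in for network transfers, and the chain order is given as input. In the GFS paper, each server instead forwards to the closest server that has not yet received the data. Each server buffers a piece and forwards it immediately, so every link in the chain carries data at the same time. The helper `ideal_push_seconds` encodes the paper's ideal elapsed time B/T + RL for B bytes, R replicas, throughput T, and per-hop latency L. All names are hypothetical, and the primary's ordering and commit step is not shown.

```python
class PipelineServer:
    """A chunk server that buffers pushed data and forwards each piece down the chain."""

    def __init__(self, name):
        self.name = name
        self.next = None  # next server in the chain, or None for the tail
        self.buffer = {}  # data_id -> bytearray; staged, not yet applied to the chunk

    def receive(self, data_id, piece):
        self.buffer.setdefault(data_id, bytearray()).extend(piece)
        if self.next is not None:
            # Forward immediately instead of waiting for the whole payload to arrive.
            self.next.receive(data_id, piece)


def build_chain(servers):
    """Link servers in the given order and return the head of the chain."""
    for upstream, downstream in zip(servers, servers[1:]):
        upstream.next = downstream
    return servers[0]


def push(head, data_id, payload, piece_size=64 * 1024):
    """Client side: send the payload to the head of the chain in fixed-size pieces."""
    for i in range(0, len(payload), piece_size):
        head.receive(data_id, payload[i:i + piece_size])


def ideal_push_seconds(num_bytes, throughput_bytes_per_s, replicas, hop_latency_s):
    """B/T + R*L: the ideal pipelined transfer time given in the GFS paper."""
    return num_bytes / throughput_bytes_per_s + replicas * hop_latency_s


chain = [PipelineServer(n) for n in ("s1", "s2", "s3")]
push(build_chain(chain), "write-1", bytes(range(256)) * 1000)
```

With pipelining, adding a replica costs roughly one extra hop of latency rather than a full extra copy of the payload. For example, 1 MB over 12.5 MB/s links with 1 ms per hop and three replicas gives about 0.08 s + 3 × 0.001 s = 0.083 s.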
Implement Chunk Leases for Primary Assignment
A chunk lease grants one chunk server the exclusive right to define the mutation order for a chunk. This avoids per-operation consensus while maintain...
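A sketch of the master's lease bookkeeping. It assumes the 60-second initial lease timeout from the GFS paper and an injectable clock for testing. `LeaseManager`, `get_primary`, and `extend` are hypothetical names, and picking the first live replica as primary is an arbitrary policy. Two properties matter here:

- The master never grants a second lease while one is unexpired.
- It increments the chunk version number on every new lease, as GFS does, so replicas that missed mutations can be detected as stale.

```python
import time
from dataclasses import dataclass

LEASE_SECONDS = 60.0  # initial lease timeout used in the GFS paper


@dataclass
class Lease:
    primary: str
    expires_at: float


class LeaseManager:
    def __init__(self, clock=time.monotonic):
        self.clock = clock  # injectable so tests can control time
        self.leases = {}    # chunk handle -> Lease
        self.versions = {}  # chunk handle -> chunk version number

    def get_primary(self, handle, live_replicas):
        now = self.clock()
        lease = self.leases.get(handle)
        if lease is not None and lease.expires_at > now:
            # Never hand out a second lease while one is unexpired, even if the current
            # primary looks unreachable: two primaries could order mutations differently.
            return lease.primary
        if not live_replicas:
            raise RuntimeError("no live replica can take the lease")
        primary = live_replicas[0]  # hypothetical policy: first live replica
        # Bump the version on each new lease so stale replicas can be detected later.
        self.versions[handle] = self.versions.get(handle, 0) + 1
        self.leases[handle] = Lease(primary, now + LEASE_SECONDS)
        return primary

    def extend(self, handle, server):
        """Called when the current primary asks for more time (e.g. via a heartbeat)."""
        now = self.clock()
        lease = self.leases.get(handle)
        if lease is None or lease.primary != server or lease.expires_at <= now:
            return False
        lease.expires_at = now + LEASE_SECONDS
        return True
```

If the master loses contact with a primary, it simply waits out the lease before granting a new one. That wait is what makes the no-consensus design safe.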
Fault Tolerance and Rebalancing
Implement Chunk Server Heartbeats
Chunk server heartbeats are the master's only mechanism for tracking which servers are alive and which chunks they hold. Without heartbeats, the maste...
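A minimal sketch of heartbeat tracking on the master. It assumes a 30-second timeout (a hypothetical value) and, as a simplification, that each heartbeat carries the server's complete chunk list. In the GFS paper, a heartbeat reports only a subset of a server's chunks. `HeartbeatMonitor` and its methods are hypothetical.

```python
import time


class HeartbeatMonitor:
    def __init__(self, timeout_s=30.0, clock=time.monotonic):
        self.timeout_s = timeout_s  # hypothetical value; tune to the heartbeat interval
        self.clock = clock
        self.last_seen = {}         # server id -> time of last heartbeat
        self.reported = {}          # server id -> set of chunk handles it reported

    def on_heartbeat(self, server_id, chunk_handles):
        # Simplification: the full chunk list arrives with every heartbeat, so the
        # master rebuilds chunk locations from reports instead of persisting them.
        self.last_seen[server_id] = self.clock()
        self.reported[server_id] = set(chunk_handles)

    def dead_servers(self):
        now = self.clock()
        return sorted(s for s, t in self.last_seen.items() if now - t > self.timeout_s)

    def chunk_locations(self):
        """Map chunk handle -> set of live servers that reported holding it."""
        dead = set(self.dead_servers())
        locations = {}
        for server, handles in self.reported.items():
            if server in dead:
                continue
            for handle in handles:
                locations.setdefault(handle, set()).add(server)
        return locations
```

The output of `chunk_locations` is exactly what a re-replication planner needs: any chunk with fewer live holders than the target is under-replicated.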
Implement Automatic Re-Replication
When a chunk server dies, its chunks become under-replicated. The master must automatically schedule re-replication to restore the target replication ...
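A sketch of how the master might turn chunk locations into re-replication work, assuming `locations` already contains only live holders. Chunks missing more replicas are scheduled first, which is one of the priority factors the GFS paper describes. New replicas go to the least-loaded servers that do not already hold the chunk. `plan_re_replication` and the load heuristic (current chunk count per server) are hypothetical.

```python
import heapq
from collections import Counter


def plan_re_replication(locations, live_servers, target=3):
    """Return (chunk handle, source, destination) copy tasks, most urgent first."""
    # Current load per server, used to spread new replicas onto lightly loaded servers.
    load = Counter(s for holders in locations.values() for s in holders)
    heap = []
    for handle, holders in locations.items():
        missing = target - len(holders)
        # A chunk with no surviving replica cannot be recovered by copying; skip it.
        if missing > 0 and holders:
            heapq.heappush(heap, (-missing, handle))  # most missing replicas pops first

    plan = []
    while heap:
        neg_missing, handle = heapq.heappop(heap)
        holders = locations[handle]
        source = min(holders)  # hypothetical: any live holder can serve as the source
        candidates = sorted((s for s in live_servers if s not in holders),
                            key=lambda s: (load[s], s))
        for dest in candidates[:-neg_missing]:
            plan.append((handle, source, dest))
            load[dest] += 1  # account for the planned copy when placing later chunks
    return plan


locations = {1: {"a"}, 2: {"a", "b"}, 3: {"a", "b", "c"}}
print(plan_re_replication(locations, ["a", "b", "c", "d"]))
# [(1, 'a', 'd'), (1, 'a', 'c'), (2, 'a', 'd')]
```

Chunk 1 has only one replica left, so both of its copies are scheduled before chunk 2's single copy.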
Implement Chunk Server Load Balancing
Over time, chunk distribution becomes uneven: new servers start empty, old servers fill up, and some receive more writes. Load balancing moves chunks ...
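A greedy rebalancing sketch. It assumes chunk count per server is an acceptable proxy for disk usage; real systems usually balance bytes and also throttle moves. The planner repeatedly moves one chunk from the fullest server to the emptiest until their counts differ by at most `slack` or `max_moves` is reached. `plan_rebalance` and both parameters are hypothetical.

```python
def plan_rebalance(chunks_by_server, max_moves=10, slack=1):
    """Return (chunk handle, source, destination) moves that even out chunk counts."""
    holdings = {s: set(chunks) for s, chunks in chunks_by_server.items()}  # work on a copy
    moves = []
    while len(moves) < max_moves:
        # Ties are broken by server id so the plan is deterministic.
        src = max(holdings, key=lambda s: (len(holdings[s]), s))
        dst = min(holdings, key=lambda s: (len(holdings[s]), s))
        if len(holdings[src]) - len(holdings[dst]) <= slack:
            break  # balanced within tolerance
        movable = sorted(holdings[src] - holdings[dst])  # never create a duplicate replica
        if not movable:
            break  # simplification: a smarter planner would try another server pair
        handle = movable[0]
        holdings[src].remove(handle)
        holdings[dst].add(handle)
        moves.append((handle, src, dst))
    return moves


cluster = {"a": {1, 2, 3, 4, 5}, "b": {6}, "c": set()}
print(plan_rebalance(cluster))  # [(1, 'a', 'c'), (2, 'a', 'b'), (3, 'a', 'c')]
```

Each planned move would be executed as copy-then-delete, so a chunk is never left with fewer replicas mid-move.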
Implement Master Failover with Shadow Master
The master is a single point of failure. A shadow master mitigates this by continuously replaying the primary master's write-ahead log (WAL), staying nearly synchronized. Fai...
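A minimal sketch of WAL replay on a shadow master. It assumes a log of sequence-numbered records and a hypothetical three-operation format (`create`, `add_chunk`, `delete`). The shadow applies records strictly in order and skips any it has already applied. On promotion, it catches up to the end of the log before accepting writes. `LogRecord`, `ShadowMaster`, and `promote` are hypothetical names. Real failover also needs a way to fence off the old primary, which is not shown.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LogRecord:
    seq: int                      # position in the log, starting at 1
    op: str                       # "create", "add_chunk", or "delete" (hypothetical op set)
    path: str
    handle: Optional[int] = None


class MetadataState:
    def __init__(self):
        self.files = {}  # path -> list of chunk handles

    def apply(self, rec):
        if rec.op == "create":
            self.files[rec.path] = []
        elif rec.op == "add_chunk":
            self.files[rec.path].append(rec.handle)
        elif rec.op == "delete":
            del self.files[rec.path]
        else:
            raise ValueError(f"unknown op {rec.op!r}")


class ShadowMaster:
    def __init__(self):
        self.state = MetadataState()
        self.applied_seq = 0
        self.read_only = True  # shadows serve reads only until promoted

    def replay(self, log):
        for rec in log:
            if rec.seq <= self.applied_seq:
                continue  # already applied, so replaying an overlapping tail is safe
            if rec.seq != self.applied_seq + 1:
                raise RuntimeError(f"gap in log before seq {rec.seq}")
            self.state.apply(rec)
            self.applied_seq = rec.seq

    def lag(self, log):
        return (log[-1].seq if log else 0) - self.applied_seq

    def promote(self, log):
        # Apply every durable record before accepting writes, so no acknowledged
        # mutation is lost in the failover.
        self.replay(log)
        self.read_only = False
```

The `lag` value is what "nearly synchronized" means in practice: reads served by a shadow can be stale by that many records.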
Implement Chunk Checksums for Data Integrity
Disks can silently corrupt data without any error signal. Chunk checksums detect this corruption before it is returned to users. Checksum design: 1. ...
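A sketch of block-level checksumming in the style the GFS paper describes. Each chunk is split into 64 KB blocks with one 32-bit checksum per block, and a read verifies every block it overlaps before returning any data. Using `zlib.crc32` as the checksum function is an assumption. `ChecksummedChunk` and `ChecksumError` are hypothetical names.

```python
import zlib

BLOCK_SIZE = 64 * 1024  # GFS keeps one 32-bit checksum per 64 KB block


class ChecksumError(Exception):
    pass


class ChecksummedChunk:
    def __init__(self, data):
        self.data = bytearray(data)
        # One CRC-32 per block, computed when the data is written.
        self.checksums = [zlib.crc32(self.data[i:i + BLOCK_SIZE])
                          for i in range(0, len(self.data), BLOCK_SIZE)]

    def read(self, offset, length):
        if offset < 0 or length <= 0 or offset + length > len(self.data):
            raise ValueError("read outside chunk")
        first = offset // BLOCK_SIZE
        last = (offset + length - 1) // BLOCK_SIZE
        # Verify every block overlapping the range before returning any bytes.
        for b in range(first, last + 1):
            block = self.data[b * BLOCK_SIZE:(b + 1) * BLOCK_SIZE]
            if zlib.crc32(block) != self.checksums[b]:
                raise ChecksumError(f"block {b} is corrupt")
        return bytes(self.data[offset:offset + length])
```

Corruption in one block does not affect reads of the others. In GFS, a mismatch is returned as an error and reported to the master. The client then reads from another replica, and the master re-replicates the chunk from a good copy.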
Interview Prep
Common interview questions for Infrastructure / Storage Engineer roles that map directly to what you build in this track.
Questions are representative of real interview patterns. Model answers are starting points — adapt them with your own experience and the specific context of the interview.
Common Mistakes
The top 5 mistakes builders make in this track, with the root cause of each and exactly how to fix it.
Comparison Mode
Side-by-side comparisons of the approaches, algorithms, and trade-offs you encounter in this track.
Concepts Covered
Prerequisites
It is recommended to complete the previous tracks before starting this one. Concepts build progressively throughout the curriculum.
Rabbit Holes
For when you want to go deeper. Curated papers, posts, and talks beyond what this track covers.
The Google File System
The 2003 SOSP paper describing GFS — the distributed file system that influenced HDFS, Ceph, and essentially every large-scale distributed storage system that followed. Refreshingly honest about the trade-offs made for Google's specific workload.
The Hadoop Distributed File System
HDFS was designed as an open-source implementation of GFS. This paper covers the key differences — stronger consistency model, the NameNode design — and the operational experience from running HDFS at Yahoo scale.
Ceph: A Scalable, High-Performance Distributed File System
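The Ceph paper from OSDI 2006 introducing CRUSH (Controlled Replication Under Scalable Hashing) for data placement. Ceph's key architectural departure from GFS is that CRUSH lets any client compute where an object's replicas live, so no central server has to maintain chunk-location tables. File-system metadata is spread across a dynamically partitioned cluster of metadata servers instead of a single master.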