Understanding the Architecture of Google File System (Part 2)

2025-05-18
Intro
- Hello everyone, in the last part, we understood the need for building a system that can handle a large amount of data.
- We also saw why a traditional file system wouldn’t be able to handle the scale that Google was looking for.
- In this part, we will go through the high-level architecture of the GFS, understanding the responsibilities of each of the components and what the overall flow looks like whenever clients make any requests.
- But before that, let’s understand the Goals and Requirements of our system
Goals and requirements
Scalability to Handle Large Data Volumes
- One of the most critical requirements was scalability.
- Google needed a system that could store and manage petabytes of data (that’s millions of gigabytes) across thousands of machines, without slowing them down.
Fault Tolerance and High Availability
- Data loss is not an option.
- Google needed a system that could keep data available and safe, even if individual servers failed.
- GFS had to be highly fault-tolerant, meaning that data should be accessible even if machines went down.
Optimised for Large Sequential Reads and Appends
- Google needed a system optimised for reading and writing large amounts of data in a single go, rather than lots of small transactions.
- This meant GFS was designed to perform well for large, sequential reads and writes, particularly when appending new data to existing files.
Simple but Efficient
- GFS was designed to be simple, yet highly efficient.
- Google didn’t want to use expensive, fail-proof hardware; they wanted to design a system using commodity hardware, knowing that with thousands of commodity machines, failures would be common.
- GFS had to handle these failures in the background while maintaining high performance and ease of use for developers.
To summarise, GFS had to meet the following key requirements:
- Able to scale to massive data volumes
- Fault-tolerant and highly available
- Optimised for large, sequential reads and appends
- Simple and efficient
- Able to operate on affordable commodity hardware where failures are common
Now let’s explore how its architecture was designed to meet these goals.
Architecture
- Google decided to store files in chunks — small parts of a large file. You can imagine it like breaking a massive 10,000-page book into smaller volumes and placing them across multiple bookshelves, rather than keeping it all on a single shelf.

- This was a conscious decision: clients would generally not need a whole large file, only a certain section of it, and they can request another section separately if needed. To relate, if you are reading a book series, it doesn’t make sense to grab all the books at once; you take one book, go through it, and get the next one if required.
- With the above point in mind, let’s deep dive into the architecture.
- Its architecture is made up of three main components: the Master Server, Chunk Servers, and the Clients that interact with them.
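Because chunks have a fixed size, a client can compute which chunk holds any byte offset with simple arithmetic. Here is a minimal Python sketch of that idea; the constant and function names are illustrative, not from the GFS paper:

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB, the fixed GFS chunk size

def chunk_index(byte_offset: int) -> int:
    """Map a byte offset within a file to the index of the chunk holding it."""
    return byte_offset // CHUNK_SIZE

def chunk_count(file_size: int) -> int:
    """Number of chunks needed to store a file of the given size (round up)."""
    return (file_size + CHUNK_SIZE - 1) // CHUNK_SIZE

# A 200 MB file spans 4 chunks; byte 100,000,000 falls inside chunk 1.
```

This is why a client can ask for just the section it needs: the offset alone tells it which chunk to request.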

Master Server
At the centre of GFS is the Master Server. Think of it like the library’s head librarian — it keeps track of where every book (or chunk of data) is stored, manages the metadata, and coordinates access to the data. However, unlike traditional systems, the Master Server doesn’t get overloaded, because it doesn’t handle the actual data directly — it only manages metadata and access control. It is like a librarian who knows where every book is, while the reader goes directly to the shelf to fetch the book themselves.
Key Responsibilities
- Metadata Management: The Master Server stores metadata like the namespace (the file structure), file-to-chunk mapping, and the locations of each chunk.
- Client Coordination: It handles client requests to locate the chunks but doesn’t serve the actual data.
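To make these responsibilities concrete, here is a minimal Python sketch of the two in-memory tables described above: the file-to-chunk mapping and the chunk locations. The class and method names (`MasterMetadata`, `lookup`) are illustrative, not from the GFS paper:

```python
from dataclasses import dataclass, field

@dataclass
class MasterMetadata:
    # file path -> ordered list of chunk handles (file-to-chunk mapping)
    file_to_chunks: dict = field(default_factory=dict)
    # chunk handle -> addresses of Chunk Servers holding a replica
    chunk_locations: dict = field(default_factory=dict)

    def lookup(self, path: str, index: int):
        """Resolve (path, chunk index) to a chunk handle and its replica locations.
        Note: only metadata is returned; the Master never touches file bytes."""
        handle = self.file_to_chunks[path][index]
        return handle, self.chunk_locations[handle]

meta = MasterMetadata()
meta.file_to_chunks["/logs/web-00"] = ["chunk-aa", "chunk-ab"]
meta.chunk_locations["chunk-aa"] = ["cs1", "cs2", "cs3"]
meta.chunk_locations["chunk-ab"] = ["cs2", "cs4", "cs5"]
```

A `lookup("/logs/web-00", 1)` call returns the handle `"chunk-ab"` and its three replica locations; the client then talks to those Chunk Servers itself.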
Chunk Server
The Chunk Servers are where the actual data lives. They store large files that have been broken into smaller, manageable pieces called chunks. Each chunk is 64 MB in size, and every chunk is replicated across multiple Chunk Servers to ensure data reliability.
Key Responsibilities
- Data Storage: Chunk Servers store chunks of data, and each chunk is replicated three times by default to ensure redundancy.
- Fault Tolerance: If a Chunk Server fails, other servers with replicas of the data can quickly take over.
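The failure-handling idea can be sketched as follows: when a Chunk Server goes down, the Master scans its chunk-location table for chunks that have dropped below the target replica count and schedules re-replication for them. A minimal sketch, assuming a plain dict of handle-to-servers (the function name `under_replicated` is mine, not from the paper):

```python
REPLICATION_FACTOR = 3  # GFS default: three replicas per chunk

def under_replicated(chunk_locations: dict, failed_server: str):
    """List chunks that fall below the target replica count once
    failed_server is removed; the Master would re-replicate these."""
    needy = []
    for handle, servers in chunk_locations.items():
        live = [s for s in servers if s != failed_server]
        if len(live) < REPLICATION_FACTOR:
            needy.append((handle, live))
    return needy

locations = {
    "chunk-aa": ["cs1", "cs2", "cs3"],
    "chunk-ab": ["cs2", "cs4", "cs5"],
    "chunk-ac": ["cs1", "cs4", "cs5"],
}
# If cs2 fails, chunk-aa and chunk-ab each lose a replica and need a new copy.
```

Reads keep working throughout, because the two surviving replicas of each affected chunk can still serve data while the third copy is rebuilt.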
GFS client library
- In the Google File System, when you save a large file, it’s not stored as a single, monolithic block of data. Instead, the file is divided into smaller, manageable pieces called ‘chunks.’ But who is responsible for this division? It’s the GFS Client Library.
- The GFS Client Library handles the initial task of splitting a large file into these fixed-size chunks, typically 64 megabytes each. This division makes it easier to manage, store, and access data efficiently.
- Once the file is split into chunks, the client library communicates with the Master Server. The Master Server stores metadata about where each chunk is stored across the Chunk Servers. It keeps track of chunk locations and replication status, and ensures data consistency.
- Armed with the chunk location information, the client library directly interacts with the respective Chunk Servers. It sends data to be stored or retrieves data when needed, bypassing the Master Server to reduce bottlenecks and enhance performance.
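The splitting step can be sketched in a few lines of Python. This is only an illustration of fixed-size chunking, not the real client library, which works with chunk handles and streams data rather than holding a whole file in memory:

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB, the fixed GFS chunk size

def split_into_chunks(data: bytes, chunk_size: int = CHUNK_SIZE):
    """Split a file's bytes into fixed-size pieces, as the client
    library conceptually does before placing them on Chunk Servers."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]

# With a toy 3-byte chunk size, 8 bytes become three chunks; only the
# last chunk of a file may be shorter than the chunk size.
```
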
Control Flow
- The Master Server controls how data is accessed, but doesn’t transfer the actual data.
- When a client wants to read or write data, it first contacts the Master Server to find out where the relevant chunks are stored.
- Once the locations are identified, the client communicates directly with the Chunk Servers to access the data.
- This control flow keeps the system efficient, even under heavy load.
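The read path above can be sketched end to end. `Master`, `ChunkServer`, and `gfs_read` below are illustrative stand-ins, not the real API: the Master resolves a (path, chunk index) pair to a chunk handle and replica locations, and the client then fetches the bytes directly from a Chunk Server:

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB

class Master:
    """Holds only metadata; never touches file bytes."""
    def __init__(self, file_to_chunks, chunk_locations):
        self.file_to_chunks = file_to_chunks    # path -> [chunk handles]
        self.chunk_locations = chunk_locations  # handle -> [server names]

    def locate(self, path, chunk_index):
        handle = self.file_to_chunks[path][chunk_index]
        return handle, self.chunk_locations[handle]

class ChunkServer:
    """Holds the actual chunk bytes on local disk (a dict here)."""
    def __init__(self, chunks):
        self.chunks = chunks  # handle -> bytes

    def read(self, handle, offset, length):
        return self.chunks[handle][offset:offset + length]

def gfs_read(master, servers, path, offset, length):
    index, within = divmod(offset, CHUNK_SIZE)
    handle, replicas = master.locate(path, index)   # control flow: ask the Master
    return servers[replicas[0]].read(handle, within, length)  # data flow: Chunk Server
```

Note that the file bytes flow only between the client and the Chunk Server; the Master answers one small metadata question and drops out of the path.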
Data Flow
- Data flow happens between the Client and the Chunk Servers directly.
- This reduces network bottlenecks and speeds up data transfer.
- The Master Server only gets involved when new chunks need to be allocated or replicated, which allows the system to scale smoothly.
Outro
- In summary, the architecture of the Google File System is meticulously designed to meet the demanding needs of large-scale data management.
- With the Master Server coordinating metadata, Chunk Servers handling data storage, and the GFS Client Library efficiently managing chunk creation and replication, GFS stands as a robust, scalable, and fault-tolerant solution that powers some of Google’s most critical services.
- This was a very high-level view of the architecture. In future parts, we will understand more advanced topics like replication, snapshots, atomic records, etc.
- So stay tuned, until next time…
GFS other parts
- https://medium.com/@shivamgor498/the-google-file-system-gfs-why-traditional-file-systems-werent-enough-d4963bbf4d3d
- https://medium.com/@shivamgor498/data-flow-and-reliability-in-the-google-file-system-part-3-c3da5e45089f
- https://medium.com/@shivamgor498/the-brain-behind-gfs-master-integrity-and-legacy-google-file-system-part-4-534836dc93e5
References
https://storage.googleapis.com/gweb-research2023-media/pubtools/4446.pdf
Other Blogs
https://medium.com/@shivamgor498/java-virtual-thread-ced98c382212