What is Google File System?
Google File System (GFS) is a scalable distributed file system (DFS) created and developed by Google Inc. to meet Google's growing computing needs. GFS provides fault tolerance, reliability, scalability, availability, and performance for large networks and connected nodes. GFS consists of several storage systems that are built up from inexpensive standard hardware components.
It has been optimized for Google's particular data usage and storage requirements, for example those of its search engine, which generates huge amounts of data that must be stored. The Google File System builds on the strengths of standard off-the-shelf servers while compensating for their hardware weaknesses. GFS is also known as GoogleFS.
A GFS cluster consists of a single master and multiple chunk servers that are continuously accessed by different client systems. Chunk servers store data as Linux files on local hard drives. Stored data is divided into large chunks of 64 MB, each replicated at least three times across the network. The large chunk size reduces network overhead because clients need to contact the master less often.
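To make the fixed chunk size concrete, the following minimal sketch shows how a byte offset within a file could be mapped to a chunk index, and how many chunks a read would touch. The function names (`chunk_index`, `chunks_for_range`, `chunk_count`) are hypothetical and chosen for illustration only; they are not part of any GFS API.

```python
from typing import List

CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB, the chunk size described in the text


def chunk_index(offset: int) -> int:
    """Return the index of the chunk that contains the given byte offset."""
    return offset // CHUNK_SIZE


def chunks_for_range(offset: int, length: int) -> List[int]:
    """Return the indices of all chunks touched by a read of `length` bytes."""
    if length <= 0:
        return []
    first = offset // CHUNK_SIZE
    last = (offset + length - 1) // CHUNK_SIZE
    return list(range(first, last + 1))


def chunk_count(file_size: int) -> int:
    """Return how many chunks are needed to hold a file (ceiling division)."""
    return -(-file_size // CHUNK_SIZE)
```

Under these assumptions, a 1 GiB file occupies only 16 chunks, so a client reading it sequentially would need location information for 16 chunks rather than for thousands of small blocks.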
GFS was designed to meet Google's large cluster requirements without burdening applications. Files are stored in hierarchical directories identified by path names. Metadata such as the namespace, access control data, and mapping information is managed by the master, which communicates with each chunk server and monitors its status through periodic heartbeat messages.
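The heartbeat-based monitoring described above can be sketched roughly as follows. This is an illustrative toy model, not Google's implementation: the `Master` class, the `timeout` value, and the assumption that each heartbeat carries the full set of chunk handles held by a server are all hypothetical.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Master:
    timeout: float = 10.0  # hypothetical: seconds of silence before a server counts as dead
    last_seen: Dict[str, float] = field(default_factory=dict)
    chunks_by_server: Dict[str, Set[str]] = field(default_factory=dict)

    def heartbeat(self, server_id: str, chunk_handles: Set[str],
                  now: Optional[float] = None) -> None:
        """Record a heartbeat and the chunks the server reports holding."""
        self.last_seen[server_id] = time.monotonic() if now is None else now
        self.chunks_by_server[server_id] = set(chunk_handles)

    def live_servers(self, now: Optional[float] = None) -> Set[str]:
        """Return servers whose last heartbeat is within the timeout."""
        now = time.monotonic() if now is None else now
        return {s for s, t in self.last_seen.items() if now - t <= self.timeout}

    def under_replicated(self, min_replicas: int = 3,
                         now: Optional[float] = None) -> List[str]:
        """Return chunk handles with fewer than `min_replicas` live copies."""
        live = self.live_servers(now)
        all_handles = set().union(*self.chunks_by_server.values())
        result = []
        for handle in sorted(all_handles):
            copies = sum(1 for s in live if handle in self.chunks_by_server[s])
            if copies < min_replicas:
                result.append(handle)
        return result
```

In this sketch, a chunk server that stops sending heartbeats simply drops out of `live_servers`, and any chunk it held shows up in `under_replicated` once fewer than three live copies remain, which is the point at which a real system would schedule re-replication.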
GFS features include:
- Fault tolerance
- Critical data replication
- Automatic and efficient data recovery
- High aggregate throughput
- Reduced client and master interaction due to the large chunk size
- Namespace management and locking
- High availability
The largest GFS clusters have more than 1,000 nodes with a storage capacity of 300 TB, which hundreds of clients can access continuously.