谷歌分布式文件系统：Google File System详解

需积分: 10 109 浏览量更新于2024-07-29 收藏 269KB PDF 举报

"Google 文件系统是一种可扩展的分布式文件系统，设计用于大型数据密集型应用。它在廉价的商业硬件上运行，提供高聚合性能，并为大量客户端提供故障容忍能力。该系统根据谷歌自身的工作负载和不断变化的技术环境进行设计，与早期的分布式文件系统有显著不同，挑战了传统选择并探索了全新的设计思路。目前，Google 文件系统在谷歌内部广泛部署，作为存储平台支持服务、研究和开发项目的数据生成和处理需求。最大的集群至今已提供了数百TB的存储空间，横跨数千个磁盘，分布在一千多台机器上。" 谷歌文件系统（Google File System，简称GFS）是谷歌为了满足大规模分布式计算需求而创建的一种创新性文件系统。它的主要设计目标包括高可用性、可扩展性和高性能，同时考虑到运行在低成本硬件上的实际情况。 1. **分布式架构**：GFS通过将数据分割成大块（通常为64MB）并分散存储在多台机器上，实现了数据的分布式存储。每个数据块都有多个副本，以提高容错性和可用性。主服务器（Master Node）负责管理文件系统的元数据，包括文件到数据块的映射，以及块的位置信息。 2. **数据冗余与容错**：GFS采用三副本策略，确保即使在一个或两个副本失效的情况下，系统仍能正常工作。主服务器会监控副本的状态，并在必要时进行恢复或重新分配。 3. **高性能读写**：由于数据块较大，GFS优化了大文件的读写操作，减少了磁盘I/O的开销。写操作通常追加到数据块的末尾，避免了数据块中间的随机修改，从而提高了性能。 4. **租约机制**：为了防止多个客户端同时修改同一数据块，GFS引入了租约机制。客户端在访问数据块时会获取一个租约，只有持有有效租约的客户端才能进行修改，确保了数据的一致性。 5. **粗粒度锁**：GFS使用粗粒度锁来简化并发控制，每个数据块只有一个锁，而不是为文件的每个部分都设置锁。这使得大规模并行访问成为可能，提高了系统性能。 6. **故障检测与恢复**：主服务器定期检查节点的健康状况，发现故障后，会自动将受影响的数据块的副本重新分配到其他健康的节点，保证服务的连续性。 7. **客户端库**：GFS通过客户端库与应用程序交互，库中包含了对文件系统操作的封装，如打开、关闭、读写等，同时也处理了与主服务器和块服务器的通信细节。 8. **弹性扩展**：随着数据量的增长，GFS能够无缝地添加新的硬件节点，扩大存储容量和处理能力，这得益于其分布式和模块化的设计。 9. **应用驱动的设计**：GFS的设计不是基于通用原则，而是根据谷歌内部的应用需求，如网页索引、日志处理等，这些应用的特点是大数据量、高吞吐和低延迟。 10. **优化的块服务器**：块服务器负责实际的数据存储和访问，它们在内存中缓存最近访问的数据块，以提高访问速度。 Google 文件系统是为了解决大规模分布式计算中的存储问题而诞生的，它的设计理念和实现方式对后来的分布式文件系统产生了深远影响，例如Hadoop的HDFS就是受到了GFS的启发。通过在廉价硬件上构建高可用和高性能的文件系统，GFS成功地支撑了谷歌的众多服务和大规模数据处理任务。

Legend:

Data messages

Control messages

Application

(file name, chunk index)

(chunk handle,

chunk locations)

GFS master

File namespace

/foo/bar

Instructions to chunkserver

Chunkserver state

GFS chunkserverGFS chunkserver

(chunk handle, byte range)

chunk data

chunk 2ef0

Linux file system Linux file system

GFS client

Figure 1: GFS Architecture

and replication decisions using global knowledge. However,

we must minimize its involvement in reads and writes so

that it does not become a bottleneck. Clients never read

and write ﬁle data through the master. Instead, a client asks

the master which chunkservers it should contact. It caches

this information for a limited time and interacts with the

chunkservers directly for many subsequent operations.

Let us explain the interactions for a simple read with refer-

ence to Figure 1. First, using the ﬁxed chunk size, the client

translates the ﬁle name and byte oﬀset speciﬁed by the ap-

plication into a chunk index within the ﬁle. Then, it sends

the master a request containing the ﬁle name and chunk

index. The master replies with the corresponding chunk

handle and locations of the replicas. The client caches this

information using the ﬁle name and chunk index as the key.

The client then sends a request to one of the replicas,

most likely the closest one. The request speciﬁes the chunk

handle and a byte range within that chunk. Further reads

of the same chunk require no more client-master interaction

until the cached information expires or the ﬁle is reopened.

In fact, the client typically asks for multiple chunks in the

same request and the master can also include the informa-

tion for chunks immediately following those requested. This

extra information sidesteps several future client-master in-

teractions at practically no extra cost.

2.5 Chunk Size

Chunk size is one of the key design parameters. We have

chosen 64 MB, which is much larger than typical ﬁle sys-

tem block sizes. Each chunk replica is stored as a plain

Linux ﬁle on a chunkserver and is extended only as needed.

Lazy space allocation avoids wasting space due to internal

fragmentation, perhaps the greatest objection against such

a large chunk size.

A large chunk size oﬀers several important advantages.

First, it reduces clients’ need to interact with the master

because reads and writes on the same chunk require only

one initial request to the master for chunk location informa-

tion. The reduction is especially signiﬁcant for our work-

loads because applications mostly read and write large ﬁles

sequentially. Even for small random reads, the client can

comfortably cache all the chunk location information for a

multi-TB working set. Second, since on a large chunk, a

client is more likely to perform many operations on a given

chunk, it can reduce network overhead by keeping a persis-

tent TCP connection to the chunkserver over an extended

period of time. Third, it reduces the size of the metadata

stored on the master. This allows us to keep the metadata

in memory, which in turn brings other advantages that we

will discuss in Section 2.6.1.

On the other hand, a large chunk size, even with lazy space

allocation, has its disadvantages. A small ﬁle consists of a

small number of chunks, perhaps just one. The chunkservers

storing those chunks may become hot spots if many clients

are accessing the same ﬁle. In practice, hot spots have not

been a major issue because our applications mostly read

large multi-chunk ﬁles sequentially.

However, hot spots did develop when GFS was ﬁrst used

by a batch-queue system: an executable was written to GFS

as a single-chunk ﬁle and then started on hundreds of ma-

chines at the same time. The few chunkservers storing this

executable were overloaded by hundreds of simultaneous re-

quests. We ﬁxed this problem by storing such executables

with a higher replication factor and by making the batch-

queue system stagger application start times. A potential

long-term solution is to allow clients to read data from other

clients in such situations.

2.6 Metadata

The master stores three major types of metadata: the ﬁle

and chunk namespaces, the mapping from ﬁles to chunks,

and the locations of each chunk’s replicas. All metadata is

kept in the master’s memory. The ﬁrst two types (names-

paces and ﬁle-to-chunk mapping) are also kept persistent by

logging mutations to an operation log stored on the mas-

ter’s local disk and replicated on remote machines. Using

a log allows us to update the master state simply, reliably,

and without risking inconsistencies in the event of a master

crash. The master does not store chunk location informa-

tion persistently. Instead, it asks each chunkserver about its

chunks at master startup and whenever a chunkserver joins

the cluster.

2.6.1 In-Memory Data Structures

Since metadata is stored in memory, master operations are

fast. Furthermore, it is easy and eﬃcient for the master to

periodically scan through its entire state in the background.

This periodic scanning is used to implement chunk garbage

collection, re-replication in the presence of chunkserver fail-

ures, and chunk migration to balance load and disk space

剩余14页未读，继续阅读

颖哥儿

粉丝: 34
资源: 28

谷歌分布式文件系统：Google File System详解

The Google File System中文版

Google File System Paper

The Google File System中文翻译

google file system

Google File System、Lustre File System、Global File System三种分步式文件系统研究

The Google File System

The Google file system

Google技术之Google File System

Google File System.pdf

Google File System中文版

最新资源