Megastore：构建高可用、可扩展的交互式服务存储解决方案

需积分: 15 177 浏览量更新于2024-09-15 收藏 931KB PDF 举报

"Megastore是谷歌开发的一种存储系统，旨在满足现代互动在线服务的需求。它结合了NoSQL数据存储的可扩展性和传统关系型数据库管理系统的便利性，提供强一致性和高可用性。通过细粒度的数据分区，Megastore实现了在广域网内的同步数据写入复制，允许在数据中心之间无缝故障切换，保持低延迟的高性能。" 本文摘要介绍了Megastore的核心特性及其复制算法，并分享了使用Megastore支持各种谷歌生产服务的实际经验。一、Megastore的特性与设计原理 1. 可扩展性：Megastore借鉴了NoSQL数据存储的设计理念，能够随着业务增长灵活扩展，处理海量数据，确保服务的性能不会因数据量增加而显著下降。 2. 强一致性：在细粒度的数据分区基础上，Megastore实现了ACID（原子性、一致性、隔离性和持久性）事务，保证了数据的一致性状态，即使在分布式环境中也能避免数据不一致的问题。 3. 高可用性：通过在多个数据中心进行同步复制，Megastore可以在一个数据中心出现故障时，快速将服务切换到其他正常运行的数据中心，从而确保服务的连续性和稳定性。 4. 同步复制：每个写操作都会被同步地复制到网络中的其他位置，以实现数据的备份和容灾，同时保持较低的延迟，确保用户体验。二、复制算法 Megastore采用了一种创新的复制算法来保证数据的可靠性和一致性。这种算法可能包括主-从复制、多副本一致性协议等技术，确保在节点故障时能快速选举新的主节点，继续提供服务。三、实际应用与经验 Megastore已被广泛应用于谷歌的各种生产服务，如社交媒体、在线协作工具等。通过这些实际部署，谷歌积累了大量关于系统性能、故障恢复、数据安全等方面的经验，不断优化和改进Megastore的设计。四、分类与主题该文可能被归类在计算机科学的分布式系统领域，具体涉及分布式计算、数据存储和容错机制等多个子领域。其内容可能对研究分布式数据库、云存储解决方案以及在线服务的开发者具有重要参考价值。 Megastore是谷歌应对大规模在线服务挑战的一个重要技术创新，它的设计思路和实践经验对于理解如何构建可扩展、高可用且一致性的分布式存储系统具有深远的启示意义。

Figure 1: Scalable Replication

Figure 2: Operations Across Entity Groups

replicated via Paxos). Operations across entity groups could

rely on expensive two-phase commits, but typically leverage

Megastore’s eﬃcient asynchronous messaging. A transac-

tion in a sending entity group places one or more messages

in a queue; transactions in receiving entity groups atomically

consume those messages and apply ensuing mutations.

Note that we use asynchronous messaging between logi-

cally distant entity groups, not physically distant replicas.

All network traﬃc between datacenters is from replicated

op erations, which are synchronous and consistent.

Indexes local to an entity group obey ACID semantics;

those across entity groups have looser consistency. See Fig-

ure 2 for the various operations on and between entity groups.

2.2.2 Selecting Entity Group Boundaries

The entity group deﬁnes the a priori grouping of data

for fast op erations. Boundaries that are too ﬁne-grained

force excessive cross-group operations, but placing too much

unrelated data in a single group serializes unrelated writes,

which degrades throughput.

The following examples show ways applications can work

within these constraints:

Email Each email account forms a natural entity group.

Operations within an account are transactional and

consistent: a user who sends or labels a message is

guaranteed to observe the change despite possible fail-

over to another replica. External mail routers handle

communication between accounts.

Blogs A blogging application would be modeled with mul-

tiple classes of entity groups. Each user has a proﬁle,

which is naturally its own entity group. However, blogs

are collaborative and have no single permanent owner.

We create a second class of entity groups to hold the

p osts and metadata for each blog. A third class keys

oﬀ the unique name claimed by each blog. The appli-

cation relies on asynchronous messaging when a sin-

gle user operation aﬀects both blogs and proﬁles. For

a lower-traﬃc operation like creating a new blog and

claiming its unique name, two-phase commit is more

convenient and performs adequately.

Maps Geographic data has no natural granularity of any

consistent or convenient size. A mapping application

can create entity groups by dividing the glob e into non-

overlapping patches. For mutations that span patches,

the application uses two-phase commit to make them

atomic. Patches must be large enough that two-phase

transactions are uncommon, but small enough that

each patch requires only a small write throughput.

Unlike the previous examples, the number of entity

groups does not grow with increased usage, so enough

patches must be created initially for suﬃcient aggre-

gate throughput at later scale.

Nearly all applications built on Megastore have found nat-

ural ways to draw entity group boundaries.

2.2.3 Physical Layout

We use Google’s Bigtable [15] for scalable fault-tolerant

storage within a single datacenter, allowing us to support

arbitrary read and write throughput by spreading operations

across multiple rows.

We minimize latency and maximize throughput by let-

ting applications control the placement of data: through the

selection of Bigtable instances and speciﬁcation of locality

within an instance.

To minimize latency, applications try to keep data near

users and replicas near each other. They assign each entity

group to the region or continent from which it is accessed

most. Within that region they assign a triplet or quintuplet

of replicas to datacenters with isolated failure domains.

For low latency, cache eﬃciency, and throughput, the data

for an entity group are held in contiguous ranges of Bigtable

rows. Our schema language lets applications control the

placement of hierarchical data, storing data that is accessed

together in nearby rows or denormalized into the same row.

3. A TOUR OF MEGASTORE

Megastore maps this architecture onto a feature set care-

fully chosen to encourage rapid development of scalable ap-

plications. This section motivates the tradeoﬀs and de-

scrib es the developer-facing features that result.

3.1 API Design Philosophy

ACID transactions simplify reasoning about correctness,

but it is equally imp ortant to be able to reason about perfor-

mance. Megastore emphasizes cost-transparent APIs with

runtime costs that match application developers’ intuitions.

Normalized relational schemas rely on joins at query time

to service user operations. This is not the right model for

Megastore applications for several reasons:

• High-volume interactive workloads beneﬁt more from

predictable performance than from an expressive query

language.

225

剩余11页未读，继续阅读

zhoujq

粉丝: 22
资源: 36

Megastore：构建高可用、可扩展的交互式服务存储解决方案

Megastore: Providing Scalable, Highly Available Storage for Interactive Services

SQL和NOSQL融合

google megastore 报告与揭秘

Altera IP MegaStore

Google Megastore分布式存储技术全揭秘.doc

MegaStore:具有数据库交互的 Android 应用程序

Megastore事务机制与分布式存储解析

谷歌Megastore：分布式存储技术详解

SQL与NOSQL融合：Megastore存储系统

Google Megastore: 数据复制与一致性设计

最新资源