the write cache in situations it deems necessary, as in activity log [107] updates or enforcement
of implicit write-after-write dependencies. This means additional reliability even in the face of
power failure.
It is important to understand that DRBD can use disk flushes only when layered on top of backing
devices that support them. Most reasonably recent kernels support disk flushes for most SCSI
and SATA devices. Linux software RAID (md) supports disk flushes for RAID-1 provided that all
component devices support them too. The same is true for device-mapper devices (LVM2, dm-
raid, multipath).
Controllers with battery-backed write cache (BBWC) use a battery to back up their volatile
storage. On such devices, when power is restored after an outage, the controller flushes all
pending writes out to disk from the battery-backed cache, ensuring that all writes committed
to the volatile cache are actually transferred to stable storage. When running DRBD on top
of such devices, it may be acceptable to disable disk flushes, thereby improving DRBD’s write
performance. See Section 6.13, “Disabling backing device flushes” [46] for details.
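Flushes are disabled per resource, in the disk section of the configuration. The following is a minimal sketch, assuming DRBD 8.4-style option names and a hypothetical resource named r0 backed by a BBWC-equipped controller:

  resource r0 {
    disk {
      # Only safe because the battery-backed cache guarantees that
      # acknowledged writes eventually reach stable storage
      disk-flushes no;
      md-flushes no;
    }
    ...
  }

Here disk-flushes controls flushes toward the backing device holding the replicated data set, while md-flushes controls flushes for DRBD’s meta data.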
2.11. Disk error handling strategies
If a hard drive used as a backing block device for DRBD fails on one of the nodes, DRBD may
either pass the I/O error on to the upper layer (usually the file system) or it can mask I/O
errors from upper layers.
Passing on I/O errors. If DRBD is configured to pass on I/O errors, any such errors occurring on
the lower-level device are transparently passed to upper I/O layers. Thus, it is left to upper layers
to deal with such errors (this may result in a file system being remounted read-only, for example).
This strategy does not ensure service continuity, and is hence not recommended for most users.
Masking I/O errors. If DRBD is configured to detach on lower-level I/O errors, DRBD will do
so automatically upon occurrence of the first such error. The I/O error is masked
from upper layers while DRBD transparently fetches the affected block from the peer node, over
the network. From then onwards, DRBD is said to operate in diskless mode, and carries out all
subsequent I/O operations, read and write, on the peer node. Performance in this mode will be
reduced, but the service continues without interruption, and can be moved to the peer node in
a deliberate fashion at a convenient time.
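Whether a node has detached and is operating in diskless mode can be verified with drbdadm. A brief sketch, assuming a resource named r0; the exact output format may vary between DRBD versions:

  drbdadm dstate r0
  Diskless/UpToDate

The first field is the local disk state (here Diskless, following the detach), the second is that of the peer’s disk.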
See Section 6.10, “Configuring I/O error handling strategies” [42] for information on
configuring I/O error handling strategies for DRBD.
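The strategy is selected with the on-io-error option in the disk section of the resource configuration. A minimal sketch, again assuming DRBD 8.4-style syntax and a resource named r0:

  resource r0 {
    disk {
      # detach masks I/O errors as described above;
      # pass_on would forward them to the upper layers instead
      on-io-error detach;
    }
    ...
  }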
2.12. Strategies for dealing with outdated data
DRBD distinguishes between inconsistent and outdated data. Inconsistent data is data that cannot
be expected to be accessible and useful in any manner. The prime example for this is data on
a node that is currently the target of an on-going synchronization. Data on such a node is part
obsolete, part up to date, and it is impossible to tell which parts are which. Thus, for example, if
the device holds a filesystem (as is commonly the case), that filesystem cannot be expected to
mount, or even to pass an automatic filesystem check.
Outdated data, by contrast, is data on a secondary node that is consistent, but no longer in
sync with the primary node. This occurs after any interruption of the replication link, whether
temporary or permanent. Data on an outdated, disconnected secondary node is expected to be
clean, but it reflects a state of the peer node from some time in the past. In order to avoid services using
outdated data, DRBD disallows promoting a resource [3] that is in the outdated state.
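The outdated state can also be set administratively with drbdadm, which makes this restriction easy to observe. A brief sketch, assuming a resource named r0 on a disconnected secondary node:

  drbdadm outdate r0
  drbdadm primary r0

The first command marks the local data as outdated; the second is then refused until the node has reconnected and resynchronized with its peer.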
DRBD has interfaces that allow an external application to outdate a secondary node as soon
as a network interruption occurs. DRBD will then refuse to switch the node to the primary
role, preventing applications from using the outdated data. A complete implementation of this
functionality exists for the Pacemaker cluster management framework [59] (where it uses