DRBD Features
Controllers with battery-backed write cache (BBWC) use a battery to back up their volatile
storage. On such devices, when power is restored after an outage, the controller flushes the most
recent pending writes out to disk from the battery-backed cache, ensuring all writes committed
to the volatile cache are actually transferred to stable storage. When running DRBD on top
of such devices, it may be acceptable to disable disk flushes, thereby improving DRBD's write
performance. See Section 6.12, “Disabling backing device flushes” [43] for details.
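As a sketch, disk flushes are disabled in the disk section of the resource configuration. The resource name r0 is illustrative, and the keywords below follow DRBD 8.3-style syntax; newer releases express the same settings as disk-flushes no; and md-flushes no;:

```
resource r0 {
  disk {
    # Safe only when the backing device sits behind a
    # battery-backed write cache (BBWC):
    no-disk-flushes;   # disable flushes for replicated data writes
    no-md-flushes;     # disable flushes for DRBD metadata writes
  }
}
```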
2.10. Disk error handling strategies
If a hard drive that is used as a backing block device for DRBD on one of the nodes fails, DRBD
may either pass on the I/O error to the upper layer (usually the file system) or it can mask I/O
errors from upper layers.
Passing on I/O errors. If DRBD is configured to “pass on” I/O errors, any such errors occurring
on the lower-level device are transparently passed to upper I/O layers. Thus, it is left to upper
layers to deal with such errors (this may result in a file system being remounted read-only, for
example). This strategy does not ensure service continuity, and is hence not recommended for
most users.
Masking I/O errors. If DRBD is configured to detach on lower-level I/O error, DRBD will do
so automatically upon occurrence of the first such error. The I/O error is masked
from upper layers while DRBD transparently fetches the affected block from the peer node,
over the network. From then onwards, DRBD is said to operate in diskless mode, and carries out
all subsequent I/O operations, read and write, on the peer node. Performance in this mode
inevitably suffers, but the service continues without interruption, and can be moved
to the peer node in a deliberate fashion at a convenient time.
See Section 6.9, “Configuring I/O error handling strategies” [39] for information on
configuring I/O error handling strategies for DRBD.
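The strategy is selected with the on-io-error option in the disk section of the resource configuration. A minimal sketch, with an illustrative resource name:

```
resource r0 {
  disk {
    # detach: mask the error and switch to diskless mode (recommended);
    # pass_on: hand the I/O error to the upper layers;
    # call-local-io-error: invoke the local-io-error handler script.
    on-io-error detach;
  }
}
```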
2.11. Strategies for dealing with outdated data
DRBD distinguishes between inconsistent and outdated data. Inconsistent data is data that cannot
be expected to be accessible and useful in any manner. The prime example for this is data on
a node that is currently the target of an on-going synchronization. Data on such a node is part
obsolete, part up to date, and impossible to identify as either. Thus, for example, if the device
holds a filesystem (as is commonly the case), that filesystem could not be expected to mount,
nor even to pass an automatic filesystem check.
Outdated data, by contrast, is data on a secondary node that is consistent, but no longer in
sync with the primary node. This may occur upon any interruption of the replication link, whether
temporary or permanent. Data on an outdated, disconnected secondary node is expected to be
clean, but it reflects a state of the peer node from some time in the past. In order to avoid services using
outdated data, DRBD disallows promoting [3] a resource that is in the outdated state.
DRBD has interfaces that allow an external application to outdate a secondary node as soon
as a network interruption occurs. DRBD will then refuse to switch the node to the primary
role, preventing applications from using the outdated data. A complete implementation of this
functionality exists for the Heartbeat cluster management framework [66] (where it uses a
communication channel separate from the DRBD replication link). However, the interfaces are
generic and may be easily used by any other cluster management application.
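For the Heartbeat/dopd case, the hook is the fence-peer handler in the resource configuration; the helper path below is illustrative and distribution-dependent:

```
resource r0 {
  handlers {
    # Ask dopd (via Heartbeat's communication channel) to
    # outdate the peer's data when replication is interrupted:
    fence-peer "/usr/lib/heartbeat/drbd-peer-outdater -t 5";
  }
  disk {
    # Fence by outdating data only, rather than
    # powering off the peer node:
    fencing resource-only;
  }
}
```

The same effect can also be produced manually with drbdadm outdate on the node holding the stale data; DRBD will then refuse to promote that resource until the replication link is re-established.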
Whenever an outdated resource has its replication link re-established, its outdated flag is
automatically cleared. A background synchronization [6] then follows.
See the section about the DRBD outdate-peer daemon (dopd) [77] for an example DRBD/
Heartbeat configuration enabling protection against inadvertent use of outdated data.