Design Decision: B4 routers built from merchant switch silicon
Rationale/Benefits: B4 applications are willing to trade more average bandwidth for fault tolerance. Edge application control limits the need for large buffers. The limited number of B4 sites means large forwarding tables are not required. Relatively low router cost allows us to scale network capacity.
Challenges: Sacrifices hardware fault tolerance, deep buffering, and support for large routing tables.

Design Decision: Drive links to 100% utilization
Rationale/Benefits: Allows efficient use of expensive long-haul transport. Many applications are willing to trade higher average bandwidth for predictability. The largest bandwidth consumers adapt dynamically to available bandwidth.
Challenges: Packet loss becomes inevitable with substantial capacity loss during link/switch failure.

Design Decision: Centralized traffic engineering
Rationale/Benefits: Uses multipath forwarding to balance application demands across available capacity in response to failures and changing application demands. Leverages application classification and priority for scheduling in cooperation with edge rate limiting. Traffic engineering with traditional distributed routing protocols (e.g., link-state) is known to be sub-optimal [17, 16] except in special cases [39]. Faster, deterministic global convergence for failures.
Challenges: No existing protocols for this functionality. Requires knowledge of site-to-site demand and importance.

Design Decision: Separate hardware from software
Rationale/Benefits: Customize routing and monitoring protocols to B4 requirements. Rapid iteration on software protocols. Easier to protect against common-case software failures through external replication. Agnostic to a range of hardware deployments exporting the same programming interface.
Challenges: Previously untested development model. Breaks fate sharing between hardware and software.
Table 1: Summary of design decisions in B4.
Figure 2: B4 architecture overview.
stance of Paxos [9] elects one of multiple available software replicas (placed on different physical servers) as the primary instance.
The global layer consists of logically centralized applications (e.g. an SDN Gateway and a central TE server) that enable the central control of the entire network via the site-level NCAs. The SDN Gateway abstracts details of OpenFlow and switch hardware from the central TE server. We replicate global layer applications across multiple WAN sites with separate leader election to set the primary.
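To make the replication pattern concrete, the following is a minimal sketch of how several control-software replicas might defer to a single primary. It is an illustration only: Paxos itself is not shown, a toy lease-based in-memory store stands in for the Paxos-backed election, and names such as PaxosBackedStore and ControlReplica are hypothetical, not B4 components.

```python
# Minimal sketch (not B4's implementation): one of several software replicas,
# placed on different servers, becomes the primary; the others stand by.
import threading
import time


class PaxosBackedStore:
    """Toy stand-in for a replicated, Paxos-backed election service."""

    def __init__(self):
        self._lock = threading.Lock()
        self._leader = None       # current primary replica id, or None
        self._lease_expiry = 0.0  # epoch seconds

    def try_acquire_leadership(self, replica_id, lease_secs=10.0):
        """Atomically claim leadership if no unexpired lease exists."""
        with self._lock:
            now = time.time()
            if (self._leader is None or now > self._lease_expiry
                    or self._leader == replica_id):
                self._leader = replica_id
                self._lease_expiry = now + lease_secs
                return True
            return False


class ControlReplica:
    """One of multiple available software replicas of a control application."""

    def __init__(self, replica_id, store):
        self.replica_id = replica_id
        self.store = store

    def run_once(self):
        if self.store.try_acquire_leadership(self.replica_id):
            print(f"{self.replica_id}: acting as primary instance")
        else:
            print(f"{self.replica_id}: standby, waiting for lease to expire")


store = PaxosBackedStore()
for rid in ("replica-a", "replica-b", "replica-c"):
    ControlReplica(rid, store).run_once()
```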
Each server cluster in our network is a logical “Autonomous System” (AS) with a set of IP prefixes. Each cluster contains a set of BGP routers (not shown in Fig. 2) that peer with B4 switches at each WAN site. Even before introducing SDN, we ran B4 as a single AS providing transit among clusters running traditional BGP/ISIS network protocols. We chose BGP because of its isolation properties between domains and operator familiarity with the protocol. The SDN-based B4 then had to support existing distributed routing protocols, both for interoperability with our non-SDN WAN implementation, and to enable a gradual rollout.
We considered a number of options for integrating existing routing protocols with centralized traffic engineering. In an aggressive approach, we would have built one integrated, centralized service combining routing (e.g., ISIS functionality) and traffic engineering. We instead chose to deploy routing and traffic engineering as independent services, with the standard routing service deployed initially and central TE subsequently deployed as an overlay. This separation delivers a number of benefits. It allowed us to focus initial work on building SDN infrastructure, e.g., the OFC and agent, routing, etc. Moreover, since we initially deployed our network with no new externally visible functionality such as TE, we gained time to develop and debug the SDN architecture before implementing new features.
Perhaps most importantly, we layered traffic engineering on top of baseline routing protocols using prioritized switch forwarding table entries (§ 5). This isolation gave our network a “big red button”; faced with any critical issues in traffic engineering, we could disable the service and fall back to shortest path forwarding. This fault recovery mechanism has proven invaluable (§ 6).
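The sketch below illustrates the layering idea only; it is not B4's switch code. TE entries are installed at a higher priority than baseline shortest-path entries, so deleting the TE priority band falls traffic back to baseline forwarding. The class names and priority values are assumptions for illustration.

```python
# Hedged sketch: TE overlay entries layered above baseline routing entries
# in a priority-ordered forwarding table, with a "big red button" that
# removes the overlay and falls back to shortest-path forwarding.

BASE_PRIORITY = 100  # baseline shortest-path entries (assumed value)
TE_PRIORITY = 200    # TE overlay entries override baseline when present


class FlowEntry:
    def __init__(self, dst_prefix, next_hops, priority):
        self.dst_prefix = dst_prefix
        self.next_hops = next_hops
        self.priority = priority


class SwitchTable:
    def __init__(self):
        self.entries = []

    def install(self, entry):
        self.entries.append(entry)

    def remove_priority(self, priority):
        self.entries = [e for e in self.entries if e.priority != priority]

    def lookup(self, dst_prefix):
        # The highest-priority matching entry wins, as in OpenFlow tables.
        matches = [e for e in self.entries if e.dst_prefix == dst_prefix]
        return max(matches, key=lambda e: e.priority) if matches else None


table = SwitchTable()
table.install(FlowEntry("10.1.0.0/16", ["site-B"], BASE_PRIORITY))            # baseline
table.install(FlowEntry("10.1.0.0/16", ["site-B", "site-C"], TE_PRIORITY))    # TE overlay

print(table.lookup("10.1.0.0/16").next_hops)  # TE split: ['site-B', 'site-C']
table.remove_priority(TE_PRIORITY)            # "big red button": disable TE
print(table.lookup("10.1.0.0/16").next_hops)  # fallback: ['site-B']
```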
Each B4 site consists of multiple switches with potentially hundreds of individual ports linking to remote sites. To scale, TE abstracts each site into a single node with a single edge of given capacity to each remote site. To achieve this topology abstraction, all traffic crossing a site-to-site edge must be evenly distributed across all its constituent links. B4 routers employ a custom variant of ECMP hashing [37] to achieve the necessary load balancing.
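As a rough illustration of this per-edge load balancing, the sketch below maps a flow's 5-tuple onto one of an edge's constituent links. It uses a generic hash rather than B4's custom ECMP variant [37], and the link names and tuple values are made up for the example.

```python
# Illustrative sketch only: spread flows crossing a site-to-site edge
# evenly over its constituent physical links by hashing the 5-tuple.
import hashlib


def pick_link(five_tuple, links):
    """Map a flow to one of the edge's constituent links."""
    key = "|".join(str(field) for field in five_tuple).encode()
    digest = int(hashlib.sha256(key).hexdigest(), 16)
    return links[digest % len(links)]


edge_links = [f"siteA-siteB/port{i}" for i in range(4)]   # hypothetical 4-link edge
flow = ("10.1.2.3", "10.8.9.10", 6, 34512, 443)           # src, dst, proto, sport, dport
print(pick_link(flow, edge_links))
```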
In the rest of this section, we describe how we integrate ex-
isting routing protocols running on separate control servers with
OpenFlow-enabled hardware switches. § 4 then describes how we
layer TE on top of this baseline routing implementation.
3.2 Switch Design
Conventional wisdom dictates that wide area routing equipment must have deep buffers, very large forwarding tables, and hardware support for high availability. All of this functionality adds to hardware cost and complexity. We posited that with careful endpoint management, we could adjust transmission rates to avoid the need for deep buffers while avoiding expensive packet drops. Further, our switches run across a relatively small set of data centers, so we did not require large forwarding tables. Finally, we found that switch failures typically result from software rather than hardware issues. By moving most software functionality off the switch hardware, we can manage software fault tolerance through known techniques widely available for existing distributed systems.
Even so, the main reason we chose to build our own hardware
was that no existing platform could support an SDN deployment,
i.e., one that could export low-level control over switch forwarding
behavior. Any extra costs from using custom switch hardware are
more than repaid by the efficiency gains available from supporting
novel services such as centralized TE. Given the bandwidth required