Apache Pulsar vs. Apache Kafka: 2022 Performance Comparison

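This article compares the latest 2022 benchmark results for Apache Pulsar and Apache Kafka, covering monthly active contributors, maximum sustainable throughput, publish latency, and catch-up reads.

Apache Pulsar and Apache Kafka are two widely used distributed messaging systems that play an important role in real-time stream processing and message delivery. Pulsar is noted for its architecture, which separates storage from serving, and for its multi-tenancy support, while Kafka is known for high throughput and low latency. The key benchmark findings fall into four areas:

1. **Active contributors**: Figure 1 shows the number of monthly active contributors for Pulsar and Kafka. This reflects how active each community and project is; a higher contributor count may mean more frequent updates and faster issue resolution.
2. **Maximum sustainable throughput**: Test #1 measures maximum throughput for a single partition and for 100 partitions. The results show that Pulsar maintains higher throughput as partitions are added (see Figures 2 and 3), which suggests that Pulsar may scale better than Kafka.
3. **Publish latency**: Test #2 examines, at a fixed throughput, the latency between publishing a message and receiving its acknowledgment. Figure 4 shows the latency distribution at different percentiles, and the analysis section discusses what these latencies mean for real-time applications.
4. **Catch-up reads / backlog draining**: Test #3 looks at how quickly consumers can work through a backlog of messages. Figure 5a shows catch-up read throughput, while Figures 5b and 5c show the catch-up time and its impact on publishing. This may matter most to applications that need to process large volumes of historical data quickly.

These benchmarks show the respective strengths and suitable use cases of Pulsar and Kafka. Pulsar may be a better fit for large distributed systems that need high scalability and multi-tenancy, while Kafka may do well where low latency and simple deployment matter. The choice depends on the application's requirements, its performance targets, and considerations of community support and maintenance. In practice, developers and architects should weigh these factors and run their own tests to determine which messaging system best fits their use case. It is also worth following the latest developments in both projects, since both continue to be optimized and improved.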

Apache Pulsar™ vs. Apache Kafka®

2022 Benchmark

Benchmark Tests

Using the Linux Foundation Open Messaging benchmark [1], we ran the latest versions of Apache Pulsar (2.9.1) and Apache Kafka (3.0.0). To ensure an objective baseline comparison, each test in this Benchmark Report compares Kafka to Pulsar in two scenarios: Pulsar with Journaling and Pulsar without Journaling.

Pulsar’s default configuration includes Journaling, which offers a higher durability guarantee than Kafka’s default configuration. Pulsar without Journaling provides the same durability guarantees as the default Kafka configuration, which results in an apples-to-apples comparison.

I. What We Tested

For this benchmark, we selected a handful of tests to represent common patterns in the messaging and streaming domains and to test the limits of each system:

A. Maximum Sustainable Throughput

This test measures the maximum data throughput the system can deliver when consumers are keeping up with the incoming traffic.
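
The "consumers are keeping up" condition can be illustrated with a minimal sketch. It assumes that a publish rate counts as sustainable when the consumer backlog does not grow by more than a small allowance over the sampling window; the function names, the allowance, and the sample data are all hypothetical, and this is not presented as the Open Messaging benchmark's own procedure.

```python
def is_sustainable(backlog_samples, max_backlog_growth=1000):
    """Return True if the backlog grew by at most `max_backlog_growth`
    messages between the first and last sample (assumed criterion)."""
    growth = backlog_samples[-1] - backlog_samples[0]
    return growth <= max_backlog_growth


def max_sustainable_rate(trials, max_backlog_growth=1000):
    """Pick the highest publish rate whose backlog stayed bounded.

    `trials` maps a publish rate (msg/s) to the backlog sizes sampled
    while publishing at that rate. Returns None if no rate qualifies.
    """
    sustainable = [
        rate
        for rate, samples in trials.items()
        if is_sustainable(samples, max_backlog_growth)
    ]
    return max(sustainable) if sustainable else None


# Hypothetical measurements: publish rate -> backlog samples over time.
trials = {
    100_000: [0, 120, 80, 95],
    200_000: [0, 300, 250, 310],
    400_000: [0, 50_000, 110_000, 170_000],  # backlog keeps growing
}

best_rate = max_sustainable_rate(trials)
print(best_rate)
```

With these made-up samples, the highest rate is rejected because its backlog grows without bound, so the next rate down is reported.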

We ran this test in two scenarios to test the upper boundary performance and to test the cost profile for each system:

1. Topic with a single partition. This scenario tests the upper boundary performance for a total-order use case or, in the worst case, where partition keys’ data is skewed. At some scale, the design of a system that relies upon single ordering or handling large amounts of skewed data will need to be reconsidered. Pulsar has the ability to handle situations where total ordering is required at higher scale or large amounts of skew arise.

2. Topic with 100 partitions. With more partitions to stress available resources, this test illustrates how well a system scales horizontally (by adding more machines) and its cost effectiveness. For example, by modeling the hardware cost per 1GB/s of traffic, it is easy to derive the cost profile for each system.

