Exploring a Key Tool of the CDH Big Data Platform: Kudu Explained


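Apache Kudu is an integral part of the Cloudera big data platform (CDH). It is a high-performance, columnar storage system that is well suited to processing large data sets and serving real-time analytics. Kudu is designed as an efficient data store within the Hadoop ecosystem, combining the performance strengths of traditional relational databases with the flexibility of NoSQL databases.

Key features of Kudu include:

1. **Columnar storage**: Kudu stores data by column, so a query reads only the columns it needs. This significantly improves query performance, especially for workloads that scan large volumes of data.
2. **Distributed architecture**: Kudu is fully distributed and scales horizontally to support large-scale data processing. It relies on metadata management to keep the cluster consistent and allows data to be distributed across multiple nodes.
3. **High availability and fault tolerance**: Kudu detects failures and recovers from them automatically. It tolerates the failure of a single node while preserving the reliability and integrity of the data.
4. **Low latency**: Optimized read and write paths, together with caching, let Kudu offer near-real-time data access, which matters for real-time analytics and other latency-sensitive applications.
5. **Compatibility**: Kudu integrates with the Hadoop ecosystem and can serve as the underlying storage for analysis tools such as Hive and Impala.
6. **Transaction support**: Although Kudu focuses mainly on online analytical processing (OLAP), it also provides ACID guarantees, which is important for scenarios that need strong consistency. Note that the usage limitations listed later in this document state that multi-row transactions are not supported.
7. **Ease of use**: Apache Kudu provides command-line tools and APIs that let data developers load, manage, and query data.
8. **Security and management**: Kudu supports Kerberos authentication and other Hadoop security models, and it provides built-in logging and auditing for management and monitoring.

When using Kudu, observe the applicable copyright terms, such as the trademark policies of Cloudera and the Apache Software Foundation; their trademarks may not be copied or used without permission. Any products, services, processes, or other information mentioned in the documentation are the property of their respective owners, whose rights must be respected.

In summary, Apache Kudu is a key component of CDH. It provides a high-performance, low-latency, and scalable storage solution for big data environments, and it is particularly suited to real-time analytics and data warehouse scenarios. To make full use of Kudu, developers need to understand its features and operations, and they must also comply with the relevant legal and commercial agreements.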

Overview of Apache Kudu Installation and Upgrade in CDH

Starting with Apache Kudu 1.5.0 / CDH 5.13, Kudu ships with CDH 5. In a parcel-based configuration, Kudu is part of the CDH parcel rather than a separate parcel. The Kudu packages are also bundled into the CDH package.

Platform Requirements

Before you proceed with installation or upgrade:

• Review Product Compatibility Matrix - Apache Kudu.

• Review the CDH and Cloudera Manager installation options described in Cloudera Manager Deployment.

Installing Kudu

Note: Kudu is not supported in single-user mode.

On a cluster managed by Cloudera Manager, Kudu is installed as part of CDH and does not need to be installed separately.

With Cloudera Manager, you can enable or disable the Kudu service, but the Kudu component remains present on the cluster. For instructions, see Installing Cloudera Manager and CDH.

On an unmanaged cluster, you can install Kudu packages manually. For instructions, see Kudu Installation.

Upgrading Kudu

Before you proceed with an upgrade, review the Upgrade Notes for Kudu 1.5.0 / CDH 5.13.0.

On a managed cluster,

• If you have just upgraded Cloudera Manager from a version that did not include Kudu, then Kudu will not be installed automatically. You will need to add the Kudu service manually. Upgrading Cloudera Manager does not automatically upgrade CDH or other managed services.

• Parcels: If you are upgrading CDH and were previously using the standalone Kudu parcel (version 1.4.0 and lower), then you must deactivate this parcel and activate the latest CDH parcel that includes Kudu. For instructions, see Upgrading to CDH 5.x Using Parcels.

• Packages: If you are upgrading CDH and were previously using the Kudu package (version 1.4.0 and lower), then you must uninstall the kudu package and upgrade to the latest CDH package that includes Kudu. For instructions, see Upgrading to CDH 5.x Using Packages.

On an unmanaged cluster, you can upgrade Kudu packages manually. For instructions, see Upgrade Kudu Using the Command Line.


Apache Kudu Usage Limitations

Schema Design Limitations

Primary Key

• The primary key cannot be changed after the table is created. You must drop and recreate a table to select a new primary key.

• The columns which make up the primary key must be listed first in the schema.

• The primary key of a row cannot be modified using the UPDATE functionality. To modify a row’s primary key, the row must be deleted and re-inserted with the modified key. Such a modification is non-atomic.

• Columns with DOUBLE, FLOAT, or BOOL types are not allowed as part of a primary key definition. Additionally, all columns that are part of a primary key definition must be NOT NULL.

• Auto-generated primary keys are not supported.

• Cells making up a composite primary key are limited to a total of 16KB after internal composite-key encoding is done by Kudu.
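The ordering, type, and nullability rules above can be checked before a table is created. The following is a minimal sketch in plain Python, not Kudu client code: the `Column` class and the `primary_key_problems` function are hypothetical names introduced here for illustration, and the sketch does not cover the 16KB encoded-key limit.

```python
from dataclasses import dataclass

# Types that the list above excludes from primary keys.
DISALLOWED_KEY_TYPES = {"DOUBLE", "FLOAT", "BOOL"}


@dataclass
class Column:
    """Hypothetical column description; not part of any Kudu client API."""
    name: str
    type_name: str
    nullable: bool = True
    is_key: bool = False


def primary_key_problems(columns):
    """Return a list of violations of the primary key rules, in schema order."""
    problems = []
    # Rule: key columns must be listed first in the schema.
    seen_non_key = False
    for col in columns:
        if not col.is_key:
            seen_non_key = True
        elif seen_non_key:
            problems.append(f"key column {col.name} must precede non-key columns")
    # Rules: no DOUBLE/FLOAT/BOOL key columns, and every key column is NOT NULL.
    for col in columns:
        if not col.is_key:
            continue
        if col.type_name.upper() in DISALLOWED_KEY_TYPES:
            problems.append(
                f"key column {col.name} has disallowed type {col.type_name.upper()}"
            )
        if col.nullable:
            problems.append(f"key column {col.name} must be NOT NULL")
    return problems


schema = [
    Column("host", "STRING", nullable=False, is_key=True),
    Column("ts", "INT64", nullable=False, is_key=True),
    Column("value", "DOUBLE"),
]
print(primary_key_problems(schema))  # []
```

Because a primary key cannot be changed after table creation, running a check like this up front is cheaper than dropping and recreating the table later.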

Cells

No individual cell may be larger than 64KB before encoding or compression. The cells making up a composite key are limited to a total of 16KB after the internal composite-key encoding done by Kudu. Inserting rows not conforming to these limitations will result in errors being returned to the client.

Columns

• By default, Kudu will not permit the creation of tables with more than 300 columns. We recommend schema designs that use fewer columns for best performance.

• DECIMAL, CHAR, VARCHAR, DATE, and complex types such as ARRAY are not supported.

• Type and nullability of existing columns cannot be changed by altering the table.

• Dropping a column does not immediately reclaim space. Compaction must run first.

Rows

Kudu was primarily designed for analytic use cases. Although individual cells may be up to 64KB, and Kudu supports up to 300 columns, it is recommended that no single row be larger than a few hundred KB. You are likely to encounter issues if a single row contains multiple kilobytes of data.

Tables

• Tables must have an odd number of replicas, with a maximum of 7.

• Replication factor (set at table creation time) cannot be changed.

• There is no way to run compaction manually, but dropping a table will reclaim the space immediately.

Other Usage Limitations

• Secondary indexes are not supported.

• Multi-row transactions are not supported.

• Relational features, such as foreign keys, are not supported.


• Identifiers such as column and table names are restricted to be valid UTF-8 strings. Additionally, a maximum length of 256 characters is enforced.

If you are using Apache Impala to query Kudu tables, also refer to the section on Impala Integration Limitations.
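The identifier rule above (valid UTF-8, at most 256 characters) can be approximated with a small pre-flight check. This is a sketch in plain Python under stated assumptions: `identifier_problem` is a hypothetical helper, not a Kudu API, and it assumes the 256 limit counts characters as the text says rather than encoded bytes.

```python
MAX_IDENTIFIER_CHARS = 256  # limit quoted in the list above


def identifier_problem(name):
    """Return None if the name passes both checks, otherwise a short reason."""
    try:
        # Python strings may hold lone surrogates, which cannot be encoded
        # as UTF-8, so an encode attempt doubles as a validity check.
        name.encode("utf-8")
    except UnicodeEncodeError:
        return "not valid UTF-8"
    if len(name) > MAX_IDENTIFIER_CHARS:
        return f"longer than {MAX_IDENTIFIER_CHARS} characters"
    return None


print(identifier_problem("metrics"))  # None
print(identifier_problem("x" * 257))  # longer than 256 characters
```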

Partitioning Limitations

• Tables must be manually pre-split into tablets using simple or compound primary keys. Automatic splitting is not yet possible. Kudu does not allow you to change how a table is partitioned after creation, with the exception of adding or dropping range partitions.

• Data in existing tables cannot currently be automatically repartitioned. As a workaround, create a new table with the new partitioning and insert the contents of the old table.

• Tablets that lose a majority of replicas (such as 1 left out of 3) require manual intervention to be repaired.
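The majority rule in the last bullet, combined with the earlier requirement that tables have an odd number of replicas with a maximum of 7, reduces to simple arithmetic. The sketch below is plain Python with hypothetical helper names (`check_replication_factor`, `has_majority`, `tolerated_failures`); it only illustrates the counting and does not talk to a cluster.

```python
MAX_REPLICAS = 7  # maximum replica count quoted in the Tables section


def check_replication_factor(replicas):
    """Raise ValueError unless the count is odd and between 1 and MAX_REPLICAS."""
    if replicas < 1 or replicas > MAX_REPLICAS or replicas % 2 == 0:
        raise ValueError(
            f"replication factor must be odd and at most {MAX_REPLICAS}"
        )


def has_majority(live_replicas, replicas):
    """True if strictly more than half of the tablet's replicas are still live."""
    check_replication_factor(replicas)
    return live_replicas > replicas // 2


def tolerated_failures(replicas):
    """Largest number of replicas that can be lost while a majority remains."""
    check_replication_factor(replicas)
    return (replicas - 1) // 2


print(has_majority(1, 3))     # False: the "1 left out of 3" case above
print(tolerated_failures(3))  # 1
print(tolerated_failures(5))  # 2
```

Since the replication factor cannot be changed after table creation, the number of failures a table can absorb is fixed when the table is created.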

Scaling Recommendations and Limitations

• Recommended maximum number of tablet servers is 100.

• Recommended maximum number of masters is 3.

• Recommended maximum amount of stored data, post-replication and post-compression, per tablet server is 8TB.

• Recommended maximum number of tablets per tablet server is 2000, post-replication.

• Maximum number of tablets per table for each tablet server is 60, post-replication, at table-creation time.
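A planned layout can be compared against the figures above with a few divisions. The following is a rough sketch in plain Python, not an official sizing tool: `scaling_warnings` and its parameters are hypothetical, and it assumes that tablet replicas and data are spread evenly across tablet servers, which a real cluster may not achieve.

```python
import math

# Recommended limits quoted in the list above.
MAX_TABLET_SERVERS = 100
MAX_DATA_TB_PER_SERVER = 8
MAX_TABLETS_PER_SERVER = 2000
MAX_TABLETS_PER_TABLE_PER_SERVER = 60  # applies at table-creation time


def scaling_warnings(tablet_servers, tables, total_data_tb):
    """Compare a planned layout with the recommended limits.

    tables maps a table name to (tablets, replication_factor).
    total_data_tb is the stored data after replication and compression.
    """
    warnings = []
    if tablet_servers > MAX_TABLET_SERVERS:
        warnings.append(f"{tablet_servers} tablet servers exceeds {MAX_TABLET_SERVERS}")
    per_server_tb = total_data_tb / tablet_servers
    if per_server_tb > MAX_DATA_TB_PER_SERVER:
        warnings.append(
            f"{per_server_tb:g} TB per server exceeds {MAX_DATA_TB_PER_SERVER}"
        )
    total_replicas = 0
    for name, (tablets, replication_factor) in tables.items():
        replicas = tablets * replication_factor  # post-replication tablet count
        total_replicas += replicas
        per_server = math.ceil(replicas / tablet_servers)
        if per_server > MAX_TABLETS_PER_TABLE_PER_SERVER:
            warnings.append(
                f"table {name}: {per_server} tablets per server exceeds "
                f"{MAX_TABLETS_PER_TABLE_PER_SERVER}"
            )
    all_per_server = math.ceil(total_replicas / tablet_servers)
    if all_per_server > MAX_TABLETS_PER_SERVER:
        warnings.append(
            f"{all_per_server} tablets per server exceeds {MAX_TABLETS_PER_SERVER}"
        )
    return warnings


plan = {"events": (100, 3), "wide": (400, 3)}
print(scaling_warnings(10, plan, 50))
# ['table wide: 120 tablets per server exceeds 60']
```

In this example the `wide` table has 400 tablets at replication factor 3, so 1200 replicas over 10 servers gives 120 per server, which is above the per-table limit of 60 even though the overall per-server total (150) is well under 2000.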

Server Management Limitations

• Production deployments should configure at least 4GB of memory for tablet servers, and ideally more than 16GB when approaching the data and tablet scale limits.

• Write ahead logs (WALs) can only be stored on one disk.

• Disk failures are not tolerated and tablet servers will crash as soon as one is detected.

• Failed disks with unrecoverable data require formatting of all Kudu data for that tablet server before it can be started again.

• Data directories cannot be added/removed; they must be reformatted to change the set of directories.

• Tablet servers cannot be gracefully decommissioned.

• Tablet servers cannot change their address or port.

• Kudu has a hard requirement on having an up-to-date NTP. Kudu masters and tablet servers will crash when out of sync.

• Kudu releases have only been tested with NTP. Other time synchronization providers such as Chrony may not work.

Cluster Management Limitations

• Rack awareness is not supported.
