Apache Hive Performance Tuning Guide - HDP 3.1.0


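"Apache Hive Performance Tuning - HDP 3.1.0" describes strategies and techniques for tuning the performance of a Hive data warehouse on HDP 3.1.0. Hive is a core component of big data processing and is commonly used to run SQL queries over data stored in Hadoop. The key points are:

1. **LLAP (Live Long and Process) configuration**:
   - LLAP is a Hive architecture that gives fast query response by keeping long-running daemons alive and caching data in memory, which improves interactivity.
   - Make sure you understand the basic concepts and operation of LLAP before you start tuning.
   - Set the LLAP ports, including the ports for HiveServer Interactive and the LLAP daemons, so that the services run correctly.
2. **Preparing for performance tuning**:
   - Before tuning, assess the environment: hardware resources, network conditions, and the existing workload.
   - Enable YARN preemption so that high-priority tasks can take resources from low-priority tasks, which improves response time for interactive queries.
3. **Setting up LLAP**:
   - Enable interactive query so that Hive is better suited to ad hoc, low-latency queries.
   - Set up multiple HiveServer Interactive instances for high availability, so that a single point of failure does not interrupt the service.
   - Configure the llap queue and allocate resources that fit the workload type.
   - Set up the Hive proxy so that users can access the service securely through HiveServer2.
4. **Other LLAP properties**:
   - Configure other LLAP-related properties, such as memory allocation and thread pool size, to tune the behavior of the LLAP daemons.
   - Adjust the HiveServer heap size so that enough memory is available for query processing.
5. **Saving settings and restarting services**:
   - When all configuration is complete, save the settings and restart the affected services to apply the changes.
   - Run an interactive query to verify that performance has improved.
6. **Using the HiveServer Interactive UI and a JDBC client**:
   - Monitor query performance and troubleshoot problems through the HiveServer Interactive UI.
   - Connect to LLAP with a JDBC client for development and testing.
7. **YARN queue configuration**:
   - Configure separate YARN queues for batch processing and interactive queries so that resources are allocated sensibly.
   - Create a custom LLAP queue for finer-grained resource management.
8. **Key components of Hive warehouse processing**:
   - The query result cache and the metadata cache can speed up queries considerably by avoiding unnecessary reads.
   - The properties of the Tez execution engine also affect performance directly.
9. **Monitoring Hive performance**:
   - Monitor LLAP resources, including memory usage, CPU utilization, and queue state, so that problems are found and resolved early.
   - Use Hadoop monitoring tools such as Ambari to track Hive and YARN performance metrics.
10. **Maximizing storage resources with ORC**:
   - ORC (Optimized Row Columnar) is an efficient Hive storage format that compresses data and speeds up reads and writes.
   - Configure advanced ORC properties, such as the compression level and stripe size, to tune storage and I/O performance.
11. **Improving performance with partitions**:
   - Partitioning is an effective way to improve query performance: splitting data into smaller, more manageable parts speeds up queries that filter on the partition columns.
   - Avoid over-partitioning, which increases the metadata burden and complexity.
12. **Handling large tables and skewed tables**:
   - For very large tables, consider bucketing, indexing, or MapReduce-level optimization.
   - Skewed tables need special handling, such as skewed-key processing, to avoid overloading particular partitions or nodes.

These are the key steps for tuning Hive. Applied together, they can noticeably improve the performance and response time of a Hive data warehouse in an HDP 3.1.0 environment.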

Data Access Setting up LLAP

Save the settings.

Enable interactive query

You need to enable interactive query to take advantage of low-latency analytical processing (LLAP) of Hive queries.

When you enable interactive query, you select a host for HiveServer Interactive.

About this task

The Interactive Query control displays a range of values for default Maximum Total Concurrent Queries based on the number of nodes that you select for LLAP processing and the number of CPUs in the Hive LLAP cluster. The Ambari wizard typically calculates appropriate values for LLAP properties in Interactive Query, so accept the defaults or change the values to suit your environment.

When you enable Interactive Query, the Run as end user and Hive user security settings have no effect. These controls affect batch-processing mode.

Procedure

In Ambari, select Services > Hive > Configs > Settings.

In Interactive Query, set Enable Interactive Query to Yes.

In Select HiveServer Interactive Host, accept the default server to host HiveServer Interactive, or from the drop-down, select a different host.

If you do not want to set up multiple HiveServer Interactives for high availability, skip the next set of steps, and proceed to configuring the llap queue.

Set up multiple HiveServer Interactives for high availability

After enabling interactive query, you can optionally set up additional HiveServer Interactive instances for high availability. One instance operates in active mode, the other in passive (standby) mode. The passive instance serves as a backup and takes over if the active instance goes down.

About this task

Multiple HiveServer Interactives do not work in active/passive mode unless you set up all instances during the LLAP setup process, immediately after enabling interactive query. Do not select Add HiveServer2 Interactive from Actions after completing the LLAP setup. An instance added in this way does not operate in active/passive mode. If you make this mistake, remove the HiveServer Interactive (HSI) instances, keeping HiveServer2 (HS2) and the Hive Metastore (HMS), and then re-add HSI in the following way:

Procedure

In Select HiveServer Interactive Host, after selecting one HiveServer2 Interactive host, click + to add another.

Accept the default server to host the additional HiveServer Interactive, or from the drop-down, select a different host.

Optionally, repeat these steps to add additional HiveServer Interactives.
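
Because only one instance is active at a time, clients typically locate HiveServer Interactive through ZooKeeper service discovery instead of a fixed host name. The following is a minimal sketch of how such a JDBC URL might be assembled. The helper name `hsi_jdbc_url`, the example host names, and the namespace values `hiveserver2-interactive` and `hs2ActivePassiveHA` are assumptions based on commonly seen HDP 3 defaults, not values taken from this guide; check the ZooKeeper namespace settings in your own Hive configuration before relying on them.

```python
from typing import Iterable, Tuple


def hsi_jdbc_url(zk_hosts: Iterable[Tuple[str, int]], ha: bool = False,
                 database: str = "") -> str:
    """Build a ZooKeeper-discovery JDBC URL for HiveServer Interactive.

    zk_hosts: (host, port) pairs of the ZooKeeper ensemble.
    ha:       True when the instances run in active/passive mode.
    The mode and namespace strings below are ASSUMED defaults.
    """
    quorum = ",".join(f"{host}:{port}" for host, port in zk_hosts)
    if not quorum:
        raise ValueError("at least one ZooKeeper host is required")
    if ha:
        mode, namespace = "zooKeeperHA", "hs2ActivePassiveHA"
    else:
        mode, namespace = "zooKeeper", "hiveserver2-interactive"
    return (f"jdbc:hive2://{quorum}/{database}"
            f";serviceDiscoveryMode={mode};zooKeeperNamespace={namespace}")


# Hypothetical three-node ZooKeeper ensemble.
ZK = [("zk1.example.com", 2181), ("zk2.example.com", 2181), ("zk3.example.com", 2181)]
print(hsi_jdbc_url(ZK, ha=True))
```

The resulting string can be passed to any Hive JDBC client, for example as the connection URL in Beeline.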

Configure an llap queue

Ambari generally creates and configures an interactive query queue named llap and points the Hive service to this YARN queue. Check the llap queue and, if necessary, change it using YARN Queue Manager.

About this task

The llap queue capacity determines the YARN resources for the LLAP application. Reconfiguring the llap queue is sometimes necessary. For example, if you have a 3-node cluster, Ambari might configure zero percent capacity for the llap queue, and you must reconfigure settings. If you set the llap queue capacity or number of nodes too low, you won't have enough YARN resources or LLAP daemons to run the LLAP application. If you set the llap queue capacity too high, you waste cluster resources.
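
As a rough illustration of the trade-off, the sketch below estimates the smallest llap queue capacity that covers a chosen number of LLAP nodes. It assumes identical nodes and one LLAP daemon occupying a whole node, and it ignores the extra containers that the LLAP application also needs, so treat the result as a lower bound for discussion and not as the value Ambari would calculate. The function name is hypothetical.

```python
def llap_queue_capacity_percent(total_nodes: int, llap_nodes: int) -> int:
    """Smallest whole-number queue capacity (percent of the cluster)
    that covers llap_nodes out of total_nodes.

    ASSUMPTIONS: all nodes are identical, and each LLAP daemon takes
    one full node. Real sizing also needs room for other containers.
    """
    if total_nodes <= 0:
        raise ValueError("total_nodes must be positive")
    if not 0 <= llap_nodes <= total_nodes:
        raise ValueError("llap_nodes must be between 0 and total_nodes")
    # Integer ceiling division avoids floating-point rounding surprises.
    return -(-100 * llap_nodes // total_nodes)


# On a 3-node cluster, dedicating one node to LLAP needs about a third
# of the cluster, so a queue left at zero percent cannot run LLAP.
print(llap_queue_capacity_percent(total_nodes=3, llap_nodes=1))
```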

Procedure

In Ambari, select Hive > Configs.

In Interactive Query Queue, choose the llap queue if it appears as a selection, and save the Hive configuration changes.
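
Besides the Ambari and YARN Queue Manager screens, the capacity of the llap queue can also be checked from a script. The sketch below parses a Capacity Scheduler response of the kind returned by the YARN ResourceManager REST endpoint `/ws/v1/cluster/scheduler` and reports the configured capacity of a named queue. The sample payload is made up and heavily trimmed, and the function names are hypothetical; a real response contains many more fields.

```python
import json
from typing import Optional


def find_queue(node: dict, name: str) -> Optional[dict]:
    """Depth-first search of the scheduler queue tree for a queue by name."""
    if node.get("queueName") == name:
        return node
    for child in node.get("queues", {}).get("queue", []):
        found = find_queue(child, name)
        if found is not None:
            return found
    return None


def queue_capacity(scheduler_json: str, queue_name: str = "llap") -> Optional[float]:
    """Return the configured capacity (percent) of queue_name, or None
    if the queue is not present in the scheduler response."""
    root = json.loads(scheduler_json)["scheduler"]["schedulerInfo"]
    queue = find_queue(root, queue_name)
    return None if queue is None else queue["capacity"]


# Made-up, trimmed sample: the llap queue exists but has zero capacity.
SAMPLE = """
{"scheduler": {"schedulerInfo": {
  "type": "capacityScheduler", "queueName": "root", "capacity": 100.0,
  "queues": {"queue": [
    {"queueName": "default", "capacity": 100.0},
    {"queueName": "llap", "capacity": 0.0}
  ]}
}}}
"""

capacity = queue_capacity(SAMPLE)
if capacity is None:
    print("llap queue not found")
elif capacity == 0:
    print("llap queue has zero capacity and must be reconfigured")
else:
    print(f"llap queue capacity: {capacity}%")
```

In practice the JSON text would come from an HTTP GET against the ResourceManager instead of the embedded sample.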
