Hive Beginner's Tutorial: From the Basics to Execution Internals

Updated 2024-07-20 · 1.99 MB PDF
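This resource is a Hive tutorial e-book written by 淳月宾, aimed at learners working through a big data technology series. It covers basic Hive usage, data definition and management, query syntax, execution internals, and optimization, and is intended to help readers build a systematic understanding of Hive, a widely used data warehouse tool.

**Part 1: Basic Hive usage** This part introduces Hive's basic concepts: what Hive is, and its main characteristics, namely an easy-to-use SQL interface, data processing built on Hadoop, and support for large-scale data. The following chapters cover Hive's key operations: creating databases, viewing and managing tables, loading and exporting data, SQL query syntax (including Select, Where, GroupBy, and Join), and commonly used built-in functions (such as explode, collect_set, and collect_list).

The **user-defined functions** section covers UDFs (one row in, one row out), UDAFs (many rows in, one row out), and UDTFs (one row in, many rows out), along with what each is used for. This is helpful when extending Hive in real projects.

**Part 2: Hive execution internals and optimization** This part analyzes Hive's technical architecture, including its core components, underlying storage mechanism, data processing flow, and metadata management. It is essential for understanding how Hive turns SQL statements into MapReduce jobs and how optimization reduces query time and resource consumption.

**Hive technical architecture** This section walks through Hive's architecture diagram and explains how Hive integrates with the Hadoop ecosystem: HDFS as the underlying storage, the Metastore for storing metadata, and MapReduce as the data processing engine. It also covers the execution flow of a Hive program, which helps readers understand the logic behind a query.

With this tutorial, readers can learn Hive from beginner to intermediate level and find the material they need on data loading, query optimization, and writing user-defined functions. It is a practical and detailed Hive reference for learners and developers who want to go further in big data.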
Uploaded 2019-09-01
Table of Contents

About the Tutorial
Audience
Prerequisites
Disclaimer & Copyright
Table of Contents

1. Introduction
   Hadoop
   What is Hive?
   Features of Hive
   Architecture of Hive
   Working of Hive

2. Hive Installation
   Step 1: Verifying JAVA Installation
   Step 2: Verifying Hadoop Installation
   Step 3: Downloading Hive
   Step 4: Installing Hive
   Step 5: Configuring Hive
   Step 6: Downloading and Installing Apache Derby
   Step 7: Configuring Metastore of Hive
   Step 8: Verifying Hive Installation

3. Hive Data Types
   Column Types
   Literals
   Null Value
   Complex Types

Using the Hive metadata below, generate the Hive CREATE TABLE statement and include comments. Note that day is the partition column.

dwd_weibo_crawl NULL appmarket_appinfo GN线应用市场 2021-01-07 15:07:29 apk 应用包名 string day string 入库日期 org.apache.hadoop.hive.ql.io.orc.OrcSerde serialization.format 1 hdfs://DSbigdata/hiveDW/dwd_exten_crawl/appmarket_appinfo org.apache.hadoop.hive.ql.io.orc.OrcInputFormat org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
dwd_weibo_crawl NULL appmarket_appinfo GN线应用市场 2021-01-07 15:07:29 app_name 应用名称 string day string 入库日期 org.apache.hadoop.hive.ql.io.orc.OrcSerde serialization.format 1 hdfs://DSbigdata/hiveDW/dwd_exten_crawl/appmarket_appinfo org.apache.hadoop.hive.ql.io.orc.OrcInputFormat org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
dwd_weibo_crawl NULL appmarket_appinfo GN线应用市场 2021-01-07 15:07:29 app_url 平台详情页链接 string day string 入库日期 org.apache.hadoop.hive.ql.io.orc.OrcSerde serialization.format 1 hdfs://DSbigdata/hiveDW/dwd_exten_crawl/appmarket_appinfo org.apache.hadoop.hive.ql.io.orc.OrcInputFormat org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
dwd_weibo_crawl NULL appmarket_appinfo GN线应用市场 2021-01-07 15:07:29 cate 应用所属分类 string day string 入库日期 org.apache.hadoop.hive.ql.io.orc.OrcSerde serialization.format 1 hdfs://DSbigdata/hiveDW/dwd_exten_crawl/appmarket_appinfo org.apache.hadoop.hive.ql.io.orc.OrcInputFormat org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
dwd_weibo_crawl NULL appmarket_appinfo GN线应用市场 2021-01-07 15:07:29 other 其他 string day string 入库日期 org.apache.hadoop.hive.ql.io.orc.OrcSerde serialization.format 1 hdfs://DSbigdata/hiveDW/dwd_exten_crawl/appmarket_appinfo org.apache.hadoop.hive.ql.io.orc.OrcInputFormat org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
dwd_weibo_crawl NULL appmarket_appinfo GN线应用市场 2021-01-07 15:07:29 region 平台名称 string day stri
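One possible way to approach the request is to assemble the DDL from the metadata rows rather than type it by hand. The sketch below is illustrative only: `build_ddl` and `quote` are hypothetical helper names, the rows are assumed to have been parsed already into (column, comment, type) tuples, and only the six columns visible in the excerpt are listed (the metadata is cut off after `region`, so the real table may have more). It also assumes `STORED AS ORC` is an acceptable shorthand for the ORC SerDe and input/output format classes shown in the metadata, and it uses a plain `CREATE TABLE` because the excerpt does not say whether the table is external.

```python
def quote(text):
    """Escape backslashes and single quotes for a HiveQL string literal."""
    return text.replace("\\", "\\\\").replace("'", "\\'")


def build_ddl(database, table, table_comment, columns, partitions, location):
    """Build a CREATE TABLE statement (hypothetical helper, not a Hive API).

    columns and partitions are lists of (name, comment, type) tuples.
    Partition columns go only in PARTITIONED BY, never in the column list.
    """
    def render(fields):
        return ",\n".join(
            f"  `{name}` {col_type} COMMENT '{quote(comment)}'"
            for name, comment, col_type in fields
        )

    return (
        f"CREATE TABLE `{database}`.`{table}` (\n"
        f"{render(columns)}\n"
        f")\n"
        f"COMMENT '{quote(table_comment)}'\n"
        f"PARTITIONED BY (\n"
        f"{render(partitions)}\n"
        f")\n"
        f"STORED AS ORC\n"
        f"LOCATION '{location}';"
    )


# Values copied from the metadata rows; the column list stops where the excerpt is truncated.
columns = [
    ("apk", "应用包名", "string"),
    ("app_name", "应用名称", "string"),
    ("app_url", "平台详情页链接", "string"),
    ("cate", "应用所属分类", "string"),
    ("other", "其他", "string"),
    ("region", "平台名称", "string"),
]
partitions = [("day", "入库日期", "string")]

ddl = build_ddl(
    database="dwd_weibo_crawl",
    table="appmarket_appinfo",
    table_comment="GN线应用市场",
    columns=columns,
    partitions=partitions,
    location="hdfs://DSbigdata/hiveDW/dwd_exten_crawl/appmarket_appinfo",
)
print(ddl)
```

Note that the database name in the metadata (`dwd_weibo_crawl`) differs from the directory in the HDFS path (`dwd_exten_crawl`); the sketch keeps both exactly as they appear and does not try to reconcile them.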

Uploaded 2023-06-10