Apache Phoenix Getting Started and Configuration Tutorial (English)


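Apache Phoenix is a SQL layer on top of HBase that gives the Hadoop ecosystem a highly scalable, high-performance SQL query interface. This document is the Apache Phoenix user manual and covers the following main parts:

1. **Using the console**: How to start interacting with Phoenix through the command-line tool Sqlline. First download the latest Phoenix JAR from the official site, copy it into the HBase lib directory, and restart HBase. You can then connect to Phoenix with Sqlline to run SQL queries and manage tables.
2. **Creating tables**: From the console, tables are created with SQL syntax. The example `CREATE TABLE test (mykey integer not null primary key, mycol ...)` shows how to define a table structure, including key attributes such as the primary key.
3. **Configuration**: The configuration options describe how to adjust Phoenix performance parameters and settings for different workloads. This may involve memory allocation, indexing strategy, and cache sizes.
4. **Integration modules**:
   - **Apache Spark integration**: Phoenix works with the Apache Spark data processing framework, so Spark jobs can read and write Phoenix tables and benefit from Phoenix query optimization.
   - **Hive storage handler**: Integrates Phoenix data with Apache Hive, so Phoenix data can be queried with Hive SQL.
   - **Apache Pig** and **MapReduce** integration: Gives these big data processing tools access to Phoenix data.
   - **Apache Flume** and **Kafka** plugins: Connect Phoenix to log collection and stream processing systems.
   - **Python Driver**: Lets Python developers work with Phoenix data.
5. **Tutorials and guides**: QuickStart, Tuning, and ExplainPlan help users understand and optimize query performance.
6. **Compatibility and updates**: The BackwardCompatibility chapter discusses compatibility between Phoenix versions, and the ReleaseNotes section summarizes the main features and improvements of new releases.
7. **F.A.Q.**: The document closes with Frequently Asked Questions that address problems users may run into.

The Apache Phoenix documentation describes in detail how to manage and query data in a Hadoop environment through the command line and the integration tools. It is aimed at Hadoop developers and data analysts, and both first-time and experienced users can find the information and best practices they need.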

scanoverPhoenixwillincludetheemptycolumntoensurethatrowsthatonlyconsist

oftheprimarykey(andhavenullforallnon-keycolumns)willbeincludedinascan

result.

原文:http://phoenix.apache.org/faq.html

F.A.Q.

This document was built with BookStack (BookStack.CN).

QuickStart

WhatisthisnewPhoenixthingI’vebeenhearingabout?PhoenixisanopensourceSQL

skinforHBase.YouusethestandardJDBCAPIsinsteadoftheregularHBaseclient

APIstocreatetables,insertdata,andqueryyourHBasedata.

Doesn’tputtinganextralayerbetweenmyapplicationandHBasejustslowthingsdown?

Actually,no.Phoenixachievesasgoodorlikelybetterperformancethanifyouhand-

codedityourself(nottomentionwithaheckofalotlesscode)by:

compilingyourSQLqueriestonativeHBasescans

determiningtheoptimalstartandstopforyourscankey

orchestratingtheparallelexecutionofyourscans

bringingthecomputationtothedataby

pushingthepredicatesinyourwhereclausetoaserver-sidefilter

executingaggregatequeriesthroughserver-sidehooks(calledco-processors)

Inadditiontotheseitems,we’vegotsomeinterestingenhancementsintheworks

tofurtheroptimizeperformance:

secondaryindexestoimproveperformanceforqueriesonnonrowkeycolumns

statsgatheringtoimproveparallelizationandguidechoicesbetween

optimizations

skipscanfiltertooptimizeIN,LIKE,andORqueries

optionalsaltingofrowkeystoevenlydistributewriteload

Ok,soit’sfast.ButwhySQL?It’sso1970sWell,that’skindofthepoint:give

folkssomethingwithwhichthey’realreadyfamiliar.Whatbetterwaytospurthe

adoptionofHBase?Ontopofthat,usingJDBCandSQL:

Reducestheamountofcodeusersneedtowrite

Allowsforperformanceoptimizationstransparenttotheuser

Opensthedoorforleveragingandintegratinglotsofexistingtooling

ButhowcanSQLsupportmyfavoriteHBasetechniqueofx,y,zDidn’tmakeitto

thelastHBaseMeetupdidyou?SQLisjustawayofexpressingwhatyouwantto

getnothowyouwanttogetit.Checkoutmypresentationforvariousexisting

andto-be-donePhoenixfeaturestosupportyourfavoriteHBasetrick.Haveideas

ofyourown?We’dlovetohearaboutthem:fileanissueforusand/orjoinour

mailinglist.

Blah,blah,blah-Ijustwanttogetstarted!Ok,great!Justfollowourinstall

instructions:

Phoenixin15minutesorless

QuickStart

-13-本文档使用书栈(BookStack.CN)构建

downloadandexpandourinstallationtar

copythephoenixserverjarthatiscompatiblewithyourHBaseinstallationinto

thelibdirectoryofeveryregionserver

restarttheregionservers

addthephoenixclientjartotheclasspathofyourHBaseclient

downloadandsetupSQuirrelasyourSQLclientsoyoucanissueadhocSQLagainst

yourHBasecluster

Idon’twanttodownloadandsetupanythingelse!Ok,fairenough-youcan

createyourownSQLscriptsandexecutethemusingourcommandlinetoolinstead.

Let’swalkthroughanexamplenow.Beginbynavigatingtothebin/directoryof

yourPhoenixinstalllocation.

First,let’screateaus_population.sqlfile,containingatabledefinition:

1. CREATETABLEIFNOTEXISTSus_population(

2. stateCHAR(2)NOTNULL,

3. cityVARCHARNOTNULL,

4. populationBIGINT

5. CONSTRAINTmy_pkPRIMARYKEY(state,city));

Nowlet’screateaus_population.csvfilecontainingsomedatatoputinthat

table:

1. NY,NewYork,8143197

2. CA,LosAngeles,3844829

3. IL,Chicago,2842518

4. TX,Houston,2016582

5. PA,Philadelphia,1463281

6. AZ,Phoenix,1461575

7. TX,SanAntonio,1256509

8. CA,SanDiego,1255540

9. TX,Dallas,1213825

10. CA,SanJose,912332

Andfinally,let’screateaus_population_queries.sqlfilecontainingaquery

we’dliketorunonthatdata.

1. SELECTstateas"State",count(city)as"CityCount",sum(population)as"PopulationSum"

2. FROMus_population

3. GROUPBYstate

4. ORDERBYsum(population)DESC;

Executethefollowingcommandfromacommandterminal

1. ./psql.py<your_zookeeper_quorum>us_population.sqlus_population.csvus_population_queries.sql

Congratulations! You’ve just created your first Phoenix table, inserted data into it,

QuickStart

-14-本文档使用书栈(BookStack.CN)构建

剩余66页未读，继续阅读
