For Source Code, Sample Chapters, the Author Forum and other resources, go to
http://www.manning.com/dimidukkhurana/
HBase in Action
By Nick Dimiduk and Amandeep Khurana
Hadoop is written in Java; HBase is written in Java; the stock HBase client is
written in Java. There’s only one problem: you don’t use Java. You don’t even like
the JVM. You still want to use HBase. Now what? HBase provides alternate
clients, both JVM-based and ones that don’t require the JVM, that you can use
when Java is not an option. When it comes to doing anything more than
exploring a cluster, you’ll want to use the Thrift gateway. That’s what this article,
based on chapter 6 of HBase in Action, is all about.
Using the HBase Thrift Gateway from Python
When you live in the world beyond Java, the most common way to interact with HBase is via Apache Thrift.[1] Thrift
is a language and set of tools for generating language-agnostic services. Thrift has an Interface Definition
Language (IDL) for describing services and objects. It provides a networking protocol for communication between
processes using those object and service definitions. Thrift uses the IDL you describe to generate code for you in
your favorite languages. Using that code, you can write applications that communicate with each other using the
lingua franca provided by Thrift.
HBase ships a Thrift IDL describing a service layer and set of objects. It also provides a service implementing
that interface. In this section, you’ll generate the Thrift client library for interacting with HBase. You’ll use that
client library to interact with HBase from Python, completely outside of Java and the JVM. We chose Python
because its syntax is approachable to both novice and seasoned programmers. The same approach applies to
interacting with HBase from your favorite language. At the time of this writing, Thrift supports 14 different
languages.
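As a preview of where this is heading, here is a minimal sketch of a Python client talking to the Thrift gateway. It rests on several assumptions that are not stated above: that the Thrift Python library is installed, that you have generated the `hbase` package from HBase's Hbase.thrift file (for example with `thrift --gen py Hbase.thrift`) and put it on your Python path, and that a Thrift gateway is listening on localhost at port 9090. The helper name `list_table_names` is hypothetical and exists only for illustration.

```python
def list_table_names(host="localhost", port=9090):
    """Return the table names reported by an HBase Thrift gateway.

    Assumes the gateway is reachable at host:port and that the generated
    `hbase` package is importable. Both are assumptions of this sketch.
    """
    # Imports live inside the function so that the sketch can be loaded
    # even on a machine where the Thrift libraries are not installed yet.
    from thrift.transport import TSocket, TTransport
    from thrift.protocol import TBinaryProtocol
    from hbase import Hbase  # generated from Hbase.thrift

    # Wrap a plain socket in a buffered transport and speak the binary protocol.
    transport = TTransport.TBufferedTransport(TSocket.TSocket(host, port))
    protocol = TBinaryProtocol.TBinaryProtocol(transport)
    client = Hbase.Client(protocol)

    transport.open()
    try:
        return client.getTableNames()
    finally:
        # Close the connection even if the call fails.
        transport.close()
```

Calling `list_table_names()` against a running gateway should return the tables that exist in the cluster. The transport and protocol chosen here have to match the ones the gateway was started with.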
This API is… different.
In part because of Thrift’s ambitions to support so many languages, its IDL is relatively simple. It lacks features
common in many languages, such as object inheritance. As a result, the HBase interface via Thrift is slightly
different from the Java client API we’ve explored thus far.
There is an effort[2] under way to bring the Thrift API closer to Java, but it remains a work in progress. An early
version is available with HBase 0.94, but it lacks some key features like Filters and access to Coprocessor
Endpoints.[3] The API we’re exploring here will be deprecated upon completion of this effort.
The beauty of using the Thrift API is that it’s the same for all languages. Whether you’re using PHP, Perl, or C#,
the interface is always the same. Any HBase feature added to the Thrift API becomes available
everywhere.
[1] Originally a project out of Facebook, Thrift is now an Apache project. Learn more at http://thrift.apache.org/
[2] For more details, see the JIRA ticket “Thrift server to match the new Java API”: https://issues.apache.org/jira/browse/HBASE-1744
[3] Well, you can, but you have to modify the Hbase.thrift file for each Endpoint you want to expose. For details, see
https://issues.apache.org/jira/browse/HBASE-5600.