Overview and Architecture
Flume 0.9
Flume was first introduced in Cloudera's CDH3 Distribution in 2011. It consisted
of a federation of worker daemons (agents) configured from a centralized master
(or masters) via ZooKeeper (a federated configuration and coordination system).
From the master you could check agent status in a Web UI, as well as push out
configuration centrally from the UI or via a command line shell (both really
communicating via ZooKeeper to the worker agents).
Data could be sent in one of three modes: best effort (BE), disk failover
(DFO), and end-to-end (E2E). The masters were used for the E2E mode
acknowledgements, and multi-master configuration never really matured, so you
usually had only one master, making it a central point of failure for E2E data flows.
Best effort is just what it sounds like: the agent would try to send the data, but if
it couldn't, the data would be discarded. This mode is good for things like metrics
where gaps can easily be tolerated, as new data is just a second away. Disk failover
mode stores undeliverable data to the local disk (or sometimes a local database)
and keeps retrying until the data can be delivered to the next recipient in your data
flow. This is handy for those planned (or unplanned) outages as long as you have
sufficient local disk space to buffer the load.
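The difference between the first two modes can be sketched in a few lines of Python. This is only an illustrative model, not Flume 0.9 code: the class names, the `FlakyLink` stand-in for the next hop, and the in-memory queue standing in for the local disk buffer are all assumptions made for the example.

```python
from collections import deque


class BestEffortSender:
    """Best effort (BE): try once, discard the event if delivery fails."""

    def __init__(self, deliver):
        self.deliver = deliver  # callable that returns True on success
        self.dropped = 0

    def send(self, event):
        if not self.deliver(event):
            self.dropped += 1  # the event is gone, leaving a gap downstream


class DiskFailoverSender:
    """Disk failover (DFO): buffer undeliverable events and retry in order."""

    def __init__(self, deliver):
        self.deliver = deliver
        self.backlog = deque()  # stands in for the local disk buffer

    def send(self, event):
        self.backlog.append(event)
        self.flush()

    def flush(self):
        # Drain from the front of the backlog until an attempt fails.
        while self.backlog and self.deliver(self.backlog[0]):
            self.backlog.popleft()


class FlakyLink:
    """Hypothetical next hop that can be switched up or down."""

    def __init__(self):
        self.up = True
        self.received = []

    def __call__(self, event):
        if self.up:
            self.received.append(event)
        return self.up


def demo(sender_cls):
    """Send three events with an outage during the second one."""
    link = FlakyLink()
    sender = sender_cls(link)
    sender.send("a")
    link.up = False  # outage begins
    sender.send("b")
    link.up = True   # outage ends
    sender.send("c")
    return link.received
```

With these definitions, `demo(BestEffortSender)` yields only the events sent while the link was up, whereas `demo(DiskFailoverSender)` yields all three in their original order, because the buffered event is retried once the link returns.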
In June of 2011, Cloudera moved control of the Flume project to the Apache
Software Foundation. It came out of incubator status a year later in 2012. During
that incubation year, work had already begun to refactor Flume under the Star
Trek-themed tag, Flume-NG (Flume the Next Generation).
Flume 1.X (Flume-NG)
There were many reasons why Flume was refactored. If you are interested in
the details, you can read about them at
https://issues.apache.org/jira/browse/FLUME-728. What started as a refactoring
branch eventually became the main line of development as Flume 1.X.
The most obvious change in Flume 1.X is that the centralized configuration master/
masters and ZooKeeper are gone. The configuration in Flume 0.9 was overly verbose
and mistakes were easy to make. Furthermore, centralized configuration was really
outside the scope of Flume's goals. Centralized configuration was replaced with
a simple on-disk configuration file (although the configuration provider is pluggable
so that it can be replaced). These configuration files are easily distributed using tools
such as CFEngine, Chef, and Puppet. If you are using a Cloudera Distribution, take
a look at Cloudera Manager to manage your configurations; their licensing was
recently changed to lift the node limit, so it may be an attractive option for you.
Be sure you don't manage these configurations manually or you'll be editing those
files manually forever.
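To make the idea of generated, on-disk configuration concrete, here is a minimal Python sketch that renders one configuration file per agent from a single template. The property keys follow the netcat source, memory channel, and logger sink conventions of the Flume 1.X user guide. The component labels (`s1`, `c1`, `k1`), the agent names, and the ports are hypothetical values chosen for illustration. In practice a tool such as Chef or Puppet would do the rendering and distribution.

```python
from string import Template

# Java-properties text for a single Flume 1.X agent. The component labels
# (s1, c1, k1) are arbitrary names chosen for this sketch.
AGENT_TEMPLATE = Template("""\
${agent}.sources = s1
${agent}.channels = c1
${agent}.sinks = k1

${agent}.sources.s1.type = netcat
${agent}.sources.s1.bind = 0.0.0.0
${agent}.sources.s1.port = ${port}
${agent}.sources.s1.channels = c1

${agent}.channels.c1.type = memory

${agent}.sinks.k1.type = logger
${agent}.sinks.k1.channel = c1
""")


def render_config(agent, port):
    """Return the configuration file text for one agent."""
    return AGENT_TEMPLATE.substitute(agent=agent, port=port)


# Hypothetical fleet: agent name -> listening port.
FLEET = {"web01": 12345, "web02": 12346}

# One rendered file body per agent, ready to be written to disk.
configs = {name: render_config(name, port) for name, port in FLEET.items()}
```

Because every file comes from the same template, a change is made once and re-rendered for all agents rather than edited by hand on each machine.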