Overview and Architecture
Flume 0.9
Flume was first introduced in Cloudera's CDH3 distribution in 2011. It consisted
of a federation of worker daemons (agents) configured from a centralized master
(or masters) via Zookeeper (a federated configuration and coordination system).
From the master, you could check the agent status in a web UI as well as push
out configuration centrally from the UI or via a command-line shell (both really
communicating via Zookeeper to the worker agents).
Data could be sent in one of three modes: Best Effort (BE), Disk Failover (DFO), and
End-to-End (E2E). The masters were used for the E2E mode acknowledgements, and
multimaster configuration never really matured, so you usually only had one master,
making it a central point of failure for E2E data flows. The BE mode is just what it
sounds like: the agent would try to send the data, but if it couldn't, the data would
be discarded. This mode is good for things such as metrics, where gaps can easily be
tolerated, as new data is just a second away. The DFO mode stored undeliverable data
on the local disk (or sometimes, in a local database) and would keep retrying until the
data could be delivered to the next recipient in your data flow. This is handy for
planned (or unplanned) outages, as long as you have sufficient local disk space to
buffer the load.
In June 2011, Cloudera moved control of the Flume project to the Apache Software
Foundation. It came out of incubator status a year later, in 2012. During the incubation
year, work had already begun to refactor Flume under the Star-Trek-themed tag, Flume-NG
(Flume the Next Generation).
Flume 1.X (Flume-NG)
There were many reasons why Flume was refactored. If you are interested in
the details, you can read about them at
https://issues.apache.org/jira/browse/FLUME-728. What started as a refactoring
branch eventually became the main line of development as Flume 1.X.
The most obvious change in Flume 1.X is that the centralized configuration master(s)
and Zookeeper are gone. The configuration in Flume 0.9 was overly verbose, and
mistakes were easy to make. Furthermore, centralized configuration was really outside
the scope of Flume's goals. Centralized configuration was replaced with a simple
on-disk configuration file (although the configuration provider is pluggable so that it
can be replaced). These configuration files are easily distributed using tools such as
CFEngine, Chef, and Puppet. If you are using a Cloudera distribution, take a look at
Cloudera Manager to manage your configurations. About two years ago, Cloudera created
a free version with no node limit, so it may be an attractive option for you. Just be
sure you don't manage these configurations by hand, or you'll be editing these files
manually forever.
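
To give a rough idea of what such an on-disk configuration file looks like, here is a minimal sketch in the Java properties format that Flume 1.X reads. The agent name (agent) and the component names (s1, c1, k1) are arbitrary labels chosen for this illustration, as are the bind address and port; the netcat source, memory channel, and logger sink are standard Flume 1.X component types.

```properties
# Illustrative only: the names agent, s1, c1, and k1 are made up for this sketch.
# Declare the components that belong to the agent called "agent".
agent.sources = s1
agent.channels = c1
agent.sinks = k1

# Source: listen for newline-terminated text on a local TCP port.
agent.sources.s1.type = netcat
agent.sources.s1.bind = localhost
agent.sources.s1.port = 44444
agent.sources.s1.channels = c1

# Channel: buffer events in memory between the source and the sink.
agent.channels.c1.type = memory

# Sink: write each event to the agent's log.
agent.sinks.k1.type = logger
agent.sinks.k1.channel = c1
```

Assuming the file is saved as conf/example.conf (a hypothetical path), an agent using it could be started with something like flume-ng agent -n agent -c conf -f conf/example.conf, where the value passed to -n must match the agent name used as the property prefix.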