Flume NG
Overview
Architecture
Data Delivery Semantics
Notes
Critical Features
Common Use Cases
Known Issues, Limitations, Concerns
Traits
Groups (client / server trait param)
Catalog of Sources / Sinks
Diagrams
Overview
This is the top level section for all Flume NG documentation. Flume NG is a refactoring of Flume and was originally tracked in FLUME-728. From
the JIRA's description:
To solve certain known issues and limitations, Flume requires a refactoring of some core classes and systems. This bug is a
parent issue to track the development of a "Flume NG" - a poorly named, but necessary refactoring. Subtasks should be added
to track individual systems and components.
The following known issues are specifically to be addressed:
Code complexity; Flume has evolved over the last few years and has a fair amount of extraneous code.
Core component lifecycle standardization and control code (e.g. anything that can be start()ed or stop()ed, sources,
sinks).
(Static) Configuration access throughout the code base.
Drastic simplification of common data paths (e.g. durability as an element of the source rather than a disconnected
sink).
Heartbeat and master rearchitecture.
Renaming packages to org.apache.flume.
This is a large and far reaching set of tasks. The intent is to perform this work in a branch as to not disrupt immediate releases or
short term forthcoming releases while still allowing open development in the community.
For reference, we refer to the code branch flume-728 (named for the refactoring JIRA) as "Flume NG." We call the current incarnation of Flume
"Flume OG" ("original generation" or the slightly funnier definition, "original gangsta"), which corresponds to the code branch trunk and that which
was previously released under the 0.9.x stream.
Historically, NG code has been worked on by Arvind Prabhakar, Prasad Mujumdar, and E. Sammer (me). Jon Hsieh, Patrick Hunt, and Henry
Robinson have provided help in vetting design. Will McQueen has provided usability and correctness testing. Development is obviously open to all
and we'd greatly appreciate anyone who wants to jump in and help!
It goes without saying that NG is based on the fantastic work that Jon Hsieh, who led the effort, and all of the other contributors put into Flume (OG).
Architecture
Flume NG's high level architecture solidifies a few concepts from Flume OG and drastically simplifies others. As our goals state, we are focused
on a streamlined codebase that meets the common use cases in a "batteries included," easy to use, easy to extend package. Flume NG retains
Flume OG's general approach to data transfer and handling (an N:M push model data transport, where N is big and M is significantly smaller).
The major components of the system are:
Event
An event is a singular unit of data that can be transported by Flume NG. Events are akin to messages in JMS and similar messaging
systems and are generally small (on the order of a few bytes to a few kilobytes). Events are also commonly single records in a larger
dataset. An event is made up of headers and a body; the former is a key / value map and the latter, an arbitrary byte array.
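To make the header / body split concrete, here is a minimal sketch of what such an event could look like as a plain Java value type. The class name SketchEvent, its accessors, and the choice of String keys and values for the headers are assumptions made for illustration only; this is not Flume NG's actual event type.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only: a stand-in for the "headers + body" shape
// described above, not the real Flume NG class.
final class SketchEvent {
    private final Map<String, String> headers; // key / value map (String types are an assumption)
    private final byte[] body;                 // arbitrary byte array

    SketchEvent(Map<String, String> headers, byte[] body) {
        // Copy both inputs so later changes by the caller do not alter the event.
        this.headers = Collections.unmodifiableMap(new HashMap<>(headers));
        this.body = Arrays.copyOf(body, body.length);
    }

    Map<String, String> getHeaders() {
        return headers;
    }

    byte[] getBody() {
        // Return a copy so callers cannot mutate the stored body.
        return Arrays.copyOf(body, body.length);
    }
}

class EventSketchDemo {
    public static void main(String[] args) {
        Map<String, String> headers = new HashMap<>();
        headers.put("host", "web01.example.com"); // example header, made up for the demo

        // A single record from a larger dataset, carried as raw bytes.
        byte[] body = "GET /index.html 200".getBytes(StandardCharsets.UTF_8);

        SketchEvent event = new SketchEvent(headers, body);
        System.out.println(event.getHeaders());
        System.out.println(new String(event.getBody(), StandardCharsets.UTF_8));
    }
}
```

Keeping the body as opaque bytes means the transport never needs to understand the payload; anything it might route or filter on would live in the headers.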