Introduction to Apache Apex
Siyuan Hua <hsy541@apache.org> @hsy541
PMC Apache Apex, Senior Engineer DataTorrent
Big Data Technology Conference, Beijing, Dec 10th 2016

Stream Data Processing
• Data sources: events, logs, sensor data, social, databases, CDC
• Transform / analytics: Oper1 → Oper2 → Oper3
• Data delivery: real-time visualization, …
• APIs: SAMOA, Beam, Declarative API, SQL (roadmap), DAG API, Operator Library
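The pipeline on this slide is a DAG of operators connected by streams: sources feed Oper1 → Oper2 → Oper3, which then feed data delivery. In Apex itself, an application assembles that DAG in `StreamingApplication.populateDAG` by calling `DAG.addOperator` and `DAG.addStream`. The sketch below does not use that API. It is a plain-Java stand-in built from hypothetical classes (`OutputPort`, `MapOperator`, `DagSketch`), and it only shows tuples flowing through connected operators in a single thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

// Hypothetical stand-in for an operator output port: forwards each emitted tuple to its connected sinks.
class OutputPort<T> {
    private final List<Consumer<T>> sinks = new ArrayList<>();

    void connect(Consumer<T> sink) { sinks.add(sink); }

    void emit(T tuple) {
        for (Consumer<T> sink : sinks) sink.accept(tuple);
    }
}

// Hypothetical map-style operator: applies a function to each input tuple and emits the result.
class MapOperator<I, O> implements Consumer<I> {
    final OutputPort<O> output = new OutputPort<>();
    private final Function<I, O> fn;

    MapOperator(Function<I, O> fn) { this.fn = fn; }

    @Override
    public void accept(I tuple) { output.emit(fn.apply(tuple)); }
}

class DagSketch {
    // Wires source -> Oper1 -> Oper2 -> Oper3 -> delivery and pushes every event through.
    static List<String> run(List<String> events) {
        MapOperator<String, String> oper1 = new MapOperator<>(String::trim);        // clean
        MapOperator<String, String> oper2 = new MapOperator<>(String::toUpperCase); // transform
        MapOperator<String, String> oper3 = new MapOperator<>(s -> s + "!");        // annotate
        List<String> delivered = new ArrayList<>();                                 // delivery sink
        oper1.output.connect(oper2);          // "stream" Oper1 -> Oper2
        oper2.output.connect(oper3);          // "stream" Oper2 -> Oper3
        oper3.output.connect(delivered::add); // "stream" Oper3 -> delivery
        for (String e : events) oper1.accept(e); // the source emits each event
        return delivered;
    }

    public static void main(String[] args) {
        System.out.println(run(List.of(" click ", "view"))); // prints [CLICK!, VIEW!]
    }
}
```

In Apex the operators would run distributed across a cluster, and a stream may cross process or node boundaries. Here every "stream" is simply a direct method call.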

Industries & Use Cases
• Financial Services: fraud and risk monitoring; credit risk assessment; improving turnaround time of trade settlement processes
• Ad-Tech: real-time customer-facing dashboards on key performance indicators; click fraud detection; billing optimization
• Telecom: call detail record (CDR) & extended data record (XDR) analysis; understanding customer behavior and context; packaging and selling anonymous customer data
• Manufacturing: supply chain planning & optimization; preventive maintenance; product quality & defect tracking
• Energy: smart meter analytics; reducing outages & improving resource utilization; asset & workforce management
• IoT: data ingestion and processing; predictive analytics; data governance
Horizontal use cases:
• Large-scale ingest and distribution
• Real-time ELTA (Extract Load Transform Analyze)
• Dimensional computation & aggregation
• Enforcing data quality and data governance requirements
• Real-time data enrichment with reference data
• Real-time machine learning model scoring
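Two of the horizontal use cases, real-time enrichment with reference data and dimensional aggregation, reduce to a lookup followed by a keyed sum. The plain-Java sketch below rests on two assumptions: a hypothetical `Event` that carries a customer id and a usage amount, and a reference table that maps customer ids to regions. It is not an Apex operator.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class EnrichAndAggregate {
    // Hypothetical incoming event: a customer id and a usage amount (e.g. bytes).
    static final class Event {
        final String customerId;
        final long usage;

        Event(String customerId, long usage) {
            this.customerId = customerId;
            this.usage = usage;
        }
    }

    // Enriches each event with its region from the reference table, then sums usage per region.
    // Events whose customer is missing from the reference data are grouped under "UNKNOWN".
    static Map<String, Long> usageByRegion(List<Event> events, Map<String, String> customerRegion) {
        Map<String, Long> totals = new TreeMap<>(); // sorted keys for deterministic output
        for (Event e : events) {
            String region = customerRegion.getOrDefault(e.customerId, "UNKNOWN"); // enrichment lookup
            totals.merge(region, e.usage, Long::sum);                              // aggregate by dimension
        }
        return totals;
    }

    public static void main(String[] args) {
        Map<String, String> reference = new HashMap<>();
        reference.put("c1", "Beijing");
        reference.put("c2", "Shanghai");
        List<Event> events = List.of(new Event("c1", 100), new Event("c2", 50),
                                     new Event("c1", 25), new Event("c9", 7));
        System.out.println(usageByRegion(events, reference)); // prints {Beijing=125, Shanghai=50, UNKNOWN=7}
    }
}
```

In a streaming setting the reference table would typically be loaded once, or refreshed periodically. The totals would be emitted per window rather than returned once.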

Apache Apex
• In-memory, distributed, parallel stream processing
  • Application logic is broken into components (operators) that execute in parallel across a cluster
  • Unobtrusive Java API to express (custom) logic
  • Maintain state and metrics in member variables
  • Windowing, event-time processing
• Scalable, high throughput, low latency
  • Operators can be scaled up or down at runtime according to the load and SLA
  • Dynamic scaling (elasticity), compute locality
• Fault tolerance & correctness
  • Automatically recover from node outages without having to reprocess from the beginning
  • State is preserved: checkpointing, incremental recovery
  • End-to-end exactly-once
• Operability
  • System and application metrics, record/visualize data
  • Dynamic changes and resource allocation, elasticity
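Two bullets above can be illustrated together: "Maintain state and metrics in member variables" and "Windowing, event-time processing". In the sketch, the operator's state is just a field. Each event is assigned to a tumbling window by its own timestamp rather than by its arrival time. `TumblingWindowCounter` is a hypothetical plain-Java class, not the Apex operator interface. Apex operators also receive engine callbacks such as `beginWindow(long windowId)` and `endWindow()`, which this sketch does not model. The watermark passed to `closeWindowsUpTo` is another assumption of the sketch: it stands for "no more events older than this will arrive".

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical operator: counts events per event-time tumbling window.
// The running counts live in an ordinary member variable; nothing is persisted here.
class TumblingWindowCounter {
    private final long windowSizeMillis;
    private final TreeMap<Long, Long> countsByWindowStart = new TreeMap<>(); // operator state

    TumblingWindowCounter(long windowSizeMillis) {
        if (windowSizeMillis <= 0) throw new IllegalArgumentException("window size must be positive");
        this.windowSizeMillis = windowSizeMillis;
    }

    // Assigns the event to a window by its event time, so late arrivals still land in the right window.
    void process(long eventTimeMillis) {
        long windowStart = Math.floorDiv(eventTimeMillis, windowSizeMillis) * windowSizeMillis;
        countsByWindowStart.merge(windowStart, 1L, Long::sum);
    }

    // Emits, and drops from state, every window [start, start + size) that ends at or before the watermark.
    Map<Long, Long> closeWindowsUpTo(long watermarkMillis) {
        Map<Long, Long> view = countsByWindowStart.headMap(watermarkMillis - windowSizeMillis, true);
        Map<Long, Long> closed = new TreeMap<>(view);
        view.clear(); // clearing the head-map view removes those windows from the backing state
        return closed;
    }

    public static void main(String[] args) {
        TumblingWindowCounter counter = new TumblingWindowCounter(1000);
        counter.process(100);
        counter.process(900);
        counter.process(1500);
        counter.process(50); // arrives after 1500 but belongs to window 0
        System.out.println(counter.closeWindowsUpTo(1000)); // prints {0=3}
        System.out.println(counter.closeWindowsUpTo(2000)); // prints {1000=1}
    }
}
```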
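Scaling an operator up or down means running more or fewer parallel instances (partitions) of it. A common way to keep keyed state consistent is to route each tuple by a hash of its key, so that every tuple for a given key lands on the same instance. The sketch below shows only that routing idea. `HashPartitioner` is hypothetical and is not Apex's partitioning API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical router that spreads tuples over N parallel instances of one logical operator
// by hashing a key, so all tuples for a key reach the same instance (and its state).
class HashPartitioner {
    private final int partitions;

    HashPartitioner(int partitions) {
        if (partitions < 1) throw new IllegalArgumentException("need at least one partition");
        this.partitions = partitions;
    }

    int partitionFor(String key) {
        return Math.floorMod(key.hashCode(), partitions); // floorMod keeps the index non-negative
    }

    // Groups keys by target partition; list index i holds the keys routed to instance i.
    List<List<String>> route(List<String> keys) {
        List<List<String>> out = new ArrayList<>();
        for (int i = 0; i < partitions; i++) out.add(new ArrayList<>());
        for (String k : keys) out.get(partitionFor(k)).add(k);
        return out;
    }

    // Scaling up or down is modeled only as a router with a different partition count.
    HashPartitioner rescale(int newPartitions) {
        return new HashPartitioner(newPartitions);
    }

    public static void main(String[] args) {
        HashPartitioner two = new HashPartitioner(2);
        List<String> keys = List.of("a", "b", "c", "d");
        System.out.println(two.route(keys));            // prints [[b, d], [a, c]]
        System.out.println(two.rescale(4).route(keys)); // prints [[d], [a], [b], [c]]
    }
}
```

`rescale` only changes routing. When an operator is repartitioned, a real engine also has to move the per-key state between instances, and that step is outside this sketch.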

Native Hadoop Integration
• YARN is the resource manager
• HDFS for storing persistent state
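Storing persistent state in HDFS underpins checkpointing and recovery. Operator state is periodically written out, and after a failure the operator is restored from its last checkpoint rather than recomputed from the start. The sketch below walks through that cycle under two assumptions. A local temporary directory stands in for HDFS, and plain Java serialization stands in for whatever serializer the engine actually uses. `FileCheckpointStore` is hypothetical.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;

// Hypothetical checkpoint store: a local temp directory stands in for HDFS.
class FileCheckpointStore {
    private final Path dir;

    FileCheckpointStore() {
        try {
            dir = Files.createTempDirectory("apex-sketch-checkpoints");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Persists a snapshot of the operator state, keyed by the window it was taken at.
    void save(long windowId, HashMap<String, Long> state) {
        try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(dir.resolve("window-" + windowId)))) {
            out.writeObject(state);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reloads the snapshot taken at the given window.
    @SuppressWarnings("unchecked")
    HashMap<String, Long> load(long windowId) {
        try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(dir.resolve("window-" + windowId)))) {
            return (HashMap<String, Long>) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        FileCheckpointStore store = new FileCheckpointStore();
        HashMap<String, Long> state = new HashMap<>(); // operator state
        state.merge("clicks", 3L, Long::sum);
        store.save(10, state);                 // checkpoint at the end of window 10
        state.merge("clicks", 5L, Long::sum);  // work done after the checkpoint
        // Simulated failure: in-memory state is lost and the operator is restored.
        HashMap<String, Long> recovered = store.load(10);
        System.out.println(recovered);         // prints {clicks=3}
    }
}
```

For the lost `+5` to be recomputed, a real engine would also need to replay the input that arrived after window 10 into the restored operator. That replay is not modeled here.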