
©2013 Raj Jain
http://www.cse.wustl.edu/~jain/cse570-13/
Washington University in St. Louis
Big Data Fundamentals
Raj Jain
Washington University in Saint Louis
Saint Louis, MO 63130
Jain@cse.wustl.edu
These slides and audio/video recordings of this class lecture are at:
http://www.cse.wustl.edu/~jain/cse570-13/

Overview
1. Why Big Data?
2. Terminology
3. Key Technologies: Google File System, MapReduce, Hadoop
4. Hadoop and other database tools
5. Types of Databases
Ref: J. Hurwitz, et al., “Big Data for Dummies,” Wiley, 2013, ISBN: 978-1-118-50422-2

Big Data
Data is measured by 3 V's:
Volume: TB
Velocity: TB/sec. Speed of creation or change
Variety: Type (text, audio, video, images, geospatial, ...)
Increasing processing power, storage capacity, and networking have caused data to grow in all 3 dimensions.
Volume, Location, Velocity, Churn, Variety, Veracity (accuracy, correctness, applicability)
Examples: social network data, sensor networks, Internet search, genomics, astronomy, ...

Why Big Data Now?
1. Low-cost storage to store data that was discarded earlier
2. Powerful multi-core processors
3. Low latency possible by distributed computing: compute clusters and grids connected via high-speed networks
4. Virtualization: partition, aggregate, and isolate resources in any size and change them dynamically, minimizing latency at any scale
5. Affordable storage and computing with minimal manpower via clouds
Possible because of advances in networking

Why Big Data Now? (Cont)
6. Better understanding of task distribution (MapReduce) and computing architecture (Hadoop)
7. Advanced analytical techniques (machine learning)
8. Managed Big Data platforms: cloud service providers, such as Amazon Web Services, provide Elastic MapReduce, Simple Storage Service (S3), and HBase, a column-oriented database. Google provides BigQuery and the Prediction API.
9. Open-source software: OpenStack, PostgreSQL
10. March 12, 2012: Obama announced $200M for Big Data research, distributed via NSF, NIH, DOE, DoD, DARPA, and USGS (Geological Survey)
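
The list above names MapReduce as a model for task distribution. As a rough illustration only, the sketch below runs a word count through map, shuffle, and reduce steps inside a single Python process. It is not Hadoop code and is not taken from the slides; the function names (map_phase, shuffle, reduce_phase, word_count) and the sample documents are hypothetical. In a real cluster the map and reduce calls would run on different machines.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: combine all values for one key into a single count."""
    return key, sum(values)


def word_count(documents):
    # Each document could be mapped independently (in parallel on a cluster).
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle(mapped)
    # Each key could likewise be reduced independently.
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample input, for illustration only.
    docs = ["big data needs big storage", "data grows fast"]
    print(word_count(docs))
```

Because each document is mapped on its own and each key is reduced on its own, both steps can be spread across many nodes, which is the task-distribution idea the list refers to.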