Mastering Big Data SMACK: A Guide to Apache Spark, Mesos, and Related Technologies

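Big Data SMACK: A Guide to Apache Spark, Mesos, Akka, Cassandra, and Kafka, written by Raul Estrada and Isaac Ruiz and published in 2016, covers in depth SMACK (also called Spark++), a popular technology stack in big data. The name SMACK is formed from the names of its components, which the book regards as central to current and future big data processing.

The book points out that in developer salary surveys from 2014, 2015, and 2016, data engineers, data scientists, and data architects were among the highest paid, which reflects the strong demand for big data skills in the IT industry. Traditionally, work with large volumes of data was left mainly to specialists holding PhDs from top universities, but that is changing as the technology evolves. According to the book, Apache Spark is disrupting the industry because it is open source and so breaks the hold that large companies had on data processing platforms: contributions from a broad developer community make tools like Spark more capable than proprietary software.

Spark is also easy to install and deploy. It can be set up on a personal laptop, so developers, and startups and small companies in particular, can build applications without a large production environment or a large laboratory. This flexibility gives more people access to big data development.

The book aims to help readers master the SMACK stack and understand why it may become mainstream. It covers Spark (a distributed computing framework), Mesos (a resource manager), Akka (a high-performance framework for concurrent programming), Cassandra (a distributed database), and Kafka (a real-time streaming platform). It is intended both for readers who want to become well-paid IT professionals and for those already in the field who want to follow where it is heading.

The copyright notice states that all content is protected and may not be copied, translated, reprinted, or distributed in any form without permission. The book may contain trademarked names, logos, and images, which are subject to the applicable trademark rules.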

Contents (excerpt: glossary entries and index)

HTAP
IaaS
idempotence
IMDG
IoT
key-value
keyspace
latency
master-slave
metadata
NoSQL
operational analytics
RDBMS
real-time analytics
replication
PaaS
probabilistic data structures
SaaS
scalability
shared nothing
Spark-Cassandra Connector
streaming analytics
synchronization
unstructured data
Index

Introduction

Surveys from 2014, 2015, and 2016 show that, among all software developers, the highest paid are data engineers, data scientists, and data architects.

This is because demand for technical data professionals is huge while, unfortunately for large organizations and fortunately for developers, supply is very low.

Traditionally, large volumes of information have been handled by specialized scientists and people with PhDs from the most prestigious universities. This is due to the popular belief that not all of us have access to large volumes of corporate data or large enterprise production environments.

Apache Spark is disrupting the data industry for two reasons. The first is that it is an open source project. In the last century, companies like IBM, Microsoft, SAP, and Oracle were the only ones capable of handling large volumes of data, and today there is so much competition among them that disseminating designs or platform algorithms is strictly forbidden. Thus, the benefits of open source become stronger, because the contributions of so many people make free tools more powerful than proprietary ones.

The second reason is that you do not need a production environment with large volumes of data or large laboratories to develop in Apache Spark. Apache Spark can be installed easily on a laptop, and the work developed there can be moved easily to enterprise environments with large volumes of data. Apache Spark also makes data development free and accessible to startups and small companies.

If you are reading this book, it is for one of two reasons: either you want to be among the best-paid IT professionals, or you already are and you want to learn how today's trends will become requirements in the not too distant future.

In this book, we explain how to master the SMACK stack, also called Spark++, because it seems to be the open stack most likely to succeed in the near future.
