Building Scalable, Reliable Data-Intensive Applications: Principles and Practice


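Designing Data-Intensive Applications, by Martin Kleppmann, examines how to build software systems that are scalable, reliable, and maintainable. It analyzes the key principles, algorithms, and trade-offs of data systems, and uses the internals of popular software packages and frameworks to explain how they work.

Part I, “Foundations of Data Systems”, opens with Chapter 1, “Reliable, Scalable, and Maintainable Applications”, and goes on to cover the choice of data models and query languages (Chapter 2) and storage and retrieval (Chapter 3). Chapter 4, “Encoding and Evolution”, discusses data encoding formats and how schemas evolve over time, which helps a system stay stable and adaptable in the long run.

Part II, “Distributed Data”, begins with Chapter 5, “Replication”, and Chapter 6, “Partitioning”, which discuss how to distribute large volumes of data across machines. The following chapters cover transactions (Chapter 7), the problems that commonly arise in distributed systems (Chapter 8), and consistency and consensus (Chapter 9). This material is essential for understanding and handling the complexity of distributed environments.

Part III, “Derived Data”, covers batch processing (Chapter 10) and stream processing (Chapter 11), techniques that are central to large-scale data analysis and timely decision support. The final chapter, “The Future of Data Systems”, looks ahead at where the field and its technology are heading.

Beyond guidance on choosing tools and technologies, the book emphasizes building intuition and the ability to track down problems, so that readers can locate an issue quickly and find a solution. It is aimed at software engineers, architects, data scientists, and developers interested in large-scale data systems, and it is useful both in academic study and in practical projects. Drawing on his experience and expertise, Kleppmann combines examples with theory to give readers a comprehensive framework for designing and optimizing data-intensive applications. The book is a useful reference for IT professionals and a good read for anyone who wants to better understand distributed, data-intensive systems.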


which it is changing (as opposed to compute-intensive, where CPU cycles are the bottleneck).

The tools and technologies that help data-intensive applications store and process data have been rapidly adapting to these changes. New types of database systems (“NoSQL”) have been getting lots of attention, but message queues, caches, search indexes, frameworks for batch and stream processing, and related technologies are very important too. Many applications use some combination of these.

The buzzwords that fill this space are a sign of enthusiasm for the new possibilities, which is a great thing. However, as software engineers and architects, we also need to have a technically accurate and precise understanding of the various technologies and their trade-offs if we want to build good applications. For that understanding, we have to dig deeper than buzzwords.

Fortunately, behind the rapid changes in technology, there are enduring principles that remain true, no matter which version of a particular tool you are using. If you understand those principles, you’re in a position to see where each tool fits in, how to make good use of it, and how to avoid its pitfalls. That’s where this book comes in.

The goal of this book is to help you navigate the diverse and fast-changing landscape of technologies for processing and storing data. This book is not a tutorial for one particular tool, nor is it a textbook full of dry theory. Instead, we will look at examples of successful data systems: technologies that form the foundation of many popular applications, and that have to meet scalability, performance and reliability requirements in production every day.

We will dig into the internals of those systems, tease apart their key algorithms, and discuss their principles and the trade-offs they have to make. On this journey, we will try to find useful ways of thinking about data systems: not just how they work, but also why they work that way, and what questions we need to ask.

After reading this book, you will be in a great position to decide which kind of technology is appropriate for which purpose, and understand how tools can be combined to form the foundation of a good application architecture. You won’t be ready to build your own database storage engine from scratch, but fortunately that is rarely necessary. You will, however, develop a good intuition for what your systems are doing under the hood, so that you can reason about their behavior, make good design decisions, and track down any problems that may arise.

Who Should Read this Book?

If you develop applications that have some kind of server/backend for storing or processing data, and your applications use the internet (e.g. web applications, mobile apps, or internet-connected sensors), then this book is for you.

xiv | About this Book

This book is for software engineers, software architects and technical managers who love to code. It is especially relevant if you need to make decisions about the architecture of the systems you work on, for example if you need to choose tools for solving a given problem and figure out how best to apply them. But even if you have no choice over your tools, this book will help you better understand their strengths and weaknesses.

You should have some experience building web-based applications or network services, and you should be familiar with relational databases and SQL. Any non-relational databases and other data-related tools you know are a bonus, but not required. A general understanding of common network protocols like TCP and HTTP is helpful. Your choice of programming language or framework makes no difference for this book.

If any of the following are true for you, you’ll find this book valuable:

• You want to learn how to make data systems scalable, for example to support web or mobile apps with millions of users.

• You need to make applications highly available (minimizing downtime) and operationally robust.

• You are looking for ways of making systems easier to maintain in the long run, even as they grow, and as requirements and technologies change.

• You have a natural curiosity for the way things work, and want to know what goes on inside major websites and online services. This book breaks down the internals of various databases and data processing systems, and it’s great fun to explore the bright thinking that went into their design.

Sometimes, when discussing scalable data systems, people make comments along the lines of “you’re not Google or Amazon, stop worrying about scale and just use a relational database”. There is truth in that statement: building for scale that you don’t need is wasted effort, and may lock you into an inflexible design. In effect, it is a form of premature optimization. However, it’s also important to choose the right tool for the job, and different technologies each have their own strengths and weaknesses. As we shall see, relational databases are important, but not the final word on dealing with data.

Scope of this Book

This book does not attempt to give detailed instructions on how to install or use specific software packages or APIs, since there is already plenty of documentation for those things. Instead we discuss the various principles and trade-offs that are fundamental to data systems, and we explore the different design decisions taken by different products.


Most of what we discuss in this book has already been said elsewhere in some form or another: in conference presentations, research papers, blog posts, code, bug trackers, and engineering folklore. This book summarizes the most important ideas from many different sources, and it includes pointers to the original literature throughout the text. The references at the end of each chapter are a great resource if you want to explore an area in more depth.

We look primarily at the architecture of data systems and the ways they are integrated into data-intensive applications. This book doesn’t have space to cover deployment, operations, security, ethics, management and other areas; those are complex and important topics, and we wouldn’t do them justice by making them superficial side-notes in this book. They deserve books of their own.

Many of the technologies described in this book fall within the realm of the Big Data buzzword. However, the term Big Data is so over-used and under-defined that it is not useful in a serious engineering discussion. This book uses less ambiguous terms, such as single-node vs. distributed systems, or online/interactive vs. offline/batch processing systems.

This book has a bias towards free and open source software (FOSS), because reading, modifying and executing source code is a great way to understand how something works in detail. Open platforms also reduce the risk of vendor lock-in. However, where appropriate, we also discuss proprietary software (closed-source software, software as a service, or companies’ in-house software that is only described in literature but not released publicly).

Outline of this Book

This book is arranged into three parts:

1. In Part I, we will discuss the fundamental ideas that we need in order to design data-intensive applications. We’ll start in Chapter 1 by discussing what we’re actually trying to achieve: reliability, scalability and maintainability (how we need to think about them, and how we can achieve them). In Chapter 2 we will compare several different data models and query languages, and see how they are appropriate to different situations. In Chapter 3 we will talk about storage engines: how databases arrange data on disk so that you can find it again efficiently. Chapter 4 turns to formats for data encoding (serialization) and evolution of schemas over time.

2. In Part II, we will move from data stored on one machine to data that is distributed across multiple machines. This is often necessary for scalability, but brings with it a variety of unique challenges. We’ll first discuss replication (Chapter 5), partitioning/sharding (Chapter 6), and transactions (Chapter 7). We will then go into more detail on the problems with distributed systems (Chapter 8)

