Apache Accumulo: A Distributed Key-Value Storage System

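"Apache Accumulo is a distributed key-value storage system based on the design of Google's BigTable and suited to big data processing. It builds on Apache Hadoop, ZooKeeper, and Thrift to provide high reliability, scalability, and performance. Accumulo emphasizes cell-level access control and custom server-side processing, and offers automatic load balancing, data partitioning, data compression, and fine-grained security labels."

Apache Accumulo is a distributed storage system designed for big data analysis. It originates from Google's BigTable design and implements that design within the Apache Hadoop ecosystem. Accumulo is more than a simple key-value store: it provides a richer data model that allows finer-grained data operations. Data is stored as key-value pairs in which the key is made up of several elements, such as the row ID, column family, column qualifier, and timestamp. Apart from the timestamp, which is a number, these elements are byte arrays. Accumulo keeps entries sorted by these key elements, so scans can return data efficiently and in order.

Accumulo's architecture consists of several components spread across multiple servers, which keeps storage and retrieval efficient and reliable. The TabletServer is one of the core components: it manages a subset of a table's partitions (tablets), serves writes and reads, and maintains a sorted in-memory view of recent writes. A write-ahead log protects the data: if a TabletServer fails, writes that had not yet been persisted to files can be recovered from the log. The garbage collector periodically removes files that are no longer needed, reclaiming storage space. The Master monitors the whole system, handles TabletServer failures, balances load, and manages the table lifecycle.

Accumulo also introduces a fine-grained security mechanism: every key-value pair can carry a security label, and access control policies can be defined in terms of those labels. This makes Accumulo a suitable storage solution for security-sensitive environments. Its automatic load balancing and partitioning keep the system scalable, so that resource allocation adjusts dynamically as data volume grows.

Apache Accumulo combines Hadoop's distributed computing capabilities, BigTable's way of organizing data, and its own security and management features to provide a powerful and flexible infrastructure for big data processing. It fits data analysis, real-time queries, and applications with strict security requirements.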
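The sorted key model and the per-entry security labels described above can be illustrated with a small sketch. This is plain Python, not the Accumulo client API: the `Key` tuple, `sort_order`, `is_visible`, and `scan` are hypothetical names introduced only for illustration, and the label check handles only flat expressions that use `&` alone or `|` alone (no parentheses). It assumes the usual Accumulo ordering, in which entries sort ascending by row, column family, column qualifier, and visibility, with newer timestamps first.

```python
from typing import List, NamedTuple, Set, Tuple


class Key(NamedTuple):
    """Hypothetical stand-in for an Accumulo key (not the real client class)."""
    row: bytes
    family: bytes
    qualifier: bytes
    visibility: bytes  # security label expression; empty means unrestricted
    timestamp: int


def sort_order(key: Key) -> tuple:
    # Byte-wise ascending on the first four elements; the timestamp is
    # negated so that the newest version of a cell sorts first.
    return (key.row, key.family, key.qualifier, key.visibility, -key.timestamp)


def is_visible(visibility: bytes, authorizations: Set[str]) -> bool:
    """Simplified label check: flat 'a&b' or flat 'a|b' expressions only."""
    expr = visibility.decode("utf-8")
    if not expr:
        return True  # unlabeled entries are visible to every reader
    if "(" in expr or ("&" in expr and "|" in expr):
        raise ValueError("sketch handles only flat '&' or flat '|' expressions")
    if "|" in expr:
        return any(label in authorizations for label in expr.split("|"))
    return all(label in authorizations for label in expr.split("&"))


def scan(entries: List[Tuple[Key, bytes]],
         authorizations: Set[str]) -> List[Tuple[Key, bytes]]:
    # Drop entries the reader may not see, then return the rest in key order.
    visible = [(k, v) for k, v in entries if is_visible(k.visibility, authorizations)]
    return sorted(visible, key=lambda kv: sort_order(kv[0]))


ENTRIES = [
    (Key(b"row2", b"info", b"name", b"", 5), b"bob"),
    (Key(b"row1", b"info", b"salary", b"hr&finance", 7), b"1000"),
    (Key(b"row1", b"info", b"name", b"", 3), b"alice-old"),
    (Key(b"row1", b"info", b"name", b"", 9), b"alice"),
    (Key(b"row1", b"info", b"notes", b"hr|audit", 4), b"reviewed"),
]

if __name__ == "__main__":
    for key, value in scan(ENTRIES, {"audit"}):
        print(key.row, key.qualifier, key.timestamp, value)
```

A reader holding only the `audit` authorization sees the unlabeled cells and the `hr|audit` cell, but not the `hr&finance` cell. A real deployment would rely on Accumulo's own visibility parsing and server-side filtering rather than a client-side check like this one.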
Editorial Reviews

Build and integrate Accumulo clusters with various cloud platforms

Overview
- Shows you how to build Accumulo, Hadoop, and ZooKeeper clusters from scratch on both Windows and Linux
- Allows you to get hands-on knowledge about how to run Accumulo on Amazon EC2, Google Cloud Platform, Rackspace, and Windows Azure Cloud platforms
- Packed with practical examples to enable you to manipulate Accumulo with ease

In Detail

Accumulo is a sorted and distributed key/value store designed to handle large amounts of data. Being highly robust and scalable, its performance makes it ideal for real-time data storage. Apache Accumulo is based on Google's BigTable design and is built on top of Apache Hadoop, ZooKeeper, and Thrift.

Apache Accumulo for Developers is your guide to building an Accumulo cluster, both single-node and multi-node, on-site and in the cloud. Accumulo has been proven to handle petabytes of data with cell-level security and real-time analysis, and this step-by-step guide shows you how to take full advantage of that power.

Apache Accumulo for Developers looks at the process of setting up three systems (Hadoop, ZooKeeper, and Accumulo) and configuring, monitoring, and securing them. You will learn to connect Accumulo to both Hadoop and ZooKeeper. You will also learn how to monitor the cluster (single-node or multi-node) to find any performance bottlenecks, and then integrate with Amazon EC2, Google Cloud Platform, Rackspace, and Windows Azure. When integrating with these cloud platforms, we will focus on scripting as well. You will also learn to troubleshoot clusters with monitoring tools, and use Accumulo cell-level security to secure your data.
What you will learn from this book
- Set up Hadoop, ZooKeeper, and Accumulo
- Monitor clusters, both performance and application logs
- Secure your data in Accumulo
- Optimize Hadoop, ZooKeeper, and Accumulo performance
- Integrate with various cloud platforms
- Use the Accumulo command-line shell
- Employ Ganglia to monitor the cluster and Graylog2 to monitor application logs
- Understand what tools are needed to optimize Accumulo performance

Approach

The book takes a tutorial-based approach that shows readers how to build an Accumulo cluster from scratch, monitor the system, and implement aspects such as security.

Who this book is written for

This book is for developers new to Accumulo who are looking for a good grounding in how to use it. It is assumed that you understand how Hadoop works, both HDFS and MapReduce. No prior knowledge of ZooKeeper is assumed.

Table of Contents
Chapter 1: Building an Accumulo Cluster from Scratch
Chapter 2: Monitoring and Managing Accumulo
Chapter 3: Integrating Accumulo into Various Cloud Platforms
Chapter 4: Optimizing Accumulo Performance
Chapter 5: Security
Appendix A: Accumulo Command References
Appendix B: Hadoop Command References
Appendix C: ZooKeeper Command References

Book Details
Title: Apache Accumulo for Developers
Author: Guðmundur Jón Halldórsson
Length: 120 pages
Edition: 1
Language: English
Publisher: Packt Publishing
Publication Date: 2013-10-16
ISBN-10: 1783285990
ISBN-13: 9781783285990