Building a Big Data Architecture on the Azure Cloud: Hands-On with HDInsight

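HDInsight Essentials (Second Edition, 2015), by Rajesh Nadipalli, addresses the shortcomings of traditional relational databases when faced with big data workloads and explains how to build and deploy a modern big data architecture that strengthens an organization's data management capabilities. Its core focus is HDInsight, the Hadoop-based service in the Microsoft Azure cloud. Readers learn how to set up, manage, and analyze high-volume, high-velocity data.

The book offers practical examples showing how to create your own HDInsight cluster to ingest, organize, transform, and analyze data. It covers key technologies of the Hadoop ecosystem, including Hive (data warehouse queries), Pig (a data-flow programming language), MapReduce (a distributed computing framework), HBase (a NoSQL database), and Storm (a real-time data processing system). The author also discusses analysis tools such as Excel Power Query, Power Map, and Power BI for improving data visualization and business insight.

Beyond HDInsight's core features, readers learn how to combine it with other tools and technologies to build a complete big data processing platform. The book suits professionals who already have an IT background and also gives beginners moving into big data a practical learning path.

On copyright: no part of the content may be reproduced, stored, or transmitted in any form without the prior written permission of the publisher, Packt Publishing, except for quotations in reviews or scholarly articles. Although the author and publisher have tried to ensure the accuracy of the information, it is provided without warranty, and they accept no liability for direct or indirect losses resulting from use of the book.

Overall, HDInsight Essentials is a practical guide intended to help organizations use HDInsight to meet big data challenges, improve business decision-making, and stay competitive in a fast-moving field.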
With Microsoft HDInsight, business professionals and data analysts can rapidly leverage the power of Hadoop on a flexible, scalable cloud-based platform, using Microsoft's accessible business intelligence, visualization, and productivity tools. Now, in just 24 lessons of one hour or less, you can learn all the skills and techniques you'll need to provision, configure, monitor, troubleshoot, and use HDInsight, even if you're new to big data analytics. Each short, easy lesson builds on all that's come before: you'll learn all of HDInsight's essentials as you solve real data analytics problems.

Sams Teach Yourself Big Data Analytics with Microsoft HDInsight in 24 Hours covers all this, and much more:

Introduction to big data, NoSQL systems, their business value proposition, and example use cases
Introduction to Hadoop, its architecture and ecosystem, and Microsoft HDInsight
Getting to know Hadoop 2.0 and the innovations it provides, such as HDFS2 and YARN
Quickly installing, configuring, and monitoring Hadoop (HDInsight) clusters in the cloud, and automating cluster provisioning
Customizing the HDInsight cluster and installing additional Hadoop ecosystem projects using Script Actions
Administering HDInsight from the Hadoop command prompt or Microsoft PowerShell
Using the Microsoft Azure HDInsight Emulator for learning or development
Understanding HDFS, HDFS vs. Azure Blob Storage, the MapReduce Job Framework, and the Job Execution Pipeline
Doing big data analytics with MapReduce, writing your MapReduce programs in the .NET programming language of your choice, such as C#
Using Hive for big data analytics, with an end-to-end scenario and a look at how Apache Tez improves performance severalfold
Consuming HDInsight data from Microsoft BI tools over the Hive ODBC Driver: using HDInsight with Microsoft BI and Power BI to simplify data integration, analysis, and reporting
Using Pig for big data transformation workflows, step by step
Apache HBase on HDInsight: its architecture and data model, HBase vs. Hive, and programmatically managing HBase data with C# and Apache Phoenix
Using Sqoop or SSIS (SQL Server Integration Services) to move data to and from HDInsight and build data integration workflows for transferring data
Using Oozie for scheduling, coordinating, and managing data processing workflows in an HDInsight cluster
Using the R programming language with HDInsight to perform statistical computing on big data sets
Using Apache Spark's in-memory computation model to run big data analytics up to 100 times faster than Hadoop MapReduce
Performing real-time stream analytics on high-velocity big data streams with Storm
Integrating an enterprise data warehouse with Hadoop and Microsoft Analytics Platform System (APS), formerly known as SQL Server Parallel Data Warehouse (PDW)

Step-by-step instructions walk you through common questions, issues, and tasks; Q-and-As, Quizzes, and Exercises build and test your knowledge; "Did You Know?" tips offer insider advice and shortcuts; and "Watch Out!" alerts help you avoid problems. By the time you're finished, you'll be comfortable going beyond the book to create any HDInsight app you can imagine!

Table of Contents

Part I: Understanding Big Data, Hadoop 1.0, and 2.0
Hour 1. Introduction of Big Data, NoSQL, and Business Value Proposition
Hour 2. Introduction to Hadoop, Its Architecture, Ecosystem, and Microsoft Offerings
Hour 3. Hadoop Distributed File System Versions 1.0 and 2.0
Hour 4. The MapReduce Job Framework and Job Execution Pipeline
Hour 5. MapReduce: Advanced Concepts and YARN

Part II: Getting Started with HDInsight and Understanding Its Different Components
Hour 6. Getting Started with HDInsight, Provisioning Your HDInsight Service Cluster, and Automating HDInsight Cluster Provisioning
Hour 7. Exploring Typical Components of HDFS Cluster
Hour 8. Storing Data in Microsoft Azure Storage Blob
Hour 9. Working with Microsoft Azure HDInsight Emulator

Part III: Programming MapReduce and HDInsight Script Action
Hour 10. Programming MapReduce Jobs
Hour 11. Customizing the HDInsight Cluster with Script Action

Part IV: Querying and Processing Big Data in HDInsight
Hour 12. Getting Started with Apache Hive and Apache Tez in HDInsight
Hour 13. Programming with Apache Hive, Apache Tez in HDInsight, and Apache HCatalog
Hour 14. Consuming HDInsight Data from Microsoft BI Tools over Hive ODBC Driver: Part 1
Hour 15. Consuming HDInsight Data from Microsoft BI Tools over Hive ODBC Driver: Part 2
Hour 16. Integrating HDInsight with SQL Server Integration Services
Hour 17. Using Pig for Data Processing
Hour 18. Using Sqoop for Data Movement Between RDBMS and HDInsight

Part V: Managing Workflow and Performing Statistical Computing
Hour 19. Using Oozie Workflows and Job Orchestration with HDInsight
Hour 20. Performing Statistical Computing with R

Part VI: Performing Interactive Analytics and Machine Learning
Hour 21. Performing Big Data Analytics with Spark
Hour 22. Microsoft Azure Machine Learning

Part VII: Performing Real-time Analytics
Hour 23. Performing Stream Analytics with Storm
Hour 24. Introduction to Apache HBase on HDInsight