Elasticsearch 2.x: The Definitive Guide, from Getting Started to In-Depth Practice


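Elasticsearch: The Definitive Guide (2.x edition) is a detailed reference for IT professionals that covers Elasticsearch, a powerful distributed search engine, in depth. It aims to help readers understand, install, configure, and operate Elasticsearch so that they can use it effectively in real projects. The book opens with a foreword and preface that explain why the authors wrote it and who it is for: professionals who want better search performance in big-data processing and real-time analytics. It focuses on the 2.x release line, so its content matched the current technology at the time of writing.

The "Getting Started" chapter gets readers up and running quickly. It moves from why you might choose Elasticsearch, such as its document-oriented architecture and its search efficiency, to installing it and setting up a runtime environment. It shows how to bring up a basic Elasticsearch instance and how to interact with it.

Once the basic concepts are in place, readers learn how to work with documents: indexing, retrieving, creating, updating, and deleting them. For example, "Indexing Employee Documents" shows how to store structured employee data in Elasticsearch, and "Updating a Whole Document" covers replacing the entire content of a document.

"Search with Query DSL" is a core chapter. It explains the query language (Query DSL), which lets users run complex searches, including full-text search, phrase search, and more advanced techniques. "Highlighting Our Searches" shows how to highlight the matching keywords in search results to improve the user experience. For data analysis and real-time analytics, "Analytics" explores how to aggregate and mine data with Elasticsearch and how to embed statistics in search results. The tutorial then summarizes the key points and encourages readers to explore the distributed features of Elasticsearch.

"Distributed Nature" discusses how an Elasticsearch cluster works, including starting an empty cluster, adding nodes, scaling horizontally, and recovering from failures. Readers learn how to keep data consistent and reliable and how to handle the various ways data moves in and out of a cluster. The book offers a practical learning path along with many code examples and practical experience that help readers progressively master the core features of this modern search engine. Both beginners and experienced developers can find resources here for improving their skills and solving real problems.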

that performs a range search, and reused the same match query as before. Now our results show only one employee who happens to be 32 and is named Jane Smith:

{
    ...
    "hits": {
        "total": 1,
        "max_score": 0.30685282,
        "hits": [
            {
                ...
                "_source": {
                    "first_name": "Jane",
                    "last_name": "Smith",
                    "age": 32,
                    "about": "I like to collect rock albums",
                    "interests": [ "music" ]
                }
            }
        ]
    }
}

Full-Text Search

The searches so far have been simple: single names, filtered by age. Let’s try a more advanced, full-text search, a task that traditional databases would really struggle with.

We are going to search for all employees who enjoy rock climbing:

GET /megacorp/employee/_search
{
    "query" : {
        "match" : {
            "about" : "rock climbing"
        }
    }
}

You can see that we use the same match query as before to search the about field for “rock climbing.” We get back two matching documents:

{
    ...
    "hits": {
        "total": 2,
        "max_score": 0.16273327,
        "hits": [
            {
                ...
                "_score": 0.16273327, <1>
                "_source": {
                    "first_name": "John",
                    "last_name": "Smith",
                    "age": 25,
                    "about": "I love to go rock climbing",
                    "interests": [ "sports", "music" ]
                }
            },
            {
                ...
                "_score": 0.016878016, <1>
                "_source": {
                    "first_name": "Jane",
                    "last_name": "Smith",
                    "age": 32,
                    "about": "I like to collect rock albums",
                    "interests": [ "music" ]
                }
            }
        ]
    }
}

<1> The relevance scores

By default, Elasticsearch sorts matching results by their relevance score, that is, by how well each document matches the query. The first and highest-scoring result is obvious: John Smith’s about field clearly says “rock climbing” in it.

But why did Jane Smith come back as a result? Her document was returned because the word “rock” appears in her about field. Because only “rock” was mentioned, and not “climbing,” her _score is lower than John’s.

This is a good example of how Elasticsearch can search within full-text fields and return the most relevant results first. This concept of relevance is important to Elasticsearch, and it is completely foreign to traditional relational databases, in which a record either matches or it doesn’t.

Phrase Search

Finding individual words in a field is all well and good, but sometimes you want to match exact sequences of words or phrases. For instance, we could perform a query that will match only employee records that contain both “rock” and “climbing” and that display the words next to each other in the phrase “rock climbing.”

To do this, we use a slight variation of the match query called the match_phrase query:

GET /megacorp/employee/_search
{
    "query" : {
        "match_phrase" : {
            "about" : "rock climbing"
        }
    }
}

This, to no surprise, returns only John Smith’s document:

{
    ...
    "hits": {
        "total": 1,
        "max_score": 0.23013961,
        "hits": [
            {
                ...
                "_score": 0.23013961,
                "_source": {
                    "first_name": "John",
                    "last_name": "Smith",
                    "age": 25,
                    "about": "I love to go rock climbing",
                    "interests": [ "sports", "music" ]
                }
            }
        ]
    }
}
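The difference between the match and match_phrase results can be illustrated outside Elasticsearch. The following is a minimal Python sketch under stated assumptions: the tokenizer, the function names, and the score (a plain count of matching terms) are all illustrative and are not how Elasticsearch computes _score. It only shows why a term match returns both employees with John first, while a phrase match returns John alone.

```python
import re

# The two "about" fields shown in the responses above.
EMPLOYEES = {
    "John Smith": "I love to go rock climbing",
    "Jane Smith": "I like to collect rock albums",
}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def match(query, docs):
    """Return (name, matched_term_count) for docs sharing any query term.

    Results are sorted best first. The count is a stand-in for relevance,
    not Elasticsearch's real scoring formula.
    """
    terms = set(tokenize(query))
    scored = []
    for name, about in docs.items():
        overlap = len(terms & set(tokenize(about)))
        if overlap:
            scored.append((name, overlap))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def match_phrase(query, docs):
    """Return names of docs containing the query terms adjacent and in order."""
    phrase = tokenize(query)
    hits = []
    for name, about in docs.items():
        tokens = tokenize(about)
        for start in range(len(tokens) - len(phrase) + 1):
            if tokens[start:start + len(phrase)] == phrase:
                hits.append(name)
                break
    return hits


print(match("rock climbing", EMPLOYEES))
# [('John Smith', 2), ('Jane Smith', 1)]
print(match_phrase("rock climbing", EMPLOYEES))
# ['John Smith']
```

Jane still appears in the term match because one of the two query terms is present in her about field; the phrase match drops her because "rock" is not followed by "climbing".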

Highlighting Our Searches

Many applications like to highlight snippets of text from each search result so the user can see why the document matched the query. Retrieving highlighted fragments is easy in Elasticsearch.

Let’s rerun our previous query, but add a new highlight parameter:

GET /megacorp/employee/_search
{
    "query" : {
        "match_phrase" : {
            "about" : "rock climbing"
        }
    },
    "highlight": {
        "fields" : {
            "about" : {}
        }
    }
}

When we run this query, the same hit is returned as before, but now we get a new section in the response called highlight. This contains a snippet of text from the about field with the matching words wrapped in HTML tags:

{
    ...
    "hits": {
        "total": 1,
        "max_score": 0.23013961,
        "hits": [
            {
                ...
                "_score": 0.23013961,
                "_source": {
                    "first_name": "John",
                    "last_name": "Smith",
                    "age": 25,
                    "about": "I love to go rock climbing",
                    "interests": [ "sports", "music" ]
                },
                "highlight": {
                    "about": [
                        "I love to go <em>rock</em> <em>climbing</em>" <1>
                    ]
                }
            }
        ]
    }
}

<1> The highlighted fragment from the original text

You can read more about the highlighting of search snippets in the highlighting reference documentation.
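To make the idea of a highlighted fragment concrete, here is a small Python sketch that wraps each query term found in a text with a pair of tags. Elasticsearch wraps matches in <em> tags by default; the function name, its parameters, and the regular-expression matching are assumptions made for this illustration and say nothing about how the real highlighter is implemented.

```python
import re


def highlight(text, query, pre_tag="<em>", post_tag="</em>"):
    """Wrap each whole-word occurrence of a query term in pre_tag/post_tag."""
    terms = [re.escape(term) for term in query.split()]
    if not terms:
        return text
    # \b keeps "rock" from matching inside a longer word such as "rocket".
    pattern = re.compile(r"\b(" + "|".join(terms) + r")\b", re.IGNORECASE)
    return pattern.sub(lambda m: pre_tag + m.group(0) + post_tag, text)


print(highlight("I love to go rock climbing", "rock climbing"))
# I love to go <em>rock</em> <em>climbing</em>
```

An application would typically insert the returned fragment into its results page so that users can see at a glance why a document matched.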

Analytics

Finally, we come to our last business requirement: allow managers to run analytics over the employee directory. Elasticsearch has functionality called aggregations, which allow you to generate sophisticated analytics over your data. It is similar to GROUP BY in SQL, but much more powerful.

For example, let’s find the most popular interests enjoyed by our employees:

GET /megacorp/employee/_search
{
    "aggs": {
        "all_interests": {
            "terms": { "field": "interests" }
        }
    }
}

Ignore the syntax for now and just look at the results:

{
    ...
    "hits": { ... },
    "aggregations": {
        "all_interests": {
            "buckets": [
                {
                    "key": "music",
                    "doc_count": 2
                },
                {
                    "key": "forestry",
                    "doc_count": 1
                },
                {
                    "key": "sports",
                    "doc_count": 1
                }
            ]
        }
    }
}
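The bucket counts in this response can be reproduced by hand, which may help clarify what a terms aggregation reports. The Python sketch below is only an illustration: John and Jane are taken from the earlier responses, while the third employee is a hypothetical record added so that "forestry" has one document, and the tie-breaking order is a choice made for this sketch.

```python
from collections import Counter

# John and Jane come from the responses above. The third employee is an
# assumption, added only so that "forestry" appears in one document.
EMPLOYEES = [
    {"name": "John Smith", "interests": ["sports", "music"]},
    {"name": "Jane Smith", "interests": ["music"]},
    {"name": "Employee Three (hypothetical)", "interests": ["forestry"]},
]


def terms_buckets(docs, field):
    """Count how many documents contain each distinct value of a field."""
    counts = Counter()
    for doc in docs:
        # set() so that a document is counted once per value.
        counts.update(set(doc[field]))
    # Highest count first; ties are ordered alphabetically by key.
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [{"key": key, "doc_count": count} for key, count in ordered]


for bucket in terms_buckets(EMPLOYEES, "interests"):
    print(bucket)
# {'key': 'music', 'doc_count': 2}
# {'key': 'forestry', 'doc_count': 1}
# {'key': 'sports', 'doc_count': 1}
```

Restricting the input list to a subset of employees before counting corresponds to combining the aggregation with a query.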

We can see that two employees are interested in music, one in forestry, and one in sports. These aggregations are not precalculated; they are generated on the fly from the documents that match the current query. If we want to know the popular interests of people called Smith, we can just add the appropriate query into the mix:

GET /megacorp/employee/_search
{
    "query": {
    ...

The output is basically an enriched version of the first aggregation we ran. We still have a list of interests and their counts, but now each interest has an additional avg_age, which shows the average age for all employees having that interest.

Even if you don’t understand the syntax yet, you can easily see how complex aggregations and groupings can be accomplished using this feature. The sky is the limit as to what kind of data you can extract!
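As a rough illustration of what a per-interest avg_age means, the following Python sketch groups employees by interest and averages their ages within each group. It is not Elasticsearch code. John's and Jane's ages come from the earlier responses; the third employee, including the age of 35, is a hypothetical record, so the "forestry" figure is illustrative only.

```python
from collections import defaultdict

EMPLOYEES = [
    {"name": "John Smith", "age": 25, "interests": ["sports", "music"]},
    {"name": "Jane Smith", "age": 32, "interests": ["music"]},
    # Hypothetical third employee; the name and age are assumptions.
    {"name": "Employee Three (hypothetical)", "age": 35, "interests": ["forestry"]},
]


def interests_with_avg_age(docs):
    """Group documents by interest, then average the ages in each group."""
    ages_by_interest = defaultdict(list)
    for doc in docs:
        for interest in set(doc["interests"]):
            ages_by_interest[interest].append(doc["age"])
    buckets = [
        {"key": key, "doc_count": len(ages), "avg_age": sum(ages) / len(ages)}
        for key, ages in ages_by_interest.items()
    ]
    # Largest group first; ties are ordered alphabetically by key.
    return sorted(buckets, key=lambda bucket: (-bucket["doc_count"], bucket["key"]))


for bucket in interests_with_avg_age(EMPLOYEES):
    print(bucket)
# {'key': 'music', 'doc_count': 2, 'avg_age': 28.5}
# {'key': 'forestry', 'doc_count': 1, 'avg_age': 35.0}
# {'key': 'sports', 'doc_count': 1, 'avg_age': 25.0}
```

The structure mirrors the description above: the same buckets as a plain count, with one extra metric computed inside each bucket.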

Tutorial Conclusion

Hopefully, this little tutorial was a good demonstration of what is possible in Elasticsearch. It is really just scratching the surface, and many features (such as suggestions, geolocation, percolation, fuzzy and partial matching) were omitted to keep the tutorial short. But it did highlight just how easy it is to start building advanced search functionality. No configuration was needed: just add data and start searching!

It’s likely that the syntax left you confused in places, and you may have questions about how to tweak and tune various aspects. That’s fine! The rest of the book dives into each of these issues in detail, giving you a solid understanding of how Elasticsearch works.

Distributed Nature

At the beginning of this chapter, we said that Elasticsearch can scale out to hundreds (or even thousands) of servers and handle petabytes of data. While our tutorial gave examples of how to use Elasticsearch, it didn’t touch on the mechanics at all. Elasticsearch is distributed by nature, and it is designed to hide the complexity that comes with being distributed.

The distributed aspect of Elasticsearch is largely transparent. Nothing in the tutorial required you to know about distributed systems, sharding, cluster discovery, or dozens of other distributed concepts. It happily ran the tutorial on a single node living inside your laptop, but if you were to run the tutorial on a cluster containing 100 nodes, everything would work in exactly the same way.

Elasticsearch tries hard to hide the complexity of distributed systems. Here are some of the operations happening automatically under the hood:

- Partitioning your documents into different containers or shards, which can be stored on a single node or on multiple nodes

- Balancing these shards across the nodes in your cluster to spread the indexing and search load

- Duplicating each shard to provide redundant copies of your data, to prevent data loss in case of hardware failure

- Routing requests from any node in the cluster to the nodes that hold the data you’re interested in

- Seamlessly integrating new nodes as your cluster grows or redistributing shards to recover from node loss
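The first and fourth operations in this list, partitioning documents into shards and routing a request to the shard that holds a document, can be sketched with a hash of the document ID taken modulo the number of shards. The Python below is a simplified illustration under stated assumptions: zlib.crc32 stands in for whatever hash the real system uses, and the function names, document IDs, and shard count are hypothetical.

```python
import zlib


def shard_for(doc_id, number_of_primary_shards):
    """Pick a shard for a document from a stable hash of its ID."""
    # zlib.crc32 is a stand-in hash chosen for this sketch only.
    return zlib.crc32(doc_id.encode("utf-8")) % number_of_primary_shards


def partition(doc_ids, number_of_primary_shards):
    """Group document IDs by the shard they are assigned to."""
    shards = {number: [] for number in range(number_of_primary_shards)}
    for doc_id in doc_ids:
        shards[shard_for(doc_id, number_of_primary_shards)].append(doc_id)
    return shards


doc_ids = [str(number) for number in range(1, 11)]
for shard, ids in partition(doc_ids, 3).items():
    print("shard", shard, "holds documents", ids)

# Routing a later request for document "7" recomputes the same shard number.
print("document 7 lives on shard", shard_for("7", 3))
```

Because the hash is deterministic, any node can work out where a document lives without consulting a central lookup table. The same arithmetic also shows why the shard count in such a scheme has to stay fixed: changing the divisor would send existing IDs to different shards.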

As you read through this book, you’ll encounter supplemental chapters about the distributed nature of Elasticsearch. These chapters will teach you about how the cluster scales and deals with failover ([distributed-cluster]), handles document storage ([distributed-docs]), executes distributed search ([distributed-search]), and what a shard is and how it works ([inside-a-shard]).

These chapters are not required reading (you can use Elasticsearch without understanding these internals), but they will provide insight that will make your knowledge of Elasticsearch more complete. Feel free to skim them and revisit at a later point when you need a more complete understanding.

Next Steps

By now you should have a taste of what you can do with Elasticsearch, and how easy it is to get started. Elasticsearch tries hard to work out of the box with minimal knowledge and configuration. The best way to learn Elasticsearch is by jumping in: just start indexing and searching!

However, the more you know about Elasticsearch, the more productive you can become. The more you can tell Elasticsearch about the domain-specific elements of your application, the more you can fine-tune the output.

The rest of this book will help you move from novice to expert. Each chapter explains the essentials, but also includes expert-level tips. If you’re just getting started, these tips are probably not immediately relevant to you; Elasticsearch has sensible defaults and will generally do the right thing without any interference. You can always revisit these chapters later, when you are looking to improve performance by shaving off any wasted milliseconds.

Life Inside a Cluster

Supplemental Chapter

As mentioned earlier, this is the first of several supplemental chapters about how Elasticsearch operates in a distributed environment. In this chapter, we explain commonly used terminology like cluster, node, and shard, the mechanics of how Elasticsearch scales out, and how it deals with hardware failure.

Although this chapter is not required reading (you can use Elasticsearch for a long time without worrying about shards, replication, and failover), it will help you to understand the processes at work inside Elasticsearch. Feel free to skim through the chapter and to refer to it again later.

Elasticsearch is built to be always available, and to scale with your needs. Scale can come from buying bigger servers (vertical scale, or scaling up) or from buying more servers (horizontal scale, or scaling out).

While Elasticsearch can benefit from more-powerful hardware, vertical scale has its limits. Real scalability comes from horizontal scale: the ability to add more nodes to the cluster and to spread load and reliability between them.

With most databases, scaling horizontally usually requires a major overhaul of your application to take advantage of these extra boxes. In contrast, Elasticsearch is distributed by nature: it knows how to manage multiple nodes to provide scale and high availability. This also means that your application doesn’t need to care about it.
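To picture how horizontal scale spreads load, consider the simplified Python sketch below, which deals a fixed set of shards out to however many nodes are available. It is a toy model under stated assumptions: the round-robin rule, the node names, and the shard count are hypothetical, and a real cluster takes more into account, such as where replica copies live.

```python
def allocate(shard_count, nodes):
    """Assign shards 0..shard_count-1 to the given nodes round-robin."""
    allocation = {node: [] for node in nodes}
    for shard in range(shard_count):
        allocation[nodes[shard % len(nodes)]].append(shard)
    return allocation


# One node has to hold every shard.
print(allocate(6, ["node-1"]))
# {'node-1': [0, 1, 2, 3, 4, 5]}

# After scaling out to three nodes, each node holds a third of the shards.
print(allocate(6, ["node-1", "node-2", "node-3"]))
# {'node-1': [0, 3], 'node-2': [1, 4], 'node-3': [2, 5]}
```

The application issuing requests does not appear anywhere in this sketch, which is the point made above: the number of nodes changes, but the caller's view of the data does not.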
