Mastering Elasticsearch: A Tutorial from Basics to Practice

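"Elasticsearch Tutorial". Elasticsearch is a distributed, multitenant-capable full-text search engine built on Lucene. It provides an HTTP web interface and schema-free JSON documents. It is written in Java and released as open source under the Apache License. Elasticsearch is the most popular enterprise search engine, followed by Apache Solr, which is also based on Lucene. It can search all kinds of documents and offers scalable search, near real-time search and support for multitenancy.

Being distributed means that an index can be divided into shards, and each shard can have zero or more replicas. Each node hosts one or more shards and acts as a coordinator that delegates operations to the correct shards. Elasticsearch handles rebalancing and routing automatically. Related data is usually stored in the same index, which consists of one or more primary shards and zero or more replica shards. Once an index has been created, the number of primary shards cannot be changed.

This ebook provides a series of tutorials that help you develop your own Elasticsearch-based applications. It covers a wide range of topics, including installation and operations, Java API integration and reporting. With these straightforward tutorials you can get your own projects up and running in minimum time.

Chapter 1 introduces Elasticsearch from the basics onwards: documents, indices, index settings, mappings (including advanced mappings), indexing, internationalization (i18n), ways of running Elasticsearch (standalone instance, cluster, embedded into an application, and in containers) and the use cases Elasticsearch is suited for. It closes with suggestions for further study. Chapter 2 focuses on using Elasticsearch from the command line: checking cluster health, managing indices, working with documents, tuning mapping types, search time, queries, going from search to insights, and monitoring cluster state. It also ends with pointers to the next steps. Chapter 3 goes deeper into developing against Elasticsearch in Java, showing how to use the Java client API and the Java REST client. The tutorial aims to give readers a thorough understanding of how Elasticsearch works and how it is applied in practice; both beginners and experienced developers can benefit from it and improve their skills in search and data analysis.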

Elasticsearch Tutorial 4 / 54

– long_range - indexes a range of signed 64-bit integers

– double_range - indexes a range of double-precision 64-bit IEEE 754 floating point values

– date_range - indexes a range of date values represented as unsigned 64-bit integer milliseconds elapsed since system epoch

It cannot be stressed enough: choosing the proper data type for the fields (properties) of your documents is key to fast, effective search that delivers truly relevant results. There is one catch though: the fields in each mapping type are not entirely independent of each other. Fields with the same name, within the same index but in different mapping types, must have the same mapping definition, because internally they are mapped to the same field.
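The constraint above can be checked mechanically before an index is created. The following is a minimal sketch and not part of Elasticsearch: the helper `find_field_conflicts`, the `catalog` mappings and the field names are all hypothetical, and the check only compares field definitions for equality.

```python
from typing import Dict, List

# Hypothetical mappings of one index: type name -> field name -> definition.
Mappings = Dict[str, Dict[str, dict]]


def find_field_conflicts(mappings: Mappings) -> List[str]:
    """Return the fields that are defined differently in different mapping types."""
    seen: Dict[str, dict] = {}
    conflicts: List[str] = []
    for type_name in sorted(mappings):
        for field, definition in mappings[type_name].items():
            if field not in seen:
                seen[field] = definition
            elif seen[field] != definition and field not in conflicts:
                conflicts.append(field)
    return conflicts


# Illustrative data only: "rating" is mapped as byte in one type, float in another.
catalog = {
    "books": {"title": {"type": "text"}, "rating": {"type": "byte"}},
    "authors": {"rating": {"type": "float"}, "last_name": {"type": "text"}},
}
print(find_field_conflicts(catalog))
```

Any field name the helper reports would have to be given the same definition in every mapping type, or be renamed in one of them.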

Getting back to our application data model, let us try to define the simplest mapping type for the books collection, using our newly acquired knowledge about data types.

Figure 1.2: Mapping Book Catalog: first attempt

For most of the book properties the mapping data types are pretty straightforward, but what about authors and categories? Those properties essentially contain collections of values, for which Elasticsearch has no direct data type yet ... or has it?

1.2.5 Advanced Mappings

Interestingly, Elasticsearch indeed has no dedicated array or collection type, but by default any field may contain zero or more values (of its data type).

For complex data structures, Elasticsearch supports mapping using object and nested data types, as well as establishing parent/child relationships between documents within the same index. Each approach has its pros and cons, but in order to learn how to use these techniques let us store categories as a nested property of the books mapping type, while authors are going to be represented as a dedicated mapping type which refers to books as its parent.
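As a rough illustration of that layout, the sketch below builds a possible index-creation body as a Python dictionary. It assumes the mapping-type syntax this tutorial is written for, where a child type points at its parent through `_parent`. The field names (`name`, `first_name`, `last_name`) and the sample document are hypothetical and not taken from the tutorial's figures.

```python
import json

# Assumed request body for creating a hypothetical catalog index.
catalog_index = {
    "mappings": {
        "books": {
            "properties": {
                "title": {"type": "text"},
                # "nested" indexes every category object as its own unit.
                "categories": {
                    "type": "nested",
                    "properties": {"name": {"type": "keyword"}},
                },
            }
        },
        "authors": {
            # Each author document is a child of a document of type "books".
            "_parent": {"type": "books"},
            "properties": {
                "first_name": {"type": "text"},
                "last_name": {"type": "text"},
            },
        },
    }
}

# No array type is needed: the field simply carries several values.
sample_book = {
    "title": "A Hypothetical Book",
    "categories": [{"name": "search"}, {"name": "analytics"}],
}

print(json.dumps(catalog_index, indent=2))
print(json.dumps(sample_book, indent=2))
```

Newer Elasticsearch releases dropped multiple mapping types per index, so this layout applies only to versions that still support them.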


You may be surprised, but the explicit definition of fields and mapping types can be omitted. Elasticsearch supports dynamic mapping, whereby new mapping types and new field names are added automatically when a document is indexed (in this case Elasticsearch decides what the field data types should be).
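To get a feel for how such a decision can be made, here is a simplified sketch of type inference over JSON values. It loosely follows the documented default rules of dynamic field mapping (booleans, whole numbers, floating point numbers, objects, arrays typed by their first non-null element, strings) but deliberately leaves out date and numeric detection for strings. The function `infer_field_type` and the sample document are hypothetical; the real logic lives inside Elasticsearch.

```python
from typing import Any, Optional


def infer_field_type(value: Any) -> Optional[str]:
    """Guess a field type for a JSON value, or None if no field would be added."""
    if value is None:
        return None               # null values do not create a field
    if isinstance(value, bool):   # must precede the int check: bool subclasses int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    if isinstance(value, dict):
        return "object"
    if isinstance(value, list):
        # An array takes the type of its first non-null element.
        for item in value:
            if item is not None:
                return infer_field_type(item)
        return None
    if isinstance(value, str):
        return "text"             # simplification: no date detection here
    raise TypeError(f"unsupported JSON value: {value!r}")


# Illustrative document only.
book = {"title": "A Hypothetical Book", "pages": 320, "rating": 4.5,
        "in_stock": True, "categories": [{"name": "search"}]}
for field, value in book.items():
    print(field, "->", infer_field_type(value))
```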

Another important detail is that each mapping type can have custom metadata associated with it through the special _meta property. It is an exceptionally useful technique which we will use later on in the tutorial.

1.2.6 Indexing

Once Elasticsearch has all your indices and their mapping types defined (or inferred using dynamic mapping), it is ready to analyze and index the documents. It is quite a complex but interesting process which involves at least analyzers, tokenizers, token filters and character filters.
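The order in which those pieces cooperate can be sketched with a toy pipeline: character filters rewrite the raw text, a tokenizer splits it into tokens, and token filters transform the token stream. This is only an illustration in plain Python, not how Elasticsearch or Lucene implement analysis; every function name and the stop word list below are made up for the example.

```python
import re
from typing import Callable, List, Sequence


def strip_html(text: str) -> str:
    """Character filter: replace anything that looks like a tag with a space."""
    return re.sub(r"<[^>]+>", " ", text)


def word_tokenizer(text: str) -> List[str]:
    """Tokenizer: split the character stream into word tokens."""
    return re.findall(r"\w+", text)


def lowercase(tokens: List[str]) -> List[str]:
    """Token filter: normalize case."""
    return [token.lower() for token in tokens]


STOP_WORDS = {"a", "an", "the", "of", "and"}  # tiny illustrative list


def remove_stop_words(tokens: List[str]) -> List[str]:
    """Token filter: drop very common words."""
    return [token for token in tokens if token not in STOP_WORDS]


def analyze(text: str,
            char_filters: Sequence[Callable[[str], str]],
            tokenizer: Callable[[str], List[str]],
            token_filters: Sequence[Callable[[List[str]], List[str]]]) -> List[str]:
    """Run character filters, then the tokenizer, then token filters, in order."""
    for char_filter in char_filters:
        text = char_filter(text)
    tokens = tokenizer(text)
    for token_filter in token_filters:
        tokens = token_filter(tokens)
    return tokens


print(analyze("<b>The Art</b> of Search", [strip_html], word_tokenizer,
              [lowercase, remove_stop_words]))
```

The order of token filters matters in this sketch: lowercasing runs first so that "The" matches the lowercase stop word list.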

Elasticsearch supports a rich set of mapping parameters which let you tailor the indexing, analysis and search phases precisely to your needs. For example, every single field (or property) can be configured to use its own index-time and search-time analyzers, support synonyms, apply stemming, filter out stop words and much, much more. By carefully crafting these parameters you may end up with superior search capabilities. The opposite also holds true: configure them loosely, and a lot of irrelevant and noisy results may be returned every time.

If you don't need all that, you are good to go with the defaults, as we have done in the previous section by omitting the parameters altogether. However, that is rarely the case. To give a realistic example, most of the time our applications have to support multiple languages (and locales). Luckily, Elasticsearch shines here as well.

Before we move on to the next topic, there is an important constraint you have to be aware of. Once the mapping types are configured, in the majority of cases they cannot be updated, because such a change implies that all the documents in the corresponding collections are no longer up to date and should be re-indexed.
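One common way to live with this constraint is to create a new index with the corrected mappings and copy the documents over with the `_reindex` API. The sketch below only assembles the pieces of such a call and does not talk to a cluster; the helper `build_reindex_request` and the index names `catalog_v1` and `catalog_v2` are hypothetical.

```python
import json


def build_reindex_request(source_index: str, dest_index: str) -> dict:
    """Describe a `POST _reindex` call that copies documents between two indices."""
    return {
        "method": "POST",
        "path": "/_reindex",
        "body": {
            "source": {"index": source_index},
            "dest": {"index": dest_index},
        },
    }


# Hypothetical index names: the destination is assumed to exist already
# with the new mappings in place.
request = build_reindex_request("catalog_v1", "catalog_v2")
print(request["method"], request["path"])
print(json.dumps(request["body"], indent=2))
```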

1.2.7 Internationalization (i18n)

The process of indexing and analyzing documents is very sensitive to the native language of the document. By default, Elasticsearch uses the standard analyzer if none is specified in the mapping types. It works well for most languages, but Elasticsearch also supplies dedicated analyzers for Arabic, Armenian, Basque, Brazilian, Bulgarian, Czech, Danish, Dutch, English, Finnish, French, German, Greek, Hindi, Hungarian, Indonesian, Irish, Italian, Latvian, Lithuanian, Norwegian, Persian, Portuguese, Romanian, Russian, Spanish, Swedish, Turkish, Thai and a few more.

There are a couple of ways to approach the indexing of the same document in multiple languages, depending on your data model and business case. For example, if document instances physically exist (translated) in multiple languages, then it probably makes sense to have one index per language.
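A minimal sketch of the index-per-language idea follows. The `english`, `french` and `german` analyzer names are the built-in ones mentioned above; the naming scheme `catalog-<lang>`, the helper functions and the single `title` field are assumptions made for the example.

```python
from typing import Dict

# Language code -> built-in analyzer name.
LANGUAGE_ANALYZERS = {"en": "english", "fr": "french", "de": "german"}


def index_name(base: str, lang: str) -> str:
    """Hypothetical naming scheme: one index per language, e.g. catalog-en."""
    return f"{base}-{lang}"


def build_language_indices(base: str) -> Dict[str, dict]:
    """Return one index definition per language, each with its own analyzer."""
    return {
        index_name(base, lang): {
            "mappings": {
                "books": {
                    "properties": {
                        "title": {"type": "text", "analyzer": analyzer}
                    }
                }
            }
        }
        for lang, analyzer in LANGUAGE_ANALYZERS.items()
    }


indices = build_language_indices("catalog")
for name in sorted(indices):
    title = indices[name]["mappings"]["books"]["properties"]["title"]
    print(name, "uses analyzer", title["analyzer"])
```

With this layout, a translated copy of a book would be indexed into the index matching its language, and a search would target the index of the user's language.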

When documents are only partially translated, Elasticsearch has another interesting option up its sleeve called multi-fields. Multi-fields allow indexing the same document field (property) in different ways to be used for different purposes (for example, supporting multiple languages). Getting back to our books mapping type, we may have defined the title property as a multi-field, for example:

    "title": {
      "type": "text",
      "fields": {
        "en": { "type": "text", "analyzer": "english" },
        "fr": { "type": "text", "analyzer": "french" },
        "de": { "type": "text", "analyzer": "german" },
        ...
      }
    }

Those are not the only options available, but they illustrate well enough the flexibility and maturity of Elasticsearch in fulfilling quite sophisticated demands.
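To show how the title multi-field might be used at search time, the sketch below builds a `multi_match` query that searches the main field together with its language-specific sub-fields, addressed as `title.en`, `title.fr` and `title.de`. The helper `build_title_query` and the choice of the `most_fields` type are assumptions for illustration; other query types may fit a given use case better.

```python
import json
from typing import List


def build_title_query(text: str, languages: List[str]) -> dict:
    """Build a multi_match query over title and its per-language sub-fields."""
    fields = ["title"] + [f"title.{lang}" for lang in languages]
    return {
        "query": {
            "multi_match": {
                "query": text,
                "type": "most_fields",  # combine scores from all matching fields
                "fields": fields,
            }
        }
    }


query = build_title_query("search engines", ["en", "fr", "de"])
print(json.dumps(query, indent=2))
```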
