It may come as a surprise, but the explicit definition of fields and mapping types can be omitted. Elasticsearch supports dynamic
mapping: new mapping types and new field names are added automatically when a document is indexed (in this case
Elasticsearch decides what the field data types should be).
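To make dynamic mapping more concrete, suppose a hypothetical document such as { "title": "Elasticsearch Basics", "pages": 320 } is indexed into a books mapping type that has no explicit field definitions. Assuming default settings on Elasticsearch 5.x or later, the inferred mapping would look roughly like the fragment below (the field names are illustrative and not taken from the tutorial):

```json
{
  "books": {
    "properties": {
      "pages": { "type": "long" },
      "title": {
        "type": "text",
        "fields": {
          "keyword": { "type": "keyword", "ignore_above": 256 }
        }
      }
    }
  }
}
```

Note that a string value is mapped both as analyzed text and as an exact-match keyword sub-field, which is convenient but not always what you want.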
Another important detail is that each mapping type can have custom metadata associated with it through the special
_meta property. It is an exceptionally useful technique, which we will use later in the tutorial.
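As a minimal sketch, a mapping type with attached metadata might be declared as follows. Elasticsearch does not interpret the contents of _meta, so the keys used here (schema_version, owner) are purely hypothetical, as is the single title property:

```json
{
  "mappings": {
    "books": {
      "_meta": {
        "schema_version": "1.0",
        "owner": "catalog-team"
      },
      "properties": {
        "title": { "type": "text" }
      }
    }
  }
}
```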
1.2.6 Indexing
Once Elasticsearch has all your indices and their mapping types defined (or inferred using dynamic mapping), it is ready to
analyze and index documents. This is a fairly complex but interesting process that involves, at a minimum, analyzers, tokenizers, token
filters and character filters.
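An analyzer combines zero or more character filters, exactly one tokenizer, and zero or more token filters, applied in that order. One way to see the pipeline at work is to send a request body like the one below to the _analyze endpoint. This is only a sketch: the sample text is made up, while html_strip, standard and lowercase are built-in components.

```json
{
  "char_filter": ["html_strip"],
  "tokenizer": "standard",
  "filter": ["lowercase"],
  "text": "<p>The Quick Brown Foxes</p>"
}
```

With these components the markup should be stripped first, the text split into words, and each word lowercased, yielding the tokens the, quick, brown and foxes.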
Elasticsearch supports a rich set of mapping parameters that let you tailor the indexing, analysis and search phases
precisely to your needs. For example, every field (or property) can be configured to use its own index-time and search-time
analyzers, support synonyms, apply stemming, filter out stop words and much more. Carefully crafted
parameters can give you superior search capabilities. The opposite also holds true: configure them loosely, and a lot
of irrelevant and noisy results may be returned every time.
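The following index definition is a hedged sketch of how these parameters fit together. The analyzer names (book_index, book_search), the synonym list and the description field are all hypothetical; lowercase, stop and porter_stem are built-in token filters. The field is analyzed with one analyzer at index time and with another, which additionally expands synonyms, at search time:

```json
{
  "settings": {
    "analysis": {
      "filter": {
        "book_synonyms": {
          "type": "synonym",
          "synonyms": ["tome, volume, book"]
        }
      },
      "analyzer": {
        "book_index": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "stop", "porter_stem"]
        },
        "book_search": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "stop", "book_synonyms", "porter_stem"]
        }
      }
    }
  },
  "mappings": {
    "books": {
      "properties": {
        "description": {
          "type": "text",
          "analyzer": "book_index",
          "search_analyzer": "book_search"
        }
      }
    }
  }
}
```

Applying synonyms only at search time keeps the index smaller and lets the synonym list change without re-indexing, at the cost of slightly more work per query.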
If you don’t need all that, you are good to go with the defaults, as we did in the previous section by omitting the parameters
altogether. That is rarely the case, however. To give a realistic example, most applications have to support multiple
languages (and locales). Luckily, Elasticsearch shines here as well.
Before we move on to the next topic, there is an important constraint you have to be aware of. Once mapping types are
configured, in the majority of cases they cannot be updated, because such a change would mean that all the documents already
indexed under them are no longer up to date and have to be re-indexed.
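A common way to live with this constraint is to create a new index with the corrected mapping and copy the documents into it. As a sketch, a request body for the _reindex endpoint could look like the one below; both index names (catalog and catalog_v2) are hypothetical, and the destination index is assumed to have been created with the new mapping beforehand:

```json
{
  "source": { "index": "catalog" },
  "dest": { "index": "catalog_v2" }
}
```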
1.2.7 Internationalization (i18n)
The process of indexing and analyzing documents is very sensitive to the natural language of the document. By default,
Elasticsearch uses the standard analyzer if none is specified in the mapping types. It works well for most languages, but
Elasticsearch also supplies dedicated analyzers for Arabic, Armenian, Basque, Brazilian, Bulgarian, Czech, Danish, Dutch,
English, Finnish, French, German, Greek, Hindi, Hungarian, Indonesian, Irish, Italian, Latvian, Lithuanian, Norwegian, Persian,
Portuguese, Romanian, Russian, Spanish, Swedish, Turkish, Thai and a few more.
There are a couple of ways to approach indexing the same document in multiple languages, depending on your data model
and business case. For example, if document instances physically exist (translated) in multiple languages, then it probably makes
sense to have one index per language.
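As a rough sketch of the index-per-language approach, the body below could be used when creating a hypothetical French-only index (say, catalog_fr). The index name and the title field are assumptions; french is one of the built-in language analyzers listed above:

```json
{
  "mappings": {
    "books": {
      "properties": {
        "title": { "type": "text", "analyzer": "french" }
      }
    }
  }
}
```

Indices for other languages would follow the same shape, differing only in the analyzer.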
When documents are only partially translated, Elasticsearch has another interesting option up its sleeve called multi-fields.
Multi-fields allow the same document field (property) to be indexed in different ways for different purposes (for
example, to support multiple languages). Getting back to our books mapping type, we could define the title property
as a multi-field, for example:
"title": {
  "type": "text",
  "fields": {
    "en": { "type": "text", "analyzer": "english" },
    "fr": { "type": "text", "analyzer": "french" },
    "de": { "type": "text", "analyzer": "german" },
    ...
  }
}
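Each sub-field is addressed with dot notation (title.en, title.fr, title.de). As a hedged illustration, a search body that queries the base field together with its language-specific variants might look like this; the query text is made up, and most_fields is just one reasonable choice for combining the scores:

```json
{
  "query": {
    "multi_match": {
      "query": "les misérables",
      "type": "most_fields",
      "fields": ["title", "title.en", "title.fr", "title.de"]
    }
  }
}
```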
These are not the only options available, but they illustrate well enough the flexibility and maturity of Elasticsearch in meeting
quite sophisticated demands.