Elasticsearch Mapping

Mapping is the process of defining how a document, and the fields it contains, are stored and indexed.

Mappings are used to define:

  • which string fields should be treated as full text fields.
  • which fields contain numbers, dates, or geolocations.
  • the format of date values.
  • custom rules to control the mapping for dynamically added fields.

Field Data Types

Each field has a data type, which can be:

  • a simple type like text, keyword, date, long, double, boolean or ip.
  • a type which supports the hierarchical nature of JSON such as object or nested.
  • or a specialised type like geo_point, geo_shape, or completion.

It is often useful to index the same field in different ways for different purposes. For instance, a string field could be indexed as a text field for full-text search, and as a keyword field for sorting or aggregations. Alternatively, you could index a string field with the standard analyzer, the english analyzer, and the french analyzer.

This is the purpose of multi-fields. Most datatypes support multi-fields via the fields parameter.
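As a rough illustration of the fields parameter, the sketch below builds the mapping body for one multi-field as plain nested maps, using only the Java standard library. The field name city and the sub-field names raw and english are hypothetical, chosen only for this example; the nested structure mirrors the JSON you would send as the mapping.

```java
import java.util.Map;

// Hypothetical sketch: a multi-field mapping expressed as nested maps.
// "city", "raw" and "english" are illustrative names, not part of any real index.
class MultiFieldMappingSketch {

    // The main field is analyzed text for full-text search. The "fields"
    // parameter adds a keyword sub-field (for sorting and aggregations)
    // and a second text sub-field analyzed with the english analyzer.
    static Map<String, Object> cityMapping() {
        Map<String, Object> subFields = Map.of(
            "raw", Map.of("type", "keyword"),
            "english", Map.of("type", "text", "analyzer", "english"));

        Map<String, Object> city = Map.of(
            "type", "text",
            "fields", subFields);

        return Map.of("properties", Map.of("city", city));
    }

    public static void main(String[] args) {
        // Prints the nested structure; key order is not guaranteed.
        System.out.println(cityMapping());
    }
}
```

With a mapping of this shape, the sub-fields are addressed with dot notation: city for full-text search, city.raw for sorting or aggregations, and city.english for English-analyzed search.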

The following settings allow you to limit the number of field mappings that can be created manually or dynamically, in order to prevent bad documents from causing a mapping explosion:

index.mapping.total_fields.limit – The maximum number of fields in an index. Field and object mappings, as well as field aliases, count towards this limit. The default value is 1000.

index.mapping.depth.limit – The maximum depth for a field, which is measured as the number of inner objects. For instance, if all fields are defined at the root object level, then the depth is 1. If there is one object mapping, then the depth is 2, etc. The default is 20.

index.mapping.nested_fields.limit – The maximum number of distinct nested mappings in an index. The default is 50.

index.mapping.nested_objects.limit – The maximum number of nested JSON objects within a single document across all nested types. The default is 10000.
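To make the field count and the depth rule more concrete, here is a minimal sketch that walks a mapping represented as nested maps. It follows the definitions above: every field or object mapping counts once, and depth is 1 when all fields sit at the root and grows by one per level of inner object. The class, method names, and sample mapping are assumptions made for this example. The sketch ignores multi-fields and field aliases, so Elasticsearch's own accounting may differ.

```java
import java.util.Map;

// Hypothetical sketch of how the total-fields and depth limits could be measured.
class MappingLimitsSketch {

    // Counts every entry under "properties", recursing into object mappings.
    @SuppressWarnings("unchecked")
    static int countFields(Map<String, Object> mapping) {
        Object props = mapping.get("properties");
        if (!(props instanceof Map)) {
            return 0; // a leaf field has no inner properties
        }
        int count = 0;
        for (Object field : ((Map<String, Object>) props).values()) {
            count++; // the field or object mapping itself
            if (field instanceof Map) {
                count += countFields((Map<String, Object>) field);
            }
        }
        return count;
    }

    // Depth is 1 for root-level fields only, plus 1 per level of inner object.
    @SuppressWarnings("unchecked")
    static int depth(Map<String, Object> mapping) {
        Object props = mapping.get("properties");
        if (!(props instanceof Map)) {
            return 0;
        }
        int deepestChild = 0;
        for (Object field : ((Map<String, Object>) props).values()) {
            if (field instanceof Map) {
                deepestChild = Math.max(deepestChild, depth((Map<String, Object>) field));
            }
        }
        return 1 + deepestChild;
    }

    // Sample mapping: one root-level field plus one object with two inner fields.
    static Map<String, Object> sample() {
        Map<String, Object> address = Map.of(
            "properties", Map.of(
                "city", Map.of("type", "keyword"),
                "zip", Map.of("type", "keyword")));
        return Map.of(
            "properties", Map.of(
                "name", Map.of("type", "text"),
                "address", address));
    }

    public static void main(String[] args) {
        System.out.println("fields = " + countFields(sample())); // name, address, city, zip
        System.out.println("depth  = " + depth(sample()));       // root level plus one object
    }
}
```

For the sample mapping the count is 4 (name, address, city, zip) and the depth is 2, which matches the "one object mapping" case described above.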

Dynamic Mapping

Fields and mapping types do not need to be defined before being used. Thanks to dynamic mapping, new field names will be added automatically, just by indexing a document. New fields can be added both to the top-level mapping type, and to inner object and nested fields.
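The sketch below gives a simplified picture of how dynamic mapping might pick a field type from a JSON value. It is based on the default rules as commonly documented: booleans become boolean, whole numbers become long, floating-point numbers become float, objects become object, arrays take the type of their first non-null element, and strings become text. It is only an approximation. Date and numeric detection for strings, the keyword sub-field added to dynamic text fields, and dynamic templates are all left out, and the class and method names are hypothetical.

```java
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of default dynamic field type detection.
class DynamicMappingSketch {

    // Returns the field type a value would be mapped to, or null when
    // no field would be added (null values and empty/all-null arrays).
    static String dynamicType(Object value) {
        if (value == null) {
            return null;
        }
        if (value instanceof Boolean) {
            return "boolean";
        }
        if (value instanceof Integer || value instanceof Long) {
            return "long";
        }
        if (value instanceof Float || value instanceof Double) {
            return "float";
        }
        if (value instanceof Map) {
            return "object";
        }
        if (value instanceof List) {
            // An array takes the type of its first non-null element.
            for (Object element : (List<?>) value) {
                if (element != null) {
                    return dynamicType(element);
                }
            }
            return null;
        }
        if (value instanceof String) {
            return "text"; // date and numeric detection are omitted in this sketch
        }
        throw new IllegalArgumentException("Unsupported value: " + value.getClass());
    }

    public static void main(String[] args) {
        System.out.println(dynamicType(42));
        System.out.println(dynamicType(1.5));
        System.out.println(dynamicType("hello"));
        System.out.println(dynamicType(Map.of("city", "Paris")));
    }
}
```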

Mapping Example

PUT user 
{
  "mappings": {
    "properties": { 
      "title":    { "type": "text"  }, 
      "name":     { "type": "text"  }, 
      "age":      { "type": "integer" },  
      "created":  {
        "type":   "date", 
        "format": "strict_date_optional_time||epoch_millis"
      }
    }
  }
}

Java Mapping Example

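// Imports for the 7.x High Level REST Client. Depending on the client version,
// XContentBuilder and XContentFactory may live in org.elasticsearch.xcontent instead.
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.indices.CreateIndexRequest;
import org.elasticsearch.client.indices.CreateIndexResponse;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.xcontent.XContentBuilder;
import org.elasticsearch.common.xcontent.XContentFactory;

String indexName = "doctors";

CreateIndexRequest request = new CreateIndexRequest(indexName);

// Index settings: one primary shard with two replicas.
request.settings(Settings.builder()
        .put("index.number_of_shards", 1)
        .put("index.number_of_replicas", 2));

// Build the mapping body. The bare braces only mirror the JSON nesting.
XContentBuilder builder = XContentFactory.jsonBuilder();
builder.startObject();
{
    builder.startObject("properties");
    {
        // Latitude/longitude points.
        builder.startObject("locations");
        {
            builder.field("type", "geo_point");
        }
        builder.endObject();

        // Nested fields keep each object in an array independently queryable.
        builder.startObject("addresses");
        {
            builder.field("type", "nested");
        }
        builder.endObject();

        builder.startObject("specialities");
        {
            builder.field("type", "nested");
        }
        builder.endObject();
    }
    builder.endObject();
}
builder.endObject();
request.mapping(builder);

// restHighLevelClient is assumed to be an already configured RestHighLevelClient.
CreateIndexResponse createIndexResponse = restHighLevelClient.indices().create(request, RequestOptions.DEFAULT);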

Inverted Index

An inverted index consists of a list of all the unique words that appear in any document, and for each word, a list of the documents in which it appears.
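The definition above can be sketched in a few lines of plain Java. This is a toy model, not how Lucene or Elasticsearch actually stores its index: it lowercases each document, splits it on non-word characters as a stand-in for a real analyzer, and records for every unique word the set of document ids in which it appears. The class name and sample documents are made up for this example.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical toy inverted index: term -> sorted set of document ids.
class InvertedIndexSketch {

    // Document ids are simply the positions of the documents in the list.
    static Map<String, SortedSet<Integer>> build(List<String> documents) {
        Map<String, SortedSet<Integer>> index = new TreeMap<>();
        for (int docId = 0; docId < documents.size(); docId++) {
            // Crude analysis step: lowercase, then split on non-word characters.
            String[] terms = documents.get(docId).toLowerCase(Locale.ROOT).split("\\W+");
            for (String term : terms) {
                if (term.isEmpty()) {
                    continue; // split can yield an empty leading token
                }
                index.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
            }
        }
        return index;
    }

    public static void main(String[] args) {
        List<String> docs = List.of("The quick brown fox", "The lazy dog", "Quick dog");
        Map<String, SortedSet<Integer>> index = build(docs);

        // Looking up a term returns the documents that contain it.
        System.out.println("quick -> " + index.get("quick"));
        System.out.println("dog   -> " + index.get("dog"));
    }
}
```

Searching for a word then becomes a map lookup rather than a scan of every document, which is the main reason full-text search over an inverted index is fast.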

An index can be thought of as an optimized collection of documents and each document is a collection of fields, which are the key-value pairs that contain your data. By default, Elasticsearch indexes all data in every field and each indexed field has a dedicated, optimized data structure.

 

Source code on Github

 



