The keyword data type is used for structured content such as IDs, email addresses, hostnames, status codes, zip codes, or tags. Unlike text fields, keyword values are not analyzed (broken down into individual terms); the whole value is indexed as a single term. For example, only a search for the exact email “folaukaveinga@gmail.com” would return a document containing that address.
Consider mapping a numeric identifier as a keyword if you don't plan to search for it using range queries, or if fast retrieval is important: term query searches on keyword fields are often faster than term searches on numeric fields. If you're unsure which to use, you can use a multi-field to map the data as both a keyword and a numeric data type.
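As a minimal sketch of the multi-field option described above, the following create-index request body maps an identifier as a keyword with a numeric sub-field. The field name product_id and the sub-field name numeric are hypothetical and chosen only for illustration.

```json
{
  "mappings": {
    "properties": {
      "product_id": {
        "type": "keyword",
        "fields": {
          "numeric": { "type": "long" }
        }
      }
    }
  }
}
```

With a mapping like this, term queries can target product_id, while range queries can target product_id.numeric.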
The text data type is used to index full-text values, such as the body of an email or the description of a product. These fields are analyzed, that is, they are passed through an analyzer that converts the string into a list of individual terms before being indexed. The analysis process allows Elasticsearch to search for individual words within each full-text field. Text fields are not used for sorting and are seldom used for aggregations.
text fields are searchable by default, but are not available by default for aggregations, sorting, or scripting. If you try to sort, aggregate, or access values from a script on a text field, you will see this exception:
Fielddata is disabled on text fields by default. Set fielddata=true on [your_field_name] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory.
Field data is the only way to access the analyzed tokens of a full-text field in aggregations, sorting, or scripting. For example, a full-text value like New York would be analyzed into the tokens new and york. Aggregating on these tokens requires field data.
PUT doctor/_mapping
{
  "properties": {
    "email": {
      "type": "text",
      "fielddata": true
    }
  }
}
The numeric data types are long, integer, short, byte, double, float, half_float, scaled_float, and unsigned_long. For the integer types (byte, short, integer, and long), you should pick the smallest type that is enough for your use case. This helps indexing and searching be more efficient. Note, however, that storage is optimized based on the actual values that are stored, so picking one type over another has no impact on storage requirements.
For floating-point types, it is often more efficient to store floating-point data as an integer using a scaling factor, which is what the scaled_float type does under the hood. For instance, a price field could be stored in a scaled_float with a scaling_factor of 100. All APIs would work as if the field were stored as a double, but under the hood Elasticsearch would be working with the number of cents, price*100, which is an integer. This is mostly helpful for saving disk space, since integers are much easier to compress than floating-point numbers. scaled_float is also fine to use in order to trade accuracy for disk space. For instance, imagine that you are tracking CPU utilization as a number between 0 and 1. It usually does not matter much whether CPU utilization is 12.7% or 13%, so you could use a scaled_float with a scaling_factor of 100 to round CPU utilization to the closest percent and save space.
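The two scaled_float cases above could be mapped roughly as follows. This is a sketch of a create-index request body; the field names price and cpu_utilization are assumptions taken from the examples in the text.

```json
{
  "mappings": {
    "properties": {
      "price": {
        "type": "scaled_float",
        "scaling_factor": 100
      },
      "cpu_utilization": {
        "type": "scaled_float",
        "scaling_factor": 100
      }
    }
  }
}
```

Under this mapping a price of 19.99 would be held internally as the integer 1999, and a CPU utilization of 0.127 would be rounded to 13 (that is, 0.13) at index time.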
If scaled_float is not a good fit, then you should pick the smallest type that is enough for the use case among the floating-point types: double, float, and half_float. Here is a table that compares these types in order to help make a decision.
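A date in Elasticsearch can be either a string containing a formatted date, e.g. "2015-01-01" or "2015/01/01 12:10:30", or a number representing milliseconds or seconds since the epoch.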
Dates will always be rendered as strings, even if they were initially supplied as a long in the JSON document. Date formats can be customised, but if no format is specified then the default, strict_date_optional_time||epoch_millis, is used.
Multiple formats can be specified by separating them with ||. Each format will be tried in turn until a matching format is found. The first format will be used to convert the milliseconds-since-the-epoch value back into a string.
{
  "mappings": {
    "properties": {
      "date": {
        "type": "date",
        "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"
      }
    }
  }
}
The nested data type is a specialised version of the object data type that allows arrays of objects to be indexed in a way that they can be queried independently of each other.
When ingesting key-value pairs with a large, arbitrary set of keys, you might consider modeling each key-value pair as its own nested document with key and value fields. Nested documents and queries are typically expensive, however, so the flattened data type, which maps an entire object as a single field and allows for simple searches over its contents, is a better option for this use case.
Nested documents can be queried with the nested query, analyzed with the nested and reverse_nested aggregations, sorted with nested sorting, and retrieved and highlighted with nested inner hits. Because nested documents are indexed as separate documents, they can only be accessed within the scope of the nested query, the nested/reverse_nested aggregations, or nested inner hits.
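To make this concrete, here is a sketch of a nested mapping followed by a nested query against it. The field names comments, author, and rating, and the value alice, are hypothetical. The first body would be sent when creating the index:

```json
{
  "mappings": {
    "properties": {
      "comments": {
        "type": "nested",
        "properties": {
          "author": { "type": "keyword" },
          "rating": { "type": "integer" }
        }
      }
    }
  }
}
```

The second body is a search request. Because each comment is indexed as its own hidden document, both conditions must hold within the same comment object for the parent document to match:

```json
{
  "query": {
    "nested": {
      "path": "comments",
      "query": {
        "bool": {
          "must": [
            { "term": { "comments.author": "alice" } },
            { "range": { "comments.rating": { "gte": 4 } } }
          ]
        }
      }
    }
  }
}
```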
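By default, an index can contain at most 50 distinct nested field mappings (index.mapping.nested_fields.limit), and a single document can contain at most 10,000 nested objects across all of its nested fields (index.mapping.nested_objects.limit). Both limits can be raised.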
By default, each subfield of an object is mapped and indexed separately. The flattened data type provides an alternative approach, where the entire object is mapped as a single field. Given an object, the flattened mapping will parse out its leaf values and index them into one field as keywords. The object’s contents can then be searched through simple queries and aggregations.
This data type can be useful for indexing objects with a large or unknown number of unique keys. Only one field mapping is created for the whole JSON object, which can help prevent a mappings explosion from having too many distinct field mappings.
On the other hand, flattened object fields present a trade-off in terms of search functionality. Only basic queries are allowed, with no support for numeric range queries or highlighting.
The flattened mapping type should not be used for indexing all document content, as it treats all values as keywords and does not provide full search functionality. The default approach, where each subfield has its own entry in the mappings, works well in the majority of cases.
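A minimal sketch of a flattened mapping, assuming a hypothetical labels object whose keys are not known in advance:

```json
{
  "mappings": {
    "properties": {
      "labels": { "type": "flattened" }
    }
  }
}
```

A document such as { "labels": { "priority": "urgent", "release": "v1.3" } } then adds no new field mappings. Its leaf values can be matched with a basic query, either across the whole object (on labels) or on a specific key using dot notation, as in this search body:

```json
{
  "query": {
    "term": { "labels.priority": "urgent" }
  }
}
```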
The join data type is a special field that creates parent/child relations between documents of the same index. The relations section defines a set of possible relations within the documents, each relation being a parent name and a child name.
The join field shouldn't be used like joins in a relational database. In Elasticsearch, the key to good performance is to de-normalize your data into documents. Each join field, has_child query, or has_parent query adds a significant tax to your query performance.
The only case where the join field makes sense is if your data contains a one-to-many relationship where one entity significantly outnumbers the other. An example is a use case with products and offers for these products. If offers significantly outnumber products, then it makes sense to model the product as the parent document and the offer as the child document.
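Only one join field mapping is allowed per index. Parent and child documents must be indexed on the same shard, which means that the same routing value needs to be provided when getting, deleting, or updating a child document. It is possible to add a new relation to an existing join field.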
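The product/offer case could be sketched as follows. The join field name product_offer is hypothetical; product and offer are the parent and child relation names from the example above. The first body would be sent when creating the index:

```json
{
  "mappings": {
    "properties": {
      "product_offer": {
        "type": "join",
        "relations": { "product": "offer" }
      }
    }
  }
}
```

A child document names its relation and the ID of its parent. Assuming the parent product was indexed with ID 1, the offer below would be indexed with the routing value 1 (for example via the routing=1 request parameter) so that it lands on the same shard as its parent:

```json
{
  "product_offer": {
    "name": "offer",
    "parent": "1"
  }
}
```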
Multi Fields
It is often useful to index the same field in different ways for different purposes. For instance, a string field could be mapped as a text field for full-text search, and as a keyword field for sorting or aggregations. Alternatively, you could index a text field with the standard analyzer, the english analyzer, and the french analyzer.
This is the purpose of multi-fields. Most field types support multi-fields via the fields parameter.
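As an illustration, the create-index request body below maps a single string field three ways using the fields parameter. The field name city and the sub-field names raw and english are hypothetical.

```json
{
  "mappings": {
    "properties": {
      "city": {
        "type": "text",
        "fields": {
          "raw": { "type": "keyword" },
          "english": { "type": "text", "analyzer": "english" }
        }
      }
    }
  }
}
```

Full-text searches would target city (standard analyzer) or city.english, while sorting and aggregations would use city.raw, which avoids the need to enable fielddata on the text field.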