Elasticsearch Table of Content




Subscribe To Our Newsletter
You will receive our latest post and tutorial.
Thank you for subscribing!

required
required


Elasticsearch Installation

Local Installation

Installing Elasticsearch on your local Mac computer. I have a Mac laptop so this local installation will be for Mac users but you can google Elasticsearch Windows installation.

I use docker and here is the docker command, 

docker pull docker.elastic.co/elasticsearch/elasticsearch:7.1.0

Once you have pulled the elasticsearch image, run the following command to start Elasticsearch 

docker run -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" docker.elastic.co/elasticsearch/elasticsearch:7.1.0

If you want to keep your local Elasticsearch server running at all times then run this command. Even when you turn off your computer, it will start back up when you computer comes back on.

docker run -p 9200:9200 -p 9300:9300 --name elasticsearch -e "discovery.type=single-node" -dit --restart unless-stopped -d docker.elastic.co/elasticsearch/elasticsearch:7.1.0

To check if installation well, go to localhost:9200/_cat/nodes?v&pretty on your browser.

You should see something like this which is the status of your Elasticsearch node.

ip         heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
172.17.0.3           20          96   0    0.00    0.02     0.00 mdi       *      f3056ceb45a2

You also need to install kibana which is a UI interface that works with Elasticsearch

Docker command to pull kibana docker image

docker pull docker.elastic.co/kibana/kibana:7.1.0

Run this command to start your kibana server

docker run --link YOUR_ELASTICSEARCH_CONTAINER_NAME_OR_ID:elasticsearch -p 5601:5601 {docker-repo}:{version}

 

 

 

September 24, 2020

Elasticsearch Search API

You can search data in Elasticsearch by sending a get request with query string as a parameter or post a query in the message body of post request. A search query, or query, is a request for information about data in Elasticsearch data streams or indices. 

GET doctor_ut/_search
{
  "query": {
    "match_all": {}
  }
}

 

Java example of search API

String indexName = Index.DOCTOR_UT.name().toLowerCase();

SearchRequest searchRequest = new SearchRequest(indexName);
searchRequest.allowPartialSearchResults(true);
searchRequest.indicesOptions(IndicesOptions.lenientExpandOpen());

SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
searchSourceBuilder.query(QueryBuilders.termQuery("addresses.state.keyword", "UT"));

int from = 1;
int size = 1000;

searchSourceBuilder.from(from);
searchSourceBuilder.size(size);
searchSourceBuilder.timeout(new TimeValue(60, TimeUnit.SECONDS));

searchRequest.source(searchSourceBuilder);

// with sorting
// log.info("{\"query\":{}, \"sort\":{}}", searchSourceBuilder.query().toString(),
// searchSourceBuilder.sorts().toString());

log.info("\n{\n\"query\":{}\n}", searchSourceBuilder.query().toString());

SearchResponse searchResponse = null;

try {
   searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);

} catch (Exception e) {
    log.warn(e.getLocalizedMessage());
}

log.info("search got response from elastic!, totalHits={}, maxScore={}, hitLength={}", searchResponse.getHits().getTotalHits().value, searchResponse.getHits().getMaxScore(),searchResponse.getHits().getHits().length);

Iterator<SearchHit> it = searchResponse.getHits().iterator();

while (it.hasNext()) {
    SearchHit searchHit = it.next();

    try {
       // log.info(searchHit.getSourceAsString());
       DoctorIndex doctorIndex = ObjectUtils.getObjectMapper().readValue(searchHit.getSourceAsString(), DoctorIndex.class);
       log.info("doctorIndex={}", ObjectUtils.toJson(doctorIndex));

       // ObjectUtils.getObjectMapper().writeValue(new FileOutputStream("output-2.json", true),
       // doctorIndex);

    } catch (Exception e) {
       // TODO Auto-generated catch block
       e.printStackTrace();
    }
}

Search API Pagination

By default, the  Search API  returns the top 10 matching documents. 

To paginate through a larger set of results, you can use the search API’s size and from parameters. The size parameter is the number of matching documents to return. The from parameter is a zero-indexed offset from the beginning of the complete result set that indicates the document you want to start with.

By default, you cannot page through more than 10,000 documents using the from and size parameters. This limit is set using the index.max_result_window index setting.

SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder(); 
searchSourceBuilder.query(QueryBuilders.termQuery("addresses.state.keyword", "UT")); 
int from = 1; 
int size = 1000; 
searchSourceBuilder.from(from); 
searchSourceBuilder.size(size);
GET doctor_ut/_search
{
  "from": 5, 
  "size": 5,
  "query": {
    "match_all": {}
  }
}

 

Search Scroll API

The Scroll API can be used to retrieve a large number of results from a search request.

While a search request returns a single “page” of results, the scroll API can be used to retrieve large numbers of results (or even all results) from a single search request, in much the same way as you would use a cursor on a traditional database.

Scrolling is not intended for real time user requests, but rather for processing large amounts of data, e.g. in order to reindex the contents of one data stream or index into a new data stream or index with a different configuration.

The scroll API requires a scroll ID. To get a scroll ID, submit a  search API  request that includes an argument for the scroll query parameter . The scroll parameter indicates how long Elasticsearch should retain the  search context  for the request.

The search response returns a scroll ID in the _scroll_id response body parameter. You can then use the scroll ID with the scroll API to retrieve the next batch of results for the request.

You can also use the scroll API to specify a new scroll parameter that extends or shortens the retention period for the search context.

The scroll API returns the same response body as the search API

GET doctor_ut/_search/scroll
{
  "scroll_id" : "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAAAAD4WYm9laVYtZndUQlNsdDcwakFMNjU1QQ=="
}
final Scroll scroll = new Scroll(TimeValue.timeValueMinutes(1L));
SearchRequest searchRequest = new SearchRequest(indexName);
searchRequest.allowPartialSearchResults(true);
searchRequest.indicesOptions(IndicesOptions.lenientExpandOpen());
searchRequest.scroll(scroll);

SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
searchSourceBuilder.size(1000);
searchSourceBuilder.query(QueryBuilders.termQuery("addresses.state.keyword", "UT"));
searchRequest.source(searchSourceBuilder);

SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
log.info("search got response from elastic!, totalHits={}, maxScore={}, hitLength={}", searchResponse.getHits().getTotalHits().value, searchResponse.getHits().getMaxScore(),
                    searchResponse.getHits().getHits().length);

// process searchResponse

String scrollId = searchResponse.getScrollId();
SearchHit[] searchHits = searchResponse.getHits().getHits();

while (searchHits != null && searchHits.length > 0) {

   SearchScrollRequest scrollRequest = new SearchScrollRequest(scrollId);
   scrollRequest.scroll(scroll);
   searchResponse = restHighLevelClient.scroll(scrollRequest, RequestOptions.DEFAULT);

   log.info("search got response from elastic!, totalHits={}, maxScore={}, hitLength={}", searchResponse.getHits().getTotalHits().value, searchResponse.getHits().getMaxScore(),
                        searchResponse.getHits().getHits().length);

   // process searchResponse

   scrollId = searchResponse.getScrollId();
   searchHits = searchResponse.getHits().getHits();
}

ClearScrollRequest clearScrollRequest = new ClearScrollRequest();
clearScrollRequest.addScrollId(scrollId);
ClearScrollResponse clearScrollResponse = restHighLevelClient.clearScroll(clearScrollRequest, RequestOptions.DEFAULT);
boolean succeeded = clearScrollResponse.isSucceeded();

 

 

September 24, 2020

Elasticsearch CAT API

Usually the results from various Elasticsearch APIs are displayed in JSON format. But JSON is not easy to read always. So CAT APIs feature is available in Elasticsearch helps in taking care of giving an easier to read and comprehend printing format of the results. There are various parameters used in cat API which serve different purpose, for example – the term V makes the output verbose.

Show indices

Show each index and their details 

GET /_cat/indices?v

Show nodes

The nodes command shows the cluster topology

GET /_cat/nodes?h=ip,port,heapPercent,name

Show health

Show health status of each index

GET /_cat/health?v

Show plugins

The plugins command provides a view per node of running plugins.

GET /_cat/plugins?v&s=component&h=name,component,version,description

Show count

Th count provides quick access to the document count of the entire cluster, or individual indices.

GET /_cat/count/<target>

//v, the response includes column headings. Defaults to false.
GET /_cat/count/users?v

 

 

 

September 24, 2020

Elasticsearch Mapping

Mapping is the process of defining how a document, and the fields it contains, are stored and indexed.

Mappings are used to define:

  • which string fields should be treated as full text fields.
  • which fields contain numbers, dates, or geolocations.
  • the  format  of date values.
  • custom rules to control the mapping for  dynamically added fields.

Field Data Types

  • a simple type like text, keyword, date, long, double, boolean or ip.
  • a type which supports the hierarchical nature of JSON such as object or nested.
  • or a specialised type like geo_point, geo_shape, or completion.

It is often useful to index the same field in different ways for different purposes. For instance, a string field could be  indexed  as a text field for full-text search, and as a keyword field for sorting or aggregations. Alternatively, you could index a string field with the standard analyzer , the english analyzer, and the french analyzer .

This is the purpose of multi-fields. Most datatypes support multi-fields via the fields parameter.

The following settings allow you to limit the number of field mappings that can be created manually or dynamically, in order to prevent bad documents from causing a mapping explosion:index.mapping.total_fields.limit

index.mapping.total_fields.limit – The maximum number of fields in an index. Field and object mappings, as well as field aliases count towards this limit. The default value is 1000.

index.mapping.depth.limit – The maximum depth for a field, which is measured as the number of inner objects. For instance, if all fields are defined at the root object level, then the depth is 1. If there is one object mapping, then the depth is 2, etc. The default is 20.

index.mapping.nested_fields.limit – The maximum number of distinct nested mappings in an index, defaults to 50.

index.mapping.nested_objects.limit – The maximum number of nested JSON objects within a single document across all nested types, defaults to 10000.

Dynamic Mapping

Fields and mapping types do not need to be defined before being used. Thanks to dynamic mapping, new field names will be added automatically, just by indexing a document. New fields can be added both to the top-level mapping type, and to inner object and nested fields.

Mapping Example

PUT user 
{
  "mappings": {
    "properties": { 
      "title":    { "type": "text"  }, 
      "name":     { "type": "text"  }, 
      "age":      { "type": "integer" },  
      "created":  {
        "type":   "date", 
        "format": "strict_date_optional_time||epoch_millis"
      }
    }
  }
}

 Java Mapping Example 

String indexName = "doctors";

CreateIndexRequest request = new CreateIndexRequest(indexName);

request.settings(Settings.builder().put("index.number_of_shards", 1).put("index.number_of_replicas", 2));

XContentBuilder builder = XContentFactory.jsonBuilder();
builder.startObject();
{
    builder.startObject("properties");
    {
        builder.startObject("locations");
        {
             builder.field("type", "geo_point");
        }
        builder.endObject();
                    
        builder.startObject("addresses");
        {
            builder.field("type", "nested");
        }
        builder.endObject();
                    
        builder.startObject("specialities");
        {
            builder.field("type", "nested");
        }
        builder.endObject();
                    
                    
    }
    builder.endObject();
}
builder.endObject();
request.mapping(builder);
            
CreateIndexResponse createIndexResponse = restHighLevelClient.indices().create(request,RequestOptions.DEFAULT);

Source code on Github

 

September 24, 2020

Elasticsearch Document API

Elasticsearch provides single document APIs and multi-document APIs, where the API call is targeting a single document and multiple documents respectively.

All CRUD APIs are single-index APIs.

Index API

Adds a JSON document to the specified data stream or index and makes it searchable. If the target is an index and the document already exists, the request updates the document and increments its version. You cannot use the index API to send update requests for existing documents to a data stream.

You use one of these options to index a document:

PUT /<target>/_doc/<_id> 
POST /<target>/_doc/
PUT /<target>/_create/<_id>
POST /<target>/_create/<_id>

target - name of index. If the target doesn’t exist and doesn’t match a data stream template, this request creates the index.
_id - id of the document
Use POST /<target>/_doc/ when you want Elasticsearch to generate an ID for the document

You can index a new JSON document with the _doc or _create resource. Using _create guarantees that the document is only indexed if it does not already exist. To update an existing document, you must use the _doc resource.

Example of Index

PUT doctor_ut/_doc/1013143536
{
  "npi" : "1013143536",
  "firstName" : "SHAWN",
  "lastName" : "WRIGHT",
  "fullName" : "SHAWN WRIGHT",
  "credential" : "LICSW",
  "otherLastName" : "WRIGHT",
  "otherFirstName" : "SHAWN",
  "type" : "Individual",
  "gender" : "FEMALE"
}

Java example of Index API

IndexRequest request = new IndexRequest(utIndex);
request.id(doctorIndex.getNpi());
request.source(searchHit.getSourceAsString(), XContentType.JSON);
IndexResponse indexResponse = restHighLevelClient.index(request, RequestOptions.DEFAULT);

 

GET API

Retrieves the specified JSON document from an index.

GET <index>/_doc/<_id>

HEAD <index>/_doc/<_id>

You use GET to retrieve a document and its source or stored fields from a particular index. Use HEAD to verify that a document exists. You can use the _source resource retrieve just the document source or verify that it exists.

Example of Get API

GET doctor_ut/_doc/1013143536

You can also specify the fields you want in your result from that particular document.

GET doctors/_doc/1013143536?_source_includes=name,rating

Java example of Get API

public void getDoctorByNPI() {
        String indexName = Index.DOCTOR_UT.name().toLowerCase();

        String npi = "1013143536";
        GetRequest getRequest = new GetRequest(indexName, npi);


        try {
            GetResponse getResponse = restHighLevelClient.get(getRequest, RequestOptions.DEFAULT);
            log.info(getResponse.getSourceAsString());
        } catch (Exception e) {
            log.warn(e.getLocalizedMessage());
        }
}

Multi Get API

Retrieves multiple JSON documents by ID. You use mget to retrieve multiple documents from one or more indices. If you specify an index in the request URI, you only need to specify the document IDs in the request body.

GET doctor_ut/_mget
{
  "docs": [
    {
      "_id": "1689633083"
    },
    {
      "_id": "1073924098"
    }
  ]
}

Get multiple documents from different indices

GET _mget
{
  "docs": [
    {
      "_index": "doctor_ut",
      "_id": "1689633083"
    },
    {
      "_index": "doctors",
      "_id": "1073883070"
    }
  ]
}

Java example of Multi Get API

public void getMultipleDoctorsByNPIs() {
    String utahDoctorIndex = Index.DOCTOR_UT.name().toLowerCase();
    String doctorsIndex = Index.DOCTORS.name().toLowerCase();

    String npi1 = "1013143536";
    String npi2 = "1073883070";

    GetRequest getRequest = new GetRequest(utahDoctorIndex, npi1);
    MultiGetRequest request = new MultiGetRequest();
    request.add(new MultiGetRequest.Item(utahDoctorIndex, npi1));
    request.add(new MultiGetRequest.Item(doctorsIndex, npi2));

    try {
       MultiGetResponse response = restHighLevelClient.mget(request, RequestOptions.DEFAULT);

       // utah doctor
       MultiGetItemResponse utahDoctor = response.getResponses()[0];
       log.info(utahDoctor.getResponse().getSourceAsString());

       MultiGetItemResponse doctor = response.getResponses()[1];
       log.info(doctor.getResponse().getSourceAsString());
    } catch (Exception e) {
       log.warn(e.getLocalizedMessage());
    }
}

 

 

Update API

Updates a document using the specified script.

POST /<index>/_update/<_id>
{
...
}

The update API also supports passing a partial document, which is merged into the existing document. To fully replace an existing document, use the index API .

The document must still be reindexed, but using update removes some network roundtrips and reduces chances of version conflicts between the GET and the index operation.

The _source field must be enabled to use update. In addition to _source,  you can access the following variables through the ctx map: index, _type, _id, _version, _routing, and _now(the current timestamp).

POST doctor_ut/_update/1013143536
{
  "doc": {
    "firstName": "Folau"
  },
  "doc_as_upsert": true
}

 

Java example of Update API

public void updateDoctor() {
    String indexName = Index.DOCTOR_UT.name().toLowerCase();
    String npi = "1013143536";

    UpdateRequest request = new UpdateRequest(indexName, npi);
    Map<String, Object> jsonMap = new HashMap<>();
    jsonMap.put("firstName", "Folau");

    request.doc(jsonMap, XContentType.JSON);

    try {
       UpdateResponse updateResponse = restHighLevelClient.update(request, RequestOptions.DEFAULT);
       log.info(updateResponse.getGetResult().sourceAsString());
    } catch (Exception e) {
       log.warn(e.getLocalizedMessage());
    }
}

Update by query

While processing an update by query request, Elasticsearch performs multiple search requests sequentially to find all of the matching documents. A bulk update request is performed for each batch of matching documents. Any query or update failures cause the update by query request to fail and the failures are shown in the response. Any update requests that completed successfully still stick, they are not rolled back

POST /<index>/_update_by_query

Updates documents that match the specified query. If no query is specified, performs an update on every document in the data stream or index without modifying the source, which is useful for picking up mapping changes.

POST doctor_ut/_update_by_query
{
  "script": {
    "source": "if (ctx._source.firstName == 'Kinga') {ctx._source.firstName='Tonga';}",
    "lang": "painless"
  },
  "query": {
    "term": {
      "firstName": "Kinga"
    }
  }
}

Java example of Update by query

public void batchUpdateDoctors() {
    String indexName = Index.DOCTOR_UT.name().toLowerCase();

    UpdateByQueryRequest request = new UpdateByQueryRequest(indexName);
    request.setQuery(new TermQueryBuilder("firstName", "new_name1"));
    request.setScript(new Script(ScriptType.INLINE, "painless", "if (ctx._source.firstName == 'new_name1') {ctx._source.firstName='Kinga';}", Collections.emptyMap()));

    try {
        BulkByScrollResponse bulkResponse = restHighLevelClient.updateByQuery(request, RequestOptions.DEFAULT);
        log.info("updated={}", bulkResponse.getStatus().getUpdated());
    } catch (Exception e) {
        log.warn(e.getLocalizedMessage());
    }
}

 

 

 

Delete API

Removes a JSON document from the specified index. You use DELETE to remove a document from an index. You must specify the index name and document ID.

DELETE /<index>/_doc/<_id>
DELETE doctor_ut/_doc/1013143536

Java example of Delete API

public void deleteDoctor() {
    String indexName = Index.DOCTOR_UT.name().toLowerCase();
    String npi = "1013143536";

    DeleteRequest request = new DeleteRequest(indexName, npi); 
    try {
         DeleteResponse deleteResponse = restHighLevelClient.delete(request, RequestOptions.DEFAULT);
         log.info(deleteResponse.getIndex());
    } catch (Exception e) {
         log.warn(e.getLocalizedMessage());
    }
}

 

Reindex API

Copies documents from a source to a destination.

The source and destination can be any pre-existing index, index alias, or  data stream . However, the source and destination must be different. For example, you cannot reindex a data stream into itself.

Reindex requires _source to be enabled for all documents in the source.

The destination should be configured as wanted before calling _reindex. Reindex does not copy the settings from the source or its associated template.

Mappings, shard counts, replicas, and so on must be configured ahead of time.

POST _reindex
{
  "source": {
    "index": "doctors"
  },
  "dest": {
    "index": "doctor-ut"
  }
}

 

 

 

 

 

September 24, 2020