Local Installation
Let's install Elasticsearch on your local machine. I have a Mac laptop, so this walkthrough is written for Mac users, but Windows users can search for "Elasticsearch Windows installation" to find equivalent steps.
I use Docker, and here is the command to pull the Elasticsearch image:
docker pull docker.elastic.co/elasticsearch/elasticsearch:7.1.0
Once you have pulled the Elasticsearch image, run the following command to start Elasticsearch:
docker run -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" docker.elastic.co/elasticsearch/elasticsearch:7.1.0
If you want to keep your local Elasticsearch server running at all times, use this command instead. With the unless-stopped restart policy, the container starts back up automatically whenever Docker restarts, for example after you reboot your computer.
docker run -p 9200:9200 -p 9300:9300 --name elasticsearch -e "discovery.type=single-node" -dit --restart unless-stopped docker.elastic.co/elasticsearch/elasticsearch:7.1.0
To check that the installation went well, go to localhost:9200/_cat/nodes?v&pretty in your browser.
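You can also check from the terminal with curl:
curl "http://localhost:9200/_cat/nodes?v&pretty"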
You should see something like this which is the status of your Elasticsearch node.
ip         heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
172.17.0.3           20          96   0    0.00    0.02     0.00 mdi       *      f3056ceb45a2
You also need to install Kibana, a UI that works with Elasticsearch.
Docker command to pull the Kibana image:
docker pull docker.elastic.co/kibana/kibana:7.1.0
Run this command to start your Kibana server, linking it to your Elasticsearch container:
docker run --link YOUR_ELASTICSEARCH_CONTAINER_NAME_OR_ID:elasticsearch -p 5601:5601 docker.elastic.co/kibana/kibana:7.1.0
You can search data in Elasticsearch either by sending a GET request with a query string as a parameter, or by posting a query in the message body of a POST request. A search query, or query, is a request for information about data in Elasticsearch data streams or indices.
GET doctor_ut/_search
{
  "query": {
    "match_all": {}
  }
}
String indexName = Index.DOCTOR_UT.name().toLowerCase();
SearchRequest searchRequest = new SearchRequest(indexName);
searchRequest.allowPartialSearchResults(true);
searchRequest.indicesOptions(IndicesOptions.lenientExpandOpen());

SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
searchSourceBuilder.query(QueryBuilders.termQuery("addresses.state.keyword", "UT"));
int from = 1;
int size = 1000;
searchSourceBuilder.from(from);
searchSourceBuilder.size(size);
searchSourceBuilder.timeout(new TimeValue(60, TimeUnit.SECONDS));
searchRequest.source(searchSourceBuilder);
log.info("\n{\n\"query\":{}\n}", searchSourceBuilder.query().toString());

SearchResponse searchResponse = null;
try {
    searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
} catch (Exception e) {
    log.warn(e.getLocalizedMessage());
}
// guard against a failed search before reading the response
if (searchResponse != null) {
    log.info("search got response from elastic!, totalHits={}, maxScore={}, hitLength={}",
            searchResponse.getHits().getTotalHits().value,
            searchResponse.getHits().getMaxScore(),
            searchResponse.getHits().getHits().length);
    Iterator<SearchHit> it = searchResponse.getHits().iterator();
    while (it.hasNext()) {
        SearchHit searchHit = it.next();
        try {
            DoctorIndex doctorIndex = ObjectUtils.getObjectMapper()
                    .readValue(searchHit.getSourceAsString(), DoctorIndex.class);
            log.info("doctorIndex={}", ObjectUtils.toJson(doctorIndex));
        } catch (Exception e) {
            log.warn(e.getLocalizedMessage());
        }
    }
}
By default, the Search API returns the top 10 matching documents.
To paginate through a larger set of results, you can use the search API’s size and from parameters. The size parameter is the number of matching documents to return. The from parameter is a zero-indexed offset from the beginning of the complete result set that indicates the document you want to start with.
By default, you cannot page through more than 10,000 documents using the from and size parameters. This limit is set using the index.max_result_window index setting.
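If you genuinely need deeper paging, this limit can be raised per index. A sketch, using the doctor_ut index from earlier and an arbitrary new value:

PUT doctor_ut/_settings
{
  "index.max_result_window": 20000
}

For deep pagination, though, the Scroll API described below is usually the better tool.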
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
searchSourceBuilder.query(QueryBuilders.termQuery("addresses.state.keyword", "UT"));
int from = 1;
int size = 1000;
searchSourceBuilder.from(from);
searchSourceBuilder.size(size);
GET doctor_ut/_search
{
  "from": 5,
  "size": 5,
  "query": {
    "match_all": {}
  }
}
The Scroll API can be used to retrieve a large number of results from a search request.
While a search request returns a single “page” of results, the scroll API can be used to retrieve large numbers of results (or even all results) from a single search request, in much the same way as you would use a cursor on a traditional database.
Scrolling is not intended for real time user requests, but rather for processing large amounts of data, e.g. in order to reindex the contents of one data stream or index into a new data stream or index with a different configuration.
The scroll API requires a scroll ID. To get a scroll ID, submit a search API request that includes an argument for the scroll query parameter. The scroll parameter indicates how long Elasticsearch should retain the search context for the request.
The search response returns a scroll ID in the _scroll_id response body parameter. You can then use the scroll ID with the scroll API to retrieve the next batch of results for the request.
You can also use the scroll API to specify a new scroll parameter that extends or shortens the retention period for the search context.
The scroll API returns the same response body as the search API.
GET /_search/scroll
{
  "scroll_id" : "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAAAAD4WYm9laVYtZndUQlNsdDcwakFMNjU1QQ=="
}
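For example, to extend the retention period to one minute while fetching the next batch, pass a new scroll value along with the scroll ID:

GET /_search/scroll
{
  "scroll": "1m",
  "scroll_id" : "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAAAAD4WYm9laVYtZndUQlNsdDcwakFMNjU1QQ=="
}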
final Scroll scroll = new Scroll(TimeValue.timeValueMinutes(1L));
SearchRequest searchRequest = new SearchRequest(indexName);
searchRequest.allowPartialSearchResults(true);
searchRequest.indicesOptions(IndicesOptions.lenientExpandOpen());
searchRequest.scroll(scroll);

SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
searchSourceBuilder.size(1000);
searchSourceBuilder.query(QueryBuilders.termQuery("addresses.state.keyword", "UT"));
searchRequest.source(searchSourceBuilder);

SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
log.info("search got response from elastic!, totalHits={}, maxScore={}, hitLength={}",
        searchResponse.getHits().getTotalHits().value,
        searchResponse.getHits().getMaxScore(),
        searchResponse.getHits().getHits().length);
// process searchResponse
String scrollId = searchResponse.getScrollId();
SearchHit[] searchHits = searchResponse.getHits().getHits();
while (searchHits != null && searchHits.length > 0) {
    SearchScrollRequest scrollRequest = new SearchScrollRequest(scrollId);
    scrollRequest.scroll(scroll);
    searchResponse = restHighLevelClient.scroll(scrollRequest, RequestOptions.DEFAULT);
    log.info("search got response from elastic!, totalHits={}, maxScore={}, hitLength={}",
            searchResponse.getHits().getTotalHits().value,
            searchResponse.getHits().getMaxScore(),
            searchResponse.getHits().getHits().length);
    // process searchResponse
    scrollId = searchResponse.getScrollId();
    searchHits = searchResponse.getHits().getHits();
}

ClearScrollRequest clearScrollRequest = new ClearScrollRequest();
clearScrollRequest.addScrollId(scrollId);
ClearScrollResponse clearScrollResponse = restHighLevelClient.clearScroll(clearScrollRequest, RequestOptions.DEFAULT);
boolean succeeded = clearScrollResponse.isSucceeded();
Full Body Search
Prefer the request body search API: most parameters are passed in the HTTP request body rather than in the query string of a GET request.
Request body search not only handles the query itself, but also allows you to return highlighted snippets from your results, aggregate analytics across all results or subsets of results, and return did-you-mean suggestions, which will help guide your users to the best results quickly.
POST /_search
{
  "from": 30,
  "size": 10
}
Multiple Query Clauses
Query clauses are simple building blocks that can be combined with each other to create complex queries.
{ "bool": { "must": { "match": { "email": "folau@gmail.com" }}, "must_not": { "match": { "name": "folau" }}, "should": { "match": { "lastName": "kaveinga" }} } }
It is important to note that a compound clause can combine any other query clauses, including other compound clauses. This means that compound clauses can be nested within each other, allowing the expression of very complex logic.
{ "bool": { "must": { "match": { "email": "folau@gmail.com" }}, "should": [ { "match": { "starred": true }}, { "bool": { "must": { "folder": "inbox" }}, "must_not": { "spam": true }} }} ], "minimum_should_match": 1 } }
A filter asks a yes-or-no question of every document and is used for fields that contain exact values, for example a status flag or a date range.
The goal of filters is to reduce the number of documents that have to be examined by the query.
A query is similar to a filter, but also asks the question: How well does this document match?
A query calculates how relevant each document is to the query, and assigns it a relevance _score, which is later used to sort matching documents by relevance. This concept of relevance is well suited to full-text search, where there is seldom a completely “correct” answer.
Queries have to not only find matching documents, but also calculate how relevant each document is, which typically makes queries heavier than filters. Also, query results are not cacheable.
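For instance, an exact-value lookup that belongs in filter context can be wrapped in a bool query's filter clause, where no relevance score is calculated (reusing the doctor_ut index and field from earlier examples):

GET doctor_ut/_search
{
  "query": {
    "bool": {
      "filter": { "term": { "addresses.state.keyword": "UT" } }
    }
  }
}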
The match_all query simply matches all documents. It is the default query that is used if no query has been specified, and it returns every document in the index.
{ "match_all": {} }
The match query should be the standard query that you reach for whenever you want to query for a full-text or exact value in almost any field.
If you run a match query against a full-text field, it will analyze the query string by using the correct analyzer for that field before executing the search.
{ "match": { "email": "folau" } }
If you use it on a field containing an exact value, such as a number, a date, a Boolean, or a not_analyzed string field, then it will search for that exact value
Note that for exact-value searches, you probably want to use a filter instead of a query, as a filter will be cached.
The match query does not use a query syntax like +user_id:2 +tweet:search. It just looks for the words that are specified. This means that it is safe to expose to your users via a search field; you control what fields they can query, and it is not prone to throwing syntax errors.
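Here is a minimal Java sketch of a match query, assuming the same database index name, restHighLevelClient, and logging helpers used by the other examples in this article:

SearchRequest searchRequest = new SearchRequest(database);
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
// match analyzes "folau" with the email field's analyzer before searching
searchSourceBuilder.query(QueryBuilders.matchQuery("email", "folau"));
searchRequest.source(searchSourceBuilder);
SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
log.info("totalHits={}", searchResponse.getHits().getTotalHits().value);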
The match_phrase query analyzes the text and creates a phrase query out of the analyzed text.
@Test
void searchWithMatchPhrase() {
    String description = "His biggest fear";
    int pageNumber = 0;
    int pageSize = 10;

    SearchRequest searchRequest = new SearchRequest(database);
    searchRequest.allowPartialSearchResults(true);
    searchRequest.indicesOptions(IndicesOptions.lenientExpandOpen());

    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    searchSourceBuilder.from(pageNumber * pageSize);
    searchSourceBuilder.size(pageSize);
    searchSourceBuilder.timeout(new TimeValue(60, TimeUnit.SECONDS));
    // optionally fetch only a few fields:
    // searchSourceBuilder.fetchSource(new String[] { "id", "firstName", "lastName", "cards" }, new String[] { "" });
    // match_phrase analyzes the text and searches for the terms as a phrase
    searchSourceBuilder.query(QueryBuilders.matchPhraseQuery("description", description));
    searchRequest.source(searchSourceBuilder);

    if (searchSourceBuilder.sorts() != null && searchSourceBuilder.sorts().size() > 0) {
        log.info("\n{\n\"query\":{}, \"sort\":{}\n}", searchSourceBuilder.query().toString(),
                searchSourceBuilder.sorts().toString());
    } else {
        log.info("\n{\n\"query\":{}\n}", searchSourceBuilder.query().toString());
    }

    try {
        SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
        log.info("totalShards={}, totalHits={}", searchResponse.getTotalShards(),
                searchResponse.getHits().getTotalHits().value);
        List<User> users = getResponseResult(searchResponse.getHits());
        log.info("results={}", ObjectUtils.toJson(users));
    } catch (IOException e) {
        log.warn(e.getLocalizedMessage());
    }
}
The multi_match query builds on the match query to allow multi-field queries. Use * to query against all fields. Note that * will not query against nested fields.
{ "multi_match": { "query": "full text search", "fields": ["title","body"] } }
@Test
void searchWithMultiMatchAllFields() {
    int pageNumber = 0;
    int pageSize = 10;

    SearchRequest searchRequest = new SearchRequest(database);
    searchRequest.allowPartialSearchResults(true);
    searchRequest.indicesOptions(IndicesOptions.lenientExpandOpen());

    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    searchSourceBuilder.from(pageNumber * pageSize);
    searchSourceBuilder.size(pageSize);
    searchSourceBuilder.timeout(new TimeValue(60, TimeUnit.SECONDS));
    // * means all fields (but not nested fields); each term can match a different field, e.g.
    // Isabell (a firstName), 3102060312 (a phoneNumber), biggest fear (a description):
    // searchSourceBuilder.query(QueryBuilders.multiMatchQuery("Isabell 3102060312 biggest fear", "*"));
    searchSourceBuilder.query(QueryBuilders.multiMatchQuery("Best Buy", "*"));
    searchRequest.source(searchSourceBuilder);

    if (searchSourceBuilder.sorts() != null && searchSourceBuilder.sorts().size() > 0) {
        log.info("\n{\n\"query\":{}, \"sort\":{}\n}", searchSourceBuilder.query().toString(),
                searchSourceBuilder.sorts().toString());
    } else {
        log.info("\n{\n\"query\":{}\n}", searchSourceBuilder.query().toString());
    }

    try {
        SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
        log.info("totalShards={}, totalHits={}", searchResponse.getTotalShards(),
                searchResponse.getHits().getTotalHits().value);
        List<User> users = getResponseResult(searchResponse.getHits());
        log.info("results={}", ObjectUtils.toJson(users));
    } catch (IOException e) {
        log.warn(e.getLocalizedMessage());
    }
}
You can use the term query to find documents based on a precise value such as a price, a product ID, or a username.
To better search text fields, the match query also analyzes your provided search term before performing a search. This means the match query can search text fields for analyzed tokens rather than an exact term.
The term query does not analyze the search term; it only searches for the exact term you provide. This means the term query may return poor or no results when searching text fields.
@Test
void searchWithTerm() {
    String firstName = "Isabell";
    int pageNumber = 0;
    int pageSize = 10;

    SearchRequest searchRequest = new SearchRequest(database);
    searchRequest.allowPartialSearchResults(true);
    searchRequest.indicesOptions(IndicesOptions.lenientExpandOpen());

    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    searchSourceBuilder.from(pageNumber * pageSize);
    searchSourceBuilder.size(pageSize);
    searchSourceBuilder.timeout(new TimeValue(60, TimeUnit.SECONDS));
    // term query looks for an exact match; use the keyword sub-field
    searchSourceBuilder.query(QueryBuilders.termQuery("firstName.keyword", firstName));
    searchRequest.source(searchSourceBuilder);
    searchRequest.preference("firstName");

    if (searchSourceBuilder.sorts() != null && searchSourceBuilder.sorts().size() > 0) {
        log.info("\n{\n\"query\":{}, \"sort\":{}\n}", searchSourceBuilder.query().toString(),
                searchSourceBuilder.sorts().toString());
    } else {
        log.info("\n{\n\"query\":{}\n}", searchSourceBuilder.query().toString());
    }

    try {
        SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
        log.info("isTimedOut={}, totalShards={}, totalHits={}", searchResponse.isTimedOut(),
                searchResponse.getTotalShards(), searchResponse.getHits().getTotalHits().value);
        List<User> users = getResponseResult(searchResponse.getHits());
        log.info("results={}", ObjectUtils.toJson(users));
    } catch (IOException e) {
        log.warn(e.getLocalizedMessage());
    }
}
@Test
void searchWithTermAndMultiValues() {
    int pageNumber = 0;
    int pageSize = 10;

    SearchRequest searchRequest = new SearchRequest(database);
    searchRequest.allowPartialSearchResults(true);
    searchRequest.indicesOptions(IndicesOptions.lenientExpandOpen());

    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    searchSourceBuilder.from(pageNumber * pageSize);
    searchSourceBuilder.size(pageSize);
    searchSourceBuilder.timeout(new TimeValue(60, TimeUnit.SECONDS));
    // terms query looks for exact matches against several values; use the keyword sub-field
    searchSourceBuilder.query(QueryBuilders.termsQuery("firstName.keyword", "Leland", "Harmony", "Isabell"));
    searchRequest.source(searchSourceBuilder);
    searchRequest.preference("firstName");

    if (searchSourceBuilder.sorts() != null && searchSourceBuilder.sorts().size() > 0) {
        log.info("\n{\n\"query\":{}, \"sort\":{}\n}", searchSourceBuilder.query().toString(),
                searchSourceBuilder.sorts().toString());
    } else {
        log.info("\n{\n\"query\":{}\n}", searchSourceBuilder.query().toString());
    }

    try {
        SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
        log.info("isTimedOut={}, totalShards={}, totalHits={}", searchResponse.isTimedOut(),
                searchResponse.getTotalShards(), searchResponse.getHits().getTotalHits().value);
        List<User> users = getResponseResult(searchResponse.getHits());
        log.info("results={}", ObjectUtils.toJson(users));
    } catch (IOException e) {
        log.warn(e.getLocalizedMessage());
    }
}
The exists query returns documents that have an indexed value for a field.
GET elasticsearch_learning/_search { "query":{ "exists" : { "field" : "firstName", "boost" : 1.0 } } }
@Test
void searchWithExistQuery() {
    int pageNumber = 0;
    int pageSize = 10;

    SearchRequest searchRequest = new SearchRequest(database);
    searchRequest.allowPartialSearchResults(true);
    searchRequest.indicesOptions(IndicesOptions.lenientExpandOpen());

    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    searchSourceBuilder.from(pageNumber * pageSize);
    searchSourceBuilder.size(pageSize);
    searchSourceBuilder.timeout(new TimeValue(60, TimeUnit.SECONDS));
    // exists query returns documents that have an indexed value for firstName
    searchSourceBuilder.query(QueryBuilders.existsQuery("firstName"));
    searchRequest.source(searchSourceBuilder);
    searchRequest.preference("firstName");

    if (searchSourceBuilder.sorts() != null && searchSourceBuilder.sorts().size() > 0) {
        log.info("\n{\n\"query\":{}, \"sort\":{}\n}", searchSourceBuilder.query().toString(),
                searchSourceBuilder.sorts().toString());
    } else {
        log.info("\n{\n\"query\":{}\n}", searchSourceBuilder.query().toString());
    }

    try {
        SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
        log.info("isTimedOut={}, totalShards={}, totalHits={}", searchResponse.isTimedOut(),
                searchResponse.getTotalShards(), searchResponse.getHits().getTotalHits().value);
        List<User> users = getResponseResult(searchResponse.getHits());
        log.info("results={}", ObjectUtils.toJson(users));
    } catch (IOException e) {
        log.warn(e.getLocalizedMessage());
    }
}
Returns documents that have terms matching a wildcard pattern.
GET elasticsearch_learning/_search { "query":{ "wildcard" : { "firstName" : { "wildcard" : "H*y", "boost" : 1.0 } } } }
@Test
void searchWithWildcardQuery() {
    int pageNumber = 0;
    int pageSize = 10;

    SearchRequest searchRequest = new SearchRequest(database);
    searchRequest.allowPartialSearchResults(true);
    searchRequest.indicesOptions(IndicesOptions.lenientExpandOpen());

    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    searchSourceBuilder.from(pageNumber * pageSize);
    searchSourceBuilder.size(pageSize);
    searchSourceBuilder.timeout(new TimeValue(60, TimeUnit.SECONDS));
    // matching terms can include Honey, Henny, or Horsey
    searchSourceBuilder.query(QueryBuilders.wildcardQuery("firstName", "H*y"));
    searchRequest.source(searchSourceBuilder);
    searchRequest.preference("firstName");

    if (searchSourceBuilder.sorts() != null && searchSourceBuilder.sorts().size() > 0) {
        log.info("\n{\n\"query\":{}, \"sort\":{}\n}", searchSourceBuilder.query().toString(),
                searchSourceBuilder.sorts().toString());
    } else {
        log.info("\n{\n\"query\":{}\n}", searchSourceBuilder.query().toString());
    }

    try {
        SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
        log.info("isTimedOut={}, totalShards={}, totalHits={}", searchResponse.isTimedOut(),
                searchResponse.getTotalShards(), searchResponse.getHits().getTotalHits().value);
        List<User> users = getResponseResult(searchResponse.getHits());
        log.info("results={}", ObjectUtils.toJson(users));
    } catch (IOException e) {
        log.warn(e.getLocalizedMessage());
    }
}
Returns documents that contain terms matching a regular expression.
The performance of the regexp query can vary based on the regular expression provided. To improve performance, avoid using wildcard patterns, such as .* or .*?+, without a prefix or suffix.
GET elasticsearch_learning/_search { "query":{ "regexp" : { "firstName" : { "value" : "S.e", "flags_value" : 255, "case_insensitive" : true, "max_determinized_states" : 10000, "boost" : 1.0 } } } }
/**
 * https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-regexp-query.html
 * https://www.elastic.co/guide/en/elasticsearch/reference/current/regexp-syntax.html
 */
@Test
void searchWithRegexQuery() {
    int pageNumber = 0;
    int pageSize = 10;

    SearchRequest searchRequest = new SearchRequest(database);
    searchRequest.allowPartialSearchResults(true);
    searchRequest.indicesOptions(IndicesOptions.lenientExpandOpen());

    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    searchSourceBuilder.from(pageNumber * pageSize);
    searchSourceBuilder.size(pageSize);
    searchSourceBuilder.timeout(new TimeValue(60, TimeUnit.SECONDS));
    // fetch only a few fields
    searchSourceBuilder.fetchSource(new String[] { "id", "firstName", "lastName", "description" }, new String[] { "" });
    // matches e.g. Sydnee: . matches any character, * repeats the preceding character zero or more times
    searchSourceBuilder.query(QueryBuilders.regexpQuery("firstName", "S.*e")
            .flags(RegexpQueryBuilder.DEFAULT_FLAGS_VALUE)
            .caseInsensitive(true));
    searchRequest.source(searchSourceBuilder);
    searchRequest.preference("firstName");

    if (searchSourceBuilder.sorts() != null && searchSourceBuilder.sorts().size() > 0) {
        log.info("\n{\n\"query\":{}, \"sort\":{}\n}", searchSourceBuilder.query().toString(),
                searchSourceBuilder.sorts().toString());
    } else {
        log.info("\n{\n\"query\":{}\n}", searchSourceBuilder.query().toString());
    }

    try {
        SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
        log.info("isTimedOut={}, totalShards={}, totalHits={}", searchResponse.isTimedOut(),
                searchResponse.getTotalShards(), searchResponse.getHits().getTotalHits().value);
        List<User> users = getResponseResult(searchResponse.getHits());
        log.info("results={}", ObjectUtils.toJson(users));
    } catch (IOException e) {
        log.warn(e.getLocalizedMessage());
    }
}
The bool query, like the bool filter, is used to combine multiple query clauses. However, there are some differences. Remember that while filters give binary yes/no answers, queries calculate a relevance score instead. The bool query combines the _score from each must or should clause that matches. This query accepts the following parameters:
must
Clauses that must match for the document to be included.
must_not
Clauses that must not match for the document to be included.
should
If these clauses match, they increase the _score; otherwise, they have no effect. They are simply used to refine the relevance score for each document.
The bool query takes a more-matches-is-better approach, so the score from each matching must or should clause will be added together to provide the final _score for each document.
The following query finds documents whose firstName field matches Leland and that do not match Leilani. The filter clause repeats the Leland match without affecting the score, and documents that also match the should clause (Lelanddd) would rank higher than they otherwise would.
GET elasticsearch_learning/_search { "query":{ "bool" : { "must" : [ { "match" : { "firstName" : { "query" : "Leland", "operator" : "OR", "prefix_length" : 0, "max_expansions" : 50, "fuzzy_transpositions" : true, "lenient" : false, "zero_terms_query" : "NONE", "auto_generate_synonyms_phrase_query" : true, "boost" : 1.0 } } } ], "filter" : [ { "match" : { "firstName" : { "query" : "Leland", "operator" : "OR", "prefix_length" : 0, "max_expansions" : 50, "fuzzy_transpositions" : true, "lenient" : false, "zero_terms_query" : "NONE", "auto_generate_synonyms_phrase_query" : true, "boost" : 1.0 } } } ], "must_not" : [ { "match" : { "firstName" : { "query" : "Leilani", "operator" : "OR", "prefix_length" : 0, "max_expansions" : 50, "fuzzy_transpositions" : true, "lenient" : false, "zero_terms_query" : "NONE", "auto_generate_synonyms_phrase_query" : true, "boost" : 1.0 } } } ], "should" : [ { "match" : { "firstName" : { "query" : "Lelanddd", "operator" : "OR", "prefix_length" : 0, "max_expansions" : 50, "fuzzy_transpositions" : true, "lenient" : false, "zero_terms_query" : "NONE", "auto_generate_synonyms_phrase_query" : true, "boost" : 1.0 } } } ], "adjust_pure_negative" : true, "boost" : 1.0 } } }
@Test
void searchWithBooleanQuery() {
    int pageNumber = 0;
    int pageSize = 10;

    SearchRequest searchRequest = new SearchRequest(database);
    searchRequest.allowPartialSearchResults(true);
    searchRequest.indicesOptions(IndicesOptions.lenientExpandOpen());

    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    searchSourceBuilder.from(pageNumber * pageSize);
    searchSourceBuilder.size(pageSize);
    searchSourceBuilder.timeout(new TimeValue(60, TimeUnit.SECONDS));

    BoolQueryBuilder boolQuery = QueryBuilders.boolQuery();
    boolQuery.must(QueryBuilders.matchQuery("firstName", "Leland"));
    boolQuery.mustNot(QueryBuilders.matchQuery("firstName", "Leilani"));
    boolQuery.should(QueryBuilders.matchQuery("firstName", "Lelanddd"));
    // filter narrows results without contributing to the score
    boolQuery.filter(QueryBuilders.matchQuery("firstName", "Leland"));
    searchSourceBuilder.query(boolQuery);
    searchRequest.source(searchSourceBuilder);
    searchRequest.preference("firstName");

    if (searchSourceBuilder.sorts() != null && searchSourceBuilder.sorts().size() > 0) {
        log.info("\n{\n\"query\":{}, \"sort\":{}\n}", searchSourceBuilder.query().toString(),
                searchSourceBuilder.sorts().toString());
    } else {
        log.info("\n{\n\"query\":{}\n}", searchSourceBuilder.query().toString());
    }

    try {
        SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
        log.info("isTimedOut={}, totalShards={}, totalHits={}", searchResponse.isTimedOut(),
                searchResponse.getTotalShards(), searchResponse.getHits().getTotalHits().value);
        List<User> users = getResponseResult(searchResponse.getHits());
        log.info("results={}", ObjectUtils.toJson(users));
    } catch (IOException e) {
        log.warn(e.getLocalizedMessage());
    }
}
Note that if there are no must clauses, at least one should clause has to match. However, if there is at least one must clause, no should clauses are required to match.
Use a bool query to combine queries and filters.
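For instance, a sketch that scores a full-text match while a term filter narrows the candidate set without affecting the score (field names borrowed from the examples above):

GET elasticsearch_learning/_search
{
  "query": {
    "bool": {
      "must":   { "match": { "description": "biggest fear" } },
      "filter": { "term":  { "firstName.keyword": "Leland" } }
    }
  }
}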
Let’s assume you have 3 nodes.
Query Phase
When a search request is sent to a node, that node becomes the coordinating node. It is the job of this node to broadcast the search request to all involved shards, and to gather their responses into a globally sorted result set that it can return to the client.
The first step is to broadcast the request to a shard copy of every shard in the index. Just like document GET requests, search requests can be handled by a primary shard or by any of its replicas. This is how more replicas (when combined with more hardware) can increase search throughput. A coordinating node will round-robin through all shard copies on subsequent requests in order to spread the load.
Each shard executes the query locally and builds a sorted priority queue of length from + size—in other words, enough results to satisfy the global search request all by itself. It returns a lightweight list of results to the coordinating node, which contains just the doc IDs and any values required for sorting, such as the _score.
The coordinating node merges these shard-level results into its own sorted priority queue, which represents the globally sorted result set. Here the query phase ends.
An index can consist of one or more primary shards, so a search request against a single index needs to be able to combine the results from multiple shards. A search against multiple or all indices works in exactly the same way—there are just more shards involved.
Fetch Phase
The coordinating node first decides which documents actually need to be fetched. For instance, if our query specified { "from": 90, "size": 10 }, the first 90 results would be discarded and only the next 10 results would need to be retrieved. These documents may come from one, some, or all of the shards involved in the original search request.
The coordinating node builds a multi-get request for each shard that holds a pertinent document and sends the request to the same shard copy that handled the query phase.
The shard loads the document bodies—the _source field—and, if requested, enriches the results with metadata and search snippet highlighting. Once the coordinating node receives all results, it assembles them into a single response that it returns to the client.
Imagine that you are sorting your results by a timestamp field, and two documents have the same timestamp. Because search requests are round-robined between all available shard copies, these two documents may be returned in one order when the request is served by the primary, and in another order when served by the replica. This is known as the bouncing results problem: every time the user refreshes the page, the results appear in a different order. The problem can be avoided by always using the same shards for the same user, which can be done by setting the preference parameter to an arbitrary string like the user’s session ID.
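A sketch of pinning a user to the same shard copies with the Java client (the session ID value is hypothetical; the request is built as in the earlier examples):

SearchRequest searchRequest = new SearchRequest("doctor_ut");
// the same arbitrary string, e.g. the user's session ID, must be sent on every request
searchRequest.preference("user-session-4711");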
Results from most Elasticsearch APIs are returned as JSON, which is not always easy to read. The cat APIs instead print results in a compact, human-readable tabular format. They accept various parameters that serve different purposes; for example, the v parameter makes the output verbose by adding column headings.
Show indices
Show each index and their details
GET /_cat/indices?v
Show nodes
The nodes command shows the cluster topology.
GET /_cat/nodes?h=ip,port,heapPercent,name
Show health
Show health status of each index
GET /_cat/health?v
Show plugins
The plugins command provides a per-node view of running plugins.
GET /_cat/plugins?v&s=component&h=name,component,version,description
The count command provides quick access to the document count of the entire cluster or of individual indices.
GET /_cat/count/<target>

With the v parameter, the response includes column headings (defaults to false):

GET /_cat/count/users?v
Mapping is the process of defining how a document, and the fields it contains, are stored and indexed.
Mappings are used to define, among other things, which string fields should be treated as full-text fields, which fields contain numbers, dates, or geolocations, and the format of date values.
It is often useful to index the same field in different ways for different purposes. For instance, a string field could be indexed as a text field for full-text search, and as a keyword field for sorting or aggregations. Alternatively, you could index a string field with the standard analyzer, the english analyzer, and the french analyzer.
This is the purpose of multi-fields. Most datatypes support multi-fields via the fields parameter.
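For example, a sketch of a multi-field mapping that indexes firstName both as analyzed text and as an exact keyword (the user_v2 index name is hypothetical):

PUT user_v2
{
  "mappings": {
    "properties": {
      "firstName": {
        "type": "text",
        "fields": {
          "keyword": { "type": "keyword" }
        }
      }
    }
  }
}

This is what makes queries against firstName.keyword, as in the term-query examples above, possible.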
The following settings allow you to limit the number of field mappings that can be created manually or dynamically, in order to prevent bad documents from causing a mapping explosion:
index.mapping.total_fields.limit – The maximum number of fields in an index. Field and object mappings, as well as field aliases count towards this limit. The default value is 1000.
index.mapping.depth.limit – The maximum depth for a field, which is measured as the number of inner objects. For instance, if all fields are defined at the root object level, then the depth is 1. If there is one object mapping, then the depth is 2, etc. The default is 20.
index.mapping.nested_fields.limit – The maximum number of distinct nested mappings in an index, defaults to 50.
index.mapping.nested_objects.limit – The maximum number of nested JSON objects within a single document across all nested types, defaults to 10000.
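These limits are ordinary dynamic index settings, so a sketch of raising the total-fields limit for an existing index looks like this (the index name and the 2000 value are just examples):

PUT doctor_ut/_settings
{
  "index.mapping.total_fields.limit": 2000
}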
Dynamic Mapping
Fields and mapping types do not need to be defined before being used. Thanks to dynamic mapping, new field names will be added automatically, just by indexing a document. New fields can be added both to the top-level mapping type, and to inner object and nested fields.
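For example, indexing a document into an index that does not exist yet creates the index and infers a mapping from the document's fields (the data index and count field are illustrative):

PUT data/_doc/1
{ "count": 5 }

GET data/_mapping

Here Elasticsearch creates the data index and dynamically maps count as a numeric field.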
Mapping Example
PUT user { "mappings": { "properties": { "title": { "type": "text" }, "name": { "type": "text" }, "age": { "type": "integer" }, "created": { "type": "date", "format": "strict_date_optional_time||epoch_millis" } } } }
Java Mapping Example
String indexName = "doctors"; CreateIndexRequest request = new CreateIndexRequest(indexName); request.settings(Settings.builder().put("index.number_of_shards", 1).put("index.number_of_replicas", 2)); XContentBuilder builder = XContentFactory.jsonBuilder(); builder.startObject(); { builder.startObject("properties"); { builder.startObject("locations"); { builder.field("type", "geo_point"); } builder.endObject(); builder.startObject("addresses"); { builder.field("type", "nested"); } builder.endObject(); builder.startObject("specialities"); { builder.field("type", "nested"); } builder.endObject(); } builder.endObject(); } builder.endObject(); request.mapping(builder); CreateIndexResponse createIndexResponse = restHighLevelClient.indices().create(request,RequestOptions.DEFAULT);
An inverted index consists of a list of all the unique words that appear in any document, and for each word, a list of the documents in which it appears.
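As a tiny illustration, given two small documents, the inverted index might look like this (an illustrative sketch, not actual Elasticsearch output):

Doc 1: "the quick brown fox"
Doc 2: "the quick blue sky"

term  | documents
------+----------
blue  | 2
brown | 1
fox   | 1
quick | 1, 2
sky   | 2
the   | 1, 2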
An index can be thought of as an optimized collection of documents and each document is a collection of fields, which are the key-value pairs that contain your data. By default, Elasticsearch indexes all data in every field and each indexed field has a dedicated, optimized data structure.
Elasticsearch provides single document APIs and multi-document APIs, where the API call is targeting a single document and multiple documents respectively.
All CRUD APIs are single-index APIs.
Adds a JSON document to the specified data stream or index and makes it searchable. If the target is an index and the document already exists, the request updates the document and increments its version. You cannot use the index API to send update requests for existing documents to a data stream.
You use one of these options to index a document:
PUT /<target>/_doc/<_id>
POST /<target>/_doc/
PUT /<target>/_create/<_id>
POST /<target>/_create/<_id>

target - name of the index. If the target doesn't exist and doesn't match a data stream template, this request creates the index.
_id - ID of the document.

Use POST /<target>/_doc/ when you want Elasticsearch to generate an ID for the document.
You can index a new JSON document with the _doc or _create resource. Using _create guarantees that the document is only indexed if it does not already exist. To update an existing document, you must use the _doc resource.
Example of Index
PUT doctor_ut/_doc/1013143536
{
  "npi" : "1013143536",
  "firstName" : "SHAWN",
  "lastName" : "WRIGHT",
  "fullName" : "SHAWN WRIGHT",
  "credential" : "LICSW",
  "otherLastName" : "WRIGHT",
  "otherFirstName" : "SHAWN",
  "type" : "Individual",
  "gender" : "FEMALE"
}
IndexRequest request = new IndexRequest(utIndex);
request.id(doctorIndex.getNpi());
request.source(searchHit.getSourceAsString(), XContentType.JSON);
IndexResponse indexResponse = restHighLevelClient.index(request, RequestOptions.DEFAULT);
Retrieves the specified JSON document from an index.
GET <index>/_doc/<_id>
HEAD <index>/_doc/<_id>
You use GET to retrieve a document and its source or stored fields from a particular index. Use HEAD to verify that a document exists. You can use the _source resource to retrieve just the document source or to verify that it exists.
Example of Get API
GET doctor_ut/_doc/1013143536
You can also specify the fields you want in your result from that particular document.
GET doctors/_doc/1013143536?_source_includes=name,rating
public void getDoctorByNPI() {
    String indexName = Index.DOCTOR_UT.name().toLowerCase();
    String npi = "1013143536";
    GetRequest getRequest = new GetRequest(indexName, npi);
    try {
        GetResponse getResponse = restHighLevelClient.get(getRequest, RequestOptions.DEFAULT);
        log.info(getResponse.getSourceAsString());
    } catch (Exception e) {
        log.warn(e.getLocalizedMessage());
    }
}
Retrieves multiple JSON documents by ID. You use mget to retrieve multiple documents from one or more indices. If you specify an index in the request URI, you only need to specify the document IDs in the request body.
GET doctor_ut/_mget
{
  "docs": [
    { "_id": "1689633083" },
    { "_id": "1073924098" }
  ]
}
Get multiple documents from different indices
GET _mget
{
  "docs": [
    { "_index": "doctor_ut", "_id": "1689633083" },
    { "_index": "doctors", "_id": "1073883070" }
  ]
}
public void getMultipleDoctorsByNPIs() {
    String utahDoctorIndex = Index.DOCTOR_UT.name().toLowerCase();
    String doctorsIndex = Index.DOCTORS.name().toLowerCase();
    String npi1 = "1013143536";
    String npi2 = "1073883070";

    MultiGetRequest request = new MultiGetRequest();
    request.add(new MultiGetRequest.Item(utahDoctorIndex, npi1));
    request.add(new MultiGetRequest.Item(doctorsIndex, npi2));
    try {
        MultiGetResponse response = restHighLevelClient.mget(request, RequestOptions.DEFAULT);
        // responses come back in the same order the items were added
        MultiGetItemResponse utahDoctor = response.getResponses()[0];
        log.info(utahDoctor.getResponse().getSourceAsString());
        MultiGetItemResponse doctor = response.getResponses()[1];
        log.info(doctor.getResponse().getSourceAsString());
    } catch (Exception e) {
        log.warn(e.getLocalizedMessage());
    }
}
Updates a document using the specified script.
POST /<index>/_update/<_id>
{ ... }
The update API also supports passing a partial document, which is merged into the existing document. To fully replace an existing document, use the index API.
The document must still be reindexed, but using update removes some network roundtrips and reduces chances of version conflicts between the GET and the index operation.
The _source field must be enabled to use update. In addition to _source, you can access the following variables through the ctx map: _index, _type, _id, _version, _routing, and _now (the current timestamp).
POST doctor_ut/_update/1013143536
{
  "doc": {
    "firstName": "Folau"
  },
  "doc_as_upsert": true
}
public void updateDoctor() {
    String indexName = Index.DOCTOR_UT.name().toLowerCase();
    String npi = "1013143536";
    UpdateRequest request = new UpdateRequest(indexName, npi);
    Map<String, Object> jsonMap = new HashMap<>();
    jsonMap.put("firstName", "Folau");
    request.doc(jsonMap);
    try {
        UpdateResponse updateResponse = restHighLevelClient.update(request, RequestOptions.DEFAULT);
        // getGetResult() is null unless fetch-source is enabled, so log the operation result instead
        log.info("result={}", updateResponse.getResult());
    } catch (Exception e) {
        log.warn(e.getLocalizedMessage());
    }
}
While processing an update by query request, Elasticsearch performs multiple search requests sequentially to find all of the matching documents. A bulk update request is performed for each batch of matching documents. Any query or update failures cause the update by query request to fail, and the failures are shown in the response. Any update requests that completed successfully still stick; they are not rolled back.
POST /<index>/_update_by_query
Updates documents that match the specified query. If no query is specified, performs an update on every document in the data stream or index without modifying the source, which is useful for picking up mapping changes.
POST doctor_ut/_update_by_query
{
  "script": {
    "source": "if (ctx._source.firstName == 'Kinga') {ctx._source.firstName='Tonga';}",
    "lang": "painless"
  },
  "query": {
    "term": {
      "firstName": "Kinga"
    }
  }
}
Java example of Update by query
public void batchUpdateDoctors() {
    String indexName = Index.DOCTOR_UT.name().toLowerCase();
    UpdateByQueryRequest request = new UpdateByQueryRequest(indexName);
    request.setQuery(new TermQueryBuilder("firstName", "new_name1"));
    request.setScript(new Script(ScriptType.INLINE, "painless",
            "if (ctx._source.firstName == 'new_name1') {ctx._source.firstName='Kinga';}",
            Collections.emptyMap()));
    try {
        BulkByScrollResponse bulkResponse = restHighLevelClient.updateByQuery(request, RequestOptions.DEFAULT);
        log.info("updated={}", bulkResponse.getStatus().getUpdated());
    } catch (Exception e) {
        log.warn(e.getLocalizedMessage());
    }
}
Removes a JSON document from the specified index. You use DELETE to remove a document from an index. You must specify the index name and document ID.
DELETE /<index>/_doc/<_id>
DELETE doctor_ut/_doc/1013143536
public void deleteDoctor() {
    String indexName = Index.DOCTOR_UT.name().toLowerCase();
    String npi = "1013143536";
    DeleteRequest request = new DeleteRequest(indexName, npi);
    try {
        DeleteResponse deleteResponse = restHighLevelClient.delete(request, RequestOptions.DEFAULT);
        log.info(deleteResponse.getIndex());
    } catch (Exception e) {
        log.warn(e.getLocalizedMessage());
    }
}
Copies documents from a source to a destination.
The source and destination can be any pre-existing index, index alias, or data stream. However, the source and destination must be different. For example, you cannot reindex a data stream into itself.
Reindex requires _source to be enabled for all documents in the source.
The destination should be configured as wanted before calling _reindex. Reindex does not copy the settings from the source or its associated template.
Mappings, shard counts, replicas, and so on must be configured ahead of time.
POST _reindex
{
  "source": {
    "index": "doctors"
  },
  "dest": {
    "index": "doctor-ut"
  }
}
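A Java sketch of the same reindex call, assuming the same restHighLevelClient as in the other examples:

ReindexRequest request = new ReindexRequest();
request.setSourceIndices("doctors");
request.setDestIndex("doctor-ut");
// reindex runs as a bulk-by-scroll task and reports created/updated counts
BulkByScrollResponse response = restHighLevelClient.reindex(request, RequestOptions.DEFAULT);
log.info("created={}, updated={}", response.getCreated(), response.getUpdated());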