The goal of filters is to reduce the number of documents that have to be examined by the query. Queries have to not only find matching documents, but also calculate how relevant each document is, which typically makes queries heavier than filters. Also, query results are not cachable. Filter is quick to calculate and easy to cache in memory, using only 1 bit per document. These cached filters can be reused efficiently for subsequent requests.
When to Use filter vs query?
As a general rule, use query clauses for full-text search or for any condition that should affect the relevance score, and use filter clauses for everything else.
There are two ways to filter search results.
filter
clause. Search requests apply boolean filters to both search hits and aggregations.post_filter
parameter. Search requests apply post filters only to search hits, not aggregations. You can use a post filter to calculate aggregations based on a broader result set, and then further narrow the results. A post filter has no impact on the aggregation results.Term filter
The term filter is used to filter by exact values, be they numbers, dates, Booleans, or not_analyzed exact-value string fields. Here we are filtering out all users whose rating is not 5. In other words, only retrieve users with rating of 5.
GET elasticsearch_learning/_search { "query": { "bool": { "filter": [ { "term": { "rating": 5 }} ] } } }
/** * https://www.elastic.co/guide/en/elasticsearch/reference/current/filter-search-results.html */ @Test void filterQuery() { int pageNumber = 0; int pageSize = 5; SearchRequest searchRequest = new SearchRequest(database); searchRequest.allowPartialSearchResults(true); searchRequest.indicesOptions(IndicesOptions.lenientExpandOpen()); SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder(); searchSourceBuilder.from(pageNumber * pageSize); searchSourceBuilder.size(pageSize); searchSourceBuilder.timeout(new TimeValue(60, TimeUnit.SECONDS)); /** * fetch only a few fields */ searchSourceBuilder.fetchSource(new String[]{"id", "firstName", "lastName", "rating", "dateOfBirth"}, new String[]{""}); BoolQueryBuilder boolQuery = QueryBuilders.boolQuery(); boolQuery.filter(QueryBuilders.termQuery("rating", 5)); searchSourceBuilder.query(boolQuery); searchRequest.source(searchSourceBuilder); searchRequest.preference("rating"); if (searchSourceBuilder.query() != null && searchSourceBuilder.sorts() != null && searchSourceBuilder.sorts().size() > 0) { log.info("\n{\n\"query\":{}, \"sort\":{}\n}", searchSourceBuilder.query().toString(), searchSourceBuilder.sorts().toString()); } else { log.info("\n{\n\"query\":{}\n}", searchSourceBuilder.query().toString()); } try { SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT); log.info("isTimedOut={}, totalShards={}, totalHits={}", searchResponse.isTimedOut(), searchResponse.getTotalShards(), searchResponse.getHits().getTotalHits().value); List<User> users = getResponseResult(searchResponse.getHits()); log.info("results={}", ObjectUtils.toJson(users)); } catch (IOException e) { log.warn("IOException, msg={}", e.getLocalizedMessage()); e.printStackTrace(); } catch (Exception e) { log.warn("Exception, msg={}", e.getLocalizedMessage()); e.printStackTrace(); } }
Range filter
The range filter allows you to find numbers or dates that fall into a specified range. Here we are filtering out all users whose rating is either a 2, 3, or 4.
GET elasticsearch_learning/_search { "query":{ "bool" : { "filter" : [ { "range" : { "rating" : { "from" : 2, "to" : 4 } } } ] } } }
/** * https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-range-query.html */ @Test void filterQueryWithRange() { int pageNumber = 0; int pageSize = 5; SearchRequest searchRequest = new SearchRequest(database); searchRequest.allowPartialSearchResults(true); searchRequest.indicesOptions(IndicesOptions.lenientExpandOpen()); SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder(); searchSourceBuilder.from(pageNumber * pageSize); searchSourceBuilder.size(pageSize); searchSourceBuilder.timeout(new TimeValue(60, TimeUnit.SECONDS)); /** * fetch only a few fields */ searchSourceBuilder.fetchSource(new String[]{"id", "firstName", "lastName", "rating", "dateOfBirth"}, new String[]{""}); BoolQueryBuilder boolQuery = QueryBuilders.boolQuery(); boolQuery.filter(QueryBuilders.rangeQuery("rating").gte(2).lte(4)); searchSourceBuilder.query(boolQuery); searchRequest.source(searchSourceBuilder); searchRequest.preference("rating"); if (searchSourceBuilder.query() != null && searchSourceBuilder.sorts() != null && searchSourceBuilder.sorts().size() > 0) { log.info("\n{\n\"query\":{}, \"sort\":{}\n}", searchSourceBuilder.query().toString(), searchSourceBuilder.sorts().toString()); } else { log.info("\n{\n\"query\":{}\n}", searchSourceBuilder.query().toString()); } try { SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT); log.info("isTimedOut={}, totalShards={}, totalHits={}", searchResponse.isTimedOut(), searchResponse.getTotalShards(), searchResponse.getHits().getTotalHits().value); List<User> users = getResponseResult(searchResponse.getHits()); log.info("results={}", ObjectUtils.toJson(users)); } catch (IOException e) { log.warn("IOException, msg={}", e.getLocalizedMessage()); e.printStackTrace(); } catch (Exception e) { log.warn("Exception, msg={}", e.getLocalizedMessage()); e.printStackTrace(); } }
Exists Filter
The exists and missing filters are used to find documents in which the specified field either has one or more values (exists) or doesn’t have any values (missing). It is similar in nature to IS_NULL (missing) and NOT IS_NULL (exists)in SQL.
{ "exists": { "field": "name" } }
Here we are filtering out all users that have logged into the system.
GET elasticsearch_learning/_search { "query":{ "bool" : { "filter" : [ { "exists" : { "field" : "lastLoggedInAt" } } ] } }
/** * https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-range-query.html */ @Test void filterQueryWithExists() { int pageNumber = 0; int pageSize = 5; SearchRequest searchRequest = new SearchRequest(database); searchRequest.allowPartialSearchResults(true); searchRequest.indicesOptions(IndicesOptions.lenientExpandOpen()); SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder(); searchSourceBuilder.from(pageNumber * pageSize); searchSourceBuilder.size(pageSize); searchSourceBuilder.timeout(new TimeValue(60, TimeUnit.SECONDS)); /** * fetch only a few fields */ searchSourceBuilder.fetchSource(new String[]{"id", "firstName", "lastName", "rating", "dateOfBirth"}, new String[]{""}); BoolQueryBuilder boolQuery = QueryBuilders.boolQuery(); boolQuery.filter(QueryBuilders.existsQuery("lastLoggedInAt")); searchSourceBuilder.query(boolQuery); searchRequest.source(searchSourceBuilder); searchRequest.preference("rating"); if (searchSourceBuilder.query() != null && searchSourceBuilder.sorts() != null && searchSourceBuilder.sorts().size() > 0) { log.info("\n{\n\"query\":{}, \"sort\":{}\n}", searchSourceBuilder.query().toString(), searchSourceBuilder.sorts().toString()); } else { log.info("\n{\n\"query\":{}\n}", searchSourceBuilder.query().toString()); } try { SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT); log.info("isTimedOut={}, totalShards={}, totalHits={}", searchResponse.isTimedOut(), searchResponse.getTotalShards(), searchResponse.getHits().getTotalHits().value); List<User> users = getResponseResult(searchResponse.getHits()); log.info("results={}", ObjectUtils.toJson(users)); } catch (IOException e) { log.warn("IOException, msg={}", e.getLocalizedMessage()); e.printStackTrace(); } catch (Exception e) { log.warn("Exception, msg={}", e.getLocalizedMessage()); e.printStackTrace(); } }
Imagine having to execute a long running process and waiting for it to finish, most likely you are connected via ssh and you can’t do anything else but waiting for that process to finish. When the process is taking too long your connection will time out and you will be kicked out. When that happens your process will stop its execution and you will have to rerun it. This is painful and here is where screen comes into the picture.
For example, you are downloading a big file or a database using mysql dump command. This usually takes 10 to 20 minutes.
You can use the Linux screen
command to push running terminal applications or processes to the background and pull them forward when you want to see them. It also supports split-screen displays and works over SSH connections, even after you disconnect and reconnect.
The usual operation with screen is to create a new window with a shell in it, run a command, and then push the window to the background (called “detaching”). When you want to see how your process is doing, you can pull the window to the foreground again (“reattach”) and use it again. This is great for long processes you don’t want to accidentally terminate by closing the terminal window.
Once you’ve got a screen
session running, you can create new windows and run other processes in them. You can easily hop between windows to monitor their progress. You can also split your terminal window into vertical or horizontal regions, and display your various screen
windows in one window.
You can connect to a remote machine, start a screen
session, and launch a process. You can disconnect from the remote host, reconnect, and your process will still be running.
You can share a screen
session between two different SSH connections so two people can see the same thing, in real-time.
Intall screen
# ubuntu sudo apt update sudo apt install screen # centos or fedora sudo yum install screen
Start a screen session
screen # start screen session with a name screen -S loading-database
This will open a screen session, create a new window, and start a shell in that window.
Most used commands for a screen window
Ctrl+a
c
Create a new window (with shell).Ctrl+a
"
List all windows.Ctrl+a
0
Switch to window 0 (by number).Ctrl+a
A
Rename the current window.Ctrl+a
S
Split current region horizontally into two regions.Ctrl+a
|
Split current region vertically into two regions.Ctrl+a
tab
Switch the input focus to the next region.Ctrl+a
Ctrl+a
Toggle between the current and previous windowsCtrl+a
Q
Close all regions but the current one.Ctrl+a
X
Close the current region.Detach from a screen session
Ctrl+a Ctrl+d
The program running in the screen session will continue to run after you detach from the session.
Reattach to a screen session
# resume current screen session, assuming one screen session screen -r
In case you have multiple screen sessions running on your machine, you will need to append the screen session ID after the r
switch.
# list the current running screen sessions screen -ls # output There is a screen on: 30093.pts-0.ip-172-31-8-213 (09/24/21 16:07:23) (Attached) 1 Socket in /run/screen/S-ubuntu. # command screen -r 30093
Terminate a screen session
Make sure you on the screen session and then run the exit command.
exit
or run this command out of the screen session
screen -X -S screen_id kill (or quit) or screen -XS <screen_id> kill (or quit) # Example screen -X -S 30093 kill or screen -XS 30093 quit
When running queries in your production environment, you have to be careful so you don’t bring it down or make a mistake that will cause your environment to act abnormally.
Here are some things you need to be careful about:
The MySQL READ UNCOMMITTED, also known as “dirty read” because it is the lowest level of isolation. If we specify a table hint then it will override the current default isolation level. MySQL default isolation level is REPEATABLE READ which means locks will be placed for each operation, but multiple connections can read data concurrently.
SET SESSION TRANSACTION ISOLATION LEVEL READ UNCOMMITTED ; SELECT * FROM TABLE_NAME ; SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ ;
Do everything you can to run query againts dev, qa, or backup database before running your query against a production database. And when you do run your query against your production database, make sure to do it first against backup tables and then the actual tables. A lot of times you will find that as a safety net in case you make a mistake on first try, you will be safe as that would be against the backup and not the actual live data. Once everything looks good in the backup tables then you can run your query against the actual live data.
Make sure that when running queries that your dev database server has the same version as your production database server. If they have to be in different versions the production server must be in higher version than the dev server. In most cases where the version is not the same the dev server is in higher version so be careful about that. The reason for this check is that you may have a query that works in dev and won’t work in production because the query uses functions that the production server does not support. For example. JSON_ARRAYAGG
is not in mysql version 5.7. My dev server was in mysql 8 and production server was in 5.7. Not until release day that I realized my database migration was not going to work in production and had to pull it back.
If possible, run insert/update migration scripts during slow or off hours. There is a good chance you will run into issues especially with updating data based on existing data in the database. You might not be reading live data to perform updates as users are using and updating your data.
You can never go wrong with having help and verifying your query is doing what is intended to do. In most cases, you work in a team, you should have a team member or your team lead look at your query to make sure it’s doing what is intended to do. You can’t take a risk of running a query that may change thousands of records without having someone else review it.
CompletableFuture was introduced in Java 8 and is part of the java.util.concurrent package. It represents a future result of an asynchronous computation—a way to write non–blocking asynchronous code in Java. The main advantage of using CompletableFuture is its ability to chain multiple asynchronous operations, handle exceptions, combine results, and apply transformations without blocking the execution thread.
CompletableFuture.supplyAsync()
The CompletableFuture.supplyAsync() method is a handy tool in Java’s arsenal for asynchronous programming. It provides a way to execute asynchronous computations that produce a result. This method is used to initiate an asynchronous computation and return a CompletableFuture that will be completed with the result of that computation. The computation itself is specified as a Supplier<T>, where T is the type of the resulting value.
static void runSupplyAsync() { System.out.println("runSupplyAsync..."); CompletableFuture<String> future = CompletableFuture.supplyAsync(() -> { // Simulate some long-running task try { System.out.println("supplyAsync sleeping..."); Thread.sleep(1500); System.out.println("supplyAsync done sleeping!"); } catch (InterruptedException e) { throw new IllegalStateException(e); } return "Result"; }); // This will print immediately System.out.println("Waiting for the result..."); String result = future.join(); System.out.println("return result: " + result); System.out.println("runSupplyAsync done!"); }
Output
runSupplyAsync... Waiting for the result... supplyAsync sleeping... supplyAsync done sleeping! return result: Result runSupplyAsync done!
static void runSupplyAsyncThen() { System.out.println("runSupplyAsyncThen..."); CompletableFuture<String> future = CompletableFuture.supplyAsync(() -> { // Simulate some long-running task try { System.out.println("supplyAsync sleeping..."); Thread.sleep(1500); System.out.println("supplyAsync done sleeping!"); } catch (InterruptedException e) { throw new IllegalStateException(e); } return "Result"; }).thenApply(result -> { System.out.println("thenApply, result=" + result); return result + result; }).thenApply(result -> { System.out.println("thenApply, result=" + result); return result + result + result; }); future.thenAccept(result -> { System.out.println("thenAccept, result=" + result); }); // This will print immediately System.out.println("Waiting for the result..."); String result = future.join(); System.out.println("return result: " + result); System.out.println("runSupplyAsyncThen done!"); }
Completable.runAsync()
While CompletableFuture.supplyAsync() is designed to initiate asynchronous computations that produce a result, there are times when you want to perform an action that doesn’t return anything. That’s where CompletableFuture.runAsync() comes in.
This method is used to run a Runnable task asynchronously. A Runnable doesn’t return a result. Hence, the CompletableFuture returned by runAsync() is of type Void.
static void runAsync() { System.out.println("runAsync..."); CompletableFuture<Void> future = CompletableFuture.runAsync(() -> { // Simulate some long-running task try { System.out.println("runAsync sleeping..."); Thread.sleep(1500); System.out.println("runAsync done sleeping!"); } catch (InterruptedException e) { throw new IllegalStateException(e); } }); // This will block until the future is completed (i.e., the Runnable has executed) future.join(); System.out.println("runAsync done!"); }
static void runWithCustomExecutor() { System.out.println("runWithCustomExecutor..."); Executor executor = Executors.newFixedThreadPool(2); CompletableFuture<Void> future = CompletableFuture.runAsync(() -> { // Simulate some long-running task try { System.out.println("runAsync sleeping..."); Thread.sleep(1500); System.out.println("runAsync done sleeping!"); } catch (InterruptedException e) { throw new IllegalStateException(e); } }, executor); // This will block until the future is completed (i.e., the Runnable has executed) future.join(); System.out.println("runWithCustomExecutor done!"); }
Key Takeaways:
CompletableFuture.supplyAsync() initiates a non–blocking, asynchronous computation.
By default, tasks execute in the ForkJoinPool.commonPool(), but you can provide your custom Executor.
It provides a more modern, fluent, and readable approach to asynchronous programming in Java, especially when combined with other methods of CompletableFuture for chaining, combining, and handling results.
MySQL supports the native JSON data type since version 5.7.8. The native JSON data type allows you to store JSON documents more efficiently than the JSON text format in the previous versions.
MySQL stores JSON documents in an internal format that allows quick read access to document elements. The JSON binary format is structured in the way that permits the server to search for values within the JSON document directly by key or array index, which is very fast.
The storage of a JSON document is approximately the same as the storage of LONGBLOB
or LONGTEXT
data.
CREATE TABLE events ( ... browser_info JSON, ... );
Insert into json column
INSERT INTO events(browser_info) VALUES ( '{ "name": "Safari", "os": "Mac", "resolution": { "x": 1920, "y": 1080 } }' )
Automatic validation of JSON documents stored in JSON
columns. Invalid documents produce an error.
Evaluates a (possibly empty) list of key-value pairs and returns a JSON object containing those pairs. An error occurs if any key name is NULL
or the number of arguments is odd.
SELECT JSON_OBJECT('id',u.id, 'firstName',u.first_name, 'lastName',u.first_name) as jsonUser FROM user as u;
Evaluates a (possibly empty) list of values and returns a JSON array containing those values.
SELECT JSON_ARRAY(u.id, u.first_name, u.first_name) as jsonUser FROM user as u;
Json Object Agg
Return result set as a single JSON object
Takes two column names or expressions as arguments, the first of these being used as a key and the second as a value, and returns a JSON object containing key-value pairs. Returns NULL
if the result contains no rows, or in the event of an error. An error occurs if any key name is NULL
or the number of arguments is not equal to 2.
SELECT JSON_OBJECTAGG(u.id, u.firstName, u.lastName) as jsonData FROM user as u; // output { "id": 1, "firstName": "John", "lastName": "Peter" }
Return result set as a single JSON array
Aggregates a result set as a single JSON
array whose elements consist of the rows. The order of elements in this array is undefined. The function acts on a column or an expression that evaluates to a single value. Returns NULL
if the result contains no rows, or in the event of an error.
SELECT JSON_PRETTY(JSON_OBJECT('userId', u.id, 'cards', cardList)) as jsonData FROM user as u LEFT JOIN (SELECT c.user_id, JSON_ARRAYAGG( JSON_OBJECT( 'cardId', c.id, 'cardNumber', c.card_number) ) as cardList FROM card as c GROUP BY c.user_id) as cards ON u.id = cards.user_id;
{ "cards": [ { "cardId": 4, "cardNumber": "2440531" }, { "cardId": 11, "cardNumber": "4061190" } ], "userId": 1 }
How to accomplish JSON_ARRAYAGG before version 5.7.8
SELECT JSON_PRETTY(JSON_OBJECT('userId', u.id, 'cards', cardList)) as jsonData FROM user as u LEFT JOIN (SELECT c.user_id, CONCAT('[', GROUP_CONCAT( JSON_OBJECT( 'cardId', c.id, 'cardNumber', c.card_number) ), ']') as cardList FROM card as c GROUP BY c.user_id) as cards ON u.id = cards.user_id;
Provides pretty-printing of JSON values similar to that implemented in PHP and by other languages and database systems. The value supplied must be a JSON value or a valid string representation of a JSON value.
SELECT JSON_PRETTY(JSON_OBJECT('id',u.id, 'firstName',u.first_name, 'lastName',u.first_name)) as jsonUser FROM user as u;
json_extract(
json_doc
,
path
[,
path
] ...)
Returns data from a JSON document, selected from the parts of the document matched by the path
arguments. Returns NULL
if any argument is NULL
or no paths locate a value in the document. An error occurs if the json_doc
argument is not a valid JSON document or any path
argument is not a valid path expression.