Subscribe To Our Newsletter
You will receive our latest post and tutorial.
Thank you for subscribing!

required
required


Elasticsearch Filter

The goal of filters is to reduce the number of documents that have to be examined by the query. Queries have to not only find matching documents, but also calculate how relevant each document is, which typically makes queries heavier than filters. Also, query results are not cachable. Filter is quick to calculate and easy to cache in memory, using only 1 bit per document. These cached filters can be reused efficiently for subsequent requests.

When to Use filter vs query?

As a general rule, use query clauses for full-text search or for any condition that should affect the relevance score, and use filter clauses for everything else.

There are two ways to filter search results.

  1. Use a boolean query with a filter clause. Search requests apply boolean filters to both search hits and aggregations.
  2. Use the search API’s post_filter parameter. Search requests apply post filters only to search hits, not aggregations. You can use a post filter to calculate aggregations based on a broader result set, and then further narrow the results. A post filter has no impact on the aggregation results.

Term filter

The term filter is used to filter by exact values, be they numbers, dates, Booleans, or not_analyzed exact-value string fields. Here we are filtering out all users whose rating is not 5. In other words, only retrieve users with rating of 5.

GET elasticsearch_learning/_search
{
  "query": {
    "bool": {
      "filter": [
        { "term": { "rating": 5 }}
      ]
    }
  }
}
/**
 * https://www.elastic.co/guide/en/elasticsearch/reference/current/filter-search-results.html
 */
@Test
void filterQuery() {

    int pageNumber = 0;
    int pageSize = 5;

    SearchRequest searchRequest = new SearchRequest(database);
    searchRequest.allowPartialSearchResults(true);
    searchRequest.indicesOptions(IndicesOptions.lenientExpandOpen());

    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    searchSourceBuilder.from(pageNumber * pageSize);
    searchSourceBuilder.size(pageSize);
    searchSourceBuilder.timeout(new TimeValue(60, TimeUnit.SECONDS));
    /**
     * fetch only a few fields
     */
    searchSourceBuilder.fetchSource(new String[]{"id", "firstName", "lastName", "rating", "dateOfBirth"}, new String[]{""});

    BoolQueryBuilder boolQuery = QueryBuilders.boolQuery();

    boolQuery.filter(QueryBuilders.termQuery("rating", 5));

    searchSourceBuilder.query(boolQuery);

    searchRequest.source(searchSourceBuilder);

    searchRequest.preference("rating");

    if (searchSourceBuilder.query() != null && searchSourceBuilder.sorts() != null && searchSourceBuilder.sorts().size() > 0) {
        log.info("\n{\n\"query\":{}, \"sort\":{}\n}", searchSourceBuilder.query().toString(), searchSourceBuilder.sorts().toString());
    } else {
        log.info("\n{\n\"query\":{}\n}", searchSourceBuilder.query().toString());
    }

    try {
        SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);

        log.info("isTimedOut={}, totalShards={}, totalHits={}", searchResponse.isTimedOut(), searchResponse.getTotalShards(), searchResponse.getHits().getTotalHits().value);

        List<User> users = getResponseResult(searchResponse.getHits());

        log.info("results={}", ObjectUtils.toJson(users));

    } catch (IOException e) {
        log.warn("IOException, msg={}", e.getLocalizedMessage());
        e.printStackTrace();
    } catch (Exception e) {
        log.warn("Exception, msg={}", e.getLocalizedMessage());
        e.printStackTrace();
    }

}

 

Range filter

The range filter allows you to find numbers or dates that fall into a specified range. Here we are filtering out all users whose rating is either a 2, 3, or 4. 

GET elasticsearch_learning/_search 
{
  "query":{
    "bool" : {
      "filter" : [
        {
          "range" : {
            "rating" : {
              "from" : 2,
              "to" : 4
            }
          }
        }
      ]
    }
  }
}
/**
 * https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-range-query.html
 */
@Test
void filterQueryWithRange() {

    int pageNumber = 0;
    int pageSize = 5;

    SearchRequest searchRequest = new SearchRequest(database);
    searchRequest.allowPartialSearchResults(true);
    searchRequest.indicesOptions(IndicesOptions.lenientExpandOpen());

    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    searchSourceBuilder.from(pageNumber * pageSize);
    searchSourceBuilder.size(pageSize);
    searchSourceBuilder.timeout(new TimeValue(60, TimeUnit.SECONDS));
    /**
     * fetch only a few fields
     */
    searchSourceBuilder.fetchSource(new String[]{"id", "firstName", "lastName", "rating", "dateOfBirth"}, new String[]{""});

    BoolQueryBuilder boolQuery = QueryBuilders.boolQuery();

    boolQuery.filter(QueryBuilders.rangeQuery("rating").gte(2).lte(4));

    searchSourceBuilder.query(boolQuery);

    searchRequest.source(searchSourceBuilder);

    searchRequest.preference("rating");

    if (searchSourceBuilder.query() != null && searchSourceBuilder.sorts() != null && searchSourceBuilder.sorts().size() > 0) {
        log.info("\n{\n\"query\":{}, \"sort\":{}\n}", searchSourceBuilder.query().toString(), searchSourceBuilder.sorts().toString());
    } else {
        log.info("\n{\n\"query\":{}\n}", searchSourceBuilder.query().toString());
    }

    try {
        SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);

        log.info("isTimedOut={}, totalShards={}, totalHits={}", searchResponse.isTimedOut(), searchResponse.getTotalShards(), searchResponse.getHits().getTotalHits().value);

        List<User> users = getResponseResult(searchResponse.getHits());

        log.info("results={}", ObjectUtils.toJson(users));

    } catch (IOException e) {
        log.warn("IOException, msg={}", e.getLocalizedMessage());
        e.printStackTrace();
    } catch (Exception e) {
        log.warn("Exception, msg={}", e.getLocalizedMessage());
        e.printStackTrace();
    }

}

 

Exists Filter

The exists and missing filters are used to find documents in which the specified field either has one or more values (exists) or doesn’t have any values (missing). It is similar in nature to IS_NULL (missing) and NOT IS_NULL (exists)in SQL.

{
       "exists": {
            "field":    "name"
        }
}

Here we are filtering out all users that have logged into the system.

GET elasticsearch_learning/_search 
{
"query":{
  "bool" : {
    "filter" : [
      {
        "exists" : {
          "field" : "lastLoggedInAt"
        }
      }
    ]
  }
}
/**
 * https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-range-query.html
 */
@Test
void filterQueryWithExists() {

    int pageNumber = 0;
    int pageSize = 5;

    SearchRequest searchRequest = new SearchRequest(database);
    searchRequest.allowPartialSearchResults(true);
    searchRequest.indicesOptions(IndicesOptions.lenientExpandOpen());

    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    searchSourceBuilder.from(pageNumber * pageSize);
    searchSourceBuilder.size(pageSize);
    searchSourceBuilder.timeout(new TimeValue(60, TimeUnit.SECONDS));
    /**
     * fetch only a few fields
     */
    searchSourceBuilder.fetchSource(new String[]{"id", "firstName", "lastName", "rating", "dateOfBirth"}, new String[]{""});

    BoolQueryBuilder boolQuery = QueryBuilders.boolQuery();

    boolQuery.filter(QueryBuilders.existsQuery("lastLoggedInAt"));

    searchSourceBuilder.query(boolQuery);

    searchRequest.source(searchSourceBuilder);

    searchRequest.preference("rating");

    if (searchSourceBuilder.query() != null && searchSourceBuilder.sorts() != null && searchSourceBuilder.sorts().size() > 0) {
        log.info("\n{\n\"query\":{}, \"sort\":{}\n}", searchSourceBuilder.query().toString(), searchSourceBuilder.sorts().toString());
    } else {
        log.info("\n{\n\"query\":{}\n}", searchSourceBuilder.query().toString());
    }

    try {
        SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);

        log.info("isTimedOut={}, totalShards={}, totalHits={}", searchResponse.isTimedOut(), searchResponse.getTotalShards(), searchResponse.getHits().getTotalHits().value);

        List<User> users = getResponseResult(searchResponse.getHits());

        log.info("results={}", ObjectUtils.toJson(users));

    } catch (IOException e) {
        log.warn("IOException, msg={}", e.getLocalizedMessage());
        e.printStackTrace();
    } catch (Exception e) {
        log.warn("Exception, msg={}", e.getLocalizedMessage());
        e.printStackTrace();
    }

}

Source code on Github

September 27, 2021

Linux Screen

Imagine having to execute a long running process and waiting for it to finish, most likely you are connected via ssh and you can’t do anything else but waiting for that process to finish. When the process is taking too long your connection will time out and you will be kicked out. When that happens your process will stop its execution and you will have to rerun it. This is painful and here is where screen comes into the picture.

For example, you are downloading a big file or a database using mysql dump command. This usually takes 10 to 20 minutes. 

You can use the Linux screen command to push running terminal applications or processes to the background and pull them forward when you want to see them. It also supports split-screen displays and works over SSH connections, even after you disconnect and reconnect. 

How does screen work?

The usual operation with screen is to create a new window with a shell in it, run a command, and then push the window to the background (called “detaching”). When you want to see how your process is doing, you can pull the window to the foreground again (“reattach”) and use it again. This is great for long processes you don’t want to accidentally terminate by closing the terminal window.

Once you’ve got a screen session running, you can create new windows and run other processes in them. You can easily hop between windows to monitor their progress. You can also split your terminal window into vertical or horizontal regions, and display your various screen windows in one window.

You can connect to a remote machine, start a screen session, and launch a process. You can disconnect from the remote host, reconnect, and your process will still be running.

You can share a screen session between two different SSH connections so two people can see the same thing, in real-time.

Intall screen

# ubuntu
sudo apt update
sudo apt install screen

# centos or fedora
sudo yum install screen

Start a screen session

screen

# start screen session with a name
screen -S loading-database

This will open a screen session, create a new window, and start a shell in that window.

Most used commands for a screen window

  • Ctrl+a c Create a new window (with shell).
  • Ctrl+a " List all windows.
  • Ctrl+a 0 Switch to window 0 (by number).
  • Ctrl+a A Rename the current window.
  • Ctrl+a S Split current region horizontally into two regions.
  • Ctrl+a | Split current region vertically into two regions.
  • Ctrl+a tab Switch the input focus to the next region.
  • Ctrl+a Ctrl+a Toggle between the current and previous windows
  • Ctrl+a Q Close all regions but the current one.
  • Ctrl+a X Close the current region.

Detach from a screen session

Ctrl+a Ctrl+d

 

The program running in the screen session will continue to run after you detach from the session.

Reattach to a screen session

# resume current screen session, assuming one screen session
screen -r

In case you have multiple screen sessions running on your machine, you will need to append the screen session ID after the r switch.

# list the current running screen sessions 
screen -ls

# output
There is a screen on:
        30093.pts-0.ip-172-31-8-213     (09/24/21 16:07:23)     (Attached)
1 Socket in /run/screen/S-ubuntu.

# command
screen -r 30093

Terminate a screen session

Make sure you on the screen session and then run the exit command.

exit

or run this command out of the screen session

screen -X -S screen_id kill (or quit)

or 

screen -XS <screen_id> kill (or quit)

# Example
screen -X -S 30093 kill 

or 

screen -XS 30093 quit

 

 

September 24, 2021

MySQL Run Query in Production

When running queries in your production environment, you have to be careful so you don’t bring it down or make a mistake that will cause your environment to act abnormally.

Here are some things you need to be careful about:

Run query with READ UNCOMMITTED option

 The MySQL READ UNCOMMITTED, also known as “dirty read” because it is the lowest level of isolation. If we specify a table hint then it will override the current default isolation level. MySQL default isolation level is REPEATABLE READ which means locks will be placed for each operation, but multiple connections can read data concurrently.

SET SESSION TRANSACTION ISOLATION LEVEL READ UNCOMMITTED ;
SELECT * FROM TABLE_NAME ;
SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ ;

Run query against a backup table first

Do everything you can to run query againts dev, qa, or backup database before running your query against a production database. And when you do run your query against your production database, make sure to do it first against backup tables and then the actual tables. A lot of times you will find that as a safety net in case you make a mistake on first try, you will be safe as that would be against the backup and not the actual live data. Once everything looks good in the backup tables then you can run your query against the actual live data.

Make sure dev database server is the same(version) as the production server

Make sure that when running queries that your dev database server has the same version as your production database server. If they have to be in different versions the production server must be in higher version than the dev server. In most cases where the version is not the same the dev server is in higher version so be careful about that. The reason for this check is that you may have a query that works in dev and won’t work in production because the query uses functions that the production server does not support. For example. JSON_ARRAYAGG  is not in mysql version 5.7. My dev server was in mysql 8 and production server was in 5.7. Not until release day that I realized my database migration was not going to work in production and had to pull it back.

Avoid running migration insert/update scripts during operational hours. 

If possible, run insert/update migration scripts during slow or off hours. There is a good chance you will run into issues especially with updating data based on existing data in the database. You might not be reading live data to perform updates as users are using and updating your data.

Always have someone else take a second look at your query

You can never go wrong with having help and verifying your query is doing what is intended to do. In most cases, you work in a team, you should have a team member or your team lead look at your query to make sure it’s doing what is intended to do. You can’t take a risk of running a query that may change thousands of records without having someone else review it.

 

September 13, 2021

Java – CompletableFuture

CompletableFuture was introduced in Java 8 and is part of the java.util.concurrent package. It represents a future result of an asynchronous computation—a way to write nonblocking asynchronous code in Java. The main advantage of using CompletableFuture is its ability to chain multiple asynchronous operations, handle exceptions, combine results, and apply transformations without blocking the execution thread.

CompletableFuture.supplyAsync()

The CompletableFuture.supplyAsync() method is a handy tool in Java’s arsenal for asynchronous programming. It provides a way to execute asynchronous computations that produce a result. This method is used to initiate an asynchronous computation and return a CompletableFuture that will be completed with the result of that computation. The computation itself is specified as a Supplier<T>, where T is the type of the resulting value.

static void runSupplyAsync() {
    System.out.println("runSupplyAsync...");
    CompletableFuture<String> future = CompletableFuture.supplyAsync(() -> {
        // Simulate some long-running task
        try {
            System.out.println("supplyAsync sleeping...");
            Thread.sleep(1500);
            System.out.println("supplyAsync done sleeping!");
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return "Result";
    });

    // This will print immediately
    System.out.println("Waiting for the result...");

    String result = future.join();
    System.out.println("return result: " + result);

    System.out.println("runSupplyAsync done!");
}

Output

runSupplyAsync...
Waiting for the result...
supplyAsync sleeping...
supplyAsync done sleeping!
return result: Result
runSupplyAsync done!
static void runSupplyAsyncThen() {
    System.out.println("runSupplyAsyncThen...");
    CompletableFuture<String> future = CompletableFuture.supplyAsync(() -> {
        // Simulate some long-running task
        try {
            System.out.println("supplyAsync sleeping...");
            Thread.sleep(1500);
            System.out.println("supplyAsync done sleeping!");
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return "Result";
    }).thenApply(result -> {
        System.out.println("thenApply, result=" + result);
        return result + result;
    }).thenApply(result -> {
        System.out.println("thenApply, result=" + result);
        return result + result + result;
    });

    future.thenAccept(result -> {
        System.out.println("thenAccept, result=" + result);
    });

    // This will print immediately
    System.out.println("Waiting for the result...");

    String result = future.join();
    System.out.println("return result: " + result);

    System.out.println("runSupplyAsyncThen done!");
}

Completable.runAsync()

While CompletableFuture.supplyAsync() is designed to initiate asynchronous computations that produce a result, there are times when you want to perform an action that doesn’t return anything. That’s where CompletableFuture.runAsync() comes in.

This method is used to run a Runnable task asynchronously. A Runnable doesn’t return a result. Hence, the CompletableFuture returned by runAsync() is of type Void.

static void runAsync() {
    System.out.println("runAsync...");
    CompletableFuture<Void> future = CompletableFuture.runAsync(() -> {
        // Simulate some long-running task
        try {
            System.out.println("runAsync sleeping...");
            Thread.sleep(1500);
            System.out.println("runAsync done sleeping!");
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
    });

    // This will block until the future is completed (i.e., the Runnable has executed)
    future.join();

    System.out.println("runAsync done!");

}
static void runWithCustomExecutor() {
    System.out.println("runWithCustomExecutor...");

    Executor executor = Executors.newFixedThreadPool(2);

    CompletableFuture<Void> future = CompletableFuture.runAsync(() -> {
        // Simulate some long-running task
        try {
            System.out.println("runAsync sleeping...");
            Thread.sleep(1500);
            System.out.println("runAsync done sleeping!");
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
    }, executor);

    // This will block until the future is completed (i.e., the Runnable has executed)
    future.join();

    System.out.println("runWithCustomExecutor done!");

}

 

Key Takeaways:

CompletableFuture.supplyAsync() initiates a nonblocking, asynchronous computation.

By default, tasks execute in the ForkJoinPool.commonPool(), but you can provide your custom Executor.

It provides a more modern, fluent, and readable approach to asynchronous programming in Java, especially when combined with other methods of CompletableFuture for chaining, combining, and handling results.

 

July 27, 2021

MySQL Json

MySQL supports the native JSON data type since version 5.7.8. The native JSON data type allows you to store JSON documents more efficiently than the JSON text format in the previous versions.

MySQL stores JSON documents in an internal format that allows quick read access to document elements. The JSON binary format is structured in the way that permits the server to search for values within the JSON document directly by key or array index, which is very fast.

The storage of a JSON document is approximately the same as the storage of LONGBLOB or LONGTEXT data.

Json Data type

CREATE TABLE events (
    ...
    browser_info JSON,
    ... 
);

Insert into json column

INSERT INTO events(browser_info) 
VALUES (
   '{ "name": "Safari", "os": "Mac", "resolution": { "x": 1920, "y": 1080 } }'
)

Automatic validation of JSON documents stored in JSON columns. Invalid documents produce an error.

Json Object

Evaluates a (possibly empty) list of key-value pairs and returns a JSON object containing those pairs. An error occurs if any key name is NULL or the number of arguments is odd.

SELECT JSON_OBJECT('id',u.id, 
'firstName',u.first_name, 
'lastName',u.first_name) as jsonUser
FROM user as u;

Json Array

Evaluates a (possibly empty) list of values and returns a JSON array containing those values.

SELECT JSON_ARRAY(u.id, 
u.first_name, 
u.first_name) as jsonUser
FROM user as u;

Json Object Agg
Return result set as a single JSON object
Takes two column names or expressions as arguments, the first of these being used as a key and the second as a value, and returns a JSON object containing key-value pairs. Returns NULL if the result contains no rows, or in the event of an error. An error occurs if any key name is NULL or the number of arguments is not equal to 2.

 

SELECT JSON_OBJECTAGG(u.id, u.firstName, u.lastName) as jsonData
FROM user as u;

// output
{
  "id": 1,
  "firstName": "John",
  "lastName": "Peter"
}

Json Array Agg

Return result set as a single JSON array
Aggregates a result set as a single JSON array whose elements consist of the rows. The order of elements in this array is undefined. The function acts on a column or an expression that evaluates to a single value. Returns NULL if the result contains no rows, or in the event of an error.

SELECT JSON_PRETTY(JSON_OBJECT('userId', u.id, 'cards', cardList)) as jsonData
FROM user as u
LEFT JOIN (SELECT c.user_id, 
    JSON_ARRAYAGG(
        JSON_OBJECT(
        'cardId', c.id,
        'cardNumber', c.card_number)
        ) as cardList
    FROM card as c
    GROUP BY c.user_id) as cards ON u.id = cards.user_id;
{
  "cards": [
    {
      "cardId": 4,
      "cardNumber": "2440531"
    },
    {
      "cardId": 11,
      "cardNumber": "4061190"
    }
  ],
  "userId": 1
}

How to accomplish JSON_ARRAYAGG before version 5.7.8

SELECT JSON_PRETTY(JSON_OBJECT('userId', u.id, 'cards', cardList)) as jsonData
FROM user as u
LEFT JOIN (SELECT c.user_id, 
    CONCAT('[', GROUP_CONCAT(
        JSON_OBJECT(
        'cardId', c.id,
        'cardNumber', c.card_number)
        ), ']') as cardList
    FROM card as c
    GROUP BY c.user_id) as cards ON u.id = cards.user_id;

 

Json Pretty

Provides pretty-printing of JSON values similar to that implemented in PHP and by other languages and database systems. The value supplied must be a JSON value or a valid string representation of a JSON value. 

SELECT JSON_PRETTY(JSON_OBJECT('id',u.id, 
'firstName',u.first_name, 
'lastName',u.first_name)) as jsonUser
FROM user as u;

Json Extract

 json_extract(json_docpath[, path] ...) 

Returns data from a JSON document, selected from the parts of the document matched by the path arguments. Returns NULL if any argument is NULL or no paths locate a value in the document. An error occurs if the json_doc argument is not a valid JSON document or any path argument is not a valid path expression.

 

 

July 25, 2021