What is Elasticsearch?

Elasticsearch is the distributed search and analytics engine. Elasticsearch provides near real-time search and analytics for all types of data. Whether you have structured or unstructured text, numerical data, or geospatial data, Elasticsearch can efficiently store and index it in a way that supports fast searches. It is accessible from RESTful web service interface and uses schema less JSON (JavaScript Object Notation) documents to store data. It is built on Java programming language and hence Elasticsearch can run on different platforms. It enables users to explore very large amount of data at very high speed.

General Features

  • Elasticsearch is scalable up to petabytes of structured and unstructured data.
  • Elasticsearch can be used as a replacement of document stores like MongoDB and RavenDB.
  • Elasticsearch uses denormalization to improve the search performance.
  • Elasticsearch is an open source and available under the Apache license version 2.0.
  • Elasticsearch is one of the popular enterprise search engines, and is currently being used by many big organizations like Wikipedia, The Guardian, StackOverflow, GitHub etc.
  • Store and analyze logs, metrics, and security event data
  • Use machine learning to automatically model the behavior of your data in real time
  • Automate business workflows using Elasticsearch as a storage engine
  • Manage, integrate, and analyze spatial information using Elasticsearch as a geographic information system (GIS)

Data stored as Document

Elasticsearch is a distributed document store. Instead of storing information as rows of columnar data, Elasticsearch stores complex data structures that have been serialized as JSON documents. When you have multiple Elasticsearch nodes in a cluster, stored documents are distributed across the cluster and can be accessed immediately from any node. 

When a document is stored, it is indexed and fully searchable in near real-time –within 1 second. Elasticsearch uses a data structure called an inverted index that supports very fast full-text searches. An inverted index lists every unique word that appears in any document and identifies all of the documents each word occurs in.

An index can be thought of as an optimized collection of documents and each document is a collection of fields, which are the key-value pairs that contain your data. By default, Elasticsearch indexes all data in every field and each indexed field has a dedicated, optimized data structure. For example, text fields are stored in inverted indices, and numeric and geo fields are stored in BKD trees. The ability to use the per-field data structures to assemble and return search results is what makes Elasticsearch so fast.

Elasticsearch also has the ability to be schema-less, which means that documents can be indexed without explicitly specifying how to handle each of the different fields that might occur in a document. When dynamic mapping is enabled, Elasticsearch automatically detects and adds new fields to the index. This default behavior makes it easy to index and explore your data—​just start indexing documents and Elasticsearch will detect and map booleans, floating point and integer values, dates, and strings to the appropriate Elasticsearch data types.

Node

A node is a single running instance(server) of a cluster

Cluster

A cluster is a collection of nodes. Cluster provides collective indexing and search capabilities across all the nodes for entire data.

Index

It is a collection of different type of documents and their properties. Index also uses the concept of shards to improve the performance. For example, a set of document contains data of a social networking application.

Document

It is a collection of fields in a specific manner defined in JSON format. Every document belongs to a type and resides inside an index. Every document is associated with a unique identifier called the UID.

Shard

Indexes are horizontally subdivided into shards. This means each shard contains all the properties of document but contains less number of JSON objects than index. The horizontal separation makes shard an independent node, which can be store in any node. Primary shard is the original horizontal part of an index and then these primary shards are replicated into replica shards.

Replicas

Elasticsearch allows a user to create replicas of their indexes and shards. Replication not only helps in increasing the availability of data in case of failure, but also improves the performance of searching by carrying out a parallel search operation in these replicas

RDBMS and Elasticsearch

Elasticsearch RDBMS
Cluster Database
Shard Shard
Index Table
Field Column
Document Row

Advantages

  • Elasticsearch is developed on Java, which makes it compatible on almost every platform.
  • Elasticsearch is real time, in other words after one second the added document is searchable in this engine
  • Elasticsearch is distributed, which makes it easy to scale and integrate in any big organization.
  • Creating full backups are easy by using the concept of gateway, which is present in Elasticsearch.
  • Handling multi-tenancy is very easy in Elasticsearch when compared to Apache Solr.
  • Elasticsearch uses JSON objects as responses, which makes it possible to invoke the Elasticsearch server with a large number of different programming languages.
  • Elasticsearch supports almost every document type except those that do not support text rendering.

Disadvantages

  • Elasticsearch has a problem of Split brain situations at times.



Subscribe To Our Newsletter
You will receive our latest post and tutorial.
Thank you for subscribing!

required
required


Leave a Reply

Your email address will not be published.