Technology: ElasticSearch

Elasticsearch is a full-text search and analytics engine. It is robust, highly available, and distributed by design. It supports aggregations, log analysis, geo-location data, and machine learning.

Data is stored as documents. A document is similar to a row in an RDBMS.

Elastic Stack

Kibana -> Analytics and visualization platform
Logstash -> Data processing pipeline
X-Pack -> Security, monitoring, ML, graph, and SQL for querying documents
Beats -> Lightweight data shippers

Sharding

Divides an index into smaller pieces; each piece is called a shard
Sharding is done at the index level
It lets the data volume scale horizontally
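
A minimal sketch of how sharding can spread documents across pieces of an index, assuming each document is routed by a hash of its id modulo the number of primary shards. The function name pick_shard and the use of CRC32 are illustrative assumptions; Elasticsearch uses its own hash function internally.

```python
import zlib


def pick_shard(routing_key: str, num_primary_shards: int) -> int:
    """Map a routing key (here, the document id) to a shard number."""
    # CRC32 is only a stand-in hash for illustration, not what Elasticsearch uses.
    return zlib.crc32(routing_key.encode("utf-8")) % num_primary_shards


# Distribute 1000 hypothetical document ids over 5 shards.
doc_ids = [f"doc-{i}" for i in range(1000)]
placement = {}
for doc_id in doc_ids:
    placement.setdefault(pick_shard(doc_id, 5), []).append(doc_id)

for shard, docs in sorted(placement.items()):
    print(f"shard {shard}: {len(docs)} documents")
```

Because the shard is derived from the id alone, the same document always lands on the same shard, which is why the number of primary shards is normally fixed when the index is created.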

Full-text search

Searches the terms of all documents
Content is parsed and stored beforehand
Comparable to a Google search

Analytics:

Search is zooming in -> finding a needle in a haystack.
Analytics is:

The opposite of search
Zooming out and looking at the bigger picture

Components of the Elastic Stack

Logstash

Helps centralize event data, such as logs and metrics, in any format
It can transform the data before sending it to the stash
It is a server-side component. Its role is to centralize data from various input sources, transform it, and forward it to an output.

Beats

Open-source, lightweight data shippers
Beats is a client-side component, and its role is complementary to Logstash.
It is built on a core library, libbeat, which provides an API to:

ship data from the source
configure input options
implement logging

The Elastic team has built various Beats, such as Packetbeat, Filebeat, Metricbeat, Winlogbeat, Auditbeat, and Heartbeat

Kibana

Visualization tool of the Elastic Stack
Helps to gain insights from data. It is called the window into the Elastic Stack
It offers many visualizations, such as histograms, maps, line charts, time series, and more
Offers management tools

Manage settings and configure X-Pack security settings

Offers development tools

Build and test REST API requests

X-Pack

It adds security, monitoring, alerting, reporting, and graph capabilities
Security

Authentication and authorization
Secure access to Elasticsearch and Kibana
Helps to configure field-level and document-level security

Monitoring

Monitors cluster-, node-, and index-level metrics
Plugin to maintain performance history.

Graph

Elastic Cloud

How does it work: When a document is added to an Elasticsearch index, an inverted index is built by breaking the document down into the form best suited to searching. Once the inverted index has been built, the document is ready for search.

It indexes all fields of the document
Other optimizations make it lightning fast
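
The idea of an inverted index can be sketched in a few lines of Python. This is a toy model, not how Lucene stores data: the tokenize function is a rough stand-in for a real analyzer, and the sample documents are made up for illustration.

```python
from collections import defaultdict


def tokenize(text):
    # Rough stand-in for an analyzer: lowercase and split on whitespace.
    return text.lower().split()


def build_inverted_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing at least one query term."""
    hits = set()
    for term in tokenize(query):
        hits |= index.get(term, set())
    return sorted(hits)


docs = {1: "Movie one", 2: "Movie two", 3: "Another film"}
index = build_inverted_index(docs)
print(search(index, "Movie two"))  # [1, 2]
```

A lookup only touches the entries for the query terms, so the cost does not depend on scanning every document.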

Use Cases:

Uber - marketplace dynamics
Salesforce - log analysis for usage trends
eBay - search through 800 million+ listings
New York Times - search through 164 years of publications

Command to Start ElasticSearch Cluster

bin\elasticsearch
localhost:9200

Kibana

bin\kibana
localhost:5601

DataTypes

String

text

Useful for full-text search on fields such as a description
These fields are analyzed before indexing

keyword

Enables analytics on string fields
These fields support sorting, filtering, and aggregations
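
The difference between the two string types can be illustrated with a toy model. The functions analyze_text and analyze_keyword are hypothetical helpers; real analyzers also handle punctuation, stemming, and more.

```python
def analyze_text(value):
    """Rough stand-in for analysis of a text field: lowercase, split into terms."""
    return value.lower().split()


def analyze_keyword(value):
    """A keyword field is indexed as one unmodified term."""
    return [value]


value = "Movie One"
text_terms = analyze_text(value)
keyword_terms = analyze_keyword(value)

print(text_terms)     # ['movie', 'one']
print(keyword_terms)  # ['Movie One']

# An exact lookup for the original string only succeeds on the keyword terms.
print("Movie One" in text_terms)     # False
print("Movie One" in keyword_terms)  # True
```

This is why exact-value operations such as sorting and aggregations are done on keyword fields, while free-text search is done on text fields.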

Dynamic Mapping

Elasticsearch infers the data types of all fields when the first document is indexed into a type that does not exist yet
GET /catalog/_mapping/product

CRUD Operations

Adding or creating a document into a type within an index of ElasticSearch is called an indexing operation.
PUT /my_movies/movie/1/_create
{
  "name": "Movie one",
  "actor_count": 10,
  "date": "2015-02-10"
}

Elasticsearch APIs

Document

A query that matches all documents in all indices of the cluster (only 10 hits are returned by default)

GET /_search

All documents in one index, in one type, across several indices, or across all indices

GET /catalog/_search
GET /catalog/product/_search
GET /catalog,my_index/product/_search
GET /_all/product/_search

Search
Aggregations
Indices
Cluster
Cat

What is an Index:

It is a logical namespace that points to one or more shards in an Elasticsearch cluster.
It is the place where data is stored in the form of documents
An index is broken into shards, and shards are containers of data
The default number of shards per index is 5, each with one replica shard (this was the default before Elasticsearch 7.0; later versions default to 1 shard)
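
Reading the defaults above as 5 primary shards with one replica copy of each, the total number of shards the cluster has to place can be worked out as follows. The helper name total_shards is an illustrative assumption.

```python
def total_shards(primaries: int, replicas_per_primary: int) -> int:
    """Each primary shard exists once, plus one copy per configured replica."""
    return primaries * (1 + replicas_per_primary)


print(total_shards(5, 1))  # 10
```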

Relational DB ------------------ Elasticsearch
Database      ------------------ Index
Table         ------------------ Type
Row           ------------------ Document
Column        ------------------ Field

Create & Drop Index

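# create the index
curl -XPUT http://localhost:9200/my_test_index
# retrieve the index settings and mappings
curl -XGET http://localhost:9200/my_test_index
# drop the index
curl -XDELETE http://localhost:9200/my_test_index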

Type

It is a representation of a class of similar documents
Type is optional
e.g. index/type/document

Index creation with type
________________
PUT my_movies
{
  "mappings": {
    "movie": {
      "properties": {
        "name": { "type": "text" },
        "actor_count": { "type": "integer" },
        "date": { "type": "date" }
      }
    }
  }
}

Adding document
_________________
PUT /my_movies/movie/1/_create
{
  "name": "Movie one",
  "actor_count": 10,
  "date": "2015-02-10"
}

DELETE /my_movies/movie/1
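
The same create and delete calls can be prepared from Python using only the standard library. This is a sketch: it builds the requests without sending them, it assumes the node listens on localhost:9200 as in the commands earlier in these notes, and it keeps the index/type/id path style used here, which applies to Elasticsearch versions that still support mapping types. The helper names index_request and delete_request are illustrative.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9200"  # assumed local node address


def index_request(index, doc_type, doc_id, document):
    """Build a PUT request that creates a document (fails if the id exists)."""
    return urllib.request.Request(
        f"{BASE_URL}/{index}/{doc_type}/{doc_id}/_create",
        data=json.dumps(document).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def delete_request(index, doc_type, doc_id):
    """Build a DELETE request for a single document."""
    return urllib.request.Request(
        f"{BASE_URL}/{index}/{doc_type}/{doc_id}", method="DELETE"
    )


create = index_request(
    "my_movies", "movie", 1,
    {"name": "Movie one", "actor_count": 10, "date": "2015-02-10"},
)
remove = delete_request("my_movies", "movie", 1)

print(create.get_method(), create.full_url)
print(remove.get_method(), remove.full_url)
```

To actually send a request, pass it to urllib.request.urlopen while a node is running.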

Queries: match/term

GET /my_movies/movie/_count
{
  "query": {
    "match": {
      "name": "Movie two"
    }
  }
}

The Query DSL is a powerful JSON-based language for executing queries in Elasticsearch. It supports two kinds of clauses.

Leaf query

Match, term, or range queries, which search for a given value in a given field

Match
Term

Exists - documents in which the field has a non-null value
Type - matches documents based on their mapping type
Range - documents whose field value falls within a range of values

Compound query

Combines leaf queries and other compound queries

Query Context - Matches documents and calculates a _score
GET /my_movies/movie/_search?explain
{
  "query": {
    "match": {
      "name": "Movie two"
    }
  }
}

Filter Context - seeks a yes-or-no answer to whether a document matches
GET /my_movies/movie/_search
{
  "query": {
    "bool": {
      "must": [{ "match": { "name": "Movie two" } }],
      "filter": [{ "term": { "actor_count": 10 } }]
    }
  }
}
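
When such requests are generated from application code, the body can be assembled as a plain dictionary. A minimal sketch, where bool_query is a hypothetical helper and not part of any client library: clauses under must run in query context and contribute to _score, while clauses under filter only include or exclude documents.

```python
import json


def bool_query(must_clauses=None, filter_clauses=None):
    """Assemble a bool query body from scoring and non-scoring clauses."""
    clause = {}
    if must_clauses:
        clause["must"] = list(must_clauses)      # query context: affects _score
    if filter_clauses:
        clause["filter"] = list(filter_clauses)  # filter context: yes/no only
    return {"query": {"bool": clause}}


body = bool_query(
    must_clauses=[{"match": {"name": "Movie two"}}],
    filter_clauses=[{"term": {"actor_count": 10}}],
)
print(json.dumps(body, indent=2))
```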

Machine Learning is the field of study that gives computers the ability to learn without being explicitly programmed.

Supervised Learning - labeled data (tables)

For organized/structured data
Used for predictions

Unsupervised Learning

Analyzes unstructured data
For example, web logs, to determine anomalies and more.
Elasticsearch uses unsupervised learning

Semi-supervised Learning

Uses both labeled and unlabeled data to create models.

Machine Learning usage

Anomaly detection
Predictive analytics - for example, house prices
Grouping (clustering)

Elasticsearch and machine learning

Makes model building easier.
intuitive UI
Easy to feed data and update model
works in tandem with elasticsearch

Machine Learning steps:

Data - Get the data
Train Model
Feed data

Shards

Partitions of an index.

Term, Range & Boosting:

Term searches for the exact term given in the query. It is most effective when querying keyword fields for exact matches. (Keyword fields are not analyzed; text fields are.)
Range retrieves documents whose values fall within a range for the selected fields.
Boosting is used to give one query more weight relative to another.
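
The three ideas can be sketched as request bodies built in Python. The field names genre and description are hypothetical (they are not in the my_movies mapping shown in these notes), and boosting is illustrated here with the boost parameter on a match clause, which is one of several ways to weight queries.

```python
import json

# Exact match on a (hypothetical) keyword field.
term_query = {"query": {"term": {"genre": "comedy"}}}

# Documents whose actor_count lies between 5 and 20, inclusive.
range_query = {"query": {"range": {"actor_count": {"gte": 5, "lte": 20}}}}

# A match on name counts twice as much as a match on description.
boosted_query = {
    "query": {
        "bool": {
            "should": [
                {"match": {"name": {"query": "movie", "boost": 2.0}}},
                {"match": {"description": "movie"}},
            ]
        }
    }
}

for body in (term_query, range_query, boosted_query):
    print(json.dumps(body))
```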

Aggregations :

Metrics aggregations compute a metric over a set of documents, for example the average of a given numeric field

Cardinality: single-value metric that counts the distinct values of a field.
Extended stats
Geo bounds aggregation uses longitude and latitude data from a set of documents to calculate a bounding box that encloses all lon/lat locations

Bucket aggregations place the results of a search into a distribution grouped into buckets, for example by numeric interval.
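
What these aggregations compute can be illustrated on a handful of in-memory documents. This is a conceptual sketch with made-up sample data, not a call to Elasticsearch; in particular, the real cardinality aggregation returns an approximate count, whereas the set-based count below is exact.

```python
import math
from collections import Counter

docs = [
    {"name": "Movie one", "actor_count": 10},
    {"name": "Movie two", "actor_count": 25},
    {"name": "Movie three", "actor_count": 10},
    {"name": "Movie four", "actor_count": 42},
]
values = [d["actor_count"] for d in docs]

# Metrics: one number summarizing the whole set.
avg_actor_count = sum(values) / len(values)
distinct_actor_counts = len(set(values))


def histogram(numbers, interval):
    """Bucket: group values into fixed-width intervals keyed by the lower bound."""
    buckets = Counter(math.floor(n / interval) * interval for n in numbers)
    return dict(sorted(buckets.items()))


print(avg_actor_count)        # 21.75
print(distinct_actor_counts)  # 3
print(histogram(values, 20))  # {0: 2, 20: 1, 40: 1}
```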

Elastic Stack - It is a full search and analytics stack. Elasticsearch is at the heart of the Elastic Stack, providing storage, search, and analytical capabilities. It is built on a radically different technology: Apache Lucene.

Components of Elastic Stack:

Logstash - Centralizes data from input sources, transforms it, and forwards it to an output.
Beats - Its role is complementary to Logstash. It is a client-side component that provides an API to ship data from the source, configure input options, and implement logging.
Elasticsearch -
Kibana - Visualization tool of the Elastic Stack and the window into it. It also has management tools to manage settings and X-Pack security features, and development tools to build and test REST API requests.
X-Pack - It adds security, monitoring, alerting, reporting, and graph capabilities to the Elastic Stack.

Security - Authentication and authorization capabilities for Elasticsearch and Kibana.
Monitoring -
Reporting -
Alerting -
Graph -

Elastic Cloud -

Elasticsearch (distributed, RESTful search and analytics) is an analytics engine designed to scale horizontally.

Key Characteristics

Search, Index and analyze data
Language agnostic
Built-in machine learning
Scalable, Highly available, Distributed

Goals

Lightning fast search
Analytics Engine
Near Real-time

When a document is added to an Elasticsearch index, an inverted index is created. Once the inverted index is created, the document is available for search.

Powerful REST API

Features

Aggregations
Log Analysis
Geo-location data analysis
Machine learning

Installing ElasticSearch

Download Elasticsearch from elastic.co
Extract the downloaded file into a directory
Change to the Elasticsearch directory
Run the command to start the cluster

Installing Kibana

Download Kibana from elastic.co
Extract the downloaded file into a directory
Edit the Kibana configuration file
Change to the Kibana directory and start Kibana

What is Index

It is a logical namespace that points to one or more shards (partitions, or containers of data) in an Elasticsearch cluster
It is the place where data is stored in the form of documents
An index is broken into shards, and shards are containers for data. The default number of shards per index is 5.

Documents in Elasticsearch (a document is like a row in an RDBMS table)

A document is an individual entry and the primary way of adding data.
A type is a representation of a class of similar documents (like a table in an RDBMS).
Type is optional in Elasticsearch

What is a cluster

A cluster is one or more Elasticsearch instances running on a given network
A node is a single Elasticsearch instance. Nodes can handle both the HTTP and transport protocols.

Shards and replicas

Bulk API

ElasticSearch APIs

Document

pretty=true (query parameter that pretty-prints the JSON response)

Search
Aggregation
Indices
Cluster
Cat

Technology

Tuesday, 16 October 2018

ElasticSearch
