Elasticsearch is a full-text search and analytics engine. It is robust, highly available and distributed in nature. It supports aggregations, log analysis, geo-location data and machine learning.
Data is stored as documents. A document is similar to a row in an RDBMS.
Elastic Stack
- Kibana -> Analytics and visualization platform
- Logstash -> Data processing pipeline
- X-Pack -> Security, monitoring, ML, graph, SQL for querying documents
- Beats -> Lightweight data shippers
Sharding
- Divide indices into smaller pieces. Each piece is called shard
- Sharding is done at index level
- This is to horizontally scale the data volume
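As a rough illustration of how sharding spreads documents horizontally, the sketch below routes each document ID to one of a fixed number of shards by hashing it. This is a simplification under stated assumptions: Elasticsearch hashes a routing value (the document ID by default) with its own hash function, whereas `zlib.crc32` and the helper name `pick_shard` are stand-ins chosen only for illustration.

```python
import zlib
from collections import Counter


def pick_shard(doc_id: str, num_primary_shards: int) -> int:
    """Map a document ID to a shard number (toy stand-in for real routing)."""
    # crc32 is NOT the hash Elasticsearch uses; it is only a stable stand-in.
    return zlib.crc32(doc_id.encode("utf-8")) % num_primary_shards


# Spread 1,000 hypothetical document IDs over 5 shards and count per shard.
counts = Counter(pick_shard(f"doc-{i}", 5) for i in range(1000))
for shard in sorted(counts):
    print(f"shard {shard}: {counts[shard]} documents")
```

Because the shard number depends on the count of primary shards, changing that count would send existing IDs to different shards, which is one reason the number of primary shards is chosen when the index is created.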
- Search looks up terms across all documents
- Content is parsed and stored beforehand
- Comparable to a Google search
- Search is zooming in -> finding needle in a haystack.
- Analytics is:
- Opposite of search
- Zooming out and looking at a bigger picture
- Logstash
- Helps centralize event data like logs, metrics of any format
- It can perform transformations before sending to stash
- It is a server-side component. Its role is to centralize data from various input sources, transform it and forward it to an output.
- Beats
- Open source lightweight data shippers
- It is a client-side component and its role is complementary to Logstash.
- It is built on a core library, libbeat, which provides an API to
- ship data from the source
- configure input options
- implement logging
- The Elastic team builds various beats such as Packetbeat, Filebeat, Metricbeat, Winlogbeat, Auditbeat and Heartbeat
- Kibana
- Visualization tool of elastic stack
- Helps to gain powerful insights about data. It is called the window into the Elastic Stack
- It offers many visualizations such as histograms, maps, line charts, time series and more
- Offers management tools
- Manage settings and configure X-Pack security settings
- Offers development tools
- build and test REST api
- X-Pack
- It adds Security, Monitoring, Alerting, reporting and graph capabilities
- Security
- Authentication and authorization
- Secure access to Elasticsearch and Kibana
- The extension helps to configure field-level and document-level security
- Monitoring
- Monitor Clusters, nodes and index level metrics
- Plugin to maintain performance history.
- Graph
- Elastic Cloud
How does it work: When a document is added to an Elasticsearch index, an inverted index is created by breaking the document down into a form optimized for search. Once the inverted index is created, the document is ready for search.
- It indexes all fields of the document
- Further optimizations make it lightning fast
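The inverted index described above can be sketched in a few lines. This is a toy model, not Lucene's implementation: the `tokenize` helper is a deliberately crude analyzer, and `search` requires every query term to be present, which is an assumption made to keep the example short.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into word tokens (a very rough analyzer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs):
    """Map each term to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return the IDs of documents that contain every term in the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


docs = {
    1: "Movie one",
    2: "Movie two",
    3: "A documentary about search engines",
}
index = build_inverted_index(docs)
print(sorted(search(index, "movie")))      # documents containing "movie"
print(sorted(search(index, "movie two")))  # documents containing both terms
```

The lookup never scans the original documents: it only intersects the ID sets stored under each term, which is why the index has to be built before a document becomes searchable.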
Use Cases:
- Uber - marketplace dynamics
- Salesforce - log analysis for usage trends
- eBay - search through 800 million+ listings
- New York Times - search through 164 years of publications
- bin\elasticsearch
- localhost:9200
- bin\kibana
- localhost:5601
- String
- text
- useful for supporting full-text search for field containing a description
- fields are analyzed before indexing
- keyword
- enables analytics on String fields
- These fields support sorting, filtering and aggregations
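The difference between the two string types can be illustrated with a small sketch. It assumes a simplified analyzer (lowercase, split on non-alphanumeric characters); real analyzers are configurable, and the sample product description is made up for the example.

```python
import re


def analyze_text(value):
    """Rough stand-in for a text analyzer: lowercase and split into tokens."""
    return re.findall(r"[a-z0-9]+", value.lower())


def analyze_keyword(value):
    """Keyword fields are not analyzed: the whole value is kept as one term."""
    return [value]


description = "Wireless Noise-Cancelling Headphones"  # hypothetical field value

text_terms = analyze_text(description)
keyword_terms = analyze_keyword(description)

print(text_terms)     # individual lowercase tokens, suited to full-text search
print(keyword_terms)  # one exact value, suited to sorting, filtering, aggregations

# A search for "headphones" finds the text field but not the keyword field.
print("headphones" in text_terms)
print("headphones" in keyword_terms)
```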
Dynamic Mapping
- Elasticsearch infers the data types of all fields when the first document is indexed into a type that does not yet exist
- GET /catalog/_mapping/product
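The idea behind dynamic mapping can be approximated as follows. This is only a sketch of the inference step, assuming a flat document and ISO-formatted dates; the function names are hypothetical, and the real feature handles more cases (for example, strings are typically mapped to text with an additional keyword sub-field).

```python
from datetime import date


def infer_field_type(value):
    """Guess a field type from a JSON value, loosely following dynamic mapping."""
    # bool must be checked before int because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        try:
            date.fromisoformat(value)  # accepts strings such as "2015-02-10"
            return "date"
        except ValueError:
            return "text"
    if isinstance(value, dict):
        return "object"
    return "unknown"


def infer_mapping(document):
    """Infer a type for every top-level field of the document."""
    return {field: infer_field_type(value) for field, value in document.items()}


doc = {"name": "Movie one", "actor_count": 10, "date": "2015-02-10"}
print(infer_mapping(doc))
```

Note that a whole number is inferred as long here, whereas an explicit mapping can choose a narrower type such as integer.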
CRUD Operations
- Adding or creating a document into a type within an index of ElasticSearch is called an indexing operation.
- PUT /my_movies/movie/1/_create
{
"name":"Movie one",
"actor_count":10,
"date":"2015-02-10"
}
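The same indexing request can be assembled from a script. The sketch below only builds the HTTP request with Python's standard library and does not send it; the helper name `build_create_request` is hypothetical, the host and port are the localhost:9200 address used in these notes, and the URL follows the typed index/type/id form shown above.

```python
import json
from urllib.request import Request


def build_create_request(base_url, index, doc_type, doc_id, document):
    """Build (but do not send) the HTTP request that indexes one document."""
    url = f"{base_url}/{index}/{doc_type}/{doc_id}/_create"
    body = json.dumps(document).encode("utf-8")
    return Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


request = build_create_request(
    "http://localhost:9200",
    "my_movies",
    "movie",
    1,
    {"name": "Movie one", "actor_count": 10, "date": "2015-02-10"},
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
# To send it to a running node, pass the request to urllib.request.urlopen.
```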
Elasticsearch APIs
- Document
- Query that matches all documents from all indices of the cluster (10 hits are returned by default)
- GET /_search
- All documents in one index
- GET /catalog/_search
- GET /catalog/product/_search
- GET /catalog,my_index/product/_search
- GET /_all/product/_search
- Search
- Aggregations
- Indices
- Cluster
- Cat
- An index is a logical namespace that points to one or more shards in an Elasticsearch cluster.
- It is the place where data is stored in the form of documents
- An index is broken into shards, and shards are containers of data
- In versions before 7.0, an index gets 5 primary shards and one replica of each by default; from 7.0 on, the default is 1 primary shard
Relational DB ------------------ ElasticSearch
Database ------------------ Index
table ------------------ type
row ------------------ document
column ------------------ field
Create & Drop Index
- curl -XPUT/-XGET/-XDELETE http://localhost:9200/my_test_index
- A type is a representation of a class of similar documents
- Type is optional
- e.g. index/type/document
Index creation with type
________________
PUT my_movies
{
"mappings":{
"movie":{
"properties":{
"name":{ "type":"text"},
"actor_count":{"type":"integer"},
"date":{"type":"date"}
}
}
}
}
Adding document
_________________
PUT /my_movies/movie/1/_create
{
"name":"Movie one",
"actor_count":10,
"date":"2015-02-10"
}
DELETE /my_movies/movie/1
Queries: match/term
GET /my_movies/movie/_count
{
"query": {
"match":{
"name": "Movie two"
}
}
}
The Query DSL is a powerful JSON-based language for executing queries in Elasticsearch. It supports two kinds of clauses.
- Leaf query
- Match, term or range which searches for a given value in a given field
- Match
- Term
- Exists - Matches documents in which the field has a non-null value
- Type - Matches documents based on mapping type
- Range - Matches documents whose field value falls within a range of values.
- Compound query
- Combines leaf queries and other compound queries
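To show how leaf and compound clauses nest, here is a minimal sketch that assembles a request body from small helper functions. The helpers (`match`, `term`, `range_query`, `bool_query`) are hypothetical names for this example; the JSON they produce uses the match, term, range and bool queries named in these notes, and the field values are illustrative.

```python
import json


def match(field, text):
    """Leaf query: full-text match on an analyzed field."""
    return {"match": {field: text}}


def term(field, value):
    """Leaf query: exact-value match."""
    return {"term": {field: value}}


def range_query(field, gte=None, lte=None):
    """Leaf query: values between optional lower and upper bounds."""
    bounds = {}
    if gte is not None:
        bounds["gte"] = gte
    if lte is not None:
        bounds["lte"] = lte
    return {"range": {field: bounds}}


def bool_query(must=(), filters=(), must_not=(), should=()):
    """Compound query: combine other queries under bool clauses."""
    clauses = {"must": must, "filter": filters, "must_not": must_not, "should": should}
    # Only emit the clauses that were actually supplied.
    return {"bool": {name: list(queries) for name, queries in clauses.items() if queries}}


body = {
    "query": bool_query(
        must=[match("name", "Movie two")],
        filters=[range_query("actor_count", gte=5, lte=20)],
        must_not=[term("actor_count", 0)],
    )
}
print(json.dumps(body, indent=2))
```

The printed JSON can be used as the body of a `_search` request.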
GET /my_movies/movie/_search?explain
{
"query": {
"match":{
"name": "Movie two"
}
}
}
Filter Context - seeks yes or no answer to whether a document matches
GET /my_movies/movie/_search
{
"query": {
"bool":{
"must":[{"match":{"name": "Movie two"}}],
"filter":[{"term":{"actor_count": 10}}]
}
}
}
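The difference between the two contexts in the request above can be imitated in memory. This is a toy model under stated assumptions: the score is just the number of query terms found in the field, which is far cruder than real relevance scoring, and the three sample movies are made up.

```python
import re


def tokens(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def match_score(doc, field, query_text):
    """Query context: return a crude relevance score (count of matching terms)."""
    doc_terms = set(tokens(doc[field]))
    return sum(1 for t in tokens(query_text) if t in doc_terms)


def term_filter(doc, field, value):
    """Filter context: a plain yes/no answer with no effect on the score."""
    return doc.get(field) == value


def search(docs, match_field, query_text, filter_field, filter_value):
    hits = []
    for doc in docs:
        if not term_filter(doc, filter_field, filter_value):
            continue  # rejected by the filter, never scored
        score = match_score(doc, match_field, query_text)
        if score > 0:
            hits.append((score, doc["name"]))
    # Highest score first; ties broken by name for a stable order.
    return sorted(hits, key=lambda hit: (-hit[0], hit[1]))


movies = [
    {"name": "Movie one", "actor_count": 10},
    {"name": "Movie two", "actor_count": 10},
    {"name": "Movie two", "actor_count": 3},
]
results = search(movies, "name", "Movie two", "actor_count", 10)
print(results)
```

The third movie matches the text best but is dropped by the filter, while the first movie survives the filter with a lower score because only one of the two query terms matches.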
Machine Learning is the field of study that gives computers the ability to learn without being explicitly programmed.
- Supervised Learning - Labeled data (tables)
- For organized/structured data
- Used for predictions
- Unsupervised Learning
- Analyses unstructured data
- For example, web logs to determine anomalies and more.
- Elasticsearch uses unsupervised learning
- Semi-supervised Learning
- Uses both labeled and unlabeled data to create models.
Machine Learning usage
- Anomaly detection
- Predictive analytics - house prices and so on
- Grouping(Clustering)
- Makes model building easier.
- intuitive UI
- Easy to feed data and update model
- works in tandem with elasticsearch
- Data - Get the data
- Train Model
- Feed data
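To make the anomaly-detection use case concrete, the sketch below flags values that lie far from the mean of a series. It is a generic z-score check chosen for illustration, not the algorithm used by Elastic's machine learning features; the threshold and the sample request counts are assumptions.

```python
from statistics import mean, pstdev


def find_anomalies(values, threshold=2.0):
    """Return (index, value) pairs more than `threshold` standard deviations from the mean."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # all values identical, nothing stands out
    return [(i, v) for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


# Hypothetical requests-per-minute counts taken from a web log.
requests_per_minute = [102, 98, 101, 97, 103, 99, 100, 480, 101, 98]
print(find_anomalies(requests_per_minute))
```

No labels are supplied here: the spike is found purely from the shape of the data, which is the sense in which this kind of detection is unsupervised.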
Shards
- Partitions of index.
- Term searches for the exact term specified in the query. It is most effective when querying keyword fields for exact matches (keyword fields are not analyzed; text fields are).
- Range retrieves documents with values that fall within a range for the selected fields.
- Boosting adds more weight to one query relative to another.
Aggregations :
- Metrics aggregations calculate a value over a set of documents, for example the average of a given numeric field
- Cardinality: Single-value metric that counts distinct values.
- Extended stats
- Geo aggregations use longitude and latitude data from a set of documents, for example to calculate a bounding box that encloses all lon/lat locations
- Bucket aggregations group search results into buckets, for example a numerical distribution.
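What these aggregations compute can be sketched over a handful of in-memory documents. The functions below only imitate the results conceptually: the sample documents are invented, the real cardinality aggregation returns an approximate count on large data sets, and the bounding box here ignores edge cases such as the antimeridian.

```python
from collections import Counter
from statistics import mean

docs = [
    {"genre": "drama", "actor_count": 10, "location": {"lat": 40.7, "lon": -74.0}},
    {"genre": "comedy", "actor_count": 4, "location": {"lat": 34.1, "lon": -118.2}},
    {"genre": "drama", "actor_count": 25, "location": {"lat": 51.5, "lon": -0.1}},
    {"genre": "action", "actor_count": 12, "location": {"lat": 48.9, "lon": 2.4}},
]


def avg(docs, field):
    """Metrics aggregation: average of a numeric field."""
    return mean(d[field] for d in docs)


def cardinality(docs, field):
    """Metrics aggregation: number of distinct values (exact in this toy)."""
    return len({d[field] for d in docs})


def geo_bounds(docs, field):
    """Geo aggregation: smallest box enclosing every lat/lon point."""
    lats = [d[field]["lat"] for d in docs]
    lons = [d[field]["lon"] for d in docs]
    return {
        "top_left": {"lat": max(lats), "lon": min(lons)},
        "bottom_right": {"lat": min(lats), "lon": max(lons)},
    }


def histogram(docs, field, interval):
    """Bucket aggregation: count documents per fixed-width numeric bucket."""
    buckets = Counter((d[field] // interval) * interval for d in docs)
    return dict(sorted(buckets.items()))


print(avg(docs, "actor_count"))
print(cardinality(docs, "genre"))
print(geo_bounds(docs, "location"))
print(histogram(docs, "actor_count", 10))
```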
Elastic Stack - A full search and analytics stack. Elasticsearch is at the heart of the Elastic Stack, providing storage, search and analytical capabilities. It is built on a radically different technology, Apache Lucene.
Components of Elastic Stack:
- Logstash - Centralize data from input sources. Transform and forward the data to an output.
- Beats - Its role is complementary to Logstash. It is a client-side component. It provides an API to ship data from the source, configure input options and implement logging.
- Elastic Search -
- Kibana- Visualization tool of elastic stack. Window into elastic stack. It also has management tools to manage settings and x-pack security features. It also offers development tools to build and test REST API requests.
- X-Pack - It adds security, monitoring, alerting, reporting and graph capabilities to Elastic Stack.
- Security - Authentication and authorization capabilities to elastic search and kibana.
- Monitoring -
- Reporting -
- Alerting -
- Graph -
- Elastic Cloud -
ElasticSearch(Distributed, RESTful search and analytics) is an analytical engine designed to be scalable, horizontal in nature.
- Key Characteristics
- Search, Index and analyze data
- Language agnostic
- Built-in machine learning
- Scalable, Highly available, Distributed
- Goals
- Lightning fast search
- Analytics Engine
- Near Real-time
- When document is added to elastic search index, an inverted index is created. Once inverted index is created, it is available for search.
- Powerful Rest API
- Features
- Aggregations
- Log Analysis
- Geo-location data analysis
- Machine learning
- Installing ElasticSearch
- Download elasticsearch from Elastic.co
- Extract downloaded file into a directory
- Change to the Elasticsearch directory
- Run command to start cluster
- Installing Kibana
- Download Kibana from Elastic.co
- Extract downloaded file into a directory
- Edit Kibana configuration file
- Change to the Kibana directory and start it
- An index is a logical namespace that points to one or more shards (partitions, or containers of data) in an Elasticsearch cluster
- It is the place where data is stored in the form of documents
- An index is broken into shards, and shards are containers for data. The default number of primary shards per index was 5 before version 7.0 and is 1 from 7.0 on.
Docs in Elastic Search(it is like ROW in a rdbms table)
- A document is an individual entry and the primary unit for adding data.
- A type is a representation of a class of similar documents (like a table in an RDBMS).
- Type is optional in the Elastic world
- A cluster is one or more Elasticsearch instances running on a given network
- A node is an Elasticsearch instance. Nodes can handle the HTTP and transport protocols.
- Bulk API
ElasticSearch APIs
- Document
- ?pretty=true (formats the JSON response for readability)
- Search
- Aggregation
- Indices
- Cluster
- Cat