Sunday, 6 June 2021

Apache Kafka

Application-to-application communication system

  • A message broker
  • Publish/Subscribe broker
  • Keeps a log of historic events
  • Distributed event ledger/log

Topic

  • Logical group of events

Kafka is a Warehouse

Topic - a Storage Room in the Warehouse

  • Storage Room. Each Topic can have one or more Partitions/Storage Counters
  • Partitions are used for concurrent processing
  • Offset is the current index value for each partition

Partition -> Storage Counter

Offset -> Current read position within a partition
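The warehouse analogy can be sketched in code. This is a minimal in-memory illustration, not a Kafka API — the `Topic` class and its methods are made up for this note: a topic holds several partitions, and each partition assigns offsets independently as events are appended.

```python
class Topic:
    """Storage room: a named collection of partitions (storage counters)."""

    def __init__(self, name, num_partitions):
        self.name = name
        self.partitions = [[] for _ in range(num_partitions)]

    def append(self, partition, event):
        """Append an event; its offset is simply its index in that partition."""
        self.partitions[partition].append(event)
        return len(self.partitions[partition]) - 1  # offset of the new event

    def read(self, partition, offset):
        """Read the event stored at a given offset in a partition."""
        return self.partitions[partition][offset]

orders = Topic("orders", num_partitions=3)
first = orders.append(0, "order-created")   # offset 0 in partition 0
second = orders.append(0, "order-paid")     # offset 1 in partition 0
```

Note how offsets grow per partition, not per topic — appending to partition 1 would start again at offset 0.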

 

Example systems that exchange data:

  • HR System
  • Marketing System
  • Active Directory

How do legacy systems work

  • Fetch data from a DB or web service at scheduled times using a scheduler
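The legacy pattern is pull-based: a scheduler wakes up at fixed intervals and asks "anything new?". A minimal sketch (`fetch_changes` is a hypothetical data-access function standing in for the DB or web-service call):

```python
import time

def run_scheduler(fetch_changes, interval_seconds, max_runs):
    """Poll a data source at fixed intervals and collect whatever changed."""
    seen = []
    for _ in range(max_runs):
        seen.extend(fetch_changes())   # pull: ask the source for new data
        time.sleep(interval_seconds)   # then sleep until the next run
    return seen
```

The weakness is visible in the sleep: changes are only noticed at the next scheduled run, whereas Kafka pushes events to consumers as they happen.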

Kafka

  • Central messaging system
  • Activity/Application Log
  • Storing IoT data
  • System decoupling
  • Async processing
  • Distributed Streaming Platform
  • Messaging system
    • Publish and Subscribe 
  • Advantages of Messaging System

Topic

  • A Kafka Topic is similar to a Queue
  • Kafka guarantees the order of records within a partition
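Ordering is per partition because the producer routes each record by its key: records with the same key always hash to the same partition, so they stay in order relative to each other. A sketch of that routing (Kafka's default partitioner actually uses murmur2 on the key bytes; Python's built-in `hash` stands in for it here):

```python
def partition_for(key, num_partitions):
    # Same key -> same partition, so per-key ordering is preserved.
    return hash(key) % num_partitions

def assign(records, num_partitions):
    """Group (key, value) records into partition lists, keeping arrival order."""
    partitions = {p: [] for p in range(num_partitions)}
    for key, value in records:
        partitions[partition_for(key, num_partitions)].append(value)
    return partitions
```

Across *different* partitions there is no ordering guarantee — only within one.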

 

Avro - data serialization format commonly used with Kafka

Kafka Streams

  • Easy data processing and transformation library within Kafka
    • Data transformation
    • Data enrichment
    • Fraud detection
    • Monitoring and Alerting
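Kafka Streams (a Java library) expresses processing as a pipeline of small steps over an unbounded stream. This pure-Python sketch mimics the same shape with generators — the function names and event fields are illustrative, not a Kafka Streams API:

```python
def transform(events):
    # Data transformation: normalize each event's amount.
    for e in events:
        yield {**e, "amount": round(e["amount"], 2)}

def enrich(events, user_countries):
    # Data enrichment: join each event with reference data about the user.
    for e in events:
        yield {**e, "country": user_countries.get(e["user"], "unknown")}

def flag_fraud(events, limit):
    # Monitoring/alerting: mark unusually large amounts as suspicious.
    for e in events:
        yield {**e, "suspicious": e["amount"] > limit}
```

Usage: `flag_fraud(enrich(transform(events), countries), limit=1000)` — each stage consumes the previous one lazily, one event at a time, just as a streams topology does.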
MovieFlix

  • Resume video where they left off
  • Build user profile in real time
  • Recommend next show in real time
  • Store all data in analytics store

  • Show Position
    • Video Player (producer) -> Video Position Service -> Kafka
    • Once in a while the Video Player sends its position to the Video Position Service
    • The Video Position Service sends it to Kafka
    • Video Player (consumer) <- Resuming Service <- Kafka
  • Recommendations
    • We have data about which user watches which show and how far
    • A Recommendation Engine powered by Kafka Streams takes the show positions, runs a recommendation algorithm, and comes up with recommendations
    • The recommendations are consumed by the Recommendation Service
    • These recommendations can also be consumed by an Analytics consumer into the analytics store (Hadoop)
Get Taxi - IoT Example
  • The user should be matched with a nearby driver
  • The pricing should "surge" if the number of drivers is low or the number of users is high
  • All the position data before and during the ride should be stored in an analytics store so that the cost can be computed accurately
  • User Application -> User Position Service -> Kafka (user_position) Topic
  • Taxi Driver Application -> Taxi Position Service -> Kafka (taxi_position) Topic
  • A Surge Pricing computation module (Kafka Streams) consumes user_position & taxi_position, produces the surge pricing, and places it in a Surge Pricing Topic
  • User Application <- Taxi Cost Service <- Surge Pricing Topic  -> Analytics consumer -> Analytics Store
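The heart of the surge computation is a per-zone ratio of waiting users to available drivers. A sketch — the thresholds and formula below are illustrative assumptions for this note, not a real pricing model:

```python
def surge_multiplier(waiting_users, available_drivers):
    """Derive a price multiplier from current per-zone supply and demand."""
    if available_drivers == 0:
        return 3.0                        # assumed cap when no drivers at all
    ratio = waiting_users / available_drivers
    return min(3.0, max(1.0, ratio))      # clamp surge between 1x and 3x
```

In the real pipeline the streams job would maintain these counts by aggregating the user_position and taxi_position topics in windows, then publish the multiplier to the Surge Pricing Topic for the Taxi Cost Service to read.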
CQRS - MySocialMedia
  • Command Query Responsibility Segregation
  • Social media allows people to post images. Others can react using "likes" and "comments". The business wants the following capabilities:
    • Users should be able to post, like and comment
    • Users should see the total number of likes and comments per post in real time
    • High volume of data is expected on the first day of launch
    • Users should be able to see "trending" posts
  • User Posts -> Posting Service -> Post Topic
  • User Likes -> Like/Comment Service -> Likes Topic
  • User Comments -> Like/Comment Service -> Comments Topic
  • Total Likes/Comments Computation(Kafka Streams) consumes Posts, likes, Comments and performs some aggregations
  • Website <- Refresh Feed Service <- posts_with_counts (Topic) <- Total Likes/Comments Computation (Kafka Streams)
  • Website <- Trending Feed Service <- trending_posts (Topic) <- Trending Posts in Past Hour (Kafka Streams)
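The Total Likes/Comments Computation is a running aggregation keyed by post. A minimal sketch of that reduction (the event shape is assumed for illustration; in Kafka Streams this would be a `groupByKey().count()`-style aggregation):

```python
def aggregate_counts(events):
    """Consume post/like/comment events and keep running counts per post."""
    counts = {}
    for event in events:
        post = counts.setdefault(event["post_id"], {"likes": 0, "comments": 0})
        if event["type"] == "like":
            post["likes"] += 1
        elif event["type"] == "comment":
            post["comments"] += 1
    return counts
```

The output table is what gets written to posts_with_counts: the Refresh Feed Service never re-counts anything, it just reads the precomputed view — the "query" side of CQRS, separated from the "command" side that wrote the raw events.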
Finance application - MyBank
  • MyBank is a company that allows real-time banking for its users. It wants to deploy a brand-new capability to alert users in case of large transactions
  • Transaction data already exists in a database
  • Database of Transactions -> Kafka connect source CDC connector ->  bank_transaction(Topic)
  • Users set their threshold in apps -> App Threshold Service -> user_settings (Topic)
  • Real-time Big Transaction Detection consumes bank_transaction & user_settings and evaluates whether an alert should be sent
  • Users see notification in their apps <- Notification service <- user_alerts(Topic)<- Real time Big Transaction Detection 
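The detection job is a stream-table join: each incoming transaction is checked against the latest threshold that user published to user_settings. A sketch under assumed field names (the default threshold is an illustrative fallback, not from the source):

```python
DEFAULT_THRESHOLD = 10_000  # assumed fallback when a user never set a threshold

def detect_alerts(transactions, thresholds):
    """Join the transaction stream against per-user thresholds, emit alerts."""
    alerts = []
    for txn in transactions:
        limit = thresholds.get(txn["user"], DEFAULT_THRESHOLD)
        if txn["amount"] > limit:
            alerts.append({"user": txn["user"], "amount": txn["amount"]})
    return alerts
```

In the real pipeline `thresholds` is a table materialized from the user_settings topic, and each alert is produced to user_alerts for the Notification Service to deliver.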
  • It is common to have "generic" connectors or solutions to offload data from Kafka to HDFS, Amazon S3, and ElasticSearch for example
  • It is also very common to have Kafka serve as a "speed layer" for real-time applications, while a "slow layer" helps with data ingestion into stores for later analytics
  • Kafka as a front to Big Data Ingestion is a common pattern in Big Data to provide an "ingestion buffer" in front of some stores
  • Data Producers (Apps, Websites, Financial Systems, Email, Customer Data, Databases) -> Kafka -> Spark, Storm, Flink, etc. -> Real-time Analytics, Dashboards, Apps, Alerts

    Data Producers (Apps, Websites, Financial Systems, Email, Customer Data, Databases) -> Kafka -> Hadoop, Amazon S3, RDBMS -> Data Science, Reporting, Audit, Backup/Long-term Storage
  • One of the first use cases of Kafka was to ingest logs and metrics from various applications
  • This kind of deployment usually wants high throughput and has fewer restrictions regarding data loss, data replication, etc.
  • Application logs can end up in logging solutions such as Splunk, CloudWatch, ELK..
  • Applications -> application_logs (Topic) -> Kafka Connect Sink -> Splunk
  • Applications -> application_metrics (Topic) -> Kafka Connect Sink -> Splunk


