Sunday, 6 June 2021

Apache Kafka

Application-to-application communication system

  • A message broker
  • Publish/Subscribe broker
  • Keeps a log of historic events
  • Distributed event ledger/log

Topic

  • Logical group of events

Kafka is a Warehouse

Topic - a Storage Room in the Warehouse

  • Storage Room. Each Topic can have one or more Partitions/Storage Counters
  • Partitions are used for concurrent processing
  • Offset is the current index value for each partition

Partition -> Storage Counter

Offset -> Current read position within a partition
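The warehouse analogy can be sketched in code. This is a minimal in-memory illustration, not a Kafka API — the `Topic` class and its methods are made up for this note: a topic holds several partitions, and each partition assigns offsets independently as events are appended.

```python
class Topic:
    """Storage room: a named collection of partitions (storage counters)."""

    def __init__(self, name, num_partitions):
        self.name = name
        self.partitions = [[] for _ in range(num_partitions)]

    def append(self, partition, event):
        """Append an event; its offset is simply its index in that partition."""
        self.partitions[partition].append(event)
        return len(self.partitions[partition]) - 1  # offset of the new event

    def read(self, partition, offset):
        """Read the event stored at a given offset in a partition."""
        return self.partitions[partition][offset]

orders = Topic("orders", num_partitions=3)
first = orders.append(0, "order-created")   # offset 0 in partition 0
second = orders.append(0, "order-paid")     # offset 1 in partition 0
```

Note how offsets grow per partition, not per topic — appending to partition 1 would start again at offset 0.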

 

Example systems that exchange data:

  • HR System
  • Marketing System
  • Active Directory

How do legacy systems work

  • Fetch data from a DB or web service at scheduled times using a scheduler
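The legacy pattern is pull-based: a scheduler wakes up at fixed intervals and asks "anything new?". A minimal sketch (`fetch_changes` is a hypothetical data-access function standing in for the DB or web-service call):

```python
import time

def run_scheduler(fetch_changes, interval_seconds, max_runs):
    """Poll a data source at fixed intervals and collect whatever changed."""
    seen = []
    for _ in range(max_runs):
        seen.extend(fetch_changes())   # pull: ask the source for new data
        time.sleep(interval_seconds)   # then sleep until the next run
    return seen
```

The weakness is visible in the sleep: changes are only noticed at the next scheduled run, whereas Kafka pushes events to consumers as they happen.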

Kafka

  • Central messaging system
  • Activity/Application Log
  • Storing IoT data
  • System decoupling
  • Async processing
  • Distributed Streaming Platform
  • Messaging system
    • Publish and Subscribe 
  • Advantages of Messaging System

Topic

  • A Kafka Topic is similar to a Queue
  • Kafka guarantees the order of records within a partition
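Ordering is per partition because the producer routes each record by its key: records with the same key always hash to the same partition, so they stay in order relative to each other. A sketch of that routing (Kafka's default partitioner actually uses murmur2 on the key bytes; Python's built-in `hash` stands in for it here):

```python
def partition_for(key, num_partitions):
    # Same key -> same partition, so per-key ordering is preserved.
    return hash(key) % num_partitions

def assign(records, num_partitions):
    """Group (key, value) records into partition lists, keeping arrival order."""
    partitions = {p: [] for p in range(num_partitions)}
    for key, value in records:
        partitions[partition_for(key, num_partitions)].append(value)
    return partitions
```

Across *different* partitions there is no ordering guarantee — only within one.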

 

Avro - data serialization format commonly used with Kafka

Kafka Streams

  • Easy data processing and transformation library within Kafka
    • Data transformation
    • Data enrichment
    • Fraud detection
    • Monitoring and Alerting
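Kafka Streams (a Java library) expresses processing as a pipeline of small steps over an unbounded stream. This pure-Python sketch mimics the same shape with generators — the function names and event fields are illustrative, not a Kafka Streams API:

```python
def transform(events):
    # Data transformation: normalize each event's amount.
    for e in events:
        yield {**e, "amount": round(e["amount"], 2)}

def enrich(events, user_countries):
    # Data enrichment: join each event with reference data about the user.
    for e in events:
        yield {**e, "country": user_countries.get(e["user"], "unknown")}

def flag_fraud(events, limit):
    # Monitoring/alerting: mark unusually large amounts as suspicious.
    for e in events:
        yield {**e, "suspicious": e["amount"] > limit}
```

Usage: `flag_fraud(enrich(transform(events), countries), limit=1000)` — each stage consumes the previous one lazily, one event at a time, just as a streams topology does.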
MovieFlix

  • Resume video where they left off
  • Build user profile in real time
  • Recommend next show in real time
  • Store all data in analytics store

  • Show Position
    • Video Player (producer) -> Video Position Service -> Kafka
    • Once in a while the Video Player sends its position to the Video Position Service
    • The Video Position Service sends it to Kafka
    • Video Player (consumer) <- Resuming Service <- Kafka
  • Recommendations
    • We have data about which user watches which show and how far
    • A Recommendation Engine powered by Kafka Streams takes the show positions, runs a recommendation algorithm, and comes up with recommendations
    • The recommendations are consumed by the Recommendation Service
    • These recommendations can also be consumed by an Analytics consumer into the analytics store (Hadoop)
Get Taxi - IoT Example
  • The user should be matched with a nearby driver
  • The pricing should "surge" if the number of drivers is low or the number of users is high
  • All the position data before and during the ride should be stored in an analytics store so that the cost can be computed accurately
  • User Application -> User Position Service -> Kafka (user_position) Topic
  • Taxi Driver Application -> Taxi Position Service -> Kafka (taxi_position) Topic
  • A Surge Pricing computation module (Kafka Streams) consumes user_position & taxi_position, produces the surge pricing, and places it in a Surge Pricing Topic
  • User Application <- Taxi Cost Service <- Surge Pricing Topic  -> Analytics consumer -> Analytics Store
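The heart of the surge computation is a per-zone ratio of waiting users to available drivers. A sketch — the thresholds and formula below are illustrative assumptions for this note, not a real pricing model:

```python
def surge_multiplier(waiting_users, available_drivers):
    """Derive a price multiplier from current per-zone supply and demand."""
    if available_drivers == 0:
        return 3.0                        # assumed cap when no drivers at all
    ratio = waiting_users / available_drivers
    return min(3.0, max(1.0, ratio))      # clamp surge between 1x and 3x
```

In the real pipeline the streams job would maintain these counts by aggregating the user_position and taxi_position topics in windows, then publish the multiplier to the Surge Pricing Topic for the Taxi Cost Service to read.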
CQRS - MySocialMedia
  • Command Query Responsibility Segregation
  • Social media allows people to post images. Others can react using "likes" and "comments". The business wants the following capabilities:
    • Users should be able to post, like and comment
    • Users should see the total number of likes and comments per post in real time
    • High volume of data is expected on the first day of launch
    • Users should be able to see "trending" posts
  • User Posts -> Posting Service -> Post Topic
  • User Likes -> Like/Comment Service -> Likes Topic
  • User Comments -> Like/Comment Service -> Comments Topic
  • Total Likes/Comments Computation(Kafka Streams) consumes Posts, likes, Comments and performs some aggregations
  • Website <- Refresh Feed Service <- posts_with_counts (Topic) <- Total Likes/Comments Computation (Kafka Streams)
  • Website <- Trending Feed Service <- trending_posts (Topic) <- Trending Posts in Past Hour (Kafka Streams)
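The Total Likes/Comments Computation is a running aggregation keyed by post. A minimal sketch of that reduction (the event shape is assumed for illustration; in Kafka Streams this would be a `groupByKey().count()`-style aggregation):

```python
def aggregate_counts(events):
    """Consume post/like/comment events and keep running counts per post."""
    counts = {}
    for event in events:
        post = counts.setdefault(event["post_id"], {"likes": 0, "comments": 0})
        if event["type"] == "like":
            post["likes"] += 1
        elif event["type"] == "comment":
            post["comments"] += 1
    return counts
```

The output table is what gets written to posts_with_counts: the Refresh Feed Service never re-counts anything, it just reads the precomputed view — the "query" side of CQRS, separated from the "command" side that wrote the raw events.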
Finance application - MyBank
  • MyBank is a company that allows real-time banking for its users. It wants to deploy a brand-new capability to alert users in case of large transactions
  • Transaction data already exists in a database
  • Database of Transactions -> Kafka connect source CDC connector ->  bank_transaction(Topic)
  • Users set their threshold in apps -> App Threshold Service -> user_settings (Topic)
  • Real-time Big Transaction Detection consumes bank_transaction & user_settings and evaluates whether an alert should be sent
  • Users see notification in their apps <- Notification service <- user_alerts(Topic)<- Real time Big Transaction Detection 
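The detection job is a stream-table join: each incoming transaction is checked against the latest threshold that user published to user_settings. A sketch under assumed field names (the default threshold is an illustrative fallback, not from the source):

```python
DEFAULT_THRESHOLD = 10_000  # assumed fallback when a user never set a threshold

def detect_alerts(transactions, thresholds):
    """Join the transaction stream against per-user thresholds, emit alerts."""
    alerts = []
    for txn in transactions:
        limit = thresholds.get(txn["user"], DEFAULT_THRESHOLD)
        if txn["amount"] > limit:
            alerts.append({"user": txn["user"], "amount": txn["amount"]})
    return alerts
```

In the real pipeline `thresholds` is a table materialized from the user_settings topic, and each alert is produced to user_alerts for the Notification Service to deliver.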
  • It is common to have "generic" connectors or solutions to offload data from Kafka to HDFS, Amazon S3, and ElasticSearch for example
  • It is also very common to have Kafka serve as a "speed layer" for real-time applications, while a "slow layer" helps with data ingestion into stores for later analytics
  • Kafka as a front to Big Data Ingestion is a common pattern in Big Data to provide an "ingestion buffer" in front of some stores
  • Data Producers (Apps, Websites, Financial Systems, Email, Customer Data, Databases) -> Kafka -> Spark, Storm, Flink, etc. -> Real-time Analytics, Dashboards, Apps, Alerts

    Data Producers (Apps, Websites, Financial Systems, Email, Customer Data, Databases) -> Kafka -> Hadoop, Amazon S3, RDBMS -> Data Science, Reporting, Audit, Backup/Long-term Storage
  • One of the first use cases of Kafka was to ingest logs and metrics from various applications
  • This kind of deployment usually wants high throughput and has fewer restrictions regarding data loss, data replication, etc.
  • Application logs can end up in logging solutions such as Splunk, CloudWatch, ELK..
  • Applications -> application_logs (Topic) -> Kafka Connect Sink -> Splunk
  • Applications -> application_metrics (Topic) -> Kafka Connect Sink -> Splunk


