Application-to-application communication system
- A message broker
- Publish/Subscribe broker
- Keeps a log of historic events
- Distributed event ledger/log
Topic
- Logical group of events
Kafka is a Warehouse
- A Topic is a Storage Room in the Warehouse
- Each Topic can have one or more Partitions/Storage Counters
- Partitions are used for concurrent processing
- The Offset is the current index value within each partition
- Partition -> Storage Counter
- Offset -> Position of an item at a Storage Counter
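A minimal sketch of the analogy in code, assuming a broker at localhost:9092 (the topic name "orders" and the counts are illustrative): creating one "storage room" with three "storage counters" via the Java AdminClient.

```java
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

import java.util.List;
import java.util.Properties;

public class CreateTopicExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            // A "Storage Room" with 3 "Storage Counters":
            // topic "orders" with 3 partitions, replication factor 1
            NewTopic orders = new NewTopic("orders", 3, (short) 1);
            admin.createTopics(List.of(orders)).all().get();
        }
    }
}
```

Each partition keeps its own offset sequence, so a consumer's position in partition 0 is independent of its position in partition 1.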
Example systems: HR System, Marketing System, Active Directory
How do legacy systems work?
- Fetch data from a DB or a web service at scheduled times using a scheduler (sketched below)
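For contrast, a rough sketch of that legacy polling pattern in plain Java, with a ScheduledExecutorService standing in for the scheduler; fetchFromSource() is a hypothetical placeholder.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class LegacyPollingJob {
    public static void main(String[] args) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        // Poll the source system every 15 minutes, the way a cron-style scheduler would
        scheduler.scheduleAtFixedRate(LegacyPollingJob::fetchFromSource, 0, 15, TimeUnit.MINUTES);
    }

    private static void fetchFromSource() {
        // Hypothetical placeholder: run a SQL query or call a web service,
        // then copy the results into the target system
        System.out.println("Fetching latest records from the source system...");
    }
}
```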
Kafka
- Central messaging system
- Activity/Application Log
- Storing IoT data
- System decoupling
- Async processing
- Distributed Streaming Platform
- Messaging system
- Publish and Subscribe
- Advantages of a Messaging System: system decoupling and async processing
Topic
- A Kafka Topic is similar to a Queue
- Kafka guarantees the order of records within a partition (see the keyed-producer sketch below)
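A sketch of how that ordering guarantee is typically used, assuming a local broker and an illustrative topic "user_events": records sharing a key always land on the same partition, so they stay in order relative to each other.

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

import java.util.Properties;

public class OrderedProducerExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Same key -> same partition -> these three events stay in order
            producer.send(new ProducerRecord<>("user_events", "user-42", "logged_in"));
            producer.send(new ProducerRecord<>("user_events", "user-42", "viewed_page"));
            producer.send(new ProducerRecord<>("user_events", "user-42", "logged_out"));
        }
    }
}
```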
Avro - a data serialization format commonly used for Kafka record schemas
Kafka Streams
- A library for easy data processing and transformation within Kafka (minimal topology sketched after this list)
- Data transformation
- Data enrichment
- Fraud detection
- Monitoring and Alerting
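A minimal Kafka Streams topology along these lines; the topic names raw_events/clean_events and the trivial transformation are assumptions for the sketch.

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

import java.util.Properties;

public class TransformStreamExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "transform-example");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> raw = builder.stream("raw_events");
        raw.filter((key, value) -> value != null && !value.isBlank()) // drop empty records
           .mapValues(value -> value.toUpperCase())                   // a trivial "transformation"
           .to("clean_events");                                       // write to the output topic

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```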
Use case: Video streaming platform
- Resume the video where the user left off
- Build the user profile in real time
- Recommend the next show in real time
- Store all the data in an analytics store
- Show Position flow
- Video Player -> Video Position Service (producer) -> Kafka
- Every so often, the Video Player sends its current position to the Video Position Service
- The Video Position Service produces it to Kafka
- Video Player <- Resuming Service (consumer) <- Kafka (consumer sketch below)
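A sketch of the Resuming Service side, assuming a show_position topic with a user/show id as key and the last watched position as value.

```java
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.time.Duration;
import java.util.List;
import java.util.Properties;

public class ResumingServiceConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "resuming-service");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("show_position"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    // Keep only the latest position per key to answer "resume" queries
                    System.out.printf("user/show %s is at position %s%n", record.key(), record.value());
                }
            }
        }
    }
}
```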
- Recommendations
- We have data about which user watches which show, and how far they watched
- A Recommendation Engine powered by Kafka Streams takes the show positions, runs a recommendation algorithm, and produces Recommendations to a topic
- The Recommendations are consumed by the Recommendation Service
- The same Recommendations can also be consumed by an Analytics consumer into an analytical store (Hadoop); see the consumer-group note below
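The fan-out works because the Recommendation Service and the Analytics consumer use different consumer groups; each distinct group.id independently receives every record. A sketch assuming a recommendations topic:

```java
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.util.List;
import java.util.Properties;

public class RecommendationFanOut {
    // Each distinct group.id independently receives every record on the topic,
    // so one recommendations topic can feed both downstream systems
    static KafkaConsumer<String, String> consumerFor(String groupId) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, groupId);
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(List.of("recommendations"));
        return consumer;
    }

    public static void main(String[] args) {
        KafkaConsumer<String, String> recommendationService = consumerFor("recommendation-service");
        KafkaConsumer<String, String> analyticsConsumer = consumerFor("analytics-consumer");
        // Poll each consumer on its own thread (KafkaConsumer is not thread-safe);
        // both groups see all records
    }
}
```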
Use case: Taxi app
- The user should be matched with a nearby driver
- The pricing should "surge" if the number of drivers is low or the number of users is high
- All the position data before and during the ride should be stored in an analytics store so that the cost can be computed accurately
- User Application -> User Position Service -> Kafka (user_position) Topic
- Taxi Driver Application -> Taxi Position Service -> Kafka (taxi_position) Topic
- A Surge Pricing computation module (Kafka Streams) consumes user_position & taxi_position, computes Surge Pricing, and places it on a surge_pricing topic (windowed-count sketch below)
- User Application <- Taxi Cost Service <- surge_pricing topic -> Analytics consumer -> Analytics Store
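One plausible piece of the surge computation, sketched with the Kafka Streams DSL: count taxi position updates per area over 5-minute windows and derive a toy surge multiplier. The topic names, the area-id key, and the threshold are all assumptions.

```java
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.TimeWindows;

import java.time.Duration;

public class SurgeSignalTopology {
    // Wire this builder into a KafkaStreams instance as in the earlier Streams sketch
    static void build(StreamsBuilder builder) {
        KStream<String, String> taxiPositions = builder.stream("taxi_position"); // key = area id
        taxiPositions
            .groupByKey()
            .windowedBy(TimeWindows.ofSizeWithNoGrace(Duration.ofMinutes(5)))
            .count()                                        // drivers seen per area per window
            .toStream()
            .map((windowedKey, count) -> KeyValue.pair(
                    windowedKey.key(),
                    count < 5 ? "surge=2.0" : "surge=1.0")) // toy rule: few drivers -> surge
            .to("surge_pricing");
    }
}
```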
Use case: Social media (CQRS - Command Query Responsibility Segregation)
- A social media platform allows people to post images; others can react with "likes" and "comments". The business wants the following capabilities:
- Users should be able to post, like and comment
- Users should see the total number of likes and comments per post in real time
- High volume of data is expected on the first day of launch
- Users should be able to see "trending" posts
- User Posts -> Posting Service -> Post Topic
- User Likes -> Like/Comment Service -> likes topic
- User Comments -> Like/Comment Service -> comments topic
- A Total Likes/Comments Computation (Kafka Streams) consumes posts, likes, and comments and performs aggregations (count sketch after this list)
- Website <- Refresh Feed Service <- posts_with_counts topic <- Total Likes/Comments Computation (Kafka Streams)
- Website <- Trending Feed Service <- trending_posts topic <- Trending Posts in Past Hour (Kafka Streams)
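A sketch of the counting step, assuming like events are keyed by post id (the comments topic would be aggregated the same way).

```java
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;

public class LikeCountTopology {
    // Wire this builder into a KafkaStreams instance as in the earlier Streams sketch
    static void build(StreamsBuilder builder) {
        KStream<String, String> likes = builder.stream("likes");          // key = post id
        KTable<String, Long> likesPerPost = likes.groupByKey().count();   // running total per post
        // Every updated count is emitted so the Refresh Feed Service can pick it up
        likesPerPost.toStream().mapValues(count -> count.toString()).to("posts_with_counts");
    }
}
```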
Use case: Banking (MyBank)
- MyBank is a company that offers real-time banking to its users. It wants to deploy a brand-new capability to alert users in case of large transactions
- Transaction data already exists in a database
- Database of Transactions -> Kafka Connect Source CDC connector -> bank_transaction topic
- Users set their threshold in the app -> App Threshold Service -> user_settings topic
- A real-time Big Transaction Detection job consumes bank_transaction & user_settings and decides whether an alert should be sent (join sketch below)
- Users see the notification in their app <- Notification Service <- user_alerts topic <- Real-time Big Transaction Detection
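The detection step maps naturally onto a KStream-KTable join; a sketch assuming both topics are keyed by user id and carry string-encoded amounts.

```java
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;

public class BigTransactionTopology {
    // Wire this builder into a KafkaStreams instance as in the earlier Streams sketch
    static void build(StreamsBuilder builder) {
        KStream<String, String> transactions = builder.stream("bank_transaction");
        KTable<String, String> settings = builder.table("user_settings"); // latest threshold per user

        transactions
            // For each transaction, look up the user's current threshold
            .join(settings, (amount, threshold) ->
                    Double.parseDouble(amount) >= Double.parseDouble(threshold)
                            ? "ALERT: large transaction of " + amount
                            : null)
            .filter((userId, alert) -> alert != null) // only forward actual alerts
            .to("user_alerts");
    }
}
```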
- It is common to have "generic" connectors or solutions to offload data from Kafka to HDFS, Amazon S3, and Elasticsearch, for example
- It is also very common to have Kafka serve as a "speed layer" for real-time applications, while a "slow layer" handles data ingestion into stores for later analytics
- Kafka as a front to Big Data ingestion is a common pattern, providing an "ingestion buffer" in front of the stores
- Data Producers (apps, websites, financial systems, email, customer data, databases) -> Kafka -> Spark, Storm, Flink, etc. -> Real-time analytics, dashboards, apps, alerts
- Data Producers (apps, websites, financial systems, email, customer data, databases) -> Kafka -> Hadoop, Amazon S3, RDBMS -> Data science, reporting, audit, backup/long-term storage
- One of the first use cases of Kafka was to ingest logs and metrics from various applications
- This kind of deployment usually wants high throughput and has fewer restrictions regarding data loss, replication, etc.
- Application logs can end up in logging solutions such as Splunk, CloudWatch, ELK...
- Applications -> application_logs topic -> Kafka Connect Sink -> Splunk
- Applications -> application_metrics topic -> Kafka Connect Sink -> Splunk