Tuesday, 29 June 2021

System Designs

 Airline Reservation System

  • Requirements
    • Search available flights by source, destination and date
    • Show prices and timings; choose a flight and select seats
    • Payment and Notification system 
  • DataModel
    • FlightManagement
      • Airlines
        • Flights
          • SeatsTemplate
      • Airports
        • Flight(FlightNumber, From, To, duration)
          • Flight_Seat_Template(Seat, class)
          • FlightSchedules(date, time, gate, status)
            • FlightSeatAllocations
    • FlightReservation
      • Customer(name, email, mobile)
        • Reservation(reservationNumber, #seats, flight, status, Payment mode, Pay details)
          • FlightSeatAllocations(customername, age, price)
    • Payment & Notifications

 Movie Reservation System

  • Requirements
    • Movie Info
    • Search by movie name, Theater name, city name
    • Select Movie & Reserve tickets
    • Payment & Notifications
    • Comments, Rating
    • Offers?
  • REST API
    • GET ListMovies(City, TheaterName, datetime)
    • POST SeatBooking(MovieId, showId, SeatsToBeReserved)

  • DataModel
    • Movie
      • MovieList
    • Theater Management
      • Cities
        • Theaters
          • Screens
            • ScreenSeatTemplate
            • Shows(for one screen & Movie combination, date, timing)
              • ShowSeatAllocations
    • Ticket Reservation
      • Users
        • Reservation
          • SeatAllocations
  • Payments & Notifications
  • NoSQL tables
    • Comments, ratings, movie info, trailers, artists, cast, reviews, analytics
  • HighLevelDesign
    • User -> LoadBalancer -> appln server ->

Application Design

  • CDN for images and videos
  • Logstash and Elasticsearch for search; Kibana for dashboards
  • NoSQL for storing movie info, comments and likes (ever-increasing data)
    • Use a replication factor greater than 1 for backups
  • RDBMS for the reservation system (fixed data, ACID properties)
    • Slaves (read replicas) serve reads; the master serves reads and writes
  • Cache movie and seat information from both NoSQL and RDBMS (Memcached, Redis)
  • Message queue for notifications (RabbitMQ)
  • Recommendation engine
    • Hadoop
  • Trends
    • Kafka to feed data into Spark
      • Spark or Storm for latest trends, fraud detection and mitigation strategies

Handling Concurrency


E-commerce

  • VendorManagement
  • Home page & Search
  • Recommendation systems
  • Wishlist & checkout
  • Order processing
  • Payment and Notification

Online Food Ordering System

  • Requirements
    • Search by restaurants and food
    • Add to cart and order food
    • Payment & notifications
  • Flow
    • Setup
      • Customer to sign up and create an account
      • Restaurant to register
      • App to track customer location and show nearby restaurants
      • Customers can search based on Restaurant names / items
    • Order processing
      • Customer to select items from one specific restaurant only
      • Payment through a payment gateway
      • Restaurant gets notification
    • Order fulfilment
      • Restaurant sends a notification that the food is prepared
      • A delivery boy near the same location accepts the request
      • User tracks the delivery boy's location
      • Notification when the order is delivered to the customer
  • Entities
    • Customer account
      • Address(lat, long, pincode)
    • Restaurant registration (name, location, geolocation)
      • Menus(drink, veg, non-veg)
        • Items
    • Order
      • items
    • DeliveryBoy 
  • MicroServices involved
    • Ordering
    • Restaurant search service (Elasticsearch), which supports geo-distance queries
    • Restaurant profile service
    • Order fulfillment service
    • Dispatcher service
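The restaurant search service relies on Elasticsearch's geo-distance query to find restaurants near the customer's (lat, long). As a rough illustration of what that filter computes, here is a brute-force Python sketch using the haversine formula. The Restaurant type, the nearby() helper and the radius are hypothetical; a real deployment would let Elasticsearch (or another geo index) do this instead of scanning every restaurant.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


@dataclass
class Restaurant:
    name: str
    lat: float
    lon: float


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearby(restaurants, lat, lon, radius_km):
    """Restaurants within radius_km of the customer, nearest first."""
    with_distance = [(haversine_km(lat, lon, r.lat, r.lon), r) for r in restaurants]
    with_distance.sort(key=lambda pair: pair[0])
    return [r for d, r in with_distance if d <= radius_km]
```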

 Stock Brokerage System

  • User Management
    • User
    • Watchlist
    • StockHoldings
  • StockManagement
    • List of stocks
    • ListOfOrders
  • Payment
  • Notifications
  • Design Patterns
    • Observer Design pattern
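A minimal Python sketch of the Observer pattern for the brokerage case, assuming users with a stock on their watchlist should be notified when its price changes. Stock, WatchlistUser, attach/detach and set_price are illustrative names, not part of any library; a real system would push through the notification service instead of appending to an in-memory list.

```python
from abc import ABC, abstractmethod


class StockObserver(ABC):
    """Anything that wants to hear about price changes."""

    @abstractmethod
    def update(self, symbol, price):
        ...


class Stock:
    """Subject: keeps its observers and notifies all of them on every price change."""

    def __init__(self, symbol, price):
        self.symbol = symbol
        self.price = price
        self._observers = []  # registered StockObserver instances

    def attach(self, observer):
        self._observers.append(observer)

    def detach(self, observer):
        self._observers.remove(observer)

    def set_price(self, price):
        self.price = price
        for observer in list(self._observers):  # copy so an observer may detach while notified
            observer.update(self.symbol, price)


class WatchlistUser(StockObserver):
    """Concrete observer: records the notifications it receives."""

    def __init__(self, name):
        self.name = name
        self.notifications = []

    def update(self, symbol, price):
        self.notifications.append(f"{symbol} is now {price}")
```

The stock does not know what its observers do with the update, which is what keeps the notification side loosely coupled.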

Facebook Feeds

  • Requirement
    • Append new post as they arrive
    • Feed may contain image/video/text
    • The feed must be generated from the users one follows
    • Scalable system
  • Given that billions of users use the system, focus on the backend
    • Database should be NoSQL (Cassandra, MongoDB)
    • Datamodel
      • User
      • Friend
      • FeedItem(feeditemid, userid, content, location)
      • Media(MediaID, type, description, location)
      • FeedMedia(feeditemid, MediaID)
    • Offline generation of news feed
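      • A background job/server generates feeds and pushes them to the cache
      • Cache entry: UserId -> (LinkedHashMap<FeedItemId, FeedItem> feedItems, lastUpdateTime)
    • Feed notification service
      • Pull: clients request new feed items periodically or on demand
      • Push: the server pushes new items to connected users as they are published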
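A sketch of the push (fan-out-on-write) variant of the offline feed cache, assuming in-memory dicts stand in for the follower graph and the cache servers. Python's OrderedDict plays the role of the insertion-ordered LinkedHashMap, and MAX_FEED_ITEMS is a hypothetical per-user cap. In the pull variant, get_feed would instead query the followed users' recent items at read time.

```python
import time
from collections import OrderedDict
from dataclasses import dataclass, field

MAX_FEED_ITEMS = 500  # hypothetical cap on cached items per user


@dataclass
class FeedItem:
    feed_item_id: int
    user_id: int
    content: str


@dataclass
class CachedFeed:
    # insertion-ordered map feed_item_id -> FeedItem, like LinkedHashMap<FeedItemId, FeedItem>
    feed_items: OrderedDict = field(default_factory=OrderedDict)
    last_update_time: float = 0.0


followers = {}   # followee user_id -> set of follower user_ids (stand-in for the Friend table)
feed_cache = {}  # user_id -> CachedFeed (stand-in for the cache servers)


def follow(follower_id, followee_id):
    followers.setdefault(followee_id, set()).add(follower_id)


def publish(item):
    """Push model: append the new item to every follower's cached feed."""
    for follower_id in followers.get(item.user_id, ()):
        feed = feed_cache.setdefault(follower_id, CachedFeed())
        feed.feed_items[item.feed_item_id] = item
        if len(feed.feed_items) > MAX_FEED_ITEMS:
            feed.feed_items.popitem(last=False)  # evict the oldest item
        feed.last_update_time = time.time()


def get_feed(user_id, limit=20):
    """Return up to `limit` cached items, newest first."""
    feed = feed_cache.get(user_id)
    if feed is None:
        return []
    return list(feed.feed_items.values())[::-1][:limit]
```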

TinyURL

  • Requirement
    • Unique 7-character alias using 62 characters (A-Z, a-z, 0-9)
    • scalable system 
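  • Convert the long URL into a decimal value using MD5 hashing (a truncated hash can collide, so check the DB before saving)
    • MD5 hash value -> binary -> take the leading bits -> decimal number -> base62 approach -> map each of the 7 remainders to a character with a lookup table
    • 7 base62 characters cover 62^7 ≈ 3.5 × 10^12 values, about 41 bits; a 45-bit value can need 8 characters, so take about 41 bits (or reduce the value modulo 62^7)
  • Base62 approach converts any decimal value below 62^7 to 7 chars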
  • Datamodel
    • User
    •  URL(
  • System Design
    • User -> Load balancer -> appln server -> URL generating service -> DB
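A Python sketch of the hashing and base62 steps, assuming the alphabet order A-Z, a-z, 0-9 from the requirement and keeping the leading 41 bits of the MD5 digest so the value always fits in 7 characters. The function names are illustrative, and collision handling (checking the DB and, for example, retrying with a salted URL) is left out.

```python
import hashlib
import string

# Alphabet order (A-Z, a-z, 0-9) is an assumption taken from the requirement line.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits
BASE = len(ALPHABET)              # 62
CODE_LENGTH = 7
MAX_CODES = BASE ** CODE_LENGTH   # 62^7 = 3,521,614,606,208 possible aliases
HASH_BITS = 41                    # 2^41 < 62^7, so any 41-bit value fits in 7 chars


def base62_encode(number, length=CODE_LENGTH):
    """Convert a non-negative integer to a fixed-length base62 string."""
    if not 0 <= number < BASE ** length:
        raise ValueError("number does not fit in the requested length")
    chars = []
    for _ in range(length):
        number, remainder = divmod(number, BASE)
        chars.append(ALPHABET[remainder])  # each remainder maps to one character
    return "".join(reversed(chars))


def base62_decode(code):
    """Inverse of base62_encode."""
    number = 0
    for ch in code:
        number = number * BASE + ALPHABET.index(ch)
    return number


def short_code(long_url):
    """MD5 -> leading HASH_BITS bits -> integer -> 7-char base62 alias."""
    digest = hashlib.md5(long_url.encode("utf-8")).digest()  # 16 bytes = 128 bits
    leading = int.from_bytes(digest, "big") >> (128 - HASH_BITS)
    return base62_encode(leading)
```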
YouTube, Netflix
  • Requirement
    • Upload videos
    • Share and View videos
    • Search on titles
    • Add and view comment
    • Likes/dislikes and total views
  • REST APIs
    • Upload, Search, Stream
  • High-level design
    • Client -> webserver -> appln server -> processing Queue -> encoder to encode into multiple formats -> Thumbnail -> 
  • Database
    • UserDB
    • MetadataDB
      • meta data for video
      • Videos stored in DFS (distributed file system)
  • DB schema
    • Video
    • Comment
    • User
  • Low-level details
    • Thumbnails in Google Bigtable
    • Videos -> Hadoop Distributed File System (HDFS)
  • Sharding
    • done on video id rather than user id
WhatsApp/Facebook Messenger
  • Requirements
    • Support one-one conversation
    • Offline/online status
    • Chat history
  • Non-Functional Requirements
    • Low latency
    • High availability
  • Database design
    • Users
    • UserConversation
  • Highlevel Design
    • Say a chat server can hold 20K connections; there will be clusters of chat servers
    • User A sends a message. It goes to the chat server assigned to User A
    • That chat server stores the message in the DB and forwards it to the chat server the recipient is assigned to
    • Use HBase (wide-column database) for message storage
    • Each chat server maintains a hash map of connected users: HashMap<user, connection>

  • Long Polling technique
Twitter/Instagram
  • Requirements
    • Post new tweets
    • Follow others
    • Post photos/Videos
    • Timeline 
  • Non-Functional Requirements
    • Low latency
    • High availability
  • REST APIs
    • POST Tweet()
    • GET Tweets()
  • HLD
    • Read-heavy
    • Client -> load balancer -> app server -> aggregator -> DB shards for text and file system for photos (Hadoop Distributed File System), plus a cache
  • Database design: MySQL plus NoSQL (MongoDB or Cassandra)
    • MySQL
      • User() -> Data is limited
    • NoSQL(Mongo or Cassandra) -> ever growing data
      • tweet(tweetid, userid, content, location)
      • Friends()
  • Database Partitioning (Sharding)
    • By UserID -> hot shard when a celebrity tweets, because many followers read from that user's shard
    • By TweetID -> a user's tweets are spread across shards, so reading them hits many shards; use a cache to offset this
Deck Of Cards
  • 4 suits: diamonds, spades, hearts, clubs
Zoom Car (Car Rental)
  • Consumer
  • Vehicle
  • VehicleInventory
  • VehicleReservation
  • CarRentalLocations

Online Shopping

  • Search
  • Buy
  • Payment
  • Notifications 
Design Hit Counter
  • Queue-based approach -> cost is O(n) in the number of hits
  • Array based approach ->
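A sketch of both hit-counter approaches, assuming the common formulation of counting hits in the last 300 seconds with integer, non-decreasing timestamps (the window size is an assumption; the notes do not give one). The array version keeps one bucket per second of the window, so its memory stays constant no matter how many hits arrive.

```python
from collections import deque

WINDOW = 300  # seconds; assumed 5-minute window


class QueueHitCounter:
    """Stores every hit timestamp, so memory grows with the number of hits (O(n))."""

    def __init__(self):
        self.hits = deque()

    def hit(self, timestamp):
        self.hits.append(timestamp)

    def get_hits(self, timestamp):
        # evict hits that are no longer inside (timestamp - WINDOW, timestamp]
        while self.hits and self.hits[0] <= timestamp - WINDOW:
            self.hits.popleft()
        return len(self.hits)


class ArrayHitCounter:
    """Fixed arrays of WINDOW buckets: O(1) per hit, O(WINDOW) per read, constant memory."""

    def __init__(self):
        self.times = [0] * WINDOW   # which second each bucket currently represents
        self.counts = [0] * WINDOW  # hits recorded in that second

    def hit(self, timestamp):
        i = timestamp % WINDOW
        if self.times[i] != timestamp:  # bucket holds an older second; reset it
            self.times[i] = timestamp
            self.counts[i] = 0
        self.counts[i] += 1

    def get_hits(self, timestamp):
        return sum(c for t, c in zip(self.times, self.counts) if timestamp - t < WINDOW)
```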
Uber
  • Requirements
    • Customer
      • Customer can request a ride
      • After booking, the customer should be able to see all available drivers
      • Customer can see the driver's location
    • Driver
      • Can see customer location
      • Upon reaching destination, driver marks the trip as complete to become available for others
  • Quad tree (each internal node has 4 child nodes)
    • World map divided into small grids in the specific range of latitude & longitude
    • Start with one node that represents whole world
    • Since it has more than 500M locations, break it into 4 nodes.
    • Repeat the process until no node has more than 500 locations
  • Keep the latest locations in a hash map (cache) and copy them to the quad tree once a minute
  • Notification Server -> works on pub-sub model
    • When a user requests a ride, the system subscribes the user on the notification server.
    • Notification server will communicate with Quad server and store location of user in quad tree
    • Notification server will communicate to Driver as well. 
    • Notification server will then communicate driver location to user
  • LLD
    • User -> find a ride-> quad server -> quad tree
    • driver -> update location -> notification server -> hashMap
    • User-> driver location -> notification server -> hashMap
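A minimal point quad tree in Python following the splitting rule in the Uber notes: a leaf holds at most MAX_LOCATIONS points and splits into four equal quadrants when it overflows. Box, QuadNode, insert and query are hypothetical names. The sketch assumes lat/lon in degrees and half-open boxes (so the exact edge lat = 90 is not covered), and it does not handle removals, driver moves or many points at one identical spot.

```python
from dataclasses import dataclass, field

MAX_LOCATIONS = 500  # split threshold from the notes


@dataclass
class Box:
    min_lat: float
    min_lon: float
    max_lat: float
    max_lon: float

    def contains(self, lat, lon):
        return self.min_lat <= lat < self.max_lat and self.min_lon <= lon < self.max_lon

    def intersects(self, other):
        return not (other.min_lat >= self.max_lat or other.max_lat <= self.min_lat
                    or other.min_lon >= self.max_lon or other.max_lon <= self.min_lon)


@dataclass
class QuadNode:
    box: Box
    capacity: int = MAX_LOCATIONS
    points: list = field(default_factory=list)    # (driver_id, lat, lon) while a leaf
    children: list = field(default_factory=list)  # empty for a leaf, else 4 QuadNodes

    def insert(self, driver_id, lat, lon):
        if not self.box.contains(lat, lon):
            return False
        if self.children:
            return any(child.insert(driver_id, lat, lon) for child in self.children)
        self.points.append((driver_id, lat, lon))
        if len(self.points) > self.capacity:
            self._split()
        return True

    def _split(self):
        """Turn this leaf into 4 quadrants and push its points down into them."""
        b = self.box
        mid_lat = (b.min_lat + b.max_lat) / 2
        mid_lon = (b.min_lon + b.max_lon) / 2
        self.children = [
            QuadNode(Box(b.min_lat, b.min_lon, mid_lat, mid_lon), self.capacity),
            QuadNode(Box(b.min_lat, mid_lon, mid_lat, b.max_lon), self.capacity),
            QuadNode(Box(mid_lat, b.min_lon, b.max_lat, mid_lon), self.capacity),
            QuadNode(Box(mid_lat, mid_lon, b.max_lat, b.max_lon), self.capacity),
        ]
        points, self.points = self.points, []
        for p in points:
            self.insert(*p)

    def query(self, area):
        """All (driver_id, lat, lon) entries whose location falls inside `area`."""
        if not self.box.intersects(area):
            return []
        if not self.children:
            return [p for p in self.points if area.contains(p[1], p[2])]
        return [p for child in self.children for p in child.query(area)]
```

Finding drivers near a rider then becomes a query over a small box around the rider's location, which only visits the quadrants that overlap it.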
Gmail
  • User Service
  • Mail Service
  • Authentication service

Instagram(Photo Sharing Service)

  • Requirements
    • Upload/download/view photo
    • Search based on photo title
    • Follow other users
    • Feed
  • DB Schema/Data flow
    • User(userid, mail, dob, creation_time)
    • UserPosts(id, userid, date, type, location)
    • UserFollow(id,
  • Use NoSQL (Cassandra, MongoDB) as it scales horizontally
  • Epoch time -> every timestamp represented as seconds since 1970-01-01 UTC
  • Sharding
    • Technique to break up a huge database into smaller parts (shards)
 Distributed cache
  • Memcached/Redis
  • appln server -> Load balancer -> Cache1..N
  • Disadvantage of placing keys with hash(key) % N
    • Adding or removing a cache server changes N, so most keys are rehashed to a different server
  • Consistent hashing
    • No modulus by the number of servers
    • Both servers and keys are hashed onto the same hash ring
    • When a key is looked up, it is hashed and served by the nearest server in the clockwise direction
    • Adding or removing a server only moves the keys between that server and its predecessor on the ring
  • Master/slave architecture for backups
  • Configuration Manager
    • ZooKeeper
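A sketch of a consistent hash ring in Python, assuming MD5-based 32-bit ring positions and 100 virtual nodes per cache server. Virtual nodes are a common refinement that the notes do not mention; they spread each server over many points so keys divide more evenly. ConsistentHashRing and its methods are illustrative names, not a Memcached or Redis API.

```python
import bisect
import hashlib


def ring_hash(key):
    """Map a string to a position on the ring (0 .. 2^32 - 1) using MD5."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:4], "big")


class ConsistentHashRing:
    def __init__(self, servers=(), replicas=100):
        self.replicas = replicas  # virtual nodes per server
        self._points = []         # sorted ring positions
        self._owner = {}          # ring position -> server
        for server in servers:
            self.add_server(server)

    def add_server(self, server):
        for i in range(self.replicas):
            point = ring_hash(f"{server}#{i}")
            bisect.insort(self._points, point)
            self._owner[point] = server

    def remove_server(self, server):
        for i in range(self.replicas):
            point = ring_hash(f"{server}#{i}")
            self._points.remove(point)
            del self._owner[point]

    def get_server(self, key):
        """First server clockwise from the key's position, wrapping around the ring."""
        if not self._points:
            raise LookupError("ring is empty")
        index = bisect.bisect_right(self._points, ring_hash(key)) % len(self._points)
        return self._owner[self._points[index]]
```

Adding a server only takes over the keys that fall just before its points on the ring, whereas switching from hash(key) % 3 to hash(key) % 4 moves roughly three quarters of the keys.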
 Google Drive/Dropbox
  • User -> upload file -> appln server -> cloud storage
  • Metadata DB -> (chunk hash, file, user, device, workspace ID)
    • Tracks which chunk stores which part of a file
  • HLD
    • Client -> block server -> cloud storage
    • Client -> synchronization server -> Metadata DB
  • DLD
    • Client splits files into smaller chunks
    • Maintain a workspace in the client application
    • Take care of offline clients
    • Send notifications of any modification on the local client machine
 
Google Maps
  • Influencers
    • Weather
    • Traffic
    • Road
    • compute modified route and distance when user changes direction
    • Quad Tree
EDA (Event-Driven Architecture)
  • Loosely coupled
  • Realtime analytics
  • size of architecture
  • Helps in debugging
  • keep sequence of events
  • A mediator component takes events from the event queue and distributes them to different processing units as per the design
  • The EDA model is almost the same as the pub-sub model

Design Pattern

  • Observer Design pattern 
    • Also called the publish-subscribe design pattern
    • YouTube channel or Facebook groups
    • All subscribers/group members get a notification when the channel pushes an update

  • Chain Of Responsibility
    • Used for loose coupling.
    • Handlers in the chain decide among themselves which one serves the request
    • Request -> Handler1-> Handler2->Handler3
    • E.g.
      • Multilevel security layer
      • ATM cash withdrawal
      • Logger system
        • Error, Debug, info logger
        • Abstract Class Logger
          • InfoClass, DebugClass, ErrorClass
          • Each object holds a reference to the next object in the chain and invokes it at the end
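A Python sketch of the logger chain described above. The level constants and the choice that exactly one handler serves each request are assumptions; InfoClass, DebugClass and ErrorClass reuse the names from the notes. Each handler checks whether it can serve the request and otherwise forwards it to the next logger; log() returns the formatted line instead of printing it, to keep the example easy to check.

```python
from abc import ABC, abstractmethod

INFO, DEBUG, ERROR = 1, 2, 3  # hypothetical level constants


class Logger(ABC):
    """Abstract handler: holds a reference to the next logger in the chain."""

    def __init__(self, next_logger=None):
        self.next_logger = next_logger

    def log(self, level, message):
        if self.can_handle(level):
            return self.write(message)
        if self.next_logger is not None:
            return self.next_logger.log(level, message)  # pass the request along
        return None  # nobody in the chain handled it

    @abstractmethod
    def can_handle(self, level): ...

    @abstractmethod
    def write(self, message): ...


class InfoClass(Logger):
    def can_handle(self, level):
        return level == INFO

    def write(self, message):
        return f"INFO: {message}"


class DebugClass(Logger):
    def can_handle(self, level):
        return level == DEBUG

    def write(self, message):
        return f"DEBUG: {message}"


class ErrorClass(Logger):
    def can_handle(self, level):
        return level == ERROR

    def write(self, message):
        return f"ERROR: {message}"


# Request -> InfoClass -> DebugClass -> ErrorClass
chain = InfoClass(DebugClass(ErrorClass()))
```

The caller only knows the head of the chain, so handlers can be added or reordered without touching client code.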

Saturday, 26 June 2021

Real-time/In-memory DB

PERFORMANCE

  • Read data from hard disk - 63MB/sec
  • Read from SSD(Solid state disk) - 457MB/sec
  • Read from RAM - 4671 MB/sec

examples

  • Key value pairs
    • Redis
    • Memcached
  • SQL DB
    • SQLite (in-memory mode)

Features of Real time/RAM DB

  • Expensive, as RAM costs more than disk
  • Volatile: data is lost on restart unless persisted
  • High performance
  • AVL trees for indexing

Caching Best practices

  • Validity of data
  • High hit rate
  • Cache miss
  • TTL
Caching features/estimations
  • Terabytes of data
  • 50K to 1M QPS
  • ~1 ms latency
  • LRU eviction
  • High availability (target close to 100%)
  • scalable
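LRU eviction can be sketched with Python's OrderedDict, which keeps entries in access order when move_to_end is called on every hit. This is a single-process illustration only; Redis and Memcached implement eviction internally (Redis uses an approximated LRU), and LRUCache is a hypothetical name.

```python
from collections import OrderedDict


class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self._data = OrderedDict()  # least recently used entry first

    def get(self, key):
        if key not in self._data:
            return None  # cache miss
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key, value):
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = value
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict the least recently used entry
```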
Cache access patterns
  • Write through
    • Write happens in both cache and DB; the acknowledgement is sent after both writes complete
  • Write around
    • Write happens in the DB only; the cache is updated on a subsequent read request
  • Write back
    • Write happens in the cache and is acknowledged; another service syncs it to the DB asynchronously
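The three write policies differ only in when the DB is written relative to the acknowledgement. A minimal sketch, assuming plain dicts stand in for the cache and the DB and that write-around invalidates any stale cached copy on write (the notes only say the cache is refreshed on the next read). Class names are hypothetical. In a real write-back setup, flush() would run in a background service, and dirty entries are lost if the cache node dies before flushing.

```python
class _Cache:
    """Shared read path: serve from cache, load from the DB on a miss."""

    def __init__(self, db):
        self.db = db
        self.cache = {}

    def read(self, key):
        if key not in self.cache:  # cache miss -> load from DB
            self.cache[key] = self.db[key]
        return self.cache[key]


class WriteThroughCache(_Cache):
    def write(self, key, value):
        self.db[key] = value     # write to the DB ...
        self.cache[key] = value  # ... and to the cache, then acknowledge
        return "ack"


class WriteAroundCache(_Cache):
    def write(self, key, value):
        self.db[key] = value
        self.cache.pop(key, None)  # drop any stale copy; refilled on the next read
        return "ack"


class WriteBackCache(_Cache):
    def __init__(self, db):
        super().__init__(db)
        self.dirty = set()  # keys written to the cache but not yet to the DB

    def write(self, key, value):
        self.cache[key] = value  # acknowledge after the cache write only
        self.dirty.add(key)
        return "ack"

    def flush(self):
        """Sync dirty entries to the DB (asynchronous in a real system)."""
        for key in self.dirty:
            self.db[key] = self.cache[key]
        self.dirty.clear()
```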
Distributed transactions
  • Two phase commit
    • Prepare
    • Commit 
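    • Cons: latency due to multiple HTTP calls to the microservices
  • Three phase commit
    • Adds a pre-commit phase so the transaction can recover in case the coordinator/participant fails

  • Saga
    • Sequence of local transactions; if a step fails, compensating transactions undo the steps already completed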
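A minimal two-phase commit coordinator in Python, assuming in-process participants that vote in phase 1 and then commit or abort in phase 2. Participant, Vote and two_phase_commit are hypothetical names. It leaves out timeouts, durable logging and crash recovery; a participant that has voted YES would block if the coordinator failed at that point, which is the problem three-phase commit aims to reduce.

```python
from enum import Enum


class Vote(Enum):
    YES = "yes"
    NO = "no"


class Participant:
    """One microservice/resource manager taking part in the transaction."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "init"

    def prepare(self):
        # Phase 1: do the work, hold locks and vote
        self.state = "prepared" if self.can_commit else "aborted"
        return Vote.YES if self.can_commit else Vote.NO

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def two_phase_commit(participants):
    """Coordinator: commit only if every participant votes YES."""
    votes = [p.prepare() for p in participants]  # phase 1: prepare
    if all(v is Vote.YES for v in votes):
        for p in participants:                   # phase 2: commit
            p.commit()
        return True
    for p in participants:                       # phase 2: abort/rollback
        p.abort()
    return False
```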
Databases
  • Choice of database - factors
    • Structured vs non-structured
    • Query pattern
    • Scale
  • Caching DB
    • Redis
    • memcache
  • Image/Video DB
    • Blob storage
      • Amazon S3 with CDN
  •  Text based search
    • Text Search engine
      • Elasticsearch
      • Solr
  • Timeseries database
    • For metrics monitoring
      • OpenTSDB
  • Structured DB
    • need ACID
      • RDBMS
        • Oracle, MySQL, SQL Server, PostgreSQL
  • Document DB/NoSQL DB
    • Wide-column DB
      • Cassandra (Apache)
      • HBase
        • Persists data
        • Designed to handle large amounts of data across clusters of commodity servers
        • Good for large amounts of data with a finite number of query patterns
  • Redis
  • Elastic(Facebook)
  • mysql
 Design Interview tips
  • High level design or low level design
  • What functional features to be addressed
  • What non-functional features to be addressed
    • #users
    • scale
    • Throughput, data size
  • Start with basic building blocks
    • how they interact
    • Get into depth of 1 or 2 components
  • Check that the discussion is going in the right direction
  • Where does the business flow start?

Sunday, 6 June 2021

Apache Kafka

Application-to-application communication system

  • A message broker
  • Publish/Subscribe broker
  • Keeps log of historic events.
  • Distributed event ledger/log

Topic

  • Logical group of events

Kafka is a Warehouse

Topic is a storage room in the warehouse

  • Each topic can have one or more partitions (storage counters)
  • Partitions are used for concurrent processing
  • Offset is the sequential index of a record within a partition; consumers track their current offset per partition

Partition-> Storage Counter

Offset->
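To make the warehouse analogy concrete, here is a toy in-memory model (not the Kafka client API) of a topic with several partitions: records with the same key go to the same partition, each partition is an append-only log, and the offset is a record's position in that log. crc32 is a stand-in for Kafka's default partitioner (which uses murmur2), and Topic/Consumer are hypothetical names.

```python
import zlib


class Topic:
    """Toy model of a topic: N append-only partition logs."""

    def __init__(self, name, num_partitions):
        self.name = name
        self.partitions = [[] for _ in range(num_partitions)]

    def produce(self, key, value):
        # same key -> same partition, so order is preserved per key
        partition = zlib.crc32(key.encode("utf-8")) % len(self.partitions)
        log = self.partitions[partition]
        log.append((key, value))
        offset = len(log) - 1  # position of the record within its partition
        return partition, offset


class Consumer:
    """Tracks its own next offset per partition."""

    def __init__(self, topic):
        self.topic = topic
        self.offsets = [0] * len(topic.partitions)

    def poll(self, partition):
        log = self.topic.partitions[partition]
        records = log[self.offsets[partition]:]
        self.offsets[partition] = len(log)  # "commit" the new offset
        return records
```

Because ordering only exists inside one log, two records with different keys can land in different partitions and be read in any relative order.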

 

HR System

Marketing System

Active Directory

How does legacy system work

  • Fetch data from a DB or web service at scheduled times using a scheduler

Kafka

  • Central messaging system
  • Activity/Application Log
  • Storing IoT data
  • System decoupling
  • Async processing
  • Distributed Streaming Platform
  • Messaging system
    • Publish and Subscribe 
  • Advantages of Messaging System

Topic

  • A Kafka topic is similar to a queue
  • Kafka guarantees the order of records within a partition (not across partitions)

 

Avro - data serialization format (schema-based)

Kafka Streams

  • Library for easy data processing and transformation on top of Kafka
    • Data transformation
    • Data enrichment
    • Fraud detection
    • Monitoring and Alerting
MovieFlix

  • Resume a video where the user left off
  • Build user profile in real time
  • Recommend next show in real time
  • Store all data in analytics store

  • Show Position
    • Video Player(consumer) -> Video Position Service  -> Kafka
    • Periodically, the video player sends its position to the Video Position Service
    • Video Position Service sends it to Kafka
    • Video Player(consumer) -> Resuming Service  <- Kafka
  • Recommendations
    • We have data about which user watches which show and how far
    • A recommendation engine powered by Kafka Streams takes the show positions, runs a recommendation algorithm and produces recommendations
    • The recommendations are consumed by the recommendation service
    • These recommendations can also be consumed by an analytics consumer for the analytics store (Hadoop)
Get Taxi - IoT Example
  • The user should be matched with a nearby driver
  • The pricing should "surge" if the number of drivers is low or the number of users is high
  • All the position data before and during the ride should be stored in an analytics store so that the cost can be computed accurately
  • User Application -> User Position Service -> Kafka (user_position) Topic
  • Taxi Driver Application -> Taxi Position Service -> Kafka (taxi_position) Topic
  • Surge pricing computation model (stream) consumes user_position and taxi_position, produces the surge pricing and places it in the Surge Pricing topic
  • User Application <- Taxi Cost Service <- Surge Pricing Topic  -> Analytics consumer -> Analytics Store
CQRS - MySocialMedia
  • Command Query Responsibility Segregation
  • Social media allows people to post images. Others can react using "likes" and "comments". The business wants the following capabilities:
    • Users should be able to post, like and comment
    • Users should see the total number of likes and comments per post in real time
    • High volume of data is expected on the first day of launch
    • Users should be able to see "trending" posts
  • User Posts -> Posting Service -> Post Topic
  • User Likes -> Like/Comment Service -> like topic
  • User Comments -> Like/Comment Service -> Comments Topic
  • Total Likes/Comments Computation(Kafka Streams) consumes Posts, likes, Comments and performs some aggregations
  • website <- Refresh feed service <-Posts_with_counts(TOPIC)  <- Total Likes/Comments Computation(Kafka Streams) 
  • website <-Trending Feed Service <- Trending posts(TOPIC)<- Trending Posts in past hour(Kafka Streams) 
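A sketch of the CQRS split for MySocialMedia in plain Python, assuming a single in-memory list stands in for the posts/likes/comments topics and PostStatsView stands in for the Kafka Streams aggregations, with events arriving in timestamp order. Writes (commands) only append events; reads (queries) are served from a separately maintained read model. The function and class names, the event format and the one-hour trending window are illustrative.

```python
from collections import Counter, deque

# Stand-in for the likes/comments topics: the command side only appends here.
events = []


def like(post_id, ts):
    """Command: record a like (no counters are touched on the write path)."""
    events.append({"type": "like", "post_id": post_id, "ts": ts})


def comment(post_id, text, ts):
    """Command: record a comment."""
    events.append({"type": "comment", "post_id": post_id, "text": text, "ts": ts})


class PostStatsView:
    """Query side: a read model built by consuming the event stream."""

    def __init__(self, window=3600):
        self.likes = Counter()
        self.comments = Counter()
        self.window = window   # "trending" = most reactions in the past hour (seconds)
        self.recent = deque()  # (ts, post_id) of reactions, oldest first
        self.consumed = 0      # number of events already applied

    def catch_up(self):
        """Apply events not seen yet (a stream processor would do this continuously)."""
        for event in events[self.consumed:]:
            counter = self.likes if event["type"] == "like" else self.comments
            counter[event["post_id"]] += 1
            self.recent.append((event["ts"], event["post_id"]))
        self.consumed = len(events)

    def counts(self, post_id):
        return {"likes": self.likes[post_id], "comments": self.comments[post_id]}

    def trending(self, now, top=3):
        while self.recent and self.recent[0][0] <= now - self.window:
            self.recent.popleft()  # drop reactions older than the window
        return [post for post, _ in Counter(p for _, p in self.recent).most_common(top)]
```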
Finance application - MyBank
  • MyBank is a company that allows real-time banking for its users. It wants to deploy a brand-new capability to alert users in case of large transactions
  • Transaction control data already exists in database
  • Database of Transactions -> Kafka Connect source CDC connector -> bank_transaction (Topic)
  • Users set their threshold in apps -> App Threshold Service  -> user_settings(Topic) -> 
  • Real time Big Transaction Detection  consumes bank_transaction & user_settings and evaluates the alert to be sent or not
  • Users see notification in their apps <- Notification service <- user_alerts(Topic)<- Real time Big Transaction Detection 
  • It is common to have "generic" connectors or solutions to offload data from Kafka to HDFS, Amazon S3, and ElasticSearch for example
  • It is also very common to have Kafka serve a "speed layer" for real time applications, while having a "slow layer" which helps with data ingestions into stores for later analytics
  • Kafka as a front to Big Data Ingestion is a common pattern in Big Data to provide an "ingestion buffer" in front of some stores
  • Data Producers (Apps, website, Financial Systems, email, Customer Data, databases) -> Kafka -> Spark, Storm, Flink etc -> Real time analytics, Dashboards, Apps, Alerts

    Data Producers (Apps, website, Financial Systems, email, Customer Data, databases) -> Kafka -> Hadoop, Amazon S3, RDBMS -> Data Science, Reporting, Audit, Backup/Long term Storage
  • One of the first use cases of Kafka was to ingest logs and metrics from various applications
  • This kind of deployment usually wants high throughput and has fewer restrictions regarding data loss, replication of data, etc.
  • Application logs can end up in logging solutions such as Splunk, CloudWatch, ELK, etc.
  • applications -> application_logs (topic) -> Kafka Connect sink -> Splunk
  • applications -> application_metrics (topic) -> Kafka Connect sink -> Splunk
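The big-transaction detection step in the MyBank flow is essentially a stream-table join: each bank_transaction record is checked against the latest threshold per user from user_settings (in Kafka Streams terms, a KStream joined with a KTable). A plain-Python sketch of that logic, with a dict and a list standing in for the topics; the handler names and the alert format are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Transaction:
    user_id: str
    amount: float


user_thresholds = {}  # table view of user_settings: latest threshold per user
user_alerts = []      # stand-in for the user_alerts topic


def on_user_setting(user_id, threshold):
    """Consume a user_settings record; a newer value replaces the older one."""
    user_thresholds[user_id] = threshold


def on_bank_transaction(tx):
    """Consume a bank_transaction record and join it with the user's threshold."""
    threshold = user_thresholds.get(tx.user_id)
    if threshold is not None and tx.amount > threshold:
        user_alerts.append({"user_id": tx.user_id, "amount": tx.amount, "threshold": threshold})
```

Users without a threshold produce no alert, which mirrors an inner join: a transaction only yields output when a matching setting exists.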



Friday, 4 June 2021

Messaging System - RabbitMQ

 Application to application communication system

Open source messaging system

Queuing data in real time

RabbitMQ vs others

  • Provides web interface and monitoring services
  • REST api
  • Built-in user access controls

Kafka vs RabbitMQ


Tuesday, 1 June 2021

Integrations with Spring

What is Spring Integration?

  • The framework allows you to do the following:
    • Allows communication between components within your application, based on in-memory messaging. This allows these application components to be loosely coupled with each other, sharing data through message channels.
    • It allows communication with external systems. You just need to send the information; Spring Integration will handle sending it to the specified external system and bring back a response if necessary. Of course, this works the other way round; Spring Integration will handle incoming calls from the external system to your application.
  • Benefits
    • Loose coupling among components.
    • Event oriented architecture.
    • The integration logic (handled by the framework) is separated from the business logic.
  • Integration is the exchange of data between systems (both internal and external)
  • Integration is a common and rapidly growing requirement
  • The rise of APIs has increased this need for data exchange

Designed to support enterprise integration patterns

Traditional ways of integration

  • Webservices
  • File transfer
  • Database sharing

Spring provides a solution for exchanging information through trusted enterprise integration patterns. This helps your application be robust, flexible and consistent.

Common Implementations

  • Enterprise Service Bus(ESB)
    • Bus between applications. Applications are decoupled 
    • Heavyweight
  • Integration frameworks
    • Lightweight
    • Point-to-point connections for simple solutions

Benefits of Spring Integration

  • Simplify complex enterprise integrations
  • Loosely couple components
  • Well defined boundaries between extension points
  • Separate integration and business logic

Architecture

  • Lightweight message-driven architecture
  • Channels, messages and endpoints for internal communication
    • Channels do the routing and endpoints do the operations (similar to pipes and filters)
  • When communicating with External systems we use special end points called Adapters or Gateways. These sit on the edge of our applications
  • Adapters facilitate communication between systems
    • Available adapters
      • Data store
      • File store
      • HTTP
      • Mail
      • Messaging
      • Twitter
      • Webservices

Channels

  • Pollable
  • Subscribable
  • MessageChannel (retains a message in a buffer until the subscriber receives it)
    • PollableChannel
    • Direct
    • PublishSubscribe-> one - many
  • QueueChannel
  • PriorityChannel
  • DirectChannel
  • Router
    • Route messages through channels to endpoints based on payload type (payload-type-router)
    • Route based on header value
    • Route based on recipient list

  • Filter
  • Splitter
    • split message into more than one
  • Aggregator
    • Combines multiple messages into single
  • Bridge

Spring Integration

  •  Integration Drivers

Messaging API

External System Integration


Adapters and gateways

  • Adapters: one-way communication
  • Gateways: two-way communication

Filesystem integration

FTP Integration

  • Transfer files from client to server through an established connection
External System Integration 
  • Filesystem - Log aggregation
  • ftp -
  • database
  • JMS
  • http