Technology: June 2021

Tuesday, 29 June 2021

System Designs

Airline Reservation System

Requirements

Search available flights by source, destination & date
Show the price list and timings; choose a flight and select seats
Payment and notification system

DataModel

FlightManagement

Airlines

Flights

SeatsTemplate

Airports

Flight(FlightNumber, From, To, duration)

Flight_Seat_Template(Seat, class)

FlightSchedules(date, time, gate, status)

FlightSeatAllocations

FlightReservation

Customer(name, email, mobile)

Reservation(reservationNumber, #seats, flight, status, Payment mode, Pay details)

FlightSeatAllocations(customername, age, price)

Payment & Notifications

Movie Reservation System

Requirements

Movie Info
Search by movie name, Theater name, city name
Select Movie & Reserve tickets
Payment & Notifications
Comments, Rating
Offers?

REST API

GET ListMovies(City, TheaterName, datetime)
POST SeatBooking(MovieId, showId, SeatsToBeReserved)

DataModel

Movie
- MovieList
Theater Management

Cities

Theaters
- Screens

Ticket Reservation

Users
- Reservation

Payments & Notifications
NoSQL tables

Comments, ratings, movieinfo, trailers, artists, cast, reviews, analytics

HighLevelDesign

User -> LoadBalancer -> appln server ->

Application Design

CDN for images and videos
Logstash & Elasticsearch for searching, Kibana for dashboards
NoSQL for storing movie info, comments & likes (ever-increasing data)

Have replication factor for backups

RDBMS for the reservation system (fixed data and ACID properties).

Have slaves for reads & a master for reads/writes

Cache all Movie, Seat information(Memcache, Redis) from both NoSQL & rdbms
Message queue for notifications (RabbitMQ)
Recommendation engine

Hadoop

Trends

Kafka to feed data into Spark

Spark or Storm for latest trends, fraud detection and mitigation strategies

Handling Concurrency
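The main concurrency risk here is two users booking the same seat at once. A minimal single-process sketch, assuming a hypothetical ShowSeats class: all requested seats are checked and claimed under one lock, so a booking is all-or-nothing. In the RDBMS-backed design above, the same guarantee would normally come from a database transaction (for example row locks via SELECT ... FOR UPDATE) rather than an in-process lock.

```python
import threading

class ShowSeats:
    """Hypothetical in-memory seat map for one show (seat -> user or None)."""

    def __init__(self, seat_ids):
        self._lock = threading.Lock()
        self._owner = {seat: None for seat in seat_ids}

    def book(self, user, seats):
        """Reserve every requested seat atomically, or none of them."""
        with self._lock:  # check-and-reserve must not interleave with another booking
            if any(s not in self._owner or self._owner[s] is not None for s in seats):
                return False  # a seat is unknown or already taken
            for s in seats:
                self._owner[s] = user
            return True

# Ten users race for the same two seats; exactly one booking should succeed.
show = ShowSeats(["A1", "A2", "A3"])
results = []
threads = [threading.Thread(target=lambda u=f"user{i}": results.append(show.book(u, ["A1", "A2"])))
           for i in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results.count(True))  # 1
```

Holding the lock across both the check and the update is the key design choice; checking first and reserving later without the lock would let two threads see the seats as free.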

E-commerce

VendorManagement
Home page & Search
Recommendation systems
Wishlist & checkout
Order processing
Payment and Notification

Online Food Ordering System

Requirements

Search by Restaurants & food
Add to cart and order food
Payment & notifications

Flow

Setup

Customer to sign up and create an account
Restaurant to register
App to track customer location and show nearby restaurants
Customers can search based on Restaurant names / items

Order processing

Customer to select items from one specific restaurant only
Payment through a payment gateway
Restaurant gets notification

Order fulfilment

Restaurant sends notification of food prepared
A delivery boy near the same location accepts the request
User tracks the delivery boy's location
Notification when Order delivered to customer

Entities

Customer account

Address(lat, long, pincode)

Restaurant registration (name, location, geo location)

Menus(drink, veg, non-veg)

Items

Order

items

DeliveryBoy

Microservices involved

Ordering
Restaurant Search service (Elasticsearch) - supports geo_distance queries
Restaurant profile service
Order fulfillment service
Dispatcher service
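Elasticsearch's geo_distance query does the "restaurants within X km" filtering on the server side. As a sketch of the same idea outside Elasticsearch, the code below filters and sorts restaurants by great-circle (haversine) distance. The function names and the sample coordinates are made up for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, long) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def nearby_restaurants(restaurants, lat, lon, radius_km):
    """Return [(name, distance_km)] within radius_km of (lat, lon), nearest first."""
    hits = []
    for name, (r_lat, r_lon) in restaurants.items():
        d = haversine_km(lat, lon, r_lat, r_lon)
        if d <= radius_km:
            hits.append((name, d))
    return sorted(hits, key=lambda hit: hit[1])

# Made-up sample data: restaurant name -> (lat, long)
restaurants = {
    "Cafe MG": (12.9756, 77.6050),
    "Indiranagar Diner": (12.9784, 77.6408),
    "Mysore Mess": (12.2958, 76.6394),
}
print(nearby_restaurants(restaurants, 12.9750, 77.6060, radius_km=5))
```

A linear scan like this is fine for a demo; at scale the geo index (Elasticsearch geo_point, or a grid/quad tree) avoids computing the distance to every restaurant.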

Stock Brokerage System

User Management

User
Watchlist
StockHoldings

StockManagement

List of stocks
ListOfOrders

Payment
Notifications
Design Patterns

Observer Design pattern

Facebook Feeds

Requirement

Append new posts as they arrive
Feed may contain image/video/text
New feed must be generated based on users followed
scalable system

Given that billions of users use the system, let us focus on the backend

Database should be NoSQL (Cassandra, MongoDB)
Datamodel

User
Friend
FeedItem(feeditemid, userid, content, location)
Media(MediaID, type, description, location)
FeedMedia(feeditemid, MediaID)

Offline generation of news feed

A background job/server generates the feed and pushes it to the cache
Cache will be like UserId -> { LinkedHashMap<FeedItemId, FeedItem> feedItems; lastUpdateTime; }
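A minimal sketch of that per-user cache, assuming Python's OrderedDict as a stand-in for Java's LinkedHashMap (both keep insertion order). The class name FeedCache and the per-user cap of 500 items are hypothetical; the notes do not specify a limit.

```python
import time
from collections import OrderedDict

class FeedCache:
    """UserId -> {ordered feed items, last update time}; OrderedDict stands in for LinkedHashMap."""

    def __init__(self, max_items=500):  # hypothetical per-user cap
        self.max_items = max_items
        self._feeds = {}

    def push(self, user_id, feed_item_id, feed_item):
        """Called by the offline feed-generation job for each new item."""
        entry = self._feeds.setdefault(user_id, {"items": OrderedDict(), "last_update": 0.0})
        items = entry["items"]
        items[feed_item_id] = feed_item
        items.move_to_end(feed_item_id)   # newest item sits at the end
        while len(items) > self.max_items:
            items.popitem(last=False)     # drop the oldest item
        entry["last_update"] = time.time()

    def latest(self, user_id, n):
        """Newest n items for the user, newest first ([] if nothing is cached)."""
        entry = self._feeds.get(user_id)
        if entry is None:
            return []
        return list(entry["items"].values())[::-1][:n]

    def last_update(self, user_id):
        entry = self._feeds.get(user_id)
        return entry["last_update"] if entry else None
```

The lastUpdateTime field lets a client ask only for items newer than what it already has, which is what makes the pull model cheap.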

FeednotificationService

Pull
Push

TinyURL

Requirement

Unique 7-letter alias using 62 characters (A-Z a-z 0-9).
scalable system

Convert the long URL into a decimal value using MD5 hashing

MD5 hash value -> convert to binary -> take first 45 bits -> convert to decimal number -> base62 approach -> use a hashmap to convert 7 decimal remainders to 7 chars

Base 62 approach to convert any decimal to 7 chars
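The pipeline above can be sketched as follows. One caveat worth knowing: 2^45 ≈ 3.5 × 10^13 is larger than 62^7 ≈ 3.5 × 10^12, so a raw 45-bit value can need 8 base62 characters. This sketch assumes the value is folded modulo 62^7 to keep exactly 7 characters. The alphabet order and function names are illustrative, and truncated hashes can still collide, so the DB must be checked before an alias is saved.

```python
import hashlib

ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"  # 62 characters
ALIAS_LEN = 7

def to_base62(n, length=ALIAS_LEN):
    """Divide by 62 repeatedly; each remainder picks one character (left-padded with '0')."""
    chars = []
    for _ in range(length):
        n, rem = divmod(n, 62)
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars))

def short_alias(long_url):
    digest = hashlib.md5(long_url.encode("utf-8")).digest()   # 128-bit MD5 hash
    first_45_bits = int.from_bytes(digest, "big") >> (128 - 45)
    # Assumption: fold into [0, 62**7) so the alias is always exactly 7 characters.
    return to_base62(first_45_bits % (62 ** ALIAS_LEN))
```

The same long URL always yields the same alias, which gives free de-duplication but means two different users shortening the same URL share one alias.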
Datamodel

User
URL(

System Design

User -> Load balancer -> appln server -> URL generating service -> DB

YouTube, Netflix

Requirement

Upload videos
Share and view videos
Search on titles
Add and view comments
Likes/dislikes & total views

REST APIs

Upload, Search, Stream

Highlevel design

Client -> webserver -> appln server -> processing Queue -> encoder to encode into multiple formats -> Thumbnail ->

Database

UserDB
MetadataDB

Metadata for videos
Videos stored in DFS (distributed file system)

DB schema

Video
Comment
User

Low-level details

Thumbnails in Google Bigtable
Videos -> Hadoop Distributed File System (HDFS)

Sharding

Done on video ID rather than user ID

WhatsApp/Facebook Messenger

Requirements

Support one-to-one conversations
Offline/online status
Chat history

Non-Functional Requirements

Low latency
High availability

Database design

Users
UserConversation

Highlevel Design

A chat server can hold, say, 20K connections, so there will be clusters of chat servers
User A sends a message. It goes to the chat server to which User A is assigned
The chat server stores the message in the DB and forwards it to the chat server to which the recipient is assigned.
Use HBase (wide-column database)
Each chat server maintains a hashmap of connected users: HashMap<user, connection>
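A toy sketch of that flow, assuming a shared in-memory registry (user to chat server) and a Python list standing in for both the socket and the HBase message table. The ChatServer class and its method names are hypothetical; a real deployment would look up the recipient's server in a distributed store and push over a live connection.

```python
class ChatServer:
    def __init__(self, name, registry, message_store):
        self.name = name
        self.connections = {}        # HashMap<user, connection>; a list stands in for the socket
        self.registry = registry     # shared: user -> ChatServer the user is connected to
        self.store = message_store   # stand-in for the HBase message table

    def connect(self, user):
        self.connections[user] = []
        self.registry[user] = self

    def disconnect(self, user):
        self.connections.pop(user, None)
        self.registry.pop(user, None)

    def deliver(self, recipient, sender, text):
        self.connections[recipient].append((sender, text))

    def send(self, sender, recipient, text):
        """Persist first, then forward to the chat server the recipient is assigned to."""
        self.store.append((sender, recipient, text))
        target = self.registry.get(recipient)
        if target is None:
            return False             # recipient offline: message stays in chat history
        target.deliver(recipient, sender, text)
        return True
```

Storing before forwarding means an offline recipient still gets the message from chat history when they reconnect.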

Long Polling technique

Twitter/Instagram

Requirements

Post new tweets
Follow others
Post photos/Videos
Timeline

Non-Functional Requirements

Low latency
High availability

REST APIs

POST Tweet()
GET Tweets()

Read Heavy
Client -> Load balancer -> app server -> Aggregator -> DB shards for text & file system for photos (Hadoop Distributed File System), plus a cache

Database design - NoSQL (Mongo or Cassandra)

MySQL

User() -> Data is limited

NoSQL(Mongo or Cassandra) -> ever growing data

tweet(tweetid, userid, content, location)
Friends()

Database Partitioning (Sharding)

By UserID -> May have a disadvantage when a celebrity tweets, due to the many followers
By TweetID -> A user's tweets are distributed across shards; use a cache to overcome the cost of gathering them

Deck Of Cards

4 suits - Diamond, spade, heart, club

Zoom Car (Car Renting)

Consumer
Vehicle
VehicleInventory
VehicleReservation
CarRentalLocations

Online Shopping

Search
Buy
Payment
Notifications

Design Hit Counter

Queue based approach-> Cost is O(n)
Array based approach ->
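The notes leave the array-based approach open. One commonly used version keeps a fixed array of buckets, one per second of the window, so each query costs O(window) no matter how many hits arrived. This sketch assumes a 300-second (5-minute) window and integer-second timestamps; the class name HitCounter is hypothetical.

```python
WINDOW_SECONDS = 300  # assumption: count hits in the last 5 minutes

class HitCounter:
    def __init__(self, window=WINDOW_SECONDS):
        self.window = window
        self.times = [0] * window   # timestamp last written to each bucket
        self.counts = [0] * window  # hits recorded for that timestamp

    def hit(self, timestamp):
        i = timestamp % self.window
        if self.times[i] != timestamp:   # bucket holds an older second: reuse it
            self.times[i] = timestamp
            self.counts[i] = 0
        self.counts[i] += 1

    def get_hits(self, timestamp):
        """Sum the buckets whose timestamp is still inside the window."""
        return sum(c for t, c in zip(self.times, self.counts)
                   if timestamp - t < self.window)
```

Memory is fixed at two arrays of the window size, whereas the queue approach stores one entry per hit.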

Uber

Requirements

Customer

Customer can request a ride
After booking, the customer should be able to see all available drivers
Customer can see driver location

Driver

Can see customer location
Upon reaching destination, driver marks the trip as complete to become available for others

Quad Tree (each internal node has 4 child nodes)

The world map is divided into small grids, each covering a specific range of latitude & longitude
Start with one node that represents the whole world
Since it has more than 500 locations, break it into 4 nodes.
Repeat the process until no node is left with more than 500 locations
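A compact sketch of that splitting rule, assuming points are (lat, long, id) tuples and a rectangular range query is enough for "drivers near me". The class name QuadNode and the MAX_DEPTH guard (which stops endless splitting if many points share the same spot) are hypothetical additions; the capacity defaults to the 500 from the notes but is a parameter so it can be tried with small numbers.

```python
MAX_PER_NODE = 500  # split threshold from the notes above
MAX_DEPTH = 20      # assumption: guard against endless splits when many points coincide

class QuadNode:
    def __init__(self, min_lat, min_lon, max_lat, max_lon, capacity=MAX_PER_NODE, depth=0):
        self.bounds = (min_lat, min_lon, max_lat, max_lon)
        self.capacity = capacity
        self.depth = depth
        self.points = []       # (lat, lon, id) while this node is a leaf
        self.children = None   # four child QuadNodes once split

    def _contains(self, lat, lon):
        min_lat, min_lon, max_lat, max_lon = self.bounds
        return min_lat <= lat <= max_lat and min_lon <= lon <= max_lon

    def insert(self, lat, lon, item_id):
        if not self._contains(lat, lon):
            return False
        if self.children is None:
            self.points.append((lat, lon, item_id))
            if len(self.points) > self.capacity and self.depth < MAX_DEPTH:
                self._split()
            return True
        # any() stops at the first child that accepts the point (shared edges go to one child)
        return any(child.insert(lat, lon, item_id) for child in self.children)

    def _split(self):
        min_lat, min_lon, max_lat, max_lon = self.bounds
        mid_lat, mid_lon = (min_lat + max_lat) / 2, (min_lon + max_lon) / 2
        d = self.depth + 1
        self.children = [
            QuadNode(min_lat, min_lon, mid_lat, mid_lon, self.capacity, d),  # SW
            QuadNode(min_lat, mid_lon, mid_lat, max_lon, self.capacity, d),  # SE
            QuadNode(mid_lat, min_lon, max_lat, mid_lon, self.capacity, d),  # NW
            QuadNode(mid_lat, mid_lon, max_lat, max_lon, self.capacity, d),  # NE
        ]
        points, self.points = self.points, []
        for lat, lon, item_id in points:
            self.insert(lat, lon, item_id)  # re-distribute into the children

    def query(self, min_lat, min_lon, max_lat, max_lon):
        """Return ids of all points inside the query rectangle."""
        n_min_lat, n_min_lon, n_max_lat, n_max_lon = self.bounds
        if n_max_lat < min_lat or n_min_lat > max_lat or n_max_lon < min_lon or n_min_lon > max_lon:
            return []  # this node's grid does not overlap the query
        if self.children is None:
            return [i for lat, lon, i in self.points
                    if min_lat <= lat <= max_lat and min_lon <= lon <= max_lon]
        return [i for c in self.children for i in c.query(min_lat, min_lon, max_lat, max_lon)]
```

Only grids that overlap the query rectangle are visited, which is what makes "find nearby drivers" cheap compared with scanning every location.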

Hashmap as a cache for current locations, copied to the quad tree once a minute
Notification Server -> works on pub-sub model

When a user requests a ride, the system subscribes the user with the notification server.
The notification server communicates with the quad server and stores the user's location in the quad tree
The notification server communicates with the driver as well.
The notification server then communicates the driver's location to the user

LLD

User -> find a ride-> quad server -> quad tree
driver -> update location -> notification server -> hashMap
User-> driver location -> notification server -> hashMap

Gmail

User Service
Mail Service
Authentication service

Instagram(Photo Sharing Service)

Requirements

Upload/download/view photo
Search based on photo title
Follow other users
Feed

DB Schema/Data flow

User(userid, mail, dob, creation_time)
UserPosts(id, userid, date, type, location)
UserFollow(id,

Use NoSQL as it is scalable (Cassandra, MongoDB)
Epoch time -> every timestamp represented in seconds
Sharding

Technique to break up a huge database into smaller parts

Distributed cache

memcache/Redis
appln server -> Load balancer -> Cache1..N
Disadvantage

Rehashing of keys when cache servers are added/removed

Consistent hashing

No need to take the modulus with the count of servers
The whole hash space is arranged as a hash ring
Both servers and keys are hashed onto the ring
When a value is looked up, its key is hashed and the nearest server in the clockwise direction serves it.
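A minimal sketch of the ring described above, using MD5 truncated to 32 bits for positions and a sorted list with binary search for the clockwise lookup. Virtual nodes (several positions per server, here 100) are an addition not mentioned in the notes; they are a common way to even out the load. The class name ConsistentHashRing is hypothetical.

```python
import bisect
import hashlib

def _hash(key):
    """Map a string to a point on the ring (0 .. 2**32 - 1)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16) % (2 ** 32)

class ConsistentHashRing:
    def __init__(self, servers=(), replicas=100):
        self.replicas = replicas     # virtual nodes per server, smooths the distribution
        self._ring = []              # sorted positions on the ring
        self._owner = {}             # position -> server
        for s in servers:
            self.add_server(s)

    def add_server(self, server):
        for i in range(self.replicas):
            pos = _hash(f"{server}#{i}")
            bisect.insort(self._ring, pos)
            self._owner[pos] = server

    def remove_server(self, server):
        for i in range(self.replicas):
            pos = _hash(f"{server}#{i}")
            self._ring.remove(pos)
            del self._owner[pos]

    def get_server(self, key):
        """Walk clockwise from the key's position to the first server (wrapping around)."""
        if not self._ring:
            raise ValueError("ring is empty")
        idx = bisect.bisect_right(self._ring, _hash(key)) % len(self._ring)
        return self._owner[self._ring[idx]]
```

When a cache server is added, only the keys that now fall just before its positions move to it; every other key keeps its old server, which is exactly the rehashing problem the modulo approach has.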

master/slave architecture for backups
Configuration Manager

ZooKeeper

Google Drive/Dropbox

User -> upload file -> appln server -> cloud storage
Metadata DB -> (chunk hash, file, user, device, workspace ID)

Which chunk stores which part

Client -> block server -> cloud storage
Client -> synchronization server -> Metadata DB

Client splits files into smaller chunks
Maintain a workspace in the client appln
Take care of offline clients
Send notifications of any modification on the local client machine
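The chunking step can be sketched as follows, assuming fixed-size 4 MB chunks identified by their SHA-256 hash (the chunk hash stored in the Metadata DB). Only chunks whose hash is not already known need to be sent to the block server. The chunk size and function names are assumptions, not taken from Dropbox or Google Drive.

```python
import hashlib

CHUNK_SIZE = 4 * 1024 * 1024  # assumption: 4 MB chunks

def split_into_chunks(data, chunk_size=CHUNK_SIZE):
    """Split file bytes into fixed-size chunks and hash each one: [(sha256_hex, chunk)]."""
    chunks = []
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        chunks.append((hashlib.sha256(chunk).hexdigest(), chunk))
    return chunks

def chunks_to_upload(known_hashes, chunks):
    """Only chunks whose hash is not yet in the metadata DB go to the block server."""
    return [(h, c) for h, c in chunks if h not in known_hashes]
```

After a small edit, only the changed chunks are uploaded; the synchronization server then updates the file's chunk list in the Metadata DB and notifies the user's other devices.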

Google Maps

Influencers

Weather
Traffic
Road
compute modified route and distance when user changes direction
Quad Tree

EDA (Event Driven Architecture)

Loosely coupled
Realtime analytics
size of architecture
Helps in debugging
keep sequence of events
A mediator component takes events from the event queue and distributes them to different processing units as per the design
EDA model is almost same as Pub-Sub model

Design Pattern

Observer Design pattern

This is also called the Publish-Subscribe design pattern
E.g. YouTube channel or Facebook groups
All the subscribers/group members get a notification when the channel pushes an update.
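A minimal sketch of the YouTube-channel example: the channel (subject) keeps a list of subscribers (observers) and pushes every upload to each of them. Class and method names are illustrative only.

```python
class Channel:
    """Subject: keeps a subscriber list and pushes each new upload to all of them."""

    def __init__(self, name):
        self.name = name
        self._subscribers = []

    def subscribe(self, subscriber):
        self._subscribers.append(subscriber)

    def unsubscribe(self, subscriber):
        self._subscribers.remove(subscriber)

    def upload(self, title):
        for s in self._subscribers:
            s.notify(self.name, title)

class Subscriber:
    """Observer: reacts to notifications from channels it follows."""

    def __init__(self, user):
        self.user = user
        self.inbox = []

    def notify(self, channel, title):
        self.inbox.append(f"{channel}: new video '{title}'")
```

The channel only depends on the notify method, so new kinds of subscribers (email, push, SMS) can be added without changing the channel.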

Chain Of Responsibility

Used for loose coupling.
Handlers in the chain decide among themselves which one will serve the request
Request -> Handler1-> Handler2->Handler3
Eg

Multilevel security layer
ATM cash withdrawal
Logger system

Error, Debug, info logger
Abstract Class Logger

Infoclass, DebugClass, ErrorClass
Each object holds an instance of the next object in the chain and invokes it at the end.
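A sketch of the logger example, assuming integer log levels and a list used as the output sink. In this variant a logger serves the request when the level matches and otherwise passes it on to the next logger; a logger that always forwards at the end (so several handlers can act) is an equally valid variant. The level constants and the format method are hypothetical.

```python
from abc import ABC, abstractmethod

INFO, DEBUG, ERROR = 1, 2, 3

class Logger(ABC):
    """Abstract handler: holds the next logger in the chain."""

    def __init__(self, level, next_logger=None):
        self.level = level
        self.next_logger = next_logger

    def log(self, level, message, sink):
        if level == self.level:
            sink.append(self.format(message))             # this handler serves the request
        elif self.next_logger is not None:
            self.next_logger.log(level, message, sink)    # otherwise pass it down the chain

    @abstractmethod
    def format(self, message):
        ...

class InfoLogger(Logger):
    def format(self, message):
        return f"INFO: {message}"

class DebugLogger(Logger):
    def format(self, message):
        return f"DEBUG: {message}"

class ErrorLogger(Logger):
    def format(self, message):
        return f"ERROR: {message}"

# Request -> InfoLogger -> DebugLogger -> ErrorLogger
chain = InfoLogger(INFO, DebugLogger(DEBUG, ErrorLogger(ERROR)))
```

The caller only knows the head of the chain, so loggers can be added or reordered without touching the code that logs.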

Saturday, 26 June 2021

Realtime/In-memory DB

PERFORMANCE

Read data from hard disk - 63MB/sec
Read from SSD(Solid state disk) - 457MB/sec
Read from RAM - 4671 MB/sec

examples

Key value pairs

Redis
memcache

SQL DB

SQLite

Features of Real time/RAM DB

Expensive as RAM is expensive
Volatile
high performance
AVL tree for indexing

Caching Best practices

Validity of data
High hit rate
Cache miss
TTL

Caching features/estimations

Terabytes of data
50K to 1M QPS
~1 ms latency
LRU(eviction)
100% availability
scalable
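LRU eviction and hit rate from the lists above can be sketched in a few lines, using an OrderedDict to keep keys in least-recently-used order. The class name LRUCache and the hit/miss counters are illustrative; Redis and Memcached implement their own (approximate) LRU internally.

```python
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self._data = OrderedDict()  # least recently used key first
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key not in self._data:
            self.misses += 1        # cache miss: caller falls back to the DB
            return None
        self.hits += 1
        self._data.move_to_end(key) # mark as most recently used
        return self._data[key]

    def put(self, key, value):
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = value
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict the least recently used key

    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

Both get and put are O(1), which matters when the target is tens of thousands to a million queries per second at about 1 ms latency.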

Cache access patterns

Write through

Write goes to both the cache & the DB. The acknowledgement is sent after both writes succeed

Write around

Write happens in DB only. Cache will be updated in subsequent read request.

Write back

Write goes to the cache and is acknowledged immediately. Another service syncs it to the DB asynchronously
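The three write policies can be compared side by side in a toy model where both the cache and the DB are plain dicts. The class name CachedStore and the policy strings are hypothetical. One assumption beyond the notes: write-around also invalidates any cached copy, so the next read does not return stale data.

```python
class CachedStore:
    """Toy cache in front of a 'database'; both are plain dicts here."""

    POLICIES = ("write_through", "write_around", "write_back")

    def __init__(self, policy):
        if policy not in self.POLICIES:
            raise ValueError(f"unknown policy: {policy}")
        self.policy = policy
        self.cache, self.db = {}, {}
        self._dirty = set()  # write-back: keys written to the cache but not yet to the DB

    def write(self, key, value):
        if self.policy == "write_through":
            self.db[key] = value
            self.cache[key] = value          # acknowledge only after both writes
        elif self.policy == "write_around":
            self.db[key] = value
            self.cache.pop(key, None)        # invalidate; the next read repopulates
        else:  # write_back
            self.cache[key] = value          # acknowledge immediately
            self._dirty.add(key)

    def read(self, key):
        if key in self.cache:
            return self.cache[key]
        value = self.db.get(key)
        if value is not None:
            self.cache[key] = value          # populate the cache on a read miss
        return value

    def flush(self):
        """Background sync for write-back: persist dirty keys to the DB."""
        for key in self._dirty:
            self.db[key] = self.cache[key]
        self._dirty.clear()
```

Write-back gives the fastest writes but loses un-flushed data if the cache node dies, which is why it is usually paired with replication.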

Distributed transactions

Two phase commit

Prepare
Commit
Cons - latency due to multiple HTTP calls to microservices
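A minimal sketch of the coordinator logic, with in-process objects standing in for the participating microservices (in a real system each prepare/commit would be one of the HTTP calls mentioned above). The Participant class and its can_commit flag are hypothetical. Coordinator crashes and timeouts are not handled here; that gap is what three-phase commit and sagas try to address.

```python
class Participant:
    """Hypothetical microservice taking part in the distributed transaction."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "init"

    def prepare(self):
        # Phase 1: lock resources / write a local log entry, then vote yes or no
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "aborted"

def two_phase_commit(participants):
    votes = [p.prepare() for p in participants]   # phase 1: prepare
    if all(votes):
        for p in participants:                    # phase 2: commit everywhere
            p.commit()
        return True
    for p in participants:                        # any "no" vote: roll everyone back
        p.rollback()
    return False
```

Every participant holds its locks from prepare until the final decision, which is the source of both the latency and the blocking risk.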

Three phase commit

Recovers in case the coordinator or a participant fails

Saga

Databases

Choice of database - factors

Structured vs non-structured
Query pattern
Scale

Caching DB

Redis
memcache

Image/Video DB

Blob storage

Amazon S3 with CDN

Text based search

Text Search engine

Elasticsearch
Solr

Timeseries database

For metrics monitoring

OpenTSDB

Structured DB

need ACID

rdbms

Oracle, MySQL, SQL Server, Postgres

Document DB/ NOSQL DB

Columnar DB

Cassandra (Apache)
HBase

persists data
Designed to handle large amounts of data across many distributed commodity servers
Good for large amounts of data with a finite number of queries.

Redis
Elastic(Facebook)
mysql

Design Interview tips

High level design or low level design
What functional features to be addressed
What non-functional features to be addressed

#users
scale
Throughput, data size

Start with basic building blocks

how they interact
Get into depth of 1 or 2 components

Is it going in right direction
where does business flow start

Sunday, 6 June 2021

Apache Kafka

Application - Application communication system

A message broker
Publish/Subscribe broker
Keeps log of historic events.
Distributed event ledger/log

Topic

Logical group of events

Kafka is a Warehouse

Topic - is a storage room in the warehouse

Storage room: each topic can have one or more partitions (storage counters)
Partitions are used for concurrent processing
Offset is Current index value for each partition

Partition-> Storage Counter

Offset->
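A toy in-memory model of the warehouse analogy: a topic is a set of partitions, each an append-only list, and an offset is simply a record's index within its partition. This is not the Kafka client API; the Topic and Consumer classes are hypothetical. Kafka's default partitioner hashes the record key with murmur2; crc32 is used here only as a stand-in.

```python
import zlib

class Topic:
    """Toy model of a Kafka topic: each partition is an append-only list."""

    def __init__(self, name, num_partitions):
        self.name = name
        self.partitions = [[] for _ in range(num_partitions)]

    def produce(self, key, value):
        # Same key -> same partition, so records for one key stay in order.
        p = zlib.crc32(key.encode("utf-8")) % len(self.partitions)
        self.partitions[p].append((key, value))
        return p, len(self.partitions[p]) - 1   # (partition, offset)

class Consumer:
    def __init__(self, topic):
        self.topic = topic
        self.offsets = [0] * len(topic.partitions)  # next offset to read, per partition

    def poll(self, partition):
        log = self.topic.partitions[partition]
        records = log[self.offsets[partition]:]
        self.offsets[partition] = len(log)          # "commit" the new position
        return records
```

Because reading does not delete records, another consumer with its own offsets can replay the same history, which is the "distributed event ledger" property noted above.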

HR System

Marketing System

Active Directory

How do legacy systems work

Fetch data from DB OR Webservice at scheduled times using Scheduler

Kafka

Central messaging system
Activity/Application Log
Storing IoT data
System decoupling
Async processing
Distributed Streaming Platform
Messaging system

Publish and Subscribe

Advantages of Messaging System

Topic

A Kafka topic is similar to a queue
Kafka guarantees the order of records within a partition

Avro - data serialization format

Kafka Streams

Easy data processing and transformation library within Kafka

Data transformation
Data enrichment
Fraud detection
Monitoring and Alerting

MovieFlix

Resume the video where the user left it off
Build user profile in real time
Recommend next show in real time
Store all data in analytics store
Show Position

Video Player(consumer) -> Video Position Service -> Kafka
Once in a while Video Player sends position to Video Position Service
Video Position Service sends it to Kafka
Video Player(consumer) -> Resuming Service <- Kafka

Recommendations

We have data about which user watches which show and how far
The recommendation engine, powered by Kafka Streams, takes the show positions, runs a recommendation algorithm and comes up with recommendations
The recommendations are consumed by recommendation service
These recommendations can be consumed by Analytics consumer for Analytical store(Hadoop)

Get Taxi - IOT Example

The user should match with a close by driver
The pricing should "surge" if the number of drivers is low or the number of users is high
All the position data before and during the ride should be stored in an analytics store so that the cost can be computed accurately
User Application -> User Position Service -> Kafka (user_position) Topic
Taxi Driver Application -> Taxi Position Service -> Kafka (taxi_position) Topic
Surge pricing computation model (stream) that consumes user_position & taxi_position, produces surge pricing and places it in the Surge Pricing topic
User Application <- Taxi Cost Service <- Surge Pricing Topic -> Analytics consumer -> Analytics Store

CQRS - MySocialMedia

Command Query Responsibility Segregation
Social media allows people to post images. Others can react using "likes" and "comments". The business wants the following capabilities:

Users should be able to post, like and comment
Users should see the total number of likes and comments per post in real time
High volume of data is expected on the first day of launch
Users should be able to see "trending" posts

User Posts -> Posting Service -> Post Topic
User Likes -> Like/Comment Service -> like topic
User Comments -> Like/Comment Service -> Comments Topic
Total Likes/Comments Computation(Kafka Streams) consumes Posts, likes, Comments and performs some aggregations
website <- Refresh feed service <-Posts_with_counts(TOPIC) <- Total Likes/Comments Computation(Kafka Streams)
website <-Trending Feed Service <- Trending posts(TOPIC)<- Trending Posts in past hour(Kafka Streams)

Finance application - MyBank

MyBank is a company that allows real-time banking for its users. It wants to deploy a brand-new capability to alert users in case of large transactions
Transaction data already exists in a database
Database of Transactions -> Kafka connect source CDC connector -> bank_transaction(Topic)
Users set their threshold in apps -> App Threshold Service -> user_settings(Topic) ->
Real time Big Transaction Detection consumes bank_transaction & user_settings and evaluates the alert to be sent or not
Users see notification in their apps <- Notification service <- user_alerts(Topic)<- Real time Big Transaction Detection
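In Kafka Streams this detection step would typically be a stream-table join: bank_transaction as a stream, user_settings as a table holding each user's latest threshold. The plain-Python simulation below shows only that logic, not the Kafka Streams API; function names and the record shapes are hypothetical.

```python
def latest_settings(settings_events):
    """Build a table from the user_settings stream: later events overwrite earlier ones."""
    table = {}
    for user, threshold in settings_events:
        table[user] = threshold
    return table

def detect_big_transactions(transactions, settings_table):
    """Join each bank_transaction record with the user's threshold; emit alerts above it."""
    alerts = []
    for txn in transactions:
        threshold = settings_table.get(txn["user"])
        if threshold is not None and txn["amount"] > threshold:
            alerts.append({"user": txn["user"], "amount": txn["amount"],
                           "threshold": threshold})   # would be written to user_alerts
    return alerts
```

Users without a configured threshold never get alerts here; whether a default threshold should apply is a product decision the notes leave open.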

Finance application - Big Data Ingestion

It is common to have "generic" connectors or solutions to offload data from Kafka to HDFS, Amazon S3, and ElasticSearch for example
It is also very common to have Kafka serve a "speed layer" for real time applications, while having a "slow layer" which helps with data ingestions into stores for later analytics
Kafka as a front to Big Data Ingestion is a common pattern in Big Data to provide an "ingestion buffer" in front of some stores
Data Producers (Apps, website, Financial Systems, email, Customer Data, databases) -> Kafka -> Spark, Storm, Flink etc -> Real time analytics, Dashboards, Apps, Alerts

Data Producers (Apps, website, Financial Systems, email, Customer Data, databases) -> Kafka -> Hadoop, Amazon S3, RDBMS -> Data Science, Reporting, Audit, Backup/Long term Storage

Finance application - Logging and Metrics Aggregation

One of the first use cases of Kafka was to ingest logs and metrics from various applications
This kind of deployment usually wants high throughput and has fewer restrictions regarding data loss, replication of data etc
Application logs can end up in logging solutions such as Splunk, CloudWatch, ELK..
applications -> application_logs(topic) -> Kafka connect sink -> Splunk
applications -> application_metrics(topic) -> Kafka connect sink -> Splunk

Friday, 4 June 2021

Messaging System - RabbitMQ

Application to application communication system

Open source messaging system

Queuing data in real time

RabbitMQ vs others

Provides web interface and monitoring services
REST api
Built-in user access controls

Kafka vs RabbitMQ

Tuesday, 1 June 2021

Integrations with Spring

What is Spring Integration?

The framework allows you to do the following:

Allows communication between components within your application, based on in-memory messaging. This allows these application components to be loosely coupled with each other, sharing data through message channels.
It allows communication with external systems. You just need to send the information; Spring Integration will handle sending it to the specified external system and bring back a response if necessary. Of course, this works the other way round; Spring Integration will handle incoming calls from the external system to your application.

Benefits

Loose coupling among components.
Event oriented architecture.
The integration logic (handled by the framework) is separated from the business logic.

Integration is exchange of data between systems(both internal & external)
Integration is a common requirement and rapidly increasing
The rise of APIs has pushed this requirement for data exchange further

Designed to support enterprise integration patterns

Traditional ways of integration

Webservices
File transfer
Database sharing

Spring provides a solution for exchanging information through trusted enterprise integration patterns. This helps your application to be robust, flexible & consistent.

Common Implementations

Enterprise Service Bus (ESB)

Bus between applications. Applications are decoupled
Heavy weight

Integration frameworks

Lightweight
Point-to-point connections for simple solutions

Benefits of Spring Integration

Simplify complex enterprise integrations
Loosely couple components
Well defined boundaries between extension points
Separate integration and business logic

Architecture

Lightweight message-driven architecture
Channels, messages & end points for internal communication

Channels do the routing & end point do the operations(Similar to pipes and filters)

When communicating with External systems we use special end points called Adapters or Gateways. These sit on the edge of our applications
Adapters facilitate communication between systems

Available adapters

Data store
File store
HTTP
Mail
Messaging
Twitter
Webservices

Channels

Pollable
Subscribable
MessageChannel (retains messages in a buffer until a subscriber receives them)

PollableChannel
Direct
PublishSubscribe -> one-to-many

QueueChannel
PriorityChannel
DirectChannel

Router

Route messages through channels to end points based on payload type (payload-type-router)
Route based on header value
Route based on recipient list

Filter
Splitter

Split a message into more than one message

Aggregator

Combines multiple messages into single

Bridge

Spring Integration

Integration Drivers

Messaging API

External System Integration

Adapters and gateways

Adapters - one-way communication
Gateways - two-way communication

Filesystem integration

FTP Integration

Transfer files from client to server through an established connection

External System Integration

Filesystem - Log aggregation
ftp -
database
JMS
http