Monday, April 24, 2023

Scaling lessons from building BBDaily.com (Bigbasket.com) and Shiksha.com

In this post, I will talk about a few scaling lessons from my own experience.

Below is a high-level architecture of a web-based application that can scale fairly well up to a few million requests/day.

We accomplished this while scaling out Shiksha.com and BBDaily (Bigbasket.com); every application goes through this cycle.


Architecture 1(VM based local cloud):




Architecture 2 (AWS based):





Let's talk about the lessons learned in the journey:

Scaling FE tier:

1. The front-end tier can be fine-tuned and scaled out horizontally if you have a common Session Management Service in place. You can easily add an ALB (Application Load Balancer) on top of the front-end tier and route requests using any built-in algorithm the ALB provides, such as Round Robin, Weighted Round Robin, or IP hash.

2. Serve static content (JS, images, CSS, fonts, etc.) through a CDN; there are many CDN providers you can consider.

3. While using a CDN, you need to version your static content every time it is updated. This is required so that you don't have to ask your clients to clear their local cache after a new release of your application. We faced this issue multiple times: when versioning was not in place, we had to go to our CDN providers and manually purge the old cache.

4. Static asset versioning is a simple technique: write a small script that creates a new version of each asset and stores the version number in a config/DB/cache. You can easily plug this script into your build/deployment pipeline.

5. You should configure separate static domains (no cookies are needed to fetch static content) for images, CSS, JS, etc. This was a key learning from our FE optimizations, because web browsers limit the number of parallel connections per domain, and also cap the total number of connections.

6. Serving different assets through different domains speeds up content loading, since the browser/client can parallelize the downloads.

7. There is no need to send cookies when fetching static content. This was happening for us because we were serving static content from a sub-domain rather than a dedicated static domain, which unnecessarily inflated the HTTP request size.

8. Use FE profiling tools like Lighthouse, PageSpeed, or Chrome DevTools to analyze and improve performance.
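Point 4 above can be sketched in a few lines. This is a minimal sketch, assuming a local asset directory and a JSON manifest that your templates look up at render time (both names are hypothetical, not what we actually ran):

```python
import hashlib
import json
from pathlib import Path

def build_manifest(asset_dir: Path) -> dict:
    """Map each asset to a content-hashed filename, e.g. app.js -> app.3f2a9c1d.js."""
    manifest = {}
    for asset in sorted(asset_dir.rglob("*")):
        if asset.is_file():
            digest = hashlib.md5(asset.read_bytes()).hexdigest()[:8]
            rel = asset.relative_to(asset_dir)
            manifest[str(rel)] = str(rel.with_name(f"{rel.stem}.{digest}{rel.suffix}"))
    return manifest

def write_manifest(asset_dir: Path, out_path: Path) -> None:
    """Call this from the build/deployment pipeline; the app reads the
    manifest to emit versioned asset URLs, so CDN caches never go stale."""
    out_path.write_text(json.dumps(build_manifest(asset_dir), indent=2))
```

Because the version is derived from the file's content, an unchanged asset keeps its URL (and stays cached), while any change produces a new URL automatically.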

Scaling BE tier:

1. Create separate target groups for different applications, for example: consumer app, OPS app, admin, and crons/batch jobs.

2. Within each target group you can set a different autoscaling policy based on the type of application, for example CPU-, memory-, or disk-IO-intensive workloads, or a combination.

3. Profile and optimize APIs using an APM tool (New Relic, Rakuten SixthSense).

4. Use separate DB replicas for different applications and reports; this is required to scale out read requests. Applications are typically read-heavy compared to writes.

5. Use the managed services of the cloud provider where your application is hosted (AWS, GCP, Azure, etc.). This reduces the overhead of managing infra/DevOps, so you can focus on building features for your customers/business. There is a tradeoff here, but we can discuss that in another post.

6. DB archival/partitioning: you need to do this once your data grows so much that querying/managing it becomes difficult. Consider a cluster-based solution where data is distributed across multiple nodes along with replicas.

7. VividCortex can be used for DB monitoring and query analysis.

8. Use monitoring/alerting tools such as CloudWatch, Opsgenie, Datadog, or Apptuit.
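The autoscaling policies in point 2 come down to simple arithmetic. Here is a rough sketch of the target-tracking idea (keep the average of a metric near a target by scaling capacity proportionally); the function and parameter names are hypothetical, not any cloud provider's API:

```python
import math

def desired_capacity(current_instances: int, observed_metric: float,
                     target_metric: float, min_size: int, max_size: int) -> int:
    """Target-tracking style scaling: if the average metric (e.g. CPU %) is
    above target, add instances proportionally; if below, remove them.
    The result is clamped to the group's [min_size, max_size] bounds."""
    desired = math.ceil(current_instances * observed_metric / target_metric)
    return max(min_size, min(max_size, desired))
```

For example, 4 instances averaging 80% CPU against a 50% target suggests ceil(4 * 80 / 50) = 7 instances; a memory- or disk-IO-bound target group would simply track a different metric with the same formula.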


Here is the detailed tech talk I gave.


Medium: https://medium.com/@aditya0987654321/scaling-lessons-from-building-bbdaily-com-bigbasket-com-and-shiskha-com-35c2afe74c59

Sunday, April 16, 2023

Application/Platform cost optimization series - Part Three

Let's finish this series with an example of a "back-of-the-envelope calculation" (a Fermi problem); this is a useful technique when you are building a new platform from scratch or scaling out an existing one.


Example: let's say you want to build a photo-sharing app; below are the high-level features:


1. User should be able to register with the app
2. Upload a photo/photos
3. Get a sharable link after the photo is uploaded
4. User should be able to view/browse through the uploaded photos

Non-functional requirements (assumptions):
1. Monthly active users ~ 1M
2. Average photo uploads/user/day ~ 10
3. New users/day ~ 1k
4. Average photo size ~ 2MB

Based on the above numbers, let's do some calculations for the user and photo data.

N = number of years you want to hold the data

1. Let's say the average user profile size (basic details and profile image) is 1MB. Total profile DB size = N * 1,000 * 365 * 1MB = N * 365GB
2. Photo storage requirement: with ~33,000 daily active users (1M monthly actives / 30), N * 33,000 * 10 * 365 * 2MB ≈ N * 0.25PB
3. Let's say you want to replicate the data to m replicas

So the total storage requirement would be:
Profile DB size ~ m * N * 365GB
Photo storage ~ m * N * 0.25PB

Similarly, we can calculate the read-to-write request ratio (QPS) and provision our servers based on that.
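The arithmetic above is easy to script so the assumptions can be tweaked. A sketch using decimal units (1 TB = 1,000,000 MB) and the same hypothetical inputs:

```python
def storage_estimate_tb(years: int, replicas: int) -> dict:
    """Back-of-the-envelope storage sizing for the photo-sharing example."""
    new_users_per_day = 1_000
    daily_active_users = 1_000_000 // 30   # ~33k, from 1M monthly actives
    photos_per_user_per_day = 10
    avg_photo_mb = 2
    avg_profile_mb = 1
    mb_per_tb = 1_000_000

    profile_mb = years * new_users_per_day * 365 * avg_profile_mb
    photo_mb = (years * daily_active_users * photos_per_user_per_day
                * 365 * avg_photo_mb)
    return {
        "profile_tb": replicas * profile_mb / mb_per_tb,
        "photos_tb": replicas * photo_mb / mb_per_tb,
    }
```

With N = 1 year and m = 1 replica this gives ~0.37 TB of profile data and ~243 TB (~0.25 PB) of photos, matching the hand calculation above.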

Previous post link (just to connect the dots): http://bitly.ws/D6P2

Happy weekend 

Application/Platform cost optimization series - Part Two

In my last post on cost optimization, we discussed a few approaches. Here is the link: http://bitly.ws/D6Pc

In this post, we will look deeper into a well-known cost estimation technique called "the back-of-the-envelope calculation".

Most modern systems we design have the following major components:
1. Applications -> run on VMs or a cluster like GKE
2. Databases -> SQL, NoSQL, etc.
3. Cache layer -> Redis, Memcached, etc.
4. Message queues -> Kafka, Google Pub/Sub, etc.
5. Networking layer

We need to estimate each of the above components to calculate the actual cost of building/scaling an application.

So how do we do these estimates? What are the prerequisites?

Let's start with the basic refreshers.
Remember "BKMGTP"; this will be used for data size estimations.

B: Byte (base unit)
K: Kilo: Thousand: 1,000 -> 2^10
M: Mega: Million: 1,000,000 -> 2^20
G: Giga: Billion: 1,000,000,000 -> 2^30
T: Tera: Trillion: 1,000,000,000,000 -> 2^40
P: Peta: Quadrillion: 1,000,000,000,000,000 -> 2^50
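A tiny helper makes the ladder concrete; this sketch walks the B/K/M/G/T/P units using the binary (2^10) column above:

```python
def human_size(num_bytes: float) -> str:
    """Render a byte count on the B/K/M/G/T/P ladder using powers of 1024."""
    for unit in ("B", "KB", "MB", "GB", "TB"):
        if num_bytes < 1024:
            return f"{num_bytes:.1f}{unit}"
        num_bytes /= 1024
    return f"{num_bytes:.1f}PB"
```

For estimation purposes the decimal (powers of 1000) and binary (powers of 1024) ladders are interchangeable; just be consistent within one calculation.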

Latency numbers:

1ms -> 10^-3 seconds
1µs -> 10^-6 seconds
1ns -> 10^-9 seconds

L1 cache reference -> 0.5 ns
Branch mispredict -> 5 ns
L2 cache reference -> 7 ns
Mutex lock/unlock -> 100 ns
Main memory reference -> 100 ns
Compress 1 KB with Zippy -> 10 µs
Send 2 KB over a 1 Gbps network -> 20 µs
Read 1 MB sequentially from memory -> 250 µs
Round trip within the same data center -> 500 µs
Disk seek -> 10 ms
Read 1 MB sequentially from the network -> 10 ms
Read 1 MB sequentially from disk -> 30 ms
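These numbers are mainly useful as multipliers in estimates. A quick sketch, hardcoding two figures from the table above:

```python
# Per-MB sequential read costs in microseconds, taken from the table above.
READ_1MB_FROM_MEMORY_US = 250       # 250 µs
READ_1MB_FROM_DISK_US = 30_000      # 30 ms

def sequential_read_seconds(megabytes: float, per_mb_us: float) -> float:
    """Time to stream `megabytes` sequentially at the given per-MB cost."""
    return megabytes * per_mb_us / 1_000_000

# Reading 1 GB (1000 MB) sequentially: ~0.25 s from memory vs ~30 s from disk.
```

That two-orders-of-magnitude gap is exactly why the estimates in this series care about where data lives, not just how much of it there is.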

In the next post we will discuss component-wise estimation; in the meantime, please refresh the above details.

Happy learning :)


Application/Platform cost optimization series - Part One

In the current market scenario, the word "optimization" is very popular.

We can optimize our applications to reduce cost. Usually, everyone likes to build cool products and move on; maintaining a product is not an exciting job. At times it is boring, and it takes a lot of hard work.


So, how do we do it?

The solution lies in setting up the right process during the application design and development.


1. Have a predefined cost estimation template as part of the design.
2. Set up proper monitoring/alerts/observability.
3. Use the monitoring capabilities of whatever cloud platform you use; they provide different ways of setting up alerts based on logs, metrics, etc.
4. Invest some bandwidth in every sprint on fixing optimization tech debt.
5. Use back-of-the-envelope estimation to calculate the approximate cost of running an application.

Happy optimization.

LocalBaniya: Giving Every Small Shop in India a Digital Identity

More: Localbaniya.shop