From bf648502edb5c6ec33d6e7e2d4a07f4edcdecd1a Mon Sep 17 00:00:00 2001
From: J-Atulya
Date: Sat, 12 Apr 2025 23:36:50 +0530
Subject: [PATCH] Added contents for scalability

---
 README.md | 113 ++++++++++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 106 insertions(+), 7 deletions(-)

diff --git a/README.md b/README.md
index 90d2a6ac..ccfe2a35 100644
--- a/README.md
+++ b/README.md
@@ -379,13 +379,112 @@ First, you'll need a basic understanding of common principles, learning about wh
 
 [Scalability Lecture at Harvard](https://www.youtube.com/watch?v=-W9F__D3oY4)
 
-* Topics covered:
-    * Vertical scaling
-    * Horizontal scaling
-    * Caching
-    * Load balancing
-    * Database replication
-    * Database partitioning
+
+## What is Scalability?
+
+**Scalability** is the capability of a system to handle a growing amount of work, or its potential to be enlarged to accommodate that growth. A scalable system maintains or improves its performance as load increases, provided resources are added in proportion.
+
+
+## Vertical Scaling
+
+**Vertical scaling** (scale-up) increases the capacity of a single server by adding more CPU, RAM, or storage.
+
+**Example**: Upgrading your database server from 8 GB to 64 GB of RAM.
+
+**✅ Pros**
+- Easy to implement
+- Usually no application code changes needed
+
+**❌ Cons**
+- Physical hardware limits
+- Downtime may be required
+- Becomes expensive quickly
+
+---
+
+## Horizontal Scaling
+
+**Horizontal scaling** (scale-out) adds more servers to the system and distributes the workload across them.
+
+**Example**: Adding more web servers behind a load balancer.
+
+**✅ Pros**
+- Scales better for large systems
+- Enables high availability and redundancy
+
+**❌ Cons**
+- More complex to manage
+- Works best with stateless servers; any shared state requires coordination
+
+---
+
+## Caching
+
+**Caching** stores copies of frequently accessed data in a faster layer, often memory, so that it can be retrieved quickly and backend load is reduced.
+
+### Common caching types:
+- **Browser cache**: The browser saves website files such as images, CSS, and JavaScript so it doesn't have to download them again on the next visit, which makes pages load faster.
+- **CDN cache**: A Content Delivery Network (CDN) stores copies of your content in many locations around the world. When a user visits your site, the CDN serves the copy closest to them, which loads faster.
+- **Server-side cache**: Caching done on the backend with tools like Redis or Memcached. For example, the result of an expensive (slow or heavy) database query can be kept in memory so the query doesn't need to be repeated.
+
+**✅ Pros**
+- Significantly improves response times
+- Reduces backend load
+
+**❌ Cons**
+- **Stale data**: Cached data can be out of date and no longer match the latest data in the database.
+- **Cache invalidation**: It is hard to know when to delete or update cached data. Do it too soon and you lose the benefit of caching; do it too late and users may see outdated information.
+
+## Load Balancing
+
+**Load balancing** distributes incoming traffic across multiple servers so that no single server is overwhelmed.
+
+### Common strategies:
+- **Round-robin**: Requests are sent to each server in turn, cycling back to the first server after the last.
+- **Least connections**: Each request goes to the server currently handling the fewest active connections, which spreads load more evenly when request durations vary.
+- **IP hashing**: A hash of the client's IP address determines which server handles the request, so the same client is usually routed to the same server.
+
+**✅ Pros**
+- High availability
+- Fault tolerance
+- Enables horizontal scaling
+
+**❌ Cons**
+- The load balancer itself can become a single point of failure (mitigate with redundant balancers)
+
+## 🛢️ Database Replication
+
+**Database replication** copies data from a primary (master) database to one or more replicas (slaves).
+
+### Types:
+- **Master-slave**: Writes go to the master; reads can be served by the replicas
+- **Master-master**: Multiple nodes accept writes, which requires conflict resolution (more complex)
+
+**✅ Pros**
+- Improved read scalability
+- Redundancy and failover support
+
+**❌ Cons**
+- **Replication lag**: Changes take a short time to propagate from the primary to the replicas, so a replica might not have the very latest updates right away.
+- **Consistency issues in write-heavy apps**: If your app writes a lot of data (e.g., saving user actions), the replicas may fall behind, and different servers might return different versions of the data for a short time.
+
+## Database Partitioning ([Sharding](https://learn.microsoft.com/en-us/azure/architecture/patterns/sharding))
+
+**Partitioning** splits a large database into smaller parts. When the parts, called shards, are stored on separate machines, this is known as sharding.
+
+### Types:
+- **Horizontal partitioning**: Splits by rows (e.g., `user_id` ranges); this is the form usually meant by sharding
+- **Vertical partitioning**: Splits by columns (e.g., profile vs. activity data)
+
+**✅ Pros**
+- Improves performance and scaling
+- Avoids overloading a single node
+
+**❌ Cons**
+- Querying across shards is difficult
+- Requires careful shard key design
+- Rebalancing shards can be tricky
+
 
 ### Step 2: Review the scalability article