Update the solutions notebooks and README files to reflect new GitHub repository links for source, ensuring consistency across the documentation.

This commit is contained in:
Ido777
2025-04-30 11:18:28 +03:00
committed by Ido Ophir
parent abe48e6c23
commit 47a065c126
23 changed files with 788 additions and 788 deletions

View File

@@ -1,6 +1,6 @@
# 设计一个网页爬虫
**注意:这个文档中的链接会直接指向[系统设计主题索引](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#系统设计主题的索引)中的有关部分,以避免重复的内容。你可以参考链接的相关内容,来了解其总的要点、方案的权衡取舍以及可选的替代方案。**
**注意:这个文档中的链接会直接指向[系统设计主题索引](https://github.com/ido777/system-design-primer-update/blob/master/README-zh-Hans.md#系统设计主题的索引)中的有关部分,以避免重复的内容。你可以参考链接的相关内容,来了解其总的要点、方案的权衡取舍以及可选的替代方案。**
## 第一步:简述用例与约束条件
@@ -79,7 +79,7 @@
我们将用表 `crawled_links` (已抓取链接 )来记录已经处理过的链接以及相应的页面签名。
我们可以将 `links_to_crawl``crawled_links` 记录在键-值型 **NoSQL 数据库**中。对于 `crawled_links` 中已排序的链接,我们可以使用 [Redis](https://redis.io/) 的有序集合来维护网页链接的排名。我们应当在 [选择 SQL 还是 NoSQL 的问题上,讨论有关使用场景以及利弊 ](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#sql-还是-nosql)。
我们可以将 `links_to_crawl``crawled_links` 记录在键-值型 **NoSQL 数据库**中。对于 `crawled_links` 中已排序的链接,我们可以使用 [Redis](https://redis.io/) 的有序集合来维护网页链接的排名。我们应当在 [选择 SQL 还是 NoSQL 的问题上,讨论有关使用场景以及利弊 ](https://github.com/ido777/system-design-primer-update/blob/master/README-zh-Hans.md#sql-还是-nosql)。
* **爬虫服务**按照以下流程循环处理每一个页面链接:
* 选取排名最靠前的待抓取链接
@@ -209,7 +209,7 @@ class RemoveDuplicateUrls(MRJob):
### 用例:用户输入搜索词后,可以看到相关的搜索结果列表,列表每一项都包含由网页爬虫生成的页面标题及摘要
* **客户端**向运行[反向代理](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#反向代理web-服务器)的 **Web 服务器**发送一个请求
* **客户端**向运行[反向代理](https://github.com/ido777/system-design-primer-update/blob/master/README-zh-Hans.md#反向代理web-服务器)的 **Web 服务器**发送一个请求
* **Web 服务器** 发送请求到 **Query API** 服务器
* **查询 API** 服务将会做这些事情:
* 解析查询参数
@@ -222,7 +222,7 @@ class RemoveDuplicateUrls(MRJob):
* **倒排索引服务**对匹配到的结果进行排名,然后返回最符合的结果
* 使用**文档服务**返回文章标题与摘要
我们使用 [**REST API**](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#表述性状态转移rest) 与客户端通信:
我们使用 [**REST API**](https://github.com/ido777/system-design-primer-update/blob/master/README-zh-Hans.md#表述性状态转移rest) 与客户端通信:
```
$ curl https://search.com/api/v1/search?query=hello+world
@@ -248,7 +248,7 @@ $ curl https://search.com/api/v1/search?query=hello+world
},
```
对于服务器内部通信,我们可以使用 [远程过程调用协议RPC](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#远程过程调用协议rpc)
对于服务器内部通信,我们可以使用 [远程过程调用协议RPC](https://github.com/ido777/system-design-primer-update/blob/master/README-zh-Hans.md#远程过程调用协议rpc)
## 第四步:架构扩展
@@ -265,19 +265,19 @@ $ curl https://search.com/api/v1/search?query=hello+world
我们将会介绍一些组件来完成设计,并解决架构规模扩张问题。内置的负载均衡器将不做讨论以节省篇幅。
**为了避免重复讨论**,请参考[系统设计主题索引](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#系统设计主题的索引)相关部分来了解其要点、方案的权衡取舍以及替代方案。
**为了避免重复讨论**,请参考[系统设计主题索引](https://github.com/ido777/system-design-primer-update/blob/master/README-zh-Hans.md#系统设计主题的索引)相关部分来了解其要点、方案的权衡取舍以及替代方案。
* [DNS](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#域名系统)
* [负载均衡器](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#负载均衡器)
* [水平扩展](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#水平扩展)
* [Web 服务器(反向代理)](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#反向代理web-服务器)
* [API 服务器(应用层)](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#应用层)
* [缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#缓存)
* [NoSQL](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#nosql)
* [一致性模式](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#一致性模式)
* [可用性模式](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#可用性模式)
* [DNS](https://github.com/ido777/system-design-primer-update/blob/master/README-zh-Hans.md#域名系统)
* [负载均衡器](https://github.com/ido777/system-design-primer-update/blob/master/README-zh-Hans.md#负载均衡器)
* [水平扩展](https://github.com/ido777/system-design-primer-update/blob/master/README-zh-Hans.md#水平扩展)
* [Web 服务器(反向代理)](https://github.com/ido777/system-design-primer-update/blob/master/README-zh-Hans.md#反向代理web-服务器)
* [API 服务器(应用层)](https://github.com/ido777/system-design-primer-update/blob/master/README-zh-Hans.md#应用层)
* [缓存](https://github.com/ido777/system-design-primer-update/blob/master/README-zh-Hans.md#缓存)
* [NoSQL](https://github.com/ido777/system-design-primer-update/blob/master/README-zh-Hans.md#nosql)
* [一致性模式](https://github.com/ido777/system-design-primer-update/blob/master/README-zh-Hans.md#一致性模式)
* [可用性模式](https://github.com/ido777/system-design-primer-update/blob/master/README-zh-Hans.md#可用性模式)
有些搜索词非常热门,有些则非常冷门。热门的搜索词可以通过诸如 Redis 或者 Memcached 之类的**内存缓存**来缩短响应时间,避免**倒排索引服务**以及**文档服务**过载。**内存缓存**同样适用于流量分布不均匀以及流量短时高峰问题。从内存中读取 1 MB 连续数据大约需要 250 微秒,而从 SSD 读取同样大小的数据要花费 4 倍的时间,从机械硬盘读取需要花费 80 倍以上的时间。<sup><a href="https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#每个程序员都应该知道的延迟数">1</a></sup>
有些搜索词非常热门,有些则非常冷门。热门的搜索词可以通过诸如 Redis 或者 Memcached 之类的**内存缓存**来缩短响应时间,避免**倒排索引服务**以及**文档服务**过载。**内存缓存**同样适用于流量分布不均匀以及流量短时高峰问题。从内存中读取 1 MB 连续数据大约需要 250 微秒,而从 SSD 读取同样大小的数据要花费 4 倍的时间,从机械硬盘读取需要花费 80 倍以上的时间。<sup><a href="https://github.com/ido777/system-design-primer-update/blob/master/README-zh-Hans.md#每个程序员都应该知道的延迟数">1</a></sup>
以下是优化**爬虫服务**的其他建议:
@@ -285,7 +285,7 @@ $ curl https://search.com/api/v1/search?query=hello+world
* 为了处理数据大小问题以及网络请求负载,**倒排索引服务**和**文档服务**可能需要大量应用数据分片和数据复制。
* DNS 查询可能会成为瓶颈,**爬虫服务**最好专门维护一套定期更新的 DNS 查询服务。
* 借助于[连接池](https://en.wikipedia.org/wiki/Connection_pool),即同时维持多个开放网络连接,可以提升**爬虫服务**的性能并减少内存使用量。
* 改用 [UDP](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#用户数据报协议udp) 协议同样可以提升性能
* 改用 [UDP](https://github.com/ido777/system-design-primer-update/blob/master/README-zh-Hans.md#用户数据报协议udp) 协议同样可以提升性能
* 网络爬虫受带宽影响较大,请确保带宽足够维持高吞吐量。
## 其它要点
@@ -294,61 +294,61 @@ $ curl https://search.com/api/v1/search?query=hello+world
### SQL 扩展模式
* [读取复制](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#主从复制)
* [联合](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#联合)
* [分片](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#分片)
* [非规范化](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#非规范化)
* [SQL 调优](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#sql-调优)
* [读取复制](https://github.com/ido777/system-design-primer-update/blob/master/README-zh-Hans.md#主从复制)
* [联合](https://github.com/ido777/system-design-primer-update/blob/master/README-zh-Hans.md#联合)
* [分片](https://github.com/ido777/system-design-primer-update/blob/master/README-zh-Hans.md#分片)
* [非规范化](https://github.com/ido777/system-design-primer-update/blob/master/README-zh-Hans.md#非规范化)
* [SQL 调优](https://github.com/ido777/system-design-primer-update/blob/master/README-zh-Hans.md#sql-调优)
#### NoSQL
* [键-值存储](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#键-值存储)
* [文档类型存储](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#文档类型存储)
* [列型存储](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#列型存储)
* [图数据库](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#图数据库)
* [SQL vs NoSQL](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#sql-还是-nosql)
* [键-值存储](https://github.com/ido777/system-design-primer-update/blob/master/README-zh-Hans.md#键-值存储)
* [文档类型存储](https://github.com/ido777/system-design-primer-update/blob/master/README-zh-Hans.md#文档类型存储)
* [列型存储](https://github.com/ido777/system-design-primer-update/blob/master/README-zh-Hans.md#列型存储)
* [图数据库](https://github.com/ido777/system-design-primer-update/blob/master/README-zh-Hans.md#图数据库)
* [SQL vs NoSQL](https://github.com/ido777/system-design-primer-update/blob/master/README-zh-Hans.md#sql-还是-nosql)
### 缓存
* 在哪缓存
* [客户端缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#客户端缓存)
* [CDN 缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#cdn-缓存)
* [Web 服务器缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#web-服务器缓存)
* [数据库缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#数据库缓存)
* [应用缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#应用缓存)
* [客户端缓存](https://github.com/ido777/system-design-primer-update/blob/master/README-zh-Hans.md#客户端缓存)
* [CDN 缓存](https://github.com/ido777/system-design-primer-update/blob/master/README-zh-Hans.md#cdn-缓存)
* [Web 服务器缓存](https://github.com/ido777/system-design-primer-update/blob/master/README-zh-Hans.md#web-服务器缓存)
* [数据库缓存](https://github.com/ido777/system-design-primer-update/blob/master/README-zh-Hans.md#数据库缓存)
* [应用缓存](https://github.com/ido777/system-design-primer-update/blob/master/README-zh-Hans.md#应用缓存)
* 什么需要缓存
* [数据库查询级别的缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#数据库查询级别的缓存)
* [对象级别的缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#对象级别的缓存)
* [数据库查询级别的缓存](https://github.com/ido777/system-design-primer-update/blob/master/README-zh-Hans.md#数据库查询级别的缓存)
* [对象级别的缓存](https://github.com/ido777/system-design-primer-update/blob/master/README-zh-Hans.md#对象级别的缓存)
* 何时更新缓存
* [缓存模式](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#缓存模式)
* [直写模式](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#直写模式)
* [回写模式](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#回写模式)
* [刷新](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#刷新)
* [缓存模式](https://github.com/ido777/system-design-primer-update/blob/master/README-zh-Hans.md#缓存模式)
* [直写模式](https://github.com/ido777/system-design-primer-update/blob/master/README-zh-Hans.md#直写模式)
* [回写模式](https://github.com/ido777/system-design-primer-update/blob/master/README-zh-Hans.md#回写模式)
* [刷新](https://github.com/ido777/system-design-primer-update/blob/master/README-zh-Hans.md#刷新)
### 异步与微服务
* [消息队列](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#消息队列)
* [任务队列](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#任务队列)
* [背压](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#背压)
* [微服务](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#微服务)
* [消息队列](https://github.com/ido777/system-design-primer-update/blob/master/README-zh-Hans.md#消息队列)
* [任务队列](https://github.com/ido777/system-design-primer-update/blob/master/README-zh-Hans.md#任务队列)
* [背压](https://github.com/ido777/system-design-primer-update/blob/master/README-zh-Hans.md#背压)
* [微服务](https://github.com/ido777/system-design-primer-update/blob/master/README-zh-Hans.md#微服务)
### 通信
* 可权衡选择的方案:
* 与客户端的外部通信 - [使用 REST 作为 HTTP API](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#表述性状态转移rest)
* 内部通信 - [RPC](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#远程过程调用协议rpc)
* [服务发现](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#服务发现)
* 与客户端的外部通信 - [使用 REST 作为 HTTP API](https://github.com/ido777/system-design-primer-update/blob/master/README-zh-Hans.md#表述性状态转移rest)
* 内部通信 - [RPC](https://github.com/ido777/system-design-primer-update/blob/master/README-zh-Hans.md#远程过程调用协议rpc)
* [服务发现](https://github.com/ido777/system-design-primer-update/blob/master/README-zh-Hans.md#服务发现)
### 安全性
请参阅[安全](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#安全)。
请参阅[安全](https://github.com/ido777/system-design-primer-update/blob/master/README-zh-Hans.md#安全)。
### 延迟数值
请参阅[每个程序员都应该知道的延迟数](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#每个程序员都应该知道的延迟数)。
请参阅[每个程序员都应该知道的延迟数](https://github.com/ido777/system-design-primer-update/blob/master/README-zh-Hans.md#每个程序员都应该知道的延迟数)。
### 持续探讨

View File

@@ -1,6 +1,6 @@
# Design a web crawler
*Note: This document links directly to relevant areas found in the [system design topics](https://github.com/donnemartin/system-design-primer#index-of-system-design-topics) to avoid duplication. Refer to the linked content for general talking points, tradeoffs, and alternatives.*
*Note: This document links directly to relevant areas found in the [system design topics](https://github.com/ido777/system-design-primer-update#index-of-system-design-topics) to avoid duplication. Refer to the linked content for general talking points, tradeoffs, and alternatives.*
## Step 1: Outline use cases and constraints
@@ -81,7 +81,7 @@ We'll assume we have an initial list of `links_to_crawl` ranked initially based
We'll use a table `crawled_links` to store processed links and their page signatures.
We could store `links_to_crawl` and `crawled_links` in a key-value **NoSQL Database**. For the ranked links in `links_to_crawl`, we could use [Redis](https://redis.io/) with sorted sets to maintain a ranking of page links. We should discuss the [use cases and tradeoffs between choosing SQL or NoSQL](https://github.com/donnemartin/system-design-primer#sql-or-nosql).
We could store `links_to_crawl` and `crawled_links` in a key-value **NoSQL Database**. For the ranked links in `links_to_crawl`, we could use [Redis](https://redis.io/) with sorted sets to maintain a ranking of page links. We should discuss the [use cases and tradeoffs between choosing SQL or NoSQL](https://github.com/ido777/system-design-primer-update#sql-or-nosql).
* The **Crawler Service** processes each page link by doing the following in a loop:
* Takes the top ranked page link to crawl
@@ -211,7 +211,7 @@ We might also choose to support a `Robots.txt` file that gives webmasters contro
### Use case: User inputs a search term and sees a list of relevant pages with titles and snippets
* The **Client** sends a request to the **Web Server**, running as a [reverse proxy](https://github.com/donnemartin/system-design-primer#reverse-proxy-web-server)
* The **Client** sends a request to the **Web Server**, running as a [reverse proxy](https://github.com/ido777/system-design-primer-update#reverse-proxy-web-server)
* The **Web Server** forwards the request to the **Query API** server
* The **Query API** server does the following:
* Parses the query
@@ -224,7 +224,7 @@ We might also choose to support a `Robots.txt` file that gives webmasters contro
* The **Reverse Index Service** ranks the matching results and returns the top ones
* Uses the **Document Service** to return titles and snippets
We'll use a public [**REST API**](https://github.com/donnemartin/system-design-primer#representational-state-transfer-rest):
We'll use a public [**REST API**](https://github.com/ido777/system-design-primer-update#representational-state-transfer-rest):
```
$ curl https://search.com/api/v1/search?query=hello+world
@@ -250,7 +250,7 @@ Response:
},
```
For internal communications, we could use [Remote Procedure Calls](https://github.com/donnemartin/system-design-primer#remote-procedure-call-rpc).
For internal communications, we could use [Remote Procedure Calls](https://github.com/ido777/system-design-primer-update#remote-procedure-call-rpc).
## Step 4: Scale the design
@@ -266,26 +266,26 @@ It's important to discuss what bottlenecks you might encounter with the initial
We'll introduce some components to complete the design and to address scalability issues. Internal load balancers are not shown to reduce clutter.
*To avoid repeating discussions*, refer to the following [system design topics](https://github.com/donnemartin/system-design-primer#index-of-system-design-topics) for main talking points, tradeoffs, and alternatives:
*To avoid repeating discussions*, refer to the following [system design topics](https://github.com/ido777/system-design-primer-update#index-of-system-design-topics) for main talking points, tradeoffs, and alternatives:
* [DNS](https://github.com/donnemartin/system-design-primer#domain-name-system)
* [Load balancer](https://github.com/donnemartin/system-design-primer#load-balancer)
* [Horizontal scaling](https://github.com/donnemartin/system-design-primer#horizontal-scaling)
* [Web server (reverse proxy)](https://github.com/donnemartin/system-design-primer#reverse-proxy-web-server)
* [API server (application layer)](https://github.com/donnemartin/system-design-primer#application-layer)
* [Cache](https://github.com/donnemartin/system-design-primer#cache)
* [NoSQL](https://github.com/donnemartin/system-design-primer#nosql)
* [Consistency patterns](https://github.com/donnemartin/system-design-primer#consistency-patterns)
* [Availability patterns](https://github.com/donnemartin/system-design-primer#availability-patterns)
* [DNS](https://github.com/ido777/system-design-primer-update#domain-name-system)
* [Load balancer](https://github.com/ido777/system-design-primer-update#load-balancer)
* [Horizontal scaling](https://github.com/ido777/system-design-primer-update#horizontal-scaling)
* [Web server (reverse proxy)](https://github.com/ido777/system-design-primer-update#reverse-proxy-web-server)
* [API server (application layer)](https://github.com/ido777/system-design-primer-update#application-layer)
* [Cache](https://github.com/ido777/system-design-primer-update#cache)
* [NoSQL](https://github.com/ido777/system-design-primer-update#nosql)
* [Consistency patterns](https://github.com/ido777/system-design-primer-update#consistency-patterns)
* [Availability patterns](https://github.com/ido777/system-design-primer-update#availability-patterns)
Some searches are very popular, while others are only executed once. Popular queries can be served from a **Memory Cache** such as Redis or Memcached to reduce response times and to avoid overloading the **Reverse Index Service** and **Document Service**. The **Memory Cache** is also useful for handling the unevenly distributed traffic and traffic spikes. Reading 1 MB sequentially from memory takes about 250 microseconds, while reading from SSD takes 4x and from disk takes 80x longer.<sup><a href=https://github.com/donnemartin/system-design-primer#latency-numbers-every-programmer-should-know>1</a></sup>
Some searches are very popular, while others are only executed once. Popular queries can be served from a **Memory Cache** such as Redis or Memcached to reduce response times and to avoid overloading the **Reverse Index Service** and **Document Service**. The **Memory Cache** is also useful for handling the unevenly distributed traffic and traffic spikes. Reading 1 MB sequentially from memory takes about 250 microseconds, while reading from SSD takes 4x and from disk takes 80x longer.<sup><a href=https://github.com/ido777/system-design-primer-update#latency-numbers-every-programmer-should-know>1</a></sup>
Below are a few other optimizations to the **Crawling Service**:
* To handle the data size and request load, the **Reverse Index Service** and **Document Service** will likely need to make heavy use sharding and federation.
* DNS lookup can be a bottleneck, the **Crawler Service** can keep its own DNS lookup that is refreshed periodically
* The **Crawler Service** can improve performance and reduce memory usage by keeping many open connections at a time, referred to as [connection pooling](https://en.wikipedia.org/wiki/Connection_pool)
* Switching to [UDP](https://github.com/donnemartin/system-design-primer#user-datagram-protocol-udp) could also boost performance
* Switching to [UDP](https://github.com/ido777/system-design-primer-update#user-datagram-protocol-udp) could also boost performance
* Web crawling is bandwidth intensive, ensure there is enough bandwidth to sustain high throughput
## Additional talking points
@@ -294,58 +294,58 @@ Below are a few other optimizations to the **Crawling Service**:
### SQL scaling patterns
* [Read replicas](https://github.com/donnemartin/system-design-primer#master-slave-replication)
* [Federation](https://github.com/donnemartin/system-design-primer#federation)
* [Sharding](https://github.com/donnemartin/system-design-primer#sharding)
* [Denormalization](https://github.com/donnemartin/system-design-primer#denormalization)
* [SQL Tuning](https://github.com/donnemartin/system-design-primer#sql-tuning)
* [Read replicas](https://github.com/ido777/system-design-primer-update#master-slave-replication)
* [Federation](https://github.com/ido777/system-design-primer-update#federation)
* [Sharding](https://github.com/ido777/system-design-primer-update#sharding)
* [Denormalization](https://github.com/ido777/system-design-primer-update#denormalization)
* [SQL Tuning](https://github.com/ido777/system-design-primer-update#sql-tuning)
#### NoSQL
* [Key-value store](https://github.com/donnemartin/system-design-primer#key-value-store)
* [Document store](https://github.com/donnemartin/system-design-primer#document-store)
* [Wide column store](https://github.com/donnemartin/system-design-primer#wide-column-store)
* [Graph database](https://github.com/donnemartin/system-design-primer#graph-database)
* [SQL vs NoSQL](https://github.com/donnemartin/system-design-primer#sql-or-nosql)
* [Key-value store](https://github.com/ido777/system-design-primer-update#key-value-store)
* [Document store](https://github.com/ido777/system-design-primer-update#document-store)
* [Wide column store](https://github.com/ido777/system-design-primer-update#wide-column-store)
* [Graph database](https://github.com/ido777/system-design-primer-update#graph-database)
* [SQL vs NoSQL](https://github.com/ido777/system-design-primer-update#sql-or-nosql)
### Caching
* Where to cache
* [Client caching](https://github.com/donnemartin/system-design-primer#client-caching)
* [CDN caching](https://github.com/donnemartin/system-design-primer#cdn-caching)
* [Web server caching](https://github.com/donnemartin/system-design-primer#web-server-caching)
* [Database caching](https://github.com/donnemartin/system-design-primer#database-caching)
* [Application caching](https://github.com/donnemartin/system-design-primer#application-caching)
* [Client caching](https://github.com/ido777/system-design-primer-update#client-caching)
* [CDN caching](https://github.com/ido777/system-design-primer-update#cdn-caching)
* [Web server caching](https://github.com/ido777/system-design-primer-update#web-server-caching)
* [Database caching](https://github.com/ido777/system-design-primer-update#database-caching)
* [Application caching](https://github.com/ido777/system-design-primer-update#application-caching)
* What to cache
* [Caching at the database query level](https://github.com/donnemartin/system-design-primer#caching-at-the-database-query-level)
* [Caching at the object level](https://github.com/donnemartin/system-design-primer#caching-at-the-object-level)
* [Caching at the database query level](https://github.com/ido777/system-design-primer-update#caching-at-the-database-query-level)
* [Caching at the object level](https://github.com/ido777/system-design-primer-update#caching-at-the-object-level)
* When to update the cache
* [Cache-aside](https://github.com/donnemartin/system-design-primer#cache-aside)
* [Write-through](https://github.com/donnemartin/system-design-primer#write-through)
* [Write-behind (write-back)](https://github.com/donnemartin/system-design-primer#write-behind-write-back)
* [Refresh ahead](https://github.com/donnemartin/system-design-primer#refresh-ahead)
* [Cache-aside](https://github.com/ido777/system-design-primer-update#cache-aside)
* [Write-through](https://github.com/ido777/system-design-primer-update#write-through)
* [Write-behind (write-back)](https://github.com/ido777/system-design-primer-update#write-behind-write-back)
* [Refresh ahead](https://github.com/ido777/system-design-primer-update#refresh-ahead)
### Asynchronism and microservices
* [Message queues](https://github.com/donnemartin/system-design-primer#message-queues)
* [Task queues](https://github.com/donnemartin/system-design-primer#task-queues)
* [Back pressure](https://github.com/donnemartin/system-design-primer#back-pressure)
* [Microservices](https://github.com/donnemartin/system-design-primer#microservices)
* [Message queues](https://github.com/ido777/system-design-primer-update#message-queues)
* [Task queues](https://github.com/ido777/system-design-primer-update#task-queues)
* [Back pressure](https://github.com/ido777/system-design-primer-update#back-pressure)
* [Microservices](https://github.com/ido777/system-design-primer-update#microservices)
### Communications
* Discuss tradeoffs:
* External communication with clients - [HTTP APIs following REST](https://github.com/donnemartin/system-design-primer#representational-state-transfer-rest)
* Internal communications - [RPC](https://github.com/donnemartin/system-design-primer#remote-procedure-call-rpc)
* [Service discovery](https://github.com/donnemartin/system-design-primer#service-discovery)
* External communication with clients - [HTTP APIs following REST](https://github.com/ido777/system-design-primer-update#representational-state-transfer-rest)
* Internal communications - [RPC](https://github.com/ido777/system-design-primer-update#remote-procedure-call-rpc)
* [Service discovery](https://github.com/ido777/system-design-primer-update#service-discovery)
### Security
Refer to the [security section](https://github.com/donnemartin/system-design-primer#security).
Refer to the [security section](https://github.com/ido777/system-design-primer-update#security).
### Latency numbers
See [Latency numbers every programmer should know](https://github.com/donnemartin/system-design-primer#latency-numbers-every-programmer-should-know).
See [Latency numbers every programmer should know](https://github.com/ido777/system-design-primer-update#latency-numbers-every-programmer-should-know).
### Ongoing