Date: Sun, 12 May 2019 12:21:14 +0200
Subject: [PATCH 18/72] Add Ebook generation script (#207)
---
.gitignore | 1 +
README-ja.md | 60 ++++++++++++++++++++++----------------------
README-zh-Hans.md | 62 +++++++++++++++++++++++-----------------------
README-zh-TW.md | 58 +++++++++++++++++++++----------------------
README.md | 60 ++++++++++++++++++++++----------------------
epub-metadata.yaml | 3 +++
generate-epub.sh | 40 ++++++++++++++++++++++++++++++
7 files changed, 164 insertions(+), 120 deletions(-)
create mode 100644 epub-metadata.yaml
create mode 100755 generate-epub.sh
diff --git a/.gitignore b/.gitignore
index 200a617d..5ca2fa24 100644
--- a/.gitignore
+++ b/.gitignore
@@ -1,4 +1,5 @@
# Byte-compiled / optimized / DLL files
+*.epub
__pycache__/
*.py[cod]
diff --git a/README-ja.md b/README-ja.md
index 2ace2591..077df6ee 100644
--- a/README-ja.md
+++ b/README-ja.md
@@ -3,7 +3,7 @@
# システム設計入門
-
+
@@ -44,7 +44,7 @@
## 暗記カード
-
+
@@ -61,7 +61,7 @@
コード技術面接用の問題を探している場合は[**こちら**](https://github.com/donnemartin/interactive-coding-challenges)
-
+
@@ -91,7 +91,7 @@
> それぞれのセクションはより学びを深めるような他の文献へのリンクが貼られています。
-
+
@@ -436,7 +436,7 @@
### CAP 理論
-
+
Source: CAP theorem revisited
@@ -530,7 +530,7 @@
## ドメインネームシステム
-
+
Source: DNS security presentation
@@ -568,7 +568,7 @@ DNSは少数のオーソライズされたサーバーが上位に位置する
## コンテンツデリバリーネットワーク(Content delivery network)
-
+
Source: Why use a CDN
@@ -609,7 +609,7 @@ CDNを用いてコンテンツを配信することで以下の二つの理由
## ロードバランサー
-
+
Source: Scalable system design patterns
@@ -679,7 +679,7 @@ Layer 7 ロードバランサーは [アプリケーションレイヤー](#通
## リバースプロキシ(webサーバー)
-
+
Source: Wikipedia
@@ -722,7 +722,7 @@ Layer 7 ロードバランサーは [アプリケーションレイヤー](#通
## アプリケーション層
-
+
Source: Intro to architecting systems for scale
@@ -759,7 +759,7 @@ Layer 7 ロードバランサーは [アプリケーションレイヤー](#通
## データベース
-
+
Source: Scaling up to your first 10 million users
@@ -782,7 +782,7 @@ SQLなどのリレーショナルデータベースはテーブルに整理さ
マスターデータベースが読み取りと書き込みを処理し、書き込みを一つ以上のスレーブデータベースに複製します。スレーブデータベースは読み取りのみを処理します。スレーブデータベースは木構造のように追加のスレーブにデータを複製することもできます。マスターデータベースがオフラインになった場合には、いずれかのスレーブがマスターに昇格するか、新しいマスターデータベースが追加されるまでは読み取り専用モードで稼働します。
-
+
Source: Scalability, availability, stability, patterns
@@ -797,7 +797,7 @@ SQLなどのリレーショナルデータベースはテーブルに整理さ
いずれのマスターも読み取り書き込みの両方に対応する。書き込みに関してはそれぞれ協調する。いずれかのマスターが落ちても、システム全体としては読み書き両方に対応したまま運用できる。
-
+
Source: Scalability, availability, stability, patterns
@@ -825,7 +825,7 @@ SQLなどのリレーショナルデータベースはテーブルに整理さ
#### Federation
-
+
Source: Scaling up to your first 10 million users
@@ -846,7 +846,7 @@ SQLなどのリレーショナルデータベースはテーブルに整理さ
#### シャーディング
-
+
Source: Scalability, availability, stability, patterns
@@ -990,7 +990,7 @@ NoSQL は **key-value store**、 **document-store**、 **wide column store**、
#### ワイドカラムストア
-
+
Source: SQL & NoSQL, a brief history
@@ -1013,7 +1013,7 @@ Googleは[Bigtable](http://www.read.seas.harvard.edu/~kohler/class/cs239-w08/cha
#### グラフデータベース
-
+
Source: Graph database
@@ -1041,7 +1041,7 @@ Googleは[Bigtable](http://www.read.seas.harvard.edu/~kohler/class/cs239-w08/cha
### SQLか?NoSQLか?
-
+
Source: Transitioning from RDBMS to NoSQL
@@ -1083,7 +1083,7 @@ NoSQLに適するサンプルデータ:
## キャッシュ
-
+
Source: Scalable system design patterns
@@ -1154,7 +1154,7 @@ Redisはさらに以下のような機能を備えています:
#### キャッシュアサイド
-
+
Source: From cache to in-memory data grid
@@ -1190,7 +1190,7 @@ def get_user(self, user_id):
#### ライトスルー
-
+
Source: Scalability, availability, stability, patterns
@@ -1225,7 +1225,7 @@ def set_user(user_id, values):
#### ライトビハインド (ライトバック)
-
+
Source: Scalability, availability, stability, patterns
@@ -1243,7 +1243,7 @@ def set_user(user_id, values):
#### リフレッシュアヘッド
-
+
Source: From cache to in-memory data grid
@@ -1275,7 +1275,7 @@ def set_user(user_id, values):
## 非同期処理
-
+
Source: Intro to architecting systems for scale
@@ -1321,7 +1321,7 @@ def set_user(user_id, values):
## 通信
-
+
Source: OSI 7 layer model
@@ -1353,7 +1353,7 @@ HTTPは**TCP** や **UDP** などの低級プロトコルに依存している
### 伝送制御プロトコル (TCP)
-
+
Source: How to make a multiplayer game
@@ -1377,7 +1377,7 @@ TCPは高い依存性を要し、時間制約が厳しくないものに適し
### ユーザデータグラムプロトコル (UDP)
-
+
Source: How to make a multiplayer game
@@ -1406,7 +1406,7 @@ TCPよりもUDPを使うのは:
### 遠隔手続呼出 (RPC)
-
+
Source: Crack the system design interview
@@ -1602,7 +1602,7 @@ Notes
| 質問 | 解答 |
|---|---|
| Dropboxのようなファイル同期サービスを設計する | [youtube.com](https://www.youtube.com/watch?v=PE4gwstWhmc) |
-| Googleのような検索エンジンの設計 | [queue.acm.org](http://queue.acm.org/detail.cfm?id=988407)<br/>[stackexchange.com](http://programmers.stackexchange.com/questions/38324/interview-question-how-would-you-implement-google-search)<br/>[ardendertat.com](http://www.ardendertat.com/2012/01/11/implementing-search-engines/)<br/>[stanford.edu](http://infolab.stanford.edu/~backrub/google.html) |
+| Googleのような検索エンジンの設計 | [queue.acm.org](http://queue.acm.org/detail.cfm?id=988407)<br/>[stackexchange.com](http://programmers.stackexchange.com/questions/38324/interview-question-how-would-you-implement-google-search)<br/>[ardendertat.com](http://www.ardendertat.com/2012/01/11/implementing-search-engines/)<br/>[stanford.edu](http://infolab.stanford.edu/~backrub/google.html) |
| Googleのようなスケーラブルなwebクローラーの設計 | [quora.com](https://www.quora.com/How-can-I-build-a-web-crawler-from-scratch) |
| Google docsの設計 | [code.google.com](https://code.google.com/p/google-mobwrite/)<br/>[neil.fraser.name](https://neil.fraser.name/writing/sync/) |
| Redisのようなキーバリューストアの設計 | [slideshare.net](http://www.slideshare.net/dvirsky/introduction-to-redis) |
@@ -1629,7 +1629,7 @@ Notes
> 世の中のシステムがどのように設計されているかについての記事
-
+
Source: Twitter timelines at scale
diff --git a/README-zh-Hans.md b/README-zh-Hans.md
index c8847c30..b71b0531 100644
--- a/README-zh-Hans.md
+++ b/README-zh-Hans.md
@@ -6,7 +6,7 @@
# 系统设计入门
-
+
@@ -55,7 +55,7 @@
## 抽认卡
-
+
@@ -72,7 +72,7 @@
你正在寻找资源以准备[**编程面试**](https://github.com/donnemartin/interactive-coding-challenges)吗?
-
+
@@ -102,7 +102,7 @@
-
+
@@ -446,7 +446,7 @@
### CAP 理论
-
+
来源:再看 CAP 理论
@@ -541,7 +541,7 @@ DNS 和 email 等系统使用的是此种方式。最终一致性在高可用性
## 域名系统
-
+
来源:DNS 安全介绍
@@ -579,7 +579,7 @@ DNS 和 email 等系统使用的是此种方式。最终一致性在高可用性
## 内容分发网络(CDN)
-
+
来源:为什么使用 CDN
@@ -618,7 +618,7 @@ CDN 拉取是当第一个用户请求该资源时,从服务器上拉取资源
## 负载均衡器
-
+
来源:可扩展的系统设计模式
@@ -687,7 +687,7 @@ CDN 拉取是当第一个用户请求该资源时,从服务器上拉取资源
## 反向代理(web 服务器)
-
+
资料来源:维基百科
@@ -731,7 +731,7 @@ CDN 拉取是当第一个用户请求该资源时,从服务器上拉取资源
## 应用层
-
+
资料来源:可缩放系统构架介绍
@@ -769,7 +769,7 @@ CDN 拉取是当第一个用户请求该资源时,从服务器上拉取资源
## 数据库
-
+
资料来源:扩展你的用户数到第一个一千万
@@ -790,7 +790,7 @@ CDN 拉取是当第一个用户请求该资源时,从服务器上拉取资源
关系型数据库扩展包括许多技术:**主从复制**、**主主复制**、**联合**、**分片**、**非规范化**和 **SQL调优**。
-
+
资料来源:可扩展性、可用性、稳定性、模式
@@ -805,7 +805,7 @@ CDN 拉取是当第一个用户请求该资源时,从服务器上拉取资源
- 参考[不利之处:复制](#不利之处复制)中,主从复制和主主复制**共同**的问题。
-
+
资料来源:可扩展性、可用性、稳定性、模式
@@ -840,7 +840,7 @@ CDN 拉取是当第一个用户请求该资源时,从服务器上拉取资源
#### 联合
-
+
资料来源:扩展你的用户数到第一个一千万
@@ -862,7 +862,7 @@ CDN 拉取是当第一个用户请求该资源时,从服务器上拉取资源
#### 分片
-
+
资料来源:可扩展性、可用性、稳定性、模式
@@ -1006,7 +1006,7 @@ MongoDB 和 CouchDB 等一些文档类型存储还提供了类似 SQL 语言的
#### 列型存储
-
+
资料来源: SQL 和 NoSQL,一个简短的历史
@@ -1029,9 +1029,9 @@ Google 发布了第一个列型存储数据库 [Bigtable](http://www.read.seas.h
#### 图数据库
-
+
- 资料来源:图数据库
+ 资料来源:图数据库
> 抽象模型: 图
@@ -1056,7 +1056,7 @@ Google 发布了第一个列型存储数据库 [Bigtable](http://www.read.seas.h
### SQL 还是 NoSQL
-
+
资料来源:从 RDBMS 转换到 NoSQL
@@ -1097,7 +1097,7 @@ Google 发布了第一个列型存储数据库 [Bigtable](http://www.read.seas.h
## 缓存
-
+
资料来源:可扩展的系统设计模式
@@ -1168,7 +1168,7 @@ Redis 有下列附加功能:
#### 缓存模式
-
+
资料来源:从缓存到内存数据网格
@@ -1204,7 +1204,7 @@ def get_user(self, user_id):
#### 直写模式
-
+
资料来源:可扩展性、可用性、稳定性、模式
@@ -1239,7 +1239,7 @@ def set_user(user_id, values):
#### 回写模式
-
+
资料来源:可扩展性、可用性、稳定性、模式
@@ -1257,7 +1257,7 @@ def set_user(user_id, values):
#### 刷新
-
+
资料来源:从缓存到内存数据网格
@@ -1289,7 +1289,7 @@ def set_user(user_id, values):
## 异步
-
+
资料来源:可缩放系统构架介绍
@@ -1335,7 +1335,7 @@ def set_user(user_id, values):
## 通讯
-
+
资料来源:OSI 7层模型
@@ -1370,7 +1370,7 @@ HTTP 是依赖于较低级协议(如 **TCP** 和 **UDP**)的应用层协议
### 传输控制协议(TCP)
-
+
资料来源:如何制作多人游戏
@@ -1394,7 +1394,7 @@ TCP 对于需要高可靠性但时间紧迫的应用程序很有用。比如包
### 用户数据报协议(UDP)
-
+
资料来源:如何制作多人游戏
@@ -1423,7 +1423,7 @@ UDP 可靠性更低但适合用在网络电话、视频聊天,流媒体和实
### 远程过程调用协议(RPC)
-
+
Source: Crack the system design interview
@@ -1618,7 +1618,7 @@ Notes
| 问题 | 引用 |
| ----------------------- | ---------------------------------------- |
| 设计类似于 Dropbox 的文件同步服务 | [youtube.com](https://www.youtube.com/watch?v=PE4gwstWhmc) |
-| 设计类似于 Google 的搜索引擎 | [queue.acm.org](http://queue.acm.org/detail.cfm?id=988407)<br/>[stackexchange.com](http://programmers.stackexchange.com/questions/38324/interview-question-how-would-you-implement-google-search)<br/>[ardendertat.com](http://www.ardendertat.com/2012/01/11/implementing-search-engines/)<br/>[stanford.edu](http://infolab.stanford.edu/~backrub/google.html) |
+| 设计类似于 Google 的搜索引擎 | [queue.acm.org](http://queue.acm.org/detail.cfm?id=988407)<br/>[stackexchange.com](http://programmers.stackexchange.com/questions/38324/interview-question-how-would-you-implement-google-search)<br/>[ardendertat.com](http://www.ardendertat.com/2012/01/11/implementing-search-engines/)<br/>[stanford.edu](http://infolab.stanford.edu/~backrub/google.html) |
| 设计类似于 Google 的可扩展网络爬虫 | [quora.com](https://www.quora.com/How-can-I-build-a-web-crawler-from-scratch) |
| 设计 Google 文档 | [code.google.com](https://code.google.com/p/google-mobwrite/)<br/>[neil.fraser.name](https://neil.fraser.name/writing/sync/) |
| 设计类似 Redis 的键值存储 | [slideshare.net](http://www.slideshare.net/dvirsky/introduction-to-redis) |
@@ -1645,7 +1645,7 @@ Notes
> 关于现实中真实的系统是怎么设计的文章。
-
+
Source: Twitter timelines at scale
diff --git a/README-zh-TW.md b/README-zh-TW.md
index 3c18f934..e0b6c7b4 100644
--- a/README-zh-TW.md
+++ b/README-zh-TW.md
@@ -3,7 +3,7 @@
# 系統設計入門
-
+
@@ -44,7 +44,7 @@
## 學習單字卡
-
+
@@ -61,7 +61,7 @@
你正在尋找資源來面對[**程式語言面試**](https://github.com/donnemartin/interactive-coding-challenges)嗎?
-
+
@@ -91,7 +91,7 @@
> 每一章節都包含更深入資源的連結。
-
+
@@ -435,7 +435,7 @@
### CAP 理論
-
+
來源:再看 CAP 理論
@@ -529,7 +529,7 @@ DNS 或是電子郵件系統使用的就是這種方式,最終一致性在高
## 域名系統
-
+
資料來源:DNS 安全介紹
@@ -567,7 +567,7 @@ DNS 是階層式的架構,一部分的 DNS 伺服器位於頂層,當查詢
## 內容傳遞網路(CDN)
-
+
來源:為什麼要使用 CDN
@@ -608,7 +608,7 @@ DNS 是階層式的架構,一部分的 DNS 伺服器位於頂層,當查詢
## 負載平衡器
-
+
來源:可擴展的系統設計模式
@@ -678,7 +678,7 @@ DNS 是階層式的架構,一部分的 DNS 伺服器位於頂層,當查詢
## 反向代理(網頁伺服器)
-
+
來源:維基百科
@@ -721,7 +721,7 @@ DNS 是階層式的架構,一部分的 DNS 伺服器位於頂層,當查詢
## 應用層
-
+
資料來源:可縮放式系統架構介紹
@@ -758,7 +758,7 @@ DNS 是階層式的架構,一部分的 DNS 伺服器位於頂層,當查詢
## 資料庫
-
+
來源:擴展你的使用者數量到第一個一千萬量級
@@ -781,7 +781,7 @@ DNS 是階層式的架構,一部分的 DNS 伺服器位於頂層,當查詢
主資料庫負責讀和寫,並且將寫入的資料複寫至一或多個從屬資料庫中,從屬資料庫只負責讀取。而從屬資料庫可以再將寫入複製到更多以樹狀結構的其他資料庫中。如果主資料庫離線了,系統可以以只讀模式運行,直到某個從屬資料庫被提升為主資料庫,或有新的主資料庫出現。
-
+
來源: 可擴展性、可用性、穩定性及其模式
@@ -796,7 +796,7 @@ DNS 是階層式的架構,一部分的 DNS 伺服器位於頂層,當查詢
兩個主要的資料庫都負責讀取和寫入,並且兩者互相協調。如果其中一個主要資料庫離線,系統可以繼續運作。
-
+
來源: 可擴展性、可用性、穩定性及其模式
@@ -824,7 +824,7 @@ DNS 是階層式的架構,一部分的 DNS 伺服器位於頂層,當查詢
#### 聯邦式資料庫
-
+
來源:擴展你的使用者數量到第一個一千萬量級
@@ -845,7 +845,7 @@ DNS 是階層式的架構,一部分的 DNS 伺服器位於頂層,當查詢
#### 分片
-
+
來源: 可擴展性、可用性、穩定性及其模式
@@ -991,7 +991,7 @@ NoSQL 指的是 **鍵-值對的資料庫**、**文件類型資料庫**、**列
#### 列儲存型資料庫
-
+
來源:SQL 和 NoSQL,簡短的歷史介紹
@@ -1014,7 +1014,7 @@ Google 發表了第一個列儲存型資料庫 [Bigtable](http://www.read.seas.h
#### 圖形資料庫
-
+
來源: 圖形化資料庫
@@ -1042,7 +1042,7 @@ Google 發表了第一個列儲存型資料庫 [Bigtable](http://www.read.seas.h
### SQL 或 NoSQL
-
+
來源:從 RDBMS 轉換到 NoSQL
@@ -1084,7 +1084,7 @@ Google 發表了第一個列儲存型資料庫 [Bigtable](http://www.read.seas.h
## 快取
-
+
來源:可擴展的系統設計模式
@@ -1155,7 +1155,7 @@ Redis 還有以下額外的功能:
#### 快取模式
-
+
資料來源:從快取到記憶體資料網格
@@ -1191,7 +1191,7 @@ def get_user(self, user_id):
#### 寫入模式
-
+
資料來源:可獲展性、可用性、穩定性與模式
@@ -1226,7 +1226,7 @@ def set_user(user_id, values):
#### 事後寫入(回寫)
-
+
資料來源:可獲展性、可用性、穩定性與模式
@@ -1244,7 +1244,7 @@ def set_user(user_id, values):
#### 更新式快取
-
+
來源:從快取到記憶體資料網格技術
@@ -1276,7 +1276,7 @@ def set_user(user_id, values):
## 非同步機制
-
+
資料來源:可縮放性系統架構介紹
@@ -1322,7 +1322,7 @@ def set_user(user_id, values):
## 通訊
-
+
來源:OSI 七層模型
@@ -1354,7 +1354,7 @@ HTTP 是依賴於較底層的協議(例如:**TCP** 和 **UDP**) 的應用層
### 傳輸控制通訊協定(TCP)
-
+
來源:如何開發多人遊戲
@@ -1378,7 +1378,7 @@ TCP 對於需要高可靠、低時間急迫性的應用來說很有用,比如
### 使用者資料流通訊協定 (UDP)
-
+
資料來源:如何製作多人遊戲
@@ -1407,7 +1407,7 @@ UDP 的可靠性較低,但適合用在像是網路電話、視訊聊天、串
### 遠端程式呼叫 (RPC)
-
+
資料來源:破解系統設計面試
@@ -1630,7 +1630,7 @@ Notes
> 底下是關於真實世界的系統架構是如何設計的文章
-
+
資料來源:可擴展式的 Twitter 時間軸設計
diff --git a/README.md b/README.md
index 9982658a..54edffa1 100644
--- a/README.md
+++ b/README.md
@@ -3,7 +3,7 @@
# The System Design Primer
-
+
@@ -44,7 +44,7 @@ Additional topics for interview prep:
## Anki flashcards
-
+
@@ -61,7 +61,7 @@ Great for use while on-the-go.
Looking for resources to help you prep for the [**Coding Interview**](https://github.com/donnemartin/interactive-coding-challenges)?
-
+
@@ -91,7 +91,7 @@ Review the [Contributing Guidelines](CONTRIBUTING.md).
> Each section contains links to more in-depth resources.
-
+
@@ -436,7 +436,7 @@ Generally, you should aim for **maximal throughput** with **acceptable latency**
### CAP theorem
-
+
Source: CAP theorem revisited
@@ -530,7 +530,7 @@ This topic is further discussed in the [Database](#database) section:
## Domain name system
-
+
Source: DNS security presentation
@@ -568,7 +568,7 @@ Services such as [CloudFlare](https://www.cloudflare.com/dns/) and [Route 53](ht
## Content delivery network
-
+
Source: Why use a CDN
@@ -609,7 +609,7 @@ Sites with heavy traffic work well with pull CDNs, as traffic is spread out more
## Load balancer
-
+
Source: Scalable system design patterns
@@ -679,7 +679,7 @@ Load balancers can also help with horizontal scaling, improving performance and
## Reverse proxy (web server)
-
+
Source: Wikipedia
@@ -722,7 +722,7 @@ Additional benefits include:
## Application layer
-
+
Source: Intro to architecting systems for scale
@@ -757,7 +757,7 @@ Systems such as [Consul](https://www.consul.io/docs/index.html), [Etcd](https://
## Database
-
+
Source: Scaling up to your first 10 million users
@@ -780,7 +780,7 @@ There are many techniques to scale a relational database: **master-slave replica
The master serves reads and writes, replicating writes to one or more slaves, which serve only reads. Slaves can also replicate to additional slaves in a tree-like fashion. If the master goes offline, the system can continue to operate in read-only mode until a slave is promoted to a master or a new master is provisioned.
-
+
Source: Scalability, availability, stability, patterns
@@ -795,7 +795,7 @@ The master serves reads and writes, replicating writes to one or more slaves, wh
Both masters serve reads and writes and coordinate with each other on writes. If either master goes down, the system can continue to operate with both reads and writes.
-
+
Source: Scalability, availability, stability, patterns
@@ -823,7 +823,7 @@ Both masters serve reads and writes and coordinate with each other on writes. I
#### Federation
-
+
Source: Scaling up to your first 10 million users
@@ -844,7 +844,7 @@ Federation (or functional partitioning) splits up databases by function. For ex
#### Sharding
-
+
Source: Scalability, availability, stability, patterns
@@ -988,7 +988,7 @@ Document stores provide high flexibility and are often used for working with occ
#### Wide column store
-
+
Source: SQL & NoSQL, a brief history
@@ -1011,7 +1011,7 @@ Wide column stores offer high availability and high scalability. They are often
#### Graph database
-
+
Source: Graph database
@@ -1039,7 +1039,7 @@ Graphs databases offer high performance for data models with complex relationshi
### SQL or NoSQL
-
+
Source: Transitioning from RDBMS to NoSQL
@@ -1081,7 +1081,7 @@ Sample data well-suited for NoSQL:
## Cache
-
+
Source: Scalable system design patterns
@@ -1152,7 +1152,7 @@ Since you can only store a limited amount of data in cache, you'll need to deter
#### Cache-aside
-
+
Source: From cache to in-memory data grid
@@ -1188,7 +1188,7 @@ Subsequent reads of data added to cache are fast. Cache-aside is also referred
#### Write-through
-
+
Source: Scalability, availability, stability, patterns
@@ -1223,7 +1223,7 @@ Write-through is a slow overall operation due to the write operation, but subseq
#### Write-behind (write-back)
-
+
Source: Scalability, availability, stability, patterns
@@ -1241,7 +1241,7 @@ In write-behind, the application does the following:
#### Refresh-ahead
-
+
Source: From cache to in-memory data grid
@@ -1273,7 +1273,7 @@ Refresh-ahead can result in reduced latency vs read-through if the cache can acc
## Asynchronism
-
+
Source: Intro to architecting systems for scale
@@ -1319,7 +1319,7 @@ If queues start to grow significantly, the queue size can become larger than mem
## Communication
-
+
Source: OSI 7 layer model
@@ -1351,7 +1351,7 @@ HTTP is an application layer protocol relying on lower-level protocols such as *
### Transmission control protocol (TCP)
-
+
Source: How to make a multiplayer game
@@ -1375,7 +1375,7 @@ Use TCP over UDP when:
### User datagram protocol (UDP)
-
+
Source: How to make a multiplayer game
@@ -1404,7 +1404,7 @@ Use UDP over TCP when:
### Remote procedure call (RPC)
-
+
Source: Crack the system design interview
@@ -1600,7 +1600,7 @@ Handy metrics based on numbers above:
| Question | Reference(s) |
|---|---|
| Design a file sync service like Dropbox | [youtube.com](https://www.youtube.com/watch?v=PE4gwstWhmc) |
-| Design a search engine like Google | [queue.acm.org](http://queue.acm.org/detail.cfm?id=988407)<br/>[stackexchange.com](http://programmers.stackexchange.com/questions/38324/interview-question-how-would-you-implement-google-search)<br/>[ardendertat.com](http://www.ardendertat.com/2012/01/11/implementing-search-engines/)<br/>[stanford.edu](http://infolab.stanford.edu/~backrub/google.html) |
+| Design a search engine like Google | [queue.acm.org](http://queue.acm.org/detail.cfm?id=988407)<br/>[stackexchange.com](http://programmers.stackexchange.com/questions/38324/interview-question-how-would-you-implement-google-search)<br/>[ardendertat.com](http://www.ardendertat.com/2012/01/11/implementing-search-engines/)<br/>[stanford.edu](http://infolab.stanford.edu/~backrub/google.html) |
| Design a scalable web crawler like Google | [quora.com](https://www.quora.com/How-can-I-build-a-web-crawler-from-scratch) |
| Design Google docs | [code.google.com](https://code.google.com/p/google-mobwrite/)<br/>[neil.fraser.name](https://neil.fraser.name/writing/sync/) |
| Design a key-value store like Redis | [slideshare.net](http://www.slideshare.net/dvirsky/introduction-to-redis) |
@@ -1628,7 +1628,7 @@ Handy metrics based on numbers above:
> Articles on how real world systems are designed.
-
+
Source: Twitter timelines at scale
diff --git a/epub-metadata.yaml b/epub-metadata.yaml
new file mode 100644
index 00000000..f4b296ba
--- /dev/null
+++ b/epub-metadata.yaml
@@ -0,0 +1,3 @@
+title: System Design Primer
+creator: Donne Martin
+date: 2018
\ No newline at end of file
diff --git a/generate-epub.sh b/generate-epub.sh
new file mode 100755
index 00000000..d7c21241
--- /dev/null
+++ b/generate-epub.sh
@@ -0,0 +1,40 @@
+#! /usr/bin/env sh
+
+generate_from_stdin() {
+ outfile=$1
+ language=$2
+
+ echo "Generating '$language' ..."
+
+ pandoc --metadata-file=epub-metadata.yaml --metadata=lang:$2 --from=markdown -o $1 <&0
+
+ echo "Done! You can find the '$language' book at ./$outfile"
+}
+
+generate_with_solutions () {
+ tmpfile=$(mktemp /tmp/sytem-design-primer-epub-generator.XXX)
+
+ cat ./README.md >> $tmpfile
+
+ for dir in ./solutions/system_design/*; do
+ case $dir in *template*) continue;; esac
+ case $dir in *__init__.py*) continue;; esac
+ : [[ -d "$dir" ]] && ( cd "$dir" && cat ./README.md >> $tmpfile && echo "" >> $tmpfile )
+ done
+
+ cat $tmpfile | generate_from_stdin 'README.epub' 'en'
+
+ rm "$tmpfile"
+}
+
+generate () {
+ name=$1
+ language=$2
+
+ cat $name.md | generate_from_stdin $name.epub $language
+}
+
+generate_with_solutions
+generate README-ja ja
+generate README-zh-Hans zh-Hans
+generate README-zh-TW zh-TW
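
For reference: assuming `pandoc` and its EPUB writer are available on the `PATH`, running `sh generate-epub.sh` from the repository root should produce `README.epub` (the English guide with the solution write-ups appended), plus `README-ja.epub`, `README-zh-Hans.epub`, and `README-zh-TW.epub`; the new `*.epub` entry in `.gitignore` keeps these generated files out of version control.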
From dd15249b65cdfe907706a4b87252af5a0737aab6 Mon Sep 17 00:00:00 2001
From: minhaz
Date: Sun, 12 May 2019 16:49:23 +0530
Subject: [PATCH 19/72] Add availability in numbers section (#237)
---
README.md | 47 +++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 47 insertions(+)
diff --git a/README.md b/README.md
index 54edffa1..5e068e7d 100644
--- a/README.md
+++ b/README.md
@@ -112,6 +112,7 @@ Review the [Contributing Guidelines](CONTRIBUTING.md).
* [Availability patterns](#availability-patterns)
* [Fail-over](#fail-over)
* [Replication](#replication)
+ * [Availability in numbers](#availability-in-numbers)
* [Domain name system](#domain-name-system)
* [Content delivery network](#content-delivery-network)
* [Push CDNs](#push-cdns)
@@ -527,6 +528,52 @@ This topic is further discussed in the [Database](#database) section:
* [Master-slave replication](#master-slave-replication)
* [Master-master replication](#master-master-replication)
+### Availability in numbers
+
+Availability is often quantified by uptime (or downtime) as a percentage of time the service is available. Availability is generally measured in number of 9s--a service with 99.99% availability is described as having four 9s.
+
+#### 99.9% availability - three 9s
+
+| Duration | Acceptable downtime|
+|---------------------|--------------------|
+| Downtime per year | 8h 45min 57s |
+| Downtime per month | 43m 49.7s |
+| Downtime per week | 10m 4.8s |
+| Downtime per day | 1m 26.4s |
+
+#### 99.99% availability - four 9s
+
+| Duration | Acceptable downtime|
+|---------------------|--------------------|
+| Downtime per year | 52min 35.7s |
+| Downtime per month | 4m 23s |
+| Downtime per week | 1m 5s |
+| Downtime per day | 8.6s |
+
+#### Availability in parallel vs in sequence
+
+If a service consists of multiple components prone to failure, the service's overall availability depends on whether the components are in sequence or in parallel.
+
+###### In sequence
+
+Overall availability decreases when two components with availability < 100% are in sequence:
+
+```
+Availability (Total) = Availability (Foo) * Availability (Bar)
+```
+
+If both `Foo` and `Bar` each had 99.9% availability, their total availability in sequence would be 99.8%.
+
+###### In parallel
+
+Overall availability increases when two components with availability < 100% are in parallel:
+
+```
+Availability (Total) = 1 - (1 - Availability (Foo)) * (1 - Availability (Bar))
+```
+
+If both `Foo` and `Bar` each had 99.9% availability, their total availability in parallel would be 99.9999%.
+
## Domain name system
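
For reference, the sequence and parallel formulas above are easy to sanity-check numerically. The following is a minimal Python sketch; the function names and the Gregorian-year constant are illustrative assumptions, not part of the patch:

```python
# Sanity check for the availability formulas above (illustrative only).

def downtime_per_year(availability):
    """Yearly downtime in hours for a given availability (0..1)."""
    hours_per_year = 365.2425 * 24  # Gregorian year, matches the 8h 45min 57s figure
    return (1 - availability) * hours_per_year

def in_sequence(*availabilities):
    """Overall availability when components are in sequence (all must be up)."""
    total = 1.0
    for a in availabilities:
        total *= a
    return total

def in_parallel(*availabilities):
    """Overall availability when components are in parallel (any one suffices)."""
    combined_downtime = 1.0
    for a in availabilities:
        combined_downtime *= (1 - a)
    return 1 - combined_downtime

foo = bar = 0.999                  # three 9s, as in the example above
print(in_sequence(foo, bar))       # 0.998001 -> ~99.8%
print(in_parallel(foo, bar))       # 0.999999 -> 99.9999%
print(downtime_per_year(0.999))    # ~8.77 hours -> 8h 45min 57s per year
```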
From a95a2937bcd56569f8653ae71fb0b691d9a21481 Mon Sep 17 00:00:00 2001
From: Donne Martin
Date: Mon, 13 May 2019 18:56:36 -0700
Subject: [PATCH 20/72] Update language lists in translations (#280)
---
README-ja.md | 2 +-
README-zh-Hans.md | 2 ++
README-zh-TW.md | 2 +-
3 files changed, 4 insertions(+), 2 deletions(-)
diff --git a/README-ja.md b/README-ja.md
index 077df6ee..5d7f61b3 100644
--- a/README-ja.md
+++ b/README-ja.md
@@ -1,4 +1,4 @@
-*[English](README.md) ∙ [简体中文](README-zh-Hans.md) | [Brazilian Portuguese](https://github.com/donnemartin/system-design-primer/issues/40) ∙ [Polish](https://github.com/donnemartin/system-design-primer/issues/68) ∙ [Russian](https://github.com/donnemartin/system-design-primer/issues/87) ∙ [Turkish](https://github.com/donnemartin/system-design-primer/issues/39) | [Add Translation](https://github.com/donnemartin/system-design-primer/issues/28)*
+*[English](README.md) ∙ [日本語](README-ja.md) ∙ [简体中文](README-zh-Hans.md) ∙ [繁體中文](README-zh-TW.md) | [العَرَبِيَّة](https://github.com/donnemartin/system-design-primer/issues/170) ∙ [বাংলা](https://github.com/donnemartin/system-design-primer/issues/220) ∙ [Português do Brasil](https://github.com/donnemartin/system-design-primer/issues/40) ∙ [Deutsch](https://github.com/donnemartin/system-design-primer/issues/186) ∙ [ελληνικά](https://github.com/donnemartin/system-design-primer/issues/130) ∙ [Italiano](https://github.com/donnemartin/system-design-primer/issues/104) ∙ [韓國語](https://github.com/donnemartin/system-design-primer/issues/102) ∙ [فارسی](https://github.com/donnemartin/system-design-primer/issues/110) ∙ [Polski](https://github.com/donnemartin/system-design-primer/issues/68) ∙ [русский язык](https://github.com/donnemartin/system-design-primer/issues/87) ∙ [Español](https://github.com/donnemartin/system-design-primer/issues/136) ∙ [ภาษาไทย](https://github.com/donnemartin/system-design-primer/issues/187) ∙ [Türkçe](https://github.com/donnemartin/system-design-primer/issues/39) ∙ [tiếng Việt](https://github.com/donnemartin/system-design-primer/issues/127) ∙ [Français](https://github.com/donnemartin/system-design-primer/issues/250) | [Add Translation](https://github.com/donnemartin/system-design-primer/issues/28)*
# システム設計入門
diff --git a/README-zh-Hans.md b/README-zh-Hans.md
index b71b0531..1b9b8abf 100644
--- a/README-zh-Hans.md
+++ b/README-zh-Hans.md
@@ -3,6 +3,8 @@
> * 译者:[XatMassacrE](https://github.com/XatMassacrE)、[L9m](https://github.com/L9m)、[Airmacho](https://github.com/Airmacho)、[xiaoyusilen](https://github.com/xiaoyusilen)、[jifaxu](https://github.com/jifaxu)
> * 这个 [链接](https://github.com/xitu/system-design-primer/compare/master...donnemartin:master) 用来查看本翻译与英文版是否有差别(如果你没有看到 README.md 发生变化,那就意味着这份翻译文档是最新的)。
+*[English](README.md) ∙ [日本語](README-ja.md) ∙ [简体中文](README-zh-Hans.md) ∙ [繁體中文](README-zh-TW.md) | [العَرَبِيَّة](https://github.com/donnemartin/system-design-primer/issues/170) ∙ [বাংলা](https://github.com/donnemartin/system-design-primer/issues/220) ∙ [Português do Brasil](https://github.com/donnemartin/system-design-primer/issues/40) ∙ [Deutsch](https://github.com/donnemartin/system-design-primer/issues/186) ∙ [ελληνικά](https://github.com/donnemartin/system-design-primer/issues/130) ∙ [Italiano](https://github.com/donnemartin/system-design-primer/issues/104) ∙ [韓國語](https://github.com/donnemartin/system-design-primer/issues/102) ∙ [فارسی](https://github.com/donnemartin/system-design-primer/issues/110) ∙ [Polski](https://github.com/donnemartin/system-design-primer/issues/68) ∙ [русский язык](https://github.com/donnemartin/system-design-primer/issues/87) ∙ [Español](https://github.com/donnemartin/system-design-primer/issues/136) ∙ [ภาษาไทย](https://github.com/donnemartin/system-design-primer/issues/187) ∙ [Türkçe](https://github.com/donnemartin/system-design-primer/issues/39) ∙ [tiếng Việt](https://github.com/donnemartin/system-design-primer/issues/127) ∙ [Français](https://github.com/donnemartin/system-design-primer/issues/250) | [Add Translation](https://github.com/donnemartin/system-design-primer/issues/28)*
+
# 系统设计入门
diff --git a/README-zh-TW.md b/README-zh-TW.md
index e0b6c7b4..f52f5504 100644
--- a/README-zh-TW.md
+++ b/README-zh-TW.md
@@ -1,4 +1,4 @@
-*[English](README.md) ∙ [日本語](README-ja.md) ∙ [简体中文](README-zh-Hans.md) ∙ [繁體中文](README-zh-TW.md) | [Brazilian Portuguese](https://github.com/donnemartin/system-design-primer/issues/40) ∙ [Italian](https://github.com/donnemartin/system-design-primer/issues/104) ∙ [Korean](https://github.com/donnemartin/system-design-primer/issues/102) ∙ [Persian](https://github.com/donnemartin/system-design-primer/issues/110) ∙ [Polish](https://github.com/donnemartin/system-design-primer/issues/68) ∙ [Russian](https://github.com/donnemartin/system-design-primer/issues/87) ∙ [Turkish](https://github.com/donnemartin/system-design-primer/issues/39) ∙ [Vietnamese](https://github.com/donnemartin/system-design-primer/issues/127) | [Add Translation](https://github.com/donnemartin/system-design-primer/issues/28)*
+*[English](README.md) ∙ [日本語](README-ja.md) ∙ [简体中文](README-zh-Hans.md) ∙ [繁體中文](README-zh-TW.md) | [العَرَبِيَّة](https://github.com/donnemartin/system-design-primer/issues/170) ∙ [বাংলা](https://github.com/donnemartin/system-design-primer/issues/220) ∙ [Português do Brasil](https://github.com/donnemartin/system-design-primer/issues/40) ∙ [Deutsch](https://github.com/donnemartin/system-design-primer/issues/186) ∙ [ελληνικά](https://github.com/donnemartin/system-design-primer/issues/130) ∙ [Italiano](https://github.com/donnemartin/system-design-primer/issues/104) ∙ [韓國語](https://github.com/donnemartin/system-design-primer/issues/102) ∙ [فارسی](https://github.com/donnemartin/system-design-primer/issues/110) ∙ [Polski](https://github.com/donnemartin/system-design-primer/issues/68) ∙ [русский язык](https://github.com/donnemartin/system-design-primer/issues/87) ∙ [Español](https://github.com/donnemartin/system-design-primer/issues/136) ∙ [ภาษาไทย](https://github.com/donnemartin/system-design-primer/issues/187) ∙ [Türkçe](https://github.com/donnemartin/system-design-primer/issues/39) ∙ [tiếng Việt](https://github.com/donnemartin/system-design-primer/issues/127) ∙ [Français](https://github.com/donnemartin/system-design-primer/issues/250) | [Add Translation](https://github.com/donnemartin/system-design-primer/issues/28)*
# 系統設計入門
From 9ce0e9d734213807364831743b8b498607b3e46a Mon Sep 17 00:00:00 2001
From: Donne Martin
Date: Thu, 30 May 2019 20:25:26 -0400
Subject: [PATCH 21/72] Add Hebrew translation link (#286)
---
README-ja.md | 2 +-
README-zh-Hans.md | 2 +-
README-zh-TW.md | 2 +-
README.md | 2 +-
4 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/README-ja.md b/README-ja.md
index 5d7f61b3..9aa683e2 100644
--- a/README-ja.md
+++ b/README-ja.md
@@ -1,4 +1,4 @@
-*[English](README.md) ∙ [日本語](README-ja.md) ∙ [简体中文](README-zh-Hans.md) ∙ [繁體中文](README-zh-TW.md) | [العَرَبِيَّة](https://github.com/donnemartin/system-design-primer/issues/170) ∙ [বাংলা](https://github.com/donnemartin/system-design-primer/issues/220) ∙ [Português do Brasil](https://github.com/donnemartin/system-design-primer/issues/40) ∙ [Deutsch](https://github.com/donnemartin/system-design-primer/issues/186) ∙ [ελληνικά](https://github.com/donnemartin/system-design-primer/issues/130) ∙ [Italiano](https://github.com/donnemartin/system-design-primer/issues/104) ∙ [韓國語](https://github.com/donnemartin/system-design-primer/issues/102) ∙ [فارسی](https://github.com/donnemartin/system-design-primer/issues/110) ∙ [Polski](https://github.com/donnemartin/system-design-primer/issues/68) ∙ [русский язык](https://github.com/donnemartin/system-design-primer/issues/87) ∙ [Español](https://github.com/donnemartin/system-design-primer/issues/136) ∙ [ภาษาไทย](https://github.com/donnemartin/system-design-primer/issues/187) ∙ [Türkçe](https://github.com/donnemartin/system-design-primer/issues/39) ∙ [tiếng Việt](https://github.com/donnemartin/system-design-primer/issues/127) ∙ [Français](https://github.com/donnemartin/system-design-primer/issues/250) | [Add Translation](https://github.com/donnemartin/system-design-primer/issues/28)*
+*[English](README.md) ∙ [日本語](README-ja.md) ∙ [简体中文](README-zh-Hans.md) ∙ [繁體中文](README-zh-TW.md) | [العَرَبِيَّة](https://github.com/donnemartin/system-design-primer/issues/170) ∙ [বাংলা](https://github.com/donnemartin/system-design-primer/issues/220) ∙ [Português do Brasil](https://github.com/donnemartin/system-design-primer/issues/40) ∙ [Deutsch](https://github.com/donnemartin/system-design-primer/issues/186) ∙ [ελληνικά](https://github.com/donnemartin/system-design-primer/issues/130) ∙ [עברית](https://github.com/donnemartin/system-design-primer/issues/272) ∙ [Italiano](https://github.com/donnemartin/system-design-primer/issues/104) ∙ [韓國語](https://github.com/donnemartin/system-design-primer/issues/102) ∙ [فارسی](https://github.com/donnemartin/system-design-primer/issues/110) ∙ [Polski](https://github.com/donnemartin/system-design-primer/issues/68) ∙ [русский язык](https://github.com/donnemartin/system-design-primer/issues/87) ∙ [Español](https://github.com/donnemartin/system-design-primer/issues/136) ∙ [ภาษาไทย](https://github.com/donnemartin/system-design-primer/issues/187) ∙ [Türkçe](https://github.com/donnemartin/system-design-primer/issues/39) ∙ [tiếng Việt](https://github.com/donnemartin/system-design-primer/issues/127) ∙ [Français](https://github.com/donnemartin/system-design-primer/issues/250) | [Add Translation](https://github.com/donnemartin/system-design-primer/issues/28)*
# システム設計入門
diff --git a/README-zh-Hans.md b/README-zh-Hans.md
index 1b9b8abf..e337444a 100644
--- a/README-zh-Hans.md
+++ b/README-zh-Hans.md
@@ -3,7 +3,7 @@
> * 译者:[XatMassacrE](https://github.com/XatMassacrE)、[L9m](https://github.com/L9m)、[Airmacho](https://github.com/Airmacho)、[xiaoyusilen](https://github.com/xiaoyusilen)、[jifaxu](https://github.com/jifaxu)
> * 这个 [链接](https://github.com/xitu/system-design-primer/compare/master...donnemartin:master) 用来查看本翻译与英文版是否有差别(如果你没有看到 README.md 发生变化,那就意味着这份翻译文档是最新的)。
-*[English](README.md) ∙ [日本語](README-ja.md) ∙ [简体中文](README-zh-Hans.md) ∙ [繁體中文](README-zh-TW.md) | [العَرَبِيَّة](https://github.com/donnemartin/system-design-primer/issues/170) ∙ [বাংলা](https://github.com/donnemartin/system-design-primer/issues/220) ∙ [Português do Brasil](https://github.com/donnemartin/system-design-primer/issues/40) ∙ [Deutsch](https://github.com/donnemartin/system-design-primer/issues/186) ∙ [ελληνικά](https://github.com/donnemartin/system-design-primer/issues/130) ∙ [Italiano](https://github.com/donnemartin/system-design-primer/issues/104) ∙ [韓國語](https://github.com/donnemartin/system-design-primer/issues/102) ∙ [فارسی](https://github.com/donnemartin/system-design-primer/issues/110) ∙ [Polski](https://github.com/donnemartin/system-design-primer/issues/68) ∙ [русский язык](https://github.com/donnemartin/system-design-primer/issues/87) ∙ [Español](https://github.com/donnemartin/system-design-primer/issues/136) ∙ [ภาษาไทย](https://github.com/donnemartin/system-design-primer/issues/187) ∙ [Türkçe](https://github.com/donnemartin/system-design-primer/issues/39) ∙ [tiếng Việt](https://github.com/donnemartin/system-design-primer/issues/127) ∙ [Français](https://github.com/donnemartin/system-design-primer/issues/250) | [Add Translation](https://github.com/donnemartin/system-design-primer/issues/28)*
+*[English](README.md) ∙ [日本語](README-ja.md) ∙ [简体中文](README-zh-Hans.md) ∙ [繁體中文](README-zh-TW.md) | [العَرَبِيَّة](https://github.com/donnemartin/system-design-primer/issues/170) ∙ [বাংলা](https://github.com/donnemartin/system-design-primer/issues/220) ∙ [Português do Brasil](https://github.com/donnemartin/system-design-primer/issues/40) ∙ [Deutsch](https://github.com/donnemartin/system-design-primer/issues/186) ∙ [ελληνικά](https://github.com/donnemartin/system-design-primer/issues/130) ∙ [עברית](https://github.com/donnemartin/system-design-primer/issues/272) ∙ [Italiano](https://github.com/donnemartin/system-design-primer/issues/104) ∙ [韓國語](https://github.com/donnemartin/system-design-primer/issues/102) ∙ [فارسی](https://github.com/donnemartin/system-design-primer/issues/110) ∙ [Polski](https://github.com/donnemartin/system-design-primer/issues/68) ∙ [русский язык](https://github.com/donnemartin/system-design-primer/issues/87) ∙ [Español](https://github.com/donnemartin/system-design-primer/issues/136) ∙ [ภาษาไทย](https://github.com/donnemartin/system-design-primer/issues/187) ∙ [Türkçe](https://github.com/donnemartin/system-design-primer/issues/39) ∙ [tiếng Việt](https://github.com/donnemartin/system-design-primer/issues/127) ∙ [Français](https://github.com/donnemartin/system-design-primer/issues/250) | [Add Translation](https://github.com/donnemartin/system-design-primer/issues/28)*
# 系统设计入门
diff --git a/README-zh-TW.md b/README-zh-TW.md
index f52f5504..c08362d3 100644
--- a/README-zh-TW.md
+++ b/README-zh-TW.md
@@ -1,4 +1,4 @@
-*[English](README.md) ∙ [日本語](README-ja.md) ∙ [简体中文](README-zh-Hans.md) ∙ [繁體中文](README-zh-TW.md) | [العَرَبِيَّة](https://github.com/donnemartin/system-design-primer/issues/170) ∙ [বাংলা](https://github.com/donnemartin/system-design-primer/issues/220) ∙ [Português do Brasil](https://github.com/donnemartin/system-design-primer/issues/40) ∙ [Deutsch](https://github.com/donnemartin/system-design-primer/issues/186) ∙ [ελληνικά](https://github.com/donnemartin/system-design-primer/issues/130) ∙ [Italiano](https://github.com/donnemartin/system-design-primer/issues/104) ∙ [韓國語](https://github.com/donnemartin/system-design-primer/issues/102) ∙ [فارسی](https://github.com/donnemartin/system-design-primer/issues/110) ∙ [Polski](https://github.com/donnemartin/system-design-primer/issues/68) ∙ [русский язык](https://github.com/donnemartin/system-design-primer/issues/87) ∙ [Español](https://github.com/donnemartin/system-design-primer/issues/136) ∙ [ภาษาไทย](https://github.com/donnemartin/system-design-primer/issues/187) ∙ [Türkçe](https://github.com/donnemartin/system-design-primer/issues/39) ∙ [tiếng Việt](https://github.com/donnemartin/system-design-primer/issues/127) ∙ [Français](https://github.com/donnemartin/system-design-primer/issues/250) | [Add Translation](https://github.com/donnemartin/system-design-primer/issues/28)*
+*[English](README.md) ∙ [日本語](README-ja.md) ∙ [简体中文](README-zh-Hans.md) ∙ [繁體中文](README-zh-TW.md) | [العَرَبِيَّة](https://github.com/donnemartin/system-design-primer/issues/170) ∙ [বাংলা](https://github.com/donnemartin/system-design-primer/issues/220) ∙ [Português do Brasil](https://github.com/donnemartin/system-design-primer/issues/40) ∙ [Deutsch](https://github.com/donnemartin/system-design-primer/issues/186) ∙ [ελληνικά](https://github.com/donnemartin/system-design-primer/issues/130) ∙ [עברית](https://github.com/donnemartin/system-design-primer/issues/272) ∙ [Italiano](https://github.com/donnemartin/system-design-primer/issues/104) ∙ [韓國語](https://github.com/donnemartin/system-design-primer/issues/102) ∙ [فارسی](https://github.com/donnemartin/system-design-primer/issues/110) ∙ [Polski](https://github.com/donnemartin/system-design-primer/issues/68) ∙ [русский язык](https://github.com/donnemartin/system-design-primer/issues/87) ∙ [Español](https://github.com/donnemartin/system-design-primer/issues/136) ∙ [ภาษาไทย](https://github.com/donnemartin/system-design-primer/issues/187) ∙ [Türkçe](https://github.com/donnemartin/system-design-primer/issues/39) ∙ [tiếng Việt](https://github.com/donnemartin/system-design-primer/issues/127) ∙ [Français](https://github.com/donnemartin/system-design-primer/issues/250) | [Add Translation](https://github.com/donnemartin/system-design-primer/issues/28)*
# 系統設計入門
diff --git a/README.md b/README.md
index 5e068e7d..4ff765ff 100644
--- a/README.md
+++ b/README.md
@@ -1,4 +1,4 @@
-*[English](README.md) ∙ [日本語](README-ja.md) ∙ [简体中文](README-zh-Hans.md) ∙ [繁體中文](README-zh-TW.md) | [العَرَبِيَّة](https://github.com/donnemartin/system-design-primer/issues/170) ∙ [বাংলা](https://github.com/donnemartin/system-design-primer/issues/220) ∙ [Português do Brasil](https://github.com/donnemartin/system-design-primer/issues/40) ∙ [Deutsch](https://github.com/donnemartin/system-design-primer/issues/186) ∙ [ελληνικά](https://github.com/donnemartin/system-design-primer/issues/130) ∙ [Italiano](https://github.com/donnemartin/system-design-primer/issues/104) ∙ [韓國語](https://github.com/donnemartin/system-design-primer/issues/102) ∙ [فارسی](https://github.com/donnemartin/system-design-primer/issues/110) ∙ [Polski](https://github.com/donnemartin/system-design-primer/issues/68) ∙ [русский язык](https://github.com/donnemartin/system-design-primer/issues/87) ∙ [Español](https://github.com/donnemartin/system-design-primer/issues/136) ∙ [ภาษาไทย](https://github.com/donnemartin/system-design-primer/issues/187) ∙ [Türkçe](https://github.com/donnemartin/system-design-primer/issues/39) ∙ [tiếng Việt](https://github.com/donnemartin/system-design-primer/issues/127) ∙ [Français](https://github.com/donnemartin/system-design-primer/issues/250) | [Add Translation](https://github.com/donnemartin/system-design-primer/issues/28)*
+*[English](README.md) ∙ [日本語](README-ja.md) ∙ [简体中文](README-zh-Hans.md) ∙ [繁體中文](README-zh-TW.md) | [العَرَبِيَّة](https://github.com/donnemartin/system-design-primer/issues/170) ∙ [বাংলা](https://github.com/donnemartin/system-design-primer/issues/220) ∙ [Português do Brasil](https://github.com/donnemartin/system-design-primer/issues/40) ∙ [Deutsch](https://github.com/donnemartin/system-design-primer/issues/186) ∙ [ελληνικά](https://github.com/donnemartin/system-design-primer/issues/130) ∙ [עברית](https://github.com/donnemartin/system-design-primer/issues/272) ∙ [Italiano](https://github.com/donnemartin/system-design-primer/issues/104) ∙ [韓國語](https://github.com/donnemartin/system-design-primer/issues/102) ∙ [فارسی](https://github.com/donnemartin/system-design-primer/issues/110) ∙ [Polski](https://github.com/donnemartin/system-design-primer/issues/68) ∙ [русский язык](https://github.com/donnemartin/system-design-primer/issues/87) ∙ [Español](https://github.com/donnemartin/system-design-primer/issues/136) ∙ [ภาษาไทย](https://github.com/donnemartin/system-design-primer/issues/187) ∙ [Türkçe](https://github.com/donnemartin/system-design-primer/issues/39) ∙ [tiếng Việt](https://github.com/donnemartin/system-design-primer/issues/127) ∙ [Français](https://github.com/donnemartin/system-design-primer/issues/250) | [Add Translation](https://github.com/donnemartin/system-design-primer/issues/28)*
# The System Design Primer
From 33431e61a9e6a636c902298fe3629da3a34c3c66 Mon Sep 17 00:00:00 2001
From: Kevin Xu
Date: Sun, 16 Jun 2019 23:21:21 +0800
Subject: [PATCH 22/72] zh-Hans: Translate Pastebin solution (#273)
---
README-zh-Hans.md | 2 +-
.../system_design/pastebin/README-zh-Hans.md | 330 ++++++++++++++++++
2 files changed, 331 insertions(+), 1 deletion(-)
create mode 100644 solutions/system_design/pastebin/README-zh-Hans.md
diff --git a/README-zh-Hans.md b/README-zh-Hans.md
index e337444a..1ab9dd43 100644
--- a/README-zh-Hans.md
+++ b/README-zh-Hans.md
@@ -300,7 +300,7 @@
| 问题 | |
| ---------------------------------------- | ---------------------------------------- |
-| 设计 Pastebin.com (或者 Bit.ly) | [解答](solutions/system_design/pastebin/README.md) |
+| 设计 Pastebin.com (或者 Bit.ly) | [解答](solutions/system_design/pastebin/README-zh-Hans.md) |
| 设计 Twitter 时间线和搜索 (或者 Facebook feed 和搜索) | [解答](solutions/system_design/twitter/README.md) |
| 设计一个网页爬虫 | [解答](solutions/system_design/web_crawler/README.md) |
| 设计 Mint.com | [解答](solutions/system_design/mint/README.md) |
diff --git a/solutions/system_design/pastebin/README-zh-Hans.md b/solutions/system_design/pastebin/README-zh-Hans.md
new file mode 100644
index 00000000..b5fcbd3a
--- /dev/null
+++ b/solutions/system_design/pastebin/README-zh-Hans.md
@@ -0,0 +1,330 @@
+# 设计 Pastebin.com (或者 Bit.ly)
+
+**Note: 为了避免重复,当前文档直接链接到[系统设计主题](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#系统设计主题的索引)的相关区域,请参考链接内容以获得综合的讨论点、权衡和替代方案。**
+
+**设计 Bit.ly** - 是一个类似的问题,区别是 pastebin 需要存储的是 paste 的内容,而不是原始的未短化的 url。
+
+## 第一步:概述用例和约束
+
+> 收集这个问题的需求和范畴。
+> 问相关问题来明确用例和约束。
+> 讨论一些假设。
+
+因为没有面试官来明确这些问题,所以我们自己将定义一些用例和约束。
+
+### 用例
+
+#### 我们将问题的范畴限定在如下用例
+
+* **用户** 输入一段文本,然后得到一个随机生成的链接
+ * 过期设置
+ * 默认的设置是不会过期的
+ * 可以选择设置一个过期的时间
+* **用户** 输入一个 paste 的 url 后,可以看到它存储的内容
+* **用户** 是匿名的
+* **Service** 跟踪页面分析
+ * 一个月的访问统计
+* **Service** 删除过期的 pastes
+* **Service** 需要高可用
+
+#### 超出范畴的用例
+
+* **用户** 可以注册一个账户
+ * **用户** 通过验证邮箱
+* **用户** 可以用注册的账户登录
+ * **用户** 可以编辑文档
+* **用户** 可以设置可见性
+* **用户** 可以设置短链接
+
+### 约束和假设
+
+#### 状态假设
+
+* 访问流量不是均匀分布的
+* 打开一个短链接应该是很快的
+* pastes 只能是文本
+* 页面访问分析数据可以不用实时
+* 一千万的用户量
+* 每个月一千万的 paste 写入量
+* 每个月一亿的 paste 读取量
+* 读写比例在 10:1
+
+#### 计算使用
+
+**向面试官说明你是否应该粗略计算一下使用情况。**
+
+* 每个 paste 的大小
+ * 每一个 paste 1 KB
+ * `shortlink` - 7 bytes
+ * `expiration_length_in_minutes` - 4 bytes
+ * `created_at` - 5 bytes
+ * `paste_path` - 255 bytes
+ * 总共 = ~1.27 KB
+* 每个月新的 paste 内容在 12.7GB
+ * (1.27 * 10000000)KB / 月的 paste
+ * 三年内将近 450GB 的新 paste 内容
+ * 三年内 3.6 亿短链接
+ * 假设大部分都是新的 paste,而不是需要更新已存在的 paste
+* 平均 4paste/s 的写入速度
+* 平均 40paste/s 的读取速度
+
+简单的转换指南:
+
+* 2.5 百万 req/s
+* 1 req/s = 2.5 百万 req/m
+* 40 req/s = 1 亿 req/m
+* 400 req/s = 10 亿 req/m
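
For reference, the traffic and storage estimates in the list above can be reproduced with a short back-of-the-envelope calculation. The sketch below (constants and variable names are illustrative) assumes a 30-day month, which is where the "~2.5 million seconds per month" rule of thumb comes from:

```python
# Back-of-the-envelope check of the estimates above (illustrative only).

SECONDS_PER_MONTH = 60 * 60 * 24 * 30   # 2,592,000 -> ~2.5 million seconds per month
pastes_written_per_month = 10_000_000   # 10 million new pastes per month
pastes_read_per_month = 100_000_000     # 100 million reads per month (10:1 read/write)
paste_size_kb = 1.27                    # 1 KB of content plus shortlink/metadata fields

new_content_gb_per_month = pastes_written_per_month * paste_size_kb / 1_000_000
print(new_content_gb_per_month)                                 # ~12.7 GB of new content per month
print(round(pastes_written_per_month / SECONDS_PER_MONTH, 1))   # ~3.9 -> about 4 writes per second
print(round(pastes_read_per_month / SECONDS_PER_MONTH, 1))      # ~38.6 -> about 40 reads per second
```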
+
+## 第二步:创建一个高层次设计
+
+> 概述一个包括所有重要的组件的高层次设计
+
+![Imgur](http://i.imgur.com/BKsBnmG.png)
+
+## 第三步:设计核心组件
+
+> 深入每一个核心组件的细节
+
+### 用例:用户输入一段文本,然后得到一个随机生成的链接
+
+我们可以用一个 [关系型数据库](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#关系型数据库管理系统rdbms)作为一个大的哈希表,用来把生成的 url 映射到一个包含 paste 文件的文件服务器和路径上。
+
+为了避免托管一个文件服务器,我们可以用一个托管的**对象存储**,比如 Amazon 的 S3 或者[NoSQL 文档类型存储](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#文档类型存储)。
+
+作为一个大的哈希表的关系型数据库的替代方案,我们可以用[NoSQL 键值存储](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#键-值存储)。我们需要讨论[选择 SQL 或 NoSQL 之间的权衡](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#sql-还是-nosql)。下面的讨论是使用关系型数据库方法。
+
+* **客户端** 发送一个创建 paste 的请求到作为一个[反向代理](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#反向代理web-服务器)启动的 **Web 服务器**。
+* **Web 服务器** 转发请求给 **写接口** 服务器
+* **写接口** 服务器执行如下操作:
+ * 生成一个唯一的 url
+ * 检查这个 url 在 **SQL 数据库** 里面是否是唯一的
+ * 如果这个 url 不是唯一的,生成另外一个 url
+ * 如果我们支持自定义 url,我们可以使用用户提供的 url(也需要检查是否重复)
+ * 把生成的 url 存储到 **SQL 数据库** 的 `pastes` 表里面
+ * 存储 paste 的内容数据到 **对象存储** 里面
+ * 返回生成的 url
+
+**向面试官阐明你需要写多少代码**
+
+`pastes` 表可以有如下结构:
+
+```sql
+shortlink char(7) NOT NULL
+expiration_length_in_minutes int NOT NULL
+created_at datetime NOT NULL
+paste_path varchar(255) NOT NULL
+PRIMARY KEY(shortlink)
+```
+
+我们将在 `shortlink` 字段和 `created_at` 字段上创建一个[数据库索引](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#使用正确的索引),用来提高查询的速度(避免因为扫描全表导致的长时间查询)并将数据保存在内存中,从内存里面顺序读取 1MB 的数据需要大概 250 微秒,而从 SSD 上读取则需要花费 4 倍的时间,从硬盘上则需要花费 80 倍的时间。 1
+
+为了生成唯一的 url,我们可以:
+
+* 使用 [**MD5**](https://en.wikipedia.org/wiki/MD5) 来哈希用户的 IP 地址 + 时间戳
+ * MD5 是一个普遍用来生成一个 128-bit 长度的哈希值的一种哈希方法
+ * MD5 是一致分布的
+ * 或者我们也可以用 MD5 哈希一个随机生成的数据
+* 用 [**Base 62**](https://www.kerstner.at/2012/07/shortening-strings-using-base-62-encoding/) 编码 MD5 哈希值
+ * 对于 urls,使用 Base 62 编码 `[a-zA-Z0-9]` 是比较合适的
+ * 对于每一个原始输入只会有一个 hash 结果,Base 62 是确定的(不涉及随机性)
+ * Base 64 是另外一个流行的编码方案,但是对于 urls,会因为额外的 `+` 和 `-` 字符串而产生一些问题
+ * 以下 [Base 62 伪代码](http://stackoverflow.com/questions/742013/how-to-code-a-url-shortener) 执行的时间复杂度是 O(k),k 是数字的数量 = 7:
+
+```python
+def base_encode(num, base=62):
+ digits = []
+ while num > 0
+ remainder = modulo(num, base)
+ digits.push(remainder)
+ num = divide(num, base)
+ digits = digits.reverse
+```
+
+* 取输出的前 7 个字符,结果会有 62^7 个可能的值,应该足以满足在 3 年内处理 3.6 亿个短链接的约束:
+
+```python
+url = base_encode(md5(ip_address+timestamp))[:URL_LENGTH]
+```
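
To make the pseudocode above concrete, here is one possible runnable Python version; the alphabet ordering, helper names, and sample inputs are illustrative assumptions rather than part of the original solution:

```python
# Runnable sketch of the MD5 + Base 62 shortlink generation described above.
import hashlib

ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
URL_LENGTH = 7

def base62_encode(num):
    """Encode a non-negative integer using the 62-character alphabet."""
    if num == 0:
        return ALPHABET[0]
    digits = []
    while num > 0:
        num, remainder = divmod(num, 62)
        digits.append(ALPHABET[remainder])
    return "".join(reversed(digits))

def shortlink(ip_address, timestamp):
    """MD5 the ip + timestamp, Base 62 encode it, keep the first 7 characters."""
    digest = hashlib.md5(f"{ip_address}{timestamp}".encode()).hexdigest()
    return base62_encode(int(digest, 16))[:URL_LENGTH]

print(shortlink("10.0.0.1", "2016-01-01T00:00:00"))  # a deterministic 7-character slug
```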
+
+我们将会用一个公开的 [**REST 风格接口**](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#表述性状态转移rest):
+
+```shell
+$ curl -X POST --data '{"expiration_length_in_minutes":"60", \"paste_contents":"Hello World!"}' https://pastebin.com/api/v1/paste
+```
+
+Response:
+
+```json
+{
+ "shortlink": "foobar"
+}
+```
+
+用于内部通信,我们可以用 [RPC](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#远程过程调用协议rpc)。
+
+### 用例:用户输入一个 paste 的 url 后可以看到它存储的内容
+
+* **客户端** 发送一个获取 paste 请求到 **Web Server**
+* **Web Server** 转发请求给 **读取接口** 服务器
+* **读取接口** 服务器执行如下操作:
+ * 在 **SQL 数据库** 检查这个生成的 url
+ * 如果这个 url 在 **SQL 数据库** 里面,则从 **对象存储** 获取这个 paste 的内容
+ * 否则,返回一个错误页面给用户
+
+REST API:
+
+```shell
+curl https://pastebin.com/api/v1/paste?shortlink=foobar
+```
+
+Response:
+
+```json
+{
+ "paste_contents": "Hello World",
+ "created_at": "YYYY-MM-DD HH:MM:SS",
+ "expiration_length_in_minutes": "60"
+}
+```
+
+### 用例: 服务跟踪分析页面
+
+因为实时分析不是必须的,所以我们可以简单的 **MapReduce** **Web Server** 的日志,用来生成点击次数。
+
+```python
+class HitCounts(MRJob):
+
+ def extract_url(self, line):
+ """Extract the generated url from the log line."""
+ ...
+
+ def extract_year_month(self, line):
+ """Return the year and month portions of the timestamp."""
+ ...
+
+ def mapper(self, _, line):
+ """Parse each log line, extract and transform relevant lines.
+
+ Emit key value pairs of the form:
+
+ (2016-01, url0), 1
+ (2016-01, url0), 1
+ (2016-01, url1), 1
+ """
+ url = self.extract_url(line)
+ period = self.extract_year_month(line)
+ yield (period, url), 1
+
+ def reducer(self, key, values):
+ """Sum values for each key.
+
+ (2016-01, url0), 2
+ (2016-01, url1), 1
+ """
+ yield key, sum(values)
+```
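
The same (month, url) aggregation can also be sketched without a MapReduce framework, which may help when reasoning about the job above; the log-line format used here is an assumption for illustration only:

```python
# Plain-Python sketch of the hit counting done by the MapReduce job above.
from collections import Counter

def hit_counts(log_lines):
    """Count hits per (year-month, url); mirrors the mapper/reducer output."""
    counts = Counter()
    for line in log_lines:
        timestamp, url = line.split()   # assumed format: "<ISO timestamp> <url>"
        period = timestamp[:7]          # e.g. "2016-01"
        counts[(period, url)] += 1
    return counts

logs = [
    "2016-01-01T12:00:00 /url0",
    "2016-01-02T08:30:00 /url0",
    "2016-01-03T09:00:00 /url1",
]
print(hit_counts(logs))  # Counter({('2016-01', '/url0'): 2, ('2016-01', '/url1'): 1})
```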
+
+### 用例: 服务删除过期的 pastes
+
+为了删除过期的 pastes,我们可以直接搜索 **SQL 数据库** 中所有的过期时间比当前时间更早的记录,
+所有过期的记录将从这张表里面删除(或者将其标记为过期)。
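
As a concrete illustration of that sweep, the sketch below uses sqlite3 as a stand-in for the SQL database; the table follows the `pastes` schema above, and treating `expiration_length_in_minutes = 0` as "never expires" is an assumption of this sketch, not part of the original solution:

```python
# Illustrative expiration sweep against the `pastes` schema above (sqlite3 stand-in).
import sqlite3

def delete_expired_pastes(conn):
    """Delete pastes whose created_at plus expiration window is in the past."""
    conn.execute(
        """
        DELETE FROM pastes
        WHERE expiration_length_in_minutes > 0  -- assume 0 means "never expires"
          AND datetime(created_at, '+' || expiration_length_in_minutes || ' minutes')
              <= datetime('now')
        """
    )
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE pastes (
    shortlink TEXT PRIMARY KEY,
    expiration_length_in_minutes INTEGER NOT NULL,
    created_at TEXT NOT NULL,
    paste_path TEXT NOT NULL)""")
conn.execute("INSERT INTO pastes VALUES ('foobar', 60, '2016-01-01 00:00:00', '/p/foobar')")
delete_expired_pastes(conn)
print(conn.execute("SELECT COUNT(*) FROM pastes").fetchone())  # (0,) -- the expired paste is gone
```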
+
+## 第四步:扩展这个设计
+
+> 给定约束条件,识别和解决瓶颈。
+
+![Imgur](http://i.imgur.com/4edXG0T.png)
+
+**重要提示: 不要简单的从最初的设计直接跳到最终的设计**
+
+说明您将迭代地执行这样的操作:1)**Benchmark/Load 测试**,2)**Profile** 出瓶颈,3)在评估替代方案和权衡时解决瓶颈,4)重复前面,可以参考[在 AWS 上设计一个可以支持百万用户的系统](../scaling_aws/README.md)这个用来解决如何迭代地扩展初始设计的例子。
+
+重要的是讨论在初始设计中可能遇到的瓶颈,以及如何解决每个瓶颈。比如,在多个 **Web 服务器** 上添加 **负载平衡器** 可以解决哪些问题? **CDN** 解决哪些问题?**Master-Slave Replicas** 解决哪些问题? 替代方案是什么和怎么对每一个替代方案进行权衡比较?
+
+我们将介绍一些组件来完成设计,并解决可伸缩性问题。内部的负载平衡器并不能减少杂乱。
+
+**为了避免重复的讨论**, 参考以下[系统设计主题](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#系统设计主题的索引)获取主要讨论要点、权衡和替代方案:
+
+* [DNS](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#域名系统)
+* [CDN](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#内容分发网络cdn)
+* [负载均衡器](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#负载均衡器)
+* [水平扩展](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#水平扩展)
+* [反向代理(web 服务器)](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#反向代理web-服务器)
+* [应用层](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#应用层)
+* [缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#缓存)
+* [关系型数据库管理系统 (RDBMS)](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#关系型数据库管理系统rdbms)
+* [SQL write master-slave failover](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#故障切换)
+* [主从复制](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#主从复制)
+* [一致性模式](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#一致性模式)
+* [可用性模式](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#可用性模式)
+
+**分析存储数据库** 可以用比如 Amazon Redshift 或者 Google BigQuery 这样的数据仓库解决方案。
+
+一个像 Amazon S3 这样的 **对象存储**,可以轻松处理每月 12.7 GB 的新内容约束。
+
+要处理 *平均* 每秒 40 读请求(峰值更高),其中热点内容的流量应该由 **内存缓存** 处理,而不是数据库。**内存缓存** 对于处理分布不均匀的流量和流量峰值也很有用。只要副本没有陷入复制写的泥潭,**SQL Read Replicas** 应该能够处理缓存丢失。
+
+对于单个 **SQL Write Master-Slave**,*平均* 每秒 4paste 写入 (峰值更高) 应该是可以做到的。否则,我们需要使用额外的 SQL 扩展模式:
+
+* [联合](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#联合)
+* [分片](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#分片)
+* [非规范化](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#非规范化)
+* [SQL 调优](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#SQL调优)
+
+我们还应该考虑将一些数据移动到 **NoSQL 数据库**。
+
+## 额外的话题
+
+> 是否更深入探讨额外主题,取决于问题的范围和面试剩余的时间。
+
+### NoSQL
+
+* [键值存储](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#键-值存储)
+* [文档存储](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#文档类型存储)
+* [列型存储](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#列型存储)
+* [图数据库](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#图数据库)
+* [sql 还是 nosql](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#sql-还是-nosql)
+
+### 缓存
+
+* 在哪缓存
+ * [客户端缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#客户端缓存)
+ * [CDN 缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#cdn-缓存)
+ * [Web 服务器缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#web-服务器缓存)
+ * [数据库缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#数据库缓存)
+ * [应用缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#应用缓存)
+* 缓存什么
+ * [数据库查询级别的缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#数据库查询级别的缓存)
+ * [对象级别的缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#对象级别的缓存)
+* 何时更新缓存
+ * [缓存模式](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#缓存模式)
+ * [直写模式](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#直写模式)
+ * [回写模式](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#回写模式)
+ * [刷新](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#刷新)
+
+### 异步和微服务
+
+* [消息队列](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#消息队列)
+* [任务队列](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#任务队列)
+* [背压](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#背压)
+* [微服务](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#微服务)
+
+### 通信
+
+* 讨论权衡:
+ * 跟客户端之间的外部通信 - [HTTP APIs following REST](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#表述性状态转移rest)
+ * 内部通信 - [RPC](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#远程过程调用协议rpc)
+* [服务发现](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#服务发现)
+
+### 安全
+
+参考[安全](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#安全)。
+
+### 延迟数字
+
+见[每个程序员都应该知道的延迟数](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#每个程序员都应该知道的延迟数)。
+
+### 持续进行
+
+* 继续对系统进行基准测试和监控,以在瓶颈出现时解决它们
+* 扩展是一个迭代的过程
From c65a721b41e79506656d8c6d0c1f9256606c0143 Mon Sep 17 00:00:00 2001
From: Yuya Ma'emichi <6386129+Wintus@users.noreply.github.com>
Date: Sun, 7 Jul 2019 00:07:13 +0900
Subject: [PATCH 23/72] ja: Fix typo of Big-O notation in KVS section (#292)
---
README-ja.md | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/README-ja.md b/README-ja.md
index 9aa683e2..0fcee96c 100644
--- a/README-ja.md
+++ b/README-ja.md
@@ -955,7 +955,7 @@ NoSQL は **key-value store**、 **document-store**、 **wide column store**、
> 概要: ハッシュテーブル
-キーバリューストアでは一般的に0、1の読み、書きができ、それらはメモリないしSSDで裏付けられています。データストアはキーを [辞書的順序](https://en.wikipedia.org/wiki/Lexicographical_order) で保持することでキーの効率的な取得を可能にしています。キーバリューストアではメタデータを値とともに保持することが可能です。
+キーバリューストアでは一般的にO(1)の読み書きができ、それらはメモリないしSSDで裏付けられています。データストアはキーを [辞書的順序](https://en.wikipedia.org/wiki/Lexicographical_order) で保持することでキーの効率的な取得を可能にしています。キーバリューストアではメタデータを値とともに保持することが可能です。
キーバリューストアはハイパフォーマンスな挙動が可能で、単純なデータモデルやインメモリーキャッシュレイヤーなどのデータが急速に変わる場合などに使われます。単純な処理のみに機能が制限されているので、追加の処理機能が必要な場合にはその複雑性はアプリケーション層に載せることになります。
From f78db9e5b0b4b3ddbb05d59541e24f642e0bedba Mon Sep 17 00:00:00 2001
From: SATO Yusuke
Date: Tue, 9 Jul 2019 00:57:34 +0900
Subject: [PATCH 24/72] JA: Fix mistranslation in Horizontal scaling section
- The Japanese translation is ambiguous about whether “vertical scaling” means scaling out or scaling up.
- The word “expensive” is missing in the Japanese translation.
---
README-ja.md | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/README-ja.md b/README-ja.md
index 9aa683e2..c79af058 100644
--- a/README-ja.md
+++ b/README-ja.md
@@ -651,7 +651,7 @@ Layer 7 ロードバランサーは [アプリケーションレイヤー](#通
### 水平スケーリング
-ロードバランサーでは水平スケーリングによってパフォーマンスと可用性を向上させることができます。手頃な汎用マシンを追加することによってスケーリングさせる方が、 **垂直スケーリング** と言って、サーバーをよりハイパフォーマンスなマシンに載せ替えることよりもずっと費用対効果も可用性も高いでしょう。また、汎用ハードウェアを扱える人材を雇う方が、特化型の商用ハードウェアを扱える人材を雇うよりも簡単でしょう。
+ロードバランサーでは水平スケーリングによってパフォーマンスと可用性を向上させることができます。一つのサーバーをより高価なマシンにスケールアップする(**垂直スケーリング**)よりも、手頃な汎用マシンを使ったスケールアウトの方が、費用対効果も高くなり、結果的に可用性も高くなります。また、汎用ハードウェアを扱える人材を雇う方が、特化型の商用ハードウェアを扱える人材を雇うよりも簡単でしょう。
#### 欠点: 水平スケーリング
From 109235b486ab5dac804d54d85e6b35eb35606a46 Mon Sep 17 00:00:00 2001
From: SATO Yusuke
Date: Thu, 11 Jul 2019 00:13:11 +0900
Subject: [PATCH 25/72] JA: Fix mistranslation in Reverse proxy (web server)
section
- Fix mistranslation of parallel structure. (not information/blacklist/limit, but hide/blacklist/limit)
---
README-ja.md | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/README-ja.md b/README-ja.md
index 9aa683e2..9678970d 100644
--- a/README-ja.md
+++ b/README-ja.md
@@ -689,7 +689,7 @@ Layer 7 ロードバランサーは [アプリケーションレイヤー](#通
他には以下のような利点があります:
-* **より堅牢なセキュリティ** - バックエンドサーバーの情報、ブラックリストIP、クライアントごとの接続数などの情報を隠すことができます。
+* **より堅牢なセキュリティ** - バックエンドサーバーの情報を隠したり、IPアドレスをブラックリスト化したり、クライアントごとの接続数を制限したりできます。
* **スケーラビリティや柔軟性が増します** - クライアントはリバースプロキシのIPしか見ないので、裏でサーバーをスケールしたり、設定を変えやすくなります。
* **SSL termination** - 入力されるリクエストを解読し、サーバーのレスポンスを暗号化することでサーバーがこのコストのかかりうる処理をしなくて済むようになります。
* [X.509 証明書](https://en.wikipedia.org/wiki/X.509) を各サーバーにインストールする必要がなくなります。
From 78d15fd16a75ea55d9071f7f038238db1668ba34 Mon Sep 17 00:00:00 2001
From: SATO Yusuke
Date: Sat, 27 Jul 2019 00:22:56 +0900
Subject: [PATCH 26/72] ja: Fix mistranslation in "Horizontal scaling"
---
README-ja.md | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/README-ja.md b/README-ja.md
index c79af058..21006121 100644
--- a/README-ja.md
+++ b/README-ja.md
@@ -651,7 +651,7 @@ Layer 7 ロードバランサーは [アプリケーションレイヤー](#通
### 水平スケーリング
-ロードバランサーでは水平スケーリングによってパフォーマンスと可用性を向上させることができます。一つのサーバーをより高価なマシンにスケールアップする(**垂直スケーリング**)よりも、手頃な汎用マシンを使ったスケールアウトの方が、費用対効果も高くなり、結果的に可用性も高くなります。また、汎用ハードウェアを扱える人材を雇う方が、特化型の商用ハードウェアを扱える人材を雇うよりも簡単でしょう。
+ロードバランサーでは水平スケーリングによってパフォーマンスと可用性を向上させることができます。手頃な汎用マシンを追加することによってスケールアウトさせる方が、一つのサーバーをより高価なマシンにスケールアップする(**垂直スケーリング**)より費用対効果も高くなり、結果的に可用性も高くなります。また、汎用ハードウェアを扱える人材を雇う方が、特化型の商用ハードウェアを扱える人材を雇うよりも簡単でしょう。
#### 欠点: 水平スケーリング
From b4135dd6b2f13ad24ec15349fec96559289f55c4 Mon Sep 17 00:00:00 2001
From: SATO Yusuke
Date: Sat, 3 Aug 2019 23:06:57 +0900
Subject: [PATCH 27/72] JA: Fix mistranslation in Weak consistency section
(#299)
---
README-ja.md | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/README-ja.md b/README-ja.md
index 0fcee96c..4103320f 100644
--- a/README-ja.md
+++ b/README-ja.md
@@ -471,9 +471,9 @@
### 弱い一貫性
-書き込み後の読み取りでは、その最新の書き込みを読めたり読めなかったりする。一番良いアプローチが選択される。
+書き込み後の読み取りでは、その最新の書き込みを読めたり読めなかったりする。ベストエフォート型のアプローチに基づく。
-メムキャッシュなどのシステムにおいてこのアプローチは取られる。弱い一貫性はリアルタイム性が必要な使用例、例えばVoIP、ビデオチャット、リアルタイムマルチプレイヤーゲームなどと相性がいいでしょう。例えば、電話に出ていて、受信を数秒受け取れなかったとして、その後に接続回復してもその接続が切断されていた間に話されていたことは聞き取れないというような感じです。
+このアプローチはmemcachedなどのシステムに見られます。弱い一貫性はリアルタイム性が必要なユースケース、例えばVoIP、ビデオチャット、リアルタイムマルチプレイヤーゲームなどと相性がいいでしょう。例えば、電話に出ているときに数秒間音声が受け取れなくなったとしたら、その後に接続が回復してもその接続が切断されていた間に話されていたことは聞き取れないというような感じです。
### 結果整合性
From edbe857b7b743425c1ea74f3cb294c558a551243 Mon Sep 17 00:00:00 2001
From: SATO Yusuke
Date: Sat, 3 Aug 2019 23:07:34 +0900
Subject: [PATCH 28/72] JA: Fix mistranslation in Push CDNs section (#300)
---
README-ja.md | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/README-ja.md b/README-ja.md
index 4103320f..0cfe9ac3 100644
--- a/README-ja.md
+++ b/README-ja.md
@@ -582,7 +582,7 @@ CDNを用いてコンテンツを配信することで以下の二つの理由
### プッシュCDN
-プッシュCDNではサーバーデータに更新があった時には必ず、新しいコンテンツを受け取る方式です。コンテンツを配信し、CDNに直接アップロードし、URLをCDNを指すように指定するところまで全ての責任を負う形です。コンテンツがいつ期限切れになるのか更新されるのかを設定することができます。コンテンツは新規作成時、更新時のみアップロードされることでトラフィックは最小化される一方、ストレージは最大限消費されてしまいます。
+プッシュCDNではサーバーデータに更新があった時には必ず、新しいコンテンツを受け取る方式です。コンテンツを用意し、CDNに直接アップロードし、URLをCDNを指すように指定するところまで、全て自分で責任を負う形です。コンテンツがいつ期限切れになるのか更新されるのかを設定することができます。コンテンツは新規作成時、更新時のみアップロードされることでトラフィックは最小化される一方、ストレージは最大限消費されてしまいます。
トラフィックの少ない、もしくは頻繁にはコンテンツが更新されないサイトの場合にはプッシュCDNと相性がいいでしょう。コンテンツは定期的に再びプルされるのではなく、CDNに一度のみ配置されます。
From 9dc60cff3a44815bc2f22497b66d815378f21533 Mon Sep 17 00:00:00 2001
From: SATO Yusuke
Date: Mon, 5 Aug 2019 09:30:53 +0900
Subject: [PATCH 29/72] JA: Fix mistranslation in Federation section (#303)
---
README-ja.md | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/README-ja.md b/README-ja.md
index 2922beee..d124de81 100644
--- a/README-ja.md
+++ b/README-ja.md
@@ -830,7 +830,7 @@ SQLなどのリレーショナルデータベースはテーブルに整理さ
Source: Scaling up to your first 10 million users
-フェデレーション (もしくは機能分割化とも言う) はデータベースを機能ごとに分割する。例えば、モノリシックな単一データベースの代わりに三つのデータベースを持つことができます: **フォーラム**、 **ユーザー** そして **プロダクト**です。各データベースへの書き込み読み取りのトラフィックが減ることで複製ラグも短くなります。より小さなデータベースを用いることで、メモリーに収まるデータが増えます。ローカルキャッシュに保存できる量が増えることで、キャッシュヒット率も上がります。単一の中央マスターが書き込みの処理をしなくても、並列で書き込みを処理することができ、スループットの向上が期待できます。
+フェデレーション (もしくは機能分割化とも言う) はデータベースを機能ごとに分割する。例えば、モノリシックな単一データベースの代わりに、データベースを **フォーラム**、 **ユーザー**、 **プロダクト** のように三つにすることで、データベース一つあたりの書き込み・読み取りのトラフィックが減り、その結果レプリケーションのラグも短くなります。データベースが小さくなることで、メモリーに収まるデータが増えます。キャッシュの局所性が高まるため、キャッシュヒット率も上がります。単一の中央マスターで書き込みを直列化したりしないため、並列で書き込みを処理することができ、スループットの向上が期待できます。
##### 欠点: federation
From 4bef27e60ad8379973b433fdae30d72ecd49e254 Mon Sep 17 00:00:00 2001
From: SATO Yusuke
Date: Wed, 14 Aug 2019 08:50:09 +0900
Subject: [PATCH 30/72] ja: Fix translation in "Anki flashcards" (#306)
---
README-ja.md | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/README-ja.md b/README-ja.md
index d124de81..c47e1bce 100644
--- a/README-ja.md
+++ b/README-ja.md
@@ -48,7 +48,7 @@
-この[暗記カードアプリケーション](https://apps.ankiweb.net/) は、システム設計の主要な概念を学ぶのに役立つアプリケーションです。程よい間隔で同じ問題を繰り返し出題してくれます。
+この[Anki用フラッシュカードデッキ](https://apps.ankiweb.net/) は、間隔反復を活用して、システム設計のキーコンセプトの学習を支援します。
* [システム設計デッキ](resources/flash_cards/System%20Design.apkg)
* [システム設計練習課題デッキ](resources/flash_cards/System%20Design%20Exercises.apkg)
From 3e55f5bd39811688cc1468afa194aea4797b0b08 Mon Sep 17 00:00:00 2001
From: SATO Yusuke
Date: Wed, 14 Aug 2019 08:50:53 +0900
Subject: [PATCH 31/72] =?UTF-8?q?ja:=20Fix=20translation=20in=20=E2=80=9CD?=
=?UTF-8?q?isadvantage(s):=20load=20balancer=E2=80=9D=20(#307)?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
---
README-ja.md | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/README-ja.md b/README-ja.md
index c47e1bce..29900fcb 100644
--- a/README-ja.md
+++ b/README-ja.md
@@ -664,7 +664,7 @@ Layer 7 ロードバランサーは [アプリケーションレイヤー](#通
* ロードバランサーはリソースが不足していたり、設定が適切でない場合、システム全体のボトルネックになる可能性があります。
* 単一障害点を除こうとしてロードバランサーを導入した結果、複雑さが増してしまうことになります。
-* 単一ロードバランサーでは単一障害点が除かれたことにはなりませんが、複数のロードバランサーはそれすなわち複雑化です。
+* ロードバランサーが一つだけだとそこが単一障害点になってしまいます。一方で、ロードバランサーを複数にすると、さらに複雑さが増してしまいます。
### その他の参考資料、ページ
From 7d4a13d8a28bc52178f0e25819128fabd505d4bb Mon Sep 17 00:00:00 2001
From: SATO Yusuke
Date: Wed, 14 Aug 2019 08:51:20 +0900
Subject: [PATCH 32/72] ja: Fix translation in Service Discovery section (#308)
---
README-ja.md | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/README-ja.md b/README-ja.md
index 29900fcb..eba4d034 100644
--- a/README-ja.md
+++ b/README-ja.md
@@ -741,7 +741,7 @@ Layer 7 ロードバランサーは [アプリケーションレイヤー](#通
### サービスディスカバリー
-[Consul](https://www.consul.io/docs/index.html)、 [Etcd](https://coreos.com/etcd/docs/latest)、 そして [Zookeeper](http://www.slideshare.net/sauravhaloi/introduction-to-apache-zookeeper) などのシステムはそれぞれを見つけやすいように、登録された名前、アドレス、そしてポート番号などを監視しています。[Health checks](https://www.consul.io/intro/getting-started/checks.html) はサービスの統一性を証明するのに有用ですが、しばしば[HTTP](#hypertext-transfer-protocol-http) エンドポイントを用いています。 Consul と Etcd のいずれも組み込みの [key-value store](#キーバリューストア) を持っており、設定データや共有データなどのデータを保存しておくことに使われます。
+[Consul](https://www.consul.io/docs/index.html)、 [Etcd](https://coreos.com/etcd/docs/latest)、 [Zookeeper](http://www.slideshare.net/sauravhaloi/introduction-to-apache-zookeeper) などのシステムでは、登録されているサービスの名前、アドレス、ポートの情報を監視することで、サービス同士が互いを見つけやすくしています。サービスの完全性の確認には [Health checks](https://www.consul.io/intro/getting-started/checks.html) が便利で、これには [HTTP](#hypertext-transfer-protocol-http) エンドポイントがよく使われます。 Consul と Etcd のいずれも組み込みの [key-value store](#キーバリューストア) を持っており、設定データや共有データなどのデータを保存しておくことに使われます。
### 欠点: アプリケーション層
From fdba2a2586a30b78249e5daba8a807ee532b0af9 Mon Sep 17 00:00:00 2001
From: Duy Nguyen Hoang
Date: Sun, 3 Nov 2019 17:56:24 +0700
Subject: [PATCH 33/72] Add API security checklist (#328)
---
README.md | 1 +
1 file changed, 1 insertion(+)
diff --git a/README.md b/README.md
index 4ff765ff..7bbee487 100644
--- a/README.md
+++ b/README.md
@@ -1566,6 +1566,7 @@ Security is a broad topic. Unless you have considerable experience, a security
### Source(s) and further reading
+* [API security checklist](https://github.com/shieldfy/API-Security-Checklist)
* [Security guide for developers](https://github.com/FallibleInc/security-guide-for-developers)
* [OWASP top ten](https://www.owasp.org/index.php/OWASP_Top_Ten_Cheat_Sheet)
From 3ea0b15b5088483d1e5d77196aacee1c15306cb6 Mon Sep 17 00:00:00 2001
From: Brandon
Date: Mon, 9 Dec 2019 11:34:17 +0800
Subject: [PATCH 34/72] zh-Hans: Change translation in SQL tuning (#318)
---
README-zh-Hans.md | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/README-zh-Hans.md b/README-zh-Hans.md
index 1ab9dd43..21a6cddb 100644
--- a/README-zh-Hans.md
+++ b/README-zh-Hans.md
@@ -926,7 +926,7 @@ SQL 调优是一个范围很广的话题,有很多相关的[书](https://www.a
- 使用 `TEXT` 类型存储大块的文本,例如博客正文。`TEXT` 还允许布尔搜索。使用 `TEXT` 字段需要在磁盘上存储一个用于定位文本块的指针。
- 使用 `INT` 类型存储高达 2^32 或 40 亿的较大数字。
- 使用 `DECIMAL` 类型存储货币可以避免浮点数表示错误。
-- 避免使用 `BLOBS` 存储对象,存储存放对象的位置。
+- 避免使用 `BLOBS` 存储实际对象,而是用来存储存放对象的位置。
- `VARCHAR(255)` 是以 8 位数字存储的最大字符数,在某些关系型数据库中,最大限度地利用字节。
- 在适用场景中设置 `NOT NULL` 约束来[提高搜索性能](http://stackoverflow.com/questions/1017239/how-do-null-values-affect-performance-in-a-database-search)。
From eaa447cc39d124a52eb31cf0a22ef9b3c1d3bba4 Mon Sep 17 00:00:00 2001
From: SATO Yusuke
Date: Mon, 9 Dec 2019 12:35:44 +0900
Subject: [PATCH 35/72] ja: Fix mistranslation in SQL tuning section (#305)
---
README-ja.md | 22 +++++++++++-----------
1 file changed, 11 insertions(+), 11 deletions(-)
diff --git a/README-ja.md b/README-ja.md
index eba4d034..6c5cb0cf 100644
--- a/README-ja.md
+++ b/README-ja.md
@@ -902,31 +902,31 @@ SQLチューニングは広範な知識を必要とする分野で多くの [本
##### スキーマを絞る
-* より早い接続を得るために、連続したブロックの中のディスクにMySQLをダンプする。
+* MySQLはアクセス速度向上のため、ディスク上の連続したブロックへデータを格納しています。
* 長さの決まったフィールドに対しては `VARCHAR` よりも `CHAR` を使うようにしましょう。
* `CHAR` の方が効率的に速くランダムにデータにアクセスできます。 一方、 `VARCHAR` では次のデータに移る前にデータの末尾を検知しなければならないために速度が犠牲になります。
-* ブログ投稿などの大きなテキスト `TEXT` を使いましょう。 `TEXT` ではブーリアン型の検索も可能です。 `TEXT` フィールドを使うことは、テキストブロックを配置するのに用いたポインターをディスク上に保存することになります。
-* 2の32乗や40億を超えてくる数に関しては `INT` を使いましょう
+* ブログの投稿など、大きなテキストには `TEXT` を使いましょう。 `TEXT` ではブーリアン型の検索も可能です。 `TEXT` フィールドには、テキストブロックが配置されている、ディスク上の場所へのポインターが保存されます。
+* 2の32乗(約40億)までの大きな数には `INT` を使いましょう。
* 通貨に関しては小数点表示上のエラーを避けるために `DECIMAL` を使いましょう。
* 大きな `BLOBS` を保存するのは避けましょう。どこからそのオブジェクトを取ってくることができるかの情報を保存しましょう。
-* `VARCHAR(255)` は8ビットで数えることができる中で最大の文字数ですが、このフィールドがしばしばRDBMSの中で大きな容量を食います。
-* [検索性能を向上させる](http://stackoverflow.com/questions/1017239/how-do-null-values-affect-performance-in-a-database-search) ことが可能な箇所については `NOT NULL` 制約を設定しましょう
+* `VARCHAR(255)` は8ビットで数えられる最大の文字数です。一部のDBMSでは、1バイトの利用効率を最大化するためにこの文字数がよく使われます。
+* [検索性能向上のため](http://stackoverflow.com/questions/1017239/how-do-null-values-affect-performance-in-a-database-search) 、可能であれば `NOT NULL` 制約を設定しましょう。
##### インデックスを効果的に用いる
-* クエリ(`SELECT`、 `GROUP BY`、 `ORDER BY`、 `JOIN`) を用いて取得する列はインデックスを用いると速度を向上できる。
-* インデックスは通常、対数的にデータを検索、挿入、削除する際に用いる[B-tree](https://en.wikipedia.org/wiki/B-tree)として表現されています。
+* クエリ(`SELECT`、 `GROUP BY`、 `ORDER BY`、 `JOIN`) の対象となる列にインデックスを使うことで速度を向上できるかもしれません。
+* インデックスは通常、平衡探索木である[B木](https://en.wikipedia.org/wiki/B-tree)の形で表されます。B木によりデータは常にソートされた状態になります。また検索、順次アクセス、挿入、削除を対数時間で行えます。
* インデックスを配置することはデータをメモリーに残すことにつながりより容量を必要とします。
* インデックスの更新も必要になるため書き込みも遅くなります。
-* 大きなデータを読み込む際には、インデックスを切ってからデータをロードして再びインデックスをビルドした方が速いことがあります。
+* 大量のデータをロードする際には、インデックスを切ってからデータをロードして再びインデックスをビルドした方が速いことがあります。
##### 高負荷なジョインを避ける
-* パフォーマンスが必要なところには[非正規化](#非正規化)を適用する
+* パフォーマンス上必要なところには[非正規化](#非正規化)を適用する
##### テーブルのパーティション
-* メモリー内に保つために、分離されたテーブルを分割してそれぞれにホットスポットを設定する。
+* テーブルを分割し、ホットスポットを独立したテーブルに分離してメモリーに乗せられるようにする。
##### クエリキャッシュを調整する
@@ -935,7 +935,7 @@ SQLチューニングは広範な知識を必要とする分野で多くの [本
##### その他の参考資料、ページ: SQLチューニング
* [MySQLクエリを最適化するためのTips](http://20bits.com/article/10-tips-for-optimizing-mysql-queries-that-dont-suck)
-* [VARCHAR(255)をそんなにたくさん使う必要ある?](http://stackoverflow.com/questions/1217466/is-there-a-good-reason-i-see-varchar255-used-so-often-as-opposed-to-another-l)
+* [VARCHAR(255)をやたらよく見かけるのはなんで?](http://stackoverflow.com/questions/1217466/is-there-a-good-reason-i-see-varchar255-used-so-often-as-opposed-to-another-l)
* [null値はどのようにパフォーマンスに影響するのか?](http://stackoverflow.com/questions/1017239/how-do-null-values-affect-performance-in-a-database-search)
* [Slow query log](http://dev.mysql.com/doc/refman/5.7/en/slow-query-log.html)
From e50f26960dde9adce5614d8ead8be48ee47053e5 Mon Sep 17 00:00:00 2001
From: Christian Clauss
Date: Fri, 27 Dec 2019 02:11:57 +0100
Subject: [PATCH 36/72] Change raise NotImplemented to raise
NotImplementedError (#345)
---
solutions/object_oriented_design/call_center/call_center.py | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/solutions/object_oriented_design/call_center/call_center.py b/solutions/object_oriented_design/call_center/call_center.py
index a2785594..1d5e7bc6 100644
--- a/solutions/object_oriented_design/call_center/call_center.py
+++ b/solutions/object_oriented_design/call_center/call_center.py
@@ -66,7 +66,7 @@ class Director(Employee):
super(Operator, self).__init__(employee_id, name, Rank.DIRECTOR)
def escalate_call(self):
- raise NotImplemented('Directors must be able to handle any call')
+ raise NotImplementedError('Directors must be able to handle any call')
class CallState(Enum):
From 3b2264e5e87aa7907b86d521a266fda526a4042c Mon Sep 17 00:00:00 2001
From: Dan Choi
Date: Wed, 15 Jan 2020 10:04:08 -0500
Subject: [PATCH 37/72] Fix broken round robin links (#351)
---
README.md | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/README.md b/README.md
index 7bbee487..637dd3ff 100644
--- a/README.md
+++ b/README.md
@@ -593,7 +593,7 @@ DNS is hierarchical, with a few authoritative servers at the top level. Your ro
Services such as [CloudFlare](https://www.cloudflare.com/dns/) and [Route 53](https://aws.amazon.com/route53/) provide managed DNS services. Some DNS services can route traffic through various methods:
-* [Weighted round robin](http://g33kinfo.com/info/archives/2657)
+* [Weighted round robin](https://www.g33kinfo.com/info/round-robin-vs-weighted-round-robin-lb)
* Prevent traffic from going to servers under maintenance
* Balance between varying cluster sizes
* A/B testing
@@ -682,7 +682,7 @@ Load balancers can route traffic based on various metrics, including:
* Random
* Least loaded
* Session/cookies
-* [Round robin or weighted round robin](http://g33kinfo.com/info/archives/2657)
+* [Round robin or weighted round robin](https://www.g33kinfo.com/info/round-robin-vs-weighted-round-robin-lb)
* [Layer 4](#layer-4-load-balancing)
* [Layer 7](#layer-7-load-balancing)
From fc563ca297a4667e12c17fa3c41c82949928c9ac Mon Sep 17 00:00:00 2001
From: vyq
Date: Tue, 21 Jan 2020 08:26:09 +0800
Subject: [PATCH 38/72] Fix broken CAP theorem link (#355)
---
README.md | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/README.md b/README.md
index 637dd3ff..293b8ac5 100644
--- a/README.md
+++ b/README.md
@@ -463,7 +463,7 @@ AP is a good choice if the business needs allow for [eventual consistency](#even
### Source(s) and further reading
* [CAP theorem revisited](http://robertgreiner.com/2014/08/cap-theorem-revisited/)
-* [A plain english introduction to CAP theorem](http://ksat.me/a-plain-english-introduction-to-cap-theorem/)
+* [A plain english introduction to CAP theorem](http://ksat.me/a-plain-english-introduction-to-cap-theorem)
* [CAP FAQ](https://github.com/henryr/cap-faq)
## Consistency patterns
From 8e9c89129bc842df6ed8604bf511d6981d4e7d05 Mon Sep 17 00:00:00 2001
From: Danny Jung <3496334+dannyjung90@users.noreply.github.com>
Date: Sun, 16 Feb 2020 18:00:44 -0800
Subject: [PATCH 39/72] Fix broken link in CAP theorem section (#348)
From 301b9d88e4aed1c34b3275301f18b14957c38c91 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=E6=A0=B9=E5=8F=B7=E4=B8=89?=
Date: Tue, 10 Mar 2020 09:34:18 +0800
Subject: [PATCH 40/72] zh-cn: Sync with upstream to keep it up-to-date (#374)
---
README-zh-Hans.md | 11 +-
solutions/system_design/mint/README.md | 395 +++++++------
solutions/system_design/pastebin/README.md | 359 ++++++------
solutions/system_design/query_cache/README.md | 312 +++++-----
solutions/system_design/sales_rank/README.md | 298 +++++-----
solutions/system_design/scaling_aws/README.md | 536 +++++++++---------
.../system_design/social_graph/README.md | 249 ++++----
solutions/system_design/twitter/README.md | 395 +++++++------
solutions/system_design/web_crawler/README.md | 351 ++++++------
9 files changed, 1449 insertions(+), 1457 deletions(-)
diff --git a/README-zh-Hans.md b/README-zh-Hans.md
index 21a6cddb..83c6007b 100644
--- a/README-zh-Hans.md
+++ b/README-zh-Hans.md
@@ -1,6 +1,6 @@
> * 原文地址:[github.com/donnemartin/system-design-primer](https://github.com/donnemartin/system-design-primer)
> * 译文出自:[掘金翻译计划](https://github.com/xitu/gold-miner)
-> * 译者:[XatMassacrE](https://github.com/XatMassacrE)、[L9m](https://github.com/L9m)、[Airmacho](https://github.com/Airmacho)、[xiaoyusilen](https://github.com/xiaoyusilen)、[jifaxu](https://github.com/jifaxu)
+> * 译者:[XatMassacrE](https://github.com/XatMassacrE)、[L9m](https://github.com/L9m)、[Airmacho](https://github.com/Airmacho)、[xiaoyusilen](https://github.com/xiaoyusilen)、[jifaxu](https://github.com/jifaxu)、[根号三](https://github.com/sqrthree)
> * 这个 [链接](https://github.com/xitu/system-design-primer/compare/master...donnemartin:master) 用来查看本翻译与英文版是否有差别(如果你没有看到 README.md 发生变化,那就意味着这份翻译文档是最新的)。
*[English](README.md) ∙ [日本語](README-ja.md) ∙ [简体中文](README-zh-Hans.md) ∙ [繁體中文](README-zh-TW.md) | [العَرَبِيَّة](https://github.com/donnemartin/system-design-primer/issues/170) ∙ [বাংলা](https://github.com/donnemartin/system-design-primer/issues/220) ∙ [Português do Brasil](https://github.com/donnemartin/system-design-primer/issues/40) ∙ [Deutsch](https://github.com/donnemartin/system-design-primer/issues/186) ∙ [ελληνικά](https://github.com/donnemartin/system-design-primer/issues/130) ∙ [עברית](https://github.com/donnemartin/system-design-primer/issues/272) ∙ [Italiano](https://github.com/donnemartin/system-design-primer/issues/104) ∙ [韓國語](https://github.com/donnemartin/system-design-primer/issues/102) ∙ [فارسی](https://github.com/donnemartin/system-design-primer/issues/110) ∙ [Polski](https://github.com/donnemartin/system-design-primer/issues/68) ∙ [русский язык](https://github.com/donnemartin/system-design-primer/issues/87) ∙ [Español](https://github.com/donnemartin/system-design-primer/issues/136) ∙ [ภาษาไทย](https://github.com/donnemartin/system-design-primer/issues/187) ∙ [Türkçe](https://github.com/donnemartin/system-design-primer/issues/39) ∙ [tiếng Việt](https://github.com/donnemartin/system-design-primer/issues/127) ∙ [Français](https://github.com/donnemartin/system-design-primer/issues/250) | [Add Translation](https://github.com/donnemartin/system-design-primer/issues/28)*
@@ -12,14 +12,6 @@
-## 翻译
-
-有兴趣参与[翻译](https://github.com/donnemartin/system-design-primer/issues/28)? 以下是正在进行中的翻译:
-
-* [巴西葡萄牙语](https://github.com/donnemartin/system-design-primer/issues/40)
-* [简体中文](https://github.com/donnemartin/system-design-primer/issues/38)
-* [土耳其语](https://github.com/donnemartin/system-design-primer/issues/39)
-
## 目的
> 学习如何设计大型系统。
@@ -91,6 +83,7 @@
* 修复错误
* 完善章节
* 添加章节
+* [帮助翻译](https://github.com/donnemartin/system-design-primer/issues/28)
一些还需要完善的内容放在了[正在完善中](#正在完善中)。
diff --git a/solutions/system_design/mint/README.md b/solutions/system_design/mint/README.md
index 6fca1938..58467bc6 100644
--- a/solutions/system_design/mint/README.md
+++ b/solutions/system_design/mint/README.md
@@ -1,102 +1,102 @@
-# Design Mint.com
+# 设计 Mint.com
-*Note: This document links directly to relevant areas found in the [system design topics](https://github.com/donnemartin/system-design-primer#index-of-system-design-topics) to avoid duplication. Refer to the linked content for general talking points, tradeoffs, and alternatives.*
+**注意:这个文档中的链接会直接指向[系统设计主题索引](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#系统设计主题索引)中的有关部分,以避免重复的内容。您可以参考链接的相关内容,来了解其总的要点、方案的权衡取舍以及可选的替代方案。**
-## Step 1: Outline use cases and constraints
+## 第一步:简述用例与约束条件
-> Gather requirements and scope the problem.
-> Ask questions to clarify use cases and constraints.
-> Discuss assumptions.
+> 搜集需求与问题的范围。
+> 提出问题来明确用例与约束条件。
+> 讨论假设。
-Without an interviewer to address clarifying questions, we'll define some use cases and constraints.
+我们将在没有面试官明确说明问题的情况下,自己定义一些用例以及限制条件。
-### Use cases
+### 用例
-#### We'll scope the problem to handle only the following use cases
+#### 我们将把问题限定在仅处理以下用例的范围中
-* **User** connects to a financial account
-* **Service** extracts transactions from the account
- * Updates daily
- * Categorizes transactions
- * Allows manual category override by the user
- * No automatic re-categorization
- * Analyzes monthly spending, by category
-* **Service** recommends a budget
- * Allows users to manually set a budget
- * Sends notifications when approaching or exceeding budget
-* **Service** has high availability
+* **用户** 连接到一个财务账户
+* **服务** 从账户中提取交易
+ * 每日更新
+ * 分类交易
+ * 允许用户手动分类
+ * 不自动重新分类
+ * 按类别分析每月支出
+* **服务** 推荐预算
+ * 允许用户手动设置预算
+ * 当接近或者超出预算时,发送通知
+* **服务** 具有高可用性
-#### Out of scope
+#### 非用例范围
-* **Service** performs additional logging and analytics
+* **服务** 执行附加的日志记录和分析
-### Constraints and assumptions
+### 限制条件与假设
-#### State assumptions
+#### 提出假设
-* Traffic is not evenly distributed
-* Automatic daily update of accounts applies only to users active in the past 30 days
-* Adding or removing financial accounts is relatively rare
-* Budget notifications don't need to be instant
-* 10 million users
- * 10 budget categories per user = 100 million budget items
- * Example categories:
+* 网络流量非均匀分布
+* 自动账户日更新只适用于 30 天内活跃的用户
+* 添加或者移除财务账户相对较少
+* 预算通知不需要及时
+* 1000 万用户
+ * 每个用户 10 个预算类别 = 1 亿个预算项
+ * 示例类别:
* Housing = $1,000
* Food = $200
* Gas = $100
- * Sellers are used to determine transaction category
- * 50,000 sellers
-* 30 million financial accounts
-* 5 billion transactions per month
-* 500 million read requests per month
-* 10:1 write to read ratio
- * Write-heavy, users make transactions daily, but few visit the site daily
+ * 根据卖方来确定交易类别
+ * 50000 个卖方
+* 3000 万财务账户
+* 每月 50 亿交易
+* 每月 5 亿读请求
+* 10:1 的写读比
+ * 写多读少(Write-heavy),用户每天都进行交易,但很少有用户每天访问该网站
-#### Calculate usage
+#### 计算用量
-**Clarify with your interviewer if you should run back-of-the-envelope usage calculations.**
+**如果你需要进行粗略的用量计算,请向你的面试官说明。**
-* Size per transaction:
- * `user_id` - 8 bytes
- * `created_at` - 5 bytes
- * `seller` - 32 bytes
- * `amount` - 5 bytes
- * Total: ~50 bytes
-* 250 GB of new transaction content per month
- * 50 bytes per transaction * 5 billion transactions per month
- * 9 TB of new transaction content in 3 years
+* 每次交易的用量:
+ * `user_id` - 8 字节
+ * `created_at` - 5 字节
+ * `seller` - 32 字节
+ * `amount` - 5 字节
+ * 总计:约 50 字节
+* 每月产生 250 GB 新的交易内容
+ * 每次交易 50 字节 * 每月 50 亿次交易
+ * 3年内新的交易内容 9 TB
* Assume most are new transactions instead of updates to existing ones
-* 2,000 transactions per second on average
-* 200 read requests per second on average
+* 平均每秒产生 2000 次交易
+* 平均每秒产生 200 次读请求
-Handy conversion guide:
+便利换算指南:
-* 2.5 million seconds per month
-* 1 request per second = 2.5 million requests per month
-* 40 requests per second = 100 million requests per month
-* 400 requests per second = 1 billion requests per month
+* 每个月有 250 万秒
+* 每秒一个请求 = 每个月 250 万次请求
+* 每秒 40 个请求 = 每个月 1 亿次请求
+* 每秒 400 个请求 = 每个月 10 亿次请求
-## Step 2: Create a high level design
+## 第二步:概要设计
-> Outline a high level design with all important components.
+> 列出所有重要组件以规划概要设计。
![Imgur](http://i.imgur.com/E8klrBh.png)
-## Step 3: Design core components
+## 第三步:设计核心组件
-> Dive into details for each core component.
+> 深入每个核心组件的细节。
-### Use case: User connects to a financial account
+### 用例:用户连接到一个财务账户
-We could store info on the 10 million users in a [relational database](https://github.com/donnemartin/system-design-primer#relational-database-management-system-rdbms). We should discuss the [use cases and tradeoffs between choosing SQL or NoSQL](https://github.com/donnemartin/system-design-primer#sql-or-nosql).
+我们可以将 1000 万用户的信息存储在一个[关系数据库](https://github.com/donnemartin/system-design-primer#relational-database-management-system-rdbms)中。我们应该讨论一下[选择 SQL 还是 NoSQL 的用例与权衡](https://github.com/donnemartin/system-design-primer#sql-or-nosql)。
-* The **Client** sends a request to the **Web Server**, running as a [reverse proxy](https://github.com/donnemartin/system-design-primer#reverse-proxy-web-server)
-* The **Web Server** forwards the request to the **Accounts API** server
-* The **Accounts API** server updates the **SQL Database** `accounts` table with the newly entered account info
+* **客户端** 向作为[反向代理](https://github.com/donnemartin/system-design-primer#reverse-proxy-web-server)运行的 **Web 服务器** 发送请求
+* **Web 服务器** 转发请求到 **账户API** 服务器
+* **账户API** 服务器将新输入的账户信息更新到 **SQL数据库** 的`accounts`表
-**Clarify with your interviewer how much code you are expected to write**.
+**告知你的面试官你准备写多少代码**。
-The `accounts` table could have the following structure:
+`accounts`表应该具有如下结构:
```
id int NOT NULL AUTO_INCREMENT
@@ -110,9 +110,9 @@ PRIMARY KEY(id)
FOREIGN KEY(user_id) REFERENCES users(id)
```
-We'll create an [index](https://github.com/donnemartin/system-design-primer#use-good-indices) on `id`, `user_id `, and `created_at` to speed up lookups (log-time instead of scanning the entire table) and to keep the data in memory. Reading 1 MB sequentially from memory takes about 250 microseconds, while reading from SSD takes 4x and from disk takes 80x longer.1
+我们将在`id`,`user_id`和`created_at`等字段上创建一个[索引](https://github.com/donnemartin/system-design-primer#use-good-indices)以加速查找(对数时间而不是扫描整个表)并保持数据在内存中。从内存中顺序读取 1 MB 数据花费大约 250 微秒,而从SSD读取是其4倍,从磁盘读取是其80倍。1
-We'll use a public [**REST API**](https://github.com/donnemartin/system-design-primer#representational-state-transfer-rest):
+我们将使用公开的[**REST API**](https://github.com/donnemartin/system-design-primer#representational-state-transfer-rest):
```
$ curl -X POST --data '{ "user_id": "foo", "account_url": "bar", \
@@ -120,35 +120,35 @@ $ curl -X POST --data '{ "user_id": "foo", "account_url": "bar", \
https://mint.com/api/v1/account
```
-For internal communications, we could use [Remote Procedure Calls](https://github.com/donnemartin/system-design-primer#remote-procedure-call-rpc).
+对于内部通信,我们可以使用[远程过程调用](https://github.com/donnemartin/system-design-primer#remote-procedure-call-rpc)。
-Next, the service extracts transactions from the account.
+接下来,服务从账户中提取交易。
-### Use case: Service extracts transactions from the account
+### 用例:服务从账户中提取交易
-We'll want to extract information from an account in these cases:
+如下几种情况下,我们会想要从账户中提取信息:
-* The user first links the account
-* The user manually refreshes the account
-* Automatically each day for users who have been active in the past 30 days
+* 用户首次链接账户
+* 用户手动更新账户
+* 为过去 30 天内活跃的用户自动日更新
-Data flow:
+数据流:
-* The **Client** sends a request to the **Web Server**
-* The **Web Server** forwards the request to the **Accounts API** server
-* The **Accounts API** server places a job on a **Queue** such as [Amazon SQS](https://aws.amazon.com/sqs/) or [RabbitMQ](https://www.rabbitmq.com/)
- * Extracting transactions could take awhile, we'd probably want to do this [asynchronously with a queue](https://github.com/donnemartin/system-design-primer#asynchronism), although this introduces additional complexity
-* The **Transaction Extraction Service** does the following:
- * Pulls from the **Queue** and extracts transactions for the given account from the financial institution, storing the results as raw log files in the **Object Store**
- * Uses the **Category Service** to categorize each transaction
- * Uses the **Budget Service** to calculate aggregate monthly spending by category
- * The **Budget Service** uses the **Notification Service** to let users know if they are nearing or have exceeded their budget
- * Updates the **SQL Database** `transactions` table with categorized transactions
- * Updates the **SQL Database** `monthly_spending` table with aggregate monthly spending by category
- * Notifies the user the transactions have completed through the **Notification Service**:
- * Uses a **Queue** (not pictured) to asynchronously send out notifications
+* **客户端**向 **Web服务器** 发送请求
+* **Web服务器** 将请求转发到 **帐户API** 服务器
+* **帐户API** 服务器将job放在 **队列** 中,如 [Amazon SQS](https://aws.amazon.com/sqs/) 或者 [RabbitMQ](https://www.rabbitmq.com/)
+ * 提取交易可能需要一段时间,我们可能希望[与队列异步](https://github.com/donnemartin/system-design-primer#asynchronism)地来做,虽然这会引入额外的复杂度。
+* **交易提取服务** 执行如下操作:
+ * 从 **队列** 中拉取任务,并从金融机构中提取给定账户的交易,将结果作为原始日志文件存储在 **对象存储区**。
+ * 使用 **分类服务** 来分类每个交易
+ * 使用 **预算服务** 来按类别计算每月总支出
+ * **预算服务** 使用 **通知服务** 让用户知道他们是否接近或者已经超出预算
+ * 更新具有分类交易的 **SQL数据库** 的`transactions`表
+ * 按类别更新 **SQL数据库** `monthly_spending`表的每月总支出
+ * 通过 **通知服务** 提醒用户交易完成
+ * 使用一个 **队列** (没有画出来) 来异步发送通知
-The `transactions` table could have the following structure:
+`transactions`表应该具有如下结构:
```
id int NOT NULL AUTO_INCREMENT
@@ -160,9 +160,9 @@ PRIMARY KEY(id)
FOREIGN KEY(user_id) REFERENCES users(id)
```
-We'll create an [index](https://github.com/donnemartin/system-design-primer#use-good-indices) on `id`, `user_id `, and `created_at`.
+我们将在 `id`,`user_id`,和 `created_at`字段上创建[索引](https://github.com/donnemartin/system-design-primer#use-good-indices)。
-The `monthly_spending` table could have the following structure:
+`monthly_spending`表应该具有如下结构:
```
id int NOT NULL AUTO_INCREMENT
@@ -174,13 +174,13 @@ PRIMARY KEY(id)
FOREIGN KEY(user_id) REFERENCES users(id)
```
-We'll create an [index](https://github.com/donnemartin/system-design-primer#use-good-indices) on `id` and `user_id `.
+我们将在`id`,`user_id`字段上创建[索引](https://github.com/donnemartin/system-design-primer#use-good-indices)。
-#### Category service
+#### 分类服务
-For the **Category Service**, we can seed a seller-to-category dictionary with the most popular sellers. If we estimate 50,000 sellers and estimate each entry to take less than 255 bytes, the dictionary would only take about 12 MB of memory.
+对于 **分类服务**,我们可以用最常见的卖家预先生成一个卖家-类别字典。如果我们估计有 50000 个卖家,并估计每个条目占用不超过 255 个字节,该字典只需要大约 12 MB 内存。
-**Clarify with your interviewer how much code you are expected to write**.
+**告知你的面试官你准备写多少代码**。
```python
class DefaultCategories(Enum):
@@ -197,7 +197,7 @@ seller_category_map['Target'] = DefaultCategories.SHOPPING
...
```
-For sellers not initially seeded in the map, we could use a crowdsourcing effort by evaluating the manual category overrides our users provide. We could use a heap to quickly lookup the top manual override per seller in O(1) time.
+对于一开始没有在映射中的卖家,我们可以通过评估用户提供的手动分类来进行众包。我们可以用堆在 O(1) 时间内快速查找每个卖家最常见的手动分类覆盖。
```python
class Categorizer(object):
@@ -217,7 +217,7 @@ class Categorizer(object):
return None
```
-Transaction implementation:
+交易实现:
```python
class Transaction(object):
@@ -228,9 +228,10 @@ class Transaction(object):
self.amount = amount
```
-### Use case: Service recommends a budget
+### 用例:服务推荐预算
-To start, we could use a generic budget template that allocates category amounts based on income tiers. Using this approach, we would not have to store the 100 million budget items identified in the constraints, only those that the user overrides. If a user overrides a budget category, which we could store the override in the `TABLE budget_overrides`.
+首先,我们可以使用根据收入等级分配每类别金额的通用预算模板。使用这种方法,我们不必存储在约束中标识的 1 亿个预算项目,只需存储用户覆盖的预算项目。如果用户覆盖预算类别,我们可以在
+`TABLE budget_overrides`中存储此覆盖。
```python
class Budget(object):
@@ -252,26 +253,26 @@ class Budget(object):
self.categories_to_budget_map[category] = amount
```
-For the **Budget Service**, we can potentially run SQL queries on the `transactions` table to generate the `monthly_spending` aggregate table. The `monthly_spending` table would likely have much fewer rows than the total 5 billion transactions, since users typically have many transactions per month.
+对于 **预算服务** 而言,我们可以在`transactions`表上运行SQL查询以生成`monthly_spending`聚合表。由于用户通常每个月有很多交易,所以`monthly_spending`表的行数可能会少于总共50亿次交易的行数。
-As an alternative, we can run **MapReduce** jobs on the raw transaction files to:
+作为替代,我们可以在原始交易文件上运行 **MapReduce** 作业来:
-* Categorize each transaction
-* Generate aggregate monthly spending by category
+* 分类每个交易
+* 按类别生成每月总支出
-Running analyses on the transaction files could significantly reduce the load on the database.
+对交易文件运行分析,可以显著减少数据库的负载。
-We could call the **Budget Service** to re-run the analysis if the user updates a category.
+如果用户更新类别,我们可以调用 **预算服务** 重新运行分析。
-**Clarify with your interviewer how much code you are expected to write**.
+**告知你的面试官你准备写多少代码**。
-Sample log file format, tab delimited:
+日志文件格式样例,以tab分割:
```
user_id timestamp seller amount
```
-**MapReduce** implementation:
+**MapReduce** 实现:
```python
class SpendingByCategory(MRJob):
@@ -282,26 +283,25 @@ class SpendingByCategory(MRJob):
...
def calc_current_year_month(self):
- """Return the current year and month."""
+ """返回当前年月"""
...
def extract_year_month(self, timestamp):
- """Return the year and month portions of the timestamp."""
+ """返回时间戳的年,月部分"""
...
def handle_budget_notifications(self, key, total):
- """Call notification API if nearing or exceeded budget."""
+ """如果接近或超出预算,调用通知API"""
...
def mapper(self, _, line):
- """Parse each log line, extract and transform relevant lines.
+ """解析每个日志行,提取和转换相关行。
- Argument line will be of the form:
+ 参数行应为如下形式:
user_id timestamp seller amount
- Using the categorizer to convert seller to category,
- emit key value pairs of the form:
+ 使用分类器来将卖家转换成类别,生成如下形式的key-value对:
(user_id, 2016-01, shopping), 25
(user_id, 2016-01, shopping), 100
@@ -314,7 +314,7 @@ class SpendingByCategory(MRJob):
yield (user_id, period, category), amount
def reducer(self, key, value):
- """Sum values for each key.
+ """将每个key对应的值求和。
(user_id, 2016-01, shopping), 125
(user_id, 2016-01, gas), 50
@@ -323,119 +323,118 @@ class SpendingByCategory(MRJob):
yield key, sum(values)
```
-## Step 4: Scale the design
+## 第四步:设计扩展
-> Identify and address bottlenecks, given the constraints.
+> 根据限制条件,找到并解决瓶颈。
![Imgur](http://i.imgur.com/V5q57vU.png)
-**Important: Do not simply jump right into the final design from the initial design!**
+**重要提示:不要从最初设计直接跳到最终设计中!**
-State you would 1) **Benchmark/Load Test**, 2) **Profile** for bottlenecks 3) address bottlenecks while evaluating alternatives and trade-offs, and 4) repeat. See [Design a system that scales to millions of users on AWS](../scaling_aws/README.md) as a sample on how to iteratively scale the initial design.
+现在你要 1) **基准测试、负载测试**。2) **分析、描述**性能瓶颈。3) 在解决瓶颈问题的同时,评估替代方案、权衡利弊。4) 重复以上步骤。请阅读[「设计一个系统,并将其扩大到为数以百万计的 AWS 用户服务」](../scaling_aws/README.md) 来了解如何逐步扩大初始设计。
-It's important to discuss what bottlenecks you might encounter with the initial design and how you might address each of them. For example, what issues are addressed by adding a **Load Balancer** with multiple **Web Servers**? **CDN**? **Master-Slave Replicas**? What are the alternatives and **Trade-Offs** for each?
+讨论初始设计可能遇到的瓶颈及相关解决方案是很重要的。例如加上一个配置多台 **Web 服务器**的**负载均衡器**是否能够解决问题?**CDN**呢?**主从复制**呢?它们各自的替代方案和需要**权衡**的利弊又有什么呢?
-We'll introduce some components to complete the design and to address scalability issues. Internal load balancers are not shown to reduce clutter.
+我们将会介绍一些组件来完成设计,并解决架构扩展问题。为了使图示简洁,图中没有画出内部负载均衡器。
-*To avoid repeating discussions*, refer to the following [system design topics](https://github.com/donnemartin/system-design-primer#index-of-system-design-topics) for main talking points, tradeoffs, and alternatives:
+**为了避免重复讨论**,请参考[系统设计主题索引](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#系统设计主题的索引)相关部分来了解其要点、方案的权衡取舍以及可选的替代方案。
-* [DNS](https://github.com/donnemartin/system-design-primer#domain-name-system)
-* [CDN](https://github.com/donnemartin/system-design-primer#content-delivery-network)
-* [Load balancer](https://github.com/donnemartin/system-design-primer#load-balancer)
-* [Horizontal scaling](https://github.com/donnemartin/system-design-primer#horizontal-scaling)
-* [Web server (reverse proxy)](https://github.com/donnemartin/system-design-primer#reverse-proxy-web-server)
-* [API server (application layer)](https://github.com/donnemartin/system-design-primer#application-layer)
-* [Cache](https://github.com/donnemartin/system-design-primer#cache)
-* [Relational database management system (RDBMS)](https://github.com/donnemartin/system-design-primer#relational-database-management-system-rdbms)
-* [SQL write master-slave failover](https://github.com/donnemartin/system-design-primer#fail-over)
-* [Master-slave replication](https://github.com/donnemartin/system-design-primer#master-slave-replication)
-* [Asynchronism](https://github.com/donnemartin/system-design-primer#asynchronism)
-* [Consistency patterns](https://github.com/donnemartin/system-design-primer#consistency-patterns)
-* [Availability patterns](https://github.com/donnemartin/system-design-primer#availability-patterns)
+* [DNS](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#域名系统)
+* [负载均衡器](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#负载均衡器)
+* [水平拓展](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#水平扩展)
+* [反向代理(web 服务器)](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#反向代理web-服务器)
+* [API 服务(应用层)](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#应用层)
+* [缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#缓存)
+* [关系型数据库管理系统 (RDBMS)](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#关系型数据库管理系统rdbms)
+* [SQL 故障主从切换](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#故障切换)
+* [主从复制](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#主从复制)
+* [异步](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#异步)
+* [一致性模式](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#一致性模式)
+* [可用性模式](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#可用性模式)
-We'll add an additional use case: **User** accesses summaries and transactions.
+我们将增加一个额外的用例:**用户** 访问摘要和交易数据。
-User sessions, aggregate stats by category, and recent transactions could be placed in a **Memory Cache** such as Redis or Memcached.
+用户会话、按类别汇总的统计信息,以及最近的交易可以放在 **内存缓存**(如 Redis 或 Memcached)中。
-* The **Client** sends a read request to the **Web Server**
-* The **Web Server** forwards the request to the **Read API** server
- * Static content can be served from the **Object Store** such as S3, which is cached on the **CDN**
-* The **Read API** server does the following:
- * Checks the **Memory Cache** for the content
- * If the url is in the **Memory Cache**, returns the cached contents
- * Else
- * If the url is in the **SQL Database**, fetches the contents
- * Updates the **Memory Cache** with the contents
+* **客户端** 发送读请求给 **Web 服务器**
+* **Web 服务器** 转发请求到 **读 API** 服务器
+ * 静态内容可由 **对象存储**(比如 S3)提供,并缓存在 **CDN** 上
+* **读 API** 服务器做如下动作:
+ * 检查 **内存缓存** 的内容
+ * 如果URL在 **内存缓存**中,返回缓存的内容
+ * 否则
+ * 如果URL在 **SQL 数据库**中,获取该内容
+ * 以其内容更新 **内存缓存**
-Refer to [When to update the cache](https://github.com/donnemartin/system-design-primer#when-to-update-the-cache) for tradeoffs and alternatives. The approach above describes [cache-aside](https://github.com/donnemartin/system-design-primer#cache-aside).
+参考 [何时更新缓存](https://github.com/donnemartin/system-design-primer#when-to-update-the-cache) 中权衡和替代的内容。以上方法描述了 [cache-aside缓存模式](https://github.com/donnemartin/system-design-primer#cache-aside).
-Instead of keeping the `monthly_spending` aggregate table in the **SQL Database**, we could create a separate **Analytics Database** using a data warehousing solution such as Amazon Redshift or Google BigQuery.
+我们可以使用诸如 Amazon Redshift 或者 Google BigQuery 等数据仓库解决方案,而不是将`monthly_spending`聚合表保留在 **SQL 数据库** 中。
-We might only want to store a month of `transactions` data in the database, while storing the rest in a data warehouse or in an **Object Store**. An **Object Store** such as Amazon S3 can comfortably handle the constraint of 250 GB of new content per month.
+我们可能只想在数据库中存储一个月的`交易`数据,而将其余数据存储在数据仓库或者 **对象存储区** 中。**对象存储区**(如 Amazon S3)能够轻松地满足每月新增 250 GB 内容的存储需求。
-To address the 2,000 *average* read requests per second (higher at peak), traffic for popular content should be handled by the **Memory Cache** instead of the database. The **Memory Cache** is also useful for handling the unevenly distributed traffic and traffic spikes. The **SQL Read Replicas** should be able to handle the cache misses, as long as the replicas are not bogged down with replicating writes.
+为了解决每秒 *平均* 2000 次读请求(峰值时更高),热门内容的流量应由 **内存缓存** 而不是数据库来处理。 **内存缓存** 也可用于处理不均匀分布的流量和流量尖峰。 只要副本没有被复制写入操作拖垮,**SQL 读副本** 应该能够处理缓存未命中的请求。
-200 *average* transaction writes per second (higher at peak) might be tough for a single **SQL Write Master-Slave**. We might need to employ additional SQL scaling patterns:
+*平均* 200 次交易写入每秒(峰值时更高)对于单个 **SQL 写入主-从服务** 来说可能是棘手的。我们可能需要考虑其它的 SQL 性能拓展技术:
-* [Federation](https://github.com/donnemartin/system-design-primer#federation)
-* [Sharding](https://github.com/donnemartin/system-design-primer#sharding)
-* [Denormalization](https://github.com/donnemartin/system-design-primer#denormalization)
-* [SQL Tuning](https://github.com/donnemartin/system-design-primer#sql-tuning)
+* [联合](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#联合)
+* [分片](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#分片)
+* [非规范化](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#非规范化)
+* [SQL 调优](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#sql-调优)
-We should also consider moving some data to a **NoSQL Database**.
+我们也可以考虑将一些数据移至 **NoSQL 数据库**。
-## Additional talking points
+## 其它要点
-> Additional topics to dive into, depending on the problem scope and time remaining.
+> 是否深入这些额外的主题,取决于你的问题范围和剩下的时间。
#### NoSQL
-* [Key-value store](https://github.com/donnemartin/system-design-primer#key-value-store)
-* [Document store](https://github.com/donnemartin/system-design-primer#document-store)
-* [Wide column store](https://github.com/donnemartin/system-design-primer#wide-column-store)
-* [Graph database](https://github.com/donnemartin/system-design-primer#graph-database)
-* [SQL vs NoSQL](https://github.com/donnemartin/system-design-primer#sql-or-nosql)
+* [键-值存储](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#键-值存储)
+* [文档类型存储](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#文档类型存储)
+* [列型存储](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#列型存储)
+* [图数据库](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#图数据库)
+* [SQL vs NoSQL](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#sql-还是-nosql)
-### Caching
+### 缓存
-* Where to cache
- * [Client caching](https://github.com/donnemartin/system-design-primer#client-caching)
- * [CDN caching](https://github.com/donnemartin/system-design-primer#cdn-caching)
- * [Web server caching](https://github.com/donnemartin/system-design-primer#web-server-caching)
- * [Database caching](https://github.com/donnemartin/system-design-primer#database-caching)
- * [Application caching](https://github.com/donnemartin/system-design-primer#application-caching)
-* What to cache
- * [Caching at the database query level](https://github.com/donnemartin/system-design-primer#caching-at-the-database-query-level)
- * [Caching at the object level](https://github.com/donnemartin/system-design-primer#caching-at-the-object-level)
-* When to update the cache
- * [Cache-aside](https://github.com/donnemartin/system-design-primer#cache-aside)
- * [Write-through](https://github.com/donnemartin/system-design-primer#write-through)
- * [Write-behind (write-back)](https://github.com/donnemartin/system-design-primer#write-behind-write-back)
- * [Refresh ahead](https://github.com/donnemartin/system-design-primer#refresh-ahead)
+* 在哪缓存
+ * [客户端缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#客户端缓存)
+ * [CDN 缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#cdn-缓存)
+ * [Web 服务器缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#web-服务器缓存)
+ * [数据库缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#数据库缓存)
+ * [应用缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#应用缓存)
+* 什么需要缓存
+ * [数据库查询级别的缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#数据库查询级别的缓存)
+ * [对象级别的缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#对象级别的缓存)
+* 何时更新缓存
+ * [缓存模式](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#缓存模式)
+ * [直写模式](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#直写模式)
+ * [回写模式](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#回写模式)
+ * [刷新](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#刷新)
-### Asynchronism and microservices
+### 异步与微服务
-* [Message queues](https://github.com/donnemartin/system-design-primer#message-queues)
-* [Task queues](https://github.com/donnemartin/system-design-primer#task-queues)
-* [Back pressure](https://github.com/donnemartin/system-design-primer#back-pressure)
-* [Microservices](https://github.com/donnemartin/system-design-primer#microservices)
+* [消息队列](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#消息队列)
+* [任务队列](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#任务队列)
+* [背压](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#背压)
+* [微服务](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#微服务)
-### Communications
+### 通信
-* Discuss tradeoffs:
- * External communication with clients - [HTTP APIs following REST](https://github.com/donnemartin/system-design-primer#representational-state-transfer-rest)
- * Internal communications - [RPC](https://github.com/donnemartin/system-design-primer#remote-procedure-call-rpc)
-* [Service discovery](https://github.com/donnemartin/system-design-primer#service-discovery)
+* 可权衡选择的方案:
+ * 与客户端的外部通信 - [使用 REST 作为 HTTP API](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#表述性状态转移rest)
+ * 服务器内部通信 - [RPC](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#远程过程调用协议rpc)
+* [服务发现](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#服务发现)
-### Security
+### 安全性
-Refer to the [security section](https://github.com/donnemartin/system-design-primer#security).
+请参阅[「安全」](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#安全)一章。
-### Latency numbers
+### 延迟数值
-See [Latency numbers every programmer should know](https://github.com/donnemartin/system-design-primer#latency-numbers-every-programmer-should-know).
+请参阅[「每个程序员都应该知道的延迟数」](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#每个程序员都应该知道的延迟数)。
-### Ongoing
+### 持续探讨
-* Continue benchmarking and monitoring your system to address bottlenecks as they come up
-* Scaling is an iterative process
+* 持续进行基准测试并监控你的系统,以便在瓶颈出现时解决它们。
+* 架构扩展是一个迭代的过程。
diff --git a/solutions/system_design/pastebin/README.md b/solutions/system_design/pastebin/README.md
index 756c78c2..9210b02b 100644
--- a/solutions/system_design/pastebin/README.md
+++ b/solutions/system_design/pastebin/README.md
@@ -1,112 +1,113 @@
-# Design Pastebin.com (or Bit.ly)
+# 设计 Pastebin.com(或 Bit.ly)
-*Note: This document links directly to relevant areas found in the [system design topics](https://github.com/donnemartin/system-design-primer#index-of-system-design-topics) to avoid duplication. Refer to the linked content for general talking points, tradeoffs, and alternatives.*
+**注意:这个文档中的链接会直接指向[系统设计主题索引](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#系统设计主题的索引)中的有关部分,以避免重复的内容。你可以参考链接的相关内容,来了解其总的要点、方案的权衡取舍以及可选的替代方案。**
-**Design Bit.ly** - is a similar question, except pastebin requires storing the paste contents instead of the original unshortened url.
+除了粘贴板需要存储的是粘贴的完整内容而不是原始的未缩短 url 之外,**设计 Bit.ly** 是与本文类似的一个问题。
-## Step 1: Outline use cases and constraints
+## 第一步:简述用例与约束条件
-> Gather requirements and scope the problem.
-> Ask questions to clarify use cases and constraints.
-> Discuss assumptions.
+> 搜集需求与问题的范围。
+> 提出问题来明确用例与约束条件。
+> 讨论假设。
-Without an interviewer to address clarifying questions, we'll define some use cases and constraints.
+我们将在没有面试官明确说明问题的情况下,自己定义一些用例以及限制条件。
-### Use cases
+### 用例
-#### We'll scope the problem to handle only the following use cases
+#### 我们将把问题限定在仅处理以下用例的范围中
-* **User** enters a block of text and gets a randomly generated link
- * Expiration
- * Default setting does not expire
- * Can optionally set a timed expiration
-* **User** enters a paste's url and views the contents
-* **User** is anonymous
-* **Service** tracks analytics of pages
- * Monthly visit stats
-* **Service** deletes expired pastes
-* **Service** has high availability
-#### Out of scope
+* **用户**输入一些文本,然后得到一个随机生成的链接
+ * 过期时间
+ * 默认为永不过期
+ * 可选设置为一定时间过期
+* **用户**输入粘贴板中的 url,查看内容
+* **用户**是匿名访问的
+* **服务**需要能够对页面进行跟踪分析
+ * 月访问量统计
+* **服务**将过期的内容删除
+* **服务**有着高可用性
-* **User** registers for an account
- * **User** verifies email
-* **User** logs into a registered account
- * **User** edits the document
-* **User** can set visibility
-* **User** can set the shortlink
+#### 不在用例范围内的有
-### Constraints and assumptions
+* **用户**注册了账号
+ * **用户**通过了邮箱验证
+* **用户**登录已注册的账号
+ * **用户**编辑他们的文档
+* **用户**能设置他们的内容是否可见
+* **用户**能自行设置短链接
-#### State assumptions
+### 限制条件与假设
-* Traffic is not evenly distributed
-* Following a short link should be fast
-* Pastes are text only
-* Page view analytics do not need to be realtime
-* 10 million users
-* 10 million paste writes per month
-* 100 million paste reads per month
-* 10:1 read to write ratio
+#### 提出假设
-#### Calculate usage
+* 网络流量不是均匀分布的
+* 访问短链接的速度必须要快
+* 只允许粘贴文本
+* 不需要对页面浏览量做实时分析
+* 1000 万用户
+* 每个月 1000 万次粘贴
+* 每个月 1 亿次读取请求
+* 10:1 的读写比例
-**Clarify with your interviewer if you should run back-of-the-envelope usage calculations.**
+#### 计算用量
-* Size per paste
- * 1 KB content per paste
- * `shortlink` - 7 bytes
- * `expiration_length_in_minutes` - 4 bytes
- * `created_at` - 5 bytes
- * `paste_path` - 255 bytes
- * total = ~1.27 KB
-* 12.7 GB of new paste content per month
- * 1.27 KB per paste * 10 million pastes per month
- * ~450 GB of new paste content in 3 years
- * 360 million shortlinks in 3 years
- * Assume most are new pastes instead of updates to existing ones
-* 4 paste writes per second on average
-* 40 read requests per second on average
+**如果你需要进行粗略的用量计算,请向你的面试官说明。**
-Handy conversion guide:
+* 每次粘贴的用量
+ * 1 KB 的内容
+ * `shortlink` - 7 字节
+ * `expiration_length_in_minutes` - 4 字节
+ * `created_at` - 5 字节
+ * `paste_path` - 255 字节
+ * 总计:大约 1.27 KB
+* 每个月的粘贴操作将会产生 12.7 GB 的新内容
+ * 每次粘贴 1.27 KB * 1000 万次粘贴
+ * 3年内大约产生了 450 GB 的新内容记录
+ * 3年内生成了 36000 万个短链接
+ * 假设大多数的粘贴操作都是新的粘贴而不是更新以前的粘贴内容
+* 平均每秒 4 次写入粘贴
+* 平均每秒 40 次读取粘贴请求
-* 2.5 million seconds per month
-* 1 request per second = 2.5 million requests per month
-* 40 requests per second = 100 million requests per month
-* 400 requests per second = 1 billion requests per month
+便利换算指南:
-## Step 2: Create a high level design
+* 每个月有 250 万秒
+* 每秒一个请求 = 每个月 250 万次请求
+* 每秒 40 个请求 = 每个月 1 亿次请求
+* 每秒 400 个请求 = 每个月 10 亿次请求
-> Outline a high level design with all important components.
+## 第二步:概要设计
+
+> 列出所有重要组件以规划概要设计。
![Imgur](http://i.imgur.com/BKsBnmG.png)
-## Step 3: Design core components
+## 第三步:设计核心组件
-> Dive into details for each core component.
+> 深入每个核心组件的细节。
-### Use case: User enters a block of text and gets a randomly generated link
+### 用例:用户输入一些文本,然后得到一个随机生成的链接
-We could use a [relational database](https://github.com/donnemartin/system-design-primer#relational-database-management-system-rdbms) as a large hash table, mapping the generated url to a file server and path containing the paste file.
+我们将使用[关系型数据库](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#关系型数据库管理系统rdbms),将其作为一个超大哈希表,将生成的 url 和文件服务器上对应文件的路径一一对应。
-Instead of managing a file server, we could use a managed **Object Store** such as Amazon S3 or a [NoSQL document store](https://github.com/donnemartin/system-design-primer#document-store).
+我们可以使用诸如 Amazon S3 之类的**对象存储服务**或者[NoSQL](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#nosql)来代替自建文件服务器。
-An alternative to a relational database acting as a large hash table, we could use a [NoSQL key-value store](https://github.com/donnemartin/system-design-primer#key-value-store). We should discuss the [tradeoffs between choosing SQL or NoSQL](https://github.com/donnemartin/system-design-primer#sql-or-nosql). The following discussion uses the relational database approach.
+除了使用关系型数据库来作为一个超大哈希表之外,我们也可以使用[NoSQL](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#nosql)来代替它。我们应该讨论[究竟是用 SQL 还是用 NoSQL](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#sql-还是-nosql)的权衡。不过在下面的讨论中,我们默认选择了使用关系型数据库的方案。
-* The **Client** sends a create paste request to the **Web Server**, running as a [reverse proxy](https://github.com/donnemartin/system-design-primer#reverse-proxy-web-server)
-* The **Web Server** forwards the request to the **Write API** server
-* The **Write API** server does the following:
- * Generates a unique url
- * Checks if the url is unique by looking at the **SQL Database** for a duplicate
- * If the url is not unique, it generates another url
- * If we supported a custom url, we could use the user-supplied (also check for a duplicate)
- * Saves to the **SQL Database** `pastes` table
- * Saves the paste data to the **Object Store**
- * Returns the url
+* **客户端**向运行[反向代理](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#反向代理web-服务器)的 **Web 服务器**发送一个粘贴请求
+* **Web 服务器** 将请求转发给**Write API** 服务
+* **Write API**服务将会:
+ * 生成一个独一无二的 url
+ * 通过在 **SQL 数据库**中查重来确认这个 url 是否的确独一无二
+ * 如果这个 url 已经存在了,重新生成一个 url
+ * 如果支持自定义 url,我们也可以使用用户提供的 url(也需要进行查重)
+ * 将 url 存入 **SQL 数据库**的 `pastes` 表中
+ * 将粘贴的数据存入**对象存储**系统中
+ * 返回 url
-**Clarify with your interviewer how much code you are expected to write**.
+**向你的面试官告知你准备写多少代码**。
-The `pastes` table could have the following structure:
+`pastes` 表的数据结构如下:
```
shortlink char(7) NOT NULL
@@ -116,19 +117,19 @@ paste_path varchar(255) NOT NULL
PRIMARY KEY(shortlink)
```
-We'll create an [index](https://github.com/donnemartin/system-design-primer#use-good-indices) on `shortlink ` and `created_at` to speed up lookups (log-time instead of scanning the entire table) and to keep the data in memory. Reading 1 MB sequentially from memory takes about 250 microseconds, while reading from SSD takes 4x and from disk takes 80x longer.1
+我们会以`shortlink` 与 `created_at` 创建一个 [索引](https://github.com/donnemartin/system-design-primer#use-good-indices)以加快查询速度(只需要对数时间,不再需要每次都扫描整个数据表)并让数据常驻内存。从内存读取 1 MB 连续数据大约要花 250 微秒,而从 SSD 读取同样大小的数据要花费 4 倍的时间,从机械硬盘读取需要花费 80 倍的时间。1
-To generate the unique url, we could:
+为了生成独一无二的 url,我们需要:
-* Take the [**MD5**](https://en.wikipedia.org/wiki/MD5) hash of the user's ip_address + timestamp
- * MD5 is a widely used hashing function that produces a 128-bit hash value
- * MD5 is uniformly distributed
- * Alternatively, we could also take the MD5 hash of randomly-generated data
-* [**Base 62**](https://www.kerstner.at/2012/07/shortening-strings-using-base-62-encoding/) encode the MD5 hash
- * Base 62 encodes to `[a-zA-Z0-9]` which works well for urls, eliminating the need for escaping special characters
- * There is only one hash result for the original input and Base 62 is deterministic (no randomness involved)
- * Base 64 is another popular encoding but provides issues for urls because of the additional `+` and `/` characters
- * The following [Base 62 pseudocode](http://stackoverflow.com/questions/742013/how-to-code-a-url-shortener) runs in O(k) time where k is the number of digits = 7:
+* 对用户的 IP 地址 + 时间戳进行 [**MD5**](https://en.wikipedia.org/wiki/MD5) 哈希编码
+ * MD5 是一种非常常用的哈希函数,它能生成 128 位的哈希值
+ * MD5 是均匀分布的
+ * 或者,我们也可以对随机生成的数据进行 MD5 哈希处理
+* 对 MD5 哈希值进行 [**Base 62**](https://www.kerstner.at/2012/07/shortening-strings-using-base-62-encoding/) 编码
+ * Base 62 编码后的值由 `[a-zA-Z0-9]` 组成,它们可以直接作为 url 的字符,不需要再次转义
+ * 对于同一个原始输入,哈希结果只有一个,并且 Base 62 编码是确定性的(不涉及随机性)
+ * Base 64 是另一种很流行的编码形式,但是它生成的字符串作为 url 存在一些问题:Base 64 字符串内包含 `+` 和 `/` 符号
+ * 下面的 [Base 62 伪代码](http://stackoverflow.com/questions/742013/how-to-code-a-url-shortener) 的时间复杂度为 O(k),其中 k 是位数,本例中 k = 7:
```python
def base_encode(num, base=62):
@@ -140,20 +141,19 @@ def base_encode(num, base=62):
digits = digits.reverse
```
-* Take the first 7 characters of the output, which results in 62^7 possible values and should be sufficient to handle our constraint of 360 million shortlinks in 3 years:
+* 输出前 7 个字符,其结果将有 62^7 种可能的取值,足以满足我们假设的 3 年内 36000 万个短链接的需求:
```python
url = base_encode(md5(ip_address+timestamp))[:URL_LENGTH]
```
-
-We'll use a public [**REST API**](https://github.com/donnemartin/system-design-primer#representational-state-transfer-rest):
+我们可以调用一个公共的 [REST API](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#表述性状态转移rest):
```
$ curl -X POST --data '{ "expiration_length_in_minutes": "60", \
"paste_contents": "Hello World!" }' https://pastebin.com/api/v1/paste
```
-Response:
+返回:
```
{
@@ -161,16 +161,16 @@ Response:
}
```
-For internal communications, we could use [Remote Procedure Calls](https://github.com/donnemartin/system-design-primer#remote-procedure-call-rpc).
+而对于服务器内部的通信,我们可以使用 [RPC](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#远程过程调用协议rpc)。
-### Use case: User enters a paste's url and views the contents
+### 用例:用户输入了一个之前粘贴得到的 url,希望浏览其存储的内容
-* The **Client** sends a get paste request to the **Web Server**
-* The **Web Server** forwards the request to the **Read API** server
-* The **Read API** server does the following:
- * Checks the **SQL Database** for the generated url
- * If the url is in the **SQL Database**, fetch the paste contents from the **Object Store**
- * Else, return an error message for the user
+* **客户端**向**Web 服务器**发起读取内容请求
+* **Web 服务器**将请求转发给**Read API**服务
+* **Read API**服务将会:
+ * 在**SQL 数据库**中检查生成的 url
+ * 如果查询的 url 存在于 **SQL 数据库**中,从**对象存储**服务将对应的粘贴内容取出
+ * 否则,给用户返回报错
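
这一读取流程同样可以概括成几行伪代码(`db` 与 `object_store` 仍是假设的封装):

```python
def get_paste(db, object_store, shortlink):
    """读取流程草图:先查 SQL,再取对象存储中的粘贴内容。"""
    row = db.find_one('pastes', shortlink=shortlink)
    if row is None:
        # url 不存在,给用户返回报错
        return {'error': 'paste not found'}
    # 从对象存储中取出对应的粘贴内容
    return {'paste_contents': object_store.get(shortlink)}
```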
REST API:
@@ -178,7 +178,7 @@ REST API:
$ curl https://pastebin.com/api/v1/paste?shortlink=foobar
```
-Response:
+返回:
```
{
@@ -188,27 +188,27 @@ Response:
}
```
-### Use case: Service tracks analytics of pages
+### 用例:对页面进行跟踪分析
-Since realtime analytics are not a requirement, we could simply **MapReduce** the **Web Server** logs to generate hit counts.
+由于不要求实时分析,我们可以简单地对 **Web 服务器**的日志用 **MapReduce** 来统计点击次数(hit counts)。
-**Clarify with your interviewer how much code you are expected to write**.
+**和你的面试官明确你需要写多少代码**。
```python
class HitCounts(MRJob):
def extract_url(self, line):
- """Extract the generated url from the log line."""
+ """从 log 中取出生成的 url。"""
...
def extract_year_month(self, line):
- """Return the year and month portions of the timestamp."""
+ """返回时间戳中表示年份与月份的一部分"""
...
def mapper(self, _, line):
- """Parse each log line, extract and transform relevant lines.
+ """解析日志的每一行,提取并转换相关行,
- Emit key value pairs of the form:
+ 将键值对设定为如下形式:
(2016-01, url0), 1
(2016-01, url0), 1
@@ -218,8 +218,8 @@ class HitCounts(MRJob):
period = self.extract_year_month(line)
yield (period, url), 1
- def reducer(self, key, values):
- """Sum values for each key.
+ def reducer(self, key, values):
+ """将每个 key 对应的值求和。
(2016-01, url0), 2
(2016-01, url1), 1
@@ -227,106 +227,105 @@ class HitCounts(MRJob):
yield key, sum(values)
```
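
上面的 `HitCounts` 继承自 `MRJob`(一般由 [mrjob](https://github.com/Yelp/mrjob) 库提供)。按照 mrjob 的惯例,通常还会在脚本末尾加上如下入口代码(文件名与日志路径仅为示例):

```python
# 放在 hit_counts.py 的末尾,作为标准的 mrjob 入口
if __name__ == '__main__':
    HitCounts.run()
```

随后即可用类似 `python hit_counts.py web_server.log` 的命令在本地运行,或者加上 `-r hadoop` 之类的参数提交到集群执行。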
-### Use case: Service deletes expired pastes
+### 用例:服务删除过期的粘贴内容
-To delete expired pastes, we could just scan the **SQL Database** for all entries whose expiration timestamp are older than the current timestamp. All expired entries would then be deleted (or marked as expired) from the table.
+我们可以通过扫描 **SQL 数据库**,查找出那些过期时间戳小于当前时间戳的条目,然后在表中删除(或者将其标记为过期)这些过期的粘贴内容。
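作为示意,这个清理任务可以写成一个简单的定时脚本(`db` 为假设的数据库封装,`expiration` 字段名也仅为示意):

```python
def delete_expired_pastes(db):
    """删除所有过期时间早于当前时间的粘贴记录(仅为草图)。"""
    # 也可以改成 UPDATE ... SET expired = 1,把记录标记为过期而不是直接删除
    db.execute("DELETE FROM pastes WHERE expiration < NOW()")
```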
-## Step 4: Scale the design
+## 第四步:架构扩展
-> Identify and address bottlenecks, given the constraints.
+> 根据限制条件,找到并解决瓶颈。
![Imgur](http://i.imgur.com/4edXG0T.png)
-**Important: Do not simply jump right into the final design from the initial design!**
+**重要提示:不要从最初设计直接跳到最终设计中!**
-State you would do this iteratively: 1) **Benchmark/Load Test**, 2) **Profile** for bottlenecks 3) address bottlenecks while evaluating alternatives and trade-offs, and 4) repeat. See [Design a system that scales to millions of users on AWS](../scaling_aws/README.md) as a sample on how to iteratively scale the initial design.
+现在你要 1) **基准测试、负载测试**。2) **分析、描述**性能瓶颈。3) 在解决瓶颈问题的同时,评估替代方案、权衡利弊。4) 重复以上步骤。请阅读[「设计一个系统,并将其扩大到为数以百万计的 AWS 用户服务」](../scaling_aws/README.md) 来了解如何逐步扩大初始设计。
-It's important to discuss what bottlenecks you might encounter with the initial design and how you might address each of them. For example, what issues are addressed by adding a **Load Balancer** with multiple **Web Servers**? **CDN**? **Master-Slave Replicas**? What are the alternatives and **Trade-Offs** for each?
+讨论初始设计可能遇到的瓶颈及相关解决方案是很重要的。例如加上一个配置多台 **Web 服务器**的**负载均衡器**是否能够解决问题?**CDN**呢?**主从复制**呢?它们各自的替代方案和需要**权衡**的利弊又有什么呢?
-We'll introduce some components to complete the design and to address scalability issues. Internal load balancers are not shown to reduce clutter.
+我们将会引入一些组件来完成设计,并解决架构扩展性问题。为保持图示简洁,内部的负载均衡器没有画出。
-*To avoid repeating discussions*, refer to the following [system design topics](https://github.com/donnemartin/system-design-primer#index-of-system-design-topics) for main talking points, tradeoffs, and alternatives:
+**为了避免重复讨论**,请参考[系统设计主题索引](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#系统设计主题的索引)相关部分来了解其要点、方案的权衡取舍以及可选的替代方案。
-* [DNS](https://github.com/donnemartin/system-design-primer#domain-name-system)
-* [CDN](https://github.com/donnemartin/system-design-primer#content-delivery-network)
-* [Load balancer](https://github.com/donnemartin/system-design-primer#load-balancer)
-* [Horizontal scaling](https://github.com/donnemartin/system-design-primer#horizontal-scaling)
-* [Web server (reverse proxy)](https://github.com/donnemartin/system-design-primer#reverse-proxy-web-server)
-* [API server (application layer)](https://github.com/donnemartin/system-design-primer#application-layer)
-* [Cache](https://github.com/donnemartin/system-design-primer#cache)
-* [Relational database management system (RDBMS)](https://github.com/donnemartin/system-design-primer#relational-database-management-system-rdbms)
-* [SQL write master-slave failover](https://github.com/donnemartin/system-design-primer#fail-over)
-* [Master-slave replication](https://github.com/donnemartin/system-design-primer#master-slave-replication)
-* [Consistency patterns](https://github.com/donnemartin/system-design-primer#consistency-patterns)
-* [Availability patterns](https://github.com/donnemartin/system-design-primer#availability-patterns)
+* [DNS](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#域名系统)
+* [负载均衡器](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#负载均衡器)
+* [水平拓展](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#水平扩展)
+* [反向代理(web 服务器)](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#反向代理web-服务器)
+* [API 服务(应用层)](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#应用层)
+* [缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#缓存)
+* [关系型数据库管理系统 (RDBMS)](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#关系型数据库管理系统rdbms)
+* [SQL 故障主从切换](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#故障切换)
+* [主从复制](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#主从复制)
+* [一致性模式](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#一致性模式)
+* [可用性模式](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#可用性模式)
-The **Analytics Database** could use a data warehousing solution such as Amazon Redshift or Google BigQuery.
+**分析数据库**可以使用现成的数据仓库解决方案,例如 Amazon Redshift 或 Google BigQuery。
-An **Object Store** such as Amazon S3 can comfortably handle the constraint of 12.7 GB of new content per month.
+像 Amazon S3 这样的**对象存储**系统可以轻松应对每月 12.7 GB 新增内容的限制条件。
-To address the 40 *average* read requests per second (higher at peak), traffic for popular content should be handled by the **Memory Cache** instead of the database. The **Memory Cache** is also useful for handling the unevenly distributed traffic and traffic spikes. The **SQL Read Replicas** should be able to handle the cache misses, as long as the replicas are not bogged down with replicating writes.
+为了应对平均每秒 40 次的读取请求(峰值会更高),热点内容的读取流量应该由**内存缓存**来处理,而不是数据库。**内存缓存**对于应对分布不均的流量和流量峰值也很有用。只要 **SQL 读取副本**没有忙于处理复制写入,它们应该足以应对缓存未命中的请求。
-4 *average* paste writes per second (with higher at peak) should be do-able for a single **SQL Write Master-Slave**. Otherwise, we'll need to employ additional SQL scaling patterns:
+平均每秒 4 次的粘贴写入(峰值会更高)对于单台 **SQL 写主-从**来说应该是可行的。否则,我们就需要使用更多的 SQL 扩展模式:
-* [Federation](https://github.com/donnemartin/system-design-primer#federation)
-* [Sharding](https://github.com/donnemartin/system-design-primer#sharding)
-* [Denormalization](https://github.com/donnemartin/system-design-primer#denormalization)
-* [SQL Tuning](https://github.com/donnemartin/system-design-primer#sql-tuning)
+* [联合](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#联合)
+* [分片](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#分片)
+* [非规范化](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#非规范化)
+* [SQL 调优](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#sql-调优)
-We should also consider moving some data to a **NoSQL Database**.
+我们也可以考虑将一些数据移至 **NoSQL 数据库**。
-## Additional talking points
+## 其它要点
-> Additional topics to dive into, depending on the problem scope and time remaining.
+> 是否深入这些额外的主题,取决于你的问题范围和剩下的时间。
#### NoSQL
-* [Key-value store](https://github.com/donnemartin/system-design-primer#key-value-store)
-* [Document store](https://github.com/donnemartin/system-design-primer#document-store)
-* [Wide column store](https://github.com/donnemartin/system-design-primer#wide-column-store)
-* [Graph database](https://github.com/donnemartin/system-design-primer#graph-database)
-* [SQL vs NoSQL](https://github.com/donnemartin/system-design-primer#sql-or-nosql)
+* [键-值存储](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#键-值存储)
+* [文档类型存储](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#文档类型存储)
+* [列型存储](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#列型存储)
+* [图数据库](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#图数据库)
+* [SQL vs NoSQL](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#sql-还是-nosql)
-### Caching
+### 缓存
-* Where to cache
- * [Client caching](https://github.com/donnemartin/system-design-primer#client-caching)
- * [CDN caching](https://github.com/donnemartin/system-design-primer#cdn-caching)
- * [Web server caching](https://github.com/donnemartin/system-design-primer#web-server-caching)
- * [Database caching](https://github.com/donnemartin/system-design-primer#database-caching)
- * [Application caching](https://github.com/donnemartin/system-design-primer#application-caching)
-* What to cache
- * [Caching at the database query level](https://github.com/donnemartin/system-design-primer#caching-at-the-database-query-level)
- * [Caching at the object level](https://github.com/donnemartin/system-design-primer#caching-at-the-object-level)
-* When to update the cache
- * [Cache-aside](https://github.com/donnemartin/system-design-primer#cache-aside)
- * [Write-through](https://github.com/donnemartin/system-design-primer#write-through)
- * [Write-behind (write-back)](https://github.com/donnemartin/system-design-primer#write-behind-write-back)
- * [Refresh ahead](https://github.com/donnemartin/system-design-primer#refresh-ahead)
+* 在哪缓存
+ * [客户端缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#客户端缓存)
+ * [CDN 缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#cdn-缓存)
+ * [Web 服务器缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#web-服务器缓存)
+ * [数据库缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#数据库缓存)
+ * [应用缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#应用缓存)
+* 什么需要缓存
+ * [数据库查询级别的缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#数据库查询级别的缓存)
+ * [对象级别的缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#对象级别的缓存)
+* 何时更新缓存
+ * [缓存模式](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#缓存模式)
+ * [直写模式](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#直写模式)
+ * [回写模式](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#回写模式)
+ * [刷新](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#刷新)
-### Asynchronism and microservices
+### 异步与微服务
-* [Message queues](https://github.com/donnemartin/system-design-primer#message-queues)
-* [Task queues](https://github.com/donnemartin/system-design-primer#task-queues)
-* [Back pressure](https://github.com/donnemartin/system-design-primer#back-pressure)
-* [Microservices](https://github.com/donnemartin/system-design-primer#microservices)
+* [消息队列](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#消息队列)
+* [任务队列](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#任务队列)
+* [背压](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#背压)
+* [微服务](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#微服务)
-### Communications
+### 通信
-* Discuss tradeoffs:
- * External communication with clients - [HTTP APIs following REST](https://github.com/donnemartin/system-design-primer#representational-state-transfer-rest)
- * Internal communications - [RPC](https://github.com/donnemartin/system-design-primer#remote-procedure-call-rpc)
-* [Service discovery](https://github.com/donnemartin/system-design-primer#service-discovery)
+* 可权衡选择的方案:
+ * 与客户端的外部通信 - [使用 REST 作为 HTTP API](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#表述性状态转移rest)
+ * 服务器内部通信 - [RPC](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#远程过程调用协议rpc)
+* [服务发现](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#服务发现)
-### Security
+### 安全性
-Refer to the [security section](https://github.com/donnemartin/system-design-primer#security).
+请参阅[「安全」](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#安全)一章。
-### Latency numbers
+### 延迟数值
-See [Latency numbers every programmer should know](https://github.com/donnemartin/system-design-primer#latency-numbers-every-programmer-should-know).
+请参阅[「每个程序员都应该知道的延迟数」](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#每个程序员都应该知道的延迟数)。
-### Ongoing
+### 持续探讨
-* Continue benchmarking and monitoring your system to address bottlenecks as they come up
-* Scaling is an iterative process
+* 持续进行基准测试并监控你的系统,以便在瓶颈出现时及时解决。
+* 架构拓展是一个迭代的过程。
diff --git a/solutions/system_design/query_cache/README.md b/solutions/system_design/query_cache/README.md
index 032adf34..c6f4be75 100644
--- a/solutions/system_design/query_cache/README.md
+++ b/solutions/system_design/query_cache/README.md
@@ -1,101 +1,101 @@
-# Design a key-value cache to save the results of the most recent web server queries
+# 设计一个键-值缓存来存储最近 web 服务查询的结果
-*Note: This document links directly to relevant areas found in the [system design topics](https://github.com/donnemartin/system-design-primer#index-of-system-design-topics) to avoid duplication. Refer to the linked content for general talking points, tradeoffs, and alternatives.*
+**注意:这个文档中的链接会直接指向[系统设计主题索引](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#系统设计主题的索引)中的有关部分,以避免重复的内容。你可以参考链接的相关内容,来了解其总的要点、方案的权衡取舍以及可选的替代方案。**
-## Step 1: Outline use cases and constraints
+## 第一步:简述用例与约束条件
-> Gather requirements and scope the problem.
-> Ask questions to clarify use cases and constraints.
-> Discuss assumptions.
+> 搜集需求与问题的范围。
+> 提出问题来明确用例与约束条件。
+> 讨论假设。
-Without an interviewer to address clarifying questions, we'll define some use cases and constraints.
+我们将在没有面试官明确说明问题的情况下,自己定义一些用例以及限制条件。
-### Use cases
+### 用例
-#### We'll scope the problem to handle only the following use cases
+#### 我们将把问题限定在仅处理以下用例的范围中
-* **User** sends a search request resulting in a cache hit
-* **User** sends a search request resulting in a cache miss
-* **Service** has high availability
+* **用户**发送一个搜索请求,命中缓存
+* **用户**发送一个搜索请求,未命中缓存
+* **服务**有着高可用性
-### Constraints and assumptions
+### 限制条件与假设
-#### State assumptions
+#### 提出假设
-* Traffic is not evenly distributed
- * Popular queries should almost always be in the cache
- * Need to determine how to expire/refresh
-* Serving from cache requires fast lookups
-* Low latency between machines
-* Limited memory in cache
- * Need to determine what to keep/remove
- * Need to cache millions of queries
-* 10 million users
-* 10 billion queries per month
+* 网络流量不是均匀分布的
+ * 经常被查询的内容应该一直存于缓存中
+ * 需要确定如何规定缓存过期、缓存刷新规则
+* 缓存提供的服务查询速度要快
+* 机器间延迟较低
+* 缓存有内存限制
+ * 需要决定缓存什么、移除什么
+ * 需要缓存百万级的查询
+* 1000 万用户
+* 每个月 100 亿次查询
-#### Calculate usage
+#### 计算用量
-**Clarify with your interviewer if you should run back-of-the-envelope usage calculations.**
+**如果你需要进行粗略的用量计算,请向你的面试官说明。**
-* Cache stores ordered list of key: query, value: results
- * `query` - 50 bytes
- * `title` - 20 bytes
- * `snippet` - 200 bytes
- * Total: 270 bytes
-* 2.7 TB of cache data per month if all 10 billion queries are unique and all are stored
- * 270 bytes per search * 10 billion searches per month
- * Assumptions state limited memory, need to determine how to expire contents
-* 4,000 requests per second
+* 缓存存储的是键值对有序表,键为 `query`(查询),值为 `results`(结果)。
+ * `query` - 50 字节
+ * `title` - 20 字节
+ * `snippet` - 200 字节
+ * 总计:270 字节
+* 假如 100 亿次查询都是不同的,且全部需要存储,那么每个月需要 2.7 TB 的缓存空间
+ * 单次查询 270 字节 * 每月查询 100 亿次
+ * 假设内存大小有限制,需要决定如何制定缓存过期规则
+* 每秒 4,000 次请求
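
上述估算可以用几行 Python 粗略核对(数值只是数量级估计):

```python
bytes_per_entry = 50 + 20 + 200          # query + title + snippet ≈ 270 字节
queries_per_month = 10 * 10**9           # 每月 100 亿次查询
seconds_per_month = 2.5 * 10**6          # 约 250 万秒

cache_per_month_tb = bytes_per_entry * queries_per_month / 10**12
requests_per_second = queries_per_month / seconds_per_month

print(cache_per_month_tb)   # ≈ 2.7 TB 缓存数据 / 月
print(requests_per_second)  # ≈ 4,000 次请求 / 秒
```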
-Handy conversion guide:
+便利换算指南:
-* 2.5 million seconds per month
-* 1 request per second = 2.5 million requests per month
-* 40 requests per second = 100 million requests per month
-* 400 requests per second = 1 billion requests per month
+* 每个月有 250 万秒
+* 每秒一个请求 = 每个月 250 万次请求
+* 每秒 40 个请求 = 每个月 1 亿次请求
+* 每秒 400 个请求 = 每个月 10 亿次请求
-## Step 2: Create a high level design
+## 第二步:概要设计
-> Outline a high level design with all important components.
+> 列出所有重要组件以规划概要设计。
![Imgur](http://i.imgur.com/KqZ3dSx.png)
-## Step 3: Design core components
+## 第三步:设计核心组件
-> Dive into details for each core component.
+> 深入每个核心组件的细节。
-### Use case: User sends a request resulting in a cache hit
+### 用例:用户发送了一次请求,命中了缓存
-Popular queries can be served from a **Memory Cache** such as Redis or Memcached to reduce read latency and to avoid overloading the **Reverse Index Service** and **Document Service**. Reading 1 MB sequentially from memory takes about 250 microseconds, while reading from SSD takes 4x and from disk takes 80x longer.1
+常用的查询可以由例如 Redis 或者 Memcached 之类的**内存缓存**提供支持,以减少数据读取延迟,并且避免**反向索引服务**以及**文档服务**的过载。从内存读取 1 MB 连续数据大约要花 250 微秒,而从 SSD 读取同样大小的数据要花费 4 倍的时间,从机械硬盘读取需要花费 80 倍以上的时间。1
-Since the cache has limited capacity, we'll use a least recently used (LRU) approach to expire older entries.
+由于缓存容量有限,我们将使用 LRU(近期最少使用算法)来控制缓存的过期。
-* The **Client** sends a request to the **Web Server**, running as a [reverse proxy](https://github.com/donnemartin/system-design-primer#reverse-proxy-web-server)
-* The **Web Server** forwards the request to the **Query API** server
-* The **Query API** server does the following:
- * Parses the query
- * Removes markup
- * Breaks up the text into terms
- * Fixes typos
- * Normalizes capitalization
- * Converts the query to use boolean operations
- * Checks the **Memory Cache** for the content matching the query
- * If there's a hit in the **Memory Cache**, the **Memory Cache** does the following:
- * Updates the cached entry's position to the front of the LRU list
- * Returns the cached contents
- * Else, the **Query API** does the following:
- * Uses the **Reverse Index Service** to find documents matching the query
- * The **Reverse Index Service** ranks the matching results and returns the top ones
- * Uses the **Document Service** to return titles and snippets
- * Updates the **Memory Cache** with the contents, placing the entry at the front of the LRU list
+* **客户端**向运行[反向代理](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#反向代理web-服务器)的 **Web 服务器**发送一个请求
+* 这个 **Web 服务器**将请求转发给**查询 API** 服务
+* **查询 API** 服务将会做这些事情:
+ * 分析查询
+ * 移除多余的内容
+ * 将文本分割成词组
+ * 修正拼写错误
+ * 规范化字母的大小写
+ * 将查询转换为布尔运算
+ * 检测**内存缓存**是否有匹配查询的内容
+ * 如果命中**内存缓存**,**内存缓存**将会做以下事情:
+ * 将该缓存条目移动到 LRU 链表的头部
+ * 返回缓存内容
+ * 否则,**查询 API** 将会做以下事情:
+ * 使用**反向索引服务**来查找匹配查询的文档
+ * **反向索引服务**对匹配到的结果进行排名,然后返回最符合的结果
+ * 使用**文档服务**返回文章标题与片段
+ * 将内容写入**内存缓存**,并把该条目放到 LRU 链表的头部
-#### Cache implementation
+#### 缓存的实现
-The cache can use a doubly-linked list: new items will be added to the head while items to expire will be removed from the tail. We'll use a hash table for fast lookups to each linked list node.
+缓存可以使用双向链表实现:新元素将会在头结点加入,过期的元素将会在尾节点被删除。我们使用哈希表以便能够快速查找每个链表节点。
-**Clarify with your interviewer how much code you are expected to write**.
+**和你的面试官明确你需要写多少代码**。
-**Query API Server** implementation:
+实现**查询 API 服务**:
```python
class QueryApi(object):
@@ -105,8 +105,8 @@ class QueryApi(object):
self.reverse_index_service = reverse_index_service
def parse_query(self, query):
- """Remove markup, break text into terms, deal with typos,
- normalize capitalization, convert to use boolean operations.
+ """移除多余内容,将文本分割成词组,修复拼写错误,
+ 规范化字母大小写,转换布尔运算。
"""
...
@@ -119,7 +119,7 @@ class QueryApi(object):
return results
```
-**Node** implementation:
+实现**节点**:
```python
class Node(object):
@@ -129,7 +129,7 @@ class Node(object):
self.results = results
```
-**LinkedList** implementation:
+实现**链表**:
```python
class LinkedList(object):
@@ -148,7 +148,7 @@ class LinkedList(object):
...
```
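
原文省略了 `LinkedList` 的具体实现。下面是一个最小化的草图,方法名沿用下文 `Cache` 中用到的 `move_to_front`、`append_to_front` 与 `remove_from_tail`,并假设 `Node` 节点带有 `prev` 与 `next` 指针(实现细节为假设):

```python
class LinkedList(object):

    def __init__(self):
        self.head = None  # 最近使用的节点
        self.tail = None  # 最久未使用的节点

    def append_to_front(self, node):
        """把节点插入链表头部。"""
        node.prev = None
        node.next = self.head
        if self.head is not None:
            self.head.prev = node
        self.head = node
        if self.tail is None:
            self.tail = node

    def remove_from_tail(self):
        """删除链表尾部(最久未使用)的节点。"""
        if self.tail is None:
            return
        self.tail = self.tail.prev
        if self.tail is not None:
            self.tail.next = None
        else:
            self.head = None

    def move_to_front(self, node):
        """把已有节点移动到链表头部。"""
        if node is self.head:
            return
        # 先把节点从当前位置摘下
        if node.prev is not None:
            node.prev.next = node.next
        if node.next is not None:
            node.next.prev = node.prev
        if node is self.tail:
            self.tail = node.prev
        self.append_to_front(node)
```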
-**Cache** implementation:
+实现**缓存**:
```python
class Cache(object):
@@ -160,9 +160,9 @@ class Cache(object):
self.linked_list = LinkedList()
def get(self, query):
- """Get the stored query result from the cache.
+ """从缓存取得存储的内容
- Accessing a node updates its position to the front of the LRU list.
+ 访问节点时会将其移动到 LRU 链表的头部。
"""
node = self.lookup[query]
if node is None:
@@ -171,136 +171,136 @@ class Cache(object):
return node.results
def set(self, results, query):
- """Set the result for the given query key in the cache.
+ """将所给查询键的结果存在缓存中。
- When updating an entry, updates its position to the front of the LRU list.
- If the entry is new and the cache is at capacity, removes the oldest entry
- before the new entry is added.
+ 当更新缓存记录的时候,将它的位置指向 LRU 链表的头部。
+ 如果这个记录是新的记录,并且缓存空间已满,应该在加入新记录前
+ 删除最老的记录。
"""
node = self.lookup[query]
if node is not None:
- # Key exists in cache, update the value
+ # 键存在于缓存中,更新它对应的值
node.results = results
self.linked_list.move_to_front(node)
else:
- # Key does not exist in cache
+ # 键不存在于缓存中
if self.size == self.MAX_SIZE:
- # Remove the oldest entry from the linked list and lookup
+ # 从链表和 lookup 哈希表中删除最老的记录
self.lookup.pop(self.linked_list.tail.query, None)
self.linked_list.remove_from_tail()
else:
self.size += 1
- # Add the new key and value
+ # 添加新的键值对
new_node = Node(query, results)
self.linked_list.append_to_front(new_node)
self.lookup[query] = new_node
```
-#### When to update the cache
+#### 何时更新缓存
-The cache should be updated when:
+缓存将会在以下几种情况更新:
-* The page contents change
-* The page is removed or a new page is added
-* The page rank changes
+* 页面内容发生变化
+* 页面被移除或者加入了新页面
+* 页面的排名发生变化
-The most straightforward way to handle these cases is to simply set a max time that a cached entry can stay in the cache before it is updated, usually referred to as time to live (TTL).
+解决这些问题的最直接的方法,就是为缓存记录设置一个它在被更新前能留在缓存中的最长时间,这个时间简称为存活时间(TTL)。
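
举例来说,可以在写入缓存时给每条记录附上过期时刻,读取时先检查是否过期(以下仅为示意,字段名与默认 TTL 均为假设):

```python
import time

class TtlEntry(object):
    """带 TTL 的缓存记录示意:超过 ttl_seconds 即视为过期。"""

    def __init__(self, results, ttl_seconds=300):
        self.results = results
        self.expires_at = time.time() + ttl_seconds

    def is_expired(self):
        return time.time() > self.expires_at
```

命中但 `is_expired()` 为真时,可以按缓存模式(cache-aside)的做法回源重新查询,并用新结果覆盖这条记录。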
-Refer to [When to update the cache](https://github.com/donnemartin/system-design-primer#when-to-update-the-cache) for tradeoffs and alternatives. The approach above describes [cache-aside](https://github.com/donnemartin/system-design-primer#cache-aside).
+参考[「何时更新缓存」](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#何时更新缓存)来了解其权衡取舍及替代方案。上面描述的做法就是[缓存模式(cache-aside)](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#缓存模式)。
-## Step 4: Scale the design
+## 第四步:架构扩展
-> Identify and address bottlenecks, given the constraints.
+> 根据限制条件,找到并解决瓶颈。
![Imgur](http://i.imgur.com/4j99mhe.png)
-**Important: Do not simply jump right into the final design from the initial design!**
+**重要提示:不要从最初设计直接跳到最终设计中!**
-State you would 1) **Benchmark/Load Test**, 2) **Profile** for bottlenecks 3) address bottlenecks while evaluating alternatives and trade-offs, and 4) repeat. See [Design a system that scales to millions of users on AWS](../scaling_aws/README.md) as a sample on how to iteratively scale the initial design.
+现在你要 1) **基准测试、负载测试**。2) **分析、描述**性能瓶颈。3) 在解决瓶颈问题的同时,评估替代方案、权衡利弊。4) 重复以上步骤。请阅读[「设计一个系统,并将其扩大到为数以百万计的 AWS 用户服务」](../scaling_aws/README.md) 来了解如何逐步扩大初始设计。
-It's important to discuss what bottlenecks you might encounter with the initial design and how you might address each of them. For example, what issues are addressed by adding a **Load Balancer** with multiple **Web Servers**? **CDN**? **Master-Slave Replicas**? What are the alternatives and **Trade-Offs** for each?
+讨论初始设计可能遇到的瓶颈及相关解决方案是很重要的。例如加上一个配置多台 **Web 服务器**的**负载均衡器**是否能够解决问题?**CDN**呢?**主从复制**呢?它们各自的替代方案和需要**权衡**的利弊又有什么呢?
-We'll introduce some components to complete the design and to address scalability issues. Internal load balancers are not shown to reduce clutter.
+我们将会引入一些组件来完成设计,并解决架构扩展性问题。为保持图示简洁,内部的负载均衡器没有画出。
-*To avoid repeating discussions*, refer to the following [system design topics](https://github.com/donnemartin/system-design-primer#index-of-system-design-topics) for main talking points, tradeoffs, and alternatives:
+**为了避免重复讨论**,请参考[系统设计主题索引](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#系统设计主题的索引)相关部分来了解其要点、方案的权衡取舍以及可选的替代方案。
-* [DNS](https://github.com/donnemartin/system-design-primer#domain-name-system)
-* [Load balancer](https://github.com/donnemartin/system-design-primer#load-balancer)
-* [Horizontal scaling](https://github.com/donnemartin/system-design-primer#horizontal-scaling)
-* [Web server (reverse proxy)](https://github.com/donnemartin/system-design-primer#reverse-proxy-web-server)
-* [API server (application layer)](https://github.com/donnemartin/system-design-primer#application-layer)
-* [Cache](https://github.com/donnemartin/system-design-primer#cache)
-* [Consistency patterns](https://github.com/donnemartin/system-design-primer#consistency-patterns)
-* [Availability patterns](https://github.com/donnemartin/system-design-primer#availability-patterns)
+* [DNS](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#域名系统)
+* [负载均衡器](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#负载均衡器)
+* [水平拓展](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#水平扩展)
+* [反向代理(web 服务器)](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#反向代理web-服务器)
+* [API 服务(应用层)](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#应用层)
+* [缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#缓存)
+* [一致性模式](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#一致性模式)
+* [可用性模式](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#可用性模式)
-### Expanding the Memory Cache to many machines
+### 将内存缓存扩大到多台机器
-To handle the heavy request load and the large amount of memory needed, we'll scale horizontally. We have three main options on how to store the data on our **Memory Cache** cluster:
+为了解决庞大的请求负载以及巨大的内存需求,我们将要对架构进行水平拓展。如何在我们的**内存缓存**集群中存储数据呢?我们有以下三个主要可选方案:
-* **Each machine in the cache cluster has its own cache** - Simple, although it will likely result in a low cache hit rate.
-* **Each machine in the cache cluster has a copy of the cache** - Simple, although it is an inefficient use of memory.
-* **The cache is [sharded](https://github.com/donnemartin/system-design-primer#sharding) across all machines in the cache cluster** - More complex, although it is likely the best option. We could use hashing to determine which machine could have the cached results of a query using `machine = hash(query)`. We'll likely want to use [consistent hashing](https://github.com/donnemartin/system-design-primer#under-development).
+* **缓存集群中的每一台机器都有自己的缓存** - 简单,但是它会降低缓存命中率。
+* **缓存集群中的每一台机器都有缓存的拷贝** - 简单,但是它的内存使用效率太低了。
+* **将缓存[分片](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#分片)到集群中的所有机器上** - 更加复杂,但很可能是最佳的选择。我们可以用类似 `machine = hash(query)` 的哈希来决定某条查询的缓存结果放在哪台机器上。这里很可能会想用[一致性哈希](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#正在完善中),下面给出一个简单的草图。
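
下面用 Python 标准库给出一个极简的一致性哈希环草图(机器名与虚拟节点数量均为假设),用来决定某条查询的结果应该缓存在哪台机器上:

```python
import bisect
import hashlib

class ConsistentHashRing(object):
    """极简一致性哈希环:增删缓存节点时,只有少部分查询会被重新映射。"""

    def __init__(self, machines, replicas=100):
        self.replicas = replicas      # 每台机器的虚拟节点数,数值为假设
        self.ring = []                # 排好序的哈希值
        self.machine_by_hash = {}
        for machine in machines:
            self.add_machine(machine)

    def _hash(self, key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add_machine(self, machine):
        for i in range(self.replicas):
            h = self._hash('%s-%d' % (machine, i))
            bisect.insort(self.ring, h)
            self.machine_by_hash[h] = machine

    def get_machine(self, query):
        """返回应当缓存这条查询结果的机器。"""
        h = self._hash(query)
        index = bisect.bisect(self.ring, h) % len(self.ring)
        return self.machine_by_hash[self.ring[index]]

ring = ConsistentHashRing(['cache-1', 'cache-2', 'cache-3'])
print(ring.get_machine('system design primer'))  # 输出三台机器中的某一台
```

相比简单的 `hash(query) % N`,当增删缓存机器时,一致性哈希只会让一小部分查询重新映射到新的机器上。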
-## Additional talking points
+## 其它要点
-> Additional topics to dive into, depending on the problem scope and time remaining.
+> 是否深入这些额外的主题,取决于你的问题范围和剩下的时间。
-### SQL scaling patterns
+### SQL 扩展模式
-* [Read replicas](https://github.com/donnemartin/system-design-primer#master-slave-replication)
-* [Federation](https://github.com/donnemartin/system-design-primer#federation)
-* [Sharding](https://github.com/donnemartin/system-design-primer#sharding)
-* [Denormalization](https://github.com/donnemartin/system-design-primer#denormalization)
-* [SQL Tuning](https://github.com/donnemartin/system-design-primer#sql-tuning)
+* [读取复制](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#主从复制)
+* [联合](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#联合)
+* [分片](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#分片)
+* [非规范化](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#非规范化)
+* [SQL 调优](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#sql-调优)
#### NoSQL
-* [Key-value store](https://github.com/donnemartin/system-design-primer#key-value-store)
-* [Document store](https://github.com/donnemartin/system-design-primer#document-store)
-* [Wide column store](https://github.com/donnemartin/system-design-primer#wide-column-store)
-* [Graph database](https://github.com/donnemartin/system-design-primer#graph-database)
-* [SQL vs NoSQL](https://github.com/donnemartin/system-design-primer#sql-or-nosql)
+* [键-值存储](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#键-值存储)
+* [文档类型存储](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#文档类型存储)
+* [列型存储](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#列型存储)
+* [图数据库](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#图数据库)
+* [SQL vs NoSQL](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#sql-还是-nosql)
-### Caching
+### 缓存
-* Where to cache
- * [Client caching](https://github.com/donnemartin/system-design-primer#client-caching)
- * [CDN caching](https://github.com/donnemartin/system-design-primer#cdn-caching)
- * [Web server caching](https://github.com/donnemartin/system-design-primer#web-server-caching)
- * [Database caching](https://github.com/donnemartin/system-design-primer#database-caching)
- * [Application caching](https://github.com/donnemartin/system-design-primer#application-caching)
-* What to cache
- * [Caching at the database query level](https://github.com/donnemartin/system-design-primer#caching-at-the-database-query-level)
- * [Caching at the object level](https://github.com/donnemartin/system-design-primer#caching-at-the-object-level)
-* When to update the cache
- * [Cache-aside](https://github.com/donnemartin/system-design-primer#cache-aside)
- * [Write-through](https://github.com/donnemartin/system-design-primer#write-through)
- * [Write-behind (write-back)](https://github.com/donnemartin/system-design-primer#write-behind-write-back)
- * [Refresh ahead](https://github.com/donnemartin/system-design-primer#refresh-ahead)
+* 在哪缓存
+ * [客户端缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#客户端缓存)
+ * [CDN 缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#cdn-缓存)
+ * [Web 服务器缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#web-服务器缓存)
+ * [数据库缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#数据库缓存)
+ * [应用缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#应用缓存)
+* 什么需要缓存
+ * [数据库查询级别的缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#数据库查询级别的缓存)
+ * [对象级别的缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#对象级别的缓存)
+* 何时更新缓存
+ * [缓存模式](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#缓存模式)
+ * [直写模式](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#直写模式)
+ * [回写模式](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#回写模式)
+ * [刷新](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#刷新)
-### Asynchronism and microservices
+### 异步与微服务
-* [Message queues](https://github.com/donnemartin/system-design-primer#message-queues)
-* [Task queues](https://github.com/donnemartin/system-design-primer#task-queues)
-* [Back pressure](https://github.com/donnemartin/system-design-primer#back-pressure)
-* [Microservices](https://github.com/donnemartin/system-design-primer#microservices)
+* [消息队列](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#消息队列)
+* [任务队列](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#任务队列)
+* [背压](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#背压)
+* [微服务](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#微服务)
-### Communications
+### 通信
-* Discuss tradeoffs:
- * External communication with clients - [HTTP APIs following REST](https://github.com/donnemartin/system-design-primer#representational-state-transfer-rest)
- * Internal communications - [RPC](https://github.com/donnemartin/system-design-primer#remote-procedure-call-rpc)
-* [Service discovery](https://github.com/donnemartin/system-design-primer#service-discovery)
+* 可权衡选择的方案:
+ * 与客户端的外部通信 - [使用 REST 作为 HTTP API](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#表述性状态转移rest)
+ * 服务器内部通信 - [RPC](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#远程过程调用协议rpc)
+* [服务发现](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#服务发现)
-### Security
+### 安全性
-Refer to the [security section](https://github.com/donnemartin/system-design-primer#security).
+请参阅[「安全」](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#安全)一章。
-### Latency numbers
+### 延迟数值
-See [Latency numbers every programmer should know](https://github.com/donnemartin/system-design-primer#latency-numbers-every-programmer-should-know).
+请参阅[「每个程序员都应该知道的延迟数」](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#每个程序员都应该知道的延迟数)。
-### Ongoing
+### 持续探讨
-* Continue benchmarking and monitoring your system to address bottlenecks as they come up
-* Scaling is an iterative process
+* 持续进行基准测试并监控你的系统,以便在瓶颈出现时及时解决。
+* 架构拓展是一个迭代的过程。
diff --git a/solutions/system_design/sales_rank/README.md b/solutions/system_design/sales_rank/README.md
index 71ad1c7d..960f9258 100644
--- a/solutions/system_design/sales_rank/README.md
+++ b/solutions/system_design/sales_rank/README.md
@@ -1,88 +1,88 @@
-# Design Amazon's sales rank by category feature
+# 为 Amazon 设计分类售卖排行
-*Note: This document links directly to relevant areas found in the [system design topics](https://github.com/donnemartin/system-design-primer#index-of-system-design-topics) to avoid duplication. Refer to the linked content for general talking points, tradeoffs, and alternatives.*
+**注意:这个文档中的链接会直接指向[系统设计主题索引](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#系统设计主题的索引)中的有关部分,以避免重复的内容。你可以参考链接的相关内容,来了解其总的要点、方案的权衡取舍以及可选的替代方案。**
-## Step 1: Outline use cases and constraints
+## 第一步:简述用例与约束条件
-> Gather requirements and scope the problem.
-> Ask questions to clarify use cases and constraints.
-> Discuss assumptions.
+> 搜集需求与问题的范围。
+> 提出问题来明确用例与约束条件。
+> 讨论假设。
-Without an interviewer to address clarifying questions, we'll define some use cases and constraints.
+我们将在没有面试官明确说明问题的情况下,自己定义一些用例以及限制条件。
-### Use cases
+### 用例
-#### We'll scope the problem to handle only the following use case
+#### 我们将把问题限定在仅处理以下用例的范围中
-* **Service** calculates the past week's most popular products by category
-* **User** views the past week's most popular products by category
-* **Service** has high availability
+* **服务**根据分类计算过去一周中最受欢迎的商品
+* **用户**通过分类浏览过去一周中最受欢迎的商品
+* **服务**有着高可用性
-#### Out of scope
+#### 不在用例范围内的有
-* The general e-commerce site
- * Design components only for calculating sales rank
+* 一般的电商网站
+ * 只为售卖排行榜设计组件
-### Constraints and assumptions
+### 限制条件与假设
-#### State assumptions
+#### 提出假设
-* Traffic is not evenly distributed
-* Items can be in multiple categories
-* Items cannot change categories
-* There are no subcategories ie `foo/bar/baz`
-* Results must be updated hourly
- * More popular products might need to be updated more frequently
-* 10 million products
-* 1000 categories
-* 1 billion transactions per month
-* 100 billion read requests per month
-* 100:1 read to write ratio
+* 网络流量不是均匀分布的
+* 一个商品可能存在于多个分类中
+* 商品不能够更改分类
+* 不会存在如 `foo/bar/baz` 之类的子分类
+* 每小时更新一次结果
+ * 越受欢迎的商品可能需要越频繁地更新
+* 1000 万个商品
+* 1000 个分类
+* 每个月 10 亿次交易
+* 每个月 1000 亿次读取请求
+* 100:1 的读写比例
-#### Calculate usage
+#### 计算用量
-**Clarify with your interviewer if you should run back-of-the-envelope usage calculations.**
+**如果你需要进行粗略的用量计算,请向你的面试官说明。**
-* Size per transaction:
- * `created_at` - 5 bytes
- * `product_id` - 8 bytes
- * `category_id` - 4 bytes
- * `seller_id` - 8 bytes
- * `buyer_id` - 8 bytes
- * `quantity` - 4 bytes
- * `total_price` - 5 bytes
- * Total: ~40 bytes
-* 40 GB of new transaction content per month
- * 40 bytes per transaction * 1 billion transactions per month
- * 1.44 TB of new transaction content in 3 years
- * Assume most are new transactions instead of updates to existing ones
-* 400 transactions per second on average
-* 40,000 read requests per second on average
+* 每笔交易的用量:
+ * `created_at` - 5 字节
+ * `product_id` - 8 字节
+ * `category_id` - 4 字节
+ * `seller_id` - 8 字节
+ * `buyer_id` - 8 字节
+ * `quantity` - 4 字节
+ * `total_price` - 5 字节
+ * 总计:大约 40 字节
+* 每个月的交易内容会产生 40 GB 的记录
+ * 每次交易 40 字节 * 每个月 10 亿次交易
+ * 3 年内会产生 1.44 TB 的新交易记录
+ * 假定大多数交易都是新交易,而不是对已有交易的更新
+* 平均每秒 400 次交易
+* 平均每秒 40,000 次读取请求
-Handy conversion guide:
+便利换算指南:
-* 2.5 million seconds per month
-* 1 request per second = 2.5 million requests per month
-* 40 requests per second = 100 million requests per month
-* 400 requests per second = 1 billion requests per month
+* 每个月有 250 万秒
+* 每秒一个请求 = 每个月 250 万次请求
+* 每秒 40 个请求 = 每个月 1 亿次请求
+* 每秒 400 个请求 = 每个月 10 亿次请求
-## Step 2: Create a high level design
+## 第二步:概要设计
-> Outline a high level design with all important components.
+> 列出所有重要组件以规划概要设计。
![Imgur](http://i.imgur.com/vwMa1Qu.png)
-## Step 3: Design core components
+## 第三步:设计核心组件
-> Dive into details for each core component.
+> 深入每个核心组件的细节。
-### Use case: Service calculates the past week's most popular products by category
+### 用例:服务需要根据分类计算上周最受欢迎的商品
-We could store the raw **Sales API** server log files on a managed **Object Store** such as Amazon S3, rather than managing our own distributed file system.
+我们可以把**售卖 API** 服务产生的原始日志文件存放在现成的**对象存储**系统(例如 Amazon S3)中,而不必自己维护分布式文件系统。
-**Clarify with your interviewer how much code you are expected to write**.
+**和你的面试官明确你需要写多少代码**。
-We'll assume this is a sample log entry, tab delimited:
+下面假设是一条以 tab 分隔的日志记录示例:
```
timestamp product_id category_id qty total_price seller_id buyer_id
@@ -95,24 +95,25 @@ t5 product4 category1 1 5.00 5 6
...
```
-The **Sales Rank Service** could use **MapReduce**, using the **Sales API** server log files as input and writing the results to an aggregate table `sales_rank` in a **SQL Database**. We should discuss the [use cases and tradeoffs between choosing SQL or NoSQL](https://github.com/donnemartin/system-design-primer#sql-or-nosql).
+**售卖排行服务**可以使用 **MapReduce**,以**售卖 API** 服务的日志文件作为输入,并把结果写入 **SQL 数据库**的汇总表 `sales_rank` 中。我们应该讨论一下[究竟是用 SQL 还是用 NoSQL](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#sql-还是-nosql)。
-We'll use a multi-step **MapReduce**:
+我们将使用一个多步骤的 **MapReduce**:
-* **Step 1** - Transform the data to `(category, product_id), sum(quantity)`
-* **Step 2** - Perform a distributed sort
+* **第 1 步** - 将数据转换为 `(category, product_id), sum(quantity)` 的形式
+* **第 2 步** - 执行分布式排序
```python
class SalesRanker(MRJob):
def within_past_week(self, timestamp):
- """Return True if timestamp is within past week, False otherwise."""
+ """如果时间戳属于过去的一周则返回 True,
+ 否则返回 False。"""
...
def mapper(self, _, line):
- """Parse each log line, extract and transform relevant lines.
+ """解析日志的每一行,提取并转换相关行,
- Emit key value pairs of the form:
+ 将键值对设定为如下形式:
(category1, product1), 2
(category2, product1), 2
@@ -127,7 +128,7 @@ class SalesRanker(MRJob):
yield (category_id, product_id), quantity
def reducer(self, key, values):
- """Sum values for each key.
+ """将每个 key 的值加起来。
(category1, product1), 2
(category2, product1), 3
@@ -138,9 +139,9 @@ class SalesRanker(MRJob):
yield key, sum(values)
def mapper_sort(self, key, value):
- """Construct key to ensure proper sorting.
+ """构造 key 以确保正确的排序。
- Transform key and value to the form:
+ 将键值对转换成如下形式:
(category1, 2), product1
(category2, 3), product1
@@ -148,8 +149,8 @@ class SalesRanker(MRJob):
(category2, 7), product3
(category1, 1), product4
- The shuffle/sort step of MapReduce will then do a
- distributed sort on the keys, resulting in:
+ MapReduce 的 shuffle/sort 步骤会对这些键
+ 进行分布式排序,结果如下:
(category1, 1), product4
(category1, 2), product1
@@ -165,7 +166,7 @@ class SalesRanker(MRJob):
yield key, value
def steps(self):
- """Run the map and reduce steps."""
+ """ 此处为 map reduce 步骤"""
return [
self.mr(mapper=self.mapper,
reducer=self.reducer),
@@ -174,7 +175,7 @@ class SalesRanker(MRJob):
]
```
-The result would be the following sorted list, which we could insert into the `sales_rank` table:
+得到的结果将是如下的有序列表,我们可以将其插入 `sales_rank` 表中:
```
(category1, 1), product4
@@ -184,7 +185,7 @@ The result would be the following sorted list, which we could insert into the `s
(category2, 7), product3
```
-The `sales_rank` table could have the following structure:
+`sales_rank` 表的数据结构如下:
```
id int NOT NULL AUTO_INCREMENT
@@ -196,21 +197,21 @@ FOREIGN KEY(category_id) REFERENCES Categories(id)
FOREIGN KEY(product_id) REFERENCES Products(id)
```
-We'll create an [index](https://github.com/donnemartin/system-design-primer#use-good-indices) on `id `, `category_id`, and `product_id` to speed up lookups (log-time instead of scanning the entire table) and to keep the data in memory. Reading 1 MB sequentially from memory takes about 250 microseconds, while reading from SSD takes 4x and from disk takes 80x longer.1
+我们会在 `id`、`category_id` 与 `product_id` 上创建一个[索引](https://github.com/donnemartin/system-design-primer#use-good-indices)以加快查询速度(只需对数时间,而不用每次都扫描整个数据表),并让数据常驻内存。从内存读取 1 MB 连续数据大约要花 250 微秒,而从 SSD 读取同样大小的数据要花费 4 倍的时间,从机械硬盘读取需要花费 80 倍以上的时间。1
-### Use case: User views the past week's most popular products by category
+### 用例:用户需要根据分类浏览上周中最受欢迎的商品
-* The **Client** sends a request to the **Web Server**, running as a [reverse proxy](https://github.com/donnemartin/system-design-primer#reverse-proxy-web-server)
-* The **Web Server** forwards the request to the **Read API** server
-* The **Read API** server reads from the **SQL Database** `sales_rank` table
+* **客户端**向运行[反向代理](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#反向代理web-服务器)的 **Web 服务器**发送一个请求
+* **Web 服务器**将请求转发给 **Read API** 服务
+* **Read API** 服务从 **SQL 数据库**的 `sales_rank` 表中读取数据
-We'll use a public [**REST API**](https://github.com/donnemartin/system-design-primer#representational-state-transfer-rest):
+我们可以调用一个公共的 [REST API](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#表述性状态转移rest):
```
$ curl https://amazon.com/api/v1/popular?category_id=1234
```
-Response:
+返回:
```
{
@@ -233,106 +234,105 @@ Response:
},
```
-For internal communications, we could use [Remote Procedure Calls](https://github.com/donnemartin/system-design-primer#remote-procedure-call-rpc).
+而对于服务器内部的通信,我们可以使用 [RPC](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#远程过程调用协议rpc)。
-## Step 4: Scale the design
+## 第四步:架构扩展
-> Identify and address bottlenecks, given the constraints.
+> 根据限制条件,找到并解决瓶颈。
![Imgur](http://i.imgur.com/MzExP06.png)
-**Important: Do not simply jump right into the final design from the initial design!**
+**重要提示:不要从最初设计直接跳到最终设计中!**
-State you would 1) **Benchmark/Load Test**, 2) **Profile** for bottlenecks 3) address bottlenecks while evaluating alternatives and trade-offs, and 4) repeat. See [Design a system that scales to millions of users on AWS](../scaling_aws/README.md) as a sample on how to iteratively scale the initial design.
+现在你要 1) **基准测试、负载测试**。2) **分析、描述**性能瓶颈。3) 在解决瓶颈问题的同时,评估替代方案、权衡利弊。4) 重复以上步骤。请阅读[「设计一个系统,并将其扩大到为数以百万计的 AWS 用户服务」](../scaling_aws/README.md) 来了解如何逐步扩大初始设计。
-It's important to discuss what bottlenecks you might encounter with the initial design and how you might address each of them. For example, what issues are addressed by adding a **Load Balancer** with multiple **Web Servers**? **CDN**? **Master-Slave Replicas**? What are the alternatives and **Trade-Offs** for each?
+讨论初始设计可能遇到的瓶颈及相关解决方案是很重要的。例如加上一个配置多台 **Web 服务器**的**负载均衡器**是否能够解决问题?**CDN**呢?**主从复制**呢?它们各自的替代方案和需要**权衡**的利弊又有什么呢?
-We'll introduce some components to complete the design and to address scalability issues. Internal load balancers are not shown to reduce clutter.
+我们将会引入一些组件来完成设计,并解决架构扩展性问题。为保持图示简洁,内部的负载均衡器没有画出。
-*To avoid repeating discussions*, refer to the following [system design topics](https://github.com/donnemartin/system-design-primer#index-of-system-design-topics) for main talking points, tradeoffs, and alternatives:
+**为了避免重复讨论**,请参考[系统设计主题索引](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#系统设计主题的索引)相关部分来了解其要点、方案的权衡取舍以及可选的替代方案。
-* [DNS](https://github.com/donnemartin/system-design-primer#domain-name-system)
-* [CDN](https://github.com/donnemartin/system-design-primer#content-delivery-network)
-* [Load balancer](https://github.com/donnemartin/system-design-primer#load-balancer)
-* [Horizontal scaling](https://github.com/donnemartin/system-design-primer#horizontal-scaling)
-* [Web server (reverse proxy)](https://github.com/donnemartin/system-design-primer#reverse-proxy-web-server)
-* [API server (application layer)](https://github.com/donnemartin/system-design-primer#application-layer)
-* [Cache](https://github.com/donnemartin/system-design-primer#cache)
-* [Relational database management system (RDBMS)](https://github.com/donnemartin/system-design-primer#relational-database-management-system-rdbms)
-* [SQL write master-slave failover](https://github.com/donnemartin/system-design-primer#fail-over)
-* [Master-slave replication](https://github.com/donnemartin/system-design-primer#master-slave-replication)
-* [Consistency patterns](https://github.com/donnemartin/system-design-primer#consistency-patterns)
-* [Availability patterns](https://github.com/donnemartin/system-design-primer#availability-patterns)
+* [DNS](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#域名系统)
+* [负载均衡器](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#负载均衡器)
+* [水平拓展](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#水平扩展)
+* [反向代理(web 服务器)](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#反向代理web-服务器)
+* [API 服务(应用层)](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#应用层)
+* [缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#缓存)
+* [关系型数据库管理系统 (RDBMS)](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#关系型数据库管理系统rdbms)
+* [SQL 故障主从切换](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#故障切换)
+* [主从复制](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#主从复制)
+* [一致性模式](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#一致性模式)
+* [可用性模式](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#可用性模式)
-The **Analytics Database** could use a data warehousing solution such as Amazon Redshift or Google BigQuery.
+**分析数据库**可以使用现成的数据仓库解决方案,例如 Amazon Redshift 或 Google BigQuery。
-We might only want to store a limited time period of data in the database, while storing the rest in a data warehouse or in an **Object Store**. An **Object Store** such as Amazon S3 can comfortably handle the constraint of 40 GB of new content per month.
+我们可能只想在数据库中保存一段有限时间内的数据,其余数据则放入数据仓库或**对象存储**系统。像 Amazon S3 这样的**对象存储**系统可以轻松应对每月 40 GB 新增内容的限制条件。
-To address the 40,000 *average* read requests per second (higher at peak), traffic for popular content (and their sales rank) should be handled by the **Memory Cache** instead of the database. The **Memory Cache** is also useful for handling the unevenly distributed traffic and traffic spikes. With the large volume of reads, the **SQL Read Replicas** might not be able to handle the cache misses. We'll probably need to employ additional SQL scaling patterns.
+为了应对平均每秒 40,000 次的读取请求(峰值会更高),热点内容(及其售卖排名)的读取流量应该由**内存缓存**来处理,而不是数据库。**内存缓存**对于应对分布不均的流量和流量峰值也很有用。由于读取量非常大,**SQL 读取副本**可能无法应付所有的缓存未命中,我们很可能需要使用更多的 SQL 扩展模式。
-400 *average* writes per second (higher at peak) might be tough for a single **SQL Write Master-Slave**, also pointing to a need for additional scaling techniques.
+平均每秒 400 次的写入(峰值会更高)对于单台 **SQL 写主-从**来说可能比较吃力,这也表明需要使用更多的扩展技术。
-SQL scaling patterns include:
+SQL 扩展模式包括:
-* [Federation](https://github.com/donnemartin/system-design-primer#federation)
-* [Sharding](https://github.com/donnemartin/system-design-primer#sharding)
-* [Denormalization](https://github.com/donnemartin/system-design-primer#denormalization)
-* [SQL Tuning](https://github.com/donnemartin/system-design-primer#sql-tuning)
+* [联合](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#联合)
+* [分片](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#分片)
+* [非规范化](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#非规范化)
+* [SQL 调优](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#sql-调优)
-We should also consider moving some data to a **NoSQL Database**.
+我们也可以考虑将一些数据移至 **NoSQL 数据库**。
-## Additional talking points
+## 其它要点
-> Additional topics to dive into, depending on the problem scope and time remaining.
+> 是否深入这些额外的主题,取决于你的问题范围和剩下的时间。
#### NoSQL
-* [Key-value store](https://github.com/donnemartin/system-design-primer#key-value-store)
-* [Document store](https://github.com/donnemartin/system-design-primer#document-store)
-* [Wide column store](https://github.com/donnemartin/system-design-primer#wide-column-store)
-* [Graph database](https://github.com/donnemartin/system-design-primer#graph-database)
-* [SQL vs NoSQL](https://github.com/donnemartin/system-design-primer#sql-or-nosql)
+* [键-值存储](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#键-值存储)
+* [文档类型存储](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#文档类型存储)
+* [列型存储](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#列型存储)
+* [图数据库](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#图数据库)
+* [SQL vs NoSQL](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#sql-还是-nosql)
-### Caching
+### 缓存
-* Where to cache
- * [Client caching](https://github.com/donnemartin/system-design-primer#client-caching)
- * [CDN caching](https://github.com/donnemartin/system-design-primer#cdn-caching)
- * [Web server caching](https://github.com/donnemartin/system-design-primer#web-server-caching)
- * [Database caching](https://github.com/donnemartin/system-design-primer#database-caching)
- * [Application caching](https://github.com/donnemartin/system-design-primer#application-caching)
-* What to cache
- * [Caching at the database query level](https://github.com/donnemartin/system-design-primer#caching-at-the-database-query-level)
- * [Caching at the object level](https://github.com/donnemartin/system-design-primer#caching-at-the-object-level)
-* When to update the cache
- * [Cache-aside](https://github.com/donnemartin/system-design-primer#cache-aside)
- * [Write-through](https://github.com/donnemartin/system-design-primer#write-through)
- * [Write-behind (write-back)](https://github.com/donnemartin/system-design-primer#write-behind-write-back)
- * [Refresh ahead](https://github.com/donnemartin/system-design-primer#refresh-ahead)
+* 在哪缓存
+ * [客户端缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#客户端缓存)
+ * [CDN 缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#cdn-缓存)
+ * [Web 服务器缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#web-服务器缓存)
+ * [数据库缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#数据库缓存)
+ * [应用缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#应用缓存)
+* 什么需要缓存
+ * [数据库查询级别的缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#数据库查询级别的缓存)
+ * [对象级别的缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#对象级别的缓存)
+* 何时更新缓存
+ * [缓存模式](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#缓存模式)
+ * [直写模式](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#直写模式)
+ * [回写模式](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#回写模式)
+ * [刷新](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#刷新)
-### Asynchronism and microservices
+### 异步与微服务
-* [Message queues](https://github.com/donnemartin/system-design-primer#message-queues)
-* [Task queues](https://github.com/donnemartin/system-design-primer#task-queues)
-* [Back pressure](https://github.com/donnemartin/system-design-primer#back-pressure)
-* [Microservices](https://github.com/donnemartin/system-design-primer#microservices)
+* [消息队列](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#消息队列)
+* [任务队列](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#任务队列)
+* [背压](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#背压)
+* [微服务](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#微服务)
-### Communications
+### 通信
-* Discuss tradeoffs:
- * External communication with clients - [HTTP APIs following REST](https://github.com/donnemartin/system-design-primer#representational-state-transfer-rest)
- * Internal communications - [RPC](https://github.com/donnemartin/system-design-primer#remote-procedure-call-rpc)
-* [Service discovery](https://github.com/donnemartin/system-design-primer#service-discovery)
+* 可权衡选择的方案:
+ * 与客户端的外部通信 - [使用 REST 作为 HTTP API](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#表述性状态转移rest)
+ * 服务器内部通信 - [RPC](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#远程过程调用协议rpc)
+* [服务发现](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#服务发现)
-### Security
+### 安全性
-Refer to the [security section](https://github.com/donnemartin/system-design-primer#security).
+请参阅[「安全」](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#安全)一章。
-### Latency numbers
+### 延迟数值
-See [Latency numbers every programmer should know](https://github.com/donnemartin/system-design-primer#latency-numbers-every-programmer-should-know).
+请参阅[「每个程序员都应该知道的延迟数」](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#每个程序员都应该知道的延迟数)。
-### Ongoing
+### 持续探讨
-* Continue benchmarking and monitoring your system to address bottlenecks as they come up
-* Scaling is an iterative process
+* 持续进行基准测试并监控你的系统,以便在瓶颈出现时及时解决。
+* 架构拓展是一个迭代的过程。
diff --git a/solutions/system_design/scaling_aws/README.md b/solutions/system_design/scaling_aws/README.md
index 99af0cff..c071c70e 100644
--- a/solutions/system_design/scaling_aws/README.md
+++ b/solutions/system_design/scaling_aws/README.md
@@ -1,403 +1,403 @@
-# Design a system that scales to millions of users on AWS
+# 在 AWS 上设计支持百万级到千万级用户的系统
-*Note: This document links directly to relevant areas found in the [system design topics](https://github.com/donnemartin/system-design-primer#index-of-system-design-topics) to avoid duplication. Refer to the linked content for general talking points, tradeoffs, and alternatives.*
+**注释:为了避免重复,这篇文章的链接直接关联到 [系统设计主题](https://github.com/donnemartin/system-design-primer#index-of-system-design-topics) 的相关章节。一般性的讨论要点、折中方案和可选方案请参考链接中的内容。**
-## Step 1: Outline use cases and constraints
+## 第 1 步:用例和约束概要
-> Gather requirements and scope the problem.
-> Ask questions to clarify use cases and constraints.
-> Discuss assumptions.
+> 收集需求并界定问题的范围。
+> 通过提问明确用例和约束。
+> 讨论假设。
-Without an interviewer to address clarifying questions, we'll define some use cases and constraints.
+如果没有面试官提出明确的问题,我们将自己定义一些用例和约束条件。
-### Use cases
+### 用例
-Solving this problem takes an iterative approach of: 1) **Benchmark/Load Test**, 2) **Profile** for bottlenecks 3) address bottlenecks while evaluating alternatives and trade-offs, and 4) repeat, which is good pattern for evolving basic designs to scalable designs.
+解决这个问题是一个循序渐进的过程:1) 进行**基准/负载测试**,2) 对瓶颈进行**分析**,3) 在评估替代方案和折中方案的同时解决瓶颈,4) 重复以上步骤。这是将基础设计逐步演进为可扩展设计的良好模式。
-Unless you have a background in AWS or are applying for a position that requires AWS knowledge, AWS-specific details are not a requirement. However, **much of the principles discussed in this exercise can apply more generally outside of the AWS ecosystem.**
+除非你有 AWS 的背景,或者正在申请需要 AWS 知识的相关职位,否则并不要求了解 AWS 的具体细节。不过,**这个练习中讨论的许多原则同样适用于 AWS 生态系统之外的场景。**
-#### We'll scope the problem to handle only the following use cases
+#### 我们将问题的范围限定在仅处理以下用例
-* **User** makes a read or write request
- * **Service** does processing, stores user data, then returns the results
-* **Service** needs to evolve from serving a small amount of users to millions of users
- * Discuss general scaling patterns as we evolve an architecture to handle a large number of users and requests
-* **Service** has high availability
+* **用户** 进行读或写请求
+ * **服务** 进行处理,存储用户数据,然后返回结果
+* **服务** 需要从支持小规模用户开始到百万用户
+ * 在我们演化架构来处理大量的用户和请求时,讨论一般的扩展模式
+* **服务** 高可用
-### Constraints and assumptions
+### 约束和假设
-#### State assumptions
+#### 提出假设
-* Traffic is not evenly distributed
-* Need for relational data
-* Scale from 1 user to tens of millions of users
- * Denote increase of users as:
- * Users+
- * Users++
- * Users+++
+* 流量不均匀分布
+* 需要关系数据
+* 从一个用户扩展到千万用户
+ * 表示用户量的增长
+ * 用户量+
+ * 用户量++
+ * 用户量+++
* ...
- * 10 million users
- * 1 billion writes per month
- * 100 billion reads per month
- * 100:1 read to write ratio
- * 1 KB content per write
+ * 1000 万用户
+ * 每月 10 亿次写入
+ * 每月 1000 亿次读出
+ * 100:1 读写比率
+ * 每次写入 1 KB 内容
-#### Calculate usage
+#### 计算使用
-**Clarify with your interviewer if you should run back-of-the-envelope usage calculations.**
+**向你的面试官厘清你是否应该做粗略的使用计算**
-* 1 TB of new content per month
- * 1 KB per write * 1 billion writes per month
- * 36 TB of new content in 3 years
- * Assume most writes are from new content instead of updates to existing ones
-* 400 writes per second on average
-* 40,000 reads per second on average
+* 1 TB 新内容 / 月
+ * 1 KB 每次写入 * 10 亿 写入 / 月
+ * 36 TB 新内容 / 3 年
+ * 假设大多数写入都是新内容而不是更新已有内容
+* 平均每秒 400 次写入
+* 平均每秒 40,000 次读取
-Handy conversion guide:
+便捷的转换指南:
-* 2.5 million seconds per month
-* 1 request per second = 2.5 million requests per month
-* 40 requests per second = 100 million requests per month
-* 400 requests per second = 1 billion requests per month
+* 250 万秒 / 月
+* 1 次请求 / 秒 = 250 万次请求 / 月
+* 40 次请求 / 秒 = 1 亿次请求 / 月
+* 400 次请求 / 秒 = 10 亿次请求 / 月
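
下面用一小段 Python 粗略验证上述换算(仅为示意性的估算草图,数值均取自上文的假设):

```python
# 粗略估算草图:根据上文假设验证每秒写入 / 读取量与每月新内容体积
SECONDS_PER_MONTH = 2.5 * 10**6        # 约 250 万秒 / 月

writes_per_month = 1 * 10**9           # 每月 10 亿次写入
reads_per_month = 100 * 10**9          # 每月 1000 亿次读取
bytes_per_write = 1024                 # 每次写入 1 KB

writes_per_second = writes_per_month / SECONDS_PER_MONTH          # ≈ 400
reads_per_second = reads_per_month / SECONDS_PER_MONTH            # ≈ 40,000
new_content_tb_per_month = writes_per_month * bytes_per_write / 10**12  # ≈ 1 TB

print(f"每秒写入: {writes_per_second:.0f}")
print(f"每秒读取: {reads_per_second:.0f}")
print(f"每月新内容: {new_content_tb_per_month:.1f} TB")
```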
-## Step 2: Create a high level design
+## 第 2 步:创建高级设计方案
-> Outline a high level design with all important components.
+> 概述一个包含所有重要组件的高层设计。
![Imgur](http://i.imgur.com/B8LDKD7.png)
-## Step 3: Design core components
+## 第 3 步:设计核心组件
-> Dive into details for each core component.
+> 深入每个核心组件的细节。
-### Use case: User makes a read or write request
+### 用例:用户进行读写请求
-#### Goals
+#### 目标
-* With only 1-2 users, you only need a basic setup
- * Single box for simplicity
- * Vertical scaling when needed
- * Monitor to determine bottlenecks
+* 只有 1-2 个用户时,你只需要基础配置
+ * 为简单起见,只需要一台服务器
+ * 必要时进行纵向扩展
+ * 监控以确定瓶颈
-#### Start with a single box
+#### 以单台服务器开始
-* **Web server** on EC2
- * Storage for user data
- * [**MySQL Database**](https://github.com/donnemartin/system-design-primer#relational-database-management-system-rdbms)
+* 在 EC2 上运行 **Web 服务器**
+ * 存储用户数据
+ * [**MySQL 数据库**](https://github.com/donnemartin/system-design-primer#relational-database-management-system-rdbms)
-Use **Vertical Scaling**:
+运用 **纵向扩展**:
-* Simply choose a bigger box
-* Keep an eye on metrics to determine how to scale up
- * Use basic monitoring to determine bottlenecks: CPU, memory, IO, network, etc
- * CloudWatch, top, nagios, statsd, graphite, etc
-* Scaling vertically can get very expensive
-* No redundancy/failover
+* 选择一台更大容量的服务器
+* 密切关注指标,确定如何扩大规模
+ * 使用基本监控来确定瓶颈:CPU、内存、IO、网络等
+ * CloudWatch, top, nagios, statsd, graphite等
+* 纵向扩展的代价将变得更昂贵
+* 无冗余/容错
-*Trade-offs, alternatives, and additional details:*
+**折中方案、可选方案和其他细节:**
-* The alternative to **Vertical Scaling** is [**Horizontal scaling**](https://github.com/donnemartin/system-design-primer#horizontal-scaling)
+* **纵向扩展** 的可选方案是 [**横向扩展**](https://github.com/donnemartin/system-design-primer#horizontal-scaling)
-#### Start with SQL, consider NoSQL
+#### 自 SQL 开始,但认真考虑 NoSQL
-The constraints assume there is a need for relational data. We can start off using a **MySQL Database** on the single box.
+约束条件假设需要关系型数据。我们可以开始时在单台服务器上使用 **MySQL 数据库**。
-*Trade-offs, alternatives, and additional details:*
+**折中方案、可选方案和其他细节:**
-* See the [Relational database management system (RDBMS)](https://github.com/donnemartin/system-design-primer#relational-database-management-system-rdbms) section
-* Discuss reasons to use [SQL or NoSQL](https://github.com/donnemartin/system-design-primer#sql-or-nosql)
+* 查阅 [关系型数据库管理系统 (RDBMS)](https://github.com/donnemartin/system-design-primer#relational-database-management-system-rdbms) 章节
+* 讨论使用 [SQL 或 NoSQL](https://github.com/donnemartin/system-design-primer#sql-or-nosql) 的原因
-#### Assign a public static IP
+#### 分配公共静态 IP
-* Elastic IPs provide a public endpoint whose IP doesn't change on reboot
-* Helps with failover, just point the domain to a new IP
+* 弹性 IP 提供了一个公共端点,不会在重启时改变 IP。
+* 故障转移时只需要把域名指向新 IP。
-#### Use a DNS
+#### 使用 DNS 服务
-Add a **DNS** such as Route 53 to map the domain to the instance's public IP.
+添加 **DNS** 服务,比如 Route 53([Amazon Route 53](https://aws.amazon.com/cn/route53/) - 译者注),将域名映射到实例的公共 IP。
-*Trade-offs, alternatives, and additional details:*
+**折中方案、可选方案和其他细节:**
-* See the [Domain name system](https://github.com/donnemartin/system-design-primer#domain-name-system) section
+* 查阅 [域名系统](https://github.com/donnemartin/system-design-primer#domain-name-system) 章节
-#### Secure the web server
+#### 保护 Web 服务器
-* Open up only necessary ports
- * Allow the web server to respond to incoming requests from:
- * 80 for HTTP
- * 443 for HTTPS
- * 22 for SSH to only whitelisted IPs
- * Prevent the web server from initiating outbound connections
+* 只开放必要的端口(见列表后的示意代码)
+ * 允许 Web 服务器响应以下入站请求:
+ * HTTP:80 端口
+ * HTTPS:443 端口
+ * SSH:22 端口,仅允许白名单 IP
+ * 防止 Web 服务器主动发起出站连接
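
作为示意,下面用 boto3 勾勒上述入站规则在 AWS 安全组中的大致写法(安全组 ID 与 SSH 白名单网段均为假设的示例值;出站限制需要另行配置 egress 规则,此处从略):

```python
import boto3

ec2 = boto3.client("ec2")

# 示例值:安全组 ID 与 SSH 白名单网段均为假设
WEB_SG_ID = "sg-0123456789abcdef0"
SSH_WHITELIST_CIDR = "203.0.113.0/24"

ec2.authorize_security_group_ingress(
    GroupId=WEB_SG_ID,
    IpPermissions=[
        # 80 / 443 对外开放,接收 HTTP / HTTPS 请求
        {"IpProtocol": "tcp", "FromPort": 80, "ToPort": 80,
         "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},
        {"IpProtocol": "tcp", "FromPort": 443, "ToPort": 443,
         "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},
        # 22 仅对白名单网段开放,用于 SSH
        {"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22,
         "IpRanges": [{"CidrIp": SSH_WHITELIST_CIDR}]},
    ],
)
```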
-*Trade-offs, alternatives, and additional details:*
+**折中方案、可选方案和其他细节:**
-* See the [Security](https://github.com/donnemartin/system-design-primer#security) section
+* 查阅 [安全](https://github.com/donnemartin/system-design-primer#security) 章节
-## Step 4: Scale the design
+## 第 4 步:扩展设计
-> Identify and address bottlenecks, given the constraints.
+> 在给定约束条件下,识别并解决瓶颈。
-### Users+
+### 用户+
![Imgur](http://i.imgur.com/rrfjMXB.png)
-#### Assumptions
+#### 假设
-Our user count is starting to pick up and the load is increasing on our single box. Our **Benchmarks/Load Tests** and **Profiling** are pointing to the **MySQL Database** taking up more and more memory and CPU resources, while the user content is filling up disk space.
+我们的用户数量开始上升,并且单台服务器的负载上升。**基准/负载测试** 和 **分析** 指出 **MySQL 数据库** 占用越来越多的内存和 CPU 资源,同时用户数据将填满硬盘空间。
-We've been able to address these issues with **Vertical Scaling** so far. Unfortunately, this has become quite expensive and it doesn't allow for independent scaling of the **MySQL Database** and **Web Server**.
+到目前为止,我们还能够通过 **纵向扩展** 来解决这些问题。不幸的是,这样做的代价已经变得相当昂贵,而且也无法对 **MySQL 数据库** 和 **Web 服务器** 进行独立扩展。
-#### Goals
+#### 目标
-* Lighten load on the single box and allow for independent scaling
- * Store static content separately in an **Object Store**
- * Move the **MySQL Database** to a separate box
-* Disadvantages
- * These changes would increase complexity and would require changes to the **Web Server** to point to the **Object Store** and the **MySQL Database**
- * Additional security measures must be taken to secure the new components
- * AWS costs could also increase, but should be weighed with the costs of managing similar systems on your own
+* 减轻单台服务器负载并且允许独立扩展
+ * 在 **对象存储** 中单独存储静态内容
+ * 将 **MySQL 数据库** 迁移到单独的服务器上
+* 缺点
+ * 这些变化会增加复杂性,并要求对 **Web服务器** 进行更改,以指向 **对象存储** 和 **MySQL 数据库**
+ * 必须采取额外的安全措施来确保新组件的安全
+ * AWS 的成本也会增加,但应该与自身管理类似系统的成本做比较
-#### Store static content separately
+#### 独立保存静态内容
-* Consider using a managed **Object Store** like S3 to store static content
- * Highly scalable and reliable
- * Server side encryption
-* Move static content to S3
- * User files
+* 考虑使用像 S3 这样的托管 **对象存储** 服务来存储静态内容
+ * 高扩展性和可靠性
+ * 服务器端加密
+* 迁移静态内容到 S3
+ * 用户文件
* JS
* CSS
- * Images
- * Videos
+ * 图片
+ * 视频
-#### Move the MySQL database to a separate box
+#### 迁移 MySQL 数据库到独立机器上
-* Consider using a service like RDS to manage the **MySQL Database**
- * Simple to administer, scale
- * Multiple availability zones
- * Encryption at rest
+* 考虑使用类似 RDS 的服务来管理 **MySQL 数据库**
+ * 便于管理和扩展
+ * 多可用区
+ * 静态数据加密
-#### Secure the system
+#### 系统安全
-* Encrypt data in transit and at rest
-* Use a Virtual Private Cloud
- * Create a public subnet for the single **Web Server** so it can send and receive traffic from the internet
- * Create a private subnet for everything else, preventing outside access
- * Only open ports from whitelisted IPs for each component
-* These same patterns should be implemented for new components in the remainder of the exercise
+* 对传输中和静态存储的数据进行加密
+* 使用虚拟私有云
+ * 为单个 **Web 服务器** 创建一个公共子网,这样就可以发送和接收来自 internet 的流量
+ * 为其他内容创建一个私有子网,禁止外部访问
+ * 在每个组件上只为白名单 IP 打开端口
+* 这些相同的模式也应当应用到本练习余下部分新增的组件上
-*Trade-offs, alternatives, and additional details:*
+**折中方案、可选方案和其他细节:**
-* See the [Security](https://github.com/donnemartin/system-design-primer#security) section
+* 查阅 [安全](https://github.com/donnemartin/system-design-primer#security) 章节
-### Users++
+### 用户++
![Imgur](http://i.imgur.com/raoFTXM.png)
-#### Assumptions
+#### 假设
-Our **Benchmarks/Load Tests** and **Profiling** show that our single **Web Server** bottlenecks during peak hours, resulting in slow responses and in some cases, downtime. As the service matures, we'd also like to move towards higher availability and redundancy.
+我们的 **基准/负载测试** 和 **分析** 显示,在高峰时段,我们的单一 **Web 服务器** 存在瓶颈,导致响应缓慢,在某些情况下还会宕机。随着服务的成熟,我们也希望朝着更高的可用性和冗余发展。
-#### Goals
+#### 目标
-* The following goals attempt to address the scaling issues with the **Web Server**
- * Based on the **Benchmarks/Load Tests** and **Profiling**, you might only need to implement one or two of these techniques
-* Use [**Horizontal Scaling**](https://github.com/donnemartin/system-design-primer#horizontal-scaling) to handle increasing loads and to address single points of failure
- * Add a [**Load Balancer**](https://github.com/donnemartin/system-design-primer#load-balancer) such as Amazon's ELB or HAProxy
- * ELB is highly available
- * If you are configuring your own **Load Balancer**, setting up multiple servers in [active-active](https://github.com/donnemartin/system-design-primer#active-active) or [active-passive](https://github.com/donnemartin/system-design-primer#active-passive) in multiple availability zones will improve availability
- * Terminate SSL on the **Load Balancer** to reduce computational load on backend servers and to simplify certificate administration
- * Use multiple **Web Servers** spread out over multiple availability zones
- * Use multiple **MySQL** instances in [**Master-Slave Failover**](https://github.com/donnemartin/system-design-primer#master-slave-replication) mode across multiple availability zones to improve redundancy
-* Separate out the **Web Servers** from the [**Application Servers**](https://github.com/donnemartin/system-design-primer#application-layer)
- * Scale and configure both layers independently
- * **Web Servers** can run as a [**Reverse Proxy**](https://github.com/donnemartin/system-design-primer#reverse-proxy-web-server)
- * For example, you can add **Application Servers** handling **Read APIs** while others handle **Write APIs**
-* Move static (and some dynamic) content to a [**Content Delivery Network (CDN)**](https://github.com/donnemartin/system-design-primer#content-delivery-network) such as CloudFront to reduce load and latency
+* 下面的目标旨在解决 **Web 服务器** 的扩展问题
+ * 基于 **基准/负载测试** 和 **分析**,你可能只需要实现其中的一两个技术
+* 使用 [**横向扩展**](https://github.com/donnemartin/system-design-primer#horizontal-scaling) 来处理增加的负载和单点故障
+ * 添加 [**负载均衡器**](https://github.com/donnemartin/system-design-primer#load-balancer) 例如 Amazon 的 ELB 或 HAProxy
+ * ELB 是高可用的
+ * 如果你正在配置自己的 **负载均衡器**,在多个可用区中以 [双活](https://github.com/donnemartin/system-design-primer#active-active) 或 [主备](https://github.com/donnemartin/system-design-primer#active-passive) 模式部署多台服务器将提高可用性
+ * 在 **负载均衡器** 上终止 SSL,以减少后端服务器的计算负载,并简化证书管理
+ * 在多个可用区域中使用多台 **Web服务器**
+ * 在多个可用区域的 [**主-从 故障转移**](https://github.com/donnemartin/system-design-primer#master-slave-replication) 模式中使用多个 **MySQL** 实例来改进冗余
+* 分离 **Web 服务器** 和 [**应用服务器**](https://github.com/donnemartin/system-design-primer#application-layer)
+ * 独立扩展和配置每一层
+ * **Web 服务器** 可以作为 [**反向代理**](https://github.com/donnemartin/system-design-primer#reverse-proxy-web-server)
+ * 例如, 你可以添加 **应用服务器** 处理 **读 API** 而另外一些处理 **写 API**
+* 将静态(和一些动态)内容转移到 [**内容分发网络 (CDN)**](https://github.com/donnemartin/system-design-primer#content-delivery-network) 例如 CloudFront 以减少负载和延迟
-*Trade-offs, alternatives, and additional details:*
+**折中方案、可选方案和其他细节:**
-* See the linked content above for details
+* 查阅以上链接获得更多细节
-### Users+++
+### 用户+++
![Imgur](http://i.imgur.com/OZCxJr0.png)
-**Note:** **Internal Load Balancers** not shown to reduce clutter
+**注意:** 为了避免图示杂乱,图中未画出 **内部负载均衡器**
-#### Assumptions
+#### 假设
-Our **Benchmarks/Load Tests** and **Profiling** show that we are read-heavy (100:1 with writes) and our database is suffering from poor performance from the high read requests.
+我们的 **基准/负载测试** 和 **分析** 显示我们的读操作非常频繁(读写比率为 100:1),并且数据库在高读取请求下表现很糟糕。
-#### Goals
+#### 目标
-* The following goals attempt to address the scaling issues with the **MySQL Database**
- * Based on the **Benchmarks/Load Tests** and **Profiling**, you might only need to implement one or two of these techniques
-* Move the following data to a [**Memory Cache**](https://github.com/donnemartin/system-design-primer#cache) such as Elasticache to reduce load and latency:
- * Frequently accessed content from **MySQL**
- * First, try to configure the **MySQL Database** cache to see if that is sufficient to relieve the bottleneck before implementing a **Memory Cache**
- * Session data from the **Web Servers**
- * The **Web Servers** become stateless, allowing for **Autoscaling**
- * Reading 1 MB sequentially from memory takes about 250 microseconds, while reading from SSD takes 4x and from disk takes 80x longer.1
-* Add [**MySQL Read Replicas**](https://github.com/donnemartin/system-design-primer#master-slave-replication) to reduce load on the write master
-* Add more **Web Servers** and **Application Servers** to improve responsiveness
+* 下面的目标旨在解决 **MySQL 数据库** 的扩展问题
+ * 基于 **基准/负载测试** 和 **分析**,你可能只需要实现其中的一两个技术
+* 将下列数据移动到一个 [**内存缓存**](https://github.com/donnemartin/system-design-primer#cache),例如 ElastiCache,以减少负载和延迟:
+ * **MySQL** 中频繁访问的内容
+ * 首先,尝试配置 **MySQL 数据库** 自身的缓存,看看在引入 **内存缓存** 之前是否足以缓解瓶颈
+ * 来自 **Web 服务器** 的会话数据(见列表后的草图)
+ * 这样 **Web 服务器** 就变成无状态的,从而允许 **自动伸缩**
+ * 从内存中连续读取 1 MB 数据需要大约 250 微秒,而从 SSD 读取要长 4 倍,从磁盘读取要长 80 倍。1
+* 添加 [**MySQL 读取副本**](https://github.com/donnemartin/system-design-primer#master-slave-replication) 来减少写主库的负载
+* 添加更多 **Web 服务器** 和 **应用服务器** 来提高响应能力
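
以会话数据为例,下面是把会话从 Web 服务器移入 Redis 的一个最小草图(基于 redis-py;缓存端点、键名格式和过期时间均为假设值),这样任意一台 Web 服务器都能读到同一份会话,从而保持无状态:

```python
import json

import redis

# 假设的 ElastiCache(Redis)端点,仅为示例
cache = redis.Redis(host="my-cache.example.com", port=6379)

SESSION_TTL_SECONDS = 30 * 60  # 假设会话 30 分钟过期


def save_session(session_id, data):
    """把会话数据写入 Redis,Web 服务器本身不再保存状态。"""
    cache.setex(f"session:{session_id}", SESSION_TTL_SECONDS, json.dumps(data))


def load_session(session_id):
    """任意一台 Web 服务器都可以读取同一份会话数据。"""
    raw = cache.get(f"session:{session_id}")
    return json.loads(raw) if raw else None
```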
-*Trade-offs, alternatives, and additional details:*
+**折中方案、可选方案和其他细节:**
-* See the linked content above for details
+* 查阅以上链接获得更多细节
-#### Add MySQL read replicas
+#### 添加 MySQL 读取副本
-* In addition to adding and scaling a **Memory Cache**, **MySQL Read Replicas** can also help relieve load on the **MySQL Write Master**
-* Add logic to **Web Server** to separate out writes and reads
-* Add **Load Balancers** in front of **MySQL Read Replicas** (not pictured to reduce clutter)
-* Most services are read-heavy vs write-heavy
+* 除了添加和扩展 **内存缓存** 之外,**MySQL 读副本服务器** 也能够帮助缓解 **MySQL 写主服务器** 的负载
+* 在 **Web 服务器** 中添加逻辑,把读操作和写操作分开(见列表后的草图)
+* 在 **MySQL 读副本服务器** 前面添加 **负载均衡器**(图中未画出,以避免杂乱)
+* 大多数服务都是读取负载大于写入负载
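
下面是在应用层区分读写连接的一个最小草图(假设使用 PyMySQL;两个连接地址均为示例值,生产环境通常还会配合连接池以及读副本前面的负载均衡器):

```python
import pymysql  # 假设使用 PyMySQL 驱动

# 主库负责写入,读副本(或其前面的负载均衡器)负责读取;地址均为示例值
write_conn = pymysql.connect(host="mysql-primary.example.com", user="app",
                             password="secret", database="app")
read_conn = pymysql.connect(host="mysql-replica-lb.example.com", user="app",
                            password="secret", database="app")


def run_query(sql, params=None, readonly=True):
    """读请求走读副本,写请求走主库(仅为示意)。"""
    conn = read_conn if readonly else write_conn
    with conn.cursor() as cursor:
        cursor.execute(sql, params or ())
        if readonly:
            return cursor.fetchall()
        conn.commit()
```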
-*Trade-offs, alternatives, and additional details:*
+**折中方案、可选方案和其他细节:**
-* See the [Relational database management system (RDBMS)](https://github.com/donnemartin/system-design-primer#relational-database-management-system-rdbms) section
+* 查阅 [关系型数据库管理系统 (RDBMS)](https://github.com/donnemartin/system-design-primer#relational-database-management-system-rdbms) 章节
-### Users++++
+### 用户++++
![Imgur](http://i.imgur.com/3X8nmdL.png)
-#### Assumptions
+#### 假设
-Our **Benchmarks/Load Tests** and **Profiling** show that our traffic spikes during regular business hours in the U.S. and drop significantly when users leave the office. We think we can cut costs by automatically spinning up and down servers based on actual load. We're a small shop so we'd like to automate as much of the DevOps as possible for **Autoscaling** and for the general operations.
+**基准/负载测试** 和 **分析** 显示,我们的流量在美国正常工作时间出现峰值,当用户下班离开办公室时流量会显著下降。我们认为,可以根据实际负载自动增减服务器数量来降低成本。我们是一个小团队,所以希望 **自动伸缩** 以及日常运维中的 DevOps 工作尽可能自动化。
-#### Goals
+#### 目标
-* Add **Autoscaling** to provision capacity as needed
- * Keep up with traffic spikes
- * Reduce costs by powering down unused instances
-* Automate DevOps
- * Chef, Puppet, Ansible, etc
-* Continue monitoring metrics to address bottlenecks
- * **Host level** - Review a single EC2 instance
- * **Aggregate level** - Review load balancer stats
- * **Log analysis** - CloudWatch, CloudTrail, Loggly, Splunk, Sumo
- * **External site performance** - Pingdom or New Relic
- * **Handle notifications and incidents** - PagerDuty
- * **Error Reporting** - Sentry
+* 添加 **自动扩展**,按需提供容量
+ * 应对流量高峰
+ * 通过关闭未使用的实例来降低成本
+* 自动化 DevOps
+ * Chef, Puppet, Ansible 工具等
+* 继续监控指标以解决瓶颈
+ * **主机级别** - 检查单个 EC2 实例
+ * **聚合级别** - 检查负载均衡器的统计数据
+ * **日志分析** - CloudWatch, CloudTrail, Loggly, Splunk, Sumo
+ * **外部站点性能** - Pingdom 或 New Relic
+ * **处理通知和事件** - PagerDuty
+ * **错误报告** - Sentry
-#### Add autoscaling
+#### 添加自动扩展
-* Consider a managed service such as AWS **Autoscaling**
- * Create one group for each **Web Server** and one for each **Application Server** type, place each group in multiple availability zones
- * Set a min and max number of instances
- * Trigger to scale up and down through CloudWatch
- * Simple time of day metric for predictable loads or
- * Metrics over a time period:
- * CPU load
- * Latency
- * Network traffic
- * Custom metric
- * Disadvantages
- * Autoscaling can introduce complexity
- * It could take some time before a system appropriately scales up to meet increased demand, or to scale down when demand drops
+* 考虑使用托管服务,比如 AWS **自动扩展**(见列表后的示意代码)
+ * 为每个 **Web 服务器** 创建一个组,并为每个 **应用服务器** 类型创建一个组,将每个组放置在多个可用区域中
+ * 设置最小和最大实例数
+ * 通过 CloudWatch 来扩展或收缩
+ * 对可预测的负载,使用简单的按时段指标,或者
+ * 一段时间内的指标:
+ * CPU 负载
+ * 延迟
+ * 网络流量
+ * 自定义指标
+ * 缺点
+ * 自动扩展会引入复杂性
+ * 可能需要一段时间才能适当扩大规模,以满足增加的需求,或者在需求下降时缩减规模
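
作为示意,下面用 boto3 勾勒一个基于 CPU 利用率的目标跟踪伸缩策略(伸缩组名称、实例数上下限和目标值均为假设值,仅用于说明思路):

```python
import boto3

autoscaling = boto3.client("autoscaling")

# 假设 Web 服务器伸缩组已创建,这里只更新实例数上下限
autoscaling.update_auto_scaling_group(
    AutoScalingGroupName="web-server-asg",
    MinSize=2,
    MaxSize=10,
)

# 目标跟踪策略:让组内平均 CPU 利用率维持在 50% 左右,
# CloudWatch 指标触发自动扩容 / 缩容
autoscaling.put_scaling_policy(
    AutoScalingGroupName="web-server-asg",
    PolicyName="cpu-target-tracking",
    PolicyType="TargetTrackingScaling",
    TargetTrackingConfiguration={
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ASGAverageCPUUtilization",
        },
        "TargetValue": 50.0,
    },
)
```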
-### Users+++++
+### 用户+++++
![Imgur](http://i.imgur.com/jj3A5N8.png)
-**Note:** **Autoscaling** groups not shown to reduce clutter
+**注释:** 为了避免图示杂乱,图中未画出 **自动伸缩** 组
-#### Assumptions
+#### 假设
-As the service continues to grow towards the figures outlined in the constraints, we iteratively run **Benchmarks/Load Tests** and **Profiling** to uncover and address new bottlenecks.
+随着服务继续向约束条件中列出的规模增长,我们反复地运行 **基准/负载测试** 和 **分析**,来发现和解决新的瓶颈。
-#### Goals
+#### 目标
-We'll continue to address scaling issues due to the problem's constraints:
+针对问题的约束条件,我们将继续解决以下扩展性问题:
-* If our **MySQL Database** starts to grow too large, we might consider only storing a limited time period of data in the database, while storing the rest in a data warehouse such as Redshift
- * A data warehouse such as Redshift can comfortably handle the constraint of 1 TB of new content per month
-* With 40,000 average read requests per second, read traffic for popular content can be addressed by scaling the **Memory Cache**, which is also useful for handling the unevenly distributed traffic and traffic spikes
- * The **SQL Read Replicas** might have trouble handling the cache misses, we'll probably need to employ additional SQL scaling patterns
-* 400 average writes per second (with presumably significantly higher peaks) might be tough for a single **SQL Write Master-Slave**, also pointing to a need for additional scaling techniques
+* 如果我们的 **MySQL 数据库** 开始变得过于庞大, 我们可能只考虑把数据在数据库中存储一段有限的时间, 同时在例如 Redshift 这样的数据仓库中存储其余的数据
+ * 像 Redshift 这样的数据仓库能够轻松处理每月 1TB 的新内容
+* 平均每秒 40,000 次的读取请求, 可以通过扩展 **内存缓存** 来处理热点内容的读取流量,这对于处理不均匀分布的流量和流量峰值也很有用
+ * **SQL读取副本** 可能会遇到处理缓存未命中的问题, 我们可能需要使用额外的 SQL 扩展模式
+* 对于单个 **SQL 写主-从** 模式来说,平均每秒 400 次写操作(峰值可能明显更高)可能难以承受,这也表明需要引入更多的扩展技术
-SQL scaling patterns include:
+SQL 扩展模型包括:
-* [Federation](https://github.com/donnemartin/system-design-primer#federation)
-* [Sharding](https://github.com/donnemartin/system-design-primer#sharding)
-* [Denormalization](https://github.com/donnemartin/system-design-primer#denormalization)
-* [SQL Tuning](https://github.com/donnemartin/system-design-primer#sql-tuning)
+* [联合](https://github.com/donnemartin/system-design-primer#federation)
+* [分片](https://github.com/donnemartin/system-design-primer#sharding)
+* [反范式](https://github.com/donnemartin/system-design-primer#denormalization)
+* [SQL 调优](https://github.com/donnemartin/system-design-primer#sql-tuning)
-To further address the high read and write requests, we should also consider moving appropriate data to a [**NoSQL Database**](https://github.com/donnemartin/system-design-primer#nosql) such as DynamoDB.
+为了进一步处理高读和写请求,我们还应该考虑将适当的数据移动到一个 [**NoSQL数据库**](https://github.com/donnemartin/system-design-primer#nosql) ,例如 DynamoDB。
-We can further separate out our [**Application Servers**](https://github.com/donnemartin/system-design-primer#application-layer) to allow for independent scaling. Batch processes or computations that do not need to be done in real-time can be done [**Asynchronously**](https://github.com/donnemartin/system-design-primer#asynchronism) with **Queues** and **Workers**:
+我们可以进一步分离我们的 [**应用服务器**](https://github.com/donnemartin/system-design-primer#application-layer) 以允许独立扩展。不需要实时完成的批处理任务和计算,可以借助 **队列** 和 **Worker** [**异步**](https://github.com/donnemartin/system-design-primer#asynchronism)完成:
-* For example, in a photo service, the photo upload and the thumbnail creation can be separated:
- * **Client** uploads photo
- * **Application Server** puts a job in a **Queue** such as SQS
- * The **Worker Service** on EC2 or Lambda pulls work off the **Queue** then:
- * Creates a thumbnail
- * Updates a **Database**
- * Stores the thumbnail in the **Object Store**
+* 以照片服务为例,照片上传和缩略图的创建可以分开进行(见列表后的示意代码):
+ * **客户端** 上传图片
+ * **应用服务器** 推送一个任务到 **队列**,例如 SQS
+ * EC2 上的 **Worker 服务** 或者 Lambda 从 **队列** 中拉取任务,然后:
+ * 创建缩略图
+ * 更新 **数据库**
+ * 在 **对象存储** 中存储缩略图
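
下面用 boto3 和 SQS 勾勒这条异步流水线的大致形态(队列 URL 为示例值,`create_thumbnail`、`update_database`、`store_thumbnail` 均为假设的依赖函数,仅为示意):

```python
import json

import boto3

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/thumbnail-jobs"  # 示例值


def enqueue_thumbnail_job(photo_id, object_key):
    """应用服务器:照片上传完成后,把缩略图任务推入队列。"""
    sqs.send_message(
        QueueUrl=QUEUE_URL,
        MessageBody=json.dumps({"photo_id": photo_id, "object_key": object_key}),
    )


def worker_loop(create_thumbnail, update_database, store_thumbnail):
    """Worker 服务:不断从队列拉取任务并处理(三个处理函数为假设的依赖)。"""
    while True:
        resp = sqs.receive_message(QueueUrl=QUEUE_URL, MaxNumberOfMessages=1,
                                   WaitTimeSeconds=20)
        for message in resp.get("Messages", []):
            job = json.loads(message["Body"])
            thumbnail = create_thumbnail(job["object_key"])   # 创建缩略图
            update_database(job["photo_id"])                  # 更新数据库
            store_thumbnail(job["photo_id"], thumbnail)       # 写入对象存储
            # 处理成功后再删除消息,避免任务丢失
            sqs.delete_message(QueueUrl=QUEUE_URL,
                               ReceiptHandle=message["ReceiptHandle"])
```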
-*Trade-offs, alternatives, and additional details:*
+**折中方案、可选方案和其他细节:**
-* See the linked content above for details
+* 查阅以上链接获得更多细节
-## Additional talking points
+## 额外的话题
-> Additional topics to dive into, depending on the problem scope and time remaining.
+> 根据问题的范围和剩余时间,还需要深入讨论其他问题。
-### SQL scaling patterns
+### SQL 扩展模式
-* [Read replicas](https://github.com/donnemartin/system-design-primer#master-slave-replication)
-* [Federation](https://github.com/donnemartin/system-design-primer#federation)
-* [Sharding](https://github.com/donnemartin/system-design-primer#sharding)
-* [Denormalization](https://github.com/donnemartin/system-design-primer#denormalization)
-* [SQL Tuning](https://github.com/donnemartin/system-design-primer#sql-tuning)
+* [读取副本](https://github.com/donnemartin/system-design-primer#master-slave-replication)
+* [联合](https://github.com/donnemartin/system-design-primer#federation)
+* [分片](https://github.com/donnemartin/system-design-primer#sharding)
+* [反规范化](https://github.com/donnemartin/system-design-primer#denormalization)
+* [SQL 调优](https://github.com/donnemartin/system-design-primer#sql-tuning)
#### NoSQL
-* [Key-value store](https://github.com/donnemartin/system-design-primer#key-value-store)
-* [Document store](https://github.com/donnemartin/system-design-primer#document-store)
-* [Wide column store](https://github.com/donnemartin/system-design-primer#wide-column-store)
-* [Graph database](https://github.com/donnemartin/system-design-primer#graph-database)
+* [键值存储](https://github.com/donnemartin/system-design-primer#key-value-store)
+* [文档存储](https://github.com/donnemartin/system-design-primer#document-store)
+* [宽表存储](https://github.com/donnemartin/system-design-primer#wide-column-store)
+* [图数据库](https://github.com/donnemartin/system-design-primer#graph-database)
* [SQL vs NoSQL](https://github.com/donnemartin/system-design-primer#sql-or-nosql)
-### Caching
+### 缓存
-* Where to cache
- * [Client caching](https://github.com/donnemartin/system-design-primer#client-caching)
- * [CDN caching](https://github.com/donnemartin/system-design-primer#cdn-caching)
- * [Web server caching](https://github.com/donnemartin/system-design-primer#web-server-caching)
- * [Database caching](https://github.com/donnemartin/system-design-primer#database-caching)
- * [Application caching](https://github.com/donnemartin/system-design-primer#application-caching)
-* What to cache
- * [Caching at the database query level](https://github.com/donnemartin/system-design-primer#caching-at-the-database-query-level)
- * [Caching at the object level](https://github.com/donnemartin/system-design-primer#caching-at-the-object-level)
-* When to update the cache
- * [Cache-aside](https://github.com/donnemartin/system-design-primer#cache-aside)
- * [Write-through](https://github.com/donnemartin/system-design-primer#write-through)
- * [Write-behind (write-back)](https://github.com/donnemartin/system-design-primer#write-behind-write-back)
- * [Refresh ahead](https://github.com/donnemartin/system-design-primer#refresh-ahead)
+* 缓存到哪里
+ * [客户端缓存](https://github.com/donnemartin/system-design-primer#client-caching)
+ * [CDN 缓存](https://github.com/donnemartin/system-design-primer#cdn-caching)
+ * [Web 服务缓存](https://github.com/donnemartin/system-design-primer#web-server-caching)
+ * [数据库缓存](https://github.com/donnemartin/system-design-primer#database-caching)
+ * [应用缓存](https://github.com/donnemartin/system-design-primer#application-caching)
+* 缓存什么
+ * [数据库请求层缓存](https://github.com/donnemartin/system-design-primer#caching-at-the-database-query-level)
+ * [对象层缓存](https://github.com/donnemartin/system-design-primer#caching-at-the-object-level)
+* 何时更新缓存
+ * [旁路缓存](https://github.com/donnemartin/system-design-primer#cache-aside)
+ * [直写](https://github.com/donnemartin/system-design-primer#write-through)
+ * [延迟写 (写回)](https://github.com/donnemartin/system-design-primer#write-behind-write-back)
+ * [事先更新](https://github.com/donnemartin/system-design-primer#refresh-ahead)
-### Asynchronism and microservices
+### 异步性和微服务
-* [Message queues](https://github.com/donnemartin/system-design-primer#message-queues)
-* [Task queues](https://github.com/donnemartin/system-design-primer#task-queues)
-* [Back pressure](https://github.com/donnemartin/system-design-primer#back-pressure)
-* [Microservices](https://github.com/donnemartin/system-design-primer#microservices)
+* [消息队列](https://github.com/donnemartin/system-design-primer#message-queues)
+* [任务队列](https://github.com/donnemartin/system-design-primer#task-queues)
+* [背压](https://github.com/donnemartin/system-design-primer#back-pressure)
+* [微服务](https://github.com/donnemartin/system-design-primer#microservices)
-### Communications
+### 通信
-* Discuss tradeoffs:
- * External communication with clients - [HTTP APIs following REST](https://github.com/donnemartin/system-design-primer#representational-state-transfer-rest)
- * Internal communications - [RPC](https://github.com/donnemartin/system-design-primer#remote-procedure-call-rpc)
-* [Service discovery](https://github.com/donnemartin/system-design-primer#service-discovery)
+* 关于折中方案的讨论:
+ * 客户端的外部通讯 - [遵循 REST 的 HTTP APIs](https://github.com/donnemartin/system-design-primer#representational-state-transfer-rest)
+ * 内部通讯 - [RPC](https://github.com/donnemartin/system-design-primer#remote-procedure-call-rpc)
+* [服务发现](https://github.com/donnemartin/system-design-primer#service-discovery)
-### Security
+### 安全性
-Refer to the [security section](https://github.com/donnemartin/system-design-primer#security).
+参考 [安全章节](https://github.com/donnemartin/system-design-primer#security)
-### Latency numbers
+### 延迟数字指标
-See [Latency numbers every programmer should know](https://github.com/donnemartin/system-design-primer#latency-numbers-every-programmer-should-know).
+查阅 [每个程序员必懂的延迟数字](https://github.com/donnemartin/system-design-primer#latency-numbers-every-programmer-should-know)
-### Ongoing
+### 持续探讨
-* Continue benchmarking and monitoring your system to address bottlenecks as they come up
-* Scaling is an iterative process
+* 继续基准测试并监控你的系统以解决出现的瓶颈问题
+* 扩展是一个迭代的过程
diff --git a/solutions/system_design/social_graph/README.md b/solutions/system_design/social_graph/README.md
index f7dfd4ef..07b8e3e7 100644
--- a/solutions/system_design/social_graph/README.md
+++ b/solutions/system_design/social_graph/README.md
@@ -1,66 +1,66 @@
-# Design the data structures for a social network
+# 为社交网络设计数据结构
-*Note: This document links directly to relevant areas found in the [system design topics](https://github.com/donnemartin/system-design-primer#index-of-system-design-topics) to avoid duplication. Refer to the linked content for general talking points, tradeoffs, and alternatives.*
+**注释:为了避免重复,这篇文章的链接直接关联到 [系统设计主题](https://github.com/donnemartin/system-design-primer#index-of-system-design-topics) 的相关章节。一般性的讨论要点、折中方案和可选方案请参考链接中的内容。**
-## Step 1: Outline use cases and constraints
+## 第 1 步:用例和约束概要
-> Gather requirements and scope the problem.
-> Ask questions to clarify use cases and constraints.
-> Discuss assumptions.
+> 收集需求并界定问题的范围。
+> 通过提问明确用例和约束。
+> 讨论假设。
-Without an interviewer to address clarifying questions, we'll define some use cases and constraints.
+如果没有面试官提出明确的问题,我们将自己定义一些用例和约束条件。
-### Use cases
+### 用例
-#### We'll scope the problem to handle only the following use cases
+#### 我们将问题的范围限定在仅处理以下用例
-* **User** searches for someone and sees the shortest path to the searched person
-* **Service** has high availability
+* **用户** 搜索某人,并查看到被搜索者的最短路径
+* **服务** 高可用
-### Constraints and assumptions
+### 约束和假设
-#### State assumptions
+#### 状态假设
-* Traffic is not evenly distributed
- * Some searches are more popular than others, while others are only executed once
-* Graph data won't fit on a single machine
-* Graph edges are unweighted
-* 100 million users
-* 50 friends per user average
-* 1 billion friend searches per month
+* 流量分布不均
+ * 某些搜索比别的更热门,同时某些搜索仅执行一次
+* 图数据无法存放在单台机器上
+* 图的边没有权重
+* 1 亿用户
+* 每个用户平均有 50 个朋友
+* 每月 10 亿次朋友搜索
-Exercise the use of more traditional systems - don't use graph-specific solutions such as [GraphQL](http://graphql.org/) or a graph database like [Neo4j](https://neo4j.com/)
+练习使用更传统的系统 - 不要使用图特有的解决方案,例如 [GraphQL](http://graphql.org/),也不要使用 [Neo4j](https://neo4j.com/) 这样的图数据库。
-#### Calculate usage
+#### 计算使用
-**Clarify with your interviewer if you should run back-of-the-envelope usage calculations.**
+**向你的面试官厘清你是否应该做粗略的使用计算**
-* 5 billion friend relationships
- * 100 million users * 50 friends per user average
-* 400 search requests per second
+* 50 亿朋友关系
+ * 1 亿用户 * 平均每人 50 个朋友
+* 每秒 400 次搜索请求
-Handy conversion guide:
+便捷的转换指南:
-* 2.5 million seconds per month
-* 1 request per second = 2.5 million requests per month
-* 40 requests per second = 100 million requests per month
-* 400 requests per second = 1 billion requests per month
+* 每月 250 万秒
+* 每秒 1 个请求 = 每月 250 万次请求
+* 每秒 40 个请求 = 每月 1 亿次请求
+* 每秒 400 个请求 = 每月 10 亿次请求
-## Step 2: Create a high level design
+## 第 2 步:创建高级设计方案
-> Outline a high level design with all important components.
+> 概述一个包含所有重要组件的高层设计。
![Imgur](http://i.imgur.com/wxXyq2J.png)
-## Step 3: Design core components
+## 第 3 步:设计核心组件
-> Dive into details for each core component.
+> 深入每个核心组件的细节。
-### Use case: User searches for someone and sees the shortest path to the searched person
+### 用例:用户搜索某人并查看到被搜索者的最短路径
-**Clarify with your interviewer how much code you are expected to write**.
+**和你的面试官确认你需要写多少代码**。
-Without the constraint of millions of users (vertices) and billions of friend relationships (edges), we could solve this unweighted shortest path task with a general BFS approach:
+如果没有数百万用户(顶点)和数十亿朋友关系(边)这样的约束,我们可以用普通的 BFS 方法来解决这个无权重最短路径问题:
```python
class Graph(Graph):
@@ -99,23 +99,22 @@ class Graph(Graph):
return None
```
-We won't be able to fit all users on the same machine, we'll need to [shard](https://github.com/donnemartin/system-design-primer#sharding) users across **Person Servers** and access them with a **Lookup Service**.
+我们无法把所有用户都放在同一台机器上,需要将用户[分片](https://github.com/donnemartin/system-design-primer#sharding)到多台 **人员服务器** 上,并通过 **查询服务** 来访问它们。
-* The **Client** sends a request to the **Web Server**, running as a [reverse proxy](https://github.com/donnemartin/system-design-primer#reverse-proxy-web-server)
-* The **Web Server** forwards the request to the **Search API** server
-* The **Search API** server forwards the request to the **User Graph Service**
-* The **User Graph Service** does the following:
- * Uses the **Lookup Service** to find the **Person Server** where the current user's info is stored
- * Finds the appropriate **Person Server** to retrieve the current user's list of `friend_ids`
- * Runs a BFS search using the current user as the `source` and the current user's `friend_ids` as the ids for each `adjacent_node`
- * To get the `adjacent_node` from a given id:
- * The **User Graph Service** will *again* need to communicate with the **Lookup Service** to determine which **Person Server** stores the`adjacent_node` matching the given id (potential for optimization)
+* **客户端** 向 **Web 服务器** 发送请求,**Web 服务器** 作为 [反向代理](https://github.com/donnemartin/system-design-primer#reverse-proxy-web-server) 运行
+* **Web 服务器** 将请求转发给 **搜索 API** 服务器
+* **搜索 API** 服务器向 **用户图服务** 转发请求
+* **用户图服务** 会进行以下操作:
+ * 使用 **查询服务** 找到当前用户信息存储的 **人员服务器**
+ * 找到适当的 **人员服务器** 检索当前用户的 `friend_ids` 列表
+ * 以当前用户作为 `source`、当前用户的 `friend_ids` 作为每个 `adjacent_node` 的 id,运行 BFS 搜索
+ * 给定 id 获取 `adjacent_node`:
+ * **用户图服务** 将 **再次** 和 **查询服务** 通讯,最后判断出和给定 id 相匹配的存储 `adjacent_node` 的 **人员服务器**(有待优化)
-**Clarify with your interviewer how much code you should be writing**.
+**和你的面试官说清你应该写的代码量**
-**Note**: Error handling is excluded below for simplicity. Ask if you should code proper error handing.
+**注释**:为简洁起见,下面的代码省略了错误处理。请询问面试官是否需要编写完善的错误处理。
-**Lookup Service** implementation:
+**查询服务** 实现:
```python
class LookupService(object):
@@ -130,7 +129,7 @@ class LookupService(object):
return self.lookup[person_id]
```
-**Person Server** implementation:
+**人员服务器** 实现:
```python
class PersonServer(object):
@@ -149,7 +148,7 @@ class PersonServer(object):
return results
```
-**Person** implementation:
+**人员(Person)** 实现:
```python
class Person(object):
@@ -160,7 +159,7 @@ class Person(object):
self.friend_ids = friend_ids
```
-**User Graph Service** implementation:
+**用户图服务** 实现:
```python
class UserGraphService(object):
@@ -218,13 +217,13 @@ class UserGraphService(object):
return None
```
-We'll use a public [**REST API**](https://github.com/donnemartin/system-design-primer#representational-state-transfer-rest):
+我们用的是公共的 [**REST API**](https://github.com/donnemartin/system-design-primer#representational-state-transfer-rest):
```
$ curl https://social.com/api/v1/friend_search?person_id=1234
```
-Response:
+响应:
```
{
@@ -244,106 +243,106 @@ Response:
},
```
-For internal communications, we could use [Remote Procedure Calls](https://github.com/donnemartin/system-design-primer#remote-procedure-call-rpc).
+内部通信可以使用[远程过程调用](https://github.com/donnemartin/system-design-primer#remote-procedure-call-rpc)。
-## Step 4: Scale the design
+## 第 4 步:扩展设计
-> Identify and address bottlenecks, given the constraints.
+> 在给定约束条件下,定义和确认瓶颈。
![Imgur](http://i.imgur.com/cdCv5g7.png)
-**Important: Do not simply jump right into the final design from the initial design!**
+**重要:不要从最初设计直接跳到最终设计!**
-State you would 1) **Benchmark/Load Test**, 2) **Profile** for bottlenecks 3) address bottlenecks while evaluating alternatives and trade-offs, and 4) repeat. See [Design a system that scales to millions of users on AWS](../scaling_aws/README.md) as a sample on how to iteratively scale the initial design.
+你需要说明自己会:1) 进行**基准/负载测试**,2) 对瓶颈进行**分析**,3) 在评估替代方案和折中方案的同时解决瓶颈,4) 重复以上步骤。可以参考 [在 AWS 上设计支持百万级到千万级用户的系统](../scaling_aws/README.md),了解如何对最初设计进行迭代扩展。
-It's important to discuss what bottlenecks you might encounter with the initial design and how you might address each of them. For example, what issues are addressed by adding a **Load Balancer** with multiple **Web Servers**? **CDN**? **Master-Slave Replicas**? What are the alternatives and **Trade-Offs** for each?
+讨论最初设计可能遇到的瓶颈以及每个瓶颈的处理方法十分重要。例如,添加带有多台 **Web 服务器** 的 **负载均衡器** 能解决哪些问题?**CDN** 呢?**主-从副本** 呢?每种方案又有哪些替代方案和 **折中方案**?
-We'll introduce some components to complete the design and to address scalability issues. Internal load balancers are not shown to reduce clutter.
+我们将引入一些组件来完成设计,并解决扩展性问题。为了避免图示杂乱,图中未画出内部负载均衡器。
-*To avoid repeating discussions*, refer to the following [system design topics](https://github.com/donnemartin/system-design-primer#index-of-system-design-topics) for main talking points, tradeoffs, and alternatives:
+**为避免重复讨论**,请参考以下 [系统设计主题](https://github.com/donnemartin/system-design-primer#index-of-system-design-topics),以了解主要讨论要点、折中方案和替代方案:
* [DNS](https://github.com/donnemartin/system-design-primer#domain-name-system)
-* [Load balancer](https://github.com/donnemartin/system-design-primer#load-balancer)
-* [Horizontal scaling](https://github.com/donnemartin/system-design-primer#horizontal-scaling)
-* [Web server (reverse proxy)](https://github.com/donnemartin/system-design-primer#reverse-proxy-web-server)
-* [API server (application layer)](https://github.com/donnemartin/system-design-primer#application-layer)
-* [Cache](https://github.com/donnemartin/system-design-primer#cache)
-* [Consistency patterns](https://github.com/donnemartin/system-design-primer#consistency-patterns)
-* [Availability patterns](https://github.com/donnemartin/system-design-primer#availability-patterns)
+* [负载均衡](https://github.com/donnemartin/system-design-primer#load-balancer)
+* [横向扩展](https://github.com/donnemartin/system-design-primer#horizontal-scaling)
+* [Web 服务器(反向代理)](https://github.com/donnemartin/system-design-primer#reverse-proxy-web-server)
+* [API 服务器(应用层)](https://github.com/donnemartin/system-design-primer#application-layer)
+* [缓存](https://github.com/donnemartin/system-design-primer#cache)
+* [一致性模式](https://github.com/donnemartin/system-design-primer#consistency-patterns)
+* [可用性模式](https://github.com/donnemartin/system-design-primer#availability-patterns)
-To address the constraint of 400 *average* read requests per second (higher at peak), person data can be served from a **Memory Cache** such as Redis or Memcached to reduce response times and to reduce traffic to downstream services. This could be especially useful for people who do multiple searches in succession and for people who are well-connected. Reading 1 MB sequentially from memory takes about 250 microseconds, while reading from SSD takes 4x and from disk takes 80x longer.1
+为了应对 **平均** 每秒 400 次读取请求(峰值更高)的约束,人员数据可以放在 Redis 或 Memcached 这样的 **内存缓存** 中,以缩短响应时间并减少下游服务的流量。这对于连续执行多次搜索的用户以及人脉很广的用户尤其有用。从内存中连续读取 1 MB 数据大约要 250 微秒,从 SSD 中读取要长 4 倍,从硬盘读取要长 80 倍。1
-Below are further optimizations:
+以下是进一步优化方案:
-* Store complete or partial BFS traversals to speed up subsequent lookups in the **Memory Cache**
-* Batch compute offline then store complete or partial BFS traversals to speed up subsequent lookups in a **NoSQL Database**
-* Reduce machine jumps by batching together friend lookups hosted on the same **Person Server**
- * [Shard](https://github.com/donnemartin/system-design-primer#sharding) **Person Servers** by location to further improve this, as friends generally live closer to each other
-* Do two BFS searches at the same time, one starting from the source, and one from the destination, then merge the two paths
-* Start the BFS search from people with large numbers of friends, as they are more likely to reduce the number of [degrees of separation](https://en.wikipedia.org/wiki/Six_degrees_of_separation) between the current user and the search target
-* Set a limit based on time or number of hops before asking the user if they want to continue searching, as searching could take a considerable amount of time in some cases
-* Use a **Graph Database** such as [Neo4j](https://neo4j.com/) or a graph-specific query language such as [GraphQL](http://graphql.org/) (if there were no constraint preventing the use of **Graph Databases**)
+* 在 **内存缓存** 中存储完整的或部分的 BFS 遍历结果,以加快后续查找
+* 离线批量计算完整的或部分的 BFS 遍历结果并存入 **NoSQL 数据库**,以加快后续查找
+* 把存放在同一台 **人员服务器** 上的朋友查找批量处理,以减少机器间的跳转
+ * 按地理位置[分片](https://github.com/donnemartin/system-design-primer#sharding) **人员服务器** 可以进一步优化这一点,因为朋友通常住得比较近
+* 同时进行两个 BFS 查找,一个从 source 开始,一个从 destination 开始,然后合并两条路径(见列表后的草图)
+* 从朋友数量庞大的人开始查找,因为这样更有可能减小当前用户和搜索目标之间的 [分隔度数](https://en.wikipedia.org/wiki/Six_degrees_of_separation)
+* 由于某些情况下搜索可能耗时很长,可以设置基于时间或跳数的上限,超过后再询问用户是否继续搜索
+* 使用类似 [Neo4j](https://neo4j.com/) 的 **图数据库**,或类似 [GraphQL](http://graphql.org/) 的图特定查询语言(如果没有禁止使用 **图数据库** 的限制的话)
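
上面提到的“双向 BFS”大致可以写成下面的草图(假设 `get_friend_ids(person_id)` 会通过查询服务从对应的人员服务器取回好友 id 列表,属于示意性实现):

```python
from collections import deque


def bidirectional_bfs(source, dest, get_friend_ids):
    """同时从 source 和 dest 两端做 BFS,相遇后拼出最短路径(未加权图)。"""
    if source == dest:
        return [source]
    parents = {source: None}      # source 一侧:节点 -> 发现它的父节点
    children = {dest: None}       # dest 一侧:节点 -> 发现它的、更靠近 dest 的节点
    frontier_src, frontier_dst = deque([source]), deque([dest])

    def expand(frontier, visited, other_side):
        """向外扩一层;若新节点已被另一侧访问过,则返回相遇点。"""
        for _ in range(len(frontier)):
            node = frontier.popleft()
            for friend_id in get_friend_ids(node):
                if friend_id not in visited:
                    visited[friend_id] = node
                    if friend_id in other_side:
                        return friend_id
                    frontier.append(friend_id)
        return None

    while frontier_src and frontier_dst:
        meet = expand(frontier_src, parents, children)
        if meet is None:
            meet = expand(frontier_dst, children, parents)
        if meet is not None:
            # 从相遇点分别回溯到两端,拼接出完整路径
            path, node = [], meet
            while node is not None:
                path.append(node)
                node = parents[node]
            path.reverse()
            node = children[meet]
            while node is not None:
                path.append(node)
                node = children[node]
            return path
    return None
```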
-## Additional talking points
+## 额外的话题
-> Additional topics to dive into, depending on the problem scope and time remaining.
+> 根据问题的范围和剩余时间,还需要深入讨论其他问题。
-### SQL scaling patterns
+### SQL 扩展模式
-* [Read replicas](https://github.com/donnemartin/system-design-primer#master-slave-replication)
-* [Federation](https://github.com/donnemartin/system-design-primer#federation)
-* [Sharding](https://github.com/donnemartin/system-design-primer#sharding)
-* [Denormalization](https://github.com/donnemartin/system-design-primer#denormalization)
-* [SQL Tuning](https://github.com/donnemartin/system-design-primer#sql-tuning)
+* [读取副本](https://github.com/donnemartin/system-design-primer#master-slave-replication)
+* [联合](https://github.com/donnemartin/system-design-primer#federation)
+* [分片](https://github.com/donnemartin/system-design-primer#sharding)
+* [反规范化](https://github.com/donnemartin/system-design-primer#denormalization)
+* [SQL 调优](https://github.com/donnemartin/system-design-primer#sql-tuning)
#### NoSQL
-* [Key-value store](https://github.com/donnemartin/system-design-primer#key-value-store)
-* [Document store](https://github.com/donnemartin/system-design-primer#document-store)
-* [Wide column store](https://github.com/donnemartin/system-design-primer#wide-column-store)
-* [Graph database](https://github.com/donnemartin/system-design-primer#graph-database)
+* [键值存储](https://github.com/donnemartin/system-design-primer#key-value-store)
+* [文档存储](https://github.com/donnemartin/system-design-primer#document-store)
+* [宽表存储](https://github.com/donnemartin/system-design-primer#wide-column-store)
+* [图数据库](https://github.com/donnemartin/system-design-primer#graph-database)
* [SQL vs NoSQL](https://github.com/donnemartin/system-design-primer#sql-or-nosql)
-### Caching
+### 缓存
-* Where to cache
- * [Client caching](https://github.com/donnemartin/system-design-primer#client-caching)
- * [CDN caching](https://github.com/donnemartin/system-design-primer#cdn-caching)
- * [Web server caching](https://github.com/donnemartin/system-design-primer#web-server-caching)
- * [Database caching](https://github.com/donnemartin/system-design-primer#database-caching)
- * [Application caching](https://github.com/donnemartin/system-design-primer#application-caching)
-* What to cache
- * [Caching at the database query level](https://github.com/donnemartin/system-design-primer#caching-at-the-database-query-level)
- * [Caching at the object level](https://github.com/donnemartin/system-design-primer#caching-at-the-object-level)
-* When to update the cache
- * [Cache-aside](https://github.com/donnemartin/system-design-primer#cache-aside)
- * [Write-through](https://github.com/donnemartin/system-design-primer#write-through)
- * [Write-behind (write-back)](https://github.com/donnemartin/system-design-primer#write-behind-write-back)
- * [Refresh ahead](https://github.com/donnemartin/system-design-primer#refresh-ahead)
+* 缓存到哪里
+ * [客户端缓存](https://github.com/donnemartin/system-design-primer#client-caching)
+ * [CDN 缓存](https://github.com/donnemartin/system-design-primer#cdn-caching)
+ * [Web 服务缓存](https://github.com/donnemartin/system-design-primer#web-server-caching)
+ * [数据库缓存](https://github.com/donnemartin/system-design-primer#database-caching)
+ * [应用缓存](https://github.com/donnemartin/system-design-primer#application-caching)
+* 缓存什么
+ * [数据库请求层缓存](https://github.com/donnemartin/system-design-primer#caching-at-the-database-query-level)
+ * [对象层缓存](https://github.com/donnemartin/system-design-primer#caching-at-the-object-level)
+* 何时更新缓存
+ * [旁路缓存](https://github.com/donnemartin/system-design-primer#cache-aside)
+ * [直写](https://github.com/donnemartin/system-design-primer#write-through)
+ * [延迟写 (写回)](https://github.com/donnemartin/system-design-primer#write-behind-write-back)
+ * [事先更新](https://github.com/donnemartin/system-design-primer#refresh-ahead)
-### Asynchronism and microservices
+### 异步性和微服务
-* [Message queues](https://github.com/donnemartin/system-design-primer#message-queues)
-* [Task queues](https://github.com/donnemartin/system-design-primer#task-queues)
-* [Back pressure](https://github.com/donnemartin/system-design-primer#back-pressure)
-* [Microservices](https://github.com/donnemartin/system-design-primer#microservices)
+* [消息队列](https://github.com/donnemartin/system-design-primer#message-queues)
+* [任务队列](https://github.com/donnemartin/system-design-primer#task-queues)
+* [背压](https://github.com/donnemartin/system-design-primer#back-pressure)
+* [微服务](https://github.com/donnemartin/system-design-primer#microservices)
-### Communications
+### 通信
-* Discuss tradeoffs:
- * External communication with clients - [HTTP APIs following REST](https://github.com/donnemartin/system-design-primer#representational-state-transfer-rest)
- * Internal communications - [RPC](https://github.com/donnemartin/system-design-primer#remote-procedure-call-rpc)
-* [Service discovery](https://github.com/donnemartin/system-design-primer#service-discovery)
+* 关于折中方案的讨论:
+ * 客户端的外部通讯 - [遵循 REST 的 HTTP APIs](https://github.com/donnemartin/system-design-primer#representational-state-transfer-rest)
+ * 内部通讯 - [RPC](https://github.com/donnemartin/system-design-primer#remote-procedure-call-rpc)
+* [服务发现](https://github.com/donnemartin/system-design-primer#service-discovery)
-### Security
+### 安全性
-Refer to the [security section](https://github.com/donnemartin/system-design-primer#security).
+参考 [安全章节](https://github.com/donnemartin/system-design-primer#security)
-### Latency numbers
+### 延迟数字指标
-See [Latency numbers every programmer should know](https://github.com/donnemartin/system-design-primer#latency-numbers-every-programmer-should-know).
+查阅 [每个程序员必懂的延迟数字](https://github.com/donnemartin/system-design-primer#latency-numbers-every-programmer-should-know)
-### Ongoing
+### 持续探讨
-* Continue benchmarking and monitoring your system to address bottlenecks as they come up
-* Scaling is an iterative process
+* 继续基准测试并监控你的系统以解决出现的瓶颈问题
+* 扩展是一个迭代的过程
diff --git a/solutions/system_design/twitter/README.md b/solutions/system_design/twitter/README.md
index 374f5dd2..1853444d 100644
--- a/solutions/system_design/twitter/README.md
+++ b/solutions/system_design/twitter/README.md
@@ -1,126 +1,126 @@
-# Design the Twitter timeline and search
+# 设计推特时间轴与搜索功能
-*Note: This document links directly to relevant areas found in the [system design topics](https://github.com/donnemartin/system-design-primer#index-of-system-design-topics) to avoid duplication. Refer to the linked content for general talking points, tradeoffs, and alternatives.*
+**注意:这个文档中的链接会直接指向[系统设计主题索引](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#系统设计主题的索引)中的有关部分,以避免重复的内容。你可以参考链接的相关内容,来了解其总的要点、方案的权衡取舍以及可选的替代方案。**
-**Design the Facebook feed** and **Design Facebook search** are similar questions.
+**设计 Facebook 的 feed** 与**设计 Facebook 搜索**与此为同一类型问题。
-## Step 1: Outline use cases and constraints
+## 第一步:简述用例与约束条件
-> Gather requirements and scope the problem.
-> Ask questions to clarify use cases and constraints.
-> Discuss assumptions.
+> 搜集需求与问题的范围。
+> 提出问题来明确用例与约束条件。
+> 讨论假设。
-Without an interviewer to address clarifying questions, we'll define some use cases and constraints.
+我们将在没有面试官明确说明问题的情况下,自己定义一些用例以及限制条件。
-### Use cases
+### 用例
-#### We'll scope the problem to handle only the following use cases
+#### 我们将把问题限定在仅处理以下用例的范围中
-* **User** posts a tweet
- * **Service** pushes tweets to followers, sending push notifications and emails
-* **User** views the user timeline (activity from the user)
-* **User** views the home timeline (activity from people the user is following)
-* **User** searches keywords
-* **Service** has high availability
+* **用户**发布了一篇推特
+ * **服务**将推特推送给关注者,给他们发送消息通知与邮件
+* **用户**浏览用户时间轴(用户最近的活动)
+* **用户**浏览主页时间轴(用户关注的人最近的活动)
+* **用户**搜索关键词
+* **服务**需要有高可用性
-#### Out of scope
+#### 不在用例范围内的有
-* **Service** pushes tweets to the Twitter Firehose and other streams
-* **Service** strips out tweets based on user's visibility settings
- * Hide @reply if the user is not also following the person being replied to
- * Respect 'hide retweets' setting
-* Analytics
+* **服务**向 Firehose 与其它流数据接口推送推特
+* **服务**根据用户的“是否可见”设置排除推特
+ * 隐藏未关注者的 @回复
+ * 遵循“隐藏转发”设置
+* 数据分析
-### Constraints and assumptions
+### 限制条件与假设
-#### State assumptions
+#### 提出假设
-General
+普遍情况
-* Traffic is not evenly distributed
-* Posting a tweet should be fast
- * Fanning out a tweet to all of your followers should be fast, unless you have millions of followers
-* 100 million active users
-* 500 million tweets per day or 15 billion tweets per month
- * Each tweet averages a fanout of 10 deliveries
- * 5 billion total tweets delivered on fanout per day
- * 150 billion tweets delivered on fanout per month
-* 250 billion read requests per month
-* 10 billion searches per month
+* 网络流量不是均匀分布的
+* 发布推特的速度需要足够快速
+ * 除非有上百万的关注者,否则将推特推送给粉丝的速度要足够快
+* 1 亿个活跃用户
+* 每天新发布 5 亿条推特,每月新发布 150 亿条推特
+ * 平均每条推特会被推送(fanout)给 10 个关注者
+ * 每天需要进行 50 亿次推送
+ * 每月需要进行 1500 亿次推送
+* 每月需要处理 2500 亿次读取请求
+* 每月需要处理 100 亿次搜索
-Timeline
+时间轴功能
-* Viewing the timeline should be fast
-* Twitter is more read heavy than write heavy
- * Optimize for fast reads of tweets
-* Ingesting tweets is write heavy
+* 浏览时间轴需要足够快
+* 推特的读取负载要大于写入负载
+ * 需要为推特的快速读取进行优化
+* 存入推特是高写入负载功能
-Search
+搜索功能
-* Searching should be fast
-* Search is read-heavy
+* 搜索速度需要足够快
+* 搜索是高负载读取功能
-#### Calculate usage
+#### 计算用量
-**Clarify with your interviewer if you should run back-of-the-envelope usage calculations.**
+**如果你需要进行粗略的用量计算,请向你的面试官说明。**
-* Size per tweet:
- * `tweet_id` - 8 bytes
- * `user_id` - 32 bytes
- * `text` - 140 bytes
- * `media` - 10 KB average
- * Total: ~10 KB
-* 150 TB of new tweet content per month
- * 10 KB per tweet * 500 million tweets per day * 30 days per month
- * 5.4 PB of new tweet content in 3 years
-* 100 thousand read requests per second
- * 250 billion read requests per month * (400 requests per second / 1 billion requests per month)
-* 6,000 tweets per second
- * 15 billion tweets per month * (400 requests per second / 1 billion requests per month)
-* 60 thousand tweets delivered on fanout per second
- * 150 billion tweets delivered on fanout per month * (400 requests per second / 1 billion requests per month)
-* 4,000 search requests per second
+* 每条推特的大小:
+ * `tweet_id` - 8 字节
+ * `user_id` - 32 字节
+ * `text` - 140 字节
+ * `media` - 平均 10 KB
+ * 总计: 大约 10 KB
+* 每月产生新推特的内容为 150 TB
+ * 每条推特 10 KB * 每天 5 亿条推特 * 每月 30 天
+ * 3 年产生新推特的内容为 5.4 PB
+* 每秒需要处理 10 万次读取请求
+ * 每个月需要处理 2500 亿次请求 * (每秒 400 次请求 / 每月 10 亿次请求)
+* 每秒发布 6000 条推特
+ * 每月发布 150 亿条推特 * (每秒 400 次请求 / 每月 10 亿次请求)
+* 每秒推送 6 万条推特
+ * 每月推送 1500 亿条推特 * (每秒 400 次请求 / 每月 10 亿次请求)
+* 每秒 4000 次搜索请求
-Handy conversion guide:
+便利换算指南:
-* 2.5 million seconds per month
-* 1 request per second = 2.5 million requests per month
-* 40 requests per second = 100 million requests per month
-* 400 requests per second = 1 billion requests per month
+* 每个月有 250 万秒
+* 每秒一个请求 = 每个月 250 万次请求
+* 每秒 40 个请求 = 每个月 1 亿次请求
+* 每秒 400 个请求 = 每个月 10 亿次请求
-## Step 2: Create a high level design
+## 第二步:概要设计
-> Outline a high level design with all important components.
+> 列出所有重要组件以规划概要设计。
![Imgur](http://i.imgur.com/48tEA2j.png)
-## Step 3: Design core components
+## 第三步:设计核心组件
-> Dive into details for each core component.
+> 深入每个核心组件的细节。
-### Use case: User posts a tweet
+### 用例:用户发表了一篇推特
-We could store the user's own tweets to populate the user timeline (activity from the user) in a [relational database](https://github.com/donnemartin/system-design-primer#relational-database-management-system-rdbms). We should discuss the [use cases and tradeoffs between choosing SQL or NoSQL](https://github.com/donnemartin/system-design-primer#sql-or-nosql).
+我们可以将用户自己发表的推特存储在[关系数据库](https://github.com/donnemartin/system-design-primer#relational-database-management-system-rdbms)中。我们也应当讨论一下[选用 SQL 还是 NoSQL](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#sql-还是-nosql) 各自的用例与权衡。
-Delivering tweets and building the home timeline (activity from people the user is following) is trickier. Fanning out tweets to all followers (60 thousand tweets delivered on fanout per second) will overload a traditional [relational database](https://github.com/donnemartin/system-design-primer#relational-database-management-system-rdbms). We'll probably want to choose a data store with fast writes such as a **NoSQL database** or **Memory Cache**. Reading 1 MB sequentially from memory takes about 250 microseconds, while reading from SSD takes 4x and from disk takes 80x longer.1
+构建用户主页时间轴(查看关注用户的活动)以及推送推特是件麻烦事。将推特传播给所有关注者(每秒约递送 6 万条推特)这一操作有可能会使传统的[关系数据库](https://github.com/donnemartin/system-design-primer#relational-database-management-system-rdbms)超负载。因此,我们可能需要选择 **NoSQL 数据库** 或 **内存缓存** 这类写入更快的数据存储。从内存读取 1 MB 连续数据大约要花 250 微秒,而从 SSD 读取同样大小的数据要花费 4 倍的时间,从机械硬盘读取则要花费 80 倍的时间。1
-We could store media such as photos or videos on an **Object Store**.
+我们可以将照片、视频之类的媒体存储于**对象存储**中。
-* The **Client** posts a tweet to the **Web Server**, running as a [reverse proxy](https://github.com/donnemartin/system-design-primer#reverse-proxy-web-server)
-* The **Web Server** forwards the request to the **Write API** server
-* The **Write API** stores the tweet in the user's timeline on a **SQL database**
-* The **Write API** contacts the **Fan Out Service**, which does the following:
- * Queries the **User Graph Service** to find the user's followers stored in the **Memory Cache**
- * Stores the tweet in the *home timeline of the user's followers* in a **Memory Cache**
- * O(n) operation: 1,000 followers = 1,000 lookups and inserts
- * Stores the tweet in the **Search Index Service** to enable fast searching
- * Stores media in the **Object Store**
- * Uses the **Notification Service** to send out push notifications to followers:
- * Uses a **Queue** (not pictured) to asynchronously send out notifications
+* **客户端**向应用[反向代理](https://github.com/donnemartin/system-design-primer#reverse-proxy-web-server)的**Web 服务器**发送一条推特
+* **Web 服务器**将请求转发给**写 API**服务器
+* **写 API**服务器将推特使用 **SQL 数据库**存储于用户时间轴中
+* **写 API**调用**消息输出服务**,进行以下操作:
+ * 查询**用户图服务**,找到存储于**内存缓存**中的此用户的粉丝
+ * 将推特存储于**内存缓存**中的**此用户的粉丝的主页时间轴**中
+ * O(n) 复杂度操作: 1000 名粉丝 = 1000 次查找与插入
+ * 将推特存储在**搜索索引服务**中,以加快搜索
+ * 将媒体存储于**对象存储**中
+ * 使用**通知服务**向粉丝发送推送:
+ * 使用**队列**(图中未画出)异步发送推送通知
-**Clarify with your interviewer how much code you are expected to write**.
+**向你的面试官告知你准备写多少代码**。
-If our **Memory Cache** is Redis, we could use a native Redis list with the following structure:
+如果我们用 Redis 作为**内存缓存**,那可以用 Redis 原生的 list 作为其数据结构。结构如下:
```
tweet n+2 tweet n+1 tweet n
@@ -128,9 +128,9 @@ If our **Memory Cache** is Redis, we could use a native Redis list with the foll
| tweet_id user_id meta | tweet_id user_id meta | tweet_id user_id meta |
```
-The new tweet would be placed in the **Memory Cache**, which populates user's home timeline (activity from people the user is following).
+新发布的推特将被写入**内存缓存**,由它来填充用户的主页时间轴(即用户所关注的人的动态)。
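
下面是按上述思路写的一个极简扇出示意(非原文代码,仅供参考):假设使用 redis-py 客户端,`home_timeline:<user_id>` 的键名与 `MAX_TIMELINE_LENGTH` 的取值都是演示用的假设。

```python
import json

import redis  # 假设:使用 redis-py 客户端

cache = redis.Redis(host='localhost', port=6379)

MAX_TIMELINE_LENGTH = 800  # 假设:每个主页时间轴只保留数百条推特(见后文优化)


def fan_out_tweet(tweet_id, user_id, follower_ids, meta=None):
    """把一条推特写入每个粉丝的主页时间轴,共 O(n) 次插入。"""
    entry = json.dumps({'tweet_id': tweet_id, 'user_id': user_id, 'meta': meta or {}})
    for follower_id in follower_ids:
        key = 'home_timeline:{}'.format(follower_id)  # 键名为演示假设
        pipe = cache.pipeline()
        pipe.lpush(key, entry)                        # 新推特放在列表头部
        pipe.ltrim(key, 0, MAX_TIMELINE_LENGTH - 1)   # 截断,只保留最近若干条
        pipe.execute()
```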
-We'll use a public [**REST API**](https://github.com/donnemartin/system-design-primer#representational-state-transfer-rest):
+我们可以调用一个公共的 [REST API](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#表述性状态转移rest):
```
$ curl -X POST --data '{ "user_id": "123", "auth_token": "ABC123", \
@@ -138,7 +138,7 @@ $ curl -X POST --data '{ "user_id": "123", "auth_token": "ABC123", \
https://twitter.com/api/v1/tweet
```
-Response:
+返回:
```
{
@@ -150,24 +150,24 @@ Response:
}
```
-For internal communications, we could use [Remote Procedure Calls](https://github.com/donnemartin/system-design-primer#remote-procedure-call-rpc).
+而对于服务器内部的通信,我们可以使用 [RPC](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#远程过程调用协议rpc)。
-### Use case: User views the home timeline
+### 用例:用户浏览主页时间轴
-* The **Client** posts a home timeline request to the **Web Server**
-* The **Web Server** forwards the request to the **Read API** server
-* The **Read API** server contacts the **Timeline Service**, which does the following:
- * Gets the timeline data stored in the **Memory Cache**, containing tweet ids and user ids - O(1)
- * Queries the **Tweet Info Service** with a [multiget](http://redis.io/commands/mget) to obtain additional info about the tweet ids - O(n)
- * Queries the **User Info Service** with a multiget to obtain additional info about the user ids - O(n)
+* **客户端**向 **Web 服务器**发起一次读取主页时间轴的请求
+* **Web 服务器**将请求转发给**读取 API**服务器
+* **读取 API**服务器调用**时间轴服务**进行以下操作:
+ * 从**内存缓存**读取时间轴数据,其中包括推特 id 与用户 id - O(1)
+ * 通过 [multiget](http://redis.io/commands/mget) 向**推特信息服务**进行查询,以获取相关 id 推特的额外信息 - O(n)
+ * 通过 multiget 向**用户信息服务**进行查询,以获取相关 id 用户的额外信息 - O(n)(见下方示意代码)
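
下面是读取主页时间轴的一个极简示意(非原文代码):`cache`、`tweet_info_service`、`user_info_service` 及其 `multiget` 接口均为演示用的假设对象。

```python
import json


def get_home_timeline(cache, tweet_info_service, user_info_service, user_id, count=40):
    """O(1) 地从缓存取出时间轴条目,再用 multiget 批量补全推特与用户信息(各为 O(n))。"""
    key = 'home_timeline:{}'.format(user_id)  # 键名为演示假设
    entries = [json.loads(e) for e in cache.lrange(key, 0, count - 1)]
    tweets = tweet_info_service.multiget([e['tweet_id'] for e in entries])  # 假设接口
    users = user_info_service.multiget([e['user_id'] for e in entries])     # 假设接口
    return [{'tweet': t, 'user': u} for t, u in zip(tweets, users)]
```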
-REST API:
+REST API:
```
$ curl https://twitter.com/api/v1/home_timeline?user_id=123
```
-Response:
+返回:
```
{
@@ -187,146 +187,145 @@ Response:
},
```
-### Use case: User views the user timeline
+### 用例:用户浏览用户时间轴
-* The **Client** posts a user timeline request to the **Web Server**
-* The **Web Server** forwards the request to the **Read API** server
-* The **Read API** retrieves the user timeline from the **SQL Database**
+* **客户端**向 **Web 服务器**发起获取用户时间轴的请求
+* **Web 服务器**将请求转发给**读取 API**服务器
+* **读取 API**从 **SQL 数据库**中取出用户的时间轴
-The REST API would be similar to the home timeline, except all tweets would come from the user as opposed to the people the user is following.
+REST API 与前面的主页时间轴类似,区别只在于取出的推特是由用户自己发送而不是关注人发送。
-### Use case: User searches keywords
+### 用例:用户搜索关键词
-* The **Client** sends a search request to the **Web Server**
-* The **Web Server** forwards the request to the **Search API** server
-* The **Search API** contacts the **Search Service**, which does the following:
- * Parses/tokenizes the input query, determining what needs to be searched
- * Removes markup
- * Breaks up the text into terms
- * Fixes typos
- * Normalizes capitalization
- * Converts the query to use boolean operations
- * Queries the **Search Cluster** (ie [Lucene](https://lucene.apache.org/)) for the results:
- * [Scatter gathers](https://github.com/donnemartin/system-design-primer#under-development) each server in the cluster to determine if there are any results for the query
- * Merges, ranks, sorts, and returns the results
+* **客户端**将搜索请求发给**Web 服务器**
+* **Web 服务器**将请求转发给**搜索 API**服务器
+* **搜索 API**调用**搜索服务**进行以下操作:
+ * 解析输入的查询语句并分词,确定需要搜索的内容(见下方示意代码)
+ * 移除标记(markup)
+ * 将文本拆分为词项
+ * 修正拼写错误
+ * 规范字母大小写
+ * 将查询转换为布尔操作
+ * 查询**搜索集群**(例如 [Lucene](https://lucene.apache.org/))检索结果:
+ * 以[分散聚合(scatter gather)](https://github.com/donnemartin/system-design-primer#under-development)的方式查询集群内的每台服务器,确定是否有与查询匹配的结果
+ * 合并取到的条目,进行评分与排序,最终返回结果
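
以下是查询预处理部分的一个极简示意(非原文代码):只演示去除标记、分词、统一大小写和转成布尔查询,拼写纠错从略。

```python
import re


def normalize_query(raw_query):
    """一个极简的查询预处理示意:去标记、分词、统一大小写并转为布尔 AND 查询。"""
    text = re.sub(r'<[^>]+>', ' ', raw_query)       # 移除简单的 HTML 标记
    text = re.sub(r'[^\w\s]', ' ', text)            # 去掉其余符号
    terms = [t.lower() for t in text.split() if t]  # 分词并统一为小写
    # 拼写纠错在此省略,实际中可接入词典或编辑距离算法
    return ' AND '.join(terms)                      # 转换为布尔查询


print(normalize_query('Hello, <b>World</b>!'))  # 输出:hello AND world
```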
-REST API:
+REST API:
```
$ curl https://twitter.com/api/v1/search?query=hello+world
```
-The response would be similar to that of the home timeline, except for tweets matching the given query.
+返回结果与前面的主页时间轴类似,只不过返回的是符合查询条件的推特。
-## Step 4: Scale the design
+## 第四步:架构扩展
-> Identify and address bottlenecks, given the constraints.
+> 根据限制条件,找到并解决瓶颈。
-![Imgur](http://i.imgur.com/jrUBAF7.png)
+![Imgur](http://i.imgur.com/MzExP06.png)
-**Important: Do not simply jump right into the final design from the initial design!**
+**重要提示:不要从最初设计直接跳到最终设计中!**
-State you would 1) **Benchmark/Load Test**, 2) **Profile** for bottlenecks 3) address bottlenecks while evaluating alternatives and trade-offs, and 4) repeat. See [Design a system that scales to millions of users on AWS](../scaling_aws/README.md) as a sample on how to iteratively scale the initial design.
+现在你要 1) **基准测试、负载测试**。2) **分析、描述**性能瓶颈。3) 在解决瓶颈问题的同时,评估替代方案、权衡利弊。4) 重复以上步骤。请阅读[「设计一个系统,并将其扩大到为数以百万计的 AWS 用户服务」](../scaling_aws/README.md) 来了解如何逐步扩大初始设计。
-It's important to discuss what bottlenecks you might encounter with the initial design and how you might address each of them. For example, what issues are addressed by adding a **Load Balancer** with multiple **Web Servers**? **CDN**? **Master-Slave Replicas**? What are the alternatives and **Trade-Offs** for each?
+讨论初始设计可能遇到的瓶颈及相关解决方案是很重要的。例如加上一个配置多台 **Web 服务器**的**负载均衡器**是否能够解决问题?**CDN**呢?**主从复制**呢?它们各自的替代方案和需要**权衡**的利弊又有什么呢?
-We'll introduce some components to complete the design and to address scalability issues. Internal load balancers are not shown to reduce clutter.
+我们将会介绍一些组件来完成设计,并解决架构的扩展性问题。为保持图表简洁,图中并未画出内部负载均衡器。
-*To avoid repeating discussions*, refer to the following [system design topics](https://github.com/donnemartin/system-design-primer#index-of-system-design-topics) for main talking points, tradeoffs, and alternatives:
+**为了避免重复讨论**,请参考[系统设计主题索引](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#系统设计主题的索引)相关部分来了解其要点、方案的权衡取舍以及可选的替代方案。
-* [DNS](https://github.com/donnemartin/system-design-primer#domain-name-system)
-* [CDN](https://github.com/donnemartin/system-design-primer#content-delivery-network)
-* [Load balancer](https://github.com/donnemartin/system-design-primer#load-balancer)
-* [Horizontal scaling](https://github.com/donnemartin/system-design-primer#horizontal-scaling)
-* [Web server (reverse proxy)](https://github.com/donnemartin/system-design-primer#reverse-proxy-web-server)
-* [API server (application layer)](https://github.com/donnemartin/system-design-primer#application-layer)
-* [Cache](https://github.com/donnemartin/system-design-primer#cache)
-* [Relational database management system (RDBMS)](https://github.com/donnemartin/system-design-primer#relational-database-management-system-rdbms)
-* [SQL write master-slave failover](https://github.com/donnemartin/system-design-primer#fail-over)
-* [Master-slave replication](https://github.com/donnemartin/system-design-primer#master-slave-replication)
-* [Consistency patterns](https://github.com/donnemartin/system-design-primer#consistency-patterns)
-* [Availability patterns](https://github.com/donnemartin/system-design-primer#availability-patterns)
+* [DNS](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#域名系统)
+* [CDN](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#内容分发网络cdn)
+* [负载均衡器](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#负载均衡器)
+* [水平扩展](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#水平扩展)
+* [反向代理(web 服务器)](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#反向代理web-服务器)
+* [API 服务(应用层)](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#应用层)
+* [缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#缓存)
+* [关系型数据库管理系统 (RDBMS)](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#关系型数据库管理系统rdbms)
+* [SQL 故障主从切换](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#故障切换)
+* [主从复制](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#主从复制)
+* [一致性模式](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#一致性模式)
+* [可用性模式](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#可用性模式)
-The **Fanout Service** is a potential bottleneck. Twitter users with millions of followers could take several minutes to have their tweets go through the fanout process. This could lead to race conditions with @replies to the tweet, which we could mitigate by re-ordering the tweets at serve time.
+**消息输出服务**有可能成为性能瓶颈。拥有数百万关注者的用户,其推特可能需要好几分钟才能完成整个消息输出(扇出)过程,这可能导致针对该推特的 @回复 出现竞争条件;我们可以在读取(serve)时对推特重新排序来缓解这一问题。
-We could also avoid fanning out tweets from highly-followed users. Instead, we could search to find tweets for highly-followed users, merge the search results with the user's home timeline results, then re-order the tweets at serve time.
+我们还可以避免对高关注量用户的推特做扇出。取而代之,我们可以通过搜索找到这些高关注量用户的推特,将搜索结果与该用户的主页时间轴合并,再在读取时对推特重新排序。
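
读取时合并的一个极简示意(非原文代码):假设每条推特都是带 `created_at` 字段的字典,`precomputed` 来自扇出写入的缓存,`celebrity_tweets` 来自对高关注量用户的即时搜索。

```python
def merge_home_timeline(precomputed, celebrity_tweets):
    """在读取(serve)时合并两部分推特,并按时间倒序返回。"""
    merged = list(precomputed) + list(celebrity_tweets)
    return sorted(merged, key=lambda tweet: tweet['created_at'], reverse=True)


# 用法示意
timeline = merge_home_timeline(
    [{'tweet_id': 1, 'created_at': 100}],
    [{'tweet_id': 2, 'created_at': 200}])
# 结果:tweet_id 为 2 的推特排在最前
```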
-Additional optimizations include:
+此外,还可以通过以下内容进行优化:
-* Keep only several hundred tweets for each home timeline in the **Memory Cache**
-* Keep only active users' home timeline info in the **Memory Cache**
- * If a user was not previously active in the past 30 days, we could rebuild the timeline from the **SQL Database**
- * Query the **User Graph Service** to determine who the user is following
- * Get the tweets from the **SQL Database** and add them to the **Memory Cache**
-* Store only a month of tweets in the **Tweet Info Service**
-* Store only active users in the **User Info Service**
-* The **Search Cluster** would likely need to keep the tweets in memory to keep latency low
+* 仅为每个主页时间轴在**内存缓存**中存储数百条推特
+* 仅在**内存缓存**中存储活动用户的主页时间轴
+ * 如果某个用户在过去 30 天都没有产生活动,那我们可以使用 **SQL 数据库**重新构建他的时间轴
+ * 使用**用户 图 服务**来查询并确定用户关注的人
+ * 从 **SQL 数据库**中取出推特,并将它们存入**内存缓存**
+* 仅在**推特信息服务**中存储一个月的推特
+* 仅在**用户信息服务**中存储活动用户的信息
+* **搜索集群**很可能需要将推特保留在内存中,以降低延迟
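
为 30 天内无活动的用户重建时间轴的一个极简示意(非原文代码):`user_graph_service.get_following`、`tweets` 表结构以及 `home_timeline:<user_id>` 键名均为演示用的假设。

```python
import json
import sqlite3  # 假设:用 sqlite3 代替生产环境的 SQL 数据库,仅作演示


def rebuild_home_timeline(cache, user_graph_service, user_id, db_path='tweets.db'):
    """先查用户图服务取得关注列表,再从 SQL 数据库读出推特并写回内存缓存。"""
    following_ids = user_graph_service.get_following(user_id)  # 假设的服务接口
    if not following_ids:
        return
    conn = sqlite3.connect(db_path)
    placeholders = ','.join('?' * len(following_ids))
    rows = conn.execute(
        'SELECT id, user_id, created_at FROM tweets '
        'WHERE user_id IN ({}) ORDER BY created_at DESC LIMIT 800'.format(placeholders),
        following_ids).fetchall()
    conn.close()
    key = 'home_timeline:{}'.format(user_id)
    for tweet_id, author_id, created_at in rows:
        cache.rpush(key, json.dumps(
            {'tweet_id': tweet_id, 'user_id': author_id, 'created_at': created_at}))
```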
-We'll also want to address the bottleneck with the **SQL Database**.
+我们还可以考虑优化 **SQL 数据库** 来解决一些瓶颈问题。
-Although the **Memory Cache** should reduce the load on the database, it is unlikely the **SQL Read Replicas** alone would be enough to handle the cache misses. We'll probably need to employ additional SQL scaling patterns.
+虽然**内存缓存**可以减轻数据库的负载,但仅靠 **SQL 读取副本**不太可能足以应对所有缓存未命中的请求,我们很可能还需要采用其他的 SQL 扩展模式。
-The high volume of writes would overwhelm a single **SQL Write Master-Slave**, also pointing to a need for additional scaling techniques.
+大量的写入会压垮单一的 **SQL 写主从**结构,这同样说明需要进一步的扩展技术。
-* [Federation](https://github.com/donnemartin/system-design-primer#federation)
-* [Sharding](https://github.com/donnemartin/system-design-primer#sharding)
-* [Denormalization](https://github.com/donnemartin/system-design-primer#denormalization)
-* [SQL Tuning](https://github.com/donnemartin/system-design-primer#sql-tuning)
+* [联合](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#联合)
+* [分片](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#分片)
+* [非规范化](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#非规范化)
+* [SQL 调优](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#sql-调优)
-We should also consider moving some data to a **NoSQL Database**.
+我们也可以考虑将一些数据移至 **NoSQL 数据库**。
-## Additional talking points
+## 其它要点
-> Additional topics to dive into, depending on the problem scope and time remaining.
+> 是否深入这些额外的主题,取决于你的问题范围和剩下的时间。
#### NoSQL
-* [Key-value store](https://github.com/donnemartin/system-design-primer#key-value-store)
-* [Document store](https://github.com/donnemartin/system-design-primer#document-store)
-* [Wide column store](https://github.com/donnemartin/system-design-primer#wide-column-store)
-* [Graph database](https://github.com/donnemartin/system-design-primer#graph-database)
-* [SQL vs NoSQL](https://github.com/donnemartin/system-design-primer#sql-or-nosql)
+* [键-值存储](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#键-值存储)
+* [文档类型存储](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#文档类型存储)
+* [列型存储](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#列型存储)
+* [图数据库](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#图数据库)
+* [SQL vs NoSQL](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#sql-还是-nosql)
-### Caching
+### 缓存
-* Where to cache
- * [Client caching](https://github.com/donnemartin/system-design-primer#client-caching)
- * [CDN caching](https://github.com/donnemartin/system-design-primer#cdn-caching)
- * [Web server caching](https://github.com/donnemartin/system-design-primer#web-server-caching)
- * [Database caching](https://github.com/donnemartin/system-design-primer#database-caching)
- * [Application caching](https://github.com/donnemartin/system-design-primer#application-caching)
-* What to cache
- * [Caching at the database query level](https://github.com/donnemartin/system-design-primer#caching-at-the-database-query-level)
- * [Caching at the object level](https://github.com/donnemartin/system-design-primer#caching-at-the-object-level)
-* When to update the cache
- * [Cache-aside](https://github.com/donnemartin/system-design-primer#cache-aside)
- * [Write-through](https://github.com/donnemartin/system-design-primer#write-through)
- * [Write-behind (write-back)](https://github.com/donnemartin/system-design-primer#write-behind-write-back)
- * [Refresh ahead](https://github.com/donnemartin/system-design-primer#refresh-ahead)
+* 在哪缓存
+ * [客户端缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#客户端缓存)
+ * [CDN 缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#cdn-缓存)
+ * [Web 服务器缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#web-服务器缓存)
+ * [数据库缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#数据库缓存)
+ * [应用缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#应用缓存)
+* 什么需要缓存
+ * [数据库查询级别的缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#数据库查询级别的缓存)
+ * [对象级别的缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#对象级别的缓存)
+* 何时更新缓存
+ * [缓存模式](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#缓存模式)
+ * [直写模式](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#直写模式)
+ * [回写模式](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#回写模式)
+ * [刷新](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#刷新)
-### Asynchronism and microservices
+### 异步与微服务
-* [Message queues](https://github.com/donnemartin/system-design-primer#message-queues)
-* [Task queues](https://github.com/donnemartin/system-design-primer#task-queues)
-* [Back pressure](https://github.com/donnemartin/system-design-primer#back-pressure)
-* [Microservices](https://github.com/donnemartin/system-design-primer#microservices)
+* [消息队列](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#消息队列)
+* [任务队列](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#任务队列)
+* [背压](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#背压)
+* [微服务](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#微服务)
-### Communications
+### 通信
-* Discuss tradeoffs:
- * External communication with clients - [HTTP APIs following REST](https://github.com/donnemartin/system-design-primer#representational-state-transfer-rest)
- * Internal communications - [RPC](https://github.com/donnemartin/system-design-primer#remote-procedure-call-rpc)
-* [Service discovery](https://github.com/donnemartin/system-design-primer#service-discovery)
+* 可权衡选择的方案:
+ * 与客户端的外部通信 - [使用 REST 作为 HTTP API](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#表述性状态转移rest)
+ * 服务器内部通信 - [RPC](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#远程过程调用协议rpc)
+* [服务发现](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#服务发现)
-### Security
+### 安全性
-Refer to the [security section](https://github.com/donnemartin/system-design-primer#security).
+请参阅[「安全」](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#安全)一章。
-### Latency numbers
+### 延迟数值
-See [Latency numbers every programmer should know](https://github.com/donnemartin/system-design-primer#latency-numbers-every-programmer-should-know).
+请参阅[「每个程序员都应该知道的延迟数」](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#每个程序员都应该知道的延迟数)。
-### Ongoing
+### 持续探讨
-* Continue benchmarking and monitoring your system to address bottlenecks as they come up
-* Scaling is an iterative process
+* 持续对系统进行基准测试与监控,在瓶颈出现时及时加以解决。
+* 架构扩展是一个迭代的过程。
diff --git a/solutions/system_design/web_crawler/README.md b/solutions/system_design/web_crawler/README.md
index d95dc107..2ad0938e 100644
--- a/solutions/system_design/web_crawler/README.md
+++ b/solutions/system_design/web_crawler/README.md
@@ -1,104 +1,102 @@
-# Design a web crawler
+# 设计一个网页爬虫
-*Note: This document links directly to relevant areas found in the [system design topics](https://github.com/donnemartin/system-design-primer#index-of-system-design-topics) to avoid duplication. Refer to the linked content for general talking points, tradeoffs, and alternatives.*
+**注意:这个文档中的链接会直接指向[系统设计主题索引](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#系统设计主题的索引)中的有关部分,以避免重复的内容。你可以参考链接的相关内容,来了解其总的要点、方案的权衡取舍以及可选的替代方案。**
-## Step 1: Outline use cases and constraints
+## 第一步:简述用例与约束条件
-> Gather requirements and scope the problem.
-> Ask questions to clarify use cases and constraints.
-> Discuss assumptions.
+> 收集这个问题的需求与范围。不断提问,以明确用例和约束条件。讨论你所做的假设。
-Without an interviewer to address clarifying questions, we'll define some use cases and constraints.
+我们将在没有面试官明确说明问题的情况下,自己定义一些用例以及限制条件。
-### Use cases
+### 用例
-#### We'll scope the problem to handle only the following use cases
+#### 我们把问题限定在仅处理以下用例的范围中
-* **Service** crawls a list of urls:
- * Generates reverse index of words to pages containing the search terms
- * Generates titles and snippets for pages
- * Title and snippets are static, they do not change based on search query
-* **User** inputs a search term and sees a list of relevant pages with titles and snippets the crawler generated
- * Only sketch high level components and interactions for this use case, no need to go into depth
-* **Service** has high availability
+* **服务** 抓取一系列链接:
+ * 生成包含搜索词的网页倒排索引
+ * 生成页面的标题和摘要信息
+ * 页面标题和摘要都是静态的,它们不会根据搜索词改变
+* **用户** 输入搜索词后,可以看到相关的搜索结果列表,列表每一项都包含由网页爬虫生成的页面标题及摘要
+ * 只给该用例绘制出概要组件和交互说明,无需讨论细节
+* **服务** 具有高可用性
-#### Out of scope
+#### 无需考虑
-* Search analytics
-* Personalized search results
-* Page rank
+* 搜索分析
+* 个性化搜索结果
+* 页面排名
-### Constraints and assumptions
+### 限制条件与假设
-#### State assumptions
+#### 提出假设
-* Traffic is not evenly distributed
- * Some searches are very popular, while others are only executed once
-* Support only anonymous users
-* Generating search results should be fast
-* The web crawler should not get stuck in an infinite loop
- * We get stuck in an infinite loop if the graph contains a cycle
-* 1 billion links to crawl
- * Pages need to be crawled regularly to ensure freshness
- * Average refresh rate of about once per week, more frequent for popular sites
- * 4 billion links crawled each month
- * Average stored size per web page: 500 KB
- * For simplicity, count changes the same as new pages
-* 100 billion searches per month
+* 搜索流量分布不均
+ * 有些搜索词非常热门,有些则非常冷门
+* 只支持匿名用户
+* 生成搜索结果应当足够快
+* 网页爬虫不应该陷入死循环
+ * 当链接构成的图中存在环时,爬虫就会陷入死循环
+* 抓取 10 亿个链接
+ * 要定期重新抓取页面以确保新鲜度
+ * 平均每周重新抓取一次,网站越热门,那么重新抓取的频率越高
+ * 每月抓取 40 亿个链接
+ * 每个页面的平均存储大小:500 KB
+ * 简单起见,重新抓取的页面算作新页面
+* 每月搜索量 1000 亿次
-Exercise the use of more traditional systems - don't use existing systems such as [solr](http://lucene.apache.org/solr/) or [nutch](http://nutch.apache.org/).
+用更传统的系统来练习 —— 不要使用 [solr](http://lucene.apache.org/solr/) 、[nutch](http://nutch.apache.org/) 之类的现成系统。
-#### Calculate usage
+#### 计算用量
-**Clarify with your interviewer if you should run back-of-the-envelope usage calculations.**
+**向面试官确认是否需要进行粗略的用量估算。**
-* 2 PB of stored page content per month
- * 500 KB per page * 4 billion links crawled per month
- * 72 PB of stored page content in 3 years
-* 1,600 write requests per second
-* 40,000 search requests per second
+* 每月存储 2 PB 页面
+ * 每月抓取 40 亿个页面,每个页面 500 KB
+ * 三年存储 72 PB 页面
+* 每秒 1600 次写请求
+* 每秒 40000 次搜索请求
-Handy conversion guide:
+简便换算指南:
-* 2.5 million seconds per month
-* 1 request per second = 2.5 million requests per month
-* 40 requests per second = 100 million requests per month
-* 400 requests per second = 1 billion requests per month
+* 一个月有 250 万秒
+* 每秒 1 个请求,即每月 250 万个请求
+* 每秒 40 个请求,即每月 1 亿个请求
+* 每秒 400 个请求,即每月 10 亿个请求
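
按照上面的假设,可以用几行代码验证这些估算(非原文内容,单位换算按 1 PB = 10^12 KB 处理):

```python
# 粗略估算(按文中假设)
links_per_month = 4 * 10**9        # 每月抓取 40 亿个链接
page_size_kb = 500                  # 每个页面平均 500 KB
seconds_per_month = 2.5 * 10**6     # 一个月约 250 万秒

storage_pb_per_month = links_per_month * page_size_kb / 10**12  # KB -> PB
write_qps = links_per_month / seconds_per_month
search_qps = 100 * 10**9 / seconds_per_month                     # 每月 1000 亿次搜索

print(storage_pb_per_month)  # 2.0,即每月约 2 PB
print(write_qps)             # 1600,每秒约 1600 次写请求
print(search_qps)            # 40000,每秒约 4 万次搜索请求
```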
-## Step 2: Create a high level design
+## 第二步:概要设计
-> Outline a high level design with all important components.
+> 列出所有重要组件以规划概要设计。
![Imgur](http://i.imgur.com/xjdAAUv.png)
-## Step 3: Design core components
+## 第三步:设计核心组件
-> Dive into details for each core component.
+> 对每一个核心组件进行详细深入的分析。
-### Use case: Service crawls a list of urls
+### 用例:爬虫服务抓取一系列网页
-We'll assume we have an initial list of `links_to_crawl` ranked initially based on overall site popularity. If this is not a reasonable assumption, we can seed the crawler with popular sites that link to outside content such as [Yahoo](https://www.yahoo.com/), [DMOZ](http://www.dmoz.org/), etc
+假设我们有一个初始列表 `links_to_crawl`(待抓取链接),它最初基于网站整体的知名度来排序。当然如果这个假设不合理,我们可以使用 [Yahoo](https://www.yahoo.com/)、[DMOZ](http://www.dmoz.org/) 等知名门户网站作为种子链接来进行扩散。
-We'll use a table `crawled_links` to store processed links and their page signatures.
+我们将用表 `crawled_links`(已抓取链接)来记录已经处理过的链接以及相应的页面签名。
-We could store `links_to_crawl` and `crawled_links` in a key-value **NoSQL Database**. For the ranked links in `links_to_crawl`, we could use [Redis](https://redis.io/) with sorted sets to maintain a ranking of page links. We should discuss the [use cases and tradeoffs between choosing SQL or NoSQL](https://github.com/donnemartin/system-design-primer#sql-or-nosql).
+我们可以将 `links_to_crawl` 和 `crawled_links` 记录在键-值型 **NoSQL 数据库**中。对于 `links_to_crawl` 中按排名排序的链接,我们可以使用 [Redis](https://redis.io/) 的有序集合来维护网页链接的排名。我们应当讨论[选择 SQL 还是 NoSQL 的用例与利弊权衡](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#sql-还是-nosql)。
-* The **Crawler Service** processes each page link by doing the following in a loop:
- * Takes the top ranked page link to crawl
- * Checks `crawled_links` in the **NoSQL Database** for an entry with a similar page signature
- * If we have a similar page, reduces the priority of the page link
- * This prevents us from getting into a cycle
- * Continue
- * Else, crawls the link
- * Adds a job to the **Reverse Index Service** queue to generate a [reverse index](https://en.wikipedia.org/wiki/Search_engine_indexing)
- * Adds a job to the **Document Service** queue to generate a static title and snippet
- * Generates the page signature
- * Removes the link from `links_to_crawl` in the **NoSQL Database**
- * Inserts the page link and signature to `crawled_links` in the **NoSQL Database**
+* **爬虫服务**按照以下流程循环处理每一个页面链接:
+ * 选取排名最靠前的待抓取链接
+ * 在 **NoSQL 数据库**的 `crawled_links` 中,检查待抓取页面的签名是否与某个已抓取页面的签名相似
+ * 若存在,则降低该页面链接的优先级
+ * 这样做可以避免陷入死循环
+ * 继续(进入下一次循环)
+ * 若不存在,则抓取该链接
+ * 在**倒排索引服务**任务队列中,新增一个生成[倒排索引](https://en.wikipedia.org/wiki/Search_engine_indexing)任务。
+ * 在**文档服务**任务队列中,新增一个生成静态标题和摘要的任务。
+ * 生成页面签名
+ * 在 **NoSQL 数据库**的 `links_to_crawl` 中删除该链接
+ * 在 **NoSQL 数据库**的 `crawled_links` 中插入该链接以及页面签名
-**Clarify with your interviewer how much code you are expected to write**.
+**向面试官了解你需要写多少代码**。
-`PagesDataStore` is an abstraction within the **Crawler Service** that uses the **NoSQL Database**:
+`PagesDataStore` 是**爬虫服务**中的一个抽象类,它使用 **NoSQL 数据库**进行存储。
```python
class PagesDataStore(object):
@@ -108,31 +106,31 @@ class PagesDataStore(object):
...
def add_link_to_crawl(self, url):
- """Add the given link to `links_to_crawl`."""
+ """将指定链接加入 `links_to_crawl`。"""
...
def remove_link_to_crawl(self, url):
- """Remove the given link from `links_to_crawl`."""
+ """从 `links_to_crawl` 中删除指定链接。"""
...
def reduce_priority_link_to_crawl(self, url):
- """Reduce the priority of a link in `links_to_crawl` to avoid cycles."""
+ """在 `links_to_crawl` 中降低一个链接的优先级以避免死循环。"""
...
def extract_max_priority_page(self):
- """Return the highest priority link in `links_to_crawl`."""
+ """返回 `links_to_crawl` 中优先级最高的链接。"""
...
def insert_crawled_link(self, url, signature):
- """Add the given link to `crawled_links`."""
+ """将指定链接加入 `crawled_links`。"""
...
def crawled_similar(self, signature):
- """Determine if we've already crawled a page matching the given signature"""
+ """判断待抓取页面的签名是否与某个已抓取页面的签名相似。"""
...
```
-`Page` is an abstraction within the **Crawler Service** that encapsulates a page, its contents, child urls, and signature:
+`Page` 是**爬虫服务**的一个抽象类,它封装了网页对象,由页面链接、页面内容、子链接和页面签名构成。
```python
class Page(object):
@@ -144,7 +142,7 @@ class Page(object):
self.signature = signature
```
-`Crawler` is the main class within **Crawler Service**, composed of `Page` and `PagesDataStore`.
+`Crawler` 是**爬虫服务**的主类,由`Page` 和 `PagesDataStore` 组成。
```python
class Crawler(object):
@@ -155,7 +153,7 @@ class Crawler(object):
self.doc_index_queue = doc_index_queue
def create_signature(self, page):
- """Create signature based on url and contents."""
+ """基于页面链接与内容生成签名。"""
...
def crawl_page(self, page):
@@ -176,16 +174,16 @@ class Crawler(object):
self.crawl_page(page)
```
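
下面按前文列出的循环流程给出主循环的一个示意实现(非原文完整代码):`data_store` 假定实现了上文 `PagesDataStore` 的接口,`reverse_index_queue.generate`、`doc_index_queue.generate` 与 `create_signature` 均为演示用的假设。

```python
def crawl(data_store, reverse_index_queue, doc_index_queue, create_signature):
    """按优先级循环处理 links_to_crawl 中的链接(示意)。"""
    while True:
        page = data_store.extract_max_priority_page()  # 取排名最高的待抓取链接
        if page is None:
            break
        page.signature = create_signature(page)
        if data_store.crawled_similar(page.signature):
            # 已抓取过相似页面:降低该链接的优先级,避免陷入环
            data_store.reduce_priority_link_to_crawl(page.url)
            continue
        reverse_index_queue.generate(page)  # 新增生成倒排索引的任务(假设接口)
        doc_index_queue.generate(page)      # 新增生成标题与摘要的任务(假设接口)
        data_store.remove_link_to_crawl(page.url)
        data_store.insert_crawled_link(page.url, page.signature)
```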
-### Handling duplicates
+### 处理重复内容
-We need to be careful the web crawler doesn't get stuck in an infinite loop, which happens when the graph contains a cycle.
+我们要谨防网页爬虫陷入死循环,这通常会发生在爬虫路径中存在环的情况。
-**Clarify with your interviewer how much code you are expected to write**.
+**向面试官了解你需要写多少代码**。
-We'll want to remove duplicate urls:
+删除重复链接:
-* For smaller lists we could use something like `sort | unique`
-* With 1 billion links to crawl, we could use **MapReduce** to output only entries that have a frequency of 1
+* 假设数据量较小,我们可以用类似于 `sort | unique` 的方法。(译注: 先排序,后去重)
+* 假设有 10 亿条数据,我们应该使用 **MapReduce** 来输出只出现 1 次的记录。
```python
class RemoveDuplicateUrls(MRJob):
@@ -199,38 +197,38 @@ class RemoveDuplicateUrls(MRJob):
yield key, total
```
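
上面的片段只显示了部分代码;基于 mrjob 的一个完整示意(按原文思路,仅输出出现次数为 1 的链接)大致如下,输入假定为每行一个 url 的文本文件:

```python
from mrjob.job import MRJob


class RemoveDuplicateUrls(MRJob):
    """MapReduce 示意:只输出出现次数为 1 的链接,从而过滤重复 url。"""

    def mapper(self, _, line):
        yield line.strip(), 1  # 每个 url 计数 1

    def reducer(self, key, values):
        total = sum(values)
        if total == 1:         # 只保留出现一次的 url
            yield key, total


if __name__ == '__main__':
    RemoveDuplicateUrls.run()
```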
-Detecting duplicate content is more complex. We could generate a signature based on the contents of the page and compare those two signatures for similarity. Some potential algorithms are [Jaccard index](https://en.wikipedia.org/wiki/Jaccard_index) and [cosine similarity](https://en.wikipedia.org/wiki/Cosine_similarity).
+比起处理重复内容,检测重复内容更为复杂。我们可以基于网页内容生成签名,然后对比两者签名的相似度。可能会用到的算法有 [Jaccard index](https://en.wikipedia.org/wiki/Jaccard_index) 以及 [cosine similarity](https://en.wikipedia.org/wiki/Cosine_similarity)。
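
下面是用 Jaccard 系数做内容相似度判断的一个极简示意(非原文代码):`shingles` 的分片长度与 0.9 的判重阈值都是演示用的假设。

```python
def shingles(text, k=5):
    """将文本切成长度为 k 个词的片段(shingle)集合。"""
    words = text.split()
    return {' '.join(words[i:i + k]) for i in range(max(1, len(words) - k + 1))}


def jaccard_similarity(text_a, text_b):
    """用 Jaccard 系数粗略衡量两个页面内容的相似度(0 ~ 1)。"""
    a, b = shingles(text_a), shingles(text_b)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


# 相似度高于某个阈值(例如 0.9)时,可认为两个页面内容重复
print(jaccard_similarity('the quick brown fox jumps over the lazy dog',
                         'the quick brown fox jumped over the lazy dog'))
```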
-### Determining when to update the crawl results
+### 抓取结果更新策略
-Pages need to be crawled regularly to ensure freshness. Crawl results could have a `timestamp` field that indicates the last time a page was crawled. After a default time period, say one week, all pages should be refreshed. Frequently updated or more popular sites could be refreshed in shorter intervals.
+要定期重新抓取页面以确保新鲜度。抓取结果应该有个 `timestamp` 字段记录上一次页面抓取时间。每隔一段时间,比如说 1 周,所有页面都需要更新一次。对于热门网站或是内容频繁更新的网站,爬虫抓取间隔可以缩短。
-Although we won't dive into details on analytics, we could do some data mining to determine the mean time before a particular page is updated, and use that statistic to determine how often to re-crawl the page.
+尽管我们不会深入网页数据分析的细节,我们仍然要做一些数据挖掘工作来确定一个页面的平均更新时间,并且根据相关的统计数据来决定爬虫的重新抓取频率。
-We might also choose to support a `Robots.txt` file that gives webmasters control of crawl frequency.
+当然我们也应该根据站长提供的 `Robots.txt` 来控制爬虫的抓取频率。
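
结合 `timestamp` 字段与挖掘得到的平均更新间隔,可以写出一个简单的重抓取判断示意(非原文代码,`mean_update_interval` 假定来自离线统计):

```python
from datetime import datetime, timedelta

DEFAULT_REFRESH = timedelta(weeks=1)  # 默认一周刷新一次


def should_recrawl(last_crawled_at, mean_update_interval=None, now=None):
    """根据上次抓取时间与页面的平均更新间隔,判断是否需要重新抓取。"""
    now = now or datetime.utcnow()
    if mean_update_interval:
        interval = min(DEFAULT_REFRESH, mean_update_interval)  # 更新频繁的页面更早重抓
    else:
        interval = DEFAULT_REFRESH
    return now - last_crawled_at >= interval


print(should_recrawl(datetime.utcnow() - timedelta(days=10)))  # True:超过一周未抓取
```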
-### Use case: User inputs a search term and sees a list of relevant pages with titles and snippets
+### 用例:用户输入搜索词后,可以看到相关的搜索结果列表,列表每一项都包含由网页爬虫生成的页面标题及摘要
-* The **Client** sends a request to the **Web Server**, running as a [reverse proxy](https://github.com/donnemartin/system-design-primer#reverse-proxy-web-server)
-* The **Web Server** forwards the request to the **Query API** server
-* The **Query API** server does the following:
- * Parses the query
- * Removes markup
- * Breaks up the text into terms
- * Fixes typos
- * Normalizes capitalization
- * Converts the query to use boolean operations
- * Uses the **Reverse Index Service** to find documents matching the query
- * The **Reverse Index Service** ranks the matching results and returns the top ones
- * Uses the **Document Service** to return titles and snippets
+* **客户端**向运行[反向代理](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#反向代理web-服务器)的 **Web 服务器**发送一个请求
+* **Web 服务器**将请求转发给**查询 API** 服务器
+* **查询 API** 服务将会做这些事情:
+ * 解析查询参数
+ * 删除 HTML 标记
+ * 将文本分割成词组 (译注: 分词处理)
+ * 修正错别字
+ * 规范化大小写
+ * 将搜索词转换为布尔运算
+ * 使用**倒排索引服务**来查找匹配查询的文档
+ * **倒排索引服务**对匹配到的结果进行排名,然后返回最符合的结果
+ * 使用**文档服务**返回文章标题与摘要
-We'll use a public [**REST API**](https://github.com/donnemartin/system-design-primer#representational-state-transfer-rest):
+我们使用 [**REST API**](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#表述性状态转移rest) 与客户端通信:
```
$ curl https://search.com/api/v1/search?query=hello+world
```
-Response:
+响应内容:
```
{
@@ -250,104 +248,109 @@ Response:
},
```
-For internal communications, we could use [Remote Procedure Calls](https://github.com/donnemartin/system-design-primer#remote-procedure-call-rpc).
+对于服务器内部通信,我们可以使用 [远程过程调用协议(RPC)](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#远程过程调用协议rpc)。
-## Step 4: Scale the design
-> Identify and address bottlenecks, given the constraints.
+## 第四步:架构扩展
+
+> 根据限制条件,找到并解决瓶颈。
![Imgur](http://i.imgur.com/bWxPtQA.png)
-**Important: Do not simply jump right into the final design from the initial design!**
+**重要提示:不要直接从最初设计跳到最终设计!**
-State you would 1) **Benchmark/Load Test**, 2) **Profile** for bottlenecks 3) address bottlenecks while evaluating alternatives and trade-offs, and 4) repeat. See [Design a system that scales to millions of users on AWS](../scaling_aws/README.md) as a sample on how to iteratively scale the initial design.
+现在你要 1) **基准测试、负载测试**。2) **分析、描述**性能瓶颈。3) 在解决瓶颈问题的同时,评估替代方案、权衡利弊。4) 重复以上步骤。请阅读[设计一个系统,并将其扩大到为数以百万计的 AWS 用户服务](../scaling_aws/README.md) 来了解如何逐步扩大初始设计。
-It's important to discuss what bottlenecks you might encounter with the initial design and how you might address each of them. For example, what issues are addressed by adding a **Load Balancer** with multiple **Web Servers**? **CDN**? **Master-Slave Replicas**? What are the alternatives and **Trade-Offs** for each?
+讨论初始设计可能遇到的瓶颈及相关解决方案是很重要的。例如加上一套配备多台 **Web 服务器**的**负载均衡器**是否能够解决问题?**CDN**呢?**主从复制**呢?它们各自的替代方案和需要**权衡**的利弊又有哪些呢?
-We'll introduce some components to complete the design and to address scalability issues. Internal load balancers are not shown to reduce clutter.
+我们将会介绍一些组件来完成设计,并解决架构的扩展性问题。为保持图表简洁,图中并未画出内部负载均衡器。
-*To avoid repeating discussions*, refer to the following [system design topics](https://github.com/donnemartin/system-design-primer#index-of-system-design-topics) for main talking points, tradeoffs, and alternatives:
+**为了避免重复讨论**,请参考[系统设计主题索引](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#系统设计主题的索引)相关部分来了解其要点、方案的权衡取舍以及替代方案。
-* [DNS](https://github.com/donnemartin/system-design-primer#domain-name-system)
-* [Load balancer](https://github.com/donnemartin/system-design-primer#load-balancer)
-* [Horizontal scaling](https://github.com/donnemartin/system-design-primer#horizontal-scaling)
-* [Web server (reverse proxy)](https://github.com/donnemartin/system-design-primer#reverse-proxy-web-server)
-* [API server (application layer)](https://github.com/donnemartin/system-design-primer#application-layer)
-* [Cache](https://github.com/donnemartin/system-design-primer#cache)
-* [NoSQL](https://github.com/donnemartin/system-design-primer#nosql)
-* [Consistency patterns](https://github.com/donnemartin/system-design-primer#consistency-patterns)
-* [Availability patterns](https://github.com/donnemartin/system-design-primer#availability-patterns)
+* [DNS](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#域名系统)
+* [负载均衡器](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#负载均衡器)
+* [水平扩展](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#水平扩展)
+* [Web 服务器(反向代理)](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#反向代理web-服务器)
+* [API 服务器(应用层)](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#应用层)
+* [缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#缓存)
+* [NoSQL](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#nosql)
+* [一致性模式](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#一致性模式)
+* [可用性模式](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#可用性模式)
-Some searches are very popular, while others are only executed once. Popular queries can be served from a **Memory Cache** such as Redis or Memcached to reduce response times and to avoid overloading the **Reverse Index Service** and **Document Service**. The **Memory Cache** is also useful for handling the unevenly distributed traffic and traffic spikes. Reading 1 MB sequentially from memory takes about 250 microseconds, while reading from SSD takes 4x and from disk takes 80x longer.1
+有些搜索词非常热门,有些则非常冷门。热门的搜索词可以通过诸如 Redis 或者 Memcached 之类的**内存缓存**来缩短响应时间,避免**倒排索引服务**以及**文档服务**过载。**内存缓存**同样适用于流量分布不均匀以及流量短时高峰问题。从内存中读取 1 MB 连续数据大约需要 250 微秒,而从 SSD 读取同样大小的数据要花费 4 倍的时间,从机械硬盘读取需要花费 80 倍的时间。1
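
热门查询的缓存可以写成下面这种极简示意(非原文代码):假设使用 redis-py 客户端,`search:<query>` 键名、60 秒 TTL 与 `search_backend` 回调均为演示假设。

```python
import json

import redis  # 假设:使用 redis-py 客户端

cache = redis.Redis()


def cached_search(query, search_backend, ttl_seconds=60):
    """先查内存缓存,未命中时才访问倒排索引/文档服务,并把结果写回缓存。"""
    key = 'search:{}'.format(query)  # 键名为演示假设
    hit = cache.get(key)
    if hit is not None:
        return json.loads(hit)       # 缓存命中,直接返回
    results = search_backend(query)  # 假设的查询函数,内部调用倒排索引与文档服务
    cache.setex(key, ttl_seconds, json.dumps(results))
    return results
```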
-Below are a few other optimizations to the **Crawling Service**:
-* To handle the data size and request load, the **Reverse Index Service** and **Document Service** will likely need to make heavy use sharding and replication.
-* DNS lookup can be a bottleneck, the **Crawler Service** can keep its own DNS lookup that is refreshed periodically
-* The **Crawler Service** can improve performance and reduce memory usage by keeping many open connections at a time, referred to as [connection pooling](https://en.wikipedia.org/wiki/Connection_pool)
- * Switching to [UDP](https://github.com/donnemartin/system-design-primer#user-datagram-protocol-udp) could also boost performance
-* Web crawling is bandwidth intensive, ensure there is enough bandwidth to sustain high throughput
+以下是优化**爬虫服务**的其他建议:
-## Additional talking points
+* 为了处理数据大小问题以及网络请求负载,**倒排索引服务**和**文档服务**可能需要大量应用数据分片和数据复制。
+* DNS 查询可能会成为瓶颈,**爬虫服务**最好专门维护一套定期更新的 DNS 查询服务。
+* 借助于[连接池](https://en.wikipedia.org/wiki/Connection_pool),即同时维持多个开放网络连接,可以提升**爬虫服务**的性能并减少内存使用量。
+ * 改用 [UDP](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#用户数据报协议udp) 协议同样可以提升性能
+* 网络爬虫受带宽影响较大,请确保带宽足够维持高吞吐量。
-> Additional topics to dive into, depending on the problem scope and time remaining.
+## 其它要点
-### SQL scaling patterns
+> 是否深入这些额外的主题,取决于你的问题范围和剩下的时间。
-* [Read replicas](https://github.com/donnemartin/system-design-primer#master-slave-replication)
-* [Federation](https://github.com/donnemartin/system-design-primer#federation)
-* [Sharding](https://github.com/donnemartin/system-design-primer#sharding)
-* [Denormalization](https://github.com/donnemartin/system-design-primer#denormalization)
-* [SQL Tuning](https://github.com/donnemartin/system-design-primer#sql-tuning)
+### SQL 扩展模式
+
+* [读取复制](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#主从复制)
+* [联合](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#联合)
+* [分片](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#分片)
+* [非规范化](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#非规范化)
+* [SQL 调优](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#sql-调优)
#### NoSQL
-* [Key-value store](https://github.com/donnemartin/system-design-primer#key-value-store)
-* [Document store](https://github.com/donnemartin/system-design-primer#document-store)
-* [Wide column store](https://github.com/donnemartin/system-design-primer#wide-column-store)
-* [Graph database](https://github.com/donnemartin/system-design-primer#graph-database)
-* [SQL vs NoSQL](https://github.com/donnemartin/system-design-primer#sql-or-nosql)
+* [键-值存储](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#键-值存储)
+* [文档类型存储](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#文档类型存储)
+* [列型存储](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#列型存储)
+* [图数据库](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#图数据库)
+* [SQL vs NoSQL](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#sql-还是-nosql)
-### Caching
-* Where to cache
- * [Client caching](https://github.com/donnemartin/system-design-primer#client-caching)
- * [CDN caching](https://github.com/donnemartin/system-design-primer#cdn-caching)
- * [Web server caching](https://github.com/donnemartin/system-design-primer#web-server-caching)
- * [Database caching](https://github.com/donnemartin/system-design-primer#database-caching)
- * [Application caching](https://github.com/donnemartin/system-design-primer#application-caching)
-* What to cache
- * [Caching at the database query level](https://github.com/donnemartin/system-design-primer#caching-at-the-database-query-level)
- * [Caching at the object level](https://github.com/donnemartin/system-design-primer#caching-at-the-object-level)
-* When to update the cache
- * [Cache-aside](https://github.com/donnemartin/system-design-primer#cache-aside)
- * [Write-through](https://github.com/donnemartin/system-design-primer#write-through)
- * [Write-behind (write-back)](https://github.com/donnemartin/system-design-primer#write-behind-write-back)
- * [Refresh ahead](https://github.com/donnemartin/system-design-primer#refresh-ahead)
+### 缓存
-### Asynchronism and microservices
+* 在哪缓存
+ * [客户端缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#客户端缓存)
+ * [CDN 缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#cdn-缓存)
+ * [Web 服务器缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#web-服务器缓存)
+ * [数据库缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#数据库缓存)
+ * [应用缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#应用缓存)
+* 什么需要缓存
+ * [数据库查询级别的缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#数据库查询级别的缓存)
+ * [对象级别的缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#对象级别的缓存)
+* 何时更新缓存
+ * [缓存模式](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#缓存模式)
+ * [直写模式](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#直写模式)
+ * [回写模式](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#回写模式)
+ * [刷新](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#刷新)
-* [Message queues](https://github.com/donnemartin/system-design-primer#message-queues)
-* [Task queues](https://github.com/donnemartin/system-design-primer#task-queues)
-* [Back pressure](https://github.com/donnemartin/system-design-primer#back-pressure)
-* [Microservices](https://github.com/donnemartin/system-design-primer#microservices)
+### 异步与微服务
-### Communications
+* [消息队列](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#消息队列)
+* [任务队列](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#任务队列)
+* [背压](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#背压)
+* [微服务](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#微服务)
-* Discuss tradeoffs:
- * External communication with clients - [HTTP APIs following REST](https://github.com/donnemartin/system-design-primer#representational-state-transfer-rest)
- * Internal communications - [RPC](https://github.com/donnemartin/system-design-primer#remote-procedure-call-rpc)
-* [Service discovery](https://github.com/donnemartin/system-design-primer#service-discovery)
+### 通信
-### Security
+* 可权衡选择的方案:
+ * 与客户端的外部通信 - [使用 REST 作为 HTTP API](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#表述性状态转移rest)
+ * 内部通信 - [RPC](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#远程过程调用协议rpc)
+* [服务发现](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#服务发现)
-Refer to the [security section](https://github.com/donnemartin/system-design-primer#security).
-### Latency numbers
+### 安全性
-See [Latency numbers every programmer should know](https://github.com/donnemartin/system-design-primer#latency-numbers-every-programmer-should-know).
+请参阅[安全](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#安全)。
-### Ongoing
-* Continue benchmarking and monitoring your system to address bottlenecks as they come up
-* Scaling is an iterative process
+### 延迟数值
+
+请参阅[每个程序员都应该知道的延迟数](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#每个程序员都应该知道的延迟数)。
+
+### 持续探讨
+
+* 持续对系统进行基准测试与监控,在瓶颈出现时及时加以解决。
+* 架构扩展是一个迭代的过程。
From ac806e46cb63b1b7955d43bf55291f52b0d3eeae Mon Sep 17 00:00:00 2001
From: Donne Martin
Date: Mon, 9 Mar 2020 21:46:02 -0400
Subject: [PATCH 41/72] Revert "zh-cn: Sync with upstream to keep it up-to-date
(#374)" (#391)
This reverts commit 301b9d88e4aed1c34b3275301f18b14957c38c91.
#374 overwrote the English version of the solutions
---
README-zh-Hans.md | 11 +-
solutions/system_design/mint/README.md | 395 ++++++-------
solutions/system_design/pastebin/README.md | 359 ++++++------
solutions/system_design/query_cache/README.md | 312 +++++-----
solutions/system_design/sales_rank/README.md | 298 +++++-----
solutions/system_design/scaling_aws/README.md | 536 +++++++++---------
.../system_design/social_graph/README.md | 249 ++++----
solutions/system_design/twitter/README.md | 395 ++++++-------
solutions/system_design/web_crawler/README.md | 351 ++++++------
9 files changed, 1457 insertions(+), 1449 deletions(-)
diff --git a/README-zh-Hans.md b/README-zh-Hans.md
index 83c6007b..21a6cddb 100644
--- a/README-zh-Hans.md
+++ b/README-zh-Hans.md
@@ -1,6 +1,6 @@
> * 原文地址:[github.com/donnemartin/system-design-primer](https://github.com/donnemartin/system-design-primer)
> * 译文出自:[掘金翻译计划](https://github.com/xitu/gold-miner)
-> * 译者:[XatMassacrE](https://github.com/XatMassacrE)、[L9m](https://github.com/L9m)、[Airmacho](https://github.com/Airmacho)、[xiaoyusilen](https://github.com/xiaoyusilen)、[jifaxu](https://github.com/jifaxu)、[根号三](https://github.com/sqrthree)
+> * 译者:[XatMassacrE](https://github.com/XatMassacrE)、[L9m](https://github.com/L9m)、[Airmacho](https://github.com/Airmacho)、[xiaoyusilen](https://github.com/xiaoyusilen)、[jifaxu](https://github.com/jifaxu)
> * 这个 [链接](https://github.com/xitu/system-design-primer/compare/master...donnemartin:master) 用来查看本翻译与英文版是否有差别(如果你没有看到 README.md 发生变化,那就意味着这份翻译文档是最新的)。
*[English](README.md) ∙ [日本語](README-ja.md) ∙ [简体中文](README-zh-Hans.md) ∙ [繁體中文](README-zh-TW.md) | [العَرَبِيَّة](https://github.com/donnemartin/system-design-primer/issues/170) ∙ [বাংলা](https://github.com/donnemartin/system-design-primer/issues/220) ∙ [Português do Brasil](https://github.com/donnemartin/system-design-primer/issues/40) ∙ [Deutsch](https://github.com/donnemartin/system-design-primer/issues/186) ∙ [ελληνικά](https://github.com/donnemartin/system-design-primer/issues/130) ∙ [עברית](https://github.com/donnemartin/system-design-primer/issues/272) ∙ [Italiano](https://github.com/donnemartin/system-design-primer/issues/104) ∙ [韓國語](https://github.com/donnemartin/system-design-primer/issues/102) ∙ [فارسی](https://github.com/donnemartin/system-design-primer/issues/110) ∙ [Polski](https://github.com/donnemartin/system-design-primer/issues/68) ∙ [русский язык](https://github.com/donnemartin/system-design-primer/issues/87) ∙ [Español](https://github.com/donnemartin/system-design-primer/issues/136) ∙ [ภาษาไทย](https://github.com/donnemartin/system-design-primer/issues/187) ∙ [Türkçe](https://github.com/donnemartin/system-design-primer/issues/39) ∙ [tiếng Việt](https://github.com/donnemartin/system-design-primer/issues/127) ∙ [Français](https://github.com/donnemartin/system-design-primer/issues/250) | [Add Translation](https://github.com/donnemartin/system-design-primer/issues/28)*
@@ -12,6 +12,14 @@
+## 翻译
+
+有兴趣参与[翻译](https://github.com/donnemartin/system-design-primer/issues/28)? 以下是正在进行中的翻译:
+
+* [巴西葡萄牙语](https://github.com/donnemartin/system-design-primer/issues/40)
+* [简体中文](https://github.com/donnemartin/system-design-primer/issues/38)
+* [土耳其语](https://github.com/donnemartin/system-design-primer/issues/39)
+
## 目的
> 学习如何设计大型系统。
@@ -83,7 +91,6 @@
* 修复错误
* 完善章节
* 添加章节
-* [帮助翻译](https://github.com/donnemartin/system-design-primer/issues/28)
一些还需要完善的内容放在了[正在完善中](#正在完善中)。
diff --git a/solutions/system_design/mint/README.md b/solutions/system_design/mint/README.md
index 58467bc6..6fca1938 100644
--- a/solutions/system_design/mint/README.md
+++ b/solutions/system_design/mint/README.md
@@ -1,102 +1,102 @@
-# 设计 Mint.com
+# Design Mint.com
-**注意:这个文档中的链接会直接指向[系统设计主题索引](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#系统设计主题索引)中的有关部分,以避免重复的内容。您可以参考链接的相关内容,来了解其总的要点、方案的权衡取舍以及可选的替代方案。**
+*Note: This document links directly to relevant areas found in the [system design topics](https://github.com/donnemartin/system-design-primer#index-of-system-design-topics) to avoid duplication. Refer to the linked content for general talking points, tradeoffs, and alternatives.*
-## 第一步:简述用例与约束条件
+## Step 1: Outline use cases and constraints
-> 搜集需求与问题的范围。
-> 提出问题来明确用例与约束条件。
-> 讨论假设。
+> Gather requirements and scope the problem.
+> Ask questions to clarify use cases and constraints.
+> Discuss assumptions.
-我们将在没有面试官明确说明问题的情况下,自己定义一些用例以及限制条件。
+Without an interviewer to address clarifying questions, we'll define some use cases and constraints.
-### 用例
+### Use cases
-#### 我们将把问题限定在仅处理以下用例的范围中
+#### We'll scope the problem to handle only the following use cases
-* **用户** 连接到一个财务账户
-* **服务** 从账户中提取交易
- * 每日更新
- * 分类交易
- * 允许用户手动分类
- * 不自动重新分类
- * 按类别分析每月支出
-* **服务** 推荐预算
- * 允许用户手动设置预算
- * 当接近或者超出预算时,发送通知
-* **服务** 具有高可用性
+* **User** connects to a financial account
+* **Service** extracts transactions from the account
+ * Updates daily
+ * Categorizes transactions
+ * Allows manual category override by the user
+ * No automatic re-categorization
+ * Analyzes monthly spending, by category
+* **Service** recommends a budget
+ * Allows users to manually set a budget
+ * Sends notifications when approaching or exceeding budget
+* **Service** has high availability
-#### 非用例范围
+#### Out of scope
-* **服务** 执行附加的日志记录和分析
+* **Service** performs additional logging and analytics
-### 限制条件与假设
+### Constraints and assumptions
-#### 提出假设
+#### State assumptions
-* 网络流量非均匀分布
-* 自动账户日更新只适用于 30 天内活跃的用户
-* 添加或者移除财务账户相对较少
-* 预算通知不需要及时
-* 1000 万用户
- * 每个用户10个预算类别= 1亿个预算项
- * 示例类别:
+* Traffic is not evenly distributed
+* Automatic daily update of accounts applies only to users active in the past 30 days
+* Adding or removing financial accounts is relatively rare
+* Budget notifications don't need to be instant
+* 10 million users
+ * 10 budget categories per user = 100 million budget items
+ * Example categories:
* Housing = $1,000
* Food = $200
* Gas = $100
- * 卖方确定交易类别
- * 50000 个卖方
-* 3000 万财务账户
-* 每月 50 亿交易
-* 每月 5 亿读请求
-* 10:1 读写比
- * Write-heavy,用户每天都进行交易,但是每天很少访问该网站
+ * Sellers are used to determine transaction category
+ * 50,000 sellers
+* 30 million financial accounts
+* 5 billion transactions per month
+* 500 million read requests per month
+* 10:1 write to read ratio
+ * Write-heavy, users make transactions daily, but few visit the site daily
-#### 计算用量
+#### Calculate usage
-**如果你需要进行粗略的用量计算,请向你的面试官说明。**
+**Clarify with your interviewer if you should run back-of-the-envelope usage calculations.**
-* 每次交易的用量:
- * `user_id` - 8 字节
- * `created_at` - 5 字节
- * `seller` - 32 字节
- * `amount` - 5 字节
- * Total: ~50 字节
-* 每月产生 250 GB 新的交易内容
- * 每次交易 50 比特 * 50 亿交易每月
- * 3年内新的交易内容 9 TB
+* Size per transaction:
+ * `user_id` - 8 bytes
+ * `created_at` - 5 bytes
+ * `seller` - 32 bytes
+ * `amount` - 5 bytes
+ * Total: ~50 bytes
+* 250 GB of new transaction content per month
+ * 50 bytes per transaction * 5 billion transactions per month
+ * 9 TB of new transaction content in 3 years
* Assume most are new transactions instead of updates to existing ones
-* 平均每秒产生 2000 次交易
-* 平均每秒产生 200 读请求
+* 2,000 transactions per second on average
+* 200 read requests per second on average
-便利换算指南:
+Handy conversion guide:
-* 每个月有 250 万秒
-* 每秒一个请求 = 每个月 250 万次请求
-* 每秒 40 个请求 = 每个月 1 亿次请求
-* 每秒 400 个请求 = 每个月 10 亿次请求
+* 2.5 million seconds per month
+* 1 request per second = 2.5 million requests per month
+* 40 requests per second = 100 million requests per month
+* 400 requests per second = 1 billion requests per month
-## 第二步:概要设计
+## Step 2: Create a high level design
-> 列出所有重要组件以规划概要设计。
+> Outline a high level design with all important components.
![Imgur](http://i.imgur.com/E8klrBh.png)
-## 第三步:设计核心组件
+## Step 3: Design core components
-> 深入每个核心组件的细节。
+> Dive into details for each core component.
-### 用例:用户连接到一个财务账户
+### Use case: User connects to a financial account
-我们可以将 1000 万用户的信息存储在一个[关系数据库](https://github.com/donnemartin/system-design-primer#relational-database-management-system-rdbms)中。我们应该讨论一下[选择SQL或NoSQL之间的用例和权衡](https://github.com/donnemartin/system-design-primer#sql-or-nosql)了。
+We could store info on the 10 million users in a [relational database](https://github.com/donnemartin/system-design-primer#relational-database-management-system-rdbms). We should discuss the [use cases and tradeoffs between choosing SQL or NoSQL](https://github.com/donnemartin/system-design-primer#sql-or-nosql).
-* **客户端** 作为一个[反向代理](https://github.com/donnemartin/system-design-primer#reverse-proxy-web-server),发送请求到 **Web 服务器**
-* **Web 服务器** 转发请求到 **账户API** 服务器
-* **账户API** 服务器将新输入的账户信息更新到 **SQL数据库** 的`accounts`表
+* The **Client** sends a request to the **Web Server**, running as a [reverse proxy](https://github.com/donnemartin/system-design-primer#reverse-proxy-web-server)
+* The **Web Server** forwards the request to the **Accounts API** server
+* The **Accounts API** server updates the **SQL Database** `accounts` table with the newly entered account info
-**告知你的面试官你准备写多少代码**。
+**Clarify with your interviewer how much code you are expected to write**.
-`accounts`表应该具有如下结构:
+The `accounts` table could have the following structure:
```
id int NOT NULL AUTO_INCREMENT
@@ -110,9 +110,9 @@ PRIMARY KEY(id)
FOREIGN KEY(user_id) REFERENCES users(id)
```
-我们将在`id`,`user_id`和`created_at`等字段上创建一个[索引](https://github.com/donnemartin/system-design-primer#use-good-indices)以加速查找(对数时间而不是扫描整个表)并保持数据在内存中。从内存中顺序读取 1 MB数据花费大约250毫秒,而从SSD读取是其4倍,从磁盘读取是其80倍。1
+We'll create an [index](https://github.com/donnemartin/system-design-primer#use-good-indices) on `id`, `user_id`, and `created_at` to speed up lookups (log-time instead of scanning the entire table) and to keep the data in memory. Reading 1 MB sequentially from memory takes about 250 microseconds, while reading from SSD takes 4x and from disk takes 80x longer.1
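
As a rough sketch (not from the original), here is what those indices might look like, using sqlite3 and a cut-down column list purely for illustration:

```python
import sqlite3

# Illustrative only: a reduced accounts table with the columns we query by
conn = sqlite3.connect(':memory:')
conn.execute('''CREATE TABLE accounts (
                    id INTEGER PRIMARY KEY,
                    created_at TEXT,
                    user_id INTEGER,
                    account_url TEXT)''')
# Indices turn lookups by these columns into log-time operations instead of full scans
conn.execute('CREATE INDEX idx_accounts_user_id ON accounts(user_id)')
conn.execute('CREATE INDEX idx_accounts_created_at ON accounts(created_at)')
conn.commit()
```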
-我们将使用公开的[**REST API**](https://github.com/donnemartin/system-design-primer#representational-state-transfer-rest):
+We'll use a public [**REST API**](https://github.com/donnemartin/system-design-primer#representational-state-transfer-rest):
```
$ curl -X POST --data '{ "user_id": "foo", "account_url": "bar", \
@@ -120,35 +120,35 @@ $ curl -X POST --data '{ "user_id": "foo", "account_url": "bar", \
https://mint.com/api/v1/account
```
-对于内部通信,我们可以使用[远程过程调用](https://github.com/donnemartin/system-design-primer#remote-procedure-call-rpc)。
+For internal communications, we could use [Remote Procedure Calls](https://github.com/donnemartin/system-design-primer#remote-procedure-call-rpc).
-接下来,服务从账户中提取交易。
+Next, the service extracts transactions from the account.
-### 用例:服务从账户中提取交易
+### Use case: Service extracts transactions from the account
-如下几种情况下,我们会想要从账户中提取信息:
+We'll want to extract information from an account in these cases:
-* 用户首次链接账户
-* 用户手动更新账户
-* 为过去 30 天内活跃的用户自动日更新
+* The user first links the account
+* The user manually refreshes the account
+* Automatically each day for users who have been active in the past 30 days
-数据流:
+Data flow:
-* **客户端**向 **Web服务器** 发送请求
-* **Web服务器** 将请求转发到 **帐户API** 服务器
-* **帐户API** 服务器将job放在 **队列** 中,如 [Amazon SQS](https://aws.amazon.com/sqs/) 或者 [RabbitMQ](https://www.rabbitmq.com/)
- * 提取交易可能需要一段时间,我们可能希望[与队列异步](https://github.com/donnemartin/system-design-primer#asynchronism)地来做,虽然这会引入额外的复杂度。
-* **交易提取服务** 执行如下操作:
- * 从 **Queue** 中拉取并从金融机构中提取给定用户的交易,将结果作为原始日志文件存储在 **对象存储区**。
- * 使用 **分类服务** 来分类每个交易
- * 使用 **预算服务** 来按类别计算每月总支出
- * **预算服务** 使用 **通知服务** 让用户知道他们是否接近或者已经超出预算
- * 更新具有分类交易的 **SQL数据库** 的`transactions`表
- * 按类别更新 **SQL数据库** `monthly_spending`表的每月总支出
- * 通过 **通知服务** 提醒用户交易完成
- * 使用一个 **队列** (没有画出来) 来异步发送通知
+* The **Client** sends a request to the **Web Server**
+* The **Web Server** forwards the request to the **Accounts API** server
+* The **Accounts API** server places a job on a **Queue** such as [Amazon SQS](https://aws.amazon.com/sqs/) or [RabbitMQ](https://www.rabbitmq.com/)
+ * Extracting transactions could take a while; we'd probably want to do this [asynchronously with a queue](https://github.com/donnemartin/system-design-primer#asynchronism), although this introduces additional complexity
+* The **Transaction Extraction Service** does the following:
+ * Pulls from the **Queue** and extracts transactions for the given account from the financial institution, storing the results as raw log files in the **Object Store**
+ * Uses the **Category Service** to categorize each transaction
+ * Uses the **Budget Service** to calculate aggregate monthly spending by category
+ * The **Budget Service** uses the **Notification Service** to let users know if they are nearing or have exceeded their budget
+ * Updates the **SQL Database** `transactions` table with categorized transactions
+ * Updates the **SQL Database** `monthly_spending` table with aggregate monthly spending by category
+ * Notifies the user the transactions have completed through the **Notification Service**:
+ * Uses a **Queue** (not pictured) to asynchronously send out notifications
-`transactions`表应该具有如下结构:
+The `transactions` table could have the following structure:
```
id int NOT NULL AUTO_INCREMENT
@@ -160,9 +160,9 @@ PRIMARY KEY(id)
FOREIGN KEY(user_id) REFERENCES users(id)
```
-我们将在 `id`,`user_id`,和 `created_at`字段上创建[索引](https://github.com/donnemartin/system-design-primer#use-good-indices)。
+We'll create an [index](https://github.com/donnemartin/system-design-primer#use-good-indices) on `id`, `user_id`, and `created_at`.
-`monthly_spending`表应该具有如下结构:
+The `monthly_spending` table could have the following structure:
```
id int NOT NULL AUTO_INCREMENT
@@ -174,13 +174,13 @@ PRIMARY KEY(id)
FOREIGN KEY(user_id) REFERENCES users(id)
```
-我们将在`id`,`user_id`字段上创建[索引](https://github.com/donnemartin/system-design-primer#use-good-indices)。
+We'll create an [index](https://github.com/donnemartin/system-design-primer#use-good-indices) on `id` and `user_id`.
-#### 分类服务
+#### Category service
-对于 **分类服务**,我们可以生成一个带有最受欢迎卖家的卖家-类别字典。如果我们估计 50000 个卖家,并估计每个条目占用不少于 255 个字节,该字典只需要大约 12 MB内存。
+For the **Category Service**, we can seed a seller-to-category dictionary with the most popular sellers. If we estimate 50,000 sellers and estimate each entry to take less than 255 bytes, the dictionary would only take about 12 MB of memory.
-**告知你的面试官你准备写多少代码**。
+**Clarify with your interviewer how much code you are expected to write**.
```python
class DefaultCategories(Enum):
@@ -197,7 +197,7 @@ seller_category_map['Target'] = DefaultCategories.SHOPPING
...
```
-对于一开始没有在映射中的卖家,我们可以通过评估用户提供的手动类别来进行众包。在 O(1) 时间内,我们可以用堆来快速查找每个卖家的顶端的手动覆盖。
+For sellers not initially seeded in the map, we could use a crowdsourcing effort by evaluating the manual category overrides our users provide. We could use a heap to quickly look up the top manual override per seller in O(1) time (see the sketch after the `Categorizer` snippet below).
```python
class Categorizer(object):
@@ -217,7 +217,7 @@ class Categorizer(object):
return None
```
-交易实现:
+Transaction implementation:
```python
class Transaction(object):
@@ -228,10 +228,9 @@ class Transaction(object):
self.amount = amount
```
-### 用例:服务推荐预算
+### Use case: Service recommends a budget
-首先,我们可以使用根据收入等级分配每类别金额的通用预算模板。使用这种方法,我们不必存储在约束中标识的 1 亿个预算项目,只需存储用户覆盖的预算项目。如果用户覆盖预算类别,我们可以在
-`TABLE budget_overrides`中存储此覆盖。
+To start, we could use a generic budget template that allocates category amounts based on income tiers. Using this approach, we would not have to store the 100 million budget items identified in the constraints, only those that the user overrides. If a user overrides a budget category, we could store the override in the `budget_overrides` table.
```python
class Budget(object):
@@ -253,26 +252,26 @@ class Budget(object):
self.categories_to_budget_map[category] = amount
```
-对于 **预算服务** 而言,我们可以在`transactions`表上运行SQL查询以生成`monthly_spending`聚合表。由于用户通常每个月有很多交易,所以`monthly_spending`表的行数可能会少于总共50亿次交易的行数。
+For the **Budget Service**, we can potentially run SQL queries on the `transactions` table to generate the `monthly_spending` aggregate table. The `monthly_spending` table would likely have far fewer rows than the total 5 billion transactions, since users typically have many transactions per month.
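
As a rough sketch of that aggregation, the query below groups categorized transactions by user, month, and category. It uses SQLite for a runnable example and assumes the `transactions` table carries `category` and `amount` columns alongside `user_id` and `created_at`, and that `monthly_spending` has `year_month`, `category`, and `total` columns; the full schemas are only partially visible in this diff.

```python
import sqlite3

MONTHLY_SPENDING_SQL = """
INSERT INTO monthly_spending (user_id, year_month, category, total)
SELECT user_id,
       strftime('%Y-%m', created_at),
       category,
       SUM(amount)
FROM transactions
GROUP BY user_id, strftime('%Y-%m', created_at), category
"""


def rebuild_monthly_spending(db_path='mint.db'):
    """Recompute the monthly_spending aggregate from the transactions table."""
    with sqlite3.connect(db_path) as conn:
        conn.execute('DELETE FROM monthly_spending')  # simple full rebuild for the sketch
        conn.execute(MONTHLY_SPENDING_SQL)
```
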
-作为替代,我们可以在原始交易文件上运行 **MapReduce** 作业来:
+As an alternative, we can run **MapReduce** jobs on the raw transaction files to:
-* 分类每个交易
-* 按类别生成每月总支出
+* Categorize each transaction
+* Generate aggregate monthly spending by category
-对交易文件的运行分析可以显著减少数据库的负载。
+Running analyses on the transaction files could significantly reduce the load on the database.
-如果用户更新类别,我们可以调用 **预算服务** 重新运行分析。
+We could call the **Budget Service** to re-run the analysis if the user updates a category.
-**告知你的面试官你准备写多少代码**.
+**Clarify with your interviewer how much code you are expected to write**.
-日志文件格式样例,以tab分割:
+Sample log file format, tab delimited:
```
user_id timestamp seller amount
```
-**MapReduce** 实现:
+**MapReduce** implementation:
```python
class SpendingByCategory(MRJob):
@@ -283,25 +282,26 @@ class SpendingByCategory(MRJob):
...
def calc_current_year_month(self):
- """返回当前年月"""
+ """Return the current year and month."""
...
def extract_year_month(self, timestamp):
- """返回时间戳的年,月部分"""
+ """Return the year and month portions of the timestamp."""
...
def handle_budget_notifications(self, key, total):
- """如果接近或超出预算,调用通知API"""
+ """Call notification API if nearing or exceeded budget."""
...
def mapper(self, _, line):
- """解析每个日志行,提取和转换相关行。
+ """Parse each log line, extract and transform relevant lines.
- 参数行应为如下形式:
+ Argument line will be of the form:
user_id timestamp seller amount
- 使用分类器来将卖家转换成类别,生成如下形式的key-value对:
+ Using the categorizer to convert seller to category,
+ emit key value pairs of the form:
(user_id, 2016-01, shopping), 25
(user_id, 2016-01, shopping), 100
@@ -314,7 +314,7 @@ class SpendingByCategory(MRJob):
yield (user_id, period, category), amount
    def reducer(self, key, values):
- """将每个key对应的值求和。
+ """Sum values for each key.
(user_id, 2016-01, shopping), 125
(user_id, 2016-01, gas), 50
@@ -323,118 +323,119 @@ class SpendingByCategory(MRJob):
yield key, sum(values)
```
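
Since the class above extends `MRJob`, a job like this is normally launched from the command line; a minimal entry point and invocation might look like the following (the file name and input path are hypothetical).

```python
# spending_by_category.py (hypothetical file name), appended below the SpendingByCategory class
if __name__ == '__main__':
    SpendingByCategory.run()

# Run locally against a sample log:
#   $ python spending_by_category.py transactions.log
```
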
-## 第四步:设计扩展
+## Step 4: Scale the design
-> 根据限制条件,找到并解决瓶颈。
+> Identify and address bottlenecks, given the constraints.
![Imgur](http://i.imgur.com/V5q57vU.png)
-**重要提示:不要从最初设计直接跳到最终设计中!**
+**Important: Do not simply jump right into the final design from the initial design!**
-现在你要 1) **基准测试、负载测试**。2) **分析、描述**性能瓶颈。3) 在解决瓶颈问题的同时,评估替代方案、权衡利弊。4) 重复以上步骤。请阅读[「设计一个系统,并将其扩大到为数以百万计的 AWS 用户服务」](../scaling_aws/README.md) 来了解如何逐步扩大初始设计。
+State you would 1) **Benchmark/Load Test**, 2) **Profile** for bottlenecks, 3) address bottlenecks while evaluating alternatives and trade-offs, and 4) repeat. See [Design a system that scales to millions of users on AWS](../scaling_aws/README.md) as a sample on how to iteratively scale the initial design.
-讨论初始设计可能遇到的瓶颈及相关解决方案是很重要的。例如加上一个配置多台 **Web 服务器**的**负载均衡器**是否能够解决问题?**CDN**呢?**主从复制**呢?它们各自的替代方案和需要**权衡**的利弊又有什么呢?
+It's important to discuss what bottlenecks you might encounter with the initial design and how you might address each of them. For example, what issues are addressed by adding a **Load Balancer** with multiple **Web Servers**? **CDN**? **Master-Slave Replicas**? What are the alternatives and **Trade-Offs** for each?
-我们将会介绍一些组件来完成设计,并解决架构扩张问题。内置的负载均衡器将不做讨论以节省篇幅。
+We'll introduce some components to complete the design and to address scalability issues. Internal load balancers are not shown to reduce clutter.
-**为了避免重复讨论**,请参考[系统设计主题索引](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#系统设计主题的索引)相关部分来了解其要点、方案的权衡取舍以及可选的替代方案。
+*To avoid repeating discussions*, refer to the following [system design topics](https://github.com/donnemartin/system-design-primer#index-of-system-design-topics) for main talking points, tradeoffs, and alternatives:
-* [DNS](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#域名系统)
-* [负载均衡器](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#负载均衡器)
-* [水平拓展](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#水平扩展)
-* [反向代理(web 服务器)](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#反向代理web-服务器)
-* [API 服务(应用层)](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#应用层)
-* [缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#缓存)
-* [关系型数据库管理系统 (RDBMS)](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#关系型数据库管理系统rdbms)
-* [SQL 故障主从切换](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#故障切换)
-* [主从复制](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#主从复制)
-* [异步](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#异步)
-* [一致性模式](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#一致性模式)
-* [可用性模式](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#可用性模式)
+* [DNS](https://github.com/donnemartin/system-design-primer#domain-name-system)
+* [CDN](https://github.com/donnemartin/system-design-primer#content-delivery-network)
+* [Load balancer](https://github.com/donnemartin/system-design-primer#load-balancer)
+* [Horizontal scaling](https://github.com/donnemartin/system-design-primer#horizontal-scaling)
+* [Web server (reverse proxy)](https://github.com/donnemartin/system-design-primer#reverse-proxy-web-server)
+* [API server (application layer)](https://github.com/donnemartin/system-design-primer#application-layer)
+* [Cache](https://github.com/donnemartin/system-design-primer#cache)
+* [Relational database management system (RDBMS)](https://github.com/donnemartin/system-design-primer#relational-database-management-system-rdbms)
+* [SQL write master-slave failover](https://github.com/donnemartin/system-design-primer#fail-over)
+* [Master-slave replication](https://github.com/donnemartin/system-design-primer#master-slave-replication)
+* [Asynchronism](https://github.com/donnemartin/system-design-primer#asynchronism)
+* [Consistency patterns](https://github.com/donnemartin/system-design-primer#consistency-patterns)
+* [Availability patterns](https://github.com/donnemartin/system-design-primer#availability-patterns)
-我们将增加一个额外的用例:**用户** 访问摘要和交易数据。
+We'll add an additional use case: **User** accesses summaries and transactions.
-用户会话,按类别统计的统计信息,以及最近的事务可以放在 **内存缓存**(如 Redis 或 Memcached )中。
+User sessions, aggregate stats by category, and recent transactions could be placed in a **Memory Cache** such as Redis or Memcached.
-* **客户端** 发送读请求给 **Web 服务器**
-* **Web 服务器** 转发请求到 **读 API** 服务器
- * 静态内容可通过 **对象存储** 比如缓存在 **CDN** 上的 S3 来服务
-* **读 API** 服务器做如下动作:
- * 检查 **内存缓存** 的内容
- * 如果URL在 **内存缓存**中,返回缓存的内容
- * 否则
- * 如果URL在 **SQL 数据库**中,获取该内容
- * 以其内容更新 **内存缓存**
+* The **Client** sends a read request to the **Web Server**
+* The **Web Server** forwards the request to the **Read API** server
+ * Static content can be served from the **Object Store** such as S3, which is cached on the **CDN**
+* The **Read API** server does the following:
+ * Checks the **Memory Cache** for the content
+ * If the url is in the **Memory Cache**, returns the cached contents
+ * Else
+ * If the url is in the **SQL Database**, fetches the contents
+ * Updates the **Memory Cache** with the contents
-参考 [何时更新缓存](https://github.com/donnemartin/system-design-primer#when-to-update-the-cache) 中权衡和替代的内容。以上方法描述了 [cache-aside缓存模式](https://github.com/donnemartin/system-design-primer#cache-aside).
+Refer to [When to update the cache](https://github.com/donnemartin/system-design-primer#when-to-update-the-cache) for tradeoffs and alternatives. The approach above describes [cache-aside](https://github.com/donnemartin/system-design-primer#cache-aside).
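
A minimal cache-aside sketch for the read path above, assuming Redis via `redis-py`; the key scheme, TTL, and `query_db` helper are placeholders for illustration.

```python
import json

import redis

cache = redis.Redis(host='localhost', port=6379)
CACHE_TTL_SECONDS = 5 * 60  # arbitrary choice for the sketch


def query_db(user_id):
    """Placeholder: fetch recent transactions for the user from the SQL read replicas."""
    return []


def get_recent_transactions(user_id):
    key = 'user:{0}:recent_transactions'.format(user_id)
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)                      # cache hit
    transactions = query_db(user_id)                   # cache miss: go to the database
    cache.set(key, json.dumps(transactions), ex=CACHE_TTL_SECONDS)
    return transactions
```
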
-我们可以使用诸如 Amazon Redshift 或者 Google BigQuery 等数据仓库解决方案,而不是将`monthly_spending`聚合表保留在 **SQL 数据库** 中。
+Instead of keeping the `monthly_spending` aggregate table in the **SQL Database**, we could create a separate **Analytics Database** using a data warehousing solution such as Amazon Redshift or Google BigQuery.
-我们可能只想在数据库中存储一个月的`交易`数据,而将其余数据存储在数据仓库或者 **对象存储区** 中。**对象存储区** (如Amazon S3) 能够舒服地解决每月 250 GB新内容的限制。
+We might only want to store a month of `transactions` data in the database, while storing the rest in a data warehouse or in an **Object Store**. An **Object Store** such as Amazon S3 can comfortably handle the constraint of 250 GB of new content per month.
-为了解决每秒 *平均* 2000 次读请求数(峰值时更高),受欢迎的内容的流量应由 **内存缓存** 而不是数据库来处理。 **内存缓存** 也可用于处理不均匀分布的流量和流量尖峰。 只要副本不陷入重复写入的困境,**SQL 读副本** 应该能够处理高速缓存未命中。
+To address the 2,000 *average* read requests per second (higher at peak), traffic for popular content should be handled by the **Memory Cache** instead of the database. The **Memory Cache** is also useful for handling the unevenly distributed traffic and traffic spikes. The **SQL Read Replicas** should be able to handle the cache misses, as long as the replicas are not bogged down with replicating writes.
-*平均* 200 次交易写入每秒(峰值时更高)对于单个 **SQL 写入主-从服务** 来说可能是棘手的。我们可能需要考虑其它的 SQL 性能拓展技术:
+200 *average* transaction writes per second (higher at peak) might be tough for a single **SQL Write Master-Slave**. We might need to employ additional SQL scaling patterns:
-* [联合](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#联合)
-* [分片](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#分片)
-* [非规范化](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#非规范化)
-* [SQL 调优](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#sql-调优)
+* [Federation](https://github.com/donnemartin/system-design-primer#federation)
+* [Sharding](https://github.com/donnemartin/system-design-primer#sharding)
+* [Denormalization](https://github.com/donnemartin/system-design-primer#denormalization)
+* [SQL Tuning](https://github.com/donnemartin/system-design-primer#sql-tuning)
-我们也可以考虑将一些数据移至 **NoSQL 数据库**。
+We should also consider moving some data to a **NoSQL Database**.
-## 其它要点
+## Additional talking points
-> 是否深入这些额外的主题,取决于你的问题范围和剩下的时间。
+> Additional topics to dive into, depending on the problem scope and time remaining.
#### NoSQL
-* [键-值存储](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#键-值存储)
-* [文档类型存储](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#文档类型存储)
-* [列型存储](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#列型存储)
-* [图数据库](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#图数据库)
-* [SQL vs NoSQL](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#sql-还是-nosql)
+* [Key-value store](https://github.com/donnemartin/system-design-primer#key-value-store)
+* [Document store](https://github.com/donnemartin/system-design-primer#document-store)
+* [Wide column store](https://github.com/donnemartin/system-design-primer#wide-column-store)
+* [Graph database](https://github.com/donnemartin/system-design-primer#graph-database)
+* [SQL vs NoSQL](https://github.com/donnemartin/system-design-primer#sql-or-nosql)
-### 缓存
+### Caching
-* 在哪缓存
- * [客户端缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#客户端缓存)
- * [CDN 缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#cdn-缓存)
- * [Web 服务器缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#web-服务器缓存)
- * [数据库缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#数据库缓存)
- * [应用缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#应用缓存)
-* 什么需要缓存
- * [数据库查询级别的缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#数据库查询级别的缓存)
- * [对象级别的缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#对象级别的缓存)
-* 何时更新缓存
- * [缓存模式](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#缓存模式)
- * [直写模式](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#直写模式)
- * [回写模式](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#回写模式)
- * [刷新](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#刷新)
+* Where to cache
+ * [Client caching](https://github.com/donnemartin/system-design-primer#client-caching)
+ * [CDN caching](https://github.com/donnemartin/system-design-primer#cdn-caching)
+ * [Web server caching](https://github.com/donnemartin/system-design-primer#web-server-caching)
+ * [Database caching](https://github.com/donnemartin/system-design-primer#database-caching)
+ * [Application caching](https://github.com/donnemartin/system-design-primer#application-caching)
+* What to cache
+ * [Caching at the database query level](https://github.com/donnemartin/system-design-primer#caching-at-the-database-query-level)
+ * [Caching at the object level](https://github.com/donnemartin/system-design-primer#caching-at-the-object-level)
+* When to update the cache
+ * [Cache-aside](https://github.com/donnemartin/system-design-primer#cache-aside)
+ * [Write-through](https://github.com/donnemartin/system-design-primer#write-through)
+ * [Write-behind (write-back)](https://github.com/donnemartin/system-design-primer#write-behind-write-back)
+ * [Refresh ahead](https://github.com/donnemartin/system-design-primer#refresh-ahead)
-### 异步与微服务
+### Asynchronism and microservices
-* [消息队列](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#消息队列)
-* [任务队列](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#任务队列)
-* [背压](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#背压)
-* [微服务](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#微服务)
+* [Message queues](https://github.com/donnemartin/system-design-primer#message-queues)
+* [Task queues](https://github.com/donnemartin/system-design-primer#task-queues)
+* [Back pressure](https://github.com/donnemartin/system-design-primer#back-pressure)
+* [Microservices](https://github.com/donnemartin/system-design-primer#microservices)
-### 通信
+### Communications
-* 可权衡选择的方案:
- * 与客户端的外部通信 - [使用 REST 作为 HTTP API](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#表述性状态转移rest)
- * 服务器内部通信 - [RPC](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#远程过程调用协议rpc)
-* [服务发现](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#服务发现)
+* Discuss tradeoffs:
+ * External communication with clients - [HTTP APIs following REST](https://github.com/donnemartin/system-design-primer#representational-state-transfer-rest)
+ * Internal communications - [RPC](https://github.com/donnemartin/system-design-primer#remote-procedure-call-rpc)
+* [Service discovery](https://github.com/donnemartin/system-design-primer#service-discovery)
-### 安全性
+### Security
-请参阅[「安全」](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#安全)一章。
+Refer to the [security section](https://github.com/donnemartin/system-design-primer#security).
-### 延迟数值
+### Latency numbers
-请参阅[「每个程序员都应该知道的延迟数」](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#每个程序员都应该知道的延迟数)。
+See [Latency numbers every programmer should know](https://github.com/donnemartin/system-design-primer#latency-numbers-every-programmer-should-know).
-### 持续探讨
+### Ongoing
-* 持续进行基准测试并监控你的系统,以解决他们提出的瓶颈问题。
-* 架构拓展是一个迭代的过程。
+* Continue benchmarking and monitoring your system to address bottlenecks as they come up
+* Scaling is an iterative process
diff --git a/solutions/system_design/pastebin/README.md b/solutions/system_design/pastebin/README.md
index 9210b02b..756c78c2 100644
--- a/solutions/system_design/pastebin/README.md
+++ b/solutions/system_design/pastebin/README.md
@@ -1,113 +1,112 @@
-# 设计 Pastebin.com(或 Bit.ly)
+# Design Pastebin.com (or Bit.ly)
-**注意:这个文档中的链接会直接指向[系统设计主题索引](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#系统设计主题的索引)中的有关部分,以避免重复的内容。你可以参考链接的相关内容,来了解其总的要点、方案的权衡取舍以及可选的替代方案。**
+*Note: This document links directly to relevant areas found in the [system design topics](https://github.com/donnemartin/system-design-primer#index-of-system-design-topics) to avoid duplication. Refer to the linked content for general talking points, tradeoffs, and alternatives.*
-除了粘贴板需要存储的是完整的内容而不是短链接之外,**设计 Bit.ly**是与本文类似的一个问题。
+**Design Bit.ly** is a similar question, except Pastebin requires storing the paste contents instead of the original unshortened url.
-## 第一步:简述用例与约束条件
+## Step 1: Outline use cases and constraints
-> 搜集需求与问题的范围。
-> 提出问题来明确用例与约束条件。
-> 讨论假设。
+> Gather requirements and scope the problem.
+> Ask questions to clarify use cases and constraints.
+> Discuss assumptions.
-我们将在没有面试官明确说明问题的情况下,自己定义一些用例以及限制条件。
+Without an interviewer to address clarifying questions, we'll define some use cases and constraints.
-### 用例
+### Use cases
-#### 我们将把问题限定在仅处理以下用例的范围中
+#### We'll scope the problem to handle only the following use cases
+* **User** enters a block of text and gets a randomly generated link
+ * Expiration
+ * Default setting does not expire
+ * Can optionally set a timed expiration
+* **User** enters a paste's url and views the contents
+* **User** is anonymous
+* **Service** tracks analytics of pages
+ * Monthly visit stats
+* **Service** deletes expired pastes
+* **Service** has high availability
-* **用户**输入一些文本,然后得到一个随机生成的链接
- * 过期时间
- * 默认为永不过期
- * 可选设置为一定时间过期
-* **用户**输入粘贴板中的 url,查看内容
-* **用户**是匿名访问的
-* **服务**需要能够对页面进行跟踪分析
- * 月访问量统计
-* **服务**将过期的内容删除
-* **服务**有着高可用性
+#### Out of scope
-#### 不在用例范围内的有
+* **User** registers for an account
+ * **User** verifies email
+* **User** logs into a registered account
+ * **User** edits the document
+* **User** can set visibility
+* **User** can set the shortlink
-* **用户**注册了账号
- * **用户**通过了邮箱验证
-* **用户**登录已注册的账号
- * **用户**编辑他们的文档
-* **用户**能设置他们的内容是否可见
-* **用户**是否能自行设置短链接
+### Constraints and assumptions
-### 限制条件与假设
+#### State assumptions
-#### 提出假设
+* Traffic is not evenly distributed
+* Following a short link should be fast
+* Pastes are text only
+* Page view analytics do not need to be realtime
+* 10 million users
+* 10 million paste writes per month
+* 100 million paste reads per month
+* 10:1 read to write ratio
-* 网络流量不是均匀分布的
-* 生成短链接的速度必须要快
-* 只允许粘贴文本
-* 不需要对页面预览做实时分析
-* 1000 万用户
-* 每个月 1000 万次粘贴
-* 每个月 1 亿次读取请求
-* 10:1 的读写比例
+#### Calculate usage
-#### 计算用量
+**Clarify with your interviewer if you should run back-of-the-envelope usage calculations.**
-**如果你需要进行粗略的用量计算,请向你的面试官说明。**
+* Size per paste
+ * 1 KB content per paste
+ * `shortlink` - 7 bytes
+ * `expiration_length_in_minutes` - 4 bytes
+ * `created_at` - 5 bytes
+ * `paste_path` - 255 bytes
+ * total = ~1.27 KB
+* 12.7 GB of new paste content per month
+ * 1.27 KB per paste * 10 million pastes per month
+ * ~450 GB of new paste content in 3 years
+ * 360 million shortlinks in 3 years
+ * Assume most are new pastes instead of updates to existing ones
+* 4 paste writes per second on average
+* 40 read requests per second on average
-* 每次粘贴的用量
- * 1 KB 的内容
- * `shortlink` - 7 字节
- * `expiration_length_in_minutes` - 4 字节
- * `created_at` - 5 字节
- * `paste_path` - 255 字节
- * 总计:大约 1.27 KB
-* 每个月的粘贴造作将会产生 12.7 GB 的记录
- * 每次粘贴 1.27 KB * 1000 万次粘贴
- * 3年内大约产生了 450 GB 的新内容记录
- * 3年内生成了 36000 万个短链接
- * 假设大多数的粘贴操作都是新的粘贴而不是更新以前的粘贴内容
-* 平均每秒 4 次读取粘贴
-* 平均每秒 40 次读取粘贴请求
+Handy conversion guide:
-便利换算指南:
+* 2.5 million seconds per month
+* 1 request per second = 2.5 million requests per month
+* 40 requests per second = 100 million requests per month
+* 400 requests per second = 1 billion requests per month
-* 每个月有 250 万秒
-* 每秒一个请求 = 每个月 250 万次请求
-* 每秒 40 个请求 = 每个月 1 亿次请求
-* 每秒 400 个请求 = 每个月 10 亿次请求
+## Step 2: Create a high level design
-## 第二步:概要设计
-
-> 列出所有重要组件以规划概要设计。
+> Outline a high level design with all important components.
![Imgur](http://i.imgur.com/BKsBnmG.png)
-## 第三步:设计核心组件
+## Step 3: Design core components
-> 深入每个核心组件的细节。
+> Dive into details for each core component.
-### 用例:用户输入一些文本,然后得到一个随机生成的链接
+### Use case: User enters a block of text and gets a randomly generated link
-我们将使用[关系型数据库](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#关系型数据库管理系统rdbms),将其作为一个超大哈希表,将生成的 url 和文件服务器上对应文件的路径一一对应。
+We could use a [relational database](https://github.com/donnemartin/system-design-primer#relational-database-management-system-rdbms) as a large hash table, mapping the generated url to a file server and path containing the paste file.
-我们可以使用诸如 Amazon S3 之类的**对象存储服务**或者[NoSQL](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#nosql)来代替自建文件服务器。
+Instead of managing a file server, we could use a managed **Object Store** such as Amazon S3 or a [NoSQL document store](https://github.com/donnemartin/system-design-primer#document-store).
-除了使用关系型数据库来作为一个超大哈希表之外,我们也可以使用[NoSQL](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#nosql)来代替它。[究竟是用 SQL 还是用 NoSQL](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#sql-还是-nosql)。不过在下面的讨论中,我们默认选择了使用关系型数据库的方案。
+As an alternative to a relational database acting as a large hash table, we could use a [NoSQL key-value store](https://github.com/donnemartin/system-design-primer#key-value-store). We should discuss the [tradeoffs between choosing SQL or NoSQL](https://github.com/donnemartin/system-design-primer#sql-or-nosql). The following discussion uses the relational database approach.
-* **客户端**向向运行[反向代理](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#反向代理web-服务器)的 **Web 服务器**发送一个粘贴请求
-* **Web 服务器** 将请求转发给**Write API** 服务
-* **Write API**服务将会:
- * 生成一个独一无二的 url
- * 通过在 **SQL 数据库**中查重来确认这个 url 是否的确独一无二
- * 如果这个 url 已经存在了,重新生成一个 url
- * 如果支持自定义 url,我们也可以使用用户提供的 url(也需要进行查重)
- * 将 url 存入 **SQL 数据库**的 `pastes` 表中
- * 将粘贴的数据存入**对象存储**系统中
- * 返回 url
+* The **Client** sends a create paste request to the **Web Server**, running as a [reverse proxy](https://github.com/donnemartin/system-design-primer#reverse-proxy-web-server)
+* The **Web Server** forwards the request to the **Write API** server
+* The **Write API** server does the following:
+ * Generates a unique url
+ * Checks if the url is unique by looking at the **SQL Database** for a duplicate
+ * If the url is not unique, it generates another url
+        * If we supported a custom url, we could use the user-supplied url (also checking for a duplicate)
+ * Saves to the **SQL Database** `pastes` table
+ * Saves the paste data to the **Object Store**
+ * Returns the url
-**向你的面试官告知你准备写多少代码**。
+**Clarify with your interviewer how much code you are expected to write**.
-`pastes` 表的数据结构如下:
+The `pastes` table could have the following structure:
```
shortlink char(7) NOT NULL
@@ -117,19 +116,19 @@ paste_path varchar(255) NOT NULL
PRIMARY KEY(shortlink)
```
-我们会以`shortlink` 与 `created_at` 创建一个 [索引](https://github.com/donnemartin/system-design-primer#use-good-indices)以加快查询速度(只需要使用读取日志的时间,不再需要每次都扫描整个数据表)并让数据常驻内存。从内存读取 1 MB 连续数据大约要花 250 微秒,而从 SSD 读取同样大小的数据要花费 4 倍的时间,从机械硬盘读取需要花费 80 倍以上的时间。1
+We'll create an [index](https://github.com/donnemartin/system-design-primer#use-good-indices) on `shortlink` and `created_at` to speed up lookups (log-time instead of scanning the entire table) and to keep the data in memory. Reading 1 MB sequentially from memory takes about 250 microseconds, while reading from SSD takes 4x and from disk takes 80x longer.1
-为了生成独一无二的 url,我们需要:
+To generate the unique url, we could:
-* 对用户的 IP 地址 + 时间戳进行 [**MD5**](https://en.wikipedia.org/wiki/MD5) 哈希编码
- * MD5 是一种非常常用的哈希化函数,它能生成 128 字节的哈希值
- * MD5 是均匀分布的
- * 另外,我们可以使用 MD5 哈希算法来生成随机数据
-* 对 MD5 哈希值进行 [**Base 62**](https://www.kerstner.at/2012/07/shortening-strings-using-base-62-encoding/) 编码
- * Base 62 编码后的值由 `[a-zA-Z0-9]` 组成,它们可以直接作为 url 的字符,不需要再次转义
- * 在这儿仅仅只对原始输入进行过一次哈希处理,Base 62 编码步骤是确定性的(不涉及随机性)
- * Base 64 是另一种很流行的编码形式,但是它生成的字符串作为 url 存在一些问题:Base 64m字符串内包含 `+` 和 `/` 符号
- * 下面的 [Base 62 pseudocode](http://stackoverflow.com/questions/742013/how-to-code-a-url-shortener) 算法时间复杂度为 O(k),本例中取 num =7,即 k 值为 7:
+* Take the [**MD5**](https://en.wikipedia.org/wiki/MD5) hash of the user's ip_address + timestamp
+ * MD5 is a widely used hashing function that produces a 128-bit hash value
+ * MD5 is uniformly distributed
+ * Alternatively, we could also take the MD5 hash of randomly-generated data
+* [**Base 62**](https://www.kerstner.at/2012/07/shortening-strings-using-base-62-encoding/) encode the MD5 hash
+ * Base 62 encodes to `[a-zA-Z0-9]` which works well for urls, eliminating the need for escaping special characters
+ * There is only one hash result for the original input and Base 62 is deterministic (no randomness involved)
+ * Base 64 is another popular encoding but provides issues for urls because of the additional `+` and `/` characters
+ * The following [Base 62 pseudocode](http://stackoverflow.com/questions/742013/how-to-code-a-url-shortener) runs in O(k) time where k is the number of digits = 7:
```python
def base_encode(num, base=62):
@@ -141,19 +140,20 @@ def base_encode(num, base=62):
digits = digits.reverse
```
-* 输出前 7 个字符,其结果将有 62^7 种可能的值,作为短链接来说足够了。因为我们限制了 3 年内最多产生 36000 万个短链接:
+* Take the first 7 characters of the output, which results in 62^7 possible values and should be sufficient to handle our constraint of 360 million shortlinks in 3 years:
```python
url = base_encode(md5(ip_address+timestamp))[:URL_LENGTH]
```
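
Since the pseudocode above appears only partially in this diff, here is a self-contained Python version of the same idea; the exact alphabet ordering and the `URL_LENGTH` constant are reasonable assumptions rather than a fixed specification.

```python
import hashlib
import string

ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase  # 62 characters
URL_LENGTH = 7


def base_encode(num, base=62):
    """Convert a non-negative integer to a base-62 string."""
    if num == 0:
        return ALPHABET[0]
    digits = []
    while num > 0:
        num, remainder = divmod(num, base)
        digits.append(ALPHABET[remainder])
    return ''.join(reversed(digits))


def generate_shortlink(ip_address, timestamp):
    """MD5 the ip_address + timestamp, base-62 encode, and keep the first 7 characters."""
    digest = hashlib.md5('{0}{1}'.format(ip_address, timestamp).encode()).hexdigest()
    return base_encode(int(digest, 16))[:URL_LENGTH]


print(generate_shortlink('10.0.0.1', '2016-01-01T00:00:00'))  # prints a 7-character shortlink
```
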
-我们可以调用一个公共的 [REST API](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#表述性状态转移rest):
+
+We'll use a public [**REST API**](https://github.com/donnemartin/system-design-primer#representational-state-transfer-rest):
```
$ curl -X POST --data '{ "expiration_length_in_minutes": "60", \
"paste_contents": "Hello World!" }' https://pastebin.com/api/v1/paste
```
-返回:
+Response:
```
{
@@ -161,16 +161,16 @@ $ curl -X POST --data '{ "expiration_length_in_minutes": "60", \
}
```
-而对于服务器内部的通信,我们可以使用 [RPC](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#远程过程调用协议rpc)。
+For internal communications, we could use [Remote Procedure Calls](https://github.com/donnemartin/system-design-primer#remote-procedure-call-rpc).
-### 用例:用户输入了一个之前粘贴得到的 url,希望浏览其存储的内容
+### Use case: User enters a paste's url and views the contents
-* **客户端**向**Web 服务器**发起读取内容请求
-* **Web 服务器**将请求转发给**Read API**服务
-* **Read API**服务将会:
- * 在**SQL 数据库**中检查生成的 url
- * 如果查询的 url 存在于 **SQL 数据库**中,从**对象存储**服务将对应的粘贴内容取出
- * 否则,给用户返回报错
+* The **Client** sends a get paste request to the **Web Server**
+* The **Web Server** forwards the request to the **Read API** server
+* The **Read API** server does the following:
+ * Checks the **SQL Database** for the generated url
+ * If the url is in the **SQL Database**, fetch the paste contents from the **Object Store**
+ * Else, return an error message for the user
REST API:
@@ -178,7 +178,7 @@ REST API:
$ curl https://pastebin.com/api/v1/paste?shortlink=foobar
```
-返回:
+Response:
```
{
@@ -188,27 +188,27 @@ $ curl https://pastebin.com/api/v1/paste?shortlink=foobar
}
```
-### 用例:对页面进行跟踪分析
+### Use case: Service tracks analytics of pages
-由于不需要进行实时分析,因此我们可以简单地对 **Web 服务**产生的日志用 **MapReduce** 来统计 hit 计数(命中数)。
+Since realtime analytics are not a requirement, we could simply **MapReduce** the **Web Server** logs to generate hit counts.
-**向你的面试官告知你准备写多少代码**。
+**Clarify with your interviewer how much code you are expected to write**.
```python
class HitCounts(MRJob):
def extract_url(self, line):
- """从 log 中取出生成的 url。"""
+ """Extract the generated url from the log line."""
...
def extract_year_month(self, line):
- """返回时间戳中表示年份与月份的一部分"""
+ """Return the year and month portions of the timestamp."""
...
def mapper(self, _, line):
- """解析日志的每一行,提取并转换相关行,
+ """Parse each log line, extract and transform relevant lines.
- 将键值对设定为如下形式:
+ Emit key value pairs of the form:
(2016-01, url0), 1
(2016-01, url0), 1
@@ -218,8 +218,8 @@ class HitCounts(MRJob):
period = self.extract_year_month(line)
yield (period, url), 1
- def reducer(self, key, value):
- """将所有的 key 加起来
+ def reducer(self, key, values):
+ """Sum values for each key.
(2016-01, url0), 2
(2016-01, url1), 1
@@ -227,105 +227,106 @@ class HitCounts(MRJob):
yield key, sum(values)
```
-### 用例:服务删除过期的粘贴内容
+### Use case: Service deletes expired pastes
-我们可以通过扫描 **SQL 数据库**,查找出那些过期时间戳小于当前时间戳的条目,然后在表中删除(或者将其标记为过期)这些过期的粘贴内容。
+To delete expired pastes, we could just scan the **SQL Database** for all entries whose expiration timestamps are older than the current timestamp. All expired entries would then be deleted (or marked as expired) from the table.
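
A sketch of that cleanup pass, run periodically (for example from a cron job). It uses SQLite for a runnable example and assumes `created_at` is stored as a unix timestamp and that a non-zero `expiration_length_in_minutes` marks a paste as expirable; both are assumptions for illustration, since the doc does not pin down those details.

```python
import sqlite3
import time

DELETE_EXPIRED_SQL = """
DELETE FROM pastes
WHERE expiration_length_in_minutes > 0
  AND created_at + expiration_length_in_minutes * 60 < :now
"""


def delete_expired_pastes(db_path='pastebin.db'):
    """Remove pastes whose expiration time has passed."""
    with sqlite3.connect(db_path) as conn:
        conn.execute(DELETE_EXPIRED_SQL, {'now': int(time.time())})
```
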
-## 第四步:架构扩展
+## Step 4: Scale the design
-> 根据限制条件,找到并解决瓶颈。
+> Identify and address bottlenecks, given the constraints.
![Imgur](http://i.imgur.com/4edXG0T.png)
-**重要提示:不要从最初设计直接跳到最终设计中!**
+**Important: Do not simply jump right into the final design from the initial design!**
-现在你要 1) **基准测试、负载测试**。2) **分析、描述**性能瓶颈。3) 在解决瓶颈问题的同时,评估替代方案、权衡利弊。4) 重复以上步骤。请阅读[「设计一个系统,并将其扩大到为数以百万计的 AWS 用户服务」](../scaling_aws/README.md) 来了解如何逐步扩大初始设计。
+State you would do this iteratively: 1) **Benchmark/Load Test**, 2) **Profile** for bottlenecks, 3) address bottlenecks while evaluating alternatives and trade-offs, and 4) repeat. See [Design a system that scales to millions of users on AWS](../scaling_aws/README.md) as a sample on how to iteratively scale the initial design.
-讨论初始设计可能遇到的瓶颈及相关解决方案是很重要的。例如加上一个配置多台 **Web 服务器**的**负载均衡器**是否能够解决问题?**CDN**呢?**主从复制**呢?它们各自的替代方案和需要**权衡**的利弊又有什么呢?
+It's important to discuss what bottlenecks you might encounter with the initial design and how you might address each of them. For example, what issues are addressed by adding a **Load Balancer** with multiple **Web Servers**? **CDN**? **Master-Slave Replicas**? What are the alternatives and **Trade-Offs** for each?
-我们将会介绍一些组件来完成设计,并解决架构扩张问题。内置的负载均衡器将不做讨论以节省篇幅。
+We'll introduce some components to complete the design and to address scalability issues. Internal load balancers are not shown to reduce clutter.
-**为了避免重复讨论**,请参考[系统设计主题索引](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#系统设计主题的索引)相关部分来了解其要点、方案的权衡取舍以及可选的替代方案。
+*To avoid repeating discussions*, refer to the following [system design topics](https://github.com/donnemartin/system-design-primer#index-of-system-design-topics) for main talking points, tradeoffs, and alternatives:
-* [DNS](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#域名系统)
-* [负载均衡器](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#负载均衡器)
-* [水平拓展](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#水平扩展)
-* [反向代理(web 服务器)](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#反向代理web-服务器)
-* [API 服务(应用层)](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#应用层)
-* [缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#缓存)
-* [关系型数据库管理系统 (RDBMS)](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#关系型数据库管理系统rdbms)
-* [SQL 故障主从切换](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#故障切换)
-* [主从复制](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#主从复制)
-* [一致性模式](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#一致性模式)
-* [可用性模式](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#可用性模式)
+* [DNS](https://github.com/donnemartin/system-design-primer#domain-name-system)
+* [CDN](https://github.com/donnemartin/system-design-primer#content-delivery-network)
+* [Load balancer](https://github.com/donnemartin/system-design-primer#load-balancer)
+* [Horizontal scaling](https://github.com/donnemartin/system-design-primer#horizontal-scaling)
+* [Web server (reverse proxy)](https://github.com/donnemartin/system-design-primer#reverse-proxy-web-server)
+* [API server (application layer)](https://github.com/donnemartin/system-design-primer#application-layer)
+* [Cache](https://github.com/donnemartin/system-design-primer#cache)
+* [Relational database management system (RDBMS)](https://github.com/donnemartin/system-design-primer#relational-database-management-system-rdbms)
+* [SQL write master-slave failover](https://github.com/donnemartin/system-design-primer#fail-over)
+* [Master-slave replication](https://github.com/donnemartin/system-design-primer#master-slave-replication)
+* [Consistency patterns](https://github.com/donnemartin/system-design-primer#consistency-patterns)
+* [Availability patterns](https://github.com/donnemartin/system-design-primer#availability-patterns)
-**分析数据库** 可以用现成的数据仓储系统,例如使用 Amazon Redshift 或者 Google BigQuery 的解决方案。
+The **Analytics Database** could use a data warehousing solution such as Amazon Redshift or Google BigQuery.
-Amazon S3 的**对象存储**系统可以很方便地设置每个月限制只允许新增 12.7 GB 的存储内容。
+An **Object Store** such as Amazon S3 can comfortably handle the constraint of 12.7 GB of new content per month.
-平均每秒 40 次的读取请求(峰值将会更高), 可以通过扩展 **内存缓存** 来处理热点内容的读取流量,这对于处理不均匀分布的流量和流量峰值也很有用。只要 SQL 副本不陷入复制-写入困境中,**SQL Read 副本** 基本能够处理缓存命中问题。
+To address the 40 *average* read requests per second (higher at peak), traffic for popular content should be handled by the **Memory Cache** instead of the database. The **Memory Cache** is also useful for handling the unevenly distributed traffic and traffic spikes. The **SQL Read Replicas** should be able to handle the cache misses, as long as the replicas are not bogged down with replicating writes.
-平均每秒 4 次的粘贴写入操作(峰值将会更高)对于单个**SQL 写主-从** 模式来说是可行的。不过,我们也需要考虑其它的 SQL 性能拓展技术:
+4 *average* paste writes per second (higher at peak) should be doable for a single **SQL Write Master-Slave**. Otherwise, we'll need to employ additional SQL scaling patterns:
-* [联合](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#联合)
-* [分片](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#分片)
-* [非规范化](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#非规范化)
-* [SQL 调优](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#sql-调优)
+* [Federation](https://github.com/donnemartin/system-design-primer#federation)
+* [Sharding](https://github.com/donnemartin/system-design-primer#sharding)
+* [Denormalization](https://github.com/donnemartin/system-design-primer#denormalization)
+* [SQL Tuning](https://github.com/donnemartin/system-design-primer#sql-tuning)
-我们也可以考虑将一些数据移至 **NoSQL 数据库**。
+We should also consider moving some data to a **NoSQL Database**.
-## 其它要点
+## Additional talking points
-> 是否深入这些额外的主题,取决于你的问题范围和剩下的时间。
+> Additional topics to dive into, depending on the problem scope and time remaining.
#### NoSQL
-* [键-值存储](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#键-值存储)
-* [文档类型存储](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#文档类型存储)
-* [列型存储](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#列型存储)
-* [图数据库](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#图数据库)
-* [SQL vs NoSQL](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#sql-还是-nosql)
+* [Key-value store](https://github.com/donnemartin/system-design-primer#key-value-store)
+* [Document store](https://github.com/donnemartin/system-design-primer#document-store)
+* [Wide column store](https://github.com/donnemartin/system-design-primer#wide-column-store)
+* [Graph database](https://github.com/donnemartin/system-design-primer#graph-database)
+* [SQL vs NoSQL](https://github.com/donnemartin/system-design-primer#sql-or-nosql)
-### 缓存
+### Caching
-* 在哪缓存
- * [客户端缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#客户端缓存)
- * [CDN 缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#cdn-缓存)
- * [Web 服务器缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#web-服务器缓存)
- * [数据库缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#数据库缓存)
- * [应用缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#应用缓存)
-* 什么需要缓存
- * [数据库查询级别的缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#数据库查询级别的缓存)
- * [对象级别的缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#对象级别的缓存)
-* 何时更新缓存
- * [缓存模式](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#缓存模式)
- * [直写模式](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#直写模式)
- * [回写模式](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#回写模式)
- * [刷新](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#刷新)
+* Where to cache
+ * [Client caching](https://github.com/donnemartin/system-design-primer#client-caching)
+ * [CDN caching](https://github.com/donnemartin/system-design-primer#cdn-caching)
+ * [Web server caching](https://github.com/donnemartin/system-design-primer#web-server-caching)
+ * [Database caching](https://github.com/donnemartin/system-design-primer#database-caching)
+ * [Application caching](https://github.com/donnemartin/system-design-primer#application-caching)
+* What to cache
+ * [Caching at the database query level](https://github.com/donnemartin/system-design-primer#caching-at-the-database-query-level)
+ * [Caching at the object level](https://github.com/donnemartin/system-design-primer#caching-at-the-object-level)
+* When to update the cache
+ * [Cache-aside](https://github.com/donnemartin/system-design-primer#cache-aside)
+ * [Write-through](https://github.com/donnemartin/system-design-primer#write-through)
+ * [Write-behind (write-back)](https://github.com/donnemartin/system-design-primer#write-behind-write-back)
+ * [Refresh ahead](https://github.com/donnemartin/system-design-primer#refresh-ahead)
-### 异步与微服务
+### Asynchronism and microservices
-* [消息队列](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#消息队列)
-* [任务队列](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#任务队列)
-* [背压](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#背压)
-* [微服务](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#微服务)
+* [Message queues](https://github.com/donnemartin/system-design-primer#message-queues)
+* [Task queues](https://github.com/donnemartin/system-design-primer#task-queues)
+* [Back pressure](https://github.com/donnemartin/system-design-primer#back-pressure)
+* [Microservices](https://github.com/donnemartin/system-design-primer#microservices)
-### 通信
+### Communications
-* 可权衡选择的方案:
- * 与客户端的外部通信 - [使用 REST 作为 HTTP API](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#表述性状态转移rest)
- * 服务器内部通信 - [RPC](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#远程过程调用协议rpc)
-* [服务发现](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#服务发现)
+* Discuss tradeoffs:
+ * External communication with clients - [HTTP APIs following REST](https://github.com/donnemartin/system-design-primer#representational-state-transfer-rest)
+ * Internal communications - [RPC](https://github.com/donnemartin/system-design-primer#remote-procedure-call-rpc)
+* [Service discovery](https://github.com/donnemartin/system-design-primer#service-discovery)
-### 安全性
+### Security
-请参阅[「安全」](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#安全)一章。
+Refer to the [security section](https://github.com/donnemartin/system-design-primer#security).
-### 延迟数值
+### Latency numbers
-请参阅[「每个程序员都应该知道的延迟数」](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#每个程序员都应该知道的延迟数)。
+See [Latency numbers every programmer should know](https://github.com/donnemartin/system-design-primer#latency-numbers-every-programmer-should-know).
-### 持续探讨
+### Ongoing
-* 持续进行基准测试并监控你的系统,以解决他们提出的瓶颈问题。
-* 架构拓展是一个迭代的过程。
+* Continue benchmarking and monitoring your system to address bottlenecks as they come up
+* Scaling is an iterative process
diff --git a/solutions/system_design/query_cache/README.md b/solutions/system_design/query_cache/README.md
index c6f4be75..032adf34 100644
--- a/solutions/system_design/query_cache/README.md
+++ b/solutions/system_design/query_cache/README.md
@@ -1,101 +1,101 @@
-# 设计一个键-值缓存来存储最近 web 服务查询的结果
+# Design a key-value cache to save the results of the most recent web server queries
-**注意:这个文档中的链接会直接指向[系统设计主题索引](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#系统设计主题的索引)中的有关部分,以避免重复的内容。你可以参考链接的相关内容,来了解其总的要点、方案的权衡取舍以及可选的替代方案。**
+*Note: This document links directly to relevant areas found in the [system design topics](https://github.com/donnemartin/system-design-primer#index-of-system-design-topics) to avoid duplication. Refer to the linked content for general talking points, tradeoffs, and alternatives.*
-## 第一步:简述用例与约束条件
+## Step 1: Outline use cases and constraints
-> 搜集需求与问题的范围。
-> 提出问题来明确用例与约束条件。
-> 讨论假设。
+> Gather requirements and scope the problem.
+> Ask questions to clarify use cases and constraints.
+> Discuss assumptions.
-我们将在没有面试官明确说明问题的情况下,自己定义一些用例以及限制条件。
+Without an interviewer to address clarifying questions, we'll define some use cases and constraints.
-### 用例
+### Use cases
-#### 我们将把问题限定在仅处理以下用例的范围中
+#### We'll scope the problem to handle only the following use cases
-* **用户**发送一个搜索请求,命中缓存
-* **用户**发送一个搜索请求,未命中缓存
-* **服务**有着高可用性
+* **User** sends a search request resulting in a cache hit
+* **User** sends a search request resulting in a cache miss
+* **Service** has high availability
-### 限制条件与假设
+### Constraints and assumptions
-#### 提出假设
+#### State assumptions
-* 网络流量不是均匀分布的
- * 经常被查询的内容应该一直存于缓存中
- * 需要确定如何规定缓存过期、缓存刷新规则
-* 缓存提供的服务查询速度要快
-* 机器间延迟较低
-* 缓存有内存限制
- * 需要决定缓存什么、移除什么
- * 需要缓存百万级的查询
-* 1000 万用户
-* 每个月 100 亿次查询
+* Traffic is not evenly distributed
+ * Popular queries should almost always be in the cache
+ * Need to determine how to expire/refresh
+* Serving from cache requires fast lookups
+* Low latency between machines
+* Limited memory in cache
+ * Need to determine what to keep/remove
+ * Need to cache millions of queries
+* 10 million users
+* 10 billion queries per month
-#### 计算用量
+#### Calculate usage
-**如果你需要进行粗略的用量计算,请向你的面试官说明。**
+**Clarify with your interviewer if you should run back-of-the-envelope usage calculations.**
-* 缓存存储的是键值对有序表,键为 `query`(查询),值为 `results`(结果)。
- * `query` - 50 字节
- * `title` - 20 字节
- * `snippet` - 200 字节
- * 总计:270 字节
-* 假如 100 亿次查询都是不同的,且全部需要存储,那么每个月需要 2.7 TB 的缓存空间
- * 单次查询 270 字节 * 每月查询 100 亿次
- * 假设内存大小有限制,需要决定如何制定缓存过期规则
-* 每秒 4,000 次请求
+* Cache stores an ordered list of entries with key: `query`, value: `results`
+ * `query` - 50 bytes
+ * `title` - 20 bytes
+ * `snippet` - 200 bytes
+ * Total: 270 bytes
+* 2.7 TB of cache data per month if all 10 billion queries are unique and all are stored
+ * 270 bytes per search * 10 billion searches per month
+ * Assumptions state limited memory, need to determine how to expire contents
+* 4,000 requests per second
-便利换算指南:
+Handy conversion guide:
-* 每个月有 250 万秒
-* 每秒一个请求 = 每个月 250 万次请求
-* 每秒 40 个请求 = 每个月 1 亿次请求
-* 每秒 400 个请求 = 每个月 10 亿次请求
+* 2.5 million seconds per month
+* 1 request per second = 2.5 million requests per month
+* 40 requests per second = 100 million requests per month
+* 400 requests per second = 1 billion requests per month
-## 第二步:概要设计
+## Step 2: Create a high level design
-> 列出所有重要组件以规划概要设计。
+> Outline a high level design with all important components.
![Imgur](http://i.imgur.com/KqZ3dSx.png)
-## 第三步:设计核心组件
+## Step 3: Design core components
-> 深入每个核心组件的细节。
+> Dive into details for each core component.
-### 用例:用户发送了一次请求,命中了缓存
+### Use case: User sends a request resulting in a cache hit
-常用的查询可以由例如 Redis 或者 Memcached 之类的**内存缓存**提供支持,以减少数据读取延迟,并且避免**反向索引服务**以及**文档服务**的过载。从内存读取 1 MB 连续数据大约要花 250 微秒,而从 SSD 读取同样大小的数据要花费 4 倍的时间,从机械硬盘读取需要花费 80 倍以上的时间。1
+Popular queries can be served from a **Memory Cache** such as Redis or Memcached to reduce read latency and to avoid overloading the **Reverse Index Service** and **Document Service**. Reading 1 MB sequentially from memory takes about 250 microseconds, while reading from SSD takes 4x and from disk takes 80x longer.1
-由于缓存容量有限,我们将使用 LRU(近期最少使用算法)来控制缓存的过期。
+Since the cache has limited capacity, we'll use a least recently used (LRU) approach to expire older entries.
-* **客户端**向运行[反向代理](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#反向代理web-服务器)的 **Web 服务器**发送一个请求
-* 这个 **Web 服务器**将请求转发给**查询 API** 服务
-* **查询 API** 服务将会做这些事情:
- * 分析查询
- * 移除多余的内容
- * 将文本分割成词组
- * 修正拼写错误
- * 规范化字母的大小写
- * 将查询转换为布尔运算
- * 检测**内存缓存**是否有匹配查询的内容
- * 如果命中**内存缓存**,**内存缓存**将会做以下事情:
- * 将缓存入口的位置指向 LRU 链表的头部
- * 返回缓存内容
- * 否则,**查询 API** 将会做以下事情:
- * 使用**反向索引服务**来查找匹配查询的文档
- * **反向索引服务**对匹配到的结果进行排名,然后返回最符合的结果
- * 使用**文档服务**返回文章标题与片段
- * 更新**内存缓存**,存入内容,将**内存缓存**入口位置指向 LRU 链表的头部
+* The **Client** sends a request to the **Web Server**, running as a [reverse proxy](https://github.com/donnemartin/system-design-primer#reverse-proxy-web-server)
+* The **Web Server** forwards the request to the **Query API** server
+* The **Query API** server does the following:
+ * Parses the query
+ * Removes markup
+ * Breaks up the text into terms
+ * Fixes typos
+ * Normalizes capitalization
+ * Converts the query to use boolean operations
+ * Checks the **Memory Cache** for the content matching the query
+ * If there's a hit in the **Memory Cache**, the **Memory Cache** does the following:
+ * Updates the cached entry's position to the front of the LRU list
+ * Returns the cached contents
+ * Else, the **Query API** does the following:
+ * Uses the **Reverse Index Service** to find documents matching the query
+ * The **Reverse Index Service** ranks the matching results and returns the top ones
+ * Uses the **Document Service** to return titles and snippets
+ * Updates the **Memory Cache** with the contents, placing the entry at the front of the LRU list
-#### 缓存的实现
+#### Cache implementation
-缓存可以使用双向链表实现:新元素将会在头结点加入,过期的元素将会在尾节点被删除。我们使用哈希表以便能够快速查找每个链表节点。
+The cache can use a doubly-linked list: new items will be added to the head while items to expire will be removed from the tail. We'll use a hash table for fast lookups to each linked list node.
-**向你的面试官告知你准备写多少代码**。
+**Clarify with your interviewer how much code you are expected to write**.
-实现**查询 API 服务**:
+**Query API Server** implementation:
```python
class QueryApi(object):
@@ -105,8 +105,8 @@ class QueryApi(object):
self.reverse_index_service = reverse_index_service
def parse_query(self, query):
- """移除多余内容,将文本分割成词组,修复拼写错误,
- 规范化字母大小写,转换布尔运算。
+ """Remove markup, break text into terms, deal with typos,
+ normalize capitalization, convert to use boolean operations.
"""
...
@@ -119,7 +119,7 @@ class QueryApi(object):
return results
```
-实现**节点**:
+**Node** implementation:
```python
class Node(object):
@@ -129,7 +129,7 @@ class Node(object):
self.results = results
```
-实现**链表**:
+**LinkedList** implementation:
```python
class LinkedList(object):
@@ -148,7 +148,7 @@ class LinkedList(object):
...
```
-实现**缓存**:
+**Cache** implementation:
```python
class Cache(object):
@@ -160,9 +160,9 @@ class Cache(object):
self.linked_list = LinkedList()
    def get(self, query):
- """从缓存取得存储的内容
+ """Get the stored query result from the cache.
- 将入口节点位置更新为 LRU 链表的头部。
+ Accessing a node updates its position to the front of the LRU list.
"""
        node = self.lookup.get(query)
if node is None:
@@ -171,136 +171,136 @@ class Cache(object):
return node.results
def set(self, results, query):
- """将所给查询键的结果存在缓存中。
+ """Set the result for the given query key in the cache.
- 当更新缓存记录的时候,将它的位置指向 LRU 链表的头部。
- 如果这个记录是新的记录,并且缓存空间已满,应该在加入新记录前
- 删除最老的记录。
+ When updating an entry, updates its position to the front of the LRU list.
+ If the entry is new and the cache is at capacity, removes the oldest entry
+ before the new entry is added.
"""
        node = self.lookup.get(query)
if node is not None:
- # 键存在于缓存中,更新它对应的值
+ # Key exists in cache, update the value
node.results = results
self.linked_list.move_to_front(node)
else:
- # 键不存在于缓存中
+ # Key does not exist in cache
if self.size == self.MAX_SIZE:
- # 在链表中查找并删除最老的记录
+ # Remove the oldest entry from the linked list and lookup
self.lookup.pop(self.linked_list.tail.query, None)
self.linked_list.remove_from_tail()
else:
self.size += 1
- # 添加新的键值对
+ # Add the new key and value
new_node = Node(query, results)
self.linked_list.append_to_front(new_node)
self.lookup[query] = new_node
```
-#### 何时更新缓存
+#### When to update the cache
-缓存将会在以下几种情况更新:
+The cache should be updated when:
-* 页面内容发生变化
-* 页面被移除或者加入了新页面
-* 页面的权值发生变动
+* The page contents change
+* The page is removed or a new page is added
+* The page rank changes
-解决这些问题的最直接的方法,就是为缓存记录设置一个它在被更新前能留在缓存中的最长时间,这个时间简称为存活时间(TTL)。
+The most straightforward way to handle these cases is to simply set a max time that a cached entry can stay in the cache before it is updated, usually referred to as time to live (TTL).
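
For example, with Redis a TTL can be attached when an entry is written so that stale results age out automatically; the key naming and the 600-second TTL below are arbitrary illustrative choices.

```python
import json

import redis

cache = redis.Redis(host='localhost', port=6379)
TTL_SECONDS = 600  # arbitrary: a cached query is refreshed at most every 10 minutes


def cache_results(query, results):
    cache.setex('query:' + query, TTL_SECONDS, json.dumps(results))


def get_cached_results(query):
    cached = cache.get('query:' + query)
    return json.loads(cached) if cached is not None else None
```
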
-参考 [「何时更新缓存」](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#何时更新缓存)来了解其权衡取舍及替代方案。以上方法在[缓存模式](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#缓存模式)一章中详细地进行了描述。
+Refer to [When to update the cache](https://github.com/donnemartin/system-design-primer#when-to-update-the-cache) for tradeoffs and alternatives. The approach above describes [cache-aside](https://github.com/donnemartin/system-design-primer#cache-aside).
-## 第四步:架构扩展
+## Step 4: Scale the design
-> 根据限制条件,找到并解决瓶颈。
+> Identify and address bottlenecks, given the constraints.
![Imgur](http://i.imgur.com/4j99mhe.png)
-**重要提示:不要从最初设计直接跳到最终设计中!**
+**Important: Do not simply jump right into the final design from the initial design!**
-现在你要 1) **基准测试、负载测试**。2) **分析、描述**性能瓶颈。3) 在解决瓶颈问题的同时,评估替代方案、权衡利弊。4) 重复以上步骤。请阅读[「设计一个系统,并将其扩大到为数以百万计的 AWS 用户服务」](../scaling_aws/README.md) 来了解如何逐步扩大初始设计。
+State you would 1) **Benchmark/Load Test**, 2) **Profile** for bottlenecks, 3) address bottlenecks while evaluating alternatives and trade-offs, and 4) repeat. See [Design a system that scales to millions of users on AWS](../scaling_aws/README.md) as a sample on how to iteratively scale the initial design.
-讨论初始设计可能遇到的瓶颈及相关解决方案是很重要的。例如加上一个配置多台 **Web 服务器**的**负载均衡器**是否能够解决问题?**CDN**呢?**主从复制**呢?它们各自的替代方案和需要**权衡**的利弊又有什么呢?
+It's important to discuss what bottlenecks you might encounter with the initial design and how you might address each of them. For example, what issues are addressed by adding a **Load Balancer** with multiple **Web Servers**? **CDN**? **Master-Slave Replicas**? What are the alternatives and **Trade-Offs** for each?
-我们将会介绍一些组件来完成设计,并解决架构扩张问题。内置的负载均衡器将不做讨论以节省篇幅。
+We'll introduce some components to complete the design and to address scalability issues. Internal load balancers are not shown to reduce clutter.
-**为了避免重复讨论**,请参考[系统设计主题索引](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#系统设计主题的索引)相关部分来了解其要点、方案的权衡取舍以及可选的替代方案。
+*To avoid repeating discussions*, refer to the following [system design topics](https://github.com/donnemartin/system-design-primer#index-of-system-design-topics) for main talking points, tradeoffs, and alternatives:
-* [DNS](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#域名系统)
-* [负载均衡器](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#负载均衡器)
-* [水平拓展](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#水平扩展)
-* [反向代理(web 服务器)](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#反向代理web-服务器)
-* [API 服务(应用层)](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#应用层)
-* [缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#缓存)
-* [一致性模式](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#一致性模式)
-* [可用性模式](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#可用性模式)
+* [DNS](https://github.com/donnemartin/system-design-primer#domain-name-system)
+* [Load balancer](https://github.com/donnemartin/system-design-primer#load-balancer)
+* [Horizontal scaling](https://github.com/donnemartin/system-design-primer#horizontal-scaling)
+* [Web server (reverse proxy)](https://github.com/donnemartin/system-design-primer#reverse-proxy-web-server)
+* [API server (application layer)](https://github.com/donnemartin/system-design-primer#application-layer)
+* [Cache](https://github.com/donnemartin/system-design-primer#cache)
+* [Consistency patterns](https://github.com/donnemartin/system-design-primer#consistency-patterns)
+* [Availability patterns](https://github.com/donnemartin/system-design-primer#availability-patterns)
-### 将内存缓存扩大到多台机器
+### Expanding the Memory Cache to many machines
-为了解决庞大的请求负载以及巨大的内存需求,我们将要对架构进行水平拓展。如何在我们的**内存缓存**集群中存储数据呢?我们有以下三个主要可选方案:
+To handle the heavy request load and the large amount of memory needed, we'll scale horizontally. We have three main options on how to store the data on our **Memory Cache** cluster:
-* **缓存集群中的每一台机器都有自己的缓存** - 简单,但是它会降低缓存命中率。
-* **缓存集群中的每一台机器都有缓存的拷贝** - 简单,但是它的内存使用效率太低了。
-* **对缓存进行[分片](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#分片),分别部署在缓存集群中的所有机器中** - 更加复杂,但是它是最佳的选择。我们可以使用哈希,用查询语句 `machine = hash(query)` 来确定哪台机器有需要缓存。当然我们也可以使用[一致性哈希](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#正在完善中)。
+* **Each machine in the cache cluster has its own cache** - Simple, although it will likely result in a low cache hit rate.
+* **Each machine in the cache cluster has a copy of the cache** - Simple, although it is an inefficient use of memory.
+* **The cache is [sharded](https://github.com/donnemartin/system-design-primer#sharding) across all machines in the cache cluster** - More complex, although it is likely the best option. We could use hashing to determine which machine could have the cached results of a query using `machine = hash(query)`. We'll likely want to use [consistent hashing](https://github.com/donnemartin/system-design-primer#under-development).
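
A minimal sketch of the `machine = hash(query)` idea from the last option above. With simple modulo hashing, changing the cluster size remaps most keys, which is exactly the problem consistent hashing addresses; the shard count here is a hypothetical value.

```python
import hashlib

NUM_CACHE_SHARDS = 16  # hypothetical cluster size


def shard_for_query(query):
    """Map a query deterministically to one of the cache machines."""
    digest = hashlib.md5(query.encode()).hexdigest()
    return int(digest, 16) % NUM_CACHE_SHARDS


shard = shard_for_query('restaurants near me')  # the same query always routes to the same shard
```
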
-## 其它要点
+## Additional talking points
-> 是否深入这些额外的主题,取决于你的问题范围和剩下的时间。
+> Additional topics to dive into, depending on the problem scope and time remaining.
-### SQL 缩放模式
+### SQL scaling patterns
-* [读取复制](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#主从复制)
-* [联合](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#联合)
-* [分片](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#分片)
-* [非规范化](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#非规范化)
-* [SQL 调优](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#sql-调优)
+* [Read replicas](https://github.com/donnemartin/system-design-primer#master-slave-replication)
+* [Federation](https://github.com/donnemartin/system-design-primer#federation)
+* [Sharding](https://github.com/donnemartin/system-design-primer#sharding)
+* [Denormalization](https://github.com/donnemartin/system-design-primer#denormalization)
+* [SQL Tuning](https://github.com/donnemartin/system-design-primer#sql-tuning)
#### NoSQL
-* [键-值存储](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#键-值存储)
-* [文档类型存储](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#文档类型存储)
-* [列型存储](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#列型存储)
-* [图数据库](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#图数据库)
-* [SQL vs NoSQL](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#sql-还是-nosql)
+* [Key-value store](https://github.com/donnemartin/system-design-primer#key-value-store)
+* [Document store](https://github.com/donnemartin/system-design-primer#document-store)
+* [Wide column store](https://github.com/donnemartin/system-design-primer#wide-column-store)
+* [Graph database](https://github.com/donnemartin/system-design-primer#graph-database)
+* [SQL vs NoSQL](https://github.com/donnemartin/system-design-primer#sql-or-nosql)
-### 缓存
+### Caching
-* 在哪缓存
- * [客户端缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#客户端缓存)
- * [CDN 缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#cdn-缓存)
- * [Web 服务器缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#web-服务器缓存)
- * [数据库缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#数据库缓存)
- * [应用缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#应用缓存)
-* 什么需要缓存
- * [数据库查询级别的缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#数据库查询级别的缓存)
- * [对象级别的缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#对象级别的缓存)
-* 何时更新缓存
- * [缓存模式](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#缓存模式)
- * [直写模式](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#直写模式)
- * [回写模式](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#回写模式)
- * [刷新](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#刷新)
+* Where to cache
+ * [Client caching](https://github.com/donnemartin/system-design-primer#client-caching)
+ * [CDN caching](https://github.com/donnemartin/system-design-primer#cdn-caching)
+ * [Web server caching](https://github.com/donnemartin/system-design-primer#web-server-caching)
+ * [Database caching](https://github.com/donnemartin/system-design-primer#database-caching)
+ * [Application caching](https://github.com/donnemartin/system-design-primer#application-caching)
+* What to cache
+ * [Caching at the database query level](https://github.com/donnemartin/system-design-primer#caching-at-the-database-query-level)
+ * [Caching at the object level](https://github.com/donnemartin/system-design-primer#caching-at-the-object-level)
+* When to update the cache
+ * [Cache-aside](https://github.com/donnemartin/system-design-primer#cache-aside)
+ * [Write-through](https://github.com/donnemartin/system-design-primer#write-through)
+ * [Write-behind (write-back)](https://github.com/donnemartin/system-design-primer#write-behind-write-back)
+ * [Refresh ahead](https://github.com/donnemartin/system-design-primer#refresh-ahead)
-### 异步与微服务
+### Asynchronism and microservices
-* [消息队列](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#消息队列)
-* [任务队列](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#任务队列)
-* [背压](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#背压)
-* [微服务](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#微服务)
+* [Message queues](https://github.com/donnemartin/system-design-primer#message-queues)
+* [Task queues](https://github.com/donnemartin/system-design-primer#task-queues)
+* [Back pressure](https://github.com/donnemartin/system-design-primer#back-pressure)
+* [Microservices](https://github.com/donnemartin/system-design-primer#microservices)
-### 通信
+### Communications
-* 可权衡选择的方案:
- * 与客户端的外部通信 - [使用 REST 作为 HTTP API](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#表述性状态转移rest)
- * 服务器内部通信 - [RPC](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#远程过程调用协议rpc)
-* [服务发现](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#服务发现)
+* Discuss tradeoffs:
+ * External communication with clients - [HTTP APIs following REST](https://github.com/donnemartin/system-design-primer#representational-state-transfer-rest)
+ * Internal communications - [RPC](https://github.com/donnemartin/system-design-primer#remote-procedure-call-rpc)
+* [Service discovery](https://github.com/donnemartin/system-design-primer#service-discovery)
-### 安全性
+### Security
-请参阅[「安全」](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#安全)一章。
+Refer to the [security section](https://github.com/donnemartin/system-design-primer#security).
-### 延迟数值
+### Latency numbers
-请参阅[「每个程序员都应该知道的延迟数」](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#每个程序员都应该知道的延迟数)。
+See [Latency numbers every programmer should know](https://github.com/donnemartin/system-design-primer#latency-numbers-every-programmer-should-know).
-### 持续探讨
+### Ongoing
-* 持续进行基准测试并监控你的系统,以解决他们提出的瓶颈问题。
-* 架构拓展是一个迭代的过程。
+* Continue benchmarking and monitoring your system to address bottlenecks as they come up
+* Scaling is an iterative process
diff --git a/solutions/system_design/sales_rank/README.md b/solutions/system_design/sales_rank/README.md
index 960f9258..71ad1c7d 100644
--- a/solutions/system_design/sales_rank/README.md
+++ b/solutions/system_design/sales_rank/README.md
@@ -1,88 +1,88 @@
-# 为 Amazon 设计分类售卖排行
+# Design Amazon's sales rank by category feature
-**注意:这个文档中的链接会直接指向[系统设计主题索引](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#系统设计主题的索引)中的有关部分,以避免重复的内容。你可以参考链接的相关内容,来了解其总的要点、方案的权衡取舍以及可选的替代方案。**
+*Note: This document links directly to relevant areas found in the [system design topics](https://github.com/donnemartin/system-design-primer#index-of-system-design-topics) to avoid duplication. Refer to the linked content for general talking points, tradeoffs, and alternatives.*
-## 第一步:简述用例与约束条件
+## Step 1: Outline use cases and constraints
-> 搜集需求与问题的范围。
-> 提出问题来明确用例与约束条件。
-> 讨论假设。
+> Gather requirements and scope the problem.
+> Ask questions to clarify use cases and constraints.
+> Discuss assumptions.
-我们将在没有面试官明确说明问题的情况下,自己定义一些用例以及限制条件。
+Without an interviewer to address clarifying questions, we'll define some use cases and constraints.
-### 用例
+### Use cases
-#### 我们将把问题限定在仅处理以下用例的范围中
+#### We'll scope the problem to handle only the following use case
-* **服务**根据分类计算过去一周中最受欢迎的商品
-* **用户**通过分类浏览过去一周中最受欢迎的商品
-* **服务**有着高可用性
+* **Service** calculates the past week's most popular products by category
+* **User** views the past week's most popular products by category
+* **Service** has high availability
-#### 不在用例范围内的有
+#### Out of scope
-* 一般的电商网站
- * 只为售卖排行榜设计组件
+* The general e-commerce site
+ * Design components only for calculating sales rank
-### 限制条件与假设
+### Constraints and assumptions
-#### 提出假设
+#### State assumptions
-* 网络流量不是均匀分布的
-* 一个商品可能存在于多个分类中
-* 商品不能够更改分类
-* 不会存在如 `foo/bar/baz` 之类的子分类
-* 每小时更新一次结果
- * 受欢迎的商品越多,就需要更频繁地更新
-* 1000 万个商品
-* 1000 个分类
-* 每个月 10 亿次交易
-* 每个月 1000 亿次读取请求
-* 100:1 的读写比例
+* Traffic is not evenly distributed
+* Items can be in multiple categories
+* Items cannot change categories
+* There are no subcategories, i.e. `foo/bar/baz`
+* Results must be updated hourly
+ * More popular products might need to be updated more frequently
+* 10 million products
+* 1000 categories
+* 1 billion transactions per month
+* 100 billion read requests per month
+* 100:1 read to write ratio
-#### 计算用量
+#### Calculate usage
-**如果你需要进行粗略的用量计算,请向你的面试官说明。**
+**Clarify with your interviewer if you should run back-of-the-envelope usage calculations.**
-* 每笔交易的用量:
- * `created_at` - 5 字节
- * `product_id` - 8 字节
- * `category_id` - 4 字节
- * `seller_id` - 8 字节
- * `buyer_id` - 8 字节
- * `quantity` - 4 字节
- * `total_price` - 5 字节
- * 总计:大约 40 字节
-* 每个月的交易内容会产生 40 GB 的记录
- * 每次交易 40 字节 * 每个月 10 亿次交易
- * 3年内产生了 1.44 TB 的新交易内容记录
- * 假定大多数的交易都是新交易而不是更改以前进行完的交易
-* 平均每秒 400 次交易次数
-* 平均每秒 40,000 次读取请求
+* Size per transaction:
+ * `created_at` - 5 bytes
+ * `product_id` - 8 bytes
+ * `category_id` - 4 bytes
+ * `seller_id` - 8 bytes
+ * `buyer_id` - 8 bytes
+ * `quantity` - 4 bytes
+ * `total_price` - 5 bytes
+ * Total: ~40 bytes
+* 40 GB of new transaction content per month
+ * 40 bytes per transaction * 1 billion transactions per month
+ * 1.44 TB of new transaction content in 3 years
+ * Assume most are new transactions instead of updates to existing ones
+* 400 transactions per second on average
+* 40,000 read requests per second on average
-便利换算指南:
+Handy conversion guide:
-* 每个月有 250 万秒
-* 每秒一个请求 = 每个月 250 万次请求
-* 每秒 40 个请求 = 每个月 1 亿次请求
-* 每秒 400 个请求 = 每个月 10 亿次请求
+* 2.5 million seconds per month
+* 1 request per second = 2.5 million requests per month
+* 40 requests per second = 100 million requests per month
+* 400 requests per second = 1 billion requests per month
-## 第二步:概要设计
+## Step 2: Create a high level design
-> 列出所有重要组件以规划概要设计。
+> Outline a high level design with all important components.
![Imgur](http://i.imgur.com/vwMa1Qu.png)
-## 第三步:设计核心组件
+## Step 3: Design core components
-> 深入每个核心组件的细节。
+> Dive into details for each core component.
-### 用例:服务需要根据分类计算上周最受欢迎的商品
+### Use case: Service calculates the past week's most popular products by category
-我们可以在现成的**对象存储**系统(例如 Amazon S3 服务)中存储 **售卖 API** 服务产生的日志文本, 因此不需要我们自己搭建分布式文件系统了。
+We could store the raw **Sales API** server log files on a managed **Object Store** such as Amazon S3, rather than managing our own distributed file system.
-**向你的面试官告知你准备写多少代码**。
+**Clarify with your interviewer how much code you are expected to write**.
-假设下面是一个用 tab 分割的简易的日志记录:
+We'll assume this is a sample log entry, tab delimited:
```
timestamp product_id category_id qty total_price seller_id buyer_id
@@ -95,25 +95,24 @@ t5 product4 category1 1 5.00 5 6
...
```
-**售卖排行服务** 需要用到 **MapReduce**,并使用 **售卖 API** 服务进行日志记录,同时将结果写入 **SQL 数据库**中的总表 `sales_rank` 中。我们也可以讨论一下[究竟是用 SQL 还是用 NoSQL](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#sql-还是-nosql)。
+The **Sales Rank Service** could use **MapReduce**, using the **Sales API** server log files as input and writing the results to an aggregate table `sales_rank` in a **SQL Database**. We should discuss the [use cases and tradeoffs between choosing SQL or NoSQL](https://github.com/donnemartin/system-design-primer#sql-or-nosql).
-我们需要通过以下步骤使用 **MapReduce**:
+We'll use a multi-step **MapReduce**:
-* **第 1 步** - 将数据转换为 `(category, product_id), sum(quantity)` 的形式
-* **第 2 步** - 执行分布式排序
+* **Step 1** - Transform the data to `(category, product_id), sum(quantity)`
+* **Step 2** - Perform a distributed sort
```python
class SalesRanker(MRJob):
def within_past_week(self, timestamp):
- """如果时间戳属于过去的一周则返回 True,
- 否则返回 False。"""
+ """Return True if timestamp is within past week, False otherwise."""
...
    def mapper(self, _, line):
- """解析日志的每一行,提取并转换相关行,
+ """Parse each log line, extract and transform relevant lines.
- 将键值对设定为如下形式:
+ Emit key value pairs of the form:
(category1, product1), 2
(category2, product1), 2
@@ -128,7 +127,7 @@ class SalesRanker(MRJob):
yield (category_id, product_id), quantity
    def reducer(self, key, values):
- """将每个 key 的值加起来。
+ """Sum values for each key.
(category1, product1), 2
(category2, product1), 3
@@ -139,9 +138,9 @@ class SalesRanker(MRJob):
yield key, sum(values)
def mapper_sort(self, key, value):
- """构造 key 以确保正确的排序。
+ """Construct key to ensure proper sorting.
- 将键值对转换成如下形式:
+ Transform key and value to the form:
(category1, 2), product1
(category2, 3), product1
@@ -149,8 +148,8 @@ class SalesRanker(MRJob):
(category2, 7), product3
(category1, 1), product4
- MapReduce 的随机排序步骤会将键
- 值的排序打乱,变成下面这样:
+ The shuffle/sort step of MapReduce will then do a
+ distributed sort on the keys, resulting in:
(category1, 1), product4
(category1, 2), product1
@@ -166,7 +165,7 @@ class SalesRanker(MRJob):
yield key, value
def steps(self):
- """ 此处为 map reduce 步骤"""
+ """Run the map and reduce steps."""
return [
self.mr(mapper=self.mapper,
reducer=self.reducer),
@@ -175,7 +174,7 @@ class SalesRanker(MRJob):
]
```
-得到的结果将会是如下的排序列,我们将其插入 `sales_rank` 表中:
+The result would be the following sorted list, which we could insert into the `sales_rank` table:
```
(category1, 1), product4
@@ -185,7 +184,7 @@ class SalesRanker(MRJob):
(category2, 7), product3
```
-`sales_rank` 表的数据结构如下:
+The `sales_rank` table could have the following structure:
```
id int NOT NULL AUTO_INCREMENT
@@ -197,21 +196,21 @@ FOREIGN KEY(category_id) REFERENCES Categories(id)
FOREIGN KEY(product_id) REFERENCES Products(id)
```
-我们会以 `id`、`category_id` 与 `product_id` 创建一个 [索引](https://github.com/donnemartin/system-design-primer#use-good-indices)以加快查询速度(只需要使用读取日志的时间,不再需要每次都扫描整个数据表)并让数据常驻内存。从内存读取 1 MB 连续数据大约要花 250 微秒,而从 SSD 读取同样大小的数据要花费 4 倍的时间,从机械硬盘读取需要花费 80 倍以上的时间。1
+We'll create an [index](https://github.com/donnemartin/system-design-primer#use-good-indices) on `id`, `category_id`, and `product_id` to speed up lookups (log-time instead of scanning the entire table) and to keep the data in memory. Reading 1 MB sequentially from memory takes about 250 microseconds, while reading from SSD takes 4x and from disk takes 80x longer.<sup>1</sup>
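As a concrete illustration (the index name is invented and the exact syntax depends on the SQL dialect), this could look something like `CREATE INDEX idx_category_product ON sales_rank (category_id, product_id)`, in addition to the primary key index that `id` already provides.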
-### 用例:用户需要根据分类浏览上周中最受欢迎的商品
+### Use case: User views the past week's most popular products by category
-* **客户端**向运行[反向代理](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#反向代理web-服务器)的 **Web 服务器**发送一个请求
-* 这个 **Web 服务器**将请求转发给**查询 API** 服务
-* The **查询 API** 服务将从 **SQL 数据库**的 `sales_rank` 表中读取数据
+* The **Client** sends a request to the **Web Server**, running as a [reverse proxy](https://github.com/donnemartin/system-design-primer#reverse-proxy-web-server)
+* The **Web Server** forwards the request to the **Read API** server
+* The **Read API** server reads from the **SQL Database** `sales_rank` table
-我们可以调用一个公共的 [REST API](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#表述性状态转移rest):
+We'll use a public [**REST API**](https://github.com/donnemartin/system-design-primer#representational-state-transfer-rest):
```
$ curl https://amazon.com/api/v1/popular?category_id=1234
```
-返回:
+Response:
```
{
@@ -234,105 +233,106 @@ $ curl https://amazon.com/api/v1/popular?category_id=1234
},
```
-而对于服务器内部的通信,我们可以使用 [RPC](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#远程过程调用协议rpc)。
+For internal communications, we could use [Remote Procedure Calls](https://github.com/donnemartin/system-design-primer#remote-procedure-call-rpc).
-## 第四步:架构扩展
+## Step 4: Scale the design
-> 根据限制条件,找到并解决瓶颈。
+> Identify and address bottlenecks, given the constraints.
![Imgur](http://i.imgur.com/MzExP06.png)
-**重要提示:不要从最初设计直接跳到最终设计中!**
+**Important: Do not simply jump right into the final design from the initial design!**
-现在你要 1) **基准测试、负载测试**。2) **分析、描述**性能瓶颈。3) 在解决瓶颈问题的同时,评估替代方案、权衡利弊。4) 重复以上步骤。请阅读[「设计一个系统,并将其扩大到为数以百万计的 AWS 用户服务」](../scaling_aws/README.md) 来了解如何逐步扩大初始设计。
+State you would 1) **Benchmark/Load Test**, 2) **Profile** for bottlenecks, 3) address bottlenecks while evaluating alternatives and trade-offs, and 4) repeat. See [Design a system that scales to millions of users on AWS](../scaling_aws/README.md) as an example of how to iteratively scale the initial design.
-讨论初始设计可能遇到的瓶颈及相关解决方案是很重要的。例如加上一个配置多台 **Web 服务器**的**负载均衡器**是否能够解决问题?**CDN**呢?**主从复制**呢?它们各自的替代方案和需要**权衡**的利弊又有什么呢?
+It's important to discuss what bottlenecks you might encounter with the initial design and how you might address each of them. For example, what issues are addressed by adding a **Load Balancer** with multiple **Web Servers**? **CDN**? **Master-Slave Replicas**? What are the alternatives and **Trade-Offs** for each?
-我们将会介绍一些组件来完成设计,并解决架构扩张问题。内置的负载均衡器将不做讨论以节省篇幅。
+We'll introduce some components to complete the design and to address scalability issues. Internal load balancers are not shown to reduce clutter.
-**为了避免重复讨论**,请参考[系统设计主题索引](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#系统设计主题的索引)相关部分来了解其要点、方案的权衡取舍以及可选的替代方案。
+*To avoid repeating discussions*, refer to the following [system design topics](https://github.com/donnemartin/system-design-primer#index-of-system-design-topics) for main talking points, tradeoffs, and alternatives:
-* [DNS](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#域名系统)
-* [负载均衡器](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#负载均衡器)
-* [水平拓展](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#水平扩展)
-* [反向代理(web 服务器)](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#反向代理web-服务器)
-* [API 服务(应用层)](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#应用层)
-* [缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#缓存)
-* [关系型数据库管理系统 (RDBMS)](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#关系型数据库管理系统rdbms)
-* [SQL 故障主从切换](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#故障切换)
-* [主从复制](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#主从复制)
-* [一致性模式](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#一致性模式)
-* [可用性模式](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#可用性模式)
+* [DNS](https://github.com/donnemartin/system-design-primer#domain-name-system)
+* [CDN](https://github.com/donnemartin/system-design-primer#content-delivery-network)
+* [Load balancer](https://github.com/donnemartin/system-design-primer#load-balancer)
+* [Horizontal scaling](https://github.com/donnemartin/system-design-primer#horizontal-scaling)
+* [Web server (reverse proxy)](https://github.com/donnemartin/system-design-primer#reverse-proxy-web-server)
+* [API server (application layer)](https://github.com/donnemartin/system-design-primer#application-layer)
+* [Cache](https://github.com/donnemartin/system-design-primer#cache)
+* [Relational database management system (RDBMS)](https://github.com/donnemartin/system-design-primer#relational-database-management-system-rdbms)
+* [SQL write master-slave failover](https://github.com/donnemartin/system-design-primer#fail-over)
+* [Master-slave replication](https://github.com/donnemartin/system-design-primer#master-slave-replication)
+* [Consistency patterns](https://github.com/donnemartin/system-design-primer#consistency-patterns)
+* [Availability patterns](https://github.com/donnemartin/system-design-primer#availability-patterns)
-**分析数据库** 可以用现成的数据仓储系统,例如使用 Amazon Redshift 或者 Google BigQuery 的解决方案。
+The **Analytics Database** could use a data warehousing solution such as Amazon Redshift or Google BigQuery.
-当使用数据仓储技术或者**对象存储**系统时,我们只想在数据库中存储有限时间段的数据。Amazon S3 的**对象存储**系统可以很方便地设置每个月限制只允许新增 40 GB 的存储内容。
+We might only want to store a limited time period of data in the database, while storing the rest in a data warehouse or in an **Object Store**. An **Object Store** such as Amazon S3 can comfortably handle the constraint of 40 GB of new content per month.
-平均每秒 40,000 次的读取请求(峰值将会更高), 可以通过扩展 **内存缓存** 来处理热点内容的读取流量,这对于处理不均匀分布的流量和流量峰值也很有用。由于读取量非常大,**SQL Read 副本** 可能会遇到处理缓存未命中的问题,我们可能需要使用额外的 SQL 扩展模式。
+To address the 40,000 *average* read requests per second (higher at peak), traffic for popular content (and their sales rank) should be handled by the **Memory Cache** instead of the database. The **Memory Cache** is also useful for handling the unevenly distributed traffic and traffic spikes. With the large volume of reads, the **SQL Read Replicas** might not be able to handle the cache misses. We'll probably need to employ additional SQL scaling patterns.
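As a rough sketch of that read path (assuming a redis-py-style `cache` client, a hypothetical `db.query` helper, and a `total_sold` column in `sales_rank`), the **Read API** could use cache-aside so most requests never reach the replicas:

```python
import json

CACHE_TTL_SECONDS = 60 * 60  # rankings are recomputed hourly, so entries can live about that long


def get_popular_products(category_id, cache, db):
    """Cache-aside read: serve popular rankings from the Memory Cache, hit SQL only on a miss."""
    key = 'popular:category:{0}'.format(category_id)
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)
    rows = db.query(
        'SELECT product_id, total_sold FROM sales_rank '
        'WHERE category_id = %s ORDER BY total_sold DESC LIMIT 100',
        (category_id,))
    cache.set(key, json.dumps(rows), ex=CACHE_TTL_SECONDS)
    return rows
```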
-平均每秒 400 次写操作(峰值将会更高)可能对于单个 **SQL 写主-从** 模式来说比较很困难,因此同时还需要更多的扩展技术
+400 *average* writes per second (higher at peak) might be tough for a single **SQL Write Master-Slave**, also pointing to a need for additional scaling techniques.
-SQL 缩放模式包括:
+SQL scaling patterns include:
-* [联合](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#联合)
-* [分片](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#分片)
-* [非规范化](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#非规范化)
-* [SQL 调优](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#sql-调优)
+* [Federation](https://github.com/donnemartin/system-design-primer#federation)
+* [Sharding](https://github.com/donnemartin/system-design-primer#sharding)
+* [Denormalization](https://github.com/donnemartin/system-design-primer#denormalization)
+* [SQL Tuning](https://github.com/donnemartin/system-design-primer#sql-tuning)
-我们也可以考虑将一些数据移至 **NoSQL 数据库**。
+We should also consider moving some data to a **NoSQL Database**.
-## 其它要点
+## Additional talking points
-> 是否深入这些额外的主题,取决于你的问题范围和剩下的时间。
+> Additional topics to dive into, depending on the problem scope and time remaining.
#### NoSQL
-* [键-值存储](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#键-值存储)
-* [文档类型存储](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#文档类型存储)
-* [列型存储](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#列型存储)
-* [图数据库](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#图数据库)
-* [SQL vs NoSQL](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#sql-还是-nosql)
+* [Key-value store](https://github.com/donnemartin/system-design-primer#key-value-store)
+* [Document store](https://github.com/donnemartin/system-design-primer#document-store)
+* [Wide column store](https://github.com/donnemartin/system-design-primer#wide-column-store)
+* [Graph database](https://github.com/donnemartin/system-design-primer#graph-database)
+* [SQL vs NoSQL](https://github.com/donnemartin/system-design-primer#sql-or-nosql)
-### 缓存
+### Caching
-* 在哪缓存
- * [客户端缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#客户端缓存)
- * [CDN 缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#cdn-缓存)
- * [Web 服务器缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#web-服务器缓存)
- * [数据库缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#数据库缓存)
- * [应用缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#应用缓存)
-* 什么需要缓存
- * [数据库查询级别的缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#数据库查询级别的缓存)
- * [对象级别的缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#对象级别的缓存)
-* 何时更新缓存
- * [缓存模式](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#缓存模式)
- * [直写模式](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#直写模式)
- * [回写模式](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#回写模式)
- * [刷新](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#刷新)
+* Where to cache
+ * [Client caching](https://github.com/donnemartin/system-design-primer#client-caching)
+ * [CDN caching](https://github.com/donnemartin/system-design-primer#cdn-caching)
+ * [Web server caching](https://github.com/donnemartin/system-design-primer#web-server-caching)
+ * [Database caching](https://github.com/donnemartin/system-design-primer#database-caching)
+ * [Application caching](https://github.com/donnemartin/system-design-primer#application-caching)
+* What to cache
+ * [Caching at the database query level](https://github.com/donnemartin/system-design-primer#caching-at-the-database-query-level)
+ * [Caching at the object level](https://github.com/donnemartin/system-design-primer#caching-at-the-object-level)
+* When to update the cache
+ * [Cache-aside](https://github.com/donnemartin/system-design-primer#cache-aside)
+ * [Write-through](https://github.com/donnemartin/system-design-primer#write-through)
+ * [Write-behind (write-back)](https://github.com/donnemartin/system-design-primer#write-behind-write-back)
+ * [Refresh ahead](https://github.com/donnemartin/system-design-primer#refresh-ahead)
-### 异步与微服务
+### Asynchronism and microservices
-* [消息队列](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#消息队列)
-* [任务队列](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#任务队列)
-* [背压](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#背压)
-* [微服务](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#微服务)
+* [Message queues](https://github.com/donnemartin/system-design-primer#message-queues)
+* [Task queues](https://github.com/donnemartin/system-design-primer#task-queues)
+* [Back pressure](https://github.com/donnemartin/system-design-primer#back-pressure)
+* [Microservices](https://github.com/donnemartin/system-design-primer#microservices)
-### 通信
+### Communications
-* 可权衡选择的方案:
- * 与客户端的外部通信 - [使用 REST 作为 HTTP API](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#表述性状态转移rest)
- * 服务器内部通信 - [RPC](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#远程过程调用协议rpc)
-* [服务发现](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#服务发现)
+* Discuss tradeoffs:
+ * External communication with clients - [HTTP APIs following REST](https://github.com/donnemartin/system-design-primer#representational-state-transfer-rest)
+ * Internal communications - [RPC](https://github.com/donnemartin/system-design-primer#remote-procedure-call-rpc)
+* [Service discovery](https://github.com/donnemartin/system-design-primer#service-discovery)
-### 安全性
+### Security
-请参阅[「安全」](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#安全)一章。
+Refer to the [security section](https://github.com/donnemartin/system-design-primer#security).
-### 延迟数值
+### Latency numbers
-请参阅[「每个程序员都应该知道的延迟数」](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#每个程序员都应该知道的延迟数)。
+See [Latency numbers every programmer should know](https://github.com/donnemartin/system-design-primer#latency-numbers-every-programmer-should-know).
-### 持续探讨
+### Ongoing
-* 持续进行基准测试并监控你的系统,以解决他们提出的瓶颈问题。
-* 架构拓展是一个迭代的过程。
+* Continue benchmarking and monitoring your system to address bottlenecks as they come up
+* Scaling is an iterative process
diff --git a/solutions/system_design/scaling_aws/README.md b/solutions/system_design/scaling_aws/README.md
index c071c70e..99af0cff 100644
--- a/solutions/system_design/scaling_aws/README.md
+++ b/solutions/system_design/scaling_aws/README.md
@@ -1,403 +1,403 @@
-# 在 AWS 上设计支持百万级到千万级用户的系统
+# Design a system that scales to millions of users on AWS
-**注释:为了避免重复,这篇文章的链接直接关联到 [系统设计主题](https://github.com/donnemartin/system-design-primer#index-of-system-design-topics) 的相关章节。为一讨论要点、折中方案和可选方案做参考。**
+*Note: This document links directly to relevant areas found in the [system design topics](https://github.com/donnemartin/system-design-primer#index-of-system-design-topics) to avoid duplication. Refer to the linked content for general talking points, tradeoffs, and alternatives.*
-## 第 1 步:用例和约束概要
+## Step 1: Outline use cases and constraints
-> 收集需求并调查问题。
-> 通过提问清晰用例和约束。
-> 讨论假设。
+> Gather requirements and scope the problem.
+> Ask questions to clarify use cases and constraints.
+> Discuss assumptions.
-如果没有面试官提出明确的问题,我们将自己定义一些用例和约束条件。
+Without an interviewer to address clarifying questions, we'll define some use cases and constraints.
-### 用例
+### Use cases
-解决这个问题是一个循序渐进的过程:1) **基准/负载 测试**, 2) 瓶颈 **概述**, 3) 当评估可选和折中方案时定位瓶颈,4) 重复,这是向可扩展的设计发展基础设计的好模式。
+Solving this problem takes an iterative approach of: 1) **Benchmark/Load Test**, 2) **Profile** for bottlenecks, 3) address bottlenecks while evaluating alternatives and trade-offs, and 4) repeat, which is a good pattern for evolving basic designs into scalable designs.
-除非你有 AWS 的背景或者正在申请需要 AWS 知识的相关职位,否则不要求了解 AWS 的相关细节。并且,这个练习中讨论的许多原则可以更广泛地应用于AWS生态系统之外。
+Unless you have a background in AWS or are applying for a position that requires AWS knowledge, AWS-specific details are not a requirement. However, **many of the principles discussed in this exercise can apply more generally outside of the AWS ecosystem.**
-#### 我们就处理以下用例讨论这一问题
+#### We'll scope the problem to handle only the following use cases
-* **用户** 进行读或写请求
- * **服务** 进行处理,存储用户数据,然后返回结果
-* **服务** 需要从支持小规模用户开始到百万用户
- * 在我们演化架构来处理大量的用户和请求时,讨论一般的扩展模式
-* **服务** 高可用
+* **User** makes a read or write request
+ * **Service** does processing, stores user data, then returns the results
+* **Service** needs to evolve from serving a small number of users to millions of users
+ * Discuss general scaling patterns as we evolve an architecture to handle a large number of users and requests
+* **Service** has high availability
-### 约束和假设
+### Constraints and assumptions
-#### 状态假设
+#### State assumptions
-* 流量不均匀分布
-* 需要关系数据
-* 从一个用户扩展到千万用户
- * 表示用户量的增长
- * 用户量+
- * 用户量++
- * 用户量+++
+* Traffic is not evenly distributed
+* Need for relational data
+* Scale from 1 user to tens of millions of users
+ * Denote increase of users as:
+ * Users+
+ * Users++
+ * Users+++
* ...
- * 1000 万用户
- * 每月 10 亿次写入
- * 每月 1000 亿次读出
- * 100:1 读写比率
- * 每次写入 1 KB 内容
+ * 10 million users
+ * 1 billion writes per month
+ * 100 billion reads per month
+ * 100:1 read to write ratio
+ * 1 KB content per write
-#### 计算使用
+#### Calculate usage
-**向你的面试官厘清你是否应该做粗略的使用计算**
+**Clarify with your interviewer if you should run back-of-the-envelope usage calculations.**
-* 1 TB 新内容 / 月
- * 1 KB 每次写入 * 10 亿 写入 / 月
- * 36 TB 新内容 / 3 年
- * 假设大多数写入都是新内容而不是更新已有内容
-* 平均每秒 400 次写入
-* 平均每秒 40,000 次读取
+* 1 TB of new content per month
+ * 1 KB per write * 1 billion writes per month
+ * 36 TB of new content in 3 years
+ * Assume most writes are from new content instead of updates to existing ones
+* 400 writes per second on average
+* 40,000 reads per second on average
-便捷的转换指南:
+Handy conversion guide:
-* 250 万秒 / 月
-* 1 次请求 / 秒 = 250 万次请求 / 月
-* 40 次请求 / 秒 = 1 亿次请求 / 月
-* 400 次请求 / 秒 = 10 亿请求 / 月
+* 2.5 million seconds per month
+* 1 request per second = 2.5 million requests per month
+* 40 requests per second = 100 million requests per month
+* 400 requests per second = 1 billion requests per month
-## 第 2 步:创建高级设计方案
+## Step 2: Create a high level design
-> 用所有重要组件概述高水平设计
+> Outline a high level design with all important components.
![Imgur](http://i.imgur.com/B8LDKD7.png)
-## 第 3 步:设计核心组件
+## Step 3: Design core components
-> 深入每个核心组件的细节。
+> Dive into details for each core component.
-### 用例:用户进行读写请求
+### Use case: User makes a read or write request
-#### 目标
+#### Goals
-* 只有 1-2 个用户时,你只需要基础配置
- * 为简单起见,只需要一台服务器
- * 必要时进行纵向扩展
- * 监控以确定瓶颈
+* With only 1-2 users, you only need a basic setup
+ * Single box for simplicity
+ * Vertical scaling when needed
+ * Monitor to determine bottlenecks
-#### 以单台服务器开始
+#### Start with a single box
-* **Web 服务器** 在 EC2 上
- * 存储用户数据
- * [**MySQL 数据库**](https://github.com/donnemartin/system-design-primer#relational-database-management-system-rdbms)
+* **Web server** on EC2
+ * Storage for user data
+ * [**MySQL Database**](https://github.com/donnemartin/system-design-primer#relational-database-management-system-rdbms)
-运用 **纵向扩展**:
+Use **Vertical Scaling**:
-* 选择一台更大容量的服务器
-* 密切关注指标,确定如何扩大规模
- * 使用基本监控来确定瓶颈:CPU、内存、IO、网络等
- * CloudWatch, top, nagios, statsd, graphite等
-* 纵向扩展的代价将变得更昂贵
-* 无冗余/容错
+* Simply choose a bigger box
+* Keep an eye on metrics to determine how to scale up
+ * Use basic monitoring to determine bottlenecks: CPU, memory, IO, network, etc
+ * CloudWatch, top, nagios, statsd, graphite, etc
+* Scaling vertically can get very expensive
+* No redundancy/failover
-**折中方案, 可选方案, 和其他细节:**
+*Trade-offs, alternatives, and additional details:*
-* **纵向扩展** 的可选方案是 [**横向扩展**](https://github.com/donnemartin/system-design-primer#horizontal-scaling)
+* The alternative to **Vertical Scaling** is [**Horizontal scaling**](https://github.com/donnemartin/system-design-primer#horizontal-scaling)
-#### 自 SQL 开始,但认真考虑 NoSQL
+#### Start with SQL, consider NoSQL
-约束条件假设需要关系型数据。我们可以开始时在单台服务器上使用 **MySQL 数据库**。
+The constraints assume there is a need for relational data. We can start off using a **MySQL Database** on the single box.
-**折中方案, 可选方案, 和其他细节:**
+*Trade-offs, alternatives, and additional details:*
-* 查阅 [关系型数据库管理系统 (RDBMS)](https://github.com/donnemartin/system-design-primer#relational-database-management-system-rdbms) 章节
-* 讨论使用 [SQL 或 NoSQL](https://github.com/donnemartin/system-design-primer#sql-or-nosql) 的原因
+* See the [Relational database management system (RDBMS)](https://github.com/donnemartin/system-design-primer#relational-database-management-system-rdbms) section
+* Discuss reasons to use [SQL or NoSQL](https://github.com/donnemartin/system-design-primer#sql-or-nosql)
-#### 分配公共静态 IP
+#### Assign a public static IP
-* 弹性 IP 提供了一个公共端点,不会在重启时改变 IP。
-* 故障转移时只需要把域名指向新 IP。
+* Elastic IPs provide a public endpoint whose IP doesn't change on reboot
+* Helps with failover, just point the domain to a new IP
-#### 使用 DNS 服务
+#### Use a DNS
-添加 **DNS** 服务,比如 Route 53([Amazon Route 53](https://aws.amazon.com/cn/route53/) - 译者注),将域映射到实例的公共 IP 中。
+Add a **DNS** service such as Route 53 to map the domain to the instance's public IP.
-**折中方案, 可选方案, 和其他细节:**
+*Trade-offs, alternatives, and additional details:*
-* 查阅 [域名系统](https://github.com/donnemartin/system-design-primer#domain-name-system) 章节
+* See the [Domain name system](https://github.com/donnemartin/system-design-primer#domain-name-system) section
-#### 安全的 Web 服务器
+#### Secure the web server
-* 只开放必要的端口
- * 允许 Web 服务器响应来自以下端口的请求
- * HTTP 80
- * HTTPS 443
- * SSH IP 白名单 22
- * 防止 Web 服务器启动外链
+* Open up only necessary ports (a minimal sketch follows this list)
+ * Allow the web server to respond to incoming requests from:
+ * 80 for HTTP
+ * 443 for HTTPS
+ * 22 for SSH to only whitelisted IPs
+ * Prevent the web server from initiating outbound connections
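A minimal boto3 sketch of those ingress rules (the security group ID and the whitelisted SSH range are placeholders); restricting outbound traffic would be handled separately through the group's egress rules:

```python
import boto3

ec2 = boto3.client('ec2')

ec2.authorize_security_group_ingress(
    GroupId='sg-0123456789abcdef0',  # hypothetical web server security group
    IpPermissions=[
        {'IpProtocol': 'tcp', 'FromPort': 80, 'ToPort': 80,
         'IpRanges': [{'CidrIp': '0.0.0.0/0'}]},        # HTTP from anywhere
        {'IpProtocol': 'tcp', 'FromPort': 443, 'ToPort': 443,
         'IpRanges': [{'CidrIp': '0.0.0.0/0'}]},        # HTTPS from anywhere
        {'IpProtocol': 'tcp', 'FromPort': 22, 'ToPort': 22,
         'IpRanges': [{'CidrIp': '203.0.113.0/24'}]},   # SSH only from a whitelisted range
    ],
)
```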
-**折中方案, 可选方案, 和其他细节:**
+*Trade-offs, alternatives, and additional details:*
-* 查阅 [安全](https://github.com/donnemartin/system-design-primer#security) 章节
+* See the [Security](https://github.com/donnemartin/system-design-primer#security) section
-## 第 4 步:扩展设计
+## Step 4: Scale the design
-> 在给定约束条件下,定义和确认瓶颈。
+> Identify and address bottlenecks, given the constraints.
-### 用户+
+### Users+
![Imgur](http://i.imgur.com/rrfjMXB.png)
-#### 假设
+#### Assumptions
-我们的用户数量开始上升,并且单台服务器的负载上升。**基准/负载测试** 和 **分析** 指出 **MySQL 数据库** 占用越来越多的内存和 CPU 资源,同时用户数据将填满硬盘空间。
+Our user count is starting to pick up and the load is increasing on our single box. Our **Benchmarks/Load Tests** and **Profiling** are pointing to the **MySQL Database** taking up more and more memory and CPU resources, while the user content is filling up disk space.
-目前,我们尚能在纵向扩展时解决这些问题。不幸的是,解决这些问题的代价变得相当昂贵,并且原来的系统并不能允许在 **MySQL 数据库** 和 **Web 服务器** 的基础上进行独立扩展。
+We've been able to address these issues with **Vertical Scaling** so far. Unfortunately, this has become quite expensive and it doesn't allow for independent scaling of the **MySQL Database** and **Web Server**.
-#### 目标
+#### Goals
-* 减轻单台服务器负载并且允许独立扩展
- * 在 **对象存储** 中单独存储静态内容
- * 将 **MySQL 数据库** 迁移到单独的服务器上
-* 缺点
- * 这些变化会增加复杂性,并要求对 **Web服务器** 进行更改,以指向 **对象存储** 和 **MySQL 数据库**
- * 必须采取额外的安全措施来确保新组件的安全
- * AWS 的成本也会增加,但应该与自身管理类似系统的成本做比较
+* Lighten load on the single box and allow for independent scaling
+ * Store static content separately in an **Object Store**
+ * Move the **MySQL Database** to a separate box
+* Disadvantages
+ * These changes would increase complexity and would require changes to the **Web Server** to point to the **Object Store** and the **MySQL Database**
+ * Additional security measures must be taken to secure the new components
+ * AWS costs could also increase, but should be weighed with the costs of managing similar systems on your own
-#### 独立保存静态内容
+#### Store static content separately
-* 考虑使用像 S3 这样可管理的 **对象存储** 服务来存储静态内容
- * 高扩展性和可靠性
- * 服务器端加密
-* 迁移静态内容到 S3
- * 用户文件
+* Consider using a managed **Object Store** like S3 to store static content
+ * Highly scalable and reliable
+ * Server side encryption
+* Move static content to S3 (a minimal upload sketch follows this list)
+ * User files
* JS
* CSS
- * 图片
- * 视频
+ * Images
+ * Videos
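A minimal boto3 sketch of moving a couple of static assets to S3 (the bucket name and file paths are placeholders), with server side encryption enabled per the goal above:

```python
import boto3

s3 = boto3.client('s3')

# Hypothetical bucket and keys; repeat for each class of static content.
s3.upload_file('static/js/app.js', 'example-static-content', 'js/app.js',
               ExtraArgs={'ServerSideEncryption': 'AES256',
                          'ContentType': 'application/javascript'})
s3.upload_file('static/images/logo.png', 'example-static-content', 'images/logo.png',
               ExtraArgs={'ServerSideEncryption': 'AES256',
                          'ContentType': 'image/png'})
```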
-#### 迁移 MySQL 数据库到独立机器上
+#### Move the MySQL database to a separate box
-* 考虑使用类似 RDS 的服务来管理 **MySQL 数据库**
- * 简单的管理,扩展
- * 多个可用区域
- * 空闲时加密
+* Consider using a service like RDS to manage the **MySQL Database**
+ * Simple to administer, scale
+ * Multiple availability zones
+ * Encryption at rest
-#### 系统安全
+#### Secure the system
-* 在传输和空闲时对数据进行加密
-* 使用虚拟私有云
- * 为单个 **Web 服务器** 创建一个公共子网,这样就可以发送和接收来自 internet 的流量
- * 为其他内容创建一个私有子网,禁止外部访问
- * 在每个组件上只为白名单 IP 打开端口
-* 这些相同的模式应当在新的组件的实现中实践
+* Encrypt data in transit and at rest
+* Use a Virtual Private Cloud
+ * Create a public subnet for the single **Web Server** so it can send and receive traffic from the internet
+ * Create a private subnet for everything else, preventing outside access
+ * Only open ports from whitelisted IPs for each component
+* These same patterns should be implemented for new components in the remainder of the exercise
-**折中方案, 可选方案, 和其他细节:**
+*Trade-offs, alternatives, and additional details:*
-* 查阅 [安全](https://github.com/donnemartin/system-design-primer#security) 章节
+* See the [Security](https://github.com/donnemartin/system-design-primer#security) section
-### 用户+++
+### Users++
![Imgur](http://i.imgur.com/raoFTXM.png)
-#### 假设
+#### Assumptions
-我们的 **基准/负载测试** 和 **性能测试** 显示,在高峰时段,我们的单一 **Web服务器** 存在瓶颈,导致响应缓慢,在某些情况下还会宕机。随着服务的成熟,我们也希望朝着更高的可用性和冗余发展。
+Our **Benchmarks/Load Tests** and **Profiling** show that our single **Web Server** bottlenecks during peak hours, resulting in slow responses and in some cases, downtime. As the service matures, we'd also like to move towards higher availability and redundancy.
-#### 目标
+#### Goals
-* 下面的目标试图用 **Web服务器** 解决扩展问题
- * 基于 **基准/负载测试** 和 **分析**,你可能只需要实现其中的一两个技术
-* 使用 [**横向扩展**](https://github.com/donnemartin/system-design-primer#horizontal-scaling) 来处理增加的负载和单点故障
- * 添加 [**负载均衡器**](https://github.com/donnemartin/system-design-primer#load-balancer) 例如 Amazon 的 ELB 或 HAProxy
- * ELB 是高可用的
- * 如果你正在配置自己的 **负载均衡器**, 在多个可用区域中设置多台服务器用于 [双活](https://github.com/donnemartin/system-design-primer#active-active) 或 [主被](https://github.com/donnemartin/system-design-primer#active-passive) 将提高可用性
- * 终止在 **负载平衡器** 上的SSL,以减少后端服务器上的计算负载,并简化证书管理
- * 在多个可用区域中使用多台 **Web服务器**
- * 在多个可用区域的 [**主-从 故障转移**](https://github.com/donnemartin/system-design-primer#master-slave-replication) 模式中使用多个 **MySQL** 实例来改进冗余
-* 分离 **Web 服务器** 和 [**应用服务器**](https://github.com/donnemartin/system-design-primer#application-layer)
- * 独立扩展和配置每一层
- * **Web 服务器** 可以作为 [**反向代理**](https://github.com/donnemartin/system-design-primer#reverse-proxy-web-server)
- * 例如, 你可以添加 **应用服务器** 处理 **读 API** 而另外一些处理 **写 API**
-* 将静态(和一些动态)内容转移到 [**内容分发网络 (CDN)**](https://github.com/donnemartin/system-design-primer#content-delivery-network) 例如 CloudFront 以减少负载和延迟
+* The following goals attempt to address the scaling issues with the **Web Server**
+ * Based on the **Benchmarks/Load Tests** and **Profiling**, you might only need to implement one or two of these techniques
+* Use [**Horizontal Scaling**](https://github.com/donnemartin/system-design-primer#horizontal-scaling) to handle increasing loads and to address single points of failure
+ * Add a [**Load Balancer**](https://github.com/donnemartin/system-design-primer#load-balancer) such as Amazon's ELB or HAProxy
+ * ELB is highly available
+ * If you are configuring your own **Load Balancer**, setting up multiple servers in [active-active](https://github.com/donnemartin/system-design-primer#active-active) or [active-passive](https://github.com/donnemartin/system-design-primer#active-passive) in multiple availability zones will improve availability
+ * Terminate SSL on the **Load Balancer** to reduce computational load on backend servers and to simplify certificate administration
+ * Use multiple **Web Servers** spread out over multiple availability zones
+ * Use multiple **MySQL** instances in [**Master-Slave Failover**](https://github.com/donnemartin/system-design-primer#master-slave-replication) mode across multiple availability zones to improve redundancy
+* Separate out the **Web Servers** from the [**Application Servers**](https://github.com/donnemartin/system-design-primer#application-layer)
+ * Scale and configure both layers independently
+ * **Web Servers** can run as a [**Reverse Proxy**](https://github.com/donnemartin/system-design-primer#reverse-proxy-web-server)
+ * For example, you can add **Application Servers** handling **Read APIs** while others handle **Write APIs**
+* Move static (and some dynamic) content to a [**Content Delivery Network (CDN)**](https://github.com/donnemartin/system-design-primer#content-delivery-network) such as CloudFront to reduce load and latency
-**折中方案, 可选方案, 和其他细节:**
+*Trade-offs, alternatives, and additional details:*
-* 查阅以上链接获得更多细节
+* See the linked content above for details
-### 用户+++
+### Users+++
![Imgur](http://i.imgur.com/OZCxJr0.png)
-**注意:** **内部负载均衡** 不显示以减少混乱
+**Note:** **Internal Load Balancers** not shown to reduce clutter
-#### 假设
+#### Assumptions
-我们的 **性能/负载测试** 和 **性能测试** 显示我们读操作频繁(100:1 的读写比率),并且数据库在高读请求时表现很糟糕。
+Our **Benchmarks/Load Tests** and **Profiling** show that we are read-heavy (100:1 with writes) and our database is suffering from poor performance from the high read requests.
-#### 目标
+#### Goals
-* 下面的目标试图解决 **MySQL数据库** 的伸缩性问题
- * * 基于 **基准/负载测试** 和 **分析**,你可能只需要实现其中的一两个技术
-* 将下列数据移动到一个 [**内存缓存**](https://github.com/donnemartin/system-design-primer#cache),例如弹性缓存,以减少负载和延迟:
- * **MySQL** 中频繁访问的内容
- * 首先, 尝试配置 **MySQL 数据库** 缓存以查看是否足以在实现 **内存缓存** 之前缓解瓶颈
- * 来自 **Web 服务器** 的会话数据
- * **Web 服务器** 变成无状态的, 允许 **自动伸缩**
- * 从内存中读取 1 MB 内存需要大约 250 微秒,而从SSD中读取时间要长 4 倍,从磁盘读取的时间要长 80 倍。1
-* 添加 [**MySQL 读取副本**](https://github.com/donnemartin/system-design-primer#master-slave-replication) 来减少写主线程的负载
-* 添加更多 **Web 服务器** and **应用服务器** 来提高响应
+* The following goals attempt to address the scaling issues with the **MySQL Database**
+ * Based on the **Benchmarks/Load Tests** and **Profiling**, you might only need to implement one or two of these techniques
+* Move the following data to a [**Memory Cache**](https://github.com/donnemartin/system-design-primer#cache) such as ElastiCache to reduce load and latency (a minimal session-caching sketch follows this list):
+ * Frequently accessed content from **MySQL**
+ * First, try to configure the **MySQL Database** cache to see if that is sufficient to relieve the bottleneck before implementing a **Memory Cache**
+ * Session data from the **Web Servers**
+ * The **Web Servers** become stateless, allowing for **Autoscaling**
+ * Reading 1 MB sequentially from memory takes about 250 microseconds, while reading from SSD takes 4x and from disk takes 80x longer.1
+* Add [**MySQL Read Replicas**](https://github.com/donnemartin/system-design-primer#master-slave-replication) to reduce load on the write master
+* Add more **Web Servers** and **Application Servers** to improve responsiveness
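A minimal sketch of the session-data idea, assuming a redis-py-style `cache` client pointed at ElastiCache (the key scheme and TTL are illustrative); once session state lives in the cache, any **Web Server** can handle any request:

```python
import json
import uuid

SESSION_TTL_SECONDS = 30 * 60  # expire idle sessions after 30 minutes


def create_session(cache, user_id):
    """Store session state in the Memory Cache instead of on a specific Web Server."""
    session_id = uuid.uuid4().hex
    cache.set('session:{0}'.format(session_id),
              json.dumps({'user_id': user_id}),
              ex=SESSION_TTL_SECONDS)
    return session_id


def get_session(cache, session_id):
    """Look up session state; returns None for expired or unknown sessions."""
    data = cache.get('session:{0}'.format(session_id))
    return json.loads(data) if data is not None else None
```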
-**折中方案, 可选方案, 和其他细节:**
+*Trade-offs, alternatives, and additional details:*
-* 查阅以上链接获得更多细节
+* See the linked content above for details
-#### 添加 MySQL 读取副本
+#### Add MySQL read replicas
-* 除了添加和扩展 **内存缓存**,**MySQL 读副本服务器** 也能够帮助缓解在 **MySQL 写主服务器** 的负载。
-* 添加逻辑到 **Web 服务器** 来区分读和写操作
-* 在 **MySQL 读副本服务器** 之上添加 **负载均衡器** (不是为了减少混乱)
-* 大多数服务都是读取负载大于写入负载
+* In addition to adding and scaling a **Memory Cache**, **MySQL Read Replicas** can also help relieve load on the **MySQL Write Master**
+* Add logic to **Web Server** to separate out writes and reads
+* Add **Load Balancers** in front of **MySQL Read Replicas** (not pictured to reduce clutter)
+* Most services are read-heavy vs write-heavy
-**折中方案, 可选方案, 和其他细节:**
+*Trade-offs, alternatives, and additional details:*
-* 查阅 [关系型数据库管理系统 (RDBMS)](https://github.com/donnemartin/system-design-primer#relational-database-management-system-rdbms) 章节
+* See the [Relational database management system (RDBMS)](https://github.com/donnemartin/system-design-primer#relational-database-management-system-rdbms) section
-### 用户++++
+### Users++++
![Imgur](http://i.imgur.com/3X8nmdL.png)
-#### 假设
+#### Assumptions
-**基准/负载测试** 和 **分析** 显示,在美国,正常工作时间存在流量峰值,当用户离开办公室时,流量骤降。我们认为,可以通过真实负载自动转换服务器数量来降低成本。我们是一家小商店,所以我们希望 DevOps 尽量自动化地进行 **自动伸缩** 和通用操作。
+Our **Benchmarks/Load Tests** and **Profiling** show that our traffic spikes during regular business hours in the U.S. and drop significantly when users leave the office. We think we can cut costs by automatically spinning up and down servers based on actual load. We're a small shop so we'd like to automate as much of the DevOps as possible for **Autoscaling** and for the general operations.
-#### 目标
+#### Goals
-* 根据需要添加 **自动扩展**
- * 跟踪流量高峰
- * 通过关闭未使用的实例来降低成本
-* 自动化 DevOps
- * Chef, Puppet, Ansible 工具等
-* 继续监控指标以解决瓶颈
- * **主机水平** - 检查一个 EC2 实例
- * **总水平** - 检查负载均衡器统计数据
- * **日志分析** - CloudWatch, CloudTrail, Loggly, Splunk, Sumo
- * **外部站点的性能** - Pingdom or New Relic
- * **处理通知和事件** - PagerDuty
- * **错误报告** - Sentry
+* Add **Autoscaling** to provision capacity as needed
+ * Keep up with traffic spikes
+ * Reduce costs by powering down unused instances
+* Automate DevOps
+ * Chef, Puppet, Ansible, etc
+* Continue monitoring metrics to address bottlenecks
+ * **Host level** - Review a single EC2 instance
+ * **Aggregate level** - Review load balancer stats
+ * **Log analysis** - CloudWatch, CloudTrail, Loggly, Splunk, Sumo
+ * **External site performance** - Pingdom or New Relic
+ * **Handle notifications and incidents** - PagerDuty
+ * **Error Reporting** - Sentry
-#### 添加自动扩展
+#### Add autoscaling
-* 考虑使用一个托管服务,比如AWS **自动扩展**
- * 为每个 **Web 服务器** 创建一个组,并为每个 **应用服务器** 类型创建一个组,将每个组放置在多个可用区域中
- * 设置最小和最大实例数
- * 通过 CloudWatch 来扩展或收缩
- * 可预测负载的简单时间度量
- * 一段时间内的指标:
- * CPU 负载
- * 延迟
- * 网络流量
- * 自定义指标
- * 缺点
- * 自动扩展会引入复杂性
- * 可能需要一段时间才能适当扩大规模,以满足增加的需求,或者在需求下降时缩减规模
+* Consider a managed service such as AWS **Autoscaling**
+ * Create one group for each **Web Server** and one for each **Application Server** type, place each group in multiple availability zones
+ * Set a min and max number of instances
+ * Trigger to scale up and down through CloudWatch
+ * Simple time of day metric for predictable loads or
+ * Metrics over a time period:
+ * CPU load
+ * Latency
+ * Network traffic
+ * Custom metric
+ * Disadvantages
+ * Autoscaling can introduce complexity
+ * It could take some time before a system appropriately scales up to meet increased demand, or to scale down when demand drops
-### 用户+++++
+### Users+++++
![Imgur](http://i.imgur.com/jj3A5N8.png)
-**注释:** **自动伸缩** 组不显示以减少混乱
+**Note:** **Autoscaling** groups not shown to reduce clutter
-#### 假设
+#### Assumptions
-当服务继续向着限制条件概述的方向发展,我们反复地运行 **基准/负载测试** 和 **分析** 来进一步发现和定位新的瓶颈。
+As the service continues to grow towards the figures outlined in the constraints, we iteratively run **Benchmarks/Load Tests** and **Profiling** to uncover and address new bottlenecks.
-#### 目标
+#### Goals
-由于问题的约束,我们将继续提出扩展性的问题:
+We'll continue to address scaling issues due to the problem's constraints:
-* 如果我们的 **MySQL 数据库** 开始变得过于庞大, 我们可能只考虑把数据在数据库中存储一段有限的时间, 同时在例如 Redshift 这样的数据仓库中存储其余的数据
- * 像 Redshift 这样的数据仓库能够轻松处理每月 1TB 的新内容
-* 平均每秒 40,000 次的读取请求, 可以通过扩展 **内存缓存** 来处理热点内容的读取流量,这对于处理不均匀分布的流量和流量峰值也很有用
- * **SQL读取副本** 可能会遇到处理缓存未命中的问题, 我们可能需要使用额外的 SQL 扩展模式
-* 对于单个 **SQL 写主-从** 模式来说,平均每秒 400 次写操作(明显更高)可能会很困难,同时还需要更多的扩展技术
+* If our **MySQL Database** starts to grow too large, we might consider only storing a limited time period of data in the database, while storing the rest in a data warehouse such as Redshift
+ * A data warehouse such as Redshift can comfortably handle the constraint of 1 TB of new content per month
+* With 40,000 average read requests per second, read traffic for popular content can be addressed by scaling the **Memory Cache**, which is also useful for handling the unevenly distributed traffic and traffic spikes
+    * The **SQL Read Replicas** might have trouble handling the cache misses; we'll probably need to employ additional SQL scaling patterns
+* 400 average writes per second (with presumably significantly higher peaks) might be tough for a single **SQL Write Master-Slave**, also pointing to a need for additional scaling techniques
-SQL 扩展模型包括:
+SQL scaling patterns include:
-* [集合](https://github.com/donnemartin/system-design-primer#federation)
-* [分片](https://github.com/donnemartin/system-design-primer#sharding)
-* [反范式](https://github.com/donnemartin/system-design-primer#denormalization)
-* [SQL 调优](https://github.com/donnemartin/system-design-primer#sql-tuning)
+* [Federation](https://github.com/donnemartin/system-design-primer#federation)
+* [Sharding](https://github.com/donnemartin/system-design-primer#sharding)
+* [Denormalization](https://github.com/donnemartin/system-design-primer#denormalization)
+* [SQL Tuning](https://github.com/donnemartin/system-design-primer#sql-tuning)
-为了进一步处理高读和写请求,我们还应该考虑将适当的数据移动到一个 [**NoSQL数据库**](https://github.com/donnemartin/system-design-primer#nosql) ,例如 DynamoDB。
+To further address the high read and write requests, we should also consider moving appropriate data to a [**NoSQL Database**](https://github.com/donnemartin/system-design-primer#nosql) such as DynamoDB.
-我们可以进一步分离我们的 [**应用服务器**](https://github.com/donnemartin/system-design-primer#application-layer) 以允许独立扩展。不需要实时完成的批处理任务和计算可以通过 Queues 和 Workers 异步完成:
+We can further separate out our [**Application Servers**](https://github.com/donnemartin/system-design-primer#application-layer) to allow for independent scaling. Batch processes or computations that do not need to be done in real-time can be done [**Asynchronously**](https://github.com/donnemartin/system-design-primer#asynchronism) with **Queues** and **Workers**:
-* 以照片服务为例,照片上传和缩略图的创建可以分开进行
- * **客户端** 上传图片
- * **应用服务器** 推送一个任务到 **队列** 例如 SQS
- * EC2 上的 **Worker 服务** 或者 Lambda 从 **队列** 拉取 work,然后:
- * 创建缩略图
- * 更新 **数据库**
- * 在 **对象存储** 中存储缩略图
+* For example, in a photo service, the photo upload and the thumbnail creation can be separated (a minimal worker sketch follows this list):
+ * **Client** uploads photo
+ * **Application Server** puts a job in a **Queue** such as SQS
+ * The **Worker Service** on EC2 or Lambda pulls work off the **Queue** then:
+ * Creates a thumbnail
+ * Updates a **Database**
+ * Stores the thumbnail in the **Object Store**
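A minimal sketch of the worker loop, assuming an SQS queue accessed through boto3 (the queue URL is a placeholder and the two helpers are hypothetical stubs standing in for the thumbnail and **Database** work):

```python
import boto3

sqs = boto3.client('sqs')
QUEUE_URL = 'https://sqs.us-east-1.amazonaws.com/123456789012/thumbnail-jobs'  # hypothetical


def create_thumbnail(photo_key):
    """Hypothetical helper: fetch the photo from the Object Store, write a thumbnail back."""
    return photo_key + '.thumb'


def update_photo_record(photo_key, thumbnail_key):
    """Hypothetical helper: record the thumbnail location in the Database."""
    pass


def run_worker():
    """Long-poll the Queue, process each job, and delete it only after the work is done."""
    while True:
        response = sqs.receive_message(QueueUrl=QUEUE_URL,
                                       MaxNumberOfMessages=1,
                                       WaitTimeSeconds=20)
        for message in response.get('Messages', []):
            photo_key = message['Body']
            thumbnail_key = create_thumbnail(photo_key)
            update_photo_record(photo_key, thumbnail_key)
            sqs.delete_message(QueueUrl=QUEUE_URL,
                               ReceiptHandle=message['ReceiptHandle'])
```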
-**折中方案, 可选方案, 和其他细节:**
+*Trade-offs, alternatives, and additional details:*
-* 查阅以上链接获得更多细节
+* See the linked content above for details
-## 额外的话题
+## Additional talking points
-> 根据问题的范围和剩余时间,还需要深入讨论其他问题。
+> Additional topics to dive into, depending on the problem scope and time remaining.
-### SQL 扩展模式
+### SQL scaling patterns
-* [读取副本](https://github.com/donnemartin/system-design-primer#master-slave-replication)
-* [集合](https://github.com/donnemartin/system-design-primer#federation)
-* [分区](https://github.com/donnemartin/system-design-primer#sharding)
-* [反规范化](https://github.com/donnemartin/system-design-primer#denormalization)
-* [SQL 调优](https://github.com/donnemartin/system-design-primer#sql-tuning)
+* [Read replicas](https://github.com/donnemartin/system-design-primer#master-slave-replication)
+* [Federation](https://github.com/donnemartin/system-design-primer#federation)
+* [Sharding](https://github.com/donnemartin/system-design-primer#sharding)
+* [Denormalization](https://github.com/donnemartin/system-design-primer#denormalization)
+* [SQL Tuning](https://github.com/donnemartin/system-design-primer#sql-tuning)
#### NoSQL
-* [键值存储](https://github.com/donnemartin/system-design-primer#key-value-store)
-* [文档存储](https://github.com/donnemartin/system-design-primer#document-store)
-* [宽表存储](https://github.com/donnemartin/system-design-primer#wide-column-store)
-* [图数据库](https://github.com/donnemartin/system-design-primer#graph-database)
+* [Key-value store](https://github.com/donnemartin/system-design-primer#key-value-store)
+* [Document store](https://github.com/donnemartin/system-design-primer#document-store)
+* [Wide column store](https://github.com/donnemartin/system-design-primer#wide-column-store)
+* [Graph database](https://github.com/donnemartin/system-design-primer#graph-database)
* [SQL vs NoSQL](https://github.com/donnemartin/system-design-primer#sql-or-nosql)
-### 缓存
+### Caching
-* 缓存到哪里
- * [客户端缓存](https://github.com/donnemartin/system-design-primer#client-caching)
- * [CDN 缓存](https://github.com/donnemartin/system-design-primer#cdn-caching)
- * [Web 服务缓存](https://github.com/donnemartin/system-design-primer#web-server-caching)
- * [数据库缓存](https://github.com/donnemartin/system-design-primer#database-caching)
- * [应用缓存](https://github.com/donnemartin/system-design-primer#application-caching)
-* 缓存什么
- * [数据库请求层缓存](https://github.com/donnemartin/system-design-primer#caching-at-the-database-query-level)
- * [对象层缓存](https://github.com/donnemartin/system-design-primer#caching-at-the-object-level)
-* 何时更新缓存
- * [预留缓存](https://github.com/donnemartin/system-design-primer#cache-aside)
- * [完全写入](https://github.com/donnemartin/system-design-primer#write-through)
- * [延迟写 (写回)](https://github.com/donnemartin/system-design-primer#write-behind-write-back)
- * [事先更新](https://github.com/donnemartin/system-design-primer#refresh-ahead)
+* Where to cache
+ * [Client caching](https://github.com/donnemartin/system-design-primer#client-caching)
+ * [CDN caching](https://github.com/donnemartin/system-design-primer#cdn-caching)
+ * [Web server caching](https://github.com/donnemartin/system-design-primer#web-server-caching)
+ * [Database caching](https://github.com/donnemartin/system-design-primer#database-caching)
+ * [Application caching](https://github.com/donnemartin/system-design-primer#application-caching)
+* What to cache
+ * [Caching at the database query level](https://github.com/donnemartin/system-design-primer#caching-at-the-database-query-level)
+ * [Caching at the object level](https://github.com/donnemartin/system-design-primer#caching-at-the-object-level)
+* When to update the cache
+ * [Cache-aside](https://github.com/donnemartin/system-design-primer#cache-aside)
+ * [Write-through](https://github.com/donnemartin/system-design-primer#write-through)
+ * [Write-behind (write-back)](https://github.com/donnemartin/system-design-primer#write-behind-write-back)
+ * [Refresh ahead](https://github.com/donnemartin/system-design-primer#refresh-ahead)
-### 异步性和微服务
+### Asynchronism and microservices
-* [消息队列](https://github.com/donnemartin/system-design-primer#message-queues)
-* [任务队列](https://github.com/donnemartin/system-design-primer#task-queues)
-* [回退压力](https://github.com/donnemartin/system-design-primer#back-pressure)
-* [微服务](https://github.com/donnemartin/system-design-primer#microservices)
+* [Message queues](https://github.com/donnemartin/system-design-primer#message-queues)
+* [Task queues](https://github.com/donnemartin/system-design-primer#task-queues)
+* [Back pressure](https://github.com/donnemartin/system-design-primer#back-pressure)
+* [Microservices](https://github.com/donnemartin/system-design-primer#microservices)
-### 沟通
+### Communications
-* 关于折中方案的讨论:
- * 客户端的外部通讯 - [遵循 REST 的 HTTP APIs](https://github.com/donnemartin/system-design-primer#representational-state-transfer-rest)
- * 内部通讯 - [RPC](https://github.com/donnemartin/system-design-primer#remote-procedure-call-rpc)
-* [服务探索](https://github.com/donnemartin/system-design-primer#service-discovery)
+* Discuss tradeoffs:
+ * External communication with clients - [HTTP APIs following REST](https://github.com/donnemartin/system-design-primer#representational-state-transfer-rest)
+ * Internal communications - [RPC](https://github.com/donnemartin/system-design-primer#remote-procedure-call-rpc)
+* [Service discovery](https://github.com/donnemartin/system-design-primer#service-discovery)
-### 安全性
+### Security
-参考 [安全章节](https://github.com/donnemartin/system-design-primer#security)
+Refer to the [security section](https://github.com/donnemartin/system-design-primer#security).
-### 延迟数字指标
+### Latency numbers
-查阅 [每个程序员必懂的延迟数字](https://github.com/donnemartin/system-design-primer#latency-numbers-every-programmer-should-know)
+See [Latency numbers every programmer should know](https://github.com/donnemartin/system-design-primer#latency-numbers-every-programmer-should-know).
-### 正在进行
+### Ongoing
-* 继续基准测试并监控你的系统以解决出现的瓶颈问题
-* 扩展是一个迭代的过程
+* Continue benchmarking and monitoring your system to address bottlenecks as they come up
+* Scaling is an iterative process
diff --git a/solutions/system_design/social_graph/README.md b/solutions/system_design/social_graph/README.md
index 07b8e3e7..f7dfd4ef 100644
--- a/solutions/system_design/social_graph/README.md
+++ b/solutions/system_design/social_graph/README.md
@@ -1,66 +1,66 @@
-# 为社交网络设计数据结构
+# Design the data structures for a social network
-**注释:为了避免重复,这篇文章的链接直接关联到 [系统设计主题](https://github.com/donnemartin/system-design-primer#index-of-system-design-topics) 的相关章节。为一讨论要点、折中方案和可选方案做参考。**
+*Note: This document links directly to relevant areas found in the [system design topics](https://github.com/donnemartin/system-design-primer#index-of-system-design-topics) to avoid duplication. Refer to the linked content for general talking points, tradeoffs, and alternatives.*
-## 第 1 步:用例和约束概要
+## Step 1: Outline use cases and constraints
-> 收集需求并调查问题。
-> 通过提问清晰用例和约束。
-> 讨论假设。
+> Gather requirements and scope the problem.
+> Ask questions to clarify use cases and constraints.
+> Discuss assumptions.
-如果没有面试官提出明确的问题,我们将自己定义一些用例和约束条件。
+Without an interviewer to address clarifying questions, we'll define some use cases and constraints.
-### 用例
+### Use cases
-#### 我们就处理以下用例审视这一问题
+#### We'll scope the problem to handle only the following use cases
-* **用户** 寻找某人并显示与被寻人之间的最短路径
-* **服务** 高可用
+* **User** searches for someone and sees the shortest path to the searched person
+* **Service** has high availability
-### 约束和假设
+### Constraints and assumptions
-#### 状态假设
+#### State assumptions
-* 流量分布不均
- * 某些搜索比别的更热门,同时某些搜索仅执行一次
-* 图数据不适用单一机器
-* 图的边没有权重
-* 1 千万用户
-* 每个用户平均有 50 个朋友
-* 每月 10 亿次朋友搜索
+* Traffic is not evenly distributed
+ * Some searches are more popular than others, while others are only executed once
+* Graph data won't fit on a single machine
+* Graph edges are unweighted
+* 100 million users
+* 50 friends per user average
+* 1 billion friend searches per month
-训练使用更传统的系统 - 别用图特有的解决方案例如 [GraphQL](http://graphql.org/) 或图数据库如 [Neo4j](https://neo4j.com/)。
+Exercise the use of more traditional systems - don't use graph-specific solutions such as [GraphQL](http://graphql.org/) or a graph database like [Neo4j](https://neo4j.com/).
-#### 计算使用
+#### Calculate usage
-**向你的面试官厘清你是否应该做粗略的使用计算**
+**Clarify with your interviewer if you should run back-of-the-envelope usage calculations.**
-* 50 亿朋友关系
- * 1 亿用户 * 平均每人 50 个朋友
-* 每秒 400 次搜索请求
+* 5 billion friend relationships
+ * 100 million users * 50 friends per user average
+* 400 search requests per second
-便捷的转换指南:
+Handy conversion guide:
-* 每月 250 万秒
-* 每秒 1 个请求 = 每月 250 万次请求
-* 每秒 40 个请求 = 每月 1 亿次请求
-* 每秒 400 个请求 = 每月 10 亿次请求
+* 2.5 million seconds per month
+* 1 request per second = 2.5 million requests per month
+* 40 requests per second = 100 million requests per month
+* 400 requests per second = 1 billion requests per month
-## 第 2 步:创建高级设计方案
+## Step 2: Create a high level design
-> 用所有重要组件概述高水平设计
+> Outline a high level design with all important components.
![Imgur](http://i.imgur.com/wxXyq2J.png)
-## 第 3 步:设计核心组件
+## Step 3: Design core components
-> 深入每个核心组件的细节。
+> Dive into details for each core component.
-### 用例: 用户搜索某人并查看到被搜人的最短路径
+### Use case: User searches for someone and sees the shortest path to the searched person
-**和你的面试官说清你期望的代码量**
+**Clarify with your interviewer how much code you are expected to write**.
-没有百万用户(点)的和十亿朋友关系(边)的限制,我们能够用一般 BFS 方法解决无权重最短路径任务:
+Without the constraint of millions of users (vertices) and billions of friend relationships (edges), we could solve this unweighted shortest path task with a general BFS approach:
```python
class Graph(Graph):
@@ -99,22 +99,23 @@ class Graph(Graph):
return None
```
-我们不能在同一台机器上满足所有用户,我们需要通过 **人员服务器** [拆分](https://github.com/donnemartin/system-design-primer#sharding) 用户并且通过 **查询服务** 访问。
+We won't be able to fit all users on the same machine, so we'll need to [shard](https://github.com/donnemartin/system-design-primer#sharding) users across **Person Servers** and access them with a **Lookup Service**.
-* **客户端** 向 **服务器** 发送请求,**服务器** 作为 [反向代理](https://github.com/donnemartin/system-design-primer#reverse-proxy-web-server)
-* **搜索 API** 服务器向 **用户图服务** 转发请求
-* **用户图服务** 有以下功能:
- * 使用 **查询服务** 找到当前用户信息存储的 **人员服务器**
- * 找到适当的 **人员服务器** 检索当前用户的 `friend_ids` 列表
- * 把当前用户作为 `source` 运行 BFS 搜索算法同时 当前用户的 `friend_ids` 作为每个 `adjacent_node` 的 ids
- * 给定 id 获取 `adjacent_node`:
- * **用户图服务** 将 **再次** 和 **查询服务** 通讯,最后判断出和给定 id 相匹配的存储 `adjacent_node` 的 **人员服务器**(有待优化)
+* The **Client** sends a request to the **Web Server**, running as a [reverse proxy](https://github.com/donnemartin/system-design-primer#reverse-proxy-web-server)
+* The **Web Server** forwards the request to the **Search API** server
+* The **Search API** server forwards the request to the **User Graph Service**
+* The **User Graph Service** does the following:
+ * Uses the **Lookup Service** to find the **Person Server** where the current user's info is stored
+ * Finds the appropriate **Person Server** to retrieve the current user's list of `friend_ids`
+ * Runs a BFS search using the current user as the `source` and the current user's `friend_ids` as the ids for each `adjacent_node`
+ * To get the `adjacent_node` from a given id:
+      * The **User Graph Service** will *again* need to communicate with the **Lookup Service** to determine which **Person Server** stores the `adjacent_node` matching the given id (potential for optimization)
-**和你的面试官说清你应该写的代码量**
+**Clarify with your interviewer how much code you should be writing**.
-**注释**:简易版错误处理执行如下。询问你是否需要编写适当的错误处理方法。
+**Note**: Error handling is excluded below for simplicity. Ask if you should code proper error handling.
-**查询服务** 实现:
+**Lookup Service** implementation:
```python
class LookupService(object):
@@ -129,7 +130,7 @@ class LookupService(object):
return self.lookup[person_id]
```
-**人员服务器** 实现:
+**Person Server** implementation:
```python
class PersonServer(object):
@@ -148,7 +149,7 @@ class PersonServer(object):
return results
```
-**用户** 实现:
+**Person** implementation:
```python
class Person(object):
@@ -159,7 +160,7 @@ class Person(object):
self.friend_ids = friend_ids
```
-**用户图服务** 实现:
+**User Graph Service** implementation:
```python
class UserGraphService(object):
@@ -217,13 +218,13 @@ class UserGraphService(object):
return None
```
-我们用的是公共的 [**REST API**](https://github.com/donnemartin/system-design-primer#representational-state-transfer-rest):
+We'll use a public [**REST API**](https://github.com/donnemartin/system-design-primer#representational-state-transfer-rest):
```
$ curl https://social.com/api/v1/friend_search?person_id=1234
```
-响应:
+Response:
```
{
@@ -243,106 +244,106 @@ $ curl https://social.com/api/v1/friend_search?person_id=1234
},
```
-内部通信使用 [远端过程调用](https://github.com/donnemartin/system-design-primer#remote-procedure-call-rpc)。
+For internal communications, we could use [Remote Procedure Calls](https://github.com/donnemartin/system-design-primer#remote-procedure-call-rpc).
-## 第 4 步:扩展设计
+## Step 4: Scale the design
-> 在给定约束条件下,定义和确认瓶颈。
+> Identify and address bottlenecks, given the constraints.
![Imgur](http://i.imgur.com/cdCv5g7.png)
-**重要:别简化从最初设计到最终设计的过程!**
+**Important: Do not simply jump right into the final design from the initial design!**
-你将要做的是:1) **基准/负载 测试**, 2) 瓶颈 **概述**, 3) 当评估可选和折中方案时定位瓶颈,4) 重复。以 [在 AWS 上设计支持百万级到千万级用户的系统](../scaling_aws/README.md) 为参考迭代地扩展最初设计。
+State you would 1) **Benchmark/Load Test**, 2) **Profile** for bottlenecks, 3) address bottlenecks while evaluating alternatives and trade-offs, and 4) repeat. See [Design a system that scales to millions of users on AWS](../scaling_aws/README.md) as a sample of how to iteratively scale the initial design.
-讨论最初设计可能遇到的瓶颈和处理方法十分重要。例如,什么问题可以通过添加多台 **Web 服务器** 作为 **负载均衡** 解决?**CDN**?**主从副本**?每个问题都有哪些替代和 **折中** 方案?
+It's important to discuss what bottlenecks you might encounter with the initial design and how you might address each of them. For example, what issues are addressed by adding a **Load Balancer** with multiple **Web Servers**? **CDN**? **Master-Slave Replicas**? What are the alternatives and **Trade-Offs** for each?
-我们即将介绍一些组件来完成设计和解决扩展性问题。内部负载均衡不显示以减少混乱。
+We'll introduce some components to complete the design and to address scalability issues. Internal load balancers are not shown to reduce clutter.
-**避免重复讨论**,以下网址链接到 [系统设计主题](https://github.com/donnemartin/system-design-primer#index-of-system-design-topics) 相关的主流方案、折中方案和替代方案。
+*To avoid repeating discussions*, refer to the following [system design topics](https://github.com/donnemartin/system-design-primer#index-of-system-design-topics) for main talking points, tradeoffs, and alternatives:
* [DNS](https://github.com/donnemartin/system-design-primer#domain-name-system)
-* [负载均衡](https://github.com/donnemartin/system-design-primer#load-balancer)
-* [横向扩展](https://github.com/donnemartin/system-design-primer#horizontal-scaling)
-* [Web 服务器(反向代理)](https://github.com/donnemartin/system-design-primer#reverse-proxy-web-server)
-* [API 服务器(应用层)](https://github.com/donnemartin/system-design-primer#application-layer)
-* [缓存](https://github.com/donnemartin/system-design-primer#cache)
-* [一致性模式](https://github.com/donnemartin/system-design-primer#consistency-patterns)
-* [可用性模式](https://github.com/donnemartin/system-design-primer#availability-patterns)
+* [Load balancer](https://github.com/donnemartin/system-design-primer#load-balancer)
+* [Horizontal scaling](https://github.com/donnemartin/system-design-primer#horizontal-scaling)
+* [Web server (reverse proxy)](https://github.com/donnemartin/system-design-primer#reverse-proxy-web-server)
+* [API server (application layer)](https://github.com/donnemartin/system-design-primer#application-layer)
+* [Cache](https://github.com/donnemartin/system-design-primer#cache)
+* [Consistency patterns](https://github.com/donnemartin/system-design-primer#consistency-patterns)
+* [Availability patterns](https://github.com/donnemartin/system-design-primer#availability-patterns)
-解决 **平均** 每秒 400 次请求的限制(峰值),人员数据可以存在例如 Redis 或 Memcached 这样的 **内存** 中以减少响应次数和下游流量通信服务。这尤其在用户执行多次连续查询和查询哪些广泛连接的人时十分有用。从内存中读取 1MB 数据大约要 250 微秒,从 SSD 中读取同样大小的数据时间要长 4 倍,从硬盘要长 80 倍。1
+To address the constraint of 400 *average* read requests per second (higher at peak), person data can be served from a **Memory Cache** such as Redis or Memcached to reduce response times and to reduce traffic to downstream services. This could be especially useful for people who do multiple searches in succession and for people who are well-connected. Reading 1 MB sequentially from memory takes about 250 microseconds, while reading from SSD takes 4x and from disk takes 80x longer.1
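+A minimal cache-aside sketch of that read path, assuming Redis as the **Memory Cache**; the `person:<id>` key naming, the one hour TTL, and the hypothetical `person_server.get_person()` call are illustrative assumptions rather than part of the original design:
+
+```python
+import json
+
+import redis
+
+cache = redis.Redis(host='memory-cache', port=6379)
+PERSON_TTL_SECONDS = 60 * 60  # arbitrary expiry to keep cached entries fresh
+
+
+def get_person(person_id, person_server):
+    """Serve person data from the Memory Cache, falling back to the Person Server."""
+    key = 'person:{0}'.format(person_id)
+    cached = cache.get(key)
+    if cached is not None:
+        return json.loads(cached)
+    person = person_server.get_person(person_id)  # cache miss: fetch from the Person Server
+    cache.setex(key, PERSON_TTL_SECONDS, json.dumps(person))
+    return person
+```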
-以下是进一步优化方案:
+Below are further optimizations:
-* 在 **内存** 中存储完整的或部分的BFS遍历加快后续查找
-* 在 **NoSQL 数据库** 中批量离线计算并存储完整的或部分的BFS遍历加快后续查找
-* 在同一台 **人员服务器** 上托管批处理同一批朋友查找减少机器跳转
- * 通过地理位置 [拆分](https://github.com/donnemartin/system-design-primer#sharding) **人员服务器** 来进一步优化,因为朋友通常住得都比较近
-* 同时进行两个 BFS 查找,一个从 source 开始,一个从 destination 开始,然后合并两个路径
-* 从有庞大朋友圈的人开始找起,这样更有可能减小当前用户和搜索目标之间的 [离散度数](https://en.wikipedia.org/wiki/Six_degrees_of_separation)
-* 在询问用户是否继续查询之前设置基于时间或跳跃数阈值,当在某些案例中搜索耗费时间过长时。
-* 使用类似 [Neo4j](https://neo4j.com/) 的 **图数据库** 或图特定查询语法,例如 [GraphQL](http://graphql.org/)(如果没有禁止使用 **图数据库** 的限制的话)
+* Store complete or partial BFS traversals to speed up subsequent lookups in the **Memory Cache**
+* Batch compute offline then store complete or partial BFS traversals to speed up subsequent lookups in a **NoSQL Database**
+* Reduce machine jumps by batching together friend lookups hosted on the same **Person Server**
+ * [Shard](https://github.com/donnemartin/system-design-primer#sharding) **Person Servers** by location to further improve this, as friends generally live closer to each other
+* Do two BFS searches at the same time, one starting from the source, and one from the destination, then merge the two paths (a rough sketch follows this list)
+* Start the BFS search from people with large numbers of friends, as they are more likely to reduce the number of [degrees of separation](https://en.wikipedia.org/wiki/Six_degrees_of_separation) between the current user and the search target
+* Set a limit based on time or number of hops before asking the user if they want to continue searching, as searching could take a considerable amount of time in some cases
+* Use a **Graph Database** such as [Neo4j](https://neo4j.com/) or a graph-specific query language such as [GraphQL](http://graphql.org/) (if there were no constraint preventing the use of **Graph Databases**)
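+The bidirectional BFS optimization above could look roughly like this sketch, assuming a hypothetical `get_friend_ids(person_id)` helper that wraps the **Lookup Service** and **Person Server** calls; it returns only the number of hops, not the path itself:
+
+```python
+def bidirectional_bfs_distance(source_id, dest_id, get_friend_ids):
+    """Return the number of hops between two users, or None if they are not connected."""
+    if source_id == dest_id:
+        return 0
+    frontier_a, frontier_b = {source_id}, {dest_id}
+    visited_a, visited_b = {source_id}, {dest_id}
+    levels_expanded = 0
+    while frontier_a and frontier_b:
+        # Always expand the smaller frontier to keep the search balanced
+        if len(frontier_a) > len(frontier_b):
+            frontier_a, frontier_b = frontier_b, frontier_a
+            visited_a, visited_b = visited_b, visited_a
+        next_frontier = set()
+        for person_id in frontier_a:
+            for friend_id in get_friend_ids(person_id):
+                if friend_id in visited_b:      # the two searches met
+                    return levels_expanded + 1
+                if friend_id not in visited_a:
+                    visited_a.add(friend_id)
+                    next_frontier.add(friend_id)
+        frontier_a = next_frontier
+        levels_expanded += 1
+    return None
+```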
-## 额外的话题
+## Additional talking points
-> 根据问题的范围和剩余时间,还需要深入讨论其他问题。
+> Additional topics to dive into, depending on the problem scope and time remaining.
-### SQL 扩展模式
+### SQL scaling patterns
-* [读取副本](https://github.com/donnemartin/system-design-primer#master-slave-replication)
-* [集合](https://github.com/donnemartin/system-design-primer#federation)
-* [分区](https://github.com/donnemartin/system-design-primer#sharding)
-* [反规范化](https://github.com/donnemartin/system-design-primer#denormalization)
-* [SQL 调优](https://github.com/donnemartin/system-design-primer#sql-tuning)
+* [Read replicas](https://github.com/donnemartin/system-design-primer#master-slave-replication)
+* [Federation](https://github.com/donnemartin/system-design-primer#federation)
+* [Sharding](https://github.com/donnemartin/system-design-primer#sharding)
+* [Denormalization](https://github.com/donnemartin/system-design-primer#denormalization)
+* [SQL Tuning](https://github.com/donnemartin/system-design-primer#sql-tuning)
#### NoSQL
-* [键值存储](https://github.com/donnemartin/system-design-primer#key-value-store)
-* [文档存储](https://github.com/donnemartin/system-design-primer#document-store)
-* [宽表存储](https://github.com/donnemartin/system-design-primer#wide-column-store)
-* [图数据库](https://github.com/donnemartin/system-design-primer#graph-database)
+* [Key-value store](https://github.com/donnemartin/system-design-primer#key-value-store)
+* [Document store](https://github.com/donnemartin/system-design-primer#document-store)
+* [Wide column store](https://github.com/donnemartin/system-design-primer#wide-column-store)
+* [Graph database](https://github.com/donnemartin/system-design-primer#graph-database)
* [SQL vs NoSQL](https://github.com/donnemartin/system-design-primer#sql-or-nosql)
-### 缓存
+### Caching
-* 缓存到哪里
- * [客户端缓存](https://github.com/donnemartin/system-design-primer#client-caching)
- * [CDN 缓存](https://github.com/donnemartin/system-design-primer#cdn-caching)
- * [Web 服务缓存](https://github.com/donnemartin/system-design-primer#web-server-caching)
- * [数据库缓存](https://github.com/donnemartin/system-design-primer#database-caching)
- * [应用缓存](https://github.com/donnemartin/system-design-primer#application-caching)
-* 缓存什么
- * [数据库请求层缓存](https://github.com/donnemartin/system-design-primer#caching-at-the-database-query-level)
- * [对象层缓存](https://github.com/donnemartin/system-design-primer#caching-at-the-object-level)
-* 何时更新缓存
- * [预留缓存](https://github.com/donnemartin/system-design-primer#cache-aside)
- * [完全写入](https://github.com/donnemartin/system-design-primer#write-through)
- * [延迟写 (写回)](https://github.com/donnemartin/system-design-primer#write-behind-write-back)
- * [事先更新](https://github.com/donnemartin/system-design-primer#refresh-ahead)
+* Where to cache
+ * [Client caching](https://github.com/donnemartin/system-design-primer#client-caching)
+ * [CDN caching](https://github.com/donnemartin/system-design-primer#cdn-caching)
+ * [Web server caching](https://github.com/donnemartin/system-design-primer#web-server-caching)
+ * [Database caching](https://github.com/donnemartin/system-design-primer#database-caching)
+ * [Application caching](https://github.com/donnemartin/system-design-primer#application-caching)
+* What to cache
+ * [Caching at the database query level](https://github.com/donnemartin/system-design-primer#caching-at-the-database-query-level)
+ * [Caching at the object level](https://github.com/donnemartin/system-design-primer#caching-at-the-object-level)
+* When to update the cache
+ * [Cache-aside](https://github.com/donnemartin/system-design-primer#cache-aside)
+ * [Write-through](https://github.com/donnemartin/system-design-primer#write-through)
+ * [Write-behind (write-back)](https://github.com/donnemartin/system-design-primer#write-behind-write-back)
+ * [Refresh ahead](https://github.com/donnemartin/system-design-primer#refresh-ahead)
-### 异步性和微服务
+### Asynchronism and microservices
-* [消息队列](https://github.com/donnemartin/system-design-primer#message-queues)
-* [任务队列](https://github.com/donnemartin/system-design-primer#task-queues)
-* [回退压力](https://github.com/donnemartin/system-design-primer#back-pressure)
-* [微服务](https://github.com/donnemartin/system-design-primer#microservices)
+* [Message queues](https://github.com/donnemartin/system-design-primer#message-queues)
+* [Task queues](https://github.com/donnemartin/system-design-primer#task-queues)
+* [Back pressure](https://github.com/donnemartin/system-design-primer#back-pressure)
+* [Microservices](https://github.com/donnemartin/system-design-primer#microservices)
-### 沟通
+### Communications
-* 关于折中方案的讨论:
- * 客户端的外部通讯 - [遵循 REST 的 HTTP APIs](https://github.com/donnemartin/system-design-primer#representational-state-transfer-rest)
- * 内部通讯 - [RPC](https://github.com/donnemartin/system-design-primer#remote-procedure-call-rpc)
-* [服务探索](https://github.com/donnemartin/system-design-primer#service-discovery)
+* Discuss tradeoffs:
+ * External communication with clients - [HTTP APIs following REST](https://github.com/donnemartin/system-design-primer#representational-state-transfer-rest)
+ * Internal communications - [RPC](https://github.com/donnemartin/system-design-primer#remote-procedure-call-rpc)
+* [Service discovery](https://github.com/donnemartin/system-design-primer#service-discovery)
-### 安全性
+### Security
-参考 [安全章节](https://github.com/donnemartin/system-design-primer#security)
+Refer to the [security section](https://github.com/donnemartin/system-design-primer#security).
-### 延迟数字指标
+### Latency numbers
-查阅 [每个程序员必懂的延迟数字](https://github.com/donnemartin/system-design-primer#latency-numbers-every-programmer-should-know)
+See [Latency numbers every programmer should know](https://github.com/donnemartin/system-design-primer#latency-numbers-every-programmer-should-know).
-### 正在进行
+### Ongoing
-* 继续基准测试并监控你的系统以解决出现的瓶颈问题
-* 扩展是一个迭代的过程
+* Continue benchmarking and monitoring your system to address bottlenecks as they come up
+* Scaling is an iterative process
diff --git a/solutions/system_design/twitter/README.md b/solutions/system_design/twitter/README.md
index 1853444d..374f5dd2 100644
--- a/solutions/system_design/twitter/README.md
+++ b/solutions/system_design/twitter/README.md
@@ -1,126 +1,126 @@
-# 设计推特时间轴与搜索功能
+# Design the Twitter timeline and search
-**注意:这个文档中的链接会直接指向[系统设计主题索引](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#系统设计主题的索引)中的有关部分,以避免重复的内容。你可以参考链接的相关内容,来了解其总的要点、方案的权衡取舍以及可选的替代方案。**
+*Note: This document links directly to relevant areas found in the [system design topics](https://github.com/donnemartin/system-design-primer#index-of-system-design-topics) to avoid duplication. Refer to the linked content for general talking points, tradeoffs, and alternatives.*
-**设计 Facebook 的 feed** 与**设计 Facebook 搜索**与此为同一类型问题。
+**Design the Facebook feed** and **Design Facebook search** are similar questions.
-## 第一步:简述用例与约束条件
+## Step 1: Outline use cases and constraints
-> 搜集需求与问题的范围。
-> 提出问题来明确用例与约束条件。
-> 讨论假设。
+> Gather requirements and scope the problem.
+> Ask questions to clarify use cases and constraints.
+> Discuss assumptions.
-我们将在没有面试官明确说明问题的情况下,自己定义一些用例以及限制条件。
+Without an interviewer to address clarifying questions, we'll define some use cases and constraints.
-### 用例
+### Use cases
-#### 我们将把问题限定在仅处理以下用例的范围中
+#### We'll scope the problem to handle only the following use cases
-* **用户**发布了一篇推特
- * **服务**将推特推送给关注者,给他们发送消息通知与邮件
-* **用户**浏览用户时间轴(用户最近的活动)
-* **用户**浏览主页时间轴(用户关注的人最近的活动)
-* **用户**搜索关键词
-* **服务**需要有高可用性
+* **User** posts a tweet
+ * **Service** pushes tweets to followers, sending push notifications and emails
+* **User** views the user timeline (activity from the user)
+* **User** views the home timeline (activity from people the user is following)
+* **User** searches keywords
+* **Service** has high availability
-#### 不在用例范围内的有
+#### Out of scope
-* **服务**向 Firehose 与其它流数据接口推送推特
-* **服务**根据用户的”是否可见“选项排除推特
- * 隐藏未关注者的 @回复
- * 关心”隐藏转发“设置
-* 数据分析
+* **Service** pushes tweets to the Twitter Firehose and other streams
+* **Service** strips out tweets based on user's visibility settings
+ * Hide @reply if the user is not also following the person being replied to
+ * Respect 'hide retweets' setting
+* Analytics
-### 限制条件与假设
+### Constraints and assumptions
-#### 提出假设
+#### State assumptions
-普遍情况
+General
-* 网络流量不是均匀分布的
-* 发布推特的速度需要足够快速
- * 除非有上百万的关注者,否则将推特推送给粉丝的速度要足够快
-* 1 亿个活跃用户
-* 每天新发布 5 亿条推特,每月新发布 150 亿条推特
- * 平均每条推特需要推送给 5 个人
- * 每天需要进行 50 亿次推送
- * 每月需要进行 1500 亿次推送
-* 每月需要处理 2500 亿次读取请求
-* 每月需要处理 100 亿次搜索
+* Traffic is not evenly distributed
+* Posting a tweet should be fast
+ * Fanning out a tweet to all of your followers should be fast, unless you have millions of followers
+* 100 million active users
+* 500 million tweets per day or 15 billion tweets per month
+ * Each tweet averages a fanout of 10 deliveries
+ * 5 billion total tweets delivered on fanout per day
+ * 150 billion tweets delivered on fanout per month
+* 250 billion read requests per month
+* 10 billion searches per month
-时间轴功能
+Timeline
-* 浏览时间轴需要足够快
-* 推特的读取负载要大于写入负载
- * 需要为推特的快速读取进行优化
-* 存入推特是高写入负载功能
+* Viewing the timeline should be fast
+* Twitter is more read heavy than write heavy
+ * Optimize for fast reads of tweets
+* Ingesting tweets is write heavy
-搜索功能
+Search
-* 搜索速度需要足够快
-* 搜索是高负载读取功能
+* Searching should be fast
+* Search is read-heavy
-#### 计算用量
+#### Calculate usage
-**如果你需要进行粗略的用量计算,请向你的面试官说明。**
+**Clarify with your interviewer if you should run back-of-the-envelope usage calculations.**
-* 每条推特的大小:
- * `tweet_id` - 8 字节
- * `user_id` - 32 字节
- * `text` - 140 字节
- * `media` - 平均 10 KB
- * 总计: 大约 10 KB
-* 每月产生新推特的内容为 150 TB
- * 每条推特 10 KB * 每天 5 亿条推特 * 每月 30 天
- * 3 年产生新推特的内容为 5.4 PB
-* 每秒需要处理 10 万次读取请求
- * 每个月需要处理 2500 亿次请求 * (每秒 400 次请求 / 每月 10 亿次请求)
-* 每秒发布 6000 条推特
- * 每月发布 150 亿条推特 * (每秒 400 次请求 / 每月 10 次请求)
-* 每秒推送 6 万条推特
- * 每月推送 1500 亿条推特 * (每秒 400 次请求 / 每月 10 亿次请求)
-* 每秒 4000 次搜索请求
+* Size per tweet:
+ * `tweet_id` - 8 bytes
+ * `user_id` - 32 bytes
+ * `text` - 140 bytes
+ * `media` - 10 KB average
+ * Total: ~10 KB
+* 150 TB of new tweet content per month
+ * 10 KB per tweet * 500 million tweets per day * 30 days per month
+ * 5.4 PB of new tweet content in 3 years
+* 100 thousand read requests per second
+ * 250 billion read requests per month * (400 requests per second / 1 billion requests per month)
+* 6,000 tweets per second
+ * 15 billion tweets per month * (400 requests per second / 1 billion requests per month)
+* 60 thousand tweets delivered on fanout per second
+ * 150 billion tweets delivered on fanout per month * (400 requests per second / 1 billion requests per month)
+* 4,000 search requests per second
-便利换算指南:
+Handy conversion guide:
-* 每个月有 250 万秒
-* 每秒一个请求 = 每个月 250 万次请求
-* 每秒 40 个请求 = 每个月 1 亿次请求
-* 每秒 400 个请求 = 每个月 10 亿次请求
+* 2.5 million seconds per month
+* 1 request per second = 2.5 million requests per month
+* 40 requests per second = 100 million requests per month
+* 400 requests per second = 1 billion requests per month
-## 第二步:概要设计
+## Step 2: Create a high level design
-> 列出所有重要组件以规划概要设计。
+> Outline a high level design with all important components.
![Imgur](http://i.imgur.com/48tEA2j.png)
-## 第三步:设计核心组件
+## Step 3: Design core components
-> 深入每个核心组件的细节。
+> Dive into details for each core component.
-### 用例:用户发表了一篇推特
+### Use case: User posts a tweet
-我们可以将用户自己发表的推特存储在[关系数据库](https://github.com/donnemartin/system-design-primer#relational-database-management-system-rdbms)中。我们也可以讨论一下[究竟是用 SQL 还是用 NoSQL](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#sql-还是-nosql)。
+We could store the user's own tweets to populate the user timeline (activity from the user) in a [relational database](https://github.com/donnemartin/system-design-primer#relational-database-management-system-rdbms). We should discuss the [use cases and tradeoffs between choosing SQL or NoSQL](https://github.com/donnemartin/system-design-primer#sql-or-nosql).
-构建用户主页时间轴(查看关注用户的活动)以及推送推特是件麻烦事。将特推传播给所有关注者(每秒约递送 6 万条推特)这一操作有可能会使传统的[关系数据库](https://github.com/donnemartin/system-design-primer#relational-database-management-system-rdbms)超负载。因此,我们可以使用 **NoSQL 数据库**或**内存数据库**之类的更快的数据存储方式。从内存读取 1 MB 连续数据大约要花 250 微秒,而从 SSD 读取同样大小的数据要花费 4 倍的时间,从机械硬盘读取需要花费 80 倍以上的时间。1
+Delivering tweets and building the home timeline (activity from people the user is following) is trickier. Fanning out tweets to all followers (60 thousand tweets delivered on fanout per second) will overload a traditional [relational database](https://github.com/donnemartin/system-design-primer#relational-database-management-system-rdbms). We'll probably want to choose a data store with fast writes such as a **NoSQL database** or **Memory Cache**. Reading 1 MB sequentially from memory takes about 250 microseconds, while reading from SSD takes 4x and from disk takes 80x longer.1
-我们可以将照片、视频之类的媒体存储于**对象存储**中。
+We could store media such as photos or videos on an **Object Store**.
-* **客户端**向应用[反向代理](https://github.com/donnemartin/system-design-primer#reverse-proxy-web-server)的**Web 服务器**发送一条推特
-* **Web 服务器**将请求转发给**写 API**服务器
-* **写 API**服务器将推特使用 **SQL 数据库**存储于用户时间轴中
-* **写 API**调用**消息输出服务**,进行以下操作:
- * 查询**用户 图 服务**找到存储于**内存缓存**中的此用户的粉丝
- * 将推特存储于**内存缓存**中的**此用户的粉丝的主页时间轴**中
- * O(n) 复杂度操作: 1000 名粉丝 = 1000 次查找与插入
- * 将特推存储在**搜索索引服务**中,以加快搜索
- * 将媒体存储于**对象存储**中
- * 使用**通知服务**向粉丝发送推送:
- * 使用**队列**异步推送通知
+* The **Client** posts a tweet to the **Web Server**, running as a [reverse proxy](https://github.com/donnemartin/system-design-primer#reverse-proxy-web-server)
+* The **Web Server** forwards the request to the **Write API** server
+* The **Write API** stores the tweet in the user's timeline on a **SQL database**
+* The **Write API** contacts the **Fan Out Service**, which does the following:
+ * Queries the **User Graph Service** to find the user's followers stored in the **Memory Cache**
+ * Stores the tweet in the *home timeline of the user's followers* in a **Memory Cache**
+ * O(n) operation: 1,000 followers = 1,000 lookups and inserts
+ * Stores the tweet in the **Search Index Service** to enable fast searching
+ * Stores media in the **Object Store**
+ * Uses the **Notification Service** to send out push notifications to followers:
+ * Uses a **Queue** (not pictured) to asynchronously send out notifications
-**向你的面试官告知你准备写多少代码**。
+**Clarify with your interviewer how much code you are expected to write**.
-如果我们用 Redis 作为**内存缓存**,那可以用 Redis 原生的 list 作为其数据结构。结构如下:
+If our **Memory Cache** is Redis, we could use a native Redis list with the following structure:
```
tweet n+2 tweet n+1 tweet n
@@ -128,9 +128,9 @@
| tweet_id user_id meta | tweet_id user_id meta | tweet_id user_id meta |
```
-新发布的推特将被存储在对应用户(关注且活跃的用户)的主页时间轴的**内存缓存**中。
+The new tweet would be placed in the **Memory Cache**, which populates the user's home timeline (activity from people the user is following).
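+A rough sketch of that fanout write, assuming Redis as the **Memory Cache**; the `home_timeline:<user_id>` key naming, the entry encoding, and the list cap are illustrative assumptions:
+
+```python
+import redis
+
+cache = redis.Redis(host='memory-cache', port=6379)
+HOME_TIMELINE_MAX_LEN = 800  # arbitrary cap on entries kept per home timeline
+
+
+def fan_out_tweet(tweet_id, user_id, meta, follower_ids):
+    """Prepend the tweet entry to each follower's home timeline list."""
+    entry = '{0}:{1}:{2}'.format(tweet_id, user_id, meta)
+    for follower_id in follower_ids:
+        key = 'home_timeline:{0}'.format(follower_id)
+        cache.lpush(key, entry)                          # newest entry at the head
+        cache.ltrim(key, 0, HOME_TIMELINE_MAX_LEN - 1)   # bound the list length
+```
+
+In practice the per-follower `LPUSH`/`LTRIM` calls would likely be batched in a Redis pipeline to cut down on round trips.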
-我们可以调用一个公共的 [REST API](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#表述性状态转移rest):
+We'll use a public [**REST API**](https://github.com/donnemartin/system-design-primer#representational-state-transfer-rest):
```
$ curl -X POST --data '{ "user_id": "123", "auth_token": "ABC123", \
@@ -138,7 +138,7 @@ $ curl -X POST --data '{ "user_id": "123", "auth_token": "ABC123", \
https://twitter.com/api/v1/tweet
```
-返回:
+Response:
```
{
@@ -150,24 +150,24 @@ $ curl -X POST --data '{ "user_id": "123", "auth_token": "ABC123", \
}
```
-而对于服务器内部的通信,我们可以使用 [RPC](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#远程过程调用协议rpc)。
+For internal communications, we could use [Remote Procedure Calls](https://github.com/donnemartin/system-design-primer#remote-procedure-call-rpc).
-### 用例:用户浏览主页时间轴
+### Use case: User views the home timeline
-* **客户端**向 **Web 服务器**发起一次读取主页时间轴的请求
-* **Web 服务器**将请求转发给**读取 API**服务器
-* **读取 API**服务器调用**时间轴服务**进行以下操作:
- * 从**内存缓存**读取时间轴数据,其中包括推特 id 与用户 id - O(1)
- * 通过 [multiget](http://redis.io/commands/mget) 向**推特信息服务**进行查询,以获取相关 id 推特的额外信息 - O(n)
- * 通过 muiltiget 向**用户信息服务**进行查询,以获取相关 id 用户的额外信息 - O(n)
+* The **Client** posts a home timeline request to the **Web Server**
+* The **Web Server** forwards the request to the **Read API** server
+* The **Read API** server contacts the **Timeline Service**, which does the following (sketched after this list):
+ * Gets the timeline data stored in the **Memory Cache**, containing tweet ids and user ids - O(1)
+ * Queries the **Tweet Info Service** with a [multiget](http://redis.io/commands/mget) to obtain additional info about the tweet ids - O(n)
+ * Queries the **User Info Service** with a multiget to obtain additional info about the user ids - O(n)
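+Assuming the illustrative Redis layout from the previous use case, the read path might look like the following sketch; `tweet_info_service` and `user_info_service` are hypothetical clients that expose a `multiget` call:
+
+```python
+import redis
+
+cache = redis.Redis(host='memory-cache', port=6379)
+
+
+def get_home_timeline(user_id, tweet_info_service, user_info_service, page_size=40):
+    """Read tweet and user ids from the Memory Cache, then hydrate them with multigets."""
+    key = 'home_timeline:{0}'.format(user_id)
+    entries = [entry.decode('utf-8') for entry in cache.lrange(key, 0, page_size - 1)]
+    tweet_ids = [entry.split(':')[0] for entry in entries]
+    user_ids = [entry.split(':')[1] for entry in entries]
+    tweets = tweet_info_service.multiget(tweet_ids)  # O(n) multiget per page
+    users = user_info_service.multiget(user_ids)     # O(n) multiget per page
+    return tweets, users
+```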
-REST API:
+REST API:
```
$ curl https://twitter.com/api/v1/home_timeline?user_id=123
```
-返回:
+Response:
```
{
@@ -187,145 +187,146 @@ $ curl https://twitter.com/api/v1/home_timeline?user_id=123
},
```
-### 用例:用户浏览用户时间轴
+### Use case: User views the user timeline
-* **客户端**向**Web 服务器**发起获得用户时间线的请求
-* **Web 服务器**将请求转发给**读取 API**服务器
-* **读取 API**从 **SQL 数据库**中取出用户的时间轴
+* The **Client** posts a user timeline request to the **Web Server**
+* The **Web Server** forwards the request to the **Read API** server
+* The **Read API** retrieves the user timeline from the **SQL Database**
-REST API 与前面的主页时间轴类似,区别只在于取出的推特是由用户自己发送而不是关注人发送。
+The REST API would be similar to the home timeline, except all tweets would come from the user as opposed to the people the user is following.
-### 用例:用户搜索关键词
+### Use case: User searches keywords
-* **客户端**将搜索请求发给**Web 服务器**
-* **Web 服务器**将请求转发给**搜索 API**服务器
-* **搜索 API**调用**搜索服务**进行以下操作:
- * 对输入进行转换与分词,弄明白需要搜索什么东西
- * 移除标点等额外内容
- * 将文本打散为词组
- * 修正拼写错误
- * 规范字母大小写
- * 将查询转换为布尔操作
- * 查询**搜索集群**(例如[Lucene](https://lucene.apache.org/))检索结果:
- * 对集群内的所有服务器进行查询,将有结果的查询进行[发散聚合(Scatter gathers)](https://github.com/donnemartin/system-design-primer#under-development)
- * 合并取到的条目,进行评分与排序,最终返回结果
+* The **Client** sends a search request to the **Web Server**
+* The **Web Server** forwards the request to the **Search API** server
+* The **Search API** contacts the **Search Service**, which does the following (the query parsing is sketched after this list):
+ * Parses/tokenizes the input query, determining what needs to be searched
+ * Removes markup
+ * Breaks up the text into terms
+ * Fixes typos
+ * Normalizes capitalization
+ * Converts the query to use boolean operations
+  * Queries the **Search Cluster** (e.g., [Lucene](https://lucene.apache.org/)) for the results:
+ * [Scatter gathers](https://github.com/donnemartin/system-design-primer#under-development) each server in the cluster to determine if there are any results for the query
+ * Merges, ranks, sorts, and returns the results
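+A toy version of the query parsing steps above; the stopword list and regex are illustrative, and typo correction and richer boolean parsing are left out:
+
+```python
+import re
+
+STOPWORDS = {'a', 'an', 'and', 'or', 'the'}  # illustrative only
+
+
+def parse_query(query):
+    """Normalize and tokenize a raw query into terms combined with AND."""
+    text = re.sub(r'[^\w\s]', ' ', query)            # strip punctuation and markup remnants
+    terms = [term.lower() for term in text.split()]
+    terms = [term for term in terms if term not in STOPWORDS]
+    # A real service would also fix typos here before building the boolean query
+    return {'operation': 'AND', 'terms': terms}
+```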
-REST API:
+REST API:
```
$ curl https://twitter.com/api/v1/search?query=hello+world
```
-返回结果与前面的主页时间轴类似,只不过返回的是符合查询条件的推特。
+The response would be similar to that of the home timeline, except for tweets matching the given query.
-## 第四步:架构扩展
+## Step 4: Scale the design
-> 根据限制条件,找到并解决瓶颈。
+> Identify and address bottlenecks, given the constraints.
-![Imgur](http://i.imgur.com/MzExP06.png)
+![Imgur](http://i.imgur.com/jrUBAF7.png)
-**重要提示:不要从最初设计直接跳到最终设计中!**
+**Important: Do not simply jump right into the final design from the initial design!**
-现在你要 1) **基准测试、负载测试**。2) **分析、描述**性能瓶颈。3) 在解决瓶颈问题的同时,评估替代方案、权衡利弊。4) 重复以上步骤。请阅读[「设计一个系统,并将其扩大到为数以百万计的 AWS 用户服务」](../scaling_aws/README.md) 来了解如何逐步扩大初始设计。
+State you would 1) **Benchmark/Load Test**, 2) **Profile** for bottlenecks, 3) address bottlenecks while evaluating alternatives and trade-offs, and 4) repeat. See [Design a system that scales to millions of users on AWS](../scaling_aws/README.md) as a sample of how to iteratively scale the initial design.
-讨论初始设计可能遇到的瓶颈及相关解决方案是很重要的。例如加上一个配置多台 **Web 服务器**的**负载均衡器**是否能够解决问题?**CDN**呢?**主从复制**呢?它们各自的替代方案和需要**权衡**的利弊又有什么呢?
+It's important to discuss what bottlenecks you might encounter with the initial design and how you might address each of them. For example, what issues are addressed by adding a **Load Balancer** with multiple **Web Servers**? **CDN**? **Master-Slave Replicas**? What are the alternatives and **Trade-Offs** for each?
-我们将会介绍一些组件来完成设计,并解决架构扩张问题。内置的负载均衡器将不做讨论以节省篇幅。
+We'll introduce some components to complete the design and to address scalability issues. Internal load balancers are not shown to reduce clutter.
-**为了避免重复讨论**,请参考[系统设计主题索引](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#系统设计主题的索引)相关部分来了解其要点、方案的权衡取舍以及可选的替代方案。
+*To avoid repeating discussions*, refer to the following [system design topics](https://github.com/donnemartin/system-design-primer#index-of-system-design-topics) for main talking points, tradeoffs, and alternatives:
-* [DNS](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#域名系统)
-* [负载均衡器](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#负载均衡器)
-* [水平拓展](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#水平扩展)
-* [反向代理(web 服务器)](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#反向代理web-服务器)
-* [API 服务(应用层)](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#应用层)
-* [缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#缓存)
-* [关系型数据库管理系统 (RDBMS)](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#关系型数据库管理系统rdbms)
-* [SQL 故障主从切换](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#故障切换)
-* [主从复制](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#主从复制)
-* [一致性模式](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#一致性模式)
-* [可用性模式](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#可用性模式)
+* [DNS](https://github.com/donnemartin/system-design-primer#domain-name-system)
+* [CDN](https://github.com/donnemartin/system-design-primer#content-delivery-network)
+* [Load balancer](https://github.com/donnemartin/system-design-primer#load-balancer)
+* [Horizontal scaling](https://github.com/donnemartin/system-design-primer#horizontal-scaling)
+* [Web server (reverse proxy)](https://github.com/donnemartin/system-design-primer#reverse-proxy-web-server)
+* [API server (application layer)](https://github.com/donnemartin/system-design-primer#application-layer)
+* [Cache](https://github.com/donnemartin/system-design-primer#cache)
+* [Relational database management system (RDBMS)](https://github.com/donnemartin/system-design-primer#relational-database-management-system-rdbms)
+* [SQL write master-slave failover](https://github.com/donnemartin/system-design-primer#fail-over)
+* [Master-slave replication](https://github.com/donnemartin/system-design-primer#master-slave-replication)
+* [Consistency patterns](https://github.com/donnemartin/system-design-primer#consistency-patterns)
+* [Availability patterns](https://github.com/donnemartin/system-design-primer#availability-patterns)
-**消息输出服务**有可能成为性能瓶颈。那些有着百万数量关注着的用户可能发一条推特就需要好几分钟才能完成消息输出进程。这有可能使 @回复 这种推特时出现竞争条件,因此需要根据服务时间对此推特进行重排序来降低影响。
+The **Fanout Service** is a potential bottleneck. Twitter users with millions of followers could take several minutes to have their tweets go through the fanout process. This could lead to race conditions with @replies to the tweet, which we could mitigate by re-ordering the tweets at serve time.
-我们还可以避免从高关注量的用户输出推特。相反,我们可以通过搜索来找到高关注量用户的推特,并将搜索结果与用户的主页时间轴合并,再根据时间对其进行排序。
+We could also avoid fanning out tweets from highly-followed users. Instead, we could search to find tweets for highly-followed users, merge the search results with the user's home timeline results, then re-order the tweets at serve time.
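+A rough sketch of that serve-time merge, assuming both inputs are lists of dicts carrying a `timestamp` field:
+
+```python
+from operator import itemgetter
+
+
+def merge_timeline(fanned_out_tweets, celebrity_tweets, page_size=40):
+    """Combine precomputed timeline entries with celebrity tweets fetched at read time."""
+    merged = sorted(fanned_out_tweets + celebrity_tweets,
+                    key=itemgetter('timestamp'), reverse=True)  # newest first
+    return merged[:page_size]
+```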
-此外,还可以通过以下内容进行优化:
+Additional optimizations include:
-* 仅为每个主页时间轴在**内存缓存**中存储数百条推特
-* 仅在**内存缓存**中存储活动用户的主页时间轴
- * 如果某个用户在过去 30 天都没有产生活动,那我们可以使用 **SQL 数据库**重新构建他的时间轴
- * 使用**用户 图 服务**来查询并确定用户关注的人
- * 从 **SQL 数据库**中取出推特,并将它们存入**内存缓存**
-* 仅在**推特信息服务**中存储一个月的推特
-* 仅在**用户信息服务**中存储活动用户的信息
-* **搜索集群**需要将推特保留在内存中,以降低延迟
+* Keep only several hundred tweets for each home timeline in the **Memory Cache**
+* Keep only active users' home timeline info in the **Memory Cache**
+  * If a user has not been active in the past 30 days, we could rebuild the timeline from the **SQL Database**
+ * Query the **User Graph Service** to determine who the user is following
+ * Get the tweets from the **SQL Database** and add them to the **Memory Cache**
+* Store only a month of tweets in the **Tweet Info Service**
+* Store only active users in the **User Info Service**
+* The **Search Cluster** would likely need to keep the tweets in memory to keep latency low
-我们还可以考虑优化 **SQL 数据库** 来解决一些瓶颈问题。
+We'll also want to address the bottleneck with the **SQL Database**.
-**内存缓存**能减小一些数据库的负载,靠 **SQL Read 副本**已经足够处理缓存未命中情况。我们还可以考虑使用一些额外的 SQL 性能拓展技术。
+Although the **Memory Cache** should reduce the load on the database, it is unlikely the **SQL Read Replicas** alone would be enough to handle the cache misses. We'll probably need to employ additional SQL scaling patterns.
-高容量的写入将淹没单个的 **SQL 写主从**模式,因此需要更多的拓展技术。
+The high volume of writes would overwhelm a single **SQL Write Master-Slave**, also pointing to a need for additional scaling techniques.
-* [联合](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#联合)
-* [分片](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#分片)
-* [非规范化](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#非规范化)
-* [SQL 调优](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#sql-调优)
+* [Federation](https://github.com/donnemartin/system-design-primer#federation)
+* [Sharding](https://github.com/donnemartin/system-design-primer#sharding)
+* [Denormalization](https://github.com/donnemartin/system-design-primer#denormalization)
+* [SQL Tuning](https://github.com/donnemartin/system-design-primer#sql-tuning)
-我们也可以考虑将一些数据移至 **NoSQL 数据库**。
+We should also consider moving some data to a **NoSQL Database**.
-## 其它要点
+## Additional talking points
-> 是否深入这些额外的主题,取决于你的问题范围和剩下的时间。
+> Additional topics to dive into, depending on the problem scope and time remaining.
#### NoSQL
-* [键-值存储](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#键-值存储)
-* [文档类型存储](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#文档类型存储)
-* [列型存储](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#列型存储)
-* [图数据库](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#图数据库)
-* [SQL vs NoSQL](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#sql-还是-nosql)
+* [Key-value store](https://github.com/donnemartin/system-design-primer#key-value-store)
+* [Document store](https://github.com/donnemartin/system-design-primer#document-store)
+* [Wide column store](https://github.com/donnemartin/system-design-primer#wide-column-store)
+* [Graph database](https://github.com/donnemartin/system-design-primer#graph-database)
+* [SQL vs NoSQL](https://github.com/donnemartin/system-design-primer#sql-or-nosql)
-### 缓存
+### Caching
-* 在哪缓存
- * [客户端缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#客户端缓存)
- * [CDN 缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#cdn-缓存)
- * [Web 服务器缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#web-服务器缓存)
- * [数据库缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#数据库缓存)
- * [应用缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#应用缓存)
-* 什么需要缓存
- * [数据库查询级别的缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#数据库查询级别的缓存)
- * [对象级别的缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#对象级别的缓存)
-* 何时更新缓存
- * [缓存模式](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#缓存模式)
- * [直写模式](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#直写模式)
- * [回写模式](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#回写模式)
- * [刷新](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#刷新)
+* Where to cache
+ * [Client caching](https://github.com/donnemartin/system-design-primer#client-caching)
+ * [CDN caching](https://github.com/donnemartin/system-design-primer#cdn-caching)
+ * [Web server caching](https://github.com/donnemartin/system-design-primer#web-server-caching)
+ * [Database caching](https://github.com/donnemartin/system-design-primer#database-caching)
+ * [Application caching](https://github.com/donnemartin/system-design-primer#application-caching)
+* What to cache
+ * [Caching at the database query level](https://github.com/donnemartin/system-design-primer#caching-at-the-database-query-level)
+ * [Caching at the object level](https://github.com/donnemartin/system-design-primer#caching-at-the-object-level)
+* When to update the cache
+ * [Cache-aside](https://github.com/donnemartin/system-design-primer#cache-aside)
+ * [Write-through](https://github.com/donnemartin/system-design-primer#write-through)
+ * [Write-behind (write-back)](https://github.com/donnemartin/system-design-primer#write-behind-write-back)
+ * [Refresh ahead](https://github.com/donnemartin/system-design-primer#refresh-ahead)
-### 异步与微服务
+### Asynchronism and microservices
-* [消息队列](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#消息队列)
-* [任务队列](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#任务队列)
-* [背压](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#背压)
-* [微服务](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#微服务)
+* [Message queues](https://github.com/donnemartin/system-design-primer#message-queues)
+* [Task queues](https://github.com/donnemartin/system-design-primer#task-queues)
+* [Back pressure](https://github.com/donnemartin/system-design-primer#back-pressure)
+* [Microservices](https://github.com/donnemartin/system-design-primer#microservices)
-### 通信
+### Communications
-* 可权衡选择的方案:
- * 与客户端的外部通信 - [使用 REST 作为 HTTP API](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#表述性状态转移rest)
- * 服务器内部通信 - [RPC](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#远程过程调用协议rpc)
-* [服务发现](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#服务发现)
+* Discuss tradeoffs:
+ * External communication with clients - [HTTP APIs following REST](https://github.com/donnemartin/system-design-primer#representational-state-transfer-rest)
+ * Internal communications - [RPC](https://github.com/donnemartin/system-design-primer#remote-procedure-call-rpc)
+* [Service discovery](https://github.com/donnemartin/system-design-primer#service-discovery)
-### 安全性
+### Security
-请参阅[「安全」](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#安全)一章。
+Refer to the [security section](https://github.com/donnemartin/system-design-primer#security).
-### 延迟数值
+### Latency numbers
-请参阅[「每个程序员都应该知道的延迟数」](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#每个程序员都应该知道的延迟数)。
+See [Latency numbers every programmer should know](https://github.com/donnemartin/system-design-primer#latency-numbers-every-programmer-should-know).
-### 持续探讨
+### Ongoing
-* 持续进行基准测试并监控你的系统,以解决他们提出的瓶颈问题。
-* 架构拓展是一个迭代的过程。
+* Continue benchmarking and monitoring your system to address bottlenecks as they come up
+* Scaling is an iterative process
diff --git a/solutions/system_design/web_crawler/README.md b/solutions/system_design/web_crawler/README.md
index 2ad0938e..d95dc107 100644
--- a/solutions/system_design/web_crawler/README.md
+++ b/solutions/system_design/web_crawler/README.md
@@ -1,102 +1,104 @@
-# 设计一个网页爬虫
+# Design a web crawler
-**注意:这个文档中的链接会直接指向[系统设计主题索引](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#系统设计主题的索引)中的有关部分,以避免重复的内容。你可以参考链接的相关内容,来了解其总的要点、方案的权衡取舍以及可选的替代方案。**
+*Note: This document links directly to relevant areas found in the [system design topics](https://github.com/donnemartin/system-design-primer#index-of-system-design-topics) to avoid duplication. Refer to the linked content for general talking points, tradeoffs, and alternatives.*
-## 第一步:简述用例与约束条件
+## Step 1: Outline use cases and constraints
-> 把所有需要的东西聚集在一起,审视问题。不停的提问,以至于我们可以明确使用场景和约束。讨论假设。
+> Gather requirements and scope the problem.
+> Ask questions to clarify use cases and constraints.
+> Discuss assumptions.
-我们将在没有面试官明确说明问题的情况下,自己定义一些用例以及限制条件。
+Without an interviewer to address clarifying questions, we'll define some use cases and constraints.
-### 用例
+### Use cases
-#### 我们把问题限定在仅处理以下用例的范围中
+#### We'll scope the problem to handle only the following use cases
-* **服务** 抓取一系列链接:
- * 生成包含搜索词的网页倒排索引
- * 生成页面的标题和摘要信息
- * 页面标题和摘要都是静态的,它们不会根据搜索词改变
-* **用户** 输入搜索词后,可以看到相关的搜索结果列表,列表每一项都包含由网页爬虫生成的页面标题及摘要
- * 只给该用例绘制出概要组件和交互说明,无需讨论细节
-* **服务** 具有高可用性
+* **Service** crawls a list of urls:
+ * Generates reverse index of words to pages containing the search terms
+ * Generates titles and snippets for pages
+    * Titles and snippets are static; they do not change based on the search query
+* **User** inputs a search term and sees a list of relevant pages with titles and snippets the crawler generated
+ * Only sketch high level components and interactions for this use case, no need to go into depth
+* **Service** has high availability
-#### 无需考虑
+#### Out of scope
-* 搜索分析
-* 个性化搜索结果
-* 页面排名
+* Search analytics
+* Personalized search results
+* Page rank
-### 限制条件与假设
+### Constraints and assumptions
-#### 提出假设
+#### State assumptions
-* 搜索流量分布不均
- * 有些搜索词非常热门,有些则非常冷门
-* 只支持匿名用户
-* 用户很快就能看到搜索结果
-* 网页爬虫不应该陷入死循环
- * 当爬虫路径包含环的时候,将会陷入死循环
-* 抓取 10 亿个链接
- * 要定期重新抓取页面以确保新鲜度
- * 平均每周重新抓取一次,网站越热门,那么重新抓取的频率越高
- * 每月抓取 40 亿个链接
- * 每个页面的平均存储大小:500 KB
- * 简单起见,重新抓取的页面算作新页面
-* 每月搜索量 1000 亿次
+* Traffic is not evenly distributed
+ * Some searches are very popular, while others are only executed once
+* Support only anonymous users
+* Generating search results should be fast
+* The web crawler should not get stuck in an infinite loop
+ * We get stuck in an infinite loop if the graph contains a cycle
+* 1 billion links to crawl
+ * Pages need to be crawled regularly to ensure freshness
+ * Average refresh rate of about once per week, more frequent for popular sites
+ * 4 billion links crawled each month
+ * Average stored size per web page: 500 KB
+    * For simplicity, count re-crawled pages the same as new pages
+* 100 billion searches per month
-用更传统的系统来练习 —— 不要使用 [solr](http://lucene.apache.org/solr/) 、[nutch](http://nutch.apache.org/) 之类的现成系统。
+Exercise the use of more traditional systems - don't use existing systems such as [solr](http://lucene.apache.org/solr/) or [nutch](http://nutch.apache.org/).
-#### 计算用量
+#### Calculate usage
-**如果你需要进行粗略的用量计算,请向你的面试官说明。**
+**Clarify with your interviewer if you should run back-of-the-envelope usage calculations.**
-* 每月存储 2 PB 页面
- * 每月抓取 40 亿个页面,每个页面 500 KB
- * 三年存储 72 PB 页面
-* 每秒 1600 次写请求
-* 每秒 40000 次搜索请求
+* 2 PB of stored page content per month
+ * 500 KB per page * 4 billion links crawled per month
+ * 72 PB of stored page content in 3 years
+* 1,600 write requests per second
+* 40,000 search requests per second
-简便换算指南:
+Handy conversion guide:
-* 一个月有 250 万秒
-* 每秒 1 个请求,即每月 250 万个请求
-* 每秒 40 个请求,即每月 1 亿个请求
-* 每秒 400 个请求,即每月 10 亿个请求
+* 2.5 million seconds per month
+* 1 request per second = 2.5 million requests per month
+* 40 requests per second = 100 million requests per month
+* 400 requests per second = 1 billion requests per month
-## 第二步: 概要设计
+## Step 2: Create a high level design
-> 列出所有重要组件以规划概要设计。
+> Outline a high level design with all important components.
![Imgur](http://i.imgur.com/xjdAAUv.png)
-## 第三步:设计核心组件
+## Step 3: Design core components
-> 对每一个核心组件进行详细深入的分析。
+> Dive into details for each core component.
-### 用例:爬虫服务抓取一系列网页
+### Use case: Service crawls a list of urls
-假设我们有一个初始列表 `links_to_crawl`(待抓取链接),它最初基于网站整体的知名度来排序。当然如果这个假设不合理,我们可以使用 [Yahoo](https://www.yahoo.com/)、[DMOZ](http://www.dmoz.org/) 等知名门户网站作为种子链接来进行扩散 。
+We'll assume we have an initial list of `links_to_crawl`, ranked by overall site popularity. If this is not a reasonable assumption, we can seed the crawler with popular sites that link to outside content, such as [Yahoo](https://www.yahoo.com/), [DMOZ](http://www.dmoz.org/), etc.
-我们将用表 `crawled_links` (已抓取链接 )来记录已经处理过的链接以及相应的页面签名。
+We'll use a table `crawled_links` to store processed links and their page signatures.
-我们可以将 `links_to_crawl` 和 `crawled_links` 记录在键-值型 **NoSQL 数据库**中。对于 `crawled_links` 中已排序的链接,我们可以使用 [Redis](https://redis.io/) 的有序集合来维护网页链接的排名。我们应当在 [选择 SQL 还是 NoSQL 的问题上,讨论有关使用场景以及利弊 ](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#sql-还是-nosql)。
+We could store `links_to_crawl` and `crawled_links` in a key-value **NoSQL Database**. For the ranked links in `links_to_crawl`, we could use [Redis](https://redis.io/) with sorted sets to maintain a ranking of page links. We should discuss the [use cases and tradeoffs between choosing SQL or NoSQL](https://github.com/donnemartin/system-design-primer#sql-or-nosql).
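+If Redis backed this store, the ranked `links_to_crawl` set could sit behind the `PagesDataStore` methods shown further below, roughly as in this sketch; the key name and connection details are illustrative assumptions:
+
+```python
+import redis
+
+store = redis.Redis(host='nosql-store', port=6379)
+
+
+def add_link_to_crawl(url, priority):
+    """Add or update a link in the ranked `links_to_crawl` sorted set."""
+    store.zadd('links_to_crawl', {url: priority})
+
+
+def extract_max_priority_page():
+    """Pop and return the highest priority link, or None if the set is empty."""
+    popped = store.zpopmax('links_to_crawl')  # [(url, score)] or []
+    return popped[0][0] if popped else None
+```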
-* **爬虫服务**按照以下流程循环处理每一个页面链接:
- * 选取排名最靠前的待抓取链接
- * 在 **NoSQL 数据库**的 `crawled_links` 中,检查待抓取页面的签名是否与某个已抓取页面的签名相似
- * 若存在,则降低该页面链接的优先级
- * 这样做可以避免陷入死循环
- * 继续(进入下一次循环)
- * 若不存在,则抓取该链接
- * 在**倒排索引服务**任务队列中,新增一个生成[倒排索引](https://en.wikipedia.org/wiki/Search_engine_indexing)任务。
- * 在**文档服务**任务队列中,新增一个生成静态标题和摘要的任务。
- * 生成页面签名
- * 在 **NoSQL 数据库**的 `links_to_crawl` 中删除该链接
- * 在 **NoSQL 数据库**的 `crawled_links` 中插入该链接以及页面签名
+* The **Crawler Service** processes each page link by doing the following in a loop:
+ * Takes the top ranked page link to crawl
+ * Checks `crawled_links` in the **NoSQL Database** for an entry with a similar page signature
+ * If we have a similar page, reduces the priority of the page link
+ * This prevents us from getting into a cycle
+ * Continue
+ * Else, crawls the link
+ * Adds a job to the **Reverse Index Service** queue to generate a [reverse index](https://en.wikipedia.org/wiki/Search_engine_indexing)
+ * Adds a job to the **Document Service** queue to generate a static title and snippet
+ * Generates the page signature
+ * Removes the link from `links_to_crawl` in the **NoSQL Database**
+ * Inserts the page link and signature to `crawled_links` in the **NoSQL Database**
-**向面试官了解你需要写多少代码**。
+**Clarify with your interviewer how much code you are expected to write**.
-`PagesDataStore` 是**爬虫服务**中的一个抽象类,它使用 **NoSQL 数据库**进行存储。
+`PagesDataStore` is an abstraction within the **Crawler Service** that uses the **NoSQL Database**:
```python
class PagesDataStore(object):
@@ -106,31 +108,31 @@ class PagesDataStore(object):
...
def add_link_to_crawl(self, url):
- """将指定链接加入 `links_to_crawl`。"""
+ """Add the given link to `links_to_crawl`."""
...
def remove_link_to_crawl(self, url):
- """从 `links_to_crawl` 中删除指定链接。"""
+ """Remove the given link from `links_to_crawl`."""
...
def reduce_priority_link_to_crawl(self, url)
- """在 `links_to_crawl` 中降低一个链接的优先级以避免死循环。"""
+ """Reduce the priority of a link in `links_to_crawl` to avoid cycles."""
...
def extract_max_priority_page(self):
- """返回 `links_to_crawl` 中优先级最高的链接。"""
+ """Return the highest priority link in `links_to_crawl`."""
...
def insert_crawled_link(self, url, signature):
- """将指定链接加入 `crawled_links`。"""
+ """Add the given link to `crawled_links`."""
...
def crawled_similar(self, signature):
- """判断待抓取页面的签名是否与某个已抓取页面的签名相似。"""
+ """Determine if we've already crawled a page matching the given signature"""
...
```
-`Page` 是**爬虫服务**的一个抽象类,它封装了网页对象,由页面链接、页面内容、子链接和页面签名构成。
+`Page` is an abstraction within the **Crawler Service** that encapsulates a page, its contents, child urls, and signature:
```python
class Page(object):
@@ -142,7 +144,7 @@ class Page(object):
self.signature = signature
```
-`Crawler` 是**爬虫服务**的主类,由`Page` 和 `PagesDataStore` 组成。
+`Crawler` is the main class within **Crawler Service**, composed of `Page` and `PagesDataStore`.
```python
class Crawler(object):
@@ -153,7 +155,7 @@ class Crawler(object):
self.doc_index_queue = doc_index_queue
def create_signature(self, page):
- """基于页面链接与内容生成签名。"""
+ """Create signature based on url and contents."""
...
def crawl_page(self, page):
@@ -174,16 +176,16 @@ class Crawler(object):
self.crawl_page(page)
```
-### 处理重复内容
+### Handling duplicates
-我们要谨防网页爬虫陷入死循环,这通常会发生在爬虫路径中存在环的情况。
+We need to be careful the web crawler doesn't get stuck in an infinite loop, which happens when the graph contains a cycle.
-**向面试官了解你需要写多少代码**.
+**Clarify with your interviewer how much code you are expected to write**.
-删除重复链接:
+We'll want to remove duplicate urls:
-* 假设数据量较小,我们可以用类似于 `sort | unique` 的方法。(译注: 先排序,后去重)
-* 假设有 10 亿条数据,我们应该使用 **MapReduce** 来输出只出现 1 次的记录。
+* For smaller lists we could use something like `sort | uniq`
+* With 1 billion links to crawl, we could use **MapReduce** to output only entries that have a frequency of 1
```python
class RemoveDuplicateUrls(MRJob):
@@ -197,38 +199,38 @@ class RemoveDuplicateUrls(MRJob):
yield key, total
```
-比起处理重复内容,检测重复内容更为复杂。我们可以基于网页内容生成签名,然后对比两者签名的相似度。可能会用到的算法有 [Jaccard index](https://en.wikipedia.org/wiki/Jaccard_index) 以及 [cosine similarity](https://en.wikipedia.org/wiki/Cosine_similarity)。
+Detecting duplicate content is more complex. We could generate a signature based on the contents of the page and compare it against the signatures of pages we've already crawled. Some potential similarity measures are the [Jaccard index](https://en.wikipedia.org/wiki/Jaccard_index) and [cosine similarity](https://en.wikipedia.org/wiki/Cosine_similarity).
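+A toy Jaccard comparison over word shingles to make the idea concrete; a production crawler would more likely use a scalable scheme such as MinHash or SimHash:
+
+```python
+def shingles(text, size=3):
+    """Split text into a set of overlapping word n-grams (shingles)."""
+    words = text.lower().split()
+    return {tuple(words[i:i + size]) for i in range(max(len(words) - size + 1, 1))}
+
+
+def jaccard_similarity(text_a, text_b):
+    """Return the intersection-over-union of the two pages' shingle sets."""
+    a, b = shingles(text_a), shingles(text_b)
+    union = a | b
+    return len(a & b) / len(union) if union else 1.0
+```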
-### 抓取结果更新策略
+### Determining when to update the crawl results
-要定期重新抓取页面以确保新鲜度。抓取结果应该有个 `timestamp` 字段记录上一次页面抓取时间。每隔一段时间,比如说 1 周,所有页面都需要更新一次。对于热门网站或是内容频繁更新的网站,爬虫抓取间隔可以缩短。
+Pages need to be crawled regularly to ensure freshness. Crawl results could have a `timestamp` field that indicates the last time a page was crawled. After a default time period, say one week, all pages should be refreshed. Frequently updated or more popular sites could be refreshed in shorter intervals.
-尽管我们不会深入网页数据分析的细节,我们仍然要做一些数据挖掘工作来确定一个页面的平均更新时间,并且根据相关的统计数据来决定爬虫的重新抓取频率。
+Although we won't dive into details on analytics, we could do some data mining to determine the mean time before a particular page is updated, and use that statistic to determine how often to re-crawl the page.
-当然我们也应该根据站长提供的 `Robots.txt` 来控制爬虫的抓取频率。
+We might also choose to support a `Robots.txt` file that gives webmasters control of crawl frequency.
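+A small sketch of that staleness check; the one week default and the `popularity_factor` scaling are illustrative assumptions:
+
+```python
+from datetime import datetime
+
+DEFAULT_REFRESH_SECONDS = 7 * 24 * 3600  # roughly one week
+
+
+def should_recrawl(last_crawled, popularity_factor=1.0, now=None):
+    """Return True if the page is stale; popular pages can pass a smaller factor."""
+    now = now or datetime.utcnow()
+    elapsed = (now - last_crawled).total_seconds()
+    return elapsed >= DEFAULT_REFRESH_SECONDS * popularity_factor
+```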
-### 用例:用户输入搜索词后,可以看到相关的搜索结果列表,列表每一项都包含由网页爬虫生成的页面标题及摘要
+### Use case: User inputs a search term and sees a list of relevant pages with titles and snippets
-* **客户端**向运行[反向代理](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#反向代理web-服务器)的 **Web 服务器**发送一个请求
-* **Web 服务器** 发送请求到 **Query API** 服务器
-* **查询 API** 服务将会做这些事情:
- * 解析查询参数
- * 删除 HTML 标记
- * 将文本分割成词组 (译注: 分词处理)
- * 修正错别字
- * 规范化大小写
- * 将搜索词转换为布尔运算
- * 使用**倒排索引服务**来查找匹配查询的文档
- * **倒排索引服务**对匹配到的结果进行排名,然后返回最符合的结果
- * 使用**文档服务**返回文章标题与摘要
+* The **Client** sends a request to the **Web Server**, running as a [reverse proxy](https://github.com/donnemartin/system-design-primer#reverse-proxy-web-server)
+* The **Web Server** forwards the request to the **Query API** server
+* The **Query API** server does the following:
+ * Parses the query
+ * Removes markup
+ * Breaks up the text into terms
+ * Fixes typos
+ * Normalizes capitalization
+ * Converts the query to use boolean operations
+ * Uses the **Reverse Index Service** to find documents matching the query
+ * The **Reverse Index Service** ranks the matching results and returns the top ones
+ * Uses the **Document Service** to return titles and snippets
-我们使用 [**REST API**](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#表述性状态转移rest) 与客户端通信:
+We'll use a public [**REST API**](https://github.com/donnemartin/system-design-primer#representational-state-transfer-rest):
```
$ curl https://search.com/api/v1/search?query=hello+world
```
-响应内容:
+Response:
```
{
@@ -248,109 +250,104 @@ $ curl https://search.com/api/v1/search?query=hello+world
},
```
-对于服务器内部通信,我们可以使用 [远程过程调用协议(RPC)](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#远程过程调用协议rpc)
+For internal communications, we could use [Remote Procedure Calls](https://github.com/donnemartin/system-design-primer#remote-procedure-call-rpc).
+## Step 4: Scale the design
-## 第四步:架构扩展
-
-> 根据限制条件,找到并解决瓶颈。
+> Identify and address bottlenecks, given the constraints.
![Imgur](http://i.imgur.com/bWxPtQA.png)
-**重要提示:不要直接从最初设计跳到最终设计!**
+**Important: Do not simply jump right into the final design from the initial design!**
-现在你要 1) **基准测试、负载测试**。2) **分析、描述**性能瓶颈。3) 在解决瓶颈问题的同时,评估替代方案、权衡利弊。4) 重复以上步骤。请阅读[设计一个系统,并将其扩大到为数以百万计的 AWS 用户服务](../scaling_aws/README.md) 来了解如何逐步扩大初始设计。
+State you would 1) **Benchmark/Load Test**, 2) **Profile** for bottlenecks 3) address bottlenecks while evaluating alternatives and trade-offs, and 4) repeat. See [Design a system that scales to millions of users on AWS](../scaling_aws/README.md) as a sample on how to iteratively scale the initial design.
-讨论初始设计可能遇到的瓶颈及相关解决方案是很重要的。例如加上一套配备多台 **Web 服务器**的**负载均衡器**是否能够解决问题?**CDN**呢?**主从复制**呢?它们各自的替代方案和需要**权衡**的利弊又有哪些呢?
+It's important to discuss what bottlenecks you might encounter with the initial design and how you might address each of them. For example, what issues are addressed by adding a **Load Balancer** with multiple **Web Servers**? **CDN**? **Master-Slave Replicas**? What are the alternatives and **Trade-Offs** for each?
-我们将会介绍一些组件来完成设计,并解决架构规模扩张问题。内置的负载均衡器将不做讨论以节省篇幅。
+We'll introduce some components to complete the design and to address scalability issues. Internal load balancers are not shown to reduce clutter.
-**为了避免重复讨论**,请参考[系统设计主题索引](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#系统设计主题的索引)相关部分来了解其要点、方案的权衡取舍以及替代方案。
+*To avoid repeating discussions*, refer to the following [system design topics](https://github.com/donnemartin/system-design-primer#index-of-system-design-topics) for main talking points, tradeoffs, and alternatives:
-* [DNS](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#域名系统)
-* [负载均衡器](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#负载均衡器)
-* [水平扩展](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#水平扩展)
-* [Web 服务器(反向代理)](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#反向代理web-服务器)
-* [API 服务器(应用层)](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#应用层)
-* [缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#缓存)
-* [NoSQL](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#nosql)
-* [一致性模式](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#一致性模式)
-* [可用性模式](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#可用性模式)
+* [DNS](https://github.com/donnemartin/system-design-primer#domain-name-system)
+* [Load balancer](https://github.com/donnemartin/system-design-primer#load-balancer)
+* [Horizontal scaling](https://github.com/donnemartin/system-design-primer#horizontal-scaling)
+* [Web server (reverse proxy)](https://github.com/donnemartin/system-design-primer#reverse-proxy-web-server)
+* [API server (application layer)](https://github.com/donnemartin/system-design-primer#application-layer)
+* [Cache](https://github.com/donnemartin/system-design-primer#cache)
+* [NoSQL](https://github.com/donnemartin/system-design-primer#nosql)
+* [Consistency patterns](https://github.com/donnemartin/system-design-primer#consistency-patterns)
+* [Availability patterns](https://github.com/donnemartin/system-design-primer#availability-patterns)
-有些搜索词非常热门,有些则非常冷门。热门的搜索词可以通过诸如 Redis 或者 Memcached 之类的**内存缓存**来缩短响应时间,避免**倒排索引服务**以及**文档服务**过载。**内存缓存**同样适用于流量分布不均匀以及流量短时高峰问题。从内存中读取 1 MB 连续数据大约需要 250 微秒,而从 SSD 读取同样大小的数据要花费 4 倍的时间,从机械硬盘读取需要花费 80 倍以上的时间。1
+Some searches are very popular, while others are only executed once. Popular queries can be served from a **Memory Cache** such as Redis or Memcached to reduce response times and to avoid overloading the **Reverse Index Service** and **Document Service**. The **Memory Cache** is also useful for handling the unevenly distributed traffic and traffic spikes. Reading 1 MB sequentially from memory takes about 250 microseconds, while reading from SSD takes 4x and from disk takes 80x longer.1
+Below are a few other optimizations to the **Crawling Service**:
-以下是优化**爬虫服务**的其他建议:
+* To handle the data size and request load, the **Reverse Index Service** and **Document Service** will likely need to make heavy use of sharding and replication.
+* DNS lookup can be a bottleneck; the **Crawler Service** can keep its own DNS lookup cache that is refreshed periodically (see the sketch below)
+* The **Crawler Service** can improve performance and reduce memory usage by keeping many open connections at a time, referred to as [connection pooling](https://en.wikipedia.org/wiki/Connection_pool)
+ * Switching to [UDP](https://github.com/donnemartin/system-design-primer#user-datagram-protocol-udp) could also boost performance
+* Web crawling is bandwidth intensive, ensure there is enough bandwidth to sustain high throughput
-* 为了处理数据大小问题以及网络请求负载,**倒排索引服务**和**文档服务**可能需要大量应用数据分片和数据复制。
-* DNS 查询可能会成为瓶颈,**爬虫服务**最好专门维护一套定期更新的 DNS 查询服务。
-* 借助于[连接池](https://en.wikipedia.org/wiki/Connection_pool),即同时维持多个开放网络连接,可以提升**爬虫服务**的性能并减少内存使用量。
- * 改用 [UDP](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#用户数据报协议udp) 协议同样可以提升性能
-* 网络爬虫受带宽影响较大,请确保带宽足够维持高吞吐量。
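+
+Below is a minimal sketch, not part of the original design, of the periodically refreshed DNS lookup mentioned above; it uses the standard library's `socket.gethostbyname`, and the one hour refresh interval is an arbitrary assumption:
+
+```python
+import socket
+import time
+
+
+class DnsCache(object):
+    """Cache hostname -> IP so the crawler avoids repeated DNS lookups."""
+
+    def __init__(self, refresh_seconds=3600):
+        self.refresh_seconds = refresh_seconds
+        self.entries = {}  # hostname -> (ip, resolved_at)
+
+    def resolve(self, hostname):
+        entry = self.entries.get(hostname)
+        if entry is not None and time.time() - entry[1] < self.refresh_seconds:
+            return entry[0]
+        ip = socket.gethostbyname(hostname)  # refresh stale or missing entries
+        self.entries[hostname] = (ip, time.time())
+        return ip
+```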
+## Additional talking points
-## 其它要点
+> Additional topics to dive into, depending on the problem scope and time remaining.
-> 是否深入这些额外的主题,取决于你的问题范围和剩下的时间。
+### SQL scaling patterns
-### SQL 扩展模式
-
-* [读取复制](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#主从复制)
-* [联合](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#联合)
-* [分片](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#分片)
-* [非规范化](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#非规范化)
-* [SQL 调优](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#sql-调优)
+* [Read replicas](https://github.com/donnemartin/system-design-primer#master-slave-replication)
+* [Federation](https://github.com/donnemartin/system-design-primer#federation)
+* [Sharding](https://github.com/donnemartin/system-design-primer#sharding)
+* [Denormalization](https://github.com/donnemartin/system-design-primer#denormalization)
+* [SQL Tuning](https://github.com/donnemartin/system-design-primer#sql-tuning)
#### NoSQL
-* [键-值存储](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#键-值存储)
-* [文档类型存储](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#文档类型存储)
-* [列型存储](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#列型存储)
-* [图数据库](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#图数据库)
-* [SQL vs NoSQL](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#sql-还是-nosql)
+* [Key-value store](https://github.com/donnemartin/system-design-primer#key-value-store)
+* [Document store](https://github.com/donnemartin/system-design-primer#document-store)
+* [Wide column store](https://github.com/donnemartin/system-design-primer#wide-column-store)
+* [Graph database](https://github.com/donnemartin/system-design-primer#graph-database)
+* [SQL vs NoSQL](https://github.com/donnemartin/system-design-primer#sql-or-nosql)
+### Caching
-### 缓存
+* Where to cache
+ * [Client caching](https://github.com/donnemartin/system-design-primer#client-caching)
+ * [CDN caching](https://github.com/donnemartin/system-design-primer#cdn-caching)
+ * [Web server caching](https://github.com/donnemartin/system-design-primer#web-server-caching)
+ * [Database caching](https://github.com/donnemartin/system-design-primer#database-caching)
+ * [Application caching](https://github.com/donnemartin/system-design-primer#application-caching)
+* What to cache
+ * [Caching at the database query level](https://github.com/donnemartin/system-design-primer#caching-at-the-database-query-level)
+ * [Caching at the object level](https://github.com/donnemartin/system-design-primer#caching-at-the-object-level)
+* When to update the cache
+ * [Cache-aside](https://github.com/donnemartin/system-design-primer#cache-aside)
+ * [Write-through](https://github.com/donnemartin/system-design-primer#write-through)
+ * [Write-behind (write-back)](https://github.com/donnemartin/system-design-primer#write-behind-write-back)
+ * [Refresh ahead](https://github.com/donnemartin/system-design-primer#refresh-ahead)
-* 在哪缓存
- * [客户端缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#客户端缓存)
- * [CDN 缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#cdn-缓存)
- * [Web 服务器缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#web-服务器缓存)
- * [数据库缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#数据库缓存)
- * [应用缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#应用缓存)
-* 什么需要缓存
- * [数据库查询级别的缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#数据库查询级别的缓存)
- * [对象级别的缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#对象级别的缓存)
-* 何时更新缓存
- * [缓存模式](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#缓存模式)
- * [直写模式](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#直写模式)
- * [回写模式](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#回写模式)
- * [刷新](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#刷新)
+### Asynchronism and microservices
-### 异步与微服务
+* [Message queues](https://github.com/donnemartin/system-design-primer#message-queues)
+* [Task queues](https://github.com/donnemartin/system-design-primer#task-queues)
+* [Back pressure](https://github.com/donnemartin/system-design-primer#back-pressure)
+* [Microservices](https://github.com/donnemartin/system-design-primer#microservices)
-* [消息队列](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#消息队列)
-* [任务队列](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#任务队列)
-* [背压](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#背压)
-* [微服务](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#微服务)
+### Communications
-### 通信
+* Discuss tradeoffs:
+ * External communication with clients - [HTTP APIs following REST](https://github.com/donnemartin/system-design-primer#representational-state-transfer-rest)
+ * Internal communications - [RPC](https://github.com/donnemartin/system-design-primer#remote-procedure-call-rpc)
+* [Service discovery](https://github.com/donnemartin/system-design-primer#service-discovery)
-* 可权衡选择的方案:
- * 与客户端的外部通信 - [使用 REST 作为 HTTP API](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#表述性状态转移rest)
- * 内部通信 - [RPC](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#远程过程调用协议rpc)
-* [服务发现](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#服务发现)
+### Security
+Refer to the [security section](https://github.com/donnemartin/system-design-primer#security).
-### 安全性
+### Latency numbers
-请参阅[安全](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#安全)。
+See [Latency numbers every programmer should know](https://github.com/donnemartin/system-design-primer#latency-numbers-every-programmer-should-know).
+### Ongoing
-### 延迟数值
-
-请参阅[每个程序员都应该知道的延迟数](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#每个程序员都应该知道的延迟数)。
-
-### 持续探讨
-
-* 持续进行基准测试并监控你的系统,以解决他们提出的瓶颈问题。
-* 架构扩展是一个迭代的过程。
+* Continue benchmarking and monitoring your system to address bottlenecks as they come up
+* Scaling is an iterative process
From 449dc27f33991fe558a83c6815eb28ae294428dd Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=E6=A0=B9=E5=8F=B7=E4=B8=89?=
Date: Mon, 30 Mar 2020 08:40:50 +0800
Subject: [PATCH 42/72] zh-Hans: Translate solutions (#392)
---
README-zh-Hans.md | 11 +-
.../system_design/mint/README-zh-Hans.md | 440 ++++++++++++++++++
.../system_design/pastebin/README-zh-Hans.md | 2 +-
.../query_cache/README-zh-Hans.md | 306 ++++++++++++
.../sales_rank/README-zh-Hans.md | 338 ++++++++++++++
.../scaling_aws/README-zh-Hans.md | 403 ++++++++++++++++
.../social_graph/README-zh-Hans.md | 348 ++++++++++++++
.../system_design/twitter/README-zh-Hans.md | 331 +++++++++++++
.../web_crawler/README-zh-Hans.md | 356 ++++++++++++++
9 files changed, 2525 insertions(+), 10 deletions(-)
create mode 100644 solutions/system_design/mint/README-zh-Hans.md
create mode 100644 solutions/system_design/query_cache/README-zh-Hans.md
create mode 100644 solutions/system_design/sales_rank/README-zh-Hans.md
create mode 100644 solutions/system_design/scaling_aws/README-zh-Hans.md
create mode 100644 solutions/system_design/social_graph/README-zh-Hans.md
create mode 100644 solutions/system_design/twitter/README-zh-Hans.md
create mode 100644 solutions/system_design/web_crawler/README-zh-Hans.md
diff --git a/README-zh-Hans.md b/README-zh-Hans.md
index 21a6cddb..83c6007b 100644
--- a/README-zh-Hans.md
+++ b/README-zh-Hans.md
@@ -1,6 +1,6 @@
> * 原文地址:[github.com/donnemartin/system-design-primer](https://github.com/donnemartin/system-design-primer)
> * 译文出自:[掘金翻译计划](https://github.com/xitu/gold-miner)
-> * 译者:[XatMassacrE](https://github.com/XatMassacrE)、[L9m](https://github.com/L9m)、[Airmacho](https://github.com/Airmacho)、[xiaoyusilen](https://github.com/xiaoyusilen)、[jifaxu](https://github.com/jifaxu)
+> * 译者:[XatMassacrE](https://github.com/XatMassacrE)、[L9m](https://github.com/L9m)、[Airmacho](https://github.com/Airmacho)、[xiaoyusilen](https://github.com/xiaoyusilen)、[jifaxu](https://github.com/jifaxu)、[根号三](https://github.com/sqrthree)
> * 这个 [链接](https://github.com/xitu/system-design-primer/compare/master...donnemartin:master) 用来查看本翻译与英文版是否有差别(如果你没有看到 README.md 发生变化,那就意味着这份翻译文档是最新的)。
*[English](README.md) ∙ [日本語](README-ja.md) ∙ [简体中文](README-zh-Hans.md) ∙ [繁體中文](README-zh-TW.md) | [العَرَبِيَّة](https://github.com/donnemartin/system-design-primer/issues/170) ∙ [বাংলা](https://github.com/donnemartin/system-design-primer/issues/220) ∙ [Português do Brasil](https://github.com/donnemartin/system-design-primer/issues/40) ∙ [Deutsch](https://github.com/donnemartin/system-design-primer/issues/186) ∙ [ελληνικά](https://github.com/donnemartin/system-design-primer/issues/130) ∙ [עברית](https://github.com/donnemartin/system-design-primer/issues/272) ∙ [Italiano](https://github.com/donnemartin/system-design-primer/issues/104) ∙ [韓國語](https://github.com/donnemartin/system-design-primer/issues/102) ∙ [فارسی](https://github.com/donnemartin/system-design-primer/issues/110) ∙ [Polski](https://github.com/donnemartin/system-design-primer/issues/68) ∙ [русский язык](https://github.com/donnemartin/system-design-primer/issues/87) ∙ [Español](https://github.com/donnemartin/system-design-primer/issues/136) ∙ [ภาษาไทย](https://github.com/donnemartin/system-design-primer/issues/187) ∙ [Türkçe](https://github.com/donnemartin/system-design-primer/issues/39) ∙ [tiếng Việt](https://github.com/donnemartin/system-design-primer/issues/127) ∙ [Français](https://github.com/donnemartin/system-design-primer/issues/250) | [Add Translation](https://github.com/donnemartin/system-design-primer/issues/28)*
@@ -12,14 +12,6 @@
-## 翻译
-
-有兴趣参与[翻译](https://github.com/donnemartin/system-design-primer/issues/28)? 以下是正在进行中的翻译:
-
-* [巴西葡萄牙语](https://github.com/donnemartin/system-design-primer/issues/40)
-* [简体中文](https://github.com/donnemartin/system-design-primer/issues/38)
-* [土耳其语](https://github.com/donnemartin/system-design-primer/issues/39)
-
## 目的
> 学习如何设计大型系统。
@@ -91,6 +83,7 @@
* 修复错误
* 完善章节
* 添加章节
+* [帮助翻译](https://github.com/donnemartin/system-design-primer/issues/28)
一些还需要完善的内容放在了[正在完善中](#正在完善中)。
diff --git a/solutions/system_design/mint/README-zh-Hans.md b/solutions/system_design/mint/README-zh-Hans.md
new file mode 100644
index 00000000..58467bc6
--- /dev/null
+++ b/solutions/system_design/mint/README-zh-Hans.md
@@ -0,0 +1,440 @@
+# 设计 Mint.com
+
+**注意:这个文档中的链接会直接指向[系统设计主题索引](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#系统设计主题索引)中的有关部分,以避免重复的内容。您可以参考链接的相关内容,来了解其总的要点、方案的权衡取舍以及可选的替代方案。**
+
+## 第一步:简述用例与约束条件
+
+> 搜集需求与问题的范围。
+> 提出问题来明确用例与约束条件。
+> 讨论假设。
+
+我们将在没有面试官明确说明问题的情况下,自己定义一些用例以及限制条件。
+
+### 用例
+
+#### 我们将把问题限定在仅处理以下用例的范围中
+
+* **用户** 连接到一个财务账户
+* **服务** 从账户中提取交易
+ * 每日更新
+ * 分类交易
+ * 允许用户手动分类
+ * 不自动重新分类
+ * 按类别分析每月支出
+* **服务** 推荐预算
+ * 允许用户手动设置预算
+ * 当接近或者超出预算时,发送通知
+* **服务** 具有高可用性
+
+#### 非用例范围
+
+* **服务** 执行附加的日志记录和分析
+
+### 限制条件与假设
+
+#### 提出假设
+
+* 网络流量非均匀分布
+* 自动账户日更新只适用于 30 天内活跃的用户
+* 添加或者移除财务账户相对较少
+* 预算通知不需要及时
+* 1000 万用户
+    * 每个用户 10 个预算类别 = 1 亿个预算项
+ * 示例类别:
+ * Housing = $1,000
+ * Food = $200
+ * Gas = $100
+ * 卖方确定交易类别
+ * 50000 个卖方
+* 3000 万财务账户
+* 每月 50 亿交易
+* 每月 5 亿读请求
+* 10:1 的写读比例
+    * 写密集,用户每天都进行交易,但是很少每天访问该网站
+
+#### 计算用量
+
+**如果你需要进行粗略的用量计算,请向你的面试官说明。**
+
+* 每次交易的用量:
+ * `user_id` - 8 字节
+ * `created_at` - 5 字节
+ * `seller` - 32 字节
+ * `amount` - 5 字节
+ * Total: ~50 字节
+* 每月产生 250 GB 新的交易内容
+    * 每次交易 50 字节 * 每月 50 亿次交易
+ * 3年内新的交易内容 9 TB
+ * Assume most are new transactions instead of updates to existing ones
+* 平均每秒产生 2000 次交易
+* 平均每秒产生 200 读请求
+
+便利换算指南:
+
+* 每个月有 250 万秒
+* 每秒一个请求 = 每个月 250 万次请求
+* 每秒 40 个请求 = 每个月 1 亿次请求
+* 每秒 400 个请求 = 每个月 10 亿次请求
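+
+上面假设中的平均每秒 2000 次交易与 200 次读请求,可以用如下粗略换算得到(仅为演算示意,数值取自上面的假设):
+
+```python
+SECONDS_PER_MONTH = 2.5 * 10**6              # 每个月约 250 万秒
+
+writes_per_month = 5 * 10**9                 # 每月 50 亿次交易(写)
+reads_per_month = 5 * 10**8                  # 每月 5 亿次读请求
+
+print(writes_per_month / SECONDS_PER_MONTH)  # 约 2000 次写 / 秒
+print(reads_per_month / SECONDS_PER_MONTH)   # 约 200 次读 / 秒
+```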
+
+## 第二步:概要设计
+
+> 列出所有重要组件以规划概要设计。
+
+![Imgur](http://i.imgur.com/E8klrBh.png)
+
+## 第三步:设计核心组件
+
+> 深入每个核心组件的细节。
+
+### 用例:用户连接到一个财务账户
+
+我们可以将 1000 万用户的信息存储在一个[关系数据库](https://github.com/donnemartin/system-design-primer#relational-database-management-system-rdbms)中。我们应该讨论一下[选择SQL或NoSQL之间的用例和权衡](https://github.com/donnemartin/system-design-primer#sql-or-nosql)了。
+
+* **客户端** 作为一个[反向代理](https://github.com/donnemartin/system-design-primer#reverse-proxy-web-server),发送请求到 **Web 服务器**
+* **Web 服务器** 转发请求到 **账户API** 服务器
+* **账户API** 服务器将新输入的账户信息更新到 **SQL数据库** 的`accounts`表
+
+**告知你的面试官你准备写多少代码**。
+
+`accounts`表应该具有如下结构:
+
+```
+id int NOT NULL AUTO_INCREMENT
+created_at datetime NOT NULL
+last_update datetime NOT NULL
+account_url varchar(255) NOT NULL
+account_login varchar(32) NOT NULL
+account_password_hash char(64) NOT NULL
+user_id int NOT NULL
+PRIMARY KEY(id)
+FOREIGN KEY(user_id) REFERENCES users(id)
+```
+
+我们将在`id`,`user_id`和`created_at`等字段上创建一个[索引](https://github.com/donnemartin/system-design-primer#use-good-indices)以加速查找(对数时间而不是扫描整个表)并保持数据在内存中。从内存中顺序读取 1 MB 数据花费大约 250 微秒,而从 SSD 读取是其 4 倍,从磁盘读取是其 80 倍。1
+
+我们将使用公开的[**REST API**](https://github.com/donnemartin/system-design-primer#representational-state-transfer-rest):
+
+```
+$ curl -X POST --data '{ "user_id": "foo", "account_url": "bar", \
+ "account_login": "baz", "account_password": "qux" }' \
+ https://mint.com/api/v1/account
+```
+
+对于内部通信,我们可以使用[远程过程调用](https://github.com/donnemartin/system-design-primer#remote-procedure-call-rpc)。
+
+接下来,服务从账户中提取交易。
+
+### 用例:服务从账户中提取交易
+
+如下几种情况下,我们会想要从账户中提取信息:
+
+* 用户首次链接账户
+* 用户手动更新账户
+* 为过去 30 天内活跃的用户自动日更新
+
+数据流:
+
+* **客户端**向 **Web服务器** 发送请求
+* **Web服务器** 将请求转发到 **帐户API** 服务器
+* **帐户API** 服务器将job放在 **队列** 中,如 [Amazon SQS](https://aws.amazon.com/sqs/) 或者 [RabbitMQ](https://www.rabbitmq.com/)
+ * 提取交易可能需要一段时间,我们可能希望[与队列异步](https://github.com/donnemartin/system-design-primer#asynchronism)地来做,虽然这会引入额外的复杂度。
+* **交易提取服务** 执行如下操作:
+ * 从 **Queue** 中拉取并从金融机构中提取给定用户的交易,将结果作为原始日志文件存储在 **对象存储区**。
+ * 使用 **分类服务** 来分类每个交易
+ * 使用 **预算服务** 来按类别计算每月总支出
+ * **预算服务** 使用 **通知服务** 让用户知道他们是否接近或者已经超出预算
+ * 更新具有分类交易的 **SQL数据库** 的`transactions`表
+ * 按类别更新 **SQL数据库** `monthly_spending`表的每月总支出
+ * 通过 **通知服务** 提醒用户交易完成
+ * 使用一个 **队列** (没有画出来) 来异步发送通知
+
+`transactions`表应该具有如下结构:
+
+```
+id int NOT NULL AUTO_INCREMENT
+created_at datetime NOT NULL
+seller varchar(32) NOT NULL
+amount decimal NOT NULL
+user_id int NOT NULL
+PRIMARY KEY(id)
+FOREIGN KEY(user_id) REFERENCES users(id)
+```
+
+我们将在 `id`,`user_id`,和 `created_at`字段上创建[索引](https://github.com/donnemartin/system-design-primer#use-good-indices)。
+
+`monthly_spending`表应该具有如下结构:
+
+```
+id int NOT NULL AUTO_INCREMENT
+month_year date NOT NULL
+category varchar(32)
+amount decimal NOT NULL
+user_id int NOT NULL
+PRIMARY KEY(id)
+FOREIGN KEY(user_id) REFERENCES users(id)
+```
+
+我们将在`id`,`user_id`字段上创建[索引](https://github.com/donnemartin/system-design-primer#use-good-indices)。
+
+#### 分类服务
+
+对于 **分类服务**,我们可以为最受欢迎的卖家生成一个卖家-类别字典。如果我们估计有 50000 个卖家,并假设每个条目占用不超过 255 字节,该字典只需要大约 12 MB 内存。
+
+**告知你的面试官你准备写多少代码**。
+
+```python
+class DefaultCategories(Enum):
+
+ HOUSING = 0
+ FOOD = 1
+ GAS = 2
+ SHOPPING = 3
+ ...
+
+seller_category_map = {}
+seller_category_map['Exxon'] = DefaultCategories.GAS
+seller_category_map['Target'] = DefaultCategories.SHOPPING
+...
+```
+
+对于一开始没有在映射中的卖家,我们可以通过评估用户提供的手动分类来进行众包。我们可以用堆在 O(1) 时间内快速查找每个卖家票数最多的手动覆盖分类(一个可行的堆结构示意见下文)。
+
+```python
+class Categorizer(object):
+
+    def __init__(self, seller_category_map, seller_category_crowd_overrides_map):
+ self.seller_category_map = seller_category_map
+ self.seller_category_crowd_overrides_map = \
+ seller_category_crowd_overrides_map
+
+ def categorize(self, transaction):
+ if transaction.seller in self.seller_category_map:
+ return self.seller_category_map[transaction.seller]
+ elif transaction.seller in self.seller_category_crowd_overrides_map:
+ self.seller_category_map[transaction.seller] = \
+ self.seller_category_crowd_overrides_map[transaction.seller].peek_min()
+ return self.seller_category_map[transaction.seller]
+ return None
+```
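+
+上面的 `peek_min()` 并不是现成的标准库方法,下面是一个可以支持它的众包覆盖结构的简化示意(假设我们只需要近似 O(1) 地查看票数最多的分类;采用惰性删除的最小堆,仅供参考,并非原文的一部分):
+
+```python
+import heapq
+
+
+class SellerCategoryOverrides(object):
+    """某个卖家的手动分类覆盖:用最小堆保存 (-票数, 序号, 分类),
+    堆顶即票数最多的分类,peek_min() 可近似 O(1) 返回。"""
+
+    def __init__(self):
+        self.heap = []      # 元素为 (-票数, 序号, 分类),采用惰性删除
+        self.counts = {}    # 分类 -> 当前票数
+        self.push_index = 0
+
+    def add_override(self, category):
+        self.counts[category] = self.counts.get(category, 0) + 1
+        heapq.heappush(self.heap, (-self.counts[category], self.push_index, category))
+        self.push_index += 1
+
+    def peek_min(self):
+        # 跳过票数已过期的堆顶条目
+        while self.heap and -self.heap[0][0] != self.counts[self.heap[0][2]]:
+            heapq.heappop(self.heap)
+        return self.heap[0][2] if self.heap else None
+```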
+
+交易实现:
+
+```python
+class Transaction(object):
+
+ def __init__(self, created_at, seller, amount):
+        self.created_at = created_at
+ self.seller = seller
+ self.amount = amount
+```
+
+### 用例:服务推荐预算
+
+首先,我们可以使用根据收入等级分配每类别金额的通用预算模板。使用这种方法,我们不必存储在约束中标识的 1 亿个预算项目,只需存储用户覆盖的预算项目。如果用户覆盖预算类别,我们可以在
+`TABLE budget_overrides`中存储此覆盖。
+
+```python
+class Budget(object):
+
+ def __init__(self, income):
+ self.income = income
+ self.categories_to_budget_map = self.create_budget_template()
+
+ def create_budget_template(self):
+ return {
+            'DefaultCategories.HOUSING': self.income * .4,
+            'DefaultCategories.FOOD': self.income * .2,
+            'DefaultCategories.GAS': self.income * .1,
+            'DefaultCategories.SHOPPING': self.income * .2,
+ ...
+ }
+
+ def override_category_budget(self, category, amount):
+ self.categories_to_budget_map[category] = amount
+```
+
+对于 **预算服务** 而言,我们可以在`transactions`表上运行SQL查询以生成`monthly_spending`聚合表。由于用户通常每个月有很多交易,所以`monthly_spending`表的行数可能会少于总共50亿次交易的行数。
+
+作为替代,我们可以在原始交易文件上运行 **MapReduce** 作业来:
+
+* 分类每个交易
+* 按类别生成每月总支出
+
+对交易文件的运行分析可以显著减少数据库的负载。
+
+如果用户更新类别,我们可以调用 **预算服务** 重新运行分析。
+
+**告知你的面试官你准备写多少代码**.
+
+日志文件格式样例,以tab分割:
+
+```
+user_id timestamp seller amount
+```
+
+**MapReduce** 实现:
+
+```python
+class SpendingByCategory(MRJob):
+
+ def __init__(self, categorizer):
+ self.categorizer = categorizer
+ self.current_year_month = calc_current_year_month()
+ ...
+
+ def calc_current_year_month(self):
+ """返回当前年月"""
+ ...
+
+ def extract_year_month(self, timestamp):
+ """返回时间戳的年,月部分"""
+ ...
+
+ def handle_budget_notifications(self, key, total):
+ """如果接近或超出预算,调用通知API"""
+ ...
+
+ def mapper(self, _, line):
+ """解析每个日志行,提取和转换相关行。
+
+ 参数行应为如下形式:
+
+ user_id timestamp seller amount
+
+ 使用分类器来将卖家转换成类别,生成如下形式的key-value对:
+
+ (user_id, 2016-01, shopping), 25
+ (user_id, 2016-01, shopping), 100
+ (user_id, 2016-01, gas), 50
+ """
+ user_id, timestamp, seller, amount = line.split('\t')
+ category = self.categorizer.categorize(seller)
+ period = self.extract_year_month(timestamp)
+ if period == self.current_year_month:
+ yield (user_id, period, category), amount
+
+    def reducer(self, key, values):
+ """将每个key对应的值求和。
+
+ (user_id, 2016-01, shopping), 125
+ (user_id, 2016-01, gas), 50
+ """
+        total = sum(values)
+        yield key, total
+```
+
+## 第四步:设计扩展
+
+> 根据限制条件,找到并解决瓶颈。
+
+![Imgur](http://i.imgur.com/V5q57vU.png)
+
+**重要提示:不要从最初设计直接跳到最终设计中!**
+
+现在你要 1) **基准测试、负载测试**。2) **分析、描述**性能瓶颈。3) 在解决瓶颈问题的同时,评估替代方案、权衡利弊。4) 重复以上步骤。请阅读[「设计一个系统,并将其扩大到为数以百万计的 AWS 用户服务」](../scaling_aws/README.md) 来了解如何逐步扩大初始设计。
+
+讨论初始设计可能遇到的瓶颈及相关解决方案是很重要的。例如加上一个配置多台 **Web 服务器**的**负载均衡器**是否能够解决问题?**CDN**呢?**主从复制**呢?它们各自的替代方案和需要**权衡**的利弊又有什么呢?
+
+我们将会介绍一些组件来完成设计,并解决架构扩张问题。内置的负载均衡器将不做讨论以节省篇幅。
+
+**为了避免重复讨论**,请参考[系统设计主题索引](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#系统设计主题的索引)相关部分来了解其要点、方案的权衡取舍以及可选的替代方案。
+
+* [DNS](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#域名系统)
+* [负载均衡器](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#负载均衡器)
+* [水平拓展](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#水平扩展)
+* [反向代理(web 服务器)](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#反向代理web-服务器)
+* [API 服务(应用层)](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#应用层)
+* [缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#缓存)
+* [关系型数据库管理系统 (RDBMS)](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#关系型数据库管理系统rdbms)
+* [SQL 故障主从切换](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#故障切换)
+* [主从复制](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#主从复制)
+* [异步](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#异步)
+* [一致性模式](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#一致性模式)
+* [可用性模式](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#可用性模式)
+
+我们将增加一个额外的用例:**用户** 访问摘要和交易数据。
+
+用户会话,按类别统计的统计信息,以及最近的事务可以放在 **内存缓存**(如 Redis 或 Memcached )中。
+
+* **客户端** 发送读请求给 **Web 服务器**
+* **Web 服务器** 转发请求到 **读 API** 服务器
+ * 静态内容可通过 **对象存储** 比如缓存在 **CDN** 上的 S3 来服务
+* **读 API** 服务器做如下动作:
+ * 检查 **内存缓存** 的内容
+ * 如果URL在 **内存缓存**中,返回缓存的内容
+ * 否则
+ * 如果URL在 **SQL 数据库**中,获取该内容
+ * 以其内容更新 **内存缓存**
+
+参考 [何时更新缓存](https://github.com/donnemartin/system-design-primer#when-to-update-the-cache) 中权衡和替代的内容。以上方法描述了 [cache-aside缓存模式](https://github.com/donnemartin/system-design-primer#cache-aside).
+
+我们可以使用诸如 Amazon Redshift 或者 Google BigQuery 等数据仓库解决方案,而不是将`monthly_spending`聚合表保留在 **SQL 数据库** 中。
+
+我们可能只想在数据库中存储一个月的`交易`数据,而将其余数据存储在数据仓库或者 **对象存储区** 中。**对象存储区** (如Amazon S3) 能够舒服地解决每月 250 GB新内容的限制。
+
+为了解决每秒 *平均* 2000 次读请求数(峰值时更高),受欢迎的内容的流量应由 **内存缓存** 而不是数据库来处理。 **内存缓存** 也可用于处理不均匀分布的流量和流量尖峰。 只要副本不陷入重复写入的困境,**SQL 读副本** 应该能够处理高速缓存未命中。
+
+*平均* 200 次交易写入每秒(峰值时更高)对于单个 **SQL 写入主-从服务** 来说可能是棘手的。我们可能需要考虑其它的 SQL 性能拓展技术:
+
+* [联合](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#联合)
+* [分片](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#分片)
+* [非规范化](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#非规范化)
+* [SQL 调优](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#sql-调优)
+
+我们也可以考虑将一些数据移至 **NoSQL 数据库**。
+
+## 其它要点
+
+> 是否深入这些额外的主题,取决于你的问题范围和剩下的时间。
+
+#### NoSQL
+
+* [键-值存储](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#键-值存储)
+* [文档类型存储](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#文档类型存储)
+* [列型存储](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#列型存储)
+* [图数据库](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#图数据库)
+* [SQL vs NoSQL](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#sql-还是-nosql)
+
+### 缓存
+
+* 在哪缓存
+ * [客户端缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#客户端缓存)
+ * [CDN 缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#cdn-缓存)
+ * [Web 服务器缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#web-服务器缓存)
+ * [数据库缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#数据库缓存)
+ * [应用缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#应用缓存)
+* 什么需要缓存
+ * [数据库查询级别的缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#数据库查询级别的缓存)
+ * [对象级别的缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#对象级别的缓存)
+* 何时更新缓存
+ * [缓存模式](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#缓存模式)
+ * [直写模式](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#直写模式)
+ * [回写模式](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#回写模式)
+ * [刷新](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#刷新)
+
+### 异步与微服务
+
+* [消息队列](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#消息队列)
+* [任务队列](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#任务队列)
+* [背压](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#背压)
+* [微服务](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#微服务)
+
+### 通信
+
+* 可权衡选择的方案:
+ * 与客户端的外部通信 - [使用 REST 作为 HTTP API](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#表述性状态转移rest)
+ * 服务器内部通信 - [RPC](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#远程过程调用协议rpc)
+* [服务发现](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#服务发现)
+
+### 安全性
+
+请参阅[「安全」](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#安全)一章。
+
+### 延迟数值
+
+请参阅[「每个程序员都应该知道的延迟数」](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#每个程序员都应该知道的延迟数)。
+
+### 持续探讨
+
+* 持续进行基准测试并监控你的系统,以解决他们提出的瓶颈问题。
+* 架构拓展是一个迭代的过程。
diff --git a/solutions/system_design/pastebin/README-zh-Hans.md b/solutions/system_design/pastebin/README-zh-Hans.md
index b5fcbd3a..d2946e97 100644
--- a/solutions/system_design/pastebin/README-zh-Hans.md
+++ b/solutions/system_design/pastebin/README-zh-Hans.md
@@ -1,6 +1,6 @@
# 设计 Pastebin.com (或者 Bit.ly)
-**Note: 为了避免重复,当前文档直接链接到[系统设计主题](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#系统设计主题的索引)的相关区域,请参考链接内容以获得综合的讨论点、权衡和替代方案。**
+**注意: 为了避免重复,当前文档会直接链接到[系统设计主题](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#系统设计主题的索引)的相关区域,请参考链接内容以获得综合的讨论点、权衡和替代方案。**
**设计 Bit.ly** - 是一个类似的问题,区别是 pastebin 需要存储的是 paste 的内容,而不是原始的未短化的 url。
diff --git a/solutions/system_design/query_cache/README-zh-Hans.md b/solutions/system_design/query_cache/README-zh-Hans.md
new file mode 100644
index 00000000..c6f4be75
--- /dev/null
+++ b/solutions/system_design/query_cache/README-zh-Hans.md
@@ -0,0 +1,306 @@
+# 设计一个键-值缓存来存储最近 web 服务查询的结果
+
+**注意:这个文档中的链接会直接指向[系统设计主题索引](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#系统设计主题的索引)中的有关部分,以避免重复的内容。你可以参考链接的相关内容,来了解其总的要点、方案的权衡取舍以及可选的替代方案。**
+
+## 第一步:简述用例与约束条件
+
+> 搜集需求与问题的范围。
+> 提出问题来明确用例与约束条件。
+> 讨论假设。
+
+我们将在没有面试官明确说明问题的情况下,自己定义一些用例以及限制条件。
+
+### 用例
+
+#### 我们将把问题限定在仅处理以下用例的范围中
+
+* **用户**发送一个搜索请求,命中缓存
+* **用户**发送一个搜索请求,未命中缓存
+* **服务**有着高可用性
+
+### 限制条件与假设
+
+#### 提出假设
+
+* 网络流量不是均匀分布的
+ * 经常被查询的内容应该一直存于缓存中
+ * 需要确定如何规定缓存过期、缓存刷新规则
+* 缓存提供的服务查询速度要快
+* 机器间延迟较低
+* 缓存有内存限制
+ * 需要决定缓存什么、移除什么
+ * 需要缓存百万级的查询
+* 1000 万用户
+* 每个月 100 亿次查询
+
+#### 计算用量
+
+**如果你需要进行粗略的用量计算,请向你的面试官说明。**
+
+* 缓存存储的是键值对有序表,键为 `query`(查询),值为 `results`(结果)。
+ * `query` - 50 字节
+ * `title` - 20 字节
+ * `snippet` - 200 字节
+ * 总计:270 字节
+* 假如 100 亿次查询都是不同的,且全部需要存储,那么每个月需要 2.7 TB 的缓存空间
+ * 单次查询 270 字节 * 每月查询 100 亿次
+ * 假设内存大小有限制,需要决定如何制定缓存过期规则
+* 每秒 4,000 次请求
+
+便利换算指南:
+
+* 每个月有 250 万秒
+* 每秒一个请求 = 每个月 250 万次请求
+* 每秒 40 个请求 = 每个月 1 亿次请求
+* 每秒 400 个请求 = 每个月 10 亿次请求
+
+## 第二步:概要设计
+
+> 列出所有重要组件以规划概要设计。
+
+![Imgur](http://i.imgur.com/KqZ3dSx.png)
+
+## 第三步:设计核心组件
+
+> 深入每个核心组件的细节。
+
+### 用例:用户发送了一次请求,命中了缓存
+
+常用的查询可以由例如 Redis 或者 Memcached 之类的**内存缓存**提供支持,以减少数据读取延迟,并且避免**反向索引服务**以及**文档服务**的过载。从内存读取 1 MB 连续数据大约要花 250 微秒,而从 SSD 读取同样大小的数据要花费 4 倍的时间,从机械硬盘读取需要花费 80 倍以上的时间。1
+
+由于缓存容量有限,我们将使用 LRU(近期最少使用算法)来控制缓存的过期。
+
+* **客户端**向运行[反向代理](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#反向代理web-服务器)的 **Web 服务器**发送一个请求
+* 这个 **Web 服务器**将请求转发给**查询 API** 服务
+* **查询 API** 服务将会做这些事情:
+ * 分析查询
+ * 移除多余的内容
+ * 将文本分割成词组
+ * 修正拼写错误
+ * 规范化字母的大小写
+ * 将查询转换为布尔运算
+ * 检测**内存缓存**是否有匹配查询的内容
+ * 如果命中**内存缓存**,**内存缓存**将会做以下事情:
+ * 将缓存入口的位置指向 LRU 链表的头部
+ * 返回缓存内容
+ * 否则,**查询 API** 将会做以下事情:
+ * 使用**反向索引服务**来查找匹配查询的文档
+ * **反向索引服务**对匹配到的结果进行排名,然后返回最符合的结果
+ * 使用**文档服务**返回文章标题与片段
+ * 更新**内存缓存**,存入内容,将**内存缓存**入口位置指向 LRU 链表的头部
+
+#### 缓存的实现
+
+缓存可以使用双向链表实现:新元素将会在头结点加入,过期的元素将会在尾节点被删除。我们使用哈希表以便能够快速查找每个链表节点。
+
+**向你的面试官告知你准备写多少代码**。
+
+实现**查询 API 服务**:
+
+```python
+class QueryApi(object):
+
+ def __init__(self, memory_cache, reverse_index_service):
+ self.memory_cache = memory_cache
+ self.reverse_index_service = reverse_index_service
+
+ def parse_query(self, query):
+ """移除多余内容,将文本分割成词组,修复拼写错误,
+ 规范化字母大小写,转换布尔运算。
+ """
+ ...
+
+ def process_query(self, query):
+ query = self.parse_query(query)
+ results = self.memory_cache.get(query)
+ if results is None:
+ results = self.reverse_index_service.process_search(query)
+ self.memory_cache.set(query, results)
+ return results
+```
+
+实现**节点**:
+
+```python
+class Node(object):
+
+ def __init__(self, query, results):
+ self.query = query
+ self.results = results
+```
+
+实现**链表**:
+
+```python
+class LinkedList(object):
+
+ def __init__(self):
+ self.head = None
+ self.tail = None
+
+ def move_to_front(self, node):
+ ...
+
+ def append_to_front(self, node):
+ ...
+
+ def remove_from_tail(self):
+ ...
+```
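+
+补全上面省略号处的一个简化实现示意(假设每个节点额外带有 `prev`/`next` 指针;仅供参考,并非原文的一部分):
+
+```python
+class LinkedList(object):
+
+    def __init__(self):
+        self.head = None
+        self.tail = None
+
+    def append_to_front(self, node):
+        node.prev = None
+        node.next = self.head
+        if self.head is not None:
+            self.head.prev = node
+        self.head = node
+        if self.tail is None:
+            self.tail = node
+
+    def _unlink(self, node):
+        # 把节点从链表中摘下,同时维护 head/tail 指针
+        if node.prev is not None:
+            node.prev.next = node.next
+        else:
+            self.head = node.next
+        if node.next is not None:
+            node.next.prev = node.prev
+        else:
+            self.tail = node.prev
+
+    def move_to_front(self, node):
+        self._unlink(node)
+        self.append_to_front(node)
+
+    def remove_from_tail(self):
+        if self.tail is not None:
+            self._unlink(self.tail)
+```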
+
+实现**缓存**:
+
+```python
+class Cache(object):
+
+ def __init__(self, MAX_SIZE):
+ self.MAX_SIZE = MAX_SIZE
+ self.size = 0
+ self.lookup = {} # key: query, value: node
+ self.linked_list = LinkedList()
+
+    def get(self, query):
+ """从缓存取得存储的内容
+
+ 将入口节点位置更新为 LRU 链表的头部。
+ """
+        node = self.lookup.get(query)
+ if node is None:
+ return None
+ self.linked_list.move_to_front(node)
+ return node.results
+
+    def set(self, query, results):
+ """将所给查询键的结果存在缓存中。
+
+ 当更新缓存记录的时候,将它的位置指向 LRU 链表的头部。
+ 如果这个记录是新的记录,并且缓存空间已满,应该在加入新记录前
+ 删除最老的记录。
+ """
+        node = self.lookup.get(query)
+ if node is not None:
+ # 键存在于缓存中,更新它对应的值
+ node.results = results
+ self.linked_list.move_to_front(node)
+ else:
+ # 键不存在于缓存中
+ if self.size == self.MAX_SIZE:
+ # 在链表中查找并删除最老的记录
+ self.lookup.pop(self.linked_list.tail.query, None)
+ self.linked_list.remove_from_tail()
+ else:
+ self.size += 1
+ # 添加新的键值对
+ new_node = Node(query, results)
+ self.linked_list.append_to_front(new_node)
+ self.lookup[query] = new_node
+```
+
+#### 何时更新缓存
+
+缓存将会在以下几种情况更新:
+
+* 页面内容发生变化
+* 页面被移除或者加入了新页面
+* 页面的权值发生变动
+
+解决这些问题的最直接的方法,就是为缓存记录设置一个它在被更新前能留在缓存中的最长时间,这个时间简称为存活时间(TTL)。
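+
+一条带 TTL 的缓存记录的简化示意(300 秒只是假设的默认值,并非原文给出):
+
+```python
+import time
+
+
+class CachedResult(object):
+    """带存活时间(TTL)的缓存记录:超过 ttl_seconds 即视为过期,需要重新获取。"""
+
+    def __init__(self, results, ttl_seconds=300):
+        self.results = results
+        self.expires_at = time.time() + ttl_seconds
+
+    def is_expired(self):
+        return time.time() > self.expires_at
+```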
+
+参考 [「何时更新缓存」](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#何时更新缓存)来了解其权衡取舍及替代方案。以上方法在[缓存模式](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#缓存模式)一章中详细地进行了描述。
+
+## 第四步:架构扩展
+
+> 根据限制条件,找到并解决瓶颈。
+
+![Imgur](http://i.imgur.com/4j99mhe.png)
+
+**重要提示:不要从最初设计直接跳到最终设计中!**
+
+现在你要 1) **基准测试、负载测试**。2) **分析、描述**性能瓶颈。3) 在解决瓶颈问题的同时,评估替代方案、权衡利弊。4) 重复以上步骤。请阅读[「设计一个系统,并将其扩大到为数以百万计的 AWS 用户服务」](../scaling_aws/README.md) 来了解如何逐步扩大初始设计。
+
+讨论初始设计可能遇到的瓶颈及相关解决方案是很重要的。例如加上一个配置多台 **Web 服务器**的**负载均衡器**是否能够解决问题?**CDN**呢?**主从复制**呢?它们各自的替代方案和需要**权衡**的利弊又有什么呢?
+
+我们将会介绍一些组件来完成设计,并解决架构扩张问题。内置的负载均衡器将不做讨论以节省篇幅。
+
+**为了避免重复讨论**,请参考[系统设计主题索引](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#系统设计主题的索引)相关部分来了解其要点、方案的权衡取舍以及可选的替代方案。
+
+* [DNS](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#域名系统)
+* [负载均衡器](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#负载均衡器)
+* [水平拓展](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#水平扩展)
+* [反向代理(web 服务器)](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#反向代理web-服务器)
+* [API 服务(应用层)](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#应用层)
+* [缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#缓存)
+* [一致性模式](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#一致性模式)
+* [可用性模式](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#可用性模式)
+
+### 将内存缓存扩大到多台机器
+
+为了解决庞大的请求负载以及巨大的内存需求,我们将要对架构进行水平拓展。如何在我们的**内存缓存**集群中存储数据呢?我们有以下三个主要可选方案:
+
+* **缓存集群中的每一台机器都有自己的缓存** - 简单,但是它会降低缓存命中率。
+* **缓存集群中的每一台机器都有缓存的拷贝** - 简单,但是它的内存使用效率太低了。
+* **对缓存进行[分片](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#分片),分别部署在缓存集群中的所有机器中** - 更加复杂,但这是最佳的选择。我们可以对查询语句做哈希,即 `machine = hash(query)`,来确定哪台机器上存有所需的缓存内容(见下方示意)。当然我们也可以使用[一致性哈希](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#正在完善中)。
+
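+一个按查询哈希选择缓存分片的简化示意(使用稳定的 MD5 哈希而不是 Python 内置的 `hash()`,因为后者在不同进程间并不一致;`machines` 假设为缓存节点列表):
+
+```python
+import hashlib
+
+
+def get_cache_shard(query, machines):
+    """根据查询语句的哈希值,决定由哪台缓存机器负责该条目。"""
+    digest = hashlib.md5(query.encode('utf-8')).hexdigest()
+    return machines[int(digest, 16) % len(machines)]
+```
+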
+## 其它要点
+
+> 是否深入这些额外的主题,取决于你的问题范围和剩下的时间。
+
+### SQL 缩放模式
+
+* [读取复制](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#主从复制)
+* [联合](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#联合)
+* [分片](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#分片)
+* [非规范化](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#非规范化)
+* [SQL 调优](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#sql-调优)
+
+#### NoSQL
+
+* [键-值存储](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#键-值存储)
+* [文档类型存储](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#文档类型存储)
+* [列型存储](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#列型存储)
+* [图数据库](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#图数据库)
+* [SQL vs NoSQL](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#sql-还是-nosql)
+
+### 缓存
+
+* 在哪缓存
+ * [客户端缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#客户端缓存)
+ * [CDN 缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#cdn-缓存)
+ * [Web 服务器缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#web-服务器缓存)
+ * [数据库缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#数据库缓存)
+ * [应用缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#应用缓存)
+* 什么需要缓存
+ * [数据库查询级别的缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#数据库查询级别的缓存)
+ * [对象级别的缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#对象级别的缓存)
+* 何时更新缓存
+ * [缓存模式](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#缓存模式)
+ * [直写模式](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#直写模式)
+ * [回写模式](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#回写模式)
+ * [刷新](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#刷新)
+
+### 异步与微服务
+
+* [消息队列](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#消息队列)
+* [任务队列](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#任务队列)
+* [背压](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#背压)
+* [微服务](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#微服务)
+
+### 通信
+
+* 可权衡选择的方案:
+ * 与客户端的外部通信 - [使用 REST 作为 HTTP API](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#表述性状态转移rest)
+ * 服务器内部通信 - [RPC](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#远程过程调用协议rpc)
+* [服务发现](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#服务发现)
+
+### 安全性
+
+请参阅[「安全」](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#安全)一章。
+
+### 延迟数值
+
+请参阅[「每个程序员都应该知道的延迟数」](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#每个程序员都应该知道的延迟数)。
+
+### 持续探讨
+
+* 持续进行基准测试并监控你的系统,以解决他们提出的瓶颈问题。
+* 架构拓展是一个迭代的过程。
diff --git a/solutions/system_design/sales_rank/README-zh-Hans.md b/solutions/system_design/sales_rank/README-zh-Hans.md
new file mode 100644
index 00000000..960f9258
--- /dev/null
+++ b/solutions/system_design/sales_rank/README-zh-Hans.md
@@ -0,0 +1,338 @@
+# 为 Amazon 设计分类售卖排行
+
+**注意:这个文档中的链接会直接指向[系统设计主题索引](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#系统设计主题的索引)中的有关部分,以避免重复的内容。你可以参考链接的相关内容,来了解其总的要点、方案的权衡取舍以及可选的替代方案。**
+
+## 第一步:简述用例与约束条件
+
+> 搜集需求与问题的范围。
+> 提出问题来明确用例与约束条件。
+> 讨论假设。
+
+我们将在没有面试官明确说明问题的情况下,自己定义一些用例以及限制条件。
+
+### 用例
+
+#### 我们将把问题限定在仅处理以下用例的范围中
+
+* **服务**根据分类计算过去一周中最受欢迎的商品
+* **用户**通过分类浏览过去一周中最受欢迎的商品
+* **服务**有着高可用性
+
+#### 不在用例范围内的有
+
+* 一般的电商网站
+ * 只为售卖排行榜设计组件
+
+### 限制条件与假设
+
+#### 提出假设
+
+* 网络流量不是均匀分布的
+* 一个商品可能存在于多个分类中
+* 商品不能够更改分类
+* 不会存在如 `foo/bar/baz` 之类的子分类
+* 每小时更新一次结果
+ * 受欢迎的商品越多,就需要更频繁地更新
+* 1000 万个商品
+* 1000 个分类
+* 每个月 10 亿次交易
+* 每个月 1000 亿次读取请求
+* 100:1 的读写比例
+
+#### 计算用量
+
+**如果你需要进行粗略的用量计算,请向你的面试官说明。**
+
+* 每笔交易的用量:
+ * `created_at` - 5 字节
+ * `product_id` - 8 字节
+ * `category_id` - 4 字节
+ * `seller_id` - 8 字节
+ * `buyer_id` - 8 字节
+ * `quantity` - 4 字节
+ * `total_price` - 5 字节
+ * 总计:大约 40 字节
+* 每个月的交易内容会产生 40 GB 的记录
+ * 每次交易 40 字节 * 每个月 10 亿次交易
+ * 3年内产生了 1.44 TB 的新交易内容记录
+ * 假定大多数的交易都是新交易而不是更改以前进行完的交易
+* 平均每秒 400 次交易次数
+* 平均每秒 40,000 次读取请求
+
+便利换算指南:
+
+* 每个月有 250 万秒
+* 每秒一个请求 = 每个月 250 万次请求
+* 每秒 40 个请求 = 每个月 1 亿次请求
+* 每秒 400 个请求 = 每个月 10 亿次请求
+
+## 第二步:概要设计
+
+> 列出所有重要组件以规划概要设计。
+
+![Imgur](http://i.imgur.com/vwMa1Qu.png)
+
+## 第三步:设计核心组件
+
+> 深入每个核心组件的细节。
+
+### 用例:服务需要根据分类计算上周最受欢迎的商品
+
+我们可以在现成的**对象存储**系统(例如 Amazon S3 服务)中存储 **售卖 API** 服务产生的日志文本, 因此不需要我们自己搭建分布式文件系统了。
+
+**向你的面试官告知你准备写多少代码**。
+
+假设下面是一个用 tab 分割的简易的日志记录:
+
+```
+timestamp product_id category_id qty total_price seller_id buyer_id
+t1 product1 category1 2 20.00 1 1
+t2 product1 category2 2 20.00 2 2
+t2 product1 category2 1 10.00 2 3
+t3 product2 category1 3 7.00 3 4
+t4 product3 category2 7 2.00 4 5
+t5 product4 category1 1 5.00 5 6
+...
+```
+
+**售卖排行服务** 可以使用 **MapReduce**,以 **售卖 API** 服务的日志文件作为输入,并将结果写入 **SQL 数据库**中的总表 `sales_rank` 中。我们也可以讨论一下[究竟是用 SQL 还是用 NoSQL](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#sql-还是-nosql)。
+
+我们需要通过以下步骤使用 **MapReduce**:
+
+* **第 1 步** - 将数据转换为 `(category, product_id), sum(quantity)` 的形式
+* **第 2 步** - 执行分布式排序
+
+```python
+class SalesRanker(MRJob):
+
+ def within_past_week(self, timestamp):
+ """如果时间戳属于过去的一周则返回 True,
+ 否则返回 False。"""
+ ...
+
+    def mapper(self, _, line):
+ """解析日志的每一行,提取并转换相关行,
+
+ 将键值对设定为如下形式:
+
+ (category1, product1), 2
+ (category2, product1), 2
+ (category2, product1), 1
+ (category1, product2), 3
+ (category2, product3), 7
+ (category1, product4), 1
+ """
+ timestamp, product_id, category_id, quantity, total_price, seller_id, \
+ buyer_id = line.split('\t')
+ if self.within_past_week(timestamp):
+ yield (category_id, product_id), quantity
+
+    def reducer(self, key, values):
+ """将每个 key 的值加起来。
+
+ (category1, product1), 2
+ (category2, product1), 3
+ (category1, product2), 3
+ (category2, product3), 7
+ (category1, product4), 1
+ """
+ yield key, sum(values)
+
+ def mapper_sort(self, key, value):
+ """构造 key 以确保正确的排序。
+
+ 将键值对转换成如下形式:
+
+ (category1, 2), product1
+ (category2, 3), product1
+ (category1, 3), product2
+ (category2, 7), product3
+ (category1, 1), product4
+
+ MapReduce 的随机排序步骤会将键
+ 值的排序打乱,变成下面这样:
+
+ (category1, 1), product4
+ (category1, 2), product1
+ (category1, 3), product2
+ (category2, 3), product1
+ (category2, 7), product3
+ """
+ category_id, product_id = key
+ quantity = value
+ yield (category_id, quantity), product_id
+
+ def reducer_identity(self, key, value):
+ yield key, value
+
+ def steps(self):
+ """ 此处为 map reduce 步骤"""
+ return [
+ self.mr(mapper=self.mapper,
+ reducer=self.reducer),
+ self.mr(mapper=self.mapper_sort,
+ reducer=self.reducer_identity),
+ ]
+```
+
+得到的结果将会是如下的排序列,我们将其插入 `sales_rank` 表中:
+
+```
+(category1, 1), product4
+(category1, 2), product1
+(category1, 3), product2
+(category2, 3), product1
+(category2, 7), product3
+```
+
+`sales_rank` 表的数据结构如下:
+
+```
+id int NOT NULL AUTO_INCREMENT
+category_id int NOT NULL
+total_sold int NOT NULL
+product_id int NOT NULL
+PRIMARY KEY(id)
+FOREIGN KEY(category_id) REFERENCES Categories(id)
+FOREIGN KEY(product_id) REFERENCES Products(id)
+```
+
+我们会以 `id`、`category_id` 与 `product_id` 创建一个 [索引](https://github.com/donnemartin/system-design-primer#use-good-indices)以加快查询速度(只需对数时间,而不需要每次都扫描整个数据表)并让数据常驻内存。从内存读取 1 MB 连续数据大约要花 250 微秒,而从 SSD 读取同样大小的数据要花费 4 倍的时间,从机械硬盘读取需要花费 80 倍以上的时间。1
+
+### 用例:用户需要根据分类浏览上周中最受欢迎的商品
+
+* **客户端**向运行[反向代理](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#反向代理web-服务器)的 **Web 服务器**发送一个请求
+* 这个 **Web 服务器**将请求转发给**查询 API** 服务
+* **查询 API** 服务将从 **SQL 数据库**的 `sales_rank` 表中读取数据
+
+我们可以调用一个公共的 [REST API](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#表述性状态转移rest):
+
+```
+$ curl https://amazon.com/api/v1/popular?category_id=1234
+```
+
+返回:
+
+```
+{
+ "id": "100",
+ "category_id": "1234",
+ "total_sold": "100000",
+ "product_id": "50",
+},
+{
+ "id": "53",
+ "category_id": "1234",
+ "total_sold": "90000",
+ "product_id": "200",
+},
+{
+ "id": "75",
+ "category_id": "1234",
+ "total_sold": "80000",
+ "product_id": "3",
+},
+```
+
+而对于服务器内部的通信,我们可以使用 [RPC](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#远程过程调用协议rpc)。
+
+## 第四步:架构扩展
+
+> 根据限制条件,找到并解决瓶颈。
+
+![Imgur](http://i.imgur.com/MzExP06.png)
+
+**重要提示:不要从最初设计直接跳到最终设计中!**
+
+现在你要 1) **基准测试、负载测试**。2) **分析、描述**性能瓶颈。3) 在解决瓶颈问题的同时,评估替代方案、权衡利弊。4) 重复以上步骤。请阅读[「设计一个系统,并将其扩大到为数以百万计的 AWS 用户服务」](../scaling_aws/README.md) 来了解如何逐步扩大初始设计。
+
+讨论初始设计可能遇到的瓶颈及相关解决方案是很重要的。例如加上一个配置多台 **Web 服务器**的**负载均衡器**是否能够解决问题?**CDN**呢?**主从复制**呢?它们各自的替代方案和需要**权衡**的利弊又有什么呢?
+
+我们将会介绍一些组件来完成设计,并解决架构扩张问题。内置的负载均衡器将不做讨论以节省篇幅。
+
+**为了避免重复讨论**,请参考[系统设计主题索引](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#系统设计主题的索引)相关部分来了解其要点、方案的权衡取舍以及可选的替代方案。
+
+* [DNS](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#域名系统)
+* [负载均衡器](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#负载均衡器)
+* [水平拓展](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#水平扩展)
+* [反向代理(web 服务器)](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#反向代理web-服务器)
+* [API 服务(应用层)](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#应用层)
+* [缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#缓存)
+* [关系型数据库管理系统 (RDBMS)](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#关系型数据库管理系统rdbms)
+* [SQL 故障主从切换](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#故障切换)
+* [主从复制](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#主从复制)
+* [一致性模式](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#一致性模式)
+* [可用性模式](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#可用性模式)
+
+**分析数据库** 可以用现成的数据仓储系统,例如使用 Amazon Redshift 或者 Google BigQuery 的解决方案。
+
+我们可能只想在数据库中保存一段有限时间的数据,而将其余数据存放在数据仓库或者**对象存储**系统中。像 Amazon S3 这样的**对象存储**可以轻松应对每月新增 40 GB 内容的限制。
+
+平均每秒 40,000 次的读取请求(峰值将会更高),可以通过扩展 **内存缓存** 来处理热点内容的读取流量,这对于处理不均匀分布的流量和流量峰值也很有用。由于读取量非常大,**SQL 读取副本** 可能会难以处理如此多的缓存未命中,我们可能需要使用额外的 SQL 扩展模式。
+
+平均每秒 400 次写操作(峰值将会更高)对于单个 **SQL 写主-从** 模式来说可能比较困难,因此还需要更多的扩展技术。
+
+SQL 缩放模式包括:
+
+* [联合](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#联合)
+* [分片](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#分片)
+* [非规范化](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#非规范化)
+* [SQL 调优](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#sql-调优)
+
+我们也可以考虑将一些数据移至 **NoSQL 数据库**。
+
+## 其它要点
+
+> 是否深入这些额外的主题,取决于你的问题范围和剩下的时间。
+
+#### NoSQL
+
+* [键-值存储](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#键-值存储)
+* [文档类型存储](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#文档类型存储)
+* [列型存储](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#列型存储)
+* [图数据库](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#图数据库)
+* [SQL vs NoSQL](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#sql-还是-nosql)
+
+### 缓存
+
+* 在哪缓存
+ * [客户端缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#客户端缓存)
+ * [CDN 缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#cdn-缓存)
+ * [Web 服务器缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#web-服务器缓存)
+ * [数据库缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#数据库缓存)
+ * [应用缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#应用缓存)
+* 什么需要缓存
+ * [数据库查询级别的缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#数据库查询级别的缓存)
+ * [对象级别的缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#对象级别的缓存)
+* 何时更新缓存
+ * [缓存模式](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#缓存模式)
+ * [直写模式](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#直写模式)
+ * [回写模式](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#回写模式)
+ * [刷新](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#刷新)
+
+### 异步与微服务
+
+* [消息队列](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#消息队列)
+* [任务队列](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#任务队列)
+* [背压](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#背压)
+* [微服务](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#微服务)
+
+### 通信
+
+* 可权衡选择的方案:
+ * 与客户端的外部通信 - [使用 REST 作为 HTTP API](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#表述性状态转移rest)
+ * 服务器内部通信 - [RPC](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#远程过程调用协议rpc)
+* [服务发现](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#服务发现)
+
+### 安全性
+
+请参阅[「安全」](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#安全)一章。
+
+### 延迟数值
+
+请参阅[「每个程序员都应该知道的延迟数」](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#每个程序员都应该知道的延迟数)。
+
+### 持续探讨
+
+* 持续进行基准测试并监控你的系统,以解决他们提出的瓶颈问题。
+* 架构拓展是一个迭代的过程。
diff --git a/solutions/system_design/scaling_aws/README-zh-Hans.md b/solutions/system_design/scaling_aws/README-zh-Hans.md
new file mode 100644
index 00000000..c071c70e
--- /dev/null
+++ b/solutions/system_design/scaling_aws/README-zh-Hans.md
@@ -0,0 +1,403 @@
+# 在 AWS 上设计支持百万级到千万级用户的系统
+
+**注释:为了避免重复,这篇文章的链接直接关联到 [系统设计主题](https://github.com/donnemartin/system-design-primer#index-of-system-design-topics) 的相关章节,作为讨论要点、折中方案和可选方案的参考。**
+
+## 第 1 步:用例和约束概要
+
+> 收集需求并调查问题。
+> 通过提问清晰用例和约束。
+> 讨论假设。
+
+如果没有面试官提出明确的问题,我们将自己定义一些用例和约束条件。
+
+### 用例
+
+解决这个问题是一个循序渐进的过程:1) 进行**基准/负载测试**,2) **分析**瓶颈,3) 在评估可选方案和折中方案的同时解决瓶颈,4) 重复以上步骤。这是将基础设计逐步演进为可扩展设计的好模式。
+
+除非你有 AWS 的背景或者正在申请需要 AWS 知识的相关职位,否则不要求了解 AWS 的相关细节。并且,这个练习中讨论的许多原则可以更广泛地应用于AWS生态系统之外。
+
+#### 我们就处理以下用例讨论这一问题
+
+* **用户** 进行读或写请求
+ * **服务** 进行处理,存储用户数据,然后返回结果
+* **服务** 需要从支持小规模用户开始到百万用户
+ * 在我们演化架构来处理大量的用户和请求时,讨论一般的扩展模式
+* **服务** 高可用
+
+### 约束和假设
+
+#### 状态假设
+
+* 流量不均匀分布
+* 需要关系数据
+* 从一个用户扩展到千万用户
+ * 表示用户量的增长
+ * 用户量+
+ * 用户量++
+ * 用户量+++
+ * ...
+ * 1000 万用户
+ * 每月 10 亿次写入
+ * 每月 1000 亿次读出
+ * 100:1 读写比率
+ * 每次写入 1 KB 内容
+
+#### 计算使用
+
+**向你的面试官厘清你是否应该做粗略的使用计算**
+
+* 1 TB 新内容 / 月
+ * 1 KB 每次写入 * 10 亿 写入 / 月
+ * 36 TB 新内容 / 3 年
+ * 假设大多数写入都是新内容而不是更新已有内容
+* 平均每秒 400 次写入
+* 平均每秒 40,000 次读取
+
+便捷的转换指南:
+
+* 250 万秒 / 月
+* 1 次请求 / 秒 = 250 万次请求 / 月
+* 40 次请求 / 秒 = 1 亿次请求 / 月
+* 400 次请求 / 秒 = 10 亿请求 / 月
+
+## 第 2 步:创建高级设计方案
+
+> 用所有重要组件概述高水平设计
+
+![Imgur](http://i.imgur.com/B8LDKD7.png)
+
+## 第 3 步:设计核心组件
+
+> 深入每个核心组件的细节。
+
+### 用例:用户进行读写请求
+
+#### 目标
+
+* 只有 1-2 个用户时,你只需要基础配置
+ * 为简单起见,只需要一台服务器
+ * 必要时进行纵向扩展
+ * 监控以确定瓶颈
+
+#### 以单台服务器开始
+
+* **Web 服务器** 在 EC2 上
+ * 存储用户数据
+ * [**MySQL 数据库**](https://github.com/donnemartin/system-design-primer#relational-database-management-system-rdbms)
+
+运用 **纵向扩展**:
+
+* 选择一台更大容量的服务器
+* 密切关注指标,确定如何扩大规模
+ * 使用基本监控来确定瓶颈:CPU、内存、IO、网络等
+ * CloudWatch, top, nagios, statsd, graphite等
+* 纵向扩展的代价将变得更昂贵
+* 无冗余/容错
+
+**折中方案, 可选方案, 和其他细节:**
+
+* **纵向扩展** 的可选方案是 [**横向扩展**](https://github.com/donnemartin/system-design-primer#horizontal-scaling)
+
+#### 自 SQL 开始,但认真考虑 NoSQL
+
+约束条件假设需要关系型数据。我们可以开始时在单台服务器上使用 **MySQL 数据库**。
+
+**折中方案, 可选方案, 和其他细节:**
+
+* 查阅 [关系型数据库管理系统 (RDBMS)](https://github.com/donnemartin/system-design-primer#relational-database-management-system-rdbms) 章节
+* 讨论使用 [SQL 或 NoSQL](https://github.com/donnemartin/system-design-primer#sql-or-nosql) 的原因
+
+#### 分配公共静态 IP
+
+* 弹性 IP 提供了一个公共端点,不会在重启时改变 IP。
+* 故障转移时只需要把域名指向新 IP。
+
+#### 使用 DNS 服务
+
+添加 **DNS** 服务,比如 Route 53([Amazon Route 53](https://aws.amazon.com/cn/route53/) - 译者注),将域映射到实例的公共 IP 中。
+
+**折中方案, 可选方案, 和其他细节:**
+
+* 查阅 [域名系统](https://github.com/donnemartin/system-design-primer#domain-name-system) 章节
+
+#### 安全的 Web 服务器
+
+* 只开放必要的端口
+ * 允许 Web 服务器响应来自以下端口的请求
+ * HTTP 80
+ * HTTPS 443
+ * SSH IP 白名单 22
+ * 防止 Web 服务器启动外链
+
+**折中方案, 可选方案, 和其他细节:**
+
+* 查阅 [安全](https://github.com/donnemartin/system-design-primer#security) 章节
+
+## 第 4 步:扩展设计
+
+> 在给定约束条件下,定义和确认瓶颈。
+
+### 用户+
+
+![Imgur](http://i.imgur.com/rrfjMXB.png)
+
+#### 假设
+
+我们的用户数量开始上升,并且单台服务器的负载上升。**基准/负载测试** 和 **分析** 指出 **MySQL 数据库** 占用越来越多的内存和 CPU 资源,同时用户数据将填满硬盘空间。
+
+目前,我们尚能在纵向扩展时解决这些问题。不幸的是,解决这些问题的代价变得相当昂贵,并且原来的系统并不能允许在 **MySQL 数据库** 和 **Web 服务器** 的基础上进行独立扩展。
+
+#### 目标
+
+* 减轻单台服务器负载并且允许独立扩展
+ * 在 **对象存储** 中单独存储静态内容
+ * 将 **MySQL 数据库** 迁移到单独的服务器上
+* 缺点
+ * 这些变化会增加复杂性,并要求对 **Web服务器** 进行更改,以指向 **对象存储** 和 **MySQL 数据库**
+ * 必须采取额外的安全措施来确保新组件的安全
+ * AWS 的成本也会增加,但应该与自身管理类似系统的成本做比较
+
+#### 独立保存静态内容
+
+* 考虑使用像 S3 这样可管理的 **对象存储** 服务来存储静态内容
+ * 高扩展性和可靠性
+ * 服务器端加密
+* 迁移静态内容到 S3
+ * 用户文件
+ * JS
+ * CSS
+ * 图片
+ * 视频
+
+#### 迁移 MySQL 数据库到独立机器上
+
+* 考虑使用类似 RDS 的服务来管理 **MySQL 数据库**
+ * 简单的管理,扩展
+ * 多个可用区域
+ * 空闲时加密
+
+#### 系统安全
+
+* 在传输和空闲时对数据进行加密
+* 使用虚拟私有云
+ * 为单个 **Web 服务器** 创建一个公共子网,这样就可以发送和接收来自 internet 的流量
+ * 为其他内容创建一个私有子网,禁止外部访问
+ * 在每个组件上只为白名单 IP 打开端口
+* 这些相同的模式应当在新的组件的实现中实践
+
+**折中方案, 可选方案, 和其他细节:**
+
+* 查阅 [安全](https://github.com/donnemartin/system-design-primer#security) 章节
+
+### 用户++
+
+![Imgur](http://i.imgur.com/raoFTXM.png)
+
+#### 假设
+
+我们的 **基准/负载测试** 和 **性能测试** 显示,在高峰时段,我们的单一 **Web服务器** 存在瓶颈,导致响应缓慢,在某些情况下还会宕机。随着服务的成熟,我们也希望朝着更高的可用性和冗余发展。
+
+#### 目标
+
+* 下面的目标试图用 **Web服务器** 解决扩展问题
+ * 基于 **基准/负载测试** 和 **分析**,你可能只需要实现其中的一两个技术
+* 使用 [**横向扩展**](https://github.com/donnemartin/system-design-primer#horizontal-scaling) 来处理增加的负载和单点故障
+ * 添加 [**负载均衡器**](https://github.com/donnemartin/system-design-primer#load-balancer) 例如 Amazon 的 ELB 或 HAProxy
+ * ELB 是高可用的
+        * 如果你正在配置自己的 **负载均衡器**,在多个可用区域中设置多台服务器用于 [双活](https://github.com/donnemartin/system-design-primer#active-active) 或 [主备](https://github.com/donnemartin/system-design-primer#active-passive) 将提高可用性
+ * 终止在 **负载平衡器** 上的SSL,以减少后端服务器上的计算负载,并简化证书管理
+ * 在多个可用区域中使用多台 **Web服务器**
+ * 在多个可用区域的 [**主-从 故障转移**](https://github.com/donnemartin/system-design-primer#master-slave-replication) 模式中使用多个 **MySQL** 实例来改进冗余
+* 分离 **Web 服务器** 和 [**应用服务器**](https://github.com/donnemartin/system-design-primer#application-layer)
+ * 独立扩展和配置每一层
+ * **Web 服务器** 可以作为 [**反向代理**](https://github.com/donnemartin/system-design-primer#reverse-proxy-web-server)
+ * 例如, 你可以添加 **应用服务器** 处理 **读 API** 而另外一些处理 **写 API**
+* 将静态(和一些动态)内容转移到 [**内容分发网络 (CDN)**](https://github.com/donnemartin/system-design-primer#content-delivery-network) 例如 CloudFront 以减少负载和延迟
+
+**折中方案, 可选方案, 和其他细节:**
+
+* 查阅以上链接获得更多细节
+
+### 用户+++
+
+![Imgur](http://i.imgur.com/OZCxJr0.png)
+
+**注意:** 为了简化图示,未画出 **内部负载均衡器**
+
+#### 假设
+
+我们的 **基准/负载测试** 和 **分析** 显示我们读操作频繁(100:1 的读写比率),并且数据库在高读请求时表现很糟糕。
+
+#### 目标
+
+* 下面的目标试图解决 **MySQL数据库** 的伸缩性问题
+    * 基于 **基准/负载测试** 和 **分析**,你可能只需要实现其中的一两个技术
+* 将下列数据移动到一个 [**内存缓存**](https://github.com/donnemartin/system-design-primer#cache),例如 Amazon ElastiCache,以减少负载和延迟:
+    * **MySQL** 中频繁访问的内容
+        * 首先,尝试配置 **MySQL 数据库** 自身的缓存,看看在引入 **内存缓存** 之前是否足以缓解瓶颈(见下面的预留缓存示意代码)
+    * 来自 **Web 服务器** 的会话数据
+        * **Web 服务器** 变成无状态的,允许 **自动伸缩**
+    * 从内存中连续读取 1 MB 数据需要大约 250 微秒,而从 SSD 中读取要长 4 倍,从磁盘读取要长 80 倍。1
+* 添加 [**MySQL 读取副本**](https://github.com/donnemartin/system-design-primer#master-slave-replication) 来减少写主线程的负载
+* 添加更多 **Web 服务器** 和 **应用服务器** 来提高响应
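+
+下面是一个预留缓存(cache-aside)的示意代码,演示如何把 **MySQL** 中频繁访问的内容放入 Redis 之类的 **内存缓存**。其中的 `db.query` 接口、键名和过期时间均为假设,仅作说明:
+
+```python
+# 示意代码:预留缓存(cache-aside),db.query 为假设的 MySQL 查询接口
+import json
+
+import redis
+
+cache = redis.Redis(host='cache.example.com', port=6379)
+
+
+def get_user(db, user_id):
+    key = 'user:{0}'.format(user_id)
+    cached = cache.get(key)
+    if cached is not None:
+        # 缓存命中,直接反序列化返回
+        return json.loads(cached)
+    # 缓存未命中:回源 MySQL,再写入缓存并设置过期时间
+    user = db.query('SELECT * FROM users WHERE user_id = %s', user_id)
+    cache.set(key, json.dumps(user), ex=3600)
+    return user
+```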
+
+**折中方案, 可选方案, 和其他细节:**
+
+* 查阅以上链接获得更多细节
+
+#### 添加 MySQL 读取副本
+
+* 除了添加和扩展 **内存缓存**,**MySQL 读副本服务器** 也能够帮助缓解 **MySQL 写主服务器** 的负载。
+* 添加逻辑到 **Web 服务器** 来区分读和写操作(见下面的示意代码)
+* 在 **MySQL 读副本服务器** 之前添加 **负载均衡器**(图中未画出,以减少混乱)
+* 大多数服务都是读取负载大于写入负载
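+
+下面是 **Web 服务器**(或应用层)区分读写操作的一个示意:读操作随机路由到某个 **读副本**,写操作始终发往 **写主服务器**。这里的连接对象是假设的抽象,仅用于说明路由逻辑:
+
+```python
+# 示意代码:按读/写把 SQL 请求路由到不同的 MySQL 实例(连接对象为假设的抽象)
+import random
+
+
+class RoutingSession(object):
+
+    def __init__(self, master, replicas):
+        self.master = master      # 写主服务器连接
+        self.replicas = replicas  # 读副本连接列表
+
+    def execute_read(self, sql, params=None):
+        # 读操作:随机挑选一个读副本,分摊读取负载
+        replica = random.choice(self.replicas)
+        return replica.execute(sql, params)
+
+    def execute_write(self, sql, params=None):
+        # 写操作:始终发往写主服务器
+        return self.master.execute(sql, params)
+```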
+
+**折中方案, 可选方案, 和其他细节:**
+
+* 查阅 [关系型数据库管理系统 (RDBMS)](https://github.com/donnemartin/system-design-primer#relational-database-management-system-rdbms) 章节
+
+### 用户++++
+
+![Imgur](http://i.imgur.com/3X8nmdL.png)
+
+#### 假设
+
+**基准/负载测试** 和 **分析** 显示,在美国,正常工作时间存在流量峰值,当用户离开办公室时,流量骤降。我们认为,可以根据实际负载自动伸缩服务器数量来降低成本。我们是一个规模较小的团队,所以希望把 **自动伸缩** 之类的 DevOps 通用操作尽量自动化。
+
+#### 目标
+
+* 根据需要添加 **自动扩展**
+ * 跟踪流量高峰
+ * 通过关闭未使用的实例来降低成本
+* 自动化 DevOps
+ * Chef, Puppet, Ansible 工具等
+* 继续监控指标以解决瓶颈
+    * **主机级别** - 检查单个 EC2 实例
+    * **汇总级别** - 检查负载均衡器统计数据
+    * **日志分析** - CloudWatch, CloudTrail, Loggly, Splunk, Sumo
+    * **外部站点的性能** - Pingdom 或 New Relic
+ * **处理通知和事件** - PagerDuty
+ * **错误报告** - Sentry
+
+#### 添加自动扩展
+
+* 考虑使用托管服务,比如 AWS 的 **自动扩展**(本节末尾附有配置示意代码)
+ * 为每个 **Web 服务器** 创建一个组,并为每个 **应用服务器** 类型创建一个组,将每个组放置在多个可用区域中
+ * 设置最小和最大实例数
+ * 通过 CloudWatch 来扩展或收缩
+ * 可预测负载的简单时间度量
+ * 一段时间内的指标:
+ * CPU 负载
+ * 延迟
+ * 网络流量
+ * 自定义指标
+ * 缺点
+ * 自动扩展会引入复杂性
+ * 可能需要一段时间才能适当扩大规模,以满足增加的需求,或者在需求下降时缩减规模
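+
+下面是一个基于 boto3 的 **自动扩展** 配置示意:创建一个跨多个可用区的伸缩组,并按平均 CPU 使用率做目标跟踪伸缩。其中的组名、启动配置名、可用区与目标值都是假设的示例值:
+
+```python
+# 示意代码:用 boto3 创建自动伸缩组与目标跟踪策略(名称、可用区、数值均为假设值)
+import boto3
+
+autoscaling = boto3.client('autoscaling')
+
+autoscaling.create_auto_scaling_group(
+    AutoScalingGroupName='web-server-asg',
+    LaunchConfigurationName='web-server-launch-config',
+    MinSize=2,
+    MaxSize=10,
+    AvailabilityZones=['us-east-1a', 'us-east-1b'],
+)
+
+autoscaling.put_scaling_policy(
+    AutoScalingGroupName='web-server-asg',
+    PolicyName='cpu-target-tracking',
+    PolicyType='TargetTrackingScaling',
+    TargetTrackingConfiguration={
+        'PredefinedMetricSpecification': {
+            'PredefinedMetricType': 'ASGAverageCPUUtilization',
+        },
+        'TargetValue': 50.0,  # 将组内平均 CPU 使用率维持在 50% 左右
+    },
+)
+```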
+
+### 用户+++++
+
+![Imgur](http://i.imgur.com/jj3A5N8.png)
+
+**注释:** 为了简化图示,未画出 **自动伸缩** 组
+
+#### 假设
+
+随着服务继续朝着约束条件中列出的规模增长,我们反复地运行 **基准/负载测试** 和 **分析** 来进一步发现和定位新的瓶颈。
+
+#### 目标
+
+由于问题的约束,我们将继续提出扩展性的问题:
+
+* 如果我们的 **MySQL 数据库** 开始变得过于庞大, 我们可能只考虑把数据在数据库中存储一段有限的时间, 同时在例如 Redshift 这样的数据仓库中存储其余的数据
+ * 像 Redshift 这样的数据仓库能够轻松处理每月 1TB 的新内容
+* 平均每秒 40,000 次的读取请求, 可以通过扩展 **内存缓存** 来处理热点内容的读取流量,这对于处理不均匀分布的流量和流量峰值也很有用
+ * **SQL读取副本** 可能会遇到处理缓存未命中的问题, 我们可能需要使用额外的 SQL 扩展模式
+* 对于单个 **SQL 写主-从** 模式来说,平均每秒 400 次写操作(峰值会明显更高)可能难以应付,同时还需要更多的扩展技术
+
+SQL 扩展模型包括:
+
+* [联合](https://github.com/donnemartin/system-design-primer#federation)
+* [分片](https://github.com/donnemartin/system-design-primer#sharding)
+* [反范式](https://github.com/donnemartin/system-design-primer#denormalization)
+* [SQL 调优](https://github.com/donnemartin/system-design-primer#sql-tuning)
+
+为了进一步处理高读和写请求,我们还应该考虑将适当的数据移动到一个 [**NoSQL数据库**](https://github.com/donnemartin/system-design-primer#nosql) ,例如 DynamoDB。
+
+我们可以进一步分离我们的 [**应用服务器**](https://github.com/donnemartin/system-design-primer#application-layer) 以允许独立扩展。不需要实时完成的批处理任务和计算可以通过 Queues 和 Workers 异步完成:
+
+* 以照片服务为例,照片上传和缩略图的创建可以分开进行
+ * **客户端** 上传图片
+ * **应用服务器** 推送一个任务到 **队列** 例如 SQS
+    * EC2 上的 **Worker 服务** 或者 Lambda 从 **队列** 中拉取任务(见下面的 SQS 示意代码),然后:
+ * 创建缩略图
+ * 更新 **数据库**
+ * 在 **对象存储** 中存储缩略图
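+
+下面是 **Worker 服务** 从 SQS **队列** 拉取缩略图任务的示意代码。队列 URL 以及 `create_thumbnail`、`store_thumbnail`、`update_database` 等处理函数都是假设的,仅用来说明异步处理的流程:
+
+```python
+# 示意代码:Worker 轮询 SQS 队列并处理缩略图任务(队列 URL 与处理函数均为假设)
+import json
+
+import boto3
+
+sqs = boto3.client('sqs')
+QUEUE_URL = 'https://sqs.us-east-1.amazonaws.com/123456789012/thumbnail-jobs'
+
+
+def poll_queue():
+    while True:
+        response = sqs.receive_message(
+            QueueUrl=QUEUE_URL,
+            MaxNumberOfMessages=10,
+            WaitTimeSeconds=20,  # 长轮询,减少空轮询请求
+        )
+        for message in response.get('Messages', []):
+            job = json.loads(message['Body'])
+            thumbnail = create_thumbnail(job['photo_id'])  # 假设的缩略图生成函数
+            store_thumbnail(job['photo_id'], thumbnail)    # 假设:存入对象存储
+            update_database(job['photo_id'])               # 假设:更新数据库
+            # 处理成功后从队列中删除消息
+            sqs.delete_message(QueueUrl=QUEUE_URL,
+                               ReceiptHandle=message['ReceiptHandle'])
+```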
+
+**折中方案, 可选方案, 和其他细节:**
+
+* 查阅以上链接获得更多细节
+
+## 额外的话题
+
+> 根据问题的范围和剩余时间,还需要深入讨论其他问题。
+
+### SQL 扩展模式
+
+* [读取副本](https://github.com/donnemartin/system-design-primer#master-slave-replication)
+* [联合](https://github.com/donnemartin/system-design-primer#federation)
+* [分片](https://github.com/donnemartin/system-design-primer#sharding)
+* [反规范化](https://github.com/donnemartin/system-design-primer#denormalization)
+* [SQL 调优](https://github.com/donnemartin/system-design-primer#sql-tuning)
+
+#### NoSQL
+
+* [键值存储](https://github.com/donnemartin/system-design-primer#key-value-store)
+* [文档存储](https://github.com/donnemartin/system-design-primer#document-store)
+* [宽表存储](https://github.com/donnemartin/system-design-primer#wide-column-store)
+* [图数据库](https://github.com/donnemartin/system-design-primer#graph-database)
+* [SQL vs NoSQL](https://github.com/donnemartin/system-design-primer#sql-or-nosql)
+
+### 缓存
+
+* 缓存到哪里
+ * [客户端缓存](https://github.com/donnemartin/system-design-primer#client-caching)
+ * [CDN 缓存](https://github.com/donnemartin/system-design-primer#cdn-caching)
+ * [Web 服务缓存](https://github.com/donnemartin/system-design-primer#web-server-caching)
+ * [数据库缓存](https://github.com/donnemartin/system-design-primer#database-caching)
+ * [应用缓存](https://github.com/donnemartin/system-design-primer#application-caching)
+* 缓存什么
+ * [数据库请求层缓存](https://github.com/donnemartin/system-design-primer#caching-at-the-database-query-level)
+ * [对象层缓存](https://github.com/donnemartin/system-design-primer#caching-at-the-object-level)
+* 何时更新缓存
+ * [预留缓存](https://github.com/donnemartin/system-design-primer#cache-aside)
+ * [完全写入](https://github.com/donnemartin/system-design-primer#write-through)
+ * [延迟写 (写回)](https://github.com/donnemartin/system-design-primer#write-behind-write-back)
+ * [事先更新](https://github.com/donnemartin/system-design-primer#refresh-ahead)
+
+### 异步性和微服务
+
+* [消息队列](https://github.com/donnemartin/system-design-primer#message-queues)
+* [任务队列](https://github.com/donnemartin/system-design-primer#task-queues)
+* [背压](https://github.com/donnemartin/system-design-primer#back-pressure)
+* [微服务](https://github.com/donnemartin/system-design-primer#microservices)
+
+### 沟通
+
+* 关于折中方案的讨论:
+ * 客户端的外部通讯 - [遵循 REST 的 HTTP APIs](https://github.com/donnemartin/system-design-primer#representational-state-transfer-rest)
+ * 内部通讯 - [RPC](https://github.com/donnemartin/system-design-primer#remote-procedure-call-rpc)
+* [服务发现](https://github.com/donnemartin/system-design-primer#service-discovery)
+
+### 安全性
+
+参考 [安全章节](https://github.com/donnemartin/system-design-primer#security)
+
+### 延迟数字指标
+
+查阅 [每个程序员必懂的延迟数字](https://github.com/donnemartin/system-design-primer#latency-numbers-every-programmer-should-know)
+
+### 正在进行
+
+* 继续基准测试并监控你的系统以解决出现的瓶颈问题
+* 扩展是一个迭代的过程
diff --git a/solutions/system_design/social_graph/README-zh-Hans.md b/solutions/system_design/social_graph/README-zh-Hans.md
new file mode 100644
index 00000000..07b8e3e7
--- /dev/null
+++ b/solutions/system_design/social_graph/README-zh-Hans.md
@@ -0,0 +1,348 @@
+# 为社交网络设计数据结构
+
+**注释:为了避免重复,这篇文章的链接直接关联到 [系统设计主题](https://github.com/donnemartin/system-design-primer#index-of-system-design-topics) 的相关章节,可参考其中的讨论要点、折中方案和可选方案。**
+
+## 第 1 步:用例和约束概要
+
+> 收集需求并调查问题。
+> 通过提问明确用例和约束。
+> 讨论假设。
+
+如果没有面试官提出明确的问题,我们将自己定义一些用例和约束条件。
+
+### 用例
+
+#### 我们将把问题限定在仅处理以下用例的范围中
+
+* **用户** 寻找某人并显示与被寻人之间的最短路径
+* **服务** 高可用
+
+### 约束和假设
+
+#### 提出假设
+
+* 流量分布不均
+ * 某些搜索比别的更热门,同时某些搜索仅执行一次
+* 图数据无法存放于单一机器上
+* 图的边没有权重
+* 1 亿用户
+* 每个用户平均有 50 个朋友
+* 每月 10 亿次朋友搜索
+
+练习使用更传统的系统:不要使用图专用的解决方案,例如 [GraphQL](http://graphql.org/),或图数据库,如 [Neo4j](https://neo4j.com/)。
+
+#### 计算用量
+
+**如果你需要进行粗略的用量计算,请向你的面试官说明。**
+
+* 50 亿朋友关系
+ * 1 亿用户 * 平均每人 50 个朋友
+* 每秒 400 次搜索请求
+
+便捷的转换指南:
+
+* 每月 250 万秒
+* 每秒 1 个请求 = 每月 250 万次请求
+* 每秒 40 个请求 = 每月 1 亿次请求
+* 每秒 400 个请求 = 每月 10 亿次请求
+
+## 第 2 步:创建高级设计方案
+
+> 列出所有重要组件以规划概要设计。
+
+![Imgur](http://i.imgur.com/wxXyq2J.png)
+
+## 第 3 步:设计核心组件
+
+> 深入每个核心组件的细节。
+
+### 用例: 用户搜索某人并查看到被搜人的最短路径
+
+**和你的面试官说清你期望的代码量**
+
+如果没有数百万用户(顶点)和数十亿朋友关系(边)的限制,我们可以用一般的 BFS 方法解决这个无权重的最短路径问题:
+
+```python
+# 假设已有基础的 Graph、Node 与 State(访问状态)实现,这里只补上 deque 的导入
+from collections import deque
+
+
+class Graph(Graph):
+
+ def shortest_path(self, source, dest):
+ if source is None or dest is None:
+ return None
+ if source is dest:
+ return [source.key]
+ prev_node_keys = self._shortest_path(source, dest)
+ if prev_node_keys is None:
+ return None
+ else:
+ path_ids = [dest.key]
+ prev_node_key = prev_node_keys[dest.key]
+ while prev_node_key is not None:
+ path_ids.append(prev_node_key)
+ prev_node_key = prev_node_keys[prev_node_key]
+ return path_ids[::-1]
+
+ def _shortest_path(self, source, dest):
+ queue = deque()
+ queue.append(source)
+ prev_node_keys = {source.key: None}
+ source.visit_state = State.visited
+ while queue:
+ node = queue.popleft()
+ if node is dest:
+ return prev_node_keys
+ prev_node = node
+ for adj_node in node.adj_nodes.values():
+ if adj_node.visit_state == State.unvisited:
+ queue.append(adj_node)
+ prev_node_keys[adj_node.key] = prev_node.key
+ adj_node.visit_state = State.visited
+ return None
+```
+
+我们无法在同一台机器上容纳所有用户,我们需要把用户 [拆分](https://github.com/donnemartin/system-design-primer#sharding) 到多台 **人员服务器** 上,并且通过 **查询服务** 访问它们。
+
+* **客户端** 向 **Web 服务器** 发送请求,**Web 服务器** 作为 [反向代理](https://github.com/donnemartin/system-design-primer#reverse-proxy-web-server) 运行
+* **搜索 API** 服务器向 **用户图服务** 转发请求
+* **用户图服务** 有以下功能:
+ * 使用 **查询服务** 找到当前用户信息存储的 **人员服务器**
+ * 找到适当的 **人员服务器** 检索当前用户的 `friend_ids` 列表
+    * 以当前用户作为 `source`、当前用户的 `friend_ids` 作为每个 `adjacent_node` 的 id,运行 BFS 搜索
+ * 给定 id 获取 `adjacent_node`:
+ * **用户图服务** 将 **再次** 和 **查询服务** 通讯,最后判断出和给定 id 相匹配的存储 `adjacent_node` 的 **人员服务器**(有待优化)
+
+**和你的面试官说清你应该写的代码量**
+
+**注释**:为简洁起见,下面的代码省略了错误处理。请询问面试官是否需要编写适当的错误处理。
+
+**查询服务** 实现:
+
+```python
+class LookupService(object):
+
+ def __init__(self):
+ self.lookup = self._init_lookup() # key: person_id, value: person_server
+
+ def _init_lookup(self):
+ ...
+
+ def lookup_person_server(self, person_id):
+ return self.lookup[person_id]
+```
+
+**人员服务器** 实现:
+
+```python
+class PersonServer(object):
+
+ def __init__(self):
+        self.people_map = {}  # key: person_id, value: person(避免与下面的 people 方法重名)
+
+ def add_person(self, person):
+ ...
+
+ def people(self, ids):
+ results = []
+ for id in ids:
+            if id in self.people_map:
+                results.append(self.people_map[id])
+ return results
+```
+
+**用户** 实现:
+
+```python
+class Person(object):
+
+ def __init__(self, id, name, friend_ids):
+ self.id = id
+ self.name = name
+ self.friend_ids = friend_ids
+```
+
+**用户图服务** 实现:
+
+```python
+class UserGraphService(object):
+
+ def __init__(self, lookup_service):
+ self.lookup_service = lookup_service
+
+    def person(self, person_id):
+        person_server = self.lookup_service.lookup_person_server(person_id)
+        # people() 返回列表,这里只查询单个 id,取第一个结果
+        return person_server.people([person_id])[0]
+
+ def shortest_path(self, source_key, dest_key):
+ if source_key is None or dest_key is None:
+ return None
+        if source_key == dest_key:
+ return [source_key]
+ prev_node_keys = self._shortest_path(source_key, dest_key)
+ if prev_node_keys is None:
+ return None
+ else:
+ # Iterate through the path_ids backwards, starting at dest_key
+ path_ids = [dest_key]
+ prev_node_key = prev_node_keys[dest_key]
+ while prev_node_key is not None:
+ path_ids.append(prev_node_key)
+ prev_node_key = prev_node_keys[prev_node_key]
+ # Reverse the list since we iterated backwards
+ return path_ids[::-1]
+
+    def _shortest_path(self, source_key, dest_key):
+ # Use the id to get the Person
+ source = self.person(source_key)
+ # Update our bfs queue
+ queue = deque()
+ queue.append(source)
+ # prev_node_keys keeps track of each hop from
+ # the source_key to the dest_key
+ prev_node_keys = {source_key: None}
+ # We'll use visited_ids to keep track of which nodes we've
+ # visited, which can be different from a typical bfs where
+ # this can be stored in the node itself
+ visited_ids = set()
+ visited_ids.add(source.id)
+ while queue:
+ node = queue.popleft()
+            if node.id == dest_key:
+ return prev_node_keys
+ prev_node = node
+ for friend_id in node.friend_ids:
+ if friend_id not in visited_ids:
+ friend_node = self.person(friend_id)
+ queue.append(friend_node)
+                    prev_node_keys[friend_id] = prev_node.id
+ visited_ids.add(friend_id)
+ return None
+```
+
+我们用的是公共的 [**REST API**](https://github.com/donnemartin/system-design-primer#representational-state-transfer-rest):
+
+```
+$ curl https://social.com/api/v1/friend_search?person_id=1234
+```
+
+响应:
+
+```
+{
+ "person_id": "100",
+ "name": "foo",
+ "link": "https://social.com/foo",
+},
+{
+ "person_id": "53",
+ "name": "bar",
+ "link": "https://social.com/bar",
+},
+{
+ "person_id": "1234",
+ "name": "baz",
+ "link": "https://social.com/baz",
+},
+```
+
+内部通信使用 [远端过程调用](https://github.com/donnemartin/system-design-primer#remote-procedure-call-rpc)。
+
+## 第 4 步:扩展设计
+
+> 在给定约束条件下,定义和确认瓶颈。
+
+![Imgur](http://i.imgur.com/cdCv5g7.png)
+
+**重要:不要直接从最初设计跳到最终设计!**
+
+你要做的是:1) **基准/负载测试**,2) **分析** 瓶颈,3) 在评估替代方案与折中方案的同时解决瓶颈,4) 重复以上步骤。可以参考 [在 AWS 上设计支持百万级到千万级用户的系统](../scaling_aws/README.md) 来学习如何迭代地扩展最初设计。
+
+讨论最初设计可能遇到的瓶颈和处理方法十分重要。例如,添加一台带有多台 **Web 服务器** 的 **负载均衡器** 能解决什么问题?**CDN** 呢?**主从副本** 呢?每个方案都有哪些替代和 **折中** 方案?
+
+我们即将介绍一些组件来完成设计和解决扩展性问题。为了简化图示,未画出内部负载均衡器。
+
+**避免重复讨论**,以下网址链接到 [系统设计主题](https://github.com/donnemartin/system-design-primer#index-of-system-design-topics) 相关的主流方案、折中方案和替代方案。
+
+* [DNS](https://github.com/donnemartin/system-design-primer#domain-name-system)
+* [负载均衡](https://github.com/donnemartin/system-design-primer#load-balancer)
+* [横向扩展](https://github.com/donnemartin/system-design-primer#horizontal-scaling)
+* [Web 服务器(反向代理)](https://github.com/donnemartin/system-design-primer#reverse-proxy-web-server)
+* [API 服务器(应用层)](https://github.com/donnemartin/system-design-primer#application-layer)
+* [缓存](https://github.com/donnemartin/system-design-primer#cache)
+* [一致性模式](https://github.com/donnemartin/system-design-primer#consistency-patterns)
+* [可用性模式](https://github.com/donnemartin/system-design-primer#availability-patterns)
+
+为了解决 **平均** 每秒 400 次搜索请求(峰值会更高)的约束,可以把人员数据放在像 Redis 或 Memcached 这样的 **内存缓存** 中,以减少响应时间和下游服务的流量。这在用户执行多次连续查询、以及查询人脉很广的用户时尤其有用。从内存中连续读取 1 MB 数据大约要 250 微秒,从 SSD 中读取要长 4 倍,从硬盘读取要长 80 倍。1
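+
+下面是一个把人员查找结果放入 Redis 之类 **内存缓存** 的示意代码,包装前文的 `UserGraphService.person`。键名、序列化方式与过期时间均为假设,仅作说明:
+
+```python
+# 示意代码:用 Redis 缓存人员查找结果,减少对人员服务器的下游请求(键名与过期时间为假设值)
+import pickle
+
+import redis
+
+
+class CachedUserGraphService(UserGraphService):
+
+    def __init__(self, lookup_service, cache_host='cache.example.com'):
+        super(CachedUserGraphService, self).__init__(lookup_service)
+        self.cache = redis.Redis(host=cache_host, port=6379)
+
+    def person(self, person_id):
+        key = 'person:{0}'.format(person_id)
+        cached = self.cache.get(key)
+        if cached is not None:
+            return pickle.loads(cached)  # 缓存命中
+        person = super(CachedUserGraphService, self).person(person_id)
+        # 写入缓存并设置过期时间,避免长期保留过期的朋友关系
+        self.cache.set(key, pickle.dumps(person), ex=600)
+        return person
+```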
+
+以下是进一步优化方案:
+
+* 在 **内存缓存** 中存储完整的或部分的 BFS 遍历结果,以加快后续查找
+* 离线批量计算完整的或部分的 BFS 遍历结果并存入 **NoSQL 数据库**,以加快后续查找
+* 把同一批朋友查找集中在同一台 **人员服务器** 上批量处理,以减少机器间的跳转
+ * 通过地理位置 [拆分](https://github.com/donnemartin/system-design-primer#sharding) **人员服务器** 来进一步优化,因为朋友通常住得都比较近
+* 同时进行两个 BFS 查找,一个从 source 开始,一个从 destination 开始,然后合并两个路径
+* 从有庞大朋友圈的人开始找起,这样更有可能减小当前用户和搜索目标之间的 [离散度数](https://en.wikipedia.org/wiki/Six_degrees_of_separation)
+* 设置基于时间或跳数的阈值,当某些搜索耗时过长时,先询问用户是否愿意继续等待查询
+* 使用类似 [Neo4j](https://neo4j.com/) 的 **图数据库** 或图特定查询语法,例如 [GraphQL](http://graphql.org/)(如果没有禁止使用 **图数据库** 的限制的话)
+
+## 额外的话题
+
+> 根据问题的范围和剩余时间,还需要深入讨论其他问题。
+
+### SQL 扩展模式
+
+* [读取副本](https://github.com/donnemartin/system-design-primer#master-slave-replication)
+* [联合](https://github.com/donnemartin/system-design-primer#federation)
+* [分片](https://github.com/donnemartin/system-design-primer#sharding)
+* [反规范化](https://github.com/donnemartin/system-design-primer#denormalization)
+* [SQL 调优](https://github.com/donnemartin/system-design-primer#sql-tuning)
+
+#### NoSQL
+
+* [键值存储](https://github.com/donnemartin/system-design-primer#key-value-store)
+* [文档存储](https://github.com/donnemartin/system-design-primer#document-store)
+* [宽表存储](https://github.com/donnemartin/system-design-primer#wide-column-store)
+* [图数据库](https://github.com/donnemartin/system-design-primer#graph-database)
+* [SQL vs NoSQL](https://github.com/donnemartin/system-design-primer#sql-or-nosql)
+
+### 缓存
+
+* 缓存到哪里
+ * [客户端缓存](https://github.com/donnemartin/system-design-primer#client-caching)
+ * [CDN 缓存](https://github.com/donnemartin/system-design-primer#cdn-caching)
+ * [Web 服务缓存](https://github.com/donnemartin/system-design-primer#web-server-caching)
+ * [数据库缓存](https://github.com/donnemartin/system-design-primer#database-caching)
+ * [应用缓存](https://github.com/donnemartin/system-design-primer#application-caching)
+* 缓存什么
+ * [数据库请求层缓存](https://github.com/donnemartin/system-design-primer#caching-at-the-database-query-level)
+ * [对象层缓存](https://github.com/donnemartin/system-design-primer#caching-at-the-object-level)
+* 何时更新缓存
+ * [预留缓存](https://github.com/donnemartin/system-design-primer#cache-aside)
+ * [完全写入](https://github.com/donnemartin/system-design-primer#write-through)
+ * [延迟写 (写回)](https://github.com/donnemartin/system-design-primer#write-behind-write-back)
+ * [事先更新](https://github.com/donnemartin/system-design-primer#refresh-ahead)
+
+### 异步性和微服务
+
+* [消息队列](https://github.com/donnemartin/system-design-primer#message-queues)
+* [任务队列](https://github.com/donnemartin/system-design-primer#task-queues)
+* [背压](https://github.com/donnemartin/system-design-primer#back-pressure)
+* [微服务](https://github.com/donnemartin/system-design-primer#microservices)
+
+### 沟通
+
+* 关于折中方案的讨论:
+ * 客户端的外部通讯 - [遵循 REST 的 HTTP APIs](https://github.com/donnemartin/system-design-primer#representational-state-transfer-rest)
+ * 内部通讯 - [RPC](https://github.com/donnemartin/system-design-primer#remote-procedure-call-rpc)
+* [服务发现](https://github.com/donnemartin/system-design-primer#service-discovery)
+
+### 安全性
+
+参考 [安全章节](https://github.com/donnemartin/system-design-primer#security)
+
+### 延迟数字指标
+
+查阅 [每个程序员必懂的延迟数字](https://github.com/donnemartin/system-design-primer#latency-numbers-every-programmer-should-know)
+
+### 正在进行
+
+* 继续基准测试并监控你的系统以解决出现的瓶颈问题
+* 扩展是一个迭代的过程
diff --git a/solutions/system_design/twitter/README-zh-Hans.md b/solutions/system_design/twitter/README-zh-Hans.md
new file mode 100644
index 00000000..1853444d
--- /dev/null
+++ b/solutions/system_design/twitter/README-zh-Hans.md
@@ -0,0 +1,331 @@
+# 设计推特时间轴与搜索功能
+
+**注意:这个文档中的链接会直接指向[系统设计主题索引](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#系统设计主题的索引)中的有关部分,以避免重复的内容。你可以参考链接的相关内容,来了解其总的要点、方案的权衡取舍以及可选的替代方案。**
+
+**设计 Facebook 的 feed** 与**设计 Facebook 搜索**与此为同一类型问题。
+
+## 第一步:简述用例与约束条件
+
+> 搜集需求与问题的范围。
+> 提出问题来明确用例与约束条件。
+> 讨论假设。
+
+我们将在没有面试官明确说明问题的情况下,自己定义一些用例以及限制条件。
+
+### 用例
+
+#### 我们将把问题限定在仅处理以下用例的范围中
+
+* **用户**发布了一篇推特
+ * **服务**将推特推送给关注者,给他们发送消息通知与邮件
+* **用户**浏览用户时间轴(用户最近的活动)
+* **用户**浏览主页时间轴(用户关注的人最近的活动)
+* **用户**搜索关键词
+* **服务**需要有高可用性
+
+#### 不在用例范围内的有
+
+* **服务**向 Firehose 与其它流数据接口推送推特
+* **服务**根据用户的“是否可见”选项排除推特
+    * 隐藏未关注者的 @回复
+    * 关心“隐藏转发”设置
+* 数据分析
+
+### 限制条件与假设
+
+#### 提出假设
+
+普遍情况
+
+* 网络流量不是均匀分布的
+* 发布推特的速度需要足够快速
+ * 除非有上百万的关注者,否则将推特推送给粉丝的速度要足够快
+* 1 亿个活跃用户
+* 每天新发布 5 亿条推特,每月新发布 150 亿条推特
+    * 平均每条推特需要推送给 10 个人
+ * 每天需要进行 50 亿次推送
+ * 每月需要进行 1500 亿次推送
+* 每月需要处理 2500 亿次读取请求
+* 每月需要处理 100 亿次搜索
+
+时间轴功能
+
+* 浏览时间轴需要足够快
+* 推特的读取负载要大于写入负载
+ * 需要为推特的快速读取进行优化
+* 存入推特是高写入负载功能
+
+搜索功能
+
+* 搜索速度需要足够快
+* 搜索是高负载读取功能
+
+#### 计算用量
+
+**如果你需要进行粗略的用量计算,请向你的面试官说明。**
+
+* 每条推特的大小:
+ * `tweet_id` - 8 字节
+ * `user_id` - 32 字节
+ * `text` - 140 字节
+ * `media` - 平均 10 KB
+ * 总计: 大约 10 KB
+* 每月产生新推特的内容为 150 TB
+ * 每条推特 10 KB * 每天 5 亿条推特 * 每月 30 天
+ * 3 年产生新推特的内容为 5.4 PB
+* 每秒需要处理 10 万次读取请求
+ * 每个月需要处理 2500 亿次请求 * (每秒 400 次请求 / 每月 10 亿次请求)
+* 每秒发布 6000 条推特
+    * 每月发布 150 亿条推特 * (每秒 400 次请求 / 每月 10 亿次请求)
+* 每秒推送 6 万条推特
+ * 每月推送 1500 亿条推特 * (每秒 400 次请求 / 每月 10 亿次请求)
+* 每秒 4000 次搜索请求
+
+便利换算指南:
+
+* 每个月有 250 万秒
+* 每秒一个请求 = 每个月 250 万次请求
+* 每秒 40 个请求 = 每个月 1 亿次请求
+* 每秒 400 个请求 = 每个月 10 亿次请求
+
+## 第二步:概要设计
+
+> 列出所有重要组件以规划概要设计。
+
+![Imgur](http://i.imgur.com/48tEA2j.png)
+
+## 第三步:设计核心组件
+
+> 深入每个核心组件的细节。
+
+### 用例:用户发表了一篇推特
+
+我们可以将用户自己发表的推特存储在[关系数据库](https://github.com/donnemartin/system-design-primer#relational-database-management-system-rdbms)中。我们也可以讨论一下[究竟是用 SQL 还是用 NoSQL](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#sql-还是-nosql)。
+
+构建用户主页时间轴(查看关注用户的活动)以及推送推特是件麻烦事。将推特传播给所有关注者(每秒约递送 6 万条推特)这一操作有可能会使传统的[关系数据库](https://github.com/donnemartin/system-design-primer#relational-database-management-system-rdbms)超负载。因此,我们可以使用 **NoSQL 数据库**或**内存缓存**之类的更快的数据存储方式。从内存读取 1 MB 连续数据大约要花 250 微秒,而从 SSD 读取同样大小的数据要花费 4 倍的时间,从机械硬盘读取需要花费 80 倍以上的时间。1
+
+我们可以将照片、视频之类的媒体存储于**对象存储**中。
+
+* **客户端**向应用[反向代理](https://github.com/donnemartin/system-design-primer#reverse-proxy-web-server)的**Web 服务器**发送一条推特
+* **Web 服务器**将请求转发给**写 API**服务器
+* **写 API**服务器将推特使用 **SQL 数据库**存储于用户时间轴中
+* **写 API**调用**消息输出服务**,进行以下操作:
+    * 查询**用户图服务**找到存储于**内存缓存**中的此用户的粉丝
+ * 将推特存储于**内存缓存**中的**此用户的粉丝的主页时间轴**中
+ * O(n) 复杂度操作: 1000 名粉丝 = 1000 次查找与插入
+    * 将推特存储在**搜索索引服务**中,以加快搜索
+ * 将媒体存储于**对象存储**中
+ * 使用**通知服务**向粉丝发送推送:
+ * 使用**队列**异步推送通知
+
+**向你的面试官告知你准备写多少代码**。
+
+如果我们用 Redis 作为**内存缓存**,那可以用 Redis 原生的 list 作为其数据结构。结构如下:
+
+```
+ tweet n+2 tweet n+1 tweet n
+| 8 bytes 8 bytes 1 byte | 8 bytes 8 bytes 1 byte | 8 bytes 8 bytes 1 byte |
+| tweet_id user_id meta | tweet_id user_id meta | tweet_id user_id meta |
+```
+
+新发布的推特将被存储在对应用户(关注且活跃的用户)的主页时间轴的**内存缓存**中。
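+
+下面是 **消息输出服务** 把新推特扇出到粉丝主页时间轴的示意代码,使用 Redis list 的 `lpush` 与 `ltrim`。这里简化为只存 `tweet_id`,`get_follower_ids` 是假设的用户图服务接口,时间轴长度上限也只是示例值:
+
+```python
+# 示意代码:将新推特扇出(fanout)到每个粉丝的主页时间轴(get_follower_ids 为假设接口)
+import redis
+
+cache = redis.Redis(host='cache.example.com', port=6379)
+HOME_TIMELINE_MAX = 800  # 每个主页时间轴只保留最近数百条推特(示例值)
+
+
+def fanout_tweet(user_id, tweet_id):
+    for follower_id in get_follower_ids(user_id):  # 假设由用户图服务提供
+        key = 'home_timeline:{0}'.format(follower_id)
+        # O(n) 操作:n 为粉丝数,每个粉丝一次插入
+        cache.lpush(key, tweet_id)
+        # 截断列表,只保留最近若干条,控制内存占用
+        cache.ltrim(key, 0, HOME_TIMELINE_MAX - 1)
+```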
+
+我们可以调用一个公共的 [REST API](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#表述性状态转移rest):
+
+```
+$ curl -X POST --data '{ "user_id": "123", "auth_token": "ABC123", \
+ "status": "hello world!", "media_ids": "ABC987" }' \
+ https://twitter.com/api/v1/tweet
+```
+
+返回:
+
+```
+{
+ "created_at": "Wed Sep 05 00:37:15 +0000 2012",
+ "status": "hello world!",
+ "tweet_id": "987",
+ "user_id": "123",
+ ...
+}
+```
+
+而对于服务器内部的通信,我们可以使用 [RPC](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#远程过程调用协议rpc)。
+
+### 用例:用户浏览主页时间轴
+
+* **客户端**向 **Web 服务器**发起一次读取主页时间轴的请求
+* **Web 服务器**将请求转发给**读取 API**服务器
+* **读取 API**服务器调用**时间轴服务**进行以下操作:
+ * 从**内存缓存**读取时间轴数据,其中包括推特 id 与用户 id - O(1)
+ * 通过 [multiget](http://redis.io/commands/mget) 向**推特信息服务**进行查询,以获取相关 id 推特的额外信息 - O(n)
+    * 通过 multiget 向**用户信息服务**进行查询,以获取相关 id 用户的额外信息 - O(n)(读取主页时间轴的示意代码见下)
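+
+下面是 **时间轴服务** 读取主页时间轴的示意代码:先用 `lrange` 从 Redis 取出推特 id,再通过 multiget 批量取回推特详情。键名与 `tweet_info_service` 接口均为假设:
+
+```python
+# 示意代码:读取主页时间轴(键名与 tweet_info_service 均为假设)
+import redis
+
+cache = redis.Redis(host='cache.example.com', port=6379)
+
+
+def get_home_timeline(user_id, count=40):
+    key = 'home_timeline:{0}'.format(user_id)
+    # O(1):取出最近的推特 id 列表
+    tweet_ids = cache.lrange(key, 0, count - 1)
+    # O(n):通过 multiget 批量获取推特详情,避免逐条请求
+    return tweet_info_service.multiget(tweet_ids)
+```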
+
+REST API:
+
+```
+$ curl https://twitter.com/api/v1/home_timeline?user_id=123
+```
+
+返回:
+
+```
+{
+ "user_id": "456",
+ "tweet_id": "123",
+ "status": "foo"
+},
+{
+ "user_id": "789",
+ "tweet_id": "456",
+ "status": "bar"
+},
+{
+ "user_id": "789",
+ "tweet_id": "579",
+ "status": "baz"
+},
+```
+
+### 用例:用户浏览用户时间轴
+
+* **客户端**向**Web 服务器**发起获得用户时间线的请求
+* **Web 服务器**将请求转发给**读取 API**服务器
+* **读取 API**从 **SQL 数据库**中取出用户的时间轴
+
+REST API 与前面的主页时间轴类似,区别只在于取出的推特是由用户自己发送而不是关注人发送。
+
+### 用例:用户搜索关键词
+
+* **客户端**将搜索请求发给**Web 服务器**
+* **Web 服务器**将请求转发给**搜索 API**服务器
+* **搜索 API**调用**搜索服务**进行以下操作:
+    * 对输入进行转换与分词,确定需要搜索的内容(本节末尾附有一个查询归一化的示意)
+ * 移除标点等额外内容
+ * 将文本打散为词组
+ * 修正拼写错误
+ * 规范字母大小写
+ * 将查询转换为布尔操作
+    * 查询**搜索集群**(例如 [Lucene](https://lucene.apache.org/))检索结果:
+        * 对集群内的所有服务器进行[分散聚集(Scatter gather)](https://github.com/donnemartin/system-design-primer#under-development)式查询,判断各自是否有匹配查询的结果
+ * 合并取到的条目,进行评分与排序,最终返回结果
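+
+下面是 **搜索服务** 对查询做转换与分词的一个极简示意,只演示去标点、分词与大小写规范化;拼写纠正与布尔转换从略:
+
+```python
+# 示意代码:对搜索词做最基本的清洗与分词(拼写纠正、布尔转换从略)
+import re
+
+
+def normalize_query(query):
+    # 移除标点等额外内容
+    cleaned = re.sub(r'[^\w\s]', ' ', query)
+    # 规范大小写并打散为词组
+    return cleaned.lower().split()
+
+
+# 例如:normalize_query('Hello, World!') 返回 ['hello', 'world']
+```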
+
+REST API:
+
+```
+$ curl https://twitter.com/api/v1/search?query=hello+world
+```
+
+返回结果与前面的主页时间轴类似,只不过返回的是符合查询条件的推特。
+
+## 第四步:架构扩展
+
+> 根据限制条件,找到并解决瓶颈。
+
+![Imgur](http://i.imgur.com/MzExP06.png)
+
+**重要提示:不要从最初设计直接跳到最终设计中!**
+
+现在你要 1) **基准测试、负载测试**。2) **分析、描述**性能瓶颈。3) 在解决瓶颈问题的同时,评估替代方案、权衡利弊。4) 重复以上步骤。请阅读[「设计一个系统,并将其扩大到为数以百万计的 AWS 用户服务」](../scaling_aws/README.md) 来了解如何逐步扩大初始设计。
+
+讨论初始设计可能遇到的瓶颈及相关解决方案是很重要的。例如加上一个配置多台 **Web 服务器**的**负载均衡器**是否能够解决问题?**CDN**呢?**主从复制**呢?它们各自的替代方案和需要**权衡**的利弊又有什么呢?
+
+我们将会介绍一些组件来完成设计,并解决架构扩张问题。为了节省篇幅,内部负载均衡器将不做讨论。
+
+**为了避免重复讨论**,请参考[系统设计主题索引](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#系统设计主题的索引)相关部分来了解其要点、方案的权衡取舍以及可选的替代方案。
+
+* [DNS](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#域名系统)
+* [负载均衡器](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#负载均衡器)
+* [水平拓展](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#水平扩展)
+* [反向代理(web 服务器)](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#反向代理web-服务器)
+* [API 服务(应用层)](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#应用层)
+* [缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#缓存)
+* [关系型数据库管理系统 (RDBMS)](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#关系型数据库管理系统rdbms)
+* [SQL 故障主从切换](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#故障切换)
+* [主从复制](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#主从复制)
+* [一致性模式](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#一致性模式)
+* [可用性模式](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#可用性模式)
+
+**消息输出服务**有可能成为性能瓶颈。那些有着数百万关注者的用户可能发一条推特就需要好几分钟才能完成消息输出进程。这可能导致针对该推特的 @回复 出现竞争条件,可以在提供服务时根据时间对推特重新排序来降低影响。
+
+我们还可以避免从高关注量的用户输出推特。相反,我们可以通过搜索来找到高关注量用户的推特,并将搜索结果与用户的主页时间轴合并,再根据时间对其进行排序。
+
+此外,还可以通过以下内容进行优化:
+
+* 仅为每个主页时间轴在**内存缓存**中存储数百条推特
+* 仅在**内存缓存**中存储活动用户的主页时间轴
+    * 如果某个用户在过去 30 天都没有产生活动,那我们可以使用 **SQL 数据库**重新构建他的时间轴
+        * 使用**用户图服务**来查询并确定用户关注的人
+        * 从 **SQL 数据库**中取出推特,并将它们存入**内存缓存**
+* 仅在**推特信息服务**中存储一个月的推特
+* 仅在**用户信息服务**中存储活动用户的信息
+* **搜索集群**需要将推特保留在内存中,以降低延迟
+
+我们还可以考虑优化 **SQL 数据库** 来解决一些瓶颈问题。
+
+**内存缓存**能减小一些数据库的负载,靠 **SQL 读取副本**已经足够处理缓存未命中的情况。我们还可以考虑使用一些额外的 SQL 性能拓展技术。
+
+高容量的写入将淹没单个的 **SQL 写主从**模式,因此需要更多的拓展技术。
+
+* [联合](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#联合)
+* [分片](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#分片)
+* [非规范化](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#非规范化)
+* [SQL 调优](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#sql-调优)
+
+我们也可以考虑将一些数据移至 **NoSQL 数据库**。
+
+## 其它要点
+
+> 是否深入这些额外的主题,取决于你的问题范围和剩下的时间。
+
+#### NoSQL
+
+* [键-值存储](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#键-值存储)
+* [文档类型存储](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#文档类型存储)
+* [列型存储](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#列型存储)
+* [图数据库](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#图数据库)
+* [SQL vs NoSQL](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#sql-还是-nosql)
+
+### 缓存
+
+* 在哪缓存
+ * [客户端缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#客户端缓存)
+ * [CDN 缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#cdn-缓存)
+ * [Web 服务器缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#web-服务器缓存)
+ * [数据库缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#数据库缓存)
+ * [应用缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#应用缓存)
+* 什么需要缓存
+ * [数据库查询级别的缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#数据库查询级别的缓存)
+ * [对象级别的缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#对象级别的缓存)
+* 何时更新缓存
+ * [缓存模式](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#缓存模式)
+ * [直写模式](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#直写模式)
+ * [回写模式](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#回写模式)
+ * [刷新](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#刷新)
+
+### 异步与微服务
+
+* [消息队列](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#消息队列)
+* [任务队列](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#任务队列)
+* [背压](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#背压)
+* [微服务](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#微服务)
+
+### 通信
+
+* 可权衡选择的方案:
+ * 与客户端的外部通信 - [使用 REST 作为 HTTP API](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#表述性状态转移rest)
+ * 服务器内部通信 - [RPC](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#远程过程调用协议rpc)
+* [服务发现](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#服务发现)
+
+### 安全性
+
+请参阅[「安全」](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#安全)一章。
+
+### 延迟数值
+
+请参阅[「每个程序员都应该知道的延迟数」](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#每个程序员都应该知道的延迟数)。
+
+### 持续探讨
+
+* 持续进行基准测试并监控你的系统,以解决他们提出的瓶颈问题。
+* 架构拓展是一个迭代的过程。
diff --git a/solutions/system_design/web_crawler/README-zh-Hans.md b/solutions/system_design/web_crawler/README-zh-Hans.md
new file mode 100644
index 00000000..2ad0938e
--- /dev/null
+++ b/solutions/system_design/web_crawler/README-zh-Hans.md
@@ -0,0 +1,356 @@
+# 设计一个网页爬虫
+
+**注意:这个文档中的链接会直接指向[系统设计主题索引](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#系统设计主题的索引)中的有关部分,以避免重复的内容。你可以参考链接的相关内容,来了解其总的要点、方案的权衡取舍以及可选的替代方案。**
+
+## 第一步:简述用例与约束条件
+
+> 搜集需求,审视问题。提出问题来明确用例与约束条件。讨论假设。
+
+我们将在没有面试官明确说明问题的情况下,自己定义一些用例以及限制条件。
+
+### 用例
+
+#### 我们把问题限定在仅处理以下用例的范围中
+
+* **服务** 抓取一系列链接:
+ * 生成包含搜索词的网页倒排索引
+ * 生成页面的标题和摘要信息
+ * 页面标题和摘要都是静态的,它们不会根据搜索词改变
+* **用户** 输入搜索词后,可以看到相关的搜索结果列表,列表每一项都包含由网页爬虫生成的页面标题及摘要
+ * 只给该用例绘制出概要组件和交互说明,无需讨论细节
+* **服务** 具有高可用性
+
+#### 无需考虑
+
+* 搜索分析
+* 个性化搜索结果
+* 页面排名
+
+### 限制条件与假设
+
+#### 提出假设
+
+* 搜索流量分布不均
+ * 有些搜索词非常热门,有些则非常冷门
+* 只支持匿名用户
+* 用户很快就能看到搜索结果
+* 网页爬虫不应该陷入死循环
+ * 当爬虫路径包含环的时候,将会陷入死循环
+* 抓取 10 亿个链接
+ * 要定期重新抓取页面以确保新鲜度
+ * 平均每周重新抓取一次,网站越热门,那么重新抓取的频率越高
+ * 每月抓取 40 亿个链接
+ * 每个页面的平均存储大小:500 KB
+ * 简单起见,重新抓取的页面算作新页面
+* 每月搜索量 1000 亿次
+
+用更传统的系统来练习 —— 不要使用 [solr](http://lucene.apache.org/solr/) 、[nutch](http://nutch.apache.org/) 之类的现成系统。
+
+#### 计算用量
+
+**如果你需要进行粗略的用量计算,请向你的面试官说明。**
+
+* 每月存储 2 PB 页面
+ * 每月抓取 40 亿个页面,每个页面 500 KB
+ * 三年存储 72 PB 页面
+* 每秒 1600 次写请求
+* 每秒 40000 次搜索请求
+
+简便换算指南:
+
+* 一个月有 250 万秒
+* 每秒 1 个请求,即每月 250 万个请求
+* 每秒 40 个请求,即每月 1 亿个请求
+* 每秒 400 个请求,即每月 10 亿个请求
+
+## 第二步: 概要设计
+
+> 列出所有重要组件以规划概要设计。
+
+![Imgur](http://i.imgur.com/xjdAAUv.png)
+
+## 第三步:设计核心组件
+
+> 对每一个核心组件进行详细深入的分析。
+
+### 用例:爬虫服务抓取一系列网页
+
+假设我们有一个初始列表 `links_to_crawl`(待抓取链接),它最初基于网站整体的知名度来排序。当然如果这个假设不合理,我们可以使用 [Yahoo](https://www.yahoo.com/)、[DMOZ](http://www.dmoz.org/) 等知名门户网站作为种子链接来进行扩散 。
+
+我们将用表 `crawled_links`(已抓取链接)来记录已经处理过的链接以及相应的页面签名。
+
+我们可以将 `links_to_crawl` 和 `crawled_links` 记录在键-值型 **NoSQL 数据库**中。对于 `links_to_crawl` 中已排序的链接,我们可以使用 [Redis](https://redis.io/) 的有序集合来维护网页链接的排名。我们应当在 [选择 SQL 还是 NoSQL 的问题上,讨论有关使用场景以及利弊](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#sql-还是-nosql)。
+
+* **爬虫服务**按照以下流程循环处理每一个页面链接:
+ * 选取排名最靠前的待抓取链接
+ * 在 **NoSQL 数据库**的 `crawled_links` 中,检查待抓取页面的签名是否与某个已抓取页面的签名相似
+ * 若存在,则降低该页面链接的优先级
+ * 这样做可以避免陷入死循环
+ * 继续(进入下一次循环)
+ * 若不存在,则抓取该链接
+ * 在**倒排索引服务**任务队列中,新增一个生成[倒排索引](https://en.wikipedia.org/wiki/Search_engine_indexing)任务。
+ * 在**文档服务**任务队列中,新增一个生成静态标题和摘要的任务。
+ * 生成页面签名
+ * 在 **NoSQL 数据库**的 `links_to_crawl` 中删除该链接
+ * 在 **NoSQL 数据库**的 `crawled_links` 中插入该链接以及页面签名
+
+**向面试官了解你需要写多少代码**。
+
+`PagesDataStore` 是**爬虫服务**中的一个抽象类,它使用 **NoSQL 数据库**进行存储。
+
+```python
+class PagesDataStore(object):
+
+    def __init__(self, db):
+ self.db = db
+ ...
+
+ def add_link_to_crawl(self, url):
+ """将指定链接加入 `links_to_crawl`。"""
+ ...
+
+ def remove_link_to_crawl(self, url):
+ """从 `links_to_crawl` 中删除指定链接。"""
+ ...
+
+    def reduce_priority_link_to_crawl(self, url):
+ """在 `links_to_crawl` 中降低一个链接的优先级以避免死循环。"""
+ ...
+
+ def extract_max_priority_page(self):
+ """返回 `links_to_crawl` 中优先级最高的链接。"""
+ ...
+
+ def insert_crawled_link(self, url, signature):
+ """将指定链接加入 `crawled_links`。"""
+ ...
+
+ def crawled_similar(self, signature):
+ """判断待抓取页面的签名是否与某个已抓取页面的签名相似。"""
+ ...
+```
+
+`Page` 是**爬虫服务**的一个抽象类,它封装了网页对象,由页面链接、页面内容、子链接和页面签名构成。
+
+```python
+class Page(object):
+
+ def __init__(self, url, contents, child_urls, signature):
+ self.url = url
+ self.contents = contents
+ self.child_urls = child_urls
+ self.signature = signature
+```
+
+`Crawler` 是**爬虫服务**的主类,由`Page` 和 `PagesDataStore` 组成。
+
+```python
+class Crawler(object):
+
+ def __init__(self, data_store, reverse_index_queue, doc_index_queue):
+ self.data_store = data_store
+ self.reverse_index_queue = reverse_index_queue
+ self.doc_index_queue = doc_index_queue
+
+ def create_signature(self, page):
+ """基于页面链接与内容生成签名。"""
+ ...
+
+ def crawl_page(self, page):
+ for url in page.child_urls:
+ self.data_store.add_link_to_crawl(url)
+ page.signature = self.create_signature(page)
+ self.data_store.remove_link_to_crawl(page.url)
+ self.data_store.insert_crawled_link(page.url, page.signature)
+
+ def crawl(self):
+ while True:
+ page = self.data_store.extract_max_priority_page()
+ if page is None:
+ break
+ if self.data_store.crawled_similar(page.signature):
+ self.data_store.reduce_priority_link_to_crawl(page.url)
+ else:
+ self.crawl_page(page)
+```
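+
+下面是把上述组件组装起来的一个示意用法,其中 NoSQL 连接与两个任务队列对象均为假设:
+
+```python
+# 示意用法:组装爬虫服务(nosql_db、reverse_index_queue、doc_index_queue 均为假设对象)
+data_store = PagesDataStore(db=nosql_db)
+crawler = Crawler(data_store,
+                  reverse_index_queue=reverse_index_queue,
+                  doc_index_queue=doc_index_queue)
+crawler.crawl()  # 循环取出优先级最高的链接并抓取,直到没有待抓取链接为止
+```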
+
+### 处理重复内容
+
+我们要谨防网页爬虫陷入死循环,这通常会发生在爬虫路径中存在环的情况。
+
+**向面试官了解你需要写多少代码**。
+
+删除重复链接:
+
+* 假设数据量较小,我们可以用类似于 `sort | uniq` 的方法。(译注: 先排序,后去重)
+* 假设有 10 亿条数据,我们应该使用 **MapReduce** 来输出只出现 1 次的记录。
+
+```python
+from mrjob.job import MRJob
+
+
+class RemoveDuplicateUrls(MRJob):
+
+ def mapper(self, _, line):
+ yield line, 1
+
+ def reducer(self, key, values):
+ total = sum(values)
+ if total == 1:
+ yield key, total
+```
+
+比起处理重复内容,检测重复内容更为复杂。我们可以基于网页内容生成签名,然后对比两者签名的相似度。可能会用到的算法有 [Jaccard index](https://en.wikipedia.org/wiki/Jaccard_index) 以及 [cosine similarity](https://en.wikipedia.org/wiki/Cosine_similarity)。
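+
+下面是用 Jaccard 相似度判断两个页面内容是否近似重复的示意代码,这里以词集合为特征,相似度阈值只是示例值:
+
+```python
+# 示意代码:基于词集合的 Jaccard 相似度,用于近似重复检测(阈值为示例值)
+def jaccard_similarity(contents_a, contents_b):
+    words_a = set(contents_a.lower().split())
+    words_b = set(contents_b.lower().split())
+    if not words_a and not words_b:
+        return 1.0
+    # 交集大小 / 并集大小
+    return len(words_a & words_b) / len(words_a | words_b)
+
+
+def is_similar(contents_a, contents_b, threshold=0.9):
+    return jaccard_similarity(contents_a, contents_b) >= threshold
+```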
+
+### 抓取结果更新策略
+
+要定期重新抓取页面以确保新鲜度。抓取结果应该有个 `timestamp` 字段记录上一次页面抓取时间。每隔一段时间,比如说 1 周,所有页面都需要更新一次。对于热门网站或是内容频繁更新的网站,爬虫抓取间隔可以缩短。
+
+尽管我们不会深入网页数据分析的细节,我们仍然要做一些数据挖掘工作来确定一个页面的平均更新时间,并且根据相关的统计数据来决定爬虫的重新抓取频率。
+
+当然我们也应该根据站长提供的 `Robots.txt` 来控制爬虫的抓取频率。
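+
+下面是使用 Python 标准库 `urllib.robotparser` 检查抓取许可与站点建议抓取间隔的示意代码,站点地址与 User-Agent 均为假设值:
+
+```python
+# 示意代码:抓取前检查 robots.txt(robots_url 与 User-Agent 均为假设值)
+from urllib import robotparser
+
+USER_AGENT = 'ExampleCrawlerBot'
+
+
+def can_crawl(url, robots_url):
+    parser = robotparser.RobotFileParser()
+    parser.set_url(robots_url)
+    parser.read()  # 下载并解析 robots.txt
+    allowed = parser.can_fetch(USER_AGENT, url)
+    delay = parser.crawl_delay(USER_AGENT)  # 站点建议的抓取间隔,可能为 None
+    return allowed, delay
+```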
+
+### 用例:用户输入搜索词后,可以看到相关的搜索结果列表,列表每一项都包含由网页爬虫生成的页面标题及摘要
+
+* **客户端**向运行[反向代理](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#反向代理web-服务器)的 **Web 服务器**发送一个请求
+* **Web 服务器** 将请求转发给**查询 API** 服务器
+* **查询 API** 服务将会做这些事情:
+ * 解析查询参数
+ * 删除 HTML 标记
+ * 将文本分割成词组 (译注: 分词处理)
+ * 修正错别字
+ * 规范化大小写
+ * 将搜索词转换为布尔运算
+ * 使用**倒排索引服务**来查找匹配查询的文档
+ * **倒排索引服务**对匹配到的结果进行排名,然后返回最符合的结果
+ * 使用**文档服务**返回文章标题与摘要
+
+我们使用 [**REST API**](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#表述性状态转移rest) 与客户端通信:
+
+```
+$ curl https://search.com/api/v1/search?query=hello+world
+```
+
+响应内容:
+
+```
+{
+ "title": "foo's title",
+ "snippet": "foo's snippet",
+ "link": "https://foo.com",
+},
+{
+ "title": "bar's title",
+ "snippet": "bar's snippet",
+ "link": "https://bar.com",
+},
+{
+ "title": "baz's title",
+ "snippet": "baz's snippet",
+ "link": "https://baz.com",
+},
+```
+
+对于服务器内部通信,我们可以使用 [远程过程调用协议(RPC)](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#远程过程调用协议rpc)
+
+
+## 第四步:架构扩展
+
+> 根据限制条件,找到并解决瓶颈。
+
+![Imgur](http://i.imgur.com/bWxPtQA.png)
+
+**重要提示:不要直接从最初设计跳到最终设计!**
+
+现在你要 1) **基准测试、负载测试**。2) **分析、描述**性能瓶颈。3) 在解决瓶颈问题的同时,评估替代方案、权衡利弊。4) 重复以上步骤。请阅读[设计一个系统,并将其扩大到为数以百万计的 AWS 用户服务](../scaling_aws/README.md) 来了解如何逐步扩大初始设计。
+
+讨论初始设计可能遇到的瓶颈及相关解决方案是很重要的。例如加上一套配备多台 **Web 服务器**的**负载均衡器**是否能够解决问题?**CDN**呢?**主从复制**呢?它们各自的替代方案和需要**权衡**的利弊又有哪些呢?
+
+我们将会介绍一些组件来完成设计,并解决架构规模扩张问题。为了节省篇幅,内部负载均衡器将不做讨论。
+
+**为了避免重复讨论**,请参考[系统设计主题索引](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#系统设计主题的索引)相关部分来了解其要点、方案的权衡取舍以及替代方案。
+
+* [DNS](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#域名系统)
+* [负载均衡器](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#负载均衡器)
+* [水平扩展](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#水平扩展)
+* [Web 服务器(反向代理)](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#反向代理web-服务器)
+* [API 服务器(应用层)](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#应用层)
+* [缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#缓存)
+* [NoSQL](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#nosql)
+* [一致性模式](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#一致性模式)
+* [可用性模式](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#可用性模式)
+
+有些搜索词非常热门,有些则非常冷门。热门的搜索词可以通过诸如 Redis 或者 Memcached 之类的**内存缓存**来缩短响应时间,避免**倒排索引服务**以及**文档服务**过载。**内存缓存**同样适用于流量分布不均匀以及流量短时高峰问题。从内存中读取 1 MB 连续数据大约需要 250 微秒,而从 SSD 读取同样大小的数据要花费 4 倍的时间,从机械硬盘读取需要花费 80 倍以上的时间。1
+
+
+以下是优化**爬虫服务**的其他建议:
+
+* 为了处理数据大小问题以及网络请求负载,**倒排索引服务**和**文档服务**可能需要大量应用数据分片和数据复制。
+* DNS 查询可能会成为瓶颈,**爬虫服务**最好专门维护一套定期更新的 DNS 查询服务。
+* 借助于[连接池](https://en.wikipedia.org/wiki/Connection_pool),即同时维持多个开放网络连接,可以提升**爬虫服务**的性能并减少内存使用量。
+ * 改用 [UDP](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#用户数据报协议udp) 协议同样可以提升性能
+* 网络爬虫受带宽影响较大,请确保带宽足够维持高吞吐量。
+
+## 其它要点
+
+> 是否深入这些额外的主题,取决于你的问题范围和剩下的时间。
+
+### SQL 扩展模式
+
+* [读取复制](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#主从复制)
+* [联合](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#联合)
+* [分片](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#分片)
+* [非规范化](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#非规范化)
+* [SQL 调优](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#sql-调优)
+
+#### NoSQL
+
+* [键-值存储](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#键-值存储)
+* [文档类型存储](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#文档类型存储)
+* [列型存储](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#列型存储)
+* [图数据库](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#图数据库)
+* [SQL vs NoSQL](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#sql-还是-nosql)
+
+
+### 缓存
+
+* 在哪缓存
+ * [客户端缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#客户端缓存)
+ * [CDN 缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#cdn-缓存)
+ * [Web 服务器缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#web-服务器缓存)
+ * [数据库缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#数据库缓存)
+ * [应用缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#应用缓存)
+* 什么需要缓存
+ * [数据库查询级别的缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#数据库查询级别的缓存)
+ * [对象级别的缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#对象级别的缓存)
+* 何时更新缓存
+ * [缓存模式](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#缓存模式)
+ * [直写模式](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#直写模式)
+ * [回写模式](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#回写模式)
+ * [刷新](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#刷新)
+
+### 异步与微服务
+
+* [消息队列](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#消息队列)
+* [任务队列](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#任务队列)
+* [背压](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#背压)
+* [微服务](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#微服务)
+
+### 通信
+
+* 可权衡选择的方案:
+ * 与客户端的外部通信 - [使用 REST 作为 HTTP API](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#表述性状态转移rest)
+ * 内部通信 - [RPC](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#远程过程调用协议rpc)
+* [服务发现](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#服务发现)
+
+
+### 安全性
+
+请参阅[安全](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#安全)。
+
+
+### 延迟数值
+
+请参阅[每个程序员都应该知道的延迟数](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#每个程序员都应该知道的延迟数)。
+
+### 持续探讨
+
+* 持续进行基准测试并监控你的系统,以解决他们提出的瓶颈问题。
+* 架构扩展是一个迭代的过程。
From b2fffe6fd40bd22e5dbf9bd0b398fa76fd11537b Mon Sep 17 00:00:00 2001
From: userstartupideas <64709217+userstartupideas@users.noreply.github.com>
Date: Tue, 26 May 2020 02:59:01 +0100
Subject: [PATCH 43/72] Update "Scaling up to your first 10 million users" link
(#411)
---
README.md | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/README.md b/README.md
index 293b8ac5..f9563097 100644
--- a/README.md
+++ b/README.md
@@ -806,7 +806,7 @@ Systems such as [Consul](https://www.consul.io/docs/index.html), [Etcd](https://
- Source: Scaling up to your first 10 million users
+ Source: Scaling up to your first 10 million users
### Relational database management system (RDBMS)
@@ -872,7 +872,7 @@ Both masters serve reads and writes and coordinate with each other on writes. I
- Source: Scaling up to your first 10 million users
+ Source: Scaling up to your first 10 million users
Federation (or functional partitioning) splits up databases by function. For example, instead of a single, monolithic database, you could have three databases: **forums**, **users**, and **products**, resulting in less read and write traffic to each database and therefore less replication lag. Smaller databases result in more data that can fit in memory, which in turn results in more cache hits due to improved cache locality. With no single central master serializing writes you can write in parallel, increasing throughput.
@@ -886,7 +886,7 @@ Federation (or functional partitioning) splits up databases by function. For ex
##### Source(s) and further reading: federation
-* [Scaling up to your first 10 million users](https://www.youtube.com/watch?v=w95murBkYmU)
+* [Scaling up to your first 10 million users](https://www.youtube.com/watch?v=kKjm4ehYiMs)
#### Sharding
@@ -1122,7 +1122,7 @@ Sample data well-suited for NoSQL:
##### Source(s) and further reading: SQL or NoSQL
-* [Scaling up to your first 10 million users](https://www.youtube.com/watch?v=w95murBkYmU)
+* [Scaling up to your first 10 million users](https://www.youtube.com/watch?v=kKjm4ehYiMs)
* [SQL vs NoSQL differences](https://www.sitepoint.com/sql-vs-nosql-differences/)
## Cache
From 9a0248063275daddc51a99e3787760249b6a5d0a Mon Sep 17 00:00:00 2001
From: Alexander Teno <5354921+alexanderteno@users.noreply.github.com>
Date: Tue, 30 Jun 2020 20:37:49 -0400
Subject: [PATCH 44/72] Add missing comma in Mint solution (#399)
---
solutions/system_design/mint/README.md | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/solutions/system_design/mint/README.md b/solutions/system_design/mint/README.md
index 6fca1938..383e8375 100644
--- a/solutions/system_design/mint/README.md
+++ b/solutions/system_design/mint/README.md
@@ -242,7 +242,7 @@ class Budget(object):
def create_budget_template(self):
return {
'DefaultCategories.HOUSING': income * .4,
- 'DefaultCategories.FOOD': income * .2
+ 'DefaultCategories.FOOD': income * .2,
'DefaultCategories.GAS': income * .1,
'DefaultCategories.SHOPPING': income * .2
...
From f2d7dd86f0835f5bf27f1a65dec032e013c4dd49 Mon Sep 17 00:00:00 2001
From: panguncle <54489480+panguncle@users.noreply.github.com>
Date: Wed, 1 Jul 2020 08:42:24 +0800
Subject: [PATCH 45/72] Fix single point of failure typo (#398)
---
README.md | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/README.md b/README.md
index f9563097..e682e91c 100644
--- a/README.md
+++ b/README.md
@@ -665,7 +665,7 @@ Load balancers distribute incoming client requests to computing resources such a
* Preventing requests from going to unhealthy servers
* Preventing overloading resources
-* Helping eliminate single points of failure
+* Helping to eliminate a single point of failure
Load balancers can be implemented with hardware (expensive) or with software such as HAProxy.
@@ -710,7 +710,7 @@ Load balancers can also help with horizontal scaling, improving performance and
### Disadvantage(s): load balancer
* The load balancer can become a performance bottleneck if it does not have enough resources or if it is not configured properly.
-* Introducing a load balancer to help eliminate single points of failure results in increased complexity.
+* Introducing a load balancer to help eliminate a single point of failure results in increased complexity.
* A single load balancer is a single point of failure, configuring multiple load balancers further increases complexity.
### Source(s) and further reading
From 42aa63b3c201f41048aa3380549848e77f2a9fc7 Mon Sep 17 00:00:00 2001
From: shiyujiucsb <16054786+shiyujiucsb@users.noreply.github.com>
Date: Tue, 30 Jun 2020 17:56:55 -0700
Subject: [PATCH 46/72] Fix layer 7 load balancers typo (#317)
---
README.md | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/README.md b/README.md
index e682e91c..5f10e943 100644
--- a/README.md
+++ b/README.md
@@ -692,7 +692,7 @@ Layer 4 load balancers look at info at the [transport layer](#communication) to
### Layer 7 load balancing
-Layer 7 load balancers look at the [application layer](#communication) to decide how to distribute requests. This can involve contents of the header, message, and cookies. Layer 7 load balancers terminates network traffic, reads the message, makes a load-balancing decision, then opens a connection to the selected server. For example, a layer 7 load balancer can direct video traffic to servers that host videos while directing more sensitive user billing traffic to security-hardened servers.
+Layer 7 load balancers look at the [application layer](#communication) to decide how to distribute requests. This can involve contents of the header, message, and cookies. Layer 7 load balancers terminate network traffic, read the message, make a load-balancing decision, then open a connection to the selected server. For example, a layer 7 load balancer can direct video traffic to servers that host videos while directing more sensitive user billing traffic to security-hardened servers.
At the cost of flexibility, layer 4 load balancing requires less time and computing resources than Layer 7, although the performance impact can be minimal on modern commodity hardware.
From b4a7a09db7b308f2d3b79b69fbc54f3f141f7611 Mon Sep 17 00:00:00 2001
From: Donne Martin
Date: Wed, 1 Jul 2020 20:48:43 -0400
Subject: [PATCH 47/72] Update contributing guidelines for translations (#434)
---
CONTRIBUTING.md | 32 +++++++++++++++++++-------------
1 file changed, 19 insertions(+), 13 deletions(-)
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index eddc2684..69348619 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -45,27 +45,33 @@ If you are not familiar with pull requests, review the [pull request docs](https
We'd like for the guide to be available in many languages. Here is the process for maintaining translations:
* This original version and content of the guide is maintained in English.
-* Translations follow the content of the original. Unfortunately, contributors must speak at least some English, so that translations do not diverge.
-* Each translation has a maintainer to update the translation as the original evolves and to review others' changes. This doesn't require a lot of time, but review by the maintainer is important to maintain quality.
+* Translations follow the content of the original. Contributors must speak at least some English, so that translations do not diverge.
+* Each translation has a maintainer to update the translation as the original evolves and to review others' changes. This doesn't require a lot of time, but a review by the maintainer is important to maintain quality.
+
+See [Translations](TRANSLATIONS.md).
### Changes to translations
* Changes to content should be made to the English version first, and then translated to each other language.
-* Changes that improve translations should be made directly on the file for that language. PRs should only modify one language at a time.
-* Submit a PR with changes to the file in that language. Each language has a maintainer, who reviews changes in that language. Then the primary maintainer @donnemartin merges it in.
-* Prefix PRs and issues with language codes if they are for that translation only, e.g. "es: Improve grammar", so maintainers can find them easily.
+* Changes that improve translations should be made directly on the file for that language. Pull requests should only modify one language at a time.
+* Submit a pull request with changes to the file in that language. Each language has a maintainer, who reviews changes in that language. Then the primary maintainer [@donnemartin](https://github.com/donnemartin) merges it in.
+* Prefix pull requests and issues with language codes if they are for that translation only, e.g. "es: Improve grammar", so maintainers can find them easily.
+* Tag the translation maintainer for a code review, see the list of [translation maintainers](TRANSLATIONS.md).
+ * You will need to get a review from a native speaker (preferably the language maintainer) before your pull request is merged.
### Adding translations to new languages
-Translations to new languages are always welcome, especially if you can maintain the translation!
+Translations to new languages are always welcome! Keep in mind a translation must be maintained.
-* Check existing issues to see if a translation is in progress or stalled. If so, offer to help.
-* If it is not in progress, file an issue for your language so people know you are working on it and we can arrange. Confirm you are native level in the language and are willing to maintain the translation, so it's not orphaned.
-* To get it started, fork the repo, then submit a PR with the single file README-xx.md added, where xx is the language code. Use standard [IETF language tags](https://www.w3.org/International/articles/language-tags/), i.e. the same as is used by Wikipedia, *not* the code for a single country. These are usually just the two-letter lowercase code, for example, `fr` for French and `uk` for Ukrainian (not `ua`, which is for the country). For languages that have variations, use the shortest tag, such as `zh-Hant`.
-* Invite friends to review if possible. If desired, feel free to invite friends to help your original translation by letting them fork your repo, then merging their PRs.
-* Add links to your translation at the top of every README*.md file. (For consistency, the link should be added in alphabetical order by ISO code, and the anchor text should be in the native language.)
-* When done, indicate on the PR that it's ready to be merged into the main repo.
-* Once accepted, your PR will be squashed into a single commit into the `master` branch.
+* Do you have time to be a maintainer for a new language? Please see the list of [translations](TRANSLATIONS.md) and tell us so we know we can count on you in the future.
+* Check the [translations](TRANSLATIONS.md), issues, and pull requests to see if a translation is in progress or stalled. If it's in progress, offer to help. If it's stalled, consider becoming the maintainer if you can commit to it.
+* If a translation has not yet been started, file an issue for your language so people know you are working on it and we'll coordinate. Confirm you are native level in the language and are willing to maintain the translation, so it's not orphaned.
+* To get started, fork the repo, then submit a pull request to the main repo with the single file README-xx.md added, where xx is the language code. Use standard [IETF language tags](https://www.w3.org/International/articles/language-tags/), i.e. the same as is used by Wikipedia, *not* the code for a single country. These are usually just the two-letter lowercase code, for example, `fr` for French and `uk` for Ukrainian (not `ua`, which is for the country). For languages that have variations, use the shortest tag, such as `zh-Hant`.
+* Feel free to invite friends to help your original translation by having them fork your repo, then merging their pull requests to your forked repo. Translations are difficult and usually have errors that others need to find.
+* Add links to your translation at the top of every README-XX.md file. For consistency, the link should be added in alphabetical order by ISO code, and the anchor text should be in the native language.
+* When you've fully translated the English README.md, comment on the pull request in the main repo that it's ready to be merged.
+ * You'll need to have a complete and reviewed translation of the English README.md before your translation will be merged into the `master` branch.
+ * Once accepted, your pull request will be squashed into a single commit into the `master` branch.
### Translation template credits
From b199766f440ce1af6454c7a81c2b830716b71388 Mon Sep 17 00:00:00 2001
From: Donne Martin
Date: Wed, 1 Jul 2020 20:54:05 -0400
Subject: [PATCH 48/72] Add pull request template (#435)
---
.github/PULL_REQUEST_TEMPLATE.md | 11 +++++++++++
1 file changed, 11 insertions(+)
create mode 100644 .github/PULL_REQUEST_TEMPLATE.md
diff --git a/.github/PULL_REQUEST_TEMPLATE.md b/.github/PULL_REQUEST_TEMPLATE.md
new file mode 100644
index 00000000..0bd988de
--- /dev/null
+++ b/.github/PULL_REQUEST_TEMPLATE.md
@@ -0,0 +1,11 @@
+## Review the Contributing Guidelines
+
+Before submitting a pull request, verify it meets all requirements in the [Contributing Guidelines](https://github.com/donnemartin/system-design-primer/blob/master/CONTRIBUTING.md).
+
+### Translations
+
+See the [Contributing Guidelines](https://github.com/donnemartin/system-design-primer/blob/master/CONTRIBUTING.md). Verify you've:
+
+* Tagged the [language maintainer](TRANSLATIONS.md)
+* Prefixed the title with a language code
+ * Example: "ja: Fix ..."
From 661c029b574c0b419f0736a30e4e2e44adf4d1fd Mon Sep 17 00:00:00 2001
From: Donne Martin
Date: Thu, 2 Jul 2020 21:11:07 -0400
Subject: [PATCH 49/72] Add status of translations (#436)
---
TRANSLATIONS.md | 163 ++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 163 insertions(+)
create mode 100644 TRANSLATIONS.md
diff --git a/TRANSLATIONS.md b/TRANSLATIONS.md
new file mode 100644
index 00000000..5bfae9af
--- /dev/null
+++ b/TRANSLATIONS.md
@@ -0,0 +1,163 @@
+# Translations
+
+**Thank you to our awesome translation maintainers!**
+
+## Contributing
+
+See the [Contributing Guidelines](CONTRIBUTING.md).
+
+## Translation Statuses
+
+* 🎉 **Live**: Merged into `master` branch
+* ⏳ **In Progress**: Under active translation for eventual merge into `master` branch
+* ❗ **Stalled***: Needs an active maintainer ✋
+
+**Within the past 2 months, there has been 1) No active work in the translation fork, and 2) No discussions from previous maintainer(s) in the discussion thread.*
+
+Languages not listed here have not been started, [contribute](CONTRIBUTING.md)!
+
+Languages are grouped by status and are listed in alphabetical order.
+
+## Live
+
+### 🎉 Japanese
+
+* [README-ja.md](README-ja.md)
+* Maintainer(s): [@tsukukobaan](https://github.com/tsukukobaan) 👏
+* Discussion Thread: https://github.com/donnemartin/system-design-primer/issues/100
+
+### 🎉 Simplified Chinese
+
+* [zh-Hans.md](README-zh-Hans.md)
+* Maintainer(s): [@sqrthree](https://github.com/sqrthree) 👏
+* Discussion Thread: https://github.com/donnemartin/system-design-primer/issues/38
+
+### 🎉 Traditional Chinese
+
+* [README-zh-TW.md](README-zh-TW.md)
+* Maintainer(s): [@kevingo](https://github.com/kevingo) 👏
+* Discussion Thread: https://github.com/donnemartin/system-design-primer/issues/88
+
+## In Progress
+
+### ⏳ Korean
+
+* Maintainer(s): [@bonomoon](https://github.com/bonomoon), [@mingrammer](https://github.com/mingrammer) 👏
+* Discussion Thread: https://github.com/donnemartin/system-design-primer/issues/102
+* Translation Fork: https://github.com/bonomoon/system-design-primer, https://github.com/donnemartin/system-design-primer/pull/103
+
+### ⏳ Russian
+
+* Maintainer(s): [@voitau](https://github.com/voitau), [@DmitryOlkhovoi](https://github.com/DmitryOlkhovoi) 👏
+* Discussion Thread: https://github.com/donnemartin/system-design-primer/issues/87
+* Translation Fork: https://github.com/voitau/system-design-primer/blob/master/README-ru.md
+
+## Stalled
+
+**Notes**:
+
+* If you're able to commit to being an active maintainer for a language, let us know in the discussion thread for your language and update this file with a pull request.
+ * If you're listed here as a "Previous Maintainer" but can commit to being an active maintainer, also let us know.
+* See the [Contributing Guidelines](CONTRIBUTING.md).
+
+### ❗ Arabic
+
+* Maintainer(s): **Help Wanted** ✋
+ * Previous Maintainer(s): [@aymns](https://github.com/aymns)
+* Discussion Thread: https://github.com/donnemartin/system-design-primer/issues/170
+* Translation Fork: https://github.com/aymns/system-design-primer/blob/develop/README-ar.md
+
+### ❗ Bengali
+
+* Maintainer(s): **Help Wanted** ✋
+ * Previous Maintainer(s): [@nutboltu](https://github.com/nutboltu)
+* Discussion Thread: https://github.com/donnemartin/system-design-primer/issues/220
+* Translation Fork: https://github.com/donnemartin/system-design-primer/pull/240
+
+### ❗ Brazilian Portuguese
+
+* Maintainer(s): **Help Wanted** ✋
+ * Previous Maintainer(s): [@IuryAlves](https://github.com/IuryAlves)
+* Discussion Thread: https://github.com/donnemartin/system-design-primer/issues/40
+* Translation Fork: https://github.com/IuryAlves/system-design-primer, https://github.com/donnemartin/system-design-primer/pull/67
+
+### ❗ French
+
+* Maintainer(s): **Help Wanted** ✋
+ * Previous Maintainer(s): [@spuyet](https://github.com/spuyet)
+* Discussion Thread: https://github.com/donnemartin/system-design-primer/issues/250
+* Translation Fork: https://github.com/spuyet/system-design-primer/blob/add-french-translation/README-fr.md
+
+### ❗ German
+
+* Maintainer(s): **Help Wanted** ✋
+ * Previous Maintainer(s): [@Allaman](https://github.com/Allaman)
+* Discussion Thread: https://github.com/donnemartin/system-design-primer/issues/186
+* Translation Fork: None
+
+### ❗ Greek
+
+* Maintainer(s): **Help Wanted** ✋
+ * Previous Maintainer(s): [@Belonias](https://github.com/Belonias)
+* Discussion Thread: https://github.com/donnemartin/system-design-primer/issues/130
+* Translation Fork: None
+
+### ❗ Hebrew
+
+* Maintainer(s): **Help Wanted** ✋
+ * Previous Maintainer(s): [@EladLeev](https://github.com/EladLeev)
+* Discussion Thread: https://github.com/donnemartin/system-design-primer/issues/272
+* Translation Fork: https://github.com/EladLeev/system-design-primer/tree/he-translate
+
+### ❗ Italian
+
+* Maintainer(s): **Help Wanted** ✋
+ * Previous Maintainer(s): [@pgoodjohn](https://github.com/pgoodjohn)
+* Discussion Thread: https://github.com/donnemartin/system-design-primer/issues/104
+* Translation Fork: https://github.com/pgoodjohn/system-design-primer
+
+### ❗ Persian
+
+* Maintainer(s): **Help Wanted** ✋
+ * Previous Maintainer(s): [@hadisinaee](https://github.com/hadisinaee)
+* Discussion Thread: https://github.com/donnemartin/system-design-primer/pull/112
+* Translation Fork: https://github.com/donnemartin/system-design-primer/pull/112
+
+### ❗ Spanish
+
+* Maintainer(s): **Help Wanted** ✋
+ * Previous Maintainer(s): [@eamanu](https://github.com/eamanu)
+* Discussion Thread: https://github.com/donnemartin/system-design-primer/issues/136
+* Translation Fork: https://github.com/donnemartin/system-design-primer/pull/189
+
+### ❗ Thai
+
+* Maintainer(s): **Help Wanted** ✋
+ * Previous Maintainer(s): [@iphayao](https://github.com/iphayao)
+* Discussion Thread: https://github.com/donnemartin/system-design-primer/issues/187
+* Translation Fork: https://github.com/donnemartin/system-design-primer/pull/221
+
+### ❗ Turkish
+
+* Maintainer(s): **Help Wanted** ✋
+ * Previous Maintainer(s): [@hwclass](https://github.com/hwclass), [@canerbaran](https://github.com/canerbaran), [@emrahtoy](https://github.com/emrahtoy)
+* Discussion Thread: https://github.com/donnemartin/system-design-primer/issues/39
+* Translation Fork: https://github.com/donnemartin/system-design-primer/pull/239
+
+### ❗ Ukrainian
+
+* Maintainer(s): **Help Wanted** ✋
+ * Previous Maintainer(s): [@Kietzmann](https://github.com/Kietzmann), [@Acarus](https://github.com/Acarus)
+* Discussion Thread: https://github.com/donnemartin/system-design-primer/issues/248
+* Translation Fork: https://github.com/Acarus/system-design-primer
+
+### ❗ Vietnamese
+
+* Maintainer(s): **Help Wanted** ✋
+ * Previous Maintainer(s): [@tranlyvu](https://github.com/tranlyvu), [@duynguyenhoang](https://github.com/duynguyenhoang)
+* Discussion Thread: https://github.com/donnemartin/system-design-primer/issues/127
+* Translation Fork: https://github.com/donnemartin/system-design-primer/pull/241, https://github.com/donnemartin/system-design-primer/pull/327
+
+## Not Started
+
+Languages not listed here have not been started; [contribute](CONTRIBUTING.md)!
From fca96cafbb3b1e42ca263af3b430ad4b14342ebf Mon Sep 17 00:00:00 2001
From: Donne Martin
Date: Thu, 2 Jul 2020 21:18:36 -0400
Subject: [PATCH 50/72] Highlight translation request (#437)
---
README.md | 2 ++
1 file changed, 2 insertions(+)
diff --git a/README.md b/README.md
index 5f10e943..eb783781 100644
--- a/README.md
+++ b/README.md
@@ -1,5 +1,7 @@
*[English](README.md) ∙ [日本語](README-ja.md) ∙ [简体中文](README-zh-Hans.md) ∙ [繁體中文](README-zh-TW.md) | [العَرَبِيَّة](https://github.com/donnemartin/system-design-primer/issues/170) ∙ [বাংলা](https://github.com/donnemartin/system-design-primer/issues/220) ∙ [Português do Brasil](https://github.com/donnemartin/system-design-primer/issues/40) ∙ [Deutsch](https://github.com/donnemartin/system-design-primer/issues/186) ∙ [ελληνικά](https://github.com/donnemartin/system-design-primer/issues/130) ∙ [עברית](https://github.com/donnemartin/system-design-primer/issues/272) ∙ [Italiano](https://github.com/donnemartin/system-design-primer/issues/104) ∙ [韓國語](https://github.com/donnemartin/system-design-primer/issues/102) ∙ [فارسی](https://github.com/donnemartin/system-design-primer/issues/110) ∙ [Polski](https://github.com/donnemartin/system-design-primer/issues/68) ∙ [русский язык](https://github.com/donnemartin/system-design-primer/issues/87) ∙ [Español](https://github.com/donnemartin/system-design-primer/issues/136) ∙ [ภาษาไทย](https://github.com/donnemartin/system-design-primer/issues/187) ∙ [Türkçe](https://github.com/donnemartin/system-design-primer/issues/39) ∙ [tiếng Việt](https://github.com/donnemartin/system-design-primer/issues/127) ∙ [Français](https://github.com/donnemartin/system-design-primer/issues/250) | [Add Translation](https://github.com/donnemartin/system-design-primer/issues/28)*
+**Help [translate](TRANSLATIONS.md) this guide!**
+
# The System Design Primer
From d57b3d1f956a65fe5cfeeb0d8293ef006e5cff10 Mon Sep 17 00:00:00 2001
From: Neesara
Date: Sat, 4 Jul 2020 06:52:02 +0530
Subject: [PATCH 51/72] Resolve #164 - Fix phrasing with availability and
partition tolerance (#350)
---
README.md | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/README.md b/README.md
index eb783781..7222e801 100644
--- a/README.md
+++ b/README.md
@@ -458,7 +458,7 @@ Waiting for a response from the partitioned node might result in a timeout error
#### AP - availability and partition tolerance
-Responses return the most recent version of the data available on a node, which might not be the latest. Writes might take some time to propagate when the partition is resolved.
+Responses return the most readily available version of the data available on any node, which might not be the latest. Writes might take some time to propagate when the partition is resolved.
AP is a good choice if the business needs allow for [eventual consistency](#eventual-consistency) or when the system needs to continue working despite external errors.
From 60202315cc646b587185faa697ce70c8399dcd2f Mon Sep 17 00:00:00 2001
From: Sainadh Devireddy
Date: Sat, 4 Jul 2020 06:54:24 +0530
Subject: [PATCH 52/72] Check dependencies in Ebook gen script (#406)
---
generate-epub.sh | 16 +++++++++++++++-
1 file changed, 15 insertions(+), 1 deletion(-)
diff --git a/generate-epub.sh b/generate-epub.sh
index d7c21241..18690fbb 100755
--- a/generate-epub.sh
+++ b/generate-epub.sh
@@ -1,4 +1,4 @@
-#! /usr/bin/env sh
+#! /usr/bin/env bash
generate_from_stdin() {
outfile=$1
@@ -34,6 +34,20 @@ generate () {
cat $name.md | generate_from_stdin $name.epub $language
}
+# Check if dependencies exist
+check_dependencies () {
+ for dependency in "${dependencies[@]}"
+ do
+ if ! [ -x "$(command -v $dependency)" ]; then
+ echo "Error: $dependency is not installed." >&2
+ exit 1
+ fi
+ done
+}
+
+dependencies=("pandoc")
+
+check_dependencies
generate_with_solutions
generate README-ja ja
generate README-zh-Hans zh-Hans
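The `check_dependencies` function added above is the shell-side guard before any conversion runs. As a rough illustration only, the same idea expressed in Python (using the standard-library `shutil.which`, which mirrors `command -v`) might look like the sketch below; the function name and error message simply echo the script and are not part of the repository.

```python
import shutil
import sys

def check_dependencies(dependencies):
    """Exit early if any required command-line tool is missing from PATH."""
    for dependency in dependencies:
        if shutil.which(dependency) is None:  # analogous to `command -v` in the shell script
            print(f"Error: {dependency} is not installed.", file=sys.stderr)
            sys.exit(1)

check_dependencies(["pandoc"])  # generate-epub.sh only requires pandoc
```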
From 5d4dac6bafe5e60d338e588f74426935c3efd949 Mon Sep 17 00:00:00 2001
From: Kevin Liu
Date: Sat, 4 Jul 2020 10:53:56 -0400
Subject: [PATCH 53/72] Fix typo: Change replication to federation (#418)
---
solutions/system_design/web_crawler/README.md | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/solutions/system_design/web_crawler/README.md b/solutions/system_design/web_crawler/README.md
index d95dc107..355d36d1 100644
--- a/solutions/system_design/web_crawler/README.md
+++ b/solutions/system_design/web_crawler/README.md
@@ -282,7 +282,7 @@ Some searches are very popular, while others are only executed once. Popular qu
Below are a few other optimizations to the **Crawling Service**:
-* To handle the data size and request load, the **Reverse Index Service** and **Document Service** will likely need to make heavy use sharding and replication.
+* To handle the data size and request load, the **Reverse Index Service** and **Document Service** will likely need to make heavy use sharding and federation.
* DNS lookup can be a bottleneck, the **Crawler Service** can keep its own DNS lookup that is refreshed periodically
* The **Crawler Service** can improve performance and reduce memory usage by keeping many open connections at a time, referred to as [connection pooling](https://en.wikipedia.org/wiki/Connection_pool)
* Switching to [UDP](https://github.com/donnemartin/system-design-primer#user-datagram-protocol-udp) could also boost performance
From 914736a29f2b1164c390b30bd8b53479f9ae8727 Mon Sep 17 00:00:00 2001
From: Kofi Forson
Date: Sat, 4 Jul 2020 07:55:42 -0700
Subject: [PATCH 54/72] Update Twitter back-of-the-envelope calculations (#414)
---
solutions/system_design/twitter/README.md | 1 +
1 file changed, 1 insertion(+)
diff --git a/solutions/system_design/twitter/README.md b/solutions/system_design/twitter/README.md
index 374f5dd2..7df01328 100644
--- a/solutions/system_design/twitter/README.md
+++ b/solutions/system_design/twitter/README.md
@@ -80,6 +80,7 @@ Search
* 60 thousand tweets delivered on fanout per second
* 150 billion tweets delivered on fanout per month * (400 requests per second / 1 billion requests per month)
* 4,000 search requests per second
+ * 10 billion searches per month * (400 requests per second / 1 billion requests per month)
Handy conversion guide:
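The added search estimate follows from the exercise's handy conversion guide (roughly 2.5 million seconds per month, so 1 billion requests per month is about 400 requests per second). A quick sketch of the arithmetic, using only figures that appear in this patch:

```python
# ~2.5 million seconds per month => 1 billion requests per month ~= 400 requests per second
def per_second(requests_per_month):
    return requests_per_month * 400 / 1e9

print(per_second(150e9))  # ~60,000 tweets delivered on fanout per second
print(per_second(10e9))   # ~4,000 search requests per second
```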
From 2ac6512f6d9f4ef4c925b68eab14ad697e7308ab Mon Sep 17 00:00:00 2001
From: Agade09
Date: Sun, 5 Jul 2020 16:48:23 +0200
Subject: [PATCH 55/72] Fix typos in Twitter and web crawler exercises (#438)
---
solutions/system_design/twitter/README.md | 4 ++--
solutions/system_design/web_crawler/README.md | 2 +-
2 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/solutions/system_design/twitter/README.md b/solutions/system_design/twitter/README.md
index 7df01328..d14996f1 100644
--- a/solutions/system_design/twitter/README.md
+++ b/solutions/system_design/twitter/README.md
@@ -26,7 +26,7 @@ Without an interviewer to address clarifying questions, we'll define some use ca
#### Out of scope
* **Service** pushes tweets to the Twitter Firehose and other streams
-* **Service** strips out tweets based on user's visibility settings
+* **Service** strips out tweets based on users' visibility settings
* Hide @reply if the user is not also following the person being replied to
* Respect 'hide retweets' setting
* Analytics
@@ -129,7 +129,7 @@ If our **Memory Cache** is Redis, we could use a native Redis list with the foll
| tweet_id user_id meta | tweet_id user_id meta | tweet_id user_id meta |
```
-The new tweet would be placed in the **Memory Cache**, which populates user's home timeline (activity from people the user is following).
+The new tweet would be placed in the **Memory Cache**, which populates the user's home timeline (activity from people the user is following).
We'll use a public [**REST API**](https://github.com/donnemartin/system-design-primer#representational-state-transfer-rest):
diff --git a/solutions/system_design/web_crawler/README.md b/solutions/system_design/web_crawler/README.md
index 355d36d1..e6e79ad2 100644
--- a/solutions/system_design/web_crawler/README.md
+++ b/solutions/system_design/web_crawler/README.md
@@ -77,7 +77,7 @@ Handy conversion guide:
### Use case: Service crawls a list of urls
-We'll assume we have an initial list of `links_to_crawl` ranked initially based on overall site popularity. If this is not a reasonable assumption, we can seed the crawler with popular sites that link to outside content such as [Yahoo](https://www.yahoo.com/), [DMOZ](http://www.dmoz.org/), etc
+We'll assume we have an initial list of `links_to_crawl` ranked initially based on overall site popularity. If this is not a reasonable assumption, we can seed the crawler with popular sites that link to outside content such as [Yahoo](https://www.yahoo.com/), [DMOZ](http://www.dmoz.org/), etc.
We'll use a table `crawled_links` to store processed links and their page signatures.
From 793f47297021bbdc3b0505c3d97454e3a9addd30 Mon Sep 17 00:00:00 2001
From: Nachiket Acharya
Date: Sun, 5 Jul 2020 07:50:31 -0700
Subject: [PATCH 56/72] Fix #313: Clarify availability patterns (#439)
---
README.md | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/README.md b/README.md
index 7222e801..940b44ec 100644
--- a/README.md
+++ b/README.md
@@ -496,7 +496,7 @@ This approach is seen in file systems and RDBMSes. Strong consistency works wel
## Availability patterns
-There are two main patterns to support high availability: **fail-over** and **replication**.
+There are two complementary patterns to support high availability: **fail-over** and **replication**.
### Fail-over
From d3b3e78966f15ee821697bbd48abd368c9a766c8 Mon Sep 17 00:00:00 2001
From: Rahil
Date: Sun, 5 Jul 2020 20:23:28 +0530
Subject: [PATCH 57/72] Add system design template link (#433)
---
README.md | 1 +
1 file changed, 1 insertion(+)
diff --git a/README.md b/README.md
index 940b44ec..fa6c44f2 100644
--- a/README.md
+++ b/README.md
@@ -282,6 +282,7 @@ Check out the following links to get a better idea of what to expect:
* [How to ace a systems design interview](https://www.palantir.com/2011/10/how-to-rock-a-systems-design-interview/)
* [The system design interview](http://www.hiredintech.com/system-design)
* [Intro to Architecture and Systems Design Interviews](https://www.youtube.com/watch?v=ZgdS0EUmn70)
+* [System design template](https://leetcode.com/discuss/career/229177/My-System-Design-Template)
## System design interview questions with solutions
From 6d700ab9e1b669d85e0ecdaacb701007b1cd3192 Mon Sep 17 00:00:00 2001
From: Youngchul Bang
Date: Sun, 5 Jul 2020 23:57:57 +0900
Subject: [PATCH 58/72] kr: Fix Korean translation link in language index
(#340)
---
README-ja.md | 2 +-
README-zh-Hans.md | 2 +-
README-zh-TW.md | 2 +-
README.md | 2 +-
4 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/README-ja.md b/README-ja.md
index 6c5cb0cf..4257e495 100644
--- a/README-ja.md
+++ b/README-ja.md
@@ -1,4 +1,4 @@
-*[English](README.md) ∙ [日本語](README-ja.md) ∙ [简体中文](README-zh-Hans.md) ∙ [繁體中文](README-zh-TW.md) | [العَرَبِيَّة](https://github.com/donnemartin/system-design-primer/issues/170) ∙ [বাংলা](https://github.com/donnemartin/system-design-primer/issues/220) ∙ [Português do Brasil](https://github.com/donnemartin/system-design-primer/issues/40) ∙ [Deutsch](https://github.com/donnemartin/system-design-primer/issues/186) ∙ [ελληνικά](https://github.com/donnemartin/system-design-primer/issues/130) ∙ [עברית](https://github.com/donnemartin/system-design-primer/issues/272) ∙ [Italiano](https://github.com/donnemartin/system-design-primer/issues/104) ∙ [韓國語](https://github.com/donnemartin/system-design-primer/issues/102) ∙ [فارسی](https://github.com/donnemartin/system-design-primer/issues/110) ∙ [Polski](https://github.com/donnemartin/system-design-primer/issues/68) ∙ [русский язык](https://github.com/donnemartin/system-design-primer/issues/87) ∙ [Español](https://github.com/donnemartin/system-design-primer/issues/136) ∙ [ภาษาไทย](https://github.com/donnemartin/system-design-primer/issues/187) ∙ [Türkçe](https://github.com/donnemartin/system-design-primer/issues/39) ∙ [tiếng Việt](https://github.com/donnemartin/system-design-primer/issues/127) ∙ [Français](https://github.com/donnemartin/system-design-primer/issues/250) | [Add Translation](https://github.com/donnemartin/system-design-primer/issues/28)*
+*[English](README.md) ∙ [日本語](README-ja.md) ∙ [简体中文](README-zh-Hans.md) ∙ [繁體中文](README-zh-TW.md) | [العَرَبِيَّة](https://github.com/donnemartin/system-design-primer/issues/170) ∙ [বাংলা](https://github.com/donnemartin/system-design-primer/issues/220) ∙ [Português do Brasil](https://github.com/donnemartin/system-design-primer/issues/40) ∙ [Deutsch](https://github.com/donnemartin/system-design-primer/issues/186) ∙ [ελληνικά](https://github.com/donnemartin/system-design-primer/issues/130) ∙ [עברית](https://github.com/donnemartin/system-design-primer/issues/272) ∙ [Italiano](https://github.com/donnemartin/system-design-primer/issues/104) ∙ [한국어](https://github.com/donnemartin/system-design-primer/issues/102) ∙ [فارسی](https://github.com/donnemartin/system-design-primer/issues/110) ∙ [Polski](https://github.com/donnemartin/system-design-primer/issues/68) ∙ [русский язык](https://github.com/donnemartin/system-design-primer/issues/87) ∙ [Español](https://github.com/donnemartin/system-design-primer/issues/136) ∙ [ภาษาไทย](https://github.com/donnemartin/system-design-primer/issues/187) ∙ [Türkçe](https://github.com/donnemartin/system-design-primer/issues/39) ∙ [tiếng Việt](https://github.com/donnemartin/system-design-primer/issues/127) ∙ [Français](https://github.com/donnemartin/system-design-primer/issues/250) | [Add Translation](https://github.com/donnemartin/system-design-primer/issues/28)*
# システム設計入門
diff --git a/README-zh-Hans.md b/README-zh-Hans.md
index 83c6007b..1b997f3c 100644
--- a/README-zh-Hans.md
+++ b/README-zh-Hans.md
@@ -3,7 +3,7 @@
> * 译者:[XatMassacrE](https://github.com/XatMassacrE)、[L9m](https://github.com/L9m)、[Airmacho](https://github.com/Airmacho)、[xiaoyusilen](https://github.com/xiaoyusilen)、[jifaxu](https://github.com/jifaxu)、[根号三](https://github.com/sqrthree)
> * 这个 [链接](https://github.com/xitu/system-design-primer/compare/master...donnemartin:master) 用来查看本翻译与英文版是否有差别(如果你没有看到 README.md 发生变化,那就意味着这份翻译文档是最新的)。
-*[English](README.md) ∙ [日本語](README-ja.md) ∙ [简体中文](README-zh-Hans.md) ∙ [繁體中文](README-zh-TW.md) | [العَرَبِيَّة](https://github.com/donnemartin/system-design-primer/issues/170) ∙ [বাংলা](https://github.com/donnemartin/system-design-primer/issues/220) ∙ [Português do Brasil](https://github.com/donnemartin/system-design-primer/issues/40) ∙ [Deutsch](https://github.com/donnemartin/system-design-primer/issues/186) ∙ [ελληνικά](https://github.com/donnemartin/system-design-primer/issues/130) ∙ [עברית](https://github.com/donnemartin/system-design-primer/issues/272) ∙ [Italiano](https://github.com/donnemartin/system-design-primer/issues/104) ∙ [韓國語](https://github.com/donnemartin/system-design-primer/issues/102) ∙ [فارسی](https://github.com/donnemartin/system-design-primer/issues/110) ∙ [Polski](https://github.com/donnemartin/system-design-primer/issues/68) ∙ [русский язык](https://github.com/donnemartin/system-design-primer/issues/87) ∙ [Español](https://github.com/donnemartin/system-design-primer/issues/136) ∙ [ภาษาไทย](https://github.com/donnemartin/system-design-primer/issues/187) ∙ [Türkçe](https://github.com/donnemartin/system-design-primer/issues/39) ∙ [tiếng Việt](https://github.com/donnemartin/system-design-primer/issues/127) ∙ [Français](https://github.com/donnemartin/system-design-primer/issues/250) | [Add Translation](https://github.com/donnemartin/system-design-primer/issues/28)*
+*[English](README.md) ∙ [日本語](README-ja.md) ∙ [简体中文](README-zh-Hans.md) ∙ [繁體中文](README-zh-TW.md) | [العَرَبِيَّة](https://github.com/donnemartin/system-design-primer/issues/170) ∙ [বাংলা](https://github.com/donnemartin/system-design-primer/issues/220) ∙ [Português do Brasil](https://github.com/donnemartin/system-design-primer/issues/40) ∙ [Deutsch](https://github.com/donnemartin/system-design-primer/issues/186) ∙ [ελληνικά](https://github.com/donnemartin/system-design-primer/issues/130) ∙ [עברית](https://github.com/donnemartin/system-design-primer/issues/272) ∙ [Italiano](https://github.com/donnemartin/system-design-primer/issues/104) ∙ [한국어](https://github.com/donnemartin/system-design-primer/issues/102) ∙ [فارسی](https://github.com/donnemartin/system-design-primer/issues/110) ∙ [Polski](https://github.com/donnemartin/system-design-primer/issues/68) ∙ [русский язык](https://github.com/donnemartin/system-design-primer/issues/87) ∙ [Español](https://github.com/donnemartin/system-design-primer/issues/136) ∙ [ภาษาไทย](https://github.com/donnemartin/system-design-primer/issues/187) ∙ [Türkçe](https://github.com/donnemartin/system-design-primer/issues/39) ∙ [tiếng Việt](https://github.com/donnemartin/system-design-primer/issues/127) ∙ [Français](https://github.com/donnemartin/system-design-primer/issues/250) | [Add Translation](https://github.com/donnemartin/system-design-primer/issues/28)*
# 系统设计入门
diff --git a/README-zh-TW.md b/README-zh-TW.md
index c08362d3..e7996c45 100644
--- a/README-zh-TW.md
+++ b/README-zh-TW.md
@@ -1,4 +1,4 @@
-*[English](README.md) ∙ [日本語](README-ja.md) ∙ [简体中文](README-zh-Hans.md) ∙ [繁體中文](README-zh-TW.md) | [العَرَبِيَّة](https://github.com/donnemartin/system-design-primer/issues/170) ∙ [বাংলা](https://github.com/donnemartin/system-design-primer/issues/220) ∙ [Português do Brasil](https://github.com/donnemartin/system-design-primer/issues/40) ∙ [Deutsch](https://github.com/donnemartin/system-design-primer/issues/186) ∙ [ελληνικά](https://github.com/donnemartin/system-design-primer/issues/130) ∙ [עברית](https://github.com/donnemartin/system-design-primer/issues/272) ∙ [Italiano](https://github.com/donnemartin/system-design-primer/issues/104) ∙ [韓國語](https://github.com/donnemartin/system-design-primer/issues/102) ∙ [فارسی](https://github.com/donnemartin/system-design-primer/issues/110) ∙ [Polski](https://github.com/donnemartin/system-design-primer/issues/68) ∙ [русский язык](https://github.com/donnemartin/system-design-primer/issues/87) ∙ [Español](https://github.com/donnemartin/system-design-primer/issues/136) ∙ [ภาษาไทย](https://github.com/donnemartin/system-design-primer/issues/187) ∙ [Türkçe](https://github.com/donnemartin/system-design-primer/issues/39) ∙ [tiếng Việt](https://github.com/donnemartin/system-design-primer/issues/127) ∙ [Français](https://github.com/donnemartin/system-design-primer/issues/250) | [Add Translation](https://github.com/donnemartin/system-design-primer/issues/28)*
+*[English](README.md) ∙ [日本語](README-ja.md) ∙ [简体中文](README-zh-Hans.md) ∙ [繁體中文](README-zh-TW.md) | [العَرَبِيَّة](https://github.com/donnemartin/system-design-primer/issues/170) ∙ [বাংলা](https://github.com/donnemartin/system-design-primer/issues/220) ∙ [Português do Brasil](https://github.com/donnemartin/system-design-primer/issues/40) ∙ [Deutsch](https://github.com/donnemartin/system-design-primer/issues/186) ∙ [ελληνικά](https://github.com/donnemartin/system-design-primer/issues/130) ∙ [עברית](https://github.com/donnemartin/system-design-primer/issues/272) ∙ [Italiano](https://github.com/donnemartin/system-design-primer/issues/104) ∙ [한국어](https://github.com/donnemartin/system-design-primer/issues/102) ∙ [فارسی](https://github.com/donnemartin/system-design-primer/issues/110) ∙ [Polski](https://github.com/donnemartin/system-design-primer/issues/68) ∙ [русский язык](https://github.com/donnemartin/system-design-primer/issues/87) ∙ [Español](https://github.com/donnemartin/system-design-primer/issues/136) ∙ [ภาษาไทย](https://github.com/donnemartin/system-design-primer/issues/187) ∙ [Türkçe](https://github.com/donnemartin/system-design-primer/issues/39) ∙ [tiếng Việt](https://github.com/donnemartin/system-design-primer/issues/127) ∙ [Français](https://github.com/donnemartin/system-design-primer/issues/250) | [Add Translation](https://github.com/donnemartin/system-design-primer/issues/28)*
# 系統設計入門
diff --git a/README.md b/README.md
index fa6c44f2..5d6935df 100644
--- a/README.md
+++ b/README.md
@@ -1,4 +1,4 @@
-*[English](README.md) ∙ [日本語](README-ja.md) ∙ [简体中文](README-zh-Hans.md) ∙ [繁體中文](README-zh-TW.md) | [العَرَبِيَّة](https://github.com/donnemartin/system-design-primer/issues/170) ∙ [বাংলা](https://github.com/donnemartin/system-design-primer/issues/220) ∙ [Português do Brasil](https://github.com/donnemartin/system-design-primer/issues/40) ∙ [Deutsch](https://github.com/donnemartin/system-design-primer/issues/186) ∙ [ελληνικά](https://github.com/donnemartin/system-design-primer/issues/130) ∙ [עברית](https://github.com/donnemartin/system-design-primer/issues/272) ∙ [Italiano](https://github.com/donnemartin/system-design-primer/issues/104) ∙ [韓國語](https://github.com/donnemartin/system-design-primer/issues/102) ∙ [فارسی](https://github.com/donnemartin/system-design-primer/issues/110) ∙ [Polski](https://github.com/donnemartin/system-design-primer/issues/68) ∙ [русский язык](https://github.com/donnemartin/system-design-primer/issues/87) ∙ [Español](https://github.com/donnemartin/system-design-primer/issues/136) ∙ [ภาษาไทย](https://github.com/donnemartin/system-design-primer/issues/187) ∙ [Türkçe](https://github.com/donnemartin/system-design-primer/issues/39) ∙ [tiếng Việt](https://github.com/donnemartin/system-design-primer/issues/127) ∙ [Français](https://github.com/donnemartin/system-design-primer/issues/250) | [Add Translation](https://github.com/donnemartin/system-design-primer/issues/28)*
+*[English](README.md) ∙ [日本語](README-ja.md) ∙ [简体中文](README-zh-Hans.md) ∙ [繁體中文](README-zh-TW.md) | [العَرَبِيَّة](https://github.com/donnemartin/system-design-primer/issues/170) ∙ [বাংলা](https://github.com/donnemartin/system-design-primer/issues/220) ∙ [Português do Brasil](https://github.com/donnemartin/system-design-primer/issues/40) ∙ [Deutsch](https://github.com/donnemartin/system-design-primer/issues/186) ∙ [ελληνικά](https://github.com/donnemartin/system-design-primer/issues/130) ∙ [עברית](https://github.com/donnemartin/system-design-primer/issues/272) ∙ [Italiano](https://github.com/donnemartin/system-design-primer/issues/104) ∙ [한국어](https://github.com/donnemartin/system-design-primer/issues/102) ∙ [فارسی](https://github.com/donnemartin/system-design-primer/issues/110) ∙ [Polski](https://github.com/donnemartin/system-design-primer/issues/68) ∙ [русский язык](https://github.com/donnemartin/system-design-primer/issues/87) ∙ [Español](https://github.com/donnemartin/system-design-primer/issues/136) ∙ [ภาษาไทย](https://github.com/donnemartin/system-design-primer/issues/187) ∙ [Türkçe](https://github.com/donnemartin/system-design-primer/issues/39) ∙ [tiếng Việt](https://github.com/donnemartin/system-design-primer/issues/127) ∙ [Français](https://github.com/donnemartin/system-design-primer/issues/250) | [Add Translation](https://github.com/donnemartin/system-design-primer/issues/28)*
**Help [translate](TRANSLATIONS.md) this guide!**
From cc11a9b119799425879a935f5536b95e79f1b122 Mon Sep 17 00:00:00 2001
From: Vladimir Mikhaylov <38596482+vemikhaylov@users.noreply.github.com>
Date: Tue, 7 Jul 2020 04:00:34 +0300
Subject: [PATCH 59/72] Fix Mint exercise bugs and typos (#409)
---
solutions/system_design/mint/README.md | 16 ++++++++--------
1 file changed, 8 insertions(+), 8 deletions(-)
diff --git a/solutions/system_design/mint/README.md b/solutions/system_design/mint/README.md
index 383e8375..1ec31674 100644
--- a/solutions/system_design/mint/README.md
+++ b/solutions/system_design/mint/README.md
@@ -202,7 +202,7 @@ For sellers not initially seeded in the map, we could use a crowdsourcing effort
```python
class Categorizer(object):
- def __init__(self, seller_category_map, self.seller_category_crowd_overrides_map):
+ def __init__(self, seller_category_map, seller_category_crowd_overrides_map):
self.seller_category_map = seller_category_map
self.seller_category_crowd_overrides_map = \
seller_category_crowd_overrides_map
@@ -223,7 +223,7 @@ Transaction implementation:
class Transaction(object):
def __init__(self, created_at, seller, amount):
- self.timestamp = timestamp
+ self.created_at = created_at
self.seller = seller
self.amount = amount
```
@@ -241,10 +241,10 @@ class Budget(object):
def create_budget_template(self):
return {
- 'DefaultCategories.HOUSING': income * .4,
- 'DefaultCategories.FOOD': income * .2,
- 'DefaultCategories.GAS': income * .1,
- 'DefaultCategories.SHOPPING': income * .2
+ DefaultCategories.HOUSING: self.income * .4,
+ DefaultCategories.FOOD: self.income * .2,
+ DefaultCategories.GAS: self.income * .1,
+ DefaultCategories.SHOPPING: self.income * .2,
...
}
@@ -373,9 +373,9 @@ Instead of keeping the `monthly_spending` aggregate table in the **SQL Database*
We might only want to store a month of `transactions` data in the database, while storing the rest in a data warehouse or in an **Object Store**. An **Object Store** such as Amazon S3 can comfortably handle the constraint of 250 GB of new content per month.
-To address the 2,000 *average* read requests per second (higher at peak), traffic for popular content should be handled by the **Memory Cache** instead of the database. The **Memory Cache** is also useful for handling the unevenly distributed traffic and traffic spikes. The **SQL Read Replicas** should be able to handle the cache misses, as long as the replicas are not bogged down with replicating writes.
+To address the 200 *average* read requests per second (higher at peak), traffic for popular content should be handled by the **Memory Cache** instead of the database. The **Memory Cache** is also useful for handling the unevenly distributed traffic and traffic spikes. The **SQL Read Replicas** should be able to handle the cache misses, as long as the replicas are not bogged down with replicating writes.
-200 *average* transaction writes per second (higher at peak) might be tough for a single **SQL Write Master-Slave**. We might need to employ additional SQL scaling patterns:
+2,000 *average* transaction writes per second (higher at peak) might be tough for a single **SQL Write Master-Slave**. We might need to employ additional SQL scaling patterns:
* [Federation](https://github.com/donnemartin/system-design-primer#federation)
* [Sharding](https://github.com/donnemartin/system-design-primer#sharding)
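To see that the corrected `create_budget_template` above now returns enum-keyed fractions of `self.income`, here is a minimal, self-contained sketch; the `DefaultCategories` enum and the `Budget` constructor signature are simplified stand-ins for the exercise's fuller classes, since only `create_budget_template` appears in this diff.

```python
from enum import Enum

class DefaultCategories(Enum):  # simplified stand-in for the exercise's category enum
    HOUSING = 0
    FOOD = 1
    GAS = 2
    SHOPPING = 3

class Budget(object):
    def __init__(self, income):  # assumed constructor for illustration only
        self.income = income

    def create_budget_template(self):
        return {
            DefaultCategories.HOUSING: self.income * .4,
            DefaultCategories.FOOD: self.income * .2,
            DefaultCategories.GAS: self.income * .1,
            DefaultCategories.SHOPPING: self.income * .2,
        }

print(Budget(income=5000).create_budget_template()[DefaultCategories.FOOD])  # 1000.0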
From 727a2f8bba898010ee72a8d8405b16ef91630489 Mon Sep 17 00:00:00 2001
From: John Richardson <42470533+John-Richardson@users.noreply.github.com>
Date: Tue, 7 Jul 2020 03:05:50 +0200
Subject: [PATCH 60/72] Remove redundant SQL index in Pastebin exercise (#405)
---
solutions/system_design/pastebin/README.md | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/solutions/system_design/pastebin/README.md b/solutions/system_design/pastebin/README.md
index 756c78c2..2d87ddcc 100644
--- a/solutions/system_design/pastebin/README.md
+++ b/solutions/system_design/pastebin/README.md
@@ -116,7 +116,7 @@ paste_path varchar(255) NOT NULL
PRIMARY KEY(shortlink)
```
-We'll create an [index](https://github.com/donnemartin/system-design-primer#use-good-indices) on `shortlink ` and `created_at` to speed up lookups (log-time instead of scanning the entire table) and to keep the data in memory. Reading 1 MB sequentially from memory takes about 250 microseconds, while reading from SSD takes 4x and from disk takes 80x longer.1
+Setting the primary key to be based on the `shortlink` column creates an [index](https://github.com/donnemartin/system-design-primer#use-good-indices) that the database uses to enforce uniqueness. We'll create an additional index on `created_at` to speed up lookups (log-time instead of scanning the entire table) and to keep the data in memory. Reading 1 MB sequentially from memory takes about 250 microseconds, while reading from SSD takes 4x and from disk takes 80x longer.1
To generate the unique url, we could:
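As a rough illustration of why the `shortlink` index was redundant, the following sqlite3 sketch (column types simplified from the exercise's schema) shows that the primary key already creates a unique index, so only `created_at` needs an explicit one:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE pastes (
        shortlink TEXT PRIMARY KEY,   -- the primary key already backs lookups with a unique index
        created_at TEXT NOT NULL,
        paste_path TEXT NOT NULL
    )
""")
conn.execute("CREATE INDEX idx_pastes_created_at ON pastes (created_at)")
# Lists both the automatic primary-key index and the explicit created_at index
print(conn.execute("PRAGMA index_list(pastes)").fetchall())
```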
From 2fe45a93914b322d07c0815a6e7f10f405519fff Mon Sep 17 00:00:00 2001
From: Harry Moreno
Date: Mon, 6 Jul 2020 23:53:01 -0400
Subject: [PATCH 61/72] Additional question, build an exchange
fixes https://github.com/donnemartin/system-design-primer/issues/281
---
README.md | 1 +
1 file changed, 1 insertion(+)
diff --git a/README.md b/README.md
index 5d6935df..f82c73bf 100644
--- a/README.md
+++ b/README.md
@@ -1672,6 +1672,7 @@ Handy metrics based on numbers above:
| Design an online multiplayer card game | [indieflashblog.com](http://www.indieflashblog.com/how-to-create-an-asynchronous-multiplayer-game.html)
[buildnewgames.com](http://buildnewgames.com/real-time-multiplayer/) |
| Design a garbage collection system | [stuffwithstuff.com](http://journal.stuffwithstuff.com/2013/12/08/babys-first-garbage-collector/)
[washington.edu](http://courses.cs.washington.edu/courses/csep521/07wi/prj/rick.pdf) |
| Design an API rate limiter | [https://stripe.com/blog/](https://stripe.com/blog/rate-limiters) |
+| Design a Stock Exchange (like NASDAQ or Binance) | [Jane Street](https://youtu.be/b1e4t2k2KJY)
[Golang Implementation](https://around25.com/blog/building-a-trading-engine-for-a-crypto-exchange/)
[Go Implemenation](http://bhomnick.net/building-a-simple-limit-order-in-go/) |
| Add a system design question | [Contribute](#contributing) |
### Real world architectures
From 0beb557e8f2f239e961b0948bbf75504f9bf03c4 Mon Sep 17 00:00:00 2001
From: Manas Gupta
Date: Wed, 8 Jul 2020 06:24:01 +0530
Subject: [PATCH 62/72] Add CAP theorem video link (#400)
---
README.md | 1 +
1 file changed, 1 insertion(+)
diff --git a/README.md b/README.md
index 5d6935df..11a91109 100644
--- a/README.md
+++ b/README.md
@@ -468,6 +468,7 @@ AP is a good choice if the business needs allow for [eventual consistency](#even
* [CAP theorem revisited](http://robertgreiner.com/2014/08/cap-theorem-revisited/)
* [A plain english introduction to CAP theorem](http://ksat.me/a-plain-english-introduction-to-cap-theorem)
* [CAP FAQ](https://github.com/henryr/cap-faq)
+* [The CAP theorem](https://www.youtube.com/watch?v=k-Yaq8AHlFA)
## Consistency patterns
From 06b3ed2adc8336ba068b1673c42c6762528efe7a Mon Sep 17 00:00:00 2001
From: Joilson Cisne
Date: Tue, 7 Jul 2020 21:56:05 -0300
Subject: [PATCH 63/72] Fix loop bug in deck of cards exercise (#396)
---
.../object_oriented_design/deck_of_cards/deck_of_cards.ipynb | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/solutions/object_oriented_design/deck_of_cards/deck_of_cards.ipynb b/solutions/object_oriented_design/deck_of_cards/deck_of_cards.ipynb
index 45b217a0..1a9bc1c5 100644
--- a/solutions/object_oriented_design/deck_of_cards/deck_of_cards.ipynb
+++ b/solutions/object_oriented_design/deck_of_cards/deck_of_cards.ipynb
@@ -122,7 +122,7 @@
"\n",
" def score(self):\n",
" total_value = 0\n",
- " for card in card:\n",
+ " for card in self.cards:\n",
" total_value += card.value\n",
" return total_value\n",
"\n",
From 78e2eb5df83a14c6569815a182c9aa9af682f3fb Mon Sep 17 00:00:00 2001
From: Ganessh Kumar
Date: Thu, 9 Jul 2020 05:40:16 +0530
Subject: [PATCH 64/72] Update dead links (#321)
---
README.md | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/README.md b/README.md
index 11a91109..765eb235 100644
--- a/README.md
+++ b/README.md
@@ -1756,7 +1756,7 @@ Handy metrics based on numbers above:
* [Box Blogs](https://blog.box.com/blog/category/engineering)
* [Cloudera Developer Blog](http://blog.cloudera.com/)
* [Dropbox Tech Blog](https://tech.dropbox.com/)
-* [Engineering at Quora](http://engineering.quora.com/)
+* [Engineering at Quora](https://www.quora.com/q/quoraengineering)
* [Ebay Tech Blog](http://www.ebaytechblog.com/)
* [Evernote Tech Blog](https://blog.evernote.com/tech/)
* [Etsy Code as Craft](http://codeascraft.com/)
@@ -1776,9 +1776,8 @@ Handy metrics based on numbers above:
* [Microsoft Engineering](https://engineering.microsoft.com/)
* [Microsoft Python Engineering](https://blogs.msdn.microsoft.com/pythonengineering/)
* [Netflix Tech Blog](http://techblog.netflix.com/)
-* [Paypal Developer Blog](https://devblog.paypal.com/category/engineering/)
+* [Paypal Developer Blog](https://medium.com/paypal-engineering)
* [Pinterest Engineering Blog](https://medium.com/@Pinterest_Engineering)
-* [Quora Engineering](https://engineering.quora.com/)
* [Reddit Blog](http://www.redditblog.com/)
* [Salesforce Engineering Blog](https://developer.salesforce.com/blogs/engineering/)
* [Slack Engineering Blog](https://slack.engineering/)
From ad5435ba0dcaa67773f45de945699788d9a47790 Mon Sep 17 00:00:00 2001
From: Varsha Muzumdar
Date: Thu, 9 Jul 2020 17:48:47 -0700
Subject: [PATCH 65/72] Add links for latency based and geolocation based
routing (#319)
---
README.md | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/README.md b/README.md
index 765eb235..1cb2d69c 100644
--- a/README.md
+++ b/README.md
@@ -601,8 +601,8 @@ Services such as [CloudFlare](https://www.cloudflare.com/dns/) and [Route 53](ht
* Prevent traffic from going to servers under maintenance
* Balance between varying cluster sizes
* A/B testing
-* Latency-based
-* Geolocation-based
+* [Latency-based](https://docs.aws.amazon.com/Route53/latest/DeveloperGuide/routing-policy.html#routing-policy-latency)
+* [Geolocation-based](https://docs.aws.amazon.com/Route53/latest/DeveloperGuide/routing-policy.html#routing-policy-geo)
### Disadvantage(s): DNS
From cbaae481a54f949b5db4e3b7ea878f19ac0839eb Mon Sep 17 00:00:00 2001
From: Noe Brito
Date: Thu, 9 Jul 2020 17:49:43 -0700
Subject: [PATCH 66/72] Clarify CDN advantages (#310)
---
README.md | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/README.md b/README.md
index 1cb2d69c..631c484f 100644
--- a/README.md
+++ b/README.md
@@ -628,7 +628,7 @@ A content delivery network (CDN) is a globally distributed network of proxy serv
Serving content from CDNs can significantly improve performance in two ways:
-* Users receive content at data centers close to them
+* Users receive content from data centers close to them
* Your servers do not have to serve requests that the CDN fulfills
### Push CDNs
From b5173d60d5a287e66aab97129537ea897e02beb4 Mon Sep 17 00:00:00 2001
From: Adam Dobrawy
Date: Sat, 11 Jul 2020 03:01:12 +0200
Subject: [PATCH 67/72] Change disk to HDD for clarity (#295)
---
README.md | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/README.md b/README.md
index 631c484f..9127d157 100644
--- a/README.md
+++ b/README.md
@@ -1613,9 +1613,9 @@ Read 4 KB randomly from SSD* 150,000 ns 150 us ~1GB/se
Read 1 MB sequentially from memory 250,000 ns 250 us
Round trip within same datacenter 500,000 ns 500 us
Read 1 MB sequentially from SSD* 1,000,000 ns 1,000 us 1 ms ~1GB/sec SSD, 4X memory
-Disk seek 10,000,000 ns 10,000 us 10 ms 20x datacenter roundtrip
+HDD seek 10,000,000 ns 10,000 us 10 ms 20x datacenter roundtrip
Read 1 MB sequentially from 1 Gbps 10,000,000 ns 10,000 us 10 ms 40x memory, 10X SSD
-Read 1 MB sequentially from disk 30,000,000 ns 30,000 us 30 ms 120x memory, 30X SSD
+Read 1 MB sequentially from HDD 30,000,000 ns 30,000 us 30 ms 120x memory, 30X SSD
Send packet CA->Netherlands->CA 150,000,000 ns 150,000 us 150 ms
Notes
@@ -1627,7 +1627,7 @@ Notes
Handy metrics based on numbers above:
-* Read sequentially from disk at 30 MB/s
+* Read sequentially from HDD at 30 MB/s
* Read sequentially from 1 Gbps Ethernet at 100 MB/s
* Read sequentially from SSD at 1 GB/s
* Read sequentially from main memory at 4 GB/s
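The handy metrics above translate directly into back-of-envelope read times; a quick sketch using only the throughputs listed in this patch:

```python
# Sequential read throughput from the metrics above, in MB/s
throughput_mb_per_s = {"HDD": 30, "1 Gbps Ethernet": 100, "SSD": 1000, "main memory": 4000}

for medium, mb_per_s in throughput_mb_per_s.items():
    seconds = 1024 / mb_per_s  # time to read 1 GB sequentially
    print(f"Read 1 GB sequentially from {medium}: ~{seconds:.1f} s")
```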
From aaa0acc80d9c82d7433f75b536ba450b34186224 Mon Sep 17 00:00:00 2001
From: Donne Martin
Date: Sun, 12 Jul 2020 11:58:35 -0400
Subject: [PATCH 68/72] Update README.md
---
README.md | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/README.md b/README.md
index f82c73bf..18f1d0ec 100644
--- a/README.md
+++ b/README.md
@@ -1672,7 +1672,7 @@ Handy metrics based on numbers above:
| Design an online multiplayer card game | [indieflashblog.com](http://www.indieflashblog.com/how-to-create-an-asynchronous-multiplayer-game.html)
[buildnewgames.com](http://buildnewgames.com/real-time-multiplayer/) |
| Design a garbage collection system | [stuffwithstuff.com](http://journal.stuffwithstuff.com/2013/12/08/babys-first-garbage-collector/)
[washington.edu](http://courses.cs.washington.edu/courses/csep521/07wi/prj/rick.pdf) |
| Design an API rate limiter | [https://stripe.com/blog/](https://stripe.com/blog/rate-limiters) |
-| Design a Stock Exchange (like NASDAQ or Binance) | [Jane Street](https://youtu.be/b1e4t2k2KJY)
[Golang Implementation](https://around25.com/blog/building-a-trading-engine-for-a-crypto-exchange/)
[Go Implemenation](http://bhomnick.net/building-a-simple-limit-order-in-go/) |
+| Design a Stock Exchange (like NASDAQ or Binance) | [Jane Street](https://youtu.be/b1e4t2k2KJY)
[Golang Implementation](https://around25.com/blog/building-a-trading-engine-for-a-crypto-exchange/)
[Go Implemenation](http://bhomnick.net/building-a-simple-limit-order-in-go/) |
| Add a system design question | [Contribute](#contributing) |
### Real world architectures
From 828014aaac217f66ef2183fc65655440c672cf9c Mon Sep 17 00:00:00 2001
From: Donne Martin
Date: Sun, 12 Jul 2020 12:02:44 -0400
Subject: [PATCH 69/72] Remove extraneous __init__.py (#393)
---
solutions/system_design/__init__.py | 0
1 file changed, 0 insertions(+), 0 deletions(-)
delete mode 100644 solutions/system_design/__init__.py
diff --git a/solutions/system_design/__init__.py b/solutions/system_design/__init__.py
deleted file mode 100644
index e69de29b..00000000
From 7d39c44293c08a931a3122125ccb406e43ad43ae Mon Sep 17 00:00:00 2001
From: Daniel Julius Lasiman
Date: Sat, 18 Jul 2020 08:15:47 +0700
Subject: [PATCH 70/72] Remove Imgur dependency by storing images locally
(#168)
---
README-ja.md | 76 ++++++++++++++++++++++-----------------------
README-zh-Hans.md | 76 ++++++++++++++++++++++-----------------------
README-zh-TW.md | 76 ++++++++++++++++++++++-----------------------
README.md | 76 ++++++++++++++++++++++-----------------------
images/0vBc0hN.png | Bin 0 -> 102879 bytes
images/4edXG0T.png | Bin 0 -> 215313 bytes
images/4j99mhe.png | Bin 0 -> 110567 bytes
images/54GYsSx.png | Bin 0 -> 88910 bytes
images/5KeocQs.jpg | Bin 0 -> 193547 bytes
images/C9ioGtn.png | Bin 0 -> 248408 bytes
images/IOyLj4i.jpg | Bin 0 -> 67239 bytes
images/JdAsdvG.jpg | Bin 0 -> 21072 bytes
images/MzExP06.png | Bin 0 -> 217417 bytes
images/ONjORqk.png | Bin 0 -> 193579 bytes
images/OfVllex.png | Bin 0 -> 170685 bytes
images/Q6z24La.png | Bin 0 -> 45525 bytes
images/TcUo2fw.png | Bin 0 -> 493992 bytes
images/U3qV33e.png | Bin 0 -> 255079 bytes
images/V5q57vU.png | Bin 0 -> 296637 bytes
images/Xkm5CXz.png | Bin 0 -> 1883180 bytes
images/b4YtAEN.png | Bin 0 -> 556323 bytes
images/bWxPtQA.png | Bin 0 -> 198105 bytes
images/bgLMI2u.png | Bin 0 -> 37358 bytes
images/cdCv5g7.png | Bin 0 -> 130386 bytes
images/fNcl65g.png | Bin 0 -> 164341 bytes
images/h81n9iK.png | Bin 0 -> 41959 bytes
images/h9TAuGI.jpg | Bin 0 -> 63663 bytes
images/iF4Mkb5.png | Bin 0 -> 107611 bytes
images/jj3A5N8.png | Bin 0 -> 322689 bytes
images/jrUBAF7.png | Bin 0 -> 342123 bytes
images/krAHLGg.png | Bin 0 -> 183931 bytes
images/kxtjqgE.png | Bin 0 -> 113844 bytes
images/n16iOGk.png | Bin 0 -> 22528 bytes
images/n41Azff.png | Bin 0 -> 18456 bytes
images/rgSrvjG.png | Bin 0 -> 142491 bytes
images/wU8x5Id.png | Bin 0 -> 148666 bytes
images/wXGqG5f.png | Bin 0 -> 59143 bytes
images/yB5SYwm.png | Bin 0 -> 150899 bytes
images/yzDrJtA.jpg | Bin 0 -> 20669 bytes
images/zdCAkB3.png | Bin 0 -> 1397556 bytes
40 files changed, 152 insertions(+), 152 deletions(-)
create mode 100644 images/0vBc0hN.png
create mode 100644 images/4edXG0T.png
create mode 100644 images/4j99mhe.png
create mode 100644 images/54GYsSx.png
create mode 100644 images/5KeocQs.jpg
create mode 100644 images/C9ioGtn.png
create mode 100644 images/IOyLj4i.jpg
create mode 100644 images/JdAsdvG.jpg
create mode 100644 images/MzExP06.png
create mode 100644 images/ONjORqk.png
create mode 100644 images/OfVllex.png
create mode 100644 images/Q6z24La.png
create mode 100644 images/TcUo2fw.png
create mode 100644 images/U3qV33e.png
create mode 100644 images/V5q57vU.png
create mode 100644 images/Xkm5CXz.png
create mode 100644 images/b4YtAEN.png
create mode 100644 images/bWxPtQA.png
create mode 100644 images/bgLMI2u.png
create mode 100644 images/cdCv5g7.png
create mode 100644 images/fNcl65g.png
create mode 100644 images/h81n9iK.png
create mode 100644 images/h9TAuGI.jpg
create mode 100644 images/iF4Mkb5.png
create mode 100644 images/jj3A5N8.png
create mode 100644 images/jrUBAF7.png
create mode 100644 images/krAHLGg.png
create mode 100644 images/kxtjqgE.png
create mode 100644 images/n16iOGk.png
create mode 100644 images/n41Azff.png
create mode 100644 images/rgSrvjG.png
create mode 100644 images/wU8x5Id.png
create mode 100644 images/wXGqG5f.png
create mode 100644 images/yB5SYwm.png
create mode 100644 images/yzDrJtA.jpg
create mode 100644 images/zdCAkB3.png
diff --git a/README-ja.md b/README-ja.md
index 4257e495..cb6633d5 100644
--- a/README-ja.md
+++ b/README-ja.md
@@ -3,7 +3,7 @@
# システム設計入門
-
+
@@ -44,7 +44,7 @@
## 暗記カード
-
+
@@ -61,7 +61,7 @@
コード技術面接用の問題を探している場合は[**こちら**](https://github.com/donnemartin/interactive-coding-challenges)
-
+
@@ -91,7 +91,7 @@
> それぞれのセクションはより学びを深めるような他の文献へのリンクが貼られています。
-
+
@@ -180,7 +180,7 @@
> 学習スパンに応じてみるべきトピックス (short, medium, long)
-![Imgur](http://i.imgur.com/OfVllex.png)
+![Imgur](images/OfVllex.png)
**Q: 面接のためには、ここにあるものすべてをやらないといけないのでしょうか?**
@@ -302,49 +302,49 @@
[問題と解答を見る](solutions/system_design/pastebin/README.md)
-![Imgur](http://i.imgur.com/4edXG0T.png)
+![Imgur](images/4edXG0T.png)
### Twitterタイムライン&検索 (もしくはFacebookフィード&検索)を設計する
[問題と解答を見る](solutions/system_design/twitter/README.md)
-![Imgur](http://i.imgur.com/jrUBAF7.png)
+![Imgur](images/jrUBAF7.png)
### ウェブクローラーの設計
[問題と解答を見る](solutions/system_design/web_crawler/README.md)
-![Imgur](http://i.imgur.com/bWxPtQA.png)
+![Imgur](images/bWxPtQA.png)
### Mint.comの設計
[問題と解答を見る](solutions/system_design/mint/README.md)
-![Imgur](http://i.imgur.com/V5q57vU.png)
+![Imgur](images/V5q57vU.png)
### SNSサービスのデータ構造を設計する
[問題と解答を見る](solutions/system_design/social_graph/README.md)
-![Imgur](http://i.imgur.com/cdCv5g7.png)
+![Imgur](images/cdCv5g7.png)
### 検索エンジンのキー/バリュー構造を設計する
[問題と解答を見る](solutions/system_design/query_cache/README.md)
-![Imgur](http://i.imgur.com/4j99mhe.png)
+![Imgur](images/4j99mhe.png)
### Amazonのカテゴリ毎の売り上げランキングを設計する
[問題と解答を見る](solutions/system_design/sales_rank/README.md)
-![Imgur](http://i.imgur.com/MzExP06.png)
+![Imgur](images/MzExP06.png)
### AWS上で100万人規模のユーザーを捌くサービスを設計する
[問題と解答を見る](solutions/system_design/scaling_aws/README.md)
-![Imgur](http://i.imgur.com/jj3A5N8.png)
+![Imgur](images/jj3A5N8.png)
## オブジェクト指向設計問題と解答
@@ -436,7 +436,7 @@
### CAP 理論
-
+
Source: CAP theorem revisited
@@ -530,7 +530,7 @@
## ドメインネームシステム
-
+
Source: DNS security presentation
@@ -568,7 +568,7 @@ DNSは少数のオーソライズされたサーバーが上位に位置する
## コンテンツデリバリーネットワーク(Content delivery network)
-
+
Source: Why use a CDN
@@ -609,7 +609,7 @@ CDNを用いてコンテンツを配信することで以下の二つの理由
## ロードバランサー
-
+
Source: Scalable system design patterns
@@ -679,7 +679,7 @@ Layer 7 ロードバランサーは [アプリケーションレイヤー](#通
## リバースプロキシ(webサーバー)
-
+
Source: Wikipedia
@@ -722,7 +722,7 @@ Layer 7 ロードバランサーは [アプリケーションレイヤー](#通
## アプリケーション層
-
+
Source: Intro to architecting systems for scale
@@ -759,7 +759,7 @@ Layer 7 ロードバランサーは [アプリケーションレイヤー](#通
## データベース
-
+
Source: Scaling up to your first 10 million users
@@ -782,7 +782,7 @@ SQLなどのリレーショナルデータベースはテーブルに整理さ
マスターデータベースが読み取りと書き込みを処理し、書き込みを一つ以上のスレーブデータベースに複製します。スレーブデータベースは読み取りのみを処理します。スレーブデータベースは木構造のように追加のスレーブにデータを複製することもできます。マスターデータベースがオフラインになった場合には、いずれかのスレーブがマスターに昇格するか、新しいマスターデータベースが追加されるまでは読み取り専用モードで稼働します。
-
+
Source: Scalability, availability, stability, patterns
@@ -797,7 +797,7 @@ SQLなどのリレーショナルデータベースはテーブルに整理さ
いずれのマスターも読み取り書き込みの両方に対応する。書き込みに関してはそれぞれ協調する。いずれかのマスターが落ちても、システム全体としては読み書き両方に対応したまま運用できる。
-
+
Source: Scalability, availability, stability, patterns
@@ -825,7 +825,7 @@ SQLなどのリレーショナルデータベースはテーブルに整理さ
#### Federation
-
+
Source: Scaling up to your first 10 million users
@@ -846,7 +846,7 @@ SQLなどのリレーショナルデータベースはテーブルに整理さ
#### シャーディング
-
+
Source: Scalability, availability, stability, patterns
@@ -990,7 +990,7 @@ NoSQL は **key-value store**、 **document-store**、 **wide column store**、
#### ワイドカラムストア
-
+
Source: SQL & NoSQL, a brief history
@@ -1013,7 +1013,7 @@ Googleは[Bigtable](http://www.read.seas.harvard.edu/~kohler/class/cs239-w08/cha
#### グラフデータベース
-
+
Source: Graph database
@@ -1041,7 +1041,7 @@ Googleは[Bigtable](http://www.read.seas.harvard.edu/~kohler/class/cs239-w08/cha
### SQLか?NoSQLか?
-
+
Source: Transitioning from RDBMS to NoSQL
@@ -1083,7 +1083,7 @@ NoSQLに適するサンプルデータ:
## キャッシュ
-
+
Source: Scalable system design patterns
@@ -1154,7 +1154,7 @@ Redisはさらに以下のような機能を備えています:
#### キャッシュアサイド
-
+
Source: From cache to in-memory data grid
@@ -1190,7 +1190,7 @@ def get_user(self, user_id):
#### ライトスルー
-
+
Source: Scalability, availability, stability, patterns
@@ -1225,7 +1225,7 @@ def set_user(user_id, values):
#### ライトビハインド (ライトバック)
-
+
Source: Scalability, availability, stability, patterns
@@ -1243,7 +1243,7 @@ def set_user(user_id, values):
#### リフレッシュアヘッド
-
+
Source: From cache to in-memory data grid
@@ -1275,7 +1275,7 @@ def set_user(user_id, values):
## 非同期処理
-
+
Source: Intro to architecting systems for scale
@@ -1321,7 +1321,7 @@ def set_user(user_id, values):
## 通信
-
+
Source: OSI 7 layer model
@@ -1353,7 +1353,7 @@ HTTPは**TCP** や **UDP** などの低級プロトコルに依存している
### 伝送制御プロトコル (TCP)
-
+
Source: How to make a multiplayer game
@@ -1377,7 +1377,7 @@ TCPは高い依存性を要し、時間制約が厳しくないものに適し
### ユーザデータグラムプロトコル (UDP)
-
+
Source: How to make a multiplayer game
@@ -1406,7 +1406,7 @@ TCPよりもUDPを使うのは:
### 遠隔手続呼出 (RPC)
-
+
Source: Crack the system design interview
@@ -1629,7 +1629,7 @@ Notes
> 世の中のシステムがどのように設計されているかについての記事
-
+
Source: Twitter timelines at scale
diff --git a/README-zh-Hans.md b/README-zh-Hans.md
index 1b997f3c..15de279c 100644
--- a/README-zh-Hans.md
+++ b/README-zh-Hans.md
@@ -8,7 +8,7 @@
# 系统设计入门
-
+
@@ -49,7 +49,7 @@
## 抽认卡
-
+
@@ -66,7 +66,7 @@
你正在寻找资源以准备[**编程面试**](https://github.com/donnemartin/interactive-coding-challenges)吗?
-
+
@@ -97,7 +97,7 @@
-
+
@@ -186,7 +186,7 @@
> 基于你面试的时间线(短、中、长)去复习那些推荐的主题。
-![Imgur](http://i.imgur.com/OfVllex.png)
+![Imgur](images/OfVllex.png)
**问:对于面试来说,我需要知道这里的所有知识点吗?**
@@ -307,49 +307,49 @@
[查看实践与解答](solutions/system_design/pastebin/README.md)
-![Imgur](http://i.imgur.com/4edXG0T.png)
+![Imgur](images/4edXG0T.png)
### 设计 Twitter 时间线和搜索 (或者 Facebook feed 和搜索)
[查看实践与解答](solutions/system_design/twitter/README.md)
-![Imgur](http://i.imgur.com/jrUBAF7.png)
+![Imgur](images/jrUBAF7.png)
### 设计一个网页爬虫
[查看实践与解答](solutions/system_design/web_crawler/README.md)
-![Imgur](http://i.imgur.com/bWxPtQA.png)
+![Imgur](images/bWxPtQA.png)
### 设计 Mint.com
[查看实践与解答](solutions/system_design/mint/README.md)
-![Imgur](http://i.imgur.com/V5q57vU.png)
+![Imgur](images/V5q57vU.png)
### 为一个社交网络设计数据结构
[查看实践与解答](solutions/system_design/social_graph/README.md)
-![Imgur](http://i.imgur.com/cdCv5g7.png)
+![Imgur](images/cdCv5g7.png)
### 为搜索引擎设计一个 key-value 储存
[查看实践与解答](solutions/system_design/query_cache/README.md)
-![Imgur](http://i.imgur.com/4j99mhe.png)
+![Imgur](images/4j99mhe.png)
### 设计按类别分类的 Amazon 销售排名
[查看实践与解答](solutions/system_design/sales_rank/README.md)
-![Imgur](http://i.imgur.com/MzExP06.png)
+![Imgur](images/MzExP06.png)
### 在 AWS 上设计一个百万用户级别的系统
[查看实践与解答](solutions/system_design/scaling_aws/README.md)
-![Imgur](http://i.imgur.com/jj3A5N8.png)
+![Imgur](images/jj3A5N8.png)
## 面向对象设计的面试问题及解答
@@ -441,7 +441,7 @@
### CAP 理论
-
+
来源:再看 CAP 理论
@@ -536,7 +536,7 @@ DNS 和 email 等系统使用的是此种方式。最终一致性在高可用性
## 域名系统
-
+
来源:DNS 安全介绍
@@ -574,7 +574,7 @@ DNS 和 email 等系统使用的是此种方式。最终一致性在高可用性
## 内容分发网络(CDN)
-
+
来源:为什么使用 CDN
@@ -613,7 +613,7 @@ CDN 拉取是当第一个用户请求该资源时,从服务器上拉取资源
## 负载均衡器
-
+
来源:可扩展的系统设计模式
@@ -682,7 +682,7 @@ CDN 拉取是当第一个用户请求该资源时,从服务器上拉取资源
## 反向代理(web 服务器)
-
+
资料来源:维基百科
@@ -726,7 +726,7 @@ CDN 拉取是当第一个用户请求该资源时,从服务器上拉取资源
## 应用层
-
+
资料来源:可缩放系统构架介绍
@@ -764,7 +764,7 @@ CDN 拉取是当第一个用户请求该资源时,从服务器上拉取资源
## 数据库
-
+
资料来源:扩展你的用户数到第一个一千万
@@ -785,7 +785,7 @@ CDN 拉取是当第一个用户请求该资源时,从服务器上拉取资源
关系型数据库扩展包括许多技术:**主从复制**、**主主复制**、**联合**、**分片**、**非规范化**和 **SQL调优**。
-
+
资料来源:可扩展性、可用性、稳定性、模式
@@ -800,7 +800,7 @@ CDN 拉取是当第一个用户请求该资源时,从服务器上拉取资源
- 参考[不利之处:复制](#不利之处复制)中,主从复制和主主复制**共同**的问题。
-
+
资料来源:可扩展性、可用性、稳定性、模式
@@ -835,7 +835,7 @@ CDN 拉取是当第一个用户请求该资源时,从服务器上拉取资源
#### 联合
-
+
资料来源:扩展你的用户数到第一个一千万
@@ -857,7 +857,7 @@ CDN 拉取是当第一个用户请求该资源时,从服务器上拉取资源
#### 分片
-
+
资料来源:可扩展性、可用性、稳定性、模式
@@ -1001,7 +1001,7 @@ MongoDB 和 CouchDB 等一些文档类型存储还提供了类似 SQL 语言的
#### 列型存储
-
+
资料来源: SQL 和 NoSQL,一个简短的历史
@@ -1024,7 +1024,7 @@ Google 发布了第一个列型存储数据库 [Bigtable](http://www.read.seas.h
#### 图数据库
-
+
资料来源:图数据库
@@ -1051,7 +1051,7 @@ Google 发布了第一个列型存储数据库 [Bigtable](http://www.read.seas.h
### SQL 还是 NoSQL
-
+
资料来源:从 RDBMS 转换到 NoSQL
@@ -1092,7 +1092,7 @@ Google 发布了第一个列型存储数据库 [Bigtable](http://www.read.seas.h
## 缓存
-
+
资料来源:可扩展的系统设计模式
@@ -1163,7 +1163,7 @@ Redis 有下列附加功能:
#### 缓存模式
-
+
资料来源:从缓存到内存数据网格
@@ -1199,7 +1199,7 @@ def get_user(self, user_id):
#### 直写模式
-
+
资料来源:可扩展性、可用性、稳定性、模式
@@ -1234,7 +1234,7 @@ def set_user(user_id, values):
#### 回写模式
-
+
资料来源:可扩展性、可用性、稳定性、模式
@@ -1252,7 +1252,7 @@ def set_user(user_id, values):
#### 刷新
-
+
资料来源:从缓存到内存数据网格
@@ -1284,7 +1284,7 @@ def set_user(user_id, values):
## 异步
-
+
资料来源:可缩放系统构架介绍
@@ -1330,7 +1330,7 @@ def set_user(user_id, values):
## 通讯
-
+
资料来源:OSI 7层模型
@@ -1365,7 +1365,7 @@ HTTP 是依赖于较低级协议(如 **TCP** 和 **UDP**)的应用层协议
### 传输控制协议(TCP)
-
+
资料来源:如何制作多人游戏
@@ -1389,7 +1389,7 @@ TCP 对于需要高可靠性但时间紧迫的应用程序很有用。比如包
### 用户数据报协议(UDP)
-
+
资料来源:如何制作多人游戏
@@ -1418,7 +1418,7 @@ UDP 可靠性更低但适合用在网络电话、视频聊天,流媒体和实
### 远程过程调用协议(RPC)
-
+
Source: Crack the system design interview
@@ -1640,7 +1640,7 @@ Notes
> 关于现实中真实的系统是怎么设计的文章。
-
+
Source: Twitter timelines at scale
diff --git a/README-zh-TW.md b/README-zh-TW.md
index e7996c45..8f302155 100644
--- a/README-zh-TW.md
+++ b/README-zh-TW.md
@@ -3,7 +3,7 @@
# 系統設計入門
-
+
@@ -44,7 +44,7 @@
## 學習單字卡
-
+
@@ -61,7 +61,7 @@
你正在尋找資源來面對[**程式語言面試**](https://github.com/donnemartin/interactive-coding-challenges)嗎?
-
+
@@ -91,7 +91,7 @@
> 每一章節都包含更深入資源的連結。
-
+
@@ -180,7 +180,7 @@
> 基於你面試的時間 (短、中、長) 來複習這些建議的主題。
-![Imgur](http://i.imgur.com/OfVllex.png)
+![Imgur](images/OfVllex.png)
**Q: 對於面試者來說,我需要知道這裡所有的知識嗎?**
@@ -302,49 +302,49 @@
[閱讀練習與解答](solutions/system_design/pastebin/README.md)
-![Imgur](http://i.imgur.com/4edXG0T.png)
+![Imgur](images/4edXG0T.png)
### 設計一個像是 Twitter 的 timeline (或 Facebook feed)設計一個 Twitter 搜尋功能 (or Facebook 搜尋功能)
[閱讀練習與解答](solutions/system_design/twitter/README.md)
-![Imgur](http://i.imgur.com/jrUBAF7.png)
+![Imgur](images/jrUBAF7.png)
### 設計一個爬蟲系統
[閱讀練習與解答](solutions/system_design/web_crawler/README.md)
-![Imgur](http://i.imgur.com/bWxPtQA.png)
+![Imgur](images/bWxPtQA.png)
### 設計 Mint.com 網站
[閱讀練習與解答](solutions/system_design/mint/README.md)
-![Imgur](http://i.imgur.com/V5q57vU.png)
+![Imgur](images/V5q57vU.png)
### 設計一個社交網站的資料結構
[閱讀練習與解答](solutions/system_design/social_graph/README.md)
-![Imgur](http://i.imgur.com/cdCv5g7.png)
+![Imgur](images/cdCv5g7.png)
### 設計一個搜尋引擎使用的鍵值儲存資料結構
[閱讀練習與解答](solutions/system_design/query_cache/README.md)
-![Imgur](http://i.imgur.com/4j99mhe.png)
+![Imgur](images/4j99mhe.png)
### 設計一個根據產品分類的亞馬遜銷售排名
[閱讀練習與解答](solutions/system_design/sales_rank/README.md)
-![Imgur](http://i.imgur.com/MzExP06.png)
+![Imgur](images/MzExP06.png)
### 在 AWS 上設計一個百萬用戶等級的系統
[閱讀練習與解答](solutions/system_design/scaling_aws/README.md)
-![Imgur](http://i.imgur.com/jj3A5N8.png)
+![Imgur](images/jj3A5N8.png)
## 物件導向設計面試問題與解答
@@ -435,7 +435,7 @@
### CAP 理論
-
+
來源:再看 CAP 理論
@@ -529,7 +529,7 @@ DNS 或是電子郵件系統使用的就是這種方式,最終一致性在高
## 域名系統
-
+
資料來源:DNS 安全介紹
@@ -567,7 +567,7 @@ DNS 是階層式的架構,一部分的 DNS 伺服器位於頂層,當查詢
## 內容傳遞網路(CDN)
-
+
來源:為什麼要使用 CDN
@@ -608,7 +608,7 @@ DNS 是階層式的架構,一部分的 DNS 伺服器位於頂層,當查詢
## 負載平衡器
-
+
來源:可擴展的系統設計模式
@@ -678,7 +678,7 @@ DNS 是階層式的架構,一部分的 DNS 伺服器位於頂層,當查詢
## 反向代理(網頁伺服器)
-
+
來源:維基百科
@@ -721,7 +721,7 @@ DNS 是階層式的架構,一部分的 DNS 伺服器位於頂層,當查詢
## 應用層
-
+
資料來源:可縮放式系統架構介紹
@@ -758,7 +758,7 @@ DNS 是階層式的架構,一部分的 DNS 伺服器位於頂層,當查詢
## 資料庫
-
+
來源:擴展你的使用者數量到第一個一千萬量級
@@ -781,7 +781,7 @@ DNS 是階層式的架構,一部分的 DNS 伺服器位於頂層,當查詢
主資料庫負責讀和寫,並且將寫入的資料複寫至一或多個從屬資料庫中,從屬資料庫只負責讀取。而從屬資料庫可以再將寫入複製到更多以樹狀結構的其他資料庫中。如果主資料庫離線了,系統可以以只讀模式運行,直到某個從屬資料庫被提升為主資料庫,或有新的主資料庫出現。
-
+
來源: 可擴展性、可用性、穩定性及其模式
@@ -796,7 +796,7 @@ DNS 是階層式的架構,一部分的 DNS 伺服器位於頂層,當查詢
兩個主要的資料庫都負責讀取和寫入,並且兩者互相協調。如果其中一個主要資料庫離線,系統可以繼續運作。
-
+
來源: 可擴展性、可用性、穩定性及其模式
@@ -824,7 +824,7 @@ DNS 是階層式的架構,一部分的 DNS 伺服器位於頂層,當查詢
#### 聯邦式資料庫
-
+
來源:擴展你的使用者數量到第一個一千萬量級
@@ -845,7 +845,7 @@ DNS 是階層式的架構,一部分的 DNS 伺服器位於頂層,當查詢
#### 分片
-
+
來源: 可擴展性、可用性、穩定性及其模式
@@ -991,7 +991,7 @@ NoSQL 指的是 **鍵-值對的資料庫**、**文件類型資料庫**、**列
#### 列儲存型資料庫
-
+
來源:SQL 和 NoSQL,簡短的歷史介紹
@@ -1014,7 +1014,7 @@ Google 發表了第一個列儲存型資料庫 [Bigtable](http://www.read.seas.h
#### 圖形資料庫
-
+
來源: 圖形化資料庫
@@ -1042,7 +1042,7 @@ Google 發表了第一個列儲存型資料庫 [Bigtable](http://www.read.seas.h
### SQL 或 NoSQL
-
+
來源:從 RDBMS 轉換到 NoSQL
@@ -1084,7 +1084,7 @@ Google 發表了第一個列儲存型資料庫 [Bigtable](http://www.read.seas.h
## 快取
-
+
來源:可擴展的系統設計模式
@@ -1155,7 +1155,7 @@ Redis 還有以下額外的功能:
#### 快取模式
-
+
資料來源:從快取到記憶體資料網格
@@ -1191,7 +1191,7 @@ def get_user(self, user_id):
#### 寫入模式
-
+
資料來源:可獲展性、可用性、穩定性與模式
@@ -1226,7 +1226,7 @@ def set_user(user_id, values):
#### 事後寫入(回寫)
-
+
資料來源:可獲展性、可用性、穩定性與模式
@@ -1244,7 +1244,7 @@ def set_user(user_id, values):
#### 更新式快取
-
+
來源:從快取到記憶體資料網格技術
@@ -1276,7 +1276,7 @@ def set_user(user_id, values):
## 非同步機制
-
+
資料來源:可縮放性系統架構介紹
@@ -1322,7 +1322,7 @@ def set_user(user_id, values):
## 通訊
-
+
來源:OSI 七層模型
@@ -1354,7 +1354,7 @@ HTTP 是依賴於較底層的協議(例如:**TCP** 和 **UDP**) 的應用層
### 傳輸控制通訊協定(TCP)
-
+
來源:如何開發多人遊戲
@@ -1378,7 +1378,7 @@ TCP 對於需要高可靠、低時間急迫性的應用來說很有用,比如
### 使用者資料流通訊協定 (UDP)
-
+
資料來源:如何製作多人遊戲
@@ -1407,7 +1407,7 @@ UDP 的可靠性較低,但適合用在像是網路電話、視訊聊天、串
### 遠端程式呼叫 (RPC)
-
+
資料來源:破解系統設計面試
@@ -1630,7 +1630,7 @@ Notes
> 底下是關於真實世界的系統架構是如何設計的文章
-
+
資料來源:可擴展式的 Twitter 時間軸設計
diff --git a/README.md b/README.md
index fd8a8ff7..a2a1b86d 100644
--- a/README.md
+++ b/README.md
@@ -5,7 +5,7 @@
# The System Design Primer
-
+
@@ -46,7 +46,7 @@ Additional topics for interview prep:
## Anki flashcards
-
+
@@ -63,7 +63,7 @@ Great for use while on-the-go.
Looking for resources to help you prep for the [**Coding Interview**](https://github.com/donnemartin/interactive-coding-challenges)?
-
+
@@ -93,7 +93,7 @@ Review the [Contributing Guidelines](CONTRIBUTING.md).
> Each section contains links to more in-depth resources.
-
+
@@ -183,7 +183,7 @@ Review the [Contributing Guidelines](CONTRIBUTING.md).
> Suggested topics to review based on your interview timeline (short, medium, long).
-![Imgur](http://i.imgur.com/OfVllex.png)
+![Imgur](images/OfVllex.png)
**Q: For interviews, do I need to know everything here?**
@@ -306,49 +306,49 @@ Check out the following links to get a better idea of what to expect:
[View exercise and solution](solutions/system_design/pastebin/README.md)
-![Imgur](http://i.imgur.com/4edXG0T.png)
+![Imgur](images/4edXG0T.png)
### Design the Twitter timeline and search (or Facebook feed and search)
[View exercise and solution](solutions/system_design/twitter/README.md)
-![Imgur](http://i.imgur.com/jrUBAF7.png)
+![Imgur](images/jrUBAF7.png)
### Design a web crawler
[View exercise and solution](solutions/system_design/web_crawler/README.md)
-![Imgur](http://i.imgur.com/bWxPtQA.png)
+![Imgur](images/bWxPtQA.png)
### Design Mint.com
[View exercise and solution](solutions/system_design/mint/README.md)
-![Imgur](http://i.imgur.com/V5q57vU.png)
+![Imgur](images/V5q57vU.png)
### Design the data structures for a social network
[View exercise and solution](solutions/system_design/social_graph/README.md)
-![Imgur](http://i.imgur.com/cdCv5g7.png)
+![Imgur](images/cdCv5g7.png)
### Design a key-value store for a search engine
[View exercise and solution](solutions/system_design/query_cache/README.md)
-![Imgur](http://i.imgur.com/4j99mhe.png)
+![Imgur](images/4j99mhe.png)
### Design Amazon's sales ranking by category feature
[View exercise and solution](solutions/system_design/sales_rank/README.md)
-![Imgur](http://i.imgur.com/MzExP06.png)
+![Imgur](images/MzExP06.png)
### Design a system that scales to millions of users on AWS
[View exercise and solution](solutions/system_design/scaling_aws/README.md)
-![Imgur](http://i.imgur.com/jj3A5N8.png)
+![Imgur](images/jj3A5N8.png)
## Object-oriented design interview questions with solutions
@@ -440,7 +440,7 @@ Generally, you should aim for **maximal throughput** with **acceptable latency**
### CAP theorem
-
+
Source: CAP theorem revisited
@@ -581,7 +581,7 @@ If both `Foo` and `Bar` each had 99.9% availability, their total availability in
## Domain name system
-
+
Source: DNS security presentation
@@ -619,7 +619,7 @@ Services such as [CloudFlare](https://www.cloudflare.com/dns/) and [Route 53](ht
## Content delivery network
-
+
Source: Why use a CDN
@@ -660,7 +660,7 @@ Sites with heavy traffic work well with pull CDNs, as traffic is spread out more
## Load balancer
-
+
Source: Scalable system design patterns
@@ -730,7 +730,7 @@ Load balancers can also help with horizontal scaling, improving performance and
## Reverse proxy (web server)
-
+
Source: Wikipedia
@@ -773,7 +773,7 @@ Additional benefits include:
## Application layer
-
+
Source: Intro to architecting systems for scale
@@ -808,7 +808,7 @@ Systems such as [Consul](https://www.consul.io/docs/index.html), [Etcd](https://
## Database
-
+
Source: Scaling up to your first 10 million users
@@ -831,7 +831,7 @@ There are many techniques to scale a relational database: **master-slave replica
The master serves reads and writes, replicating writes to one or more slaves, which serve only reads. Slaves can also replicate to additional slaves in a tree-like fashion. If the master goes offline, the system can continue to operate in read-only mode until a slave is promoted to a master or a new master is provisioned.
-
+
Source: Scalability, availability, stability, patterns
@@ -846,7 +846,7 @@ The master serves reads and writes, replicating writes to one or more slaves, wh
Both masters serve reads and writes and coordinate with each other on writes. If either master goes down, the system can continue to operate with both reads and writes.
-
+
Source: Scalability, availability, stability, patterns
@@ -874,7 +874,7 @@ Both masters serve reads and writes and coordinate with each other on writes. I
#### Federation
-
+
Source: Scaling up to your first 10 million users
@@ -895,7 +895,7 @@ Federation (or functional partitioning) splits up databases by function. For ex
#### Sharding
-
+
Source: Scalability, availability, stability, patterns
@@ -1039,7 +1039,7 @@ Document stores provide high flexibility and are often used for working with occ
#### Wide column store
-
+
Source: SQL & NoSQL, a brief history
@@ -1062,7 +1062,7 @@ Wide column stores offer high availability and high scalability. They are often
#### Graph database
-
+
Source: Graph database
@@ -1090,7 +1090,7 @@ Graphs databases offer high performance for data models with complex relationshi
### SQL or NoSQL
-
+
Source: Transitioning from RDBMS to NoSQL
@@ -1132,7 +1132,7 @@ Sample data well-suited for NoSQL:
## Cache
-
+
Source: Scalable system design patterns
@@ -1203,7 +1203,7 @@ Since you can only store a limited amount of data in cache, you'll need to deter
#### Cache-aside
-
+
Source: From cache to in-memory data grid
@@ -1239,7 +1239,7 @@ Subsequent reads of data added to cache are fast. Cache-aside is also referred
#### Write-through
-
+
Source: Scalability, availability, stability, patterns
@@ -1274,7 +1274,7 @@ Write-through is a slow overall operation due to the write operation, but subseq
#### Write-behind (write-back)
-
+
Source: Scalability, availability, stability, patterns
@@ -1292,7 +1292,7 @@ In write-behind, the application does the following:
#### Refresh-ahead
-
+
Source: From cache to in-memory data grid
@@ -1324,7 +1324,7 @@ Refresh-ahead can result in reduced latency vs read-through if the cache can acc
## Asynchronism
-
+
Source: Intro to architecting systems for scale
@@ -1370,7 +1370,7 @@ If queues start to grow significantly, the queue size can become larger than mem
## Communication
-
+
Source: OSI 7 layer model
@@ -1402,7 +1402,7 @@ HTTP is an application layer protocol relying on lower-level protocols such as *
### Transmission control protocol (TCP)
-
+
Source: How to make a multiplayer game
@@ -1426,7 +1426,7 @@ Use TCP over UDP when:
### User datagram protocol (UDP)
-
+
Source: How to make a multiplayer game
@@ -1455,7 +1455,7 @@ Use UDP over TCP when:
### Remote procedure call (RPC)
-
+
Source: Crack the system design interview
@@ -1681,7 +1681,7 @@ Handy metrics based on numbers above:
> Articles on how real world systems are designed.
-
+
Source: Twitter timelines at scale
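
The hunks above, together with the new files added under images/ below, apply one mechanical change: Markdown and HTML image references that point at i.imgur.com are switched to the bundled copies in the local images/ directory (for example, http://i.imgur.com/jj3A5N8.png becomes images/jj3A5N8.png). As a rough illustration only, and not the script this patch adds, a minimal Python sketch of that rewrite might look like this:

```python
import glob
import re

# Hedged sketch: rewrite remote Imgur image references to the bundled
# local copies under images/, mirroring the replacements shown in the
# hunks above (e.g. http://i.imgur.com/jj3A5N8.png -> images/jj3A5N8.png).
# Assumes the local filenames match the Imgur IDs, as they do in this patch.
IMGUR_PATTERN = re.compile(r"http://i\.imgur\.com/(\w+\.(?:png|jpg|gif))")

def localize_images(path):
    """Rewrite Imgur URLs in one Markdown file, in place."""
    with open(path, encoding="utf-8") as f:
        text = f.read()
    rewritten = IMGUR_PATTERN.sub(r"images/\1", text)
    with open(path, "w", encoding="utf-8") as f:
        f.write(rewritten)

if __name__ == "__main__":
    # Run from the repository root against every translated README.
    for readme in glob.glob("README*.md"):
        localize_images(readme)
```

Run from the repository root, such a script would rewrite every README*.md in place; the patch itself simply records the already-rewritten files.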
diff --git a/images/0vBc0hN.png b/images/0vBc0hN.png
new file mode 100644
index 0000000000000000000000000000000000000000..f1466344e2b21440c48baba4089589665b11a2bf
GIT binary patch
literal 102879
zcmZU(WmKD8v^ARG?(R}bfC9nYN^y#7p-2nG-90!IFD}K47Ye1tA-EQIC>|UN!Tk&G
zx!<|tj{74S5FU2+T6@`^b4F{ZDd1vJVF3UDTqVUfS^xkN65=(Ffrj`5Z!Iev0Kg%(
zk(Jd@l9gr9aCQ1*V{ZunC`PC1qZ?=slI0p{VPJ9+0NDzJWPDRs#3noC6_LV-WEsll
zC*)A)xR?wkbO})BkOm5w3JnN4S4>}(fet=&Mhs=YfJj~)UA23`|13*cjf$M*xmg_x
zAwvo!R2T;2Qvn0(JJ?0Y0U6OsgQ{OB7)}&W1*M~VM~S@^dN7^3+V4TcJu#GIe8{r#jc!Pu)h+bPkK!gHOOj~oOoc*$<
zL-{*xu}A*y^&vVSbLa!+6i_3v$BOL7lsOG5uxHjEF9H7)FG^yb@)cgshbY#qt&uQ;
zB&=GSeSDY6C+1K42C8Hn0XW01NmvUB>8t~v{@h3npLVdY8u5!pO4?83m}}I(urg@a
zugWYS!f9gWyq}IzjM8P0?B7&8V!2j}0D&BP+Vj5dVB4Wl0P-Q-aO`l3St84*x{X$a
zADk+ni||Wp6mBVDPn1$DEBS;T1?)zjN1h_#P-+@MA~z8!u@59pDx0V%vN-xwzQtf>
zL76^c2ky0A1?eh)i>Tc9SXcx|ElqPin=v~(W!G@Pj7sNUfwR=f1Ulcr@K9Cz*vp-2BI-#)aUCI?<2l
zXILNP18TOqQD0UfUxJcJiC8uh+Q_#QdWRaI
zfL9DCdl2kb3j9TMn3IH(haaUGuZfws2B^=@lHy%ufYU*=Q$xNgi7#n66MZd$O9^(?
zvKrBkBO*Ijumz2lDZMB!KC;J;Dxb675{5ZtXZx
zYAc@4`K?*>QnzZWi+V;j^7P|?27lj%A7@amde^^x8ijMLZ7e+IS6cnockzY~G`&}m
zR4!|L=T<8ABAyfnP&My@oY40`dV%3$(v>(d8?Q-*WnNs?;^q3U$X(T
zrjLSRc45I9x9u{LUY_k1ve%;|BZTltK6vKJNz8~P+S!60amMdi!FYgT`@g!eexr!<
zmN+n%{~YW$(wY`h>fWlyD`Y^t1S;h5v1N1(ciF@aAiVO23=`Wz)OeB*fC}pRS^#v1
z0KwrR3rGrUXsr+_3}6xi*^A)4AN!OH=ykcM+rhp06e%GBX3S-PrA{slTs(+uK4CpX
zd+k*tMsuf!GkJBK$q5k-~hJfg`>~0iVVGIQ$Srq3+tT)4>A_Y<03r03Y(JqBI
z1L78#TA!rv08(0sN2se3CQ}p4SMKlViO83#sH-lw6_U1#90dHCaW-Y1#re1`u
zxrYIt|C~M#&ul_KSF|v#)kd0g0NT2XF!fk3`5MP5Rx6bgbm_G6TGk)%B#$r#gA7Ig
zHJC9eQ7uCb^fHC;k|3UF6iPdz;;sIIi1H0_A@OpyQjh9}@rLq--Uo@=)OlHdRhkz`
zic~MKzlgDY>0yE;e^$y?x>I6R(pPLNPO$K)<620gkyd+oHYm5n3zvjr!b#zL@Wh-i
zIY~K{#>F%Wj`7R3O@CegxTeE=VG*Dq65`KdENNeq3{}}mr!_}4jx|??hKCP^TL#qA
zb_Th>aHmUwg47-h?pZ6aW^(pbzABSy>T6zU8I?DbJK6AO@%)KNQynh-UN|$ZHbpls
z$v#e6MG`_PjD=0gPD({m4Qka&(dO4W*UPCkC^IPWHaM;O(D5bbOYN5$A{Copt#6Y&
z-?EiMhukfxVTv#sSnAMS${2@~K4UpW<#naAPFlsGmdJN_M*IZsZo9i+0A_&D15TTM
zYME^zRaI8`y9!_FUn(BbZ{_}&aGE&$e)nzBMSutSBL!awkxRv|W~++HB6ZrnhIYpPsyD
zP8~NFHdXrQse%-+XL#p6&t_HrQmNFo(=bvH(=f_!GrX(!?7)-lS7Ft*3LVA#D*3*-
zI+9D>EA&|DFS)8fi4b45;i$b8-mc9a>R$A@=f1}t?q2q8{~r0?+`+0)K7}xKF{M4&
znvViJEr27GV`X6H-LE)0umI||PS%W^K>tX*iLz?icNsWR)!?DdkiX`l|c2C`Vs3
z&yBJ1y785}jmyo}{%+Pk-+>EmA(e(#4FQg0teL7CYKszkwJq&p!vUo&^Pzi58@1iC
zhvhqi@G7|fmgQRS>Ehj@or3)}<#0wM?F1-l^|g<{AQ$k>HCqS>MU47CeEf1UGsJF-4fOZMYyj@R_znUR>?
zS=|#Si65sh7m83U8}(?)evA`kJ^a;7ELuA8F6H
zs+Fv*{P?dgvoCsMaf5WTv~R8_Nby9)Ae})iCqr23^uw1=MBL?UnJF}Rt*k!IpPy3V
zw0%oORXKBgKV}WqFHDW9&*kJwJY3`SQdUt$YGbfrvfXL5|6nrIt?Ha!o8~M>wl}?e
zd8ralZiQT$a4`D5KOk*8_czy(PqW|c?Rh)SSZrNvU@mt}$dxD`$jxs%vTC5J&9Hj`
zbKyc{D<>uAqd#dNSw}$0=*R+}9|~E>hhX730k4n85U@@oEB;`1+FVKvR)XkPChZCC
z!bns))KpY@tKKnDaU&nWAkQ6|GZBt$aSj~Ny#tqIx2+wh3i%}$ma$G-MxC(T={Kf>LJzrrX)j*cm{hd6PfYfApgd|BRcX9>)!sDXuy6^x
zc3pv2lX`Lp)?Dj#u{CMfzJI58YsumV3paHL*jDF
z3iAj*psT?*UHuW+~$nQDd^?&;~R4!GwwYlHz(6-Vxc%mKKm+@A!FUc+I{y5k>zWz(wA@Zd6Q^8Gr
zFSo|;#;}$p(7m6~#_7wM+CLLt
zBPkS^eFWn=(`Ul;PcixS*
zyN?xiuYB4Tf|s&5jH>Lys+o0-Js&-HddTVtS?@Z)4cC>arkS^j=V;}
z_+JyD*XHb>a-rS^;fCDR(e{U1wa%{vEiuQ0$EnNu&8C;q2Qo7qBagb=o+R`^yH`KX
zG)IbB5*89nB?tZB-@Nv;7pLm1n#XK&mn5C`uy+pvd0=w?R62(X9yg>?pO)c6=QBwi
zt&jUOiZr_coc#yJpQZCBa%;aPiNzInIYXUT5lL`RSDjdv
z&f1`54LQIi27nQ2mE!P&QbsjgeehIl7CYb5^P|HX_~fCzWo_9Bne4T&_`M!AK#LDx
z8s6P4YZMvTaU5acggl5}hB0<6R=ks=f;J2tgF6v?x4q4XWy3?<5-^<=4c!0$
zY~tq^5RjEa0RTt?l-|hbcmogHd{W-`dOR6v+0l(dGEkx%2^6CYN04xGGaUGO@EA`C
z`bUjM|6Qrn)vXMfI8SxwxTp~?@)ZKWtIUD+a|BoO-S{(WRS%ZLtqK^ba
z){Md{aM|~Nm${uIhI4Z86v+>Bd6#2kimDta8&vGL!3On
zJxa}rvFT~W4OQ=5i}=2cd}iF@Dw6k8JO+*%!e=$`2n$KVA+jt>VVPpC6Y|K|GG;F!!b#Eer`ar8MPsAgQ;GD#1xe
z&Q#=s>St5QtNhl7Z`N$;mV>M90WIAmH52i7wRSxNxye~;F0i8)>ba&i5r!>e;%$2U
zrMOYs*NwSvIn7R~5kBRQL+q44u3q!RtO)R#Q$It?7U}`q#23%m%m`Wh9bklu-gp9=
zb52Evh`@)vHI@&+!twVP#^0Bkew|{kK
zxV4!hwaH;g5WN5=n628KNa&?Bgoce-%c=
z%58LV@*L-#O3&W4>UdLmn~4N^$(2$P>4W<_DCOg^Vsh8Cv*-{WYOJ=Xa==u{TLG9_
z#z8l}lq$85qbjwa{Ru}T$J-kne9OX&CY{KJT6=DfqIrb`s;{E*r0caG`A;s|HppX
zDu?R}G(?WREs*wiFJrTadu=2W?28p;9uW6xXHV<8=}R(G$#)-7b2jNNgFjjSxU~P
z{I>Zh#08@7E=tOl3}~hXZ1yL31>M%~+3eSs)8z^VyvwQ^PN^T}YAOEfxx{(Yes^p$
z_ulIlKTi0mBaNA&bq`~QlXW;_(o}>XB$>l;S5-VRSU(OKoBTOlC7;vv`u4@UX>}55
znL&!g1TF0h+Y0j(8uFpQqs3o)f4x>cSmL{aHsgdQ;jrPU&!im7rft5zsL?4fQ@V|=
z>usitCkpK~+grntcfJ&E$9)Am?P;~vv*?$ku!gH-Od!PxuYF9hUuhL_ShQcq?ZrTm
z?{D@vSO#}`LqWjE*6Wo1D4;_(Xx?izV5-XS=OmrqVbZ6OjCJQlDa#=eddoy6Rw;Mp
zj>YT-(#@C&Vzc_x#wVjz^=b-#kj7ocH+5K`@{11g<4*L1=DCmOdiI>^PKrrnBor3U
zl`zpmk$)#y8^Ab{+QJ?yto+ugN$WR=X9d3MfA%)0MPaYwa?x(@V$Rg7H{pX^X5qZ%
z7D$_GrLRdNA7n@(E_?$P?>@aj#dBJxpPTUf&X*X{kHJ`ljT~WviGzxR7N=T
zbw4@zyi1bzn|`@J6)-Ti#2|hX3ZsJy$_s7A-
zB^`AlIpcU9!ie5b!herl3~|&H@8tS%alUWnZ@Y&NSHIE>;j;#~
zl&TKSV-!-n5HLYmq@TnI6!~HQ@bLW3Y8oBfhncwg7kl$)F#~lhpzt#{P%3b#-X_qJ3s&tG<
zqQ;G>lvo>rDTlfn>n;Gs%tdm$e0)M+mN978!v37z7F!Sj915xN&0ca-cjVxcI?Lh7
z^+@{Mc*)yc_W{Pp;!r$MkcWqdhy)nJKvp_nJUGVnG1pl{H=$ZqUEx!>uQ(@Eqw
zt@n4;%*Lvo`Tuhukh+tH9hec13!6Bm^+w40SzpYCX}z_k@AZ1*)O)YLufUTdz05W65r@yMtZ`!q
z%tjuF~G?l{L5A1`T&B50Bc>x2%8p7R=_>$LVzPkmH_xtsX
z|B}^2*=T*X2rp-Y%ohFrWlHLgCI&sqSJwS)sJoG6tbD7R=eZ6YdtlFW9nOReds2q)ouc7pf?;gSae(o`rSQezR;RkR8_8iU4wS#
zli=vbkR6w0Iaz
zlAwCcwuhm$)4`g~u#T~$u#aKLoL$l6mdbi)ODgHHr;=vT0L4b&`Gmq$+to^TuJ3+%
z{nO(eYs3>J{L^kw$HV2>P2fywdZ}P}K(R}?|6<~1S**Nd*J^46W1{|5Vmf2SkOtMM
zyiLfkzMpQyTF)6gDp~I>Yi@d{JC+#8#CMNHs4UlKJM+AUSpB+7dAHH56N#Qpc)UdW
z<-!ML>#$)3EDI7qseP}m5O_?VU8(bwRBvTp0-)~`hJD}FRUWw%CtJ2uP6r36Rvh<>
z*31S$Ex{qf-;xN%jC7s|eUodFm`#f=^_Gfxsn$6p?(?CeAc6?Q8$Zg_d&PwS*lSku
zmjVS=!bFnF`X;?@a13&Tcywb+B0Uq#fG`fV5Mv)0b_aByE8BgDfX(CUu5Uh~(2wZlD9sJQ33s0UasHZ{8T?;#_r!jfU98av4
zTcdbmP1Pup3*Y#u9ehDbp-Z7ZPW~>>YsF`;>gT9yC}42|Bs-+_z;Uf;bf~y`u)y&3
zvAbLnrfUS%mEy?xm&2e+sp^4|Q0J~0zgPhFF#BQsfNt*T0m7pr_kOH0C~Ek(8+dHZ
z?GYktBJwvBPFxca9+S5z*2*6>5ZX;1KSZ=lxWNiK!DK_Yfc1s3m{l@(@SjXqM7?eE
zMpqKP@p@~O@47AJg|n*xXtQRT<9Agyde0ui$yDx-_KgLVE47jK?eVcc;yyI_$
z%I^uY33x$I<(HQlP|;{s$QB?=zj>kwZ>M!VLbv&|DA1l8{FNR`abx(fvVZ=Vb!BJ7
zYIqr3*i3pe=0pU>zu
zt{sar;y-=Ga+rIVFpQ<}lQkz!5?AP3t7-DJr>f;;7V#wxpVqvEXcNr-g-DpxlqlywKP#X{Hh9?Um>9V-m
zuQ$#Y#sX(io8gtl{xg;LtuV~AO}WN=0Dpv@NSubZFrHcS#jMfPSm5Q)^IPhg(vF*h
z4kfExpArt0lcs?L{I8n16u^{zM^qB2&vYzf;e;ojzq&~`73xdOJfF5hH*yKdO37um%NI|<^t06u6j)$_qAjN$0f&yn;byWEH}Ic}O_
zpX^GY+>{I%H7AmUr^*|HFBd@cx8YN|
zX`s*T%~(+G_MhzZ_Oq}5il@}X{;2m&(QtWJ$ilWS0K_?-;Ri}UYgEper^3?RJ|Xhz
zmgDC0e@~D9s&k&;+)um#k7s#ay^2a8d9I;#hn%mN=hQ7k!he!t+;AEdWt&{{iJi7v
z!+%~vu!Z{ovbQ(oaTH`)71vl-jo4G)FH=OKJWps`)G1Jn!vt`H86iNZbr{!TjCo-a
zRtdSNm{C!$y_>oLu&ue(#zZ6kLqg*@)&S;=Pgj_J`|Unp0>I37xcBU*AlSI@ZRdF?P?Q
zbk5mUY_G27(yu!IM$+d0PygVZer_g`j#TCQ`d^pEsn@nrSF%2>y&Uv5VGEkCTN)O6
zQ;^1J`-0oWI?GVak3%O2Jax_Te~^XKvK?xy(>0RI6Y*<7L8vg-^=ZLU&IH?%RceWr
z+}eB1mhkLSIbqNbKCJT^MrgOuz|z+l>OIa92ny=VcGv5Z3Vb^7`=KJ?_LR~|0^3$4
znrLEQNEzMK0>fi4C*1y7;De&?_zDi&1uZHYjMVHJqcmv1+3J
z-oWU<;M0NzXi6TDy)D<5kDF?=kJ+uf+&IXsSf{a)?uyI3l0UI!zWc2IXAZQo3e!~Q
z8?v8%B!k|~5nwRqqrSg9txU|<88zMoCTmn}6!f@s^`|?($vh*gj%m{6+SU>%9m?VA
zZoyD)|I%RTcFJZNi#qg)kQBSD0>(G}c}+os_i0!zr-3aZM3dbZ+F5XqPUp?=*L9e4
z&u{h4MD)+fOTzkPd?Y>C-Jg`Jq8}9R_yG;8?}F(Yb*8@FOdF1n>YrA!>EhxB7UrD$3<{-14us1;1#j)F*RVR!Z+K${UXv
zKYi_U<*Riqx@tOOaBfWND3gLq#|gk;>r3{pk7?AXCx9_2?H5%JYmXv>T9Jl`DR!fP
z@0G#G(1vWTPwZ#frRFgvNSY+PQ&7ci<`TK`SNedN%m28o<^S)tk8#r2QR0cZilA+W
zL}|^e4GhVgls0(&x#q4G92C?wlIqGhAI*%?`B6sKgsiKo-;$$TOrA6b9;WcVz8e?C
z4&CDecuj?&vMq%LIXD4{QJRua`s8toRZ7Y9x{KWSN5w;{d9;g1jdJBpW_W@#Z3@!w
z*?Axi#aC~;KYO5e#p312G
zcpsPB;JXq?;9PHXq;2%u;aB)OqnJI^?;CxkoY$=mwcgm9p|^(h;e;W8hbgHCZGHaf
zPKot5)CnmrGnw6Bk_zW*i!(ue%Ib7N4oFKov-d@z|B|6&M0j(RL#h*f#b&W(O2*a3e$<1I_@?NQkZ7Ft7>
zmlBT1+#y{tUxb8KUyskc6iX*Y99Zoc+4RVakfACnWMfJE|w@Fi!EY|RSt(csb>gCZ1LR%X;z2Oc=I
zi*uzJx{e#ejmG;9z~!+oS+3C4{LZRP_@-;76cF%5Y(m+2V?0myGIn}gJr+eZ+cpN)
zmvA=02xG-3_o;wklAew-9*ychn7#eLdszUi!d?K){Mf0HxQwo%*&4BZt#nr5dD6(wZ
zUDC=)?rge_q%tANaI1kYHzPZFK`Q}jv75MZrZCy{pQ+fu-fT&%^lqa@Z)SZT6YH43
z7Y0rJn47G{nnl>=dsj{`(VYSsbLcp27k^YoSj9eNWCg!`@57^1vy0*eI!&C#1_{WmhZ1nb&ny1j$FA-NaryC0G8akWx$8V
zhU1;>K++v$^n2)C!d>0pOx?$>jm3+W#p=yT;wOkDOBNshZV*?$M?rN(>@-ok=ua2XEB>YKZyT)f&N2%sy8u}bxGEk9HNtR9
z#h@A8urj^sJi}8%r}zJXO-shZ7^QJf#GAsHK8G7yr<)$_NiI$#4<#k59xC6_A0BnF
z^)LwV`O2>E0lQ!elE|3sa>7IPMsu050{!c^edqR$7yND3FYRncZ?6!j{XUDg@u0fJ
zKbz$MecrrN6nj=5n!FX-&vQu0nXf9bip4Sdb!R=e;LWsX#`C4c?D(t6;!9vi4S3i0
zY!U;2B#)rzf~wX`>aCNU8f3>AbPO!V$Zv!J;UYo5S8DUs-M;wn!J8o*TGR=&!FemDcSH=9X
zVZz02WXaU(%9y-~)IYOm!%R|Dhn9;ivCif+Qv+*2aX#}V)!(~th9p(rZs`w4dAorC
z4`+dzjV4lC57RlbII88kRJ)RH!r+a<7=JR-IAq{xA;H(!^)(4G^?AW8QXjHSonelmc*BiGR~~7
z=^5c`q4>9YB~$j6yxp-?Tc2Q3*iTzJB`XX%+^^r5-&dFoE+t{-h*yd3ccw(t_+o?_
z#Ua;-t`F!-9`~abVGcQeF9JLOj`q-ZYj!>hryH-BiWu+7cFsxcn>3EG$JVo}iVs}Q
z_o|SUMv_Y$79r>KdGD8(mP|Q>nFYV428Xb|8>Se4$1u^Nyku+afF0zm#V|@}T@*Nm
z8jiwI86{Gn?h*^g$O#P;-bmFQFs4BD$HZecd%VAqoEvcFym|44AskPYQB}m}E0OBj
zrlQ>ojTHTxXrZo)uDU$-+@S#tTg#*pjv86Zfw43BMh}wlpWoUC?{Lo}56?dbw|B^XogCnHSKC6gn1*)KywRZ0e6qib8Mj
z`OQ&i91;%`c;2^I&F=+8aWRW3A&D_q3Y?K6Vn~L>iwo6X6Rr=_?ayf&GeTJcWN
z;g>dG3?-fvRIP-e+1TpST_XLYUk41KYDLsdF<6$-{8Ez4^mrnI_+-I@)4QW1?dVY7
zc-<5RJo=Y=ej6nB2VFM@9e?u;QckY9V;Mzl_m&6m_8;d~*0hVh4|=$8Tzi_Jf6x?t
zIG+lLOYl5T6rv|BgQM-uT$UkbaC*d@n4&l!7aJWD*!lY|{Q-*^&;$Q&Asaa>ZuP*hr({C!`s?Y
zds?e(eRKPGkY4IZc;S10YmK_(MxAE8E^IN|C+@5aeEMx+rc(L$r$Dz3x0HZF)}$OV
z3)0+@L@TFw7sZ`F(oG=WCHKvq`U6f#P2ff4UKcjEY9x(sR9>l!$lA~V>xi601qXiR
z%Ci2~uCXa5@p=$}Hgf1MDL%Qc9DW&YcbCTw!Nu$Ikt0+9PWBJvfRNzt3OLlZB+}P`
zI<`LlM2dFSd7HK`(q0zRl(Jukow+=;j`;d`rI~Q6L*K~)ZzE!Me$T2`W#fakP#ya81tm=ov7s3n!Sf;U5WXs
zTx=e(K$_Nt?BG}a&VJ{#5NFx=X9}*1EBrJ^$tRn|o*a*Ud-huiAE|Rctyl3gYJeN^
z&wod(of+b`a#nM)_g(si>P9yh#YDnDAjs6Y!Y7N4zFN6-FYKr-WdEl|+*PjMZt<1;g`uGg<+VM>kO
zz*Axbky4L}+SBj_vuEAmH$LxvKL%jvanjV}9kmja0ucWZ1;AQV_z8!?7(4?sp?gW^
zX#L#OWyU)`yb+LdF+Az9L;-9vjV=K6ja}q!kDLOt@x*tb?x(%vccTDF07}u$jjkuS
z5k!apE!sc~O^IhOEVk>l!3fcGc$T>&$&t`VrkcAnS6WkbhNDKZ>Wo5Ik?!|#J5
zoV}|4Kk7|JIrxL#kLtpR1Q8{IT=@dG7uR_Y3wb^?A0Zwx{sf0n%n6E{{c&EI0J>{q
zdoi_3QvtNsk~ouA1<`-7Wb>nYj_fAD-;Cpw{!Hp;Fb0(XouTuRJ);15WCN1x?@j^3
zJ6M^-Jpe%I&~c7EDcCwhA-|wxXKUFivIl6V!f*>kp89~Y)&phNb{^_2S54?qM;HY5X4BaJ;H`I$9Td}l
z7O~_1#~u2NUb=>mM;SS_%}Ci8$ATR%*pRxu2z?=ivk7r?*RXcUCOi|*wd%b(aNX1m
zU3q4%zAzFa)(~vf<2w6O3SCcpVn&2Q`y8jd>Bcf4b+(4juh#+0g&2qIWB);3ae6
zVR)-SgDgFwl)=}VNo1Cr@GmhhV0NMW=g+CLSv%3!OkP~l*MdH1>G=bvOf=7w>1(WW
zi=D#Y3p2|p7PlxSQ(;+1O##v@
zfC{u8DS0Ar1QCJ2R03uj$e8RB`M!u<52Pis=wjhLcq3Lm{zwVDcGX&jYV7v7eR|RE
zd$E5$E{lJ|M6dT$CkyG7^6_EUhOI|pBjhU)La2$@pmXE$pFge)1Nb4ofH(&(Cf@q``hTn;
z6S8|Tpa094I1oc-=veUUEIRLJQu@~8z-|iZi_`#fj)iKjl!Y2IdxMlXPB7V}`KS(<
zW~5XA_~DH_hA?65OxvqNk+TuLsoH+pgP6c+ozzYFp34MBS`rh)4Ll!uQV6j?c6~Dy
zskN2^9?vwcC3oIxe(3UDMi0W!q~FZ{H+
zn+9;x3=p-2DLxJ;lHPX)7cS;^0$y)vxp8N)Gq4`@N{
zOQ1LHMkZGWC$QIaj(qC68_(+UGTb=eYd7ulSwxRRB-6Jj#g+&H@i7#OQ~aOqPh-r#MD8`4pJx
zM}yRVBAcwwnO04bh}5s2!Pzr_A5YguVerc~HBXEEhFPRg+)>;hu*vumgY0qqzi2;H
z;UU!p#o|wOS{q~i)PvyT)J9pIZuCl~_ge#&PzOUpDLvS@q$2-Efq;UV4>6&_NT8Eg
zlu{U}g)Is&P`+v2A~w&lP$!I8MJR>9C|S!t!CHZXfW%(TzNL1D7W|phO!J5%1TYs>
z-%sZ^-A>CD(^_$2?)pvOMz8Q1ku_!n+S1BK@O*ZTY4^1?$G>_xj1Wd>OcXWBQp*@`
z(y~_jNhBBQ244^7t`Sd)KLuW?CV+=QA*da}q1*JuQaWOQVNirS@W9yd&~(sdXMZw%u7m5E>77bqasylqnN
z2~-iIm(sc(j9^8i1DM>3ei{8Hv379(C(UiG>xB^4P#^Y$dE1-a(ZL_
zo3#Tz#PO^%xghL!_${b3BsGRSvSQkWON`Y+75vh@o&pQ4hb=zOR23V{0_~ECCpBsJ
z_sQb5{Om+&;o;QypJ4371;k*u>7%i=@cV-riT`>y`TdsGEF+J^kH02wTxB8mJ+yl9
z7FuZ*nd#J0h+&b3X?*N`7GGhCq%P={yzC7hQT|_KdswjM0vATIr@Q;P+i>CY%4WhE
z(ADiD`O2YeK4m)plTP<}^T=-qb<7`p_GIy{ekC=woI6CzdHdXQdIJAYYT`t0r1u`_
zHSjQ?0hy8hbv(5j?H!VtG2*M?EcSeopLi}01jT^<1m@8BtdSJx&oPPYh|I`Tu
zuNO%Zsqu6(S?a=Qh*%;@=(Uj_mTSO7s?4Tnh?6}7@2TQDL(vriU30+c@GG~o!p2Vd
zVEMBjlEvAqpFXyFusWTccX}pIU3ywQ}
z1yc;jIv59gu
zeA0C8U*WGFn`bWbUu`H(^dn55@71>O700NNQ}eoreRA7NKIO))xTC6
zsM$!WqgNBy-|;rB$jfAdxt&_-F+XRJN6c@lvltN>D*MAdnX%thqpTJ*nonbqRVDG{
zN0f)07%2g_!f4cSB@%|$mD9&$WGI06QgG*H#H1g<2+t95TUO{}$a7CB9i1vG?4sWw
zi4s8K+%@CLCh6dKn*6}}YSQ;^+a=L$p|;3-X>7e%
zWt2dE;^*=1UTL0MN9%d|UNmd&$pi#(d`~NY#sO*}413OC9OU~X=FjO#gxi~CKOk0T4^jT);SEp5;H6_|$PY2p{TH*miFzR46@l>b9Ayq{oSJYFX)^CQ
zgRj-Q-@SYXBI`JdVR)3?dY#YLteWRu#QJ9;^+jkw$-U;i0L%$?;sh%nuwC$xAf%%r
zi7F2cJXWw;u2T)~ZCpCCcQQN$lbe?yW4Jj2A~DT--A>SbLV$NoZtsk9-u%2T^ks9A
zc=7F+%cN&GO63E24-#`uiw6=5nLOZ|J3a|{?MBm{AqQJ~8sPheQi?qgf2T1;Z#UP`
z+AB({>sJO6%gy^-gN)pMB}v6mP(03hHp`r8P(bt67nOUJl1MsV>&1YpzY=2u@kzfX
z0P?8*7ErUTxo=`R#+KN0kK=`i-o|+}_95zR1du2lj0QaOIXtr50mPv7E^|7VAk^a|
z1G=mg9nRtESqzV}axcc#)*Y9bYJhGu`sk?fPq*%XjMrVPauEuI+$mAG0%BBqsDV6q
z8Wx&rF+Q%EG5t73wM?eKxYbXZ(pZf`J`9!!Yz8UqmEjkzmP6i->ku}1KxrzxFnE%l
z&o_3M8V4y4%S<%CRVVHK(_j4*vBu}DmAXf*+gpu}OoAWAx$;E*DD;o6M}rX3Pj
z$36V~ZuMzp?1jkvNw>C@m6d}EFE};5=2~p;``__MUq~x7
zUGQ*U2cC4+{YP}UPrco1KBzvM=#|A2d2!V7^yo;Cf1kvoD&soHP_AZ20P8cOekV{GVp5(hG*OwVMcA34SLuJG>j#l{VuSJv5!x;J2KNPN8Q3|?9wTZ&8N04F~b=Rz6UsLGV*0a=uje$i!M7+Y-
z-l*YNpse*#JDuC~CaX(|)XG9G8X!B?
zWg#T`aErnxN@3nSDR}v-{!64I0xB%t?oiMaQm-4b
zyw7x1I!?U)E^Gp;a@nP`-k?!W3wl?bFYdbbM`-mwOE>_Oo@XMr@M%{#4`ld?A>%%)zPeMH7X^e<85
z{hzhXHEoyhb#Y1Z+gT&0RHS)jnUxj{{b&UJ>3q7%dE>|QrizWk+0vFJ-L-j6v~|_?
z;UmuhA5QQedPha4)=SrSpRnJo8lA~3Zig2PV9VrWls5QN&wYP8Y-*Zj5^!`;e;ImV
zWA}Th2kWOi3F{wHhYiEJ7&5~Z+LF!k_1TPMaJm?Z~16^ubzzpxxm`}PdhwbKELq@8XD2uiS_>v8m<(l)U;il=ru0D6ic
zdU}ztZuFN$(yo<;SL84}JglB5L{SYTY5knK!OB(g1&+9`@Mk2H1Icd0FXunDlSZAK
z?@eKwe}6q(7u9vGip>lv6RY8tO)!(v29WLz{U9qd1rE)SRwN)9glW<)!NV&|E-r`M
zj|Aor(#VF=WGO#i(d}L4BT4{$xC5$Dp5s!z2xrg*%)S4sR|&pCta|@f*LR{@i6S=j
z4i8=VU=n(|d2`F-e0#r=Lzf@wxEi%H8%x8Wn2?5dMG`Yz;vDxx+_~8wJ_RGSMIVcO
z`9|Is-oTz)8^X(Z^BvBIl{1Dx@6tmkDF@Y8S+MEJVj`pdhbSqJyg)iMU?0Ggua_b*
zMBy}PVh+Eq%msU-hGND1?1X^!|Lkne(&F}D@p}m}`g}RF)8`M!!$eGlpdPjj5VG?CVb*)O6$`h{&e%!=GcfS;2@=Z!8;t
zhmDrRyeIYoLK#bRkXF-6vaFsf_z@Nw5L$uz{H7ZOJV&jI+tjoQA@{(IV^ybG}
z=D+TZj_rNB$!OM`T^Bvzj+)Pju3j$PdZ!HVfKES4obb`WV7US9oyG9>pS|y?HQ(^{
ztCgt_-w2zm#0cGYX{_OS&d)!V#2rj!&HO7)Jzy=ppcZOD;yxZt`N`%-_D~xEK$caf
zgJo2BeLE&KT4z3d(Zr90+@->mCssml&tFkCB2Q#L4Kc@*C$9?IXtHdEvXXwbmAxZ(
zw_fKyS1?r16?^ycLh$w#pA)^4s!J)81^UO@Ei*LJ(C+5>(+J)G`5DM^R8
z+KkEltGSY^qs9C668^>92Bj8NC2g;4b7u6kbRlUzn%|Bq8MBTCd3~xsM7DW8PF3js
zAGXdjkgfNP`$_CgDKSH}M(sU{1VyWAR7WN|VkK`oVre@Djmbbnq4$wI)(j^=`EAKQ1`rtnsYR?p%R2nHS8~
zj3q5*#ObTg;JV16L0FH8UaG`cKYgDYCq(lose3g*pU`<_(_roG)XdNm^~Q3FQoO41
zokXD=TLzT2=Vr9ZuUa7ndi0>N@)Enals}5q|RK%71Ly@=}noG_|?S
zQdH5&H5cfJeGi1g4xS;i|e0FvpyGkZ(#Ww#SLs%SPXitZZqz>s_Fc_
z^mU-h}u0QpMc3;&)kQ@fTwEZ1qhx=zulY=(X)3-eT`1
zy|#pB)m>q5X;}7_+yv{HpTxgv!ENFW8+*&Jh7nYsb@5CVb_w0#Fru0Jk`tl&-*9W5
z!uQ{-=-F^W<5C8<@`|9sCw5+XC4I+dN6U^OhxS1840yK2LyYwUWz}LupeP!6Z($6c
zpOuS%+AkV;UEKsb>V?Bn*v)CDAwI2K7{OxeODV6X|eE`Nt1=Xo$C&yf`H{I;4^_CB)UGzAK@qj!&d>~VZG_|F
zqbF=VC(bOzs!dDyIQ#xu%IZArtknQheYNbr6;iKXDw!1Yv5TfVuUF59r+QrEMFnE5
z5V=L%tt;M8#BfG(zV-CkJ8e1^66`MjO9rLAVUt;)g~>aY4I*V5dOB0+ZSsy_pa
zHpdm#N`7*G2|vdCa6*-*xlOt3ULGcm$Yn$z?Y)u(eTtZk+o(->#=^|wwhlQ+zr&=v
zw;cp)mFp8*4@$TFEXe8oFsKG9N1Rr4z$YI79p18I~+FUP_p?XPK$9Z@h*ucj#j{QCrbZZAX^K{EAc#!|DIxtiz?T6ZX>c`lHq-*ll!7?`>_Te!*oL-Lo*mH1x9OaQNA@
zpphe)3R1s<${iyZmM8KnPZ26$D>-S)+94mEB}d9cW*bOz4Rtx~cW
zN7P|q4=S)mQ&p-ynR5Xa0vVj*rK_F47ld{~6pYAeT@Xflo95-#u^>-SbvRSO;Ws-@LGlfIY|m{h?2u$r;0%1b~R
zyeEMq@Laaqdc7$Yf4v_-4glQN=vsOJ^pY-VY**n?z!DTh#Z;yp^(aLcXK8^
z>#x;R;E;~M%10t^Mm{6coNCwv%B{_5aiD~b9?q8y65Z)L0!Rt6MvIBPTXpIBaU)>O
z#C@_$WBqXI>SwALc?r$E)<3a%PLZ1al**d=c1xNY;V-2>AFqZXP)i^M)8jyKYmJl*
z)Tk+g-_X-1K5{F53p6pWzag)>n^bmzhQbh^7dffC)yp(!oajWLOUdyohYvq%+`upU
z9x*L)b^EKHD~%>B49hEF%zPzh3NI0C+7&q+&np_v>KG1{-X>Pv`|e}+e8OeL`7Mj3
z9$lYxR^MCkzb7lRYIc*yO*I*^yjwLF?_GVc5r;GD;Z`?N2P3|ZPXAT-TqPeyq&>Rn
z=ou)ufW1!i`OueUmHM4wc9vA-?Zxv-tA2Tcf}QJ8WI@C6OwZ3R#pPK>=|NO$;mbRD
zWE*gRQmOT2keDI$WBzH>yT0UcV9~tq$XIDWN~1gyI{ARMn2Z~7BfRboC{`EA)Dc|f
z)1VE)D3Me%e;f__9Tp)qo$TSqN$r`3kzpOXoqI~{0{l3)YxbQ$YIW1ZzR
z4;R7&@^875rl;5YJpvaqsD1_`c}kbL+)I~~-$<8t1sKrTtNM>)IS6?T=NzV})b1cE
zOS`{ZZu}ZduAiio|6dkBbcaS_syWPFQ?*JqmZf%Q#`&~9V7S;KHQro3Wq}L!{&hn7
zkD-h@x|xJ{pHiu`%Gnt!r1p~CMCTWNU_p*xtm^Vf`fome$A$NHPsjh6oVwwUQO@V-
zrJS3~{Qi^EYq)Izo79(5H4bU9TKjUQG9fOnS3BKL>nsFFXJ1l`%&@lmO3cMQFa^D1!@|c{E|{>7Ym^kzk~nn
zH116tb_JZ|dpJ=%G5MY`>wXnpu{k=%x5_VGbd#|5Wsxq|%KUVzyz}rd&q*$3=0`cm
z=X~VmNu-QtRhrNbH&2UY)erdT&!}zX=NwcLg{oJ#tToHYyhmAvQX
zCzX-(y5|}@4F~aQk|()wvV2#tn*+Pe-gCTTQcTsV&EqnP-yDygj*mU0sq$yjB&WLg
z`Y~UA%4OD#h_D2h&i@r1{`1XVnWzH<6Vd~?)E;s}VUFj?zgsDxzbp#R8kM-=)zeXQ
zw!hcNk|*}yePT~Y?BQCh7RfWA&cVV;Y#opUK{|X;DATH)^|Bs_Mur6MFAnLVF|^)H
zu`~7lF+c}g6ktA;BNMMyW5Md%^_RVZnyIcQm0KPjt$drgik$HbvjOV3Ik48X!J!-X
zNPbd9UdY^Hr-s@^@4DJW|CEUB%+L1Nou7xt-D4vc-)1W9NHFC&-WpHod@lrsVOM*2
zYm}4sB>%XQt3k(5+PEJQ!IqQYS)aZLs1IL*DSm6W+n>J)Yqi4}z8eC=8HQe3>t#4W
z92!h^3O(c7FKEFMmUT=;Yoe2v3+cq@S+R<^O_t#1qjb-x;XLH{c~-qUfZ9cu!5g=~s5M
zG0d1FOo>7^F7zEP^avz1|NJAjUp}LbyA<%)eD3+Q*p}G0kw!#~1%q7y@r0rYBY9EV
z_Y|85-=y!{glDS`>}6!R
znHVA3E+_rF@<#qxW^-ll`!Q*8nUYD5p+U5AWQx&12k&H-;4c+|>JlooyY@V4c8REW
z;JV<%g9}FElb1Ejr~1NAWg;R;=f4HroR{3J<6}=6Ph?>RxVm8pm6`Nj7=m#AZpqz3
zhM`ib`r*a}`fm`pQhmvhPwK;Dt(DHNO&Jb=+w*U=gRFAlkb}Ym1u~+vsp2zvTXlB>
zqWP~j_fqTl#icOu#{hx|QvBuC&Qef9IldsNPKn`G=cnR|EEA@;A(F3`Fe3d=0DiwQ
z<0*O0h#V3AF5#+!(L6D+rJXWA#pQjFXx*aUmJK$f41n2
zt0LO|&I>B_CDF6G%%ZU&@T-bW?)@9xQ4u5=dvL&4a8>~i>M49IK0o7EEj?r~TaqKL|J$SF#M
z=~as)xrm>Q%`z&H6q*pplDMNV27Ruohaq6n<#P)m@)UZ|ot7oLa`RfIiv~Q)*SYVz
zl4Rla7;UQveK5D6&zkwN$UvCR&Jjl9g5%~naHU6p7d^;+_(@#lu
z>}1_FxF|-S-4}Wgu-K%N&Irn!w|R_>Pk1eTaLV+`)><8n<5bP}Sd7r)%%V+xpogo~
zrF5POi{ot~PJho7DQo4DReMJ8h+b2Nou4MVv
zyt&Vw)w}Ml_#NxkJ72;w7XBK^ej0nv&3jnxOw0WO9n!P^DGr|~7_?>>+5AbWvBW!*
zZ`$OJ(%;y-#mwBSJ_TE!p9cOlYolt@x1035qkLD0+{|HXCDw1nQJ#4nbN4;4UbLX(uT9FJV?-B&Xhj=VnkSB0MW=<0X
zkm-_W!jmmK-^nWa{uGIlpU@{Rg$DbwinogJAHM0e_qee`cbenM{W7^xhU<7Z_%
zX@$A(Qwzf#tVvF!7k^QzNuMRt$5|5f6so?Wy3FvD;K@{)G&?NqpQpl+950@J`lv4u
z#>ECDcqb(X^^gk-wq_(=y5y5$hX#Y{qF$qWB?$kNAdxXo=U92Fy-v`w0W$uvY;Tlb
zo1v70ttoDIa}jP5uC3o{hZbsHJP0e+LAS03TIR<8
zHBsEqDe2wZbC&hpd(^UmFVi=(Pm=mV%nhy{@w`S!b?xn0`ba1$7ou`BBm5ReGxi<&
z-mm$JFP3-R=O@x2HEMpd%bL4tRQDLi-vm-&J@5a?+?8SdW=c4BP}W>QcYu5dgyk#;
z!%qQ61QB-eg&+q?ac3ahuU^WySYIlJzq&>6LBtgnR&<3O0|PMG{aax&>HO_1KoafL
zA6&j&+N7ma{J@cL7W8|d=cwY;N)7x=!>g5L_+saO@8)sH~E0pFB=WNR+1on
zC?%c`7}%%i@c(gA%g1pQw9=G@RDO)xKGG)96<|o>aBY{d`7iE#dO!$yh&2iOq&id2Yx%j81gAWWTdZP|1)$(A=sGCN
zJt{f0+gXM2pIDfJ#&E_=g&g=3sX1kWe#}X^p8n`$amt7c)pwr{lDmsFmY=T^LpG9q
zK4IT_h{_GJMt~35&VLNX;(a!%o;(nP#fORKy=kM;NJwAfa^o1g@u5YRW
z%=>YRBk^gh(o5RdMvr_yt&$x+Y~CuK?%^W4D=@{w-OUl6_bCz2kkst+Qbd&TeoyZ1
z#AiqsHMh;#=SO-eff@|^0RR&*|7#}>v{!won(n7gs+id>%WAG&*@nvX?7a__v%4U@
zSnQbG`>|v)P9EIm|6i((B8BV{0g>O$^p1~2n)mVFHGro)YL|mD82@|6$W_il>RxTg
zrAht%#cr_m)#pdQ?Z86<%IN0Y%wNX7Tu8K0R%p=kH-FM}Y-&WXwAUVY3y1e@1UBwR
zu4*U8jmZAgjD7u*v>x$rog-t^%51cZA=qB#_0NAf^N>4DTTlH`cZk>@RT$8F;9#P`
zOBiLyuG8PPG-};9#5^ai
zW|R75Ue=`m#gi$gNEm(@5MXhGCWMhm&4C^j{#iNiVrK^hpvzpeI8aiRN_)GEn%KjK
zFL{1jOI&Hr=XJZxzT!F9pG%-myg8X#`4A#o*~<20%?j4AhS4jo>uMW|!&tPjnu#lsEnUpUH)^lwa~g#@S=
z@aD*)*(K*!8s|=@2bh1@Zy|v8B7Sv(5eKyMX7?F*`{2|Z*RmUJUQ1;ZU&HJ2^VnYe
z6v%LiyAAp{%oFhb6^_1j_OF>C-LwZQ%bM|HpIn6r467qmz*W(6{0B0U6J5CylIitJt+^%`(2R`=cK
z^!>atkg-F8D+GEOky!bT^xvPLg6iE*x+8(qjWm|>pAv4>ST9|&JrJAm|8
zx=odcAZ6^$0&&>(6@ezIU)=6be&iHCXI|B%b^
zu5I6T&)ob(3Z#G4S&;dCM3t{A6ABl50)=#?cs&PsoIZTexb&D}wmY;SH`C(LRWO?%
zwgInIiv8~A>OTx~XAjJAI;wC_GY4Pk8pIiTAB`T4`Cg5!`IGQJaoQ58EIJ-_g%NAh
zSNiv$yQw3Ft!@3Xz8Bj+%0XMcT*JQf($m#_2|=bz0R*vb(=O{(OL!+VtIe-eXTX2b}t-CE>Z0
z26{2*4Li0ABtWM>cT(M5mr$hj#`b9}YT_U85M?eJ{Ea_yyf5h6FGFbPWc-CGvdV*h
zWqZ5v;=Kff@Me1U%0LRTHTZ|dG!uO&`4EIv%epG@0
zMp)6^_L-l4F=c$cysO}F*ItWP6sCl#O#KrTc-Prs(Ib&_k-8`OXWqi#JAmBA`JEwn
zhnyXu-W!x50$YQEc^7JcAe)gmUBjePG-Uy5)+|D_UXX($~ybqOcS4HU2c#8mJ|f5j1fMRQ{qAHWb#
zOVpYUIYzTeXdZUatKaipS+lsEsSRW3MJ}C|k1TN;83|8e@j?tnM)1&pb(vMDoBxFQFx(Lzo^o6$r
z+Tm6B&6#rty~U{;&J(%-Jt979x0f`x>INBcwUCkBLD!vp7O6CvaX$i5sOUAH;Yva+=BUZWyqavrGH?to8%8`4?
z$Q2L;I4BjR+D7qHRGs(eQcL@f46EK7c<%q=92wDor>RTM2Rio(b8hha7I7?*oJ8fg1OJ;D|;Q9mtrA*tPMNr-Rb(5d7CL|LkX
ze^LJ;6dPMK2f0hn5|vhhC6$@%8qdt(Ut{?Kc<(Styie_+r^k75ohtZ6uq>q1l(Mm<
zP~`GB8?nh?spZ=ocowvvE0aN+Da8n%`o~GI&kUyB*_5zu6748*-gOGrhDxeC{Lf3G
zq~YRbrI)9XYCg}or8!h^fZe(!_V-t9X2u5@8YGr4(O&v(cjHSO8_=g#k8T;Cu)5yY
zUv+Xw!59dV7wF2|nhS0OIk%X%&$a|T
zmHcQ9%1$n{mPFBP6rgTw@R?Lch~PSQ00iwj8I3UeTN1yXiLAj;Gy_7OKgKJRgGE=Zw3-)p1{LvsAS+GisBVac>O=V{lRv`A))@WuG^y{I}|%
zu0Yi@X8+{K_1=~B3Ycy`X*wS*85SGUU$lluDkoN|aJm0VmLbK|L-A`Iq9O^q2)oRp
z*U6f$ZYf}r3tnIA+4lq=GB2%z<2wA`+>Jh0=gxpKm=C-9r*h2T-
zP92J?;%wP^4jdx%rhH$F8&B1H<_TR6Fn}9{u!kTd;ah-0ov1e}XM`**I|hp`vi>K|
zICHbMr(8^ZUjm_G#bzn00Zi3fdTy-gx)R3NOAaO{UucIY4bXt-I?$I>pusLQ20h`7
zI6jhJZ(Tm3i#oN{1+JlBqG!M1Hj1AV40RZ2I96ji-)mdcI6a0z4SG3!+qzm5U}{xV
zwzs0$c>-kjk-AsOun@?eN4HqbZOTFAeBkDCpaOWYBB5Pws)^)?`JB0fXmlZ!rWhy+
zE(}@fq12ilt)82&y4Bw;l`_l(wu~(kpA?Jt+Pww)SdA_WI?5Uqu(|u1aZ)@mwnXq!0qV?+hxr}?6$N13zQQ_2_DD&dPxH(
zXK`!A8ZyXDouWLi&sIn7>gq@P++56F+ykUsS6A13fQ76cR;hWmCL)VQp@mojK*$b|
zGB$Cjz29$&5K9>)gb=xJaoH~Yntl6ku4B?f%_xuW
zBJq}{sL0t&9#YcZCc@l$4O|Amk>?uBMpOTjK=|{=eU%9Nr+anTB}LRxSqKKYoXDkL
zWMJ+~xt_%T{wLR`|DUjI1_+;}eVxu8mHDtSynNTeLt2ecJEB6n*{D0|L!VkIS%0%a
zQ}rR8)$r2OH9*Hbq6m)CRSfYqe<>VvYi;7o;n*tIX@4d
zx)rXBve0DkXUf>wZ#v&q!xDJafs|hF^D5s93A<%Ls!1f5oI~G=jtk*4ri8?Xiq~oc
zmJEmyNN&!d6AofuxzY6VQuKs;o`-vB)WeCrHK5>P3v!SdPP$1Ul4|&iLIX%M#6nqr
z^YXJmCc(BZSd&QEc%R(>*y+3?0A{Ht|Cl5d2Pg&<$ttT
zeH)K1#{!Njl{{uVJ=RCHC`wo`^&V%T2MwUcM~;BreDM;*dHbZD$JvoR9@?-)j)1wm
zmk6)N9j0?2ZOw7==!~A!G6uzg5u!p9}Xq5bS`b?Rcdi9oEB97E4I`UE=5Bza;!9Ja=wR4q(
zlpG{(q&A+m%~~4oH|%9@>A8Z>%*V#V@a|td_x1Zy7Qd1=7gl@J4hM#Lp)=0@bm36Q
z%!S|xF?3CU^1*^2;Pp}^>ao9+3>iy-s~XzfEOtyJfz3hHC2OHxQ#WSJUi^S#AFw17d=C&OnM9$)t2YFVnwx$gIC)6`W4fAELY}giiM$5RFF{743s`OPH2G+i7kydx~v>kW-^U;GMR}8D7RpnUhgM
zupZ8m6deiTO1%&U6)K~b$b^H#ewyG8l1~$CJ>Z4gxdn#c-&Q;ac#CGgO^0O>{&85T
zfw6L;wxX*r#K;Jj#2y2tCufqsyse)_@GHh()pv6M3YN_#SghVPQY(MvM`P+0ww0(6J
zkQOF}8x^)!e|Tjjq_<{@fc^*=0$u!zN~DJ6DP}gu!(7}Uzl`scHHP~$7L3_HHQ~epao)o~ia^|ds^!RYi8HYA
zZ>L-Jr5OPQ@m`LQK;$mN{m1+fMr|A6xLrorfn}0;9|;J&gGH)1!QAq
z|4TW&+3~Vra^MJ1TcP0ax=#e1seyQMEukdJkDp-5tW$(IdWp1KE3jwx*!HHluuud@
z5Gm=Z)_j>N(w}`#8UHp)Fki}P#hqn6n_R#63ouXHWir>OVZ@=O7<+`5nZz_NLtf9K
zb~-N+2|06m&e!&>18l!jvdu3F8jO@)4J(Yckm=K-)Hg)R8sexNQFkAkACw$&2RyY6lk0Twe0#7U*-+cGqaN6Ai51ZK#lyv)h~ldK@;6dlZ6BC
zhq@Q4sw$WUTFETOKmPRYzoBZAgD_MUME>2#7R%|ay<3St4t!bbiK+O?Cq{$r+yq`w
z!6qZsH=Ba|#4z&?i!U(PA_yk$FNDTIWzC%zSgDi=#jK~6EBv!a9U2G;9c>(br0|>
zc7y=*vrx?xThWJD0qIxo*|QaIPqU(5wZFBnTJR5lZ1p0<^Gb{n>ixA-EpuL
zAO+i3OImwPojUj&3i_jMgX>ZHj=vsQtqtd~7?v9O!L)k|p1dlUz;1o}4(0K9l=3*4
zO}FS@XSh7PJs=)uWOUHr3W2nLb)iuDuwE<7slHyV*{?aW?Ui^vN9Qiw?ws_d3BV`U
zz80jbQot|mJ?EMUh6~+v9>l^&Kx$y4!JIknXR3zmfx-|#EEb)ElKC&L(n;#aCLk-K
z!)l$s_T8>u3Rz-IX^8j-3MNkpNlmwx4|vpq6kqR}4@76ZSn;e|CG+ix=kv*$qhi@)
zo2holL1Is5Bn4Zr5;o(St?UB!I!p(ydlS|rXYI|3YIL_z)^gHVdv*ndt!PjW9-Y;@
za`uh^6|PLVU>`{$^=79;_BM?kE(Gjf_l2LFBMWbMz`BS=y7EB7K9S1R_mu7fKb(?c
zgStNNf3uc;@88?LjVcp1NNJwRygcDj5(ZexH5cE`k)g2dz26O009A(+=
z0ZTMZG%kTQ9u(KBBy|8=kJo+go8wwU={Ky1o-`uH6R3+gXnO<_ufj(jkZ+({rIAvP8{w9lAGubWRK5`t(AD*MA
znfaHIkT#}NgBcgP*V$eQKe)!U?EQ?q0W$q1n}me1_V8H}`910td%@VMnellPx6MSb
z0%ybyK7xLw=5v|dQcMQPo9z9)2b|TME&CfoSrW;F@XIX4{|+Y!{YFXqG$Jh(j)e3J
zf+CRqk{4(GF(TY31LOC&G<5deK;=9Ye1Uyx)^xRHH*$3{yTcPl{{Rb}tKSC{A*u8U
z`G!fG=L&`eJn}qroQk++uLr(vae_z{Ji$&E%{ar$%@ENX5;=~53n0s4P2@x~X(L-Kx!R;u5f
zx^?@GnnQlGD?PtxUwr}aVMIt(uOhOW$RAZcG)(|cNhFN_lEY!xSxb)YBxt~U`Q*z*
zE??jxebAlOfgmx0f943CD#)Uf|Li`zp#7N*l
zmGuss1MZ00ZMyMor2jBbVb-k)Pt(C`m^*?NPdyy(RJR?tF4@nothEp*f7q8q4IzB|
z1`R@ZhK4D#NhM@dC)rC>&YTi0MD&{jW2)FfJf@c^JoZ1}Xl$u)XMz$hyf1%r0&mUP
zC8WQi)#cUfO#`DYP&!n!?p*2ogbTo;s=f)6jC_%m-6ubIQVIX$^vbGp^6WN=%6Tm>
z;SX1^alT!mcq=PfpxS+AKBhinO=IR%>k{hm;+pH)ZF4unZR>N6jC)A5c=vG7e!hM>
zv%!KkxBc+ljs*V%3Q`=u>S3ph(qeMPEC#VcCaO{w@VWDkZ4zX{ggO%^a(6f6cH|Gu
z^a-AKY*x}&<*>Wc8~%-0Zuj7sKOSCC4)6ZGE@&|aIl16-I@StGn|d(^R;um85e*F>
zfTzrNGL;H4V~JHs{K^;c`uYioI#0G*@2!UcKZ3l+fw|yi)Qin-6?#J*tAkUn$7jl0
z%XRnuYJQZ#yE?CB&SD?GpS-_Ki4g@&(;lm7??HkK{{ATUL=V)@
zK$QO8pI>jywqFvQw8L)vm))thpMsAH&M_~Co)=i&;bhLZHYOko0gMYJ*iZtOi*g6F
zact6OMkVl7F)6D1G0{lIE@aH71Lm|#r4n$aM=D4@dE)GCDx#D+
z5lv-+xN2gxM!5)1W_(0{IWx6t>*~j4Pj%6+2`i;VvV61or&n;@&5N)ilKTNhh3RU*-epj$g+T=#;o&gQD
zEzj1rOuA+YIp_ias-I4I)JF&&{zS+*5`i9KquB
zY(`y-IcSP>`w9AQsjJK~0)2A!Kbww(Pkvc~{o~qoi%hrYH+2%f^-XT9we!(7Z~A~=
zI@~QtbM#1*Sd-Z{+kI!tA3~kadj^cO2!l(Wtql_D2_6ol3OF?}owtAmOUCb&3LWED
zf9n%70SOlMPL*`GmevBJ4R3jlHGMAn(X3-lI>F~~qta2#eX;wOhTNH}xc6IjNe}gp
zhs&I|D#!8xDRG8OEZsKU$=z?21wQbMh5sn};VUMu?v4Re;AqB+Hmsx9TuqtIYkYKU
z#u=Usov-Xi*c>Yd(9!!R}|x9x=9)>xGQlu
zDhWoX{%5K;FaAYoU?M4vL_sJvusk#|pGZ>;K!MBoG}r8h0+cxPf(s6Gdn~TzDo6M(x1c~my%@)_D8UoehQZh%OEs9_
zeeb_!31>2*Bm`M}p~_o%yrX2jt<&NZzvkC9Sqt8zlcui16cYj}iW|?@o!jr~o*wk;
zO1-)NrG6K;Tf~h?GrjcvJ;9xd~
z#AhkqlbNFy^b1LlsBL*9<;zLS@(I4z+tlO|Ufh~UJqn~JX;)g<)4cm1*%7z}TB8-o
znR^0IS~Y$Y8Xd)kcn!ff>rCNy7^X_j`3Wti#|hLjjg^-UCCTfE1>=J+(vuiDdm{pX
zFxGx6;Ci;sF|Zk>>s#dJa8Vi7^I4ClmOY574)Ii9lO66A9~WB4$pO8ZdM6`*!|?9$
zV*)XG>g`WbCMzc%2V;MCpkT@Ea0)5ZT&bad*hQuc|N7GK+Xulk2ZIu`UgeR=g9MxR
zS>dO0(z`e1CoG!r%^$WyR_;}y3->9`2dcn!aI7x7gZ9WE5
zdr5Yn!^uLJ)C9GbFbLLYkvmUcgR^+=>v#lQpJ*S**;#KlN?e$`nY*q$Kic-51|~xW
zR}FTxg{=(&$4VUGp(lE8-mI#YD1y8Q&sf4_%_JQG_|V4*An$^yI~tC6d1mGJmLI9h
zHu>f|$yH_H?^Cgp_}r}2ha@S}d;BD8$*h{CzhU`AkauS2uq?cPZJ@wu%d}OgP8L~e
z&OA88V}jooH8T9rFq!ao3ZBG)x0`*BkQIj5Ptttndo)Mi+)ADhX9t!#g=02vjZhOl
z)@RjSYdb6_)7E9QmeM1^SQZ!&gD^4!5%|+Xdwu{DAv+t+O(5>U;YEsLDg@u0p9znk
zSH_;L=du(Gd>9;@e414P&2*E|@R>g98y(YU9`L-FlGwYeYdCZ{KLLN-G}x18%5+ah
zdN2l@ln5TX5c#{l#JS}U1FN7#@QQeMQz$MU-VY&oh&EvGao!R{KMd#|Er>B?aZ4=iWkg
zybTKqnL4=Q2X1J%Dt;AH=3WR{+Bl`}rY-&Qz91`NwHqfm;G=RnIG7}Wu#bcRhg(^j
zK(UpAVvBMP#p-&27Pf5Meof%_hF(CR1=JEbWPxuhOUfodZ8`1jkk2t
zbSJQ6kA+LOr!{9NAsOv;C(7t@P=olO0J1_wZsSw&(v{)oc-l|rT)1~2%Y^p@#dU|_
z!>FwTm42i6aD1127<^4ea|!%0&-b8B8qDB>&k_!=NHu3=)A$e@Z2@Z4{4hw#K&O0{
z9TSilgGe96Ihl5#nKMJ1yMS6J#TvUh2y^JJy1f-Cejo%c5NavwbMNqY?n>fG;>Iq<
zj&p389ImOjF$R1<)y>fDDG~4ooI`U+>Y?|}vF$NXo}22jno?QIuEnJKE~3s>5JxcY
z+ld#L<`~wQg_f(fA=8auoKzW>t}GoL2lv{bS6obLjfyjh;ru5`$&SBL#PbdQ;jeYX
zE|b*SM&)Bg_>T8r*l*M#M}h
z@Ejp|Y61na?`BIv{%s5j`wSg6|DIEX)?GqXPh+1+iB4qN{)<)Q%vg*4Dp1=)5-A*~
z!crqm2DRI7%Xff&dGyZyI3i#^&%pvc>cFh}g*7oJ)3DoTHB{
z8nQco%8P#(lxLj^N=MA8WCDAhkWP}+#0JM*$~3ZnSg|~2n+3Fk7jYzHLUJ}tUZzIo
zRaow&0n5K64rJAJu)Cv2Rq+`wqfhG7pP8pea^w#CJ>JOk;e-WBPVxi`F!s)B#D`E)
z;E*_+_~B_ya9kpYkJLP@Qo~`evEDn~&|rx*#<5AFYroF8#$z&}v;K!y{Hly-@gSA#
z2kFFtTKR99gqTo-6eQ=KSfrQGZbd6qN02Cf5JD}T#AGpVXx;_8cOPMGDU_|hq+9Za
z8DEDXI8oF=g0HEbEytFpv!cpyQg#(#@LT(;Pg~>4$yXTWJXxOZa3U_KxdIlZNX;dR
z@7chvWhR?=#Ks(u*Y8p@*^KfnD!{jJtaP}F-_I&^df<|?@89nL6gL)iFaebZL%qDS#=uhnGNm!b})_ooClmVRD{%EW+=m9OTs<6S=HZ0RsJ&UW{C#U3;`
zyM*vDr*h3WG;d}E6vlpk{Uh6#JV*p=yjqkpE*ku&;)LdMKh8y-;NYK)Uhl8B5IbVUSr6{xmDnl{Sh*-8yL(#|1%uT;R$DMFnIQrOyaa#3JTlAW0vJ~6VoO%bD-qAx!EOw6B{AEtAM{+)g_tvOo$@!@sO;Q)00D#2XVovXZf3sLUfolE^KQuKmV
zwln)-
zPez-s+2(#I#n)x+eOyVK(EY>Wb2{&^2j%FQk!waw;!kcKXUNnp5B9AJ4;#?8cIT}&
zuAawR=)HebFb}4)pKFm?n1k_f=*xAiO)|dWYu^3f(9d21Vr5($prtzDeFTvPcg!V{
z33yzXGF;^@S5z>C)Po~;|Mq7>8R#6x{df1^>2l*`S2Ns5mV=J<$Ei5WcZvOy?rBC~
z)u>Wbd1>&GRSD052RKQSt2)T16J#0S6D+yE3&k$3D&!nglK
zU%w=E8mx$&WO5*U4dE;`z&>Kospy%tVYCO4vwsA)ad{3>UE$-EvSwOKe9T*3S)0Uy
zu5JCwFa-PLti!s;ryehGa(2M#=iz&+`vo6-foV;kHX5paUB0Bzlyp>n2Yc-+hx^5W
zfMDPgVlh3sim4OCaORcKP%@*vQ!*W;K2ss%r${0|&LPaF+gL`>zWb2#fTy)=Zi5+e
z|2s=Gbf0ZDYT+IoWP=U3ES^!kGX4Rz$w{0YgB^(e2)1Z@wqb`+onvKa9Z`hm+0D-fQi>
z=3H}v-C$bie|wj@fNe7+DJI6q9D93-z3e%5ScZaLtus`OHmhz;(XLS&b;#dxt`f{Q
z8}vIMx+-u@Ue(MP#M3`Xd5$W#%O8viMk`J8^Qh@)#o`$HhW36lP
zTalREHD8r5apMua4s06qyJ7O>D5Z;WZ*g6|*~W3zEfymR2N7D&;Dw#-zi;E}_TR8W
z^*~bs^5mnta!}Yc{p*QY)2}qFA{q@uI3wWd|AX?a!H@5xTyP
zt}5+wz94gyTN?xGF|DifiZ#rLjs36{$pBMa
zvE8gQr=O2z&)JKm1MzIF*lH`;d?#w=q%=BPngM34E!4A4EH{UM3jOuy*0Hrx@x_O_LcMVW;B($$+A@%LWx5YJyezHO7=S9
z-nx-?Q>98qq?k2Kdul}y=aV!dR>Z`VphTU?am_lx?iwga#O5B~@wFMz_ZP9A)3>$_
zIh(}};Kw8~QQi?7
zQZi)ybPx#@>8Tanqyu#LX%$d#N#Z5J`>3ZDLL_jP>Mq}wcVw$N%v(!2@ME^)m;c0!Jos7vGa
z_*Feuw#B!WNotm_R0mv{ScrystirMV(Tvrgfp*u)>l(JL8_LOCsTAkq1qKI%84@Xg-Ji7tpfyyWHWn$dcc`8%-8l7WjT2jYdzH
zPZ=2ZG5PZdKy8eiQ<2LhS|@^)q?4&J9}qc>`4ABV7SZr)J#{D=q;M~+=&RVHPaVqk
z{J!C)Y9i6@$RYuu&YpMU+G~h20eI&Tby`c++4NubbUJVn<~(BOVRK`4>YZ_4mBCmq
zeBD>7_RSk9RU$5y4rZv2D!j#@VS9_2T4RA5zv25l?_vXGwDm=q=-5!zZ$T7-e>2q;8{Q%>jk
zac96vga*f-2tU}gkJY(V#zw7