From f17998e1767e2e2fbe19387e0c7d9cc5b9e9e6f4 Mon Sep 17 00:00:00 2001 From: jfxu Date: Fri, 31 Mar 2017 16:36:44 +0800 Subject: [PATCH] =?UTF-8?q?=E7=AC=AC=E4=BA=94=E9=83=A8=E5=88=86=E7=BF=BB?= =?UTF-8?q?=E8=AF=91=E5=AE=8C=E6=88=90?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit --- README-zh-Hant.md | 365 +++++++++++++++++++++++----------------------- 1 file changed, 181 insertions(+), 184 deletions(-) diff --git a/README-zh-Hant.md b/README-zh-Hant.md index defa77ef..ed562327 100644 --- a/README-zh-Hant.md +++ b/README-zh-Hant.md @@ -196,15 +196,15 @@ Start broad and go deeper in a few areas. It helps to know a little about vario * **Medium timeline** - Aim for **breadth** and **some depth** with system design topics. Practice by solving **many** interview questions. * **Long timeline** - Aim for **breadth** and **more depth** with system design topics. Practice by solving **most** interview questions. -| | Short | Medium | Long | -|---|---|---|---| -| Read through the [System design topics](#index-of-system-design-topics) to get a broad understanding of how systems work | :+1: | :+1: | :+1: | -| Read through a few articles in the [Company engineering blogs](#company-engineering-blogs) for the companies you are interviewing with | :+1: | :+1: | :+1: | -| Read through a few [Real world architectures](#real-world-architectures) | :+1: | :+1: | :+1: | -| Review [How to approach a system design interview question](#how-to-approach-a-system-design-interview-question) | :+1: | :+1: | :+1: | -| Work through [System design interview questions with solutions](#system-design-interview-questions-with-solutions) | Some | Many | Most | -| Work through [Object-oriented design interview questions with solutions](#object-oriented-design-interview-questions-with-solutions) | Some | Many | Most | -| Review [Additional system design interview questions](#additional-system-design-interview-questions) | Some | Many | Most | +| | Short | Medium | Long | +| ---------------------------------------- | ----- | ------ | ---- | +| Read through the [System design topics](#index-of-system-design-topics) to get a broad understanding of how systems work | :+1: | :+1: | :+1: | +| Read through a few articles in the [Company engineering blogs](#company-engineering-blogs) for the companies you are interviewing with | :+1: | :+1: | :+1: | +| Read through a few [Real world architectures](#real-world-architectures) | :+1: | :+1: | :+1: | +| Review [How to approach a system design interview question](#how-to-approach-a-system-design-interview-question) | :+1: | :+1: | :+1: | +| Work through [System design interview questions with solutions](#system-design-interview-questions-with-solutions) | Some | Many | Most | +| Work through [Object-oriented design interview questions with solutions](#object-oriented-design-interview-questions-with-solutions) | Some | Many | Most | +| Review [Additional system design interview questions](#additional-system-design-interview-questions) | Some | Many | Most | ## How to approach a system design interview question @@ -280,17 +280,17 @@ Check out the following links to get a better idea of what to expect: > > Solutions linked to content in the `solutions/` folder. -| Question | | -|---|---| -| Design Pastebin.com (or Bit.ly) | [Solution](solutions/system_design/pastebin/README.md) | +| Question | | +| ---------------------------------------- | ---------------------------------------- | +| Design Pastebin.com (or Bit.ly) | [Solution](solutions/system_design/pastebin/README.md) | | Design the Twitter timeline (or Facebook feed)
Design Twitter search (or Facebook search) | [Solution](solutions/system_design/twitter/README.md) | -| Design a web crawler | [Solution](solutions/system_design/web_crawler/README.md) | -| Design Mint.com | [Solution](solutions/system_design/mint/README.md) | +| Design a web crawler | [Solution](solutions/system_design/web_crawler/README.md) | +| Design Mint.com | [Solution](solutions/system_design/mint/README.md) | | Design the data structures for a social network | [Solution](solutions/system_design/social_graph/README.md) | | Design a key-value store for a search engine | [Solution](solutions/system_design/query_cache/README.md) | | Design Amazon's sales ranking by category feature | [Solution](solutions/system_design/sales_rank/README.md) | | Design a system that scales to millions of users on AWS | [Solution](solutions/system_design/scaling_aws/README.md) | -| Add a system design question | [Contribute](#contributing) | +| Add a system design question | [Contribute](#contributing) | ### Design Pastebin.com (or Bit.ly) @@ -348,16 +348,16 @@ Check out the following links to get a better idea of what to expect: >**Note: This section is under development** -| Question | | -|---|---| -| Design a hash map | [Solution](solutions/object_oriented_design/hash_table/hash_map.ipynb) | -| Design a least recently used cache | [Solution](solutions/object_oriented_design/lru_cache/lru_cache.ipynb) | -| Design a call center | [Solution](solutions/object_oriented_design/call_center/call_center.ipynb) | -| Design a deck of cards | [Solution](solutions/object_oriented_design/deck_of_cards/deck_of_cards.ipynb) | -| Design a parking lot | [Solution](solutions/object_oriented_design/parking_lot/parking_lot.ipynb) | -| Design a chat server | [Solution](solutions/object_oriented_design/online_chat/online_chat.ipynb) | -| Design a circular array | [Contribute](#contributing) | -| Add an object-oriented design question | [Contribute](#contributing) | +| Question | | +| -------------------------------------- | ---------------------------------------- | +| Design a hash map | [Solution](solutions/object_oriented_design/hash_table/hash_map.ipynb) | +| Design a least recently used cache | [Solution](solutions/object_oriented_design/lru_cache/lru_cache.ipynb) | +| Design a call center | [Solution](solutions/object_oriented_design/call_center/call_center.ipynb) | +| Design a deck of cards | [Solution](solutions/object_oriented_design/deck_of_cards/deck_of_cards.ipynb) | +| Design a parking lot | [Solution](solutions/object_oriented_design/parking_lot/parking_lot.ipynb) | +| Design a chat server | [Solution](solutions/object_oriented_design/online_chat/online_chat.ipynb) | +| Design a circular array | [Contribute](#contributing) | +| Add an object-oriented design question | [Contribute](#contributing) | ## System design topics: start here @@ -1326,13 +1326,13 @@ HTTP is a method for encoding and transporting data between a client and a serve A basic HTTP request consists of a verb (method) and a resource (endpoint). Below are common HTTP verbs: -| Verb | Description | Idempotent* | Safe | Cacheable | -|---|---|---|---|---| -| GET | Reads a resource | Yes | Yes | Yes | -| POST | Creates a resource or trigger a process that handles data | No | No | Yes if response contains freshness info | -| PUT | Creates or replace a resource | Yes | No | No | -| PATCH | Partially updates a resource | No | No | Yes if response contains freshness info | -| DELETE | Deletes a resource | Yes | No | No | +| Verb | Description | Idempotent* | Safe | Cacheable | +| ------ | ---------------------------------------- | ----------- | ---- | --------------------------------------- | +| GET | Reads a resource | Yes | Yes | Yes | +| POST | Creates a resource or trigger a process that handles data | No | No | Yes if response contains freshness info | +| PUT | Creates or replace a resource | Yes | No | No | +| PATCH | Partially updates a resource | No | No | Yes if response contains freshness info | +| DELETE | Deletes a resource | Yes | No | No | *Can be called many times without different outcomes. @@ -1365,7 +1365,7 @@ Use TCP over UDP when: * You need all of the data to arrive intact * You want to automatically make a best estimate use of the network throughput -### User datagram protocol (UDP) +### 用户数据报协议(UDP)

@@ -1373,19 +1373,19 @@ Use TCP over UDP when: Source: How to make a multiplayer game

-UDP is connectionless. Datagrams (analogous to packets) are guaranteed only at the datagram level. Datagrams might reach their destination out of order or not at all. UDP does not support congestion control. Without the guarantees that TCP support, UDP is generally more efficient. +UDP 是无连接的。数据报(类似于数据包)只在数据报级别有保证。数据报可能会无序的到达目的地,也有可能会遗失。UDP 不支持拥塞控制。虽然不如 TCP 那样有保证,但 UDP 通常效率更高。 -UDP can broadcast, sending datagrams to all devices on the subnet. This is useful with [DHCP](https://en.wikipedia.org/wiki/Dynamic_Host_Configuration_Protocol) because the client has not yet received an IP address, thus preventing a way for TCP to stream without the IP address. +UDP 可以通过广播将数组报发送至子网内的所有设备。这对 [DHCP](https://en.wikipedia.org/wiki/Dynamic_Host_Configuration_Protocol) 很有用,因为子网内的设备还没有分配 IP 地址,而 IP 对于 TCP 是必须的。 -UDP is less reliable but works well in real time use cases such as VoIP, video chat, streaming, and realtime multiplayer games. +UDP 可靠性更低但适合用在网络电话、视频聊天,流(这个不知具体指啥)和实时多人游戏上。 -Use UDP over TCP when: +以下情况使用 UDP 代替 TCP: -* You need the lowest latency -* Late data is worse than loss of data -* You want to implement your own error correction +* 你需要更快的相应速度 +* 相对于数据丢失更糟的是数据延迟 +* 你想实现自己的错误校正方法 -#### Source(s) and further reading: TCP and UDP +#### 延伸阅读:TCP 与 UDP * [Networking for game programming](http://gafferongames.com/networking-for-game-programmers/udp-vs-tcp/) * [Key differences between TCP and UDP protocols](http://www.cyberciti.biz/faq/key-differences-between-tcp-and-udp-protocols/) @@ -1394,7 +1394,7 @@ Use UDP over TCP when: * [User datagram protocol](https://en.wikipedia.org/wiki/User_Datagram_Protocol) * [Scaling memcache at Facebook](http://www.cs.bu.edu/~jappavoo/jappavoo.github.com/451/papers/memcache-fb.pdf) -### Remote procedure call (RPC) +### 远程过程调用协议(RPC)

@@ -1402,18 +1402,17 @@ Use UDP over TCP when: Source: Crack the system design interview

-In an RPC, a client causes a procedure to execute on a different address space, usually a remote server. The procedure is coded as if it were a local procedure call, abstracting away the details of how to communicate with the server from the client program. Remote calls are usually slower and less reliable than local calls so it is helpful to distinguish RPC calls from local calls. Popular RPC frameworks include [Protobuf](https://developers.google.com/protocol-buffers/), [Thrift](https://thrift.apache.org/), and [Avro](https://avro.apache.org/docs/current/). +在一个 RPC 里,客户端会去调用另一个地址空间(通常是一个远程服务器)里的方法。调用代码看起来就像是调用的是一个本地方法,客户端和服务器交互的具体过程被抽象。远程调用相对于本地调用一般较慢而且可靠性更差,因此区分两者是有帮助的。热门的 RPC 框架包括 [Protobuf](https://developers.google.com/protocol-buffers/), [Thrift](https://thrift.apache.org/) 和 [Avro](https://avro.apache.org/docs/current/)。 -RPC is a request-response protocol: +RPC 是一个“请求-响应”协议: -* **Client program** - Calls the client stub procedure. The parameters are pushed onto the stack like a local procedure call. -* **Client stub procedure** - Marshals (packs) procedure id and arguments into a request message. -* **Client communication module** - OS sends the message from the client to the server. -* **Server communication module** - OS passes the incoming packets to the server stub procedure. -* **Server stub procedure** - Unmarshalls the results, calls the server procedure matching the procedure id and passes the given arguments. -* The server response repeats the steps above in reverse order. +* **客户端程序** ── 调用客户端存根程序。就像调用本地方法一样,参数会被压入栈中。 +* **客户端存根程序** ── 将请求过程的 id 和参数打包进请求信息中。 +* **客户端通信模块** ── 将信息从客户端发送至服务端。 +* **服务端通信模块** ── 将接受的包传给服务端存根程序。 +* **服务端存根程序** ── 将结果解包,依据过程 id 调用服务端方法并将参数传递过去。 -Sample RPC calls: +RPC 调用示例: ``` GET /someoperation?data=anId @@ -1425,36 +1424,37 @@ POST /anotheroperation } ``` -RPC is focused on exposing behaviors. RPCs are often used for performance reasons with internal communications, as you can hand-craft native calls to better fit your use cases. +RPC 专注于暴露方法。RPC 通常用于处理内部通讯的性能问题,这样你可以手动处理本地调用以更好的适应你的情况。 -Choose a Native Library aka SDK when: -* You know your target platform. -* You want to control how your "logic" is accessed -* You want to control how error control happens off your library -* Performance and end user experience is your primary concern +当以下情况时选择本地库或者 SDK: -HTTP APIs following **REST** tend to be used more often for public APIs. +* 你知道你的目标平台。 +* 你想控制如何访问你的“逻辑”。 +* 你想对发生在你的库中的错误进行控制。 +* 性能和终端用户体验是你最关心的事。 -#### Disadvantage(s): RPC +遵循 **REST** 的 HTTP API 往往更适用于公共 API。 -* RPC clients become tightly coupled to the service implementation. -* A new API must be defined for every new operation or use case. -* It can be difficult to debug RPC. -* You might not be able to leverage existing technologies out of the box. For example, it might require additional effort to ensure [RPC calls are properly cached](http://etherealbits.com/2012/12/debunking-the-myths-of-rpc-rest/) on caching servers such as [Squid](http://www.squid-cache.org/). +#### 缺点:RPC -### Representational state transfer (REST) +* RPC 客户端与服务实现捆绑地很紧密。 +* 一个新的 API 必须在每一个操作或者用例中定义。 +* RPC 很难调试。 +* 你可能没办法很方便的去修改现有的技术。举个例子,如果你希望在 [Squid](http://www.squid-cache.org/) 这样的缓存服务器上确保 [RPC 被正确缓存](http://etherealbits.com/2012/12/debunking-the-myths-of-rpc-rest/)的话可能需要一些额外的努力了。 -REST is an architectural style enforcing a client/server model where the client acts on a set of resources managed by the server. The server provides a representation of resources and actions that can either manipulate or get a new representation of resources. All communication must be stateless and cacheable. +### 具象状态传输(REST) -There are four qualities of a RESTful interface: +REST 是一种强制的客户端/服务端架构设计模型,客户端基于服务端管理的一系列资源操作。服务端提供修改或获取资源的接口。所有的通信必须是无状态和可缓存的。 -* **Identify resources (URI in HTTP)** - use the same URI regardless of any operation. -* **Change with representations (Verbs in HTTP)** - use verbs, headers, and body. -* **Self-descriptive error message (status response in HTTP)** - Use status codes, don't reinvent the wheel. -* **[HATEOAS](http://restcookbook.com/Basics/hateoas/) (HTML interface for HTTP)** - your web service should be fully accessible in a browser. +RESTful 接口有四条规则: -Sample REST calls: +* **标志资源(HTTP 里的 URI)** ── 无论什么操作都使用同一个 URI。 +* **表示的改变(HTTP 的 Verbs)** ── 使用 verbs, headers 和 body。 +* **可自我描述的错误信息(HTTP 中的 状态相应)** ── 使用状态码,不要重新造轮子。 +* **[HATEOAS](http://restcookbook.com/Basics/hateoas/)(HTTP 中的HTML 接口)** ── 你的 web 服务器应该能够通过浏览器访问。 + +REST 请求的例子: ``` GET /someresources/anId @@ -1463,32 +1463,32 @@ PUT /someresources/anId {"anotherdata": "another value"} ``` -REST is focused on exposing data. It minimizes the coupling between client/server and is often used for public HTTP APIs. REST uses a more generic and uniform method of exposing resources through URIs, [representation through headers](https://github.com/for-GET/know-your-http-well/blob/master/headers.md), and actions through verbs such as GET, POST, PUT, DELETE, and PATCH. Being stateless, REST is great for horizontal scaling and partitioning. +REST 关注于暴露数据。它最小化了客户端与服务端的联系并经常被用户公共 HTTP API。REST 使用更通常与规范化的方法来通过 URI 暴露资源,[通过 header 来表述](https://github.com/for-GET/know-your-http-well/blob/master/headers.md)并通过 GET, POST, PUT, DELETE 和 PATCH 这些 verbs 来进行操作。因为无状态的特性,REST 易于横向扩展和隔离。 -#### Disadvantage(s): REST +#### 缺点:REST -* With REST being focused on exposing data, it might not be a good fit if resources are not naturally organized or accessed in a simple hierarchy. For example, returning all updated records from the past hour matching a particular set of events is not easily expressed as a path. With REST, it is likely to be implemented with a combination of URI path, query parameters, and possibly the request body. -* REST typically relies on a few verbs (GET, POST, PUT, DELETE, and PATCH) which sometimes doesn't fit your use case. For example, moving expired documents to the archive folder might not cleanly fit within these verbs. -* Fetching complicated resources with nested hierarchies requires multiple round trips between the client and server to render single views, e.g. fetching content of a blog entry and the comments on that entry. For mobile applications operating in variable network conditions, these multiple roundtrips are highly undesirable. -* Over time, more fields might be added to an API response and older clients will receive all new data fields, even those that they do not need, as a result, it bloats the payload size and leads to larger latencies. +* 由于 REST 将重点放在暴露数据,所以当资源不是自然组织的或者结构复杂的时候它可能无法很好的适应。举个例子,返回过去一小时中与特定事件集匹配的更新记录这种操作就很难表示为路径。使用 REST,可能会使用 URI 路径,查询参数和可能的请求体来实现。 +* REST 一般依赖几个 verbs(GET, POST, PUT, DELETE 和 PATCH),但有时候仅仅这些没法满足你的需要。举个例子,将过期的文档移动到归档文件夹里去,这样的操作可能没法简单的用上面这几个 verbs 表达。 +* 使用嵌套层次结构来获取复杂资源需要客户端和服务端多次交互来渲染单个视图。比如,获取博客的内容和其下的评论。对于在可变网络环境下的移动应用,是不希望看到这种多次请求。 +* 随着时间的推移,更多的字段可能会被添加到 API 响应中,较旧的客户端将会接收到所有新的数据字段,即使是那些它们不需要的字段,结果它会增加负载大小并引起更大的延迟。 -### RPC and REST calls comparison +### PRC 与 REST 比较 -| Operation | RPC | REST | -|---|---|---| -| Signup | **POST** /signup | **POST** /persons | -| Resign | **POST** /resign
{
"personid": "1234"
} | **DELETE** /persons/1234 | -| Read a person | **GET** /readPerson?personid=1234 | **GET** /persons/1234 | -| Read a person’s items list | **GET** /readUsersItemsList?personid=1234 | **GET** /persons/1234/items | +| 操作 | RPC | REST | +| ------------------------------- | ---------------------------------------- | ---------------------------------------- | +| Signup | **POST** /signup | **POST** /persons | +| Resign | **POST** /resign
{
"personid": "1234"
} | **DELETE** /persons/1234 | +| Read a person | **GET** /readPerson?personid=1234 | **GET** /persons/1234 | +| Read a person’s items list | **GET** /readUsersItemsList?personid=1234 | **GET** /persons/1234/items | | Add an item to a person’s items | **POST** /addItemToUsersItemsList
{
"personid": "1234";
"itemid": "456"
} | **POST** /persons/1234/items
{
"itemid": "456"
} | -| Update an item | **POST** /modifyItem
{
"itemid": "456";
"key": "value"
} | **PUT** /items/456
{
"key": "value"
} | -| Delete an item | **POST** /removeItem
{
"itemid": "456"
} | **DELETE** /items/456 | +| Update an item | **POST** /modifyItem
{
"itemid": "456";
"key": "value"
} | **PUT** /items/456
{
"key": "value"
} | +| Delete an item | **POST** /removeItem
{
"itemid": "456"
} | **DELETE** /items/456 |

Source: Do you really know why you prefer REST over RPC

-#### Source(s) and further reading: REST and RPC +#### 延伸阅读:REST 与 RPC * [Do you really know why you prefer REST over RPC](https://apihandyman.io/do-you-really-know-why-you-prefer-rest-over-rpc/) * [When are RPC-ish approaches more appropriate than REST?](http://programmers.stackexchange.com/a/181186) @@ -1499,27 +1499,27 @@ REST is focused on exposing data. It minimizes the coupling between client/serv * [Thrift](https://code.facebook.com/posts/1468950976659943/) * [Why REST for internal use and not RPC](http://arstechnica.com/civis/viewtopic.php?t=1190508) -## Security +## 安全 -This section could use some updates. Consider [contributing](#contributing)! +这一部分需要更多内容。[一起来吧](#contributing)! -Security is a broad topic. Unless you have considerable experience, a security background, or are applying for a position that requires knowledge of security, you probably won't need to know more than the basics: +安全是一个宽泛的话题。下面这些知识对你应该足够了除非你有很多的经验、安全背景或者你正在申请一个需要安全知识的职位。 -* Encrypt in transit and at rest. -* Sanitize all user inputs or any input parameters exposed to user to prevent [XSS](https://en.wikipedia.org/wiki/Cross-site_scripting) and [SQL injection](https://en.wikipedia.org/wiki/SQL_injection). -* Use parameterized queries to prevent SQL injection. -* Use the principle of [least privilege](https://en.wikipedia.org/wiki/Principle_of_least_privilege). +* 在运输和等待过程中加密 +* 对所有的用户输入和从用户那里发来的参数进行处理以防止 [XSS](https://en.wikipedia.org/wiki/Cross-site_scripting) 和 [SQL 注入](https://en.wikipedia.org/wiki/SQL_injection)。 +* 使用参数化的查询来防止 SQL 注入。 +* 使用[最小权限原则](https://en.wikipedia.org/wiki/Principle_of_least_privilege)。 -### Source(s) and further reading +### 延伸阅读 * [Security guide for developers](https://github.com/FallibleInc/security-guide-for-developers) * [OWASP top ten](https://www.owasp.org/index.php/OWASP_Top_Ten_Cheat_Sheet) -## Appendix +## 附录 -You'll sometimes be asked to do 'back-of-the-envelope' estimates. For example, you might need to determine how long it will take to generate 100 image thumbnails from disk or how much memory a data structure will take. The **Powers of two table** and **Latency numbers every programmer should know** are handy references. +一些时候你会被要求做出保守估计。比如,你可能需要估计从磁盘中生成 100 张图片的缩略图需要的时间或者一个数据结构需要多少的内存。**2 的次方表**和**每个开发者都需要知道的一些时间数据**(译注:OSChina 上有这篇文章的[译文](https://www.oschina.net/news/30009/every-programmer-should-know))都是一些很方便的参考资料。 -### Powers of two table +### 2 的次方表 ``` Power Exact Value Approx Value Bytes @@ -1534,7 +1534,7 @@ Power Exact Value Approx Value Bytes 40 1,099,511,627,776 1 trillion 1 TB ``` -#### Source(s) and further reading +#### 延伸阅读 * [Powers of two](https://en.wikipedia.org/wiki/Power_of_two) @@ -1566,58 +1566,57 @@ Notes 1 ms = 10^-3 seconds = 1,000 us = 1,000,000 ns ``` -Handy metrics based on numbers above: +基于上述数字的指标: +* 从磁盘以 30 MB/s 的速度顺序读取 +* 以 100 MB/s 从 1 Gbps 的以太网顺序读取 +* 从 SSD 以 1 GB/s 的速度读取 +* 以 4 GB/s 的速度从主存读取 +* 每秒能绕地球 6-7 圈 +* 数据中心内每秒有 2,000 次往返 -* Read sequentially from disk at 30 MB/s -* Read sequentially from 1 Gbps Ethernet at 100 MB/s -* Read sequentially from SSD at 1 GB/s -* Read sequentially from main memory at 4 GB/s -* 6-7 world-wide round trips per second -* 2,000 round trips per second within a data center - -#### Latency numbers visualized +#### 延迟数可视化 ![](https://camo.githubusercontent.com/77f72259e1eb58596b564d1ad823af1853bc60a3/687474703a2f2f692e696d6775722e636f6d2f6b307431652e706e67) -#### Source(s) and further reading +#### 延伸阅读 * [Latency numbers every programmer should know - 1](https://gist.github.com/jboner/2841832) * [Latency numbers every programmer should know - 2](https://gist.github.com/hellerbarde/2843375) * [Designs, lessons, and advice from building large distributed systems](http://www.cs.cornell.edu/projects/ladis2009/talks/dean-keynote-ladis2009.pdf) * [Software Engineering Advice from Building Large-Scale Distributed Systems](https://static.googleusercontent.com/media/research.google.com/en//people/jeff/stanford-295-talk.pdf) -### Additional system design interview questions +### 额外的系统设计面试问题 -> Common system design interview questions, with links to resources on how to solve each. +> 常见的系统设计面试问题,给出了如何解决的方案链接 -| Question | Reference(s) | -|---|---| -| Design a file sync service like Dropbox | [youtube.com](https://www.youtube.com/watch?v=PE4gwstWhmc) | -| Design a search engine like Google | [queue.acm.org](http://queue.acm.org/detail.cfm?id=988407)
[stackexchange.com](http://programmers.stackexchange.com/questions/38324/interview-question-how-would-you-implement-google-search)
[ardendertat.com](http://www.ardendertat.com/2012/01/11/implementing-search-engines/)
[stanford.edu](http://infolab.stanford.edu/~backrub/google.html) | -| Design a scalable web crawler like Google | [quora.com](https://www.quora.com/How-can-I-build-a-web-crawler-from-scratch) | -| Design Google docs | [code.google.com](https://code.google.com/p/google-mobwrite/)
[neil.fraser.name](https://neil.fraser.name/writing/sync/) | -| Design a key-value store like Redis | [slideshare.net](http://www.slideshare.net/dvirsky/introduction-to-redis) | -| Design a cache system like Memcached | [slideshare.net](http://www.slideshare.net/oemebamo/introduction-to-memcached) | -| Design a recommendation system like Amazon's | [hulu.com](http://tech.hulu.com/blog/2011/09/19/recommendation-system.html)
[ijcai13.org](http://ijcai13.org/files/tutorial_slides/td3.pdf) | -| Design a tinyurl system like Bitly | [n00tc0d3r.blogspot.com](http://n00tc0d3r.blogspot.com/) | -| Design a chat app like WhatsApp | [highscalability.com](http://highscalability.com/blog/2014/2/26/the-whatsapp-architecture-facebook-bought-for-19-billion.html) -| Design a picture sharing system like Instagram | [highscalability.com](http://highscalability.com/flickr-architecture)
[highscalability.com](http://highscalability.com/blog/2011/12/6/instagram-architecture-14-million-users-terabytes-of-photos.html) | -| Design the Facebook news feed function | [quora.com](http://www.quora.com/What-are-best-practices-for-building-something-like-a-News-Feed)
[quora.com](http://www.quora.com/Activity-Streams/What-are-the-scaling-issues-to-keep-in-mind-while-developing-a-social-network-feed)
[slideshare.net](http://www.slideshare.net/danmckinley/etsy-activity-feeds-architecture) | -| Design the Facebook timeline function | [facebook.com](https://www.facebook.com/note.php?note_id=10150468255628920)
[highscalability.com](http://highscalability.com/blog/2012/1/23/facebook-timeline-brought-to-you-by-the-power-of-denormaliza.html) | -| Design the Facebook chat function | [erlang-factory.com](http://www.erlang-factory.com/upload/presentations/31/EugeneLetuchy-ErlangatFacebook.pdf)
[facebook.com](https://www.facebook.com/note.php?note_id=14218138919&id=9445547199&index=0) | -| Design a graph search function like Facebook's | [facebook.com](https://www.facebook.com/notes/facebook-engineering/under-the-hood-building-out-the-infrastructure-for-graph-search/10151347573598920)
[facebook.com](https://www.facebook.com/notes/facebook-engineering/under-the-hood-indexing-and-ranking-in-graph-search/10151361720763920)
[facebook.com](https://www.facebook.com/notes/facebook-engineering/under-the-hood-the-natural-language-interface-of-graph-search/10151432733048920) | -| Design a content delivery network like CloudFlare | [cmu.edu](http://repository.cmu.edu/cgi/viewcontent.cgi?article=2112&context=compsci) | -| Design a trending topic system like Twitter's | [michael-noll.com](http://www.michael-noll.com/blog/2013/01/18/implementing-real-time-trending-topics-in-storm/)
[snikolov .wordpress.com](http://snikolov.wordpress.com/2012/11/14/early-detection-of-twitter-trends/) | -| Design a random ID generation system | [blog.twitter.com](https://blog.twitter.com/2010/announcing-snowflake)
[github.com](https://github.com/twitter/snowflake/) | -| Return the top k requests during a time interval | [ucsb.edu](https://icmi.cs.ucsb.edu/research/tech_reports/reports/2005-23.pdf)
[wpi.edu](http://davis.wpi.edu/xmdv/docs/EDBT11-diyang.pdf) | -| Design a system that serves data from multiple data centers | [highscalability.com](http://highscalability.com/blog/2009/8/24/how-google-serves-data-from-multiple-datacenters.html) | -| Design an online multiplayer card game | [indieflashblog.com](http://www.indieflashblog.com/how-to-create-an-asynchronous-multiplayer-game.html)
[buildnewgames.com](http://buildnewgames.com/real-time-multiplayer/) | -| Design a garbage collection system | [stuffwithstuff.com](http://journal.stuffwithstuff.com/2013/12/08/babys-first-garbage-collector/)
[washington.edu](http://courses.cs.washington.edu/courses/csep521/07wi/prj/rick.pdf) | -| Add a system design question | [Contribute](#contributing) | +| 问题 | 引用 | +| --------------------- | ---------------------------------------- | +| 设计类似于 Dropbox 的文件同步服务 | [youtube.com](https://www.youtube.com/watch?v=PE4gwstWhmc) | +| 设计类似于 Google 的搜索引擎 | [queue.acm.org](http://queue.acm.org/detail.cfm?id=988407)
[stackexchange.com](http://programmers.stackexchange.com/questions/38324/interview-question-how-would-you-implement-google-search)
[ardendertat.com](http://www.ardendertat.com/2012/01/11/implementing-search-engines/)
[stanford.edu](http://infolab.stanford.edu/~backrub/google.html) | +| 设计类似于 Google 的可扩展网络爬虫 | [quora.com](https://www.quora.com/How-can-I-build-a-web-crawler-from-scratch) | +| 设计 Google 文档 | [code.google.com](https://code.google.com/p/google-mobwrite/)
[neil.fraser.name](https://neil.fraser.name/writing/sync/) | +| 设计类似 Redis 的建值存储 | [slideshare.net](http://www.slideshare.net/dvirsky/introduction-to-redis) | +| 设计类似 Memcached 的缓存系统 | [slideshare.net](http://www.slideshare.net/oemebamo/introduction-to-memcached) | +| 设计类似亚马逊的推荐系统 | [hulu.com](http://tech.hulu.com/blog/2011/09/19/recommendation-system.html)
[ijcai13.org](http://ijcai13.org/files/tutorial_slides/td3.pdf) | +| 设计类似 Bitly 的短链接系统 | [n00tc0d3r.blogspot.com](http://n00tc0d3r.blogspot.com/) | +| 设计类似 WhatsApp 的聊天应用 | [highscalability.com](http://highscalability.com/blog/2014/2/26/the-whatsapp-architecture-facebook-bought-for-19-billion.html) +| 设计类似 Instagram 的图片分享系统| [highscalability.com](http://highscalability.com/flickr-architecture)
[highscalability.com](http://highscalability.com/blog/2011/12/6/instagram-architecture-14-million-users-terabytes-of-photos.html) | +| 设计 Facebook 的新闻推荐方法 | [quora.com](http://www.quora.com/What-are-best-practices-for-building-something-like-a-News-Feed)
[quora.com](http://www.quora.com/Activity-Streams/What-are-the-scaling-issues-to-keep-in-mind-while-developing-a-social-network-feed)
[slideshare.net](http://www.slideshare.net/danmckinley/etsy-activity-feeds-architecture) | +| 设计 Facebook 的时间线系统 | [facebook.com](https://www.facebook.com/note.php?note_id=10150468255628920)
[highscalability.com](http://highscalability.com/blog/2012/1/23/facebook-timeline-brought-to-you-by-the-power-of-denormaliza.html) | +| 设计 Facebook 的聊天系统 | [erlang-factory.com](http://www.erlang-factory.com/upload/presentations/31/EugeneLetuchy-ErlangatFacebook.pdf)
[facebook.com](https://www.facebook.com/note.php?note_id=14218138919&id=9445547199&index=0) | +| 设计类似 Facebook 的图表搜索系统 | [facebook.com](https://www.facebook.com/notes/facebook-engineering/under-the-hood-building-out-the-infrastructure-for-graph-search/10151347573598920)
[facebook.com](https://www.facebook.com/notes/facebook-engineering/under-the-hood-indexing-and-ranking-in-graph-search/10151361720763920)
[facebook.com](https://www.facebook.com/notes/facebook-engineering/under-the-hood-the-natural-language-interface-of-graph-search/10151432733048920) | +| 设计类似 CloudFlare 的内容传递网络 | [cmu.edu](http://repository.cmu.edu/cgi/viewcontent.cgi?article=2112&context=compsci) | +| 设计类似 Twitter 的热门话题系统 | [michael-noll.com](http://www.michael-noll.com/blog/2013/01/18/implementing-real-time-trending-topics-in-storm/)
[snikolov .wordpress.com](http://snikolov.wordpress.com/2012/11/14/early-detection-of-twitter-trends/) | +| 设计一个随机 ID 生成系统 | [blog.twitter.com](https://blog.twitter.com/2010/announcing-snowflake)
[github.com](https://github.com/twitter/snowflake/) | +| 返回一定时间段内次数前 k 高的请求 | [ucsb.edu](https://icmi.cs.ucsb.edu/research/tech_reports/reports/2005-23.pdf)
[wpi.edu](http://davis.wpi.edu/xmdv/docs/EDBT11-diyang.pdf) | +| 设计一个数据源于多个数据中心的服务系统 | [highscalability.com](http://highscalability.com/blog/2009/8/24/how-google-serves-data-from-multiple-datacenters.html) | +| 设计一个多人网络卡牌游戏 | [indieflashblog.com](http://www.indieflashblog.com/how-to-create-an-asynchronous-multiplayer-game.html)
[buildnewgames.com](http://buildnewgames.com/real-time-multiplayer/) | +| 设计一个垃圾回收系统 | [stuffwithstuff.com](http://journal.stuffwithstuff.com/2013/12/08/babys-first-garbage-collector/)
[washington.edu](http://courses.cs.washington.edu/courses/csep521/07wi/prj/rick.pdf) | +| 添加更多的系统设计问题 | [Contribute](#contributing) | -### Real world architectures +### 真实的设计架构 -> Articles on how real world systems are designed. +> 关于现实中真实的系统是怎么设计的文章。

@@ -1625,20 +1624,20 @@ Handy metrics based on numbers above: Source: Twitter timelines at scale

-**Don't focus on nitty gritty details for the following articles, instead:** +** 不要专注于一下文章的细节,专注于以下方面:** -* Identify shared principles, common technologies, and patterns within these articles -* Study what problems are solved by each component, where it works, where it doesn't -* Review the lessons learned +* 发现这些文章中的共同的原则、技术和模式。 +* 学习每个组件解决哪些问题,什么情况下使用,什么情况下不适用 +* 复习学过的文章 -|Type | System | Reference(s) | -|---|---|---| +| 类型 | 系统 | 引用 | +| --------------- | ---------------------------------------- | ---------------------------------------- | | Data processing | **MapReduce** - Distributed data processing from Google | [research.google.com](http://static.googleusercontent.com/media/research.google.com/zh-CN/us/archive/mapreduce-osdi04.pdf) | | Data processing | **Spark** - Distributed data processing from Databricks | [slideshare.net](http://www.slideshare.net/AGrishchenko/apache-spark-architecture) | | Data processing | **Storm** - Distributed data processing from Twitter | [slideshare.net](http://www.slideshare.net/previa/storm-16094009) | -| | | | -| Data store | **Bigtable** - Distributed column-oriented database from Google | [harvard.edu](http://www.read.seas.harvard.edu/~kohler/class/cs239-w08/chang06bigtable.pdf) | -| Data store | **HBase** - Open source implementation of Bigtable | [slideshare.net](http://www.slideshare.net/alexbaranau/intro-to-hbase) | +| | | | +| Data store | **Bigtable** - Distributed column-oriented database from Google | [harvard.edu](http://www.read.seas.harvard.edu/~kohler/class/cs239-w08/chang06bigtable.pdf) | +| Data store | **HBase** - Open source implementation of Bigtable | [slideshare.net](http://www.slideshare.net/alexbaranau/intro-to-hbase) | | Data store | **Cassandra** - Distributed column-oriented database from Facebook | [slideshare.net](http://www.slideshare.net/planetcassandra/cassandra-introduction-features-30103666) | Data store | **DynamoDB** - Document-oriented database from Amazon | [harvard.edu](http://www.read.seas.harvard.edu/~kohler/class/cs239-w08/decandia07dynamo.pdf) | | Data store | **MongoDB** - Document-oriented database | [slideshare.net](http://www.slideshare.net/mdirolf/introduction-to-mongodb) | @@ -1655,38 +1654,38 @@ Handy metrics based on numbers above: | Misc | **Zookeeper** - Centralized infrastructure and services enabling synchronization | [slideshare.net](http://www.slideshare.net/sauravhaloi/introduction-to-apache-zookeeper) | | | Add an architecture | [Contribute](#contributing) | -### Company architectures +### 公司的系统架构 -| Company | Reference(s) | -|---|---| -| Amazon | [Amazon architecture](http://highscalability.com/amazon-architecture) | -| Cinchcast | [Producing 1,500 hours of audio every day](http://highscalability.com/blog/2012/7/16/cinchcast-architecture-producing-1500-hours-of-audio-every-d.html) | -| DataSift | [Realtime datamining At 120,000 tweets per second](http://highscalability.com/blog/2011/11/29/datasift-architecture-realtime-datamining-at-120000-tweets-p.html) | -| DropBox | [How we've scaled Dropbox](https://www.youtube.com/watch?v=PE4gwstWhmc) | -| ESPN | [Operating At 100,000 duh nuh nuhs per second](http://highscalability.com/blog/2013/11/4/espns-architecture-at-scale-operating-at-100000-duh-nuh-nuhs.html) | -| Google | [Google architecture](http://highscalability.com/google-architecture) | -| Instagram | [14 million users, terabytes of photos](http://highscalability.com/blog/2011/12/6/instagram-architecture-14-million-users-terabytes-of-photos.html)
[What powers Instagram](http://instagram-engineering.tumblr.com/post/13649370142/what-powers-instagram-hundreds-of-instances) | -| Justin.tv | [Justin.Tv's live video broadcasting architecture](http://highscalability.com/blog/2010/3/16/justintvs-live-video-broadcasting-architecture.html) | -| Facebook | [Scaling memcached at Facebook](https://cs.uwaterloo.ca/~brecht/courses/854-Emerging-2014/readings/key-value/fb-memcached-nsdi-2013.pdf)
[TAO: Facebook’s distributed data store for the social graph](https://cs.uwaterloo.ca/~brecht/courses/854-Emerging-2014/readings/data-store/tao-facebook-distributed-datastore-atc-2013.pdf)
[Facebook’s photo storage](https://www.usenix.org/legacy/event/osdi10/tech/full_papers/Beaver.pdf) | -| Flickr | [Flickr architecture](http://highscalability.com/flickr-architecture) | -| Mailbox | [From 0 to one million users in 6 weeks](http://highscalability.com/blog/2013/6/18/scaling-mailbox-from-0-to-one-million-users-in-6-weeks-and-1.html) | -| Pinterest | [From 0 To 10s of billions of page views a month](http://highscalability.com/blog/2013/4/15/scaling-pinterest-from-0-to-10s-of-billions-of-page-views-a.html)
[18 million visitors, 10x growth, 12 employees](http://highscalability.com/blog/2012/5/21/pinterest-architecture-update-18-million-visitors-10x-growth.html) | -| Playfish | [50 million monthly users and growing](http://highscalability.com/blog/2010/9/21/playfishs-social-gaming-architecture-50-million-monthly-user.html) | -| PlentyOfFish | [PlentyOfFish architecture](http://highscalability.com/plentyoffish-architecture) | -| Salesforce | [How they handle 1.3 billion transactions a day](http://highscalability.com/blog/2013/9/23/salesforce-architecture-how-they-handle-13-billion-transacti.html) | +| Company | Reference(s) | +| -------------- | ---------------------------------------- | +| Amazon | [Amazon architecture](http://highscalability.com/amazon-architecture) | +| Cinchcast | [Producing 1,500 hours of audio every day](http://highscalability.com/blog/2012/7/16/cinchcast-architecture-producing-1500-hours-of-audio-every-d.html) | +| DataSift | [Realtime datamining At 120,000 tweets per second](http://highscalability.com/blog/2011/11/29/datasift-architecture-realtime-datamining-at-120000-tweets-p.html) | +| DropBox | [How we've scaled Dropbox](https://www.youtube.com/watch?v=PE4gwstWhmc) | +| ESPN | [Operating At 100,000 duh nuh nuhs per second](http://highscalability.com/blog/2013/11/4/espns-architecture-at-scale-operating-at-100000-duh-nuh-nuhs.html) | +| Google | [Google architecture](http://highscalability.com/google-architecture) | +| Instagram | [14 million users, terabytes of photos](http://highscalability.com/blog/2011/12/6/instagram-architecture-14-million-users-terabytes-of-photos.html)
[What powers Instagram](http://instagram-engineering.tumblr.com/post/13649370142/what-powers-instagram-hundreds-of-instances) | +| Justin.tv | [Justin.Tv's live video broadcasting architecture](http://highscalability.com/blog/2010/3/16/justintvs-live-video-broadcasting-architecture.html) | +| Facebook | [Scaling memcached at Facebook](https://cs.uwaterloo.ca/~brecht/courses/854-Emerging-2014/readings/key-value/fb-memcached-nsdi-2013.pdf)
[TAO: Facebook’s distributed data store for the social graph](https://cs.uwaterloo.ca/~brecht/courses/854-Emerging-2014/readings/data-store/tao-facebook-distributed-datastore-atc-2013.pdf)
[Facebook’s photo storage](https://www.usenix.org/legacy/event/osdi10/tech/full_papers/Beaver.pdf) | +| Flickr | [Flickr architecture](http://highscalability.com/flickr-architecture) | +| Mailbox | [From 0 to one million users in 6 weeks](http://highscalability.com/blog/2013/6/18/scaling-mailbox-from-0-to-one-million-users-in-6-weeks-and-1.html) | +| Pinterest | [From 0 To 10s of billions of page views a month](http://highscalability.com/blog/2013/4/15/scaling-pinterest-from-0-to-10s-of-billions-of-page-views-a.html)
[18 million visitors, 10x growth, 12 employees](http://highscalability.com/blog/2012/5/21/pinterest-architecture-update-18-million-visitors-10x-growth.html) | +| Playfish | [50 million monthly users and growing](http://highscalability.com/blog/2010/9/21/playfishs-social-gaming-architecture-50-million-monthly-user.html) | +| PlentyOfFish | [PlentyOfFish architecture](http://highscalability.com/plentyoffish-architecture) | +| Salesforce | [How they handle 1.3 billion transactions a day](http://highscalability.com/blog/2013/9/23/salesforce-architecture-how-they-handle-13-billion-transacti.html) | | Stack Overflow | [Stack Overflow architecture](http://highscalability.com/blog/2009/8/5/stack-overflow-architecture.html) | -| TripAdvisor | [40M visitors, 200M dynamic page views, 30TB data](http://highscalability.com/blog/2011/6/27/tripadvisor-architecture-40m-visitors-200m-dynamic-page-view.html) | -| Tumblr | [15 billion page views a month](http://highscalability.com/blog/2012/2/13/tumblr-architecture-15-billion-page-views-a-month-and-harder.html) | -| Twitter | [Making Twitter 10000 percent faster](http://highscalability.com/scaling-twitter-making-twitter-10000-percent-faster)
[Storing 250 million tweets a day using MySQL](http://highscalability.com/blog/2011/12/19/how-twitter-stores-250-million-tweets-a-day-using-mysql.html)
[150M active users, 300K QPS, a 22 MB/S firehose](http://highscalability.com/blog/2013/7/8/the-architecture-twitter-uses-to-deal-with-150m-active-users.html)
[Timelines at scale](https://www.infoq.com/presentations/Twitter-Timeline-Scalability)
[Big and small data at Twitter](https://www.youtube.com/watch?v=5cKTP36HVgI)
[Operations at Twitter: scaling beyond 100 million users](https://www.youtube.com/watch?v=z8LU0Cj6BOU) | -| Uber | [How Uber scales their real-time market platform](http://highscalability.com/blog/2015/9/14/how-uber-scales-their-real-time-market-platform.html) | -| WhatsApp | [The WhatsApp architecture Facebook bought for $19 billion](http://highscalability.com/blog/2014/2/26/the-whatsapp-architecture-facebook-bought-for-19-billion.html) | -| YouTube | [YouTube scalability](https://www.youtube.com/watch?v=w5WVu624fY8)
[YouTube architecture](http://highscalability.com/youtube-architecture) | +| TripAdvisor | [40M visitors, 200M dynamic page views, 30TB data](http://highscalability.com/blog/2011/6/27/tripadvisor-architecture-40m-visitors-200m-dynamic-page-view.html) | +| Tumblr | [15 billion page views a month](http://highscalability.com/blog/2012/2/13/tumblr-architecture-15-billion-page-views-a-month-and-harder.html) | +| Twitter | [Making Twitter 10000 percent faster](http://highscalability.com/scaling-twitter-making-twitter-10000-percent-faster)
[Storing 250 million tweets a day using MySQL](http://highscalability.com/blog/2011/12/19/how-twitter-stores-250-million-tweets-a-day-using-mysql.html)
[150M active users, 300K QPS, a 22 MB/S firehose](http://highscalability.com/blog/2013/7/8/the-architecture-twitter-uses-to-deal-with-150m-active-users.html)
[Timelines at scale](https://www.infoq.com/presentations/Twitter-Timeline-Scalability)
[Big and small data at Twitter](https://www.youtube.com/watch?v=5cKTP36HVgI)
[Operations at Twitter: scaling beyond 100 million users](https://www.youtube.com/watch?v=z8LU0Cj6BOU) | +| Uber | [How Uber scales their real-time market platform](http://highscalability.com/blog/2015/9/14/how-uber-scales-their-real-time-market-platform.html) | +| WhatsApp | [The WhatsApp architecture Facebook bought for $19 billion](http://highscalability.com/blog/2014/2/26/the-whatsapp-architecture-facebook-bought-for-19-billion.html) | +| YouTube | [YouTube scalability](https://www.youtube.com/watch?v=w5WVu624fY8)
[YouTube architecture](http://highscalability.com/youtube-architecture) | -### Company engineering blogs +### 公司工程博客 -> Architectures for companies you are interviewing with. +> 你即将面试的公司的架构 > -> Questions you encounter might be from the same domain. +> 你面对的问题可能就来自于同样领域 * [Airbnb Engineering](http://nerds.airbnb.com/) * [Atlassian Developers](https://developer.atlassian.com/blog/) @@ -1730,24 +1729,22 @@ Handy metrics based on numbers above: * [Yelp Engineering Blog](http://engineeringblog.yelp.com/) * [Zynga Engineering Blog](https://www.zynga.com/blogs/engineering) -#### Source(s) and further reading +#### 延伸阅读 * [kilimchoi/engineering-blogs](https://github.com/kilimchoi/engineering-blogs) -## Under development +## 正在开发中 -Interested in adding a section or helping complete one in-progress? [Contribute](#contributing)! +有兴趣加入添加一些部分或者帮助完善某些部分吗?[加入进来吧](#contributing)! -* Distributed computing with MapReduce -* Consistent hashing -* Scatter gather +* 使用 MapReduce 进行分布式计算 +* 一致性哈希 * [Contribute](#contributing) ## Credits -Credits and sources are provided throughout this repo. -Special thanks to: +特别鸣谢: * [Hired in tech](http://www.hiredintech.com/system-design/the-system-design-process/) * [Cracking the coding interview](https://www.amazon.com/dp/0984782850/) @@ -1759,11 +1756,11 @@ Special thanks to: * [A distributed systems reading list](http://dancres.github.io/Pages/) * [Cracking the system design interview](http://www.puncsky.com/blog/2016/02/14/crack-the-system-design-interview/) -## Contact info +## 联系方式 -Feel free to contact me to discuss any issues, questions, or comments. +欢迎联系我讨论本文的不足、问题或者意见 -My contact info can be found on my [GitHub page](https://github.com/donnemartin). +可以在我的 [GitHub 主页](https://github.com/donnemartin)上找到我的联系方式 ## License