Merge pull request #2 from voitau/lang_ru

Lang ru
pull/502/head
Alex Voitau 2020-04-12 15:23:41 -07:00 committed by GitHub
commit 57612f6c74
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
11 changed files with 5891 additions and 11 deletions

3365
README-ru.md Normal file
View File

@ -0,0 +1,3365 @@
[English](README.md) | [日本語](README-ja.md) | **[Русский](README-ru.md)** | [简体中文](README-zh-Hans.md) | [繁體中文](README-zh-TW.md) <!-- l10n:select -->
<!-- l10n:p
# The System Design Primer
<p align="center">
<img src="http://i.imgur.com/jj3A5N8.png"/>
<br/>
</p>
l10n:p -->
# Пособие по проектированию систем
<p align="center">
<img src="http://i.imgur.com/jj3A5N8.png"/>
<br/>
</p>
<!-- l10n:p
## Motivation
> Learn how to design large-scale systems.
>
> Prep for the system design interview.
l10n:p -->
## Мотивация
> Узнайте, как проектировать крупномасштабные системы.
>
> Подготовьтесь к собеседованию по проектированию системы.
<!-- l10n:p
### Learn how to design large-scale systems
Learning how to design scalable systems will help you become a better engineer.
System design is a broad topic. There is a **vast amount of resources scattered throughout the web** on system design principles.
This repo is an **organized collection** of resources to help you learn how to build systems at scale.
l10n:p -->
### Научитесь проектировать крупномасштабные системы
Умение проектировать масштабируемые системы поможет вам стать лучшим инженером.
Проектирование систем - это широкая тема. В сети есть **огромное количество ресурсов** по принципам проектирования систем.
Этот репозиторий представляет собой **организованную коллекцию** ресурсов, которые помогут вам научиться создавать системы на большом масштабе.
<!-- l10n:p
### Learn from the open source community
This is a continually updated, open source project.
[Contributions](#contributing) are welcome!
l10n:p -->
### Учитесь у сообщества по разработке ПО с открытым исходным кодом
Это постоянно обновляемый проект с открытым исходным кодом.
[Contributions](#contributing) очень приветствуются!
<!-- l10n:p
### Prep for the system design interview
In addition to coding interviews, system design is a **required component** of the **technical interview process** at many tech companies.
**Practice common system design interview questions** and **compare** your results with **sample solutions**: discussions, code, and diagrams.
Additional topics for interview prep:
* [Study guide](#study-guide)
* [How to approach a system design interview question](#how-to-approach-a-system-design-interview-question)
* [System design interview questions, **with solutions**](#system-design-interview-questions-with-solutions)
* [Object-oriented design interview questions, **with solutions**](#object-oriented-design-interview-questions-with-solutions)
* [Additional system design interview questions](#additional-system-design-interview-questions)
l10n:p -->
### Подготовка к собеседованию по проектированию системы
В дополнение к интервью по написанию кода, проектирование систем является **обязательным компонентом процесса технического интервью** во многих технологических компаниях.
**Практикуйте общие вопросы по проектированию систем** и **сравнивайте** свои результаты с **примерами решений**: обсуждения, код и диаграммы.
Дополнительные темы для подготовки к собеседованию:
* [Study guide](#study-guide)
* [How to approach a system design interview question](#how-to-approach-a-system-design-interview-question)
* [System design interview questions, **with solutions**](#system-design-interview-questions-with-solutions)
* [Object-oriented design interview questions, **with solutions**](#object-oriented-design-interview-questions-with-solutions)
* [Additional system design interview questions](#additional-system-design-interview-questions)
<!-- l10n:p
## Anki flashcards
<p align="center">
<img src="http://i.imgur.com/zdCAkB3.png"/>
<br/>
</p>
The provided [Anki flashcard decks](https://apps.ankiweb.net/) use spaced repetition to help you retain key system design concepts.
* [System design deck](https://github.com/donnemartin/system-design-primer/tree/master/resources/flash_cards/System%20Design.apkg)
* [System design exercises deck](https://github.com/donnemartin/system-design-primer/tree/master/resources/flash_cards/System%20Design%20Exercises.apkg)
* [Object oriented design exercises deck](https://github.com/donnemartin/system-design-primer/tree/master/resources/flash_cards/OO%20Design.apkg)
Great for use while on-the-go.
l10n:p -->
## Anki flashcards
<p align="center">
<img src="http://i.imgur.com/zdCAkB3.png"/>
<br/>
</p>
Предоставленные [карточки Anki](https://apps.ankiweb.net/) могут быть использованы для повторения и запоминания ключевых концепций проектирования систем.
* [System design deck](https://github.com/donnemartin/system-design-primer/tree/master/resources/flash_cards/System%20Design.apkg)
* [System design exercises deck](https://github.com/donnemartin/system-design-primer/tree/master/resources/flash_cards/System%20Design%20Exercises.apkg)
* [Object oriented design exercises deck](https://github.com/donnemartin/system-design-primer/tree/master/resources/flash_cards/OO%20Design.apkg)
Отлично подходят для использования на ходу.
<!-- l10n:p
### Coding Resource: Interactive Coding Challenges
Looking for resources to help you prep for the [**Coding Interview**](https://github.com/donnemartin/interactive-coding-challenges)?
<p align="center">
<img src="http://i.imgur.com/b4YtAEN.png"/>
<br/>
</p>
Check out the sister repo [**Interactive Coding Challenges**](https://github.com/donnemartin/interactive-coding-challenges), which contains an additional Anki deck:
* [Coding deck](https://github.com/donnemartin/interactive-coding-challenges/tree/master/anki_cards/Coding.apkg)
l10n:p -->
### Coding Resource: Interactive Coding Challenges
Ищете ресурсы для подготовки к [**Coding Interview**](https://github.com/donnemartin/interactive-coding-challenges)?
<p align="center">
<img src="http://i.imgur.com/b4YtAEN.png"/>
<br/>
</p>
Посмотрите другой репозиторий [**Interactive Coding Challenges**](https://github.com/donnemartin/interactive-coding-challenges), который тоже содержит колоду карт Anki:
* [Coding deck](https://github.com/donnemartin/interactive-coding-challenges/tree/master/anki_cards/Coding.apkg)
<!-- l10n:p
## Contributing
> Learn from the community.
Feel free to submit pull requests to help:
* Fix errors
* Improve sections
* Add new sections
* [Translate](https://github.com/donnemartin/system-design-primer/issues/28)
Content that needs some polishing is placed [under development](#under-development).
Review the [Contributing Guidelines](CONTRIBUTING.md).
l10n:p -->
## Содействие
> Учитесь у сообщества.
Не стесняйтесь отправлять запросы на:
* Исправление ошибок
* Улучшение разделов
* Добавление новых разделов
* [Перевод](https://github.com/donnemartin/system-design-primer/issues/28)
Контент, который нуждается в некоторой полировке, помещается в раздел [В разработке](#under-development).
Ознакомьтесь с [Принципами Содействия](CONTRIBUTING.md).
<!-- l10n:p
## Index of system design topics
> Summaries of various system design topics, including pros and cons. **Everything is a trade-off**.
>
> Each section contains links to more in-depth resources.
<p align="center">
<img src="http://i.imgur.com/jrUBAF7.png"/>
<br/>
</p>
* [System design topics: start here](#system-design-topics-start-here)
* [Step 1: Review the scalability video lecture](#step-1-review-the-scalability-video-lecture)
* [Step 2: Review the scalability article](#step-2-review-the-scalability-article)
* [Next steps](#next-steps)
* [Performance vs scalability](#performance-vs-scalability)
* [Latency vs throughput](#latency-vs-throughput)
* [Availability vs consistency](#availability-vs-consistency)
* [CAP theorem](#cap-theorem)
* [CP - consistency and partition tolerance](#cp---consistency-and-partition-tolerance)
* [AP - availability and partition tolerance](#ap---availability-and-partition-tolerance)
* [Consistency patterns](#consistency-patterns)
* [Weak consistency](#weak-consistency)
* [Eventual consistency](#eventual-consistency)
* [Strong consistency](#strong-consistency)
* [Availability patterns](#availability-patterns)
* [Fail-over](#fail-over)
* [Replication](#replication)
* [Availability in numbers](#availability-in-numbers)
* [Domain name system](#domain-name-system)
* [Content delivery network](#content-delivery-network)
* [Push CDNs](#push-cdns)
* [Pull CDNs](#pull-cdns)
* [Load balancer](#load-balancer)
* [Active-passive](#active-passive)
* [Active-active](#active-active)
* [Layer 4 load balancing](#layer-4-load-balancing)
* [Layer 7 load balancing](#layer-7-load-balancing)
* [Horizontal scaling](#horizontal-scaling)
* [Reverse proxy (web server)](#reverse-proxy-web-server)
* [Load balancer vs reverse proxy](#load-balancer-vs-reverse-proxy)
* [Application layer](#application-layer)
* [Microservices](#microservices)
* [Service discovery](#service-discovery)
* [Database](#database)
* [Relational database management system (RDBMS)](#relational-database-management-system-rdbms)
* [Master-slave replication](#master-slave-replication)
* [Master-master replication](#master-master-replication)
* [Federation](#federation)
* [Sharding](#sharding)
* [Denormalization](#denormalization)
* [SQL tuning](#sql-tuning)
* [NoSQL](#nosql)
* [Key-value store](#key-value-store)
* [Document store](#document-store)
* [Wide column store](#wide-column-store)
* [Graph Database](#graph-database)
* [SQL or NoSQL](#sql-or-nosql)
* [Cache](#cache)
* [Client caching](#client-caching)
* [CDN caching](#cdn-caching)
* [Web server caching](#web-server-caching)
* [Database caching](#database-caching)
* [Application caching](#application-caching)
* [Caching at the database query level](#caching-at-the-database-query-level)
* [Caching at the object level](#caching-at-the-object-level)
* [When to update the cache](#when-to-update-the-cache)
* [Cache-aside](#cache-aside)
* [Write-through](#write-through)
* [Write-behind (write-back)](#write-behind-write-back)
* [Refresh-ahead](#refresh-ahead)
* [Asynchronism](#asynchronism)
* [Message queues](#message-queues)
* [Task queues](#task-queues)
* [Back pressure](#back-pressure)
* [Communication](#communication)
* [Transmission control protocol (TCP)](#transmission-control-protocol-tcp)
* [User datagram protocol (UDP)](#user-datagram-protocol-udp)
* [Remote procedure call (RPC)](#remote-procedure-call-rpc)
* [Representational state transfer (REST)](#representational-state-transfer-rest)
* [Security](#security)
* [Appendix](#appendix)
* [Powers of two table](#powers-of-two-table)
* [Latency numbers every programmer should know](#latency-numbers-every-programmer-should-know)
* [Additional system design interview questions](#additional-system-design-interview-questions)
* [Real world architectures](#real-world-architectures)
* [Company architectures](#company-architectures)
* [Company engineering blogs](#company-engineering-blogs)
* [Under development](#under-development)
* [Credits](#credits)
* [Contact info](#contact-info)
* [License](#license)
l10n:p -->
## Index of system design topics
> Обобщение различных тем по проектирования систем, включая преимущества и недостатки. **Любое решение требует уступок**.
>
> Каждый раздел содержит ссылки на более подробное описание.
<p align="center">
<img src="http://i.imgur.com/jrUBAF7.png"/>
<br/>
</p>
* [System design topics: start here](#system-design-topics-start-here)
* [Step 1: Review the scalability video lecture](#step-1-review-the-scalability-video-lecture)
* [Step 2: Review the scalability article](#step-2-review-the-scalability-article)
* [Next steps](#next-steps)
* [Performance vs scalability](#performance-vs-scalability)
* [Latency vs throughput](#latency-vs-throughput)
* [Availability vs consistency](#availability-vs-consistency)
* [CAP theorem](#cap-theorem)
* [CP - consistency and partition tolerance](#cp---consistency-and-partition-tolerance)
* [AP - availability and partition tolerance](#ap---availability-and-partition-tolerance)
* [Consistency patterns](#consistency-patterns)
* [Weak consistency](#weak-consistency)
* [Eventual consistency](#eventual-consistency)
* [Strong consistency](#strong-consistency)
* [Availability patterns](#availability-patterns)
* [Fail-over](#fail-over)
* [Replication](#replication)
* [Availability in numbers](#availability-in-numbers)
* [Domain name system](#domain-name-system)
* [Content delivery network](#content-delivery-network)
* [Push CDNs](#push-cdns)
* [Pull CDNs](#pull-cdns)
* [Load balancer](#load-balancer)
* [Active-passive](#active-passive)
* [Active-active](#active-active)
* [Layer 4 load balancing](#layer-4-load-balancing)
* [Layer 7 load balancing](#layer-7-load-balancing)
* [Horizontal scaling](#horizontal-scaling)
* [Reverse proxy (web server)](#reverse-proxy-web-server)
* [Load balancer vs reverse proxy](#load-balancer-vs-reverse-proxy)
* [Application layer](#application-layer)
* [Microservices](#microservices)
* [Service discovery](#service-discovery)
* [Database](#database)
* [Relational database management system (RDBMS)](#relational-database-management-system-rdbms)
* [Master-slave replication](#master-slave-replication)
* [Master-master replication](#master-master-replication)
* [Federation](#federation)
* [Sharding](#sharding)
* [Denormalization](#denormalization)
* [SQL tuning](#sql-tuning)
* [NoSQL](#nosql)
* [Key-value store](#key-value-store)
* [Document store](#document-store)
* [Wide column store](#wide-column-store)
* [Graph Database](#graph-database)
* [SQL or NoSQL](#sql-or-nosql)
* [Cache](#cache)
* [Client caching](#client-caching)
* [CDN caching](#cdn-caching)
* [Web server caching](#web-server-caching)
* [Database caching](#database-caching)
* [Application caching](#application-caching)
* [Caching at the database query level](#caching-at-the-database-query-level)
* [Caching at the object level](#caching-at-the-object-level)
* [When to update the cache](#when-to-update-the-cache)
* [Cache-aside](#cache-aside)
* [Write-through](#write-through)
* [Write-behind (write-back)](#write-behind-write-back)
* [Refresh-ahead](#refresh-ahead)
* [Asynchronism](#asynchronism)
* [Message queues](#message-queues)
* [Task queues](#task-queues)
* [Back pressure](#back-pressure)
* [Communication](#communication)
* [Transmission control protocol (TCP)](#transmission-control-protocol-tcp)
* [User datagram protocol (UDP)](#user-datagram-protocol-udp)
* [Remote procedure call (RPC)](#remote-procedure-call-rpc)
* [Representational state transfer (REST)](#representational-state-transfer-rest)
* [Security](#security)
* [Appendix](#appendix)
* [Powers of two table](#powers-of-two-table)
* [Latency numbers every programmer should know](#latency-numbers-every-programmer-should-know)
* [Additional system design interview questions](#additional-system-design-interview-questions)
* [Real world architectures](#real-world-architectures)
* [Company architectures](#company-architectures)
* [Company engineering blogs](#company-engineering-blogs)
* [Under development](#under-development)
* [Credits](#credits)
* [Contact info](#contact-info)
* [License](#license)
<!-- l10n:p
## Study guide
> Suggested topics to review based on your interview timeline (short, medium, long).
![Imgur](http://i.imgur.com/OfVllex.png)
**Q: For interviews, do I need to know everything here?**
**A: No, you don't need to know everything here to prepare for the interview**.
What you are asked in an interview depends on variables such as:
* How much experience you have
* What your technical background is
* What positions you are interviewing for
* Which companies you are interviewing with
* Luck
More experienced candidates are generally expected to know more about system design. Architects or team leads might be expected to know more than individual contributors. Top tech companies are likely to have one or more design interview rounds.
Start broad and go deeper in a few areas. It helps to know a little about various key system design topics. Adjust the following guide based on your timeline, experience, what positions you are interviewing for, and which companies you are interviewing with.
* **Short timeline** - Aim for **breadth** with system design topics. Practice by solving **some** interview questions.
* **Medium timeline** - Aim for **breadth** and **some depth** with system design topics. Practice by solving **many** interview questions.
* **Long timeline** - Aim for **breadth** and **more depth** with system design topics. Practice by solving **most** interview questions.
| | Short | Medium | Long |
|---|---|---|---|
| Read through the [System design topics](#index-of-system-design-topics) to get a broad understanding of how systems work | :+1: | :+1: | :+1: |
| Read through a few articles in the [Company engineering blogs](#company-engineering-blogs) for the companies you are interviewing with | :+1: | :+1: | :+1: |
| Read through a few [Real world architectures](#real-world-architectures) | :+1: | :+1: | :+1: |
| Review [How to approach a system design interview question](#how-to-approach-a-system-design-interview-question) | :+1: | :+1: | :+1: |
| Work through [System design interview questions with solutions](#system-design-interview-questions-with-solutions) | Some | Many | Most |
| Work through [Object-oriented design interview questions with solutions](#object-oriented-design-interview-questions-with-solutions) | Some | Many | Most |
| Review [Additional system design interview questions](#additional-system-design-interview-questions) | Some | Many | Most |
l10n:p -->
## Study guide
> Предлагаемые темы для повторения в зависимости от того, сколько у вас есть времени для подготовки к интервью (мало, средне, много)
![Imgur](http://i.imgur.com/OfVllex.png)
**Вопрос: Надо ли мне знать все из этого документа для интервью?**
**Ответ: Нет, не обязательно**.
То, что вас будут спрашивать на интервью, зависит от:
* Вашего опыта - сколько времени и чем вы занимались
* Должности, на которую вы собеседуетесь
* Компания, в которую вы собеседуетесь
* Удача
Ожидается, что более опытные кандидаты в общем случае знают больше о проектировании систем, а архитекторы и руководители комманд знают больше, чем индивидуальные разработчики. Топовые IT компании скорее всего будут проводить один или более этапов собеседования по проектированию систем.
Начинайте широко, и углубляейтесь в некоторые области. Это поможет узнать больше о различных темах по проектированию систем. Корректируйте ваш план в зависомости от того, сколько у вас есть времени, какой у вас опыт, на какую должность вы собеседуетесь и в какие компании.
* **Короткий срок** - настраиватесь на **широту** покрытия тем. Тренируйтесь отвечать на **некоторые** вопросы.
* **Средний срок** - настраиватесь на **широту** и **немного глубины** покрытия тем. Тренируйтесь отвечать на **многие** вопросы.
* **Длительный срок** - настраиватесь на **широту** и **больше глубины** покрытия тем. Тренируйтесь отвечать на **большинство** вопросов.
| | Малый срок | Средний срок | Длительный срок |
|---|---|---|---|
| Читайте [System design topics](#index-of-system-design-topics), чтобы получить общее понимание, как работают системы | :+1: | :+1: | :+1: |
| Почитайте несколько статей из блогов компаний, в который вы собеседуетесь [Company engineering blogs](#company-engineering-blogs) | :+1: | :+1: | :+1: |
| Посмотрите несколько [Real world architectures](#real-world-architectures) | :+1: | :+1: | :+1: |
| [How to approach a system design interview question](#how-to-approach-a-system-design-interview-question) | :+1: | :+1: | :+1: |
| [System design interview questions with solutions](#system-design-interview-questions-with-solutions) | Немного | Много | Большинство |
| [Object-oriented design interview questions with solutions](#object-oriented-design-interview-questions-with-solutions) | Немного | Много | Большинство |
| [Additional system design interview questions](#additional-system-design-interview-questions) | Немного | Много | Большинство |
<!-- l10n:p
## How to approach a system design interview question
> How to tackle a system design interview question.
The system design interview is an **open-ended conversation**. You are expected to lead it.
You can use the following steps to guide the discussion. To help solidify this process, work through the [System design interview questions with solutions](#system-design-interview-questions-with-solutions) section using the following steps.
l10n:p -->
## How to approach a system design interview question
> Как отвечать на вопросы на интерьвю по проектированию систем
Это интервью является **открытой беседой**. Ожидается, что вы возьмете инициативу по его ведению на себя.
Изучите раздел [System design interview questions with solutions](#system-design-interview-questions-with-solutions) и используйте шаги, описанные ниже.
<!-- l10n:p
### Step 1: Outline use cases, constraints, and assumptions
Gather requirements and scope the problem. Ask questions to clarify use cases and constraints. Discuss assumptions.
* Who is going to use it?
* How are they going to use it?
* How many users are there?
* What does the system do?
* What are the inputs and outputs of the system?
* How much data do we expect to handle?
* How many requests per second do we expect?
* What is the expected read to write ratio?
l10n:p -->
### Step 1: Outline use cases, constraints, and assumptions
Соберите требование и оцените рамки задачи. Задавайте вопросы, чтобы уточнить варианты использования и ограничения. Обсудите допущения.
* Кто будет использовать решение?
* Как его будут использовать?
* Сколько пользователей?
* Что система должна делать?
* Что система получает на вход, и что должно быть на выходе?
* Какое количество данных система должна обрабатывать?
* Сколько ожидается запросов в секунду?
* Какое соотношение количества операций на чтение и запись?
<!-- l10n:p
### Step 2: Create a high level design
Outline a high level design with all important components.
* Sketch the main components and connections
* Justify your ideas
l10n:p -->
### Step 2: Create a high level design
Сделайте набросок проекта с наиболее важными компонентами:
* Выделите главные компоненты и связи между ними
* Обоснуйте ваши идеи
<!-- l10n:p
### Step 3: Design core components
Dive into details for each core component. For example, if you were asked to [design a url shortening service](solutions/system_design/pastebin/README.md), discuss:
* Generating and storing a hash of the full url
* [MD5](solutions/system_design/pastebin/README.md) and [Base62](solutions/system_design/pastebin/README.md)
* Hash collisions
* SQL or NoSQL
* Database schema
* Translating a hashed url to the full url
* Database lookup
* API and object-oriented design
l10n:p -->
### Step 3: Design core components
Детализируйте каждый компонент. Например, если вас попросили разработать [design a url shortening service](solutions/system_design/pastebin/README.md), обсудите следующие моменты:
* Генерация и хранения хэша оригинального URL
* [MD5](solutions/system_design/pastebin/README.md) и [Base62](solutions/system_design/pastebin/README.md)
* Коллизии хэш-функции
* SQL или NoSQL
* Схема базы данных
* Перевод хэшированного URL в оригинальный URL
* Поиск в базе данных
* API и объектно-ориентированное проектирование
<!-- l10n:p
### Step 4: Scale the design
Identify and address bottlenecks, given the constraints. For example, do you need the following to address scalability issues?
* Load balancer
* Horizontal scaling
* Caching
* Database sharding
Discuss potential solutions and trade-offs. Everything is a trade-off. Address bottlenecks using [principles of scalable system design](#index-of-system-design-topics).
l10n:p -->
### Step 4: Scale the design
Определите узкие места и разберитесь с ними, учитывая данные ограничения. Например, для решение проблем с масштабируемостью, может ли вам понадобиться что-то из:
* Балансировщик нагрузки
* Горизонтальное машстабирование
* Кэширование
* Шардинг (sharding) базы данных
Обсудите потенциальные варианты и уступки. Разберитесь с узкими местами используя [principles of scalable system design](#index-of-system-design-topics).
<!-- l10n:p
### Back-of-the-envelope calculations
You might be asked to do some estimates by hand. Refer to the [Appendix](#appendix) for the following resources:
* [Use back of the envelope calculations](http://highscalability.com/blog/2011/1/26/google-pro-tip-use-back-of-the-envelope-calculations-to-choo.html)
* [Powers of two table](#powers-of-two-table)
* [Latency numbers every programmer should know](#latency-numbers-every-programmer-should-know)
l10n:p -->
### Back-of-the-envelope calculations
Вас могу спросить сделать оценку решения по некоторые параметрам. Некоторые разделы [Appendix](#appendix) могут с этим помочь:
* [Use back of the envelope calculations](http://highscalability.com/blog/2011/1/26/google-pro-tip-use-back-of-the-envelope-calculations-to-choo.html)
* [Powers of two table](#powers-of-two-table)
* [Latency numbers every programmer should know](#latency-numbers-every-programmer-should-know)
<!-- l10n:p
### Source(s) and further reading
Check out the following links to get a better idea of what to expect:
* [How to ace a systems design interview](https://www.palantir.com/2011/10/how-to-rock-a-systems-design-interview/)
* [The system design interview](http://www.hiredintech.com/system-design)
* [Intro to Architecture and Systems Design Interviews](https://www.youtube.com/watch?v=ZgdS0EUmn70)
l10n:p -->
### Source(s) and further reading
Посмотрите следующие ссылки, чтобы понять, что можно ожидать (внешние ссылки без перевода):
* [Use back of the envelope calculations](http://highscalability.com/blog/2011/1/26/google-pro-tip-use-back-of-the-envelope-calculations-to-choo.html)
* [Powers of two table](#powers-of-two-table)
* [Latency numbers every programmer should know](#latency-numbers-every-programmer-should-know)
<!-- l10n:p
## System design interview questions with solutions
> Common system design interview questions with sample discussions, code, and diagrams.
>
> Solutions linked to content in the `solutions/` folder.
| Question | |
|---|---|
| Design Pastebin.com (or Bit.ly) | [Solution](solutions/system_design/pastebin/README.md) |
| Design the Twitter timeline and search (or Facebook feed and search) | [Solution](solutions/system_design/twitter/README.md) |
| Design a web crawler | [Solution](solutions/system_design/web_crawler/README.md) |
| Design Mint.com | [Solution](solutions/system_design/mint/README.md) |
| Design the data structures for a social network | [Solution](solutions/system_design/social_graph/README.md) |
| Design a key-value store for a search engine | [Solution](solutions/system_design/query_cache/README.md) |
| Design Amazon's sales ranking by category feature | [Solution](solutions/system_design/sales_rank/README.md) |
| Design a system that scales to millions of users on AWS | [Solution](solutions/system_design/scaling_aws/README.md) |
| Add a system design question | [Contribute](#contributing) |
l10n:p -->
## System design interview questions with solutions
> Распространенные задачи с обсуждением, кодом и диаграммами.
>
> Решение находятся в директории `solutions/`.
| Задача на проектирование | |
|---|---|
| Pastebin.com (или Bit.ly) | [Решение](solutions/system_design/pastebin/README.md) |
| Лента и поиск в Twitter (или Facebook) | [Решение](solutions/system_design/twitter/README.md) |
| Веб-сканер | [Решение](solutions/system_design/web_crawler/README.md) |
| Система управление личными финансами Mint.com | [Решение](solutions/system_design/mint/README.md) |
| Структура данных для социальной сети | [Решение](solutions/system_design/social_graph/README.md) |
| Хранилище типа ключ-значение для поисковика | [Решение](solutions/system_design/query_cache/README.md) |
| Рейтинг продаж по категориям в Amazon | [Решение](solutions/system_design/sales_rank/README.md) |
| Система, которая масштабируется до миллиона пользователей на AWS | [Решение](solutions/system_design/scaling_aws/README.md) |
| Добавьте задачу | [Решение](#contributing) |
<!-- l10n:p
### Design Pastebin.com (or Bit.ly)
[View exercise and solution](solutions/system_design/pastebin/README.md)
![Imgur](http://i.imgur.com/4edXG0T.png)
l10n:p -->
### Design Pastebin.com (or Bit.ly)
[Требования и решение](solutions/system_design/pastebin/README.md)
![Imgur](http://i.imgur.com/4edXG0T.png)
<!-- l10n:p
### Design the Twitter timeline and search (or Facebook feed and search)
[View exercise and solution](solutions/system_design/twitter/README.md)
![Imgur](http://i.imgur.com/jrUBAF7.png)
l10n:p -->
### Design the Twitter timeline and search (or Facebook feed and search)
[Требования и решение](solutions/system_design/twitter/README.md)
![Imgur](http://i.imgur.com/jrUBAF7.png)
<!-- l10n:p
### Design a web crawler
[View exercise and solution](solutions/system_design/web_crawler/README.md)
![Imgur](http://i.imgur.com/bWxPtQA.png)
l10n:p -->
### Design a web crawler
[Требования и решение](solutions/system_design/web_crawler/README.md)
![Imgur](http://i.imgur.com/bWxPtQA.png)
<!-- l10n:p
### Design Mint.com
[View exercise and solution](solutions/system_design/mint/README.md)
![Imgur](http://i.imgur.com/V5q57vU.png)
l10n:p -->
### Design Mint.com
[Требования и решение](solutions/system_design/mint/README.md)
![Imgur](http://i.imgur.com/V5q57vU.png)
<!-- l10n:p
### Design the data structures for a social network
[View exercise and solution](solutions/system_design/social_graph/README.md)
![Imgur](http://i.imgur.com/cdCv5g7.png)
l10n:p -->
### Design the data structures for a social network
[Требования и решение](solutions/system_design/social_graph/README.md)
![Imgur](http://i.imgur.com/cdCv5g7.png)
<!-- l10n:p
### Design a key-value store for a search engine
[View exercise and solution](solutions/system_design/query_cache/README.md)
![Imgur](http://i.imgur.com/4j99mhe.png)
l10n:p -->
### Design a key-value store for a search engine
[Требования и решение](solutions/system_design/query_cache/README.md)
![Imgur](http://i.imgur.com/4j99mhe.png)
<!-- l10n:p
### Design Amazon's sales ranking by category feature
[View exercise and solution](solutions/system_design/sales_rank/README.md)
![Imgur](http://i.imgur.com/MzExP06.png)
l10n:p -->
### Design Amazon's sales ranking by category feature
[Требование и решение](solutions/system_design/sales_rank/README.md)
![Imgur](http://i.imgur.com/MzExP06.png)
<!-- l10n:p
### Design a system that scales to millions of users on AWS
[View exercise and solution](solutions/system_design/scaling_aws/README.md)
![Imgur](http://i.imgur.com/jj3A5N8.png)
l10n:p -->
### Design a system that scales to millions of users on AWS
[Требования и решение](solutions/system_design/scaling_aws/README.md)
![Imgur](http://i.imgur.com/jj3A5N8.png)
<!-- l10n:p
## Object-oriented design interview questions with solutions
> Common object-oriented design interview questions with sample discussions, code, and diagrams.
>
> Solutions linked to content in the `solutions/` folder.
>**Note: This section is under development**
| Question | |
|---|---|
| Design a hash map | [Solution](solutions/object_oriented_design/hash_table/hash_map.ipynb) |
| Design a least recently used cache | [Solution](solutions/object_oriented_design/lru_cache/lru_cache.ipynb) |
| Design a call center | [Solution](solutions/object_oriented_design/call_center/call_center.ipynb) |
| Design a deck of cards | [Solution](solutions/object_oriented_design/deck_of_cards/deck_of_cards.ipynb) |
| Design a parking lot | [Solution](solutions/object_oriented_design/parking_lot/parking_lot.ipynb) |
| Design a chat server | [Solution](solutions/object_oriented_design/online_chat/online_chat.ipynb) |
| Design a circular array | [Contribute](#contributing) |
| Add an object-oriented design question | [Contribute](#contributing) |
l10n:p -->
## Object-oriented design interview questions with solutions
> Распространенные задачи с обсуждением, кодом и диаграммами.
>
> Решение находятся в директории `solutions/`.
>**Внимание, этот раздел находится в стадии разработки**
| Задачи на проектировние | |
|---|---|
| Хэш таблица | [Решение](solutions/object_oriented_design/hash_table/hash_map.ipynb) |
| Кэширование с удалением давно используемых (Least recently used - LRU) | [Решение](solutions/object_oriented_design/lru_cache/lru_cache.ipynb) |
| Центр обработки звонков | [Решение](solutions/object_oriented_design/call_center/call_center.ipynb) |
| Колода карт | [Решение](solutions/object_oriented_design/deck_of_cards/deck_of_cards.ipynb) |
| Парковка | [Решение](solutions/object_oriented_design/parking_lot/parking_lot.ipynb) |
| Чат сервер | [Решение](solutions/object_oriented_design/online_chat/online_chat.ipynb) |
| Циклический массив | [Contribute](#contributing) |
| Добавьте задачу | [Contribute](#contributing) |
<!-- l10n:p
## System design topics: start here
New to system design?
First, you'll need a basic understanding of common principles, learning about what they are, how they are used, and their pros and cons.
l10n:p -->
## System design topics: start here
Только начинайте изучать проектирование систем?
Для начала, вам понадобится понимание базовых принципов, как они используются, их преимущества и недостатки.
<!-- l10n:p
### Step 1: Review the scalability video lecture
[Scalability Lecture at Harvard](https://www.youtube.com/watch?v=-W9F__D3oY4)
* Topics covered:
* Vertical scaling
* Horizontal scaling
* Caching
* Load balancing
* Database replication
* Database partitioning
l10n:p -->
### Step 1: Review the scalability video lecture
[Лекция по масштабированию в Гарварде](https://www.youtube.com/watch?v=-W9F__D3oY4)
* Темы:
* Вертикальное масштабирование
* Горизонтальное масштабирование
* Кэширование
* Балансировка нагрузки
* Репликация баз данных
* Секцирование (Partitioning) баз данных
<!-- l10n:p
### Step 2: Review the scalability article
[Scalability](http://www.lecloud.net/tagged/scalability/chrono)
* Topics covered:
* [Clones](http://www.lecloud.net/post/7295452622/scalability-for-dummies-part-1-clones)
* [Databases](http://www.lecloud.net/post/7994751381/scalability-for-dummies-part-2-database)
* [Caches](http://www.lecloud.net/post/9246290032/scalability-for-dummies-part-3-cache)
* [Asynchronism](http://www.lecloud.net/post/9699762917/scalability-for-dummies-part-4-asynchronism)
l10n:p -->
### Step 2: Review the scalability article
[Масштабирование](http://www.lecloud.net/tagged/scalability/chrono)
* Темы:
* [Клонирование](http://www.lecloud.net/post/7295452622/scalability-for-dummies-part-1-clones)
* [Базы данных](http://www.lecloud.net/post/7994751381/scalability-for-dummies-part-2-database)
* [Кэши](http://www.lecloud.net/post/9246290032/scalability-for-dummies-part-3-cache)
* [Асинхронность](http://www.lecloud.net/post/9699762917/scalability-for-dummies-part-4-asynchronism)
<!-- l10n:p
### Next steps
Next, we'll look at high-level trade-offs:
* **Performance** vs **scalability**
* **Latency** vs **throughput**
* **Availability** vs **consistency**
Keep in mind that **everything is a trade-off**.
Then we'll dive into more specific topics such as DNS, CDNs, and load balancers.
l10n:p -->
### Next steps
Далее, изучим уступки в общем виде:
* **Производительность** и **масштабирование**
* **Задержка** и **пропускная способность**
* **Доступность** и **согласованность данных**
Помните, что **везде необходимы уступки**.
Далее, изучем более детально DNS, CDN, балансировщики нагрузки и другие темы.
<!-- l10n:p
## Performance vs scalability
A service is **scalable** if it results in increased **performance** in a manner proportional to resources added. Generally, increasing performance means serving more units of work, but it can also be to handle larger units of work, such as when datasets grow.<sup><a href=http://www.allthingsdistributed.com/2006/03/a_word_on_scalability.html>1</a></sup>
Another way to look at performance vs scalability:
* If you have a **performance** problem, your system is slow for a single user.
* If you have a **scalability** problem, your system is fast for a single user but slow under heavy load.
l10n:p -->
## Performance vs scalability
Сервис считается **масштабируемым**, если его **производительность** растет пропорционально добавленным ресурсам. Обычно под увеличением производительности подразумевают увеличение количества обрабатываемых единиц работы. Однако, это может быть и обработка более крупных единиц работы, как, например, при росте объема данных.<sup><a href=http://www.allthingsdistributed.com/2006/03/a_word_on_scalability.html>1</a></sup>
Иначе говоря:
* если у вас проблемы с **производительностью**, ваша система медленная для одного пользователя;
* если у вас проблемы с **масштабируемостью**, ваша системы быстрая для одного пользователя, но становится медленной под большой нагрузкой.
<!-- l10n:p
### Source(s) and further reading
* [A word on scalability](http://www.allthingsdistributed.com/2006/03/a_word_on_scalability.html)
* [Scalability, availability, stability, patterns](http://www.slideshare.net/jboner/scalability-availability-stability-patterns/)
l10n:p -->
### Source(s) and further reading
* [A word on scalability](http://www.allthingsdistributed.com/2006/03/a_word_on_scalability.html)
* [Scalability, availability, stability, patterns](http://www.slideshare.net/jboner/scalability-availability-stability-patterns/)
<!-- l10n:p
## Latency vs throughput
**Latency** is the time to perform some action or to produce some result.
**Throughput** is the number of such actions or results per unit of time.
Generally, you should aim for **maximal throughput** with **acceptable latency**.
l10n:p -->
## Latency vs throughput
**Задержка** - это время, необходимое для выполнения действия или достижения некоторого результата.
**Пропускная способность** - это количество такие действий или результататов в единицу времени.
Обычно следует стремиться к **максимальной пропускной способности**, при этом сохраняя **задержку приемлимой**.
<!-- l10n:p
### Source(s) and further reading
* [Understanding latency vs throughput](https://community.cadence.com/cadence_blogs_8/b/sd/archive/2010/09/13/understanding-latency-vs-throughput)
l10n:p -->
### Source(s) and further reading
* [Understanding latency vs throughput](https://community.cadence.com/cadence_blogs_8/b/sd/archive/2010/09/13/understanding-latency-vs-throughput)
<!-- l10n:p
## Availability vs consistency
l10n:p -->
## Availability vs consistency
<!-- l10n:p
### CAP theorem
<p align="center">
<img src="http://i.imgur.com/bgLMI2u.png"/>
<br/>
<i><a href=http://robertgreiner.com/2014/08/cap-theorem-revisited>Source: CAP theorem revisited</a></i>
</p>
In a distributed computer system, you can only support two of the following guarantees:
* **Consistency** - Every read receives the most recent write or an error
* **Availability** - Every request receives a response, without guarantee that it contains the most recent version of the information
* **Partition Tolerance** - The system continues to operate despite arbitrary partitioning due to network failures
*Networks aren't reliable, so you'll need to support partition tolerance. You'll need to make a software tradeoff between consistency and availability.*
l10n:p -->
### CAP theorem
<p align="center">
<img src="http://i.imgur.com/bgLMI2u.png"/>
<br/>
<i><a href=http://robertgreiner.com/2014/08/cap-theorem-revisited>Источник: CAP theorem revisited</a></i>
<br/>
<i><a href=https://ru.wikipedia.org/wiki/%D0%A2%D0%B5%D0%BE%D1%80%D0%B5%D0%BC%D0%B0_CAP>Дополнительный источник: Wikipedia</a></i>
</p>
В распределенный системах можно обеспечить только два из трех свойств, указанных ниже:
* **Согласованность данных (Consistency)** - каждый запрос на чтение возвращает самые актуальные данные либо ошибку.
* **Доступность (Availability)** - любой запрос возвращает результат, но без гарантии, что он содержит самую актуальную версию данных.
* **Устойчивость к разделению (Partition Tolerance)** - система продолжает работать, несмотря на произвольное разделение узлов системы из-за проблем с сетью.
*Сетевые соединения ненадеждны, поэтому поддерживать **устойчивость к разделению** необходимо. Выбор придется делать между **согласованностью данных** и **доступностью**.*
<!-- l10n:p
#### CP - consistency and partition tolerance
Waiting for a response from the partitioned node might result in a timeout error. CP is a good choice if your business needs require atomic reads and writes.
l10n:p -->
#### CP - consistency and partition tolerance
При таком подходе ожидание ответа от узла может привести к ошибке - истечению времени ожидания (timeout error). CP решение хорошо подходит для систем, где необходима атомарность операций чтения и записи.
<!-- l10n:p
#### AP - availability and partition tolerance
Responses return the most recent version of the data available on a node, which might not be the latest. Writes might take some time to propagate when the partition is resolved.
AP is a good choice if the business needs allow for [eventual consistency](#eventual-consistency) or when the system needs to continue working despite external errors.
l10n:p -->
#### AP - availability and partition tolerance
При таком решении ответы на запросы возвращают данные, которые могут быть не самыми актуальными. Операция на запись может занять некоторое время, если придется ожидать восстановления потерянного соединения с одним из узлов распределенной системы.
AP решение подходит для систем, где система должна продолжать работать несмотря на внешние ошибки и допустима [eventual consistency](#eventual-consistency).
<!-- l10n:p
### Source(s) and further reading
* [CAP theorem revisited](http://robertgreiner.com/2014/08/cap-theorem-revisited/)
* [A plain english introduction to CAP theorem](http://ksat.me/a-plain-english-introduction-to-cap-theorem)
* [CAP FAQ](https://github.com/henryr/cap-faq)
l10n:p -->
### Source(s) and further reading
* [CAP theorem revisited](http://robertgreiner.com/2014/08/cap-theorem-revisited/)
* [A plain english introduction to CAP theorem](http://ksat.me/a-plain-english-introduction-to-cap-theorem)
* [CAP FAQ](https://github.com/henryr/cap-faq)
<!-- l10n:p
## Consistency patterns
With multiple copies of the same data, we are faced with options on how to synchronize them so clients have a consistent view of the data. Recall the definition of consistency from the [CAP theorem](#cap-theorem) - Every read receives the most recent write or an error.
l10n:p -->
## Consistency patterns
В распределенной системе можете существовать несколько копий одних и тех же данных. Для достижения согласованности данных, получаемых клиенстким приложением, существует несколько подходов синхронизации этих копий.
<!-- l10n:p
### Weak consistency
After a write, reads may or may not see it. A best effort approach is taken.
This approach is seen in systems such as memcached. Weak consistency works well in real time use cases such as VoIP, video chat, and realtime multiplayer games. For example, if you are on a phone call and lose reception for a few seconds, when you regain connection you do not hear what was spoken during connection loss.
l10n:p -->
### Weak consistency
После операции записи данных, операция чтения может увидеть эти данные, а может и не увидеть. Используется подход, при котором можно сделать как можно лучше, но с учетом данной ситуации.
Этот подход используеются в таких системах, как memcached. Слабая согласованность применяется в таких системах как VoIP, видео чаты и игры реального времени на несколько игроков.
<!-- l10n:p
### Eventual consistency
After a write, reads will eventually see it (typically within milliseconds). Data is replicated asynchronously.
This approach is seen in systems such as DNS and email. Eventual consistency works well in highly available systems.
l10n:p -->
### Eventual consistency
После операции записи данных, операция чтения в конечном счете увидит эти данные (обычно в течение нескольких миллисекунд). Данных в таком случае реплицируются асинхронно.
Такой подход используется в таких системах, как DNS и электронная почта. Согласованность в конечном счете хорошо подходит для систем с высокой доступностью.
<!-- l10n:p
### Strong consistency
After a write, reads will see it. Data is replicated synchronously.
This approach is seen in file systems and RDBMSes. Strong consistency works well in systems that need transactions.
l10n:p -->
### Strong consistency
После операции записи данных, операция чтения увидит эти данны. Данные реплицируются синхронно.
Такой подход используеются в файловых системаях и реляционных БД. Сильная согласованность хорошо подходит для систем, где требуются транзакции.
<!-- l10n:p
### Source(s) and further reading
* [Transactions across data centers](http://snarfed.org/transactions_across_datacenters_io.html)
l10n:p -->
### Source(s) and further reading
* [Transactions across data centers](http://snarfed.org/transactions_across_datacenters_io.html)
<!-- l10n:p
## Availability patterns
There are two main patterns to support high availability: **fail-over** and **replication**.
l10n:p -->
## Availability patterns
Для обеспечения высокой доступности существует два основных паттерна: **отказоустойчивость** и **репликация**.
<!-- l10n:p
### Fail-over
l10n:p -->
### Fail-over
<!-- l10n:p
#### Active-passive
With active-passive fail-over, heartbeats are sent between the active and the passive server on standby. If the heartbeat is interrupted, the passive server takes over the active's IP address and resumes service.
The length of downtime is determined by whether the passive server is already running in 'hot' standby or whether it needs to start up from 'cold' standby. Only the active server handles traffic.
Active-passive failover can also be referred to as master-slave failover.
l10n:p -->
#### Active-passive
В таком режиме, активный и пассивны сервер, находящийся в режиме ожидание, обмениваются специальными сообщениями - heartbeats. Если такой сообщение не получается, то пассивный сервер получает IP адрес активного сервера и восстанавливает работу сервера.
Время простоя определяется в каком состоянии находится пассивный сервер:
* горячее (hot) ожидание - сервер уже работает
* холодное (cold) ожидание - сервер должен быть запущен.
Только активный сервер может обрабатывать клиентские запросы.
<!-- l10n:p
#### Active-active
In active-active, both servers are managing traffic, spreading the load between them.
If the servers are public-facing, the DNS would need to know about the public IPs of both servers. If the servers are internal-facing, application logic would need to know about both servers.
Active-active failover can also be referred to as master-master failover.
l10n:p -->
#### Active-active
В таком режиме, оба сервера обрабатывают клиентские запросы, распределяют нагрузку между собой.
Если сервера имеют общий доступ, то публичные IP адреса обоих серверов должны быть зарегистрированы в DNS. Если сервера находятся во внутренней сети, то клиентское приложение знать про оба сервера.
Режим "активный-активный" также известен как "ведущий-ведущий".
<!-- l10n:p
### Disadvantage(s): failover
* Fail-over adds more hardware and additional complexity.
* There is a potential for loss of data if the active system fails before any newly written data can be replicated to the passive.
l10n:p -->
### Disadvantage(s): failover
* Отказоустойчивость делает систему более сложной и требует большего количества аппаратного обеспечения.
* Существует вероятность потери данных, если данных не успели реплицироваться во время переключения активного и пассивного серверов.
<!-- l10n:p
### Replication
l10n:p -->
### Replication
<!-- l10n:p
#### Master-slave and master-master
This topic is further discussed in the [Database](#database) section:
* [Master-slave replication](#master-slave-replication)
* [Master-master replication](#master-master-replication)
l10n:p -->
#### Master-slave and master-master
Эта тема обсуждается далее в разделе [Database](#database):
* [Master-slave replication](#master-slave-replication)
* [Master-master replication](#master-master-replication)
<!-- l10n:p
### Availability in numbers
Availability is often quantified by uptime (or downtime) as a percentage of time the service is available. Availability is generally measured in number of 9s--a service with 99.99% availability is described as having four 9s.
l10n:p -->
### Availability in numbers
Доступность обычно измеряется как сотношение времени, когда система доступна ко всему промежутку времени измерения. Обычно это количество девяток. Говорят, что сервис с доступностью 99.99%, имеет доступность в четыре девятки.
<!-- l10n:p
#### 99.9% availability - three 9s
| Duration | Acceptable downtime|
|---------------------|--------------------|
| Downtime per year | 8h 45min 57s |
| Downtime per month | 43m 49.7s |
| Downtime per week | 10m 4.8s |
| Downtime per day | 1m 26.4s |
l10n:p -->
#### 99.9% availability - three 9s
| Длительность | Допустимое время простоя |
|------------------------|--------------------------|
| Время простоя в год | 8ч 45мин 57сек |
| Время простоя в месяц | 43мин 49.7сек |
| Время простоя в неделю | 10мин 4.8сек |
| Время простоя в день | 1мин 26.4сек |
<!-- l10n:p
#### 99.99% availability - four 9s
| Duration | Acceptable downtime|
|---------------------|--------------------|
| Downtime per year | 52min 35.7s |
| Downtime per month | 4m 23s |
| Downtime per week | 1m 5s |
| Downtime per day | 8.6s |
l10n:p -->
#### 99.99% availability - four 9s
| Длительность | Допустимое время простоя |
|------------------------|--------------------------|
| Время простоя в год | 52мин 35.7сек |
| Время простоя в месяц | 4мин 23сек |
| Время простоя в неделю | 1мин 5сек |
| Время простоя в день | 8.6сек |
<!-- l10n:p
#### Availability in parallel vs in sequence
If a service consists of multiple components prone to failure, the service's overall availability depends on whether the components are in sequence or in parallel.
l10n:p -->
#### Availability in parallel vs in sequence
Если сервис состоит из нескольких компонентов, которые могут отказать в обслуживании, доступность сервиса зависит от того, как связаны эти компоненты - последовательно или параллельно.
<!-- l10n:p
###### In sequence
Overall availability decreases when two components with availability < 100% are in sequence:
```
Availability (Total) = Availability (Foo) * Availability (Bar)
```
If both `Foo` and `Bar` each had 99.9% availability, their total availability in sequence would be 99.8%.
l10n:p -->
###### In sequence
Общая доступность уменьшается, если два компонента (например, Foo и Bar) с доступностью менее 100% связаны последовательно:
```
Доступность (Общая) = Доступность (Foo) * Доступность (Bar)
```
<!-- l10n:p
###### In parallel
Overall availability increases when two components with availability < 100% are in parallel:
```
Availability (Total) = 1 - (1 - Availability (Foo)) * (1 - Availability (Bar))
```
If both `Foo` and `Bar` each had 99.9% availability, their total availability in parallel would be 99.9999%.
l10n:p -->
###### In parallel
Общая доступность увеличивается, если два компонента с доступностью менее 100% связаны параллельно:
```
Доступность (Общая) = 1 - (1 - Доступность (Foo)) * (1 - Доступность (Bar))
```
<!-- l10n:p
## Domain name system
<p align="center">
<img src="http://i.imgur.com/IOyLj4i.jpg"/>
<br/>
<i><a href=http://www.slideshare.net/srikrupa5/dns-security-presentation-issa>Source: DNS security presentation</a></i>
</p>
A Domain Name System (DNS) translates a domain name such as www.example.com to an IP address.
DNS is hierarchical, with a few authoritative servers at the top level. Your router or ISP provides information about which DNS server(s) to contact when doing a lookup. Lower level DNS servers cache mappings, which could become stale due to DNS propagation delays. DNS results can also be cached by your browser or OS for a certain period of time, determined by the [time to live (TTL)](https://en.wikipedia.org/wiki/Time_to_live).
* **NS record (name server)** - Specifies the DNS servers for your domain/subdomain.
* **MX record (mail exchange)** - Specifies the mail servers for accepting messages.
* **A record (address)** - Points a name to an IP address.
* **CNAME (canonical)** - Points a name to another name or `CNAME` (example.com to www.example.com) or to an `A` record.
Services such as [CloudFlare](https://www.cloudflare.com/dns/) and [Route 53](https://aws.amazon.com/route53/) provide managed DNS services. Some DNS services can route traffic through various methods:
* [Weighted round robin](https://www.g33kinfo.com/info/round-robin-vs-weighted-round-robin-lb)
* Prevent traffic from going to servers under maintenance
* Balance between varying cluster sizes
* A/B testing
* Latency-based
* Geolocation-based
l10n:p -->
## Domain name system
TBD
<!-- l10n:p
### Disadvantage(s): DNS
* Accessing a DNS server introduces a slight delay, although mitigated by caching described above.
* DNS server management could be complex and is generally managed by [governments, ISPs, and large companies](http://superuser.com/questions/472695/who-controls-the-dns-servers/472729).
* DNS services have recently come under [DDoS attack](http://dyn.com/blog/dyn-analysis-summary-of-friday-october-21-attack/), preventing users from accessing websites such as Twitter without knowing Twitter's IP address(es).
l10n:p -->
### Disadvantage(s): DNS
TBD
<!-- l10n:p
### Source(s) and further reading
* [DNS architecture](https://technet.microsoft.com/en-us/library/dd197427(v=ws.10).aspx)
* [Wikipedia](https://en.wikipedia.org/wiki/Domain_Name_System)
* [DNS articles](https://support.dnsimple.com/categories/dns/)
l10n:p -->
### Source(s) and further reading
TBD
<!-- l10n:p
## Content delivery network
<p align="center">
<img src="http://i.imgur.com/h9TAuGI.jpg"/>
<br/>
<i><a href=https://www.creative-artworks.eu/why-use-a-content-delivery-network-cdn/>Source: Why use a CDN</a></i>
</p>
A content delivery network (CDN) is a globally distributed network of proxy servers, serving content from locations closer to the user. Generally, static files such as HTML/CSS/JS, photos, and videos are served from CDN, although some CDNs such as Amazon's CloudFront support dynamic content. The site's DNS resolution will tell clients which server to contact.
Serving content from CDNs can significantly improve performance in two ways:
* Users receive content at data centers close to them
* Your servers do not have to serve requests that the CDN fulfills
l10n:p -->
## Content delivery network
TBD
<!-- l10n:p
### Push CDNs
Push CDNs receive new content whenever changes occur on your server. You take full responsibility for providing content, uploading directly to the CDN and rewriting URLs to point to the CDN. You can configure when content expires and when it is updated. Content is uploaded only when it is new or changed, minimizing traffic, but maximizing storage.
Sites with a small amount of traffic or sites with content that isn't often updated work well with push CDNs. Content is placed on the CDNs once, instead of being re-pulled at regular intervals.
l10n:p -->
### Push CDNs
TBD
<!-- l10n:p
### Pull CDNs
Pull CDNs grab new content from your server when the first user requests the content. You leave the content on your server and rewrite URLs to point to the CDN. This results in a slower request until the content is cached on the CDN.
A [time-to-live (TTL)](https://en.wikipedia.org/wiki/Time_to_live) determines how long content is cached. Pull CDNs minimize storage space on the CDN, but can create redundant traffic if files expire and are pulled before they have actually changed.
Sites with heavy traffic work well with pull CDNs, as traffic is spread out more evenly with only recently-requested content remaining on the CDN.
l10n:p -->
### Pull CDNs
TBD
<!-- l10n:p
### Disadvantage(s): CDN
* CDN costs could be significant depending on traffic, although this should be weighed with additional costs you would incur not using a CDN.
* Content might be stale if it is updated before the TTL expires it.
* CDNs require changing URLs for static content to point to the CDN.
l10n:p -->
### Disadvantage(s): CDN
TBD
<!-- l10n:p
### Source(s) and further reading
* [Globally distributed content delivery](https://figshare.com/articles/Globally_distributed_content_delivery/6605972)
* [The differences between push and pull CDNs](http://www.travelblogadvice.com/technical/the-differences-between-push-and-pull-cdns/)
* [Wikipedia](https://en.wikipedia.org/wiki/Content_delivery_network)
l10n:p -->
### Source(s) and further reading
TBD
<!-- l10n:p
## Load balancer
<p align="center">
<img src="http://i.imgur.com/h81n9iK.png"/>
<br/>
<i><a href=http://horicky.blogspot.com/2010/10/scalable-system-design-patterns.html>Source: Scalable system design patterns</a></i>
</p>
Load balancers distribute incoming client requests to computing resources such as application servers and databases. In each case, the load balancer returns the response from the computing resource to the appropriate client. Load balancers are effective at:
* Preventing requests from going to unhealthy servers
* Preventing overloading resources
* Helping eliminate single points of failure
Load balancers can be implemented with hardware (expensive) or with software such as HAProxy.
Additional benefits include:
* **SSL termination** - Decrypt incoming requests and encrypt server responses so backend servers do not have to perform these potentially expensive operations
* Removes the need to install [X.509 certificates](https://en.wikipedia.org/wiki/X.509) on each server
* **Session persistence** - Issue cookies and route a specific client's requests to same instance if the web apps do not keep track of sessions
To protect against failures, it's common to set up multiple load balancers, either in [active-passive](#active-passive) or [active-active](#active-active) mode.
Load balancers can route traffic based on various metrics, including:
* Random
* Least loaded
* Session/cookies
* [Round robin or weighted round robin](https://www.g33kinfo.com/info/round-robin-vs-weighted-round-robin-lb)
* [Layer 4](#layer-4-load-balancing)
* [Layer 7](#layer-7-load-balancing)
l10n:p -->
## Load balancer
TBD
<!-- l10n:p
### Layer 4 load balancing
Layer 4 load balancers look at info at the [transport layer](#communication) to decide how to distribute requests. Generally, this involves the source, destination IP addresses, and ports in the header, but not the contents of the packet. Layer 4 load balancers forward network packets to and from the upstream server, performing [Network Address Translation (NAT)](https://www.nginx.com/resources/glossary/layer-4-load-balancing/).
l10n:p -->
### Layer 4 load balancing
TBD
<!-- l10n:p
### Layer 7 load balancing
Layer 7 load balancers look at the [application layer](#communication) to decide how to distribute requests. This can involve contents of the header, message, and cookies. Layer 7 load balancers terminates network traffic, reads the message, makes a load-balancing decision, then opens a connection to the selected server. For example, a layer 7 load balancer can direct video traffic to servers that host videos while directing more sensitive user billing traffic to security-hardened servers.
At the cost of flexibility, layer 4 load balancing requires less time and computing resources than Layer 7, although the performance impact can be minimal on modern commodity hardware.
l10n:p -->
### Layer 7 load balancing
TBD
<!-- l10n:p
### Horizontal scaling
Load balancers can also help with horizontal scaling, improving performance and availability. Scaling out using commodity machines is more cost efficient and results in higher availability than scaling up a single server on more expensive hardware, called **Vertical Scaling**. It is also easier to hire for talent working on commodity hardware than it is for specialized enterprise systems.
l10n:p -->
### Horizontal scaling
TBD
<!-- l10n:p
#### Disadvantage(s): horizontal scaling
* Scaling horizontally introduces complexity and involves cloning servers
* Servers should be stateless: they should not contain any user-related data like sessions or profile pictures
* Sessions can be stored in a centralized data store such as a [database](#database) (SQL, NoSQL) or a persistent [cache](#cache) (Redis, Memcached)
* Downstream servers such as caches and databases need to handle more simultaneous connections as upstream servers scale out
l10n:p -->
#### Disadvantage(s): horizontal scaling
TBD
<!-- l10n:p
### Disadvantage(s): load balancer
* The load balancer can become a performance bottleneck if it does not have enough resources or if it is not configured properly.
* Introducing a load balancer to help eliminate single points of failure results in increased complexity.
* A single load balancer is a single point of failure, configuring multiple load balancers further increases complexity.
l10n:p -->
### Disadvantage(s): load balancer
TBD
<!-- l10n:p
### Source(s) and further reading
* [NGINX architecture](https://www.nginx.com/blog/inside-nginx-how-we-designed-for-performance-scale/)
* [HAProxy architecture guide](http://www.haproxy.org/download/1.2/doc/architecture.txt)
* [Scalability](http://www.lecloud.net/post/7295452622/scalability-for-dummies-part-1-clones)
* [Wikipedia](https://en.wikipedia.org/wiki/Load_balancing_(computing))
* [Layer 4 load balancing](https://www.nginx.com/resources/glossary/layer-4-load-balancing/)
* [Layer 7 load balancing](https://www.nginx.com/resources/glossary/layer-7-load-balancing/)
* [ELB listener config](http://docs.aws.amazon.com/elasticloadbalancing/latest/classic/elb-listener-config.html)
l10n:p -->
### Source(s) and further reading
TBD
<!-- l10n:p
## Reverse proxy (web server)
<p align="center">
<img src="http://i.imgur.com/n41Azff.png"/>
<br/>
<i><a href=https://upload.wikimedia.org/wikipedia/commons/6/67/Reverse_proxy_h2g2bob.svg>Source: Wikipedia</a></i>
<br/>
</p>
A reverse proxy is a web server that centralizes internal services and provides unified interfaces to the public. Requests from clients are forwarded to a server that can fulfill it before the reverse proxy returns the server's response to the client.
Additional benefits include:
* **Increased security** - Hide information about backend servers, blacklist IPs, limit number of connections per client
* **Increased scalability and flexibility** - Clients only see the reverse proxy's IP, allowing you to scale servers or change their configuration
* **SSL termination** - Decrypt incoming requests and encrypt server responses so backend servers do not have to perform these potentially expensive operations
* Removes the need to install [X.509 certificates](https://en.wikipedia.org/wiki/X.509) on each server
* **Compression** - Compress server responses
* **Caching** - Return the response for cached requests
* **Static content** - Serve static content directly
* HTML/CSS/JS
* Photos
* Videos
* Etc
l10n:p -->
## Reverse proxy (web server)
TBD
<!-- l10n:p
### Load balancer vs reverse proxy
* Deploying a load balancer is useful when you have multiple servers. Often, load balancers route traffic to a set of servers serving the same function.
* Reverse proxies can be useful even with just one web server or application server, opening up the benefits described in the previous section.
* Solutions such as NGINX and HAProxy can support both layer 7 reverse proxying and load balancing.
l10n:p -->
### Load balancer vs reverse proxy
TBD
<!-- l10n:p
### Disadvantage(s): reverse proxy
* Introducing a reverse proxy results in increased complexity.
* A single reverse proxy is a single point of failure, configuring multiple reverse proxies (ie a [failover](https://en.wikipedia.org/wiki/Failover)) further increases complexity.
l10n:p -->
### Disadvantage(s): reverse proxy
TBD
<!-- l10n:p
### Source(s) and further reading
* [Reverse proxy vs load balancer](https://www.nginx.com/resources/glossary/reverse-proxy-vs-load-balancer/)
* [NGINX architecture](https://www.nginx.com/blog/inside-nginx-how-we-designed-for-performance-scale/)
* [HAProxy architecture guide](http://www.haproxy.org/download/1.2/doc/architecture.txt)
* [Wikipedia](https://en.wikipedia.org/wiki/Reverse_proxy)
l10n:p -->
### Source(s) and further reading
TBD
<!-- l10n:p
## Application layer
<p align="center">
<img src="http://i.imgur.com/yB5SYwm.png"/>
<br/>
<i><a href=http://lethain.com/introduction-to-architecting-systems-for-scale/#platform_layer>Source: Intro to architecting systems for scale</a></i>
</p>
Separating out the web layer from the application layer (also known as platform layer) allows you to scale and configure both layers independently. Adding a new API results in adding application servers without necessarily adding additional web servers. The **single responsibility principle** advocates for small and autonomous services that work together. Small teams with small services can plan more aggressively for rapid growth.
Workers in the application layer also help enable [asynchronism](#asynchronism).
l10n:p -->
## Application layer
TBD
<!-- l10n:p
### Microservices
Related to this discussion are [microservices](https://en.wikipedia.org/wiki/Microservices), which can be described as a suite of independently deployable, small, modular services. Each service runs a unique process and communicates through a well-defined, lightweight mechanism to serve a business goal. <sup><a href=https://smartbear.com/learn/api-design/what-are-microservices>1</a></sup>
Pinterest, for example, could have the following microservices: user profile, follower, feed, search, photo upload, etc.
l10n:p -->
### Microservices
TBD
<!-- l10n:p
### Service Discovery
Systems such as [Consul](https://www.consul.io/docs/index.html), [Etcd](https://coreos.com/etcd/docs/latest), and [Zookeeper](http://www.slideshare.net/sauravhaloi/introduction-to-apache-zookeeper) can help services find each other by keeping track of registered names, addresses, and ports. [Health checks](https://www.consul.io/intro/getting-started/checks.html) help verify service integrity and are often done using an [HTTP](#hypertext-transfer-protocol-http) endpoint. Both Consul and Etcd have a built in [key-value store](#key-value-store) that can be useful for storing config values and other shared data.
l10n:p -->
### Service Discovery
TBD
<!-- l10n:p
### Disadvantage(s): application layer
* Adding an application layer with loosely coupled services requires a different approach from an architectural, operations, and process viewpoint (vs a monolithic system).
* Microservices can add complexity in terms of deployments and operations.
l10n:p -->
### Disadvantage(s): application layer
TBD
<!-- l10n:p
### Source(s) and further reading
* [Intro to architecting systems for scale](http://lethain.com/introduction-to-architecting-systems-for-scale)
* [Crack the system design interview](http://www.puncsky.com/blog/2016-02-13-crack-the-system-design-interview)
* [Service oriented architecture](https://en.wikipedia.org/wiki/Service-oriented_architecture)
* [Introduction to Zookeeper](http://www.slideshare.net/sauravhaloi/introduction-to-apache-zookeeper)
* [Here's what you need to know about building microservices](https://cloudncode.wordpress.com/2016/07/22/msa-getting-started/)
l10n:p -->
### Source(s) and further reading
TBD
<!-- l10n:p
## Database
<p align="center">
<img src="http://i.imgur.com/Xkm5CXz.png"/>
<br/>
<i><a href=https://www.youtube.com/watch?v=w95murBkYmU>Source: Scaling up to your first 10 million users</a></i>
</p>
l10n:p -->
## Database
TBD
<!-- l10n:p
### Relational database management system (RDBMS)
A relational database like SQL is a collection of data items organized in tables.
**ACID** is a set of properties of relational database [transactions](https://en.wikipedia.org/wiki/Database_transaction).
* **Atomicity** - Each transaction is all or nothing
* **Consistency** - Any transaction will bring the database from one valid state to another
* **Isolation** - Executing transactions concurrently has the same results as if the transactions were executed serially
* **Durability** - Once a transaction has been committed, it will remain so
There are many techniques to scale a relational database: **master-slave replication**, **master-master replication**, **federation**, **sharding**, **denormalization**, and **SQL tuning**.
l10n:p -->
### Relational database management system (RDBMS)
TBD
<!-- l10n:p
#### Master-slave replication
The master serves reads and writes, replicating writes to one or more slaves, which serve only reads. Slaves can also replicate to additional slaves in a tree-like fashion. If the master goes offline, the system can continue to operate in read-only mode until a slave is promoted to a master or a new master is provisioned.
<p align="center">
<img src="http://i.imgur.com/C9ioGtn.png"/>
<br/>
<i><a href=http://www.slideshare.net/jboner/scalability-availability-stability-patterns/>Source: Scalability, availability, stability, patterns</a></i>
</p>
l10n:p -->
#### Master-slave replication
TBD
<!-- l10n:p
##### Disadvantage(s): master-slave replication
* Additional logic is needed to promote a slave to a master.
* See [Disadvantage(s): replication](#disadvantages-replication) for points related to **both** master-slave and master-master.
l10n:p -->
##### Disadvantage(s): master-slave replication
TBD
<!-- l10n:p
#### Master-master replication
Both masters serve reads and writes and coordinate with each other on writes. If either master goes down, the system can continue to operate with both reads and writes.
<p align="center">
<img src="http://i.imgur.com/krAHLGg.png"/>
<br/>
<i><a href=http://www.slideshare.net/jboner/scalability-availability-stability-patterns/>Source: Scalability, availability, stability, patterns</a></i>
</p>
l10n:p -->
#### Master-master replication
TBD
<!-- l10n:p
##### Disadvantage(s): master-master replication
* You'll need a load balancer or you'll need to make changes to your application logic to determine where to write.
* Most master-master systems are either loosely consistent (violating ACID) or have increased write latency due to synchronization.
* Conflict resolution comes more into play as more write nodes are added and as latency increases.
* See [Disadvantage(s): replication](#disadvantages-replication) for points related to **both** master-slave and master-master.
l10n:p -->
##### Disadvantage(s): master-master replication
TBD
<!-- l10n:p
##### Disadvantage(s): replication
* There is a potential for loss of data if the master fails before any newly written data can be replicated to other nodes.
* Writes are replayed to the read replicas. If there are a lot of writes, the read replicas can get bogged down with replaying writes and can't do as many reads.
* The more read slaves, the more you have to replicate, which leads to greater replication lag.
* On some systems, writing to the master can spawn multiple threads to write in parallel, whereas read replicas only support writing sequentially with a single thread.
* Replication adds more hardware and additional complexity.
l10n:p -->
##### Disadvantage(s): replication
TBD
<!-- l10n:p
##### Source(s) and further reading: replication
* [Scalability, availability, stability, patterns](http://www.slideshare.net/jboner/scalability-availability-stability-patterns/)
* [Multi-master replication](https://en.wikipedia.org/wiki/Multi-master_replication)
l10n:p -->
##### Source(s) and further reading: replication
TBD
<!-- l10n:p
#### Federation
<p align="center">
<img src="http://i.imgur.com/U3qV33e.png"/>
<br/>
<i><a href=https://www.youtube.com/watch?v=w95murBkYmU>Source: Scaling up to your first 10 million users</a></i>
</p>
Federation (or functional partitioning) splits up databases by function. For example, instead of a single, monolithic database, you could have three databases: **forums**, **users**, and **products**, resulting in less read and write traffic to each database and therefore less replication lag. Smaller databases result in more data that can fit in memory, which in turn results in more cache hits due to improved cache locality. With no single central master serializing writes you can write in parallel, increasing throughput.
l10n:p -->
#### Federation
TBD
<!-- l10n:p
##### Disadvantage(s): federation
* Federation is not effective if your schema requires huge functions or tables.
* You'll need to update your application logic to determine which database to read and write.
* Joining data from two databases is more complex with a [server link](http://stackoverflow.com/questions/5145637/querying-data-by-joining-two-tables-in-two-database-on-different-servers).
* Federation adds more hardware and additional complexity.
l10n:p -->
##### Disadvantage(s): federation
TBD
<!-- l10n:p
##### Source(s) and further reading: federation
* [Scaling up to your first 10 million users](https://www.youtube.com/watch?v=w95murBkYmU)
l10n:p -->
##### Source(s) and further reading: federation
TBD
<!-- l10n:p
#### Sharding
<p align="center">
<img src="http://i.imgur.com/wU8x5Id.png"/>
<br/>
<i><a href=http://www.slideshare.net/jboner/scalability-availability-stability-patterns/>Source: Scalability, availability, stability, patterns</a></i>
</p>
Sharding distributes data across different databases such that each database can only manage a subset of the data. Taking a users database as an example, as the number of users increases, more shards are added to the cluster.
Similar to the advantages of [federation](#federation), sharding results in less read and write traffic, less replication, and more cache hits. Index size is also reduced, which generally improves performance with faster queries. If one shard goes down, the other shards are still operational, although you'll want to add some form of replication to avoid data loss. Like federation, there is no single central master serializing writes, allowing you to write in parallel with increased throughput.
Common ways to shard a table of users is either through the user's last name initial or the user's geographic location.
l10n:p -->
#### Sharding
TBD
<!-- l10n:p
##### Disadvantage(s): sharding
* You'll need to update your application logic to work with shards, which could result in complex SQL queries.
* Data distribution can become lopsided in a shard. For example, a set of power users on a shard could result in increased load to that shard compared to others.
* Rebalancing adds additional complexity. A sharding function based on [consistent hashing](http://www.paperplanes.de/2011/12/9/the-magic-of-consistent-hashing.html) can reduce the amount of transferred data.
* Joining data from multiple shards is more complex.
* Sharding adds more hardware and additional complexity.
l10n:p -->
##### Disadvantage(s): sharding
TBD
<!-- l10n:p
##### Source(s) and further reading: sharding
* [The coming of the shard](http://highscalability.com/blog/2009/8/6/an-unorthodox-approach-to-database-design-the-coming-of-the.html)
* [Shard database architecture](https://en.wikipedia.org/wiki/Shard_(database_architecture))
* [Consistent hashing](http://www.paperplanes.de/2011/12/9/the-magic-of-consistent-hashing.html)
l10n:p -->
##### Source(s) and further reading: sharding
TBD
<!-- l10n:p
#### Denormalization
Denormalization attempts to improve read performance at the expense of some write performance. Redundant copies of the data are written in multiple tables to avoid expensive joins. Some RDBMS such as [PostgreSQL](https://en.wikipedia.org/wiki/PostgreSQL) and Oracle support [materialized views](https://en.wikipedia.org/wiki/Materialized_view) which handle the work of storing redundant information and keeping redundant copies consistent.
Once data becomes distributed with techniques such as [federation](#federation) and [sharding](#sharding), managing joins across data centers further increases complexity. Denormalization might circumvent the need for such complex joins.
In most systems, reads can heavily outnumber writes 100:1 or even 1000:1. A read resulting in a complex database join can be very expensive, spending a significant amount of time on disk operations.
l10n:p -->
#### Denormalization
TBD
<!-- l10n:p
##### Disadvantage(s): denormalization
* Data is duplicated.
* Constraints can help redundant copies of information stay in sync, which increases complexity of the database design.
* A denormalized database under heavy write load might perform worse than its normalized counterpart.
l10n:p -->
##### Disadvantage(s): denormalization
TBD
<!-- l10n:p
###### Source(s) and further reading: denormalization
* [Denormalization](https://en.wikipedia.org/wiki/Denormalization)
l10n:p -->
###### Source(s) and further reading: denormalization
TBD
<!-- l10n:p
#### SQL tuning
SQL tuning is a broad topic and many [books](https://www.amazon.com/s/ref=nb_sb_noss_2?url=search-alias%3Daps&field-keywords=sql+tuning) have been written as reference.
It's important to **benchmark** and **profile** to simulate and uncover bottlenecks.
* **Benchmark** - Simulate high-load situations with tools such as [ab](http://httpd.apache.org/docs/2.2/programs/ab.html).
* **Profile** - Enable tools such as the [slow query log](http://dev.mysql.com/doc/refman/5.7/en/slow-query-log.html) to help track performance issues.
Benchmarking and profiling might point you to the following optimizations.
l10n:p -->
#### SQL tuning
TBD
<!-- l10n:p
##### Tighten up the schema
* MySQL dumps to disk in contiguous blocks for fast access.
* Use `CHAR` instead of `VARCHAR` for fixed-length fields.
* `CHAR` effectively allows for fast, random access, whereas with `VARCHAR`, you must find the end of a string before moving onto the next one.
* Use `TEXT` for large blocks of text such as blog posts. `TEXT` also allows for boolean searches. Using a `TEXT` field results in storing a pointer on disk that is used to locate the text block.
* Use `INT` for larger numbers up to 2^32 or 4 billion.
* Use `DECIMAL` for currency to avoid floating point representation errors.
* Avoid storing large `BLOBS`, store the location of where to get the object instead.
* `VARCHAR(255)` is the largest number of characters that can be counted in an 8 bit number, often maximizing the use of a byte in some RDBMS.
* Set the `NOT NULL` constraint where applicable to [improve search performance](http://stackoverflow.com/questions/1017239/how-do-null-values-affect-performance-in-a-database-search).
l10n:p -->
##### Tighten up the schema
TBD
<!-- l10n:p
##### Use good indices
* Columns that you are querying (`SELECT`, `GROUP BY`, `ORDER BY`, `JOIN`) could be faster with indices.
* Indices are usually represented as self-balancing [B-tree](https://en.wikipedia.org/wiki/B-tree) that keeps data sorted and allows searches, sequential access, insertions, and deletions in logarithmic time.
* Placing an index can keep the data in memory, requiring more space.
* Writes could also be slower since the index also needs to be updated.
* When loading large amounts of data, it might be faster to disable indices, load the data, then rebuild the indices.
l10n:p -->
##### Use good indices
TBD
<!-- l10n:p
##### Avoid expensive joins
* [Denormalize](#denormalization) where performance demands it.
l10n:p -->
##### Avoid expensive joins
TBD
<!-- l10n:p
##### Partition tables
* Break up a table by putting hot spots in a separate table to help keep it in memory.
l10n:p -->
##### Partition tables
TBD
<!-- l10n:p
##### Tune the query cache
* In some cases, the [query cache](https://dev.mysql.com/doc/refman/5.7/en/query-cache.html) could lead to [performance issues](https://www.percona.com/blog/2016/10/12/mysql-5-7-performance-tuning-immediately-after-installation/).
l10n:p -->
##### Tune the query cache
TBD
<!-- l10n:p
##### Source(s) and further reading: SQL tuning
* [Tips for optimizing MySQL queries](http://aiddroid.com/10-tips-optimizing-mysql-queries-dont-suck/)
* [Is there a good reason i see VARCHAR(255) used so often?](http://stackoverflow.com/questions/1217466/is-there-a-good-reason-i-see-varchar255-used-so-often-as-opposed-to-another-l)
* [How do null values affect performance?](http://stackoverflow.com/questions/1017239/how-do-null-values-affect-performance-in-a-database-search)
* [Slow query log](http://dev.mysql.com/doc/refman/5.7/en/slow-query-log.html)
l10n:p -->
##### Source(s) and further reading: SQL tuning
TBD
<!-- l10n:p
### NoSQL
NoSQL is a collection of data items represented in a **key-value store**, **document store**, **wide column store**, or a **graph database**. Data is denormalized, and joins are generally done in the application code. Most NoSQL stores lack true ACID transactions and favor [eventual consistency](#eventual-consistency).
**BASE** is often used to describe the properties of NoSQL databases. In comparison with the [CAP Theorem](#cap-theorem), BASE chooses availability over consistency.
* **Basically available** - the system guarantees availability.
* **Soft state** - the state of the system may change over time, even without input.
* **Eventual consistency** - the system will become consistent over a period of time, given that the system doesn't receive input during that period.
In addition to choosing between [SQL or NoSQL](#sql-or-nosql), it is helpful to understand which type of NoSQL database best fits your use case(s). We'll review **key-value stores**, **document stores**, **wide column stores**, and **graph databases** in the next section.
l10n:p -->
### NoSQL
TBD
<!-- l10n:p
#### Key-value store
> Abstraction: hash table
A key-value store generally allows for O(1) reads and writes and is often backed by memory or SSD. Data stores can maintain keys in [lexicographic order](https://en.wikipedia.org/wiki/Lexicographical_order), allowing efficient retrieval of key ranges. Key-value stores can allow for storing of metadata with a value.
Key-value stores provide high performance and are often used for simple data models or for rapidly-changing data, such as an in-memory cache layer. Since they offer only a limited set of operations, complexity is shifted to the application layer if additional operations are needed.
A key-value store is the basis for more complex systems such as a document store, and in some cases, a graph database.
l10n:p -->
#### Key-value store
TBD
<!-- l10n:p
##### Source(s) and further reading: key-value store
* [Key-value database](https://en.wikipedia.org/wiki/Key-value_database)
* [Disadvantages of key-value stores](http://stackoverflow.com/questions/4056093/what-are-the-disadvantages-of-using-a-key-value-table-over-nullable-columns-or)
* [Redis architecture](http://qnimate.com/overview-of-redis-architecture/)
* [Memcached architecture](https://www.adayinthelifeof.nl/2011/02/06/memcache-internals/)
l10n:p -->
##### Source(s) and further reading: key-value store
TBD
<!-- l10n:p
#### Document store
> Abstraction: key-value store with documents stored as values
A document store is centered around documents (XML, JSON, binary, etc), where a document stores all information for a given object. Document stores provide APIs or a query language to query based on the internal structure of the document itself. *Note, many key-value stores include features for working with a value's metadata, blurring the lines between these two storage types.*
Based on the underlying implementation, documents are organized by collections, tags, metadata, or directories. Although documents can be organized or grouped together, documents may have fields that are completely different from each other.
Some document stores like [MongoDB](https://www.mongodb.com/mongodb-architecture) and [CouchDB](https://blog.couchdb.org/2016/08/01/couchdb-2-0-architecture/) also provide a SQL-like language to perform complex queries. [DynamoDB](http://www.read.seas.harvard.edu/~kohler/class/cs239-w08/decandia07dynamo.pdf) supports both key-values and documents.
Document stores provide high flexibility and are often used for working with occasionally changing data.
l10n:p -->
#### Document store
TBD
<!-- l10n:p
##### Source(s) and further reading: document store
* [Document-oriented database](https://en.wikipedia.org/wiki/Document-oriented_database)
* [MongoDB architecture](https://www.mongodb.com/mongodb-architecture)
* [CouchDB architecture](https://blog.couchdb.org/2016/08/01/couchdb-2-0-architecture/)
* [Elasticsearch architecture](https://www.elastic.co/blog/found-elasticsearch-from-the-bottom-up)
l10n:p -->
##### Source(s) and further reading: document store
TBD
<!-- l10n:p
#### Wide column store
<p align="center">
<img src="http://i.imgur.com/n16iOGk.png"/>
<br/>
<i><a href=http://blog.grio.com/2015/11/sql-nosql-a-brief-history.html>Source: SQL & NoSQL, a brief history</a></i>
</p>
> Abstraction: nested map `ColumnFamily<RowKey, Columns<ColKey, Value, Timestamp>>`
A wide column store's basic unit of data is a column (name/value pair). A column can be grouped in column families (analogous to a SQL table). Super column families further group column families. You can access each column independently with a row key, and columns with the same row key form a row. Each value contains a timestamp for versioning and for conflict resolution.
Google introduced [Bigtable](http://www.read.seas.harvard.edu/~kohler/class/cs239-w08/chang06bigtable.pdf) as the first wide column store, which influenced the open-source [HBase](https://www.mapr.com/blog/in-depth-look-hbase-architecture) often-used in the Hadoop ecosystem, and [Cassandra](http://docs.datastax.com/en/cassandra/3.0/cassandra/architecture/archIntro.html) from Facebook. Stores such as BigTable, HBase, and Cassandra maintain keys in lexicographic order, allowing efficient retrieval of selective key ranges.
Wide column stores offer high availability and high scalability. They are often used for very large data sets.
l10n:p -->
#### Wide column store
TBD
<!-- l10n:p
##### Source(s) and further reading: wide column store
* [SQL & NoSQL, a brief history](http://blog.grio.com/2015/11/sql-nosql-a-brief-history.html)
* [Bigtable architecture](http://www.read.seas.harvard.edu/~kohler/class/cs239-w08/chang06bigtable.pdf)
* [HBase architecture](https://www.mapr.com/blog/in-depth-look-hbase-architecture)
* [Cassandra architecture](http://docs.datastax.com/en/cassandra/3.0/cassandra/architecture/archIntro.html)
l10n:p -->
##### Source(s) and further reading: wide column store
TBD
<!-- l10n:p
#### Graph database
<p align="center">
<img src="http://i.imgur.com/fNcl65g.png"/>
<br/>
<i><a href=https://en.wikipedia.org/wiki/File:GraphDatabase_PropertyGraph.png>Source: Graph database</a></i>
</p>
> Abstraction: graph
In a graph database, each node is a record and each arc is a relationship between two nodes. Graph databases are optimized to represent complex relationships with many foreign keys or many-to-many relationships.
Graphs databases offer high performance for data models with complex relationships, such as a social network. They are relatively new and are not yet widely-used; it might be more difficult to find development tools and resources. Many graphs can only be accessed with [REST APIs](#representational-state-transfer-rest).
l10n:p -->
#### Graph database
TBD
<!-- l10n:p
##### Source(s) and further reading: graph
* [Graph database](https://en.wikipedia.org/wiki/Graph_database)
* [Neo4j](https://neo4j.com/)
* [FlockDB](https://blog.twitter.com/2010/introducing-flockdb)
l10n:p -->
##### Source(s) and further reading: graph
TBD
<!-- l10n:p
#### Source(s) and further reading: NoSQL
* [Explanation of base terminology](http://stackoverflow.com/questions/3342497/explanation-of-base-terminology)
* [NoSQL databases a survey and decision guidance](https://medium.com/baqend-blog/nosql-databases-a-survey-and-decision-guidance-ea7823a822d#.wskogqenq)
* [Scalability](http://www.lecloud.net/post/7994751381/scalability-for-dummies-part-2-database)
* [Introduction to NoSQL](https://www.youtube.com/watch?v=qI_g07C_Q5I)
* [NoSQL patterns](http://horicky.blogspot.com/2009/11/nosql-patterns.html)
l10n:p -->
#### Source(s) and further reading: NoSQL
TBD
<!-- l10n:p
### SQL or NoSQL
<p align="center">
<img src="http://i.imgur.com/wXGqG5f.png"/>
<br/>
<i><a href=https://www.infoq.com/articles/Transition-RDBMS-NoSQL/>Source: Transitioning from RDBMS to NoSQL</a></i>
</p>
Reasons for **SQL**:
* Structured data
* Strict schema
* Relational data
* Need for complex joins
* Transactions
* Clear patterns for scaling
* More established: developers, community, code, tools, etc
* Lookups by index are very fast
Reasons for **NoSQL**:
* Semi-structured data
* Dynamic or flexible schema
* Non-relational data
* No need for complex joins
* Store many TB (or PB) of data
* Very data intensive workload
* Very high throughput for IOPS
Sample data well-suited for NoSQL:
* Rapid ingest of clickstream and log data
* Leaderboard or scoring data
* Temporary data, such as a shopping cart
* Frequently accessed ('hot') tables
* Metadata/lookup tables
l10n:p -->
### SQL or NoSQL
TBD
<!-- l10n:p
##### Source(s) and further reading: SQL or NoSQL
* [Scaling up to your first 10 million users](https://www.youtube.com/watch?v=w95murBkYmU)
* [SQL vs NoSQL differences](https://www.sitepoint.com/sql-vs-nosql-differences/)
l10n:p -->
##### Source(s) and further reading: SQL or NoSQL
TBD
<!-- l10n:p
## Cache
<p align="center">
<img src="http://i.imgur.com/Q6z24La.png"/>
<br/>
<i><a href=http://horicky.blogspot.com/2010/10/scalable-system-design-patterns.html>Source: Scalable system design patterns</a></i>
</p>
Caching improves page load times and can reduce the load on your servers and databases. In this model, the dispatcher will first lookup if the request has been made before and try to find the previous result to return, in order to save the actual execution.
Databases often benefit from a uniform distribution of reads and writes across its partitions. Popular items can skew the distribution, causing bottlenecks. Putting a cache in front of a database can help absorb uneven loads and spikes in traffic.
l10n:p -->
## Cache
TBD
<!-- l10n:p
### Client caching
Caches can be located on the client side (OS or browser), [server side](#reverse-proxy-web-server), or in a distinct cache layer.
l10n:p -->
### Client caching
TBD
<!-- l10n:p
### CDN caching
[CDNs](#content-delivery-network) are considered a type of cache.
l10n:p -->
### CDN caching
TBD
<!-- l10n:p
### Web server caching
[Reverse proxies](#reverse-proxy-web-server) and caches such as [Varnish](https://www.varnish-cache.org/) can serve static and dynamic content directly. Web servers can also cache requests, returning responses without having to contact application servers.
l10n:p -->
### Web server caching
TBD
<!-- l10n:p
### Database caching
Your database usually includes some level of caching in a default configuration, optimized for a generic use case. Tweaking these settings for specific usage patterns can further boost performance.
l10n:p -->
### Database caching
TBD
<!-- l10n:p
### Application caching
In-memory caches such as Memcached and Redis are key-value stores between your application and your data storage. Since the data is held in RAM, it is much faster than typical databases where data is stored on disk. RAM is more limited than disk, so [cache invalidation](https://en.wikipedia.org/wiki/Cache_algorithms) algorithms such as [least recently used (LRU)](https://en.wikipedia.org/wiki/Cache_algorithms#Least_Recently_Used) can help invalidate 'cold' entries and keep 'hot' data in RAM.
Redis has the following additional features:
* Persistence option
* Built-in data structures such as sorted sets and lists
There are multiple levels you can cache that fall into two general categories: **database queries** and **objects**:
* Row level
* Query-level
* Fully-formed serializable objects
* Fully-rendered HTML
Generally, you should try to avoid file-based caching, as it makes cloning and auto-scaling more difficult.
l10n:p -->
### Application caching
TBD
<!-- l10n:p
### Caching at the database query level
Whenever you query the database, hash the query as a key and store the result to the cache. This approach suffers from expiration issues:
* Hard to delete a cached result with complex queries
* If one piece of data changes such as a table cell, you need to delete all cached queries that might include the changed cell
l10n:p -->
### Caching at the database query level
TBD
<!-- l10n:p
### Caching at the object level
See your data as an object, similar to what you do with your application code. Have your application assemble the dataset from the database into a class instance or a data structure(s):
* Remove the object from cache if its underlying data has changed
* Allows for asynchronous processing: workers assemble objects by consuming the latest cached object
Suggestions of what to cache:
* User sessions
* Fully rendered web pages
* Activity streams
* User graph data
l10n:p -->
### Caching at the object level
TBD
<!-- l10n:p
### When to update the cache
Since you can only store a limited amount of data in cache, you'll need to determine which cache update strategy works best for your use case.
l10n:p -->
### When to update the cache
TBD
<!-- l10n:p
#### Cache-aside
<p align="center">
<img src="http://i.imgur.com/ONjORqk.png"/>
<br/>
<i><a href=http://www.slideshare.net/tmatyashovsky/from-cache-to-in-memory-data-grid-introduction-to-hazelcast>Source: From cache to in-memory data grid</a></i>
</p>
The application is responsible for reading and writing from storage. The cache does not interact with storage directly. The application does the following:
* Look for entry in cache, resulting in a cache miss
* Load entry from the database
* Add entry to cache
* Return entry
```python
def get_user(self, user_id):
user = cache.get("user.{0}", user_id)
if user is None:
user = db.query("SELECT * FROM users WHERE user_id = {0}", user_id)
if user is not None:
key = "user.{0}".format(user_id)
cache.set(key, json.dumps(user))
return user
```
[Memcached](https://memcached.org/) is generally used in this manner.
Subsequent reads of data added to cache are fast. Cache-aside is also referred to as lazy loading. Only requested data is cached, which avoids filling up the cache with data that isn't requested.
l10n:p -->
#### Cache-aside
TBD
<!-- l10n:p
##### Disadvantage(s): cache-aside
* Each cache miss results in three trips, which can cause a noticeable delay.
* Data can become stale if it is updated in the database. This issue is mitigated by setting a time-to-live (TTL) which forces an update of the cache entry, or by using write-through.
* When a node fails, it is replaced by a new, empty node, increasing latency.
l10n:p -->
##### Disadvantage(s): cache-aside
TBD
<!-- l10n:p
#### Write-through
<p align="center">
<img src="http://i.imgur.com/0vBc0hN.png"/>
<br/>
<i><a href=http://www.slideshare.net/jboner/scalability-availability-stability-patterns/>Source: Scalability, availability, stability, patterns</a></i>
</p>
The application uses the cache as the main data store, reading and writing data to it, while the cache is responsible for reading and writing to the database:
* Application adds/updates entry in cache
* Cache synchronously writes entry to data store
* Return
Application code:
```python
set_user(12345, {"foo":"bar"})
```
Cache code:
```python
def set_user(user_id, values):
user = db.query("UPDATE Users WHERE id = {0}", user_id, values)
cache.set(user_id, user)
```
Write-through is a slow overall operation due to the write operation, but subsequent reads of just written data are fast. Users are generally more tolerant of latency when updating data than reading data. Data in the cache is not stale.
l10n:p -->
#### Write-through
TBD
<!-- l10n:p
##### Disadvantage(s): write through
* When a new node is created due to failure or scaling, the new node will not cache entries until the entry is updated in the database. Cache-aside in conjunction with write through can mitigate this issue.
* Most data written might never be read, which can be minimized with a TTL.
l10n:p -->
##### Disadvantage(s): write through
TBD
<!-- l10n:p
#### Write-behind (write-back)
<p align="center">
<img src="http://i.imgur.com/rgSrvjG.png"/>
<br/>
<i><a href=http://www.slideshare.net/jboner/scalability-availability-stability-patterns/>Source: Scalability, availability, stability, patterns</a></i>
</p>
In write-behind, the application does the following:
* Add/update entry in cache
* Asynchronously write entry to the data store, improving write performance
l10n:p -->
#### Write-behind (write-back)
TBD
<!-- l10n:p
##### Disadvantage(s): write-behind
* There could be data loss if the cache goes down prior to its contents hitting the data store.
* It is more complex to implement write-behind than it is to implement cache-aside or write-through.
l10n:p -->
##### Disadvantage(s): write-behind
TBD
<!-- l10n:p
#### Refresh-ahead
<p align="center">
<img src="http://i.imgur.com/kxtjqgE.png"/>
<br/>
<i><a href=http://www.slideshare.net/tmatyashovsky/from-cache-to-in-memory-data-grid-introduction-to-hazelcast>Source: From cache to in-memory data grid</a></i>
</p>
You can configure the cache to automatically refresh any recently accessed cache entry prior to its expiration.
Refresh-ahead can result in reduced latency vs read-through if the cache can accurately predict which items are likely to be needed in the future.
l10n:p -->
#### Refresh-ahead
TBD
<!-- l10n:p
##### Disadvantage(s): refresh-ahead
* Not accurately predicting which items are likely to be needed in the future can result in reduced performance than without refresh-ahead.
l10n:p -->
##### Disadvantage(s): refresh-ahead
TBD
<!-- l10n:p
### Disadvantage(s): cache
* Need to maintain consistency between caches and the source of truth such as the database through [cache invalidation](https://en.wikipedia.org/wiki/Cache_algorithms).
* Cache invalidation is a difficult problem, there is additional complexity associated with when to update the cache.
* Need to make application changes such as adding Redis or memcached.
l10n:p -->
### Disadvantage(s): cache
TBD
<!-- l10n:p
### Source(s) and further reading
* [From cache to in-memory data grid](http://www.slideshare.net/tmatyashovsky/from-cache-to-in-memory-data-grid-introduction-to-hazelcast)
* [Scalable system design patterns](http://horicky.blogspot.com/2010/10/scalable-system-design-patterns.html)
* [Introduction to architecting systems for scale](http://lethain.com/introduction-to-architecting-systems-for-scale/)
* [Scalability, availability, stability, patterns](http://www.slideshare.net/jboner/scalability-availability-stability-patterns/)
* [Scalability](http://www.lecloud.net/post/9246290032/scalability-for-dummies-part-3-cache)
* [AWS ElastiCache strategies](http://docs.aws.amazon.com/AmazonElastiCache/latest/UserGuide/Strategies.html)
* [Wikipedia](https://en.wikipedia.org/wiki/Cache_(computing))
l10n:p -->
### Source(s) and further reading
TBD
<!-- l10n:p
## Asynchronism
<p align="center">
<img src="http://i.imgur.com/54GYsSx.png"/>
<br/>
<i><a href=http://lethain.com/introduction-to-architecting-systems-for-scale/#platform_layer>Source: Intro to architecting systems for scale</a></i>
</p>
Asynchronous workflows help reduce request times for expensive operations that would otherwise be performed in-line. They can also help by doing time-consuming work in advance, such as periodic aggregation of data.
l10n:p -->
## Asynchronism
TBD
<!-- l10n:p
### Message queues
Message queues receive, hold, and deliver messages. If an operation is too slow to perform inline, you can use a message queue with the following workflow:
* An application publishes a job to the queue, then notifies the user of job status
* A worker picks up the job from the queue, processes it, then signals the job is complete
The user is not blocked and the job is processed in the background. During this time, the client might optionally do a small amount of processing to make it seem like the task has completed. For example, if posting a tweet, the tweet could be instantly posted to your timeline, but it could take some time before your tweet is actually delivered to all of your followers.
**[Redis](https://redis.io/)** is useful as a simple message broker but messages can be lost.
**[RabbitMQ](https://www.rabbitmq.com/)** is popular but requires you to adapt to the 'AMQP' protocol and manage your own nodes.
**[Amazon SQS](https://aws.amazon.com/sqs/)** is hosted but can have high latency and has the possibility of messages being delivered twice.
l10n:p -->
### Message queues
TBD
<!-- l10n:p
### Task queues
Tasks queues receive tasks and their related data, runs them, then delivers their results. They can support scheduling and can be used to run computationally-intensive jobs in the background.
**Celery** has support for scheduling and primarily has python support.
l10n:p -->
### Task queues
TBD
<!-- l10n:p
### Back pressure
If queues start to grow significantly, the queue size can become larger than memory, resulting in cache misses, disk reads, and even slower performance. [Back pressure](http://mechanical-sympathy.blogspot.com/2012/05/apply-back-pressure-when-overloaded.html) can help by limiting the queue size, thereby maintaining a high throughput rate and good response times for jobs already in the queue. Once the queue fills up, clients get a server busy or HTTP 503 status code to try again later. Clients can retry the request at a later time, perhaps with [exponential backoff](https://en.wikipedia.org/wiki/Exponential_backoff).
l10n:p -->
### Back pressure
TBD
<!-- l10n:p
### Disadvantage(s): asynchronism
* Use cases such as inexpensive calculations and realtime workflows might be better suited for synchronous operations, as introducing queues can add delays and complexity.
l10n:p -->
### Disadvantage(s): asynchronism
TBD
<!-- l10n:p
### Source(s) and further reading
* [It's all a numbers game](https://www.youtube.com/watch?v=1KRYH75wgy4)
* [Applying back pressure when overloaded](http://mechanical-sympathy.blogspot.com/2012/05/apply-back-pressure-when-overloaded.html)
* [Little's law](https://en.wikipedia.org/wiki/Little%27s_law)
* [What is the difference between a message queue and a task queue?](https://www.quora.com/What-is-the-difference-between-a-message-queue-and-a-task-queue-Why-would-a-task-queue-require-a-message-broker-like-RabbitMQ-Redis-Celery-or-IronMQ-to-function)
l10n:p -->
### Source(s) and further reading
TBD
<!-- l10n:p
## Communication
<p align="center">
<img src="http://i.imgur.com/5KeocQs.jpg"/>
<br/>
<i><a href=http://www.escotal.com/osilayer.html>Source: OSI 7 layer model</a></i>
</p>
l10n:p -->
## Communication
TBD
<!-- l10n:p
### Hypertext transfer protocol (HTTP)
HTTP is a method for encoding and transporting data between a client and a server. It is a request/response protocol: clients issue requests and servers issue responses with relevant content and completion status info about the request. HTTP is self-contained, allowing requests and responses to flow through many intermediate routers and servers that perform load balancing, caching, encryption, and compression.
A basic HTTP request consists of a verb (method) and a resource (endpoint). Below are common HTTP verbs:
| Verb | Description | Idempotent* | Safe | Cacheable |
|---|---|---|---|---|
| GET | Reads a resource | Yes | Yes | Yes |
| POST | Creates a resource or trigger a process that handles data | No | No | Yes if response contains freshness info |
| PUT | Creates or replace a resource | Yes | No | No |
| PATCH | Partially updates a resource | No | No | Yes if response contains freshness info |
| DELETE | Deletes a resource | Yes | No | No |
*Can be called many times without different outcomes.
HTTP is an application layer protocol relying on lower-level protocols such as **TCP** and **UDP**.
l10n:p -->
### Hypertext transfer protocol (HTTP)
TBD
<!-- l10n:p
#### Source(s) and further reading: HTTP
* [What is HTTP?](https://www.nginx.com/resources/glossary/http/)
* [Difference between HTTP and TCP](https://www.quora.com/What-is-the-difference-between-HTTP-protocol-and-TCP-protocol)
* [Difference between PUT and PATCH](https://laracasts.com/discuss/channels/general-discussion/whats-the-differences-between-put-and-patch?page=1)
l10n:p -->
#### Source(s) and further reading: HTTP
TBD
<!-- l10n:p
### Transmission control protocol (TCP)
<p align="center">
<img src="http://i.imgur.com/JdAsdvG.jpg"/>
<br/>
<i><a href=http://www.wildbunny.co.uk/blog/2012/10/09/how-to-make-a-multi-player-game-part-1/>Source: How to make a multiplayer game</a></i>
</p>
TCP is a connection-oriented protocol over an [IP network](https://en.wikipedia.org/wiki/Internet_Protocol). Connection is established and terminated using a [handshake](https://en.wikipedia.org/wiki/Handshaking). All packets sent are guaranteed to reach the destination in the original order and without corruption through:
* Sequence numbers and [checksum fields](https://en.wikipedia.org/wiki/Transmission_Control_Protocol#Checksum_computation) for each packet
* [Acknowledgement](https://en.wikipedia.org/wiki/Acknowledgement_(data_networks)) packets and automatic retransmission
If the sender does not receive a correct response, it will resend the packets. If there are multiple timeouts, the connection is dropped. TCP also implements [flow control](https://en.wikipedia.org/wiki/Flow_control_(data)) and [congestion control](https://en.wikipedia.org/wiki/Network_congestion#Congestion_control). These guarantees cause delays and generally result in less efficient transmission than UDP.
To ensure high throughput, web servers can keep a large number of TCP connections open, resulting in high memory usage. It can be expensive to have a large number of open connections between web server threads and say, a [memcached](https://memcached.org/) server. [Connection pooling](https://en.wikipedia.org/wiki/Connection_pool) can help in addition to switching to UDP where applicable.
TCP is useful for applications that require high reliability but are less time critical. Some examples include web servers, database info, SMTP, FTP, and SSH.
Use TCP over UDP when:
* You need all of the data to arrive intact
* You want to automatically make a best estimate use of the network throughput
l10n:p -->
### Transmission control protocol (TCP)
TBD
<!-- l10n:p
### User datagram protocol (UDP)
<p align="center">
<img src="http://i.imgur.com/yzDrJtA.jpg"/>
<br/>
<i><a href=http://www.wildbunny.co.uk/blog/2012/10/09/how-to-make-a-multi-player-game-part-1/>Source: How to make a multiplayer game</a></i>
</p>
UDP is connectionless. Datagrams (analogous to packets) are guaranteed only at the datagram level. Datagrams might reach their destination out of order or not at all. UDP does not support congestion control. Without the guarantees that TCP support, UDP is generally more efficient.
UDP can broadcast, sending datagrams to all devices on the subnet. This is useful with [DHCP](https://en.wikipedia.org/wiki/Dynamic_Host_Configuration_Protocol) because the client has not yet received an IP address, thus preventing a way for TCP to stream without the IP address.
UDP is less reliable but works well in real time use cases such as VoIP, video chat, streaming, and realtime multiplayer games.
Use UDP over TCP when:
* You need the lowest latency
* Late data is worse than loss of data
* You want to implement your own error correction
l10n:p -->
### User datagram protocol (UDP)
TBD
<!-- l10n:p
#### Source(s) and further reading: TCP and UDP
* [Networking for game programming](http://gafferongames.com/networking-for-game-programmers/udp-vs-tcp/)
* [Key differences between TCP and UDP protocols](http://www.cyberciti.biz/faq/key-differences-between-tcp-and-udp-protocols/)
* [Difference between TCP and UDP](http://stackoverflow.com/questions/5970383/difference-between-tcp-and-udp)
* [Transmission control protocol](https://en.wikipedia.org/wiki/Transmission_Control_Protocol)
* [User datagram protocol](https://en.wikipedia.org/wiki/User_Datagram_Protocol)
* [Scaling memcache at Facebook](http://www.cs.bu.edu/~jappavoo/jappavoo.github.com/451/papers/memcache-fb.pdf)
l10n:p -->
#### Source(s) and further reading: TCP and UDP
TBD
<!-- l10n:p
### Remote procedure call (RPC)
<p align="center">
<img src="http://i.imgur.com/iF4Mkb5.png"/>
<br/>
<i><a href=http://www.puncsky.com/blog/2016-02-13-crack-the-system-design-interview>Source: Crack the system design interview</a></i>
</p>
In an RPC, a client causes a procedure to execute on a different address space, usually a remote server. The procedure is coded as if it were a local procedure call, abstracting away the details of how to communicate with the server from the client program. Remote calls are usually slower and less reliable than local calls so it is helpful to distinguish RPC calls from local calls. Popular RPC frameworks include [Protobuf](https://developers.google.com/protocol-buffers/), [Thrift](https://thrift.apache.org/), and [Avro](https://avro.apache.org/docs/current/).
RPC is a request-response protocol:
* **Client program** - Calls the client stub procedure. The parameters are pushed onto the stack like a local procedure call.
* **Client stub procedure** - Marshals (packs) procedure id and arguments into a request message.
* **Client communication module** - OS sends the message from the client to the server.
* **Server communication module** - OS passes the incoming packets to the server stub procedure.
* **Server stub procedure** - Unmarshalls the results, calls the server procedure matching the procedure id and passes the given arguments.
* The server response repeats the steps above in reverse order.
Sample RPC calls:
```
GET /someoperation?data=anId
POST /anotheroperation
{
"data":"anId";
"anotherdata": "another value"
}
```
RPC is focused on exposing behaviors. RPCs are often used for performance reasons with internal communications, as you can hand-craft native calls to better fit your use cases.
Choose a native library (aka SDK) when:
* You know your target platform.
* You want to control how your "logic" is accessed.
* You want to control how error control happens off your library.
* Performance and end user experience is your primary concern.
HTTP APIs following **REST** tend to be used more often for public APIs.
l10n:p -->
### Remote procedure call (RPC)
TBD
<!-- l10n:p
#### Disadvantage(s): RPC
* RPC clients become tightly coupled to the service implementation.
* A new API must be defined for every new operation or use case.
* It can be difficult to debug RPC.
* You might not be able to leverage existing technologies out of the box. For example, it might require additional effort to ensure [RPC calls are properly cached](http://etherealbits.com/2012/12/debunking-the-myths-of-rpc-rest/) on caching servers such as [Squid](http://www.squid-cache.org/).
l10n:p -->
#### Disadvantage(s): RPC
TBD
<!-- l10n:p
### Representational state transfer (REST)
REST is an architectural style enforcing a client/server model where the client acts on a set of resources managed by the server. The server provides a representation of resources and actions that can either manipulate or get a new representation of resources. All communication must be stateless and cacheable.
There are four qualities of a RESTful interface:
* **Identify resources (URI in HTTP)** - use the same URI regardless of any operation.
* **Change with representations (Verbs in HTTP)** - use verbs, headers, and body.
* **Self-descriptive error message (status response in HTTP)** - Use status codes, don't reinvent the wheel.
* **[HATEOAS](http://restcookbook.com/Basics/hateoas/) (HTML interface for HTTP)** - your web service should be fully accessible in a browser.
Sample REST calls:
```
GET /someresources/anId
PUT /someresources/anId
{"anotherdata": "another value"}
```
REST is focused on exposing data. It minimizes the coupling between client/server and is often used for public HTTP APIs. REST uses a more generic and uniform method of exposing resources through URIs, [representation through headers](https://github.com/for-GET/know-your-http-well/blob/master/headers.md), and actions through verbs such as GET, POST, PUT, DELETE, and PATCH. Being stateless, REST is great for horizontal scaling and partitioning.
l10n:p -->
### Representational state transfer (REST)
TBD
<!-- l10n:p
#### Disadvantage(s): REST
* With REST being focused on exposing data, it might not be a good fit if resources are not naturally organized or accessed in a simple hierarchy. For example, returning all updated records from the past hour matching a particular set of events is not easily expressed as a path. With REST, it is likely to be implemented with a combination of URI path, query parameters, and possibly the request body.
* REST typically relies on a few verbs (GET, POST, PUT, DELETE, and PATCH) which sometimes doesn't fit your use case. For example, moving expired documents to the archive folder might not cleanly fit within these verbs.
* Fetching complicated resources with nested hierarchies requires multiple round trips between the client and server to render single views, e.g. fetching content of a blog entry and the comments on that entry. For mobile applications operating in variable network conditions, these multiple roundtrips are highly undesirable.
* Over time, more fields might be added to an API response and older clients will receive all new data fields, even those that they do not need, as a result, it bloats the payload size and leads to larger latencies.
l10n:p -->
#### Disadvantage(s): REST
TBD
<!-- l10n:p
### RPC and REST calls comparison
| Operation | RPC | REST |
|---|---|---|
| Signup | **POST** /signup | **POST** /persons |
| Resign | **POST** /resign<br/>{<br/>"personid": "1234"<br/>} | **DELETE** /persons/1234 |
| Read a person | **GET** /readPerson?personid=1234 | **GET** /persons/1234 |
| Read a persons items list | **GET** /readUsersItemsList?personid=1234 | **GET** /persons/1234/items |
| Add an item to a persons items | **POST** /addItemToUsersItemsList<br/>{<br/>"personid": "1234";<br/>"itemid": "456"<br/>} | **POST** /persons/1234/items<br/>{<br/>"itemid": "456"<br/>} |
| Update an item | **POST** /modifyItem<br/>{<br/>"itemid": "456";<br/>"key": "value"<br/>} | **PUT** /items/456<br/>{<br/>"key": "value"<br/>} |
| Delete an item | **POST** /removeItem<br/>{<br/>"itemid": "456"<br/>} | **DELETE** /items/456 |
<p align="center">
<i><a href=https://apihandyman.io/do-you-really-know-why-you-prefer-rest-over-rpc/>Source: Do you really know why you prefer REST over RPC</a></i>
</p>
l10n:p -->
### RPC and REST calls comparison
TBD
<!-- l10n:p
#### Source(s) and further reading: REST and RPC
* [Do you really know why you prefer REST over RPC](https://apihandyman.io/do-you-really-know-why-you-prefer-rest-over-rpc/)
* [When are RPC-ish approaches more appropriate than REST?](http://programmers.stackexchange.com/a/181186)
* [REST vs JSON-RPC](http://stackoverflow.com/questions/15056878/rest-vs-json-rpc)
* [Debunking the myths of RPC and REST](http://etherealbits.com/2012/12/debunking-the-myths-of-rpc-rest/)
* [What are the drawbacks of using REST](https://www.quora.com/What-are-the-drawbacks-of-using-RESTful-APIs)
* [Crack the system design interview](http://www.puncsky.com/blog/2016-02-13-crack-the-system-design-interview)
* [Thrift](https://code.facebook.com/posts/1468950976659943/)
* [Why REST for internal use and not RPC](http://arstechnica.com/civis/viewtopic.php?t=1190508)
l10n:p -->
#### Source(s) and further reading: REST and RPC
TBD
<!-- l10n:p
## Security
This section could use some updates. Consider [contributing](#contributing)!
Security is a broad topic. Unless you have considerable experience, a security background, or are applying for a position that requires knowledge of security, you probably won't need to know more than the basics:
* Encrypt in transit and at rest.
* Sanitize all user inputs or any input parameters exposed to user to prevent [XSS](https://en.wikipedia.org/wiki/Cross-site_scripting) and [SQL injection](https://en.wikipedia.org/wiki/SQL_injection).
* Use parameterized queries to prevent SQL injection.
* Use the principle of [least privilege](https://en.wikipedia.org/wiki/Principle_of_least_privilege).
l10n:p -->
## Security
TBD
<!-- l10n:p
### Source(s) and further reading
* [API security checklist](https://github.com/shieldfy/API-Security-Checklist)
* [Security guide for developers](https://github.com/FallibleInc/security-guide-for-developers)
* [OWASP top ten](https://www.owasp.org/index.php/OWASP_Top_Ten_Cheat_Sheet)
l10n:p -->
### Source(s) and further reading
TBD
<!-- l10n:p
## Appendix
You'll sometimes be asked to do 'back-of-the-envelope' estimates. For example, you might need to determine how long it will take to generate 100 image thumbnails from disk or how much memory a data structure will take. The **Powers of two table** and **Latency numbers every programmer should know** are handy references.
l10n:p -->
## Appendix
TBD
<!-- l10n:p
### Powers of two table
```
Power Exact Value Approx Value Bytes
---------------------------------------------------------------
7 128
8 256
10 1024 1 thousand 1 KB
16 65,536 64 KB
20 1,048,576 1 million 1 MB
30 1,073,741,824 1 billion 1 GB
32 4,294,967,296 4 GB
40 1,099,511,627,776 1 trillion 1 TB
```
l10n:p -->
### Powers of two table
TBD
<!-- l10n:p
#### Source(s) and further reading
* [Powers of two](https://en.wikipedia.org/wiki/Power_of_two)
l10n:p -->
#### Source(s) and further reading
TBD
<!-- l10n:p
### Latency numbers every programmer should know
```
Latency Comparison Numbers
--------------------------
L1 cache reference 0.5 ns
Branch mispredict 5 ns
L2 cache reference 7 ns 14x L1 cache
Mutex lock/unlock 25 ns
Main memory reference 100 ns 20x L2 cache, 200x L1 cache
Compress 1K bytes with Zippy 10,000 ns 10 us
Send 1 KB bytes over 1 Gbps network 10,000 ns 10 us
Read 4 KB randomly from SSD* 150,000 ns 150 us ~1GB/sec SSD
Read 1 MB sequentially from memory 250,000 ns 250 us
Round trip within same datacenter 500,000 ns 500 us
Read 1 MB sequentially from SSD* 1,000,000 ns 1,000 us 1 ms ~1GB/sec SSD, 4X memory
Disk seek 10,000,000 ns 10,000 us 10 ms 20x datacenter roundtrip
Read 1 MB sequentially from 1 Gbps 10,000,000 ns 10,000 us 10 ms 40x memory, 10X SSD
Read 1 MB sequentially from disk 30,000,000 ns 30,000 us 30 ms 120x memory, 30X SSD
Send packet CA->Netherlands->CA 150,000,000 ns 150,000 us 150 ms
Notes
-----
1 ns = 10^-9 seconds
1 us = 10^-6 seconds = 1,000 ns
1 ms = 10^-3 seconds = 1,000 us = 1,000,000 ns
```
Handy metrics based on numbers above:
* Read sequentially from disk at 30 MB/s
* Read sequentially from 1 Gbps Ethernet at 100 MB/s
* Read sequentially from SSD at 1 GB/s
* Read sequentially from main memory at 4 GB/s
* 6-7 world-wide round trips per second
* 2,000 round trips per second within a data center
l10n:p -->
### Latency numbers every programmer should know
TBD
<!-- l10n:p
#### Latency numbers visualized
![](https://camo.githubusercontent.com/77f72259e1eb58596b564d1ad823af1853bc60a3/687474703a2f2f692e696d6775722e636f6d2f6b307431652e706e67)
l10n:p -->
#### Latency numbers visualized
![](https://camo.githubusercontent.com/77f72259e1eb58596b564d1ad823af1853bc60a3/687474703a2f2f692e696d6775722e636f6d2f6b307431652e706e67)
<!-- l10n:p
#### Source(s) and further reading
* [Latency numbers every programmer should know - 1](https://gist.github.com/jboner/2841832)
* [Latency numbers every programmer should know - 2](https://gist.github.com/hellerbarde/2843375)
* [Designs, lessons, and advice from building large distributed systems](http://www.cs.cornell.edu/projects/ladis2009/talks/dean-keynote-ladis2009.pdf)
* [Software Engineering Advice from Building Large-Scale Distributed Systems](https://static.googleusercontent.com/media/research.google.com/en//people/jeff/stanford-295-talk.pdf)
l10n:p -->
#### Source(s) and further reading
* [Latency numbers every programmer should know - 1](https://gist.github.com/jboner/2841832)
* [Latency numbers every programmer should know - 2](https://gist.github.com/hellerbarde/2843375)
* [Designs, lessons, and advice from building large distributed systems](http://www.cs.cornell.edu/projects/ladis2009/talks/dean-keynote-ladis2009.pdf)
* [Software Engineering Advice from Building Large-Scale Distributed Systems](https://static.googleusercontent.com/media/research.google.com/en//people/jeff/stanford-295-talk.pdf)
<!-- l10n:p
### Additional system design interview questions
> Common system design interview questions, with links to resources on how to solve each.
| Question | Reference(s) |
|---|---|
| Design a file sync service like Dropbox | [youtube.com](https://www.youtube.com/watch?v=PE4gwstWhmc) |
| Design a search engine like Google | [queue.acm.org](http://queue.acm.org/detail.cfm?id=988407)<br/>[stackexchange.com](http://programmers.stackexchange.com/questions/38324/interview-question-how-would-you-implement-google-search)<br/>[ardendertat.com](http://www.ardendertat.com/2012/01/11/implementing-search-engines/)<br/>[stanford.edu](http://infolab.stanford.edu/~backrub/google.html) |
| Design a scalable web crawler like Google | [quora.com](https://www.quora.com/How-can-I-build-a-web-crawler-from-scratch) |
| Design Google docs | [code.google.com](https://code.google.com/p/google-mobwrite/)<br/>[neil.fraser.name](https://neil.fraser.name/writing/sync/) |
| Design a key-value store like Redis | [slideshare.net](http://www.slideshare.net/dvirsky/introduction-to-redis) |
| Design a cache system like Memcached | [slideshare.net](http://www.slideshare.net/oemebamo/introduction-to-memcached) |
| Design a recommendation system like Amazon's | [hulu.com](https://web.archive.org/web/20170406065247/http://tech.hulu.com/blog/2011/09/19/recommendation-system.html)<br/>[ijcai13.org](http://ijcai13.org/files/tutorial_slides/td3.pdf) |
| Design a tinyurl system like Bitly | [n00tc0d3r.blogspot.com](http://n00tc0d3r.blogspot.com/) |
| Design a chat app like WhatsApp | [highscalability.com](http://highscalability.com/blog/2014/2/26/the-whatsapp-architecture-facebook-bought-for-19-billion.html)
| Design a picture sharing system like Instagram | [highscalability.com](http://highscalability.com/flickr-architecture)<br/>[highscalability.com](http://highscalability.com/blog/2011/12/6/instagram-architecture-14-million-users-terabytes-of-photos.html) |
| Design the Facebook news feed function | [quora.com](http://www.quora.com/What-are-best-practices-for-building-something-like-a-News-Feed)<br/>[quora.com](http://www.quora.com/Activity-Streams/What-are-the-scaling-issues-to-keep-in-mind-while-developing-a-social-network-feed)<br/>[slideshare.net](http://www.slideshare.net/danmckinley/etsy-activity-feeds-architecture) |
| Design the Facebook timeline function | [facebook.com](https://www.facebook.com/note.php?note_id=10150468255628920)<br/>[highscalability.com](http://highscalability.com/blog/2012/1/23/facebook-timeline-brought-to-you-by-the-power-of-denormaliza.html) |
| Design the Facebook chat function | [erlang-factory.com](http://www.erlang-factory.com/upload/presentations/31/EugeneLetuchy-ErlangatFacebook.pdf)<br/>[facebook.com](https://www.facebook.com/note.php?note_id=14218138919&id=9445547199&index=0) |
| Design a graph search function like Facebook's | [facebook.com](https://www.facebook.com/notes/facebook-engineering/under-the-hood-building-out-the-infrastructure-for-graph-search/10151347573598920)<br/>[facebook.com](https://www.facebook.com/notes/facebook-engineering/under-the-hood-indexing-and-ranking-in-graph-search/10151361720763920)<br/>[facebook.com](https://www.facebook.com/notes/facebook-engineering/under-the-hood-the-natural-language-interface-of-graph-search/10151432733048920) |
| Design a content delivery network like CloudFlare | [figshare.com](https://figshare.com/articles/Globally_distributed_content_delivery/6605972) |
| Design a trending topic system like Twitter's | [michael-noll.com](http://www.michael-noll.com/blog/2013/01/18/implementing-real-time-trending-topics-in-storm/)<br/>[snikolov .wordpress.com](http://snikolov.wordpress.com/2012/11/14/early-detection-of-twitter-trends/) |
| Design a random ID generation system | [blog.twitter.com](https://blog.twitter.com/2010/announcing-snowflake)<br/>[github.com](https://github.com/twitter/snowflake/) |
| Return the top k requests during a time interval | [cs.ucsb.edu](https://www.cs.ucsb.edu/sites/cs.ucsb.edu/files/docs/reports/2005-23.pdf)<br/>[wpi.edu](http://davis.wpi.edu/xmdv/docs/EDBT11-diyang.pdf) |
| Design a system that serves data from multiple data centers | [highscalability.com](http://highscalability.com/blog/2009/8/24/how-google-serves-data-from-multiple-datacenters.html) |
| Design an online multiplayer card game | [indieflashblog.com](http://www.indieflashblog.com/how-to-create-an-asynchronous-multiplayer-game.html)<br/>[buildnewgames.com](http://buildnewgames.com/real-time-multiplayer/) |
| Design a garbage collection system | [stuffwithstuff.com](http://journal.stuffwithstuff.com/2013/12/08/babys-first-garbage-collector/)<br/>[washington.edu](http://courses.cs.washington.edu/courses/csep521/07wi/prj/rick.pdf) |
| Design an API rate limiter | [https://stripe.com/blog/](https://stripe.com/blog/rate-limiters) |
| Add a system design question | [Contribute](#contributing) |
l10n:p -->
### Additional system design interview questions
> Распространенные задачи на интервью по проектированию систем со ссылками на решение.
| Задача | Ссылки |
|---|---|
| Design a file sync service like Dropbox | [youtube.com](https://www.youtube.com/watch?v=PE4gwstWhmc) |
| Design a search engine like Google | [queue.acm.org](http://queue.acm.org/detail.cfm?id=988407)<br/>[stackexchange.com](http://programmers.stackexchange.com/questions/38324/interview-question-how-would-you-implement-google-search)<br/>[ardendertat.com](http://www.ardendertat.com/2012/01/11/implementing-search-engines/)<br/>[stanford.edu](http://infolab.stanford.edu/~backrub/google.html) |
| Design a scalable web crawler like Google | [quora.com](https://www.quora.com/How-can-I-build-a-web-crawler-from-scratch) |
| Design Google docs | [code.google.com](https://code.google.com/p/google-mobwrite/)<br/>[neil.fraser.name](https://neil.fraser.name/writing/sync/) |
| Design a key-value store like Redis | [slideshare.net](http://www.slideshare.net/dvirsky/introduction-to-redis) |
| Design a cache system like Memcached | [slideshare.net](http://www.slideshare.net/oemebamo/introduction-to-memcached) |
| Design a recommendation system like Amazon's | [hulu.com](https://web.archive.org/web/20170406065247/http://tech.hulu.com/blog/2011/09/19/recommendation-system.html)<br/>[ijcai13.org](http://ijcai13.org/files/tutorial_slides/td3.pdf) |
| Design a tinyurl system like Bitly | [n00tc0d3r.blogspot.com](http://n00tc0d3r.blogspot.com/) |
| Design a chat app like WhatsApp | [highscalability.com](http://highscalability.com/blog/2014/2/26/the-whatsapp-architecture-facebook-bought-for-19-billion.html)
| Design a picture sharing system like Instagram | [highscalability.com](http://highscalability.com/flickr-architecture)<br/>[highscalability.com](http://highscalability.com/blog/2011/12/6/instagram-architecture-14-million-users-terabytes-of-photos.html) |
| Design the Facebook news feed function | [quora.com](http://www.quora.com/What-are-best-practices-for-building-something-like-a-News-Feed)<br/>[quora.com](http://www.quora.com/Activity-Streams/What-are-the-scaling-issues-to-keep-in-mind-while-developing-a-social-network-feed)<br/>[slideshare.net](http://www.slideshare.net/danmckinley/etsy-activity-feeds-architecture) |
| Design the Facebook timeline function | [facebook.com](https://www.facebook.com/note.php?note_id=10150468255628920)<br/>[highscalability.com](http://highscalability.com/blog/2012/1/23/facebook-timeline-brought-to-you-by-the-power-of-denormaliza.html) |
| Design the Facebook chat function | [erlang-factory.com](http://www.erlang-factory.com/upload/presentations/31/EugeneLetuchy-ErlangatFacebook.pdf)<br/>[facebook.com](https://www.facebook.com/note.php?note_id=14218138919&id=9445547199&index=0) |
| Design a graph search function like Facebook's | [facebook.com](https://www.facebook.com/notes/facebook-engineering/under-the-hood-building-out-the-infrastructure-for-graph-search/10151347573598920)<br/>[facebook.com](https://www.facebook.com/notes/facebook-engineering/under-the-hood-indexing-and-ranking-in-graph-search/10151361720763920)<br/>[facebook.com](https://www.facebook.com/notes/facebook-engineering/under-the-hood-the-natural-language-interface-of-graph-search/10151432733048920) |
| Design a content delivery network like CloudFlare | [figshare.com](https://figshare.com/articles/Globally_distributed_content_delivery/6605972) |
| Design a trending topic system like Twitter's | [michael-noll.com](http://www.michael-noll.com/blog/2013/01/18/implementing-real-time-trending-topics-in-storm/)<br/>[snikolov .wordpress.com](http://snikolov.wordpress.com/2012/11/14/early-detection-of-twitter-trends/) |
| Design a random ID generation system | [blog.twitter.com](https://blog.twitter.com/2010/announcing-snowflake)<br/>[github.com](https://github.com/twitter/snowflake/) |
| Return the top k requests during a time interval | [cs.ucsb.edu](https://www.cs.ucsb.edu/sites/cs.ucsb.edu/files/docs/reports/2005-23.pdf)<br/>[wpi.edu](http://davis.wpi.edu/xmdv/docs/EDBT11-diyang.pdf) |
| Design a system that serves data from multiple data centers | [highscalability.com](http://highscalability.com/blog/2009/8/24/how-google-serves-data-from-multiple-datacenters.html) |
| Design an online multiplayer card game | [indieflashblog.com](http://www.indieflashblog.com/how-to-create-an-asynchronous-multiplayer-game.html)<br/>[buildnewgames.com](http://buildnewgames.com/real-time-multiplayer/) |
| Design a garbage collection system | [stuffwithstuff.com](http://journal.stuffwithstuff.com/2013/12/08/babys-first-garbage-collector/)<br/>[washington.edu](http://courses.cs.washington.edu/courses/csep521/07wi/prj/rick.pdf) |
| Design an API rate limiter | [https://stripe.com/blog/](https://stripe.com/blog/rate-limiters) |
| Add a system design question | [Contribute](#contributing) |
<!-- l10n:p
### Real world architectures
> Articles on how real world systems are designed.
<p align="center">
<img src="http://i.imgur.com/TcUo2fw.png"/>
<br/>
<i><a href=https://www.infoq.com/presentations/Twitter-Timeline-Scalability>Source: Twitter timelines at scale</a></i>
</p>
**Don't focus on nitty gritty details for the following articles, instead:**
* Identify shared principles, common technologies, and patterns within these articles
* Study what problems are solved by each component, where it works, where it doesn't
* Review the lessons learned
|Type | System | Reference(s) |
|---|---|---|
| Data processing | **MapReduce** - Distributed data processing from Google | [research.google.com](http://static.googleusercontent.com/media/research.google.com/zh-CN/us/archive/mapreduce-osdi04.pdf) |
| Data processing | **Spark** - Distributed data processing from Databricks | [slideshare.net](http://www.slideshare.net/AGrishchenko/apache-spark-architecture) |
| Data processing | **Storm** - Distributed data processing from Twitter | [slideshare.net](http://www.slideshare.net/previa/storm-16094009) |
| | | |
| Data store | **Bigtable** - Distributed column-oriented database from Google | [harvard.edu](http://www.read.seas.harvard.edu/~kohler/class/cs239-w08/chang06bigtable.pdf) |
| Data store | **HBase** - Open source implementation of Bigtable | [slideshare.net](http://www.slideshare.net/alexbaranau/intro-to-hbase) |
| Data store | **Cassandra** - Distributed column-oriented database from Facebook | [slideshare.net](http://www.slideshare.net/planetcassandra/cassandra-introduction-features-30103666)
| Data store | **DynamoDB** - Document-oriented database from Amazon | [harvard.edu](http://www.read.seas.harvard.edu/~kohler/class/cs239-w08/decandia07dynamo.pdf) |
| Data store | **MongoDB** - Document-oriented database | [slideshare.net](http://www.slideshare.net/mdirolf/introduction-to-mongodb) |
| Data store | **Spanner** - Globally-distributed database from Google | [research.google.com](http://research.google.com/archive/spanner-osdi2012.pdf) |
| Data store | **Memcached** - Distributed memory caching system | [slideshare.net](http://www.slideshare.net/oemebamo/introduction-to-memcached) |
| Data store | **Redis** - Distributed memory caching system with persistence and value types | [slideshare.net](http://www.slideshare.net/dvirsky/introduction-to-redis) |
| | | |
| File system | **Google File System (GFS)** - Distributed file system | [research.google.com](http://static.googleusercontent.com/media/research.google.com/zh-CN/us/archive/gfs-sosp2003.pdf) |
| File system | **Hadoop File System (HDFS)** - Open source implementation of GFS | [apache.org](http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html) |
| | | |
| Misc | **Chubby** - Lock service for loosely-coupled distributed systems from Google | [research.google.com](http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en/us/archive/chubby-osdi06.pdf) |
| Misc | **Dapper** - Distributed systems tracing infrastructure | [research.google.com](http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/36356.pdf)
| Misc | **Kafka** - Pub/sub message queue from LinkedIn | [slideshare.net](http://www.slideshare.net/mumrah/kafka-talk-tri-hug) |
| Misc | **Zookeeper** - Centralized infrastructure and services enabling synchronization | [slideshare.net](http://www.slideshare.net/sauravhaloi/introduction-to-apache-zookeeper) |
| | Add an architecture | [Contribute](#contributing) |
l10n:p -->
### Real world architectures
> Статья о том, как спроектированы действующие системы.
<p align="center">
<img src="http://i.imgur.com/TcUo2fw.png"/>
<br/>
<i><a href=https://www.infoq.com/presentations/Twitter-Timeline-Scalability>Источник: Twitter timelines at scale</a></i>
</p>
**Не вдавайтесь в мельчайшие подробности, вместо этого:*
* Определите основные принципы, общие технологии и шаблоны, которые встречаются в этих статьях
* Изучите, какие проблемы решаются каждым компонентом, где это работает, а где нет
* Обратите внимание секции, описывающие полученный опыт и работу над ошибками
| Тип | Система | Ссылки |
|---|---|---|
| Обработка данных | **MapReduce** - Распределенная обработка данных от Google | [research.google.com](http://static.googleusercontent.com/media/research.google.com/zh-CN/us/archive/mapreduce-osdi04.pdf) |
| Обработка данных | **Spark** - Распределенная обработка данных от Databricks | [slideshare.net](http://www.slideshare.net/AGrishchenko/apache-spark-architecture) |
| Обработка данных | **Storm** - Распределенная обработка данных от Twitter | [slideshare.net](http://www.slideshare.net/previa/storm-16094009) |
| | | |
| Хранилище данных | **Bigtable** - Распределенная колоночная база данных от Google | [harvard.edu](http://www.read.seas.harvard.edu/~kohler/class/cs239-w08/chang06bigtable.pdf) |
| Хранилище данных | **HBase** - Реализация Bigtable с открытым исходным кодом | [slideshare.net](http://www.slideshare.net/alexbaranau/intro-to-hbase) |
| Хранилище данных | **Cassandra** - Распределенная колоночная база данных от Facebook | [slideshare.net](http://www.slideshare.net/planetcassandra/cassandra-introduction-features-30103666)
| Хранилище данных | **DynamoDB** - Документно-ориенитрованная база данных от Amazon | [harvard.edu](http://www.read.seas.harvard.edu/~kohler/class/cs239-w08/decandia07dynamo.pdf) |
| Хранилище данных | **MongoDB** - Документно-ориенитрованная база данных | [slideshare.net](http://www.slideshare.net/mdirolf/introduction-to-mongodb) |
| Хранилище данных | **Spanner** - Глобально-распределенная база данных от Google | [research.google.com](http://research.google.com/archive/spanner-osdi2012.pdf) |
| Хранилище данных | **Memcached** - Распределенный кэш, хранящийся в памяти | [slideshare.net](http://www.slideshare.net/oemebamo/introduction-to-memcached) |
| Хранилище данных | **Redis** - Распеределенная система кэширавния с возможностью сохранения и типами данных | [slideshare.net](http://www.slideshare.net/dvirsky/introduction-to-redis) |
| | | |
| Файловая система | **Google File System (GFS)** - Распределенная файловая система | [research.google.com](http://static.googleusercontent.com/media/research.google.com/zh-CN/us/archive/gfs-sosp2003.pdf) |
| Файловая система | **Hadoop File System (HDFS)** - Реализация GFS с открытым исходным кодом | [apache.org](http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html) |
| | | |
| Другое | **Chubby** - Система блокировки для слабосвязанных распределенных систем от Google | [research.google.com](http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en/us/archive/chubby-osdi06.pdf) |
| Другое | **Dapper** - Система отслеживания операций в распределенных системах | [research.google.com](http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/36356.pdf)
| Другое | **Kafka** - Очередь сообщений Pub/sub от LinkedIn | [slideshare.net](http://www.slideshare.net/mumrah/kafka-talk-tri-hug) |
| Другое | **Zookeeper** - Централизованная инфраструктура и сервисы для синхронизации распределенных систем | [slideshare.net](http://www.slideshare.net/sauravhaloi/introduction-to-apache-zookeeper) |
| | Добавьте архитектуру | [Contribute](#contributing) |
<!-- l10n:p
### Company architectures
| Company | Reference(s) |
|---|---|
| Amazon | [Amazon architecture](http://highscalability.com/amazon-architecture) |
| Cinchcast | [Producing 1,500 hours of audio every day](http://highscalability.com/blog/2012/7/16/cinchcast-architecture-producing-1500-hours-of-audio-every-d.html) |
| DataSift | [Realtime datamining At 120,000 tweets per second](http://highscalability.com/blog/2011/11/29/datasift-architecture-realtime-datamining-at-120000-tweets-p.html) |
| DropBox | [How we've scaled Dropbox](https://www.youtube.com/watch?v=PE4gwstWhmc) |
| ESPN | [Operating At 100,000 duh nuh nuhs per second](http://highscalability.com/blog/2013/11/4/espns-architecture-at-scale-operating-at-100000-duh-nuh-nuhs.html) |
| Google | [Google architecture](http://highscalability.com/google-architecture) |
| Instagram | [14 million users, terabytes of photos](http://highscalability.com/blog/2011/12/6/instagram-architecture-14-million-users-terabytes-of-photos.html)<br/>[What powers Instagram](http://instagram-engineering.tumblr.com/post/13649370142/what-powers-instagram-hundreds-of-instances) |
| Justin.tv | [Justin.Tv's live video broadcasting architecture](http://highscalability.com/blog/2010/3/16/justintvs-live-video-broadcasting-architecture.html) |
| Facebook | [Scaling memcached at Facebook](https://cs.uwaterloo.ca/~brecht/courses/854-Emerging-2014/readings/key-value/fb-memcached-nsdi-2013.pdf)<br/>[TAO: Facebooks distributed data store for the social graph](https://cs.uwaterloo.ca/~brecht/courses/854-Emerging-2014/readings/data-store/tao-facebook-distributed-datastore-atc-2013.pdf)<br/>[Facebooks photo storage](https://www.usenix.org/legacy/event/osdi10/tech/full_papers/Beaver.pdf)<br/>[How Facebook Live Streams To 800,000 Simultaneous Viewers](http://highscalability.com/blog/2016/6/27/how-facebook-live-streams-to-800000-simultaneous-viewers.html) |
| Flickr | [Flickr architecture](http://highscalability.com/flickr-architecture) |
| Mailbox | [From 0 to one million users in 6 weeks](http://highscalability.com/blog/2013/6/18/scaling-mailbox-from-0-to-one-million-users-in-6-weeks-and-1.html) |
| Netflix | [A 360 Degree View Of The Entire Netflix Stack](http://highscalability.com/blog/2015/11/9/a-360-degree-view-of-the-entire-netflix-stack.html)<br/>[Netflix: What Happens When You Press Play?](http://highscalability.com/blog/2017/12/11/netflix-what-happens-when-you-press-play.html) |
| Pinterest | [From 0 To 10s of billions of page views a month](http://highscalability.com/blog/2013/4/15/scaling-pinterest-from-0-to-10s-of-billions-of-page-views-a.html)<br/>[18 million visitors, 10x growth, 12 employees](http://highscalability.com/blog/2012/5/21/pinterest-architecture-update-18-million-visitors-10x-growth.html) |
| Playfish | [50 million monthly users and growing](http://highscalability.com/blog/2010/9/21/playfishs-social-gaming-architecture-50-million-monthly-user.html) |
| PlentyOfFish | [PlentyOfFish architecture](http://highscalability.com/plentyoffish-architecture) |
| Salesforce | [How they handle 1.3 billion transactions a day](http://highscalability.com/blog/2013/9/23/salesforce-architecture-how-they-handle-13-billion-transacti.html) |
| Stack Overflow | [Stack Overflow architecture](http://highscalability.com/blog/2009/8/5/stack-overflow-architecture.html) |
| TripAdvisor | [40M visitors, 200M dynamic page views, 30TB data](http://highscalability.com/blog/2011/6/27/tripadvisor-architecture-40m-visitors-200m-dynamic-page-view.html) |
| Tumblr | [15 billion page views a month](http://highscalability.com/blog/2012/2/13/tumblr-architecture-15-billion-page-views-a-month-and-harder.html) |
| Twitter | [Making Twitter 10000 percent faster](http://highscalability.com/scaling-twitter-making-twitter-10000-percent-faster)<br/>[Storing 250 million tweets a day using MySQL](http://highscalability.com/blog/2011/12/19/how-twitter-stores-250-million-tweets-a-day-using-mysql.html)<br/>[150M active users, 300K QPS, a 22 MB/S firehose](http://highscalability.com/blog/2013/7/8/the-architecture-twitter-uses-to-deal-with-150m-active-users.html)<br/>[Timelines at scale](https://www.infoq.com/presentations/Twitter-Timeline-Scalability)<br/>[Big and small data at Twitter](https://www.youtube.com/watch?v=5cKTP36HVgI)<br/>[Operations at Twitter: scaling beyond 100 million users](https://www.youtube.com/watch?v=z8LU0Cj6BOU)<br/>[How Twitter Handles 3,000 Images Per Second](http://highscalability.com/blog/2016/4/20/how-twitter-handles-3000-images-per-second.html) |
| Uber | [How Uber scales their real-time market platform](http://highscalability.com/blog/2015/9/14/how-uber-scales-their-real-time-market-platform.html)<br/>[Lessons Learned From Scaling Uber To 2000 Engineers, 1000 Services, And 8000 Git Repositories](http://highscalability.com/blog/2016/10/12/lessons-learned-from-scaling-uber-to-2000-engineers-1000-ser.html) |
| WhatsApp | [The WhatsApp architecture Facebook bought for $19 billion](http://highscalability.com/blog/2014/2/26/the-whatsapp-architecture-facebook-bought-for-19-billion.html) |
| YouTube | [YouTube scalability](https://www.youtube.com/watch?v=w5WVu624fY8)<br/>[YouTube architecture](http://highscalability.com/youtube-architecture) |
l10n:p -->
### Company architectures
| Компания | Ссылки |
|---|---|
| Amazon | [Amazon architecture](http://highscalability.com/amazon-architecture) |
| Cinchcast | [Producing 1,500 hours of audio every day](http://highscalability.com/blog/2012/7/16/cinchcast-architecture-producing-1500-hours-of-audio-every-d.html) |
| DataSift | [Realtime datamining At 120,000 tweets per second](http://highscalability.com/blog/2011/11/29/datasift-architecture-realtime-datamining-at-120000-tweets-p.html) |
| DropBox | [How we've scaled Dropbox](https://www.youtube.com/watch?v=PE4gwstWhmc) |
| ESPN | [Operating At 100,000 duh nuh nuhs per second](http://highscalability.com/blog/2013/11/4/espns-architecture-at-scale-operating-at-100000-duh-nuh-nuhs.html) |
| Google | [Google architecture](http://highscalability.com/google-architecture) |
| Instagram | [14 million users, terabytes of photos](http://highscalability.com/blog/2011/12/6/instagram-architecture-14-million-users-terabytes-of-photos.html)<br/>[What powers Instagram](http://instagram-engineering.tumblr.com/post/13649370142/what-powers-instagram-hundreds-of-instances) |
| Justin.tv | [Justin.Tv's live video broadcasting architecture](http://highscalability.com/blog/2010/3/16/justintvs-live-video-broadcasting-architecture.html) |
| Facebook | [Scaling memcached at Facebook](https://cs.uwaterloo.ca/~brecht/courses/854-Emerging-2014/readings/key-value/fb-memcached-nsdi-2013.pdf)<br/>[TAO: Facebooks distributed data store for the social graph](https://cs.uwaterloo.ca/~brecht/courses/854-Emerging-2014/readings/data-store/tao-facebook-distributed-datastore-atc-2013.pdf)<br/>[Facebooks photo storage](https://www.usenix.org/legacy/event/osdi10/tech/full_papers/Beaver.pdf)<br/>[How Facebook Live Streams To 800,000 Simultaneous Viewers](http://highscalability.com/blog/2016/6/27/how-facebook-live-streams-to-800000-simultaneous-viewers.html) |
| Flickr | [Flickr architecture](http://highscalability.com/flickr-architecture) |
| Mailbox | [From 0 to one million users in 6 weeks](http://highscalability.com/blog/2013/6/18/scaling-mailbox-from-0-to-one-million-users-in-6-weeks-and-1.html) |
| Netflix | [A 360 Degree View Of The Entire Netflix Stack](http://highscalability.com/blog/2015/11/9/a-360-degree-view-of-the-entire-netflix-stack.html)<br/>[Netflix: What Happens When You Press Play?](http://highscalability.com/blog/2017/12/11/netflix-what-happens-when-you-press-play.html) |
| Pinterest | [From 0 To 10s of billions of page views a month](http://highscalability.com/blog/2013/4/15/scaling-pinterest-from-0-to-10s-of-billions-of-page-views-a.html)<br/>[18 million visitors, 10x growth, 12 employees](http://highscalability.com/blog/2012/5/21/pinterest-architecture-update-18-million-visitors-10x-growth.html) |
| Playfish | [50 million monthly users and growing](http://highscalability.com/blog/2010/9/21/playfishs-social-gaming-architecture-50-million-monthly-user.html) |
| PlentyOfFish | [PlentyOfFish architecture](http://highscalability.com/plentyoffish-architecture) |
| Salesforce | [How they handle 1.3 billion transactions a day](http://highscalability.com/blog/2013/9/23/salesforce-architecture-how-they-handle-13-billion-transacti.html) |
| Stack Overflow | [Stack Overflow architecture](http://highscalability.com/blog/2009/8/5/stack-overflow-architecture.html) |
| TripAdvisor | [40M visitors, 200M dynamic page views, 30TB data](http://highscalability.com/blog/2011/6/27/tripadvisor-architecture-40m-visitors-200m-dynamic-page-view.html) |
| Tumblr | [15 billion page views a month](http://highscalability.com/blog/2012/2/13/tumblr-architecture-15-billion-page-views-a-month-and-harder.html) |
| Twitter | [Making Twitter 10000 percent faster](http://highscalability.com/scaling-twitter-making-twitter-10000-percent-faster)<br/>[Storing 250 million tweets a day using MySQL](http://highscalability.com/blog/2011/12/19/how-twitter-stores-250-million-tweets-a-day-using-mysql.html)<br/>[150M active users, 300K QPS, a 22 MB/S firehose](http://highscalability.com/blog/2013/7/8/the-architecture-twitter-uses-to-deal-with-150m-active-users.html)<br/>[Timelines at scale](https://www.infoq.com/presentations/Twitter-Timeline-Scalability)<br/>[Big and small data at Twitter](https://www.youtube.com/watch?v=5cKTP36HVgI)<br/>[Operations at Twitter: scaling beyond 100 million users](https://www.youtube.com/watch?v=z8LU0Cj6BOU)<br/>[How Twitter Handles 3,000 Images Per Second](http://highscalability.com/blog/2016/4/20/how-twitter-handles-3000-images-per-second.html) |
| Uber | [How Uber scales their real-time market platform](http://highscalability.com/blog/2015/9/14/how-uber-scales-their-real-time-market-platform.html)<br/>[Lessons Learned From Scaling Uber To 2000 Engineers, 1000 Services, And 8000 Git Repositories](http://highscalability.com/blog/2016/10/12/lessons-learned-from-scaling-uber-to-2000-engineers-1000-ser.html) |
| WhatsApp | [The WhatsApp architecture Facebook bought for $19 billion](http://highscalability.com/blog/2014/2/26/the-whatsapp-architecture-facebook-bought-for-19-billion.html) |
| YouTube | [YouTube scalability](https://www.youtube.com/watch?v=w5WVu624fY8)<br/>[YouTube architecture](http://highscalability.com/youtube-architecture) |
<!-- l10n:p
### Company engineering blogs
> Architectures for companies you are interviewing with.
>
> Questions you encounter might be from the same domain.
* [Airbnb Engineering](http://nerds.airbnb.com/)
* [Atlassian Developers](https://developer.atlassian.com/blog/)
* [AWS Blog](https://aws.amazon.com/blogs/aws/)
* [Bitly Engineering Blog](http://word.bitly.com/)
* [Box Blogs](https://blog.box.com/blog/category/engineering)
* [Cloudera Developer Blog](http://blog.cloudera.com/)
* [Dropbox Tech Blog](https://tech.dropbox.com/)
* [Engineering at Quora](http://engineering.quora.com/)
* [Ebay Tech Blog](http://www.ebaytechblog.com/)
* [Evernote Tech Blog](https://blog.evernote.com/tech/)
* [Etsy Code as Craft](http://codeascraft.com/)
* [Facebook Engineering](https://www.facebook.com/Engineering)
* [Flickr Code](http://code.flickr.net/)
* [Foursquare Engineering Blog](http://engineering.foursquare.com/)
* [GitHub Engineering Blog](http://githubengineering.com/)
* [Google Research Blog](http://googleresearch.blogspot.com/)
* [Groupon Engineering Blog](https://engineering.groupon.com/)
* [Heroku Engineering Blog](https://engineering.heroku.com/)
* [Hubspot Engineering Blog](http://product.hubspot.com/blog/topic/engineering)
* [High Scalability](http://highscalability.com/)
* [Instagram Engineering](http://instagram-engineering.tumblr.com/)
* [Intel Software Blog](https://software.intel.com/en-us/blogs/)
* [Jane Street Tech Blog](https://blogs.janestreet.com/category/ocaml/)
* [LinkedIn Engineering](http://engineering.linkedin.com/blog)
* [Microsoft Engineering](https://engineering.microsoft.com/)
* [Microsoft Python Engineering](https://blogs.msdn.microsoft.com/pythonengineering/)
* [Netflix Tech Blog](http://techblog.netflix.com/)
* [Paypal Developer Blog](https://devblog.paypal.com/category/engineering/)
* [Pinterest Engineering Blog](https://medium.com/@Pinterest_Engineering)
* [Quora Engineering](https://engineering.quora.com/)
* [Reddit Blog](http://www.redditblog.com/)
* [Salesforce Engineering Blog](https://developer.salesforce.com/blogs/engineering/)
* [Slack Engineering Blog](https://slack.engineering/)
* [Spotify Labs](https://labs.spotify.com/)
* [Twilio Engineering Blog](http://www.twilio.com/engineering)
* [Twitter Engineering](https://blog.twitter.com/engineering/)
* [Uber Engineering Blog](http://eng.uber.com/)
* [Yahoo Engineering Blog](http://yahooeng.tumblr.com/)
* [Yelp Engineering Blog](http://engineeringblog.yelp.com/)
* [Zynga Engineering Blog](https://www.zynga.com/blogs/engineering)
l10n:p -->
### Company engineering blogs
> Вопросы могут быть связаны с архитектурой компаний, в которые вы собеседуетесь.
* [Airbnb Engineering](http://nerds.airbnb.com/)
* [Atlassian Developers](https://developer.atlassian.com/blog/)
* [AWS Blog](https://aws.amazon.com/blogs/aws/)
* [Bitly Engineering Blog](http://word.bitly.com/)
* [Box Blogs](https://blog.box.com/blog/category/engineering)
* [Cloudera Developer Blog](http://blog.cloudera.com/)
* [Dropbox Tech Blog](https://tech.dropbox.com/)
* [Engineering at Quora](http://engineering.quora.com/)
* [Ebay Tech Blog](http://www.ebaytechblog.com/)
* [Evernote Tech Blog](https://blog.evernote.com/tech/)
* [Etsy Code as Craft](http://codeascraft.com/)
* [Facebook Engineering](https://www.facebook.com/Engineering)
* [Flickr Code](http://code.flickr.net/)
* [Foursquare Engineering Blog](http://engineering.foursquare.com/)
* [GitHub Engineering Blog](http://githubengineering.com/)
* [Google Research Blog](http://googleresearch.blogspot.com/)
* [Groupon Engineering Blog](https://engineering.groupon.com/)
* [Heroku Engineering Blog](https://engineering.heroku.com/)
* [Hubspot Engineering Blog](http://product.hubspot.com/blog/topic/engineering)
* [High Scalability](http://highscalability.com/)
* [Instagram Engineering](http://instagram-engineering.tumblr.com/)
* [Intel Software Blog](https://software.intel.com/en-us/blogs/)
* [Jane Street Tech Blog](https://blogs.janestreet.com/category/ocaml/)
* [LinkedIn Engineering](http://engineering.linkedin.com/blog)
* [Microsoft Engineering](https://engineering.microsoft.com/)
* [Microsoft Python Engineering](https://blogs.msdn.microsoft.com/pythonengineering/)
* [Netflix Tech Blog](http://techblog.netflix.com/)
* [Paypal Developer Blog](https://devblog.paypal.com/category/engineering/)
* [Pinterest Engineering Blog](https://medium.com/@Pinterest_Engineering)
* [Quora Engineering](https://engineering.quora.com/)
* [Reddit Blog](http://www.redditblog.com/)
* [Salesforce Engineering Blog](https://developer.salesforce.com/blogs/engineering/)
* [Slack Engineering Blog](https://slack.engineering/)
* [Spotify Labs](https://labs.spotify.com/)
* [Twilio Engineering Blog](http://www.twilio.com/engineering)
* [Twitter Engineering](https://blog.twitter.com/engineering/)
* [Uber Engineering Blog](http://eng.uber.com/)
* [Yahoo Engineering Blog](http://yahooeng.tumblr.com/)
* [Yelp Engineering Blog](http://engineeringblog.yelp.com/)
* [Zynga Engineering Blog](https://www.zynga.com/blogs/engineering)
<!-- l10n:p
#### Source(s) and further reading
Looking to add a blog? To avoid duplicating work, consider adding your company blog to the following repo:
* [kilimchoi/engineering-blogs](https://github.com/kilimchoi/engineering-blogs)
l10n:p -->
#### Source(s) and further reading
Хотите добавить блог? Во избежание дублирования, добавьте его в этот репозиторий:
* [kilimchoi/engineering-blogs](https://github.com/kilimchoi/engineering-blogs)
<!-- l10n:p
## Under development
Interested in adding a section or helping complete one in-progress? [Contribute](#contributing)!
* Distributed computing with MapReduce
* Consistent hashing
* Scatter gather
* [Contribute](#contributing)
l10n:p -->
## Under development
Заинтересованы в добавлении раздела или в завершении того, что уже в процессе? [Содействуйте!](#contributing)!
* Распределенные вычисления с MapReduce
* Согласованное хеширование
* Scatter gather
* [Содействие](#contributing)
<!-- l10n:p
## Credits
Credits and sources are provided throughout this repo.
Special thanks to:
* [Hired in tech](http://www.hiredintech.com/system-design/the-system-design-process/)
* [Cracking the coding interview](https://www.amazon.com/dp/0984782850/)
* [High scalability](http://highscalability.com/)
* [checkcheckzz/system-design-interview](https://github.com/checkcheckzz/system-design-interview)
* [shashank88/system_design](https://github.com/shashank88/system_design)
* [mmcgrana/services-engineering](https://github.com/mmcgrana/services-engineering)
* [System design cheat sheet](https://gist.github.com/vasanthk/485d1c25737e8e72759f)
* [A distributed systems reading list](http://dancres.github.io/Pages/)
* [Cracking the system design interview](http://www.puncsky.com/blog/2016-02-13-crack-the-system-design-interview)
l10n:p -->
## Credits
Источники указаны в самом документе.
Особая благодарность:
* [Hired in tech](http://www.hiredintech.com/system-design/the-system-design-process/)
* [Cracking the coding interview](https://www.amazon.com/dp/0984782850/)
* [High scalability](http://highscalability.com/)
* [checkcheckzz/system-design-interview](https://github.com/checkcheckzz/system-design-interview)
* [shashank88/system_design](https://github.com/shashank88/system_design)
* [mmcgrana/services-engineering](https://github.com/mmcgrana/services-engineering)
* [System design cheat sheet](https://gist.github.com/vasanthk/485d1c25737e8e72759f)
* [A distributed systems reading list](http://dancres.github.io/Pages/)
* [Cracking the system design interview](http://www.puncsky.com/blog/2016-02-13-crack-the-system-design-interview)
<!-- l10n:p
## Contact info
Feel free to contact me to discuss any issues, questions, or comments.
My contact info can be found on my [GitHub page](https://github.com/donnemartin).
l10n:p -->
## Contact info
Сообщайте мне, если вы хотите обсудить любые проблемы, вопросы или комментарии к этому документу.
Моя контактная информация доступна здесь: [GitHub page](https://github.com/donnemartin).
<!-- l10n:p
## License
*I am providing code and resources in this repository to you under an open source license. Because this is my personal repository, the license you receive to my code and resources is from me and not my employer (Facebook).*
Copyright 2017 Donne Martin
Creative Commons Attribution 4.0 International License (CC BY 4.0)
http://creativecommons.org/licenses/by/4.0/
l10n:p -->
## License
*I am providing code and resources in this repository to you under an open source license. Because this is my personal repository, the license you receive to my code and resources is from me and not my employer (Facebook).*
Copyright 2017 Donne Martin
Creative Commons Attribution 4.0 International License (CC BY 4.0)
http://creativecommons.org/licenses/by/4.0/

View File

@ -1,6 +1,6 @@
> * 原文地址:[github.com/donnemartin/system-design-primer](https://github.com/donnemartin/system-design-primer)
> * 译文出自:[掘金翻译计划](https://github.com/xitu/gold-miner)
> * 译者:[XatMassacrE](https://github.com/XatMassacrE)、[L9m](https://github.com/L9m)、[Airmacho](https://github.com/Airmacho)、[xiaoyusilen](https://github.com/xiaoyusilen)、[jifaxu](https://github.com/jifaxu)
> * 译者:[XatMassacrE](https://github.com/XatMassacrE)、[L9m](https://github.com/L9m)、[Airmacho](https://github.com/Airmacho)、[xiaoyusilen](https://github.com/xiaoyusilen)、[jifaxu](https://github.com/jifaxu)、[根号三](https://github.com/sqrthree)
> * 这个 [链接](https://github.com/xitu/system-design-primer/compare/master...donnemartin:master) 用来查看本翻译与英文版是否有差别(如果你没有看到 README.md 发生变化,那就意味着这份翻译文档是最新的)。
*[English](README.md) ∙ [日本語](README-ja.md) ∙ [简体中文](README-zh-Hans.md) ∙ [繁體中文](README-zh-TW.md) | [العَرَبِيَّة‎](https://github.com/donnemartin/system-design-primer/issues/170) ∙ [বাংলা](https://github.com/donnemartin/system-design-primer/issues/220) ∙ [Português do Brasil](https://github.com/donnemartin/system-design-primer/issues/40) ∙ [Deutsch](https://github.com/donnemartin/system-design-primer/issues/186) ∙ [ελληνικά](https://github.com/donnemartin/system-design-primer/issues/130) ∙ [עברית](https://github.com/donnemartin/system-design-primer/issues/272) ∙ [Italiano](https://github.com/donnemartin/system-design-primer/issues/104) ∙ [韓國語](https://github.com/donnemartin/system-design-primer/issues/102) ∙ [فارسی](https://github.com/donnemartin/system-design-primer/issues/110) ∙ [Polski](https://github.com/donnemartin/system-design-primer/issues/68) ∙ [русский язык](https://github.com/donnemartin/system-design-primer/issues/87) ∙ [Español](https://github.com/donnemartin/system-design-primer/issues/136) ∙ [ภาษาไทย](https://github.com/donnemartin/system-design-primer/issues/187) ∙ [Türkçe](https://github.com/donnemartin/system-design-primer/issues/39) ∙ [tiếng Việt](https://github.com/donnemartin/system-design-primer/issues/127) ∙ [Français](https://github.com/donnemartin/system-design-primer/issues/250) | [Add Translation](https://github.com/donnemartin/system-design-primer/issues/28)*
@ -12,14 +12,6 @@
<br/>
</p>
## 翻译
有兴趣参与[翻译](https://github.com/donnemartin/system-design-primer/issues/28)? 以下是正在进行中的翻译:
* [巴西葡萄牙语](https://github.com/donnemartin/system-design-primer/issues/40)
* [简体中文](https://github.com/donnemartin/system-design-primer/issues/38)
* [土耳其语](https://github.com/donnemartin/system-design-primer/issues/39)
## 目的
> 学习如何设计大型系统。
@ -91,6 +83,7 @@
* 修复错误
* 完善章节
* 添加章节
* [帮助翻译](https://github.com/donnemartin/system-design-primer/issues/28)
一些还需要完善的内容放在了[正在完善中](#正在完善中)。

View File

@ -1,4 +1,4 @@
*[English](README.md) ∙ [日本語](README-ja.md) ∙ [简体中文](README-zh-Hans.md) ∙ [繁體中文](README-zh-TW.md) | [العَرَبِيَّة‎](https://github.com/donnemartin/system-design-primer/issues/170) ∙ [বাংলা](https://github.com/donnemartin/system-design-primer/issues/220) ∙ [Português do Brasil](https://github.com/donnemartin/system-design-primer/issues/40) ∙ [Deutsch](https://github.com/donnemartin/system-design-primer/issues/186) ∙ [ελληνικά](https://github.com/donnemartin/system-design-primer/issues/130) ∙ [עברית](https://github.com/donnemartin/system-design-primer/issues/272) ∙ [Italiano](https://github.com/donnemartin/system-design-primer/issues/104) ∙ [韓國語](https://github.com/donnemartin/system-design-primer/issues/102) ∙ [فارسی](https://github.com/donnemartin/system-design-primer/issues/110) ∙ [Polski](https://github.com/donnemartin/system-design-primer/issues/68) ∙ [русский язык](https://github.com/donnemartin/system-design-primer/issues/87) ∙ [Español](https://github.com/donnemartin/system-design-primer/issues/136) ∙ [ภาษาไทย](https://github.com/donnemartin/system-design-primer/issues/187) ∙ [Türkçe](https://github.com/donnemartin/system-design-primer/issues/39) ∙ [tiếng Việt](https://github.com/donnemartin/system-design-primer/issues/127) ∙ [Français](https://github.com/donnemartin/system-design-primer/issues/250) | [Add Translation](https://github.com/donnemartin/system-design-primer/issues/28)*
*[English](README.md) ∙ [日本語](README-ja.md) ∙ [简体中文](README-zh-Hans.md) ∙ [繁體中文](README-zh-TW.md) ∙ [Русский](README-ru.md) | [العَرَبِيَّة‎](https://github.com/donnemartin/system-design-primer/issues/170) ∙ [বাংলা](https://github.com/donnemartin/system-design-primer/issues/220) ∙ [Português do Brasil](https://github.com/donnemartin/system-design-primer/issues/40) ∙ [Deutsch](https://github.com/donnemartin/system-design-primer/issues/186) ∙ [ελληνικά](https://github.com/donnemartin/system-design-primer/issues/130) ∙ [עברית](https://github.com/donnemartin/system-design-primer/issues/272) ∙ [Italiano](https://github.com/donnemartin/system-design-primer/issues/104) ∙ [韓國語](https://github.com/donnemartin/system-design-primer/issues/102) ∙ [فارسی](https://github.com/donnemartin/system-design-primer/issues/110) ∙ [Polski](https://github.com/donnemartin/system-design-primer/issues/68) ∙ [Español](https://github.com/donnemartin/system-design-primer/issues/136) ∙ [ภาษาไทย](https://github.com/donnemartin/system-design-primer/issues/187) ∙ [Türkçe](https://github.com/donnemartin/system-design-primer/issues/39) ∙ [tiếng Việt](https://github.com/donnemartin/system-design-primer/issues/127) ∙ [Français](https://github.com/donnemartin/system-design-primer/issues/250) | [Add Translation](https://github.com/donnemartin/system-design-primer/issues/28)*
# The System Design Primer

View File

@ -0,0 +1,440 @@
# 设计 Mint.com
**注意:这个文档中的链接会直接指向[系统设计主题索引](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#系统设计主题索引)中的有关部分,以避免重复的内容。您可以参考链接的相关内容,来了解其总的要点、方案的权衡取舍以及可选的替代方案。**
## 第一步:简述用例与约束条件
> 搜集需求与问题的范围。
> 提出问题来明确用例与约束条件。
> 讨论假设。
我们将在没有面试官明确说明问题的情况下,自己定义一些用例以及限制条件。
### 用例
#### 我们将把问题限定在仅处理以下用例的范围中
* **用户** 连接到一个财务账户
* **服务** 从账户中提取交易
* 每日更新
* 分类交易
* 允许用户手动分类
* 不自动重新分类
* 按类别分析每月支出
* **服务** 推荐预算
* 允许用户手动设置预算
* 当接近或者超出预算时,发送通知
* **服务** 具有高可用性
#### 非用例范围
* **服务** 执行附加的日志记录和分析
### 限制条件与假设
#### 提出假设
* 网络流量非均匀分布
* 自动账户日更新只适用于 30 天内活跃的用户
* 添加或者移除财务账户相对较少
* 预算通知不需要及时
* 1000 万用户
* 每个用户10个预算类别= 1亿个预算项
* 示例类别:
* Housing = $1,000
* Food = $200
* Gas = $100
* 卖方确定交易类别
* 50000 个卖方
* 3000 万财务账户
* 每月 50 亿交易
* 每月 5 亿读请求
* 10:1 读写比
* Write-heavy用户每天都进行交易但是每天很少访问该网站
#### 计算用量
**如果你需要进行粗略的用量计算,请向你的面试官说明。**
* 每次交易的用量:
* `user_id` - 8 字节
* `created_at` - 5 字节
* `seller` - 32 字节
* `amount` - 5 字节
* Total: ~50 字节
* 每月产生 250 GB 新的交易内容
* 每次交易 50 比特 * 50 亿交易每月
* 3年内新的交易内容 9 TB
* Assume most are new transactions instead of updates to existing ones
* 平均每秒产生 2000 次交易
* 平均每秒产生 200 读请求
便利换算指南:
* 每个月有 250 万秒
* 每秒一个请求 = 每个月 250 万次请求
* 每秒 40 个请求 = 每个月 1 亿次请求
* 每秒 400 个请求 = 每个月 10 亿次请求
## 第二步:概要设计
> 列出所有重要组件以规划概要设计。
![Imgur](http://i.imgur.com/E8klrBh.png)
## 第三步:设计核心组件
> 深入每个核心组件的细节。
### 用例:用户连接到一个财务账户
我们可以将 1000 万用户的信息存储在一个[关系数据库](https://github.com/donnemartin/system-design-primer#relational-database-management-system-rdbms)中。我们应该讨论一下[选择SQL或NoSQL之间的用例和权衡](https://github.com/donnemartin/system-design-primer#sql-or-nosql)了。
* **客户端** 作为一个[反向代理](https://github.com/donnemartin/system-design-primer#reverse-proxy-web-server),发送请求到 **Web 服务器**
* **Web 服务器** 转发请求到 **账户API** 服务器
* **账户API** 服务器将新输入的账户信息更新到 **SQL数据库** 的`accounts`表
**告知你的面试官你准备写多少代码**。
`accounts`表应该具有如下结构:
```
id int NOT NULL AUTO_INCREMENT
created_at datetime NOT NULL
last_update datetime NOT NULL
account_url varchar(255) NOT NULL
account_login varchar(32) NOT NULL
account_password_hash char(64) NOT NULL
user_id int NOT NULL
PRIMARY KEY(id)
FOREIGN KEY(user_id) REFERENCES users(id)
```
我们将在`id``user_id`和`created_at`等字段上创建一个[索引](https://github.com/donnemartin/system-design-primer#use-good-indices)以加速查找(对数时间而不是扫描整个表)并保持数据在内存中。从内存中顺序读取 1 MB数据花费大约250毫秒而从SSD读取是其4倍从磁盘读取是其80倍。<sup><a href=https://github.com/donnemartin/system-design-primer#latency-numbers-every-programmer-should-know>1</a></sup>
我们将使用公开的[**REST API**](https://github.com/donnemartin/system-design-primer#representational-state-transfer-rest):
```
$ curl -X POST --data '{ "user_id": "foo", "account_url": "bar", \
"account_login": "baz", "account_password": "qux" }' \
https://mint.com/api/v1/account
```
对于内部通信,我们可以使用[远程过程调用](https://github.com/donnemartin/system-design-primer#remote-procedure-call-rpc)。
接下来,服务从账户中提取交易。
### 用例:服务从账户中提取交易
如下几种情况下,我们会想要从账户中提取信息:
* 用户首次链接账户
* 用户手动更新账户
* 为过去 30 天内活跃的用户自动日更新
数据流:
* **客户端**向 **Web服务器** 发送请求
* **Web服务器** 将请求转发到 **帐户API** 服务器
* **帐户API** 服务器将job放在 **队列** 中,如 [Amazon SQS](https://aws.amazon.com/sqs/) 或者 [RabbitMQ](https://www.rabbitmq.com/)
* 提取交易可能需要一段时间,我们可能希望[与队列异步](https://github.com/donnemartin/system-design-primer#asynchronism)地来做,虽然这会引入额外的复杂度。
* **交易提取服务** 执行如下操作:
* 从 **Queue** 中拉取并从金融机构中提取给定用户的交易,将结果作为原始日志文件存储在 **对象存储区**。
* 使用 **分类服务** 来分类每个交易
* 使用 **预算服务** 来按类别计算每月总支出
* **预算服务** 使用 **通知服务** 让用户知道他们是否接近或者已经超出预算
* 更新具有分类交易的 **SQL数据库** 的`transactions`表
* 按类别更新 **SQL数据库** `monthly_spending`表的每月总支出
* 通过 **通知服务** 提醒用户交易完成
* 使用一个 **队列** (没有画出来) 来异步发送通知
`transactions`表应该具有如下结构:
```
id int NOT NULL AUTO_INCREMENT
created_at datetime NOT NULL
seller varchar(32) NOT NULL
amount decimal NOT NULL
user_id int NOT NULL
PRIMARY KEY(id)
FOREIGN KEY(user_id) REFERENCES users(id)
```
我们将在 `id``user_id`,和 `created_at`字段上创建[索引](https://github.com/donnemartin/system-design-primer#use-good-indices)。
`monthly_spending`表应该具有如下结构:
```
id int NOT NULL AUTO_INCREMENT
month_year date NOT NULL
category varchar(32)
amount decimal NOT NULL
user_id int NOT NULL
PRIMARY KEY(id)
FOREIGN KEY(user_id) REFERENCES users(id)
```
我们将在`id``user_id`字段上创建[索引](https://github.com/donnemartin/system-design-primer#use-good-indices)。
#### 分类服务
对于 **分类服务**,我们可以生成一个带有最受欢迎卖家的卖家-类别字典。如果我们估计 50000 个卖家,并估计每个条目占用不少于 255 个字节,该字典只需要大约 12 MB内存。
**告知你的面试官你准备写多少代码**。
```python
class DefaultCategories(Enum):
HOUSING = 0
FOOD = 1
GAS = 2
SHOPPING = 3
...
seller_category_map = {}
seller_category_map['Exxon'] = DefaultCategories.GAS
seller_category_map['Target'] = DefaultCategories.SHOPPING
...
```
对于一开始没有在映射中的卖家,我们可以通过评估用户提供的手动类别来进行众包。在 O(1) 时间内,我们可以用堆来快速查找每个卖家的顶端的手动覆盖。
```python
class Categorizer(object):
def __init__(self, seller_category_map, self.seller_category_crowd_overrides_map):
self.seller_category_map = seller_category_map
self.seller_category_crowd_overrides_map = \
seller_category_crowd_overrides_map
def categorize(self, transaction):
if transaction.seller in self.seller_category_map:
return self.seller_category_map[transaction.seller]
elif transaction.seller in self.seller_category_crowd_overrides_map:
self.seller_category_map[transaction.seller] = \
self.seller_category_crowd_overrides_map[transaction.seller].peek_min()
return self.seller_category_map[transaction.seller]
return None
```
交易实现:
```python
class Transaction(object):
def __init__(self, created_at, seller, amount):
self.timestamp = timestamp
self.seller = seller
self.amount = amount
```
### 用例:服务推荐预算
首先,我们可以使用根据收入等级分配每类别金额的通用预算模板。使用这种方法,我们不必存储在约束中标识的 1 亿个预算项目,只需存储用户覆盖的预算项目。如果用户覆盖预算类别,我们可以在
`TABLE budget_overrides`中存储此覆盖。
```python
class Budget(object):
def __init__(self, income):
self.income = income
self.categories_to_budget_map = self.create_budget_template()
def create_budget_template(self):
return {
'DefaultCategories.HOUSING': income * .4,
'DefaultCategories.FOOD': income * .2
'DefaultCategories.GAS': income * .1,
'DefaultCategories.SHOPPING': income * .2
...
}
def override_category_budget(self, category, amount):
self.categories_to_budget_map[category] = amount
```
对于 **预算服务** 而言,我们可以在`transactions`表上运行SQL查询以生成`monthly_spending`聚合表。由于用户通常每个月有很多交易,所以`monthly_spending`表的行数可能会少于总共50亿次交易的行数。
作为替代,我们可以在原始交易文件上运行 **MapReduce** 作业来:
* 分类每个交易
* 按类别生成每月总支出
对交易文件的运行分析可以显著减少数据库的负载。
如果用户更新类别,我们可以调用 **预算服务** 重新运行分析。
**告知你的面试官你准备写多少代码**.
日志文件格式样例以tab分割
```
user_id timestamp seller amount
```
**MapReduce** 实现:
```python
class SpendingByCategory(MRJob):
def __init__(self, categorizer):
self.categorizer = categorizer
self.current_year_month = calc_current_year_month()
...
def calc_current_year_month(self):
"""返回当前年月"""
...
def extract_year_month(self, timestamp):
"""返回时间戳的年,月部分"""
...
def handle_budget_notifications(self, key, total):
"""如果接近或超出预算调用通知API"""
...
def mapper(self, _, line):
"""解析每个日志行,提取和转换相关行。
参数行应为如下形式:
user_id timestamp seller amount
使用分类器来将卖家转换成类别生成如下形式的key-value对
(user_id, 2016-01, shopping), 25
(user_id, 2016-01, shopping), 100
(user_id, 2016-01, gas), 50
"""
user_id, timestamp, seller, amount = line.split('\t')
category = self.categorizer.categorize(seller)
period = self.extract_year_month(timestamp)
if period == self.current_year_month:
yield (user_id, period, category), amount
def reducer(self, key, value):
"""将每个key对应的值求和。
(user_id, 2016-01, shopping), 125
(user_id, 2016-01, gas), 50
"""
total = sum(values)
yield key, sum(values)
```
## 第四步:设计扩展
> 根据限制条件,找到并解决瓶颈。
![Imgur](http://i.imgur.com/V5q57vU.png)
**重要提示:不要从最初设计直接跳到最终设计中!**
现在你要 1) **基准测试、负载测试**。2) **分析、描述**性能瓶颈。3) 在解决瓶颈问题的同时评估替代方案、权衡利弊。4) 重复以上步骤。请阅读[「设计一个系统,并将其扩大到为数以百万计的 AWS 用户服务」](../scaling_aws/README.md) 来了解如何逐步扩大初始设计。
讨论初始设计可能遇到的瓶颈及相关解决方案是很重要的。例如加上一个配置多台 **Web 服务器**的**负载均衡器**是否能够解决问题?**CDN**呢?**主从复制**呢?它们各自的替代方案和需要**权衡**的利弊又有什么呢?
我们将会介绍一些组件来完成设计,并解决架构扩张问题。内置的负载均衡器将不做讨论以节省篇幅。
**为了避免重复讨论**,请参考[系统设计主题索引](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#系统设计主题的索引)相关部分来了解其要点、方案的权衡取舍以及可选的替代方案。
* [DNS](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#域名系统)
* [负载均衡器](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#负载均衡器)
* [水平拓展](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#水平扩展)
* [反向代理web 服务器)](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#反向代理web-服务器)
* [API 服务(应用层)](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#应用层)
* [缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#缓存)
* [关系型数据库管理系统 (RDBMS)](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#关系型数据库管理系统rdbms)
* [SQL 故障主从切换](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#故障切换)
* [主从复制](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#主从复制)
* [异步](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#异步)
* [一致性模式](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#一致性模式)
* [可用性模式](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#可用性模式)
我们将增加一个额外的用例:**用户** 访问摘要和交易数据。
用户会话,按类别统计的统计信息,以及最近的事务可以放在 **内存缓存**(如 Redis 或 Memcached )中。
* **客户端** 发送读请求给 **Web 服务器**
* **Web 服务器** 转发请求到 **读 API** 服务器
* 静态内容可通过 **对象存储** 比如缓存在 **CDN** 上的 S3 来服务
* **读 API** 服务器做如下动作:
* 检查 **内存缓存** 的内容
* 如果URL在 **内存缓存**中,返回缓存的内容
* 否则
* 如果URL在 **SQL 数据库**中,获取该内容
* 以其内容更新 **内存缓存**
参考 [何时更新缓存](https://github.com/donnemartin/system-design-primer#when-to-update-the-cache) 中权衡和替代的内容。以上方法描述了 [cache-aside缓存模式](https://github.com/donnemartin/system-design-primer#cache-aside).
我们可以使用诸如 Amazon Redshift 或者 Google BigQuery 等数据仓库解决方案,而不是将`monthly_spending`聚合表保留在 **SQL 数据库** 中。
我们可能只想在数据库中存储一个月的`交易`数据,而将其余数据存储在数据仓库或者 **对象存储区** 中。**对象存储区** 如Amazon S3) 能够舒服地解决每月 250 GB新内容的限制。
为了解决每秒 *平均* 2000 次读请求数(峰值时更高),受欢迎的内容的流量应由 **内存缓存** 而不是数据库来处理。 **内存缓存** 也可用于处理不均匀分布的流量和流量尖峰。 只要副本不陷入重复写入的困境,**SQL 读副本** 应该能够处理高速缓存未命中。
*平均* 200 次交易写入每秒(峰值时更高)对于单个 **SQL 写入主-从服务** 来说可能是棘手的。我们可能需要考虑其它的 SQL 性能拓展技术:
* [联合](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#联合)
* [分片](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#分片)
* [非规范化](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#非规范化)
* [SQL 调优](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#sql-调优)
我们也可以考虑将一些数据移至 **NoSQL 数据库**
## 其它要点
> 是否深入这些额外的主题,取决于你的问题范围和剩下的时间。
#### NoSQL
* [键-值存储](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#键-值存储)
* [文档类型存储](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#文档类型存储)
* [列型存储](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#列型存储)
* [图数据库](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#图数据库)
* [SQL vs NoSQL](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#sql-还是-nosql)
### 缓存
* 在哪缓存
* [客户端缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#客户端缓存)
* [CDN 缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#cdn-缓存)
* [Web 服务器缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#web-服务器缓存)
* [数据库缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#数据库缓存)
* [应用缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#应用缓存)
* 什么需要缓存
* [数据库查询级别的缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#数据库查询级别的缓存)
* [对象级别的缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#对象级别的缓存)
* 何时更新缓存
* [缓存模式](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#缓存模式)
* [直写模式](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#直写模式)
* [回写模式](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#回写模式)
* [刷新](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#刷新)
### 异步与微服务
* [消息队列](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#消息队列)
* [任务队列](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#任务队列)
* [背压](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#背压)
* [微服务](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#微服务)
### 通信
* 可权衡选择的方案:
* 与客户端的外部通信 - [使用 REST 作为 HTTP API](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#表述性状态转移rest)
* 服务器内部通信 - [RPC](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#远程过程调用协议rpc)
* [服务发现](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#服务发现)
### 安全性
请参阅[「安全」](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#安全)一章。
### 延迟数值
请参阅[「每个程序员都应该知道的延迟数」](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#每个程序员都应该知道的延迟数)。
### 持续探讨
* 持续进行基准测试并监控你的系统,以解决他们提出的瓶颈问题。
* 架构拓展是一个迭代的过程。

View File

@ -1,6 +1,6 @@
# 设计 Pastebin.com (或者 Bit.ly)
**Note: 为了避免重复,当前文档直接链接到[系统设计主题](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#系统设计主题的索引)的相关区域,请参考链接内容以获得综合的讨论点、权衡和替代方案。**
**注意: 为了避免重复,当前文档会直接链接到[系统设计主题](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#系统设计主题的索引)的相关区域,请参考链接内容以获得综合的讨论点、权衡和替代方案。**
**设计 Bit.ly** - 是一个类似的问题,区别是 pastebin 需要存储的是 paste 的内容,而不是原始的未短化的 url。

View File

@ -0,0 +1,306 @@
# 设计一个键-值缓存来存储最近 web 服务查询的结果
**注意:这个文档中的链接会直接指向[系统设计主题索引](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#系统设计主题的索引)中的有关部分,以避免重复的内容。你可以参考链接的相关内容,来了解其总的要点、方案的权衡取舍以及可选的替代方案。**
## 第一步:简述用例与约束条件
> 搜集需求与问题的范围。
> 提出问题来明确用例与约束条件。
> 讨论假设。
我们将在没有面试官明确说明问题的情况下,自己定义一些用例以及限制条件。
### 用例
#### 我们将把问题限定在仅处理以下用例的范围中
* **用户**发送一个搜索请求,命中缓存
* **用户**发送一个搜索请求,未命中缓存
* **服务**有着高可用性
### 限制条件与假设
#### 提出假设
* 网络流量不是均匀分布的
* 经常被查询的内容应该一直存于缓存中
* 需要确定如何规定缓存过期、缓存刷新规则
* 缓存提供的服务查询速度要快
* 机器间延迟较低
* 缓存有内存限制
* 需要决定缓存什么、移除什么
* 需要缓存百万级的查询
* 1000 万用户
* 每个月 100 亿次查询
#### 计算用量
**如果你需要进行粗略的用量计算,请向你的面试官说明。**
* 缓存存储的是键值对有序表,键为 `query`(查询),值为 `results`(结果)。
* `query` - 50 字节
* `title` - 20 字节
* `snippet` - 200 字节
* 总计270 字节
* 假如 100 亿次查询都是不同的,且全部需要存储,那么每个月需要 2.7 TB 的缓存空间
* 单次查询 270 字节 * 每月查询 100 亿次
* 假设内存大小有限制,需要决定如何制定缓存过期规则
* 每秒 4,000 次请求
便利换算指南:
* 每个月有 250 万秒
* 每秒一个请求 = 每个月 250 万次请求
* 每秒 40 个请求 = 每个月 1 亿次请求
* 每秒 400 个请求 = 每个月 10 亿次请求
## 第二步:概要设计
> 列出所有重要组件以规划概要设计。
![Imgur](http://i.imgur.com/KqZ3dSx.png)
## 第三步:设计核心组件
> 深入每个核心组件的细节。
### 用例:用户发送了一次请求,命中了缓存
常用的查询可以由例如 Redis 或者 Memcached 之类的**内存缓存**提供支持,以减少数据读取延迟,并且避免**反向索引服务**以及**文档服务**的过载。从内存读取 1 MB 连续数据大约要花 250 微秒,而从 SSD 读取同样大小的数据要花费 4 倍的时间,从机械硬盘读取需要花费 80 倍以上的时间。<sup><a href=https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#每个程序员都应该知道的延迟数>1</a></sup>
由于缓存容量有限,我们将使用 LRU近期最少使用算法来控制缓存的过期。
* **客户端**向运行[反向代理](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#反向代理web-服务器)的 **Web 服务器**发送一个请求
* 这个 **Web 服务器**将请求转发给**查询 API** 服务
* **查询 API** 服务将会做这些事情:
* 分析查询
* 移除多余的内容
* 将文本分割成词组
* 修正拼写错误
* 规范化字母的大小写
* 将查询转换为布尔运算
* 检测**内存缓存**是否有匹配查询的内容
* 如果命中**内存缓存****内存缓存**将会做以下事情:
* 将缓存入口的位置指向 LRU 链表的头部
* 返回缓存内容
* 否则,**查询 API** 将会做以下事情:
* 使用**反向索引服务**来查找匹配查询的文档
* **反向索引服务**对匹配到的结果进行排名,然后返回最符合的结果
* 使用**文档服务**返回文章标题与片段
* 更新**内存缓存**,存入内容,将**内存缓存**入口位置指向 LRU 链表的头部
#### 缓存的实现
缓存可以使用双向链表实现:新元素将会在头结点加入,过期的元素将会在尾节点被删除。我们使用哈希表以便能够快速查找每个链表节点。
**向你的面试官告知你准备写多少代码**。
实现**查询 API 服务**
```python
class QueryApi(object):
def __init__(self, memory_cache, reverse_index_service):
self.memory_cache = memory_cache
self.reverse_index_service = reverse_index_service
def parse_query(self, query):
"""移除多余内容,将文本分割成词组,修复拼写错误,
规范化字母大小写,转换布尔运算。
"""
...
def process_query(self, query):
query = self.parse_query(query)
results = self.memory_cache.get(query)
if results is None:
results = self.reverse_index_service.process_search(query)
self.memory_cache.set(query, results)
return results
```
实现**节点**
```python
class Node(object):
def __init__(self, query, results):
self.query = query
self.results = results
```
实现**链表**
```python
class LinkedList(object):
def __init__(self):
self.head = None
self.tail = None
def move_to_front(self, node):
...
def append_to_front(self, node):
...
def remove_from_tail(self):
...
```
实现**缓存**
```python
class Cache(object):
def __init__(self, MAX_SIZE):
self.MAX_SIZE = MAX_SIZE
self.size = 0
self.lookup = {} # key: query, value: node
self.linked_list = LinkedList()
def get(self, query)
"""从缓存取得存储的内容
将入口节点位置更新为 LRU 链表的头部。
"""
node = self.lookup[query]
if node is None:
return None
self.linked_list.move_to_front(node)
return node.results
def set(self, results, query):
"""将所给查询键的结果存在缓存中。
当更新缓存记录的时候,将它的位置指向 LRU 链表的头部。
如果这个记录是新的记录,并且缓存空间已满,应该在加入新记录前
删除最老的记录。
"""
node = self.lookup[query]
if node is not None:
# 键存在于缓存中,更新它对应的值
node.results = results
self.linked_list.move_to_front(node)
else:
# 键不存在于缓存中
if self.size == self.MAX_SIZE:
# 在链表中查找并删除最老的记录
self.lookup.pop(self.linked_list.tail.query, None)
self.linked_list.remove_from_tail()
else:
self.size += 1
# 添加新的键值对
new_node = Node(query, results)
self.linked_list.append_to_front(new_node)
self.lookup[query] = new_node
```
#### 何时更新缓存
缓存将会在以下几种情况更新:
* 页面内容发生变化
* 页面被移除或者加入了新页面
* 页面的权值发生变动
解决这些问题的最直接的方法就是为缓存记录设置一个它在被更新前能留在缓存中的最长时间这个时间简称为存活时间TTL
参考 [「何时更新缓存」](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#何时更新缓存)来了解其权衡取舍及替代方案。以上方法在[缓存模式](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#缓存模式)一章中详细地进行了描述。
## 第四步:架构扩展
> 根据限制条件,找到并解决瓶颈。
![Imgur](http://i.imgur.com/4j99mhe.png)
**重要提示:不要从最初设计直接跳到最终设计中!**
现在你要 1) **基准测试、负载测试**。2) **分析、描述**性能瓶颈。3) 在解决瓶颈问题的同时评估替代方案、权衡利弊。4) 重复以上步骤。请阅读[「设计一个系统,并将其扩大到为数以百万计的 AWS 用户服务」](../scaling_aws/README.md) 来了解如何逐步扩大初始设计。
讨论初始设计可能遇到的瓶颈及相关解决方案是很重要的。例如加上一个配置多台 **Web 服务器**的**负载均衡器**是否能够解决问题?**CDN**呢?**主从复制**呢?它们各自的替代方案和需要**权衡**的利弊又有什么呢?
我们将会介绍一些组件来完成设计,并解决架构扩张问题。内置的负载均衡器将不做讨论以节省篇幅。
**为了避免重复讨论**,请参考[系统设计主题索引](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#系统设计主题的索引)相关部分来了解其要点、方案的权衡取舍以及可选的替代方案。
* [DNS](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#域名系统)
* [负载均衡器](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#负载均衡器)
* [水平拓展](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#水平扩展)
* [反向代理web 服务器)](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#反向代理web-服务器)
* [API 服务(应用层)](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#应用层)
* [缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#缓存)
* [一致性模式](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#一致性模式)
* [可用性模式](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#可用性模式)
### 将内存缓存扩大到多台机器
为了解决庞大的请求负载以及巨大的内存需求,我们将要对架构进行水平拓展。如何在我们的**内存缓存**集群中存储数据呢?我们有以下三个主要可选方案:
* **缓存集群中的每一台机器都有自己的缓存** - 简单,但是它会降低缓存命中率。
* **缓存集群中的每一台机器都有缓存的拷贝** - 简单,但是它的内存使用效率太低了。
* **对缓存进行[分片](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#分片),分别部署在缓存集群中的所有机器中** - 更加复杂,但是它是最佳的选择。我们可以使用哈希,用查询语句 `machine = hash(query)` 来确定哪台机器有需要缓存。当然我们也可以使用[一致性哈希](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#正在完善中)。
## 其它要点
> 是否深入这些额外的主题,取决于你的问题范围和剩下的时间。
### SQL 缩放模式
* [读取复制](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#主从复制)
* [联合](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#联合)
* [分片](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#分片)
* [非规范化](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#非规范化)
* [SQL 调优](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#sql-调优)
#### NoSQL
* [键-值存储](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#键-值存储)
* [文档类型存储](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#文档类型存储)
* [列型存储](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#列型存储)
* [图数据库](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#图数据库)
* [SQL vs NoSQL](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#sql-还是-nosql)
### 缓存
* 在哪缓存
* [客户端缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#客户端缓存)
* [CDN 缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#cdn-缓存)
* [Web 服务器缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#web-服务器缓存)
* [数据库缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#数据库缓存)
* [应用缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#应用缓存)
* 什么需要缓存
* [数据库查询级别的缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#数据库查询级别的缓存)
* [对象级别的缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#对象级别的缓存)
* 何时更新缓存
* [缓存模式](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#缓存模式)
* [直写模式](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#直写模式)
* [回写模式](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#回写模式)
* [刷新](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#刷新)
### 异步与微服务
* [消息队列](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#消息队列)
* [任务队列](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#任务队列)
* [背压](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#背压)
* [微服务](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#微服务)
### 通信
* 可权衡选择的方案:
* 与客户端的外部通信 - [使用 REST 作为 HTTP API](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#表述性状态转移rest)
* 服务器内部通信 - [RPC](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#远程过程调用协议rpc)
* [服务发现](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#服务发现)
### 安全性
请参阅[「安全」](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#安全)一章。
### 延迟数值
请参阅[「每个程序员都应该知道的延迟数」](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#每个程序员都应该知道的延迟数)。
### 持续探讨
* 持续进行基准测试并监控你的系统,以解决他们提出的瓶颈问题。
* 架构拓展是一个迭代的过程。

View File

@ -0,0 +1,338 @@
# 为 Amazon 设计分类售卖排行
**注意:这个文档中的链接会直接指向[系统设计主题索引](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#系统设计主题的索引)中的有关部分,以避免重复的内容。你可以参考链接的相关内容,来了解其总的要点、方案的权衡取舍以及可选的替代方案。**
## 第一步:简述用例与约束条件
> 搜集需求与问题的范围。
> 提出问题来明确用例与约束条件。
> 讨论假设。
我们将在没有面试官明确说明问题的情况下,自己定义一些用例以及限制条件。
### 用例
#### 我们将把问题限定在仅处理以下用例的范围中
* **服务**根据分类计算过去一周中最受欢迎的商品
* **用户**通过分类浏览过去一周中最受欢迎的商品
* **服务**有着高可用性
#### 不在用例范围内的有
* 一般的电商网站
* 只为售卖排行榜设计组件
### 限制条件与假设
#### 提出假设
* 网络流量不是均匀分布的
* 一个商品可能存在于多个分类中
* 商品不能够更改分类
* 不会存在如 `foo/bar/baz` 之类的子分类
* 每小时更新一次结果
* 受欢迎的商品越多,就需要更频繁地更新
* 1000 万个商品
* 1000 个分类
* 每个月 10 亿次交易
* 每个月 1000 亿次读取请求
* 100:1 的读写比例
#### 计算用量
**如果你需要进行粗略的用量计算,请向你的面试官说明。**
* 每笔交易的用量:
* `created_at` - 5 字节
* `product_id` - 8 字节
* `category_id` - 4 字节
* `seller_id` - 8 字节
* `buyer_id` - 8 字节
* `quantity` - 4 字节
* `total_price` - 5 字节
* 总计:大约 40 字节
* 每个月的交易内容会产生 40 GB 的记录
* 每次交易 40 字节 * 每个月 10 亿次交易
* 3年内产生了 1.44 TB 的新交易内容记录
* 假定大多数的交易都是新交易而不是更改以前进行完的交易
* 平均每秒 400 次交易次数
* 平均每秒 40,000 次读取请求
便利换算指南:
* 每个月有 250 万秒
* 每秒一个请求 = 每个月 250 万次请求
* 每秒 40 个请求 = 每个月 1 亿次请求
* 每秒 400 个请求 = 每个月 10 亿次请求
## 第二步:概要设计
> 列出所有重要组件以规划概要设计。
![Imgur](http://i.imgur.com/vwMa1Qu.png)
## 第三步:设计核心组件
> 深入每个核心组件的细节。
### 用例:服务需要根据分类计算上周最受欢迎的商品
我们可以在现成的**对象存储**系统(例如 Amazon S3 服务)中存储 **售卖 API** 服务产生的日志文本, 因此不需要我们自己搭建分布式文件系统了。
**向你的面试官告知你准备写多少代码**。
假设下面是一个用 tab 分割的简易的日志记录:
```
timestamp product_id category_id qty total_price seller_id buyer_id
t1 product1 category1 2 20.00 1 1
t2 product1 category2 2 20.00 2 2
t2 product1 category2 1 10.00 2 3
t3 product2 category1 3 7.00 3 4
t4 product3 category2 7 2.00 4 5
t5 product4 category1 1 5.00 5 6
...
```
**售卖排行服务** 需要用到 **MapReduce**,并使用 **售卖 API** 服务进行日志记录,同时将结果写入 **SQL 数据库**中的总表 `sales_rank` 中。我们也可以讨论一下[究竟是用 SQL 还是用 NoSQL](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#sql-还是-nosql)。
我们需要通过以下步骤使用 **MapReduce**
* **第 1 步** - 将数据转换为 `(category, product_id), sum(quantity)` 的形式
* **第 2 步** - 执行分布式排序
```python
class SalesRanker(MRJob):
def within_past_week(self, timestamp):
"""如果时间戳属于过去的一周则返回 True
否则返回 False。"""
...
def mapper(self, _ line):
"""解析日志的每一行,提取并转换相关行,
将键值对设定为如下形式:
(category1, product1), 2
(category2, product1), 2
(category2, product1), 1
(category1, product2), 3
(category2, product3), 7
(category1, product4), 1
"""
timestamp, product_id, category_id, quantity, total_price, seller_id, \
buyer_id = line.split('\t')
if self.within_past_week(timestamp):
yield (category_id, product_id), quantity
def reducer(self, key, value):
"""将每个 key 的值加起来。
(category1, product1), 2
(category2, product1), 3
(category1, product2), 3
(category2, product3), 7
(category1, product4), 1
"""
yield key, sum(values)
def mapper_sort(self, key, value):
"""构造 key 以确保正确的排序。
将键值对转换成如下形式:
(category1, 2), product1
(category2, 3), product1
(category1, 3), product2
(category2, 7), product3
(category1, 1), product4
MapReduce 的随机排序步骤会将键
值的排序打乱,变成下面这样:
(category1, 1), product4
(category1, 2), product1
(category1, 3), product2
(category2, 3), product1
(category2, 7), product3
"""
category_id, product_id = key
quantity = value
yield (category_id, quantity), product_id
def reducer_identity(self, key, value):
yield key, value
def steps(self):
""" 此处为 map reduce 步骤"""
return [
self.mr(mapper=self.mapper,
reducer=self.reducer),
self.mr(mapper=self.mapper_sort,
reducer=self.reducer_identity),
]
```
得到的结果将会是如下的排序列,我们将其插入 `sales_rank` 表中:
```
(category1, 1), product4
(category1, 2), product1
(category1, 3), product2
(category2, 3), product1
(category2, 7), product3
```
`sales_rank` 表的数据结构如下:
```
id int NOT NULL AUTO_INCREMENT
category_id int NOT NULL
total_sold int NOT NULL
product_id int NOT NULL
PRIMARY KEY(id)
FOREIGN KEY(category_id) REFERENCES Categories(id)
FOREIGN KEY(product_id) REFERENCES Products(id)
```
我们会以 `id`、`category_id` 与 `product_id` 创建一个 [索引](https://github.com/donnemartin/system-design-primer#use-good-indices)以加快查询速度(只需要使用读取日志的时间,不再需要每次都扫描整个数据表)并让数据常驻内存。从内存读取 1 MB 连续数据大约要花 250 微秒,而从 SSD 读取同样大小的数据要花费 4 倍的时间,从机械硬盘读取需要花费 80 倍以上的时间。<sup><a href=https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#每个程序员都应该知道的延迟数>1</a></sup>
### 用例:用户需要根据分类浏览上周中最受欢迎的商品
* **客户端**向运行[反向代理](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#反向代理web-服务器)的 **Web 服务器**发送一个请求
* 这个 **Web 服务器**将请求转发给**查询 API** 服务
* The **查询 API** 服务将从 **SQL 数据库**`sales_rank` 表中读取数据
我们可以调用一个公共的 [REST API](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#表述性状态转移rest)
```
$ curl https://amazon.com/api/v1/popular?category_id=1234
```
返回:
```
{
"id": "100",
"category_id": "1234",
"total_sold": "100000",
"product_id": "50",
},
{
"id": "53",
"category_id": "1234",
"total_sold": "90000",
"product_id": "200",
},
{
"id": "75",
"category_id": "1234",
"total_sold": "80000",
"product_id": "3",
},
```
而对于服务器内部的通信,我们可以使用 [RPC](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#远程过程调用协议rpc)。
## 第四步:架构扩展
> 根据限制条件,找到并解决瓶颈。
![Imgur](http://i.imgur.com/MzExP06.png)
**重要提示:不要从最初设计直接跳到最终设计中!**
现在你要 1) **基准测试、负载测试**。2) **分析、描述**性能瓶颈。3) 在解决瓶颈问题的同时评估替代方案、权衡利弊。4) 重复以上步骤。请阅读[「设计一个系统,并将其扩大到为数以百万计的 AWS 用户服务」](../scaling_aws/README.md) 来了解如何逐步扩大初始设计。
讨论初始设计可能遇到的瓶颈及相关解决方案是很重要的。例如加上一个配置多台 **Web 服务器**的**负载均衡器**是否能够解决问题?**CDN**呢?**主从复制**呢?它们各自的替代方案和需要**权衡**的利弊又有什么呢?
我们将会介绍一些组件来完成设计,并解决架构扩张问题。内置的负载均衡器将不做讨论以节省篇幅。
**为了避免重复讨论**,请参考[系统设计主题索引](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#系统设计主题的索引)相关部分来了解其要点、方案的权衡取舍以及可选的替代方案。
* [DNS](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#域名系统)
* [负载均衡器](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#负载均衡器)
* [水平拓展](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#水平扩展)
* [反向代理web 服务器)](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#反向代理web-服务器)
* [API 服务(应用层)](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#应用层)
* [缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#缓存)
* [关系型数据库管理系统 (RDBMS)](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#关系型数据库管理系统rdbms)
* [SQL 故障主从切换](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#故障切换)
* [主从复制](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#主从复制)
* [一致性模式](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#一致性模式)
* [可用性模式](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#可用性模式)
**分析数据库** 可以用现成的数据仓储系统,例如使用 Amazon Redshift 或者 Google BigQuery 的解决方案。
当使用数据仓储技术或者**对象存储**系统时我们只想在数据库中存储有限时间段的数据。Amazon S3 的**对象存储**系统可以很方便地设置每个月限制只允许新增 40 GB 的存储内容。
平均每秒 40,000 次的读取请求(峰值将会更高), 可以通过扩展 **内存缓存** 来处理热点内容的读取流量,这对于处理不均匀分布的流量和流量峰值也很有用。由于读取量非常大,**SQL Read 副本** 可能会遇到处理缓存未命中的问题,我们可能需要使用额外的 SQL 扩展模式。
平均每秒 400 次写操作(峰值将会更高)可能对于单个 **SQL 写主-从** 模式来说比较很困难,因此同时还需要更多的扩展技术
SQL 缩放模式包括:
* [联合](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#联合)
* [分片](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#分片)
* [非规范化](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#非规范化)
* [SQL 调优](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#sql-调优)
我们也可以考虑将一些数据移至 **NoSQL 数据库**
## 其它要点
> 是否深入这些额外的主题,取决于你的问题范围和剩下的时间。
#### NoSQL
* [键-值存储](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#键-值存储)
* [文档类型存储](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#文档类型存储)
* [列型存储](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#列型存储)
* [图数据库](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#图数据库)
* [SQL vs NoSQL](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#sql-还是-nosql)
### 缓存
* 在哪缓存
* [客户端缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#客户端缓存)
* [CDN 缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#cdn-缓存)
* [Web 服务器缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#web-服务器缓存)
* [数据库缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#数据库缓存)
* [应用缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#应用缓存)
* 什么需要缓存
* [数据库查询级别的缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#数据库查询级别的缓存)
* [对象级别的缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#对象级别的缓存)
* 何时更新缓存
* [缓存模式](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#缓存模式)
* [直写模式](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#直写模式)
* [回写模式](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#回写模式)
* [刷新](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#刷新)
### 异步与微服务
* [消息队列](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#消息队列)
* [任务队列](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#任务队列)
* [背压](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#背压)
* [微服务](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#微服务)
### 通信
* 可权衡选择的方案:
* 与客户端的外部通信 - [使用 REST 作为 HTTP API](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#表述性状态转移rest)
* 服务器内部通信 - [RPC](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#远程过程调用协议rpc)
* [服务发现](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#服务发现)
### 安全性
请参阅[「安全」](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#安全)一章。
### 延迟数值
请参阅[「每个程序员都应该知道的延迟数」](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#每个程序员都应该知道的延迟数)。
### 持续探讨
* 持续进行基准测试并监控你的系统,以解决他们提出的瓶颈问题。
* 架构拓展是一个迭代的过程。

View File

@ -0,0 +1,403 @@
# 在 AWS 上设计支持百万级到千万级用户的系统
**注释:为了避免重复,这篇文章的链接直接关联到 [系统设计主题](https://github.com/donnemartin/system-design-primer#index-of-system-design-topics) 的相关章节。为一讨论要点、折中方案和可选方案做参考。**
## 第 1 步:用例和约束概要
> 收集需求并调查问题。
> 通过提问清晰用例和约束。
> 讨论假设。
如果没有面试官提出明确的问题,我们将自己定义一些用例和约束条件。
### 用例
解决这个问题是一个循序渐进的过程1) **基准/负载 测试** 2) 瓶颈 **概述** 3) 当评估可选和折中方案时定位瓶颈4) 重复,这是向可扩展的设计发展基础设计的好模式。
除非你有 AWS 的背景或者正在申请需要 AWS 知识的相关职位,否则不要求了解 AWS 的相关细节。并且这个练习中讨论的许多原则可以更广泛地应用于AWS生态系统之外。
#### 我们就处理以下用例讨论这一问题
* **用户** 进行读或写请求
* **服务** 进行处理,存储用户数据,然后返回结果
* **服务** 需要从支持小规模用户开始到百万用户
* 在我们演化架构来处理大量的用户和请求时,讨论一般的扩展模式
* **服务** 高可用
### 约束和假设
#### 状态假设
* 流量不均匀分布
* 需要关系数据
* 从一个用户扩展到千万用户
* 表示用户量的增长
* 用户量+
* 用户量++
* 用户量+++
* ...
* 1000 万用户
* 每月 10 亿次写入
* 每月 1000 亿次读出
* 100:1 读写比率
* 每次写入 1 KB 内容
#### 计算使用
**向你的面试官厘清你是否应该做粗略的使用计算**
* 1 TB 新内容 / 月
* 1 KB 每次写入 * 10 亿 写入 / 月
* 36 TB 新内容 / 3 年
* 假设大多数写入都是新内容而不是更新已有内容
* 平均每秒 400 次写入
* 平均每秒 40,000 次读取
便捷的转换指南:
* 250 万秒 / 月
* 1 次请求 / 秒 = 250 万次请求 / 月
* 40 次请求 / 秒 = 1 亿次请求 / 月
* 400 次请求 / 秒 = 10 亿请求 / 月
## 第 2 步:创建高级设计方案
> 用所有重要组件概述高水平设计
![Imgur](http://i.imgur.com/B8LDKD7.png)
## 第 3 步:设计核心组件
> 深入每个核心组件的细节。
### 用例:用户进行读写请求
#### 目标
* 只有 1-2 个用户时,你只需要基础配置
* 为简单起见,只需要一台服务器
* 必要时进行纵向扩展
* 监控以确定瓶颈
#### 以单台服务器开始
* **Web 服务器** 在 EC2 上
* 存储用户数据
* [**MySQL 数据库**](https://github.com/donnemartin/system-design-primer#relational-database-management-system-rdbms)
运用 **纵向扩展**
* 选择一台更大容量的服务器
* 密切关注指标,确定如何扩大规模
* 使用基本监控来确定瓶颈:CPU、内存、IO、网络等
* CloudWatch, top, nagios, statsd, graphite等
* 纵向扩展的代价将变得更昂贵
* 无冗余/容错
**折中方案, 可选方案, 和其他细节:**
* **纵向扩展** 的可选方案是 [**横向扩展**](https://github.com/donnemartin/system-design-primer#horizontal-scaling)
#### 自 SQL 开始,但认真考虑 NoSQL
约束条件假设需要关系型数据。我们可以开始时在单台服务器上使用 **MySQL 数据库**
**折中方案, 可选方案, 和其他细节:**
* 查阅 [关系型数据库管理系统 (RDBMS)](https://github.com/donnemartin/system-design-primer#relational-database-management-system-rdbms) 章节
* 讨论使用 [SQL 或 NoSQL](https://github.com/donnemartin/system-design-primer#sql-or-nosql) 的原因
#### 分配公共静态 IP
* 弹性 IP 提供了一个公共端点,不会在重启时改变 IP。
* 故障转移时只需要把域名指向新 IP。
#### 使用 DNS 服务
添加 **DNS** 服务,比如 Route 53[Amazon Route 53](https://aws.amazon.com/cn/route53/) - 译者注),将域映射到实例的公共 IP 中。
**折中方案, 可选方案, 和其他细节:**
* 查阅 [域名系统](https://github.com/donnemartin/system-design-primer#domain-name-system) 章节
#### 安全的 Web 服务器
* 只开放必要的端口
* 允许 Web 服务器响应来自以下端口的请求
* HTTP 80
* HTTPS 443
* SSH IP 白名单 22
* 防止 Web 服务器启动外链
**折中方案, 可选方案, 和其他细节:**
* 查阅 [安全](https://github.com/donnemartin/system-design-primer#security) 章节
## 第 4 步:扩展设计
> 在给定约束条件下,定义和确认瓶颈。
### 用户+
![Imgur](http://i.imgur.com/rrfjMXB.png)
#### 假设
我们的用户数量开始上升,并且单台服务器的负载上升。**基准/负载测试** 和 **分析** 指出 **MySQL 数据库** 占用越来越多的内存和 CPU 资源,同时用户数据将填满硬盘空间。
目前,我们尚能在纵向扩展时解决这些问题。不幸的是,解决这些问题的代价变得相当昂贵,并且原来的系统并不能允许在 **MySQL 数据库****Web 服务器** 的基础上进行独立扩展。
#### 目标
* 减轻单台服务器负载并且允许独立扩展
* 在 **对象存储** 中单独存储静态内容
* 将 **MySQL 数据库** 迁移到单独的服务器上
* 缺点
* 这些变化会增加复杂性,并要求对 **Web服务器** 进行更改,以指向 **对象存储** 和 **MySQL 数据库**
* 必须采取额外的安全措施来确保新组件的安全
* AWS 的成本也会增加,但应该与自身管理类似系统的成本做比较
#### 独立保存静态内容
* 考虑使用像 S3 这样可管理的 **对象存储** 服务来存储静态内容
* 高扩展性和可靠性
* 服务器端加密
* 迁移静态内容到 S3
* 用户文件
* JS
* CSS
* 图片
* 视频
#### 迁移 MySQL 数据库到独立机器上
* 考虑使用类似 RDS 的服务来管理 **MySQL 数据库**
* 简单的管理,扩展
* 多个可用区域
* 空闲时加密
#### 系统安全
* 在传输和空闲时对数据进行加密
* 使用虚拟私有云
* 为单个 **Web 服务器** 创建一个公共子网,这样就可以发送和接收来自 internet 的流量
* 为其他内容创建一个私有子网,禁止外部访问
* 在每个组件上只为白名单 IP 打开端口
* 这些相同的模式应当在新的组件的实现中实践
**折中方案, 可选方案, 和其他细节:**
* 查阅 [安全](https://github.com/donnemartin/system-design-primer#security) 章节
### 用户+++
![Imgur](http://i.imgur.com/raoFTXM.png)
#### 假设
我们的 **基准/负载测试** 和 **性能测试** 显示,在高峰时段,我们的单一 **Web服务器** 存在瓶颈,导致响应缓慢,在某些情况下还会宕机。随着服务的成熟,我们也希望朝着更高的可用性和冗余发展。
#### 目标
* 下面的目标试图用 **Web服务器** 解决扩展问题
* 基于 **基准/负载测试** 和 **分析**,你可能只需要实现其中的一两个技术
* 使用 [**横向扩展**](https://github.com/donnemartin/system-design-primer#horizontal-scaling) 来处理增加的负载和单点故障
* 添加 [**负载均衡器**](https://github.com/donnemartin/system-design-primer#load-balancer) 例如 Amazon 的 ELB 或 HAProxy
* ELB 是高可用的
* 如果你正在配置自己的 **负载均衡器**, 在多个可用区域中设置多台服务器用于 [双活](https://github.com/donnemartin/system-design-primer#active-active) 或 [主被](https://github.com/donnemartin/system-design-primer#active-passive) 将提高可用性
* 终止在 **负载平衡器** 上的SSL以减少后端服务器上的计算负载并简化证书管理
* 在多个可用区域中使用多台 **Web服务器**
* 在多个可用区域的 [**主-从 故障转移**](https://github.com/donnemartin/system-design-primer#master-slave-replication) 模式中使用多个 **MySQL** 实例来改进冗余
* 分离 **Web 服务器** 和 [**应用服务器**](https://github.com/donnemartin/system-design-primer#application-layer)
* 独立扩展和配置每一层
* **Web 服务器** 可以作为 [**反向代理**](https://github.com/donnemartin/system-design-primer#reverse-proxy-web-server)
* 例如, 你可以添加 **应用服务器** 处理 **读 API** 而另外一些处理 **写 API**
* 将静态(和一些动态)内容转移到 [**内容分发网络 (CDN)**](https://github.com/donnemartin/system-design-primer#content-delivery-network) 例如 CloudFront 以减少负载和延迟
**折中方案, 可选方案, 和其他细节:**
* 查阅以上链接获得更多细节
### 用户+++
![Imgur](http://i.imgur.com/OZCxJr0.png)
**注意:** **内部负载均衡** 不显示以减少混乱
#### 假设
我们的 **性能/负载测试** 和 **性能测试** 显示我们读操作频繁100:1 的读写比率),并且数据库在高读请求时表现很糟糕。
#### 目标
* 下面的目标试图解决 **MySQL数据库** 的伸缩性问题
* * 基于 **基准/负载测试** 和 **分析**,你可能只需要实现其中的一两个技术
* 将下列数据移动到一个 [**内存缓存**](https://github.com/donnemartin/system-design-primer#cache),例如弹性缓存,以减少负载和延迟:
* **MySQL** 中频繁访问的内容
* 首先, 尝试配置 **MySQL 数据库** 缓存以查看是否足以在实现 **内存缓存** 之前缓解瓶颈
* 来自 **Web 服务器** 的会话数据
* **Web 服务器** 变成无状态的, 允许 **自动伸缩**
* 从内存中读取 1 MB 内存需要大约 250 微秒而从SSD中读取时间要长 4 倍,从磁盘读取的时间要长 80 倍。<sup><a href=https://github.com/donnemartin/system-design-primer#latency-numbers-every-programmer-should-know>1</a></sup>
* 添加 [**MySQL 读取副本**](https://github.com/donnemartin/system-design-primer#master-slave-replication) 来减少写主线程的负载
* 添加更多 **Web 服务器** and **应用服务器** 来提高响应
**折中方案, 可选方案, 和其他细节:**
* 查阅以上链接获得更多细节
#### 添加 MySQL 读取副本
* 除了添加和扩展 **内存缓存****MySQL 读副本服务器** 也能够帮助缓解在 **MySQL 写主服务器** 的负载。
* 添加逻辑到 **Web 服务器** 来区分读和写操作
* 在 **MySQL 读副本服务器** 之上添加 **负载均衡器** (不是为了减少混乱)
* 大多数服务都是读取负载大于写入负载
**折中方案, 可选方案, 和其他细节:**
* 查阅 [关系型数据库管理系统 (RDBMS)](https://github.com/donnemartin/system-design-primer#relational-database-management-system-rdbms) 章节
### 用户++++
![Imgur](http://i.imgur.com/3X8nmdL.png)
#### 假设
**基准/负载测试** 和 **分析** 显示,在美国,正常工作时间存在流量峰值,当用户离开办公室时,流量骤降。我们认为,可以通过真实负载自动转换服务器数量来降低成本。我们是一家小商店,所以我们希望 DevOps 尽量自动化地进行 **自动伸缩** 和通用操作。
#### 目标
* 根据需要添加 **自动扩展**
* 跟踪流量高峰
* 通过关闭未使用的实例来降低成本
* 自动化 DevOps
* Chef, Puppet, Ansible 工具等
* 继续监控指标以解决瓶颈
* **主机水平** - 检查一个 EC2 实例
* **总水平** - 检查负载均衡器统计数据
* **日志分析** - CloudWatch, CloudTrail, Loggly, Splunk, Sumo
* **外部站点的性能** - Pingdom or New Relic
* **处理通知和事件** - PagerDuty
* **错误报告** - Sentry
#### 添加自动扩展
* 考虑使用一个托管服务比如AWS **自动扩展**
* 为每个 **Web 服务器** 创建一个组,并为每个 **应用服务器** 类型创建一个组,将每个组放置在多个可用区域中
* 设置最小和最大实例数
* 通过 CloudWatch 来扩展或收缩
* 可预测负载的简单时间度量
* 一段时间内的指标:
* CPU 负载
* 延迟
* 网络流量
* 自定义指标
* 缺点
* 自动扩展会引入复杂性
* 可能需要一段时间才能适当扩大规模,以满足增加的需求,或者在需求下降时缩减规模
### 用户+++++
![Imgur](http://i.imgur.com/jj3A5N8.png)
**注释:** **自动伸缩** 组不显示以减少混乱
#### 假设
当服务继续向着限制条件概述的方向发展,我们反复地运行 **基准/负载测试** 和 **分析** 来进一步发现和定位新的瓶颈。
#### 目标
由于问题的约束,我们将继续提出扩展性的问题:
* 如果我们的 **MySQL 数据库** 开始变得过于庞大, 我们可能只考虑把数据在数据库中存储一段有限的时间, 同时在例如 Redshift 这样的数据仓库中存储其余的数据
* 像 Redshift 这样的数据仓库能够轻松处理每月 1TB 的新内容
* 平均每秒 40,000 次的读取请求, 可以通过扩展 **内存缓存** 来处理热点内容的读取流量,这对于处理不均匀分布的流量和流量峰值也很有用
* **SQL读取副本** 可能会遇到处理缓存未命中的问题, 我们可能需要使用额外的 SQL 扩展模式
* 对于单个 **SQL 写主-从** 模式来说,平均每秒 400 次写操作(明显更高)可能会很困难,同时还需要更多的扩展技术
SQL 扩展模型包括:
* [集合](https://github.com/donnemartin/system-design-primer#federation)
* [分片](https://github.com/donnemartin/system-design-primer#sharding)
* [反范式](https://github.com/donnemartin/system-design-primer#denormalization)
* [SQL 调优](https://github.com/donnemartin/system-design-primer#sql-tuning)
为了进一步处理高读和写请求,我们还应该考虑将适当的数据移动到一个 [**NoSQL数据库**](https://github.com/donnemartin/system-design-primer#nosql) ,例如 DynamoDB。
我们可以进一步分离我们的 [**应用服务器**](https://github.com/donnemartin/system-design-primer#application-layer) 以允许独立扩展。不需要实时完成的批处理任务和计算可以通过 Queues 和 Workers 异步完成:
* 以照片服务为例,照片上传和缩略图的创建可以分开进行
* **客户端** 上传图片
* **应用服务器** 推送一个任务到 **队列** 例如 SQS
* EC2 上的 **Worker 服务** 或者 Lambda 从 **队列** 拉取 work然后
* 创建缩略图
* 更新 **数据库**
* 在 **对象存储** 中存储缩略图
**折中方案, 可选方案, 和其他细节:**
* 查阅以上链接获得更多细节
## 额外的话题
> 根据问题的范围和剩余时间,还需要深入讨论其他问题。
### SQL 扩展模式
* [读取副本](https://github.com/donnemartin/system-design-primer#master-slave-replication)
* [集合](https://github.com/donnemartin/system-design-primer#federation)
* [分区](https://github.com/donnemartin/system-design-primer#sharding)
* [反规范化](https://github.com/donnemartin/system-design-primer#denormalization)
* [SQL 调优](https://github.com/donnemartin/system-design-primer#sql-tuning)
#### NoSQL
* [键值存储](https://github.com/donnemartin/system-design-primer#key-value-store)
* [文档存储](https://github.com/donnemartin/system-design-primer#document-store)
* [宽表存储](https://github.com/donnemartin/system-design-primer#wide-column-store)
* [图数据库](https://github.com/donnemartin/system-design-primer#graph-database)
* [SQL vs NoSQL](https://github.com/donnemartin/system-design-primer#sql-or-nosql)
### 缓存
* 缓存到哪里
* [客户端缓存](https://github.com/donnemartin/system-design-primer#client-caching)
* [CDN 缓存](https://github.com/donnemartin/system-design-primer#cdn-caching)
* [Web 服务缓存](https://github.com/donnemartin/system-design-primer#web-server-caching)
* [数据库缓存](https://github.com/donnemartin/system-design-primer#database-caching)
* [应用缓存](https://github.com/donnemartin/system-design-primer#application-caching)
* 缓存什么
* [数据库请求层缓存](https://github.com/donnemartin/system-design-primer#caching-at-the-database-query-level)
* [对象层缓存](https://github.com/donnemartin/system-design-primer#caching-at-the-object-level)
* 何时更新缓存
* [预留缓存](https://github.com/donnemartin/system-design-primer#cache-aside)
* [完全写入](https://github.com/donnemartin/system-design-primer#write-through)
* [延迟写 (写回)](https://github.com/donnemartin/system-design-primer#write-behind-write-back)
* [事先更新](https://github.com/donnemartin/system-design-primer#refresh-ahead)
### 异步性和微服务
* [消息队列](https://github.com/donnemartin/system-design-primer#message-queues)
* [任务队列](https://github.com/donnemartin/system-design-primer#task-queues)
* [回退压力](https://github.com/donnemartin/system-design-primer#back-pressure)
* [微服务](https://github.com/donnemartin/system-design-primer#microservices)
### 沟通
* 关于折中方案的讨论:
* 客户端的外部通讯 - [遵循 REST 的 HTTP APIs](https://github.com/donnemartin/system-design-primer#representational-state-transfer-rest)
* 内部通讯 - [RPC](https://github.com/donnemartin/system-design-primer#remote-procedure-call-rpc)
* [服务探索](https://github.com/donnemartin/system-design-primer#service-discovery)
### 安全性
参考 [安全章节](https://github.com/donnemartin/system-design-primer#security)
### 延迟数字指标
查阅 [每个程序员必懂的延迟数字](https://github.com/donnemartin/system-design-primer#latency-numbers-every-programmer-should-know)
### 正在进行
* 继续基准测试并监控你的系统以解决出现的瓶颈问题
* 扩展是一个迭代的过程

View File

@ -0,0 +1,348 @@
# 为社交网络设计数据结构
**注释:为了避免重复,这篇文章的链接直接关联到 [系统设计主题](https://github.com/donnemartin/system-design-primer#index-of-system-design-topics) 的相关章节。为一讨论要点、折中方案和可选方案做参考。**
## 第 1 步:用例和约束概要
> 收集需求并调查问题。
> 通过提问清晰用例和约束。
> 讨论假设。
如果没有面试官提出明确的问题,我们将自己定义一些用例和约束条件。
### 用例
#### 我们就处理以下用例审视这一问题
* **用户** 寻找某人并显示与被寻人之间的最短路径
* **服务** 高可用
### 约束和假设
#### 状态假设
* 流量分布不均
* 某些搜索比别的更热门,同时某些搜索仅执行一次
* 图数据不适用单一机器
* 图的边没有权重
* 1 千万用户
* 每个用户平均有 50 个朋友
* 每月 10 亿次朋友搜索
训练使用更传统的系统 - 别用图特有的解决方案例如 [GraphQL](http://graphql.org/) 或图数据库如 [Neo4j](https://neo4j.com/)。
#### 计算使用
**向你的面试官厘清你是否应该做粗略的使用计算**
* 50 亿朋友关系
* 1 亿用户 * 平均每人 50 个朋友
* 每秒 400 次搜索请求
便捷的转换指南:
* 每月 250 万秒
* 每秒 1 个请求 = 每月 250 万次请求
* 每秒 40 个请求 = 每月 1 亿次请求
* 每秒 400 个请求 = 每月 10 亿次请求
## 第 2 步:创建高级设计方案
> 用所有重要组件概述高水平设计
![Imgur](http://i.imgur.com/wxXyq2J.png)
## 第 3 步:设计核心组件
> 深入每个核心组件的细节。
### 用例: 用户搜索某人并查看到被搜人的最短路径
**和你的面试官说清你期望的代码量**
没有百万用户(点)的和十亿朋友关系(边)的限制,我们能够用一般 BFS 方法解决无权重最短路径任务:
```python
class Graph(Graph):
def shortest_path(self, source, dest):
if source is None or dest is None:
return None
if source is dest:
return [source.key]
prev_node_keys = self._shortest_path(source, dest)
if prev_node_keys is None:
return None
else:
path_ids = [dest.key]
prev_node_key = prev_node_keys[dest.key]
while prev_node_key is not None:
path_ids.append(prev_node_key)
prev_node_key = prev_node_keys[prev_node_key]
return path_ids[::-1]
def _shortest_path(self, source, dest):
queue = deque()
queue.append(source)
prev_node_keys = {source.key: None}
source.visit_state = State.visited
while queue:
node = queue.popleft()
if node is dest:
return prev_node_keys
prev_node = node
for adj_node in node.adj_nodes.values():
if adj_node.visit_state == State.unvisited:
queue.append(adj_node)
prev_node_keys[adj_node.key] = prev_node.key
adj_node.visit_state = State.visited
return None
```
我们不能在同一台机器上满足所有用户,我们需要通过 **人员服务器** [拆分](https://github.com/donnemartin/system-design-primer#sharding) 用户并且通过 **查询服务** 访问。
* **客户端** 向 **服务器** 发送请求,**服务器** 作为 [反向代理](https://github.com/donnemartin/system-design-primer#reverse-proxy-web-server)
* **搜索 API** 服务器向 **用户图服务** 转发请求
* **用户图服务** 有以下功能:
* 使用 **查询服务** 找到当前用户信息存储的 **人员服务器**
* 找到适当的 **人员服务器** 检索当前用户的 `friend_ids` 列表
* 把当前用户作为 `source` 运行 BFS 搜索算法同时 当前用户的 `friend_ids` 作为每个 `adjacent_node` 的 ids
* 给定 id 获取 `adjacent_node`:
* **用户图服务** 将 **再次** 和 **查询服务** 通讯,最后判断出和给定 id 相匹配的存储 `adjacent_node`**人员服务器**(有待优化)
**和你的面试官说清你应该写的代码量**
**注释**:简易版错误处理执行如下。询问你是否需要编写适当的错误处理方法。
**查询服务** 实现:
```python
class LookupService(object):
def __init__(self):
self.lookup = self._init_lookup() # key: person_id, value: person_server
def _init_lookup(self):
...
def lookup_person_server(self, person_id):
return self.lookup[person_id]
```
**人员服务器** 实现:
```python
class PersonServer(object):
def __init__(self):
self.people = {} # key: person_id, value: person
def add_person(self, person):
...
def people(self, ids):
results = []
for id in ids:
if id in self.people:
results.append(self.people[id])
return results
```
**用户** 实现:
```python
class Person(object):
def __init__(self, id, name, friend_ids):
self.id = id
self.name = name
self.friend_ids = friend_ids
```
**用户图服务** 实现:
```python
class UserGraphService(object):
def __init__(self, lookup_service):
self.lookup_service = lookup_service
def person(self, person_id):
person_server = self.lookup_service.lookup_person_server(person_id)
return person_server.people([person_id])
def shortest_path(self, source_key, dest_key):
if source_key is None or dest_key is None:
return None
if source_key is dest_key:
return [source_key]
prev_node_keys = self._shortest_path(source_key, dest_key)
if prev_node_keys is None:
return None
else:
# Iterate through the path_ids backwards, starting at dest_key
path_ids = [dest_key]
prev_node_key = prev_node_keys[dest_key]
while prev_node_key is not None:
path_ids.append(prev_node_key)
prev_node_key = prev_node_keys[prev_node_key]
# Reverse the list since we iterated backwards
return path_ids[::-1]
def _shortest_path(self, source_key, dest_key, path):
# Use the id to get the Person
source = self.person(source_key)
# Update our bfs queue
queue = deque()
queue.append(source)
# prev_node_keys keeps track of each hop from
# the source_key to the dest_key
prev_node_keys = {source_key: None}
# We'll use visited_ids to keep track of which nodes we've
# visited, which can be different from a typical bfs where
# this can be stored in the node itself
visited_ids = set()
visited_ids.add(source.id)
while queue:
node = queue.popleft()
if node.key is dest_key:
return prev_node_keys
prev_node = node
for friend_id in node.friend_ids:
if friend_id not in visited_ids:
friend_node = self.person(friend_id)
queue.append(friend_node)
prev_node_keys[friend_id] = prev_node.key
visited_ids.add(friend_id)
return None
```
我们用的是公共的 [**REST API**](https://github.com/donnemartin/system-design-primer#representational-state-transfer-rest)
```
$ curl https://social.com/api/v1/friend_search?person_id=1234
```
响应:
```
{
"person_id": "100",
"name": "foo",
"link": "https://social.com/foo",
},
{
"person_id": "53",
"name": "bar",
"link": "https://social.com/bar",
},
{
"person_id": "1234",
"name": "baz",
"link": "https://social.com/baz",
},
```
内部通信使用 [远端过程调用](https://github.com/donnemartin/system-design-primer#remote-procedure-call-rpc)。
## 第 4 步:扩展设计
> 在给定约束条件下,定义和确认瓶颈。
![Imgur](http://i.imgur.com/cdCv5g7.png)
**重要:别简化从最初设计到最终设计的过程!**
你将要做的是1) **基准/负载 测试** 2) 瓶颈 **概述** 3) 当评估可选和折中方案时定位瓶颈4) 重复。以 [在 AWS 上设计支持百万级到千万级用户的系统](../scaling_aws/README.md) 为参考迭代地扩展最初设计。
讨论最初设计可能遇到的瓶颈和处理方法十分重要。例如,什么问题可以通过添加多台 **Web 服务器** 作为 **负载均衡** 解决?**CDN****主从副本**?每个问题都有哪些替代和 **折中** 方案?
我们即将介绍一些组件来完成设计和解决扩展性问题。内部负载均衡不显示以减少混乱。
**避免重复讨论**,以下网址链接到 [系统设计主题](https://github.com/donnemartin/system-design-primer#index-of-system-design-topics) 相关的主流方案、折中方案和替代方案。
* [DNS](https://github.com/donnemartin/system-design-primer#domain-name-system)
* [负载均衡](https://github.com/donnemartin/system-design-primer#load-balancer)
* [横向扩展](https://github.com/donnemartin/system-design-primer#horizontal-scaling)
* [Web 服务器(反向代理)](https://github.com/donnemartin/system-design-primer#reverse-proxy-web-server)
* [API 服务器(应用层)](https://github.com/donnemartin/system-design-primer#application-layer)
* [缓存](https://github.com/donnemartin/system-design-primer#cache)
* [一致性模式](https://github.com/donnemartin/system-design-primer#consistency-patterns)
* [可用性模式](https://github.com/donnemartin/system-design-primer#availability-patterns)
解决 **平均** 每秒 400 次请求的限制(峰值),人员数据可以存在例如 Redis 或 Memcached 这样的 **内存** 中以减少响应次数和下游流量通信服务。这尤其在用户执行多次连续查询和查询哪些广泛连接的人时十分有用。从内存中读取 1MB 数据大约要 250 微秒,从 SSD 中读取同样大小的数据时间要长 4 倍,从硬盘要长 80 倍。<sup><a href=https://github.com/donnemartin/system-design-primer#latency-numbers-every-programmer-should-know>1</a></sup>
以下是进一步优化方案:
* 在 **内存** 中存储完整的或部分的BFS遍历加快后续查找
* 在 **NoSQL 数据库** 中批量离线计算并存储完整的或部分的BFS遍历加快后续查找
* 在同一台 **人员服务器** 上托管批处理同一批朋友查找减少机器跳转
* 通过地理位置 [拆分](https://github.com/donnemartin/system-design-primer#sharding) **人员服务器** 来进一步优化,因为朋友通常住得都比较近
* 同时进行两个 BFS 查找,一个从 source 开始,一个从 destination 开始,然后合并两个路径
* 从有庞大朋友圈的人开始找起,这样更有可能减小当前用户和搜索目标之间的 [离散度数](https://en.wikipedia.org/wiki/Six_degrees_of_separation)
* 在询问用户是否继续查询之前设置基于时间或跳跃数阈值,当在某些案例中搜索耗费时间过长时。
* 使用类似 [Neo4j](https://neo4j.com/) 的 **图数据库** 或图特定查询语法,例如 [GraphQL](http://graphql.org/)(如果没有禁止使用 **图数据库** 的限制的话)
## 额外的话题
> 根据问题的范围和剩余时间,还需要深入讨论其他问题。
### SQL 扩展模式
* [读取副本](https://github.com/donnemartin/system-design-primer#master-slave-replication)
* [集合](https://github.com/donnemartin/system-design-primer#federation)
* [分区](https://github.com/donnemartin/system-design-primer#sharding)
* [反规范化](https://github.com/donnemartin/system-design-primer#denormalization)
* [SQL 调优](https://github.com/donnemartin/system-design-primer#sql-tuning)
#### NoSQL
* [键值存储](https://github.com/donnemartin/system-design-primer#key-value-store)
* [文档存储](https://github.com/donnemartin/system-design-primer#document-store)
* [宽表存储](https://github.com/donnemartin/system-design-primer#wide-column-store)
* [图数据库](https://github.com/donnemartin/system-design-primer#graph-database)
* [SQL vs NoSQL](https://github.com/donnemartin/system-design-primer#sql-or-nosql)
### 缓存
* 缓存到哪里
* [客户端缓存](https://github.com/donnemartin/system-design-primer#client-caching)
* [CDN 缓存](https://github.com/donnemartin/system-design-primer#cdn-caching)
* [Web 服务缓存](https://github.com/donnemartin/system-design-primer#web-server-caching)
* [数据库缓存](https://github.com/donnemartin/system-design-primer#database-caching)
* [应用缓存](https://github.com/donnemartin/system-design-primer#application-caching)
* 缓存什么
* [数据库请求层缓存](https://github.com/donnemartin/system-design-primer#caching-at-the-database-query-level)
* [对象层缓存](https://github.com/donnemartin/system-design-primer#caching-at-the-object-level)
* 何时更新缓存
* [预留缓存](https://github.com/donnemartin/system-design-primer#cache-aside)
* [完全写入](https://github.com/donnemartin/system-design-primer#write-through)
* [延迟写 (写回)](https://github.com/donnemartin/system-design-primer#write-behind-write-back)
* [事先更新](https://github.com/donnemartin/system-design-primer#refresh-ahead)
### 异步性和微服务
* [消息队列](https://github.com/donnemartin/system-design-primer#message-queues)
* [任务队列](https://github.com/donnemartin/system-design-primer#task-queues)
* [回退压力](https://github.com/donnemartin/system-design-primer#back-pressure)
* [微服务](https://github.com/donnemartin/system-design-primer#microservices)
### 沟通
* 关于折中方案的讨论:
* 客户端的外部通讯 - [遵循 REST 的 HTTP APIs](https://github.com/donnemartin/system-design-primer#representational-state-transfer-rest)
* 内部通讯 - [RPC](https://github.com/donnemartin/system-design-primer#remote-procedure-call-rpc)
* [服务探索](https://github.com/donnemartin/system-design-primer#service-discovery)
### 安全性
参考 [安全章节](https://github.com/donnemartin/system-design-primer#security)
### 延迟数字指标
查阅 [每个程序员必懂的延迟数字](https://github.com/donnemartin/system-design-primer#latency-numbers-every-programmer-should-know)
### 正在进行
* 继续基准测试并监控你的系统以解决出现的瓶颈问题
* 扩展是一个迭代的过程

View File

@ -0,0 +1,331 @@
# 设计推特时间轴与搜索功能
**注意:这个文档中的链接会直接指向[系统设计主题索引](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#系统设计主题的索引)中的有关部分,以避免重复的内容。你可以参考链接的相关内容,来了解其总的要点、方案的权衡取舍以及可选的替代方案。**
**设计 Facebook 的 feed** 与**设计 Facebook 搜索**与此为同一类型问题。
## 第一步:简述用例与约束条件
> 搜集需求与问题的范围。
> 提出问题来明确用例与约束条件。
> 讨论假设。
我们将在没有面试官明确说明问题的情况下,自己定义一些用例以及限制条件。
### 用例
#### 我们将把问题限定在仅处理以下用例的范围中
* **用户**发布了一篇推特
* **服务**将推特推送给关注者,给他们发送消息通知与邮件
* **用户**浏览用户时间轴(用户最近的活动)
* **用户**浏览主页时间轴(用户关注的人最近的活动)
* **用户**搜索关键词
* **服务**需要有高可用性
#### 不在用例范围内的有
* **服务**向 Firehose 与其它流数据接口推送推特
* **服务**根据用户的”是否可见“选项排除推特
* 隐藏未关注者的 @回复
* 关心”隐藏转发“设置
* 数据分析
### 限制条件与假设
#### 提出假设
普遍情况
* 网络流量不是均匀分布的
* 发布推特的速度需要足够快速
* 除非有上百万的关注者,否则将推特推送给粉丝的速度要足够快
* 1 亿个活跃用户
* 每天新发布 5 亿条推特,每月新发布 150 亿条推特
* 平均每条推特需要推送给 5 个人
* 每天需要进行 50 亿次推送
* 每月需要进行 1500 亿次推送
* 每月需要处理 2500 亿次读取请求
* 每月需要处理 100 亿次搜索
时间轴功能
* 浏览时间轴需要足够快
* 推特的读取负载要大于写入负载
* 需要为推特的快速读取进行优化
* 存入推特是高写入负载功能
搜索功能
* 搜索速度需要足够快
* 搜索是高负载读取功能
#### 计算用量
**如果你需要进行粗略的用量计算,请向你的面试官说明。**
* 每条推特的大小:
* `tweet_id` - 8 字节
* `user_id` - 32 字节
* `text` - 140 字节
* `media` - 平均 10 KB
* 总计: 大约 10 KB
* 每月产生新推特的内容为 150 TB
* 每条推特 10 KB * 每天 5 亿条推特 * 每月 30 天
* 3 年产生新推特的内容为 5.4 PB
* 每秒需要处理 10 万次读取请求
* 每个月需要处理 2500 亿次请求 * (每秒 400 次请求 / 每月 10 亿次请求)
* 每秒发布 6000 条推特
* 每月发布 150 亿条推特 * (每秒 400 次请求 / 每月 10 次请求)
* 每秒推送 6 万条推特
* 每月推送 1500 亿条推特 * (每秒 400 次请求 / 每月 10 亿次请求)
* 每秒 4000 次搜索请求
便利换算指南:
* 每个月有 250 万秒
* 每秒一个请求 = 每个月 250 万次请求
* 每秒 40 个请求 = 每个月 1 亿次请求
* 每秒 400 个请求 = 每个月 10 亿次请求
## 第二步:概要设计
> 列出所有重要组件以规划概要设计。
![Imgur](http://i.imgur.com/48tEA2j.png)
## 第三步:设计核心组件
> 深入每个核心组件的细节。
### 用例:用户发表了一篇推特
我们可以将用户自己发表的推特存储在[关系数据库](https://github.com/donnemartin/system-design-primer#relational-database-management-system-rdbms)中。我们也可以讨论一下[究竟是用 SQL 还是用 NoSQL](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#sql-还是-nosql)。
构建用户主页时间轴(查看关注用户的活动)以及推送推特是件麻烦事。将特推传播给所有关注者(每秒约递送 6 万条推特)这一操作有可能会使传统的[关系数据库](https://github.com/donnemartin/system-design-primer#relational-database-management-system-rdbms)超负载。因此,我们可以使用 **NoSQL 数据库**或**内存数据库**之类的更快的数据存储方式。从内存读取 1 MB 连续数据大约要花 250 微秒,而从 SSD 读取同样大小的数据要花费 4 倍的时间,从机械硬盘读取需要花费 80 倍以上的时间。<sup><a href=https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#每个程序员都应该知道的延迟数>1</a></sup>
我们可以将照片、视频之类的媒体存储于**对象存储**中。
* **客户端**向应用[反向代理](https://github.com/donnemartin/system-design-primer#reverse-proxy-web-server)的**Web 服务器**发送一条推特
* **Web 服务器**将请求转发给**写 API**服务器
* **写 API**服务器将推特使用 **SQL 数据库**存储于用户时间轴中
* **写 API**调用**消息输出服务**,进行以下操作:
* 查询**用户 图 服务**找到存储于**内存缓存**中的此用户的粉丝
* 将推特存储于**内存缓存**中的**此用户的粉丝的主页时间轴**中
* O(n) 复杂度操作: 1000 名粉丝 = 1000 次查找与插入
* 将特推存储在**搜索索引服务**中,以加快搜索
* 将媒体存储于**对象存储**中
* 使用**通知服务**向粉丝发送推送:
* 使用**队列**异步推送通知
**向你的面试官告知你准备写多少代码**。
如果我们用 Redis 作为**内存缓存**,那可以用 Redis 原生的 list 作为其数据结构。结构如下:
```
tweet n+2 tweet n+1 tweet n
| 8 bytes 8 bytes 1 byte | 8 bytes 8 bytes 1 byte | 8 bytes 8 bytes 1 byte |
| tweet_id user_id meta | tweet_id user_id meta | tweet_id user_id meta |
```
新发布的推特将被存储在对应用户(关注且活跃的用户)的主页时间轴的**内存缓存**中。
我们可以调用一个公共的 [REST API](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#表述性状态转移rest)
```
$ curl -X POST --data '{ "user_id": "123", "auth_token": "ABC123", \
"status": "hello world!", "media_ids": "ABC987" }' \
https://twitter.com/api/v1/tweet
```
返回:
```
{
"created_at": "Wed Sep 05 00:37:15 +0000 2012",
"status": "hello world!",
"tweet_id": "987",
"user_id": "123",
...
}
```
而对于服务器内部的通信,我们可以使用 [RPC](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#远程过程调用协议rpc)。
### 用例:用户浏览主页时间轴
* **客户端**向 **Web 服务器**发起一次读取主页时间轴的请求
* **Web 服务器**将请求转发给**读取 API**服务器
* **读取 API**服务器调用**时间轴服务**进行以下操作:
* 从**内存缓存**读取时间轴数据,其中包括推特 id 与用户 id - O(1)
* 通过 [multiget](http://redis.io/commands/mget) 向**推特信息服务**进行查询,以获取相关 id 推特的额外信息 - O(n)
* 通过 muiltiget 向**用户信息服务**进行查询,以获取相关 id 用户的额外信息 - O(n)
REST API
```
$ curl https://twitter.com/api/v1/home_timeline?user_id=123
```
返回:
```
{
"user_id": "456",
"tweet_id": "123",
"status": "foo"
},
{
"user_id": "789",
"tweet_id": "456",
"status": "bar"
},
{
"user_id": "789",
"tweet_id": "579",
"status": "baz"
},
```
### 用例:用户浏览用户时间轴
* **客户端**向**Web 服务器**发起获得用户时间线的请求
* **Web 服务器**将请求转发给**读取 API**服务器
* **读取 API**从 **SQL 数据库**中取出用户的时间轴
REST API 与前面的主页时间轴类似,区别只在于取出的推特是由用户自己发送而不是关注人发送。
### 用例:用户搜索关键词
* **客户端**将搜索请求发给**Web 服务器**
* **Web 服务器**将请求转发给**搜索 API**服务器
* **搜索 API**调用**搜索服务**进行以下操作:
* 对输入进行转换与分词,弄明白需要搜索什么东西
* 移除标点等额外内容
* 将文本打散为词组
* 修正拼写错误
* 规范字母大小写
* 将查询转换为布尔操作
* 查询**搜索集群**(例如[Lucene](https://lucene.apache.org/))检索结果:
* 对集群内的所有服务器进行查询,将有结果的查询进行[发散聚合Scatter gathers](https://github.com/donnemartin/system-design-primer#under-development)
* 合并取到的条目,进行评分与排序,最终返回结果
REST API
```
$ curl https://twitter.com/api/v1/search?query=hello+world
```
返回结果与前面的主页时间轴类似,只不过返回的是符合查询条件的推特。
## 第四步:架构扩展
> 根据限制条件,找到并解决瓶颈。
![Imgur](http://i.imgur.com/MzExP06.png)
**重要提示:不要从最初设计直接跳到最终设计中!**
现在你要 1) **基准测试、负载测试**。2) **分析、描述**性能瓶颈。3) 在解决瓶颈问题的同时评估替代方案、权衡利弊。4) 重复以上步骤。请阅读[「设计一个系统,并将其扩大到为数以百万计的 AWS 用户服务」](../scaling_aws/README.md) 来了解如何逐步扩大初始设计。
讨论初始设计可能遇到的瓶颈及相关解决方案是很重要的。例如加上一个配置多台 **Web 服务器**的**负载均衡器**是否能够解决问题?**CDN**呢?**主从复制**呢?它们各自的替代方案和需要**权衡**的利弊又有什么呢?
我们将会介绍一些组件来完成设计,并解决架构扩张问题。内置的负载均衡器将不做讨论以节省篇幅。
**为了避免重复讨论**,请参考[系统设计主题索引](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#系统设计主题的索引)相关部分来了解其要点、方案的权衡取舍以及可选的替代方案。
* [DNS](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#域名系统)
* [负载均衡器](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#负载均衡器)
* [水平拓展](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#水平扩展)
* [反向代理web 服务器)](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#反向代理web-服务器)
* [API 服务(应用层)](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#应用层)
* [缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#缓存)
* [关系型数据库管理系统 (RDBMS)](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#关系型数据库管理系统rdbms)
* [SQL 故障主从切换](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#故障切换)
* [主从复制](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#主从复制)
* [一致性模式](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#一致性模式)
* [可用性模式](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#可用性模式)
**消息输出服务**有可能成为性能瓶颈。那些有着百万数量关注着的用户可能发一条推特就需要好几分钟才能完成消息输出进程。这有可能使 @回复 这种推特时出现竞争条件,因此需要根据服务时间对此推特进行重排序来降低影响。
我们还可以避免从高关注量的用户输出推特。相反,我们可以通过搜索来找到高关注量用户的推特,并将搜索结果与用户的主页时间轴合并,再根据时间对其进行排序。
此外,还可以通过以下内容进行优化:
* 仅为每个主页时间轴在**内存缓存**中存储数百条推特
* 仅在**内存缓存**中存储活动用户的主页时间轴
* 如果某个用户在过去 30 天都没有产生活动,那我们可以使用 **SQL 数据库**重新构建他的时间轴
* 使用**用户 图 服务**来查询并确定用户关注的人
* 从 **SQL 数据库**中取出推特,并将它们存入**内存缓存**
* 仅在**推特信息服务**中存储一个月的推特
* 仅在**用户信息服务**中存储活动用户的信息
* **搜索集群**需要将推特保留在内存中,以降低延迟
我们还可以考虑优化 **SQL 数据库** 来解决一些瓶颈问题。
**内存缓存**能减小一些数据库的负载,靠 **SQL Read 副本**已经足够处理缓存未命中情况。我们还可以考虑使用一些额外的 SQL 性能拓展技术。
高容量的写入将淹没单个的 **SQL 写主从**模式,因此需要更多的拓展技术。
* [联合](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#联合)
* [分片](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#分片)
* [非规范化](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#非规范化)
* [SQL 调优](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#sql-调优)
我们也可以考虑将一些数据移至 **NoSQL 数据库**
## 其它要点
> 是否深入这些额外的主题,取决于你的问题范围和剩下的时间。
#### NoSQL
* [键-值存储](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#键-值存储)
* [文档类型存储](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#文档类型存储)
* [列型存储](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#列型存储)
* [图数据库](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#图数据库)
* [SQL vs NoSQL](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#sql-还是-nosql)
### 缓存
* 在哪缓存
* [客户端缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#客户端缓存)
* [CDN 缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#cdn-缓存)
* [Web 服务器缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#web-服务器缓存)
* [数据库缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#数据库缓存)
* [应用缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#应用缓存)
* 什么需要缓存
* [数据库查询级别的缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#数据库查询级别的缓存)
* [对象级别的缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#对象级别的缓存)
* 何时更新缓存
* [缓存模式](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#缓存模式)
* [直写模式](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#直写模式)
* [回写模式](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#回写模式)
* [刷新](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#刷新)
### 异步与微服务
* [消息队列](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#消息队列)
* [任务队列](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#任务队列)
* [背压](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#背压)
* [微服务](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#微服务)
### 通信
* 可权衡选择的方案:
* 与客户端的外部通信 - [使用 REST 作为 HTTP API](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#表述性状态转移rest)
* 服务器内部通信 - [RPC](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#远程过程调用协议rpc)
* [服务发现](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#服务发现)
### 安全性
请参阅[「安全」](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#安全)一章。
### 延迟数值
请参阅[「每个程序员都应该知道的延迟数」](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#每个程序员都应该知道的延迟数)。
### 持续探讨
* 持续进行基准测试并监控你的系统,以解决他们提出的瓶颈问题。
* 架构拓展是一个迭代的过程。

View File

@ -0,0 +1,356 @@
# 设计一个网页爬虫
**注意:这个文档中的链接会直接指向[系统设计主题索引](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#系统设计主题的索引)中的有关部分,以避免重复的内容。你可以参考链接的相关内容,来了解其总的要点、方案的权衡取舍以及可选的替代方案。**
## 第一步:简述用例与约束条件
> 把所有需要的东西聚集在一起,审视问题。不停的提问,以至于我们可以明确使用场景和约束。讨论假设。
我们将在没有面试官明确说明问题的情况下,自己定义一些用例以及限制条件。
### 用例
#### 我们把问题限定在仅处理以下用例的范围中
* **服务** 抓取一系列链接:
* 生成包含搜索词的网页倒排索引
* 生成页面的标题和摘要信息
* 页面标题和摘要都是静态的,它们不会根据搜索词改变
* **用户** 输入搜索词后,可以看到相关的搜索结果列表,列表每一项都包含由网页爬虫生成的页面标题及摘要
* 只给该用例绘制出概要组件和交互说明,无需讨论细节
* **服务** 具有高可用性
#### 无需考虑
* 搜索分析
* 个性化搜索结果
* 页面排名
### 限制条件与假设
#### 提出假设
* 搜索流量分布不均
* 有些搜索词非常热门,有些则非常冷门
* 只支持匿名用户
* 用户很快就能看到搜索结果
* 网页爬虫不应该陷入死循环
* 当爬虫路径包含环的时候,将会陷入死循环
* 抓取 10 亿个链接
* 要定期重新抓取页面以确保新鲜度
* 平均每周重新抓取一次,网站越热门,那么重新抓取的频率越高
* 每月抓取 40 亿个链接
* 每个页面的平均存储大小500 KB
* 简单起见,重新抓取的页面算作新页面
* 每月搜索量 1000 亿次
用更传统的系统来练习 —— 不要使用 [solr](http://lucene.apache.org/solr/) 、[nutch](http://nutch.apache.org/) 之类的现成系统。
#### 计算用量
**如果你需要进行粗略的用量计算,请向你的面试官说明。**
* 每月存储 2 PB 页面
* 每月抓取 40 亿个页面,每个页面 500 KB
* 三年存储 72 PB 页面
* 每秒 1600 次写请求
* 每秒 40000 次搜索请求
简便换算指南:
* 一个月有 250 万秒
* 每秒 1 个请求,即每月 250 万个请求
* 每秒 40 个请求,即每月 1 亿个请求
* 每秒 400 个请求,即每月 10 亿个请求
## 第二步: 概要设计
> 列出所有重要组件以规划概要设计。
![Imgur](http://i.imgur.com/xjdAAUv.png)
## 第三步:设计核心组件
> 对每一个核心组件进行详细深入的分析。
### 用例:爬虫服务抓取一系列网页
假设我们有一个初始列表 `links_to_crawl`(待抓取链接),它最初基于网站整体的知名度来排序。当然如果这个假设不合理,我们可以使用 [Yahoo](https://www.yahoo.com/)、[DMOZ](http://www.dmoz.org/) 等知名门户网站作为种子链接来进行扩散 。
我们将用表 `crawled_links` (已抓取链接 )来记录已经处理过的链接以及相应的页面签名。
我们可以将 `links_to_crawl``crawled_links` 记录在键-值型 **NoSQL 数据库**中。对于 `crawled_links` 中已排序的链接,我们可以使用 [Redis](https://redis.io/) 的有序集合来维护网页链接的排名。我们应当在 [选择 SQL 还是 NoSQL 的问题上,讨论有关使用场景以及利弊 ](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#sql-还是-nosql)。
* **爬虫服务**按照以下流程循环处理每一个页面链接:
* 选取排名最靠前的待抓取链接
* 在 **NoSQL 数据库**的 `crawled_links` 中,检查待抓取页面的签名是否与某个已抓取页面的签名相似
* 若存在,则降低该页面链接的优先级
* 这样做可以避免陷入死循环
* 继续(进入下一次循环)
* 若不存在,则抓取该链接
* 在**倒排索引服务**任务队列中,新增一个生成[倒排索引](https://en.wikipedia.org/wiki/Search_engine_indexing)任务。
* 在**文档服务**任务队列中,新增一个生成静态标题和摘要的任务。
* 生成页面签名
* 在 **NoSQL 数据库**的 `links_to_crawl` 中删除该链接
* 在 **NoSQL 数据库**的 `crawled_links` 中插入该链接以及页面签名
**向面试官了解你需要写多少代码**。
`PagesDataStore` 是**爬虫服务**中的一个抽象类,它使用 **NoSQL 数据库**进行存储。
```python
class PagesDataStore(object):
def __init__(self, db);
self.db = db
...
def add_link_to_crawl(self, url):
"""将指定链接加入 `links_to_crawl`。"""
...
def remove_link_to_crawl(self, url):
"""从 `links_to_crawl` 中删除指定链接。"""
...
def reduce_priority_link_to_crawl(self, url)
"""在 `links_to_crawl` 中降低一个链接的优先级以避免死循环。"""
...
def extract_max_priority_page(self):
"""返回 `links_to_crawl` 中优先级最高的链接。"""
...
def insert_crawled_link(self, url, signature):
"""将指定链接加入 `crawled_links`。"""
...
def crawled_similar(self, signature):
"""判断待抓取页面的签名是否与某个已抓取页面的签名相似。"""
...
```
`Page` 是**爬虫服务**的一个抽象类,它封装了网页对象,由页面链接、页面内容、子链接和页面签名构成。
```python
class Page(object):
def __init__(self, url, contents, child_urls, signature):
self.url = url
self.contents = contents
self.child_urls = child_urls
self.signature = signature
```
`Crawler` 是**爬虫服务**的主类,由`Page` 和 `PagesDataStore` 组成。
```python
class Crawler(object):
def __init__(self, data_store, reverse_index_queue, doc_index_queue):
self.data_store = data_store
self.reverse_index_queue = reverse_index_queue
self.doc_index_queue = doc_index_queue
def create_signature(self, page):
"""基于页面链接与内容生成签名。"""
...
def crawl_page(self, page):
for url in page.child_urls:
self.data_store.add_link_to_crawl(url)
page.signature = self.create_signature(page)
self.data_store.remove_link_to_crawl(page.url)
self.data_store.insert_crawled_link(page.url, page.signature)
def crawl(self):
while True:
page = self.data_store.extract_max_priority_page()
if page is None:
break
if self.data_store.crawled_similar(page.signature):
self.data_store.reduce_priority_link_to_crawl(page.url)
else:
self.crawl_page(page)
```
### 处理重复内容
我们要谨防网页爬虫陷入死循环,这通常会发生在爬虫路径中存在环的情况。
**向面试官了解你需要写多少代码**.
删除重复链接:
* 假设数据量较小,我们可以用类似于 `sort | unique` 的方法。(译注: 先排序,后去重)
* 假设有 10 亿条数据,我们应该使用 **MapReduce** 来输出只出现 1 次的记录。
```python
class RemoveDuplicateUrls(MRJob):
def mapper(self, _, line):
yield line, 1
def reducer(self, key, values):
total = sum(values)
if total == 1:
yield key, total
```
比起处理重复内容,检测重复内容更为复杂。我们可以基于网页内容生成签名,然后对比两者签名的相似度。可能会用到的算法有 [Jaccard index](https://en.wikipedia.org/wiki/Jaccard_index) 以及 [cosine similarity](https://en.wikipedia.org/wiki/Cosine_similarity)。
### 抓取结果更新策略
要定期重新抓取页面以确保新鲜度。抓取结果应该有个 `timestamp` 字段记录上一次页面抓取时间。每隔一段时间,比如说 1 周,所有页面都需要更新一次。对于热门网站或是内容频繁更新的网站,爬虫抓取间隔可以缩短。
尽管我们不会深入网页数据分析的细节,我们仍然要做一些数据挖掘工作来确定一个页面的平均更新时间,并且根据相关的统计数据来决定爬虫的重新抓取频率。
当然我们也应该根据站长提供的 `Robots.txt` 来控制爬虫的抓取频率。
### 用例:用户输入搜索词后,可以看到相关的搜索结果列表,列表每一项都包含由网页爬虫生成的页面标题及摘要
* **客户端**向运行[反向代理](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#反向代理web-服务器)的 **Web 服务器**发送一个请求
* **Web 服务器** 发送请求到 **Query API** 服务器
* **查询 API** 服务将会做这些事情:
* 解析查询参数
* 删除 HTML 标记
* 将文本分割成词组 (译注: 分词处理)
* 修正错别字
* 规范化大小写
* 将搜索词转换为布尔运算
* 使用**倒排索引服务**来查找匹配查询的文档
* **倒排索引服务**对匹配到的结果进行排名,然后返回最符合的结果
* 使用**文档服务**返回文章标题与摘要
我们使用 [**REST API**](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#表述性状态转移rest) 与客户端通信:
```
$ curl https://search.com/api/v1/search?query=hello+world
```
响应内容:
```
{
"title": "foo's title",
"snippet": "foo's snippet",
"link": "https://foo.com",
},
{
"title": "bar's title",
"snippet": "bar's snippet",
"link": "https://bar.com",
},
{
"title": "baz's title",
"snippet": "baz's snippet",
"link": "https://baz.com",
},
```
对于服务器内部通信,我们可以使用 [远程过程调用协议RPC](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#远程过程调用协议rpc)
## 第四步:架构扩展
> 根据限制条件,找到并解决瓶颈。
![Imgur](http://i.imgur.com/bWxPtQA.png)
**重要提示:不要直接从最初设计跳到最终设计!**
现在你要 1) **基准测试、负载测试**。2) **分析、描述**性能瓶颈。3) 在解决瓶颈问题的同时评估替代方案、权衡利弊。4) 重复以上步骤。请阅读[设计一个系统,并将其扩大到为数以百万计的 AWS 用户服务](../scaling_aws/README.md) 来了解如何逐步扩大初始设计。
讨论初始设计可能遇到的瓶颈及相关解决方案是很重要的。例如加上一套配备多台 **Web 服务器**的**负载均衡器**是否能够解决问题?**CDN**呢?**主从复制**呢?它们各自的替代方案和需要**权衡**的利弊又有哪些呢?
我们将会介绍一些组件来完成设计,并解决架构规模扩张问题。内置的负载均衡器将不做讨论以节省篇幅。
**为了避免重复讨论**,请参考[系统设计主题索引](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#系统设计主题的索引)相关部分来了解其要点、方案的权衡取舍以及替代方案。
* [DNS](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#域名系统)
* [负载均衡器](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#负载均衡器)
* [水平扩展](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#水平扩展)
* [Web 服务器(反向代理)](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#反向代理web-服务器)
* [API 服务器(应用层)](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#应用层)
* [缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#缓存)
* [NoSQL](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#nosql)
* [一致性模式](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#一致性模式)
* [可用性模式](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#可用性模式)
有些搜索词非常热门,有些则非常冷门。热门的搜索词可以通过诸如 Redis 或者 Memcached 之类的**内存缓存**来缩短响应时间,避免**倒排索引服务**以及**文档服务**过载。**内存缓存**同样适用于流量分布不均匀以及流量短时高峰问题。从内存中读取 1 MB 连续数据大约需要 250 微秒,而从 SSD 读取同样大小的数据要花费 4 倍的时间,从机械硬盘读取需要花费 80 倍以上的时间。<sup><a href="https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#每个程序员都应该知道的延迟数">1</a></sup>
以下是优化**爬虫服务**的其他建议:
* 为了处理数据大小问题以及网络请求负载,**倒排索引服务**和**文档服务**可能需要大量应用数据分片和数据复制。
* DNS 查询可能会成为瓶颈,**爬虫服务**最好专门维护一套定期更新的 DNS 查询服务。
* 借助于[连接池](https://en.wikipedia.org/wiki/Connection_pool),即同时维持多个开放网络连接,可以提升**爬虫服务**的性能并减少内存使用量。
* 改用 [UDP](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#用户数据报协议udp) 协议同样可以提升性能
* 网络爬虫受带宽影响较大,请确保带宽足够维持高吞吐量。
## 其它要点
> 是否深入这些额外的主题,取决于你的问题范围和剩下的时间。
### SQL 扩展模式
* [读取复制](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#主从复制)
* [联合](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#联合)
* [分片](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#分片)
* [非规范化](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#非规范化)
* [SQL 调优](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#sql-调优)
#### NoSQL
* [键-值存储](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#键-值存储)
* [文档类型存储](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#文档类型存储)
* [列型存储](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#列型存储)
* [图数据库](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#图数据库)
* [SQL vs NoSQL](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#sql-还是-nosql)
### 缓存
* 在哪缓存
* [客户端缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#客户端缓存)
* [CDN 缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#cdn-缓存)
* [Web 服务器缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#web-服务器缓存)
* [数据库缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#数据库缓存)
* [应用缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#应用缓存)
* 什么需要缓存
* [数据库查询级别的缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#数据库查询级别的缓存)
* [对象级别的缓存](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#对象级别的缓存)
* 何时更新缓存
* [缓存模式](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#缓存模式)
* [直写模式](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#直写模式)
* [回写模式](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#回写模式)
* [刷新](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#刷新)
### 异步与微服务
* [消息队列](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#消息队列)
* [任务队列](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#任务队列)
* [背压](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#背压)
* [微服务](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#微服务)
### 通信
* 可权衡选择的方案:
* 与客户端的外部通信 - [使用 REST 作为 HTTP API](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#表述性状态转移rest)
* 内部通信 - [RPC](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#远程过程调用协议rpc)
* [服务发现](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#服务发现)
### 安全性
请参阅[安全](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#安全)。
### 延迟数值
请参阅[每个程序员都应该知道的延迟数](https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md#每个程序员都应该知道的延迟数)。
### 持续探讨
* 持续进行基准测试并监控你的系统,以解决他们提出的瓶颈问题。
* 架构扩展是一个迭代的过程。