parent
a07e261677
commit
9f68b03441
90
README.md
|
@ -275,7 +275,7 @@ You might be asked to do some estimates by hand. Refer to the [Appendix](#appen
|
||||||
* [Powers of two table](#powers-of-two-table)
|
* [Powers of two table](#powers-of-two-table)
|
||||||
* [Latency numbers every programmer should know](#latency-numbers-every-programmer-should-know)
|
* [Latency numbers every programmer should know](#latency-numbers-every-programmer-should-know)
|
||||||
|
|
||||||
### Source(s) and further reading
|
### Source(s) for further reading
|
||||||
|
|
||||||
Check out the following links to get a better idea of what to expect:
|
Check out the following links to get a better idea of what to expect:
|
||||||
|
|
||||||
|
@ -415,10 +415,10 @@ A service is **scalable** if it results in increased **performance** in a manner
|
||||||
|
|
||||||
Another way to look at performance vs scalability:
|
Another way to look at performance vs scalability:
|
||||||
|
|
||||||
* If you have a **performance** problem, your system is slow for a single user.
|
* If you have a **performance** problem, your system will be slow for a single user.
|
||||||
* If you have a **scalability** problem, your system is fast for a single user but slow under heavy load.
|
* If you have a **scalability** problem, your system will be fast for a single user but will be slow under heavy load.
|
||||||
|
|
||||||
### Source(s) and further reading
|
### Source(s) for further reading
|
||||||
|
|
||||||
* [A word on scalability](http://www.allthingsdistributed.com/2006/03/a_word_on_scalability.html)
|
* [A word on scalability](http://www.allthingsdistributed.com/2006/03/a_word_on_scalability.html)
|
||||||
* [Scalability, availability, stability, patterns](http://www.slideshare.net/jboner/scalability-availability-stability-patterns/)
|
* [Scalability, availability, stability, patterns](http://www.slideshare.net/jboner/scalability-availability-stability-patterns/)
|
||||||
|
@ -431,7 +431,7 @@ Another way to look at performance vs scalability:
|
||||||
|
|
||||||
Generally, you should aim for **maximal throughput** with **acceptable latency**.
|
Generally, you should aim for **maximal throughput** with **acceptable latency**.
|
||||||
|
|
||||||
### Source(s) and further reading
|
### Source(s) for further reading
|
||||||
|
|
||||||
* [Understanding latency vs throughput](https://community.cadence.com/cadence_blogs_8/b/fv/posts/understanding-latency-vs-throughput)
|
* [Understanding latency vs throughput](https://community.cadence.com/cadence_blogs_8/b/fv/posts/understanding-latency-vs-throughput)
|
||||||
|
|
||||||
|
@ -463,7 +463,7 @@ Responses return the most readily available version of the data available on any
|
||||||
|
|
||||||
AP is a good choice if the business needs to allow for [eventual consistency](#eventual-consistency) or when the system needs to continue working despite external errors.
|
AP is a good choice if the business needs to allow for [eventual consistency](#eventual-consistency) or when the system needs to continue working despite external errors.
|
||||||
|
|
||||||
### Source(s) and further reading
|
### Source(s) for further reading
|
||||||
|
|
||||||
* [CAP theorem revisited](http://robertgreiner.com/2014/08/cap-theorem-revisited/)
|
* [CAP theorem revisited](http://robertgreiner.com/2014/08/cap-theorem-revisited/)
|
||||||
* [A plain english introduction to CAP theorem](http://ksat.me/a-plain-english-introduction-to-cap-theorem)
|
* [A plain english introduction to CAP theorem](http://ksat.me/a-plain-english-introduction-to-cap-theorem)
|
||||||
|
@ -492,7 +492,7 @@ After a write, reads will see it. Data is replicated synchronously.
|
||||||
|
|
||||||
This approach is seen in file systems and RDBMSes. Strong consistency works well in systems that need transactions.
|
This approach is seen in file systems and RDBMSes. Strong consistency works well in systems that need transactions.
|
||||||
|
|
||||||
### Source(s) and further reading
|
### Source(s) for further reading
|
||||||
|
|
||||||
* [Transactions across data centers](http://snarfed.org/transactions_across_datacenters_io.html)
|
* [Transactions across data centers](http://snarfed.org/transactions_across_datacenters_io.html)
|
||||||
|
|
||||||
|
@ -512,7 +512,7 @@ Active-passive failover can also be referred to as master-slave failover.
|
||||||
|
|
||||||
#### Active-active
|
#### Active-active
|
||||||
|
|
||||||
In active-active, both servers are managing traffic, spreading the load between them.
|
In active-active, both servers are managing traffic, dividing the load between them.
|
||||||
|
|
||||||
If the servers are public-facing, the DNS would need to know about the public IPs of both servers. If the servers are internal-facing, application logic would need to know about both servers.
|
If the servers are public-facing, the DNS would need to know about the public IPs of both servers. If the servers are internal-facing, application logic would need to know about both servers.
|
||||||
|
|
||||||
|
@ -556,7 +556,7 @@ Availability is often quantified by uptime (or downtime) as a percentage of time
|
||||||
|
|
||||||
#### Availability in parallel vs in sequence
|
#### Availability in parallel vs in sequence
|
||||||
|
|
||||||
If a service consists of multiple components prone to failure, the service's overall availability depends on whether the components are in sequence or in parallel.
|
If a service consists of multiple components which are prone to failure, the service's overall availability depends on whether the components are in sequence or in parallel.
|
||||||
|
|
||||||
###### In sequence
|
###### In sequence
|
||||||
|
|
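The sequence-versus-parallel distinction described in this hunk can be sketched numerically. This is a minimal illustration, assuming component failures are independent; the function names are hypothetical, and `Foo` and `Bar` are the two 99.9% components the surrounding text uses.

```python
def availability_in_sequence(*components: float) -> float:
    """Overall availability when every component must be up (product rule)."""
    total = 1.0
    for availability in components:
        total *= availability
    return total


def availability_in_parallel(*components: float) -> float:
    """Overall availability when one working component is enough."""
    all_down = 1.0
    for availability in components:
        all_down *= 1.0 - availability
    return 1.0 - all_down


foo = bar = 0.999  # 99.9% each, as in the Foo/Bar example
print(f"in sequence: {availability_in_sequence(foo, bar):.6f}")
print(f"in parallel: {availability_in_parallel(foo, bar):.6f}")
```

Chaining components lowers overall availability (roughly 99.8% here), while redundancy raises it (roughly 99.9999%).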
||||||
|
@ -588,7 +588,7 @@ If both `Foo` and `Bar` each had 99.9% availability, their total availability in
|
||||||
|
|
||||||
A Domain Name System (DNS) translates a domain name such as www.example.com to an IP address.
|
A Domain Name System (DNS) translates a domain name such as www.example.com to an IP address.
|
||||||
|
|
||||||
DNS is hierarchical, with a few authoritative servers at the top level. Your router or ISP provides information about which DNS server(s) to contact when doing a lookup. Lower level DNS servers cache mappings, which could become stale due to DNS propagation delays. DNS results can also be cached by your browser or OS for a certain period of time, determined by the [time to live (TTL)](https://en.wikipedia.org/wiki/Time_to_live).
|
DNS is hierarchical, with a few authoritative servers at the top level. Your router or ISP provides information about which DNS server(s) to contact when doing a lookup. Lower level DNS servers do cache mappings, which could become stale due to DNS propagation delays. DNS results can also be cached by your browser or OS for a certain period of time, determined by the [time to live (TTL)](https://en.wikipedia.org/wiki/Time_to_live).
|
||||||
|
|
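The TTL-based caching mentioned in the paragraph above can be sketched as a small in-memory cache. This is an illustration only, not how any particular resolver, browser, or OS implements it; the `TtlCache` class and its injectable `clock` parameter are hypothetical names introduced here.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TtlCache:
    """Caches name -> address mappings until their TTL elapses."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries: Dict[str, Tuple[str, float]] = {}

    def put(self, name: str, address: str, ttl_seconds: float) -> None:
        # Store the address together with the moment it stops being valid.
        self._entries[name] = (address, self._clock() + ttl_seconds)

    def get(self, name: str) -> Optional[str]:
        entry = self._entries.get(name)
        if entry is None:
            return None
        address, expires_at = entry
        if self._clock() >= expires_at:
            # Expired: drop it so the caller performs a fresh lookup.
            del self._entries[name]
            return None
        return address
```

Until the TTL elapses, `get` keeps returning the cached address even if the authoritative record has changed, which is the source of the staleness the text describes.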
||||||
* **NS record (name server)** - Specifies the DNS servers for your domain/subdomain.
|
* **NS record (name server)** - Specifies the DNS servers for your domain/subdomain.
|
||||||
* **MX record (mail exchange)** - Specifies the mail servers for accepting messages.
|
* **MX record (mail exchange)** - Specifies the mail servers for accepting messages.
|
||||||
|
@ -606,11 +606,11 @@ Services such as [CloudFlare](https://www.cloudflare.com/dns/) and [Route 53](ht
|
||||||
|
|
||||||
### Disadvantage(s): DNS
|
### Disadvantage(s): DNS
|
||||||
|
|
||||||
* Accessing a DNS server introduces a slight delay, although mitigated by caching described above.
|
* Accessing a DNS server introduces a slight delay, although mitigated by caching as described above.
|
||||||
* DNS server management could be complex and is generally managed by [governments, ISPs, and large companies](http://superuser.com/questions/472695/who-controls-the-dns-servers/472729).
|
* DNS server management could be complex and is generally managed by [governments, ISPs, and large companies](http://superuser.com/questions/472695/who-controls-the-dns-servers/472729).
|
||||||
* DNS services have recently come under [DDoS attack](http://dyn.com/blog/dyn-analysis-summary-of-friday-october-21-attack/), preventing users from accessing websites such as Twitter without knowing Twitter's IP address(es).
|
* DNS services have recently come under [DDoS attack](http://dyn.com/blog/dyn-analysis-summary-of-friday-october-21-attack/), preventing users from accessing websites such as Twitter without knowing Twitter's IP address(es).
|
||||||
|
|
||||||
### Source(s) and further reading
|
### Source(s) for further reading
|
||||||
|
|
||||||
* [DNS architecture](https://technet.microsoft.com/en-us/library/dd197427(v=ws.10).aspx)
|
* [DNS architecture](https://technet.microsoft.com/en-us/library/dd197427(v=ws.10).aspx)
|
||||||
* [Wikipedia](https://en.wikipedia.org/wiki/Domain_Name_System)
|
* [Wikipedia](https://en.wikipedia.org/wiki/Domain_Name_System)
|
||||||
|
@ -651,7 +651,7 @@ Sites with heavy traffic work well with pull CDNs, as traffic is spread out more
|
||||||
* Content might be stale if it is updated before the TTL expires.
|
* Content might be stale if it is updated before the TTL expires.
|
||||||
* CDNs require changing URLs for static content to point to the CDN.
|
* CDNs require changing URLs for static content to point to the CDN.
|
||||||
|
|
||||||
### Source(s) and further reading
|
### Source(s) for further reading
|
||||||
|
|
||||||
* [Globally distributed content delivery](https://figshare.com/articles/Globally_distributed_content_delivery/6605972)
|
* [Globally distributed content delivery](https://figshare.com/articles/Globally_distributed_content_delivery/6605972)
|
||||||
* [The differences between push and pull CDNs](http://www.travelblogadvice.com/technical/the-differences-between-push-and-pull-cdns/)
|
* [The differences between push and pull CDNs](http://www.travelblogadvice.com/technical/the-differences-between-push-and-pull-cdns/)
|
||||||
|
@ -696,13 +696,13 @@ Layer 4 load balancers look at info at the [transport layer](#communication) to
|
||||||
|
|
||||||
### Layer 7 load balancing
|
### Layer 7 load balancing
|
||||||
|
|
||||||
Layer 7 load balancers look at the [application layer](#communication) to decide how to distribute requests. This can involve contents of the header, message, and cookies. Layer 7 load balancers terminate network traffic, reads the message, makes a load-balancing decision, then opens a connection to the selected server. For example, a layer 7 load balancer can direct video traffic to servers that host videos while directing more sensitive user billing traffic to security-hardened servers.
|
Layer 7 load balancers look at the [application layer](#communication) to decide how to distribute requests. This can involve contents of the header, message, and cookies. Layer 7 load balancers terminate network traffic, read the message, make a load-balancing decision, then open a connection to the selected server. For example, a layer 7 load balancer can direct video traffic to servers that host videos while directing more sensitive user billing traffic to security-hardened servers.
|
||||||
|
|
||||||
At the cost of flexibility, layer 4 load balancing requires less time and fewer computing resources than layer 7, although the performance impact can be minimal on modern commodity hardware.
|
At the cost of flexibility, layer 4 load balancing requires less time and fewer computing resources than layer 7, although the performance impact can be minimal on modern commodity hardware.
|
||||||
|
|
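The layer 7 example above (video traffic to video servers, billing traffic to hardened servers) can be sketched as a routing decision on the request path. This is a simplified illustration, assuming path-prefix rules and round-robin selection within each pool; the pool names, server names, and URL prefixes are all hypothetical.

```python
from itertools import cycle
from typing import Dict, Iterator, List

# Hypothetical backend pools; a real deployment would discover these dynamically.
POOLS: Dict[str, List[str]] = {
    "video": ["video-1", "video-2"],
    "billing": ["billing-hardened-1"],
    "default": ["web-1", "web-2"],
}

# One round-robin iterator per pool.
_round_robin: Dict[str, Iterator[str]] = {
    name: cycle(servers) for name, servers in POOLS.items()
}


def choose_pool(path: str) -> str:
    """Pick a pool from application-layer data (here, only the URL path)."""
    if path.startswith("/videos/"):
        return "video"
    if path.startswith("/billing/"):
        return "billing"
    return "default"


def route(path: str) -> str:
    """Return the next server in the pool selected for this request."""
    return next(_round_robin[choose_pool(path)])
```

A layer 4 balancer could not make this decision, because the path is only visible after the request has been read at the application layer.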
||||||
### Horizontal scaling
|
### Horizontal scaling
|
||||||
|
|
||||||
Load balancers can also help with horizontal scaling, improving performance and availability. Scaling out using commodity machines is more cost efficient and results in higher availability than scaling up a single server on more expensive hardware, called **Vertical Scaling**. It is also easier to hire for talent working on commodity hardware than it is for specialized enterprise systems.
|
Load balancers can also help with horizontal scaling, improving performance and availability. Scaling out using commodity machines is more cost efficient and results in higher availability than scaling up a single server on more expensive hardware, called **Vertical Scaling**. It is also easier to hire people with the talent of working on commodity hardware than it is for specialized enterprise systems.
|
||||||
|
|
||||||
#### Disadvantage(s): horizontal scaling
|
#### Disadvantage(s): horizontal scaling
|
||||||
|
|
||||||
|
@ -715,7 +715,7 @@ Load balancers can also help with horizontal scaling, improving performance and
|
||||||
|
|
||||||
* The load balancer can become a performance bottleneck if it does not have enough resources or if it is not configured properly.
|
* The load balancer can become a performance bottleneck if it does not have enough resources or if it is not configured properly.
|
||||||
* Introducing a load balancer to help eliminate a single point of failure results in increased complexity.
|
* Introducing a load balancer to help eliminate a single point of failure results in increased complexity.
|
||||||
* A single load balancer is a single point of failure; configuring multiple load balancers further increases complexity.
|
* A single load balancer is a single point of failure; configuring multiple load balancers further increases the complexity.
|
||||||
|
|
||||||
### Source(s) and further reading
|
### Source(s) and further reading
|
||||||
|
|
||||||
|
@ -761,9 +761,9 @@ Additional benefits include:
|
||||||
### Disadvantage(s): reverse proxy
|
### Disadvantage(s): reverse proxy
|
||||||
|
|
||||||
* Introducing a reverse proxy results in increased complexity.
|
* Introducing a reverse proxy results in increased complexity.
|
||||||
* A single reverse proxy is a single point of failure; configuring multiple reverse proxies (i.e., a [failover](https://en.wikipedia.org/wiki/Failover)) further increases complexity.
|
* A single reverse proxy is a single point of failure; configuring multiple reverse proxies (i.e., a [failover](https://en.wikipedia.org/wiki/Failover)) further increases the complexity.
|
||||||
|
|
||||||
### Source(s) and further reading
|
### Source(s) for further reading
|
||||||
|
|
||||||
* [Reverse proxy vs load balancer](https://www.nginx.com/resources/glossary/reverse-proxy-vs-load-balancer/)
|
* [Reverse proxy vs load balancer](https://www.nginx.com/resources/glossary/reverse-proxy-vs-load-balancer/)
|
||||||
* [NGINX architecture](https://www.nginx.com/blog/inside-nginx-how-we-designed-for-performance-scale/)
|
* [NGINX architecture](https://www.nginx.com/blog/inside-nginx-how-we-designed-for-performance-scale/)
|
||||||
|
@ -786,7 +786,7 @@ Workers in the application layer also help enable [asynchronism](#asynchronism).
|
||||||
|
|
||||||
Related to this discussion are [microservices](https://en.wikipedia.org/wiki/Microservices), which can be described as a suite of independently deployable, small, modular services. Each service runs a unique process and communicates through a well-defined, lightweight mechanism to serve a business goal. <sup><a href=https://smartbear.com/learn/api-design/what-are-microservices>1</a></sup>
|
Related to this discussion are [microservices](https://en.wikipedia.org/wiki/Microservices), which can be described as a suite of independently deployable, small, modular services. Each service runs a unique process and communicates through a well-defined, lightweight mechanism to serve a business goal. <sup><a href=https://smartbear.com/learn/api-design/what-are-microservices>1</a></sup>
|
||||||
|
|
||||||
Pinterest, for example, could have the following microservices: user profile, follower, feed, search, photo upload, etc.
|
Pinterest, for example, can have the following microservices: user profile, follower, feed, search, photo upload, etc.
|
||||||
|
|
||||||
### Service Discovery
|
### Service Discovery
|
||||||
|
|
||||||
|
@ -797,7 +797,7 @@ Systems such as [Consul](https://www.consul.io/docs/index.html), [Etcd](https://
|
||||||
* Adding an application layer with loosely coupled services requires a different approach from an architectural, operations, and process viewpoint (vs a monolithic system).
|
* Adding an application layer with loosely coupled services requires a different approach from an architectural, operations, and process viewpoint (vs a monolithic system).
|
||||||
* Microservices can add complexity in terms of deployments and operations.
|
* Microservices can add complexity in terms of deployments and operations.
|
||||||
|
|
||||||
### Source(s) and further reading
|
### Source(s) for further reading
|
||||||
|
|
||||||
* [Intro to architecting systems for scale](http://lethain.com/introduction-to-architecting-systems-for-scale)
|
* [Intro to architecting systems for scale](http://lethain.com/introduction-to-architecting-systems-for-scale)
|
||||||
* [Crack the system design interview](http://www.puncsky.com/blog/2016-02-13-crack-the-system-design-interview)
|
* [Crack the system design interview](http://www.puncsky.com/blog/2016-02-13-crack-the-system-design-interview)
|
||||||
|
@ -866,7 +866,7 @@ Both masters serve reads and writes and coordinate with each other on writes. I
|
||||||
* On some systems, writing to the master can spawn multiple threads to write in parallel, whereas read replicas only support writing sequentially with a single thread.
|
* On some systems, writing to the master can spawn multiple threads to write in parallel, whereas read replicas only support writing sequentially with a single thread.
|
||||||
* Replication adds more hardware and additional complexity.
|
* Replication adds more hardware and additional complexity.
|
||||||
|
|
||||||
##### Source(s) and further reading: replication
|
##### Source(s) for further reading: replication
|
||||||
|
|
||||||
* [Scalability, availability, stability, patterns](http://www.slideshare.net/jboner/scalability-availability-stability-patterns/)
|
* [Scalability, availability, stability, patterns](http://www.slideshare.net/jboner/scalability-availability-stability-patterns/)
|
||||||
* [Multi-master replication](https://en.wikipedia.org/wiki/Multi-master_replication)
|
* [Multi-master replication](https://en.wikipedia.org/wiki/Multi-master_replication)
|
||||||
|
@ -879,7 +879,7 @@ Both masters serve reads and writes and coordinate with each other on writes. I
|
||||||
<i><a href=https://www.youtube.com/watch?v=kKjm4ehYiMs>Source: Scaling up to your first 10 million users</a></i>
|
<i><a href=https://www.youtube.com/watch?v=kKjm4ehYiMs>Source: Scaling up to your first 10 million users</a></i>
|
||||||
</p>
|
</p>
|
||||||
|
|
||||||
Federation (or functional partitioning) splits up databases by function. For example, instead of a single, monolithic database, you could have three databases: **forums**, **users**, and **products**, resulting in less read and write traffic to each database and therefore less replication lag. Smaller databases result in more data that can fit in memory, which in turn results in more cache hits due to improved cache locality. With no single central master serializing writes you can write in parallel, increasing throughput.
|
Federation (or functional partitioning) splits up databases by its functions. For example, instead of a single, monolithic database, you could have three databases: **forums**, **users**, and **products**, resulting in less read and write traffic to each database and therefore less replication lag. Smaller databases result in more data that can fit in memory, which in turn results in more cache hits due to improved cache locality. With no single central master serializing writes you can write in parallel, increasing throughput.
|
||||||
|
|
||||||
##### Disadvantage(s): federation
|
##### Disadvantage(s): federation
|
||||||
|
|
||||||
|
@ -888,7 +888,7 @@ Federation (or functional partitioning) splits up databases by function. For ex
|
||||||
* Joining data from two databases is more complex with a [server link](http://stackoverflow.com/questions/5145637/querying-data-by-joining-two-tables-in-two-database-on-different-servers).
|
* Joining data from two databases is more complex with a [server link](http://stackoverflow.com/questions/5145637/querying-data-by-joining-two-tables-in-two-database-on-different-servers).
|
||||||
* Federation adds more hardware and additional complexity.
|
* Federation adds more hardware and additional complexity.
|
||||||
|
|
||||||
##### Source(s) and further reading: federation
|
##### Source(s) for further reading: federation
|
||||||
|
|
||||||
* [Scaling up to your first 10 million users](https://www.youtube.com/watch?v=kKjm4ehYiMs)
|
* [Scaling up to your first 10 million users](https://www.youtube.com/watch?v=kKjm4ehYiMs)
|
||||||
|
|
||||||
|
@ -904,7 +904,7 @@ Sharding distributes data across different databases such that each database can
|
||||||
|
|
||||||
Similar to the advantages of [federation](#federation), sharding results in less read and write traffic, less replication, and more cache hits. Index size is also reduced, which generally improves performance with faster queries. If one shard goes down, the other shards are still operational, although you'll want to add some form of replication to avoid data loss. Like federation, there is no single central master serializing writes, allowing you to write in parallel with increased throughput.
|
Similar to the advantages of [federation](#federation), sharding results in less read and write traffic, less replication, and more cache hits. Index size is also reduced, which generally improves performance with faster queries. If one shard goes down, the other shards are still operational, although you'll want to add some form of replication to avoid data loss. Like federation, there is no single central master serializing writes, allowing you to write in parallel with increased throughput.
|
||||||
|
|
||||||
Common ways to shard a table of users is either through the user's last name initial or the user's geographic location.
|
Common ways to shard a table of users is either through the user's last name's initial or the user's geographic location.
|
||||||
|
|
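Sharding a users table by last-name initial, as mentioned above, can be sketched as a lookup from initial to shard. This is a minimal illustration; the four shard names and the letter ranges are hypothetical, and names that do not start with A-Z fall back to the last shard purely as an assumption of this sketch.

```python
from typing import List

# Hypothetical shards, each covering a contiguous range of initials.
SHARDS: List[str] = ["users_a_f", "users_g_m", "users_n_s", "users_t_z"]
UPPER_BOUNDS: List[str] = ["F", "M", "S", "Z"]


def shard_for_last_name(last_name: str) -> str:
    """Map a last name to the shard that owns its initial."""
    initial = last_name.strip()[:1].upper()
    if not ("A" <= initial <= "Z"):
        return SHARDS[-1]  # assumed fallback for empty or non-A-Z names
    for bound, shard in zip(UPPER_BOUNDS, SHARDS):
        if initial <= bound:
            return shard
    return SHARDS[-1]
```

Because initials are not evenly distributed across real names, a scheme like this can leave some shards much busier than others.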
||||||
##### Disadvantage(s): sharding
|
##### Disadvantage(s): sharding
|
||||||
|
|
||||||
|
@ -914,7 +914,7 @@ Common ways to shard a table of users is either through the user's last name ini
|
||||||
* Joining data from multiple shards is more complex.
|
* Joining data from multiple shards is more complex.
|
||||||
* Sharding adds more hardware and additional complexity.
|
* Sharding adds more hardware and additional complexity.
|
||||||
|
|
||||||
##### Source(s) and further reading: sharding
|
##### Source(s) for further reading: sharding
|
||||||
|
|
||||||
* [The coming of the shard](http://highscalability.com/blog/2009/8/6/an-unorthodox-approach-to-database-design-the-coming-of-the.html)
|
* [The coming of the shard](http://highscalability.com/blog/2009/8/6/an-unorthodox-approach-to-database-design-the-coming-of-the.html)
|
||||||
* [Shard database architecture](https://en.wikipedia.org/wiki/Shard_(database_architecture))
|
* [Shard database architecture](https://en.wikipedia.org/wiki/Shard_(database_architecture))
|
||||||
|
@ -922,7 +922,7 @@ Common ways to shard a table of users is either through the user's last name ini
|
||||||
|
|
||||||
#### Denormalization
|
#### Denormalization
|
||||||
|
|
||||||
Denormalization attempts to improve read performance at the expense of some write performance. Redundant copies of the data are written in multiple tables to avoid expensive joins. Some RDBMS such as [PostgreSQL](https://en.wikipedia.org/wiki/PostgreSQL) and Oracle support [materialized views](https://en.wikipedia.org/wiki/Materialized_view) which handle the work of storing redundant information and keeping redundant copies consistent.
|
Denormalization attempts to improve read performance at the expense of some write performance. Redundant copies of the data are written in multiple tables to avoid expensive joins. Some RDBMSes such as [PostgreSQL](https://en.wikipedia.org/wiki/PostgreSQL) and Oracle support [materialized views](https://en.wikipedia.org/wiki/Materialized_view) which handle the work of storing redundant information and keeping redundant copies consistent.
|
||||||
|
|
||||||
Once data becomes distributed with techniques such as [federation](#federation) and [sharding](#sharding), managing joins across data centers further increases complexity. Denormalization might circumvent the need for such complex joins.
|
Once data becomes distributed with techniques such as [federation](#federation) and [sharding](#sharding), managing joins across data centers further increases complexity. Denormalization might circumvent the need for such complex joins.
|
||||||
|
|
||||||
|
@ -934,7 +934,7 @@ In most systems, reads can heavily outnumber writes 100:1 or even 1000:1. A rea
|
||||||
* Constraints can help redundant copies of information stay in sync, which increases complexity of the database design.
|
* Constraints can help redundant copies of information stay in sync, which increases complexity of the database design.
|
||||||
* A denormalized database under heavy write load might perform worse than its normalized counterpart.
|
* A denormalized database under heavy write load might perform worse than its normalized counterpart.
|
||||||
|
|
||||||
###### Source(s) and further reading: denormalization
|
###### Source(s) for further reading: denormalization
|
||||||
|
|
||||||
* [Denormalization](https://en.wikipedia.org/wiki/Denormalization)
|
* [Denormalization](https://en.wikipedia.org/wiki/Denormalization)
|
||||||
|
|
||||||
|
@ -958,7 +958,7 @@ Benchmarking and profiling might point you to the following optimizations.
|
||||||
* Use `INT` for larger numbers up to 2^32, or roughly 4 billion.
|
* Use `INT` for larger numbers up to 2^32, or roughly 4 billion.
|
||||||
* Use `DECIMAL` for currency to avoid floating point representation errors.
|
* Use `DECIMAL` for currency to avoid floating point representation errors.
|
||||||
* Avoid storing large `BLOBS`; store the location of the object instead.
|
* Avoid storing large `BLOBS`; store the location of the object instead.
|
||||||
* `VARCHAR(255)` is the largest number of characters that can be counted in an 8 bit number, often maximizing the use of a byte in some RDBMS.
|
* `VARCHAR(255)` is the largest number of characters that can be counted in an 8 bit number, often maximizing the use of a byte in some RDBMSes.
|
||||||
* Set the `NOT NULL` constraint where applicable to [improve search performance](http://stackoverflow.com/questions/1017239/how-do-null-values-affect-performance-in-a-database-search).
|
* Set the `NOT NULL` constraint where applicable to [improve search performance](http://stackoverflow.com/questions/1017239/how-do-null-values-affect-performance-in-a-database-search).
|
||||||
|
|
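The `DECIMAL` recommendation in the list above guards against binary floating point representation errors. The same effect can be seen outside the database; this small sketch uses Python's standard `decimal` module as a stand-in for a `DECIMAL` column, and the amounts are made up.

```python
from decimal import Decimal

# Binary floats cannot represent 0.1 or 0.2 exactly, so the sum drifts.
float_total = 0.1 + 0.2

# Decimal values built from strings keep the exact base-10 amounts.
decimal_total = Decimal("0.10") + Decimal("0.20")

print(float_total == 0.3)                # False
print(decimal_total == Decimal("0.30"))  # True
```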
||||||
##### Use good indices
|
##### Use good indices
|
||||||
|
@ -981,7 +981,7 @@ Benchmarking and profiling might point you to the following optimizations.
|
||||||
|
|
||||||
* In some cases, the [query cache](https://dev.mysql.com/doc/refman/5.7/en/query-cache.html) could lead to [performance issues](https://www.percona.com/blog/2016/10/12/mysql-5-7-performance-tuning-immediately-after-installation/).
|
* In some cases, the [query cache](https://dev.mysql.com/doc/refman/5.7/en/query-cache.html) could lead to [performance issues](https://www.percona.com/blog/2016/10/12/mysql-5-7-performance-tuning-immediately-after-installation/).
|
||||||
|
|
||||||
##### Source(s) and further reading: SQL tuning
|
##### Source(s) for further reading: SQL tuning
|
||||||
|
|
||||||
* [Tips for optimizing MySQL queries](http://aiddroid.com/10-tips-optimizing-mysql-queries-dont-suck/)
|
* [Tips for optimizing MySQL queries](http://aiddroid.com/10-tips-optimizing-mysql-queries-dont-suck/)
|
||||||
* [Is there a good reason i see VARCHAR(255) used so often?](http://stackoverflow.com/questions/1217466/is-there-a-good-reason-i-see-varchar255-used-so-often-as-opposed-to-another-l)
|
* [Is there a good reason i see VARCHAR(255) used so often?](http://stackoverflow.com/questions/1217466/is-there-a-good-reason-i-see-varchar255-used-so-often-as-opposed-to-another-l)
|
||||||
|
@ -1004,13 +1004,13 @@ In addition to choosing between [SQL or NoSQL](#sql-or-nosql), it is helpful to
|
||||||
|
|
||||||
> Abstraction: hash table
|
> Abstraction: hash table
|
||||||
|
|
||||||
A key-value store generally allows for O(1) reads and writes and is often backed by memory or SSD. Data stores can maintain keys in [lexicographic order](https://en.wikipedia.org/wiki/Lexicographical_order), allowing efficient retrieval of key ranges. Key-value stores can allow for storing of metadata with a value.
|
A key-value store generally allows for O(1) reads and writes and is often backed by memory or SSD. Data stores can maintain keys in [lexicographic order](https://en.wikipedia.org/wiki/Lexicographical_order), allowing efficient retrieval of key ranges. Key-value stores can allow storage of metadata with a value.
|
||||||
|
|
||||||
Key-value stores provide high performance and are often used for simple data models or for rapidly-changing data, such as an in-memory cache layer. Since they offer only a limited set of operations, complexity is shifted to the application layer if additional operations are needed.
|
Key-value stores provide high performance and are often used for simple data models or for rapidly-changing data, such as an in-memory cache layer. Since they offer only a limited set of operations, complexity is shifted to the application layer if additional operations are needed.
|
||||||
|
|
||||||
A key-value store is the basis for more complex systems such as a document store, and in some cases, a graph database.
|
A key-value store is the basis for more complex systems such as a document store, and in some cases, a graph database.
|
||||||
|
|
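The point above about keeping keys in lexicographic order to allow efficient key-range retrieval can be sketched with a sorted key list and binary search. This is an in-memory illustration only; the `SortedKeyValueStore` class is hypothetical, and its list-based insert is O(n), whereas real stores typically use trees or log-structured files.

```python
import bisect
from typing import Dict, List, Optional, Tuple


class SortedKeyValueStore:
    """Key-value store that also keeps keys sorted for range scans."""

    def __init__(self) -> None:
        self._keys: List[str] = []         # kept in lexicographic order
        self._values: Dict[str, str] = {}  # hash table for point lookups

    def put(self, key: str, value: str) -> None:
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def get(self, key: str) -> Optional[str]:
        return self._values.get(key)

    def range(self, start: str, end: str) -> List[Tuple[str, str]]:
        """Return (key, value) pairs with start <= key < end, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, end)
        return [(k, self._values[k]) for k in self._keys[lo:hi]]
```

Point reads go through the hash table, while `range` locates both ends of the interval by binary search and then reads a contiguous slice.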
||||||
##### Source(s) and further reading: key-value store
|
##### Source(s) for further reading: key-value store
|
||||||
|
|
||||||
* [Key-value database](https://en.wikipedia.org/wiki/Key-value_database)
|
* [Key-value database](https://en.wikipedia.org/wiki/Key-value_database)
|
||||||
* [Disadvantages of key-value stores](http://stackoverflow.com/questions/4056093/what-are-the-disadvantages-of-using-a-key-value-table-over-nullable-columns-or)
|
* [Disadvantages of key-value stores](http://stackoverflow.com/questions/4056093/what-are-the-disadvantages-of-using-a-key-value-table-over-nullable-columns-or)
|
||||||
|
@ -1029,7 +1029,7 @@ Some document stores like [MongoDB](https://www.mongodb.com/mongodb-architecture
|
||||||
|
|
||||||
Document stores provide high flexibility and are often used for working with occasionally changing data.
|
Document stores provide high flexibility and are often used for working with occasionally changing data.
|
||||||
|
|
||||||
##### Source(s) and further reading: document store
|
##### Source(s) for further reading: document store
|
||||||
|
|
||||||
* [Document-oriented database](https://en.wikipedia.org/wiki/Document-oriented_database)
|
* [Document-oriented database](https://en.wikipedia.org/wiki/Document-oriented_database)
|
||||||
* [MongoDB architecture](https://www.mongodb.com/mongodb-architecture)
|
* [MongoDB architecture](https://www.mongodb.com/mongodb-architecture)
|
||||||
|
@ -1048,11 +1048,11 @@ Document stores provide high flexibility and are often used for working with occ
|
||||||
|
|
||||||
A wide column store's basic unit of data is a column (name/value pair). A column can be grouped in column families (analogous to a SQL table). Super column families further group column families. You can access each column independently with a row key, and columns with the same row key form a row. Each value contains a timestamp for versioning and for conflict resolution.
|
A wide column store's basic unit of data is a column (name/value pair). A column can be grouped in column families (analogous to a SQL table). Super column families further group column families. You can access each column independently with a row key, and columns with the same row key form a row. Each value contains a timestamp for versioning and for conflict resolution.
|
||||||
|
|
||||||
Google introduced [Bigtable](http://www.read.seas.harvard.edu/~kohler/class/cs239-w08/chang06bigtable.pdf) as the first wide column store, which influenced the open-source [HBase](https://www.edureka.co/blog/hbase-architecture/) often-used in the Hadoop ecosystem, and [Cassandra](http://docs.datastax.com/en/cassandra/3.0/cassandra/architecture/archIntro.html) from Facebook. Stores such as BigTable, HBase, and Cassandra maintain keys in lexicographic order, allowing efficient retrieval of selective key ranges.
|
Google introduced [Bigtable](http://www.read.seas.harvard.edu/~kohler/class/cs239-w08/chang06bigtable.pdf) as the first wide column store, which influenced the open-source [HBase](https://www.edureka.co/blog/hbase-architecture/) often-used in the Hadoop ecosystem, and [Cassandra](http://docs.datastax.com/en/cassandra/3.0/cassandra/architecture/archIntro.html) from Facebook. Stores such as BigTable, HBase and Cassandra maintain keys in lexicographic order, allowing efficient retrieval of selective key ranges.
|
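As a rough illustration of why lexicographically ordered keys allow efficient range retrieval, the sketch below keeps row keys in a sorted list and uses binary search to locate a key range. `SortedRows`, `put`, and `scan` are hypothetical names for illustration, not the API of Bigtable, HBase, or Cassandra, and the last-write-wins rule is an assumed way of using the per-value timestamp for conflict resolution.

```python
import bisect


class SortedRows:
    """Toy in-memory table that keeps row keys in lexicographic order."""

    def __init__(self):
        self._keys = []  # row keys, always kept sorted
        self._rows = {}  # row key -> {column name: (timestamp, value)}

    def put(self, row_key, column, value, timestamp):
        """Write one column of one row, tagged with a timestamp."""
        if row_key not in self._rows:
            bisect.insort(self._keys, row_key)
            self._rows[row_key] = {}
        current = self._rows[row_key].get(column)
        # Assumption: the newest timestamp wins when writes conflict.
        if current is None or timestamp >= current[0]:
            self._rows[row_key][column] = (timestamp, value)

    def scan(self, start_key, end_key):
        """Return (row_key, columns) for start_key <= row_key < end_key."""
        lo = bisect.bisect_left(self._keys, start_key)
        hi = bisect.bisect_left(self._keys, end_key)
        return [(key, self._rows[key]) for key in self._keys[lo:hi]]


table = SortedRows()
table.put("user#2", "name", "Grace", timestamp=1)
table.put("user#1", "name", "Ada", timestamp=1)
table.put("video#1", "title", "Intro", timestamp=1)
user_rows = table.scan("user#", "user$")  # "$" sorts right after "#"
```

Because the keys are sorted, the scan touches only the requested slice rather than every row.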
||||||
|
|
||||||
Wide column stores offer high availability and high scalability. They are often used for very large data sets.
|
Wide column stores offer high availability and high scalability. They are often used for very large data sets.
|
||||||
|
|
||||||
##### Source(s) and further reading: wide column store
|
##### Source(s) for further reading: wide column store
|
||||||
|
|
||||||
* [SQL & NoSQL, a brief history](http://blog.grio.com/2015/11/sql-nosql-a-brief-history.html)
|
* [SQL & NoSQL, a brief history](http://blog.grio.com/2015/11/sql-nosql-a-brief-history.html)
|
||||||
* [Bigtable architecture](http://www.read.seas.harvard.edu/~kohler/class/cs239-w08/chang06bigtable.pdf)
|
* [Bigtable architecture](http://www.read.seas.harvard.edu/~kohler/class/cs239-w08/chang06bigtable.pdf)
|
||||||
|
@ -1073,13 +1073,13 @@ In a graph database, each node is a record and each arc is a relationship betwee
|
||||||
|
|
||||||
Graph databases offer high performance for data models with complex relationships, such as a social network. They are relatively new and are not yet widely used; it might be more difficult to find development tools and resources. Many graphs can only be accessed with [REST APIs](#representational-state-transfer-rest).
|
Graph databases offer high performance for data models with complex relationships, such as a social network. They are relatively new and are not yet widely used; it might be more difficult to find development tools and resources. Many graphs can only be accessed with [REST APIs](#representational-state-transfer-rest).
|
||||||
|
|
||||||
##### Source(s) and further reading: graph
|
##### Source(s) for further reading: graph
|
||||||
|
|
||||||
* [Graph database](https://en.wikipedia.org/wiki/Graph_database)
|
* [Graph database](https://en.wikipedia.org/wiki/Graph_database)
|
||||||
* [Neo4j](https://neo4j.com/)
|
* [Neo4j](https://neo4j.com/)
|
||||||
* [FlockDB](https://blog.twitter.com/2010/introducing-flockdb)
|
* [FlockDB](https://blog.twitter.com/2010/introducing-flockdb)
|
||||||
|
|
||||||
#### Source(s) and further reading: NoSQL
|
#### Source(s) for further reading: NoSQL
|
||||||
|
|
||||||
* [Explanation of base terminology](http://stackoverflow.com/questions/3342497/explanation-of-base-terminology)
|
* [Explanation of base terminology](http://stackoverflow.com/questions/3342497/explanation-of-base-terminology)
|
||||||
* [NoSQL databases a survey and decision guidance](https://medium.com/baqend-blog/nosql-databases-a-survey-and-decision-guidance-ea7823a822d#.wskogqenq)
|
* [NoSQL databases a survey and decision guidance](https://medium.com/baqend-blog/nosql-databases-a-survey-and-decision-guidance-ea7823a822d#.wskogqenq)
|
||||||
|
@ -1124,7 +1124,7 @@ Sample data well-suited for NoSQL:
|
||||||
* Frequently accessed ('hot') tables
|
* Frequently accessed ('hot') tables
|
||||||
* Metadata/lookup tables
|
* Metadata/lookup tables
|
||||||
|
|
||||||
##### Source(s) and further reading: SQL or NoSQL
|
##### Source(s) for further reading: SQL or NoSQL
|
||||||
|
|
||||||
* [Scaling up to your first 10 million users](https://www.youtube.com/watch?v=kKjm4ehYiMs)
|
* [Scaling up to your first 10 million users](https://www.youtube.com/watch?v=kKjm4ehYiMs)
|
||||||
* [SQL vs NoSQL differences](https://www.sitepoint.com/sql-vs-nosql-differences/)
|
* [SQL vs NoSQL differences](https://www.sitepoint.com/sql-vs-nosql-differences/)
|
||||||
|
@ -1159,7 +1159,7 @@ Your database usually includes some level of caching in a default configuration,
|
||||||
|
|
||||||
### Application caching
|
### Application caching
|
||||||
|
|
||||||
In-memory caches such as Memcached and Redis are key-value stores between your application and your data storage. Since the data is held in RAM, it is much faster than typical databases where data is stored on disk. RAM is more limited than disk, so [cache invalidation](https://en.wikipedia.org/wiki/Cache_algorithms) algorithms such as [least recently used (LRU)](https://en.wikipedia.org/wiki/Cache_replacement_policies#Least_recently_used_(LRU)) can help invalidate 'cold' entries and keep 'hot' data in RAM.
|
In-memory caches such as Memcached and Redis the key-value is stored between your application and your data storage. Since the data is held in RAM, it is much faster than typical databases where data is stored on disk. RAM is more limited than disk, so [cache invalidation](https://en.wikipedia.org/wiki/Cache_algorithms) algorithms such as [least recently used (LRU)](https://en.wikipedia.org/wiki/Cache_replacement_policies#Least_recently_used_(LRU)) can help invalidate 'cold' entries and keep 'hot' data in RAM.
|
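The least recently used (LRU) policy mentioned above can be sketched in a few lines. This is a minimal, single-process illustration built on `collections.OrderedDict`; the `LRUCache` class and its `get`/`set` methods are hypothetical names, not how Memcached or Redis implement eviction.

```python
from collections import OrderedDict


class LRUCache:
    """Toy LRU cache: evicts the entry that was used least recently."""

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._entries = OrderedDict()  # oldest entry first, newest last

    def get(self, key):
        """Return the cached value, or None on a cache miss."""
        if key not in self._entries:
            return None
        self._entries.move_to_end(key)  # mark as most recently used
        return self._entries[key]

    def set(self, key, value):
        """Store a value, evicting the 'coldest' entry if over capacity."""
        if key in self._entries:
            self._entries.move_to_end(key)
        self._entries[key] = value
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # drop least recently used


cache = LRUCache(capacity=2)
cache.set("a", 1)
cache.set("b", 2)
cache.get("a")     # "a" is now 'hot'
cache.set("c", 3)  # evicts "b", the 'cold' entry
```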
||||||
|
|
||||||
Redis has the following additional features:
|
Redis has the following additional features:
|
||||||
|
|
||||||
|
@ -1311,7 +1311,7 @@ Refresh-ahead can result in reduced latency vs read-through if the cache can acc
|
||||||
* Cache invalidation is a difficult problem; there is additional complexity associated with when to update the cache.
|
* Cache invalidation is a difficult problem; there is additional complexity associated with when to update the cache.
|
||||||
* Need to make application changes such as adding Redis or memcached.
|
* Need to make application changes such as adding Redis or memcached.
|
||||||
|
|
||||||
### Source(s) and further reading
|
### Source(s) for further reading
|
||||||
|
|
||||||
* [From cache to in-memory data grid](http://www.slideshare.net/tmatyashovsky/from-cache-to-in-memory-data-grid-introduction-to-hazelcast)
|
* [From cache to in-memory data grid](http://www.slideshare.net/tmatyashovsky/from-cache-to-in-memory-data-grid-introduction-to-hazelcast)
|
||||||
* [Scalable system design patterns](http://horicky.blogspot.com/2010/10/scalable-system-design-patterns.html)
|
* [Scalable system design patterns](http://horicky.blogspot.com/2010/10/scalable-system-design-patterns.html)
|
||||||
|
@ -1360,7 +1360,7 @@ If queues start to grow significantly, the queue size can become larger than mem
|
||||||
|
|
||||||
* Use cases such as inexpensive calculations and realtime workflows might be better suited for synchronous operations, as introducing queues can add delays and complexity.
|
* Use cases such as inexpensive calculations and realtime workflows might be better suited for synchronous operations, as introducing queues can add delays and complexity.
|
||||||
|
|
||||||
### Source(s) and further reading
|
### Source(s) for further reading
|
||||||
|
|
||||||
* [It's all a numbers game](https://www.youtube.com/watch?v=1KRYH75wgy4)
|
* [It's all a numbers game](https://www.youtube.com/watch?v=1KRYH75wgy4)
|
||||||
* [Applying back pressure when overloaded](http://mechanical-sympathy.blogspot.com/2012/05/apply-back-pressure-when-overloaded.html)
|
* [Applying back pressure when overloaded](http://mechanical-sympathy.blogspot.com/2012/05/apply-back-pressure-when-overloaded.html)
|
||||||
|
@ -1393,7 +1393,7 @@ A basic HTTP request consists of a verb (method) and a resource (endpoint). Bel
|
||||||
|
|
||||||
HTTP is an application layer protocol relying on lower-level protocols such as **TCP** and **UDP**.
|
HTTP is an application layer protocol relying on lower-level protocols such as **TCP** and **UDP**.
|
||||||
|
|
||||||
#### Source(s) and further reading: HTTP
|
#### Source(s) for further reading: HTTP
|
||||||
|
|
||||||
* [What is HTTP?](https://www.nginx.com/resources/glossary/http/)
|
* [What is HTTP?](https://www.nginx.com/resources/glossary/http/)
|
||||||
* [Difference between HTTP and TCP](https://www.quora.com/What-is-the-difference-between-HTTP-protocol-and-TCP-protocol)
|
* [Difference between HTTP and TCP](https://www.quora.com/What-is-the-difference-between-HTTP-protocol-and-TCP-protocol)
|
||||||
|
@ -1443,7 +1443,7 @@ Use UDP over TCP when:
|
||||||
* Late data is worse than loss of data
|
* Late data is worse than loss of data
|
||||||
* You want to implement your own error correction
|
* You want to implement your own error correction
|
||||||
|
|
||||||
#### Source(s) and further reading: TCP and UDP
|
#### Source(s) for further reading: TCP and UDP
|
||||||
|
|
||||||
* [Networking for game programming](http://gafferongames.com/networking-for-game-programmers/udp-vs-tcp/)
|
* [Networking for game programming](http://gafferongames.com/networking-for-game-programmers/udp-vs-tcp/)
|
||||||
* [Key differences between TCP and UDP protocols](http://www.cyberciti.biz/faq/key-differences-between-tcp-and-udp-protocols/)
|
* [Key differences between TCP and UDP protocols](http://www.cyberciti.biz/faq/key-differences-between-tcp-and-udp-protocols/)
|
||||||
|
@ -1546,7 +1546,7 @@ REST is focused on exposing data. It minimizes the coupling between client/serv
|
||||||
<i><a href=https://apihandyman.io/do-you-really-know-why-you-prefer-rest-over-rpc/>Source: Do you really know why you prefer REST over RPC</a></i>
|
<i><a href=https://apihandyman.io/do-you-really-know-why-you-prefer-rest-over-rpc/>Source: Do you really know why you prefer REST over RPC</a></i>
|
||||||
</p>
|
</p>
|
||||||
|
|
||||||
#### Source(s) and further reading: REST and RPC
|
#### Source(s) for further reading: REST and RPC
|
||||||
|
|
||||||
* [Do you really know why you prefer REST over RPC](https://apihandyman.io/do-you-really-know-why-you-prefer-rest-over-rpc/)
|
* [Do you really know why you prefer REST over RPC](https://apihandyman.io/do-you-really-know-why-you-prefer-rest-over-rpc/)
|
||||||
* [When are RPC-ish approaches more appropriate than REST?](http://programmers.stackexchange.com/a/181186)
|
* [When are RPC-ish approaches more appropriate than REST?](http://programmers.stackexchange.com/a/181186)
|
||||||
|
@ -1568,7 +1568,7 @@ Security is a broad topic. Unless you have considerable experience, a security
|
||||||
* Use parameterized queries to prevent SQL injection.
|
* Use parameterized queries to prevent SQL injection.
|
||||||
* Use the principle of [least privilege](https://en.wikipedia.org/wiki/Principle_of_least_privilege).
|
* Use the principle of [least privilege](https://en.wikipedia.org/wiki/Principle_of_least_privilege).
|
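To make the parameterized-query advice above concrete, here is a small sketch using Python's built-in `sqlite3` module. The `users` table and the `find_user` helper are made up for illustration; the point is that user input is passed as a bound parameter instead of being concatenated into the SQL string.

```python
import sqlite3


def find_user(conn, name):
    """Look up users by name with a bound parameter, not string formatting."""
    cursor = conn.execute(
        "SELECT id, name FROM users WHERE name = ?",  # "?" is a placeholder
        (name,),  # the driver treats this value as data, never as SQL
    )
    return cursor.fetchall()


# Hypothetical in-memory database used only for this example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])

legit = find_user(conn, "alice")
attack = find_user(conn, "alice' OR '1'='1")  # matched literally, finds nothing
```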
||||||
|
|
||||||
### Source(s) and further reading
|
### Source(s) for further reading
|
||||||
|
|
||||||
* [API security checklist](https://github.com/shieldfy/API-Security-Checklist)
|
* [API security checklist](https://github.com/shieldfy/API-Security-Checklist)
|
||||||
* [Security guide for developers](https://github.com/FallibleInc/security-guide-for-developers)
|
* [Security guide for developers](https://github.com/FallibleInc/security-guide-for-developers)
|
||||||
|
@ -1790,7 +1790,7 @@ Handy metrics based on numbers above:
|
||||||
* [Yelp Engineering Blog](http://engineeringblog.yelp.com/)
|
* [Yelp Engineering Blog](http://engineeringblog.yelp.com/)
|
||||||
* [Zynga Engineering Blog](https://www.zynga.com/blogs/engineering)
|
* [Zynga Engineering Blog](https://www.zynga.com/blogs/engineering)
|
||||||
|
|
||||||
#### Source(s) and further reading
|
#### Source(s) for further reading
|
||||||
|
|
||||||
Looking to add a blog? To avoid duplicating work, consider adding your company blog to the following repo:
|
Looking to add a blog? To avoid duplicating work, consider adding your company blog to the following repo:
|
||||||
|
|
||||||
|
|