diff --git a/resources/noat.cards/Application layer.md b/resources/noat.cards/Application layer.md index 978afcbe..ae867b58 100644 --- a/resources/noat.cards/Application layer.md +++ b/resources/noat.cards/Application layer.md @@ -5,9 +5,9 @@ isdraft = False # Application layer -### Application layer - Introduction +## Application layer - Introduction -[![](https://camo.githubusercontent.com/feeb549c5b6e94f65c613635f7166dc26e0c7de7/687474703a2f2f692e696d6775722e636f6d2f7942355359776d2e706e67) ](https://camo.githubusercontent.com/feeb549c5b6e94f65c613635f7166dc26e0c7de7/687474703a2f2f692e696d6775722e636f6d2f7942355359776d2e706e67) +![](https://camo.githubusercontent.com/feeb549c5b6e94f65c613635f7166dc26e0c7de7/687474703a2f2f692e696d6775722e636f6d2f7942355359776d2e706e67) _[Source: Intro to architecting systems for scale](http://lethain.com/introduction-to-architecting-systems-for-scale/#platform_layer) _ @@ -17,7 +17,7 @@ The single responsibility principle advocates for small and autonomous services Workers in the application layer also help enable [asynchronism](https://github.com/donnemartin/system-design-primer#asynchronism). -### Microservices +## Microservices Related to this discussion are [microservices](https://en.wikipedia.org/wiki/Microservices), which can be described as a suite of independently deployable, small, modular services. Each service runs a unique process and communicates through a well-defined, lightweight mechanism to serve a business goal. [1](https://smartbear.com/learn/api-design/what-are-microservices) @@ -32,7 +32,7 @@ Systems such as [Zookeeper](http://www.slideshare.net/sauravhaloi/introduction-t - Adding an application layer with loosely coupled services requires a different approach from an architectural, operations, and process viewpoint (vs a monolithic system). - Microservices can add complexity in terms of deployments and operations. 
-### [](https://github.com/donnemartin/system-design-primer#sources-and-further-reading-9) Source(s) and further reading +### Source(s) and further reading - [Intro to architecting systems for scale](http://lethain.com/introduction-to-architecting-systems-for-scale) - [Crack the system design interview](http://www.puncsky.com/blog/2016/02/14/crack-the-system-design-interview/) diff --git a/resources/noat.cards/Asynchronism.md b/resources/noat.cards/Asynchronism.md index 3de6fba5..77bd02f1 100644 --- a/resources/noat.cards/Asynchronism.md +++ b/resources/noat.cards/Asynchronism.md @@ -5,12 +5,13 @@ isdraft = False # Asynchronism -[![](https://camo.githubusercontent.com/c01ec137453216bbc188e3a8f16da39ec9131234/687474703a2f2f692e696d6775722e636f6d2f353447597353782e706e67) ](https://camo.githubusercontent.com/c01ec137453216bbc188e3a8f16da39ec9131234/687474703a2f2f692e696d6775722e636f6d2f353447597353782e706e67) -_[Source: Intro to architecting systems for scale](http://lethain.com/introduction-to-architecting-systems-for-scale/#platform_layer) _ +![](https://camo.githubusercontent.com/c01ec137453216bbc188e3a8f16da39ec9131234/687474703a2f2f692e696d6775722e636f6d2f353447597353782e706e67) + +[Source: Intro to architecting systems for scale](http://lethain.com/introduction-to-architecting-systems-for-scale/#platform_layer) Asynchronous workflows help reduce request times for expensive operations that would otherwise be performed inline. They can also help by doing time-consuming work in advance, such as periodic aggregation of data. -### Message queues +## Message queues Message queues receive, hold, and deliver messages. If an operation is too slow to perform inline, you can use a message queue with the following workflow: @@ -25,21 +26,21 @@ RabbitMQ is popular but requires you to adapt to the 'AMQP' protocol and manage Amazon SQS is hosted but can have high latency and may deliver a message twice. 
-### Task queues +## Task queues Task queues receive tasks and their related data, run them, then deliver their results. They can support scheduling and can be used to run computationally-intensive jobs in the background. Celery has support for scheduling and primarily has Python support. -### Back pressure +## Back pressure If queues start to grow significantly, the queue size can become larger than memory, resulting in cache misses, disk reads, and even slower performance. [Back pressure](http://mechanical-sympathy.blogspot.com/2012/05/apply-back-pressure-when-overloaded.html) can help by limiting the queue size, thereby maintaining a high throughput rate and good response times for jobs already in the queue. Once the queue fills up, clients get a server busy or HTTP 503 status code to try again later. Clients can retry the request at a later time, perhaps with [exponential backoff](https://en.wikipedia.org/wiki/Exponential_backoff). -### Disadvantage(s) : asynchronism +## Disadvantage(s) : asynchronism - Use cases such as inexpensive calculations and realtime workflows might be better suited for synchronous operations, as introducing queues can add delays and complexity. -### Source(s) and further reading +## Source(s) and further reading - [It's all a numbers game](https://www.youtube.com/watch?v=1KRYH75wgy4) - [Applying back pressure when overloaded](http://mechanical-sympathy.blogspot.com/2012/05/apply-back-pressure-when-overloaded.html) diff --git a/resources/noat.cards/Availability patterns.md b/resources/noat.cards/Availability patterns.md index 65814a84..09f0d3ba 100644 --- a/resources/noat.cards/Availability patterns.md +++ b/resources/noat.cards/Availability patterns.md @@ -7,7 +7,7 @@ isdraft = False There are two main patterns to support high availability: fail-over and replication. -### Active-passive (Fail-Over) +## Active-passive (Fail-Over) With active-passive fail-over, heartbeats are sent between the active and the passive server on standby. 
If the heartbeat is interrupted, the passive server takes over the active's IP address and resumes service. diff --git a/resources/noat.cards/MD5.md b/resources/noat.cards/MD5.md new file mode 100644 index 00000000..ae14d468 --- /dev/null +++ b/resources/noat.cards/MD5.md @@ -0,0 +1,9 @@ ++++ +noatcards = True +isdraft = False ++++ + +MD5 +--- +- Widely used hashing function that produces a 128-bit hash value +- Uniformly distributed \ No newline at end of file diff --git a/resources/noat.cards/NoSQL.md b/resources/noat.cards/NoSQL.md new file mode 100644 index 00000000..00c6c19a --- /dev/null +++ b/resources/noat.cards/NoSQL.md @@ -0,0 +1,28 @@ ++++ +noatcards = True +isdraft = False ++++ + +# NoSQL + +## NoSQL introduction + +NoSQL is a collection of data items represented in a key-value store, document-store, wide column store, or a graph database. Data is denormalized, and joins are generally done in the application code. Most NoSQL stores lack true ACID transactions and favor [eventual consistency](https://github.com/donnemartin/system-design-primer#eventual-consistency) . + +## NoSQL under BASE principle + +BASE is often used to describe the properties of NoSQL databases. In comparison with the [CAP Theorem](https://github.com/donnemartin/system-design-primer#cap-theorem) , BASE chooses availability over consistency. + +- Basically available - the system guarantees availability. +- Soft state - the state of the system may change over time, even without input. +- Eventual consistency - the system will become consistent over a period of time, given that the system doesn't receive input during that period. + +In addition to choosing between [SQL or NoSQL](https://github.com/donnemartin/system-design-primer#sql-or-nosql) , it is helpful to understand which type of NoSQL database best fits your use case(s) . We'll review key-value stores, document-stores, wide column stores, and graph databases in the next section. 
+ +## Source(s) and further reading: NoSQL + +- [Explanation of base terminology](http://stackoverflow.com/questions/3342497/explanation-of-base-terminology) +- [NoSQL databases a survey and decision guidance](https://medium.com/baqend-blog/nosql-databases-a-survey-and-decision-guidance-ea7823a822d#.wskogqenq) +- [Scalability](http://www.lecloud.net/post/7994751381/scalability-for-dummies-part-2-database) +- [Introduction to NoSQL](https://www.youtube.com/watch?v=qI_g07C_Q5I) +- [NoSQL patterns](http://horicky.blogspot.com/2009/11/nosql-patterns.html) \ No newline at end of file diff --git a/resources/noat.cards/Performance vs scalability.md b/resources/noat.cards/Performance vs scalability.md new file mode 100644 index 00000000..7e9d89bf --- /dev/null +++ b/resources/noat.cards/Performance vs scalability.md @@ -0,0 +1,20 @@ ++++ +noatcards = True +isdraft = False ++++ + +# Performance vs scalability + +## Performance vs scalability + +A service is scalable if it results in increased performance in a manner proportional to resources added. Generally, increasing performance means serving more units of work, but it can also be to handle larger units of work, such as when datasets grow.[1](http://www.allthingsdistributed.com/2006/03/a_word_on_scalability.html) + +Another way to look at performance vs scalability: + +- If you have a performance problem, your system is slow for a single user. +- If you have a scalability problem, your system is fast for a single user but slow under heavy load. 
+ +### Source(s) and further reading + +- [A word on scalability](http://www.allthingsdistributed.com/2006/03/a_word_on_scalability.html) +- [Scalability, availability, stability, patterns](http://www.slideshare.net/jboner/scalability-availability-stability-patterns/) \ No newline at end of file diff --git a/resources/noat.cards/Refresh-ahead.md b/resources/noat.cards/Refresh-ahead.md new file mode 100644 index 00000000..e7febddb --- /dev/null +++ b/resources/noat.cards/Refresh-ahead.md @@ -0,0 +1,15 @@ +# Refresh-ahead + +## Introduction + +![](https://camo.githubusercontent.com/49dcb54307763b4f56d61a4a1369826e2e7d52e4/687474703a2f2f692e696d6775722e636f6d2f6b78746a7167452e706e67) + +[Source: From cache to in-memory data grid](http://www.slideshare.net/tmatyashovsky/from-cache-to-in-memory-data-grid-introduction-to-hazelcast) + +You can configure the cache to automatically refresh any recently accessed cache entry prior to its expiration. + +Refresh-ahead can result in reduced latency vs read-through if the cache can accurately predict which items are likely to be needed in the future. + +## Disadvantage(s) : refresh-ahead + +- Not accurately predicting which items are likely to be needed in the future can result in worse performance than without refresh-ahead. 
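The refresh-ahead behaviour described in this card can be sketched roughly as follows. This is a minimal illustration, not code from the card or from any cache product: `RefreshAheadCache`, `loader`, `ttl`, `refresh_window`, and `clock` are all hypothetical names, and the refresh runs inline here where a real cache would more likely hand it to a background worker.

```python
class RefreshAheadCache:
    """Toy refresh-ahead cache (hypothetical helper, for illustration only)."""

    def __init__(self, loader, ttl, refresh_window, clock):
        self._loader = loader                  # function that reads from the data store
        self._ttl = ttl                        # lifetime of an entry
        self._refresh_window = refresh_window  # how close to expiry a read triggers a refresh
        self._clock = clock                    # injected time source, so the sketch is testable
        self._entries = {}                     # key -> (value, expires_at)

    def _load(self, key, now):
        value = self._loader(key)
        self._entries[key] = (value, now + self._ttl)
        return value

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is None or now >= entry[1]:
            # Miss or expired entry: the caller pays the full load latency.
            return self._load(key, now)
        value, expires_at = entry
        if expires_at - now <= self._refresh_window:
            # Recently accessed and close to expiry: reload ahead of time.
            # Done inline for simplicity; assumed to be asynchronous in practice.
            self._load(key, now)
        return value
```

A read that lands inside the refresh window still returns the cached value, and the next read after the old expiry time finds a fresh entry instead of a miss.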
diff --git a/resources/noat.cards/SQL or NoSQL.md b/resources/noat.cards/SQL or NoSQL.md new file mode 100644 index 00000000..dc40d907 --- /dev/null +++ b/resources/noat.cards/SQL or NoSQL.md @@ -0,0 +1,51 @@ ++++ +noatcards = True +isdraft = False ++++ + +# SQL or NoSQL + +## Reasons for SQL: + +![](https://camo.githubusercontent.com/a6e2e844765c9d5382d9c9b64ef7693977981646/687474703a2f2f692e696d6775722e636f6d2f775847714735662e706e67) + +[Source: Transitioning from RDBMS to NoSQL](https://www.infoq.com/articles/Transition-RDBMS-NoSQL/) + + +- Structured data +- Strict schema +- Relational data +- Need for complex joins +- Transactions +- Clear patterns for scaling +- More established: developers, community, code, tools, etc +- Lookups by index are very fast + +## Reasons for NoSQL: + +![](https://camo.githubusercontent.com/a6e2e844765c9d5382d9c9b64ef7693977981646/687474703a2f2f692e696d6775722e636f6d2f775847714735662e706e67) + +[Source: Transitioning from RDBMS to NoSQL](https://www.infoq.com/articles/Transition-RDBMS-NoSQL/) + + +- Semi-structured data +- Dynamic or flexible schema +- Non relational data +- No need for complex joins +- Store many TB (or PB) of data +- Very data intensive workload +- Very high throughput for IOPS + +## Sample data well-suited for NoSQL: + + +- Rapid ingest of clickstream and log data +- Leaderboard or scoring data +- Temporary data, such as a shopping cart +- Frequently accessed ('hot') tables +- Metadata/lookup tables + +## Source(s) and further reading: SQL or NoSQL + +- [Scaling up to your first 10 million users](https://www.youtube.com/watch?v=vg5onp8TU6Q) +- [SQL vs NoSQL differences](https://www.sitepoint.com/sql-vs-nosql-differences/) \ No newline at end of file diff --git a/resources/noat.cards/SQL tuning.md b/resources/noat.cards/SQL tuning.md new file mode 100644 index 00000000..bcbb653b --- /dev/null +++ b/resources/noat.cards/SQL tuning.md @@ -0,0 +1,56 @@ ++++ +noatcards = True +isdraft = False ++++ + +# SQL tuning + 
+## Introduction + +SQL tuning is a broad topic and many [books](https://www.amazon.com/s/ref=nb_sb_noss_2?url=search-alias%3Daps&field-keywords=sql+tuning) have been written as reference. + +It's important to benchmark and profile to simulate and uncover bottlenecks. + +- Benchmark - Simulate high-load situations with tools such as [ab](http://httpd.apache.org/docs/2.2/programs/ab.html) . +- Profile - Enable tools such as the [slow query log](http://dev.mysql.com/doc/refman/5.7/en/slow-query-log.html) to help track performance issues. + +Benchmarking and profiling might point you to the following optimizations. + +## Tighten up the schema + +- MySQL dumps to disk in contiguous blocks for fast access. +- Use `CHAR` instead of `VARCHAR` for fixed-length fields. + - `CHAR` effectively allows for fast, random access, whereas with `VARCHAR`, you must find the end of a string before moving onto the next one. +- Use `TEXT` for large blocks of text such as blog posts. `TEXT` also allows for boolean searches. Using a `TEXT` field results in storing a pointer on disk that is used to locate the text block. +- Use `INT` for larger numbers up to 2^32 or 4 billion. +- Use `DECIMAL` for currency to avoid floating point representation errors. +- Avoid storing large `BLOBS`, store the location of where to get the object instead. +- `VARCHAR(255) ` is the largest number of characters that can be counted in an 8 bit number, often maximizing the use of a byte in some RDBMS. +- Set the `NOT NULL` constraint where applicable to [improve search performance](http://stackoverflow.com/questions/1017239/how-do-null-values-affect-performance-in-a-database-search) . + +### Use good indices + +- Columns that you are querying (`SELECT`, `GROUP BY`, `ORDER BY`, `JOIN`) could be faster with indices. 
+- Indices are usually represented as a self-balancing [B-tree](https://en.wikipedia.org/wiki/B-tree) that keeps data sorted and allows searches, sequential access, insertions, and deletions in logarithmic time. +- Placing an index can keep the data in memory, requiring more space. +- Writes could also be slower since the index also needs to be updated. +- When loading large amounts of data, it might be faster to disable indices, load the data, then rebuild the indices. + +## Avoid expensive joins + +- [Denormalize](https://github.com/donnemartin/system-design-primer#denormalization) where performance demands it. + +## Partition tables + +- Break up a table by putting hot spots in a separate table to help keep it in memory. + +## Tune the query cache + +- In some cases, the [query cache](http://dev.mysql.com/doc/refman/5.7/en/query-cache) could lead to [performance issues](https://www.percona.com/blog/2014/01/28/10-mysql-performance-tuning-settings-after-installation/). + +## Source(s) and further reading: SQL tuning + +- [Tips for optimizing MySQL queries](http://20bits.com/article/10-tips-for-optimizing-mysql-queries-that-dont-suck) +- [Is there a good reason I see VARCHAR(255) used so often?](http://stackoverflow.com/questions/1217466/is-there-a-good-reason-i-see-varchar255-used-so-often-as-opposed-to-another-l) +- [How do null values affect performance?](http://stackoverflow.com/questions/1017239/how-do-null-values-affect-performance-in-a-database-search) +- [Slow query log](http://dev.mysql.com/doc/refman/5.7/en/slow-query-log.html) \ No newline at end of file diff --git a/resources/noat.cards/Security.md b/resources/noat.cards/Security.md new file mode 100644 index 00000000..7af49afd --- /dev/null +++ b/resources/noat.cards/Security.md @@ -0,0 +1,16 @@ +Security +-------- +--- +This section could use some updates. Consider [contributing](https://github.com/donnemartin/system-design-primer#contributing)! + +Security is a broad topic. 
Unless you have considerable experience, a security background, or are applying for a position that requires knowledge of security, you probably won't need to know more than the basics: + +- Encrypt in transit and at rest. +- Sanitize all user inputs or any input parameters exposed to the user to prevent [XSS](https://en.wikipedia.org/wiki/Cross-site_scripting) and [SQL injection](https://en.wikipedia.org/wiki/SQL_injection). +- Use parameterized queries to prevent SQL injection. +- Use the principle of [least privilege](https://en.wikipedia.org/wiki/Principle_of_least_privilege). + +### Source(s) and further reading + +- [Security guide for developers](https://github.com/FallibleInc/security-guide-for-developers) +- [OWASP top ten](https://www.owasp.org/index.php/OWASP_Top_Ten_Cheat_Sheet) \ No newline at end of file diff --git a/resources/noat.cards/Sharding.md b/resources/noat.cards/Sharding.md new file mode 100644 index 00000000..7405f70a --- /dev/null +++ b/resources/noat.cards/Sharding.md @@ -0,0 +1,32 @@ ++++ +noatcards = True +isdraft = False ++++ + +# Sharding + +## Introduction + +![](https://camo.githubusercontent.com/1df78be67b749171569a0e11a51aa76b3b678d4f/687474703a2f2f692e696d6775722e636f6d2f775538783549642e706e67) + +[Source: Scalability, availability, stability, patterns](http://www.slideshare.net/jboner/scalability-availability-stability-patterns/) + +Sharding distributes data across different databases such that each database can only manage a subset of the data. Taking a users database as an example, as the number of users increases, more shards are added to the cluster. + +Similar to the advantages of [federation](https://github.com/donnemartin/system-design-primer#federation), sharding results in less read and write traffic, less replication, and more cache hits. Index size is also reduced, which generally improves performance with faster queries. 
If one shard goes down, the other shards are still operational, although you'll want to add some form of replication to avoid data loss. Like federation, there is no single central master serializing writes, allowing you to write in parallel with increased throughput. + +Common ways to shard a table of users are by the user's last name initial or by the user's geographic location. + +## Disadvantage(s) : sharding + +- You'll need to update your application logic to work with shards, which could result in complex SQL queries. +- Data distribution can become lopsided in a shard. For example, a set of power users on a shard could result in increased load to that shard compared to others. + - Rebalancing adds additional complexity. A sharding function based on [consistent hashing](http://www.paperplanes.de/2011/12/9/the-magic-of-consistent-hashing.html) can reduce the amount of transferred data. +- Joining data from multiple shards is more complex. +- Sharding adds more hardware and additional complexity. 
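To make the consistent hashing point in the rebalancing bullet concrete, here is a minimal sketch of a hash ring with virtual nodes. It is an illustration under assumptions, not the linked article's implementation: `ConsistentHashRing`, `add_shard`, `shard_for`, and the `vnodes` count are hypothetical, and MD5 is used only as a well-distributed hash, not for security.

```python
import bisect
import hashlib


class ConsistentHashRing:
    """Toy consistent-hash ring (hypothetical helper, for illustration only)."""

    def __init__(self, shards, vnodes=100):
        self._vnodes = vnodes  # virtual nodes per shard, to smooth the distribution
        self._ring = []        # sorted list of (hash, shard) points on the ring
        for shard in shards:
            self.add_shard(shard)

    @staticmethod
    def _hash(key):
        # 128-bit MD5 digest interpreted as an integer position on the ring.
        return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)

    def add_shard(self, shard):
        for i in range(self._vnodes):
            bisect.insort(self._ring, (self._hash(f"{shard}#{i}"), shard))

    def shard_for(self, key):
        # First ring point at or after the key's hash, wrapping around at the end.
        idx = bisect.bisect_left(self._ring, (self._hash(key), ""))
        if idx == len(self._ring):
            idx = 0
        return self._ring[idx][1]
```

When a shard is added, only the keys that now fall just before one of its ring points change owner, and they all move to the new shard. With a plain `hash(key) % shard_count` scheme, most keys would move instead.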
+ +## Source(s) and further reading: sharding + +- [The coming of the shard](http://highscalability.com/blog/2009/8/6/an-unorthodox-approach-to-database-design-the-coming-of-the.html) +- [Shard database architecture](https://en.wikipedia.org/wiki/Shard_(database_architecture)) +- [Consistent hashing](http://www.paperplanes.de/2011/12/9/the-magic-of-consistent-hashing.html) \ No newline at end of file diff --git a/resources/noat.cards/Wide column store.md b/resources/noat.cards/Wide column store.md new file mode 100644 index 00000000..b0aa828c --- /dev/null +++ b/resources/noat.cards/Wide column store.md @@ -0,0 +1,27 @@ ++++ +noatcards = True +isdraft = False ++++ + +# Wide column store + +## Introduction + +![](https://camo.githubusercontent.com/823668b07b4bff50574e934273c9244e4e5017d6/687474703a2f2f692e696d6775722e636f6d2f6e3136694f476b2e706e67) + +[Source: SQL & NoSQL, a brief history](http://blog.grio.com/2015/11/sql-nosql-a-brief-history.html) + +> Abstraction: nested map `ColumnFamily<RowKey, Columns<ColKey, Value, Timestamp>>` + +A wide column store's basic unit of data is a column (name/value pair). A column can be grouped in column families (analogous to a SQL table). Super column families further group column families. You can access each column independently with a row key, and columns with the same row key form a row. Each value contains a timestamp for versioning and for conflict resolution. + +Google introduced [Bigtable](http://www.read.seas.harvard.edu/~kohler/class/cs239-w08/chang06bigtable.pdf) as the first wide column store, which influenced the open-source [HBase](https://www.mapr.com/blog/in-depth-look-hbase-architecture), often used in the Hadoop ecosystem, and [Cassandra](http://docs.datastax.com/en/archived/cassandra/2.0/cassandra/architecture/architectureIntro_c.html) from Facebook. Stores such as Bigtable, HBase, and Cassandra maintain keys in lexicographic order, allowing efficient retrieval of selective key ranges. 
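The nested-map abstraction and the sorted row keys described above can be modelled in a few lines. This is a toy in-memory sketch, not how Bigtable, HBase, or Cassandra are implemented: the `ColumnFamily` class and its `put`, `get`, and `scan` methods are hypothetical, and last-write-wins by timestamp is an assumed conflict-resolution rule.

```python
import bisect
import time


class ColumnFamily:
    """Toy column family: row key -> column name -> (value, timestamp)."""

    def __init__(self):
        self._rows = {}      # row_key -> {column_name: (value, timestamp)}
        self._row_keys = []  # row keys kept in lexicographic order

    def put(self, row_key, column, value, timestamp=None):
        ts = time.time() if timestamp is None else timestamp
        if row_key not in self._rows:
            self._rows[row_key] = {}
            bisect.insort(self._row_keys, row_key)
        current = self._rows[row_key].get(column)
        # Assumed rule: the write with the newest timestamp wins.
        if current is None or ts >= current[1]:
            self._rows[row_key][column] = (value, ts)

    def get(self, row_key, column):
        return self._rows[row_key][column][0]

    def scan(self, start, stop):
        """Return rows with start <= row_key < stop, in key order."""
        lo = bisect.bisect_left(self._row_keys, start)
        hi = bisect.bisect_left(self._row_keys, stop)
        return [
            (key, {col: value for col, (value, _) in self._rows[key].items()})
            for key in self._row_keys[lo:hi]
        ]
```

Because row keys stay sorted, a range such as `scan("user:", "user;")` touches only the contiguous slice of keys with the `user:` prefix, which is the property that makes selective key-range retrieval efficient.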
+ +Wide column stores offer high availability and high scalability. They are often used for very large data sets. + +## Source(s) and further reading: wide column store + +- [SQL & NoSQL, a brief history](http://blog.grio.com/2015/11/sql-nosql-a-brief-history.html) +- [Bigtable architecture](http://www.read.seas.harvard.edu/~kohler/class/cs239-w08/chang06bigtable.pdf) +- [HBase architecture](https://www.mapr.com/blog/in-depth-look-hbase-architecture) +- [Cassandra architecture](http://docs.datastax.com/en/archived/cassandra/2.0/cassandra/architecture/architectureIntro_c.html) \ No newline at end of file diff --git a/resources/noat.cards/Write-behind (write-back).md b/resources/noat.cards/Write-behind (write-back).md new file mode 100644 index 00000000..321e9ee5 --- /dev/null +++ b/resources/noat.cards/Write-behind (write-back).md @@ -0,0 +1,22 @@ ++++ +noatcards = True +isdraft = False ++++ + +# Write-behind (write-back) + +## Introduction + +![](https://camo.githubusercontent.com/8aa9f1a2f050c1422898bb5e82f1f01773334e22/687474703a2f2f692e696d6775722e636f6d2f72675372766a472e706e67) + +[Source: Scalability, availability, stability, patterns](http://www.slideshare.net/jboner/scalability-availability-stability-patterns/) + +In write-behind, the application does the following: + +- Add/update entry in cache +- Asynchronously write entry to the data store, improving write performance + +## Disadvantage(s) : write-behind + +- There could be data loss if the cache goes down prior to its contents hitting the data store. +- It is more complex to implement write-behind than it is to implement cache-aside or write-through. 
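The two write-behind steps above can be sketched as follows. This is a minimal illustration under assumptions: `WriteBehindCache` and its methods are hypothetical, the data store is a plain dict stand-in, and `flush()` is called by hand where a real system would run it from a background worker or timer.

```python
from collections import deque


class WriteBehindCache:
    """Toy write-behind cache (hypothetical helper, for illustration only)."""

    def __init__(self, store):
        self._store = store      # dict-like stand-in for the data store
        self._cache = {}
        self._pending = deque()  # keys whose latest value has not reached the store yet

    def set(self, key, value):
        # Step 1: add/update the entry in the cache and return immediately.
        self._cache[key] = value
        self._pending.append(key)

    def get(self, key):
        return self._cache.get(key)

    def flush(self):
        # Step 2: write queued entries to the data store, later and out of band.
        written = 0
        while self._pending:
            key = self._pending.popleft()
            self._store[key] = self._cache[key]
            written += 1
        return written
```

Anything still in the pending queue when the cache process dies is lost, which is the data-loss risk listed in the disadvantages.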
\ No newline at end of file diff --git a/resources/noat.cards/Write-through.md b/resources/noat.cards/Write-through.md new file mode 100644 index 00000000..95046025 --- /dev/null +++ b/resources/noat.cards/Write-through.md @@ -0,0 +1,39 @@ ++++ +noatcards = True +isdraft = False ++++ + +# Write-through + +## Write-through introduction + +![](https://camo.githubusercontent.com/56b870f4d199335ccdbc98b989ef6511ed14f0e2/687474703a2f2f692e696d6775722e636f6d2f3076426330684e2e706e67) + +[Source: Scalability, availability, stability, patterns](http://www.slideshare.net/jboner/scalability-availability-stability-patterns/) + +The application uses the cache as the main data store, reading and writing data to it, while the cache is responsible for reading and writing to the database: + +- Application adds/updates entry in cache +- Cache synchronously writes entry to data store +- Return + +Application code: + +``` + set_user(12345, {"foo":"bar"}) +``` + +Cache code: + +``` +def set_user(user_id, values): + user = db.query("UPDATE Users WHERE id = {0}", user_id, values) + cache.set(user_id, user) +``` + +Write-through is a slow overall operation due to the write operation, but subsequent reads of just written data are fast. Users are generally more tolerant of latency when updating data than reading data. Data in the cache is not stale. + +## Disadvantage(s) : write through + +- When a new node is created due to failure or scaling, the new node will not cache entries until the entry is updated in the database. Cache-aside in conjunction with write through can mitigate this issue. +- Most data written might never be read, which can be minimized with a TTL. \ No newline at end of file