Mirror of https://github.com/mxssl/sre-interview-prep-guide.git (synced 2025-11-04 10:12:34 +03:00)
	Site Reliability Engineer (SRE) Interview Preparation Guide
This repository is an attempt to consolidate useful resources for Site Reliability Engineer (SRE) interview preparation.
Contributing
Please take a look at the contribution guidelines first. Contributions are always welcome!
Basics
- Simple: What happens when you type in ‘www.cnn.com’ in your browser?
- Detailed: What happens when you type google.com into your browser's address box and press enter?
 
Linux
- Introduction to Linux – Full Course for Beginners
- What every SRE should know about GNU/Linux shell related internals: file descriptors, pipes, terminals, user sessions, process groups and daemons
- SRE deep dive into Linux Page Cache
 
Boot Process
- How Does Linux Boot Process Work?
- An introduction to the Linux boot and startup processes
- What happens when we turn on computer?
- What happens when we turn on computer?
- From Power up to login prompt
 
Filesystem
- Understanding Inodes
- Understand UNIX / Linux Inodes Basics with Examples
- Understanding proc filesystem
- Common Mount Options
- Understanding Linux filesystems: ext4 and beyond
 
Kernel
- Explain the basics of Linux kernel
- Kernel Space and User Space
- Linux Kernel Process Management
- Linux Addressing
- Linux Kernel Memory Management
- STACK AND HEAP
- Paging and Segmentation
- Linux Kernel System Calls
- The Virtual Filesystem
- Concurrency and Race Conditions
- Memory Leak
- What is a kernel Panic?
- Book about the Linux kernel
 
Troubleshooting
- Linux troubleshooting tools
- Linux Performance Analysis in 60,000 Milliseconds
- strace
- lsof
- Linux system debugging
- SaaS where users can test their Linux troubleshooting skills
 
Networking
- The Internet explained from first principles
- Network protocols for anyone who knows a programming language
- Introduction to Linux interfaces for virtual networking
- Multi-tier load-balancing with Linux
- Introduction to modern network load balancing and proxying
- Load Balancing Algorithms
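Several of the load-balancing entries above compare selection algorithms. Below is a minimal Go sketch of two common ones: round robin, which cycles through backends regardless of load, and least connections, which picks the backend with the fewest active connections. The `Backend` type and its fields are hypothetical. The round-robin balancer is not safe for concurrent use; a real proxy would need locking or atomics.

```go
package main

import "fmt"

// Backend is a hypothetical upstream server tracked by the balancer.
type Backend struct {
	Name        string
	ActiveConns int
}

// RoundRobin cycles through backends in order, ignoring their load.
type RoundRobin struct {
	backends []*Backend
	next     int
}

// Pick returns the next backend in rotation.
func (r *RoundRobin) Pick() *Backend {
	b := r.backends[r.next%len(r.backends)]
	r.next++
	return b
}

// LeastConnections picks the backend with the fewest active connections;
// ties go to the earliest backend in the slice.
func LeastConnections(backends []*Backend) *Backend {
	best := backends[0]
	for _, b := range backends[1:] {
		if b.ActiveConns < best.ActiveConns {
			best = b
		}
	}
	return best
}

func main() {
	pool := []*Backend{{"a", 5}, {"b", 1}, {"c", 3}}
	rr := &RoundRobin{backends: pool}
	for i := 0; i < 4; i++ {
		fmt.Print(rr.Pick().Name, " ") // a b c a
	}
	fmt.Println()
	fmt.Println(LeastConnections(pool).Name) // b
}
```

Round robin is cheap and fair when requests cost roughly the same. Least connections adapts better when some requests hold a connection much longer than others.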
 
Containers
- Introduction to Docker and Containers
- Containers Patterns
- Docker Container Anti Patterns
- Anti-Patterns When Building Container Images
 
Kubernetes
- Deploying and Scaling Microservices with Docker and Kubernetes
- Demystifying the Kubernetes Iceberg
- What happens when ... Kubernetes edition!
- Kubernetes Production Patterns
- Kubernetes production best practices
- A Guide to the Kubernetes Networking Model
- 47 Things To Become a Kubernetes Expert
- Kubernetes Best Practices 101
- 15 Kubernetes Best Practices Every Developer Should Know
- THE KUBERNETES NETWORKING GUIDE
- The life of a DNS query in Kubernetes
 
Infrastructure as code / Configuration management
- Terraform
- A Comprehensive Guide to Terraform
- Ansible
- Getting Started With Terraform on AWS
- Google Cloud: Best practices for using Terraform
 
Databases
- Things You Should Know About Databases
- 7 Database Paradigms
- CAP theorem
- Evolutionary Database Design
- ACID vs BASE in Databases
- Understanding Database Sharding
- Database Replication
- SQL vs. NoSQL Database: When to Use, How to Choose
- How do database indexes work?
- Redis Explained
- Database Sharding Explained
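Several of the database links above cover sharding. Here is a minimal Go sketch of hash-mod sharding, assuming string keys and FNV-1a as the hash; both choices are illustrative, not taken from any linked resource, and `ShardFor` is a hypothetical helper. The program also counts how many keys change shards when a fifth shard is added. With plain modulo hashing, growing from N to N+1 shards typically remaps most keys, which is the usual motivation for consistent hashing.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// ShardFor maps a key to one of numShards shards using hash-mod
// partitioning: hash the key with FNV-1a, then take the remainder.
func ShardFor(key string, numShards uint32) uint32 {
	h := fnv.New32a()
	h.Write([]byte(key)) // Write on a hash.Hash never returns an error
	return h.Sum32() % numShards
}

func main() {
	keys := []string{"user:1", "user:2", "user:3", "user:4", "user:5", "user:6", "user:7", "user:8"}
	moved := 0
	for _, k := range keys {
		before, after := ShardFor(k, 4), ShardFor(k, 5)
		if before != after {
			moved++
		}
		fmt.Printf("%s -> shard %d (4 shards), shard %d (5 shards)\n", k, before, after)
	}
	// Counts keys whose shard differs after resharding from 4 to 5.
	fmt.Printf("%d of %d keys move when going from 4 to 5 shards\n", moved, len(keys))
}
```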
 
CI/CD
- Continuous Integration
- 7 Pipeline Design Patterns for Continuous Delivery
- CI/CD patterns
- Six Strategies for Application Deployment
 
Clouds
Programming
Python
Go (Golang)
- A tour of Go
- Go by Example
- Go Tutorials & Examples
- Learn Go with Tests
- Getting up and running with Go
- Effective Go
- Go Design Patterns
- Go Memory Management
- Style Guide
- Style Decisions
- Best Practices
- 50 Shades of Go: Traps, Gotchas, and Common Mistakes for New Golang Devs
 
Big O Notation, Algorithms and Data Structures
- AlgoExpert
- Hacking a Google Interview – Handout 1
- Hacking a Google Interview – Handout 2
- Hacking a Google Interview – Handout 3
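As a small illustration of the Big O material above, here is binary search in Go. Each step halves the remaining range of a sorted slice, so it needs O(log n) comparisons, compared with O(n) for a linear scan. `BinarySearch` is a hypothetical name for this sketch; the standard library offers `sort.SearchInts` for the same job.

```go
package main

import "fmt"

// BinarySearch returns the index of target in the ascending slice xs,
// or -1 if it is absent. Each iteration halves the search range.
func BinarySearch(xs []int, target int) int {
	lo, hi := 0, len(xs)-1
	for lo <= hi {
		mid := lo + (hi-lo)/2 // avoids overflow of lo+hi on very large slices
		switch {
		case xs[mid] == target:
			return mid
		case xs[mid] < target:
			lo = mid + 1
		default:
			hi = mid - 1
		}
	}
	return -1
}

func main() {
	xs := []int{1, 3, 5, 7, 9, 11}
	fmt.Println(BinarySearch(xs, 7)) // 3
	fmt.Println(BinarySearch(xs, 4)) // -1
}
```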
 
System design
- SystemsExpert course from AlgoExpert
- System Design 101
- Grokking the System Design Interview
- The System Design Primer
- Crack the System Design Interview
- System design interview for IT companies
- Web Architecture 101
- What's in a Production Web Application?
- Distributed systems
- Failover
- Monoliths, Service Architecture, and Microservices
- Scale From Zero To Millions Of Users
 
System design examples
Monitoring
- SLOs & You: A Guide To Service Level Objectives
- Setting up Service Monitoring — The Why’s and What’s
- How NOT to Measure Latency
- The four Golden Signals of Kubernetes monitoring
 
Prometheus
- Introduction to Prometheus
- Prometheus Relabeling Training
- Avoid These 6 Mistakes When Getting Started With Prometheus
- A Deep Dive Into the Four Types of Prometheus Metrics
- How Prometheus Querying Works
- PromQL Cheat Sheet
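Counters are one of the Prometheus metric types covered above, and PromQL's `increase()` and `rate()` treat any drop in a counter's value as a reset. The Go sketch below shows that reset handling on a list of raw samples. It is a simplification: Prometheus also extrapolates to the edges of the range window, and this sketch omits that step. `Increase` is a hypothetical helper, not a Prometheus API.

```go
package main

import "fmt"

// Increase sums the growth of a counter across consecutive samples.
// A drop in value is treated as a counter reset (e.g. a process restart):
// the counter is assumed to have restarted from zero, so the new value
// itself counts as growth since the reset.
func Increase(samples []float64) float64 {
	var total float64
	for i := 1; i < len(samples); i++ {
		delta := samples[i] - samples[i-1]
		if delta < 0 { // counter reset detected
			delta = samples[i]
		}
		total += delta
	}
	return total
}

func main() {
	// The counter restarts between the 2nd and 3rd scrape (15 -> 3).
	fmt.Println(Increase([]float64{10, 15, 3, 8})) // 13
}
```

A naive last-minus-first calculation would report 8 - 10 = -2 here. Dividing the increase by the window length in seconds gives a per-second rate.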
 
Processes
- The practical guide to incident management
- Incident Response
- Postmortems
- Runbooks
- Identifying and tracking toil using SRE principles
- Building SRE from Scratch
- SRE at Google: Our complete list of CRE life lessons
- Incident Management vs. Incident Response - What's the Difference?
- Practical Guide to SRE: Using SLOs to Increase Reliability
- Practical Guide to SRE: Automating On-Call
- Going from Zero to SRE
- An Incident Command Training Handbook
- Howie guide to post‑incident investigations
- Rundown of LinkedIn’s SRE practices
- Rundown of Uber’s SRE practice
- SRE in the Real World
- SRE Engagement Models
- SRE Checklist
- Why bother with SLI and SLO?
- The System Resiliency Pyramid
- 10 Tips for Onboarding New SRE Hires
 
Resume
Interview
SRE interview process
Interview Questions
- A collection of questions to practice with for SRE interviews
- SRE Interview Questions
- Sysadmin Test Questions
- Kubernetes job interview questions
- DevOps Guide
- Questions I ask in SRE interviews
- DevOps Roadmap: Learn to become a DevOps Engineer or SRE
- The Must-Know Terraform Interview Questions
 
Blogposts
- SRE Interviews in Silicon Valley
- Preparing the SRE interview
- How to Get Into SRE
- My Job Interview at Google
- Path to Site Reliability Management
- Becoming a Site Reliability Engineer
- How I get a job at Google as SRE
- Become A DevOps Engineer in 2023: [Detailed Guide]
- How to Get an SRE Role
 
Books
SRE books
- Site Reliability Engineering
- The Site Reliability Workbook
- Seeking SRE
- Building Secure and Reliable Systems
- Implementing Service Level Objectives
 
Linux
- Linux Kernel Development (3rd Edition)
- UNIX and Linux System Administration Handbook (5th Edition)
- Linux Pocket Guide, 3rd Edition
 
Networking
Troubleshooting and Performance
Courses