📘 Learning

My personal roadmap from mid-level to Staff Engineer — built to master systems, design, and technical leadership.

⚠️ Living Document Disclaimer
This study plan is a living, breathing beast. I’m constantly refining it — sometimes through major updates, sometimes in quiet 3am tweaks. New resources get added, stages evolve, and priorities shift as I grow.

If something looks different the next time you check in, that’s intentional — growth is iterative.

For transparency, a full update log is maintained at the end of this document so you can track what’s changed over time.

🟢 Active Focus

✝️ Nave’s Topical Bible

Focus: Studying the Bible by topic to understand what Scripture says across books and verses
Approach: One topic per day, guided by the Holy Spirit
Purpose: To deepen scriptural understanding and anchor faith with clarity and context
Start Date: June 23, 2025
Last updated: June 22, 2025

🚧 Study Plan

🌱 Stage 0: Mindset, Growth, and Career Strategy

Becoming a Staff Engineer is 50% technical skill and 50% vision, leadership, and decision-making. This stage forms the foundation.

🧭 Must-Reads on Growth & Seniority

🔁 Career Navigation & Strategy

📖 books

[ ] The Pragmatic Programmer – link
[ ] Code Complete – Amazon
[ ] Release It! – Amazon
[ ] Scalability Rules – Amazon
[ ] A Philosophy of Software Design – Amazon
[ ] Software Engineering at Google (free) – SEAG

new mentorship & influence topics

[ ] Code Review Culture
[ ] Constructive Feedback
[ ] Async Communication
[ ] 1:1 Coaching
[ ] Setting Team Standards

🖊️ writing (communication, blogging)

[ ] Undervalued Software Engineering Skills: Writing Well
- From the HN discussion: “Writing a couple of pages of design docs or an Amazon-style 6 pager or whatever might take a few days of work, but can save weeks or more of wasted implementation time when you realise your system design was flawed or it doesn’t address any real user needs.”
[ ] Sell Yourself Sell Your Work
- If you’ve done great work, if you’ve produced superb software or fixed a fault with an aeroplane or investigated a problem, without telling anyone you may as well not have bothered.
[ ] The Writing Well Handbook
- Ideas — Identify what to write about
- First Drafts — Generate insights on your topic
- Rewriting — Rewrite for clarity, intrigue, and succinctness
- Style — Rewrite for style and flow
- Practicing — Improve as a writer
[ ] Write Simply, Paul Graham
[ ] Writing is Thinking: Learning to Write with Confidence
[ ] It’s time to start writing explains why Jeff Bezos banned PowerPoint at Amazon.
- The reason writing a good 4 page memo is harder than “writing” a 20 page powerpoint is because the narrative structure of a good memo forces better thought and better understanding of what’s more important than what, and how things are related.
- Powerpoint-style presentations somehow give permission to gloss over ideas, flatten out any sense of relative importance, and ignore the interconnectedness of ideas.
[ ] Programming and Writing, Antirez
[ ] Writing one sentence per line
[ ] Ask HN: How to level up your technical writing? Lots of great resources.
[ ] Patterns in confusing explanations, Julia Evans
[ ] Technical Writing for Developers
[ ] Some blogging myths, Julia Evans
[ ] George Orwell’s Six Rules for Writing
- Never use a metaphor, simile, or other figure of speech which you are used to seeing in print.
- Never use a long word where a short one will do.
- If it is possible to cut a word out, always cut it out.
- Never use the passive where you can use the active.
- Never use a foreign phrase, a scientific word, or a jargon word if you can think of an everyday English equivalent.
- Break any of these rules sooner than say anything outright barbarous.
[ ] Blog Writing for Developers
[ ] 7 Common Mistakes in Architecture Diagrams
[ ] Why Blog If Nobody Reads It?
- Blogging forces clarity. It makes you structure your thoughts, sharpen your perspective.

Guides & classes about technical writing:

[ ] Documentation Guide — Write the Docs
- Principles
- Style guides
- Docs as code
- Markup languages
- Tools
[ ] Technical Writing One introduction, Google
- Grammar
- Active voice
- Clear & short sentences

If you’re overthinking, write. If you’re underthinking, read. – @AlexAndBooks_

Personal knowledge management (PKM)

[ ] Zettelkasten Method
[ ] How to build a second brain as a software developer
[ ] Notes Against Note-Taking Systems
- An interesting contrarian take!
- I am waiting for any evidence that our most provocative thinkers and writers are those who rely on elaborate, systematic note-taking systems.
- I am seeing evidence that people taught knowledge management for its own sake produce unexciting work.
[ ] MaggieAppleton/digital-gardeners
[ ] Notes apps are where ideas go to die. And that’s good.
[ ] I Deleted My Second Brain

🧮 Stage 1: Math, Programming Fluency & Algorithms

Build solid foundations in computation, logic, math, programming fluency, and algorithms.

elementary math

college math

calculus

[ ] MIT – Single Variable Calculus

Supplementary Material

linear algebra

Required Reading

[ ] 📚 Linear Algebra and its Applications
[ ] 📚 Coding the Matrix

Supplementary Material

discrete math

proofs and logic

Proofs, set theory, propositional logic, induction, invariants, state machines

[ ] Coursera – What is a Proof?
[ ] MIT – Mathematics for Computer Science (2015): Unit 1
[ ] MIT – Mathematics for Computer Science (2010): Weeks 1,2,3
[ ] 📚 How to Prove It
[ ] 📚 Book of Proof
[ ] https://www.logicmatters.net/resources/pdfs/TeachYourselfLogic2017.pdf

number theory

Number theory is fundamental to reasoning about numbers as discrete mathematical structures, with applications in cryptography and efficient numerical computation.
By the end of this sub-module you should be confident proving and reasoning about concepts including: divisibility, Bézout’s identity, modular arithmetic, Euler’s totient theorem, Fermat’s little theorem, integer factorization, Diophantine equations, the fundamental theorem of arithmetic, the Chinese remainder theorem, RSA, and the discrete logarithm problem.
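
To make a few of these concepts concrete, here is a minimal sketch that ties Bézout's identity, modular inverses, Euler's totient, and RSA together. The primes, exponent, and message are textbook-sized values chosen only for illustration, and the function names are my own, not taken from any of the courses listed here.

```python
def extended_gcd(a: int, b: int) -> tuple[int, int, int]:
    """Return (g, x, y) with a*x + b*y == g == gcd(a, b) (Bezout's identity)."""
    old_r, r = a, b
    old_x, x = 1, 0
    old_y, y = 0, 1
    while r != 0:
        quotient = old_r // r
        old_r, r = r, old_r - quotient * r
        old_x, x = x, old_x - quotient * x
        old_y, y = y, old_y - quotient * y
    return old_r, old_x, old_y


def mod_inverse(a: int, m: int) -> int:
    """Return the inverse of a modulo m, which exists only when gcd(a, m) == 1."""
    g, x, _ = extended_gcd(a % m, m)
    if g != 1:
        raise ValueError(f"{a} has no inverse modulo {m}")
    return x % m


# Toy RSA (illustrative sizes only, far too small for real use).
p, q = 61, 53
n = p * q                    # public modulus
phi = (p - 1) * (q - 1)      # Euler's totient of n for distinct primes p and q
e = 17                       # public exponent, coprime with phi
d = mod_inverse(e, phi)      # private exponent, so that e * d == 1 (mod phi)

message = 65
ciphertext = pow(message, e, n)
recovered = pow(ciphertext, d, n)
print(n, phi, d, ciphertext, recovered)
```

Decryption works here because e * d is congruent to 1 modulo phi, which is exactly where Euler's totient theorem enters the picture.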

Problem Sets

[ ] MIT – Mathematics for Computer Science (2010): Recitation 4
[ ] MIT – Mathematics for Computer Science (2010): Recitation 5
[ ] MIT – Mathematics for Computer Science (2010): Assignment 3
Optional Supplementary Material
[ ] Coursera – Classical Cryptosystems and Core Concepts
[ ] Coursera – Mathematical Foundations for Cryptography

combinatorics

Combinatorics is a vital skill in reasoning about the size of finite sets.

Problem Sets

graph theory

supplementary material

[ ] MIT – Mathematics for Computer Science (2017)
[ ] MIT – Mathematics for Computer Science (2015)
[ ] MIT – Mathematics for Computer Science (2010)
[ ] Arsdigita University – Discrete Mathematics
[ ] Coursera – Discrete Mathematics
[ ] 📚 From Mathematics to Generic Programming
[ ] Visual Group Theory
[ ] https://www.coursera.org/learn/discrete-mathematics
[ ] https://www.youtube.com/playlist?list=PLZzHxk_TPOStgPtqRZ6KzmkUQBQ8TSWVX
[ ] 📚 Discrete-Mathematics-Applications
[ ] 📚 Concrete Mathematics
[ ] Proofs from the book

probability and statistics

probability

statistics

key books:

[ ] Coding the Matrix – Philip N. Klein

[ ] Concrete Mathematics – Knuth et al.

[ ] The Book of Proof – Hammack

[ ] Elements of Statistical Learning – Hastie, Tibshirani, Friedman

🛠 Languages (Deep Proficiency in 2, Working Knowledge in 2+)

[ ] Rust – The Rust Programming Language, Rust by Example, Tokio
[ ] Go – The Go Programming Language, Go Tour, Learn Go with Tests
[ ] Python – Fluent Python, Clean Code in Python
[ ] C++ – Effective C++, LearnCpp
[ ] Java – Effective Java, Modern Java in Action

typescript (+ javascript, next js, shadcn/ui, tailwind css)

[ ] JavaScript is such a pervasive language that it’s almost required learning.
[ ] mbeaudru/modern-js-cheatsheet: cheatsheet for the JavaScript knowledge you will frequently encounter in modern projects.
[ ] javascript-tutorial: comprehensive JavaScript guide with simple but detailed explanations. Available in several languages.
[ ] 30 Days of JavaScript: a step-by-step challenge for learning JavaScript in 30 days.
[ ] Unleash JavaScript’s Potential with Functional Programming
[ ] grab/front-end-guide: a study guide and introduction to the modern front end stack.
[ ] Front-End Developer Handbook 2019, Cody Lindley
[ ] A Directory of design and front-end resources
[ ] 🧰 codingknite/frontend-development: a list of resources for frontend development
[ ] leonardomso/33-js-concepts: 33 JavaScript concepts every developer should know.
[ ] The Modern JavaScript Tutorial

web development

Topics:

[ ] 136 facts every web dev should know
[ ] Maintainable CSS
[ ] Things you forgot (or never knew) because of React
[ ] Checklist – The A11Y Project for accessibility
[ ] DevTools Tips
[ ] 67 Weird Debugging Tricks Your Browser Doesn’t Want You to Know
[ ] Client-Side Architecture Basics
[ ] Web Browser Engineering: this book explains how to build a basic but complete web browser, from networking to JavaScript, in a couple thousand lines of Python.

URLs:

[ ] The Great Confusion About URIs
- A URI is a string of characters that identifies a resource. Its syntax is <scheme>:<authority><path>?<query>#<fragment>, where only <scheme> and <path> are mandatory. URL and URN are URIs.
- A URL is a string of characters that identifies a resource located on a computer network. Its syntax depends on its scheme. E.g. mailto:billg@microsoft.com.
- A URN is a string of characters that uniquely identifies a resource. Its syntax is urn:<namespace identifier>:<namespace specific string>. E.g. urn:isbn:9780062301239
[ ] Examples of Great URL Design
[ ] Four Cool URLs – Alex Pounds’ Blog
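
To see the URI components from the notes above in practice, here is a small sketch using Python's standard `urllib.parse.urlsplit`. The `https://example.com` URL is a made-up example. The `mailto:` and `urn:` strings are the ones quoted above, and the `describe` helper is a name I introduced for illustration.

```python
from urllib.parse import urlsplit


def describe(uri: str) -> dict[str, str]:
    """Split a URI into the five generic components named in the notes above."""
    parts = urlsplit(uri)
    return {
        "scheme": parts.scheme,
        "authority": parts.netloc,   # urllib calls the authority "netloc"
        "path": parts.path,
        "query": parts.query,
        "fragment": parts.fragment,
    }


examples = [
    "https://example.com:8080/docs/page?lang=en#intro",  # hypothetical URL
    "mailto:billg@microsoft.com",
    "urn:isbn:9780062301239",
]

for uri in examples:
    print(uri, "->", describe(uri))
```

Note how the `mailto:` and `urn:` examples have an empty authority: everything after the scheme lands in the path, which matches the point that only the scheme and path are mandatory.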

📚 Programming Theory

[ ] Structure & Interpretation of Computer Programs (SICP)
[ ] SICP Videos – 1
[ ] SICP Videos – 2

🧠 Algorithms & Data Structures

Resources

[ ] 📚 Grokking Algorithms
[ ] 📚 Algorithms to Live By
[ ] 📚 Introduction to Algorithms (CLRS)
[ ] 📚 The Algorithm Design Manual
[ ] 📚 Algorithms (Dasgupta)
[ ] 📚 Algorithm Design (Tardos and Kleinberg)
[ ] 📚 Algorithms (Sedgewick)
[ ] Khan Algorithms
[ ] MIT 6.006 – Introduction to Algorithms
[ ] Intro to Algorithms
[ ] Algorithmic Thinking I
[ ] Algorithmic Thinking II
[ ] https://www.youtube.com/watch?v=T_WffoMAaMA
[ ] https://www.coursera.org/specializations/data-structures-algorithms
[ ] https://www.youtube.com/user/mycodeschool
[ ] http://www3.cs.stonybrook.edu/~algorith/
[ ] https://www.youtube.com/watch?v=ufj5_bppBsA&list=PLFDnELG9dpVxQCxuD-9BSy2E7BWY3t5Sm&index=7
[ ] https://www.youtube.com/user/mikeysambol/playlists
[ ] https://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-006-introduction-to-algorithms-fall-2011/
[ ] Programming Conversations
[ ] Efficient Programming with Components
[ ] Four Algorithmic Journeys
[ ] Computer Science: Algorithms, Theory, and Machines
[ ] Data Structures & Algorithms Specialization
[ ] Approximation Algorithms I & II
[ ] http://jeffe.cs.illinois.edu/teaching/algorithms/?#book

compilers & interpreters (advanced programming)

⚙️ Stage 2: Computer Systems, OS, and Architecture

Gain deep understanding of how software runs from transistors to OS to protocols.

computer architecture & databases

[ ] DDIA
[ ] Designing Data-Intensive Applications Summary Guides
[ ] 📖 CS:APP – Computer Systems: A Programmer’s Perspective
[ ] https://csapp.cs.cmu.edu/3e/courses.html
[ ] Nand2Tetris
[ ] UC Berkeley CS61C
[ ] Coursera – Computer Architecture
[ ] MIT Computer System Engineering
[ ] Coursera – Computer Science Algs, Theory, Machines

operating systems

[ ] MIT xv6
[ ] 📖 The Linux Programming Interface: A Linux and UNIX System Programming Handbook
[ ] 📖 Modern Operating Systems, Andrew Tanenbaum, Herbert Bos (not read)
[ ] 📖 Operating Systems: Three Easy Pieces (free book, not read)
[ ] 📖 Linux Kernel Development, Robert Love. A very complete introduction to developing within the Linux Kernel.
[ ] The 10 Operating System Concepts Software Developers Need to Remember
[ ] Play with xv6 on MIT 6.828
[ ] macOS Internals

networking

[ ] 📖 Computer Networking: A Top-Down Approach (Kurose & Ross)
[ ] Wireshark Labs
[ ] gRPC Docs
[ ] https://www.youtube.com/playlist?list=PLoCMsyE1cvdWKsLVyf6cPwCLDIZnOj0NS
[ ] Everything you need to know about DNS
[ ] Computer Networking Fundamentals
[ ] Topics: TCP/IP, UDP, HTTP/2, gRPC, WebSockets, DNS, load balancers, CDNs
[ ] Rate limiting, timeouts, retries, idempotency
[ ] Computer Networking: A Top Down Approach (Kurose)
[ ] gRPC Docs, HTTP: The Definitive Guide
[ ] API Design Patterns – JJ Geewax (Google)
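
For the "timeouts, retries, idempotency" topic in the list above, a minimal sketch of retrying with capped exponential backoff and full jitter might look like this. Treating `ConnectionError` as the only retryable failure, the default delays, and the injectable `sleep` parameter are all my assumptions, and the pattern is only safe when the wrapped operation is idempotent.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    operation: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.1,
    max_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call operation until it succeeds or max_attempts is exhausted."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except ConnectionError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error to the caller
            # Full jitter: wait a random time up to the capped exponential delay.
            cap = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, cap))
    raise AssertionError("unreachable")
```

The jitter spreads retries from many clients over time, so a brief outage is less likely to be followed by a synchronized wave of retries.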

performance engineering

[ ] High Performance Browser Networking
[ ] Brendan Gregg’s Linux performance tools
[ ] “Numbers Every Programmer Should Know”
[ ] Numbers Everyone Should Know
[ ] Latency numbers every programmer should know
[ ] Rob Pike’s 5 Rules of Programming
- You can’t tell where a program is going to spend its time.
- Measure
- Fancy algorithms are slow when n is small, and n is usually small.
- Fancy algorithms are buggier than simple ones
- Data dominates.
[ ] Performance comparison: counting words in Python, Go, C++, C, AWK, Forth, and Rust: a great way to learn about measuring performance.
[ ] The Mathematical Hacker
[ ] Four Kinds of Optimisation
[ ] Load testing, capacity planning, autoscaling, profiling
[ ] CPU profiling, memory usage, caching layers
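
Rob Pike's "Measure" rule above is easy to practice with the standard `timeit` module. This sketch compares membership tests on a list and a set at two sizes. The sizes and repeat counts are arbitrary choices of mine, and the timings will differ from machine to machine, so I do not quote any expected numbers.

```python
import timeit


def contains_list(items: list[int], target: int) -> bool:
    return target in items   # linear scan


def contains_set(items: set[int], target: int) -> bool:
    return target in items   # hash lookup


def measure(n: int, repeats: int = 2000) -> tuple[float, float]:
    """Return total seconds for `repeats` lookups in a list and in a set of size n."""
    data_list = list(range(n))
    data_set = set(data_list)
    target = n - 1           # worst case for the linear scan
    t_list = timeit.timeit(lambda: contains_list(data_list, target), number=repeats)
    t_set = timeit.timeit(lambda: contains_set(data_set, target), number=repeats)
    return t_list, t_set


for n in (8, 10_000):
    t_list, t_set = measure(n)
    print(f"n={n:>6}: list {t_list:.6f}s, set {t_set:.6f}s")
```

Running it at a small and a large n is a quick way to check the "n is usually small" rule against real numbers before reaching for a fancier data structure.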

information theory

🌐 Stage 3: Databases & Data Engineering

Learn how data moves, scales, and persists. Know RDBMS internals and build ETL systems.

PostgreSQL Internals

[ ] The internals of Redis
[ ] RDBMS (PostgreSQL, MySQL): joins, indexes, query plans
[ ] NoSQL: MongoDB, Cassandra, Redis, DynamoDB
[ ] Consistency models: eventual, linearisable, quorum reads/writes
[ ] Search systems: full text (BM25), vector (ANN)
[ ] DDIA, Database Systems (Ramakrishnan), Transaction Processing Concepts
[ ] Berkeley paper on FNTDB, SQL style guides
[ ] RDBMS: Postgres/MySQL: joins, indexes, query plans
[ ] Internals: Postgres Internals, Redis, DynamoDB consistency
[ ] NoSQL survey + DynamoDB docs
[ ] CAP, PACELC, zero-downtime migrations
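
The "quorum reads/writes" item above rests on a counting argument: with N replicas, W write acknowledgements, and R read responses, every read set overlaps every write set exactly when R + W > N. The sketch below checks that rule by brute-force enumeration. The N/W/R naming follows the Dynamo-style convention, which is my assumption about what the item refers to, and overlap alone does not guarantee linearizability.

```python
from itertools import combinations


def quorums_overlap(n: int, w: int, r: int) -> bool:
    """Counting rule: read and write quorums always intersect when r + w > n."""
    if not (1 <= w <= n and 1 <= r <= n):
        raise ValueError("w and r must be between 1 and n")
    return r + w > n


def overlap_by_enumeration(n: int, w: int, r: int) -> bool:
    """Check every possible write set against every possible read set."""
    replicas = range(n)
    return all(
        set(write_set) & set(read_set)
        for write_set in combinations(replicas, w)
        for read_set in combinations(replicas, r)
    )


for n, w, r in [(3, 2, 2), (3, 1, 2), (5, 3, 3)]:
    print(n, w, r, quorums_overlap(n, w, r), overlap_by_enumeration(n, w, r))
```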

databases

[ ] Coursera – Data Systems Specialization
[ ] Coursera – Data Visualization Specialization
[ ] MIT Information and Entropy
[ ] A plain English introduction to CAP Theorem
[ ] PACELC theorem: “in case of network partitioning (P) in a distributed computer system, one has to choose between availability (A) and consistency (C) (as per the CAP theorem), but else (E), even when the system is running normally in the absence of partitions, one has to choose between latency (L) and consistency (C).”
[ ] Zero downtime database migrations (code examples are using Rails but this works great for any programming language)
[ ] Algorithms Behind Modern Storage Systems, ACM Queue
[ ] Let’s Build a Simple Database
[ ] Readings in Database Systems, 5th Edition
[ ] Comparing database types: how database types evolved to meet different needs
[ ] How does a relational database work
[ ] Use the index, Luke
[ ] Course introduction — MySQL for Developers, PlanetScale
[ ] How Query Engines Work
[ ] Why you should probably be using SQLite | Epic Web Dev

Scaling databases:

[ ] How Figma’s Databases Team Lived to Tell the Scale: interesting story about sharding

NoSQL

[ ] NOSQL Patterns
[ ] NoSQL Databases: a Survey and Decision Guidance
[ ] The DynamoDB docs have some great pages:
[ ] Read Consistency
[ ] From SQL to NoSQL
[ ] NoSQL Design for DynamoDB
[ ] Redis Explained

Postgres

[ ] Safe Operations For High Volume PostgreSQL (this is for PostgreSQL but works great for other DBs as well).
[ ] Transaction Isolation in Postgres, explained
[ ] PostgreSQL exercises
[ ] Postgres operations cheat sheet
[ ] Just use Postgres
[ ] Postgres is Enough
[ ] Postgres: Don’t Do This
[ ] PostgreSQL and UUID as primary key

Supplementary

sql proficiency

[ ] SQL Style Guide, metabase analytics, SQL best practices
[ ] AnimateSQL, Lost‑at‑SQL, join deep-dive
[ ] LearnDB-Py for internals, sql-tutorial
[ ] PGExercises, Postgres cheat sheet, UUID primary key guides
[ ] SQL styleguide
[ ] Best practices for writing SQL queries
[ ] Practical SQL for Data Analysis
[ ] Reasons why SELECT * is bad for SQL performance
[ ] Animate SQL
[ ] Lost at SQL, an SQL learning game
[ ] Joins 13 Ways
[ ] spandanb/learndb-py: learn database internals by implementing it from scratch.
[ ] SQL for the Weary
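
To practice reading query plans, which several entries above touch on, here is a small sketch using Python's built-in `sqlite3` module. I use SQLite only because it ships with Python. The plan wording differs between SQLite versions and looks nothing like PostgreSQL's or MySQL's `EXPLAIN` output, and the `users` table and index name are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO users (email, name) VALUES (?, ?)",
    [(f"user{i}@example.com", f"User {i}") for i in range(1000)],
)


def query_plan(sql: str, params: tuple = ()) -> str:
    """Return SQLite's human-readable plan for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)  # last column holds the plan detail


lookup = "SELECT id, name FROM users WHERE email = ?"
params = ("user42@example.com",)

plan_before = query_plan(lookup, params)   # no index yet: expect a full table scan
conn.execute("CREATE INDEX idx_users_email ON users (email)")
plan_after = query_plan(lookup, params)    # the planner can now use the index

print("before:", plan_before)
print("after: ", plan_after)
```

Comparing the plan before and after adding the index is the habit I want to build: check what the planner does instead of guessing.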

data engineering & pipelines

[ ] ETL vs ELT, Apache Kafka, Spark, Airflow, dbt
[ ] Data lake/warehouse/lakehouse architectures
[ ] Metrics pipelines (Zoomvamp), Data Engineering Cookbook
[ ] Streaming Systems book (Hassan), Uber Big Data platform
[ ] A dirty dozen: twelve common metric interpretation pitfalls in online controlled experiments
[ ] datastacktv/data-engineer-roadmap: roadmap to becoming a data engineer
[ ] Awesome Data Engineering Learning Path
[ ] Emerging Architectures for Modern Data Infrastructure
[ ] How to Move Beyond a Monolithic Data Lake to a Distributed Data Mesh
- Data platforms based on the data lake architecture have common failure modes that lead to unfulfilled promises at scale.
- We need to consider domains as the first class concern, apply platform thinking to create self-serve data infrastructure, and treat data as a product.
[ ] MLOps
[ ] Uber’s Big Data Platform: 100+ Petabytes with Minute Latency
[ ] SQL should be the default choice for data transformation logic

🕸️ Stage 4: Distributed Systems & System Design

The cornerstone of Staff Engineering: learn to build large-scale, fault-tolerant systems and to reason about scale, consistency, and tradeoffs.

core concepts & resources

[ ] CAP, PACELC, Consensus (Paxos, Raft), Eventual Consistency, Sharding
[ ] Queues, Pub/Sub, Schedulers, Logs, Coordination Services (ZK, etcd)
[ ] Read-heavy vs write-heavy, durability vs performance
[ ] API Design, Caching, Rate Limiting, Circuit Breaking
[ ] DDIA, The Art of Scalability, AOSA
[ ] 12-Factor, Loige’s fast-app rules, Lethain’s intro to scale
[ ] Patterns: CAP, consensus, sharding, caches, pub/sub, rate-limit, circuit-breaker
[ ] The Twelve-Factor App
[ ] StaffEng System Design Primer
[ ] Distributed Systems 4th Edition
[ ] Distributed Systems Reading Group papers
[ ] Papers We Love
[ ] MIT 6.5840 – Spring 2025
[ ] madd86/awesome-system-design
[ ] https://www.distributed-systems.net/
[ ] The System Design Primer – DonneMartin
[ ] The Log – LinkedIn
[ ] High Scalability Blog
[ ] Awesome Distributed Systems Papers – Murat’s Blog

systems design

[ ] DonneMartin Primer, dancres pages, murat buffalo list
[ ] HighScalability blog, Martin Fowler’s patterns, Conway’s Law
[ ] Microservices: Sam Newman, Uber, Google engineering culture
[ ] ADR practice, designing for scale/disaster

Reading lists:

[ ] 🧰 donnemartin/system-design-primer: learn how to design large scale systems. Prep for the system design interview.
[ ] 🧰 A Distributed Systems Reading List
[ ] 🧰 Foundational distributed systems papers
[ ] 🧰 Services Engineering Reading List
[ ] 🧰 System Design Cheatsheet
[ ] karanpratapsingh/system-design: learn how to design systems at scale and prepare for system design interviews
[ ] A Distributed Systems Reading List

Blogs:

[ ] High Scalability: great blog about system architecture. Its weekly review articles are packed with insights and interesting technology reviews. Check out the all-time favorites.

Books:

[ ] 📖 Building Microservices, Sam Newman (quite complete discussion of microservices)
[ ] 📖 Designing Data-Intensive Applications
[ ] 📖 The Art of Scalability – Abbott & Fisher
[ ] 📖 Site Reliability Engineering – Google
[ ] 📖 Release It! – Michael Nygard

Articles:

[ ] 6 Rules of thumb to build blazing fast web server applications
[ ] The twelve-factor app
[ ] Introduction to architecting systems for scale
[ ] The Log: What every software engineer should know about real-time data’s unifying abstraction: one of those classic articles that everyone should read.
[ ] Turning the database outside-out with Apache Samza
[ ] Fallacies of distributed computing, Wikipedia
[ ] The biggest thing Amazon got right: the platform
- All teams will henceforth expose their data and functionality through service interfaces.
- Monitoring and QA are the same thing.
[ ] Building Services at Airbnb, part 3
- Resilience is a Requirement, Not a Feature
[ ] Building Services at Airbnb, part 4
- Building Schema Based Testing Infrastructure for service development
[ ] Patterns of Distributed Systems, MartinFowler.com
[ ] ConwaysLaw, MartinFowler.com (regarding organization, check out my engineering-management list).
[ ] The C4 model for visualising software architecture
[ ] If Architects had to work like Programmers

Architecture patterns
BFF (backend for frontend)

[ ] Backends For Frontends
[ ] Circuit breaker
[ ] Rate limiter algorithms (and their implementation)
[ ] Interactive Guide: Mastering Rate Limiting
[ ] Load Balancing: a visual exploration of load balancing algos
[ ] Good Retry, Bad Retry: An Incident Story: insightful, well-written story about retries, circuit breakers, deadline, etc.
[ ] AWS Well-Architected Framework
- Operational excellence
- Security
- Reliability
- Performance efficiency
- Cost optimization
- Sustainability
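
As a companion to the rate limiter entries in the list above, here is a minimal token bucket sketch. Token bucket is one common algorithm and I am not claiming it is the one those articles focus on. The class name, parameters, and injectable clock are my own choices for illustration.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow bursts up to `capacity`, refilled at `rate` tokens per second."""

    def __init__(
        self,
        rate: float,
        capacity: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity       # start with a full bucket
        self.updated = clock()

    def allow(self, cost: float = 1.0) -> bool:
        """Return True and spend tokens if the request fits, else False."""
        now = self.clock()
        elapsed = now - self.updated
        self.updated = now
        # Refill for the time that passed, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


bucket = TokenBucket(rate=5.0, capacity=10.0)
print([bucket.allow() for _ in range(12)])  # the first 10 fit in the initial burst
```

Passing the clock in makes the limiter easy to test without sleeping, which is the main design choice in this sketch.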

Microservices/splitting a monolith

[ ] Monolith First, Martin Fowler
[ ] Service oriented architecture: scaling the Uber engineering codebase as we grow
[ ] Don’t start with microservices in production – monoliths are your friend
[ ] Deep lessons from Google And EBay on building ecosystems of microservices
[ ] Introducing domain-oriented microservice architecture, Uber
- Instead of orienting around single microservices, we oriented around collections of related microservices. We call these domains.
- In small organizations, the operational benefit likely does not offset the increase in architectural complexity.
[ ] Best Practices for Building a Microservice Architecture
[ ] 🏙 Avoid Building a Distributed Monolith
[ ] 🏙 Breaking down the monolith
[ ] Monoliths are the future
- “We’re gonna break it up and somehow find the engineering discipline we never had in the first place.”
[ ] 12 Ways to Prepare your Monolith Before Transitioning to Microservices
[ ] Death by a thousand microservices
[ ] Microservices
[ ] Disasters I’ve seen in a microservices world

reliability (site reliability engineering – sre)

[ ] 📖 Site Reliability Engineering
- Written by members of Google’s SRE team, with a comprehensive analysis of the entire software lifecycle – how to build, deploy, monitor, and maintain large scale systems.

Quality is a snapshot at the start of life and reliability is a motion picture of the day-by-day operation. – NIST

Reliability is the one feature every customer uses. – An auth0 SRE

Articles:

[ ] I already mentioned the book Release It! above. There’s also a presentation from the author.
[ ] Service Recovery: Rolling Back vs. Forward Fixing
[ ] How Complex Systems Fail
- Catastrophe requires multiple failures – single point failures are not enough.
- Complex systems contain changing mixtures of failures latent within them.
- Post-accident attribution to a ‘root cause’ is fundamentally wrong.
- Hindsight biases post-accident assessments of human performance.
- Safety is a characteristic of systems and not of their components
- Failure free operations require experience with failure.
[ ] Systems that defy detailed understanding
- Focus effort on systems-level failure, instead of the individual component failure.
- Invest in sophisticated observability tools, aiming to increase the number of questions we can ask without deploying custom code
[ ] Operating a Large, Distributed System in a Reliable Way: Practices I Learned, Gergely Orosz.
- A good summary of processes to implement.
[ ] Production Oriented Development
- Code in production is the only code that matters
- Engineers are the subject matter experts for the code they write and should be responsible for operating it in production.
- Buy Almost Always Beats Build
- Make Deploys Easy
- Trust the People Closest to the Knives
- QA Gates Make Quality Worse
- Boring Technology is Great.
- Non-Production Environments Have Diminishing Returns
- Things Will Always Break
[ ] 🏙 High Reliability Infrastructure migrations, Julia Evans.
[ ] Appendix F: Personal Observations on the Reliability of the Shuttle, Richard Feynman
[ ] Lessons learned from two decades of Site Reliability Engineering
[ ] Service Reliability Mathematics, Addy Osmani

Resources:

[ ] 🧰 dastergon/awesome-sre
[ ] 🧰 upgundecha/howtheysre: a curated collection of publicly available resources on SRE at technology and tech-savvy organizations

infrastructure resilience

[ ] Circuit breaker, retries, load balancing, idempotency
[ ] AWS Well-Architected
[ ] Disaster stories, retrospective slides (Walking Dead, defensive patterns)
[ ] 🏙 The Walking Dead – A Survival Guide to Resilient Applications
[ ] 🏙 Defensive Programming & Resilient systems in Real World (TM)
[ ] 🏙 Full Stack Fest: Architectural Patterns of Resilient Distributed Systems
[ ] 🏙 The 7 quests of resilient software design
[ ] 🧰 Resilience engineering papers: comprehensive list of resources on resilience engineering
[ ] MTTR is more important than MTBF (for most types of F) (also as a presentation)
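
A quick way to put numbers on the MTTR versus MTBF entry above is the common steady-state model, availability = MTBF / (MTBF + MTTR). This is a simplification I am assuming here, not necessarily the linked article's argument, and the hour values are made up for illustration.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability under the simple MTBF / (MTBF + MTTR) model."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("mtbf_hours must be positive and mttr_hours non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


baseline = availability(mtbf_hours=1000, mttr_hours=1.0)
faster_repair = availability(mtbf_hours=1000, mttr_hours=0.5)   # halve MTTR
fewer_failures = availability(mtbf_hours=2000, mttr_hours=1.0)  # double MTBF

print(f"baseline:       {baseline:.6f}")
print(f"halve MTTR:     {faster_repair:.6f}")
print(f"double MTBF:    {fewer_failures:.6f}")
```

In this model availability depends only on the ratio MTTR / MTBF, so halving repair time buys the same availability as doubling the time between failures. Which of the two is cheaper to achieve is the practical question.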

scalability

[ ] Scalable web architecture and distributed systems
[ ] 📖 Scalability Rules: 50 Principles for Scaling Web Sites (presentation)
[ ] Scaling to 100k Users, Alex Pareto. The basics of getting from 1 to 100k users.

papers & blogs

[ ] MapReduce
[ ] GFS
[ ] DynamoDB
[ ] Spanner
[ ] The Log – Jay Kreps
[ ] Martin Fowler on distributed patterns

☁️ Stage 5: Cloud, DevOps & Observability

Know how to deploy, monitor, debug, and scale systems in production. Ship, operate, and automate software at scale.

[ ] Production Ready Microservices — Susan Fowler
[ ] Google SRE Book

🔧 Cloud & Infrastructure

[ ] Terraform Up & Running – Yevgeniy Brikman
[ ] Learn Kubernetes the Hard Way
[ ] Open Guide to AWS, Google CRE guide, customer reliability foundations
[ ] AWS Well-Architected Framework
[ ] Multi-region + multi-account AWS Architecture
[ ] CI/CD: GitHub Actions, CircleCI, GitLab CI
[ ] VPC, IAM, Secrets Management
[ ] https://github.com/open-guides/og-aws
[ ] https://martinfowler.com/articles/continuousIntegration.html
[ ] Terraform: Up & Running
[ ] Kubernetes the Hard Way
[ ] OpenGuide to AWS
[ ] Google Cloud Architecture Framework

ci/cd & devops

[ ] GitHub Actions, Terraform, Docker, Kubernetes, CircleCI, GitLab CI
[ ] AWS/GCP: IAM, S3, Lambda, EC2, RDS, CloudWatch
[ ] CI/CD, Infrastructure as Code, Monitoring
[ ] Continuous Integration (Martin Fowler)
[ ] Docker internals, secrets management, VPC, IAM

🔍 Observability

[ ] USE Method – Brendan Gregg
[ ] RED Metrics – Weaveworks
[ ] Structured Logging + Sentry + Grafana Docs
[ ] Google SRE Book: monitoring, SLOs, error budgets
[ ] Logging: Do not Log, Lies My Parents Told Me, OWASP logging cheat sheet, structured logging
[ ] Monitoring: USE, RED methods, SQL anomaly detection, golden signals, health checks

Logging

[ ] Do not log dwells on some logging antipatterns.
- Logging does not make much sense in monitoring and error tracking. Use better tools instead: error and business monitorings with alerts, versioning, event sourcing.
- Logging adds significant complexity to your architecture. And it requires more testing. Use architecture patterns that will make logging an explicit part of your contracts
- Logging is a whole infrastructure subsystem on its own. And quite a complex one. You will have to maintain it or to outsource this job to existing logging services
[ ] Lies My Parents Told Me (About Logs)
- Logs are cheap
- I can run it better myself
- Leveled logging is a great way to separate information
- Logs are basically the same as events
- A standard logging format is good enough
[ ] Logging – OWASP Cheat Sheet Series
[ ] The Audit Log Wall of Shame: list of vendors that don’t prioritize high-quality, widely-available audit logs for security and operations teams.
[ ] Guide on Structured Logs
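
To make "structured logs" concrete, here is a minimal sketch that emits one JSON object per log line using only the standard `logging` and `json` modules. The `JsonFormatter` class, the `context` key passed through `extra`, and the `checkout` logger name are my own conventions for illustration, not something prescribed by the guides above.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Values passed as extra={"context": {...}} become record.context.
        payload.update(getattr(record, "context", {}))
        return json.dumps(payload, sort_keys=True)


stream = io.StringIO()                   # stands in for stdout or a log shipper
handler = logging.StreamHandler(stream)
handler.setFormatter(JsonFormatter())

logger = logging.getLogger("checkout")   # hypothetical service name
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.propagate = False                 # keep the demo output in one place

logger.info("payment accepted", extra={"context": {"order_id": 1234, "latency_ms": 87}})
line = stream.getvalue().strip()
print(line)
```

Because every field is a key, a log pipeline can filter on `order_id` or aggregate `latency_ms` without parsing free-form text.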

Error/exception handling

[ ] Error handling antipatterns in this repo.
[ ] Writing Helpful Error Messages, Google Developers’ course on Technical Writing
- Explain the problem
- Explain the solution
- Write clearly
[ ] Errors, Errors Everywhere: How We Centralized and Structured Error Handling (for Go, but useful for any languages)
[ ] For inspiration: Handle Errors – Graph API

Metrics

[ ] Meaningful availability
- A good availability metric should be meaningful, proportional, and actionable. By “meaningful” we mean that it should capture what users experience. By “proportional” we mean that a change in the metric should be proportional to the change in user-perceived availability. By “actionable” we mean that the metric should give system owners insight into why availability for a period was low. This paper shows that none of the commonly used metrics satisfy these requirements…
[ ] 📃 Meaningful Availability paper.
- This paper presents and evaluates a novel availability metric: windowed user-uptime

Monitoring

[ ] Google, Site Reliability Engineering, Monitoring Distributed Systems
[ ] Alerting on SLOs
[ ] PagerDuty, Monitoring Business Metrics and Refining Outage Response
[ ] 🧰 crazy-canux/awesome-monitoring: monitoring tools for operations.
[ ] Monitoring in the time of Cloud Native
[ ] How to Monitor the SRE Golden Signals
[ ] From the Google SRE book: Latency, Traffic, Errors, and Saturation
[ ] USE Method (from Brendan Gregg): Utilization, Saturation, and Errors
[ ] RED Method (from Tom Wilkie): Rate, Errors, and Duration
[ ] Simple Anomaly Detection Using Plain SQL
[ ] How percentile approximation works (and why it’s more useful than averages)
[ ] Implementing health checks
[ ] IETF RFC Health Check Response Format for HTTP APIs
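
The "Alerting on SLOs" entry above builds on two pieces of arithmetic: the error budget implied by an SLO, and the burn rate at which that budget is being spent. A minimal sketch, assuming an availability SLO over a 30-day window; the function names and sample numbers are mine.

```python
def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Minutes of allowed unavailability for an availability SLO over the window."""
    if not 0 < slo < 1:
        raise ValueError("slo must be strictly between 0 and 1")
    return (1 - slo) * window_days * 24 * 60


def burn_rate(observed_error_rate: float, slo: float) -> float:
    """How many times faster than 'exactly on budget' errors are occurring."""
    if not 0 < slo < 1:
        raise ValueError("slo must be strictly between 0 and 1")
    return observed_error_rate / (1 - slo)


budget = error_budget_minutes(0.999)   # about 43.2 minutes per 30 days
rate = burn_rate(0.0144, 0.999)        # about 14.4 times the sustainable rate

print(f"budget: {budget:.1f} min, burn rate: {rate:.1f}x")
```

A burn rate of 1 spends the budget exactly by the end of the window. Sustained for one hour, a burn rate of about 14.4 spends roughly 2% of a 30-day budget, since 14.4 / 720 hours = 0.02.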

incident analysis & debugging

[ ] Debugging zine and rubber duck topics
[ ] 5 Whys vs narrative technique, bounded rationality, codinghorror
[ ] Netflix Linux Perf in 60s, root cause guides
[ ] JVNS tcpdump, falsehoods, minimal reproducible example
[ ] Good questions, downtime, SLO incident write-ups

incident analysis

[ ] Incident Response at Heroku
- Describes the Incident Commander role, inspired by natural disaster incident response.
- [ ] Also in presentation: Incident Response Patterns: What we have learned at PagerDuty – Speaker Deck
[ ] The Google SRE book’s chapter about oncall
[ ] Writing Runbook Documentation When You’re An SRE
- Playbooks “reduce stress, the mean time to repair (MTTR), and the risk of human error.”
- Using a template can be beneficial because starting from a blank document is incredibly hard.
- The Curse of Knowledge is a cognitive bias that occurs when someone is communicating with others and unknowingly assumes the level of knowledge of the people they are communicating with.
- Make your content easy to glance over.
- If a script is longer than a single line, treat it like code, and check it into a repository to be source controlled and potentially tested.
[ ] Incident Review and Postmortem Best Practices, Gergely Orosz
[ ] Computer Security Incident Handling Guide, NIST
[ ] Incident Management Resources, Carnegie Mellon University
[ ] Sterile flight deck rule, Wikipedia
[ ] Shamir Secret Sharing: It’s 3am.
[ ] Site Reliability Engineering and the Art of Improvisation has lots of good training ideas
- Walkthroughs of observability toolsets
- Decision requirements table building
- Team knowledge elicitation
- Asking the question, “Why do we have on-call?”
- Spin the Wheel of Expertise!
[ ] Severity Levels, PagerDuty

Alerting

[ ] My Philosophy On Alerting
- Pages should be urgent, important, actionable, and real.
- Err on the side of removing noisy alerts – over-monitoring is a harder problem to solve than under-monitoring.
- Symptoms are a better way to capture more problems more comprehensively and robustly with less effort.
- Include cause-based information in symptom-based pages or on dashboards, but avoid alerting directly on causes.
- The further up your serving stack you go, the more distinct problems you catch in a single rule. But don’t go so far you can’t sufficiently distinguish what’s going on.
- If you want a quiet oncall rotation, it’s imperative to have a system for dealing with things that need timely response, but are not imminently critical.
- [ ] This classic article has now become a chapter in Google’s SRE book.
[ ] 🏙 The Paradox of Alerts: why deleting 90% of your paging alerts can make your systems better, and how to craft an on-call rotation that engineers are happy to join.
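One way to picture the symptom-over-cause advice from My Philosophy On Alerting: page on what users actually experience (failed requests at the top of the serving stack) and keep causes such as CPU or disk usage for dashboards. The `WindowStats` type, the 1% threshold, and the minimum-traffic floor below are my own illustrative assumptions, not values from the article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WindowStats:
    """Request counts observed at the edge of the serving stack over one time window."""
    total_requests: int
    failed_requests: int


def should_page(stats, max_error_ratio=0.01, min_requests=100):
    """Page only on a user-visible symptom: too many failed requests in the window."""
    if stats.total_requests < min_requests:
        # Too little traffic to judge; a noisy page here would not be actionable.
        return False
    return stats.failed_requests / stats.total_requests > max_error_ratio
```

A single rule like this catches many distinct causes (bad deploy, overloaded database, broken dependency) without needing one alert per cause.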

Postmortem

[ ] A great example of a postmortem from GitLab (01/31/2017) for an outage during which an engineer’s action caused the irremediable loss of 6 hours of data.
[ ] Blameless PostMortems and a Just Culture
[ ] A list of postmortems on GitHub
[ ] Google’s SRE book, Postmortem chapter is excellent and includes many examples.
[ ] Human error models and management
- High reliability organisations — which have less than their fair share of accidents — recognise that human variability is a force to harness in averting errors, but they work hard to focus that variability and are constantly preoccupied with the possibility of failure

“Let’s plan for a future where we’re all as stupid as we are today.”
– Dan Milstein

Example outline for a postmortem:

Executive Summary
- Impact
- Root cause
Impact
- Number of impacted users
- Lost revenue
- Duration
- Team impact
Timeline
- Detection
- Resolution
Root cause analysis
- E.g. with 5 whys method
Lessons learned
- Things that went well
- Things that went poorly
Action items (include direct links to task tracking tool)
- Tasks to improve prevention (including training)
- Tasks to improve detection (including monitoring and alerting)
- Tasks to improve mitigation (including emergency response)

debugging

[ ] Rubber Duck Problem Solving
[ ] Rubber Ducking
[ ] Five Whys
[ ] The Five Lies Analysis
- The real problem reveals itself when the technique becomes a part of a template.
- Action items can be very distant from the root cause.
- [ ] Related article: The Evolution of SRE at Google
[ ] The Infinite Hows criticizes the five whys method and advocates for a different set of questions to learn the most from incidents.
- [ ] See also: Human errors: models and management
- “The issue with the Five Whys is that it’s tunnel-visioned into a linear and simplistic explanation of how work gets done and events transpire.”
- “Human error becomes a starting point, not a conclusion.” (Dekker, 2009)
- “When we ask ‘how?’, we’re asking for a narrative.”
- “When it comes to decisions and actions, we want to know how it made sense for someone to do what they did.”
- At each “why” step, only one answer will be selected for further investigation. Asking “how” encourages broader exploration.
- [ ] “In accident investigation, as in most other human endeavours, we fall prey to the What-You-Look-For-Is-What-You-Find or WYLFIWYF principle. This is a simple recognition of the fact that assumptions about what we are going to see (What-You-Look-For), to a large extent will determine what we actually find (What-You-Find).” (Hollnagel, 2009, p. 85) (see illustration of WYLFIWYF)
- “A final reason why a ‘root cause’ may be selected is that it is politically acceptable as the identified cause. Other events or explanations may be excluded or not examined in depth because they raise issues that are embarrassing to the organization or its contractors or are politically unacceptable.” (Nancy Leveson, Engineering a Safer World, p. 20)
- [ ] Bounded rationality: rational individuals will select a decision that is satisfactory rather than optimal
  - The article provides concrete ways and questions to solicit stories from people, which will yield better insights.
    - What were you expecting to happen?
    - If you had to describe the situation to your colleague at that point, what would you have told?
    - Did this situation fit a standard scenario?
    - What were you trying to achieve? Were there multiple goals at the same time? Was there time pressure or other limitations on what you could do?
    - [ ] See template here
[ ] Linux Performance Analysis in 60,000 Milliseconds
[ ] Post-Mortems at HubSpot: What I Learned From 250 Whys
[ ] Debugging zine, Julia Evans
[ ] If you understand a bug, you can fix it
[ ] The Thirty Minute Rule: if anyone gets stuck on something for more than 30 minutes, they should ask for help
[ ] How to create a Minimal, Reproducible Example, Stack Overflow
[ ] Some ways to get better at debugging, Julia Evans
- [ ] Learn the codebase
- [ ] Learn the system (e.g., HTTP stack, database transactions)
- [ ] Learn your tools (e.g., strace, tcpdump)
- [ ] Learn strategies (e.g., writing code to reproduce, adding logging, taking a break)
- [ ] Get experience: according to a study, “experts simply formed more correct hypotheses and were more efficient at finding the fault.”
[ ] What exactly is the ‘Saff Squeeze’ method of finding a bug?
- A systematic technique for deleting both test code and non-test code from a failing test until the test and code are small enough to understand.
[ ] tcpdump is amazing, Julia Evans
[ ] What we talk about when we talk about ‘root cause’
[ ] David A. Wheeler’s Review of “Debugging” by David J. Agans
[ ] Troubleshooting: The Skill That Never Goes Obsolete
- Includes links to interesting debugging stories
[ ] Falsehoods software teams believe about user feedback

testing

[ ] ⭐️ Testing strategies in a microservices architecture (Martin Fowler) is an awesome resource explaining how to test a service properly.
[ ] 🧰 Testing Distributed Systems

Why test:

[ ] Why bother writing tests at all?, Dave Cheney. A good intro to the topic.
- Even if you don’t, someone will test your software
- The majority of testing should be performed by development teams
- Manual testing should not be the majority of your testing because manual testing is O(n)
- Tests are the critical component that ensure you can always ship your master branch
- Tests lock in behaviour
- Tests give you confidence to change someone else’s code

How to test:

[ ] A quick puzzle to test your problem solving… and a great way to learn about confirmation bias and why you’re mostly writing positive test cases.
[ ] Testing is not for beginners: why learning to test is hard. This shouldn’t demotivate you though!
[ ] Arrange-act-assert: a pattern for writing good tests
[ ] Test smarter, not harder
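To make the arrange-act-assert item concrete, here is a minimal sketch using Python's built-in `unittest`. `ShoppingCart` is a hypothetical class invented for the example. The second test is a deliberately negative case, since the confirmation-bias puzzle above is a reminder that positive cases come too easily.

```python
import unittest


class ShoppingCart:
    """Tiny hypothetical class under test."""

    def __init__(self):
        self._prices = []

    def add(self, price):
        if price < 0:
            raise ValueError("price must not be negative")
        self._prices.append(price)

    def total(self):
        return sum(self._prices)


class ShoppingCartTest(unittest.TestCase):
    def test_total_sums_added_prices(self):
        # Arrange: build the object in a known state.
        cart = ShoppingCart()
        cart.add(5)
        cart.add(7)

        # Act: perform the single behaviour being tested.
        total = cart.total()

        # Assert: check the observable outcome.
        self.assertEqual(total, 12)

    def test_negative_price_is_rejected(self):
        # Arrange
        cart = ShoppingCart()

        # Act / Assert: the negative case should fail loudly and leave state untouched.
        with self.assertRaises(ValueError):
            cart.add(-1)
        self.assertEqual(cart.total(), 0)
```

Saved as a file, this runs with `python -m unittest <file>`; keeping the three phases visually separate makes it obvious what each test is really checking.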

Test pyramid:

[ ] The test pyramid, Martin Fowler
[ ] Eradicating non-determinism in tests, Martin Fowler
[ ] The practical test pyramid, MartinFowler.com
- Be clear about the different types of tests that you want to write. Agree on the naming in your team and find consensus on the scope of each type of test.
- Every single test in your test suite is additional baggage and doesn’t come for free.
- Test code is as important as production code.
[ ] Software testing anti-patterns, Kostis Kapelonis.
[ ] Write tests. Not too many. Mostly integration. for a contrarian take about unit testing
[ ] 🎞 Unit test 2, Integration test: 0
[ ] Testing in the Twenties
[ ] Google Testing Blog: Test Sizes
[ ] Pyramid or Crab? Find a testing strategy that fits, web.dev
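A small sketch related to the non-determinism item above: one common source of flaky tests is reading the real clock, and one common remedy is to pass the clock in. `Session` and `FakeClock` are hypothetical names for illustration. This is one approach under those assumptions, not necessarily the one the article recommends.

```python
from datetime import datetime, timedelta, timezone


def system_clock():
    """Default clock: the real current time in UTC."""
    return datetime.now(timezone.utc)


class Session:
    """Hypothetical session that expires after a fixed time-to-live."""

    def __init__(self, ttl, now=system_clock):
        self._now = now
        self._expires_at = now() + ttl

    def is_expired(self):
        return self._now() >= self._expires_at


class FakeClock:
    """Test double: time only moves when the test says so."""

    def __init__(self, start):
        self.current = start

    def __call__(self):
        return self.current

    def advance(self, delta):
        self.current += delta
```

In a test, `Session(timedelta(minutes=30), now=FakeClock(...))` can be pushed past expiry with `advance(...)` instead of sleeping, so the result never depends on how fast the machine is.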

End-to-end tests:

[ ] Just say no to more end-to-end tests, Google Testing Blog
[ ] End-to-end testing considered harmful

🧠 Stage 6: AI, ML, and Deep Learning

Build intelligence-powered systems and become ML/AI fluent.

[ ] https://www.coursera.org/specializations/aml

[ ] Coursera AML Specialization
[ ] ML Foundations, Regression, Classification, Clustering Specializations
[ ] Deep Learning.ai Specialization (NN, CNN, Sequence, Structuring Projects)
[ ] CS231n (Stanford), Fast.ai, RLL Berkeley, neuralnetworksanddeeplearning.com
[ ] Books: Deep Learning by Goodfellow, Grokking DL, Matrix Calculus explained
[ ] Applied AI Tools: Vector DBs (FAISS, Milvus, Pinecone, Qdrant), semantic/hybrid search, RAG
[ ] Libraries: LangChain, LlamaIndex, Haystack, HuggingFace
[ ] Full Stack Deep Learning; MLOps; course on TensorFlow without PhD

artificial intelligence

machine learning

machine learning specialisation by university of washington on coursera

others

[ ] Coursera ML Specialization (Andrew Ng)
[ ] Grokking Deep Learning
[ ] Elements of Statistical Learning
[ ] https://www.analyticsvidhya.com/blog/2015/07/top-youtube-videos-machine-learning-neural-network-deep-learning/
[ ] Statistical Machine Learning 10-702/36-702
[ ] https://www.udacity.com/ai
[ ] https://www.udacity.com/drive
[ ] https://www.udacity.com/course/machine-learning-engineer-nanodegree--nd009
[ ] https://www.edx.org/xseries/data-science-engineering-apacher-sparktm
[ ] https://www.coursera.org/specializations/data-mining
[ ] https://www.coursera.org/specializations/machine-learning
[ ] http://web.stanford.edu/class/cs20si/syllabus.html
[ ] https://work.caltech.edu/telecourse.html
[ ] https://www.youtube.com/watch?v=bxe2T-V8XRs
[ ] https://www.youtube.com/watch?v=UVwwYZMFocg&list=PLiaHhY2iBX9ihLasvE8BKnS2Xg8AhY6iV&index=8
[ ] https://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-868j-the-society-of-mind-fall-2011/video-lectures/
[ ] https://www.coursera.org/specializations/gcp-data-machine-learning

deep learning

[ ] CS231n (Stanford)
[ ] DeepLearning.ai Specialization
[ ] Neural Networks & Backprop, CNNs, RNNs, Transformers
[ ] PyTorch, TensorFlow, HuggingFace

Deep Learning by deeplearning.ai on Coursera

[ ] Neural Networks and Deep Learning
[ ] Improving Deep Neural Networks: Hyperparameter Tuning, Regularization, and Optimization
[ ] Structuring Machine Learning Projects
[ ] Convolutional Neural Networks
[ ] Sequence Models

Goals:

[ ] different activation functions (sigmoid/tanh/relu)
[ ] different cost functions
[ ] with and without bias units
[ ] classification and regression problems
[ ] text / binary / image / recommenders
[ ] batch vs stochastic
[ ] JS, Python, PHP, Matlab, TensorFlow, SciKitLearn
[ ] create visualizations and blog explanations
[ ] Audit best courses / books
[ ] http://explained.ai/matrix-calculus/index.html
[ ] Practical Deep Learning For Coders
[ ] https://classroom.udacity.com/courses/ud730
[ ] http://neuralnetworksanddeeplearning.com/
[ ] http://course.fast.ai/
[ ] http://www.deeplearningbook.org/
[ ] http://cs231n.github.io/ + https://www.youtube.com/playlist?list=PLlJy-eBtNFt6EuMxFYRiNRS07MCWN5UIA
[ ] https://www.youtube.com/playlist?list=PL6Xpj9I5qXYEcOhn7TqghAJ6NAPrNmUBH
[ ] http://rll.berkeley.edu/deeprlcourse/
[ ] http://rll.berkeley.edu/deeprlcourse/#lecture-videos
[ ] http://introtodeeplearning.com/index.html
[ ] https://www.youtube.com/watch?v=21EiKfQYZXc&app=desktop
[ ] https://courses.csail.mit.edu/6.042/spring17/mcs.pdf
[ ] http://yerevann.com/a-guide-to-deep-learning/
[ ] https://www.coursera.org/learn/neural-networks
[ ] https://www.youtube.com/playlist?list=PLE6Wd9FR--EfW8dtjAuPoTuPcqmOV53Fu
[ ] https://cloud.google.com/blog/big-data/2017/01/learn-tensorflow-and-deep-learning-without-a-phd
[ ] https://www.udacity.com/course/deep-learning--ud730
[ ] http://nbviewer.jupyter.org/github/domluna/labs/blob/master/Build%20Your%20Own%20TensorFlow.ipynb
[ ] https://goc.vivint.com/problems/mlc
[ ] http://blog.floydhub.com/coding-the-history-of-deep-learning/
[ ] https://stats385.github.io/
[ ] https://p.migdal.pl/interactive-machine-learning-list/
[ ] https://scrimba.com/g/gneuralnetworks
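For the first goal in the list above (comparing sigmoid, tanh, and ReLU), a minimal pure-Python sketch of the three activation functions on scalars. Real implementations would be vectorized in NumPy, PyTorch, or TensorFlow; the branch in `sigmoid` is just one way to avoid overflow for large negative inputs.

```python
import math


def sigmoid(x):
    """Logistic function, squashing x into (0, 1)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    # For negative x, exp(x) is small, so this form cannot overflow.
    z = math.exp(x)
    return z / (1.0 + z)


def tanh(x):
    """Hyperbolic tangent, squashing x into (-1, 1) and centred on zero."""
    return math.tanh(x)


def relu(x):
    """Rectified linear unit: passes positives through, clamps negatives to zero."""
    return max(0.0, x)


def sigmoid_derivative(x):
    """Gradient of sigmoid, which shrinks toward zero for large |x| (saturation)."""
    s = sigmoid(x)
    return s * (1.0 - s)
```

Comparing `sigmoid_derivative` at 0 and at 10 shows the saturation that makes ReLU attractive in deeper networks.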

data mining & recommenders

nlp & computer vision

nlp

image & computer vision

electives

[ ] http://cagd.cs.byu.edu/~557/text/ch1.pdf
[ ] https://www.coursera.org/learn/data-driven-astronomy
[ ] https://www.coursera.org/specializations/genomic-data-science
[ ] https://www.coursera.org/learn/data-genes-medicine
[ ] https://www.coursera.org/specializations/systems-biology
[ ] https://www.coursera.org/specializations/networking-basics
[ ] https://www.coursera.org/learn/neurohacking
[ ] https://www.youtube.com/playlist?list=PLUl4u3cNGP62K2DjQLRxDNRi0z2IRWnNh
[ ] Raft/Paxos CAP Theorem / Redundancy

Resources

[ ] https://www.youtube.com/playlist?list=PLoROMvodv4rMWw6rRoeSpkiseTHzWj6vu&disable_polymer=true
[ ] https://github.com/open-source-society/data-science
[ ] https://unsupervisedmethods.com/over-150-of-the-best-machine-learning-nlp-and-python-tutorials-ive-found-ffce2939bd78
[ ] http://www.scipy-lectures.org/
[ ] https://github.com/mr-mig/every-programmer-should-know
[ ] https://online-learning.harvard.edu/series/professional-certificate-data-science
[ ] computational geometry https://www.youtube.com/watch?v=rho8QqiHOe4
[ ] kaggle school https://www.kaggle.com/learn/overview
[ ] MIT self driving https://selfdrivingcars.mit.edu/
[ ] MIT AGI https://agi.mit.edu/
[ ] https://ai.google/education
[ ] https://mlcourse.ai/
[ ] https://mml-book.github.io/
[ ] https://github.com/lexfridman/mit-deep-learning/blob/master/README.md#mit-deep-learning
[ ] http://d2l.ai/chapter_introduction/index.html
[ ] https://www.jgoertler.com/visual-exploration-gaussian-processes/
[ ] https://lectures.quantecon.org/py/short_path.html
[ ] http://webdam.inria.fr/Alice/ [databases]
[ ] https://hacker-tools.github.io/

applied ai systems

[ ] Vector DBs: FAISS, Milvus, Qdrant
[ ] Hybrid Search: BM25 + Vectors
[ ] RAG Pipelines (LangChain, LlamaIndex)
[ ] Full Stack Deep Learning
[ ] LLM Ops Playbooks (Haystack, LangChain)
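A minimal sketch for the hybrid search item above, assuming the keyword (BM25) and vector searches have already produced ranked lists of document IDs. It merges them with reciprocal rank fusion, which is one common fusion method that I picked for illustration. The document IDs and the constant `k=60` are assumptions, not something taken from the tools listed.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document IDs into one list.

    Each document scores 1 / (k + rank) per list it appears in, so items
    ranked well by both retrievers rise to the top.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest score first; ties broken by ID so the output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from the two retrievers for the same query.
bm25_ranking = ["doc_a", "doc_b", "doc_c"]
vector_ranking = ["doc_b", "doc_d", "doc_a"]

fused = reciprocal_rank_fusion([bm25_ranking, vector_ranking])
```

Because it only uses ranks, this avoids having to normalise BM25 scores against cosine similarities, which live on different scales.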

Tools

LangChain, LlamaIndex, Haystack
FAISS, Pinecone, Qdrant, Weaviate

🧭 Stage 7: Staff Engineer Influence, Docs, and Architecture

Learn to drive alignment, lead with documents, and scale your impact beyond code. Influence systems, architecture, and culture beyond code.

📚 Core Reading

leadership & management

[ ] Staff Engineer – Will Larson, Tanya Reilly
[ ] Staff Engineering Guides (StaffEng.com)
[ ] The Manager’s Path – Camille Fournier
[ ] The Art of Leadership – Michael Lopp
[ ] Will Larson’s “Staff Engineer: Leadership Beyond the Management Track”
[ ] Tanya Reilly’s “The Staff Engineer’s Path”
[ ] https://github.com/charlax/engineering-management

architecture & documentation

[ ] Building Microservices – Sam Newman
[ ] Google Eng Practices Writing
[ ] ADRs, RFCs, One-/Six-pager docs, design templates (Stripe, Google AIPs)

documentation

[ ] Documentation-Driven Development
[ ] Writing automated tests for your documentation: this should be required, IMO. Testing code samples in your documentation ensures they never get outdated.
[ ] 🏙 Documentation is king, Kenneth Reitz
[ ] Keep a Changelog
[ ] Architectural Decision Records (ADR): a way to document architecture decisions.
[ ] Documenting Architecture Decisions
[ ] joelparkerhenderson/architecture-decision-record: examples and templates for ADR.
- [ ] And a CLI tool: npryce/adr-tools
[ ] The documentation system
[ ] Checklist for checklists
[ ] Best practices for writing code comments
[ ] Always be quitting
- Document your knowledge
- Train your replacement
- Delegate
- By being disposable, you free yourself to work on high-impact projects.
[ ] Write documentation first. Then build.
[ ] Diátaxis: a systematic approach to technical documentation authoring
- There are four modes: tutorials, how-to guides, technical reference and explanation
- The docs go into a lot of detail about each mode.
[ ] ARCHITECTURE.md
[ ] Two open source projects with great documentation (esbuild and redis)
[ ] Rules for Writing Software Tutorials
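To illustrate the item above about testing code samples in documentation, here is a minimal sketch using Python's standard `doctest` module: the examples in the docstring are executed and compared against the output shown. `slugify` and the `run_examples` helper are hypothetical names made up for the example.

```python
import doctest


def slugify(title):
    """Turn a title into a URL-friendly slug.

    >>> slugify("Keep a Changelog")
    'keep-a-changelog'
    >>> slugify("  Write docs   first ")
    'write-docs-first'
    """
    return "-".join(title.lower().split())


def run_examples(func):
    """Run the docstring examples of one function and return (failed, attempted)."""
    finder = doctest.DocTestFinder()
    runner = doctest.DocTestRunner(verbose=False)
    for test in finder.find(func, name=func.__name__, globs={func.__name__: func}):
        runner.run(test)
    return runner.summarize(verbose=False)


results = run_examples(slugify)
```

If someone changes `slugify` without updating the docstring, `results.failed` becomes non-zero, which is exactly the "never outdated" property the item asks for. For a whole module, `python -m doctest <file>` does the same job from the command line.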

The palest ink is more reliable than the most powerful memory. — Chinese proverb

api design & development

General REST content:

[ ] Architectural Styles and the Design of Network-based Software Architectures, Roy Fielding (the inventor of REST)
[ ] A collection of useful resources for building RESTful HTTP+JSON APIs.
[ ] Best practices for REST API design, Stack Overflow Blog
[ ] 📖 Undisturbed REST: a guide to designing the perfect API: very complete book about RESTful API design.

Example guidelines:

[ ] Microsoft’s Rest API guidelines
[ ] Zalando RESTful API and Event Scheme Guidelines
[ ] Google’s API Design Guide: a general guide to designing networked APIs.
[ ] AIP-1: AIP Purpose and Guidelines
- AIP stands for API Improvement Proposal, which is a design document providing high-level, concise documentation for API development.

More specific topics:

[ ] Why you should use links, not keys, to represent relationships in APIs, Martin Nally, Google
- “Using links instead of foreign keys to express relationships in APIs reduces the amount of information a client needs to know to use an API, and reduces the ways in which clients and servers are coupled to each other.”
[ ] Give me /events, not webhooks
- Events can unlock much-needed webhook features, like allowing your webhook consumers to replay or reset the position of their webhook subscription.
[ ] Unlocking the Power of JSON Patch
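A rough sketch of the idea behind "Give me /events, not webhooks": if events live in an append-only log and the consumer holds a cursor, replaying or resetting position is just a matter of asking again with an older cursor. The in-memory `EventLog`, the `after` and `limit` parameters, and the response shape are all assumptions for illustration, not the article's actual API.

```python
class EventLog:
    """In-memory stand-in for an append-only events table."""

    def __init__(self):
        self._events = []

    def append(self, event_type, payload):
        """Store an event with a monotonically increasing ID."""
        event = {"id": len(self._events) + 1, "type": event_type, "payload": payload}
        self._events.append(event)
        return event

    def list_events(self, after=0, limit=100):
        """What a hypothetical GET /events?after=<cursor>&limit=<n> might return."""
        page = [e for e in self._events if e["id"] > after][:limit]
        # If nothing new arrived, the cursor stays where it was.
        next_cursor = page[-1]["id"] if page else after
        return {"events": page, "next_cursor": next_cursor}
```

The consumer stores `next_cursor` after each successful batch; a crash simply means resuming from the last stored value, and a replay means starting again from `after=0`.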

design (oo modelling, architecture, patterns, anti-patterns)

Here’s a list of good books:

[ ] 📖 Design Patterns: Elements of Reusable Object-Oriented Software: known as the “Gang of Four” book, this is almost required reading for any developer. Many of the patterns are a bit overkill for Python (because everything is an object, and typing is dynamic), but the main idea (composition is better than inheritance) is definitely a good philosophy.
- [ ] And their nefarious nemesis Resign Patterns
[ ] 📖 Patterns of Enterprise Application Architecture: learn how databases are used in real-world applications. Mike Bayer’s SQLAlchemy has been heavily influenced by this book.
[ ] 📖 Domain-Driven Design: Tackling Complexity in the Heart of Software, Eric Evans
[ ] 📖 Clean Architecture, Robert C. Martin. Uncle Bob proposes an architecture that leverages the Single Responsibility Principle to its fullest. A great way to start a new codebase. Also check out the clean architecture cheatsheet and this article.
[ ] 📖 Game Programming Patterns: a book about design, sequencing, behavioral patterns and much more by Robert Nystrom explained through the medium of game programming. The book is also free to read online here.
[ ] One of the absolute references on architecture is Martin Fowler: check out his Software Architecture Guide.
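A small sketch of the "composition is better than inheritance" idea mentioned in the Design Patterns note above. Instead of subclassing `Report` once per output format, the report holds a formatter object and delegates to it. All class names here are hypothetical.

```python
import json


class JsonFormatter:
    """Formats a record as JSON with stable key order."""

    def format(self, record):
        return json.dumps(record, sort_keys=True)


class TextFormatter:
    """Formats a record as key=value pairs."""

    def format(self, record):
        return ", ".join(f"{key}={value}" for key, value in sorted(record.items()))


class Report:
    """Composed with a formatter rather than subclassed per output format."""

    def __init__(self, formatter):
        self._formatter = formatter

    def render(self, record):
        return self._formatter.format(record)
```

Adding a new format means writing one small class with a `format` method; `Report` itself never changes, and thanks to duck typing no shared base class is needed.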

Articles:

[ ] O’Reilly’s How to make mistakes in Python
[ ] Education of a Programmer: a developer’s thoughts after 35 years in the industry. There’s a particularly good section about design & complexity (see “the end to end argument”, “layering and componentization”).
[ ] Domain-driven design, Wikipedia.
[ ] On the Spectrum of Abstraction 🎞, Cheng Lou
[ ] The “Bug-O” Notation, Dan Abramov
[ ] Antipatterns
[ ] Inheritance vs. composition: a concrete example in Python. Another slightly longer one here. One last one, in Python 3.
[ ] Composition Instead Of Inheritance
[ ] Complexity and Strategy: interesting perspective on complexity and flexibility with really good examples (e.g. Google Apps Suite vs. Microsoft Office).
[ ] The Architecture of Open Source Applications
[ ] The Robustness Principle Reconsidered
- Jon Postel: “Be conservative in what you do, be liberal in what you accept from others.” (RFC 793)
- Two general problem areas are impacted by the Robustness Principle: orderly interoperability and security.
[ ] Basics of the Unix Philosophy, Eric S Raymond
[ ] Eight Habits of Expert Software Designers: An Illustrated Guide
[ ] No Silver Bullet – Essence and Accident in Software Engineering, Frederick P. Brooks, Jr. (1986)
- There are four properties of software systems which make building software hard: Complexity, Conformity, Changeability and Invisibility
- There are ways to address this:
  - Exploiting the mass market to avoid constructing what can be bought. (“Buy vs. Build”)
  - Using rapid prototyping as part of a planned iteration in establishing software requirements.
  - Growing software organically, adding more and more function to systems as they are run, used, and tested
  - Identifying and developing the great conceptual designers of the rising generation.
  - (also included in The Mythical Man-Month)
[ ] Out of the Tar Pit, Ben Moseley, Peter Marks (2006) introduces the distinction between essential and accidental complexity
- Complexity is the root cause of the vast majority of problems with software today. Unreliability, late delivery, lack of security — often even poor performance in large-scale systems can all be seen as deriving ultimately from unmanageable complexity.
- Quoting Dijkstra: “testing is hopelessly inadequate….(it) can be used very effectively to show the presence of bugs but never to show their absence.”
- Functional programming goes a long way towards avoiding the problems of state-derived complexity, thanks to immutability and clear separation of state and logic.
[ ] A Note on Essential Complexity
- The goal of the software engineer is to minimize accidental complexity and assist with essential complexity.
[ ] Software Design is Knowledge Building
- Programming should be regarded as an activity by which the programmers form or achieve a certain kind of insight, a theory, of the matters at hand. This suggestion is in contrast to what appears to be a more common notion, that programming should be regarded as a production of a program and certain other texts.
- The building of the program is the same as the building of the theory of it by the team of programmers.
[ ] Cognitive load is what matters
- A well-crafted monolith with truly isolated modules is often much more flexible than a bunch of microservices.
- Three decades on, microkernel-based GNU Hurd is still in development, and monolithic Linux is everywhere
- “Reduce cognitive load by limiting the number of choices.” (Rob Pike)
- The same rule applies to all sorts of numeric statuses (in the database or wherever) – prefer self-describing strings.
- With a sufficient number of users of an API, it does not matter what you promise in the contract: all observable behaviors of your system will be depended on by somebody. (Hyrum’s Law)
- DDD is about problem space, not about solution space.
- Familiarity is not the same as simplicity
- The more mental models there are to learn, the longer it takes for a new developer to deliver value.
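The "prefer self-describing strings" point from Cognitive load is what matters can be sketched as follows. `OrderStatus` and its values are hypothetical; the idea is only that a stored or serialized `"shipped"` needs no lookup table, whereas a bare `3` does.

```python
from enum import Enum


class OrderStatus(str, Enum):
    """Statuses stored and serialized as readable strings instead of magic numbers."""

    PENDING = "pending"
    PAID = "paid"
    SHIPPED = "shipped"
    CANCELLED = "cancelled"


def serialize_order(order_id, status):
    """Build the payload that would be written to the database or returned by an API."""
    return {"id": order_id, "status": status.value}


def parse_status(raw):
    """Convert a stored string back into the enum; unknown values raise ValueError."""
    return OrderStatus(raw)
```

Anyone reading a row or a log line sees `"status": "shipped"` and understands it immediately, with no extra mental model to load.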

You can use an eraser on the drafting table or a sledge hammer on the construction site. (Frank Lloyd Wright)

Resources:

[ ] 🧰 Design Principles

Design: database schema

[ ] A humble guide to database schema design, Mike Alche
- Use at least third normal form
- Create a last line of defense with constraints
- Never store full addresses in a single field
- Never store firstname and lastname in the same field
- Establish conventions for table and field names.
[ ] YAGRI: You are gonna read it: store created_at, created_by etc.
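A minimal sketch of a few of the schema guidelines above, using SQLite through Python's standard `sqlite3` module: constraints as a last line of defense, first and last name in separate fields, an address split into parts, and YAGRI-style `created_at` / `created_by` columns. The table and column names are my own assumptions.

```python
import sqlite3

SCHEMA = """
CREATE TABLE users (
    id          INTEGER PRIMARY KEY,
    first_name  TEXT NOT NULL,
    last_name   TEXT NOT NULL,
    email       TEXT NOT NULL UNIQUE,
    created_at  TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    created_by  TEXT NOT NULL
);

CREATE TABLE addresses (
    id            INTEGER PRIMARY KEY,
    user_id       INTEGER NOT NULL REFERENCES users(id),
    street        TEXT NOT NULL,
    city          TEXT NOT NULL,
    postal_code   TEXT NOT NULL,
    country_code  TEXT NOT NULL CHECK (length(country_code) = 2)
);
"""


def open_database():
    """Create an in-memory database with the schema and foreign keys enforced."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn
```

With this in place, a duplicate email, a three-letter country code, or an address pointing at a missing user is rejected by the database itself with `sqlite3.IntegrityError`, even if application code forgets to validate.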

Design: patterns

[ ] KeystoneInterface, Martin Fowler.
- Build all the back-end code, integrate, but don’t build the user-interface
[ ] 101 Design Patterns & Tips for Developers
[ ] Python Design Patterns: For Sleek And Fashionable Code: a pretty simple introduction to common design patterns (Facade, Adapter, Decorator). A more complete list of design pattern implementations in Python on GitHub.
[ ] SourceMaking’s Design Patterns seems to be a good web resource too.
[ ] Anti-If: The missing patterns

Design: simplicity

Simple Made Easy 🎞, Rich Hickey. This is an incredibly inspiring talk redefining simplicity, ease and complexity, and showing that solutions that look easy may actually harm your design.

coaching & culture

Mentoring strategies, code review frameworks
Inclusive teams, feedback, performance coaching, async culture

strategic vision

Technical debt frameworks, vision roadmapping
Staff roles: Tech Lead, Architect, Solver, Right-hand
Conference talks, strategy presentation skills (C4 Model)

career growth

[ ] The Conjoined Triangles of Senior-Level Development looks into how to define a senior engineer.
[ ] Ten Principles for Growth as an Engineer, Dan Heller.
[ ] Don’t Call Yourself a Programmer, Patrick McKenzie.
[ ] On being an Engineering Manager
[ ] The career advice I wish I had at 25
- A career is a marathon, not a sprint
- Most success comes from repetition, not new things
- If work was really so great all the rich people would have the jobs
- Management is about people, not things
- Genuinely listen to others
- Recognise that staff are people with finite emotional capacity
- Don’t just network with people your own age
- Never sacrifice personal ethics for a work reason
- Recognise that failure is learning
[ ] Career advice I wish I’d been given when I was young
- Don’t focus too much on long-term plans.
- Find good thinkers and cold-call the ones you most admire.
- Assign a high value to productivity over your whole lifespan.
- Don’t over-optimise things that aren’t your top priority.
- Read a lot, and read things that people around you aren’t reading.
- Reflect seriously on what problem to prioritise solving.
- Read more history.
[ ] Why Good Developers are Promoted into Unhappiness, Rob Walling. Or why management might not be for you.
[ ] A guide to using your career to help solve the world’s most pressing problems
[ ] What’s a senior engineer’s job? You need to be more than just an individual contributor.
[ ] From Coding Bootcamp Graduate to Building Distributed Databases
- Read Books (and papers), not Blog Posts
- Take responsibility for your career trajectory
[ ] 🏙 The Well Rounded Engineer includes lots of great book recommendations.
- Paradigm polyglot (learn different languages & paradigms)
- Database polyglot
- Protocol polyglot (preferably TCP/IP and HTTP)
- Proficiency with build tooling, packaging and distribution
- Debugging, observability
- Deployment, infra and devops
- Software architecture and scaling
- Ability to write toy compilers, interpreters and parsers
- Ability to write toy games
- Ability to understand algorithmic analysis
[ ] Some career advice, Will Larson.
- Advice you get is someone’s attempt to synthesize their experiences, not an accurate statement about how the world works.
- Build a reservoir of prestige.
- Some folks are so good at something that they end up being irreplaceable in their current role, which causes them to get stuck in their role even if they’re a good candidate for more interesting ones.
- Great relationships will follow you everywhere you go. Bad ones too.
- Early in your career, try to work at as many different kinds of companies and in as many different product verticals as you can.
[ ] Evil tip: avoid “easy” things
[ ] The Ultimate Code Kata
[ ] Traits of a senior software engineer: impact, perception, visibility, influence, mentoring
[ ] Software Engineering – The Soft Parts
- Think critically and formulate well-reasoned arguments
- Master the fundamentals
- Focus on the user and all else will follow
- Learn how to learn
[ ] How To Own Your Growth As A Software Engineer
[ ] The Forty-Year Programmer
- The Better You Get, the Less You Look Like Everybody Else
- You Learn Deep Principles by Doing the Basics
- Look to Other Fields, Learn From Other Fields
- Be Careful About Productivity Tips
[ ] Senior Engineers are Living in the Future
[ ] What would a map of your career look like?
[ ] How to be successful at Amazon (or any other large company for that matter)

About senior engineers:

[ ] Falsehoods Junior Developers believe about becoming Senior

Choosing your next/first opportunity

[ ] Career Decisions – by Elad Gil – Elad Blog

Getting to Staff Eng

[ ] I became a FAANG Staff Engineer in 5 years. These are the 14 lessons I learned along the way.
- Software engineering isn’t just coding. Actually, coding is a small part of it.
- Pipeline your work
- Be open to feedback and listen. Like, seriously, listen.
- Great feedback is hard to find; treasure it.
- Keep an eye on the horizon (but not both).
- Figure out what matters and let the rest go.
- Comparison really is the thief of joy.
- Mentorship is a beautiful thing.
- Good days, in general, don’t just “happen”.
- Advice and guidance are just that; they aren’t rules.
[ ] Guides for reaching Staff-plus engineering roles, Will Larson
- [ ] Being visible
- [ ] Additional resources on Staff-plus engineering
[ ] Staff archetypes, Will Larson

design docs & influence

[ ] Writing RFCs, ADRs, and persuasive technical memos
[ ] Guide to Writing Great Design Docs
[ ] Google Eng Practices

mentoring, communication, culture

[ ] 1:1s, feedback, async communication, team influence
[ ] Guiding juniors, setting standards, building healthy review culture
[ ] How to communicate effectively as a developer
- Lots of concrete advice and examples for short, medium and long-form writing
[ ] What Do You Visualize While Programming?

strategy & vision

[ ] Technical debt management
[ ] Leading architectural change
[ ] Staff archetypes (Tech Lead, Architect, Solver, Right Hand)

technical debt management

[ ] TechnicalDebt, Martin Fowler.
[ ] Fixing Technical Debt with an Engineering Allocation Framework
- You don’t need to stop shipping features to fix technical debt
- Communicate the business value
[ ] Ur-Technical Debt
- Today, any code that a developer dislikes is branded as technical debt.
- Ward Cunningham invented the debt metaphor to explain to his manager that building iteratively gave them working code faster, much like borrowing money to start a project, but that it was essential to keep paying down the debt, otherwise the interest payments would grind the project to a halt.
- Ur-technical debt is generally not detectable by static analysis.
[ ] 3 Kinds of Good Tech Debt

🔨 Stage 8: Mastery Projects & Public Work

Turn knowledge into tangible systems. Prove staff-level capability through impact. Apply knowledge gained to deeply engineered, demonstrable projects.

Personal Projects

✅ Ram – a custom multi-language build system – Build Systems: a Not So Short Introduction, Build Systems by Example, Bazel Overview, Buck2 Documentation, Shake (Haskell)
✅ VectorCrate – a vector + full-text hybrid search engine
✅ GitHub Clone – single-repo infra + remote coding
✅ Notification System – Facebook-style distributed notification service
✅ Feature Flag Service – rollout infra with audit logs and dashboard
✅ Contribute to open source projects

read list

[ ] The Art of Unix Programming
[ ] The C programming language
[ ] Gödel, Escher, Bach: An Eternal Golden Braid
[ ] Deep Learning (Goodfellow, Bengio, Courville)
[ ] Grokking Deep Learning
[ ] Grokking Deep Reinforcement Learning
[ ] Compilers: Principles, Techniques, and Tools (Dragon book)
[ ] Code
[ ] The elements of statistical learning
[ ] The structure and interpretation of computer programs
[ ] Hacker’s Delight
[ ] Concrete Mathematics
[ ] The Art of Computer Programming
[ ] Artificial Intelligence: A Modern Approach
[ ] https://blog.ycombinator.com/learning-math-for-machine-learning/

Random

📱 Mobile Development

Goal: Build and publish native iOS and Android apps with modern UI and production-quality architecture
Technologies:
- Swift + SwiftUI (iOS)
- Kotlin + Android SDK
Projects: Personal productivity app, Bible study app, or dev tools

🇪🇸 Spanish (with Lingoda)

Goal: Reach conversational and eventually fluent Spanish through structured, CEFR-aligned immersion
Platform: Lingoda (live classes + full curriculum)
Focus Areas:
- Speaking confidently in everyday situations
- Listening & comprehension at native speed
- Grammar, pronunciation, and vocabulary expansion
Timeline: To begin once schedule allows more time

🇩🇪 German (with Lingoda)

Goal: Learn German up to B2/C1 level, with a focus on clear speaking, reading, and cultural understanding
Platform: Lingoda
Focus Areas:
- Daily conversation and travel fluency
- Reading German technical and cultural materials
- Accent training, grammar, and listening
Timeline: Starts after completing Spanish Sprint

🙏 Acknowledgments & Thanks

This study plan stands on the shoulders of giants. Most — if not all — of the resources and structure within this roadmap were inspired by or directly sourced from the incredible work found in:

Desi Cochrane’s Data Science Curriculum
Charlax’s Professional Programming Guide
TeachYourselfCS.com
ChatGPT (OpenAI), for structure and refinement

I’m immensely grateful to the creators of these resources for sharing their wisdom so freely. This study plan wouldn’t exist without their contributions.

📅 Update Log

Date	Update
July 3, 2025	Fully expanded the study plan to include the Staff Engineer roadmap, AI + ML + deep learning, and personal projects
June 22, 2025	Page created. Rust learning active.
