Get Ahead of 99% of Software Engineers: The 98 System Design Concepts You Need to Know

Most software engineers write code. Senior engineers design systems.

That gap between mid-level and staff engineer isn't just years on a résumé. It's a fundamentally different mental model for how software behaves at scale, under failure, and across distributed infrastructure.

This article covers the 98 system design concepts that separate the two. Not a flat list to memorize — a structured breakdown organized so you can build understanding progressively, layer by layer.

Concepts 1–6: The Core Fundamentals

These are the first principles everything else is built on. If you can't explain these cold, everything downstream falls apart.

1. Scalability

The ability of a system to handle increased load by adding resources. Two approaches: vertical scaling (bigger machine) and horizontal scaling (more machines). Most production systems rely on horizontal scaling because vertical scaling has a hard ceiling: the largest machine you can buy.

2. Availability

The percentage of time a system is operational. Expressed as "nines": 99.9% allows about 8.76 hours of downtime per year, 99.99% about 52.6 minutes. Your availability target shapes every architectural decision that follows.
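The "nines" arithmetic is easy to check yourself. Here is a minimal sketch, assuming a 365-day year; the function name is illustrative, not from any library.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, ignoring leap years (an assumption)


def downtime_hours_per_year(availability_percent: float) -> float:
    """Return the yearly downtime budget implied by an availability target."""
    unavailable_fraction = 1 - availability_percent / 100
    return HOURS_PER_YEAR * unavailable_fraction


for target in (99.0, 99.9, 99.99, 99.999):
    hours = downtime_hours_per_year(target)
    print(f"{target}% -> {hours:.2f} hours ({hours * 60:.1f} minutes) per year")
```

Each extra nine cuts the downtime budget by a factor of ten, which is why each one tends to cost far more than the last.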

3. Reliability

A system is reliable if it consistently performs its intended function, even under adverse conditions. Availability and reliability are related but not identical — a system can be available (up) but unreliable (returning wrong results).

4. Latency

The time it takes for a single request to complete, usually measured in milliseconds or microseconds. P99 latency (the threshold that 99% of requests beat, so only the slowest 1% exceed it) usually matters more than average latency, because an average hides the tail. Your slowest requests define your user experience.
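To see why the tail matters, here is a small sketch using the nearest-rank percentile definition (one of several common definitions). The latency samples are made up for illustration.

```python
import math


def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical latencies in milliseconds: 98 fast requests and 2 slow ones.
latencies_ms = [20.0] * 98 + [900.0, 1200.0]

mean_ms = sum(latencies_ms) / len(latencies_ms)
p50 = percentile(latencies_ms, 50)
p99 = percentile(latencies_ms, 99)
print(f"mean={mean_ms:.1f} ms, p50={p50:.0f} ms, p99={p99:.0f} ms")
```

The mean and the median both look healthy here, while the P99 shows that some users wait close to a second.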

5. Throughput

The number of operations a system can handle per unit of time — requests per second, transactions per second, messages per second. Throughput and latency are often in tension: optimizing one frequently hurts the other.

6. Capacity

The maximum load a system can handle before degrading. Capacity planning is about estimating future load and provisioning resources before you hit the ceiling — not after.

> "The best time to plan for scale is before you need it. The second best time is right now."

Concepts 7–19: Networking & API Design

7–8. Client-Server & DNS

The Client-Server model is the foundational architecture of the web — clients request, servers respond. Understanding its limits is the starting point for every distributed pattern. DNS translates domain names into IP addresses; TTLs, caching, and propagation delays are real engineering concerns, not just ops trivia.

9–10. CDN & Load Balancing

A CDN serves content from geographically distributed nodes closer to users, reducing latency and offloading origin servers. Load Balancing distributes requests across multiple servers to prevent bottlenecks — using round-robin, least-connections, or consistent hashing algorithms.
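Two of the algorithms named above can be sketched in a few lines. This is a toy in-process version with hypothetical class names; a real load balancer also handles health checks, weights, and concurrency.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hands out servers in a fixed rotation, regardless of their load."""

    def __init__(self, servers: list[str]):
        self._rotation = cycle(servers)

    def pick(self) -> str:
        return next(self._rotation)


class LeastConnectionsBalancer:
    """Hands out the server with the fewest in-flight requests."""

    def __init__(self, servers: list[str]):
        self.active = {server: 0 for server in servers}

    def acquire(self) -> str:
        # Ties are broken by insertion order of the servers.
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server: str) -> None:
        self.active[server] -= 1
```

Round-robin is a reasonable default when requests cost roughly the same; least-connections tends to do better when some requests are much slower than others.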

11–13. REST, GraphQL & gRPC

REST is stateless, resource-based, and universally understood — but suffers from over-fetching and under-fetching. GraphQL lets clients request exactly the data they need, solving REST's inefficiency at the cost of query complexity. gRPC uses binary Protocol Buffers for high-performance service-to-service communication.

14–16. API Gateway, BFF & WebSockets

An API Gateway is the single entry point for all client requests — routing, auth, and rate limiting in one place. A BFF (Backend for Frontend) creates a dedicated backend per frontend type, optimized for that client's specific data needs. WebSockets enable persistent, full-duplex connections for real-time features like live chat and collaborative editing.

17–19. Authentication, Authorization & Rate Limiting

Authentication answers: who are you? Authorization answers: what are you allowed to do? They are always distinct, often confused. Rate Limiting restricts requests per client per time window — protecting against abuse, DDoS, and accidental thundering herd problems.
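One common way to implement rate limiting is a token bucket. The article does not prescribe an algorithm, so treat this as an illustrative sketch. The `clock` parameter is an assumption added to make the class testable without sleeping.

```python
import time


class TokenBucket:
    """Allows bursts up to `capacity`, then a steady `refill_per_second` rate."""

    def __init__(self, capacity: int, refill_per_second: float, clock=time.monotonic):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.clock = clock
        self.tokens = float(capacity)  # start full
        self.last_refill = clock()

    def allow(self) -> bool:
        now = self.clock()
        elapsed = now - self.last_refill
        # Refill based on elapsed time, never exceeding the bucket size.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

In practice you would keep one bucket per client (for example, keyed by API key) and store the state somewhere shared, such as Redis, when several servers enforce the same limit.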

Concepts 20–35: Data Storage

Data is the most expensive part to get wrong. Unlike application logic, you can't just redeploy your way out of a bad data architecture.

20. SQL vs NoSQL

SQL databases offer strong consistency, ACID transactions, and structured schemas — ideal for relational data. NoSQL databases trade some consistency for flexibility, scale, and speed. The choice depends on your data model, not just your scale.

21–23. Indexing, Denormalization & Query Optimization

Indexing speeds up reads at the cost of write overhead — knowing which queries to index is a senior-level skill. Denormalization intentionally introduces redundancy for read performance. Query Optimization is the art of making slow queries fast: execution plans, covering indexes, avoiding N+1 queries.
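The N+1 problem is easiest to recognize side by side with its fix. The sketch below uses Python's built-in `sqlite3` with a made-up two-table schema; table and function names are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
CREATE INDEX idx_posts_author ON posts(author_id);
INSERT INTO authors VALUES (1, 'Ada'), (2, 'Linus');
INSERT INTO posts VALUES (1, 1, 'Engines'), (2, 1, 'Notes'), (3, 2, 'Kernels');
""")


def titles_n_plus_one(conn: sqlite3.Connection) -> dict[str, list[str]]:
    """One query for the authors, then one more query per author (N+1 round-trips)."""
    result = {}
    authors = conn.execute("SELECT id, name FROM authors ORDER BY id").fetchall()
    for author_id, name in authors:
        rows = conn.execute(
            "SELECT title FROM posts WHERE author_id = ? ORDER BY id", (author_id,)
        ).fetchall()
        result[name] = [title for (title,) in rows]
    return result


def titles_with_join(conn: sqlite3.Connection) -> dict[str, list[str]]:
    """The same data in a single round-trip using a join."""
    rows = conn.execute(
        "SELECT a.name, p.title FROM authors a "
        "LEFT JOIN posts p ON p.author_id = a.id ORDER BY a.id, p.id"
    ).fetchall()
    result: dict[str, list[str]] = {}
    for name, title in rows:
        result.setdefault(name, [])
        if title is not None:  # authors without posts still get an empty list
            result[name].append(title)
    return result
```

Both functions return the same mapping, but the first one's query count grows with the number of authors. Against a networked database each of those queries is a round-trip.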

24–25. ACID & BASE

ACID (Atomicity, Consistency, Isolation, Durability) guarantees reliable transactions — the standard in relational databases. BASE (Basically Available, Soft state, Eventually consistent) is the alternative model in many distributed NoSQL systems, trading strict consistency for availability and speed.

26–28. Replication, Partitioning & Sharding

Replication copies data across nodes for redundancy and read scalability. Partitioning splits large datasets across nodes. Sharding is horizontal partitioning by a shard key — critical for scaling databases beyond what a single machine can hold.
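A minimal sketch of hash-based shard routing, assuming the shard key is a string and the shard count is fixed. The function name and the key format are hypothetical.

```python
import hashlib
from collections import Counter


def shard_for(shard_key: str, num_shards: int) -> int:
    """Map a shard key to a shard index using a stable hash.

    Python's built-in hash() is randomized per process for strings,
    so a cryptographic digest is used here purely for stability.
    """
    digest = hashlib.sha256(shard_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# Route 1,000 hypothetical user IDs across 4 shards and inspect the spread.
distribution = Counter(shard_for(f"user:{i}", 4) for i in range(1000))
print(sorted(distribution.items()))
```

The catch with plain modulo routing is that changing `num_shards` remaps most keys. Consistent hashing is the usual answer when shards are added or removed regularly.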

29–31. Connection Pooling, Materialized Views & Full-Text Search

Connection Pooling reuses expensive database connections rather than creating new ones per request. Materialized Views store pre-computed query results for dramatically faster reads on complex aggregations. Full-Text Search engines handle relevance ranking and tokenization that standard databases cannot.

32–34. Time Series DB, Vector DB & Data Lake vs Warehouse

Time Series DBs (InfluxDB, Prometheus) are purpose-built for timestamped metrics and IoT data. Vector DBs (Pinecone, Weaviate) store high-dimensional embeddings for AI-powered semantic search. A Data Lake stores raw data in its original format, whether structured, semi-structured, or unstructured; a Data Warehouse stores cleaned, schema-enforced data modeled for queries. Modern lakehouses blur the line between them.

35. Caching

The single highest-impact performance optimization in most systems. Storing frequently accessed data in fast memory to avoid slow recomputation or round-trips to the database. Cache hit rate is the metric that matters most.
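As a toy illustration, here is a read-through cache with least-recently-used eviction that tracks its own hit rate. The `loader` callback stands in for the slow database call and is an assumption of this sketch.

```python
from collections import OrderedDict


class LRUCache:
    """Read-through cache: on a miss, load the value and evict the least recently used key."""

    def __init__(self, capacity: int, loader):
        self.capacity = capacity
        self._loader = loader  # hypothetical slow lookup, e.g. a database query
        self._data = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self._data:
            self._data.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self._data[key]
        self.misses += 1
        value = self._loader(key)
        self._data[key] = value
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # drop the least recently used entry
        return value

    @property
    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

For pure functions in a single process, the standard library's `functools.lru_cache` does the same job with less code. A class like this is mainly useful for seeing the mechanics.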

Concepts 36–48: Scalability Patterns

36–38. Cache Invalidation, Stampede & Warming

Cache Invalidation is the hard problem: knowing when to remove or update cached data. Strategies include TTL expiration, event-driven invalidation, and write-through caching. Cache Stampede happens when a key expires and thousands of requests simultaneously hit the backend — solved with probabilistic expiration or background refresh. Cache Warming pre-populates caches before traffic arrives to prevent cold-start latency spikes.

39–40. Autoscaling & Load Shedding

Autoscaling automatically adds or removes compute resources based on real-time load metrics — essential for handling traffic spikes without manual intervention. Load Shedding intentionally rejects requests when a system is overloaded. Better to fail fast for some users than fail slowly for all.

41–48. The Reliability Pattern Suite

Circuit Breaker stops sending requests to a failing service, giving it time to recover and preventing cascading failures. Bulkhead isolates resource pools so a failure in one area doesn't exhaust resources for everything else. Retry Logic with exponential backoff handles transient failures, but it must pair with Idempotency, because retrying a non-idempotent operation can duplicate side effects (a second charge, a second email) or corrupt data. Timeout prevents slow dependencies from holding connections indefinitely. Backpressure lets overloaded downstream systems signal upstream producers to slow down before buffers overflow. Fault Tolerance and High Availability round out the set: eliminating single points of failure through redundancy and failover.
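Two of these patterns fit in a short sketch: the delay schedule for exponential backoff, and a simplified circuit breaker. The class names, thresholds, and the injectable `clock` are assumptions for illustration. Production libraries add jitter, per-error policies, and thread safety.

```python
import time


def backoff_delays(base: float, factor: float, retries: int, cap: float) -> list[float]:
    """Exponential backoff schedule: base, base*factor, base*factor^2, ... capped at `cap`."""
    return [min(cap, base * factor ** attempt) for attempt in range(retries)]


class CircuitOpenError(Exception):
    """Raised when the breaker refuses a call without trying the dependency."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 30.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                raise CircuitOpenError("circuit open; failing fast")
            # Timeout elapsed: half-open, let one trial call through.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            # A failed trial call, or too many failures, (re)opens the circuit.
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        # Success closes the circuit and resets the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

The point of failing fast is that callers stop tying up threads and connections on a dependency that is already struggling.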

Concepts 49–60: Distributed Systems

49. CAP Theorem

The CAP theorem says a distributed data store can guarantee at most two of three properties: Consistency, Availability, and Partition tolerance. Since network partitions are inevitable in real distributed systems, the real choice is what to give up while a partition lasts: availability (CP) or consistency (AP). This single insight explains most distributed database design decisions.

50–51. Consistency Models & Replication Strategies

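The spectrum runs from strong consistency (every read reflects the latest write) to eventual consistency (reads will eventually converge). Strong is safer and slower. Eventual is faster and requires careful application-level design to handle stale reads correctly. Replication strategies sit on the same spectrum: synchronous replication waits for replicas to acknowledge a write before confirming it, which favors consistency, while asynchronous replication confirms first and propagates later, which favors latency.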

52–55. Consensus, Merkle Trees & Probabilistic Data Structures

Consensus algorithms (Raft, Paxos) allow distributed nodes to agree on a single value despite failures; they are the backbone of distributed databases and coordination services. Merkle Trees efficiently verify data consistency between nodes (used in Git and Bitcoin). Bloom Filters answer "is this in the set?" using minimal memory: they never return a false negative, but they can return a false positive, so a "yes" only means "probably". HyperLogLog estimates the cardinality of massive datasets using kilobytes of memory.
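A Bloom filter is small enough to build as a toy. This sketch derives its hash functions by salting SHA-256 with a seed, which is a simplification chosen for clarity. The sizes are arbitrary, and real implementations pick bit count and hash count from the expected item count and target false-positive rate.

```python
import hashlib


class BloomFilter:
    def __init__(self, num_bits: int = 1024, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str):
        # Derive several bit positions by hashing the item with different seeds.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # False is definitive; True may be a false positive.
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))
```

A typical use is putting the filter in front of an expensive lookup: a "no" skips the disk or network entirely, and a "maybe" falls through to the real check.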

56–60. Service Infrastructure

Service Discovery lets services locate each other dynamically at scale — static configuration breaks in microservices. A Service Mesh (Istio, Linkerd) handles service-to-service communication, mTLS encryption, and observability without touching application code. The Sidecar Pattern deploys helper containers alongside services to handle cross-cutting concerns like logging and proxying.

Concepts 61–75: Architecture Patterns

61. Microservices vs Monolith

Monolith: single deployable unit. Simple to build and debug, hard to scale as the codebase and team grow. Microservices: independently deployable services. Flexible and scalable, but operationally complex. Start with a monolith and migrate to microservices when you feel the pain. Most teams move too early and regret it.

62–65. Event-Driven, Message Queue, Pub/Sub & Async

Event-Driven Architecture decouples services through events instead of direct calls — scalable but harder to trace. Message Queues (Kafka, RabbitMQ, SQS) buffer producers from consumers and absorb traffic spikes. Pub/Sub broadcasts to multiple independent subscribers. Sync vs Async: synchronous is simple, asynchronous is resilient.
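The pub/sub idea can be shown with an in-process toy: publishers know a topic name, not who is listening. The class and topic names are hypothetical, and delivery here is synchronous. Real brokers such as Kafka or RabbitMQ deliver asynchronously and persist messages.

```python
from collections import defaultdict


class PubSub:
    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic: str, handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message) -> int:
        """Deliver the message to every subscriber of the topic; return how many received it."""
        handlers = list(self._subscribers.get(topic, []))
        for handler in handlers:
            handler(message)
        return len(handlers)


bus = PubSub()
emails, audit_log = [], []
bus.subscribe("order.created", lambda msg: emails.append(f"receipt for {msg['id']}"))
bus.subscribe("order.created", lambda msg: audit_log.append(msg))
delivered = bus.publish("order.created", {"id": "o-1", "total": 42})
```

Adding a third consumer requires no change to the publisher, which is the decoupling the pattern buys you.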

66–68. CQRS, Event Sourcing & Strangler Pattern

CQRS separates write models from read models, letting each be independently optimized. Event Sourcing stores state as an immutable event log — any past state is reconstructable by replaying events. The Strangler Pattern incrementally migrates a monolith to microservices, avoiding the risk of a big-bang rewrite.

69–75. Deployment & Observability Patterns

Blue-Green: two identical production environments, instant rollback. Canary Release: gradual rollout to a percentage of users to minimize blast radius. Feature Flags decouple deployment from release: code ships switched off and is enabled at runtime without a new deploy. Observability is built on three pillars: Logs, Metrics, and Traces. Correlation IDs link requests across multiple services in distributed tracing. Monitoring and Alerting catch problems before users do.
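One way to pick the canary cohort is to hash each user into a stable bucket from 0 to 99. This is a sketch of that idea rather than how any particular platform does it. The function name and the `salt` value are hypothetical.

```python
import hashlib


def in_canary(user_id: str, rollout_percent: int, salt: str = "checkout-v2") -> bool:
    """Deterministically place a user in the canary cohort.

    The same user always lands in the same bucket, so raising
    rollout_percent only ever adds users to the cohort.
    """
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100
    return bucket < rollout_percent


users = [f"user-{i}" for i in range(1000)]
cohort_size = sum(in_canary(u, 10) for u in users)
print(f"{cohort_size} of {len(users)} users see the canary at 10%")
```

Using a different salt per feature keeps the same users from being the test subjects for every rollout. The same function works as a percentage-based feature flag.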

Concepts 76–98: Advanced Concepts

76–82. Data Processing

MapReduce processes massive datasets in parallel — the conceptual foundation of distributed computation. Batch Processing handles large chunks on a schedule: ETL jobs, model training, end-of-day reports. Stream Processing (Kafka Streams, Apache Flink) handles data in real-time as it arrives — fraud detection, live analytics, real-time recommendations. ETL and Data Pipelines are the infrastructure that moves and transforms data between systems.
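The MapReduce model itself is three small steps: map, shuffle, reduce. Here is a single-process word count as a sketch. A real framework runs the map and reduce phases in parallel across machines, which this toy does not attempt.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(document: str) -> Iterator[tuple[str, int]]:
    """Emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group all emitted values by key, as the framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped: dict[str, list[int]]) -> dict[str, int]:
    """Collapse each key's values into a single count."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(documents: list[str]) -> dict[str, int]:
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return reduce_phase(shuffle(pairs))
```

The map step needs only one document and the reduce step needs only one key's values, which is what makes both phases easy to spread across many workers.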

83–89. Storage Internals

LSM Trees (Log-Structured Merge Trees) are write-optimized and used in Cassandra and RocksDB. B-Trees are the read-optimized standard in relational databases. Data Compression formats (Gzip, Snappy, Zstandard) trade CPU for I/O. Serialization formats (JSON vs Protobuf vs Avro) have significant performance implications at scale: binary formats are typically several times smaller and faster to parse than JSON, though the exact gain depends on the payload and the library.

90–94. Security

RBAC assigns permissions to roles, not users, which is simpler to manage at scale. SSO (typically SAML or OIDC, which builds on OAuth 2.0) lets users authenticate once across multiple applications. Encryption at rest and in transit is non-negotiable. Secrets Management keeps credentials out of codebases. Checksums detect corrupted data, and Erasure Coding reconstructs lost or damaged fragments in distributed storage systems.
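The core of RBAC is two lookups: user to roles, then roles to permissions. Here is a minimal sketch with made-up users, roles, and permission strings.

```python
# Hypothetical role and user tables; in practice these live in a database or an IAM service.
ROLE_PERMISSIONS = {
    "viewer": {"report:read"},
    "editor": {"report:read", "report:write"},
    "admin": {"report:read", "report:write", "user:manage"},
}

USER_ROLES = {
    "alice": {"admin"},
    "bob": {"viewer"},
    "carol": {"viewer", "editor"},
}


def is_allowed(user: str, permission: str) -> bool:
    """A user is allowed if any of their roles grants the permission."""
    roles = USER_ROLES.get(user, set())
    return any(permission in ROLE_PERMISSIONS.get(role, set()) for role in roles)
```

Changing what editors can do is a one-line edit to `ROLE_PERMISSIONS` rather than an update to every user, which is the management win the pattern is known for. Unknown users get no roles and are denied by default.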

95–98. Patterns That Separate Good from Great

WebRTC enables peer-to-peer real-time communication directly between browsers. Once a connection is established, media can flow without passing through an application server, though a signaling server is still needed for setup and a TURN relay when no direct path exists. Idempotency keys make distributed operations safe to retry without side effects. Backpressure mechanisms prevent cascading overload across service boundaries. CDN Caching strategies at the HTTP layer require fundamentally different design thinking than application-level caching; they're two separate problems.
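Idempotency keys come down to "remember the result the first time, return it on every retry". The sketch below uses a hypothetical in-memory `PaymentService`. A real service would persist the keys, expire them, and guard against concurrent duplicates.

```python
class PaymentService:
    """Toy service that charges at most once per idempotency key."""

    def __init__(self):
        self._results = {}  # idempotency key -> stored response
        self.charges_made = 0

    def charge(self, idempotency_key: str, amount_cents: int) -> dict:
        # A retry with a known key returns the original response and does no new work.
        if idempotency_key in self._results:
            return self._results[idempotency_key]
        self.charges_made += 1
        result = {"charge_id": f"ch_{self.charges_made}", "amount_cents": amount_cents}
        self._results[idempotency_key] = result
        return result


service = PaymentService()
first = service.charge("order-123-attempt", 4999)
retry = service.charge("order-123-attempt", 4999)  # e.g. the client timed out and retried
```

The client generates the key once per logical operation and reuses it on every retry, so a timeout followed by a retry cannot double-charge.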

---

How to Actually Learn This

Reading a list is the beginning, not the end. The engineers who truly internalize system design do three things:

1. Build toy versions. Implement a rate limiter, a simple cache, a pub/sub system from scratch. You will understand in 2 hours what 10 articles will never teach you.

2. Read post-mortems. Netflix, Cloudflare, GitHub, Discord, and Stripe all publish incident reports publicly. Real outages reveal how these concepts fail in production — and that is where the real learning happens.

3. Practice with real problems. Design Twitter's feed, Uber's dispatch system, Stripe's payment processor end-to-end. Force yourself to make trade-off decisions under pressure, not just understand concepts in isolation.

> "System design isn't memorization. It's a set of trade-off decisions that you get better at by thinking through real problems repeatedly."

The engineers who master this don't just pass interviews. They become the people their teams look to when something breaks at 3am — and they get promoted because of it.

---

What You Should Do Now

If you're preparing for interviews

✅ Pick 10 concepts per week from this list and go deep, not wide

✅ Use real architectures as case studies — study how Netflix, Airbnb, and Uber document their own systems publicly

✅ Practice explaining trade-offs out loud — system design interviews evaluate your reasoning process, not memorization

✅ Build one small system end-to-end that uses at least 10 of these concepts working together

If you're a working engineer

✅ Map these concepts to your current codebase — which ones are you using? Which are missing and causing pain?

✅ Read your company's post-mortems and identify which concepts were violated when things went wrong

✅ Start a system design reading group with your team. Discussing the material exposes gaps that reading alone hides

✅ Propose one architectural improvement at your next team meeting using vocabulary from this list

For everyone

✅ Follow engineers who write about systems — infrastructure engineering blogs are the highest signal content available

✅ Contribute to open-source distributed systems — reading production code teaches more than any tutorial

✅ Keep a "concepts journal" — every time you encounter one of these in the wild, write down how it was used and what trade-offs were made

The Bottom Line

The 98 concepts in this article are not 98 separate things to memorize. They are a vocabulary for reasoning about trade-offs.

Every system design decision is a negotiation between latency and throughput, consistency and availability, simplicity and scalability. The engineers who get promoted, who ace interviews, and who lead architecture decisions are the ones who have internalized that negotiation well enough to make it fast.

You don't need to master all 98 at once. You need to start — and keep going.

The gap between you and the top 1% is not talent. It's vocabulary, mental models, and deliberate practice. All three are learnable. 🏗️
