Meta Interview Question

Design a Content Delivery Network (CDN) — Meta Interview

Hard · 20 min · Backend System Design

How Meta Tests This

Meta (Facebook) interviews focus heavily on social graph systems, real-time messaging, content delivery at scale, and feed ranking algorithms. Their system design rounds test your ability to design products used by billions of people daily.

Interview focus: Social feeds, messaging systems, content delivery, real-time features, and collaborative tools.

Key Topics
cdn · distributed systems · caching · anycast · edge computing · cloudflare

How to Design a Content Delivery Network (CDN)

The CDN question is one of the most nuanced system design prompts you can get. It shows up at Cloudflare (obviously), but also at Amazon, Google, Meta, Netflix, and Fastly. And it catches candidates off guard because the surface answer — "put servers close to users and cache stuff" — is easy to give, but the depth required to actually impress a senior interviewer is significant.

What makes a request go to the right edge server and not just any edge server? What's the difference between a CDN that works at startup scale and one that absorbs a 500 Gbps DDoS attack without flinching? How do you invalidate cached content across 300 edge locations without introducing a thundering herd on your origin? Why does Cloudflare use Anycast while Akamai historically used DNS-based routing — and when does each break?

These are the questions that separate a good CDN answer from a great one. This guide covers all of them, in the kind of back-and-forth you'd actually have in the interview room.


Step 1: Clarify the Scope

Interviewer: Design a content delivery network.

Candidate: A few questions before I start. Are we designing a general-purpose CDN for static and dynamic content, or primarily for one use case like video streaming? What scale are we thinking — a CDN for a single large company's assets, or a multi-tenant platform like Cloudflare that serves millions of customers? Do we need to handle DDoS protection as a core feature, or is that out of scope? And are we expected to support edge compute — running customer logic at the edge — or is this primarily a caching and routing problem?

Interviewer: Design a general-purpose CDN similar to what Cloudflare or Akamai provides — multi-tenant, global scale, static and dynamic content, and yes, DDoS protection should be part of the design. Edge compute is a nice-to-have if we have time.

Candidate: Perfect. Let me start with requirements and back-of-the-envelope numbers, then walk through the architecture from the edge inward.


Requirements

Functional

  • Serve cached static content (images, CSS, JS, videos) from locations close to end users
  • Accelerate dynamic content by terminating connections at the edge and optimising origin fetches
  • Support configurable caching rules — TTL, cache keys, bypass conditions
  • Invalidate cached content on demand, by URL, path prefix, or tag
  • Provide DDoS protection by absorbing and filtering malicious traffic at the edge
  • Support HTTPS for all content with TLS termination at the edge
  • (Stretch) Run customer-defined logic at the edge (edge compute / serverless functions)

Non-Functional

  • Ultra-low latency — content should be served in under 50ms for users in covered regions
  • High availability — a single PoP failure must not impact end users; traffic must fail over automatically
  • High throughput — capable of absorbing hundreds of Gbps of traffic, including DDoS spikes
  • Consistency — after a cache invalidation is requested, stale content must be purged from all PoPs within seconds, not minutes
  • Multi-tenancy — thousands of customer domains, each with independent caching rules

Back-of-the-Envelope Estimates

Interviewer: What are the numbers we're working with?

Candidate: Let me work through a Cloudflare-scale estimate:

plaintext
Customers:                    ~5 million websites
Average requests per site/day: ~10,000
Total requests per day:        ~50 billion
Requests per second:          ~580,000 req/sec (average)
Peak (5× average):            ~2.9 million req/sec
 
Assuming average response:    30 KB (HTML + assets)
Outbound bandwidth (avg):     580,000 × 30 KB = ~17 GB/sec
Outbound bandwidth (peak):    ~87 GB/sec
 
Edge PoP count:               ~300 locations globally
Traffic per PoP (avg):        ~2,000 req/sec (~10,000 at peak)
Servers per PoP:              ~50–200 servers depending on region size
 
Cache storage per PoP:
  Hot content in RAM:         ~100–500 GB per PoP
  Warm content on NVMe SSD:   ~10–50 TB per PoP
 
Cache hit rate target:        >95% globally for static content

Two things jump out. First, the traffic is enormous but well-distributed — 300 PoPs absorb 2.9 million req/sec at peak, which is very manageable per PoP. Second, the 95% cache hit rate target is what makes this economically viable — if only 5% of requests reach origin servers, those origin servers are a fraction of what they'd otherwise need to be. Cache hit ratio is the CDN's primary performance and cost metric.
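The arithmetic behind these estimates is easy to sanity-check with a short script; all inputs below are the stated assumptions, not measured data:

```python
# Sanity-checking the back-of-the-envelope figures above.
SITES = 5_000_000
REQ_PER_SITE_PER_DAY = 10_000
SECONDS_PER_DAY = 86_400
AVG_RESPONSE_KB = 30
POPS = 300

total_per_day = SITES * REQ_PER_SITE_PER_DAY            # 50 billion requests/day
avg_rps = total_per_day / SECONDS_PER_DAY               # ~580,000 req/sec
peak_rps = 5 * avg_rps                                  # ~2.9 million req/sec
avg_gb_per_sec = avg_rps * AVG_RESPONSE_KB / 1_000_000  # ~17 GB/sec outbound
rps_per_pop = avg_rps / POPS                            # ~2,000 req/sec per PoP
```

Note how modest the per-PoP average is once 300 locations share the load: roughly two thousand requests per second, well within a single rack's capability.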


High-Level Architecture

plaintext
                    ┌──────────────────────────────────────┐
                    │         End Users (globally)         │
                    └───────────────┬──────────────────────┘
                                    │ DNS resolves to
                                    │ Anycast IP
                    ┌───────────────▼──────────────────────┐
                    │      Internet Routing (BGP)          │
                    │  Routes to nearest PoP automatically │
                    └───────────────┬──────────────────────┘

        ┌───────────────────────────┼───────────────────────────┐
        │                           │                           │
┌───────▼──────────┐    ┌──────────▼──────────┐    ┌──────────▼──────────┐
│  PoP: US-East    │    │  PoP: EU-West       │    │  PoP: APAC          │
│  ┌─────────────┐ │    │  ┌─────────────┐    │    │  ┌─────────────┐    │
│  │ Edge Servers│ │    │  │ Edge Servers│    │    │  │ Edge Servers│    │
│  │ (TLS term,  │ │    │  │             │    │    │  │             │    │
│  │  cache,     │ │    │  │             │    │    │  │             │    │
│  │  WAF, DDoS) │ │    │  │             │    │    │  │             │    │
│  └──────┬──────┘ │    │  └──────┬──────┘    │    │  └──────┬──────┘    │
└─────────┼────────┘    └─────────┼───────────┘    └─────────┼───────────┘
          │                       │                           │
          └───────────────────────┼───────────────────────────┘
                                  │ cache miss → forward to origin
                    ┌─────────────▼──────────────┐
                    │       Origin Shield        │
                    │ (Regional mid-tier cache)  │
                    └─────────────┬──────────────┘
                                  │ shield miss
                    ┌─────────────▼──────────────┐
                    │       Origin Servers       │
                    │ (Customer's infrastructure)│
                    └────────────────────────────┘

Component 1: Traffic Routing — Anycast vs DNS-Based

This is where most CDN design answers fall flat. Both Anycast and DNS-based routing get users to an edge server — but through completely different mechanisms, with different trade-offs.

What Unicast routing looks like: in a standard network, every server has a unique IP address. A request to 93.184.216.34 goes to exactly one server. If that server is down, the request fails.

What Anycast routing is: multiple servers in different locations all advertise the same IP address to the internet via BGP. When a router receives a packet destined for that IP, BGP's routing algorithm picks the "closest" path — defined by the fewest network hops. The packet goes to whichever PoP is topologically nearest to the sender.

Interviewer: Explain Anycast. How does it actually work?

Candidate: Every PoP in the CDN's network runs BGP sessions with its upstream internet carriers. Each PoP announces the same IP address — say 104.16.0.1 — to those carriers. Those carriers propagate the announcement to the rest of the internet. Now, every router on the internet has multiple paths to 104.16.0.1, one via each PoP that's announced it. BGP selects the shortest path for each source router, which roughly correlates with physical proximity.

When a user in London sends a request to 104.16.0.1, their ISP's router has two routes: one through Cloudflare's London PoP (2 hops) and one through the Frankfurt PoP (4 hops). BGP picks London. Another user in Frankfurt gets routed to Frankfurt. Neither user configured anything — the internet's routing infrastructure handled it automatically.
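The London/Frankfurt example can be modelled in a few lines. The PoP names and hop counts below are hypothetical, and real BGP best-path selection applies many more tie-breakers (local preference, MED, and so on), but the core idea is just "fewest hops wins":

```python
# Toy model of Anycast best-path selection: every PoP announces the same IP,
# and each source network takes its shortest path to that IP.
ANYCAST_IP = "104.16.0.1"

# Hypothetical hop counts from each source network to every announcing PoP.
paths = {
    "isp-london":    {"pop-london": 2, "pop-frankfurt": 4, "pop-ashburn": 6},
    "isp-frankfurt": {"pop-london": 4, "pop-frankfurt": 2, "pop-ashburn": 6},
}

def select_pop(source):
    """Fewest hops wins: the routing decision nobody had to configure."""
    routes = paths[source]
    return min(routes, key=routes.get)

assert select_pop("isp-london") == "pop-london"
assert select_pop("isp-frankfurt") == "pop-frankfurt"
```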

Interviewer: What's the difference between Anycast and DNS-based routing?

Candidate: DNS-based routing — which Akamai historically used and Amazon CloudFront still uses — gives different IP addresses to different users at DNS resolution time. When a user queries the CDN's hostname, a geolocation-aware DNS server returns the IP of the nearest PoP based on the resolver's location. The user then connects to that specific IP.

The key difference: DNS-based routing makes the routing decision at query time, while Anycast makes it at the network level for every packet. Anycast has a few meaningful advantages — there's no dependency on geolocation databases being accurate, failover is automatic at the network level rather than waiting for DNS TTLs to expire, and it provides natural DDoS absorption (an attack on one IP is automatically spread across all PoPs since they all share it). The trade-off is that proper Anycast requires the CDN to run its own network hardware, maintain direct peering relationships with carriers worldwide, and carefully manage BGP announcements — it's enormously complex infrastructure. That's why smaller CDNs often start with DNS-based routing and migrate to Anycast as they grow.

For the design I'm building, I'll use Anycast — it's the right architecture for a Cloudflare-scale, multi-tenant CDN.

The Anycast vs DNS distinction is one of those explanations that reads clearly on paper but takes a few tries to land cleanly in an interview. If you want to practise until it comes out naturally under time pressure, Mockingly.ai has CDN and infrastructure system design simulations where exactly this kind of routing question comes up.


Component 2: The PoP — What's Inside

Every Point of Presence is a small data center containing multiple layers of hardware and software working together.

Interviewer: Walk me through what happens when a request arrives at a PoP.

Candidate: The request arrives at a border router, which forwards it to a load balancer inside the PoP. The load balancer distributes it to one of the edge servers using consistent hashing on the URL — requests for the same URL go to the same edge server, maximising cache hit rate. Here's what the edge server does in order:

plaintext
1. TLS Termination
   → Decrypt the HTTPS request at the edge
   → Session resumed from TLS session cache (avoids full handshake for repeat visitors)
 
2. DDoS / WAF Layer
   → Check request against rate limits, IP reputation lists, bot signatures
   → Block or challenge malicious traffic before it touches the cache
 
3. Cache Lookup
   → Compute cache key from URL + configured Vary headers
   → Check in-memory cache (hot tier: L1, in RAM)
   → Check on-disk NVMe cache (warm tier: L2, on SSD)
   → If HIT: serve immediately, return response
 
4. Cache Miss → Origin Fetch
   → Check if another request is already in-flight for this URL (request coalescing)
   → If yes: queue this request, serve the response when it arrives
   → If no: fetch from origin (or origin shield if configured)
   → Store response in cache, serve to all waiting requests
 
5. Response Delivery
   → Compress response if not already compressed (Brotli preferred, gzip fallback)
   → Set response headers (Age, Cache-Status, CF-Cache-Status)
   → Stream response to client

Interviewer: Why consistent hashing for load balancing within the PoP?

Candidate: If you distribute requests round-robin across edge servers within a PoP, each server builds its own independent cache. A request for /logo.png might go to server 3 the first time (populating server 3's cache) and server 7 the second time (cache miss, fetches from origin again). Consistent hashing maps each URL to a specific edge server, so all requests for /logo.png go to the same server and the cache is always warm on that server. Cache hit rate within the PoP goes up dramatically. When a server is added or removed, consistent hashing only remaps a fraction of URLs — not everything.
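A minimal sketch of that URL-to-server mapping, assuming MD5 as the hash and virtual nodes for balance (a production load balancer would likely use a faster non-cryptographic hash):

```python
import hashlib
from bisect import bisect

class ConsistentHashRing:
    """Maps URLs to edge servers; adding/removing a server remaps few URLs."""
    def __init__(self, servers, vnodes=100):
        # Each server gets `vnodes` points on the ring to even out load.
        self.ring = sorted(
            (self._hash(f"{server}:{i}"), server)
            for server in servers for i in range(vnodes)
        )
        self.points = [h for h, _ in self.ring]

    @staticmethod
    def _hash(value):
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    def server_for(self, url):
        # The first ring point clockwise from the URL's hash owns the URL.
        idx = bisect(self.points, self._hash(url)) % len(self.points)
        return self.ring[idx][1]

ring = ConsistentHashRing([f"edge-{i}" for i in range(8)])
assert ring.server_for("/logo.png") == ring.server_for("/logo.png")
```

Removing one of eight servers remaps only the URLs that hashed to it (roughly an eighth of them), rather than reshuffling everything the way `hash(url) % n` would.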


Component 3: The Cache Hierarchy

Interviewer: You mentioned L1 and L2 caches. Explain the cache hierarchy.

Candidate: A modern CDN uses at least three tiers of caching:

L1 — In-memory cache (hot tier)

  • Stored in RAM on each edge server
  • Holds the most frequently requested content — the top 1–2% of URLs that account for the majority of traffic
  • Sub-millisecond lookup
  • Limited size: a server might have 64–128 GB RAM, of which 20–40 GB goes to the cache
  • Eviction: LRU (Least Recently Used)

L2 — On-disk NVMe cache (warm tier)

  • Stored on NVMe SSDs, which are fast enough (microsecond seek times) to serve content without perceptible delay
  • Holds the "long tail" of content — URLs that are accessed regularly but not frequently enough to stay in RAM
  • A server might have 2–10 TB of NVMe
  • Eviction: LFU (Least Frequently Used) combined with TTL expiry

L3 — Origin Shield (regional mid-tier)

  • A dedicated caching layer between the edge PoPs and the customer's origin servers
  • Consolidates cache misses from multiple PoPs
  • Without origin shielding: if 300 PoPs all miss cache for the same piece of content, they all hit the origin simultaneously — 300 requests for one URL. With origin shielding: all 300 PoPs route their misses through one shield server, which makes one request to origin and caches the response for everyone
  • Reduces origin load by orders of magnitude for popular content

Interviewer: What's the cache key? How do you decide what counts as the same cached object?

Candidate: By default, the cache key is the full URL: scheme + host + path + query string. https://example.com/image.png?v=2 and https://example.com/image.png?v=3 are separate cache entries.

But this is configurable and it matters a lot. If a customer's origin sets a Vary: Accept-Encoding header, the CDN must store separate cached versions for gzip and Brotli responses. If a customer has A/B testing cookies, and the origin varies content by cookie, the CDN must either ignore the cookie (cache one version for everyone) or include the cookie value in the cache key (massively fragmenting the cache and destroying hit rate). Misconfigured Vary headers are one of the most common reasons a CDN has an unexpectedly low cache hit rate.

For personalised or authenticated content, the customer sets Cache-Control: private or no-store. The CDN respects this and bypasses caching entirely, acting as a transparent proxy rather than a cache.
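A toy cache-key function makes the fragmentation risk concrete. The helper and header handling below are illustrative, not any particular CDN's API:

```python
def cache_key(url, request_headers, vary_headers):
    """Cache key = full URL plus the request's value for each Vary header."""
    varies = "|".join(
        f"{name.lower()}={request_headers.get(name.lower(), '')}"
        for name in sorted(vary_headers)
    )
    return f"{url}#{varies}" if varies else url

url = "https://example.com/image.png?v=2"

# Vary: Accept-Encoding -> gzip and Brotli responses cached separately.
k_br = cache_key(url, {"accept-encoding": "br"}, ["Accept-Encoding"])
k_gz = cache_key(url, {"accept-encoding": "gzip"}, ["Accept-Encoding"])
assert k_br != k_gz

# Vary: Cookie -> one cache entry per unique cookie value (fragmentation).
k_a = cache_key(url, {"cookie": "ab_test=A"}, ["Cookie"])
k_b = cache_key(url, {"cookie": "ab_test=B"}, ["Cookie"])
assert k_a != k_b
```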


Component 4: Push CDN vs Pull CDN

Interviewer: What's the difference between a push CDN and a pull CDN?

Candidate: This comes down to who initiates content placement at the edge.

Pull CDN — the CDN fetches content from origin on demand. The first user who requests a piece of content triggers an origin fetch; subsequent users get it from cache. This is what Cloudflare, Fastly, and most modern CDNs do. The cache warms organically based on actual traffic.

plaintext
First request for /hero-image.jpg:
  User → CDN edge → CACHE MISS → fetches from origin → caches → serves
 
All subsequent requests:
  User → CDN edge → CACHE HIT → serves from cache

Push CDN — the content owner proactively pushes content to CDN edge nodes before any user requests it. Used when content is known in advance — software releases, video uploads, firmware updates. The customer uploads directly to CDN storage and the CDN replicates it across PoPs.

Interviewer: When would you use push over pull?

Candidate: Push CDN is ideal when: you know exactly what content will be requested before it's requested (a new iOS app update, a scheduled live event), the content is very large and you want all PoPs warmed before traffic spikes, or the origin infrastructure is very limited and can't handle even a brief flood of cache-miss requests. A game publisher releasing a 30 GB patch doesn't want 300 PoPs all pulling the same 30 GB from their origin simultaneously — they push it in advance during off-peak hours.

Pull CDN is better for most web content because it's self-managing — you don't need to think about which content to push, and outdated content naturally expires. The trade-off is a brief period of origin load on cache cold-start, mitigated by origin shielding.


Component 5: Cache Invalidation

This is the section interviewers dig into hardest. Getting cache invalidation wrong means users see stale content after updates, or you cause an origin-crushing thundering herd. Phil Karlton famously called naming things and cache invalidation the two hardest problems in computer science — and with a CDN, you have 300 locations to invalidate simultaneously.

Interviewer: A customer pushes a critical bug fix to their website and needs the old cached version gone immediately. How does cache invalidation work?

Candidate: There are four mechanisms, and they serve different needs.

TTL-based expiry — the simplest. Every cached object has a max-age from the Cache-Control header. When it expires, the CDN fetches a fresh copy on next request. No active invalidation needed. The trade-off: content stays stale until the TTL expires. A max-age=3600 means an hour of potential staleness after an update.

URL-based purge — the customer calls the CDN's purge API with specific URLs:

plaintext
POST /api/purge
{ "urls": ["https://example.com/logo.png", "https://example.com/index.html"] }

The CDN control plane fans out a purge command to all PoPs. Each PoP marks those cache entries as invalid. Effective but only practical for a small number of URLs — purging thousands of URLs one-by-one is slow.

Surrogate keys (cache tags) — the most powerful approach. When the origin serves content, it includes a Surrogate-Key (or Cache-Tag) header listing semantic tags for that response:

plaintext
HTTP/1.1 200 OK
Cache-Control: public, max-age=3600
Surrogate-Key: product-123 category-electronics homepage

The CDN indexes each cached object by its tags. When the customer updates product 123, they purge by tag:

plaintext
POST /api/purge
{ "tag": "product-123" }

Every cached object tagged with product-123 — regardless of URL — is instantly invalidated across all PoPs. This is how news sites invalidate all pages containing a specific article when it's edited, without knowing every URL that includes it.
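A sketch of the tag index that makes this work, with in-memory dictionaries standing in for the distributed purge fan-out:

```python
from collections import defaultdict

class TaggedCache:
    """Cache indexed by URL and by surrogate key: one tag purge, many URLs."""
    def __init__(self):
        self.objects = {}               # url -> cached response
        self.by_tag = defaultdict(set)  # tag -> urls carrying that tag

    def store(self, url, body, tags):
        self.objects[url] = body
        for tag in tags:
            self.by_tag[tag].add(url)

    def purge_tag(self, tag):
        # Drop every cached object carrying the tag, whatever its URL.
        for url in self.by_tag.pop(tag, set()):
            self.objects.pop(url, None)

cache = TaggedCache()
cache.store("/products/123", "<html>product page</html>",
            ["product-123", "category-electronics"])
cache.store("/", "<html>homepage featuring product 123</html>",
            ["homepage", "product-123"])

cache.purge_tag("product-123")   # both pages invalidated, no URL list needed
assert cache.objects == {}
```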

Stale-while-revalidate — not a purge mechanism, but a staleness tolerance pattern. The Cache-Control: max-age=60, stale-while-revalidate=300 header tells the CDN:

plaintext
"Keep this fresh for 60 seconds.
 For the next 300 seconds after that, serve the stale version while fetching a fresh one in the background.
 The user gets an immediate response; the cache is updated for the next request."
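The edge's decision for that header, sketched as a function of the cached object's age:

```python
def cache_decision(age_seconds, max_age=60, swr=300):
    """Edge behaviour for Cache-Control: max-age=60, stale-while-revalidate=300."""
    if age_seconds <= max_age:
        return "fresh: serve from cache"
    if age_seconds <= max_age + swr:
        return "stale: serve from cache, revalidate in background"
    return "expired: block on origin fetch"

assert cache_decision(30).startswith("fresh")
assert cache_decision(200).startswith("stale")
assert cache_decision(1000).startswith("expired")
```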

Interviewer: What about the thundering herd problem during invalidation?

Candidate: When content is invalidated across 300 PoPs simultaneously and all of them experience cache misses at the same time, they can all simultaneously flood the origin. Two mechanisms prevent this.

First, request coalescing: when multiple users request the same invalidated URL at the same time, the CDN holds all but the first request in a queue, makes one request to origin, and distributes the response to everyone waiting. No matter how many simultaneous requests hit the edge, the origin sees exactly one.
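A threaded sketch of request coalescing. For simplicity a completed fetch is kept forever here; a real edge server would expire it with the cache TTL:

```python
import threading

class CoalescingFetcher:
    """Concurrent misses for one URL result in exactly one origin fetch."""
    def __init__(self, origin_fetch):
        self.origin_fetch = origin_fetch
        self.lock = threading.Lock()
        self.in_flight = {}        # url -> Event set once the fetch completes
        self.results = {}          # url -> fetched response

    def get(self, url):
        with self.lock:
            event = self.in_flight.get(url)
            leader = event is None
            if leader:             # first requester becomes the leader
                event = self.in_flight[url] = threading.Event()
        if leader:
            self.results[url] = self.origin_fetch(url)   # one origin request
            event.set()            # wake every waiting follower
        else:
            event.wait()           # followers block until the leader finishes
        return self.results[url]
```

Ten concurrent requests for the same URL produce a single origin fetch; the other nine block on the `Event` and read the shared result.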

Second, soft purge: instead of hard-deleting the cached object, the CDN marks it as stale but keeps it. The first request after the purge serves the stale version while one background request fetches fresh content from origin. Subsequent users are already getting the fresh version. The user doesn't see staleness; the origin sees a trickle of traffic, not a spike. Fastly calls this "soft purge"; Cloudflare implements it via stale-while-revalidate.

Cache invalidation is where interviewers at Cloudflare and Fastly spend the most time — they'll push from TTL to URL purge to surrogate keys to thundering herd in one continuous thread. The candidates who handle it well are the ones who've explained the full chain out loud more than once. That's exactly what Mockingly.ai is built for.


Component 6: TLS Termination and Performance

Interviewer: Why is TLS termination at the edge so important?

Candidate: A standard HTTPS connection involves a TCP handshake plus a TLS handshake. Combined, this is multiple round trips before any data is exchanged. From London to an origin in Virginia, a single round trip is ~100ms. The TLS handshake alone might cost 200–300ms for a new session.

By terminating TLS at the edge PoP — say, the London PoP — the round trips happen over a short geographic distance. The London user's TLS handshake takes 5–10ms instead of 200ms. The CDN then maintains its own persistent, optimised connections to the origin (often HTTP/2 or HTTP/3 multiplexed connections that are kept alive across many requests), amortising the connection setup cost.

Additionally, the CDN maintains a TLS session cache. When a returning user reconnects, the CDN resumes the previous TLS session rather than performing a full handshake, reducing connection time to a single round trip. At scale, this is a significant latency reduction for repeat visitors.

The CDN also handles certificate management — automatically issuing and renewing TLS certificates for customer domains via Let's Encrypt or its own CA. This is a huge operational burden removed from customers.


Component 7: DDoS Protection

Interviewer: How does the CDN absorb a 500 Gbps DDoS attack?

Candidate: This is where Anycast architecture shines. When an attacker sends traffic to the CDN's Anycast IP, the internet's routing infrastructure spreads that traffic across all 300 PoPs globally. A 500 Gbps attack becomes ~1.7 Gbps per PoP — manageable for modern network hardware.

But raw network capacity isn't enough. The edge server needs to identify and drop malicious traffic before it consumes resources. The DDoS mitigation pipeline at each PoP works in layers:

plaintext
Layer 1: Network-level (L3/L4)
  → IP reputation blocklist (known bad actors)
  → Rate limiting by source IP
  → TCP SYN flood protection (SYN cookies)
  → Volumetric traffic shaping at the NIC level
 
Layer 2: Application-level (L7)
  → HTTP request rate limiting per IP, per user agent, per path
  → Bot detection: TLS fingerprinting, JavaScript challenge
  → Web Application Firewall (WAF) rules (OWASP top 10)
  → Challenge pages (CAPTCHA) for suspicious sources

The challenge page is particularly effective against L7 floods. Legitimate browsers execute JavaScript and store cookies; most bots can't. Serving a JavaScript challenge shifts the computational cost of attack mitigation from the CDN's infrastructure to the attacker's botnet.
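The per-IP rate limiting in the pipeline above is commonly a token bucket. A minimal single-threaded sketch, with illustrative rates:

```python
import time

class TokenBucket:
    """Per-IP token bucket: steady refill rate, bounded burst."""
    def __init__(self, rate_per_sec, burst):
        self.rate = rate_per_sec
        self.burst = burst
        self.tokens = float(burst)
        self.last = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False           # over the limit: drop, or serve a challenge page

bucket = TokenBucket(rate_per_sec=10, burst=5)
results = [bucket.allow() for _ in range(50)]
assert results[:5] == [True] * 5   # the burst passes
assert False in results            # a sustained flood gets throttled
```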

For the most sophisticated attacks — those that mimic legitimate browser traffic — the CDN uses ML-based anomaly detection. If a specific path suddenly gets 1000× its normal traffic with a new user agent pattern, it's flagged and rate-limited while a human reviews it.


Component 8: Content Caching Strategy by Type

Interviewer: Not all content should be cached the same way. How do you handle different content types?

Candidate: The right caching strategy depends entirely on how often content changes and how critical it is for users to see the latest version immediately.

Immutable static assets (versioned JS, CSS, images with hash in filename):

plaintext
Cache-Control: public, max-age=31536000, immutable

Cache for a year. The content never changes — if it did, the filename (hash) would be different. immutable tells the browser not to even send a revalidation request. Highest possible cache hit rate.

Semi-static content (homepage HTML, product images, blog posts):

plaintext
Cache-Control: public, s-maxage=3600, stale-while-revalidate=86400

Fresh for an hour on the CDN. If stale, serve the old version for up to 24 hours while fetching new content in the background. Tag with surrogate keys so editorial updates can trigger instant purges.

API responses (public, non-personalised):

plaintext
Cache-Control: public, s-maxage=60, stale-while-revalidate=300

Short TTL because the data changes often. s-maxage applies only to shared caches like the CDN (60 seconds here), so the browser's cache lifetime can be set independently via max-age. Serve stale for 5 minutes if the origin is slow.

Personalised or authenticated content:

plaintext
Cache-Control: private, no-store

The CDN bypasses caching entirely. It still provides TLS termination and network acceleration, but the content is fetched fresh from origin for every request. If this content is high-traffic, consider splitting personalised and generic portions — serve the page shell from CDN cache, load personalised data via a separate API call from the browser.


Edge Compute: Moving Logic to the Edge

Interviewer: How would you design an edge compute layer — the Cloudflare Workers equivalent?

Candidate: Edge compute lets customers run JavaScript (or WebAssembly) functions at the edge, between the CDN's cache lookup and the origin fetch. This enables use cases that pure caching can't handle: request rewriting, A/B testing, authentication at the edge, personalisation without round-tripping to origin.

The execution model is deliberately constrained. Functions run in a V8 isolate — similar to a browser tab — not a full container. This means:

  • Cold start is microseconds, not seconds (no process boot-up)
  • Multiple isolates can run on the same edge server without interfering
  • Memory and CPU are tightly limited (128 MB memory, 50ms CPU time per request)

A typical edge function intercepts the request before the cache lookup:

plaintext
Request arrives at edge
  → Edge runtime checks if a Worker is configured for this route
  → If yes: executes the Worker in an isolate
      → Worker can: modify the request, return a response directly,
                    fetch from multiple origins, write to KV store
  → If no response returned: falls through to normal cache/origin flow

The key design challenge is state. Isolates are stateless — they die after the request. For shared state between requests (user sessions, feature flags, A/B test assignments), edge compute relies on a distributed key-value store replicated across all PoPs. Cloudflare Workers KV is eventually consistent; for strongly consistent state, a durable object model is needed — a single-threaded actor that lives at one PoP and handles requests for a specific key.
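The dispatch flow above can be sketched as follows, with Python standing in for the JS/WebAssembly isolate runtime; the route and handler are illustrative:

```python
workers = {}   # route prefix -> handler function

def register(prefix):
    """Attach a worker function to a route prefix."""
    def decorator(fn):
        workers[prefix] = fn
        return fn
    return decorator

@register("/ab-test/")
def ab_test(request):
    # Assign an A/B variant at the edge; no origin round trip needed.
    variant = "A" if sum(map(ord, request["client_ip"])) % 2 == 0 else "B"
    return {"status": 200, "body": f"variant {variant}"}

def handle(request):
    for prefix, handler in workers.items():
        if request["path"].startswith(prefix):
            response = handler(request)
            if response is not None:
                return response              # worker answered directly
    return {"status": 200, "body": "cache/origin flow"}   # fall through
```

The variant assignment here is stateless and deterministic per IP; sticky assignments across devices are where the replicated KV store (or a durable object) comes in.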


Monitoring and Observability

Interviewer: How do you monitor a CDN at this scale?

Candidate: The key metrics are:

Cache hit ratio — the CDN's primary health metric. A drop in hit ratio means more traffic reaching origin, higher latency, and higher infrastructure cost. Alert on any sustained drop below 90% for static content.

Origin response time (p99) — how long cache misses take to resolve. A spike here indicates origin problems or connection pool saturation.

Request error rate — 4xx and 5xx responses by origin. Distinguish between client errors (fine, cacheable) and server errors (not cacheable, escalate).

Edge latency by PoP and region — per-PoP p50/p95/p99 latency. A specific PoP spiking while others are healthy suggests a localised hardware or network issue.

DDoS traffic volume — requests blocked, challenges served, traffic anomalies. The control plane watches for sudden changes in traffic patterns as early attack indicators.

All edge servers stream logs to a centralised logging pipeline (Kafka → ClickHouse or BigQuery for analytics). Real-time dashboards surface the above metrics at the global and per-PoP level with sub-minute delay.


Common Interview Follow-ups

"How do you handle a CDN PoP going completely offline?"

This is Anycast's built-in failover: when a PoP goes offline, BGP withdraws its route announcements, and the internet's routing infrastructure redirects traffic to adjacent PoPs automatically within seconds, with no configuration changes required. Users in the affected region experience slightly higher latency (they're now hitting a more distant PoP) but no service interruption. The origin shield also acts as a safety net: when redistributed traffic causes cache misses at the adjacent PoPs, the shield's warm cache prevents a cold-origin scenario.

"How would you handle a customer whose origin is slow — adding hundreds of milliseconds to cache misses?"

Origin shielding is the first lever — consolidate all PoP misses through one shield location, reducing how often the slow origin is hit. The second lever is stale-while-revalidate with a long staleness window — serve stale content for hours while fetching fresh content in the background. The third lever is prefetching: if a content management system publishes a new page, the CDN can immediately fetch and warm the cache rather than waiting for the first user to trigger a miss.

"What's the difference between cache hit ratio and origin offload ratio?"

Cache hit ratio is the percentage of requests served from cache without hitting origin. Origin offload ratio is the percentage of origin request traffic eliminated by the CDN. These can differ: a URL that accounts for 10% of requests but 80% of bytes (a large video) contributes heavily to origin offload even if its hit ratio is moderate. Origin offload ratio is more directly tied to infrastructure cost savings; hit ratio is more directly tied to user latency.
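A toy traffic mix shows how the two metrics diverge; the numbers are illustrative:

```python
# Per URL class: (name, requests, bytes per response, cache hit rate)
traffic = [
    ("small-pages", 9_000, 30_000, 0.99),      # most requests, few bytes
    ("large-video", 1_000, 50_000_000, 0.80),  # few requests, most bytes
]

total_requests = sum(reqs for _, reqs, _, _ in traffic)
hit_requests = sum(reqs * hit for _, reqs, _, hit in traffic)
total_bytes = sum(reqs * size for _, reqs, size, _ in traffic)
cached_bytes = sum(reqs * size * hit for _, reqs, size, hit in traffic)

cache_hit_ratio = hit_requests / total_requests   # ~0.97: looks excellent
origin_offload = cached_bytes / total_bytes       # ~0.80: origin still serves
                                                  # a fifth of the bytes
```

A 97% hit ratio alongside 80% byte offload is exactly the video-heavy scenario described above: latency looks great, but origin bandwidth cost is dominated by the misses on large objects.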

"How do you handle the Fastly incident — a single config change taking down a global CDN?"

In June 2021, a valid customer configuration change at Fastly triggered a latent software bug, causing a widespread outage that briefly took down major news sites, e-commerce platforms, and developer tools, revealing the importance of blast radius control and staged rollouts. The lesson is that configuration changes must be staged (rolled out to 1% of PoPs, then 10%, then 50%, then 100%) with automated rollback if error rates spike at any stage. A circuit breaker at the control plane level prevents a bad config from propagating globally before it's caught. Each PoP should also have a "last known good" configuration it can revert to autonomously if it detects that a new config is causing elevated errors.
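The staged-rollout loop, sketched with hypothetical thresholds and callback hooks standing in for real deployment and monitoring systems:

```python
STAGES = [0.01, 0.10, 0.50, 1.00]   # fraction of PoPs receiving the new config
ERROR_BUDGET = 0.02                  # abort if the error rate exceeds 2%

def rollout(apply_to_fraction, current_error_rate, rollback):
    """Push a config out stage by stage, reverting on any error spike."""
    for stage in STAGES:
        apply_to_fraction(stage)
        if current_error_rate() > ERROR_BUDGET:
            rollback()               # revert to the last known good config
            return f"rolled back at {stage:.0%}"
    return "fully deployed"
```

A spike caught at the 10% stage means the change never reaches the other 90% of PoPs, which is the whole point of blast radius control.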

These follow-up questions — the PoP failover, the slow origin, the Fastly incident — are the ones that sort strong candidates from great ones. They reward preparation that goes beyond reading and into actually practising your answers out loud. Mockingly.ai runs live system design simulations where this kind of follow-up is standard, for engineers targeting roles at Cloudflare, Amazon, Google, and Netflix.


Quick Interview Checklist

  • ✅ Clarified scope — general-purpose vs specialised, multi-tenant, DDoS in scope
  • ✅ Back-of-the-envelope with cache hit ratio as the primary number
  • ✅ Anycast routing explained — BGP, shared IP, shortest-path routing, failover
  • ✅ DNS-based routing as the alternative — trade-offs named explicitly
  • ✅ PoP internals — TLS termination, WAF/DDoS layer, cache lookup, request coalescing, origin fetch
  • ✅ Consistent hashing for intra-PoP load balancing — cache locality reasoning
  • ✅ Three-tier cache hierarchy — L1 RAM, L2 NVMe, L3 origin shield
  • ✅ Cache key design — URL default, Vary headers, personalisation bypass
  • ✅ Push vs pull CDN — explained when each is appropriate
  • ✅ Cache invalidation — TTL, URL purge, surrogate keys, stale-while-revalidate
  • ✅ Thundering herd protection — request coalescing + soft purge
  • ✅ TLS termination benefits — round trip reduction, session resumption, certificate management
  • ✅ DDoS mitigation — Anycast absorption, L3/L4 + L7 pipeline, JS challenge
  • ✅ Caching strategies by content type — immutable assets, semi-static, API, personalised
  • ✅ Edge compute model — V8 isolates, cold start, state via KV store
  • ✅ Monitoring — cache hit ratio, origin latency, error rates, per-PoP metrics

Conclusion

Designing a CDN is an exercise in understanding every layer of the network stack simultaneously — IP routing, TCP/TLS, HTTP caching semantics, distributed systems consistency, and large-scale attack mitigation. That's why it's a favourite interview question at infrastructure companies. It rewards candidates who can reason across all those layers coherently.

The candidates who ace this question at companies like Cloudflare, Amazon, and Google are the ones who can explain why Anycast provides DDoS resilience (traffic automatically distributes across PoPs), why request coalescing prevents the thundering herd (one origin request, many waiting clients), and why surrogate keys are better than URL purges for content with complex relationships.

The design pillars:

  1. Anycast routing — BGP makes the routing decision, not DNS; automatic failover, natural DDoS distribution
  2. Three-tier cache hierarchy — RAM for hot content, NVMe for warm, origin shield as the last line before origin
  3. Consistent hashing within each PoP — maps URLs to servers deterministically, maximising local cache hit rate
  4. Surrogate keys for cache invalidation — tag-based purging beats URL-based purging for any real content graph
  5. Soft purge + request coalescing — the two mechanisms that prevent thundering herd after invalidation
  6. stale-while-revalidate — the pattern that decouples content freshness from user latency
  7. L7 DDoS mitigation — JS challenges shift computational cost to the attacker


Frequently Asked Questions

What is Anycast routing and how does it work in a CDN?

Anycast is a network addressing method where multiple servers share the same IP address. BGP (Border Gateway Protocol) routes each incoming request to whichever server advertising that IP is topologically closest to the user.

How it works step by step:

  1. The CDN operator assigns a single IP address — e.g. 104.16.0.1 — to every PoP globally
  2. Each PoP announces this IP to the internet via BGP
  3. When a user's DNS lookup resolves to 104.16.0.1, their ISP routes the packet along the shortest BGP path
  4. That shortest path leads to the nearest PoP — automatically, without any DNS-level steering
  5. If a PoP goes offline, it withdraws its BGP announcement; internet routers converge to the next-nearest PoP within seconds

Why Anycast matters for DDoS:

A 500 Gbps attack targeting one IP is automatically absorbed across all PoPs — each absorbs its proportional share based on BGP routing. No single PoP sees the full attack volume. This is why Cloudflare can absorb record-breaking DDoS attacks that would overwhelm any single datacenter.


What is the difference between Anycast and DNS-based CDN routing?

Anycast uses BGP routing to direct traffic to the nearest PoP at the network layer. DNS-based routing uses geolocation at the DNS level to return a PoP-specific IP address to the user.

|                  | Anycast                                   | DNS-based                                               |
|------------------|-------------------------------------------|---------------------------------------------------------|
| Routing layer    | Network layer (BGP)                       | Application layer (DNS)                                 |
| Failover speed   | Seconds (BGP reconverges)                 | Minutes (DNS TTL must expire)                           |
| DDoS resilience  | Inherent — attack distributes across PoPs | Single PoP IP can be overwhelmed                        |
| Latency          | Optimal — BGP finds true shortest path    | Good — geolocation can mismatch actual network topology |
| Setup complexity | Requires BGP peering at each PoP          | Simpler DNS configuration                               |
| Who uses it      | Cloudflare, Google                        | Akamai (historically), AWS CloudFront                   |

When DNS-based routing breaks down:

A user in Singapore whose ISP routes traffic through Tokyo may get a DNS response pointing at the Singapore PoP — but their packets still travel via Tokyo, adding a needless detour. DNS geolocation sees the location of the user's DNS resolver, not the path the user's packets actually take. Anycast avoids this mismatch because BGP routing follows the real packet path.


What is origin shielding in a CDN and why does it matter?

Origin shielding designates one PoP as the exclusive gateway to the origin server. All other PoPs must route cache misses through the shield PoP rather than fetching directly from origin.

Without origin shielding:

  1. A popular asset expires from cache simultaneously in 50 PoPs
  2. All 50 PoPs send a request to the origin at the same moment
  3. The origin receives a sudden burst of 50 simultaneous requests — potentially overwhelming it

With origin shielding:

  1. All 50 PoPs send their cache misses to the shield PoP
  2. The shield PoP deduplicates and sends one request to the origin
  3. The origin's effective traffic is reduced by up to 50×
  4. The shield PoP also maintains a warm cache — other PoPs that miss in their own cache find the content already at the shield

When to use it:

  1. Origins that are expensive to hit (slow, rate-limited, or paying per-request)
  2. Content with irregular cache hit patterns where many PoPs miss simultaneously
  3. Any origin you want to protect from traffic spikes during cache invalidation events

How does CDN cache invalidation work — and what is the difference between URL purge and surrogate keys?

Cache invalidation removes stale content from CDN caches before the TTL expires. There are four mechanisms, each suited to different invalidation patterns.

1. TTL expiry (passive): Content expires naturally after its Cache-Control: max-age value. No active invalidation. Works for content that can tolerate staleness for the TTL duration.

2. URL purge (active): Send a purge request for a specific URL. The CDN marks that URL's cached copy as invalid across all PoPs.

  • Fast and precise for individual assets
  • Impractical for content with many representations (a product image in 12 sizes = 12 separate purge calls)

3. Surrogate keys / cache tags (active): Tag each cached response with one or more logical keys at origin: Surrogate-Key: product-42 category-shoes. A single purge of product-42 invalidates every cached response carrying that tag — across all URLs, formats, and PoPs.

  • One purge call invalidates thousands of cached objects
  • Ideal for content graphs where one upstream change (product update) affects many downstream URLs
  • Used by Fastly, Cloudflare, and Varnish under different names (Cache-Tag, Surrogate-Key, xkey)

4. Stale-while-revalidate (background refresh): Serve the stale cached version immediately while fetching a fresh copy in the background. The next user gets the updated content. Zero latency impact on the user who triggered the revalidation.
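The surrogate-key mechanism is essentially a reverse index from tag to cached objects. A minimal in-memory sketch — the class, tag names, and URL shapes are all illustrative:

```python
from collections import defaultdict

class SurrogateKeyCache:
    """Toy cache supporting tag-based purge (names are illustrative)."""
    def __init__(self):
        self.objects = {}                  # url -> cached body
        self.tag_index = defaultdict(set)  # tag -> set of urls carrying it

    def store(self, url, body, tags):
        self.objects[url] = body
        for tag in tags:
            self.tag_index[tag].add(url)

    def purge_tag(self, tag):
        # One call evicts every URL carrying the tag, whatever its shape.
        for url in self.tag_index.pop(tag, set()):
            self.objects.pop(url, None)

cache = SurrogateKeyCache()
for size in ("small", "medium", "large"):
    cache.store(f"/img/product-42-{size}.jpg", b"...",
                ["product-42", "category-shoes"])

cache.purge_tag("product-42")  # all three renditions evicted at once
print(len(cache.objects))      # → 0
```

With URL purge, the same update would need one purge call per rendition; the tag index collapses that to one call regardless of how many URLs the product fans out into.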


What is stale-while-revalidate and when should you use it?

stale-while-revalidate is an HTTP Cache-Control directive that allows a cache to serve a stale response immediately while fetching a fresh copy in the background.

```plaintext
Cache-Control: max-age=300, stale-while-revalidate=60
```

This means:

  1. 0–300 seconds after caching: serve fresh. No network request
  2. 300–360 seconds (the revalidate window): serve stale immediately while fetching fresh in background. User experiences no latency
  3. After 360 seconds: the revalidate window has passed, so the cache must revalidate with the origin before responding — the request blocks on the origin fetch
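The three windows reduce to a small decision function. A sketch, where max_age and swr mirror the directive above:

```python
# Sketch: classify a cached response's state under
# Cache-Control: max-age=300, stale-while-revalidate=60.
def cache_state(age_seconds: float, max_age: int = 300, swr: int = 60) -> str:
    if age_seconds < max_age:
        return "fresh"                   # serve from cache, no origin contact
    if age_seconds < max_age + swr:
        return "stale-while-revalidate"  # serve stale now, refresh in background
    return "expired"                     # must revalidate with origin first

print(cache_state(120))  # → fresh
print(cache_state(330))  # → stale-while-revalidate
print(cache_state(400))  # → expired
```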

Why it matters:

Without stale-while-revalidate, every request arriving exactly at TTL expiry blocks on an origin fetch. The user waits. With it, that user gets the stale response in milliseconds and the cache refreshes silently.

When to use it:

  1. Content that changes frequently but where milliseconds of staleness is acceptable (news feeds, product listings, sports scores)
  2. Any content where origin latency would otherwise be visible to the end user on cache expiry
  3. Combined with surrogate key purging: use stale-while-revalidate for routine TTL management; surrogate keys for immediate forced invalidation when content must change now

What is the thundering herd problem in a CDN and how do you prevent it?

Thundering herd occurs when many simultaneous requests arrive for the same content at the exact moment it expires from cache — all of them triggering origin fetches simultaneously.

Scenario:

  1. A popular image has Cache-Control: max-age=300
  2. At exactly 300 seconds, 10,000 users request it simultaneously
  3. All 10,000 requests find an empty cache and hit the origin
  4. Origin receives 10,000 simultaneous requests for one file — potential overload

Two mechanisms prevent this:

1. Request coalescing (within a PoP)

When multiple simultaneous requests arrive for an uncached (or expired) resource:

  1. The first request is forwarded to origin
  2. All subsequent requests for the same URL are held in a queue
  3. When the origin responds, the content is cached and all queued requests are served simultaneously from the single response

Result: 10,000 simultaneous requests produce exactly one origin request.

2. Soft purge (for invalidation-triggered herd)

Instead of deleting content immediately on purge, mark it as stale. Stale content continues to be served while one background request fetches the fresh version. Only after the fresh version arrives does the old content get evicted. This decouples the purge event from the origin fetch burst.


What is the difference between a push CDN and a pull CDN?

Pull CDN fetches content from origin on the first cache miss and caches it for subsequent requests. Push CDN requires the origin to proactively upload content to CDN edge nodes before any user requests it.

|                              | Pull CDN                                 | Push CDN                                                |
|------------------------------|------------------------------------------|---------------------------------------------------------|
| How content reaches the edge | On first user request (cache miss)       | Origin pushes it proactively                            |
| First user latency           | Higher — first request hits origin       | Lower — content already at edge                         |
| Storage management           | CDN manages eviction via LRU             | Origin controls what is at the edge                     |
| Best for                     | Dynamic or unpredictably popular content | Large static files (software downloads, video, backups) |
| Origin traffic               | Spiky — misses hit origin unpredictably  | Predictable — origin pushes on publish                  |
| Examples                     | Cloudflare, most general CDNs            | AWS S3 + CloudFront push, Akamai NetStorage             |

The right choice depends on content lifecycle:

  1. Use pull for websites, APIs, and content where you cannot predict which assets will be popular
  2. Use push for large files you know will be requested heavily: game downloads, software releases, large video files
  3. Most modern CDNs support both models — use pull as the default and push for known high-traffic assets

How does a CDN protect against DDoS attacks?

CDN DDoS protection works by absorbing attack traffic across the CDN's global edge network — distributing the load so no single point is overwhelmed — and filtering malicious packets before they reach the origin.

The defence pipeline has two layers:

Layer 3/4 (Network layer — volume attacks):

  1. Anycast routing distributes incoming traffic across all PoPs automatically — a 500 Gbps attack is split across 300 PoPs, each absorbing ~1.7 Gbps
  2. BGP blackholing: if an IP is confirmed under attack, upstream ISPs drop traffic for that IP at their routers
  3. Stateless packet filtering: drop packets that don't match expected TCP/UDP profiles (invalid flags, spoofed source IPs)

Layer 7 (Application layer — smart attacks):

  1. Rate limiting per IP, ASN, or geographic region
  2. JS challenge: return a JavaScript puzzle to suspected bots. Legitimate browsers solve it in milliseconds (transparent to users); bots cannot execute JavaScript without a full browser runtime. This shifts the computational cost to the attacker
  3. Anomaly detection: machine learning models trained on traffic patterns flag unusual request signatures, user agents, or behavioural patterns
  4. CAPTCHA escalation for IPs that fail the JS challenge
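The per-IP rate limiting in step 1 is commonly implemented as a token bucket: each IP's bucket refills at a steady rate up to a burst cap, and a request is dropped (or challenged) when the bucket is empty. A minimal in-memory sketch, with illustrative RATE and BURST values:

```python
import time
from collections import defaultdict

RATE = 10.0   # tokens refilled per second (illustrative limit)
BURST = 20.0  # bucket capacity, i.e. the largest tolerated burst

buckets = defaultdict(lambda: {"tokens": BURST, "last": time.monotonic()})

def allow(ip: str) -> bool:
    """Return True if this IP's request is within its rate budget."""
    b = buckets[ip]
    now = time.monotonic()
    b["tokens"] = min(BURST, b["tokens"] + (now - b["last"]) * RATE)
    b["last"] = now
    if b["tokens"] >= 1.0:
        b["tokens"] -= 1.0
        return True
    return False  # over the limit: drop, challenge, or CAPTCHA
```

A real edge deployment shards these buckets per PoP (or keeps approximate global counts), and keys them by ASN or JA3 fingerprint as well as IP, but the refill arithmetic is the same.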

Why CDN-level DDoS protection beats origin-level:

An origin server absorbing a DDoS attack is burning CPU and bandwidth on malicious traffic while trying to serve legitimate users. A CDN drops the attack at the edge — legitimate users get cached content with sub-50ms latency; malicious traffic never reaches the origin.


How does edge compute work in a CDN — and what are Cloudflare Workers?

Edge compute runs customer-defined code at CDN edge nodes, between the cache layer and the origin. It enables request transformation, authentication, A/B testing, and personalisation without round-tripping to origin.

Cloudflare Workers is the most widely known implementation. The execution model:

  1. Customer deploys a JavaScript or WebAssembly function to Cloudflare
  2. On each request, the edge runtime checks if a Worker is configured for that route
  3. If yes: the Worker runs in a V8 isolate — a lightweight execution context similar to a browser tab
  4. The Worker can modify the request, return a response directly, fetch from multiple origins, or read/write from a key-value store
  5. If the Worker returns a response: origin is never contacted. If it falls through: normal cache/origin flow continues
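The dispatch decision in step 5 boils down to: worker first, then cache, then origin. Real Workers are written in JavaScript against the Service Worker-style fetch API; the sketch below just models the edge's routing decision in Python, with illustrative names throughout:

```python
def handle_request(request, workers, cache, fetch_origin):
    """Edge dispatch sketch: worker first, then cache, then origin."""
    worker = workers.get(request["route"])
    if worker:
        response = worker(request)  # runs in its isolate
        if response is not None:
            return response         # worker answered: origin never contacted
    # No worker, or the worker fell through: normal cache/origin flow.
    key = request["url"]
    if key not in cache:
        cache[key] = fetch_origin(request)  # cache miss: fetch and fill
    return cache[key]
```

Returning None here models a worker that "falls through" to the default flow, which is how A/B tests or header rewrites coexist with ordinary caching.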

Why V8 isolates instead of containers:

  1. Cold start: isolate starts in microseconds. A container cold start takes hundreds of milliseconds
  2. Density: thousands of customer functions can run on the same edge server simultaneously
  3. Safety: each isolate is memory-isolated — one customer's function cannot access another's data

The state problem:

Isolates are stateless — they die after the request. Shared state requires a distributed key-value store (Cloudflare KV) replicated across all PoPs. KV is eventually consistent. For strongly consistent state — like a rate limiter that must be exact — a Durable Object model is needed: a single-threaded actor anchored to one specific PoP.


Which companies ask the CDN system design question in interviews?

Cloudflare, Amazon, Google, Meta, Netflix, Akamai, and Fastly all ask variants of this question for senior software engineer and infrastructure roles.

Why it is popular:

  1. Breadth across the stack — it requires reasoning about IP routing (BGP/Anycast), transport (TCP/TLS), HTTP caching semantics, distributed consistency, and attack mitigation simultaneously
  2. Directly tied to revenue — cache hit ratio, origin offload, and DDoS resilience are measurable business outcomes, not abstract engineering concerns
  3. Scales to seniority — a junior answer covers basic caching; a senior answer covers surrogate key invalidation, thundering herd prevention, Anycast vs DNS routing trade-offs, and staged config rollouts

What interviewers specifically listen for:

  1. Anycast vs DNS routing — and explaining why Anycast provides inherent DDoS resilience
  2. Surrogate keys over URL purge — and the content graph scenario that makes URL purge impractical
  3. Request coalescing — named explicitly as the thundering herd solution, not just "caching"
  4. stale-while-revalidate — the pattern that decouples freshness from user latency
  5. Staged config rollouts — proactively mentioning the Fastly 2021 outage lesson signals operational maturity

The CDN question rewards engineers who've built mental models of how the internet actually works — not just application-layer abstractions. If you want to practice holding that mental model under pressure while an interviewer pokes at every layer, Mockingly.ai has system design simulations built for senior engineers preparing for roles at Cloudflare, Amazon, Google, Meta, Netflix, and Fastly.
