Blog

How to Design Instagram / a Social Feed - Complete System Design Guide

22 min · October 20, 2025 · System Designer · hard


Master Instagram and social feed system design for your next big tech interview. Learn the hybrid fan-out approach, how Instagram's real tech stack works, feed ranking with ML signals, media upload pipelines, the thundering herd problem, and how to scale to 2 billion users.

Design Instagram / Social Feed

Ask ten engineers to design Instagram and you'll get ten different answers. That's actually the point — this problem has no single correct solution, and experienced interviewers know it. What they're really testing is whether you can reason through a genuinely hard set of trade-offs: when to push versus pull, how to handle a user with 500 million followers when they post, which data belongs in PostgreSQL versus Cassandra, and why the feed ranking system that looks straightforward on a whiteboard requires over 1,000 machine learning models in production.

Designing Instagram is one of the most common system design interview questions asked not just at Meta, but across all FAANG and FAANG-adjacent companies. It's popular for a reason: the problem touches almost every major distributed systems concept — social graphs, media storage and delivery, feed generation, caching at extreme scale, and ML-based ranking.

This guide covers all of it. The goal isn't to hand you a template to memorise. It's to give you a genuine understanding of why each decision gets made, which is the only thing that holds up when the interviewer starts pulling threads.


Problem Statement

Design a social feed platform like Instagram that can:

  • Allow users to upload photos and videos with captions
  • Let users follow other users
  • Show each user a personalised feed of posts from accounts they follow
  • Handle likes, comments, and engagement at scale
  • Serve media (images, video) with low latency globally
  • Scale to hundreds of millions of daily active users

In the interview, clarify the scope before diving in. Reels, Stories, DMs, Explore, and Search are each substantial sub-systems. Unless the interviewer asks for them explicitly, scope down: focus on posting media and serving the feed. These two flows alone will fill a 45-minute interview if you go deep.


Requirements Gathering

Functional Requirements

  • Post creation: Users upload a photo or video with a caption and optional location tag
  • Follow graph: Users can follow and unfollow other users
  • Feed: Users see a personalised feed of recent posts from accounts they follow, ranked by relevance
  • Likes and comments: Users can like and comment on posts
  • Profile page: A user's own posts displayed in reverse-chronological order
  • Media delivery: Photos and videos load fast globally

Non-Functional Requirements

  • High availability: The feed should always be readable, even during partial failures
  • Eventual consistency: A post doesn't need to appear in every follower's feed instantly — seconds or minutes of delay is acceptable
  • Read-heavy: Feed reads vastly outnumber post writes; optimise accordingly
  • Low feed latency: Feed should load in under 200ms
  • Durability: A posted photo must never be lost
  • Global scale: Users are worldwide; media delivery latency matters

Back-of-the-Envelope Calculations

Let's anchor this in real numbers before touching architecture.

Instagram has roughly 2 billion monthly active users and 500 million daily active users, and sees approximately 1.3 billion images shared every day. The calculation below deliberately uses a more conservative 100 million feed posts per day.

plaintext
Scale assumptions:
  DAU:                    500 million
  Posts per day:          ~100 million (conservative: photos + videos)
  Feed reads per DAU:     ~10 feed refreshes/day
  Posts per feed refresh: 20 posts returned
 
Write throughput (posts):
  100M posts/day ÷ 86,400s ≈ 1,150 posts/second
  Peak (5x):              ~5,750 posts/second
 
Read throughput (feed loads):
  500M DAU × 10 reads/day ÷ 86,400s ≈ 57,870 reads/second
  Peak (3x):              ~175,000 feed reads/second
 
Read:write ratio ≈ 50:1 — this is emphatically a read-heavy system.
 
Media storage:
  Average photo size (compressed): 300 KB
  Average video size:               5 MB
  Assume 80% photos, 20% videos
 
  Per day: (80M × 300KB) + (20M × 5MB)
         = 24 TB  +  100 TB
         = ~124 TB/day of new media
 
  5-year storage: 124 TB/day × 365 × 5 ≈ 226 PB
 
Feed metadata storage (post records):
  ~500 bytes per post record (post ID, user ID, caption, media URL, timestamp, counters)
  100M posts/day × 500B = 50 GB/day of metadata
  5-year: ~90 TB

Two implications stand out immediately: media storage is the dominant concern (petabytes), so it must live in a dedicated object store — not your application database. And at 175,000 feed reads per second, every extra millisecond in your feed generation path costs real money in infrastructure.
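The arithmetic above is easy to sanity-check in a few lines of Python, using the same assumptions as the table:

```python
# Sanity-check the back-of-envelope numbers (same assumptions as above).
DAU = 500_000_000
POSTS_PER_DAY = 100_000_000
FEED_READS_PER_USER = 10
SECONDS_PER_DAY = 86_400

write_qps = POSTS_PER_DAY / SECONDS_PER_DAY             # ~1,150 posts/second
read_qps = DAU * FEED_READS_PER_USER / SECONDS_PER_DAY  # ~57,870 reads/second
read_write_ratio = read_qps / write_qps                 # 50:1

# Media: 80% photos at 300 KB, 20% videos at 5 MB
photo_tb_per_day = 0.8 * POSTS_PER_DAY * 300e3 / 1e12   # 24 TB/day
video_tb_per_day = 0.2 * POSTS_PER_DAY * 5e6 / 1e12     # 100 TB/day
five_year_pb = (photo_tb_per_day + video_tb_per_day) * 365 * 5 / 1000  # ~226 PB
```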


The Core Architecture

plaintext
                    ┌────────────────────────┐
                    │    CDN / Edge Cache    │
                    │  (CloudFront / Fastly) │
                    └───────────┬────────────┘
                                │ media delivery
                    ┌───────────┴────────────┐
                    │     API Gateway /      │
                    │     Load Balancer      │
                    └───────────┬────────────┘
                                │
        ┌───────────────────────┼───────────────────────┐
        │                       │                       │
┌───────▼──────────┐    ┌───────▼────────────┐   ┌──────▼─────────────┐
│   Post Service   │    │    Feed Service    │   │    User/Social     │
│ (upload, store,  │    │ (feed generation,  │   │   Graph Service    │
│    fan-out)      │    │  ranking, serve)   │   │ (follow, unfollow) │
└───────┬──────────┘    └───────┬────────────┘   └──────┬─────────────┘
        │                       │                       │
        │            ┌──────────┴───────────┐           │
        │            │                      │           │
┌───────▼───────┐ ┌──▼────────────┐ ┌───────▼──────┐ ┌──▼───────────────┐
│  Object Store │ │  Feed Cache   │ │  Feed Store  │ │ Social Graph DB  │
│  (S3 / Blob)  │ │   (Redis)     │ │ (Cassandra)  │ │  (PostgreSQL)    │
│ Photos/Videos │ │  Hot feeds    │ │ Pre-computed │ │  Users/follows   │
└───────────────┘ └───────────────┘ │  timelines   │ └──────────────────┘
                                    └──────────────┘
        ┌──────────────────────────────────────────┐
        │                  Kafka                   │
        │  (async fan-out, media processing jobs)  │
        └──────────────────────────────────────────┘

The Two Critical Flows

Flow 1: Posting a Photo

When a user posts a photo, two things need to happen: the media needs to be stored durably, and the post needs to appear in followers' feeds. These are intentionally decoupled.

Step 1: Media Upload

The naive approach — upload the photo to your application server, have the server write it to storage — has a fatal flaw: a photo can be 5MB, a video can be 500MB. Funnelling that through your API servers means they spend all their capacity buffering binary data instead of handling API requests.

The right pattern is a pre-signed URL upload:

plaintext
1. Client → POST /posts/upload-url
             (requests a pre-signed upload URL)
 
2. Post Service → Object Store (S3/GCS)
             (generates a pre-signed URL, valid for 10 minutes)
 
3. Post Service → Client
             (returns the pre-signed URL)
 
4. Client → Object Store (direct upload, bypassing API servers entirely)
             (uploads the photo/video directly to S3)
 
5. Client → POST /posts
             (notifies Post Service that upload is complete,
              with the media URL and caption)

The application server is completely out of the media upload path after step 3. This pattern is used universally at scale — Instagram, Dropbox, and Slack all do direct-to-object-store uploads.
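In AWS terms this is boto3's `generate_presigned_url`; the mechanism underneath is an HMAC signature over the object key and an expiry time, which the object store can verify without calling back to the API tier. A minimal sketch of that idea (helper names and URL shape are illustrative, not the real SigV4 algorithm):

```python
import hashlib
import hmac
import time

SECRET = b"shared-secret-between-api-and-object-store"  # assumption: symmetric key

def make_upload_url(bucket, key, ttl_s=600, now=None):
    """Issue a pre-signed PUT URL valid for ttl_s seconds (simplified)."""
    expires = int((now or time.time()) + ttl_s)
    payload = f"PUT:{bucket}:{key}:{expires}".encode()
    sig = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    return f"https://{bucket}.example-store.com/{key}?expires={expires}&sig={sig}"

def verify_upload(bucket, key, expires, sig, now=None):
    """The object store checks expiry and signature locally; no API-tier call."""
    if (now or time.time()) > expires:
        return False
    payload = f"PUT:{bucket}:{key}:{expires}".encode()
    expected = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, sig)
```

Anyone holding the URL can upload exactly that object until the expiry, and nothing else — which is why handing it to an untrusted client is safe.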

Step 2: Media Processing

Raw uploads need to be processed: resized to multiple resolutions (thumbnail, standard, 2x), transcoded for video, and have metadata extracted. This is always asynchronous:

plaintext
Client upload completes →
  Post Service writes post record to DB →
    Kafka event: "new_post:{post_id}" →
      Media Processing Worker consumes event →
        Generates thumbnail (320px), standard (720px), HiDPI (1080px) →
          Stores processed versions in Object Store →
            Updates post record with all media URLs

The post is visible to the user immediately (with the original upload). Processed versions become available within seconds to minutes. Users rarely notice the gap.
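A sketch of the worker side of this pipeline, with a plain in-memory queue standing in for Kafka and dictionaries standing in for the object store and post table (all names here are illustrative):

```python
from queue import Queue

RESOLUTIONS = {"thumb": 320, "std": 720, "hd": 1080}

object_store = {}   # key -> bytes (stand-in for S3)
posts_db = {}       # post_id -> post record (stand-in for the post table)
events = Queue()    # stand-in for the "new_post" Kafka topic

def resize(image, width):
    # Placeholder: a real worker would shell out to Pillow / ffmpeg here.
    return image + f":{width}px".encode()

def media_worker():
    """Consume new_post events, generate renditions, update the post record."""
    while not events.empty():
        post_id = events.get()
        original = object_store[f"posts/{post_id}/original.jpg"]
        urls = {}
        for name, width in RESOLUTIONS.items():
            key = f"posts/{post_id}/{name}_{width}.jpg"
            object_store[key] = resize(original, width)
            urls[name] = key
        posts_db[post_id]["media_urls"] = urls  # post now points at all renditions
```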

Step 3: Fan-Out (covered in detail below)

Once the post is persisted and media is processing, a separate fan-out worker pushes the post reference into followers' feed timelines.


Flow 2: Loading the Feed

When a user opens Instagram, their feed loads in under 200ms. That's only possible because the feed isn't computed on-the-fly — it was pre-built.

plaintext
User opens app →
  GET /feed?cursor={last_seen_post_id}&limit=20
 
Feed Service:
  1. Check Redis cache for user's pre-computed feed → HIT (most of the time)
  2. Return top 20 post IDs from the cached feed list
  3. Fetch post metadata for those 20 IDs in parallel (from metadata cache/DB)
  4. Return assembled feed to client

The feed is essentially a sorted list of post IDs cached in Redis per user. The Feed Service just reads from that list — no expensive joins, no real-time social graph traversal.

Pagination uses cursor-based pagination (not page numbers). The client sends the ID of the last post it saw; the server returns the next N posts after that point. This is stable, efficient, and doesn't break when new posts are inserted at the top.
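A minimal sketch of this read path, with dicts standing in for the Redis feed list and the metadata cache:

```python
feed_cache = {}      # user_id -> list of post IDs, newest first (stand-in for Redis)
post_metadata = {}   # post_id -> post record (stand-in for the metadata cache/DB)

def get_feed(user_id, cursor=None, limit=20):
    """Return the next `limit` posts after `cursor` (a post ID), plus the next cursor."""
    timeline = feed_cache.get(user_id, [])
    start = timeline.index(cursor) + 1 if cursor in timeline else 0
    page_ids = timeline[start:start + limit]
    posts = [post_metadata[pid] for pid in page_ids]  # fetched in parallel in production
    next_cursor = page_ids[-1] if page_ids else None
    return {"posts": posts, "next_cursor": next_cursor}
```

Note the stability property: if new posts land at the head of the timeline between two fetches, the page after a given cursor is unchanged — exactly what offset pagination cannot guarantee.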


The Fan-Out Problem: The Heart of Feed Design

This is the most important design decision in the entire system, and the one interviewers probe hardest. When a user posts, how does that post get into their followers' feeds?

There are three approaches:

Option 1: Fan-Out on Write (Push Model)

When User A posts, immediately write a reference to that post into every single follower's feed timeline.

plaintext
User A (100 followers) posts →
  Fan-out worker reads User A's follower list →
  For each of 100 followers:
    Append post_id to follower's feed in Cassandra/Redis
 
When Follower B opens their feed:
  Just read the pre-built timeline — O(1), instant

Pros:

  • Feed reads are extremely fast (just read a pre-built list)
  • No real-time computation needed at read time

Cons:

  • Writing is expensive when users have millions of followers
  • Selena Gomez has 400 million followers. A single post triggers 400 million writes. At peak, multiple celebrities might post simultaneously — this creates massive write spikes

Option 2: Fan-Out on Read (Pull Model)

Don't pre-build feeds at all. When User B opens their feed, the system fetches all the accounts B follows, pulls their recent posts, merges and ranks them in real time.

plaintext
Follower B opens feed →
  Feed Service fetches B's follow list (500 accounts) →
  For each account:
    Fetch their N most recent posts from the post DB
  Merge all results, sort, rank →
  Return top 20

Pros:

  • Zero fan-out writes — posting is just a single DB write
  • No wasted work for posts that followers never actually view

Cons:

  • Feed generation requires touching potentially hundreds of data sources simultaneously
  • At 175,000 feed reads/second, this creates an enormous read load
  • Latency is terrible for users who follow many accounts with lots of posts

Option 3: Hybrid Fan-Out (The Right Answer)

Pre-compute the feed for followers of non-celebrity users. For celebrity users with 1M+ followers, skip the pre-computation: at that scale the fan-out is prohibitively expensive in compute and I/O. Instead, merge their posts into followers' feeds at read time.

plaintext
User A posts (regular user, 500 followers):
  → Fan-out worker pushes post_id to all 500 followers' timelines immediately ✓
 
Celebrity C posts (10 million followers):
  → Only write the post to Celebrity C's own post store
  → When a follower opens their feed, merge Celebrity C's recent posts in real-time

At read time, the Feed Service assembles each user's feed by:

  1. Reading the user's pre-built timeline (covers all regular accounts they follow) — fast
  2. Querying the post stores of any celebrities they follow — a small, bounded set of additional reads
  3. Merging and ranking the combined results

The threshold for "celebrity" is a tunable parameter — commonly 1 million followers, though it varies by system.

This hybrid approach is what Instagram actually uses, and it's the answer you should give in an interview. The key insight is that the fan-out cost grows linearly with followers, so you apply the expensive push model only where it's cheap, and fall back to pull where it would be prohibitive.

The fan-out question is one where interviewers expect you to argue through all three options, not just land on the hybrid answer. If your reasoning through the trade-offs feels thin anywhere, Mockingly.ai has social feed simulations built around exactly this decision.

python
CELEBRITY_THRESHOLD = 1_000_000

def fan_out_post(post_id: str, author_id: str):
    follower_count = social_graph.get_follower_count(author_id)

    if follower_count > CELEBRITY_THRESHOLD:
        # Celebrity: publish to the celebrity post store only — no fan-out.
        # Followers merge these posts into their feeds at read time.
        celebrity_store.add_post(author_id, post_id)
    else:
        # Regular user: fan out to all followers asynchronously via Kafka.
        # (In practice the follower list is chunked into batches per message.)
        followers = social_graph.get_followers(author_id)
        kafka.publish("fan_out_jobs", {
            "post_id": post_id,
            "follower_ids": followers,
        })
 
def fan_out_worker(job):
    for follower_id in job["follower_ids"]:
        feed_store.prepend(follower_id, job["post_id"])
        # Also update the Redis copy for users whose feed is currently cached
        if redis.exists(f"feed:{follower_id}"):
            redis.lpush(f"feed:{follower_id}", job["post_id"])
            redis.ltrim(f"feed:{follower_id}", 0, 999)  # keep the 1,000 newest entries
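The read side of the hybrid is the mirror image: merge the pre-built timeline with the recent posts of any celebrities the user follows. A sketch, relying on the Snowflake-ID property that a larger ID means a newer post:

```python
import heapq

def assemble_feed(timeline_ids, celebrity_post_lists, limit=20):
    """Merge the pre-built timeline with per-celebrity post lists.

    All inputs are lists of Snowflake post IDs sorted newest-first, so a
    k-way merge on descending ID yields a time-ordered candidate set.
    """
    sources = [timeline_ids] + celebrity_post_lists
    # heapq.merge wants ascending inputs; negate the IDs to merge newest-first.
    merged = heapq.merge(*([-pid for pid in src] for src in sources))
    return [-pid for _, pid in zip(range(limit), merged)]
```

Because the merge is lazy, only the top `limit` entries are ever materialised, no matter how long the underlying timelines are.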

Database Design

Instagram's real-world database choices are well-documented, and they reflect the exact trade-offs you'd reason through in an interview.

What Instagram Actually Uses

  • PostgreSQL: stores structured data such as user profiles, comments, relationships, and metadata, running in a leader-follower replication topology for high availability.
  • Cassandra: stores highly distributed data such as user feeds, activity logs, and analytics data. It follows an eventual consistency model and provides high write throughput, making it ideal for feed timelines.
  • Memcached: reduces database load by caching frequently accessed data, storing temporary copies of user profiles, posts, and like counts.
  • Haystack: Meta's internal storage system for images and videos, minimising the number of file system operations required to fetch content.

For an interview context, using S3/GCS is the right abstraction for object storage — Haystack is Meta-internal. The PostgreSQL + Cassandra combination is exactly what you should propose and can defend well.

PostgreSQL Schema (Structured/Relational Data)

sql
-- Core user table
CREATE TABLE users (
    id            BIGINT PRIMARY KEY,  -- Snowflake ID
    username      VARCHAR(30) UNIQUE NOT NULL,
    email         VARCHAR(255) UNIQUE NOT NULL,
    password_hash VARCHAR(255) NOT NULL,
    bio           TEXT,
    profile_pic   VARCHAR(512),         -- URL to object store
    follower_count INT DEFAULT 0,       -- Denormalised for fast reads
    following_count INT DEFAULT 0,
    created_at    TIMESTAMP DEFAULT NOW(),
    INDEX idx_username (username)
);
 
-- Posts table
CREATE TABLE posts (
    id            BIGINT PRIMARY KEY,  -- Snowflake ID
    user_id       BIGINT NOT NULL REFERENCES users(id),
    caption       TEXT,
    media_url     VARCHAR(512) NOT NULL,
    media_type    SMALLINT NOT NULL,   -- 0=photo, 1=video
    thumbnail_url VARCHAR(512),
    like_count    INT DEFAULT 0,       -- Denormalised counter
    comment_count INT DEFAULT 0,
    created_at    TIMESTAMP DEFAULT NOW(),
    INDEX idx_user_posts (user_id, created_at DESC)
);
 
-- Follow relationships (social graph)
CREATE TABLE follows (
    follower_id   BIGINT NOT NULL,
    followee_id   BIGINT NOT NULL,
    created_at    TIMESTAMP DEFAULT NOW(),
    PRIMARY KEY   (follower_id, followee_id),
    INDEX idx_followee (followee_id)   -- "who follows user X?" lookups
);

Important: like_count and comment_count are denormalised counters on the post record — not computed from SELECT COUNT(*) queries at read time. Running count aggregations at Instagram scale would be catastrophic. Increment/decrement them atomically when likes and comments are created/deleted.

Cassandra Schema (Feed Timelines)

sql
-- Pre-computed user feed timeline
CREATE TABLE user_feed (
    user_id       UUID,
    post_id       BIGINT,        -- Snowflake ID — sortable by time
    author_id     UUID,
    created_at    TIMESTAMP,
    PRIMARY KEY   (user_id, post_id)
) WITH CLUSTERING ORDER BY (post_id DESC)
  AND default_time_to_live = 2592000;  -- 30 day TTL: auto-expire old entries

The Cassandra feed table is append-only: new post references are simply inserted, and the CLUSTERING ORDER BY (post_id DESC) means a read always returns the newest entries first. Reading a user's feed is a single Cassandra partition scan — fast, predictable, and horizontally scalable.

The 30-day TTL automatically purges old feed entries so the table doesn't grow unboundedly. If a user hasn't opened the app in 30 days, their pre-computed feed expires and gets regenerated from scratch when they return.

Why Not a Single Database?

PostgreSQL handles social graph queries beautifully (follow relationships, mutual connections), but it struggles as a write-heavy time-series store. Every new post triggers potentially millions of feed writes — PostgreSQL's B-tree indexes would thrash under that kind of write load. Cassandra's LSM-tree storage is designed for sequential writes at massive throughput. That's why you use both.


Media Storage and CDN

Object Storage

All photos and videos live in object storage (Amazon S3, Google Cloud Storage, or Meta's internal equivalent). The Post Service stores only the URL of the media, not the binary content itself.

Each uploaded photo gets stored in multiple versions:

plaintext
Original upload:  /posts/{post_id}/original.jpg
Thumbnail:        /posts/{post_id}/thumb_320.jpg
Standard:         /posts/{post_id}/std_720.jpg
HiDPI:            /posts/{post_id}/hd_1080.jpg

Video gets transcoded into multiple bitrates for adaptive streaming (similar to HLS), handled by an async transcoding worker.

CDN Strategy

Users spend an average of 33 minutes daily on Instagram — that means an enormous volume of media being fetched, from users all over the world. Serving all that from a single origin would be impossibly slow for users in distant regions.

The CDN (CloudFront, Fastly, or Akamai) caches media at edge nodes globally. The URL structure includes a cache key so that the same image served to users in Singapore and São Paulo both hit local edge nodes:

plaintext
https://cdn.instagram.com/posts/{post_id}/std_720.jpg?v={media_version}

Cache TTL for media: Long (days to weeks). Photos don't change after upload. The v query parameter handles cache-busting if a post is edited or deleted.

Cache TTL for feed metadata: Short (seconds to minutes). Feed content is dynamic.


Feed Ranking

Here's where most design articles sell you short: they describe "chronological feeds" as if that's what Instagram does. It hasn't been purely chronological since 2016. The feed is ranked by predicted relevance, and the ranking system is sophisticated.

The Real Ranking Pipeline (From Meta's Own Engineering)

The Instagram Feed AI system uses multiple machine learning models to select, rank, and deliver posts. It first considers all posts from accounts you follow, runs a lightweight model to shortlist approximately 500 of the most relevant posts, then calculates a relevance score for each and orders them accordingly.

Instagram runs over 1,000 machine learning models simultaneously to power recommendations across all surfaces, with these models processing billions of signals daily to predict which content each user will find most valuable.

For a system design interview, you don't need to implement all 1,000 models. But you do need to describe a ranking pipeline that's credible:

Stage 1 — Candidate Generation

  • Pull post IDs from the user's pre-built timeline (Cassandra)
  • Merge with recent posts from celebrities they follow (pull model)
  • Result: a pool of 500–2,000 candidate post IDs

Stage 2 — Lightweight Scoring

  • Run a fast, approximate ML model over all candidates
  • Prune to top ~500 by initial score
  • Features: recency, basic engagement velocity, author relationship strength

Stage 3 — Deep Ranking

  • Run a heavier ranking model on the top 500 candidates
  • Key ranking signals include: the user's past activity (likes, shares, saves, comments), information about the post (how many people have liked it, how quickly people are engaging), and information about the person who posted (how many times others have interacted with them recently).
  • Output: a ranked list of post IDs

Stage 4 — Post-Processing and Diversity

  • Ensure feed isn't dominated by one author
  • Inject recommended posts from non-followed accounts (Explore-style suggestions)
  • Apply content policy filters

The result is personalised to each user and updated each time they refresh the feed. In practice, the pre-built timeline serves as a fast first-pass candidate set, and the ranking models run over it in ~50–100ms before the response is sent.

For your interview, describing this two-stage "candidate generation → ranking" pattern is sufficient. Knowing the ranking signals (recency, engagement velocity, relationship strength, user interest) shows depth without requiring you to describe every ML model.
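The four stages reduce, structurally, to "cheap score on many, expensive score on few". A toy sketch of that shape, with hand-written stand-ins for the scoring models (real systems use trained models for both stages):

```python
def cheap_score(post):
    # Stage 2 stand-in: recency plus a dash of engagement velocity.
    return post["recency"] + 0.1 * post["velocity"]

def deep_score(post):
    # Stage 3 stand-in: adds relationship strength, the expensive signal.
    return post["recency"] + 0.5 * post["velocity"] + 2.0 * post["affinity"]

def rank_feed(candidates, shortlist=500, page=20):
    """Candidate generation -> lightweight prune -> deep ranking -> top page."""
    shortlisted = sorted(candidates, key=cheap_score, reverse=True)[:shortlist]
    ranked = sorted(shortlisted, key=deep_score, reverse=True)
    return [p["post_id"] for p in ranked[:page]]
```

The economics are the point: the deep model only ever sees `shortlist` items, so its per-item cost can be orders of magnitude higher than the stage-2 model's.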

The two-stage ranking pipeline is something interviewers at Meta and Google specifically probe — they'll ask what happens at the boundary between candidate generation and deep ranking, and what signals you'd use. Having those answers ready out loud, not just in your head, is what Mockingly.ai is designed to help with.


Caching Strategy

With a 50:1 read-to-write ratio, caching is what makes the whole system economically viable.

Layer 1: CDN (Edge Cache)

  • Caches media (photos, videos, thumbnails)
  • Handles the vast majority of media delivery traffic
  • Long TTL (days to weeks for immutable media)

Layer 2: Application Cache (Redis)

  • Pre-computed feed timelines per user (list of post IDs)
  • Post metadata for hot/recent posts (to avoid DB lookups on feed assembly)
  • Like counts and comment counts (high-read counters)
  • User profiles (hit on every feed card render)
  • TTL: minutes to hours depending on data type

Layer 3: Memcached (Instagram's actual choice for DB protection)

Memcached is used to reduce database load by caching frequently accessed data, storing temporary copies of user profiles, posts, and like counts to prevent repeated queries to PostgreSQL or Cassandra.

The Thundering Herd Problem

A subtle but important problem: what happens when the cache entry for a wildly popular post (say, a celebrity's new photo) expires? Hundreds of servers rush to recompute it from the database at the same moment — this is the thundering herd.

Instagram mitigates this with Memcache lease: when a cache entry expires and the first request for it misses the cache, Memcache issues a lease to that first requester and asks subsequent requests for the same key to wait. This prevents multiple servers from simultaneously querying the database for the same data.

The result: only one request goes to the database to rehydrate the cache entry; all others wait briefly and then read from the freshly populated cache.
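The lease mechanism can be sketched with a single-process stand-in for Memcache (a dict plus a set of outstanding leases; real Memcache hands out lease tokens to clients across many machines):

```python
cache = {}        # key -> value (stand-in for Memcache)
leases = set()    # keys currently being rehydrated by some requester
db_queries = []   # record of database hits, to show herd suppression

def get_with_lease(key, recompute):
    """Return cached value; on a miss, only the lease holder hits the database."""
    if key in cache:
        return cache[key]
    if key in leases:
        return None  # another requester holds the lease: back off and retry
    leases.add(key)                 # this requester won the lease
    value = recompute(key)          # the single database query
    db_queries.append(key)
    cache[key] = value
    leases.discard(key)
    return value
```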

The Memcache lease pattern is the kind of production detail that distinguishes candidates who've engaged deeply with caching problems from those who just say "add a cache." Raising it unprompted at Meta or Google is a strong signal. Mockingly.ai runs Instagram-style simulations where follow-up questions on caching strategy are standard.


Database Sharding

PostgreSQL Sharding

At Instagram's scale, a single PostgreSQL instance can't hold all user data. Instagram stores user information, friendships, and post metadata in PostgreSQL, running in a leader-follower replication topology where write requests go to the leader and read requests are routed to followers in the same data center.

Sharding strategy: shard by user_id using consistent hashing. All data for a given user (their posts, their follow relationships) lands on the same shard — this keeps user-centric queries local to a single shard. A shard management service maps user_id → shard_id.
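A sketch of that shard lookup: hash each user_id onto a ring of shard positions, so that adding a shard remaps only a small slice of users instead of rehashing everyone (shard names are hypothetical):

```python
import bisect
import hashlib

class ShardRing:
    """Consistent-hash mapping of user_id -> shard, with virtual nodes."""

    def __init__(self, shards, vnodes=100):
        # Each shard claims `vnodes` points on the ring for smoother balance.
        self.ring = sorted(
            (self._hash(f"{shard}:{v}"), shard)
            for shard in shards
            for v in range(vnodes)
        )
        self.keys = [h for h, _ in self.ring]

    @staticmethod
    def _hash(s):
        return int(hashlib.md5(s.encode()).hexdigest(), 16)

    def shard_for(self, user_id):
        # First ring position clockwise of the user's hash owns the user.
        idx = bisect.bisect(self.keys, self._hash(str(user_id))) % len(self.ring)
        return self.ring[idx][1]
```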

Cassandra Sharding

Cassandra natively handles sharding via its consistent hashing ring. The user_id partition key in the feed table means all feed entries for a user live on the same Cassandra node — exactly the locality you want for "fetch user B's feed timeline" queries.

The Social Graph at Scale

For follow relationships, PostgreSQL works well at moderate scale. At very large scale, some companies move the social graph to a dedicated graph database or a custom distributed graph store (Meta uses TAO, their distributed in-memory graph cache). For an interview, proposing PostgreSQL with proper indexing on (follower_id, followee_id) and (followee_id) is appropriate, with a note that at extreme scale you'd consider a dedicated graph service.


API Design

POST /posts

plaintext
Request (application/json; the media binary was already uploaded via the pre-signed URL):
  media_url:  string   (object-store URL from the completed upload)
  caption:    string
  media_type: "photo" | "video"
 
Response (201 Created):
  {
    "post_id":    "7234891234",
    "media_url":  "https://cdn.instagram.com/posts/7234891234/std_720.jpg",
    "created_at": "2025-10-20T14:23:00Z"
  }

GET /feed

plaintext
Request:
  cursor: string   (optional — ID of last seen post for pagination)
  limit:  int      (default: 20, max: 50)
 
Response (200 OK):
  {
    "posts": [
      {
        "post_id":       "7234891234",
        "author":        { "user_id": "...", "username": "...", "profile_pic": "..." },
        "media_url":     "https://cdn.instagram.com/...",
        "caption":       "...",
        "like_count":    4821,
        "comment_count": 93,
        "created_at":    "2025-10-20T14:23:00Z",
        "liked_by_me":   false
      },
      ...
    ],
    "next_cursor": "7234889100"
  }

Cursor-based pagination is essential here. Offset-based pagination (page=5) breaks badly on a live feed: new posts arrive at the top between page fetches, causing items to shift and repeat across pages. Cursors are stable.


Monitoring and Key Metrics

Feed Performance

  • Feed generation latency (p50, p95, p99) — target p99 < 200ms
  • Cache hit rate for feed timelines — should be > 95%
  • Fan-out queue lag — how far behind is the fan-out worker from real-time?

Media Pipeline

  • Media upload success rate (direct-to-S3)
  • Transcoding job completion time (time from upload to all resolutions available)
  • CDN cache hit rate (should be > 99% for photos)

Database Health

  • PostgreSQL follower replication lag
  • Cassandra write latency and compaction backlog
  • Memcached miss rate per key type

Business Metrics

  • Feed engagement rate (likes + comments per feed impression)
  • Time to first content (how quickly the first post appears after app open)
  • Posts per active user per day

Common Interview Follow-ups

"How would you handle a celebrity like Selena Gomez posting to 400 million followers?"

This is exactly the celebrity fan-out problem. Don't pre-compute — use the hybrid approach. Store the post in Selena's post store. When her followers open their feeds, the Feed Service checks a "celebrity accounts I follow" index, pulls Selena's N most recent posts, and merges them into the pre-built feed inline. The merge adds only 1–2 extra Cassandra reads per celebrity in the user's follow list — bounded and fast.

"What happens to a user's feed when they follow a new account?"

Two things need to happen: populate the new account's recent posts into the user's feed timeline (backfill), and ensure all future posts from the new account fan out to the user. The backfill is a one-time job: fetch the last N posts from the newly followed account and insert them into the user's Cassandra timeline. Future posts are handled by the standard fan-out worker since the user now appears in the account's follower list.
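As a sketch of that follow handler, with dicts standing in for the social graph and the Cassandra timeline (names are illustrative):

```python
following = {}    # user_id -> set of followed account IDs
user_posts = {}   # account_id -> list of post IDs, newest first
timelines = {}    # user_id -> list of post IDs, newest first (stand-in for Cassandra)

BACKFILL_N = 3    # how many recent posts to backfill on a new follow

def follow(user_id, account_id):
    following.setdefault(user_id, set()).add(account_id)
    # One-time backfill: pull the account's recent posts into the timeline.
    recent = user_posts.get(account_id, [])[:BACKFILL_N]
    timeline = timelines.setdefault(user_id, [])
    timeline.extend(p for p in recent if p not in timeline)
    timeline.sort(reverse=True)   # Snowflake IDs: descending == newest first
    # Future posts arrive via the normal fan-out worker, since user_id now
    # appears in account_id's follower list.
```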

"How do you handle deleting a post?"

Post deletion needs to cascade through multiple systems: mark the post as deleted in PostgreSQL, remove or soft-delete it from the object store (media files), and purge it from feed timelines. Purging from feed timelines is eventually consistent — the post record is marked deleted and the Feed Service filters out deleted post IDs at read time. A background job can clean up Cassandra entries over time. CDN media URLs are invalidated via cache purge API.

"How does the like count stay accurate at 100,000 likes per second for a viral post?"

Don't write to PostgreSQL for every single like at peak. Use Redis with an atomic INCR command as the write target for like counts on hot posts, then periodically flush to PostgreSQL asynchronously. The Memcache lease pattern protects PostgreSQL from thundering herds on cache misses. For extremely viral posts, consider a counter aggregation tier (similar to how high-frequency trading systems handle order books) that batches increments before persisting them.
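A sketch of that write-behind counter, with one dict standing in for Redis and another for PostgreSQL (`flush_counters` would run on a timer in production):

```python
redis_counts = {}   # post_id -> pending like delta (stand-in for Redis INCR)
pg_counts = {}      # post_id -> durable like_count (stand-in for PostgreSQL)

def like(post_id):
    """Hot path: one atomic in-memory increment, no database write."""
    redis_counts[post_id] = redis_counts.get(post_id, 0) + 1

def flush_counters():
    """Periodic job: drain pending deltas into PostgreSQL, one batched write each."""
    for post_id in list(redis_counts):
        delta = redis_counts.pop(post_id)
        # In SQL terms: UPDATE posts SET like_count = like_count + delta WHERE id = ...
        pg_counts[post_id] = pg_counts.get(post_id, 0) + delta

def like_count(post_id):
    """Reads see the durable count plus any not-yet-flushed delta."""
    return pg_counts.get(post_id, 0) + redis_counts.get(post_id, 0)
```

100,000 likes between flushes become a single row update instead of 100,000, at the cost of counts that are seconds stale in the durable store — acceptable for a like counter.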

"How would you add an Explore/Discovery feed?"

The Explore feed surfaces posts from accounts a user doesn't follow. The architecture is meaningfully different from the home feed: instead of pulling from a pre-built social graph timeline, it relies on collaborative filtering and interest graphs. Candidate posts are generated by finding users similar to the viewer (based on mutual follows, engagement patterns), then pulling posts that those users engaged with. The ranking signals shift — relationship strength is irrelevant, so content-level signals (engagement velocity, topic match, visual similarity) carry more weight. This is a separate service with its own ML pipeline, not a feature bolted onto the home feed.


Quick Interview Checklist

Before wrapping your answer, confirm you've addressed these:

  • ✅ Clarified scope (posting + feed, not DMs/Reels/Stories unless asked)
  • ✅ Called out the read-heavy nature (50:1 read:write ratio) and its implications
  • ✅ Explained the direct-to-S3 media upload pattern
  • ✅ Described all three fan-out options and justified the hybrid approach
  • ✅ Distinguished PostgreSQL (structured metadata, social graph) from Cassandra (feed timelines, activity logs)
  • ✅ Explained why counters like like_count are denormalised
  • ✅ Covered the two-stage feed ranking pipeline (candidate generation → ML ranking)
  • ✅ Explained CDN for media delivery and cache TTL reasoning
  • ✅ Mentioned the thundering herd problem and the Memcache lease mitigation
  • ✅ Used cursor-based pagination (not offsets)
  • ✅ Addressed the celebrity fan-out edge case

Conclusion

Designing a social feed at Instagram's scale is hard not because any single component is exotic, but because the interactions between components are full of non-obvious trade-offs. The fan-out decision alone branches into three valid approaches with meaningfully different characteristics. The database selection requires understanding both the relational nature of a social graph and the write-throughput demands of a time-series feed store. And the ranking system — seemingly a feature detail — turns out to require a multi-stage ML pipeline just to return 20 posts in the right order.

The pillars of a strong answer:

  1. Direct-to-object-storage upload — never funnel media through API servers
  2. Hybrid fan-out — push model for regular users, pull model for celebrities, merged at read time
  3. PostgreSQL for structured data, Cassandra for feed timelines — right tool for each access pattern
  4. Denormalised counters — like_count lives on the post record, never computed with COUNT(*)
  5. Two-stage ranking — candidate generation from the pre-built timeline, followed by ML scoring
  6. Memcache lease — the practical solution to thundering herds on popular content
  7. CDN with long TTLs for media — the bulk of bandwidth cost, handled at the edge

The engineers who ace this interview don't necessarily know more facts — they make the trade-offs explicit. Every choice comes with a "because" and an "instead of." That's what system design is actually testing.



Frequently Asked Questions

What is fan-out on write vs fan-out on read for social feeds?

Fan-out on write (push model) means when a user posts, the system immediately copies that post reference into every follower's pre-built feed. Fan-out on read (pull model) means each follower's feed is assembled live when they open the app.

|                   | Fan-Out on Write                   | Fan-Out on Read                        |
|-------------------|------------------------------------|----------------------------------------|
| How feed is built | Pre-built when post is created     | Assembled live on every read           |
| Feed read latency | Fast — just read a pre-built list  | Slow — touch hundreds of data sources  |
| Write cost        | High — one write becomes N writes  | Low — posting is a single DB write     |
| Celebrity problem | Catastrophic — 400M follower writes| Natural — no write amplification       |
| Wasted work       | Yes — fans who never open the app  | No — only build what's requested       |
| Best for          | Regular users                      | Celebrity accounts                     |

Why neither works alone:

  1. Pure fan-out on write breaks for celebrity accounts — Selena Gomez posting triggers 400 million simultaneous writes
  2. Pure fan-out on read breaks at 175,000 feed reads/second — assembling feeds live from hundreds of accounts for every read is too slow

The right answer is the hybrid approach: push for regular users (< ~1M followers), pull for celebrities, merged at read time.


How does the hybrid fan-out work for celebrity accounts?

The hybrid fan-out applies the push model to regular users and the pull model to celebrity accounts, merging the two at feed-read time.

How it works step by step:

  1. When a regular user (500 followers) posts: fan-out worker pushes the post ID into all 500 followers' pre-built timelines in Cassandra — fast, bounded cost
  2. When a celebrity (10M followers) posts: only write the post to their own post store — no fan-out
  3. When a follower opens their feed:
    • Read their pre-built timeline (covers all regular accounts they follow) — one fast read
    • Query recent posts from each celebrity they follow — a small, bounded set of additional reads
    • Merge and rank the combined results

The celebrity threshold is a tunable parameter — commonly 1 million followers — above which an account is treated as celebrity for fan-out purposes.

Why this works: the expensive push model is applied only where it's cheap (small follower counts), and pull is applied where push would be prohibitive (large follower counts). The extra read cost at feed-read time is bounded — most users follow only a handful of celebrity accounts.
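
The read-time merge can be sketched in a few lines of Python. Everything here is a stand-in (in-memory dicts for the Cassandra timeline and post store; all names and data hypothetical), but the shape of the logic matches the steps above:

```python
import heapq

CELEB_THRESHOLD = 1_000_000  # tunable; accounts above this use the pull model

def build_feed(user, timeline_store, celeb_posts, following, follower_count, limit=20):
    """Merge the pre-built (push) timeline with live (pull) celebrity posts.

    timeline_store: user -> list of (post_id, author), Cassandra stand-in
    celeb_posts:    author -> list of post_ids, post-store stand-in
    """
    pre_built = timeline_store.get(user, [])
    celebs = [a for a in following[user] if follower_count[a] >= CELEB_THRESHOLD]
    pulled = [(pid, a) for a in celebs for pid in celeb_posts.get(a, [])]
    # Snowflake-style IDs are time-sortable, so merging by ID == merging by time.
    return heapq.nlargest(limit, pre_built + pulled, key=lambda x: x[0])

timeline = {"bob": [(105, "alice"), (101, "dave")]}
celeb_posts = {"star": [110, 90]}
following = {"bob": {"alice", "dave", "star"}}
followers = {"alice": 500, "dave": 40, "star": 5_000_000}

feed = build_feed("bob", timeline, celeb_posts, following, followers)
# → [(110, 'star'), (105, 'alice'), (101, 'dave'), (90, 'star')]
```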


Why use Cassandra for feed timelines instead of PostgreSQL?

Cassandra's LSM-tree storage engine is designed for the exact write pattern that feed timelines generate — high-throughput sequential appends. PostgreSQL's B-tree indexes degrade under that load.

The feed write pattern:

  1. Every post by every non-celebrity user triggers writes to all followers' timelines
  2. At 1,150 posts/second with an average of 500 followers each, that's ~575,000 Cassandra writes per second
  3. These are sequential appends to existing partitions — Cassandra's sweet spot
  4. PostgreSQL B-tree index updates under this write rate would cause severe write amplification and disk I/O saturation

The feed read pattern:

  1. "Fetch user B's last 20 feed items" is a single Cassandra partition scan — WHERE user_id = B ORDER BY post_id DESC LIMIT 20
  2. All of user B's feed entries are co-located on the same Cassandra node (partition key = user_id)
  3. This is a contiguous range read — fast and predictable at any scale

What Cassandra gives up: no flexible ad-hoc queries, no JOINs, eventual consistency by default. That's why PostgreSQL still handles the structured data — user profiles, follow relationships, post metadata — where relational queries and ACID transactions are needed.
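
As a concrete illustration, a feed timeline table might look like the following CQL. The table and column names are hypothetical, not Instagram's actual schema; the point is the key design: the partition key co-locates one user's feed on one node, and the clustering order makes "latest 20" a contiguous read.

```sql
-- Hypothetical CQL schema for the pre-built timeline (names illustrative)
CREATE TABLE feed_timeline (
    user_id   bigint,
    post_id   bigint,
    author_id bigint,
    PRIMARY KEY ((user_id), post_id)
) WITH CLUSTERING ORDER BY (post_id DESC);

-- "Fetch user B's last 20 feed items" is a single-partition range read:
SELECT post_id, author_id FROM feed_timeline
WHERE user_id = ? LIMIT 20;
```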


What is the pre-signed URL pattern for photo uploads?

A pre-signed URL is a time-limited URL that grants direct write access to an object storage bucket (S3, GCS) without exposing credentials. It removes the API server completely from the media upload path.

Why never upload photos through your API server:

  1. A 300 KB photo ties up an API server thread for the duration of the upload
  2. A 100 MB video ties it up for seconds or minutes
  3. At Instagram's scale (1,150 posts/second), this would saturate API server capacity entirely

The pre-signed URL flow:

  1. Client calls POST /posts/upload-url — tiny request, returns instantly
  2. Post Service generates a pre-signed S3 URL valid for 10 minutes
  3. Client uploads directly to S3 using that URL — API servers are completely bypassed
  4. Client calls POST /posts with the media URL and caption — confirms the upload
  5. Post Service creates the post record and triggers async media processing

Why this pattern is universal at scale: Instagram, Dropbox, Slack, GitHub — all use direct-to-S3 uploads. The API server's job is coordination, not binary data transfer.
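
The signing idea can be illustrated with a stripped-down HMAC scheme. This is not AWS Signature Version 4 (real services should use the provider's SDK, e.g. boto3's `generate_presigned_url`); the hypothetical `presign`/`verify` pair below only shows why a server can hand out a time-limited upload capability without exposing its secret:

```python
import hashlib, hmac, time
from urllib.parse import urlencode, urlparse, parse_qs

SECRET = b"server-side-secret"  # never leaves the Post Service

def presign(bucket: str, key: str, ttl_seconds: int = 600) -> str:
    """Simplified pre-signed PUT URL: anyone holding it can upload until expiry."""
    expires = int(time.time()) + ttl_seconds
    payload = f"PUT\n{bucket}\n{key}\n{expires}".encode()
    sig = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    qs = urlencode({"expires": expires, "signature": sig})
    return f"https://{bucket}.example-storage.com/{key}?{qs}"

def verify(bucket: str, key: str, expires: int, signature: str) -> bool:
    """Storage-side check: recompute the HMAC and confirm the URL hasn't expired."""
    payload = f"PUT\n{bucket}\n{key}\n{expires}".encode()
    expected = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature) and time.time() < expires

url = presign("media-bucket", "photos/abc123.jpg")
q = parse_qs(urlparse(url).query)
expires, sig = int(q["expires"][0]), q["signature"][0]
ok = verify("media-bucket", "photos/abc123.jpg", expires, sig)  # True
```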


Why are like counts stored as a denormalised counter instead of a COUNT(*) query?

Denormalised counters store the like count directly on the post record and update it atomically on each like/unlike. The alternative — running SELECT COUNT(*) FROM likes WHERE post_id = X at read time — is catastrophic at scale.

Why COUNT(*) fails at Instagram scale:

  1. Instagram processes billions of feed impressions daily — each one would trigger a count query
  2. A viral post might have 10 million likes — COUNT(*) scans 10 million rows per request
  3. At 175,000 feed reads/second (over 15 billion reads per day), this would generate billions of expensive count queries daily

How the denormalised counter works:

```sql
-- When a like is created (both statements in one transaction):
INSERT INTO likes (post_id, user_id) VALUES ($post_id, $user_id);
UPDATE posts SET like_count = like_count + 1 WHERE id = $post_id;

-- When a like is removed (likewise transactional):
DELETE FROM likes WHERE post_id = $post_id AND user_id = $user_id;
UPDATE posts SET like_count = like_count - 1 WHERE id = $post_id;
```

For viral posts at extreme spike rates: Redis INCR on a per-post key handles the write burst, with async flush to PostgreSQL. The Memcache lease pattern prevents thundering herds when the Redis/Memcached entry expires.
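
A minimal sketch of the buffering idea, with plain dicts standing in for Redis and PostgreSQL (class and field names hypothetical). Hot writes hit the fast in-memory counter; a periodic flush drains accumulated deltas into the durable `like_count` column:

```python
from collections import defaultdict

class BufferedLikeCounter:
    """Stand-in for Redis INCR + async flush to PostgreSQL (illustrative only)."""

    def __init__(self):
        self.deltas = defaultdict(int)   # post_id -> pending delta (Redis stand-in)
        self.durable = defaultdict(int)  # post_id -> like_count (PostgreSQL stand-in)

    def like(self, post_id):
        self.deltas[post_id] += 1        # Redis INCR

    def unlike(self, post_id):
        self.deltas[post_id] -= 1        # Redis DECR

    def flush(self):
        """Run every few seconds: one UPDATE per hot post instead of one per like."""
        for post_id, delta in self.deltas.items():
            self.durable[post_id] += delta
        self.deltas.clear()

    def count(self, post_id):
        # Reads combine the durable value with any unflushed delta.
        return self.durable[post_id] + self.deltas[post_id]
```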


What is the thundering herd problem and how does Memcache lease solve it?

Thundering herd occurs when a cached value expires and many servers simultaneously try to recompute it from the database — all hitting the DB at the same moment and generating redundant load.

The scenario for Instagram:

  1. A celebrity's post goes viral — its metadata is cached in Memcached
  2. The cache entry expires simultaneously on 200 app servers
  3. All 200 servers query PostgreSQL for the same post record at the same moment
  4. PostgreSQL receives 200 concurrent requests for one row — a sudden spike

How Memcache lease prevents this:

  1. The first server to miss the cache receives a lease token from Memcached
  2. All subsequent servers requesting the same key are asked to wait (or receive a stale value)
  3. Only the lease-holder queries the database and repopulates the cache
  4. Waiting servers then read from the freshly populated cache — one DB query instead of 200

This pattern is described in Facebook's NSDI 2013 paper, "Scaling Memcache at Facebook," and is used in Instagram's production stack.
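
A toy version of the lease protocol, using a lock-protected dict in place of Memcached (a simplification, not the real Memcached implementation). On a miss, the first caller receives a token and is responsible for refilling the key; everyone else is told to wait and retry rather than hitting the database:

```python
import threading

class LeasingCache:
    """Minimal sketch of the Memcache lease idea (illustrative only)."""

    def __init__(self):
        self.data = {}
        self.leases = {}
        self.lock = threading.Lock()
        self.next_token = 1

    def get(self, key):
        """Returns ('hit', value), ('lease', token), or ('wait', None)."""
        with self.lock:
            if key in self.data:
                return ("hit", self.data[key])
            if key in self.leases:
                return ("wait", None)       # someone else is already refilling
            token, self.next_token = self.next_token, self.next_token + 1
            self.leases[key] = token
            return ("lease", token)         # caller must query the DB, then set()

    def set(self, key, value, token):
        with self.lock:
            if self.leases.get(key) == token:  # stale lease-holders are ignored
                self.data[key] = value
                del self.leases[key]

# Usage: 200 app servers call get("post:1") on a miss; exactly one
# receives ("lease", token), queries PostgreSQL, and calls set().
```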


How does cursor-based pagination work for a social feed?

Cursor-based pagination uses the ID of the last seen post as the anchor for the next page, instead of a numeric page offset. It is stable, efficient, and does not break when new posts arrive.

Why offset pagination breaks for feeds:

  1. User loads page 1 (posts 1–20) — 5 new posts arrive at the top
  2. User loads page 2 (posts 21–40) — but items have shifted; posts 16–20 are now on "page 2"
  3. User sees posts 16–20 again — duplicate entries on every page load

How cursor pagination works:

```plaintext
GET /feed?limit=20
→ Returns posts, includes: "next_cursor": "post_id_7234889100"

GET /feed?cursor=7234889100&limit=20
→ Returns the next 20 posts older than 7234889100 (post_id < cursor)
→ Stable regardless of new posts arriving at the top
```

Why Snowflake IDs make this work perfectly: Snowflake IDs encode a timestamp in the most significant bits, so they are naturally time-sortable. WHERE post_id < $cursor ORDER BY post_id DESC LIMIT 20 is an efficient range scan with no additional sort overhead, and the result is consistent even as new posts arrive.
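
The cursor logic can be sketched in Python. `feed_page` below is an in-memory equivalent of the range-scan query; the epoch constant is Twitter's published Snowflake epoch (Instagram uses its own), shown only to illustrate that the timestamp is recoverable from the ID:

```python
EPOCH_MS = 1288834974657  # Twitter's Snowflake epoch; Instagram's differs

def snowflake_timestamp_ms(post_id: int) -> int:
    """Snowflake IDs carry the timestamp in the bits above the 22 low bits."""
    return (post_id >> 22) + EPOCH_MS

def feed_page(posts, cursor=None, limit=20):
    """In-memory equivalent of:
    WHERE post_id < $cursor ORDER BY post_id DESC LIMIT $limit"""
    candidates = sorted((p for p in posts if cursor is None or p < cursor), reverse=True)
    page = candidates[:limit]
    next_cursor = page[-1] if len(page) == limit else None
    return page, next_cursor

posts = list(range(100, 110))                       # IDs 100..109, newest = 109
page1, cur = feed_page(posts, limit=3)              # [109, 108, 107], cursor 107
posts.append(200)                                   # new post arrives at the top
page2, _ = feed_page(posts, cursor=cur, limit=3)    # [106, 105, 104] — no duplicates
```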


How does the Instagram feed ranking pipeline work?

Instagram's feed ranking is a multi-stage ML pipeline, not a simple reverse-chronological sort. It works as a funnel: cheap candidate generation first, then progressively heavier scoring models, trading a little accuracy at the wide end for speed across the whole pipeline.

Stage 1: Candidate generation

  1. Pull post IDs from the user's pre-built Cassandra timeline
  2. Merge with recent posts from celebrity accounts they follow (pull model)
  3. Result: a pool of 500–2,000 candidate post IDs

Stage 2: Lightweight scoring

  1. Run a fast, approximate ML model over all candidates
  2. Features: recency, basic engagement velocity, author relationship strength
  3. Prune to ~500 highest-scoring candidates

Stage 3: Deep ranking

  1. Run a heavier model on the top 500 candidates
  2. Key signals: the user's past engagement patterns, how quickly others are liking this specific post, the relationship strength with the author, content type preferences
  3. Output: ranked list

Stage 4: Post-processing

  1. Enforce diversity — cap posts per author to prevent one account dominating the feed
  2. Apply content policy filters
  3. Optionally inject recommended posts from Explore

The entire pipeline runs in ~50–100ms before the feed response is sent. Instagram runs over 1,000 ML models simultaneously across all surfaces to power this.
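
A toy version of the funnel, with made-up weights and features (real models are learned, not hand-weighted; all names here are hypothetical). The structure is what matters: a cheap score prunes the pool, a heavier score ranks the survivors, then post-processing enforces per-author diversity:

```python
def light_score(post):
    """Stage 2 stand-in: cheap features only (weights are illustrative)."""
    return 0.6 * post["recency"] + 0.4 * post["velocity"]

def deep_score(post, viewer):
    """Stage 3 stand-in: adds relationship strength to the content signals."""
    return (0.3 * post["recency"] + 0.3 * post["velocity"]
            + 0.4 * viewer["affinity"].get(post["author"], 0.0))

def rank_feed(candidates, viewer, prune_to=500, page_size=20, per_author_cap=3):
    pruned = sorted(candidates, key=light_score, reverse=True)[:prune_to]
    ranked = sorted(pruned, key=lambda p: deep_score(p, viewer), reverse=True)
    # Stage 4: diversity — cap posts per author so one account can't dominate.
    feed, per_author = [], {}
    for post in ranked:
        if per_author.get(post["author"], 0) < per_author_cap:
            feed.append(post)
            per_author[post["author"]] = per_author.get(post["author"], 0) + 1
        if len(feed) == page_size:
            break
    return feed

viewer = {"affinity": {"bestie": 1.0}}
candidates = [
    {"id": 1, "author": "bestie", "recency": 0.2, "velocity": 0.1},
    {"id": 2, "author": "rando",  "recency": 0.9, "velocity": 0.9},
]
feed = rank_feed(candidates, viewer, prune_to=10, page_size=5)
# rando's engagement velocity outweighs bestie's affinity here → order [2, 1]
```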


Which companies ask the Instagram / social feed system design question?

Meta, Google, Amazon, Microsoft, Twitter (X), TikTok, LinkedIn, Pinterest, and Snap all ask variants of this question for senior software engineer roles.

Why it is one of the most popular system design questions:

  1. Touches every major distributed systems concept — social graphs, media storage, feed generation, caching at scale, and ML ranking all appear in a single question
  2. Has no single correct answer — the fan-out trade-off has multiple valid approaches, which reveals how well a candidate reasons through ambiguity
  3. Scales perfectly to seniority — a mid-level answer describes the happy path; a senior answer covers the celebrity problem, thundering herd mitigation, cursor pagination, and why Cassandra's LSM tree is the right storage engine for feed timelines

What interviewers specifically listen for:

  1. Hybrid fan-out — named and explained with the celebrity threshold, not just "use a queue"
  2. Cassandra over PostgreSQL for feed timelines — with the LSM-tree write throughput reasoning
  3. Pre-signed URL for uploads — eliminating the API server from the binary data path
  4. Denormalised counters — and why COUNT(*) is unacceptable at scale
  5. Cursor-based pagination — and the specific reason offset pagination breaks on live feeds

If any of those five feel uncertain when you imagine explaining them live, Mockingly.ai runs Instagram and social feed system design simulations — with follow-up questions on exactly these points — built for engineers targeting roles at Meta, Google, and TikTok.


The gap between reading about system design and being able to explain it fluently under pressure is larger than most people expect. If you want to find out where your actual gaps are before the real interview, Mockingly.ai runs realistic system design simulations — the kind where you have to actually talk through the fan-out decision, not just nod along while reading about it.


How to Approach This Question in a System Design Interview

When this problem appears in an interview, candidates often jump straight into databases or machine learning ranking. A better approach is to walk through the system step‑by‑step and explain your reasoning.

A clean way to structure the answer:

  1. Clarify requirements

    • Is this only photos or also videos?
    • Do we support stories and reels?
    • How many followers can a user have?
    • Do we need real‑time updates?
  2. Start with the simplest design

    Begin with a minimal architecture:

    User → API Server → Database

    The API server stores posts and retrieves them when a user opens their feed.

  3. Introduce the feed generation problem

    The main challenge in social feeds is deciding how a user's feed is built.

    Two common approaches:

    Fan‑out on write
    When a user posts, we push that post into the feed of all followers.

    Fan‑out on read
    We build the feed dynamically when a user opens the app.

  4. Explain the hybrid approach

    Large social networks typically use a hybrid approach:

    • Normal users → fan‑out on write
    • Celebrity accounts → fan‑out on read

    This avoids the problem of a celebrity with millions of followers generating huge write amplification.

  5. Add caching

    Feeds are heavily cached because users refresh frequently.

    Common strategy:

    Feed Service → Cache → Database

    The cache stores precomputed feeds for fast loading.

  6. Discuss ranking

    A simple system may show posts in reverse chronological order.
    Production systems use ranking models to sort posts based on signals such as:

    • user engagement
    • relationship strength
    • recency
    • content type
  7. Mention media storage

    Photos and videos are typically stored in object storage and served through a CDN.

  8. Discuss scaling concerns

    Finally discuss:

    • celebrity problem
    • caching
    • feed recomputation
    • storage growth

Interviewers mainly care about how you reason about trade‑offs, not about memorizing a specific architecture.


The Celebrity Problem

One of the classic challenges in feed systems is the celebrity problem.

Imagine a user with 200 million followers posting a photo.

If we used pure fan‑out on write, the system would need to generate 200 million feed updates instantly.

That creates massive write amplification and can overload the system.

A common solution is:

  • normal users → fan‑out on write
  • celebrity users → fan‑out on read

When someone opens their feed, the system dynamically fetches posts from celebrity accounts they follow.

This dramatically reduces write load.


Why Feed Systems Are Hard

At first glance, a feed seems simple: just show posts from people you follow.

In reality, large social feeds must handle:

  • billions of posts
  • billions of users
  • constant refresh traffic
  • complex ranking logic
  • massive media storage

Most of the engineering complexity comes from scale and latency requirements, not the basic functionality.


What Interviewers Are Usually Testing

When interviewers ask this question, they are typically looking for whether you understand a few core ideas:

• fan‑out on write vs fan‑out on read
• caching feeds
• handling celebrity users
• ranking signals
• storing large media files
• scaling to hundreds of millions of users

You don't need to design the exact architecture used by Instagram. What matters is showing that you understand the trade‑offs involved in large‑scale feed systems.

Practice "Design Instagram / Social Feed" with AI

Reading is great, but practicing is how you actually get the offer. Get real-time AI feedback on your system design answers.

Turn This Knowledge Into Interview Confidence

Reading guides builds understanding. Practicing under interview conditions builds the muscle memory you need to deliver confidently when it counts.

Free tier available • AI-powered feedback • No credit card required

Practice system design with AI