System Design Interview Questions

From Upload to Play: Understanding YouTube’s Scalable Architecture

Quick Overview of What You’ll Learn

  • YouTube’s global scale: 2.7 billion MAU, 122 million DAU, 1 billion hours watched daily, available in 100 countries/80 languages, 500 hours uploaded per minute

  • Financial overview: $29.24 B ad revenue in 2022, with the U.S. video‑streaming market projected to reach $66.4 B by 2030, implying YouTube’s revenue could approach the $60 B range

  • Core system components: CDN network, blob storage, transcoding pipeline (DAG), metadata DB (sharding + caching), API servers, load balancer

  • Key workflows: chunked uploads with pre‑signed URLs, parallel DAG‑based transcoding, cache‑driven streaming, adaptive bitrate (HLS/DASH), multi‑region failover

  • Cost & performance optimizations: on‑demand encoding, regional distribution, CDN partnerships, retry logic, auto‑scaling

  • Advanced patterns: UserID vs. VideoID sharding, Redis hot‑video cache, live‑streaming low‑latency protocols, DRM/encryption, pre‑signed URL security

  • Follow‑up topics & interview prompts

What Are YouTube’s Key Usage Metrics in 2024 – 2025?

YouTube now serves more than 2.7 billion logged‑in users each month, making it the second‑largest social platform after Facebook.
On a daily basis, over 122 million people actively use YouTube, collectively streaming 1 billion hours of video worldwide every day.

To support this global audience, YouTube is localized in over 100 countries and supports more than 80 languages. Creators upload more than 500 hours of new video content every minute, adding roughly 30,000 hours of footage each hour.

How Profitable Is YouTube’s Ad Business?

In 2022, YouTube generated $29.24 billion in advertising revenue, accounting for over 11 percent of Google’s total ad revenues that year.
More recent reports show YouTube ad income rose to $31.51 billion in 2023, an 8 percent increase year‑over‑year. Industry forecasts anticipate the U.S. video‑streaming market will reach US$66.4 billion by 2030, suggesting that YouTube’s global revenue, as the leading platform, may scale into the $60 billion range.

Beyond Just Watching: The Technology Powering YouTube

How Does YouTube Handle Fast Video Uploads?

  • Pre‑signed URLs grant clients direct, secure write access to blob storage without exposing credentials.

  • Chunked uploads use GOP (Group of Pictures) alignment, enabling resumable transfers and parallelism for faster throughput.

  • Completion queues: message queues sit between upload, transcoding, and metadata updates, so each stage can scale, retry, and fail independently of the others.

What Is YouTube’s Transcoding Pipeline?

YouTube’s transcoding service converts raw uploads into multiple streaming formats (HLS, MPEG‑DASH, etc.) for device compatibility and adaptive bitrate delivery:

  1. Preprocessor splits videos into GOP‑aligned segments and caches them for reliability.

  2. DAG scheduler builds a Directed Acyclic Graph of tasks (encoding, thumbnailing, watermarking).

  3. Resource manager pairs the highest‑priority task from the task queue with an available worker from the worker queue.

  4. Task workers run encoding and asset‑generation in parallel.

  5. Transcoded storage holds completed streams, which are then pushed to CDN edge caches.

Back‑of‑the‑Envelope Estimation: Crunching YouTube’s Scale

How Much Storage Do Daily Uploads Require?

  • Assumptions: 5 million DAU, 5 videos watched per user per day, 10 % of users upload one video per day, 300 MB average video size

  • Daily storage = 5 M × 10 % × 0.3 GB = 150 TB per day → over 1 PB per week just for raw uploads
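The storage arithmetic above can be checked with a few lines of Python. This is only a sketch of the estimate: the constant names are made up for illustration, the inputs are the assumptions listed above, and decimal units (1 TB = 1,000,000 MB) are assumed.

```python
# Back-of-the-envelope storage estimate using the assumptions from the text.
DAILY_ACTIVE_USERS = 5_000_000
UPLOAD_RATE_PERCENT = 10      # share of DAU who upload one video per day
AVG_VIDEO_SIZE_MB = 300

daily_uploads = DAILY_ACTIVE_USERS * UPLOAD_RATE_PERCENT // 100
daily_storage_tb = daily_uploads * AVG_VIDEO_SIZE_MB / 1_000_000  # MB -> TB
weekly_storage_pb = daily_storage_tb * 7 / 1_000                  # TB -> PB

print(f"{daily_uploads:,} uploads/day")
print(f"{daily_storage_tb:,.0f} TB/day")
print(f"{weekly_storage_pb:.2f} PB/week")
```

Keeping the inputs as named constants makes it easy to rerun the estimate when an interviewer changes an assumption, such as the upload rate or the average file size.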

What Would CDN Costs Look Like?

  • Total streaming data = 5 M users × 5 videos/user × 0.3 GB = 7.5 M GB/day

  • CloudFront pricing ≈ $0.02/GB → $150,000 per day, ~$4.5 million per month.
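The same estimate as a short script, assuming a flat $0.02/GB rate and a 30‑day month. Real CDN pricing is tiered by volume and region, so treat this as a rough upper‑level sketch; the constant names are illustrative.

```python
# Rough CDN egress cost, assuming every watched video is served from the CDN.
DAILY_ACTIVE_USERS = 5_000_000
VIDEOS_WATCHED_PER_USER = 5
AVG_VIDEO_SIZE_MB = 300
PRICE_CENTS_PER_GB = 2        # assumed flat rate of $0.02/GB
DAYS_PER_MONTH = 30

daily_gb = DAILY_ACTIVE_USERS * VIDEOS_WATCHED_PER_USER * AVG_VIDEO_SIZE_MB // 1_000
daily_cost_usd = daily_gb * PRICE_CENTS_PER_GB / 100
monthly_cost_usd = daily_cost_usd * DAYS_PER_MONTH

print(f"{daily_gb:,} GB/day")
print(f"${daily_cost_usd:,.0f} per day")
print(f"${monthly_cost_usd:,.0f} per month")
```

A number this large is what motivates the cost‑saving ideas later in the article, such as serving only popular videos from the CDN.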

Crafting the Blueprint – High‑Level System Design

Why Leverage Cloud Services Instead of Building In‑House?

  • Time & cost efficiency: Proven blob storage and CDNs scale instantly.

  • Reliability: Third‑party providers like AWS and Akamai guarantee SLAs.

  • Industry precedent: Netflix uses AWS; Facebook uses Akamai.

What Are the Core Components?

  1. Client (Web, iOS/Android, Smart TV)

  2. Load balancer (distributes incoming requests across the API servers)

  3. API servers (stateless, handle metadata, recommendations, auth)

  4. Metadata DB & cache (sharded MySQL + Redis)

  5. Original blob storage (raw uploads)

  6. Transcoding cluster

  7. Transcoded storage

  8. CDN edge (streaming delivery)

  9. Message queues (upload/transcode completion, notification)

Video Uploading Flow

Step‑by‑Step Breakdown

  1. Client fetches pre‑signed URL from API server.

  2. Client uploads chunks directly to blob storage.

  3. API server updates metadata DB/cache in parallel.

  4. Transcoding servers pick up raw files, process via DAG, store outputs.

  5. Completion handler updates metadata and pushes streams to CDN.

  6. API server notifies client: “Video ready for streaming.”
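The hand‑offs in the flow above can be sketched with two in‑process queues standing in for the message queues. Everything here is a simplified assumption: `queue.Queue` replaces a real broker, a dict replaces the metadata DB, and the "transcode" step just returns a fixed list of renditions.

```python
import queue
import threading

upload_events = queue.Queue()      # stand-in for the "upload complete" queue
completion_events = queue.Queue()  # stand-in for the "transcode complete" queue
metadata = {}                      # stand-in for the metadata DB

def transcode_worker():
    """Consume upload events, pretend to transcode, emit completion events."""
    while True:
        video_id = upload_events.get()
        if video_id is None:               # sentinel: shut down and pass it on
            completion_events.put(None)
            break
        completion_events.put((video_id, ["360p", "720p"]))

def completion_handler():
    """Mark videos as ready once their transcode has finished."""
    while True:
        event = completion_events.get()
        if event is None:
            break
        video_id, renditions = event
        metadata[video_id] = {"status": "ready", "renditions": renditions}

threads = [threading.Thread(target=transcode_worker),
           threading.Thread(target=completion_handler)]
for thread in threads:
    thread.start()

for video_id in ("v1", "v2"):
    metadata[video_id] = {"status": "processing", "renditions": []}
    upload_events.put(video_id)    # the uploader never calls the transcoder directly
upload_events.put(None)

for thread in threads:
    thread.join()
print(metadata)
```

Because the uploader only writes to a queue, the transcoding side can be scaled or restarted without the upload path noticing.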

How Does Video Streaming Work?

Which Streaming Protocols Are Supported?

  • MPEG‑DASH – adaptive bitrate streaming on most desktops.
  • Apple HLS – required for iOS and Safari.
  • Microsoft Smooth Streaming – legacy Windows devices.

  • Adobe HDS – older Adobe‑based clients.
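Adaptive bitrate protocols such as HLS and DASH leave the choice of quality to the player. A minimal sketch of that choice, assuming a hypothetical four‑step bitrate ladder and a simple "use 80 % of measured throughput" rule (real players use more elaborate heuristics):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rendition:
    name: str
    bitrate_kbps: int

# Hypothetical bitrate ladder; real ladders are tuned per title and codec.
LADDER = [
    Rendition("240p", 400),
    Rendition("480p", 1200),
    Rendition("720p", 2800),
    Rendition("1080p", 5000),
]

def pick_rendition(throughput_kbps, ladder=LADDER, safety=0.8):
    """Pick the highest rendition that fits within a safety margin of throughput."""
    budget = throughput_kbps * safety
    affordable = [r for r in ladder if r.bitrate_kbps <= budget]
    if not affordable:
        # Nothing fits: fall back to the lowest quality rather than stalling.
        return min(ladder, key=lambda r: r.bitrate_kbps)
    return max(affordable, key=lambda r: r.bitrate_kbps)

for measured in (100, 4000, 10000):
    print(measured, "kbps ->", pick_rendition(measured).name)
```

The player would rerun this decision for each segment, which is why segments must be aligned across renditions.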

Edge Delivery via CDN

Edge servers cache transcoded segments close to users, minimizing latency and enabling instant playback.

Database Schema for Video Metadata & User Data

Videos Table (MySQL)

| Column        | Type         | Description               |
|---------------|--------------|---------------------------|
| VideoID       | BIGINT PK    | Unique video identifier   |
| Title         | VARCHAR(255) | Video title               |
| Description   | TEXT         | Detailed description      |
| Size          | BIGINT       | File size in bytes        |
| Thumbnail     | VARCHAR(500) | URL of thumbnail          |
| UploaderID    | BIGINT FK    | Reference to Users.UserID |
| LikesCount    | BIGINT       | Total likes               |
| DislikesCount | BIGINT       | Total dislikes            |
| ViewsCount    | BIGINT       | View count                |
| UploadTime    | DATETIME     | Upload timestamp          |

Indexes

  • (UploaderID, VideoID) for fetching a user’s videos (UploaderID leads so the index can filter by uploader)

  • Full‑text on (Title, Description) for search

Comments Table

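| Column      | Type      | Description               |
|-------------|-----------|---------------------------|
| CommentID   | BIGINT PK | Unique comment ID         |
| VideoID     | BIGINT FK | References Videos.VideoID |
| UserID      | BIGINT FK | References Users.UserID   |
| CommentText | TEXT      | Comment content           |
| CreatedAt   | DATETIME  | Timestamp                 |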

Indexes

  • (VideoID, CreatedAt DESC) for recent comments

  • (UserID, VideoID) for engagement analysis
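To see what the (VideoID, CreatedAt DESC) index is for, here is a small sketch using Python's built‑in `sqlite3` as a stand‑in for MySQL. The column types are simplified for SQLite, and the index name and sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Comments (
    CommentID   INTEGER PRIMARY KEY,
    VideoID     INTEGER NOT NULL,
    UserID      INTEGER NOT NULL,
    CommentText TEXT    NOT NULL,
    CreatedAt   TEXT    NOT NULL   -- ISO-8601 strings sort chronologically
);
-- Composite index: filter by video, then read newest comments first.
CREATE INDEX idx_comments_video_recent ON Comments (VideoID, CreatedAt DESC);
""")

rows = [
    (1, 10, "first",       "2024-01-01 10:00:00"),
    (1, 11, "second",      "2024-01-02 10:00:00"),
    (2, 10, "other video", "2024-01-03 10:00:00"),
    (1, 12, "third",       "2024-01-04 10:00:00"),
]
conn.executemany(
    "INSERT INTO Comments (VideoID, UserID, CommentText, CreatedAt) VALUES (?, ?, ?, ?)",
    rows,
)

# "Most recent comments for a video" is the query this index serves.
recent = conn.execute(
    "SELECT CommentText FROM Comments WHERE VideoID = ? ORDER BY CreatedAt DESC LIMIT 2",
    (1,),
).fetchall()
print(recent)
```

The query filters on the leading index column and sorts on the second, so the database can read the first matching entries in index order instead of sorting every comment on the video.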

Users Table

| Column           | Type         | Description                     |
|------------------|--------------|---------------------------------|
| UserID           | BIGINT PK    | Unique user identifier          |
| Name             | VARCHAR(255) | Full name                       |
| Email            | VARCHAR(320) | Unique email for authentication |
| PasswordHash     | VARCHAR(500) | Secure hashed password          |
| Address          | VARCHAR(500) | Optional location               |
| Age              | INT          | Optional age                    |
| RegistrationDate | DATETIME     | Account creation timestamp      |

Indexes

  • (Email) for login

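  • (RegistrationDate) for sign‑up analytics (UserID is already the primary key, so leading the index with it would not help date‑range queries)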

Design Deep Dive

Video Transcoding 

Why transcoding?

  • Device compatibility (MP4, WebM, HLS, DASH)

  • Adaptive quality (144p→4K)

  • Storage efficiency (compressed encodes are far smaller than raw uploads)

Common codecs & containers:

  • Containers: MP4, MKV, MPEG‑TS (used by HLS)
  • Video codecs: H.264, VP9, AV1
  • Audio codecs: AAC, Opus

Parallel Processing with DAG

Tasks—splitting, encoding, thumbnailing—are represented as nodes in a Directed Acyclic Graph, executed concurrently where dependencies allow.
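A minimal sketch of DAG‑driven execution, using Python's standard `graphlib.TopologicalSorter` (Python 3.9+) and a thread pool. The task names and the `run_task` placeholder are hypothetical, and the loop runs tasks wave by wave, which is simpler than a production scheduler that would start each task the moment its own dependencies finish.

```python
from concurrent.futures import ThreadPoolExecutor
from graphlib import TopologicalSorter

# Each task maps to the set of tasks it depends on (hypothetical task names).
DAG = {
    "split": set(),
    "encode_360p": {"split"},
    "encode_720p": {"split"},
    "thumbnail": {"split"},
    "package": {"encode_360p", "encode_720p"},
}

def run_task(name):
    """Placeholder for real work such as invoking an encoder."""
    return name

def run_dag(dag):
    sorter = TopologicalSorter(dag)
    sorter.prepare()                       # also raises CycleError on cycles
    waves = []
    with ThreadPoolExecutor(max_workers=4) as pool:
        while sorter.is_active():
            ready = sorter.get_ready()     # tasks whose dependencies are done
            finished = list(pool.map(run_task, ready))  # run them concurrently
            sorter.done(*finished)         # unlock their dependents
            waves.append(sorted(finished))
    return waves

waves = run_dag(DAG)
print(waves)
```

The three tasks that depend only on `split` land in the same wave and run in parallel, while `package` waits until both encodes are done.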

Speed & Safety Optimizations

Chunked, Parallel Uploads

  • GOP splitting on the client accelerates uploads and enables resumability.
  • Global upload centers route to nearest edge point, reducing latency.
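Resumability follows from tracking which chunks the storage side already holds. The sketch below assumes fixed‑size byte chunks rather than true GOP‑aligned splits (which would require parsing the video), and `FakeBlobStore` is an in‑memory stand‑in for real blob storage.

```python
def split_into_chunks(data: bytes, chunk_size: int):
    """Split a payload into fixed-size chunks (the last one may be shorter)."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]

class FakeBlobStore:
    """In-memory stand-in for blob storage that accepts numbered parts."""
    def __init__(self):
        self.parts = {}

    def put_part(self, index, chunk):
        self.parts[index] = chunk

    def assemble(self):
        return b"".join(self.parts[i] for i in sorted(self.parts))

def upload_chunks(chunks, store, fail_at=None):
    """Upload only the chunks the store is missing; optionally simulate a drop."""
    for index, chunk in enumerate(chunks):
        if index in store.parts:
            continue                       # already uploaded: skip on resume
        if index == fail_at:
            raise ConnectionError(f"simulated drop before chunk {index}")
        store.put_part(index, chunk)

video = bytes(range(256)) * 4              # 1,024 bytes of fake video data
chunks = split_into_chunks(video, 300)
store = FakeBlobStore()

try:
    upload_chunks(chunks, store, fail_at=2)    # connection drops mid-upload
except ConnectionError:
    pass
uploaded_before_resume = sorted(store.parts)

upload_chunks(chunks, store)                   # resume: only missing chunks are sent
print(uploaded_before_resume, store.assemble() == video)
```

Since chunks are independent, the same bookkeeping also allows several chunks to be sent in parallel.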

Secure Pre‑signed URLs

Clients upload directly to storage using URLs that expire, ensuring only authorized writes.
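The idea behind an expiring signed URL can be sketched with an HMAC over the object key and expiry time. This is an illustration of the concept only: the secret, the `storage.example.com` host, and the query parameter names are hypothetical, and real providers use their own, more elaborate signing schemes.

```python
import hashlib
import hmac
from urllib.parse import parse_qs, urlencode, urlparse

SECRET_KEY = b"demo-secret"   # hypothetical; held only by the API and storage tiers

def _signature(object_key, expires_at):
    message = f"PUT\n{object_key}\n{expires_at}".encode()
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()

def sign_upload_url(object_key, expires_at, base="https://storage.example.com"):
    """Issued by the API server: binds the URL to one object and one expiry time."""
    query = urlencode({"expires": expires_at,
                       "signature": _signature(object_key, expires_at)})
    return f"{base}/{object_key}?{query}"

def verify_upload_url(url, now):
    """Run by the storage tier before accepting a write."""
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    if "expires" not in params or "signature" not in params:
        return False
    expires_at = int(params["expires"][0])
    if now > expires_at:
        return False                       # URL has expired
    expected = _signature(parsed.path.lstrip("/"), expires_at)
    return hmac.compare_digest(expected, params["signature"][0])

url = sign_upload_url("raw/video123.mp4", expires_at=1000)
print(verify_upload_url(url, now=900))                                  # valid
print(verify_upload_url(url, now=1001))                                 # expired
print(verify_upload_url(url.replace("video123", "video999"), now=900))  # tampered
```

The client never sees the secret; it only holds a URL that stops working after the expiry time or if the object path is altered.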

Optimizing Video Streaming Systems

Security & DRM

  • DRM: FairPlay, Widevine, PlayReady.
  • AES Encryption for protected streams.
  • Watermarks to identify content ownership.

Cost‑Saving Techniques

  • On‑demand encoding for infrequent videos.
  • Regional CDN caching based on view patterns.

  • Custom CDN partnerships to lower bandwidth fees.

Error Handling & Reliability

  • Retry logic for uploads and transcodes.

  • Health‑monitoring load balancers that remove failed nodes.

  • DB master‑slave failover and replicated caches (e.g., Redis).
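Retry logic is usually paired with exponential backoff and jitter so that failing clients do not retry in lockstep. A small sketch, with illustrative defaults and a simulated flaky operation; the `sleep` parameter is injectable so the example runs instantly.

```python
import random
import time

def retry(operation, max_attempts=4, base_delay=0.5, sleep=time.sleep):
    """Call operation(), retrying on ConnectionError with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except ConnectionError:
            if attempt == max_attempts:
                raise                               # out of attempts: surface the error
            delay = base_delay * 2 ** (attempt - 1)  # 0.5s, 1s, 2s, ...
            sleep(delay + random.uniform(0, delay / 2))  # jitter spreads out retries

attempts = {"count": 0}

def flaky_upload():
    """Simulated operation that fails twice before succeeding."""
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("simulated network drop")
    return "ok"

recorded_delays = []
result = retry(flaky_upload, sleep=recorded_delays.append)  # record instead of sleeping
print(result, len(recorded_delays))
```

Only errors that are safe to retry should be caught this way; a transcode that fails because the input file is corrupt should be reported, not retried.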

Metadata Sharding Strategies

Shard by UserID

  • Approach: Hash UserID → assign all user videos to same shard.

  • Pros: Easy user‑centric reads.

  • Cons: Hotspots for popular creators.

  • Mitigation: Consistent hashing, dynamic repartitioning.
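Consistent hashing, mentioned as a mitigation above, can be sketched as a hash ring with virtual nodes. The shard names, the number of virtual nodes, and the use of MD5 as the ring hash are all assumptions made for illustration.

```python
import bisect
import hashlib

def ring_hash(key: str) -> int:
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class HashRing:
    """Consistent-hash ring; each shard owns many virtual points on the ring."""
    def __init__(self, shards, vnodes=100):
        self._ring = sorted(
            (ring_hash(f"{shard}#{i}"), shard)
            for shard in shards
            for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def shard_for(self, key: str) -> str:
        # Walk clockwise to the first virtual point at or after the key's hash.
        index = bisect.bisect(self._points, ring_hash(key)) % len(self._ring)
        return self._ring[index][1]

user_ids = [f"user-{n}" for n in range(1000)]
before = HashRing(["shard-a", "shard-b", "shard-c"])
after = HashRing(["shard-a", "shard-b", "shard-c", "shard-d"])

moved = [u for u in user_ids if before.shard_for(u) != after.shard_for(u)]
print(f"{len(moved)} of {len(user_ids)} users moved after adding shard-d")
```

The useful property is that adding a shard only moves keys onto the new shard; users on the existing shards otherwise stay put, unlike `hash(UserID) % N`, where changing N reshuffles most keys.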

Shard by VideoID

  • Approach: Hash VideoID → evenly distribute across shards.

  • Pros: Balanced writes.

  • Cons: Aggregation overhead for user feeds.

  • Enhancement: Cache hot videos in Redis to minimize cross‑shard reads.
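The hot‑video cache follows the cache‑aside pattern: check the cache, fall back to the owning shard, then populate the cache. In this sketch a small in‑process LRU stands in for Redis, plain dicts stand in for the shards, and `VideoID % NUM_SHARDS` is an assumed placement rule.

```python
from collections import OrderedDict

NUM_SHARDS = 4
SHARDS = [dict() for _ in range(NUM_SHARDS)]   # stand-ins for metadata shards
stats = {"shard_reads": 0}

def shard_for(video_id: int) -> int:
    return video_id % NUM_SHARDS               # assumed placement rule

def save_video(video_id, row):
    SHARDS[shard_for(video_id)][video_id] = row

def load_from_shard(video_id):
    stats["shard_reads"] += 1
    return SHARDS[shard_for(video_id)].get(video_id)

class LRUCache:
    """Tiny in-process stand-in for a Redis cache with LRU eviction."""
    def __init__(self, capacity):
        self.capacity = capacity
        self._items = OrderedDict()

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)           # mark as most recently used
        return self._items[key]

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)    # evict least recently used

def get_video(video_id, cache):
    """Cache-aside read: try the cache, fall back to the shard, then populate."""
    cached = cache.get(video_id)
    if cached is not None:
        return cached
    row = load_from_shard(video_id)
    if row is not None:
        cache.put(video_id, row)
    return row

save_video(42, {"title": "hot video"})
cache = LRUCache(capacity=2)
first = get_video(42, cache)     # miss: reads the shard
second = get_video(42, cache)    # hit: served from the cache
print(first == second, stats["shard_reads"])
```

With Redis the structure is the same; the cache lookups become network calls and entries typically carry a TTL so stale metadata eventually expires.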

Wrapping Up: Designing Your Own Scalable Video Service

By combining:

  • Pre‑signed URLs and chunked uploads for fast, secure ingestion

  • DAG‑based transcoding for parallel, multi‑format output

  • Global CDN for low‑latency delivery

  • Sharded metadata + caching for high‑throughput reads/writes

  • Cost‑saving measures (on‑demand encoding, regional CDNs)

you can architect a video‑platform that rivals YouTube in scale, reliability, and efficiency.

This insightful blog post is authored by Ajit Pedha, who brings his expertise and deep understanding of the topic to provide valuable perspectives.
