1. Estimating Scale of the System
1.1. Traffic Estimations
- Assume the system shortens about 100 million URLs per month.
- Assume each short URL is accessed 10 times on average.
- This results in 1 billion redirection requests per month.
1.2. URL Storage Estimations
- If we store URLs for 10 years, the system will store:
  100 million × 12 months × 10 years = 12 billion URLs
1.3. Read vs. Write Ratio
- URL shortening is a write operation.
- URL redirection is a read operation.
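- Reads far outnumber writes. Under the assumptions above the ratio is 10:1 (1 billion redirections vs. 100 million shortenings per month); the 100:1 ratio often quoted for URL shorteners is a more read-heavy scenario worth leaving headroom for.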
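The section 1 figures follow from a few lines of arithmetic. This is a minimal sketch; the constant names are hypothetical and the inputs are the assumptions stated above.

```python
# Back-of-envelope counts from the section 1 assumptions.
URLS_SHORTENED_PER_MONTH = 100_000_000   # writes per month (assumed)
AVG_ACCESSES_PER_URL = 10                # reads per shortened URL (assumed)
RETENTION_YEARS = 10

redirects_per_month = URLS_SHORTENED_PER_MONTH * AVG_ACCESSES_PER_URL
total_urls = URLS_SHORTENED_PER_MONTH * 12 * RETENTION_YEARS
read_write_ratio = redirects_per_month / URLS_SHORTENED_PER_MONTH

print(f"Redirects per month: {redirects_per_month:,}")
print(f"URLs stored over {RETENTION_YEARS} years: {total_urls:,}")
print(f"Read:write ratio: {read_write_ratio:.0f}:1")
```

Changing `AVG_ACCESSES_PER_URL` to 100 shows what a 100:1 read-heavy workload would mean for the redirection volume.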
2. Storage Constraints
2.1. URL Storage Size
Each stored URL requires:
- Short URL ID: ~7 bytes (7 Base62 characters, giving 62^7 ≈ 3.5 trillion possible IDs)
- Original URL: ~500 bytes (average)
- Metadata (timestamp, access count, expiration, etc.): ~100 bytes
- Total Storage per URL = ~600 bytes
For 12 billion URLs, the storage required is: 12 billion × 600 bytes ≈ 7.2 TB
If we add indexing overhead, storage may reach 10 TB over 10 years.
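The storage arithmetic can be checked as follows. This is a sketch in decimal units (1 TB = 10^12 bytes); the 40% index overhead is a hypothetical value chosen only to show how 7.2 TB of raw data could grow to roughly 10 TB.

```python
# Storage estimate for 12 billion URLs (decimal units: 1 TB = 10**12 bytes).
SHORT_ID_BYTES = 7
ORIGINAL_URL_BYTES = 500
METADATA_BYTES = 100
TOTAL_URLS = 12_000_000_000
INDEX_OVERHEAD = 0.4   # hypothetical: extra space taken by indexes, as a fraction

bytes_per_url = SHORT_ID_BYTES + ORIGINAL_URL_BYTES + METADATA_BYTES  # 607
rounded_bytes_per_url = 600                                           # figure used above
raw_tb = TOTAL_URLS * rounded_bytes_per_url / 10**12                  # 7.2 TB
with_index_tb = raw_tb * (1 + INDEX_OVERHEAD)                         # ≈ 10.1 TB

print(f"{bytes_per_url} B/URL, raw {raw_tb:.1f} TB, with indexes {with_index_tb:.1f} TB")
```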
2.2. Caching Requirements
- Frequently accessed URLs should be cached (Redis, Memcached).
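- If 20% of URLs account for 80% of traffic, caching that hot 20% (2.4 billion URLs) would require:
  - Each cached URL entry: ~100 bytes (a compact entry; if each entry held the full ~600-byte record, the cache would grow to roughly 1.4 TB)
  - Total cache size ≈ 2.4 billion × 100 bytes ≈ 240 GB (using Redis or Memcached)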
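The cache sizing works out as below. This sketch uses integer math for the 20% hot set; `FULL_ENTRY_BYTES` is a hypothetical comparison point showing how sensitive the estimate is to what each cache entry actually stores.

```python
# Cache sizing under the 80/20 assumption (decimal units).
TOTAL_URLS = 12_000_000_000
HOT_PERCENT = 20            # share of URLs assumed to serve 80% of traffic
COMPACT_ENTRY_BYTES = 100   # entry size used in the estimate above
FULL_ENTRY_BYTES = 600      # hypothetical: entry holding the full ~600-byte record

hot_urls = TOTAL_URLS * HOT_PERCENT // 100                 # 2,400,000,000
cache_gb_compact = hot_urls * COMPACT_ENTRY_BYTES / 10**9  # 240 GB
cache_tb_full = hot_urls * FULL_ENTRY_BYTES / 10**12       # 1.44 TB

print(f"hot URLs: {hot_urls:,}, compact: {cache_gb_compact:.0f} GB, full: {cache_tb_full:.2f} TB")
```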
3. Bandwidth Constraints
3.1. Shortening Requests (Writes)
- 100 million URL shortening requests per month
- ~40 URLs per second on average (100 million ÷ ~2.6 million seconds in a 30-day month ≈ 38.6)
3.2. Redirection Requests (Reads)
- 1 billion redirections per month
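- ~400 requests per second (RPS) on average (1 billion ÷ ~2.6 million seconds ≈ 386)
- ~4,000 RPS at peak load, i.e. a peak-to-average factor of about 10×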
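Converting monthly volumes into per-second rates makes the peak figure's implied multiplier explicit. A sketch assuming a 30-day month; `PEAK_FACTOR` is a hypothetical name for the roughly 10× ratio between the ~4,000 RPS peak and the ~386 RPS average.

```python
# Convert monthly request volumes into per-second rates (30-day month assumed).
SECONDS_PER_MONTH = 30 * 24 * 3600   # 2,592,000 seconds
WRITES_PER_MONTH = 100_000_000
READS_PER_MONTH = 1_000_000_000
PEAK_FACTOR = 10                     # hypothetical peak-to-average multiplier

writes_per_sec = WRITES_PER_MONTH / SECONDS_PER_MONTH     # ≈ 38.6
reads_per_sec_avg = READS_PER_MONTH / SECONDS_PER_MONTH   # ≈ 385.8
reads_per_sec_peak = reads_per_sec_avg * PEAK_FACTOR      # ≈ 3,858, i.e. ~4,000

print(f"writes/s: {writes_per_sec:.1f}, reads/s avg: {reads_per_sec_avg:.1f}, "
      f"reads/s peak: {reads_per_sec_peak:,.0f}")
```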
3.3. Data Transfer Estimation
Assuming each redirection request is ~0.5 KB (URL lookup, metadata, logs):
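- Daily data transfer: 4,000 RPS × 0.5 KB × 86,400 s ≈ 172 GB/day
- Monthly data transfer: ~5 TB per month
These figures assume the 4,000 RPS peak is sustained around the clock, so they are an upper bound. At the average load (1 billion requests × 0.5 KB per month), transfer is closer to 0.5 TB per month.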
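The bandwidth numbers can be reproduced for both the sustained-peak and the average case. A sketch in decimal units; the constant names are hypothetical.

```python
# Data transfer for redirections at ~0.5 KB each (decimal units: 1 GB = 10**6 KB).
REQUEST_KB = 0.5
PEAK_RPS = 4_000
SECONDS_PER_DAY = 86_400
DAYS_PER_MONTH = 30
REQUESTS_PER_MONTH = 1_000_000_000

daily_gb_at_peak = PEAK_RPS * REQUEST_KB * SECONDS_PER_DAY / 10**6   # 172.8 GB/day
monthly_tb_at_peak = daily_gb_at_peak * DAYS_PER_MONTH / 10**3       # ≈ 5.2 TB/month
monthly_tb_at_avg = REQUESTS_PER_MONTH * REQUEST_KB / 10**9          # 0.5 TB/month

print(f"peak: {daily_gb_at_peak:.1f} GB/day, {monthly_tb_at_peak:.2f} TB/month; "
      f"average: {monthly_tb_at_avg:.2f} TB/month")
```

The tenfold gap between the two monthly figures is the same peak-to-average factor used for the RPS estimate.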
4. Performance & Latency Constraints
4.1. URL Shortening Latency
- Shortening should take <100 ms, including:
  - Generating the short ID (Base62 encoding)
  - Storing the mapping in a database
  - Returning the short URL to the user
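Base62 usually encodes a unique numeric ID rather than the URL itself. A minimal sketch, assuming IDs come from some unique counter (not specified here) and an alphabet ordered as digits, lowercase, uppercase (an arbitrary choice). `encode_base62`, `decode_base62`, and `to_short_id` are hypothetical helpers.

```python
import string

# Alphabet order is an assumption; any fixed ordering of the 62 characters works.
BASE62_ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase


def encode_base62(n: int) -> str:
    """Encode a non-negative integer ID as a Base62 string."""
    if n < 0:
        raise ValueError("ID must be non-negative")
    if n == 0:
        return BASE62_ALPHABET[0]
    chars = []
    while n:
        n, rem = divmod(n, 62)
        chars.append(BASE62_ALPHABET[rem])
    return "".join(reversed(chars))  # most significant digit first


def decode_base62(s: str) -> int:
    """Inverse of encode_base62 (leading '0' padding decodes to zero)."""
    n = 0
    for ch in s:
        n = n * 62 + BASE62_ALPHABET.index(ch)
    return n


def to_short_id(n: int, length: int = 7) -> str:
    """Left-pad the encoded ID so every short ID has a fixed length."""
    encoded = encode_base62(n)
    if len(encoded) > length:
        raise ValueError("ID does not fit in the fixed-length keyspace")
    return encoded.rjust(length, BASE62_ALPHABET[0])


print(to_short_id(12_000_000_000), f"keyspace: {62**7:,}")
```

For example, `encode_base62(62)` returns `"10"`. Because 12 billion is below 62^6 ≈ 56.8 billion, every ID for the 10-year volume fits in 7 characters with room to spare.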
4.2. URL Redirection Latency
- Redirection should take <10 ms for cached URLs.
- Redirection that requires a database lookup should take <100 ms.
- CDNs and edge caching can reduce latency further.
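The two latency targets correspond to the two paths of a cache-aside lookup: a cache hit returns immediately, and a miss falls through to the database and then populates the cache. In this sketch `LRUCache` is a hypothetical in-process stand-in for Redis or Memcached, and `db` is a plain dict standing in for the database.

```python
from collections import OrderedDict
from typing import Optional


class LRUCache:
    """Tiny in-process LRU cache standing in for Redis/Memcached."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._data: "OrderedDict[str, str]" = OrderedDict()

    def get(self, key: str) -> Optional[str]:
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key: str, value: str) -> None:
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict the least recently used entry


def resolve(short_id: str, cache: LRUCache, db: dict) -> Optional[str]:
    """Cache-aside lookup: try the cache first, fall back to the database."""
    url = cache.get(short_id)
    if url is not None:
        return url                # fast path (target <10 ms)
    url = db.get(short_id)        # slow path (target <100 ms)
    if url is not None:
        cache.put(short_id, url)  # populate the cache for later hits
    return url
```

LRU eviction keeps the hot 20% of URLs resident without tracking popularity explicitly; in Redis the comparable behavior comes from the `allkeys-lru` setting of `maxmemory-policy`.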
5. Constraints & Challenges
5.1. Database Constraints
- A single relational database (MySQL, PostgreSQL) may not scale well beyond a few billion URLs without sharding.
- A NoSQL solution (Cassandra, DynamoDB, MongoDB) may be better suited to high traffic, since redirection is a simple lookup by short ID.
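Stores such as Cassandra and DynamoDB spread data across nodes by hashing a partition key. The sketch below shows the idea with simple modulo hashing on the short ID; `shard_for` is a hypothetical helper, and SHA-256 is used because Python's built-in `hash()` for strings is randomized per process and so is not stable across servers.

```python
import hashlib


def shard_for(short_id: str, num_shards: int) -> int:
    """Map a short ID to a shard index using a stable hash of the key."""
    digest = hashlib.sha256(short_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


print(shard_for("abc1234", 16))
```

Plain modulo hashing remaps most keys whenever `num_shards` changes; consistent hashing limits that movement to a fraction of the keys, which matters when shards are added over a 10-year lifetime.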
5.2. Scalability Constraints
- The system must handle sudden traffic spikes.
- Horizontal scaling with multiple servers is needed.
5.3. Security Constraints
- Prevent brute-force attacks on short URLs (e.g., enumerating IDs to discover links).
- Detect and block malicious URLs (phishing, malware).
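The section does not name a brute-force mitigation. One common option, assumed here, is per-client rate limiting with a token bucket. `TokenBucketLimiter`, `rate`, and `burst` are hypothetical names, and the injectable `clock` exists only to make the sketch testable.

```python
import time
from typing import Callable, Dict, Tuple


class TokenBucketLimiter:
    """Per-client token bucket: refills `rate` tokens/second, holds at most `burst`."""

    def __init__(self, rate: float, burst: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.burst = burst
        self.clock = clock
        self._buckets: Dict[str, Tuple[float, float]] = {}  # client -> (tokens, last_ts)

    def allow(self, client_id: str) -> bool:
        now = self.clock()
        tokens, last = self._buckets.get(client_id, (self.burst, now))
        tokens = min(self.burst, tokens + (now - last) * self.rate)  # refill since last call
        if tokens >= 1:
            self._buckets[client_id] = (tokens - 1, now)  # spend one token
            return True
        self._buckets[client_id] = (tokens, now)
        return False
```

Rate limiting pairs well with non-sequential short IDs: with 12 billion of the 62^7 ≈ 3.5 trillion possible IDs in use, a random guess hits an existing URL only about 0.34% of the time.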
5.4. Fault Tolerance & Availability
- Replication across data centers keeps the service available if one site fails.
- Load balancers distribute traffic evenly.
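Even distribution is often done with round-robin selection that skips unhealthy backends. A minimal sketch; `RoundRobinBalancer` and the backend names are hypothetical, and real load balancers add active health checks and weighting.

```python
from itertools import cycle
from typing import Iterable, Set


class RoundRobinBalancer:
    """Cycle through backends, skipping any currently marked unhealthy."""

    def __init__(self, backends: Iterable[str]) -> None:
        self.backends = list(backends)
        self.unhealthy: Set[str] = set()
        self._ring = cycle(self.backends)

    def next_backend(self) -> str:
        # Try each backend at most once per call before giving up.
        for _ in range(len(self.backends)):
            backend = next(self._ring)
            if backend not in self.unhealthy:
                return backend
        raise RuntimeError("no healthy backends")
```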
6. Summary of Capacity Estimations
Component | Estimated Value |
---|---|
URLs shortened per month | 100 million |
Total URLs stored (10 years) | 12 billion |
Short URL access per month | 1 billion |
Read-to-write ratio | 10:1 (up to 100:1 in read-heavy scenarios) |
Storage per URL | ~600 bytes |
Total storage (10 years) | ~10 TB |
Cached URLs | ~2.4 billion |
Cache size | ~240 GB |
Peak redirection requests | ~4,000 RPS (~400 RPS average) |
Bandwidth usage per month | ~5 TB (upper bound at sustained peak; ~0.5 TB at average load) |
Shortening latency | <100 ms |
Redirection latency (cached) | <10 ms |
Redirection latency (database lookup) | <100 ms |