Course Content
Data Structures & Algorithms
Full Stack Web Development
Understanding and playing with DOM (Document Object Model)
0/2
MERN project
0/2
Low Level System Design
LLD Topics
High Level System Design
Fast-Track to Full Spectrum Software Engineering

1. Why is Database Cleanup Important?

Improves performance – Reduces the size of active records, improving query speed.
Saves storage costs – Deletes old, unused, or expired links to free up space.
Ensures compliance – Removes URLs violating policies (e.g., spam, malicious links).
Reduces clutter – Keeps the system clean by removing inactive data.


2. What Needs to Be Purged?

Expired URLs – URLs with a predefined expiration time.
Unused URLs – URLs that haven’t been accessed for a long time.
Malicious URLs – URLs reported for phishing, spam, or illegal activities.
Orphaned Data – Any temporary or test data created by the system.



3. Strategies for Purging & Cleanup

A. Scheduled Expiry Cleanup (Automatic Deletion)

  • Store an expires_at timestamp in the database.
  • Run a scheduled cron job (daily/weekly) to delete expired URLs.

Example SQL Query:
DELETE FROM url_mapping
WHERE expires_at < NOW();

 

Automation: Use a cron job to run the query daily at midnight:

 
0 0 * * * mysql -u user -p'password' -e "DELETE FROM url_mapping WHERE expires_at < NOW();"


B. Archiving Old URLs Instead of Deleting

Instead of deleting URLs immediately, move them to an archive table before permanent deletion.

 
INSERT INTO archived_urls (short_id, long_url, created_at, expires_at)
SELECT short_id, long_url, created_at, expires_at FROM url_mapping WHERE expires_at < NOW();

DELETE FROM url_mapping WHERE expires_at < NOW();

 

Benefit: This allows recovery of URLs if needed before final deletion.



C. Detecting and Deleting Unused URLs

  • Track URL access frequency in a hits column.
  • If a URL hasn’t been accessed in X months, mark it for deletion.

Example SQL Query:
DELETE FROM url_mapping
WHERE last_accessed < (NOW() - INTERVAL 6 MONTH);

 

This deletes URLs that haven’t been visited in 6 months.



D. Removing Malicious or Blacklisted URLs

  • Maintain a blacklist table with blocked keywords/domains.
  • Run a job to check and delete matching URLs.

Example SQL Query:
 
DELETE FROM url_mapping
WHERE long_url LIKE '%bannedsite.com%' OR long_url IN (SELECT url FROM blacklist);

 

Automation: Run this query every few hours to remove harmful links.



E. Index Optimization & Table Maintenance

After deleting large amounts of data, optimize the database to free up space and improve performance.


Optimize Table After Deletion:
 
OPTIMIZE TABLE url_mapping;

 

This reclaims disk space and defragments the table.



4. When Should Purging Be Done?

 

Night-time Maintenance Windows – Low-traffic hours reduce impact on performance.
Incremental Cleanup – Instead of bulk deletes, remove small batches of records.
On-Demand Cleanup – Triggered manually when storage usage crosses a threshold.



5. Monitoring & Logging Cleanup Operations

 

It’s crucial to log all purging activities for debugging and tracking.


Example Log Entry Format:
 
[2025-03-28 02:00:00] Deleted 50,000 expired URLs.
[2025-03-28 02:05:00] Archived 10,000 unused URLs.
[2025-03-28 02:10:00] Removed 500 malicious URLs.

 

Use Elasticsearch/Kibana/Grafana for log analysis.



6. Scaling Cleanup for Large Systems

 

For massive data volumes, optimize with partitioning and batch processing:

Partitioning by Date – Store URLs in separate tables based on year/month.
Batch Deletion – Instead of deleting millions of rows at once, delete 10,000 rows per batch.

DELETE FROM url_mapping WHERE expires_at < NOW() LIMIT 10000;

 

Use Background Workers – Run cleanup asynchronously using Kafka, Celery, or AWS Lambda.



7. Final Takeaways

  • Automate cleanup with scheduled jobs.
  • Use archival instead of immediate deletion.
  • Optimize database performance after deletion.
  • Monitor logs to track purging efficiency.
  • Ensure scalability with batch deletions and indexing.
0% Complete
WhatsApp Icon

Hi Instagram Fam!
Get a FREE Cheat Sheet on System Design.

Hi LinkedIn Fam!
Get a FREE Cheat Sheet on System Design

Loved Our YouTube Videos? Get a FREE Cheat Sheet on System Design.