1. What is Purging and Why is it Needed?

Purging is the deletion of old, outdated, or unnecessary data from a database so that the data that remains can be stored and queried efficiently.

 

For a messaging system like Messenger, purging is crucial for the following reasons:

 

  • Freeing Up Storage: Media files, messages, and user data accumulate quickly. Without regular purging, the database can grow too large, consuming storage resources unnecessarily.

 

  • Improving Performance: Tables and indexes that are full of stale data force queries to scan more rows, which slows down reads and updates of the data users actually need.

 

  • Cost Management: Especially for cloud services like AWS or Google Cloud, large volumes of data incur costs. Purging old data helps in reducing storage costs.

 

  • Legal & Privacy Compliance: Privacy regulations such as GDPR can require that personal data be deleted once it is no longer needed or when the user asks for it to be erased.


2. Types of Data to be Purged

In a system like Messenger, various types of data can be purged over time. Key examples include:

 

  • Expired Messages: Messages in a chat may not need to be stored indefinitely, especially if the user has deleted them or they have become irrelevant.
    • For example, deleted or expired chat messages can be removed from the system after a certain period.

 

  • Inactive User Accounts: Users who haven’t logged in or used the system for an extended period may have their data cleaned up.
    • For instance, users who haven’t logged in for 1 or 2 years might have their data purged unless they reactivate their account.

 

  • Media Files: Multimedia content like images, videos, and audio messages consume significant storage.
    • Messenger can remove media files after a certain period or once they are no longer part of active conversations.

 

  • Temporary Data: Temporary files, session data, and cache data may not be needed after a user logs out or after a specific duration.

 

  • Chat Histories: Older messages in a chat can be archived or deleted after a certain threshold.
    • For example, messages older than 30 days in a group chat could be archived or purged.


 

3. Database Cleanup Process

The process of database cleanup involves several key steps to ensure that only relevant and active data remains in the system. It’s often automated and scheduled to run periodically.

 

3.1. Data Retention Policies

A data retention policy outlines how long data will be retained and when it should be purged. This policy is crucial in determining which data to keep and when to delete it. For example:

 

  • Messages: Messages could be retained for 30 days after being sent and then purged.

 

  • User Data: Users who haven’t logged in for 6 months may have their data removed unless they request otherwise.

 

  • Media: Media files can be deleted after a specified period or when they are no longer accessed.
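
A retention policy like the one above can be expressed as data, so that cleanup code looks up the rule instead of hard-coding periods. The following is a minimal sketch, not a definitive implementation: the names `RetentionRule`, `POLICY`, and `is_purgeable` are hypothetical, the 6-month period is approximated as 182 days, and the 90-day media period is an assumed value because the text only says "a specified period".

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class RetentionRule:
    """How long one category of data is kept before it becomes purgeable."""
    data_type: str
    retention: timedelta


# Hypothetical policy table mirroring the example periods in the text.
POLICY = {
    "message": RetentionRule("message", timedelta(days=30)),
    "user": RetentionRule("user", timedelta(days=182)),   # roughly 6 months
    "media": RetentionRule("media", timedelta(days=90)),  # assumed period
}


def is_purgeable(data_type: str, last_relevant_at: datetime, now: datetime) -> bool:
    """Return True when the item has outlived its retention period.

    last_relevant_at is the timestamp the rule is measured from, for example
    the send time of a message or the last login of a user.
    """
    rule = POLICY[data_type]
    return now - last_relevant_at > rule.retention


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    sent_at = now - timedelta(days=45)
    print(is_purgeable("message", sent_at, now))
```

Keeping the periods in one table means a policy change is a one-line edit rather than a search through every cleanup job.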

 

3.2. Soft Deletion vs. Hard Deletion

  • Soft Deletion: When data is “soft deleted,” it is marked as deleted in the database, but not actually removed. This allows for potential recovery if needed. For example, when a user deletes a message, the message might be marked as deleted, but not immediately purged, allowing for recovery in case of accidental deletion.

 

  • Hard Deletion: This involves permanently removing data from the database. For example, after a user account is marked as inactive for 1 year, all associated data can be completely deleted from the database.
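
The difference between the two deletion modes can be sketched with a nullable `deleted_at` column. This is only an illustration under stated assumptions: it uses an in-memory SQLite database from Python's standard library rather than whatever store a real messaging system uses, timestamps are ISO 8601 strings in one fixed format so that they compare correctly as text, and the table and function names are hypothetical.

```python
import sqlite3


def create_message_table(conn: sqlite3.Connection) -> None:
    # deleted_at stays NULL while the message is live.
    conn.execute(
        """
        CREATE TABLE messages (
            id INTEGER PRIMARY KEY,
            body TEXT NOT NULL,
            deleted_at TEXT
        )
        """
    )


def soft_delete(conn: sqlite3.Connection, message_id: int, now_iso: str) -> None:
    """Mark a message as deleted without removing the row."""
    with conn:
        conn.execute(
            "UPDATE messages SET deleted_at = ? WHERE id = ? AND deleted_at IS NULL",
            (now_iso, message_id),
        )


def restore(conn: sqlite3.Connection, message_id: int) -> None:
    """Undo a soft delete, which is possible because the row still exists."""
    with conn:
        conn.execute("UPDATE messages SET deleted_at = NULL WHERE id = ?", (message_id,))


def hard_delete_before(conn: sqlite3.Connection, cutoff_iso: str) -> int:
    """Permanently remove rows that were soft-deleted before the cutoff."""
    with conn:
        cur = conn.execute(
            "DELETE FROM messages WHERE deleted_at IS NOT NULL AND deleted_at < ?",
            (cutoff_iso,),
        )
    return cur.rowcount


def visible_ids(conn: sqlite3.Connection) -> list:
    """IDs a user would still see: every read has to filter out soft-deleted rows."""
    rows = conn.execute("SELECT id FROM messages WHERE deleted_at IS NULL ORDER BY id")
    return [row[0] for row in rows]
```

Note the cost of soft deletion: every read path has to remember the `deleted_at IS NULL` filter, and the storage is not reclaimed until the hard delete runs.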

3.3. Archiving Old Data

Instead of immediately purging older data, another option is archiving it:

 

  • Older messages or user data may be moved to a secondary storage system or cold storage (such as the Amazon S3 Glacier storage classes or the Coldline class of Google Cloud Storage).

 

  • Archiving keeps the data for legal or regulatory purposes but reduces the load on primary databases and minimizes storage costs.
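
Archiving can be sketched as a copy followed by a delete inside one transaction, so that a row is never lost or duplicated if the job fails halfway. In this hedged example a second SQLite table, `messages_archive`, stands in for the cold storage system; in practice the archive would usually live in a separate, cheaper store, and the copy and delete would need their own failure handling. All names are hypothetical.

```python
import sqlite3


def create_archive_schema(conn: sqlite3.Connection) -> None:
    conn.executescript(
        """
        CREATE TABLE messages (
            id INTEGER PRIMARY KEY, body TEXT NOT NULL, sent_at TEXT NOT NULL
        );
        CREATE TABLE messages_archive (
            id INTEGER PRIMARY KEY, body TEXT NOT NULL, sent_at TEXT NOT NULL
        );
        """
    )


def archive_before(conn: sqlite3.Connection, cutoff_iso: str) -> int:
    """Move messages sent before the cutoff into the archive table.

    Both statements run in a single transaction: either the rows end up
    only in the archive, or nothing changes.
    """
    with conn:
        conn.execute(
            "INSERT INTO messages_archive (id, body, sent_at) "
            "SELECT id, body, sent_at FROM messages WHERE sent_at < ?",
            (cutoff_iso,),
        )
        cur = conn.execute("DELETE FROM messages WHERE sent_at < ?", (cutoff_iso,))
    return cur.rowcount
```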

3.4. Automated Cleanup Jobs

Automated cleanup jobs (usually cron jobs or scheduled tasks) are set up to run at regular intervals to purge old data from the system. These tasks typically handle:

 

  • Purging inactive users and their data.
  • Deleting old messages beyond a certain time limit.
  • Removing media files that haven’t been accessed for a set period.
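
One way to structure such a job is a single entry point that runs every cleanup task and reports what each one did. The sketch below is an assumption about how this could look, not a description of any real system: `run_cleanup` and the task names are hypothetical, the lambdas stand in for real purge functions, and the crontab line in the comment uses a made-up script path.

```python
from typing import Callable, Dict

# A cleanup task takes no arguments and returns the number of items it removed.
CleanupTask = Callable[[], int]


def run_cleanup(tasks: Dict[str, CleanupTask]) -> Dict[str, object]:
    """Run every task and collect a per-task report.

    A failing task is recorded and skipped, so one broken cleanup step
    does not prevent the remaining steps from running.
    """
    report: Dict[str, object] = {}
    for name, task in tasks.items():
        try:
            report[name] = task()
        except Exception as exc:
            report[name] = f"failed: {exc}"
    return report


if __name__ == "__main__":
    # Example crontab entry (hypothetical path) to run this daily at 03:00:
    #   0 3 * * * /usr/bin/python3 /opt/app/cleanup.py
    placeholder_tasks = {
        "inactive_users": lambda: 0,  # stand-ins for real purge functions
        "old_messages": lambda: 0,
        "stale_media": lambda: 0,
    }
    print(run_cleanup(placeholder_tasks))
```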

3.5. Transactional Cleanup

In high-volume systems like Messenger, a transactional cleanup approach ensures that cleanup tasks are performed in a way that does not negatively impact system performance. For example:

 

  • Cleanup tasks should be performed during off-peak hours or in small batches to minimize the load on the system.
  • Transactions should be used to ensure that the system remains in a consistent state during cleanup.
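
The two points above combine naturally: delete in small batches and give each batch its own short transaction, so locks are held briefly and a failure loses at most one batch of work. This is a minimal sketch using SQLite from the standard library; the `messages` table, its `sent_at` column, and the function name are assumptions made for the example.

```python
import sqlite3
import time


def purge_in_batches(
    conn: sqlite3.Connection,
    cutoff_iso: str,
    batch_size: int = 1000,
    pause_seconds: float = 0.0,
) -> int:
    """Delete messages sent before the cutoff, batch_size rows at a time.

    Returns the total number of rows deleted.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")

    total = 0
    while True:
        with conn:  # one short transaction per batch
            cur = conn.execute(
                "DELETE FROM messages WHERE id IN ("
                " SELECT id FROM messages WHERE sent_at < ? LIMIT ?)",
                (cutoff_iso, batch_size),
            )
        deleted = cur.rowcount
        total += deleted
        if deleted < batch_size:
            # A short batch means nothing older than the cutoff is left.
            return total
        # Give regular traffic room between batches.
        time.sleep(pause_seconds)
```

The batch size and pause are tuning knobs: smaller batches and longer pauses reduce the impact on live traffic at the price of a longer total run.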


4. Challenges in Purging and Cleanup

Some of the challenges that may arise during purging and cleanup in a system like Messenger include:

 

  • Data Integrity: Ensuring that purging data does not accidentally delete important or active data.

 

  • Real-Time Deletion: Ensuring that deletions (especially in a real-time system) are processed quickly and accurately, without affecting ongoing messaging.

 

  • Large-Scale Deletion: Deleting large volumes of data (e.g., when multiple users’ messages need to be deleted) without negatively impacting the system’s performance.

 

  • Consistency: Ensuring that the purge process does not conflict with other operations, like sending or receiving new messages.

 

  • Regulatory Compliance: Ensuring that data retention and purging processes adhere to regulations such as GDPR or CCPA, which restrict how long personal data may be kept and give users the right to request that their data be deleted.


5. Tools and Techniques for Purging and Cleanup

To implement an effective purging and cleanup strategy, the following tools and techniques may be used:

 

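  • Database Management Systems (DBMS): Databases like PostgreSQL or Cassandra provide built-in features that make purging cheaper. For example, PostgreSQL table partitioning lets a whole partition of old rows be dropped at once instead of deleting row by row, and Cassandra can expire data automatically through a time-to-live (TTL) set when the data is written.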

 

  • Log Management Tools: Log stores such as Elasticsearch can be configured to delete log indices older than a certain threshold, for example through index lifecycle management policies.

 

  • Backup Systems: Backup and restore systems help in the case of accidental data deletion, providing a safety net for the cleanup process.

 

  • Scheduled Cron Jobs: Setting up regular cleanup tasks to automatically purge old data without manual intervention.

 

  • Data Analytics Tools: For analyzing data usage patterns to determine which data is inactive and should be purged.


6. Example of Purging Process in Messenger

Let’s consider how the purging process might work in Messenger for different data types:

 

Messages:

  • A user sends a message to a friend or group.
  • After 30 days, if the message has not been marked as important and has not been replied to, it is scheduled for deletion.

 

Media Files:

  • A user shares a photo. After 90 days, if the media file has not been accessed again, it is scheduled for deletion, or archived to cold storage for compliance purposes.

 

Inactive Users:

  • If a user has not logged in for over 1 year, their account data is soft-deleted. If there is still no activity 6 months after that, the account and its associated data are hard-deleted.
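
The inactive-user rule can be written as a small pure function, which keeps the thresholds easy to test separately from the database code. This sketch assumes the 6-month window starts at the moment the account becomes eligible for soft deletion, and it approximates 1 year as 365 days and 6 months as 180 days; `AccountState` and `account_state` are hypothetical names.

```python
from datetime import datetime, timedelta
from enum import Enum


class AccountState(Enum):
    ACTIVE = "active"
    SOFT_DELETED = "soft_deleted"
    HARD_DELETED = "hard_deleted"


SOFT_DELETE_AFTER = timedelta(days=365)  # about 1 year without a login
HARD_DELETE_GRACE = timedelta(days=180)  # about 6 more months (assumed start point)


def account_state(last_login: datetime, now: datetime) -> AccountState:
    """Decide what the cleanup job should do with an account."""
    inactive_for = now - last_login
    if inactive_for > SOFT_DELETE_AFTER + HARD_DELETE_GRACE:
        return AccountState.HARD_DELETED
    if inactive_for > SOFT_DELETE_AFTER:
        return AccountState.SOFT_DELETED
    return AccountState.ACTIVE
```

Any login resets `last_login`, so an account that becomes active again during the grace period returns to the active state on the next run.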

 

Conclusion

The purging and database cleanup process in a messaging system like Messenger is an essential part of maintaining the system’s scalability, performance, and cost efficiency. It ensures that unnecessary data is removed, critical data remains intact, and the system operates efficiently over time.
