1. Why Purging & DB Cleanup is Important

  • Storage Management: As the system scales, the volume of data grows significantly. Over time, old or irrelevant data such as completed ride records, payment receipts, and user-session logs accumulates. Purging it keeps database growth in check, which lowers storage costs and improves query performance.

 

  • Performance Optimization: A cluttered database can slow down read/write operations, especially when performing queries on large datasets. Regular cleanup and archiving of old data improve system responsiveness.

 

  • Compliance and Data Retention: Certain data must be retained for a minimum period for legal or regulatory reasons. Personal data that is no longer needed, however, should be purged to comply with data retention policies and privacy laws (e.g., GDPR, which requires that personal data be kept no longer than necessary).


2. Types of Data to Purge

For a ride-sharing platform like Uber/Ola, the following types of data may need to be purged or cleaned up:

 

  • Completed Ride Data: After a ride is completed, details such as the ride’s time, payment, and feedback are saved. Once a period of time (e.g., 6 months or a year) has passed, this data may no longer be necessary and can be archived or purged.

 

  • Expired User Sessions: Session data related to logged-in users can be removed after a session expires or when the user logs out.

 

  • Inactive Accounts: Data related to users who haven’t used the app for a long period of time might be cleaned up or archived, depending on the platform’s data retention policy.

 

  • Ride Cancellations: If a ride is cancelled, the ride request data might be kept for analytics purposes, but the actual details of the ride may be purged after a certain period.

 

  • Payment Information: Stored payment details that are not required for future transactions may be purged after the ride is completed, while transaction records are kept for as long as the platform’s security and compliance requirements dictate.

 

  • Logs and Audit Trails: Logs that track system events and user activity may be purged or archived once they are no longer needed for security or auditing purposes.


3. Purging Strategy

To manage purging effectively, Uber/Ola may implement a combination of the following strategies:

 

A. Time-based Purging

  • Policy-driven Cleanup: Old records (e.g., rides older than a year) are purged according to predefined retention policies.

 

  • Archiving: Instead of being deleted, old data may be archived in a cheaper, less accessible storage tier. For example, completed ride records can be moved to cloud storage for long-term retention, where they are preserved but not readily available for frequent queries.
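
The two points above can be combined into one job. Below is a minimal sketch using Python's built-in sqlite3 module. The table names (rides, rides_archive), the completed_at column, and the one-year retention period are assumptions made for illustration, not a description of how Uber/Ola actually do it.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=365)  # assumed policy: keep completed rides for one year


def archive_and_purge_old_rides(conn: sqlite3.Connection, now: datetime) -> int:
    """Copy rides older than the retention cutoff to rides_archive, then delete them.

    Timestamps are assumed to be stored as ISO-8601 strings in one timezone,
    so plain string comparison orders them correctly.
    """
    cutoff = (now - RETENTION).isoformat()
    with conn:  # one transaction: both statements commit, or neither does
        conn.execute(
            "INSERT INTO rides_archive SELECT * FROM rides WHERE completed_at < ?",
            (cutoff,),
        )
        deleted = conn.execute(
            "DELETE FROM rides WHERE completed_at < ?", (cutoff,)
        ).rowcount
    return deleted
```

Running the copy and the delete in a single transaction means a failure midway cannot leave a ride deleted but not archived. In a real system the archive would more likely live in a separate, cheaper store than in the same database.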

B. Data Classification

  • Different kinds of data have different retention requirements. For instance, payment data might need to be retained for several years for compliance purposes, while ride history might only need to be stored for a shorter period.

 

  • Tagging or Flagging: Sensitive or regulatory-required data (e.g., payment information) can be flagged and not purged, whereas other less critical data can be safely deleted.
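
One way to express classification and flagging in code is a per-class retention table plus a flag that blocks purging. The sketch below is illustrative only: the class names, the retention periods, and the legal_hold field are hypothetical, and real values would come from the platform's legal and compliance teams.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical retention periods per data class.
RETENTION_BY_CLASS = {
    "ride_history": timedelta(days=365),
    "session_log": timedelta(days=180),
    "payment": timedelta(days=7 * 365),
}


@dataclass
class Record:
    data_class: str
    created_at: datetime
    legal_hold: bool = False  # flagged records are never purged automatically


def is_purgeable(record: Record, now: datetime) -> bool:
    """Return True only if the record is unflagged and past its class's retention."""
    if record.legal_hold:
        return False
    retention = RETENTION_BY_CLASS.get(record.data_class)
    if retention is None:
        return False  # unknown class: keep the record rather than guess
    return now - record.created_at > retention
```

Defaulting to "keep" for unknown classes is a deliberate choice here: a new data type that nobody has classified yet should not be deleted by accident.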

C. Soft Purging vs. Hard Purging

  • Soft Purging: Involves marking records as deleted or archived without immediately removing them from the database. This allows for easier recovery if needed. For example, ride history records can be marked as inactive in the database.

 

  • Hard Purging: Involves permanently deleting records from the system to free up space. This can be done after a certain retention period or if the data is no longer needed.
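
The difference between the two can be sketched with a nullable deleted_at column: soft purging sets it, hard purging removes rows whose deleted_at is older than a grace period. The rides table and its columns are assumptions for this example.

```python
import sqlite3


def soft_purge(conn: sqlite3.Connection, ride_id: int, now_iso: str) -> None:
    """Mark a ride as deleted without removing the row."""
    with conn:
        conn.execute(
            "UPDATE rides SET deleted_at = ? WHERE id = ? AND deleted_at IS NULL",
            (now_iso, ride_id),
        )


def restore(conn: sqlite3.Connection, ride_id: int) -> None:
    """Undo a soft purge; only possible before the hard purge has run."""
    with conn:
        conn.execute("UPDATE rides SET deleted_at = NULL WHERE id = ?", (ride_id,))


def hard_purge(conn: sqlite3.Connection, cutoff_iso: str) -> int:
    """Permanently delete rides that were soft-purged before the cutoff."""
    with conn:
        cur = conn.execute(
            "DELETE FROM rides WHERE deleted_at IS NOT NULL AND deleted_at < ?",
            (cutoff_iso,),
        )
    return cur.rowcount


def active_ride_ids(conn: sqlite3.Connection) -> list:
    """Normal queries must filter out soft-purged rows themselves."""
    rows = conn.execute("SELECT id FROM rides WHERE deleted_at IS NULL ORDER BY id")
    return [row[0] for row in rows]
```

Note that soft purging alone frees no space, and every read path has to remember the deleted_at IS NULL filter, which is why it is usually paired with a later hard purge.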


4. Database Cleanup Process

The cleanup process typically involves the following steps:

 

A. Data Retention Policies

  • Establishing a Policy: Define clear rules on how long different types of data should be retained (e.g., ride data is kept for 12 months, user logs for 6 months).

 

  • Compliance Requirements: Some data may need to be retained due to legal requirements (e.g., tax-related data or payment history), so purging must comply with regulations and standards such as GDPR and PCI-DSS.

B. Deleting or Archiving Old Data

  • Archiving: Older ride data and transaction logs might be moved to a different database or storage solution (e.g., cold storage or a data warehouse), where they remain available for infrequent access.

 

  • Deleting Data: Non-essential data (such as temporary session data) can be deleted after a certain retention period to avoid unnecessary storage overhead.

C. Index Cleanup

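  • Most relational databases keep index entries in sync automatically as rows are purged or archived, but large deletions can leave indexes and tables fragmented or bloated. After a big purge, affected indexes may need to be rebuilt or reorganized and free space reclaimed. Indexes that are no longer used should be removed, and new indexes might need to be created on frequently queried data.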
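
As a small illustration of post-purge maintenance, the sketch below uses SQLite's REINDEX and VACUUM statements through Python's sqlite3 module. SQLite is chosen only because it ships with Python; other databases have their own maintenance commands, and the helper names here are made up for the example.

```python
import sqlite3


def free_pages(conn: sqlite3.Connection) -> int:
    """Number of unused pages still held inside the database file."""
    return conn.execute("PRAGMA freelist_count").fetchone()[0]


def maintain_after_purge(conn: sqlite3.Connection) -> None:
    """Rebuild indexes and reclaim free space after a large purge."""
    conn.commit()            # VACUUM cannot run inside an open transaction
    conn.execute("REINDEX")  # rebuild every index from its table's rows
    conn.execute("VACUUM")   # rewrite the file without the free pages
```

Comparing free_pages before and after maintain_after_purge gives a rough measure of how much space the purge actually released. VACUUM rewrites the whole file, so on a large database it belongs in an off-peak window.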

D. Consistency Checks

  • The system must ensure that purging does not result in data inconsistency. For example, if a user’s ride history is deleted, it shouldn’t leave orphaned records in the payment system or driver logs.
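
A consistency check of this kind can be as simple as a query that looks for child rows whose parent no longer exists. The sketch below assumes hypothetical rides and payments tables linked by payments.ride_id.

```python
import sqlite3


def find_orphaned_payments(conn: sqlite3.Connection) -> list:
    """Return ids of payments whose ride no longer exists."""
    rows = conn.execute(
        """
        SELECT p.id
        FROM payments AS p
        LEFT JOIN rides AS r ON r.id = p.ride_id
        WHERE r.id IS NULL
        ORDER BY p.id
        """
    )
    return [row[0] for row in rows]
```

Running such a check after each purge job, and alerting when the result is non-empty, catches purges that removed parents without their dependents.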


5. Automated Cleanup Jobs

To ensure regular purging and database cleanup:

 

  • Scheduled Jobs: Regular, automated cleanup jobs can be scheduled at off-peak hours to minimize the impact on system performance. These jobs would purge old records, remove expired sessions, and rebuild or reorganize database indexes where needed.

 

  • Database Triggers: Certain actions can trigger cleanup processes automatically. For example, if a user deletes their account, it could trigger a cascade of actions to remove their ride history, payment information, etc.
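
The account-deletion example can be sketched with a SQLite trigger. The schema below (users, rides, payments) and the trigger name purge_user_data are assumptions for illustration; a production system would also exclude any records that must be retained for compliance.

```python
import sqlite3

SCHEMA = """
CREATE TABLE users (id INTEGER PRIMARY KEY);
CREATE TABLE rides (id INTEGER PRIMARY KEY, user_id INTEGER NOT NULL);
CREATE TABLE payments (id INTEGER PRIMARY KEY, ride_id INTEGER NOT NULL);

-- When a user row is deleted, remove that user's payments and rides too.
CREATE TRIGGER purge_user_data AFTER DELETE ON users
BEGIN
    DELETE FROM payments
    WHERE ride_id IN (SELECT id FROM rides WHERE user_id = OLD.id);
    DELETE FROM rides WHERE user_id = OLD.id;
END;
"""


def open_db() -> sqlite3.Connection:
    """Create an in-memory database with the example schema and trigger."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def delete_account(conn: sqlite3.Connection, user_id: int) -> None:
    """Delete the user; the trigger removes dependent rows in the same transaction."""
    with conn:
        conn.execute("DELETE FROM users WHERE id = ?", (user_id,))
```

Payments are deleted before rides inside the trigger because the payment lookup depends on the ride rows still being present.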


6. Impact on System Performance

While purging data can improve the database’s performance, it must be done carefully to avoid performance degradation during the cleanup process. Here are some considerations:

 

  • Concurrency: Cleanup jobs should not lock the entire database during purging to avoid delays for other users or systems interacting with the database.

 

  • Batch Processing: Large cleanup jobs should be broken down into smaller batches to avoid overwhelming the system and to maintain responsiveness.

 

  • Archiving vs. Deletion: The choice between archiving and deleting depends on how often the data needs to be accessed and how much storage cost is acceptable.
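
The concurrency and batching points can be illustrated together: delete a limited number of rows per transaction and commit between batches, so locks are held only briefly. The session_logs table, its columns, and the default batch size are assumptions for this sketch.

```python
import sqlite3


def purge_in_batches(
    conn: sqlite3.Connection, cutoff_iso: str, batch_size: int = 1000
) -> int:
    """Delete rows older than the cutoff in small batches; return the total deleted."""
    total = 0
    while True:
        with conn:  # commit after each batch so locks are released quickly
            cur = conn.execute(
                """
                DELETE FROM session_logs
                WHERE id IN (
                    SELECT id FROM session_logs
                    WHERE created_at < ?
                    LIMIT ?
                )
                """,
                (cutoff_iso, batch_size),
            )
        if cur.rowcount == 0:
            break  # nothing left to purge
        total += cur.rowcount
    return total
```

A short pause between batches, or a cap on batches per run, can be added if the job still competes with user traffic.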


7. Challenges and Best Practices

  • Data Integrity: Ensuring that purging does not accidentally delete important data or leave orphaned records is crucial. Good data relationships and constraints must be in place to ensure proper cleanup.

 

  • Compliance with Regulations: Organizations must ensure that they adhere to data retention laws and regulations, especially in cases of personal or financial data.

 

  • Automation: Automated scripts and scheduled jobs must be set up to handle purging consistently and reliably without manual intervention.

 

  • Monitoring and Logging: Ensure there is a monitoring system in place to track the purging process and log any failures or errors, so the team can intervene if necessary.
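
For the monitoring point, a thin wrapper around each purge job is often enough to start with. The sketch below uses Python's standard logging module. The function name run_purge_job and the convention that a job returns the number of rows it deleted are assumptions made for the example.

```python
import logging
import time
from typing import Callable

logger = logging.getLogger("purge")


def run_purge_job(name: str, job: Callable[[], int]) -> bool:
    """Run one purge job, log its outcome, and report success or failure."""
    start = time.monotonic()
    try:
        deleted = job()
    except Exception:
        # Logs the full traceback so the team can investigate the failure.
        logger.exception("purge job %s failed", name)
        return False
    elapsed = time.monotonic() - start
    logger.info("purge job %s deleted %d rows in %.2fs", name, deleted, elapsed)
    return True
```

The boolean return value lets a scheduler decide whether to retry or raise an alert, while the log lines provide the audit trail of what each run removed.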