Course Content
Data Structures & Algorithms
Full Stack Web Development
Understanding and playing with DOM (Document Object Model)
0/2
MERN project
0/2
Low Level System Design
LLD Topics
High Level System Design
Fast-Track to Full Spectrum Software Engineering
Database Design & Schema of Pastebin

In this section, we will outline the database design and schema of a Pastebin system. The Pastebin system needs a database to store pastes (text/code), user information, and metadata associated with pastes. The design must ensure efficient storage, easy retrieval of data, and scalability as the application grows.

 

1. Key Entities for the Pastebin System

The primary entities for the Pastebin system include:

 

  1. Pastes: Represents the text or code that users upload.
  2. Users: Information about users who create pastes (if user authentication is enabled).
  3. Paste Expiration: Details regarding when a paste will expire.
  4. Audit Logs (optional): Logs of user actions like creation or deletion of pastes.

 

2. Entity Relationships

 

The relationship between the entities is simple:

 

  • Each User can create multiple Pastes.
  • Each Paste can have metadata (e.g., visibility, expiration time).
  • A Paste can have a history of actions (e.g., creation, modification) tracked in Audit Logs.

 

3. Database Tables and Schema

 

Let’s break down the core tables for the system:

 

a. Users Table

This table stores information about the users of the Pastebin system. If user authentication is not required, this table can be omitted.

 

Table Name: users

Column Data Type Description
user_id INT (PK) Unique identifier for the user (Primary Key).
username VARCHAR(255) The username for the user.
email VARCHAR(255) The user’s email address (optional, for user accounts).
password_hash VARCHAR(255) The hashed password (if using password-based auth).
created_at TIMESTAMP Timestamp when the user was created.
updated_at TIMESTAMP Timestamp when the user’s details were last updated.

 

b. Pastes Table

This is the core table for storing the paste content, metadata, and expiration information.

 

Table Name: pastes

Column Data Type Description
paste_id VARCHAR(255) (PK) Unique identifier for the paste (Primary Key).
user_id INT (FK) Foreign key referencing users.user_id (nullable if anonymous).
content TEXT The content (text or code) of the paste.
title VARCHAR(255) The title of the paste (optional).
language VARCHAR(50) The language used in the paste (for syntax highlighting).
visibility ENUM(‘public’, ‘private’, ‘unlisted’) Visibility setting of the paste.
expire_in INT Expiration time of the paste in seconds (e.g., 86400 for 1 day).
created_at TIMESTAMP Timestamp when the paste was created.
expires_at TIMESTAMP Timestamp when the paste will expire (calculated based on expire_in).
updated_at TIMESTAMP Timestamp when the paste was last updated.

 

Foreign Key:

  • user_id references users(user_id).


 

c. Audit Logs Table (Optional)

 

Audit logs track user actions like pastes being created, modified, or deleted. This table is useful for monitoring, debugging, and maintaining data integrity.

 

Table Name: audit_logs

Column Data Type Description
log_id INT (PK) Unique identifier for the log (Primary Key).
user_id INT (FK) Foreign key referencing users.user_id.
action_type ENUM(‘create’, ‘update’, ‘delete’, ‘view’) The action performed on the paste.
paste_id VARCHAR(255) The ID of the paste associated with the action.
timestamp TIMESTAMP Timestamp of when the action was performed.

 

Foreign Key:

  • user_id references users(user_id).

  • paste_id references pastes(paste_id).


 

d. Paste Metadata (Expiration) Table (Optional)

 

If expiration details are stored in a separate table for scalability or if some metadata is heavily accessed or modified, we can use a dedicated paste_expiration table.

 

Table Name: paste_expiration

Column Data Type Description
paste_id VARCHAR(255) (PK) Unique identifier of the paste (Primary Key).
expire_in INT The expiration time in seconds (e.g., 86400 for 1 day).
expires_at TIMESTAMP Timestamp when the paste expires.

 

Foreign Key:

  • paste_id references pastes(paste_id).


 

4. Example of Relationships Between Tables

 

  • One-to-Many: A User can have many Pastes.
  • One-to-Many: A Paste can have many Audit Logs (if tracking actions like creation, deletion, updates).
  • One-to-One: Each Paste can have one Paste Expiration entry.

 

5. Database Indexing & Performance Considerations
  • Indexes:
    • paste_id in pastes should be indexed for fast retrieval.

    • user_id in pastes should be indexed to quickly retrieve all pastes by a specific user.

    • expires_at in pastes could be indexed to facilitate quick searches for expired pastes.

  • Partitioning:
    If the number of pastes becomes very large, partitioning the pastes table based on time (e.g., by month or year) can help distribute data across multiple storage devices and improve performance for read/write operations.
  • Replication:
    For scaling, database replication can be used where one primary database handles writes, and multiple replicas handle read-heavy traffic. This can help with performance, especially when there are millions of pastes being accessed frequently.


6. Data Retention and Cleanup Strategy

A Pastebin system may not need to retain pastes forever. Here are some strategies:

 

  • Automatic Expiration: Pastes can be automatically deleted after a certain period (e.g., after 30 days). This can be handled by a scheduled job or a cron task that periodically cleans up expired records.
  • Soft Deletes: Instead of immediately deleting expired pastes, they could be marked as expired in the database, which helps in case of recovery or audits. This allows the system to archive them if needed, before permanent deletion.

 

7. Example Schema in SQL

Here’s an example of how the database schema can be implemented in SQL:

-- Users table
CREATE TABLE users (
user_id INT AUTO_INCREMENT PRIMARY KEY,
username VARCHAR(255) NOT NULL UNIQUE,
email VARCHAR(255) UNIQUE,
password_hash VARCHAR(255),
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP
);

-- Pastes table
CREATE TABLE pastes (
paste_id VARCHAR(255) PRIMARY KEY,
user_id INT,
content TEXT NOT NULL,
title VARCHAR(255),
language VARCHAR(50),
visibility ENUM('public', 'private', 'unlisted') DEFAULT 'public',
expire_in INT, -- Expiration in seconds
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
expires_at TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
FOREIGN KEY (user_id) REFERENCES users(user_id)
);

-- Audit logs table (optional)
CREATE TABLE audit_logs (
log_id INT AUTO_INCREMENT PRIMARY KEY,
user_id INT,
action_type ENUM('create', 'update', 'delete', 'view'),
paste_id VARCHAR(255),
timestamp TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
FOREIGN KEY (user_id) REFERENCES users(user_id),
FOREIGN KEY (paste_id) REFERENCES pastes(paste_id)
);

-- Paste expiration table (optional)
CREATE TABLE paste_expiration (
paste_id VARCHAR(255) PRIMARY KEY,
expire_in INT,
expires_at TIMESTAMP,
FOREIGN KEY (paste_id) REFERENCES pastes(paste_id)
);


 

Conclusion

 

The database schema for a Pastebin system involves creating several tables to handle users, pastes, metadata, and audit logs. The schema is designed to allow efficient storage, retrieval, and management of pastes, with considerations for user actions, expiration times, and scalability. By explaining these concepts to your students, they can understand the role of database design in ensuring efficient operations for a web application like Pastebin.

0% Complete
WhatsApp Icon

Hi Instagram Fam!
Get a FREE Cheat Sheet on System Design.

Hi LinkedIn Fam!
Get a FREE Cheat Sheet on System Design

Loved Our YouTube Videos? Get a FREE Cheat Sheet on System Design.