Course Content
🎁 Bonus Modules (Integrated Throughout)
Data Analytics
Data Storage

In any data analytics workflow, once data is collected, it must be stored safely, efficiently, and in the right format for further processing and analysis. AWS offers several storage solutions, and the most commonly used for analytics are S3, RDS, and Redshift.


1. Amazon S3 (Simple Storage Service)

What it is:


Amazon S3 is an object storage service designed for unstructured data like images, logs, backups, videos, and big data files (CSV, JSON, Parquet, etc.).

Key Features:

 

  • Unlimited storage
  • High durability (99.999999999%)
  • Cost-effective with tiered pricing (Standard, Infrequent Access, Glacier)

 

Use Cases:

 

  • Storing raw logs and IoT data for batch processing
  • Hosting datasets for machine learning
  • Serving as a data lake (central repository for structured + unstructured data)

 

Why it matters:
S3 is foundational in data pipelines. Tools like AWS Glue, Athena, and EMR often read/write directly from S3.


2. Amazon RDS (Relational Database Service)

What it is:


Amazon RDS is a managed relational database service that supports databases like MySQL, PostgreSQL, SQL Server, and Oracle.

Key Features:

 

  • Automated backups, patching, and scaling
  • ACID-compliant transactions
  • Best for operational (OLTP) workloads

 

Use Cases:

 

  • Storing transactional data from apps or websites
  • Maintaining normalized data tables
  • Running basic SQL queries for reporting or integration

 

Why it matters:
RDS is useful when you need a structured, reliable data source for dashboards or to clean data before moving to analytics platforms like Redshift.


3. Amazon Redshift – Data Warehouse

What it is:


Amazon Redshift is a cloud data warehouse designed for analytical workloads (OLAP), meaning it’s optimized for large-scale data querying and reporting.

Key Features:

 

  • Columnar storage for faster performance
  • Integrates with BI tools (QuickSight, Tableau, Power BI)
  • Can query both Redshift tables and S3 data using Redshift Spectrum

 

Use Cases:

 

  • Running complex SQL queries on millions of rows
  • Powering executive dashboards and business reports
  • Joining and aggregating datasets from different sources

 

Why it matters:
Redshift is built for speed and scale. It allows businesses to quickly turn vast data into insights, even with petabytes of information.


Summary Table
Service Type Best For Example Scenario
Amazon S3 Object Storage Raw, unstructured data Logs from web apps
Amazon RDS Relational DB Structured transactional data E-commerce orders
Amazon Redshift Data Warehouse Analytical queries Sales reporting across regions
0% Complete
WhatsApp Icon

Hi Instagram Fam!
Get a FREE Cheat Sheet on System Design.

Hi LinkedIn Fam!
Get a FREE Cheat Sheet on System Design

Loved Our YouTube Videos? Get a FREE Cheat Sheet on System Design.