Data Analytics
Data Collection

In any data analytics pipeline, the first step is data collection: gathering data from sources such as applications, databases, devices, or even physical drives. AWS offers different services for different collection scenarios. This lesson covers three of them (Kinesis, Snowball, and DMS) and when each one fits.


1. Amazon Kinesis – Real-Time Streaming Data

What it is:


Amazon Kinesis is a fully managed service that lets you collect and process real-time data streams such as logs, social media feeds, website clicks, or IoT sensor data.

Key Use Cases:

 

  • Tracking real-time user activity on apps/websites
  • Collecting log data from EC2 instances or on-prem servers
  • Streaming sensor data from IoT devices

 

Why it matters:
Kinesis makes data available for analysis within seconds of being produced. For example, a retail company can detect which products are trending during a flash sale and adjust its marketing campaigns while the sale is still running.
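
As a rough illustration of the website-click use case, here is a minimal Python sketch of how a producer might package one click event for Kinesis Data Streams. The stream name "clickstream-demo", the event fields, and both helper function names are assumptions made for this example. The only AWS call involved is put_record from the boto3 Kinesis client, which takes StreamName, Data (bytes), and PartitionKey.

```python
import json
from datetime import datetime, timezone


def build_click_record(stream_name, user_id, page):
    """Build the keyword arguments for a Kinesis PutRecord call (hypothetical helper)."""
    event = {
        "user_id": user_id,
        "page": page,
        "ts": datetime.now(timezone.utc).isoformat(),
    }
    return {
        "StreamName": stream_name,
        "Data": json.dumps(event).encode("utf-8"),  # Kinesis expects bytes
        "PartitionKey": user_id,  # records with the same key go to the same shard
    }


def send_click(kinesis_client, stream_name, user_id, page):
    """Send one click event; kinesis_client is expected to be boto3.client("kinesis")."""
    return kinesis_client.put_record(**build_click_record(stream_name, user_id, page))


if __name__ == "__main__":
    params = build_click_record("clickstream-demo", "user-42", "/flash-sale")
    print(params["PartitionKey"], len(params["Data"]), "bytes")
    # With AWS credentials and an existing stream (assumed, not created here):
    #   import boto3
    #   send_click(boto3.client("kinesis"), "clickstream-demo", "user-42", "/flash-sale")
```

Using the user ID as the partition key keeps each user's events in order on one shard. A very active user could make that shard hot, so a real producer may need a different key.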


2. AWS Snowball – Offline Data Transfer

 

What it is:


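AWS Snowball is a ruggedized physical storage device that AWS ships to your site. You copy your data onto it, ship it back, and AWS loads the data into Amazon S3. It is used to transfer large amounts of data (terabytes to petabytes) from on-premises systems to AWS when network transfer is too slow or costly.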

 

Key Use Cases:

 

  • Migrating archived media files or database backups
  • Collecting video surveillance footage or industrial data
  • Transporting data from remote locations with poor internet

 

Why it matters:
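When moving a dataset over your network link would take weeks or months, shipping it on Snowball devices is often the faster and cheaper way to bring that data to the cloud. Data on the device is encrypted, and several devices can be used in parallel for larger datasets.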
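
To make "too slow" concrete, the following Python sketch estimates how long a network transfer would take. The 100 TB dataset size, the link speeds, and the 80% usable-bandwidth figure are all illustrative assumptions, not AWS guidance.

```python
def transfer_days(data_terabytes, link_mbps, utilization=0.8):
    """Days needed to push data over a network link.

    utilization is the assumed fraction of the link actually available
    for the transfer (0.8 = 80%).
    """
    bits = data_terabytes * 1e12 * 8               # decimal terabytes -> bits
    effective_bps = link_mbps * 1e6 * utilization  # megabits/s -> usable bits/s
    return bits / effective_bps / 86_400           # seconds -> days


if __name__ == "__main__":
    for mbps in (100, 1_000, 10_000):
        print(f"100 TB over {mbps:>6} Mbps: {transfer_days(100, mbps):6.1f} days")
```

Under these assumptions, 100 TB takes roughly 116 days on a 100 Mbps link and about 12 days on a 1 Gbps link. At the slower speeds, a shipped device becomes the more attractive option.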


3. AWS DMS – Database Migration Service

 

What it is:


AWS DMS helps you migrate data from one database to another (on-prem to AWS or cloud-to-cloud), either as a one-time full load or continuously through ongoing replication, also called change data capture (CDC).

Key Use Cases:

 

  • Migrating an Oracle or SQL Server database to Amazon RDS or Redshift
  • Synchronizing hybrid databases during a cloud transition
  • Supporting real-time data pipelines with minimal downtime

 

Why it matters:
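DMS works with relational databases as well as some NoSQL and data warehouse systems. The source database stays operational during the migration, which keeps downtime to a minimum for the applications that depend on it. The replication process does add some read load on the source, so plan capacity accordingly.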
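
The one-time versus ongoing choice maps to the MigrationType setting of a DMS replication task. Below is a minimal Python sketch that builds the parameters for the boto3 DMS client's create_replication_task call. The helper function name, the task identifier, and the placeholder ARNs are assumptions for illustration. It also assumes the endpoints and replication instance already exist.

```python
import json

# The three migration types DMS accepts for a replication task.
VALID_MIGRATION_TYPES = {"full-load", "cdc", "full-load-and-cdc"}


def build_replication_task(task_id, source_arn, target_arn, instance_arn,
                           schema="%", migration_type="full-load-and-cdc"):
    """Build keyword arguments for DMS CreateReplicationTask (hypothetical helper)."""
    if migration_type not in VALID_MIGRATION_TYPES:
        raise ValueError(f"unknown migration type: {migration_type}")
    # Selection rule: include every table in the chosen schema ("%" is a wildcard).
    table_mappings = {
        "rules": [{
            "rule-type": "selection",
            "rule-id": "1",
            "rule-name": "include-all-tables",
            "object-locator": {"schema-name": schema, "table-name": "%"},
            "rule-action": "include",
        }]
    }
    return {
        "ReplicationTaskIdentifier": task_id,
        "SourceEndpointArn": source_arn,
        "TargetEndpointArn": target_arn,
        "ReplicationInstanceArn": instance_arn,
        "MigrationType": migration_type,
        "TableMappings": json.dumps(table_mappings),  # the API expects a JSON string
    }


if __name__ == "__main__":
    params = build_replication_task(
        "demo-task",
        "<source-endpoint-arn>",       # placeholder, not a real ARN
        "<target-endpoint-arn>",       # placeholder
        "<replication-instance-arn>",  # placeholder
    )
    print(params["MigrationType"])
    # With real ARNs and credentials:
    #   import boto3
    #   boto3.client("dms").create_replication_task(**params)
```

"full-load" copies the existing data once, "cdc" replicates only ongoing changes, and "full-load-and-cdc" does both, which is the usual choice for a low-downtime cutover.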


Summary
Service   | Data Type            | Scenario                  | Data Flow
Kinesis   | Streaming data       | Real-time logs, sensors   | Real-time
Snowball  | Bulk offline data    | Large physical datasets   | Batch
DMS       | Structured databases | Migrations or replication | One-time or continuous

These tools form the foundation of data collection in AWS, supporting real-time, batch, and hybrid analytics architectures. Understanding when and why to use each is essential for building scalable, cloud-based data pipelines.
