AWS Data Pipelines
AWS Tutorial: Data Pipelines & ETL
Welcome to the Data Pipelines lesson. Moving terabytes of raw data, transforming it into a usable format, and loading it into a database is a massive engineering challenge.
Why Learn Data Pipelines?
If your data is fragmented across different databases and external APIs, it is difficult to run analytics across all of it. ETL (Extract, Transform, Load) pipelines let you consolidate your company's data into a single, clean source of truth.
Tutorial Overview
In this tutorial, you will learn the core data movement services:
- AWS Glue: Managed ETL service.
- Amazon Kinesis: Real-time streaming data.
Extract, Transform, Load (ETL)
- AWS Glue: A fully managed, serverless data integration and ETL service. It discovers, prepares, and combines data for analytics. For example, Glue can extract raw, inconsistently formatted JSON files from Amazon S3, transform them into clean tables, and load them into Amazon Redshift for reporting.
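To make the extract, transform, and load stages concrete, here is a minimal sketch in plain Python. It is not Glue code and does not use any AWS API: the in-memory list `RAW_LINES` stands in for JSON files in S3, a local SQLite database stands in for Redshift, and the `orders` table and its field names are hypothetical, chosen only for illustration.

```python
import json
import sqlite3

# Hypothetical raw records, standing in for messy JSON files in S3.
RAW_LINES = [
    '{"order_id": "1001", "amount": "19.99", "country": " us "}',
    '{"order_id": "1002", "amount": "5", "country": "DE"}',
    'not valid json',
    '{"order_id": "1003", "country": "fr"}',
]


def extract(lines):
    """Extract: parse each line as JSON, skipping lines that fail to parse."""
    for line in lines:
        try:
            yield json.loads(line)
        except json.JSONDecodeError:
            continue


def transform(records):
    """Transform: drop incomplete records, then normalize types and casing."""
    for rec in records:
        if "order_id" not in rec or "amount" not in rec:
            continue
        country = rec.get("country", "").strip().upper()
        yield (int(rec["order_id"]), float(rec["amount"]), country)


def load(rows, conn):
    """Load: write the cleaned rows into a table (SQLite stands in for Redshift)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, amount REAL, country TEXT)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


def run_pipeline(lines, conn):
    """Chain the three stages: extract -> transform -> load."""
    load(transform(extract(lines)), conn)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    run_pipeline(RAW_LINES, conn)
    query = "SELECT order_id, amount, country FROM orders ORDER BY order_id"
    print(conn.execute(query).fetchall())
    # → [(1001, 19.99, 'US'), (1002, 5.0, 'DE')]
```

The unparseable line and the record with no `amount` are dropped during the first two stages, so only clean rows reach the table. A managed service like Glue runs the same three stages at much larger scale without you provisioning servers.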
Real-Time Streaming Data
- Amazon Kinesis: What if you need to process data as soon as it is generated (like live stock market prices or website clickstreams)? Kinesis lets you collect, process, and analyze streaming data in real time, so you can react to new information before it goes stale.
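As a rough illustration of the "collect" step, the sketch below sends a single clickstream event to a Kinesis data stream using the `put_record` call that boto3's Kinesis client provides (`StreamName`, `Data`, `PartitionKey`). The stream name `clickstream-demo`, the event fields, and the `RecordingClient` class are assumptions made for this example. `RecordingClient` is an offline stand-in that only records calls and returns a fake response, so the sketch runs without AWS credentials.

```python
import json


def send_event(client, stream_name, event):
    """Send one event to a Kinesis data stream through the client's put_record call."""
    return client.put_record(
        StreamName=stream_name,
        # Kinesis record payloads are bytes, so serialize the event to UTF-8 JSON.
        Data=json.dumps(event).encode("utf-8"),
        # Records that share a partition key are routed to the same shard.
        PartitionKey=str(event["user_id"]),
    )


class RecordingClient:
    """Offline stand-in for a real Kinesis client; it only records calls."""

    def __init__(self):
        self.calls = []

    def put_record(self, **kwargs):
        self.calls.append(kwargs)
        # Fake response for illustration; it only mimics the key names of a real one.
        return {
            "ShardId": "shardId-000000000000",
            "SequenceNumber": str(len(self.calls)),
        }


if __name__ == "__main__":
    client = RecordingClient()
    # Hypothetical clickstream event.
    send_event(client, "clickstream-demo", {"user_id": 42, "page": "/pricing"})
    print(client.calls[0]["PartitionKey"])  # → 42
```

To talk to a real stream, you would pass `boto3.client("kinesis")` in place of `RecordingClient()`, assuming boto3 is installed, credentials are configured, and the stream already exists.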