Data Engineering Capstone Project

About This Course

In this capstone course, you'll apply current data engineering approaches by completing a hands-on data engineering project. You'll analyze and explore solutions to complex problems commonly found in real-world applications of Apache Spark’s data processing ecosystem: problems that require comprehensive, specialized knowledge and for which basic techniques would be suboptimal.

WHAT YOU’LL LEARN

  • How to design and implement a data lake for a multichannel retail organization in Azure Data Lake and Azure Databricks using a multi-hop, medallion architecture
  • How to efficiently ingest, transform and land big data workloads using Apache Spark
  • How to build a feature data set for a machine learning model
  • How to diagnose and fix common performance pitfalls in Spark jobs
  • How to design, orchestrate and curate data sets based on business requirements

GET HANDS-ON EXPERIENCE

  • Explore and transform semi-structured data sets at real scale in Azure Databricks using Apache Spark
  • Write Airflow DAGs to orchestrate common data pipeline operations
  • Use open-source Delta Lake to manage your data storage and perform common DDL operations

Course Sessions

Online Synchronous

April 2027
Dates: Apr 8 - May 27
Location: Online
Instructor: Jerry Kuch
Cost: $1,665
Scheduled Meetings

Date           Day   Time         Location
Apr 8, 2027    Thu   6 – 9 p.m.   Online
Apr 15, 2027   Thu   6 – 9 p.m.   Online
Apr 22, 2027   Thu   6 – 9 p.m.   Online
Apr 29, 2027   Thu   6 – 9 p.m.   Online
May 6, 2027    Thu   6 – 9 p.m.   Online
May 13, 2027   Thu   6 – 9 p.m.   Online
May 20, 2027   Thu   6 – 9 p.m.   Online
May 27, 2027   Thu   6 – 9 p.m.   Online

All times are Pacific Time.

Noncredit Course

You'll earn 2.4 continuing education units (CEUs) for successfully completing this course.