About This Course
In this course, you’ll get an introduction to the fundamental building blocks of big data engineering. You’ll learn the foundational concepts of distributed computing, distributed data processing, data management and data pipelines. You’ll also survey a variety of available data stack technologies and learn how to run a data processing workflow through a commonly used platform.
What You’ll Learn
The fundamentals of modern big data stacks, their uses, advantages and limitations
How functional programming ideas help with building and using systems to store and process big data
The foundations of the Hadoop ecosystem and its successors, such as Spark
The ins and outs of big data processing via multiple paradigms, both storage-bound and in-memory (Spark, Spark SQL, Delta Lake, Hive, SQL)
The origins, uses and limitations of NoSQL stores (HBase, Redis, Elasticsearch, Cassandra, graph-processing systems, etc.)
Get Hands-On Experience
Apply contemporary distributed computing frameworks to the storage, processing and analysis of large data sets
Use the MapReduce model and the Spark framework on big data problems
Apply principles of functional programming to data storage and analysis
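As a rough taste of the hands-on work, the MapReduce model and the functional style mentioned above can be sketched in a few lines of plain Python. This is an illustrative single-machine sketch, not course material: the function names (`map_phase`, `reduce_phase`, `word_count`) and the sample input are assumptions made up for this example, and a real framework such as Hadoop or Spark would distribute the same two phases across many machines.

```python
from collections import Counter
from functools import reduce


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one line of text."""
    return [(word.lower(), 1) for word in line.split()]


def reduce_phase(counts, pair):
    """Reduce: fold one (word, n) pair into the running totals.

    Returns a new Counter instead of mutating the old one, in keeping
    with the functional style.
    """
    word, n = pair
    return counts + Counter({word: n})


def word_count(lines):
    """Run the map phase over every line, then reduce all emitted pairs."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce(reduce_phase, pairs, Counter()))


# Hypothetical sample input, invented for illustration only.
lines = ["big data big ideas", "data pipelines move data"]
counts = word_count(lines)
print(counts)
```

Because neither phase modifies shared state, the lines could be split across workers and the partial counts merged afterwards, which is the property distributed frameworks rely on.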