Note: The course has been updated with new lectures covering version 3 of the Databricks Data Engineer Associate certification exam.
Whether you’re a seasoned data professional or just starting your journey, this course provides the perfect blend of theory and hands-on examples to ensure your success. With practical exercises and step-by-step guidance, you will learn how to navigate the Data Lakehouse architecture, explore the Data Science and Engineering workspace, and master the powerful Delta Lake.
Becoming a Certified Databricks Data Engineer unlocks endless possibilities in the world of data processing and analytics. In this comprehensive course, you will gain the knowledge and skills to harness the power of the Databricks Lakehouse Platform, empowering you to tackle real-world data challenges with confidence and efficiency.
Here’s a breakdown of the topics covered in this course:
- Databricks Lakehouse Platform:
  - Databricks user interface
  - Notebooks
  - Connecting to a repository / CI/CD
  - All-purpose and job clusters
  - Accounts and workspaces
  - Data Lakehouse (architecture, descriptions, benefits)
  - Data Science and Engineering workspace (clusters, notebooks, data storage)
  - Delta Lake (general concepts, table management and manipulation, optimizations)
- Data transformation with Apache Spark:
  - Relational entities (databases, tables, views)
  - Extracting data from files
  - Views, temporary views, and CTEs
  - Creating tables, writing data to tables, cleaning data, combining and reshaping tables, SQL UDFs
  - Facilitating Spark SQL with string manipulation and control flow
  - Passing data between PySpark and Spark SQL
  - Using PySpark and SQL for transformations such as count, count_if, removing duplicates, external tables, timestamps, JSON, structs, arrays, CASE WHEN, and many more
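To give a flavor of the Spark SQL covered here, a minimal sketch of a cleaning query (the table and column names `orders` and `amount` are made up for illustration):

```sql
-- Deduplicate a source table into a temporary view, then profile it.
CREATE OR REPLACE TEMPORARY VIEW clean_orders AS
SELECT DISTINCT * FROM orders;  -- remove exact duplicate rows

SELECT
  count(*)                 AS total_rows,
  count_if(amount IS NULL) AS missing_amounts,  -- conditional count
  CASE WHEN count(*) = 0 THEN 'empty' ELSE 'ok' END AS status
FROM clean_orders;
```

The same logic can be driven from PySpark via `spark.sql(...)`, which is how the course moves data between PySpark and Spark SQL.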
- Data management with Delta Lake:
  - Reading files using SQL in Databricks
  - Using CTAS
  - Table constraints, partitions, operations, time travel
  - Optimizing using Z-ordering and vacuum
  - Delta cloning and external tables
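A brief sketch of how these Delta Lake operations look in Databricks SQL (table and column names are illustrative):

```sql
-- CTAS: create a Delta table from a query result.
CREATE TABLE orders_archive AS
SELECT * FROM orders;

-- Time travel: query an earlier version of the table.
SELECT * FROM orders_archive VERSION AS OF 0;

-- Optimize file layout, co-locating rows by a frequently filtered column.
OPTIMIZE orders_archive ZORDER BY (customer_id);

-- Remove data files no longer referenced by the table's transaction log.
VACUUM orders_archive;
```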
- Data pipelines with Delta Live Tables:
  - Structured Streaming (general concepts, triggers, watermarks)
  - Auto Loader (streaming reads)
  - Multi-hop architecture (bronze-silver-gold, streaming applications)
  - Delta Live Tables (benefits and features)
  - Change Data Capture
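As a taste of the multi-hop pattern, a bronze-to-silver hop in Delta Live Tables SQL (the paths, table names, and columns are made up for illustration):

```sql
-- Bronze: ingest raw JSON files incrementally with Auto Loader.
CREATE OR REFRESH STREAMING LIVE TABLE orders_bronze
AS SELECT * FROM cloud_files("/mnt/raw/orders", "json");

-- Silver: stream from bronze, cast types, and apply a basic quality filter.
CREATE OR REFRESH STREAMING LIVE TABLE orders_silver
AS SELECT order_id, CAST(amount AS DOUBLE) AS amount
   FROM STREAM(LIVE.orders_bronze)
   WHERE order_id IS NOT NULL;
```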
- Building production pipelines / workloads:
  - Jobs (scheduling, task orchestration, UI, CRON)
  - Job notifications and history
  - Dashboards (endpoints, scheduling, alerting, refreshing)
- Unity Catalog and entity permissions:
  - Unity Catalog (benefits and features)
  - Entity permissions (team-based permissions, user-based permissions)
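Entity permissions in Unity Catalog are managed with SQL `GRANT`/`REVOKE` statements against its three-level namespace (catalog.schema.table). A minimal sketch, with made-up catalog, schema, and group names:

```sql
-- Let a team use a catalog, then grant/revoke read access on one table.
GRANT USE CATALOG ON CATALOG main TO `data-engineers`;
GRANT SELECT ON TABLE main.sales.orders TO `analysts`;
REVOKE SELECT ON TABLE main.sales.orders FROM `interns`;
```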
These topics provide comprehensive coverage of the Databricks Lakehouse Platform and its tools, giving learners a solid understanding of data engineering concepts and practices using Databricks.