How to pass the Databricks Certified Data Engineer Associate certification?

Number of questions: 45

Type of questions: Multiple choice

Duration: 90 minutes

Passing score: 70%

Where to register for the certification: https://www.webassessor.com/databricks

Expiration: 2 years
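
To turn the figures above into a concrete target, here is a quick back-of-the-envelope calculation. It assumes every question carries the same weight and that the score is a plain percentage of correct answers, which is an assumption rather than something the exam guide states here.

```python
import math

# Figures taken from the exam facts listed above.
NUM_QUESTIONS = 45
DURATION_MIN = 90
PASSING_PERCENT = 70

# Assumption: all questions are weighted equally.
min_correct = math.ceil(NUM_QUESTIONS * PASSING_PERCENT / 100)
max_wrong = NUM_QUESTIONS - min_correct
minutes_per_question = DURATION_MIN / NUM_QUESTIONS

print(f"Correct answers needed: {min_correct}")
print(f"Questions you can miss: {max_wrong}")
print(f"Average time per question: {minutes_per_question} min")
```

Under that assumption, you need 32 correct answers and have about two minutes per question.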

Topics covered:

  • Databricks Lakehouse Platform
  • ELT with Spark SQL and Python
  • Incremental Data Processing
  • Production Pipelines
  • Data Governance

Practice tests: Link

How to prepare for the certification:

Complete the Data Engineering with Databricks course (Databricks Academy)

Complete the Data Engineering notebooks (Link)

Read the Databricks documentation (recommended)

Features you should know before taking the exam:

Lakehouse

Delta Lake (Time Travel, MERGE, optimization, CTAS, INSERT)

Change Data Feed

Delta Live Tables (DLT + Auto Loader)

APPLY CHANGES INTO with DLT

Structured Streaming

Databricks Repos

Incremental processing (Auto Loader, COPY INTO)

Databricks SQL

Managing clusters

Medallion Architecture

Data Permissions in Unity Catalog

Cluster access modes

Unity Catalog Overview

Additional resources:

Data Engineer Associate Slides

Delta Live Tables video

Data Engineering demo video

Introduction to Unity Catalog

Minimally Qualified Candidate:

The minimally qualified candidate should be able to:

Understand how to use the Databricks Lakehouse Platform and its tools, and the benefits of doing so, including:

  • Data Lakehouse (architecture, descriptions, benefits)
  • Data Science and Engineering workspace (clusters, notebooks, data storage)
  • Delta Lake (general concepts, table management and manipulation, optimizations)

Build ETL pipelines using Apache Spark SQL and Python, including:

  • Relational entities (databases, tables, views)
  • ELT (creating tables, writing data to tables, cleaning data, combining and reshaping tables, SQL UDFs)
  • Python (facilitating Spark SQL with string manipulation and control flow, passing data between PySpark and Spark SQL)
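
As an illustration of the Python item above, here is a minimal sketch of using string manipulation and control flow to generate Spark SQL. The function name `build_count_queries`, the table names, and the `event_date` column are hypothetical, chosen only for this example.

```python
def build_count_queries(tables, min_date=None):
    """Return one COUNT(*) query per table, optionally filtered by date."""
    queries = []
    for table in tables:
        query = f"SELECT COUNT(*) AS row_count FROM {table}"
        # Control flow decides whether the filter clause is appended.
        if min_date is not None:
            query += f" WHERE event_date >= '{min_date}'"
        queries.append(query)
    return queries


# Hypothetical table names, for illustration only.
queries = build_count_queries(["bronze_orders", "silver_orders"], min_date="2023-01-01")
for q in queries:
    print(q)
    # In a Databricks notebook, each string could then be run with spark.sql(q).
```

In a notebook, the generated strings would typically be passed to `spark.sql`, and a DataFrame can be exposed to SQL in the other direction with `createOrReplaceTempView`. Building SQL by interpolation is only safe with trusted values such as the hard-coded names used here.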

Incrementally process data, including:

  • Structured Streaming (general concepts, triggers, watermarks)
  • Auto Loader (streaming reads)
  • Multi-hop Architecture (bronze-silver-gold, streaming applications)
  • Delta Live Tables (benefits and features)
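
The bronze-silver-gold idea listed above can be sketched in a few lines of plain Python. This is a toy illustration with made-up records, not Spark or Delta Live Tables code: in a real pipeline each layer would be a Delta table and the steps would run as streaming or batch jobs.

```python
from collections import defaultdict

# Bronze: raw records kept as ingested (hypothetical sample data).
bronze = [
    {"order_id": "1", "customer": "alice", "amount": "10.50"},
    {"order_id": "2", "customer": "bob", "amount": "not-a-number"},
    {"order_id": "3", "customer": "alice", "amount": "4.50"},
    {"order_id": "3", "customer": "alice", "amount": "4.50"},  # duplicate
]


def to_silver(rows):
    """Silver: validate, cast types, and drop duplicate order IDs."""
    seen, cleaned = set(), []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # discard rows that fail validation
        if row["order_id"] in seen:
            continue  # keep only the first occurrence of each order
        seen.add(row["order_id"])
        cleaned.append(
            {"order_id": int(row["order_id"]), "customer": row["customer"], "amount": amount}
        )
    return cleaned


def to_gold(rows):
    """Gold: business-level aggregate (total amount per customer)."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["customer"]] += row["amount"]
    return dict(totals)


silver = to_silver(bronze)
gold = to_gold(silver)
print(silver)
print(gold)
```

Each layer reads only from the one before it, which is what makes the pattern easy to process incrementally.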

Build production pipelines for data engineering applications and Databricks SQL queries and dashboards, including:

  • Jobs (scheduling, task orchestration, UI)
  • Dashboards (endpoints, scheduling, alerting, refreshing)

Understand and follow best security practices, including:

  • Unity Catalog (benefits and features)
  • Entity Permissions (team-based permissions, user-based permissions)

Article written by Youssef Mrini
