How to pass the Databricks Machine Learning Associate certification?

Number of questions: 45

Type of questions: Multiple choice

Duration: 90 minutes

Passing score: 70%

Where to register for the certification: https://www.webassessor.com/databricks

Expiration: 2 years

Topics covered:

  • Databricks Machine Learning
  • ML workflows
  • Spark ML
  • Scaling ML Models

Practice tests: No practice exams are available yet.

How to prepare for the certification:

Complete the Scalable Machine Learning with Apache Spark course (GitHub repos)

Implementing MLOps in the Databricks Lakehouse (Link)

Getting started with Databricks Machine Learning (Link)

Features you should know before taking the exam:

Databricks Runtime for Machine Learning

AutoML 
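
To get a feel for the AutoML API, here is a minimal sketch of a classification experiment; it assumes a Databricks Runtime for ML cluster, and the table and column names are placeholders.

```python
# Minimal AutoML sketch (assumes Databricks Runtime for ML; names are placeholders)
from databricks import automl

# Hypothetical training table registered in the metastore
train_df = spark.table("default.census_income")

# Launch an AutoML classification experiment on the target column
summary = automl.classify(
    dataset=train_df,
    target_col="income",
    timeout_minutes=30,
)

# The returned summary links to the generated notebooks and MLflow runs
print(summary.best_trial.mlflow_run_id)
```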

Feature Store
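
Be comfortable creating and reusing feature tables. A minimal sketch with the Feature Store client is shown below; the database, table, and column names are placeholders.

```python
# Minimal Feature Store sketch (table and column names are placeholders)
from databricks.feature_store import FeatureStoreClient

fs = FeatureStoreClient()

# Hypothetical feature DataFrame keyed by customer_id
features_df = spark.table("default.raw_customers").select(
    "customer_id", "total_purchases", "days_since_last_order"
)

# Register the DataFrame as a feature table
fs.create_table(
    name="feature_store_demo.customer_features",
    primary_keys=["customer_id"],
    df=features_df,
    description="Basic customer features",
)
```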

MLflow Tracking
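
The pattern to know is logging parameters, metrics, and a model inside a run, roughly as in this sketch (the scikit-learn model and values are illustrative).

```python
# Minimal MLflow Tracking sketch: log params, metrics, and a model in one run
import mlflow
import mlflow.sklearn
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

with mlflow.start_run(run_name="rf_baseline"):
    model = RandomForestRegressor(n_estimators=100, max_depth=5, random_state=42)
    model.fit(X_train, y_train)

    mlflow.log_param("n_estimators", 100)
    mlflow.log_param("max_depth", 5)
    mlflow.log_metric("mse", mean_squared_error(y_test, model.predict(X_test)))
    mlflow.sklearn.log_model(model, artifact_path="model")
```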

MLflow Models

MLflow Model Registry
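
For the Model Registry, know how to register a logged model and transition it between stages; a minimal sketch (the run ID and model name are placeholders):

```python
# Minimal Model Registry sketch (run ID and model name are placeholders)
import mlflow
from mlflow.tracking import MlflowClient

run_id = "<run-id-from-a-tracking-run>"
model_uri = f"runs:/{run_id}/model"

# Create the registered model (or add a new version to it)
model_version = mlflow.register_model(model_uri, name="churn_classifier")

# Promote the new version to Staging
client = MlflowClient()
client.transition_model_version_stage(
    name="churn_classifier",
    version=model_version.version,
    stage="Staging",
)
```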

Exploratory Data Analysis
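
Expect questions on summary statistics and simple outlier handling; a minimal sketch on a Spark DataFrame (the table and column names are placeholders):

```python
# Minimal EDA sketch: summary statistics and a simple outlier filter
from pyspark.sql import functions as F

df = spark.table("default.census_income")  # hypothetical table

# Count, mean, stddev, min, quartiles, max for a numeric column
df.select("age").summary().show()

# Drop rows more than three standard deviations from the mean
stats = df.select(F.mean("age").alias("mu"), F.stddev("age").alias("sigma")).first()
cleaned = df.filter(F.abs(F.col("age") - stats["mu"]) <= 3 * stats["sigma"])
```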

Feature Engineering with scikit-learn
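
The exam focuses on missing-value imputation and one-hot encoding; a minimal scikit-learn sketch (the columns and values are illustrative):

```python
# Minimal scikit-learn feature-engineering sketch: impute, then one-hot encode
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

df = pd.DataFrame({
    "age": [34.0, np.nan, 29.0],
    "country": ["FR", "US", np.nan],
})

numeric = Pipeline([("impute", SimpleImputer(strategy="median"))])
categorical = Pipeline([
    ("impute", SimpleImputer(strategy="most_frequent")),
    ("encode", OneHotEncoder(handle_unknown="ignore")),
])

preprocess = ColumnTransformer([
    ("num", numeric, ["age"]),
    ("cat", categorical, ["country"]),
])

features = preprocess.fit_transform(df)
```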

Feature Engineering with MLlib
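
The same ideas in Spark MLlib use Imputer, StringIndexer, OneHotEncoder, and VectorAssembler; a minimal sketch (the columns and values are illustrative):

```python
# Minimal MLlib feature-engineering sketch: impute, index, encode, assemble
from pyspark.ml import Pipeline
from pyspark.ml.feature import Imputer, OneHotEncoder, StringIndexer, VectorAssembler

df = spark.createDataFrame(
    [(34.0, "FR"), (None, "US"), (29.0, "FR")],
    ["age", "country"],
)

stages = [
    Imputer(inputCols=["age"], outputCols=["age_imputed"], strategy="median"),
    StringIndexer(inputCol="country", outputCol="country_idx", handleInvalid="keep"),
    OneHotEncoder(inputCols=["country_idx"], outputCols=["country_ohe"]),
    VectorAssembler(inputCols=["age_imputed", "country_ohe"], outputCol="features"),
]

features_df = Pipeline(stages=stages).fit(df).transform(df)
```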

Hyperopt
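
Know how to define a search space and an objective, and how to parallelize trials with SparkTrials; a minimal sketch (the model and parameter ranges are illustrative):

```python
# Minimal Hyperopt sketch: distributed hyperparameter search with SparkTrials
from hyperopt import STATUS_OK, SparkTrials, fmin, hp, tpe
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

X, y = load_diabetes(return_X_y=True)

def objective(params):
    model = RandomForestRegressor(
        n_estimators=int(params["n_estimators"]),
        max_depth=int(params["max_depth"]),
        random_state=42,
    )
    score = cross_val_score(model, X, y, cv=3, scoring="neg_mean_squared_error").mean()
    return {"loss": -score, "status": STATUS_OK}  # Hyperopt minimizes the loss

search_space = {
    "n_estimators": hp.quniform("n_estimators", 50, 300, 50),
    "max_depth": hp.quniform("max_depth", 2, 10, 1),
}

best = fmin(
    fn=objective,
    space=search_space,
    algo=tpe.suggest,
    max_evals=16,
    trials=SparkTrials(parallelism=4),  # run trials in parallel on the cluster
)
```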

Evaluation and Selection
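
For model selection with Spark ML, the key pieces are ParamGridBuilder, an evaluator, and CrossValidator; a minimal sketch (the synthetic data and grid are illustrative):

```python
# Minimal evaluation-and-selection sketch: cross-validated grid search in Spark ML
from pyspark.ml import Pipeline
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.evaluation import BinaryClassificationEvaluator
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.tuning import CrossValidator, ParamGridBuilder

# Synthetic binary-classification data
df = spark.range(200).selectExpr(
    "cast(id % 10 as double) AS x1",
    "cast(id % 3 as double) AS x2",
    "cast(id % 2 as double) AS label",
)

assembler = VectorAssembler(inputCols=["x1", "x2"], outputCol="features")
lr = LogisticRegression(featuresCol="features", labelCol="label")
pipeline = Pipeline(stages=[assembler, lr])

grid = (ParamGridBuilder()
        .addGrid(lr.regParam, [0.01, 0.1])
        .addGrid(lr.elasticNetParam, [0.0, 0.5])
        .build())

cv = CrossValidator(
    estimator=pipeline,
    estimatorParamMaps=grid,
    evaluator=BinaryClassificationEvaluator(labelCol="label"),
    numFolds=3,
)

cv_model = cv.fit(df)  # selects the best parameter combination by AUC
```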

Distributed Linear Regression

Distributed Decision Trees

Pandas API on Spark
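
Pandas API on Spark (the pyspark.pandas module) lets you keep pandas syntax while Spark executes the work; a minimal sketch:

```python
# Minimal Pandas API on Spark sketch: pandas syntax, distributed execution
import pyspark.pandas as ps

# Pandas-on-Spark DataFrame (data is illustrative)
psdf = ps.DataFrame({
    "city": ["Paris", "Paris", "Lyon"],
    "sales": [120, 80, 95],
})

# Typical pandas-style operations run as Spark jobs
summary = psdf.groupby("city")["sales"].sum()

# Convert to a Spark DataFrame when needed
sdf = psdf.to_spark()
```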

Spark ML Modeling
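
Be able to connect the pieces end to end: splitting data, building an estimator/transformer pipeline, training, and evaluating; a minimal sketch with synthetic data:

```python
# Minimal Spark ML modeling sketch: split, pipeline, fit, evaluate
from pyspark.ml import Pipeline
from pyspark.ml.evaluation import RegressionEvaluator
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.regression import LinearRegression

# Synthetic data: label is a linear function of the features
df = spark.range(100).selectExpr(
    "cast(id as double) AS x1",
    "cast(id % 7 as double) AS x2",
    "cast(3 * id + 5 as double) AS label",
)

train_df, test_df = df.randomSplit([0.8, 0.2], seed=42)

# Estimators (LinearRegression) are fit to produce transformers (the model)
pipeline = Pipeline(stages=[
    VectorAssembler(inputCols=["x1", "x2"], outputCol="features"),
    LinearRegression(featuresCol="features", labelCol="label"),
])

model = pipeline.fit(train_df)           # PipelineModel, a transformer
predictions = model.transform(test_df)

rmse = RegressionEvaluator(labelCol="label", metricName="rmse").evaluate(predictions)
print(rmse)
```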

Additional resources:

Architecting MLOps on the Lakehouse

Build Reliable Production Data and ML Pipelines With Git Support

Automate Your Data and ML Workflows With GitHub Actions for Databricks

Save Time on Data and ML Workflows with Repair and Rerun

Feature Store

Model Evaluation in MLflow

AutoML

Build and Query a Delta Lake

Minimally Qualified Candidate:

  • Use Databricks Machine Learning and its capabilities within machine learning workflows, including:
    • Databricks Machine Learning (clusters, Repos, Jobs)
    • Databricks Runtime for Machine Learning (basics, libraries)
    • AutoML (classification, regression, forecasting)
    • Feature Store (basics)
    • MLflow (Tracking, Models, Model Registry)
  • Implement correct decisions in machine learning workflows, including:
    • Exploratory data analysis (summary statistics, outlier removal)
    • Feature engineering (missing value imputation, one-hot-encoding)
    • Tuning (hyperparameter basics, hyperparameter parallelization)
    • Evaluation and selection (cross-validation, evaluation metrics)
  • Implement machine learning solutions at scale using Spark ML and other tools, including:
    • Distributed ML Concepts
    • Spark ML Modeling APIs (data splitting, training, evaluation, estimators vs. transformers, pipelines)
    • Hyperopt
    • Pandas API on Spark
    • Pandas UDFs and Pandas Function APIs
  • Understand advanced scaling characteristics of classical machine learning models, including:
    • Distributed Linear Regression
    • Distributed Decision Trees
    • Ensembling Methods (bagging, boosting)

Article written by Youssef Mrini