Getting started with Databricks

You have probably heard about Databricks somewhere, whether on LinkedIn, Twitter, blogs, or articles.

If you are not familiar with the Databricks Lakehouse, it’s unique in three ways:

  1. It’s simple. Data only needs to exist once to support all of your data workloads on one common platform.
  2. It’s open. It’s based on open source and open standards, which makes it easy to work with existing tools and avoid proprietary formats.
  3. It’s collaborative. Your data engineers, analysts, and data scientists can work together much more easily.

This is why Databricks describes itself as the data and AI company: it aims to cover all of these workloads with a single solution.

If you want to get started with Databricks, you have plenty of options:

  1. Databricks Academy

It’s the training platform developed by Databricks, where the company publishes courses on its products.

Inside Databricks Academy you will find two types of courses:

  • Self-paced

You can follow the course at your own pace. You get the material presented by the trainer as well as the notebooks.

  • Instructor-led training

Live courses delivered by instructors for a limited number of people.

——————————————————————————————————————–

The self-paced courses are free for Databricks customers and partners.

Finding out whether you have free access is easy:

  1. Navigate to this link

  2. Register using your professional email address

If your company is a Databricks partner or customer, you will have free access to all the courses except the instructor-led training.

You have nothing to lose, so give it a try; you might be surprised. Databricks has thousands of customers and partners.

  2. Microsoft Learn

It’s an online training platform that provides interactive learning for Microsoft products and more.

  1. If you are a data engineer, this course is made for you:

https://docs.microsoft.com/en-us/learn/paths/data-engineer-azure-databricks/

This course covers the following chapters:

  • Describe Azure Databricks
  • Spark Architecture Fundamentals
  • Read and Write in Azure Databricks
  • Work with DataFrames in Azure Databricks
  • Describe lazy evaluation and other performance features in Azure Databricks
  • Work with DataFrames columns in Azure Databricks
  • Work with DataFrames advanced methods
  • Describe platform architecture security and data protection
  • Build and query Delta Lake
  • Process Streaming Data
  • Describe Azure Databricks Delta Lake Architecture
  • Describe Azure Databricks best practices

  2. If you are a data scientist, this course is made for you:

https://docs.microsoft.com/en-us/learn/paths/perform-data-science-azure-databricks/

This course covers the following chapters:

  • Perform machine learning with Azure Databricks
  • Train a machine learning model
  • Work with MLflow
  • Perform model selection with hyperparameter tuning
  • Deep learning with Horovod for distributed training

  3. Coursera

It’s a well-known online course provider that covers a very wide range of subjects.

  1. If you are a data engineer, these courses are made for you:

https://www.coursera.org/learn/microsoft-azure-databricks-for-data-engineering (created by Microsoft)

https://www.coursera.org/learn/apache-spark-sql-for-data-analysts (created by Databricks)

  2. If you are a data scientist, these courses are made for you:

https://www.coursera.org/learn/applied-data-science-for-data-analysts (created by Databricks)

https://www.coursera.org/specializations/compstats (created by Databricks)

Bonus

  • You can download the book Learning Spark: Lightning-Fast Data Analytics (Jules Damji, Brooke Wenig, Tathagata Das, Denny Lee) for free: Link
  • You can download the early release of The Definitive Guide to Delta Lake (O'Reilly) for free: Link
  • We also have a large community where you can ask questions and browse the answers to questions that have already been posted: https://community.databricks.com/s/
  • The Databricks Community Edition is the free version of our cloud-based big data platform. Its users can access a micro-cluster as well as a cluster manager and a notebook environment. All users can share their notebooks and host them free of charge with Databricks: https://community.cloud.databricks.com/login.html

Article written by Youssef Mrini
