Introduction to Databricks Photon

What’s Photon? Photon is a vectorized query engine written in C++ that leverages the data-level and instruction-level parallelism available in CPUs. It’s 100% compatible with Apache Spark APIs, which means you don’t have to rewrite your existing code (SQL, Python, R, Scala) to benefit from its advantages. Photon is an ANSI-compliant engine, it […]

Azure Databricks: Set up SCIM in the Account Console

Requirements: Your Azure Databricks account must have the Azure Databricks Premium Plan. Your Azure Active Directory account must be a Premium edition account. You must be a global administrator for the Azure Active Directory account. Useful links: https://learn.microsoft.com/en-us/azure/databricks/administration-guide/users-groups/scim/

How to upgrade your Hive metastore tables to Unity Catalog Using Sync

Before you begin, you must have: A storage credential with an IAM role that authorizes Unity Catalog to access the table’s location path. An external location that references the storage credential you just created and the path to the data on your cloud tenant. CREATE EXTERNAL TABLE permission on the external location of the table […]

Step-by-step guide to set up Unity Catalog in Azure

You must be an Azure Databricks account admin. The first Azure Databricks account admin must be an Azure Active Directory Global Administrator at the time that they first log in to the Azure Databricks account console. Upon first login, that user becomes an Azure Databricks account admin and no longer needs the Azure Active Directory […]

What’s new in Databricks for March 2023

Platform: You can now work with non-notebook files in Databricks (py, md, csv, txt, and log files). See https://docs.databricks.com/files/workspace.html for more information. Databricks Runtime 12.2 standard and ML are GA. The SQL admin console has been combined with the general admin console to create a unified experience for admin users. Support for Jupyter notebooks is available […]

Databricks Workflows

Databricks Workflows is a fully managed orchestration service that’s integrated with the Databricks Lakehouse Platform. It removes operational overhead so you can focus on your workflows rather than on managing infrastructure. Databricks Workflows is reliable: it has a proven track record of running millions of production workloads daily. Databricks […]

What’s new in Databricks for February 2023

Platform: Serverless Real-Time Inference exposes your MLflow ML models as a REST API endpoint. Databricks Terraform provider updated to version 1.10.1. With the variable explorer in Databricks notebooks, you can directly observe current Python variables and their values in the notebook UI (requires DBR 12.x). The Databricks extension for Visual Studio Code lets developers leverage the powerful authoring […]

What’s new in Databricks for January 2023

Platform: We have added left and right sidebars to the Databricks notebook. You can now find the notebook’s table of contents in the left-hand sidebar, and you can find comments, MLflow experiments, and the notebook revision history in the right-hand sidebar. Databricks integration with Confluent Schema Registry now supports external schema registry addresses with authentication. […]

What’s new in Databricks for December 2022

Platform: The workspace administrator setting to disable the upload data UI now applies to the new upload data UI. Memory profiling is now enabled for PySpark UDFs (Runtime 12). Jobs are now available in global search. DBR 12.0 and 12.0 ML are GA. Databricks Terraform provider updated to version 1.7.0. Databricks ODBC and JDBC drivers were […]

What’s new in Databricks for November 2022

Platform: You can filter by job name with the List all jobs operation (GET /jobs/list) in the Jobs API. Databricks Terraform provider updated to version 1.6.4 to add the warehouse_type parameter to the databricks_sql_endpoint resource, supporting additional Databricks SQL warehouse types, and more. When you click the Search field in the top bar of […]