What’s new in Databricks for October 2023

BI & Data Warehousing with your Lakehouse

Lakeview Dashboards are in public preview!

Lakeview Dashboards offer a new dashboarding experience, optimized for ease of use, broad distribution, governance and security.

In addition to a brand-new UX that makes it easier to plot insights, Lakeview Dashboards can be shared with users outside of your organization.

Create your first Dashboard now (video)

Governance and Unity Catalog

Discover and Organize your data in your Lakehouse

Building your semantic layer is getting easier. AI-Generated table comments automatically describe your data assets.

These comments also improve the new semantic search capabilities, letting you ask questions about your lakehouse in plain text (e.g., "List all the tables related to football").

Track your compute resources: Clusters and node types available as System Tables

System tables offer more insight into your lakehouse usage in plain SQL. They are available for:

  • Audit logs
  • Table and column lineage
  • Billable usage
  • Pricing
  • Clusters and node types
  • Marketplace listing access
  • Predictive Optimization

For more information: Databricks Documentation or install the System tables demo with dbdemos
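As a sketch of what querying these tables looks like, here is a query against the new compute system tables, run through the Databricks SQL connector (`pip install databricks-sql-connector`). The hostname, HTTP path, and token are placeholders, and the column names are taken from the system-tables docs — verify them against your metastore.

```python
# Query joining the new compute system tables: clusters and node types.
CLUSTERS_QUERY = """
SELECT c.cluster_id,
       c.cluster_name,
       c.worker_node_type,
       n.core_count,
       n.memory_mb
FROM system.compute.clusters AS c
JOIN system.compute.node_types AS n
  ON c.worker_node_type = n.node_type
WHERE c.delete_time IS NULL   -- only clusters that still exist
"""

def fetch_clusters(server_hostname: str, http_path: str, access_token: str):
    """Run the query against a SQL warehouse and return the rows."""
    from databricks import sql  # third-party: databricks-sql-connector
    with sql.connect(server_hostname=server_hostname,
                     http_path=http_path,
                     access_token=access_token) as conn:
        with conn.cursor() as cur:
            cur.execute(CLUSTERS_QUERY)
            return cur.fetchall()
```

The same query can of course run directly in a notebook or the SQL editor without the connector.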

Ingestion and performance

10x faster DML Delta queries with Deletion Vectors (update, delete, merge)

Deletion vectors are going GA! Updating content in your tables no longer requires the engine to rewrite entire data files (avoiding write amplification). Instead, Delta Lake records deleted or updated rows as separate information, making these operations up to 10x faster!

Deletion vectors are part of Predictive I/O, bringing AI to your Lakehouse for faster queries: See Predictive I/O documentation.

Deletion vectors will be enabled by default starting with DBR 14 (the default behavior can be changed in your workspace settings).
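Until then, deletion vectors can be toggled per table through the `delta.enableDeletionVectors` table property. A small helper sketching the DDL (table name is a placeholder):

```python
def deletion_vectors_ddl(table: str, enabled: bool = True) -> str:
    """Build the ALTER TABLE statement that toggles deletion vectors
    for a single Delta table via the delta.enableDeletionVectors
    table property."""
    flag = "true" if enabled else "false"
    return (f"ALTER TABLE {table} "
            f"SET TBLPROPERTIES ('delta.enableDeletionVectors' = '{flag}')")

print(deletion_vectors_ddl("main.sales.orders"))
```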

Predictive Optimization: Faster queries and cheaper storage

Predictive Optimization leverages Unity Catalog and Lakehouse AI to determine the best optimizations to perform on your data, and then runs those operations on purpose-built serverless infrastructure (VACUUM, OPTIMIZE…). This significantly simplifies your lakehouse journey, freeing up your time to focus on getting business value from your data.

Set the Predictive optimization field in Account console > Settings > Feature Enablement

In just a click, you’ll get the power of AI-optimized data layouts across your Unity Catalog managed tables, making your data faster and more cost-effective.

Note: Predictive Optimization metrics are available as system tables (e.g., "Which tables have been optimized recently?")
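Beyond the account-level toggle, Predictive Optimization can also be enabled per schema in SQL, and its recent activity inspected via a system table. A sketch (catalog/schema names are placeholders, and the history table and column names are taken from the system-tables docs — verify them in your workspace):

```python
# Enable Predictive Optimization on one schema.
ENABLE_PO = "ALTER SCHEMA main.sales ENABLE PREDICTIVE OPTIMIZATION"

# Inspect what it has done in the last week.
PO_HISTORY_QUERY = """
SELECT table_name,
       operation_type,          -- e.g. OPTIMIZE or VACUUM
       start_time,
       end_time
FROM system.storage.predictive_optimization_operations_history
WHERE start_time >= current_date() - INTERVAL 7 DAYS
ORDER BY start_time DESC
"""
```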

For more information, see the Predictive Optimization documentation.

ML & AI + LLMs

Foundation LLM models available in the Marketplace

Llama 2 foundation chat models are now available in the Databricks Marketplace for fine-tuning and deployment on private model serving endpoints.

Each model is wrapped in MLflow and saved within Unity Catalog, making it easy to use the MLflow evaluation in notebooks and to deploy with a single click on LLM-optimized GPU model serving endpoints.
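Because the model lands in Unity Catalog, loading it is just a matter of pointing MLflow at the UC registry and using the three-level name. A minimal sketch — the catalog and schema below are placeholders for wherever the Marketplace listing was installed:

```python
# Three-level Unity Catalog name of the Marketplace model (placeholder).
MODEL_URI = "models:/marketplace_llm.models.llama_2_7b_chat/1"

def load_chat_model():
    """Load the registered model as a generic pyfunc for scoring."""
    import mlflow
    # Target the Unity Catalog model registry, not the legacy
    # workspace registry.
    mlflow.set_registry_uri("databricks-uc")
    return mlflow.pyfunc.load_model(MODEL_URI)
```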

Deploy private LLMs using Databricks Model Serving

These endpoints come pre-configured with GPUs and are optimized to serve foundation models, providing the best cost/performance ratio. This allows you to build and deploy GenAI applications from data ingestion and fine-tuning to model deployment and monitoring, all on a single platform. Watch the video.

Try deploying LLM models now!
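Deployment can also be scripted against the Serving Endpoints REST API (`POST /api/2.0/serving-endpoints`). A sketch of the request body — the endpoint name, model name, version, and workload sizes are placeholders to adapt to your registered model:

```python
import json

# Hypothetical endpoint definition for GPU-accelerated model serving.
ENDPOINT_CONFIG = {
    "name": "llama2-chat",
    "config": {
        "served_models": [
            {
                "model_name": "marketplace_llm.models.llama_2_7b_chat",
                "model_version": "1",
                "workload_type": "GPU_MEDIUM",   # GPU-backed serving
                "workload_size": "Small",
                "scale_to_zero_enabled": True,
            }
        ]
    },
}

print(json.dumps(ENDPOINT_CONFIG, indent=2))
```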

Other updates

Unity Catalog: UCX – Unity Catalog Upgrade Toolkit

Need some help upgrading your data assets to Unity Catalog? Try the new Databricks Labs project. Explore the GitHub repo or get started with the video.

Unity Catalog: Workspace-Catalog binding in Unity Catalog

While a metastore can be shared across multiple workspaces, you can now bind a catalog to specific workspaces, preventing it from being read or written from other workspaces (e.g., the "Development" workspace can only READ the "prod" catalog).
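Bindings can be managed through the UI or the Unity Catalog REST API (`PATCH /api/2.1/unity-catalog/bindings/catalog/<catalog_name>`). A sketch of the request body — the workspace ID is a placeholder, and note that the catalog's isolation mode must also be set to ISOLATED for bindings to take effect:

```python
# Grant the "Development" workspace read-only access to the catalog.
BINDING_REQUEST = {
    "add": [
        {
            "workspace_id": 1234567890,  # placeholder workspace ID
            "binding_type": "BINDING_TYPE_READ_ONLY",  # READ, not WRITE
        }
    ]
}
```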

Watch the recording to get started

Compute: Libraries are now supported in compute policies

If you are a workspace admin, you can now add libraries to compute policies. Compute resources that use the policy will automatically install those libraries, and users can't install or uninstall policy-scoped libraries on that compute. Read the cluster policies documentation.
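A sketch of what such a policy looks like as an API payload — the policy rules and package names are examples, and the field shapes follow the cluster policies API, so treat this as a sketch rather than a definitive schema:

```python
import json

# Cluster policy that pins a Spark version and ships libraries with
# every cluster created from it. The "libraries" block is the new part.
POLICY = {
    "name": "analytics-policy",
    # Policy rules are passed as a JSON-encoded string.
    "definition": json.dumps({
        "spark_version": {"type": "fixed", "value": "14.1.x-scala2.12"},
    }),
    "libraries": [
        {"pypi": {"package": "great-expectations==0.17.19"}},
        {"jar": "dbfs:/FileStore/jars/internal-utils.jar"},
    ],
}
```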

Workflows: Pass parameters in Databricks jobs and if/else conditions

You can now add parameters to your Databricks jobs that are automatically passed to all job tasks that accept key-value pairs. Additionally, you can now use an expanded set of value references to pass context and state between job tasks. Read the documentation for parameters.
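A sketch of a Jobs API settings fragment combining both features: job-level parameters inherited by every task, and an if/else condition task branching on one of them. Task keys and the notebook path are placeholders:

```python
JOB_SETTINGS = {
    "name": "nightly-load",
    # Job-level parameters, available to every task.
    "parameters": [
        {"name": "env", "default": "dev"},
        # Dynamic value reference resolved at run time.
        {"name": "run_date", "default": "{{job.start_time.iso_date}}"},
    ],
    "tasks": [
        {
            # If/else branch on a job parameter.
            "task_key": "check_env",
            "condition_task": {
                "op": "EQUAL_TO",
                "left": "{{job.parameters.env}}",
                "right": "prod",
            },
        },
        {
            # Runs only when the condition above is true.
            "task_key": "load_prod",
            "depends_on": [{"task_key": "check_env", "outcome": "true"}],
            "notebook_task": {"notebook_path": "/Repos/etl/load"},
        },
    ],
}
```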

DAB: Databricks Asset Bundles

Bundles, for short, facilitate the adoption of software engineering best practices, including source control, code review, testing, and continuous integration and delivery (CI/CD).

Demo Center: Databricks Asset Bundles Demo

In a nutshell

  • Databricks Runtime 14.1 is GA: Link
  • You can run selected cells in a notebook.
  • Structured Streaming from Apache Pulsar on Databricks: Link
  • Declare temporary variables in a session which can be set and then referred to from within queries: Link
  • Arguments are explicitly assigned to parameters using the parameter names published by the function: Link
  • Feature Engineering (Feature Store) in Unity Catalog is GA: Link
  • On-demand feature computation is GA.  ML features can be computed on-demand at inference time: Link
  • Structured Streaming can perform streaming reads from views registered with Unity Catalog: Link
  • Databricks AutoML Generated Notebooks are now saved as ML Artifacts: Link
  • Models in Unity Catalog is GA: Link
  • You can now drop some table features for Delta tables. Current support includes dropping deletionVectors and v2Checkpoint: Link
  • Partner Connect now supports Dataiku, RudderStack, and Monte Carlo.
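The session-variable bullet above is easiest to see in SQL. A sketch of the lifecycle — declare, reassign, then reference from a query (table and variable names are illustrative):

```python
# New SQL session variables: scoped to the session, settable, and
# usable anywhere an expression is allowed.
STATEMENTS = [
    "DECLARE VARIABLE cutoff DATE DEFAULT DATE'2023-10-01'",
    "SET VAR cutoff = DATE'2023-10-15'",
    "SELECT * FROM main.sales.orders WHERE order_date >= cutoff",
]

for stmt in STATEMENTS:
    print(stmt)
```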

