What’s new in Databricks for August 2022

Governance

  • Unity Catalog is now GA
  • Delta Sharing is now GA
  • Identity Federation is now GA

Databricks SQL

Serverless SQL Warehouse:

  • Worker nodes are private, which means they do not have public IP addresses.
  • Serverless warehouses use a private network for communication between the Databricks control plane and the Serverless data plane, which are both in the Databricks AWS account.
  • When reading from or writing to AWS S3 buckets in the same region as your workspace, Serverless SQL warehouses now access S3 directly through AWS gateway endpoints. This applies when a Serverless warehouse reads from or writes to your workspace’s root S3 bucket in your AWS account and to other S3 data sources in the same region.

Delta Live Tables

Update selected tables in DLT Pipeline:

You can now start an update for only selected tables in a Delta Live Tables pipeline. Refreshing only the tables you choose speeds up pipeline testing and error resolution. See Start a pipeline update for selected tables.
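
As a rough illustration of a selective update, the sketch below builds the JSON body for a pipeline update request. The field names `refresh_selection` and `full_refresh_selection` are assumed to match the Delta Live Tables pipelines REST API, and the helper `build_update_request` and the table names are hypothetical; check the API reference before relying on it.

```python
import json
from typing import Dict, Iterable, List


def build_update_request(
    refresh: Iterable[str] = (),
    full_refresh: Iterable[str] = (),
) -> Dict[str, List[str]]:
    """Build the body for a selective pipeline update (hypothetical helper).

    The keys are assumed to be the ones the pipelines API accepts for
    starting an update on selected tables only.
    """
    body: Dict[str, List[str]] = {}
    refresh_list = list(refresh)
    full_refresh_list = list(full_refresh)
    if refresh_list:
        # Tables to refresh incrementally.
        body["refresh_selection"] = refresh_list
    if full_refresh_list:
        # Tables to recompute from scratch.
        body["full_refresh_selection"] = full_refresh_list
    return body


# Example: refresh one table and fully recompute another (names are made up).
payload = build_update_request(
    refresh=["sales_cleaned"],
    full_refresh=["sales_daily_summary"],
)
print(json.dumps(payload, indent=2))
```

The resulting body would be posted to the update endpoint of the pipeline; the request itself is left out here because the URL and authentication depend on your workspace.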

Generated Columns:

You can now use generated columns when you define tables in your Delta Live Tables pipelines. Generated columns are supported by the Delta Live Tables Python and SQL interfaces.
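
A minimal sketch of what a generated column looks like in a table schema. The helper `column_ddl` and the column names are hypothetical; the `GENERATED ALWAYS AS (...)` clause is the Delta Lake syntax for generated columns, and the resulting string is the kind of value you would pass as the `schema` argument of a Delta Live Tables Python table definition.

```python
from typing import Optional


def column_ddl(name: str, dtype: str, generated_as: Optional[str] = None) -> str:
    """Return one column definition, optionally as a generated column."""
    ddl = f"{name} {dtype}"
    if generated_as is not None:
        # Delta Lake computes this column from the given expression.
        ddl += f" GENERATED ALWAYS AS ({generated_as})"
    return ddl


# Hypothetical schema: order_date is derived from order_datetime.
SALES_SCHEMA = ",\n".join(
    [
        column_ddl("customer_id", "STRING"),
        column_ddl("order_datetime", "TIMESTAMP"),
        column_ddl("order_date", "DATE", generated_as="CAST(order_datetime AS DATE)"),
    ]
)
print(SALES_SCHEMA)

# Inside a pipeline notebook (not runnable outside Databricks) the string
# would be used roughly like this:
#
#   import dlt
#
#   @dlt.table(schema=SALES_SCHEMA)
#   def sales():
#       return spark.read.table("raw_sales")  # "raw_sales" is made up
```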

Cluster mode:

You can now select a cluster mode, either autoscaling or fixed size, directly in the Delta Live Tables UI when you create a pipeline. See Create a pipeline.

Reduced message volume in DLT:

The state transitions for live tables in a Delta Live Tables continuous pipeline are displayed in the UI only until the tables enter the running state. Any transitions related to successful recomputation of the tables are not displayed in the UI, but are available in the Delta Live Tables event log at the METRICS level. Any transitions to failure states are still displayed in the UI.

Security 

Enhanced Security Monitoring:

Enhanced security monitoring provides a hardened disk image (a CIS-hardened Ubuntu Advantage AMI) and additional security monitoring agents that generate logs you can review. See Enhanced security monitoring.

Compliance:

The compliance controls for FedRAMP Moderate, PCI-DSS, and HIPAA are now GA.

AWS PrivateLink Connectivity:

  • You can use the account console to create or update a workspace with PrivateLink connectivity.
  • AWS VPC endpoints to Databricks automatically and quickly transition to the Available state.
  • For a new VPC endpoint, you can now enable private DNS during creation rather than in extra steps afterward. This simplifies enabling PrivateLink in the AWS Console as well as with Terraform and other automation tools.

Share VPC endpoints:

You can now share AWS VPC endpoints among multiple Databricks accounts. You must register the AWS VPC endpoints in each Databricks account. See Enable AWS PrivateLink.

Workflows 

Clusters:

When a cluster is starting, your Databricks jobs now wait for cluster libraries to finish installing before they execute. See Dependent libraries.

Orchestrate dbt on Workflows:

You can run your dbt Core project as a task in a Databricks job with the new dbt task, which lets you include your dbt transformations in a data processing workflow. See Use dbt in a Databricks job.
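
For orientation, here is a sketch of a job definition that includes a dbt task, written as a Python dictionary in the shape of a Jobs API 2.1 request body. The field names (`git_source`, `dbt_task`, `commands`, `libraries`) are assumed to match that API, and the helper `build_dbt_job`, the repository URL, the branch, and the cluster ID are all placeholders.

```python
import json
from typing import Any, Dict, Sequence


def build_dbt_job(
    name: str,
    git_url: str,
    commands: Sequence[str],
    cluster_id: str,
) -> Dict[str, Any]:
    """Build a job spec with a single dbt task (hypothetical helper)."""
    return {
        "name": name,
        # The dbt project is assumed to live in a Git repository.
        "git_source": {
            "git_url": git_url,
            "git_provider": "gitHub",
            "git_branch": "main",
        },
        "tasks": [
            {
                "task_key": "dbt_run",
                # dbt CLI commands, run in order.
                "dbt_task": {"commands": list(commands)},
                "existing_cluster_id": cluster_id,
                # The dbt adapter is installed as a task library.
                "libraries": [{"pypi": {"package": "dbt-databricks"}}],
            }
        ],
    }


job_spec = build_dbt_job(
    name="nightly-dbt",
    git_url="https://github.com/example-org/example-dbt-project",  # placeholder
    commands=["dbt deps", "dbt run"],
    cluster_id="0000-000000-example",  # placeholder
)
print(json.dumps(job_spec, indent=2))
```

Additional tasks, such as an ingestion notebook that runs before the dbt task, would be added to the same `tasks` list so that the transformations run as one step of a larger workflow.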

Platform 

Monaco-based code editor:

A new Monaco-based code editor is available for Python notebooks. It includes parameter type hints, object inspection on hover, code folding, multi-cursor support, column (box) selection, and side-by-side diffs in the notebook revision history.

Workspace Search:

You can now search for notebooks, libraries, folders, files, and repos by name. You can also search for content within a notebook and see a preview of the matching content. Search results can be filtered by type. See Search workspace for an object.

Machine Learning

Feature Store:

Feature Store now supports BooleanType for automatic feature lookup, and automatic feature lookup now works with Serverless Real-Time Inference. See Serverless Real-Time Inference.

Serving:

Serverless Real-Time Inference processes your machine learning models using MLflow and exposes them as REST API endpoints. It uses Serverless compute, which means that the endpoints and associated compute resources are managed and run in the Databricks cloud account.
