What’s new in Databricks for May 2023

Platform 

  • AWS fleet instance types are now available when creating a cluster or pool. A fleet instance type maps to multiple comparable AWS instance types, allowing the cluster to use whichever instance type has the best spot capacity and on-demand availability. For more information, see https://docs.databricks.com/compute/aws-fleet-instances.html (AWS only)
  • The new unified navigation experience is in Public Preview. For more information, see https://docs.databricks.com/workspace/unified-nav.html
  • Cluster-scoped init scripts stored on DBFS are deprecated; store them as workspace files instead. For more information, see https://docs.databricks.com/files/workspace-init-scripts.html
  • You can now run file-based SQL queries in a Databricks workflow.
  • Databricks Runtime 13.1 and Databricks Runtime 13.1 for ML are now available as Beta releases.
  • The compliance security profile now supports more EC2 instance types. For more information, see https://docs.databricks.com/security/privacy/security-profile.html#features
  • New regions: Europe (Paris) and South America (São Paulo) (AWS only)
  • Databricks has released version 2.6.33 of the JDBC driver.
  • Workspaces on the Enterprise pricing tier with the enhanced security and compliance add-on will be able to configure a monthly or biweekly schedule for automatic restarts of compute resources when needed to pick up the latest images and security updates (AWS only). For more information, see https://docs.databricks.com/administration-guide/clusters/scheduled-cluster-updates.html
  • You can now authenticate to Databricks REST APIs using OAuth tokens for service principals. A service principal is an identity that you create in Databricks for use with automated tools, jobs, and applications. Account admins can create a client secret for a service principal. You can then use the client secret together with the client ID (the service principal’s application ID) to request an OAuth token for the service principal. The same OAuth token works for both the account and workspaces, as long as the service principal has the correct access. For more information, see https://docs.databricks.com/dev-tools/authentication-oauth.html. A minimal token-request sketch appears after this list.
  • M7g and R7g Graviton3 instance types are now supported on Databricks (AWS only)
  • AWS provides the Instance Metadata Service (IMDS) API for reading instance metadata from your notebooks. AWS announced IMDS version 2 (IMDSv2), which adds security improvements and a session-oriented flow in which requests are protected by a session token. You can now enforce IMDSv2 with a workspace admin setting that is generally available (AWS only). A sketch of the IMDSv2 flow appears after this list.
  • Cluster-scoped Python libraries are supported on Databricks Runtime 13.1 and above. Support is also available for Python wheels that are uploaded as workspace files, but not for libraries referenced using DBFS file paths, including libraries uploaded to the DBFS root. Non-Python libraries are not supported. See the install sketch after this list.
  • Azure Databricks now supports Azure confidential computing VM types when creating clusters. Azure confidential computing helps protect data in use, preventing the cloud provider from accessing sensitive data. For more information, see https://learn.microsoft.com/en-us/azure/databricks/clusters/configure#confidential
  • You can now enable secure cluster connectivity (SCC) on an existing workspace so that your VNet has no open ports and Databricks Runtime cluster nodes have no public IP addresses. In Azure templates, this feature is configured under the name No Public IP (enableNoPublicIp).
  • You can enable or disable Azure Private Link on an existing workspace for private connectivity between users and their Databricks workspaces, and also between compute resources and the control plane of Azure Databricks infrastructure.
  • Terraform provider updated to version 1.17.0
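
To illustrate the OAuth flow for service principals described above, here is a minimal sketch in Python. It assumes an AWS account-level token endpoint of the form shown below and environment variables holding the account ID, client ID, and client secret; adjust the host and identifiers for your deployment and treat the details as an illustration rather than a definitive reference.

```python
import os
import requests

# Assumed configuration (placeholders, not part of the original announcement).
ACCOUNT_ID = os.environ["DATABRICKS_ACCOUNT_ID"]
CLIENT_ID = os.environ["DATABRICKS_CLIENT_ID"]          # the service principal's application ID
CLIENT_SECRET = os.environ["DATABRICKS_CLIENT_SECRET"]  # OAuth secret created by an account admin

# Account-level OAuth token endpoint (shape assumed for AWS deployments).
token_url = f"https://accounts.cloud.databricks.com/oidc/accounts/{ACCOUNT_ID}/v1/token"

# Client-credentials grant: exchange the client ID/secret for an OAuth access token.
resp = requests.post(
    token_url,
    auth=(CLIENT_ID, CLIENT_SECRET),
    data={"grant_type": "client_credentials", "scope": "all-apis"},
)
resp.raise_for_status()
access_token = resp.json()["access_token"]

# The token can then be sent as a bearer token to account or workspace REST APIs,
# provided the service principal has access to the target workspace.
headers = {"Authorization": f"Bearer {access_token}"}
```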
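
For reference, the session-oriented IMDSv2 flow mentioned in the list above looks roughly like the following when run from a cluster node. The metadata endpoints are standard AWS ones; the sketch only illustrates why enforcing IMDSv2 blocks unauthenticated, IMDSv1-style requests.

```python
import requests

# Step 1: obtain a short-lived IMDSv2 session token with a PUT request.
token = requests.put(
    "http://169.254.169.254/latest/api/token",
    headers={"X-aws-ec2-metadata-token-ttl-seconds": "21600"},
    timeout=2,
).text

# Step 2: present that token on every metadata read. With IMDSv2 enforced,
# a plain GET without the token header (the IMDSv1 pattern) is rejected.
instance_id = requests.get(
    "http://169.254.169.254/latest/meta-data/instance-id",
    headers={"X-aws-ec2-metadata-token": token},
    timeout=2,
).text
print(instance_id)
```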
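
As a sketch of installing a cluster-scoped Python wheel that lives in workspace files, the Libraries API can reference the workspace path directly. The workspace URL, cluster ID, and wheel path below are placeholders, and the exact path conventions should be checked against the library documentation.

```python
import os
import requests

host = os.environ["DATABRICKS_HOST"]    # e.g. https://<workspace>.cloud.databricks.com (placeholder)
token = os.environ["DATABRICKS_TOKEN"]  # token with permission to manage the target cluster

payload = {
    "cluster_id": "1234-567890-abcde123",  # placeholder cluster ID
    "libraries": [
        # Wheel uploaded as a workspace file (DBR 13.1+); DBFS paths are not supported.
        {"whl": "/Workspace/Shared/libs/mylib-0.1.0-py3-none-any.whl"}
    ],
}

resp = requests.post(
    f"{host}/api/2.0/libraries/install",
    headers={"Authorization": f"Bearer {token}"},
    json=payload,
)
resp.raise_for_status()
```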

Delta Lake

  • You can now chain multiple stateful operators together, meaning that you can feed the output of an operation such as a windowed aggregation into another stateful operation such as a join (requires DBR 13.1). For more information, see https://docs.databricks.com/structured-streaming/stateful-streaming.html. A small sketch appears after this list.
  • You can now use dropDuplicatesWithinWatermark together with a specified watermark threshold to deduplicate records in Structured Streaming (requires DBR 13.1), as illustrated after this list.
  • You can now use Trigger.AvailableNow to consume records from Kinesis as an incremental batch with Structured Streaming, as sketched after this list. For more information, see https://docs.databricks.com/structured-streaming/kinesis.html#available-now
  • You can now use CLONE and CONVERT TO DELTA with Iceberg tables that have partitions defined on truncated columns of types int, long, and string. Truncated columns of type decimal are not supported.
  • You can now use shallow clones to create new Unity Catalog managed tables from existing Unity Catalog managed tables (requires DBR 13.1). For more information, see https://docs.databricks.com/delta/clone-unity-catalog.html. A SQL sketch appears after this list.
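
A minimal sketch of chaining stateful operators, assuming a Databricks notebook where `spark` is in scope and a hypothetical streaming table named `events` with columns `user` and `eventTime`: a 5-minute windowed count is fed into a second, coarser windowed aggregation using `window_time`, which DBR 13.1 exposes for re-windowing an upstream window column.

```python
from pyspark.sql import functions as F

# Hypothetical streaming source with columns (user, eventTime).
events = (
    spark.readStream.table("events")
    .withWatermark("eventTime", "10 minutes")
)

# First stateful operator: 5-minute windowed count per user.
per_window = events.groupBy(
    F.window("eventTime", "5 minutes"), "user"
).count()

# Second stateful operator chained on top: roll the 5-minute windows up to 1 hour.
# window_time() exposes the event-time of the upstream window so it can be re-windowed.
hourly = per_window.groupBy(
    F.window(F.window_time("window"), "1 hour"), "user"
).agg(F.sum("count").alias("events_per_hour"))

query = (
    hourly.writeStream
    .format("memory")          # demo sink; use Delta in practice
    .queryName("hourly_counts")
    .outputMode("append")
    .start()
)
```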
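
A sketch of dropDuplicatesWithinWatermark, assuming a Databricks notebook where `spark` is in scope and a hypothetical streaming table `clicks` with a `guid` key and an `eventTime` timestamp. Duplicates arriving within the watermark delay of one another are dropped while the state needed to detect them is bounded by the threshold.

```python
# Placeholder source with columns (guid, eventTime, ...).
streaming_df = spark.readStream.table("clicks")

deduped = (
    streaming_df
    .withWatermark("eventTime", "10 hours")     # threshold controls how long duplicate state is kept
    .dropDuplicatesWithinWatermark(["guid"])    # dedupe on guid within the watermark window (DBR 13.1+)
)
```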
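
A sketch of consuming Kinesis as an incremental batch with Trigger.AvailableNow; in PySpark the trigger is expressed as `availableNow=True`. The stream name, region, checkpoint path, and target table are placeholders, and the Kinesis source option names shown here should be confirmed against the linked documentation.

```python
# Assumes a Databricks notebook where `spark` is in scope.
kinesis_df = (
    spark.readStream
    .format("kinesis")
    .option("streamName", "my-event-stream")   # placeholder Kinesis stream
    .option("region", "us-west-2")             # placeholder region
    .option("initialPosition", "trim_horizon")
    .load()
)

query = (
    kinesis_df.writeStream
    .format("delta")
    .option("checkpointLocation", "s3://my-bucket/checkpoints/kinesis")  # placeholder path
    .trigger(availableNow=True)                # process everything available now, then stop
    .toTable("main.default.kinesis_events")    # placeholder target table
)
```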
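
A SQL sketch of creating a new Unity Catalog managed table as a shallow clone of an existing one, run via `spark.sql` so the examples stay in one language; the catalog, schema, and table names are placeholders.

```python
# Assumes a Databricks notebook where `spark` is in scope and both schemas exist in Unity Catalog.
spark.sql("""
  CREATE TABLE IF NOT EXISTS main.analytics.orders_dev
  SHALLOW CLONE main.analytics.orders   -- source and target are both UC managed tables (DBR 13.1+)
""")
```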

Governance

Databricks SQL

  • Schema Browser is now generally available in Data Explorer.
  • The on-hover table detail panel is now less sensitive, so it appears less readily on brief mouse-overs.
  • The escape key now closes the autocomplete panel.
  • View definitions now have syntax highlighting in the Data Explorer details tab.
  • In the SQL Statement API, the EXTERNAL_LINKS disposition now supports the JSON_ARRAY format. You can extract up to 100 GiB of data in JSON format with pre-signed URLs; the INLINE limit for JSON is 16 MiB. A request sketch appears after this list.
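
A sketch of requesting results with the EXTERNAL_LINKS disposition in JSON_ARRAY format. The host, token, warehouse ID, and query below are placeholders; each external link in the response is a pre-signed URL that you fetch separately to read a JSON array of rows.

```python
import os
import requests

host = os.environ["DATABRICKS_HOST"]    # e.g. https://<workspace>.cloud.databricks.com (placeholder)
token = os.environ["DATABRICKS_TOKEN"]  # token with access to the SQL warehouse

# Submit a statement and ask for results as external links in JSON_ARRAY format.
resp = requests.post(
    f"{host}/api/2.0/sql/statements",
    headers={"Authorization": f"Bearer {token}"},
    json={
        "warehouse_id": "abcdef1234567890",  # placeholder warehouse ID
        "statement": "SELECT * FROM samples.nyctaxi.trips LIMIT 1000",  # placeholder query
        "disposition": "EXTERNAL_LINKS",
        "format": "JSON_ARRAY",
        "wait_timeout": "30s",
    },
)
resp.raise_for_status()
result = resp.json()

# Each chunk points at a pre-signed URL containing a JSON array of rows.
for chunk in result.get("result", {}).get("external_links", []):
    rows = requests.get(chunk["external_link"]).json()
    print(len(rows), "rows in this chunk")
```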

Partner Connect
