Introduction to Databricks Photon

What is Photon?

Photon is a vectorized query engine written in C++ that leverages data and instruction-level parallelism available in CPUs.
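To make "vectorized" more concrete, here is a minimal sketch in plain Python that contrasts row-at-a-time evaluation with batch-at-a-time evaluation over columns. It is only an illustration of the idea, not Photon's actual implementation (which is written in C++); the function names, the sample data and the batch size are all made up for this example.

```python
from array import array

def sum_revenue_row_at_a_time(rows):
    """Row-oriented: evaluate the filter and the expression one record at a time."""
    total = 0.0
    for row in rows:
        if row["qty"] > 1:
            total += row["price"] * row["qty"]
    return total

def sum_revenue_batched(qty, price, batch_size=4):
    """Column-oriented: walk each column in fixed-size batches of contiguous values."""
    total = 0.0
    for start in range(0, len(qty), batch_size):
        q = qty[start:start + batch_size]
        p = price[start:start + batch_size]
        # One tight loop per batch; in a native engine this is the loop
        # that the compiler can turn into SIMD instructions.
        total += sum(pi * qi for pi, qi in zip(p, q) if qi > 1)
    return total

# Hypothetical sample data, stored once as rows and once as columns.
rows = [
    {"qty": 1, "price": 2.0},
    {"qty": 2, "price": 3.0},
    {"qty": 3, "price": 1.5},
    {"qty": 5, "price": 1.0},
    {"qty": 1, "price": 9.0},
]
qty = array("q", [r["qty"] for r in rows])
price = array("d", [r["price"] for r in rows])

print(sum_revenue_row_at_a_time(rows))   # same result either way
print(sum_revenue_batched(qty, price))
```

Both functions return the same answer; the difference is only in how the data is laid out and traversed, which is where a vectorized engine gets its speed.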

It’s 100% compatible with Apache Spark APIs, which means you don’t have to rewrite your existing code (SQL, Python, R, Scala) to benefit from it.

Photon is an ANSI-compliant engine. It initially focused on SQL, but its scope has grown considerably since launch, with more ingestion sources, formats, APIs and methods.

What are the advantages of Photon?

1) Cheaper and Faster

Photon is built from the ground up for the fastest performance at lower cost. It provides up to 80% TCO savings while accelerating data and analytics workloads by up to 12x.

2) Built for all use cases

Photon is the first engine that enables data teams to standardize on one set of APIs for all workloads.

How Does Photon Help Optimize Costs?

By making your queries run faster, Photon shortens the time your VMs are running, so you spend less on compute.

By accelerating your time to market, it makes your product available to your customers sooner.
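The first point can be made concrete with a little arithmetic. The sketch below assumes a job is billed per hour for both the VM and the DBUs it consumes, and that Photon-enabled compute may consume DBUs at a different rate than the standard runtime. All the numbers are hypothetical; check your own runtimes and the current pricing before drawing conclusions.

```python
def job_cost(runtime_hours, vm_rate_per_hour, dbu_per_hour, dbu_price):
    """Cost of one job run: VM charge plus DBU charge, both billed per hour."""
    return runtime_hours * (vm_rate_per_hour + dbu_per_hour * dbu_price)

# Hypothetical figures: the job runs 4x faster with Photon,
# but the Photon cluster consumes twice as many DBUs per hour.
baseline = job_cost(runtime_hours=4.0, vm_rate_per_hour=2.0,
                    dbu_per_hour=4.0, dbu_price=0.25)
with_photon = job_cost(runtime_hours=1.0, vm_rate_per_hour=2.0,
                       dbu_per_hour=8.0, dbu_price=0.25)

savings = 1 - with_photon / baseline
print(f"baseline={baseline:.2f} photon={with_photon:.2f} savings={savings:.0%}")
```

With these made-up inputs the shorter runtime more than offsets the higher hourly rate. If the speedup on your workload is small, the balance can go the other way, so it is worth measuring.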

How Can I Enable Photon?

Photon is enabled by default for SQL warehouses.

Photon is enabled by default for clusters.

How Can I Find All the Functions That Are Supported by Photon?
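If you create clusters programmatically rather than through the UI, the engine can also be requested explicitly. The sketch below builds a cluster specification as a JSON payload, assuming the `runtime_engine` field of the Databricks Clusters API. The cluster name, runtime version and node type are placeholders picked for illustration, so replace them with values that are valid in your workspace.

```python
import json

# Placeholder values: adjust them to your own workspace and cloud provider.
cluster_spec = {
    "cluster_name": "photon-demo",          # hypothetical name
    "spark_version": "13.3.x-scala2.12",    # example runtime version
    "node_type_id": "i3.xlarge",            # example node type
    "num_workers": 2,
    "runtime_engine": "PHOTON",             # "STANDARD" would turn Photon off
}

# This JSON body is what you would send when creating the cluster.
payload = json.dumps(cluster_spec, indent=2)
print(payload)
```

The payload is only built and printed here; sending it requires a workspace URL and a token, which are left out on purpose.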

You can write the following Scala code to get the list of all the available functions supported by Photon.

To benefit from the latest supported functions, make sure you are on the latest runtime.

What are the operations that are highlighted in yellow?

If a function is vectorized and executed by Photon, it’s highlighted in yellow in the DAG.

What happens if Photon is enabled for my cluster and I run an unsupported function?

Features not supported by Photon run the same way they would with Databricks Runtime.
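Besides the colors in the DAG, you can also see which parts of a query ran in Photon from the text of the physical plan, where Photon operators have names that start with `Photon`. The helper below is a small sketch that splits the operators of a plan into the two groups. The function name is made up for this example, and the sample plan is a simplified, hand-written illustration rather than real output.

```python
import re

def split_operators(plan_text):
    """Return (photon_ops, other_ops) found in a physical plan printed as text."""
    photon_ops, other_ops = [], []
    for line in plan_text.splitlines():
        if line.strip().startswith("=="):
            continue  # skip section headers such as "== Physical Plan =="
        match = re.search(r"[A-Za-z]\w*", line)
        if not match:
            continue  # blank line or tree-drawing characters only
        name = match.group(0)
        (photon_ops if name.startswith("Photon") else other_ops).append(name)
    return photon_ops, other_ops

# Hand-written sample, simplified for illustration (not real output).
sample_plan = """\
== Physical Plan ==
ColumnarToRow
+- PhotonResultStage
   +- PhotonGroupingAgg(keys=[country], functions=[sum(amount)])
      +- PhotonScan parquet sales[country,amount]
"""

photon_ops, other_ops = split_operators(sample_plan)
print("Photon:", photon_ops)
print("Other:", other_ops)
```

In a notebook, `df.explain()` prints the plan of a DataFrame, and you can paste that text into the helper.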

Where Can I Find the Photon Paper?

You can read the paper here: https://cs.stanford.edu/~matei/papers/2022/sigmod_photon.pdf

  • Apache Spark was awarded the SIGMOD Systems Award
  • Databricks Photon was awarded the Best Industry Paper Award

Bonus: for more information, feel free to watch Simon Whiteley’s video.

Useful Links:

Databricks Sets Official Data Warehousing Performance Record: https://www.databricks.com/blog/2021/11/02/databricks-sets-official-data-warehousing-performance-record.html

Photon product: https://www.databricks.com/product/photon

Photon documentation: https://docs.databricks.com/runtime/photon.html

Radical Speed on the Lakehouse: Photon Under the Hood
