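Photon runtime

Photon is the native vectorized query engine on Databricks, written to be directly compatible with Apache Spark APIs so that it works with your existing code. It is developed in C++ to take advantage of modern hardware, and it uses the latest techniques in vectorized query processing to capitalize on data- and instruction-level parallelism in CPUs, enhancing performance on real-world data and applications, all natively on your data lake. Photon is part of a high-performance runtime that runs your existing SQL and DataFrame API calls faster and reduces your total cost per workload. Photon is used by default in Databricks SQL warehouses.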

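Databricks clusters

Photon is available for clusters running Databricks Runtime 9.1 LTS and above.

To enable Photon acceleration, select the Use Photon Acceleration checkbox when you create the cluster. If you use the Clusters API, set runtime_engine to PHOTON.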
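
As a minimal sketch of the Clusters API route, the snippet below builds a cluster-creation payload with runtime_engine set to PHOTON. The helper name build_photon_cluster_spec, the cluster name, the spark_version string, and the node type are illustrative assumptions, not values taken from this page; substitute a runtime at or above 9.1 LTS and an instance type that Photon supports in your workspace.

```python
import json


def build_photon_cluster_spec(cluster_name, spark_version, node_type_id, num_workers):
    """Return a cluster-creation payload that requests the Photon engine.

    Hypothetical helper for illustration; only the runtime_engine field
    comes from the text above.
    """
    return {
        "cluster_name": cluster_name,
        "spark_version": spark_version,  # assumed example; needs 9.1 LTS or above
        "node_type_id": node_type_id,    # assumed example; must be Photon-supported
        "num_workers": num_workers,
        "runtime_engine": "PHOTON",      # the setting that enables Photon
    }


spec = build_photon_cluster_spec("photon-demo", "9.1.x-scala2.12", "i3.xlarge", 2)
print(json.dumps(spec, indent=2))
```

The resulting JSON is what you would send as the request body when creating the cluster; authentication and the HTTP call itself are left out of the sketch.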

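Photon supports a number of instance types on the driver and worker nodes. Photon instance types consume DBUs at a different rate than the same instance type running the non-Photon runtime. For more information about Photon instances and DBU consumption, see the Databricks pricing page.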

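Photon advantages

  • Supports SQL and equivalent DataFrame operations against Delta and Parquet tables.

  • Accelerates queries that process a significant amount of data (100GB+) and include aggregations and joins.

  • Faster performance when data is accessed repeatedly from the disk cache.

  • More robust scan performance on tables with many columns and many small files.

  • Faster Delta and Parquet writing using UPDATE, DELETE, MERGE INTO, INSERT, and CREATE TABLE AS SELECT, especially for wide tables (hundreds to thousands of columns).

  • Replaces sort-merge joins with hash joins.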

Photon coverage

Operators

  • Scan, Filter, Project

  • Hash Aggregate/Join/Shuffle

  • Nested-Loop Join

  • Null-Aware Anti Join

  • Union, Expand, ScalarSubquery

  • Delta/Parquet Write Sink

  • Sort

  • Window Function

Expressions

  • Comparison / Logic

  • Arithmetic / Math (most)

  • Conditional (IF, CASE, etc.)

  • String (common ones)

  • Casts

  • Aggregates (most common ones)

  • Date/Timestamp

Data types

  • Byte/Short/Int/Long

  • Boolean

  • String/Binary

  • Decimal

  • Float/Double

  • Date/Timestamp

  • Struct

  • Array

  • Map

The following table lists the supported Databricks expressions and the minimum Databricks Runtime release that supports each.

Name

Release

Abs

Databricks Runtime 8.3

Acos

Databricks Runtime 10.4 LTS

Add

Databricks Runtime 8.3

AddMonths

Databricks Runtime 8.3

AesDecrypt

Databricks Runtime 10.4 LTS

AesEncrypt

Databricks Runtime 10.4 LTS

Databricks Runtime 8.3

ArrayContains

Databricks Runtime 8.3

ArrayDistinct

Databricks Runtime 10.0

ArrayExcept

Databricks Runtime 10.1

ArrayExists

Databricks Runtime 10.4 LTS

ArrayFilter

Databricks Runtime 10.4 LTS

ArrayForAll

Databricks Runtime 10.4 LTS

ArrayIntersect

Databricks Runtime 10.1

ArrayJoin

Databricks Runtime 10.4 LTS

ArraySize

Databricks Runtime 10.4 LTS

ArrayTransform

Databricks Runtime 10.4 LTS

ArrayUnion

Databricks Runtime 10.1

:

Databricks Runtime 9.1 LTS

量化

Databricks Runtime 9.1 LTS

Average

Databricks Runtime 8.3

Base64

Databricks Runtime 9.1 LTS

Bin

Databricks Runtime 10.0

BitAndAgg

Databricks Runtime 8.3

BitLength

Databricks Runtime 11.3 LTS

BitOrAgg

Databricks Runtime 8.3

BitwiseAnd

Databricks Runtime 8.3

BitwiseNot

Databricks Runtime 8.3

BitwiseOr

Databricks Runtime 8.3

BitwiseReverse

Databricks Runtime 8.3

BitwiseXor

Databricks Runtime 8.3

BitXorAgg

Databricks Runtime 8.3

BoundaryAsGeojson

Databricks Runtime 11.3 LTS

BoundaryAsWkb

Databricks Runtime 11.3 LTS

BoundaryAsWkt

Databricks Runtime 11.3 LTS

Databricks Runtime 8.3

Cbrt

Databricks Runtime 8.4

CeilExpressionBuilder

Databricks Runtime 8.3

CenterAsGeojson

Databricks Runtime 11.3 LTS

CenterAsWkb

Databricks Runtime 11.3 LTS

CenterAsWkt

Databricks Runtime 11.3 LTS

空空的

Databricks Runtime 10.1

Coalesce

Databricks Runtime 8.3

CollectList

Databricks Runtime 9.0

Concat

Databricks Runtime 8.3

ConcatWs

Databricks Runtime 8.3

Conv

Databricks Runtime 8.3

Cos

Databricks Runtime 10.4 LTS

Databricks Runtime 8.3

CreateArray

Databricks Runtime 8.3

CreateMap

Databricks Runtime 8.4

CreateNamedStruct

Databricks Runtime 8.3

CreateStruct

Databricks Runtime 8.3

CurrentCatalog

Databricks Runtime 8.3

CurrentDatabase

Databricks Runtime 8.3

CurrentDate

Databricks Runtime 8.3

CurrentTimestamp

Databricks Runtime 8.3

CurrentTimeZone

Databricks Runtime 8.3

CurrentUser

Databricks Runtime 8.3

DateAdd

Databricks Runtime 8.3

DateDiff

Databricks Runtime 8.3

DateFormatClass

Databricks Runtime 8.3

DateFromUnixDate

Databricks Runtime 8.3

DateSub

Databricks Runtime 8.3

DayOfMonth

Databricks Runtime 8.3

DayOfWeek

Databricks Runtime 8.3

DayOfYear

Databricks Runtime 8.3

Decode

Databricks Runtime 8.3

DenseRank

Databricks Runtime 10.4 LTS

Databricks Runtime 8.3

ElementAt

Databricks Runtime 8.3

EqualNullSafe

Databricks Runtime 8.3

EqualTo

Databricks Runtime 8.3

Exp

Databricks Runtime 8.4

Explode

Databricks Runtime 8.4

Extract

Databricks Runtime 8.3

First

Databricks Runtime 8.3

FloorExpressionBuilder

Databricks Runtime 8.3

FromUnixTime

Databricks Runtime 8.3

FromUTCTimestamp *

Databricks Runtime 8.3

Get

Databricks Runtime 11.3 LTS

GetJsonObject

Databricks Runtime 11.2

GreaterThan

Databricks Runtime 8.3

GreaterThanOrEqual

Databricks Runtime 8.3

Greatest

Databricks Runtime 8.3

GridDistance

Databricks Runtime 11.3 LTS

H3ToString

Databricks Runtime 11.3 LTS

Hex

Databricks Runtime 9.1 LTS

Hour

Databricks Runtime 8.3

If

Databricks Runtime 8.3

Databricks Runtime 8.3

InitCap

Databricks Runtime 11.3 LTS

InputFileBlockLength

Databricks Runtime 8.3

InputFileBlockStart

Databricks Runtime 8.3

InputFileName

Databricks Runtime 8.3

InSet

Databricks Runtime 8.3

IntegralDivide

Databricks Runtime 8.3

IsChildOf

Databricks Runtime 11.3 LTS

IsNaN

Databricks Runtime 8.3

IsNotNull

Databricks Runtime 8.3

IsNull

Databricks Runtime 8.3

IsPentagon

Databricks Runtime 11.3 LTS

IsValid

Databricks Runtime 11.3 LTS

JsonToStructs

Databricks Runtime 11.2

Lag

Databricks Runtime 10.4 LTS

Last

Databricks Runtime 10.4 LTS

LastDay

Databricks Runtime 8.3

Lead

Databricks Runtime 10.4 LTS

Least

Databricks Runtime 8.3

Length

Databricks Runtime 8.3

LengthOfJsonArray

Databricks Runtime

不超过

Databricks Runtime 8.3

Levenshtein

Databricks Runtime 10.1

Like

Databricks Runtime 8.3

Log

Databricks Runtime 8.3

Log2

Databricks Runtime 8.4

LongLatAsH3

Databricks Runtime 11.3 LTS

LongLatAsH3String

Databricks Runtime 11.3 LTS

Lower

Databricks Runtime 8.3

LPadExpressionBuilder

Databricks Runtime 8.3

MakeDate

Databricks Runtime 8.3

MakeTimestamp

Databricks Runtime 8.3

Max

Databricks Runtime 8.3

MaxChild

Databricks Runtime 11.3 LTS

Md5

Databricks Runtime 10.4 LTS

MicrosToTimestamp

Databricks Runtime 8.3

MillisToTimestamp

Databricks Runtime 8.3

Min

Databricks Runtime 8.3

MinChild

Databricks Runtime 11.3 LTS

Minute

Databricks Runtime 8.3

MonotonicallyIncreasingID

Databricks Runtime 8.3

Databricks Runtime 8.3

MonthsBetween

Databricks Runtime 8.3

Databricks Runtime 8.3

Murmur3Hash

Databricks Runtime 8.3

NaNvl

Databricks Runtime 8.3

NextDay

Databricks Runtime 8.3

Databricks Runtime 8.3

Now

Databricks Runtime 8.3

NthValue

Databricks Runtime 10.4 LTS

NTile

Databricks Runtime 10.4 LTS

NullIf

Databricks Runtime 8.3

Nvl

Databricks Runtime 8.3

Nvl2

Databricks Runtime 8.3

OctetLength

Databricks Runtime 8.3

ParseToDate

Databricks Runtime 8.3

ParseToTimestamp

Databricks Runtime 8.3

Percentile

Databricks Runtime 10.4 LTS

PercentRank

Databricks Runtime 10.4 LTS

Pi

Databricks Runtime 8.3

Pmod

Databricks Runtime 8.3

PosExplode

Databricks Runtime 9.1 LTS

Pow

Databricks Runtime 8.3

Quarter

Databricks Runtime 8.3

Rand

Databricks Runtime 8.3

Rank

Databricks Runtime 10.4 LTS

RegExpExtract

Databricks Runtime 8.3

RegExpExtractAll

Databricks Runtime

RegExpReplace

Databricks Runtime 9.1 LTS

RegrAvgX

Databricks Runtime 10.5

RegrAvgY

Databricks Runtime 10.5

Remainder

Databricks Runtime 8.3

Resolution

Databricks Runtime 11.3 LTS

Reverse

Databricks Runtime 8.3

Reverse

Databricks Runtime 8.3

RLike

Databricks Runtime 8.3

Databricks Runtime 8.3

RowNumber

Databricks Runtime 10.4 LTS

RPadExpressionBuilder

Databricks Runtime 8.3

Second

Databricks Runtime 8.3

SecondsToTimestamp

Databricks Runtime 8.3

Sha1

Databricks Runtime 10.4 LTS

Sha2

Databricks Runtime 10.4 LTS

ShiftLeft

Databricks Runtime 8.3

ShiftRight

Databricks Runtime 8.3

ShiftRightUnsigned

Databricks Runtime 8.3

Databricks Runtime 10.4 LTS

Size

Databricks Runtime 8.3

Databricks Runtime 8.3

Soundex

Databricks Runtime 10.1

SparkVersion

Databricks Runtime 8.3

Sqrt

Databricks Runtime 8.4

StddevPop

Databricks Runtime 8.3

StddevSamp

Databricks Runtime 8.3

StringInstr

Databricks Runtime 8.3

StringLocate

Databricks Runtime 8.3

StringRepeat

Databricks Runtime 11.2

StringSpace

Databricks Runtime 8.3

StringSplit

Databricks Runtime 8.3

StringToH3

Databricks Runtime 11.3 LTS

StringTranslate

Databricks Runtime 10.4 LTS

StringTrim

Databricks Runtime 8.3

StringTrimBoth

Databricks Runtime 8.3

StringTrimLeft

Databricks Runtime 8.3

StringTrimRight

Databricks Runtime 8.3

StructsToJson

Databricks Runtime

Substring

Databricks Runtime 8.3

Subtract

Databricks Runtime 8.3

Sum

Databricks Runtime 8.3

Tan

Databricks Runtime 9.1 LTS

ToChildren

Databricks Runtime 11.3 LTS

ToParent

Databricks Runtime 11.3 LTS

ToRadians

Databricks Runtime 10.1

ToUnixTimestamp

Databricks Runtime 8.3

ToUTCTimestamp

Databricks Runtime 8.3

TruncDate

Databricks Runtime 8.3

TruncTimestamp

Databricks Runtime 8.3

TryElementAt

Databricks Runtime 10.0

TryValidate

Databricks Runtime 11.3 LTS

UnaryMinus

Databricks Runtime 8.3

UnBase64

Databricks Runtime 9.1 LTS

Unhex

Databricks Runtime 9.1 LTS

UnixDate

Databricks Runtime 8.3

UnixMicros

Databricks Runtime 8.3

UnixMillis

Databricks Runtime 8.3

UnixSeconds

Databricks Runtime 8.3

UnixTimestamp

Databricks Runtime 8.3

Databricks Runtime 8.3

Uuid

Databricks Runtime 8.3

Validate

Databricks Runtime 11.3 LTS

VarianceSamp

Databricks Runtime 10.1

WeekDay

Databricks Runtime 8.3

WeekOfYear

Databricks Runtime 8.3

XxHash64

Databricks Runtime 10.0

Year

Databricks Runtime 8.3

* Photon does not fully support from_utc_timestamp. See from_utc_timestamp for more information.
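
One way to use the table programmatically is to keep a lookup of minimum releases and compare it with the runtime a cluster is running. The sketch below copies five rows from the table; the names MIN_RUNTIME, parse_release, and is_supported are hypothetical helpers introduced only for illustration.

```python
import re

# A few rows copied from the table above: expression name -> minimum release.
MIN_RUNTIME = {
    "AddMonths": "Databricks Runtime 8.3",
    "ArrayDistinct": "Databricks Runtime 10.0",
    "Md5": "Databricks Runtime 10.4 LTS",
    "GetJsonObject": "Databricks Runtime 11.2",
    "BitLength": "Databricks Runtime 11.3 LTS",
}


def parse_release(label):
    """Extract (major, minor) from a label such as 'Databricks Runtime 10.4 LTS'."""
    match = re.search(r"(\d+)\.(\d+)", label)
    if match is None:
        raise ValueError(f"no version number found in {label!r}")
    return int(match.group(1)), int(match.group(2))


def is_supported(expression, cluster_release):
    """Return True/False for listed expressions, None for names not in the excerpt."""
    minimum = MIN_RUNTIME.get(expression)
    if minimum is None:
        return None
    # Tuples compare element by element, so (10, 4) >= (9, 1) is True.
    return parse_release(cluster_release) >= parse_release(minimum)


print(is_supported("Md5", "Databricks Runtime 9.1 LTS"))
print(is_supported("AddMonths", "Databricks Runtime 9.1 LTS"))
```

Returning None for unlisted names keeps "not in this excerpt" distinct from "listed but too new for this runtime".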

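Limitations

  • Structured Streaming: Photon currently supports stateless streaming with Delta, Parquet, and CSV. Kafka and Kinesis support is in Public Preview.

  • Does not support UDFs.

  • Does not support RDD APIs.

  • Not expected to improve short-running queries (<2 seconds), for example, queries against small amounts of data.

Features not supported by Photon run the same way they would with Databricks Runtime; there is no performance benefit for those features.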
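
The limitations above can be turned into a quick pre-flight check before moving a workload to a Photon cluster. This is a rough sketch, not a Databricks API: the function photon_caveats and its parameters are hypothetical, and the workload flags are something you would supply yourself.

```python
def photon_caveats(uses_udfs=False, uses_rdd_api=False, expected_runtime_s=None):
    """Return the listed limitations that apply to a described workload.

    Hypothetical helper; the rules mirror the Limitations list above.
    """
    caveats = []
    if uses_udfs:
        caveats.append("UDFs are not supported by Photon")
    if uses_rdd_api:
        caveats.append("RDD APIs are not supported by Photon")
    # Short-running queries (< 2 seconds) are not expected to improve.
    if expected_runtime_s is not None and expected_runtime_s < 2:
        caveats.append("queries under 2 seconds are not expected to improve")
    return caveats


for caveat in photon_caveats(uses_udfs=True, expected_runtime_s=1.5):
    print(caveat)
```

An empty list means none of the listed limitations applies; it does not by itself promise a speedup, since unsupported features still run as they would on Databricks Runtime.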