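Photon runtime
Photon is the native vectorized query engine on Databricks, written to be directly compatible with Apache Spark APIs so that it works with your existing code. It is developed in C++ to take advantage of modern hardware, and it uses the latest techniques in vectorized query processing to exploit data-level and instruction-level parallelism in CPUs, improving performance on real-world data and applications, all natively on your data lake. Photon is part of a high-performance runtime that runs your existing SQL and DataFrame API calls faster and reduces your total cost per workload. Photon is used by default in Databricks SQL warehouses.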
Databricks clusters
Photon is available for clusters running Databricks Runtime 9.1 LTS and above.
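To enable Photon acceleration, select the Use Photon Acceleration checkbox when you create the cluster. If you create the cluster using the clusters API, set runtime_engine to PHOTON.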
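Photon supports a number of instance types on the driver and worker nodes. Photon instance types consume DBUs at a different rate than the same instance type running the non-Photon runtime. For more information about Photon instances and DBU consumption, see the Databricks pricing page.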
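As a rough illustration of the API route described above, the sketch below builds a cluster-creation request body with runtime_engine set to PHOTON. Only runtime_engine comes from this page; the other field names follow the Databricks Clusters API as commonly documented, and the cluster name, runtime version string, and instance type are placeholder assumptions that you would replace with values valid in your workspace.

```python
import json


def build_photon_cluster_spec(cluster_name, spark_version, node_type_id, num_workers):
    """Return a cluster-creation request body with Photon enabled."""
    return {
        "cluster_name": cluster_name,
        "spark_version": spark_version,
        "node_type_id": node_type_id,
        "num_workers": num_workers,
        # The setting described above: selects the Photon engine.
        "runtime_engine": "PHOTON",
    }


spec = build_photon_cluster_spec(
    cluster_name="photon-demo",        # hypothetical name
    spark_version="11.3.x-scala2.12",  # placeholder runtime version string
    node_type_id="i3.xlarge",          # placeholder instance type
    num_workers=2,
)

# Serialize the body; sending it to the clusters API is left out of this sketch.
print(json.dumps(spec, indent=2))
```

The function only assembles the payload, so it can be inspected or unit-tested without contacting a workspace.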
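Photon advantages
Supports SQL and equivalent DataFrame operations against Delta and Parquet tables.
Expected to accelerate queries that process a significant amount of data (100GB+) and include aggregations and joins.
Faster performance when data is accessed repeatedly from the disk cache.
More robust scan performance on tables with many columns and many small files.
Faster Delta and Parquet writes using UPDATE, DELETE, MERGE INTO, INSERT, and CREATE TABLE AS SELECT, especially for wide tables (hundreds to thousands of columns).
Replaces sort-merge joins with hash joins.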
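To make the last point concrete, here is a minimal, generic sketch of a hash join in plain Python. It is not Photon's implementation (Photon is a vectorized C++ engine); the function and sample data are hypothetical and only illustrate the build-and-probe idea that avoids sorting both inputs.

```python
from collections import defaultdict


def hash_join(left, right, key):
    """Inner-join two lists of dicts on `key` using a hash table."""
    # Build phase: index the (ideally smaller) right side by join key.
    index = defaultdict(list)
    for row in right:
        index[row[key]].append(row)

    # Probe phase: look up each left row's key in the hash table.
    joined = []
    for lrow in left:
        for rrow in index.get(lrow[key], []):
            joined.append({**lrow, **rrow})
    return joined


orders = [
    {"cust": 1, "amount": 30},
    {"cust": 2, "amount": 15},
    {"cust": 1, "amount": 5},
]
customers = [{"cust": 1, "name": "Ada"}, {"cust": 3, "name": "Lin"}]

result = hash_join(orders, customers, "cust")
print(result)
```

Neither input needs to be sorted: one side is hashed once and the other is streamed past it, which is the property that distinguishes this from a sort-merge join.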
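Photon coverage
Operators
Scan, Filter, Project
Hash Aggregate/Join/Shuffle
Nested-Loop Join
Null-Aware Anti Join
Union, Expand, ScalarSubquery
Delta/Parquet Write Sink
Sort
Window Function
Expressions
Comparison / Logic
Arithmetic / Math (most)
Conditional (IF, CASE, etc.)
String (common ones)
Casts
Aggregates (most common ones)
Date/Timestamp
Data types
Byte/Short/Int/Long
Boolean
String/Binary
Decimal
Float/Double
Date/Timestamp
Struct
Array
Map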
The following table lists the supported Databricks expressions and the minimum Databricks Runtime release that supports each one.
Name | Release |
---|---|
Abs | Databricks Runtime 8.3 |
Acos | Databricks Runtime 10.4 LTS |
Add | Databricks Runtime 8.3 |
AddMonths | Databricks Runtime 8.3 |
AesDecrypt | Databricks Runtime 10.4 LTS |
AesEncrypt | Databricks Runtime 10.4 LTS |
And | Databricks Runtime 8.3 |
ArrayContains | Databricks Runtime 8.3 |
ArrayDistinct | Databricks Runtime 10.0 |
ArrayExcept | Databricks Runtime 10.1 |
ArrayExists | Databricks Runtime 10.4 LTS |
ArrayFilter | Databricks Runtime 10.4 LTS |
ArrayForAll | Databricks Runtime 10.4 LTS |
ArrayIntersect | Databricks Runtime 10.1 |
ArrayJoin | Databricks Runtime 10.4 LTS |
ArraySize | Databricks Runtime 10.4 LTS |
ArrayTransform | Databricks Runtime 10.4 LTS |
ArrayUnion | Databricks Runtime 10.1 |
Asin | Databricks Runtime 9.1 LTS |
Atan | Databricks Runtime 9.1 LTS |
Average | Databricks Runtime 8.3 |
Base64 | Databricks Runtime 9.1 LTS |
Bin | Databricks Runtime 10.0 |
BitAndAgg | Databricks Runtime 8.3 |
BitLength | Databricks Runtime 11.3 LTS |
BitOrAgg | Databricks Runtime 8.3 |
BitwiseAnd | Databricks Runtime 8.3 |
BitwiseNot | Databricks Runtime 8.3 |
BitwiseOr | Databricks Runtime 8.3 |
BitwiseReverse | Databricks Runtime 8.3 |
BitwiseXor | Databricks Runtime 8.3 |
BitXorAgg | Databricks Runtime 8.3 |
BoundaryAsGeojson | Databricks Runtime 11.3 LTS |
BoundaryAsWkb | Databricks Runtime 11.3 LTS |
BoundaryAsWkt | Databricks Runtime 11.3 LTS |
Cast | Databricks Runtime 8.3 |
Cbrt | Databricks Runtime 8.4 |
CeilExpressionBuilder | Databricks Runtime 8.3 |
CenterAsGeojson | Databricks Runtime 11.3 LTS |
CenterAsWkb | Databricks Runtime 11.3 LTS |
CenterAsWkt | Databricks Runtime 11.3 LTS |
Chr | Databricks Runtime 10.1 |
Coalesce | Databricks Runtime 8.3 |
CollectList | Databricks Runtime 9.0 |
Concat | Databricks Runtime 8.3 |
ConcatWs | Databricks Runtime 8.3 |
Conv | Databricks Runtime 8.3 |
Cos | Databricks Runtime 10.4 LTS |
Count | Databricks Runtime 8.3 |
CreateArray | Databricks Runtime 8.3 |
CreateMap | Databricks Runtime 8.4 |
CreateNamedStruct | Databricks Runtime 8.3 |
CreateStruct | Databricks Runtime 8.3 |
CurrentCatalog | Databricks Runtime 8.3 |
CurrentDatabase | Databricks Runtime 8.3 |
CurrentDate | Databricks Runtime 8.3 |
CurrentTimestamp | Databricks Runtime 8.3 |
CurrentTimeZone | Databricks Runtime 8.3 |
CurrentUser | Databricks Runtime 8.3 |
DateAdd | Databricks Runtime 8.3 |
DateDiff | Databricks Runtime 8.3 |
DateFormatClass | Databricks Runtime 8.3 |
DateFromUnixDate | Databricks Runtime 8.3 |
DateSub | Databricks Runtime 8.3 |
DayOfMonth | Databricks Runtime 8.3 |
DayOfWeek | Databricks Runtime 8.3 |
DayOfYear | Databricks Runtime 8.3 |
Decode | Databricks Runtime 8.3 |
DenseRank | Databricks Runtime 10.4 LTS |
Divide | Databricks Runtime 8.3 |
ElementAt | Databricks Runtime 8.3 |
EqualNullSafe | Databricks Runtime 8.3 |
EqualTo | Databricks Runtime 8.3 |
Exp | Databricks Runtime 8.4 |
Explode | Databricks Runtime 8.4 |
Extract | Databricks Runtime 8.3 |
First | Databricks Runtime 8.3 |
FloorExpressionBuilder | Databricks Runtime 8.3 |
FromUnixTime | Databricks Runtime 8.3 |
FromUTCTimestamp * | Databricks Runtime 8.3 |
Get | Databricks Runtime 11.3 LTS |
GetJsonObject | Databricks Runtime 11.2 |
GreaterThan | Databricks Runtime 8.3 |
GreaterThanOrEqual | Databricks Runtime 8.3 |
Greatest | Databricks Runtime 8.3 |
GridDistance | Databricks Runtime 11.3 LTS |
H3ToString | Databricks Runtime 11.3 LTS |
Hex | Databricks Runtime 9.1 LTS |
Hour | Databricks Runtime 8.3 |
If | Databricks Runtime 8.3 |
In | Databricks Runtime 8.3 |
InitCap | Databricks Runtime 11.3 LTS |
InputFileBlockLength | Databricks Runtime 8.3 |
InputFileBlockStart | Databricks Runtime 8.3 |
InputFileName | Databricks Runtime 8.3 |
InSet | Databricks Runtime 8.3 |
IntegralDivide | Databricks Runtime 8.3 |
IsChildOf | Databricks Runtime 11.3 LTS |
IsNaN | Databricks Runtime 8.3 |
IsNotNull | Databricks Runtime 8.3 |
IsNull | Databricks Runtime 8.3 |
IsPentagon | Databricks Runtime 11.3 LTS |
IsValid | Databricks Runtime 11.3 LTS |
JsonToStructs | Databricks Runtime 11.2 |
Lag | Databricks Runtime 10.4 LTS |
Last | Databricks Runtime 10.4 LTS |
LastDay | Databricks Runtime 8.3 |
Lead | Databricks Runtime 10.4 LTS |
Least | Databricks Runtime 8.3 |
Length | Databricks Runtime 8.3 |
LengthOfJsonArray | Databricks Runtime |
LessThan | Databricks Runtime 8.3 |
Levenshtein | Databricks Runtime 10.1 |
Like | Databricks Runtime 8.3 |
Log | Databricks Runtime 8.3 |
Log2 | Databricks Runtime 8.4 |
LongLatAsH3 | Databricks Runtime 11.3 LTS |
LongLatAsH3String | Databricks Runtime 11.3 LTS |
Lower | Databricks Runtime 8.3 |
LPadExpressionBuilder | Databricks Runtime 8.3 |
MakeDate | Databricks Runtime 8.3 |
MakeTimestamp | Databricks Runtime 8.3 |
Max | Databricks Runtime 8.3 |
MaxChild | Databricks Runtime 11.3 LTS |
Md5 | Databricks Runtime 10.4 LTS |
MicrosToTimestamp | Databricks Runtime 8.3 |
MillisToTimestamp | Databricks Runtime 8.3 |
Min | Databricks Runtime 8.3 |
MinChild | Databricks Runtime 11.3 LTS |
Minute | Databricks Runtime 8.3 |
MonotonicallyIncreasingID | Databricks Runtime 8.3 |
Month | Databricks Runtime 8.3 |
MonthsBetween | Databricks Runtime 8.3 |
Multiply | Databricks Runtime 8.3 |
Murmur3Hash | Databricks Runtime 8.3 |
NaNvl | Databricks Runtime 8.3 |
NextDay | Databricks Runtime 8.3 |
Not | Databricks Runtime 8.3 |
Now | Databricks Runtime 8.3 |
NthValue | Databricks Runtime 10.4 LTS |
NTile | Databricks Runtime 10.4 LTS |
NullIf | Databricks Runtime 8.3 |
Nvl | Databricks Runtime 8.3 |
Nvl2 | Databricks Runtime 8.3 |
OctetLength | Databricks Runtime 8.3 |
ParseToDate | Databricks Runtime 8.3 |
ParseToTimestamp | Databricks Runtime 8.3 |
Percentile | Databricks Runtime 10.4 LTS |
PercentRank | Databricks Runtime 10.4 LTS |
Pi | Databricks Runtime 8.3 |
Pmod | Databricks Runtime 8.3 |
PosExplode | Databricks Runtime 9.1 LTS |
Pow | Databricks Runtime 8.3 |
Quarter | Databricks Runtime 8.3 |
Rand | Databricks Runtime 8.3 |
Rank | Databricks Runtime 10.4 LTS |
RegExpExtract | Databricks Runtime 8.3 |
RegExpExtractAll | Databricks Runtime |
RegExpReplace | Databricks Runtime 9.1 LTS |
RegrAvgX | Databricks Runtime 10.5 |
RegrAvgY | Databricks Runtime 10.5 |
Remainder | Databricks Runtime 8.3 |
Resolution | Databricks Runtime 11.3 LTS |
Reverse | Databricks Runtime 8.3 |
RLike | Databricks Runtime 8.3 |
Round | Databricks Runtime 8.3 |
RowNumber | Databricks Runtime 10.4 LTS |
RPadExpressionBuilder | Databricks Runtime 8.3 |
Second | Databricks Runtime 8.3 |
SecondsToTimestamp | Databricks Runtime 8.3 |
Sha1 | Databricks Runtime 10.4 LTS |
Sha2 | Databricks Runtime 10.4 LTS |
ShiftLeft | Databricks Runtime 8.3 |
ShiftRight | Databricks Runtime 8.3 |
ShiftRightUnsigned | Databricks Runtime 8.3 |
Sin | Databricks Runtime 10.4 LTS |
Size | Databricks Runtime 8.3 |
Slice | Databricks Runtime 8.3 |
SoundEx | Databricks Runtime 10.1 |
SparkVersion | Databricks Runtime 8.3 |
Sqrt | Databricks Runtime 8.4 |
StddevPop | Databricks Runtime 8.3 |
StddevSamp | Databricks Runtime 8.3 |
StringInstr | Databricks Runtime 8.3 |
StringLocate | Databricks Runtime 8.3 |
StringRepeat | Databricks Runtime 11.2 |
StringSpace | Databricks Runtime 8.3 |
StringSplit | Databricks Runtime 8.3 |
StringToH3 | Databricks Runtime 11.3 LTS |
StringTranslate | Databricks Runtime 10.4 LTS |
StringTrim | Databricks Runtime 8.3 |
StringTrimBoth | Databricks Runtime 8.3 |
StringTrimLeft | Databricks Runtime 8.3 |
StringTrimRight | Databricks Runtime 8.3 |
StructsToJson | Databricks Runtime |
Substring | Databricks Runtime 8.3 |
Subtract | Databricks Runtime 8.3 |
Sum | Databricks Runtime 8.3 |
Tan | Databricks Runtime 9.1 LTS |
ToChildren | Databricks Runtime 11.3 LTS |
ToParent | Databricks Runtime 11.3 LTS |
ToRadians | Databricks Runtime 10.1 |
ToUnixTimestamp | Databricks Runtime 8.3 |
ToUTCTimestamp | Databricks Runtime 8.3 |
TruncDate | Databricks Runtime 8.3 |
TruncTimestamp | Databricks Runtime 8.3 |
TryElementAt | Databricks Runtime 10.0 |
TryValidate | Databricks Runtime 11.3 LTS |
UnaryMinus | Databricks Runtime 8.3 |
UnBase64 | Databricks Runtime 9.1 LTS |
Unhex | Databricks Runtime 9.1 LTS |
UnixDate | Databricks Runtime 8.3 |
UnixMicros | Databricks Runtime 8.3 |
UnixMillis | Databricks Runtime 8.3 |
UnixSeconds | Databricks Runtime 8.3 |
UnixTimestamp | Databricks Runtime 8.3 |
Upper | Databricks Runtime 8.3 |
Uuid | Databricks Runtime 8.3 |
Validate | Databricks Runtime 11.3 LTS |
VarianceSamp | Databricks Runtime 10.1 |
WeekDay | Databricks Runtime 8.3 |
WeekOfYear | Databricks Runtime 8.3 |
XxHash64 | Databricks Runtime 10.0 |
Year | Databricks Runtime 8.3 |
* Photon does not fully support from_utc_timestamp. See from_utc_timestamp for more information.
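One way to use the table programmatically is a small lookup that compares a cluster's runtime version against an expression's minimum release. The sketch below is illustrative only: the dictionary copies a handful of rows from the table rather than the full list, and the helper names are hypothetical.

```python
# Minimum runtime versions copied from a few rows of the table above.
MIN_RUNTIME = {
    "Abs": (8, 3),
    "Cbrt": (8, 4),
    "CollectList": (9, 0),
    "ArrayDistinct": (10, 0),
    "GetJsonObject": (11, 2),
    "InitCap": (11, 3),
}


def parse_version(text):
    """Turn '10.4 LTS' or '10.4' into the tuple (10, 4)."""
    major, minor = text.split()[0].split(".")
    return int(major), int(minor)


def is_supported(expression, runtime):
    """True if the expression is listed and the runtime meets its minimum release."""
    minimum = MIN_RUNTIME.get(expression)
    return minimum is not None and parse_version(runtime) >= minimum


print(is_supported("ArrayDistinct", "10.4 LTS"))
print(is_supported("InitCap", "10.4 LTS"))
```

Versions are compared as integer tuples rather than strings so that, for example, 10.0 sorts after 9.1.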
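Limitations
Structured Streaming: Photon currently supports stateless streaming with Delta, Parquet, and CSV. Kafka and Kinesis support is in Public Preview.
Does not support UDFs.
Does not support RDD APIs.
Not expected to improve short-running queries (<2 seconds), for example, queries against small amounts of data.
Features not supported by Photon run the same way they would with Databricks Runtime; there is no performance benefit for those features.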
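The limitations above can be turned into a quick pre-flight checklist. The following is a hypothetical helper, not a Databricks API: it simply restates the listed limitations as checks, and it treats stateful streaming as outside Photon's coverage because only stateless streaming is listed as supported.

```python
def photon_caveats(uses_udfs, uses_rdd_api, stateful_streaming, typical_runtime_seconds):
    """Return the limitations from the list above that apply to a workload."""
    caveats = []
    if uses_udfs:
        caveats.append("UDFs are not supported")
    if uses_rdd_api:
        caveats.append("RDD APIs are not supported")
    if stateful_streaming:
        caveats.append("only stateless streaming is listed as supported")
    if typical_runtime_seconds < 2:
        caveats.append("short-running queries (<2 seconds) are not expected to improve")
    return caveats


# Example: a short query that calls a UDF.
for caveat in photon_caveats(
    uses_udfs=True,
    uses_rdd_api=False,
    stateful_streaming=False,
    typical_runtime_seconds=1.5,
):
    print(caveat)
```

An empty result does not guarantee a speedup; it only means none of the listed limitations apply.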