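Update Delta Lake table schema

Delta Lake lets you update the schema of a table. The following types of changes are supported:

Adding new columns (at arbitrary positions)
Reordering existing columns
Renaming existing columns

You can make these changes explicitly using DDL or implicitly using DML.

Important

When you update a Delta table schema, streams that read from that table terminate. If you want the stream to continue, you must restart it.

For recommended methods, see Production considerations for Structured Streaming (//www.neidfyre.com/docs.gcp/structured-streaming/production.html).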

Explicitly update schema to add columns

             ALTER TABLE table_name ADD COLUMNS (col_name data_type [COMMENT col_comment] [FIRST|AFTER colA_name], ...)
            

By default, nullability is true.

To add a column to a nested field, use:

             ALTER TABLE table_name ADD COLUMNS (col_name.nested_col_name data_type [COMMENT col_comment] [FIRST|AFTER colA_name], ...)
            

For example, if the schema before running ALTER TABLE boxes ADD COLUMNS (colB.nested STRING AFTER field1) is:

             - root
             | - colA
             | - colB
             | +-field1
             | +-field2
            

the schema after is:

             - root
             | - colA
             | - colB
             | +-field1
             | +-nested
             | +-field2
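The FIRST | AFTER placement shown above can be illustrated with a small pure-Python model. This is only a sketch of the ordering rule, not how Delta Lake implements it: the helper name `add_column`, the list-of-pairs schema representation, and the `string` types given to `colA`, `field1`, and `field2` are all assumptions made for illustration.

```python
from typing import List, Optional, Tuple, Union

# A struct is modelled as an ordered list of (name, type) pairs. The type is
# either a type-name string or a nested struct (another list of pairs).
Field = Tuple[str, Union[str, list]]


def add_column(schema: List[Field], path: str, data_type: str,
               after: Optional[str] = None, first: bool = False) -> List[Field]:
    """Return a new schema with `path` added (hypothetical helper)."""
    head, _, rest = path.partition(".")
    if rest:
        # Descend into the struct named by the first path segment.
        result, found = [], False
        for name, typ in schema:
            if name == head:
                if not isinstance(typ, list):
                    raise TypeError(f"{head} is not a struct")
                typ = add_column(typ, rest, data_type, after, first)
                found = True
            result.append((name, typ))
        if not found:
            raise KeyError(head)
        return result

    names = [name for name, _ in schema]
    if head in names:
        raise ValueError(f"column {head} already exists")
    if first:
        index = 0
    elif after is not None:
        index = names.index(after) + 1  # ValueError if `after` is unknown
    else:
        index = len(schema)  # default: append at the end
    return schema[:index] + [(head, data_type)] + schema[index:]


# Assumed types; the schema trees above show only the field names.
boxes = [("colA", "string"),
         ("colB", [("field1", "string"), ("field2", "string")])]
evolved = add_column(boxes, "colB.nested", "string", after="field1")
print([name for name, _ in evolved[1][1]])  # ['field1', 'nested', 'field2']
```

The model only descends into structs, which mirrors the restriction that nested columns cannot be added inside arrays or maps.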
            

Note

Adding nested columns is supported only for structs. Arrays and maps are not supported.

Explicitly update schema to change column comment or ordering

             ALTER TABLE table_name ALTER [COLUMN] col_name (COMMENT col_comment | FIRST | AFTER colA_name)
            

To change a column in a nested field, use:

             ALTER TABLE table_name ALTER [COLUMN] col_name.nested_col_name (COMMENT col_comment | FIRST | AFTER colA_name)
            

For example, if the schema before running ALTER TABLE boxes ALTER COLUMN colB.field2 FIRST is:

             - root
             | - colA
             | - colB
             | +-field1
             | +-field2
            

the schema after is:

             - root
             | - colA
             | - colB
             | +-field2
             | +-field1
            

Explicitly update schema to replace columns

             ALTER TABLE table_name REPLACE COLUMNS (col_name1 col_type1 [COMMENT col_comment1], ...)
            

For example, when running the following DDL:

             ALTER TABLE boxes REPLACE COLUMNS (colC STRING, colB STRUCT<field2:STRING, nested:STRING, field1:STRING>, colA STRING)
            

if the schema before is:

             - root
             | - colA
             | - colB
             | +-field1
             | +-field2
            

the schema after is:

             - root
             | - colC
             | - colB
             | +-field2
             | +-nested
             | +-field1
             | - colA
            

Explicitly update schema to rename columns

Preview

This feature is in Public Preview (//www.neidfyre.com/docs.gcp/release-notes/release-types.html).

Note

This feature is available in Databricks Runtime 10.2 and above.

To rename columns without rewriting any of the columns' existing data, you must enable column mapping for the table. See Rename and drop columns with Delta Lake column mapping (//www.neidfyre.com/docs.gcp/delta/delta-column-mapping.html).

To rename a column:

             ALTER TABLE table_name RENAME COLUMN old_col_name TO new_col_name
            

To rename a nested field:

             ALTER TABLE table_name RENAME COLUMN col_name.old_nested_field TO new_nested_field
            

For example, when you run the following command:

             ALTER TABLE boxes RENAME COLUMN colB.field1 TO field001
            

if the schema before is:

             - root
             | - colA
             | - colB
             | +-field1
             | +-field2
            

then the schema after is:

             - root
             | - colA
             | - colB
             | +-field001
             | +-field2
            

See Rename and drop columns with Delta Lake column mapping (//www.neidfyre.com/docs.gcp/delta/delta-column-mapping.html).

Explicitly update schema to drop columns

Preview

This feature is in Public Preview (//www.neidfyre.com/docs.gcp/release-notes/release-types.html).

Note

This feature is available in Databricks Runtime 11.0 and above.

To drop columns as a metadata-only operation without rewriting any data files, you must enable column mapping for the table. See Rename and drop columns with Delta Lake column mapping (//www.neidfyre.com/docs.gcp/delta/delta-column-mapping.html).

Important

Dropping a column from metadata does not delete the underlying data for the column in files. To purge the dropped column data, you can use REORG TABLE (//www.neidfyre.com/docs.gcp/sql/language-manual/delta-reorg-table.html) to rewrite files. You can then use VACUUM (//www.neidfyre.com/docs.gcp/sql/language-manual/delta-vacuum.html) to physically delete the files that contain the dropped column data.

To drop a column:

             ALTER TABLE table_name DROP COLUMN col_name
            

To drop multiple columns:

             ALTER TABLE table_name DROP COLUMNS (col_name_1, col_name_2)
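The drop, rewrite, and physical-delete sequence described in this section can be sketched as a list of SQL statements. The helper below is hypothetical and only builds strings. It assumes the `REORG TABLE ... APPLY (PURGE)` form from the linked REORG TABLE reference, assumes column mapping is already enabled on the table, and does no identifier quoting, so treat it as an outline rather than a ready-made tool.

```python
from typing import List


def drop_and_purge_statements(table: str, columns: List[str]) -> List[str]:
    """Build the statements for dropping columns and purging their data.

    Hypothetical helper: names are interpolated as-is (no quoting or
    validation), so only pass trusted identifiers.
    """
    if not columns:
        raise ValueError("at least one column is required")
    if len(columns) == 1:
        drop = f"ALTER TABLE {table} DROP COLUMN {columns[0]}"
    else:
        drop = f"ALTER TABLE {table} DROP COLUMNS ({', '.join(columns)})"
    return [
        drop,                                  # metadata-only drop
        f"REORG TABLE {table} APPLY (PURGE)",  # assumed syntax: rewrite files without the dropped data
        f"VACUUM {table}",                     # physically delete the old files
    ]


statements = drop_and_purge_statements("table_name", ["col_name_1", "col_name_2"])
for statement in statements:
    print(statement)
```

Each string could then be passed to `spark.sql(...)` in order on a cluster where these commands are available.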
            

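Explicitly update schema to change column type or name

You can change a column's type or name or drop a column by rewriting the table. To do this, use the overwriteSchema option.

The following example shows changing a column type:

             from pyspark.sql.functions import col

             # Read the table, cast birthDate to a date, and overwrite both data and schema.
             (spark.read.table(...)
               .withColumn("birthDate", col("birthDate").cast("date"))
               .write
               .mode("overwrite")
               .option("overwriteSchema", "true")
               .saveAsTable(...)
             )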
            

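The following example shows changing a column name:

             # Rename dateOfBirth to birthDate, then overwrite both data and schema.
             (spark.read.table(...)
               .withColumnRenamed("dateOfBirth", "birthDate")
               .write
               .mode("overwrite")
               .option("overwriteSchema", "true")
               .saveAsTable(...)
             )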
            

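Add columns with automatic schema update

Columns that are present in the DataFrame but missing from the table are automatically added as part of a write transaction when either of the following is true:

write or writeStream have .option("mergeSchema", "true")
spark.databricks.delta.schema.autoMerge.enabled is true

When both options are specified, the option from the DataFrameWriter takes precedence. The added columns are appended to the end of the struct they are present in. Case is preserved when appending a new column.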
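The append-at-end rule can be illustrated with a small pure-Python model. It is a sketch of the behavior described above, not Delta Lake's implementation: the `merge_schema` helper and the example column names are hypothetical, column lookup is assumed to be case-insensitive, and type conflicts are deliberately ignored.

```python
from typing import List, Tuple, Union

# A struct is an ordered list of (name, type) pairs; a type is either a
# type-name string or a nested struct (another list of pairs).
Field = Tuple[str, Union[str, list]]


def merge_schema(table: List[Field], incoming: List[Field]) -> List[Field]:
    """Append columns found only in `incoming` to the end of their struct."""
    merged = list(table)
    # Assumption: names are matched without regard to case.
    positions = {name.lower(): i for i, (name, _) in enumerate(merged)}
    for name, typ in incoming:
        i = positions.get(name.lower())
        if i is None:
            # New column: appended last, keeping the incoming spelling.
            positions[name.lower()] = len(merged)
            merged.append((name, typ))
        else:
            existing_name, existing_type = merged[i]
            if isinstance(existing_type, list) and isinstance(typ, list):
                # Both sides are structs: merge their fields recursively.
                merged[i] = (existing_name, merge_schema(existing_type, typ))
            # Otherwise the table's existing type is kept (conflicts ignored).
    return merged


table_schema = [("id", "long"), ("address", [("city", "string")])]
frame_schema = [("id", "long"),
                ("address", [("city", "string"), ("Zip", "string")]),
                ("Email", "string")]
merged_schema = merge_schema(table_schema, frame_schema)
print(merged_schema)
```

Here `Zip` lands at the end of the `address` struct and `Email` at the end of the top-level struct, each with its original capitalization.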

Note

mergeSchema cannot be used with INSERT INTO or .write.insertInto().

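Automatic schema evolution for Delta Lake merge

Schema evolution allows users to resolve schema mismatches between the target and source table in merge. It handles the following two cases:

A column in the source table is not present in the target table. The new column is added to the target schema, and its values are inserted or updated using the source values.
A column in the target table is not present in the source table. The target schema is left unchanged; the values in the additional target column are either left unchanged (for UPDATE) or set to NULL (for INSERT).

Important

To use schema evolution, you must set the Spark session configuration spark.databricks.delta.schema.autoMerge.enabled to true before you run the merge command.

Note

In Databricks Runtime 7.3 LTS, merge supports schema evolution of only top-level columns, and not of nested columns.
In Databricks Runtime 12.2 and above, columns present in the source table can be specified by name in insert or update actions. In Databricks Runtime 12.1 and below, only INSERT * or UPDATE SET * actions can be used for schema evolution with merge.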

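Here are a few examples of the effects of merge operations with and without schema evolution.

Columns	Query (in SQL)	Behavior without schema evolution (default)	Behavior with schema evolution
Target columns: `key, value` Source columns: `key, value, new_value`	MERGE INTO target_table t USING source_table s ON t.key = s.key WHEN MATCHED THEN UPDATE SET * WHEN NOT MATCHED THEN INSERT *	The table schema remains unchanged; only columns `key`, `value` are updated/inserted.	The table schema is changed to `(key, value, new_value)`. Existing records with matches are updated with the `value` and `new_value` in the source. New rows are inserted with the schema `(key, value, new_value)`.
Target columns: `key, old_value` Source columns: `key, new_value`	MERGE INTO target_table t USING source_table s ON t.key = s.key WHEN MATCHED THEN UPDATE SET * WHEN NOT MATCHED THEN INSERT *	`UPDATE` and `INSERT` actions throw an error because the target column `old_value` is not in the source.	The table schema is changed to `(key, old_value, new_value)`. Existing records with matches are updated with the `new_value` in the source, leaving `old_value` unchanged. New records are inserted with the specified `key`, `new_value`, and `NULL` for the `old_value`.
Target columns: `key, old_value` Source columns: `key, new_value`	MERGE INTO target_table t USING source_table s ON t.key = s.key WHEN MATCHED THEN UPDATE SET new_value = s.new_value	`UPDATE` throws an error because column `new_value` does not exist in the target table.	The table schema is changed to `(key, old_value, new_value)`. Existing records with matches are updated with the `new_value` in the source, leaving `old_value` unchanged, and unmatched records have `NULL` entered for `new_value`. See note (1).
Target columns: `key, old_value` Source columns: `key, new_value`	MERGE INTO target_table t USING source_table s ON t.key = s.key WHEN NOT MATCHED THEN INSERT (key, new_value) VALUES (s.key, s.new_value)	`INSERT` throws an error because column `new_value` does not exist in the target table.	The table schema is changed to `(key, old_value, new_value)`. New records are inserted with the specified `key`, `new_value`, and `NULL` for the `old_value`. Existing records have `NULL` entered for `new_value`, leaving `old_value` unchanged. See note (1).

(1) This behavior is available in Databricks Runtime 12.2 and above; Databricks Runtime 12.1 and below error in this condition.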
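The second row of the table (target `key, old_value`, source `key, new_value`, with UPDATE SET * and INSERT * under schema evolution) can be traced with a small in-memory model. This is a sketch that assumes rows are plain dictionaries, NULL is represented by `None`, and keys are unique on both sides; the function name and the sample values are hypothetical, and the code is not how Delta Lake executes a merge.

```python
from typing import Dict, List, Optional, Tuple

Row = Dict[str, Optional[str]]


def merge_with_schema_evolution(
    target_cols: List[str], target_rows: List[Row],
    source_cols: List[str], source_rows: List[Row],
    key: str = "key",
) -> Tuple[List[str], List[Row]]:
    """Model UPDATE SET * / INSERT * with schema evolution enabled."""
    # Source-only columns are appended to the target schema.
    evolved_cols = target_cols + [c for c in source_cols if c not in target_cols]
    source_by_key = {row[key]: row for row in source_rows}

    result: List[Row] = []
    matched = set()
    for row in target_rows:
        # Existing rows get None (NULL) for any newly added column.
        new_row = {c: row.get(c) for c in evolved_cols}
        src = source_by_key.get(row[key])
        if src is not None:
            matched.add(row[key])
            # UPDATE SET *: only columns present in the source are
            # overwritten; target-only columns keep their old values.
            for c in source_cols:
                new_row[c] = src[c]
        result.append(new_row)

    for src in source_rows:
        if src[key] not in matched:
            # INSERT *: target-only columns are set to None (NULL).
            result.append({c: src.get(c) for c in evolved_cols})
    return evolved_cols, result


cols, rows = merge_with_schema_evolution(
    ["key", "old_value"],
    [{"key": "a", "old_value": "x"}, {"key": "b", "old_value": "y"}],
    ["key", "new_value"],
    [{"key": "b", "new_value": "n1"}, {"key": "c", "new_value": "n2"}],
)
print(cols)
for row in rows:
    print(row)
```

In this run, key `b` is matched and keeps its `old_value`, key `a` is untouched apart from gaining a NULL `new_value`, and key `c` is inserted with a NULL `old_value`.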

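Automatic schema evolution for arrays of structs

Delta MERGE INTO supports resolving struct fields by name and evolving schemas for arrays of structs. With schema evolution enabled, target table schemas evolve for arrays of structs, which also works with any nested structs inside of arrays.

Note

This feature is available in Databricks Runtime 9.1 and above. For Databricks Runtime 9.0 and below, implicit Spark casting is used for arrays of structs to resolve struct fields by position, and the effects of merge operations with and without schema evolution of structs in arrays are inconsistent with the behaviors of structs outside of arrays.
In Databricks Runtime 12.2 and above, struct fields present in the source table can be specified by name in insert or update commands. In Databricks Runtime 12.1 and below, only INSERT * or UPDATE SET * commands can be used for schema evolution with merge.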

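Here are a few examples of the effects of merge operations with and without schema evolution for arrays of structs.

Source schema	Target schema	Behavior without schema evolution (default)	Behavior with schema evolution
array<struct<b: string, a: string>>	array<struct<a: int, b: int>>	The table schema remains unchanged. Columns are resolved by name and updated or inserted.	The table schema remains unchanged. Columns are resolved by name and updated or inserted.
array<struct<a: int, c: string, d: string>>	array<struct<a: string, b: string>>	`update` and `insert` throw errors because `c` and `d` do not exist in the target table.	The table schema is changed to array<struct<a: string, b: string, c: string, d: string>>. `c` and `d` are inserted as `NULL` for existing entries in the target table. `update` and `insert` fill entries in the source table with `a` cast to string and `b` as `NULL`.
array<struct<a: string, b: struct<c: string, d: string>>>	array<struct<a: string, b: struct<c: string>>>	`update` and `insert` throw errors because `d` does not exist in the target table.	The target table schema is changed to array<struct<a: string, b: struct<c: string, d: string>>>. `d` is inserted as `NULL` for existing entries in the target table.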

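Dealing with `NullType` columns in schema updates

Because Parquet doesn't support NullType, NullType columns are dropped from the DataFrame when writing into Delta tables, but are still stored in the schema. When a different data type is received for that column, Delta Lake merges the schema to the new data type. If Delta Lake receives a NullType for an existing column, the old schema is retained and the new column is dropped during the write.

NullType in streaming is not supported. Since you must set schemas when using streaming, this should be very rare. NullType is also not accepted for complex types such as ArrayType and MapType.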
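The two NullType rules for writes can be summarized in a few lines of Python. This is a simplified model under stated assumptions: schemas are flat dictionaries of column name to type name, the string `"null"` stands in for NullType, and the function name is hypothetical. It does not cover genuine type conflicts or complex types.

```python
from typing import Dict

NULL_TYPE = "null"  # stand-in for Spark's NullType in this model


def resolve_column_types(table_schema: Dict[str, str],
                         incoming: Dict[str, str]) -> Dict[str, str]:
    """Apply the NullType rules to a flat schema (hypothetical helper)."""
    resolved = dict(table_schema)
    for name, typ in incoming.items():
        existing = resolved.get(name)
        if existing is None or existing == NULL_TYPE:
            # New column, or a column so far only seen as NullType:
            # the schema takes the incoming type.
            resolved[name] = typ
        elif typ == NULL_TYPE:
            # NullType arriving for an existing typed column:
            # the old schema entry is retained.
            continue
        # Any other combination is left untouched in this model.
    return resolved


stored = {"id": "long", "note": NULL_TYPE}
written = {"id": NULL_TYPE, "note": "string"}
resolved_schema = resolve_column_types(stored, written)
print(resolved_schema)  # {'id': 'long', 'note': 'string'}
```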

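Replace table schema

By default, overwriting the data in a table does not overwrite the schema. When overwriting a table using mode("overwrite") without replaceWhere, you may still want to overwrite the schema of the data being written. You replace the schema and partitioning of the table by setting the overwriteSchema option to true:

             df.write.option("overwriteSchema", "true")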
            

Important

You cannot specify overwriteSchema as true when using dynamic partition overwrite.
