Load data using COPY INTO

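The COPY INTO SQL command lets you load data from a file location into a Delta table. This is a retriable and idempotent operation: files in the source location that have already been loaded are skipped.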
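
The skip-on-rerun behavior can be pictured with a small toy model. The sketch below is not how Delta Lake tracks loaded files internally. The `ToyTable` class, its `copy_into` method, and the `/data/...` paths are hypothetical names used only to illustrate what idempotent means here: a second run over the same source files loads nothing new.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTable:
    """Toy stand-in for a target table; not a Delta Lake API."""
    rows: list = field(default_factory=list)
    loaded_files: set = field(default_factory=set)

    def copy_into(self, source_files: dict) -> int:
        """Load rows from files not seen before; return the number of files loaded."""
        loaded = 0
        for path, file_rows in sorted(source_files.items()):
            if path in self.loaded_files:
                continue  # already loaded on an earlier run, so skip it
            self.rows.extend(file_rows)
            self.loaded_files.add(path)
            loaded += 1
        return loaded


# Hypothetical source location with two files.
source = {"/data/a.json": [{"id": 1}], "/data/b.json": [{"id": 2}]}
table = ToyTable()
first_run = table.copy_into(source)    # loads both files
second_run = table.copy_into(source)   # nothing new to load
source["/data/c.json"] = [{"id": 3}]
third_run = table.copy_into(source)    # loads only the new file
print(first_run, second_run, third_run, len(table.rows))  # prints: 2 0 1 3
```

Because the second run is a no-op, retrying after a failure or rerunning on a schedule does not duplicate rows in this model.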

Note

For a more scalable and robust file ingestion experience, Databricks recommends that SQL users use streaming tables.

Requirements

An account admin must configure access to data in cloud object storage before users can load data using COPY INTO.

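Supported source formats

Supported source formats for COPY INTO include CSV, JSON, Avro, ORC, Parquet, text, and binary files. The source can be anywhere that your Databricks workspace has access to.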
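
When COPY INTO statements are assembled in code, it can help to reject an unexpected format before anything is sent to the cluster. The following is a minimal sketch, not part of any Databricks library: `build_copy_into` is a hypothetical helper, and the keyword spellings in `SUPPORTED_FORMATS` are assumed to be the FILEFORMAT values that correspond to the formats listed above. The helper does no quoting or escaping, so it is only suitable for trusted inputs.

```python
# FILEFORMAT keywords assumed to correspond to the supported source formats.
SUPPORTED_FORMATS = {"CSV", "JSON", "AVRO", "ORC", "PARQUET", "TEXT", "BINARYFILE"}


def build_copy_into(table: str, source_path: str, file_format: str) -> str:
    """Return a COPY INTO statement, rejecting formats outside the supported set."""
    fmt = file_format.upper()
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"Unsupported source format: {file_format!r}")
    return f"COPY INTO {table} FROM '{source_path}' FILEFORMAT = {fmt}"


statement = build_copy_into("default.loan_risks_upload", "/path/to/files", "parquet")
print(statement)
```

In a notebook, the resulting string could be passed to `spark.sql`.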

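Example: Load data into a schemaless Delta Lake table

Note

This feature is available in Databricks Runtime 11.0 and above.

You can create empty placeholder Delta tables so that the schema is later inferred during a COPY INTO command: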

    CREATE TABLE IF NOT EXISTS my_table
    [COMMENT <table-description>]
    [TBLPROPERTIES (<table-properties>)];

    COPY INTO my_table
    FROM '/path/to/files'
    FILEFORMAT = <format>
    FORMAT_OPTIONS ('mergeSchema' = 'true')
    COPY_OPTIONS ('mergeSchema' = 'true');

The SQL statement above is idempotent and can be scheduled to run to ingest data exactly once into a Delta table.
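
If the schemaless pattern is generated from code, for example by a scheduled job, the two statements can be rendered from a few inputs. This is a sketch under stated assumptions: `render_options` and `schemaless_copy_into` are hypothetical helper names, `JSON` stands in for the `<format>` placeholder, and no escaping is applied to the inputs.

```python
def render_options(keyword: str, options: dict) -> str:
    """Render a clause such as FORMAT_OPTIONS ('key' = 'value', ...)."""
    pairs = ", ".join(f"'{key}' = '{value}'" for key, value in options.items())
    return f"{keyword} ({pairs})"


def schemaless_copy_into(table: str, source_path: str, file_format: str) -> list:
    """Return the CREATE TABLE and COPY INTO statements for a schemaless load."""
    merge = {"mergeSchema": "true"}
    create = f"CREATE TABLE IF NOT EXISTS {table}"
    copy = " ".join([
        f"COPY INTO {table}",
        f"FROM '{source_path}'",
        f"FILEFORMAT = {file_format}",
        render_options("FORMAT_OPTIONS", merge),
        render_options("COPY_OPTIONS", merge),
    ])
    return [create, copy]


# JSON is only an illustrative choice for the <format> placeholder.
for statement in schemaless_copy_into("my_table", "/path/to/files", "JSON"):
    print(statement + ";")
```

Both statements are safe to rerun: `IF NOT EXISTS` leaves an existing table in place, and COPY INTO skips files it has already loaded.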

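Note

The empty Delta table is not usable outside of COPY INTO. INSERT INTO and MERGE INTO are not supported for writing data into schemaless Delta tables. After data is inserted into the table with COPY INTO, the table becomes queryable.

See Create target tables for COPY INTO.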

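Example: Set schema and load data into a Delta Lake table

The following example shows how to create a Delta table and then use the COPY INTO SQL command to load sample data from Databricks datasets into the table. You can run the example Python, R, Scala, or SQL code from a notebook attached to a Databricks cluster. You can also run the SQL code from a query associated with a SQL warehouse in Databricks SQL.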

Python

    # `spark` and `display` are provided by the Databricks notebook environment.
    table_name = 'default.loan_risks_upload'
    source_data = '/databricks-datasets/learning-spark-v2/loans/loan-risks.snappy.parquet'
    source_format = 'PARQUET'

    # Start from a clean slate so the example can be rerun.
    spark.sql("DROP TABLE IF EXISTS " + table_name)

    # Create the target table with an explicit schema.
    spark.sql("CREATE TABLE " + table_name + " (" +
      "loan_id BIGINT, " +
      "funded_amnt INT, " +
      "paid_amnt DOUBLE, " +
      "addr_state STRING)"
    )

    # Load the sample Parquet file into the table.
    spark.sql("COPY INTO " + table_name +
      " FROM '" + source_data + "'" +
      " FILEFORMAT = " + source_format
    )

    loan_risks_upload_data = spark.sql("SELECT * FROM " + table_name)

    display(loan_risks_upload_data)

    '''
    Result:
    +---------+-------------+-----------+------------+
    | loan_id | funded_amnt | paid_amnt | addr_state |
    +=========+=============+===========+============+
    | 0       | 1000        | 182.22    |            |
    +---------+-------------+-----------+------------+
    | 1       | 1000        | 361.19    |            |
    +---------+-------------+-----------+------------+
    | 2       | 1000        | 176.26    | TX         |
    +---------+-------------+-----------+------------+
    '''

R

    library(SparkR)
    sparkR.session()

    table_name = "default.loan_risks_upload"
    source_data = "/databricks-datasets/learning-spark-v2/loans/loan-risks.snappy.parquet"
    source_format = "PARQUET"

    # Start from a clean slate so the example can be rerun.
    sql(paste("DROP TABLE IF EXISTS ", table_name, sep = ""))

    # Create the target table with an explicit schema.
    sql(paste("CREATE TABLE ", table_name, " (",
      "loan_id BIGINT, ",
      "funded_amnt INT, ",
      "paid_amnt DOUBLE, ",
      "addr_state STRING)",
      sep = ""
    ))

    # Load the sample Parquet file into the table.
    sql(paste("COPY INTO ", table_name,
      " FROM '", source_data, "'",
      " FILEFORMAT = ", source_format,
      sep = ""
    ))

    loan_risks_upload_data = tableToDF(table_name)

    display(loan_risks_upload_data)

    # Result:
    # +---------+-------------+-----------+------------+
    # | loan_id | funded_amnt | paid_amnt | addr_state |
    # +=========+=============+===========+============+
    # | 0       | 1000        | 182.22    |            |
    # +---------+-------------+-----------+------------+
    # | 1       | 1000        | 361.19    |            |
    # +---------+-------------+-----------+------------+
    # | 2       | 1000        | 176.26    | TX         |
    # +---------+-------------+-----------+------------+
    # ...

Scala

    val table_name = "default.loan_risks_upload"
    val source_data = "/databricks-datasets/learning-spark-v2/loans/loan-risks.snappy.parquet"
    val source_format = "PARQUET"

    // Start from a clean slate so the example can be rerun.
    spark.sql("DROP TABLE IF EXISTS " + table_name)

    // Create the target table with an explicit schema.
    spark.sql("CREATE TABLE " + table_name + " (" +
      "loan_id BIGINT, " +
      "funded_amnt INT, " +
      "paid_amnt DOUBLE, " +
      "addr_state STRING)"
    )

    // Load the sample Parquet file into the table.
    spark.sql("COPY INTO " + table_name +
      " FROM '" + source_data + "'" +
      " FILEFORMAT = " + source_format
    )

    val loan_risks_upload_data = spark.table(table_name)

    display(loan_risks_upload_data)

    /*
    Result:
    +---------+-------------+-----------+------------+
    | loan_id | funded_amnt | paid_amnt | addr_state |
    +=========+=============+===========+============+
    | 0       | 1000        | 182.22    |            |
    +---------+-------------+-----------+------------+
    | 1       | 1000        | 361.19    |            |
    +---------+-------------+-----------+------------+
    | 2       | 1000        | 176.26    | TX         |
    +---------+-------------+-----------+------------+
    */

SQL

    DROP TABLE IF EXISTS default.loan_risks_upload;

    CREATE TABLE default.loan_risks_upload (
      loan_id BIGINT,
      funded_amnt INT,
      paid_amnt DOUBLE,
      addr_state STRING
    );

    COPY INTO default.loan_risks_upload
    FROM '/databricks-datasets/learning-spark-v2/loans/loan-risks.snappy.parquet'
    FILEFORMAT = PARQUET;

    SELECT * FROM default.loan_risks_upload;

    -- Result:
    -- +---------+-------------+-----------+------------+
    -- | loan_id | funded_amnt | paid_amnt | addr_state |
    -- +=========+=============+===========+============+
    -- | 0       | 1000        | 182.22    |            |
    -- +---------+-------------+-----------+------------+
    -- | 1       | 1000        | 361.19    |            |
    -- +---------+-------------+-----------+------------+
    -- | 2       | 1000        | 176.26    | TX         |
    -- +---------+-------------+-----------+------------+
    -- ...
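
Before running the example against a cluster, it can be useful to preview the exact statements it will issue. The sketch below restates the example's sequence with the SQL runner passed in as a parameter. `load_loan_risks` and its `run_sql` parameter are hypothetical names introduced for illustration. The table name, source path, and schema are taken from the example.

```python
from typing import Callable

TABLE_NAME = "default.loan_risks_upload"
SOURCE_DATA = "/databricks-datasets/learning-spark-v2/loans/loan-risks.snappy.parquet"


def load_loan_risks(run_sql: Callable[[str], object]) -> None:
    """Issue the DROP, CREATE, and COPY INTO statements through run_sql."""
    run_sql(f"DROP TABLE IF EXISTS {TABLE_NAME}")
    run_sql(
        f"CREATE TABLE {TABLE_NAME} ("
        "loan_id BIGINT, funded_amnt INT, paid_amnt DOUBLE, addr_state STRING)"
    )
    run_sql(f"COPY INTO {TABLE_NAME} FROM '{SOURCE_DATA}' FILEFORMAT = PARQUET")


# Dry run: collect the statements instead of executing them.
statements = []
load_loan_risks(statements.append)
for statement in statements:
    print(statement)
```

In a notebook attached to a cluster, calling `load_loan_risks(spark.sql)` would execute the same three statements instead of collecting them.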

To clean up, run the following code, which deletes the table:

Python

    spark.sql("DROP TABLE " + table_name)

R

    sql(paste("DROP TABLE ", table_name, sep = ""))

Scala

    spark.sql("DROP TABLE " + table_name)

SQL

    DROP TABLE default.loan_risks_upload

Reference

Additional resources