Load data using COPY INTO
The COPY INTO SQL command lets you load data from a file location into a Delta table. This is a retriable and idempotent operation; files in the source location that have already been loaded are skipped.
COPY INTO supports secure access in several ways, including the ability to use temporary credentials.
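As an illustration of the temporary-credential form, a notebook could compose and submit the statement as below. This is only a sketch: it assumes the `WITH (CREDENTIAL ...)` clause documented in the COPY INTO reference for S3 sources, and the table name, bucket path, and key values are placeholders, not real identifiers.

```python
# Sketch: compose a COPY INTO statement that passes temporary credentials
# inline. All argument values below are placeholders.
def copy_into_with_credentials(table, path, access_key, secret_key, session_token):
    return (
        f"COPY INTO {table} "
        f"FROM '{path}' WITH ("
        f"CREDENTIAL (AWS_ACCESS_KEY = '{access_key}', "
        f"AWS_SECRET_KEY = '{secret_key}', "
        f"AWS_SESSION_TOKEN = '{session_token}')"
        f") FILEFORMAT = PARQUET"
    )

stmt = copy_into_with_credentials(
    "my_table", "s3://my-bucket/raw",
    "<access-key>", "<secret-key>", "<session-token>"
)
# In a Databricks notebook you would then run: spark.sql(stmt)
```

Because the credentials are scoped to the statement, this form lets users load from a location they could not otherwise read.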
Empty Delta Lake tables
Note

This feature is available in Databricks Runtime 11.0 and above.
You can create empty placeholder Delta tables so that the schema is inferred later, during a COPY INTO command:
CREATE TABLE IF NOT EXISTS my_table
[COMMENT <table_description>]
[TBLPROPERTIES (<table_properties>)];

COPY INTO my_table
FROM '/path/to/files'
FILEFORMAT = <format>
FORMAT_OPTIONS ('mergeSchema' = 'true')
COPY_OPTIONS ('mergeSchema' = 'true');
The SQL statement above is idempotent and can be scheduled to run repeatedly to ingest data exactly once into a Delta table.
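To make the exactly-once claim concrete: re-running the same COPY INTO does not duplicate rows, because files that were already loaded into the target table are tracked and skipped. The following is only a toy Python model of that bookkeeping, not Databricks code; the class and file names are invented for illustration.

```python
# Toy model of COPY INTO's exactly-once file tracking: the target "table"
# remembers which source files it has already loaded and skips them on re-runs.
class ToyDeltaTable:
    def __init__(self):
        self.rows = []
        self.loaded_files = set()

    def copy_into(self, files):
        """Load each source file at most once; return how many files were loaded."""
        newly_loaded = 0
        for name, rows in files.items():
            if name in self.loaded_files:
                continue  # already ingested on a previous run -> skipped
            self.rows.extend(rows)
            self.loaded_files.add(name)
            newly_loaded += 1
        return newly_loaded

source = {"part-0.parquet": [1, 2], "part-1.parquet": [3]}
t = ToyDeltaTable()
first = t.copy_into(source)   # first run loads both files
second = t.copy_into(source)  # re-run: nothing new, no duplicate rows
```

This is why the command can safely be scheduled: each run picks up only files that arrived since the previous run.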
Note

The empty Delta table is not usable outside of COPY INTO. INSERT INTO and MERGE INTO are not supported for writing data into schemaless Delta tables. After data is inserted into the table with COPY INTO, the table becomes queryable.
Examples
For common use patterns, see Common data loading patterns with COPY INTO.

The following example shows how to create a Delta table and then use the COPY INTO SQL command to load sample data from Databricks datasets into the table. You can run the example Python, R, Scala, or SQL code from a notebook attached to a Databricks cluster. You can also run the SQL code from a query associated with a SQL warehouse in Databricks SQL.
Python

table_name = "default.loan_risks_upload"
source_data = "/databricks-datasets/learning-spark-v2/loans/loan-risks.snappy.parquet"
source_format = "PARQUET"

spark.sql("DROP TABLE IF EXISTS " + table_name)

spark.sql("CREATE TABLE " + table_name + " (" +
  "loan_id BIGINT, " +
  "funded_amnt INT, " +
  "paid_amnt DOUBLE, " +
  "addr_state STRING)"
)

spark.sql("COPY INTO " + table_name +
  " FROM '" + source_data + "'" +
  " FILEFORMAT = " + source_format
)

loan_risks_upload_data = spark.sql("SELECT * FROM " + table_name)

display(loan_risks_upload_data)

'''
Result:
+---------+-------------+-----------+------------+
| loan_id | funded_amnt | paid_amnt | addr_state |
+=========+=============+===========+============+
| 0       | 1000        | 182.22    | CA         |
+---------+-------------+-----------+------------+
| 1       | 1000        | 361.19    | WA         |
+---------+-------------+-----------+------------+
| 2       | 1000        | 176.26    | TX         |
+---------+-------------+-----------+------------+
...
'''
R

library(SparkR)
sparkR.session()

table_name = "default.loan_risks_upload"
source_data = "/databricks-datasets/learning-spark-v2/loans/loan-risks.snappy.parquet"
source_format = "PARQUET"

sql(paste("DROP TABLE IF EXISTS ", table_name, sep = ""))

sql(paste("CREATE TABLE ", table_name, " (",
  "loan_id BIGINT, ",
  "funded_amnt INT, ",
  "paid_amnt DOUBLE, ",
  "addr_state STRING)",
  sep = ""
))

sql(paste("COPY INTO ", table_name,
  " FROM '", source_data, "'",
  " FILEFORMAT = ", source_format,
  sep = ""
))

loan_risks_upload_data = tableToDF(table_name)

display(loan_risks_upload_data)

# Result:
# +---------+-------------+-----------+------------+
# | loan_id | funded_amnt | paid_amnt | addr_state |
# +=========+=============+===========+============+
# | 0       | 1000        | 182.22    | CA         |
# +---------+-------------+-----------+------------+
# | 1       | 1000        | 361.19    | WA         |
# +---------+-------------+-----------+------------+
# | 2       | 1000        | 176.26    | TX         |
# +---------+-------------+-----------+------------+
# ...
Scala

val table_name = "default.loan_risks_upload"
val source_data = "/databricks-datasets/learning-spark-v2/loans/loan-risks.snappy.parquet"
val source_format = "PARQUET"

spark.sql("DROP TABLE IF EXISTS " + table_name)

spark.sql("CREATE TABLE " + table_name + " (" +
  "loan_id BIGINT, " +
  "funded_amnt INT, " +
  "paid_amnt DOUBLE, " +
  "addr_state STRING)"
)

spark.sql("COPY INTO " + table_name +
  " FROM '" + source_data + "'" +
  " FILEFORMAT = " + source_format
)

val loan_risks_upload_data = spark.table(table_name)

display(loan_risks_upload_data)

/*
Result:
+---------+-------------+-----------+------------+
| loan_id | funded_amnt | paid_amnt | addr_state |
+=========+=============+===========+============+
| 0       | 1000        | 182.22    | CA         |
+---------+-------------+-----------+------------+
| 1       | 1000        | 361.19    | WA         |
+---------+-------------+-----------+------------+
| 2       | 1000        | 176.26    | TX         |
+---------+-------------+-----------+------------+
...
*/
SQL

DROP TABLE IF EXISTS default.loan_risks_upload;

CREATE TABLE default.loan_risks_upload (
  loan_id BIGINT,
  funded_amnt INT,
  paid_amnt DOUBLE,
  addr_state STRING
);

COPY INTO default.loan_risks_upload
FROM '/databricks-datasets/learning-spark-v2/loans/loan-risks.snappy.parquet'
FILEFORMAT = PARQUET;

SELECT * FROM default.loan_risks_upload;

-- Result:
-- +---------+-------------+-----------+------------+
-- | loan_id | funded_amnt | paid_amnt | addr_state |
-- +=========+=============+===========+============+
-- | 0       | 1000        | 182.22    | CA         |
-- +---------+-------------+-----------+------------+
-- | 1       | 1000        | 361.19    | WA         |
-- +---------+-------------+-----------+------------+
-- | 2       | 1000        | 176.26    | TX         |
-- +---------+-------------+-----------+------------+
-- ...
To clean up, run the following code, which deletes the table:
Python

spark.sql("DROP TABLE " + table_name)
R

sql(paste("DROP TABLE ", table_name, sep = ""))
Scala

spark.sql("DROP TABLE " + table_name)
SQL

DROP TABLE default.loan_risks_upload
Reference
Databricks Runtime 7.x and above: COPY INTO