Load data using COPY INTO

The COPY INTO SQL command lets you load data from a file location into a Delta table. This is a retriable and idempotent operation; files in the source location that have already been loaded are skipped.
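
As a brief sketch of this behavior (the table name my_table and the source path below are placeholders, not taken from the example later on this page), running the same statement twice does not duplicate rows, because files that were already ingested are skipped; the second statement shows the 'force' copy option as a way to request a reload anyway:

-- Load all Parquet files under the source path; re-running this statement
-- skips any files that have already been loaded into my_table.
COPY INTO my_table
FROM '/path/to/source/files'
FILEFORMAT = PARQUET;

-- Assumption for illustration: setting 'force' = 'true' in COPY_OPTIONS
-- re-ingests files regardless of whether they were loaded before.
COPY INTO my_table
FROM '/path/to/source/files'
FILEFORMAT = PARQUET
COPY_OPTIONS ('force' = 'true');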

COPY INTO supports secure access in a number of ways, including the ability to use temporary credentials.
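
For example, temporary session credentials can be supplied inline with a WITH (CREDENTIAL ...) clause; the sketch below is illustrative only (the bucket, path, and key values are placeholders, and the exact credential fields depend on your cloud provider):

-- Load JSON files from an S3 path using temporary credentials passed inline.
COPY INTO my_table
FROM 's3://<bucket>/<path>' WITH (
  CREDENTIAL (
    AWS_ACCESS_KEY = '...',
    AWS_SECRET_KEY = '...',
    AWS_SESSION_TOKEN = '...'
  )
)
FILEFORMAT = JSON;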

Empty Delta Lake tables

Note

This feature is available in Databricks Runtime 11.0 and above.

You can create empty placeholder Delta tables so that the schema is inferred later, during a COPY INTO operation:

CREATE TABLE IF NOT EXISTS my_table
[COMMENT <table_description>]
[TBLPROPERTIES (<table_properties>)];

COPY INTO my_table
FROM '/path/to/files'
FILEFORMAT = <format>
FORMAT_OPTIONS ('mergeSchema' = 'true')
COPY_OPTIONS ('mergeSchema' = 'true');

The SQL statement above is idempotent and can be scheduled to run to ingest data exactly once into a Delta table.

Note

The empty Delta table is not usable outside of COPY INTO. INSERT INTO and MERGE INTO are not supported for writing data into schemaless Delta tables. After data is inserted into the table with COPY INTO, the table becomes queryable.
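
As a short illustration (continuing the placeholder my_table from the snippet above), the table can be queried and its inferred schema inspected only after COPY INTO has loaded data into it:

-- Before the first COPY INTO, my_table has no schema and cannot be written to
-- with INSERT INTO or MERGE INTO. After the load above completes:
SELECT * FROM my_table LIMIT 10;
DESCRIBE TABLE my_table;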

See Create target tables for COPY INTO.

Example

For common use patterns, see Common data loading patterns with COPY INTO.

The following example shows how to create a Delta table and then use the COPY INTO SQL command to load sample data from Databricks datasets into the table. You can run the example Python, R, Scala, or SQL code from a notebook attached to a Databricks cluster. You can also run the SQL code from a query associated with a SQL warehouse in Databricks SQL.

Python

table_name = 'default.loan_risks_upload'
source_data = '/databricks-datasets/learning-spark-v2/loans/loan-risks.snappy.parquet'
source_format = 'PARQUET'

spark.sql("DROP TABLE IF EXISTS " + table_name)

spark.sql("CREATE TABLE " + table_name + " (" +
  "loan_id BIGINT, " +
  "funded_amnt INT, " +
  "paid_amnt DOUBLE, " +
  "addr_state STRING)"
)

spark.sql("COPY INTO " + table_name +
  " FROM '" + source_data + "'" +
  " FILEFORMAT = " + source_format
)

loan_risks_upload_data = spark.sql("SELECT * FROM " + table_name)

display(loan_risks_upload_data)

'''
Result:
+---------+-------------+-----------+------------+
| loan_id | funded_amnt | paid_amnt | addr_state |
+=========+=============+===========+============+
| 0       | 1000        | 182.22    | CA         |
+---------+-------------+-----------+------------+
| 1       | 1000        | 361.19    | WA         |
+---------+-------------+-----------+------------+
| 2       | 1000        | 176.26    | TX         |
+---------+-------------+-----------+------------+
...
'''

R

library(SparkR)
sparkR.session()

table_name = "default.loan_risks_upload"
source_data = "/databricks-datasets/learning-spark-v2/loans/loan-risks.snappy.parquet"
source_format = "PARQUET"

sql(paste("DROP TABLE IF EXISTS ", table_name, sep = ""))

sql(paste("CREATE TABLE ", table_name, " (",
  "loan_id BIGINT, ",
  "funded_amnt INT, ",
  "paid_amnt DOUBLE, ",
  "addr_state STRING)",
  sep = ""
))

sql(paste("COPY INTO ", table_name,
  " FROM '", source_data, "'",
  " FILEFORMAT = ", source_format,
  sep = ""
))

loan_risks_upload_data = tableToDF(table_name)

display(loan_risks_upload_data)

# Result:
# +---------+-------------+-----------+------------+
# | loan_id | funded_amnt | paid_amnt | addr_state |
# +=========+=============+===========+============+
# | 0       | 1000        | 182.22    | CA         |
# +---------+-------------+-----------+------------+
# | 1       | 1000        | 361.19    | WA         |
# +---------+-------------+-----------+------------+
# | 2       | 1000        | 176.26    | TX         |
# +---------+-------------+-----------+------------+
# ...

Scala

val table_name = "default.loan_risks_upload"
val source_data = "/databricks-datasets/learning-spark-v2/loans/loan-risks.snappy.parquet"
val source_format = "PARQUET"

spark.sql("DROP TABLE IF EXISTS " + table_name)

spark.sql("CREATE TABLE " + table_name + " (" +
  "loan_id BIGINT, " +
  "funded_amnt INT, " +
  "paid_amnt DOUBLE, " +
  "addr_state STRING)"
)

spark.sql("COPY INTO " + table_name +
  " FROM '" + source_data + "'" +
  " FILEFORMAT = " + source_format
)

val loan_risks_upload_data = spark.table(table_name)

display(loan_risks_upload_data)

/*
Result:
+---------+-------------+-----------+------------+
| loan_id | funded_amnt | paid_amnt | addr_state |
+=========+=============+===========+============+
| 0       | 1000        | 182.22    | CA         |
+---------+-------------+-----------+------------+
| 1       | 1000        | 361.19    | WA         |
+---------+-------------+-----------+------------+
| 2       | 1000        | 176.26    | TX         |
+---------+-------------+-----------+------------+
...
*/

SQL

DROP TABLE IF EXISTS default.loan_risks_upload;

CREATE TABLE default.loan_risks_upload (
  loan_id BIGINT,
  funded_amnt INT,
  paid_amnt DOUBLE,
  addr_state STRING
);

COPY INTO default.loan_risks_upload
FROM '/databricks-datasets/learning-spark-v2/loans/loan-risks.snappy.parquet'
FILEFORMAT = PARQUET;

SELECT * FROM default.loan_risks_upload;

-- Result:
-- +---------+-------------+-----------+------------+
-- | loan_id | funded_amnt | paid_amnt | addr_state |
-- +=========+=============+===========+============+
-- | 0       | 1000        | 182.22    | CA         |
-- +---------+-------------+-----------+------------+
-- | 1       | 1000        | 361.19    | WA         |
-- +---------+-------------+-----------+------------+
-- | 2       | 1000        | 176.26    | TX         |
-- +---------+-------------+-----------+------------+
-- ...

To clean up, run the following code, which deletes the table:

Python

spark.sql("DROP TABLE " + table_name)

R

sql(paste("DROP TABLE ", table_name, sep = ""))

Scala

spark.sql("DROP TABLE " + table_name)

SQL

DROP TABLE default.loan_risks_upload

Reference