Load Files to Iceberg Tables using SQL
You may load files such as CSV, JSON, Parquet, and ORC from S3-compatible object storage into iceberg tables in Chango using SQL, for which Chango provides Chango SQL Procedure.
Chango Spark Thrift Server, to which hive clients such as superset can connect through JDBC/Thrift, and Chango Spark SQL Runner, which exposes a REST API, can both be used to run Chango SQL Procedure.
If you want to send Chango SQL Procedure to Chango Spark SQL Runner through REST to load files to iceberg tables in Chango, take a look at Send Spark SQL Queries to Chango Spark SQL Runner.
This page shows how to run Chango SQL Procedure with Chango Spark Thrift Server to load files to iceberg tables in Chango.
Make sure beforehand that you have installed Spark with Chango Spark Thrift Server deployed on it.
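Once a hive client such as superset is connected to Chango Spark Thrift Server, a quick sanity check confirms that the iceberg catalog is reachable. This is only a minimal sketch, assuming the catalog name iceberg used throughout this page.

-- list the schemas available in the iceberg catalog
SHOW NAMESPACES IN iceberg;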
Load Files to Iceberg Tables using Import Procedure
With the import procedure, you may load files to iceberg tables in Chango.
PROC iceberg.system.import (
    source => 's3a://any-bucket/any-path',
    s3_access_key => 'any access key',
    s3_secret_key => 'any secret key',
    s3_endpoint => 'any endpoint',
    s3_region => 'any region',
    file_format => 'json',
    id_columns => 'id_1, id_2',
    action => 'MERGE',
    target_table => 'iceberg.test_db.test'
)
s3_access_key: S3 access key.
s3_secret_key: S3 secret key.
s3_endpoint: S3 endpoint.
s3_region: S3 region.
file_format: one of csv, json, parquet, orc.
source: file source path. Currently, only the s3a scheme is supported; the path must start with s3a.
action: one of MERGE, APPEND, OVERWRITE.
id_columns: comma-separated ID columns of the iceberg table. Mandatory if action is MERGE.
target_table: target iceberg table with fully qualified name. The convention is iceberg.<schema>.<table>.
With the MERGE action, which requires id_columns for merging, the external files are merged into the iceberg table. Use the APPEND action to append to the existing iceberg table dataset, and the OVERWRITE action to replace the iceberg table dataset with the newly loaded file data.
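For instance, a MERGE import against the example table created below might look like the following sketch. The source path, the placeholder credentials, and the choice of itemid as the ID column are assumptions for illustration only; replace them with your own values.

-- hypothetical MERGE: upsert external parquet files into the iceberg table, matching rows on itemid
PROC iceberg.system.import (
    source => 's3a://any-bucket/any-merge-path',
    s3_access_key => 'any access key',
    s3_secret_key => 'any secret key',
    s3_endpoint => 'any endpoint',
    s3_region => 'any region',
    file_format => 'parquet',
    id_columns => 'itemid',
    action => 'MERGE',
    target_table => 'iceberg.iceberg_db.test_proc_import_parquet'
)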
For example, to load files from S3 into an iceberg table in Chango using SQL, first create the iceberg table iceberg.iceberg_db.test_proc_import_parquet, for example with superset connected to Chango Spark Thrift Server in Chango, like this.
CREATE TABLE IF NOT EXISTS iceberg.iceberg_db.test_proc_import_parquet (
    baseproperties STRUCT<eventtype: string,
                          ts: long,
                          uid: string,
                          version: string>,
    itemid string,
    price long,
    quantity long
)
USING iceberg
PARTITIONED BY (itemid);
After that, run the following import procedure, for example with superset.
PROC iceberg.system.import (
    source => 's3a://mykidong/temp-external-parquet-path',
    s3_access_key => 'xxx',
    s3_secret_key => 'xxx',
    s3_endpoint => 'https://xxx.compat.objectstorage.ap-singapore-1.oraclecloud.com',
    s3_region => 'ap-singapore-1',
    file_format => 'parquet',
    action => 'APPEND',
    target_table => 'iceberg.iceberg_db.test_proc_import_parquet'
)
The parquet files located in s3a://mykidong/temp-external-parquet-path will be loaded into the iceberg table iceberg.iceberg_db.test_proc_import_parquet in Chango.
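To verify the load, you can run a simple query against the table from superset or any other client connected to Chango Spark Thrift Server, for example:

-- check the number of rows loaded into the iceberg table
SELECT COUNT(*) FROM iceberg.iceberg_db.test_proc_import_parquet;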
Export Iceberg Table to File using Export Procedure
If you want to export an iceberg table dataset to files, use the following export procedure in Chango.
PROC iceberg.system.export (
    source_table => 'iceberg.test_db.test',
    file_format => 'json',
    target_path => 's3a://any-bucket/any-path',
    s3_access_key => 'any access key',
    s3_secret_key => 'any secret key',
    s3_endpoint => 'any endpoint',
    s3_region => 'any region'
)
source_table: source iceberg table with fully qualified name. The convention is iceberg.<schema>.<table>.
target_path: target file path. Currently, only the s3a scheme is supported; the path must start with s3a.
For example, run the following export procedure in order to export an iceberg table dataset to parquet files in S3.
-- export iceberg table to parquet file in external s3.
PROC iceberg.system.export (
    source_table => 'iceberg.iceberg_db.test_proc_import_parquet',
    file_format => 'parquet',
    target_path => 's3a://chango-bucket/temp-proc-export-another-s3-parquet',
    s3_access_key => 'xxx',
    s3_secret_key => 'xxx',
    s3_endpoint => 'https://s3.ap-northeast-2.amazonaws.com',
    s3_region => 'ap-northeast-2'
)
The dataset of the iceberg table iceberg.iceberg_db.test_proc_import_parquet in Chango will be exported to s3a://chango-bucket/temp-proc-export-another-s3-parquet as parquet files.
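If needed, the exported files could later be loaded back with the import procedure. The following sketch reuses the paths and placeholder credentials from the examples above and uses the OVERWRITE action to replace the table dataset.

-- reload the exported parquet files, replacing the current dataset of the iceberg table
PROC iceberg.system.import (
    source => 's3a://chango-bucket/temp-proc-export-another-s3-parquet',
    s3_access_key => 'xxx',
    s3_secret_key => 'xxx',
    s3_endpoint => 'https://s3.ap-northeast-2.amazonaws.com',
    s3_region => 'ap-northeast-2',
    file_format => 'parquet',
    action => 'OVERWRITE',
    target_table => 'iceberg.iceberg_db.test_proc_import_parquet'
)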