Apache Iceberg

Preparation

To allow Shaped to connect to your Iceberg table, you need to grant Shaped’s AWS service account read-only access to your data lake. You can do this through the AWS console or with the following steps:

Email us to request the details of our service account.
Grant our service account permission to access your data lake via the appropriate IAM role. For example, if your Iceberg table is stored in S3, you can grant our service account the s3:GetObject and s3:ListBucket permissions on the relevant S3 bucket.
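As a rough illustration of the S3 case described above, the sketch below builds a read-only IAM policy document and prints it as JSON. The helper name `build_read_only_policy` and the bucket name `my-iceberg-bucket` are hypothetical placeholders, not part of Shaped's tooling. The sketch covers only the two S3 actions mentioned above; how the policy is attached to a role that Shaped's service account can use depends on the account details you receive from us.

```python
import json


def build_read_only_policy(bucket: str) -> dict:
    """Return an IAM policy document granting read-only access to one S3 bucket.

    `bucket` is a placeholder for the bucket that holds your Iceberg table.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # s3:ListBucket applies to the bucket itself.
                "Sid": "ListBucket",
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket}"],
            },
            {
                # s3:GetObject applies to the objects inside the bucket.
                "Sid": "ReadObjects",
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/*"],
            },
        ],
    }


if __name__ == "__main__":
    # Hypothetical bucket name; replace it with your own.
    print(json.dumps(build_read_only_policy("my-iceberg-bucket"), indent=2))
```

The printed JSON can be pasted into the IAM policy editor in the AWS console.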

Dataset Configuration

Required fields

Field	Example	Description
schema_type	`ICEBERG`	Specifies the connector schema type, in this case "ICEBERG".
catalog_type	`glue` or `hive`	Specifies the type of the Iceberg catalog.
catalog_name	`my_glue_catalog` or `my_hive_catalog`	Specifies the name of the Iceberg catalog.
table_name	`my_iceberg_table`	Specifies the name of the Iceberg table.
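Before submitting a configuration, it can help to check locally that every required field is set. The following is a minimal sketch of such a check, assuming the configuration has already been loaded into a Python dictionary; the function name `missing_required_fields` is hypothetical and is not part of the Shaped CLI.

```python
# Required fields, taken from the table above.
REQUIRED_FIELDS = ("schema_type", "catalog_type", "catalog_name", "table_name")


def missing_required_fields(config: dict) -> list:
    """Return the required fields that are absent or empty in `config`."""
    return [field for field in REQUIRED_FIELDS if not config.get(field)]


if __name__ == "__main__":
    config = {
        "schema_type": "ICEBERG",
        "catalog_type": "glue",
        "catalog_name": "my_glue_catalog",
        # table_name is deliberately left out to show the check.
    }
    print(missing_required_fields(config))
```

An empty list means all four required fields are present.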

Optional fields

Field	Example	Description
aws_role_arn	`arn:aws:iam::123456789012:role/my_role`	Specifies the ARN of an AWS role to assume when accessing the Iceberg table. This is required if the Iceberg table is stored in a secure location, such as an S3 bucket with restricted access.
aws_region	`us-east-1`	Specifies the AWS region where the Iceberg table is located. This is required if the Iceberg table is stored in a region other than the default region for your AWS account.
unique_keys	`["productId"]`	Specifies a list of columns that uniquely identify a row in the table. If duplicate rows are inserted with the same key values, the latest row is used.
batch_size	`10000`	Specifies the number of records to fetch in each batch. The default value is 10000.
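The effect of `unique_keys` can be pictured with a small sketch. It is only an illustration of "latest row wins" semantics, assuming that "latest" means the row that arrives last; it is not Shaped's actual implementation, and the function name `dedupe_latest` is hypothetical.

```python
def dedupe_latest(rows, unique_keys):
    """Keep one row per unique key, letting later rows replace earlier ones."""
    latest = {}
    for row in rows:
        # Build the composite key from the configured key columns.
        key = tuple(row[column] for column in unique_keys)
        # A later row with the same key overwrites the earlier one.
        latest[key] = row
    return list(latest.values())


if __name__ == "__main__":
    rows = [
        {"productId": 1, "price": 10},
        {"productId": 2, "price": 5},
        {"productId": 1, "price": 12},  # duplicate key, arrives later
    ]
    for row in dedupe_latest(rows, ["productId"]):
        print(row)
```

With `unique_keys` set to `["productId"]`, only the row with `price` 12 survives for product 1.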

Dataset Creation Example

Below is an example of an Iceberg dataset connector configuration:

name: my_iceberg_dataset
schema_type: ICEBERG
catalog_type: glue
catalog_name: my_glue_catalog
table_name: my_iceberg_table
aws_role_arn: arn:aws:iam::123456789012:role/my_role
aws_region: us-east-1

With the configuration above saved as dataset.yaml, the following Shaped CLI command creates the Iceberg dataset and begins syncing data from your Iceberg table into Shaped.

shaped create-dataset --file dataset.yaml
