Apache Iceberg

Preparation

To allow Shaped to connect to your Iceberg table, you need to grant Shaped’s AWS service account read-only access to your data lake. You can do this through the AWS console or with the following steps:

  1. Contact us by email to obtain our service account details.
  2. Grant our service account permission to access your data lake via the appropriate IAM role. For example, if your Iceberg table is stored in S3, you can grant our service account the s3:GetObject and s3:ListBucket permissions on the relevant S3 bucket.
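
As a rough illustration of step 2, the sketch below builds a read-only IAM policy document for a single S3 bucket. The bucket name `my-iceberg-bucket` and the helper `build_read_only_policy` are hypothetical and not part of Shaped's tooling. The policy covers only the two S3 permissions named above; a Glue or Hive catalog may need additional permissions that are not shown here.

```python
import json


def build_read_only_policy(bucket: str) -> dict:
    """Return an IAM policy document granting read-only access to one S3 bucket.

    `bucket` is a placeholder for the bucket that holds your Iceberg table.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # s3:ListBucket is a bucket-level action, so it targets the bucket ARN.
                "Sid": "ListBucket",
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket}"],
            },
            {
                # s3:GetObject is an object-level action, so it targets the objects in the bucket.
                "Sid": "ReadObjects",
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/*"],
            },
        ],
    }


if __name__ == "__main__":
    # Print the policy as JSON so it can be pasted into the AWS console.
    print(json.dumps(build_read_only_policy("my-iceberg-bucket"), indent=2))
```

The two statements are kept separate because the bucket-level and object-level actions apply to different resource ARNs.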

Dataset Configuration

Required fields

| Field | Example | Description |
| --- | --- | --- |
| schema_type | ICEBERG | Specifies the connector schema type, in this case "ICEBERG". |
| catalog_type | glue or hive | Specifies the type of the Iceberg catalog. |
| catalog_name | my_glue_catalog or my_hive_catalog | Specifies the name of the Iceberg catalog. |
| table_name | my_iceberg_table | Specifies the name of the Iceberg table. |

Optional fields

| Field | Example | Description |
| --- | --- | --- |
| aws_role_arn | arn:aws:iam::123456789012:role/my_role | Specifies the ARN of an AWS role to assume when accessing the Iceberg table. This is required if the Iceberg table is stored in a secure location, such as an S3 bucket with restricted access. |
| aws_region | us-east-1 | Specifies the AWS region where the Iceberg table is located. This is required if the Iceberg table is stored in a region other than the default region for your AWS account. |

Dataset Creation Example

Below is an example of an Iceberg dataset connector configuration:

name: my_iceberg_dataset
schema_type: ICEBERG
catalog_type: glue
catalog_name: my_glue_catalog
table_name: my_iceberg_table
aws_role_arn: arn:aws:iam::123456789012:role/my_role
aws_region: us-east-1
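
Before submitting a configuration like the one above, it can help to check it locally. The sketch below is a minimal, unofficial validator based only on the field tables in this page. It assumes that `glue` and `hive` are the only accepted `catalog_type` values and that the configuration has already been loaded into a dictionary. The function name `validate_iceberg_config` is hypothetical.

```python
# Required fields as listed in the "Required fields" table.
REQUIRED_FIELDS = ("schema_type", "catalog_type", "catalog_name", "table_name")

# Assumption: the two example values in the table are the only valid ones.
ASSUMED_CATALOG_TYPES = {"glue", "hive"}


def validate_iceberg_config(config: dict) -> list:
    """Return a list of human-readable problems; an empty list means no problems found."""
    errors = []

    # Every required field must be present and non-empty.
    for field in REQUIRED_FIELDS:
        if not config.get(field):
            errors.append(f"missing required field: {field}")

    # When present, schema_type must be the literal string "ICEBERG".
    schema_type = config.get("schema_type")
    if schema_type and schema_type != "ICEBERG":
        errors.append('schema_type must be "ICEBERG"')

    # When present, catalog_type must be one of the assumed values.
    catalog_type = config.get("catalog_type")
    if catalog_type and catalog_type not in ASSUMED_CATALOG_TYPES:
        errors.append(f"catalog_type must be one of {sorted(ASSUMED_CATALOG_TYPES)}")

    return errors


if __name__ == "__main__":
    example = {
        "name": "my_iceberg_dataset",
        "schema_type": "ICEBERG",
        "catalog_type": "glue",
        "catalog_name": "my_glue_catalog",
        "table_name": "my_iceberg_table",
        "aws_role_arn": "arn:aws:iam::123456789012:role/my_role",
        "aws_region": "us-east-1",
    }
    print(validate_iceberg_config(example))
```

The optional fields `aws_role_arn` and `aws_region` are deliberately not checked, since whether they are needed depends on how the table is stored.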

The following Shaped CLI command creates the Iceberg dataset from the configuration above (saved as dataset.yaml) and begins syncing data from your Iceberg table into Shaped.

shaped create-dataset --file dataset.yaml