
Apache Iceberg

Premium Connector

This connector requires the Standard Plan or higher.

What this connector does

Shaped reads your Apache Iceberg table through an Iceberg catalog (for example AWS Glue), copies rows into Shaped’s offline store, and keeps them updated on your schedule. You configure the catalog, the table identifier, and optional AWS credentials when the table lives in another account or region.

Preparation

Grant Shaped read-only access to the Glue Data Catalog (or your catalog) and to the S3 objects that hold Iceberg metadata and data files. Access is usually granted through a cross-account IAM role that Shaped assumes at sync time.

  1. Contact us for the ARN of the IAM principal (role or user) that will call sts:AssumeRole on a role in your account.
  2. In your account, create an IAM role that:
    • Trusts that principal (sts:AssumeRole in the role trust policy).
    • Allows at least:
      • Glue: glue:GetDatabase, glue:GetDatabases, glue:GetTable, glue:GetTables, glue:GetPartitions (and related read-only Glue APIs your tables need).
      • S3: s3:ListBucket on the buckets, and s3:GetObject on the prefixes that hold the Iceberg table's metadata and data files.
  3. Put the role ARN in aws_role_arn in your table config (see below).
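The two policy documents from step 2 can be sketched as plain JSON. This is a minimal, hedged example, not an official template: the Shaped principal ARN, bucket name, and S3 prefix below are hypothetical placeholders, and the resources are scoped to a single Glue database and a single S3 prefix, which you may need to widen for your layout.

```python
import json

# Hypothetical placeholders: use the principal ARN Shaped gives you and your own values.
SHAPED_PRINCIPAL_ARN = "arn:aws:iam::444455556666:role/shaped-sync"  # hypothetical
ACCOUNT_ID, REGION, DATABASE = "111122223333", "us-west-2", "analytics"
BUCKET, PREFIX = "my-iceberg-bucket", "warehouse/analytics/orders"  # hypothetical


def trust_policy(principal_arn: str) -> dict:
    """Trust policy that lets the given principal call sts:AssumeRole on this role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": principal_arn},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def read_policy(account_id: str, region: str, database: str, bucket: str, prefix: str) -> dict:
    """Read-only Glue and S3 permissions scoped to one database and one S3 prefix."""
    glue_arn = f"arn:aws:glue:{region}:{account_id}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "GlueCatalogRead",
                "Effect": "Allow",
                "Action": [
                    "glue:GetDatabase",
                    "glue:GetDatabases",
                    "glue:GetTable",
                    "glue:GetTables",
                    "glue:GetPartitions",
                ],
                "Resource": [
                    f"{glue_arn}:catalog",
                    f"{glue_arn}:database/{database}",
                    f"{glue_arn}:table/{database}/*",
                ],
            },
            {
                # ListBucket is a bucket-level action; limit listing to the table prefix.
                "Sid": "S3ListTablePrefix",
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
                "Condition": {"StringLike": {"s3:prefix": [prefix, f"{prefix}/*"]}},
            },
            {
                # GetObject is an object-level action on metadata and data files.
                "Sid": "S3ReadTableFiles",
                "Effect": "Allow",
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}/*",
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(trust_policy(SHAPED_PRINCIPAL_ARN), indent=2))
    print(json.dumps(read_policy(ACCOUNT_ID, REGION, DATABASE, BUCKET, PREFIX), indent=2))
```

Saved to files, the two documents can be passed to the AWS CLI with aws iam create-role --assume-role-policy-document file://trust.json and aws iam put-role-policy --policy-document file://read.json.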

Table configuration

Required fields

| Field | Example | Description |
| --- | --- | --- |
| schema_type | ICEBERG | Must be ICEBERG. |
| name | my_sales_events | Shaped dataset name (identifier for this connector in your project). |
| catalog_type | glue (or hive, etc.) | Iceberg catalog type for PyIceberg; glue is the common case for the AWS Glue Data Catalog. |
| catalog_name | glue_catalog | Catalog name passed to the Iceberg runtime, equivalent to the name you pass to load_catalog in PyIceberg. Must match how your environment resolves the catalog. |
| database_name | analytics | Iceberg namespace that contains the table. For AWS Glue, this is the Glue database name. Specify the namespace name only; do not prefix it onto table_name. |
| table_name | orders | Iceberg table name within database_name. Specify the table name only (e.g. orders), not the namespace-qualified name (e.g. not analytics.orders). |
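A quick way to catch the most common mistake with these fields (a namespace-qualified table_name) is to validate the config before creating the table. The sketch below is hypothetical and not part of Shaped's CLI; the validate_iceberg_config and iceberg_identifier helpers are illustrative names. It also shows the assumed mapping to PyIceberg, where database_name and table_name form the (namespace, table) tuple that Catalog.load_table accepts.

```python
REQUIRED_FIELDS = ("schema_type", "name", "catalog_type", "catalog_name",
                   "database_name", "table_name")


def validate_iceberg_config(cfg: dict) -> list[str]:
    """Return a list of problems in an Iceberg table config (empty if none found)."""
    problems = [f"missing required field: {f}" for f in REQUIRED_FIELDS if not cfg.get(f)]
    if cfg.get("schema_type") and cfg["schema_type"] != "ICEBERG":
        problems.append("schema_type must be ICEBERG")
    db, table = cfg.get("database_name", ""), cfg.get("table_name", "")
    # table_name must be the bare table name, not "<namespace>.<table>".
    if db and table.startswith(f"{db}."):
        problems.append(f"table_name should be {table[len(db) + 1:]!r}, not {table!r}")
    return problems


def iceberg_identifier(cfg: dict) -> tuple[str, str]:
    """(namespace, table) identifier, kept as two separate parts."""
    return (cfg["database_name"], cfg["table_name"])
```

For the example values in the table, validate_iceberg_config returns an empty list; changing table_name to analytics.orders produces a single message suggesting orders instead.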

Optional fields

| Field | Example | Description |
| --- | --- | --- |
| aws_role_arn | arn:aws:iam::111122223333:role/ShapedIcebergRead | Role Shaped assumes to read Glue and S3. Use this for cross-account tables or when you want a dedicated reader role in your account. |
| aws_region | us-west-2 | AWS region of the Glue catalog and table storage. Set this when the table is not in the region your environment uses by default. |
| schedule_interval | @hourly | How often to sync (cron-style). Defaults to hourly if omitted. |
| replication_key | event_ts | Column used for incremental replication, where your setup supports it. |
| unique_keys | ["order_id"] | Columns that uniquely identify a row. Shaped uses them to deduplicate rows in its ClickHouse copy; on conflict, the latest row wins. |
| batch_size | 10000 | Rows per batch during extraction. Defaults to 10000. |
| description | | Optional human-readable description. |
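To make the interaction between replication_key, batch_size, and unique_keys concrete, here is a conceptual model in plain Python. It is an illustration under stated assumptions, not Shaped's implementation: the bookmark (the highest replication_key value already synced) and the tie-breaking rule (on equal values, the later row wins) are assumptions, and the function names are hypothetical.

```python
from itertools import islice
from typing import Iterable, Iterator


def incremental_batches(rows: Iterable[dict], replication_key: str,
                        bookmark=None, batch_size: int = 10000) -> Iterator[list[dict]]:
    """Yield rows newer than `bookmark` (by replication_key) in batches of batch_size."""
    # Keep only rows past the last synced value; with no bookmark, take everything.
    fresh = (r for r in rows if bookmark is None or r[replication_key] > bookmark)
    while batch := list(islice(fresh, batch_size)):
        yield batch


def dedupe_latest(rows: Iterable[dict], unique_keys: list[str], order_key: str) -> list[dict]:
    """Keep one row per unique key; the row with the largest order_key wins."""
    latest: dict[tuple, dict] = {}
    for row in rows:
        key = tuple(row[k] for k in unique_keys)
        current = latest.get(key)
        # ">=" means that on a tie, the row seen later replaces the earlier one.
        if current is None or row[order_key] >= current[order_key]:
            latest[key] = row
    return list(latest.values())
```

With rows for order 1 at event_ts 10 (created) and 12 (shipped) plus order 2 at event_ts 11, a bookmark of 10 yields only the rows at 11 and 12, and deduplicating on ["order_id"] keeps the shipped row for order 1. After a successful sync, the next bookmark would be the maximum event_ts seen.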

Example (AWS Glue, cross-account role)

name: my_sales_events
schema_type: ICEBERG
catalog_type: glue
catalog_name: glue_catalog
database_name: analytics
table_name: orders
aws_role_arn: arn:aws:iam::111122223333:role/ShapedIcebergRead
aws_region: us-west-2

Create the table in Shaped:

shaped create-table --file dataset.yaml
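If the sync cannot find the table, it can help to load the same table locally with PyIceberg using the values from dataset.yaml. This is a hedged sketch: it assumes PyIceberg is installed and that your local AWS credentials have Glue and S3 read access equivalent to the reader role. It checks catalog_type, catalog_name, database_name, table_name, and aws_region, but it does not exercise the role's trust relationship with Shaped. The helper names are hypothetical.

```python
def catalog_properties(cfg: dict) -> dict:
    """Map a Shaped-style table config onto PyIceberg load_catalog properties."""
    props = {"type": cfg["catalog_type"]}
    region = cfg.get("aws_region")
    if region:
        # Region for the Glue client and for S3 reads of metadata and data files.
        props["glue.region"] = region
        props["s3.region"] = region
    return props


def preview_table(cfg: dict, limit: int = 5):
    """Load the configured table with your local AWS credentials and read a few rows."""
    # Imported here so catalog_properties() works without PyIceberg installed.
    from pyiceberg.catalog import load_catalog

    catalog = load_catalog(cfg["catalog_name"], **catalog_properties(cfg))
    # database_name and table_name are passed as a (namespace, table) tuple.
    table = catalog.load_table((cfg["database_name"], cfg["table_name"]))
    return table.scan(limit=limit).to_arrow()
```

Calling preview_table with the example config should return a small Arrow table; an error such as NoSuchTableError usually points at a wrong database_name or table_name.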