Apache Iceberg
Premium Connector
This connector requires the Standard Plan or higher.
What this connector does
Shaped reads your Apache Iceberg table through an Iceberg catalog (for example AWS Glue), copies rows into Shaped’s offline store, and keeps them updated on your schedule. You configure the catalog, the table identifier, and optional AWS credentials when the table lives in another account or region.
Preparation
Grant Shaped read-only access to the Glue Data Catalog (or your catalog) and to the S3 objects that hold the Iceberg metadata and data files. Access is usually granted through a cross-account IAM role that Shaped assumes at sync time.
- Contact us for the IAM principal (role or user ARN) that will call sts:AssumeRole into your account.
- In your account, create an IAM role that:
  - Trusts that principal (sts:AssumeRole in the role trust policy).
  - Allows at least:
    - Glue: glue:GetDatabase, glue:GetDatabases, glue:GetTable, glue:GetTables, glue:GetPartitions (and related read-only Glue APIs your tables need).
    - S3: s3:GetObject, s3:ListBucket on the buckets and prefixes used by the Iceberg table.
- Put the role ARN in aws_role_arn in your table config (see below).
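As a rough illustration of the role described above, the following sketch builds the two IAM policy documents as Python dictionaries and prints them as JSON. The function names, the principal ARN, the bucket, and the prefix are all placeholders invented for this example; use the principal Shaped gives you and your own storage locations. The Glue statement uses a wildcard resource only to keep the sketch short, and you may prefer to narrow it to specific catalog, database, and table ARNs.

```python
import json

# Read-only Glue actions named in the list above.
GLUE_READ_ACTIONS = [
    "glue:GetDatabase",
    "glue:GetDatabases",
    "glue:GetTable",
    "glue:GetTables",
    "glue:GetPartitions",
]


def build_trust_policy(shaped_principal_arn):
    """Trust policy that lets one principal call sts:AssumeRole on the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": shaped_principal_arn},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def build_read_policy(bucket, prefix=""):
    """Permissions policy with read-only Glue and S3 access (hypothetical helper)."""
    prefix = prefix.strip("/")
    bucket_arn = f"arn:aws:s3:::{bucket}"
    objects_arn = f"{bucket_arn}/{prefix}/*" if prefix else f"{bucket_arn}/*"

    list_statement = {
        "Effect": "Allow",
        "Action": "s3:ListBucket",
        "Resource": bucket_arn,
    }
    if prefix:
        # Limit listing to keys under the table's prefix.
        list_statement["Condition"] = {"StringLike": {"s3:prefix": [f"{prefix}/*"]}}

    return {
        "Version": "2012-10-17",
        "Statement": [
            # "*" keeps the sketch short; Glue also accepts catalog, database, and table ARNs.
            {"Effect": "Allow", "Action": GLUE_READ_ACTIONS, "Resource": "*"},
            list_statement,
            {"Effect": "Allow", "Action": "s3:GetObject", "Resource": objects_arn},
        ],
    }


if __name__ == "__main__":
    # Placeholder values, not real identifiers.
    trust = build_trust_policy("arn:aws:iam::999999999999:role/example-shaped-principal")
    read = build_read_policy("example-lakehouse-bucket", "warehouse/analytics/orders")
    print(json.dumps(trust, indent=2))
    print(json.dumps(read, indent=2))
```

The printed JSON can be pasted into the role's trust relationship and an attached permissions policy. Add any further read-only Glue actions your tables need to GLUE_READ_ACTIONS.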
Table configuration
Required fields
| Field | Example | Description |
|---|---|---|
| schema_type | ICEBERG | Must be ICEBERG. |
| name | my_sales_events | Shaped dataset name (identifier for this connector in your project). |
| catalog_type | glue (or hive, etc.) | Iceberg catalog type for PyIceberg (glue is the common case for AWS Glue Data Catalog). |
| catalog_name | glue_catalog | Logical catalog name passed to the Iceberg runtime (same idea as the name you use with load_catalog in PyIceberg). Must match how your environment resolves the catalog. |
| database_name | analytics | Iceberg namespace that contains the table. For AWS Glue, this is the Glue database name. Specify the namespace name only; do not prefix it onto table_name. |
| table_name | orders | Iceberg table name within database_name. Specify the table name only (e.g. orders); do not include the namespace (e.g. not analytics.orders). |
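The rules in the table above can be checked locally before a config is submitted. The sketch below is a hypothetical pre-flight helper, not part of Shaped's tooling: `check_required_fields` is an invented name, it assumes the config has already been loaded into a dictionary, and it treats any dot in table_name as a sign that the namespace was included by mistake.

```python
REQUIRED_FIELDS = (
    "schema_type",
    "name",
    "catalog_type",
    "catalog_name",
    "database_name",
    "table_name",
)


def check_required_fields(config):
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    for field in REQUIRED_FIELDS:
        if not config.get(field):
            problems.append(f"missing required field: {field}")

    # Assumption: the value is compared exactly, including case.
    schema_type = config.get("schema_type")
    if schema_type and schema_type != "ICEBERG":
        problems.append("schema_type must be ICEBERG")

    # Heuristic: a dot suggests "namespace.table" was written into table_name.
    table_name = config.get("table_name") or ""
    if "." in table_name:
        problems.append("table_name must not include the namespace; put it in database_name")
    return problems


if __name__ == "__main__":
    config = {
        "schema_type": "ICEBERG",
        "name": "my_sales_events",
        "catalog_type": "glue",
        "catalog_name": "glue_catalog",
        "database_name": "analytics",
        "table_name": "analytics.orders",  # deliberately wrong for the demo
    }
    for problem in check_required_fields(config):
        print(problem)
```

Run as a script, this prints the single table_name problem; changing the value to orders makes the list empty.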
Optional fields
| Field | Example | Description |
|---|---|---|
| aws_role_arn | arn:aws:iam::111122223333:role/ShapedIcebergRead | Role Shaped assumes to read Glue and S3. Use this for cross-account tables or when you want a dedicated reader role in your account. |
| aws_region | us-west-2 | AWS region of the Glue catalog and table storage. Set this when the table is not in the default region you rely on otherwise. |
| schedule_interval | @hourly | How often to sync (cron-style; default is hourly if omitted elsewhere in your stack). |
| replication_key | event_ts | Optional column for incremental replication when supported for your setup. |
| unique_keys | ["order_id"] | Columns that uniquely identify a row for deduplication in the ClickHouse copy; latest row wins on conflict. |
| batch_size | 10000 | Rows per batch during extract; default is 10000 if not set. |
| description | … | Optional human-readable description. |
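To make the unique_keys behavior concrete, here is a small sketch of "latest row wins" deduplication. It is an illustration only, not Shaped's implementation: `dedupe_latest_wins` is an invented helper, and it assumes "latest" means the row that arrives last in extract order, which the table above does not spell out.

```python
def dedupe_latest_wins(rows, unique_keys):
    """Keep one row per unique-key tuple; a later row replaces an earlier one."""
    latest = {}
    for row in rows:
        key = tuple(row[column] for column in unique_keys)
        latest[key] = row  # overwrites any earlier row with the same key
    return list(latest.values())


if __name__ == "__main__":
    rows = [
        {"order_id": 1, "status": "new"},
        {"order_id": 2, "status": "new"},
        {"order_id": 1, "status": "shipped"},  # same key as the first row
    ]
    for row in dedupe_latest_wins(rows, ["order_id"]):
        print(row)
```

With unique_keys set to ["order_id"], the two rows for order 1 collapse into the shipped one, and order 2 is kept unchanged.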
Example (AWS Glue, cross-account role)
name: my_sales_events
schema_type: ICEBERG
catalog_type: glue
catalog_name: glue_catalog
database_name: analytics
table_name: orders
aws_role_arn: arn:aws:iam::111122223333:role/ShapedIcebergRead
aws_region: us-west-2
Create the table in Shaped:
shaped create-table --file dataset.yaml