ClickHouse
Preparation
To allow Shaped to connect to your ClickHouse database, you need to create a read-only user and share those credentials through the Create Dataset endpoint. You can create a read-only user on a ClickHouse database with the following commands:
# 1. Create a new user with a password
CREATE USER read_only_user IDENTIFIED BY 'secure_password1!';
# 2. Grant SELECT privileges on the database you want to access
GRANT SELECT ON database_name.* TO read_only_user;
# 3. Alternatively, to restrict access to specific tables, grant SELECT per table instead of on the whole database:
GRANT SELECT ON database_name.table_name TO read_only_user;
Dataset Configuration
Required fields
| Field | Example | Description |
|---|---|---|
| schema_type | CLICKHOUSE | Specifies the connector schema type, in this case "CLICKHOUSE". |
| table | events | The name of the table to sync. |
| user | read_only_user | Access account username. |
| password | secure_password1! | Access account password. |
| host | clickhouse.example.com | Database hostname. |
| port | 9440 | Database port. ClickHouse defaults are 9440 for the secure native protocol, 8443 for HTTPS, and 8123 for HTTP. |
| replication_key | created_at | The name of the column that contains a datetime key or ascending id for ordering data during incremental syncs. |
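To illustrate what the replication_key is for, the sketch below shows one common incremental-sync pattern: remember the highest key value seen so far (a "bookmark") and fetch only rows beyond it on the next run. This shows the general technique under that assumption, not Shaped's internals; `incremental_batch`, `bookmark`, and the sample rows are hypothetical.

```python
from datetime import datetime


def incremental_batch(rows, replication_key, bookmark=None):
    """Return rows past the bookmark, ordered by key, plus the new bookmark.

    Hypothetical helper illustrating the general pattern only.
    """
    # On the first run there is no bookmark, so every row qualifies.
    fresh = [r for r in rows if bookmark is None or r[replication_key] > bookmark]
    # Ordering by the key lets the bookmark advance monotonically.
    fresh.sort(key=lambda r: r[replication_key])
    new_bookmark = fresh[-1][replication_key] if fresh else bookmark
    return fresh, new_bookmark


# Sample data (made up for illustration).
rows = [
    {"eventId": "a", "created_at": datetime(2024, 1, 1)},
    {"eventId": "b", "created_at": datetime(2024, 1, 2)},
    {"eventId": "c", "created_at": datetime(2024, 1, 3)},
]

first, bookmark = incremental_batch(rows, "created_at")
rows.append({"eventId": "d", "created_at": datetime(2024, 1, 4)})
second, bookmark = incremental_batch(rows, "created_at", bookmark)
```

In this sketch the second call returns only the newly appended row, which is why the column chosen as replication_key should only ever increase for new or updated rows.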
Optional fields
| Field | Example | Description |
|---|---|---|
| database | analytics | The name of the database that contains the table to sync. If not specified, the default database will be used. |
| columns | ["userId", "eventType", "timestamp", "properties"] | The names of the columns you wish to sync from ClickHouse into Shaped. If not specified, all columns will be synced. |
| unique_keys | ["eventId"] | A list of columns that together uniquely identify a row in the table. If duplicate rows are inserted with the same key values, the latest row will be used. |
| description | "User events data" | A description of the dataset. |
| schedule_interval | "@hourly" | The schedule on which to sync data. Defaults to "@hourly". |
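The effect of unique_keys can be sketched as a "keep the latest row per key" pass. The example below assumes that "latest" means the row with the greatest replication key value, which this page does not specify; `dedupe_latest` and the sample rows are hypothetical and are not part of Shaped.

```python
def dedupe_latest(rows, unique_keys, replication_key):
    """Keep one row per unique-key tuple, preferring the highest replication key.

    Hypothetical helper; "latest" is assumed to mean the greatest
    replication key value.
    """
    latest = {}
    for row in rows:
        key = tuple(row[k] for k in unique_keys)
        current = latest.get(key)
        # Replace the stored row when this one is at least as recent.
        if current is None or row[replication_key] >= current[replication_key]:
            latest[key] = row
    return list(latest.values())


# Sample data (made up for illustration): eventId "e1" appears twice.
rows = [
    {"eventId": "e1", "eventType": "view", "created_at": 1},
    {"eventId": "e2", "eventType": "view", "created_at": 2},
    {"eventId": "e1", "eventType": "click", "created_at": 3},
]
deduped = dedupe_latest(rows, ["eventId"], "created_at")
```

Here the two rows sharing eventId "e1" collapse into the more recent "click" row, while "e2" is kept unchanged.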
Dataset Creation Example
Below is an example of a ClickHouse dataset connector configuration:
name: clickhouse_events_dataset
schema_type: CLICKHOUSE
table: users
user: read_only_user
password: secure_password1!
host: clickhouse.example.com
port: 9440
database: analytics
replication_key: updated_at
unique_keys:
- userId
columns:
- userId
- eventType
- timestamp
- properties
- created_at
- updated_at
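Before submitting a configuration like the one above, it can help to check it against the Required fields table. The following is a minimal sketch of such a check; `REQUIRED_FIELDS` simply mirrors that table, and `missing_required_fields` is a hypothetical helper, not part of the Shaped CLI or SDK.

```python
# Mirrors the "Required fields" table on this page.
REQUIRED_FIELDS = (
    "schema_type", "table", "user", "password",
    "host", "port", "replication_key",
)


def missing_required_fields(config):
    """Return the required connector fields that are absent or empty."""
    return [f for f in REQUIRED_FIELDS if config.get(f) in (None, "")]


# The example configuration from this page, expressed as a dict.
config = {
    "name": "clickhouse_events_dataset",
    "schema_type": "CLICKHOUSE",
    "table": "users",
    "user": "read_only_user",
    "password": "secure_password1!",
    "host": "clickhouse.example.com",
    "port": 9440,
    "database": "analytics",
    "replication_key": "updated_at",
    "unique_keys": ["userId"],
}

missing = missing_required_fields(config)
# Dropping a required field shows what a failed check reports.
without_port = {k: v for k, v in config.items() if k != "port"}
missing_without_port = missing_required_fields(without_port)
```

An empty list means every required field is present; any names returned should be filled in before the dataset is created.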
The following command uses the Shaped CLI to create the ClickHouse dataset from this configuration file and begin syncing data from ClickHouse into Shaped:
shaped create-dataset --file dataset.yaml