ClickHouse
Preparation
To allow Shaped to connect to your ClickHouse database, you need to create a read-only user and share those credentials through the Create Dataset endpoint. You can create a read-only user on a ClickHouse database with the following commands:
# 1. Create a new user with a password
CREATE USER read_only_user IDENTIFIED BY 'secure_password1!';
# 2. Grant SELECT privileges on the database you want to access
GRANT SELECT ON database_name.* TO read_only_user;
# 3. Alternatively, to restrict access to specific tables, skip step 2 and grant SELECT per table:
GRANT SELECT ON database_name.table_name TO read_only_user;
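Before handing the credentials to Shaped, it can help to confirm that the new user can actually read the table. The following is a minimal sketch, not part of Shaped's tooling. It assumes your server exposes the ClickHouse HTTP interface over HTTPS (default port 8443) and authenticates with the `X-ClickHouse-User` and `X-ClickHouse-Key` headers. The function names `build_probe_request` and `can_read` are hypothetical helpers introduced for illustration.

```python
import urllib.request


def build_probe_request(host, port, user, password, database, table):
    """Build an HTTPS request that tries to read one row as the given user.

    Assumption: the ClickHouse HTTP interface is reachable at https://host:port/.
    """
    query = f"SELECT 1 FROM {database}.{table} LIMIT 1"
    return urllib.request.Request(
        url=f"https://{host}:{port}/",
        data=query.encode("utf-8"),  # the HTTP interface reads the query from the POST body
        headers={"X-ClickHouse-User": user, "X-ClickHouse-Key": password},
        method="POST",
    )


def can_read(request, timeout=10):
    """Return True if ClickHouse answers the probe query with HTTP 200."""
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except OSError:  # URLError and HTTPError are both subclasses of OSError
        return False
```

For example, `can_read(build_probe_request("clickhouse.example.com", 8443, "read_only_user", "secure_password1!", "database_name", "table_name"))` should return `True` once the grant is in place, and `False` if the user lacks SELECT on that table or the host is unreachable.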
Dataset Configuration
Required fields
Field | Example | Description |
---|---|---|
schema_type | CLICKHOUSE | Specifies the connector schema type, in this case "CLICKHOUSE". |
table | events | The name of the table to sync. |
user | read_only_user | Access account username. |
password | secure_password1! | Access account password. |
host | clickhouse.example.com | Database hostname. |
port | 9440 | Database port. ClickHouse defaults are 8443 (HTTPS) and 8123 (HTTP) for the HTTP interface, and 9440 (TLS) and 9000 (plaintext) for the native protocol. |
replication_key | created_at | The name of the column that contains a datetime key or ascending id for ordering data during incremental syncs. |
Optional fields
Field | Example | Description |
---|---|---|
database | analytics | The name of the database that contains the table to sync. If not specified, the default database will be used. |
columns | ["userId", "eventType", "timestamp", "properties"] | The names of the columns you wish to sync from ClickHouse into Shaped. If not specified, all columns will be synced. |
unique_keys | ["eventId"] | A list of columns that uniquely identify a row in the table. If duplicate rows are inserted with these keys, the latest row will be used. |
description | "User events data" | A description of the dataset. |
schedule_interval | "@hourly" | The schedule on which to sync data. Defaults to "@hourly". |
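A quick local check for missing required fields can catch configuration mistakes before a dataset is submitted. The sketch below is an illustration only: it mirrors the required-fields table above, and `REQUIRED_FIELDS` and `missing_required_fields` are hypothetical names, not part of the Shaped API or CLI. Shaped performs its own validation, which may differ.

```python
# Field names copied from the "Required fields" table above.
REQUIRED_FIELDS = (
    "schema_type",
    "table",
    "user",
    "password",
    "host",
    "port",
    "replication_key",
)


def missing_required_fields(config):
    """Return the required field names that are absent or empty in config."""
    return [field for field in REQUIRED_FIELDS if config.get(field) in (None, "")]


config = {
    "schema_type": "CLICKHOUSE",
    "table": "events",
    "user": "read_only_user",
    "password": "secure_password1!",
    "host": "clickhouse.example.com",
    "port": 9440,
    # replication_key intentionally left out to show the check firing
}

missing = missing_required_fields(config)
if missing:
    print("Missing required fields: " + ", ".join(missing))
```

Optional fields such as `database` or `columns` are deliberately not checked, since the dataset falls back to defaults when they are omitted.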
Dataset Creation Example
Below is an example of a ClickHouse dataset connector configuration:
name: clickhouse_events_dataset
schema_type: CLICKHOUSE
table: users
user: read_only_user
password: secure_password1!
host: clickhouse.example.com
port: 9440
database: analytics
replication_key: updated_at
unique_keys:
- userId
columns:
- userId
- eventType
- timestamp
- properties
- created_at
- updated_at
Assuming the configuration above is saved as dataset.yaml, the following Shaped CLI command creates the ClickHouse dataset and begins syncing data from ClickHouse into Shaped:
shaped create-dataset --file dataset.yaml