ClickHouse
Warning: This article is from the Shaped 1.0 documentation. The APIs have changed and the information may be outdated. Go to the Shaped 2.0 docs.
Preparation
To allow Shaped to connect to your ClickHouse database, you need to create a read-only user and share those credentials through the Create Dataset endpoint. You can create a read-only user on a ClickHouse database with the following commands:
# 1. Create a new user with a password
CREATE USER read_only_user IDENTIFIED BY 'secure_password1!';
# 2. Grant SELECT privileges on the database you want to access
GRANT SELECT ON database_name.* TO read_only_user;
# 3. Alternatively, to restrict access to specific tables, grant SELECT per table instead of step 2:
GRANT SELECT ON database_name.table_name TO read_only_user;
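Before sharing the credentials, it can help to confirm what the new user is actually allowed to do. The sketch below is one possible check, not part of Shaped: it assumes the ClickHouse HTTP interface is reachable from your machine (for example on 8443 for HTTPS) and runs `SHOW GRANTS` as the new user. The function names `build_grants_request` and `show_grants` are hypothetical.

```python
import urllib.parse
import urllib.request


def build_grants_request(host: str, port: int, user: str, password: str,
                         secure: bool = True) -> urllib.request.Request:
    """Build (but do not send) an HTTP request that runs SHOW GRANTS as `user`."""
    scheme = "https" if secure else "http"
    query = urllib.parse.urlencode({"query": "SHOW GRANTS"})
    request = urllib.request.Request(f"{scheme}://{host}:{port}/?{query}")
    # The ClickHouse HTTP interface accepts credentials in these headers.
    request.add_header("X-ClickHouse-User", user)
    request.add_header("X-ClickHouse-Key", password)
    return request


def show_grants(request: urllib.request.Request, timeout: float = 10.0) -> str:
    """Send the request and return the server's plain-text response."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Calling `show_grants(build_grants_request("clickhouse.example.com", 8443, "read_only_user", "secure_password1!"))` should list only the `SELECT` grants created above if the user was set up as intended.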
Dataset Configuration
Required fields
| Field | Example | Description |
|---|---|---|
| schema_type | CLICKHOUSE | Specifies the connector schema type, in this case "CLICKHOUSE". |
| table | events | The name of the table to sync. |
| user | read_only_user | Access account username. |
| password | secure_password1! | Access account password. |
| host | clickhouse.example.com | Database hostname. |
| port | 9440 | Database port. ClickHouse defaults: 8443 for HTTPS, 8123 for HTTP, and 9440 for the secure native protocol. |
| replication_key | created_at | The name of the column that contains a datetime key or ascending id for ordering data during incremental syncs. |
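A missing required field is easier to catch locally than after submitting the configuration. The following is a minimal sketch of such a pre-flight check, assuming the configuration has already been loaded into a Python dictionary; `REQUIRED_FIELDS` mirrors the table above, and the helper name `missing_required_fields` is hypothetical, not part of the Shaped CLI or API.

```python
# Required connector fields, taken from the "Required fields" table.
REQUIRED_FIELDS = (
    "schema_type",
    "table",
    "user",
    "password",
    "host",
    "port",
    "replication_key",
)


def missing_required_fields(config: dict) -> list[str]:
    """Return the required field names that are absent or empty in `config`."""
    return [
        field for field in REQUIRED_FIELDS
        if config.get(field) in (None, "")
    ]


example_config = {
    "schema_type": "CLICKHOUSE",
    "table": "events",
    "user": "read_only_user",
    "password": "secure_password1!",
    "host": "clickhouse.example.com",
    "port": 9440,
    "replication_key": "created_at",
}
```

With the example values from the table, `missing_required_fields(example_config)` returns an empty list; removing a key such as `replication_key` makes it appear in the result.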
Optional fields
| Field | Example | Description |
|---|---|---|
| database | analytics | The name of the database that contains the table to sync. If not specified, the default database will be used. |
| columns | ["userId", "eventType", "timestamp", "properties"] | The names of the columns you wish to sync from ClickHouse into Shaped. If not specified, all columns will be synced. |
| unique_keys | ["eventId"] | A list of columns that uniquely identify a row in the table. If duplicate rows are inserted with these keys, the latest row is used. |
| description | "User events data" | A description of the dataset. |
| schedule_interval | "@hourly" | The schedule on which to sync data. Defaults to "@hourly". |
Dataset Creation Example
Below is an example of a ClickHouse dataset connector configuration:
name: clickhouse_events_dataset
schema_type: CLICKHOUSE
table: users
user: read_only_user
password: secure_password1!
host: clickhouse.example.com
port: 9440
database: analytics
replication_key: updated_at
unique_keys:
- userId
columns:
- userId
- eventType
- timestamp
- properties
- created_at
- updated_at
Save the configuration above as dataset.yaml. The following Shaped CLI command then creates the ClickHouse dataset and begins syncing data from ClickHouse into Shaped:
shaped create-dataset --file dataset.yaml
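If dataset creation is part of an automated setup script, the same command can be invoked from Python. This is a minimal sketch that assumes the `shaped` executable is installed and on `PATH`; the helper names `create_dataset_command` and `create_dataset` are hypothetical, and only the `create-dataset --file` invocation shown above is used.

```python
import shutil
import subprocess


def create_dataset_command(config_path: str = "dataset.yaml") -> list[str]:
    """Return the argument list for the CLI call shown above."""
    return ["shaped", "create-dataset", "--file", config_path]


def create_dataset(config_path: str = "dataset.yaml") -> subprocess.CompletedProcess:
    """Run the Shaped CLI; raises CalledProcessError if it exits non-zero."""
    if shutil.which("shaped") is None:
        raise RuntimeError("Shaped CLI not found on PATH")
    return subprocess.run(create_dataset_command(config_path), check=True)
```

Passing the arguments as a list rather than a shell string avoids quoting problems when the configuration path contains spaces.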