Redshift
Preparation
To allow Shaped to connect to your Redshift data warehouse, you need to create a read-only user and share its credentials through the Create Dataset request. You can create this user with the following steps on your Redshift cluster:
-- 1. Create a new user.
CREATE USER read_only_user WITH PASSWORD 'secure_password1!';
-- 2. Create a group for granting/revoking permissions.
CREATE GROUP read_only_group;
-- 3. Add the user to the group.
ALTER GROUP read_only_group ADD USER read_only_user;
-- 4. Revoke the default CREATE privilege on the schema from the group.
REVOKE CREATE ON SCHEMA public FROM GROUP read_only_group;
-- 5. Grant the group usage access to the schema.
GRANT USAGE ON SCHEMA public TO GROUP read_only_group;
-- 6. Grant the group read access to all the tables in the schema. Note that you can
-- also restrict this to your specific user, item and interaction views.
GRANT SELECT ON ALL TABLES IN SCHEMA public TO GROUP read_only_group;
-- 7. Grant the group read access to future tables in the schema.
ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT SELECT ON TABLES TO GROUP read_only_group;
Dataset Configuration
Required fields
Field | Example | Description |
---|---|---|
schema_type | REDSHIFT | Specifies the connector schema type, in this case "REDSHIFT". |
table | movies | The name of the table to sync. |
user | your_user | Access account username. |
password | pAssw0rd1! | Access account password. |
host | my-redshift-db.xxxxxxx.us-east-2.rds.amazonaws.com | Database hostname. |
port | 5439 | Database port (the default for Redshift is 5439). |
database | movielens | The name of the database that contains the table to sync. |
replication_key | updated_at | The name of the column that contains a datetime key or ascending id for ordering data during incremental syncs. |
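
Before sending a configuration, it can help to check locally that every required field is present. The following is a minimal sketch in Python: the field names come from the table above, while the helper name `missing_required_fields` and the way problems are reported are hypothetical and not part of any Shaped tooling.

```python
# Field names are taken from the "Required fields" table.
# The helper itself is illustrative only, not a Shaped API.
REQUIRED_FIELDS = (
    "schema_type", "table", "user", "password",
    "host", "port", "database", "replication_key",
)


def missing_required_fields(config: dict) -> list[str]:
    """Return the required fields that are absent or empty in config."""
    return [f for f in REQUIRED_FIELDS if config.get(f) in (None, "")]


config = {
    "schema_type": "REDSHIFT",
    "table": "movies",
    "user": "your_user",
    "password": "pAssw0rd1!",
    "host": "my-redshift-db.xxxxxxx.us-east-2.rds.amazonaws.com",
    "port": 5439,
    "database": "movielens",
}

print(missing_required_fields(config))  # ['replication_key']
```

An empty list means all required fields are filled in; it says nothing about whether the credentials or hostname are actually valid.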
Optional fields
Field | Example | Description |
---|---|---|
database_schema | public | The name of the schema that contains the table to sync. |
columns | ["productId", "color", "brand", "stockLevel"] | The names of the columns to sync from Redshift into Shaped. If not specified, all columns will be synced. |
unique_keys | ["productId"] | A list of columns that uniquely identify a row in the table. If duplicate rows are inserted with these keys, the latest row is used. |
batch_size | 100000 | The number of rows to fetch from the database in each batch. Changing this can improve throughput for large tables. The default is 10000. |
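
To make the roles of `replication_key` and `unique_keys` more concrete, the sketch below models them on a small in-memory list of rows. It illustrates the behavior described in the tables and is not Shaped's implementation. It assumes that "latest" means the row with the highest replication key value, and the function names `incremental_batch` and `dedupe_latest` are hypothetical.

```python
from datetime import datetime


def incremental_batch(rows, replication_key, last_synced):
    """Rows whose replication key is newer than last_synced, oldest first."""
    fresh = [r for r in rows if r[replication_key] > last_synced]
    return sorted(fresh, key=lambda r: r[replication_key])


def dedupe_latest(rows, unique_keys, replication_key):
    """Keep one row per unique key, preferring the highest replication key.

    Assumption: "latest" is decided by the replication key value.
    """
    latest = {}
    for row in sorted(rows, key=lambda r: r[replication_key]):
        # Later rows overwrite earlier ones that share the same key.
        latest[tuple(row[k] for k in unique_keys)] = row
    return list(latest.values())


rows = [
    {"productId": 1, "color": "red", "updated_at": datetime(2024, 1, 1)},
    {"productId": 2, "color": "blue", "updated_at": datetime(2024, 1, 2)},
    {"productId": 1, "color": "green", "updated_at": datetime(2024, 1, 3)},
]

# Only rows changed after the previous sync point are picked up.
batch = incremental_batch(rows, "updated_at", datetime(2024, 1, 1))
print([r["color"] for r in batch])  # ['blue', 'green']

# productId 1 appears twice; the more recently updated row wins.
deduped = dedupe_latest(rows, ["productId"], "updated_at")
print([(r["productId"], r["color"]) for r in deduped])  # [(1, 'green'), (2, 'blue')]
```

In this model, a replication key that does not increase when a row changes would cause the update to be skipped by later incremental syncs, which is why a datetime or ascending id column is the natural choice.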
Dataset Creation Example
Below is an example of a Redshift dataset connector configuration:
name: your_redshift_dataset
schema_type: REDSHIFT
table: movies
user: your_user
password: pAssw0rd1!
host: my-redshift-db.xxxxxxx.us-east-2.rds.amazonaws.com
port: 5439
database: movielens
database_schema: public
replication_key: updated_at
With the configuration above saved as dataset.yaml, the following Shaped CLI command creates the Redshift dataset and begins syncing data into Shaped:
shaped create-dataset --file dataset.yaml
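
To keep the database password out of version control, the configuration file passed to the command above could be generated from environment variables rather than written by hand. The following is a minimal sketch under stated assumptions: the variable names `REDSHIFT_USER` and `REDSHIFT_PASSWORD` and the helper `render_dataset_yaml` are hypothetical, and the remaining values are copied from the example configuration.

```python
import json


def render_dataset_yaml(env):
    """Render the example connector config as YAML text.

    env is any mapping holding the credentials, e.g. os.environ.
    """
    config = {
        "name": "your_redshift_dataset",
        "schema_type": "REDSHIFT",
        "table": "movies",
        "user": env["REDSHIFT_USER"],          # hypothetical variable name
        "password": env["REDSHIFT_PASSWORD"],  # hypothetical variable name
        "host": "my-redshift-db.xxxxxxx.us-east-2.rds.amazonaws.com",
        "port": 5439,
        "database": "movielens",
        "database_schema": "public",
        "replication_key": "updated_at",
    }
    # json.dumps emits double-quoted strings and bare integers,
    # both of which are valid YAML scalars.
    return "".join(f"{key}: {json.dumps(value)}\n" for key, value in config.items())


# Demo with placeholder credentials instead of the real environment.
demo_env = {"REDSHIFT_USER": "your_user", "REDSHIFT_PASSWORD": "pAssw0rd1!"}
print(render_dataset_yaml(demo_env))
```

In practice, `os.environ` would be passed in place of `demo_env` and the returned text written to dataset.yaml before running the CLI command. Quoting every string value avoids surprises with special characters in passwords.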