Skip to main content

Redshift

Preparation

To allow Shaped to connect to your Redshift data warehouse, you need to create a read-only user and share its credentials through the Create Dataset request. You can create this user with the following steps on your Redshift cluster:

# 1. Create a new user.
CREATE USER read_only_user WITH PASSWORD 'secure_password1!';

# 2. Create a group for granting/revoking permissions.
CREATE GROUP read_only_group;

# 3. Add user to group.
ALTER GROUP read_only_group ADD USER read_only_user;

# 4. Revoke default granted create rights in schema from group.
REVOKE CREATE ON SCHEMA public FROM GROUP read_only_group;

# 5. Grant the group usage access to the schema.
GRANT USAGE ON SCHEMA public TO group read_only_group;

# 6. Grant the group read access to all the tables in the schema. Note you can also
# restrict this to your specific user, item and interaction views.
GRANT SELECT ON ALL TABLES IN SCHEMA public TO group read_only_group;

# 7. Grant the group access to future tables in the schema.
ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT SELECT ON TABLES TO group read_only_group;

Dataset Configuration

Required fields

FieldExampleDescription
schema_typeREDSHIFTSpecifies the connector schema type, in this case "REDSHIFT".
tablemoviesThe name of the table to sync.
useryour_userAccess account username.
passwordpAssw0rd1!Access account Password.
hostmy-redshift-db.xxxxxxx.us-east-2.rds.amazonaws.comDatabase hostname.
port5439Database port (the default for Redshift is 5439).
databasemovielensThe name of the database that contains that contains table to sync.
replication_keyupdated_atThe name of the column that contains a datetime key or ascending id for ordering data during incremental syncs.

Optional fields

FieldExampleDescription
database_schemapublicThe name of the schema that contains table to sync.
columns["productId", "color", "brand", "stockLevel"]the name of the columns you wish to sync from Redshift into Shaped. If not specified, all columns will be synced.
unique_keys["productId"]Specify a list of columns that uniquely identify a row in the table, if duplicate rows are inserted with these keys, the latest row will be used.
batch_size100000The number of rows to fetch from the database in each batch, changing this can improve throughput for large tables. The default is 10000.

Dataset Creation Example

Below is an example of a Redshift dataset connector configuration:

name: your_redshift_dataset
schema_type: REDSHIFT
table: movies
user: your_user
password: pAssw0rd1!
host: my-redshift-db.xxxxxxx.us-east-2.rds.amazonaws.com
port: 5439
database: movielens
database_schema: public
replication_key: updated_at

The following payload will create a Redshift dataset and begin syncing data from Shaped using the Shaped CLI.

shaped create-dataset --file dataset.yaml