Redshift

Preparation

To allow Shaped to connect to your Redshift data warehouse, you need to create a read-only user and share its credentials through the Create Dataset request. You can create this user with the following steps on your Redshift cluster:

# 1. Create a new user.
CREATE USER read_only_user WITH PASSWORD 'secure_password1!';

# 2. Create a group for granting/revoking permissions.
CREATE GROUP read_only_group;

# 3. Add user to group.
ALTER GROUP read_only_group ADD USER read_only_user;

# 4. Revoke default granted create rights in schema from group.
REVOKE CREATE ON SCHEMA public FROM GROUP read_only_group;

# 5. Grant the group usage access to the schema.
GRANT USAGE ON SCHEMA public TO group read_only_group;

# 6. Grant the group read access to all the tables in the schema. Note you can also 
# restrict this to your specific user, item and interaction views.
GRANT SELECT ON ALL TABLES IN SCHEMA public TO group read_only_group;

# 7. Grant the group access to future tables in the schema.
ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT SELECT ON TABLES TO group read_only_group;

Dataset Configuration

Required fields

Field	Example	Description
schema_type	`REDSHIFT`	Specifies the connector schema type, in this case "REDSHIFT".
table	movies	The name of the table to sync.
user	your_user	Access account username.
password	pAssw0rd1!	Access account Password.
host	my-redshift-db.xxxxxxx.us-east-2.rds.amazonaws.com	Database hostname.
port	5439	Database port (the default for Redshift is 5439).
database	movielens	The name of the database that contains that contains table to sync.
replication_key	updated_at	The name of the column that contains a datetime key or ascending id for ordering data during incremental syncs.

Optional fields

Field	Example	Description
database_schema	public	The name of the schema that contains table to sync.
columns	["productId", "color", "brand", "stockLevel"]	the name of the columns you wish to sync from Redshift into Shaped. If not specified, all columns will be synced.
unique_keys	["productId"]	Specify a list of columns that uniquely identify a row in the table, if duplicate rows are inserted with these keys, the latest row will be used.
batch_size	100000	The number of rows to fetch from the database in each batch, changing this can improve throughput for large tables. The default is 10000.

Dataset Creation Example

Below is an example of a Redshift dataset connector configuration:

name: your_redshift_dataset
schema_type: REDSHIFT
table: movies
user: your_user
password: pAssw0rd1!
host: my-redshift-db.xxxxxxx.us-east-2.rds.amazonaws.com
port: 5439
database: movielens
database_schema: public
replication_key: updated_at

The following payload will create a Redshift dataset and begin syncing data from Shaped using the Shaped CLI.

shaped create-dataset --file dataset.yaml

Redshift

Preparation​

Dataset Configuration​

Required fields​

Optional fields​

Dataset Creation Example​

Preparation

Dataset Configuration

Required fields

Optional fields

Dataset Creation Example