PostgreSQL
Preparation
To allow Shaped to connect to your Postgres database, you need to create a read-only user and pass the details into a Create Dataset request. You can create a read-only user on your Postgres DB with the following commands:
# 1. Create a new user.
CREATE USER read_only_user WITH PASSWORD 'secure_password1!';
# 2. Grant the user connection to the database.
GRANT CONNECT ON DATABASE database_name TO read_only_user;
# 3. Grant the user usage of the schema.
GRANT USAGE ON SCHEMA public TO read_only_user;
# 4. Grant the user read access to all the tables in the schema. Note you can also
# restrict this to your specific user, item and interaction views.
GRANT SELECT ON ALL TABLES IN SCHEMA public TO read_only_user;
# 5. Grant the group access to future tables in the schema.
ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT SELECT ON TABLES TO read_only_user;
Dataset Configuration
Required fields
Field | Example | Description |
---|---|---|
schema_type | POSTGRES | Specifies the connector schema type, in this case "Postgres". |
table | movies | The name of the table to sync. |
user | your_user | Access account username. |
password | pAssw0rd1! | Access account password. |
host | my-postgres-db.xxxxxxx.us-east-2.rds.amazonaws.com | Database hostname. |
port | 5432 | Database port (the default for Postgres is 5432). |
database | movielens | The name of the database that contains table to sync. |
replication_key | updated_at | The name of the column that contains a datetime key or ascending id for ordering data during incremental syncs. |
Optional fields
Field | Example | Description |
---|---|---|
database_schema | public | The name of the schema that contains table to sync. |
columns | ["productId", "color", "brand", "stockLevel"] | The name of the columns you wish to sync from Postgres into Shaped. If not specified, all columns will be synced. |
unique_keys | ["productId"] | Specify a list of columns that uniquely identify a row in the table, if duplicate rows are inserted with these keys, the latest row will be used. |
ssl_certificate_authority | (certificate content) | The certificate authority that should be used to verify the server's identity. |
ssl_client_certificate | (certificate content) | The certificate that should be used to verify your identity to the server. |
ssl_client_private_key | (key content) | The private key for the certificate you provided. |
ssh_tunnel_host | ssh.example.com | The hostname of the SSH server to tunnel through. |
ssh_tunnel_port | 22 | The port of the SSH server (default is 22). |
ssh_tunnel_username | your_ssh_username | The username for SSH authentication. |
ssh_tunnel_password | your_ssh_password | The password for SSH authentication (if not using private key). |
ssh_tunnel_private_key | (key content) | The content of the private key for SSH authentication. Can be used instead of private_key_path. |
ssh_tunnel_private_key_password | (password) | The password for the private key (if using ssh_tunnel_private_key). |
batch_size | 100000 | The number of rows to fetch from the database in each batch, changing this can improve throughput for large tables. The default is 10000. |
Dataset Creation Example
Below is an example of a Postgres dataset connector configuration:
name: your_postgres_dataset
schema_type: POSTGRES
table: movies
user: your_user
password: pAssw0rd1!
host: my-postgres-db.xxxxxxx.us-east-2.rds.amazonaws.com
port: 5432
database: movielens
replication_key: updated_at
The following payload will create a Postgres dataset and begin syncing data from Shaped using the Shaped CLI.
shaped create-dataset --file dataset.yaml