PostgreSQL
Preparation
To allow Shaped to connect to your Postgres database, you need to create a read-only user and pass the details into a Create Dataset request. You can create a read-only user on your Postgres DB with the following commands:
# 1. Create a new user.
CREATE USER read_only_user WITH PASSWORD 'secure_password1!';
# 2. Grant the user connection to the database.
GRANT CONNECT ON DATABASE database_name TO read_only_user;
# 3. Grant the user usage of the schema.
GRANT USAGE ON SCHEMA public TO read_only_user;
# 4. Grant the user read access to all the tables in the schema. Note you can also
# restrict this to your specific user, item and interaction views.
GRANT SELECT ON ALL TABLES IN SCHEMA public TO read_only_user;
# 5. Grant the group access to future tables in the schema.
ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT SELECT ON TABLES TO read_only_user;
Network Access
If your database is publicy accessible, reach out to the Shaped team to get our IPs for allow listing. Otherwise, check out our private link docs.
Dataset Configuration
Required fields
Field | Example | Description |
---|---|---|
schema_type | POSTGRES | Specifies the connector schema type, in this case "Postgres". |
table | movies | The name of the table to sync. |
user | your_user | Access account username. |
password | pAssw0rd1! | Access account password. |
host | my-postgres-db.xxxxxxx.us-east-2.rds.amazonaws.com | Database hostname. |
port | 5432 | Database port (the default for Postgres is 5432). |
database | movielens | The name of the database that contains table to sync. |
replication_key | updated_at | The name of the column that contains a datetime key or ascending id for ordering data during incremental syncs. |
Optional fields
Field | Example | Description |
---|---|---|
database_schema | public | The name of the schema that contains table to sync. |
columns | ["productId", "color", "brand", "stockLevel"] | The name of the columns you wish to sync from Postgres into Shaped. If not specified, all columns will be synced. |
unique_keys | ["productId"] | Specify a list of columns that uniquely identify a row in the table, if duplicate rows are inserted with these keys, the latest row will be used. |
ssl_certificate_authority | (certificate content) | The certificate authority that should be used to verify the server's identity. |
ssl_client_certificate | (certificate content) | The certificate that should be used to verify your identity to the server. |
ssl_client_private_key | (key content) | The private key for the certificate you provided. |
ssh_tunnel_host | ssh.example.com | The hostname of the SSH server to tunnel through. |
ssh_tunnel_port | 22 | The port of the SSH server (default is 22). |
ssh_tunnel_username | your_ssh_username | The username for SSH authentication. |
ssh_tunnel_password | your_ssh_password | The password for SSH authentication (if not using private key). |
ssh_tunnel_private_key | (key content) | The content of the private key for SSH authentication. Can be used instead of private_key_path. |
ssh_tunnel_private_key_password | (password) | The password for the private key (if using ssh_tunnel_private_key). |
batch_size | 100000 | The number of rows to fetch from the database in each batch, changing this can improve throughput for large tables. The default is 10000. |
Dataset Creation Example
Below is an example of a Postgres dataset connector configuration:
name: your_postgres_dataset
schema_type: POSTGRES
table: movies
user: your_user
password: pAssw0rd1!
host: my-postgres-db.xxxxxxx.us-east-2.rds.amazonaws.com
port: 5432
database: movielens
replication_key: updated_at
The following payload will create a Postgres dataset and begin syncing data from Shaped using the Shaped CLI.
shaped create-dataset --file dataset.yaml