Skip to main content

PostgreSQL

Preparation

To allow Shaped to connect to your Postgres database, you need to create a read-only user and pass the details into a Create Dataset request. You can create a read-only user on your Postgres DB with the following commands:

# 1. Create a new user. 
CREATE USER read_only_user WITH PASSWORD 'secure_password1!';

# 2. Grant the user connection to the database.
GRANT CONNECT ON DATABASE database_name TO read_only_user;

# 3. Grant the user usage of the schema.
GRANT USAGE ON SCHEMA public TO read_only_user;

# 4. Grant the user read access to all the tables in the schema. Note you can also
# restrict this to your specific user, item and interaction views.
GRANT SELECT ON ALL TABLES IN SCHEMA public TO read_only_user;

# 5. Grant the group access to future tables in the schema.
ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT SELECT ON TABLES TO read_only_user;

Dataset Configuration

Required fields

FieldExampleDescription
schema_typePOSTGRESSpecifies the connector schema type, in this case "Postgres".
tablemoviesThe name of the table to sync.
useryour_userAccess account username.
passwordpAssw0rd1!Access account password.
hostmy-postgres-db.xxxxxxx.us-east-2.rds.amazonaws.comDatabase hostname.
port5432Database port (the default for Postgres is 5432).
databasemovielensThe name of the database that contains table to sync.
replication_keyupdated_atThe name of the column that contains a datetime key or ascending id for ordering data during incremental syncs.

Optional fields

FieldExampleDescription
database_schemapublicThe name of the schema that contains table to sync.
columns["productId", "color", "brand", "stockLevel"]The name of the columns you wish to sync from Postgres into Shaped. If not specified, all columns will be synced.
unique_keys["productId"]Specify a list of columns that uniquely identify a row in the table, if duplicate rows are inserted with these keys, the latest row will be used.
ssl_certificate_authority(certificate content)The certificate authority that should be used to verify the server's identity.
ssl_client_certificate(certificate content)The certificate that should be used to verify your identity to the server.
ssl_client_private_key(key content)The private key for the certificate you provided.
ssh_tunnel_hostssh.example.comThe hostname of the SSH server to tunnel through.
ssh_tunnel_port22The port of the SSH server (default is 22).
ssh_tunnel_usernameyour_ssh_usernameThe username for SSH authentication.
ssh_tunnel_passwordyour_ssh_passwordThe password for SSH authentication (if not using private key).
ssh_tunnel_private_key(key content)The content of the private key for SSH authentication. Can be used instead of private_key_path.
ssh_tunnel_private_key_password(password)The password for the private key (if using ssh_tunnel_private_key).
batch_size100000The number of rows to fetch from the database in each batch, changing this can improve throughput for large tables. The default is 10000.

Dataset Creation Example

Below is an example of a Postgres dataset connector configuration:

name: your_postgres_dataset
schema_type: POSTGRES
table: movies
user: your_user
password: pAssw0rd1!
host: my-postgres-db.xxxxxxx.us-east-2.rds.amazonaws.com
port: 5432
database: movielens
replication_key: updated_at

The following payload will create a Postgres dataset and begin syncing data from Shaped using the Shaped CLI.

shaped create-dataset --file dataset.yaml