BigQuery
Shaped supports BigQuery through a Table Connector which will automatically sync data between your BigQuery data warehouse and Shaped.
Preparation
To allow Shaped to connect to your BigQuery data warehouse, you need to create a GCP service account with the necessary permissions to read your BigQuery datasets and run query jobs.
Prerequisites
- Ensure the BigQuery API is enabled in your GCP project.
- Install the gcloud command line tool.
Service Account Setup
- Create a service account in your GCP project:
gcloud iam service-accounts create shaped-bigquery-readonly \
--display-name="Shaped BigQuery Read-Only Service Account" \
--project=<YOUR_PROJECT>
- Grant the necessary permissions. For least privilege, grant
roles/bigquery.dataVieweronly on the specific datasets that Shaped should read, and grantroles/bigquery.jobUseron the project where BigQuery jobs will be billed:
# Grant dataViewer on specific dataset (recommended for least privilege).
bq add-iam-policy-binding \
--member='serviceAccount:shaped-bigquery-readonly@<YOUR_PROJECT>.iam.gserviceaccount.com' \
--role='roles/bigquery.dataViewer' \
<YOUR_PROJECT>:<YOUR_DATASET>
# Grant jobUser on the project to allow running query jobs.
gcloud projects add-iam-policy-binding <YOUR_PROJECT> \
--member='serviceAccount:shaped-bigquery-readonly@<YOUR_PROJECT>.iam.gserviceaccount.com' \
--role='roles/bigquery.jobUser'
# If Shaped uses the BigQuery Storage Read API, also grant readSessionUser.
gcloud projects add-iam-policy-binding <YOUR_PROJECT> \
--member='serviceAccount:shaped-bigquery-readonly@<YOUR_PROJECT>.iam.gserviceaccount.com' \
--role='roles/bigquery.readSessionUser'
Alternatively, if you need to grant access to all datasets in the project, you can grant roles/bigquery.dataViewer at the project level:
gcloud projects add-iam-policy-binding <YOUR_PROJECT> \
--member='serviceAccount:shaped-bigquery-readonly@<YOUR_PROJECT>.iam.gserviceaccount.com' \
--role='roles/bigquery.dataViewer'
- Create and download a JSON key for the service account:
gcloud iam service-accounts keys create shaped-bigquery-key.json \
--iam-account=shaped-bigquery-readonly@<YOUR_PROJECT>.iam.gserviceaccount.com \
--project=<YOUR_PROJECT>
This will create a shaped-bigquery-key.json file containing the service account credentials.
Table Configuration
Required fields
| Field | Example | Description |
|---|---|---|
| schema_type | BIGQUERY | Specifies the connector schema type, in this case "BIGQUERY". |
| table | bq-project.shaped.data | Specifies the Google BigQuery table, qualified with the project and BigQuery dataset. If your project name contains invalid YAML characters, you must surround it with backticks and double-quotes. |
| columns | ["productId", "color", "brand", "stockLevel", "createdAt", "updatedAt"] | The name of the columns you wish to sync from BigQuery into Shaped. |
| datetime_key | updatedAt | The name of the column that contains a datetime key for ordering data during incremental syncs. |
| service_account_key_json | {"type": "service_account", "project_id": "your-project", ...} | The JSON credentials for your GCP service account. This should be the full contents of the JSON key file downloaded in the preparation steps. |
Optional fields
| Field | Example | Description |
|---|---|---|
| start_datetime | 2020-01-01T00:00:00Z | Specify a start datetime for the initial sync. If not specified, the initial sync will start from the earliest datetime in the table. |
| filters | ["event_type IN ['viewed', 'liked']"] | Specify a list of filters to apply to the table. Filters must be in the form of a SQL WHERE clause. |
| unique_keys | ["productId"] | Specify a list of columns that uniquely identify a row in the table, if duplicate rows are inserted with these keys, the latest row will be used. |
| batch_size | 100000 | The number of rows to fetch from the database in each batch, changing this can improve throughput for large tables. The default is 10000. |
Table Creation Example
Below is an example of a BigQuery table connector configuration:
name: your_bigquery_dataset
schema_type: BIGQUERY
table: "`bq-project`.shaped.`data`"
columns: ["productId", "color", "brand", "stockLevel", "createdAt", "updatedAt"]
datetime_key: "updatedAt"
service_account_key_json: {"type": "service_account", "project_id": "your-project", "private_key_id": "key-id", "private_key": "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n", "client_email": "shaped-bigquery-readonly@your-project.iam.gserviceaccount.com", "client_id": "123456789", "auth_uri": "https://accounts.google.com/o/oauth2/auth", "token_uri": "https://oauth2.googleapis.com/token", "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs", "client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/shaped-bigquery-readonly%40your-project.iam.gserviceaccount.com"}
start_datetime: "2020-01-01T00:00:00Z"
filters: ["stockLevel > 0"]
The following payload will create a BigQuery table and begin syncing data from Shaped using the Shaped CLI.
shaped create-table --file dataset.yaml