Skip to main content

BigQuery

Shaped supports BigQuery through a Table Connector which will automatically sync data between your BigQuery data warehouse and Shaped.

Preparation

To allow Shaped to connect to your BigQuery data warehouse, you need to create a GCP service account with the necessary permissions to read your BigQuery datasets and run query jobs.

Prerequisites

  1. Ensure the BigQuery API is enabled in your GCP project.
  2. Install the gcloud command line tool.

Service Account Setup

  1. Create a service account in your GCP project:
gcloud iam service-accounts create shaped-bigquery-readonly \
--display-name="Shaped BigQuery Read-Only Service Account" \
--project=<YOUR_PROJECT>
  1. Grant the necessary permissions. For least privilege, grant roles/bigquery.dataViewer only on the specific datasets that Shaped should read, and grant roles/bigquery.jobUser on the project where BigQuery jobs will be billed:
# Grant dataViewer on specific dataset (recommended for least privilege).
bq add-iam-policy-binding \
--member='serviceAccount:shaped-bigquery-readonly@<YOUR_PROJECT>.iam.gserviceaccount.com' \
--role='roles/bigquery.dataViewer' \
<YOUR_PROJECT>:<YOUR_DATASET>

# Grant jobUser on the project to allow running query jobs.
gcloud projects add-iam-policy-binding <YOUR_PROJECT> \
--member='serviceAccount:shaped-bigquery-readonly@<YOUR_PROJECT>.iam.gserviceaccount.com' \
--role='roles/bigquery.jobUser'

# If Shaped uses the BigQuery Storage Read API, also grant readSessionUser.
gcloud projects add-iam-policy-binding <YOUR_PROJECT> \
--member='serviceAccount:shaped-bigquery-readonly@<YOUR_PROJECT>.iam.gserviceaccount.com' \
--role='roles/bigquery.readSessionUser'

Alternatively, if you need to grant access to all datasets in the project, you can grant roles/bigquery.dataViewer at the project level:

gcloud projects add-iam-policy-binding <YOUR_PROJECT> \
--member='serviceAccount:shaped-bigquery-readonly@<YOUR_PROJECT>.iam.gserviceaccount.com' \
--role='roles/bigquery.dataViewer'
  1. Create and download a JSON key for the service account:
gcloud iam service-accounts keys create shaped-bigquery-key.json \
--iam-account=shaped-bigquery-readonly@<YOUR_PROJECT>.iam.gserviceaccount.com \
--project=<YOUR_PROJECT>

This will create a shaped-bigquery-key.json file containing the service account credentials.

Table Configuration

Required fields

FieldExampleDescription
schema_typeBIGQUERYSpecifies the connector schema type, in this case "BIGQUERY".
tablebq-project.shaped.dataSpecifies the Google BigQuery table, qualified with the project and BigQuery dataset. If your project name contains invalid YAML characters, you must surround it with backticks and double-quotes.
columns["productId", "color", "brand", "stockLevel", "createdAt", "updatedAt"]The name of the columns you wish to sync from BigQuery into Shaped.
datetime_keyupdatedAtThe name of the column that contains a datetime key for ordering data during incremental syncs.
service_account_key_json{"type": "service_account", "project_id": "your-project", ...}The JSON credentials for your GCP service account. This should be the full contents of the JSON key file downloaded in the preparation steps.

Optional fields

FieldExampleDescription
start_datetime2020-01-01T00:00:00ZSpecify a start datetime for the initial sync. If not specified, the initial sync will start from the earliest datetime in the table.
filters["event_type IN ['viewed', 'liked']"]Specify a list of filters to apply to the table. Filters must be in the form of a SQL WHERE clause.
unique_keys["productId"]Specify a list of columns that uniquely identify a row in the table, if duplicate rows are inserted with these keys, the latest row will be used.
batch_size100000The number of rows to fetch from the database in each batch, changing this can improve throughput for large tables. The default is 10000.

Table Creation Example

Below is an example of a BigQuery table connector configuration:

name: your_bigquery_dataset
schema_type: BIGQUERY
table: "`bq-project`.shaped.`data`"
columns: ["productId", "color", "brand", "stockLevel", "createdAt", "updatedAt"]
datetime_key: "updatedAt"
service_account_key_json: {"type": "service_account", "project_id": "your-project", "private_key_id": "key-id", "private_key": "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n", "client_email": "shaped-bigquery-readonly@your-project.iam.gserviceaccount.com", "client_id": "123456789", "auth_uri": "https://accounts.google.com/o/oauth2/auth", "token_uri": "https://oauth2.googleapis.com/token", "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs", "client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/shaped-bigquery-readonly%40your-project.iam.gserviceaccount.com"}
start_datetime: "2020-01-01T00:00:00Z"
filters: ["stockLevel > 0"]

The following payload will create a BigQuery table and begin syncing data from Shaped using the Shaped CLI.

shaped create-table --file dataset.yaml