BigQuery
Shaped supports BigQuery through a Dataset Connector that automatically syncs data from your BigQuery data warehouse into Shaped.
Preparation
To allow Shaped to connect to your BigQuery data warehouse, you need to grant Shaped’s GCP service account read-only access to your project, dataset, and tables. You can do this through the Google Cloud console or with the following steps:
- Install the gcloud command-line tool.
- Contact us via email for our service account.
- Grant our service account permission to access your project, dataset, and tables via the roles/bigquery.dataViewer, roles/bigquery.jobUser, and roles/bigquery.readSessionUser roles:
gcloud projects add-iam-policy-binding <YOUR_PROJECT> \
--member='serviceAccount:<OUR_SERVICE_ACCOUNT>' \
--role='roles/bigquery.dataViewer'
gcloud projects add-iam-policy-binding <YOUR_PROJECT> \
--member='serviceAccount:<OUR_SERVICE_ACCOUNT>' \
--role='roles/bigquery.jobUser'
gcloud projects add-iam-policy-binding <YOUR_PROJECT> \
--member='serviceAccount:<OUR_SERVICE_ACCOUNT>' \
--role='roles/bigquery.readSessionUser'
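To verify the bindings were applied, one way is to list the roles granted to our service account using standard gcloud IAM policy flags (same placeholders as above); you should see the three roles/bigquery.* roles in the output:
gcloud projects get-iam-policy <YOUR_PROJECT> \
--flatten='bindings[].members' \
--filter='bindings.members:<OUR_SERVICE_ACCOUNT>' \
--format='table(bindings.role)'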
Dataset Configuration
Required fields
Field | Example | Description |
---|---|---|
schema_type | BIGQUERY | Specifies the connector schema type, in this case "BIGQUERY". |
table | bq-project.shaped.data | Specifies the Google BigQuery table, qualified with the project and dataset. If your project name contains invalid YAML characters, you must surround it with backticks and double-quotes. |
columns | ["productId", "color", "brand", "stockLevel", "createdAt", "updatedAt"] | The name of the columns you wish to sync from BigQuery into Shaped. |
datetime_key | updatedAt | The name of the column that contains a datetime key for ordering data during incremental syncs. |
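Before configuring the dataset, it can help to confirm that the columns you plan to sync, including the datetime_key column, actually exist on the table. A quick sanity check with the bq command-line tool, assuming the example table name used below:
bq show --schema --format=prettyjson bq-project:shaped.data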
Optional fields
Field | Example | Description |
---|---|---|
start_datetime | 2020-01-01T00:00:00Z | Specify a start datetime for the initial sync. If not specified, the initial sync will start from the earliest datetime in the table. |
filters | ["event_type IN ['viewed', 'liked']"] | Specify a list of filters to apply to the table. Filters must be in the form of a SQL WHERE clause. |
unique_keys | ["productId"] | Specify a list of columns that uniquely identify a row in the table, if duplicate rows are inserted with these keys, the latest row will be used. |
batch_size | 100000 | The number of rows to fetch from the database in each batch, changing this can improve throughput for large tables. The default is 10000. |
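For instance, a deduplicated, higher-throughput sync could add the two optional keys below to a configuration like the one shown in the next section (the values here are illustrative, not defaults):
unique_keys: ["productId"]
batch_size: 50000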
Dataset Creation Example
Below is an example of a BigQuery dataset connector configuration:
name: your_bigquery_dataset
schema_type: BIGQUERY
table: "`bq-project`.shaped.`data`"
columns: ["productId", "color", "brand", "stockLevel", "createdAt", "updatedAt"]
datetime_key: "updatedAt"
start_datetime: "2020-01-01T00:00:00Z"
filters: ["stockLevel > 0"]
The following command will create the BigQuery dataset and begin syncing data into Shaped, using the Shaped CLI:
shaped create-dataset --file dataset.yaml
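If your version of the Shaped CLI includes a dataset listing command (for example, shaped list-datasets; check shaped --help, as this is an assumption about your CLI version), you can use it to confirm the dataset was registered and is syncing:
# Assumes the CLI exposes list-datasets; consult `shaped --help` for your version.
shaped list-datasets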