BigQuery

Shaped supports BigQuery through a Dataset Connector which will automatically sync data between your BigQuery data warehouse and Shaped.

Preparation

To allow Shaped to connect to your BigQuery data warehouse, you need to grant Shaped’s GCP service account read-only access to your project, dataset, and tables. You can do this through the GCP portal or with the following steps:

Install the gcloud command line tool.
Contact us for our service account via email.
Grant our service account permission to access your project, dataset, and tables via the roles/bigquery.dataViewer, roles/bigquery.jobUser and roles/bigquery.readSessionUser roles.

gcloud projects add-iam-policy-binding <YOUR_PROJECT> \
--member='serviceAccount:<OUR_SERVICE_ACCOUNT>' \
--role='roles/bigquery.dataViewer'

gcloud projects add-iam-policy-binding <YOUR_PROJECT> \
--member='serviceAccount:<OUR_SERVICE_ACCOUNT>' \
--role='roles/bigquery.jobUser'

gcloud projects add-iam-policy-binding <YOUR_PROJECT> \
--member='serviceAccount:<OUR_SERVICE_ACCOUNT>' \
--role='roles/bigquery.readSessionUser'

Dataset Configuration

Required fields

Field	Example	Description
schema_type	`BIGQUERY`	Specifies the connector schema type, in this case "BIGQUERY".
table	`bq-project.shaped.data`	Specifies the Google BigQuery table, qualified with the project and dataset. If your project name contains invalid YAML characters, you must surround it with backticks and double-quotes.
columns	`["productId", "color", "brand", "stockLevel", "createdAt", "updatedAt"]`	The name of the columns you wish to sync from BigQuery into Shaped.
datetime_key	`updatedAt`	The name of the column that contains a datetime key for ordering data during incremental syncs.

Optional fields

Field	Example	Description
start_datetime	`2020-01-01T00:00:00Z`	Specify a start datetime for the initial sync. If not specified, the initial sync will start from the earliest datetime in the table.
filters	`["event_type IN ['viewed', 'liked']"]`	Specify a list of filters to apply to the table. Filters must be in the form of a SQL WHERE clause.
unique_keys	`["productId"]`	Specify a list of columns that uniquely identify a row in the table, if duplicate rows are inserted with these keys, the latest row will be used.
batch_size	100000	The number of rows to fetch from the database in each batch, changing this can improve throughput for large tables. The default is 10000.

Dataset Creation Example

Below is an example of a BigQuery dataset connector configuration:

name: your_bigquery_dataset
schema_type: BIGQUERY
table: "`bq-project`.shaped.`data`"
columns: ["productId", "color", "brand", "stockLevel", "createdAt", "updatedAt"]
datetime_key: "updatedAt"
start_datetime: "2020-01-01T00:00:00Z"
filters: ["stockLevel > 0"]

The following payload will create a BigQuery dataset and begin syncing data from Shaped using the Shaped CLI.

shaped create-dataset --file dataset.yaml

BigQuery

Preparation​

Dataset Configuration​

Required fields​

Optional fields​

Dataset Creation Example​

Preparation

Dataset Configuration

Required fields

Optional fields

Dataset Creation Example