
Connecting Your Data

Choose What Data to Include

The first step in creating your model is selecting the data to include. Shaped builds models from event, user, and item data. Only event data is strictly required, but user and item data give the model more to work with.

Events

Events represent actions users take in relation to items. Examples include "Alice shared video 3" or "Bob liked video 4". Events should include the following fields:

  • user_id
  • item_id
  • created_at

You do not need to include user or item attributes in the event data, as Shaped will handle this for you.
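Before loading events, it can help to check that each record carries the three fields listed above. The following is a minimal sketch, assuming events are available as Python dictionaries and that created_at is stored as an ISO 8601 string. The helper names are hypothetical and are not part of Shaped.

```python
from datetime import datetime

# Fields the guide lists as required for every event.
REQUIRED_EVENT_FIELDS = ("user_id", "item_id", "created_at")


def missing_event_fields(event: dict) -> list[str]:
    """Return the required fields that are absent or empty in an event record."""
    return [f for f in REQUIRED_EVENT_FIELDS if event.get(f) in (None, "")]


def has_valid_timestamp(event: dict) -> bool:
    """Check that created_at parses as an ISO 8601 timestamp (an assumed format)."""
    try:
        datetime.fromisoformat(str(event["created_at"]))
        return True
    except (KeyError, ValueError):
        return False


events = [
    {"user_id": "alice", "item_id": "video_3", "created_at": "2023-05-01T12:00:00+00:00"},
    {"user_id": "bob", "item_id": "video_4"},  # created_at is missing
]

for event in events:
    print(missing_event_fields(event), has_valid_timestamp(event))
```

Running a check like this on a sample of rows before creating a dataset makes it easier to spot incomplete events early.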

High-quality events that strongly indicate user preferences are crucial for building a robust model. Events are categorized into:

Positive Events (indicating user likes):

  • Likes
  • Clicks
  • Shares
  • Added_to_cart
  • Purchase
  • Bookmarks
  • Follows
  • Watched

Negative Events (indicating user dislikes, which also improve model performance):

  • Dislikes
  • Impressions
  • Reports
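The two categories above can be sketched as a simple lookup. This is an illustration only: the event type strings below are assumed spellings, and your own event tables may name them differently.

```python
# Assumed event type names, derived from the lists above.
POSITIVE_EVENTS = {
    "like", "click", "share", "added_to_cart",
    "purchase", "bookmark", "follow", "watched",
}
NEGATIVE_EVENTS = {"dislike", "impression", "report"}


def event_signal(event_type: str) -> str:
    """Classify an event type as 'positive', 'negative', or 'unknown'."""
    name = event_type.strip().lower()
    if name in POSITIVE_EVENTS:
        return "positive"
    if name in NEGATIVE_EVENTS:
        return "negative"
    return "unknown"


for event_type in ("Purchase", "impression", "scroll"):
    print(event_type, event_signal(event_type))
```

Event types that fall into "unknown" are worth reviewing, since they may not indicate a user preference at all.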

If your events are stored in separate tables, create a dataset in Shaped for each table.

Shaped can create a model using event data alone!

This approach is often used when building a Minimum Viable Product (MVP) model.

Users

For each user, include as much information as possible, such as:

  • user_id
  • created_at
  • country
  • occupation
  • gender

Items

Include relevant details about each item:

  • item_id
  • created_at
  • updated_at
  • categories
  • price
  • url

Consider adding attributes for filtering purposes (e.g., excluding certain items):

  • deleted
  • public
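To make the purpose of these attributes concrete, here is a minimal sketch of the kind of rule they enable. It is not how Shaped applies filters. The function name and the defaults for missing values are assumptions chosen for illustration.

```python
def is_recommendable(item: dict) -> bool:
    """Treat an item as eligible when it is not deleted and is public.

    Missing attributes default to 'not deleted' and 'public' (an assumption).
    """
    return bool(not item.get("deleted", False) and item.get("public", True))


items = [
    {"item_id": "a", "deleted": False, "public": True},
    {"item_id": "b", "deleted": True, "public": True},
    {"item_id": "c", "deleted": False, "public": False},
]

eligible = [item["item_id"] for item in items if is_recommendable(item)]
print(eligible)
```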

Connect Your Data Warehouse

With your data selected, the next step is connecting your data warehouse. Shaped supports several data warehouse integrations. You can view the full list in the Integrations section.

In this guide, we'll assume you use BigQuery. Follow the BigQuery setup guide to connect your warehouse with Shaped.

Add Your Tables to Shaped

Create Dataset Config Files

Add the tables to Shaped by creating a DATASET for each table or data source. For example, if you have an items table and an items_categories table, you need to create two datasets.

Create two YAML configuration files and use the Shaped CLI to create the datasets:

items.yaml:

name: items
schema_type: BIGQUERY
table: "`bq-project`.shaped.`data`"
columns: ["item_id", "created_at", "updated_at", "price"]
datetime_key: "updated_at"
start_datetime: "2020-01-01T00:00:00Z"

items_categories.yaml:

name: items_categories
schema_type: BIGQUERY
table: "`bq-project`.shaped.`data`"
columns: ["item_id", "created_at", "updated_at", "category", "deleted"]
datetime_key: "updated_at"
start_datetime: "2020-01-01T00:00:00Z"

Tip: See a list of all accepted datatypes here.

Info: Shaped ingests all rows whose datetime_key value is greater than that of the last ingested row. If you plan to update rows, use an updated_at timestamp as the datetime_key so that updated rows are picked up again.
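The effect of the datetime_key choice can be sketched with a simplified model of the rule described above. This is not Shaped's implementation. The function name, the sample rows, and the sync time are all hypothetical.

```python
from datetime import datetime, timezone


def rows_to_ingest(rows: list[dict], datetime_key: str, last_ingested: datetime) -> list[dict]:
    """Select rows whose datetime_key value is later than the last ingested value."""
    return [row for row in rows if row[datetime_key] > last_ingested]


def ts(day: int) -> datetime:
    """Build a UTC timestamp in January 2024 for the sample data."""
    return datetime(2024, 1, day, tzinfo=timezone.utc)


rows = [
    {"item_id": "a", "created_at": ts(1), "updated_at": ts(1)},
    {"item_id": "b", "created_at": ts(2), "updated_at": ts(5)},  # edited after the last sync
]
last_sync = ts(3)

by_updated = rows_to_ingest(rows, "updated_at", last_sync)
by_created = rows_to_ingest(rows, "created_at", last_sync)

print([row["item_id"] for row in by_updated])
print([row["item_id"] for row in by_created])
```

With updated_at as the key, the edited row "b" is selected again. With created_at as the key, the edit is missed because both rows were created before the last sync.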

Create Your Datasets With The Shaped CLI

Run the following commands to create the datasets and sync data into Shaped:

shaped create-dataset --file items.yaml
shaped create-dataset --file items_categories.yaml
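If you have many config files, the same commands can be generated from a script. The sketch below only builds and prints the commands shown above. The helper name is hypothetical, and actually running the commands (for example with subprocess.run) assumes the Shaped CLI is installed and configured.

```python
import shlex


def create_dataset_command(config_path: str) -> list[str]:
    """Build the argument list for one `shaped create-dataset` call."""
    return ["shaped", "create-dataset", "--file", config_path]


config_files = ("items.yaml", "items_categories.yaml")
commands = [create_dataset_command(path) for path in config_files]

for command in commands:
    # Print the command. To execute it instead, pass `command` to
    # subprocess.run(command, check=True).
    print(shlex.join(command))
```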

Next Steps

With your data connected to Shaped, the next step is to build your model configuration!