Connecting Your Data
Choose What Data to Include
The first step in creating your model is selecting the data to include. Shaped requires event, user, and item data to build your model.
Events
Events represent actions users take in relation to items. Examples include "Alice shared video 3" or "Bob liked video 4". Events should include the following fields:
- user_id
- item_id
- created_at
You do not need to include user or item attributes in the event data, as Shaped will handle this for you.
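As a minimal sketch of the required event shape, the Python snippet below checks that each event record carries the three fields listed above. The record layout (plain dictionaries), the sample values, and the helper name `missing_event_fields` are assumptions made for illustration; none of them are part of Shaped's API.

```python
from typing import Any, Dict, List

# Required event fields, taken from the list above.
REQUIRED_EVENT_FIELDS = ("user_id", "item_id", "created_at")


def missing_event_fields(event: Dict[str, Any]) -> List[str]:
    """Return the required fields that are absent or None in an event record."""
    return [field for field in REQUIRED_EVENT_FIELDS if event.get(field) is None]


# Hypothetical sample events; note that no user or item attributes are included.
events = [
    {"user_id": "alice", "item_id": "video_3", "created_at": "2023-01-05T10:00:00Z"},
    {"user_id": "bob", "item_id": "video_4"},  # created_at is missing
]

for event in events:
    problems = missing_event_fields(event)
    print(event.get("user_id"), "OK" if not problems else f"missing {problems}")
```

Running a check like this before loading events into your warehouse table can catch incomplete rows early.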
High-quality events that strongly indicate user preferences are crucial for building a robust model. Events are categorized into:
Positive Events (indicating user likes):
- Likes
- Clicks
- Shares
- Added_to_cart
- Purchase
- Bookmarks
- Follows
- Watched
Negative Events (indicating user dislikes; including them also improves model performance):
- Dislikes
- Impressions
- Reports
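The positive/negative split above can be sketched as a simple lookup. The event-type strings and the `event_label` helper are hypothetical and only modeled on the two lists; how Shaped itself treats each event type is not described here, so read this as one way to tag events in your own pipeline.

```python
from typing import Optional

# Hypothetical event-type names modeled on the lists above; substitute the
# names your own event tables actually store.
POSITIVE_EVENTS = {
    "like", "click", "share", "added_to_cart",
    "purchase", "bookmark", "follow", "watched",
}
NEGATIVE_EVENTS = {"dislike", "impression", "report"}


def event_label(event_type: str) -> Optional[int]:
    """Map an event type to 1 (positive), 0 (negative), or None (unknown)."""
    normalized = event_type.strip().lower()
    if normalized in POSITIVE_EVENTS:
        return 1
    if normalized in NEGATIVE_EVENTS:
        return 0
    return None


for name in ("Like", "impression", "scroll"):
    print(name, event_label(name))
```

Returning `None` for unknown types, rather than guessing, makes it easy to spot event names that have not been classified yet.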
If your events are stored in separate tables, create a dataset in Shaped for each table. This one-dataset-per-table approach is often used when building a Minimum Viable Product (MVP) model.
Users
For each user, include as much information as possible, such as:
- user_id
- created_at
- country
- occupation
- gender
Items
Include relevant details about each item:
- item_id
- created_at
- updated_at
- categories
- price
- url
Consider adding attributes for filtering purposes (e.g., excluding certain items):
- deleted
- public
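To illustrate why these flags are worth including, here is a small sketch of the kind of exclusion rule they make possible. It assumes both attributes are booleans and uses hypothetical item rows; it is not Shaped's filtering syntax.

```python
from typing import Any, Dict, List


def visible_items(items: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep only items that are public and not deleted."""
    return [
        item for item in items
        if item.get("public", False) and not item.get("deleted", False)
    ]


# Hypothetical item rows using the attribute names listed above.
items = [
    {"item_id": "video_3", "price": 0.0, "deleted": False, "public": True},
    {"item_id": "video_4", "price": 1.5, "deleted": True, "public": True},
    {"item_id": "video_5", "price": 2.0, "deleted": False, "public": False},
]

print([item["item_id"] for item in visible_items(items)])
```

Items that lack a `public` value are treated as not public in this sketch, which is the more conservative default.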
Connect Your Data Warehouse
With your data selected, the next step is connecting your data warehouse. Shaped supports several data warehouse integrations. You can view the full list in the Integrations section.
In this guide, we'll assume you use BigQuery. Follow the BigQuery setup guide to connect your warehouse with Shaped.
Add Your Tables to Shaped
Create Dataset Config Files
Add the tables to Shaped by creating a dataset for each table or data source. For example, if you have an items table and an items_category table, you need to create two datasets.
Create two YAML configuration files and use the Shaped CLI to create the datasets:
# items.yaml
name: items
schema_type: BIGQUERY
table: "`bq-project`.shaped.`data`"
columns: ["item_id", "created_at", "updated_at", "price"]
datetime_key: "updated_at"
start_datetime: "2020-01-01T00:00:00Z"

# items_categories.yaml
name: items_categories
schema_type: BIGQUERY
table: "`bq-project`.shaped.`data`"
columns: ["item_id", "created_at", "updated_at", "category", "deleted"]
datetime_key: "updated_at"
start_datetime: "2020-01-01T00:00:00Z"
See a list of all accepted datatypes here.
Shaped ingests all rows where the datetime_key value is greater than that of the last ingested row. Use an updated_at timestamp as the datetime_key if you plan to update rows.
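The effect of the datetime_key choice can be sketched with a simple high-water-mark filter. This is an illustration of the rule described above under simplified assumptions (in-memory rows, a hypothetical `ingest_new_rows` helper), not Shaped's actual ingestion code.

```python
from datetime import datetime
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def ingest_new_rows(
    rows: List[Row], datetime_key: str, watermark: datetime
) -> Tuple[List[Row], datetime]:
    """Return rows whose datetime_key is after the watermark, plus the new watermark."""
    new_rows = [row for row in rows if row[datetime_key] > watermark]
    new_watermark = max((row[datetime_key] for row in new_rows), default=watermark)
    return new_rows, new_watermark


# Hypothetical table state: item 1 was created in 2020 but edited in 2023.
rows = [
    {"item_id": 1, "created_at": datetime(2020, 3, 1), "updated_at": datetime(2023, 6, 1)},
    {"item_id": 2, "created_at": datetime(2021, 1, 1), "updated_at": datetime(2021, 1, 1)},
]
last_sync = datetime(2022, 1, 1)

# Keyed on updated_at, the edited row is picked up again.
by_updated, _ = ingest_new_rows(rows, "updated_at", last_sync)
# Keyed on created_at, the edit is missed because created_at never changes.
by_created, _ = ingest_new_rows(rows, "created_at", last_sync)

print([row["item_id"] for row in by_updated], [row["item_id"] for row in by_created])
```

In this sketch the edit to item 1 is only re-ingested when the key is updated_at, which is why a column that changes on every update is the safer choice for mutable tables.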
Create Your Datasets With The Shaped CLI
Run the following commands to create the datasets and sync data into Shaped:
shaped create-dataset --file items.yaml
shaped create-dataset --file items_categories.yaml
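If you have many config files, the same commands can be scripted. The sketch below wraps the `shaped create-dataset --file` command shown above using Python's standard `subprocess` module. The helper names and the `dry_run` flag are hypothetical, and the script assumes the Shaped CLI is installed and on your PATH.

```python
import subprocess
from typing import List


def create_dataset_command(config_file: str) -> List[str]:
    """Build the `shaped create-dataset` command for one config file."""
    return ["shaped", "create-dataset", "--file", config_file]


def create_datasets(config_files: List[str], dry_run: bool = True) -> List[List[str]]:
    """Print each command and, unless dry_run is set, run it, stopping on the first failure."""
    commands = [create_dataset_command(path) for path in config_files]
    for command in commands:
        print(" ".join(command))
        if not dry_run:
            # check=True raises CalledProcessError if the CLI exits with an error.
            subprocess.run(command, check=True)
    return commands


# Dry run by default, so nothing is executed until you pass dry_run=False.
create_datasets(["items.yaml", "items_categories.yaml"])
```

Passing the command as a list rather than a single string avoids shell quoting problems with file names that contain spaces.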
Next Steps
With your data connected to Shaped, the next step is to build your model configuration!