Connect data to an engine
Engines connect to data through the data block in the engine configuration. You can connect tables or views as item, user, or interaction tables.
Connecting data through the data block
The data block defines which tables or views the engine uses. Once your data is loaded as tables or views, you can connect them directly by name. For minor transformations, such as renaming or casting columns, you can use the query option.
Connect a table or view
Connect a table or view by name:
data:
  item_table:
    name: unified_items
    type: table
  user_table:
    name: customers
    type: table
  interaction_table:
    name: normalized_interactions
    type: table
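If you generate engine configs from scripts, a quick pre-flight check can catch a missing name or query before you submit. The sketch below is a hypothetical helper (check_data_block is not part of any Shaped SDK). Its rules only mirror the examples in this guide: a type: table entry needs a name, a type: query entry needs a query, and the only roles are item_table, user_table, and interaction_table.

```python
from typing import Any

# Table roles the data block accepts, per this guide.
TABLE_ROLES = ("item_table", "user_table", "interaction_table")


def check_data_block(data: dict[str, Any]) -> list[str]:
    """Return problems found in a data block; an empty list means none found.

    Hypothetical helper: the rules mirror this guide's examples,
    not an official Shaped schema.
    """
    problems = []
    for role, entry in data.items():
        if role not in TABLE_ROLES:
            problems.append(f"{role}: unknown table role")
            continue
        kind = entry.get("type")
        if kind == "table" and not entry.get("name"):
            problems.append(f"{role}: type 'table' needs a 'name'")
        elif kind == "query" and not entry.get("query"):
            problems.append(f"{role}: type 'query' needs a 'query'")
        elif kind not in ("table", "query"):
            problems.append(f"{role}: unsupported type {kind!r}")
    return problems


# The data block from the YAML above, as a Python dict.
data = {
    "item_table": {"name": "unified_items", "type": "table"},
    "user_table": {"name": "customers", "type": "table"},
    "interaction_table": {"name": "normalized_interactions", "type": "table"},
}
print(check_data_block(data))  # []
```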
Simple transformations with query
For simple column renaming, you can use the query option:
data:
  item_table:
    type: query
    query: |
      SELECT product_id as item_id, name, description, category, price
      FROM products
For anything more complex than a column rename or casting, we recommend creating an SQL view.
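To see what the rename query hands to the engine, you can run the same SELECT locally. This sketch uses Python's built-in sqlite3 module as a stand-in for your warehouse, with an invented products table. The alias turns product_id into item_id, and any column left out of the SELECT (here the hypothetical internal_sku) is dropped.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products (product_id TEXT, name TEXT, description TEXT, "
    "category TEXT, price REAL, internal_sku TEXT)"
)
# Invented sample rows for illustration.
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("p1", "Desk lamp", "LED lamp", "home", 24.99, "SKU-1"),
        ("p2", "Mug", "Ceramic mug", "kitchen", 9.5, "SKU-2"),
    ],
)

# The same query as in the YAML above.
cursor = conn.execute(
    """
    SELECT product_id as item_id, name, description, category, price
    FROM products
    """
)
columns = [col[0] for col in cursor.description]  # output column names
rows = cursor.fetchall()
print(columns)  # ['item_id', 'name', 'description', 'category', 'price']
```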
Data unification with views
For complex SQL operations such as joins, aggregations, or multi-table transformations, create a view instead of using the query option. Views are more maintainable and can be reused across multiple engines.
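For example, this materialized view joins product details, inventory counts, and review summaries into an enriched products table. The query is split across lines for readability; in valid JSON, the sql_query string must be a single line with newlines escaped as \n.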
{
  "name": "enriched_products",
  "view_type": "SQL",
  "sql_view_type": "MATERIALIZED_VIEW",
  "sql_query": "SELECT
    p.product_id as item_id,
    p.name as title,
    p.description,
    p.category,
    p.price,
    i.inventory_count,
    r.rating_avg
  FROM products p
  LEFT JOIN inventory i ON p.product_id = i.product_id
  LEFT JOIN reviews_summary r ON p.product_id = r.product_id"
}
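If you build the JSON view definition in code, a JSON serializer can produce the single-line sql_query for you while you keep the query readable in source. This sketch uses Python's standard json module; json.dumps escapes the literal newlines as \n, so the result is valid JSON. How you then submit the payload is outside the scope of this sketch.

```python
import json

# The view's query as a readable multi-line string.
sql_query = """SELECT
  p.product_id as item_id,
  p.name as title,
  p.description,
  p.category,
  p.price,
  i.inventory_count,
  r.rating_avg
FROM products p
LEFT JOIN inventory i ON p.product_id = i.product_id
LEFT JOIN reviews_summary r ON p.product_id = r.product_id"""

view_definition = {
    "name": "enriched_products",
    "view_type": "SQL",
    "sql_view_type": "MATERIALIZED_VIEW",
    "sql_query": sql_query,
}

# json.dumps escapes newlines inside strings, producing valid JSON.
payload = json.dumps(view_definition, indent=2)
print(payload)
```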
To create the view with the Shaped CLI, define it in a YAML file (here the view is named unified_items):
# unified_items_view.yaml
name: unified_items
view_type: SQL
sql_view_type: MATERIALIZED_VIEW
sql_query: |
  SELECT
    p.product_id as item_id,
    p.name as title,
    p.description,
    p.category,
    p.price,
    i.inventory_count,
    r.rating_avg
  FROM products p
  LEFT JOIN inventory i ON p.product_id = i.product_id
  LEFT JOIN reviews_summary r ON p.product_id = r.product_id
Then run:
shaped create-view --file unified_items_view.yaml
Then connect the view in your engine:
data:
  item_table:
    name: unified_items
    type: table
For details, see Table & View Basics.
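You can preview what the joined view produces by running the same SQL against sample tables. This sketch uses Python's built-in sqlite3 module with invented data. SQLite has no materialized views, so a plain CREATE VIEW stands in; the point is the LEFT JOIN behavior: every product is kept, and a product with no inventory row gets NULL (None in Python) for inventory_count.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE products (product_id TEXT, name TEXT, description TEXT,
                           category TEXT, price REAL);
    CREATE TABLE inventory (product_id TEXT, inventory_count INTEGER);
    CREATE TABLE reviews_summary (product_id TEXT, rating_avg REAL);

    -- Invented sample data: p2 has no inventory row.
    INSERT INTO products VALUES
      ('p1', 'Desk lamp', 'LED lamp', 'home', 24.99),
      ('p2', 'Mug', 'Ceramic mug', 'kitchen', 9.5);
    INSERT INTO inventory VALUES ('p1', 12);
    INSERT INTO reviews_summary VALUES ('p1', 4.6), ('p2', 4.1);

    -- SQLite has no materialized views; a plain view stands in here.
    CREATE VIEW unified_items AS
    SELECT
      p.product_id AS item_id,
      p.name AS title,
      p.description,
      p.category,
      p.price,
      i.inventory_count,
      r.rating_avg
    FROM products p
    LEFT JOIN inventory i ON p.product_id = i.product_id
    LEFT JOIN reviews_summary r ON p.product_id = r.product_id;
    """
)

rows = conn.execute(
    "SELECT item_id, title, inventory_count, rating_avg "
    "FROM unified_items ORDER BY item_id"
).fetchall()
for row in rows:
    print(row)
# ('p1', 'Desk lamp', 12, 4.6)
# ('p2', 'Mug', None, 4.1)
```

A LEFT JOIN also repeats a product once per matching row, so inventory and reviews_summary should hold at most one row per product_id if you want one row per item in the view.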
Real-time vs batch connectors
Data sources use either real-time (streaming) or batch connectors:
- Real-time connectors: Stream data continuously as events occur. Data is ingested within 30 seconds. Use for user interaction events (clicks, views, purchases) that need immediate processing.
- Batch connectors: Sync data on a regular schedule (every 15 minutes). Use for item catalogs, user profiles, and historical data.
You can combine both connector types in the same engine, for example real-time connectors for interaction events alongside batch connectors for item catalogs and user profiles.
For details, see Connector Types.
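The guidance above amounts to a simple routing rule by table role. This sketch encodes it as a hypothetical helper (choose_connector and ConnectorChoice are illustrative names, not a Shaped API). The lag figures come from the list above; the batch figure is a rough worst case that assumes a change lands just after a sync and ignores how long the sync itself takes.

```python
from dataclasses import dataclass

# Freshness figures from the connector descriptions above.
REAL_TIME_MAX_LAG_S = 30      # real-time: ingested within 30 seconds
BATCH_INTERVAL_S = 15 * 60    # batch: syncs every 15 minutes


@dataclass(frozen=True)
class ConnectorChoice:
    kind: str
    max_lag_seconds: int  # rough worst-case delay before data is available


def choose_connector(table_role: str) -> ConnectorChoice:
    """Pick a connector type for a table role, following this guide's advice."""
    if table_role == "interaction":
        return ConnectorChoice("real-time", REAL_TIME_MAX_LAG_S)
    if table_role in ("item", "user"):
        # A change just after a sync waits for the next one (sync time ignored).
        return ConnectorChoice("batch", BATCH_INTERVAL_S)
    raise ValueError(f"unknown table role: {table_role!r}")


for role in ("item", "user", "interaction"):
    print(role, choose_connector(role))
```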
Override column types with schema_override
Use schema_override to specify how columns should be interpreted:
data:
  item_table:
    name: movies
    type: table
    schema_override:
      item:
        id: item_id
        features:
          - name: genre
            type: Sequence[TextCategory]
          - name: poster_url
            type: Image
          - name: movie_title
            type: Text
        created_at: created_at
If you don't specify a column in schema_override, Shaped will infer the type automatically.
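Conceptually, an override takes precedence for the columns it lists, and every other column falls back to inference. The sketch below models that precedence only. Its infer_type function is a deliberately naive, hypothetical stand-in and is NOT how Shaped infers types; it exists to show why poster_url needs an explicit Image override, since plain URL strings would otherwise look like Text.

```python
from typing import Any


def infer_type(values: list[Any]) -> str:
    """Hypothetical stand-in for automatic inference; NOT Shaped's actual logic."""
    if values and all(isinstance(v, list) for v in values):
        return "Sequence[TextCategory]"
    return "Text"


def resolve_feature_types(
    columns: dict[str, list[Any]], overrides: dict[str, str]
) -> dict[str, str]:
    """Use the override type when one is given; otherwise fall back to inference."""
    return {
        name: overrides.get(name, infer_type(values))
        for name, values in columns.items()
    }


# Invented sample movie columns; "tagline" has no override.
columns = {
    "genre": [["drama"], ["comedy", "romance"]],
    "poster_url": ["https://example.com/heat.jpg", "https://example.com/alien.jpg"],
    "movie_title": ["Heat", "Alien"],
    "tagline": ["A Los Angeles crime saga", "In space no one can hear you scream"],
}

# Mirrors the features listed under schema_override in the YAML above.
overrides = {
    "genre": "Sequence[TextCategory]",
    "poster_url": "Image",
    "movie_title": "Text",
}

resolved = resolve_feature_types(columns, overrides)
print(resolved)
```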
Limit input data for faster iteration
If your item or interaction data is large, start by limiting each table to at most 500K rows. Smaller inputs train faster, so you can surface errors and data inconsistencies sooner.
data:
  item_table:
    type: query
    query: |
      SELECT *
      FROM products
      LIMIT 500000
  interaction_table:
    type: query
    query: |
      SELECT *
      FROM interactions
      LIMIT 500000
Once the engine trains successfully and returns good results, remove the limits and train on your full dataset.
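To confirm that the LIMIT caps the rows each table contributes, you can run the same queries against synthetic data. This sketch uses Python's built-in sqlite3 module, invented tables, and a scaled-down cap of 1,000 rows so it runs instantly; the YAML above uses 500000.

```python
import sqlite3

# Scaled-down cap so the sketch runs instantly; the YAML above uses 500000.
CAP = 1_000

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (product_id INTEGER, name TEXT)")
conn.execute("CREATE TABLE interactions (user_id INTEGER, product_id INTEGER, event TEXT)")

# Synthetic data: 2,500 products and 7,000 click events.
conn.executemany(
    "INSERT INTO products VALUES (?, ?)",
    ((i, f"product {i}") for i in range(2_500)),
)
conn.executemany(
    "INSERT INTO interactions VALUES (?, ?, ?)",
    ((i % 300, i % 2_500, "click") for i in range(7_000)),
)

# Same shape as the queries above; SQLite accepts a bound parameter in LIMIT.
item_rows = conn.execute("SELECT * FROM products LIMIT ?", (CAP,)).fetchall()
interaction_rows = conn.execute("SELECT * FROM interactions LIMIT ?", (CAP,)).fetchall()

total_items = conn.execute("SELECT COUNT(*) FROM products").fetchone()[0]
total_interactions = conn.execute("SELECT COUNT(*) FROM interactions").fetchone()[0]

print(f"items: {len(item_rows)} of {total_items}")                 # items: 1000 of 2500
print(f"interactions: {len(interaction_rows)} of {total_interactions}")  # interactions: 1000 of 7000
```

Without an ORDER BY, LIMIT returns whichever rows the database produces first, so the subset is not a random sample and can differ between runs.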