Connecting data to engines
Engines connect to data through the data block in the engine configuration. You can connect tables or views as item, user, or interaction tables.
Data unification with views
Use views to unify data from multiple sources into a single table for your engine. Views let you join tables, transform schemas, and create derived features before connecting to an engine.
Views can be materialized or on-demand:
- Materialized views (default): Results are stored as a physical table and updated automatically when source tables change. Use for fast query performance and frequently accessed data.
- On-demand views: Results are computed on the fly each time the view is accessed. Use when source data changes frequently or you need real-time results.
For details, see Table & View Basics.
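The difference between stored and computed results can be illustrated outside Shaped with a small SQLite sketch. This is only an analogy under stated assumptions: SQLite has no materialized views, so a table created with CREATE TABLE ... AS stands in for stored results, and the `events` table and its rows are hypothetical. Unlike a Shaped materialized view, the stored copy here is not refreshed automatically.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical source table with one interaction event.
conn.execute("CREATE TABLE events (user_id TEXT, item_id TEXT)")
conn.execute("INSERT INTO events VALUES ('u1', 'p1')")

count_sql = "SELECT item_id, COUNT(*) AS n FROM events GROUP BY item_id"

# On-demand analogue: a plain view, recomputed on every read.
conn.execute(f"CREATE VIEW counts_on_demand AS {count_sql}")

# Stored-results analogue: a physical table holding a snapshot of the query.
conn.execute(f"CREATE TABLE counts_stored AS {count_sql}")

# The source table changes after both objects exist.
conn.execute("INSERT INTO events VALUES ('u2', 'p1')")

on_demand = conn.execute("SELECT n FROM counts_on_demand").fetchall()
stored = conn.execute("SELECT n FROM counts_stored").fetchall()

print(on_demand)  # the view sees the new row immediately
print(stored)     # the snapshot still holds the old count until refreshed
```

Reads from the stored copy avoid recomputing the query, which is why stored results suit frequently accessed data; the cost is a refresh step, which Shaped performs for materialized views when source tables change.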
Connecting data through the data block
The data block defines which tables or views the engine uses. You can connect tables directly or use the query option for simple transformations.
Connect a table or view
Connect a table or view by name:
data:
  item_table:
    name: unified_items
    type: table
  user_table:
    name: customers
    type: table
  interaction_table:
    name: normalized_interactions
    type: table
Simple transformations with query
For simple column renaming, you can use the query option:
data:
  item_table:
    type: query
    query: |
      SELECT product_id as item_id, name, description, category, price
      FROM products
For anything more complex than renaming or casting columns, we recommend creating an SQL view.
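Before putting a renaming query into the data block, it can help to dry-run it locally and confirm the output column names. The following is a minimal sketch using Python's built-in sqlite3 module; the sample row is hypothetical, and SQLite's dialect may differ from the one Shaped uses, so treat it as a sanity check rather than a guarantee.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical sample data standing in for the `products` source table.
conn.execute(
    "CREATE TABLE products "
    "(product_id TEXT, name TEXT, description TEXT, category TEXT, price REAL)"
)
conn.execute(
    "INSERT INTO products VALUES ('p1', 'Kettle', 'Steel kettle', 'kitchen', 24.5)"
)

# The same SELECT used in the query option above.
query = """
SELECT product_id as item_id, name, description, category, price
FROM products
"""
cursor = conn.execute(query)

# cursor.description has one tuple per output column; the name comes first.
columns = [col[0] for col in cursor.description]
rows = cursor.fetchall()

print(columns)  # ['item_id', 'name', 'description', 'category', 'price']
```

The first output column is now `item_id`, while the remaining columns keep their source names.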
Use views for complex operations
For complex SQL operations such as joins, aggregations, or multi-table transformations, create a view instead of using the query option. Views are more maintainable and can be reused across multiple engines.
# unified_items_view.yaml
name: unified_items
view_type: SQL
sql_view_type: MATERIALIZED_VIEW
sql_query: |
  SELECT
    p.product_id as item_id,
    p.name as title,
    p.description,
    p.category,
    p.price,
    i.inventory_count,
    r.rating_avg
  FROM products p
  LEFT JOIN inventory i ON p.product_id = i.product_id
  LEFT JOIN reviews_summary r ON p.product_id = r.product_id
shaped create-view --file unified_items_view.yaml
Then connect the view in your engine:
data:
  item_table:
    name: unified_items
    type: table
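To see what the unified_items query returns when some products have no inventory or review rows, the join can be tried locally. This is a sketch using Python's built-in sqlite3 module with hypothetical sample rows; it only illustrates standard LEFT JOIN behavior and does not call Shaped.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical sample data: product p2 has no inventory or review rows.
conn.executescript("""
CREATE TABLE products (product_id TEXT, name TEXT, description TEXT,
                       category TEXT, price REAL);
CREATE TABLE inventory (product_id TEXT, inventory_count INTEGER);
CREATE TABLE reviews_summary (product_id TEXT, rating_avg REAL);
INSERT INTO products VALUES ('p1', 'Kettle', 'Steel kettle', 'kitchen', 24.5);
INSERT INTO products VALUES ('p2', 'Mug', 'Ceramic mug', 'kitchen', 8.0);
INSERT INTO inventory VALUES ('p1', 12);
INSERT INTO reviews_summary VALUES ('p1', 4.5);
""")

# The same query as in unified_items_view.yaml.
view_sql = """
SELECT
  p.product_id as item_id,
  p.name as title,
  p.description,
  p.category,
  p.price,
  i.inventory_count,
  r.rating_avg
FROM products p
LEFT JOIN inventory i ON p.product_id = i.product_id
LEFT JOIN reviews_summary r ON p.product_id = r.product_id
"""

# Sort in Python so the row order is deterministic for inspection.
rows = sorted(conn.execute(view_sql).fetchall())
for row in rows:
    print(row)
```

Both products appear in the result. The row for `p2` carries NULL (None in Python) for `inventory_count` and `rating_avg`, so every item stays in the item table even when the joined sources have no matching row.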
Real-time vs batch connectors
Data sources use either real-time (streaming) or batch connectors:
- Real-time connectors: Stream data continuously as events occur. Data is ingested within 30 seconds. Use for user interaction events (clicks, views, purchases) that need immediate processing.
- Batch connectors: Sync data on a regular schedule (every 15 minutes). Use for item catalogs, user profiles, and historical data.
You can combine both connector types in the same engine, for example a real-time connector for the interaction table alongside batch connectors for the item and user tables.
For details, see Connector Types.
Schema configuration
Use schema_override to specify how columns should be interpreted:
data:
  item_table:
    name: movies
    type: table
  schema_override:
    item:
      id: item_id
      features:
        - name: genre
          type: Sequence[TextCategory]
        - name: poster_url
          type: Image
        - name: movie_title
          type: Text
      created_at: created_at
If you don't specify a column in schema_override, Shaped will infer the type automatically.