Connect data to an engine

Engines connect to data through the data block in the engine configuration. You can connect tables or views as item, user, or interaction tables.

Connecting data through the data block

The data block defines which tables or views the engine uses. Once your data is loaded as tables or views, you can connect them directly by name. For minor transformations, you can use the query option.

Connect a table or view

Connect a table or view by name:

data:
  item_table:
    name: unified_items
    type: table
  user_table:
    name: customers
    type: table
  interaction_table:
    name: normalized_interactions
    type: table

Interaction tables must include user_id, item_id, label, and created_at columns.
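
If your source table uses different column names, the query option (covered in the next section) can alias them to the required ones. The following is a minimal sketch, not a definitive configuration: the events table and its customer_id, product_id, event_value, and event_time columns are hypothetical names chosen for illustration.

```yaml
data:
  interaction_table:
    type: query
    query: |
      -- Hypothetical source table and columns; only the aliases
      -- (user_id, item_id, label, created_at) come from the requirement above.
      SELECT
        customer_id AS user_id,
        product_id AS item_id,
        event_value AS label,
        event_time AS created_at
      FROM events
```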

Simple transformations with query

For simple column renaming, you can use the query option:

data:
  item_table:
    type: query
    query: |
      SELECT product_id as item_id, name, description, category, price
      FROM products

For anything more complex than a column rename or cast, we recommend creating an SQL view.
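
A cast fits the same pattern as a rename. The sketch below assumes that price is stored as text in the products table and that the SQL dialect accepts CAST(... AS DOUBLE). Both are assumptions for illustration, so check which type names your dialect supports.

```yaml
data:
  item_table:
    type: query
    query: |
      SELECT
        product_id AS item_id,
        name,
        category,
        CAST(price AS DOUBLE) AS price  -- assumed: price arrives as text
      FROM products
```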

Data unification with views

For complex SQL operations such as joins, aggregations, or multi-table transformations, create a view instead of using the query option. Views are more maintainable and can be reused across multiple engines.

For example, this materialized view joins product info, reviews, and inventory into an enriched products table (the query is expanded for easy reading):

{
  "name": "enriched_products",
  "view_type": "SQL",
  "sql_view_type": "MATERIALIZED_VIEW",
  "sql_query": "SELECT 
    p.product_id as item_id, 
    p.name as title, 
    p.description, 
    p.category, 
    p.price, 
    i.inventory_count, 
    r.rating_avg
  FROM products p
  LEFT JOIN inventory i ON p.product_id = i.product_id
  LEFT JOIN reviews_summary r ON p.product_id = r.product_id"
}

The same query can also be defined in a YAML file, here under the name unified_items:

# unified_items_view.yaml
name: unified_items
view_type: SQL
sql_view_type: MATERIALIZED_VIEW
sql_query: |
  SELECT 
    p.product_id as item_id, 
    p.name as title, 
    p.description, 
    p.category, 
    p.price, 
    i.inventory_count, 
    r.rating_avg
  FROM products p
  LEFT JOIN inventory i ON p.product_id = i.product_id
  LEFT JOIN reviews_summary r ON p.product_id = r.product_id

Create the view from the file with the CLI:

shaped create-view --file unified_items_view.yaml

Then connect the view in your engine:

data:
  item_table:
    name: unified_items
    type: table

For details, see Table & View Basics.

Real-time vs batch connectors

Data sources use either real-time (streaming) or batch connectors:

Real-time connectors: Stream data continuously as events occur. Data is ingested within 30 seconds. Use for user interaction events (clicks, views, purchases) that need immediate processing.
Batch connectors: Sync data on a regular schedule (every 15 minutes). Use for item catalogs, user profiles, and historical data.

You can use both connector types in the same engine, for example real-time connectors for interaction events and batch connectors for item catalogs and user profiles.
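
As a rough illustration, an engine that mixes both connector types might look like the sketch below. It assumes the connector type is a property of each data source rather than something set in the data block, so the engine references every table by name in the same way. The table names clickstream_events, product_catalog, and customer_profiles are hypothetical.

```yaml
data:
  interaction_table:
    name: clickstream_events   # hypothetical table fed by a real-time connector
    type: table
  item_table:
    name: product_catalog      # hypothetical table synced by a batch connector
    type: table
  user_table:
    name: customer_profiles    # hypothetical table synced by a batch connector
    type: table
```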

For details, see Connector Types.

Override column types with schema_override

Use schema_override to specify how columns should be interpreted:

data:
  item_table:
    name: movies
    type: table
  schema_override:
    item:
      id: item_id
      features:
        - name: genre
          type: Sequence[TextCategory]
        - name: poster_url
          type: Image
        - name: movie_title
          type: Text
      created_at: created_at

If you don't specify a column in schema_override, Shaped will infer the type automatically.

Limit input data for faster iteration

If you have a large item catalog or interaction table, start by limiting each one to 500K rows or fewer. Engines trained on less data finish faster, so errors or inconsistencies surface sooner.

data:
  item_table:
    type: query
    query: |
      SELECT *
      FROM products
      LIMIT 500000
  interaction_table:
    type: query
    query: |
      SELECT *
      FROM interactions
      LIMIT 500000

Once the engine trains successfully and returns good results, train with your whole dataset.
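
SQL does not guarantee which rows LIMIT returns unless the query also has an ORDER BY, so a plain LIMIT may yield an arbitrary sample. One option, offered here as a sketch and not as a documented recommendation, is to keep the most recent interactions by sorting on the required created_at column. The interactions table name is taken from the example above.

```yaml
data:
  interaction_table:
    type: query
    query: |
      -- Keep the 500K most recent interactions instead of an arbitrary subset.
      SELECT *
      FROM interactions
      ORDER BY created_at DESC
      LIMIT 500000
```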
