Table & View Basics
Tables and views form the foundation of the data layer in Shaped. Tables are the foundation—they store your raw data from production sources via connectors. Views build on tables by transforming and enriching that data through SQL queries or AI-powered enrichment.
A typical workflow: connect your data source → create a table → optionally create views to transform, join, or enrich the data → use tables and views in engines for training and retrieval.
Tables
Tables are the core component of the data layer in Shaped.
Each table is populated via a connector to a source, such as a MySQL database or Amplitude account.
Connectors
Connectors establish a connection to where your data already lives. You can use a custom connector to stream data directly into Shaped using the Shaped API or Shaped CLI.
Built-in Connectors
Shaped supports out-of-the-box connectors to a range of systems, including:
- Data Warehouses: Such as BigQuery, Snowflake, and Redshift.
- Databases: Such as PostgreSQL and MySQL.
- Blob Storage: Such as S3 and Google Cloud Storage (GCS).
- Analytics Applications: Such as Segment and Amplitude.
Custom connectors
If your datasource is not supported, you can declare a custom table.
Use the "Create table" endpoint to declare a new table and its schema.
Then use a /table/insert endpoint to add records to the table.
Connector Types
Shaped supports both real-time (streaming) and batch connectors. Design your data architecture with the difference in mind:
- Real-time connectors: Stream data continuously (e.g., Segment, Kafka, Kinesis). Data appears in Shaped within 30 seconds.
- Batch connectors: Sync data on a schedule, typically every 15 minutes (e.g., BigQuery, PostgreSQL, Snowflake).
For more details on connector types, their use cases, and limitations, see the Connector Types guide.
Table schemas
Every table has a schema that describes the structure of your data.
Built-in connectors infer the column structure from the datasource you connect.
Custom tables require you to upload a schema. It should include:
columns: The names of the fields in your data.unique_keys: A list of columns that uniquely identify each row in the table (e.g., ["id"]).
Append-only tables
Some tables are created from event streams and are append-only. When you add rows to such tables, they are added to the bottom.
To ensure deduplication happens correctly, make sure that each record contains the primary_key and replication_key columns defined in your schema.
Views
A view is a transformed view of one or more tables. Views are created by SQL queries or LLM prompts.
SQL view
An SQL view is created by running an SQL query against one or more tables.
SQL Dialect: Views use ClickHouse SQL syntax. All SQL queries in views are executed using ClickHouse SQL, so ClickHouse functions, operators, and syntax are supported. For example:
- Use
now()(lowercase) instead ofNOW() - Use
toYear(),toMonth(), etc. instead ofEXTRACT(YEAR FROM ...) - Use
quantile()for percentile calculations - Use
INTERVAL 1 HOUR(no quotes, uppercase unit) instead ofINTERVAL '1 hour'
See the ClickHouse SQL documentation for full function reference.
You can use them to:
- Filter or aggregate source tables
- Join multiple tables
- Create derived features from existing columns
- Reshape data structures such as JSON
Materialized vs On-Demand Views
Shaped supports two types of SQL views:
Materialized views (default): The view query is executed and the results are stored as a physical table. Materialized views are updated automatically when source tables change.
- Pros: Fast query performance, pre-computed results
- Cons: Takes storage space, slight delay in updates
- Use when: You need fast queries, the view is used frequently, or you're joining large tables
On-demand views: The view query is executed each time the view is accessed. Results are computed on-the-fly.
- Pros: Always up-to-date, no storage overhead
- Cons: Slower query performance, recomputes on every access
- Use when: Data changes frequently, you need real-time results, or storage is a concern
To create an on-demand view, set sql_view_type: "VIEW":
SELECT *
FROM items
WHERE updated_at > now() - INTERVAL 1 HOUR
POST /v2/views
{
"name": "realtime_items",
"view_type": "SQL",
"sql_view_type": "VIEW",
"sql_query": "SELECT * FROM items WHERE updated_at > now() - INTERVAL 1 HOUR"
}
By default, views use sql_view_type: "MATERIALIZED_VIEW" for optimal
performance.
Example: Join two tables
- Python
- HTTP
import requests
import json
url = "https://api.shaped.ai/v2/views"
payload = json.dumps({
"name": "hm_items_with_descriptions",
"view_type": "SQL",
"sql_query": "select * from hm_items items left join hm_items_with_descriptions ai_items on ai_items.item_id = items.item_id"
})
headers = {
'Content-Type': 'application/json',
'x-api-key': 'LmBjMxgOtV1js4Zf3jiqw1VhnWfa7I1u9jBLSJeg'
}
response = requests.request("POST", url, headers=headers, data=payload)
POST /v2/views HTTP/1.1
Host: api.shaped.ai
Content-Type: application/json
x-api-key: LmBjMxgOtV1js4Zf3jiqw1VhnWfa7I1u9jBLSJeg
Content-Length: 236
{
"name": "hm_items_all_columns",
"view_type": "SQL",
"sql_query" : "select * from hm_articles items left join hm_items_with_descriptions ai_items on ai_items.item_id = items.item_id"
}
AI view
An AI view is created by running an LLM prompt against the records in a table. They take columns from a source table, apply an LLM with a prompt, and output new columns with enriched information.
Use them specifically to enrich data that will power semantic search.
Learn more in our AI enrichment guide - Enrich your tables with AI