Python SDK

The Python SDK is generated from the Shaped API schema, with rich types and inline descriptions so you can use your IDE’s IntelliSense/auto‑completion to discover methods, parameters, and return types as you work.

Install

Install the SDK from PyPI:

pip install shaped

Verify the installation:

pip show shaped
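
The same check can be made from Python, which can be convenient in a startup script or CI job. This is a minimal sketch built on the standard library's importlib.metadata; the helper name installed_version is illustrative and is not part of the SDK.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Report the SDK version without raising if the package is absent.
shaped_version = installed_version("shaped")
print(shaped_version or "shaped is not installed")
```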

Instantiate the client

The core entry point is the Client class. You typically construct it once at application startup and reuse it.

from shaped import Client

client = Client(api_key="YOUR_API_KEY")

You can store the API key in an environment variable and read it at runtime:

import os
from shaped import Client

api_key = os.environ["SHAPED_API_KEY"]
client = Client(api_key=api_key)
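
Reading os.environ["SHAPED_API_KEY"] directly raises a bare KeyError when the variable is unset. The sketch below wraps the lookup so that a missing or blank key produces a clearer message; load_api_key is a hypothetical helper, not something the SDK provides.

```python
import os
from typing import Mapping


def load_api_key(env: Mapping[str, str] = os.environ, var: str = "SHAPED_API_KEY") -> str:
    """Return the API key stored in `env[var]`, or raise a descriptive error."""
    key = env.get(var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {var} environment variable before creating the client.")
    return key


# Usage with the client from the example above:
#   client = Client(api_key=load_api_key())
```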

Load data from raw records

This section shows how to:

  1. Define a custom table schema
  2. Insert raw Python records into the table

1. Create a custom table

Use a CUSTOM schema when you want to control the column definitions and upload data directly:

table_config = {
    "schema_type": "CUSTOM",
    "name": "pixar_movies",
    "column_schema": {
        "item_id": "Int64",
        "movie_title": "String",
        "poster_url": "String",
        "description": "String",
        "release_date": "String",
        "cast": "Array(String)",
    },
}

client.create_table(table_config)
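
The column_schema above also works as a contract for the rows you insert later. As a rough illustration, the sketch below checks a record against such a schema on the client side. It is not part of the SDK: validate_record and TYPE_CHECKS are hypothetical names, only the three type names used in this example are covered, and every column is treated as required, which may be stricter than the service itself.

```python
from typing import Any, Callable, Dict, List

# Assumed mapping from the column type names used above to Python checks.
TYPE_CHECKS: Dict[str, Callable[[Any], bool]] = {
    "Int64": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "String": lambda v: isinstance(v, str),
    "Array(String)": lambda v: isinstance(v, list) and all(isinstance(x, str) for x in v),
}


def validate_record(record: Dict[str, Any], column_schema: Dict[str, str]) -> List[str]:
    """Return a list of problems; an empty list means the record fits the schema."""
    problems = []
    for column, type_name in column_schema.items():
        check = TYPE_CHECKS.get(type_name)
        if column not in record:
            problems.append(f"missing column: {column}")
        elif check is not None and not check(record[column]):
            # Type names without a known check are skipped rather than rejected.
            problems.append(f"{column}: expected {type_name}")
    for column in record:
        if column not in column_schema:
            problems.append(f"unexpected column: {column}")
    return problems


example_schema = {"item_id": "Int64", "movie_title": "String", "cast": "Array(String)"}
good = {"item_id": 1, "movie_title": "Coco (2017)", "cast": ["Anthony Gonzalez"]}
bad = {"item_id": "1", "movie_title": "Coco (2017)", "year": 2017}

print(validate_record(good, example_schema))
print(validate_record(bad, example_schema))
```

Running a check like this before uploading can surface type mismatches early, instead of relying on the API to reject the rows.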

2. Insert raw records

You can insert Python dicts directly using insert_table_rows:

records = [
    {
        "item_id": 187541,
        "movie_title": "Incredibles 2 (2018)",
        "poster_url": "https://m.media-amazon.com/images/M/MV5BMTEzNzY0OTg0NTdeQTJeQWpwZ15BbWU4MDU3OTg3MjUz._V1_QL75_UX380_CR0,0,380,562_.jpg",
        "description": "The Incredibles family takes on a new mission which involves a change in family roles.",
        "release_date": "2018-06-15",
        "cast": ["Craig T. Nelson", "Holly Hunter"],
    },
    {
        "item_id": 177765,
        "movie_title": "Coco (2017)",
        "poster_url": "https://m.media-amazon.com/images/M/MV5BMDIyM2E2NTAtMzlhNy00ZGUxLWI1NjgtZDY5MzhiMDc5NGU3XkEyXkFqcGc@._V1_QL75_UY562_CR7,0,380,562_.jpg",
        "description": "Aspiring musician Miguel enters the Land of the Dead to find his great‑great‑grandfather.",
        "release_date": "2017-11-22",
        "cast": ["Anthony Gonzalez", "Gael García Bernal"],
    },
]

client.insert_table_rows("pixar_movies", records)
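
For larger datasets you may prefer to send rows in several smaller calls rather than one large list. The sketch below shows a generic batching helper; chunked is a hypothetical name, and the batch size of 500 in the usage comment is an arbitrary choice, not a documented limit of insert_table_rows.

```python
from typing import Any, Dict, Iterator, List


def chunked(rows: List[Dict[str, Any]], size: int) -> Iterator[List[Dict[str, Any]]]:
    """Yield consecutive slices of `rows`, each holding at most `size` records."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


# Usage with the client and records from the example above (batch size is an assumption):
#   for batch in chunked(records, 500):
#       client.insert_table_rows("pixar_movies", batch)
```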

Create engines with vector embeddings

Engines define how data is indexed and scored. A common pattern is to:

  1. Point the engine at a table
  2. Configure embeddings for semantic / vector search
  3. Create the engine to start encoding

1. Define the engine

from shaped.autogen.models.engine_config_v2 import EngineConfigV2
from shaped.autogen.models.data_config import DataConfig

semantic_search_engine = EngineConfigV2(
    name="semantic_search",
    data=DataConfig(),
)

2. Connect to an item table

from shaped.autogen.models.data_config_interaction_table import DataConfigInteractionTable
from shaped.autogen.models.reference_table_config import ReferenceTableConfig

semantic_search_engine.data = DataConfig(
    item_table=DataConfigInteractionTable(
        ReferenceTableConfig(name="pixar_movies")
    )
)

3. Configure an embedding index

Choose the text fields to encode and the embedding model:

from shaped.autogen.models.index_config import IndexConfig
from shaped.autogen.models.embedding_config import EmbeddingConfig
from shaped.autogen.models.encoder import Encoder
from shaped.autogen.models.hugging_face_encoder import HuggingFaceEncoder

fields_to_encode = ["movie_title", "description"]
embedding_model = "sentence-transformers/all-MiniLM-L6-v2"

semantic_search_engine.index = IndexConfig(
    embeddings=[
        EmbeddingConfig(
            name="movie_text_embedding",
            encoder=Encoder(
                HuggingFaceEncoder(
                    model_name=embedding_model,
                    item_fields=fields_to_encode,
                )
            ),
        )
    ]
)
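
The fields passed as item_fields need to exist in the item table, so a quick local check against the table's column_schema can catch typos before the engine is created. The sketch below is illustrative only: missing_text_fields is a hypothetical helper, and the rule that only String columns are suitable for a text encoder is an assumption rather than documented SDK behavior.

```python
from typing import Dict, List


def missing_text_fields(column_schema: Dict[str, str], fields: List[str]) -> List[str]:
    """Return the fields that are absent from the schema or not typed as String."""
    return [field for field in fields if column_schema.get(field) != "String"]


# Schema copied from the pixar_movies table defined earlier.
pixar_schema = {
    "item_id": "Int64",
    "movie_title": "String",
    "poster_url": "String",
    "description": "String",
    "release_date": "String",
    "cast": "Array(String)",
}

print(missing_text_fields(pixar_schema, ["movie_title", "description"]))
print(missing_text_fields(pixar_schema, ["movie_title", "cast", "plot"]))
```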

4. Create the engine (start encoding)

client.create_engine(engine_config=semantic_search_engine)

Once the engine reaches the ACTIVE status, it is ready to serve queries.
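
If your application needs to block until the engine is ready, a simple polling loop is one option. The sketch below deliberately takes a get_status callable instead of calling the SDK, because this page does not name the method that returns an engine's status; you would wrap whichever SDK or API call you use for that. The helper name, timeout, and interval are all assumptions.

```python
import time
from typing import Callable


def wait_until_active(
    get_status: Callable[[], str],
    timeout_s: float = 600.0,
    interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> None:
    """Poll `get_status` until it returns "ACTIVE" or the timeout elapses."""
    deadline = clock() + timeout_s
    while True:
        status = get_status()
        if status == "ACTIVE":
            return
        if clock() >= deadline:
            raise TimeoutError(f"engine still {status!r} after {timeout_s} seconds")
        sleep(interval_s)


# Usage, where fetch_engine_status is a placeholder for your own status lookup:
#   wait_until_active(lambda: fetch_engine_status("semantic_search"))
```

The sleep and clock parameters exist only so the loop can be exercised in tests without real waiting.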


Load data from connectors (MongoDB example)

Instead of pushing records directly from your application, you can configure a MongoDB connector table and let Shaped pull data on a schedule.

The MongoDB connector can be configured via the Tables API, the CLI, or the SDK's create_table method shown below. Whichever you choose, you typically query the synced table via the SDK.

1. Define a MongoDB table

Use the create_table method with table options to supply your MongoDB credentials.

table_config = {
    "name": "mongodb_dataset",
    "schema_type": "MONGODB",
    "collection": "movies",
    "database": "movielens",
    "mongodb_connection_string": "mongodb://user:password@host:port/database",
    "start_date": "2024-01-01",
}

client.create_table(table_config)
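
Usernames and passwords that contain characters such as @ or / must be percent-encoded before they are placed in a MongoDB connection string. The sketch below assembles the string with the standard library's urllib.parse.quote_plus; build_mongo_uri and the environment variable names in the usage comment are hypothetical, chosen only for illustration.

```python
from urllib.parse import quote_plus


def build_mongo_uri(user: str, password: str, host: str, port: int, database: str) -> str:
    """Build a mongodb:// connection string with percent-encoded credentials."""
    return f"mongodb://{quote_plus(user)}:{quote_plus(password)}@{host}:{port}/{database}"


# Usage, keeping credentials out of source code (variable names are assumptions):
#   import os
#   table_config["mongodb_connection_string"] = build_mongo_uri(
#       os.environ["MONGO_USER"], os.environ["MONGO_PASSWORD"],
#       "host", 27017, "movielens",
#   )
print(build_mongo_uri("user", "p@ss/word", "host", 27017, "movielens"))
# prints: mongodb://user:p%40ss%2Fword@host:27017/movielens
```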

After the connector finishes syncing, the table mongodb_dataset is available to engines and queries, just like a custom table.

See the Connector Reference for a full list of external data sources you can sync with.

2. Use the connector table from Python

You typically reference the connector table when configuring an engine or writing ShapedQL. For example, a semantic search engine on MongoDB data:

from shaped.autogen.models.engine_config_v2 import EngineConfigV2
from shaped.autogen.models.data_config import DataConfig
from shaped.autogen.models.data_config_interaction_table import DataConfigInteractionTable
from shaped.autogen.models.reference_table_config import ReferenceTableConfig
from shaped.autogen.models.index_config import IndexConfig
from shaped.autogen.models.embedding_config import EmbeddingConfig
from shaped.autogen.models.encoder import Encoder
from shaped.autogen.models.hugging_face_encoder import HuggingFaceEncoder

mongo_engine = EngineConfigV2(
    name="mongodb_semantic_search",
    data=DataConfig(
        item_table=DataConfigInteractionTable(
            ReferenceTableConfig(name="mongodb_dataset")
        )
    ),
)

mongo_engine.index = IndexConfig(
    embeddings=[
        EmbeddingConfig(
            name="mongo_text_embedding",
            encoder=Encoder(
                HuggingFaceEncoder(
                    model_name="sentence-transformers/all-MiniLM-L6-v2",
                    item_fields=["document"],  # JSON document column
                )
            ),
        )
    ]
)

client.create_engine(engine_config=mongo_engine)

For more details on table options, see the MongoDB connector docs.


Query data with the fluent builder

The Python SDK exposes a fluent query builder that compiles to ShapedQL. This is the recommended way to build most ranking queries in application code.

1. Lexical search

from shaped import RankQueryBuilder, TextSearch

query = (
    RankQueryBuilder()
    .from_entity("item")
    .retrieve(
        TextSearch(
            input_text_query="$query",
            mode={"type": "lexical"},
            limit=50,
        )
    )
    .limit(20)
    .build()
)

results = client.execute_query(
    engine_name="text_search",
    query=query,
    parameters={"query": "Incredibles"},
    return_metadata=True,
)

2. Vector search

from shaped import RankQueryBuilder, TextSearch

query = (
    RankQueryBuilder()
    .from_entity("item")
    .retrieve(
        TextSearch(
            input_text_query="$query",
            mode={"type": "vector", "text_embedding_ref": "movie_text_embedding"},
            limit=50,
        )
    )
    .limit(20)
    .build()
)

results = client.execute_query(
    engine_name="semantic_search",
    query=query,
    parameters={"query": "animated superhero family"},
    return_metadata=True,
)

3. Hybrid search with multiple retrievers

from shaped import RankQueryBuilder, TextSearch

query = (
    RankQueryBuilder()
    .from_entity("item")
    .retrieve(
        [
            TextSearch(
                input_text_query="$query",
                mode={"type": "lexical"},
                limit=50,
                name="lexical_search",
            ),
            TextSearch(
                input_text_query="$query",
                mode={"type": "vector", "text_embedding_ref": "movie_text_embedding"},
                limit=50,
                name="vector_search",
            ),
        ]
    )
    .limit(20)
    .build()
)

results = client.execute_query(
    engine_name="hybrid_search",
    query=query,
    parameters={"query": "Pixar movies about family"},
    return_metadata=True,
)
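
The query examples on this page repeat the same two mode dictionaries. If you build many queries, small helpers can keep those literals in one place. The sketch below is plain Python and does not touch the SDK; lexical_mode and vector_mode are hypothetical names, and the dictionaries they return simply mirror the ones used in these examples. The text_embedding_ref value presumably has to match the name given to the EmbeddingConfig when the engine was created.

```python
from typing import Dict


def lexical_mode() -> Dict[str, str]:
    """Mode dictionary for keyword (lexical) retrieval, as used in the examples."""
    return {"type": "lexical"}


def vector_mode(text_embedding_ref: str) -> Dict[str, str]:
    """Mode dictionary for vector retrieval against a named embedding."""
    return {"type": "vector", "text_embedding_ref": text_embedding_ref}


# Usage inside a retriever from the examples above:
#   TextSearch(input_text_query="$query", mode=vector_mode("movie_text_embedding"), limit=50)
print(lexical_mode())
print(vector_mode("movie_text_embedding"))
```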

4. Personalized search with a value model

from shaped import RankQueryBuilder, TextSearch

query = (
    RankQueryBuilder()
    .from_entity("item")
    .retrieve(
        [
            TextSearch(
                input_text_query="$query",
                mode={"type": "lexical"},
                limit=50,
                name="lexical_search",
            ),
            TextSearch(
                input_text_query="$query",
                mode={"type": "vector", "text_embedding_ref": "movie_text_embedding"},
                limit=50,
                name="vector_search",
            ),
        ]
    )
    .score(
        value_model="click_through_rate",
        input_user_id="$user_id",
        input_interactions_item_ids="$interaction_item_ids",
    )
    .limit(20)
    .build()
)

results = client.execute_query(
    engine_name="personalized_search",
    query=query,
    parameters={
        "query": "Pixar",
        "user_id": "user1",
        "interaction_item_ids": ["187541", "177765", "1"],
    },
    return_metadata=True,
)
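
Each "$name" placeholder in a query is bound by the matching key in parameters, so a placeholder with no matching key is an easy mistake to make as queries grow. The sketch below checks for that on the client side. It works on plain dictionaries and lists of the arguments you pass to the builder, not on the built query object, whose type this page does not describe; find_placeholders and unbound_placeholders are hypothetical helpers.

```python
import re
from typing import Any, Dict, Set

# Matches strings that consist solely of a "$name" placeholder.
PLACEHOLDER = re.compile(r"^\$([A-Za-z_][A-Za-z0-9_]*)$")


def find_placeholders(value: Any) -> Set[str]:
    """Collect placeholder names from strings nested in dicts, lists, and tuples."""
    if isinstance(value, str):
        match = PLACEHOLDER.match(value)
        return {match.group(1)} if match else set()
    if isinstance(value, dict):
        value = list(value.values())
    if isinstance(value, (list, tuple)):
        found: Set[str] = set()
        for item in value:
            found |= find_placeholders(item)
        return found
    return set()


def unbound_placeholders(query_args: Any, parameters: Dict[str, Any]) -> Set[str]:
    """Return placeholder names that have no matching key in `parameters`."""
    return find_placeholders(query_args) - set(parameters)


score_args = {
    "value_model": "click_through_rate",
    "input_user_id": "$user_id",
    "input_interactions_item_ids": "$interaction_item_ids",
}
print(unbound_placeholders(score_args, {"query": "Pixar", "user_id": "user1"}))
```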

Next steps