Personalized search

This guide describes how to implement personalized search by combining hybrid search with a GBDT scoring model trained on user interactions.

First, instantiate the Shaped client:

from shaped import Client

client = Client(api_key="YOUR_KEY_HERE")

Upload data

Personalized search requires two tables: an item table that describes your catalog and an interaction table that tracks user behavior. The interaction table is what the personalization model is trained on.

The first step is to declare your item table:

item_table_config = {
    "schema_type": "CUSTOM",
    "name": "pixar_movies",
    "column_schema": {
        "item_id": "Int64",
        "movie_title": "String",
        "poster_url": "String",
        "description": "String",
        "release_date": "String",
        "cast": "Array(String)",
    },
}

client.create_table(item_table_config)

Upload your item data:

records = [
{"item_id": 187541, "movie_title": "Incredibles 2 (2018)", "poster_url": "https://m.media-amazon.com/images/M/MV5BMTEzNzY0OTg0NTdeQTJeQWpwZ15BbWU4MDU3OTg3MjUz._V1_QL75_UX380_CR0,0,380,562_.jpg", "description": "The Incredibles family takes on a new mission which involves a change in family roles: Bob Parr (Mr. Incredible) must manage the house while his wife Helen (Elastigirl) goes out to save the world.", "release_date": "2018-06-15", "cast": ["Craig T. Nelson", "Holly Hunter", "Sarah Vowell", "Huck Milner", "Catherine Keener", "Eli Fucile", "Bob Odenkirk", "Samuel L. Jackson", "Michael Bird", "Sophia Bush", "Brad Bird", "Brad Bird", "Nicole Paradis Grindle", "John Walker", "Michael Giacchino", "Stephen Schaffer", "Natalie Lyon", "Kevin Reher", "Ralph Eggleston"]},
{"item_id": 177765, "movie_title": "Coco (2017)", "poster_url": "https://m.media-amazon.com/images/M/MV5BMDIyM2E2NTAtMzlhNy00ZGUxLWI1NjgtZDY5MzhiMDc5NGU3XkEyXkFqcGc@._V1_QL75_UY562_CR7,0,380,562_.jpg", "description": "Aspiring musician Miguel, confronted with his family's ancestral ban on music, enters the Land of the Dead to find his great-great-grandfather, a legendary singer.", "release_date": "2017-11-22", "cast": ["Anthony Gonzalez", "Gael García Bernal", "Benjamin Bratt", "Alanna Ubach", "Renee Victor", "Jaime Camil", "Alfonso Arau", "Herbert Siguenza", "Gabriel Iglesias", "Lombardo Boyar", "Lee Unkrich", "Lee Unkrich", "Jason Katz", "Matthew Aldrich", "Adrian Molina", "Darla K. Anderson", "Michael Giacchino", "Steve Bloom", "Lee Unkrich", "Carla Hool", "Natalie Lyon", "Kevin Reher", "Harley Jessup"]},
]

client.insert_table_rows("pixar_movies", records)
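Before uploading rows, it can help to check them locally against the column_schema you declared. The sketch below is a hypothetical helper, not part of the Shaped SDK: it only understands the three type strings used in this guide ("Int64", "String", "Array(String)") and assumes every declared column is required, which may be stricter than the platform actually is.

```python
# Hypothetical local sanity check; not a Shaped SDK function.
# Assumes every column in the schema is required in every row.

def _matches(value, type_name):
    """Return True if value looks compatible with the declared column type."""
    if type_name == "Int64":
        # bool is a subclass of int in Python, so exclude it explicitly
        return isinstance(value, int) and not isinstance(value, bool)
    if type_name == "String":
        return isinstance(value, str)
    if type_name == "Array(String)":
        return isinstance(value, list) and all(isinstance(v, str) for v in value)
    raise ValueError(f"unsupported type in this sketch: {type_name}")


def find_schema_problems(rows, column_schema):
    """Return human-readable problems for rows that don't fit column_schema."""
    problems = []
    for i, row in enumerate(rows):
        for col in sorted(set(column_schema) - set(row)):
            problems.append(f"row {i}: missing column {col!r}")
        for col in sorted(set(row) - set(column_schema)):
            problems.append(f"row {i}: unexpected column {col!r}")
        for col, type_name in column_schema.items():
            if col in row and not _matches(row[col], type_name):
                problems.append(f"row {i}: {col!r} is not {type_name}")
    return problems


# Small example: the second row stores item_id as a string by mistake.
schema = {"item_id": "Int64", "movie_title": "String", "cast": "Array(String)"}
rows = [
    {"item_id": 187541, "movie_title": "Incredibles 2 (2018)", "cast": ["Holly Hunter"]},
    {"item_id": "177765", "movie_title": "Coco (2017)", "cast": ["Anthony Gonzalez"]},
]
problems = find_schema_problems(rows, schema)
print(problems)
```

With the real data you would call `find_schema_problems(records, item_table_config["column_schema"])` before `insert_table_rows`; the same helper works for the interaction table's schema.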

Now create an interaction table to track user behavior:

interaction_table_config = {
    "schema_type": "CUSTOM",
    "name": "user_interactions",
    "column_schema": {
        "user_id": "String",
        "item_id": "Int64",
        "event_type": "String",
        "timestamp": "String",
    },
}

client.create_table(interaction_table_config)

Upload sample interaction data:

interactions = [
    {"user_id": "user1", "item_id": 187541, "event_type": "click", "timestamp": "2024-01-15T10:00:00Z"},
    {"user_id": "user1", "item_id": 177765, "event_type": "click", "timestamp": "2024-01-16T14:30:00Z"},
    {"user_id": "user1", "item_id": 1, "event_type": "purchase", "timestamp": "2024-01-17T09:15:00Z"},
    {"user_id": "user2", "item_id": 134853, "event_type": "click", "timestamp": "2024-01-15T11:20:00Z"},
    {"user_id": "user2", "item_id": 170957, "event_type": "click", "timestamp": "2024-01-16T16:45:00Z"},
]

client.insert_table_rows("user_interactions", interactions)
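Interactions are only useful for training if their item_id values point at items in the item table. How the platform treats interactions with unknown items is not covered here, so a quick local check that surfaces them can save debugging later. `unknown_item_ids` is a hypothetical helper written for this guide.

```python
def unknown_item_ids(interactions, item_ids):
    """Return item IDs referenced by interactions but missing from item_ids."""
    known = set(item_ids)
    referenced = {row["item_id"] for row in interactions}
    return sorted(referenced - known)


# Example with a few of the sample rows and the two item records uploaded above.
sample_interactions = [
    {"user_id": "user1", "item_id": 187541},
    {"user_id": "user1", "item_id": 1},
    {"user_id": "user2", "item_id": 134853},
]
missing = unknown_item_ids(sample_interactions, [187541, 177765])
print(missing)
```

In your own code, pass `interactions` and `[r["item_id"] for r in records]` (or the full list of item IDs in your catalog).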

Set up your engine

Now you will configure the personalized search engine with hybrid search and GBDT training.
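Conceptually, the engine you are about to build works in two stages: retrievers (lexical and vector search) gather candidate items for a query, and a trained model scores those candidates for a specific user. The sketch below illustrates that retrieve-then-score idea with hypothetical stand-in functions; it is not how Shaped executes queries internally.

```python
# Conceptual retrieve-then-score sketch. The retrievers and the score
# function are hypothetical stand-ins, not Shaped internals.

def personalized_search(query, user_id, retrievers, score, limit):
    # Stage 1: union candidates from every retriever, keeping first-seen order
    candidates, seen = [], set()
    for retrieve in retrievers:
        for item_id in retrieve(query):
            if item_id not in seen:
                seen.add(item_id)
                candidates.append(item_id)
    # Stage 2: rank the merged candidates by a user-aware score
    ranked = sorted(candidates, key=lambda item_id: score(user_id, item_id), reverse=True)
    return ranked[:limit]


# Toy retrievers and toy model scores, made up for illustration.
lexical = lambda q: [187541, 9]
vector = lambda q: [187541, 177765]
toy_scores = {("user1", 187541): 0.9, ("user1", 177765): 0.7, ("user1", 9): 0.1}
score = lambda user_id, item_id: toy_scores.get((user_id, item_id), 0.0)

top = personalized_search("Incredibles", "user1", [lexical, vector], score, limit=2)
print(top)
```

Keeping retrieval and scoring separate means the retrievers can be cheap and broad (here each returns a short list; the query later in this guide uses limit=50 per retriever) while the scoring model only runs on the merged candidate set.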

Start by instantiating the engine configuration class:

from shaped.autogen.models.engine_config_v2 import EngineConfigV2
from shaped.autogen.models.data_config import DataConfig

personalized_search_engine = EngineConfigV2(
    name="personalized_search",
    data=DataConfig(),
)

Connect engine to data

Connect both the item table and interaction table to your engine:

from shaped.autogen.models.data_config_interaction_table import DataConfigInteractionTable
from shaped.autogen.models.reference_table_config import ReferenceTableConfig

personalized_search_engine.data = DataConfig(
    item_table=DataConfigInteractionTable(
        ReferenceTableConfig(name="pixar_movies")
    ),
    interaction_table=DataConfigInteractionTable(
        ReferenceTableConfig(name="user_interactions")
    ),
)

Configure both lexical and vector search as in the hybrid search example:

from shaped.autogen.models.index_config import IndexConfig
from shaped.autogen.models.search_config import SearchConfig
from shaped.autogen.models.embedding_config import EmbeddingConfig
from shaped.autogen.models.encoder import Encoder
from shaped.autogen.models.hugging_face_encoder import HuggingFaceEncoder

embedding_model = "sentence-transformers/all-MiniLM-L6-v2"

personalized_search_engine.index = IndexConfig(
    lexical_search=SearchConfig(
        item_fields=["movie_title", "description"],
        fuzziness_edit_distance=0,
    ),
    embeddings=[
        EmbeddingConfig(
            name="movie_text_embedding",
            encoder=Encoder(
                HuggingFaceEncoder(
                    model_name=embedding_model,
                    item_fields=["movie_title", "description"],
                )
            ),
        )
    ],
)
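The vector half of hybrid search compares the embedding of the query text with the embeddings of each item's movie_title and description. Vector search commonly ranks by a similarity such as cosine similarity; which metric Shaped uses is an assumption here, and the 3-dimensional vectors below are made up (all-MiniLM-L6-v2 actually produces 384-dimensional vectors).

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy embeddings standing in for real model outputs.
item_vectors = {
    187541: [0.9, 0.1, 0.0],
    177765: [0.1, 0.8, 0.3],
}
query_vector = [1.0, 0.0, 0.0]

# Rank items by similarity to the query, most similar first.
ranking = sorted(
    item_vectors,
    key=lambda item_id: cosine_similarity(query_vector, item_vectors[item_id]),
    reverse=True,
)
print(ranking)  # [187541, 177765]
```

Lexical search, by contrast, matches the query's terms against the same fields, so combining both retrievers catches exact title matches as well as semantically related descriptions.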

Configure GBDT training

Configure a GBDT model to learn from user interactions and personalize search results:

from shaped.autogen.models.training_config import TrainingConfig
from shaped.autogen.models.models_inner import ModelsInner
from shaped.autogen.models.shaped_internal_recsys_policies_gbdt_gbdt_policy_config import ShapedInternalRecsysPoliciesGbdtGBDTPolicyConfig

personalized_search_engine.training = TrainingConfig(
    models=[
        ModelsInner(
            ShapedInternalRecsysPoliciesGbdtGBDTPolicyConfig(
                policy_type="gbdt",
                name="click_through_rate",
            )
        )
    ],
)
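To give a feel for what a GBDT (gradient-boosted decision trees) model does, here is a minimal from-scratch sketch: each round fits a one-split tree (a "stump") to the residuals of the current prediction and adds a scaled-down copy of it. It uses squared-error loss for simplicity, whereas click-through models often use a logistic loss; the features and labels are invented, and this is not Shaped's implementation or its feature set.

```python
# Minimal gradient boosting with decision stumps and squared-error loss.
# Illustrative only; features and labels below are made up.

def fit_stump(xs, residuals):
    """Find the single (feature, threshold) split that best fits the residuals."""
    best = None
    for f in range(len(xs[0])):
        for threshold in sorted({x[f] for x in xs}):
            left = [r for x, r in zip(xs, residuals) if x[f] <= threshold]
            right = [r for x, r in zip(xs, residuals) if x[f] > threshold]
            if not left or not right:
                continue
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, f, threshold, lm, rm)
    if best is None:
        # No valid split: fall back to predicting the mean residual
        mean = sum(residuals) / len(residuals)
        return lambda x: mean
    _, f, threshold, lm, rm = best
    return lambda x: lm if x[f] <= threshold else rm


def fit_gbdt(xs, ys, n_rounds=20, learning_rate=0.3):
    base = sum(ys) / len(ys)  # start from the mean label
    stumps, preds = [], [base] * len(ys)
    for _ in range(n_rounds):
        # For squared error, the negative gradient is simply the residual
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump(x) for p, x in zip(preds, xs)]
    return lambda x: base + learning_rate * sum(s(x) for s in stumps)


# Hypothetical features: [past clicks on similar movies, text match score]
xs = [[0, 0.2], [1, 0.9], [3, 0.8], [0, 0.1], [2, 0.4], [4, 0.95]]
ys = [0, 1, 1, 0, 0, 1]  # 1 = clicked
model = fit_gbdt(xs, ys)
print(round(model([4, 0.95]), 3), round(model([0, 0.1]), 3))
```

The key property is that each tree corrects the errors left by the previous ones, which is why boosted trees handle mixed, non-linear signals (user history, text relevance, item attributes) well.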

Start indexing and training

After configuring your engine's data, index, and training, call create_engine to start both indexing and model training:

client.create_engine(engine_config=personalized_search_engine)
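Indexing and training run asynchronously, so scripts often need to wait before querying. This guide does not show an engine-status call, so the sketch below is a generic polling helper that takes any `check` function you supply. The status strings in the example and the `get_engine_status` name in the usage note are hypothetical placeholders; use whatever status call your SDK version provides.

```python
import time


def wait_until(check, timeout_s=1800.0, interval_s=30.0,
               sleep=time.sleep, clock=time.monotonic):
    """Call check() until it returns True; return False if timeout_s elapses."""
    deadline = clock() + timeout_s
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


# Example with a fake status sequence (hypothetical status values).
statuses = iter(["SCHEDULING", "TRAINING", "ACTIVE"])
ready = wait_until(lambda: next(statuses) == "ACTIVE", timeout_s=10, interval_s=0)
print(ready)  # True
```

For example: `wait_until(lambda: get_engine_status("personalized_search") == "ACTIVE")`, where `get_engine_status` is a function you write around your SDK's status call.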

Make a personalized search query

Once the engine has finished indexing and training, you can run personalized search queries.

Combine the lexical and vector TextSearch retrievers with a score stage that ranks the retrieved candidates using the trained GBDT model (click_through_rate):

from shaped import RankQueryBuilder, TextSearch

query = (
    RankQueryBuilder()
    .from_entity('item')
    .retrieve([
        TextSearch(
            input_text_query='$query',
            mode={'type': 'lexical'},
            limit=50,
            name='lexical_search'
        ),
        TextSearch(
            input_text_query='$query',
            mode={'type': 'vector', 'text_embedding_ref': 'movie_text_embedding'},
            limit=50,
            name='vector_search'
        )
    ])
    .score(
        value_model='click_through_rate',
        input_user_id='$user_id',
        input_interactions_item_ids='$interaction_item_ids'
    )
    .limit(20)
    .build()
)

results = client.execute_query(
    engine_name="personalized_search",
    query=query,
    parameters={
        "query": "Incredibles",
        "user_id": "user1",
        "interaction_item_ids": ["187541", "177765", "1"]
    },
    return_metadata=True,
)
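In the example above, interaction_item_ids is hard-coded to user1's history. In an application you would build it from the user's recent interactions. The sketch below derives it from rows shaped like the sample interaction data, oldest first and as strings to match the example; whether ordering matters to the model, and the `max_items` cap, are assumptions of this sketch, and `recent_item_ids` is a hypothetical helper.

```python
def recent_item_ids(interactions, user_id, max_items=50):
    """Return a user's interacted item IDs as strings, oldest first."""
    user_rows = [r for r in interactions if r["user_id"] == user_id]
    # ISO-8601 UTC timestamps in one consistent format sort correctly as strings
    user_rows.sort(key=lambda r: r["timestamp"])
    return [str(r["item_id"]) for r in user_rows][-max_items:]


# The sample interaction rows from earlier in this guide.
interactions = [
    {"user_id": "user1", "item_id": 187541, "event_type": "click", "timestamp": "2024-01-15T10:00:00Z"},
    {"user_id": "user1", "item_id": 177765, "event_type": "click", "timestamp": "2024-01-16T14:30:00Z"},
    {"user_id": "user1", "item_id": 1, "event_type": "purchase", "timestamp": "2024-01-17T09:15:00Z"},
    {"user_id": "user2", "item_id": 134853, "event_type": "click", "timestamp": "2024-01-15T11:20:00Z"},
    {"user_id": "user2", "item_id": 170957, "event_type": "click", "timestamp": "2024-01-16T16:45:00Z"},
]

print(recent_item_ids(interactions, "user1"))  # ['187541', '177765', '1']
```

The result can be passed directly as the "interaction_item_ids" entry in the parameters of execute_query.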