Hybrid search

This guide shows how to implement hybrid search, which combines BM25 lexical search with semantic vector search to improve both recall and precision.

First, instantiate the Shaped client:

from shaped import Client

client = Client(api_key="YOUR_KEY_HERE")

Upload data

To run hybrid search, you first need a data table to search over. In this example, you'll declare a table and upload some rows of data manually.

Alternatively, you can sync data from an external data source using a connector, or simply import a JSONL or CSV file.
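If you take the file-import route, JSONL means one JSON object per line. The following is a minimal sketch that uses only the Python standard library. The helper names `write_jsonl` and `read_jsonl` and the file name `pixar_movies.jsonl` are illustrative and are not part of the Shaped SDK.

```python
import json
import tempfile
from pathlib import Path


def write_jsonl(rows, path):
    """Write an iterable of dicts to `path`, one JSON object per line."""
    path = Path(path)
    with path.open("w", encoding="utf-8") as f:
        for row in rows:
            # ensure_ascii=False keeps names such as "Gael García Bernal" readable.
            f.write(json.dumps(row, ensure_ascii=False) + "\n")
    return path


def read_jsonl(path):
    """Read a JSONL file back into a list of dicts (handy for a quick check)."""
    with Path(path).open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Two abbreviated rows, written to a temporary directory for illustration.
sample_rows = [
    {"item_id": 1, "movie_title": "Toy Story (1995)", "cast": ["Tom Hanks", "Tim Allen"]},
    {"item_id": 177765, "movie_title": "Coco (2017)", "cast": ["Gael García Bernal"]},
]
out_file = write_jsonl(sample_rows, Path(tempfile.mkdtemp()) / "pixar_movies.jsonl")
```

Reading the file back with `read_jsonl(out_file)` should return the same rows, which is a cheap sanity check before importing.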

The first step is to declare your data table. You'll use a CUSTOM schema so that you can define the table's input columns yourself and write records manually.

table_config = {
    "schema_type": "CUSTOM",
    "name": "pixar_movies",
    "column_schema": {
        "item_id": "Int64",
        "movie_title": "String",
        "poster_url": "String",
        "description": "String",
        "release_date": "String",
        "cast": "Array(String)",
    },
}

client.create_table(table_config)
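Because a CUSTOM schema leaves the column types up to you, it can help to check rows locally before uploading them. The sketch below is a plain-Python check, not a Shaped SDK feature: `matches_type` and `validate_record` are hypothetical helpers, and the mapping of `Int64`, `String`, and `Array(String)` to Python types is an assumption about how those column types behave.

```python
# Mirrors the column_schema declared above.
COLUMN_SCHEMA = {
    "item_id": "Int64",
    "movie_title": "String",
    "poster_url": "String",
    "description": "String",
    "release_date": "String",
    "cast": "Array(String)",
}


def matches_type(value, column_type):
    """Return True if `value` looks compatible with the declared column type."""
    if column_type == "Int64":
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, int) and not isinstance(value, bool)
    if column_type == "String":
        return isinstance(value, str)
    if column_type == "Array(String)":
        return isinstance(value, list) and all(isinstance(v, str) for v in value)
    return False  # Unknown type names are treated as mismatches.


def validate_record(record, schema=COLUMN_SCHEMA):
    """Return a list of problems; an empty list means the row passes."""
    problems = []
    for column, column_type in schema.items():
        if column not in record:
            problems.append(f"missing column: {column}")
        elif not matches_type(record[column], column_type):
            problems.append(f"{column}: expected {column_type}")
    for column in record:
        if column not in schema:
            problems.append(f"unexpected column: {column}")
    return problems
```

Running `validate_record` over each row and uploading only when every result is empty catches mistakes such as a string `item_id` before they reach the table.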

Once the table schema is created, you can upload your data directly. Declare the rows as a list of dictionaries, one per movie, with keys that match the column names in the schema:

records = [
{"item_id": 187541, "movie_title": "Incredibles 2 (2018)", "poster_url": "https://m.media-amazon.com/images/M/MV5BMTEzNzY0OTg0NTdeQTJeQWpwZ15BbWU4MDU3OTg3MjUz._V1_QL75_UX380_CR0,0,380,562_.jpg", "description": "The Incredibles family takes on a new mission which involves a change in family roles: Bob Parr (Mr. Incredible) must manage the house while his wife Helen (Elastigirl) goes out to save the world.", "release_date": "2018-06-15", "cast": ["Craig T. Nelson", "Holly Hunter", "Sarah Vowell", "Huck Milner", "Catherine Keener", "Eli Fucile", "Bob Odenkirk", "Samuel L. Jackson", "Michael Bird", "Sophia Bush", "Brad Bird", "Brad Bird", "Nicole Paradis Grindle", "John Walker", "Michael Giacchino", "Stephen Schaffer", "Natalie Lyon", "Kevin Reher", "Ralph Eggleston"]},
{"item_id": 177765, "movie_title": "Coco (2017)", "poster_url": "https://m.media-amazon.com/images/M/MV5BMDIyM2E2NTAtMzlhNy00ZGUxLWI1NjgtZDY5MzhiMDc5NGU3XkEyXkFqcGc@._V1_QL75_UY562_CR7,0,380,562_.jpg", "description": "Aspiring musician Miguel, confronted with his family's ancestral ban on music, enters the Land of the Dead to find his great-great-grandfather, a legendary singer.", "release_date": "2017-11-22", "cast": ["Anthony Gonzalez", "Gael García Bernal", "Benjamin Bratt", "Alanna Ubach", "Renee Victor", "Jaime Camil", "Alfonso Arau", "Herbert Siguenza", "Gabriel Iglesias", "Lombardo Boyar", "Lee Unkrich", "Lee Unkrich", "Jason Katz", "Matthew Aldrich", "Adrian Molina", "Darla K. Anderson", "Michael Giacchino", "Steve Bloom", "Lee Unkrich", "Carla Hool", "Natalie Lyon", "Kevin Reher", "Harley Jessup"]},
{"item_id": 170957, "movie_title": "Cars 3 (2017)", "poster_url": "https://m.media-amazon.com/images/M/MV5BMTc0NzU2OTYyN15BMl5BanBnXkFtZTgwMTkwOTg2MTI@._V1_QL75_UX380_CR0,0,380,562_.jpg", "description": "Lightning McQueen sets out to prove to a new generation of racers that he's still the best race car in the world.", "release_date": "2017-06-16", "cast": ["Owen Wilson", "Cristela Alonzo", "Chris Cooper", "Nathan Fillion", "Larry the Cable Guy", "Armie Hammer", "Ray Magliozzi", "Tony Shalhoub", "Bonnie Hunt", "Lea DeLaria", "Brian Fee", "Brian Fee", "Ben Queen", "Eyal Podell", "Jonathon E. Stewart", "Kevin Reher", "Randy Newman", "Jason Hudak", "Natalie Lyon", "Kevin Reher", "William Cone", "Jay Shuster"]},
{"item_id": 157296, "movie_title": "Finding Dory (2016)", "poster_url": "https://m.media-amazon.com/images/M/MV5BY2VlYWJjMGMtYjcwZC00MDE2LThmMDItYjVlMzNhYzBhYTk5XkEyXkFqcGc@._V1_QL75_UY562_CR7,0,380,562_.jpg", "description": "Friendly but forgetful blue tang Dory begins a search for her long-lost parents and everyone learns a few things about the real meaning of family along the way.", "release_date": "2016-06-17", "cast": ["Ellen DeGeneres", "Albert Brooks", "Ed O'Neill", "Kaitlin Olson", "Hayden Rolence", "Ty Burrell", "Diane Keaton", "Eugene Levy", "Sloane Murray", "Idris Elba", "Andrew Stanton", "Andrew Stanton", "Victoria Strouse", "Lindsey Collins", "Thomas Newman", "Axel Geddes", "Natalie Lyon", "Kevin Reher", "Steve Pilcher"]},
{"item_id": 136016, "movie_title": "The Good Dinosaur (2015)", "poster_url": "https://m.media-amazon.com/images/M/MV5BMTc5MTg2NjQ4MV5BMl5BanBnXkFtZTgwNzcxOTY5NjE@._V1_QL75_UX380_CR0,0,380,562_.jpg", "description": "In a world where dinosaurs and humans live side-by-side, an Apatosaurus named Arlo makes an unlikely human friend.", "release_date": "2015-11-25", "cast": ["Jeffrey Wright", "Frances McDormand", "Maleah Nipay-Padilla", "Ryan Teeple", "Jack McGraw", "Marcus Scribner", "Raymond Ochoa", "Jack Bright", "Peter Sohn", "Steve Zahn", "Peter Sohn", "Bob Peterson", "Peter Sohn", "Erik Benson", "Meg LeFauve", "Denise Ream", "Jeff Danna", "Mychael Danna", "Stephen Schaffer", "Natalie Lyon", "Kevin Reher", "Harley Jessup", "Daniel Lopez Muñoz"]},
{"item_id": 134853, "movie_title": "Inside Out (2015)", "poster_url": "https://m.media-amazon.com/images/M/MV5BOTgxMDQwMDk0OF5BMl5BanBnXkFtZTgwNjU5OTg2NDE@._V1_QL75_UX380_CR0,0,380,562_.jpg", "description": "After young Riley is uprooted from her Midwest life and moved to San Francisco, her emotions, Joy, Fear, Anger, Disgust, and Sadness, conflict on how best to navigate a new city, house, and school.", "release_date": "2015-06-19", "cast": ["Amy Poehler", "Bill Hader", "Lewis Black", "Mindy Kaling", "Phyllis Smith", "Richard Kind", "Kaitlyn Dias", "Diane Lane", "Kyle MacLachlan", "Paula Poundstone", "Pete Docter", "Pete Docter", "Ronnie Del Carmen", "Meg LeFauve", "Josh Cooley", "Jonas Rivera", "Michael Giacchino", "Kevin Nolting", "Natalie Lyon", "Kevin Reher", "Ralph Eggleston"]},
{"item_id": 103141, "movie_title": "Monsters University (2013)", "poster_url": "https://m.media-amazon.com/images/M/MV5BMTUyODgwMDU3M15BMl5BanBnXkFtZTcwOTM4MjcxOQ@@._V1_QL75_UX380_CR0,0,380,562_.jpg", "description": "A look at the relationship between Mike Wazowski and James P. \"Sully\" Sullivan during their days at Monsters University, when they weren't necessarily the best of friends.", "release_date": "2013-06-21", "cast": ["Billy Crystal", "John Goodman", "Steve Buscemi", "Helen Mirren", "Peter Sohn", "Joel Murray", "Sean Hayes", "Dave Foley", "Charlie Day", "Alfred Molina", "Dan Scanlon", "Dan Scanlon", "Daniel Gerson", "Robert L. Baird", "Kori Rae", "Randy Newman", "Jean-Claude Kalache", "Greg Snyder", "Natalie Lyon", "Kevin Reher", "Ricky Nierva"]},
{"item_id": 95167, "movie_title": "Brave (2012)", "poster_url": "https://m.media-amazon.com/images/M/MV5BMzgwODk3ODA1NF5BMl5BanBnXkFtZTcwNjU3NjQ0Nw@@._V1_QL75_UX380_CR0,0,380,562_.jpg", "description": "Determined to make her own path in life, Princess Merida defies a custom that brings chaos to her kingdom. Granted one wish, Merida must rely on her bravery and her archery skills to undo a beastly curse.", "release_date": "2012-06-22", "cast": ["Kelly Macdonald", "Billy Connolly", "Emma Thompson", "Julie Walters", "Robbie Coltrane", "Kevin McKidd", "Craig Ferguson", "Sally Kinghorn", "Eilidh Fraser", "Peigi Barker", "Mark Andrews", "Brenda Chapman", "Brenda Chapman", "Mark Andrews", "Steve Purcell", "Irene Mecchi", "Katherine Sarafian", "Patrick Doyle", "Nicholas C. Smith", "Natalie Lyon", "Kevin Reher", "Steve Pilcher"]},
{"item_id": 87876, "movie_title": "Cars 2 (2011)", "poster_url": "https://m.media-amazon.com/images/M/MV5BMTUzNTc3MTU3M15BMl5BanBnXkFtZTcwMzIxNTc3NA@@._V1_QL75_UX380_CR0,0,380,562_.jpg", "description": "Star race car Lightning McQueen and his pal Mater head overseas to compete in the World Grand Prix race. But the road to the championship becomes rocky as Mater gets caught up in an intriguing adventure of his own: international espionage.", "release_date": "2011-06-24", "cast": ["Owen Wilson", "Larry the Cable Guy", "Michael Caine", "Emily Mortimer", "Eddie Izzard", "John Turturro", "Brent Musburger", "Joe Mantegna", "Thomas Kretschmann", "Peter Jacobson", "John Lasseter", "John Lasseter", "Bradford Lewis", "Dan Fogelman", "Ben Queen", "Denise Ream", "Michael Giacchino", "Stephen Schaffer", "Natalie Lyon", "Kevin Reher", "Harley Jessup"]},
{"item_id": 68954, "movie_title": "Up (2009)", "poster_url": "https://m.media-amazon.com/images/M/MV5BNmI1ZTc5MWMtMDYyOS00ZDc2LTkzOTAtNjQ4NWIxNjYyNDgzXkEyXkFqcGc@._V1_QL75_UX380_CR0,1,380,562_.jpg", "description": "78-year-old Carl Fredricksen travels to South America in his house equipped with balloons, inadvertently taking a young stowaway.", "release_date": "2009-05-29", "cast": ["Edward Asner", "Jordan Nagai", "John Ratzenberger", "Christopher Plummer", "Bob Peterson", "Delroy Lindo", "Jerome Ranft", "David Kaye", "Elie Docter", "Jeremy Leary", "Pete Docter", "Pete Docter", "Bob Peterson", "Tom McCarthy", "Jonas Rivera", "Michael Giacchino", "Kevin Nolting", "Natalie Lyon", "Kevin Reher", "Ricky Nierva"]},
{"item_id": 50872, "movie_title": "Ratatouille (2007)", "poster_url": "https://m.media-amazon.com/images/M/MV5BMTMzODU0NTkxMF5BMl5BanBnXkFtZTcwMjQ4MzMzMw@@._V1_QL75_UX380_CR0,0,380,562_.jpg", "description": "A rat who can cook makes an unusual alliance with a young kitchen worker at a famous Paris restaurant.", "release_date": "2007-06-29", "cast": ["Brad Garrett", "Lou Romano", "Patton Oswalt", "Ian Holm", "Brian Dennehy", "Peter Sohn", "Peter O'Toole", "Janeane Garofalo", "Will Arnett", "Julius Callahan", "Brad Bird", "Brad Bird", "Jan Pinkava", "Jim Capobianco", "Emily Cook", "Bradford Lewis", "Michael Giacchino", "Darren T. Holmes", "Natalie Lyon", "Kevin Reher", "Harley Jessup"]},
{"item_id": 45517, "movie_title": "Cars (2006)", "poster_url": "https://m.media-amazon.com/images/M/MV5BMTg5NzY0MzA2MV5BMl5BanBnXkFtZTYwNDc3NTc2._V1_QL75_UX380_CR0,0,380,562_.jpg", "description": "On the way to the biggest race of his life, a hotshot rookie race car gets stranded in a rundown town and learns that winning isn't everything in life.", "release_date": "2006-06-09", "cast": ["Owen Wilson", "Bonnie Hunt", "Paul Newman", "Larry the Cable Guy", "Cheech Marin", "Tony Shalhoub", "Guido Quaroni", "Jenifer Lewis", "Paul Dooley", "Michael Wallis", "John Lasseter", "John Lasseter", "Joe Ranft", "Jorgen Klubien", "Dan Fogelman", "Darla K. Anderson", "Randy Newman", "Jean-Claude Kalache", "Ken Schretzmann", "Kevin Reher", "William Cone", "Bob Pauley"]},
{"item_id": 8961, "movie_title": "The Incredibles (2004)", "poster_url": "https://m.media-amazon.com/images/M/MV5BMTY5OTU0OTc2NV5BMl5BanBnXkFtZTcwMzU4MDcyMQ@@._V1_QL75_UX380_CR0,0,380,562_.jpg", "description": "While trying to lead a quiet suburban life, a family of undercover superheroes are forced into action to save the world.", "release_date": "2004-11-05", "cast": ["Craig T. Nelson", "Samuel L. Jackson", "Holly Hunter", "Jason Lee", "Dominique Louis", "Theodore Newton", "Jean Sincere", "Eli Fucile", "Maeve Andrews", "Wallace Shawn", "Brad Bird", "Brad Bird", "John Walker", "Michael Giacchino", "Andrew Jimenez", "Patrick Lin", "Janet Lucroy", "Stephen Schaffer", "Matthew Jon Beck", "Mary Hidalgo", "Kevin Reher", "Jen Rudin", "Lou Romano"]},
{"item_id": 6377, "movie_title": "Finding Nemo (2003)", "poster_url": "https://m.media-amazon.com/images/M/MV5BMTc5NjExNTA5OV5BMl5BanBnXkFtZTYwMTQ0ODY2._V1_QL75_UX380_CR0,0,380,562_.jpg", "description": "After his son is captured in the Great Barrier Reef and taken to Sydney, a timid clownfish sets out on a journey to bring him home.", "release_date": "2003-05-30", "cast": ["Albert Brooks", "Ellen DeGeneres", "Alexander Gould", "Willem Dafoe", "Brad Garrett", "Allison Janney", "Austin Pendleton", "Stephen Root", "Vicki Lewis", "Joe Ranft", "Andrew Stanton", "Andrew Stanton", "Bob Peterson", "David Reynolds", "Graham Walters", "Thomas Newman", "Sharon Calahan", "Jeremy Lasky", "David Ian Salter", "Matthew Jon Beck", "Carl Davies", "Mary Hidalgo", "Kevin Reher", "Ralph Eggleston"]},
{"item_id": 4886, "movie_title": "Monsters, Inc. (2001)", "poster_url": "https://m.media-amazon.com/images/M/MV5BMTY1NTI0ODUyOF5BMl5BanBnXkFtZTgwNTEyNjQ0MDE@._V1_QL75_UX380_CR0,0,380,562_.jpg", "description": "In order to power the city, monsters have to scare children so that they scream. However, the children are toxic to the monsters, and after a child gets through, two monsters realize things may not be what they think.", "release_date": "2001-11-02", "cast": ["Billy Crystal", "John Goodman", "Mary Gibbs", "Steve Buscemi", "James Coburn", "Jennifer Tilly", "Bob Peterson", "John Ratzenberger", "Frank Oz", "Daniel Gerson", "Pete Docter", "Pete Docter", "Jill Culton", "Jeff Pidgeon", "Ralph Eggleston", "Darla K. Anderson", "Randy Newman", "Jim Stewart", "Matthew Jon Beck", "Mary Hidalgo", "Ruth Lambert", "Harley Jessup", "Bob Pauley"]},
{"item_id": 3114, "movie_title": "Toy Story 2 (1999)", "poster_url": "https://m.media-amazon.com/images/M/MV5BNzVmODlhMDEtY2YxZi00OTVjLTlkNTktN2Q2OTRlM2I4M2FhXkEyXkFqcGc@._V1_QL75_UY562_CR1,0,380,562_.jpg", "description": "When Woody is stolen by a toy collector, Buzz and his friends set out on a rescue mission to save Woody before he becomes a museum toy property with his roundup gang Jessie, Prospector, and Bullseye.", "release_date": "1999-11-24", "cast": ["Tom Hanks", "Tim Allen", "Joan Cusack", "Kelsey Grammer", "Don Rickles", "Jim Varney", "Wallace Shawn", "John Ratzenberger", "Annie Potts", "Wayne Knight", "John Lasseter", "John Lasseter", "Pete Docter", "Ash Brannon", "Andrew Stanton", "Karen Robert Jackson", "Helene Plotkin", "Randy Newman", "Sharon Calahan", "Edie Ichioka", "David Ian Salter", "Lee Unkrich", "Mary Hidalgo", "Ruth Lambert", "William Cone", "Jim Pearson"]},
{"item_id": 1, "movie_title": "Toy Story (1995)", "poster_url": "https://m.media-amazon.com/images/M/MV5BZTA3OWVjOWItNjE1NS00NzZiLWE1MjgtZDZhMWI1ZTlkNzYwXkEyXkFqcGc@._V1_QL75_UX380_CR0,2,380,562_.jpg", "description": "A cowboy doll is profoundly jealous when a new spaceman action figure supplants him as the top toy in a boy's bedroom. When circumstances separate them from their owner, the duo have to put aside their differences to return to him.", "release_date": "1995-11-22", "cast": ["Tom Hanks", "Tim Allen", "Don Rickles", "Jim Varney", "Wallace Shawn", "John Ratzenberger", "Annie Potts", "John Morris", "Erik von Detten", "Laurie Metcalf", "John Lasseter", "John Lasseter", "Pete Docter", "Andrew Stanton", "Joe Ranft", "Bonnie Arnold", "Ralph Guggenheim", "Randy Newman", "Robert Gordon", "Lee Unkrich"]},
]

Use the insert rows method to add your data to the table you created:

client.insert_table_rows("pixar_movies", records)

Set up your engine

Now you will configure the hybrid search engine. The engine will index text columns using both BM25 lexical search and semantic embeddings for combined retrieval.

Start by instantiating the engine configuration class:

from shaped.autogen.models.engine_config_v2 import EngineConfigV2
from shaped.autogen.models.data_config import DataConfig

hybrid_search_engine = EngineConfigV2(
    name="hybrid_search",
    data=DataConfig(),
)

Connect engine to data

Engines must connect to an item table, which defines the candidate items in your catalog. Point the engine's data config at the table you just created (pixar_movies) using a reference table config.

from shaped.autogen.models.data_config_interaction_table import DataConfigInteractionTable
from shaped.autogen.models.reference_table_config import ReferenceTableConfig

hybrid_search_engine.data = DataConfig(
    item_table=DataConfigInteractionTable(
        ReferenceTableConfig(
            name="pixar_movies"
        )
    )
)

To enable hybrid search, you need to configure both lexical search for BM25 keyword matching and embeddings for semantic search. You'll index the same text fields with both methods.

Configure lexical search first:

from shaped.autogen.models.index_config import IndexConfig
from shaped.autogen.models.search_config import SearchConfig

hybrid_search_engine.index = IndexConfig(
    lexical_search=SearchConfig(
        item_fields=["movie_title", "description"],
        fuzziness_edit_distance=0,
    )
)
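To build some intuition for what the lexical side does, here is a minimal sketch of the classic Okapi BM25 scoring formula in plain Python. It is an illustration only and not the engine's implementation: the whitespace tokenizer is deliberately naive, and `k1=1.5` and `b=0.75` are common textbook defaults rather than values taken from Shaped.

```python
import math
from collections import Counter


def tokenize(text):
    """Naive tokenizer: lowercase and split on whitespace."""
    return text.lower().split()


def bm25_scores(query, documents, k1=1.5, b=0.75):
    """Return one BM25 score per document for the given query."""
    docs = [tokenize(d) for d in documents]
    n_docs = len(docs)
    if n_docs == 0:
        return []
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: in how many documents each term appears.
    df = Counter(term for d in docs for term in set(d))

    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            # Rare terms get a higher inverse document frequency.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates with k1 and is normalized by length via b.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


titles = ["Incredibles 2 (2018)", "Coco (2017)", "The Incredibles (2004)"]
scores = bm25_scores("incredibles", titles)
```

Only the two titles containing the query term receive a non-zero score, which is the behavior that makes lexical search precise for exact keywords and blind to paraphrases.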

Now configure the embedding for semantic search:

from shaped.autogen.models.embedding_config import EmbeddingConfig
from shaped.autogen.models.encoder import Encoder
from shaped.autogen.models.hugging_face_encoder import HuggingFaceEncoder

embedding_model = "sentence-transformers/all-MiniLM-L6-v2"

hybrid_search_engine.index.embeddings = [
    EmbeddingConfig(
        name="movie_text_embedding",
        encoder=Encoder(
            HuggingFaceEncoder(
                model_name=embedding_model,
                item_fields=["movie_title", "description"],
            )
        ),
    )
]
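Semantic search compares the query's embedding with each item's embedding, typically by cosine similarity. The sketch below shows that comparison in plain Python. The three-dimensional vectors are made up for illustration (all-MiniLM-L6-v2 produces 384-dimensional vectors), and `rank_by_similarity` is a hypothetical helper, not an SDK call.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # Define similarity with a zero vector as 0.
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, item_vecs):
    """Return (item_id, similarity) pairs, most similar first."""
    scored = [
        (item_id, cosine_similarity(query_vec, vec))
        for item_id, vec in item_vecs.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy vectors standing in for real embeddings.
toy_item_vecs = {
    "superhero_movie": [0.9, 0.1, 0.0],
    "racing_movie": [0.1, 0.9, 0.0],
    "ocean_movie": [0.0, 0.1, 0.9],
}
toy_query_vec = [0.8, 0.2, 0.0]  # Pretend embedding of a superhero-themed query.
ranking = rank_by_similarity(toy_query_vec, toy_item_vecs)
```

Unlike BM25, this ranking does not require the query and the item to share any words; it only requires that their vectors point in a similar direction.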

Start indexing process

After configuring your engine's data and index, use the create engine method to start the indexing process:

client.create_engine(engine_config=hybrid_search_engine)
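Indexing takes some time, so a script usually has to wait before it can query the engine. This guide does not show how to check an engine's status, so the sketch below stays generic: `wait_until` is a hypothetical helper that polls any zero-argument callable you supply (`check_ready`) with exponential backoff. How that callable determines readiness is left to you and is not assumed here.

```python
import time


def wait_until(check_ready, timeout_s=600.0, initial_delay_s=1.0,
               max_delay_s=30.0, sleep=time.sleep):
    """Poll check_ready() until it returns True or the time budget is spent.

    The budget is measured as accumulated sleep time, not wall-clock time,
    so time spent inside check_ready() itself is not counted.
    """
    waited = 0.0
    delay = initial_delay_s
    while True:
        if check_ready():
            return True
        if waited >= timeout_s:
            return False
        # Never sleep past the remaining budget.
        step = min(delay, timeout_s - waited)
        sleep(step)
        waited += step
        # Double the delay after each attempt, up to max_delay_s.
        delay = min(delay * 2, max_delay_s)
```

The `sleep` parameter exists so the helper can be exercised in tests without actually waiting.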

Make a hybrid search query

Once the engine has finished indexing, you can search your Pixar movies using both lexical and semantic search.

Use two retrievers in the same query: one for lexical search and one for vector search. The results are combined and deduplicated automatically:

from shaped import RankQueryBuilder, TextSearch

query = (
    RankQueryBuilder()
    .from_entity('item')
    .retrieve([
        TextSearch(
            input_text_query='$query',
            mode={'type': 'lexical'},
            limit=50,
            name='lexical_search'
        ),
        TextSearch(
            input_text_query='$query',
            mode={'type': 'vector', 'text_embedding_ref': 'movie_text_embedding'},
            limit=50,
            name='vector_search'
        )
    ])
    .limit(20)
    .build()
)

results = client.execute_query(
    engine_name="hybrid_search",
    query=query,
    parameters={"query": "Incredibles"},
    return_metadata=True,
)
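The engine merges and deduplicates the two retrievers' results for you, and this guide does not say which fusion method it uses. For intuition, the sketch below shows reciprocal rank fusion (RRF), one widely used way to merge ranked lists. It is an illustration of the general idea rather than Shaped's implementation: `reciprocal_rank_fusion` is a hypothetical helper, `k=60` is the conventional constant for RRF, and each input list is assumed to contain unique IDs.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60, limit=None):
    """Merge several ranked lists of item IDs into one deduplicated ranking.

    Each appearance of an item contributes 1 / (k + rank), with rank starting
    at 1, so items ranked highly by several retrievers rise to the top.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, item_id in enumerate(ranking, start=1):
            scores[item_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties on the ID for a stable order.
    fused = sorted(scores, key=lambda item_id: (-scores[item_id], str(item_id)))
    return fused[:limit] if limit is not None else fused


# Hypothetical retriever outputs, using item IDs from the table above.
lexical_hits = [8961, 187541]         # titles containing "Incredibles"
vector_hits = [8961, 177765, 187541]  # semantically closest items
fused_ids = reciprocal_rank_fusion([lexical_hits, vector_hits], limit=20)
```

Items returned by both retrievers accumulate score from each list, so they outrank items that only one retriever found, and each ID appears once in the fused output.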