Image embeddings

This guide describes how you can implement semantic search for images using CLIP embeddings. CLIP is a multimodal model that can encode both images and text into the same vector space, enabling image-to-image and text-to-image search.
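To build intuition for why a shared vector space enables text-to-image search, here is a minimal, self-contained sketch using made-up 3-dimensional vectors (real CLIP embeddings have hundreds of dimensions, and nothing here uses the Shaped API):

```python
import math

# Illustration only: CLIP maps images and text into one shared vector space,
# so semantic search reduces to nearest-neighbor lookup by cosine similarity.
def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical image embeddings, keyed by poster.
image_embeddings = {
    "incredibles_poster": [0.9, 0.1, 0.2],
    "coco_poster": [0.1, 0.8, 0.3],
}

# Hypothetical CLIP text embedding for the query "superhero family".
text_embedding = [0.85, 0.15, 0.25]

# Rank posters by similarity to the text query.
ranked = sorted(
    image_embeddings,
    key=lambda name: cosine_similarity(text_embedding, image_embeddings[name]),
    reverse=True,
)
print(ranked)  # "incredibles_poster" ranks first
```

The engine you configure below does the same thing at scale: it encodes every poster image once, then compares incoming queries against those stored vectors.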

First, instantiate the Shaped client:

from shaped import Client

client = Client(api_key="YOUR_KEY_HERE")

Upload data

To perform image search, you first need to connect to a data table that contains image URLs or image data. In this example, you'll declare a table and upload a few rows of data manually.

Alternatively, you can sync data from an external data source using a connector, or simply import a JSONL or CSV file.
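For example, if your catalog lives in a CSV file with the same columns as the schema below, one way to load it is with Python's csv module and the same insert_table_rows call used later in this guide. This is a sketch with hypothetical inline data (a real pipeline would read from a file and the poster URL here is a placeholder):

```python
import csv
import io

# Hypothetical CSV export with the same columns as the table schema.
csv_data = """item_id,movie_title,poster_url,description,release_date
1,Toy Story (1995),https://example.com/toy_story.jpg,A cowboy doll is jealous of a new toy.,1995-11-22
"""

records = []
for row in csv.DictReader(io.StringIO(csv_data)):
    row["item_id"] = int(row["item_id"])  # the schema declares item_id as Int64
    records.append(row)

# Same call as in the manual-upload example below:
# client.insert_table_rows("pixar_movies", records)
print(len(records))
```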

The first step is to declare your data table. You'll use a CUSTOM schema so that you can define the table's input columns yourself and write records manually.

table_config = {
    "schema_type": "CUSTOM",
    "name": "pixar_movies",
    "column_schema": {
        "item_id": "Int64",
        "movie_title": "String",
        "poster_url": "String",
        "description": "String",
        "release_date": "String",
    },
}

client.create_table(table_config)

Once the table schema is created, you can upload your data directly. Declare the rows as a list of dictionaries, then use the insert_table_rows method to write them to the table you created:

records = [
    {"item_id": 187541, "movie_title": "Incredibles 2 (2018)", "poster_url": "https://m.media-amazon.com/images/M/MV5BMTEzNzY0OTg0NTdeQTJeQWpwZ15BbWU4MDU3OTg3MjUz._V1_QL75_UX380_CR0,0,380,562_.jpg", "description": "The Incredibles family takes on a new mission which involves a change in family roles: Bob Parr (Mr. Incredible) must manage the house while his wife Helen (Elastigirl) goes out to save the world.", "release_date": "2018-06-15"},
    {"item_id": 177765, "movie_title": "Coco (2017)", "poster_url": "https://m.media-amazon.com/images/M/MV5BMDIyM2E2NTAtMzlhNy00ZGUxLWI1NjgtZDY5MzhiMDc5NGU3XkEyXkFqcGc@._V1_QL75_UY562_CR7,0,380,562_.jpg", "description": "Aspiring musician Miguel, confronted with his family's ancestral ban on music, enters the Land of the Dead to find his great-great-grandfather, a legendary singer.", "release_date": "2017-11-22"},
    {"item_id": 170957, "movie_title": "Cars 3 (2017)", "poster_url": "https://m.media-amazon.com/images/M/MV5BMTc0NzU2OTYyN15BMl5BanBnXkFtZTgwMTkwOTg2MTI@._V1_QL75_UX380_CR0,0,380,562_.jpg", "description": "Lightning McQueen sets out to prove to a new generation of racers that he's still the best race car in the world.", "release_date": "2017-06-16"},
    {"item_id": 157296, "movie_title": "Finding Dory (2016)", "poster_url": "https://m.media-amazon.com/images/M/MV5BY2VlYWJjMGMtYjcwZC00MDE2LThmMDItYjVlMzNhYzBhYTk5XkEyXkFqcGc@._V1_QL75_UY562_CR7,0,380,562_.jpg", "description": "Friendly but forgetful blue tang Dory begins a search for her long-lost parents and everyone learns a few things about the real meaning of family along the way.", "release_date": "2016-06-17"},
    {"item_id": 134853, "movie_title": "Inside Out (2015)", "poster_url": "https://m.media-amazon.com/images/M/MV5BOTgxMDQwMDk0OF5BMl5BanBnXkFtZTgwNjU5OTg2NDE@._V1_QL75_UX380_CR0,0,380,562_.jpg", "description": "After young Riley is uprooted from her Midwest life and moved to San Francisco, her emotions, Joy, Fear, Anger, Disgust, and Sadness, conflict on how best to navigate a new city, house, and school.", "release_date": "2015-06-19"},
    {"item_id": 1, "movie_title": "Toy Story (1995)", "poster_url": "https://m.media-amazon.com/images/M/MV5BZTA3OWVjOWItNjE1NS00NzZiLWE1MjgtZDZhMWI1ZTlkNzYwXkEyXkFqcGc@._V1_QL75_UX380_CR0,2,380,562_.jpg", "description": "A cowboy doll is profoundly jealous when a new spaceman action figure supplants him as the top toy in a boy's bedroom. When circumstances separate them from their owner, the duo have to put aside their differences to return to him.", "release_date": "1995-11-22"},
]
client.insert_table_rows("pixar_movies", records)

Set up your engine

Now you will configure the image search engine. The engine will encode images using CLIP embeddings, which represent images in a vector space that can be searched semantically.

Start by instantiating the engine configuration class:

from shaped.autogen.models.engine_config_v2 import EngineConfigV2
from shaped.autogen.models.data_config import DataConfig

image_search_engine = EngineConfigV2(
    name="image_search",
    data=DataConfig(),
)

Connect engine to data

Engines must connect to an item table, which defines the candidate items in your catalog. Connect the table you just created (pixar_movies) through the reference table config:

from shaped.autogen.models.data_config_interaction_table import DataConfigInteractionTable
from shaped.autogen.models.reference_table_config import ReferenceTableConfig

image_search_engine.data = DataConfig(
    item_table=DataConfigInteractionTable(
        ReferenceTableConfig(
            name="pixar_movies"
        )
    )
)

Image encoding

To encode your images, you need to choose which image column to use and select a CLIP embedding model. CLIP models can encode both images and text into the same vector space, enabling text-to-image search.

Configure CLIP embeddings for the poster URL field:

from shaped.autogen.models.index_config import IndexConfig
from shaped.autogen.models.embedding_config import EmbeddingConfig
from shaped.autogen.models.encoder import Encoder
from shaped.autogen.models.hugging_face_encoder import HuggingFaceEncoder

embedding_model = "openai/clip-vit-base-patch32"

image_search_engine.index = IndexConfig(
    embeddings=[
        EmbeddingConfig(
            name="image_embedding",
            encoder=Encoder(
                HuggingFaceEncoder(
                    model_name=embedding_model,
                    item_fields=["poster_url"],
                )
            ),
        )
    ]
)

Start encoding process

After configuring your engine's data and index, use the create_engine method to start the encoding process:

client.create_engine(engine_config=image_search_engine)

Make an image search query

After the engine has finished encoding, you can search your images using either text queries or image queries.

Use the TextSearch retriever in vector mode to perform text-to-image search. The text query is encoded with CLIP's text encoder and matched against the stored image embeddings:

from shaped import RankQueryBuilder, TextSearch

query = (
    RankQueryBuilder()
    .from_entity('item')
    .retrieve(
        TextSearch(
            input_text_query='$query',
            mode={'type': 'vector', 'text_embedding_ref': 'image_embedding'},
            limit=50
        )
    )
    .limit(20)
    .build()
)

results = client.execute_query(
    engine_name="image_search",
    query=query,
    parameters={"query": "superhero family"},
    return_metadata=True,
)