Similar items

This page covers similar item queries for finding items closest to a given item. For query fundamentals, see Query Basics.

A similar item query is a retrieval operation: it runs a similarity search and returns the items that are closest to a given item.

You can compute item similarity in two ways:

  1. Content similarity, which finds items with similar attributes (e.g., text descriptions, images). The intuition is: "Items that have similar attributes to this item."
  2. Collaborative similarity, which finds items that the same users frequently interact with. The intuition is: "People who like this also like..."

Content similarity

Content similarity finds items that share similar attributes, such as text descriptions, categories, tags, or other metadata. This is useful when you want to find items that are objectively similar in their characteristics.

Intuition: "Items that have similar attributes to this item."
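To make the idea concrete, here is a minimal Python sketch of content similarity as a nearest-neighbor search over item embeddings. It assumes cosine similarity as the closeness measure, which this page does not specify for retrieval, and the embedding values and item names are hypothetical. It is not how the engine is implemented.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as similar to nothing
    return dot / (norm_a * norm_b)

def most_similar(item_id, embeddings, limit=20):
    """Return up to `limit` (item_id, similarity) pairs, best first.

    The query item itself is excluded from the results.
    """
    query = embeddings[item_id]
    scored = [
        (other_id, cosine_similarity(query, vec))
        for other_id, vec in embeddings.items()
        if other_id != item_id
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:limit]

# Hypothetical 3-dimensional content embeddings (illustrative values only).
item_embeddings = {
    "red_sneaker": [0.9, 0.1, 0.0],
    "blue_sneaker": [0.8, 0.2, 0.1],
    "coffee_mug": [0.0, 0.1, 0.9],
}
neighbors = most_similar("red_sneaker", item_embeddings, limit=2)
```

In this toy data the two sneakers point in nearly the same direction, so blue_sneaker ranks ahead of coffee_mug.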

Prerequisites

  1. An engine with a content-based embedding (e.g., text embedding) configured
  2. An item to find similar items for

Query example

To find similar items based on content/attribute similarity, use the similarity retrieve type with attribute pooling:

SELECT *
FROM similarity(embedding_ref='text_embedding',
                encoder='item_attribute_pooling',
                input_item_id='$item_id',
                limit=20)

Content similarity with model scoring

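Retrieve a larger pool of content-similar candidates (50 here), then rank them by a weighted blend of a personalization model (click_through_rate) and the cosine similarity between the item and user text encodings, keeping the top 20: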

SELECT *
FROM similarity(embedding_ref='text_embedding',
                encoder='item_attribute_pooling',
                input_item_id='$item_id',
                limit=50)
ORDER BY score(expression='0.7 * click_through_rate + 0.3 * cosine_similarity(text_encoding(item, embedding_ref=''text_embedding''), text_encoding(user, embedding_ref=''text_embedding''))',
               input_user_id='$user_id',
               input_interactions_item_ids='$interaction_item_ids')
LIMIT 20
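The ranking step in the query above can be read as a weighted sum followed by a top-N cut. The Python sketch below mirrors that arithmetic only. The function names, the candidate records, and every score value are hypothetical. `text_similarity` stands in for the cosine similarity between the item and user text encodings.

```python
def blended_score(click_through_rate, text_similarity, w_ctr=0.7, w_sim=0.3):
    """Weighted sum with the same weights as the score expression above."""
    return w_ctr * click_through_rate + w_sim * text_similarity

def rerank(candidates, limit=20):
    """Sort candidates by blended score, highest first, and keep the top `limit`."""
    ranked = sorted(
        candidates,
        key=lambda c: blended_score(c["click_through_rate"], c["text_similarity"]),
        reverse=True,
    )
    return ranked[:limit]

# Hypothetical candidates: model outputs and similarities are made-up values.
candidates = [
    {"item_id": "a", "click_through_rate": 0.10, "text_similarity": 0.95},
    {"item_id": "b", "click_through_rate": 0.40, "text_similarity": 0.60},
    {"item_id": "c", "click_through_rate": 0.05, "text_similarity": 0.20},
]
top = rerank(candidates, limit=2)
```

Item a has the highest text similarity, but item b wins because the click-through term carries more weight. Retrieving more candidates than the final limit gives the scoring step room to reorder them.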

Collaborative similarity

Collaborative similarity finds items that the same users frequently interact with, based on interaction patterns rather than item attributes. This captures the "wisdom of the crowd": if users who liked item A also liked item B, then A and B are treated as similar.

Intuition: "People who like this also like..." That is, items that the same users tend to interact with.
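As an illustration of the co-interaction idea, the Python sketch below scores two items by how much their audiences overlap, using cosine similarity over binary user sets. This is a deliberately simpler stand-in for a learned collaborative embedding such as ALS, not a description of it. The interaction data and function names are hypothetical.

```python
import math
from collections import defaultdict

def item_user_sets(interactions):
    """Group (user_id, item_id) pairs into item_id -> set of user_ids."""
    users_by_item = defaultdict(set)
    for user_id, item_id in interactions:
        users_by_item[item_id].add(user_id)
    return users_by_item

def co_interaction_similarity(users_a, users_b):
    """Cosine similarity between two binary user-interaction vectors.

    Equals |A intersect B| / sqrt(|A| * |B|) for user sets A and B.
    """
    if not users_a or not users_b:
        return 0.0
    shared = len(users_a & users_b)
    return shared / math.sqrt(len(users_a) * len(users_b))

# Hypothetical interaction log: (user_id, item_id) pairs.
interactions = [
    ("u1", "item_a"), ("u1", "item_b"),
    ("u2", "item_a"), ("u2", "item_b"),
    ("u3", "item_a"), ("u3", "item_c"),
]
users_by_item = item_user_sets(interactions)
sim_ab = co_interaction_similarity(users_by_item["item_a"], users_by_item["item_b"])
sim_ac = co_interaction_similarity(users_by_item["item_a"], users_by_item["item_c"])
```

Two of the three users who interacted with item_a also interacted with item_b, while only one interacted with item_c, so item_b comes out as the closer neighbor. No item attributes are involved at any point.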

Prerequisites

  1. An engine with a trained collaborative embedding (e.g., ALS) configured
  2. An item to find similar items for

Query example

To find similar items based on collaborative filtering (interaction patterns), use the similarity retrieve type with a precomputed item embedding:

SELECT *
FROM similarity(embedding_ref='als_embedding',
                encoder='precomputed_item',
                input_item_id='$item_id',
                limit=20)

Collaborative similarity with model scoring

Retrieve a larger pool of collaboratively similar candidates (50 here), then rank them for the given user by a weighted blend of the click_through_rate and conversion_rate model scores, keeping the top 20:

SELECT *
FROM similarity(embedding_ref='als_embedding',
                encoder='precomputed_item',
                input_item_id='$item_id',
                limit=50)
ORDER BY score(expression='0.6 * click_through_rate + 0.4 * conversion_rate',
               input_user_id='$user_id',
               input_interactions_item_ids='$interaction_item_ids')
LIMIT 20

Similar items with multi-objective scoring

Combine multiple similarity signals and scoring models to find items that are both similar and likely to engage the user. This approach balances content similarity, collaborative similarity, and personalization.

Prerequisites

  1. An engine with both content and collaborative embeddings configured
  2. Multiple trained scoring models (e.g., click_through_rate, conversion_rate)
  3. An item to find similar items for
  4. Optionally, a user ID for personalization

Query example

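Retrieve candidates from both similarity sources, give each retriever a name, and score the combined candidates with a weighted ensemble that references the named retrieval scores (retrieval.content and retrieval.collab) alongside the model scores: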

SELECT *
FROM similarity(embedding_ref='text_embedding',
                encoder='item_attribute_pooling',
                input_item_id='$item_id',
                limit=50,
                name='content'),
     similarity(embedding_ref='als_embedding',
                encoder='precomputed_item',
                input_item_id='$item_id',
                limit=50,
                name='collab')
ORDER BY score(expression='0.4 * retrieval.content + 0.3 * retrieval.collab + 0.2 * click_through_rate + 0.1 * conversion_rate',
               input_user_id='$user_id',
               input_interactions_item_ids='$interaction_item_ids')
LIMIT 20
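One way to picture the multi-source query is as a merge of two candidate lists followed by a weighted sum. The Python sketch below shows that arithmetic under one loudly stated assumption: an item returned by only one retriever gets a retrieval score of 0.0 for the other. This page does not say how the engine handles that case. All names and score values are hypothetical.

```python
def merge_candidates(content_hits, collab_hits):
    """Union two {item_id: retrieval_score} maps.

    ASSUMPTION: a score missing from one retriever defaults to 0.0.
    """
    merged = {}
    for item_id in set(content_hits) | set(collab_hits):
        merged[item_id] = {
            "content": content_hits.get(item_id, 0.0),
            "collab": collab_hits.get(item_id, 0.0),
        }
    return merged

def ensemble_score(retrieval, click_through_rate, conversion_rate):
    """Weighted sum with the same weights as the score expression above."""
    return (0.4 * retrieval["content"] + 0.3 * retrieval["collab"]
            + 0.2 * click_through_rate + 0.1 * conversion_rate)

def rank(content_hits, collab_hits, model_scores, limit=20):
    """Merge both candidate sets, score each item, and keep the top `limit`."""
    merged = merge_candidates(content_hits, collab_hits)
    scored = [
        (item_id, ensemble_score(retrieval, *model_scores[item_id]))
        for item_id, retrieval in merged.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:limit]

# Hypothetical retrieval scores and (click_through_rate, conversion_rate) pairs.
content_hits = {"x": 0.9, "y": 0.5}
collab_hits = {"y": 0.8, "z": 0.7}
model_scores = {"x": (0.2, 0.1), "y": (0.3, 0.2), "z": (0.1, 0.0)}
ranking = rank(content_hits, collab_hits, model_scores, limit=2)
```

Item y appears in both candidate lists, so it collects both retrieval terms and ranks first even though item x has the highest single content score.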