Hugging Face Integration
Shaped enhances its recommendation and search capabilities by integrating with Hugging Face's extensive library of pre-trained models. This integration allows you to leverage state-of-the-art natural language processing (NLP) and computer vision models to encode your unstructured data (text and images) into meaningful vector representations. These vector representations, also known as embeddings, then power semantic search and improve the accuracy of downstream ranking and recommendation models.
How it Works
The integration works through a configurable parameter called `language_model_name` within your Shaped model configuration. This parameter allows you to specify the URI of a Hugging Face model, such as:
- `sentence-transformers/all-MiniLM-L6-v2` (a Sentence Transformers model)
- `sentence-transformers/distiluse-base-multilingual-cased-v2` (a multilingual Sentence Transformers model)
- `openai/clip-vit-base-patch32` (a CLIP model)
- `jinaai/jina-clip-v2` (a custom CLIP model from Jina AI)
Based on the model you select, Shaped automatically encodes your unstructured data into vector embeddings. These embeddings capture the semantic meaning of the data, enabling more sophisticated search and recommendation functionalities.
Encoding Process
Shaped performs a best-effort encoding based on the capabilities of the specified `language_model_name`:
- Text-Only Models: If you choose a text-only model (e.g., a sentence-transformers model), Shaped will encode only the text fields in your data.
- Image-Only Models: If you choose an image-only model, Shaped will encode only the image data.
- Multimodal Models: For multimodal models like CLIP (which support both text and images), Shaped will encode both text and image data, creating a unified embedding space.
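Shaped runs this encoding internally, but a minimal sketch of the two main cases, using the open-source sentence-transformers and transformers libraries (the item titles and image file names here are illustrative, not part of Shaped's API), looks roughly like:

```python
from PIL import Image
from sentence_transformers import SentenceTransformer
from transformers import CLIPModel, CLIPProcessor

titles = ["Wireless noise-cancelling headphones", "Organic cotton t-shirt"]

# Text-only case: a Sentence Transformers model maps each text field to a
# fixed-size vector (384 dimensions for all-MiniLM-L6-v2).
text_model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
text_embeddings = text_model.encode(titles)  # shape: (2, 384)

# Multimodal case: a CLIP model maps text and images into a shared space.
clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
inputs = processor(
    text=titles,
    images=[Image.open("headphones.jpg"), Image.open("tshirt.jpg")],
    return_tensors="pt",
    padding=True,
)
outputs = clip(**inputs)
text_embeds = outputs.text_embeds    # shape: (2, 512)
image_embeds = outputs.image_embeds  # shape: (2, 512)
```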
Downstream Usage
The generated embeddings are used in two primary ways:
- Semantic Search: Enables users to search using natural language queries that are semantically matched to the encoded items.
- Features for Ranking/Recommendation Models: The embeddings serve as powerful features for other Shaped policies, such as scoring and embedding policies, enriching the models with semantic understanding.
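For intuition, semantic search over these embeddings reduces to a nearest-neighbor lookup in vector space. A minimal sketch with the sentence-transformers library (the catalog items and query are illustrative; Shaped manages the encoding and index for you):

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

# Encode the item catalog once, then match natural-language queries against it.
items = [
    "Waterproof hiking boots",
    "Insulated down parka",
    "Lightweight running shoes",
]
item_embeddings = model.encode(items, convert_to_tensor=True)

query_embedding = model.encode("warm jacket for winter", convert_to_tensor=True)
scores = util.cos_sim(query_embedding, item_embeddings)[0]

# "Insulated down parka" ranks highest despite sharing no keywords with the query.
print(items[int(scores.argmax())])
```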
Supported Model Types
Shaped's Hugging Face integration supports several model types:
Sentence Transformers:
- Any model from the Sentence Transformers collection on Hugging Face.
- Optimized for creating semantically meaningful sentence embeddings.
- Ideal for text-based search and recommendations.
CLIP (Contrastive Language-Image Pre-training):
- Any model from the CLIP collection on Hugging Face.
- Supports both text and image modalities, creating a shared embedding space.
- Enables cross-modal search (e.g., searching for images using text descriptions).
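As an illustration of cross-modal search (the file names and query are placeholders; Shaped performs the equivalent matching internally), a text query can be scored directly against image embeddings:

```python
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Embed the catalog images once and L2-normalize them.
images = [Image.open(p) for p in ["dress.jpg", "sofa.jpg", "lamp.jpg"]]
image_inputs = processor(images=images, return_tensors="pt")
image_embeds = model.get_image_features(**image_inputs)
image_embeds = image_embeds / image_embeds.norm(dim=-1, keepdim=True)

# A text query lands in the same embedding space as the images.
text_inputs = processor(text=["a red summer dress"], return_tensors="pt", padding=True)
text_embeds = model.get_text_features(**text_inputs)
text_embeds = text_embeds / text_embeds.norm(dim=-1, keepdim=True)

similarity = text_embeds @ image_embeds.T  # cosine similarities, shape (1, 3)
print(int(similarity.argmax()))            # index of the best-matching image
```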
Custom Backbones:
- Shaped supports custom backbones from providers like Nomic AI and Jina AI, offering flexibility beyond standard Hugging Face models.
- Enables integration with specialized or proprietary models.
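Custom backbones like these typically ship their own modeling code on the Hugging Face Hub. As a rough sketch of loading one outside Shaped (the `encode_text`/`encode_image` method names follow the jinaai/jina-clip-v2 model card and may differ for other backbones):

```python
from transformers import AutoModel

# Custom modeling code requires trust_remote_code=True to load.
model = AutoModel.from_pretrained("jinaai/jina-clip-v2", trust_remote_code=True)

# Dedicated text and image encoders exposed by the model's custom code.
text_embeddings = model.encode_text(["a photo of a mountain lake"])
image_embeddings = model.encode_image(["lake.jpg"])
```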
Configuration
To use a Hugging Face model, set the `language_model_name` parameter in your Shaped model configuration file to the URI of the desired model.
Example Configuration
```yaml
model:
  name: my-huggingface-powered-model
  language_model_name: sentence-transformers/all-MiniLM-L6-v2
  model_policies:
    scoring_policy:
      policy_type: lightgbm
      # ... other scoring policy configurations ...
    embedding_policy:
      policy_type: two-tower
      # ... other embedding policy configurations ...
```
This configuration will:
- Use the `sentence-transformers/all-MiniLM-L6-v2` model to encode all text data in your item catalog.
- Use these text embeddings for semantic search and as features for the `lightgbm` scoring policy and the `two-tower` embedding policy (sketched below).
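Shaped wires the embeddings into its policies automatically; purely as intuition for the "embeddings as features" step, a hand-rolled equivalent with LightGBM might look like this (the labels, items, and hyperparameters are illustrative):

```python
import lightgbm as lgb
import numpy as np
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

# Item-text embeddings become dense feature columns for the ranker.
item_texts = ["Waterproof hiking boots", "Insulated down parka", "Trail mix, 1kg"]
X = encoder.encode(item_texts)   # shape: (3, 384)
y = np.array([1, 0, 1])          # illustrative relevance labels
group = [3]                      # a single query group containing all three items

ranker = lgb.LGBMRanker(objective="lambdarank", n_estimators=10, min_child_samples=1)
ranker.fit(X, y, group=group)
scores = ranker.predict(X)       # semantic features drive the ranking scores
```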
Multimodal Example (CLIP)
```yaml
model:
  name: my-multimodal-model
  language_model_name: openai/clip-vit-base-patch32
  model_policies:
    # ... other policy configurations ...
```
This configuration will:
- Use the `openai/clip-vit-base-patch32` model to encode both text and image data.
- Enable cross-modal search and provide rich multimodal features to downstream models.
Benefits of Using Hugging Face Models
- State-of-the-Art Performance: Leverage cutting-edge NLP and computer vision models pre-trained on massive datasets.
- Semantic Understanding: Capture the meaning of your data, going beyond keyword matching.
- Flexibility: Choose from a wide variety of models tailored for different modalities and tasks.
- Ease of Use: Integrate powerful models with a single configuration parameter.
- Improved Search and Recommendations: Enhance the accuracy and relevance of your search and recommendation results.
Further Considerations
- Model Selection: Choose a model that aligns with your data modalities (text, images, or both) and your specific use case (e.g., search, recommendations, or both).
- Computational Resources: Larger and more complex models may require more computational resources for encoding and inference.