Hugging Face Integration
Shaped enhances its recommendation and search capabilities by integrating with Hugging Face's extensive library of pre-trained models. This integration allows you to leverage state-of-the-art natural language processing (NLP) and computer vision models to encode your unstructured data (text and images) into meaningful vector representations. These vector representations, also known as embeddings, then power semantic search and improve the accuracy of downstream ranking and recommendation models.
How it Works
The integration works through a configurable parameter called `language_model_name` within your Shaped model configuration. This parameter allows you to specify the URI of a Hugging Face model, such as:
- `sentence-transformers/all-MiniLM-L6-v2` (a Sentence Transformers model)
- `sentence-transformers/distiluse-base-multilingual-cased-v2` (a multilingual Sentence Transformers model)
- `openai/clip-vit-base-patch32` (a CLIP model)
- `jinaai/jina-clip-v2` (a custom CLIP model from Jina AI)
Based on the model you select, Shaped automatically encodes your unstructured data into vector embeddings. These embeddings capture the semantic meaning of the data, enabling more sophisticated search and recommendation functionalities.
Encoding Process
Shaped performs a best-effort encoding based on the capabilities of the specified `language_model_name`:
- Text-Only Models: If you choose a text-only model (e.g., a sentence-transformers model), Shaped will encode only the text fields in your data.
- Image-Only Models: If you choose an image-only model, Shaped will encode only the image data.
- Multimodal Models: For multimodal models like CLIP (which support both text and images), Shaped will encode both text and image data, creating a unified embedding space.
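Shaped runs this encoding internally, but a minimal sketch of the two main cases, using the open-source sentence-transformers and transformers libraries (the item titles and image file names here are illustrative, not part of Shaped's API), looks roughly like:

```python
from PIL import Image
from sentence_transformers import SentenceTransformer
from transformers import CLIPModel, CLIPProcessor

titles = ["Wireless noise-cancelling headphones", "Organic cotton t-shirt"]

# Text-only case: a Sentence Transformers model maps each text field to a
# fixed-size vector (384 dimensions for all-MiniLM-L6-v2).
text_model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
text_embeddings = text_model.encode(titles)  # shape: (2, 384)

# Multimodal case: a CLIP model maps text and images into a shared space.
clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
inputs = processor(
    text=titles,
    images=[Image.open("headphones.jpg"), Image.open("tshirt.jpg")],
    return_tensors="pt",
    padding=True,
)
outputs = clip(**inputs)
text_embeds = outputs.text_embeds    # shape: (2, 512)
image_embeds = outputs.image_embeds  # shape: (2, 512)
```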
Downstream Usage
The generated embeddings are used in two primary ways:
- Semantic Search: Enables users to search using natural language queries that are semantically matched to the encoded items.
- Features for Ranking/Recommendation Models: The embeddings serve as powerful features for other Shaped policies, such as scoring and embedding policies, enriching the models with semantic understanding.
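For intuition, semantic search over these embeddings reduces to a nearest-neighbor lookup in vector space. A minimal sketch with the sentence-transformers library (the catalog items and query are illustrative; Shaped manages the encoding and index for you):

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

# Encode the item catalog once, then match natural-language queries against it.
items = [
    "Waterproof hiking boots",
    "Insulated down parka",
    "Lightweight running shoes",
]
item_embeddings = model.encode(items, convert_to_tensor=True)

query_embedding = model.encode("warm jacket for winter", convert_to_tensor=True)
scores = util.cos_sim(query_embedding, item_embeddings)[0]

# "Insulated down parka" ranks highest despite sharing no keywords with the query.
print(items[int(scores.argmax())])
```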
Supported Model Types
Shaped's Hugging Face integration supports several model types:
Sentence Transformers:
- Any model from the Sentence Transformers collection on Hugging Face.
- Optimized for creating semantically meaningful sentence embeddings.
- Ideal for text-based search and recommendations.
CLIP (Contrastive Language-Image Pre-training):
- Any model from the CLIP collection on Hugging Face.
- Supports both text and image modalities, creating a shared embedding space.
- Enables cross-modal search (e.g., searching for images using text descriptions).
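As an illustration of cross-modal search (the file names and query are placeholders; Shaped performs the equivalent matching internally), a text query can be scored directly against image embeddings:

```python
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Embed the catalog images once and L2-normalize them.
images = [Image.open(p) for p in ["dress.jpg", "sofa.jpg", "lamp.jpg"]]
image_inputs = processor(images=images, return_tensors="pt")
image_embeds = model.get_image_features(**image_inputs)
image_embeds = image_embeds / image_embeds.norm(dim=-1, keepdim=True)

# A text query lands in the same embedding space as the images.
text_inputs = processor(text=["a red summer dress"], return_tensors="pt", padding=True)
text_embeds = model.get_text_features(**text_inputs)
text_embeds = text_embeds / text_embeds.norm(dim=-1, keepdim=True)

similarity = text_embeds @ image_embeds.T  # cosine similarities, shape (1, 3)
print(int(similarity.argmax()))            # index of the best-matching image
```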
Custom Backbones:
- Shaped supports custom backbones from providers like Nomic AI and Jina AI, offering flexibility beyond standard Hugging Face models.
- Enables integration with specialized or proprietary models.
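Custom backbones like these typically ship their own modeling code on the Hugging Face Hub. As a rough sketch of loading one outside Shaped (the `encode_text`/`encode_image` method names follow the jinaai/jina-clip-v2 model card and may differ for other backbones):

```python
from transformers import AutoModel

# Custom modeling code requires trust_remote_code=True to load.
model = AutoModel.from_pretrained("jinaai/jina-clip-v2", trust_remote_code=True)

# Dedicated text and image encoders exposed by the model's custom code.
text_embeddings = model.encode_text(["a photo of a mountain lake"])
image_embeddings = model.encode_image(["lake.jpg"])
```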
Configuration
To use a Hugging Face model, set the `language_model_name` parameter in your Shaped model configuration file to the URI of the desired model.
Example Configuration
```yaml
model:
  name: my-huggingface-powered-model
  language_model_name: sentence-transformers/all-MiniLM-L6-v2
  model_policies:
    scoring_policy:
      policy_type: lightgbm
      # ... other scoring policy configurations ...
    embedding_policy:
      policy_type: two-tower
      # ... other embedding policy configurations ...
```
This configuration will:
- Use the `sentence-transformers/all-MiniLM-L6-v2` model to encode all text data in your item catalog.
- Use these text embeddings for semantic search and as features for the `lightgbm` scoring policy and the `two-tower` embedding policy (sketched below).
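Shaped wires the embeddings into its policies automatically; purely as intuition for the "embeddings as features" step, a hand-rolled equivalent with LightGBM might look like this (the labels, items, and hyperparameters are illustrative):

```python
import lightgbm as lgb
import numpy as np
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

# Item-text embeddings become dense feature columns for the ranker.
item_texts = ["Waterproof hiking boots", "Insulated down parka", "Trail mix, 1kg"]
X = encoder.encode(item_texts)   # shape: (3, 384)
y = np.array([1, 0, 1])          # illustrative relevance labels
group = [3]                      # a single query group containing all three items

ranker = lgb.LGBMRanker(objective="lambdarank", n_estimators=10, min_child_samples=1)
ranker.fit(X, y, group=group)
scores = ranker.predict(X)       # semantic features drive the ranking scores
```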
Multimodal Example (CLIP)
```yaml
model:
  name: my-multimodal-model
  language_model_name: openai/clip-vit-base-patch32
  model_policies:
    # ... other policy configurations ...
```
This configuration will:
- Use the `openai/clip-vit-base-patch32` model to encode both text and image data.
- Enable cross-modal search and provide rich multimodal features to downstream models.
Benefits of Using Hugging Face Models
- State-of-the-Art Performance: Leverage cutting-edge NLP and computer vision models pre-trained on massive datasets.
- Semantic Understanding: Capture the meaning of your data, going beyond keyword matching.
- Flexibility: Choose from a wide variety of models tailored for different modalities and tasks.
- Ease of Use: Integrate powerful models with a single configuration parameter.
- Improved Search and Recommendations: Enhance the accuracy and relevance of your search and recommendation results.
Further Considerations
- Model Selection: Choose a model that aligns with your data modalities (text, images, or both) and your specific use case (e.g., search, recommendations, or both).
- Computational Resources: Larger and more complex models may require more computational resources for encoding and inference.