
Shaped Model Library

While Shaped simplifies the complexity of building and deploying recommendation systems, under the hood lies a sophisticated ecosystem of models working in concert to deliver exceptional results. This guide pulls back the curtain and provides insights into the diverse model library powering Shaped's capabilities.

Content Understanding: Transforming Raw Data into Meaningful Representations

Before any recommendations can be made, Shaped needs to understand the essence of your content. Content understanding models analyze your raw data and generate rich embeddings that capture the underlying meaning and relationships.

Shaped's content understanding library includes:

  • Text Encoders:
    • BERT and beyond: Powerful transformer-based language models that excel at capturing semantic meaning and relationships within text (a brief text-embedding sketch follows this list).
    • LLM Encoders: Cutting-edge large language models (LLMs) like Llama2 and GPT variants offer unparalleled capabilities in understanding and representing complex textual content.
  • Multi-Modal Encoders:
    • CLIP: Connects images and text, enabling cross-modal search and recommendations by learning joint representations from both modalities.
  • Other Specialized Encoders: Shaped continuously incorporates emerging encoder architectures tailored for specific data types, such as audio, video, or time-series data.
  • Fine-tuning: For optimal performance, Shaped can fine-tune pre-trained content models (text, image, or multi-modal) on your specific data and domain.
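
To make this concrete, here is a minimal sketch of turning item text into dense embeddings with an off-the-shelf transformer encoder. It uses the open-source sentence-transformers library and an example model name; it illustrates the general idea rather than Shaped's internal pipeline.

```python
# Minimal sketch: embedding item text with a pre-trained transformer encoder.
# Uses the open-source sentence-transformers library; the model name is just
# an example and not necessarily what Shaped runs internally.
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # small, general-purpose text encoder

item_texts = [
    "Wireless noise-cancelling headphones with 30-hour battery life",
    "Hand-thrown ceramic coffee mug, 12 oz",
    "Beginner's guide to sourdough baking",
]

# Each item becomes a fixed-length vector; semantically similar items end up close together.
item_embeddings = encoder.encode(item_texts, normalize_embeddings=True)
print(item_embeddings.shape)  # (3, 384) for this particular model
```

Fine-tuning such an encoder on your own catalog, as described above, adapts these representations to your domain's vocabulary and notion of similarity.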

Retrieval: Navigating the Vast Sea of Possibilities

Retrieving relevant candidates from your potentially massive catalog is a critical step in the recommendation process. Shaped employs a diverse set of retrieval models to efficiently surface the most promising items.

Shaped's retrieval toolbox includes:

  • Embedding-Based Retrieval:
    • Two-Tower Neural Networks: Separate encoders for users and items learn representations in a shared embedding space, enabling efficient similarity search for candidate retrieval (see the retrieval sketch after this list).
    • Matrix Factorization (e.g. ALS): A classic collaborative filtering technique that uncovers latent factors driving user-item interactions, often used for initial candidate retrieval.
    • Word2Vec and Prod2Vec: Word-embedding techniques extended to items and sessions, learning item vectors from co-occurrence in interaction sequences.
    • BERT4Rec and SASRec: Transformer-based models specifically designed for sequential recommendation tasks, capturing user behavior and item order for effective candidate retrieval.
    • Autoencoders: Learn compressed representations of items, enabling efficient similarity search for retrieval.
    • Collaborative Filtering-Based Retrieval: Leverage user-item interaction patterns to retrieve items similar to those a user has previously interacted with.
    • Content Encoder Retrieval Strategies: Leverage the embeddings generated by the content understanding models to retrieve items whose content is similar to items a user has previously interacted with.
  • Traditional Methods:
    • N-gram Models: Capture short word sequences for basic content-based retrieval.
    • Cold-start: Retrieve items based on metadata or content features when user interaction data is sparse.
    • Trending: Retrieve items that are recently popular.
    • Popular: Retrieve items based on their historic popularity.
    • Chronological: Retrieve recently added items.
    • BM25: A classic information retrieval algorithm that ranks items based on keyword relevance.
  • Hybrid Retrieval Strategies: Shaped can combine different retrieval methods (e.g., collaborative filtering followed by content-based filtering) to optimize relevance and diversity.
  • Approximate Nearest Neighbor (ANN) Search: For increased speed and scalability, especially with large catalogs, Shaped integrates highly optimized ANN libraries like Faiss, Annoy, or HNSWlib.
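
As an illustration of embedding-based retrieval, the sketch below indexes item vectors with Faiss (one of the ANN libraries mentioned above) and retrieves the nearest items for a user vector. The user and item vectors are random stand-ins for the outputs of trained towers or matrix factors; dimensions and data are invented for the example.

```python
# Illustrative two-tower-style retrieval with a Faiss index.
# The vectors below are random stand-ins for trained user/item encoders;
# in practice they would come from learned neural towers or matrix factors.
import numpy as np
import faiss

dim, num_items = 64, 10_000

# Pretend these came from the item tower (one vector per catalog item).
item_vectors = np.random.rand(num_items, dim).astype("float32")
faiss.normalize_L2(item_vectors)   # normalizing makes inner product equal cosine similarity

index = faiss.IndexFlatIP(dim)     # exact inner-product search; swap for IndexHNSWFlat at scale
index.add(item_vectors)

# Pretend this came from the user tower for a single user.
user_vector = np.random.rand(1, dim).astype("float32")
faiss.normalize_L2(user_vector)

scores, item_ids = index.search(user_vector, 10)   # top-10 candidate items for this user
print(item_ids[0], scores[0])
```

The same pattern applies to the other embedding-based retrievers above: whatever model produces the vectors, candidate generation reduces to a nearest-neighbor query against the item index.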

Scoring: Predicting the Perfect Match

Once potential candidates are retrieved, scoring models estimate the likelihood of a user engaging with each item, enabling precise ranking and prioritization.

Shaped's scoring models employ a variety of techniques:

  • Deep Learning Methods:
    • DeepFM, Wide & Deep, DLRM: Sophisticated deep learning models designed to capture complex interactions between user and item features, providing highly accurate relevance scores.
    • BERT4Rec, SASRec: Transformer-based models can also be adapted for scoring, leveraging their sequential understanding to predict user preferences.
  • Traditional Machine Learning:
    • Linear Scorers: Simple but effective models that learn linear relationships between features and user engagement.
    • Decision Tree Models (e.g., Gradient Boosted Trees, LambdaMART): Capture non-linear patterns and interactions for improved scoring accuracy (a brief scoring sketch follows this list).
    • Factorization Machines (FMs): Model pairwise feature interactions efficiently, offering a good balance between accuracy and computational cost.
  • Next-Generation Scoring:
    • LLM Rerankers: Emerging applications of LLMs involve reranking initial candidate lists based on more nuanced language understanding and contextual information.
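
For a flavor of the traditional machine-learning path, the sketch below trains a gradient boosted tree scorer on engagement labels and ranks retrieved candidates by predicted engagement probability. The features and labels are synthetic placeholders; the actual feature set and model family Shaped selects depend on your data.

```python
# Illustrative gradient-boosted-tree scorer: predict engagement probability
# for (user, item) feature vectors and rank candidates by that score.
# Features and labels are random placeholders, not real data.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)

# Historical training data: concatenated user + item features, engagement label.
X_train = rng.normal(size=(5_000, 20))
y_train = rng.integers(0, 2, size=5_000)

scorer = GradientBoostingClassifier(n_estimators=100, max_depth=3)
scorer.fit(X_train, y_train)

# Score 50 retrieved candidates for one user and order them best-first.
X_candidates = rng.normal(size=(50, 20))
scores = scorer.predict_proba(X_candidates)[:, 1]
ranking = np.argsort(-scores)
print(ranking[:10], scores[ranking[:10]])
```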

Ordering and Exploration: Balancing Relevance and Discovery

Shaped goes beyond simple ranking to optimize the final order of recommendations, balancing the need for relevance with the desire to introduce users to new and interesting content.

Key ordering and exploration techniques include:

  • Bandit Algorithms: Continuously learn from user feedback to balance the exploration of new items with the exploitation of known preferences. This includes contextual bandits as well as variations like Thompson Sampling or Upper Confidence Bound (UCB); a toy Thompson Sampling sketch follows this list.
  • Maximal Marginal Relevance (MMR): Optimizes for both relevance and diversity, ensuring a mix of recommendations that are both engaging and introduce users to a wider range of content.
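
To make the bandit idea concrete, the toy sketch below runs Beta-Bernoulli Thompson Sampling over a handful of candidate items: each item keeps click and skip counts, a plausible click rate is sampled from each item's posterior, and the highest draw is shown next. This is a self-contained simulation, not Shaped's production exploration policy.

```python
# Toy Beta-Bernoulli Thompson Sampling over four candidate items.
# Each item keeps (clicks, skips) counts; we sample a plausible click rate
# from Beta(clicks + 1, skips + 1) and show the item with the highest draw.
import numpy as np

rng = np.random.default_rng(42)
true_click_rates = np.array([0.02, 0.05, 0.10, 0.04])   # unknown to the algorithm
clicks = np.zeros(len(true_click_rates))
skips = np.zeros(len(true_click_rates))

for _ in range(10_000):
    sampled_rates = rng.beta(clicks + 1, skips + 1)      # exploration via posterior sampling
    chosen = int(np.argmax(sampled_rates))               # exploit the best sampled rate
    clicked = rng.random() < true_click_rates[chosen]    # simulated user feedback
    clicks[chosen] += clicked
    skips[chosen] += 1 - clicked

print("impressions per item:", (clicks + skips).astype(int))
print("estimated click rates:", np.round(clicks / (clicks + skips), 3))
```

Over time the genuinely best item receives most impressions while under-explored items still get occasional exposure, which is exactly the relevance-versus-discovery balance described above.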

Baselines: Measuring Success Against the Expected

To understand the true value of Shaped's approach, we benchmark our models against various baselines:

  • Trending: Recommending the items that have been most popular recently (a short sketch of computing this and the Popular baseline follows the list).
  • Chronological: Presenting items in reverse chronological order.
  • Popular: Recommending items based purely on their popularity within specific categories or user segments.
  • Random: Providing a non-personalized baseline for comparison.
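
For reference, the sketch below computes the Popular and Trending baselines from a raw interaction log with pandas; the column names are assumptions made for the example.

```python
# Computing Popular and Trending baselines from an interaction log with pandas.
# Column names (user_id, item_id, timestamp) are assumptions for this example.
import pandas as pd

interactions = pd.DataFrame({
    "user_id":   [1, 2, 3, 1, 2, 4, 5],
    "item_id":   ["a", "a", "b", "c", "b", "a", "c"],
    "timestamp": pd.to_datetime([
        "2024-01-01", "2024-01-02", "2024-03-01", "2024-03-02",
        "2024-03-03", "2024-03-04", "2024-03-05",
    ]),
})

# Popular: rank items by all-time interaction counts.
popular = interactions["item_id"].value_counts()

# Trending: rank items by interaction counts within a recent window only.
cutoff = interactions["timestamp"].max() - pd.Timedelta(days=7)
trending = interactions.loc[interactions["timestamp"] >= cutoff, "item_id"].value_counts()

print(popular.head(10))
print(trending.head(10))
```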

A Model for Every Need: Shaped's Automated Selection

With such a diverse model library, choosing the right combination for your specific needs might seem daunting. Shaped eliminates this complexity by automatically selecting and configuring the optimal models based on:

  • Your Data: We analyze your data characteristics, volume, and available features to identify the most suitable models.
  • Your Objectives: Your business goals and desired user experience guide our model selection process.
  • Performance Evaluation: We rigorously evaluate multiple model combinations on your data and select the top performers to power your live recommendations.

Additional Considerations

  • Explainability: Shaped can incorporate techniques for explainable AI to provide insights into why certain recommendations are made, especially when using complex models.
  • Bias Mitigation: Shaped considers potential biases in data and can apply techniques to mitigate bias in models, ensuring fair and equitable recommendations.
  • Calibration: Ensuring that model predictions are well-calibrated and align with user preferences is a key focus of Shaped's model selection process (a brief calibration sketch follows this list).
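
As one common calibration technique (shown purely for illustration, not as a statement of Shaped's internal method), the sketch below wraps a classifier in scikit-learn's isotonic calibration so that its predicted probabilities track observed engagement rates more closely. The data is synthetic.

```python
# Illustrative probability calibration with scikit-learn's isotonic regression.
# Data is synthetic; the point is that calibrated scores track observed rates.
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 10))
y = (X[:, 0] + 0.5 * rng.normal(size=10_000) > 0).astype(int)   # synthetic engagement labels

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

raw = GradientBoostingClassifier().fit(X_train, y_train)
calibrated = CalibratedClassifierCV(GradientBoostingClassifier(), method="isotonic", cv=3)
calibrated.fit(X_train, y_train)

# Compare average predicted probability against the actual positive rate.
print("actual positive rate: ", y_test.mean())
print("raw mean score:       ", raw.predict_proba(X_test)[:, 1].mean())
print("calibrated mean score:", calibrated.predict_proba(X_test)[:, 1].mean())
```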

Conclusion

Shaped provides a powerful engine for building and deploying sophisticated recommendation systems, fueled by a constantly evolving library of cutting-edge models. While the inner workings are complex, Shaped's automated model selection and optimization process ensures you benefit from the latest advancements in AI without the burden of manual configuration. You can trust Shaped to deliver the most relevant, engaging, and personalized recommendations for your users, driving engagement and achieving your business objectives.