Recall
The Recall@K metric answers the vital question: Out of all the items the user would have found relevant, how many did we actually surface in those top K spots?
Imagine searching for specific camera equipment on an e-commerce site. Precision@K tells you if the first few results shown are indeed camera gear. Recall@K, on the other hand, tells you if the specific lens or tripod you were hoping to find (assuming it's relevant based on your history or query) actually appeared within those top K results. Both perspectives are essential for evaluating the effectiveness of recommendation and search rankings.
Setting the Stage: Ground Truth and Relevance
As with Precision@K, evaluating Recall@K requires a "ground truth" – a set of items known to be relevant for a user (often derived from their interaction history in offline testing). We compare the model's recommended list against this ground truth.
Let's reuse our movie example. Our user's hidden watch history (the relevant items) is {The Terminator, James Bond, Iron Man, plus 3 other relevant movies}. So, the total number of relevant items = 6.
Now, let's look at the recommendations generated by two different algorithms:
- List A: [The Terminator, James Bond, Love Actually]
- List B: [Iron Man, Generic Action Flick 1, Generic Action Flick 2]
What is Recall@K?
Recall@K measures the proportion of all relevant items (from the ground truth set) that are successfully included within the top K positions of the ranked recommendation list.
The formula is:
Recall@K = (Number of relevant items in the top K recommendations) / (Total number of relevant items)
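The formula translates directly into a few lines of Python. A minimal sketch (the `recall_at_k` name and its list/set arguments are our own convention, not a standard library API):

```python
def recall_at_k(recommended, relevant, k):
    """Fraction of all relevant items that appear in the top-k recommendations.

    recommended: ranked list of item IDs produced by the model.
    relevant: set of ground-truth relevant item IDs for this user.
    """
    if not relevant:
        return 0.0  # no relevant items means recall is undefined; return 0 by convention
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in relevant)
    return hits / len(relevant)
```

Note that the denominator is the size of the whole relevant set, not K; this is what distinguishes Recall@K from Precision@K.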
Let's calculate Recall@K for K=3 using our example:
- List A: [The Terminator, James Bond, Love Actually]
  - Relevant items in the top 3: The Terminator, James Bond (2 items)
  - Total relevant items = 6
  - Recall@3 for List A = 2 / 6 ≈ 0.33
- List B: [Iron Man, Generic Action Flick 1, Generic Action Flick 2]
  - Relevant items in the top 3: Iron Man (1 item)
  - Total relevant items = 6
  - Recall@3 for List B = 1 / 6 ≈ 0.17
In this case, List A not only had better Precision@3 (2/3 vs 1/3) but also better Recall@3 (2/6 vs 1/6). It found a larger fraction of the user's total relevant items within the top 3 recommendations.
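The worked example above can be reproduced in a few lines of Python. The three placeholder titles stand in for the unnamed relevant movies in the user's history:

```python
# Ground truth: 6 relevant items (the last three names are placeholders).
relevant = {"The Terminator", "James Bond", "Iron Man",
            "Relevant Movie 4", "Relevant Movie 5", "Relevant Movie 6"}

list_a = ["The Terminator", "James Bond", "Love Actually"]
list_b = ["Iron Man", "Generic Action Flick 1", "Generic Action Flick 2"]

for name, recs in [("List A", list_a), ("List B", list_b)]:
    hits = len(set(recs[:3]) & relevant)  # relevant items in the top 3
    print(f"{name}: Precision@3 = {hits}/3, Recall@3 = {hits}/{len(relevant)}")
# List A: Precision@3 = 2/3, Recall@3 = 2/6
# List B: Precision@3 = 1/3, Recall@3 = 1/6
```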
Why Use Recall@K? (Pros)
- Measures Discovery / Coverage: Recall@K is crucial when the goal is to ensure users find as many relevant items as possible, even if it means showing slightly less precise results overall. Think finding all relevant documents in legal search or all compatible parts in an inventory system.
- Focuses on User Need Fulfillment: High recall indicates the system is good at retrieving the items the user is potentially looking for from the relevant set.
- Complements Precision: Used together, Precision@K and Recall@K provide a balanced view. High precision and high recall is ideal. High precision/low recall means accurate but potentially missing things. Low precision/high recall means finding most relevant items but also including irrelevant ones.
- Evaluates Retrieval Effectiveness: It directly measures how effectively the system retrieves relevant items from the large pool of possibilities.
Limitations of Recall@K (Cons)
- Ignores Precision: A list could achieve high Recall@K by including many relevant items mixed with many irrelevant ones within the top K. The density of relevant items (Precision) isn't measured.
- Sensitive to K: Like Precision@K, the value depends heavily on the chosen K. Recall@20 will almost always be higher than Recall@5.
- Dependent on Total Relevant Items: The maximum possible Recall@K is 1.0 (finding all relevant items). However, achieving this might require a very large K if the total number of relevant items is large, making it less practical for interfaces with limited space.
- Ignores Ranking Order Within K: Just like Precision@K, a relevant item found at position K counts the same as one found at position 1.
The Precision-Recall Trade-off
Often, optimizing for Recall@K can negatively impact Precision@K, and vice-versa. To increase recall (find more relevant items within K), a system might need to be less "strict" in its recommendations, potentially pulling in more borderline or even irrelevant items, thus lowering precision. Conversely, tuning for very high precision might mean the system becomes more conservative, recommending only very high-confidence items and potentially missing other relevant ones, thus lowering recall. Understanding this trade-off is key when tuning ranking systems based on specific business goals.
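The trade-off is easy to see with a toy retrieval threshold. In this sketch (the scores and relevance labels are invented for illustration), lowering the score cutoff pulls in more relevant items (recall rises) while diluting the result set with irrelevant ones (precision falls):

```python
# Candidate items scored by a hypothetical model; True marks a truly relevant item.
scored = [(0.95, True), (0.90, False), (0.85, True), (0.70, False),
          (0.60, True), (0.40, False), (0.30, True)]
total_relevant = sum(1 for _, rel in scored if rel)  # 4 relevant items in total

for threshold in (0.8, 0.5, 0.2):
    retrieved = [rel for score, rel in scored if score >= threshold]
    hits = sum(retrieved)
    precision = hits / len(retrieved)
    recall = hits / total_relevant
    print(f"threshold={threshold}: precision={precision:.2f}, recall={recall:.2f}")
# threshold=0.8: precision=0.67, recall=0.50
# threshold=0.5: precision=0.60, recall=0.75
# threshold=0.2: precision=0.57, recall=1.00
```

As the threshold drops, recall climbs from 0.50 to 1.00 while precision slides from 0.67 to 0.57; the same dynamic plays out when tuning K or a model's confidence cutoff in a real ranking system.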
Evaluating Ranking Performance at Shaped
At Shaped, we recognize that simply showing accurate top results (Precision@K) isn't always enough. Users often need to discover items they might be interested in from their broader set of preferences. That's why Recall@K is another essential metric in our evaluation toolkit.
By tracking Recall@K, we help our customers understand how effectively their Shaped-powered models are retrieving the items users truly care about from the entire relevant catalog, within the crucial top K slots. We use Recall@K alongside Precision@K, MAP, AUC, and other metrics to provide a holistic view of model performance, enabling informed decisions about model tuning and optimization strategies.
Conclusion: Finding What Matters
Recall@K provides a vital perspective on ranking performance by measuring the system's ability to find or discover relevant items within the top K results. While Precision@K focuses on the accuracy of the presented list, Recall@K focuses on the coverage of the user's actual interests. Neither metric tells the whole story alone, but together, they offer powerful insights into the quality of your recommendation and search rankings. Understanding and utilizing both is crucial for building systems that not only show good items but also help users effectively discover what they're looking for.
Want to ensure your users are finding all the relevant items they need?
Request a demo of Shaped today to see how we measure Recall@K and other key metrics to build high-performing discovery experiences. Or, start exploring immediately with our free trial sandbox.