
Airbnb Categories

Use Case

For commerce- and entertainment-based products, the search experience degrades as the number of relevant options grows into the hundreds or thousands. Content that users don’t know about is hard for them to discover, which leaves them limited to what they already know exists. Effective categorization makes it easier to discover new and unfamiliar content. Discovery-oriented products that adapt to user interactions and behavior expand users’ choices beyond what they already know, overcoming the limitations of search-based or manually curated content.

Airbnb recognized the importance of organizing its vast inventory of properties into meaningful categories such as "Beachfront Getaways," "Mountain Retreats," and "Urban Apartments." This categorization lets users quickly find properties that match their preferences. It streamlines the search process and improves the user experience, which leads to more bookings and higher platform usage.

If you’re hoping to give your users a similarly delightful and engaging experience, and to realize the business benefits that come with it, you need to add personalization to your platform. On the surface, it seems that all you need to build a recommendation system is an events table containing the interactions between users and items. That table essentially tells the system what each user likes and dislikes. This is enough to get started with a simple collaborative filtering model, but you’ll quickly run into major problems:

  • Why do my users always get shown the same set of listings (aka stuck in a filter bubble)?
  • How do I make the experience for new users more personalized?
  • How does a new listing get categorized?
  • How come the results don’t seem to change quickly based on my interactions?
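The problems in that list are easy to reproduce with a toy model. The sketch below is a minimal item-to-item co-occurrence recommender, which is one of the simplest forms of collaborative filtering. It runs over a small, hypothetical in-memory events list. `EVENTS`, `build_cooccurrence` and `recommend` are illustrative names and do not come from Shaped or Airbnb. The sketch is not how either system works. It only shows what an events table can give you on its own.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical positive interactions: (user_id, item_id) pairs.
EVENTS = [
    ("u1", "beach_1"), ("u1", "beach_2"),
    ("u2", "beach_1"), ("u2", "beach_2"), ("u2", "cabin_1"),
    ("u3", "cabin_1"), ("u3", "cabin_2"),
]

def build_cooccurrence(events):
    """Group items by user and count how often each pair of items shares a user."""
    items_by_user = defaultdict(set)
    for user_id, item_id in events:
        items_by_user[user_id].add(item_id)
    co = defaultdict(lambda: defaultdict(int))
    for items in items_by_user.values():
        for a, b in combinations(sorted(items), 2):
            co[a][b] += 1
            co[b][a] += 1
    return items_by_user, co

def recommend(user_id, events, k=3):
    """Score items the user hasn't seen by co-occurrence with items they have."""
    items_by_user, co = build_cooccurrence(events)
    seen = items_by_user.get(user_id, set())
    scores = defaultdict(int)
    for item in seen:
        for other, count in co[item].items():
            if other not in seen:
                scores[other] += count
    # Highest score first; ties broken by item id so the output is deterministic.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:k]]

print(recommend("u1", EVENTS))        # ['cabin_1']
print(recommend("new_user", EVENTS))  # []
```

User `u1` is only ever shown items that co-occur with what they already liked, so `cabin_2` is unreachable. A brand-new user gets an empty list. A listing with no events yet can never be recommended. These are the filter-bubble and cold-start problems listed above.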

Airbnb does an amazing job of handling all these challenges. They incorporate user features, like the user’s previously booked listings, and item features, like location, photos, guest reviews and price/value.(1) They also do content understanding, meaning they can tell what the images contain and how they relate to each other. They blend content the user has previously shown interest in with content they predict the user will want to see and discover. All of this happens very quickly. They rely on powerful (and complicated) real-time infrastructure and on session-based models that make the feed feel “alive” and “learning on the fly”. If you’d like to read more about Airbnb’s Categories, we’ve written a whole blog post about it!

Shaped does many of the same things Airbnb’s algorithm does and can power this use case for you out of the box! All you need to do is tell us where you’re storing your user, item, and event data. Shaped will then train a model and provide you with an endpoint to power your platform 💪.
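Neither Airbnb nor this post spells out exactly how familiar content and discovery content are mixed. The sketch below is a generic illustration only. It interleaves two hypothetical ranked lists, reserves every `explore_every`-th slot for a discovery candidate, and skips duplicates. `blend_feed` and its parameters are made up for this example.

```python
def blend_feed(known_interest, discovery, explore_every=4, size=8):
    """Interleave exploitation and exploration candidates into one feed.

    known_interest: items ranked from the user's past behavior (exploitation).
    discovery: candidates the user hasn't engaged with yet (exploration).
    Every `explore_every`-th slot prefers a discovery item; if one source runs
    dry, the other fills the slot. Items already placed are skipped.
    """
    feed, seen = [], set()
    exploit, explore = iter(known_interest), iter(discovery)
    slot = 0
    while len(feed) < size:
        slot += 1
        if slot % explore_every == 0:
            sources = (explore, exploit)
        else:
            sources = (exploit, explore)
        item = None
        for source in sources:
            for candidate in source:
                if candidate not in seen:
                    item = candidate
                    break
            if item is not None:
                break
        if item is None:  # both sources are exhausted
            break
        feed.append(item)
        seen.add(item)
    return feed

feed = blend_feed(["beach_1", "beach_2", "beach_3", "beach_4"],
                  ["cabin_1", "loft_1"], explore_every=3, size=6)
print(feed)  # ['beach_1', 'beach_2', 'cabin_1', 'beach_3', 'beach_4', 'loft_1']
```

A fixed slot pattern is easy to reason about and test. Production systems often randomize exploration or learn the mix instead.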

Shaped Model Creation

First, create the category_recs_model.yaml and listing_recs_model.yaml files:

category_recs_model.yaml
---
model:
  name: category_recs
connectors:
  - type: BigQuery
    id: bigquery_data
    location: us-east1
    project_id: my_app
    dataset: my_dataset
fetch:
  users: |
    SELECT
      user_id,
      country
    FROM
      bigquery_data.users
  items: |
    SELECT
      category_id as item_id
    FROM
      bigquery_data.categories
  events: |
    (
      SELECT
        user_id,
        category_id as item_id,
        created_at,
        CASE
          WHEN event_name in ('filter_for_category') THEN TRUE
          ELSE NULL
        END AS label
      FROM
        bigquery_data.events
    )
    UNION DISTINCT
    (
      SELECT
        e.user_id,
        l.category_id as item_id,
        e.created_at,
        CASE
          WHEN event_name in ('view_listing', 'booked') THEN TRUE
          WHEN event_name in ('impression') THEN FALSE
        END AS label
      FROM
        bigquery_data.events AS e
      JOIN
        bigquery_data.listings AS l ON e.listing_id = l.listing_id
    )

listing_recs_model.yaml
---
model:
  name: listing_recs
connectors:
  - type: BigQuery
    id: bigquery_data
    location: us-east1
    project_id: my_app
    dataset: my_dataset
fetch:
  users: |
    SELECT
      user_id,
      country
    FROM
      bigquery_data.users
  items: |
    SELECT
      listing_id as item_id,
      location,
      photos,
      guest_reviews,
      price,
      size,
      amenities,
      property_type,
      category_id
    FROM
      bigquery_data.listings
  events: |
    SELECT
      user_id,
      listing_id as item_id,
      created_at,
      CASE
        WHEN event_name in ('booked') THEN TRUE
        WHEN event_name in ('impression') THEN FALSE
      END AS label
    FROM
      bigquery_data.events
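The `events` query in category_recs_model.yaml turns listing-level activity into category-level training labels. The Python sketch below mirrors that logic over hypothetical in-memory rows so the mapping is easy to check. `Event` and `category_training_rows` are illustrative names. The rules are as follows:

  • Filtering for a category is a positive.
  • Viewing or booking a listing is a positive for that listing’s category.
  • An impression is a negative.

A Python set plays the role of `UNION DISTINCT`. One simplification: the SQL also emits rows with a NULL item_id for events that have no category_id, and this sketch skips them.

```python
from typing import NamedTuple, Optional

class Event(NamedTuple):
    user_id: str
    event_name: str
    created_at: str
    category_id: Optional[str] = None  # set on category-level events
    listing_id: Optional[str] = None   # set on listing-level events

def category_training_rows(events, listing_category):
    """Return {(user_id, item_id, created_at, label)} like the category_recs query.

    listing_category maps listing_id -> category_id (the JOIN on listings).
    """
    rows = set()
    for e in events:
        # First half of the UNION: only filter_for_category gets a label.
        if e.category_id is not None:
            label = True if e.event_name == "filter_for_category" else None
            rows.add((e.user_id, e.category_id, e.created_at, label))
        # Second half: listing events lifted to the listing's category;
        # events whose listing isn't found are dropped, as with an inner JOIN.
        if e.listing_id in listing_category:
            if e.event_name in ("view_listing", "booked"):
                label = True
            elif e.event_name == "impression":
                label = False
            else:
                label = None
            rows.add((e.user_id, listing_category[e.listing_id], e.created_at, label))
    return rows

rows = category_training_rows(
    [
        Event("u1", "filter_for_category", "t1", category_id="cabins"),
        Event("u1", "impression", "t2", listing_id="l1"),
        Event("u1", "booked", "t3", listing_id="l2"),
    ],
    {"l1": "beachfront", "l2": "cabins"},
)
print(sorted(rows))
# [('u1', 'beachfront', 't2', False), ('u1', 'cabins', 't1', True), ('u1', 'cabins', 't3', True)]
```

Note that listing_recs_model.yaml is stricter. There, only `booked` counts as a positive, and `view_listing` gets no label.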

Then use the Shaped CLI to create both models:

shaped create-model --file category_recs_model.yaml
shaped create-model --file listing_recs_model.yaml

Shaped Recommendations

To get the categories in ranked order for a user, run:

shaped rank --model-name category_recs --user-id "325913"

{
"ids":[
"category_id_21",
"category_id_3",
"category_id_19",
"category_id_10",
"category_id_8"
],
"scores":[
0.99,
0.87,
0.86,
0.80,
0.72
]
}

That tells you the order in which to lay out your categories on the screen. Then, to rank the listings within each category for the user, we’d make the following call once for each category_id returned by the previous call:

shaped rank --model-name listing_recs --user-id "325913" --filter-predicate "items_category_id == $category_id"

{
"ids": [
"listing_id_824",
"listing_id_3424",
"listing_id_81",
"listing_id_292",
...
],
"scores":[
0.98,
0.92,
0.87,
0.85,
0.73
]
}
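This two-step flow can be scripted. The sketch below assumes, as the sample outputs suggest, that `shaped rank` prints a JSON object with `ids` and `scores` to stdout. `cli_rank` and `build_category_rows` are hypothetical helper names. The sketch reuses the flags and predicate string from the commands above as-is. Whether string category IDs need quoting inside the predicate depends on Shaped’s filter syntax, and this sketch doesn’t check that. The `rank` argument is injectable, so the layout logic can be tested without the CLI.

```python
import json
import subprocess

def cli_rank(model_name, user_id, filter_predicate=None):
    """Run `shaped rank` and parse its stdout as JSON (assumed output format)."""
    cmd = ["shaped", "rank", "--model-name", model_name, "--user-id", user_id]
    if filter_predicate is not None:
        cmd += ["--filter-predicate", filter_predicate]
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return json.loads(result.stdout)

def build_category_rows(user_id, rank=cli_rank, listings_per_row=10):
    """Return [(category_id, [listing_id, ...]), ...] in display order."""
    # Step 1: order the categories for this user.
    categories = rank("category_recs", user_id)["ids"]
    rows = []
    for category_id in categories:
        # Step 2: one listing_recs call per category, restricted to that category.
        predicate = f"items_category_id == {category_id}"
        listings = rank("listing_recs", user_id, filter_predicate=predicate)["ids"]
        rows.append((category_id, listings[:listings_per_row]))
    return rows
```

The per-category calls are independent of each other, so they could also be issued concurrently to cut page-load latency.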

We could also add a filter to return only listings that are available during the dates the user is searching for.
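This post doesn’t show the predicate syntax for such an availability filter. The sketch below therefore does the equivalent check client-side instead, on hypothetical booking data. `bookings` maps a listing ID to its booked date ranges, and all names here are illustrative. Stays and bookings are treated as half-open ranges [check-in, check-out), so back-to-back stays that share a changeover day do not conflict.

```python
from datetime import date

def overlaps(a_start, a_end, b_start, b_end):
    """Half-open ranges [start, end) overlap iff each starts before the other ends."""
    return a_start < b_end and b_start < a_end

def available_listings(ranked_ids, bookings, check_in, check_out):
    """Keep ranked listings with no booking that overlaps the requested stay.

    bookings: hypothetical mapping listing_id -> [(start, end), ...].
    The model's ranking order is preserved.
    """
    return [
        listing_id
        for listing_id in ranked_ids
        if not any(
            overlaps(check_in, check_out, start, end)
            for start, end in bookings.get(listing_id, [])
        )
    ]

bookings = {
    "listing_id_824": [(date(2024, 7, 1), date(2024, 7, 5))],
    "listing_id_81": [(date(2024, 7, 5), date(2024, 7, 8))],
}
ranked = ["listing_id_824", "listing_id_3424", "listing_id_81"]
print(available_listings(ranked, bookings, date(2024, 7, 3), date(2024, 7, 5)))
# ['listing_id_3424', 'listing_id_81']
```

Filtering after ranking preserves the model’s order but can return fewer listings than requested. You would therefore typically ask for more results than you plan to display. Pushing the filter into the ranking call itself avoids that.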

(1) https://www.airbnb.com/resources/hosting-homes/a/how-search-works-on-airbnb-460