RentTheRunway Recommendations (RentTheRunway)
In this tutorial we'll show you how to set up a recommendation model for the RentTheRunway dataset using Shaped. This dataset contains 192,544 ratings from 105,508 users on 5,850 items. With Shaped we'll be able to learn a recommendation model that predicts the items each user is most likely to want to rent.
This tutorial uses Shaped's local dataset connector, but you can easily adapt it to any of the data stores or real-time connectors we support.
Let's get started! 🚀
You can follow along in our accompanying notebook!
Shaped CLI Setup
Installing the Shaped CLI
You'll need to install the Shaped CLI if you haven't already. You can do this with the following command:
pip install shaped
Shaped supports Python 3.8+. Take a look at the installation instructions if you need to install pip.
Initialize the CLI
You can then initialize the shaped client with your API key. If you don't have an API key yet, check out the How to get an API key page.
shaped init --api-key <YOUR_API_KEY>
Dataset Preparation
Download public dataset
To start off, let's fetch the publicly hosted RentTheRunway dataset we'll be training our model with.
wget https://mcauleylab.ucsd.edu/public_datasets/data/renttherunway/renttherunway_final_data.json.gz --no-check-certificate
gunzip renttherunway_final_data.json.gz
Let's take a look at the downloaded dataset. It consists of a single JSON file, renttherunway_final_data.json, which includes information about events, users and items in one table. Let's convert the data in this JSON file into a .csv file. We'll also preprocess the weight and height columns into a numerical type. In the raw data, the rating column is in the 1-10 range. We'll binarize it so that any value > 8 becomes 1 and any value <= 8 becomes 0.
You can see the code that performs the above preprocessing steps in the accompanying notebook.
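# `data` is the preprocessed pandas DataFrame from the notebook; sep='\t' writes tab-separated values
data.to_csv('notebook_assets/events.csv', sep='\t', index=False)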
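The preprocessing described above can be sketched with only the Python standard library. This is a minimal sketch, not the notebook's code. It rests on several assumptions:

- The file has one JSON object per line.
- Weights look like `137lbs` and heights look like `5' 8"`. Heights are converted to total inches, a choice made here for illustration.
- The raw keys `bust size`, `body type` and `rented for` contain spaces and are renamed to the snake_case names used in the dataset schema.

The helpers `parse_weight`, `parse_height`, `binarize_rating`, `preprocess` and `write_tsv` are hypothetical names.

```python
import csv
import json
import re

# Assumption: raw keys containing spaces are renamed to the snake_case
# names used in the dataset schema later in this tutorial.
RENAME = {"bust size": "bust_size", "body type": "body_type", "rented for": "rented_for"}

# Output column order: the column_schema keys of the Shaped dataset.
COLUMNS = ["age", "body_type", "bust_size", "category", "fit", "height", "item_id",
           "rating", "rented_for", "review_date", "review_summary", "review_text",
           "size", "user_id", "weight"]


def parse_weight(value):
    """Parse a weight like '137lbs' into 137; None if missing or unparseable."""
    match = re.match(r"\s*(\d+)\s*lbs", value or "")
    return int(match.group(1)) if match else None


def parse_height(value):
    """Parse a height like 5' 8" into total inches (68); None if unparseable."""
    match = re.match(r"\s*(\d+)'\s*(\d+)", value or "")
    return int(match.group(1)) * 12 + int(match.group(2)) if match else None


def binarize_rating(value):
    """Map ratings > 8 to 1 and ratings <= 8 to 0; missing ratings stay None."""
    if value in (None, ""):
        return None
    return 1 if int(value) > 8 else 0


def preprocess(json_lines):
    """Yield cleaned row dicts from an iterable of JSON lines (one object per line)."""
    for line in json_lines:
        raw = json.loads(line)
        row = {RENAME.get(key, key): val for key, val in raw.items()}
        row["weight"] = parse_weight(row.get("weight"))
        row["height"] = parse_height(row.get("height"))
        row["rating"] = binarize_rating(row.get("rating"))
        yield row


def write_tsv(rows, out, columns=COLUMNS):
    """Write rows as tab-separated values with a header; missing keys become ''."""
    writer = csv.DictWriter(out, fieldnames=columns, delimiter="\t", extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)


# Example usage (file names as in this tutorial):
# with open("renttherunway_final_data.json", encoding="utf-8") as src, \
#         open("notebook_assets/events.csv", "w", newline="", encoding="utf-8") as dst:
#     write_tsv(preprocess(src), dst)
```

Streaming rows through a generator avoids loading the whole file into memory. The pandas approach in the notebook is more convenient when the data fits in RAM.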
We will use the event data, along with user and item information, to build the model. These datasets are typically available separately, but in this dataset they are combined into a single table: the events.csv file includes details about events, users, and items. In the following steps, we will show how to define each of them individually when configuring the model. You can see how that's done in the notebook for this tutorial.
Create RentTheRunway Shaped dataset
For this tutorial we're going to create a custom Shaped dataset and insert the rows from events.csv into it. We will define the dataset schema in a YAML configuration file and create the dataset with the shaped create-dataset command. We will then insert data into it with the shaped dataset-insert command.
column_schema:
  age: Int32
  body_type: String
  bust_size: String
  category: String
  fit: String
  height: Int32
  item_id: String
  rating: Int32
  rented_for: String
  review_date: DateTime
  review_summary: String
  review_text: String
  size: Int32
  user_id: String
  weight: Int32
name: rent_runway_events
schema_type: CUSTOM
shaped create-dataset --file notebook_assets/events_dataset_schema.yaml
shaped dataset-insert --dataset-name rent_runway_events --file notebook_assets/events.csv --type 'tsv'
You'll see the records uploading in batches of 1000. Once all 192,544 records have been uploaded, you can move forward.
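Before or after uploading, it can help to confirm that the header of events.csv matches the schema columns and that the file holds the expected 192,544 records. This is a minimal sketch and the helper name `check_events` is hypothetical. It uses `csv.reader` rather than counting lines, because quoted review text may contain embedded newlines.

```python
import csv

# Column names from the column_schema in events_dataset_schema.yaml.
SCHEMA_COLUMNS = {
    "age", "body_type", "bust_size", "category", "fit", "height", "item_id",
    "rating", "rented_for", "review_date", "review_summary", "review_text",
    "size", "user_id", "weight",
}


def check_events(lines, expected_rows=192_544):
    """Compare a TSV header with the schema and count data records.

    `lines` is any iterable of text lines, e.g. a file opened with newline="".
    Returns (missing_columns, extra_columns, record_count).
    """
    reader = csv.reader(lines, delimiter="\t")
    header = set(next(reader, []))
    row_count = sum(1 for _ in reader)  # counts records, not physical lines
    missing = SCHEMA_COLUMNS - header
    extra = header - SCHEMA_COLUMNS
    if row_count != expected_rows:
        print(f"warning: expected {expected_rows} records, found {row_count}")
    return missing, extra, row_count


# Example usage:
# with open("notebook_assets/events.csv", newline="", encoding="utf-8") as f:
#     print(check_events(f))
```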
Create your model
We're now ready to create your Shaped model! We will use information about events, users and items to create the model. All the information is present in the single rent_runway_events table, so we write SQL queries that extract the relevant columns for each type of data, as shown in the YAML configuration file below:
connectors:
  - id: rent_runway_events
    name: rent_runway_events
    type: Dataset
fetch:
  events: SELECT user_id, item_id, review_date AS created_at, rating AS label, rented_for,
    review_text, review_summary FROM rent_runway_events
  items: SELECT item_id, fit, category, size FROM rent_runway_events
  users: SELECT user_id, bust_size, weight, body_type, height, age FROM rent_runway_events
model:
  name: rent_runway_recommendations
This is the model definition we'll be using. Save it as notebook_assets/rent_runway_model_schema.yaml and create the model with the corresponding create-model command:
shaped create-model --file notebook_assets/rent_runway_model_schema.yaml
For further details about creating models please refer to the Create Model API reference.
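To see what the three fetch queries extract from one combined table, here is a small sketch that runs them unchanged against an in-memory SQLite table. SQLite merely stands in for Shaped's query engine. The rows are made-up toy values, not records from the dataset, and `run_fetch_queries` is a hypothetical helper.

```python
import sqlite3

COLUMNS = ["user_id", "item_id", "review_date", "rating", "rented_for",
           "review_text", "review_summary", "fit", "category", "size",
           "bust_size", "weight", "body_type", "height", "age"]

# Hypothetical toy rows (not from the real dataset): two users, two items.
TOY_ROWS = [
    ("u1", "i1", "2016-04-20", 1, "party", "text a", "summary a", "fit", "dress", 14, "34d", 137, "hourglass", 68, 28),
    ("u2", "i1", "2016-05-02", 0, "wedding", "text b", "summary b", "small", "dress", 8, "32b", 120, "athletic", 64, 31),
    ("u1", "i2", "2017-01-15", 1, "work", "text c", "summary c", "fit", "gown", 14, "34d", 137, "hourglass", 68, 28),
]

# The three fetch queries from the model YAML, unchanged.
QUERIES = {
    "events": "SELECT user_id, item_id, review_date AS created_at, rating AS label, "
              "rented_for, review_text, review_summary FROM rent_runway_events",
    "items": "SELECT item_id, fit, category, size FROM rent_runway_events",
    "users": "SELECT user_id, bust_size, weight, body_type, height, age FROM rent_runway_events",
}


def run_fetch_queries(rows):
    """Load rows into a temporary table and return {name: (header, result_rows)}."""
    conn = sqlite3.connect(":memory:")
    conn.execute(f"CREATE TABLE rent_runway_events ({', '.join(COLUMNS)})")
    placeholders = ", ".join("?" for _ in COLUMNS)
    conn.executemany(f"INSERT INTO rent_runway_events VALUES ({placeholders})", rows)
    results = {}
    for name, sql in QUERIES.items():
        cursor = conn.execute(sql)
        header = [d[0] for d in cursor.description]  # aliases such as created_at, label
        results[name] = (header, cursor.fetchall())
    conn.close()
    return results


for name, (header, result_rows) in run_fetch_queries(TOY_ROWS).items():
    print(name, header, len(result_rows))
```

On this toy data, the items and users queries return one row per event, so item `i1` and user `u1` each appear twice. This sketch does not show whether Shaped deduplicates such rows.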
Inspect your model
Your recommendation model can take up to a few hours to provision your infrastructure and train on your historic events. This time mostly depends on how large your dataset is, i.e., the volume of your users, items and interactions, and the number of attributes you're providing.
While the model is being set up, you can view its status with either the List Models or View Model endpoints. For example, with the CLI:
shaped list-models
Response:
{
"models": [
{
"created_at": "2023-03-18T19:17:51 UTC",
"model_name": "rent_runway_recommendations",
"model_uri": "https://api.prod.shaped.ai/v1/models/rent_runway_recommendations",
"status": "FETCHING"
}
]
}
As you can see, the model is currently fetching the data. The initial model creation pipeline goes through the following stages in order:
SCHEDULING
FETCHING
TUNING
TRAINING
DEPLOYING
ACTIVE
You can periodically poll Shaped to inspect these status changes. Once the model is in the ACTIVE state, you can move on to the next step and use it to make rank requests.
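Periodic polling can be sketched as a small loop. This is a minimal sketch with these assumptions:

- The list-models response is JSON shaped like `{"models": [{"model_name": ..., "status": ...}]}`.
- `fetch_response` is a caller-supplied callable that returns that JSON text, for example a hypothetical wrapper around the `shaped list-models` CLI.
- The 60-second interval and 6-hour timeout are arbitrary choices.

`model_status` and `wait_until_active` are hypothetical names, not part of the Shaped client.

```python
import json
import time


def model_status(list_models_json, model_name):
    """Return the status of model_name, or None if it is not listed.

    Assumes a response shaped like {"models": [{"model_name": ..., "status": ...}]}.
    """
    payload = json.loads(list_models_json)
    for model in payload.get("models", []):
        if model.get("model_name") == model_name:
            return model.get("status")
    return None


def wait_until_active(fetch_response, model_name, interval_s=60.0,
                      timeout_s=6 * 3600, sleep=time.sleep):
    """Poll until model_name is ACTIVE; return the status changes seen, in order.

    fetch_response is any zero-argument callable returning list-models JSON text
    (hypothetical: e.g. a wrapper around the `shaped list-models` CLI).
    """
    seen = []
    waited = 0.0
    while waited <= timeout_s:
        status = model_status(fetch_response(), model_name)
        if not seen or seen[-1] != status:
            seen.append(status)  # record each status change once
        if status == "ACTIVE":
            return seen
        sleep(interval_s)
        waited += interval_s
    raise TimeoutError(f"{model_name} was not ACTIVE after {timeout_s} seconds")


# Example (hypothetical wrapper; assumes the CLI prints the JSON to stdout):
# import subprocess
# fetch = lambda: subprocess.run(["shaped", "list-models"], capture_output=True,
#                                text=True, check=True).stdout
# print(wait_until_active(fetch, "rent_runway_recommendations"))
```

The `sleep` parameter is injectable so the loop can be exercised without actually waiting.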
Fetch your recommendations
You're now ready to fetch your recommendations. You can do this with the Rank endpoint: just provide the user_id you wish to get recommendations for and the number of recommendations you want returned.
Shaped's CLI provides a convenience rank command to quickly retrieve results from the command line. You can use it as follows:
shaped rank --model-name rent_runway_recommendations --user-id 1 --limit 5
Response:
{
"ids":[
"427010",
"182094",
"332874",
"827918",
"403528"
],
"scores":[
0.9,
0.8,
0.7,
0.3,
0.2
]
}
The response returns 2 parallel arrays containing the ids and ranking scores for the items that Shaped estimates are most interesting to the given user.
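Because the ids and scores arrays are parallel, a client typically zips them together. This minimal sketch assumes the response body is the JSON object shown above. `ranked_items` is a hypothetical helper name.

```python
import json


def ranked_items(rank_response_json):
    """Pair the parallel ids/scores arrays into (item_id, score) tuples, in rank order."""
    payload = json.loads(rank_response_json)
    ids, scores = payload["ids"], payload["scores"]
    if len(ids) != len(scores):
        raise ValueError("ids and scores must have the same length")
    return list(zip(ids, scores))


example = ('{"ids": ["427010", "182094", "332874", "827918", "403528"], '
           '"scores": [0.9, 0.8, 0.7, 0.3, 0.2]}')
top = ranked_items(example)[0]  # ("427010", 0.9)
```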
If you want to integrate this endpoint into your website or application you can use the Rank POST REST endpoint directly with the following request:
curl https://api.prod.shaped.ai/v1/models/rent_runway_recommendations/rank \
-H "x-api-key: <API_KEY>" \
-H "Content-Type: application/json" \
-d '{
"user_id": "1",
"limit": 5
}'
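The same POST request can be built from Python with the standard library. This is a minimal sketch that mirrors the curl example's URL, headers and JSON body, and nothing more. `build_rank_request` and `rank` are hypothetical helper names. Actually sending the request needs network access and a valid API key.

```python
import json
import urllib.request

RANK_URL = "https://api.prod.shaped.ai/v1/models/rent_runway_recommendations/rank"


def build_rank_request(api_key, user_id, limit=5, url=RANK_URL):
    """Build (but do not send) the POST request shown in the curl example."""
    body = json.dumps({"user_id": user_id, "limit": limit}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def rank(api_key, user_id, limit=5):
    """Send the request and decode the JSON response (network + valid key required)."""
    with urllib.request.urlopen(build_rank_request(api_key, user_id, limit)) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Example usage:
# print(rank("<API_KEY>", "1", limit=5))
```

Separating request construction from sending makes the request easy to inspect or test without calling the API.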
Clean Up
Don't forget to delete your model once you've finished with it. You can do so with the following CLI command:
shaped delete-model --model-name rent_runway_recommendations