Movie Recommendations (MovieLens)
In this tutorial we'll show you how to set up a recommendation model for the MovieLens 100k dataset using Shaped. This dataset contains 100,000 ratings from ~1000 users on ~1700 movies. With Shaped we'll train a recommendation model that predicts which movies each user is most likely to want to watch.
This tutorial uses Shaped's local dataset connector, but you can easily adapt it to any of the data stores or real-time connectors we support.
Let's get started! 🚀
You can follow along in our accompanying notebook!
Shaped CLI Setup
Installing the Shaped CLI
You'll need to install the Shaped CLI if you haven't already. You can do this with the following command:
pip install shaped
Shaped supports Python 3.8+. Take a look at the installation instructions if you need to install pip.
Initialize the CLI
You can then initialize the Shaped client with your API key. If you don't have an API key yet, check out the How to get an API key page.
shaped init --api-key <YOUR_API_KEY>
Download public dataset
To start off, let's fetch the publicly hosted MovieLens dataset we'll be training our model with.
wget http://files.grouplens.org/datasets/movielens/ml-100k.zip --no-check-certificate
After unzipping the downloaded archive (for example with unzip ml-100k.zip), you'll find three files of interest:
- ratings, stored in ml-100k/u.data (tab-separated)
- users, stored in ml-100k/u.user (pipe-separated)
- movies, stored in ml-100k/u.item (pipe-separated)
Unfortunately, none of these files has a header row, which Shaped requires. To address this, we can prepend a header to the ratings file with the following command:
(printf 'user_id\titem_id\trating\ttimestamp\n'; cat ml-100k/u.data) > ml-100k/u.data_with_header
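If you'd rather not depend on how your shell expands \t, the same step can be sketched in Python. This is a minimal sketch, not part of the Shaped CLI: prepend_header is a hypothetical helper name, and it copies the records in binary mode so the source file's encoding is left untouched. The sep parameter is there because u.user and u.item are pipe-separated rather than tab-separated.

```python
import shutil
from pathlib import Path


def prepend_header(src, dst, columns, sep="\t"):
    """Write `columns` joined by `sep` as the first line of dst, then copy src.

    Hypothetical helper for this tutorial; it is not part of the Shaped CLI.
    """
    header = sep.join(columns).encode("ascii") + b"\n"
    # Binary mode copies the records byte for byte, so the original
    # encoding and line endings of the source file are preserved.
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        fout.write(header)
        shutil.copyfileobj(fin, fout)
    return Path(dst)
```

For the ratings file, the call would be prepend_header("ml-100k/u.data", "ml-100k/u.data_with_header", ["user_id", "item_id", "rating", "timestamp"]).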
To keep things as simple as possible, this tutorial only uses events (the ratings) to create the model. If you want to use the user and item data as well, carry out the steps below in the same way for those files. You can see how that's done in the notebook for this tutorial.
Create MovieLens Shaped dataset
For this tutorial we'll create a Shaped Dataset and insert the ratings records into it. First, write a dataset definition that includes the schema and save it as movielens_dataset.yaml. You can then use this definition to create the ratings dataset with the create-dataset command in Shaped's CLI:
shaped create-dataset --file movielens_dataset.yaml
We now want to insert the MovieLens ratings into the dataset, which we can do with the dataset-insert command:
shaped dataset-insert --dataset-name movielens_ratings --file ml-100k/u.data_with_header --type 'tsv'
You'll see the records uploading in batches of 1000. Once the upload has reached 100k records, you can move on.
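Before uploading, it can help to confirm that the file has the header you expect and the number of records you expect to see reported. The following is a minimal sketch using only the standard library; count_records and expected_batches are hypothetical helper names, and the batch size of 1000 is taken from the upload behaviour described above.

```python
import csv
import math


def count_records(path, expected_columns, sep="\t"):
    """Check the header row and return the number of data rows after it."""
    with open(path, newline="") as f:
        reader = csv.reader(f, delimiter=sep)
        header = next(reader)
        if header != list(expected_columns):
            raise ValueError(f"unexpected header: {header}")
        # Skip blank lines so a trailing newline is not counted as a record.
        return sum(1 for row in reader if row)


def expected_batches(n_records, batch_size=1000):
    """Number of upload batches needed for n_records."""
    return math.ceil(n_records / batch_size)
```

For ml-100k/u.data_with_header you would expect count_records to report 100,000 rows, which corresponds to 100 batches of 1000.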
Create your model
We're now ready to create your Shaped model! To keep things simple, we're using only the ratings records to build a collaborative filtering model. Shaped will use these ratings to determine which users like which movies, on the assumption that the higher the rating, the more likely it is that the user likes the rated movie.
Here's the create-model definition we'll be using, and the corresponding create-model CLI command:
- type: Dataset
SELECT user_id, item_id, timestamp AS created_at, rating AS label
shaped create-model --file movielens_movie_recommendations.yaml
For further details about creating models please refer to the Create Model API reference.
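To make the column mapping in the SELECT fragment above concrete, here is a minimal Python sketch of the same projection applied to the local ratings file. It only illustrates the aliasing (timestamp becomes created_at, rating becomes label); it is not how Shaped executes the query, and to_events is a hypothetical helper name.

```python
import csv


def to_events(path, sep="\t"):
    """Yield one event per rating row, using the aliases from the SELECT."""
    with open(path, newline="") as f:
        for row in csv.DictReader(f, delimiter=sep):
            yield {
                "user_id": row["user_id"],
                "item_id": row["item_id"],
                "created_at": int(row["timestamp"]),  # timestamp AS created_at
                "label": int(row["rating"]),          # rating AS label
            }
```

Calling list(to_events("ml-100k/u.data_with_header")) would give one dictionary per rating, with the rating value carried in the label field.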
Inspect your model
It can take up to a few hours for Shaped to provision your infrastructure and train your recommendation model on your historic events. The time depends mostly on how large your dataset is, i.e. the number of users, items and interactions, and the number of attributes you're providing.
While the model is being set up, you can view its status with either the List Models or View Model endpoints. For example, with the CLI:
"created_at": "2023-03-18T19:17:51 UTC",
As you can see, the model is currently fetching the data. The initial model creation pipeline goes through the following stages in order:
You can periodically poll Shaped to inspect these status changes. Once the model is in the ACTIVE state, you can move to the next step and use it to make rank requests.
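The polling described here can be sketched as a small loop. This is a hedged sketch rather than Shaped's client API: wait_until_active is a hypothetical helper, and get_status is a callable you would supply yourself (for example, one that reads the status field from the View Model response). Only the ACTIVE state name comes from this tutorial.

```python
import time


def wait_until_active(get_status, interval_s=60.0, timeout_s=4 * 60 * 60,
                      sleep=time.sleep):
    """Call get_status() until it returns "ACTIVE" or the timeout elapses.

    get_status is a caller-supplied function returning the model's current
    status string; how it fetches that status is left to the caller.
    """
    waited = 0.0
    while True:
        status = get_status()
        if status == "ACTIVE":
            return status
        if waited >= timeout_s:
            raise TimeoutError(
                f"model still in state {status!r} after {waited:.0f}s"
            )
        sleep(interval_s)
        waited += interval_s
```

The sleep parameter is injectable so the loop can be exercised quickly in tests. A fixed interval is used for simplicity, since model creation is measured in minutes to hours.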
Fetch your recommendations
You're now ready to fetch your movie recommendations. You can do this with the Rank endpoint: provide the user_id you want recommendations for and the number of recommendations you want returned.
Shaped's CLI provides a convenience rank command to quickly retrieve results from the command line. You can use it as follows:
shaped rank --model-name movielens_movie_recommendation --user-id 1 --limit 5
The response returns two parallel arrays containing the IDs and ranking scores of the movies that Shaped estimates are most interesting to the given user.
If you want to integrate this endpoint into your website or application, you can call the Rank POST REST endpoint directly with the following request:
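Because the two arrays are parallel, the score at position i belongs to the ID at position i. A minimal sketch of pairing them up follows; pair_recommendations is a hypothetical helper, and it takes the two lists directly because the exact field names of the response are not shown in this tutorial.

```python
def pair_recommendations(ids, scores):
    """Pair each recommended ID with the score at the same position."""
    if len(ids) != len(scores):
        raise ValueError("ids and scores must have the same length")
    # The input order is kept as returned by the Rank endpoint.
    return list(zip(ids, scores))
```

Each resulting tuple is (movie ID, score), in the order the response returned them.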
curl https://api.prod.shaped.ai/v1/models/movielens_movie_recommendation/rank \
-H "x-api-key: <API_KEY>" \
-H "Content-Type: application/json"
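From application code, the same request could be built with Python's standard library. This is a sketch under assumptions: the URL and headers are taken from the curl command above, but the JSON body keys user_id and limit are assumed to mirror the CLI's --user-id and --limit flags, so check them against the Rank API reference. build_rank_request is a hypothetical helper name.

```python
import json
import urllib.request

RANK_URL = "https://api.prod.shaped.ai/v1/models/{model_name}/rank"


def build_rank_request(api_key, model_name, user_id, limit):
    """Build (but do not send) a POST request for the Rank endpoint."""
    # Assumed body shape: keys mirror the CLI flags --user-id and --limit.
    body = json.dumps({"user_id": user_id, "limit": limit}).encode("utf-8")
    return urllib.request.Request(
        RANK_URL.format(model_name=model_name),
        data=body,
        method="POST",
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
    )
```

To send it, pass the returned request to urllib.request.urlopen and decode the response body with json.loads.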
Don't forget to delete your model once you've finished with it. You can do so with the following CLI command:
shaped delete-model --model-name movielens_movie_recommendation