Music Recommendations (LastFM)
In this tutorial we'll show you how to setup a recommendation model for the LastFM-360k dataset using Shaped. This dataset contains listening data (number of plays) for ~360,000 users on ~160,000 artists. With Shaped we'll be able to learn a recommendation model that can predict the most likely artists each user will want to listen to.
This tutorial will be shown using Shaped's local dataset connector, but you can easily translate to any of the data stores or real-time connectors we support.
Let's get started! 🚀
You can follow along in our accompanying notebook!
Shaped CLI Setup​
Installing the Shaped CLI​
You'll need to install the Shaped CLI if you haven't already. You can do this with the following command:
pip install shaped
Shaped supports Python 3.8+, take a look at the installation instructions if you need to install pip.
Initialize the CLI​
You can then initialize the shaped client with your API key. If you don't have an API key yet, check out the How to get an API key page.
shaped init --api-key <YOUR_API_KEY>
Dataset Preparation​
Download public dataset​
To start off, let's fetch the publicly hosted LastFM dataset we'll be training our model with. (NOTE: this step can take ~10 minutes)
curl http://mtg.upf.edu/static/datasets/last.fm/lastfm-dataset-360K.tar.gz -o lastfm-dataset-360K.tar.gz
tar -xzf lastfm-dataset-360K.tar.gz
Taking a look at the downloaded dataset, there are two tab-separated files (TSVs) of interest:
- plays which are stored in lastfm-dataset-360K/usersha1-artmbid-artname-plays.tsv
- users which are stored in lastfm-dataset-360K/usersha1-profile.tsv
To keep things simple for this tutorial, we will only use events (the plays table) to create the model. We will also trim the dataset down to
Unfortunately each of these tab separated files don't have a header (which is required by Shaped). To address this, we will prepend the header. As part of this step, we will also trim the dataset down to 100k samples. The command is as follows:
(echo "user_id\tartist_id\tartist_name\tplays"; head -n 100000 lastfm-dataset-360K/usersha1-artmbid-artname-plays.tsv) > lastfm-dataset-360K/user-artist-plays-100k.tsv
To keep things as simple as possible, this tutorial only uses events to create the model. If you want to use the user and item data as well, just carry out the steps below in the same way. You can see how that's done in the notebook for this tutorial.
Additionally, to save time for this tutorial, we trim the dataset down to the first 100k events. The entire LastFM dataset has ~17 million events!
See for yourself with wc -l lastfm-dataset-360K/usersha1-artmbid-artname-plays.tsv
.
Create LastFM Shaped dataset​
For this tutorial we're going to be creating a Shaped Dataset and inserting the plays
records into it. To create this dataset, we have a concise cli command called
create-dataset-from-uri
. This command will create a dataset and then insert the
records in for you.
shaped create-dataset-from-uri --name lastfm_plays --path lastfm-dataset-360K/user-artist-plays-100k.tsv --type tsv
You'll see the records uploading in batches of 1000, once it has reached 100k records you can move forward.
Create your model​
We're now ready to create your Shaped model! To keep things simple, today, we're using the plays records to build a collaborative filtering model. Shaped will use these plays to determine which users like which artist with the assumption that the more plays the artist has the more likely a user likes the artist.
Here's the create model definition we'll be using, and the corresponding
create-model
command.
model:
name: lastfm_artist_recommendations
connectors:
- type: Dataset
id: lastfm_plays
name: lastfm_plays
fetch:
events: |
SELECT user_id, artist_id AS item_id, 0 AS created_at, plays AS label
FROM lastfm_plays
shaped create-model --file lastfm_artist_recommendations.yaml
For further details about creating models please refer to the Create Model API reference.
Inspect your model​
Your recommendation model can take up to a few hours to provision your infrastructure and train on your historic events. This time mostly depends on how large your dataset is i.e. the volume of your users, items and interactions and the number of attributes you're providing.
While the model is being setup, you can view it's status with either the List Models or View Model endpoints. For example, with the CLI:
shaped list-models
Response:
[
"models": {
"created_at": "2024-05-15T08:55:23 UTC",
"model_name": "lastfm_artist_recommendations",
"model_uri": "https://api.shaped.ai/v1/models/lastfm_artist_recommendations",
"status": "FETCHING",
}
]
As you see the model is currently fetching the data. The initial model creation pipeline goes through the following stages in order:
SCHEDULING
FETCHING
TUNING
TRAINING
DEPLOYING
ACTIVE
You can periodically poll Shaped to inspect these status changes. Once it's in the
ACTIVE
state, you can move to next step and use it to make rank requests.
Fetch your recommendations​
You're now ready to fetch your artist recommendations. You can do this with the Rank endpoint, just provide the user_id you wish to get the recommendations for and the number of recommendations you want returned.
Shaped's CLI provides a convenience rank command to quickly retrieve results from the command line. You can use it as follows:
shaped rank --model-name lastfm_artist_recommendations --user-id 00000c289a1829a808ac09c00daf10bc3c4e223b --limit 5
Response:
{
"ids":[
"67e344da-ec54-4e26-b2a4-8351d744a14c",
"b7ffd2af-418f-4be2-bdd1-22f8b48613da",
"a74b1b7f-71a5-4011-9441-d0b5e4122711",
"e7c2d42e-b045-41b6-a391-88f4ea545185",
"f2fddf9f-02fd-421a-b5e8-75a3988309ab"
],
"scores":[
1.0,
0.43973369,
0.37249291,
0.3511156,
0.33543342
],
}
The response returns 2 parallel arrays containing the ids and ranking scores for the artists that Shaped estimates are most interesting to the given user.
If you want to integrate this endpoint into your website or application you can use the Rank POST REST endpoint directly with the following request:
curl https://api.prod.shaped.ai/v1/models/lastfm_artist_recommendations/rank \
-H "x-api-key: <API_KEY>" \
-H "Content-Type: application/json" \
-d '{
"user_id": "00000c289a1829a808ac09c00daf10bc3c4e223b",
"limit": 5
}'
Clean Up​
Don't forget to delete your dataset and model once you've finished with it, you can do it with the following CLI command:
shaped delete-model --model-name lastfm_artist_recommendations
shaped delete-dataset --dataset-name lastfm_plays