Music Recommendations (LastFM)

In this tutorial we'll show you how to set up a recommendation model for the LastFM-360k dataset using Shaped. This dataset contains listening data (number of plays) for ~360,000 users on ~160,000 artists. With Shaped we'll train a recommendation model that predicts which artists each user is most likely to want to listen to.

This tutorial uses Shaped's local dataset connector, but you can easily adapt it to any of the data stores or real-time connectors we support.

Let's get started! 🚀

You can follow along in our accompanying notebook!

Shaped CLI Setup

Installing the Shaped CLI

You'll need to install the Shaped CLI if you haven't already. You can do this with the following command:

pip install shaped

Shaped supports Python 3.8+. Take a look at the installation instructions if you need to install pip.

Initialize the CLI

You can then initialize the shaped client with your API key. If you don't have an API key yet, check out the How to get an API key page.

shaped init --api-key <YOUR_API_KEY>

Dataset Preparation

Download public dataset

To start off, let's fetch the publicly hosted LastFM dataset we'll be training our model with. (NOTE: this step can take ~10 minutes)

CLI
curl http://mtg.upf.edu/static/datasets/last.fm/lastfm-dataset-360K.tar.gz -o lastfm-dataset-360K.tar.gz
tar -xzf lastfm-dataset-360K.tar.gz

Taking a look at the downloaded dataset, there are two tab-separated files (TSVs) of interest:

  • plays which are stored in lastfm-dataset-360K/usersha1-artmbid-artname-plays.tsv
  • users which are stored in lastfm-dataset-360K/usersha1-profile.tsv

To keep things simple for this tutorial, we will only use events (the plays table) to create the model.

Unfortunately, neither of these tab-separated files has a header row, which Shaped requires. To address this, we will prepend the header. As part of this step, we will also trim the dataset down to the first 100k rows. The command is as follows:

(printf 'user_id\tartist_id\tartist_name\tplays\n'; head -n 100000 lastfm-dataset-360K/usersha1-artmbid-artname-plays.tsv) > lastfm-dataset-360K/user-artist-plays-100k.tsv

To keep things as simple as possible, this tutorial only uses events to create the model. If you want to use the user and item data as well, just carry out the steps below in the same way. You can see how that's done in the notebook for this tutorial.

Additionally, to save time for this tutorial, we trim the dataset down to the first 100k events. The entire LastFM dataset has ~17 million events! See for yourself with wc -l lastfm-dataset-360K/usersha1-artmbid-artname-plays.tsv.
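If you would rather do the header-and-trim step in Python (for example on a system without head), the following is a minimal sketch of the same idea. The function name write_trimmed_tsv and its parameters are hypothetical and not part of Shaped; the column names are the ones used in the command above.

```python
import itertools

# Column names taken from the header used in the shell command above.
HEADER = ["user_id", "artist_id", "artist_name", "plays"]


def write_trimmed_tsv(src, dst, header=HEADER, limit=100_000):
    """Write a tab-separated header row, then the first `limit` lines of `src`.

    Files are handled as bytes so artist names are copied through unchanged,
    whatever their encoding. Returns the number of data lines written.
    """
    written = 0
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        fout.write("\t".join(header).encode("utf-8") + b"\n")
        for line in itertools.islice(fin, limit):
            fout.write(line)
            written += 1
    return written


# Example call (paths as used in this tutorial):
# write_trimmed_tsv(
#     "lastfm-dataset-360K/usersha1-artmbid-artname-plays.tsv",
#     "lastfm-dataset-360K/user-artist-plays-100k.tsv",
# )
```

Reading with itertools.islice stops after the requested number of lines, so the ~17 million row file is never loaded into memory.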

Create LastFM Shaped dataset

For this tutorial we're going to create a Shaped Dataset and insert the plays records into it. The CLI provides a concise command for this, create-dataset-from-uri, which creates the dataset and then inserts the records for you.

CLI
shaped create-dataset-from-uri --name lastfm_plays --path lastfm-dataset-360K/user-artist-plays-100k.tsv --type tsv

You'll see the records uploading in batches of 1000. Once the upload has reached 100k records, you can move forward.

Create your model

We're now ready to create your Shaped model! To keep things simple, we'll use the plays records to build a collaborative filtering model. Shaped will use these plays to determine which users like which artists, on the assumption that the more often a user has played an artist, the more likely they are to like that artist.

Here's the model definition we'll be using, and the corresponding create-model command.

lastfm_artist_recommendations.yaml
model:
  name: lastfm_artist_recommendations
connectors:
  - type: Dataset
    id: lastfm_plays
    name: lastfm_plays
fetch:
  events: |
    SELECT user_id, artist_id AS item_id, 0 AS created_at, plays AS label
    FROM lastfm_plays

shaped create-model --file lastfm_artist_recommendations.yaml
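To make the events query concrete, the sketch below applies the same column mapping to rows of the prepared TSV in plain Python. It only illustrates what the SELECT produces for each row, not how Shaped executes the query; read_events and row_to_event are hypothetical helper names.

```python
import csv


def row_to_event(row):
    """Mirror the SELECT in the model definition for one parsed TSV row."""
    return {
        "user_id": row["user_id"],
        "item_id": row["artist_id"],  # artist_id AS item_id
        "created_at": 0,              # 0 AS created_at (constant, as in the query)
        "label": int(row["plays"]),   # plays AS label
    }


def read_events(fileobj):
    """Yield one event dict per data row of a headered, tab-separated file."""
    # QUOTE_NONE keeps quote characters inside artist names as literal text.
    reader = csv.DictReader(fileobj, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        yield row_to_event(row)


# Example call (open with newline="" as the csv module recommends):
# with open("lastfm-dataset-360K/user-artist-plays-100k.tsv", newline="") as f:
#     first_event = next(read_events(f))
```

Note that artist_name is dropped by the mapping, just as it is absent from the SELECT list.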

For further details about creating models please refer to the Create Model API reference.

Inspect your model

It can take up to a few hours for Shaped to provision infrastructure and train your recommendation model on your historic events. This time mostly depends on how large your dataset is, i.e. the number of users, items, and interactions, and the number of attributes you're providing.

While the model is being set up, you can view its status with either the List Models or View Model endpoints. For example, with the CLI:

shaped list-models

Response:

{
  "models": [
    {
      "created_at": "2024-05-15T08:55:23 UTC",
      "model_name": "lastfm_artist_recommendations",
      "model_uri": "https://api.shaped.ai/v1/models/lastfm_artist_recommendations",
      "status": "FETCHING"
    }
  ]
}

As you can see, the model is currently fetching the data. The initial model creation pipeline goes through the following stages in order:

  1. SCHEDULING
  2. FETCHING
  3. TUNING
  4. TRAINING
  5. DEPLOYING
  6. ACTIVE

You can periodically poll Shaped to inspect these status changes. Once the model is in the ACTIVE state, you can move on to the next step and use it to make rank requests.
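The polling described above can be sketched as a small loop. This is a minimal sketch under stated assumptions: get_status is a hypothetical callable that you supply (for example, one that calls the View Model endpoint and returns the status string), and the poll interval, timeout, and the choice to raise on an unlisted status are arbitrary decisions made here, not Shaped behavior.

```python
import time

# Stages listed above, in pipeline order.
STAGES = ["SCHEDULING", "FETCHING", "TUNING", "TRAINING", "DEPLOYING", "ACTIVE"]


def wait_until_active(get_status, poll_seconds=60.0, timeout_seconds=4 * 3600,
                      sleep=time.sleep, clock=time.monotonic):
    """Call `get_status()` repeatedly until it returns "ACTIVE".

    `get_status` is a caller-supplied function returning the current status
    string. Raises RuntimeError for a status outside STAGES and TimeoutError
    if the model is not active before the deadline.
    """
    deadline = clock() + timeout_seconds
    while True:
        status = get_status()
        if status == "ACTIVE":
            return status
        if status not in STAGES:
            raise RuntimeError(f"unexpected model status: {status!r}")
        if clock() >= deadline:
            raise TimeoutError(f"model still {status} after {timeout_seconds}s")
        sleep(poll_seconds)
```

Passing sleep and clock as parameters keeps the loop easy to test without real waiting.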

Fetch your recommendations

You're now ready to fetch your artist recommendations. You can do this with the Rank endpoint: provide the user_id you want recommendations for and the number of recommendations you want returned.

Shaped's CLI provides a convenience rank command to quickly retrieve results from the command line. You can use it as follows:

shaped rank --model-name lastfm_artist_recommendations --user-id 00000c289a1829a808ac09c00daf10bc3c4e223b --limit 5

Response:

{
  "ids": [
    "67e344da-ec54-4e26-b2a4-8351d744a14c",
    "b7ffd2af-418f-4be2-bdd1-22f8b48613da",
    "a74b1b7f-71a5-4011-9441-d0b5e4122711",
    "e7c2d42e-b045-41b6-a391-88f4ea545185",
    "f2fddf9f-02fd-421a-b5e8-75a3988309ab"
  ],
  "scores": [
    1.0,
    0.43973369,
    0.37249291,
    0.3511156,
    0.33543342
  ]
}

The response contains two parallel arrays, ids and scores: the artist at position i in ids has the ranking score at position i in scores. Together they list the artists that Shaped estimates are most interesting to the given user.
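Because the two arrays are matched by position, a common first step is to pair them up. The following is a small sketch of that; pair_recommendations is a hypothetical helper name, and the length check is a defensive assumption rather than something the API is documented to require.

```python
def pair_recommendations(response):
    """Turn the parallel `ids` and `scores` arrays into (artist_id, score) pairs."""
    ids = response["ids"]
    scores = response["scores"]
    if len(ids) != len(scores):
        raise ValueError("ids and scores must have the same length")
    return list(zip(ids, scores))


# Shortened copy of the example response above.
example = {
    "ids": [
        "67e344da-ec54-4e26-b2a4-8351d744a14c",
        "b7ffd2af-418f-4be2-bdd1-22f8b48613da",
    ],
    "scores": [1.0, 0.43973369],
}
pairs = pair_recommendations(example)
```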

If you want to integrate this endpoint into your website or application you can use the Rank POST REST endpoint directly with the following request:

curl https://api.prod.shaped.ai/v1/models/lastfm_artist_recommendations/rank \
-H "x-api-key: <API_KEY>" \
-H "Content-Type: application/json" \
-d '{
"user_id": "00000c289a1829a808ac09c00daf10bc3c4e223b",
"limit": 5
}'
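If your application is written in Python, the same request can be built with the standard library. This is a sketch that mirrors the curl call above (same URL, headers, and JSON body); build_rank_request is a hypothetical helper name, and the response handling shown in the trailing comment is an assumption about how you might consume the result.

```python
import json
import urllib.request

# URL pattern copied from the curl example above.
RANK_URL = "https://api.prod.shaped.ai/v1/models/{model_name}/rank"


def build_rank_request(api_key, model_name, user_id, limit):
    """Build (but do not send) the POST request equivalent to the curl call."""
    body = json.dumps({"user_id": user_id, "limit": limit}).encode("utf-8")
    return urllib.request.Request(
        RANK_URL.format(model_name=model_name),
        data=body,
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


# To send the request and parse the JSON response:
# req = build_rank_request("<API_KEY>", "lastfm_artist_recommendations",
#                          "00000c289a1829a808ac09c00daf10bc3c4e223b", 5)
# with urllib.request.urlopen(req) as resp:
#     result = json.load(resp)
```

Separating request construction from sending makes it possible to inspect or log the request without making a network call.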

Clean Up

Don't forget to delete your model and dataset once you've finished with them. You can do this with the following CLI commands:

shaped delete-model --model-name lastfm_artist_recommendations
shaped delete-dataset --dataset-name lastfm_plays