Game Recommendations Using Steam Reviews

In this tutorial, we'll show you how to set up a recommendation model for the Steam Australian Reviews dataset using Shaped. We'll focus on the australian_user_reviews.json.gz file, which contains reviews from Australian Steam users, including whether they recommended the games they reviewed.

With Shaped, we'll learn a recommendation model that can predict which games each user is most likely to enjoy based on their review history.

This tutorial uses Shaped's local dataset connector, but the same steps translate to any of the data stores or real-time connectors we support.

Let's get started! 🚀

You can follow along in our accompanying notebook!

Shaped CLI Setup

Installing the Shaped CLI

You'll need to install the Shaped CLI if you haven't already. You can do this with the following command:

pip install shaped

info

Shaped supports Python 3.8+. Take a look at the installation instructions if you need to install pip.

Initialize the CLI

You can then initialize the shaped client with your API key. If you don't have an API key yet, check out the How to get an API key page.

shaped init --api-key <YOUR_API_KEY>

Dataset Preparation

Download public dataset

To start off, let's fetch the Steam Australian User Reviews dataset we'll be training our model with.

CLI

wget https://mcauleylab.ucsd.edu/public_datasets/data/steam/australian_user_reviews.json.gz --no-check-certificate

Parse and prepare the data

The Steam reviews dataset is stored as a gzip-compressed file (.json.gz) with a nested structure: each record contains a user_id and an array of the reviews that user has written for different games. We need to flatten this so that each row represents a single user-game review.
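
To make the target shape concrete, here is a minimal sketch of the flattening on a single record. The sample record is hypothetical (its IDs and values are made up for illustration), and flatten_user is a helper name introduced only for this example; the full script in this section also handles the review date.

```python
# Hypothetical sample record mirroring the nested layout described above;
# the IDs and review values are made up for illustration.
sample_user = {
    "user_id": "example_user",
    "reviews": [
        {"item_id": "1250", "recommend": True, "posted": "Posted November 5, 2011."},
        {"item_id": "22200", "recommend": False, "posted": "Posted July 15, 2011."},
    ],
}


def flatten_user(record):
    """Turn one nested user record into one flat row per review."""
    rows = []
    for review in record.get("reviews", []):
        rows.append(
            {
                "user_id": record.get("user_id"),
                "item_id": review.get("item_id"),
                # Store the boolean recommendation as 1/0
                "recommend": 1 if review.get("recommend") else 0,
            }
        )
    return rows


flat_rows = flatten_user(sample_user)
for row in flat_rows:
    print(row)
```

Each review becomes its own row keyed by user_id and item_id, which is the event-style layout the rest of the tutorial works with.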

Let's parse this data and create a clean TSV file that we can use with Shaped:

Python
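import ast
import gzip
import re
from datetime import datetime

import pandas as pd

def parse(path):
    """Yield one record per line of the gzip-compressed file."""
    # The records are Python dict literals rather than strict JSON, so
    # ast.literal_eval is used as a safe replacement for eval().
    with gzip.open(path, 'rt', encoding='utf-8') as g:
        for line in g:
            yield ast.literal_eval(line)

def read_data(path):
    """Read all records from the compressed file into a list."""
    return list(parse(path))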

# Read the compressed dataset
users_reviews = read_data('australian_user_reviews.json.gz')

def parse_reviews_data(json_data):
    """Extract structured data from the reviews JSON."""
    cleaned_reviews = []
    
    for user_data in json_data:
        user_id = user_data.get('user_id')
        
        # Process each review for this user
        if 'reviews' in user_data and isinstance(user_data['reviews'], list):
            for review in user_data['reviews']:
                # Extract needed fields
                item_id = review.get('item_id')
                recommend = 1 if review.get('recommend', False) else 0
                
                # Parse the posted date if available
                posted_date = review.get('posted', '')
                # Extract date from string like 'Posted November 5, 2011.'
                date_match = re.search(r'Posted (\w+ \d+, \d{4})', posted_date)
                
                if date_match:
                    try:
                        # Parse the date string to a datetime object
                        date_str = date_match.group(1)
                        date_obj = datetime.strptime(date_str, '%B %d, %Y')
                        # Convert to YYYY-MM-DD format
                        created_at = date_obj.strftime('%Y-%m-%d')
                    except ValueError:
                        # Use a default date if parsing fails
                        created_at = '2000-01-01'
                else:
                    created_at = '2000-01-01'
                
                # Create clean review record
                clean_review = {
                    'user_id': user_id,
                    'item_id': item_id,
                    'created_at': created_at,
                    'recommend': recommend
                }
                
                cleaned_reviews.append(clean_review)
    
    return cleaned_reviews

# Process the reviews data
cleaned_reviews = parse_reviews_data(users_reviews)

# Convert cleaned reviews to a DataFrame
df = pd.DataFrame(cleaned_reviews)
print(f"Dataset shape: {df.shape}")
print("Sample data:")
print(df.head())

# Save as TSV (the contents are tab-separated even though the file keeps a .csv extension)
csv_file_path = 'user_reviews.csv'
df.to_csv(csv_file_path, sep='\t', index=False)
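
Before uploading, it can be worth a quick look at the file that was just written. The following is a minimal sketch using only the standard library; summarize_tsv, EXPECTED_COLUMNS, and PLACEHOLDER_DATE are names introduced here for illustration, with the column names and the fallback date taken from the script above.

```python
import csv

# Column names taken from the DataFrame built above.
EXPECTED_COLUMNS = ["user_id", "item_id", "created_at", "recommend"]
# Fallback date the parsing script assigns when no date could be parsed.
PLACEHOLDER_DATE = "2000-01-01"


def summarize_tsv(path):
    """Return the header, row count, and number of placeholder dates in a TSV file."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        header = list(reader.fieldnames or [])
        total = 0
        placeholders = 0
        for row in reader:
            total += 1
            if row.get("created_at") == PLACEHOLDER_DATE:
                placeholders += 1
    return {"header": header, "rows": total, "placeholder_dates": placeholders}
```

Calling summarize_tsv('user_reviews.csv') and comparing the returned header against EXPECTED_COLUMNS confirms the layout. A large placeholder_dates count would suggest that many reviews had a date the regular expression did not match, for example one without a year.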

Create Steam Reviews Dataset in Shaped

Now that we have our data prepared, we'll create a Shaped Dataset and upload our processed data to it. First, let's define the schema and save it into a YAML file:

steam_review_events_schema.yaml
name: steam_review_events
schema_type: CUSTOM
column_schema:
  user_id: String
  item_id: String
  created_at: DateTime
  recommend: Int32

Next, create the dataset using the Shaped CLI:

shaped create-dataset --file steam_review_events_schema.yaml

Then, insert the data into the dataset:

shaped dataset-insert --dataset-name steam_review_events --file user_reviews.csv --type 'tsv'

You can check if the dataset was created successfully:

shaped list-datasets

Create your model

We're now ready to create our recommendation model! Shaped will use the review data to learn which users like which games, based on whether they recommended the game in their review. By default, Shaped automatically selects the policy and hyperparameters for your model.

Here's the model definition we'll be using. As before, we save it into a YAML file:

steam_review_model_schema.yaml
model:
  name: steam_review_game_recommendations
connectors:
  - type: Dataset
    id: steam_review_events
    name: steam_review_events
fetch:
  events: SELECT user_id, item_id, created_at, recommend AS label FROM steam_review_events

Create the model using the Shaped CLI:

shaped create-model --file steam_review_model_schema.yaml

Inspect your model

Check the status of your model:

shaped list-models

Your recommendation model can take up to a few hours to provision its infrastructure and train on your historic events. How long this takes depends mostly on the size of your dataset, i.e., the number of users, items, and interactions, and the number of attributes you're providing.

The initial model creation goes through the following stages in order:

SCHEDULING
FETCHING
TUNING
TRAINING
DEPLOYING
ACTIVE

You can periodically poll Shaped to inspect these status changes. Once it's in the ACTIVE state, you can move to the next step and use it to make rank requests.
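
The polling described above can be sketched as a small loop. This is only an illustration under stated assumptions: fetch_status is a hypothetical callable that you supply and that returns the model's current status string (for example by wrapping shaped list-models), and the poll interval and retry budget are arbitrary defaults, not values recommended by Shaped.

```python
import time

# Stage names copied from the list above.
STAGES = ["SCHEDULING", "FETCHING", "TUNING", "TRAINING", "DEPLOYING", "ACTIVE"]


def wait_until_active(fetch_status, poll_seconds=60.0, max_polls=240, sleep=time.sleep):
    """Poll fetch_status() until it returns ACTIVE; return the distinct statuses seen in order.

    fetch_status is a caller-supplied (hypothetical) function returning the
    current status string of the model.
    """
    seen = []
    for _ in range(max_polls):
        status = fetch_status()
        # Record a status only when it changes
        if not seen or seen[-1] != status:
            seen.append(status)
        if status == "ACTIVE":
            return seen
        # Stop early on anything outside the documented stages
        if status not in STAGES:
            raise RuntimeError(f"Unexpected model status: {status!r}")
        sleep(poll_seconds)
    raise TimeoutError("Model did not become ACTIVE within the polling budget")
```

Passing the sleep function as a parameter keeps the loop easy to exercise without real waiting. Treating any status outside the listed stages as an error is a design choice of this sketch.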

Fetch your recommendations

You're now ready to fetch your game recommendations! You can do this with the Rank endpoint: provide the user_id you want recommendations for and the number of recommendations you want returned.

Shaped's CLI provides a convenient rank command to quickly retrieve results from the command line:

shaped rank --model-name steam_review_game_recommendations --user-id '76561197970982479' --limit 5

Response:

{
   "ids":[
      "219150",
      "245550",
      "620",
      "440",
      "8930"
   ],
   "scores":[
      0.944791813545478,
      0.9243345560353259,
      0.9136819097511378,
      0.8999791870543353,
      0.8831670564757734
   ]
}

The response returns two parallel arrays containing the ids and ranking scores for the games that Shaped estimates are most interesting to the given user.
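
Since the two arrays are parallel, pairing them by position gives one (id, score) tuple per recommended game. Here is a minimal sketch using the response values from the example above; pair_recommendations is a helper name introduced only for this illustration.

```python
import json

# Response values copied from the example above.
response_text = """
{
  "ids": ["219150", "245550", "620", "440", "8930"],
  "scores": [0.944791813545478, 0.9243345560353259, 0.9136819097511378,
             0.8999791870543353, 0.8831670564757734]
}
"""


def pair_recommendations(response):
    """Zip the parallel ids and scores arrays into (id, score) tuples."""
    ids = response["ids"]
    scores = response["scores"]
    if len(ids) != len(scores):
        raise ValueError("ids and scores must have the same length")
    return list(zip(ids, scores))


recommendations = pair_recommendations(json.loads(response_text))
for rank, (game_id, score) in enumerate(recommendations, start=1):
    print(f"{rank}. game {game_id} (score {score:.3f})")
```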

If you want to integrate this endpoint into your website or application, you can use the Rank POST REST endpoint directly:

curl https://api.shaped.ai/v1/models/steam_review_game_recommendations/rank \
  -X POST \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"user_id": "76561197970982479", "limit": 5 }'
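
If your application is written in Python, the same request can be assembled with the standard library. This sketch mirrors the curl call above (same URL, headers, and JSON body); build_rank_request and fetch_rank are helper names introduced here for illustration, and the 10-second timeout is an arbitrary assumption.

```python
import json
import urllib.request


def build_rank_request(api_key, model_name, user_id, limit=5):
    """Build a POST request equivalent to the curl example above."""
    url = f"https://api.shaped.ai/v1/models/{model_name}/rank"
    body = json.dumps({"user_id": user_id, "limit": limit}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
    )


def fetch_rank(request, timeout=10):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, fetch_rank(build_rank_request("YOUR_API_KEY", "steam_review_game_recommendations", "76561197970982479")) sends the request. Keeping request construction separate from sending makes the payload easy to inspect without touching the network.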

Clean Up

Don't forget to delete your model (and its assets) and the dataset once you're finished with them. You can do it with the following CLI commands:

shaped delete-model --model-name steam_review_game_recommendations
shaped delete-dataset --dataset-name steam_review_events
