Game Recommendations (Steam Reviews)

This tutorial demonstrates how to configure a recommendation engine using the Steam Australian Reviews dataset, specifically the australian_user_reviews.json.gz file. The dataset contains reviews from Australian Steam users, including whether they recommended the games they reviewed.

Accompanying notebook

CLI Setup

Install the CLI

pip install shaped

Note: Shaped supports Python 3.8 to 3.11. See the installation instructions if you need to install pip.

Initialize the CLI

shaped init --api-key <YOUR_API_KEY>

If you don't have an API key, see How to get an API key.

Data Preparation

Download the dataset

CLI
wget https://mcauleylab.ucsd.edu/public_datasets/data/steam/australian_user_reviews.json.gz --no-check-certificate

Parse and prepare the data

The dataset is a gzip-compressed file with one nested record per line. Each record contains a user_id and an array of reviews. Flatten this structure so that each row represents a single user-game review:

Python
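import pandas as pd
import ast
import gzip
import re
from datetime import datetime

def parse(path):
    """Yield one parsed record per line of the compressed file."""
    # Each line is parsed as a Python literal. ast.literal_eval only accepts
    # literals, so it cannot execute arbitrary code the way eval() can.
    with gzip.open(path, 'rt', encoding='utf-8') as g:
        for line in g:
            yield ast.literal_eval(line)

def read_data(path):
    """Read all records from the compressed file into a list."""
    return list(parse(path))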

# Read the compressed dataset
users_reviews = read_data('australian_user_reviews.json.gz')

def parse_reviews_data(json_data):
    """Extract structured data from the reviews JSON."""
    cleaned_reviews = []
    
    for user_data in json_data:
        user_id = user_data.get('user_id')
        
        # Process each review for this user
        if 'reviews' in user_data and isinstance(user_data['reviews'], list):
            for review in user_data['reviews']:
                # Extract needed fields
                item_id = review.get('item_id')
                recommend = 1 if review.get('recommend', False) else 0
                
                # Parse the posted date if available
                posted_date = review.get('posted', '')
                # Extract date from string like 'Posted November 5, 2011.'
                date_match = re.search(r'Posted (\w+ \d+, \d{4})', posted_date)
                
                if date_match:
                    try:
                        # Parse the date string to a datetime object
                        date_str = date_match.group(1)
                        date_obj = datetime.strptime(date_str, '%B %d, %Y')
                        # Convert to YYYY-MM-DD format
                        created_at = date_obj.strftime('%Y-%m-%d')
                    except ValueError:
                        # Use a default date if parsing fails
                        created_at = '2000-01-01'
                else:
                    created_at = '2000-01-01'
                
                # Create clean review record
                clean_review = {
                    'user_id': user_id,
                    'item_id': item_id,
                    'created_at': created_at,
                    'recommend': recommend
                }
                
                cleaned_reviews.append(clean_review)
    
    return cleaned_reviews

# Process the reviews data
cleaned_reviews = parse_reviews_data(users_reviews)

# Convert cleaned reviews to a DataFrame
df = pd.DataFrame(cleaned_reviews)
print(f"Table shape: {df.shape}")
print("Sample data:")
print(df.head())

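# Save as tab-separated values. The .csv file name is kept because the
# table-insert command later in this tutorial refers to it.
tsv_file_path = 'user_reviews.csv'
df.to_csv(tsv_file_path, sep='\t', index=False)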
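
To make the flattening and the date fallback concrete, here is a minimal sketch that applies the same rules to a single record. The record is hypothetical (its values are invented for illustration), and the helper name flatten is not part of the tutorial script.

```python
import re
from datetime import datetime

# Hypothetical record shaped like one line of the dataset (values invented).
sample_user = {
    "user_id": "example_user",
    "reviews": [
        {"item_id": "440", "posted": "Posted November 5, 2011.", "recommend": True},
        {"item_id": "620", "posted": "Posted July 3.", "recommend": False},
    ],
}

DATE_PATTERN = re.compile(r"Posted (\w+ \d+, \d{4})")
FALLBACK_DATE = "2000-01-01"  # same default the tutorial script uses


def flatten(user):
    """Return one flat row per review for a single user record."""
    rows = []
    for review in user.get("reviews", []):
        match = DATE_PATTERN.search(review.get("posted", ""))
        if match:
            parsed = datetime.strptime(match.group(1), "%B %d, %Y")
            created_at = parsed.strftime("%Y-%m-%d")
        else:
            # No "Month day, year" pattern found, so use the fallback date.
            created_at = FALLBACK_DATE
        rows.append({
            "user_id": user.get("user_id"),
            "item_id": review.get("item_id"),
            "created_at": created_at,
            "recommend": 1 if review.get("recommend", False) else 0,
        })
    return rows


rows = flatten(sample_user)
for row in rows:
    print(row)
```

The second review has no year in its posted string, so it falls back to 2000-01-01. Rows with the fallback date carry no real timestamp, which is worth keeping in mind if the timestamp is later used for ordering or splitting.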

Create the table

Define the table schema:

steam_review_events_schema.yaml
name: steam_review_events
schema_type: CUSTOM
column_schema:
  user_id: String
  item_id: String
  created_at: DateTime
  recommend: Int32
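
Before inserting data, it can help to confirm that the file header and values line up with this schema. The sketch below is one way to do that with the standard library. The function name check_rows and the specific checks are assumptions made for illustration: they test only the YYYY-MM-DD date format and the 0/1 labels that the preparation script produces, not everything the platform may accept for DateTime or Int32.

```python
import csv
import io
from datetime import datetime

# Column names and types copied from steam_review_events_schema.yaml.
SCHEMA = {
    "user_id": "String",
    "item_id": "String",
    "created_at": "DateTime",
    "recommend": "Int32",
}


def check_rows(tsv_file):
    """Return a list of problems found in a tab-separated file-like object."""
    reader = csv.DictReader(tsv_file, delimiter="\t")
    if reader.fieldnames != list(SCHEMA):
        return [f"header {reader.fieldnames} does not match {list(SCHEMA)}"]
    problems = []
    # Data starts on line 2 because line 1 is the header.
    for line_no, row in enumerate(reader, start=2):
        try:
            datetime.strptime(row["created_at"], "%Y-%m-%d")
        except ValueError:
            problems.append(f"line {line_no}: bad created_at {row['created_at']!r}")
        if row["recommend"] not in ("0", "1"):
            problems.append(f"line {line_no}: bad recommend {row['recommend']!r}")
    return problems


# Small in-memory sample. The second data row has a deliberately broken date.
sample = io.StringIO(
    "user_id\titem_id\tcreated_at\trecommend\n"
    "u1\t440\t2011-11-05\t1\n"
    "u2\t620\tnot-a-date\t1\n"
)
problems = check_rows(sample)
print(problems)
```

For the real file, pass open('user_reviews.csv', newline='') instead of the in-memory sample.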

Create the table:

shaped create-table --file steam_review_events_schema.yaml

Insert data:

shaped table-insert --table-name steam_review_events --file user_reviews.csv --type 'tsv'

Verify table creation:

shaped list-tables

Create the engine

This example uses review data to build a collaborative filtering engine. The configuration below names an ALS model and sets no hyperparameters, leaving them for the system to select automatically. The interaction signal is the recommendation status (whether the user recommended the game), which the query exposes as the label column.
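
For readers unfamiliar with ALS (alternating least squares), the following toy sketch shows the core idea on a tiny user-by-game matrix: fix the item factors and solve for the user factors in closed form, then swap. It is a rank-1, pure-Python illustration under invented data and an illustrative regularization value. It is not Shaped's implementation, which is not described in this tutorial.

```python
# Toy user-by-game matrix: 1 = recommended, 0 = not recommended (invented values).
R = [
    [1, 1, 0],
    [1, 0, 0],
    [0, 1, 1],
    [1, 1, 1],
]
LAMBDA = 0.1  # L2 regularization strength (illustrative choice)


def loss(R, u, v, lam):
    """Squared reconstruction error plus L2 penalty for rank-1 factors u, v."""
    err = sum(
        (R[i][j] - u[i] * v[j]) ** 2
        for i in range(len(u))
        for j in range(len(v))
    )
    reg = lam * (sum(x * x for x in u) + sum(x * x for x in v))
    return err + reg


def als_rank1(R, lam, iters=10):
    """Alternate closed-form least-squares updates for user and item factors."""
    n_users, n_items = len(R), len(R[0])
    u = [1.0] * n_users
    v = [1.0] * n_items
    history = [loss(R, u, v, lam)]
    for _ in range(iters):
        # Fix item factors v and solve for each user factor.
        denom = sum(x * x for x in v) + lam
        u = [sum(R[i][j] * v[j] for j in range(n_items)) / denom for i in range(n_users)]
        # Fix user factors u and solve for each item factor.
        denom = sum(x * x for x in u) + lam
        v = [sum(R[i][j] * u[i] for i in range(n_users)) / denom for j in range(n_items)]
        history.append(loss(R, u, v, lam))
    return u, v, history


u, v, history = als_rank1(R, LAMBDA)
print("loss before:", round(history[0], 4))
print("loss after: ", round(history[-1], 4))
```

Each half-step exactly minimizes the objective over one set of factors, so the loss never increases from one iteration to the next.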

Engine configuration:

steam_review_engine_schema.yaml
data:
  interaction_table:
    type: query
    query: |
      SELECT user_id, item_id, created_at, recommend AS label FROM steam_review_events
training:
  models:
    - name: als
      policy_type: als

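Create the engine (the query and clean-up commands later in this tutorial assume it is named steam_review_game_recommendations):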

shaped create-engine --file steam_review_engine_schema.yaml

Monitor engine status

Check engine status:

shaped list-engines

Engine creation and training can take several hours, depending on data volume and attributes. The engine progresses through these stages:

SCHEDULING
FETCHING
TRAINING
DEPLOYING
ACTIVE

Once the status is ACTIVE, the engine is ready for queries.
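
Since training can take hours, polling may be more convenient than checking by hand. The sketch below shows a generic polling loop over the stages listed above. The get_status callable is hypothetical: how the status is actually fetched (for example by parsing the output of shaped list-engines) is left to the reader, because that output format is not shown in this tutorial.

```python
import time

# Stages as listed in this tutorial, in order.
STAGES = ["SCHEDULING", "FETCHING", "TRAINING", "DEPLOYING", "ACTIVE"]


def wait_until_active(get_status, poll_seconds=60.0, timeout_seconds=6 * 3600,
                      sleep=time.sleep, clock=time.monotonic):
    """Poll get_status() until it returns ACTIVE, or fail on timeout.

    get_status is a hypothetical zero-argument callable that returns the
    current stage name as a string.
    """
    deadline = clock() + timeout_seconds
    while True:
        status = get_status()
        if status == "ACTIVE":
            return status
        if status not in STAGES:
            # Anything outside the documented stages is treated as a failure.
            raise RuntimeError(f"unexpected engine status: {status!r}")
        if clock() >= deadline:
            raise TimeoutError(f"engine still {status} after {timeout_seconds}s")
        sleep(poll_seconds)


# Demonstration with a fake status source and no real sleeping.
fake_statuses = iter(["SCHEDULING", "FETCHING", "TRAINING", "DEPLOYING", "ACTIVE"])
final = wait_until_active(lambda: next(fake_statuses), sleep=lambda seconds: None)
print(final)
```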

Query recommendations

Query recommendations using the Query endpoint. Provide a user_id and the number of results to return.

Using the CLI:

shaped query --engine-name steam_review_game_recommendations \
  --query "SELECT * FROM similarity(embedding_ref='als', limit=50, encoder='precomputed_user', input_user_id='\$user_id') LIMIT 5" \
  --parameters '{"user_id": "76561197970982479"}'

Response:

{
   "results": [
      {
         "id": "219150",
         "score": 0.944791813545478
      },
      {
         "id": "245550",
         "score": 0.9243345560353259
      },
      {
         "id": "620",
         "score": 0.9136819097511378
      },
      {
         "id": "440",
         "score": 0.8999791870543353
      },
      {
         "id": "8930",
         "score": 0.8831670564757734
      }
   ]
}

The response contains an array of result objects with game IDs and scores.
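
As a small illustration of consuming this response, the sketch below parses the sample shown above and pulls out the game IDs in descending score order. It assumes only the structure visible in that sample (a results array of objects with id and score).

```python
import json

# The sample response from this tutorial, embedded as a string.
response_text = """
{
   "results": [
      {"id": "219150", "score": 0.944791813545478},
      {"id": "245550", "score": 0.9243345560353259},
      {"id": "620", "score": 0.9136819097511378},
      {"id": "440", "score": 0.8999791870543353},
      {"id": "8930", "score": 0.8831670564757734}
   ]
}
"""

results = json.loads(response_text)["results"]

# Sort defensively by score rather than relying on the order received.
ranked = sorted(results, key=lambda r: r["score"], reverse=True)
top_ids = [r["id"] for r in ranked]

for rank, result in enumerate(ranked, start=1):
    print(f"{rank}. game {result['id']} (score {result['score']:.3f})")
```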

Using the REST API:

curl https://api.shaped.ai/v2/engines/steam_review_game_recommendations/query \
  -X POST \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "SELECT * FROM similarity(embedding_ref='\''als'\'', limit=50, encoder='\''precomputed_user'\'', input_user_id='\''$user_id'\'') LIMIT 5",
    "parameters": {
      "user_id": "76561197970982479"
    }
  }'
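
The same request can be built in Python, where the single quotes inside the query need no shell escaping. This is a minimal sketch using only the standard library. The URL, header names, and payload are copied from the curl example above, and build_query_request is a hypothetical helper name. The request is constructed but not sent.

```python
import json
import urllib.request

API_URL = "https://api.shaped.ai/v2/engines/steam_review_game_recommendations/query"
QUERY = (
    "SELECT * FROM similarity(embedding_ref='als', limit=50, "
    "encoder='precomputed_user', input_user_id='$user_id') LIMIT 5"
)


def build_query_request(api_key, user_id):
    """Build (but do not send) the POST request from the curl example."""
    payload = {"query": QUERY, "parameters": {"user_id": user_id}}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


request = build_query_request("YOUR_API_KEY", "76561197970982479")
print(request.get_method(), request.full_url)
print(json.loads(request.data)["query"])

# To send it (requires a valid API key and an ACTIVE engine):
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```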

Clean up

Delete the engine and table when finished:

shaped delete-engine --engine-name steam_review_game_recommendations
shaped delete-table --table-name steam_review_events
