Game Recommendations (Steam Reviews)

This tutorial demonstrates how to configure a recommendation engine using the Steam Australian Reviews dataset, specifically the australian_user_reviews.json.gz file. The dataset contains reviews from Australian Steam users, including whether they recommended the games they reviewed.

Accompanying notebook

CLI Setup

Install the CLI

pip install shaped

Shaped supports Python 3.8 to 3.11. See installation instructions if you need to install pip.

Initialize the CLI

shaped init --api-key <YOUR_API_KEY>

If you don't have an API key, see How to get an API key.

Data Preparation

Download the dataset

CLI
wget https://mcauleylab.ucsd.edu/public_datasets/data/steam/australian_user_reviews.json.gz --no-check-certificate

Parse and prepare the data

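The dataset is a gzip-compressed file with one user record per line and a nested structure. Each record contains a user_id and an array of reviews. The lines are written as Python dict literals rather than strict JSON, which is why the code below evaluates each line instead of calling json.loads. Transform the records into a flattened format where each row represents a single user-game review: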

Python
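import ast
import gzip
import re
from datetime import datetime

import pandas as pd

def parse(path):
    """Yield one record per line of the compressed file."""
    # ast.literal_eval only evaluates literals, so it is safer than eval
    # on downloaded data while still accepting Python dict syntax.
    with gzip.open(path, 'rt', encoding='utf-8') as g:
        for line in g:
            yield ast.literal_eval(line)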

def read_data(path):
    """Read all records from the compressed file into a list."""
    return list(parse(path))

# Read the compressed dataset
users_reviews = read_data('australian_user_reviews.json.gz')

def parse_reviews_data(json_data):
    """Flatten the nested records into one dict per user-game review."""
    cleaned_reviews = []

    for user_data in json_data:
        user_id = user_data.get('user_id')

        # Process each review for this user
        if 'reviews' in user_data and isinstance(user_data['reviews'], list):
            for review in user_data['reviews']:
                # Extract needed fields
                item_id = review.get('item_id')
                recommend = 1 if review.get('recommend', False) else 0

                # Parse the posted date if available
                posted_date = review.get('posted', '')
                # Extract date from string like 'Posted November 5, 2011.'
                date_match = re.search(r'Posted (\w+ \d+, \d{4})', posted_date)

                if date_match:
                    try:
                        # Parse the date string to a datetime object
                        date_str = date_match.group(1)
                        date_obj = datetime.strptime(date_str, '%B %d, %Y')
                        # Convert to YYYY-MM-DD format
                        created_at = date_obj.strftime('%Y-%m-%d')
                    except ValueError:
                        # Use a default date if parsing fails
                        created_at = '2000-01-01'
                else:
                    # No full date in the string: fall back to the default
                    created_at = '2000-01-01'

                # Create clean review record
                clean_review = {
                    'user_id': user_id,
                    'item_id': item_id,
                    'created_at': created_at,
                    'recommend': recommend
                }

                cleaned_reviews.append(clean_review)

    return cleaned_reviews

# Process the reviews data
cleaned_reviews = parse_reviews_data(users_reviews)

# Convert cleaned reviews to a DataFrame
df = pd.DataFrame(cleaned_reviews)
print(f"Table shape: {df.shape}")
print("Sample data:")
print(df.head())

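# Save as tab-separated values. The file keeps a .csv extension,
# but its contents are TSV (matching --type 'tsv' on insert).
csv_file_path = 'user_reviews.csv'
df.to_csv(csv_file_path, sep='\t', index=False)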
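
The date handling above can be tried in isolation. The following is a minimal sketch that lifts the same regular expression and fallback date into a standalone helper; the name posted_to_iso is hypothetical and not part of the tutorial code, and the year-less sample string is an assumed input chosen to show the fallback.

```python
import re
from datetime import datetime

# Same pattern as in parse_reviews_data: month name, day, four-digit year.
POSTED_RE = re.compile(r'Posted (\w+ \d+, \d{4})')
DEFAULT_DATE = '2000-01-01'  # fallback date used by the tutorial code

def posted_to_iso(posted):
    """Return YYYY-MM-DD for strings like 'Posted November 5, 2011.'.

    Hypothetical helper: returns DEFAULT_DATE when no full date is found.
    """
    match = POSTED_RE.search(posted or '')
    if not match:
        return DEFAULT_DATE
    try:
        parsed = datetime.strptime(match.group(1), '%B %d, %Y')
    except ValueError:
        # The text matched the pattern but is not a real date.
        return DEFAULT_DATE
    return parsed.strftime('%Y-%m-%d')

# Assumed sample inputs: one full date, one without a year, one empty.
for sample in ['Posted November 5, 2011.', 'Posted July 15.', '']:
    print(repr(sample), '->', posted_to_iso(sample))
```

Any review that falls back to 2000-01-01 will sort as the oldest interaction, so it may be worth counting how many rows carry the default date before uploading.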

Create the table

Define the table schema:

steam_review_events_schema.yaml
name: steam_review_events
schema_type: CUSTOM
column_schema:
  user_id: String
  item_id: String
  created_at: DateTime
  recommend: Int32
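
Before inserting data, it can help to confirm that the TSV header matches the column names in the schema. The sketch below is one way to do that with the standard library; header_matches is a hypothetical helper, and comparing the columns in order is a conservative assumption, since the source does not say whether column order matters on insert.

```python
import csv
import io

# Column names taken from steam_review_events_schema.yaml.
EXPECTED_COLUMNS = ['user_id', 'item_id', 'created_at', 'recommend']

def header_matches(tsv_file, expected=EXPECTED_COLUMNS):
    """Return True if the first row of a tab-separated file equals `expected`.

    Hypothetical helper; `tsv_file` is any open text file or file-like object.
    """
    reader = csv.reader(tsv_file, delimiter='\t')
    header = next(reader, [])  # an empty file yields an empty header
    return header == list(expected)

# In-memory stand-in for user_reviews.csv.
sample = io.StringIO(
    "user_id\titem_id\tcreated_at\trecommend\n"
    "123\t440\t2011-11-05\t1\n"
)
print(header_matches(sample))
```

To check the real file, open it with open('user_reviews.csv', newline='') and pass the file object to header_matches.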

Create the table:

shaped create-table --file steam_review_events_schema.yaml

Insert data:

shaped table-insert --table-name steam_review_events --file user_reviews.csv --type 'tsv'

Verify table creation:

shaped list-tables

Create the engine

This example uses review data to build a collaborative filtering engine. The configuration below specifies a single ALS (alternating least squares) model and sets no hyperparameters explicitly. The interaction signal is the recommendation status: 1 if the user recommended the game and 0 otherwise, passed to the engine as the label column.

Engine configuration:

steam_review_engine_schema.yaml
data:
  interaction_table:
    type: query
    query: |
      SELECT user_id, item_id, created_at, recommend AS label FROM steam_review_events
training:
  models:
    - name: als
      policy_type: als

Create the engine:

shaped create-engine --file steam_review_engine_schema.yaml

Monitor engine status

Check engine status:

shaped list-engines

Engine creation and training can take several hours, depending on data volume and attributes. The engine progresses through these stages:

  1. SCHEDULING
  2. FETCHING
  3. TRAINING
  4. DEPLOYING
  5. ACTIVE

Once the status is ACTIVE, the engine is ready for queries.
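
Waiting for ACTIVE can be automated with a simple polling loop. The sketch below assumes a caller-supplied get_status callable that returns the engine's current status string; that callable is hypothetical, and how it obtains the status (for example, by reading the output of shaped list-engines) is left out because the output format is not shown here. Raising on a status outside the listed stages is a choice made for this sketch.

```python
import time

# Stage order as listed in this tutorial.
STAGES = ['SCHEDULING', 'FETCHING', 'TRAINING', 'DEPLOYING', 'ACTIVE']

def wait_until_active(get_status, interval_s=60.0, max_checks=100, sleep=time.sleep):
    """Poll get_status() until it returns 'ACTIVE'.

    get_status is a hypothetical callable supplied by the caller.
    Returns True once ACTIVE is seen, False if max_checks is exhausted.
    """
    for _ in range(max_checks):
        status = get_status()
        if status == 'ACTIVE':
            return True
        if status not in STAGES:
            # Statuses outside the documented stages are treated as errors here.
            raise RuntimeError(f'unexpected engine status: {status!r}')
        sleep(interval_s)
    return False

# Demo with a fake status source that walks through the stages once.
fake_statuses = iter(STAGES)
print(wait_until_active(lambda: next(fake_statuses), interval_s=0))
```

Since training can take hours, an interval of a minute or more is likely sufficient.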

Query recommendations

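Query recommendations using the Query endpoint. Provide a user_id and the number of results to return. The examples below assume the engine is named steam_review_game_recommendations; substitute the name reported by shaped list-engines if yours differs.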

Using the CLI:

shaped query --engine-name steam_review_game_recommendations \
--query "SELECT * FROM similarity(embedding_ref='als', limit=50, encoder='precomputed_user', input_user_id='\$user_id') LIMIT 5" \
--parameters '{"user_id": "76561197970982479"}'

Response:

{
  "results": [
    {
      "id": "219150",
      "score": 0.944791813545478
    },
    {
      "id": "245550",
      "score": 0.9243345560353259
    },
    {
      "id": "620",
      "score": 0.9136819097511378
    },
    {
      "id": "440",
      "score": 0.8999791870543353
    },
    {
      "id": "8930",
      "score": 0.8831670564757734
    }
  ]
}

The response contains an array of result objects with game IDs and scores.
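
A response of this shape is straightforward to consume in code. The following sketch parses a shortened copy of the sample response and extracts the game IDs; top_ids and its min_score parameter are hypothetical, introduced only for illustration.

```python
import json

# Sample response copied from this tutorial, shortened to two results.
response_text = '''
{"results": [{"id": "219150", "score": 0.944791813545478},
             {"id": "245550", "score": 0.9243345560353259}]}
'''

def top_ids(response_json, min_score=0.0):
    """Return result IDs with score >= min_score, in response order.

    Hypothetical helper; assumes each result has "id" and "score" keys.
    """
    payload = json.loads(response_json)
    return [r['id'] for r in payload.get('results', []) if r['score'] >= min_score]

print(top_ids(response_text))                  # all IDs
print(top_ids(response_text, min_score=0.93))  # only the higher-scoring result
```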

Using the REST API:

curl https://api.shaped.ai/v2/engines/steam_review_game_recommendations/query \
  -X POST \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "SELECT * FROM similarity(embedding_ref='\''als'\'', limit=50, encoder='\''precomputed_user'\'', input_user_id='\''$user_id'\'') LIMIT 5",
    "parameters": {
      "user_id": "76561197970982479"
    }
  }'
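
The same request can be assembled from Python, which avoids shell quoting around the SQL string. This is a minimal sketch using only the standard library: the URL, headers, and query text come from the examples in this section, while build_query_request is a hypothetical helper. It builds the request without sending it.

```python
import json
import urllib.request

API_BASE = 'https://api.shaped.ai/v2/engines'

def build_query_request(api_key, engine_name, user_id, limit=5):
    """Build (but do not send) the POST request for the query endpoint.

    Hypothetical helper mirroring the curl example.
    """
    query = (
        "SELECT * FROM similarity(embedding_ref='als', limit=50, "
        "encoder='precomputed_user', input_user_id='$user_id') "
        f"LIMIT {int(limit)}"
    )
    body = {'query': query, 'parameters': {'user_id': user_id}}
    return urllib.request.Request(
        f'{API_BASE}/{engine_name}/query',
        data=json.dumps(body).encode('utf-8'),
        headers={'x-api-key': api_key, 'Content-Type': 'application/json'},
        method='POST',
    )

req = build_query_request(
    'YOUR_API_KEY', 'steam_review_game_recommendations', '76561197970982479'
)
print(req.get_method(), req.full_url)
```

To send the request, pass req to urllib.request.urlopen and decode the response body with json.load.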

Clean up

Delete the engine and table when finished:

shaped delete-engine --engine-name steam_review_game_recommendations
shaped delete-table --table-name steam_review_events