Quickstart

Shaped provides three APIs that you should be familiar with to get started:

  1. Dataset API
  2. Model Management API
  3. Rank API

The Dataset and Model Management APIs are used to manage your data sources and models. Both provide endpoints to create, list, and delete their respective dataset or model entities.

The Rank API provides a set of high-performance, real-time endpoints to consume results from your model. These results can include personalized recommendations, semantic search and latent embedding analytics.

All endpoints can be accessed through Shaped's CLI or REST APIs. In this guide, we'll run through an example that puts them all together.

Installing the CLI

To get started using the CLI to create a model, first install the Shaped CLI from PyPI as follows:

pip install shaped

Shaped supports Python 3.8+. Take a look at the installation instructions if you need to install pip.

If you are having trouble installing due to package conflicts, use a Python virtual environment, especially if you are using the system default Python installation.

Initialize the client

You can then initialize the Shaped client with your API key. If you don't have an API key yet, check out this page to get one.

shaped init --api-key <YOUR_API_KEY>

Creating your first Dataset

Let's say you want to build a movie recommendation model and you have your event data (containing rating events between 1 and 5) stored in a TSV file called u.data. The file has the following columns:

  1. user_id: the user triggering the interaction event.
  2. movie_id: the movie id that was interacted with.
  3. timestamp: the timestamp of the event.
  4. rating: the rating given, between 1 and 5.
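
Before uploading, it can help to check that the file parses into the four columns listed above. The following is a minimal sketch in plain Python, not part of the Shaped CLI. It assumes the file is tab-separated with no header row and that the fields appear in the order listed above; `parse_events`, `COLUMNS`, and the sample rows are hypothetical names and data introduced only for illustration.

```python
import csv
import io

# Column names from the list above; the order in the file is an assumption.
COLUMNS = ["user_id", "movie_id", "timestamp", "rating"]


def parse_events(text, columns=COLUMNS):
    """Parse headerless tab-separated text into dicts and validate ratings."""
    rows = []
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    for line_no, fields in enumerate(reader, start=1):
        if not fields:
            continue  # skip blank lines
        if len(fields) != len(columns):
            raise ValueError(
                f"line {line_no}: expected {len(columns)} fields, got {len(fields)}"
            )
        row = dict(zip(columns, fields))
        rating = int(row["rating"])
        if not 1 <= rating <= 5:
            raise ValueError(f"line {line_no}: rating {rating} is outside 1..5")
        row["rating"] = rating
        rows.append(row)
    return rows


# Hypothetical sample rows, not taken from the real dataset.
sample = "1\t10\t881250949\t3\n2\t20\t891717742\t5\n"
events = parse_events(sample)
print(events[0])
```

To check a real file, pass its contents instead of the sample, for example `parse_events(open("u.data").read())`, adjusting `COLUMNS` if your file orders the fields differently.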

The first step in using Shaped is ingesting this data. To do this from a local file, create a dataset with the create-dataset-from-uri CLI command:

shaped create-dataset-from-uri --name movielens_ratings --path u.data --type tsv

If you want to follow along, you can download a sample movie ratings dataset as follows. If you get stuck, feel free to look at our more in-depth MovieLens tutorial.

wget http://files.grouplens.org/datasets/movielens/ml-100k.zip --no-check-certificate

Creating your first Model

To create a model from the ingested events, we need to write a model definition that includes the SQL transforms needed to choose the user_id, item_id, created_at and label columns. Shaped's event label column expects numerical values, where anything greater than 0 is positive and anything less than or equal to 0 is negative. For this use case, we'll treat ratings of 4 or higher as positive (i.e. label=1) and all lower ratings as negative (i.e. label=0). Putting it together, here's the model definition:

movie_recommendation_model.yaml

model:
  name: movie_recommendations
connectors:
  - type: Dataset
    id: movielens_ratings
    name: movielens_ratings
fetch:
  events: |
    SELECT
      user_id,
      movie_id AS item_id,
      timestamp AS created_at,
      CASE WHEN rating >= 4 THEN 1 ELSE 0 END AS label
    FROM movielens_ratings

Then create the model from this file with the create-model CLI command:

shaped create-model --file movie_recommendation_model.yaml
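
The label rule in the SQL above (a rating of 4 or higher becomes 1, anything else becomes 0) can be mirrored locally to sanity-check how many positive events a threshold would produce. This is a minimal sketch that runs outside Shaped; `rating_to_label` and the `threshold` parameter are hypothetical names introduced for illustration.

```python
def rating_to_label(rating, threshold=4):
    """Return 1 for ratings at or above the threshold, else 0.

    Mirrors: CASE WHEN rating >= 4 THEN 1 ELSE 0 END
    """
    return 1 if rating >= threshold else 0


# One example of each possible rating value.
ratings = [1, 2, 3, 4, 5]
labels = [rating_to_label(r) for r in ratings]

# Share of events that would count as positive interactions.
positive_share = sum(labels) / len(labels)

print(labels)          # [0, 0, 0, 1, 1]
print(positive_share)  # 0.4
```

If the positive share on your own data is very small or very large, a different threshold may give the model a more useful signal.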

Rank API

The Rank API is used to retrieve your recommendation results. It's a real-time, high-performance endpoint designed to be integrated directly into your application, and it supports thousands of requests per second. Much like the Model API, we provide a CLI and REST API to make rank requests. The Rank API supports several discovery endpoints and argument combinations to handle your use case, whether it's a recommendation feed, a similar-items carousel or personalized search.

Ranking

With the Shaped CLI installed, here's how you can use it to call the rank endpoint and retrieve personalized results for user_id=3:

shaped rank --model_name movie_recommendations --user_id 3 --limit 5

You can also use one of our SDKs to call the rank endpoint in production.

Example Response:

{
  "ids": [
    "427010",
    "182094",
    "332874",
    "827918",
    "403528"
  ],
  "scores": [
    0.9,
    0.8,
    0.7,
    0.3,
    0.2
  ]
}

The response contains a list of "ids", which in this case are the unique identifiers of the movies (item_ids) that are most relevant to this user. Because we trained the model with 'rating >= 4' as the positive interaction, these are the movies we predict the user is most likely to rate highly and want to watch next.

The response also contains a parallel list of "scores", which gives our confidence that each corresponding item is relevant to the query user. You can use these scores to understand how the relevance estimates change throughout the ranking.
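
Because "ids" and "scores" are parallel lists, a client typically pairs them by position. The sketch below shows one way to do that in plain Python using the example response above. `pair_results` and the `min_score` cutoff of 0.5 are hypothetical, chosen only for illustration, and are not part of any Shaped SDK.

```python
import json

# The example response from above, as it might arrive over the wire.
response_text = """
{
  "ids": ["427010", "182094", "332874", "827918", "403528"],
  "scores": [0.9, 0.8, 0.7, 0.3, 0.2]
}
"""


def pair_results(response, min_score=0.0):
    """Pair each id with its score by position, keeping scores >= min_score."""
    ids, scores = response["ids"], response["scores"]
    if len(ids) != len(scores):
        raise ValueError("ids and scores must have the same length")
    return [(item_id, score) for item_id, score in zip(ids, scores) if score >= min_score]


response = json.loads(response_text)

# Keep only results at or above a hypothetical confidence cutoff.
ranked = pair_results(response, min_score=0.5)
for item_id, score in ranked:
    print(item_id, score)
```

With the example response, only the first three items pass the 0.5 cutoff; calling `pair_results(response)` without a cutoff returns all five pairs in ranked order.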

To learn more about using and evaluating results from the rank endpoint, take a look at the specific guides for your use case.

Next steps

Although this movie recommendation model is a good starting place, there are many ways to improve it with Shaped. Notable next topics include:

  1. The in-depth model creation guide.
  2. Creating personalized item filters that better represent your business logic.
  3. Real-time connectors and session-based ranking.

We recommend going through the rest of our guides to find out more!