Enriching Movie Descriptions with LLMs
In this tutorial, we'll show you how to use Shaped Transforms to enrich a dataset with generated content. We'll take a simple table of movies and use a Large Language Model (LLM) to create clean, consistent, and engaging descriptions, which are well suited to powering search and recommendation use cases.
The best part? We'll be doing all of this directly from the Shaped dashboard, with no coding required.
Let's get started! 🚀
1. Sign Up for Shaped
First, let's get you set up with a Shaped account.
- Navigate to the Shaped website and sign up for a free trial. You'll be asked to provide some basic information to create your organization.
- Once you've signed up, you'll land on the home page of your new Shaped organization. This is your mission control for creating datasets, building models, and managing transforms.
2. Getting the Dataset
For this tutorial, we'll use the classic MovieLens dataset. We'll use the "latest small" version, which is perfect for a quick demo.
- Download the dataset directly from this link: ml-latest-small.zip.
- Unzip the file. Inside, you'll find several files. The one we care about for this tutorial is movies.csv.
This file contains a movieId, a title, and a pipe-separated list of genres for thousands of movies. We'll use the title and genres as the input for our enrichment.
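You don't need any code for this tutorial. If you'd like to preview the input locally before uploading, though, here is a minimal Python sketch that parses movies.csv with the standard csv module and splits the pipe-separated genres. It assumes the header row is movieId,title,genres, as in the MovieLens download. The helper name load_movies is hypothetical, and the sample rows are written in the same format for illustration.

```python
import csv
import io
from typing import Iterable


def load_movies(lines: Iterable[str]) -> list[dict]:
    """Parse MovieLens-style movies.csv rows into dicts with a list of genres."""
    movies = []
    # csv.DictReader uses the header row for keys and handles quoted titles with commas.
    for row in csv.DictReader(lines):
        movies.append(
            {
                "movieId": int(row["movieId"]),
                "title": row["title"],
                # Genres are pipe-separated, e.g. "Adventure|Animation".
                "genres": row["genres"].split("|"),
            }
        )
    return movies


if __name__ == "__main__":
    # In-memory sample in the same format as movies.csv.
    sample = io.StringIO(
        "movieId,title,genres\n"
        "1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy\n"
        '11,"American President, The (1995)",Comedy|Drama|Romance\n'
    )
    for movie in load_movies(sample):
        print(movie["movieId"], movie["title"], movie["genres"])

    # With the real file (path assumed from the unzipped archive):
    # with open("ml-latest-small/movies.csv", newline="", encoding="utf-8") as f:
    #     movies = load_movies(f)
```

Using the csv module rather than splitting lines on commas matters here, because some MovieLens titles contain commas and are quoted.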
3. Uploading the Dataset to Shaped
Next, let's upload movies.csv to create our first dataset in Shaped.
- From the home page, navigate to the Connectors tab on the left-hand navigation bar.
- Click the Create Connector button and select the File Upload option.
- Drag and drop the movies.csv file directly into the upload area. Shaped will automatically infer the schema and create a new dataset from the file.
- Once the upload is complete, you'll see a confirmation that your new dataset, movies_csv, has been created. Click on the dataset name to view it.
- You can now see your data inside Shaped! Notice the title and genres columns. This is the raw information we are going to enrich.
4. Creating the LLM Enrichment Transform
Now for the exciting part! We'll create a Transform that reads from our movies_csv dataset, uses an LLM to generate a description for each movie, and saves the result to a new, enriched table.
- Navigate to the Transforms tab on the left-hand navigation bar and click Create Transform.
- In the creation form:
  - Give your transform a name, like movie_summaries.
  - For the Source, select the movies_csv dataset we just uploaded.
  - For the Transform Type, select LLM Enrichment.
  - For the Columns, select title, genres, and movieId. Make sure to check the include checkbox for movieId so that the output table has a unique identifier we can cross-reference.
- Click Create Transform. Shaped will immediately start the backfilling process, which iterates through every row in your movies_csv dataset and runs the LLM enrichment you just defined.
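To make the backfill concrete, the sketch below shows conceptually what "running the enrichment on every row" means. It is a hypothetical illustration, not Shaped's implementation: build_prompt and its template are invented here (the tutorial does not show Shaped's actual prompt), and generate is a placeholder for whatever LLM call is used. It carries movieId through to each output row, which is why the tutorial asks you to include that column.

```python
from typing import Callable


def build_prompt(title: str, genres: list[str]) -> str:
    """Hypothetical prompt template; not Shaped's actual prompt."""
    return (
        f"Write a one-sentence, engaging description of the movie {title!r}. "
        f"Genres: {', '.join(genres)}."
    )


def backfill(rows: list[dict], generate: Callable[[str], str]) -> list[dict]:
    """Apply the enrichment to every row, keeping movieId as the join key."""
    output = []
    for row in rows:
        prompt = build_prompt(row["title"], row["genres"])
        output.append(
            {
                "movieId": row["movieId"],  # the column marked "include"
                "llm_description": generate(prompt),
            }
        )
    return output


def fake_llm(prompt: str) -> str:
    """Offline stand-in for a real LLM call so the sketch runs anywhere."""
    return f"(description generated from a {len(prompt)}-character prompt)"


if __name__ == "__main__":
    rows = [
        {
            "movieId": 1,
            "title": "Toy Story (1995)",
            "genres": ["Adventure", "Animation", "Children", "Comedy", "Fantasy"],
        }
    ]
    print(backfill(rows, fake_llm))
```

Passing generate in as a function keeps the per-row loop independent of any particular model provider, so the same sketch works with a stub or a real client.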
5. Evaluating Results & Exporting
Once the backfill process is complete, you can view your newly enriched dataset.
- The status of the transform will change to Finalized. Click on the transform to see its output.
- You'll now see your original movie data, plus the new llm_description column filled with high-quality, consistent descriptions generated by the LLM!
Before:
- Title: Toy Story (1995)
- Genres: Adventure|Animation|Children|Comedy|Fantasy
After:
- llm_description: "A cowboy doll's world is turned upside down when a new spaceman action figure arrives, sparking a rivalry that becomes an unexpected friendship."
This enriched data is now ready to be used to power a semantic search index or as rich features for a recommendation model. You can easily export this data by clicking the Export button to use it in other systems.
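Once exported, the descriptions can be joined back onto the original movies using movieId, the identifier you included in step 4. The sketch below assumes the export is a CSV with at least movieId and llm_description columns. That file layout is an assumption, so check the columns of your actual export. The function name join_descriptions is hypothetical.

```python
import csv
import io
from typing import Iterable


def join_descriptions(movies: Iterable[str], enriched: Iterable[str]) -> list[dict]:
    """Left-join exported LLM descriptions onto movies.csv rows by movieId."""
    # Index the enriched export by movieId (kept as strings on both sides).
    descriptions = {
        row["movieId"]: row["llm_description"] for row in csv.DictReader(enriched)
    }
    joined = []
    for row in csv.DictReader(movies):
        # Movies with no enriched row get an empty description.
        row["llm_description"] = descriptions.get(row["movieId"], "")
        joined.append(row)
    return joined


if __name__ == "__main__":
    movies = io.StringIO(
        "movieId,title,genres\n"
        "1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy\n"
    )
    enriched = io.StringIO(
        "movieId,llm_description\n"
        '1,"A cowboy doll\'s world is turned upside down when a new spaceman action figure arrives."\n'
    )
    for row in join_descriptions(movies, enriched):
        print(row["title"], "->", row["llm_description"])
```

A left join keeps every movie even if the backfill skipped a row, which makes gaps easy to spot.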
6. Cleaning Up
To keep your Shaped organization tidy, you can clean up the resources we created for this tutorial.
- Navigate to the Transforms tab and delete the movie_summaries transform.
- Navigate to the Datasets tab and delete the movies_csv dataset.
And that's it! You've successfully used Shaped's no-code Content Enrichment to transform a simple CSV into a feature-rich dataset ready for modern AI applications.