
Enriching Movie Descriptions with LLMs

In this tutorial, we'll show you how to use Shaped Transforms to enrich a dataset with generated content. For this scenario, we'll take a simple table of movies and use a Large Language Model (LLM) to create clean, consistent, and engaging descriptions, ready to power search and recommendation use cases.

The best part? We'll be doing all of this directly from the Shaped dashboard, with no coding required.

Let's get started! 🚀

1. Sign Up for Shaped

First, let's get you set up with a Shaped account.

  • Navigate to the Shaped website and sign up for a free trial. You'll be asked to provide some basic information to create your organization.

Sign-Up

  • Once you've signed up, you'll land on the home page of your new Shaped organization. This is your mission control for creating datasets, building models, and managing transforms.

Home Page

2. Getting the Dataset

For this tutorial, we'll use the classic MovieLens dataset. We'll use the "latest small" version, which is perfect for a quick demo.

  • Download the dataset directly from this link: ml-latest-small.zip.
  • Unzip the file. Inside, you'll find several files. The one we care about for this tutorial is movies.csv.

This file contains a movieId, title, and a pipe-separated list of genres for thousands of movies. We'll use the title and genres as the input for our enrichment.
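If you'd like to peek at the file before uploading it, the layout described above can be read with a few lines of Python. This is an optional sketch, not part of the Shaped workflow: the helper name load_movies is made up for illustration, and the inline sample stands in for the real file so the snippet runs on its own.

```python
import csv
import io

# Inline sample in the movies.csv layout (movieId, title, genres).
# For the real file, pass open("movies.csv", newline="", encoding="utf-8") instead.
SAMPLE = """movieId,title,genres
1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
"""

def load_movies(handle):
    """Parse each row and split the pipe-separated genres into a list."""
    movies = []
    # csv.DictReader also copes with quoted fields, e.g. titles that contain commas.
    for row in csv.DictReader(handle):
        movies.append({
            "movieId": int(row["movieId"]),
            "title": row["title"],
            "genres": row["genres"].split("|"),
        })
    return movies

movies = load_movies(io.StringIO(SAMPLE))
print(movies[0]["title"], movies[0]["genres"])
```

Splitting genres on the pipe character makes it easy to confirm that title and genres look the way you expect before they become the input for the enrichment.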

3. Uploading the Dataset to Shaped

Next, let's upload movies.csv to create our first dataset in Shaped.

  • From the home page, navigate to the Connectors tab on the left-hand navigation bar.
  • Click the Create Connector button and select the File Upload option.

Connector Creation

  • Drag and drop the movies.csv file directly into the upload area. Shaped will automatically infer the schema and create a new dataset from the file.

Upload File

  • Once the upload is complete, you'll see a confirmation that your new dataset, movies_csv, has been created. Click on the dataset name to view it.

Dataset Created

  • You can now see your data inside Shaped! Notice the title and genres columns. This is the raw information we are going to enrich.

Dataset View

4. Creating the LLM Enrichment Transform

Now for the exciting part! We'll create a Transform that reads from our movies_csv dataset, uses an LLM to generate a description for each movie, and saves the result to a new, enriched table.

  • Navigate to the Transforms tab on the left-hand navigation bar and click Create Transform.
  • In the creation form:
    1. Give your transform a name, like movie_summaries.
    2. For the Source, select the movies_csv dataset we just uploaded.
    3. For the Transform Type, select LLM Enrichment.
    4. For the Columns, select title, genres, and movieId. Make sure to check the include checkbox on movieId so that the output table has a unique identifier we can cross-reference.

Transform Creation

  • Click Create Transform. Shaped will immediately start the backfilling process. This means it's iterating through every row in your movies_csv dataset and running the LLM enrichment you just defined.

Transform Backfilling
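Conceptually, a backfill of this kind is a loop over the source rows. The sketch below only illustrates that idea and is not Shaped's implementation: build_prompt, backfill, and fake_llm are hypothetical names, the prompt wording is invented, and the model call is replaced by a stub so the code runs offline. It also assumes the output keeps only movieId next to the generated text, which is a simplification.

```python
from typing import Callable

def build_prompt(title: str, genres: str) -> str:
    """Hypothetical prompt template; the prompt Shaped uses is not shown in this tutorial."""
    genre_list = ", ".join(genres.split("|"))
    return (
        f"Write a one-sentence description of the movie '{title}'. "
        f"Genres: {genre_list}."
    )

def backfill(rows: list[dict], generate: Callable[[str], str]) -> list[dict]:
    """Run the enrichment once per row, carrying movieId through to the output."""
    enriched = []
    for row in rows:
        prompt = build_prompt(row["title"], row["genres"])
        enriched.append({
            "movieId": row["movieId"],
            "llm_description": generate(prompt),
        })
    return enriched

def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call so the sketch runs without network access."""
    return f"[description for prompt of {len(prompt)} characters]"

rows = [{
    "movieId": "1",
    "title": "Toy Story (1995)",
    "genres": "Adventure|Animation|Children|Comedy|Fantasy",
}]
result = backfill(rows, fake_llm)
print(result[0]["movieId"], result[0]["llm_description"])
```

Carrying movieId through each output row is what makes the enriched table joinable with the original data later on.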

5. Evaluating Results & Exporting

Once the backfill process is complete, you can view your newly enriched dataset.

  • The status of the transform will change to Finalized. Click on the transform to see its output.
  • You'll now see your original movie data, plus the new llm_description column filled with high-quality, consistent descriptions generated by the LLM!

Transform Finalized

Before:

  • Title: Toy Story (1995)
  • Genres: Adventure|Animation|Children|Comedy|Fantasy

After:

  • llm_description: "A cowboy doll's world is turned upside down when a new spaceman action figure arrives, sparking a rivalry that becomes an unexpected friendship."

This enriched data is now ready to power a semantic search index or to serve as rich features for a recommendation model. To use it in other systems, click the Export button.
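Once the data is outside Shaped, movieId is the key for matching descriptions back to the original rows. The sketch below assumes the export is a CSV with movieId and llm_description columns; that layout is an assumption, so check the header of your actual export. The helper name join_on_movie_id is hypothetical, and inline strings stand in for the two files.

```python
import csv
import io

# Inline stand-ins for movies.csv and the exported file.
# The export layout (movieId, llm_description) is an assumption.
MOVIES_CSV = """movieId,title,genres
1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
"""
EXPORT_CSV = """movieId,llm_description
1,"A cowboy doll's world is turned upside down when a new spaceman action figure arrives, sparking a rivalry that becomes an unexpected friendship."
"""

def join_on_movie_id(movies_handle, export_handle):
    """Attach llm_description to each original row, matching on movieId."""
    descriptions = {
        row["movieId"]: row["llm_description"]
        for row in csv.DictReader(export_handle)
    }
    joined = []
    for row in csv.DictReader(movies_handle):
        # Rows without a generated description get None rather than being dropped.
        joined.append({**row, "llm_description": descriptions.get(row["movieId"])})
    return joined

joined = join_on_movie_id(io.StringIO(MOVIES_CSV), io.StringIO(EXPORT_CSV))
print(joined[0]["title"], "->", joined[0]["llm_description"])
```

Keeping unmatched rows with an empty description makes gaps in the enrichment easy to spot before the data feeds a search index or model.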

6. Cleaning Up

To keep your Shaped organization tidy, you can clean up the resources we created for this tutorial.

  1. Navigate to the Transforms tab and delete the movie_summaries transform.
  2. Navigate to the Datasets tab and delete the movies_csv dataset.

And that's it! You've successfully used Shaped's no-code Content Enrichment to transform a simple CSV into a feature-rich dataset ready for modern AI applications.