Real-Time Data Connectors

One of the most powerful features of Shaped is our real-time data connector support, which lets you connect to supported data-streaming and Customer Data Platform sources and stream data into Shaped in real time. This guide walks you through setting up a model that uses a real-time connector as the source of your events.

It's completely fine if you don't have a real-time connector set up yet: the next section walks through it. If you already have one, feel free to skip ahead to Using Real-Time Data in a Model.

Setting Up a Shaped Dataset to Receive Events

Before you can set up a real-time connector, you need to have provisioned a Shaped Dataset to receive the events. If you don't already have a Shaped Dataset, you can create one by calling the Create Dataset API endpoint, or with the Shaped CLI.

For example, let's create a dataset to stream events from Segment, a leading Customer Data Platform:

segment_dataset.yaml
name: segment_events
schema_type: SEGMENT
shaped create-dataset --file segment_dataset.yaml

Once this request has been processed, your dataset will begin provisioning in the Shaped Platform. You can check the provisioning status by calling the List Datasets API endpoint, or with the Shaped CLI:

shaped list-datasets
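
If you want to script the wait rather than re-running the command by hand, a generic polling helper is one option. The sketch below is a minimal illustration and makes no Shaped API or CLI calls itself: `fetch_status` and `is_ready` are hypothetical callables you would supply (for example, a wrapper around the List Datasets endpoint), and the status strings in the usage lines are placeholders, not values Shaped is known to return.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def wait_until_ready(
    fetch_status: Callable[[], T],
    is_ready: Callable[[T], bool],
    interval_s: float = 10.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fetch_status() repeatedly until is_ready(status) is true.

    Raises TimeoutError if the status is still not ready once
    timeout_s seconds have elapsed.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        status = fetch_status()
        if is_ready(status):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"not ready after {timeout_s}s; last status: {status!r}"
            )
        sleep(interval_s)


# Usage with placeholder statuses (assumed values, for illustration only).
fake_statuses = iter(["PENDING", "PENDING", "READY"])
final = wait_until_ready(
    fetch_status=lambda: next(fake_statuses),
    is_ready=lambda s: s == "READY",
    interval_s=0.0,
)
```

Passing the status check in as a predicate keeps the helper independent of whatever status values your dataset actually reports.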

When the dataset has been provisioned, follow the instructions for your specific integration to begin streaming data from your source into Shaped. In the case of Segment, this involves setting up a Destination for Shaped in the Segment configuration UI, as detailed in the Integration Docs.

Using Real-Time Data in a Model

When you use a real-time dataset as a data source (for events, for example) when creating a Shaped Model, the platform automatically provisions a process that continually consumes the data in real time and writes it to our inference data store. No further configuration is needed: if a data source supports real-time streaming, Shaped recognizes this and processes it accordingly.

For example, let's create a model that uses the events from our Segment dataset as the source of data:

segment_events_recommendations.yaml
model:
  name: segment_events_recommendations
connectors:
  - type: Dataset
    id: events # used to reference the dataset in fetch queries
    name: segment_events # use the name of the dataset you created above
fetch:
  events: |
    SELECT
      user_id,
      JSON_EXTRACT_STRING(event_properties, '$.productid') AS item_id,
      event_time AS created_at,
      CASE -- Label mapping behavior will depend on your Segment events
        WHEN event_type IN (
          'CLICK_ADD_TO_ORDER', 'VIEW_PRODUCT_DETAIL', 'CLICK_LIKE_PRODUCT'
        ) THEN TRUE
        WHEN event_type IN (
          'CLICK_REMOVE_FROM_ORDER', 'CLICK_DISLIKE_PRODUCT'
        ) THEN FALSE
        ELSE NULL
      END AS label
    FROM events
shaped create-model --file segment_events_recommendations.yaml

The above creates a model based on the schema of the Segment event. The fetch query maps the event_type field to a binary label and extracts the product ID from event_properties as the item_id, which identifies the items to rank in recommendations for a given user.
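
To make the mapping concrete, here is a rough Python mirror of what the fetch query above does to a single event row. It is an illustration only, not how Shaped executes the query: it assumes each row is a dict whose event_properties value is a JSON string, and the sample row values are made up.

```python
import json
from typing import Any, Dict, Optional

# Event names copied from the fetch query; adjust to your own Segment events.
POSITIVE_EVENTS = {"CLICK_ADD_TO_ORDER", "VIEW_PRODUCT_DETAIL", "CLICK_LIKE_PRODUCT"}
NEGATIVE_EVENTS = {"CLICK_REMOVE_FROM_ORDER", "CLICK_DISLIKE_PRODUCT"}


def to_label(event_type: str) -> Optional[bool]:
    """Mirror the CASE expression: True, False, or None for unmapped events."""
    if event_type in POSITIVE_EVENTS:
        return True
    if event_type in NEGATIVE_EVENTS:
        return False
    return None


def to_interaction(row: Dict[str, Any]) -> Dict[str, Any]:
    """Mirror the SELECT list for one event row (assumed to be a dict)."""
    props = json.loads(row["event_properties"])
    product_id = props.get("productid")  # same key as the '$.productid' path
    return {
        "user_id": row["user_id"],
        # Cast to str to approximate extracting the value as a string.
        "item_id": None if product_id is None else str(product_id),
        "created_at": row["event_time"],
        "label": to_label(row["event_type"]),
    }


# Made-up sample row for illustration.
sample = {
    "user_id": "u1",
    "event_type": "CLICK_LIKE_PRODUCT",
    "event_time": "2024-01-01T00:00:00Z",
    "event_properties": '{"productid": "p42"}',
}
interaction = to_interaction(sample)
```

Events that match neither list get a label of None, matching the ELSE NULL branch of the query.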

Conclusion

In this guide we've covered how to set up a real-time data connector and create a Shaped Model that receives events in real time. This example looks specifically at the Segment data connector, but take a look at our other real-time data connectors, such as Amplitude, in our integrations docs.