Real-time vs batch connectors

Shaped supports two types of data connectors: real-time (streaming) and batch. Knowing how they differ helps you understand how data flows through Shaped.

You may not control which sources your data lives in, so it is important to know the characteristics and limitations of the connector you are using.

Connector Comparison

Characteristic	Real-Time Connectors	Batch Connectors
Description	Stream data continuously as events occur.	Sync data on a regular schedule and in batches.
Ingestion latency	Ingested within 30 seconds of the event	Synced every 15 minutes
Processing model	Continuous: each event is processed individually as it arrives	Scheduled: data is processed in batches for efficiency
Supported connectors	Segment, Amplitude, Rudderstack, Kafka, Kinesis, Pub/Sub, Custom streaming connectors	BigQuery, Snowflake, Redshift, PostgreSQL, MySQL, MongoDB, S3, GCS, and more
Common use cases	User interaction events (clicks, views, purchases), real-time personalization, live inventory updates, time-sensitive content (news, social feeds)	Item catalog updates, user profile data, historical interaction data, large-scale data backfills

How Data is Handled Between Connector Types

Data Flow

Real-time connectors: Events stream directly into Shaped's real-time data store. The data is available for querying and model training as soon as it is ingested.
Batch connectors: Data is synced from the source, transformed if needed, and loaded into Shaped's data warehouse. The data becomes available after the sync completes.
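
To make the difference concrete, the sketch below estimates when a record becomes available under each model. It uses the two figures from the comparison table (ingestion within 30 seconds for real-time, a sync every 15 minutes for batch). The function names are hypothetical, and the batch estimate assumes syncs start on a fixed 15-minute schedule and ignores how long the sync itself takes, which this page does not specify.

```python
from datetime import datetime, timedelta

REALTIME_MAX_LATENCY = timedelta(seconds=30)  # from the comparison table
BATCH_SYNC_INTERVAL = timedelta(minutes=15)   # from the comparison table


def realtime_available_by(event_time: datetime) -> datetime:
    """Latest time by which a streamed event should have been ingested."""
    return event_time + REALTIME_MAX_LATENCY


def batch_available_after(event_time: datetime, first_sync: datetime) -> datetime:
    """Start time of the first scheduled sync at or after event_time.

    Assumption: syncs start at first_sync and repeat every BATCH_SYNC_INTERVAL.
    """
    if event_time <= first_sync:
        return first_sync
    elapsed = event_time - first_sync
    # Ceiling division: whole intervals needed to reach or pass event_time.
    intervals = -(-elapsed // BATCH_SYNC_INTERVAL)
    return first_sync + intervals * BATCH_SYNC_INTERVAL
```

Under these assumptions, an event at 12:07:10 would be ingested by 12:07:40 through a real-time connector, while a batch connector whose schedule started at 12:00:00 would pick it up in the sync that starts at 12:15:00.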

Mixing Connector Types

You can use both real-time and batch connectors in the same engine:

Use real-time connectors for interaction events (clicks, purchases) that need immediate processing
Use batch connectors for item catalogs, user profiles, and other less time-sensitive data

This hybrid approach allows you to optimize for both freshness (real-time events) and efficiency (batch catalog updates).

Important Limitations

Real-Time Connectors and Query Logic

Critical limitation: Real-time connectors cannot have query logic attached to the real-time stream, either at the data layer or in the engine configuration.

Why: Queries are not real-time operations. They execute against a snapshot of the data at query time, not against a live stream. Real-time connectors stream raw events into Shaped, so any filtering, transformation, or query logic must be applied in one of the following places:

At the source: Before data reaches Shaped (e.g., filter events in Segment)
In views: Use SQL views to transform real-time data after ingestion
In queries: Apply filters and transformations at query time using ShapedQL

Example: You cannot use QueryTableConfig or similar query-based configurations on real-time connector tables because queries operate on materialized data, not streams.
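
As an illustration of the "at the source" option, here is a minimal sketch of filtering events before they are forwarded to a streaming source. It assumes events are plain dictionaries with a hypothetical `event_type` field. The `send` callable stands in for whatever client delivers events to your streaming platform; it is not a Shaped API.

```python
from typing import Callable, Collection, Iterable

# Hypothetical allow-list; adjust to the event types your engine needs.
ALLOWED_EVENT_TYPES = frozenset({"click", "view", "purchase"})


def filter_at_source(
    events: Iterable[dict],
    allowed: Collection[str] = ALLOWED_EVENT_TYPES,
) -> list:
    """Keep only events whose (hypothetical) event_type is in the allow-list."""
    return [event for event in events if event.get("event_type") in allowed]


def forward(events: Iterable[dict], send: Callable[[dict], None]) -> int:
    """Filter events, pass each kept event to send, and return the count sent."""
    kept = filter_at_source(events)
    for event in kept:
        send(event)
    return len(kept)
```

Filtering this early keeps unwanted events out of the stream entirely. Logic that depends on already ingested data still belongs in a view or in the query.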

Best Practices

Real-time for events: Use real-time connectors for user interactions and events that need immediate processing
Batch for catalogs: Use batch connectors for item catalogs, user profiles, and reference data
Views for transformation: Use SQL views to transform and join data from both connector types
Query-time filtering: Apply business logic and filters in your ShapedQL queries rather than in connector configuration

Choosing the Right Connector Type

Choose real-time connectors when:

You need sub-minute latency for user interactions
Events drive immediate personalization decisions
You're tracking high-frequency user behavior

Choose batch connectors when:

Data updates are less frequent (hourly, daily)
You're syncing large catalogs or reference data
Cost efficiency is more important than latency
Your source system doesn't support streaming
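
The criteria above can be expressed as a small decision helper. This is only a sketch of the guidance on this page, not part of Shaped: the `SourceProfile` fields and the frequency labels are hypothetical, and real decisions will usually weigh cost and data volume as well.

```python
from dataclasses import dataclass


@dataclass
class SourceProfile:
    supports_streaming: bool        # can the source system emit a stream?
    needs_sub_minute_latency: bool  # do events drive immediate personalization?
    update_frequency: str           # hypothetical labels: "continuous", "hourly", "daily"


def recommend_connector(profile: SourceProfile) -> str:
    """Return "real-time" or "batch" following the criteria listed above."""
    if not profile.supports_streaming:
        # Without streaming support, batch is the only option.
        return "batch"
    if profile.needs_sub_minute_latency or profile.update_frequency == "continuous":
        return "real-time"
    # Infrequent updates without strict latency needs favor batch efficiency.
    return "batch"
```

For example, a clickstream source that supports streaming and needs sub-minute latency maps to a real-time connector, while a product catalog updated daily maps to a batch connector.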

For more information on specific connectors, see the Connector Reference.
