
Real-time vs batch connectors

Shaped supports two types of data connectors: real-time (streaming) and batch. The difference between them determines how, and how quickly, data flows through Shaped.

You may not control which source systems hold your data, so it's important to know the characteristics and limitations of the connector you are using.

Connector Comparison

|                      | Real-Time Connectors | Batch Connectors |
|----------------------|----------------------|------------------|
| Description          | Stream data continuously as events occur. | Sync data on a regular schedule and in batches. |
| Ingestion latency    | Ingested within 30 seconds of the event | Synced every 15 minutes |
| Processing model     | Continuous: each event is processed individually as it arrives | Scheduled: data is processed in batches for efficiency |
| Supported connectors | Segment, Amplitude, Rudderstack, Kafka, Kinesis, Pub/Sub, custom streaming connectors | BigQuery, Snowflake, Redshift, PostgreSQL, MySQL, MongoDB, S3, GCS, and more |
| Common use cases     | User interaction events (clicks, views, purchases), real-time personalization, live inventory updates, time-sensitive content (news, social feeds) | Item catalog updates, user profile data, historical interaction data, large-scale data backfills |

How Data is Handled Between Connector Types

Data Flow

  1. Real-time connectors: Events stream directly into Shaped's real-time data store. The data is available for querying and model training as soon as it is ingested.

  2. Batch connectors: Data is synced from the source, transformed if needed, and loaded into Shaped's data warehouse. The data becomes available after the sync completes.
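
The two flows above can be illustrated with a toy model. This is a minimal sketch for intuition only, not Shaped's implementation: the class names `RealTimeStore` and `BatchWarehouse`, the `ingest` and `sync` methods, and the sample rows are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class RealTimeStore:
    """Toy stand-in for a real-time store: events are visible once ingested."""
    rows: list = field(default_factory=list)

    def ingest(self, event: dict) -> None:
        # Each event is handled individually as it arrives.
        self.rows.append(event)


@dataclass
class BatchWarehouse:
    """Toy stand-in for a warehouse: rows become visible only after a sync."""
    rows: list = field(default_factory=list)

    def sync(self, source_rows: list, transform=None) -> int:
        # Sync from the source, transform if needed, then load the whole batch.
        batch = [transform(r) if transform else r for r in source_rows]
        self.rows.extend(batch)
        return len(batch)


def demo():
    store = RealTimeStore()
    warehouse = BatchWarehouse()
    # Hypothetical catalog rows sitting in the source system.
    source_table = [
        {"item_id": "a", "price": "10"},
        {"item_id": "b", "price": "12"},
    ]

    # A streamed event is visible right away; the catalog is not, yet.
    store.ingest({"user_id": "u1", "item_id": "a", "event": "click"})
    visible_before_sync = (len(store.rows), len(warehouse.rows))

    # After the scheduled sync completes, the catalog rows appear together.
    loaded = warehouse.sync(
        source_table,
        transform=lambda r: {**r, "price": float(r["price"])},
    )
    visible_after_sync = (len(store.rows), len(warehouse.rows))
    return visible_before_sync, loaded, visible_after_sync
```

Calling `demo()` returns `((1, 0), 2, (1, 2))`: the streamed event is readable before any sync runs, while the two catalog rows only appear once `sync` has finished.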

Mixing Connector Types

You can use both real-time and batch connectors in the same engine:

  • Use real-time connectors for interaction events (clicks, purchases) that need immediate processing
  • Use batch connectors for item catalogs, user profiles, and other less time-sensitive data

This hybrid approach allows you to optimize for both freshness (real-time events) and efficiency (batch catalog updates).

Important Limitations

Real-Time Connectors and Query Logic

Critical limitation: You cannot attach query logic to a real-time connector's stream, either at the data layer or in the engine configuration.

Why: Queries are not real-time operations. They execute against a snapshot of data at query time, not against a live stream. Real-time connectors stream raw events into Shaped, so any filtering, transformation, or query logic must be applied in one of the following places:

  1. At the source: Before data reaches Shaped (e.g., filter events in Segment)
  2. In views: Use SQL views to transform real-time data after ingestion
  3. In queries: Apply filters and transformations at query time using ShapedQL

Example: You cannot use QueryTableConfig or similar query-based configurations on real-time connector tables because queries operate on materialized data, not streams.
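
Option 1 above (filtering at the source) might look like the following when you control the code that emits events. This is a sketch under assumptions: the event shape (`event`, `user_id` keys), the `ALLOWED_EVENTS` allow-list, and the `filter_at_source` function are hypothetical, and the step that actually sends events to your streaming source is left out.

```python
from typing import Iterable, Iterator

# Hypothetical allow-list of event types worth streaming.
ALLOWED_EVENTS = {"click", "view", "purchase"}


def filter_at_source(events: Iterable[dict]) -> Iterator[dict]:
    """Drop unwanted events before they are handed to a streaming connector."""
    for event in events:
        # Skip event types that are not on the allow-list.
        if event.get("event") not in ALLOWED_EVENTS:
            continue
        # Skip events with a missing or empty user_id.
        if not event.get("user_id"):
            continue
        yield event
```

Because the filter runs before the data reaches Shaped, the real-time stream only ever carries events that already satisfy the rule, and no query logic needs to be attached to the connector.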

Best Practices

  • Real-time for events: Use real-time connectors for user interactions and events that need immediate processing
  • Batch for catalogs: Use batch connectors for item catalogs, user profiles, and reference data
  • Views for transformation: Use SQL views to transform and join data from both connector types
  • Query-time filtering: Apply business logic and filters in your ShapedQL queries rather than in connector configuration
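
The last two practices can be illustrated with a generic SQL view. The sketch below uses Python's built-in `sqlite3` module purely as a stand-in; it is not Shaped's view syntax or ShapedQL, and the table names (`events`, `items`), columns, and sample rows are hypothetical.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with two source tables and one view."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Stand-in for a table fed by a real-time connector.
        CREATE TABLE events (user_id TEXT, item_id TEXT, event TEXT);
        -- Stand-in for a table fed by a batch connector.
        CREATE TABLE items (item_id TEXT PRIMARY KEY, category TEXT, in_stock INTEGER);

        INSERT INTO events VALUES
            ('u1', 'a', 'click'), ('u1', 'b', 'purchase'), ('u2', 'c', 'click');
        INSERT INTO items VALUES
            ('a', 'shoes', 1), ('b', 'hats', 0), ('c', 'shoes', 1);

        -- The view joins both sources and keeps only in-stock items.
        CREATE VIEW enriched_events AS
        SELECT e.user_id, e.item_id, e.event, i.category
        FROM events AS e
        JOIN items AS i ON i.item_id = e.item_id
        WHERE i.in_stock = 1;
        """
    )
    return conn


def shoes_clicks(conn: sqlite3.Connection) -> list:
    """Apply business filters at query time, on top of the view."""
    return conn.execute(
        "SELECT user_id, item_id FROM enriched_events "
        "WHERE category = ? AND event = ? ORDER BY user_id",
        ("shoes", "click"),
    ).fetchall()
```

With the sample rows above, `shoes_clicks(build_demo_db())` returns `[('u1', 'a'), ('u2', 'c')]`. The join and the stock filter live in the view, while the category and event filters are applied per query, so neither connector needs any logic of its own.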

Choosing the Right Connector Type

Choose real-time connectors when:

  • You need sub-minute latency for user interactions
  • Events drive immediate personalization decisions
  • You're tracking high-frequency user behavior

Choose batch connectors when:

  • Data updates are less frequent (hourly, daily)
  • You're syncing large catalogs or reference data
  • Cost efficiency is more important than latency
  • Your source system doesn't support streaming
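
The criteria above can be condensed into a small decision helper. This is only one possible reading of the guidance: the `SourceProfile` fields and the `choose_connector_type` function are hypothetical, and the rule order (source capability first, then latency needs) is an assumption.

```python
from dataclasses import dataclass


@dataclass
class SourceProfile:
    """Hypothetical description of a data source and its requirements."""
    supports_streaming: bool
    needs_sub_minute_latency: bool
    high_frequency_events: bool


def choose_connector_type(profile: SourceProfile) -> str:
    """Return 'real-time' or 'batch' for the given source profile."""
    # A source that cannot stream leaves batch as the only option.
    if not profile.supports_streaming:
        return "batch"
    # Sub-minute latency needs or high-frequency behavior favor streaming.
    if profile.needs_sub_minute_latency or profile.high_frequency_events:
        return "real-time"
    # Otherwise default to batch, which favors cost efficiency over latency.
    return "batch"
```

For example, a clickstream source that supports streaming and needs sub-minute latency maps to `"real-time"`, while a nightly catalog export maps to `"batch"`.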

For more information on specific connectors, see the Connector Reference.