Connector Types: Real-Time vs Batch
Shaped supports two types of data connectors: real-time (streaming) and batch. The difference between them determines how quickly data reaches Shaped and where transformation logic can live, so it matters when you design your data architecture.
Real-Time Connectors
Real-time connectors stream data into Shaped continuously as events occur. Data is ingested and processed within seconds, enabling immediate updates to your recommendation models.
Characteristics
- Ingestion latency: Data appears in Shaped within 30 seconds of the source event
- Continuous streaming: Data flows continuously, not on a schedule
- Event-driven: Each event is processed individually as it arrives
Supported Real-Time Connectors
- Segment
- Amplitude
- Rudderstack
- Kafka
- Kinesis
- Pub/Sub
- Custom streaming connectors
Use Cases
- User interaction events (clicks, views, purchases)
- Real-time personalization that responds to immediate user behavior
- Live inventory updates
- Time-sensitive content (news, social feeds)
Batch Connectors
Batch connectors sync data from your source on a regular schedule, typically every 15 minutes. Because data is ingested in bulk rather than event by event, batch connectors suit larger data volumes and less time-sensitive updates.
Characteristics
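- Ingestion latency: Data is synced on a schedule, typically every 15 minutes, so a change at the source appears in Shaped only after the next sync completes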
- Scheduled syncs: Data is pulled from the source on a fixed schedule
- Bulk processing: Data is processed in batches for efficiency
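As a rough worked example of what a scheduled sync means for freshness, the sketch below estimates worst-case staleness for a batch connector. The 15-minute interval comes from the text above; the 2-minute sync duration is an assumed figure for illustration only, and the function name is hypothetical.

```python
def worst_case_staleness_seconds(sync_interval_s: int, sync_duration_s: int) -> int:
    """Estimate how old a source change can be before it is visible.

    Worst case: the change lands just after a sync has read the source,
    so it waits one full interval and then the duration of the next sync.
    """
    return sync_interval_s + sync_duration_s


BATCH_SYNC_INTERVAL_S = 15 * 60   # from the documented 15-minute schedule
ASSUMED_SYNC_DURATION_S = 2 * 60  # assumption: depends on source and data volume

batch_worst_case = worst_case_staleness_seconds(
    BATCH_SYNC_INTERVAL_S, ASSUMED_SYNC_DURATION_S
)
print(f"Batch worst case: about {batch_worst_case / 60:.0f} minutes")
```

Under these assumptions a batch-synced change can be roughly 17 minutes old before it is visible, compared with the 30-second bound stated for real-time connectors.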
Supported Batch Connectors
- BigQuery
- Snowflake
- Redshift
- PostgreSQL
- MySQL
- MongoDB
- S3
- GCS
- And more (see Connector Reference)
Use Cases
- Item catalog updates
- User profile data
- Historical interaction data
- Large-scale data backfills
How Data is Handled Between Connector Types
Data Flow
- Real-time connectors: Events stream directly into Shaped's real-time data store. The data is immediately available for querying and model training.
- Batch connectors: Data is synced from the source, transformed if needed, and loaded into Shaped's data warehouse. The data becomes available after the sync completes.
Mixing Connector Types
You can use both real-time and batch connectors in the same engine:
- Use real-time connectors for interaction events (clicks, purchases) that need immediate processing
- Use batch connectors for item catalogs, user profiles, and other less time-sensitive data
This hybrid approach allows you to optimize for both freshness (real-time events) and efficiency (batch catalog updates).
Important Limitations
Real-Time Connectors and Query Logic
Critical limitation: Real-time connectors cannot have query logic attached to the real-time stream, either at the data layer or in the engine configuration.
Why: Queries are not real-time operations. They execute against a snapshot of data at query time, not against a live stream. Real-time connectors stream raw events into Shaped, and any filtering, transformation, or query logic must be applied:
- At the source: Before data reaches Shaped (e.g., filter events in Segment)
- In views: Use SQL views to transform real-time data after ingestion
- In queries: Apply filters and transformations at query time using ShapedQL
Example: You cannot use QueryTableConfig or similar query-based configurations on real-time connector tables because queries operate on materialized data, not streams.
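The "at the source" option can be sketched as a small pre-filter that runs in your own pipeline before events are forwarded to a streaming connector. This is a minimal illustration, not a Shaped API: the event fields (`event_type`, `is_internal`), the allow-list, and the function name are all hypothetical.

```python
from typing import Iterable, Iterator

# Hypothetical allow-list of event types worth forwarding.
ALLOWED_EVENT_TYPES = {"click", "view", "purchase"}


def filter_at_source(events: Iterable[dict]) -> Iterator[dict]:
    """Yield only the events that should reach the real-time connector."""
    for event in events:
        if event.get("event_type") not in ALLOWED_EVENT_TYPES:
            continue  # drop event types the models do not use
        if event.get("is_internal", False):
            continue  # drop traffic from internal or test accounts
        yield event


raw_events = [
    {"user_id": "u1", "item_id": "i1", "event_type": "click"},
    {"user_id": "u2", "item_id": "i2", "event_type": "heartbeat"},
    {"user_id": "u3", "item_id": "i3", "event_type": "purchase", "is_internal": True},
    {"user_id": "u4", "item_id": "i4", "event_type": "purchase"},
]

forwarded = list(filter_at_source(raw_events))
print([e["user_id"] for e in forwarded])
```

Because the filter runs before ingestion, the stream that reaches Shaped contains only raw events you want, and no query logic needs to be attached to the stream itself.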
Best Practices
- Real-time for events: Use real-time connectors for user interactions and events that need immediate processing
- Batch for catalogs: Use batch connectors for item catalogs, user profiles, and reference data
- Views for transformation: Use SQL views to transform and join data from both connector types
- Query-time filtering: Apply business logic and filters in your ShapedQL queries rather than in connector configuration
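To make the "views for transformation" practice concrete, the sketch below uses Python's built-in `sqlite3` module as a stand-in database. It is generic SQL for illustration only and does not show Shaped's own view syntax or ShapedQL; the table names, columns, and sample rows are hypothetical. The `events` table plays the role of data from a real-time connector and `items` the role of a batch-synced catalog.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Stand-in for a table fed by a real-time connector.
    CREATE TABLE events (user_id TEXT, item_id TEXT, event_type TEXT, ts INTEGER);
    -- Stand-in for a table fed by a batch connector.
    CREATE TABLE items (item_id TEXT PRIMARY KEY, category TEXT, in_stock INTEGER);

    INSERT INTO events VALUES
        ('u1', 'i1', 'click', 100),
        ('u1', 'i2', 'purchase', 130),
        ('u2', 'i3', 'click', 140);
    INSERT INTO items VALUES
        ('i1', 'shoes', 1),
        ('i2', 'hats', 1),
        ('i3', 'shoes', 0);

    -- The view joins both sources and keeps only in-stock items.
    CREATE VIEW enriched_events AS
    SELECT e.user_id, e.item_id, e.event_type, e.ts, i.category
    FROM events AS e
    JOIN items AS i ON i.item_id = e.item_id
    WHERE i.in_stock = 1;
    """
)

rows = conn.execute(
    "SELECT user_id, item_id, category FROM enriched_events ORDER BY ts"
).fetchall()
print(rows)
conn.close()
```

The view is evaluated when it is queried, against whatever rows have already been ingested, which matches the point above that queries operate on materialized data rather than on the live stream.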
Choosing the Right Connector Type
Choose real-time connectors when:
- You need sub-minute latency for user interactions
- Events drive immediate personalization decisions
- You're tracking high-frequency user behavior
Choose batch connectors when:
- Data updates are less frequent (hourly, daily)
- You're syncing large catalogs or reference data
- Cost efficiency is more important than latency
- Your source system doesn't support streaming
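The criteria above can be condensed into a small decision helper. This is a simplified sketch that looks at only two of the listed factors (latency need and streaming support); the `DataSource` class and its fields are hypothetical and not part of any Shaped API.

```python
from dataclasses import dataclass


@dataclass
class DataSource:
    name: str
    needs_sub_minute_latency: bool
    source_supports_streaming: bool


def choose_connector_type(source: DataSource) -> str:
    """Return "real-time" or "batch" for a data source.

    Real-time is chosen only when low latency is needed and the source
    can stream. A source that cannot stream falls back to batch even if
    low latency would be desirable.
    """
    if source.needs_sub_minute_latency and source.source_supports_streaming:
        return "real-time"
    return "batch"


sources = [
    DataSource("click_events", needs_sub_minute_latency=True, source_supports_streaming=True),
    DataSource("item_catalog", needs_sub_minute_latency=False, source_supports_streaming=False),
    DataSource("user_profiles", needs_sub_minute_latency=False, source_supports_streaming=True),
]

plan = {s.name: choose_connector_type(s) for s in sources}
print(plan)
```

In practice, factors such as cost and data volume would also weigh on the choice, so treat the output as a starting point rather than a rule.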
For more information on specific connectors, see the Connector Reference.