Hybrid Real-Time + Offline
End-to-end guide for combining low-latency streaming with high-quality batch backfills in a single Pinot table.
When to use this pattern
Architecture sketch
                                ┌──────────────────┐
Kafka topic ──────────────────▶ │  REALTIME table  │  (fresh, last few hours)
                                └────────┬─────────┘
                                         │
                    Pinot query          │  time boundary
                    spans both           │
                                         │
                                ┌────────┴─────────┐
Spark / Flink batch job ──────▶ │  OFFLINE table   │  (historical, optimized)
                                └──────────────────┘
Schema
Real-time table configuration
Offline table configuration
Why the offline table can be more aggressively indexed
Batch ingestion job
How the time boundary advances
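The time boundary from the architecture sketch can be modelled with a short Python sketch. It assumes the commonly documented rule that the boundary is the latest offline segment end time minus one push interval, with the offline side serving rows at or before the boundary and the real-time side serving rows after it. The function names, the `events` table, and the `ts` column are hypothetical, and this is a simplified model, not Pinot's broker code.

```python
from datetime import datetime, timedelta, timezone


def time_boundary(offline_segment_end_times, push_interval=timedelta(days=1)):
    """Assumed rule: latest offline segment end time minus one push interval."""
    if not offline_segment_end_times:
        raise ValueError("no offline segments, so there is no boundary to compute")
    return max(offline_segment_end_times) - push_interval


def to_millis(dt):
    """Convert a timezone-aware datetime to epoch milliseconds."""
    return int(dt.timestamp() * 1000)


def split_query(table, time_column, boundary_millis):
    """Build two non-overlapping physical queries for one logical table."""
    offline = f"SELECT COUNT(*) FROM {table}_OFFLINE WHERE {time_column} <= {boundary_millis}"
    realtime = f"SELECT COUNT(*) FROM {table}_REALTIME WHERE {time_column} > {boundary_millis}"
    return offline, realtime


# Hypothetical offline segments ending on Jan 1 and Jan 3 (UTC).
ends = [
    datetime(2024, 1, 1, tzinfo=timezone.utc),
    datetime(2024, 1, 3, tzinfo=timezone.utc),
]
boundary = time_boundary(ends)
offline_sql, realtime_sql = split_query("events", "ts", to_millis(boundary))

# A new batch push that ends on Jan 4 moves the boundary forward by one day.
advanced = time_boundary(ends + [datetime(2024, 1, 4, tzinfo=timezone.utc)])
```

In this model the boundary only moves when a newer offline segment lands, so a stalled batch job leaves the real-time table serving an ever-growing window.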
Query patterns
Operational checklist
Before go-live
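One check that could run before go-live is a comparison of the two table configs, since both halves of a hybrid table need to resolve to the same logical table name and use the same time column. The sketch below is a hypothetical helper, not part of any Pinot tooling. It assumes the configs are already loaded as dictionaries and reads only the `tableName`, `tableType`, and `segmentsConfig.timeColumnName` fields.

```python
def raw_table_name(name):
    """Strip an optional _OFFLINE or _REALTIME suffix from a table name."""
    for suffix in ("_OFFLINE", "_REALTIME"):
        if name.endswith(suffix):
            return name[: -len(suffix)]
    return name


def check_hybrid_pair(offline_cfg, realtime_cfg):
    """Return a list of human-readable problems; an empty list means the pair looks consistent."""
    problems = []
    if offline_cfg.get("tableType") != "OFFLINE":
        problems.append("offline config must declare tableType OFFLINE")
    if realtime_cfg.get("tableType") != "REALTIME":
        problems.append("real-time config must declare tableType REALTIME")

    off_name = raw_table_name(offline_cfg.get("tableName", ""))
    rt_name = raw_table_name(realtime_cfg.get("tableName", ""))
    if not off_name or off_name != rt_name:
        problems.append(f"table names differ: {off_name!r} vs {rt_name!r}")

    off_time = offline_cfg.get("segmentsConfig", {}).get("timeColumnName")
    rt_time = realtime_cfg.get("segmentsConfig", {}).get("timeColumnName")
    if not off_time or off_time != rt_time:
        problems.append(f"time columns differ: {off_time!r} vs {rt_time!r}")
    return problems


# Hypothetical, heavily trimmed configs for illustration only.
offline_cfg = {
    "tableName": "events",
    "tableType": "OFFLINE",
    "segmentsConfig": {"timeColumnName": "ts"},
}
realtime_cfg = {
    "tableName": "events_REALTIME",
    "tableType": "REALTIME",
    "segmentsConfig": {"timeColumnName": "ts"},
}
problems = check_hybrid_pair(offline_cfg, realtime_cfg)

# A mismatched time column is reported rather than raised.
bad_cfg = dict(realtime_cfg, segmentsConfig={"timeColumnName": "event_time"})
bad_problems = check_hybrid_pair(offline_cfg, bad_cfg)
```

Returning a list instead of raising on the first mismatch lets a deployment script print every problem in one pass.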
Monitoring
Common pitfalls
Pitfall
Fix
Further reading

