
Real-Time Product Analytics

End-to-end guide for building sub-second dashboards over Kafka event streams with Apache Pinot.

This playbook walks through a complete setup for powering user-facing or internal analytics dashboards with sub-second query latency over streaming data. Think page-view counters, transaction dashboards, click-through-rate monitors, or any workload where events arrive continuously and dashboards must refresh in real time.

When to use this pattern

Use this playbook when:

  • Events are produced to Kafka (or Kinesis/Pulsar) and must be queryable within seconds of arrival.

  • Dashboards aggregate over time windows (last 1 hour, last 24 hours) with filters on dimensions such as country, device, or product category.

  • You need high query concurrency (hundreds of concurrent dashboard users) with P99 latency under one second.

  • Data is append-only; you do not need to update previously ingested rows. If you do, see the CDC / Upsert Pipeline playbook instead.

Architecture sketch

Producers ──▶ Kafka topic ──▶ Pinot Real-Time Table ──▶ Broker ──▶ Dashboard

                          ┌─────────┴─────────┐
                          │ Real-time servers │
                          │ (consuming        │
                          │  segments)        │
                          └───────────────────┘

Key components:

  • Kafka topic partitioned by a high-cardinality key (e.g., userId) to distribute load.

  • Real-time table with one consuming segment per Kafka partition per server.

  • Star-tree index for pre-aggregated rollups on the most common dashboard group-by/filter combinations.

  • Brokers that fan out queries to servers and merge results.

Schema

Design the schema around the queries your dashboards will issue. A product-analytics event stream typically looks like this:
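As a sketch, a schema for such a stream might look like the following (the schema name and column names are illustrative, chosen to match the dimensions used elsewhere in this guide):

```json
{
  "schemaName": "pageviews",
  "dimensionFieldSpecs": [
    { "name": "userId", "dataType": "STRING" },
    { "name": "country", "dataType": "STRING" },
    { "name": "deviceType", "dataType": "STRING" },
    { "name": "productCategory", "dataType": "STRING" }
  ],
  "metricFieldSpecs": [
    { "name": "durationMs", "dataType": "LONG" }
  ],
  "dateTimeFieldSpecs": [
    {
      "name": "eventTimestamp",
      "dataType": "LONG",
      "format": "1:MILLISECONDS:EPOCH",
      "granularity": "1:MILLISECONDS"
    }
  ]
}
```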

Note: Keep the schema narrow. Every dimension column you add increases segment size and can slow queries. Only include columns your dashboards actually filter or group on. See Schema and Table Shape for design guidance.

Table configuration

Configuration highlights

| Setting | Why |
| --- | --- |
| `invertedIndexColumns` on dimensions | Enables fast filtering for dashboard filter dropdowns |
| `rangeIndexColumns` on `eventTimestamp` | Speeds up time-range predicates like `WHERE eventTimestamp > ago('PT1H')` |
| `starTreeIndexConfigs` | Pre-aggregates the most common rollup queries so they return in single-digit milliseconds |
| `bloomFilterColumns` on `userId` | Accelerates point lookups when a dashboard drills into a single user |
| `noDictionaryColumns` on high-cardinality IDs | Saves memory; dictionary encoding is wasteful for columns with millions of unique values |
| `retentionTimeValue: 30` | Automatically purges data older than 30 days; adjust to your retention needs |
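Put together, these settings might look like the following in a real-time table config. This is a sketch, not a complete config: the table name and index column choices are assumptions carried over from the schema section, and stream ingestion settings are omitted.

```json
{
  "tableName": "pageviews",
  "tableType": "REALTIME",
  "segmentsConfig": {
    "timeColumnName": "eventTimestamp",
    "retentionTimeUnit": "DAYS",
    "retentionTimeValue": "30",
    "replication": "2"
  },
  "tableIndexConfig": {
    "invertedIndexColumns": ["country", "deviceType", "productCategory"],
    "rangeIndexColumns": ["eventTimestamp"],
    "bloomFilterColumns": ["userId"],
    "noDictionaryColumns": ["userId"],
    "starTreeIndexConfigs": [
      {
        "dimensionsSplitOrder": ["country", "productCategory"],
        "functionColumnPairs": ["COUNT__*"],
        "maxLeafRecords": 10000
      }
    ]
  }
}
```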

Indexing strategy

Start with the indexes shown above and iterate:

  1. Inverted indexes on every column used in WHERE equality filters.

  2. Range index on the time column and any numeric column used in range filters.

  3. Sorted index on the column with the most common equality filter (only one sorted column per table).

  4. Star-tree index for the top 3-5 dashboard queries that dominate traffic. Profile your queries before adding more star-tree configs — each one adds ingestion overhead and segment size.

  5. Bloom filter on columns used for point lookups with very high cardinality.

See Choosing Indexes for a decision framework.

Query patterns

Dashboard time-series aggregation
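An illustrative query for this pattern, assuming the hypothetical `pageviews` table and columns sketched earlier (bucket size and window are arbitrary):

```sql
-- Page views per 5-minute bucket over the last hour
SELECT DATETIMECONVERT(eventTimestamp,
         '1:MILLISECONDS:EPOCH',
         '1:MILLISECONDS:EPOCH',
         '5:MINUTES') AS timeBucket,
       COUNT(*) AS views
FROM pageviews
WHERE eventTimestamp > ago('PT1H')
GROUP BY timeBucket
ORDER BY timeBucket
```

The `WHERE eventTimestamp > ago('PT1H')` predicate is what lets the range index prune segments; see the pitfalls table below.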

Top-N with filters
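A sketch of a top-N query over the same hypothetical table, combining an equality filter (served by the inverted index) with a time-range predicate:

```sql
-- Top 10 product categories by views in the last 24 hours, US traffic only
SELECT productCategory,
       COUNT(*) AS views
FROM pageviews
WHERE country = 'US'
  AND eventTimestamp > ago('PT24H')
GROUP BY productCategory
ORDER BY views DESC
LIMIT 10
```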

Single-user drill-down
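A sketch of a drill-down query, again against the hypothetical table; the `userId` equality filter is the point lookup the bloom filter accelerates:

```sql
-- Most recent events for a single user over the last 7 days
SELECT eventTimestamp, productCategory, deviceType
FROM pageviews
WHERE userId = 'u-42'
  AND eventTimestamp > ago('P7D')
ORDER BY eventTimestamp DESC
LIMIT 100
```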

Note: Use OPTION(timeoutMs=5000) on dashboard queries so a single slow query does not tie up the connection pool. See Query Options.

Operational checklist

Before go-live

Monitoring

  • CONSUMING segment lag: Monitor Kafka consumer lag via pinot.server.realtimeConsumptionCatchupRatio. If lag grows, add more servers or partitions.

  • Query latency P99: Track via broker metrics. If P99 exceeds your SLA, consider adding star-tree indexes or scaling brokers.

  • Segment count per table: A very large number of small segments degrades query performance. Use the Minion Merge Rollup Task to compact completed segments.

  • Heap and direct memory: Real-time servers hold consuming segments in memory. Monitor JVM heap and off-heap usage.

See Monitoring and Running Pinot in Production for a comprehensive metric list.

Common pitfalls

| Pitfall | Fix |
| --- | --- |
| Dashboard queries scan all 30 days of data | Add a time-range predicate; without one, the broker routes the query to every segment |
| Star-tree index does not activate | Verify that the query's GROUP BY columns and aggregation functions exactly match the star-tree config |
| High GC pauses on servers | Move high-cardinality columns out of the dictionary via `noDictionaryColumns`, or increase direct memory for MMAP |
| Kafka rebalance causes brief query gaps | Set `replication: 2` so a second server can serve the data while one is catching up |
