Segment Pruning
Understand how Pinot prunes irrelevant segments to reduce query latency and resource usage.
Segment pruning is a query optimization technique that eliminates irrelevant segments before scanning data. By skipping segments that cannot contain matching records, Pinot significantly reduces query latency, I/O, and CPU usage.
Pruning happens at two levels: broker-side (during query routing) and server-side (during query execution).
Broker-Side Pruning
The broker prunes segments before dispatching queries to servers. This reduces the number of segments that servers need to process. Configure broker-side pruning in the table's routing configuration.
Time Pruning
Time pruning skips segments whose time range does not overlap with the query's time filter. This is the most commonly used pruning strategy for time-series workloads.
Configuration:
{
  "routing": {
    "segmentPrunerTypes": ["time"]
  }
}
Requirements:
The schema must define a primary time column (dateTimeFieldSpecs with "granularity": "1:MILLISECONDS:EPOCH" or similar)
Data should be ingested in approximate chronological order for best results
Supported filter operators: =, <, <=, >, >=, RANGE, BETWEEN, AND, OR
Example query that benefits from time pruning:
Time pruning is more selective when data is strictly time-ordered. With out-of-order data, segments may have overlapping time ranges, reducing pruning effectiveness.
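The effect of ingestion order on time pruning can be sketched in a few lines of Python. This is an illustrative model, not Pinot's implementation: the SegmentTimeRange class, the prune_by_time function, and the sample ranges are all hypothetical, and the overlap test assumes inclusive start and end timestamps.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SegmentTimeRange:
    """Hypothetical stand-in for the time metadata kept per segment."""
    name: str
    start_ms: int  # earliest timestamp in the segment (inclusive)
    end_ms: int    # latest timestamp in the segment (inclusive)


def prune_by_time(segments, query_start_ms, query_end_ms):
    """Keep only segments whose [start, end] overlaps the query interval."""
    return [
        s for s in segments
        if s.start_ms <= query_end_ms and s.end_ms >= query_start_ms
    ]


# Strictly time-ordered ingestion: segment ranges do not overlap.
ordered = [
    SegmentTimeRange("seg_0", 0, 999),
    SegmentTimeRange("seg_1", 1000, 1999),
    SegmentTimeRange("seg_2", 2000, 2999),
]

# Out-of-order ingestion: late events widen every segment's range.
out_of_order = [
    SegmentTimeRange("seg_0", 0, 2500),
    SegmentTimeRange("seg_1", 400, 2900),
    SegmentTimeRange("seg_2", 1200, 2999),
]

# Same time filter applied to both layouts.
kept_ordered = [s.name for s in prune_by_time(ordered, 1100, 1500)]
kept_unordered = [s.name for s in prune_by_time(out_of_order, 1100, 1500)]

print(kept_ordered)    # ['seg_1']
print(kept_unordered)  # ['seg_0', 'seg_1', 'seg_2']
```

With ordered data the filter touches one segment; with overlapping ranges nothing can be pruned, even though the same number of rows may match.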
Partition Pruning
Partition pruning skips segments that do not contain records matching the query's partition column filter. This is effective for queries that filter on a partitioned column.
Configuration:
First, define the partition scheme in the table's index config:
Supported partition functions: Modulo, Murmur, ByteArray, HashCode
Supported filter operators: = (equality), IN
Example query that benefits from partition pruning:
For maximum partition pruning effectiveness, ensure each segment contains data from only one partition. When using Kafka, configure the Kafka topic partitioning to match the Pinot partition configuration.
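A minimal sketch of why single-partition segments matter, assuming a simplified Modulo function over integer keys. The function names, the segment-to-partition maps, and the memberId filter are hypothetical; Pinot's actual partition functions and metadata layout may differ.

```python
def modulo_partition(value: int, num_partitions: int) -> int:
    """Simplified stand-in for a Modulo partition function."""
    return value % num_partitions


def prune_by_partition(segment_partitions, filter_values, num_partitions):
    """Keep segments holding at least one partition a filter value maps to.

    segment_partitions: dict of segment name -> set of partition ids it holds.
    filter_values: values from an `=` filter (one value) or an IN list.
    """
    wanted = {modulo_partition(v, num_partitions) for v in filter_values}
    return [name for name, parts in segment_partitions.items() if parts & wanted]


NUM_PARTITIONS = 4

# One partition per segment, e.g. when upstream partitioning matches Pinot's.
single = {"seg_0": {0}, "seg_1": {1}, "seg_2": {2}, "seg_3": {3}}
# Mixed partitions per segment: any segment may hold any key.
mixed = {"seg_0": {0, 1, 2, 3}, "seg_1": {0, 1, 2, 3}}

# Hypothetical filter: memberId = 10 (10 % 4 == 2)
eq_single = prune_by_partition(single, [10], NUM_PARTITIONS)
# Hypothetical filter: memberId IN (10, 7) (partitions 2 and 3)
in_single = prune_by_partition(single, [10, 7], NUM_PARTITIONS)
# Same equality filter against mixed-partition segments.
eq_mixed = prune_by_partition(mixed, [10], NUM_PARTITIONS)

print(eq_single)  # ['seg_2']
print(in_single)  # ['seg_2', 'seg_3']
print(eq_mixed)   # ['seg_0', 'seg_1']
```

When every segment holds all partitions, the pruner has nothing to eliminate, which is why aligning upstream partitioning with the table's partition config pays off.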
Combining Pruners
You can enable both time and partition pruning simultaneously:
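As a hedged sketch, the combined configuration can be assembled and serialized in Python. The routing key mirrors the time pruning config shown earlier; the segmentPartitionConfig, columnPartitionMap, functionName, and numPartitions keys reflect the Pinot table config as commonly documented and should be checked against your Pinot version. The column name memberId and the partition count are hypothetical.

```python
import json

# Hypothetical column name and partition count; adjust to the actual table.
PARTITION_COLUMN = "memberId"
NUM_PARTITIONS = 4

table_config_fragment = {
    "tableIndexConfig": {
        "segmentPartitionConfig": {
            "columnPartitionMap": {
                PARTITION_COLUMN: {
                    # One of the supported partition functions listed above.
                    "functionName": "Murmur",
                    "numPartitions": NUM_PARTITIONS,
                }
            }
        }
    },
    "routing": {
        # Both broker-side pruners enabled together.
        "segmentPrunerTypes": ["time", "partition"]
    },
}

# Emit the fragment as JSON, ready to merge into a table config.
print(json.dumps(table_config_fragment, indent=2))
```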
Server-Side Pruning
Server-side pruning happens after the broker routes the query but before the server scans segment data. These pruners use segment-level metadata and indexes.
Column Value Pruning
Column value pruning skips segments based on the min/max column statistics stored in segment metadata. If a query filters on a value that falls outside a segment's min/max range for that column, the segment is skipped.
This pruner works automatically and requires no special configuration. Column statistics are maintained as part of the segment metadata.
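The min/max check can be illustrated with a short Python sketch. The ColumnStats class, the helper functions, and the sample values are hypothetical, and the sketch covers only integer equality and inclusive range filters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnStats:
    """Hypothetical per-segment min/max statistics for one column."""
    segment: str
    min_value: int
    max_value: int


def can_match_equals(stats, value):
    """True if `column = value` could match a row in this segment."""
    return stats.min_value <= value <= stats.max_value


def can_match_range(stats, low, high):
    """True if the segment's [min, max] overlaps the inclusive range [low, high]."""
    return stats.min_value <= high and stats.max_value >= low


stats = [
    ColumnStats("seg_0", 1, 100),
    ColumnStats("seg_1", 101, 200),
    ColumnStats("seg_2", 150, 300),
]

# column = 175
eq_kept = [s.segment for s in stats if can_match_equals(s, 175)]
# column BETWEEN 90 AND 120
range_kept = [s.segment for s in stats if can_match_range(s, 90, 120)]
# column = 500 lies outside every segment's range
none_kept = [s.segment for s in stats if can_match_equals(s, 500)]

print(eq_kept)     # ['seg_1', 'seg_2']
print(range_kept)  # ['seg_0', 'seg_1']
print(none_kept)   # []
```

A segment that passes the check may still contain no matching rows; min/max statistics can only rule segments out, never confirm a match.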
Bloom Filter Pruning
When a Bloom filter index is configured on a column, the server uses it to prune segments that definitely do not contain a queried value. This is especially effective for high-cardinality equality lookups.
Configuration (in fieldConfigList):
Bloom filter pruning for IN clauses is limited to 10 values or fewer to minimize overhead.
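The following toy example sketches how a Bloom filter supports pruning. It is not Pinot's implementation: TinyBloomFilter, prune_with_bloom, the bit-array size, and the hash scheme are all assumptions made for illustration. The sketch also assumes that an IN list with more than 10 values simply bypasses Bloom filter checks.

```python
import hashlib

MAX_IN_VALUES = 10  # IN-list limit described above


class TinyBloomFilter:
    """Toy Bloom filter for illustration; not Pinot's implementation."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit set

    def _positions(self, value):
        # Derive num_hashes bit positions from seeded SHA-256 digests.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{value}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, value):
        for pos in self._positions(value):
            self.bits |= 1 << pos

    def might_contain(self, value):
        # False means "definitely absent"; True means "possibly present".
        return all((self.bits >> pos) & 1 for pos in self._positions(value))


def prune_with_bloom(segment_filters, values):
    """Return segments that may contain any of the values (equality or IN)."""
    if len(values) > MAX_IN_VALUES:
        # Too many values: skip Bloom checks and keep every segment.
        return list(segment_filters)
    return [
        name for name, bloom in segment_filters.items()
        if any(bloom.might_contain(v) for v in values)
    ]


# Build one filter per segment from hypothetical session IDs.
filters = {}
for name, ids in {"seg_0": ["a1", "b2"], "seg_1": ["c3", "d4"]}.items():
    bloom = TinyBloomFilter()
    for session_id in ids:
        bloom.add(session_id)
    filters[name] = bloom

kept = prune_with_bloom(filters, ["c3"])
kept_large_in = prune_with_bloom(filters, [f"x{i}" for i in range(11)])
print(kept)
print(kept_large_in)
```

A Bloom filter never produces false negatives, so a segment that holds the value is always kept. False positives are possible, so a kept segment may still contain no match.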
Multi-Stage Query Engine (MSE)
The multi-stage query engine supports broker-side pruning via the useBrokerPruning query option (enabled by default):
When the physical optimizer is enabled, time and partition pruning are automatically applied to the Leaf Stage of multi-stage queries.
Monitoring Pruning Effectiveness
Use the following metrics to assess pruning effectiveness:
Metric                          Description
SEGMENT_PRUNING                 Time spent pruning segments (part of server query latency)
NUM_SEGMENTS_PRUNED_BY_VALUE    Number of segments pruned by value-based pruning
numSegmentsQueried              Segments sent to servers by the broker
numSegmentsProcessed            Segments actually scanned by servers
A large gap between numSegmentsQueried and numSegmentsProcessed indicates that server-side pruning is doing significant work. If numSegmentsQueried is close to the total segment count, consider enabling broker-side pruning.
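The comparison above can be sketched as a small helper that reads the two counters from a query response. The pruning_summary function and the sample numbers are hypothetical, and total_segments is assumed to be known to the caller, since it is not one of the metrics listed here.

```python
def pruning_summary(response, total_segments):
    """Summarize how many segments each pruning level eliminated.

    response: dict with numSegmentsQueried and numSegmentsProcessed.
    total_segments: total segment count of the table (supplied by the caller).
    """
    queried = response["numSegmentsQueried"]
    processed = response["numSegmentsProcessed"]
    return {
        # Segments the broker never sent to any server.
        "pruned_by_broker": total_segments - queried,
        # Segments the servers received but did not scan.
        "skipped_on_servers": queried - processed,
        "broker_pruned_fraction": (
            (total_segments - queried) / total_segments if total_segments else 0.0
        ),
        "server_skipped_fraction": (
            (queried - processed) / queried if queried else 0.0
        ),
    }


# Hypothetical response counters for a table with 1000 segments.
sample = {"numSegmentsQueried": 200, "numSegmentsProcessed": 20}
summary = pruning_summary(sample, total_segments=1000)
print(summary)
```

In this made-up case the broker eliminated 800 of 1000 segments and the servers skipped 180 of the remaining 200.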
Diagnosis: If NUM_DOCS_SCANNED or NUM_ENTRIES_SCANNED_POST_FILTER is high relative to the result set, review:
Whether time pruning is enabled and effective
Whether the table would benefit from partitioning
Whether bloom filters would help for high-cardinality equality lookups
Best Practices
Always enable time pruning for tables with a time column; it has minimal overhead and significant benefit
Partition tables on frequently filtered columns (e.g., tenant ID, user ID) for equality-based queries
Match Kafka partitioning with Pinot partitioning for real-time tables to ensure segments contain single partitions
Use bloom filters on high-cardinality columns used in equality lookups (e.g., UUIDs, session IDs)
Ingest data in time order when possible, to maximize time pruning selectivity
Monitor pruning metrics to identify tables where pruning could be improved

