Performance Tuning
Reduce query latency, control resource usage, and scale throughput by tuning routing, scheduling, memory, and real-time ingestion.
Purpose
Pinot query costs vary widely depending on workload characteristics, data distribution, and cluster topology. This section provides the operator-facing knobs for improving latency, throughput, and resource efficiency across a Pinot cluster. Use these guides after your tables are ingesting data and you have baseline metrics to compare against.
Tuning categories
Query routing and fanout
Control how brokers select servers and how many servers participate in each query. Reducing unnecessary fanout is the single highest-leverage tuning action for tail latency.
Partition pruning, time pruning, replica-group routing, single-replica routing, preferred pool routing, and broker tag enforcement
Route queries to the fastest available server using latency and in-flight request stats
Broker-side and server-side pruning strategies (time, partition, bloom filter, column value) to skip irrelevant segments
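As an illustration, pruning and replica-group routing are typically enabled in the `routing` section of the table config. This is a sketch; the available pruner types and selector names vary by Pinot version, so verify against the docs for your release:

```json
{
  "routing": {
    "segmentPrunerTypes": ["time", "partition"],
    "instanceSelectorType": "replicaGroup"
  }
}
```

With replica-group routing, each query is served entirely by one replica group, so fanout is bounded by the group size rather than the full server set.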
Query scheduling and resource isolation
Prioritize production traffic over ad-hoc queries and prevent noisy-neighbor problems.
FCFS, bounded FCFS, and token-bucket schedulers for controlling query concurrency on servers
Binary workload and named-workload schedulers with per-workload CPU and memory budgets
Heap monitoring and automatic query killing to prevent server out-of-memory crashes
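The token-bucket idea behind the scheduler can be sketched in a few lines. This is a conceptual illustration only, not Pinot's implementation; the class and parameter names are invented:

```python
import time

class TokenBucket:
    """Conceptual token-bucket limiter: a query is admitted only if a
    token is available; tokens refill at a fixed rate up to a burst cap."""

    def __init__(self, rate_per_sec: float, burst: int):
        self.rate = rate_per_sec
        self.capacity = burst
        self.tokens = float(burst)
        self.last = time.monotonic()

    def try_acquire(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, clamped to the burst cap.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # caller should queue or reject the query

bucket = TokenBucket(rate_per_sec=1.0, burst=2)
admitted = [bucket.try_acquire() for _ in range(3)]
print(admitted)  # the burst admits the first two; the third is throttled
```

The burst cap absorbs short spikes while the refill rate bounds sustained concurrency, which is why token-bucket scheduling tolerates bursty ad-hoc traffic better than a hard concurrency limit.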
Memory and storage
Tune how Pinot maps segment data into memory and controls disk I/O.
Configure posix_madvise hints (RANDOM, SEQUENTIAL, WILL_NEED) for memory-mapped segment files
Predicate reordering, streaming segment download, Netty native TLS and transport
Limit parallelism of segment download, index rebuild, and StarTree preprocessing to protect server resources
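The effect of the madvise hints can be demonstrated with Python's `mmap` module, which wraps the same system call Pinot's hints map to. A minimal sketch; the hint is guarded so it is a no-op on platforms without `madvise`:

```python
import mmap
import os
import tempfile

# Create a small file, memory-map it, and hint the kernel that access
# will be random (point lookups), which disables sequential readahead.
fd, path = tempfile.mkstemp()
try:
    os.write(fd, b"x" * mmap.PAGESIZE * 4)
    os.close(fd)
    with open(path, "rb") as f:
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        if hasattr(mmap, "MADV_RANDOM"):  # POSIX_MADV_RANDOM analogue
            mm.madvise(mmap.MADV_RANDOM)
        sample = mm[mmap.PAGESIZE : mmap.PAGESIZE + 4]
        mm.close()
finally:
    os.remove(path)
print(sample)  # b'xxxx'
```

RANDOM suppresses readahead so point lookups do not pull in pages they will never touch; SEQUENTIAL does the opposite and favors large scans, which is why the right hint depends on workload shape.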
Real-time ingestion tuning
Optimize memory, throughput, and segment sizing for tables consuming from streaming sources.
Off-heap allocation, consuming segment row thresholds, completed-segment placement, split commit protocol, and the RealtimeProvisioningHelper
Eliminate ingestion pauses during segment commit by consuming into a new segment in parallel with build and upload
Automatically pause and resume real-time ingestion when disk utilization exceeds a threshold
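Flush thresholds are set in the table's `streamConfigs`. A sketch of the relevant keys (check the exact names and defaults against your Pinot version's docs; setting the row threshold to 0 defers to the segment-size threshold):

```json
{
  "streamConfigs": {
    "realtime.segment.flush.threshold.rows": "0",
    "realtime.segment.flush.threshold.segment.size": "200M",
    "realtime.segment.flush.threshold.time": "6h"
  }
}
```

Off-heap allocation for consuming segments is a server-level property (commonly `pinot.server.instance.realtime.alloc.offheap=true`, though again version-dependent), which moves consuming-segment buffers out of the JVM heap and reduces GC pressure.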
Where to start
Measure first. Collect baseline query latency (p50, p95, p99), numSegmentsQueried, numDocsScanned, and server CPU/memory utilization before changing any configuration.
Reduce fanout. Enable time and partition pruning, and consider replica-group routing if your table spans many servers.
Right-size consuming segments. Use the RealtimeProvisioningHelper to choose optimal segment sizes and flush thresholds for real-time tables.
Protect production traffic. Enable adaptive server selection and consider workload isolation if ad-hoc queries share the same cluster.
Tune memory mapping. On Linux, experiment with RANDOM madvise if your workload is point-lookup heavy, or keep the default if scans dominate.
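For sizing consuming segments, the provisioning helper ships with pinot-admin. Flag names have shifted across releases, so treat this invocation as a sketch and the paths as placeholders:

```
bin/pinot-admin.sh RealtimeProvisioningHelper \
  -tableConfigFile /path/to/table-config.json \
  -numPartitions 8 \
  -numHosts 4,6,8 \
  -numHours 6,12,24 \
  -sampleCompletedSegmentDir /path/to/completed/segment \
  -ingestionRate 1000
```

The tool prints a matrix of memory usage and segment sizes across the candidate host counts and retention hours, from which you pick the combination that fits your hardware.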