1.5.0
Release Notes for 1.5.0
This release delivers significant improvements across the Multi-stage Query Engine (new UNNEST support, enriched joins, aggregation rewrites), Upsert (offline table support, commit-time compaction, xxhash compression), a new Federation / Multi-Cluster Routing framework, Kafka 4.x support, real-time auto-reset, Time Series Engine enhancements, new indexing capabilities (N-gram, IFST, combined Lucene), Query Resource Isolation, and numerous performance optimizations, security hardening, and bug fixes. It includes over 1100 commits from the community.
Multi-stage Query Engine Enhancements
UNNEST / CROSS JOIN UNNEST Support #17168
The multi-stage engine now natively supports array unnesting via CROSS JOIN UNNEST(...), with optional WITH ORDINALITY. This enables flattening array columns in queries without workarounds, a frequently requested SQL feature.
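To make the semantics concrete, here is a minimal Python sketch that emulates what CROSS JOIN UNNEST(...) WITH ORDINALITY produces over in-memory rows. It illustrates the SQL semantics only and is not Pinot's implementation; the function name, the table and column names, and the query shown in the docstring are assumptions made for the example.

```python
from typing import Any, Dict, Iterator, List


def cross_join_unnest(
    rows: List[Dict[str, Any]],
    array_col: str,
    elem_alias: str = "elem",
    ordinality_alias: str = "",
) -> Iterator[Dict[str, Any]]:
    """Emulate CROSS JOIN UNNEST(array_col) [WITH ORDINALITY] over dict rows.

    Roughly the shape of a query such as (hypothetical table and columns):
      SELECT t.id, u.elem, u.idx
      FROM orders t CROSS JOIN UNNEST(t.tags) WITH ORDINALITY AS u(elem, idx)
    """
    for row in rows:
        # A cross join against an empty array yields no output rows.
        for position, value in enumerate(row.get(array_col) or [], start=1):
            out = {k: v for k, v in row.items() if k != array_col}
            out[elem_alias] = value
            if ordinality_alias:
                # SQL ordinality is 1-based.
                out[ordinality_alias] = position
            yield out


rows = [
    {"id": 1, "tags": ["a", "b"]},
    {"id": 2, "tags": []},
    {"id": 3, "tags": ["c"]},
]
flattened = list(cross_join_unnest(rows, "tags", "elem", "idx"))
```

Each array element becomes its own row, and the row with an empty array disappears from the result, which is the usual cross-join behaviour.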
Enriched Join Operator #16123
The MSE join operator has been enriched with improved execution strategies, enhancing join performance and flexibility for complex multi-table queries.
MSE Lite Mode Improvements
Disable joins in Lite Mode: Added a configuration to explicitly disallow joins in MSE Lite Mode for workloads that should remain scatter-gather only. #17030
Fan-out adjusted limit: Lite Mode now supports adjusted limits based on fan-out to reduce data transferred. #16822
Push collation from Exchange to Sort: Optimizes Lite Mode queries by pushing collation requirements into sort operators. #16551
Physical Optimizer for Logical Tables #17447
Extended the Physical Optimizer to support logical tables, enabling exchange simplification and optimized execution plans for queries against logical table abstractions.
UNION (Distinct) Support #16570
Added support for UNION (distinct) set operations in addition to the existing UNION ALL, enabling more performant deduplication of results from combined queries.
Other MSE Improvements
Row() value expression support in comparisons. #17317
Add maxRowsInJoin, maxRowsInWindow, numGroups to query response metadata. #17784
Broker config to support configuring planner rules disabled by default. #17258
Query option to exclude virtual columns from schema. #17047
Fix non-deterministic hash-distributed exchange routing. #17323
Sort servers for deterministic workerId-to-server mapping across stages. #17342
Always use singleton worker for final sort stage to ensure hotspot server rotation. #17347
Reset gRPC connection backoff when server is re-enabled. #17466
New MSE metrics for better observability. #17419
MSE latency metric for multi-stage queries. #16398
Support tracing of planner rule productions. #16581
Federation / Multi-Cluster Routing (New)
Pinot 1.5.0 introduces a new federation framework enabling a single broker to route queries across multiple independent Pinot clusters. This is a foundational capability for organizations operating multiple Pinot clusters that need unified query access.
Multi-cluster routing for SSE queries. #17439
Multi-cluster routing for MSE queries. #17444
Federated broker query support. #17296
MultiClusterHelixBrokerStarter for starting broker in federated mode. #17421
Multi-cluster QuickStart for easy setup and testing. #17581
Warnings for unavailable remote clusters in BrokerResponse. #17510
TLS routing support for logical tables. #17663
Physical optimizer support for multi-cluster routing. #17516
Disallow multi-cluster routing for physical tables (logical tables only). #17731
Logical Table Enhancements
Building on the logical table support introduced in 1.4.0, this release adds significant improvements:
Logical table management UI in the Controller. #17878
Logical tables panel in Query Console. #17199
QuickStart for logical tables. #17166
Pluggable LogicalTableConfig serializer/deserializer. #17678
Logical table TLS integration tests. #17683
Upsert and Dedup Enhancements
Upsert Support for Offline Tables #17789
Extends upsert (primary-key deduplication) to OFFLINE tables, a capability previously limited to REALTIME tables. This enables batch-ingested data to leverage primary-key-based deduplication with the same semantics, simplifying batch-only architectures that need upsert guarantees without requiring streaming infrastructure.
Commit Time Compaction #16344
A performance optimization that removes invalid and obsolete records during segment commit, before the segment becomes immutable. This addresses a fundamental challenge where committed segments contain significantly more physical records than logically valid records, reducing storage and improving query performance. Support for commit time compaction with column-major build was also added. #16769
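The idea can be sketched in a few lines of Python: at commit time, only rows whose doc ID is still valid are carried into the immutable segment. This is a simplified illustration, not Pinot's code; the function name and the doc ID remapping are assumptions for the example.

```python
from typing import Any, Dict, List, Set, Tuple


def compact_at_commit(
    rows: List[Dict[str, Any]], valid_doc_ids: Set[int]
) -> Tuple[List[Dict[str, Any]], Dict[int, int]]:
    """Keep only rows whose doc ID is still valid and remap their doc IDs."""
    compacted: List[Dict[str, Any]] = []
    old_to_new: Dict[int, int] = {}
    for doc_id, row in enumerate(rows):
        if doc_id in valid_doc_ids:
            # Surviving rows are renumbered densely in their original order.
            old_to_new[doc_id] = len(compacted)
            compacted.append(row)
    return compacted, old_to_new


consuming_rows = [
    {"pk": "a", "v": 1},  # doc 0, superseded by doc 2
    {"pk": "b", "v": 1},  # doc 1
    {"pk": "a", "v": 2},  # doc 2
]
compacted, mapping = compact_at_commit(consuming_rows, valid_doc_ids={1, 2})
```

Here the committed segment holds two physical records instead of three, matching the number of logically valid records.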
xxHash for Primary Key Compression #17253
Introduces xxHash (xxh_128) as an option for upsert primary key compression. xxHash provides lower memory footprint compared to MD5 with significantly lower collision probability, reducing upsert memory usage.
Persist QueryableDocIds on Disk #16517
QueryableDocIds are now persisted on disk alongside validDocIds snapshots, which is essential for supporting disk-based upsert implementations and faster segment preloading.
Added a data-only CRC computation (excluding metadata) for new segments, persisted to ZK. This is used for realtime committing segments during replace operations to ensure deterministic segment identification regardless of metadata differences across replicas.
Other Upsert/Dedup Improvements
Revert Upsert Metadata of a segment during inconsistencies. #17324
Avoid inconsistencies among replicas during Upsert Compaction Tasks. #17696
Consensus check before selecting segment for compaction or deletion. #17352
Add checks to prevent updating upsert/dedup configs after table creation. #17645
TIMESTAMP support as dedupTimeColumn. #17200
New minNumSegmentsPerTask property for UpsertCompactMerge task. #17104
Make dedup tables use StrictRealtimeSegmentAssignment with multi-tier support. #17154
Optimize previous record location key handling for upsert metadata revert. #17503
Add metric to detect when server is ahead of ZK committed offset. #17139
Real-Time Ingestion
Kafka 4.x Client Support #17633
Added a new pinot-kafka-4.0 stream ingestion module with Kafka 4.1.1 client support. This module uses KRaft mode (no ZooKeeper dependency) and provides full compatibility with Kafka 4.x clusters.
Kafka Subset-Partition Ingestion #17587
Tables can now consume only a subset of Kafka topic partitions, configured via stream.kafka.partition.ids. This enables flexible partition assignment for use cases where a single table doesn't need all partitions.
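As a rough illustration, the sketch below shows a stream config map carrying the stream.kafka.partition.ids key and a helper that reads it. The comma-separated value format, the topic name, and the helper itself are assumptions for the example; check the Pinot stream ingestion documentation for the exact syntax.

```python
from typing import Dict, List

# Hypothetical stream config; only stream.kafka.partition.ids comes from the notes above.
stream_config: Dict[str, str] = {
    "streamType": "kafka",
    "stream.kafka.topic.name": "events",  # assumed topic name
    "stream.kafka.partition.ids": "0, 2, 5",  # assumed comma-separated format
}


def parse_partition_ids(config: Dict[str, str]) -> List[int]:
    """Return the configured partition subset; an empty list means no subset is set."""
    raw = config.get("stream.kafka.partition.ids", "")
    ids = sorted({int(part) for part in raw.split(",") if part.strip()})
    if any(partition_id < 0 for partition_id in ids):
        raise ValueError("partition ids must be non-negative")
    return ids


subset = parse_partition_ids(stream_config)
```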
Removal of Kafka 2.0 Plugin #17602
The deprecated Kafka 2.0 plugin has been removed. All ingestion is now standardized on Kafka 3.x (and 4.x) clients. The default Kafka consumer was also changed to Kafka 3. #16858
Real-Time Auto-Reset
A three-part feature that enables automatic offset reset when ingestion lag exceeds configurable thresholds (by offset count or time). During segment commit, if lag exceeds the threshold, the new segment starts from the latest offset instead of the next offset. Part 2 introduces a topic "inactive" status, and Part 3 adds a backfill manager and handler interface for recovering skipped data.
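The commit-time decision described above can be sketched as follows. This is a minimal illustration under stated assumptions, not Pinot's implementation: the class, field, and function names are hypothetical, and the real feature may compute lag differently.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ResetThresholds:
    # Hypothetical names; either threshold may be left unset (None).
    max_offset_lag: Optional[int] = None
    max_time_lag_ms: Optional[int] = None


def next_segment_start_offset(
    next_offset: int,
    latest_offset: int,
    time_lag_ms: int,
    thresholds: ResetThresholds,
) -> int:
    """Pick where the next consuming segment starts at segment commit."""
    offset_lag = latest_offset - next_offset
    exceeded_by_offset = (
        thresholds.max_offset_lag is not None
        and offset_lag > thresholds.max_offset_lag
    )
    exceeded_by_time = (
        thresholds.max_time_lag_ms is not None
        and time_lag_ms > thresholds.max_time_lag_ms
    )
    if exceeded_by_offset or exceeded_by_time:
        # Too far behind: skip ahead; the skipped range is left for backfill.
        return latest_offset
    return next_offset


thresholds = ResetThresholds(max_offset_lag=1_000_000, max_time_lag_ms=3_600_000)
caught_up = next_segment_start_offset(500, 900, 2_000, thresholds)
far_behind = next_segment_start_offset(500, 5_000_000, 2_000, thresholds)
```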
Pause Consumption in Batches #17194
For tables with large numbers of partitions, pausing consumption can overload the controller as all consumers rush to commit. This adds batch configuration parameters to the pause consumption API, spreading the load.
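One way to picture the batching is a schedule that pauses a few partitions at a time with a delay between batches. The sketch below is illustrative only; the function and parameter names are hypothetical and are not the API's actual parameters.

```python
from typing import List, Sequence, Tuple


def plan_pause_batches(
    partitions: Sequence[int],
    batch_size: int,
    delay_between_batches_s: float,
) -> List[Tuple[float, List[int]]]:
    """Return (start delay in seconds, partitions to pause) for each batch."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    plan: List[Tuple[float, List[int]]] = []
    for batch_index, start in enumerate(range(0, len(partitions), batch_size)):
        batch = list(partitions[start:start + batch_size])
        # Later batches start later, so commits do not all hit the controller at once.
        plan.append((batch_index * delay_between_batches_s, batch))
    return plan


plan = plan_pause_batches(range(5), batch_size=2, delay_between_batches_s=30.0)
```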
Real-Time Table Replication Across Clusters #17235
First part of enabling real-time table replication between two Pinot clusters. Introduces functionality for creating a new real-time table with consuming segment watermarks from a source table's ZK metadata.
Other Ingestion Improvements
Fix all race conditions from stream consumer. #17089
Improve consumer recreation retry policy. #17062
End-to-end ingestion delay metric. #17163
Freshness checker uses minimum ingestion lag. #17598
Skip freshness check for streams not supporting offset lag. #17560
Fix multi-topic ingestion routing table errors. #17810
Fix incorrect stream partition id for multi-stream realtime consumption. #17953
Confluent library bumped to 7.9.5 to match Kafka 3.9.1. #17614
SimpleAvroMessageDecoder: support header-prefixed payloads (e.g., Confluent magic+schemaId). #17077
Time Series Engine (GA)
The Time Series Engine is now GA in Pinot 1.5.0, building on the beta from 1.4.0.
Delta/DeltaDelta Storage Encoding #15258
Added Delta and DeltaDelta compression codecs for time series storage, improving compression ratios for monotonically increasing or slowly changing numeric data.
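To show why these encodings suit monotonically increasing or slowly changing data, here is a minimal Python sketch of delta and delta-of-delta encoding on integer values. It demonstrates the general technique only; the function names are made up for the example, and the bit-packing that a real codec would apply to the small residuals is not shown.

```python
from itertools import accumulate
from typing import List


def delta_encode(values: List[int]) -> List[int]:
    """Store the first value, then the difference between neighbours."""
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


def delta_decode(encoded: List[int]) -> List[int]:
    """A running sum restores the original values."""
    return list(accumulate(encoded))


def delta_delta_encode(values: List[int]) -> List[int]:
    """Store the first value, the first delta, then differences between deltas."""
    deltas = delta_encode(values)
    if len(deltas) < 2:
        return deltas
    return deltas[:1] + delta_encode(deltas[1:])


def delta_delta_decode(encoded: List[int]) -> List[int]:
    if len(encoded) < 2:
        return list(encoded)
    deltas = encoded[:1] + delta_decode(encoded[1:])
    return delta_decode(deltas)


timestamps = [1000, 1010, 1020, 1030, 1045]
deltas = delta_encode(timestamps)
delta_deltas = delta_delta_encode(timestamps)
```

For regularly spaced timestamps, most delta-of-delta values are zero, so they need very few bits once packed.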
Query Execution Improvements
Query options and explain plan support in Controller UI. #17511
End-to-end query statistics propagation. #17170
Logical explain plan support. #17484
Partial results support. #17278
Query event listeners. #17464
M3QL JavaCC parser to replace custom tokenizer. #17192
POST /query/timeseries as BrokerResponse-compatible API. #16531
Apache ECharts visualization in Controller UI. #16390
Timeseries language endpoint and UI integration. #16424
Query stats panel in timeseries UI. #17423
DeltaDelta/Delta compression table config validation. #17252
Aggregation Function Improvements
ANY_VALUE Support #16678
New ANY_VALUE aggregation function that returns an arbitrary value from a column for each group. Useful for including columns in SELECT with a 1:1 mapping to GROUP BY columns without forcing them into the GROUP BY clause.
Smart Distinct Count Functions
DistinctCountSmartULL: Smart distinct count backed by UltraLogLog (ULL) with configurable promotion threshold from exact set to ULL sketch. #16605
Automatic Aggregation Rewrites
The query engine now automatically rewrites aggregation functions to type-specific optimized variants:
MIN/MAX on string columns rewrites to MINSTRING/MAXSTRING. #16980
MIN/MAX/SUM on long columns rewrites to MINLONG/MAXLONG/SUMLONG. #17058
New SumIntAggregationFunction for optimized INT column aggregation. #16704
LONG-specific MIN/MAX/SUM aggregation functions. #17001
Dictionary/metadata based optimization for MINSTRING/MAXSTRING. #16983
Comprehensive AggregationOptimizer framework for multiple functions. #16399
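The rewrites listed above amount to a lookup from (function, column type) to a type-specific variant. The sketch below models that mapping in Python using only the function names mentioned in these notes; the table and function are hypothetical and are not the engine's optimizer code.

```python
from typing import Dict, Tuple

# (generic function, column data type) -> type-specific variant named in the notes.
REWRITES: Dict[Tuple[str, str], str] = {
    ("MIN", "STRING"): "MINSTRING",
    ("MAX", "STRING"): "MAXSTRING",
    ("MIN", "LONG"): "MINLONG",
    ("MAX", "LONG"): "MAXLONG",
    ("SUM", "LONG"): "SUMLONG",
}


def rewrite_aggregation(function_name: str, column_type: str) -> str:
    """Return the type-specific variant, or the original function if none applies."""
    key = (function_name.upper(), column_type.upper())
    return REWRITES.get(key, function_name.upper())
```

For example, rewrite_aggregation("min", "string") yields "MINSTRING", while a function with no listed variant is returned unchanged.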
Multi-value aggregation functions (e.g., COUNTMV, SUMMV) have been consolidated into regular single-value aggregation functions, simplifying the user experience.
Star-Tree Index on MV Columns #16836
Star-tree index now supports aggregation on multi-value columns (e.g., AVGMV), enabling pre-aggregated queries on MV data.
Other Aggregation Improvements
AVG window function value aggregator. #17268
ListAggMv (multi-value list aggregation) with distinct variant. #17155
ArrayAgg on multi-value columns. #17153
Handle empty tables for various aggregation functions. #17750
New Scalar Functions and Operators
INITCAP: Capitalizes the first letter of each word. #17642
IP Address Functions: Suite of IP address manipulation functions. #17127
Logical Functions: AND, OR, NOT, IF, and related boolean logic functions. #17189
RAND with determinism: Scalar functions now advertise determinism; RAND supports both runtime-only and seeded modes. #17208
arrayPushFront / arrayPushBack: Append or prepend elements to arrays. #17140
ARRAYHASANY: Check if two arrays share any common element. #17156
Ngrams MV UDFs: Generate n-grams from text as multi-value columns. #16671
String functions for regex and distance: New string manipulation functions. #17372
jsonExtractKey depth control and dot notation. #16306
Sanitization, Special, and Time column transformers. #17740
LIKE predicate now case-insensitive. #16568
StringFunctions.replace() enhanced. #16528
Indexing and Storage
N-gram Filtering Index #16364
New realtime N-gram filtering index that efficiently prunes non-matching strings before final regex validation. Benchmarks show 100x matching speed improvement compared to brute-force regex matching. The index extracts N-grams of configurable length and builds posting lists for each N-gram.
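The prune-then-validate flow can be sketched in Python as below. This is a toy illustration of the general technique, not Pinot's index: the class and method names are hypothetical, and the caller is assumed to supply a literal substring that every regex match must contain.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def ngrams(text: str, n: int) -> Set[str]:
    """All substrings of length n (empty if the text is shorter than n)."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


class NgramIndex:
    def __init__(self, n: int = 3) -> None:
        self.n = n
        self.values: List[str] = []
        self.postings: Dict[str, Set[int]] = defaultdict(set)

    def add(self, value: str) -> None:
        doc_id = len(self.values)
        self.values.append(value)
        for gram in ngrams(value, self.n):
            self.postings[gram].add(doc_id)

    def search(self, required_literal: str, pattern: str) -> List[int]:
        """Prune with posting lists, then confirm candidates with the regex."""
        grams = ngrams(required_literal, self.n)
        if grams:
            candidates = set.intersection(
                *(self.postings.get(gram, set()) for gram in grams)
            )
        else:
            # Literal shorter than n: the index cannot prune, so scan everything.
            candidates = set(range(len(self.values)))
        regex = re.compile(pattern)
        return sorted(d for d in candidates if regex.search(self.values[d]))


index = NgramIndex(n=3)
for line in ["error: disk full", "warning: disk low", "error: timeout"]:
    index.add(line)
hits = index.search("disk", r"error.*disk")
```

Only the two documents containing every 3-gram of "disk" reach the regex step; the third is skipped without running the regex at all.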
IFST Index for Case-Insensitive Regex #16276
New Insensitive FST (IFST) index type alongside the existing FST for case-insensitive regex matching. A third parameter in REGEXP_LIKE specifies case sensitivity.
Minimum Should Match in Lucene Text Search #16650
Added OpenSearch-compatible minimum_should_match support for TEXT_MATCH queries, enabling control over how many terms must match in boolean queries. Supports both integer and percentage-based specifications.
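As an illustration of how integer and percentage specifications resolve to a required number of matching terms, consider the simplified Python sketch below. It covers only positive integers and positive percentages, rounds percentages down, and clamps the result to the number of terms; these choices and the function names are assumptions for the example and do not describe Pinot's parser.

```python
from typing import Iterable, Set


def required_matches(spec: str, num_optional_terms: int) -> int:
    """Resolve a positive integer ("2") or positive percentage ("75%") spec."""
    spec = spec.strip()
    if spec.endswith("%"):
        percent = int(spec[:-1])
        required = (num_optional_terms * percent) // 100  # rounded down
    else:
        required = int(spec)
    if required < 0:
        raise ValueError("negative specifications are not covered by this sketch")
    # Never require more terms than the query actually has.
    return min(required, num_optional_terms)


def matches(document_terms: Set[str], query_terms: Iterable[str], spec: str) -> bool:
    unique_terms = set(query_terms)
    hits = sum(1 for term in unique_terms if term in document_terms)
    return hits >= required_matches(spec, len(unique_terms))
```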
Combined Lucene Text Index Files #16688
Lucene text index files can now be combined and merged into the columns.psf segment file, reducing the number of files per segment and improving storage management.
Column Major Segment Build #16727
New feature to build segments in columnar fashion for columnar input data sources (e.g., Pinot segments, Parquet files). This significantly improves segment rebuild performance for minion operations like schema or index config changes.
Other Indexing Improvements
Enable JSON index for MAP data type. #16808
Match prefix phrase query Lucene parser. #16476
NoDict hybrid cardinality: exact counting up to 2,048, then HLL++ with +10% cap. #17186
Use v4 as default raw index version. #16943
Lucene doc ID mapping for offline segments when storeInSegmentFile is true. #17437
Make ForwardIndexReader pluggable. #16363
Don't fail segment build if star-tree index build fails; skip and support rollback. #17028
Star-tree validation for * column for non-COUNT functions. #17008
Rebuild H3 index on segment reload if resolution config is updated. #16953
Query Performance Optimizations
Early short circuit with AND/OR optimizations: Skip evaluation of remaining predicates when result is already determined. #16583
DescDocIdSetOperator: Improves ORDER BY DESC performance on sorted columns by scanning in reverse order. #16789
Trim group when orderBy = groupBy: Sort-aggregate and pair-wise merge optimization when ORDER BY keys match GROUP BY keys, trimming to LIMIT per segment. #16308
Skip per-row convertTypes() in FunctionOperand when types already match. #17730
Reduce array copying for non-equi join conditions in LookupJoinOperator. #17542
Force colocated join when is_colocated_by_join_keys is provided for semi-join. #17273
Optimized cloud file listing with per-page filtering and early termination in PinotFS. #17847

JSON message decoding optimization: Parse directly to Map, avoiding intermediate JsonNode. #17485
Optimize ProtoBufRecordExtractor with field descriptor caching. #17593
Optimize group ID generation by reducing memory allocation. #16798
Reduce memory footprint for MailboxContentObserver and ReceivingMailbox. #16872
Query Resource Isolation
Workload Scheduler #16018
An improvement over the Binary Workload Scheduler that verifies CPU/memory budgets at admission time. It calls WorkloadBudgetManager.canAdmitQuery() before scheduling, using O(1) atomic-counter budget checks that add negligible overhead.
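A constant-time admission check of this kind might look like the Python sketch below. It is loosely modelled on the description above and is not the WorkloadBudgetManager API: the class and method names are hypothetical, a lock stands in for atomic counters, and the rule that a workload is admitted only while both budgets remain positive is an assumption.

```python
import threading
from typing import Dict


class WorkloadBudget:
    """Hypothetical per-workload CPU/memory budget tracker."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._remaining: Dict[str, Dict[str, int]] = {}

    def set_budget(self, workload: str, cpu_ns: int, memory_bytes: int) -> None:
        with self._lock:
            self._remaining[workload] = {"cpu_ns": cpu_ns, "memory_bytes": memory_bytes}

    def can_admit_query(self, workload: str) -> bool:
        """O(1) check: admit only if both budgets still have headroom."""
        with self._lock:
            budget = self._remaining.get(workload)
            return (
                budget is not None
                and budget["cpu_ns"] > 0
                and budget["memory_bytes"] > 0
            )

    def charge(self, workload: str, cpu_ns: int, memory_bytes: int) -> None:
        """Deduct resources consumed by a running query."""
        with self._lock:
            budget = self._remaining[workload]
            budget["cpu_ns"] -= cpu_ns
            budget["memory_bytes"] -= memory_bytes


budgets = WorkloadBudget()
budgets.set_budget("dashboards", cpu_ns=100, memory_bytes=1_000)
admitted_before = budgets.can_admit_query("dashboards")
budgets.charge("dashboards", cpu_ns=100, memory_bytes=10)
admitted_after = budgets.can_admit_query("dashboards")
```

Because the check only reads two counters, its cost does not grow with the number of running queries.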
Cost-Split Support #16672
Workload budgets (CPU/memory) can now be split across tables/tenants with support for real-time splits through custom overrides per propagation scheme (e.g., CONSUMING vs COMPLETED stages).
Other Resource Isolation Improvements
Queue-based throttling for ThrottleOnCriticalHeapUsageExecutor (graceful degradation instead of immediate rejection). #16409
Byte-based throttling. #16572
Queries self-terminate in panic mode. #16380
QUERIES_THROTTLED metric. #16676
Workload stats collection interface. #16340
Security and TLS
TLS protocol allowlist: Allow configuring permitted TLS protocols and hardened JVM SSL initialization. #17776
Centralized SSL context: Set SSL context in Controller/Broker/Server contexts for consistent TLS configuration. #17358
Runtime TLS diagnostics for gRPC and HTTPS connections. #17559
Access control for segment download API. #17508
MSE gRPC channel authorization. #16475
@Authenticate annotation on critical endpoints. #17552
SPI-based token resolver for custom audit identity resolution. #17658
SAS token authentication for Azure Data Lake Storage (ADLS). #16343
Validate gRPC broker connections on connect. #17835
Refresh AWS credentials at runtime. #16553
Minion / Task Framework
Table-level distributed locking for atomic minion task generation across controllers. #16857
Configurable max tasks per minion instance. #16981
Cluster-level max subtasks configuration. #16571
Subtask timing metrics. #17190
Task data cleanup on table deletion. #16307
Tag minion as Drained. #17375
GET /minions/status API endpoint. #17475
API for all minion tasks and summaries. #17330
Bound task queue size to prevent ZK bloat. #17741
LaunchBackfillIngestionJob for complete backfill support. #16890
Metadata push mode in BaseSingleSegmentConversionExecutor. #17632
PurgeTask auto-deletes empty segments. #16368
Rebalance and Cluster Management
Disaster Recovery Mode as Controller/Cluster config. #17243
Tenant Rebalance Cancellation. #16886
TenantRebalanceChecker for improved monitoring. #16455
Peer-download enabled table rebalance handling. #16341
Helix constraint on state transition messages. #16933
StateTransitionThreadPoolManager to override executor in SegmentOnlineOfflineStateModelFactory. #17453
Resume partially offline replicas in controller validation. #17754
Configurable server segment predownload. #17287
Batch delete deleted directory segments. #16848
Async segment refresh processing when enabled. #16931
Enable deep store upload retry for Pauseless by default. #17241
Observability and Metrics
Query Fingerprint: Configurable AST-based query fingerprinting that replaces literals with ? to create canonical query representations. Enriches logging via MDC and structured logs. #17177
FlameGraph visualization for MSE query stage stats with clock time and allocation modes. #17263
Memory and GC time tracking in query execution statistics. #16576
SLA-style per-query error metrics on broker. #17457
Segment download latency metric. #16375
Netty and gRPC memory metrics (total max and used). #16939
Zone failure tolerant segments metric on broker. #16334
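To illustrate the Query Fingerprint idea from the list above, the Python sketch below replaces literals with ? so that queries differing only in their constants map to the same canonical string. Note that it uses a regular expression as a rough stand-in for the AST-based approach the feature actually takes, so it will mishandle edge cases a real parser would get right; the function name is hypothetical.

```python
import re

# String literals ('...' with '' as an escaped quote) or standalone numeric literals.
_LITERAL = re.compile(r"'(?:[^']|'')*'|\b\d+(?:\.\d+)?\b")


def fingerprint(sql: str) -> str:
    """Replace literals with ? and collapse whitespace."""
    normalized = _LITERAL.sub("?", sql)
    return re.sub(r"\s+", " ", normalized).strip()


first = fingerprint("SELECT a FROM t WHERE ts > 1700000000 AND city = 'Oslo' LIMIT 10")
second = fingerprint("SELECT a  FROM t WHERE ts > 42 AND city = 'O''Fallon' LIMIT 5")
```

Both queries reduce to the same fingerprint, which is what makes grouping log lines or metrics by query shape possible.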
Connectors and Clients
Pinot CLI #17029
New pinot-cli module providing a modern interactive and batch command-line client for Apache Pinot. Features include multi-line SQL REPL with history, multiple output formats (CSV/TSV/JSON/ALIGNED/VERTICAL/MARKDOWN), config via properties file, and support for extra HTTP headers.
Comprehensive Pinot Admin Client #17040
New admin client in pinot-java-client providing complete programmatic access to all Pinot controller REST endpoints. Includes 6 service clients (Table, Schema, Instance, Segment, Tenant, Task), authentication support, async operations, and comprehensive error handling.
Other Client Improvements
Cursor-based pagination support in Java client. #16782
Apache Arrow format decoder for stream data ingestion with 20-30% Kafka data volume reduction. #17031
X-Correlation-Id header in Java client HTTP requests. #17863
Spark connector improvements: Pinot Proxy, unified secureMode, comprehensive gRPC support. #16666
Controller UI Improvements
Logical table management UI. #17878
Primary key display in schema UI. #17566
Minion task stats and status filtering. #16521
Package Versions section in cluster manager. #16534
Time zone support and segment start/end time in summary. #16489
Webpack 5 and TypeScript 5.8 migration. #16365
Other Notable Features
ignoreMissingSegments query option: Skip SERVER_SEGMENT_MISSING errors with broker auto-config. #16556
$partitionId virtual column: Access the partition ID as a virtual column in queries. #16721
Query routing strategy pluggable. #17364
Pluggable table samplers with precomputed broker routing entries. #17532
Storage quota check for batch segment upload. #17512
Disk utilization check extended to offline segment upload. #17579
PinotFS streamed untar for segment download. #17586
Global pinot.md5.disabled switch to enforce MD5 guards. #17800
Custom map metadata in segment metadata and upload. #17055
Segment tier movement API: Select all segments at once for tier migration. #17812
Custom retention time for untracked segments. #16719
Enhanced /tableConfigs/validate with cluster-aware validations. #16675
Notable Bug Fixes
Fix Broken Lucene Query Tracking and Cancellation for OOM Protection. #17884
Fix NOT TEXT_MATCH false positives on consuming segments. #17880
Fix incorrect stream partition id for multi-stream realtime consumption. #17953
Fix double-unescaping of SQL string literals in single-stage engine. #17438
Fix broker write failure handling in ServerChannels. #17861
Fix server outbound write failure creating zombie channels. #17845
Fix QueryServer channel cleanup on close. #17854
Fix negative reload estimated time and year 2088 rebalance timestamp. #17833
Fix ResponseStoreCleaner bugs causing unbounded response store growth. #17622
Fix funnel aggregation worker threads not responding to timeout/cancellation. #17692
Fix off-heap memory spike issue. #17489
Fix SSE query optimization regression for hybrid tables. #17751
Fix bug with COUNT(col) when using star-tree index with null handling enabled. #17106
Fix null handling issue for string to timestamp casts. #16885
Fix null values in JsonExtractScalarTransformFunction with default value support. #16683
Handle null values gracefully in SSE post aggregation functions. #16711
Fix CAST evaluation with literal-only operands in MSQE. #16421
Fix AWS SDK credential provider thread leak in S3PinotFS. #17869
Fix S3PinotFS URL Encoding for S3-compatible storage. #17691
Improve GcsPinotFS deleteBatch resiliency. #17713
Fix DirectOOMHandler string matching bug for Netty direct memory OOM. #17684
Fix early termination on MSE operator. #16696
Fix lead/lag window functions filling beyond num rows. #17348
Fix truncate(value) signed-zero and non-finite behavior. #17677
Fix complex field ser/de. #17685
Fix metadata APIs to report primary key counts for dedup tables. #17736
Fix non-daemon threads blocking JVM shutdown in RenewableTlsUtils. #17221
Fix NPE in TableRebalancer. #17723
Fix multi-stage query stats serde compatibility issue caused by UNNEST. #17590

