# 1.5.0

This release delivers significant improvements across the Multi-stage Query Engine (new UNNEST support, enriched joins, aggregation rewrites), Upsert (offline table support, commit-time compaction, xxhash compression), a new Federation / Multi-Cluster Routing framework, Kafka 4.x support, real-time auto-reset, Time Series Engine enhancements, new indexing capabilities (N-gram, IFST, combined Lucene), Query Resource Isolation, and numerous performance optimizations, security hardening, and bug fixes. It includes over 1100 commits from the community.

## Multi-stage Query Engine Enhancements

### UNNEST / CROSS JOIN UNNEST Support [#17168](https://github.com/apache/pinot/pull/17168)

The multi-stage engine now natively supports array unnesting via `CROSS JOIN UNNEST(...)`, with optional `WITH ORDINALITY`. This enables flattening array columns in queries without workarounds, a frequently requested SQL feature.
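To make the flattening semantics concrete, here is a minimal Python sketch of what a `CROSS JOIN UNNEST(...) WITH ORDINALITY` produces. It is an illustration only, not Pinot code: the function name, the sample rows, the 1-based ordinality, and the dropping of empty arrays are assumptions based on standard SQL behaviour. For brevity the sketch also omits the original array column from the output.

```python
from typing import Any, Dict, Iterator, List


def cross_join_unnest(
    rows: List[Dict[str, Any]],
    array_col: str,
    alias: str,
    ordinality_alias: str = "",
) -> Iterator[Dict[str, Any]]:
    """Yield one output row per array element (hypothetical helper).

    Rows whose array is empty or missing yield nothing, as in a cross join.
    Ordinality is 1-based, following standard SQL (an assumption here).
    """
    for row in rows:
        for position, element in enumerate(row.get(array_col) or [], start=1):
            # Copy the scalar columns and replace the array with one element.
            out = {k: v for k, v in row.items() if k != array_col}
            out[alias] = element
            if ordinality_alias:
                out[ordinality_alias] = position
            yield out


orders = [
    {"order_id": 1, "tags": ["a", "b"]},
    {"order_id": 2, "tags": []},
    {"order_id": 3, "tags": ["c"]},
]
flattened = list(cross_join_unnest(orders, "tags", "tag", "idx"))
```

In this sketch, order 2 disappears from the result because its array is empty.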

### Enriched Join Operator [#16123](https://github.com/apache/pinot/pull/16123)

The MSE join operator has been enriched with improved execution strategies, enhancing join performance and flexibility for complex multi-table queries.

### MSE Lite Mode Improvements

* **Disable joins in Lite Mode**: Added a configuration to explicitly disallow joins in MSE Lite Mode for workloads that should remain scatter-gather only. [#17030](https://github.com/apache/pinot/pull/17030)
* **Fan-out adjusted limit**: Lite Mode now supports adjusted limits based on fan-out to reduce data transferred. [#16822](https://github.com/apache/pinot/pull/16822)
* **Push collation from Exchange to Sort**: Optimizes Lite Mode queries by pushing collation requirements into sort operators. [#16551](https://github.com/apache/pinot/pull/16551)

### Physical Optimizer for Logical Tables [#17447](https://github.com/apache/pinot/pull/17447)

Extended the Physical Optimizer to support logical tables, enabling exchange simplification and optimized execution plans for queries against logical table abstractions.

### UNION (Distinct) Support [#16570](https://github.com/apache/pinot/pull/16570)

Added support for `UNION` (distinct) set operations in addition to the existing `UNION ALL`, so duplicate rows can be removed when combining the results of multiple queries.

### Other MSE Improvements

* Row() value expression support in comparisons. [#17317](https://github.com/apache/pinot/pull/17317)
* Add `maxRowsInJoin`, `maxRowsInWindow`, `numGroups` to query response metadata. [#17784](https://github.com/apache/pinot/pull/17784)
* Broker config to support configuring planner rules disabled by default. [#17258](https://github.com/apache/pinot/pull/17258)
* Query option to exclude virtual columns from schema. [#17047](https://github.com/apache/pinot/pull/17047)
* Fix non-deterministic hash-distributed exchange routing. [#17323](https://github.com/apache/pinot/pull/17323)
* Sort servers for deterministic workerId-to-server mapping across stages. [#17342](https://github.com/apache/pinot/pull/17342)
* Always use singleton worker for final sort stage to ensure hotspot server rotation. [#17347](https://github.com/apache/pinot/pull/17347)
* Reset gRPC connection backoff when server is re-enabled. [#17466](https://github.com/apache/pinot/pull/17466)
* Improved error handling and error messages for leaf stage errors. [#17306](https://github.com/apache/pinot/pull/17306), [#17207](https://github.com/apache/pinot/pull/17207), [#17161](https://github.com/apache/pinot/pull/17161)
* New MSE metrics for better observability. [#17419](https://github.com/apache/pinot/pull/17419)
* MSE latency metric for multi-stage queries. [#16398](https://github.com/apache/pinot/pull/16398)
* Support tracing of planner rule productions. [#16581](https://github.com/apache/pinot/pull/16581)

## Federation / Multi-Cluster Routing (New)

Pinot 1.5.0 introduces a new **federation framework** enabling a single broker to route queries across multiple independent Pinot clusters. This is a foundational capability for organizations operating multiple Pinot clusters that need unified query access.

* **Multi-cluster routing for SSE queries.** [#17439](https://github.com/apache/pinot/pull/17439)
* **Multi-cluster routing for MSE queries.** [#17444](https://github.com/apache/pinot/pull/17444)
* **Federated broker query support.** [#17296](https://github.com/apache/pinot/pull/17296)
* **MultiClusterHelixBrokerStarter** for starting a broker in federated mode. [#17421](https://github.com/apache/pinot/pull/17421)
* **Multi-cluster QuickStart** for easy setup and testing. [#17581](https://github.com/apache/pinot/pull/17581)
* Warnings for unavailable remote clusters in BrokerResponse. [#17510](https://github.com/apache/pinot/pull/17510)
* TLS routing support for logical tables. [#17663](https://github.com/apache/pinot/pull/17663)
* Physical optimizer support for multi-cluster routing. [#17516](https://github.com/apache/pinot/pull/17516)
* Disallow multi-cluster routing for physical tables (logical tables only). [#17731](https://github.com/apache/pinot/pull/17731)

## Logical Table Enhancements

Building on the logical table support introduced in 1.4.0, this release adds significant improvements:

* **Logical table management UI** in the Controller. [#17878](https://github.com/apache/pinot/pull/17878)
* **Logical tables panel** in Query Console. [#17199](https://github.com/apache/pinot/pull/17199)
* **QuickStart** for logical tables. [#17166](https://github.com/apache/pinot/pull/17166)
* Pluggable LogicalTableConfig serializer/deserializer. [#17678](https://github.com/apache/pinot/pull/17678)
* Logical table TLS integration tests. [#17683](https://github.com/apache/pinot/pull/17683)

## Upsert and Dedup Enhancements

### Upsert Support for Offline Tables [#17789](https://github.com/apache/pinot/pull/17789)

Extends upsert (primary-key deduplication) to **OFFLINE** tables, a capability previously limited to REALTIME tables. This enables batch-ingested data to leverage primary-key-based deduplication with the same semantics, simplifying batch-only architectures that need upsert guarantees without requiring streaming infrastructure.

### Commit Time Compaction [#16344](https://github.com/apache/pinot/pull/16344)

A performance optimization that removes invalid and obsolete records during segment commit, before the segment becomes immutable. This addresses a fundamental challenge where committed segments contain significantly more physical records than logically valid records, reducing storage and improving query performance. Support for commit time compaction with column-major build was also added. [#16769](https://github.com/apache/pinot/pull/16769)
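The idea can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not Pinot's implementation: records are plain tuples, the comparison column is an integer, and ties go to the later record. The names `Record` and `compact_at_commit` are hypothetical.

```python
from typing import Dict, List, Tuple

Record = Tuple[str, int, str]  # (primary_key, comparison_value, payload)


def compact_at_commit(records: List[Record]) -> List[Record]:
    """Keep only the winning record per primary key, in arrival order."""
    latest_doc_id: Dict[str, int] = {}
    for doc_id, (key, comparison_value, _) in enumerate(records):
        current = latest_doc_id.get(key)
        # A later record with an equal or higher comparison value wins.
        if current is None or comparison_value >= records[current][1]:
            latest_doc_id[key] = doc_id
    valid_doc_ids = set(latest_doc_id.values())
    # Only valid docs are written to the immutable segment.
    return [rec for doc_id, rec in enumerate(records) if doc_id in valid_doc_ids]


consuming_segment = [
    ("k1", 100, "v1"),
    ("k2", 100, "v1"),
    ("k1", 200, "v2"),
    ("k1", 150, "stale"),
]
committed = compact_at_commit(consuming_segment)
```

Here four physical records shrink to two at commit time, which is the storage saving the feature targets.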

### xxHash for Primary Key Compression [#17253](https://github.com/apache/pinot/pull/17253)

Introduces xxHash (xxh\_128) as an option for upsert primary key compression. xxHash provides lower memory footprint compared to MD5 with significantly lower collision probability, reducing upsert memory usage.

### Persist QueryableDocIds on Disk [#16517](https://github.com/apache/pinot/pull/16517)

QueryableDocIds are now persisted on disk alongside validDocIds snapshots, which is essential for supporting disk-based upsert implementations and faster segment preloading.

### Deterministic Data-Only CRC [#17264](https://github.com/apache/pinot/pull/17264), [#17380](https://github.com/apache/pinot/pull/17380)

Added a data-only CRC computation (excluding metadata) for new segments, persisted to ZK. This is used for realtime committing segments during replace operations to ensure deterministic segment identification regardless of metadata differences across replicas.
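A rough Python sketch of the concept follows. It assumes a segment can be modelled as a dictionary with `columns` and `metadata` entries and uses `zlib.crc32`; the layout, the function name, and the choice of checksum are illustrative assumptions, and Pinot's actual computation may differ.

```python
import zlib
from typing import Any, Dict


def data_only_crc(segment: Dict[str, Any]) -> int:
    """CRC32 over column values only; the "metadata" entry is ignored."""
    crc = 0
    columns = segment["columns"]
    # Visit columns in sorted order so the result does not depend on dict order.
    for name in sorted(columns):
        crc = zlib.crc32(name.encode("utf-8"), crc)
        for value in columns[name]:
            # Length-prefix each value so ["ab", "c"] differs from ["a", "bc"].
            crc = zlib.crc32(len(value).to_bytes(4, "big"), crc)
            crc = zlib.crc32(value, crc)
    return crc


replica_a = {
    "columns": {"city": [b"SFO", b"NYC"], "clicks": [b"3", b"7"]},
    "metadata": {"creationTime": 1700000000000},
}
replica_b = {
    "columns": {"clicks": [b"3", b"7"], "city": [b"SFO", b"NYC"]},
    "metadata": {"creationTime": 1700000009999},  # differs across replicas
}
changed = {
    "columns": {"city": [b"SFO", b"NYD"], "clicks": [b"3", b"7"]},
    "metadata": {"creationTime": 1700000000000},
}
```

In this model the two replicas produce the same checksum despite different metadata, while a one-byte change in the data produces a different one.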

### Other Upsert/Dedup Improvements

* Revert upsert metadata of a segment when inconsistencies are detected. [#17324](https://github.com/apache/pinot/pull/17324)
* Avoid inconsistencies among replicas during upsert compaction tasks. [#17696](https://github.com/apache/pinot/pull/17696)
* Consensus check before selecting segment for compaction or deletion. [#17352](https://github.com/apache/pinot/pull/17352)
* Add checks to prevent updating upsert/dedup configs after table creation. [#17645](https://github.com/apache/pinot/pull/17645)
* TIMESTAMP support as dedupTimeColumn. [#17200](https://github.com/apache/pinot/pull/17200)
* New `minNumSegmentsPerTask` property for UpsertCompactMerge task. [#17104](https://github.com/apache/pinot/pull/17104)
* Make dedup tables use StrictRealtimeSegmentAssignment with multi-tier support. [#17154](https://github.com/apache/pinot/pull/17154)
* Optimize previous record location key handling for upsert metadata revert. [#17503](https://github.com/apache/pinot/pull/17503)
* Add metric to detect when server is ahead of ZK committed offset. [#17139](https://github.com/apache/pinot/pull/17139)

## Real-Time Ingestion

### Kafka 4.x Client Support [#17633](https://github.com/apache/pinot/pull/17633)

Added a new `pinot-kafka-4.0` stream ingestion module with Kafka 4.1.1 client support. This module uses KRaft mode (no ZooKeeper dependency) and provides full compatibility with Kafka 4.x clusters.

### Kafka Subset-Partition Ingestion [#17587](https://github.com/apache/pinot/pull/17587)

Tables can now consume only a subset of Kafka topic partitions, configured via `stream.kafka.partition.ids`. This enables flexible partition assignment for use cases where a single table doesn't need all partitions.

### Removal of Kafka 2.0 Plugin [#17602](https://github.com/apache/pinot/pull/17602)

The deprecated Kafka 2.0 plugin has been removed. All ingestion is now standardized on Kafka 3.x (and 4.x) clients. The default Kafka consumer was also changed to Kafka 3. [#16858](https://github.com/apache/pinot/pull/16858)

### Auto Reset Offset During Ingestion Lag [#16492](https://github.com/apache/pinot/pull/16492), [#16692](https://github.com/apache/pinot/pull/16692), [#16724](https://github.com/apache/pinot/pull/16724)

A three-part feature that enables automatic offset reset when ingestion lag exceeds configurable thresholds (by offset count or time). During segment commit, if lag exceeds the threshold, the new segment starts from the latest offset instead of the next offset. Part 2 introduces an "inactive" status for topics, and Part 3 adds a backfill manager and handler interface for recovering skipped data.
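The commit-time decision described above can be sketched as follows. This is a hedged illustration rather than the actual implementation: the `ResetThresholds` class, its field names, and the function signature are hypothetical, and the real feature is driven by table configuration that is not shown here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ResetThresholds:
    """Hypothetical thresholds; None means the check is disabled."""
    max_offset_lag: Optional[int] = None   # lag measured in messages
    max_time_lag_ms: Optional[int] = None  # lag measured in milliseconds


def next_segment_start_offset(
    next_offset: int,
    latest_offset: int,
    next_offset_timestamp_ms: int,
    now_ms: int,
    thresholds: ResetThresholds,
) -> int:
    """Pick where the next consuming segment should start."""
    offset_lag = latest_offset - next_offset
    time_lag_ms = now_ms - next_offset_timestamp_ms
    too_far_by_offset = (
        thresholds.max_offset_lag is not None
        and offset_lag > thresholds.max_offset_lag
    )
    too_far_by_time = (
        thresholds.max_time_lag_ms is not None
        and time_lag_ms > thresholds.max_time_lag_ms
    )
    # Skip ahead only when a configured threshold is exceeded.
    if too_far_by_offset or too_far_by_time:
        return latest_offset
    return next_offset
```

The skipped range between `next_offset` and `latest_offset` is what the backfill handler from Part 3 would later recover.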

### Pause Consumption in Batches [#17194](https://github.com/apache/pinot/pull/17194)

For tables with large numbers of partitions, pausing consumption can overload the controller as all consumers rush to commit. This adds batch configuration parameters to the pause consumption API, spreading the load.

### Real-Time Table Replication Across Clusters [#17235](https://github.com/apache/pinot/pull/17235)

First part of enabling real-time table replication between two Pinot clusters. Introduces functionality for creating a new real-time table with consuming segment watermarks from a source table's ZK metadata.

### Other Ingestion Improvements

* Fix all race conditions from stream consumer. [#17089](https://github.com/apache/pinot/pull/17089)
* Improve consumer recreation retry policy. [#17062](https://github.com/apache/pinot/pull/17062)
* End-to-end ingestion delay metric. [#17163](https://github.com/apache/pinot/pull/17163)
* Freshness checker uses minimum ingestion lag. [#17598](https://github.com/apache/pinot/pull/17598)
* Skip freshness check for streams not supporting offset lag. [#17560](https://github.com/apache/pinot/pull/17560)
* Fix multi-topic ingestion routing table errors. [#17810](https://github.com/apache/pinot/pull/17810)
* Fix incorrect stream partition id for multi-stream realtime consumption. [#17953](https://github.com/apache/pinot/pull/17953)
* Confluent library bumped to 7.9.5 to match Kafka 3.9.1. [#17614](https://github.com/apache/pinot/pull/17614)
* SimpleAvroMessageDecoder: support header-prefixed payloads (e.g., Confluent magic+schemaId). [#17077](https://github.com/apache/pinot/pull/17077)

## Time Series Engine (GA)

The Time Series Engine is now GA in Pinot 1.5.0, building on the beta from 1.4.0.

### Delta/DeltaDelta Storage Encoding [#15258](https://github.com/apache/pinot/pull/15258)

Added Delta and DeltaDelta compression codecs for time series storage, improving compression ratios for monotonically increasing or slowly changing numeric data.
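For readers unfamiliar with these encodings, the sketch below shows the general technique on a list of integers. It illustrates delta and delta-of-delta encoding in general, not Pinot's on-disk format; the function names and the sample timestamps are made up for the example.

```python
from typing import List


def delta_encode(values: List[int]) -> List[int]:
    """Store the first value, then the difference to the previous value."""
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


def delta_decode(encoded: List[int]) -> List[int]:
    """Rebuild the original values with a running sum."""
    out: List[int] = []
    total = 0
    for delta in encoded:
        total += delta
        out.append(total)
    return out


def delta_delta_encode(values: List[int]) -> List[int]:
    """Store the first value, the first delta, then differences of deltas."""
    deltas = delta_encode(values)
    if len(deltas) < 2:
        return deltas
    return [deltas[0]] + delta_encode(deltas[1:])


def delta_delta_decode(encoded: List[int]) -> List[int]:
    """Invert delta_delta_encode."""
    if len(encoded) < 2:
        return list(encoded)
    deltas = [encoded[0]] + delta_decode(encoded[1:])
    return delta_decode(deltas)


timestamps = [1000, 1010, 1020, 1030, 1045]
as_delta = delta_encode(timestamps)
as_delta_delta = delta_delta_encode(timestamps)
```

Regularly spaced timestamps turn into runs of small numbers (mostly zeros for delta-of-delta), which is why these encodings compress monotonically increasing data well.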

### Query Execution Improvements

* **Query options and explain plan support** in Controller UI. [#17511](https://github.com/apache/pinot/pull/17511)
* **End-to-end query statistics propagation.** [#17170](https://github.com/apache/pinot/pull/17170)
* **Logical explain plan support.** [#17484](https://github.com/apache/pinot/pull/17484)
* **Partial results support.** [#17278](https://github.com/apache/pinot/pull/17278)
* **Query event listeners.** [#17464](https://github.com/apache/pinot/pull/17464)
* **M3QL JavaCC parser** to replace custom tokenizer. [#17192](https://github.com/apache/pinot/pull/17192)
* **POST /query/timeseries** as BrokerResponse-compatible API. [#16531](https://github.com/apache/pinot/pull/16531)
* **Apache ECharts visualization** in Controller UI. [#16390](https://github.com/apache/pinot/pull/16390)
* **Timeseries language endpoint and UI integration.** [#16424](https://github.com/apache/pinot/pull/16424)
* **Query stats panel** in timeseries UI. [#17423](https://github.com/apache/pinot/pull/17423)
* DeltaDelta/Delta compression table config validation. [#17252](https://github.com/apache/pinot/pull/17252)

## Aggregation Function Improvements

### ANY\_VALUE Support [#16678](https://github.com/apache/pinot/pull/16678)

New `ANY_VALUE` aggregation function that returns an arbitrary value from a column for each group. Useful for including columns in SELECT with a 1:1 mapping to GROUP BY columns without forcing them into the GROUP BY clause.

### Smart Distinct Count Functions

* **DistinctCountSmartHLL**: Improved for dictionary-encoded columns, automatically choosing exact counting for low cardinality and HLL++ for high cardinality. [#17411](https://github.com/apache/pinot/pull/17411), [#17011](https://github.com/apache/pinot/pull/17011)
* **DistinctCountSmartULL**: Smart distinct count backed by UltraLogLog (ULL) with configurable promotion threshold from exact set to ULL sketch. [#16605](https://github.com/apache/pinot/pull/16605)

### Automatic Aggregation Rewrites

The query engine now automatically rewrites aggregation functions to type-specific optimized variants:

* `MIN/MAX` on string columns rewrites to `MINSTRING/MAXSTRING`. [#16980](https://github.com/apache/pinot/pull/16980)
* `MIN/MAX/SUM` on long columns rewrites to `MINLONG/MAXLONG/SUMLONG`. [#17058](https://github.com/apache/pinot/pull/17058)
* New `SumIntAggregationFunction` for optimized INT column aggregation. [#16704](https://github.com/apache/pinot/pull/16704)
* LONG-specific MIN/MAX/SUM aggregation functions. [#17001](https://github.com/apache/pinot/pull/17001)
* Dictionary/metadata based optimization for MINSTRING/MAXSTRING. [#16983](https://github.com/apache/pinot/pull/16983)
* Comprehensive AggregationOptimizer framework for multiple functions. [#16399](https://github.com/apache/pinot/pull/16399)

### Multi-Value Aggregation Consolidation [#17519](https://github.com/apache/pinot/pull/17519), [#17109](https://github.com/apache/pinot/pull/17109)

Multi-value aggregation functions (e.g., `COUNTMV`, `SUMMV`) have been consolidated into regular single-value aggregation functions, simplifying the user experience.

### Star-Tree Index on MV Columns [#16836](https://github.com/apache/pinot/pull/16836)

Star-tree index now supports aggregation on multi-value columns (e.g., `AVGMV`), enabling pre-aggregated queries on MV data.

### Other Aggregation Improvements

* AVG window function value aggregator. [#17268](https://github.com/apache/pinot/pull/17268)
* ListAggMv (multi-value list aggregation) with distinct variant. [#17155](https://github.com/apache/pinot/pull/17155)
* ArrayAgg on multi-value columns. [#17153](https://github.com/apache/pinot/pull/17153)
* Handle empty tables for various aggregation functions. [#17750](https://github.com/apache/pinot/pull/17750)

## New Scalar Functions and Operators

* **INITCAP**: Capitalizes the first letter of each word. [#17642](https://github.com/apache/pinot/pull/17642)
* **IP Address Functions**: Suite of IP address manipulation functions. [#17127](https://github.com/apache/pinot/pull/17127)
* **Logical Functions**: AND, OR, NOT, IF, and related boolean logic functions. [#17189](https://github.com/apache/pinot/pull/17189)
* **RAND with determinism**: Scalar functions now advertise determinism; RAND supports both runtime-only and seeded modes. [#17208](https://github.com/apache/pinot/pull/17208)
* **arrayPushFront / arrayPushBack**: Prepend or append elements to arrays. [#17140](https://github.com/apache/pinot/pull/17140)
* **ARRAYHASANY**: Check if two arrays share any common element. [#17156](https://github.com/apache/pinot/pull/17156)
* **Ngrams MV UDFs**: Generate n-grams from text as multi-value columns. [#16671](https://github.com/apache/pinot/pull/16671)
* **String functions for regex and distance**: New string manipulation functions. [#17372](https://github.com/apache/pinot/pull/17372)
* **jsonExtractKey depth control and dot notation**. [#16306](https://github.com/apache/pinot/pull/16306)
* **Sanitization, Special, and Time column transformers**. [#17740](https://github.com/apache/pinot/pull/17740)
* **LIKE predicate now case-insensitive**. [#16568](https://github.com/apache/pinot/pull/16568)
* **REGEXP\_LIKE enhancements**: Scan dictionary when small; configure switch for dictionary vs raw scan. [#16478](https://github.com/apache/pinot/pull/16478), [#16879](https://github.com/apache/pinot/pull/16879)
* **StringFunctions.replace() enhanced**. [#16528](https://github.com/apache/pinot/pull/16528)

## Indexing and Storage

### N-gram Filtering Index [#16364](https://github.com/apache/pinot/pull/16364)

New realtime N-gram filtering index that efficiently prunes non-matching strings before final regex validation. Benchmarks show **100x matching speed improvement** compared to brute-force regex matching. The index extracts N-grams of configurable length and builds posting lists for each N-gram.
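The prune-then-validate flow can be sketched in Python as below. This is a toy model under stated assumptions, not the Pinot index: the `NgramIndex` class is hypothetical, it uses fixed-length trigrams, and the caller passes a literal substring that every regex match is known to contain.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def ngrams(text: str, n: int) -> Set[str]:
    """All substrings of length n (empty if the text is shorter than n)."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


class NgramIndex:
    def __init__(self, n: int = 3) -> None:
        self.n = n
        self.values: List[str] = []
        self.postings: Dict[str, Set[int]] = defaultdict(set)

    def add(self, value: str) -> None:
        doc_id = len(self.values)
        self.values.append(value)
        for gram in ngrams(value, self.n):
            self.postings[gram].add(doc_id)

    def candidates(self, literal: str) -> Set[int]:
        """Doc IDs containing every n-gram of the literal."""
        grams = ngrams(literal, self.n)
        if not grams:
            # Literal too short to prune with: every doc is a candidate.
            return set(range(len(self.values)))
        result = None
        for gram in grams:
            docs = self.postings.get(gram, set())
            result = docs if result is None else result & docs
            if not result:
                return set()
        return set(result)

    def search(self, literal: str, pattern: str) -> List[int]:
        """Prune with the index, then validate survivors with the regex."""
        regex = re.compile(pattern)
        return [d for d in sorted(self.candidates(literal))
                if regex.search(self.values[d])]


index = NgramIndex(n=3)
for line in ["error: disk full", "warning: disk at 80%",
             "error: network down", "ok"]:
    index.add(line)
```

Only the candidate documents reach the regex, so the expensive match runs on a small fraction of the values when the literal is selective.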

### IFST Index for Case-Insensitive Regex [#16276](https://github.com/apache/pinot/pull/16276)

New case-insensitive FST (IFST) index type, available alongside the existing FST index, for case-insensitive regex matching. A third parameter in `REGEXP_LIKE` specifies case sensitivity.

### Minimum Should Match in Lucene Text Search [#16650](https://github.com/apache/pinot/pull/16650)

Added OpenSearch-compatible `minimum_should_match` support for TEXT\_MATCH queries, enabling control over how many terms must match in boolean queries. Supports both integer and percentage-based specifications.
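As a rough illustration of the two specification styles, the Python sketch below computes how many optional terms must match. It assumes that percentages are rounded down, as in the OpenSearch documentation, and handles only non-negative integers and percentages; the helper names are hypothetical and this is not the Pinot parser.

```python
from typing import Iterable, Set


def required_matches(spec: str, optional_clause_count: int) -> int:
    """Turn a spec such as "2" or "75%" into a required match count."""
    spec = spec.strip()
    if spec.endswith("%"):
        percent = int(spec[:-1])
        # Percentages are rounded down (assumed OpenSearch-style behaviour).
        required = (optional_clause_count * percent) // 100
    else:
        required = int(spec)
    # Never require more clauses than the query contains.
    return max(0, min(required, optional_clause_count))


def document_matches(document_terms: Set[str],
                     query_terms: Iterable[str],
                     spec: str) -> bool:
    unique_terms = set(query_terms)
    hits = sum(1 for term in unique_terms if term in document_terms)
    return hits >= required_matches(spec, len(unique_terms))


doc = {"apache", "pinot", "realtime"}
```

For example, with four query terms a spec of `75%` requires three of them to match, while a spec of `2` requires two.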

### Combined Lucene Text Index Files [#16688](https://github.com/apache/pinot/pull/16688)

Lucene text index files can now be combined and merged into the `columns.psf` segment file, reducing the number of files per segment and improving storage management.

### Column Major Segment Build [#16727](https://github.com/apache/pinot/pull/16727)

New feature to build segments in columnar fashion for columnar input data sources (e.g., Pinot segments, Parquet files). This significantly improves segment rebuild performance for minion operations like schema or index config changes.

### Other Indexing Improvements

* Enable JSON index for MAP data type. [#16808](https://github.com/apache/pinot/pull/16808)
* Match prefix phrase query Lucene parser. [#16476](https://github.com/apache/pinot/pull/16476)
* NoDict hybrid cardinality: exact counting up to 2,048, then HLL++ with +10% cap. [#17186](https://github.com/apache/pinot/pull/17186)
* Use v4 as default raw index version. [#16943](https://github.com/apache/pinot/pull/16943)
* Lucene doc ID mapping for offline segments when storeInSegmentFile is true. [#17437](https://github.com/apache/pinot/pull/17437)
* Make ForwardIndexReader pluggable. [#16363](https://github.com/apache/pinot/pull/16363)
* Don't fail segment build if star-tree index build fails; skip and support rollback. [#17028](https://github.com/apache/pinot/pull/17028)
* Star-tree validation for \* column for non-COUNT functions. [#17008](https://github.com/apache/pinot/pull/17008)
* Rebuild H3 index on segment reload if resolution config is updated. [#16953](https://github.com/apache/pinot/pull/16953)

## Query Performance Optimizations

* **Early short circuit with AND/OR optimizations**: Skip evaluation of remaining predicates when result is already determined. [#16583](https://github.com/apache/pinot/pull/16583)
* **DescDocIdSetOperator**: Improves ORDER BY DESC performance on sorted columns by scanning in reverse order. [#16789](https://github.com/apache/pinot/pull/16789)
* **Trim group when orderBy = groupBy**: Sort-aggregate and pair-wise merge optimization when ORDER BY keys match GROUP BY keys, trimming to LIMIT per segment. [#16308](https://github.com/apache/pinot/pull/16308)
* **Skip per-row convertTypes()** in FunctionOperand when types already match. [#17730](https://github.com/apache/pinot/pull/17730)
* **Reduce array copying** for non-equi join conditions in LookupJoinOperator. [#17542](https://github.com/apache/pinot/pull/17542)
* **Force colocated join** when `is_colocated_by_join_keys` is provided for semi-join. [#17273](https://github.com/apache/pinot/pull/17273)
* **Optimized cloud file listing** with per-page filtering and early termination in PinotFS. [#17847](https://github.com/apache/pinot/pull/17847)
* **JSON message decoding optimization**: Parse directly to Map, avoiding intermediate JsonNode. [#17485](https://github.com/apache/pinot/pull/17485)
* **Optimize ProtoBufRecordExtractor** with field descriptor caching. [#17593](https://github.com/apache/pinot/pull/17593)
* **Optimize group ID generation** by reducing memory allocation. [#16798](https://github.com/apache/pinot/pull/16798)
* **Reduce memory footprint** for MailboxContentObserver and ReceivingMailbox. [#16872](https://github.com/apache/pinot/pull/16872)

## Query Resource Isolation

### Workload Scheduler [#16018](https://github.com/apache/pinot/pull/16018)

An improvement over Binary Workload Scheduler that verifies CPU/memory budgets at admission time. Uses `WorkloadBudgetManager.canAdmitQuery()` before scheduling, with O(1) atomic-counter budget checks adding negligible overhead.
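A simplified model of admission-time budget checking is sketched below. Only the method name `can_admit_query` mirrors the `canAdmitQuery()` call mentioned above; the class, the budget fields, the use of a lock in place of atomic counters, and the choice to reject unknown workloads are all assumptions made for the example.

```python
import threading
from dataclasses import dataclass
from typing import Dict


@dataclass
class Budget:
    cpu_ns_remaining: int
    memory_bytes_remaining: int


class WorkloadBudgetSketch:
    """Hypothetical stand-in for a per-workload budget manager."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._budgets: Dict[str, Budget] = {}

    def set_budget(self, workload: str, cpu_ns: int, memory_bytes: int) -> None:
        with self._lock:
            self._budgets[workload] = Budget(cpu_ns, memory_bytes)

    def can_admit_query(self, workload: str) -> bool:
        """Constant-time check: admit only while both budgets remain."""
        with self._lock:
            budget = self._budgets.get(workload)
            # Unknown workloads are rejected here (a choice of this sketch).
            return (budget is not None
                    and budget.cpu_ns_remaining > 0
                    and budget.memory_bytes_remaining > 0)

    def charge(self, workload: str, cpu_ns: int, memory_bytes: int) -> None:
        """Deduct the resources a finished query consumed."""
        with self._lock:
            budget = self._budgets[workload]
            budget.cpu_ns_remaining -= cpu_ns
            budget.memory_bytes_remaining -= memory_bytes


manager = WorkloadBudgetSketch()
manager.set_budget("dashboards", cpu_ns=1_000_000, memory_bytes=1_024)
admitted_before = manager.can_admit_query("dashboards")
manager.charge("dashboards", cpu_ns=1_000_000, memory_bytes=100)
admitted_after = manager.can_admit_query("dashboards")
```

Because the check is a pair of comparisons on counters, it can run before every query is scheduled without adding meaningful latency.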

### Cost-Split Support [#16672](https://github.com/apache/pinot/pull/16672)

Workload budgets (CPU/memory) can now be split across tables/tenants with support for real-time splits through custom overrides per propagation scheme (e.g., CONSUMING vs COMPLETED stages).

### Other Resource Isolation Improvements

* Queue-based throttling for ThrottleOnCriticalHeapUsageExecutor (graceful degradation instead of immediate rejection). [#16409](https://github.com/apache/pinot/pull/16409)
* Byte-based throttling. [#16572](https://github.com/apache/pinot/pull/16572)
* Queries self-terminate in panic mode. [#16380](https://github.com/apache/pinot/pull/16380)
* QUERIES\_THROTTLED metric. [#16676](https://github.com/apache/pinot/pull/16676)
* Workload stats collection interface. [#16340](https://github.com/apache/pinot/pull/16340)

## Security and TLS

* **TLS protocol allowlist**: Allow configuring permitted TLS protocols and hardened JVM SSL initialization. [#17776](https://github.com/apache/pinot/pull/17776)
* **Centralized SSL context**: Set SSL context in Controller/Broker/Server contexts for consistent TLS configuration. [#17358](https://github.com/apache/pinot/pull/17358)
* **Runtime TLS diagnostics** for gRPC and HTTPS connections. [#17559](https://github.com/apache/pinot/pull/17559)
* **Access control for segment download API**. [#17508](https://github.com/apache/pinot/pull/17508)
* **MSE gRPC channel authorization**. [#16475](https://github.com/apache/pinot/pull/16475)
* **@Authenticate annotation** on critical endpoints. [#17552](https://github.com/apache/pinot/pull/17552)
* **SPI-based token resolver** for custom audit identity resolution. [#17658](https://github.com/apache/pinot/pull/17658)
* **Audit logging system**: Comprehensive audit logging support with cluster config listener, response auditing, request body handling with size limits, URL filter patterns, and component-specific config prefixes. [#16616](https://github.com/apache/pinot/pull/16616), [#16649](https://github.com/apache/pinot/pull/16649), [#16851](https://github.com/apache/pinot/pull/16851), [#16747](https://github.com/apache/pinot/pull/16747), [#16807](https://github.com/apache/pinot/pull/16807), [#16823](https://github.com/apache/pinot/pull/16823), [#16766](https://github.com/apache/pinot/pull/16766)
* **SAS token authentication** for Azure Data Lake Storage (ADLS). [#16343](https://github.com/apache/pinot/pull/16343)
* **Validate gRPC broker connections on connect**. [#17835](https://github.com/apache/pinot/pull/17835)
* **Refresh AWS credentials** at runtime. [#16553](https://github.com/apache/pinot/pull/16553)

## Minion / Task Framework

* **Table-level distributed locking** for atomic minion task generation across controllers. [#16857](https://github.com/apache/pinot/pull/16857)
* **Configurable max tasks per minion instance**. [#16981](https://github.com/apache/pinot/pull/16981)
* **Cluster-level max subtasks configuration**. [#16571](https://github.com/apache/pinot/pull/16571)
* **Subtask timing metrics**. [#17190](https://github.com/apache/pinot/pull/17190)
* **Task data cleanup on table deletion**. [#16307](https://github.com/apache/pinot/pull/16307)
* **Tag minion as Drained**. [#17375](https://github.com/apache/pinot/pull/17375)
* **GET /minions/status** API endpoint. [#17475](https://github.com/apache/pinot/pull/17475)
* **API for all minion tasks and summaries**. [#17330](https://github.com/apache/pinot/pull/17330)
* **Bound task queue size** to prevent ZK bloat. [#17741](https://github.com/apache/pinot/pull/17741)
* **LaunchBackfillIngestionJob** for complete backfill support. [#16890](https://github.com/apache/pinot/pull/16890)
* **Metadata push mode** in BaseSingleSegmentConversionExecutor. [#17632](https://github.com/apache/pinot/pull/17632)
* **PurgeTask auto-deletes empty segments**. [#16368](https://github.com/apache/pinot/pull/16368)

## Rebalance and Cluster Management

* **Disaster Recovery Mode** as Controller/Cluster config. [#17243](https://github.com/apache/pinot/pull/17243)
* **Tenant Rebalance Cancellation**. [#16886](https://github.com/apache/pinot/pull/16886)
* **TenantRebalanceChecker** for improved monitoring. [#16455](https://github.com/apache/pinot/pull/16455)
* **Peer-download enabled table rebalance** handling. [#16341](https://github.com/apache/pinot/pull/16341)
* **Helix constraint on state transition messages**. [#16933](https://github.com/apache/pinot/pull/16933)
* **StateTransitionThreadPoolManager** to override executor in SegmentOnlineOfflineStateModelFactory. [#17453](https://github.com/apache/pinot/pull/17453)
* **Resume partially offline replicas** in controller validation. [#17754](https://github.com/apache/pinot/pull/17754)
* **Configurable server segment predownload**. [#17287](https://github.com/apache/pinot/pull/17287)
* **Batch deletion** of segments in the deleted-segments directory. [#16848](https://github.com/apache/pinot/pull/16848)
* **Async segment refresh** processing when enabled. [#16931](https://github.com/apache/pinot/pull/16931)
* Enable deep store upload retry for Pauseless by default. [#17241](https://github.com/apache/pinot/pull/17241)

## Observability and Metrics

* **Query Fingerprint**: Configurable AST-based query fingerprinting that replaces literals with `?` to create canonical query representations. Enriches logging via MDC and structured logs. [#17177](https://github.com/apache/pinot/pull/17177)
* **FlameGraph visualization** for MSE query stage stats with clock time and allocation modes. [#17263](https://github.com/apache/pinot/pull/17263)
* **Memory and GC time tracking** in query execution statistics. [#16576](https://github.com/apache/pinot/pull/16576)
* **SLA-style per-query error metrics** on broker. [#17457](https://github.com/apache/pinot/pull/17457)
* **Query response size metrics** on broker and server. [#17412](https://github.com/apache/pinot/pull/17412), [#17710](https://github.com/apache/pinot/pull/17710)
* **Segment download latency metric**. [#16375](https://github.com/apache/pinot/pull/16375)
* **Netty and gRPC memory metrics** (total max and used). [#16939](https://github.com/apache/pinot/pull/16939)
* **Zone failure tolerant segments metric** on broker. [#16334](https://github.com/apache/pinot/pull/16334)
* **Segment reload failure tracking** with in-memory status cache and segment-level details. [#17099](https://github.com/apache/pinot/pull/17099), [#17234](https://github.com/apache/pinot/pull/17234)
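
To illustrate the query fingerprint idea from the list above, here is a small Python sketch that replaces literals with `?`. Note the swapped-in technique: Pinot's fingerprinting is described as AST-based, whereas this example uses regular expressions for brevity, so it only approximates the behaviour and handles just single-quoted strings and plain numeric literals.

```python
import re

# Matches SQL literals: quoted strings first, then bare numbers.
_LITERAL = re.compile(
    r"""
    '(?:[^']|'')*'          # single-quoted string; '' is an escaped quote
    | \b\d+(?:\.\d+)?\b     # integer or decimal literal
    """,
    re.VERBOSE,
)


def fingerprint(sql: str) -> str:
    """Replace literals with ? and collapse whitespace (regex approximation)."""
    without_literals = _LITERAL.sub("?", sql)
    return re.sub(r"\s+", " ", without_literals).strip()


q1 = "SELECT col1 FROM t1 WHERE a = 5 AND b = 'x y' LIMIT 10"
q2 = "SELECT col1  FROM t1 WHERE a = 42 AND b = 'it''s 7' LIMIT 100"
```

Both sample queries reduce to the same canonical string, which is what allows logs and metrics to be grouped by query shape.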

## Connectors and Clients

### Pinot CLI [#17029](https://github.com/apache/pinot/pull/17029)

New `pinot-cli` module providing a modern interactive and batch command-line client for Apache Pinot. Features include multi-line SQL REPL with history, multiple output formats (CSV/TSV/JSON/ALIGNED/VERTICAL/MARKDOWN), config via properties file, and support for extra HTTP headers.

### Comprehensive Pinot Admin Client [#17040](https://github.com/apache/pinot/pull/17040)

New admin client in `pinot-java-client` providing complete programmatic access to all Pinot controller REST endpoints. Includes 6 service clients (Table, Schema, Instance, Segment, Tenant, Task), authentication support, async operations, and comprehensive error handling.

### Other Client Improvements

* **Cursor-based pagination** support in Java client. [#16782](https://github.com/apache/pinot/pull/16782)
* **Apache Arrow format decoder** for stream data ingestion with 20-30% Kafka data volume reduction. [#17031](https://github.com/apache/pinot/pull/17031)
* **X-Correlation-Id header** in Java client HTTP requests. [#17863](https://github.com/apache/pinot/pull/17863)
* **Spark connector improvements**: Pinot Proxy, unified secureMode, comprehensive gRPC support. [#16666](https://github.com/apache/pinot/pull/16666)

## Controller UI Improvements

* **Logical table management UI**. [#17878](https://github.com/apache/pinot/pull/17878)
* **Primary key display** in schema UI. [#17566](https://github.com/apache/pinot/pull/17566)
* **Instance liveness vs health** distinction. [#17450](https://github.com/apache/pinot/pull/17450), [#17248](https://github.com/apache/pinot/pull/17248)
* **Minion task stats and status filtering**. [#16521](https://github.com/apache/pinot/pull/16521)
* **Package Versions** section in cluster manager. [#16534](https://github.com/apache/pinot/pull/16534)
* **Time zone support** and segment start/end time in summary. [#16489](https://github.com/apache/pinot/pull/16489)
* **Webpack 5 and TypeScript 5.8** migration. [#16365](https://github.com/apache/pinot/pull/16365)

## Other Notable Features

* **ignoreMissingSegments query option**: Skip SERVER\_SEGMENT\_MISSING errors with broker auto-config. [#16556](https://github.com/apache/pinot/pull/16556)
* **$partitionId virtual column**: Access the partition ID as a virtual column in queries. [#16721](https://github.com/apache/pinot/pull/16721)
* **Pluggable query routing strategy**. [#17364](https://github.com/apache/pinot/pull/17364)
* **Pluggable table samplers** with precomputed broker routing entries. [#17532](https://github.com/apache/pinot/pull/17532)
* **Storage quota check for batch segment upload**. [#17512](https://github.com/apache/pinot/pull/17512)
* **Disk utilization check extended to offline segment upload**. [#17579](https://github.com/apache/pinot/pull/17579)
* **PinotFS streamed untar** for segment download. [#17586](https://github.com/apache/pinot/pull/17586)
* **Global `pinot.md5.disabled` switch** to enforce MD5 guards. [#17800](https://github.com/apache/pinot/pull/17800)
* **Custom map metadata** in segment metadata and upload. [#17055](https://github.com/apache/pinot/pull/17055)
* **Segment tier movement API**: Select all segments at once for tier migration. [#17812](https://github.com/apache/pinot/pull/17812)
* **Custom retention time** for untracked segments. [#16719](https://github.com/apache/pinot/pull/16719)
* **Enhanced /tableConfigs/validate** with cluster-aware validations. [#16675](https://github.com/apache/pinot/pull/16675)
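
As a rough illustration of the `ignoreMissingSegments` option and the `$partitionId` virtual column listed above, the sketch below builds the JSON body for a broker SQL query. It rests on several assumptions that these notes do not confirm: the option is passed with a `SET` statement prefix, the virtual column can be used in `GROUP BY`, and the table name `myTable` and the broker address are hypothetical placeholders. The function only constructs the payload and sends nothing.

```python
import json

# Hypothetical broker address; adjust to your own deployment.
BROKER_QUERY_URL = "http://localhost:8099/query/sql"


def build_query_payload(table: str, ignore_missing_segments: bool = True) -> str:
    """Return a JSON request body that counts rows per partition.

    Assumption: the query option is set with a `SET` prefix and
    `$partitionId` is usable as a GROUP BY key.
    """
    sql = (
        "SELECT $partitionId, COUNT(*) AS rowCount "
        f"FROM {table} "
        "GROUP BY $partitionId"
    )
    if ignore_missing_segments:
        # Prefix the statement with the query option named in the release notes.
        sql = "SET ignoreMissingSegments = true; " + sql
    return json.dumps({"sql": sql})


if __name__ == "__main__":
    # Print the body that would be POSTed to BROKER_QUERY_URL.
    print(build_query_payload("myTable"))
```

The printed body could then be sent as an HTTP POST to the broker with any HTTP client. Check the linked pull requests for the exact option semantics before relying on this form.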

## Notable Bug Fixes

* Fix broken Lucene query tracking and cancellation for OOM protection. [#17884](https://github.com/apache/pinot/pull/17884)
* Fix `NOT TEXT_MATCH` false positives on consuming segments. [#17880](https://github.com/apache/pinot/pull/17880)
* Fix incorrect stream partition id for multi-stream realtime consumption. [#17953](https://github.com/apache/pinot/pull/17953)
* Fix double-unescaping of SQL string literals in single-stage engine. [#17438](https://github.com/apache/pinot/pull/17438)
* Fix broker write failure handling in ServerChannels. [#17861](https://github.com/apache/pinot/pull/17861)
* Fix server outbound write failure creating zombie channels. [#17845](https://github.com/apache/pinot/pull/17845)
* Fix QueryServer channel cleanup on close. [#17854](https://github.com/apache/pinot/pull/17854)
* Fix negative reload estimated time and year 2088 rebalance timestamp. [#17833](https://github.com/apache/pinot/pull/17833)
* Fix ResponseStoreCleaner bugs causing unbounded response store growth. [#17622](https://github.com/apache/pinot/pull/17622)
* Fix funnel aggregation worker threads not responding to timeout/cancellation. [#17692](https://github.com/apache/pinot/pull/17692)
* Fix off-heap memory spike issue. [#17489](https://github.com/apache/pinot/pull/17489)
* Fix SSE query optimization regression for hybrid tables. [#17751](https://github.com/apache/pinot/pull/17751)
* Fix bug with COUNT(col) when using star-tree index with null handling enabled. [#17106](https://github.com/apache/pinot/pull/17106)
* Fix null handling issue for string to timestamp casts. [#16885](https://github.com/apache/pinot/pull/16885)
* Fix null values in JsonExtractScalarTransformFunction with default value support. [#16683](https://github.com/apache/pinot/pull/16683)
* Handle null values gracefully in SSE post aggregation functions. [#16711](https://github.com/apache/pinot/pull/16711)
* Fix CAST evaluation with literal-only operands in MSE. [#16421](https://github.com/apache/pinot/pull/16421)
* Fix AWS SDK credential provider thread leak in S3PinotFS. [#17869](https://github.com/apache/pinot/pull/17869)
* Fix S3PinotFS URL encoding for S3-compatible storage. [#17691](https://github.com/apache/pinot/pull/17691)
* Improve GcsPinotFS deleteBatch resiliency. [#17713](https://github.com/apache/pinot/pull/17713)
* Fix DirectOOMHandler string matching bug for Netty direct memory OOM. [#17684](https://github.com/apache/pinot/pull/17684)
* Fix early termination on MSE operator. [#16696](https://github.com/apache/pinot/pull/16696)
* Fix lead/lag window functions filling beyond num rows. [#17348](https://github.com/apache/pinot/pull/17348)
* Fix truncate(value) signed-zero and non-finite behavior. [#17677](https://github.com/apache/pinot/pull/17677)
* Fix complex field serialization and deserialization. [#17685](https://github.com/apache/pinot/pull/17685)
* Fix metadata APIs to report primary key counts for dedup tables. [#17736](https://github.com/apache/pinot/pull/17736)
* Fix non-daemon threads blocking JVM shutdown in RenewableTlsUtils. [#17221](https://github.com/apache/pinot/pull/17221)
* Fix NPE in TableRebalancer. [#17723](https://github.com/apache/pinot/pull/17723)
* Fix multi-stage query stats serde compatibility issue caused by UNNEST. [#17590](https://github.com/apache/pinot/pull/17590)

