1.5.0

Release Notes for 1.5.0

This release delivers significant improvements across the Multi-stage Query Engine (new UNNEST support, enriched joins, aggregation rewrites), Upsert (offline table support, commit-time compaction, xxhash compression), a new Federation / Multi-Cluster Routing framework, Kafka 4.x support, real-time auto-reset, Time Series Engine enhancements, new indexing capabilities (N-gram, IFST, combined Lucene), Query Resource Isolation, and numerous performance optimizations, security hardening, and bug fixes. It includes over 1100 commits from the community.

Multi-stage Query Engine Enhancements

UNNEST / CROSS JOIN UNNEST Support #17168

The multi-stage engine now natively supports array unnesting via CROSS JOIN UNNEST(...), with optional WITH ORDINALITY. This enables flattening array columns in queries without workarounds, a frequently requested SQL feature.
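
The row-level effect of CROSS JOIN UNNEST(...) WITH ORDINALITY can be sketched in plain Python. This is an emulation of standard SQL semantics, not Pinot code: the function name, the sample rows, and the choice to drop the array column from the output are assumptions made for readability.

```python
from typing import Any, Dict, Iterable, Iterator, Optional


def cross_join_unnest(
    rows: Iterable[Dict[str, Any]],
    array_col: str,
    elem_name: str,
    ord_name: Optional[str] = None,
) -> Iterator[Dict[str, Any]]:
    """Emit one output row per array element (hypothetical helper)."""
    for row in rows:
        # A row with an empty array yields nothing, as in a cross join.
        for position, value in enumerate(row[array_col], start=1):
            out = {k: v for k, v in row.items() if k != array_col}
            out[elem_name] = value
            if ord_name is not None:
                out[ord_name] = position  # SQL ordinality is 1-based
            yield out


rows = [
    {"orderId": 1, "items": ["a", "b"]},
    {"orderId": 2, "items": []},
    {"orderId": 3, "items": ["c"]},
]
flattened = list(cross_join_unnest(rows, "items", "item", "pos"))
```

Order 2 disappears from the result because its array is empty, which is the usual cross-join behavior to keep in mind when flattening.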

Enriched Join Operator #16123

The MSE join operator has been enriched with improved execution strategies, enhancing join performance and flexibility for complex multi-table queries.

MSE Lite Mode Improvements

  • Disable joins in Lite Mode: Added a configuration to explicitly disallow joins in MSE Lite Mode for workloads that should remain scatter-gather only. #17030

  • Fan-out adjusted limit: Lite Mode now supports adjusted limits based on fan-out to reduce data transferred. #16822

  • Push collation from Exchange to Sort: Optimizes Lite Mode queries by pushing collation requirements into sort operators. #16551

Physical Optimizer for Logical Tables #17447

Extended the Physical Optimizer to support logical tables, enabling exchange simplification and optimized execution plans for queries against logical table abstractions.

UNION (Distinct) Support #16570

Added support for UNION (distinct) set operations in addition to the existing UNION ALL, enabling more performant deduplication of results from combined queries.

Other MSE Improvements

Federation / Multi-Cluster Routing (New)

Pinot 1.5.0 introduces a new federation framework enabling a single broker to route queries across multiple independent Pinot clusters. This is a foundational capability for organizations operating multiple Pinot clusters that need unified query access.

Logical Table Enhancements

Building on the logical table support introduced in 1.4.0, this release adds significant improvements:

Upsert and Dedup Enhancements

Upsert Support for Offline Tables #17789

Extends upsert (primary-key deduplication) to OFFLINE tables, a capability previously limited to REALTIME tables. This enables batch-ingested data to leverage primary-key-based deduplication with the same semantics, simplifying batch-only architectures that need upsert guarantees without requiring streaming infrastructure.

Commit Time Compaction #16344

A performance optimization that removes invalid and obsolete records during segment commit, before the segment becomes immutable. This addresses a fundamental challenge where committed segments contain significantly more physical records than logically valid records, reducing storage and improving query performance. Support for commit time compaction with column-major build was also added. #16769
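
The idea can be sketched as a filter applied at commit time. This is a simplified model under stated assumptions, not Pinot's implementation: records are dicts, the latest record per primary key is treated as the only valid one (real upsert tables decide validity with a comparison column), and both function names are hypothetical.

```python
from typing import Any, Dict, List, Set


def latest_doc_ids(records: List[Dict[str, Any]], key: str = "pk") -> Set[int]:
    """Return the docIds still valid under last-write-wins semantics."""
    latest: Dict[Any, int] = {}
    for doc_id, record in enumerate(records):
        latest[record[key]] = doc_id  # a later record replaces an earlier one
    return set(latest.values())


def compact_on_commit(
    records: List[Dict[str, Any]], valid_doc_ids: Set[int]
) -> List[Dict[str, Any]]:
    """Keep only valid records before the segment is sealed."""
    return [r for doc_id, r in enumerate(records) if doc_id in valid_doc_ids]


consumed = [
    {"pk": "a", "v": 1},
    {"pk": "b", "v": 2},
    {"pk": "a", "v": 3},  # supersedes docId 0
]
committed = compact_on_commit(consumed, latest_doc_ids(consumed))
```

Here the committed segment holds two records instead of three; the gap between physical and logically valid records is what the feature removes.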

xxHash for Primary Key Compression #17253

Introduces xxHash (xxh_128) as an option for upsert primary key compression. xxHash provides a lower memory footprint than MD5 with significantly lower collision probability, reducing upsert memory usage.

Persist QueryableDocIds on Disk #16517

QueryableDocIds are now persisted on disk alongside validDocIds snapshots, which is essential for supporting disk-based upsert implementations and faster segment preloading.

Deterministic Data-Only CRC #17264, #17380

Added a data-only CRC computation (excluding metadata) for new segments, persisted to ZK. This is used for realtime committing segments during replace operations to ensure deterministic segment identification regardless of metadata differences across replicas.

Other Upsert/Dedup Improvements

Real-Time Ingestion

Kafka 4.x Client Support #17633

Added a new pinot-kafka-4.0 stream ingestion module with Kafka 4.1.1 client support. This module uses KRaft mode (no ZooKeeper dependency) and provides full compatibility with Kafka 4.x clusters.

Kafka Subset-Partition Ingestion #17587

Tables can now consume only a subset of Kafka topic partitions, configured via stream.kafka.partition.ids. This enables flexible partition assignment for use cases where a single table doesn't need all partitions.
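
A minimal sketch of how such a setting might be interpreted follows. Only the key stream.kafka.partition.ids comes from these notes; the comma-separated value format, the topic name, and the helper function are assumptions for illustration.

```python
from typing import Dict, Iterable, List

# Fragment of a stream config; the value format "0,2,5" is an assumption.
stream_config: Dict[str, str] = {
    "streamType": "kafka",
    "stream.kafka.topic.name": "events",    # hypothetical topic
    "stream.kafka.partition.ids": "0,2,5",  # subset of partitions to consume
}


def partitions_to_consume(
    config: Dict[str, str], all_partitions: Iterable[int]
) -> List[int]:
    """Hypothetical helper: pick the partitions a table would consume."""
    raw = config.get("stream.kafka.partition.ids")
    if not raw:
        return sorted(all_partitions)  # no subset configured: consume all
    wanted = {int(p) for p in raw.split(",") if p.strip()}
    return sorted(p for p in all_partitions if p in wanted)


selected = partitions_to_consume(stream_config, range(8))
```

With an eight-partition topic, the table in this sketch would consume partitions 0, 2, and 5 and ignore the rest.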

Removal of Kafka 2.0 Plugin #17602

The deprecated Kafka 2.0 plugin has been removed. All ingestion is now standardized on Kafka 3.x (and 4.x) clients. The default Kafka consumer was also changed to Kafka 3. #16858

Auto Reset Offset During Ingestion Lag #16492, #16692, #16724

A three-part feature that enables automatic offset reset when ingestion lag exceeds configurable thresholds (by offset count or time). During segment commit, if lag exceeds the threshold, the new segment starts from the latest offset instead of the next offset. Part 2 introduces a topic "inactive" status, and Part 3 adds a backfill manager and handler interface for recovering skipped data.
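
The commit-time decision can be sketched as follows. This covers only the offset-count threshold (not the time-based one), and the function name, parameter names, and return shape are hypothetical rather than Pinot's API.

```python
from typing import Optional, Tuple


def next_segment_start_offset(
    next_offset: int, latest_offset: int, max_offset_lag: int
) -> Tuple[int, Optional[Tuple[int, int]]]:
    """Return (start offset, skipped range or None) for the new segment."""
    lag = latest_offset - next_offset
    if lag > max_offset_lag:
        # Jump to the head of the stream; report the skipped half-open
        # range [next_offset, latest_offset) so it can be backfilled later.
        return latest_offset, (next_offset, latest_offset)
    return next_offset, None  # lag is acceptable: continue where we left off


caught_up = next_segment_start_offset(100, 150, max_offset_lag=1000)
lagging = next_segment_start_offset(100, 5000, max_offset_lag=1000)
```

Returning the skipped range alongside the new start offset mirrors the role the notes give to the backfill manager: the reset trades completeness for freshness, and the gap has to be recorded somewhere to be recovered.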

Pause Consumption in Batches #17194

For tables with large numbers of partitions, pausing consumption can overload the controller as all consumers rush to commit. This adds batch configuration parameters to the pause consumption API, spreading the load.

Real-Time Table Replication Across Clusters #17235

First part of enabling real-time table replication between two Pinot clusters. Introduces functionality for creating a new real-time table with consuming segment watermarks from a source table's ZK metadata.

Other Ingestion Improvements

Time Series Engine (GA)

The Time Series Engine is now GA in Pinot 1.5.0, building on the beta from 1.4.0.

Delta/DeltaDelta Storage Encoding #15258

Added Delta and DeltaDelta compression codecs for time series storage, improving compression ratios for monotonically increasing or slowly changing numeric data.
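
To see why these encodings suit monotonically increasing data, here is a minimal sketch of the two transforms on integer lists. It illustrates the general technique only; Pinot's on-disk layout and bit packing are not shown, and the function names are hypothetical.

```python
from itertools import accumulate
from typing import List


def delta_encode(values: List[int]) -> List[int]:
    """Store the first value, then differences between neighbors."""
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


def delta_decode(encoded: List[int]) -> List[int]:
    """Invert delta_encode with a running sum."""
    return list(accumulate(encoded))


def delta_delta_encode(values: List[int]) -> List[int]:
    """Store the first value, the first delta, then deltas of deltas."""
    deltas = delta_encode(values)
    if len(deltas) < 2:
        return deltas
    return [deltas[0]] + delta_encode(deltas[1:])


def delta_delta_decode(encoded: List[int]) -> List[int]:
    """Invert delta_delta_encode."""
    if len(encoded) < 2:
        return list(encoded)
    return delta_decode([encoded[0]] + delta_decode(encoded[1:]))


timestamps = [1000, 1010, 1020, 1030, 1041]
d1 = delta_encode(timestamps)        # [1000, 10, 10, 10, 11]
d2 = delta_delta_encode(timestamps)  # [1000, 10, 0, 0, 1]
```

Regularly spaced timestamps collapse to runs of zeros under DeltaDelta, and small values like these need far fewer bits than the raw numbers.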

Query Execution Improvements

Aggregation Function Improvements

ANY_VALUE Support #16678

New ANY_VALUE aggregation function that returns an arbitrary value from a column for each group. Useful for including columns in SELECT with a 1:1 mapping to GROUP BY columns without forcing them into the GROUP BY clause.
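
The semantics can be sketched in Python as a group-by that keeps one value per group. Keeping the first value seen is an arbitrary choice made here; the function name and sample data are hypothetical.

```python
from typing import Any, Dict, Iterable


def group_with_any_value(
    rows: Iterable[Dict[str, Any]], group_key: str, any_col: str
) -> Dict[Any, Any]:
    """Map each group key to one value of any_col from that group."""
    result: Dict[Any, Any] = {}
    for row in rows:
        # setdefault keeps the first value seen; any other pick is equally valid.
        result.setdefault(row[group_key], row[any_col])
    return result


rows = [
    {"userId": 1, "userName": "ana", "amount": 10},
    {"userId": 2, "userName": "bo", "amount": 5},
    {"userId": 1, "userName": "ana", "amount": 7},
]
names = group_with_any_value(rows, "userId", "userName")
```

Because userName maps 1:1 to userId in this data, the result is deterministic no matter which row of a group is picked; without that 1:1 mapping, the returned value would be unspecified.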

Smart Distinct Count Functions

  • DistinctCountSmartHLL: Improved for dictionary-encoded columns, automatically choosing exact counting for low cardinality and HLL++ for high cardinality. #17411, #17011

  • DistinctCountSmartULL: Smart distinct count backed by UltraLogLog (ULL) with configurable promotion threshold from exact set to ULL sketch. #16605

Automatic Aggregation Rewrites

The query engine now automatically rewrites aggregation functions to type-specific optimized variants:

Multi-Value Aggregation Consolidation #17519, #17109

Multi-value aggregation functions (e.g., COUNTMV, SUMMV) have been consolidated into regular single-value aggregation functions, simplifying the user experience.

Star-Tree Index on MV Columns #16836

Star-tree index now supports aggregation on multi-value columns (e.g., AVGMV), enabling pre-aggregated queries on MV data.

Other Aggregation Improvements

New Scalar Functions and Operators

Indexing and Storage

N-gram Filtering Index #16364

New realtime N-gram filtering index that efficiently prunes non-matching strings before final regex validation. Benchmarks show a 100x matching speed improvement compared to brute-force regex matching. The index extracts N-grams of configurable length and builds posting lists for each N-gram.
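
The prune-then-validate flow can be sketched with an in-memory index. This is a toy model of the technique, not Pinot's implementation: the class and method names are hypothetical, and the literal that the regex requires is passed in explicitly rather than extracted from the pattern.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def ngrams(text: str, n: int) -> Set[str]:
    """All substrings of length n."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


class NgramIndex:
    def __init__(self, values: List[str], n: int = 3) -> None:
        self.values = list(values)
        self.n = n
        self.postings: Dict[str, Set[int]] = defaultdict(set)
        for doc_id, value in enumerate(self.values):
            for gram in ngrams(value, n):
                self.postings[gram].add(doc_id)

    def candidates(self, literal: str) -> Set[int]:
        """Docs containing every N-gram of the required literal."""
        grams = ngrams(literal, self.n)
        if not grams:
            return set(range(len(self.values)))  # literal too short to prune
        return set.intersection(*(self.postings.get(g, set()) for g in grams))

    def search(self, literal: str, pattern: str) -> List[int]:
        """Prune with the index, then run the regex only on survivors."""
        regex = re.compile(pattern)
        return sorted(
            d for d in self.candidates(literal) if regex.search(self.values[d])
        )


index = NgramIndex(
    ["checkout-service", "payment-service", "checkout-worker", "search"]
)
pruned = index.candidates("checkout")
hits = index.search("checkout", r"checkout-s\w+")
```

The posting-list intersection can return false positives (a string may contain every N-gram without containing the literal), which is why the final regex validation is still required.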

IFST Index for Case-Insensitive Regex #16276

New Insensitive FST (IFST) index type alongside the existing FST for case-insensitive regex matching. A third parameter in REGEXP_LIKE specifies case sensitivity.

Minimum Should Match in Lucene Text Search #16650

Added OpenSearch-compatible minimum_should_match support for TEXT_MATCH queries, enabling control over how many terms must match in boolean queries. Supports both integer and percentage-based specifications.
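
How an integer or percentage specification translates into a required number of matching terms can be sketched as below. The sketch assumes positive values only and rounds percentages down, following the OpenSearch convention; the function names are hypothetical and the way Pinot accepts the option in TEXT_MATCH is not shown.

```python
from typing import Iterable, Set, Union


def required_matches(spec: Union[int, str], optional_clauses: int) -> int:
    """Translate a minimum_should_match spec into a clause count."""
    if isinstance(spec, str) and spec.endswith("%"):
        # Percentage of the optional clauses, rounded down.
        needed = (optional_clauses * int(spec[:-1])) // 100
    else:
        needed = int(spec)
    return max(0, min(needed, optional_clauses))


def document_matches(
    doc_terms: Set[str], query_terms: Iterable[str], spec: Union[int, str]
) -> bool:
    """True if enough query terms occur in the document."""
    terms = list(query_terms)
    hits = sum(1 for term in terms if term in doc_terms)
    return hits >= required_matches(spec, len(terms))


query = ["apache", "pinot", "kafka"]
needs_two = required_matches("75%", len(query))  # 3 * 75 // 100 == 2
ok = document_matches({"apache", "pinot", "olap"}, query, "75%")
too_few = document_matches({"apache"}, query, 2)
```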

Combined Lucene Text Index Files #16688

Lucene text index files can now be combined and merged into the columns.psf segment file, reducing the number of files per segment and improving storage management.

Column Major Segment Build #16727

New feature to build segments in columnar fashion for columnar input data sources (e.g., Pinot segments, Parquet files). This significantly improves segment rebuild performance for minion operations like schema or index config changes.

Other Indexing Improvements

Query Performance Optimizations

  • Early short circuit with AND/OR optimizations: Skip evaluation of remaining predicates when result is already determined. #16583

  • DescDocIdSetOperator: Improves ORDER BY DESC performance on sorted columns by scanning in reverse order. #16789

  • Trim group when orderBy = groupBy: Sort-aggregate and pair-wise merge optimization when ORDER BY keys match GROUP BY keys, trimming to LIMIT per segment. #16308

  • Skip per-row convertTypes() in FunctionOperand when types already match. #17730

  • Reduce array copying for non-equi join conditions in LookupJoinOperator. #17542

  • Force colocated join when is_colocated_by_join_keys is provided for semi-join. #17273

  • Optimized cloud file listing with per-page filtering and early termination in PinotFS. #17847

  • JSON message decoding optimization: Parse directly to Map, avoiding intermediate JsonNode. #17485

  • Optimize ProtoBufRecordExtractor with field descriptor caching. #17593

  • Optimize group ID generation by reducing memory allocation. #16798

  • Reduce memory footprint for MailboxContentObserver and ReceivingMailbox. #16872

Query Resource Isolation

Workload Scheduler #16018

An improvement over the Binary Workload Scheduler that verifies CPU/memory budgets at admission time. Uses WorkloadBudgetManager.canAdmitQuery() before scheduling, with O(1) atomic-counter budget checks adding negligible overhead.
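
A constant-time admission check of this kind can be sketched as follows. This is a stand-in, not Pinot's WorkloadBudgetManager: the class, its fields, and the charge method are hypothetical, and a lock replaces the atomic counters because Python has no built-in atomic integers.

```python
import threading


class WorkloadBudget:
    """Hypothetical per-workload budget with O(1) admission checks."""

    def __init__(self, cpu_ns: int, memory_bytes: int) -> None:
        self._cpu_ns = cpu_ns
        self._memory_bytes = memory_bytes
        self._lock = threading.Lock()  # stands in for atomic counters

    def can_admit_query(self) -> bool:
        """Admit only while both remaining budgets are positive."""
        with self._lock:
            return self._cpu_ns > 0 and self._memory_bytes > 0

    def charge(self, cpu_ns: int, memory_bytes: int) -> None:
        """Deduct the resources a query actually consumed."""
        with self._lock:
            self._cpu_ns -= cpu_ns
            self._memory_bytes -= memory_bytes


budget = WorkloadBudget(cpu_ns=100, memory_bytes=1_000)
admitted_first = budget.can_admit_query()
budget.charge(cpu_ns=150, memory_bytes=10)  # CPU budget is now exhausted
admitted_second = budget.can_admit_query()
```

The check reads two counters regardless of how many queries are in flight, which is what keeps admission overhead negligible.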

Cost-Split Support #16672

Workload budgets (CPU/memory) can now be split across tables/tenants with support for real-time splits through custom overrides per propagation scheme (e.g., CONSUMING vs COMPLETED stages).

Other Resource Isolation Improvements

Security and TLS

Minion / Task Framework

Rebalance and Cluster Management

Observability and Metrics

Connectors and Clients

New pinot-cli module providing a modern interactive and batch command-line client for Apache Pinot. Features include multi-line SQL REPL with history, multiple output formats (CSV/TSV/JSON/ALIGNED/VERTICAL/MARKDOWN), config via properties file, and support for extra HTTP headers.

Comprehensive Pinot Admin Client #17040

New admin client in pinot-java-client providing complete programmatic access to all Pinot controller REST endpoints. Includes 6 service clients (Table, Schema, Instance, Segment, Tenant, Task), authentication support, async operations, and comprehensive error handling.

Other Client Improvements

Controller UI Improvements

Other Notable Features

Notable Bug Fixes
