# 0.9.0

## Summary

This release introduces a new features: Segment Merge and Rollup to simplify users day to day operational work. A new metrics plugin is added to support dropwizard. As usual, new functionalities and many UI/ Performance improvements.

The release was cut from the following commit: [13c9ee9](https://github.com/apache/pinot/commit/13c9ee9556498bb6dc4ab60734743edb8b89773c) and the following cherry-picks: [668b5e0](https://github.com/apache/pinot/commit/668b5e01d7c263e3cdb3081e0a947a43e7d7f782), [ee887b9](https://github.com/apache/pinot/commit/ee887b97e77ef7a132d3d6d60f83a800e52d4555)

### Support Segment Merge and Roll-up

LinkedIn operates a large multi-tenant cluster that serves a business metrics dashboard, and noticed that their tables consisted of millions of small segments. This was leading to slow operations in Helix/Zookeeper, long running queries due to having too many tasks to process, as well as using more space because of a lack of compression.

To solve this problem they added the Segment Merge task, which compresses segments based on timestamps and rolls up/aggregates older data. The task can be run on a schedule or triggered manually via the Pinot REST API.

At the moment this feature is only available for offline tables, but will be added for real-time tables in a future release.

Major Changes:

* Integrate enhanced SegmentProcessorFramework into MergeRollupTaskExecutor ([#7180](https://github.com/apache/pinot/pull/7180))
* Merge/Rollup task scheduler for offline tables. ([#7178](https://github.com/apache/pinot/pull/7178))
* Fix MergeRollupTask uploading segments not updating their metadata ([#7289](https://github.com/apache/pinot/pull/7289))
* MergeRollupTask integration tests ([#7283](https://github.com/apache/pinot/pull/7283))
* Add mergeRollupTask delay metrics ([#7368](https://github.com/apache/pinot/pull/7368))
* MergeRollupTaskGenerator enhancement: enable parallel buckets scheduling ([#7481](https://github.com/apache/pinot/pull/7481))
* Use maxEndTimeMs for merge/roll-up delay metrics. ([#7617](https://github.com/apache/pinot/pull/7617))

### UI Improvement

This release also sees improvements to Pinot’s query console UI.

* Cmd+Enter shortcut to run query in query console ([#7359](https://github.com/apache/pinot/pull/7359))
* Showing tooltip in SQL Editor ([#7387](https://github.com/apache/pinot/pull/7387))
* Make the SQL Editor box expandable ([#7381](https://github.com/apache/pinot/pull/7381))
* Fix tables ordering by number of segments ([#7564](https://github.com/apache/pinot/pull/7564))

### SQL Improvements

There have also been improvements and additions to Pinot’s SQL implementation.

#### New functions:

* IN ([#7542](https://github.com/apache/pinot/pull/7542))
* LASTWITHTIME ([#7584](https://github.com/apache/pinot/pull/7584))
* ID\_SET on MV columns ([#7355](https://github.com/apache/pinot/pull/7355))
* Raw results for Percentile TDigest and Est ([#7226](https://github.com/apache/pinot/pull/7226)),
* Add timezone as argument in function toDateTime ([#7552](https://github.com/apache/pinot/pull/7552))

#### New predicates are supported:

* LIKE([#7214](https://github.com/apache/pinot/pull/7214))
* REGEXP\_EXTRACT([#7114](https://github.com/apache/pinot/pull/7114))
* FILTER([#7566](https://github.com/apache/pinot/pull/7566))

#### Query compatibility improvements:

* Infer data type for Literal ([#7332](https://github.com/apache/pinot/pull/7332))
* Support logical identifier in predicate ([#7347](https://github.com/apache/pinot/pull/7347))
* Support JSON queries with top-level array path expression. ([#7511](https://github.com/apache/pinot/pull/7511))
* Support configurable group by trim size to improve results accuracy ([#7241](https://github.com/apache/pinot/pull/7241))

### Performance Improvements

This release contains many performance improvement, you may sense it for you day to day queries. Thanks to all the great contributions listed below:

* Reduce the disk usage for segment conversion task ([#7193](https://github.com/apache/pinot/pull/7193))
* Simplify association between Java Class and PinotDataType for faster mapping ([#7402](https://github.com/apache/pinot/pull/7402))
* Avoid creating stateless ParseContextImpl once per jsonpath evaluation, avoid varargs allocation ([#7412](https://github.com/apache/pinot/pull/7412))
* Replace MINUS with STRCMP ([#7394](https://github.com/apache/pinot/pull/7394))
* Bit-sliced range index for int, long, float, double, dictionarized SV columns ([#7454](https://github.com/apache/pinot/pull/7454))
* Use MethodHandle to access vectorized unsigned comparison on JDK9+ ([#7487](https://github.com/apache/pinot/pull/7487))
* Add option to limit thread usage per query ([#7492](https://github.com/apache/pinot/pull/7492))
* Improved range queries ([#7513](https://github.com/apache/pinot/pull/7513))
* Faster bitmap scans ([#7530](https://github.com/apache/pinot/pull/7530))
* Optimize EmptySegmentPruner to skip pruning when there is no empty segments ([#7531](https://github.com/apache/pinot/pull/7531))
* Map bitmaps through a bounded window to avoid excessive disk pressure ([#7535](https://github.com/apache/pinot/pull/7535))
* Allow RLE compression of bitmaps for smaller file sizes ([#7582](https://github.com/apache/pinot/pull/7582))
* Support raw index properties for columns with JSON and RANGE indexes ([#7615](https://github.com/apache/pinot/pull/7615))
* Enhance BloomFilter rule to include IN predicate([#7444](https://github.com/apache/pinot/issues/7444)) ([#7624](https://github.com/apache/pinot/pull/7624))
* Introduce `LZ4_WITH_LENGTH` chunk compression type ([#7655](https://github.com/apache/pinot/pull/7655))
* Enhance ColumnValueSegmentPruner and support bloom filter prefetch ([#7654](https://github.com/apache/pinot/pull/7654))
* Apply the optimization on dictIds within the segment to DistinctCountHLL aggregation func ([#7630](https://github.com/apache/pinot/pull/7630))
* During segment pruning, release the bloom filter after each segment is processed ([#7668](https://github.com/apache/pinot/pull/7668))
* Fix JSONPath cache inefficient issue ([#7409](https://github.com/apache/pinot/pull/7409))
* Optimize getUnpaddedString with SWAR padding search ([#7708](https://github.com/apache/pinot/pull/7708))
* Lighter weight LiteralTransformFunction, avoid excessive array fills ([#7707](https://github.com/apache/pinot/pull/7707))
* Inline binary comparison ops to prevent function call overhead ([#7709](https://github.com/apache/pinot/pull/7709))
* Memoize literals in query context in order to deduplicate them ([#7720](https://github.com/apache/pinot/pull/7720))

### Other Notable New Features and Changes

* Human Readable Controller Configs ([#7173](https://github.com/apache/pinot/pull/7173))
* Add the support of geoToH3 function ([#7182](https://github.com/apache/pinot/pull/7182))
* Add Apache Pulsar as Pinot Plugin ([#7223](https://github.com/apache/pinot/pull/7223)) ([#7247](https://github.com/apache/pinot/pull/7247))
* Add dropwizard metrics plugin ([#7263](https://github.com/apache/pinot/pull/7263))
* Introduce OR Predicate Execution On Star Tree Index ([#7184](https://github.com/apache/pinot/pull/7184))
* Allow to extract values from array of objects with jsonPathArray ([#7208](https://github.com/apache/pinot/pull/7208))
* Add Realtime table metadata and indexes API. ([#7169](https://github.com/apache/pinot/pull/7169))
* Support array with mixing data types ([#7234](https://github.com/apache/pinot/pull/7234))
* Support force download segment in reload API ([#7249](https://github.com/apache/pinot/pull/7249))
* Show uncompressed znRecord from zk api ([#7304](https://github.com/apache/pinot/pull/7304))
* Add debug endpoint to get minion task status. ([#7300](https://github.com/apache/pinot/pull/7300))
* Validate CSV Header For Configured Delimiter ([#7237](https://github.com/apache/pinot/pull/7237))
* Add auth tokens and user/password support to ingestion job command ([#7233](https://github.com/apache/pinot/pull/7233))
* Add option to store the hash of the upsert primary key ([#7246](https://github.com/apache/pinot/pull/7246))
* Add null support for time column ([#7269](https://github.com/apache/pinot/pull/7269))
* Add mode aggregation function ([#7318](https://github.com/apache/pinot/pull/7318))
* Support disable swagger in Pinot servers ([#7341](https://github.com/apache/pinot/pull/7341))
* Delete metadata properly on table deletion ([#7329](https://github.com/apache/pinot/pull/7329))
* Add basic Obfuscator Support ([#7407](https://github.com/apache/pinot/pull/7407))
* Add AWS sts dependency to enable auth using web identity token. ([#7017](https://github.com/apache/pinot/pull/7017))([#7445](https://github.com/apache/pinot/pull/7445))
* Mask credentials in debug endpoint /appconfigs ([#7452](https://github.com/apache/pinot/pull/7452))
* Fix /sql query endpoint now compatible with auth ([#7230](https://github.com/apache/pinot/pull/7230))
* Fix case sensitive issue in BasicAuthPrincipal permission check ([#7354](https://github.com/apache/pinot/pull/7354))
* Fix auth token injection in SegmentGenerationAndPushTaskExecutor ([#7464](https://github.com/apache/pinot/pull/7464))
* Add segmentNameGeneratorType config to IndexingConfig ([#7346](https://github.com/apache/pinot/pull/7346))
* Support trigger PeriodicTask manually ([#7174](https://github.com/apache/pinot/pull/7174))
* Add endpoint to check minion task status for a single task. ([#7353](https://github.com/apache/pinot/pull/7353))
* Showing partial status of segment and counting CONSUMING state as good segment status ([#7327](https://github.com/apache/pinot/pull/7327))
* Add "num rows in segments" and "num segments queried per host" to the output of Realtime Provisioning Rule ([#7282](https://github.com/apache/pinot/pull/7282))
* Check schema backward-compatibility when updating schema through addSchema with override ([#7374](https://github.com/apache/pinot/pull/7374))
* Optimize IndexedTable ([#7373](https://github.com/apache/pinot/pull/7373))
* Support indices remove in V3 segment format ([#7301](https://github.com/apache/pinot/pull/7301))
* Optimize TableResizer ([#7392](https://github.com/apache/pinot/pull/7392))
* Introduce resultSize in IndexedTable ([#7420](https://github.com/apache/pinot/pull/7420))
* Offset based realtime consumption status checker ([#7267](https://github.com/apache/pinot/pull/7267))
* Add causes to stack trace return ([#7460](https://github.com/apache/pinot/pull/7460))
* Create controller resource packages config key ([#7488](https://github.com/apache/pinot/pull/7488))
* Enhance TableCache to support schema name different from table name ([#7525](https://github.com/apache/pinot/pull/7525))
* Add validation for realtimeToOffline task ([#7523](https://github.com/apache/pinot/pull/7523))
* Unify CombineOperator multi-threading logic ([#7450](https://github.com/apache/pinot/pull/7450))
* Support no downtime rebalance for table with 1 replica in TableRebalancer ([#7532](https://github.com/apache/pinot/pull/7532))
* Introduce MinionConf, move END\_REPLACE\_SEGMENTS\_TIMEOUT\_MS to minion config instead of task config. ([#7516](https://github.com/apache/pinot/pull/7516))
* Adjust tuner api ([#7553](https://github.com/apache/pinot/pull/7553))
* Adding config for metrics library ([#7551](https://github.com/apache/pinot/pull/7551))
* Add geo type conversion scalar functions ([#7573](https://github.com/apache/pinot/pull/7573))
* Add BOOLEAN\_ARRAY and TIMESTAMP\_ARRAY types ([#7581](https://github.com/apache/pinot/pull/7581))
* Add MV raw forward index and MV `BYTES` data type ([#7595](https://github.com/apache/pinot/pull/7595))
* Enhance TableRebalancer to offload the segments from most loaded instances first ([#7574](https://github.com/apache/pinot/pull/7574))
* Improve get tenant API to differentiate offline and realtime tenants ([#7548](https://github.com/apache/pinot/pull/7548))
* Refactor query rewriter to interfaces and implementations to allow customization ([#7576](https://github.com/apache/pinot/pull/7576))
* In ServiceStartable, apply global cluster config in ZK to instance config ([#7593](https://github.com/apache/pinot/pull/7593))
* Make dimension tables creation bypass tenant validation ([#7559](https://github.com/apache/pinot/pull/7559))
* Allow Metadata and Dictionary Based Plans for No Op Filters ([#7563](https://github.com/apache/pinot/pull/7563))
* Reject query with identifiers not in schema ([#7590](https://github.com/apache/pinot/pull/7590))
* Round Robin IP addresses when retry uploading/downloading segments ([#7585](https://github.com/apache/pinot/pull/7585))
* Support multi-value derived column in offline table reload ([#7632](https://github.com/apache/pinot/pull/7632))
* Support segmentNamePostfix in segment name ([#7646](https://github.com/apache/pinot/pull/7646))
* Add select segments API ([#7651](https://github.com/apache/pinot/pull/7651))
* Controller getTableInstance() call now returns the list of live brokers of a table. ([#7556](https://github.com/apache/pinot/pull/7556))
* Allow MV Field Support For Raw Columns in Text Indices ([#7638](https://github.com/apache/pinot/pull/7638))
* Allow override distinctCount to segmentPartitionedDistinctCount ([#7664](https://github.com/apache/pinot/pull/7664))
* Add a quick start with both UPSERT and JSON index ([#7669](https://github.com/apache/pinot/pull/7669))
* Add revertSegmentReplacement API ([#7662](https://github.com/apache/pinot/pull/7662))
* Smooth segment reloading with non blocking semantic ([#7675](https://github.com/apache/pinot/pull/7675))
* Clear the reused record in PartitionUpsertMetadataManager ([#7676](https://github.com/apache/pinot/pull/7676))
* Replace args4j with picocli ([#7665](https://github.com/apache/pinot/pull/7665))
* Handle datetime column consistently ([#7645](https://github.com/apache/pinot/pull/7645))([#7705](https://github.com/apache/pinot/pull/7705))
* Allow to carry headers with query requests ([#7696](https://github.com/apache/pinot/pull/7696)) ([#7712](https://github.com/apache/pinot/pull/7712))
* Allow adding JSON data type for dimension column types ([#7718](https://github.com/apache/pinot/pull/7718))
* Separate SegmentDirectoryLoader and tierBackend concepts ([#7737](https://github.com/apache/pinot/pull/7737))
* Implement size balanced V4 raw chunk format ([#7661](https://github.com/apache/pinot/pull/7661))
* Add presto-pinot-driver lib ([#7384](https://github.com/apache/pinot/pull/7384))

#### Major Bug fixes

* Fix null pointer exception for non-existed metric columns in schema for JDBC driver ([#7175](https://github.com/apache/pinot/pull/7175))
* Fix the config key for TASK\_MANAGER\_FREQUENCY\_PERIOD ([#7198](https://github.com/apache/pinot/pull/7198))
* Fixed pinot java client to add zkClient close ([#7196](https://github.com/apache/pinot/pull/7196))
* Ignore query json parse errors ([#7165](https://github.com/apache/pinot/pull/7165))
* Fix shutdown hook for PinotServiceManager ([#7251](https://github.com/apache/pinot/pull/7251)) ([#7253](https://github.com/apache/pinot/pull/7253))
* Make STRING to BOOLEAN data type change as backward compatible schema change ([#7259](https://github.com/apache/pinot/pull/7259))
* Replace gcp hardcoded values with generic annotations ([#6985](https://github.com/apache/pinot/pull/6985))
* Fix segment conversion executor for in-place conversion ([#7265](https://github.com/apache/pinot/pull/7265))
* Fix reporting consuming rate when the Kafka partition level consumer isn't stopped ([#7322](https://github.com/apache/pinot/pull/7322))
* Fix the issue with concurrent modification for segment lineage ([#7343](https://github.com/apache/pinot/pull/7343))
* Fix TableNotFound error message in PinotHelixResourceManager ([#7340](https://github.com/apache/pinot/pull/7340))
* Fix upload LLC segment endpoint truncated download URL ([#7361](https://github.com/apache/pinot/pull/7361))
* Fix task scheduling on table update ([#7362](https://github.com/apache/pinot/pull/7362))
* Fix metric method for ONLINE\_MINION\_INSTANCES metric ([#7363](https://github.com/apache/pinot/pull/7363))
* Fix JsonToPinotSchema behavior to be consistent with AvroSchemaToPinotSchema ([#7366](https://github.com/apache/pinot/pull/7366))
* Fix currentOffset volatility in consuming segment([#7365](https://github.com/apache/pinot/pull/7365))
* Fix misleading error msg for missing URI ([#7367](https://github.com/apache/pinot/pull/7367))
* Fix the correctness of getColumnIndices method ([#7370](https://github.com/apache/pinot/pull/7370))
* Fix SegmentZKMetadta time handling ([#7375](https://github.com/apache/pinot/pull/7375))
* Fix retention for cleaning up segment lineage ([#7424](https://github.com/apache/pinot/pull/7424))
* Fix segment generator to not return illegal filenames ([#7085](https://github.com/apache/pinot/pull/7085))
* Fix missing LLC segments in segment store by adding controller periodic task to upload them ([#6778](https://github.com/apache/pinot/pull/6778))
* Fix parsing error messages returned to FileUploadDownloadClient ([#7428](https://github.com/apache/pinot/pull/7428))
* Fix manifest scan which drives /version endpoint ([#7456](https://github.com/apache/pinot/pull/7456))
* Fix missing rate limiter if brokerResourceEV becomes null due to ZK connection ([#7470](https://github.com/apache/pinot/pull/7470))
* Fix race conditions between segment merge/roll-up and purge (or convertToRawIndex) tasks: ([#7427](https://github.com/apache/pinot/pull/7427))
* Fix pql double quote checker exception ([#7485](https://github.com/apache/pinot/pull/7485))
* Fix minion metrics exporter config ([#7496](https://github.com/apache/pinot/pull/7496))
* Fix segment unable to retry issue by catching timeout exception during segment replace ([#7509](https://github.com/apache/pinot/pull/7509))
* Add Exception to Broker Response When Not All Segments Are Available (Partial Response) ([#7397](https://github.com/apache/pinot/pull/7397))
* Fix segment generation commands ([#7527](https://github.com/apache/pinot/pull/7527))
* Return non zero from main with exception ([#7482](https://github.com/apache/pinot/pull/7482))
* Fix parquet plugin shading error ([#7570](https://github.com/apache/pinot/pull/7570))
* Fix the lowest partition id is not 0 for LLC ([#7066](https://github.com/apache/pinot/pull/7066))
* Fix star-tree index map when column name contains '.' ([#7623](https://github.com/apache/pinot/pull/7623))
* Fix cluster manager URLs encoding issue([#7639](https://github.com/apache/pinot/pull/7639))
* Fix fieldConfig nullable validation ([#7648](https://github.com/apache/pinot/pull/7648))
* Fix verifyHostname issue in FileUploadDownloadClient ([#7703](https://github.com/apache/pinot/pull/7703))
* Fix TableCache schema to include the built-in virtual columns ([#7706](https://github.com/apache/pinot/pull/7706))
* Fix DISTINCT with AS function ([#7678](https://github.com/apache/pinot/pull/7678))
* Fix SDF pattern in DataPreprocessingHelper ([#7721](https://github.com/apache/pinot/pull/7721))
* Fix fields missing issue in the source in ParquetNativeRecordReader ([#7742](https://github.com/apache/pinot/pull/7742))
