0.9.0
Summary
This release introduces a new feature, Segment Merge and Rollup, to simplify users' day-to-day operational work. A new metrics plugin has been added to support Dropwizard. As usual, the release also includes new functionality and many UI and performance improvements.
The release was cut from the following commit: 13c9ee9 and the following cherry-picks: 668b5e0, ee887b9
Support Segment Merge and Roll-up
LinkedIn operates a large multi-tenant cluster that serves a business metrics dashboard, and noticed that their tables consisted of millions of small segments. This led to slow operations in Helix/ZooKeeper, long-running queries because there were too many tasks to process, and higher storage use because of a lack of compression.
To solve this problem they added the Segment Merge task, which compresses segments based on timestamps and rolls up/aggregates older data. The task can be run on a schedule or triggered manually via the Pinot REST API.
At the moment this feature is only available for offline tables, but will be added for real-time tables in a future release.
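To make the idea concrete, here is a minimal conceptual sketch of merging and rolling up small segments. It is not Pinot's implementation: the function name `merge_rollup`, the row layout `(timestamp_ms, dimension, metric)`, the one-day bucket, and the use of SUM as the aggregation are all assumptions chosen for illustration.

```python
from collections import defaultdict

MS_PER_DAY = 24 * 60 * 60 * 1000


def merge_rollup(segments, bucket_ms=MS_PER_DAY):
    """Merge many small segments into one segment per time bucket.

    Hypothetical model: each segment is a list of
    (timestamp_ms, dimension, metric) rows, and rows that share a
    time bucket and dimension are rolled up by summing the metric.
    """
    buckets = defaultdict(lambda: defaultdict(int))
    for segment in segments:
        for ts, dim, metric in segment:
            bucket_start = ts - ts % bucket_ms    # align to the bucket boundary
            buckets[bucket_start][dim] += metric  # roll up (assumed SUM)
    # One merged segment per bucket, with rows sorted for stable output.
    return {
        start: sorted((start, dim, total) for dim, total in dims.items())
        for start, dims in sorted(buckets.items())
    }


# Three small segments: the first two fall in the same day bucket.
small_segments = [
    [(1_000, "US", 3), (2_000, "IN", 1)],
    [(3_000, "US", 4)],
    [(MS_PER_DAY + 500, "US", 2)],
]
merged = merge_rollup(small_segments)
```

In this sketch three input segments with four rows become two merged segments with three rows, which is the kind of reduction in segment count and row count the task is aiming for.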
Major Changes:
- Integrate enhanced SegmentProcessorFramework into MergeRollupTaskExecutor (#7180) 
- Merge/Rollup task scheduler for offline tables. (#7178) 
- Fix MergeRollupTask uploading segments not updating their metadata (#7289) 
- MergeRollupTask integration tests (#7283) 
- Add mergeRollupTask delay metrics (#7368) 
- MergeRollupTaskGenerator enhancement: enable parallel buckets scheduling (#7481) 
- Use maxEndTimeMs for merge/roll-up delay metrics. (#7617) 
UI Improvement
This release also sees improvements to Pinot’s query console UI.
- Cmd+Enter shortcut to run query in query console (#7359) 
- Showing tooltip in SQL Editor (#7387) 
- Make the SQL Editor box expandable (#7381) 
- Fix tables ordering by number of segments (#7564) 
SQL Improvements
There have also been improvements and additions to Pinot’s SQL implementation.
New functions:
- IN (#7542) 
- LASTWITHTIME (#7584) 
- ID_SET on MV columns (#7355) 
- Raw results for Percentile TDigest and Est (#7226)
- Add timezone as argument in function toDateTime (#7552) 
New predicates are supported:
Query compatibility improvements:
- Infer data type for Literal (#7332) 
- Support logical identifier in predicate (#7347) 
- Support JSON queries with top-level array path expression. (#7511) 
- Support configurable group by trim size to improve results accuracy (#7241) 
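As a generic illustration of why a group-by trim size affects accuracy, the sketch below trims each segment's groups to its local top entries before merging. This is not Pinot's trimming algorithm; `top_groups`, `trim_size`, and the sample counts are hypothetical names and data used only to show the effect.

```python
from collections import Counter


def top_groups(segment_counts, limit, trim_size=None):
    """Merge per-segment group counts and return the top `limit` groups.

    If `trim_size` is set, each segment keeps only its `trim_size`
    largest groups before merging (an assumed, simplified trim step).
    """
    merged = Counter()
    for counts in segment_counts:
        if trim_size is None:
            kept = counts
        else:
            kept = dict(Counter(counts).most_common(trim_size))
        merged.update(kept)  # add this segment's surviving counts
    return merged.most_common(limit)


# Group "c" is never the largest within a segment, but it is the largest overall.
segments = [
    {"a": 11, "b": 9, "c": 8},
    {"d": 10, "e": 9, "c": 8},
]
exact = top_groups(segments, limit=1)
trimmed = top_groups(segments, limit=1, trim_size=2)
larger = top_groups(segments, limit=1, trim_size=3)
```

With a trim size of 2, group `c` is dropped from both segments and the merged result reports `a` instead; raising the trim size to 3 recovers the exact answer at the cost of carrying more groups through the merge.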
Performance Improvements
This release contains many performance improvements, which you may notice in your day-to-day queries. Thanks to all the great contributions listed below:
- Reduce the disk usage for segment conversion task (#7193) 
- Simplify association between Java Class and PinotDataType for faster mapping (#7402) 
- Avoid creating stateless ParseContextImpl once per jsonpath evaluation, avoid varargs allocation (#7412) 
- Replace MINUS with STRCMP (#7394) 
- Bit-sliced range index for int, long, float, double, dictionarized SV columns (#7454) 
- Use MethodHandle to access vectorized unsigned comparison on JDK9+ (#7487) 
- Add option to limit thread usage per query (#7492) 
- Improved range queries (#7513) 
- Faster bitmap scans (#7530) 
- Optimize EmptySegmentPruner to skip pruning when there are no empty segments (#7531)
- Map bitmaps through a bounded window to avoid excessive disk pressure (#7535) 
- Allow RLE compression of bitmaps for smaller file sizes (#7582) 
- Support raw index properties for columns with JSON and RANGE indexes (#7615) 
- Introduce LZ4_WITH_LENGTH chunk compression type (#7655)
- Enhance ColumnValueSegmentPruner and support bloom filter prefetch (#7654) 
- Apply the optimization on dictIds within the segment to DistinctCountHLL aggregation func (#7630) 
- During segment pruning, release the bloom filter after each segment is processed (#7668) 
- Fix JSONPath cache inefficiency issue (#7409)
- Optimize getUnpaddedString with SWAR padding search (#7708) 
- Lighter weight LiteralTransformFunction, avoid excessive array fills (#7707) 
- Inline binary comparison ops to prevent function call overhead (#7709) 
- Memoize literals in query context in order to deduplicate them (#7720) 
Other Notable New Features and Changes
- Human Readable Controller Configs (#7173) 
- Add the support of geoToH3 function (#7182) 
- Add dropwizard metrics plugin (#7263) 
- Introduce OR Predicate Execution On Star Tree Index (#7184) 
- Allow to extract values from array of objects with jsonPathArray (#7208) 
- Add Realtime table metadata and indexes API. (#7169) 
- Support array with mixing data types (#7234) 
- Support force download segment in reload API (#7249) 
- Show uncompressed znRecord from zk api (#7304) 
- Add debug endpoint to get minion task status. (#7300) 
- Validate CSV Header For Configured Delimiter (#7237) 
- Add auth tokens and user/password support to ingestion job command (#7233) 
- Add option to store the hash of the upsert primary key (#7246) 
- Add null support for time column (#7269) 
- Add mode aggregation function (#7318) 
- Support disable swagger in Pinot servers (#7341) 
- Delete metadata properly on table deletion (#7329) 
- Add basic Obfuscator Support (#7407) 
- Mask credentials in debug endpoint /appconfigs (#7452) 
- Make the /sql query endpoint compatible with auth (#7230)
- Fix case sensitive issue in BasicAuthPrincipal permission check (#7354) 
- Fix auth token injection in SegmentGenerationAndPushTaskExecutor (#7464) 
- Add segmentNameGeneratorType config to IndexingConfig (#7346) 
- Support trigger PeriodicTask manually (#7174) 
- Add endpoint to check minion task status for a single task. (#7353) 
- Showing partial status of segment and counting CONSUMING state as good segment status (#7327) 
- Add "num rows in segments" and "num segments queried per host" to the output of Realtime Provisioning Rule (#7282) 
- Check schema backward-compatibility when updating schema through addSchema with override (#7374) 
- Optimize IndexedTable (#7373) 
- Support indices remove in V3 segment format (#7301) 
- Optimize TableResizer (#7392) 
- Introduce resultSize in IndexedTable (#7420) 
- Offset based real-time consumption status checker (#7267) 
- Add causes to stack trace return (#7460) 
- Create controller resource packages config key (#7488) 
- Enhance TableCache to support schema name different from table name (#7525) 
- Add validation for realtimeToOffline task (#7523) 
- Unify CombineOperator multi-threading logic (#7450) 
- Support no downtime rebalance for table with 1 replica in TableRebalancer (#7532) 
- Introduce MinionConf, move END_REPLACE_SEGMENTS_TIMEOUT_MS to minion config instead of task config. (#7516) 
- Adjust tuner api (#7553) 
- Adding config for metrics library (#7551) 
- Add geo type conversion scalar functions (#7573) 
- Add BOOLEAN_ARRAY and TIMESTAMP_ARRAY types (#7581) 
- Add MV raw forward index and MV BYTES data type (#7595)
- Enhance TableRebalancer to offload the segments from most loaded instances first (#7574) 
- Improve get tenant API to differentiate offline and real-time tenants (#7548) 
- Refactor query rewriter to interfaces and implementations to allow customization (#7576) 
- In ServiceStartable, apply global cluster config in ZK to instance config (#7593) 
- Make dimension tables creation bypass tenant validation (#7559) 
- Allow Metadata and Dictionary Based Plans for No Op Filters (#7563) 
- Reject query with identifiers not in schema (#7590) 
- Round Robin IP addresses when retry uploading/downloading segments (#7585) 
- Support multi-value derived column in offline table reload (#7632) 
- Support segmentNamePostfix in segment name (#7646) 
- Add select segments API (#7651) 
- Controller getTableInstance() call now returns the list of live brokers of a table. (#7556) 
- Allow MV Field Support For Raw Columns in Text Indices (#7638) 
- Allow override distinctCount to segmentPartitionedDistinctCount (#7664) 
- Add a quick start with both UPSERT and JSON index (#7669) 
- Add revertSegmentReplacement API (#7662) 
- Smooth segment reloading with non blocking semantic (#7675) 
- Clear the reused record in PartitionUpsertMetadataManager (#7676) 
- Replace args4j with picocli (#7665) 
- Allow adding JSON data type for dimension column types (#7718) 
- Separate SegmentDirectoryLoader and tierBackend concepts (#7737) 
- Implement size balanced V4 raw chunk format (#7661) 
- Add presto-pinot-driver lib (#7384) 
Major Bug fixes
- Fix null pointer exception for non-existent metric columns in schema for JDBC driver (#7175)
- Fix the config key for TASK_MANAGER_FREQUENCY_PERIOD (#7198) 
- Fixed pinot java client to add zkClient close (#7196) 
- Ignore query json parse errors (#7165) 
- Make STRING to BOOLEAN data type change as backward compatible schema change (#7259) 
- Replace gcp hardcoded values with generic annotations (#6985) 
- Fix segment conversion executor for in-place conversion (#7265) 
- Fix reporting consuming rate when the Kafka partition level consumer isn't stopped (#7322) 
- Fix the issue with concurrent modification for segment lineage (#7343) 
- Fix TableNotFound error message in PinotHelixResourceManager (#7340) 
- Fix upload LLC segment endpoint truncated download URL (#7361) 
- Fix task scheduling on table update (#7362) 
- Fix metric method for ONLINE_MINION_INSTANCES metric (#7363) 
- Fix JsonToPinotSchema behavior to be consistent with AvroSchemaToPinotSchema (#7366) 
- Fix currentOffset volatility in consuming segment (#7365)
- Fix misleading error msg for missing URI (#7367) 
- Fix the correctness of getColumnIndices method (#7370) 
- Fix SegmentZKMetadata time handling (#7375)
- Fix retention for cleaning up segment lineage (#7424) 
- Fix segment generator to not return illegal filenames (#7085) 
- Fix missing LLC segments in segment store by adding controller periodic task to upload them (#6778) 
- Fix parsing error messages returned to FileUploadDownloadClient (#7428) 
- Fix manifest scan which drives /version endpoint (#7456) 
- Fix missing rate limiter if brokerResourceEV becomes null due to ZK connection (#7470) 
- Fix race conditions between segment merge/roll-up and purge (or convertToRawIndex) tasks (#7427)
- Fix pql double quote checker exception (#7485) 
- Fix minion metrics exporter config (#7496) 
- Fix segment unable to retry issue by catching timeout exception during segment replace (#7509) 
- Add Exception to Broker Response When Not All Segments Are Available (Partial Response) (#7397) 
- Fix segment generation commands (#7527) 
- Return non zero from main with exception (#7482) 
- Fix parquet plugin shading error (#7570) 
- Fix LLC handling when the lowest partition id is not 0 (#7066)
- Fix star-tree index map when column name contains '.' (#7623) 
- Fix cluster manager URLs encoding issue (#7639)
- Fix fieldConfig nullable validation (#7648) 
- Fix verifyHostname issue in FileUploadDownloadClient (#7703) 
- Fix TableCache schema to include the built-in virtual columns (#7706) 
- Fix DISTINCT with AS function (#7678) 
- Fix SDF pattern in DataPreprocessingHelper (#7721) 
- Fix fields missing issue in the source in ParquetNativeRecordReader (#7742) 