0.9.0
This release introduces a new feature, Segment Merge and Rollup, to simplify users' day-to-day operational work. A new metrics plugin adds support for Dropwizard. As usual, the release also includes new functionality and many UI and performance improvements.
The release was cut from the following commit: and the following cherry-picks: ,
LinkedIn operates a large multi-tenant cluster that serves a business metrics dashboard, and noticed that their tables consisted of millions of small segments. This led to slow operations in Helix/ZooKeeper, long-running queries because there were too many tasks to process, and higher storage use because of a lack of compression.
To solve this problem they added the Segment Merge task, which compresses segments based on timestamps and rolls up/aggregates older data. The task can be run on a schedule or triggered manually via the Pinot REST API.
At the moment this feature is only available for offline tables, but will be added for real-time tables in a future release.
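To make the roll-up idea concrete, here is a minimal conceptual sketch in Python. It is not Pinot's implementation: the row layout (`ts_ms`, `country`, `views`), the one-day bucket, and the choice of SUM as the aggregation are all assumptions made for illustration. It shows how rows that fall into the same time bucket and share the same dimension values collapse into a single aggregated row.

```python
from collections import defaultdict

DAY_MS = 24 * 60 * 60 * 1000  # assumed bucket size: one day, in milliseconds


def rollup(rows, bucket_ms=DAY_MS):
    """Sum the metric for rows that share a time bucket and dimension value.

    Each row is a hypothetical (ts_ms, country, views) tuple; this layout is
    illustrative only and does not come from the release notes.
    """
    totals = defaultdict(int)
    for ts_ms, country, views in rows:
        # Round the timestamp down to the start of its bucket.
        bucket_start = ts_ms - ts_ms % bucket_ms
        totals[(bucket_start, country)] += views
    return sorted((b, c, v) for (b, c), v in totals.items())


rows = [
    (0, "US", 3),
    (3_600_000, "US", 4),   # same day and country as the first row
    (7_200_000, "CA", 1),
    (DAY_MS + 5, "US", 2),  # next day, so it stays a separate row
]

rolled = rollup(rows)
print(rolled)
```

Four input rows become three here; on tables made of many small segments with fine-grained timestamps, the reduction is what saves space and query work.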
Major Changes:
Integrate enhanced SegmentProcessorFramework into MergeRollupTaskExecutor ()
Merge/Rollup task scheduler for offline tables. ()
Fix MergeRollupTask uploading segments not updating their metadata ()
MergeRollupTask integration tests ()
Add mergeRollupTask delay metrics ()
MergeRollupTaskGenerator enhancement: enable parallel buckets scheduling ()
Use maxEndTimeMs for merge/roll-up delay metrics. ()
This release also sees improvements to Pinot’s query console UI.
There have also been improvements and additions to Pinot’s SQL implementation.
This release contains many performance improvements, which you may notice in your day-to-day queries. Thanks to all the great contributions listed below:
Cmd+Enter shortcut to run query in query console ()
Showing tooltip in SQL Editor ()
Make the SQL Editor box expandable ()
Fix tables ordering by number of segments ()
IN ()
LASTWITHTIME ()
ID_SET on MV columns ()
Raw results for Percentile TDigest and Est ()
Add timezone as argument in function toDateTime ()
LIKE()
REGEXP_EXTRACT()
FILTER()
Infer data type for Literal ()
Support logical identifier in predicate ()
Support JSON queries with top-level array path expression. ()
Support configurable group by trim size to improve results accuracy ()
Reduce the disk usage for segment conversion task ()
Simplify association between Java Class and PinotDataType for faster mapping ()
Avoid creating stateless ParseContextImpl once per jsonpath evaluation, avoid varargs allocation ()
Replace MINUS with STRCMP ()
Bit-sliced range index for int, long, float, double, dictionarized SV columns ()
Use MethodHandle to access vectorized unsigned comparison on JDK9+ ()
Add option to limit thread usage per query ()
Improved range queries ()
Faster bitmap scans ()
Optimize EmptySegmentPruner to skip pruning when there are no empty segments ()
Map bitmaps through a bounded window to avoid excessive disk pressure ()
Allow RLE compression of bitmaps for smaller file sizes ()
Support raw index properties for columns with JSON and RANGE indexes ()
Enhance BloomFilter rule to include IN predicate() ()
Introduce LZ4_WITH_LENGTH chunk compression type ()
Enhance ColumnValueSegmentPruner and support bloom filter prefetch ()
Apply the optimization on dictIds within the segment to DistinctCountHLL aggregation func ()
During segment pruning, release the bloom filter after each segment is processed ()
Fix JSONPath cache inefficiency issue ()
Optimize getUnpaddedString with SWAR padding search ()
Lighter weight LiteralTransformFunction, avoid excessive array fills ()
Inline binary comparison ops to prevent function call overhead ()
Memoize literals in query context in order to deduplicate them ()
Human Readable Controller Configs ()
Add the support of geoToH3 function ()
Add Apache Pulsar as Pinot Plugin () ()
Add dropwizard metrics plugin ()
Introduce OR Predicate Execution On Star Tree Index ()
Allow to extract values from array of objects with jsonPathArray ()
Add Realtime table metadata and indexes API. ()
Support array with mixing data types ()
Support force download segment in reload API ()
Show uncompressed znRecord from zk api ()
Add debug endpoint to get minion task status. ()
Validate CSV Header For Configured Delimiter ()
Add auth tokens and user/password support to ingestion job command ()
Add option to store the hash of the upsert primary key ()
Add null support for time column ()
Add mode aggregation function ()
Support disable swagger in Pinot servers ()
Delete metadata properly on table deletion ()
Add basic Obfuscator Support ()
Add AWS sts dependency to enable auth using web identity token. ()()
Mask credentials in debug endpoint /appconfigs ()
Fix /sql query endpoint to be compatible with auth ()
Fix case sensitive issue in BasicAuthPrincipal permission check ()
Fix auth token injection in SegmentGenerationAndPushTaskExecutor ()
Add segmentNameGeneratorType config to IndexingConfig ()
Support trigger PeriodicTask manually ()
Add endpoint to check minion task status for a single task. ()
Showing partial status of segment and counting CONSUMING state as good segment status ()
Add "num rows in segments" and "num segments queried per host" to the output of Realtime Provisioning Rule ()
Check schema backward-compatibility when updating schema through addSchema with override ()
Optimize IndexedTable ()
Support indices remove in V3 segment format ()
Optimize TableResizer ()
Introduce resultSize in IndexedTable ()
Offset based realtime consumption status checker ()
Add causes to stack trace return ()
Create controller resource packages config key ()
Enhance TableCache to support schema name different from table name ()
Add validation for realtimeToOffline task ()
Unify CombineOperator multi-threading logic ()
Support no downtime rebalance for table with 1 replica in TableRebalancer ()
Introduce MinionConf, move END_REPLACE_SEGMENTS_TIMEOUT_MS to minion config instead of task config. ()
Adjust tuner api ()
Adding config for metrics library ()
Add geo type conversion scalar functions ()
Add BOOLEAN_ARRAY and TIMESTAMP_ARRAY types ()
Add MV raw forward index and MV BYTES data type ()
Enhance TableRebalancer to offload the segments from most loaded instances first ()
Improve get tenant API to differentiate offline and realtime tenants ()
Refactor query rewriter to interfaces and implementations to allow customization ()
In ServiceStartable, apply global cluster config in ZK to instance config ()
Make dimension tables creation bypass tenant validation ()
Allow Metadata and Dictionary Based Plans for No Op Filters ()
Reject query with identifiers not in schema ()
Round Robin IP addresses when retry uploading/downloading segments ()
Support multi-value derived column in offline table reload ()
Support segmentNamePostfix in segment name ()
Add select segments API ()
Controller getTableInstance() call now returns the list of live brokers of a table. ()
Allow MV Field Support For Raw Columns in Text Indices ()
Allow override distinctCount to segmentPartitionedDistinctCount ()
Add a quick start with both UPSERT and JSON index ()
Add revertSegmentReplacement API ()
Smooth segment reloading with non-blocking semantics ()
Clear the reused record in PartitionUpsertMetadataManager ()
Replace args4j with picocli ()
Handle datetime column consistently ()()
Allow to carry headers with query requests () ()
Allow adding JSON data type for dimension column types ()
Separate SegmentDirectoryLoader and tierBackend concepts ()
Implement size balanced V4 raw chunk format ()
Add presto-pinot-driver lib ()
Fix null pointer exception for non-existent metric columns in schema for JDBC driver ()
Fix the config key for TASK_MANAGER_FREQUENCY_PERIOD ()
Fixed pinot java client to add zkClient close ()
Ignore query json parse errors ()
Fix shutdown hook for PinotServiceManager () ()
Make STRING to BOOLEAN data type change as backward compatible schema change ()
Replace gcp hardcoded values with generic annotations ()
Fix segment conversion executor for in-place conversion ()
Fix reporting consuming rate when the Kafka partition level consumer isn't stopped ()
Fix the issue with concurrent modification for segment lineage ()
Fix TableNotFound error message in PinotHelixResourceManager ()
Fix upload LLC segment endpoint truncated download URL ()
Fix task scheduling on table update ()
Fix metric method for ONLINE_MINION_INSTANCES metric ()
Fix JsonToPinotSchema behavior to be consistent with AvroSchemaToPinotSchema ()
Fix currentOffset volatility in consuming segment ()
Fix misleading error msg for missing URI ()
Fix the correctness of getColumnIndices method ()
Fix SegmentZKMetadata time handling ()
Fix retention for cleaning up segment lineage ()
Fix segment generator to not return illegal filenames ()
Fix missing LLC segments in segment store by adding controller periodic task to upload them ()
Fix parsing error messages returned to FileUploadDownloadClient ()
Fix manifest scan which drives /version endpoint ()
Fix missing rate limiter if brokerResourceEV becomes null due to ZK connection ()
Fix race conditions between segment merge/roll-up and purge (or convertToRawIndex) tasks: ()
Fix pql double quote checker exception ()
Fix minion metrics exporter config ()
Fix segment unable to retry issue by catching timeout exception during segment replace ()
Add Exception to Broker Response When Not All Segments Are Available (Partial Response) ()
Fix segment generation commands ()
Return non zero from main with exception ()
Fix parquet plugin shading error ()
Fix the case where the lowest partition id is not 0 for LLC ()
Fix star-tree index map when column name contains '.' ()
Fix cluster manager URLs encoding issue ()
Fix fieldConfig nullable validation ()
Fix verifyHostname issue in FileUploadDownloadClient ()
Fix TableCache schema to include the built-in virtual columns ()
Fix DISTINCT with AS function ()
Fix SDF pattern in DataPreprocessingHelper ()
Fix fields missing issue in the source in ParquetNativeRecordReader ()