0.12.0

Multi-Stage Query Engine

New join semantics support

Left join (#9466)
In-equi join (#9448)
Full join (#9907)
Right join (#9907)
Semi join (#9367)
Using keyword (#9373)
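
For readers less familiar with the join types listed above, the following is a minimal sketch of their standard SQL semantics in plain Python. It is not Pinot code and does not use any Pinot API. The sample rows and the function names (`left_join`, `right_join`, `full_join`, `semi_join`, `join_on`) are hypothetical, introduced only for illustration, and `None` stands in for SQL NULL.

```python
# Hypothetical sample rows, not taken from the release notes.
orders = [(1, "a"), (2, "b"), (3, "c")]            # (customer_id, item)
customers = [(1, "Ann"), (2, "Bob"), (4, "Dee")]   # (customer_id, name)


def left_join(left, right):
    """Keep every left row; pad with None when no right row matches."""
    out = []
    for lk, lv in left:
        matches = [rv for rk, rv in right if rk == lk]
        if matches:
            out.extend((lk, lv, rv) for rv in matches)
        else:
            out.append((lk, lv, None))
    return out


def right_join(left, right):
    """Keep every right row; pad with None when no left row matches."""
    out = []
    for rk, rv in right:
        matches = [lv for lk, lv in left if lk == rk]
        if matches:
            out.extend((rk, lv, rv) for lv in matches)
        else:
            out.append((rk, None, rv))
    return out


def full_join(left, right):
    """Left join plus the right rows that matched nothing on the left."""
    out = left_join(left, right)
    left_keys = {lk for lk, _ in left}
    out.extend((rk, None, rv) for rk, rv in right if rk not in left_keys)
    return out


def semi_join(left, right):
    """Return left rows that have at least one match; no right columns."""
    right_keys = {rk for rk, _ in right}
    return [(lk, lv) for lk, lv in left if lk in right_keys]


def join_on(left, right, predicate):
    """Inner join on an arbitrary predicate, e.g. an inequality (in-equi join)."""
    return [(l, r) for l in left for r in right if predicate(l, r)]


left_result = left_join(orders, customers)
right_result = right_join(orders, customers)
full_result = full_join(orders, customers)
semi_result = semi_join(orders, customers)
# In-equi example: pair each order with customers whose id is strictly larger.
inequi_result = join_on(orders, customers, lambda o, c: o[0] < c[0])
```

The nested-loop approach is chosen for readability only; it says nothing about how the multi-stage engine actually executes joins.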

New SQL semantics support

Having (#9274)
Order by (#9279)
In/NotIn clause (#9374)
Cast (#9384)
Like/REGEXP_LIKE (#9654)
Range predicate (#9445)

Performance enhancement

Thread safe query planning (#9344)
Partial query execution and round robin scheduling (#9753)
Improve data table serde (#9731)

Major updates

Force commit consuming segments by @sajjad-moradi in #9197
add a freshness based consumption status checker by @jadami10 in #9244
Add metrics to track controller segment download and upload requests in progress by @gviedma in #9258
Adding endpoint to download local log files for each component by @xiangfu0 in #9259
[Feature] Add an option to search input files recursively in ingestion job. The default is set to true to be backward compatible. by @61yao in #9265
add query cancel APIs on controller backed by those on brokers by @klsince in #9276
Add Spark Job Launcher tool by @KKcorps in #9288
Enable Consistent Data Push for Standalone Segment Push Job Runners by @yuanbenson in #9295
Allow server to directly return the final aggregation result by @Jackie-Jiang in #9304
TierBasedSegmentDirectoryLoader to keep segments in multi-datadir by @klsince in #9306
Adaptive Server Selection by @vvivekiyer in #9311
[Feature] Support IsDistinctFrom and IsNotDistinctFrom by @61yao in #9312
Allow ingestion of errored records with incorrect datatype by @KKcorps in #9320
Allow setting custom time boundary for hybrid table queries by @saurabhd336 in #9356
skip late cron job with max allowed delay by @klsince in #9372
Do not allow implicit cast for BOOLEAN and TIMESTAMP by @Jackie-Jiang in #9385
Add missing properties in CSV plugin by @KKcorps in #9399
set MDC so that one can route minion task logs to separate files cleanly by @klsince in #9400
Add a new API to fix segment date time in metadata by @KKcorps in #9413
Update get bytes to return raw bytes of string and support getBytesMV by @61yao in #9441
Exposing consumer's record lag in /consumingSegmentsInfo by @navina in #9515
Do not create dictionary for high-cardinality columns by @KKcorps in #9527
get task runtime configs tracked in Helix by @klsince in #9540
Add more options to json index by @Jackie-Jiang in #9543
add SegmentTierAssigner and refine restful APIs to get segment tier info by @klsince in #9598
Add segment level debug API by @saurabhd336 in #9609
Add record availability lag for Kafka connector by @navina in #9621
notify servers that need to move segments to new tiers via SegmentReloadMessage by @klsince in #9624
Allow to configure multi-datadirs as instance configs and a Quickstart example about them by @klsince in #9705
Customize stopword for Lucene Index by @jasperjiaguo in #9708
Add memory optimized dimension table by @KKcorps in #9802
ADLS file system upgrade by @xiangfu0 in #9855
Added Delete Schema/Table pinot admin commands by @bagipriyank in #9857
Adding new ADLSPinotFS auth type: DEFAULT by @xiangfu0 in #9860
Add rate limit to Kinesis requests by @KKcorps in #9863
Adding configs for zk client timeout by @xiangfu0 in #9975
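
The IsDistinctFrom and IsNotDistinctFrom support listed above (#9312) refers to null-safe comparison. As a minimal sketch of the standard SQL semantics, assuming Pinot follows them, the plain-Python helpers below treat `None` as SQL NULL. The function names are hypothetical and are not part of any Pinot API.

```python
def is_distinct_from(a, b):
    """Null-safe inequality: two NULLs are not distinct; NULL vs. a value is."""
    if a is None or b is None:
        # Distinct only when exactly one side is NULL.
        return (a is None) != (b is None)
    return a != b


def is_not_distinct_from(a, b):
    """Null-safe equality: the negation of is_distinct_from."""
    return not is_distinct_from(a, b)


# Unlike a plain `=` comparison in SQL, these always yield True or False,
# never NULL.
checks = [
    is_distinct_from(None, None),
    is_distinct_from(None, 1),
    is_distinct_from(1, 1),
    is_distinct_from(1, 2),
    is_not_distinct_from(None, None),
]
```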

Other features/changes

Show most recent scheduling errors by @satishwaghela in #9161
Do not use aggregation result for distinct query in IntermediateResultsBlock by @Jackie-Jiang in #9262
Emit metrics for ratio of actual consumption rate to rate limit in real-time tables by @sajjad-moradi in #9201
add metrics entry offlineTableCount by @walterddr in #9270
refine query cancel resp msg by @klsince in #9242
add @ManualAuthorization annotation for non-standard endpoints by @apucher in #9252
Optimize ser/de to avoid using output stream by @Jackie-Jiang in #9278
Add Support for Covariance Function by @SabrinaZhaozyf in #9236
Throw an exception when MV columns are present in the order-by expression list in selection order-by only queries by @somandal in #9078
Improve server query cancellation and timeout checking during execution by @jasperjiaguo in #9286
Add capabilities to ingest from another stream without disabling the real-time table by @sajjad-moradi in #9289
Add minMaxInvalid flag to avoid unnecessary needPreprocess by @npawar in #9238
Add array cardinality function by @walterddr in #9300
Add support for custom null values in CSV record reader by @KKcorps in #9318
Infer parquet reader type based on file metadata by @saurabhd336 in #9294
Add Support for Cast Function on MV Columns by @SabrinaZhaozyf in #9296
[Feature] Not Operator Transformation by @61yao in #9330
Handle null string in CSV decoder by @KKcorps in #9340
[Feature] Not scalar function by @61yao in #9338
Add support for EXTRACT syntax and converts it to appropriate Pinot expression by @tanmesh in #9184
Add support for Auth in controller requests in java query client by @KKcorps in #9230
delete all related minion task metadata when deleting a table by @zhtaoxiang in #9339
BloomFilterRule should only recommend for supported column type by @yuanbenson in #9364
Support all the types in ParquetNativeRecordReader by @xiangfu0 in #9352
Improve segment name check in metadata push by @zhtaoxiang in #9359
Allow expression transformer to continue on error by @xiangfu0 in #9376
Enhance and filter predicate evaluation efficiency by @jasperjiaguo in #9336
Deprecate instanceId Config For Broker/Minion Specific Configs by @ankitsultana in #9308
Optimize combine operator to fully utilize threads by @Jackie-Jiang in #9387
Terminate the query after plan generation if timeout by @jasperjiaguo in #9386
[Feature] Support Coalesce for Column Names by @61yao in #9327
Disable logging for interrupted exceptions in kinesis by @KKcorps in #9405
Benchmark thread cpu time by @jasperjiaguo in #9408
Use ISODateTimeFormat as default for SIMPLE_DATE_FORMAT by @KKcorps in #9378
Extract the common logic for upsert metadata manager by @Jackie-Jiang in #9435
Make minion task metadata manager methods more generic by @saurabhd336 in #9436
Always pass clientId to kafka's consumer properties by @navina in #9444
Refine IndexHandler methods a bit to make them reentrant by @klsince in #9440
use MinionEventObserver to track finer grained task progress status on worker by @klsince in #9432
Allow spaces in input file paths by @KKcorps in #9426
Add support for gracefully handling the errors while transformations by @KKcorps in #9377
Cache Deleted Segment Names in Server to Avoid SegmentMissingError by @ankitsultana in #9423
Handle Invalid timestamps by @KKcorps in #9355
refine minion worker event observer to track finer grained progress for tasks by @klsince in #9449
spark-connector should use v2/brokers endpoint by @itschrispeck in #9451
Remove netty server query support from presto-pinot-driver to remove pinot-core and pinot-segment-local dependencies by @xiangfu0 in #9455
Adaptive Server Selection: Address pending review comments by @vvivekiyer in #9462
track progress from within segment processor framework by @klsince in #9457
Decouple ser/de from DataTable by @Jackie-Jiang in #9468
collect file info like mtime, length while listing files for free by @klsince in #9466
Extract record keys, headers and metadata from Stream sources by @navina in #9224
[pinot-spark-connector] Bump spark connector max inbound message size by @cbalci in #9475
refine the minion task progress api a bit by @klsince in #9482
add parsing for AT TIME ZONE by @agavra in #9477
Eliminate explosion of metrics due to gapfill queries by @elonazoulay in #9490
ForwardIndexHandler: Change compressionType during segmentReload by @vvivekiyer in #9454
Introduce Segment AssignmentStrategy Interface by @GSharayu in #9309
Add query interruption flag check to broker groupby reduction by @jasperjiaguo in #9499
adding optional client payload by @walterddr in #9465
[feature] distinct from scalar functions by @61yao in #9486
Check data table version on server only for null handling by @Jackie-Jiang in #9508
Add docId and column name to segment read exception by @KKcorps in #9512
Sort scanning based operators by cardinality in AndDocIdSet evaluation by @jasperjiaguo in #9420
Do not fail CI when codecov upload fails by @Jackie-Jiang in #9522
[Upsert] persist validDocsIndex snapshot for Pinot upsert optimization by @deemoliu in #9062
broker filter by @dongxiaoman in #9391
[feature] coalesce scalar by @61yao in #9487
[GHA] add cache timeout by @walterddr in #9524
Optimize PinotHelixResourceManager.hasTable() by @Jackie-Jiang in #9526
Include exception when upsert metadata manager cannot be created by @Jackie-Jiang in #9532
allow to config task expire time by @klsince in #9530
expose task finish time via debug API by @klsince in #9534
Remove the wrong warning log in KafkaPartitionLevelConsumer by @Jackie-Jiang in #9536
starting http server for minion worker conditionally by @klsince in #9542
Make StreamMessage generic and a bug fix by @vvivekiyer in #9544
Improve primary key serialization performance by @KKcorps in #9538
[Upsert] Skip removing upsert metadata when shutting down the server by @Jackie-Jiang in #9551
add array element at function by @walterddr in #9554
Handle the case when enableNullHandling is true and an aggregation function is used w/ a column that has an empty null bitmap by @nizarhejazi in #9566
Support segment storage format without forward index by @somandal in #9333
Adding SegmentNameGenerator type inference if not explicitly set in config by @timsants in #9550
add version information to JMX metrics & component logs by @agavra in #9578
remove unused RecordTransform/RecordFilter classes by @agavra in #9607
Support rewriting forward index upon changing compression type for existing raw MV column by @vvivekiyer in #9510
Support Avro's Fixed data type by @sajjad-moradi in #9642
[feature] [kubernetes] add loadBalancerSourceRanges to service-external.yaml for controller and broker by @jameskelleher in #9494
Limit up to 10 unavailable segments to be printed in the query exception by @Jackie-Jiang in #9617
remove more unused filter code by @agavra in #9620
Do not cache record reader in segment by @Jackie-Jiang in #9604
make first part of user agent header configurable by @rino-kadijk in #9471
optimize order by sorted ASC, unsorted and order by DESC cases by @gortiz in #8979
Enhance cluster config update API to handle non-string values properly by @Jackie-Jiang in #9635
Reverts recommender REST API back to PUT (reverts PR #9326) by @yuanbenson in #9638
Remove invalid pruner names from server config by @Jackie-Jiang in #9646
Using usageHelp instead of deprecated help in picocli commands by @navina in #9608
Handle unique query id on server by @Jackie-Jiang in #9648
stateless group marker missing several by @walterddr in #9673
Support reloading consuming segment using force commit by @Jackie-Jiang in #9640
Improve star-tree to use star-node when the predicate matches all the non-star nodes by @Jackie-Jiang in #9667
add FetchPlanner interface to decide what column index to prefetch by @klsince in #9668
Improve star-tree traversal using ArrayDeque by @Jackie-Jiang in #9688
Handle errors in combine operator by @Jackie-Jiang in #9689
return different error code if old version is not on master by @SabrinaZhaozyf in #9686
Support creating dictionary at runtime for an existing column by @vvivekiyer in #9678
check mutable segment explicitly instead of checking existence of indexDir by @klsince in #9718
Remove leftover file before downloading segmentTar by @npawar in #9719
add index key and size map to segment metadata by @walterddr in #9712
Use ideal state as source of truth for segment existence by @Jackie-Jiang in #9735
Close Filesystem on exit with Minion Tasks by @KKcorps in #9681
render the tables list even as the table sizes are loading by @jadami10 in #9741
Add Support for IP Address Function by @SabrinaZhaozyf in #9501
bubble up error messages from broker by @agavra in #9754
Add support to disable the forward index for existing columns by @somandal in #9740
show table metadata info in aggregate index size form by @walterddr in #9733
Preprocess immutable segments from REALTIME table conditionally when loading them by @klsince in #9772
revert default timeout nano change in QueryConfig by @agavra in #9790
AdaptiveServerSelection: Update stats for servers that have not responded by @vvivekiyer in #9801
Add null value index for default column by @KKcorps in #9777
[MergeRollupTask] include partition info into segment name by @zhtaoxiang in #9815
Adding a consumer lag as metric via a periodic task in controller by @navina in #9800
Deserialize Hyperloglog objects more optimally by @priyen in #9749
Download offline segments from peers by @wirybeaver in #9710
Thread Level Usage Accounting and Query Killing on Server by @jasperjiaguo in #9727
Add max merger and min mergers for partial upsert by @deemoliu in #9665
#9518 added pinot helm 0.2.6 with secure version pinot 0.11.0 by @bagipriyank in #9519
Combine the read access for replication config by @snleee in #9849
add v1 ingress in helm chart by @jhisse in #9862
Optimize AdaptiveServerSelection for replicaGroup based routing by @vvivekiyer in #9803
Do not sort the instances in InstancePartitions by @Jackie-Jiang in #9866
Merge new columns in existing record with default merge strategy by @navina in #9851
Support disabling dictionary at runtime for an existing column by @vvivekiyer in #9868
support BOOL_AND and BOOL_OR aggregate functions by @agavra in #9848
Use Pulsar AdminClient to delete unused subscriptions by @navina in #9859
add table sort function for table size by @jadami10 in #9844
In Kafka consumer, seek offset only when needed by @Jackie-Jiang in #9896
fallback if no broker found for the specified table name by @klsince in #9914
Allow liveness check during server shutting down by @Jackie-Jiang in #9915
Allow segment upload via Metadata in MergeRollup Minion task by @KKcorps in #9825
Add back the Helix workaround for missing IS change by @Jackie-Jiang in #9921
Allow uploading real-time segments via CLI by @KKcorps in #9861
Add capability to update and delete table config via CLI by @KKcorps in #9852
default to TAR if push mode is not set by @klsince in #9935
load startree index via segment reader interface by @klsince in #9828
Allow collections for MV transform functions by @saurabhd336 in #9908
Construct new IndexLoadingConfig when loading completed real-time segments by @vvivekiyer in #9938
Make GET /tableConfigs backwards compatible in case schema does not match raw table name by @timsants in #9922
feat: add compressed file support for ORCRecordReader by @etolbakov in #9884
Add Variance and Standard Deviation Aggregation Functions by @snleee in #9910
enable MergeRollupTask on real-time tables by @zhtaoxiang in #9890
Update cardinality when converting raw column to dict based by @vvivekiyer in #9875
Add back auth token for UploadSegmentCommand by @timsants in #9960
Improving gz support for avro record readers by @snleee in #9951
Default column handling of noForwardIndex and regeneration of forward index on reload path by @somandal in #9810
[Feature] Support coalesce literal by @61yao in #9958
Ability to initialize S3PinotFs with serverSideEncryption properties when passing client directly by @npawar in #9988
handle pending minion tasks properly when getting the task progress status by @klsince in #9911
allow gauge stored in metric registry to be updated by @zhtaoxiang in #9961
support case-insensitive query options in SET syntax by @agavra in #9912
pin versions-maven-plugin to 2.13.0 by @jadami10 in #9993
Pulsar Connection handler should not spin up a consumer / reader by @navina in #9893
Handle in-memory segment metadata for index checking by @Jackie-Jiang in #10017
Support the cross-account access using IAM role for S3 PinotFS by @snleee in #10009
report minion task metadata last update time as metric by @zhtaoxiang in #9954
support SKEWNESS and KURTOSIS aggregates by @agavra in #10021
emit minion task generation time and error metrics by @zhtaoxiang in #10026
Use the same default time value for all replicas by @Jackie-Jiang in #10029
Reduce the number of segments to wait for convergence when rebalancing by @saurabhd336 in #10028

UI Update & Improvement

Allow hiding query console tab based on cluster config (#9261)
Allow hiding pinot broker swagger UI by config (#9343)
Add UI to show fine-grained minion task progress (#9488)
Add UI to track segment reload progress (#9521)
Show minion task runtime config details in UI (#9652)
Redefine the segment status (#9699)
Show an option to reload the segments during edit schema (#9762)
Load schema UI async (#9781)
Fix blank screen when redirect to unknown app route (#9888)

Library version upgrade

Upgrade h3 lib from 3.7.2 to 4.0.0 to lower glibc requirement (#9335)
Upgrade ZK version to 3.6.3 (#9612)
Upgrade snakeyaml from 1.30 to 1.33 (#9464)
Upgrade RoaringBitmap from 0.9.28 to 0.9.35 (#9730)
Upgrade spotless-maven-plugin from 2.9.0 to 2.28.0 (#9877)
Upgrade decode-uri-component from 0.2.0 to 0.2.2 (#9941)

Bug fixes

Fix bug with logging request headers by @abhs50 in #9247
Fix a UT that only shows up on host with more cores by @klsince in #9257
Fix message count by @Jackie-Jiang in #9271
Fix issue with auth AccessType in Schema REST endpoints by @sajjad-moradi in #9293
Fix PerfBenchmarkRunner to skip the tmp dir by @Jackie-Jiang in #9298
Fix thrift deserializer thread safety issue by @saurabhd336 in #9299
Fix transformation to string for BOOLEAN and TIMESTAMP by @Jackie-Jiang in #9287
[hotfix] Add VARBINARY column to switch case branch by @walterddr in #9313
Fix annotation for "/recommender" endpoint by @sajjad-moradi in #9326
Fix jdk8 build issue due to missing pom dependency by @somandal in #9351
Fix pom to use pinot-common-jdk8 for pinot-connector jdk8 java client by @somandal in #9353
Fix log to reflect job type by @KKcorps in #9381
[Bugfix] schema update bug fix by @MeihanLi in #9382
fix histogram null pointer exception by @jasperjiaguo in #9428
Fix thread safety issues with SDF (WIP) by @saurabhd336 in #9425
Bug fix: failure status in ingestion jobs doesn't reflect in exit code by @KKcorps in #9410
Fix skip segment logic in MinMaxValueBasedSelectionOrderByCombineOperator by @Jackie-Jiang in #9434
Fix the bug of hybrid table request using the same request id by @Jackie-Jiang in #9443
Fix the range check for range index on raw column by @Jackie-Jiang in #9453
Fix Data-Correctness Bug in GTE Comparison in BinaryOperatorTransformFunction by @ankitsultana in #9461
extend PinotFS impls with listFilesWithMetadata and some bugfix by @klsince in #9478
fix null transform bound check by @walterddr in #9495
Fix JsonExtractScalar when no value is extracted by @Jackie-Jiang in #9500
Fix AddTable for real-time tables by @npawar in #9506
Fix some type convert scalar functions by @Jackie-Jiang in #9509
fix spammy logs for ConfluentSchemaRegistryRealtimeClusterIntegrationTest [MINOR] by @agavra in #9516
Fix timestamp index on column of preserved key by @Jackie-Jiang in #9533
Fix record extractor when ByteBuffer can be reused by @Jackie-Jiang in #9549
Fix explain plan ALL_SEGMENTS_PRUNED_ON_SERVER node by @somandal in #9572
Fix time validation when data type needs to be converted by @Jackie-Jiang in #9569
UI: fix incorrect task finish time by @jayeshchoudhary in #9557
Fix the bug where uploaded segments cannot be deleted on real-time table by @Jackie-Jiang in #9579
[bugfix] correct the dir for building segments in FileIngestionHelper by @zhtaoxiang in #9591
Fix NonAggregationGroupByToDistinctQueryRewriter by @Jackie-Jiang in #9605
fix distinct result return by @walterddr in #9582
Fix GcsPinotFS by @lfernandez93 in #9556
fix DataSchema thread-safe issue by @walterddr in #9619
Bug fix: Add missing table config fetch for /tableConfigs list all by @timsants in #9603
Fix re-uploading segment when the previous upload failed by @Jackie-Jiang in #9631
Fix string split which should be on whole separator by @Jackie-Jiang in #9650
Fix server request sent delay to be non-negative by @Jackie-Jiang in #9656
bugfix: Add missing BIG_DECIMAL support for GenericRow serde by @timsants in #9661
Fix extra restlet resource test which should be stateless by @Jackie-Jiang in #9674
AdaptiveServerSelection: Fix timer by @vvivekiyer in #9697
fix PinotVersion to be compatible with prometheus by @agavra in #9701
Fix the setup for ControllerTest shared cluster by @Jackie-Jiang in #9704
[hotfix]groovy class cache leak by @walterddr in #9716
Fix TIMESTAMP index handling in SegmentMapper by @Jackie-Jiang in #9722
Fix the server admin endpoint cache to reflect the config changes by @Jackie-Jiang in #9734
[bugfix] fix case-when issue by @walterddr in #9702
[bugfix] Let StartControllerCommand also handle "pinot.zk.server", "pinot.cluster.name" in default conf/pinot-controller.conf by @thangnd197 in #9739
[hotfix] semi-join opt by @walterddr in #9779
Fixing the rebalance issue for real-time table with tier by @snleee in #9780
UI: show segment debug details when segment is in bad state by @jayeshchoudhary in #9700
Fix the replication in segment assignment strategy by @GSharayu in #9816
fix potential fd leakage for SegmentProcessorFramework by @klsince in #9797
Fix NPE when reading ZK address from controller config by @Jackie-Jiang in #9751
have query table list show search bar; fix InstancesTables filter by @jadami10 in #9742
[pinot-spark-connector] Fix empty data table handling in GRPC reader by @cbalci in #9837
[bugfix] fix mergeRollupTask metrics by @zhtaoxiang in #9864
Bug fix: Get correct primary key count by @KKcorps in #9876
Fix issues for real-time table reload by @Jackie-Jiang in #9885
UI: fix segment status color remains same in different table page by @jayeshchoudhary in #9891
Fix bloom filter creation on BYTES by @Jackie-Jiang in #9898
[hotfix] broker selection not using table name by @walterddr in #9902
Fix race condition when 2 segment upload occurred for the same segment by @jackjlli in #9905
fix timezone_hour/timezone_minute functions by @agavra in #9949
[Bugfix] Move brokerId extraction to BaseBrokerStarter by @jackjlli in #9965
Fix ser/de for StringLongPair by @Jackie-Jiang in #9985
bugfix dir check for HadoopPinotFS.copyFromLocalDir by @klsince in #9979
Bugfix: Use correct exception import in TableRebalancer. by @mayankshriv in #10025
Fix NPE in AbstractMetrics From Race Condition by @ankitsultana in #10022
