0.12.0

Multi-Stage Query Engine

New join semantics support

New SQL semantics support

Performance enhancement

  • Thread safe query planning (#9344)

  • Partial query execution and round robin scheduling (#9753)

  • Improve data table serde (#9731)
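With partial query execution and round-robin scheduling, a long-running query no longer has to hold a worker until it finishes. Each query runs for a bounded slice of work and then yields, so shorter queries can make progress in between. The minimal Python sketch below models that idea with generators. The names `query` and `round_robin` and the "one block per slice" granularity are hypothetical illustrations, not the scheduler classes introduced in #9753.

```python
from collections import deque
from typing import Deque, Generator, Iterable, List


def query(name: str, num_blocks: int) -> Generator[str, None, None]:
    """Hypothetical query: processes one data block per slice, then yields control."""
    for i in range(num_blocks):
        yield f"{name}:block{i}"


def round_robin(queries: Iterable[Generator[str, None, None]]) -> List[str]:
    """Run each query for one slice at a time, cycling until every query is done."""
    ready: Deque[Generator[str, None, None]] = deque(queries)
    trace: List[str] = []
    while ready:
        q = ready.popleft()
        try:
            trace.append(next(q))  # execute one partial slice of work
        except StopIteration:
            continue               # query finished; drop it from the queue
        ready.append(q)            # re-queue so the other queries get a turn
    return trace


print(round_robin([query("q1", 3), query("q2", 1), query("q3", 2)]))
# prints ['q1:block0', 'q2:block0', 'q3:block0', 'q1:block1', 'q3:block1', 'q1:block2']
```

In this model, the short query `q2` produces its only block in the first round instead of waiting behind all three blocks of `q1`.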

Major updates

  • Force commit consuming segments by @sajjad-moradi in #9197

  • add a freshness-based consumption status checker by @jadami10 in #9244

  • Add metrics to track controller segment download and upload requests in progress by @gviedma in #9258

  • Adding endpoint to download local log files for each component by @xiangfu0 in #9259

  • [Feature] Add an option to search input files recursively in the ingestion job (defaults to true for backward compatibility) by @61yao in #9265

  • add query cancel APIs on controller backed by those on brokers by @klsince in #9276

  • Add Spark Job Launcher tool by @KKcorps in #9288

  • Enable Consistent Data Push for Standalone Segment Push Job Runners by @yuanbenson in #9295

  • Allow server to directly return the final aggregation result by @Jackie-Jiang in #9304

  • TierBasedSegmentDirectoryLoader to keep segments in multi-datadir by @klsince in #9306

  • Adaptive Server Selection by @vvivekiyer in #9311

  • [Feature] Support IsDistinctFrom and IsNotDistinctFrom by @61yao in #9312

  • Allow ingestion of errored records with incorrect datatype by @KKcorps in #9320

  • Allow setting custom time boundary for hybrid table queries by @saurabhd336 in #9356

  • skip late cron job with max allowed delay by @klsince in #9372

  • Do not allow implicit cast for BOOLEAN and TIMESTAMP by @Jackie-Jiang in #9385

  • Add missing properties in CSV plugin by @KKcorps in #9399

  • set MDC so that one can route minion task logs to separate files cleanly by @klsince in #9400

  • Add a new API to fix segment date time in metadata by @KKcorps in #9413

  • Update get bytes to return raw bytes of string and support getBytesMV by @61yao in #9441

  • Exposing consumer's record lag in /consumingSegmentsInfo by @navina in #9515

  • Do not create dictionary for high-cardinality columns by @KKcorps in #9527

  • get task runtime configs tracked in Helix by @klsince in #9540

  • Add more options to json index by @Jackie-Jiang in #9543

  • add SegmentTierAssigner and refine restful APIs to get segment tier info by @klsince in #9598

  • Add segment level debug API by @saurabhd336 in #9609

  • Add record availability lag for Kafka connector by @navina in #9621

  • notify servers that need to move segments to new tiers via SegmentReloadMessage by @klsince in #9624

  • Allow configuring multi-datadirs as instance configs, and add a Quickstart example for them by @klsince in #9705

  • Customize stopword for Lucene Index by @jasperjiaguo in #9708

  • Add memory optimized dimension table by @KKcorps in #9802

  • ADLS file system upgrade by @xiangfu0 in #9855

  • Added Delete Schema/Table pinot admin commands by @bagipriyank in #9857

  • Adding new ADLSPinotFS auth type: DEFAULT by @xiangfu0 in #9860

  • Add rate limit to Kinesis requests by @KKcorps in #9863

  • Adding configs for zk client timeout by @xiangfu0 in #9975
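IsDistinctFrom and IsNotDistinctFrom (#9312) correspond to the standard SQL `IS DISTINCT FROM` and `IS NOT DISTINCT FROM` operators. Unlike `<>`, they treat NULL as a comparable value and always return TRUE or FALSE. The small Python model below spells out the truth table, using `None` to stand in for SQL NULL. That mapping is an assumption of the sketch, and the code says nothing about how Pinot evaluates these operators internally.

```python
from typing import Optional


def sql_not_equal(a: Optional[object], b: Optional[object]) -> Optional[bool]:
    """Plain SQL `a <> b`: any NULL operand makes the result NULL (unknown)."""
    if a is None or b is None:
        return None
    return a != b


def is_distinct_from(a: Optional[object], b: Optional[object]) -> bool:
    """SQL `a IS DISTINCT FROM b`: NULL-safe, never returns NULL."""
    if a is None and b is None:
        return False  # two NULLs are treated as not distinct
    if a is None or b is None:
        return True   # exactly one NULL: the values are distinct
    return a != b


def is_not_distinct_from(a: Optional[object], b: Optional[object]) -> bool:
    """SQL `a IS NOT DISTINCT FROM b`: the negation of IS DISTINCT FROM."""
    return not is_distinct_from(a, b)


for a, b in [(1, 1), (1, 2), (1, None), (None, None)]:
    print(a, b, sql_not_equal(a, b), is_distinct_from(a, b))
# prints:
# 1 1 False False
# 1 2 True True
# 1 None None True
# None None None False
```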

Other features/changes

  • Show most recent scheduling errors by @satishwaghela in #9161

  • Do not use aggregation result for distinct query in IntermediateResultsBlock by @Jackie-Jiang in #9262

  • Emit metrics for ratio of actual consumption rate to rate limit in real-time tables by @sajjad-moradi in #9201

  • add metrics entry offlineTableCount by @walterddr in #9270

  • refine query cancel response message by @klsince in #9242

  • add @ManualAuthorization annotation for non-standard endpoints by @apucher in #9252

  • Optimize ser/de to avoid using output stream by @Jackie-Jiang in #9278

  • Add Support for Covariance Function by @SabrinaZhaozyf in #9236

  • Throw an exception when MV columns are present in the order-by expression list in selection order-by only queries by @somandal in #9078

  • Improve server query cancellation and timeout checking during execution by @jasperjiaguo in #9286

  • Add capabilities to ingest from another stream without disabling the real-time table by @sajjad-moradi in #9289

  • Add minMaxInvalid flag to avoid unnecessary needPreprocess by @npawar in #9238

  • Add array cardinality function by @walterddr in #9300

  • Add support for custom null values in CSV record reader by @KKcorps in #9318

  • Infer parquet reader type based on file metadata by @saurabhd336 in #9294

  • Add Support for Cast Function on MV Columns by @SabrinaZhaozyf in #9296

  • [Feature] Not Operator Transformation by @61yao in #9330

  • Handle null string in CSV decoder by @KKcorps in #9340

  • [Feature] Not scalar function by @61yao in #9338

  • Add support for EXTRACT syntax and convert it to the appropriate Pinot expression by @tanmesh in #9184

  • Add support for Auth in controller requests in java query client by @KKcorps in #9230

  • delete all related minion task metadata when deleting a table by @zhtaoxiang in #9339

  • BloomFilterRule should only recommend for supported column type by @yuanbenson in #9364

  • Support all the types in ParquetNativeRecordReader by @xiangfu0 in #9352

  • Improve segment name check in metadata push by @zhtaoxiang in #9359

  • Allow expression transformer to continue on error by @xiangfu0 in #9376

  • Enhance AND filter predicate evaluation efficiency by @jasperjiaguo in #9336

  • Deprecate instanceId Config For Broker/Minion Specific Configs by @ankitsultana in #9308

  • Optimize combine operator to fully utilize threads by @Jackie-Jiang in #9387

  • Terminate the query after plan generation if it has timed out by @jasperjiaguo in #9386

  • [Feature] Support Coalesce for Column Names by @61yao in #9327

  • Disable logging for interrupted exceptions in kinesis by @KKcorps in #9405

  • Benchmark thread cpu time by @jasperjiaguo in #9408

  • Use ISODateTimeFormat as default for SIMPLE_DATE_FORMAT by @KKcorps in #9378

  • Extract the common logic for upsert metadata manager by @Jackie-Jiang in #9435

  • Make minion task metadata manager methods more generic by @saurabhd336 in #9436

  • Always pass clientId to kafka's consumer properties by @navina in #9444

  • Refine IndexHandler methods a bit to make them reentrant by @klsince in #9440

  • use MinionEventObserver to track finer grained task progress status on worker by @klsince in #9432

  • Allow spaces in input file paths by @KKcorps in #9426

  • Add support for gracefully handling errors during transformations by @KKcorps in #9377

  • Cache Deleted Segment Names in Server to Avoid SegmentMissingError by @ankitsultana in #9423

  • Handle Invalid timestamps by @KKcorps in #9355

  • refine minion worker event observer to track finer grained progress for tasks by @klsince in #9449

  • spark-connector should use v2/brokers endpoint by @itschrispeck in #9451

  • Remove netty server query support from presto-pinot-driver to remove pinot-core and pinot-segment-local dependencies by @xiangfu0 in #9455

  • Adaptive Server Selection: Address pending review comments by @vvivekiyer in #9462

  • track progress from within segment processor framework by @klsince in #9457

  • Decouple ser/de from DataTable by @Jackie-Jiang in #9468

  • collect file info (e.g. mtime, length) for free while listing files by @klsince in #9466

  • Extract record keys, headers and metadata from Stream sources by @navina in #9224

  • [pinot-spark-connector] Bump spark connector max inbound message size by @cbalci in #9475

  • refine the minion task progress api a bit by @klsince in #9482

  • add parsing for AT TIME ZONE by @agavra in #9477

  • Eliminate explosion of metrics due to gapfill queries by @elonazoulay in #9490

  • ForwardIndexHandler: Change compressionType during segmentReload by @vvivekiyer in #9454

  • Introduce Segment AssignmentStrategy Interface by @GSharayu in #9309

  • Add query interruption flag check to broker groupby reduction by @jasperjiaguo in #9499

  • adding optional client payload by @walterddr in #9465

  • [feature] distinct from scalar functions by @61yao in #9486

  • Check data table version on server only for null handling by @Jackie-Jiang in #9508

  • Add docId and column name to segment read exception by @KKcorps in #9512

  • Sort scanning based operators by cardinality in AndDocIdSet evaluation by @jasperjiaguo in #9420

  • Do not fail CI when codecov upload fails by @Jackie-Jiang in #9522

  • [Upsert] persist validDocsIndex snapshot for Pinot upsert optimization by @deemoliu in #9062

  • broker filter by @dongxiaoman in #9391

  • [feature] coalesce scalar by @61yao in #9487

  • [GHA] add cache timeout by @walterddr in #9524

  • Optimize PinotHelixResourceManager.hasTable() by @Jackie-Jiang in #9526

  • Include exception when upsert metadata manager cannot be created by @Jackie-Jiang in #9532

  • allow configuring task expiration time by @klsince in #9530

  • expose task finish time via debug API by @klsince in #9534

  • Remove the wrong warning log in KafkaPartitionLevelConsumer by @Jackie-Jiang in #9536

  • starting http server for minion worker conditionally by @klsince in #9542

  • Make StreamMessage generic and fix a bug by @vvivekiyer in #9544

  • Improve primary key serialization performance by @KKcorps in #9538

  • [Upsert] Skip removing upsert metadata when shutting down the server by @Jackie-Jiang in #9551

  • add array element at function by @walterddr in #9554

  • Handle the case when enableNullHandling is true and an aggregation function is used with a column that has an empty null bitmap by @nizarhejazi in #9566

  • Support segment storage format without forward index by @somandal in #9333

  • Adding SegmentNameGenerator type inference if not explicitly set in config by @timsants in #9550

  • add version information to JMX metrics & component logs by @agavra in #9578

  • remove unused RecordTransform/RecordFilter classes by @agavra in #9607

  • Support rewriting forward index upon changing compression type for existing raw MV column by @vvivekiyer in #9510

  • Support Avro's Fixed data type by @sajjad-moradi in #9642

  • [feature] [kubernetes] add loadBalancerSourceRanges to service-external.yaml for controller and broker by @jameskelleher in #9494

  • Print at most 10 unavailable segments in the query exception by @Jackie-Jiang in #9617

  • remove more unused filter code by @agavra in #9620

  • Do not cache record reader in segment by @Jackie-Jiang in #9604

  • make first part of user agent header configurable by @rino-kadijk in #9471

  • optimize order by sorted ASC, unsorted and order by DESC cases by @gortiz in #8979

  • Enhance cluster config update API to handle non-string values properly by @Jackie-Jiang in #9635

  • Reverts recommender REST API back to PUT (reverts PR #9326) by @yuanbenson in #9638

  • Remove invalid pruner names from server config by @Jackie-Jiang in #9646

  • Using usageHelp instead of deprecated help in picocli commands by @navina in #9608

  • Handle unique query id on server by @Jackie-Jiang in #9648

  • stateless group marker missing several by @walterddr in #9673

  • Support reloading consuming segment using force commit by @Jackie-Jiang in #9640

  • Improve star-tree to use star-node when the predicate matches all the non-star nodes by @Jackie-Jiang in #9667

  • add FetchPlanner interface to decide what column index to prefetch by @klsince in #9668

  • Improve star-tree traversal using ArrayDeque by @Jackie-Jiang in #9688

  • Handle errors in combine operator by @Jackie-Jiang in #9689

  • return different error code if old version is not on master by @SabrinaZhaozyf in #9686

  • Support creating dictionary at runtime for an existing column by @vvivekiyer in #9678

  • check mutable segment explicitly instead of checking existence of indexDir by @klsince in #9718

  • Remove leftover file before downloading segmentTar by @npawar in #9719

  • add index key and size map to segment metadata by @walterddr in #9712

  • Use ideal state as source of truth for segment existence by @Jackie-Jiang in #9735

  • Close Filesystem on exit with Minion Tasks by @KKcorps in #9681

  • render the tables list even as the table sizes are loading by @jadami10 in #9741

  • Add Support for IP Address Function by @SabrinaZhaozyf in #9501

  • bubble up error messages from broker by @agavra in #9754

  • Add support to disable the forward index for existing columns by @somandal in #9740

  • show table metadata info in aggregate index size form by @walterddr in #9733

  • Preprocess immutable segments from REALTIME table conditionally when loading them by @klsince in #9772

  • revert default timeout nano change in QueryConfig by @agavra in #9790

  • AdaptiveServerSelection: Update stats for servers that have not responded by @vvivekiyer in #9801

  • Add null value index for default column by @KKcorps in #9777

  • [MergeRollupTask] include partition info into segment name by @zhtaoxiang in #9815

  • Adding a consumer lag as metric via a periodic task in controller by @navina in #9800

  • Deserialize Hyperloglog objects more optimally by @priyen in #9749

  • Download offline segments from peers by @wirybeaver in #9710

  • Thread Level Usage Accounting and Query Killing on Server by @jasperjiaguo in #9727

  • Add max and min mergers for partial upsert by @deemoliu in #9665

  • Add Pinot Helm chart 0.2.6 with secure version Pinot 0.11.0 (#9518) by @bagipriyank in #9519

  • Combine the read access for replication config by @snleee in #9849

  • add v1 ingress in helm chart by @jhisse in #9862

  • Optimize AdaptiveServerSelection for replicaGroup based routing by @vvivekiyer in #9803

  • Do not sort the instances in InstancePartitions by @Jackie-Jiang in #9866

  • Merge new columns in existing record with default merge strategy by @navina in #9851

  • Support disabling dictionary at runtime for an existing column by @vvivekiyer in #9868

  • support BOOL_AND and BOOL_OR aggregate functions by @agavra in #9848

  • Use Pulsar AdminClient to delete unused subscriptions by @navina in #9859

  • add table sort function for table size by @jadami10 in #9844

  • In Kafka consumer, seek offset only when needed by @Jackie-Jiang in #9896

  • fallback if no broker found for the specified table name by @klsince in #9914

  • Allow liveness check during server shutdown by @Jackie-Jiang in #9915

  • Allow segment upload via Metadata in MergeRollup Minion task by @KKcorps in #9825

  • Add back the Helix workaround for missing IS change by @Jackie-Jiang in #9921

  • Allow uploading real-time segments via CLI by @KKcorps in #9861

  • Add capability to update and delete table config via CLI by @KKcorps in #9852

  • default to TAR if push mode is not set by @klsince in #9935

  • load startree index via segment reader interface by @klsince in #9828

  • Allow collections for MV transform functions by @saurabhd336 in #9908

  • Construct new IndexLoadingConfig when loading completed real-time segments by @vvivekiyer in #9938

  • Make GET /tableConfigs backwards compatible in case schema does not match raw table name by @timsants in #9922

  • feat: add compressed file support for ORCRecordReader by @etolbakov in #9884

  • Add Variance and Standard Deviation Aggregation Functions by @snleee in #9910

  • enable MergeRollupTask on real-time tables by @zhtaoxiang in #9890

  • Update cardinality when converting raw column to dict based by @vvivekiyer in #9875

  • Add back auth token for UploadSegmentCommand by @timsants in #9960

  • Improving gz support for avro record readers by @snleee in #9951

  • Default column handling of noForwardIndex and regeneration of forward index on reload path by @somandal in #9810

  • [Feature] Support coalesce literal by @61yao in #9958

  • Ability to initialize S3PinotFs with serverSideEncryption properties when passing client directly by @npawar in #9988

  • handle pending minion tasks properly when getting the task progress status by @klsince in #9911

  • allow gauge stored in metric registry to be updated by @zhtaoxiang in #9961

  • support case-insensitive query options in SET syntax by @agavra in #9912

  • pin versions-maven-plugin to 2.13.0 by @jadami10 in #9993

  • Pulsar Connection handler should not spin up a consumer / reader by @navina in #9893

  • Handle in-memory segment metadata for index checking by @Jackie-Jiang in #10017

  • Support the cross-account access using IAM role for S3 PinotFS by @snleee in #10009

  • report minion task metadata last update time as metric by @zhtaoxiang in #9954

  • support SKEWNESS and KURTOSIS aggregates by @agavra in #10021

  • emit minion task generation time and error metrics by @zhtaoxiang in #10026

  • Use the same default time value for all replicas by @Jackie-Jiang in #10029

  • Reduce the number of segments to wait for convergence when rebalancing by @saurabhd336 in #10028
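Several of the new aggregates above (variance and standard deviation, covariance, skewness and kurtosis) cannot be combined by simply adding partial results. In a scatter-gather engine, they are typically computed per segment or per server and merged later, which requires a mergeable intermediate state. The Python sketch below shows one common way to do this for variance: Welford's single-pass update within a partition, plus Chan et al.'s pairwise merge across partitions. It is a generic illustration under that assumption, not Pinot's actual intermediate-result format, and the names `VarState`, `accumulate` and `merge` are hypothetical.

```python
import math
from dataclasses import dataclass
from functools import reduce
from typing import Iterable


@dataclass(frozen=True)
class VarState:
    """Partial aggregation state: count, running mean, sum of squared deviations (M2)."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0


def accumulate(values: Iterable[float]) -> VarState:
    """Welford's single-pass update over one partition (e.g. one segment)."""
    n, mean, m2 = 0, 0.0, 0.0
    for x in values:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    return VarState(n, mean, m2)


def merge(a: VarState, b: VarState) -> VarState:
    """Combine two partial states with Chan et al.'s parallel formula."""
    if a.n == 0:
        return b
    if b.n == 0:
        return a
    n = a.n + b.n
    delta = b.mean - a.mean
    mean = a.mean + delta * b.n / n
    m2 = a.m2 + b.m2 + delta * delta * a.n * b.n / n
    return VarState(n, mean, m2)


def var_pop(s: VarState) -> float:
    return s.m2 / s.n        # population variance


def var_samp(s: VarState) -> float:
    return s.m2 / (s.n - 1)  # sample variance


partitions = [[2.0, 4.0, 4.0], [4.0, 5.0], [5.0, 7.0, 9.0]]
total = reduce(merge, (accumulate(p) for p in partitions), VarState())
print(total.n, round(total.mean, 6), round(var_pop(total), 6), round(math.sqrt(var_pop(total)), 6))
# prints 8 5.0 4.0 2.0
```

Because the merge is associative, partial states can be combined in any grouping (segment, then server, then broker). The same pattern extends to covariance by tracking a co-moment instead of M2, and to skewness and kurtosis by also tracking the third and fourth central moments.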

UI Update & Improvement

  • Allow hiding query console tab based on cluster config (#9261)

  • Allow hiding pinot broker swagger UI by config (#9343)

  • Add UI to show fine-grained minion task progress (#9488)

  • Add UI to track segment reload progress (#9521)

  • Show minion task runtime config details in UI (#9652)

  • Redefine the segment status (#9699)

  • Show an option to reload the segments during edit schema (#9762)

  • Load schema UI async (#9781)

  • Fix blank screen when redirecting to an unknown app route (#9888)

Library version upgrade

  • Upgrade h3 lib from 3.7.2 to 4.0.0 to lower glibc requirement (#9335)

  • Upgrade ZK version to 3.6.3 (#9612)

  • Upgrade snakeyaml from 1.30 to 1.33 (#9464)

  • Upgrade RoaringBitmap from 0.9.28 to 0.9.35 (#9730)

  • Upgrade spotless-maven-plugin from 2.9.0 to 2.28.0 (#9877)

  • Upgrade decode-uri-component from 0.2.0 to 0.2.2 (#9941)

Bug fixes