0.11.0

Summary

Apache Pinot 0.11.0 introduces many new features that extend its query capabilities. The multi-stage query engine enables Pinot to do distributed joins, and new SQL syntax (DML support), query functions, and indexes (text index, timestamp index) support new use cases. As always, there are also more integrations with other systems (e.g. Spark 3, Flink).

Note: this release includes a major upgrade of Apache Helix to 1.0.4, so make sure you upgrade the system in the following order:

Helix Controller -> Pinot Controller -> Pinot Broker -> Pinot Server

Multi-Stage Query Engine

The new multi-stage query engine (a.k.a. the V2 query engine) is designed to support more complex SQL semantics such as JOIN, OVER window, and MATCH_RECOGNIZE, and eventually to bring Pinot closer to full ANSI SQL semantics. More to read: https://docs.pinot.apache.org/developers/advanced/v2-multi-stage-query-engine
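As a rough illustration of the kind of query the multi-stage engine is meant for, the sketch below builds a JSON body for the broker's `/query/sql` endpoint carrying a JOIN. It assumes the engine is selected per query through the `useMultistageEngine=true` query option; check the linked documentation for the exact switch in your version. The table and column names (`orders`, `customers`, `customer_id`) are hypothetical, and nothing is sent over the network.

```python
import json


def build_multistage_request(sql: str) -> str:
    """Return a JSON request body that asks the broker to use the multi-stage engine."""
    payload = {
        "sql": sql,
        # Assumption: the multi-stage engine is enabled through this query option.
        "queryOptions": "useMultistageEngine=true",
    }
    return json.dumps(payload)


# Hypothetical tables and columns, for illustration only.
join_sql = (
    "SELECT o.order_id, c.name "
    "FROM orders AS o JOIN customers AS c ON o.customer_id = c.customer_id "
    "LIMIT 10"
)

if __name__ == "__main__":
    # The resulting string could be POSTed to <broker>/query/sql.
    print(build_multistage_request(join_sql))
```

Keeping the request builder separate from the HTTP call makes it easy to inspect or log the payload before sending it to a broker.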

Pause Stream Consumption on Apache Pinot

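Pinot operators can pause real-time consumption of events while queries are being executed, and then resume consumption when ready to do so again. More to read: https://medium.com/apache-pinot-developer-blog/pause-stream-consumption-on-apache-pinot-772a971ef403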
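A minimal sketch of how an operator script might address the pause and resume operations is shown below. It assumes the controller exposes them as `POST /tables/{tableName}/pauseConsumption` and `POST /tables/{tableName}/resumeConsumption`; verify the paths against the controller's API reference for your deployment. The controller URL and the table name `myTable` are placeholders, and the helper only builds the request target rather than sending it.

```python
from urllib.parse import quote


def consumption_request(controller_url: str, table_name: str, action: str) -> tuple:
    """Return (HTTP method, URL) for pausing or resuming consumption of a real-time table."""
    # Assumption: these are the controller paths for the two operations.
    paths = {"pause": "pauseConsumption", "resume": "resumeConsumption"}
    if action not in paths:
        raise ValueError(f"unknown action: {action!r}")
    base = controller_url.rstrip("/")
    # Percent-encode the table name so it is safe to embed in the URL path.
    return ("POST", f"{base}/tables/{quote(table_name, safe='')}/{paths[action]}")


if __name__ == "__main__":
    # Placeholder controller address and table name.
    for action in ("pause", "resume"):
        print(consumption_request("http://localhost:9000", "myTable", action))
```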

Gap-filling function

The gapfilling functions allow users to interpolate data and perform powerful aggregations and data processing over time series data. More to read: https://www.startree.ai/blog/gapfill-function-for-time-series-datasets-in-pinot

Add support for Spark 3.x (#8560)

A long-awaited feature: segment generation on Spark 3.x.

Add Flink Pinot connector (#8233)

Similar to the Spark Pinot connector, this allows Flink users to dump data from a Flink application to Pinot.

Show running queries and cancel query by id (#9171)

This feature allows finer-grained control over Pinot queries.

Timestamp Index (#8343)

This allows users to get better query performance on the timestamp column for lower granularity. See: https://docs.pinot.apache.org/basics/indexing/timestamp-index

Native Text Indices (#8384)

Want to search text in real time? The new text indexing engine in Pinot supports the following capabilities:

  1. New operator: LIKE

select * FROM foo where text_col LIKE 'a%'

  2. New operator: CONTAINS

select * from foo where text_col CONTAINS 'bar'

  3. Native text index, built from the ground up, focusing on Pinot’s time series use cases and utilizing existing Pinot indices and structures (inverted index, bitmap storage).

  4. Real Time Text Index

Read more: https://medium.com/@atri.jiit/text-search-time-series-style-681af37ba42e

Adding DML definition and parse SQL InsertFile (#8557)

Now you can use INSERT INTO [database.]table FROM FILE dataDirURI OPTION ( k=v ) [, OPTION (k=v)]* to load data into Pinot from a file using Minion. See: https://docs.pinot.apache.org/basics/data-import/from-query-console

Deduplication (#8708)

This feature supports enabling deduplication for real-time tables via a top-level table config. At a high level, hashes of the primaryKey (as defined in the table schema) are stored in in-memory data structures, and each incoming row is validated against them. Duplicate rows are dropped.

To use this feature, the stream is expected to be partitioned by the primary key, strictReplicaGroup routing must be enabled, and the configured stream consumer type must be low level. These requirements are enforced by the table config API's input validations.

Functions support and changes:

Add support for functions arrayConcatLong, arrayConcatFloat, arrayConcatDouble ()

Add support for regexpReplace scalar function ()

Add support for Base64 Encode/Decode Scalar Functions ()

Optimize LIKE to regexp conversion so that it does not include unnecessary ^.* and .*$ ()

Support DISTINCT on multiple MV columns ()

Support DISTINCT on single MV column ()

Add histogram aggregation function ()

Optimize dateTimeConvert scalar function to only parse the format once ()

Support conjugates for scalar functions, add more scalar functions ()

Add FIRSTWITHTIME aggregate function support ()

Add PercentileSmartTDigestAggregationFunction ()

Simplify the parameters for DistinctCountSmartHLLAggregationFunction ()

Add scalar function for cast so it can be calculated at compile time ()

Scalable Gapfill Implementation for Avg/Count/Sum ()

Add commonly used math, string and date scalar functions in Pinot ()

Datetime transform functions ()

Scalar function for url encoding and decoding ()

Add support for IS NULL and NOT IS NULL in transform functions ()

Support st_contains using H3 index ()

The full list of features introduced in this release

Add query cancel APIs on controller backed by those on brokers ()

Add an option to search input files recursively in ingestion job. The default is set to true to be backward compatible. ()

Adding endpoint to download local log files for each component ()

Add metrics to track controller segment download and upload requests in progress ()

Add a freshness based consumption status checker ()

Force commit consuming segments ()

Adding kafka offset support for period and timestamp ()

Make upsert metadata manager pluggable ()

Adding logger utils and allowing the logger level to be changed at runtime ()

Proper null handling in equality, inequality and membership operators for all SV column data types ()

Support to show running queries and cancel query by id ()

Enhance upsert metadata handling ()

Proper null handling in Aggregation functions for SV data types ()

Add support for IAM role based credentials in Kinesis Plugin ()

Task generator debug API ()

Add Segment Lineage List API ()

[colocated-join] Adds Support for instancePartitionsMap in Table Config ()

Support pause/resume consumption of real-time tables ()

Minion tab in Pinot UI ()

Add Protocol Buffer Stream Decoder ()

Update minion task metadata ZNode path ()

Add /tasks/{taskType}/{tableNameWithType}/debug API ()

Defined a new broker metric for total query processing time ()

Proper null handling in SELECT, ORDER BY, DISTINCT, and GROUP BY ()

Fixing REGEX OPTION parser ()

Enable key value byte stitching in PulsarMessageBatch ()

Add property to skip adding hadoop jars to package ()

Support DISTINCT on multiple MV columns ()

Implement Mutable FST Index ()

Support DISTINCT on single MV column ()

Add controller API for reload segment task status ()

Spark Connector, support for TIMESTAMP and BOOLEAN fields ()

Allow moveToFinalLocation in METADATA push based on config () ()

Allow up to 4GB per bitmap index ()

Deprecate debug options and always use query options ()

Streamed segment download & untar with rate limiter to control disk usage ()

Improve the Explain Plan accuracy ()

Allow setting https as the default scheme ()

Add histogram aggregation function ()

Allow table name with dots by a PinotConfiguration switch ()

Disable Groovy function by default ()

Deduplication ()

Add pluggable client auth provider ()

Adding pinot file system command ()

Allow broker to automatically rewrite expensive function to its approximate counterpart ()

Allow taking data outside the time window by negating the window filter ()

Support BigDecimal raw value forward index; Support BigDecimal in many transforms and operators ()

Ingestion Aggregation Feature ()

Enable uploading segments to real-time tables ()

Package kafka 0.9 shaded jar to pinot-distribution ()

Simplify the parameters for DistinctCountSmartHLLAggregationFunction ()

Add PercentileSmartTDigestAggregationFunction ()

Add support for Spark 3.x ()

Adding DML definition and parse SQL InsertFile ()

Endpoints to get and delete minion task metadata ()

Add query option to use more replica groups ()

Only discover public methods annotated with @ScalarFunction ()

Support single-valued BigDecimal in schema, type conversion, SQL statements and minimum set of transforms. ()

Add connection based FailureDetector ()

Add endpoints for some finer control on minion tasks ()

Add adhoc minion task creation endpoint ()

Rewrite PinotQuery based on expression hints at instance/segment level ()

Allow disabling dict generation for High cardinality columns ()

Add segment size metric on segment push ()

Implement Native Text Operator ()

Change default memory allocation for consuming segments from on-heap to off-heap ()

New Pinot storage metrics for compressed tar.gz and table size w/o replicas ()

Add an experimental API for upsert heap memory estimation ()

Timestamp type index ()

Upgrade Helix to 1.0.4 in Pinot ()

Allow overriding expression in query through query config ()

Always handle null time values ()

Add prefixesToRename config for renaming fields upon ingestion ()

Added multi column partitioning for offline table ()

Automatically update broker resource on broker changes ()

Vulnerability fixes

Pinot has resolved all the high-level vulnerability issues:

Add a new workflow to check vulnerabilities using trivy ()

Disable Groovy function by default ()

Upgrade netty due to security vulnerability ()

Upgrade protobuf as the current version has security vulnerability ()

Upgrade to Hadoop 2.10.1 due to CVEs ()

Upgrade Helix to 1.0.4 ()

Upgrade thrift to 0.15.0 ()

Upgrade jetty due to security issue ()

Upgrade netty ()

Upgrade snappy version ()

Bug fixes

Nested arrays and map not handled correctly for complex types ()

Fix empty data block not returning schema ()

Allow mvn build with development webpack; fix instances default value ()

Fix the race condition of reflection scanning classes ()

Fix ingress manifest for controller and broker ()

Fix jvm processors count ()

Fix grpc query server not setting max inbound msg size ()

Fix upsert replace ()

Fix the race condition for partial upsert record read ()

Fix log msg, as it missed one param value ()

Fix authentication issue when auth annotation is not required ()

Fix segment pruning that can break server subquery ()

Fix the NPE for ADLSGen2PinotFS ()

Fix cross merge ()

Fix LaunchDataIngestionJobCommand auth header ()

Fix catalog skipping ()

Fix adding util for getting URL from InstanceConfig ()

Fix string length in MutableColumnStatistics ()

Fix instance details page loading table for tenant ()

Fix thread safety issue with java client ()

Fix allSegmentLoaded check ()

Fix bug in segmentDetails table name parsing; style the new indexes table ()

Fix pulsar close bug ()

Fix REGEX OPTION parser ()

Avoid reporting negative values for server latency. ()

Fix getConfigOverrides in MinionQuickstart ()

Fix segment generation error handling ()

Fix multi stage engine serde ()

Fix server discovery ()

Fix Upsert config validation to check for metrics aggregation ()

Fix multi value column index creation ()

Fix grpc port assignment in multiple server quickstart ()

Spark Connector GRPC reader fix for reading real-time tables ()

Fix auth provider for minion ()

Fix metadata push mode in IngestionUtils ()

Misc fixes on segment validation for uploaded real-time segments ()

Fix a typo in ServerInstance.startQueryServer() ()

Fix the issue of server opening up query server prematurely ()

Fix regression where case order was reversed, add regression test ()

Fix dimension table load when server restart or reload table ()

Fix the exception thrown by the H3 inclusion index when there are two index filter operators ()

Fix the race condition of reading time boundary info ()

Fix pruning in expressions by max/min/bloom ()

Fix GcsPinotFs listFiles by using bucket directly ()

Fix column data type store for data table ()

Fix the potential NPE for timestamp index rewrite ()

Fix on timeout string format in KinesisDataProducer ()

Fix bug in segment rebalance with replica group segment assignment ()

Fix the upsert metadata bug when adding segment with same comparison value ()

Fix the deadlock in ClusterChangeMediator ()

Fix BigDecimal ser/de on negative scale ()

Fix table creation bug for invalid real-time consumer props ()

Fix the bug of missing dot to extract sub props from ingestion job filesystem spec and minion segmentNameGeneratorSpec ()

Fix to query inconsistencies under heavy upsert load (resolves ) ()

Fix ChildTraceId when using multiple child threads, make them unique ()

Fix the group-by reduce handling when query times out ()

Fix a typo in BaseBrokerRequestHandler ()

Fix TIMESTAMP data type usage during segment creation ()

Fix async-profiler install ()

Fix ingestion transform config bugs. ()

Fix upsert inconsistency by snapshotting the validDocIds before reading the numDocs ()

Fix bug when importing files with the same name in different directories ()

Fix the missing NOT handling ()

Fix setting of metrics compression type in RealtimeSegmentConverter ()

Fix segment status checker to skip push in-progress segments ()

Fix datetime truncate for multi-day ()

Fix redirections for routes with access-token ()

Fix CSV files surrounding space issue ()

Fix suppressed exceptions in GrpcBrokerRequestHandler ()

https://medium.com/apache-pinot-developer-blog/pause-stream-consumption-on-apache-pinot-772a971ef403
https://www.startree.ai/blog/gapfill-function-for-time-series-datasets-in-pinot
#8560
#8233
#9171
#8343
https://docs.pinot.apache.org/basics/indexing/timestamp-index
#8384
https://medium.com/@atri.jiit/text-search-time-series-style-681af37ba42e
#8557
https://docs.pinot.apache.org/basics/data-import/from-query-console
#8708
#9131
#9123
#9114
#8893
#8873
#8857
#8724
#8939
#8582
#7647
#8181
#8565
#8566
#8535
#8647
#8304
#8397
#8378
#8264
#8498
#9276
#9265
#9259
#9258
#9244
#9197
#9193
#9186
#9180
#9173
#9171
#9095
#9086
#9071
#9058
#9005
#9006
#8989
#8986
#8970
#8978
#8972
#8959
#8949
#8941
#8927
#8905
#8897
#8888
#8873
#8861
#8857
#8828
#8825
#8823
#8815
#8796
#8768
#8753
#8738
#8729
#8724
#8713
#8711
#8708
#8670
#8659
#8655
#8640
#8622
#8611
#8584
#8569
#8566
#8565
#8560
#8557
#8551
#8550
#8544
#8503
#8491
#8486
#8465
#8451
#8398
#8387
#8384
#8380
#8358
#8355
#8343
#8325
#8319
#8310
#8273
#8255
#8249
#9044
#8711
#8328
#8287
#8478
#8325
#8427
#8348
#8346
#8494
#9235
#9222
#9179
#9167
#9135
#9138
#9126
#9132
#9130
#9124
#9110
#9090
#9088
#9087
#9070
#9069
#8856
#9059
#9035
#8971
#9010
#8958
#8913
#8905
#8892
#8858
#8812
#8689
#8664
#8781
#8848
#8834
#8824
#8831
#8802
#8786
#8794
#8785
#8748
#8721
#8707
#8685
#8672
#8656
#8648
#8633
#8631
#8598
#8590
#8572
#8553
#8509
#8511
#7958
#7971
#8443
#8450
#8448
#8407
#8404
#8394
#8392
#8337
#8366
#8350
#8323
#8327
#8285
#9028
#8272
https://docs.pinot.apache.org/developers/advanced/v2-multi-stage-query-engine