Apache Pinot 0.11.0 has introduced many new features to extend the query abilities, e.g. the Multi-Stage query engine enables Pinot to do distributed joins, more sql syntax(DML support), query functions and indexes(Text index, Timestamp index) supported for new use cases. And as always, more integrations with other systems(E.g. Spark3, Flink).
Note: there is a major upgrade for Apache Helix to 1.0.4, so make sure you upgrade the system in the order of:
Helix Controller -> Pinot Controller -> Pinot Broker -> Pinot server
The new multi-stage query engine (a.k.a V2 query engine) is designed to support more complex SQL semantics such as JOIN, OVER window, MATCH_RECOGNIZE and eventually, make Pinot support closer to full ANSI SQL semantics. More to read: https://docs.pinot.apache.org/developers/advanced/v2-multi-stage-query-engine
Pinot operators can pause real-time consumption of events while queries are being executed, and then resume consumption when ready to do so again.\
The gapfilling functions allow users to interpolate data and perform powerful aggregations and data processing over time series data. More to read: https://www.startree.ai/blog/gapfill-function-for-time-series-datasets-in-pinot
Long waiting feature for segment generation on Spark 3.x.
Similar to the Spark Pinot connector, this allows Flink users to dump data from the Flink application to Pinot.
This feature allows better fine-grained control on pinot queries.
Wanna search text in real time? The new text indexing engine in Pinot supports the following capabilities:
- 1.New operator: LIKE
select * FROM foo where text_col LIKE 'a%'
- 1.New operator: CONTAINS
select * from foo where text_col CONTAINS 'bar'
- 1.Native text index, built from the ground up, focusing on Pinot’s time series use cases and utilizing existing Pinot indices and structures(inverted index, bitmap storage).
- 2.Real Time Text Index
Now you can use
INSERT INTO [database.]table FROM FILE dataDirURI OPTION ( k=v ) [, OPTION (k=v)]*to load data into Pinot from a file using Minion. See: https://docs.pinot.apache.org/basics/data-import/from-query-console
This feature supports enabling deduplication for real-time tables, via a top-level table config. At a high level, primaryKey (as defined in the table schema) hashes are stored into in-memory data structures, and each incoming row is validated against it. Duplicate rows are dropped.
The expectation while using this feature is for the stream to be partitioned by the primary key, strictReplicaGroup routing to be enabled, and the configured stream consumer type to be low level. These requirements are therefore mandated via table config API's input validations.
Pinot has resolved all the high-level vulnerabilities issues: