Multi-Stage Query Engine
The new Pinot query engine version 2 (a.k.a. the multi-stage V2 query engine) is designed to support more complex SQL semantics such as JOIN, OVER window, and MATCH_RECOGNIZE, and eventually to bring Pinot closer to full ANSI SQL semantics.
It also resolves the bottleneck in the broker reduce stage of the scatter-gather query engine, where only a single machine does the heavy lifting, such as merging high-cardinality GROUP BY results or sorting for ORDER BY.
How to use the multi-stage query engine
To enable the multi-stage engine, make sure to do one of the following:
Build Apache Pinot from the latest master commit.
Download the latest Apache Pinot Docker image.
The V2 query engine is still in beta, so there may be performance or feature gaps compared with the current query engine.
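Once a build with the multi-stage engine is running, each query still has to ask for it. The sketch below assumes the engine is selected per query through a `useMultistageEngine=true` query option sent to the broker's `/query/sql` endpoint. The broker address `localhost:8099` and the `orders` and `customers` tables are placeholders, and the option name should be checked against the documentation for the Pinot version in use.

```python
import json
import urllib.request

# Assumed broker address; adjust host and port for your cluster.
BROKER_URL = "http://localhost:8099/query/sql"


def build_multistage_request(sql: str) -> dict:
    """Build a broker request body that asks for the multi-stage engine."""
    return {
        "sql": sql,
        # Assumed per-query option that selects the multi-stage engine.
        "queryOptions": "useMultistageEngine=true",
    }


def run_query(sql: str, broker_url: str = BROKER_URL) -> dict:
    """POST the query to the broker and return the decoded JSON response."""
    body = json.dumps(build_multistage_request(sql)).encode("utf-8")
    request = urllib.request.Request(
        broker_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Hypothetical tables; a JOIN like this is what the multi-stage engine targets.
EXAMPLE_SQL = (
    "SELECT c.name, COUNT(*) AS order_count "
    "FROM orders AS o JOIN customers AS c ON o.customer_id = c.id "
    "GROUP BY c.name"
)
```

Calling `run_query(EXAMPLE_SQL)` against a running cluster returns the broker's JSON response, including any error messages the engine reports.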
Here are the general troubleshooting steps:
Semantic / Runtime errors
Try downloading the latest Docker image or building from the latest master commit.
We continuously push bug fixes for the multi-stage engine, so bugs you encounter might already be fixed in the latest master build.
Try reducing the size of the table(s) used:
Add more selective filters to the tables.
Try executing part of the subquery or a simplified version of the query first.
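The last two steps can be sketched as a sequence of progressively larger queries to try in order. The `orders` and `customers` tables, their columns, and the filter values are hypothetical; the only point is the order in which to isolate a problem.

```python
from typing import List, Tuple

# Hypothetical failing query: a JOIN over two full tables.
FULL_QUERY = (
    "SELECT c.name, SUM(o.amount) AS total "
    "FROM orders AS o JOIN customers AS c ON o.customer_id = c.id "
    "GROUP BY c.name"
)


def narrowing_steps() -> List[Tuple[str, str]]:
    """Return (description, sql) pairs, from the simplest query to the full one."""
    return [
        # 1. Run each side of the join on its own.
        ("left side only", "SELECT customer_id, amount FROM orders LIMIT 10"),
        ("right side only", "SELECT id, name FROM customers LIMIT 10"),
        # 2. Join with selective filters so each table contributes few rows.
        (
            "join with selective filters",
            "SELECT c.name, o.amount "
            "FROM orders AS o JOIN customers AS c ON o.customer_id = c.id "
            "WHERE o.order_date = '2023-01-01' AND c.region = 'EU'",
        ),
        # 3. Only then retry the original query.
        ("full query", FULL_QUERY),
    ]


for description, sql in narrowing_steps():
    print(f"-- {description}\n{sql}\n")
```

The first step that fails points at the part of the query, or the table size, that triggers the error.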
How to share feedback
Please report any bugs in the Apache Pinot Slack. Please include:
the table/schema config(s)
the cluster config (ZooKeeper config, and each component's config and scale)
the problematic SQL query string and corresponding ERROR messages.
We are continuously improving the multi-stage engine. However, since it is still in the beta-testing phase, there are some limitations to call out:
Incomplete data type support: multi-value columns and some other non-primitive data types are not supported. For example, SELECT * on a table with multi-value columns will fail.
The intermediate stages of the multi-stage engine run purely on heap memory, so executing a join over large tables can cause out-of-memory errors.
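To see why join size matters for memory, consider a generic in-memory hash join. This is an illustration only, not Pinot's actual join operator, and every name in it is hypothetical. It shows how a join held entirely in memory must materialize one input before producing any output, so memory use grows with that input's row count.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List

Row = Dict[str, object]


def hash_join(
    build_rows: Iterable[Row],
    probe_rows: Iterable[Row],
    build_key: str,
    probe_key: str,
) -> Iterator[Row]:
    """Inner-join two row streams on build_key == probe_key."""
    # Build phase: every row of the build side is kept in memory.
    table: Dict[object, List[Row]] = defaultdict(list)
    for row in build_rows:
        table[row[build_key]].append(row)

    # Probe phase: the other side is streamed and matched against the table.
    for row in probe_rows:
        for match in table.get(row[probe_key], []):
            yield {**match, **row}


customers = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
orders = [
    {"customer_id": 1, "amount": 5},
    {"customer_id": 1, "amount": 7},
    {"customer_id": 3, "amount": 1},
]
joined = list(hash_join(customers, orders, "id", "customer_id"))
```

In this sketch, filtering the build side before the join shrinks the in-memory table, which is the same reason more selective filters help when a query runs out of memory.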
For more up-to-date tracking of feature and performance support, please follow the GitHub tracking issues:
Performance and stability tracker:
Reference: Design Docs
The overall PEP design doc and discussion can be found at the following links: