
Multi-Stage Query Engine

Overview

The new Pinot query engine, version 2 (a.k.a. the multi-stage V2 query engine), is designed to support more complex SQL semantics such as JOIN, OVER windows, and MATCH_RECOGNIZE, and eventually to bring Pinot closer to full ANSI SQL semantics.

It also resolves the bottleneck at the broker reduce stage, where a single machine performs all the heavy lifting, such as merging high-cardinality GROUP BY results and sorting for ORDER BY.

How to use the multi-stage query engine

To enable the multi-stage engine, make sure to do one of the following:
- Build Apache Pinot from the latest master commit.
- Download the latest Apache Pinot Docker image.

Troubleshoot

The V2 query engine is still in beta, so there may be performance or feature gaps compared with the current query engine.

Here are the general troubleshooting steps:

Semantic / Runtime errors

- Try downloading the latest Docker image or building from the latest master commit. We continuously push bug fixes for the multi-stage engine, so a bug you encountered might already be fixed in the latest master build.
- Try rewriting your query.

Timeout errors

- Try reducing the size of the table(s) used, for example by adding more selective filters to the tables.
- Try executing part of the subquery or a simplified version of the query first.
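As a rough illustration of both suggestions, the sketch below uses hypothetical tables `orders` and `customers` with made-up column names; none of them come from Pinot itself. The first query joins the full tables, the second adds more selective filters on each side, and the third runs only one side of the join to check that it completes on its own.

```sql
-- Hypothetical schema: orders(order_id, customer_id, amount, order_ts),
-- customers(customer_id, region). All names are illustrative only.

-- Original query: joins both tables in full and may time out.
SELECT c.region, SUM(o.amount) AS total_amount
FROM orders AS o
JOIN customers AS c ON o.customer_id = c.customer_id
GROUP BY c.region;

-- Same query with more selective filters, so fewer rows reach the join.
SELECT c.region, SUM(o.amount) AS total_amount
FROM orders AS o
JOIN customers AS c ON o.customer_id = c.customer_id
WHERE o.order_ts >= 1672531200000   -- assumed epoch-millis time column
  AND c.region = 'EMEA'
GROUP BY c.region;

-- Simplified version: run one side of the join by itself first.
SELECT customer_id, SUM(amount) AS total_amount
FROM orders
WHERE order_ts >= 1672531200000
GROUP BY customer_id
LIMIT 10;
```

If the simplified query returns quickly but the filtered join still times out, that narrows the problem to the join stage and is worth mentioning in a bug report.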

How to share feedback

Please report any bugs in the Apache Pinot Slack. Please include:

- the table and schema config(s)
- the cluster config (ZooKeeper config, and each component's config and scale)
- the problematic SQL query string and the corresponding error messages

Limitations

We are continuously improving the multi-stage engine. However, since it is still in the beta-testing phase, there are some limitations to call out:

- Incomplete data type support: multi-value columns and some other non-primitive data types are not supported. For example, SELECT * on a table with multi-value columns will fail.
- The intermediate stages of the multi-stage engine run purely on heap memory, so executing a join on large tables can cause out-of-memory errors.
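For instance, assuming a hypothetical table `events` with single-value columns `event_id` and `user_id` and a multi-value column `tags`, listing the single-value columns explicitly is one way to sidestep the SELECT * failure described above. Whether this is enough for a given table depends on the engine build, so treat it as a sketch rather than a guaranteed workaround.

```sql
-- Hypothetical table: events(event_id, user_id, tags), where tags is multi-value.
-- Table and column names are illustrative only.

-- Expected to fail on the multi-stage engine because * includes tags.
SELECT * FROM events LIMIT 10;

-- Selects only the single-value columns.
SELECT event_id, user_id FROM events LIMIT 10;
```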

For more up-to-date tracking of feature and performance support, please follow the GitHub tracking issues:

SQL feature tracker:
Performance and stability tracker:

Reference: Design Docs

The overall PEP design doc and discussion can be found at the following links:
