Explain
Learn more about multi-stage explain plans and how to interpret them.
This page is deprecated. Please read Explain planto get a better understanding of the different explain plans supported by Pinot.
Multi-stage plans are a bit more complex than single-stage plans. This page explains how to interpret multi-stage explain plans.
As explained in Explaining multi-stage queries, you can use the EXPLAIN PLAN
syntax to obtain the logical plan of a query. There are different formats for the output of the EXPLAIN PLAN
command, but all of them represent the logical plan of the query.
The query
Can produce the following output:
We can see that each node in the tree represents an operation that is executed in the query and each operator has some attributes. For example the LogicalJoin
operator has a condition
attribute that specifies the join condition and a joinType
. Although some of the attributes shown are easy to understand, some of them may require a bit more explanation.
In our example we can see that the LogicalTableScan
operator has a table attribute that indicates the table being scanned. The table is represented as a list with two elements: the first one is the schema name (default
by default) and the second one is the table name. Attributes like offset
and fetch
in the LogicalSort
operator are also easy to understand. But once we start to see expressions and references like $2
things start to be more complex.
These indexed references are used to reference the positions into the input row for each operator. In order to understand these references we need to look at the operator's children and see which attributes are being referenced. That usually requires going to the leaf operators and seeing which attributes are being generated.
For example, the LogicalTableScan
always returns the whole row of the table, so the attributes are the columns of the table. In our example:
We can see that the result of the LogicalTableScan
operator is processed by a LogicalProject
operator that is selecting the columns o_custkey
and o_shippriority
. This LogicalProject
operator is generating a row with two columns. $5
and $10
are the indexes of the column o_custkey
and o_shippriority
in the row generated by the LogicalTableScan
. Then we can see a PinotLogicalExchange
operator that is sending the result to the LogicalJoin
operator in the stage downstream. That PinotLogicalExcange
is distributing the rows using hash[0]
, which means to use the hash of the first column returned by LogicalProject
. As we saw before, that first column is the o_custkey
column, so the rows are distributed by the o_custkey
column.
The LogicalJoin
operator is receiving the rows from the two stages upstream. It is not clearly said anywhere, but the virtual row seen by the join operator is the concatenation of the rows sent by the first stage (aka left hand size) plus the rows sent by the second stage (aka right hand side).
The first stage is sending the c_address
and c_custkey
columns and the second stage is sending the o_custkey
and o_shippriority
columns. Therefore the join operator is consuming a row with the columns [c_address, c_custkey, o_custkey, o_shippriority]
. The LogicalJoin
operator is joining the rows using the condition =($1, $2)
, which means that it is joining the rows using the c_custkey
and o_custkey
columns and comparing them by equality. LogicalJoin
can generate new rows, but does not modify the virtual columns. Therefore this join is sending rows with the columns [c_address, c_custkey, o_custkey, o_shippriority]
to its downstream.
This downstream is the LogicalProject
operator that is selecting the columns $0
and $3
from the rows sent by the join operator. Therefore the resulting row contains the columns c_address
and o_shippriority
.
The rest of the operators are easier to read. Something that can be surprising is the LogicalSort
operator. In the SQL query used as example there was no order by, but the LogicalSort
operator is present in the plan. This is because in relational algebra a sort is always needed to limit the rows. In this case the LogicalSort
operator is limiting the rows to 10 without specifying a sort condition, so it is not really sorting the rows (which may be expensive). The corollary is that a LogicalSort
operator does not imply that an actual sort is being executed.
Last updated