# Explain

Multi-stage plans are a bit more complex than single-stage plans. This page explains how to interpret multi-stage explain plans.

As explained in [Explaining multi-stage queries](/release-1.2.0/for-users/user-guide-query/query-syntax/explain-plan-multi-stage.md), you can use the `EXPLAIN PLAN` syntax to obtain the logical plan of a query. There are different formats for the output of the `EXPLAIN PLAN` command, but all of them represent the logical plan of the query.

The query

```sql
explain plan for
select customer.c_address, orders.o_shippriority
from customer
join orders
    on customer.c_custkey = orders.o_custkey
limit 10
```

Can produce the following output:

```
LogicalSort(offset=[0], fetch=[10])
  PinotLogicalSortExchange(distribution=[hash], collation=[[]], isSortOnSender=[false], isSortOnReceiver=[false])
    LogicalSort(fetch=[10])
      LogicalProject(c_address=[$0], o_shippriority=[$3])
        LogicalJoin(condition=[=($1, $2)], joinType=[inner])
          PinotLogicalExchange(distribution=[hash[1]])
            LogicalProject(c_address=[$4], c_custkey=[$6])
              LogicalTableScan(table=[[default, customer]])
          PinotLogicalExchange(distribution=[hash[0]])
            LogicalProject(o_custkey=[$5], o_shippriority=[$10])
              LogicalTableScan(table=[[default, orders]])
```

We can see that each node in the tree represents an operation that is executed in the query and each operator has some attributes. For example the `LogicalJoin` operator has a `condition` attribute that specifies the join condition and a `joinType`. Although some of the attributes shown are easy to understand, some of them may require a bit more explanation.

In our example we can see that the `LogicalTableScan` operator has a table attribute that indicates the table being scanned. The table is represented as a list with two elements: the first one is the schema name (`default` by default) and the second one is the table name. Attributes like `offset` and `fetch` in the `LogicalSort` operator are also easy to understand. But once we start to see expressions and references like `$2` things start to be more complex.

These indexed references are used to reference the positions into the input row for each operator. In order to understand these references we need to look at the operator's children and see which attributes are being referenced. That usually requires going to the leaf operators and seeing which attributes are being generated.

For example, the `LogicalTableScan` always returns the whole row of the table, so the attributes are the columns of the table. In our example:

```
         PinotLogicalExchange(distribution=[hash[0]])
            LogicalProject(o_custkey=[$5], o_shippriority=[$10])
              LogicalTableScan(table=[[default, orders]])
```

We can see that the result of the `LogicalTableScan` operator is processed by a `LogicalProject` operator that is selecting the columns `o_custkey` and `o_shippriority`. This `LogicalProject` operator is generating a row with two columns. `$5` and `$10` are the indexes of the column `o_custkey` and `o_shippriority` in the row generated by the `LogicalTableScan`. Then we can see a `PinotLogicalExchange` operator that is sending the result to the `LogicalJoin` operator in the stage downstream. That `PinotLogicalExcange` is distributing the rows using `hash[0]`, which means to use the hash of the first column returned by `LogicalProject`. As we saw before, that first column is the `o_custkey` column, so the rows are distributed by the `o_custkey` column.

The `LogicalJoin` operator is receiving the rows from the two stages upstream. It is not clearly said anywhere, but the virtual row seen by the join operator is the concatenation of the rows sent by the first stage (aka left hand size) plus the rows sent by the second stage (aka right hand side).

The first stage is sending the `c_address` and `c_custkey` columns and the second stage is sending the `o_custkey` and `o_shippriority` columns. Therefore the join operator is consuming a row with the columns `[c_address, c_custkey, o_custkey, o_shippriority]`. The `LogicalJoin` operator is joining the rows using the condition `=($1, $2)`, which means that it is joining the rows using the `c_custkey` and `o_custkey` columns and comparing them by equality. `LogicalJoin` can generate new rows, but does not modify the virtual columns. Therefore this join is sending rows with the columns `[c_address, c_custkey, o_custkey, o_shippriority]` to its downstream.

This downstream is the `LogicalProject` operator that is selecting the columns `$0` and `$3` from the rows sent by the join operator. Therefore the resulting row contains the columns `c_address` and `o_shippriority`.

The rest of the operators are easier to read. Something that can be surprising is the `LogicalSort` operator. In the SQL query used as example there was no order by, but the `LogicalSort` operator is present in the plan. This is because in relational algebra a sort is always needed to limit the rows. In this case the `LogicalSort` operator is limiting the rows to 10 without specifying a sort condition, so it is not really sorting the rows (which may be expensive). The corollary is that a `LogicalSort` operator does not imply that an actual sort is being executed.


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://docs.pinot.apache.org/release-1.2.0/for-users/user-guide-query/multi-stage-query/understanding-multi-stage-explain.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
