Sort or limit
Describes the sort or limit relation operator in the multi-stage query engine.
The sort or limit operator is used to sort the input data, limit the number of rows emitted by the operator or both. This operator is generated by the multi-stage query engine when you use an order by
, limit
or offset
operation in a query.
Implementation details
Blocking nature
Hints
None
Stats
executionTimeMs
Type: Long
The summation of time spent by all threads executing the operator. This means that the wall time spent in the operation may be smaller that this value if the parallelism is larger than 1.
emittedRows
Type: Long
The number of groups emitted by the operator.
Explain attributes
The sort or limit operator is represented in the explain plan as a LogicalSort
explain node.
sort#
Type: Expression
The sort expressions used by the operator. There is one of these attributes per sort expression. The first one is sort0
, the second one is sort1
, and so on.
The value of this attribute is the expression used to sort the data and may contain indexed columns ($0
, $1
, etc) that represent the columns of the virtual row generated by the upstream.
For example, the following plan:
Is saying that the rows are sorted first by the column with index 0 and then by the column with index 2 in the virtual row generated by the upstream. That column is generated by a projection whose first column (index 0) is userUUID
, the second (index 1) is deviceOS
and third (index 2) is the result of the SUBSTRING($4, 0, 2)
expression. As we know $4
in this project is deviceOS
, we can infer that the third column is the first two characters of the deviceOS
column.
dir#
Type: ASC or DESC
The direction of the sort. There is one of these attributes per sort expression.
fetch
Type: Long
The number of rows to emit. This is the equivalent to LIMIT
in SQL. Remember that the limit can be applied without sorting, in which case the order on which the rows are emitted is undefined.
offset
Type: Long
The number of rows to skip before emitting the rows. This is the equivalent to OFFSET
in SQL.
Tips and tricks
Limit and offset can prevent filter pushdown
In SQL, usually limit
and offset
are used in the last stage of the query. But when being used in the middle of the query (like in a subquery or a CTE), it can prevent filter pushdown optimization.
For example, imagine the following query:
This query may generate the plan:
We can see that the filter deviceOS = 'windows'
is pushed down to the leaf stage. This reduce the amount of data that needs to be scanned and can improve the query performance, specially if there is an inverted index in the deviceOS
column.
But if we modify the query to add a limit
to the userAttributes
table scan:
The generated plan will be:
Here we can see that the filter deviceOS = 'windows'
is not pushed down leaf stage, which means that the engine will need to scan all the data in the userAttributes
table and then apply the filter.
The reason why the filter is not pushed down is that the limit
operation must be applied before the filter in order to not break the semantics, which in this case are saying that we want 10 rows of the userAttributes
table without considering their deviceOS
value.
In cases where you actually want to apply the filter before the limit
, you can specify the where clause in the subquery. For example:
Which will produce the following plan:
As you can see, the filter is pushed down to leaf stage, which will reduce the amount of data
Do not abuse offset
pagination
offset
paginationAlthough OFFSET
and LIMIT
are a very simple way to paginate results, they can be very inefficient. It is almost always better to paginate using a WHERE
clause that uses a range of values instead of using OFFSET
.
The reason is that in order to apply an OFFSET
the engine must generate these rows and then discard them. Instead, if you use a WHERE
clause with a range of values, the engine can apply different techniques like indexes or pruning to avoid reading the rows that are not needed.
This is not a Pinot specific issue, but a general one. See for example Paging Through Results (external link) or Pagination, You Are Probably Doing It Wrong (external link).
Last updated
Was this helpful?