# Sort or Limit

The sort or limit operator sorts the input data, limits the number of rows emitted, or both. The multi-stage query engine generates this operator when a query uses an `order by`, `limit` or `offset` operation.

## Implementation details

### Blocking nature

The sort or limit operator is a blocking operator: it needs to consume all of its input before it can emit any rows, since the final order (and which rows fall within the limit or offset) is only known once every input row has been received.

## Hints

None

## Stats

### executionTimeMs

Type: Long

The total time spent by all threads executing the operator. This means that the wall-clock time spent in the operation may be smaller than this value if the parallelism is larger than 1.

### emittedRows

Type: Long

The number of rows emitted by the operator.

## Explain attributes

The sort or limit operator is represented in the explain plan as a `LogicalSort` explain node.

### sort\#

Type: Expression

The sort expressions used by the operator. There is one of these attributes per sort expression. The first one is `sort0`, the second one is `sort1`, and so on.

The value of this attribute is the expression used to sort the data and may contain indexed columns (`$0`, `$1`, etc) that represent the columns of the virtual row generated by the upstream.

For example, the following plan:

```
LogicalSort(sort0=[$0], sort1=[$2], dir0=[ASC], dir1=[ASC], fetch=[10])
  LogicalProject(userUUID=[$6], deviceOS=[$4], EXPR$2=[SUBSTRING($4, 0, 2)])
    LogicalTableScan(table=[[default, userAttributes]])
```

This plan says that the rows are sorted first by the column with index 0 and then by the column with index 2 of the virtual row generated by the upstream. Those columns are generated by a projection whose first column (index 0) is `userUUID`, whose second (index 1) is `deviceOS` and whose third (index 2) is the result of the `SUBSTRING($4, 0, 2)` expression. Since `$4` in this projection is `deviceOS`, we can infer that the third column is the first two characters of the `deviceOS` column.
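For reference, a query along these lines could produce a plan like the one above. The table and column names are taken from the plan itself; the exact original query text is an assumption:

```sql
select userUUID, deviceOS, substring(deviceOS, 0, 2)
from userAttributes
order by userUUID, substring(deviceOS, 0, 2)
limit 10
```

Here `sort0=[$0]` corresponds to `userUUID` (projection column 0) and `sort1=[$2]` to the `SUBSTRING` expression (projection column 2), while `fetch=[10]` comes from the `limit 10`.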

### dir\#

Type: ASC or DESC

The direction of the sort. There is one of these attributes per sort expression.

### fetch

Type: Long

The number of rows to emit. This is the equivalent of `LIMIT` in SQL. Remember that a limit can be applied without sorting, in which case the order in which the rows are emitted is undefined.

### offset

Type: Long

The number of rows to skip before emitting rows. This is the equivalent of `OFFSET` in SQL.
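For example, a query like the following (reusing the table name from the examples above) would produce a `LogicalSort` node carrying both `fetch=[10]` and `offset=[20]`:

```sql
select userUUID
from userAttributes
order by userUUID
limit 10 offset 20
```

The operator skips the first 20 rows of the sorted input and emits the next 10.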

## Tips and tricks

### Limit and offset can prevent filter pushdown

In SQL, `limit` and `offset` are usually used in the last stage of the query. But when they are used in the middle of a query (for example in a subquery or a CTE), they can prevent the filter pushdown optimization.

For example, imagine the following query:

```sql
select 
a.* 
from userAttributes as a
join userGroups as g
on a.userUUID = g.userUUID
where a.deviceOS = 'windows'
```

This query may generate the plan:

```
LogicalProject(daysSinceFirstTrip=[$0], deviceOS=[$1], totalTrips=[$2], userUUID=[$3])
  LogicalJoin(condition=[=($3, $4)], joinType=[inner])
    PinotLogicalExchange(distribution=[hash[3]])
      LogicalProject(daysSinceFirstTrip=[$3], deviceOS=[$4], totalTrips=[$5], userUUID=[$6])
        LogicalFilter(condition=[=($4, _UTF-8'windows')])
          LogicalTableScan(table=[[default, userAttributes]])
    PinotLogicalExchange(distribution=[hash[0]])
      LogicalProject(userUUID=[$4])
        LogicalTableScan(table=[[default, userGroups]])
```

We can see that the filter `deviceOS = 'windows'` is pushed down to the leaf stage. This reduces the amount of data that needs to be scanned and can improve query performance, especially if there is an inverted index on the `deviceOS` column.

But if we modify the query to add a `limit` to the `userAttributes` table scan:

```sql
select 
a.* 
from (select * from userAttributes limit 10) as a
join userGroups as g
on a.userUUID = g.userUUID
where a.deviceOS = 'windows'
```

The generated plan will be:

```
LogicalProject(daysSinceFirstTrip=[$0], deviceOS=[$1], totalTrips=[$2], userUUID=[$3])
  LogicalJoin(condition=[=($3, $4)], joinType=[inner])
    PinotLogicalExchange(distribution=[hash[3]])
      LogicalFilter(condition=[=($1, _UTF-8'windows')])
        LogicalSort(offset=[0], fetch=[10])
          PinotLogicalSortExchange(distribution=[hash], collation=[[]], isSortOnSender=[false], isSortOnReceiver=[false])
            LogicalSort(fetch=[10])
              LogicalProject(daysSinceFirstTrip=[$3], deviceOS=[$4], totalTrips=[$5], userUUID=[$6])
                LogicalTableScan(table=[[default, userAttributes]])
    PinotLogicalExchange(distribution=[hash[0]])
      LogicalProject(userUUID=[$4])
        LogicalTableScan(table=[[default, userGroups]])
```

Here we can see that the filter `deviceOS = 'windows'` is not pushed down to the leaf stage, which means that the engine will need to scan all the data in the `userAttributes` table and only then apply the filter.

The reason the filter is not pushed down is that the `limit` operation must be applied before the filter in order to preserve the query semantics, which in this case say that we want 10 rows from the `userAttributes` table regardless of their `deviceOS` value.

In cases where you actually want to apply the filter before the `limit`, you can move the where clause into the subquery. For example:

```sql
select 
a.* 
from (select * from userAttributes where deviceOS = 'windows' limit 10) as a
join userGroups as g
on a.userUUID = g.userUUID
```

Which will produce the following plan:

```
LogicalProject(daysSinceFirstTrip=[$0], deviceOS=[$1], totalTrips=[$2], userUUID=[$3])
  LogicalJoin(condition=[=($3, $4)], joinType=[inner])
    PinotLogicalExchange(distribution=[hash[3]])
      LogicalSort(offset=[0], fetch=[10])
        PinotLogicalSortExchange(distribution=[hash], collation=[[]], isSortOnSender=[false], isSortOnReceiver=[false])
          LogicalSort(fetch=[10])
            LogicalProject(daysSinceFirstTrip=[$3], deviceOS=[$4], totalTrips=[$5], userUUID=[$6])
              LogicalFilter(condition=[=($4, _UTF-8'windows')])
                LogicalTableScan(table=[[default, userAttributes]])
    PinotLogicalExchange(distribution=[hash[0]])
      LogicalProject(userUUID=[$4])
        LogicalTableScan(table=[[default, userGroups]])
```

As you can see, the filter is pushed down to the leaf stage again, which reduces the amount of data that needs to be scanned.

### Do not abuse `offset` pagination

Although `OFFSET` and `LIMIT` are a very simple way to paginate results, they can be very inefficient. It is almost always better to paginate using a `WHERE` clause that uses a range of values instead of using `OFFSET`.

The reason is that in order to apply an `OFFSET` the engine must generate all the skipped rows and then discard them. Instead, if you use a `WHERE` clause with a range of values, the engine can apply techniques like indexes or pruning to avoid reading the rows that are not needed.
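As a sketch of the alternative: instead of fetching the third page of 100 rows with `limit 100 offset 200`, which forces the engine to produce and discard 200 rows, a keyset-style query restarts from the last value seen on the previous page. The column names are reused from the examples above and the literal is a placeholder:

```sql
select userUUID, deviceOS
from userAttributes
where userUUID > 'last-uuid-from-previous-page'
order by userUUID
limit 100
```

This works as long as the sort column (or combination of columns) is unique, so each page has a well-defined starting point.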

This is not a Pinot specific issue, but a general one. See for example [Paging Through Results (external link)](https://use-the-index-luke.com/sql/partial-results/fetch-next-page) or [Pagination, You Are Probably Doing It Wrong (external link)](https://medium.com/swlh/sql-pagination-you-are-probably-doing-it-wrong-d0f2719cc166).
