Window functions
Use window functions to compute averages, sort, rank, or count items, calculate sums, and find minimum or maximum values.
Important: To query using Windows functions, you must use Pinot's multi-stage query engine (v2).
Use window functions to do the following:
Window functions overview
Window functions consist of a supported function and an OVER clause.
To learn more about syntax, see Windows function query syntax, and read about the OVER clause.
To see Windows functions you can use in Pinot, see Supported Pinot window functions.
To query with Windows functions in Pinot, see examples for supported Windows functions.
Window function query syntax
For information about the Window function query syntax, see Calcite documentation.
Example query layout
The following query shows the complete components of the window function. Note, PARTITION BY
and ORDER BY
are optional.
OVER clause
Partition by clause
If a PARTITION BY clause is specified, the intermediate results will be grouped into different partitions based on the values of the columns appearing in the PARTITION BY clause.
If the PARTITION BY clause isn’t specified, the whole result will be regarded as one big partition, i.e. there is only one partition in the result set.
Order by clause
If an ORDER BY clause is specified, all the rows within the same partition will be sorted based on the values of the columns appearing in the window ORDER BY clause. The ORDER BY clause decides the order in which the rows within a partition are to be processed.
If no ORDER BY clause is specified while a PARTITION BY clause is specified, the order of the rows is undefined. If output ordering is desired a global ORDER BY clause should be used in the query.
Frame clause
{RANGE|ROWS} frame_start OR
{RANGE|ROWS} BETWEEN frame_start AND frame_end; frame_start and frame_end can be any of:
UNBOUNDED PRECEDING: expression PRECEDING. May only be allowed in ROWS mode [depends on DB, some support some don’t]
CURRENT ROW expression FOLLOWING. May only be allowed in ROWS mode [depends on DB, some support some don’t]
UNBOUNDED FOLLOWING:
If no FRAME clause is specified, then the default frame behavior depends on whether ORDER BY is present or not.
If an ORDER BY clause is specified, the default behavior is to calculate the aggregation from the beginning of the partition to the current row or UNBOUNDED PRECEDING to CURRENT ROW.
If only a PARTITION BY clause is present, the default frame behavior is to calculate the aggregation from UNBOUNDED PRECEDING to CURRENT ROW.
If there is no FRAME, no PARTITION BY, and no ORDER BY clause specified in the OVER clause (empty OVER), the whole result set is regarded as one partition, and there's one frame in the window.
The OVER clause applies a specified supported Windows function to compute values over a group of rows, and return a single result for each row. The OVER clause specifies how the rows are arranged and how the aggregation is done on those rows.
Inside the over clause, there are three optional components, i.e. PARTITION BY clause, ORDER BY clause, and FRAME clause.
Supported Pinot window functions
Returns the average of the values for a numeric column as aDouble over the specified number of rows or partition (if applicable).
AVG(playerScore)
Double.NEGATIVE_INFINITY
BOOL_AND
Returns true if all input values are true, otherwise false
BOOL_OR
Returns true if at least one input value is true, otherwise false
Examples of windows functions
Sum transactions by customer ID
Calculate the rolling sum transaction amount ordered by the payment date for each customer ID (note, the default frame here is UNBOUNDED PRECEDING and CURRENT ROW).
1
2023-02-14 23:22:38.996577
5.99
5.99
1
2023-02-15 16:31:19.996577
0.99
6.98
1
2023-02-15 19:37:12.996577
9.99
16.97
1
2023-02-16 13:47:23.996577
4.99
21.96
2
2023-02-17 19:23:24.996577
2.99
2.99
2
2023-02-17 19:23:24.996577
0.99
3.98
3
2023-02-16 00:02:31.996577
8.99
8.99
3
2023-02-16 13:47:36.996577
6.99
15.98
3
2023-02-17 03:43:41.996577
6.99
22.97
4
2023-02-15 07:59:54.996577
4.99
4.99
4
2023-02-16 06:37:06.996577
0.99
5.98
Find the minimum or maximum transaction by customer ID
Calculate the least (use MIN()
) or most expensive (use MAX()
) transaction made by each customer comparing all transactions made by the customer (default frame here is UNBOUNDED PRECEDING and UNBOUNDED FOLLOWING). The following query shows how to find the least expensive transaction.
1
2023-02-14 23:22:38.996577
5.99
1
2023-02-15 16:31:19.996577
0.99
1
2023-02-15 19:37:12.996577
9.99
2
2023-04-30 04:34:36.996577
4.99
2
2023-04-30 12:16:09.996577
10.99
3
2023-03-23 05:38:40.996577
2.99
3
2023-04-07 08:51:51.996577
3.99
3
3 | 2023-04-08 11:15:37.996577
4.99
Find the average transaction amount by customer ID
Calculate a customer’s average transaction amount for all transactions they’ve made (default frame here is UNBOUNDED PRECEDING and UNBOUNDED FOLLOWING).
1
2023-02-14 23:22:38.996577
5.99
1
2023-02-15 16:31:19.996577
0.99
1
2023-02-15 19:37:12.996577
9.99
2
2023-04-30 04:34:36.996577
4.99
2
2023-04-30 12:16:09.996577
10.99
3
2023-03-23 05:38:40.996577
2.99
3
2023-04-07 08:51:51.996577
3.99
3
2023-04-08 11:15:37.996577
4.99
Rank year-to-date sales for a sales team
Use ROW_NUMBER()
to rank team members by their year-to-date sales (default frame here is UNBOUNDED PRECEDING and UNBOUNDED FOLLOWING).
1
Joe
Smith
2
Alice
Davis
3
James
Jones
4
Dane
Scott
Count the number of transactions by customer ID
Count the number of transactions made by each customer (default frame here is UNBOUNDED PRECEDING and UNBOUNDED FOLLOWING).
1
2023-02-14 23:22:38.99657
10.99
2
1
2023-02-15 16:31:19.996577
8.99
2
2
2023-04-30 04:34:36.996577
23.50
3
2
2023-04-07 08:51:51.996577
12.35
3
2
2023-04-08 11:15:37.996577
8.29
3