Window Functions
Use window functions to compute averages, sort, rank, or count items, calculate sums, and find minimum or maximum values across windows.
Window Functions overview
This is an overview of the window functions feature.
Window function syntax
Pinot's window function (windowedCall) has the following syntax definition:
windowedCall:
windowFunction
OVER
window
windowFunction:
function_name '(' value [, value ]* ')'
|
function_name '(' '*' ')'
window:
'('
[ PARTITION BY expression [, expression ]* ]
[ ORDER BY orderItem [, orderItem ]* ]
[
RANGE BETWEEN frame_start AND frame_end
|
ROWS BETWEEN frame_start AND frame_end
|
RANGE frame_start
|
ROWS frame_start
]
')'
frame_start:
UNBOUNDED PRECEDING
|
offset PRECEDING
|
CURRENT ROW
|
offset FOLLOWING
frame_end:
offset PRECEDING
|
CURRENT ROW
|
offset FOLLOWING
|
UNBOUNDED FOLLOWING windowedCallrefers to the actual windowed operation.windowFunctionrefers to the window function used, see supported window functions.windowis the window definition / windowing mechanism, see supported window mechanism.
You can jump to the examples section to see more concrete use cases of window functions in Pinot.
Example window function query layout
The following query shows the complete components of the window function. Note that the PARTITION BY ,ORDER BY, and the FRAME clauses are all optional.
SELECT FUNC(column1) OVER (PARTITION BY column2 ORDER BY column3 RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
FROM tableName
WHERE filter_clause Window mechanism (OVER clause)
Partition by clause
If a
PARTITION BYclause is specified, the intermediate results will be grouped into different partitions based on the values of the columns appearing in thePARTITION BYclause.If the
PARTITION BYclause isn’t specified, the whole result will be regarded as one big partition, i.e. there is only one partition in the result set.
Order by clause
If an
ORDER BYclause is specified, all the rows within the same partition will be sorted based on the values of the columns appearing in the windowORDER BYclause. TheORDER BYclause decides the order in which the rows within a partition are to be processed.If no
ORDER BYclause is specified while aPARTITION BYclause is specified, the order of the rows is undefined. To order the output, use a globalORDER BYclause in the query.
Frame clause
RANGE type window frames currently cannot be used with offset PRECEDING / offset FOLLOWING
The following window frame clauses are currently supported:
RANGE frame_startwhereframe_startcan beUNBOUNDED PRECEDINGorCURRENT ROW(frame_endwill default toCURRENT ROW)ROWS frame_startwhereframe_startcan beUNBOUNDED PRECEDING,offset PRECEDING, orCURRENT ROW(frame_endwill default toCURRENT ROW)RANGE BETWEEN frame_start AND frame_end;frame_startcan be eitherUNBOUNDED PRECEDINGorCURRENT ROWandframe_endcan be eitherCURRENT ROWorUNBOUNDED FOLLOWINGROWS BETWEEN frame_start AND frame_end;frame_start/frame_endcan be one of:UNBOUNDED PRECEDING(frame_startonly)offset PRECEDINGwhereoffsetis an integer literalCURRENT ROWoffset FOLLOWINGwhereoffsetis an integer literalUNBOUNDED FOLLOWING(frame_endonly)
In RANGE mode, a frame_start of CURRENT ROW means the frame starts with the current row's first peer row (a row that the window's ORDER BY clause sorts as equivalent to the current row), while a frame_end of CURRENT ROW means the frame ends with the current row's last peer row. In ROWS mode, CURRENT ROW simply means the current row.
If no ORDER BY clause is specified, the window frame will always be RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING and cannot be modified. When an ORDER BY clause is present, the default frame is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW if no explicit window frame is defined in the query.
If there is no FRAME, no PARTITION BY, and no ORDER BY clause specified in the OVER clause (empty OVER), the whole result set is regarded as one partition, and there's one frame in the window.
The OVER clause applies a specified supported windows function to compute values over a group of rows and return a single result for each row. The OVER clause specifies how the rows are arranged and how the aggregation is done on those rows.
Inside the over clause, there are three optional components: PARTITION BY clause, ORDER BY clause, and FRAME clause.
Window functions
Window functions are commonly used to do the following:
Supported window functions are listed in the following table.
Returns the average of the values for a numeric column in the defined window.
AVG(playerScore)
Double.NEGATIVE_INFINITY
BOOL_AND
Returns false if even a single value in the window is false, null if a single value in the window is null, and true if all the values in the window are true.
null
BOOL_OR
Returns true if even a single value in the window is true , null if a single value in the window is null, and false if all the values in the window are false.
null
The LEAD function provides access to a subsequent row within the same result set, without the need for a self-join.
LEAD(column_name, offset, default_value)
The LAG function provides access to a previous row within the same result set, without the need for a self-join.
LAG(column_name, offset, default_value)
The FIRST_VALUE function returns the value from the first row in the window.
FIRST_VALUE(salary)
The LAST_VALUE function returns the value from the last row in the window
LAST_VALUE(salary)
RANK
Returns the rank of the current row, with gaps - i.e., the row_number of the first row in its peer group.
RANK()
DENSE_RANK
Returns the rank of the current row, without gaps.
DENSE_RANK()
Note that no window frame clause can be specified for ROW_NUMBER, RANK, and DENSE_RANK window functions since they're applied on the entire partition by definition. Similarly, no window frame clause can be specified for LAG and LEAD since the row offset is an input to those functions themselves.
Window aggregate query examples
Sum transactions by customer ID
Calculate the rolling sum transaction amount ordered by the payment date for each customer ID (note, the default frame here is UNBOUNDED PRECEDING and CURRENT ROW).
SELECT customer_id, payment_date, amount, SUM(amount) OVER(PARTITION BY customer_id ORDER BY payment_date) from payment;1
2023-02-14 23:22:38.996577
5.99
5.99
1
2023-02-15 16:31:19.996577
0.99
6.98
1
2023-02-15 19:37:12.996577
9.99
16.97
1
2023-02-16 13:47:23.996577
4.99
21.96
2
2023-02-17 19:23:24.996577
2.99
2.99
2
2023-02-17 19:23:24.996577
0.99
3.98
3
2023-02-16 00:02:31.996577
8.99
8.99
3
2023-02-16 13:47:36.996577
6.99
15.98
3
2023-02-17 03:43:41.996577
6.99
22.97
4
2023-02-15 07:59:54.996577
4.99
4.99
4
2023-02-16 06:37:06.996577
0.99
5.98
Find the minimum or maximum transaction by customer ID
Calculate the least (use MIN()) or most expensive (use MAX()) transaction made by each customer comparing all transactions made by the customer (default frame here is UNBOUNDED PRECEDING and UNBOUNDED FOLLOWING). The following query shows how to find the least expensive transaction.
SELECT customer_id, payment_date, amount, MIN(amount) OVER(PARTITION BY customer_id) from payment;1
2023-02-14 23:22:38.996577
5.99
1
2023-02-15 16:31:19.996577
0.99
1
2023-02-15 19:37:12.996577
9.99
2
2023-04-30 04:34:36.996577
4.99
2
2023-04-30 12:16:09.996577
10.99
3
2023-03-23 05:38:40.996577
2.99
3
2023-04-07 08:51:51.996577
3.99
3
3 | 2023-04-08 11:15:37.996577
4.99
Find the average transaction amount by customer ID
Calculate a customer’s average transaction amount for all transactions they’ve made (default frame here is UNBOUNDED PRECEDING and UNBOUNDED FOLLOWING).
SELECT customer_id, payment_date, amount, AVG(amount) OVER(PARTITION BY customer_id) from payment;1
2023-02-14 23:22:38.996577
5.99
1
2023-02-15 16:31:19.996577
0.99
1
2023-02-15 19:37:12.996577
9.99
2
2023-04-30 04:34:36.996577
4.99
2
2023-04-30 12:16:09.996577
10.99
3
2023-03-23 05:38:40.996577
2.99
3
2023-04-07 08:51:51.996577
3.99
3
2023-04-08 11:15:37.996577
4.99
Rank year-to-date sales for a sales team
Use ROW_NUMBER() to rank team members by their year-to-date sales (default frame here is UNBOUNDED PRECEDING and UNBOUNDED FOLLOWING).
SELECT ROW_NUMBER() OVER(ORDER BY SalesYTD DESC) AS Row,
FirstName, LastName AS "Total sales YTD"
FROM Sales.vSalesPerson; 1
Joe
Smith
2
Alice
Davis
3
James
Jones
4
Dane
Scott
Count the number of transactions by customer ID
Count the number of transactions made by each customer (default frame here is UNBOUNDED PRECEDING and UNBOUNDED FOLLOWING).
SELECT customer_id, payment_date, amount, count(amount) OVER(PARTITION BY customer_id) from payment;1
2023-02-14 23:22:38.99657
10.99
2
1
2023-02-15 16:31:19.996577
8.99
2
2
2023-04-30 04:34:36.996577
23.50
3
2
2023-04-07 08:51:51.996577
12.35
3
2
2023-04-08 11:15:37.996577
8.29
3
Last updated
Was this helpful?

