Window aggregate

Use window aggregate to compute averages, sort, rank, or count items, calculate sums, and find minimum or maximum values across window.

Important: To query using Windows functions, you must enable Pinot's multi-stage query engine (v2). See how to enable and use the multi-stage query engine (v2).

Window aggregate overview

This is an overview of the window aggregate feature.

Window aggregate syntax

Pinot's window function (windowedAggCall) includes the following syntax definition:

windowedAggCall:
      windowAggFunction
      OVER 
      window

windowAggFunction:
      agg '(' [ ALL | DISTINCT ] value [, value ]* ')'
   |
      agg '(' '*' ')'

window:
      '('
      [ PARTITION BY expression [, expression ]* ]
      [ ORDER BY orderItem [, orderItem ]* ]
      [
          RANGE numericOrIntervalExpression { PRECEDING | FOLLOWING }
      |   ROWS numericExpression { PRECEDING | FOLLOWING }
      ]
      ')'
  • windowAggCall refers to the actual windowed agg operation.

  • windowAggFunction refers to the aggregation function used inside a windowed aggregate, see supported window aggregate functions.

  • window is the window definition / windowing mechanism, see supported window mechanism.

You can jump to the examples section to see more concrete use cases of window aggregate on Pinot.

Example window aggregate query layout

The following query shows the complete components of the window function. Note, PARTITION BY and ORDER BY are optional.

SELECT FUNC(column1) OVER (PARTITION BY column2 ORDER BY column3)
    FROM tableName
    WHERE filter_clause  

Window mechanism (OVER clause)

Partition by clause

  • If a PARTITION BY clause is specified, the intermediate results will be grouped into different partitions based on the values of the columns appearing in the PARTITION BY clause.

  • If the PARTITION BY clause isn’t specified, the whole result will be regarded as one big partition, i.e. there is only one partition in the result set.

Order by clause

  • If an ORDER BY clause is specified, all the rows within the same partition will be sorted based on the values of the columns appearing in the window ORDER BY clause. The ORDER BY clause decides the order in which the rows within a partition are to be processed.

  • If no ORDER BY clause is specified while a PARTITION BY clause is specified, the order of the rows is undefined. To order the output, use a global ORDER BY clause in the query.

Frame clause

Important Note: in release 1.0.0 window aggregate only supports UNBOUND PRECEDING, UNBOUND FOLLOWING and CURRENT ROW. frame and row count support have not been implemented yet.

  • {RANGE|ROWS} frame_start OR

  • {RANGE|ROWS} BETWEEN frame_start AND frame_end; frame_start and frame_end can be any of:

    • UNBOUNDED PRECEDING: expression PRECEDING. May only be allowed in ROWS mode [depends on DB, some support some don’t]

    • CURRENT ROW expression FOLLOWING. May only be allowed in ROWS mode [depends on DB, some support some don’t]

    • UNBOUNDED FOLLOWING:

      • If no FRAME clause is specified, then the default frame behavior depends on whether ORDER BY is present or not.

      • If an ORDER BY clause is specified, the default behavior is to calculate the aggregation from the beginning of the partition to the current row or UNBOUNDED PRECEDING to CURRENT ROW.

      • If only a PARTITION BY clause is present, the default frame behavior is to calculate the aggregation from UNBOUNDED PRECEDING to CURRENT ROW.

If there is no FRAME, no PARTITION BY, and no ORDER BY clause specified in the OVER clause (empty OVER), the whole result set is regarded as one partition, and there's one frame in the window.

The OVER clause applies a specified supported windows aggregate function to compute values over a group of rows and return a single result for each row. The OVER clause specifies how the rows are arranged and how the aggregation is done on those rows.

Inside the over clause, there are three optional components: PARTITION BY clause, ORDER BY clause, and FRAME clause.

Window aggregate functions

Window aggregate functions are commonly used to do the following:

Supported window aggregate functions are listed in the following table.

FunctionDescriptionExampleDefault Value When No Record Selected

Returns the average of the values for a numeric column as aDouble over the specified number of rows or partition (if applicable).

AVG(playerScore)

Double.NEGATIVE_INFINITY

BOOL_AND

Returns true if all input values are true, otherwise false

BOOL_OR

Returns true if at least one input value is true, otherwise false

Returns the count of the records as Long

COUNT(*)

0

Returns the minimum value of a numeric column as Double

MIN(playerScore)

Double.POSITIVE_INFINITY

Returns the maximum value of a numeric column as Double

MAX(playerScore)

Double.NEGATIVE_INFINITY

Assigns a unique row number to all the rows in a specified table.

ROW_NUMBER()

0

Returns the sum of the values for a numeric column as Double

SUM(playerScore)

0

The LEAD function provides access to a subsequent row within the same result set, without the need for a self-join.

LEAD(column_name, offset, default_value)

The LAG function provides access to a previous row within the same result set, without the need for a self-join.

LAG(column_name, offset, default_value)

FIRST_VALUE

The FIRST_VALUE function returns the first value in an ordered set of values within the window frame.

FIRST_VALUE(salary)

LAST_VALUE

The LAST_VALUE function returns the last value in an ordered set of values within the window frame.

LAST_VALUE(salary)

Window aggregate query examples

Sum transactions by customer ID

Calculate the rolling sum transaction amount ordered by the payment date for each customer ID (note, the default frame here is UNBOUNDED PRECEDING and CURRENT ROW).

SELECT customer_id, payment_date, amount, SUM(amount) OVER(PARTITION BY customer_id ORDER BY payment_date) from payment;
customer_idpayment_dateamountsum

1

2023-02-14 23:22:38.996577

5.99

5.99

1

2023-02-15 16:31:19.996577

0.99

6.98

1

2023-02-15 19:37:12.996577

9.99

16.97

1

2023-02-16 13:47:23.996577

4.99

21.96

2

2023-02-17 19:23:24.996577

2.99

2.99

2

2023-02-17 19:23:24.996577

0.99

3.98

3

2023-02-16 00:02:31.996577

8.99

8.99

3

2023-02-16 13:47:36.996577

6.99

15.98

3

2023-02-17 03:43:41.996577

6.99

22.97

4

2023-02-15 07:59:54.996577

4.99

4.99

4

2023-02-16 06:37:06.996577

0.99

5.98

Find the minimum or maximum transaction by customer ID

Calculate the least (use MIN()) or most expensive (use MAX()) transaction made by each customer comparing all transactions made by the customer (default frame here is UNBOUNDED PRECEDING and UNBOUNDED FOLLOWING). The following query shows how to find the least expensive transaction.

SELECT customer_id, payment_date, amount, MIN(amount) OVER(PARTITION BY customer_id) from payment;
customer_idpayment_dateamountmin

1

2023-02-14 23:22:38.996577

5.99

0.99

1

2023-02-15 16:31:19.996577

0.99

0.99

1

2023-02-15 19:37:12.996577

9.99

0.99

2

2023-04-30 04:34:36.996577

4.99

4.99

2

2023-04-30 12:16:09.996577

10.99

4.99

3

2023-03-23 05:38:40.996577

2.99

2.99

3

2023-04-07 08:51:51.996577

3.99

2.99

3

3 | 2023-04-08 11:15:37.996577

4.99

2.99

Find the average transaction amount by customer ID

Calculate a customer’s average transaction amount for all transactions they’ve made (default frame here is UNBOUNDED PRECEDING and UNBOUNDED FOLLOWING).

SELECT customer_id, payment_date, amount, AVG(amount) OVER(PARTITION BY customer_id) from payment;
customer_idpayment_dateamountavg

1

2023-02-14 23:22:38.996577

5.99

5.66

1

2023-02-15 16:31:19.996577

0.99

5.66

1

2023-02-15 19:37:12.996577

9.99

5.66

2

2023-04-30 04:34:36.996577

4.99

7.99

2

2023-04-30 12:16:09.996577

10.99

7.99

3

2023-03-23 05:38:40.996577

2.99

3.99

3

2023-04-07 08:51:51.996577

3.99

3.99

3

2023-04-08 11:15:37.996577

4.99

3.99

Rank year-to-date sales for a sales team

Use ROW_NUMBER() to rank team members by their year-to-date sales (default frame here is UNBOUNDED PRECEDING and UNBOUNDED FOLLOWING).

SELECT ROW_NUMBER() OVER(ORDER BY SalesYTD DESC) AS Row,   
    FirstName, LastName AS "Total sales YTD"   
FROM Sales.vSalesPerson;  
RowFirstNameLastNameTotal sales YTD

1

Joe

Smith

2251368.34

2

Alice

Davis

2151341.64

3

James

Jones

1551363.54

4

Dane

Scott

1251358.72

Count the number of transactions by customer ID

Count the number of transactions made by each customer (default frame here is UNBOUNDED PRECEDING and UNBOUNDED FOLLOWING).

SELECT customer_id, payment_date, amount, count(amount) OVER(PARTITION BY customer_id) from payment;
customer_idpayment_dateamountcount

1

2023-02-14 23:22:38.99657

10.99

2

1

2023-02-15 16:31:19.996577

8.99

2

2

2023-04-30 04:34:36.996577

23.50

3

2

2023-04-07 08:51:51.996577

12.35

3

2

2023-04-08 11:15:37.996577

8.29

3