LogoLogo
release-1.0.0
release-1.0.0
  • Introduction
  • Basics
    • Concepts
    • Architecture
    • Components
      • Cluster
        • Tenant
        • Server
        • Controller
        • Broker
        • Minion
      • Table
        • Segment
          • Deep Store
        • Schema
      • Pinot Data Explorer
    • Getting Started
      • Running Pinot locally
      • Running Pinot in Docker
      • Quick Start Examples
      • Running in Kubernetes
      • Running on public clouds
        • Running on Azure
        • Running on GCP
        • Running on AWS
      • Batch import example
      • Stream ingestion example
      • HDFS as Deep Storage
      • Troubleshooting Pinot
      • Frequently Asked Questions (FAQs)
        • General
        • Pinot On Kubernetes FAQ
        • Ingestion FAQ
        • Query FAQ
        • Operations FAQ
    • Import Data
      • From Query Console
      • Batch Ingestion
        • Spark
        • Flink
        • Hadoop
        • Backfill Data
        • Dimension table
      • Stream ingestion
        • Apache Kafka
        • Amazon Kinesis
        • Apache Pulsar
      • Stream Ingestion with Upsert
      • Stream Ingestion with Dedup
      • Stream Ingestion with CLP
      • File Systems
        • Amazon S3
        • Azure Data Lake Storage
        • HDFS
        • Google Cloud Storage
      • Input formats
      • Complex Type (Array, Map) Handling
      • Reload a table segment
      • Upload a table segment
    • Indexing
      • Forward Index
      • Inverted Index
      • Star-Tree Index
      • Bloom Filter
      • Range Index
      • Native Text Index
      • Text search support
      • JSON Index
      • Geospatial
      • Timestamp Index
    • Releases
      • Apache Pinotâ„¢ 1.0.0 release notes
      • 0.12.1
      • 0.12.0
      • 0.11.0
      • 0.10.0
      • 0.9.3
      • 0.9.2
      • 0.9.1
      • 0.9.0
      • 0.8.0
      • 0.7.1
      • 0.6.0
      • 0.5.0
      • 0.4.0
      • 0.3.0
      • 0.2.0
      • 0.1.0
    • Recipes
      • GitHub Events Stream
  • For Users
    • Query
      • Querying Pinot
      • Querying JSON data
      • Query Options
      • Aggregation Functions
      • Cardinality Estimation
      • Explain Plan
      • Filtering with IdSet
      • GapFill Function For Time-Series Dataset
      • Grouping Algorithm
      • JOINs
      • Lookup UDF Join
      • Transformation Functions
      • User-Defined Functions (UDFs)
      • Window functions
    • APIs
      • Broker Query API
        • Query Response Format
      • Controller Admin API
      • Controller API Reference
    • External Clients
      • JDBC
      • Java
      • Python
      • Golang
    • Tutorials
      • Use OSS as Deep Storage for Pinot
      • Ingest Parquet Files from S3 Using Spark
      • Creating Pinot Segments
      • Use S3 as Deep Storage for Pinot
      • Use S3 and Pinot in Docker
      • Batch Data Ingestion In Practice
      • Schema Evolution
  • For Developers
    • Basics
      • Extending Pinot
        • Writing Custom Aggregation Function
        • Segment Fetchers
      • Contribution Guidelines
      • Code Setup
      • Code Modules and Organization
      • Update documentation
    • Advanced
      • Data Ingestion Overview
      • Ingestion Aggregations
      • Ingestion Transformations
      • Null Value Support
      • Use the multi-stage query engine (v2)
      • Troubleshoot issues with the multi-stage query engine (v2)
      • Advanced Pinot Setup
    • Plugins
      • Write Custom Plugins
        • Input Format Plugin
        • Filesystem Plugin
        • Batch Segment Fetcher Plugin
        • Stream Ingestion Plugin
    • Design Documents
      • Segment Writer API
  • For Operators
    • Deployment and Monitoring
      • Set up cluster
      • Server Startup Status Checkers
      • Set up table
      • Set up ingestion
      • Decoupling Controller from the Data Path
      • Segment Assignment
      • Instance Assignment
      • Rebalance
        • Rebalance Servers
        • Rebalance Brokers
      • Separating data storage by age
        • Using multiple tenants
        • Using multiple directories
      • Pinot managed Offline flows
      • Minion merge rollup task
      • Consistent Push and Rollback
      • Access Control
      • Monitoring
      • Tuning
        • Real-time
        • Routing
        • Query Routing using Adaptive Server Selection
        • Query Scheduling
      • Upgrading Pinot with confidence
      • Managing Logs
      • OOM Protection Using Automatic Query Killing
    • Command-Line Interface (CLI)
    • Configuration Recommendation Engine
    • Tutorials
      • Authentication
        • Basic auth access control
        • ZkBasicAuthAccessControl
      • Configuring TLS/SSL
      • Build Docker Images
      • Running Pinot in Production
      • Kubernetes Deployment
      • Amazon EKS (Kafka)
      • Amazon MSK (Kafka)
      • Monitor Pinot using Prometheus and Grafana
      • Performance Optimization Configurations
  • Configuration Reference
    • Cluster
    • Controller
    • Broker
    • Server
    • Table
    • Schema
    • Ingestion Job Spec
    • Monitoring Metrics
    • Functions
      • ABS
      • ADD
      • ago
      • ARG_MIN / ARG_MAX
      • arrayConcatDouble
      • arrayConcatFloat
      • arrayConcatInt
      • arrayConcatLong
      • arrayConcatString
      • arrayContainsInt
      • arrayContainsString
      • arrayDistinctInt
      • arrayDistinctString
      • arrayIndexOfInt
      • arrayIndexOfString
      • ARRAYLENGTH
      • arrayRemoveInt
      • arrayRemoveString
      • arrayReverseInt
      • arrayReverseString
      • arraySliceInt
      • arraySliceString
      • arraySortInt
      • arraySortString
      • arrayUnionInt
      • arrayUnionString
      • AVGMV
      • Base64
      • caseWhen
      • ceil
      • CHR
      • codepoint
      • concat
      • count
      • COUNTMV
      • COVAR_POP
      • COVAR_SAMP
      • day
      • dayOfWeek
      • dayOfYear
      • DISTINCT
      • DISTINCTAVG
      • DISTINCTAVGMV
      • DISTINCTCOUNT
      • DISTINCTCOUNTBITMAP
      • DISTINCTCOUNTHLLMV
      • DISTINCTCOUNTHLL
      • DISTINCTCOUNTBITMAPMV
      • DISTINCTCOUNTMV
      • DISTINCTCOUNTRAWHLL
      • DISTINCTCOUNTRAWHLLMV
      • DISTINCTCOUNTRAWTHETASKETCH
      • DISTINCTCOUNTTHETASKETCH
      • DISTINCTSUM
      • DISTINCTSUMMV
      • DIV
      • DATETIMECONVERT
      • DATETRUNC
      • exp
      • FLOOR
      • FromDateTime
      • FromEpoch
      • FromEpochBucket
      • FUNNELCOUNT
      • Histogram
      • hour
      • isSubnetOf
      • JSONFORMAT
      • JSONPATH
      • JSONPATHARRAY
      • JSONPATHARRAYDEFAULTEMPTY
      • JSONPATHDOUBLE
      • JSONPATHLONG
      • JSONPATHSTRING
      • jsonextractkey
      • jsonextractscalar
      • length
      • ln
      • lower
      • lpad
      • ltrim
      • max
      • MAXMV
      • MD5
      • millisecond
      • min
      • minmaxrange
      • MINMAXRANGEMV
      • MINMV
      • minute
      • MOD
      • mode
      • month
      • mult
      • now
      • percentile
      • percentileest
      • percentileestmv
      • percentilemv
      • percentiletdigest
      • percentiletdigestmv
      • percentilekll
      • percentilerawkll
      • percentilekllmv
      • percentilerawkllmv
      • quarter
      • regexpExtract
      • regexpReplace
      • remove
      • replace
      • reverse
      • round
      • ROW_NUMBER
      • rpad
      • rtrim
      • second
      • SEGMENTPARTITIONEDDISTINCTCOUNT
      • sha
      • sha256
      • sha512
      • sqrt
      • startswith
      • ST_AsBinary
      • ST_AsText
      • ST_Contains
      • ST_Distance
      • ST_GeogFromText
      • ST_GeogFromWKB
      • ST_GeometryType
      • ST_GeomFromText
      • ST_GeomFromWKB
      • STPOINT
      • ST_Polygon
      • strpos
      • ST_Union
      • SUB
      • substr
      • sum
      • summv
      • TIMECONVERT
      • timezoneHour
      • timezoneMinute
      • ToDateTime
      • ToEpoch
      • ToEpochBucket
      • ToEpochRounded
      • TOJSONMAPSTR
      • toGeometry
      • toSphericalGeography
      • trim
      • upper
      • Url
      • UTF8
      • VALUEIN
      • week
      • year
      • yearOfWeek
      • Extract
    • Plugin Reference
      • Stream Ingestion Connectors
      • VAR_POP
      • VAR_SAMP
      • STDDEV_POP
      • STDDEV_SAMP
  • Reference
    • Single-stage query engine (v1)
    • Multi-stage query engine (v2)
  • RESOURCES
    • Community
    • Team
    • Blogs
    • Presentations
    • Videos
  • Integrations
    • Tableau
    • Trino
    • ThirdEye
    • Superset
    • Presto
    • Spark-Pinot Connector
  • Contributing
    • Contribute Pinot documentation
    • Style guide
Powered by GitBook
On this page
  • Window functions overview
  • Window function query syntax
  • Example query layout
  • OVER clause
  • Supported Pinot window functions
  • Examples of windows functions
  • Sum transactions by customer ID
  • Find the minimum or maximum transaction by customer ID
  • Find the average transaction amount by customer ID
  • Rank year-to-date sales for a sales team
  • Count the number of transactions by customer ID

Was this helpful?

Export as PDF
  1. For Users
  2. Query

Window functions

Use window functions to compute averages, sort, rank, or count items, calculate sums, and find minimum or maximum values.

PreviousUser-Defined Functions (UDFs)NextAPIs

Was this helpful?

Important: To query using Windows functions, you must

Use window functions to do the following:

Window functions overview

  • Window functions consist of a supported function and an .

  • To learn more about syntax, see , and read about the .

  • To see Windows functions you can use in Pinot, see .

  • To query with Windows functions in Pinot, see

Window function query syntax

For information about the Window function query syntax, see .

Example query layout

The following query shows the complete components of the window function. Note, PARTITION BY and ORDER BY are optional.

SELECT FUNC(column1) OVER (PARTITION BY column2 ORDER BY column3 ROWS 2 PRECEDING)
    FROM tableName
    WHERE filter_clause  

OVER clause

Partition by clause

  • If a PARTITION BY clause is specified, the intermediate results will be grouped into different partitions based on the values of the columns appearing in the PARTITION BY clause.

  • If the PARTITION BY clause isn’t specified, the whole result will be regarded as one big partition, i.e. there is only one partition in the result set.

Order by clause

  • If an ORDER BY clause is specified, all the rows within the same partition will be sorted based on the values of the columns appearing in the window ORDER BY clause. The ORDER BY clause decides the order in which the rows within a partition are to be processed.

  • If no ORDER BY clause is specified while a PARTITION BY clause is specified, the order of the rows is undefined. If output ordering is desired a global ORDER BY clause should be used in the query.

Frame clause

  • {RANGE|ROWS} frame_start OR

  • {RANGE|ROWS} BETWEEN frame_start AND frame_end; frame_start and frame_end can be any of:

    • UNBOUNDED PRECEDING: expression PRECEDING. May only be allowed in ROWS mode [depends on DB, some support some don’t]

    • CURRENT ROW expression FOLLOWING. May only be allowed in ROWS mode [depends on DB, some support some don’t]

    • UNBOUNDED FOLLOWING:

      • If no FRAME clause is specified, then the default frame behavior depends on whether ORDER BY is present or not.

      • If an ORDER BY clause is specified, the default behavior is to calculate the aggregation from the beginning of the partition to the current row or UNBOUNDED PRECEDING to CURRENT ROW.

      • If only a PARTITION BY clause is present, the default frame behavior is to calculate the aggregation from UNBOUNDED PRECEDING to CURRENT ROW.

If there is no FRAME, no PARTITION BY, and no ORDER BY clause specified in the OVER clause (empty OVER), the whole result set is regarded as one partition, and there's one frame in the window.

Inside the over clause, there are three optional components, i.e. PARTITION BY clause, ORDER BY clause, and FRAME clause.

Supported Pinot window functions

Function
Description
Example
Default Value When No Record Selected

Returns the average of the values for a numeric column as aDouble over the specified number of rows or partition (if applicable).

AVG(playerScore)

Double.NEGATIVE_INFINITY

BOOL_AND

Returns true if all input values are true, otherwise false

BOOL_OR

Returns true if at least one input value is true, otherwise false

Returns the count of the records as Long

COUNT(*)

0

Returns the minimum value of a numeric column as Double

MIN(playerScore)

Double.POSITIVE_INFINITY

Returns the maximum value of a numeric column as Double

MAX(playerScore)

Double.NEGATIVE_INFINITY

Assigns a unique row number to all the rows in a specified table.

ROW_NUMBER()

0

Returns the sum of the values for a numeric column as Double

SUM(playerScore)

0

Examples of windows functions

Sum transactions by customer ID

Calculate the rolling sum transaction amount ordered by the payment date for each customer ID (note, the default frame here is UNBOUNDED PRECEDING and CURRENT ROW).

SELECT customer_id, payment_date, amount, SUM(amount) OVER(PARTITION BY customer_id ORDER BY payment_date) from payment;
customer_id
payment_date
amount
sum

1

2023-02-14 23:22:38.996577

5.99

5.99

1

2023-02-15 16:31:19.996577

0.99

6.98

1

2023-02-15 19:37:12.996577

9.99

16.97

1

2023-02-16 13:47:23.996577

4.99

21.96

2

2023-02-17 19:23:24.996577

2.99

2.99

2

2023-02-17 19:23:24.996577

0.99

3.98

3

2023-02-16 00:02:31.996577

8.99

8.99

3

2023-02-16 13:47:36.996577

6.99

15.98

3

2023-02-17 03:43:41.996577

6.99

22.97

4

2023-02-15 07:59:54.996577

4.99

4.99

4

2023-02-16 06:37:06.996577

0.99

5.98

Find the minimum or maximum transaction by customer ID

Calculate the least (use MIN()) or most expensive (use MAX()) transaction made by each customer comparing all transactions made by the customer (default frame here is UNBOUNDED PRECEDING and UNBOUNDED FOLLOWING). The following query shows how to find the least expensive transaction.

SELECT customer_id, payment_date, amount, MIN(amount) OVER(PARTITION BY customer_id) from payment;
customer_id
payment_date
amount
min

1

2023-02-14 23:22:38.996577

5.99

0.99

1

2023-02-15 16:31:19.996577

0.99

0.99

1

2023-02-15 19:37:12.996577

9.99

0.99

2

2023-04-30 04:34:36.996577

4.99

4.99

2

2023-04-30 12:16:09.996577

10.99

4.99

3

2023-03-23 05:38:40.996577

2.99

2.99

3

2023-04-07 08:51:51.996577

3.99

2.99

3

3 | 2023-04-08 11:15:37.996577

4.99

2.99

Find the average transaction amount by customer ID

Calculate a customer’s average transaction amount for all transactions they’ve made (default frame here is UNBOUNDED PRECEDING and UNBOUNDED FOLLOWING).

SELECT customer_id, payment_date, amount, AVG(amount) OVER(PARTITION BY customer_id) from payment;
customer_id
payment_date
amount
avg

1

2023-02-14 23:22:38.996577

5.99

5.66

1

2023-02-15 16:31:19.996577

0.99

5.66

1

2023-02-15 19:37:12.996577

9.99

5.66

2

2023-04-30 04:34:36.996577

4.99

7.99

2

2023-04-30 12:16:09.996577

10.99

7.99

3

2023-03-23 05:38:40.996577

2.99

3.99

3

2023-04-07 08:51:51.996577

3.99

3.99

3

2023-04-08 11:15:37.996577

4.99

3.99

Rank year-to-date sales for a sales team

Use ROW_NUMBER() to rank team members by their year-to-date sales (default frame here is UNBOUNDED PRECEDING and UNBOUNDED FOLLOWING).

SELECT ROW_NUMBER() OVER(ORDER BY SalesYTD DESC) AS Row,   
    FirstName, LastName AS "Total sales YTD"   
FROM Sales.vSalesPerson;  
Row
FirstName
LastName
Total sales YTD

1

Joe

Smith

2251368.34

2

Alice

Davis

2151341.64

3

James

Jones

1551363.54

4

Dane

Scott

1251358.72

Count the number of transactions by customer ID

Count the number of transactions made by each customer (default frame here is UNBOUNDED PRECEDING and UNBOUNDED FOLLOWING).

SELECT customer_id, payment_date, amount, count(amount) OVER(PARTITION BY customer_id) from payment;
customer_id
payment_date
amount
count

1

2023-02-14 23:22:38.99657

10.99

2

1

2023-02-15 16:31:19.996577

8.99

2

2

2023-04-30 04:34:36.996577

23.50

3

2

2023-04-07 08:51:51.996577

12.35

3

2

2023-04-08 11:15:37.996577

8.29

3

The OVER clause applies a specified to compute values over a group of rows, and return a single result for each row. The OVER clause specifies how the rows are arranged and how the aggregation is done on those rows.

use Pinot's multi-stage query engine (v2).
Calcite documentation
Compute averages
Rank items
Calculate sums
Find minimum or maximum values
OVER clause
Windows function query syntax
OVER clause
Supported Pinot window functions
examples for supported Windows functions.
supported Windows function
Sum transactions by customer ID
Find the minimum or maximum transaction by customer ID
Find the average transaction amount by customer ID
Rank year-to-date sales for a sales team
Count the number of transactions by customer ID
AVG
COUNT
MIN
MAX
ROW_NUMBER
SUM