githubEdit

Aggregation Functions

Aggregate functions return a single result for a group of rows.

Aggregate functions return a single result for a group of rows. The pages below provide detailed signatures, usage examples, and notes for each function.

Basic Aggregations

Function
Description

COUNT

Count of rows

SUM / SUMMV

Sum of values

MIN / MINMV

Minimum value

MAX / MAXMV

Maximum value

AVG / AVGMV

Average of values

MODE

Most frequent value

HISTOGRAM

Histogram of values

SUMPRECISION

High-precision sum using BigDecimal

ANYVALUE

Any arbitrary non-null value from a group

BOOLAND / BOOLOR

Logical AND/OR across boolean values

Array and String Aggregations

Function
Description

ARRAYAGG

Collect values into an array

LISTAGG

Concatenate values into a delimited string

SUMARRAYLONG

Element-wise sum of long arrays

SUMARRAYDOUBLE

Element-wise sum of double arrays

Statistical Functions

Function
Description

SKEWNESS

Skewness of a distribution

KURTOSIS

Kurtosis of a distribution

DISTINCTSUM / DISTINCTAVG

Sum/average of distinct values

EXPRMIN / EXPRMAX

Project columns at row with min/max measure

ARG_MIN / ARG_MAX

Project column at row with min/max measure

Set and Sketch Functions

Function
Description

IDSET

Build an IdSet for efficient filtering

For sketch-based functions (FrequentItems, CPC, HLL+, ULL, Tuple), see Sketch Functions. For funnel analysis functions, see Funnel Functions.

Distinct Counting

Function
Description

DISTINCT

Exact distinct values

DISTINCTCOUNT

Exact distinct count

DISTINCTCOUNTBITMAP

Distinct count using bitmap

DISTINCTCOUNTHLL

Approximate distinct count using HLL

DISTINCTCOUNTTHETASKETCH

Distinct count using Theta sketch

DISTINCTCOUNTSMARTHLL

Hybrid exact/HLL distinct count

For sketch-based distinct count functions (CPC, HLL+, ULL, Tuple), see Sketch Functions.

Percentile Functions

Function
Description

PERCENTILE

Exact percentile

PERCENTILEEST

Estimated percentile

PERCENTILEKLL

Percentile using KLL sketch

PERCENTILETDIGEST

Percentile using T-Digest

Function
Description

FIRST_VALUE / LAST_VALUE

First/last value in window

LEAD / LAG

Access subsequent/preceding rows

FIRSTWITHTIME / LASTWITHTIME

First/last value by time

Multi-Value Column Functions

The following aggregation functions can be used for multi-value columns:

Function
Description

COUNTMV

Returns the count of a multi-value column

MINMV

Returns the minimum value of a numeric multi-value column

MAXMV

Returns the maximum value of a numeric multi-value column

SUMMV

Returns the sum of the values for a numeric multi-value column

AVGMV

Returns the average of the values for a numeric multi-value column

MINMAXRANGEMV

Returns the max - min value for a numeric multi-value column

PERCENTILEMV

Returns the Nth percentile of a numeric multi-value column

PERCENTILEESTMV

Returns the Nth percentile using Quantile Digest for a multi-value column

PERCENTILETDIGESTMV

Returns the Nth percentile using T-Digest for a multi-value column

DISTINCTCOUNTMV

Returns the count of distinct values for a multi-value column

DISTINCTCOUNTBITMAPMV

Returns the count of distinct values using bitmap for a multi-value column

DISTINCTCOUNTHLLMV

Returns an approximate distinct count using HLL for a multi-value column

DISTINCTCOUNTRAWHLLMV

Returns HyperLogLog response serialized as string for a multi-value column

DISTINCTSUMMV

Returns the sum of distinct values of a numeric multi-value column

DISTINCTAVGMV

Returns the average of distinct values of a numeric multi-value column

For sketch-based multi-value functions (HLL+), see Sketch Functions.

FILTER Clause in Aggregation

Pinot supports the FILTER clause in aggregation queries:

In the query above, COL1 is aggregated only for rows where COL2 > 300 and COL3 > 50. Similarly, COL2 is aggregated where COL2 < 50 and COL3 > 50.

With NULL Value Support enabled, this allows filtering out null values while performing aggregation:

Deprecated Functions

Function
Description

FASTHLL

Stores serialized HyperLogLog in String format. Performs worse than DISTINCTCOUNTHLL, which supports serialized HyperLogLog in BYTES format.

FASTHLLMV

Multi-value version of FASTHLL. Also deprecated in favor of DISTINCTCOUNTHLL.

Last updated

Was this helpful?