Aggregation Functions
Aggregate functions return a single result for a group of rows.
Aggregate functions return a single result for a group of rows. The pages below provide detailed signatures, usage examples, and notes for each function.
Basic Aggregations
COUNT
Count of rows
SUM / SUMMV
Sum of values
MIN / MINMV
Minimum value
MAX / MAXMV
Maximum value
AVG / AVGMV
Average of values
MODE
Most frequent value
HISTOGRAM
Histogram of values
SUMPRECISION
High-precision sum using BigDecimal
ANYVALUE
Any arbitrary non-null value from a group
BOOLAND / BOOLOR
Logical AND/OR across boolean values
Array and String Aggregations
ARRAYAGG
Collect values into an array
LISTAGG
Concatenate values into a delimited string
SUMARRAYLONG
Element-wise sum of long arrays
SUMARRAYDOUBLE
Element-wise sum of double arrays
Statistical Functions
SKEWNESS
Skewness of a distribution
KURTOSIS
Kurtosis of a distribution
DISTINCTSUM / DISTINCTAVG
Sum/average of distinct values
EXPRMIN / EXPRMAX
Project columns at row with min/max measure
ARG_MIN / ARG_MAX
Project column at row with min/max measure
Set and Sketch Functions
IDSET
Build an IdSet for efficient filtering
For sketch-based functions (FrequentItems, CPC, HLL+, ULL, Tuple), see Sketch Functions. For funnel analysis functions, see Funnel Functions.
Distinct Counting
DISTINCT
Exact distinct values
DISTINCTCOUNT
Exact distinct count
DISTINCTCOUNTBITMAP
Distinct count using bitmap
DISTINCTCOUNTHLL
Approximate distinct count using HLL
DISTINCTCOUNTTHETASKETCH
Distinct count using Theta sketch
DISTINCTCOUNTSMARTHLL
Hybrid exact/HLL distinct count
For sketch-based distinct count functions (CPC, HLL+, ULL, Tuple), see Sketch Functions.
Percentile Functions
PERCENTILE
Exact percentile
PERCENTILEEST
Estimated percentile
PERCENTILEKLL
Percentile using KLL sketch
PERCENTILETDIGEST
Percentile using T-Digest
Window-Related Functions
FIRST_VALUE / LAST_VALUE
First/last value in window
LEAD / LAG
Access subsequent/preceding rows
FIRSTWITHTIME / LASTWITHTIME
First/last value by time
Multi-Value Column Functions
The following aggregation functions can be used for multi-value columns:
COUNTMV
Returns the count of a multi-value column
MINMV
Returns the minimum value of a numeric multi-value column
MAXMV
Returns the maximum value of a numeric multi-value column
SUMMV
Returns the sum of the values for a numeric multi-value column
AVGMV
Returns the average of the values for a numeric multi-value column
MINMAXRANGEMV
Returns the max - min value for a numeric multi-value column
PERCENTILEMV
Returns the Nth percentile of a numeric multi-value column
PERCENTILEESTMV
Returns the Nth percentile using Quantile Digest for a multi-value column
PERCENTILETDIGESTMV
Returns the Nth percentile using T-Digest for a multi-value column
DISTINCTCOUNTMV
Returns the count of distinct values for a multi-value column
DISTINCTCOUNTBITMAPMV
Returns the count of distinct values using bitmap for a multi-value column
DISTINCTCOUNTHLLMV
Returns an approximate distinct count using HLL for a multi-value column
DISTINCTCOUNTRAWHLLMV
Returns HyperLogLog response serialized as string for a multi-value column
DISTINCTSUMMV
Returns the sum of distinct values of a numeric multi-value column
DISTINCTAVGMV
Returns the average of distinct values of a numeric multi-value column
For sketch-based multi-value functions (HLL+), see Sketch Functions.
FILTER Clause in Aggregation
Pinot supports the FILTER clause in aggregation queries:
In the query above, COL1 is aggregated only for rows where COL2 > 300 and COL3 > 50. Similarly, COL2 is aggregated where COL2 < 50 and COL3 > 50.
With NULL Value Support enabled, this allows filtering out null values while performing aggregation:
Deprecated Functions
FASTHLL
Stores serialized HyperLogLog in String format. Performs worse than DISTINCTCOUNTHLL, which supports serialized HyperLogLog in BYTES format.
FASTHLLMV
Multi-value version of FASTHLL. Also deprecated in favor of DISTINCTCOUNTHLL.
Last updated
Was this helpful?

