githubEdit

Aggregation Functions

Aggregate functions return a single result for a group of rows.

Aggregate functions return a single result for a group of rows. The pages below provide detailed signatures, usage examples, and notes for each function.

Basic Aggregations

Function
Description

Count of rows

Sum of values

Minimum value

Maximum value

Average of values

Most frequent value

Histogram of values

High-precision sum using BigDecimal

Any arbitrary non-null value from a group

Logical AND/OR across boolean values

Array and String Aggregations

Function
Description

Collect values into an array

Concatenate values into a delimited string

Element-wise sum of long arrays

Element-wise sum of double arrays

Statistical Functions

Function
Description

Skewness of a distribution

Kurtosis of a distribution

Sum/average of distinct values

Project columns at row with min/max measure

Project column at row with min/max measure

Set and Sketch Functions

Function
Description

Build an IdSet for efficient filtering

For sketch-based functions (FrequentItems, CPC, HLL+, ULL, Tuple), see Sketch Functions. For funnel analysis functions, see Funnel Functions.

Distinct Counting

Function
Description

Exact distinct values

Exact distinct count

Distinct count using bitmap

Approximate distinct count using HLL

Distinct count using Theta sketch

Hybrid exact/HLL distinct count

For sketch-based distinct count functions (CPC, HLL+, ULL, Tuple), see Sketch Functions.

Percentile Functions

Function
Description

Exact percentile

Estimated percentile

Percentile using KLL sketch

Percentile using T-Digest

Function
Description

First/last value in window

Access subsequent/preceding rows

First/last value by time

Multi-Value Column Functions

The following aggregation functions can be used for multi-value columns:

Function
Description

Returns the count of a multi-value column

Returns the minimum value of a numeric multi-value column

Returns the maximum value of a numeric multi-value column

Returns the sum of the values for a numeric multi-value column

Returns the average of the values for a numeric multi-value column

Returns the max - min value for a numeric multi-value column

Returns the Nth percentile of a numeric multi-value column

Returns the Nth percentile using Quantile Digest for a multi-value column

Returns the Nth percentile using T-Digest for a multi-value column

Returns the count of distinct values for a multi-value column

Returns the count of distinct values using bitmap for a multi-value column

Returns an approximate distinct count using HLL for a multi-value column

Returns HyperLogLog response serialized as string for a multi-value column

Returns the sum of distinct values of a numeric multi-value column

Returns the average of distinct values of a numeric multi-value column

For sketch-based multi-value functions (HLL+), see Sketch Functions.

FILTER Clause in Aggregation

Pinot supports the FILTER clause in aggregation queries:

In the query above, COL1 is aggregated only for rows where COL2 > 300 and COL3 > 50. Similarly, COL2 is aggregated where COL2 < 50 and COL3 > 50.

With NULL Value Support enabled, this allows filtering out null values while performing aggregation:

Deprecated Functions

Function
Description

FASTHLL

Stores serialized HyperLogLog in String format. Performs worse than DISTINCTCOUNTHLL, which supports serialized HyperLogLog in BYTES format.

FASTHLLMV

Multi-value version of FASTHLL. Also deprecated in favor of DISTINCTCOUNTHLL.

Last updated

Was this helpful?