Aggregation Functions
Function
Description
Example
Default Value When No Record Selected
COUNT
Returns the count of the records as Long
COUNT(*)
0
MIN
Returns the minimum value of a numeric column as Double
MIN(playerScore)
Double.POSITIVE_INFINITY
MAX
Returns the maximum value of a numeric column as Double
MAX(playerScore)
Double.NEGATIVE_INFINITY
SUM
Returns the sum of the values for a numeric column as Double
SUM(playerScore)
0
Returns the sum of the values for a numeric column with optional precision and scale as BigDecimal
SUMPRECISION(salary), SUMPRECISION(salary, precision, scale)
0.0
AVG
Returns the average of the values for a numeric column as Double
AVG(playerScore)
Double.NEGATIVE_INFINITY
MODE
Returns the most frequent value of a numeric column as Double. When multiple modes are present it gives the minimum of all the modes. This behavior can be overridden to get the maximum or the average mode.
MODE(playerScore)
MODE(playerScore, 'MIN')
MODE(playerScore, 'MAX')
MODE(playerScore, 'AVG')
Double.NEGATIVE_INFINITY
Returns the max - min value for a numeric column as Double
MINMAXRANGE(playerScore)
Double.NEGATIVE_INFINITY
Returns the Nth percentile of the values for a numeric column as Double. N is a decimal number between 0 and 100 inclusive.
PERCENTILE(playerScore, 50) PERCENTILE(playerScore, 99.9)
Double.NEGATIVE_INFINITY
Returns the Nth percentile of the values for a numeric column using Quantile Digest as Long
PERCENTILEEST(playerScore, 50)
PERCENTILEEST(playerScore, 99.9)
Long.MIN_VALUE
Returns the Nth percentile of the values for a numeric column using T-digest as Double
PERCENTILETDIGEST(playerScore, 50)
PERCENTILETDIGEST(playerScore, 99.9)
Double.NaN
PERCENTILESMARTTDIGEST
Returns the Nth percentile of the values for a numeric column as Double. When there are too many values, automatically switch to approximate percentile using TDigest. The switch threshold (100_000 by default) and compression (100 by default) for the TDigest can be configured via the optional second argument.
PERCENTILESMARTTDIGEST(playerScore, 50)
PERCENTILESMARTTDIGEST(playerScore, 99.9, 'threshold=100;compression=50)
Double.NEGATIVE_INFINITY
Returns the count of distinct values of a column as Integer
DISTINCTCOUNT(playerName)
0
Returns the count of distinct values of a column as Integer. This function is accurate for INT column, but approximate for other cases where hash codes are used in distinct counting and there may be hash collisions.
DISTINCTCOUNTBITMAP(playerName)
0
Returns an approximate distinct count using HyperLogLog as Long. It also takes an optional second argument to configure the log2m for the HyperLogLog.
DISTINCTCOUNTHLL(playerName, 12)
0
Returns HyperLogLog response serialized as String. The serialized HLL can be converted back into an HLL and then aggregated with other HLLs. A common use case may be to merge HLL responses from different Pinot tables, or to allow aggregation after client-side batching.
DISTINCTCOUNTRAWHLL(playerName)
0
DISTINCTCOUNTSMARTHLL
Returns the count of distinct values of a column as Integer. When there are too many distinct values, automatically switch to approximate distinct count using HyperLogLog. The switch threshold (100_000 by default) and log2m (12 by default) for the HyperLogLog can be configured via the optional second argument.
DISTINCTCOUNTSMARTHLL(playerName),
DISTINCTCOUNTSMARTHLL(playerName, 'threshold=100;log2m=8')
0
Returns the count of distinct values of a column as Long when the column is pre-partitioned for each segment, where there is no common value within different segments. This function calculates the exact count of distinct values within the segment, then simply sums up the results from different segments to get the final result.
SEGMENTPARTITIONEDDISTINCTCOUNT(playerName)
0
Deprecated functions:
Function
Description
Example
FASTHLL
FASTHLL stores serialized HyperLogLog in String format, which performs worse than DISTINCTCOUNTHLL, which supports serialized HyperLogLog in BYTES (byte array) format
FASTHLL(playerName)

Multi-value column functions

The following aggregation functions can be used for multi-value columns
Function
COUNTMV Returns the count of a multi-value column as Long
MINMV Returns the minimum value of a numeric multi-value column as Double
MAXMV Returns the maximum value of a numeric multi-value column as Double
SUMMV Returns the sum of the values for a numeric multi-value column as Double
AVGMV Returns the average of the values for a numeric multi-value column as Double
MINMAXRANGEMV Returns the max - min value for a numeric multi-value column as Double
PERCENTILEMV(column, N) Returns the Nth percentile of the values for a numeric multi-value column as Double
PERCENTILEESTMV(column, N) Returns the Nth percentile using Quantile Digest as Long
PERCENTILETDIGESTMV(column, N) Returns the Nth percentile using T-digest as Double
DISTINCTCOUNTMV Returns the count of distinct values for a multi-value column as Integer
DISTINCTCOUNTBITMAPMV Returns the count of distinct values for a multi-value column as Integer. This function is accurate for INT or dictionary encoded column, but approximate for other cases where hash codes are used in distinct counting and there may be hash collision.
DISTINCTCOUNTHLLMV Returns an approximate distinct count using HyperLogLog as Long
DISTINCTCOUNTRAWHLLMV Returns HyperLogLog response serialized as string. The serialized HLL can be converted back into an HLL and then aggregated with other HLLs. A common use case may be to merge HLL responses from different Pinot tables, or to allow aggregation after client-side batching.
Deprecated functions:
Function
Description
Example
FASTHLLMV (Deprecated)
stores serialized HyperLogLog in String format, which performs worse than DISTINCTCOUNTHLL, which supports serialized HyperLogLog in BYTES (byte array) format
FASTHLLMV(playerNames)