Supported Aggregations
Pinot provides support for aggregations using GROUP BY. You can use the following functions to get the aggregated value.
Function
Description
Example
COUNT
Get the count of rows in a group
COUNT(*)
MIN
Get the minimum value in a group
MIN(playerScore)
MAX
Get the maximum value in a group
MAX(playerScore)
SUM
Get the sum of values in a group
SUM(playerScore)
AVG
Get the average of the values in a group
AVG(playerScore)
MODE
Get the most frequent value in a group. When multiple modes are present it gives the minimum of all the modes. This behavior can be overridden to get the maximum or the average mode.
MODE(playerScore)
MODE(playerScore, 'MIN')
MODE(playerScore, 'MAX')
MODE(playerScore, 'AVG')
MINMAXRANGE
Returns the max - min
value in a group
MINMAXRANGE(playerScore)
PERCENTILE(column, N)
Returns the Nth percentile of the group where N is a decimal number between 0 and 100 inclusive
PERCENTILE(playerScore, 50), PERCENTILE(playerScore, 99.9)
PERCENTILEEST(column, N)
Returns the Nth percentile of the group using Quantile Digest algorithm
PERCENTILEEST(playerScore, 50), PERCENTILEEST(playerScore, 99.9)
PERCENTILETDigest(column, N)
Returns the Nth percentile of the group using T-digest algorithm
PERCENTILETDIGEST(playerScore, 50), PERCENTILETDIGEST(playerScore, 99.9)
DISTINCT
Returns the distinct row values in a group
DISTINCT(playerName)
DISTINCTCOUNT
Returns the count of distinct row values in a group
DISTINCTCOUNT(playerName)
DISTINCTCOUNTBITMAP
Returns the count of distinct row values in a group. This function is accurate for INT column, but approximate for other cases where hash codes are used in distinct counting and there may be hash collisions.
DISTINCTCOUNTBITMAP(playerName)
DISTINCTCOUNTHLL
Returns an approximate distinct count using HyperLogLog. It also takes an optional second argument to configure the log2m for the HyperLogLog.
DISTINCTCOUNTHLL(playerName, 12)
DISTINCTCOUNTRAWHLL
Returns HLL response serialized as string. The serialized HLL can be converted back into an HLL and then aggregated with other HLLs. A common use case may be to merge HLL responses from different Pinot tables, or to allow aggregation after client-side batching.
DISTINCTCOUNTRAWHLL(playerName)
FASTHLL (Deprecated)
WARN: will be deprecated soon. FASTHLL stores serialized HyperLogLog in String format, which performs worse than DISTINCTCOUNTHLL, which supports serialized HyperLogLog in BYTES (byte array) format
FASTHLL(playerName)
SEGMENTPARTITIONEDDISTINCTCOUNT
Returns the count of distinct values of a column when the column is pre-partitioned for each segment, where there is no common value within different segments. This function calculates the exact count of distinct values within the segment, then simply sums up the results from different segments to get the final result.
SEGMENTPARTITIONEDDISTINCTCOUNT(playerName)
Multi-value column functions
The following aggregation functions can be used for multi-value columns
Function
Description
Example
COUNTMV
Get the count of rows in a group
COUNTMV(playerName)
MINMV
Get the minimum value in a group
MINMV(playerScores)
MAXMV
Get the maximum value in a group
MAXMV(playerScores)
SUMMV
Get the sum of values in a group
SUMMV(playerScores)
AVGMV
Get the avg of values in a group
AVGMV(playerScores)
MINMAXRANGEMV
Returns the max - min
value in a group
MINMAXRANGEMV(playerScores)
PERCENTILEMV(column, N)
Returns the Nth percentile of the group where N is a decimal number between 0 and 100 inclusive
PERCENTILEMV(playerScores, 50),
PERCENTILEMV(playerScores, 99.9)
PERCENTILEESTMV(column, N)
Returns the Nth percentile of the group using Quantile Digest algorithm
PERCENTILEESTMV(playerScores, 50),
PERCENTILEESTMV(playerScores, 99.9)
PERCENTILETDIGESTMV(column, N)
Returns the Nth percentile of the group using T-digest algorithm
PERCENTILETDIGESTMV(playerScores, 50),
PERCENTILETDIGESTMV(playerScores, 99.9),
DISTINCTCOUNTMV
Returns the count of distinct row values in a group
DISTINCTCOUNTMV(playerNames)
DISTINCTCOUNTBITMAPMV
Returns the count of distinct row values in a group. This function is accurate for INT or dictionary encoded column, but approximate for other cases where hash codes are used in distinct counting and there may be hash collision.
DISTINCTCOUNTBITMAPMV(playerNames)
DISTINCTCOUNTHLLMV
Returns an approximate distinct count using HyperLogLog in a group
DISTINCTCOUNTHLLMV(playerNames)
DISTINCTCOUNTRAWHLLMV
Returns HLL response serialized as string. The serialized HLL can be converted back into an HLL and then aggregated with other HLLs. A common use case may be to merge HLL responses from different Pinot tables, or to allow aggregation after client-side batching.
DISTINCTCOUNTRAWHLLMV(playerNames)
FASTHLLMV (Deprecated)
stores serialized HyperLogLog in String format, which performs worse than DISTINCTCOUNTHLL, which supports serialized HyperLogLog in BYTES (byte array) format
FASTHLLMV(playerNames)
Last updated