arrow-left

All pages
gitbookPowered by GitBook
1 of 1

Loading...

Aggregation Functions

Pinot provides support for aggregations using GROUP BY. You can use the following functions to get the aggregated value.

Function
Description
Example
Default Value When No Record Selected

Get the count of rows in a group

COUNT(*)

0

Deprecated functions:

Function
Description
Example

hashtag
Multi-value column functions

The following aggregation functions can be used for multi-value columns

Function

Deprecated functions:

Function
Description
Example

Returns the count of distinct row values in a group. This function is accurate for INT or dictionary encoded column, but approximate for other cases where hash codes are used in distinct counting and there may be hash collision.

Returns an approximate distinct count using HyperLogLog in a group

Returns HLL response serialized as string. The serialized HLL can be converted back into an HLL and then aggregated with other HLLs. A common use case may be to merge HLL responses from different Pinot tables, or to allow aggregation after client-side batching.

MIN

Get the minimum value in a group

MIN(playerScore)

Double.POSITIVE_INFINITY

MAX

Get the maximum value in a group

MAX(playerScore)

Double.NEGATIVE_INFINITY

SUM

Get the sum of values in a group

SUM(playerScore)

0

AVGarrow-up-right

Get the average of the values in a group

AVG(playerScore)

Double.NEGATIVE_INFINITY

MODE

Get the most frequent value in a group. When multiple modes are present it gives the minimum of all the modes. This behavior can be overridden to get the maximum or the average mode.

MODE(playerScore)

MODE(playerScore, 'MIN')

MODE(playerScore, 'MAX')

MODE(playerScore, 'AVG')

Double.NEGATIVE_INFINITY

MINMAXRANGE

Returns the max - min value in a group

MINMAXRANGE(playerScore)

Double.NEGATIVE_INFINITY

PERCENTILE(column, N)

Returns the Nth percentile of the group where N is a decimal number between 0 and 100 inclusive

PERCENTILE(playerScore, 50), PERCENTILE(playerScore, 99.9)

Double.NEGATIVE_INFINITY

PERCENTILEEST(column, N)

Returns the Nth percentile of the group using Quantile Digestarrow-up-right algorithm

PERCENTILEEST(playerScore, 50), PERCENTILEEST(playerScore, 99.9)

Long.MIN_VALUE

PERCENTILETDigest(column, N)

Returns the Nth percentile of the group using T-digest algorithmarrow-up-right

PERCENTILETDIGEST(playerScore, 50), PERCENTILETDIGEST(playerScore, 99.9)

Double.NaN

DISTINCTCOUNT

Returns the count of distinct row values in a group

DISTINCTCOUNT(playerName)

0

DISTINCTCOUNTBITMAP

Returns the count of distinct row values in a group. This function is accurate for INT column, but approximate for other cases where hash codes are used in distinct counting and there may be hash collisions.

DISTINCTCOUNTBITMAP(playerName)

0

DISTINCTCOUNTHLL

Returns an approximate distinct count using HyperLogLog. It also takes an optional second argument to configure the log2m for the HyperLogLog.

DISTINCTCOUNTHLL(playerName, 12)

0

DISTINCTCOUNTRAWHLL

Returns HLL response serialized as string. The serialized HLL can be converted back into an HLL and then aggregated with other HLLs. A common use case may be to merge HLL responses from different Pinot tables, or to allow aggregation after client-side batching.

DISTINCTCOUNTRAWHLL(playerName)

0

DISTINCTCOUNTTHETASKETCH

See Cardinality Estimation

0

DISTINCTCOUNTRAWTHETASKETCH

See Cardinality Estimation

0

SEGMENTPARTITIONEDDISTINCTCOUNT

Returns the count of distinct values of a column when the column is pre-partitioned for each segment, where there is no common value within different segments. This function calculates the exact count of distinct values within the segment, then simply sums up the results from different segments to get the final result.

SEGMENTPARTITIONEDDISTINCTCOUNT(playerName)

0

LASTWITHTIME(dataColumn, timeColumn, 'dataType')

Get the last value of dataColumn where the timeColumn is used to define the time of dataColumn and the dataType specifies the type of dataColumn, which can be BOOLEAN, INT, LONG, FLOAT, DOUBLE, STRING

LASTWITHTIME(playerScore, timestampColumn, 'BOOLEAN')

LASTWITHTIME(playerScore, timestampColumn, 'INT')

LASTWITHTIME(playerScore, timestampColumn, 'LONG')

LASTWITHTIME(playerScore, timestampColumn, 'FLOAT')

LASTWITHTIME(playerScore, timestampColumn, 'DOUBLE')

LASTWITHTIME(playerScore, timestampColumn, 'STRING')

INT: Int.MIN_VALUE LONG: Long.MIN_VALUE FLOAT: Float.NaN DOUBLE: Double.NaN STRING: ""

FASTHLL

FASTHLL stores serialized HyperLogLog in String format, which performs worse than DISTINCTCOUNTHLL, which supports serialized HyperLogLog in BYTES (byte array) format

FASTHLL(playerName)

COUNTMV Get the count of rows in a group

MINMV Get the minimum value in a group

MAXMV Get the maximum value in a group

SUMMV Get the sum of values in a group

AVGMV Get the avg of values in a group

MINMAXRANGEMV Returns the max - min value in a group

PERCENTILEMV(column, N) Returns the Nth percentile of the group where N is a decimal number between 0 and 100 inclusive

PERCENTILEESTMV(column, N) Returns the Nth percentile of the group using Quantile Digestarrow-up-right

PERCENTILETDIGESTMV(column, N) Returns the Nth percentile of the group using T-digest algorithmarrow-up-right

FASTHLLMV (Deprecated)

stores serialized HyperLogLog in String format, which performs worse than DISTINCTCOUNTHLL, which supports serialized HyperLogLog in BYTES (byte array) format

FASTHLLMV(playerNames)

COUNT

Returns the count of distinct row values in a group

DISTINCTCOUNTMV
DISTINCTCOUNTBITMAPMV
DISTINCTCOUNTHLLMV
DISTINCTCOUNTRAWHLLMV