Apache Pinot Docs


release-0.10.0


Aggregation Functions

Pinot supports aggregations using GROUP BY. Use the following functions to compute aggregated values.

Each function below is listed with its description, one or more examples, and the default value returned when no record is selected.

**MODE**
Returns the most frequent value in a group. When there are multiple modes, it returns the minimum of them by default; pass `'MAX'` or `'AVG'` as the second argument to get the maximum or the average of the modes instead.

Examples: `MODE(playerScore)`, `MODE(playerScore, 'MIN')`, `MODE(playerScore, 'MAX')`, `MODE(playerScore, 'AVG')`

Default value when no record is selected: `Double.NEGATIVE_INFINITY`
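The tie-breaking rule for MODE can be illustrated with a small Python sketch. It is not Pinot's implementation: `mode` is a hypothetical helper that mimics the documented behavior, including the `Double.NEGATIVE_INFINITY` default for an empty group.

```python
from collections import Counter
import math


def mode(values, mode_type="MIN"):
    """Most frequent value; ties are resolved by MIN (default), MAX, or AVG."""
    if not values:
        # Mirrors the documented default when no record is selected.
        return -math.inf
    counts = Counter(values)
    top = max(counts.values())
    # Every value that shares the highest frequency is a mode.
    modes = [v for v, c in counts.items() if c == top]
    if mode_type == "MIN":
        return float(min(modes))
    if mode_type == "MAX":
        return float(max(modes))
    if mode_type == "AVG":
        return sum(modes) / len(modes)
    raise ValueError(f"unsupported mode type: {mode_type}")


scores = [10, 20, 20, 30, 30, 40]  # 20 and 30 both appear twice
print(mode(scores), mode(scores, "MAX"), mode(scores, "AVG"))  # 20.0 30.0 25.0
```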

**MINMAXRANGE**
Returns the `max - min` value in a group.

Example: `MINMAXRANGE(playerScore)`

Default value when no record is selected: `Double.NEGATIVE_INFINITY`
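As a rough illustration of how a per-group aggregation such as MINMAXRANGE behaves, the Python sketch below groups rows by a key (as GROUP BY would) and applies a hypothetical `min_max_range` helper to each group. The player names and scores are made-up sample data.

```python
import math


def min_max_range(values):
    """Return max - min for one group, or -inf when the group is empty."""
    if not values:
        return -math.inf  # documented default when no record is selected
    return float(max(values) - min(values))


# Group (player, score) rows by player, then aggregate each group.
rows = [("alice", 12), ("bob", 7), ("alice", 30), ("bob", 9)]
groups = {}
for player, score in rows:
    groups.setdefault(player, []).append(score)

print({p: min_max_range(s) for p, s in groups.items()})  # {'alice': 18.0, 'bob': 2.0}
```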

**PERCENTILE**
Returns the Nth percentile of the group, where N is a decimal number between 0 and 100 inclusive.

Examples: `PERCENTILE(playerScore, 50)`, `PERCENTILE(playerScore, 99.9)`

Default value when no record is selected: `Double.NEGATIVE_INFINITY`
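An exact percentile can be computed by sorting the group's values and picking an index. The Python sketch below assumes a floor(size × N / 100) index rule, clamped to the last element; this rule is an assumption for illustration, and other percentile definitions interpolate between neighboring values instead.

```python
import math


def percentile(values, n):
    """Exact Nth percentile (0 <= n <= 100) by sorting and indexing.

    The index rule floor(size * n / 100), clamped to the last element,
    is an assumption for illustration.
    """
    if not 0 <= n <= 100:
        raise ValueError("n must be between 0 and 100 inclusive")
    if not values:
        return -math.inf  # documented default when no record is selected
    ordered = sorted(values)
    index = min(int(len(ordered) * n / 100), len(ordered) - 1)
    return float(ordered[index])


scores = [40, 10, 50, 30, 20]
print(percentile(scores, 50), percentile(scores, 99.9))  # 30.0 50.0
```

Because the whole group is kept and sorted, this exact approach costs memory proportional to the group size, which is the trade-off the estimating variants avoid.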

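**PERCENTILEEST**
Returns the Nth percentile of the group using the Quantile Digest algorithm.

Examples: `PERCENTILEEST(playerScore, 50)`, `PERCENTILEEST(playerScore, 99.9)`

Default value when no record is selected: `Long.MIN_VALUE`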

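**PERCENTILETDIGEST**
Returns the Nth percentile of the group using the T-digest algorithm.

Examples: `PERCENTILETDIGEST(playerScore, 50)`, `PERCENTILETDIGEST(playerScore, 99.9)`

Default value when no record is selected: `Double.NaN`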

**DISTINCTCOUNTBITMAP**
Returns the count of distinct row values in a group. This function is exact for *INT* columns, but approximate for other column types, where hash codes are used for distinct counting and hash collisions are possible.

Example: `DISTINCTCOUNTBITMAP(playerName)`

Default value when no record is selected: `0`
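Why the count is exact for INT values but approximate otherwise can be shown with a Python sketch. A plain set of integers stands in for the bitmap, and non-INT values are reduced to a 32-bit hash first. Using Java's `String.hashCode` as that hash is an assumption for illustration; `java_string_hash` and `distinct_count_bitmap` are hypothetical helpers.

```python
def java_string_hash(s):
    """Java's String.hashCode(): h = 31 * h + unit over UTF-16 code units, signed 32-bit."""
    data = s.encode("utf-16-be")
    h = 0
    for i in range(0, len(data), 2):
        unit = (data[i] << 8) | data[i + 1]
        h = (31 * h + unit) & 0xFFFFFFFF
    return h - (1 << 32) if h >= (1 << 31) else h


def distinct_count_bitmap(values):
    """Distinct count over a set of ints standing in for a compressed bitmap.

    Ints go into the set as-is, so their count is exact. Any other value is
    first reduced to a 32-bit hash (Java's String.hashCode is assumed here),
    so two different values can collide and be counted once.
    """
    bitmap = set()
    for v in values:
        bitmap.add(v if isinstance(v, int) else java_string_hash(str(v)))
    return len(bitmap)


print(distinct_count_bitmap([3, 7, 7, 42]))       # 3 (exact)
print(distinct_count_bitmap(["Aa", "BB", "Cc"]))  # 2: "Aa" and "BB" share hash 2112
```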

**DISTINCTCOUNTHLL**
Returns an approximate distinct count using *HyperLogLog*. An optional second argument configures the *log2m* parameter of the *HyperLogLog*.

Example: `DISTINCTCOUNTHLL(playerName, 12)`

Default value when no record is selected: `0`
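A compact HyperLogLog sketch in Python shows what *log2m* controls: the sketch keeps 2^log2m registers, so a larger log2m uses more memory and gives a smaller relative error (roughly 1.04 / sqrt(2^log2m) for the textbook algorithm). The `HyperLogLog` class, the SHA-1-based hash, and the correction constants follow the textbook algorithm and are assumptions; this is not Pinot's implementation.

```python
import hashlib
import math


class HyperLogLog:
    """Textbook HyperLogLog with 2**log2m registers (illustrative, not Pinot's code)."""

    def __init__(self, log2m=12):
        # The alpha constant below is the large-m approximation, valid for m >= 128.
        if not 7 <= log2m <= 16:
            raise ValueError("this sketch supports 7 <= log2m <= 16")
        self.log2m = log2m
        self.m = 1 << log2m
        self.registers = [0] * self.m

    def add(self, value):
        # 64-bit hash taken from SHA-1 (an assumption; any good 64-bit hash works).
        digest = hashlib.sha1(str(value).encode("utf-8")).digest()
        x = int.from_bytes(digest[:8], "big")
        width = 64 - self.log2m
        index = x >> width                     # top log2m bits select a register
        rest = x & ((1 << width) - 1)          # remaining bits
        rank = width - rest.bit_length() + 1   # leading zeros in `rest`, plus one
        self.registers[index] = max(self.registers[index], rank)

    def cardinality(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)
        raw = alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction: fall back to linear counting.
            return round(self.m * math.log(self.m / zeros))
        return round(raw)


hll = HyperLogLog(log2m=12)
for i in range(100_000):
    hll.add(f"player-{i % 20_000}")  # 20,000 distinct names, each added 5 times
print(hll.cardinality())  # an estimate close to 20,000
```

Re-adding a value never changes the registers, which is why duplicates do not inflate the estimate, and an empty sketch estimates `0`.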

**DISTINCTCOUNTRAWHLL**
Returns the HLL response serialized as a string. The serialized HLL can be converted back into an HLL and then aggregated with other HLLs. A common use case is merging HLL responses from different Pinot tables, or aggregating after client-side batching.

Example: `DISTINCTCOUNTRAWHLL(playerName)`

Default value when no record is selected: `0`
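Merging raw HLL responses works because the union of two HyperLogLog sketches with the same log2m is the element-wise maximum of their registers. The Python sketch below round-trips registers through a string and merges them client-side. The hex string format, `LOG2M`, and all helper names are hypothetical; Pinot's actual serialized format is not reproduced here.

```python
import hashlib

LOG2M = 8  # hypothetical precision: 2**8 = 256 registers


def registers_for(values, log2m=LOG2M):
    """Build HyperLogLog registers (textbook construction, SHA-1 hash assumed)."""
    regs = [0] * (1 << log2m)
    width = 64 - log2m
    for v in values:
        x = int.from_bytes(hashlib.sha1(str(v).encode("utf-8")).digest()[:8], "big")
        index, rest = x >> width, x & ((1 << width) - 1)
        regs[index] = max(regs[index], width - rest.bit_length() + 1)
    return regs


def serialize(regs):
    # Hypothetical string format: one byte per register, hex-encoded.
    return bytes(regs).hex()


def deserialize(text):
    return list(bytes.fromhex(text))


def merge(*sketches):
    # HLL union is the element-wise maximum of the registers.
    return [max(column) for column in zip(*sketches)]


# Raw sketches returned by two queries (for example, against two different tables).
raw_a = serialize(registers_for(["alice", "bob", "carol"]))
raw_b = serialize(registers_for(["carol", "dave"]))
merged = merge(deserialize(raw_a), deserialize(raw_b))
print(merged == registers_for(["alice", "bob", "carol", "dave"]))  # True
```

The merged registers are identical to a sketch built over the union of both inputs, so the overlapping value `carol` is not double-counted when the final estimate is computed.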

**SEGMENTPARTITIONEDDISTINCTCOUNT**
Returns the count of distinct values of a column when the column is pre-partitioned by segment, so that no value appears in more than one segment. The function computes the exact distinct count within each segment, then sums the per-segment results to get the final count.

Example: `SEGMENTPARTITIONEDDISTINCTCOUNT(playerName)`

Default value when no record is selected: `0`
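The partitioning precondition matters because the per-segment counts are simply added. The Python sketch below, with a hypothetical `segment_partitioned_distinct_count` helper and made-up segments, shows that the sum is exact when each value lives in one segment and overcounts when a value spans segments.

```python
def segment_partitioned_distinct_count(segments):
    """Sum of exact per-segment distinct counts.

    Correct only if no value appears in more than one segment.
    """
    return sum(len(set(segment)) for segment in segments)


# Partitioned by player: each name lives in exactly one segment.
partitioned = [["alice", "alice", "bob"], ["carol", "dave", "carol"]]
print(segment_partitioned_distinct_count(partitioned))    # 4, matches the true count

# Not partitioned: "bob" appears in both segments and is counted twice.
unpartitioned = [["alice", "bob"], ["bob", "carol"]]
print(segment_partitioned_distinct_count(unpartitioned))  # 4, but the true count is 3
```

Skipping a cross-segment merge of value sets is what makes this cheaper than a general exact distinct count.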

**LASTWITHTIME**
Returns the last value of **dataColumn**, where **timeColumn** defines the time of each dataColumn value and **dataType** specifies the type of dataColumn, which can be `BOOLEAN`, `INT`, `LONG`, `FLOAT`, `DOUBLE`, or `STRING`.

Syntax: `LASTWITHTIME(dataColumn, timeColumn, 'dataType')`

Examples: `LASTWITHTIME(playerScore, timestampColumn, 'BOOLEAN')`, `LASTWITHTIME(playerScore, timestampColumn, 'INT')`, `LASTWITHTIME(playerScore, timestampColumn, 'LONG')`, `LASTWITHTIME(playerScore, timestampColumn, 'FLOAT')`, `LASTWITHTIME(playerScore, timestampColumn, 'DOUBLE')`, `LASTWITHTIME(playerScore, timestampColumn, 'STRING')`

Default value when no record is selected: INT: `Integer.MIN_VALUE`, LONG: `Long.MIN_VALUE`, FLOAT: `Float.NaN`, DOUBLE: `Double.NaN`, STRING: `""`
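The selection rule for LASTWITHTIME can be sketched in Python as "keep the value paired with the largest time". The helper name is hypothetical, and two points are assumptions: ties on the time value are resolved in favor of the first row seen, and no default is shown for `BOOLEAN` because none is listed above.

```python
import math

# Documented defaults when no record is selected (BOOLEAN is not listed, so it maps to None here).
DEFAULTS = {
    "INT": -(2 ** 31),   # Integer.MIN_VALUE
    "LONG": -(2 ** 63),  # Long.MIN_VALUE
    "FLOAT": math.nan,   # Float.NaN
    "DOUBLE": math.nan,  # Double.NaN
    "STRING": "",
}


def last_with_time(rows, data_type):
    """Return the data value paired with the largest time value.

    rows is a list of (data_value, time_value) pairs. Ties on the time value
    keep the first row seen, which is an assumption for this sketch.
    """
    if not rows:
        return DEFAULTS.get(data_type)
    best_value, best_ts = rows[0]
    for value, ts in rows[1:]:
        if ts > best_ts:
            best_value, best_ts = value, ts
    return best_value


# (playerScore, timestampColumn) pairs, deliberately not in time order.
rows = [(71, 1_650_000_000), (88, 1_650_000_300), (64, 1_650_000_100)]
print(last_with_time(rows, "INT"))  # 88
```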

Deprecated functions:

**FASTHLL**
Stores the serialized HyperLogLog in String format, which performs worse than DISTINCTCOUNTHLL, which supports serialized HyperLogLog in BYTES (byte array) format.

Example: `FASTHLL(playerName)`

Multi-value column functions

The following aggregation functions can be used on multi-value columns:

**PERCENTILEMV(column, N)**
Returns the Nth percentile of the group, where N is a decimal number between 0 and 100 inclusive.

**PERCENTILETDIGESTMV(column, N)**
Returns the Nth percentile of the group using the T-digest algorithm.

**DISTINCTCOUNTBITMAPMV**
Returns the count of distinct row values in a group. This function is exact for INT or dictionary-encoded columns, but approximate for other cases, where hash codes are used for distinct counting and hash collisions are possible.

**DISTINCTCOUNTRAWHLLMV**
Returns the HLL response serialized as a string. The serialized HLL can be converted back into an HLL and then aggregated with other HLLs. A common use case is merging HLL responses from different Pinot tables, or aggregating after client-side batching.
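For multi-value columns, these functions treat every element of every row's array as an individual value of the group. The Python sketch below pools the elements with a hypothetical `flatten` helper and then aggregates them. The percentile index rule and the empty-group default are assumptions, the latter chosen to match the single-value PERCENTILE default.

```python
import math


def flatten(mv_rows):
    """Pool every element of every multi-value row into a single list."""
    return [value for row in mv_rows for value in row]


def percentile(values, n):
    """Sort-and-index percentile; the floor(size * n / 100) rule is an assumption."""
    if not values:
        return -math.inf  # assumed to match PERCENTILE's documented default
    ordered = sorted(values)
    return float(ordered[min(int(len(ordered) * n / 100), len(ordered) - 1)])


# Hypothetical multi-value column: each row holds several scores.
mv_scores = [[10, 40], [20], [30, 50, 10]]
pooled = flatten(mv_scores)    # [10, 40, 20, 30, 50, 10]
print(percentile(pooled, 50))  # 30.0
print(len(set(pooled)))        # 5 distinct elements
```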

Deprecated functions:

**FASTHLLMV**
Stores the serialized HyperLogLog in String format, which performs worse than DISTINCTCOUNTHLL, which supports serialized HyperLogLog in BYTES (byte array) format.

Example: `FASTHLLMV(playerNames)`
