Aggregation Functions
Aggregate functions return a single result for a group of rows.
Aggregate functions return a single result for a group of rows. The following table shows supported aggregate functions in Pinot.
Function | Description | Example | Default Value When No Record Selected |
---|---|---|---|
Project a column where the maxima appears in a series of measuring columns. | ARG_MAX(measuring1, measuring2, measuring3, projection) | Will return no result | |
Returns the count of the records as |
|
| |
Returns the population covariance between of 2 numerical columns as |
|
| |
Returns the sample covariance between of 2 numerical columns as |
|
| |
Calculate the histogram of a numeric column as |
|
| |
Returns the minimum value of a numeric column as |
|
| |
Returns the maximum value of a numeric column as |
|
| |
Returns the sum of the values for a numeric column as |
|
| |
Returns the sum of the values for a numeric column with optional precision and scale as |
|
| |
Returns the average of the values for a numeric column as |
|
| |
Returns the most frequent value of a numeric column as |
|
| |
Returns the |
|
| |
Returns the Nth percentile of the values for a numeric column as |
|
| |
Returns the Nth percentile of the values for a numeric column using Quantile Digest as |
|
| |
Returns the Nth percentile of the values for a numeric column using T-digest as |
|
| |
Returns the Nth percentile (using compression factor of CF) of the values for a numeric column using T-digest as |
|
| |
PERCENTILESMARTTDIGEST | Returns the Nth percentile of the values for a numeric column as |
|
|
Returns the count of distinct values of a column as |
|
| |
Returns the count of distinct values of a column as |
|
| |
Returns an approximate distinct count using HyperLogLog as |
|
| |
Returns HyperLogLog response serialized as |
|
| |
DISTINCTCOUNTSMARTHLL | Returns the count of distinct values of a column as |
|
|
| |||
| |||
Returns the count of distinct values of a column as |
|
| |
LASTWITHTIME(dataColumn, timeColumn, 'dataType') | Get the last value of dataColumn where the timeColumn is used to define the time of dataColumn and the dataType specifies the type of dataColumn, which can be |
|
|
FIRSTWITHTIME(dataColumn, timeColumn, 'dataType') | Get the first value of dataColumn where the timeColumn is used to define the time of dataColumn and the dataType specifies the type of dataColumn, which can be |
|
|
Deprecated functions:
Function | Description | Example |
---|---|---|
FASTHLL | FASTHLL stores serialized HyperLogLog in String format, which performs worse than DISTINCTCOUNTHLL, which supports serialized HyperLogLog in BYTES (byte array) format |
|
Multi-value column functions
The following aggregation functions can be used for multi-value columns
Function |
---|
COUNTMV
Returns the count of a multi-value column as |
MINMV
Returns the minimum value of a numeric multi-value column as |
MAXMV
Returns the maximum value of a numeric multi-value column as |
SUMMV
Returns the sum of the values for a numeric multi-value column as |
AVGMV
Returns the average of the values for a numeric multi-value column as |
MINMAXRANGEMV
Returns the |
PERCENTILEMV(column, N)
Returns the Nth percentile of the values for a numeric multi-value column as |
PERCENTILEESTMV(column, N)
Returns the Nth percentile using Quantile Digest as |
PERCENTILETDIGESTMV(column, N)
Returns the Nth percentile using T-digest as |
PERCENTILETDIGESTMV(column, N, CF)
Returns the Nth percentile (using compression factor CF) using T-digest as |
DISTINCTCOUNTMV
Returns the count of distinct values for a multi-value column as |
DISTINCTCOUNTBITMAPMV
Returns the count of distinct values for a multi-value column as |
DISTINCTCOUNTHLLMV
Returns an approximate distinct count using HyperLogLog as |
DISTINCTCOUNTRAWHLLMV Returns HyperLogLog response serialized as string. The serialized HLL can be converted back into an HLL and then aggregated with other HLLs. A common use case may be to merge HLL responses from different Pinot tables, or to allow aggregation after client-side batching. |
FILTER Clause in aggregation
Pinot supports FILTER clause in aggregation queries as follows:
In the query above, COL1
is aggregated only for rows where COL2 > 300 and COL3 > 50
. Similarly, COL2
is aggregated where COL2 < 50 and COL3 > 50
.
With NULL Value Support enabled, this allows to filter out the null values while performing aggregation as follows:
In the above query, COL1
is aggregated only for the non-null values. Without NULL value support, we would have to filter using the default null value.
NOTE: TheFILTER
clause is currently supported for aggregation-only queries, i.e., GROUP BY
is not supported.
Deprecated functions:
Function | Description | Example |
---|---|---|
FASTHLLMV (Deprecated) | stores serialized HyperLogLog in String format, which performs worse than DISTINCTCOUNTHLL, which supports serialized HyperLogLog in BYTES (byte array) format |
|