Aggregation Functions
Aggregate functions return a single result for a group of rows. The table below lists the aggregate functions supported in Pinot, with an example of each and the default value it returns when no record is selected.
Deprecated functions:
The following aggregation functions can be used for multi-value columns
Pinot supports the FILTER clause in aggregation queries, as follows:
In the query above, COL1 is aggregated only for rows where COL2 > 300 and COL3 > 50. Similarly, COL2 is aggregated only for rows where COL2 < 50 and COL3 > 50.
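The Python sketch below models these semantics on in-memory rows; it is not Pinot code. It assumes the query has the shape SELECT SUM(COL1) FILTER (WHERE COL2 > 300), SUM(COL2) FILTER (WHERE COL2 < 50) FROM myTable WHERE COL3 > 50, so the COL3 > 50 condition comes from the query-level WHERE clause. The choice of SUM, the table name myTable, the sample rows and the helper aggregate_with_filters are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, float]
Predicate = Callable[[Row], bool]


def aggregate_with_filters(
    rows: List[Row],
    where: Predicate,
    aggregates: Dict[str, Tuple[str, Predicate]],
) -> Dict[str, float]:
    """Sum each named aggregate over its own filtered subset of rows.

    `where` stands in for the query-level WHERE clause and applies to every
    aggregate; each aggregate's predicate stands in for its
    FILTER (WHERE ...) clause and applies only to that aggregate.
    """
    totals = {name: 0.0 for name in aggregates}
    for row in rows:  # one scan serves all aggregates
        if not where(row):
            continue  # rows rejected by WHERE reach no aggregate
        for name, (column, agg_filter) in aggregates.items():
            if agg_filter(row):
                totals[name] += row[column]
    return totals


# Hypothetical sample rows.
rows = [
    {"COL1": 10, "COL2": 400, "COL3": 60},  # counts toward SUM(COL1)
    {"COL1": 20, "COL2": 350, "COL3": 40},  # rejected by WHERE COL3 > 50
    {"COL1": 30, "COL2": 30, "COL3": 70},   # counts toward SUM(COL2)
    {"COL1": 40, "COL2": 500, "COL3": 90},  # counts toward SUM(COL1)
]

result = aggregate_with_filters(
    rows,
    where=lambda r: r["COL3"] > 50,
    aggregates={
        "sum_col1": ("COL1", lambda r: r["COL2"] > 300),
        "sum_col2": ("COL2", lambda r: r["COL2"] < 50),
    },
)
print(result)  # {'sum_col1': 50.0, 'sum_col2': 30.0}
```

The second row contributes to neither sum even though its COL2 exceeds 300: the WHERE clause removes it before any FILTER predicate is evaluated, while each FILTER predicate only decides which of the surviving rows its own aggregate sees.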
With NULL value support enabled, the FILTER clause can also exclude null values during aggregation, as follows:
In the query above, COL1 is aggregated only over its non-null values. Without NULL value support, you would have to filter on the column's default null value instead.
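To make the difference concrete, here is a small Python model (not Pinot code) of averaging a column that contains a null. AVG, the sample values and the helper names are hypothetical, and the column's default null value is assumed to be Integer.MIN_VALUE purely for illustration; the actual default depends on the column's schema.

```python
from typing import List, Optional

# Assumed default null value for an INT column (illustrative choice).
DEFAULT_NULL = -2**31


def avg_non_null(values: List[Optional[int]]) -> Optional[float]:
    """Average only the values that are present (models NULL value support)."""
    present = [v for v in values if v is not None]
    return sum(present) / len(present) if present else None


def avg_excluding_default(stored: List[int], default: int) -> Optional[float]:
    """Workaround without NULL support: drop values equal to the default null value."""
    kept = [v for v in stored if v != default]
    return sum(kept) / len(kept) if kept else None


raw = [10, None, 30]
# Without NULL support, a null is stored as the column's default null value.
stored = [DEFAULT_NULL if v is None else v for v in raw]

print(avg_non_null(raw))                            # 20.0
print(sum(stored) / len(stored))                    # badly skewed by DEFAULT_NULL
print(avg_excluding_default(stored, DEFAULT_NULL))  # 20.0
```

The workaround recovers the right answer here, but it also discards any genuine value that happens to equal the default, which null-aware filtering avoids.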
Deprecated functions:
Function | Description | Example |
---|---|---|
Function |
---|
Function | Description | Example |
---|---|---|
Function | Description | Example | Default Value When No Record Selected |
---|---|---|---|
ARG_MAX | Projects a column at the point where the maximum appears in a series of measuring columns | ARG_MAX(measuring1, measuring2, measuring3, projection) | Will return no result |
| | | | 0 |
COUNT | Returns the count of the records as Long | COUNT(*) | 0 |
COVAR_POP | Returns the population covariance between two numeric columns as Double | COVAR_POP(col1, col2) | Double.NEGATIVE_INFINITY |
COVAR_SAMP | Returns the sample covariance between two numeric columns as Double | COVAR_SAMP(col1, col2) | Double.NEGATIVE_INFINITY |
HISTOGRAM | Calculates the histogram of a numeric column as Double[] | HISTOGRAM(numberOfGames,0,200,10) | 0, 0, ..., 0 |
MIN | Returns the minimum value of a numeric column as Double | MIN(playerScore) | Double.POSITIVE_INFINITY |
MAX | Returns the maximum value of a numeric column as Double | MAX(playerScore) | Double.NEGATIVE_INFINITY |
SUM | Returns the sum of the values for a numeric column as Double | SUM(playerScore) | 0 |
SUMPRECISION | Returns the sum of the values for a numeric column, with optional precision and scale, as BigDecimal | SUMPRECISION(salary), SUMPRECISION(salary, precision, scale) | 0.0 |
AVG | Returns the average of the values for a numeric column as Double | AVG(playerScore) | Double.NEGATIVE_INFINITY |
MODE | Returns the most frequent value of a numeric column as Double. When multiple modes are present, it returns the smallest of them; pass 'MAX' or 'AVG' as the second argument to get the largest mode or the average of the modes instead | MODE(playerScore), MODE(playerScore, 'MIN'), MODE(playerScore, 'MAX'), MODE(playerScore, 'AVG') | Double.NEGATIVE_INFINITY |
MINMAXRANGE | Returns the max - min value for a numeric column as Double | MINMAXRANGE(playerScore) | Double.NEGATIVE_INFINITY |
PERCENTILE | Returns the Nth percentile of the values for a numeric column as Double. N is a decimal number between 0 and 100 inclusive | PERCENTILE(playerScore, 50), PERCENTILE(playerScore, 99.9) | Double.NEGATIVE_INFINITY |
PERCENTILEEST | Returns the Nth percentile of the values for a numeric column using Quantile Digest as Long | PERCENTILEEST(playerScore, 50), PERCENTILEEST(playerScore, 99.9) | Long.MIN_VALUE |
PERCENTILETDIGEST | Returns the Nth percentile of the values for a numeric column using T-digest as Double | PERCENTILETDIGEST(playerScore, 50), PERCENTILETDIGEST(playerScore, 99.9) | Double.NaN |
PERCENTILETDIGEST | Returns the Nth percentile of the values for a numeric column using T-digest with a compression factor of CF, as Double | PERCENTILETDIGEST(playerScore, 50, 1000), PERCENTILETDIGEST(playerScore, 99.9, 500) | Double.NaN |
PERCENTILESMARTTDIGEST | Returns the Nth percentile of the values for a numeric column as Double. When there are too many values, it automatically switches to an approximate percentile using TDigest. The switch threshold (100_000 by default) and compression (100 by default) for the TDigest can be configured via the optional second argument |