Learn how to query Apache Pinot using SQL or explore data using the web-based Pinot query console.
Pinot currently supports two ways for you to implement your own functions:
Groovy Scripts
Scalar Functions
Pinot allows you to run any function using Apache Groovy scripts. The syntax for executing a Groovy script within a query is as follows:
GROOVY('result value metadata json', 'groovy script', arg0, arg1, arg2...)
This function executes the Groovy script using the arguments provided and returns a result that matches the provided result value metadata. The function requires the following arguments:
Result value metadata json - JSON string representing the result value metadata. Must contain non-null keys resultType and isSingleValue.
Groovy script to execute - Groovy script string, which uses arg0, arg1, arg2 etc. to refer to the arguments provided within the script.
Arguments - Pinot columns or other transform functions that are arguments to the Groovy script.
Examples
Add colA and colB and return a single-value INT
groovy( '{"returnType":"INT","isSingleValue":true}', 'arg0 + arg1', colA, colB)
Find the max element in mvColumn array and return a single-value INT
groovy('{"returnType":"INT","isSingleValue":true}', 'arg0.toList().max()', mvColumn)
Find all elements of the array mvColumn and return as a multi-value LONG column
groovy('{"returnType":"LONG","isSingleValue":false}', 'arg0.findIndexValues{ it > 5 }', mvColumn)
Multiply length of array mvColumn with colB and return a single-value DOUBLE
groovy('{"returnType":"DOUBLE","isSingleValue":true}', 'arg0 * arg1', arraylength(mvColumn), colB)
Find all indexes in mvColumnA which have value foo, and add values at those indexes in mvColumnB
groovy( '{"returnType":"DOUBLE","isSingleValue":true}', 'def x = 0; arg0.eachWithIndex{item, idx-> if (item == "foo") {x = x + arg1[idx] }}; return x' , mvColumnA, mvColumnB)
Switch case which returns a FLOAT value depending on length of mvCol array
groovy('{\"returnType\":\"FLOAT\", \"isSingleValue\":true}', 'def result; switch(arg0.length()) { case 10: result = 1.1; break; case 20: result = 1.2; break; default: result = 1.3;}; return result.floatValue()', mvCol)
Any Groovy script which takes no arguments
groovy('{"returnType":"STRING","isSingleValue":true}', 'new Date().format( "yyyyMMdd" )')
Since the 0.5.0 release, Pinot supports custom functions that return a single output for multiple inputs. Examples of scalar functions can be found in StringFunctions and DateTimeFunctions
Pinot automatically identifies and registers all the functions that have the @ScalarFunction annotation.
Only Java methods are supported.
You can add new scalar functions as follows:
Create a new Java project. Make sure you keep the package name as org.apache.pinot.scalar.XXXX
In your Java project, include the dependency
Annotate your methods with the @ScalarFunction annotation. Make sure the method is static and returns only a single value output. The input and output can have one of the following types:
Integer
Long
Double
String
Place the compiled JAR in the /plugins directory in Pinot. You will need to restart all Pinot instances if they are already running.
Now, you can use the function in a query as follows:
Note that the function name in SQL is the same as the function name in Java. The SQL function name is case-insensitive as well.
Cardinality estimation is a classic problem. Pinot solves it in multiple ways, each of which trades off accuracy against latency.
Functions:
DistinctCount(x) -> LONG
Returns accurate count for all unique values in a column.
The underlying implementation uses an IntOpenHashSet from the library it.unimi.dsi:fastutil:8.2.3 to hold all the unique values.
It usually takes a lot of resources and time to compute accurate results for unique counting on large datasets. In some circumstances, we can tolerate a certain error rate, in which case we can use approximation functions to tackle this problem.
HyperLogLog is an approximation algorithm for unique counting. It uses a fixed number of bits to estimate the cardinality of a given data set.
Pinot leverages the HyperLogLog class from the library com.clearspring.analytics:stream:2.7.0 as the data structure to hold intermediate results.
Functions:
DistinctCountHLL(x) -> LONG
For column types INT/LONG/FLOAT/DOUBLE/STRING, Pinot treats each value as an individual entry to add into the HyperLogLog object, then computes the approximation by calling the method cardinality().
For column type BYTES, Pinot treats each value as a serialized HyperLogLog object with pre-aggregated values inside. The bytes value is generated by org.apache.pinot.core.common.ObjectSerDeUtils.HYPER_LOG_LOG_SER_DE.serialize(hyperLogLog).
All deserialized HyperLogLog objects are merged into one, then cardinality() is called to get the approximate unique count.
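The estimation idea can be illustrated with a toy HyperLogLog (a simplified Python sketch of the algorithm, not Pinot's clearspring-backed implementation; the register count and hash choice here are arbitrary):

```python
import hashlib
import math

class ToyHyperLogLog:
    """Toy HyperLogLog: hash each value, use the first p bits to pick a
    register, and keep the longest run of leading zeros seen per register."""

    def __init__(self, p=8):
        self.p = p
        self.m = 1 << p                   # number of registers
        self.registers = [0] * self.m

    def add(self, value):
        h = int.from_bytes(hashlib.sha1(str(value).encode()).digest()[:8], "big")
        idx = h >> (64 - self.p)          # top p bits select a register
        rest = h & ((1 << (64 - self.p)) - 1)
        rank = (64 - self.p) - rest.bit_length() + 1   # leading zeros + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def cardinality(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)
        raw = alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        if raw <= 2.5 * self.m:           # small-range (linear counting) correction
            zeros = self.registers.count(0)
            if zeros:
                raw = self.m * math.log(self.m / zeros)
        return int(raw)
```

With 2^8 registers the standard error is roughly 1.04/sqrt(256), about 6.5%, which is exactly the accuracy-for-memory trade-off this section describes.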
The Theta Sketch framework enables set operations over a stream of data, and can also be used for cardinality estimation. Pinot leverages the Sketch class and its extensions from the library org.apache.datasketches:datasketches-java:1.2.0-incubating to perform distinct counting as well as evaluating set operations.
Functions:
DistinctCountThetaSketch(<thetaSketchColumn>, <thetaSketchParams>, predicate1, predicate2..., postAggregationExpressionToEvaluate) -> LONG
thetaSketchColumn (required): Name of the column to aggregate on.
thetaSketchParams (required): Parameters for constructing the intermediate theta-sketches. Currently, the only supported parameter is nominalEntries.
predicates (optional): These are individual predicates of the form lhs <op> rhs which are applied on rows selected by the where clause. During intermediate sketch aggregation, sketches from the thetaSketchColumn that satisfy these predicates are unionized individually. For example, all filtered rows that match country=USA are unionized into a single sketch. Complex predicates created by combining (AND/OR) individual predicates are supported.
postAggregationExpressionToEvaluate (required): The set operation to perform on the individual intermediate sketches for each of the predicates. Currently supported operations are SET_DIFF, SET_UNION, SET_INTERSECT, where DIFF requires two arguments and UNION/INTERSECT allow more than two arguments.
In the example query below, the where clause is responsible for identifying the matching rows. Note that the where clause can be completely independent of the postAggregationExpression. Once matching rows are identified, each server unionizes all the sketches that match the individual predicates, i.e. country='USA' and device='mobile' in this case. Once the broker receives the intermediate sketches for each of these individual predicates from all servers, it performs the final aggregation by evaluating the postAggregationExpression and returns the final cardinality of the resulting sketch.
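With exact sets standing in for the approximate sketches, the broker-side post-aggregation reduces to ordinary set algebra. A Python sketch of just those semantics (the ids and predicates below are made up for illustration):

```python
# Stand-ins for the per-predicate sketches each server builds: for every
# predicate, the union of column values from rows matching both the WHERE
# clause and that predicate (exact sets here; real theta sketches trade
# exactness for bounded memory).
usa_ids = {1, 2, 3, 4}       # rows matching country = 'USA'
mobile_ids = {3, 4, 5}       # rows matching device = 'mobile'

# Broker-side post-aggregation over the intermediate "sketches":
set_union = usa_ids | mobile_ids          # SET_UNION
set_intersect = usa_ids & mobile_ids      # SET_INTERSECT
set_diff = usa_ids - mobile_ids           # SET_DIFF (exactly two arguments)

# The query result is the cardinality of the resulting set
distinct_count = len(set_intersect)
```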
DistinctCountRawThetaSketch(<thetaSketchColumn>, <thetaSketchParams>, predicate1, predicate2..., postAggregationExpressionToEvaluate) -> HexEncoded Serialized Sketch Bytes
This is the same as the previous function, except it returns the byte serialized sketch instead of the cardinality of the sketch. Since Pinot returns responses as JSON strings, bytes are returned as hex encoded strings. The hex encoded string can be deserialized into a sketch by using the library org.apache.commons.codec.binary as Hex.decodeHex(stringValue.toCharArray()).
To see how JSON data can be queried, assume that we have the following table:
We also assume that "jsoncolumn" has a JSON index on it. Note that the last two rows in the table have a different structure than the rest of the rows. In keeping with the JSON specification, a JSON column can contain any valid JSON data and doesn't need to adhere to a predefined schema. To pull out the entire JSON document for each row, we can run the query below:
"101", "{"name":{"first":"daffy","last":"duck"},"score":101,"data":["a","b","c","d"]}"
"102", "{"name":{"first":"donald","last":"duck"},"score":102,"data":["a","b","e","f"]}"
"103", "{"name":{"first":"mickey","last":"mouse"},"score":103,"data":["a","b","g","h"]}"
"104", "{"name":{"first":"minnie","last":"mouse"},"score":104,"data":["a","b","i","j"]}"
"105", "{"name":{"first":"goofy","last":"dwag"},"score":104,"data":["a","b","i","j"]}"
"106", "{"person":{"name":"daffy duck","companies":[{"name":"n1","title":"t1"},{"name":"n2","title":"t2"}]}}"
"107", "{"person":{"name":"scrooge mcduck","companies":[{"name":"n1","title":"t1"},{"name":"n2","title":"t2"}]}}"
To drill down and pull out specific keys within the JSON column, we simply append the JsonPath expression of those keys to the end of the column name.
"101", "duck", "daffy", "b"
"102", "duck", "donald", "b"
"103", "mouse", "mickey", "b"
"104", "mouse", "minnie", "b"
"105", "dwag", "goofy", "b"
"106", "null", "null", "null"
"107", "null", "null", "null"
Note that the third column (jsoncolumn.data[1]) is null for rows with id 106 and 107. This is because these rows have JSON documents that don't have a key with JsonPath jsoncolumn.data[1]. We can filter out these rows.
"101", "duck", "daffy", "b"
"102", "duck", "donald", "b"
"103", "mouse", "mickey", "b"
"104", "mouse", "minnie", "b"
"105", "dwag", "goofy", "b"
Notice that certain last names (duck and mouse for example) repeat in the data above. We can get a count of each last name by running a GROUP BY query on a JsonPath expression.
"mouse", "2"
"duck", "2"
"dwag", "1"
There is also numerical information (jsoncolumn.score) embedded within the JSON document. We can extract those numerical values from the JSON data and sum them up using the query below.
"mouse", "207"
"dwag", "104"
"duck", "203"
In short, JSON querying support in Pinot allows you to use a JsonPath expression wherever you can use a column name, with the only difference being that to query a column with data type JSON, you must append a JsonPath expression after the name of the column.
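To make this behavior concrete outside Pinot, here is a Python sketch that supports only simple dotted paths (a simplification of full JsonPath) and reproduces the last-name GROUP BY counts from the sample data above:

```python
import json
from collections import Counter

# Sample documents mirroring the table above (trimmed to the relevant fields)
rows = [
    '{"name":{"first":"daffy","last":"duck"},"score":101}',
    '{"name":{"first":"donald","last":"duck"},"score":102}',
    '{"name":{"first":"mickey","last":"mouse"},"score":103}',
    '{"name":{"first":"minnie","last":"mouse"},"score":104}',
    '{"name":{"first":"goofy","last":"dwag"},"score":104}',
    '{"person":{"name":"daffy duck"}}',
]

def extract(doc_str, dotted_path):
    """Walk a simple dotted path such as 'name.last'; return None when any
    key is missing, mirroring the null cells Pinot returns for absent paths."""
    node = json.loads(doc_str)
    for key in dotted_path.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node

# SELECT jsoncolumn.name.last, count(*) FROM t GROUP BY jsoncolumn.name.last
last_names = [extract(r, "name.last") for r in rows]
counts = Counter(v for v in last_names if v is not None)
```

The document without a name key yields None, just as the Pinot results show null for rows 106 and 107.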
Pinot provides support for aggregations using GROUP BY. You can use the following functions to get the aggregated value.
Function
Description
Example
COUNT
Get the count of rows in a group
COUNT(*)
MIN
Get the minimum value in a group
MIN(playerScore)
MAX
Get the maximum value in a group
MAX(playerScore)
SUM
Get the sum of values in a group
SUM(playerScore)
AVG
Get the average of the values in a group
AVG(playerScore)
MODE
Get the most frequent value in a group. When multiple modes are present it gives the minimum of all the modes. This behavior can be overridden to get the maximum or the average mode.
MODE(playerScore)
MODE(playerScore, 'MIN')
MODE(playerScore, 'MAX')
MODE(playerScore, 'AVG')
MINMAXRANGE
Returns the max - min value in a group
MINMAXRANGE(playerScore)
PERCENTILE(column, N)
Returns the Nth percentile of the group where N is a decimal number between 0 and 100 inclusive
PERCENTILE(playerScore, 50), PERCENTILE(playerScore, 99.9)
PERCENTILEEST(column, N)
Returns the Nth percentile of the group using the Quantile Digest algorithm
PERCENTILEEST(playerScore, 50), PERCENTILEEST(playerScore, 99.9)
PERCENTILETDIGEST(column, N)
Returns the Nth percentile of the group using the T-Digest algorithm
PERCENTILETDIGEST(playerScore, 50), PERCENTILETDIGEST(playerScore, 99.9)
DISTINCT
Returns the distinct row values in a group
DISTINCT(playerName)
DISTINCTCOUNT
Returns the count of distinct row values in a group
DISTINCTCOUNT(playerName)
DISTINCTCOUNTBITMAP
Returns the count of distinct row values in a group. This function is accurate for INT columns, but approximate for other cases where hash codes are used in distinct counting and there may be hash collisions.
DISTINCTCOUNTBITMAP(playerName)
DISTINCTCOUNTHLL
Returns an approximate distinct count using HyperLogLog. It also takes an optional second argument to configure the log2m for the HyperLogLog.
DISTINCTCOUNTHLL(playerName, 12)
DISTINCTCOUNTRAWHLL
Returns HLL response serialized as string. The serialized HLL can be converted back into an HLL and then aggregated with other HLLs. A common use case may be to merge HLL responses from different Pinot tables, or to allow aggregation after client-side batching.
DISTINCTCOUNTRAWHLL(playerName)
FASTHLL (Deprecated)
WARN: will be deprecated soon. FASTHLL stores serialized HyperLogLog in String format, which performs worse than DISTINCTCOUNTHLL, which supports serialized HyperLogLog in BYTES (byte array) format
FASTHLL(playerName)
DISTINCTCOUNTTHETASKETCH
See the Theta Sketch functions described above.
DISTINCTCOUNTRAWTHETASKETCH
See the Theta Sketch functions described above.
SEGMENTPARTITIONEDDISTINCTCOUNT
Returns the count of distinct values of a column when the column is pre-partitioned for each segment, where there is no common value within different segments. This function calculates the exact count of distinct values within the segment, then simply sums up the results from different segments to get the final result.
SEGMENTPARTITIONEDDISTINCTCOUNT(playerName)
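The MODE tie-breaking options above can be stated precisely in a short sketch (a Python stand-in for the documented semantics; the function and argument names are illustrative):

```python
from collections import Counter

def mode(values, reducer="MIN"):
    """Most frequent value in a group; when several values tie for the
    highest frequency, reduce the tied set with MIN (default), MAX, or AVG."""
    counts = Counter(values)
    top = max(counts.values())
    modes = [v for v, c in counts.items() if c == top]
    if reducer == "MIN":
        return min(modes)
    if reducer == "MAX":
        return max(modes)
    if reducer == "AVG":
        return sum(modes) / len(modes)
    raise ValueError(f"unknown reducer {reducer!r}")
```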
The following aggregation functions can be used for multi-value columns
Function
Description
Example
COUNTMV
Get the count of rows in a group
COUNTMV(playerName)
MINMV
Get the minimum value in a group
MINMV(playerScores)
MAXMV
Get the maximum value in a group
MAXMV(playerScores)
SUMMV
Get the sum of values in a group
SUMMV(playerScores)
AVGMV
Get the avg of values in a group
AVGMV(playerScores)
MINMAXRANGEMV
Returns the max - min value in a group
MINMAXRANGEMV(playerScores)
PERCENTILEMV(column, N)
Returns the Nth percentile of the group where N is a decimal number between 0 and 100 inclusive
PERCENTILEMV(playerScores, 50),
PERCENTILEMV(playerScores, 99.9)
PERCENTILEESTMV(column, N)
PERCENTILEESTMV(playerScores, 50),
PERCENTILEESTMV(playerScores, 99.9)
PERCENTILETDIGESTMV(column, N)
PERCENTILETDIGESTMV(playerScores, 50),
PERCENTILETDIGESTMV(playerScores, 99.9)
DISTINCTCOUNTMV
Returns the count of distinct row values in a group
DISTINCTCOUNTMV(playerNames)
DISTINCTCOUNTBITMAPMV
Returns the count of distinct row values in a group. This function is accurate for INT or dictionary-encoded columns, but approximate for other cases where hash codes are used in distinct counting and there may be hash collisions.
DISTINCTCOUNTBITMAPMV(playerNames)
DISTINCTCOUNTHLLMV
Returns an approximate distinct count using HyperLogLog in a group
DISTINCTCOUNTHLLMV(playerNames)
DISTINCTCOUNTRAWHLLMV
Returns HLL response serialized as string. The serialized HLL can be converted back into an HLL and then aggregated with other HLLs. A common use case may be to merge HLL responses from different Pinot tables, or to allow aggregation after client-side batching.
DISTINCTCOUNTRAWHLLMV(playerNames)
FASTHLLMV (Deprecated)
stores serialized HyperLogLog in String format, which performs worse than DISTINCTCOUNTHLL, which supports serialized HyperLogLog in BYTES (byte array) format
FASTHLLMV(playerNames)
Learn how to write fast queries for looking up ids in a list of values.
A common use case is filtering on an id field with a list of values. This can be done with the IN clause, but this approach doesn't perform well with large lists of ids. In these cases, you can use an IdSet.
ID_SET(columnName, 'sizeThresholdInBytes=8388608;expectedInsertions=5000000;fpp=0.03')
This function returns a base 64 encoded IdSet of the values for a single column. The IdSet implementation used depends on the column data type:
INT - RoaringBitmap unless sizeThresholdInBytes is exceeded, in which case a Bloom Filter is used.
LONG - Roaring64NavigableMap unless sizeThresholdInBytes is exceeded, in which case a Bloom Filter is used.
Other types - Bloom Filter
The following parameters are used to configure the Bloom Filter:
expectedInsertions - Number of expected insertions for the BloomFilter, must be positive
fpp - Desired false positive probability for the BloomFilter, must be positive and < 1.0
Note that when a Bloom Filter is used, the filter results are approximate - you can get false-positive results (for membership in the set), leading to potentially unexpected results.
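The no-false-negatives / possible-false-positives behavior can be demonstrated with a toy Bloom filter (a Python sketch, not the filter implementation Pinot uses; the bit count and hash scheme here are arbitrary):

```python
import hashlib

class ToyBloomFilter:
    """Toy Bloom filter: k hash positions per value over a fixed bit array.
    Inserted values are always reported present; other values may collide
    with set bits and be reported present too (false positives)."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0                      # bit array packed into an int

    def _positions(self, value):
        for seed in range(self.num_hashes):
            digest = hashlib.sha1(f"{seed}:{value}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, value):
        for pos in self._positions(value):
            self.bits |= 1 << pos

    def might_contain(self, value):
        return all((self.bits >> pos) & 1 for pos in self._positions(value))
```

expectedInsertions and fpp play the analogous sizing role for the real filter: they determine how many bits and hash functions are needed to keep the false-positive rate near the requested value.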
IN_ID_SET(columnName, base64EncodedIdSet)
This function returns 1 if a column contains a value specified in the IdSet and 0 if it does not.
IN_SUBQUERY(columnName, subQuery)
This function generates an IdSet from a subquery and then filters ids based on that IdSet on a Pinot broker.
IN_PARTITIONED_SUBQUERY(columnName, subQuery)
This function generates an IdSet from a subquery and then filters ids based on that IdSet on a Pinot server.
This function works best when the data is partitioned by the id column and each server contains all the data for a partition. The generated IdSet for the subquery will be smaller as it will only contain the ids for the partitions served by the server. This will give better performance.
You can create an IdSet of the values in the yearID column by running the following:
When creating an IdSet for values in non INT/LONG columns, we can configure the expectedInsertions:
We can also configure the fpp parameter:
We can use the IN_ID_SET function to filter a query based on an IdSet. To return rows for yearIDs in the IdSet, run the following:
To return rows for yearIDs not in the IdSet, run the following:
To filter rows for yearIDs in the IdSet on a Pinot Broker, run the following query:
To filter rows for yearIDs not in the IdSet on a Pinot Broker, run the following query:
To filter rows for yearIDs in the IdSet on a Pinot Server, run the following query:
To filter rows for yearIDs not in the IdSet on a Pinot Server, run the following query:
The lookup UDF is used to get dimension data via primary key from a dimension table, enabling decoration join functionality. The lookup UDF can only be used with dimension tables in Pinot. The UDF signature is as below:
dimTableName
Name of the dim table to perform the lookup on.
dimColToLookUp
The column name of the dim table to be retrieved to decorate our result.
dimJoinKey
The column name on which we want to perform the lookup i.e. the join column name for dim table.
factJoinKeyVal
The value of the dim table join column for which we will retrieve the dimColToLookUp for the scope and invocation.
Return type of the UDF will be that of the dimColToLookUp column type. There can also be multiple primary keys and corresponding values.
Note: If the dimension table uses a composite primary key, i.e. multiple primary keys, ensure that the order of keys appearing in the lookup() UDF is the same as the order defined for "primaryKeyColumns" in the dimension table schema.
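In effect, lookup() is a hash probe into the dimension table keyed on the join column. A Python sketch with hypothetical dimension data (the table and column names below are made up for illustration):

```python
# Hypothetical dimension table, keyed on its primary key column "teamID"
dim_team = {
    "NYA": {"teamName": "Yankees", "teamCity": "New York"},
    "BOS": {"teamName": "Red Sox", "teamCity": "Boston"},
}

def lookup(dim_table, dim_col_to_look_up, fact_join_key_val):
    """Return dimColToLookUp from the dimension row whose join key equals
    the fact-side value; None when no dimension row matches."""
    row = dim_table.get(fact_join_key_val)
    return row[dim_col_to_look_up] if row is not None else None

# Decoration join, analogous to:
#   SELECT playerName, lookup('dimTeams', 'teamName', 'teamID', teamID) ...
team_name = lookup(dim_team, "teamName", "NYA")
```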
This document contains the list of all the transformation functions supported by Pinot SQL.
Multiple string functions are supported out of the box from release 0.5.0.
Date time functions allow you to perform transformations on columns that contain timestamps or dates.
Usage
'jsonPath' and 'results_type' are literals. Pinot uses single quotes to distinguish them from identifiers.
e.g. JSONEXTRACTSCALAR(profile_json_str, '$.name', 'STRING') is valid, while JSONEXTRACTSCALAR(profile_json_str, "$.name", "STRING") is invalid.
Transform functions can only be used in Pinot SQL. Scalar functions can be used for column transformation in table ingestion configs.
Examples
The examples below are based on these 3 sample profile JSON documents:
Query 1: Extract string values from the field 'name'
Results are
Query 2: Extract integer values from the field 'age'
Results are
Query 3: Extract Bob's age from the JSON profile.
Results are
Query 4: Extract all field keys of JSON profile.
Results are
Another example of extracting JSON fields from below JSON record:
Extract JSON fields:
All of the functions mentioned so far support only single-value columns. You can use the following functions to perform operations on multi-value columns.
Pinot supports Geospatial queries on columns containing text-based geographies. For more details on these queries and how to enable them, see the Geospatial documentation.
Pinot supports pattern matching on text-based columns. Only columns configured as text columns in the table config can be queried using this method. For more details on how to enable pattern matching, see the text search documentation.
ADD(col1, col2, col3...)
Sum of at least two values
ADD(score_maths, score_science, score_history)
SUB(col1, col2)
Difference between two values
SUB(total_score, score_science)
MULT(col1, col2, col3...)
Product of at least two values
MULT(score_maths, score_science, score_history)
DIV(col1, col2)
Quotient of two values
DIV(total_score, total_subjects)
MOD(col1, col2)
Modulo of two values
MOD(total_score, total_subjects)
ABS(col1)
Absolute of a value
ABS(score)
CEIL(col1)
Rounded up to the nearest integer.
CEIL(percentage)
FLOOR(col1)
Rounded down to the nearest integer.
FLOOR(percentage)
EXP(col1)
Euler's number (e) raised to the power of the column value.
EXP(age)
LN(col1)
Natural log of value i.e. ln(col1)
LN(age)
SQRT(col1)
Square root of a value
SQRT(height)
UPPER(col)
Convert the string to uppercase
UPPER(playerName)
LOWER(col)
Convert the string to lowercase
LOWER(playerName)
REVERSE(col)
Reverse the string
REVERSE(playerName)
SUBSTR(col, startIndex, endIndex)
Get the substring of the input string from startIndex to endIndex. Index begins at 0. Set endIndex to -1 to read to the end of the string
SUBSTR(playerName, 1, -1)
SUBSTR(playerName, 1, 4)
CONCAT(col1, col2, separator)
Concatenate two input strings using the separator
CONCAT(firstName, lastName, '-')
TRIM(col)
Trim spaces from both sides of the string
TRIM(playerName)
LTRIM(col)
Trim spaces from the left side of the string
LTRIM(playerName)
RTRIM(col)
Trim spaces from the right side of the string
RTRIM(playerName)
LENGTH(col)
Calculate the length of the string
LENGTH(playerName)
STRPOS(col, find, N)
Find the Nth instance of the find string in the input string. Returns 0 if the input string is empty. Returns -1 if the Nth instance is not found or the input string is null
STRPOS(playerName, 'david', 1)
STARTSWITH(col, prefix)
Returns true if the column starts with the prefix string
STARTSWITH(playerName, 'david')
REPLACE(col, find, substitute)
Replace all instances of find with substitute in the input string
REPLACE(playerName, 'david', 'henry')
RPAD(col, size, pad)
String padded from the right side with pad to reach the final size
RPAD(playerName, 20, 'foo')
LPAD(col, size, pad)
String padded from the left side with pad to reach the final size
LPAD(playerName, 20, 'foo')
CODEPOINT(col)
The Unicode codepoint of the first character of the string
CODEPOINT(playerName)
CHR(codepoint)
The character corresponding to the Unicode codepoint
CHR(68)
TIMECONVERT(col, fromUnit, toUnit)
Converts the value into another time unit. The column should be an epoch timestamp. Supported units are
DAYS HOURS MINUTES SECONDS MILLISECONDS MICROSECONDS NANOSECONDS
TIMECONVERT(time, 'MILLISECONDS', 'SECONDS')
This expression converts the value of the column time (taken to be in milliseconds) to the nearest second (i.e. the nearest second that is lower than the value of the time column).
DATETIMECONVERT(columnName, inputFormat, outputFormat, outputGranularity)
Takes 4 arguments, converts the value into another date time format, and buckets time based on the given time granularity. Note that for weeks/months/quarters/years, please use the function DATETRUNC.
The format is expressed as <time size>:<time unit>:<time format>:<pattern> where:
time size - size of the time unit, e.g. 1, 10
time unit - DAYS HOURS MINUTES SECONDS MILLISECONDS MICROSECONDS NANOSECONDS
time format - EPOCH or SIMPLE_DATE_FORMAT
pattern - defined in the case of SIMPLE_DATE_FORMAT, e.g. yyyy-MM-dd. A specific timezone can be passed using tz(timezone). The timezone can be in long or short string format, e.g. Asia/Kolkata or PDT
granularity - specified in the format <time size>:<time unit>
Convert Date from hoursSinceEpoch to daysSinceEpoch and bucket it to 1 day granularity
DATETIMECONVERT(Date, '1:HOURS:EPOCH', '1:DAYS:EPOCH', '1:DAYS')
Bucket Date to 15 minutes granularity
DATETIMECONVERT(Date, '1:MILLISECONDS:EPOCH', '1:MILLISECONDS:EPOCH', '15:MINUTES')
Convert Date from hoursSinceEpoch to format yyyyMMdd and bucket it to 1 day granularity
DATETIMECONVERT(Date, '1:HOURS:EPOCH', '1:DAYS:SIMPLE_DATE_FORMAT:yyyyMMdd', '1:DAYS')
Convert Date from milliseconds to format yyyyMMdd in timezone PST
DATETIMECONVERT(Date, '1:MILLISECONDS:EPOCH', '1:DAYS:SIMPLE_DATE_FORMAT:yyyyMMdd tz(America/Los_Angeles)', '1:DAYS')
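The granularity argument in the examples above is a truncation to bucket boundaries. A Python sketch of just the bucketing arithmetic for the 15-minute example (the format conversion is omitted; the helper name is illustrative):

```python
def bucket_millis(ts_millis, bucket_minutes):
    """Truncate an epoch-millis timestamp to the start of its
    bucket_minutes-sized bucket, returning epoch millis."""
    bucket_ms = bucket_minutes * 60 * 1000
    return ts_millis - ts_millis % bucket_ms

# 2021-02-16T10:45:03Z falls in the 15-minute bucket starting at 10:45:00
bucketed = bucket_millis(1613472303000, 15)
```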
DATETRUNC
(Presto) SQL compatible date truncation, equivalent to the Presto function date_trunc.
Takes at least 3 and up to 5 arguments, converts the value into a specified output granularity in seconds since UTC epoch that is bucketed on a unit in a specified timezone.
DATETRUNC('week', time_in_seconds, 'SECONDS')
This expression converts the column time_in_seconds, which is a long containing seconds since UTC epoch, truncated at WEEK (where a week starts at Monday UTC midnight). The output is a long of seconds since UTC epoch.
DATETRUNC('quarter', DIV(time_milliseconds, 1000), 'SECONDS', 'America/Los_Angeles', 'HOURS')
This expression converts the expression time_milliseconds/1000 into hours that are truncated to QUARTER at the Los Angeles time zone (where a quarter begins on 1/1, 4/1, 7/1, 10/1 in the Los Angeles timezone). The output is expressed as hours since UTC epoch (note that the output is not in the Los Angeles timezone).
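The WEEK case from the first example can be reproduced with standard library datetimes (a sketch of the UTC path only; the timezone-aware variants additionally need the zone rules Pinot applies internally):

```python
from datetime import datetime, timedelta, timezone

def date_trunc_week_seconds(ts_seconds):
    """DATETRUNC('week', ts, 'SECONDS') for UTC: snap an epoch-seconds
    value back to the most recent Monday 00:00:00 UTC."""
    dt = datetime.fromtimestamp(ts_seconds, tz=timezone.utc)
    monday = (dt - timedelta(days=dt.weekday())).replace(
        hour=0, minute=0, second=0, microsecond=0)
    return int(monday.timestamp())
```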
ToEpoch<TIME_UNIT>(timeInMillis)
Convert epoch milliseconds to epoch <Time Unit>. Supported <Time Unit>: SECONDS/MINUTES/HOURS/DAYS
ToEpochSeconds(tsInMillis): Converts the column tsInMillis value from epoch milliseconds to epoch seconds.
ToEpochDays(tsInMillis): Converts the column tsInMillis value from epoch milliseconds to epoch days.
ToEpoch<TIME_UNIT>Rounded(timeInMillis, bucketSize)
Convert epoch milliseconds to epoch <Time Unit>, rounded down to the nearest bucket (the bucket size is defined in <Time Unit>). Supported <Time Unit>: SECONDS/MINUTES/HOURS/DAYS
ToEpochSecondsRounded(tsInMillis, 10): Converts the column tsInMillis value from epoch milliseconds to epoch seconds and rounds to the 10-second bucket value. E.g. ToEpochSecondsRounded(1613472303000, 10) = 1613472300
ToEpochMinutesRounded(tsInMillis, 1440): Converts the column tsInMillis value from epoch milliseconds to epoch minutes and rounds to the 1-day (1440-minute) bucket value. E.g. ToEpochMinutesRounded(1613472303000, 1440) = 26890560
ToEpoch<TIME_UNIT>Bucket(timeInMillis, bucketSize)
Convert epoch milliseconds to epoch <Time Unit>, then divide by the bucket size (the bucket size is defined in <Time Unit>). Supported <Time Unit>: SECONDS/MINUTES/HOURS/DAYS
ToEpochSecondsBucket(tsInMillis, 10): Converts the column tsInMillis value from epoch milliseconds to epoch seconds, then divides by 10 to get the 10-seconds-since-epoch value. E.g. ToEpochSecondsBucket(1613472303000, 10) = 161347230
ToEpochHoursBucket(tsInMillis, 24): Converts the column tsInMillis value from epoch milliseconds to epoch hours, then divides by 24 to get the 24-hours-since-epoch value.
FromEpoch<TIME_UNIT>(timeIn<Time_UNIT>)
Convert epoch <Time Unit> to epoch milliseconds. Supported <Time Unit>: SECONDS/MINUTES/HOURS/DAYS
FromEpochSeconds(tsInSeconds): Converts the column tsInSeconds value from epoch seconds to epoch milliseconds. E.g. FromEpochSeconds(1613472303) = 1613472303000
FromEpoch<TIME_UNIT>Bucket(timeIn<Time_UNIT>, bucketSizeIn<Time_UNIT>)
Convert epoch <Bucket Size><Time Unit> to epoch milliseconds, e.g. 10-seconds-since-epoch or 5-minutes-since-epoch. Supported <Time Unit>: SECONDS/MINUTES/HOURS/DAYS
FromEpochSecondsBucket(tsInSeconds, 10): Converts the column tsInSeconds value from epoch 10-seconds to epoch milliseconds. E.g. FromEpochSecondsBucket(161347231, 10) = 1613472310000
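The Rounded, Bucket, and From variants above are plain integer arithmetic. A Python sketch reproducing the worked values (the helper names are illustrative, not Pinot API):

```python
MILLIS_PER = {
    "SECONDS": 1_000,
    "MINUTES": 60_000,
    "HOURS": 3_600_000,
    "DAYS": 86_400_000,
}

def to_epoch_rounded(ts_millis, unit, bucket_size):
    """ToEpoch<Unit>Rounded: convert millis to the unit, then snap down
    to the nearest bucket_size boundary (still expressed in the unit)."""
    in_unit = ts_millis // MILLIS_PER[unit]
    return in_unit - in_unit % bucket_size

def to_epoch_bucket(ts_millis, unit, bucket_size):
    """ToEpoch<Unit>Bucket: convert millis to the unit, then divide by the
    bucket size, yielding the bucket ordinal since epoch."""
    return ts_millis // MILLIS_PER[unit] // bucket_size

def from_epoch_bucket(bucket_value, unit, bucket_size):
    """FromEpoch<Unit>Bucket: the inverse direction, back to epoch millis."""
    return bucket_value * bucket_size * MILLIS_PER[unit]
```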
ToDateTime(timeInMillis, pattern[, timezoneId])
Convert an epoch millis value to a DateTime string represented by the pattern. The time zone will be set to UTC if timezoneId is not specified.
ToDateTime(tsInMillis, 'yyyy-MM-dd') converts the tsInMillis value to the date time pattern yyyy-MM-dd
ToDateTime(tsInMillis, 'yyyy-MM-dd ZZZ', 'America/Los_Angeles') converts the tsInMillis value to the date time pattern yyyy-MM-dd ZZZ in the America/Los_Angeles time zone
FromDateTime(dateTimeString, pattern)
Convert a DateTime string represented by the pattern to epoch millis.
FromDateTime(dateTime, 'yyyy-MM-dd') converts the dateTime string value to an epoch millis value
round(timeValue, bucketSize)
Round the given time value to nearest bucket start value.
round(tsInSeconds, 60)
round seconds epoch value to the start value of the 60 seconds bucket it belongs to. E.g. round(161347231, 60)= 161347200
now()
Return the current time as epoch millis
Typically used in a predicate to filter on timestamp for recent data, e.g. to filter data from the most recent day (86400 seconds): WHERE tsInMillis > now() - 86400000
timezoneHour(timeZoneId)
Returns the hour of the time zone offset.
timezoneMinute(timeZoneId)
Returns the minute of the time zone offset.
year(tsInMillis)
Returns the year from the given epoch millis in UTC timezone.
year(tsInMillis, timeZoneId)
Returns the year from the given epoch millis and timezone id.
yearOfWeek(tsInMillis)
Returns the year of the ISO week from the given epoch millis in UTC timezone. Alias yow
is also supported.
yearOfWeek(tsInMillis, timeZoneId)
Returns the year of the ISO week from the given epoch millis and timezone id. Alias yow
is also supported.
quarter(tsInMillis)
Returns the quarter of the year from the given epoch millis in UTC timezone. The value ranges from 1 to 4.
quarter(tsInMillis, timeZoneId)
Returns the quarter of the year from the given epoch millis and timezone id. The value ranges from 1 to 4.
month(tsInMillis)
Returns the month of the year from the given epoch millis in UTC timezone. The value ranges from 1 to 12.
month(tsInMillis, timeZoneId)
Returns the month of the year from the given epoch millis and timezone id. The value ranges from 1 to 12.
week(tsInMillis)
Returns the ISO week of the year from the given epoch millis in UTC timezone. The value ranges from 1 to 53. Alias weekOfYear
is also supported.
week(tsInMillis, timeZoneId)
Returns the ISO week of the year from the given epoch millis and timezone id. The value ranges from 1 to 53. Alias weekOfYear
is also supported.
dayOfYear(tsInMillis)
Returns the day of the year from the given epoch millis in UTC timezone. The value ranges from 1 to 366. Alias doy
is also supported.
dayOfYear(tsInMillis, timeZoneId)
Returns the day of the year from the given epoch millis and timezone id. The value ranges from 1 to 366. Alias doy
is also supported.
day(tsInMillis)
Returns the day of the month from the given epoch millis in UTC timezone. The value ranges from 1 to 31. Alias dayOfMonth
is also supported.
day(tsInMillis, timeZoneId)
Returns the day of the month from the given epoch millis and timezone id. The value ranges from 1 to 31. Alias dayOfMonth
is also supported.
dayOfWeek(tsInMillis)
Returns the day of the week from the given epoch millis in UTC timezone. The value ranges from 1(Monday) to 7(Sunday). Alias dow
is also supported.
dayOfWeek(tsInMillis, timeZoneId)
Returns the day of the week from the given epoch millis and timezone id. The value ranges from 1(Monday) to 7(Sunday). Alias dow
is also supported.
hour(tsInMillis)
Returns the hour of the day from the given epoch millis in UTC timezone. The value ranges from 0 to 23.
hour(tsInMillis, timeZoneId)
Returns the hour of the day from the given epoch millis and timezone id. The value ranges from 0 to 23.
minute(tsInMillis)
Returns the minute of the hour from the given epoch millis in UTC timezone. The value ranges from 0 to 59.
minute(tsInMillis, timeZoneId)
Returns the minute of the hour from the given epoch millis and timezone id. The value ranges from 0 to 59.
second(tsInMillis)
Returns the second of the minute from the given epoch millis in UTC timezone. The value ranges from 0 to 59.
second(tsInMillis, timeZoneId)
Returns the second of the minute from the given epoch millis and timezone id. The value ranges from 0 to 59.
millisecond(tsInMillis)
Returns the millisecond of the second from the given epoch millis in UTC timezone. The value ranges from 0 to 999.
millisecond(tsInMillis, timeZoneId)
Returns the millisecond of the second from the given epoch millis and timezone id. The value ranges from 0 to 999.
| Function | Type | Description |
| --- | --- | --- |
| JSONEXTRACTSCALAR(jsonField, 'jsonPath', 'resultsType', [defaultValue]) | Transform | Evaluates the 'jsonPath' on jsonField and returns the result as the type 'resultsType'; uses the optional defaultValue for null or parsing errors. |
| JSONEXTRACTKEY(jsonField, 'jsonPath') | Transform | Extracts all matched JSON field keys based on 'jsonPath' into a STRING_ARRAY. |
| TOJSONMAPSTR(map) | Scalar | Converts a map to a JSON string. |
| JSONFORMAT(object) | Scalar | Converts an object to a JSON string. |
| JSONPATH(jsonField, 'jsonPath') | Scalar | Extracts the object value from jsonField based on 'jsonPath'; the result type is inferred from the JSON value. Cannot be used in a query because the data type is not specified. |
| JSONPATHLONG(jsonField, 'jsonPath', [defaultValue]) | Scalar | Extracts the Long value from jsonField based on 'jsonPath'; uses the optional defaultValue for null or parsing errors. |
| JSONPATHDOUBLE(jsonField, 'jsonPath', [defaultValue]) | Scalar | Extracts the Double value from jsonField based on 'jsonPath'; uses the optional defaultValue for null or parsing errors. |
| JSONPATHSTRING(jsonField, 'jsonPath', [defaultValue]) | Scalar | Extracts the String value from jsonField based on 'jsonPath'; uses the optional defaultValue for null or parsing errors. |
| JSONPATHARRAY(jsonField, 'jsonPath') | Scalar | Extracts an array from jsonField based on 'jsonPath'; the result type is inferred from the JSON value. Cannot be used in a query because the data type is not specified. |
| JSONPATHARRAYDEFAULTEMPTY(jsonField, 'jsonPath') | Scalar | Extracts an array from jsonField based on 'jsonPath'; the result type is inferred from the JSON value. Returns an empty array for null or parsing errors. Cannot be used in a query because the data type is not specified. |

| Argument | Description |
| --- | --- |
| jsonField | An identifier/expression that contains JSON documents. |
| 'jsonPath' | Follows JsonPath syntax to read values from JSON documents. |
| 'resultsType' | One of the Pinot-supported data types: INT, LONG, FLOAT, DOUBLE, BOOLEAN, TIMESTAMP, STRING, INT_ARRAY, LONG_ARRAY, FLOAT_ARRAY, DOUBLE_ARRAY, STRING_ARRAY. |
| Expression | Value |
| --- | --- |
| JSONPATH(myJsonRecord, '$.name') | "Pete" |
| JSONPATH(myJsonRecord, '$.age') | 24 |
| JSONPATHSTRING(myJsonRecord, '$.age') | "24" |
| JSONPATHARRAY(myJsonRecord, '$.subjects[*].name') | ["maths", "english"] |
| JSONPATHARRAY(myJsonRecord, '$.subjects[*].score') | [90, 70] |
| JSONPATHARRAY(myJsonRecord, '$.subjects[*].homework_grades[1]') | [85, 65] |
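The same extractions can be sketched in plain Python against a record consistent with the values above. The exact myJsonRecord document is an assumption reconstructed from those outputs (in particular, the first homework grade of each subject is made up), so treat this as an analogue of the JsonPath evaluation, not Pinot's implementation:

```python
# Hypothetical record, reconstructed from the example outputs above
my_json_record = {
    "name": "Pete",
    "age": 24,
    "subjects": [
        {"name": "maths", "score": 90, "homework_grades": [80, 85]},
        {"name": "english", "score": 70, "homework_grades": [60, 65]},
    ],
}

# JSONPATH(myJsonRecord, '$.name') -> "Pete"
print(my_json_record["name"])
# JSONPATHSTRING(myJsonRecord, '$.age') -> "24" (value coerced to a string)
print(str(my_json_record["age"]))
# JSONPATHARRAY(myJsonRecord, '$.subjects[*].name') -> ["maths", "english"]
print([s["name"] for s in my_json_record["subjects"]])
# JSONPATHARRAY(myJsonRecord, '$.subjects[*].homework_grades[1]') -> [85, 65]
print([s["homework_grades"][1] for s in my_json_record["subjects"]])
```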
| Function | Description | Example |
| --- | --- | --- |
| SHA(bytesCol) | Returns the SHA-1 digest of a binary column (bytes type) as a hex string. | SHA(rawData) |
| SHA256(bytesCol) | Returns the SHA-256 digest of a binary column (bytes type) as a hex string. | SHA256(rawData) |
| SHA512(bytesCol) | Returns the SHA-512 digest of a binary column (bytes type) as a hex string. | SHA512(rawData) |
| MD5(bytesCol) | Returns the MD5 digest of a binary column (bytes type) as a hex string. | MD5(rawData) |
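Assuming these correspond to the standard digest algorithms of the same names, the hex strings Pinot returns should match what Python's hashlib produces for the same bytes. An illustrative sketch (the rawData value is made up):

```python
import hashlib

raw = b"hello pinot"  # stand-in for the contents of a bytes column

print(hashlib.sha1(raw).hexdigest())    # analogue of SHA(rawData)
print(hashlib.sha256(raw).hexdigest())  # analogue of SHA256(rawData)
print(hashlib.sha512(raw).hexdigest())  # analogue of SHA512(rawData)
print(hashlib.md5(raw).hexdigest())     # analogue of MD5(rawData)
```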
| Function | Description | Example |
| --- | --- | --- |
| ARRAYLENGTH | Returns the length of a multi-value column. | |
| MAP_VALUE | Selects the value for a key from a map stored in Pinot. | MAP_VALUE(mapColumn, 'myKey', valueColumn) |
| VALUEIN | Takes at least two arguments: the first is a multi-value column, and the rest are constant values. The transform function filters the values of the multi-value column down to the given constants. VALUEIN is especially useful when the same multi-value column is used as both a filtering column and a grouping column. | VALUEIN(mvColumn, 3, 5, 15) |
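For intuition, the per-row filtering that VALUEIN performs can be sketched in Python (a simplified analogue with made-up data, not Pinot's implementation):

```python
def value_in(mv_cell, *constants):
    """Keep only the values of one multi-value cell that appear in constants."""
    allowed = set(constants)
    return [v for v in mv_cell if v in allowed]

# Analogue of VALUEIN(mvColumn, 3, 5, 15) applied to a single row
print(value_in([1, 3, 5, 7, 15], 3, 5, 15))  # [3, 5, 15]
```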
Learn how to query Pinot using SQL
Pinot uses the Calcite SQL parser to parse queries and uses the MYSQL_ANSI dialect. You can see the grammar here.
Pinot does not support joins or nested subqueries; we recommend using Presto for queries that span multiple tables. Read Engineering Full SQL support for Pinot at Uber for more info.
There is no DDL support. Tables can be created via the REST API.
In Pinot SQL:
Double quotes (") are used to force string identifiers, e.g. column names.
Single quotes (') are used to enclose string literals.
Misusing them might cause unexpected query results:
E.g. WHERE a='b' means the predicate on the column a equals the string literal value 'b', while WHERE a="b" means the predicate on the column a equals the value of the column b.
Use single quotes for literals and double quotes (optional) for identifiers (column names).
If you name a column timestamp, date, or another reserved keyword, or the column name includes special characters, you need to use double quotes when you refer to it in the query.
For performant filtering of ids in a list, see Filtering with IdSet.
Note: results might not be consistent if the column being ordered by has the same value in multiple rows.
To count rows where the column airlineName starts with U
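In plain-Python terms, that count is analogous to the following (the row values are hypothetical):

```python
# Hypothetical values of the airlineName column, one per row
airline_names = ["United", "Delta", "US Airways", "Lufthansa"]

# Count rows whose airlineName starts with "U"
count = sum(1 for name in airline_names if name.startswith("U"))
print(count)  # 2 ("United" and "US Airways")
```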
Pinot supports the CASE-WHEN-ELSE statement.
Example 1:
Example 2:
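For intuition, a CASE-WHEN-ELSE expression evaluates its WHEN branches in order and falls through to ELSE, analogous to a chained if/elif in Python (the fare thresholds and labels here are hypothetical):

```python
def fare_bucket(fare):
    """Analogue of a hypothetical CASE WHEN fare < 50 ... ELSE ... END expression."""
    if fare < 50:
        return "cheap"
    elif fare < 200:
        return "medium"
    else:
        return "expensive"

print([fare_bucket(f) for f in [10, 100, 500]])  # ['cheap', 'medium', 'expensive']
```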
Functions have to be implemented within Pinot; injecting functions is not yet supported. The example below demonstrates the use of UDFs. More examples in Transform Function in Aggregation Grouping.
Pinot supports queries on BYTES columns using HEX strings. The query response also uses hex strings to represent bytes values.
E.g. the query below fetches all the rows for a given UID.
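A sketch of preparing such a lookup from a client, assuming the bytes literal is sent as a hex string (the UID value and the table/column names are hypothetical):

```python
# Hypothetical UID value stored in a BYTES column
uid = bytes([0xDE, 0xAD, 0xBE, 0xEF])

# Pinot represents BYTES literals and results as hex strings
hex_literal = uid.hex().upper()
print(hex_literal)  # DEADBEEF

# Hypothetical table and column names
query = f"SELECT * FROM myTable WHERE uid = '{hex_literal}'"
print(query)
```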