Query

Learn how to query Apache Pinot using SQL or explore data using the web-based Pinot query console.

Querying Pinot

Learn how to query Pinot using SQL

SQL Interface

Pinot provides a SQL interface for querying. It uses the Calcite SQL parser to parse queries and the MYSQL_ANSI dialect. You can see the grammar in the Calcite documentation.

Limitations

The latest Pinot multi-stage engine supports inner joins, left outer joins, semi-joins, and nested queries out of the box. It is optimized for in-memory processing and low latency.
- For queries that require a large amount of data shuffling, require spill-to-disk, or hit any other limitation of the multi-stage engine, we still recommend using Presto.

Identifier vs Literal

In Pinot SQL:

Double quotes (") are used to force string identifiers, e.g. column names.
Single quotes (') are used to enclose string literals. If the string literal itself contains a single quote, escape it with another single quote, e.g. '''Pinot''' matches the string literal 'Pinot'.

Misusing them can cause unexpected query results. For example:

WHERE a='b' means the predicate compares column a to the string literal value 'b'
WHERE a="b" means the predicate compares column a to the value of column b

If your column names use reserved keywords (e.g. timestamp or date) or special characters, you need to enclose them in double quotes when referring to them in queries.
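
The quoting rules above can be sketched as two small helpers for client code that assembles query strings. This is a minimal illustration, not part of Pinot: the helper names, the table name myTable, and the column names are hypothetical, and identifiers that themselves contain a double quote are rejected rather than escaped.

```python
def quote_identifier(name: str) -> str:
    """Wrap a column or table name in double quotes so it is read as an identifier."""
    if '"' in name:
        # Escaping embedded double quotes is out of scope for this sketch.
        raise ValueError("identifier must not contain a double quote")
    return '"' + name + '"'


def quote_literal(value: str) -> str:
    """Wrap a string in single quotes, doubling any embedded single quote."""
    return "'" + value.replace("'", "''") + "'"


# Hypothetical table and column names, for illustration only.
column = quote_identifier("timestamp")   # reserved keyword used as a column name
literal = quote_literal("'Pinot'")       # a value that contains single quotes
query = f"SELECT {column} FROM myTable WHERE name = {literal}"
print(query)
```

Keeping the two helpers separate makes it harder to mix up identifiers and literals, which is the mistake the WHERE a='b' versus WHERE a="b" example warns about.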

Note: Defining decimal literals within quotes preserves precision.

Example Queries

Selection

Aggregation

Grouping on Aggregation

Ordering on Aggregation

Filtering

For performant filtering of ids in a list, see Filtering with IdSet.

Filtering with NULL predicate

Selection (Projection)

Ordering on Selection

Pagination on Selection

Results might not be consistent if the ORDER BY column has the same value in multiple rows.

Wild-card match (in WHERE clause only)

To count rows where the column airlineName starts with U

Case-When Statement

Pinot supports the CASE-WHEN-ELSE statement.

Example 1:

Example 2:

UDF

Functions have to be implemented within Pinot. Injecting functions is not yet supported. The example below demonstrates the use of UDFs.

BYTES column

Pinot supports queries on BYTES columns using hex strings. The query response also represents bytes values as hex strings.

For example, the query below fetches all the rows for a given UID.
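
As a client-side illustration of the hex representation, the sketch below converts raw bytes to a quoted hex string for a predicate and converts a hex string from a response back to bytes. The table name myTable, the UID value, and the helper names are hypothetical; it also assumes lowercase hex digits are accepted, which is not stated above.

```python
def bytes_to_hex_literal(value: bytes) -> str:
    """Render raw bytes as a single-quoted hex string for use in a predicate."""
    return "'" + value.hex() + "'"


def hex_to_bytes(hex_string: str) -> bytes:
    """Decode a hex string, as returned in a query response, back to bytes."""
    return bytes.fromhex(hex_string)


uid = bytes([0xC8, 0xB3, 0xBC, 0xE0])  # hypothetical UID value
query = f"SELECT * FROM myTable WHERE UID = {bytes_to_hex_literal(uid)}"
print(query)

# Round trip: the hex string in a response decodes to the original bytes.
decoded = hex_to_bytes("c8b3bce0")
print(decoded == uid)
```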

Aggregation Functions

Function | Description | Example | Default Value When No Record Selected
COUNT | Returns the count of the records as Long | COUNT(*) | 0
COVAR_POP | Returns the population covariance between two numerical columns as Double | COVAR_POP(col1, col2) |

Deprecated functions:

Function

Description

Example

Multi-value column functions

The following aggregation functions can be used for multi-value columns

Function

FILTER Clause in aggregation

Pinot supports FILTER clause in aggregation queries as follows:

In the query above, COL1 is aggregated only for rows where COL2 > 300 and COL3 > 50. Similarly, COL2 is aggregated only for rows where COL2 < 50 and COL3 > 50.

With NULL value support enabled, this allows filtering out null values while performing aggregation, as follows:

In the above query, COL1 is aggregated only for the non-null values. Without NULL value support, we would have to filter using the default null value.

NOTE: The FILTER clause is currently supported for aggregation-only queries, i.e., GROUP BY is not supported.
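
The semantics of a filtered aggregation can be mimicked in a few lines of Python. This is only a model of the behavior described above, not Pinot code: it assumes the aggregation is SUM (the text does not name the function), and the sample rows and the helper filtered_sum are made up for illustration.

```python
from typing import Callable, Dict, Iterable, Optional

Row = Dict[str, Optional[int]]


def filtered_sum(rows: Iterable[Row], column: str,
                 predicate: Callable[[Row], bool]) -> int:
    """Sum `column` over only those rows for which `predicate` is true."""
    return sum(row[column] for row in rows if predicate(row))


rows = [
    {"COL1": 10, "COL2": 400, "COL3": 60},
    {"COL1": 20, "COL2": 40, "COL3": 70},
    {"COL1": 30, "COL2": 350, "COL3": 10},
    {"COL1": 40, "COL2": 500, "COL3": 90},
]

# Models SUM(COL1) FILTER (WHERE COL2 > 300 AND COL3 > 50)
sum_col1 = filtered_sum(rows, "COL1", lambda r: r["COL2"] > 300 and r["COL3"] > 50)
# Models SUM(COL2) FILTER (WHERE COL2 < 50 AND COL3 > 50)
sum_col2 = filtered_sum(rows, "COL2", lambda r: r["COL2"] < 50 and r["COL3"] > 50)
print(sum_col1, sum_col2)  # 50 40

# Models SUM(COL1) FILTER (WHERE COL1 IS NOT NULL)
rows_with_nulls = [{"COL1": 5}, {"COL1": None}, {"COL1": 7}]
non_null_sum = filtered_sum(rows_with_nulls, "COL1", lambda r: r["COL1"] is not None)
print(non_null_sum)  # 12
```

Each aggregation carries its own predicate, so two aggregations in the same query can see different subsets of the rows.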

Deprecated functions:

Function

Description

Example

Grouping Algorithm

This guide describes the heuristics that Pinot's grouping algorithm uses to trim results when processing GROUP BY queries, so that the server doesn't run out of memory.

Within segment

When grouping rows within a segment, Pinot keeps a maximum of <numGroupsLimit> groups per segment. This value is set to 100,000 by default and can be configured with the pinot.server.query.executor.num.groups.limit property.
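
To make the effect of a per-segment group limit concrete, here is a toy model in Python. It assumes that once the limit is reached, rows belonging to new groups are skipped while groups that already exist keep accumulating; that detail is an assumption of this sketch, and the function name and sample data are hypothetical.

```python
from typing import Dict, Iterable, Tuple


def group_sum_with_limit(rows: Iterable[Tuple[str, int]],
                         num_groups_limit: int) -> Dict[str, int]:
    """Sum values per key, keeping at most `num_groups_limit` distinct keys."""
    groups: Dict[str, int] = {}
    for key, value in rows:
        if key in groups:
            groups[key] += value          # existing group keeps aggregating
        elif len(groups) < num_groups_limit:
            groups[key] = value           # room left: create a new group
        # Otherwise the limit is reached and the row's new group is skipped.
    return groups


segment_rows = [("a", 1), ("b", 2), ("c", 3), ("a", 4), ("d", 5), ("b", 6)]
result = group_sum_with_limit(segment_rows, num_groups_limit=2)
print(result)  # {'a': 5, 'b': 8}
```

In this model the groups c and d never appear in the output, which shows why a limit that is too low for the data can produce incomplete GROUP BY results.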

Query Options

This document contains all the available query options

Supported Query Options

Key

Description

Default Behavior

Cardinality Estimation

Cardinality estimation is a classic problem. Pinot offers several ways to solve it, each with a different trade-off between accuracy and latency.

Accurate Results

Functions:

DistinctCount(x) -> LONG

Returns the accurate count of all unique values in a column.

The underlying implementation uses an IntOpenHashSet from the library it.unimi.dsi:fastutil:8.2.3 to hold all the unique values.

Approximation Results

It usually takes a lot of resources and time to compute accurate results for unique counting on large datasets. In some circumstances, we can tolerate a certain error rate, in which case we can use approximation functions to tackle this problem.

HyperLogLog

HyperLogLog is an approximation algorithm for unique counting. It uses a fixed number of bits to estimate the cardinality of a given data set.

Pinot uses the library com.clearspring.analytics:stream:2.7.0 to provide the data structure that holds intermediate results.

Functions:

DistinctCountHLL(x) -> LONG

For column types INT/LONG/FLOAT/DOUBLE/STRING, Pinot treats each value as an individual entry to add to the HyperLogLog object, then computes the approximation by calling the cardinality() method.

For column type BYTES, Pinot treats each value as a serialized HyperLogLog object with pre-aggregated values inside. The bytes value is generated by org.apache.pinot.core.common.ObjectSerDeUtils.HYPER_LOG_LOG_SER_DE.serialize(hyperLogLog).

All deserialized HyperLogLog objects are merged into one, and then the cardinality() method is called to get the approximate unique count.
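
The add, merge, and cardinality steps can be illustrated with a simplified HyperLogLog written from scratch. This is a teaching sketch, not the clearspring implementation Pinot uses: the class below, its register count (2^12), and the SHA-1 based hash are choices made for this example only.

```python
import hashlib
import math


class HyperLogLog:
    """Simplified HyperLogLog with 2**p registers and a 64-bit hash."""

    def __init__(self, p: int = 12):
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m

    def add(self, value) -> None:
        digest = hashlib.sha1(str(value).encode("utf-8")).digest()
        x = int.from_bytes(digest[:8], "big")       # 64-bit hash of the value
        index = x >> (64 - self.p)                  # first p bits pick a register
        rest = x & ((1 << (64 - self.p)) - 1)       # remaining 64 - p bits
        # Rank = 1-based position of the leftmost 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[index] = max(self.registers[index], rank)

    def merge(self, other: "HyperLogLog") -> None:
        """Merge another sketch of the same size by taking register maxima."""
        self.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]

    def cardinality(self) -> int:
        m = self.m
        alpha = 0.7213 / (1 + 1.079 / m)
        raw = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * m and zeros:
            # Small-range correction (linear counting).
            return round(m * math.log(m / zeros))
        return round(raw)


# Two sketches built over overlapping ranges, then merged into one.
first, second = HyperLogLog(), HyperLogLog()
for i in range(6000):
    first.add(i)
for i in range(4000, 10000):
    second.add(i)
first.merge(second)
estimate = first.cardinality()
print(estimate)  # close to the true distinct count of 10000
```

Merging takes the maximum of each register pair, so overlapping values are not double counted. This is what makes it possible to combine pre-aggregated sketches stored in a BYTES column.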

Theta Sketches

The Theta Sketch framework enables set operations over a stream of data, and can also be used for cardinality estimation. Pinot uses the sketch class and its extensions from the library org.apache.datasketches:datasketches-java:1.2.0-incubating to perform distinct counting as well as to evaluate set operations.

Functions:

DistinctCountThetaSketch(<thetaSketchColumn>, <thetaSketchParams>, predicate1, predicate2..., postAggregationExpressionToEvaluate) -> LONG
- thetaSketchColumn (required): Name of the column to aggregate on.
- thetaSketchParams (required): Parameters for constructing the intermediate theta-sketches. Currently, the only supported parameter is nominalEntries.

In the example query below, the WHERE clause is responsible for identifying the matching rows. Note that the WHERE clause can be completely independent of the postAggregationExpression. Once matching rows are identified, each server unions all the sketches that match the individual predicates, i.e. country='USA' and device='mobile' in this case. Once the broker receives the intermediate sketches for each of these individual predicates from all servers, it performs the final aggregation by evaluating the postAggregationExpression and returns the final cardinality of the resulting sketch.
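
The server and broker steps described above can be modeled with exact Python sets standing in for theta sketches. This is a sketch of the data flow only: real theta sketches are approximate, the rows are assumed to have already passed the WHERE clause, the post-aggregation expression is assumed to be an intersection of the two predicates, and all names and sample rows are hypothetical.

```python
from typing import Callable, Dict, List, Set

Row = Dict[str, str]
Predicate = Callable[[Row], bool]


def server_partial(rows: List[Row], id_column: str,
                   predicates: Dict[str, Predicate]) -> Dict[str, Set[str]]:
    """Build one set of ids per predicate from the rows held by one server."""
    partial: Dict[str, Set[str]] = {name: set() for name in predicates}
    for row in rows:
        for name, predicate in predicates.items():
            if predicate(row):
                partial[name].add(row[id_column])
    return partial


def broker_reduce(partials: List[Dict[str, Set[str]]],
                  post_aggregation: Callable[[Dict[str, Set[str]]], Set[str]]) -> int:
    """Union the per-predicate sets from all servers, then evaluate the expression."""
    merged: Dict[str, Set[str]] = {}
    for partial in partials:
        for name, ids in partial.items():
            merged.setdefault(name, set()).update(ids)
    return len(post_aggregation(merged))


predicates = {
    "usa": lambda r: r["country"] == "USA",
    "mobile": lambda r: r["device"] == "mobile",
}
server1 = [
    {"user": "u1", "country": "USA", "device": "mobile"},
    {"user": "u2", "country": "USA", "device": "desktop"},
]
server2 = [
    {"user": "u5", "country": "CAN", "device": "mobile"},
    {"user": "u3", "country": "USA", "device": "mobile"},
    {"user": "u4", "country": "CAN", "device": "desktop"},
]
partials = [server_partial(rows, "user", predicates) for rows in (server1, server2)]
count = broker_reduce(partials, lambda s: s["usa"] & s["mobile"])
print(count)  # 2 (users u1 and u3)
```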

DistinctCountRawThetaSketch(<thetaSketchColumn>, <thetaSketchParams>, predicate1, predicate2..., postAggregationExpressionToEvaluate) -> HexEncoded Serialized Sketch Bytes

This is the same as the previous function, except it returns the byte-serialized sketch instead of the cardinality. Since Pinot returns responses as JSON strings, bytes are returned as hex-encoded strings. The hex-encoded string can be deserialized into a sketch by using the library org.apache.commons.codec.binary as Hex.decodeHex(stringValue.toCharArray()).

Explain Plan

Query execution within Pinot is modeled as a sequence of operators that are executed in a pipelined manner to produce the final result. The output of the EXPLAIN PLAN statement can be used to see how queries are being run or to further optimize queries.

Introduction

EXPLAIN PLAN can be run in two modes: verbose and non-verbose (default), selected with a query option. To enable verbose mode, pass the query option explainPlanVerbose=true.

The root operator of the EXPLAIN PLAN output is BROKER_REDUCE. BROKER_REDUCE indicates that the Broker is processing and combining server results into the final result that is sent back to the user. BROKER_REDUCE has a COMBINE operator as its child. The Combine operator combines the results of query evaluation from each segment on the server and sends the combined result to the Broker. There are several combine operators (COMBINE_GROUPBY_ORDERBY, COMBINE_DISTINCT, COMBINE_AGGREGATE, etc.) that run depending on the operations being performed by the query.

Under the Combine operator, either a Select (SELECT, SELECT_ORDERBY, etc.) or an Aggregate (AGGREGATE, AGGREGATE_GROUPBY_ORDERBY, etc.) operator can appear. An Aggregate operator is present when the query performs aggregation (count(*), min, max, etc.); otherwise, a Select operator is present. If the query performs scalar transformations (Addition, Multiplication, Concat, etc.), a TRANSFORM operator appears under the SELECT operator. Often a TRANSFORM_PASSTHROUGH operator is present instead of the TRANSFORM operator. TRANSFORM_PASSTHROUGH just passes results from operators that appear lower in the operator execution hierarchy to the SELECT operator.

DOC_ID_SET operators usually appear above FILTER operators and indicate that a list of matching document IDs is assessed. FILTER operators usually appear at the bottom of the operator hierarchy and show index use. For example, the presence of FILTER_FULL_SCAN indicates that no index was used (and hence the query is likely to run relatively slowly). However, if the query used an index, one of the indexed filter operators (FILTER_SORTED_INDEX, FILTER_RANGE_INDEX, FILTER_INVERTED_INDEX, FILTER_JSON_INDEX, etc.) will show up.
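
As a small illustration of how the filter operators can be read, the sketch below scans a list of operator names and separates full scans from indexed filters. The operator lists are written by hand for this example and are not actual EXPLAIN PLAN output; the function name and the assumption that each entry starts with the operator name are also hypothetical.

```python
from typing import Dict, List

INDEXED_FILTER_OPERATORS = (
    "FILTER_SORTED_INDEX",
    "FILTER_RANGE_INDEX",
    "FILTER_INVERTED_INDEX",
    "FILTER_JSON_INDEX",
)


def classify_filters(operators: List[str]) -> Dict[str, List[str]]:
    """Split the filter operators of a plan into full scans and indexed filters."""
    full_scans = [op for op in operators if op.startswith("FILTER_FULL_SCAN")]
    indexed = [op for op in operators if op.startswith(INDEXED_FILTER_OPERATORS)]
    return {"full_scan": full_scans, "indexed": indexed}


# Hand-written operator lists, ordered from the root of the plan downwards.
plan_without_index = ["BROKER_REDUCE", "COMBINE_AGGREGATE", "AGGREGATE",
                      "DOC_ID_SET", "FILTER_FULL_SCAN"]
plan_with_index = ["BROKER_REDUCE", "COMBINE_AGGREGATE", "AGGREGATE",
                   "DOC_ID_SET", "FILTER_INVERTED_INDEX"]

slow = classify_filters(plan_without_index)
fast = classify_filters(plan_with_index)
print(slow)  # a non-empty "full_scan" list suggests adding an index
print(fast)
```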

User-Defined Functions (UDFs)

Pinot currently supports two ways for you to implement your own functions:

Groovy Scripts
Scalar Functions

Groovy Scripts

Pinot allows you to run any function using Groovy scripts. The syntax for executing a Groovy script within a query is as follows:

GROOVY('result value metadata json', 'groovy script', arg0, arg1, arg2...)

This function executes the Groovy script using the arguments provided and returns the result that matches the provided result value metadata. The function requires the following arguments:

Result value metadata json - JSON string representing the result value metadata. Must contain the non-null keys returnType and isSingleValue.
Groovy script to execute - Groovy script string, which refers to the provided arguments as arg0, arg1, and so on.

Examples

Add colA and colB and return a single-value INT:
groovy('{"returnType":"INT","isSingleValue":true}', 'arg0 + arg1', colA, colB)
Find the max element in the mvColumn array and return a single-value INT:
groovy('{"returnType":"INT","isSingleValue":true}', 'arg0.toList().max()', mvColumn)
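
When such expressions are assembled by client code, a small helper can keep the metadata JSON and the quoting consistent. The helper below is a hypothetical sketch and not part of Pinot; it uses the returnType and isSingleValue keys seen in the examples above and doubles any single quote inside the script so that the script stays a valid SQL string literal.

```python
import json


def groovy_call(return_type: str, is_single_value: bool, script: str, *args: str) -> str:
    """Build a GROOVY(...) expression from metadata, a script, and argument names."""
    metadata = json.dumps(
        {"returnType": return_type, "isSingleValue": is_single_value},
        separators=(",", ":"),  # compact JSON without spaces
    )
    escaped_script = script.replace("'", "''")  # escape quotes for the SQL literal
    parts = ["'" + metadata + "'", "'" + escaped_script + "'", *args]
    return "GROOVY(" + ", ".join(parts) + ")"


add_expr = groovy_call("INT", True, "arg0 + arg1", "colA", "colB")
max_expr = groovy_call("INT", True, "arg0.toList().max()", "mvColumn")
print(add_expr)
print(max_expr)
```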

⚠️ Note that a Groovy script doesn't accept built-in scalar functions that are specific to Pinot queries. See the section below for more information.

⚠️ Enabling Groovy

Allowing executable Groovy in queries can be a security vulnerability. Use caution and be aware of the security risks if you decide to allow Groovy. To enable Groovy in Pinot queries, set the following broker config:

pinot.broker.disable.query.groovy=false

If not set, Groovy in queries is disabled by default.

The above configuration applies across the entire Pinot cluster. If you want a table level override to enable/disable Groovy queries, the following property can be set in the query table config.

Scalar Functions

Since the 0.5.0 release, Pinot supports custom functions that return a single output for multiple inputs.

Pinot automatically identifies and registers all the functions that have the @ScalarFunction annotation.

Only Java methods are supported.

Adding user defined scalar functions

You can add new scalar functions as follows:

Create a new Java project. Make sure you keep the package name as org.apache.pinot.scalar.XXXX.
Include the dependency in your Java project.

Annotate your methods with the @ScalarFunction annotation. Make sure the method is static and returns only a single value. The input and output can have one of the following types:
- Integer

Place the compiled JAR in the /plugins directory in Pinot. You will need to restart all Pinot instances if they are already running.
Now you can use the function in a query as follows:

⚠️ Note that the function name in SQL is the same as the function name in Java. The SQL function name is case-insensitive as well.

Filtering with IdSet

Learn how to write fast queries for looking up ids in a list of values.

A common use case is filtering on an id field with a list of values. This can be done with the IN clause, but this approach doesn't perform well with large lists of ids. In these cases, you can use an IdSet.

Functions

ID_SET

ID_SET(columnName, 'sizeThresholdInBytes=8388608;expectedInsertions=5000000;fpp=0.03' )

This function returns a base 64 encoded IdSet of the values for a single column. The IdSet implementation used depends on the column data type:

INT - RoaringBitmap unless sizeThresholdInBytes is exceeded, in which case Bloom Filter.
LONG - Roaring64NavigableMap unless sizeThresholdInBytes is exceeded, in which case Bloom Filter.
Other types - Bloom Filter

The following parameters are used to configure the Bloom Filter:

expectedInsertions - Number of expected insertions for the BloomFilter, must be positive
fpp - Desired false positive probability for the BloomFilter, must be positive and < 1.0

Note that when a Bloom Filter is used, the filter results are approximate - you can get false-positive results (for membership in the set), leading to potentially unexpected results.
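
To show how expectedInsertions and fpp shape a Bloom filter, and why false positives occur while false negatives do not, here is a self-contained Python sketch. It is not the implementation Pinot uses: the sizing formulas are the standard textbook ones, and the class name, hashing scheme, and sample ids are choices made for this example.

```python
import hashlib
import math


class BloomFilter:
    """Textbook Bloom filter sized from expected insertions and target fpp."""

    def __init__(self, expected_insertions: int, fpp: float):
        if expected_insertions <= 0 or not 0 < fpp < 1:
            raise ValueError("expected_insertions must be > 0 and 0 < fpp < 1")
        ln2 = math.log(2)
        # Standard sizing: bits m = -n * ln(p) / (ln 2)^2, hashes k = (m / n) * ln 2
        self.num_bits = max(1, math.ceil(-expected_insertions * math.log(fpp) / ln2 ** 2))
        self.num_hashes = max(1, round(self.num_bits / expected_insertions * ln2))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, value):
        digest = hashlib.sha256(str(value).encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd step for double hashing
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, value) -> None:
        for pos in self._positions(value):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, value) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(value))


members = [f"player-{i}" for i in range(1000)]
non_members = [f"other-{i}" for i in range(2000)]

bloom = BloomFilter(expected_insertions=1000, fpp=0.03)
for name in members:
    bloom.add(name)

all_found = all(bloom.might_contain(name) for name in members)
false_positive_rate = sum(bloom.might_contain(n) for n in non_members) / len(non_members)
print(all_found)             # inserted values are always reported as present
print(false_positive_rate)   # small but usually non-zero
```

A lower fpp or a higher expectedInsertions makes the filter larger, which trades memory for fewer false positives.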

IN_ID_SET

IN_ID_SET(columnName, base64EncodedIdSet)

This function returns 1 if a column contains a value specified in the IdSet and 0 if it does not.

IN_SUBQUERY

IN_SUBQUERY(columnName, subQuery)

This function generates an IdSet from a subquery and then filters ids based on that IdSet on a Pinot broker.

IN_PARTITIONED_SUBQUERY

IN_PARTITIONED_SUBQUERY(columnName, subQuery)

This function generates an IdSet from a subquery and then filters ids based on that IdSet on a Pinot server.

This function works best when the data is partitioned by the id column and each server contains all the data for a partition. The generated IdSet for the subquery will be smaller as it will only contain the ids for the partitions served by the server. This will give better performance.

The query passed to IN_SUBQUERY and IN_PARTITIONED_SUBQUERY can be run on any table - they aren't restricted to the table used in the parent query.

Examples

Create IdSet

You can create an IdSet of the values in the yearID column by running the following:

idset(yearID)

When creating an IdSet for values in non INT/LONG columns, we can configure the expectedInsertions:

idset(playerName)

We can also configure the fpp parameter:

idset(playerName)

Filter by values in IdSet

We can use the IN_ID_SET function to filter a query based on an IdSet. To return rows for yearIDs in the IdSet, run the following:

Filter by values not in IdSet

To return rows for yearIDs not in the IdSet, run the following:

Filter on broker

To filter rows for yearIDs in the IdSet on a Pinot Broker, run the following query:

To filter rows for yearIDs not in the IdSet on a Pinot Broker, run the following query:

Filter on server

To filter rows for yearIDs in the IdSet on a Pinot Server, run the following query:

To filter rows for yearIDs not in the IdSet on a Pinot Server, run the following query:

Querying JSON data

To see how JSON data can be queried, assume that we have the following table:

We also assume that "jsoncolumn" has a JSON index on it. Note that the last two rows in the table have a different structure than the rest of the rows. In keeping with the JSON specification, a JSON column can contain any valid JSON data and doesn't need to adhere to a predefined schema. To pull out the entire JSON document for each row, we can run the query below:

jsoncolumn

"101"

"{"name":{"first":"daffy","last":"duck"},"score":101,"data":["a","b","c","d"]}"

102"

To drill down and pull out specific keys within the JSON column, we simply append the JsonPath expression of those keys to the end of the column name.

last_name

first_name

value

Note that the third column (value) is null for rows with id 106 and 107. This is because these rows have JSON documents that don't have a key with JsonPath $.data[1]. We can filter out these rows.

last_name

first_name

value

Certain last names (duck and mouse for example) repeat in the data above. We can get a count of each last name by running a GROUP BY query on a JsonPath expression.

jsoncolumn.name.last

count(*)

There is also numerical information (jsoncolumn.$.id) embedded within the JSON document. We can extract those numerical values from the JSON data into SQL and sum them up using the query below.

jsoncolumn.name.last

sum(jsoncolumn.score)

JSON_MATCH and JSON_EXTRACT_SCALAR

Note that the JSON_MATCH function utilizes the JsonIndex and can only be used if a JsonIndex is already present on the JSON column. As shown in the examples above, the second argument of the JSON_MATCH operator takes a predicate. This predicate is evaluated against the JsonIndex and supports the =, !=, IS NULL, and IS NOT NULL operators. Relational operators, such as >, <, >=, and <=, are currently not supported.

jsoncolumn.name.last

sum(jsoncolumn.score)

The JSON_MATCH function also supports wildcard (*) JsonPath expressions, even though it doesn't support full JsonPath expressions.

last_name

total

While JSON_MATCH supports the IS NULL and IS NOT NULL operators, these operators should only be applied to leaf-level path elements. For example, the predicate JSON_MATCH(jsoncolumn, '"$.data[*]" IS NOT NULL') is not valid since "$.data[*]" does not address a "leaf" element of the path; however, JSON_MATCH(jsoncolumn, '"$.data[0]" IS NOT NULL') is valid since "$.data[0]" unambiguously identifies a leaf element of the path.

JSON_EXTRACT_SCALAR does not utilize the JsonIndex and therefore performs slower than JSON_MATCH, which does. However, JSON_EXTRACT_SCALAR supports a wider range of JsonPath expressions and operators. To make the best use of fast index access (JSON_MATCH) along with JsonPath expressions (JSON_EXTRACT_SCALAR), you can combine the use of these two functions in the WHERE clause.

JSON_MATCH syntax

The second argument of the JSON_MATCH function is a boolean expression in string form. This section shows how to correctly write the second argument of JSON_MATCH. Let's assume we want to search a JSON array data for the values k and j. This can be done with the following predicate:

To convert this predicate into string form for use in JSON_MATCH, we first turn the left side of the predicate into an identifier by enclosing it in double quotes:

Next, the literals in the predicate also need to be enclosed in single quotes ('). Any existing single quotes need to be escaped as well. This gives us:

Finally, we need to create a string out of the entire expression above by enclosing it in ':

Now we have the string representation of the original predicate and this can be used in JSON_MATCH function:
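
The three steps above (quote the path as an identifier, quote the literal, then wrap the whole predicate as a string) can be sketched as a helper that assembles the JSON_MATCH call. The helper names are hypothetical, and the example uses a simple equality predicate on the path $.name.last rather than the array search discussed above.

```python
def to_sql_string_literal(text: str) -> str:
    """Wrap text in single quotes, doubling every embedded single quote."""
    return "'" + text.replace("'", "''") + "'"


def json_match(column: str, json_path: str, operator: str, literal: str) -> str:
    """Assemble JSON_MATCH(column, '<predicate as a string>')."""
    identifier = '"' + json_path + '"'          # step 1: path becomes an identifier
    quoted_literal = to_sql_string_literal(literal)  # step 2: literal in single quotes
    predicate = f"{identifier} {operator} {quoted_literal}"
    # Step 3: the whole predicate becomes a string, so its quotes are escaped again.
    return f"JSON_MATCH({column}, {to_sql_string_literal(predicate)})"


expression = json_match("jsoncolumn", "$.name.last", "=", "duck")
print(expression)
# JSON_MATCH(jsoncolumn, '"$.name.last" = ''duck''')
```

Because the predicate is itself a string inside the query, every single quote around a literal ends up doubled in the final expression.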

GapFill Function For Time-Series Dataset

Many datasets are time series in nature, tracking the state change of an entity over time. The recorded data points might be sparse, or events could be missing due to network and other device issues in an IoT environment. But analytics applications that track the state change of these entities over time might query for values at a lower granularity than the metric interval.

Here is a sample data set tracking the status of parking lots in a parking space.

lotId

event_time

is_occupied

2021-10-01 09:01:00.000

We want to find out the total number of parking lots that are occupied over a period of time which would be a common use case for a company that manages parking spaces.

Let us take a 30-minute time bucket as an example:

timeBucket/lotId

If you look at the above table, you will see a lot of missing data for parking lots inside the time buckets. In order to calculate the number of occupied parking lots per time bucket, we need to gap fill the missing data.

The Ways of Gap Filling the Data

There are two ways of gap filling the data: FILL_PREVIOUS_VALUE and FILL_DEFAULT_VALUE.

FILL_PREVIOUS_VALUE means the missing data will be filled with the previous value for the specific entity (in this case, the parking lot) if the previous value exists. Otherwise, it will be filled with the default value.

FILL_DEFAULT_VALUE means that the missing data will be filled with the default value. For numeric columns, the default value is 0. For the Boolean column type, the default value is false. For TimeStamp, it is January 1, 1970, 00:00:00 GMT. For STRING, JSON and BYTES, it is the empty string. For array columns, it is an empty array.
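
The two fill strategies can be sketched in plain Python for a single Boolean column. This models the behavior described above and is not Pinot's implementation: time buckets are represented by integer indexes, and the function name, lot ids, and observations are hypothetical.

```python
from typing import Dict, List

STRATEGIES = ("FILL_PREVIOUS_VALUE", "FILL_DEFAULT_VALUE")


def gapfill(observed: Dict[int, bool], num_buckets: int,
            strategy: str, default: bool = False) -> List[bool]:
    """Return one value per time bucket, filling the buckets with no observation."""
    if strategy not in STRATEGIES:
        raise ValueError(f"unknown strategy: {strategy}")
    filled: List[bool] = []
    for bucket in range(num_buckets):
        if bucket in observed:
            value = observed[bucket]
        elif strategy == "FILL_PREVIOUS_VALUE" and filled:
            value = filled[-1]      # carry the previous bucket's value forward
        else:
            value = default         # no previous value, or FILL_DEFAULT_VALUE
        filled.append(value)
    return filled


# Observations per parking lot: bucket index -> is_occupied
lots = {
    "P1": {0: True, 3: False},
    "P2": {1: True},
}

previous = {lot: gapfill(obs, 4, "FILL_PREVIOUS_VALUE") for lot, obs in lots.items()}
defaults = {lot: gapfill(obs, 4, "FILL_DEFAULT_VALUE") for lot, obs in lots.items()}
print(previous)  # {'P1': [True, True, True, False], 'P2': [False, True, True, True]}
print(defaults)  # {'P1': [True, False, False, False], 'P2': [False, True, False, False]}

# Aggregating the gap-filled data: occupied lots per time bucket.
occupied_per_bucket = [sum(previous[lot][b] for lot in lots) for b in range(4)]
print(occupied_per_bucket)  # [1, 2, 2, 1]
```

With FILL_PREVIOUS_VALUE a lot stays occupied until an event says otherwise, which is usually what a state-tracking use case such as parking occupancy needs.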

We will use the following query to calculate the total number of occupied parking lots per time bucket.

Aggregation/Gapfill/Aggregation

Query Syntax

Workflow

The innermost query converts the raw event table to the following table.

lotId

event_time

is_occupied

The middle query gap fills the returned data as follows:

timeBucket/lotId

The outermost query will aggregate the gapfilled data as follows:

timeBucket

totalNumOfOccuppiedSlots

We assume here that the raw data is sorted by timestamp. The Gapfill and Post-Gapfill Aggregation steps do not sort the data.

The above example just shows the use case where the three steps happen:

The raw data will be aggregated;
The aggregated data will be gapfilled;
The gapfilled data will be aggregated.

There are three more scenarios we can support.

Select/Gapfill

If we want to gapfill the missing data per half an hour time bucket, here is the query:

Query Syntax

Workflow

At first the raw data will be transformed as follows:

lotId

event_time

is_occupied

Then it will be gapfilled as follows:

lotId

event_time

is_occupied

Aggregate/Gapfill

Query Syntax

Workflow

The nested query converts the raw event table to the following table.

lotId

event_time

is_occupied

The outer query gap fills the returned data as follows:

timeBucket/lotId

Gapfill/Aggregate

Query Syntax

Workflow

At first the raw data will be transformed as follows:

lotId

event_time

is_occupied

The transformed data will be gap filled as follows:

lotId

event_time

is_occupied

The aggregation will generate the following table:

timeBucket

totalNumOfOccuppiedSlots
