Explain Plan (Multi-Stage)

This document describes the EXPLAIN PLAN syntax for the multi-stage query engine (v2).


This page explains how to use the EXPLAIN PLAN FOR syntax to obtain the different plans of a query in the multi-stage engine. You can read more about how to interpret the plans in the Understanding multi-stage explain plans page.

Also remember that plans are logical representations of the query execution. Sometimes it is more useful to study the actual stats of the query execution, which are included in each query result. You can read more about how to interpret the stats in the Understanding multi-stage stats page.

In the single-stage engine's Explain Plan, we do not differentiate between logical and physical plans because the structure of the query is fixed; by default it explains the physical plan.

The multi-stage engine supports EXPLAIN PLAN syntax that mostly follows Apache Calcite's syntax. Here are several examples:

Explain Logical Plan

Using a standard SSB query as an example (the full query and its output are shown at the end of this section):

The result field contains two columns and one row:

Note that all the normal options for EXPLAIN PLAN in Apache Calcite also work in Pinot, providing extra information including attributes, types, etc.

One of the most useful options is AS <format>, which supports the following formats:

  • JSON, which returns the plan in a JSON format. This format is useful for parsing the plan in a program and it also provides some extra information that is not present in the default format.

  • XML, which is similar to JSON but in XML format.

  • DOT, which returns the plan in DOT format. This format can be visualized with tools like Graphviz and is understood by different tools, including online stateless viewers.
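
For example, a minimal sketch of requesting the JSON format, reusing the SSB query from this section (the exact extra fields in the output depend on the Pinot version):

EXPLAIN PLAN AS JSON FOR
select
  P_BRAND1, sum(LO_REVENUE)
from ssb_lineorder_1, ssb_part_1
where LO_PARTKEY = P_PARTKEY
  and P_CATEGORY = 'MFGR#12'
group by P_BRAND1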

Explain Implementation Plan

To gather the implementation plan specific to Pinot's internal multi-stage engine operator chain, use EXPLAIN IMPLEMENTATION PLAN:

Note that the output now includes information about how many servers were used and how data is shuffled between nodes (see the second result table at the end of this section).
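
A sketch of the statement, reusing the same SSB query and tables as above:

EXPLAIN IMPLEMENTATION PLAN FOR
select
  P_BRAND1, sum(LO_REVENUE)
from ssb_lineorder_1, ssb_part_1
where LO_PARTKEY = P_PARTKEY
  and P_CATEGORY = 'MFGR#12'
group by P_BRAND1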

EXPLAIN PLAN FOR 
select 
  P_BRAND1, sum(LO_REVENUE) 
from ssb_lineorder_1, ssb_part_1
where LO_PARTKEY = P_PARTKEY 
  and P_CATEGORY = 'MFGR#12' 
group by P_BRAND1
+-----------------------------------|-------------------------------------------------------------|
| SQL                               |PLAN                                                         |
+-----------------------------------|-------------------------------------------------------------|
|"EXPLAIN PLAN FOR                  |"Execution Plan                                              | 
|select                             |LogicalAggregate(group=[{0}], agg#0=[$SUM0($1)])             | 
|  P_BRAND1, sum(LO_REVENUE)        |  PinotLogicalExchange(distribution=[hash[0]])               | 
|from ssb_lineorder_1, ssb_part_1   |    LogicalAggregate(group=[{2}], agg#0=[$SUM0($1)])         | 
|where LO_PARTKEY = P_PARTKEY       |      LogicalJoin(condition=[=($0, $3)], joinType=[inner])   | 
|  and P_CATEGORY = 'MFGR#12'       |        PinotLogicalExchange(distribution=[hash[0]])         | 
|group by P_BRAND1                  |          LogicalProject(LO_PARTKEY=[$12], LO_REVENUE=[$14]) | 
|   and P_CATEGORY = 'MFGR#12'      |            LogicalTableScan(table=[[ssb_lineorder_1]])      | 
|"                                  |        PinotLogicalExchange(distribution=[hash[1]])         | 
|                                   |          LogicalProject(P_BRAND1=[$3], P_PARTKEY=[$9])      | 
|                                   |            LogicalFilter(condition=[=($4, 'MFGR#12')])      | 
|                                   |              LogicalTableScan(table=[[ssb_part_1]])         |
|                                   |"                                                            |
+-----------------------------------|-------------------------------------------------------------|
+-----------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------|
| SQL                               |PLAN                                                                                                                                             |
+-----------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------|
|"EXPLAIN IMPLEMENTATION PLAN FOR   |[0]@local:8843 MAIL_RECEIVE(BROADCAST_DISTRIBUTED)                                                                                                               | 
|select                             |├── [1]@local:8432 MAIL_SEND(BROADCAST_DISTRIBUTED)->{[0]@local@{8843,8843}|[0]} (Subtree Omitted)                                                               | 
|  P_BRAND1, sum(LO_REVENUE)        |├── [1]@local:8432 MAIL_SEND(BROADCAST_DISTRIBUTED)->{[0]@local@{8843,8843}|[0]} (Subtree Omitted)                                                               | 
|from ssb_lineorder_1, ssb_part_1   |└── [1]@local:8432 MAIL_SEND(BROADCAST_DISTRIBUTED)->{[0]@local@{8843,8843}|[0]}                                                                                 | 
|where LO_PARTKEY = P_PARTKEY       |    └── [1]@local:8432 AGGREGATE_FINAL                                                                                                                           | 
|  and P_CATEGORY = 'MFGR#12'       |        └── [1]@local:8432 MAIL_RECEIVE(HASH_DISTRIBUTED)                                                                                                        | 
|group by P_BRAND1                  |            ├── [2]@local:8432 MAIL_SEND(HASH_DISTRIBUTED)->{[1]@local@{8432,8843}|[1],[1]@local@{8432,8843}|[2],[1]@local@{8432,8843}|[0]} (Subtree Omitted)    | 
|   and P_CATEGORY = 'MFGR#12'      |            ├── [2]@local:8432 MAIL_SEND(HASH_DISTRIBUTED)->{[1]@local@{8432,8843}|[1],[1]@local@{8432,8843}|[2],[1]@local@{8432,8843}|[0]} (Subtree Omitted)    | 
|"                                  |            └── [2]@local:8432 MAIL_SEND(HASH_DISTRIBUTED)->{[1]@local@{8432,8843}|[1],[1]@local@{8432,8843}|[2],[1]@local@{8432,8843}|[0]}                      | 
|                                   |                └── [2]@local:8432 AGGREGATE_LEAF                                                                                                                | 
|                                   |                    └── [2]@local:8432 JOIN                                                                                                                      | 
|                                   |                        ├── [2]@local:8432 MAIL_RECEIVE(HASH_DISTRIBUTED)                                                                                        | 
|                                   |                        │   ├── [3]@local:8432 MAIL_SEND(HASH_DISTRIBUTED)->{[2]@local@{8432,8843}|[1],[2]@local@{8432,8843}|[2],[2]@local@{8432,8843}|[0]}      | 
|                                   |                        │   │   └── [3]@local:8432 PROJECT                                                                                                       | 
|                                   |                        │   │       └── [3]@local:8432 TABLE SCAN (ssb_lineorder_1) null                                                                         | 
|                                   |                        │   ├── [3]@local:8432 MAIL_SEND(HASH_DISTRIBUTED)->{[2]@local@{8432,8843}|[1],[2]@local@{8432,8843}|[2],[2]@local@{8432,8843}|[0]}      | 
|                                   |                        │   │   └── [3]@local:8432 PROJECT                                                                                                       | 
|                                   |                        │   │       └── [3]@local:8432 TABLE SCAN (ssb_lineorder_1) null                                                                         | 
|                                   |                        │   └── [3]@local:8432 MAIL_SEND(HASH_DISTRIBUTED)->{[2]@local@{8432,8843}|[1],[2]@local@{8432,8843}|[2],[2]@local@{8432,8843}|[0]}      | 
|                                   |                        │       └── [3]@local:8432 PROJECT                                                                                                       | 
|                                   |                        │           └── [3]@local:8432 TABLE SCAN (ssb_lineorder_1) null                                                                         | 
|                                   |                        └── [2]@local:8432 MAIL_RECEIVE(HASH_DISTRIBUTED)                                                                                        | 
|                                   |                            ├── [4]@local:8432 MAIL_SEND(HASH_DISTRIBUTED)->{[2]@local@{8432,8843}|[1],[2]@local@{8432,8843}|[2],[2]@local@{8432,8843}|[0]}      | 
|                                   |                            │   └── [4]@local:8432 PROJECT                                                                                                       | 
|                                   |                            │       └── [4]@local:8432 FILTER                                                                                                    | 
|                                   |                            │           └── [4]@local:8432 TABLE SCAN (ssb_part_1) null                                                                          | 
|                                   |                            ├── [4]@local:8432 MAIL_SEND(HASH_DISTRIBUTED)->{[2]@local@{8432,8843}|[1],[2]@local@{8432,8843}|[2],[2]@local@{8432,8843}|[0]}      | 
|                                   |                            │   └── [4]@local:8432 PROJECT                                                                                                       | 
|                                   |                            │       └── [4]@local:8432 FILTER                                                                                                    | 
|                                   |                            │           └── [4]@local:8432 TABLE SCAN (ssb_part_1) null                                                                          | 
|                                   |                            └── [4]@local:8432 MAIL_SEND(HASH_DISTRIBUTED)->{[2]@local@{8432,8843}|[1],[2]@local@{8432,8843}|[2],[2]@local@{8432,8843}|[0]}      | 
|                                   |                                └── [4]@local:8432 PROJECT                                                                                                       | 
|                                   |                                    └── [4]@local:8432 FILTER                                                                                                    | 
|                                   |                                        └── [4]@local:8432 TABLE SCAN (ssb_part_1) null                                                                          | 
+-----------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------|

Aggregation Functions

Aggregate functions return a single result for a group of rows. The following table shows supported aggregate functions in Pinot.

Function
Description
Example
Default Value When No Record Selected

ARG_MIN / ARG_MAX

Project a column at the row where the minimum (ARG_MIN) or maximum (ARG_MAX) value appears in a series of measuring columns.

ARG_MAX(measuring1, measuring2, measuring3, projection)

Will return no result

Deprecated functions:

Function
Description
Example

Multi-value column functions

The following aggregation functions can be used for multi-value columns

Function

FILTER Clause in aggregation

Pinot supports FILTER clause in aggregation queries as follows:

In the query above, COL1 is aggregated only for rows where COL2 > 300 and COL3 > 50 . Similarly, COL2 is aggregated where COL2 < 50 and COL3 > 50.

With null value support enabled, this allows filtering out null values while performing aggregation, as follows:

In the above query, COL1 is aggregated only for the non-null values. Without NULL value support, we would have to filter using the default null value.

Deprecated functions:

Function
Description
Example

Grouping Algorithm

In this guide we will learn about the heuristics used for trimming results in Pinot's grouping algorithm (used when processing GROUP BY queries) to make sure that the server doesn't run out of memory.

Within segment

When grouping rows within a segment, Pinot keeps a maximum of <numGroupsLimit> groups per segment. This value is set to 100,000 by default and can be configured via the pinot.server.query.executor.num.groups.limit property.
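
For example, a sketch of raising the limit for a single query, using the OPTION syntax shown in the Configuration Parameters section of this guide (the value 200000 is only an illustration):

SELECT COL1, COUNT(*)
FROM MyTable
GROUP BY COL1
OPTION(numGroupsLimit=200000)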

Cardinality Estimation

Cardinality estimation is a classic problem. Pinot solves it in multiple ways, each of which has a trade-off between accuracy and latency.

Exact Results

Functions:

Query Syntax

Query Pinot using supported syntax.

DistinctCount(x) -> LONG

Returns accurate count for all unique values in a column.

The underlying implementation uses an IntOpenHashSet from the library it.unimi.dsi:fastutil:8.2.3 to hold all the unique values.
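
A minimal usage sketch (playerName and baseballStats are the same example column and table used elsewhere on this page):

SELECT DISTINCTCOUNT(playerName)
FROM baseballStats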

Approximate Results

It usually takes a lot of resources and time to compute exact results for unique counting on large datasets. In some circumstances, we can tolerate a certain error rate, in which case we can use approximation functions to tackle this problem.

HyperLogLog

HyperLogLog is an approximation algorithm for unique counting. It uses a fixed number of bits to estimate the cardinality of a given data set.

Pinot leverages the HyperLogLog class in the library com.clearspring.analytics:stream:2.7.0 as the data structure to hold intermediate results.

Functions:

  • DistinctCountHLL(x) -> LONG

For column type INT/LONG/FLOAT/DOUBLE/STRING, Pinot treats each value as an individual entry to add into the HyperLogLog object, and then computes the approximation by calling the cardinality() method.

For column type BYTES, Pinot treats each value as a serialized HyperLogLog Object with pre-aggregated values inside. The bytes value is generated by org.apache.pinot.core.common.ObjectSerDeUtils.HYPER_LOG_LOG_SER_DE.serialize(hyperLogLog).

All deserialized HyperLogLog objects will be merged into one, and then the cardinality() method is called to get the approximate unique count.
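
A usage sketch with the optional log2m argument (the value 12 is only an illustration):

SELECT DISTINCTCOUNTHLL(playerName, 12)
FROM baseballStats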

HyperLogLogPlusPlus

The HyperLogLog++ algorithm proposes several improvements to the HyperLogLog algorithm to reduce memory requirements and increase accuracy in some ranges of cardinalities.

  • A 64-bit hash function is used instead of the 32 bits used in the original paper. This reduces hash collisions for large cardinalities, allowing the large-range correction to be removed.

  • Some bias is found for small cardinalities when switching from linear counting to the HLL counting. An empirical bias correction is proposed to mitigate the problem.

  • A sparse representation of the registers is implemented to reduce memory requirements for small cardinalities, which can be later transformed to a dense representation if the cardinality grows.

Pinot leverages the HyperLogLogPlus class in the library com.clearspring.analytics:stream:2.7.0 as the data structure to hold intermediate results.

Functions:

  • DistinctCountHLLPlus(<HllPlusColumn>) -> LONG

  • DistinctCountHLLPlus(<HllPlusColumn>, <p>) -> LONG

  • DistinctCountHLLPlus(<HllPlusColumn>, <p>, <sp>) -> LONG

For column type INT/LONG/FLOAT/DOUBLE/STRING, Pinot treats each value as an individual entry to add into the HyperLogLogPlus object, and then computes the approximation by calling the cardinality() method.

For column type BYTES, Pinot treats each value as a serialized HyperLogLogPlus Object with pre-aggregated values inside. The bytes value is generated by org.apache.pinot.core.common.ObjectSerDeUtils.HYPER_LOG_LOG_PLUS_SER_DE.serialize(hyperLogLogPlus).

All deserialized HyperLogLogPlus objects will be merged into one, and then the cardinality() method is called to get the approximate unique count.
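
A usage sketch with the optional precision argument p (the value 14 is only an illustration):

SELECT DISTINCTCOUNTHLLPLUS(playerName, 14)
FROM baseballStats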

Theta Sketches

The Theta Sketch framework enables set operations over a stream of data, and can also be used for cardinality estimation. Pinot leverages the Sketch class and its extensions from the library org.apache.datasketches:datasketches-java:4.2.0 to perform distinct counting as well as evaluating set operations.

Functions:

  • DistinctCountThetaSketch(<thetaSketchColumn>, <thetaSketchParams>, predicate1, predicate2..., postAggregationExpressionToEvaluate) -> LONG

    • thetaSketchColumn (required): Name of the column to aggregate on.

    • thetaSketchParams (required): Parameters for constructing the intermediate theta-sketches. Currently, the only supported parameter is nominalEntries.

    • predicates (optional): Individual predicates of the form lhs <op> rhs, applied to rows selected by the WHERE clause. During intermediate sketch aggregation, sketches from the thetaSketchColumn that satisfy these predicates are unionized individually. For example, all filtered rows that match country=USA are unionized into a single sketch. Complex predicates created by combining (AND/OR) individual predicates are supported.

    • postAggregationExpressionToEvaluate (required): The set operation to perform on the individual intermediate sketches for each of the predicates. Currently supported operations are SET_DIFF, SET_UNION, and SET_INTERSECT, where DIFF requires two arguments and UNION/INTERSECT allow more than two arguments.

In the example query below, the where clause is responsible for identifying the matching rows. Note, the where clause can be completely independent of the postAggregationExpression. Once matching rows are identified, each server unionizes all the sketches that match the individual predicates, i.e. country='USA' , device='mobile' in this case. Once the broker receives the intermediate sketches for each of these individual predicates from all servers, it performs the final aggregation by evaluating the postAggregationExpression and returns the final cardinality of the resulting sketch.

  • DistinctCountRawThetaSketch(<thetaSketchColumn>, <thetaSketchParams>, predicate1, predicate2..., postAggregationExpressionToEvaluate) -> HexEncoded Serialized Sketch Bytes

This is the same as the previous function, except it returns the byte serialized sketch instead of the cardinality. Since Pinot returns responses as JSON strings, bytes are returned as hex encoded strings. The hex encoded string can be deserialized into a sketch by using the library org.apache.commons.codec.binary as Hex.decodeHex(stringValue.toCharArray()).

Tuple Sketches

The Tuple Sketch is an extension of the Theta Sketch. Tuple sketches store an additional summary value with each retained entry, which makes the sketch ideal for summarizing attributes such as impressions or clicks. Tuple sketches are interoperable with the Theta Sketch and enable set operations over a stream of data, and can also be used for cardinality estimation.

Functions:

  • avgValueIntegerSumTupleSketch(<tupleSketchColumn>, <tupleSketchLgK>) -> Long

    • tupleSketchColumn (required): Name of the column to aggregate on.

    • tupleSketchLgK (optional): lgK, the log2 of K, which controls both the size and accuracy of the sketch.

This function can be used to combine the summary values from the random sample stored within the Tuple sketch and formulate an estimate for an average that applies to the entire dataset. The average should be interpreted as applying to each key tracked by the sketch and is rounded to the nearest whole number.

  • distinctCountTupleSketch(<tupleSketchColumn>, <tupleSketchLgK>) -> LONG

    • tupleSketchColumn (required): Name of the column to aggregate on.

    • tupleSketchLgK (optional): lgK, the log2 of K, which controls both the size and accuracy of the sketch.

This returns the cardinality estimate for a column where the values are already encoded as Tuple sketches, stored as BYTES.

  • distinctCountRawIntegerSumTupleSketch(<tupleSketchColumn>, <tupleSketchLgK>) -> HexEncoded Serialized Sketch Bytes

This is the same as the previous function, except it returns the byte serialized sketch instead of the cardinality. Since Pinot returns responses as JSON strings, bytes are returned as hex encoded strings. The hex encoded string can be deserialized into a sketch by using the library org.apache.commons.codec.binary as Hex.decodeHex(stringValue.toCharArray()).

  • sumValuesIntegerSumTupleSketch(<tupleSketchColumn>, <tupleSketchLgK>) -> Long

    • tupleSketchColumn (required): Name of the column to aggregate on.

    • tupleSketchLgK (optional): lgK, the log2 of K, which controls both the size and accuracy of the sketch.

This function can be used to combine the summary values (using sum) from the random sample stored within the Tuple sketch and formulate an estimate that applies to the entire dataset. See avgValueIntegerSumTupleSketch for extracting an average for integer summaries. If other merging options are required, it is best to extract the raw sketches directly or to implement a new Pinot aggregation function to support these.
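
A combined usage sketch, assuming a hypothetical BYTES column tupleSketchCol that stores pre-aggregated Integer-sum Tuple sketches:

SELECT
  distinctCountTupleSketch(tupleSketchCol),
  sumValuesIntegerSumTupleSketch(tupleSketchCol),
  avgValueIntegerSumTupleSketch(tupleSketchCol)
FROM myTable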

Compressed Probability Counting (CPC) Sketches

The Compressed Probability Counting (CPC) Sketch enables extremely space-efficient cardinality estimation. The stored CPC sketch can consume about 40% less space than an HLL sketch of comparable accuracy. Pinot can aggregate multiple existing CPC sketches together to get a total distinct count, or estimate it directly from raw values.

Functions:

  • distinctCountCpcSketch(<cpcSketchColumn>, <cpcSketchLgK>) -> Long

    • cpcSketchColumn (required): Name of the column to aggregate on.

    • cpcSketchLgK (optional): lgK, the log2 of K, which controls both the size and accuracy of the sketch.

This returns the cardinality estimate for a column.
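
A usage sketch, assuming a hypothetical column cpcSketchCol (the lgK value 12 is only an illustration):

SELECT distinctCountCpcSketch(cpcSketchCol, 12)
FROM myTable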

  • distinctCountRawCpcSketch(<cpcSketchColumn>, <cpcSketchLgK>) -> HexEncoded Serialized Sketch Bytes

    • cpcSketchColumn (required): Name of the column to aggregate on.

    • cpcSketchLgK (optional): lgK, the log2 of K, which controls both the size and accuracy of the sketch.

This is the same as the previous function, except it returns the byte serialized sketch instead of the cardinality. Since Pinot returns responses as JSON strings, bytes are returned as hex encoded strings. The hex encoded string can be deserialized into a sketch by using the library org.apache.commons.codec.binary as Hex.decodeHex(stringValue.toCharArray()).

UltraLogLog (ULL) Sketches

The UltraLogLog Sketch from Dynatrace is a variant of HyperLogLog and is used for approximate distinct counts. The UltraLogLog sketch shares many of the same properties of a typical HyperLogLog sketch but requires less space and also provides a simpler and faster estimator.

Pinot uses a production-ready Java implementation available in Hash4j under the Apache license.

Functions:

  • distinctCountULL(<ullSketchColumn>, <ullSketchPrecision>) -> Long

    • ullSketchColumn (required): Name of the column to aggregate on.

    • ullSketchPrecision (optional): p, the precision parameter, which controls both the size and accuracy of the sketch.

This returns the cardinality estimate for a column.
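
A usage sketch, assuming a hypothetical column ullSketchCol (the precision value 12 is only an illustration):

SELECT distinctCountULL(ullSketchCol, 12)
FROM myTable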

  • distinctCountRawULL(<ullSketchColumn>, <ullSketchPrecision>) -> HexEncoded Serialized Sketch Bytes

    • ullSketchColumn (required): Name of the column to aggregate on.

    • ullSketchPrecision (optional): p, the precision parameter, which controls both the size and accuracy of the sketch.

This is the same as the previous function, except it returns the byte serialized sketch instead of the cardinality. Since Pinot returns responses as JSON strings, bytes are returned as hex encoded strings. The hex encoded string can be deserialized into a sketch by using the library org.apache.commons.codec.binary as Hex.decodeHex(stringValue.toCharArray()).

select distinctCountThetaSketch(
  sketchCol, 
  'nominalEntries=1024', 
  'country'=''USA'' AND 'state'=''CA'', 'device'=''mobile'', 'SET_INTERSECT($1, $2)'
) 
from table 
where country = 'USA' or device = 'mobile...' 

Returns an approximate distinct count using HyperLogLog as Long

Returns HyperLogLog response serialized as string. The serialized HLL can be converted back into an HLL and then aggregated with other HLLs. A common use case may be to merge HLL responses from different Pinot tables, or to allow aggregation after client-side batching.

Returns an approximate distinct count using HyperLogLogPlus as Long

Returns HyperLogLogPlus response serialized as string. The serialized HLLPlus can be converted back into an HLLPlus and then aggregated with other HLLPluses. A common use case may be to merge HLLPlus responses from different Pinot tables, or to allow aggregation after client-side batching.

See Cardinality Estimation

0

COUNT

Returns the count of the records as Long

COUNT(*)

0

COVAR_POP

Returns the population covariance of 2 numerical columns as Double

COVAR_POP(col1, col2)

Double.NEGATIVE_INFINITY

COVAR_SAMP

Returns the sample covariance of 2 numerical columns as Double

COVAR_SAMP(col1, col2)

Double.NEGATIVE_INFINITY

HISTOGRAM

Calculate the histogram of a numeric column as Double[]

HISTOGRAM(numberOfGames,0,200,10)

0, 0, ..., 0

MIN

Returns the minimum value of a numeric column as Double

MIN(playerScore)

Double.POSITIVE_INFINITY

MAX

Returns the maximum value of a numeric column as Double

MAX(playerScore)

Double.NEGATIVE_INFINITY

SUM

Returns the sum of the values for a numeric column as Double

SUM(playerScore)

0

SUMPRECISIONarrow-up-right

Returns the sum of the values for a numeric column with optional precision and scale as BigDecimal

SUMPRECISION(salary), SUMPRECISION(salary, precision, scale)

0.0

AVGarrow-up-right

Returns the average of the values for a numeric column as Double

AVG(playerScore)

Double.NEGATIVE_INFINITY

MODE

Returns the most frequent value of a numeric column as Double. When multiple modes are present it gives the minimum of all the modes. This behavior can be overridden to get the maximum or the average mode.

MODE(playerScore)

MODE(playerScore, 'MIN')

MODE(playerScore, 'MAX')

MODE(playerScore, 'AVG')

Double.NEGATIVE_INFINITY

MINMAXRANGE

Returns the max - min value for a numeric column as Double

MINMAXRANGE(playerScore)

Double.NEGATIVE_INFINITY

PERCENTILE(column, N)

Returns the Nth percentile of the values for a numeric column as Double. N is a decimal number between 0 and 100 inclusive.

PERCENTILE(playerScore, 50) PERCENTILE(playerScore, 99.9)

Double.NEGATIVE_INFINITY

PERCENTILEEST(column, N)

Returns the Nth percentile of the values for a numeric column using Quantile Digestarrow-up-right as Long

PERCENTILEEST(playerScore, 50)

PERCENTILEEST(playerScore, 99.9)

Long.MIN_VALUE

PERCENTILETDIGEST(column, N)

Returns the Nth percentile of the values for a numeric column using T-digestarrow-up-right as Double

PERCENTILETDIGEST(playerScore, 50)

PERCENTILETDIGEST(playerScore, 99.9)

Double.NaN

PERCENTILETDIGEST(column, N, CF)

Returns the Nth percentile (using compression factor of CF) of the values for a numeric column using T-digestarrow-up-right as Double

PERCENTILETDIGEST(playerScore, 50, 1000)

PERCENTILETDIGEST(playerScore, 99.9, 500)

Double.NaN

PERCENTILESMARTTDIGEST

Returns the Nth percentile of the values for a numeric column as Double. When there are too many values, automatically switch to approximate percentile using TDigest. The switch threshold (100_000 by default) and compression (100 by default) for the TDigest can be configured via the optional second argument.

PERCENTILESMARTTDIGEST(playerScore, 50)

PERCENTILESMARTTDIGEST(playerScore, 99.9, 'threshold=100;compression=50')

Double.NEGATIVE_INFINITY

DISTINCTCOUNT

Returns the count of distinct values of a column as Integer

DISTINCTCOUNT(playerName)

0

DISTINCTCOUNTBITMAP

Returns the count of distinct values of a column as Integer. This function is accurate for INT column, but approximate for other cases where hash codes are used in distinct counting and there may be hash collisions.

DISTINCTCOUNTBITMAP(playerName)

0

DISTINCTCOUNTHLL

Returns an approximate distinct count using HyperLogLog as Long. It also takes an optional second argument to configure the log2m for the HyperLogLog.

DISTINCTCOUNTHLL(playerName, 12)

0

DISTINCTCOUNTRAWHLL

Returns HyperLogLog response serialized as String. The serialized HLL can be converted back into an HLL and then aggregated with other HLLs. A common use case may be to merge HLL responses from different Pinot tables, or to allow aggregation after client-side batching.

DISTINCTCOUNTRAWHLL(playerName)

0

DISTINCTCOUNTHLLPLUSarrow-up-right

Returns an approximate distinct count using HyperLogLogPlus as Long. It also takes an optional second and third arguments to configure the p and sp for the HyperLogLogPlus.

DISTINCTCOUNTHLLPLUS(playerName)

0

DISTINCTCOUNTRAWHLLPLUSarrow-up-right

Returns HyperLogLogPlus response serialized as String. The serialized HLLPlus can be converted back into an HLLPlus and then aggregated with other HLLPluses. A common use case may be to merge HLLPlus responses from different Pinot tables, or to allow aggregation after client-side batching.

DISTINCTCOUNTRAWHLLPLUS(playerName)

0

DISTINCTCOUNTSMARTHLL

Returns the count of distinct values of a column as Integer. When there are too many distinct values, automatically switch to approximate distinct count using HyperLogLog. The switch threshold (100_000 by default) and log2m (12 by default) for the HyperLogLog can be configured via the optional second argument.

DISTINCTCOUNTSMARTHLL(playerName),

DISTINCTCOUNTSMARTHLL(playerName, 'threshold=100;log2m=8')

0

DISTINCTCOUNTCPCSKETCHarrow-up-right

See Cardinality Estimation

0

DISTINCTCOUNTRAWCPCSKETCHarrow-up-right

See Cardinality Estimation

0

DISTINCTCOUNTRAWINTEGERSUMTUPLESKETCHarrow-up-right

See Cardinality Estimation

0

DISTINCTCOUNTTHETASKETCH

See Cardinality Estimation

0

DISTINCTCOUNTRAWTHETASKETCH

See Cardinality Estimation

0

DISTINCTCOUNTTUPLESKETCHarrow-up-right

See Cardinality Estimation

0

DISTINCTCOUNTULLarrow-up-right

See Cardinality Estimation

0

DISTINCTCOUNTRAWULLarrow-up-right

See Cardinality Estimation

0

SEGMENTPARTITIONEDDISTINCTCOUNT

Returns the count of distinct values of a column as Long when the column is pre-partitioned for each segment, where there is no common value within different segments. This function calculates the exact count of distinct values within the segment, then simply sums up the results from different segments to get the final result.

SEGMENTPARTITIONEDDISTINCTCOUNT(playerName)

0


SUMVALUESINTEGERSUMTUPLESKETCHarrow-up-right

See Cardinality Estimation

0

LASTWITHTIME(dataColumn, timeColumn, 'dataType')

Get the last value of dataColumn where the timeColumn is used to define the time of dataColumn and the dataType specifies the type of dataColumn, which can be BOOLEAN, INT, LONG, FLOAT, DOUBLE, STRING

LASTWITHTIME(playerScore, timestampColumn, 'BOOLEAN')

LASTWITHTIME(playerScore, timestampColumn, 'INT')

LASTWITHTIME(playerScore, timestampColumn, 'LONG')

LASTWITHTIME(playerScore, timestampColumn, 'FLOAT')

LASTWITHTIME(playerScore, timestampColumn, 'DOUBLE')

LASTWITHTIME(playerScore, timestampColumn, 'STRING')

INT: Int.MIN_VALUE LONG: Long.MIN_VALUE FLOAT: Float.NaN DOUBLE: Double.NaN STRING: ""

FIRSTWITHTIME(dataColumn, timeColumn, 'dataType')

Get the first value of dataColumn where the timeColumn is used to define the time of dataColumn and the dataType specifies the type of dataColumn, which can be BOOLEAN, INT, LONG, FLOAT, DOUBLE, STRING

FIRSTWITHTIME(playerScore, timestampColumn, 'BOOLEAN')

FIRSTWITHTIME(playerScore, timestampColumn, 'INT')

FIRSTWITHTIME(playerScore, timestampColumn, 'LONG')

FIRSTWITHTIME(playerScore, timestampColumn, 'FLOAT')

FIRSTWITHTIME(playerScore, timestampColumn, 'DOUBLE')

FIRSTWITHTIME(playerScore, timestampColumn, 'STRING')

INT: Int.MIN_VALUE LONG: Long.MIN_VALUE FLOAT: Float.NaN DOUBLE: Double.NaN STRING: ""

FASTHLL

FASTHLL stores serialized HyperLogLog in String format, which performs worse than DISTINCTCOUNTHLL, which supports serialized HyperLogLog in BYTES (byte array) format

FASTHLL(playerName)

COUNTMV Returns the count of a multi-value column as Long

MINMV Returns the minimum value of a numeric multi-value column as Double

MAXMV Returns the maximum value of a numeric multi-value column as Double

SUMMV Returns the sum of the values for a numeric multi-value column as Double

AVGMV Returns the average of the values for a numeric multi-value column as Double

MINMAXRANGEMV Returns the max - min value for a numeric multi-value column as Double

PERCENTILEMV(column, N) Returns the Nth percentile of the values for a numeric multi-value column as Double

PERCENTILEESTMV(column, N) Returns the Nth percentile using Quantile Digestarrow-up-right as Long

PERCENTILETDIGESTMV(column, N) Returns the Nth percentile using T-digestarrow-up-right as Double

PERCENTILETDIGESTMV(column, N, CF) Returns the Nth percentile (using compression factor CF) using T-digestarrow-up-right as Double

DISTINCTCOUNTMV Returns the count of distinct values for a multi-value column as Integer

FASTHLLMV (Deprecated)

stores serialized HyperLogLog in String format, which performs worse than DISTINCTCOUNTHLL, which supports serialized HyperLogLog in BYTES (byte array) format

FASTHLLMV(playerNames)

AVGVALUEINTEGERSUMTUPLESKETCH

Returns the count of distinct values for a multi-value column as Integer. This function is accurate for INT or dictionary encoded column, but approximate for other cases where hash codes are used in distinct counting and there may be hash collision.


If the number of groups of a segment reaches this value, the extra groups will be ignored and the results returned may not be completely accurate. The numGroupsLimitReached property will be set to true in the query response if the value is reached.

Trimming tail groups

After the inner segment groups have been computed, the Pinot query engine optionally trims tail groups. Tail groups are ones that have a lower rank based on the ORDER BY clause used in the query.

This configuration is disabled by default, but can be enabled by configuring the pinot.server.query.executor.min.segment.group.trim.size property.

When segment group trim is enabled, the query engine will trim the tail groups and keep max(<minSegmentGroupTrimSize>, 5 * LIMIT) groups if it gets more groups. Pinot keeps at least 5 * LIMIT groups when trimming tail groups to ensure the accuracy of results.

This value can be overridden on a query by query basis by passing the following option:

Cross segments

Once grouping has been done within a segment, Pinot will merge segment results and trim tail groups and keep max(<minServerGroupTrimSize>, 5 * LIMIT) groups if it gets more groups.

<minServerGroupTrimSize> is set to 5,000 by default and can be adjusted by configuring the pinot.server.query.executor.min.server.group.trim.size property. When setting the configuration to -1, the cross segments trim can be disabled.

This value can be overridden on a query by query basis by passing the following option:

When cross segments trim is enabled, the server will trim the tail groups before sending the results back to the broker. It will also trim the tail groups when the number of groups reaches the <trimThreshold>.

<trimThreshold> is the upper bound of groups allowed in a server for each query to protect servers from running out of memory. To avoid too frequent trimming, the actual trim size is bounded to <trimThreshold> / 2. Combining this with the above equation, the actual trim size for a query is calculated as min(max(<minServerGroupTrimSize>, 5 * LIMIT), <trimThreshold> / 2).
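
A quick worked example with the default settings (minServerGroupTrimSize = 5,000, trimThreshold = 1,000,000): a query with LIMIT 2000 keeps min(max(5000, 5 * 2000), 1000000 / 2) = min(10000, 500000) = 10,000 groups per server after trimming.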

This configuration is set to 1,000,000 by default and can be adjusted by configuring the pinot.server.query.executor.groupby.trim.threshold property.

A higher threshold reduces the amount of trimming done, but consumes more heap memory. If the threshold is set to more than 1,000,000,000, the server will only trim the groups once before returning the results to the broker.

This value can be overridden on a query by query basis by passing the following option:

At Broker

When the broker performs the final merge of the groups returned by the servers, another level of trimming takes place. The tail groups are trimmed and max(<minBrokerGroupTrimSize>, 5 * LIMIT) groups are retained.

The default value of <minBrokerGroupTrimSize> is 5,000. This can be adjusted by configuring the pinot.broker.min.group.trim.size property.

GROUP BY behavior

Pinot sets a default LIMIT of 10 if one isn't defined and this applies to GROUP BY queries as well. Therefore, if no limit is specified, Pinot will return 10 groups.

Pinot will trim tail groups based on the ORDER BY clause to reduce the memory footprint and improve the query performance. It keeps at least 5 * LIMIT groups so that the results give a good enough approximation in most cases. The configurable minimum trim size can be used to increase the number of groups kept, improving accuracy at the cost of a larger memory footprint.
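
For example, a sketch where no LIMIT is given, so Pinot returns only 10 groups (and internally keeps at least 5 * 10 = 50 groups per trimming stage):

SELECT colB, SUM(colA)
FROM myTable
GROUP BY colB
ORDER BY SUM(colA) DESC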

HAVING behavior

If the query has a HAVING clause, it is applied on the merged GROUP BY results that already have the tail groups trimmed. If the HAVING clause filters in the opposite direction of the ORDER BY order, groups matching the condition might already have been trimmed and not returned, e.g.:

Increase min trim size to keep more groups in these cases.

Configuration Parameters

Parameter
Default
Query Override
Description

pinot.server.query.executor.num.groups.limit The maximum number of groups allowed per segment.

100,000

OPTION(numGroupsLimit=<numGroupsLimit>)

pinot.server.query.executor.min.segment.group.trim.size The minimum number of groups to keep when trimming groups at the segment level.

-1 (trim disabled)

OPTION(minSegmentGroupTrimSize=<minSegmentGroupTrimSize>)

pinot.server.query.executor.min.server.group.trim.size The minimum number of groups to keep when trimming groups at the server level.

SELECT SUM(COL1) FILTER (WHERE COL2 > 300),
       AVG(COL2) FILTER (WHERE COL2 < 50) 
FROM MyTable WHERE COL3 > 50
SELECT SUM(COL1) FILTER (WHERE COL1 IS NOT NULL)
FROM MyTable WHERE COL3 > 50
SELECT * 
FROM ...

OPTION(minSegmentGroupTrimSize=<minSegmentGroupTrimSize>)
SELECT * 
FROM ...

OPTION(minServerGroupTrimSize=<minServerGroupTrimSize>)
SELECT * 
FROM ...

OPTION(groupTrimThreshold=<groupTrimThreshold>)
SELECT SUM(colA) 
FROM myTable 
GROUP BY colB 
HAVING SUM(colA) < 100 
ORDER BY SUM(colA) DESC 
LIMIT 10

5,000

OPTION(minServerGroupTrimSize=<minServerGroupTrimSize>)

pinot.server.query.executor.groupby.trim.threshold The number of groups to trigger the server level trim.

1,000,000

OPTION(groupTrimThreshold=<groupTrimThreshold>)

pinot.server.query.executor.max.execution.threads The maximum number of execution threads (parallelism of segment processing) used per query.

-1 (use all execution threads)

OPTION(maxExecutionThreads=<maxExecutionThreads>)

pinot.broker.min.group.trim.size The minimum number of groups to keep when trimming groups at the broker.

5000

OPTION(minBrokerGroupTrimSize=<minBrokerGroupTrimSize>)

DISTINCTCOUNTBITMAPMV
DISTINCTCOUNTHLLMV
DISTINCTCOUNTRAWHLLMV
DISTINCTCOUNTHLLPLUSMVarrow-up-right
DISTINCTCOUNTRAWHLLPLUSMVarrow-up-right

Funnel Analysis

Apache Pinot supports a few funnel functions:

FunnelMaxStep

FunnelMaxStep evaluates user interactions within a specified time window to determine the furthest step reached in a predefined sequence of actions. By analyzing event timestamps and conditions set for each step, it identifies the maximum progression point for each user, ensuring that the sequence follows the configured order or other specific rules like strict timestamp increases or event uniqueness. This function is instrumental in funnel analysis, helping businesses and analysts understand user behavior, measure conversion rates, and identify potential drop-offs in critical user journeys.


FunnelMatchStep

Similar to FunnelMaxStep , this function returns an array which reflects the matching status for the steps.

FunnelCompleteCount

This function evaluates all funnel events and returns how many times the user has completed the full steps.


Filtering with IdSet

Learn how to write fast queries for looking up IDs in a list of values.


Filtering with IdSet is only supported with the single-stage query engine (v1).

A common use case is filtering on an id field with a list of values. This can be done with the IN clause, but using IN doesn't perform well with large lists of IDs. For large lists of IDs, we recommend using an IdSet.

Functions

ID_SET

ID_SET(columnName, 'sizeThresholdInBytes=8388608;expectedInsertions=5000000;fpp=0.03' )

This function returns a base 64 encoded IdSet of the values for a single column. The IdSet implementation used depends on the column data type:

  • INT - RoaringBitmap unless sizeThresholdInBytes is exceeded, in which case Bloom Filter.

  • LONG - Roaring64NavigableMap unless sizeThresholdInBytes is exceeded, in which case Bloom Filter.

  • Other types - Bloom Filter

The following parameters are used to configure the Bloom Filter:

  • expectedInsertions - Number of expected insertions for the BloomFilter, must be positive

  • fpp - False positive probability to use for the BloomFilter. Must be positive and less than 1.0.

Note that when a Bloom Filter is used, the filter results are approximate - you can get false-positive results (for membership in the set), leading to potentially unexpected results.

IN_ID_SET

IN_ID_SET(columnName, base64EncodedIdSet)

This function returns 1 if a column contains a value specified in the IdSet and 0 if it does not.

IN_SUBQUERY

IN_SUBQUERY(columnName, subQuery)

This function generates an IdSet from a subquery and then filters ids based on that IdSet on a Pinot broker.

IN_PARTITIONED_SUBQUERY

IN_PARTITIONED_SUBQUERY(columnName, subQuery)

This function generates an IdSet from a subquery and then filters ids based on that IdSet on a Pinot server.

This function works best when the data is partitioned by the id column and each server contains all the data for a partition. The generated IdSet for the subquery will be smaller as it will only contain the ids for the partitions served by the server. This will give better performance.


The query passed to IN_SUBQUERY can be run on any table - they aren't restricted to the table used in the parent query.

The query passed to IN_PARTITIONED_SUBQUERY must be run on the same table as the parent query.

Examples

Create IdSet

You can create an IdSet of the values in the yearID column by running the following:

idset(yearID)

When creating an IdSet for values in non INT/LONG columns, we can configure the expectedInsertions:

idset(playerName)
idset(playerName)

We can also configure the fpp parameter:

idset(playerName)

Filter by values in IdSet

We can use the IN_ID_SET function to filter a query based on an IdSet. To return rows for yearIDs in the IdSet, run the following:

Filter by values not in IdSet

To return rows for yearIDs not in the IdSet, run the following:

Filter on broker

To filter rows for yearIDs in the IdSet on a Pinot Broker, run the following query:

To filter rows for yearIDs not in the IdSet on a Pinot Broker, run the following query:

Filter on server

To filter rows for yearIDs in the IdSet on a Pinot Server, run the following query:

To filter rows for yearIDs not in the IdSet on a Pinot Server, run the following query:


JOINs

Pinot supports JOINs, including left, right, full, semi, anti, lateral, and equi JOINs. Use JOINs to connect two tables and generate a unified view, based on a related column between the tables.


Important: To query using JOINs, you must use Pinot's multi-stage query engine (v2).

  • Overview of JOINs in Pinot 1.0

  • Supported JOIN types and examples

  • JOIN optimizations

  • Query hints for fine-tuning JOIN operations

JOINs overview

Pinot 1.0 introduces support for all JOIN types. JOINs in Pinot significantly reduce query latency and simplify architecture, achieving the best performance currently available for an OLAP database.

Use JOINs to combine two tables (a left and right table) together, based on a related column between the tables, and other join filters. JOINs let you gain more insights from your data.

Supported JOINs types and examples

Inner join

The inner join selects rows that have matching values in both tables.

Syntax:

Example of inner join

Joins a table containing user transactions with a table containing promotions shown to the users, to show the spending for every userID.

Left join

A left join returns all values from the left relation and the matched values from the right table, or appends NULL if there is no match. Also referred to as a left outer join.

Syntax:

Right join

A right join returns all values from the right relation and the matched values from the left relation, or appends NULL if there is no match. It is also referred to as a right outer join.

Syntax:

Full join

A full join returns all values from both relations, appending NULL values on the side that does not have a match. It is also referred to as a full outer join.

Syntax:

Cross join

A cross join returns the Cartesian product of two relations. If no WHERE clause is used along with CROSS JOIN, this produces a result set that is the number of rows in the first table multiplied by the number of rows in the second table. If a WHERE clause is included with CROSS JOIN, it functions like an INNER JOIN.

Syntax:

Semi/Anti join

A semi join returns rows from the first table that have a match in the second table, while an anti join returns rows from the first table for which no match is found. One copy of each qualifying row in the first table is returned.

Syntax:

Equi join

An equi join uses an equality operator to match single or multiple column values between the related tables.

Syntax:

JOINs optimizations

Pinot JOINs include the following optimizations:

  • Predicate push-down to individual tables

  • Indexing and pruning to reduce scanning and speed up query processing

  • Smart data layout considerations to minimize data shuffling

Explain Plan (Single-Stage)

Query execution within Pinot is modeled as a sequence of operators that are executed in a pipelined manner to produce the final result. The output of the EXPLAIN PLAN statement can be used to see how queries are being run or to further optimize queries.

Introduction

EXPLAIN PLAN can be run in two modes: verbose and non-verbose (default) via the use of a query option. To enable verbose mode, the query option explainPlanVerbose=true must be passed.

In the non-verbose EXPLAIN PLAN output above, the Operator column describes the operator that Pinot will run, whereas the Operator_Id and Parent_Id columns show the parent-child relationship between operators.

This parent-child relationship shows the order in which operators execute. For example, FILTER_MATCH_ENTIRE_SEGMENT will execute before and pass its output to PROJECT. Similarly, PROJECT will execute before and pass its output to TRANSFORM_PASSTHROUGH operator and so on.

Although the EXPLAIN PLAN query produces tabular output, in this document we show a tree representation of the EXPLAIN PLAN output so that the parent-child relationships between operators are easy to see and users can visualize the bottom-up flow of data in the operator tree execution.

Note the special node with Operator_Id and Parent_Id of -1 called PLAN_START(numSegmentsForThisPlan:1). This node indicates the number of segments that match a given plan. The EXPLAIN PLAN query can be run in verbose mode using the query option explainPlanVerbose=true, which will show the varying deduplicated query plans across all segments on all servers.

EXPLAIN PLAN output should only be used for informational purposes because it is likely to change from version to version as Pinot is further developed and enhanced. Pinot uses a "Scatter Gather" approach to query evaluation (see the Pinot Architecture page for more details). At the Broker, an incoming query is split into several server-level queries for each backend server to evaluate. At each Server, the query is further split into segment-level queries that are evaluated against each segment on the server. The results of segment queries are combined and sent to the Broker. The Broker in turn combines the results from all the Servers and sends the final results back to the user. Note that if the EXPLAIN PLAN query runs without the verbose mode enabled, a single plan will be returned (the heuristic used is to return the deepest plan tree) and this may not be an accurate representation of all plans across all segments. Different segments may execute the plan in a slightly different way.

Reading the EXPLAIN PLAN output from bottom to top will show how data flows from a table to query results. In the example shown above, the FILTER_MATCH_ENTIRE_SEGMENT operator shows that all 97889 records of the segment matched the query. The DOC_ID_SET over the filter operator gets the set of document IDs matching the filter operator. The PROJECT operator over the DOC_ID_SET operator pulls only those columns that were referenced in the query. The TRANSFORM_PASSTHROUGH operator just passes the column data from the PROJECT operator to the SELECT operator. At SELECT, the query has been successfully evaluated against one segment. Results from different data segments are then combined (COMBINE_SELECT) and sent to the Broker. The Broker combines and reduces the results from different servers (BROKER_REDUCE) into a final result that is sent to the user. The PLAN_START(numSegmentsForThisPlan:1) node indicates that a single segment matched this query plan; if verbose mode is enabled, many plans can be returned and each will contain a node indicating the number of matched segments.

The rest of this document illustrates the EXPLAIN PLAN output with examples and describes the operators that show up in the output of the EXPLAIN PLAN.

EXPLAIN PLAN using verbose mode for a query that evaluates filters with and without index

Since verbose mode is enabled, the EXPLAIN PLAN output returns two plans matching one segment each (assuming 2 segments for this table). The first EXPLAIN PLAN output above shows that Pinot used an inverted index to evaluate the predicate "playerID = 'aardsda01'" (FILTER_INVERTED_INDEX). The result was then fully scanned (FILTER_FULL_SCAN) to evaluate the second predicate "playerName = 'David Allan'". Note that the two predicates are being combined using AND in the query; hence, only the data that satisfied the first predicate needs to be scanned for evaluating the second predicate. However, if the predicates were being combined using OR, the query would run very slowly because the entire "playerName" column would need to be scanned from top to bottom to look for values satisfying the second predicate. To improve query efficiency in such cases, one should consider indexing the "playerName" column as well. The second plan output shows a FILTER_EMPTY indicating that no matching documents were found for one segment.

EXPLAIN PLAN ON GROUP BY QUERY

The EXPLAIN PLAN output above shows how GROUP BY queries are evaluated in Pinot. GROUP BY results are created on the server (AGGREGATE_GROUPBY_ORDERBY) for each segment on the server. The server then combines segment-level GROUP BY results (COMBINE_GROUPBY_ORDERBY) and sends the combined result to the Broker. The Broker combines the GROUP BY results from all the servers to produce the final result, which is sent to the user. Note that the COMBINE_SELECT operator from the previous query was not used here; instead a different COMBINE_GROUPBY_ORDERBY operator was used. Depending upon the type of query, different combine operators such as COMBINE_DISTINCT and COMBINE_ORDERBY may be seen.

EXPLAIN PLAN OPERATORS

The root operator of the EXPLAIN PLAN output is BROKER_REDUCE. BROKER_REDUCE indicates that the Broker is processing and combining server results into the final result that is sent back to the user. BROKER_REDUCE has a COMBINE operator as its child. The Combine operator combines the results of query evaluation from each segment on the server and sends the combined result to the Broker. There are several combine operators (COMBINE_GROUPBY_ORDERBY, COMBINE_DISTINCT, COMBINE_AGGREGATE, etc.) that run depending upon the operations being performed by the query. Under the Combine operator, either a Select (SELECT, SELECT_ORDERBY, etc.) or an Aggregate (AGGREGATE, AGGREGATE_GROUPBY_ORDERBY, etc.) can appear. An Aggregate operator is present when the query performs aggregation (count(*), min, max, etc.); otherwise, a Select operator is present. If the query performs scalar transformations (Addition, Multiplication, Concat, etc.), then a TRANSFORM operator will appear under the SELECT operator. Often a TRANSFORM_PASSTHROUGH operator is present instead of the TRANSFORM operator. TRANSFORM_PASSTHROUGH just passes results from operators that appear lower in the operator execution hierarchy to the SELECT operator. DOC_ID_SET operators usually appear above FILTER operators and indicate that a list of matching document IDs is assessed. FILTER operators usually appear at the bottom of the operator hierarchy and show index use. For example, the presence of FILTER_FULL_SCAN indicates that an index was not used (and hence the query is likely to run relatively slowly). However, if the query used an index, one of the indexed filter operators (FILTER_SORTED_INDEX, FILTER_RANGE_INDEX, FILTER_INVERTED_INDEX, FILTER_JSON_INDEX, etc.) will show up.

EXPLAIN PLAN FOR SELECT playerID, playerName FROM baseballStats

+---------------------------------------------|------------|---------|
| Operator                                    | Operator_Id|Parent_Id|
+---------------------------------------------|------------|---------|
|BROKER_REDUCE(limit:10)                      | 1          | 0       |
|COMBINE_SELECT                               | 2          | 1       |
|PLAN_START(numSegmentsForThisPlan:1)         | -1         | -1      |
|SELECT(selectList:playerID, playerName)      | 3          | 2       |
|TRANSFORM_PASSTHROUGH(playerID, playerName)  | 4          | 3       |
|PROJECT(playerName, playerID)                | 5          | 4       |
|DOC_ID_SET                                   | 6          | 5       |
|FILTER_MATCH_ENTIRE_SEGMENT(docs:97889)      | 7          | 6       |
+---------------------------------------------|------------|---------|

ATowAAABAAAAAAA7ABAAAABtB24HbwdwB3EHcgdzB3QHdQd2B3cHeAd5B3oHewd8B30Hfgd/B4AHgQeCB4MHhAeFB4YHhweIB4kHigeLB4wHjQeOB48HkAeRB5IHkweUB5UHlgeXB5gHmQeaB5sHnAedB54HnwegB6EHogejB6QHpQemB6cHqAc=

AwIBBQAAAAL/////////////////////

AwIBBQAAAAz///////////////////////////////////////////////9///////f///9/////7///////////////+/////////////////////////////////////////////8=

AwIBBwAAAA/////////////////////////////////////////////////////////////////////////////////////////////////////////9///////////////////////////////////////////////7//////8=

Query hints for fine-tuning JOIN operations.
Supported JOIN types and examples
JOIN optimizations
INNER JOIN
The PLAN_START(numSegmentsForThisPlan:1) entry indicates that a single segment matched this query plan. If verbose mode is enabled, many plans can be returned, and each will contain a node indicating the number of matched segments.
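For instance, a query that applies a scalar transformation and filters on an indexed column might be explained like this (a sketch; CONCAT is taken from the transformation functions listed later on this page, and the plan nodes you actually see depend on the indexes present on the table):

EXPLAIN PLAN FOR
  SELECT CONCAT(playerName, teamID, '-') AS playerTeam
    FROM baseballStats
   WHERE yearID > 2000

Here you would expect a TRANSFORM node (rather than TRANSFORM_PASSTHROUGH) under the SELECT operator, and a FILTER_RANGE_INDEX node under DOC_ID_SET if yearID has a range index, or FILTER_FULL_SCAN otherwise.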
SELECT ID_SET(yearID)
FROM baseballStats
WHERE teamID = 'WS1'
SELECT ID_SET(playerName, 'expectedInsertions=10')
FROM baseballStats
WHERE teamID = 'WS1'
SELECT ID_SET(playerName, 'expectedInsertions=100')
FROM baseballStats
WHERE teamID = 'WS1'
SELECT ID_SET(playerName, 'expectedInsertions=100;fpp=0.01')
FROM baseballStats
WHERE teamID = 'WS1'
SELECT yearID, count(*) 
FROM baseballStats 
WHERE IN_ID_SET(
 yearID,   
 'ATowAAABAAAAAAA7ABAAAABtB24HbwdwB3EHcgdzB3QHdQd2B3cHeAd5B3oHewd8B30Hfgd/B4AHgQeCB4MHhAeFB4YHhweIB4kHigeLB4wHjQeOB48HkAeRB5IHkweUB5UHlgeXB5gHmQeaB5sHnAedB54HnwegB6EHogejB6QHpQemB6cHqAc='
  ) = 1 
GROUP BY yearID
SELECT yearID, count(*) 
FROM baseballStats 
WHERE IN_ID_SET(
  yearID,   
  'ATowAAABAAAAAAA7ABAAAABtB24HbwdwB3EHcgdzB3QHdQd2B3cHeAd5B3oHewd8B30Hfgd/B4AHgQeCB4MHhAeFB4YHhweIB4kHigeLB4wHjQeOB48HkAeRB5IHkweUB5UHlgeXB5gHmQeaB5sHnAedB54HnwegB6EHogejB6QHpQemB6cHqAc='
  ) = 0 
GROUP BY yearID
SELECT yearID, count(*) 
FROM baseballStats 
WHERE IN_SUBQUERY(
  yearID, 
  'SELECT ID_SET(yearID) FROM baseballStats WHERE teamID = ''WS1'''
  ) = 1
GROUP BY yearID  
SELECT yearID, count(*) 
FROM baseballStats 
WHERE IN_SUBQUERY(
  yearID, 
  'SELECT ID_SET(yearID) FROM baseballStats WHERE teamID = ''WS1'''
  ) = 0
GROUP BY yearID  
SELECT yearID, count(*) 
FROM baseballStats 
WHERE IN_PARTITIONED_SUBQUERY(
  yearID, 
  'SELECT ID_SET(yearID) FROM baseballStats WHERE teamID = ''WS1'''
  ) = 1
GROUP BY yearID  
SELECT yearID, count(*) 
FROM baseballStats 
WHERE IN_PARTITIONED_SUBQUERY(
  yearID, 
  'SELECT ID_SET(yearID) FROM baseballStats WHERE teamID = ''WS1'''
  ) = 0
GROUP BY yearID  
SELECT myTable.column1, myTable.column2, myOtherTable.column1, ....
FROM myTable INNER JOIN myOtherTable
ON myTable.matching_column = myOtherTable.matching_column;
SELECT 
  p.userID, t.spending_val

FROM promotion AS p JOIN transaction AS t 
  ON p.userID = t.userID

WHERE
  p.promotion_val > 10
  AND t.transaction_type IN ('CASH', 'CREDIT')  
  AND t.transaction_epoch >= p.promotion_start_epoch
  AND t.transaction_epoch < p.promotion_end_epoch  
SELECT myTable.column1, myTable.column2, myOtherTable.column1, ....
FROM myTable LEFT JOIN myOtherTable
ON myTable.matching_column = myOtherTable.matching_column;
SELECT table1.column1,table1.column2,table2.column1,....
FROM table1 
RIGHT JOIN table2
ON table1.matching_column = table2.matching_column;
SELECT table1.column1,table1.column2,table2.column1,....
FROM table1 
FULL JOIN table2
ON table1.matching_column = table2.matching_column;
SELECT * 
FROM table1 
CROSS JOIN table2;
SELECT  myTable.column1, myOtherTable.column1
 FROM  myOtherTable
 WHERE  NOT EXISTS [ join_criteria ]
SELECT *
FROM table1 
JOIN table2
[ON (join_condition)]

OR

SELECT column_list 
FROM table1, table2....
WHERE table1.column_name =
table2.column_name; 
BROKER_REDUCE(limit:10)
└── COMBINE_SELECT
    └── PLAN_START(numSegmentsForThisPlan:1)
        └── SELECT(selectList:playerID, playerName)
            └── TRANSFORM_PASSTHROUGH(playerID, playerName)
                └── PROJECT(playerName, playerID)
                    └── DOC_ID_SET
                        └── FILTER_MATCH_ENTIRE_SEGMENT(docs:97889)
SET explainPlanVerbose=true;
EXPLAIN PLAN FOR
  SELECT playerID, playerName
    FROM baseballStats
   WHERE playerID = 'aardsda01' AND playerName = 'David Allan'

BROKER_REDUCE(limit:10)
└── COMBINE_SELECT
    └── PLAN_START(numSegmentsForThisPlan:1)
        └── SELECT(selectList:playerID, playerName)
            └── TRANSFORM_PASSTHROUGH(playerID, playerName)
                └── PROJECT(playerName, playerID)
                    └── DOC_ID_SET
                        └── FILTER_AND
                            ├── FILTER_INVERTED_INDEX(indexLookUp:inverted_index,operator:EQ,predicate:playerID = 'aardsda01')
                            └── FILTER_FULL_SCAN(operator:EQ,predicate:playerName = 'David Allan')
    └── PLAN_START(numSegmentsForThisPlan:1)
        └── SELECT(selectList:playerID, playerName)
            └── TRANSFORM_PASSTHROUGH(playerID, playerName)
                └── PROJECT(playerName, playerID)
                    └── DOC_ID_SET
                        └── FILTER_EMPTY
EXPLAIN PLAN FOR
  SELECT playerID, count(*)
    FROM baseballStats
   WHERE playerID != 'aardsda01'
   GROUP BY playerID

BROKER_REDUCE(limit:10)
└── COMBINE_GROUPBY_ORDERBY
    └── PLAN_START(numSegmentsForThisPlan:1)
        └── AGGREGATE_GROUPBY_ORDERBY(groupKeys:playerID, aggregations:count(*))
            └── TRANSFORM_PASSTHROUGH(playerID)
                └── PROJECT(playerID)
                    └── DOC_ID_SET
                        └── FILTER_INVERTED_INDEX(indexLookUp:inverted_index,operator:NOT_EQ,predicate:playerID != 'aardsda01')

Lookup UDF Join


circle-info

Lookup UDF Join is only supported with the single-stage query engine (v1). For more information about using JOINs with the multi-stage query engine, see JOINs.

The lookup UDF is used to get dimension data via primary key from a dimension table, enabling decoration-join functionality. The lookup UDF can only be used with a dimension table in Pinot.

hashtag
Syntax

The UDF function syntax is listed below:

  • dimTable Name of the dimension table to perform the lookup on.

  • dimColToLookUp The column name of the dimension table to be retrieved to decorate our result.

  • dimJoinKey The column name on which we want to perform the lookup, i.e. the join column name for the dimension table.

  • factJoinKey The column name against which we want to perform the lookup, i.e. the join column name for the fact table.

Note that:

  1. All the dim-table-related expressions are expressed as literal strings. This is a limitation of the LOOKUP UDF syntax: we cannot express a column identifier that doesn't exist in the query's main (fact) table (see the sketch below).

  2. The syntax definition [ '''dimJoinKey''', factJoinKey ]* indicates that if there are multiple dim partition columns, multiple join key pairs should be expressed.
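For instance, in the sketch below (based on the dimBaseballTeams example later on this page), the first three arguments are string literals naming the dimension table, the column to retrieve, and the dimension-side join key, while the final argument teamID is a plain column identifier from the fact table in the FROM clause:

SELECT
  playerName,
  LOOKUP('dimBaseballTeams', 'teamName', 'teamID', teamID) AS teamName
FROM baseballStats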

hashtag
Examples

Here are some examples:

hashtag
Single-partition-key-column Example

Consider the fact table baseballStats:

Column | Type
playerID | STRING
yearID | INT
teamID | STRING
league | STRING
playerName | STRING
playerStint | INT
numberOfGames | INT
numberOfGamesAsBatter | INT
AtBatting | INT
runs | INT

and the dimension table dimBaseballTeams:

Column | Type
teamID | STRING
teamName | STRING
teamAddress | STRING

Several acceptable queries are:

hashtag
Dim-Fact LOOKUP example

playerName | teamID | teamName | teamAddress
David Allan | BOS | Boston Red Caps/Beaneaters (from 1876–1900) or Boston Red Sox (since 1953) | 4 Jersey Street, Boston, MA
David Allan | CHA | null | null
David Allan | SEA | Seattle Mariners (since 1977) or Seattle Pilots (1969) | 1250 First Avenue South, Seattle, WA

hashtag
Self LOOKUP example

teamID | nameFromLocal | nameFromLookup
ANA | Anaheim Angels | Anaheim Angels
ARI | Arizona Diamondbacks | Arizona Diamondbacks
ATL | Atlanta Braves | Atlanta Braves
BAL | Baltimore Orioles (original- 1901–1902 current- since 1954) | Baltimore Orioles (original- 1901–1902 current- since 1954)
SEA | Seattle Mariners (since 1977) or Seattle Pilots (1969) | Seattle Mariners (since 1977) or Seattle Pilots (1969)

hashtag
Complex-partition-key-columns Example

Consider a single dimension table with schema:

BILLING SCHEMA

Column | Type
customerId | INT
creditHistory | STRING
firstName | STRING
lastName | STRING
isCarOwner | BOOLEAN
city | STRING
maritalStatus | STRING
buildingType | STRING
missedPayment | STRING
billingMonth | STRING

hashtag
Self LOOKUP example

customerId | missedPayment | lookedupCity
341 | Paid | Palo Alto
374 | Paid | Mountain View
398 | Paid | Palo Alto
427 | Paid | Cupertino
435 | Paid | Cupertino

hashtag
Usage FAQ

  • The data return type of the UDF will be that of the dimColToLookUp column type.

  • When multiple primary key columns are used for the dimension table (e.g. a composite primary key), ensure that the order of keys appearing in the lookup() UDF is the same as the order defined in primaryKeyColumns in the dimension table schema, as shown below.
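For example, assuming the billing dimension table used below declares primaryKeyColumns as ["customerId", "creditHistory"] in its schema, the join key pairs passed to lookup() must follow that same order:

SELECT
  customerId,
  missedPayment,
  LOOKUP('billing', 'city', 'customerId', customerId, 'creditHistory', creditHistory) AS lookedupCity
FROM billing

Listing the creditHistory pair before the customerId pair would not match the declared primary key order.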

    lookupUDFSpec:
        LOOKUP
        '('
        '''dimTable'''
        '''dimColToLookup'''
        [ '''dimJoinKey''', factJoinKey ]*
        ')'
    SELECT 
      playerName, 
      teamID, 
      LOOKUP('dimBaseballTeams', 'teamName', 'teamID', teamID) AS teamName, 
      LOOKUP('dimBaseballTeams', 'teamAddress', 'teamID', teamID) AS teamAddress
    FROM baseballStats 
    SELECT 
      teamID, 
      teamName AS nameFromLocal,
      LOOKUP('dimBaseballTeams', 'teamName', 'teamID', teamID) AS nameFromLookup
    FROM dimBaseballTeams
    select 
      customerId,
      missedPayment, 
      LOOKUP('billing', 'city', 'customerId', customerId, 'creditHistory', creditHistory) AS lookedupCity 
    from billing

    Querying JSON data

    To see how JSON data can be queried, assume that we have the following table:

    We also assume that "jsoncolumn" has a Json Indexarrow-up-right on it. Note that the last two rows in the table have different structure than the rest of the rows. In keeping with JSON specification, a JSON column can contain any valid JSON data and doesn't need to adhere to a predefined schema. To pull out the entire JSON document for each row, we can run the query below:

    id
    jsoncolumn

    "101"

    "{"name":{"first":"daffy","last":"duck"},"score":101,"data":["a","b","c","d"]}"

    102"

    To drill down and pull out specific keys within the JSON column, we simply append the JsonPath expression of those keys to the end of the column name.

    id
    last_name
    first_name
    value

    Note that the third column (value) is null for rows with id 106 and 107. This is because these rows have JSON documents that don't have a key with JsonPath $.data[1]. We can filter out these rows.

    id
    last_name
    first_name
    value

    Certain last names (duck and mouse for example) repeat in the data above. We can get a count of each last name by running a GROUP BY query on a JsonPath expression.

    jsoncolumn.name.last
    count(*)

    Also, there is numerical information (jsoncolumn.$.id) embedded within the JSON document. We can extract those numerical values from the JSON data into SQL and sum them up using the query below.

    jsoncolumn.name.last
    sum(jsoncolumn.score)

    hashtag
    JSON_MATCH and JSON_EXTRACT_SCALAR

    Note that the JSON_MATCH function utilizes JsonIndex and can only be used if a JsonIndex is already present on the JSON column. As shown in the examples above, the second argument of the JSON_MATCH operator takes a predicate. This predicate is evaluated against the JsonIndex and supports the =, !=, IS NULL, and IS NOT NULL operators. Relational operators, such as >, <, >=, and <=, are currently not supported. However, you can combine the use of JSON_MATCH and JSON_EXTRACT_SCALAR (which supports the >, <, >=, and <= operators) to get the necessary functionality, as shown below.

    jsoncolumn.name.last
    sum(jsoncolumn.score)

    JSON_MATCH function also provides the ability to use wildcard * JsonPath expressions even though it doesn't support full JsonPath expressions.

    last_name
    total

    While JSON_MATCH supports the IS NULL and IS NOT NULL operators, these operators should only be applied to leaf-level path elements, i.e. the predicate JSON_MATCH(jsoncolumn, '"$.data[*]" IS NOT NULL') is not valid since "$.data[*]" does not address a "leaf" element of the path; however, JSON_MATCH(jsoncolumn, '"$.data[0]" IS NOT NULL') is valid since "$.data[0]" unambiguously identifies a leaf element of the path.

    JSON_EXTRACT_SCALAR does not utilize JsonIndex and therefore performs slower than JSON_MATCH, which does utilize JsonIndex. However, JSON_EXTRACT_SCALAR supports a wider range of JsonPath expressions and operators. To make the best use of fast index access (JSON_MATCH) along with JsonPath expressions (JSON_EXTRACT_SCALAR), you can combine the use of these two functions in the WHERE clause.

    hashtag
    JSON_MATCH syntax

    The second argument of the JSON_MATCH function is a boolean expression in string form. This section shows how to correctly write the second argument of JSON_MATCH. Let's assume we want to search the JSON array data for the values k and j. This can be done with the following predicate:

    To convert this predicate into string form for use in JSON_MATCH, we first turn the left side of the predicate into an identifier by enclosing it in double quotes:

    Next, the literals in the predicate also need to be enclosed by '. Any existing ' need to be escaped as well. This gives us:

    Finally, we need to create a string out of the entire expression above by enclosing it in ':

    Now we have the string representation of the original predicate and this can be used in JSON_MATCH function:

    Table myTable:
      id        INTEGER
      jsoncolumn    JSON 
    
    Table data:
    101,{"name":{"first":"daffy"\,"last":"duck"}\,"score":101\,"data":["a"\,"b"\,"c"\,"d"]}
    102,{"name":{"first":"donald"\,"last":"duck"}\,"score":102\,"data":["a"\,"b"\,"e"\,"f"]}
    103,{"name":{"first":"mickey"\,"last":"mouse"}\,"score":103\,"data":["a"\,"b"\,"g"\,"h"]}
    104,{"name":{"first":"minnie"\,"last":"mouse"}\,"score":104\,"data":["a"\,"b"\,"i"\,"j"]}
    105,{"name":{"first":"goofy"\,"last":"dwag"}\,"score":104\,"data":["a"\,"b"\,"i"\,"j"]}
    106,{"person":{"name":"daffy duck"\,"companies":[{"name":"n1"\,"title":"t1"}\,{"name":"n2"\,"title":"t2"}]}}
    107,{"person":{"name":"scrooge mcduck"\,"companies":[{"name":"n1"\,"title":"t1"}\,{"name":"n2"\,"title":"t2"}]}}
    SELECT id, jsoncolumn 
      FROM myTable

    mickey

    b

    104

    mouse

    minnie

    b

    105

    dwag

    goofy

    b

    106

    null

    null

    null

    107

    null

    null

    null

    mickey

    b

    104

    mouse

    minnie

    b

    105

    dwag

    goofy

    b


    "{"name":{"first":"donald","last":"duck"},"score":102,"data":["a","b","e","f"]}

    "103"

    "{"name":{"first":"mickey","last":"mouse"},"score":103,"data":["a","b","g","h"]}

    "104"

    "{"name":{"first":"minnie","last":"mouse"},"score":104,"data":["a","b","i","j"]}"

    "105"

    "{"name":{"first":"goofy","last":"dwag"},"score":104,"data":["a","b","i","j"]}"

    "106"

    "{"person":{"name":"daffy duck","companies":[{"name":"n1","title":"t1"},{"name":"n2","title":"t2"}]}}"

    "107"

    "{"person":{"name":"scrooge mcduck","companies":[{"name":"n1","title":"t1"},{"name":"n2","title":"t2"}]}}"

    101

    duck

    daffy

    b

    102

    duck

    donald

    b

    103

    101

    duck

    daffy

    b

    102

    duck

    donald

    b

    103

    "mouse"

    "2"

    "duck"

    "2"

    "dwag"

    "1"

    "mouse"

    "207"

    "dwag"

    "104"

    "duck"

    "203"

    "mouse"

    "207"

    "dwag"

    "104"

    "duck"

    "102"

    mouse

    mouse

    SELECT id,
           json_extract_scalar(jsoncolumn, '$.name.last', 'STRING', 'null') last_name,
           json_extract_scalar(jsoncolumn, '$.name.first', 'STRING', 'null') first_name,
           json_extract_scalar(jsoncolumn, '$.data[1]', 'STRING', 'null') value
      FROM myTable
    SELECT id,
           json_extract_scalar(jsoncolumn, '$.name.last', 'STRING', 'null') last_name,
           json_extract_scalar(jsoncolumn, '$.name.first', 'STRING', 'null') first_name,
           json_extract_scalar(jsoncolumn, '$.data[1]', 'STRING', 'null') value
      FROM myTable
     WHERE JSON_MATCH(jsoncolumn, '"$.data[1]" IS NOT NULL')
      SELECT json_extract_scalar(jsoncolumn, '$.name.last', 'STRING', 'null') last_name,
             count(*)
        FROM myTable
       WHERE JSON_MATCH(jsoncolumn, '"$.data[1]" IS NOT NULL')
    GROUP BY json_extract_scalar(jsoncolumn, '$.name.last', 'STRING', 'null')
    ORDER BY 2 DESC
      SELECT json_extract_scalar(jsoncolumn, '$.name.last', 'STRING', 'null') last_name,
             sum(json_extract_scalar(jsoncolumn, '$.id', 'INT', 0)) total
        FROM myTable
       WHERE JSON_MATCH(jsoncolumn, '"$.name.last" IS NOT NULL')
    GROUP BY json_extract_scalar(jsoncolumn, '$.name.last', 'STRING', 'null')
      SELECT json_extract_scalar(jsoncolumn, '$.name.last', 'STRING', 'null') last_name,
             sum(json_extract_scalar(jsoncolumn, '$.id', 'INT', 0)) total
        FROM myTable
       WHERE JSON_MATCH(jsoncolumn, '"$.name.last" IS NOT NULL') AND json_extract_scalar(jsoncolumn, '$.id', 'INT', 0) > 102
    GROUP BY json_extract_scalar(jsoncolumn, '$.name.last', 'STRING', 'null')
      SELECT json_extract_scalar(jsoncolumn, '$.name.last', 'STRING', 'null') last_name,
             json_extract_scalar(jsoncolumn, '$.id', 'INT', 0) total
        FROM myTable
       WHERE JSON_MATCH(jsoncolumn, '"$.data[*]" = ''f''')
    GROUP BY json_extract_scalar(jsoncolumn, '$.name.last', 'STRING', 'null')
    data[0] IN ('k', 'j')
    "data[0]" IN ('k', 'j')
    "data[0]" IN (''k'', ''j'')
    '"data[0]" IN (''k'', ''j'')'
       WHERE JSON_MATCH(jsoncolumn, '"data[0]" IN (''k'', ''j'')')

    Window aggregate

    Use window aggregates to compute averages, sort, rank, or count items, calculate sums, and find minimum or maximum values across a window.

    circle-info

    Important: To query using window functions, you must enable Pinot's multi-stage query engine (v2). See how to enable and use the multi-stage query engine (v2).

    hashtag
    Window aggregate overview

    This is an overview of the window aggregate feature.

    hashtag
    Window aggregate syntax

    Pinot's window function (windowedAggCall) includes the following syntax definition:

    • windowAggCall refers to the actual windowed aggregation operation.

    • windowAggFunction refers to the aggregation function used inside a windowed aggregate; see the supported window aggregate functions.

    • window is the window definition / windowing mechanism; see the window mechanism (OVER clause) section.

    You can jump to the examples section to see more concrete use cases of window aggregates in Pinot.

    hashtag
    Example window aggregate query layout

    The following query shows the complete components of the window function. Note, PARTITION BY and ORDER BY are optional.

    hashtag
    Window mechanism (OVER clause)

    hashtag
    Partition by clause

    • If a PARTITION BY clause is specified, the intermediate results will be grouped into different partitions based on the values of the columns appearing in the PARTITION BY clause.

    • If the PARTITION BY clause isn’t specified, the whole result will be regarded as one big partition, i.e. there is only one partition in the result set.

    hashtag
    Order by clause

    • If an ORDER BY clause is specified, all the rows within the same partition will be sorted based on the values of the columns appearing in the window ORDER BY clause. The ORDER BY clause decides the order in which the rows within a partition are to be processed.

    • If no ORDER BY clause is specified while a PARTITION BY clause is specified, the order of the rows is undefined. To order the output, use a global ORDER BY clause in the query.
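    For example, using the payment table from the query examples further down, the presence or absence of a window ORDER BY changes what SUM computes per row (a minimal sketch):

    SELECT customer_id, payment_date, amount,
           SUM(amount) OVER(PARTITION BY customer_id ORDER BY payment_date) AS running_total,
           SUM(amount) OVER(PARTITION BY customer_id) AS partition_total
    FROM payment;

    With the window ORDER BY, each row gets a running total up to that row; without it, each row gets the total for its whole partition.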

    hashtag
    Frame clause

    circle-exclamation

    Important note: in release 1.0.0, window aggregates only support UNBOUNDED PRECEDING, UNBOUNDED FOLLOWING, and CURRENT ROW. Arbitrary frame expressions and row counts have not been implemented yet.

    • {RANGE|ROWS} frame_start OR

    • {RANGE|ROWS} BETWEEN frame_start AND frame_end; frame_start and frame_end can be any of:

      • UNBOUNDED PRECEDING

      • expression PRECEDING (may only be allowed in ROWS mode, depending on the database)

      • CURRENT ROW

      • expression FOLLOWING (may only be allowed in ROWS mode, depending on the database)

      • UNBOUNDED FOLLOWING

    • If no FRAME clause is specified, the default frame behavior depends on whether ORDER BY is present:

      • If an ORDER BY clause is specified, the default is to calculate the aggregation from the beginning of the partition to the current row, i.e. UNBOUNDED PRECEDING to CURRENT ROW.

      • If only a PARTITION BY clause is present (no ORDER BY), the default is to aggregate over the whole partition, i.e. UNBOUNDED PRECEDING to UNBOUNDED FOLLOWING.

    If there is no FRAME, no PARTITION BY, and no ORDER BY clause specified in the OVER clause (empty OVER), the whole result set is regarded as one partition, and there's one frame in the window.

    The OVER clause applies a specified window aggregate function to compute values over a group of rows and return a single result for each row. The OVER clause specifies how the rows are arranged and how the aggregation is done on those rows.

    Inside the over clause, there are three optional components: PARTITION BY clause, ORDER BY clause, and FRAME clause.

    hashtag
    Window aggregate functions

    Window aggregate functions are commonly used to do the following:

    • Compute averages

    • Rank items

    • Count items

    • Calculate sums

    • Find minimum or maximum values

    Supported window aggregate functions are listed in the following table.

    Function | Description | Example | Default Value When No Record Selected
    COUNT | Returns the count of the records as Long | COUNT(*) | 0
    MIN | Returns the minimum value of a numeric column as Double | MIN(playerScore) | Double.POSITIVE_INFINITY
    MAX | Returns the maximum value of a numeric column as Double | MAX(playerScore) | Double.NEGATIVE_INFINITY
    SUM | Returns the sum of the values for a numeric column as Double | SUM(playerScore) | 0
    AVG | Returns the average of the values for a numeric column as a Double over the specified number of rows or partition (if applicable) | AVG(playerScore) | Double.NEGATIVE_INFINITY
    BOOL_AND | Returns true if all input values are true, otherwise false | |
    BOOL_OR | Returns true if at least one input value is true, otherwise false | |
    ROW_NUMBER | Assigns a unique row number to all the rows in a specified table | ROW_NUMBER() | 0
    LEAD | Provides access to a subsequent row within the same result set, without the need for a self-join | LEAD(column_name, offset, default_value) |
    LAG | Provides access to a previous row within the same result set, without the need for a self-join | LAG(column_name, offset, default_value) |
    FIRST_VALUE | Returns the first value in an ordered set of values within the window frame | FIRST_VALUE(salary) |
    LAST_VALUE | Returns the last value in an ordered set of values within the window frame | LAST_VALUE(salary) |

    hashtag
    Window aggregate query examples

    hashtag
    Sum transactions by customer ID

    Calculate the rolling sum transaction amount ordered by the payment date for each customer ID (note, the default frame here is UNBOUNDED PRECEDING and CURRENT ROW).

    customer_id
    payment_date
    amount
    sum

    hashtag
    Find the minimum or maximum transaction by customer ID

    Calculate the least (use MIN()) or most expensive (use MAX()) transaction made by each customer comparing all transactions made by the customer (default frame here is UNBOUNDED PRECEDING and UNBOUNDED FOLLOWING). The following query shows how to find the least expensive transaction.

    customer_id
    payment_date
    amount
    min

    hashtag
    Find the average transaction amount by customer ID

    Calculate a customer’s average transaction amount for all transactions they’ve made (default frame here is UNBOUNDED PRECEDING and UNBOUNDED FOLLOWING).

    customer_id
    payment_date
    amount
    avg

    hashtag
    Rank year-to-date sales for a sales team

    Use ROW_NUMBER() to rank team members by their year-to-date sales (default frame here is UNBOUNDED PRECEDING and UNBOUNDED FOLLOWING).

    Row
    FirstName
    LastName
    Total sales YTD

    hashtag
    Count the number of transactions by customer ID

    Count the number of transactions made by each customer (default frame here is UNBOUNDED PRECEDING and UNBOUNDED FOLLOWING).

    customer_id
    payment_date
    amount
    count

    GapFill Function For Time-Series Dataset

    circle-info

    GapFill Function is only supported with the single-stage query engine (v1).

    Many datasets are time series in nature, tracking the state change of an entity over time. The granularity of recorded data points might be sparse, or events could be missing due to network and other device issues in an IoT environment. But analytics applications that track the state change of these entities over time might query for values at a lower granularity than the metric interval.

    Here is a sample dataset tracking the status of parking lots in a parking space.



  • 1

    2023-02-16 13:47:23.996577

    4.99

    21.96

    2

    2023-02-17 19:23:24.996577

    2.99

    2.99

    2

    2023-02-17 19:23:24.996577

    0.99

    3.98

    3

    2023-02-16 00:02:31.996577

    8.99

    8.99

    3

    2023-02-16 13:47:36.996577

    6.99

    15.98

    3

    2023-02-17 03:43:41.996577

    6.99

    22.97

    4

    2023-02-15 07:59:54.996577

    4.99

    4.99

    4

    2023-02-16 06:37:06.996577

    0.99

    5.98

    2

    2023-04-30 04:34:36.996577

    4.99

    4.99

    2

    2023-04-30 12:16:09.996577

    10.99

    4.99

    3

    2023-03-23 05:38:40.996577

    2.99

    2.99

    3

    2023-04-07 08:51:51.996577

    3.99

    2.99

    3

    3 | 2023-04-08 11:15:37.996577

    4.99

    2.99

    2

    2023-04-30 04:34:36.996577

    4.99

    7.99

    2

    2023-04-30 12:16:09.996577

    10.99

    7.99

    3

    2023-03-23 05:38:40.996577

    2.99

    3.99

    3

    2023-04-07 08:51:51.996577

    3.99

    3.99

    3

    2023-04-08 11:15:37.996577

    4.99

    3.99

    4

    Dane

    Scott

    1251358.72

    2

    2023-04-07 08:51:51.996577

    12.35

    3

    2

    2023-04-08 11:15:37.996577

    8.29

    3


    1

    2023-02-14 23:22:38.996577

    5.99

    5.99

    1

    2023-02-15 16:31:19.996577

    0.99

    6.98

    1

    2023-02-15 19:37:12.996577

    9.99

    1

    2023-02-14 23:22:38.996577

    5.99

    0.99

    1

    2023-02-15 16:31:19.996577

    0.99

    0.99

    1

    2023-02-15 19:37:12.996577

    9.99

    1

    2023-02-14 23:22:38.996577

    5.99

    5.66

    1

    2023-02-15 16:31:19.996577

    0.99

    5.66

    1

    2023-02-15 19:37:12.996577

    9.99

    1

    Joe

    Smith

    2251368.34

    2

    Alice

    Davis

    2151341.64

    3

    James

    Jones

    1

    2023-02-14 23:22:38.99657

    10.99

    2

    1

    2023-02-15 16:31:19.996577

    8.99

    2

    2

    2023-04-30 04:34:36.996577

    23.50


    16.97

    0.99
    5.66
    1551363.54

    3

    lotId
    event_time
    is_occupied

    P1

    2021-10-01 09:01:00.000

    1

    P2

    2021-10-01 09:17:00.000

    1

    P1

    2021-10-01 09:33:00.000

    0

    P1

    2021-10-01 09:47:00.000

    1

    P3

    2021-10-01 10:05:00.000

    We want to find the total number of parking lots that are occupied over a period of time, which is a common use case for a company that manages parking spaces.

    Let us take 30 minutes' time bucket as an example:

    timeBucket/lotId
    P1
    P2
    P3

    2021-10-01 09:00:00.000

    1

    1

    2021-10-01 09:30:00.000

    0,1

    2021-10-01 10:00:00.000

    If you look at the above table, you will see a lot of missing data for parking lots inside the time buckets. In order to calculate the number of occupied parking lots per time bucket, we need to gap-fill the missing data.

    hashtag
    The Ways of Gap Filling the Data

    There are two ways of gap filling the data: FILL_PREVIOUS_VALUE and FILL_DEFAULT_VALUE.

    FILL_PREVIOUS_VALUE means the missing data will be filled with the previous value for the specific entity (in this case, the parking lot) if a previous value exists; otherwise, it will be filled with the default value.

    FILL_DEFAULT_VALUE means the missing data will be filled with the default value. For numeric columns, the default value is 0. For BOOLEAN columns, the default value is false. For TIMESTAMP, it is January 1, 1970, 00:00:00 GMT. For STRING, JSON, and BYTES, it is the empty string. For array-typed columns, it is the empty array.
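    For instance, to fill missing buckets with the type default instead of the previous value, the FILL argument in the gapfill queries below can be switched to FILL_DEFAULT_VALUE (a sketch based on the same parking_data example):

    GAPFILL(time_col,'1:MILLISECONDS:SIMPLE_DATE_FORMAT:yyyy-MM-dd HH:mm:ss.SSS','2021-10-01 09:00:00.000',
            '2021-10-01 12:00:00.000','30:MINUTES', FILL(status, 'FILL_DEFAULT_VALUE'),
            TIMESERIESON(lotId))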

    We will leverage the following query to calculate the total occupied parking lots per time bucket.

    hashtag
    Aggregation/Gapfill/Aggregation

    hashtag
    Query Syntax

    hashtag
    Workflow

    The innermost SQL will convert the raw event table to the following table.

    lotId
    event_time
    is_occupied

    P1

    2021-10-01 09:00:00.000

    1

    P2

    2021-10-01 09:00:00.000

    1

    P1

    2021-10-01 09:30:00.000

    1

    The second-most-nested SQL will gap-fill the returned data as follows:

    timeBucket/lotId
    P1
    P2
    P3

    2021-10-01 09:00:00.000

    1

    1

    0

    2021-10-01 09:30:00.000

    1

    1

    0

    2021-10-01 10:00:00.000

    The outermost query will aggregate the gapfilled data as follows:

    timeBucket
    totalNumOfOccuppiedSlots

    2021-10-01 09:00:00.000

    2

    2021-10-01 09:30:00.000

    2

    2021-10-01 10:00:00.000

    3

    2021-10-01 10:30:00.000

    2

    2021-10-01 11:00:00.000

    1

    One assumption made here is that the raw data is sorted by timestamp. The gapfill and post-gapfill aggregation will not sort the data.

    The above example shows the use case where all three steps happen:

    1. The raw data will be aggregated;

    2. The aggregated data will be gapfilled;

    3. The gapfilled data will be aggregated.

    There are three more scenarios we can support.

    hashtag
    Select/Gapfill

    If we want to gapfill the missing data per half an hour time bucket, here is the query:

    hashtag
    Query Syntax

    hashtag
    Workflow

    At first the raw data will be transformed as follows:

    lotId
    event_time
    is_occupied

    P1

    2021-10-01 09:00:00.000

    1

    P2

    2021-10-01 09:00:00.000

    1

    P1

    2021-10-01 09:30:00.000

    0

    Then it will be gapfilled as follows:

    lotId
    event_time
    is_occupied

    P1

    2021-10-01 09:00:00.000

    1

    P2

    2021-10-01 09:00:00.000

    1

    P3

    2021-10-01 09:00:00.000

    0

    hashtag
    Aggregate/Gapfill

    hashtag
    Query Syntax

    hashtag
    Workflow

    The nested sql will convert the raw event table to the following table.

    lotId
    event_time
    is_occupied

    P1

    2021-10-01 09:00:00.000

    1

    P2

    2021-10-01 09:00:00.000

    1

    P1

    2021-10-01 09:30:00.000

    1

    The outer SQL will gap-fill the returned data as follows:

    timeBucket/lotId
    P1
    P2
    P3

    2021-10-01 09:00:00.000

    1

    1

    0

    2021-10-01 09:30:00.000

    1

    1

    0

    2021-10-01 10:00:00.000

    hashtag
    Gapfill/Aggregate

    hashtag
    Query Syntax

    hashtag
    Workflow

    The raw data will first be transformed as follows:

    lotId
    event_time
    is_occupied

    P1

    2021-10-01 09:00:00.000

    1

    P2

    2021-10-01 09:00:00.000

    1

    P1

    2021-10-01 09:30:00.000

    0

    The transformed data will be gap filled as follows:

    lotId
    event_time
    is_occupied

    P1

    2021-10-01 09:00:00.000

    1

    P2

    2021-10-01 09:00:00.000

    1

    P3

    2021-10-01 09:00:00.000

    0

    The aggregation will generate the following table:

    timeBucket
    totalNumOfOccuppiedSlots

    2021-10-01 09:00:00.000

    2

    2021-10-01 09:30:00.000

    2

    2021-10-01 10:00:00.000

    3

    2021-10-01 10:30:00.000

    2

    2021-10-01 11:00:00.000

    1

    windowedAggCall:
          windowAggFunction
          OVER 
          window
    
    windowAggFunction:
          agg '(' [ ALL | DISTINCT ] value [, value ]* ')'
       |
          agg '(' '*' ')'
    
    window:
          '('
          [ PARTITION BY expression [, expression ]* ]
          [ ORDER BY orderItem [, orderItem ]* ]
          [
              RANGE numericOrIntervalExpression { PRECEDING | FOLLOWING }
          |   ROWS numericExpression { PRECEDING | FOLLOWING }
          ]
          ')'
    SELECT FUNC(column1) OVER (PARTITION BY column2 ORDER BY column3)
        FROM tableName
        WHERE filter_clause  
    SELECT customer_id, payment_date, amount, SUM(amount) OVER(PARTITION BY customer_id ORDER BY payment_date) from payment;
    SELECT customer_id, payment_date, amount, MIN(amount) OVER(PARTITION BY customer_id) from payment;
    SELECT customer_id, payment_date, amount, AVG(amount) OVER(PARTITION BY customer_id) from payment;
    SELECT ROW_NUMBER() OVER(ORDER BY SalesYTD DESC) AS Row,
        FirstName, LastName, SalesYTD AS "Total sales YTD"
    FROM Sales.vSalesPerson;
    SELECT customer_id, payment_date, amount, count(amount) OVER(PARTITION BY customer_id) from payment;
    SELECT time_col, SUM(status) AS occupied_slots_count
    FROM (
        SELECT GAPFILL(time_col,'1:MILLISECONDS:SIMPLE_DATE_FORMAT:yyyy-MM-dd HH:mm:ss.SSS','2021-10-01 09:00:00.000',
                       '2021-10-01 12:00:00.000','30:MINUTES', FILL(status, 'FILL_PREVIOUS_VALUE'),
                        TIMESERIESON(lotId)), lotId, status
        FROM (
            SELECT DATETIMECONVERT(event_time,'1:MILLISECONDS:EPOCH',
                   '1:MILLISECONDS:SIMPLE_DATE_FORMAT:yyyy-MM-dd HH:mm:ss.SSS','30:MINUTES') AS time_col,
                   lotId, lastWithTime(is_occupied, event_time, 'INT') AS status
            FROM parking_data
            WHERE event_time >= 1633078800000 AND  event_time <= 1633089600000
            GROUP BY 1, 2
            ORDER BY 1
            LIMIT 100)
        LIMIT 100)
    GROUP BY 1
    LIMIT 100
    SELECT GAPFILL(DATETIMECONVERT(event_time,'1:MILLISECONDS:EPOCH',
                   '1:MILLISECONDS:SIMPLE_DATE_FORMAT:yyyy-MM-dd HH:mm:ss.SSS','30:MINUTES'),
                   '1:MILLISECONDS:SIMPLE_DATE_FORMAT:yyyy-MM-dd HH:mm:ss.SSS','2021-10-01 09:00:00.000',
                   '2021-10-01 12:00:00.000','30:MINUTES', FILL(is_occupied, 'FILL_PREVIOUS_VALUE'),
                   TIMESERIESON(lotId)) AS time_col, lotId, is_occupied
    FROM parking_data
    WHERE event_time >= 1633078800000 AND  event_time <= 1633089600000
    ORDER BY 1
    LIMIT 100
    SELECT GAPFILL(time_col,'1:MILLISECONDS:SIMPLE_DATE_FORMAT:yyyy-MM-dd HH:mm:ss.SSS','2021-10-01 09:00:00.000',
                   '2021-10-01 12:00:00.000','30:MINUTES', FILL(status, 'FILL_PREVIOUS_VALUE'),
                   TIMESERIESON(lotId)), lotId, status
    FROM (
        SELECT DATETIMECONVERT(event_time,'1:MILLISECONDS:EPOCH',
               '1:MILLISECONDS:SIMPLE_DATE_FORMAT:yyyy-MM-dd HH:mm:ss.SSS','30:MINUTES') AS time_col,
               lotId, lastWithTime(is_occupied, event_time, 'INT') AS status
        FROM parking_data
        WHERE event_time >= 1633078800000 AND  event_time <= 1633089600000
        GROUP BY 1, 2
        ORDER BY 1
        LIMIT 100)
    LIMIT 100
    SELECT time_col, SUM(is_occupied) AS occupied_slots_count
    FROM (
        SELECT GAPFILL(DATETIMECONVERT(event_time,'1:MILLISECONDS:EPOCH',
               '1:MILLISECONDS:SIMPLE_DATE_FORMAT:yyyy-MM-dd HH:mm:ss.SSS','30:MINUTES'),
               '1:MILLISECONDS:SIMPLE_DATE_FORMAT:yyyy-MM-dd HH:mm:ss.SSS','2021-10-01 09:00:00.000',
               '2021-10-01 12:00:00.000','30:MINUTES', FILL(is_occupied, 'FILL_PREVIOUS_VALUE'),
               TIMESERIESON(lotId)) AS time_col, lotId, is_occupied
        FROM parking_data
        WHERE event_time >= 1633078800000 AND  event_time <= 1633089600000
        ORDER BY 1
        LIMIT 100)
    GROUP BY 1
    LIMIT 100

    1

    P2

    2021-10-01 10:06:00.000

    0

    P2

    2021-10-01 10:16:00.000

    1

    P2

    2021-10-01 10:31:00.000

    0

    P3

    2021-10-01 11:17:00.000

    0

    P1

    2021-10-01 11:54:00.000

    0

    0,1

    1

    2021-10-01 10:30:00.000

    0

    2021-10-01 11:00:00.000

    0

    2021-10-01 11:30:00.000

    0

    P3

    2021-10-01 10:00:00.000

    1

    P2

    2021-10-01 10:00:00.000

    1

    P2

    2021-10-01 10:30:00.000

    0

    P3

    2021-10-01 11:00:00.000

    0

    P1

    2021-10-01 11:30:00.000

    0

    1

    1

    1

    2021-10-01 10:30:00.000

    1

    0

    1

    2021-10-01 11:00:00.000

    1

    0

    0

    2021-10-01 11:30:00.000

    0

    0

    0

    2021-10-01 11:30:00.000

    0

    P1

    2021-10-01 09:30:00.000

    1

    P3

    2021-10-01 10:00:00.000

    1

    P2

    2021-10-01 10:00:00.000

    0

    P2

    2021-10-01 10:00:00.000

    1

    P2

    2021-10-01 10:30:00.000

    0

    P3

    2021-10-01 11:00:00.000

    0

    P1

    2021-10-01 11:30:00.000

    0

    P1

    2021-10-01 09:30:00.000

    0

    P1

    2021-10-01 09:30:00.000

    1

    P2

    2021-10-01 09:30:00.000

    1

    P3

    2021-10-01 09:30:00.000

    0

    P1

    2021-10-01 10:00:00.000

    1

    P3

    2021-10-01 10:00:00.000

    1

    P2

    2021-10-01 10:00:00.000

    0

    P2

    2021-10-01 10:00:00.000

    1

    P1

    2021-10-01 10:30:00.000

    1

    P2

    2021-10-01 10:30:00.000

    0

    P3

    2021-10-01 10:30:00.000

    1

    P1

    2021-10-01 11:00:00.000

    1

    P2

    2021-10-01 11:00:00.000

    0

    P3

    2021-10-01 11:00:00.000

    0

    P1

    2021-10-01 11:30:00.000

    0

    P2

    2021-10-01 11:30:00.000

    0

    P3

    2021-10-01 11:30:00.000

    0

    P3

    2021-10-01 10:00:00.000

    1

    P2

    2021-10-01 10:00:00.000

    1

    P2

    2021-10-01 10:30:00.000

    0

    P3

    2021-10-01 11:00:00.000

    0

    P1

    2021-10-01 11:30:00.000

    0

    1

    1

    1

    2021-10-01 10:30:00.000

    1

    0

    1

    2021-10-01 11:00:00.000

    1

    0

    0

    2021-10-01 11:30:00.000

    0

    0

    0

    P1

    2021-10-01 09:30:00.000

    1

    P3

    2021-10-01 10:00:00.000

    1

    P2

    2021-10-01 10:00:00.000

    0

    P2

    2021-10-01 10:00:00.000

    1

    P2

    2021-10-01 10:30:00.000

    0

    P3

    2021-10-01 11:00:00.000

    0

    P1

    2021-10-01 11:30:00.000

    0

    P1

    2021-10-01 09:30:00.000

    0

    P1

    2021-10-01 09:30:00.000

    1

    P2

    2021-10-01 09:30:00.000

    1

    P3

    2021-10-01 09:30:00.000

    0

    P1

    2021-10-01 10:00:00.000

    1

    P3

    2021-10-01 10:00:00.000

    1

    P2

    2021-10-01 10:00:00.000

    0

    P2

    2021-10-01 10:00:00.000

    1

    P1

    2021-10-01 10:30:00.000

    1

    P2

    2021-10-01 10:30:00.000

    0

    P3

    2021-10-01 10:30:00.000

    1

    P2

    2021-10-01 10:30:00.000

    0

    P1

    2021-10-01 11:00:00.000

    1

    P2

    2021-10-01 11:00:00.000

    0

    P3

    2021-10-01 11:00:00.000

    0

    P1

    2021-10-01 11:30:00.000

    0

    P2

    2021-10-01 11:30:00.000

    0

    P3

    2021-10-01 11:30:00.000

    0

    2021-10-01 11:30:00.000

    0


    Transformation Functions

    This document contains the list of all the transformation functions supported by Pinot SQL.

    hashtag
    Math Functions

    Function

    ADD(col1, col2, col3...) Sum of at least two values

    SUB(col1, col2) Difference between two values

    MULT(col1, col2, col3...) Product of at least two values

    hashtag
    String Functions

    Multiple string functions are supported out of the box starting from release 0.5.0.

    Function

    hashtag
    DateTime Functions

    Date time functions allow you to perform transformations on columns that contain timestamps or dates.

    Function

    hashtag
    JSON Functions

    hashtag
    Transform Functions

    These functions can only be used in Pinot SQL queries.

    Function

    hashtag
    Scalar Functions

    These functions can be used for column transformation in table ingestion configs.

    Function

    hashtag
    Binary Functions

    Function

    hashtag
    Multi-value Column Functions

    All of the functions mentioned so far only support single-value columns. You can use the following functions to perform operations on multi-value columns.

    Function

    hashtag
    Advanced Queries

    hashtag
    Geospatial Queries

    Pinot supports geospatial queries on columns containing text-based geographies. For more details on the queries and how to enable them, see Geospatial.

    hashtag
    Text Queries

    Pinot supports pattern matching on text-based columns. Only the columns mentioned as text columns in the table config can be queried using this method. For more details on how to enable pattern matching, see Text search support.

    STRPOS(col, find, N) Find the Nth instance of the find string in input. Returns 0 if the input string is empty. Returns -1 if the Nth instance is not found or the input string is null.

    STARTSWITH(col, prefix) Returns true if the column starts with the prefix string.

    REPLACE(col, find, substitute) Replace all instances of find with substitute in input.

    RPAD(col, size, pad) String padded from the right side with pad to reach final size.

    LPAD(col, size, pad) String padded from the left side with pad to reach final size.

    CODEPOINT(col) The Unicode codepoint of the first character of the string.

    CHR(codepoint) The character corresponding to the Unicode codepoint.

    regexpExtract(value, regexp) Extracts values that match the provided regular expression.

    regexpReplace(input, matchRegexp, replaceRegexp, matchStartPos, occurrence, flag) Find and replace a string or regexp pattern with a target string or regexp pattern.

    remove(input, search) Removes all instances of search from string.

    urlEncoding(string) URL-encode a string with UTF-8 format.

    urlDecoding(string) Decode a URL to a plaintext string.

    fromBase64(string) Decode a Base64-encoded string to bytes represented as a hex string.

    toUtf8(string) Decode a UTF8-encoded string to bytes represented as a hex string.

    isSubnetOf(ipPrefix, ipAddress) Checks if ipAddress is in the subnet of the ipPrefix.

    FromDateTime(dateTimeString, pattern) Convert a DateTime string represented by pattern to epoch millis.

    round(timeValue, bucketSize) Round the given time value to the nearest bucket start value.

    now() Return current time as epoch millis.

    ago(period) Return time as epoch millis before the given period (in ISO-8601 duration format).

    timezoneHour(timeZoneId) Returns the hour of the time zone offset.

    timezoneMinute(timeZoneId) Returns the minute of the time zone offset.

    year(tsInMillis) Returns the year from the given epoch millis in UTC timezone.

    year(tsInMillis, timeZoneId) Returns the year from the given epoch millis and timezone id.

    yearOfWeek(tsInMillis) Returns the year of the ISO week from the given epoch millis in UTC timezone. Alias yow is also supported.

    yearOfWeek(tsInMillis, timeZoneId) Returns the year of the ISO week from the given epoch millis and timezone id. Alias yow is also supported.

    quarter(tsInMillis) Returns the quarter of the year from the given epoch millis in UTC timezone. The value ranges from 1 to 4.

    quarter(tsInMillis, timeZoneId) Returns the quarter of the year from the given epoch millis and timezone id. The value ranges from 1 to 4.

    month(tsInMillis) Returns the month of the year from the given epoch millis in UTC timezone. The value ranges from 1 to 12.

    month(tsInMillis, timeZoneId) Returns the month of the year from the given epoch millis and timezone id. The value ranges from 1 to 12.

    week(tsInMillis) Returns the ISO week of the year from the given epoch millis in UTC timezone. The value ranges from 1 to 53. Alias weekOfYear is also supported.

    week(tsInMillis, timeZoneId) Returns the ISO week of the year from the given epoch millis and timezone id. The value ranges from 1 to 53. Alias weekOfYear is also supported.

    dayOfYear(tsInMillis) Returns the day of the year from the given epoch millis in UTC timezone. The value ranges from 1 to 366. Alias doy is also supported.

    dayOfYear(tsInMillis, timeZoneId) Returns the day of the year from the given epoch millis and timezone id. The value ranges from 1 to 366. Alias doy is also supported.

    day(tsInMillis) Returns the day of the month from the given epoch millis in UTC timezone. The value ranges from 1 to 31. Alias dayOfMonth is also supported.

    day(tsInMillis, timeZoneId) Returns the day of the month from the given epoch millis and timezone id. The value ranges from 1 to 31. Alias dayOfMonth is also supported.

    dayOfWeek(tsInMillis) Returns the day of the week from the given epoch millis in UTC timezone. The value ranges from 1 (Monday) to 7 (Sunday). Alias dow is also supported.

    dayOfWeek(tsInMillis, timeZoneId) Returns the day of the week from the given epoch millis and timezone id. The value ranges from 1 (Monday) to 7 (Sunday). Alias dow is also supported.

    hour(tsInMillis) Returns the hour of the day from the given epoch millis in UTC timezone. The value ranges from 0 to 23.

    hour(tsInMillis, timeZoneId) Returns the hour of the day from the given epoch millis and timezone id. The value ranges from 0 to 23.

    minute(tsInMillis) Returns the minute of the hour from the given epoch millis in UTC timezone. The value ranges from 0 to 59.

    minute(tsInMillis, timeZoneId) Returns the minute of the hour from the given epoch millis and timezone id. The value ranges from 0 to 59.

    second(tsInMillis) Returns the second of the minute from the given epoch millis in UTC timezone. The value ranges from 0 to 59.

    second(tsInMillis, timeZoneId) Returns the second of the minute from the given epoch millis and timezone id. The value ranges from 0 to 59.

    millisecond(tsInMillis) Returns the millisecond of the second from the given epoch millis in UTC timezone. The value ranges from 0 to 999.

    millisecond(tsInMillis, timeZoneId) Returns the millisecond of the second from the given epoch millis and timezone id. The value ranges from 0 to 999.

    DIV(col1, col2) Quotient of two values

    MOD(col1, col2) Modulo of two values

    ABS(col1) Absolute of a value

    CEIL(col1) Rounded up to the nearest integer.

    FLOOR(col1) Rounded down to the nearest integer.

    EXP(col1) Euler’s number(e) raised to the power of col.

    LN(col1) Natural log of value i.e. ln(col1)

    SQRT(col1) Square root of a value

    UPPER(col) convert string to upper case

    LOWER(col) convert string to lower case

    REVERSE(col) reverse the string

    SUBSTR(col, startIndex, endIndex) Gets substring of the input string from start to endIndex. Index begins at 0. Set endIndex to -1 to calculate till end of the string

    CONCAT(col1, col2, separator) Concatenate two input strings using the separator

    TRIM(col) trim spaces from both side of the string

    LTRIM(col) trim spaces from left side of the string

    RTRIM(col) trim spaces from right side of the string

    LENGTH(col) calculate length of the string

    TIMECONVERT(col, fromUnit, toUnit) Converts the value into another time unit. The column should be an epoch timestamp.

    DATETIMECONVERT(columnName, inputFormat, outputFormat, outputGranularity) Converts the value into another date time format, and buckets time based on the given time granularity.

    DATETRUNC Converts the value into a specified output granularity seconds since UTC epoch that is bucketed on a unit in a specified timezone.

    ToEpoch<TIME_UNIT>(timeInMillis) Convert epoch milliseconds to epoch <Time Unit>.

    ToEpoch<TIME_UNIT>Rounded(timeInMillis, bucketSize) Convert epoch milliseconds to epoch <Time Unit>, round to nearest rounding bucket(Bucket size is defined in <Time Unit>).

    ToEpoch<TIME_UNIT>Bucket(timeInMillis, bucketSize) Convert epoch milliseconds to epoch <Time Unit>, and divided by bucket size(Bucket size is defined in <Time Unit>).

    FromEpoch<TIME_UNIT>(timeIn<TIME_UNIT>) Convert epoch <Time Unit> to epoch milliseconds.

    FromEpoch<TIME_UNIT>Bucket(timeIn<Time_UNIT>, bucketSizeIn<Time_UNIT>) Convert epoch <Bucket Size><Time Unit> to epoch milliseconds.

    ToDateTime(timeInMillis, pattern[, timezoneId]) Convert epoch millis value to DateTime string represented by pattern.

    JSONEXTRACTSCALAR(jsonField, 'jsonPath', 'resultsType', [defaultValue]) Evaluates the 'jsonPath' on jsonField, returns the result as the type 'resultsType', use optional defaultValue for null or parsing error.

    JSONEXTRACTKEY(jsonField, 'jsonPath') Extracts all matched JSON field keys based on 'jsonPath' into a STRING_ARRAY.

    EXTRACT(dateTimeField FROM dateTimeExpression)arrow-up-right Extracts the field from the DATETIME expression of the format 'YYYY-MM-DD HH:MM:SS'. Currently, this transformation function supports YEAR, MONTH, DAY, HOUR, MINUTE, and SECOND fields.

    TOJSONMAPSTR(map) Convert map to JSON String

    JSONFORMAT(object) Convert object to JSON String

    JSONPATH(jsonField, 'jsonPath') Extracts the object value from jsonField based on 'jsonPath', the result type is inferred based on JSON value. Cannot be used in query because data type is not specified.

    JSONPATHLONG(jsonField, 'jsonPath', [defaultValue]) Extracts the Long value from jsonField based on 'jsonPath', use optional defaultValue for null or parsing error.

    JSONPATHDOUBLE(jsonField, 'jsonPath', [defaultValue]) Extracts the Double value from jsonField based on 'jsonPath', use optional defaultValue for null or parsing error.

    JSONPATHSTRING(jsonField, 'jsonPath', [defaultValue]) Extracts the String value from jsonField based on 'jsonPath', use optional defaultValue for null or parsing error.

    JSONPATHARRAY(jsonField, 'jsonPath') Extracts an array from jsonField based on 'jsonPath', the result type is inferred based on JSON value. Cannot be used in query because data type is not specified.

    JSONPATHARRAYDEFAULTEMPTY(jsonField, 'jsonPath') Extracts an array from jsonField based on 'jsonPath', the result type is inferred based on JSON value. Returns empty array for null or parsing error. Cannot be used in query because data type is not specified.

    SHA(bytesCol) Return SHA-1 digest of binary column(bytes type) as hex string

    SHA256(bytesCol) Return SHA-256 digest of binary column(bytes type) as hex string

    SHA512(bytesCol) Return SHA-512 digest of binary column(bytes type) as hex string

    MD5(bytesCol) Return MD5 digest of binary column(bytes type) as hex string

    toBase64(bytesCol) Return the Base64-encoded string of binary column(bytes type)

    fromUtf8(bytesCol) Return the UTF8-encoded string of binary column(bytes type)

    ARRAYLENGTH Returns the length of a multi-value

    MAP_VALUE Select the value for a key from Map stored in Pinot. MAP_VALUE(mapColumn, 'myKey', valueColumn)

    VALUEIN The transform function will filter the value from the multi-valued column with the given constant values. The VALUEIN transform function is especially useful when the same multi-valued column is both filtering column and grouping column.


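    As a quick illustration of the date-time extraction functions above (a sketch assuming a hypothetical table myEvents with an epoch-millis column tsInMillis):

    SELECT year(tsInMillis) AS yr,
           month(tsInMillis, 'America/New_York') AS mon,
           dayOfWeek(tsInMillis) AS dow,
           hour(tsInMillis) AS hr
    FROM myEvents
    LIMIT 10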