Query

Learn how to query Apache Pinot using SQL or explore data using the web-based Pinot query console.

Explore query syntax:

Querying JSON data
Aggregation Functions
Cardinality Estimation
Explain Plan (Single-Stage)
Filtering with IdSet
GapFill Function For Time-Series Dataset
Grouping Algorithm
JOINs
Lookup UDF Join
Transformation Functions
User-Defined Functions (UDFs)
Window aggregate

Querying Pinot

Learn how to query Pinot using SQL

SQL Interface

Pinot provides a SQL interface for querying, which uses the Calcite SQL parser to parse queries and the MYSQL_ANSI dialect. For details on the syntax, see the Calcite documentation. To find supported SQL operators, see Class SqlLibraryOperators.

Pinot 1.0

In Pinot 1.0, the multi-stage query engine supports inner joins, left-outer joins, semi-joins, and nested queries out of the box. It's optimized for in-memory processing and low latency. For more information, see how to enable and use the multi-stage query engine (v2).

Pinot also supports using simple Data Definition Language (DDL) to insert data into a table directly from a file. For details, see . More DDL support will be added in the future. For now, the most common way to define data is using the .

Note: For queries that require a large amount of data shuffling, require spill-to-disk, or are hitting any other limitations of the multi-stage query engine (v2), we still recommend using Presto.

Identifier vs Literal

In Pinot SQL:

Double quotes(") are used to force string identifiers, e.g. column names
Single quotes(') are used to enclose string literals. If the string literal also contains a single quote, escape this with a single quote e.g '''Pinot''' to match the string literal 'Pinot'

Misusing those might cause unexpected query results, like the following examples:

WHERE a='b' compares the column a with the string literal 'b'
WHERE a="b" compares the column a with the value of the column b

If your column names use reserved keywords (e.g. timestamp or date) or special characters, you will need to use double quotes when referring to them in queries.
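The quoting rules above can be sketched as two small helpers for client code that assembles query text. This is a minimal illustration in Python, not part of any Pinot client library: the function names are hypothetical, and doubling an embedded double quote inside an identifier is an assumption borrowed from standard SQL rather than something stated in this guide.

```python
def quote_identifier(name: str) -> str:
    """Force a name to be read as an identifier by wrapping it in double quotes.

    Doubling embedded double quotes is an assumption taken from standard SQL.
    """
    return '"' + name.replace('"', '""') + '"'


def quote_string_literal(value: str) -> str:
    """Wrap a value in single quotes, escaping each embedded single quote
    with another single quote, as described above."""
    return "'" + value.replace("'", "''") + "'"


def equals_predicate(column: str, value: str) -> str:
    """Build `column = 'value'` with the column treated as an identifier."""
    return f"{quote_identifier(column)} = {quote_string_literal(value)}"


if __name__ == "__main__":
    # A reserved keyword used as a column name needs double quotes.
    print("WHERE " + equals_predicate("timestamp", "b"))
    # A literal that itself contains single quotes.
    print(quote_string_literal("'Pinot'"))
```

Keeping identifier quoting and literal quoting in separate functions makes it harder to mix up the two quote characters, which is the mistake the examples above warn about.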

Note: Define decimal literals within quotes to preserve precision.

Example Queries

Selection

Aggregation

Grouping on Aggregation

Ordering on Aggregation

Filtering

For performant filtering of IDs in a list, see Filtering with IdSet.

Filtering with NULL predicate

Selection (Projection)

Ordering on Selection

Pagination on Selection

Note that results might not be consistent if the ORDER BY column has the same value in multiple rows.

Wild-card match (in WHERE clause only)

The example below counts rows where the column airlineName starts with U:

Case-When Statement

Pinot supports the CASE-WHEN-ELSE statement, as shown in the following two examples:

UDF

Pinot doesn't currently support injecting functions. Functions have to be implemented within Pinot, as shown below:

For more examples, see .

BYTES column

Pinot supports queries on BYTES column using hex strings. The query response also uses hex strings to represent bytes values.

The query below fetches all the rows for a given UID:

Cardinality Estimation

Cardinality estimation is a classic problem. Pinot offers several ways to solve it, each with a different trade-off between accuracy and latency.

Exact Results

Functions:

Explain Plan (Single-Stage)

Query execution within Pinot is modeled as a sequence of operators that are executed in a pipelined manner to produce the final result. The output of the EXPLAIN PLAN statement can be used to see how queries are being run or to further optimize queries.

Introduction

EXPLAIN PLAN can be run in two modes, verbose and non-verbose (the default), selected through a query option. To enable verbose mode, pass the query option explainPlanVerbose=true.

In the non-verbose EXPLAIN PLAN output, the Operator column describes the operator that Pinot will run, whereas the Operator_Id and Parent_Id columns show the parent-child relationship between operators.

This parent-child relationship shows the order in which operators execute. For example, FILTER_MATCH_ENTIRE_SEGMENT will execute before and pass its output to PROJECT. Similarly, PROJECT will execute before and pass its output to TRANSFORM_PASSTHROUGH operator and so on.

Although the EXPLAIN PLAN query produces tabular output, this document shows a tree representation of the EXPLAIN PLAN output so that the parent-child relationships between operators are easy to see and the bottom-up flow of data through the operator tree is easy to visualize.
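As a rough illustration of how the tabular output maps to a tree, the sketch below rebuilds the hierarchy from (Operator, Operator_Id, Parent_Id) rows. The rows are the ones from the non-verbose baseballStats example in this document. The function name is hypothetical, and skipping the PLAN_START marker row (which carries -1 in both ID columns) is a simplification made here, not a description of how Pinot itself renders the plan.

```python
from collections import defaultdict

# (Operator, Operator_Id, Parent_Id) rows from the non-verbose example output.
PLAN_ROWS = [
    ("BROKER_REDUCE(limit:10)", 1, 0),
    ("COMBINE_SELECT", 2, 1),
    ("PLAN_START(numSegmentsForThisPlan:1)", -1, -1),
    ("SELECT(selectList:playerID, playerName)", 3, 2),
    ("TRANSFORM_PASSTHROUGH(playerID, playerName)", 4, 3),
    ("PROJECT(playerName, playerID)", 5, 4),
    ("DOC_ID_SET", 6, 5),
    ("FILTER_MATCH_ENTIRE_SEGMENT(docs:97889)", 7, 6),
]


def render_plan_tree(rows):
    """Return the operators as indented lines, parents before children."""
    children = defaultdict(list)
    for operator, op_id, parent_id in rows:
        if op_id == -1:
            # Simplification: the PLAN_START marker is not part of the
            # Operator_Id / Parent_Id chain, so it is left out of the tree.
            continue
        children[parent_id].append((op_id, operator))

    lines = []

    def walk(parent_id, depth):
        for op_id, operator in sorted(children[parent_id]):
            lines.append("  " * depth + operator)
            walk(op_id, depth + 1)

    walk(0, 0)  # the root operator has Parent_Id 0
    return lines


if __name__ == "__main__":
    print("\n".join(render_plan_tree(PLAN_ROWS)))
```

Reading the printed lines from the bottom up gives the order in which data flows from the segment to the final result.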

Note the special node PLAN_START(numSegmentsForThisPlan:1), whose Operator_Id and Parent_Id are both -1. This node indicates the number of segments that match a given plan. Running the EXPLAIN PLAN query in verbose mode, using the query option explainPlanVerbose=true, shows the distinct (deduplicated) query plans across all segments on all servers.

EXPLAIN PLAN output should only be used for informational purposes because it is likely to change from version to version as Pinot is further developed and enhanced.

Pinot uses a "Scatter Gather" approach to query evaluation (see for more details). At the Broker, an incoming query is split into several server-level queries for each backend server to evaluate. At each Server, the query is further split into segment-level queries that are evaluated against each segment on the server. The results of segment queries are combined and sent to the Broker. The Broker in turn combines the results from all the Servers and sends the final results back to the user.

Note that if the EXPLAIN PLAN query runs without the verbose mode enabled, a single plan will be returned (the heuristic used is to return the deepest plan tree) and this may not be an accurate representation of all plans across all segments. Different segments may execute the plan in a slightly different way.
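The Scatter Gather flow described above can be pictured with a toy model. The sketch below is plain Python, not Pinot code: segments are lists of rows, a "server" is a list of segments, and the "broker" sums the per-server results of a simple filtered count. All names and the sample data are hypothetical.

```python
def evaluate_segment(segment, predicate):
    """Segment-level query: count the rows of one segment that match."""
    return sum(1 for row in segment if predicate(row))


def evaluate_server(segments, predicate):
    """Server-level step: run the query on every local segment and combine."""
    return sum(evaluate_segment(segment, predicate) for segment in segments)


def broker_count(servers, predicate):
    """Broker-level step: scatter to every server, then reduce the results."""
    return sum(evaluate_server(segments, predicate) for segments in servers.values())


# Hypothetical cluster layout: two servers holding three segments in total.
SERVERS = {
    "server1": [
        [{"league": "NL"}, {"league": "AL"}],
        [{"league": "NL"}],
    ],
    "server2": [
        [{"league": "NL"}, {"league": "AL"}, {"league": "AL"}],
    ],
}

if __name__ == "__main__":
    print(broker_count(SERVERS, lambda row: row["league"] == "NL"))
```

The three functions line up with the segment-level operators, the COMBINE operators, and BROKER_REDUCE in the EXPLAIN PLAN output, although the real operators do considerably more work than this count.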

Reading the EXPLAIN PLAN output from bottom to top shows how data flows from a table to query results. In the example output, the FILTER_MATCH_ENTIRE_SEGMENT operator shows that all 97889 records of the segment matched the query. The DOC_ID_SET operator above the filter operator gets the set of document IDs matching the filter. The PROJECT operator above DOC_ID_SET pulls only those columns that were referenced in the query. The TRANSFORM_PASSTHROUGH operator just passes the column data from the PROJECT operator to the SELECT operator. At SELECT, the query has been successfully evaluated against one segment. Results from different data segments are then combined (COMBINE_SELECT) and sent to the Broker. The Broker combines and reduces the results from different servers (BROKER_REDUCE) into the final result that is sent back to the user.

The rest of this document illustrates the EXPLAIN PLAN output with examples and describes the operators that show up in the output of EXPLAIN PLAN.

EXPLAIN PLAN using verbose mode for a query that evaluates filters with and without index

Since verbose mode is enabled, the EXPLAIN PLAN output returns two plans matching one segment each (assuming 2 segments for this table). The first plan shows that Pinot used an inverted index to evaluate the predicate "playerID = 'aardsda01'" (FILTER_INVERTED_INDEX). The result was then fully scanned (FILTER_FULL_SCAN) to evaluate the second predicate "playerName = 'David Allan'". Note that the two predicates are combined using AND in the query; hence, only the data that satisfied the first predicate needs to be scanned to evaluate the second predicate. However, if the predicates were combined using OR, the query would run very slowly because the entire "playerName" column would need to be scanned from top to bottom to look for values satisfying the second predicate. To improve query efficiency in such cases, consider indexing the "playerName" column as well. The second plan shows a FILTER_EMPTY, indicating that no matching documents were found for one segment.

EXPLAIN PLAN ON GROUP BY QUERY

The EXPLAIN PLAN output above shows how GROUP BY queries are evaluated in Pinot. GROUP BY results are created on the server (AGGREGATE_GROUPBY_ORDERBY) for each segment on the server. The server then combines segment-level GROUP BY results (COMBINE_GROUPBY_ORDERBY) and sends the combined result to the Broker. The Broker combines the GROUP BY results from all the servers to produce the final result, which is sent to the user. Note that the COMBINE_SELECT operator from the previous query was not used here; instead, a different COMBINE_GROUPBY_ORDERBY operator was used. Depending on the type of query, different combine operators such as COMBINE_DISTINCT and COMBINE_ORDERBY may be seen.

EXPLAIN PLAN OPERATORS

The root operator of the EXPLAIN PLAN output is BROKER_REDUCE. BROKER_REDUCE indicates that the Broker is processing and combining server results into the final result that is sent back to the user. BROKER_REDUCE has a COMBINE operator as its child. The Combine operator combines the results of query evaluation from each segment on the server and sends the combined result to the Broker. There are several combine operators (COMBINE_GROUPBY_ORDERBY, COMBINE_DISTINCT, COMBINE_AGGREGATE, etc.) that run depending on the operations being performed by the query. Under the Combine operator, either a Select (SELECT, SELECT_ORDERBY, etc.) or an Aggregate (AGGREGATE, AGGREGATE_GROUPBY_ORDERBY, etc.) operator can appear. An Aggregate operator is present when the query performs aggregation.

Filtering with IdSet

Learn how to write fast queries for looking up IDs in a list of values.

Filtering with IdSet is only supported with the single-stage query engine (v1).

A common use case is filtering on an id field with a list of values. This can be done with the IN clause, but using IN doesn't perform well with large lists of IDs. For large lists of IDs, we recommend using an IdSet.

Functions

ID_SET

ID_SET(columnName, 'sizeThresholdInBytes=8388608;expectedInsertions=5000000;fpp=0.03' )

This function returns a Base64-encoded IdSet of the values for a single column. The IdSet implementation used depends on the column data type:

INT - RoaringBitmap unless sizeThresholdInBytes is exceeded, in which case Bloom Filter.
LONG - Roaring64NavigableMap unless sizeThresholdInBytes is exceeded, in which case Bloom Filter.
Other types - Bloom Filter

The following parameters are used to configure the Bloom Filter:

expectedInsertions - Number of expected insertions for the BloomFilter, must be positive
fpp - False positive probability to use for the BloomFilter. Must be positive and less than 1.0.

Note that when a Bloom Filter is used, the filter results are approximate - you can get false-positive results (for membership in the set), leading to potentially unexpected results.
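To make the parameter string and its constraints concrete, here is a small sketch that splits a string such as the one in the ID_SET signature above into typed values and applies the two stated rules (expectedInsertions must be positive, fpp must be positive and less than 1.0). It is written in Python for illustration only; the function name is hypothetical and this is not how Pinot itself parses the argument.

```python
def parse_id_set_params(params: str) -> dict:
    """Parse 'key=value;key=value' into typed ID_SET parameters.

    Illustrative only: validation follows the constraints listed above.
    """
    raw = {}
    for part in params.split(";"):
        part = part.strip()
        if not part:
            continue
        key, _, value = part.partition("=")
        raw[key.strip()] = value.strip()

    parsed = {}
    if "sizeThresholdInBytes" in raw:
        parsed["sizeThresholdInBytes"] = int(raw["sizeThresholdInBytes"])
    if "expectedInsertions" in raw:
        expected = int(raw["expectedInsertions"])
        if expected <= 0:
            raise ValueError("expectedInsertions must be positive")
        parsed["expectedInsertions"] = expected
    if "fpp" in raw:
        fpp = float(raw["fpp"])
        if not 0.0 < fpp < 1.0:
            raise ValueError("fpp must be positive and less than 1.0")
        parsed["fpp"] = fpp
    return parsed


if __name__ == "__main__":
    print(parse_id_set_params(
        "sizeThresholdInBytes=8388608;expectedInsertions=5000000;fpp=0.03"
    ))
```

A lower fpp reduces the chance of false positives at the cost of a larger Bloom Filter, which is the general trade-off of that data structure.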

IN_ID_SET

IN_ID_SET(columnName, base64EncodedIdSet)

This function returns 1 if a column contains a value specified in the IdSet and 0 if it does not.

IN_SUBQUERY

IN_SUBQUERY(columnName, subQuery)

This function generates an IdSet from a subquery and then filters ids based on that IdSet on a Pinot broker.

IN_PARTITIONED_SUBQUERY

IN_PARTITIONED_SUBQUERY(columnName, subQuery)

This function generates an IdSet from a subquery and then filters ids based on that IdSet on a Pinot server.

This function works best when the data is partitioned by the id column and each server contains all the data for a partition. The generated IdSet for the subquery will be smaller as it will only contain the ids for the partitions served by the server. This will give better performance.

The query passed to IN_SUBQUERY can be run on any table; it isn't restricted to the table used in the parent query.

The query passed to IN_PARTITIONED_SUBQUERY must be run on the same table as the parent query.

Examples

Create IdSet

You can create an IdSet of the values in the yearID column by running the following:

idset(yearID)

When creating an IdSet for values in non INT/LONG columns, we can configure the expectedInsertions:

idset(playerName)

We can also configure the fpp parameter:

idset(playerName)

Filter by values in IdSet

We can use the IN_ID_SET function to filter a query based on an IdSet. To return rows for yearIDs in the IdSet, run the following:

Filter by values not in IdSet

To return rows for yearIDs not in the IdSet, run the following:

Filter on broker

To filter rows for yearIDs in the IdSet on a Pinot Broker, run the following query:

To filter rows for yearIDs not in the IdSet on a Pinot Broker, run the following query:

Filter on server

To filter rows for yearIDs in the IdSet on a Pinot Server, run the following query:

To filter rows for yearIDs not in the IdSet on a Pinot Server, run the following query:

Grouping Algorithm

In this guide we will learn about the heuristics used for trimming results in Pinot's grouping algorithm (used when processing GROUP BY queries) to make sure that the server doesn't run out of memory.

Within segment

When grouping rows within a segment, Pinot keeps a maximum of <numGroupsLimit> groups per segment. This value is set to 100,000 by default and can be configured by the pinot.server.query.executor.num.groups.limit property.

If the number of groups of a segment reaches this value, the extra groups will be ignored and the results returned may not be completely accurate. The numGroupsLimitReached property will be set to true in the query response if the value is reached.

Trimming tail groups

After the inner segment groups have been computed, the Pinot query engine optionally trims tail groups. Tail groups are ones that have a lower rank based on the ORDER BY clause used in the query.

This configuration is disabled by default, but can be enabled by configuring the pinot.server.query.executor.min.segment.group.trim.size property.

When segment group trim is enabled, the query engine will trim the tail groups and keep max(<minSegmentGroupTrimSize>, 5 * LIMIT) groups if it gets more groups. Pinot keeps at least 5 * LIMIT groups when trimming tail groups to ensure the accuracy of results.

This value can be overridden on a query by query basis by passing the following option:

Cross segments

Once grouping has been done within a segment, Pinot will merge segment results and trim tail groups and keep max(<minServerGroupTrimSize>, 5 * LIMIT) groups if it gets more groups.

<minServerGroupTrimSize> is set to 5,000 by default and can be adjusted by configuring the pinot.server.query.executor.min.server.group.trim.size property. When setting the configuration to -1, the cross segments trim can be disabled.

This value can be overridden on a query by query basis by passing the following option:

When cross segments trim is enabled, the server will trim the tail groups before sending the results back to the broker. It will also trim the tail groups when the number of groups reaches the <trimThreshold>.

This configuration is set to 1,000,000 by default and can be adjusted by configuring the pinot.server.query.executor.groupby.trim.threshold property.

A higher threshold reduces the amount of trimming done, but consumes more heap memory. If the threshold is set to more than 1,000,000,000, the server will only trim the groups once before returning the results to the broker.

At Broker

When the broker performs the final merge of the groups returned by the servers, another level of trimming takes place. The tail groups are trimmed and max(<minBrokerGroupTrimSize>, 5 * LIMIT) groups are retained.

The default value of <minBrokerGroupTrimSize> is 5,000. It can be adjusted by configuring the pinot.broker.min.group.trim.size property.
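The same max(<minTrimSize>, 5 * LIMIT) rule appears at the segment, server, and broker levels. The sketch below works through the arithmetic in Python. The function name is hypothetical, and treating any non-positive trim size as "trimming disabled" is an assumption that generalises the -1 convention mentioned above for the server-level setting.

```python
from typing import Optional


def groups_kept(min_trim_size: int, limit: int) -> Optional[int]:
    """Number of groups retained after trimming tail groups.

    Returns None when trimming is disabled. Treating every non-positive
    value as "disabled" is an assumption made for this sketch.
    """
    if min_trim_size <= 0:
        return None
    return max(min_trim_size, 5 * limit)


if __name__ == "__main__":
    # Default server or broker trim size (5,000) with the default LIMIT of 10:
    print(groups_kept(5000, 10))
    # A large LIMIT dominates the minimum trim size:
    print(groups_kept(5000, 2000))
    # Trimming switched off with -1:
    print(groups_kept(-1, 10))
```

With the defaults, the minimum trim size decides how many groups survive; the 5 * LIMIT term only takes over once LIMIT exceeds 1,000.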

GROUP BY behavior

Pinot sets a default LIMIT of 10 if one isn't defined and this applies to GROUP BY queries as well. Therefore, if no limit is specified, Pinot will return 10 groups.

Pinot will trim tail groups based on the ORDER BY clause to reduce the memory footprint and improve the query performance. It keeps at least 5 * LIMIT groups so that the results give good enough approximation in most cases. The configurable min trim size can be used to increase the groups kept to improve the accuracy but has a larger extra memory footprint.

HAVING behavior

If the query has a HAVING clause, it is applied to the merged GROUP BY results that already have the tail groups trimmed. If the HAVING clause selects in the opposite direction to the ORDER BY order, groups matching the condition might already have been trimmed and so are not returned.

Increase min trim size to keep more groups in these cases.
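The interaction between trimming and HAVING can be reproduced with a toy model. The sketch below is plain Python rather than Pinot code: it ranks hypothetical groups by their aggregate in descending order, keeps max(minTrimSize, 5 * LIMIT) of them, and only then applies a HAVING-style filter that looks for small aggregates. The group names, values, and trim sizes are made up for the illustration.

```python
def trim_tail_groups(groups, limit, min_trim_size):
    """Keep the top max(min_trim_size, 5 * limit) groups, ordered DESC by value."""
    keep = max(min_trim_size, 5 * limit)
    ranked = sorted(groups.items(), key=lambda item: item[1], reverse=True)
    return dict(ranked[:keep])


def apply_having(groups, predicate):
    """Filter already-aggregated groups, like a HAVING clause."""
    return {key: value for key, value in groups.items() if predicate(value)}


# Ten hypothetical groups with aggregate values 1..10.
GROUPS = {f"team{i}": i for i in range(1, 11)}


def having_small(total):
    # Opposite direction to ORDER BY ... DESC: it wants the smallest groups.
    return total <= 3


# What the answer should be if nothing were trimmed.
EXACT = apply_having(GROUPS, having_small)
# LIMIT 1 with a tiny trim size keeps only the top 5 groups, so the
# matching groups are gone before HAVING runs.
TRIMMED = apply_having(trim_tail_groups(GROUPS, limit=1, min_trim_size=1), having_small)
# A larger min trim size keeps every group and restores the exact answer.
LARGER = apply_having(trim_tail_groups(GROUPS, limit=1, min_trim_size=10), having_small)

if __name__ == "__main__":
    print(sorted(EXACT), sorted(TRIMMED), sorted(LARGER))
```

In this model the small trim size returns no groups at all even though three groups satisfy the condition, which is the situation that raising the min trim size is meant to avoid.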

Configuration Parameters

Parameter | Default | Query Override | Description

JOINs

Pinot supports JOINs, including left, right, full, semi, anti, lateral, and equi JOINs. Use JOINs to connect two tables to generate a unified view, based on a related column between the tables.

Important: To query using JOINs, you must use Pinot's multi-stage query engine (v2).

Overview of JOINs in Pinot 1.0

JOINs overview

Pinot 1.0 introduces support for all JOIN types. JOINs in Pinot significantly reduce query latency and simplify architecture, achieving the best performance currently available for an OLAP database.

Use JOINs to combine two tables (a left and right table) together, based on a related column between the tables, and other join filters. JOINs let you gain more insights from your data.

Supported JOINs types and examples

Inner join

The inner join selects rows that have matching values in both tables.

Syntax:

Example of inner join

Joins a table containing user transactions with a table containing promotions shown to the users, to show the spending for every userID.

Left join

A left join returns all values from the left relation and the matched values from the right table, or appends NULL if there is no match. Also referred to as a left outer join.

Syntax:

Right join

A right join returns all values from the right relation and the matched values from the left relation, or appends NULL if there is no match. It is also referred to as a right outer join.

Syntax:

Full join

A full join returns all values from both relations, appending NULL values on the side that does not have a match. It is also referred to as a full outer join.

Syntax:

Cross join

A cross join returns the Cartesian product of two relations. If no WHERE clause is used along with CROSS JOIN, this produces a result set whose size is the number of rows in the first table multiplied by the number of rows in the second table. If a WHERE clause is included with CROSS JOIN, it functions like an inner join.

Syntax:

Semi/Anti join

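A semi join returns rows from the first table for which at least one match is found in the second table. An anti join returns rows from the first table for which no match is found in the second table. In both cases, each qualifying row of the first table is returned once, and no columns from the second table are added.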

Syntax:

Equi join

An equi join uses an equality operator to match a single or multiple column values of the relative tables.
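To make the differences between the join types described in this section concrete, the following sketch implements inner, left, semi, and anti joins over small in-memory tables. It is plain Python for illustration, not Pinot SQL and not how Pinot executes joins; every comparison is an equality on a single key, so each function is also an example of an equi join. The table contents and the column name promo are hypothetical.

```python
def inner_join(left, right, key):
    """Rows with matching key values in both tables."""
    return [{**l, **r} for l in left for r in right if l[key] == r[key]]


def left_join(left, right, key, right_columns):
    """All left rows; right columns are None where there is no match."""
    result = []
    for l in left:
        matches = [r for r in right if r[key] == l[key]]
        if matches:
            result.extend({**l, **r} for r in matches)
        else:
            result.append({**l, **{column: None for column in right_columns}})
    return result


def semi_join(left, right, key):
    """Left rows that have at least one match, each returned once."""
    keys = {r[key] for r in right}
    return [l for l in left if l[key] in keys]


def anti_join(left, right, key):
    """Left rows that have no match in the right table."""
    keys = {r[key] for r in right}
    return [l for l in left if l[key] not in keys]


# Hypothetical sample data: user transactions and promotions shown to users.
TRANSACTIONS = [
    {"userID": 1, "amount": 10},
    {"userID": 2, "amount": 20},
    {"userID": 3, "amount": 5},
]
PROMOTIONS = [
    {"userID": 1, "promo": "A"},
    {"userID": 1, "promo": "B"},
    {"userID": 4, "promo": "C"},
]

if __name__ == "__main__":
    print(inner_join(TRANSACTIONS, PROMOTIONS, "userID"))
    print(left_join(TRANSACTIONS, PROMOTIONS, "userID", ["promo"]))
    print(semi_join(TRANSACTIONS, PROMOTIONS, "userID"))
    print(anti_join(TRANSACTIONS, PROMOTIONS, "userID"))
```

Note how user 1 appears twice in the inner join (one row per matching promotion) but only once in the semi join.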

Syntax:

JOINs optimizations

Pinot JOINs include the following optimizations:

Predicate push-down to individual tables
Indexing and pruning to reduce scanning and speed up query processing
Smart data layout considerations to minimize data shuffling

Query Options

This document contains all the available query options

Supported Query Options

Key | Description | Default Behavior

timeoutMs

Set Query Options

SET statement

After release 0.11.0, query options can be set using the SET statement:

OPTION keyword (deprecated)

Before release 0.11.0, query options can be appended to the query with the OPTION keyword:

User-Defined Functions (UDFs)

Pinot currently supports two ways for you to implement your own functions:

Groovy Scripts
Scalar Functions

Explain Plan (Single-Stage)

Introduction

EXPLAIN PLAN can be run in two modes, verbose and non-verbose (the default), selected through a query option. To enable verbose mode, pass the query option explainPlanVerbose=true.

EXPLAIN PLAN FOR SELECT playerID, playerName FROM baseballStats

+---------------------------------------------|------------|---------|
| Operator                                    | Operator_Id|Parent_Id|
+---------------------------------------------|------------|---------|
|BROKER_REDUCE(limit:10)                      | 1          | 0       |
|COMBINE_SELECT                               | 2          | 1       |
|PLAN_START(numSegmentsForThisPlan:1)         | -1         | -1      |
|SELECT(selectList:playerID, playerName)      | 3          | 2       |
|TRANSFORM_PASSTHROUGH(playerID, playerName)  | 4          | 3       |
|PROJECT(playerName, playerID)                | 5          | 4       |
|DOC_ID_SET                                   | 6          | 5       |
|FILTER_MATCH_ENTIRE_SEGMENT(docs:97889)      | 7          | 6       |
+---------------------------------------------|------------|---------|

Querying JSON data

To see how JSON data can be queried, assume that we have the following table:

We also assume that "jsoncolumn" has a Json Index on it. Note that the last two rows in the table have different structure than the rest of the rows. In keeping with JSON specification, a JSON column can contain any valid JSON data and doesn't need to adhere to a predefined schema. To pull out the entire JSON document for each row, we can run the query below:

id | jsoncolumn
"101" | "{"name":{"first":"daffy","last":"duck"},"score":101,"data":["a","b","c","d"]}"
"102"

To drill down and pull out specific keys within the JSON column, we simply append the JsonPath expression of those keys to the end of the column name.

last_name | first_name | value

Note that the third column (value) is null for rows with id 106 and 107. This is because these rows have JSON documents that don't have a key with JsonPath $.data[1]. We can filter out these rows.

last_name | first_name | value

Certain last names (duck and mouse for example) repeat in the data above. We can get a count of each last name by running a GROUP BY query on a JsonPath expression.

jsoncolumn.name.last | count(*)

There is also numerical information (jsoncolumn.$.id) embedded within the JSON document. We can extract those numerical values from the JSON data into SQL and sum them up using the query below.

jsoncolumn.name.last | sum(jsoncolumn.score)

JSON_MATCH and JSON_EXTRACT_SCALAR

Note that the JSON_MATCH function utilizes JsonIndex and can only be used if a JsonIndex is already present on the JSON column. As shown in the examples above, the second argument of JSON_MATCH operator takes a predicate. This predicate is evaluated against the JsonIndex and supports =, !=, IS NULL, or IS NOT NULL operators. Relational operators, such as >, <, >=

jsoncolumn.name.last | sum(jsoncolumn.score)

JSON_MATCH function also provides the ability to use wildcard * JsonPath expressions even though it doesn't support full JsonPath expressions.

last_name | total

While JSON_MATCH supports the IS NULL and IS NOT NULL operators, these operators should only be applied to leaf-level path elements. For example, the predicate JSON_MATCH(jsoncolumn, '"$.data[*]" IS NOT NULL') is not valid, since "$.data[*]" does not address a "leaf" element of the path; however, JSON_MATCH(jsoncolumn, '"$.data[0]" IS NOT NULL') is valid, since "$.data[0]" unambiguously identifies a leaf element of the path.

JSON_EXTRACT_SCALAR does not utilize JsonIndex and therefore performs slower than JSON_MATCH, which does. However, JSON_EXTRACT_SCALAR supports a wider range of JsonPath expressions and operators. To make the best use of fast index access (JSON_MATCH) along with JsonPath expressions (JSON_EXTRACT_SCALAR), you can combine the two functions in the WHERE clause.

JSON_MATCH syntax

The second argument of the JSON_MATCH function is a boolean expression in string form. This section shows how to correctly write the second argument of JSON_MATCH. Let's assume we want to search a JSON array named data for the values k and j. This can be done with the following predicate:

To convert this predicate into string form for use in JSON_MATCH, we first turn the left side of the predicate into an identifier by enclosing it in double quotes:

Next, the literals in the predicate also need to be enclosed in single quotes ('). Any existing single quotes need to be escaped as well. This gives us:

Finally, we need to create a string out of the entire expression above by enclosing it in ':

Now we have the string representation of the original predicate and this can be used in JSON_MATCH function:
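The steps above can be sketched as a small helper that assembles the second argument of JSON_MATCH. This is an illustrative Python sketch, not a Pinot API: the function names are hypothetical, the path data[0] and the IN operator are assumptions chosen for the example because the original predicate is not reproduced here, and paths that themselves contain double quotes are not handled.

```python
def sql_string_literal(value: str) -> str:
    """Enclose a value in single quotes, doubling any single quotes inside it."""
    return "'" + value.replace("'", "''") + "'"


def json_match_argument(path: str, values) -> str:
    """Build the string form of `path IN (values...)` for JSON_MATCH."""
    # Step 1: turn the left side into an identifier with double quotes.
    identifier = '"' + path + '"'
    # Step 2: enclose each literal in single quotes.
    literals = ", ".join(sql_string_literal(v) for v in values)
    predicate = f"{identifier} IN ({literals})"
    # Step 3: wrap the whole predicate in single quotes; this escapes the
    # quotes added in step 2 by doubling them.
    return sql_string_literal(predicate)


def json_match_call(column: str, path: str, values) -> str:
    """Assemble the full JSON_MATCH(...) expression."""
    return f"JSON_MATCH({column}, {json_match_argument(path, values)})"


if __name__ == "__main__":
    # data[0] is a hypothetical path used only for this illustration.
    print(json_match_call("jsoncolumn", "data[0]", ["k", "j"]))
```

Building the string in this order means the escaping of inner quotes falls out of the final wrapping step, so each literal only has to be quoted once by hand.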

Window aggregate

Use window aggregates to compute averages, sort, rank, or count items, calculate sums, and find minimum or maximum values across a window.

Important: To query using window functions, you must enable Pinot's multi-stage query engine (v2). See how to enable and use the multi-stage query engine (v2).

Window aggregate overview

This is an overview of the window aggregate feature.

Window aggregate syntax

Pinot's window function (windowAggCall) includes the following syntax definition:

windowAggCall refers to the actual windowed aggregate operation.
windowAggFunction refers to the aggregation function used inside a windowed aggregate; see the supported window aggregate functions.
window

You can jump to the Window aggregate query examples section to see more concrete use cases of window aggregates in Pinot.

Example window aggregate query layout

The following query shows the complete components of the window function. Note, PARTITION BY and ORDER BY are optional.

Window mechanism (OVER clause)

Partition by clause

If a PARTITION BY clause is specified, the intermediate results will be grouped into different partitions based on the values of the columns appearing in the PARTITION BY clause.
If the PARTITION BY clause isn’t specified, the whole result will be regarded as one big partition, i.e. there is only one partition in the result set.

Order by clause

If an ORDER BY clause is specified, all the rows within the same partition will be sorted based on the values of the columns appearing in the window ORDER BY clause. The ORDER BY clause decides the order in which the rows within a partition are to be processed.
If no ORDER BY clause is specified while a PARTITION BY clause is specified, the order of the rows is undefined. To order the output, use a global ORDER BY clause in the query.

Frame clause

Important Note: In release 1.0.0, window aggregates only support UNBOUNDED PRECEDING, UNBOUNDED FOLLOWING, and CURRENT ROW. Frame and row count support has not been implemented yet.

{RANGE|ROWS} frame_start, or
{RANGE|ROWS} BETWEEN frame_start AND frame_end

frame_start and frame_end can be any of:
- UNBOUNDED PRECEDING
- expression PRECEDING. May only be allowed in ROWS mode (this depends on the database; some support it, some don’t)

If there is no FRAME, no PARTITION BY, and no ORDER BY clause specified in the OVER clause (empty OVER), the whole result set is regarded as one partition, and there's one frame in the window.

The OVER clause applies a specified supported window aggregate function to compute values over a group of rows and return a single result for each row. The OVER clause specifies how the rows are arranged and how the aggregation is done on those rows.

Inside the over clause, there are three optional components: PARTITION BY clause, ORDER BY clause, and FRAME clause.

Window aggregate functions

Window aggregate functions are commonly used to do the following:

Supported window aggregate functions are listed in the following table.

Function | Description | Example | Default Value When No Record Selected

Window aggregate query examples

Sum transactions by customer ID

Calculate the rolling sum transaction amount ordered by the payment date for each customer ID (note, the default frame here is UNBOUNDED PRECEDING and CURRENT ROW).

customer_id | payment_date | amount | sum
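The rolling sum described above can be reproduced outside SQL to show what the two default frames compute. The sketch below is plain Python, not Pinot code. It assumes payment dates are unique within a customer (so it does not model how a RANGE frame treats rows with equal ORDER BY values), and the sample rows are hypothetical.

```python
from collections import defaultdict


def window_sum(rows, partition_key, order_key, value_key, running=True):
    """Add a "sum" column computed per partition.

    running=True  models UNBOUNDED PRECEDING to CURRENT ROW (rolling sum).
    running=False models UNBOUNDED PRECEDING to UNBOUNDED FOLLOWING
    (one total repeated on every row of the partition).
    """
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[partition_key]].append(row)

    result = []
    for key in sorted(partitions):
        ordered = sorted(partitions[key], key=lambda r: r[order_key])
        total = sum(r[value_key] for r in ordered)
        accumulated = 0
        for r in ordered:
            accumulated += r[value_key]
            result.append({**r, "sum": accumulated if running else total})
    return result


# Hypothetical payments; amounts are whole numbers to keep the sums exact.
PAYMENTS = [
    {"customer_id": 1, "payment_date": "2023-02-15", "amount": 1},
    {"customer_id": 1, "payment_date": "2023-02-14", "amount": 5},
    {"customer_id": 2, "payment_date": "2023-02-17", "amount": 2},
    {"customer_id": 1, "payment_date": "2023-02-16", "amount": 10},
    {"customer_id": 2, "payment_date": "2023-02-18", "amount": 3},
]

if __name__ == "__main__":
    for row in window_sum(PAYMENTS, "customer_id", "payment_date", "amount"):
        print(row)
```

Every input row produces exactly one output row, which is the key difference between a window aggregate and a GROUP BY aggregate.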

Find the minimum or maximum transaction by customer ID

Calculate the least (use MIN()) or most expensive (use MAX()) transaction made by each customer comparing all transactions made by the customer (default frame here is UNBOUNDED PRECEDING and UNBOUNDED FOLLOWING). The following query shows how to find the least expensive transaction.

customer_id | payment_date | amount | min

Find the average transaction amount by customer ID

Calculate a customer’s average transaction amount for all transactions they’ve made (default frame here is UNBOUNDED PRECEDING and UNBOUNDED FOLLOWING).

customer_id | payment_date | amount | avg

Rank year-to-date sales for a sales team

Use ROW_NUMBER() to rank team members by their year-to-date sales (default frame here is UNBOUNDED PRECEDING and UNBOUNDED FOLLOWING).

Row | FirstName | LastName | Total sales YTD

Count the number of transactions by customer ID

Count the number of transactions made by each customer (default frame here is UNBOUNDED PRECEDING and UNBOUNDED FOLLOWING).

customer_id | payment_date | amount | count
