Pinot Query Language (PQL)
Learn how to query Pinot using PQL
PQL
PQL is a derivative of SQL that supports selection, projection, aggregation, and grouping aggregation.
PQL Limitations
PQL is only a derivative of SQL, and it does not support Joins nor Subqueries. In order to support them, we suggest to rely on PrestoDB https://prestodb.io/, although Subqueries are not completely supported by PrestoDB at the moment of writing.
PQL Examples
The Pinot Query Language (PQL) is very similar to standard SQL:
Aggregation
Grouping on Aggregation
Ordering on Aggregation
Filtering
Selection (Projection)
Ordering on Selection
Pagination on Selection
Note: results might not be consistent if column ordered by has same value in multiple rows.
Wild-card match (in WHERE clause only)
To count rows where the column airlineName
starts with U
UDF
As of now, functions have to be implemented within Pinot. Injecting functions is not allowed yet. The example below demonstrate the use of UDFs. More examples in Transform Function in Aggregation Grouping
BYTES column
Pinot supports queries on BYTES column using HEX string. The query response also uses hex string to represent bytes value.
E.g. the query below fetches all the rows for a given UID.
PQL Specification
SELECT
The select statement is as follows
outputColumn
can be *
to project all columns, columns (foo
, bar
, baz
) or aggregation functions like (MIN(foo)
, MAX(bar)
, AVG(baz)
).
Filter Functions on Single Value/Multi-value
EQUALS
IN
NOT IN
GT
LT
BETWEEN
REGEXP_LIKE
For Multi-Valued columns, EQUALS is similar to CONTAINS.
Supported aggregations on single-value columns
COUNT
MIN
MAX
SUM
AVG
MINMAXRANGE
DISTINCT
DISTINCTCOUNT
DISTINCTCOUNTHLL
DISTINCTCOUNTRAWHLL
: Returns HLL response serialized as string. The serialized HLL can be converted back into an HLL (see pinot-core/**/HllUtil.java as an example) and then aggregated with other HLLs. A common use case may be to merge HLL responses from different Pinot tables, or to allow aggregation after client-side batching.FASTHLL
(WARN: will be deprecated soon.FASTHLL
stores serialized HyperLogLog in String format, which performs worse thanDISTINCTCOUNTHLL
, which supports serialized HyperLogLog in BYTES (byte array) format)PERCENTILE[0-100]
: e.g.PERCENTILE5
,PERCENTILE50
,PERCENTILE99
, etc.PERCENTILEEST[0-100]
: e.g.PERCENTILEEST5
,PERCENTILEEST50
,PERCENTILEEST99
, etc.
Supported aggregations on multi-value columns
COUNTMV
MINMV
MAXMV
SUMMV
AVGMV
MINMAXRANGEMV
DISTINCTCOUNTMV
DISTINCTCOUNTHLLMV
DISTINCTCOUNTRAWHLLMV
: Returns HLL response serialized as string. The serialized HLL can be converted back into an HLL (see pinot-core/**/HllUtil.java as an example) and then aggregated with other HLLs. A common use case may be to merge HLL responses from different Pinot tables, or to allow aggregation after client-side batching.FASTHLLMV
(WARN: will be deprecated soon. It does not make lots of sense to configure serialized HyperLogLog column as a dimension)PERCENTILE[0-100]MV
: e.g.PERCENTILE5MV
,PERCENTILE50MV
,PERCENTILE99MV
, etc.PERCENTILEEST[0-100]MV
: e.g.PERCENTILEEST5MV
,PERCENTILEEST50MV
,PERCENTILEEST99MV
, etc.
WHERE
Supported predicates are comparisons with a constant using the standard SQL operators (=
, <
, <=
, >
, >=
, <>
, ‘!=’) , range comparisons using BETWEEN
(foo BETWEEN 42 AND 69
), set membership (foo IN (1, 2, 4, 8)
) and exclusion (foo NOT IN (1, 2, 4, 8)
). For BETWEEN
, the range is inclusive.
Comparison with a regular expression is supported using the regexp_like function, as in WHERE regexp_like(columnName, 'regular expression')
GROUP BY
The GROUP BY
clause groups aggregation results by a list of columns, or transform functions on columns (see below)
ORDER BY
The ORDER BY
clause orders selection results or group by results by a list of columns. PQL supports ordering DESC
or ASC
.
TOP
The TOP n
clause causes the ‘n’ largest group results to be returned. If not specified, the top 10 groups are returned.
LIMIT
The LIMIT n
clause causes the selection results to contain at most ‘n’ results. The LIMIT a, b
clause paginate the selection results from the ‘a’ th results and return at most ‘b’ results. By default, 10 records are returned in the result.
Transform Function in Aggregation and Grouping
In aggregation and grouping, each column can be transformed from one or multiple columns. For example, the following query will calculate the maximum value of column foo
divided by column bar
grouping on the column time
converted from time unit MILLISECONDS
to SECONDS
:
Supported transform functions
Function | Description |
ADD | Sum of at least two values |
SUB | Difference between two values |
MULT | Product of at least two values |
DIV | Quotient of two values |
MOD | Modulo of two values |
ABS | Absolute of a value |
CEIL | Rounded up to the nearest integer. |
FLOOR | Rounded down to the nearest integer. |
EXP | exponential of |
LN | Euler’s number raised to the power of x. |
SQRT | Square root of a value |
TIMECONVERT | Takes 3 arguments, converts the value into another time unit.
Examples
|
DATETIMECONVERT | Takes 4 arguments, converts the value into another date time format, and buckets time based on the given time granularity.
Format is expressed as
Granularity is expressed as Examples 1) To convert column "Date" from hoursSinceEpoch to daysSinceEpoch and bucket it to 1 day granularity
2) To simply bucket millis "Date" to 15 minutes granularity
3) To convert column "Date" from hoursSinceEpoch to format yyyyMdd and bucket it to 1 days granularity
4) To convert column "Date" from format yyyy/MM/dd to weeksSinceEpoch and bucket it to 1 weeks granularity
5) To convert column "Date" from millis to format yyyyMdd in timezone PST
|
DATETRUNC | (Presto) SQL compatible date truncation, equivalent to the Presto function date_trunc. Takes at least 3 and upto 5 arguments, converts the value into a specified output granularity seconds since UTC epoch that is bucketed on a unit in a specified timezone.
Examples
|
ARRAYLENGTH | Returns the length of a multi-value column |
VALUEIN | Takes at least 2 arguments, where the first argument is a multi-valued column, and the following arguments are constant values. The transform function will filter the value from the multi-valued column with the given constant values. The |
JSONEXTRACTSCALAR |
Examples
|
JSONEXTRACTKEY |
Examples
|
Differences with SQL
These differences only apply to the PQL endpoint. They do not hold true for the standard-SQL endpoint, which is the recommended endpoint. More information about the two types of endpoints in Querying Pinot
TOP
works likeLIMIT
for truncation in group by queriesNo need to select the columns to group with. The following two queries are both supported in PQL, where the non-aggregation columns are ignored.
The results will always order by the aggregated value (descending). The results for query
will be the same as the combining results from the following queries
where we don’t put the results for the same group together.
No support for ORDER BY in aggregation group by. However, ORDER BY support was added recently and is available in the standard-SQL endpoint. It can be used in the PQL endpoint by passing
queryOptions
into the payload as follows
where,
groupByMode=sql
- standard sql way of execution group by, hence accepting order byresponseFormat=sql
- standard sql way of displaying results, in a tabular manner
Last updated