# Pinot Query Language (PQL)

## PQL

PQL is a derivative of SQL that supports selection, projection, aggregation, and grouping aggregation.

## PQL Limitations

PQL is only a derivative of SQL, and it does not support Joins nor Subqueries. In order to support them, we suggest to rely on PrestoDB <https://prestodb.io/>, although Subqueries are not completely supported by PrestoDB at the moment of writing.

## PQL Examples

The Pinot Query Language (PQL) is very similar to standard SQL:

```sql
SELECT COUNT(*) FROM myTable
```

### Aggregation

```sql
SELECT COUNT(*), MAX(foo), SUM(bar) FROM myTable
```

### Grouping on Aggregation

```sql
SELECT MIN(foo), MAX(foo), SUM(foo), AVG(foo) FROM myTable
  GROUP BY bar, baz LIMIT 50
```

### Ordering on Aggregation

```sql
SELECT MIN(foo), MAX(foo), SUM(foo), AVG(foo) FROM myTable
  GROUP BY bar, baz 
  ORDER BY bar, MAX(foo) DESC LIMIT 50
```

### Filtering

```sql
SELECT COUNT(*) FROM myTable
  WHERE foo = 'foo'
  AND bar BETWEEN 1 AND 20
  OR (baz < 42 AND quux IN ('hello', 'goodbye') AND quuux NOT IN (42, 69))
```

### Selection (Projection)

```sql
SELECT * FROM myTable
  WHERE quux < 5
  LIMIT 50
```

### Ordering on Selection

```sql
SELECT foo, bar FROM myTable
  WHERE baz > 20
  ORDER BY bar DESC
  LIMIT 100
```

### Pagination on Selection

Note: results might not be consistent if column ordered by has same value in multiple rows.

```sql
SELECT foo, bar FROM myTable
  WHERE baz > 20
  ORDER BY bar DESC
  LIMIT 50, 100
```

### Wild-card match (in WHERE clause only)

To count rows where the column `airlineName` starts with `U`

```sql
SELECT count(*) FROM SomeTable
  WHERE regexp_like(airlineName, '^U.*')
  GROUP BY airlineName TOP 10
```

### UDF

As of now, functions have to be implemented within Pinot. Injecting functions is not allowed yet. The example below demonstrate the use of UDFs. More examples in [Transform Function in Aggregation Grouping](https://docs.pinot.apache.org/users/user-guide-query/pinot-query-language#transform-function-in-aggregation-and-grouping)

```sql
SELECT count(*) FROM myTable
  GROUP BY dateTimeConvert(timeColumnName, '1:MILLISECONDS:EPOCH', '1:HOURS:EPOCH', '1:HOURS')
```

### BYTES column

Pinot supports queries on BYTES column using HEX string. The query response also uses hex string to represent bytes value.

E.g. the query below fetches all the rows for a given UID.

```sql
SELECT * FROM myTable
  WHERE UID = "c8b3bce0b378fc5ce8067fc271a34892"
```

## PQL Specification

### SELECT

The select statement is as follows

```sql
SELECT <outputColumn> (, outputColumn, outputColumn,...)
  FROM <tableName>
  (WHERE ... | GROUP BY ... | ORDER BY ... | TOP ... | LIMIT ...)
```

`outputColumn` can be `*` to project all columns, columns (`foo`, `bar`, `baz`) or aggregation functions like (`MIN(foo)`, `MAX(bar)`, `AVG(baz)`).

### Filter Functions on Single Value/Multi-value

* `EQUALS`
* `IN`
* `NOT IN`
* `GT`
* `LT`
* `BETWEEN`
* `REGEXP_LIKE`

For Multi-Valued columns, EQUALS is similar to CONTAINS.

### Supported aggregations on single-value columns

* `COUNT`
* `MIN`
* `MAX`
* `SUM`
* `AVG`
* `MINMAXRANGE`
* `DISTINCT`
* `DISTINCTCOUNT`
* `DISTINCTCOUNTHLL`
* `DISTINCTCOUNTRAWHLL`: Returns HLL response serialized as string. The serialized HLL can be converted back into an HLL (see pinot-core/\*\*/HllUtil.java as an example) and then aggregated with other HLLs. A common use case may be to merge HLL responses from different Pinot tables, or to allow aggregation after client-side batching.
* `FASTHLL` (**WARN**: will be deprecated soon. `FASTHLL` stores serialized HyperLogLog in String format, which performs worse than `DISTINCTCOUNTHLL`, which supports serialized HyperLogLog in BYTES (byte array) format)
* `PERCENTILE[0-100]`: e.g. `PERCENTILE5`, `PERCENTILE50`, `PERCENTILE99`, etc.
* `PERCENTILEEST[0-100]`: e.g. `PERCENTILEEST5`, `PERCENTILEEST50`, `PERCENTILEEST99`, etc.

### Supported aggregations on multi-value columns

* `COUNTMV`
* `MINMV`
* `MAXMV`
* `SUMMV`
* `AVGMV`
* `MINMAXRANGEMV`
* `DISTINCTCOUNTMV`
* `DISTINCTCOUNTHLLMV`
* `DISTINCTCOUNTRAWHLLMV`: Returns HLL response serialized as string. The serialized HLL can be converted back into an HLL (see pinot-core/\*\*/HllUtil.java as an example) and then aggregated with other HLLs. A common use case may be to merge HLL responses from different Pinot tables, or to allow aggregation after client-side batching.
* `FASTHLLMV` (**WARN**: will be deprecated soon. It does not make lots of sense to configure serialized HyperLogLog column as a dimension)
* `PERCENTILE[0-100]MV`: e.g. `PERCENTILE5MV`, `PERCENTILE50MV`, `PERCENTILE99MV`, etc.
* `PERCENTILEEST[0-100]MV`: e.g. `PERCENTILEEST5MV`, `PERCENTILEEST50MV`, `PERCENTILEEST99MV`, etc.

### WHERE

Supported predicates are comparisons with a constant using the standard SQL operators (`=`, `<`, `<=`, `>`, `>=`, `<>`, ‘!=’) , range comparisons using `BETWEEN` (`foo BETWEEN 42 AND 69`), set membership (`foo IN (1, 2, 4, 8)`) and exclusion (`foo NOT IN (1, 2, 4, 8)`). For `BETWEEN`, the range is inclusive.

Comparison with a regular expression is supported using the regexp\_like function, as in `WHERE regexp_like(columnName, 'regular expression')`

### GROUP BY

The `GROUP BY` clause groups aggregation results by a list of columns, or transform functions on columns (see below)

### ORDER BY

The `ORDER BY` clause orders selection results or group by results by a list of columns. PQL supports ordering `DESC` or `ASC`.

### TOP

The `TOP n` clause causes the ‘n’ largest group results to be returned. If not specified, the top 10 groups are returned.

### LIMIT

The `LIMIT n` clause causes the selection results to contain at most ‘n’ results. The `LIMIT a, b` clause paginate the selection results from the ‘a’ th results and return at most ‘b’ results. By default, 10 records are returned in the result.

### Transform Function in Aggregation and Grouping

In aggregation and grouping, each column can be transformed from one or multiple columns. For example, the following query will calculate the maximum value of column `foo` divided by column `bar` grouping on the column `time` converted from time unit `MILLISECONDS` to `SECONDS`:

```sql
SELECT MAX(DIV(foo, bar) FROM myTable
  GROUP BY DATETIMECONVERT(time, '1:MILLISECONDS:EPOCH', '1:SECONDS:EPOCH', '1:SECONDS')
```

#### Supported transform functions

| Function          | Description                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   |
| ----------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| ADD               | Sum of at least two values                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    |
| SUB               | Difference between two values                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 |
| MULT              | Product of at least two values                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |
| DIV               | Quotient of two values                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        |
| MOD               | Modulo of two values                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          |
| ABS               | Absolute of a value                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |
| CEIL              | Rounded up to the nearest integer.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            |
| FLOOR             | Rounded down to the nearest integer.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          |
| EXP               | exponential of                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |
| LN                | Euler’s number raised to the power of x.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      |
| SQRT              | Square root of a value                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        |
| TIMECONVERT       | <p>Takes 3 arguments, converts the value into another time unit.</p><p><br><strong>Examples</strong><br><code>TIMECONVERT(time, 'MILLISECONDS', 'SECONDS')</code> - This expression converts the value of column <code>time</code> (taken to be in milliseconds) to the nearest seconds (<em>i.e.</em> the nearest seconds that is lower than the value of <code>date</code> column)</p>                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      |
| DATETIMECONVERT   | <p>Takes 4 arguments, converts the value into another date time format, and buckets time based on the given time granularity.</p><p><code>DATETIMECONVERT(columnName, inputFormat, outputFormat, outputGranularity)</code>where,<br><code>columnName</code> - column name to convert<br><code>inputFormat</code> - format of the column columnName<br><code>outputFormat</code> - format of the result desired after conversion <code>outputGranularity</code> - the granularity in which to bucket the result</p><p>Format is expressed as <code>\<time size>:\<time unit>:\<time format>:\<pattern></code><br>where,</p><p><code>time size</code> - size of the time unit eg: 1, 10</p><p><code>time unit</code> - HOURS, DAYS etc</p><p><code>time format</code> - EPOCH or SIMPLE\_DATE\_FORMAT</p><p><code>pattern</code> - this is defined in case of SIMPLE\_DATE\_FORMAT. eg: yyyyMMdd. A specific timezone can be passed using tz(timezone).</p><p><code>timezone</code> - can be expressed as long form tz(Asia/Kolkata), or short form tz(IST) or in terms of GMT tz(GMT+0530). Default is UTC. It is recommended to use long form timezone, as short forms are ambiguous with daylight savings (eg: PDT works during daylight savings, PST otherwise)</p><p>Granularity is expressed as <code>\<time size>:\<time unit></code></p><p><strong>Examples</strong></p><p>1) To convert column "Date" from hoursSinceEpoch to daysSinceEpoch and bucket it to 1 day granularity<br><code>dateTimeConvert(Date, '1:HOURS:EPOCH', '1:DAYS:EPOCH', '1:DAYS')</code></p><p>2) To simply bucket millis "Date" to 15 minutes granularity<br><code>dateTimeConvert(Date, '1:MILLISECONDS:EPOCH', '1:MILLISECONDS:EPOCH', '15:MINUTES')</code></p><p>3) To convert column "Date" from hoursSinceEpoch to format yyyyMdd and bucket it to 1 days granularity<br><code>dateTimeConvert(Date, '1:HOURS:EPOCH', '1:DAYS:SIMPLE\_DATE\_FORMAT:yyyyMMdd', '1:DAYS')</code></p><p>4) To convert column "Date" from format yyyy/MM/dd to weeksSinceEpoch and bucket it to 1 weeks granularity<br><code>dateTimeConvert(Date, '1:DAYS:SIMPLE\_DATE\_FORMAT:yyyy/MM/dd', '1:WEEKS:EPOCH', '1:WEEKS')</code></p><p>5) To convert column "Date" from millis to format yyyyMdd in timezone PST<br><code>dateTimeConvert(Date, '1:MILLISECONDS:EPOCH', '1:DAYS:SIMPLE\_DATE\_FORMAT:yyyyMMdd tz(America/Los\_Angeles)', '1:DAYS')</code></p> |
| DATETRUNC         | <p>(Presto) SQL compatible date truncation, equivalent to the Presto function <a href="https://mode.com/blog/date-trunc-sql-timestamp-function-count-on">date\_trunc</a>. Takes at least 3 and upto 5 arguments, converts the value into a specified output granularity seconds since UTC epoch that is bucketed on a unit in a specified timezone.<br><br><strong>Examples</strong><br><code>DATETRUNC('week', time\_in\_seconds, 'SECONDS')</code> This expression converts the column <code>time\_in\_seconds</code>, which is a long containing seconds since UTC epoch truncated at <code>WEEK</code> (where a Week starts at Monday UTC midnight). The output is a long seconds since UTC epoch.<br></p><p><code>DATETRUNC('quarter', DIV(time\_milliseconds/1000), 'SECONDS', 'America/Los\_Angeles', 'HOURS')</code> This expression converts the expression <code>time\_in\_milliseconds/1000</code> (which is thus in seconds) into hours that are truncated at <code>QUARTER</code> at the Los Angeles time zone (where a Quarter begins on 1/1, 4/1, 7/1, 10/1 in Los Angeles timezone). The output is expressed as hours since UTC epoch (note that the output is not Los Angeles timezone)</p>                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  |
| ARRAYLENGTH       | Returns the length of a multi-value column                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    |
| VALUEIN           | <p>Takes at least 2 arguments, where the first argument is a multi-valued column, and the following arguments are constant values. The transform function will filter the value from the multi-valued column with the given constant values. The <code>VALUEIN</code> transform function is especially useful when the same multi-valued column is both filtering column and grouping column.<br><br><strong>Examples</strong><br><code>VALUEIN(mvColumn, 3, 5, 15)</code></p>                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |
| JSONEXTRACTSCALAR | <p><strong><code>JSONEXTRACTSCALAR(jsonField, 'jsonPath', 'resultsType')</code></strong>evaluates the <code>jsonPath</code> on <code>jsonField</code> (a string containing JSON) and returns the result as a type <code>resultsType</code></p><p><code>jsonFieldName</code> is a String field with Json document.</p><p><code>jsonPath</code> is a <a href="https://goessner.net/articles/JsonPath/">JsonPath expression</a> to read from JSON document</p><p><code>results\_type</code> refers to the results data type, could be <code>INT</code>, <code>LONG</code>, <code>FLOAT</code>, <code>DOUBLE</code>, <code>STRING</code>, <code>INT\_ARRAY</code>, <code>LONG\_ARRAY</code>, <code>FLOAT\_ARRAY</code>, <code>DOUBLE\_ARRAY</code>, <code>STRING\_ARRAY</code>.</p><p><strong>Examples</strong></p><p><code>JSONEXTRACTSCALAR(profile\_json\_str, '$.name', 'STRING') -> "bob"</code></p><p><code>JSONEXTRACTSCALAR(profile\_json\_str, '$.age', 'INT') -> 37</code></p>                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          |
| JSONEXTRACTKEY    | <p><strong><code>JSONEXTRACTKEY(jsonField, 'jsonPath')</code></strong> extracts all field names based on <code>jsonPath</code> as a <code>STRING\_ARRAY.</code></p><p><code>jsonFieldName</code> is a String field with Json document.</p><p><code>jsonPath</code> is a <a href="https://goessner.net/articles/JsonPath/">JsonPath expression</a> to read from JSON document</p><p><strong>Examples</strong></p><p><code>JSONEXTRACTSCALAR(profile\_json\_str, '$.\*') -> \["name", "age", "phone"...]</code></p>                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |

## Differences with SQL

{% hint style="info" %}
These differences only apply to the PQL endpoint. They do not hold true for the standard-SQL endpoint, which is the recommended endpoint. More information about the two types of endpoints in [Querying Pinot](https://docs.pinot.apache.org/release-0.4.0/api/querying-pinot-using-standard-sql#rest-api-on-the-broker)
{% endhint %}

* `TOP` works like `LIMIT` for truncation in group by queries
* No need to select the columns to group with. The following two queries are both supported in PQL, where the non-aggregation columns are ignored.

```sql
SELECT MIN(foo), MAX(foo), SUM(foo), AVG(foo) FROM mytable
  GROUP BY bar, baz
  TOP 50

SELECT bar, baz, MIN(foo), MAX(foo), SUM(foo), AVG(foo) FROM mytable
  GROUP BY bar, baz
  TOP 50
```

* The results will always order by the aggregated value (descending). The results for query

```sql
SELECT MIN(foo), MAX(foo) FROM myTable
  GROUP BY bar
  TOP 50
```

will be the same as the combining results from the following queries

```sql
SELECT MIN(foo) FROM myTable
  GROUP BY bar
  TOP 50
SELECT MAX(foo) FROM myTable
  GROUP BY bar
  TOP 50
```

where we don’t put the results for the same group together.

* No support for ORDER BY in aggregation group by. However, ORDER BY support was added recently and is available in the standard-SQL endpoint. It can be used in the PQL endpoint by passing `queryOptions` into the payload as follows

```javascript
{
  "pql" : "SELECT SUM(foo), SUM(bar) from myTable GROUP BY moo ORDER BY SUM(bar) ASC, moo DESC TOP 10",
  "queryOptions" : "groupByMode=sql;responseFormat=sql"
}
```

where,

* `groupByMode=sql` - standard sql way of execution group by, hence accepting order by
* `responseFormat=sql` - standard sql way of displaying results, in a tabular manner
