Supported Transformations

This document contains the list of all the transformation functions supported by Pinot Query Language (PQL).

Math Functions

Function

Description

Example

ADD(col1, col2, col3...)

Sum of at least two values

ADD(score_maths, score_science, score_history)

SUB(col1, col2)

Difference between two values

SUB(total_score, score_science)

MULT(col1, col2, col3...)

Product of at least two values

MUL(score_maths, score_science, score_history)

DIV(col1, col2)

Quotient of two values

SUB(total_score, total_subjects)

MOD(col1, col2)

Modulo of two values

MOD(total_score, total_subjects)

ABS(col1)

Absolute of a value

ABS(score)

CEIL(col1)

Rounded up to the nearest integer.

CEIL(percentage)

FLOOR(col1)

Rounded down to the nearest integer.

FLOOR(percentage)

EXP(col1)

Euler’s number(e) raised to the power of col.

EXP(age)

LN(col1)

Natural log of value i.e. ln(col1)

LN(age)

SQRT(col1)

Square root of a value

SQRT(height)

String Functions

Multiple string functions are supported out of the box from release-0.5.0 .

Function

Description

Example

UPPER(col)

convert string to upper case

UPPER(playerName)

LOWER(col)

convert string to lower case

LOWER(playerName)

REVERSE(col)

reverse the string

REVERSE(playerName)

SUBSTR(col, startIndex, endIndex)

get substring of the input string from start to endIndex. Index begins at 0. Set endIndex to -1 to calculate till end of the string

SUBSTR(playerName, 1, -1)

SUBSTR(playerName, 1, 4)

CONCAT(col1, col2, seperator)

Concatenate two input strings using the seperator

CONCAT(firstName, lastName, '-')

TRIM(col)

trim spaces from both side of the string

TRIM(playerName)

LTRIM(col)

trim spaces from left side of the string

LTRIM(playerName)

RTRIM(col)

trim spaces from right side of the string

RTRIM(playerName)

LENGTH(col)

calculate length of the string

LENGTH(playerName)

STRPOS(col, find, N)

find Nth instance of find string in input. Returns 0 if input string is empty. Returns -1 if the Nth instance is not found or input string is null.

STRPOS(playerName, 'david', 1)

STARTSWITH(col, prefix)

returns true if columns starts with prefix string.

STARTSWITH(playerName, 'david')

REPLACE(col, find, substitute)

replace all instances of find with replace in input

REPLACE(playerName, 'david', 'henry')

RPAD(col, size, pad)

string padded from the right side with pad to reach final size

RPAD(playerName, 20, 'foo')

LPAD(col, size, pad)

string padded from the left side with pad to reach final size

LPAD(playerName, 20, 'foo')

CODEPOINT(col)

the Unicode codepoint of the first character of the string

CODEPOINT(playerName)

CHR(codepoint)

the character corresponding to the Unicode codepoint

CHR(68)

DateTime Functions

Date time functions allow you to perform transformations on columns which contains timestamps or date.

Function

Description

Example

TIMECONVERT

(col, fromUnit, toUnit)

converts the value into another time unit. column should be timestamp. Supported units are DAYS HOURS MICROSECONDS MILLISECONDS MINUTES NANOSECONDS SECONDS

TIMECONVERT(time, 'MILLISECONDS', 'SECONDS')This expression converts the value of column time (taken to be in milliseconds) to the nearest seconds (i.e. the nearest seconds that is lower than the value of date column)

DATETIMECONVERT

(columnName, inputFormat, outputFormat, outputGranularity)

Takes 4 arguments, converts the value into another date time format, and buckets time based on the given time granularity.

Format is expressed as <time size>:<time unit>:<time format>:<pattern> where,

time size - size of the time unit eg: 1, 10

time unit - HOURS, DAYS etc

time format - EPOCH or SIMPLE_DATE_FORMAT

pattern - this is defined in case of SIMPLE_DATE_FORMAT eg: yyyy-MM-dd. A specific timezone can be passed using tz(timezone). Timezone can be long or short string format timezone. e.g. Asia/Kolkata or PDT

granularity - specified in format<time size>:<time unit>

  • Date from hoursSinceEpoch to daysSinceEpoch and bucket it to 1 day granularity DATETIMECONVERT(Date, '1:HOURS:EPOCH', '1:DAYS:EPOCH', '1:DAYS')

  • Date to 15 minutes granularity DATETIMECONVERT(Date, '1:MILLISECONDS:EPOCH', '1:MILLISECONDS:EPOCH', '15:MINUTES')

  • Date from hoursSinceEpoch to format yyyyMdd and bucket it to 1 days granularity DATETIMECONVERT(Date, '1:HOURS:EPOCH', '1:DAYS:SIMPLE_DATE_FORMAT:yyyyMMdd', '1:DAYS')

  • Date from format yyyy/MM/dd to weeksSinceEpoch and bucket it to 1 week granularity DATETIMECONVERT(Date, '1:DAYS:SIMPLE_DATE_FORMAT:yyyy/MM/dd', '1:WEEKS:EPOCH', '1:WEEKS')

  • Date from milliseconds to format yyyyMdd in timezone PST DATETIMECONVERT(Date, '1:MILLISECONDS:EPOCH', '1:DAYS:SIMPLE_DATE_FORMAT:yyyyMMdd tz(America/Los_Angeles)', '1:DAYS')

DATETRUNC

(Presto) SQL compatible date truncation, equivalent to the Presto function date_trunc.

Takes at least 3 and upto 5 arguments, converts the value into a specified output granularity seconds since UTC epoch that is bucketed on a unit in a specified timezone.

DATETRUNC('week', time_in_seconds, 'SECONDS') This expression converts the column time_in_seconds, which is a long containing seconds since UTC epoch truncated at WEEK (where a Week starts at Monday UTC midnight). The output is a long seconds since UTC epoch.

DATETRUNC('quarter', DIV(time_milliseconds/1000), 'SECONDS', 'America/Los_Angeles', 'HOURS') This expression converts the expression time_in_milliseconds/1000into hours that are truncated to QUARTER at the Los Angeles time zone (where a Quarter begins on 1/1, 4/1, 7/1, 10/1 in Los Angeles timezone). The output is expressed as hours since UTC epoch (note that the output is not Los Angeles timezone)

JSON Functions

Function

Description

Example

JSONEXTRACTSCALAR

(jsonField, 'jsonPath', 'resultsType')

Evaluates the jsonPath on jsonField (a string containing JSON) and returns the result as a type resultsType

jsonFieldName is a String field with Json document.

jsonPath is a JsonPath expression to read from JSON document

results_type - can be INT, LONG, FLOAT, DOUBLE, STRING, INT_ARRAY, LONG_ARRAY, FLOAT_ARRAY, DOUBLE_ARRAY, STRING_ARRAY.

  • JSONEXTRACTSCALAR(profile_json_str, '$.name', 'STRING') -> "bob"

  • JSONEXTRACTSCALAR(profile_json_str, '$.age', 'INT') -> 37

JSONEXTRACTKEY

(jsonField, 'jsonPath')

Extracts all field names based on jsonPath as a STRING_ARRAY.

jsonFieldName is a String field with Json document.

jsonPath is a JsonPath expression to read from JSON document

JSONEXTRACTSCALAR(profile_json_str, '$.*') -> ["name", "age", "phone"...]

Multi-value Column Functions

All of the functions mentioned till now only support single value columns. You can use the following functions to do operations on multi-value columns.

Function

Description

Example

ARRAYLENGTH

Returns the length of a multi-value column

MAP_VALUE

Select the value for a key from Map stored in Pinot.

MAP_VALUE(mapColumn, 'myKey', valueColumn)

VALUEIN

Takes at least 2 arguments, where the first argument is a multi-valued column, and the following arguments are constant values. The transform function will filter the value from the multi-valued column with the given constant values. The VALUEIN transform function is especially useful when the same multi-valued column is both filtering column and grouping column.

VALUEIN(mvColumn, 3, 5, 15)

Advanced Queries

Geospatial Queries

Pinot supports Geospatial queries on columns containing text-based geographies. Check out Geospatial for more details on the queries and how to enable them.

Text Queries

Pinot supports Pattern matching on text-based columns as well. Only the columns mentioned as text columns in table config can be queried using this method. Check out Text search support for more details on how to enable pattern matching.