Transformation Functions

This document contains the list of all the transformation functions supported by Pinot SQL.

Math Functions

Function

ADD(col1, col2, col3...) Sum of at least two values

SUB(col1, col2) Difference between two values

MULT(col1, col2, col3...) Product of at least two values

DIV(col1, col2) Quotient of two values

MOD(col1, col2) Modulo of two values

ABS(col1) Absolute of a value

CEIL(col1) Rounded up to the nearest integer.

FLOOR(col1) Rounded down to the nearest integer.

EXP(col1) Euler’s number(e) raised to the power of col.

LN(col1) Natural log of value i.e. ln(col1)

SQRT(col1) Square root of a value

String Functions

Multiple string functions are supported out of the box from release-0.5.0 .

Function

UPPER(col) convert string to upper case

LOWER(col) convert string to lower case

REVERSE(col) reverse the string

SUBSTR(col, startIndex, endIndex) Gets substring of the input string from start to endIndex. Index begins at 0. Set endIndex to -1 to calculate till end of the string

CONCAT(col1, col2, seperator) Concatenate two input strings using the seperator

TRIM(col) trim spaces from both side of the string

LTRIM(col) trim spaces from left side of the string

RTRIM(col) trim spaces from right side of the string

LENGTH(col) calculate length of the string

STRPOS(col, find, N) Find Nth instance of `find` string in input. Returns 0 if input string is empty. Returns -1 if the Nth instance is not found or input string is null.

STARTSWITH(col, prefix) returns `true` if columns starts with prefix string.

REPLACE(col, find, substitute) replace all instances of `find` with `replace` in input

RPAD(col, size, pad) string padded from the right side with `pad` to reach final `size`

LPAD(col, size, pad) string padded from the left side with `pad` to reach final `size`

CODEPOINT(col) the Unicode codepoint of the first character of the string

CHR(codepoint) the character corresponding to the Unicode codepoint

regexpExtract(value, regexp) Extracts values that match the provided regular expression

regexpReplace(input, matchRegexp, replaceRegexp, matchStartPos, occurrence, flag) Find and replace a string or regexp pattern with a target string or regexp pattern

remove(input, search) removes all instances of search from string

urlEncoding(string) url-encode a string with UTF-8 format

urlDecoding(string) decode a url to plaintext string

fromBase64(string) decode a Base64-encoded string to bytes represented as a hex string

toUtf8(string) decode a UTF8-encoded string to bytes represented as a hex string

isSubnetOf(ipPrefix, ipAddress) checks if ipAddress is in the subnet of the ipPrefix

DateTime Functions

Date time functions allow you to perform transformations on columns that contain timestamps or dates.

Function

TIMECONVERT(col, fromUnit, toUnit) Converts the value into another time unit. the column should be an epoch timestamp.

DATETIMECONVERT(columnName, inputFormat, outputFormat, outputGranularity) Converts the value into another date time format, and buckets time based on the given time granularity.

DATETRUNC Converts the value into a specified output granularity seconds since UTC epoch that is bucketed on a unit in a specified timezone.

ToEpoch<TIME_UNIT>(timeInMillis) Convert epoch milliseconds to epoch <Time Unit>.

ToEpoch<TIME_UNIT>Rounded(timeInMillis, bucketSize) Convert epoch milliseconds to epoch <Time Unit>, round to nearest rounding bucket(Bucket size is defined in <Time Unit>).

ToEpoch<TIME_UNIT>Bucket(timeInMillis, bucketSize) Convert epoch milliseconds to epoch <Time Unit>, and divided by bucket size(Bucket size is defined in <Time Unit>).

FromEpoch<TIME_UNIT> Convert epoch <Time Unit> to epoch milliseconds.(timeIn<Time_UNIT>)

FromEpoch<TIME_UNIT>Bucket(timeIn<Time_UNIT>, bucketSizeIn<Time_UNIT>) Convert epoch <Bucket Size><Time Unit> to epoch milliseconds.

ToDateTime(timeInMillis, pattern[, timezoneId]) Convert epoch millis value to DateTime string represented by pattern.

FromDateTime(dateTimeString, pattern) Convert DateTime string represented by pattern to epoch millis.

round(timeValue, bucketSize) Round the given time value to nearest bucket start value.

now() Return current time as epoch millis

ago() Return time as epoch millis before the given period (in ISO-8601 duration format)

timezoneHour(timeZoneId) Returns the hour of the time zone offset.

timezoneMinute(timeZoneId) Returns the minute of the time zone offset.

year(tsInMillis) Returns the year from the given epoch millis in UTC timezone.

year(tsInMillis, timeZoneId) Returns the year from the given epoch millis and timezone id.

yearOfWeek(tsInMillis) Returns the year of the ISO week from the given epoch millis in UTC timezone. Alias `yow`is also supported.

yearOfWeek(tsInMillis, timeZoneId) Returns the year of the ISO week from the given epoch millis and timezone id. Alias `yow`is also supported.

quarter(tsInMillis) Returns the quarter of the year from the given epoch millis in UTC timezone. The value ranges from 1 to 4.

quarter(tsInMillis, timeZoneId) Returns the quarter of the year from the given epoch millis and timezone id. The value ranges from 1 to 4.

month(tsInMillis) Returns the month of the year from the given epoch millis in UTC timezone. The value ranges from 1 to 12.

month(tsInMillis, timeZoneId) Returns the month of the year from the given epoch millis and timezone id. The value ranges from 1 to 12.

week(tsInMillis) Returns the ISO week of the year from the given epoch millis in UTC timezone. The value ranges from 1 to 53. Alias `weekOfYear` is also supported.

week(tsInMillis, timeZoneId) Returns the ISO week of the year from the given epoch millis and timezone id. The value ranges from 1 to 53. Alias `weekOfYear` is also supported.

dayOfYear(tsInMillis) Returns the day of the year from the given epoch millis in UTC timezone. The value ranges from 1 to 366. Alias `doy` is also supported.

dayOfYear(tsInMillis, timeZoneId) Returns the day of the year from the given epoch millis and timezone id. The value ranges from 1 to 366. Alias `doy` is also supported.

day(tsInMillis) Returns the day of the month from the given epoch millis in UTC timezone. The value ranges from 1 to 31. Alias `dayOfMonth` is also supported.

day(tsInMillis, timeZoneId) Returns the day of the month from the given epoch millis and timezone id. The value ranges from 1 to 31. Alias `dayOfMonth` is also supported.

dayOfWeek(tsInMillis) Returns the day of the week from the given epoch millis in UTC timezone. The value ranges from 1(Monday) to 7(Sunday). Alias `dow` is also supported.

dayOfWeek(tsInMillis, timeZoneId) Returns the day of the week from the given epoch millis and timezone id. The value ranges from 1(Monday) to 7(Sunday). Alias `dow` is also supported.

hour(tsInMillis) Returns the hour of the day from the given epoch millis in UTC timezone. The value ranges from 0 to 23.

hour(tsInMillis, timeZoneId) Returns the hour of the day from the given epoch millis and timezone id. The value ranges from 0 to 23.

minute(tsInMillis) Returns the minute of the hour from the given epoch millis in UTC timezone. The value ranges from 0 to 59.

minute(tsInMillis, timeZoneId) Returns the minute of the hour from the given epoch millis and timezone id. The value ranges from 0 to 59.

second(tsInMillis) Returns the second of the minute from the given epoch millis in UTC timezone. The value ranges from 0 to 59.

second(tsInMillis, timeZoneId) Returns the second of the minute from the given epoch millis and timezone id. The value ranges from 0 to 59.

millisecond(tsInMillis) Returns the millisecond of the second from the given epoch millis in UTC timezone. The value ranges from 0 to 999.

millisecond(tsInMillis, timeZoneId) Returns the millisecond of the second from the given epoch millis and timezone id. The value ranges from 0 to 999.

JSON Functions

Transform Functions

These functions can only be used in Pinot SQL queries.

Function

JSONEXTRACTSCALAR(jsonField, 'jsonPath', 'resultsType', [defaultValue]) Evaluates the `'jsonPath'` on `jsonField`, returns the result as the type `'resultsType'`, use optional `defaultValue`for null or parsing error.

JSONEXTRACTKEY(jsonField, 'jsonPath') Extracts all matched JSON field keys based on `'jsonPath'` into a `STRING_ARRAY.`

EXTRACT(dateTimeField FROM dateTimeExpression) Extracts the field from the DATETIME expression of the format `'YYYY-MM-DD HH:MM:SS'`. Currently, this transformation function supports `YEAR`, `MONTH`, `DAY`, `HOUR`, `MINUTE`, and `SECOND` fields.

Scalar Functions

These functions can be used for column transformation in table ingestion configs.

Function

JSONFORMAT(object) Convert object to JSON String

JSONPATH(jsonField, 'jsonPath') Extracts the object value from `jsonField` based on `'jsonPath'`, the result type is inferred based on JSON value. Cannot be used in query because data type is not specified.

JSONPATHLONG(jsonField, 'jsonPath', [defaultValue]) Extracts the Long value from `jsonField` based on `'jsonPath'`, use optional `defaultValue`for null or parsing error.

JSONPATHDOUBLE(jsonField, 'jsonPath', [defaultValue]) Extracts the Double value from `jsonField` based on `'jsonPath'`, use optional `defaultValue`for null or parsing error.

JSONPATHSTRING(jsonField, 'jsonPath', [defaultValue]) Extracts the String value from `jsonField` based on `'jsonPath'`, use optional `defaultValue`for null or parsing error.

JSONPATHARRAY(jsonField, 'jsonPath') Extracts an array from `jsonField` based on `'jsonPath'`, the result type is inferred based on JSON value. Cannot be used in query because data type is not specified.

JSONPATHARRAYDEFAULTEMPTY(jsonField, 'jsonPath') Extracts an array from `jsonField` based on `'jsonPath'`, the result type is inferred based on JSON value. Returns empty array for null or parsing error. Cannot be used in query because data type is not specified.

Binary Functions

Function

SHA(bytesCol) Return SHA-1 digest of binary column(`bytes` type) as hex string

SHA256(bytesCol) Return SHA-256 digest of binary column(`bytes` type) as hex string

SHA512(bytesCol) Return SHA-512 digest of binary column(`bytes` type) as hex string

MD5(bytesCol) Return MD5 digest of binary column(`bytes` type) as hex string

toBase64(bytesCol) Return the Base64-encoded string of binary column(`bytes` type)

fromUtf8(bytesCol) Return the UTF8-encoded string of binary column(`bytes` type)

Multi-value Column Functions

All of the functions mentioned till now only support single value columns. You can use the following functions to do operations on multi-value columns.

Function

ARRAYLENGTH Returns the length of a multi-value

MAP_VALUE Select the value for a key from Map stored in Pinot. `MAP_VALUE(mapColumn, 'myKey', valueColumn)`

VALUEIN The transform function will filter the value from the multi-valued column with the given constant values. The `VALUEIN` transform function is especially useful when the same multi-valued column is both filtering column and grouping column.