githubEdit

Transformation Functions

This document contains the list of all the transformation functions supported by Pinot SQL.

Math Functions

Function

ADD(col1, col2, col3...)arrow-up-right Sum of at least two values

SUB(col1, col2)arrow-up-right Difference between two values

MULT(col1, col2, col3...)arrow-up-right Product of at least two values

DIV(col1, col2)arrow-up-right Quotient of two values

MOD(col1, col2)arrow-up-right Modulo of two values

ABS(col1)arrow-up-right Absolute of a value

CEIL(col1)arrow-up-right Rounded up to the nearest integer

FLOOR(col1)arrow-up-right Rounded down to the nearest integer

EXP(col1)arrow-up-right Euler’s number(e) raised to the power of col.

LN(col1)arrow-up-right Natural log of value

SQRT(col1)arrow-up-right Square root of a value

ROUNDDECIMAL(col1, col2) arrow-up-right

Rounds value to a specified number of decimal places

String Functions

Multiple string functions are supported out of the box from release-0.5.0 .

Function

UPPERarrow-up-right(col) convert string to upper case

LOWERarrow-up-right(col) convert string to lower case

REVERSEarrow-up-right(col) reverse the string

SUBSTRarrow-up-right(col, startIndex, endIndex) Gets substring of the input string from start to endIndex. Index begins at 0. Set endIndex to -1 to calculate till end of the string

CONCAT(col1, col2, seperator)arrow-up-right Concatenate two input strings using the seperator

TRIM(col)arrow-up-right trim spaces from both side of the string

LTRIM(col)arrow-up-right trim spaces from left side of the string

RTRIM(col)arrow-up-right trim spaces from right side of the string

LENGTH(col)arrow-up-right calculate length of the string

STRPOS(col, find, N)arrow-up-right Find Nth instance of find string in input. Returns 0 if input string is empty. Returns -1 if the Nth instance is not found or input string is null.

STARTSWITH(col, prefix)arrow-up-right returns true if columns starts with prefix string.

REPLACE(col, find, substitute)arrow-up-right replace all instances of find with replace in input

RPAD(col, size, pad)arrow-up-right string padded from the right side with pad to reach final size

LPAD(col, size, pad)arrow-up-right string padded from the left side with pad to reach final size

CODEPOINT(col)arrow-up-right the Unicode codepoint of the first character of the string

CHR(codepoint)arrow-up-right the character corresponding to the Unicode codepoint

regexpExtract(value, regexp)arrow-up-right Extracts values that match the provided regular expression

regexpReplace(input, matchRegexp, replaceRegexp, matchStartPos, occurrence, flag) arrow-up-rightFind and replace a string or regexp pattern with a target string or regexp pattern

remove(input, search)arrow-up-right removes all instances of search from string

urlEncoding(string)arrow-up-right url-encode a string with UTF-8 format

urlDecoding(string)arrow-up-right decode a url to plaintext string

fromBase64(string)arrow-up-right decode a Base64-encoded string to bytes represented as a hex string

toUtf8(string)arrow-up-right decode a UTF8-encoded string to bytes represented as a hex string

isSubnetOf(ipPrefix, ipAddress)arrow-up-right checks if ipAddress is in the subnet of the ipPrefix

DateTime Functions

Date time functions allow you to perform transformations on columns that contain timestamps or dates.

Function

TIMECONVERT(col, fromUnit, toUnit)arrow-up-right Converts the value into another time unit. the column should be an epoch timestamp.

DATETIMECONVERT(columnName, inputFormat, outputFormat, outputGranularity)arrow-up-right Converts the value into another date time format, and buckets time based on the given time granularity.

DATETRUNCarrow-up-right Converts the value into a specified output granularity seconds since UTC epoch that is bucketed on a unit in a specified timezone.

ToEpoch<TIME_UNIT>(timeInMillis)arrow-up-right Convert epoch milliseconds to epoch <Time Unit>.

ToEpoch<TIME_UNIT>Rounded(timeInMillis, bucketSize)arrow-up-right Convert epoch milliseconds to epoch <Time Unit>, round to nearest rounding bucket(Bucket size is defined in <Time Unit>).

ToEpoch<TIME_UNIT>Bucket(timeInMillis, bucketSize)arrow-up-right Convert epoch milliseconds to epoch <Time Unit>, and divided by bucket size(Bucket size is defined in <Time Unit>).

FromEpoch<TIME_UNIT> arrow-up-rightConvert epoch <Time Unit> to epoch milliseconds.(timeIn<Time_UNIT>)arrow-up-right

FromEpoch<TIME_UNIT>Bucket(timeIn<Time_UNIT>, bucketSizeIn<Time_UNIT>)arrow-up-right Convert epoch <Bucket Size><Time Unit> to epoch milliseconds.

ToDateTime(timeInMillis, pattern[, timezoneId])arrow-up-right Convert epoch millis value to DateTime string represented by pattern.

FromDateTime(dateTimeString, pattern)arrow-up-right Convert DateTime string represented by pattern to epoch millis.

round(timeValue, bucketSize)arrow-up-right Round the given time value to nearest bucket start value.

now()arrow-up-right Return current time as epoch millis

ago()arrow-up-right Return time as epoch millis before the given period (in ISO-8601 duration format)

timezoneHour(timeZoneId)arrow-up-right Returns the hour of the time zone offset.

timezoneMinute(timeZoneId)arrow-up-right Returns the minute of the time zone offset.

year(tsInMillis)arrow-up-right Returns the year from the given epoch millis in UTC timezone.

year(tsInMillis, timeZoneId)arrow-up-right Returns the year from the given epoch millis and timezone id.

yearOfWeek(tsInMillis)arrow-up-right Returns the year of the ISO week from the given epoch millis in UTC timezone. Alias yowis also supported.

yearOfWeek(tsInMillis, timeZoneId)arrow-up-right Returns the year of the ISO week from the given epoch millis and timezone id. Alias yowis also supported.

quarter(tsInMillis)arrow-up-right Returns the quarter of the year from the given epoch millis in UTC timezone. The value ranges from 1 to 4.

quarter(tsInMillis, timeZoneId)arrow-up-right Returns the quarter of the year from the given epoch millis and timezone id. The value ranges from 1 to 4.

month(tsInMillis)arrow-up-right Returns the month of the year from the given epoch millis in UTC timezone. The value ranges from 1 to 12.

month(tsInMillis, timeZoneId)arrow-up-right Returns the month of the year from the given epoch millis and timezone id. The value ranges from 1 to 12.

week(tsInMillis)arrow-up-right Returns the ISO week of the year from the given epoch millis in UTC timezone. The value ranges from 1 to 53. Alias weekOfYear is also supported.

week(tsInMillis, timeZoneId)arrow-up-right Returns the ISO week of the year from the given epoch millis and timezone id. The value ranges from 1 to 53. Alias weekOfYear is also supported.

dayOfYear(tsInMillis)arrow-up-right Returns the day of the year from the given epoch millis in UTC timezone. The value ranges from 1 to 366. Alias doy is also supported.

dayOfYear(tsInMillis, timeZoneId)arrow-up-right Returns the day of the year from the given epoch millis and timezone id. The value ranges from 1 to 366. Alias doy is also supported.

day(tsInMillis)arrow-up-right Returns the day of the month from the given epoch millis in UTC timezone. The value ranges from 1 to 31. Alias dayOfMonth is also supported.

day(tsInMillis, timeZoneId)arrow-up-right Returns the day of the month from the given epoch millis and timezone id. The value ranges from 1 to 31. Alias dayOfMonth is also supported.

dayOfWeek(tsInMillis)arrow-up-right Returns the day of the week from the given epoch millis in UTC timezone. The value ranges from 1(Monday) to 7(Sunday). Alias dow is also supported.

dayOfWeek(tsInMillis, timeZoneId)arrow-up-right Returns the day of the week from the given epoch millis and timezone id. The value ranges from 1(Monday) to 7(Sunday). Alias dow is also supported.

hour(tsInMillis)arrow-up-right Returns the hour of the day from the given epoch millis in UTC timezone. The value ranges from 0 to 23.

hour(tsInMillis, timeZoneId)arrow-up-right Returns the hour of the day from the given epoch millis and timezone id. The value ranges from 0 to 23.

minute(tsInMillis)arrow-up-right Returns the minute of the hour from the given epoch millis in UTC timezone. The value ranges from 0 to 59.

minute(tsInMillis, timeZoneId)arrow-up-right Returns the minute of the hour from the given epoch millis and timezone id. The value ranges from 0 to 59.

second(tsInMillis)arrow-up-right Returns the second of the minute from the given epoch millis in UTC timezone. The value ranges from 0 to 59.

second(tsInMillis, timeZoneId)arrow-up-right Returns the second of the minute from the given epoch millis and timezone id. The value ranges from 0 to 59.

millisecond(tsInMillis)arrow-up-right Returns the millisecond of the second from the given epoch millis in UTC timezone. The value ranges from 0 to 999.

millisecond(tsInMillis, timeZoneId)arrow-up-right Returns the millisecond of the second from the given epoch millis and timezone id. The value ranges from 0 to 999.

JSON Functions

Transform Functions

These functions can only be used in Pinot SQL queries.

Function

JSONEXTRACTSCALAR(jsonField, 'jsonPath', 'resultsType', [defaultValue])arrow-up-right Evaluates the 'jsonPath' on jsonField, returns the result as the type 'resultsType', use optional defaultValuefor null or parsing error.

JSONEXTRACTKEYarrow-up-right(jsonField, 'jsonPath')arrow-up-right Extracts all matched JSON field keys based on 'jsonPath' into a STRING_ARRAY.

EXTRACT(dateTimeField FROM dateTimeExpression)arrow-up-right Extracts the field from the DATETIME expression of the format 'YYYY-MM-DD HH:MM:SS'. Currently, this transformation function supports YEAR, MONTH, DAY, HOUR, MINUTE, and SECOND fields.

Scalar Functions

These functions can be used for column transformation in table ingestion configs.

Function

JSONFORMAT(object)arrow-up-right Convert object to JSON String

JSONPATH(jsonField, 'jsonPath')arrow-up-right Extracts the object value from jsonField based on 'jsonPath', the result type is inferred based on JSON value. Cannot be used in query because data type is not specified.

JSONPATHLONG(jsonField, 'jsonPath', [defaultValue])arrow-up-right Extracts the Long value from jsonField based on 'jsonPath', use optional defaultValuefor null or parsing error.

JSONPATHDOUBLE(jsonField, 'jsonPath', [defaultValue])arrow-up-right Extracts the Double value from jsonField based on 'jsonPath', use optional defaultValuefor null or parsing error.

JSONPATHSTRING(jsonField, 'jsonPath', [defaultValue])arrow-up-right Extracts the String value from jsonField based on 'jsonPath', use optional defaultValuefor null or parsing error.

JSONPATHARRAY(jsonField, 'jsonPath')arrow-up-right Extracts an array from jsonField based on 'jsonPath', the result type is inferred based on JSON value. Cannot be used in query because data type is not specified.

JSONPATHARRAYDEFAULTEMPTY(jsonField, 'jsonPath')arrow-up-right Extracts an array from jsonField based on 'jsonPath', the result type is inferred based on JSON value. Returns empty array for null or parsing error. Cannot be used in query because data type is not specified.

Binary Functions

Function

SHA(bytesCol)arrow-up-right Return SHA-1 digest of binary column(bytes type) as hex string

SHA256(bytesCol)arrow-up-right Return SHA-256 digest of binary column(bytes type) as hex string

SHA512(bytesCol)arrow-up-right Return SHA-512 digest of binary column(bytes type) as hex string

MD5(bytesCol)arrow-up-right Return MD5 digest of binary column(bytes type) as hex string

toBase64(bytesCol)arrow-up-right Return the Base64-encoded string of binary column(bytes type)

fromUtf8(bytesCol)arrow-up-right Return the UTF8-encoded string of binary column(bytes type)

Multi-value Column Functions

All of the functions mentioned till now only support single value columns. You can use the following functions to do operations on multi-value columns.

Function

ARRAYLENGTHarrow-up-right Returns the length of a multi-value

MAP_VALUE Select the value for a key from Map stored in Pinot. MAP_VALUE(mapColumn, 'myKey', valueColumn)

VALUEINarrow-up-right The transform function will filter the value from the multi-valued column with the given constant values. The VALUEIN transform function is especially useful when the same multi-valued column is both filtering column and grouping column.

Advanced Queries

Geospatial Queries

Pinot supports Geospatial queries on columns containing text-based geographies. For more details on the queries and how to enable them, see Geospatial.

Text Queries

Pinot supports pattern matching on text-based columns. Only the columns mentioned as text columns in table config can be queried using this method. For more details on how to enable pattern matching, see Text search support.

Last updated

Was this helpful?