Supported Transformations
This document contains the list of all the transformation functions supported by Pinot SQ

Math Functions

Function
Description
Example
ADD(col1, col2, col3...)
Sum of at least two values
ADD(score_maths, score_science, score_history)
SUB(col1, col2)
Difference between two values
SUB(total_score, score_science)
MULT(col1, col2, col3...)
Product of at least two values
MUL(score_maths, score_science, score_history)
DIV(col1, col2)
Quotient of two values
SUB(total_score, total_subjects)
**MOD(**col1, col2)
Modulo of two values
MOD(total_score, total_subjects)
ABS(col1)
Absolute of a value
ABS(score)
CEIL(col1)
Rounded up to the nearest integer.
CEIL(percentage)
FLOOR(col1)
Rounded down to the nearest integer.
FLOOR(percentage)
EXP(col1)
Euler’s number(e) raised to the power of col.
EXP(age)
LN(col1)
Natural log of value i.e. ln(col1)
LN(age)
SQRT(col1)
Square root of a value
SQRT(height)

String Functions

Multiple string functions are supported out of the box from release-0.5.0 .
Function
Description
Example
UPPER(col)
convert string to upper case
UPPER(playerName)
LOWER(col)
convert string to lower case
LOWER(playerName)
REVERSE(col)
reverse the string
REVERSE(playerName)
SUBSTR(col, startIndex, endIndex)
get substring of the input string from start to endIndex. Index begins at 0. Set endIndex to -1 to calculate till end of the string
SUBSTR(playerName, 1, -1)
<code></code>
SUBSTR(playerName, 1, 4)
CONCAT(col1, col2, seperator)
Concatenate two input strings using the seperator
CONCAT(firstName, lastName, '-')
TRIM(col)
trim spaces from both side of the string
TRIM(playerName)
LTRIM(col)
trim spaces from left side of the string
LTRIM(playerName)
RTRIM(col)
trim spaces from right side of the string
RTRIM(playerName)
LENGTH(col)
calculate length of the string
LENGTH(playerName)
STRPOS(col, find, N)
find Nth instance of find string in input. Returns 0 if input string is empty. Returns -1 if the Nth instance is not found or input string is null.
STRPOS(playerName, 'david', 1)
STARTSWITH(col, prefix)
returns true if columns starts with prefix string.
STARTSWITH(playerName, 'david')
REPLACE(col, find, substitute)
replace all instances of find with replace in input
REPLACE(playerName, 'david', 'henry')
RPAD(col, size, pad)
string padded from the right side with pad to reach final size
RPAD(playerName, 20, 'foo')
LPAD(col, size, pad)
string padded from the left side with pad to reach final size
LPAD(playerName, 20, 'foo')
CODEPOINT(col)
the Unicode codepoint of the first character of the string
CODEPOINT(playerName)
CHR(codepoint)
the character corresponding to the Unicode codepoint
CHR(68)

DateTime Functions

Date time functions allow you to perform transformations on columns which contains timestamps or date.
Function
Description
Example
TIMECONVERT
(col, fromUnit, toUnit)
Converts the value into another time unit. the column should be an epoch timestamp. Supported units are DAYS HOURS MINUTES SECONDS MILLISECONDS MICROSECONDS NANOSECONDS
TIMECONVERT(time, 'MILLISECONDS', 'SECONDS')This expression converts the value of column time (taken to be in milliseconds) to the nearest seconds (i.e. the nearest seconds that is lower than the value of date column)
DATETIMECONVERT
(columnName, inputFormat, outputFormat, outputGranularity)
Takes 4 arguments, converts the value into another date time format, and buckets time based on the given time granularity. Note that, for weeks/months/quarters/years, please use function: DateTrunc.
The format is expressed as <time size>:<time unit>:<time format>:<pattern> where,
time size - size of the time unit eg: 1, 10
time unit - DAYS HOURS MINUTES SECONDS MILLISECONDS MICROSECONDS NANOSECONDS
time format - EPOCH or SIMPLE_DATE_FORMAT
pattern - this is defined in case of SIMPLE_DATE_FORMAT eg: yyyy-MM-dd. A specific timezone can be passed using tz(timezone). Timezone can be long or short string format timezone. e.g. Asia/Kolkata or PDT
granularity - specified in the format<time size>:<time unit>
    Date from hoursSinceEpoch to daysSinceEpoch and bucket it to 1 day granularity DATETIMECONVERT(Date, '1:HOURS:EPOCH', '1:DAYS:EPOCH', '1:DAYS')
    Date to 15 minutes granularity DATETIMECONVERT(Date, '1:MILLISECONDS:EPOCH', '1:MILLISECONDS:EPOCH', '15:MINUTES')
    Date from hoursSinceEpoch to format yyyyMdd and bucket it to 1 days granularity DATETIMECONVERT(Date, '1:HOURS:EPOCH', '1:DAYS:SIMPLE_DATE_FORMAT:yyyyMMdd', '1:DAYS')
    Date from milliseconds to format yyyyMdd in timezone PST DATETIMECONVERT(Date, '1:MILLISECONDS:EPOCH', '1:DAYS:SIMPLE_DATE_FORMAT:yyyyMMdd tz(America/Los_Angeles)', '1:DAYS')
DATETRUNC
(Presto) SQL compatible date truncation, equivalent to the Presto function date_trunc.
Takes at least 3 and upto 5 arguments, converts the value into a specified output granularity seconds since UTC epoch that is bucketed on a unit in a specified timezone.
DATETRUNC('week', time_in_seconds, 'SECONDS') This expression converts the column time_in_seconds, which is a long containing seconds since UTC epoch truncated at WEEK (where a Week starts at Monday UTC midnight). The output is a long seconds since UTC epoch.
DATETRUNC('quarter', DIV(time_milliseconds/1000), 'SECONDS', 'America/Los_Angeles', 'HOURS') This expression converts the expression time_in_milliseconds/1000into hours that are truncated to QUARTER at the Los Angeles time zone (where a Quarter begins on 1/1, 4/1, 7/1, 10/1 in Los Angeles timezone). The output is expressed as hours since UTC epoch (note that the output is not Los Angeles timezone)
ToEpoch<TIME_UNIT>(timeInMillis)
Convert epoch milliseconds to epoch <Time Unit>. Supported <Time Unit>: SECONDS/MINUTES/HOURS/DAYS
ToEpochSeconds(tsInMillis):Converts column tsInMillis value from epoch milliseconds to epoch seconds.
ToEpochDays(tsInMillis):Converts column tsInMillis value from epoch milliseconds to epoch days.
ToEpoch<TIME_UNIT>Rounded(timeInMillis, bucketSize)
Convert epoch milliseconds to epoch <Time Unit>, round to nearest rounding bucket(Bucket size is defined in <Time Unit>). Supported <Time Unit>: SECONDS/MINUTES/HOURS/DAYS
ToEpochSecondsRound(tsInMillis, 10):Converts column tsInMillis value from epoch milliseconds to epoch seconds and round to the 10-minute bucket value. E.g.ToEpochSecondsRound(1613472303000, 10) = 1613472300
ToEpochMinutesRound(tsInMillis, 1440):Converts column tsInMillis value from epoch milliseconds to epoch Minutes, but round to 1-day bucket value. E.g.ToEpochMinutesRound(1613472303000, 1440) = 26890560
ToEpoch<TIME_UNIT>Bucket(timeInMillis, bucketSize)
Convert epoch milliseconds to epoch <Time Unit>, and divided by bucket size(Bucket size is defined in <Time Unit>). Supported <Time Unit>: SECONDS/MINUTES/HOURS/DAYS
ToEpochSecondsBucket(tsInMillis, 10):Converts column tsInMillis value from epoch milliseconds to epoch seconds then divide by 10 to get the 10 seconds since epoch value. E.g.
ToEpochSecondsBucket(1613472303000, 10) = 161347230
ToEpochHoursBucket(tsInMillis, 24):Converts column tsInMillis value from epoch milliseconds to epoch Hours, then divide by 24 to get 24 hours since epoch value.
FromEpoch<TIME_UNIT>(timeIn<Time_UNIT>)
Convert epoch <Time Unit> to epoch milliseconds. Supported <Time Unit>: SECONDS/MINUTES/HOURS/DAYS
FromEpochSeconds(tsInSeconds):Converts column tsInSeconds value from epoch seconds to epoch milliseconds. E.g.
FromEpochSeconds(1613472303) = 1613472303000
FromEpoch<TIME_UNIT>Bucket(timeIn<Time_UNIT>, bucketSizeIn<Time_UNIT>)
Convert epoch <Bucket Size><Time Unit> to epoch milliseconds. E.g. 10 seconds since epoch or 5 minutes since Epoch. Supported <Time Unit>: SECONDS/MINUTES/HOURS/DAYS
FromEpochSecondsBucket(tsInSeconds, 10):Converts column tsInSeconds value from epoch 10-seconds to epoch milliseconds. E.g.
FromEpochSeconds(161347231)= 1613472310000
ToDateTime(timeInMillis, pattern[, timezoneId])
Convert epoch millis value to DateTime string represented by pattern. Time zone will be set to UTC if timezoneId is not specified.
ToDateTime(tsInMillis, 'yyyy-MM-dd') converts tsInMillis value to date time pattern yyyy-MM-dd
ToDateTime(tsInMillis, 'yyyy-MM-dd ZZZ', 'America/Los_Angeles') converts tsInMillis value to date time pattern yyyy-MM-dd ZZZ in America/Los_Angeles time zone
FromDateTime(dateTimeString, pattern)
Convert DateTime string represented by pattern to epoch millis.
FromDateTime(dateTime, 'yyyy-MM-dd')converts dateTime string value to millis epoch value
round(timeValue, bucketSize)
Round the given time value to nearest bucket start value.
round(tsInSeconds, 60) round seconds epoch value to the start value of the 60 seconds bucket it belongs to. E.g. round(161347231, 60)= 161347200
now()
Return current time as epoch millis
Typically used in predicate to filter on timestamp for recent data. E.g. filter data on recent 1 day(86400 seconds).WHERE tsInMillis > now() - 86400000
timezoneHour(timeZoneId)
Returns the hour of the time zone offset.
timezoneMinute(timeZoneId)
Returns the minute of the time zone offset.
year(tsInMillis)
Returns the year from the given epoch millis in UTC timezone.
year(tsInMillis, timeZoneId)
Returns the year from the given epoch millis and timezone id.
yearOfWeek(tsInMillis)
Returns the year of the ISO week from the given epoch millis in UTC timezone. Alias yowis also supported.
yearOfWeek(tsInMillis, timeZoneId)
Returns the year of the ISO week from the given epoch millis and timezone id. Alias yowis also supported.
quarter(tsInMillis)
Returns the quarter of the year from the given epoch millis in UTC timezone. The value ranges from 1 to 4.
quarter(tsInMillis, timeZoneId)
Returns the quarter of the year from the given epoch millis and timezone id. The value ranges from 1 to 4.
month(tsInMillis)
Returns the month of the year from the given epoch millis in UTC timezone. The value ranges from 1 to 12.
month(tsInMillis, timeZoneId)
Returns the month of the year from the given epoch millis and timezone id. The value ranges from 1 to 12.
week(tsInMillis)
Returns the ISO week of the year from the given epoch millis in UTC timezone. The value ranges from 1 to 53. Alias weekOfYear is also supported.
week(tsInMillis, timeZoneId)
Returns the ISO week of the year from the given epoch millis and timezone id. The value ranges from 1 to 53. Alias weekOfYear is also supported.
dayOfYear(tsInMillis)
Returns the day of the year from the given epoch millis in UTC timezone. The value ranges from 1 to 366. Alias doy is also supported.
dayOfYear(tsInMillis, timeZoneId)
Returns the day of the year from the given epoch millis and timezone id. The value ranges from 1 to 366. Alias doy is also supported.
day(tsInMillis)
Returns the day of the month from the given epoch millis in UTC timezone. The value ranges from 1 to 31. Alias dayOfMonth is also supported.
day(tsInMillis, timeZoneId)
Returns the day of the month from the given epoch millis and timezone id. The value ranges from 1 to 31. Alias dayOfMonth is also supported.
dayOfWeek(tsInMillis)
Returns the day of the week from the given epoch millis in UTC timezone. The value ranges from 1(Monday) to 7(Sunday). Alias dow is also supported.
dayOfWeek(tsInMillis, timeZoneId)
Returns the day of the week from the given epoch millis and timezone id. The value ranges from 1(Monday) to 7(Sunday). Alias dow is also supported.
hour(tsInMillis)
Returns the hour of the day from the given epoch millis in UTC timezone. The value ranges from 0 to 23.
hour(tsInMillis, timeZoneId)
Returns the hour of the day from the given epoch millis and timezone id. The value ranges from 0 to 23.
minute(tsInMillis)
Returns the minute of the hour from the given epoch millis in UTC timezone. The value ranges from 0 to 59.
minute(tsInMillis, timeZoneId)
Returns the minute of the hour from the given epoch millis and timezone id. The value ranges from 0 to 59.
second(tsInMillis)
Returns the second of the minute from the given epoch millis in UTC timezone. The value ranges from 0 to 59.
second(tsInMillis, timeZoneId)
Returns the second of the minute from the given epoch millis and timezone id. The value ranges from 0 to 59.
millisecond(tsInMillis)
Returns the millisecond of the second from the given epoch millis in UTC timezone. The value ranges from 0 to 999.
millisecond(tsInMillis, timeZoneId)
Returns the millisecond of the second from the given epoch millis and timezone id. The value ranges from 0 to 999.

JSON Functions

Text
Text
Text
Function
Type
Description
JSONEXTRACTSCALAR
(jsonField, 'jsonPath', 'resultsType', [defaultValue])
Transform
Evaluates the 'jsonPath' on jsonField,
returns the result as the type 'resultsType', use optional defaultValuefor null or parsing error.
JSONEXTRACTKEY
(jsonField, 'jsonPath')
Transform
Extracts all matched JSON field keys based on 'jsonPath'
Into aSTRING_ARRAY.
TOJSONMAPSTR(map)
Scalar
Convert map to JSON String
JSONFORMAT(object)
Scalar
Convert object to JSON String
JSONPATH(jsonField, 'jsonPath')
Scalar
Extracts the object value from jsonField based on 'jsonPath', the result type is inferred based on JSON value. Cannot be used in query because data type is not specified.
JSONPATHLONG(jsonField, 'jsonPath', [defaultValue])
Scalar
Extracts the Long value from jsonField based on 'jsonPath', use optional defaultValuefor null or parsing error.
JSONPATHDOUBLE(jsonField, 'jsonPath', [defaultValue])
Scalar
Extracts the Double value from jsonField based on 'jsonPath', use optional defaultValuefor null or parsing error.
JSONPATHSTRING(jsonField, 'jsonPath', [defaultValue])
Scalar
Extracts the String value from jsonField based on 'jsonPath', use optional defaultValuefor null or parsing error.
JSONPATHARRAY(jsonField, 'jsonPath')
Scalar
Extracts an array from jsonField based on 'jsonPath', the result type is inferred based on JSON value. Cannot be used in query because data type is not specified.
JSONPATHARRAYDEFAULTEMPTY(jsonField, 'jsonPath')
Scalar
Extracts an array from jsonField based on 'jsonPath', the result type is inferred based on JSON value. Returns empty array for null or parsing error. Cannot be used in query because data type is not specified.
Usage
Text
Text
Arguments
Description
jsonField
An Identifier/Expression contains JSON documents.
'jsonPath'
Follows JsonPath Syntax to read values from JSON documents.
'results_type'
One of the Pinot supported data types:INT, LONG, FLOAT, DOUBLE, BOOLEAN, TIMESTAMP, STRING,
INT_ARRAY, LONG_ARRAY, FLOAT_ARRAY, DOUBLE_ARRAY, STRING_ARRAY.
'jsonPath'and**'resultsType'**are Literals. Pinot uses single quotes to distinguish it from Identifiers.
E.g.
JSONEXTRACTSCALAR(profile_json_str, '$.name', 'STRING') is Valid.JSONEXTRACTSCALAR(profile_json_str, "$.name", "STRING") is Invalid.
Transform functions can only be used in Pinot SQL. Scalar functions can be used in table ingestion configs for column transformation.
Examples
Below examples are based on 3 sample profile JSON documents:
1
{
2
"name" : "Bob",
3
"age" : 37,
4
"gender": "male",
5
"location": "San Francisco"
6
},{
7
"name" : "Alice",
8
"age" : 25,
9
"gender": "female",
10
"location": "New York"
11
},{
12
"name" : "Mia",
13
"age" : 18,
14
"gender": "female",
15
"location": "Chicago"
16
}
Copied!
Query 1: Extract string values from the field 'name'
1
SELECT
2
JSONEXTRACTSCALAR(profile_json_str, '$.name', 'STRING')
3
FROM
4
myTable
Copied!
Results are
1
["Bob", "Alice", "Mia"]
Copied!
Query 2: Extract integer values from the field 'age'
1
SELECT
2
JSONEXTRACTSCALAR(profile_json_str, '$.age', 'INT')
3
FROM
4
myTable
Copied!
Results are
1
[37, 25, 18]
Copied!
Query 3: Extract Bob's age from the JSON profile.
1
SELECT
2
JSONEXTRACTSCALAR(myMapStr,'$.age','INT')
3
FROM
4
myTable
5
WHERE
6
JSONEXTRACTSCALAR(myMapStr,'$.name','STRING') = 'Bob'
Copied!
Results are
1
[37]
Copied!
Query 4: Extract all field keys of JSON profile.
1
SELECT
2
JSONEXTRACTKEY(myMapStr,'$.*')
3
FROM
4
myTable
Copied!
Results are
1
["name", "age", "gender", "location"]
Copied!
Another example of extracting JSON fields from below JSON record:
1
{
2
"name": "Pete",
3
"age": 24,
4
"subjects": [{
5
"name": "maths",
6
"homework_grades": [80, 85, 90, 95, 100],
7
"grade": "A",
8
"score": 90
9
},
10
{
11
"name": "english",
12
"homework_grades": [60, 65, 70, 85, 90],
13
"grade": "B",
14
"score": 70
15
}
16
]
17
}
Copied!
Extract JSON fields:
Expression
Value
JSONPATH(myJsonRecord, '$.name')
"Pete"
JSONPATH(myJsonRecord, '$.age')
24
JSONPATHSTRING(myJsonRecord, '$.age')
"24"
JSONPATHARRAY(myJsonRecord, '$.subjects[*].name')
["maths", "english"]
JSONPATHARRAY(myJsonRecord, '$.subjects[*].score')
[90, 70]
JSONPATHARRAY(myJsonRecord, '$.subjects[*].homework_grades[1]')
[85, 65]

Binary Functions

Function
Description
Example
SHA(bytesCol)
Return SHA-1 digest of binary column(bytes type) as hex string
SHA(rawData)
SHA256(bytesCol)
Return SHA-256 digest of binary column(bytes type) as hex string
SHA256(rawData)
SHA512(bytesCol)
Return SHA-512 digest of binary column(bytes type) as hex string
SHA512(rawData)
MD5(bytesCol)
Return MD5 digest of binary column(bytes type) as hex string
MD5(rawData)

Multi-value Column Functions

All of the functions mentioned till now only support single value columns. You can use the following functions to do operations on multi-value columns.
Function
Description
Example
ARRAYLENGTH
Returns the length of a multi-value column
MAP_VALUE
Select the value for a key from Map stored in Pinot.
MAP_VALUE(mapColumn, 'myKey', valueColumn)
VALUEIN
Takes at least 2 arguments, where the first argument is a multi-valued column, and the following arguments are constant values. The transform function will filter the value from the multi-valued column with the given constant values. The VALUEIN transform function is especially useful when the same multi-valued column is both filtering column and grouping column.
VALUEIN(mvColumn, 3, 5, 15)

Advanced Queries

Geospatial Queries

Pinot supports Geospatial queries on columns containing text-based geographies. Check out Geospatial for more details on the queries and how to enable them.

Text Queries

Pinot supports Pattern matching on text-based columns as well. Only the columns mentioned as text columns in table config can be queried using this method. Check out Text search support for more details on how to enable pattern matching.
Last modified 11d ago