Supported Transformations
This document contains the list of all the transformation functions supported by Pinot SQL.
Math Functions
Function | Description | Example |
---|---|---|
ADD(col1, col2, col3...) | Sum of at least two values |
|
SUB(col1, col2) | Difference between two values |
|
MULT(col1, col2, col3...) | Product of at least two values |
|
DIV(col1, col2) | Quotient of two values |
|
MOD(col1, col2) | Modulo of two values |
|
ABS(col1) | Absolute of a value |
|
CEIL(col1) | Rounded up to the nearest integer. |
|
FLOOR(col1) | Rounded down to the nearest integer. |
|
EXP(col1) | Euler’s number(e) raised to the power of col. |
|
LN(col1) | Natural log of value i.e. ln(col1) |
|
SQRT(col1) | Square root of a value |
|
String Functions
Multiple string functions are supported out of the box from release-0.5.0 .
Function | Description | Example |
---|---|---|
UPPER(col) | convert string to upper case |
|
LOWER(col) | convert string to lower case |
|
REVERSE(col) | reverse the string |
|
SUBSTR(col, startIndex, endIndex) | get substring of the input string from start to endIndex. Index begins at 0. Set endIndex to -1 to calculate till end of the string |
<code></code>
|
CONCAT(col1, col2, seperator) | Concatenate two input strings using the seperator |
|
TRIM(col) | trim spaces from both side of the string |
|
LTRIM(col) | trim spaces from left side of the string |
|
RTRIM(col) | trim spaces from right side of the string |
|
LENGTH(col) | calculate length of the string |
|
STRPOS(col, find, N) | find Nth instance of |
|
STARTSWITH(col, prefix) | returns |
|
REPLACE(col, find, substitute) | replace all instances of |
|
RPAD(col, size, pad) | string padded from the right side with |
|
LPAD(col, size, pad) | string padded from the left side with |
|
CODEPOINT(col) | the Unicode codepoint of the first character of the string |
|
CHR(codepoint) | the character corresponding to the Unicode codepoint |
|
DateTime Functions
Date time functions allow you to perform transformations on columns that contain timestamps or dates.
Function | Description | Example |
---|---|---|
TIMECONVERT (col, fromUnit, toUnit) | Converts the value into another time unit. the column should be an epoch timestamp. Supported units are
|
|
DATETIMECONVERT (columnName, inputFormat, outputFormat, outputGranularity) | Takes 4 arguments, converts the value into another date time format, and buckets time based on the given time granularity. Note that, for weeks/months/quarters/years, please use function: DateTrunc. The format is expressed as
|
|
DATETRUNC | (Presto) SQL compatible date truncation, equivalent to the Presto function date_trunc. Takes at least 3 and upto 5 arguments, converts the value into a specified output granularity seconds since UTC epoch that is bucketed on a unit in a specified timezone. |
|
ToEpoch<TIME_UNIT>(timeInMillis) | Convert epoch milliseconds to epoch <Time Unit>. Supported <Time Unit>: |
|
ToEpoch<TIME_UNIT>Rounded(timeInMillis, bucketSize) | Convert epoch milliseconds to epoch <Time Unit>, round to nearest rounding bucket(Bucket size is defined in <Time Unit>). Supported <Time Unit>: |
|
ToEpoch<TIME_UNIT>Bucket(timeInMillis, bucketSize) | Convert epoch milliseconds to epoch <Time Unit>, and divided by bucket size(Bucket size is defined in <Time Unit>). Supported <Time Unit>: |
|
FromEpoch<TIME_UNIT>(timeIn<Time_UNIT>) | Convert epoch <Time Unit> to epoch milliseconds. Supported <Time Unit>: |
|
FromEpoch<TIME_UNIT>Bucket(timeIn<Time_UNIT>, bucketSizeIn<Time_UNIT>) | Convert epoch <Bucket Size><Time Unit> to epoch milliseconds. E.g. 10 seconds since epoch or 5 minutes since Epoch. Supported <Time Unit>: |
|
ToDateTime(timeInMillis, pattern[, timezoneId]) | Convert epoch millis value to DateTime string represented by pattern. Time zone will be set to UTC if |
|
FromDateTime(dateTimeString, pattern) | Convert DateTime string represented by pattern to epoch millis. |
|
round(timeValue, bucketSize) | Round the given time value to nearest bucket start value. |
|
now() | Return current time as epoch millis | Typically used in predicate to filter on timestamp for recent data. E.g. filter data on recent 1 day(86400 seconds). |
timezoneHour(timeZoneId) | Returns the hour of the time zone offset. | |
timezoneMinute(timeZoneId) | Returns the minute of the time zone offset. | |
year(tsInMillis) | Returns the year from the given epoch millis in UTC timezone. | |
year(tsInMillis, timeZoneId) | Returns the year from the given epoch millis and timezone id. | |
yearOfWeek(tsInMillis) | Returns the year of the ISO week from the given epoch millis in UTC timezone. Alias | |
yearOfWeek(tsInMillis, timeZoneId) | Returns the year of the ISO week from the given epoch millis and timezone id. Alias | |
quarter(tsInMillis) | Returns the quarter of the year from the given epoch millis in UTC timezone. The value ranges from 1 to 4. | |
quarter(tsInMillis, timeZoneId) | Returns the quarter of the year from the given epoch millis and timezone id. The value ranges from 1 to 4. | |
month(tsInMillis) | Returns the month of the year from the given epoch millis in UTC timezone. The value ranges from 1 to 12. | |
month(tsInMillis, timeZoneId) | Returns the month of the year from the given epoch millis and timezone id. The value ranges from 1 to 12. | |
week(tsInMillis) | Returns the ISO week of the year from the given epoch millis in UTC timezone. The value ranges from 1 to 53. Alias | |
week(tsInMillis, timeZoneId) | Returns the ISO week of the year from the given epoch millis and timezone id. The value ranges from 1 to 53. Alias | |
dayOfYear(tsInMillis) | Returns the day of the year from the given epoch millis in UTC timezone. The value ranges from 1 to 366. Alias | |
dayOfYear(tsInMillis, timeZoneId) | Returns the day of the year from the given epoch millis and timezone id. The value ranges from 1 to 366. Alias | |
day(tsInMillis) | Returns the day of the month from the given epoch millis in UTC timezone. The value ranges from 1 to 31. Alias | |
day(tsInMillis, timeZoneId) | Returns the day of the month from the given epoch millis and timezone id. The value ranges from 1 to 31. Alias | |
dayOfWeek(tsInMillis) | Returns the day of the week from the given epoch millis in UTC timezone. The value ranges from 1(Monday) to 7(Sunday). Alias | |
dayOfWeek(tsInMillis, timeZoneId) | Returns the day of the week from the given epoch millis and timezone id. The value ranges from 1(Monday) to 7(Sunday). Alias | |
hour(tsInMillis) | Returns the hour of the day from the given epoch millis in UTC timezone. The value ranges from 0 to 23. | |
hour(tsInMillis, timeZoneId) | Returns the hour of the day from the given epoch millis and timezone id. The value ranges from 0 to 23. | |
minute(tsInMillis) | Returns the minute of the hour from the given epoch millis in UTC timezone. The value ranges from 0 to 59. | |
minute(tsInMillis, timeZoneId) | Returns the minute of the hour from the given epoch millis and timezone id. The value ranges from 0 to 59. | |
second(tsInMillis) | Returns the second of the minute from the given epoch millis in UTC timezone. The value ranges from 0 to 59. | |
second(tsInMillis, timeZoneId) | Returns the second of the minute from the given epoch millis and timezone id. The value ranges from 0 to 59. | |
millisecond(tsInMillis) | Returns the millisecond of the second from the given epoch millis in UTC timezone. The value ranges from 0 to 999. | |
millisecond(tsInMillis, timeZoneId) | Returns the millisecond of the second from the given epoch millis and timezone id. The value ranges from 0 to 999. |
JSON Functions
Function | Type | Description |
JSONEXTRACTSCALAR (jsonField, 'jsonPath', 'resultsType', [defaultValue]) | Transform | Evaluates the returns the result as the type |
JSONEXTRACTKEY (jsonField, 'jsonPath') | Transform | Extracts all matched JSON field keys based on Into a |
TOJSONMAPSTR(map) | Scalar | Convert map to JSON String |
JSONFORMAT(object) | Scalar | Convert object to JSON String |
JSONPATH(jsonField, 'jsonPath') | Scalar | Extracts the object value from |
JSONPATHLONG(jsonField, 'jsonPath', [defaultValue]) | Scalar | Extracts the Long value from |
JSONPATHDOUBLE(jsonField, 'jsonPath', [defaultValue]) | Scalar | Extracts the Double value from |
JSONPATHSTRING(jsonField, 'jsonPath', [defaultValue]) | Scalar | Extracts the String value from |
JSONPATHARRAY(jsonField, 'jsonPath') | Scalar | Extracts an array from |
JSONPATHARRAYDEFAULTEMPTY(jsonField, 'jsonPath') | Scalar | Extracts an array from |
Usage
| Description |
| An Identifier/Expression contains JSON documents. |
| Follows JsonPath Syntax to read values from JSON documents. |
| One of the Pinot supported data types:
|
'jsonPath'
and
'results_type'
are literals. Pinot uses single quotes to distinguish them from identifiers.
e.g.
JSONEXTRACTSCALAR(profile_json_str, '$.name', 'STRING')
is valid.JSONEXTRACTSCALAR(profile_json_str, "$.name", "STRING")
is invalid.
Transform functions can only be used in Pinot SQL. Scalar functions can be used for column transformation in table ingestion configs.
Examples
The examples below are based on these 3 sample profile JSON documents:
Query 1: Extract string values from the field 'name'
Results are
Query 2: Extract integer values from the field 'age'
Results are
Query 3: Extract Bob's age from the JSON profile.
Results are
Query 4: Extract all field keys of JSON profile.
Results are
Another example of extracting JSON fields from below JSON record:
Extract JSON fields:
Expression | Value |
---|---|
|
|
|
|
|
|
|
|
|
|
|
|
Binary Functions
Function | Description | Example |
---|---|---|
SHA(bytesCol) | Return SHA-1 digest of binary column( |
|
SHA256(bytesCol) | Return SHA-256 digest of binary column( |
|
SHA512(bytesCol) | Return SHA-512 digest of binary column( |
|
MD5(bytesCol) | Return MD5 digest of binary column( |
|
Multi-value Column Functions
All of the functions mentioned till now only support single value columns. You can use the following functions to do operations on multi-value columns.
Function | Description | Example |
---|---|---|
ARRAYLENGTH | Returns the length of a multi-value column | |
MAP_VALUE | Select the value for a key from Map stored in Pinot. |
|
VALUEIN | Takes at least 2 arguments, where the first argument is a multi-valued column, and the following arguments are constant values. The transform function will filter the value from the multi-valued column with the given constant values. The |
|
Advanced Queries
Geospatial Queries
Pinot supports Geospatial queries on columns containing text-based geographies. For more details on the queries and how to enable them, see Geospatial.
Text Queries
Pinot supports pattern matching on text-based columns. Only the columns mentioned as text columns in table config can be queried using this method. For more details on how to enable pattern matching, see Text search support.
Last updated