# Supported Transformations

## Math Functions

| Function                      | Description                                   | Example                                          |
| ----------------------------- | --------------------------------------------- | ------------------------------------------------ |
| **ADD(col1, col2, col3...)**  | Sum of at least two values                    | `ADD(score_maths, score_science, score_history)` |
| **SUB(col1, col2)**           | Difference between two values                 | `SUB(total_score, score_science)`                |
| **MULT(col1, col2, col3...)** | Product of at least two values                | `MUL(score_maths, score_science, score_history)` |
| **DIV(col1, col2)**           | Quotient of two values                        | `SUB(total_score, total_subjects)`               |
| **MOD(col1, col2)**           | Modulo of two values                          | `MOD(total_score, total_subjects)`               |
| **ABS(col1)**                 | Absolute of a value                           | `ABS(score)`                                     |
| **CEIL(col1)**                | Rounded up to the nearest integer.            | `CEIL(percentage)`                               |
| **FLOOR(col1)**               | Rounded down to the nearest integer.          | `FLOOR(percentage)`                              |
| **EXP(col1)**                 | Euler’s number(e) raised to the power of col. | `EXP(age)`                                       |
| **LN(col1)**                  | Natural log of value i.e. ln(col1)            | `LN(age)`                                        |
| **SQRT(col1)**                | Square root of a value                        | `SQRT(height)`                                   |

## String Functions

Multiple string functions are supported out of the box from release-0.5.0 .

| Function                              | Description                                                                                                                                                                | Example                                                                                                         |
| ------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------- |
| **UPPER**(col)                        | convert string to upper case                                                                                                                                               | `UPPER(playerName)`                                                                                             |
| **LOWER**(col)                        | convert string to lower case                                                                                                                                               | `LOWER(playerName)`                                                                                             |
| **REVERSE**(col)                      | reverse the string                                                                                                                                                         | `REVERSE(playerName)`                                                                                           |
| **SUBSTR**(col, startIndex, endIndex) | <p>get substring of the input string from start to endIndex.<br>Index begins at 0.<br>Set endIndex to -1 to calculate till end of the string</p>                           | <p><code>SUBSTR(playerName, 1, -1)</code></p><p>\<code>\</code></p><p><code>SUBSTR(playerName, 1, 4)</code></p> |
| **CONCAT**(col1, col2, seperator)     | Concatenate two input strings using the seperator                                                                                                                          | `CONCAT(firstName, lastName, '-')`                                                                              |
| **TRIM**(col)                         | trim spaces from both side of the string                                                                                                                                   | `TRIM(playerName)`                                                                                              |
| **LTRIM**(col)                        | trim spaces from left side of the string                                                                                                                                   | `LTRIM(playerName)`                                                                                             |
| **RTRIM**(col)                        | trim spaces from right side of the string                                                                                                                                  | `RTRIM(playerName)`                                                                                             |
| **LENGTH**(col)                       | calculate length of the string                                                                                                                                             | `LENGTH(playerName)`                                                                                            |
| **STRPOS**(col, find, N)              | <p>find Nth instance of <code>find</code> string in input.<br>Returns 0 if input string is empty. Returns -1 if the Nth instance is not found or input string is null.</p> | `STRPOS(playerName, 'david', 1)`                                                                                |
| **STARTSWITH**(col, prefix)           | returns `true` if columns starts with prefix string.                                                                                                                       | `STARTSWITH(playerName, 'david')`                                                                               |
| **REPLACE**(col, find, substitute)    | replace all instances of `find` with `replace` in input                                                                                                                    | `REPLACE(playerName, 'david', 'henry')`                                                                         |
| **RPAD**(col, size, pad)              | string padded from the right side with `pad` to reach final `size`                                                                                                         | `RPAD(playerName, 20, 'foo')`                                                                                   |
| **LPAD**(col, size, pad)              | string padded from the left side with `pad` to reach final `size`                                                                                                          | `LPAD(playerName, 20, 'foo')`                                                                                   |
| **CODEPOINT**(col)                    | the Unicode codepoint of the first character of the string                                                                                                                 | `CODEPOINT(playerName)`                                                                                         |
| **CHR**(codepoint)                    | the character corresponding to the Unicode codepoint                                                                                                                       | `CHR(68)`                                                                                                       |

## DateTime Functions

Date time functions allow you to perform transformations on columns that contain timestamps or dates.

| Function                                                                                                 | Description                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               | Example                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      |
| -------------------------------------------------------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| <p><strong>TIMECONVERT</strong></p><p><strong>(</strong>col, fromUnit, toUnit<strong>)</strong></p>      | <p>Converts the value into another time unit. the column should be an epoch timestamp. Supported units are<br><code>DAYS HOURS MINUTES SECONDS MILLISECONDS MICROSECONDS NANOSECONDS</code></p>                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           | `TIMECONVERT(time, 'MILLISECONDS', 'SECONDS')`This expression converts the value of column `time` (taken to be in milliseconds) to the nearest seconds (*i.e.* the nearest seconds that is lower than the value of `date` column)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            |
| <p><strong>DATETIMECONVERT</strong></p><p>(columnName, inputFormat, outputFormat, outputGranularity)</p> | <p>Takes 4 arguments, converts the value into another date time format, and buckets time based on the given time granularity. Note that, for weeks/months/quarters/years, please use function: <strong>DateTrunc</strong>.</p><p>The format is expressed as <code>\<time size>:\<time unit>:\<time format>:\<pattern></code><br>where,</p><p><strong><code>time size</code></strong> - size of the time unit eg: 1, 10</p><p><strong><code>time unit</code></strong> - <code>DAYS HOURS MINUTES SECONDS MILLISECONDS MICROSECONDS NANOSECONDS</code></p><p><strong><code>time format</code></strong> - <code>EPOCH</code> or <code>SIMPLE\_DATE\_FORMAT</code></p><p><code>pattern</code> - this is defined in case of <code>SIMPLE\_DATE\_FORMAT</code> eg: <code>yyyy-MM-dd</code>. A specific timezone can be passed using tz(timezone). Timezone can be long or short string format timezone. e.g. <code>Asia/Kolkata</code> or <code>PDT</code></p><p><strong><code>granularity</code></strong> <strong>-</strong> specified in the format<code>\<time size>:\<time unit></code></p> | <ul><li><code>Date</code> from <code>hoursSinceEpoch</code> to <code>daysSinceEpoch</code> and bucket it to 1 day granularity<br><br><code>DATETIMECONVERT(Date, '1:HOURS:EPOCH', '1:DAYS:EPOCH', '1:DAYS')</code><br></li><li><code>Date</code> to 15 minutes granularity<br><br><code>DATETIMECONVERT(Date, '1:MILLISECONDS:EPOCH', '1:MILLISECONDS:EPOCH', '15:MINUTES')</code><br></li><li><code>Date</code> from <code>hoursSinceEpoch</code> to format <code>yyyyMdd</code> and bucket it to 1 days granularity<br><br><code>DATETIMECONVERT(Date, '1:HOURS:EPOCH', '1:DAYS:SIMPLE\_DATE\_FORMAT:yyyyMMdd', '1:DAYS')</code><br><br></li><li><code>Date</code> from milliseconds to format <code>yyyyMdd</code> in timezone PST<br><br><code>DATETIMECONVERT(Date, '1:MILLISECONDS:EPOCH', '1:DAYS:SIMPLE\_DATE\_FORMAT:yyyyMMdd tz(America/Los\_Angeles)', '1:DAYS')</code></li></ul> |
| **DATETRUNC**                                                                                            | <p>(Presto) SQL compatible date truncation, equivalent to the Presto function <a href="https://mode.com/blog/date-trunc-sql-timestamp-function-count-on">date\_trunc</a>.<br></p><p>Takes at least 3 and upto 5 arguments, converts the value into a specified output granularity seconds since UTC epoch that is bucketed on a unit in a specified timezone.</p>                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         | <p><code>DATETRUNC('week', time\_in\_seconds, 'SECONDS')</code> This expression converts the column <code>time\_in\_seconds</code>, which is a long containing seconds since UTC epoch truncated at <code>WEEK</code> (where a Week starts at Monday UTC midnight). The output is a long seconds since UTC epoch.<br></p><p><code>DATETRUNC('quarter', DIV(time\_milliseconds/1000), 'SECONDS', 'America/Los\_Angeles', 'HOURS')</code> This expression converts the expression <code>time\_in\_milliseconds/1000</code>into hours that are truncated to <code>QUARTER</code> at the Los Angeles time zone (where a Quarter begins on 1/1, 4/1, 7/1, 10/1 in Los Angeles timezone). The output is expressed as hours since UTC epoch (note that the output is not Los Angeles timezone)</p>                                                                                                  |
| **ToEpoch\<TIME\_UNIT>(timeInMillis)**                                                                   | Convert epoch milliseconds to epoch \<Time Unit>. Supported \<Time Unit>: `SECONDS/MINUTES/HOURS/DAYS`                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    | <p><code>ToEpochSeconds(tsInMillis):</code>Converts column <code>tsInMillis</code> value from epoch milliseconds to epoch seconds.</p><p><code>ToEpochDays(tsInMillis):</code>Converts column <code>tsInMillis</code> value from epoch milliseconds to epoch days.</p>                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       |
| **ToEpoch\<TIME\_UNIT>Rounded(timeInMillis, bucketSize)**                                                | Convert epoch milliseconds to epoch \<Time Unit>, round to nearest rounding bucket(Bucket size is defined in \<Time Unit>). Supported \<Time Unit>: `SECONDS/MINUTES/HOURS/DAYS`                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          | <p><code>ToEpochSecondsRound(tsInMillis, 10):</code>Converts column <code>tsInMillis</code> value from epoch milliseconds to epoch seconds and round to the 10-minute bucket value. E.g.<code>ToEpochSecondsRound(</code>1613472303000, 10) = 1613472300</p><p><code>ToEpochMinutesRound(tsInMillis, 1440):</code>Converts column <code>tsInMillis</code> value from epoch milliseconds to epoch Minutes, but round to 1-day bucket value. E.g.<code>ToEpochMinutesRound(</code>1613472303000, 1440) = 26890560</p>                                                                                                                                                                                                                                                                                                                                                                          |
| **ToEpoch\<TIME\_UNIT>Bucket(timeInMillis, bucketSize)**                                                 | Convert epoch milliseconds to epoch \<Time Unit>, and divided by bucket size(Bucket size is defined in \<Time Unit>). Supported \<Time Unit>: `SECONDS/MINUTES/HOURS/DAYS`                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                | <p><code>ToEpochSecondsBucket(tsInMillis, 10):</code>Converts column <code>tsInMillis</code> value from epoch milliseconds to epoch seconds then divide by 10 to get the 10 seconds since epoch value. E.g.</p><p><code>ToEpochSecondsBucket(</code>1613472303000, 10) = 161347230</p><p><code>ToEpochHoursBucket(tsInMillis, 24):</code>Converts column <code>tsInMillis</code> value from epoch milliseconds to epoch Hours, then divide by 24 to get 24 hours since epoch value.</p>                                                                                                                                                                                                                                                                                                                                                                                                      |
| **FromEpoch\<TIME\_UNIT>(timeIn\<Time\_UNIT>)**                                                          | Convert epoch \<Time Unit> to epoch milliseconds. Supported \<Time Unit>: `SECONDS/MINUTES/HOURS/DAYS`                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    | <p><code>FromEpochSeconds(tsInSeconds):</code>Converts column <code>tsInSeconds</code> value from epoch seconds to epoch milliseconds. E.g.</p><p><code>FromEpochSeconds(</code>1613472303) = 1613472303000</p>                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              |
| **FromEpoch\<TIME\_UNIT>Bucket(timeIn\<Time\_UNIT>, bucketSizeIn\<Time\_UNIT>)**                         | Convert epoch \<Bucket Size>\<Time Unit> to epoch milliseconds. E.g. 10 seconds since epoch or 5 minutes since Epoch. Supported \<Time Unit>: `SECONDS/MINUTES/HOURS/DAYS`                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                | <p><code>FromEpochSecondsBucket(tsInSeconds, 10):</code>Converts column <code>tsInSeconds</code> value from epoch 10-seconds to epoch milliseconds. E.g.</p><p><code>FromEpochSeconds(161347231)= 1613472310000</code></p>                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   |
| **ToDateTime(timeInMillis, pattern\[, timezoneId])**                                                     | Convert epoch millis value to DateTime string represented by pattern. Time zone will be set to UTC if `timezoneId` is not specified.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      | <p><code>ToDateTime(tsInMillis, 'yyyy-MM-dd')</code> converts tsInMillis value to date time pattern <code>yyyy-MM-dd</code></p><p><code>ToDateTime(tsInMillis, 'yyyy-MM-dd ZZZ', 'America/Los\_Angeles')</code> converts tsInMillis value to date time pattern <code>yyyy-MM-dd ZZZ</code> in America/Los\_Angeles time zone</p>                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |
| **FromDateTime(dateTimeString, pattern)**                                                                | Convert DateTime string represented by pattern to epoch millis.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           | `FromDateTime(dateTime, 'yyyy-MM-dd')`converts `dateTime` string value to millis epoch value                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 |
| **round(timeValue, bucketSize)**                                                                         | Round the given time value to nearest bucket start value.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 | `round(tsInSeconds, 60)` round seconds epoch value to the start value of the 60 seconds bucket it belongs to. E.g. `round(161347231, 60)= 161347200`                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         |
| **now()**                                                                                                | Return current time as epoch millis                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       | Typically used in predicate to filter on timestamp for recent data. E.g. filter data on recent 1 day(86400 seconds).`WHERE tsInMillis > now() - 86400000`                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    |
| **timezoneHour(timeZoneId)**                                                                             | Returns the hour of the time zone offset.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              |
| **timezoneMinute(timeZoneId)**                                                                           | Returns the minute of the time zone offset.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              |
| **year(tsInMillis)**                                                                                     | Returns the year from the given epoch millis in UTC timezone.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              |
| **year(tsInMillis, timeZoneId)**                                                                         | Returns the year from the given epoch millis and timezone id.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              |
| **yearOfWeek(tsInMillis)**                                                                               | Returns the year of the ISO week from the given epoch millis in UTC timezone. Alias `yow`is also supported.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              |
| **yearOfWeek(tsInMillis, timeZoneId)**                                                                   | Returns the year of the ISO week from the given epoch millis and timezone id. Alias `yow`is also supported.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              |
| **quarter(tsInMillis)**                                                                                  | Returns the quarter of the year from the given epoch millis in UTC timezone. The value ranges from 1 to 4.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              |
| **quarter(tsInMillis, timeZoneId)**                                                                      | Returns the quarter of the year from the given epoch millis and timezone id. The value ranges from 1 to 4.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              |
| **month(tsInMillis)**                                                                                    | Returns the month of the year from the given epoch millis in UTC timezone. The value ranges from 1 to 12.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              |
| **month(tsInMillis, timeZoneId)**                                                                        | Returns the month of the year from the given epoch millis and timezone id. The value ranges from 1 to 12.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              |
| **week(tsInMillis)**                                                                                     | Returns the ISO week of the year from the given epoch millis in UTC timezone. The value ranges from 1 to 53. Alias `weekOfYear` is also supported.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              |
| **week(tsInMillis, timeZoneId)**                                                                         | Returns the ISO week of the year from the given epoch millis and timezone id. The value ranges from 1 to 53. Alias `weekOfYear` is also supported.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              |
| **dayOfYear(tsInMillis)**                                                                                | Returns the day of the year from the given epoch millis in UTC timezone. The value ranges from 1 to 366. Alias `doy` is also supported.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              |
| **dayOfYear(tsInMillis, timeZoneId)**                                                                    | Returns the day of the year from the given epoch millis and timezone id. The value ranges from 1 to 366. Alias `doy` is also supported.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              |
| **day(tsInMillis)**                                                                                      | Returns the day of the month from the given epoch millis in UTC timezone. The value ranges from 1 to 31. Alias `dayOfMonth` is also supported.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              |
| **day(tsInMillis, timeZoneId)**                                                                          | Returns the day of the month from the given epoch millis and timezone id. The value ranges from 1 to 31. Alias `dayOfMonth` is also supported.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              |
| **dayOfWeek(tsInMillis)**                                                                                | Returns the day of the week from the given epoch millis in UTC timezone. The value ranges from 1(Monday) to 7(Sunday). Alias `dow` is also supported.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              |
| **dayOfWeek(tsInMillis, timeZoneId)**                                                                    | Returns the day of the week from the given epoch millis and timezone id. The value ranges from 1(Monday) to 7(Sunday). Alias `dow` is also supported.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              |
| **hour(tsInMillis)**                                                                                     | Returns the hour of the day from the given epoch millis in UTC timezone. The value ranges from 0 to 23.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              |
| **hour(tsInMillis, timeZoneId)**                                                                         | Returns the hour of the day from the given epoch millis and timezone id. The value ranges from 0 to 23.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              |
| **minute(tsInMillis)**                                                                                   | Returns the minute of the hour from the given epoch millis in UTC timezone. The value ranges from 0 to 59.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              |
| **minute(tsInMillis, timeZoneId)**                                                                       | Returns the minute of the hour from the given epoch millis and timezone id. The value ranges from 0 to 59.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              |
| **second(tsInMillis)**                                                                                   | Returns the second of the minute from the given epoch millis in UTC timezone. The value ranges from 0 to 59.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              |
| **second(tsInMillis, timeZoneId)**                                                                       | Returns the second of the minute from the given epoch millis and timezone id. The value ranges from 0 to 59.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              |
| **millisecond(tsInMillis)**                                                                              | Returns the millisecond of the second from the given epoch millis in UTC timezone. The value ranges from 0 to 999.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              |
| **millisecond(tsInMillis, timeZoneId)**                                                                  | Returns the millisecond of the second from the given epoch millis and timezone id. The value ranges from 0 to 999.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              |

## JSON Functions

|                                                                                                                                           |           |                                                                                                                                                                                                                           |
| ----------------------------------------------------------------------------------------------------------------------------------------- | --------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **Function**                                                                                                                              | Type      | **Description**                                                                                                                                                                                                           |
| <p><strong>JSONEXTRACTSCALAR</strong></p><p><strong>(</strong>jsonField, 'jsonPath', 'resultsType', \[defaultValue]<strong>)</strong></p> | Transform | <p>Evaluates the <code>'jsonPath'</code> on <code>jsonField,</code></p><p>returns the result as the type <code>'resultsType'</code>, use optional <code>defaultValue</code>for null or parsing error.</p>                 |
| <p><strong>JSONEXTRACTKEY</strong></p><p><strong>(</strong>jsonField, 'jsonPath'<strong>)</strong></p>                                    | Transform | <p>Extracts all matched JSON field keys based on <code>'jsonPath'</code></p><p>Into a<code>STRING\_ARRAY.</code></p>                                                                                                      |
| **TOJSONMAPSTR**(map)                                                                                                                     | Scalar    | Convert map to JSON String                                                                                                                                                                                                |
| **JSONFORMAT**(object)                                                                                                                    | Scalar    | Convert object to JSON String                                                                                                                                                                                             |
| **JSONPATH**(jsonField, 'jsonPath')                                                                                                       | Scalar    | Extracts the object value from `jsonField` based on `'jsonPath'`, the result type is inferred based on JSON value. **Cannot be used in query because data type is not specified.**                                        |
| **JSONPATHLONG**(jsonField, 'jsonPath', \[defaultValue])                                                                                  | Scalar    | Extracts the **Long** value from `jsonField` based on `'jsonPath'`, use optional `defaultValue`for null or parsing error.                                                                                                 |
| **JSONPATHDOUBLE**(jsonField, 'jsonPath', \[defaultValue])                                                                                | Scalar    | Extracts the **Double** value from `jsonField` based on `'jsonPath'`, use optional `defaultValue`for null or parsing error.                                                                                               |
| **JSONPATHSTRING**(jsonField, 'jsonPath', \[defaultValue])                                                                                | Scalar    | Extracts the **String** value from `jsonField` based on `'jsonPath'`, use optional `defaultValue`for null or parsing error.                                                                                               |
| **JSONPATHARRAY**(jsonField, 'jsonPath')                                                                                                  | Scalar    | Extracts an array from `jsonField` based on `'jsonPath'`, the result type is inferred based on JSON value. **Cannot be used in query because data type is not specified.**                                                |
| **JSONPATHARRAYDEFAULTEMPTY**(jsonField, 'jsonPath')                                                                                      | Scalar    | Extracts an array from `jsonField` based on `'jsonPath'`, the result type is inferred based on JSON value. Returns empty array for null or parsing error. **Cannot be used in query because data type is not specified.** |

**Usage**

|                  |                                                                                                                                                                                                                                                        |
| ---------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| **`Arguments`**  | **Description**                                                                                                                                                                                                                                        |
| `jsonField`      | An **Identifier**/**Expression** contains JSON documents.                                                                                                                                                                                              |
| `'jsonPath'`     | Follows [JsonPath Syntax](https://goessner.net/articles/JsonPath/) to read values from JSON documents.                                                                                                                                                 |
| `'results_type'` | <p>One of the Pinot supported data types:<strong><code>INT, LONG, FLOAT, DOUBLE, BOOLEAN, TIMESTAMP, STRING,</code></strong></p><p><strong><code>INT\_ARRAY, LONG\_ARRAY, FLOAT\_ARRAY, DOUBLE\_ARRAY, STRING\_ARRAY</code></strong><code>.</code></p> |

{% hint style="warning" %}
**`'jsonPath'`**`and`` `**`'results_type'`**&#x61;re l**iterals.** Pinot uses single quotes to distinguish them from **identifiers**.

e.g.

* `JSONEXTRACTSCALAR(profile_json_str, '$.name', 'STRING')` is v**alid**.
* `JSONEXTRACTSCALAR(profile_json_str, "$.name", "STRING")` is i**nvalid**.
  {% endhint %}

{% hint style="warning" %}
**Transform** functions can only be used in Pinot SQL. **Scalar** functions can be used for column transformation in table ingestion configs.
{% endhint %}

**Examples**

The examples below are based on these 3 sample profile JSON documents:

```
{
  "name" : "Bob",
  "age" : 37,
  "gender": "male",
  "location": "San Francisco"
},{
  "name" : "Alice",
  "age" : 25,
  "gender": "female",
  "location": "New York"
},{
  "name" : "Mia",
  "age" : 18,
  "gender": "female",
  "location": "Chicago"
}
```

Query 1: Extract string values from the field 'name'

```
SELECT
    JSONEXTRACTSCALAR(profile_json_str, '$.name', 'STRING')
FROM
    myTable
```

Results are

```
["Bob", "Alice", "Mia"]
```

Query 2: Extract integer values from the field 'age'

```
SELECT
    JSONEXTRACTSCALAR(profile_json_str, '$.age', 'INT')
FROM
    myTable
```

Results are

```
[37, 25, 18]
```

Query 3: Extract Bob's age from the JSON profile.

```
SELECT
    JSONEXTRACTSCALAR(myMapStr,'$.age','INT')
FROM
    myTable
WHERE
    JSONEXTRACTSCALAR(myMapStr,'$.name','STRING') = 'Bob'
```

Results are

```
[37]
```

Query 4: Extract all field keys of JSON profile.

```
SELECT
    JSONEXTRACTKEY(myMapStr,'$.*')
FROM
    myTable
```

Results are

```
["name", "age", "gender", "location"]
```

Another **example** of extracting JSON fields from below JSON record:

```
{
        "name": "Pete",
        "age": 24,
        "subjects": [{
                        "name": "maths",
                        "homework_grades": [80, 85, 90, 95, 100],
                        "grade": "A",
                        "score": 90
                },
                {
                        "name": "english",
                        "homework_grades": [60, 65, 70, 85, 90],
                        "grade": "B",
                        "score": 70
                }
        ]
}
```

Extract JSON fields:

| Expression                                                        | Value                  |
| ----------------------------------------------------------------- | ---------------------- |
| `JSONPATH(myJsonRecord, '$.name')`                                | `"Pete"`               |
| `JSONPATH(myJsonRecord, '$.age')`                                 | `24`                   |
| `JSONPATHSTRING(myJsonRecord, '$.age')`                           | `"24"`                 |
| `JSONPATHARRAY(myJsonRecord, '$.subjects[*].name')`               | `["maths", "english"]` |
| `JSONPATHARRAY(myJsonRecord, '$.subjects[*].score')`              | `[90, 70]`             |
| `JSONPATHARRAY(myJsonRecord, '$.subjects[*].homework_grades[1]')` | `[85, 65]`             |

## Binary Functions

| Function             | Description                                                        | Example           |
| -------------------- | ------------------------------------------------------------------ | ----------------- |
| **SHA(bytesCol)**    | Return SHA-1 digest of binary column(`bytes` type) as hex string   | `SHA(rawData)`    |
| **SHA256(bytesCol)** | Return SHA-256 digest of binary column(`bytes` type) as hex string | `SHA256(rawData)` |
| **SHA512(bytesCol)** | Return SHA-512 digest of binary column(`bytes` type) as hex string | `SHA512(rawData)` |
| **MD5(bytesCol)**    | Return MD5 digest of binary column(`bytes` type) as hex string     | `MD5(rawData)`    |

## Multi-value Column Functions

All of the functions mentioned till now only support single value columns. You can use the following functions to do operations on multi-value columns.

| Function        | Description                                                                                                                                                                                                                                                                                                                                                                     | Example                                      |
| --------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -------------------------------------------- |
| **ARRAYLENGTH** | Returns the length of a multi-value column                                                                                                                                                                                                                                                                                                                                      |                                              |
| **MAP\_VALUE**  | Select the value for a key from Map stored in Pinot.                                                                                                                                                                                                                                                                                                                            | `MAP_VALUE(mapColumn, 'myKey', valueColumn)` |
| **VALUEIN**     | Takes at least 2 arguments, where the first argument is a multi-valued column, and the following arguments are constant values. The transform function will filter the value from the multi-valued column with the given constant values. The `VALUEIN` transform function is especially useful when the same multi-valued column is both filtering column and grouping column. | `VALUEIN(mvColumn, 3, 5, 15)`                |

## Advanced Queries

### Geospatial Queries

Pinot supports Geospatial queries on columns containing text-based geographies. For more details on the queries and how to enable them, see [Geospatial](https://docs.pinot.apache.org/release-0.9.0/basics/indexing/geospatial-support).

### Text Queries

Pinot supports pattern matching on text-based columns. Only the columns mentioned as text columns in table config can be queried using this method. For more details on how to enable pattern matching, see [Text search support](https://docs.pinot.apache.org/release-0.9.0/basics/indexing/text-search-support).
