# Supported Transformations

## Math Functions

| Function                      | Description                                   | Example                                          |
| ----------------------------- | --------------------------------------------- | ------------------------------------------------ |
| **ADD(col1, col2, col3...)**  | Sum of at least two values                    | `ADD(score_maths, score_science, score_history)` |
| **SUB(col1, col2)**           | Difference between two values                 | `SUB(total_score, score_science)`                |
| **MULT(col1, col2, col3...)** | Product of at least two values                | `MUL(score_maths, score_science, score_history)` |
| **DIV(col1, col2)**           | Quotient of two values                        | `SUB(total_score, total_subjects)`               |
| **MOD(col1, col2)**           | Modulo of two values                          | `MOD(total_score, total_subjects)`               |
| **ABS(col1)**                 | Absolute of a value                           | `ABS(score)`                                     |
| **CEIL(col1)**                | Rounded up to the nearest integer.            | `CEIL(percentage)`                               |
| **FLOOR(col1)**               | Rounded down to the nearest integer.          | `FLOOR(percentage)`                              |
| **EXP(col1)**                 | Euler’s number(e) raised to the power of col. | `EXP(age)`                                       |
| **LN(col1)**                  | Natural log of value i.e. ln(col1)            | `LN(age)`                                        |
| **SQRT(col1)**                | Square root of a value                        | `SQRT(height)`                                   |

## String Functions

Multiple string functions are supported out of the box from release-0.5.0 .

| Function                              | Description                                                                                                                                                                | Example                                                                                                         |
| ------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------- |
| **UPPER**(col)                        | convert string to upper case                                                                                                                                               | `UPPER(playerName)`                                                                                             |
| **LOWER**(col)                        | convert string to lower case                                                                                                                                               | `LOWER(playerName)`                                                                                             |
| **REVERSE**(col)                      | reverse the string                                                                                                                                                         | `REVERSE(playerName)`                                                                                           |
| **SUBSTR**(col, startIndex, endIndex) | <p>get substring of the input string from start to endIndex.<br>Index begins at 0.<br>Set endIndex to -1 to calculate till end of the string</p>                           | <p><code>SUBSTR(playerName, 1, -1)</code></p><p>\<code>\</code></p><p><code>SUBSTR(playerName, 1, 4)</code></p> |
| **CONCAT**(col1, col2, seperator)     | Concatenate two input strings using the seperator                                                                                                                          | `CONCAT(firstName, lastName, '-')`                                                                              |
| **TRIM**(col)                         | trim spaces from both side of the string                                                                                                                                   | `TRIM(playerName)`                                                                                              |
| **LTRIM**(col)                        | trim spaces from left side of the string                                                                                                                                   | `LTRIM(playerName)`                                                                                             |
| **RTRIM**(col)                        | trim spaces from right side of the string                                                                                                                                  | `RTRIM(playerName)`                                                                                             |
| **LENGTH**(col)                       | calculate length of the string                                                                                                                                             | `LENGTH(playerName)`                                                                                            |
| **STRPOS**(col, find, N)              | <p>find Nth instance of <code>find</code> string in input.<br>Returns 0 if input string is empty. Returns -1 if the Nth instance is not found or input string is null.</p> | `STRPOS(playerName, 'david', 1)`                                                                                |
| **STARTSWITH**(col, prefix)           | returns `true` if columns starts with prefix string.                                                                                                                       | `STARTSWITH(playerName, 'david')`                                                                               |
| **REPLACE**(col, find, substitute)    | replace all instances of `find` with `replace` in input                                                                                                                    | `REPLACE(playerName, 'david', 'henry')`                                                                         |
| **RPAD**(col, size, pad)              | string padded from the right side with `pad` to reach final `size`                                                                                                         | `RPAD(playerName, 20, 'foo')`                                                                                   |
| **LPAD**(col, size, pad)              | string padded from the left side with `pad` to reach final `size`                                                                                                          | `LPAD(playerName, 20, 'foo')`                                                                                   |
| **CODEPOINT**(col)                    | the Unicode codepoint of the first character of the string                                                                                                                 | `CODEPOINT(playerName)`                                                                                         |
| **CHR**(codepoint)                    | the character corresponding to the Unicode codepoint                                                                                                                       | `CHR(68)`                                                                                                       |

## DateTime Functions

Date time functions allow you to perform transformations on columns that contain timestamps or dates.

| Function                                                                                                 | Description                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               | Example                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      |
| -------------------------------------------------------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| <p><strong>TIMECONVERT</strong></p><p><strong>(</strong>col, fromUnit, toUnit<strong>)</strong></p>      | <p>Converts the value into another time unit. the column should be an epoch timestamp. Supported units are<br><code>DAYS HOURS MINUTES SECONDS MILLISECONDS MICROSECONDS NANOSECONDS</code></p>                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           | `TIMECONVERT(time, 'MILLISECONDS', 'SECONDS')`This expression converts the value of column `time` (taken to be in milliseconds) to the nearest seconds (*i.e.* the nearest seconds that is lower than the value of `date` column)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            |
| <p><strong>DATETIMECONVERT</strong></p><p>(columnName, inputFormat, outputFormat, outputGranularity)</p> | <p>Takes 4 arguments, converts the value into another date time format, and buckets time based on the given time granularity. Note that, for weeks/months/quarters/years, please use function: <strong>DateTrunc</strong>.</p><p>The format is expressed as <code>\<time size>:\<time unit>:\<time format>:\<pattern></code><br>where,</p><p><strong><code>time size</code></strong> - size of the time unit eg: 1, 10</p><p><strong><code>time unit</code></strong> - <code>DAYS HOURS MINUTES SECONDS MILLISECONDS MICROSECONDS NANOSECONDS</code></p><p><strong><code>time format</code></strong> - <code>EPOCH</code> or <code>SIMPLE\_DATE\_FORMAT</code></p><p><code>pattern</code> - this is defined in case of <code>SIMPLE\_DATE\_FORMAT</code> eg: <code>yyyy-MM-dd</code>. A specific timezone can be passed using tz(timezone). Timezone can be long or short string format timezone. e.g. <code>Asia/Kolkata</code> or <code>PDT</code></p><p><strong><code>granularity</code></strong> <strong>-</strong> specified in the format<code>\<time size>:\<time unit></code></p> | <ul><li><code>Date</code> from <code>hoursSinceEpoch</code> to <code>daysSinceEpoch</code> and bucket it to 1 day granularity<br><br><code>DATETIMECONVERT(Date, '1:HOURS:EPOCH', '1:DAYS:EPOCH', '1:DAYS')</code><br></li><li><code>Date</code> to 15 minutes granularity<br><br><code>DATETIMECONVERT(Date, '1:MILLISECONDS:EPOCH', '1:MILLISECONDS:EPOCH', '15:MINUTES')</code><br></li><li><code>Date</code> from <code>hoursSinceEpoch</code> to format <code>yyyyMdd</code> and bucket it to 1 days granularity<br><br><code>DATETIMECONVERT(Date, '1:HOURS:EPOCH', '1:DAYS:SIMPLE\_DATE\_FORMAT:yyyyMMdd', '1:DAYS')</code><br><br></li><li><code>Date</code> from milliseconds to format <code>yyyyMdd</code> in timezone PST<br><br><code>DATETIMECONVERT(Date, '1:MILLISECONDS:EPOCH', '1:DAYS:SIMPLE\_DATE\_FORMAT:yyyyMMdd tz(America/Los\_Angeles)', '1:DAYS')</code></li></ul> |
| **DATETRUNC**                                                                                            | <p>(Presto) SQL compatible date truncation, equivalent to the Presto function <a href="https://mode.com/blog/date-trunc-sql-timestamp-function-count-on">date\_trunc</a>.<br></p><p>Takes at least 3 and upto 5 arguments, converts the value into a specified output granularity seconds since UTC epoch that is bucketed on a unit in a specified timezone.</p>                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         | <p><code>DATETRUNC('week', time\_in\_seconds, 'SECONDS')</code> This expression converts the column <code>time\_in\_seconds</code>, which is a long containing seconds since UTC epoch truncated at <code>WEEK</code> (where a Week starts at Monday UTC midnight). The output is a long seconds since UTC epoch.<br></p><p><code>DATETRUNC('quarter', DIV(time\_milliseconds/1000), 'SECONDS', 'America/Los\_Angeles', 'HOURS')</code> This expression converts the expression <code>time\_in\_milliseconds/1000</code>into hours that are truncated to <code>QUARTER</code> at the Los Angeles time zone (where a Quarter begins on 1/1, 4/1, 7/1, 10/1 in Los Angeles timezone). The output is expressed as hours since UTC epoch (note that the output is not Los Angeles timezone)</p>                                                                                                  |
| **ToEpoch\<TIME\_UNIT>(timeInMillis)**                                                                   | Convert epoch milliseconds to epoch \<Time Unit>. Supported \<Time Unit>: `SECONDS/MINUTES/HOURS/DAYS`                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    | <p><code>ToEpochSeconds(tsInMillis):</code>Converts column <code>tsInMillis</code> value from epoch milliseconds to epoch seconds.</p><p><code>ToEpochDays(tsInMillis):</code>Converts column <code>tsInMillis</code> value from epoch milliseconds to epoch days.</p>                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       |
| **ToEpoch\<TIME\_UNIT>Rounded(timeInMillis, bucketSize)**                                                | Convert epoch milliseconds to epoch \<Time Unit>, round to nearest rounding bucket(Bucket size is defined in \<Time Unit>). Supported \<Time Unit>: `SECONDS/MINUTES/HOURS/DAYS`                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          | <p><code>ToEpochSecondsRound(tsInMillis, 10):</code>Converts column <code>tsInMillis</code> value from epoch milliseconds to epoch seconds and round to the 10-minute bucket value. E.g.<code>ToEpochSecondsRound(</code>1613472303000, 10) = 1613472300</p><p><code>ToEpochMinutesRound(tsInMillis, 1440):</code>Converts column <code>tsInMillis</code> value from epoch milliseconds to epoch Minutes, but round to 1-day bucket value. E.g.<code>ToEpochMinutesRound(</code>1613472303000, 1440) = 26890560</p>                                                                                                                                                                                                                                                                                                                                                                          |
| **ToEpoch\<TIME\_UNIT>Bucket(timeInMillis, bucketSize)**                                                 | Convert epoch milliseconds to epoch \<Time Unit>, and divided by bucket size(Bucket size is defined in \<Time Unit>). Supported \<Time Unit>: `SECONDS/MINUTES/HOURS/DAYS`                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                | <p><code>ToEpochSecondsBucket(tsInMillis, 10):</code>Converts column <code>tsInMillis</code> value from epoch milliseconds to epoch seconds then divide by 10 to get the 10 seconds since epoch value. E.g.</p><p><code>ToEpochSecondsBucket(</code>1613472303000, 10) = 161347230</p><p><code>ToEpochHoursBucket(tsInMillis, 24):</code>Converts column <code>tsInMillis</code> value from epoch milliseconds to epoch Hours, then divide by 24 to get 24 hours since epoch value.</p>                                                                                                                                                                                                                                                                                                                                                                                                      |
| **FromEpoch\<TIME\_UNIT>(timeIn\<Time\_UNIT>)**                                                          | Convert epoch \<Time Unit> to epoch milliseconds. Supported \<Time Unit>: `SECONDS/MINUTES/HOURS/DAYS`                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    | <p><code>FromEpochSeconds(tsInSeconds):</code>Converts column <code>tsInSeconds</code> value from epoch seconds to epoch milliseconds. E.g.</p><p><code>FromEpochSeconds(</code>1613472303) = 1613472303000</p>                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              |
| **FromEpoch\<TIME\_UNIT>Bucket(timeIn\<Time\_UNIT>, bucketSizeIn\<Time\_UNIT>)**                         | Convert epoch \<Bucket Size>\<Time Unit> to epoch milliseconds. E.g. 10 seconds since epoch or 5 minutes since Epoch. Supported \<Time Unit>: `SECONDS/MINUTES/HOURS/DAYS`                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                | <p><code>FromEpochSecondsBucket(tsInSeconds, 10):</code>Converts column <code>tsInSeconds</code> value from epoch 10-seconds to epoch milliseconds. E.g.</p><p><code>FromEpochSeconds(161347231)= 1613472310000</code></p>                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   |
| **ToDateTime(timeInMillis, pattern\[, timezoneId])**                                                     | Convert epoch millis value to DateTime string represented by pattern. Time zone will be set to UTC if `timezoneId` is not specified.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      | <p><code>ToDateTime(tsInMillis, 'yyyy-MM-dd')</code> converts tsInMillis value to date time pattern <code>yyyy-MM-dd</code></p><p><code>ToDateTime(tsInMillis, 'yyyy-MM-dd ZZZ', 'America/Los\_Angeles')</code> converts tsInMillis value to date time pattern <code>yyyy-MM-dd ZZZ</code> in America/Los\_Angeles time zone</p>                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |
| **FromDateTime(dateTimeString, pattern)**                                                                | Convert DateTime string represented by pattern to epoch millis.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           | `FromDateTime(dateTime, 'yyyy-MM-dd')`converts `dateTime` string value to millis epoch value                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 |
| **round(timeValue, bucketSize)**                                                                         | Round the given time value to nearest bucket start value.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 | `round(tsInSeconds, 60)` round seconds epoch value to the start value of the 60 seconds bucket it belongs to. E.g. `round(161347231, 60)= 161347200`                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         |
| **now()**                                                                                                | Return current time as epoch millis                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       | Typically used in predicate to filter on timestamp for recent data. E.g. filter data on recent 1 day(86400 seconds).`WHERE tsInMillis > now() - 86400000`                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    |
| **timezoneHour(timeZoneId)**                                                                             | Returns the hour of the time zone offset.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              |
| **timezoneMinute(timeZoneId)**                                                                           | Returns the minute of the time zone offset.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              |
| **year(tsInMillis)**                                                                                     | Returns the year from the given epoch millis in UTC timezone.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              |
| **year(tsInMillis, timeZoneId)**                                                                         | Returns the year from the given epoch millis and timezone id.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              |
| **yearOfWeek(tsInMillis)**                                                                               | Returns the year of the ISO week from the given epoch millis in UTC timezone. Alias `yow`is also supported.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              |
| **yearOfWeek(tsInMillis, timeZoneId)**                                                                   | Returns the year of the ISO week from the given epoch millis and timezone id. Alias `yow`is also supported.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              |
| **quarter(tsInMillis)**                                                                                  | Returns the quarter of the year from the given epoch millis in UTC timezone. The value ranges from 1 to 4.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              |
| **quarter(tsInMillis, timeZoneId)**                                                                      | Returns the quarter of the year from the given epoch millis and timezone id. The value ranges from 1 to 4.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              |
| **month(tsInMillis)**                                                                                    | Returns the month of the year from the given epoch millis in UTC timezone. The value ranges from 1 to 12.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              |
| **month(tsInMillis, timeZoneId)**                                                                        | Returns the month of the year from the given epoch millis and timezone id. The value ranges from 1 to 12.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              |
| **week(tsInMillis)**                                                                                     | Returns the ISO week of the year from the given epoch millis in UTC timezone. The value ranges from 1 to 53. Alias `weekOfYear` is also supported.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              |
| **week(tsInMillis, timeZoneId)**                                                                         | Returns the ISO week of the year from the given epoch millis and timezone id. The value ranges from 1 to 53. Alias `weekOfYear` is also supported.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              |
| **dayOfYear(tsInMillis)**                                                                                | Returns the day of the year from the given epoch millis in UTC timezone. The value ranges from 1 to 366. Alias `doy` is also supported.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              |
| **dayOfYear(tsInMillis, timeZoneId)**                                                                    | Returns the day of the year from the given epoch millis and timezone id. The value ranges from 1 to 366. Alias `doy` is also supported.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              |
| **day(tsInMillis)**                                                                                      | Returns the day of the month from the given epoch millis in UTC timezone. The value ranges from 1 to 31. Alias `dayOfMonth` is also supported.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              |
| **day(tsInMillis, timeZoneId)**                                                                          | Returns the day of the month from the given epoch millis and timezone id. The value ranges from 1 to 31. Alias `dayOfMonth` is also supported.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              |
| **dayOfWeek(tsInMillis)**                                                                                | Returns the day of the week from the given epoch millis in UTC timezone. The value ranges from 1(Monday) to 7(Sunday). Alias `dow` is also supported.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              |
| **dayOfWeek(tsInMillis, timeZoneId)**                                                                    | Returns the day of the week from the given epoch millis and timezone id. The value ranges from 1(Monday) to 7(Sunday). Alias `dow` is also supported.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              |
| **hour(tsInMillis)**                                                                                     | Returns the hour of the day from the given epoch millis in UTC timezone. The value ranges from 0 to 23.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              |
| **hour(tsInMillis, timeZoneId)**                                                                         | Returns the hour of the day from the given epoch millis and timezone id. The value ranges from 0 to 23.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              |
| **minute(tsInMillis)**                                                                                   | Returns the minute of the hour from the given epoch millis in UTC timezone. The value ranges from 0 to 59.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              |
| **minute(tsInMillis, timeZoneId)**                                                                       | Returns the minute of the hour from the given epoch millis and timezone id. The value ranges from 0 to 59.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              |
| **second(tsInMillis)**                                                                                   | Returns the second of the minute from the given epoch millis in UTC timezone. The value ranges from 0 to 59.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              |
| **second(tsInMillis, timeZoneId)**                                                                       | Returns the second of the minute from the given epoch millis and timezone id. The value ranges from 0 to 59.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              |
| **millisecond(tsInMillis)**                                                                              | Returns the millisecond of the second from the given epoch millis in UTC timezone. The value ranges from 0 to 999.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              |
| **millisecond(tsInMillis, timeZoneId)**                                                                  | Returns the millisecond of the second from the given epoch millis and timezone id. The value ranges from 0 to 999.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              |

## JSON Functions

|                                                                                                                                           |           |                                                                                                                                                                                                                           |
| ----------------------------------------------------------------------------------------------------------------------------------------- | --------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **Function**                                                                                                                              | Type      | **Description**                                                                                                                                                                                                           |
| <p><strong>JSONEXTRACTSCALAR</strong></p><p><strong>(</strong>jsonField, 'jsonPath', 'resultsType', \[defaultValue]<strong>)</strong></p> | Transform | <p>Evaluates the <code>'jsonPath'</code> on <code>jsonField,</code></p><p>returns the result as the type <code>'resultsType'</code>, use optional <code>defaultValue</code>for null or parsing error.</p>                 |
| <p><strong>JSONEXTRACTKEY</strong></p><p><strong>(</strong>jsonField, 'jsonPath'<strong>)</strong></p>                                    | Transform | <p>Extracts all matched JSON field keys based on <code>'jsonPath'</code></p><p>Into a<code>STRING\_ARRAY.</code></p>                                                                                                      |
| **TOJSONMAPSTR**(map)                                                                                                                     | Scalar    | Convert map to JSON String                                                                                                                                                                                                |
| **JSONFORMAT**(object)                                                                                                                    | Scalar    | Convert object to JSON String                                                                                                                                                                                             |
| **JSONPATH**(jsonField, 'jsonPath')                                                                                                       | Scalar    | Extracts the object value from `jsonField` based on `'jsonPath'`, the result type is inferred based on JSON value. **Cannot be used in query because data type is not specified.**                                        |
| **JSONPATHLONG**(jsonField, 'jsonPath', \[defaultValue])                                                                                  | Scalar    | Extracts the **Long** value from `jsonField` based on `'jsonPath'`, use optional `defaultValue`for null or parsing error.                                                                                                 |
| **JSONPATHDOUBLE**(jsonField, 'jsonPath', \[defaultValue])                                                                                | Scalar    | Extracts the **Double** value from `jsonField` based on `'jsonPath'`, use optional `defaultValue`for null or parsing error.                                                                                               |
| **JSONPATHSTRING**(jsonField, 'jsonPath', \[defaultValue])                                                                                | Scalar    | Extracts the **String** value from `jsonField` based on `'jsonPath'`, use optional `defaultValue`for null or parsing error.                                                                                               |
| **JSONPATHARRAY**(jsonField, 'jsonPath')                                                                                                  | Scalar    | Extracts an array from `jsonField` based on `'jsonPath'`, the result type is inferred based on JSON value. **Cannot be used in query because data type is not specified.**                                                |
| **JSONPATHARRAYDEFAULTEMPTY**(jsonField, 'jsonPath')                                                                                      | Scalar    | Extracts an array from `jsonField` based on `'jsonPath'`, the result type is inferred based on JSON value. Returns empty array for null or parsing error. **Cannot be used in query because data type is not specified.** |

**Usage**

|                  |                                                                                                                                                                                                                                                        |
| ---------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| **`Arguments`**  | **Description**                                                                                                                                                                                                                                        |
| `jsonField`      | An **Identifier**/**Expression** contains JSON documents.                                                                                                                                                                                              |
| `'jsonPath'`     | Follows [JsonPath Syntax](https://goessner.net/articles/JsonPath/) to read values from JSON documents.                                                                                                                                                 |
| `'results_type'` | <p>One of the Pinot supported data types:<strong><code>INT, LONG, FLOAT, DOUBLE, BOOLEAN, TIMESTAMP, STRING,</code></strong></p><p><strong><code>INT\_ARRAY, LONG\_ARRAY, FLOAT\_ARRAY, DOUBLE\_ARRAY, STRING\_ARRAY</code></strong><code>.</code></p> |

{% hint style="warning" %}
**`'jsonPath'`**`and`` `**`'results_type'`**&#x61;re l**iterals.** Pinot uses single quotes to distinguish them from **identifiers**.

e.g.

* `JSONEXTRACTSCALAR(profile_json_str, '$.name', 'STRING')` is v**alid**.
* `JSONEXTRACTSCALAR(profile_json_str, "$.name", "STRING")` is i**nvalid**.
  {% endhint %}

{% hint style="warning" %}
**Transform** functions can only be used in Pinot SQL. **Scalar** functions can be used for column transformation in table ingestion configs.
{% endhint %}

**Examples**

The examples below are based on these 3 sample profile JSON documents:

```
{
  "name" : "Bob",
  "age" : 37,
  "gender": "male",
  "location": "San Francisco"
},{
  "name" : "Alice",
  "age" : 25,
  "gender": "female",
  "location": "New York"
},{
  "name" : "Mia",
  "age" : 18,
  "gender": "female",
  "location": "Chicago"
}
```

Query 1: Extract string values from the field 'name'

```
SELECT
    JSONEXTRACTSCALAR(profile_json_str, '$.name', 'STRING')
FROM
    myTable
```

Results are

```
["Bob", "Alice", "Mia"]
```

Query 2: Extract integer values from the field 'age'

```
SELECT
    JSONEXTRACTSCALAR(profile_json_str, '$.age', 'INT')
FROM
    myTable
```

Results are

```
[37, 25, 18]
```

Query 3: Extract Bob's age from the JSON profile.

```
SELECT
    JSONEXTRACTSCALAR(myMapStr,'$.age','INT')
FROM
    myTable
WHERE
    JSONEXTRACTSCALAR(myMapStr,'$.name','STRING') = 'Bob'
```

Results are

```
[37]
```

Query 4: Extract all field keys of JSON profile.

```
SELECT
    JSONEXTRACTKEY(myMapStr,'$.*')
FROM
    myTable
```

Results are

```
["name", "age", "gender", "location"]
```

Another **example** of extracting JSON fields from below JSON record:

```
{
        "name": "Pete",
        "age": 24,
        "subjects": [{
                        "name": "maths",
                        "homework_grades": [80, 85, 90, 95, 100],
                        "grade": "A",
                        "score": 90
                },
                {
                        "name": "english",
                        "homework_grades": [60, 65, 70, 85, 90],
                        "grade": "B",
                        "score": 70
                }
        ]
}
```

Extract JSON fields:

| Expression                                                        | Value                  |
| ----------------------------------------------------------------- | ---------------------- |
| `JSONPATH(myJsonRecord, '$.name')`                                | `"Pete"`               |
| `JSONPATH(myJsonRecord, '$.age')`                                 | `24`                   |
| `JSONPATHSTRING(myJsonRecord, '$.age')`                           | `"24"`                 |
| `JSONPATHARRAY(myJsonRecord, '$.subjects[*].name')`               | `["maths", "english"]` |
| `JSONPATHARRAY(myJsonRecord, '$.subjects[*].score')`              | `[90, 70]`             |
| `JSONPATHARRAY(myJsonRecord, '$.subjects[*].homework_grades[1]')` | `[85, 65]`             |

## Binary Functions

| Function             | Description                                                        | Example           |
| -------------------- | ------------------------------------------------------------------ | ----------------- |
| **SHA(bytesCol)**    | Return SHA-1 digest of binary column(`bytes` type) as hex string   | `SHA(rawData)`    |
| **SHA256(bytesCol)** | Return SHA-256 digest of binary column(`bytes` type) as hex string | `SHA256(rawData)` |
| **SHA512(bytesCol)** | Return SHA-512 digest of binary column(`bytes` type) as hex string | `SHA512(rawData)` |
| **MD5(bytesCol)**    | Return MD5 digest of binary column(`bytes` type) as hex string     | `MD5(rawData)`    |

## Multi-value Column Functions

All of the functions mentioned till now only support single value columns. You can use the following functions to do operations on multi-value columns.

| Function        | Description                                                                                                                                                                                                                                                                                                                                                                     | Example                                      |
| --------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -------------------------------------------- |
| **ARRAYLENGTH** | Returns the length of a multi-value column                                                                                                                                                                                                                                                                                                                                      |                                              |
| **MAP\_VALUE**  | Select the value for a key from Map stored in Pinot.                                                                                                                                                                                                                                                                                                                            | `MAP_VALUE(mapColumn, 'myKey', valueColumn)` |
| **VALUEIN**     | Takes at least 2 arguments, where the first argument is a multi-valued column, and the following arguments are constant values. The transform function will filter the value from the multi-valued column with the given constant values. The `VALUEIN` transform function is especially useful when the same multi-valued column is both filtering column and grouping column. | `VALUEIN(mvColumn, 3, 5, 15)`                |

## Advanced Queries

### Geospatial Queries

Pinot supports Geospatial queries on columns containing text-based geographies. For more details on the queries and how to enable them, see [Geospatial](/release-0.9.0/basics/indexing/geospatial-support.md).

### Text Queries

Pinot supports pattern matching on text-based columns. Only the columns mentioned as text columns in table config can be queried using this method. For more details on how to enable pattern matching, see [Text search support](/release-0.9.0/basics/indexing/text-search-support.md).


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://docs.pinot.apache.org/release-0.9.0/users/user-guide-query/supported-transformations.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
