select homeRuns, baseOnBalls, ADD(homeRuns, baseOnBalls) AS total
from baseballStats
WHERE teamID = 'ML1'
AND yearID = 1956
AND playerName = 'Henry Louis'
ABS
This section contains reference documentation for the abs function.
Absolute of a value
Signature
ABS(col1)
Usage Examples
value
value
ceil
This section contains reference documentation for the CEIL function.
Rounded up to the nearest integer.
Signature
CEIL(col1)
CHR
This section contains reference documentation for the CHR function.
the character corresponding to the Unicode codepoint
Signature
CHR(codepoint)
codepoint
This section contains reference documentation for the CODEPOINT function.
the Unicode codepoint of the first character of the string
Signature
CODEPOINT(col)
count
This section contains reference documentation for the count function.
Get the count of rows in a group
Signature
COUNT(colName)
exp
This section contains reference documentation for the exp function.
Euler’s number(e) raised to the power of col.
Signature
EXP(col1)
FLOOR
This section contains reference documentation for the FLOOR function.
Rounded down to the nearest integer.
Signature
FLOOR(col1)
Functions
This page contains reference documentation for functions in Apache Pinot.
This page contains reference documentation for functions in Apache Pinot.
This section contains reference documentation for the day function.
Returns the day of the month from the given epoch millis in UTC or specified timezone. The value ranges from 1 to 31.
Signature
day(tsInMillis)
day(tsInMillis, timeZoneId)
dayOfMonth(tsInMillis)
dayOfMonth(tsInMillis, timeZoneId)
Usage Examples
day
day
day
day
DISTINCTCOUNT
This section contains reference documentation for the DISTINCTCOUNT function.
Returns the count of distinct row values in a group
Signature
DISTINCTCOUNT(colName)
Usage Examples
These examples are based on the .
value
value
DISTINCTCOUNTBITMAP
This section contains reference documentation for the DISTINCTCOUNTBITMAP function.
Returns the count of distinct row values in a group. This function is accurate for INT column, but approximate for other cases where hash codes are used in distinct counting and there may be hash collisions.
For accurate distinct counting on all column types, see DISTINCTCOUNT.
Signature
DISTINCTCOUNTBITMAP(colName)
Usage Examples
These examples are based on the .
value
value
DISTINCTCOUNTHLL
This section contains reference documentation for the DISTINCTCOUNTHLL function.
Returns an approximate distinct count using HyperLogLog. It also takes an optional second argument to configure the log2m for the HyperLogLog.
For accurate distinct counting, see DISTINCTCOUNT.
Signature
DISTINCTCOUNTHLL(colName, log2m)
Usage Examples
These examples are based on the .
value
value
DISTINCTCOUNTBITMAPMV
This section contains reference documentation for the DISTINCTCOUNTBITMAPMV function.
Returns the count of distinct row values in a group. This function is accurate for INT or dictionary encoded column, but approximate for other cases where hash codes are used in distinct counting and there may be hash collision.
Signature
DISTINCTCOUNTBITMAPMV(colName)
Usage Examples
These examples are based on the .
value
value
DISTINCTCOUNTRAWHLL
This section contains reference documentation for the DISTINCTCOUNTRAWHLL function.
Returns HLL response serialized as string. The serialized HLL can be converted back into an HLL and then aggregated with other HLLs. A common use case may be to merge HLL responses from different Pinot tables, or to allow aggregation after client-side batching.
Signature
DISTINCTCOUNTRAWHLL(colName, log2m)
Usage Examples
These examples are based on the .
value
value
DISTINCTCOUNTRAWHLLMV
This section contains reference documentation for the DISTINCTCOUNTRAWHLLMV function.
Returns HLL response serialized as string. The serialized HLL can be converted back into an HLL and then aggregated with other HLLs. A common use case may be to merge HLL responses from different Pinot tables, or to allow aggregation after client-side batching.
Signature
DISTINCTCOUNTRAWHLLMV(colName, log2m)
Usage Examples
These examples are based on the .
value
value
FromDateTime
This section contains reference documentation for the FromDateTime function.
Converts a formatted date-time string to milliseconds, based on the provided Joda-Time pattern.
Signature
FromDateTime(dateTimeString, pattern)
Usage Examples
epochMillis
epochMillis
epochMillis
epochMillis
FromEpoch
This section contains reference documentation for the fromEpoch functions.
Convert epoch to epoch milliseconds. The following time units are supported:
SECONDS
MINUTES
HOURS
DAYS
Signature
FromEpoch<TIME_UNIT>(timeIn<Time_UNIT>)
Usage Examples
epochMillis
epochMillis
epochMillis
epochMillis
Histogram
This section contains reference documentation for the HISTOGRAM function.
Returns the count of data points that fall within each bin as a vector. The bins are left-inclusive and right-exclusive, i.e. [a, b), except for the last one which is inclusive on both sides [a, b].
This section contains reference documentation for the hour function.
Returns the hour of the day from the given epoch millis in UTC or specified timezone. The value ranges from 0 to 23.
Signature
hour(tsInMillis)
hour(tsInMillis, timeZoneId)
Usage Examples
hour
hour
JSONFORMAT
This section contains reference documentation for the JSONFORMAT function.
Extracts the object value from jsonField based on 'jsonPath', the result type is inferred based on JSON value. This function can only be used in an ingestion transformation function.
Signature
JSONFORMAT(object)
Usage Examples
The usage examples are based on extracting fields from the following JSON document:
Expression
Value
This function can be used in the to extract the meta property into the data column, as described below:
lower
This section contains reference documentation for the lower function.
Converts string to lower case.
Signature
LOWER(col)
Usage Examples
name
lpad
This section contains reference documentation for the LPAD function.
string padded from the left side with pad to reach final size
Signature
LPAD(col, size, pad)
Usage Examples
value
length
This section contains reference documentation for the length function.
calculate length of the string
Signature
LENGTH(col)
Usage Examples
value
ln
This section contains reference documentation for the ln function.
Natural log of value i.e. ln(col1)
Signature
LN(col1)
Usage Examples
value
value
max
This section contains reference documentation for the max function.
Get the maximum value in a group
Signature
MAX(colName)
Usage Examples
These examples are based on the .
value
minmaxrange
This section contains reference documentation for the minmaxrange function.
Returns the max - min value in a group
Signature
MINMAXRANGE(colName)
Usage Examples
These examples are based on the .
value
MINMAXRANGEMV
This section contains reference documentation for the MINMAXRANGEMV function.
Returns the max - min value in a group
Signature
MINMAXRANGEMV(colName)
Usage Examples
These examples are based on the .
value
MINMV
This section contains reference documentation for the MINMV function.
Get the minimum value in a group
Signature
MINMV(colName)
Usage Examples
These examples are based on the .
value
now
This section contains reference documentation for the now function.
Return current time as epoch millis.
Signature
now()
Usage Examples
now
This function is typically used in predicate to filter on timestamp for recent data. e.g. filter data on recent 1 day(86400 seconds)
min
This section contains reference documentation for the min function.
Get the minimum value in a group
Signature
MIN(colName)
Usage Examples
These examples are based on the .
value
arrayConcatInt
This section contains reference documentation for the arrayConcatInt function.
Concatenates two arrays of ints.
Signature
arrayConcatInt('colName1', 'colName2')
arrayDistinctInt
This section contains reference documentation for the arrayDistinctInt function.
Returns unique values in an array of ints.
Signature
arrayDistinctInt('colName')
arrayDistinctString
This section contains reference documentation for the arrayDistinctString function.
Returns unique values in an array of strings.
Signature
arrayDistinctString('colName')
ARRAYLENGTH
This section contains reference documentation for the ARRAYLENGTH function.
Returns the length of a multi-value column
Signature
ARRAYLENGTH('colName')
arrayRemoveString
This section contains reference documentation for the arrayRemoveString function.
Removes value from array of strings.
Signature
arrayRemoveString('colName', value)
arrayReverseInt
This section contains reference documentation for the arrayReverseInt function.
Reverses array of ints.
Signature
arrayReverseInt('colName')
arraySortString
This section contains reference documentation for the arraySortString function.
Sorts array of strings.
Signature
arraySortString('colName')
arrayUnionInt
This section contains reference documentation for the arrayUnionInt function.
Create a union of two arrays of ints.
Signature
arrayUnionInt('colName1', 'colName2')
Base64
This section contains reference documentation for base64 encode and decode functions.
Encoding scheme follows
toBase64 returns Base64 encoded string of input binary data (bytes type).
fromBase64
dayOfYear
This section contains reference documentation for the dayOfYear function.
Returns the day of the year from the given epoch millis in UTC or specified timezone. The value ranges from 1 to 366.
Signature
dayOfYear(tsInMillis)
dayOfYear(tsInMillis, timeZoneId)
DISTINCT
This section contains reference documentation for the DISTINCT function.
Returns the distinct row values in a group
Signature
DISTINCT(colName)
DISTINCTCOUNTHLLMV
This section contains reference documentation for the DISTINCTCOUNTHLLMV function.
Returns an approximate distinct count using HyperLogLog in a group
Signature
DISTINCTCOUNTHLLMV(colName)
DISTINCTCOUNTMV
This section contains reference documentation for the DISTINCTCOUNTMV function.
Returns the count of distinct row values in a group
Signature
DISTINCTCOUNTMV(colName)
DIV
This section contains reference documentation for the DIV function.
Quotient of two values
Signature
DIV(col1, col2)
isSubnetOf
This section contains reference documentation for the isSubnetOf function.
Takes 2 arguments of type STRING. The first argument is an ipPrefix, and the second argument is a single ipAddress. This function handles both IPv4 and IPv6 arguments.
Returns a boolean value checking if ipAddress is in the subnet of ipPrefix
ltrim
This section contains reference documentation for the ltrim function.
trim spaces from left side of the string
Signature
LTRIM(col)
minute
This section contains reference documentation for the minute function.
Returns the minute of the hour from the given epoch millis in UTC or specified timezone. The value ranges from 0 to 59.
Signature
minute(tsInMillis)
minute(tsInMillis, timeZoneId)
mode
This section contains reference documentation for the mode function.
Get the most frequent value in a group. When multiple modes are present it gives the minimum of all the modes. This behavior can be overridden to get the maximum or the average mode.
Signature
MODE(colName, [reducerType])
MD5
This section contains reference documentation for the MD5 function.
Return MD5 digest of binary column(bytes type) as hex string
Signature
MD5(bytesCol)
percentile
This section contains reference documentation for the percentile function.
Returns the max - min value in a group
Signature
percentile(colName, percentile)
dayOfWeek
This section contains reference documentation for the dayOfWeek function.
Returns the day of the week from the given epoch millis in UTC timezone. The value ranges from 1(Monday) to 7(Sunday).
Signature
dayOfWeek(tsInMillis)
dayOfWeek(tsInMillis, timeZoneId)
MOD
This section contains reference documentation for the MOD function.
Modulo of two values
Signature
MOD(col1, col2)
remove
This section contains reference documentation for the remove function.
Removes all instances of search from string
Signature
remove(input, search)
select mvCol1,
arrayConcatLong(mvCol1, mvCol2) AS concatLongs
from multiValueTable
WHERE arraylength(mvCol1) >= 2
limit 5
returns binary data (represented as a Hex string) from Base64-encoded string.
Signature
toBase64(bytesCol)
fromBase64(stringCol)
Usage Examples
For better readability, the following examples converts string hello! into BYTES using toUtf8 function and converts the decoded BYTES into string using fromUtf8.
encoded
aGVsbG8h
decoded
hello!
Note that without UTF8 string conversion, returned BYTES will be represented as a Hex string following Pinot's BYTES column representation. See the example below.
decoded
68656c6c6f21
Note that the following query will throw compilation error as string is not a valid input type for toBase64.
select homeRuns, numberOfGames, DIV(homeRuns, numberOfGames) AS total
from baseballStats
WHERE teamID = 'ML1'
AND yearID = 1956
AND playerName = 'Henry Louis'
Signatures
isSubnetOf(ipPrefix, ipAddress) -> boolean
Usage Examples
Please see the following sample queries where isSubnetOf is used in different parts of the query.
SELECT isSubnetOf('192.168.0.1/24', '192.168.0.225')
AS result
FROM myTable;
---> returns true
SELECT isSubnetOf('1.2.3.128/26', '1.2.5.1')
AS result
FROM myTable;
---> returns false
SELECT isSubnetOf('2001:4800:7825:103::/64', '2001:4800:7825:103::2050')
AS result
FROM myTable;
---> returns true
SELECT isSubnetOf('7890:db8:113::8a2e:370:7334/127', '7890:db8:113::8a2e:370:7336')
AS result
FROM myTable;
---> returns false
Usage Examples
notTrimmed
trimmed
" Pinot with spaces "
"Pinot with spaces "
SELECT ' Pinot with spaces ' AS notTrimmed,
ltrim(' Pinot with spaces ') AS trimmed
FROM ignoreMe
Usage Examples
minute
30
minute
0
select minute(1639351800000) AS minute
FROM ignoreMe
select minute(1639351800000, 'America/St_Johns') AS minute
FROM ignoreMe
select percentile(homeRuns, 50) AS value
from baseballStats
dow(tsInMillis)
dow(tsInMillis, timeZoneId)
Usage Examples
dayOfWeek
7
dayOfWeek
1
dayOfWeek
7
dayOfWeek
1
select dayOfWeek(1639351800000) AS dayOfWeek
FROM ignoreMe
Usage Examples
value
2
value
0
select MOD(12, 5) AS value
from ignoreMe
select MOD(12, 2) AS value
from ignoreMe
Usage Examples
value
bar sheep
select remove('foo bar foo sheep', 'foo') AS value
from ignoreMe
select DivTailNums,
arrayConcatString(DivTailNums, DivTailNums) AS concatIds
from airlineStats
WHERE arraylength(DivTailNums) >= 2
limit 5
select DivAirportIDs,
arrayContainsInt(DivAirportIDs, 14683) AS containsValue
from airlineStats
WHERE arraylength(DivAirportIDs) >= 2
limit 5
select DivAirportIDs,
arrayIndexOfInt(DivAirportIDs, 14683) AS index
from airlineStats
WHERE arraylength(DivAirportIDs) >= 2
limit 5
select DivAirportIDs,
arrayRemoveInt(DivAirportIDs, 12892) AS value
from airlineStats
WHERE arraylength(DivAirportIDs) >= 2
AND arrayContainsInt(DivAirportIDs, 12892) = 1
limit 5
select FlightNum,
arraySliceInt(DivAirportIDs, 0, 1) AS airports,
DivAirportIDs
from airlineStats
WHERE arraylength(DivAirportIDs) >= 2
limit 5
select FlightNum,
arraySliceString(RandomAirports, 0, 2) AS airports,
RandomAirports
from airlineStats
WHERE arraylength(RandomAirports) BETWEEN 2 AND 4
limit 5
select DivAirportIDs,
arraySortInt(DivAirportIDs) AS sortedIds
from airlineStats
WHERE arraylength(DivAirportIDs) >= 2
limit 5
select DivTailNums,
DivAirports,
arrayUnionString(DivTailNums, DivAirports) AS unionIds
from airlineStats
WHERE arraylength(DivTailNums) >= 2
limit 5
select COUNTMV(DivTailNums) AS value
from airlineStats
where arraylength(DivTailNums) > 1
select day(1639351800000, 'CET') AS day
FROM ignoreMe
select dayOfMonth(1639351800000) AS day
FROM ignoreMe
select dayOfMonth(1639351800000, 'CET') AS day
FROM ignoreMe
select DISTINCTCOUNTBITMAP(teamID) AS value
from baseballStats
select DISTINCTCOUNTHLL(teamID, 12) AS value
from baseballStats
select DISTINCTCOUNTBITMAPMV(DivTailNums) AS value
from airlineStats
where arraylength(DivTailNums) > 1
select DISTINCTCOUNTRAWHLL(teamID, 1) AS value
from baseballStats
select DISTINCTCOUNTRAWHLLMV(DivAirports, 1) AS value
from airlineStats
where arraylength(DivAirports) > 1
SELECT FromDateTime('2019-08-07', 'yyyy-MM-dd') AS epochMillis
FROM ignoreMe
SELECT FromDateTime(
'2019-08-07 3:12:13 PM',
'yyyy-MM-dd hh:mm:ss a'
) AS epochMillis
FROM ignoreMe
SELECT FromDateTime(
'2019-08-07T15:12:13',
'yyyy-MM-dd''T''HH:mm:ss'
) AS epochMillis
FROM ignoreMe
SELECT FromDateTime(
'2019-08-07T07:12:13-0800',
'yyyy-MM-dd''T''HH:mm:ssZ'
) AS epochMillis
FROM ignoreMe
select FromEpochSeconds(1613472303) AS epochMillis
FROM ignoreMe
select FromEpochMinutes(26891205) AS epochMillis
FROM ignoreMe
select FromEpochHours(448186) AS epochMillis
FROM ignoreMe
select FromEpochDays(18674) AS epochMillis
FROM ignoreMe
SELECT HISTOGRAM(numberOfGames, 0, 200, 10) AS histogram
FROM baseballStats
select HISTOGRAM(AtBatting, Array['-Infinity', 1, 10, 50, 100, 500, 1000]) AS histogram
from baseballStats
select FromEpochSecondsBucket(1613472303, 1) AS bucket
FROM ignoreMe
select FromEpochSecondsBucket(1613472303, 2) AS bucket
FROM ignoreMe
select FromEpochMinutesBucket(2689120, 10) AS bucket
FROM ignoreMe
select FromEpochHoursBucket(89637, 5) AS bucket
FROM ignoreMe
select FromEpochDaysBucket(1867, 10) AS bucket
FROM ignoreMe
select DivWheelsOffs,
arrayConcatInt(DivWheelsOffs, DivWheelsOns) AS concatIds
from airlineStats
WHERE arraylength(DivWheelsOffs) >= 2
limit 5
select DivAirportIDs,
arrayDistinctInt(DivAirportIDs) AS unique
from airlineStats
WHERE arraylength(DivAirportIDs) >= 2
limit 5
select DivTailNums,
arrayDistinctString(DivTailNums) AS unique
from airlineStats
WHERE arraylength(DivTailNums) >= 2
limit 5
select ARRAYLENGTH(RandomAirports) AS length, count(*)
from airlineStats
GROUP BY length
ORDER BY count(*) DESC
LIMIT 5
select RandomAirports,
arrayRemoveString(RandomAirports, 'SEA') AS value
from airlineStats
WHERE arraylength(RandomAirports) BETWEEN 2 AND 4
limit 5
select DivAirportIDs,
arrayReverseInt(DivAirportIDs) AS reversedIds
from airlineStats
WHERE arraylength(DivAirportIDs) >= 2
limit 5
SELECT toBase64(toUtf8('hello!')) AS encoded
FROM ignoreMe
SELECT fromUtf8(fromBase64('aGVsbG8h')) AS decoded
FROM ignoreMe
SELECT fromBase64('aGVsbG8h') AS decoded
FROM ignoreMe
SELECT toBase64('hello!') AS encoded
FROM ignoreMe
select dayOfYear(1639351800000, 'CET') AS dayOfYear
FROM ignoreMe
select doy(1639351800000) AS dayOfYear
FROM ignoreMe
select doy(1639351800000, 'CET') AS dayOfYear
FROM ignoreMe
select DISTINCT league AS value
from baseballStats
select DISTINCT(league) AS value
from baseballStats
select DISTINCTCOUNTMV(DivTailNums) AS value
from airlineStats
where arraylength(DivTailNums) > 1
SELECT count(*)
FROM myTable
WHERE isSubnetOf('192.168.0.1/24', ipAddressCol);
SELECT count(*)
FROM myTable
WHERE isSubnetOf('192.168.0.1/24', ipAddressCol)
OR isSubnetOf(ipPrefixCol, '7890:db8:113::8a2e:370:7336');
SELECT
CASE
WHEN isSubnetOf('105.25.245.115/27', srcIPAddress) THEN 'case1'
WHEN isSubnetOf('105.25.245.115/27', dstIPAddress) THEN 'case2'
ELSE 'case3'
END AS differentFlow
FROM myTable;
select mode(yearID, 'AVG') AS value
from baseballStats
WHERE AtBatting != 0 AND yearID > 2001
select mode(yearID, 'MIN') AS value
from baseballStats
WHERE AtBatting != 0 AND yearID > 2001
select mode(yearID, 'MAX') AS value
from baseballStats
WHERE AtBatting != 0 AND yearID > 2001
select percentile(homeRuns, 80) AS value
from baseballStats
select percentile(homeRuns, 99.9) AS value
from baseballStats
select dayOfWeek(1639351800000, 'CET') AS dayOfWeek
FROM ignoreMe
select dow(1639351800000) AS dayOfWeek
FROM ignoreMe
select dow(1639351800000, 'CET') AS dayOfWeek
FROM ignoreMe
This section contains reference documentation for the millisecond function.
Returns the millisecond of the second from the given epoch millis in UTC or specified timezone. The value ranges from 0 to 999.
Signature
millisecond(tsInMillis)
millisecond(tsInMillis, timeZoneId)
Usage Examples
millisecond
millisecond
replace
This section contains reference documentation for the REPLACE function.
replace all instances of find with replace in input
Signature
REPLACE(col, find, replace)
Usage Examples
value
value
reverse
This section contains reference documentation for the reverse function.
Reverse the string
Signature
REVERSE(col)
Usage Examples
name
DISTINCTCOUNTRAWTHETASKETCH
This section contains reference documentation for the DISTINCTCOUNTRAWTHETASKETCH function.
The framework enables set operations over a stream of data, and can also be used for cardinality estimation. Pinot leverages the and its extensions from the library org.apache.datasketches:datasketches-java:1.2.0-incubating to perform distinct counting as well as evaluating set operations.
Signature
DISTINCTCOUNTTHETASKETCH
This section contains reference documentation for the DISTINCTCOUNTTHETASKETCH function.
The framework enables set operations over a stream of data, and can also be used for cardinality estimation. Pinot leverages the and its extensions from the library org.apache.datasketches:datasketches-java:1.2.0-incubating to perform distinct counting as well as evaluating set operations.
Signature
DATETRUNC
This section contains reference documentation for the DATETRUNC function.
(Presto) SQL compatible date truncation, equivalent to the Presto function date_trunc.
Converts the value into a specified output granularity seconds since UTC epoch that is bucketed on a unit in a specified timezone.
Signature
JSONPATH
This section contains reference documentation for the JSONPATH function.
Extracts the object value from jsonField based on 'jsonPath', the result type is inferred based on JSON value. This function can only be used in an .
Signature
JSONPATH(jsonField, 'jsonPath')
JSONPATHARRAYDEFAULTEMPTY
This section contains reference documentation for the JSONPATHARRAYDEFAULTEMPTY function.
Extracts an array from jsonField based on 'jsonPath', the result type is inferred based on JSON value. Returns empty array for null or parsing error. This function can only be used in an .
Signature
JSONPATHDOUBLE
This section contains reference documentation for the JSONPATHDOUBLE function.
Extracts the Double value from jsonField based on 'jsonPath', use optional defaultValuefor null or parsing error. This function can only be used in an .
Signature
JSONPATHSTRING
This section contains reference documentation for the JSONPATHSTRING function.
Extracts the String value from jsonField based on 'jsonPath', use optional defaultValuefor null or parsing error. This function can only be used in an .
Signature
percentiletdigest
This section contains reference documentation for the PERCENTILETDigest function.
Returns the Nth percentile of the group using .
Signature
PERCENTILETDigest(colName, percentile)
percentilemv
This section contains reference documentation for the PERCENTILEMV function.
Returns the Nth percentile of the group where N is a decimal number between 0 and 100 inclusive
Signature
PERCENTILEMV(colName, N)
percentileest
This section contains reference documentation for the percentileest function.
Returns the Nth percentile of the group using algorithm.
Signature
percentileest(colName, percentile)
select FlightNum,
arraySortString(RandomAirports) AS sortedAirports,
RandomAirports
from airlineStats
WHERE arraylength(RandomAirports) BETWEEN 2 AND 4
limit 5
select DivWheelsOffs,
DivWheelsOns,
arrayUnionInt(DivWheelsOffs, DivWheelsOns) AS unionIds
from airlineStats
WHERE arraylength(DivWheelsOffs) >= 2
limit 5
select month(1633046399000, 'UTC') AS month
FROM ignoreMe
9
select month(1633046399000, 'CET') AS month
FROM ignoreMe
10
select homeRuns, baseOnBalls, MULT(homeRuns, baseOnBalls) AS total
from baseballStats
WHERE teamID = 'ML1'
AND yearID = 1956
AND playerName = 'Henry Louis'
thetaSketchColumn (required): Name of the column to aggregate on.
thetaSketchParams (required): Parameters for constructing the intermediate theta-sketches.
Currently, the only supported parameter is nominalEntries (defaults to 4096).
predicates (optional)_: _ These are individual predicates of form lhs <op> rhs which are applied on rows selected by the where clause. During intermediate sketch aggregation, sketches from the thetaSketchColumn that satisfies these predicates are unionized individually. For example, all filtered rows that match country=USA are unionized into a single sketch. Complex predicates that are created by combining (AND/OR) of individual predicates is supported.
postAggregationExpressionToEvaluate (required): The set operation to perform on the individual intermediate sketches for each of the predicates. Currently supported operations are SET_DIFF, SET_UNION, SET_INTERSECT , where DIFF requires two arguments and the UNION/INTERSECT allow more than two arguments.
thetaSketchColumn (required): Name of the column to aggregate on.
thetaSketchParams (required): Parameters for constructing the intermediate theta-sketches.
Currently, the only supported parameter is nominalEntries (defaults to 4096).
predicates (optional)_: _ These are individual predicates of form lhs <op> rhs which are applied on rows selected by the where clause. During intermediate sketch aggregation, sketches from the thetaSketchColumn that satisfies these predicates are unionized individually. For example, all filtered rows that match country=USA are unionized into a single sketch. Complex predicates that are created by combining (AND/OR) of individual predicates is supported.
postAggregationExpressionToEvaluate (required): The set operation to perform on the individual intermediate sketches for each of the predicates. Currently supported operations are SET_DIFF, SET_UNION, SET_INTERSECT , where DIFF requires two arguments and the UNION/INTERSECT allow more than two arguments.
We can also provide predicates and a post aggregation expression to compute more complicated cardinalities. For example, we could can find the intersection of the following queries:
inputTimeUnitStr and outputTimeUnitStr support the following values:
NANOSECONDS
MICROSECONDS
MILLISECONDS
SECONDS
MINUTES
HOURS
DAYS
Usage Examples
Truncates an epoch in milliseconds at WEEK (where a Week starts at Monday UTC midnight):
or
ts
1639353600000
Truncates an epoch in milliseconds at WEEK (where a Week starts at Monday UTC midnight) in the UTC time zone, returning a result in epoch in seconds in UTC timezone:
ts
1639353600
Truncates an epoch in milliseconds at WEEK (where a Week starts at Monday UTC midnight) in the CET time zone, returning a result in epoch in seconds in UTC timezone:
ts
1639350000
Truncates an epoch in milliseconds at QUARTER in the Los Angeles time zone (where a Quarter begins on Jan 1st, April 1st, July 1st, October 1st in Los Angeles timezone), returning a result in hours since UTC epoch:
select percentileest(homeRuns, 50) AS value
from baseballStats
select FlightNum,
arrayReverseString(RandomAirports) AS reversedAirports,
RandomAirports
from airlineStats
WHERE arraylength(RandomAirports) BETWEEN 2 AND 4
limit 5
select id, repo, JSONEXTRACTKEY(repo, '$.*') AS keys
from githubEvents
WHERE id = 7044874109
select PERCENTILEESTMV(DivLongestGTimes, 90) AS value
from airlineStats
where arraylength(DivLongestGTimes) > 1
select PERCENTILEESTMV(DivLongestGTimes, 99.9) AS value
from airlineStats
where arraylength(DivLongestGTimes) > 1
select PERCENTILETDIGESTMV(DivLongestGTimes, 90) AS value
from airlineStats
where arraylength(DivLongestGTimes) > 1
select PERCENTILETDIGESTMV(DivLongestGTimes, 99.9) AS value
from airlineStats
where arraylength(DivLongestGTimes) > 1
select distinctCountRawThetaSketch(teamID) AS value
from baseballStats
select distinctCountRawThetaSketch(teamID, 'nominalEntries=10') AS value
from baseballStats
select distinctCountRawThetaSketch(
yearID,
'nominalEntries=4096',
'teamID = ''SFN'' AND numberOfGames=28 AND homeRuns=1',
'teamID = ''CHN'' AND numberOfGames=28 AND homeRuns=1',
'SET_INTERSECT($1, $2)'
) AS value
from baseballStats
select distinctCountThetaSketch(teamID) AS value
from baseballStats
select distinctCountThetaSketch(teamID, 'nominalEntries=10') AS value
from baseballStats
select yearID
from baseballStats
where teamID = 'SFN' AND numberOfGames = 28 AND homeRuns = 1
select yearID
from baseballStats
where teamID = 'CHN' AND numberOfGames = 28 AND homeRuns = 1
select distinctCountThetaSketch(
yearID,
'nominalEntries=4096',
'teamID = ''SFN'' AND numberOfGames=28 AND homeRuns=1',
'teamID = ''CHN'' AND numberOfGames=28 AND homeRuns=1',
'SET_INTERSECT($1, $2)'
) AS value
from baseballStats
select dateTrunc('week', 1639480981746) AS ts
FROM ignoreMe
select dateTrunc('week', 1639480981746, 'MILLISECONDS') AS ts
FROM ignoreMe
select dateTrunc(
'week',
1639480981746,
'MILLISECONDS',
'UTC',
'SECONDS'
) AS ts
FROM ignoreMe
select dateTrunc(
'week',
1639480981746,
'MILLISECONDS',
'CET',
'SECONDS'
) AS ts
FROM ignoreMe
select dateTrunc(
'quarter',
1639480981746,
'MILLISECONDS',
'America/Los_Angeles',
'HOURS'
) AS ts
FROM ignoreMe
select PERCENTILETDigest(homeRuns, 80) AS value
from baseballStats
select PERCENTILETDigest(homeRuns, 99.9) AS value
from baseballStats
select PERCENTILEMV(DivLongestGTimes, 90) AS value
from airlineStats
where arraylength(DivLongestGTimes) > 1
select PERCENTILEMV(DivLongestGTimes, 99.9) AS value
from airlineStats
where arraylength(DivLongestGTimes) > 1
select percentileest(homeRuns, 80) AS value
from baseballStats
select percentileest(homeRuns, 99.9) AS value
from baseballStats
Arguments
Description
jsonField
An Identifier/Expression contains JSON documents.
'jsonPath'
Follows to read values from JSON documents.
'jsonPath'` is a literal. Pinot uses single quotes to distinguish them from identifiers.
You can use the Jayway JsonPath Evaluator Tool to test JSON expressions before you import any data.
Usage Examples
The usage examples are based on extracting fields from the following JSON document:
Expression
Value
JSONPATH(data, '$.name')
"Pete"
JSONPATH(data, '$.age')
24
This function can be used in the table config to extract the name property into the name column and age property into the age column, as described below:
'jsonPath'` is a literal. Pinot uses single quotes to distinguish them from identifiers.
You can use the Jayway JsonPath Evaluator Tool to test JSON expressions before you import any data.
Usage Examples
The usage examples are based on extracting fields from the following JSON document:
This function can be used in the table config to extract the name, score, and second value of homework_grades into their respective columns , as described below:
'jsonPath'` is a literal. Pinot uses single quotes to distinguish them from identifiers.
You can use the Jayway JsonPath Evaluator Tool to test JSON expressions before you import any data.
Usage Examples
The usage examples are based on extracting fields from the following JSON document:
Expression
Value
JSONPATHDOUBLE(data, '$.age')
24.0
This function can be used in the table config to extract the age property into the age column, as described below:
'jsonPath'` is a literal. Pinot uses single quotes to distinguish them from identifiers.
You can use the Jayway JsonPath Evaluator Tool to test JSON expressions before you import any data.
Usage Examples
The usage examples are based on extracting fields from the following JSON document:
Expression
Value
JSONPATHSTRING(data, '$.age')
"24"
JSONPATHSTRING(data, '$.name["nick.name"]')
"Pete"
This function can be used in the table config to extract the age property into the age column, as described below:
This section contains reference documentation for the JSONPATHLONG function.
Extracts the Long value from jsonField based on 'jsonPath', use optional defaultValuefor null or parsing error. This function can only be used in an ingestion transformation function.
'jsonPath'` is a literal. Pinot uses single quotes to distinguish them from identifiers.
You can use the to test JSON expressions before you import any data.
Usage Examples
The usage examples are based on extracting fields from the following JSON document:
Expression
Value
This function can be used in the to extract the age property into the age column, as described below:
regexpExtract
This section contains reference documentation for the regexpExtract function.
Extracts values that match the provided regular expression
Signature
regexpExtract(value, regexp)
regexpExtract(value, regexp, group)
regexpExtract(value, regexp, group, defaultValue)
Usage Examples
value
value
value
value
value
value
quarter
This section contains reference documentation for the quarter function.
Returns the quarter of the year from the given epoch millis in UTC or specified timezone. The value ranges from 1 to 4
Signature
quarter(tsInMillis)
quarter(tsInMillis, timeZoneId)
Usage Examples
quarter
quarter
quarter
JSONPATHARRAY
This section contains reference documentation for the JSONPATHARRAY function.
Extracts an array from jsonField based on 'jsonPath', the result type is inferred based on JSON value. This function can only be used in an .
Signature
JSONPATHARRAY(jsonField, 'jsonPath')
jsonextractscalar
This section contains reference documentation for the JSONEXTRACTSCALAR function.
Evaluates the 'jsonPath' on jsonField, returns the result as the type 'resultsType', use optional defaultValuefor null or parsing error.
select regexpExtract('foo', '.*') AS value
from ignoreMe
select regexpExtract('foo123', '[0-9]+') AS value
from ignoreMe
select regexpExtract('foo123', '[^0-9]+') AS value
from ignoreMe
select regexpExtract('foo bar baz', '(\w+) (\w+)', 0) AS value
from ignoreMe
select regexpExtract('foo bar baz', '(\w+) (\w+)', 2) AS value
from ignoreMe
select regexpExtract('foo123', 'bar', 0, 'defaultValue') AS value
from ignoreMe
select quarter(1633046399000, 'UTC') AS quarter
FROM ignoreMe
select quarter(1633046399000, 'CET') AS quarter
FROM ignoreMe
Arguments
Description
jsonField
An Identifier/Expression contains JSON documents.
'jsonPath'
Follows to read values from JSON documents.
'jsonPath'` is a literal. Pinot uses single quotes to distinguish them from identifiers.
You can use the Jayway JsonPath Evaluator Tool to test JSON expressions before you import any data.
Usage Examples
The usage examples are based on extracting fields from the following JSON document:
This function can be used in the table config to extract the name, score, and second value of homework_grades into their respective columns , as described below:
SIMPLE_DATE_FORMAT pattern - defined in case of SIMPLE_DATE_FORMAT e.g. yyyy-MM-dd. A specific timezone can be passed using tz(timezone). Timezone can be long or short string format timezone. e.g. Asia/Kolkata or PDT
select id,
created_at_timestamp,
cast(created_at_timestamp AS long) AS timeInMs,
DATETIMECONVERT(
created_at_timestamp,
'1:MILLISECONDS:EPOCH',
'1:DAYS:EPOCH',
'1:DAYS'
) AS convertedTime
from githubEvents
WHERE id = 7044874134
select id,
created_at_timestamp,
cast(created_at_timestamp AS long) AS timeInMs,
DATETIMECONVERT(
created_at_timestamp,
'1:MILLISECONDS:EPOCH',
'1:MILLISECONDS:EPOCH',
'15:MINUTES'
) AS convertedTime
from githubEvents
WHERE id = 7044874134
select id,
created_at_timestamp,
cast(created_at_timestamp AS long) AS timeInMs,
DATETIMECONVERT(
created_at_timestamp,
'1:MILLISECONDS:EPOCH',
'1:DAYS:SIMPLE_DATE_FORMAT:yyyy-MM-dd',
'1:DAYS'
) AS convertedTime
from githubEvents
WHERE id = 7044874134
select id,
created_at_timestamp,
cast(created_at_timestamp AS long) AS timeInMs,
DATETIMECONVERT(
created_at_timestamp,
'1:MILLISECONDS:EPOCH',
'1:MILLISECONDS:SIMPLE_DATE_FORMAT:yyyy-MM-dd HH:mm tz(Pacific/Kiritimati)',
'1:MILLISECONDS'
) AS convertedTime
from githubEvents
WHERE id = 7044874134
select id,
created_at_timestamp,
cast(created_at_timestamp AS long) AS timeInMs,
DATETIMECONVERT(
created_at_timestamp,
'1:MILLISECONDS:EPOCH',
'1:MILLISECONDS:SIMPLE_DATE_FORMAT:yyyy-MM-dd HH:mm tz(Pacific/Kiritimati)',
'1:DAYS'
) AS convertedTime
from githubEvents
WHERE id = 7044874134
regexpReplace
This section contains reference documentation for the regexpReplace function
Find and replace a string or regexp pattern with a target string or regexp pattern. If matchStr is not found, inputStr will be returned. By default, all occurrences of match pattern in the input string will be replaced. Default matching mode is case sensitive.
The input string or the column name on which regexpReplace function should be applied.
matchRegexp
The regular expression or string used to match against the input string or column value.
replaceRegexp
The regular expression or string to replace if a match is found.
matchStartPos
Index of inputStr from where matching should start. Counting starts and 0. Default value is 0 if not specified.
occurrence
Controls which occurence of the matched pattern must be replaced. Counting starts at 0. Default value is -1 if not specified
flag
Single character flag that controls how the regex finds matches in inputStr. If an incorrect flag is specified, the function applies default case sensitive match. Only one flag can be specified. Supported flags are:
i -> case insensitive match
Usage Examples
Example 1
In the example below, shows a simple string find and replace example where all occurrences of the matched string o is replaced with string x.
value
Example 2
The example below shows how a regexp pattern containing consecutive digits is found and replaced with a simple string bar.
value
Example 3
The example below shows how a regexp pattern containing consecutive non-digits is found and replaced with a simple string bar.
value
Example 4
The following example demonstrates how replaceStr can contain backreferences to substrings captured by the matchStr regular expression. Backreferences are indicated by $n where n can range from 0-9. In the example below, every character in the input is replaced by the character appended with a space.
value
Example 5
This example shows how regexpReplace can be used to remove extra whitespaces between words in an input string.
value
Example 6
This example shows the power of backreferencing can be used in regexpReplace to format phone numbers.
value
Example 7
This example shows how the matchStartPos parameter can be used. Since the matchStartPos is set to 4, pattern matching against the inputStr begins at index 4 there by leading to the string healthy not being replaced.
value
Example 8
This example shows how the occurence parameter can be used. In the example below, the matchStr regular expression matches against three instances in the input - healthy, wealthy and stealthy. As the occurence is specified to 2, the second occurence (counting from zero) stealthy is replaced with something
value
Example 9
The example below shows the usage of the flag parameter. Here the case insensitive flag i is specified.
value
Example 10
The examples below show some sample queries using regexpReplace in there WHERE clause of a query.
fxx
foobar
bar123
f o o
Pinot is blazing fast
1-(123) 456-7898
healthy, something, something and wise
healthy, wealthy, something and wise
something, wealthy, stealthy and wise
select regexpReplace('foo', 'o', 'x') AS value
from myTable
select regexpReplace('foo123', '[0-9]+', 'bar') AS value
from myTable
select regexpReplace('foo123', '[^0-9]+', 'bar') AS value
from myTable
select regexpReplace('foo', '(.)', '$1 ') AS value
from myTable
select regexpReplace('Pinot is blazing fast', '( ){2,}', ' ') AS value
from myTable
select regexpReplace('11234567898','(\\d)(\\d{3})(\\d{3})(\\d{4})', '$1-($2) $3-$4') AS value
from myTable
select regexpReplace('healthy, wealthy, stealthy and wise','\\w+thy', 'something', 4) AS value
from myTable
select regexpReplace('healthy, wealthy, stealthy and wise','\\w+thy', 'something', 0, 2) AS value
from myTable
select regexpReplace('healthy, wealthy, stealthy and wise','\\w+THY', 'something', 0, 0, 'i') AS value
from myTable
SELECT col1, col2
FROM myTable
WHERE regexpReplace(stateCode, '[VC]A', 'TEST') = 'TEST'
SELECT count(*)
FROM myTable
WHERE contains(regexpReplace(stateCode, '(C)(A)', '$1TEST$2'), 'CTESTA')