This section contains reference documentation for the arrayConcatDouble function.
Concatenates two arrays of doubles.
Signature
arrayConcatDouble('colName1', 'colName2')
Usage Examples
This example assumes the multiValueTable columns mvCol1 and mvCol2 are both of type DOUBLE with singleValueField in the table schema set to false.
ADD
This section contains reference documentation for the ADD function.
Sum of at least two values
Signature
ADD(col1, col2, col3...)
Usage Examples
These examples are based on the .
homeRuns
baseOnBalls
total
ABS
This section contains reference documentation for the abs function.
Absolute of a value
Signature
ABS(col1)
Usage Examples
value
value
ago
This section contains reference documentation for the ago function.
Return time as epoch millis before the given period (in ISO-8601 duration format).
Examples:
"PT20.345S" -- parses as "20.345 seconds"
"PT15M" -- parses as "15 minutes" (where a minute is 60 seconds)
ARG_MIN / ARG_MAX
This section contains reference documentation for the ARG_MIN and ARG_MAX function.
This function scans the given dataset to identify the maximum and minimum values in the specified measuring columns. Once these extreme values (the maxima and minima) are found, the function locates the corresponding entries in the projection column. These entries are associated with the rows where the extreme values were found in the measuring columns. The function then returns these projection column values, providing a way to link the extreme measurements with their corresponding data in another part of the dataset.
This section contains reference documentation for the arrayConcatFloat function.
Concatenates two arrays of floats.
Signature
arrayConcatFloat('colName1', 'colName2')
select mvCol1,
arrayConcatDouble(mvCol1, mvCol2) AS concatDoubles
from multiValueTable
WHERE arraylength(mvCol1) >= 2
limit 5
select homeRuns, baseOnBalls, ADD(homeRuns, baseOnBalls) AS total
from baseballStats
WHERE teamID = 'ML1'
AND yearID = 1956
AND playerName = 'Henry Louis'
Find the user with maximum activity. If there are multiple users, break the tie with their last_activity_date. If still a tie, break with user_id. And project user_id.
More useful is that this multiple such aggregation function can be used with GROUP BY
Note:
In cases where multiple rows share the same extreme values in the measuring columns, all such rows will be returned by the function.
If the goal is to project multiple different columns that correspond to the same set of measuring columns, you can achieve this by invoking the function multiple times, each time specifying a different projection column.
This impl does not work with AS clause (e.g. SELECT argmin(longCol, doubleCol) AS argmin won't work)
Putting argmin/argmax column inside order by clause (e.g. SELECT intCol, argmin(longCol, doubleCol) FROM table GROUP BY intCol ORDER BY argmin(longCol, doubleCol)) is not supported as semantically ordering multi-column multi-row argmin/argmax results doesn't make sense
Currently projecting MV bytes column doesn't work for now due to an issue
This example assumes the multiValueTable columns mvCol1 and mvCol2 are both of type FLOAT with singleValueField in the table schema set to false.
select mvCol1,
arrayConcatFloat(mvCol1, mvCol2) AS concatFloats
from multiValueTable
WHERE arraylength(mvCol1) >= 2
limit 5
select ago('P1D') AS oneDayAgo
FROM ignoreMe
SELECT *
FROM tableName
WHERE tsInMillis > ago('P1D')
SELECT ARG_MAX(activity, last_activity_date, user_id, user_id)
FROM userEngagmentTable
SELECT user_region, ARG_MAX(activity, last_activity_date, user_id, user_id),
ARG_MIN(user_satisfaction, user_id)
FROM userEngagmentTable
GROUP BY user_region
CHR
This section contains reference documentation for the CHR function.
the character corresponding to the Unicode codepoint
Signature
CHR(codepoint)
COVAR_POP
This section contains reference documentation for the COVAR_POP function.
Returns the population covariance between of 2 numerical columns.
This section contains reference documentation for the DISTINCTCOUNTBITMAP function.
Returns the count of distinct row values in a group. This function is accurate for INT column, but approximate for other cases where hash codes are used in distinct counting and there may be hash collisions.
For accurate distinct counting on all column types, see DISTINCTCOUNT.
Signature
DISTINCTCOUNTBITMAP(colName)
Usage Examples
These examples are based on the .
value
value
lower
This section contains reference documentation for the lower function.
Converts string to lower case.
Signature
LOWER(col)
Usage Examples
name
arrayConcatLong
This section contains reference documentation for the arrayConcatLong function.
Concatenates two arrays of longs.
Signature
arrayConcatLong('colName1', 'colName2')
Usage Examples
This example assumes the multiValueTable columns mvCol1 and mvCol2 are both of type LONG with singleValueField in the table schema set to false.
min
This section contains reference documentation for the min function.
Get the minimum value in a group
Signature
MIN(colName)
Usage Examples
These examples are based on the .
value
MOD
This section contains reference documentation for the MOD function.
Modulo of two values
Signature
MOD(col1, col2)
Usage Examples
value
value
AVGMV
This section contains reference documentation for the AVGMV function.
Get the avg of values in a group
Signature
AVGMV(colName)
Usage Examples
These examples are based on the .
value
arrayContainsInt
This section contains reference documentation for the arrayContainsInt function.
Checks if int value exists in array.
Signature
arrayContainsInt('colName', valueToFind)
arrayDistinctInt
This section contains reference documentation for the arrayDistinctInt function.
Returns unique values in an array of ints.
Signature
arrayDistinctInt('colName')
arrayDistinctString
This section contains reference documentation for the arrayDistinctString function.
Returns unique values in an array of strings.
Signature
arrayDistinctString('colName')
arrayRemoveString
This section contains reference documentation for the arrayRemoveString function.
Removes value from array of strings.
Signature
arrayRemoveString('colName', value)
arraySortInt
This section contains reference documentation for the arraySortInt function.
Sorts array of ints.
Signature
arraySortInt('colName')
Base64
This section contains reference documentation for base64 encode and decode functions.
Encoding scheme follows
toBase64 returns Base64 encoded string of input binary data (bytes type).
fromBase64
dayOfYear
This section contains reference documentation for the dayOfYear function.
Returns the day of the year from the given epoch millis in UTC or specified timezone. The value ranges from 1 to 366.
Signature
dayOfYear(tsInMillis)
dayOfYear(tsInMillis, timeZoneId)
arrayIndexOfString
This section contains reference documentation for the arrayIndexOfString function.
Finds the last index of the given value in the array starting at the given index.
Signature
arrayIndexOfString('colName', valueToFind)
ARRAYLENGTH
This section contains reference documentation for the ARRAYLENGTH function.
Returns the length of a multi-value column
Signature
ARRAYLENGTH('colName')
arrayConcatInt
This section contains reference documentation for the arrayConcatInt function.
Concatenates two arrays of ints.
Signature
arrayConcatInt('colName1', 'colName2')
ceil
This section contains reference documentation for the CEIL function.
Rounded up to the nearest integer.
Signature
CEIL(col1)
DISTINCTCOUNTBITMAPMV
This section contains reference documentation for the DISTINCTCOUNTBITMAPMV function.
Returns the count of distinct row values in a group. This function is accurate for INT or dictionary encoded column, but approximate for other cases where hash codes are used in distinct counting and there may be hash collision.
Signature
DISTINCTCOUNTBITMAPMV(colName)
DISTINCTSUMMV
This section contains reference documentation for the DISTINCTSUMMV function.
Returns the sum of the distinct row values in a group
Signature
DISTINCTSUMMV(colName)
arrayRemoveInt
This section contains reference documentation for the arrayRemoveInt function.
Removes value from array of ints.
Signature
arrayRemoveInt('colName', value)
DISTINCTCOUNTRAWHLL
This section contains reference documentation for the DISTINCTCOUNTRAWHLL function.
Returns HLL response serialized as string. The serialized HLL can be converted back into an HLL and then aggregated with other HLLs. A common use case may be to merge HLL responses from different Pinot tables, or to allow aggregation after client-side batching.
Signature
DISTINCTCOUNTRAWHLL(colName, log2m)
FromEpochBucket
This section contains reference documentation for the fromEpochBucket functions.
Convert epoch to epoch milliseconds. e.g. 10 seconds since epoch or 5 minutes since Epoch. The following time units are supported:
SECONDS
MINUTES
count
This section contains reference documentation for the count function.
Get the count of rows in a group
Signature
COUNT(colName)
FromDateTime
This section contains reference documentation for the FromDateTime function.
Converts a formatted date-time string to milliseconds, based on the provided .
Signature
FromDateTime(dateTimeString, pattern)
DISTINCTCOUNTMV
This section contains reference documentation for the DISTINCTCOUNTMV function.
Returns the count of distinct row values in a group
Signature
DISTINCTCOUNTMV(colName)
ltrim
This section contains reference documentation for the ltrim function.
trim spaces from left side of the string
Signature
LTRIM(col)
concat
This section contains reference documentation for the concat function.
Concatenate two input strings using the seperator
Signature
CONCAT(col1, col2, seperator)
jsonextractkey
This section contains reference documentation for the JSONEXTRACTKEY function.
Extracts all matched JSON field keys based on 'jsonPath' into a STRING_ARRAY.
Signature
JSONEXTRACTKEY(jsonField, 'jsonPath')
Arguments
DISTINCTAVG
This section contains reference documentation for the DISTINCTAVG function.
Returns the average of distinct row values in a group
Signature
DISTINCTAVG(colName) or avg(distinct col)
MD5
This section contains reference documentation for the MD5 function.
Return MD5 digest of binary column(bytes type) as hex string
Signature
MD5(bytesCol)
Histogram
This section contains reference documentation for the HISTOGRAM function.
Returns the count of data points that fall within each bin as a vector. The bins are left-inclusive and right-exclusive, i.e. [a, b), except for the last one which is inclusive on both sides [a, b].
Signatures
ln
This section contains reference documentation for the ln function.
Natural log of value i.e. ln(col1)
Signature
LN(col1)
day
This section contains reference documentation for the day function.
Returns the day of the month from the given epoch millis in UTC or specified timezone. The value ranges from 1 to 31.
Signature
day(tsInMillis)
day(tsInMillis, timeZoneId)
DISTINCTAVGMV
This section contains reference documentation for the DISTINCTAVGMV function.
Returns the average of distinct row values in a group
Signature
DISTINCTAVGMV(colName)
JSONFORMAT
This section contains reference documentation for the JSONFORMAT function.
Extracts the object value from jsonField based on 'jsonPath', the result type is inferred based on JSON value. This function can only be used in an .
Signature
JSONFORMAT(object)
length
This section contains reference documentation for the length function.
calculate length of the string
Signature
LENGTH(col)
DISTINCTSUM
This section contains reference documentation for the DISTINCTSUM function.
Returns the sum of distinct row values in a group
Signature
DISTINCTSUM(colName) or sum(distinct col)
max
This section contains reference documentation for the max function.
Get the maximum value in a group
Signature
MAX(colName)
DISTINCTCOUNTRAWHLLMV
This section contains reference documentation for the DISTINCTCOUNTRAWHLLMV function.
Returns HLL response serialized as string. The serialized HLL can be converted back into an HLL and then aggregated with other HLLs. A common use case may be to merge HLL responses from different Pinot tables, or to allow aggregation after client-side batching.
Signature
DISTINCTCOUNTRAWHLLMV(colName, log2m)
MAXMV
This section contains reference documentation for the MAXMV function.
Get the maximum value in a group
Signature
MAXMV(colName)
DISTINCT
This section contains reference documentation for the DISTINCT function.
Returns the distinct row values in a group
Signature
DISTINCT(colName)
MINMAXRANGEMV
This section contains reference documentation for the MINMAXRANGEMV function.
Returns the max - min value in a group
Signature
MINMAXRANGEMV(colName)
hour
This section contains reference documentation for the hour function.
Returns the hour of the day from the given epoch millis in UTC or specified timezone. The value ranges from 0 to 23.
Signature
hour(tsInMillis)
hour(tsInMillis, timeZoneId)
dayOfWeek
This section contains reference documentation for the dayOfWeek function.
Returns the day of the week from the given epoch millis in UTC timezone. The value ranges from 1(Monday) to 7(Sunday).
Signature
dayOfWeek(tsInMillis)
dayOfWeek(tsInMillis, timeZoneId)
now
This section contains reference documentation for the now function.
Return current time as epoch millis.
Signature
now()
minmaxrange
This section contains reference documentation for the minmaxrange function.
returns binary data (represented as a Hex string) from Base64-encoded string.
Signature
toBase64(bytesCol)
fromBase64(stringCol)
Usage Examples
For better readability, the following examples converts string hello! into BYTES using toUtf8 function and converts the decoded BYTES into string using fromUtf8.
encoded
aGVsbG8h
decoded
hello!
Note that without UTF8 string conversion, returned BYTES will be represented as a Hex string following Pinot's BYTES column representation. See the example below.
decoded
68656c6c6f21
Note that the following query will throw compilation error as string is not a valid input type for toBase64.
select MINMAXRANGE(yearID) AS value
from baseballStats
select DivAirportIDs,
arrayIndexOfInt(DivAirportIDs, 14683) AS index
from airlineStats
WHERE arraylength(DivAirportIDs) >= 2
limit 5
select DivAirportIDs,
arrayReverseInt(DivAirportIDs) AS reversedIds
from airlineStats
WHERE arraylength(DivAirportIDs) >= 2
limit 5
select COUNTMV(DivTailNums) AS value
from airlineStats
where arraylength(DivTailNums) > 1
select DISTINCTCOUNTBITMAP(teamID) AS value
from baseballStats
select DivAirportIDs,
arrayContainsInt(DivAirportIDs, 14683) AS containsValue
from airlineStats
WHERE arraylength(DivAirportIDs) >= 2
limit 5
select DivAirportIDs,
arrayDistinctInt(DivAirportIDs) AS unique
from airlineStats
WHERE arraylength(DivAirportIDs) >= 2
limit 5
select DivTailNums,
arrayDistinctString(DivTailNums) AS unique
from airlineStats
WHERE arraylength(DivTailNums) >= 2
limit 5
select RandomAirports,
arrayRemoveString(RandomAirports, 'SEA') AS value
from airlineStats
WHERE arraylength(RandomAirports) BETWEEN 2 AND 4
limit 5
select DivAirportIDs,
arraySortInt(DivAirportIDs) AS sortedIds
from airlineStats
WHERE arraylength(DivAirportIDs) >= 2
limit 5
SELECT toBase64(toUtf8('hello!')) AS encoded
FROM ignoreMe
SELECT fromUtf8(fromBase64('aGVsbG8h')) AS decoded
FROM ignoreMe
SELECT fromBase64('aGVsbG8h') AS decoded
FROM ignoreMe
SELECT toBase64('hello!') AS encoded
FROM ignoreMe
select dayOfYear(1639351800000, 'CET') AS dayOfYear
FROM ignoreMe
select doy(1639351800000) AS dayOfYear
FROM ignoreMe
select doy(1639351800000, 'CET') AS dayOfYear
FROM ignoreMe
select ARRAYLENGTH(RandomAirports) AS length, count(*)
from airlineStats
GROUP BY length
ORDER BY count(*) DESC
LIMIT 5
select DivWheelsOffs,
arrayConcatInt(DivWheelsOffs, DivWheelsOns) AS concatIds
from airlineStats
WHERE arraylength(DivWheelsOffs) >= 2
limit 5
select DISTINCTCOUNTBITMAPMV(DivTailNums) AS value
from airlineStats
where arraylength(DivTailNums) > 1
select DivAirportIDs,
arrayRemoveInt(DivAirportIDs, 12892) AS value
from airlineStats
WHERE arraylength(DivAirportIDs) >= 2
AND arrayContainsInt(DivAirportIDs, 12892) = 1
limit 5
select DISTINCTCOUNTRAWHLL(teamID, 1) AS value
from baseballStats
select FromEpochSecondsBucket(1613472303, 1) AS bucket
FROM ignoreMe
select FromEpochSecondsBucket(1613472303, 2) AS bucket
FROM ignoreMe
select FromEpochMinutesBucket(2689120, 10) AS bucket
FROM ignoreMe
select FromEpochHoursBucket(89637, 5) AS bucket
FROM ignoreMe
select FromEpochDaysBucket(1867, 10) AS bucket
FROM ignoreMe
SELECT FromDateTime('2019-08-07', 'yyyy-MM-dd') AS epochMillis
FROM ignoreMe
SELECT FromDateTime(
'2019-08-07 3:12:13 PM',
'yyyy-MM-dd hh:mm:ss a'
) AS epochMillis
FROM ignoreMe
SELECT FromDateTime(
'2019-08-07T15:12:13',
'yyyy-MM-dd''T''HH:mm:ss'
) AS epochMillis
FROM ignoreMe
SELECT FromDateTime(
'2019-08-07T07:12:13-0800',
'yyyy-MM-dd''T''HH:mm:ssZ'
) AS epochMillis
FROM ignoreMe
select DISTINCTCOUNTMV(DivTailNums) AS value
from airlineStats
where arraylength(DivTailNums) > 1
SELECT AVG(DISTINCT AtBatting) AS VALUE
FROM baseballStats
SELECT HISTOGRAM(numberOfGames, 0, 200, 10) AS histogram
FROM baseballStats
select HISTOGRAM(AtBatting, Array['-Infinity', 1, 10, 50, 100, 500, 1000]) AS histogram
from baseballStats
select day(1639351800000, 'CET') AS day
FROM ignoreMe
select dayOfMonth(1639351800000) AS day
FROM ignoreMe
select dayOfMonth(1639351800000, 'CET') AS day
FROM ignoreMe
The usage examples are based on extracting fields from the following JSON documents:
Expression
Value
This function can be used in the to add northernHemisphere column:
DISTINCTCOUNTHLL
This section contains reference documentation for the DISTINCTCOUNTHLL function.
Returns an approximate distinct count using HyperLogLog. It also takes an optional second argument to configure the log2m for the HyperLogLog.
For accurate distinct counting, see DISTINCTCOUNT.
Signature
DISTINCTCOUNTHLL(colName, log2m)
Usage Examples
These examples are based on the .
value
value
millisecond
This section contains reference documentation for the millisecond function.
Returns the millisecond of the second from the given epoch millis in UTC or specified timezone. The value ranges from 0 to 999.
Signature
millisecond(tsInMillis)
millisecond(tsInMillis, timeZoneId)
Usage Examples
millisecond
millisecond
FromEpoch
This section contains reference documentation for the fromEpoch functions.
Convert epoch to epoch milliseconds. The following time units are supported:
SECONDS
MINUTES
HOURS
DAYS
Signature
FromEpoch<TIME_UNIT>(timeIn<Time_UNIT>)
Usage Examples
epochMillis
epochMillis
epochMillis
epochMillis
DISTINCTCOUNTHLLMV
This section contains reference documentation for the DISTINCTCOUNTHLLMV function.
Returns an approximate distinct count using HyperLogLog in a group
Signature
DISTINCTCOUNTHLLMV(colName)
Usage Examples
These examples are based on the .
value
month
This section contains reference documentation for the month function.
Returns the month of the year from the given epoch millis in UTC or specified timezone. The value ranges from 1 to 12.
Signature
month(tsInMillis)
month(tsInMillis, timeZoneId)
Usage Examples
month
month
lpad
This section contains reference documentation for the LPAD function.
string padded from the left side with pad to reach final size
Signature
LPAD(col, size, pad)
Usage Examples
value
percentileest
This section contains reference documentation for the percentileest function.
Returns the Nth percentile of the group using Quantile Digest algorithm.
Signature
percentileest(colName, percentile)
Usage Examples
These examples are based on the .
value
value
value
mult
This section contains reference documentation for the MULT function.
Product of at least two values
Signature
MULT(col1, col2, col3...)
Usage Examples
These examples are based on the .
homeRuns
baseOnBalls
total
percentilemv
This section contains reference documentation for the PERCENTILEMV function.
Returns the Nth percentile of the group where N is a decimal number between 0 and 100 inclusive
Signature
PERCENTILEMV(colName, N)
Usage Examples
These examples are based on the .
value
value
value
minute
This section contains reference documentation for the minute function.
Returns the minute of the hour from the given epoch millis in UTC or specified timezone. The value ranges from 0 to 59.
Signature
minute(tsInMillis)
minute(tsInMillis, timeZoneId)
Usage Examples
minute
minute
percentile
This section contains reference documentation for the percentile function.
Returns the max - min value in a group
Signature
percentile(colName, percentile)
Usage Examples
These examples are based on the .
value
value
value
MINMV
This section contains reference documentation for the MINMV function.
Get the minimum value in a group
Signature
MINMV(colName)
Usage Examples
These examples are based on the .
value
exp
This section contains reference documentation for the exp function.
Euler’s number(e) raised to the power of col.
Signature
EXP(col1)
Usage Examples
value
value
JSONPATHDOUBLE
This section contains reference documentation for the JSONPATHDOUBLE function.
Extracts the Double value from jsonField based on 'jsonPath', use optional defaultValuefor null or parsing error. This function can only be used in an .
Signature
DISTINCTCOUNTRAWTHETASKETCH
This section contains reference documentation for the DISTINCTCOUNTRAWTHETASKETCH function.
The framework enables set operations over a stream of data, and can also be used for cardinality estimation. Pinot leverages the and its extensions from the library org.apache.datasketches:datasketches-java:1.2.0-incubating to perform distinct counting as well as evaluating set operations.
Signature
isSubnetOf
This section contains reference documentation for the isSubnetOf function.
Takes 2 arguments of type STRING. The first argument is an ipPrefix, and the second argument is a single ipAddress. This function handles both IPv4 and IPv6 arguments.
Returns a boolean value checking if ipAddress is in the subnet of ipPrefix
JSONPATHLONG
This section contains reference documentation for the JSONPATHLONG function.
Extracts the Long value from jsonField based on 'jsonPath', use optional defaultValuefor null or parsing error. This function can only be used in an .
Signature
mode
This section contains reference documentation for the mode function.
Get the most frequent value in a group. When multiple modes are present it gives the minimum of all the modes. This behavior can be overridden to get the maximum or the average mode.
Signature
MODE(colName, [reducerType])
percentileestmv
This section contains reference documentation for the PERCENTILEESTMV function.
Returns the Nth percentile of the group using algorithm.
Signature
PERCENTILEESTMV(colName, N)
select id, repo, JSONEXTRACTKEY(repo, '$.*') AS keys
from githubEvents
WHERE id = 7044874109
select homeRuns, numberOfGames, DIV(homeRuns, numberOfGames) AS total
from baseballStats
WHERE teamID = 'ML1'
AND yearID = 1956
AND playerName = 'Henry Louis'
select homeRuns, baseOnBalls, MULT(homeRuns, baseOnBalls) AS total
from baseballStats
WHERE teamID = 'ML1'
AND yearID = 1956
AND playerName = 'Henry Louis'
thetaSketchColumn (required): Name of the column to aggregate on.
thetaSketchParams (required): Parameters for constructing the intermediate theta-sketches.
Currently, the only supported parameter is nominalEntries (defaults to 4096).
predicates (optional)_: _ These are individual predicates of form lhs <op> rhs which are applied on rows selected by the where clause. During intermediate sketch aggregation, sketches from the thetaSketchColumn that satisfies these predicates are unionized individually. For example, all filtered rows that match country=USA are unionized into a single sketch. Complex predicates that are created by combining (AND/OR) of individual predicates is supported.
postAggregationExpressionToEvaluate (required): The set operation to perform on the individual intermediate sketches for each of the predicates. Currently supported operations are SET_DIFF, SET_UNION, SET_INTERSECT , where DIFF requires two arguments and the UNION/INTERSECT allow more than two arguments.
See the following sample queries where isSubnetOf is used in different parts of the query.
SELECT isSubnetOf('192.168.0.1/24', '192.168.0.225')
AS result
FROM myTable;
---> returns true
SELECT isSubnetOf('1.2.3.128/26', '1.2.5.1')
AS result
FROM myTable;
---> returns false
SELECT isSubnetOf('2001:4800:7825:103::/64', '2001:4800:7825:103::2050')
AS result
FROM myTable;
---> returns true
SELECT isSubnetOf('7890:db8:113::8a2e:370:7334/127', '7890:db8:113::8a2e:370:7336')
AS result
FROM myTable;
---> returns false
select PERCENTILEESTMV(DivLongestGTimes, 50) AS value
from airlineStats
where arraylength(DivLongestGTimes) > 1
select DivTailNums,
arrayConcatString(DivTailNums, DivTailNums) AS concatIds
from airlineStats
WHERE arraylength(DivTailNums) >= 2
limit 5
select FlightNum,
arrayReverseString(RandomAirports) AS reversedAirports,
RandomAirports
from airlineStats
WHERE arraylength(RandomAirports) BETWEEN 2 AND 4
limit 5
select FlightNum,
arraySliceInt(DivAirportIDs, 0, 1) AS airports,
DivAirportIDs
from airlineStats
WHERE arraylength(DivAirportIDs) >= 2
limit 5
select FlightNum,
arraySliceString(RandomAirports, 0, 2) AS airports,
RandomAirports
from airlineStats
WHERE arraylength(RandomAirports) BETWEEN 2 AND 4
limit 5
select FlightNum,
arraySortString(RandomAirports) AS sortedAirports,
RandomAirports
from airlineStats
WHERE arraylength(RandomAirports) BETWEEN 2 AND 4
limit 5
select DivTailNums,
DivAirports,
arrayUnionString(DivTailNums, DivAirports) AS unionIds
from airlineStats
WHERE arraylength(DivTailNums) >= 2
limit 5
select DivWheelsOffs,
DivWheelsOns,
arrayUnionInt(DivWheelsOffs, DivWheelsOns) AS unionIds
from airlineStats
WHERE arraylength(DivWheelsOffs) >= 2
limit 5
select DISTINCTCOUNTHLL(teamID, 12) AS value
from baseballStats
select FromEpochSeconds(1613472303) AS epochMillis
FROM ignoreMe
select FromEpochMinutes(26891205) AS epochMillis
FROM ignoreMe
select FromEpochHours(448186) AS epochMillis
FROM ignoreMe
select FromEpochDays(18674) AS epochMillis
FROM ignoreMe
select percentileest(homeRuns, 80) AS value
from baseballStats
select percentileest(homeRuns, 99.9) AS value
from baseballStats
select PERCENTILEMV(DivLongestGTimes, 90) AS value
from airlineStats
where arraylength(DivLongestGTimes) > 1
select PERCENTILEMV(DivLongestGTimes, 99.9) AS value
from airlineStats
where arraylength(DivLongestGTimes) > 1
select percentile(homeRuns, 80) AS value
from baseballStats
select percentile(homeRuns, 99.9) AS value
from baseballStats
select distinctCountRawThetaSketch(teamID) AS value
from baseballStats
select distinctCountRawThetaSketch(teamID, 'nominalEntries=10') AS value
from baseballStats
select distinctCountRawThetaSketch(
yearID,
'nominalEntries=4096',
'teamID = ''SFN'' AND numberOfGames=28 AND homeRuns=1',
'teamID = ''CHN'' AND numberOfGames=28 AND homeRuns=1',
'SET_INTERSECT($1, $2)'
) AS value
from baseballStats
SELECT count(*)
FROM myTable
WHERE isSubnetOf('192.168.0.1/24', ipAddressCol);
SELECT count(*)
FROM myTable
WHERE isSubnetOf('192.168.0.1/24', ipAddressCol)
OR isSubnetOf(ipPrefixCol, '7890:db8:113::8a2e:370:7336');
SELECT
CASE
WHEN isSubnetOf('105.25.245.115/27', srcIPAddress) THEN 'case1'
WHEN isSubnetOf('105.25.245.115/27', dstIPAddress) THEN 'case2'
ELSE 'case3'
END AS differentFlow
FROM myTable;
select mode(yearID, 'AVG') AS value
from baseballStats
WHERE AtBatting != 0 AND yearID > 2001
select mode(yearID, 'MIN') AS value
from baseballStats
WHERE AtBatting != 0 AND yearID > 2001
select mode(yearID, 'MAX') AS value
from baseballStats
WHERE AtBatting != 0 AND yearID > 2001
select PERCENTILEESTMV(DivLongestGTimes, 90) AS value
from airlineStats
where arraylength(DivLongestGTimes) > 1
select PERCENTILEESTMV(DivLongestGTimes, 99.9) AS value
from airlineStats
where arraylength(DivLongestGTimes) > 1
'jsonPath'` is a literal. Pinot uses single quotes to distinguish them from identifiers.
You can use the Jayway JsonPath Evaluator Tool to test JSON expressions before you import any data.
Usage Examples
The usage examples are based on extracting fields from the following JSON document:
Expression
Value
JSONPATHDOUBLE(data, '$.age')
24.0
This function can be used in the table config to extract the age property into the age column, as described below:
'jsonPath'` is a literal. Pinot uses single quotes to distinguish them from identifiers.
You can use the Jayway JsonPath Evaluator Tool to test JSON expressions before you import any data.
Usage Examples
The usage examples are based on extracting fields from the following JSON document:
Expression
Value
JSONPATHLONG(data, '$.age')
24
This function can be used in the table config to extract the age property into the age column, as described below:
This section contains reference documentation for the JSONPATH function.
Extracts the object value from jsonField based on 'jsonPath', the result type is inferred based on JSON value. This function can only be used in an ingestion transformation function.
Signature
JSONPATH(jsonField, 'jsonPath')
Arguments
Description
'jsonPath'` is a literal. Pinot uses single quotes to distinguish them from identifiers.
You can use the to test JSON expressions before you import any data.
Usage Examples
The usage examples are based on extracting fields from the following JSON document:
Expression
Value
This function can be used in the to extract the name property into the name column and age property into the age column, as described below:
DATETRUNC
This section contains reference documentation for the DATETRUNC function.
(Presto) SQL compatible date truncation, equivalent to the Presto function date_trunc.
Converts the value into a specified output granularity seconds since UTC epoch that is bucketed on a unit in a specified timezone.
inputTimeUnitStr and outputTimeUnitStr support the following values:
NANOSECONDS
MICROSECONDS
MILLISECONDS
Usage Examples
Truncates an epoch in milliseconds at WEEK (where a Week starts at Monday UTC midnight):
or
ts
Truncates an epoch in milliseconds at WEEK (where a Week starts at Monday UTC midnight) in the UTC time zone, returning a result in epoch in seconds in UTC timezone:
ts
Truncates an epoch in milliseconds at WEEK (where a Week starts at Monday UTC midnight) in the CET time zone, returning a result in epoch in seconds in UTC timezone:
ts
Truncates an epoch in milliseconds at QUARTER in the Los Angeles time zone (where a Quarter begins on Jan 1st, April 1st, July 1st, October 1st in Los Angeles timezone), returning a result in hours since UTC epoch:
ts
JSONPATHARRAY
This section contains reference documentation for the JSONPATHARRAY function.
Extracts an array from jsonField based on 'jsonPath', the result type is inferred based on JSON value. This function can only be used in an ingestion transformation function.
Signature
JSONPATHARRAY(jsonField, 'jsonPath')
Arguments
Description
'jsonPath'` is a literal. Pinot uses single quotes to distinguish them from identifiers.
You can use the to test JSON expressions before you import any data.
Usage Examples
The usage examples are based on extracting fields from the following JSON document:
Expression
Value
This function can be used in the to extract the name, score, and second value of homework_grades into their respective columns , as described below:
DISTINCTCOUNTTHETASKETCH
This section contains reference documentation for the DISTINCTCOUNTTHETASKETCH function.
The Theta Sketch framework enables set operations over a stream of data, and can also be used for cardinality estimation. Pinot leverages the Sketch Class and its extensions from the library org.apache.datasketches:datasketches-java:1.2.0-incubating to perform distinct counting as well as evaluating set operations.
Signature
DistinctCountThetaSketch(<thetaSketchColumn>, <thetaSketchParams>, predicate1, predicate2..., postAggregationExpressionToEvaluate) -> Long
thetaSketchColumn (required): Name of the column to aggregate on.
thetaSketchParams (required): Parameters for constructing the intermediate theta-sketches.
Usage Examples
These examples are based on the .
value
value
We can also provide predicates and a post aggregation expression to compute more complicated cardinalities. For example, we could can find the intersection of the following queries:
yearID
yearID
(the yearId 1986 is the only one in common)
By running the following query:
value
jsonextractscalar
This section contains reference documentation for the JSONEXTRACTSCALAR function.
Evaluates the 'jsonPath' on jsonField, returns the result as the type 'resultsType', use optional defaultValuefor null or parsing error.
'jsonPath'and`` ``'results_type'are literals. Pinot uses single quotes to distinguish them from identifiers.
Usage Examples
The examples in this section are based on the . In particular we'll be querying the row WHERE id = 7044874109:
repo
The following examples show how to use the JSONEXTRACTSCALAR function:
id
name
id
name
JSONPATHSTRING
This section contains reference documentation for the JSONPATHSTRING function.
Extracts the String value from jsonField based on 'jsonPath', use optional defaultValuefor null or parsing error. This function can only be used in an .
Signature
JSONPATHARRAYDEFAULTEMPTY
This section contains reference documentation for the JSONPATHARRAYDEFAULTEMPTY function.
Extracts an array from jsonField based on 'jsonPath', the result type is inferred based on JSON value. Returns empty array for null or parsing error. This function can only be used in an .
Currently, the only supported parameter is nominalEntries (defaults to 4096).
predicates (optional)_: _ These are individual predicates of form lhs <op> rhs which are applied on rows selected by the where clause. During intermediate sketch aggregation, sketches from the thetaSketchColumn that satisfies these predicates are unionized individually. For example, all filtered rows that match country=USA are unionized into a single sketch. Complex predicates that are created by combining (AND/OR) of individual predicates is supported.
postAggregationExpressionToEvaluate (required): The set operation to perform on the individual intermediate sketches for each of the predicates. Currently supported operations are SET_DIFF, SET_UNION, SET_INTERSECT , where DIFF requires two arguments and the UNION/INTERSECT allow more than two arguments.
select distinctCountThetaSketch(teamID) AS value
from baseballStats
select distinctCountThetaSketch(teamID, 'nominalEntries=10') AS value
from baseballStats
select yearID
from baseballStats
where teamID = 'SFN' AND numberOfGames = 28 AND homeRuns = 1
select yearID
from baseballStats
where teamID = 'CHN' AND numberOfGames = 28 AND homeRuns = 1
select distinctCountThetaSketch(
yearID,
'nominalEntries=4096',
'teamID = ''SFN'' AND numberOfGames=28 AND homeRuns=1',
'teamID = ''CHN'' AND numberOfGames=28 AND homeRuns=1',
'SET_INTERSECT($1, $2)'
) AS value
from baseballStats
select repo
from githubEvents
WHERE id = 7044874109
select id, jsonextractscalar(repo, '$.name', 'STRING') AS name
from githubEvents
WHERE id = 7044874109
select id, jsonextractscalar(repo, '$.foo', 'STRING') AS name
from githubEvents
WHERE id = 7044874109
'jsonPath'` is a literal. Pinot uses single quotes to distinguish them from identifiers.
You can use the JSONPath Online Evaluator to test JSON expressions before you import any data.
Usage Examples
The usage examples are based on extracting fields from the following JSON document:
Expression
Value
JSONPATHSTRING(data, '$.age')
"24"
JSONPATHSTRING(data, '$.name["nick.name"]')
"Pete"
This function can be used in the table config to extract the age property into the age column, as described below:
'jsonPath'` is a literal. Pinot uses single quotes to distinguish them from identifiers.
You can use the Jayway JsonPath Evaluator Tool to test JSON expressions before you import any data.
Usage Examples
The usage examples are based on extracting fields from the following JSON document:
This function can be used in the table config to extract the name, score, and second value of homework_grades into their respective columns , as described below:
created_at_timestamp from milliseconds since epoch to days since epoch, bucketed to 1 day granularity:
id
created_at_timestamp
timeInMs
convertedTime
7044874134
2018-01-01 11:00:00.0
1514804402000
17532
created_at_timestamp bucketed to 15 minutes granularity:
id
created_at_timestamp
timeInMs
convertedTime
7044874134
2018-01-01 11:00:00.0
1514804402000
1514804400000
created_at_timestamp to format yyyy-MM-dd, bucketed to 1 days granularity:
id
created_at_timestamp
timeInMs
convertedTime
7044874134
2018-01-01 11:00:00.0
1514804402000
2018-01-01
created_at_timestamp to format yyyy-MM-dd HH:mm, in timezone Pacific/Kiritimati:
id
created_at_timestamp
timeInMs
convertedTime
7044874134
2018-01-01 11:00:00.0
1514804402000
2018-01-02 01:00
created_at_timestamp to format yyyy-MM-dd, in timezone Pacific/Kiritimati and bucketed to 1 day granularity:
id
created_at_timestamp
timeInMs
convertedTime
7044874134
2018-01-01 11:00:00.0
1514804402000
2018-01-02 00:00
select id,
created_at_timestamp,
cast(created_at_timestamp AS long) AS timeInMs,
DATETIMECONVERT(
created_at_timestamp,
'1:MILLISECONDS:EPOCH',
'1:DAYS:EPOCH',
'1:DAYS'
) AS convertedTime
from githubEvents
WHERE id = 7044874134
select id,
created_at_timestamp,
cast(created_at_timestamp AS long) AS timeInMs,
DATETIMECONVERT(
created_at_timestamp,
'1:MILLISECONDS:EPOCH',
'1:MILLISECONDS:EPOCH',
'15:MINUTES'
) AS convertedTime
from githubEvents
WHERE id = 7044874134
select id,
created_at_timestamp,
cast(created_at_timestamp AS long) AS timeInMs,
DATETIMECONVERT(
created_at_timestamp,
'1:MILLISECONDS:EPOCH',
'1:DAYS:SIMPLE_DATE_FORMAT:yyyy-MM-dd',
'1:DAYS'
) AS convertedTime
from githubEvents
WHERE id = 7044874134
select id,
created_at_timestamp,
cast(created_at_timestamp AS long) AS timeInMs,
DATETIMECONVERT(
created_at_timestamp,
'1:MILLISECONDS:EPOCH',
'1:MILLISECONDS:SIMPLE_DATE_FORMAT:yyyy-MM-dd HH:mm tz(Pacific/Kiritimati)',
'1:MILLISECONDS'
) AS convertedTime
from githubEvents
WHERE id = 7044874134
select id,
created_at_timestamp,
cast(created_at_timestamp AS long) AS timeInMs,
DATETIMECONVERT(
created_at_timestamp,
'1:MILLISECONDS:EPOCH',
'1:MILLISECONDS:SIMPLE_DATE_FORMAT:yyyy-MM-dd HH:mm tz(Pacific/Kiritimati)',
'1:DAYS'
) AS convertedTime
from githubEvents
WHERE id = 7044874134
SIMPLE_DATE_FORMAT
e.g.
yyyy-MM-dd
. A specific timezone can be passed using
tz(timezone)
. Timezone can be long or short string format timezone. e.g.
Asia/Kolkata
or
PDT
FUNNELCOUNT
This section contains reference documentation for the FUNNELCOUNT function.
Funnel analytics aggregation function.
Returns array of distinct correlated counts for each funnel step.
Signature
FUNNEL_COUNT (
STEPS ( predicate1, predicate2 ... ),
CORRELATED_BY ( correlation_column ),
SETTINGS ( setting1, setting2 ... ) )
Parameter
Arguments
Description
Usage Examples
Many datasets are time series in nature, tracking events of an entity over time. An example of such a dataset could be a user analytics activity log from a commerce web application.
Example
user_id
event_time
url
Funnel
We want to analyse the following checkout funnel:
/cart/add
/checkout/start
/checkout/confirmation
Counts
We want to answer the following questions about the above funnel:
How many users entered the top of the funnel?
How many of these users proceeded to the second step?
How many users reached the bottom of the funnel after completing all steps?
Query
counts
Notes
Notice that although U1 user added to cart twice, it still counted as one conversion in the first step, as we report on unique counts rather than total events. Also notice that although U2 events were logged out of order, we still counted the user as converted.
Equivalence
The above query is equivalent to the below presto SQL query:
Settings
For a large dataset we could use for example a theta_sketch strategy, or furthermore, partition the data by user_id and leverage a partitioned strategy. It is also important to filter in the where clause so to aggregate only necessary rows.
counts
Another Example
We now want to learn how many users checkout after a text search; as opposed to other entry points such as browsing a product category listing. We want to then analyse the following funnel:
/product/search
/cart/add
/checkout/start
Query
counts
Notes
Notice that U1 is not counted in this funnel, as the user did not perform any product search. Both U2 and U3 entered the top of the funnel and performed the second step, but only U2 converted to the bottom of the funnel.
2021-10-01 09:47:00.000
/cart/add
U3
2021-10-01 10:02:00.000
/product/listing
U3
2021-10-01 10:05:00.000
/product/search
U2
2021-10-01 10:06:00.000
/product/search
U2
2021-10-01 10:15:00.000
/checkout/start
U2
2021-10-01 10:16:00.000
/cart/add
U3
2021-10-01 11:17:00.000
/product/details
U2
2021-10-01 11:18:00.000
/checkout/confirmation
U3
2021-10-01 11:21:00.000
/cart/add
U1
2021-10-01 11:33:00.000
/cart/add
U1
2021-10-01 11:46:00.000
/checkout/start
U1
2021-10-01 11:54:00.000
/checkout/confirmation
/checkout/confirmation
STEPS
predicates 1...n
(required) These are individual predicates representing funnel steps which are applied on rows selected by the where clause. Distinct values from the correlation_column that satisfy these predicates are counted per step. For example, all filtered rows that match url = '/checkout' are unionized into a set. Sets are intersected with the sets resulted from the preceding steps, each step retaining only individuals present in previous steps. Finally, unique counts are returned for each step in the funnel.
CORRELATED_BY
correlation_column
(required) Column to leverage for funnel correlation, distinct values from this column are counted per step during aggregation. Only dictionary-encoded columns are supported.
SETTINGS
settings 1...n
(optional) Settings to select and configure a funnel counting strategy:
bitmap (default): This strategy is accurate for INT column, but approximate for other cases where hash codes are used in distinct counting and there may be hash collisions. For accurate distinct counting on all column types, use 'set' instead. See also DISTINCTCOUNTBITMAP.
set: This strategy uses fastutil hash sets. Use with care, unbounded memory cost. See also DISTINCTCOUNT.
theta_sketch: This strategy leverages Theta Sketch framework to provide an approximate funnel count with a small memory footprint. See also DISTINCTCOUNTTHETASKETCH.
nominalEntries: theta-sketch strategy parameter (defaults to 4096). Can only be used in conjunction with theta_sketch setting.
partitioned: This strategy counts funnel steps per segment, then sums up step counts across segments. Correlation column should be configured as partition column for this strategy. See also .
sorted: This strategy counts funnel steps per segment with zero memory footprint. Correlation column should be configured as sort column for this strategy. Can only be used in conjunction with partitioned setting.
U1
2021-10-01 09:01:00.000
/product/listing
U2
2021-10-01 09:17:00.000
/product/search
U1
2021-10-01 09:33:00.000
/product/details
3, 2, 2
3, 2, 2
2, 2, 1, 1
U1
select
FUNNEL_COUNT(
STEPS(
url = '/cart/add',
url = '/checkout/start',
url = '/checkout/confirmation'),
CORRELATED_BY(user_id)
) AS counts
from user_log
select
ARRAY[
count_if(steps[1]),
count_if(steps[1] and steps[2]),
count_if(steps[1] and steps[2] and steps[3])
] as counts
from (
select
ARRAY[
bool_or(url = '/cart/add'),
bool_or(url = '/checkout/start'),
bool_or(url = '/checkout/confirmation')
] as steps
from user_log
group by user_id
)
select
FUNNEL_COUNT(
STEPS(
url = '/cart/add',
url = '/checkout/start',
url = '/checkout/confirmation'),
CORRELATED_BY(user_id),
SETTINGS('theta_sketch', 'nominalEntries=4096')
) AS counts
from user_log
where url in ('/cart/add', '/checkout/start', '/checkout/confirmation')