This section contains reference documentation for the ADD function.
Sum of at least two values
Signature
ADD(col1, col2, col3...)
Usage Examples
These examples are based on the .
homeRuns
baseOnBalls
total
arrayConcatLong
This section contains reference documentation for the arrayConcatLong function.
Concatenates two arrays of longs.
Signature
arrayConcatLong('colName1', 'colName2')
Usage Examples
This example assumes the multiValueTable columns mvCol1 and mvCol2 are both of type LONG with singleValueField in the table schema set to false.
arraySliceInt
This section contains reference documentation for the arraySliceInt function.
Returns the values in the array between the start and end positions.
Signature
arraySliceInt('colName', start, end)
arrayIndexOfString
This section contains reference documentation for the arrayIndexOfString function.
Finds the last index of the given value in the array starting at the given index.
Signature
arrayIndexOfString('colName', valueToFind)
arrayDistinctInt
This section contains reference documentation for the arrayDistinctInt function.
Returns unique values in an array of ints.
Signature
arrayDistinctInt('colName')
arraySliceString
This section contains reference documentation for the arraySliceString function.
Returns the values in the array between the start and end positions.
Signature
arraySliceString('colName', start, end)
count
This section contains reference documentation for the count function.
Get the count of rows in a group
Signature
COUNT(colName)
ceil
This section contains reference documentation for the CEIL function.
Rounded up to the nearest integer.
Signature
CEIL(col1)
select homeRuns, baseOnBalls, ADD(homeRuns, baseOnBalls) AS total
from baseballStats
WHERE teamID = 'ML1'
AND yearID = 1956
AND playerName = 'Henry Louis'
This section contains reference documentation for the now function.
Return current time as epoch millis.
Signature
now()
Usage Examples
now
This function is typically used in predicate to filter on timestamp for recent data. e.g. filter data on recent 1 day(86400 seconds)
AVGMV
This section contains reference documentation for the AVGMV function.
Get the avg of values in a group
Signature
AVGMV(colName)
Usage Examples
These examples are based on the .
value
DISTINCTSUM
This section contains reference documentation for the DISTINCTSUM function.
Returns the sum of distinct row values in a group
Signature
DISTINCTSUM(colName) or sum(distinct col)
Usage Examples
These examples are based on the .
VALUE
VALUE
MINMAXRANGEMV
This section contains reference documentation for the MINMAXRANGEMV function.
Returns the max - min value in a group
Signature
MINMAXRANGEMV(colName)
Usage Examples
These examples are based on the .
value
FromEpochBucket
This section contains reference documentation for the fromEpochBucket functions.
Convert epoch to epoch milliseconds. e.g. 10 seconds since epoch or 5 minutes since Epoch. The following time units are supported:
SECONDS
MINUTES
arrayDistinctString
This section contains reference documentation for the arrayDistinctString function.
Returns unique values in an array of strings.
Signature
arrayDistinctString('colName')
DISTINCTCOUNTMV
This section contains reference documentation for the DISTINCTCOUNTMV function.
Returns the count of distinct row values in a group
Signature
DISTINCTCOUNTMV(colName)
arrayRemoveString
This section contains reference documentation for the arrayRemoveString function.
Removes value from array of strings.
Signature
arrayRemoveString('colName', value)
lpad
This section contains reference documentation for the LPAD function.
string padded from the left side with pad to reach final size
Signature
LPAD(col, size, pad)
ln
This section contains reference documentation for the ln function.
Natural log of value i.e. ln(col1)
Signature
LN(col1)
select FlightNum,
arraySliceInt(DivAirportIDs, 0, 1) AS airports,
DivAirportIDs
from airlineStats
WHERE arraylength(DivAirportIDs) >= 2
limit 5
select FlightNum,
arraySliceString(RandomAirports, 0, 2) AS airports,
RandomAirports
from airlineStats
WHERE arraylength(RandomAirports) BETWEEN 2 AND 4
limit 5
24
13198
13198,10721
548
10721
10721,12478
424
SEA,PSC
SEA,PSC,PHX,MSY
3162
SEA,PSC
SEA,PSC,PHX,MSY
select now() AS now
FROM ignoreMe
1639150454255
SELECT *
FROM tableName
WHERE tsInMillis > now() - 86400000
select AVGMV(DivLongestGTimes) AS value
from airlineStats
where arraylength(DivLongestGTimes) > 1
toBase64 returns Base64 encoded string of input binary data (bytes type).
fromBase64 returns binary data (represented as a Hex string) from Base64-encoded string.
Signature
toBase64(bytesCol)
fromBase64(stringCol)
Usage Examples
For better readability, the following examples converts string hello! into BYTES using function and converts the decoded BYTES into string using .
encoded
decoded
Note that without UTF8 string conversion, returned BYTES will be represented as a Hex string following Pinot's . See the example below.
decoded
Note that the following query will throw compilation error as string is not a valid input type for toBase64.
arrayUnionString
This section contains reference documentation for the arrayUnionString function.
Create a union of two arrays of strings.
Signature
arrayUnionString('colName1', 'colName2')
Usage Examples
These examples are based on the .
DivTailNums
DivAirports
unionIds
MOD
This section contains reference documentation for the MOD function.
Modulo of two values
Signature
MOD(col1, col2)
Usage Examples
value
value
DISTINCTAVG
This section contains reference documentation for the DISTINCTAVG function.
Returns the average of distinct row values in a group
Signature
DISTINCTAVG(colName) or avg(distinct col)
COVAR_SAMP
This section contains reference documentation for the COVAR_SAMP function.
Returns the sample covariance between of 2 numerical columns.
Signatures
COVAR_SAMP(col1, col2) -> double
DISTINCTCOUNTBITMAP
This section contains reference documentation for the DISTINCTCOUNTBITMAP function.
Returns the count of distinct row values in a group. This function is accurate for INT column, but approximate for other cases where hash codes are used in distinct counting and there may be hash collisions.
For accurate distinct counting on all column types, see .
Signature
DISTINCTCOUNTBITMAP(colName)
DISTINCTCOUNTRAWHLL
This section contains reference documentation for the DISTINCTCOUNTRAWHLL function.
Returns HLL response serialized as string. The serialized HLL can be converted back into an HLL and then aggregated with other HLLs. A common use case may be to merge HLL responses from different Pinot tables, or to allow aggregation after client-side batching.
Signature
DISTINCTCOUNTRAWHLL(colName, log2m)
arrayUnionInt
This section contains reference documentation for the arrayUnionInt function.
Create a union of two arrays of ints.
Signature
arrayUnionInt('colName1', 'colName2')
ago
This section contains reference documentation for the ago function.
Return time as epoch millis before the given period (in ISO-8601 duration format).
Examples:
"PT20.345S" -- parses as "20.345 seconds"
"PT15M" -- parses as "15 minutes" (where a minute is 60 seconds)
minute
This section contains reference documentation for the minute function.
Returns the minute of the hour from the given epoch millis in UTC or specified timezone. The value ranges from 0 to 59.
Signature
minute(tsInMillis)
minute(tsInMillis, timeZoneId)
percentile
This section contains reference documentation for the percentile function.
Returns the max - min value in a group
Signature
percentile(colName, percentile)
FromDateTime
This section contains reference documentation for the FromDateTime function.
Converts a formatted date-time string to milliseconds, based on the provided .
Signature
FromDateTime(dateTimeString, pattern)
arrayConcatString
This section contains reference documentation for the arrayConcatString function.
Concatenates two arrays of strings.
Signature
arrayConcatString('colName1', 'colName2')
mode
This section contains reference documentation for the mode function.
Get the most frequent value in a group. When multiple modes are present it gives the minimum of all the modes. This behavior can be overridden to get the maximum or the average mode.
Signature
MODE(colName, [reducerType])
DISTINCTCOUNT
This section contains reference documentation for the DISTINCTCOUNT function.
Returns the count of distinct row values in a group
Signature
DISTINCTCOUNT(colName)
MAXMV
This section contains reference documentation for the MAXMV function.
Get the maximum value in a group
Signature
MAXMV(colName)
SELECT DISTINCTSUMMV(DivLongestGTimes) AS VALUE
FROM airlineStats
WHERE arraylength(DivLongestGTimes) > 1
This section contains reference documentation for the DIV function.
Quotient of two values
Signature
DIV(col1, col2)
Usage Examples
These examples are based on the .
homeRuns
numberOfGames
total
month
This section contains reference documentation for the month function.
Returns the month of the year from the given epoch millis in UTC or specified timezone. The value ranges from 1 to 12.
Signature
month(tsInMillis)
month(tsInMillis, timeZoneId)
Usage Examples
month
month
DISTINCT
This section contains reference documentation for the DISTINCT function.
Returns the distinct row values in a group
Signature
DISTINCT(colName)
Usage Examples
These examples are based on the .
value
value
DISTINCTCOUNTRAWHLLMV
This section contains reference documentation for the DISTINCTCOUNTRAWHLLMV function.
Returns HLL response serialized as string. The serialized HLL can be converted back into an HLL and then aggregated with other HLLs. A common use case may be to merge HLL responses from different Pinot tables, or to allow aggregation after client-side batching.
Signature
DISTINCTCOUNTRAWHLLMV(colName, log2m)
Usage Examples
These examples are based on the .
value
value
mult
This section contains reference documentation for the MULT function.
Product of at least two values
Signature
MULT(col1, col2, col3...)
Usage Examples
These examples are based on the .
homeRuns
baseOnBalls
total
quarter
This section contains reference documentation for the quarter function.
Returns the quarter of the year from the given epoch millis in UTC or specified timezone. The value ranges from 1 to 4
Signature
quarter(tsInMillis)
quarter(tsInMillis, timeZoneId)
Usage Examples
quarter
quarter
quarter
max
This section contains reference documentation for the max function.
Get the maximum value in a group
Signature
MAX(colName)
Usage Examples
These examples are based on the .
value
percentileest
This section contains reference documentation for the percentileest function.
Returns the Nth percentile of the group using Quantile Digest algorithm.
Signature
percentileest(colName, percentile)
Usage Examples
These examples are based on the .
value
value
value
millisecond
This section contains reference documentation for the millisecond function.
Returns the millisecond of the second from the given epoch millis in UTC or specified timezone. The value ranges from 0 to 999.
Signature
millisecond(tsInMillis)
millisecond(tsInMillis, timeZoneId)
Usage Examples
millisecond
millisecond
MD5
This section contains reference documentation for the MD5 function.
Return MD5 digest of binary column(bytes type) as hex string
Signature
MD5(bytesCol)
Usage Examples
These examples are based on the .
event_id
location
hash
The row returned will be different if you run this example as the data is ingested in real-time.
CHR
This section contains reference documentation for the CHR function.
the character corresponding to the Unicode codepoint
Signature
CHR(codepoint)
Usage Examples
value
JSONPATHSTRING
This section contains reference documentation for the JSONPATHSTRING function.
Extracts the String value from jsonField based on 'jsonPath', use optional defaultValuefor null or parsing error. This function can only be used in an .
Signature
arraySortString
This section contains reference documentation for the arraySortString function.
Sorts array of strings.
Signature
arraySortString('colName')
FromEpoch
This section contains reference documentation for the fromEpoch functions.
Convert epoch to epoch milliseconds. The following time units are supported:
SECONDS
MINUTES
Histogram
This section contains reference documentation for the HISTOGRAM function.
Returns the count of data points that fall within each bin as a vector. The bins are left-inclusive and right-exclusive, i.e. [a, b), except for the last one which is inclusive on both sides [a, b].
Signatures
select DivWheelsOffs,
DivWheelsOns,
arrayUnionInt(DivWheelsOffs, DivWheelsOns) AS unionIds
from airlineStats
WHERE arraylength(DivWheelsOffs) >= 2
limit 5
1908,1758
1339,2310
1908,1758,1339,2310
select homeRuns, numberOfGames, DIV(homeRuns, numberOfGames) AS total
from baseballStats
WHERE teamID = 'ML1'
AND yearID = 1956
AND playerName = 'Henry Louis'
select homeRuns, baseOnBalls, MULT(homeRuns, baseOnBalls) AS total
from baseballStats
WHERE teamID = 'ML1'
AND yearID = 1956
AND playerName = 'Henry Louis'
'jsonPath'` is a literal. Pinot uses single quotes to distinguish them from identifiers.
You can use the Jayway JsonPath Evaluator Tool to test JSON expressions before you import any data.
Usage Examples
The usage examples are based on extracting fields from the following JSON document:
Expression
Value
JSONPATHSTRING(data, '$.age')
"24"
JSONPATHSTRING(data, '$.name["nick.name"]')
"Pete"
This function can be used in the table config to extract the age property into the age column, as described below:
This section contains reference documentation for the JSONPATHARRAYDEFAULTEMPTY function.
Extracts an array from jsonField based on 'jsonPath', the result type is inferred based on JSON value. Returns empty array for null or parsing error. This function can only be used in an ingestion transformation function.
Signature
JSONPATHARRAYDEFAULTEMPTY(jsonField, 'jsonPath')
Arguments
Description
'jsonPath'` is a literal. Pinot uses single quotes to distinguish them from identifiers.
You can use the to test JSON expressions before you import any data.
Usage Examples
The usage examples are based on extracting fields from the following JSON document:
Expression
Value
This function can be used in the to extract the name, score, and second value of homework_grades into their respective columns , as described below:
JSONPATHLONG
This section contains reference documentation for the JSONPATHLONG function.
Extracts the Long value from jsonField based on 'jsonPath', use optional defaultValuefor null or parsing error. This function can only be used in an ingestion transformation function.
'jsonPath'` is a literal. Pinot uses single quotes to distinguish them from identifiers.
You can use the to test JSON expressions before you import any data.
Usage Examples
The usage examples are based on extracting fields from the following JSON document:
Expression
Value
This function can be used in the to extract the age property into the age column, as described below:
DISTINCTCOUNTHLLMV
This section contains reference documentation for the DISTINCTCOUNTHLLMV function.
Returns an approximate distinct count using HyperLogLog in a group
Signature
DISTINCTCOUNTHLLMV(colName)
Usage Examples
These examples are based on the .
value
arrayContainsInt
This section contains reference documentation for the arrayContainsInt function.
Checks if int value exists in array.
Signature
arrayContainsInt('colName', valueToFind)
Usage Examples
These examples are based on the .
DivAirportIDs
containsValue
ABS
This section contains reference documentation for the abs function.
Absolute of a value
Signature
ABS(col1)
Usage Examples
value
value
JSONPATH
This section contains reference documentation for the JSONPATH function.
Extracts the object value from jsonField based on 'jsonPath', the result type is inferred based on JSON value. This function can only be used in an .
Signature
JSONPATH(jsonField, 'jsonPath')
JSONPATHDOUBLE
This section contains reference documentation for the JSONPATHDOUBLE function.
Extracts the Double value from jsonField based on 'jsonPath', use optional defaultValuefor null or parsing error. This function can only be used in an .
Signature
arrayConcatFloat
This section contains reference documentation for the arrayConcatFloat function.
select FlightNum,
arraySortString(RandomAirports) AS sortedAirports,
RandomAirports
from airlineStats
WHERE arraylength(RandomAirports) BETWEEN 2 AND 4
limit 5
select DivAirportIDs,
arrayContainsInt(DivAirportIDs, 14683) AS containsValue
from airlineStats
WHERE arraylength(DivAirportIDs) >= 2
limit 5
Arguments
Description
jsonField
An Identifier/Expression contains JSON documents.
'jsonPath'
Follows to read values from JSON documents.
'jsonPath'` is a literal. Pinot uses single quotes to distinguish them from identifiers.
You can use the Jayway JsonPath Evaluator Tool to test JSON expressions before you import any data.
Usage Examples
The usage examples are based on extracting fields from the following JSON document:
Expression
Value
JSONPATH(data, '$.name')
"Pete"
JSONPATH(data, '$.age')
24
This function can be used in the table config to extract the name property into the name column and age property into the age column, as described below:
'jsonPath'` is a literal. Pinot uses single quotes to distinguish them from identifiers.
You can use the Jayway JsonPath Evaluator Tool to test JSON expressions before you import any data.
Usage Examples
The usage examples are based on extracting fields from the following JSON document:
Expression
Value
JSONPATHDOUBLE(data, '$.age')
24.0
This function can be used in the table config to extract the age property into the age column, as described below:
This section contains reference documentation for the day function.
Returns the day of the month from the given epoch millis in UTC or specified timezone. The value ranges from 1 to 31.
Signature
day(tsInMillis)
day(tsInMillis, timeZoneId)
dayOfMonth(tsInMillis)
dayOfMonth(tsInMillis, timeZoneId)
Usage Examples
day
day
day
day
JSONFORMAT
This section contains reference documentation for the JSONFORMAT function.
Extracts the object value from jsonField based on 'jsonPath', the result type is inferred based on JSON value. This function can only be used in an ingestion transformation function.
Signature
JSONFORMAT(object)
Usage Examples
The usage examples are based on extracting fields from the following JSON document:
Expression
Value
This function can be used in the to extract the meta property into the data column, as described below:
DISTINCTCOUNTHLL
This section contains reference documentation for the DISTINCTCOUNTHLL function.
Returns an approximate distinct count using HyperLogLog. It also takes an optional second argument to configure the log2m for the HyperLogLog.
For accurate distinct counting, see DISTINCTCOUNT.
Signature
DISTINCTCOUNTHLL(colName, log2m)
Usage Examples
These examples are based on the .
value
value
arrayConcatInt
This section contains reference documentation for the arrayConcatInt function.
Concatenates two arrays of ints.
Signature
arrayConcatInt('colName1', 'colName2')
lower
This section contains reference documentation for the lower function.
Converts string to lower case.
Signature
LOWER(col)
arrayReverseString
This section contains reference documentation for the arrayReverseString function.
Reverses array of strings.
Signature
arrayReverseString('colName')
percentiletdigestmv
This section contains reference documentation for the PERCENTILETDIGESTMV function.
Returns the Nth percentile of the group using .
Signature
PERCENTILETDIGESTMV(colName, N)
ltrim
This section contains reference documentation for the ltrim function.
trim spaces from left side of the string
Signature
LTRIM(col)
arrayContainsString
This section contains reference documentation for the arrayContainsString function.
Checks if string value exists in array.
Signature
arrayContainsString('colName', valueToFind)
exp
This section contains reference documentation for the exp function.
Euler’s number(e) raised to the power of col.
Signature
EXP(col1)
DISTINCTCOUNTBITMAPMV
This section contains reference documentation for the DISTINCTCOUNTBITMAPMV function.
Returns the count of distinct row values in a group. This function is accurate for INT or dictionary encoded column, but approximate for other cases where hash codes are used in distinct counting and there may be hash collision.
Signature
DISTINCTCOUNTBITMAPMV(colName)
codepoint
This section contains reference documentation for the CODEPOINT function.
the Unicode codepoint of the first character of the string
This section contains reference documentation for the arrayConcatDouble function.
Concatenates two arrays of doubles.
Signature
arrayConcatDouble('colName1', 'colName2')
Usage Examples
This example assumes the multiValueTable columns mvCol1 and mvCol2 are both of type DOUBLE with singleValueField in the table schema set to false.
dayOfYear
This section contains reference documentation for the dayOfYear function.
Returns the day of the year from the given epoch millis in UTC or specified timezone. The value ranges from 1 to 366.
Signature
dayOfYear(tsInMillis)
dayOfYear(tsInMillis, timeZoneId)
doy(tsInMillis)
doy(tsInMillis, timeZoneId)
Usage Examples
dayOfYear
dayOfYear
dayOfYear
dayOfYear
arrayRemoveInt
This section contains reference documentation for the arrayRemoveInt function.
Removes value from array of ints.
Signature
arrayRemoveInt('colName', value)
DISTINCTAVGMV
This section contains reference documentation for the DISTINCTAVGMV function.
Returns the average of distinct row values in a group
Signature
DISTINCTAVGMV(colName)
percentileestmv
This section contains reference documentation for the PERCENTILEESTMV function.
Returns the Nth percentile of the group using algorithm.
Signature
PERCENTILEESTMV(colName, N)
min
This section contains reference documentation for the min function.
Get the minimum value in a group
Signature
MIN(colName)
select FlightNum,
arrayReverseString(RandomAirports) AS reversedAirports,
RandomAirports
from airlineStats
WHERE arraylength(RandomAirports) BETWEEN 2 AND 4
limit 5
1023
PHX,PSC,SEA
SEA,PSC,PHX
963
MSY,PHX,PSC,SEA
SEA,PSC,PHX,MSY
select mvCol1,
arrayConcatDouble(mvCol1, mvCol2) AS concatDoubles
from multiValueTable
WHERE arraylength(mvCol1) >= 2
limit 5
select dayOfYear(1639351800000) AS dayOfYear
FROM ignoreMe
select dayOfYear(1639351800000, 'CET') AS dayOfYear
FROM ignoreMe
select doy(1639351800000) AS dayOfYear
FROM ignoreMe
select doy(1639351800000, 'CET') AS dayOfYear
FROM ignoreMe
select DivAirportIDs,
arrayRemoveInt(DivAirportIDs, 12892) AS value
from airlineStats
WHERE arraylength(DivAirportIDs) >= 2
AND arrayContainsInt(DivAirportIDs, 12892) = 1
limit 5
select PERCENTILEESTMV(DivLongestGTimes, 90) AS value
from airlineStats
where arraylength(DivLongestGTimes) > 1
select PERCENTILEESTMV(DivLongestGTimes, 99.9) AS value
from airlineStats
where arraylength(DivLongestGTimes) > 1
ARRAYLENGTH
This section contains reference documentation for the ARRAYLENGTH function.
Returns the length of a multi-value column
Signature
ARRAYLENGTH('colName')
Usage Examples
These examples are based on the .
length
count(*)
The count(*) values will increase each time we execute the query as data is constantly being ingested by the Hybrid Quick Start.
COVAR_POP
This section contains reference documentation for the COVAR_POP function.
Returns the population covariance between of 2 numerical columns.
SELECT concat('Apache', 'Pinot', ' ') AS value
FROM ignoreMe
Apache Pinot
SELECT concat('real-time', 'analytics', '__') AS value
FROM ignoreMe
real-time__analytics
select ARRAYLENGTH(RandomAirports) AS length, count(*)
from airlineStats
GROUP BY length
ORDER BY count(*) DESC
LIMIT 5
JSONPATHARRAY(jsonField, 'jsonPath')
Arguments
Description
jsonField
An Identifier/Expression contains JSON documents.
'jsonPath'
Follows to read values from JSON documents.
'jsonPath'` is a literal. Pinot uses single quotes to distinguish them from identifiers.
You can use the Jayway JsonPath Evaluator Tool to test JSON expressions before you import any data.
Usage Examples
The usage examples are based on extracting fields from the following JSON document:
This function can be used in the table config to extract the name, score, and second value of homework_grades into their respective columns , as described below:
This section contains reference documentation for the DISTINCTCOUNTRAWTHETASKETCH function.
The Theta Sketch framework enables set operations over a stream of data, and can also be used for cardinality estimation. Pinot leverages the Sketch Class and its extensions from the library org.apache.datasketches:datasketches-java:1.2.0-incubating to perform distinct counting as well as evaluating set operations.
Currently, the only supported parameter is nominalEntries (defaults to 4096).
predicates (optional)_: _ These are individual predicates of form lhs <op> rhs which are applied on rows selected by the where clause. During intermediate sketch aggregation, sketches from the thetaSketchColumn that satisfies these predicates are unionized individually. For example, all filtered rows that match country=USA are unionized into a single sketch. Complex predicates that are created by combining (AND/OR) of individual predicates is supported.
postAggregationExpressionToEvaluate (required): The set operation to perform on the individual intermediate sketches for each of the predicates. Currently supported operations are SET_DIFF, SET_UNION, SET_INTERSECT , where DIFF requires two arguments and the UNION/INTERSECT allow more than two arguments.
The following query returns the documents that have a DivTailNums with more than one value:
DivTailNums
N7713A,N7713A
N344AA,N344AA
N344AA,N344AA
N7713A,N7713A
You can count the number of items in these rows by running the following query:
value
8
select distinctCountRawThetaSketch(teamID) AS value
from baseballStats
select distinctCountRawThetaSketch(teamID, 'nominalEntries=10') AS value
from baseballStats
select distinctCountRawThetaSketch(
yearID,
'nominalEntries=4096',
'teamID = ''SFN'' AND numberOfGames=28 AND homeRuns=1',
'teamID = ''CHN'' AND numberOfGames=28 AND homeRuns=1',
'SET_INTERSECT($1, $2)'
) AS value
from baseballStats
select PERCENTILEMV(DivLongestGTimes, 90) AS value
from airlineStats
where arraylength(DivLongestGTimes) > 1
select PERCENTILEMV(DivLongestGTimes, 99.9) AS value
from airlineStats
where arraylength(DivLongestGTimes) > 1
select DivAirportIDs,
arrayIndexOfInt(DivAirportIDs, 14683) AS index
from airlineStats
WHERE arraylength(DivAirportIDs) >= 2
limit 5
select DivTailNums
from airlineStats
where arraylength(DivTailNums) > 1
select COUNTMV(DivTailNums) AS value
from airlineStats
where arraylength(DivTailNums) > 1
Arguments
Description
jsonField
An Identifier/Expression contains JSON documents.
'jsonPath'
Follows to read values from JSON documents.
'results_type'
One of the Pinot supported data types:INT, LONG, FLOAT, DOUBLE, BOOLEAN, TIMESTAMP, STRING,
inputTimeUnitStr and outputTimeUnitStr support the following values:
NANOSECONDS
MICROSECONDS
MILLISECONDS
SECONDS
MINUTES
HOURS
DAYS
Usage Examples
Truncates an epoch in milliseconds at WEEK (where a Week starts at Monday UTC midnight):
or
ts
1639353600000
Truncates an epoch in milliseconds at WEEK (where a Week starts at Monday UTC midnight) in the UTC time zone, returning a result in epoch in seconds in UTC timezone:
ts
1639353600
Truncates an epoch in milliseconds at WEEK (where a Week starts at Monday UTC midnight) in the CET time zone, returning a result in epoch in seconds in UTC timezone:
ts
1639350000
Truncates an epoch in milliseconds at QUARTER in the Los Angeles time zone (where a Quarter begins on Jan 1st, April 1st, July 1st, October 1st in Los Angeles timezone), returning a result in hours since UTC epoch:
select PERCENTILETDigest(homeRuns, 50) AS value
from baseballStats
Usage Examples
value
12
value
-13
select FLOOR(12.1) AS value
from ignoreMe
select FLOOR(-12.1) AS value
from ignoreMe
select dateTrunc('week', 1639480981746) AS ts
FROM ignoreMe
select dateTrunc('week', 1639480981746, 'MILLISECONDS') AS ts
FROM ignoreMe
select dateTrunc(
'week',
1639480981746,
'MILLISECONDS',
'UTC',
'SECONDS'
) AS ts
FROM ignoreMe
select dateTrunc(
'week',
1639480981746,
'MILLISECONDS',
'CET',
'SECONDS'
) AS ts
FROM ignoreMe
select dateTrunc(
'quarter',
1639480981746,
'MILLISECONDS',
'America/Los_Angeles',
'HOURS'
) AS ts
FROM ignoreMe
select DivAirportIDs,
arrayReverseInt(DivAirportIDs) AS reversedIds
from airlineStats
WHERE arraylength(DivAirportIDs) >= 2
limit 5
select PERCENTILETDigest(homeRuns, 80) AS value
from baseballStats
select PERCENTILETDigest(homeRuns, 99.9) AS value
from baseballStats
isSubnetOf
This section contains reference documentation for the isSubnetOf function.
Takes 2 arguments of type STRING. The first argument is an ipPrefix, and the second argument is a single ipAddress. This function handles both IPv4 and IPv6 arguments.
Returns a boolean value checking if ipAddress is in the subnet of ipPrefix
Signatures
isSubnetOf(ipPrefix, ipAddress) -> boolean
Usage Examples
Please see the following sample queries where isSubnetOf is used in different parts of the query.
arraySortInt
This section contains reference documentation for the arraySortInt function.
Sorts array of ints.
Signature
arraySortInt('colName')
SELECT isSubnetOf('192.168.0.1/24', '192.168.0.225')
AS result
FROM myTable;
---> returns true
SELECT isSubnetOf('1.2.3.128/26', '1.2.5.1')
AS result
FROM myTable;
---> returns false
SELECT isSubnetOf('2001:4800:7825:103::/64', '2001:4800:7825:103::2050')
AS result
FROM myTable;
---> returns true
SELECT isSubnetOf('7890:db8:113::8a2e:370:7334/127', '7890:db8:113::8a2e:370:7336')
AS result
FROM myTable;
---> returns false
SELECT count(*)
FROM myTable
WHERE isSubnetOf('192.168.0.1/24', ipAddressCol);
SELECT count(*)
FROM myTable
WHERE isSubnetOf('192.168.0.1/24', ipAddressCol)
OR isSubnetOf(ipPrefixCol, '7890:db8:113::8a2e:370:7336');
SELECT
CASE
WHEN isSubnetOf('105.25.245.115/27', srcIPAddress) THEN 'case1'
WHEN isSubnetOf('105.25.245.115/27', dstIPAddress) THEN 'case2'
ELSE 'case3'
END AS differentFlow
FROM myTable;
select DivAirportIDs,
arraySortInt(DivAirportIDs) AS sortedIds
from airlineStats
WHERE arraylength(DivAirportIDs) >= 2
limit 5
DISTINCTCOUNTTHETASKETCH
This section contains reference documentation for the DISTINCTCOUNTTHETASKETCH function.
The Theta Sketch framework enables set operations over a stream of data, and can also be used for cardinality estimation. Pinot leverages the Sketch Class and its extensions from the library org.apache.datasketches:datasketches-java:1.2.0-incubating to perform distinct counting as well as evaluating set operations.
Signature
DistinctCountThetaSketch(<thetaSketchColumn>, <thetaSketchParams>, predicate1, predicate2..., postAggregationExpressionToEvaluate) -> Long
thetaSketchColumn (required): Name of the column to aggregate on.
thetaSketchParams (required): Parameters for constructing the intermediate theta-sketches.
Usage Examples
These examples are based on the .
value
value
We can also provide predicates and a post aggregation expression to compute more complicated cardinalities. For example, we could can find the intersection of the following queries:
yearID
yearID
(the yearId 1986 is the only one in common)
By running the following query:
value
Currently, the only supported parameter is nominalEntries (defaults to 4096).
predicates (optional)_: _ These are individual predicates of form lhs <op> rhs which are applied on rows selected by the where clause. During intermediate sketch aggregation, sketches from the thetaSketchColumn that satisfies these predicates are unionized individually. For example, all filtered rows that match country=USA are unionized into a single sketch. Complex predicates that are created by combining (AND/OR) of individual predicates is supported.
postAggregationExpressionToEvaluate (required): The set operation to perform on the individual intermediate sketches for each of the predicates. Currently supported operations are SET_DIFF, SET_UNION, SET_INTERSECT , where DIFF requires two arguments and the UNION/INTERSECT allow more than two arguments.
select distinctCountThetaSketch(teamID) AS value
from baseballStats
select distinctCountThetaSketch(teamID, 'nominalEntries=10') AS value
from baseballStats
select yearID
from baseballStats
where teamID = 'SFN' AND numberOfGames = 28 AND homeRuns = 1
select yearID
from baseballStats
where teamID = 'CHN' AND numberOfGames = 28 AND homeRuns = 1
select distinctCountThetaSketch(
yearID,
'nominalEntries=4096',
'teamID = ''SFN'' AND numberOfGames=28 AND homeRuns=1',
'teamID = ''CHN'' AND numberOfGames=28 AND homeRuns=1',
'SET_INTERSECT($1, $2)'
) AS value
from baseballStats
MINMV
This section contains reference documentation for the MINMV function.
Get the minimum value in a group
Signature
MINMV(colName)
Usage Examples
These examples are based on the .
value
hour
This section contains reference documentation for the hour function.
Returns the hour of the day from the given epoch millis in UTC or specified timezone. The value ranges from 0 to 23.
Signature
hour(tsInMillis)
hour(tsInMillis, timeZoneId)
select MINMV(DivLongestGTimes) AS value
from airlineStats
where arraylength(DivLongestGTimes) > 1
inputFormat and outputFormat are defined using the following structure:
<time size>:<time unit>:<time format>:<pattern>
where:
time size - size of the time unit eg: 1, 10
time unit - DAYS, HOURS, MINUTES, SECONDS, MILLISECONDS
granularity is specified in the format <time size>:<time unit>.
Usage Examples
These examples are based on the .
created_at_timestamp from milliseconds since epoch to days since epoch, bucketed to 1 day granularity:
id
created_at_timestamp
timeInMs
convertedTime
created_at_timestamp bucketed to 15 minutes granularity:
id
created_at_timestamp
timeInMs
convertedTime
created_at_timestamp to format yyyy-MM-dd, bucketed to 1 days granularity:
id
created_at_timestamp
timeInMs
convertedTime
created_at_timestamp to format yyyy-MM-dd HH:mm, in timezone Pacific/Kiritimati:
id
created_at_timestamp
timeInMs
convertedTime
created_at_timestamp to format yyyy-MM-dd, in timezone Pacific/Kiritimati and bucketed to 1 day granularity:
id
created_at_timestamp
timeInMs
convertedTime
,
MICROSECONDS
,
NANOSECONDS
time format
EPOCH
SIMPLE_DATE_FORMAT pattern - defined in case of SIMPLE_DATE_FORMAT e.g. yyyy-MM-dd. A specific timezone can be passed using tz(timezone). Timezone can be long or short string format timezone. e.g. Asia/Kolkata or PDT
select id,
created_at_timestamp,
cast(created_at_timestamp AS long) AS timeInMs,
DATETIMECONVERT(
created_at_timestamp,
'1:MILLISECONDS:EPOCH',
'1:DAYS:EPOCH',
'1:DAYS'
) AS convertedTime
from githubEvents
WHERE id = 7044874134
select id,
created_at_timestamp,
cast(created_at_timestamp AS long) AS timeInMs,
DATETIMECONVERT(
created_at_timestamp,
'1:MILLISECONDS:EPOCH',
'1:MILLISECONDS:EPOCH',
'15:MINUTES'
) AS convertedTime
from githubEvents
WHERE id = 7044874134
select id,
created_at_timestamp,
cast(created_at_timestamp AS long) AS timeInMs,
DATETIMECONVERT(
created_at_timestamp,
'1:MILLISECONDS:EPOCH',
'1:DAYS:SIMPLE_DATE_FORMAT:yyyy-MM-dd',
'1:DAYS'
) AS convertedTime
from githubEvents
WHERE id = 7044874134
select id,
created_at_timestamp,
cast(created_at_timestamp AS long) AS timeInMs,
DATETIMECONVERT(
created_at_timestamp,
'1:MILLISECONDS:EPOCH',
'1:MILLISECONDS:SIMPLE_DATE_FORMAT:yyyy-MM-dd HH:mm tz(Pacific/Kiritimati)',
'1:MILLISECONDS'
) AS convertedTime
from githubEvents
WHERE id = 7044874134
select id,
created_at_timestamp,
cast(created_at_timestamp AS long) AS timeInMs,
DATETIMECONVERT(
created_at_timestamp,
'1:MILLISECONDS:EPOCH',
'1:MILLISECONDS:SIMPLE_DATE_FORMAT:yyyy-MM-dd HH:mm tz(Pacific/Kiritimati)',
'1:DAYS'
) AS convertedTime
from githubEvents
WHERE id = 7044874134