select homeRuns, baseOnBalls, ADD(homeRuns, baseOnBalls) AS total
from baseballStats
WHERE teamID = 'ML1'
AND yearID = 1956
AND playerName = 'Henry Louis'
select DivWheelsOffs,
arrayConcatInt(DivWheelsOffs, DivWheelsOns) AS concatIds
from airlineStats
WHERE arraylength(DivWheelsOffs) >= 2
limit 5
DISTINCTCOUNTBITMAP
This section contains reference documentation for the DISTINCTCOUNTBITMAP function.
Returns the count of distinct row values in a group. This function is accurate for INT column, but approximate for other cases where hash codes are used in distinct counting and there may be hash collisions.
For accurate distinct counting on all column types, see .
Signature
DISTINCTCOUNTBITMAP(colName)
AVGMV
This section contains reference documentation for the AVGMV function.
Get the avg of values in a group
Signature
AVGMV(colName)
ceil
This section contains reference documentation for the CEIL function.
Rounded up to the nearest integer.
Signature
CEIL(col1)
CHR
This section contains reference documentation for the CHR function.
the character corresponding to the Unicode codepoint
Signature
CHR(codepoint)
codepoint
This section contains reference documentation for the CODEPOINT function.
the Unicode codepoint of the first character of the string
Signature
CODEPOINT(col)
FLOOR
This section contains reference documentation for the FLOOR function.
select AVGMV(DivLongestGTimes) AS value
from airlineStats
where arraylength(DivLongestGTimes) > 1
Usage Examples
value
13
value
-12
select CEIL(12.1) AS value
from ignoreMe
select CEIL(-12.1) AS value
from ignoreMe
Usage Examples
value
A
SELECT CHR(65) AS value
FROM ignoreMe
Usage Examples
value
65
SELECT CODEPOINT('Apache Pinot') AS value
FROM ignoreMe
Usage Examples
value
12
value
-13
select FLOOR(12.1) AS value
from ignoreMe
select FLOOR(-12.1) AS value
from ignoreMe
select DISTINCTCOUNTBITMAP(teamID) AS value
from baseballStats
arrayIndexOfInt
This section contains reference documentation for the arrayIndexOfInt function.
Finds the last index of the given value in the array starting at the given index.
Signature
arrayIndexOfInt('colName', valueToFind)
Usage Examples
These examples are based on the .
DivAirportIDs
index
arrayRemoveString
This section contains reference documentation for the arrayRemoveString function.
Removes value from array of strings.
Signature
arrayRemoveString('colName', value)
Usage Examples
These examples are based on the .
DivAirportIDs
value
arrayReverseString
This section contains reference documentation for the arrayReverseString function.
Reverses array of strings.
Signature
arrayReverseString('colName')
Usage Examples
These examples are based on the .
FlightNum
reversedAirports
RandomAirports
arraySliceInt
This section contains reference documentation for the arraySliceInt function.
Returns the values in the array between the start and end positions.
Signature
arraySliceInt('colName', start, end)
Usage Examples
These examples are based on the .
FlightNum
airports
DivAirportIDs
arraySortString
This section contains reference documentation for the arraySortString function.
Sorts array of strings.
Signature
arraySortString('colName')
Usage Examples
These examples are based on the .
FlightNum
sortedAirports
RandomAirports
COUNTMV
This section contains reference documentation for the COUNTMV function.
Get the count of rows in a group
Signature
COUNTMV(colName)
Usage Examples
These examples are based on the .
The following query returns the documents that have a DivTailNums with more than one value:
DivTailNums
You can count the number of items in these rows by running the following query:
value
arrayReverseInt
This section contains reference documentation for the arrayReverseInt function.
Reverses array of ints.
Signature
arrayReverseInt('colName')
Usage Examples
These examples are based on the .
DivAirportIDs
reversedIds
arrayContainsString
This section contains reference documentation for the arrayContainsString function.
Checks if string value exists in array.
Signature
arrayContainsString('colName', valueToFind)
Usage Examples
These examples are based on the .
DivTailNums
index
concat
This section contains reference documentation for the concat function.
Concatenate two input strings using the seperator
Signature
CONCAT(col1, col2, seperator)
Usage Examples
value
value
DIV
This section contains reference documentation for the DIV function.
Quotient of two values
Signature
DIV(col1, col2)
Usage Examples
These examples are based on the .
homeRuns
numberOfGames
total
DISTINCTCOUNTHLLMV
This section contains reference documentation for the DISTINCTCOUNTBITMAPMV function.
Returns the count of distinct row values in a group. This function is accurate for INT or dictionary encoded column, but approximate for other cases where hash codes are used in distinct counting and there may be hash collision.
Signature
DISTINCTCOUNTBITMAPMV(colName)
Usage Examples
These examples are based on the .
value
value
arrayRemoveInt
This section contains reference documentation for the arrayRemoveInt function.
Removes value from array of ints.
Signature
arrayRemoveInt('colName', value)
Usage Examples
These examples are based on the .
DivAirportIDs
value
arrayDistinctInt
This section contains reference documentation for the arrayDistinctInt function.
Returns unique values in an array of ints.
Signature
arrayDistinctInt('colName')
Usage Examples
These examples are based on the .
DivAirportIDs
unique
DISTINCTCOUNTHLL
This section contains reference documentation for the DISTINCTCOUNTHLL function.
Returns an approximate distinct count using HyperLogLog. It also takes an optional second argument to configure the log2m for the HyperLogLog.
For accurate distinct counting, see DISTINCTCOUNT.
Signature
DISTINCTCOUNTHLL(colName, log2m)
Usage Examples
These examples are based on the .
value
value
DISTINCTCOUNTRAWHLL
This section contains reference documentation for the DISTINCTCOUNTRAWHLL function.
Returns HLL response serialized as string. The serialized HLL can be converted back into an HLL and then aggregated with other HLLs. A common use case may be to merge HLL responses from different Pinot tables, or to allow aggregation after client-side batching.
Signature
DISTINCTCOUNTRAWHLL(colName, log2m)
Usage Examples
These examples are based on the .
value
value
DISTINCTCOUNT
This section contains reference documentation for the DISTINCTCOUNT function.
Returns the count of distinct row values in a group
Signature
DISTINCTCOUNT(colName)
Usage Examples
These examples are based on the .
value
value
MD5
This section contains reference documentation for the MD5 function.
Return MD5 digest of binary column(bytes type) as hex string
Signature
MD5(bytesCol)
Usage Examples
These examples are based on the .
event_id
location
hash
The row returned will be different if you run this example as the data is ingested in real-time.
length
This section contains reference documentation for the length function.
calculate length of the string
Signature
LENGTH(col)
Usage Examples
value
min
This section contains reference documentation for the min function.
Get the minimum value in a group
Signature
MIN(colName)
Usage Examples
These examples are based on the .
value
exp
This section contains reference documentation for the exp function.
Euler’s number(e) raised to the power of col.
Signature
EXP(col1)
Usage Examples
value
value
lower
This section contains reference documentation for the lower function.
Converts string to lower case.
Signature
LOWER(col)
Usage Examples
name
lpad
This section contains reference documentation for the LPAD function.
string padded from the left side with pad to reach final size
Signature
LPAD(col, size, pad)
Usage Examples
value
minmaxrange
This section contains reference documentation for the minmaxrange function.
Returns the max - min value in a group
Signature
MINMAXRANGE(colName)
Usage Examples
These examples are based on the .
value
rpad
This section contains reference documentation for the RPAD function.
string padded from the right side with pad to reach final size
Signature
RPAD(col, size, pad)
Usage Examples
value
reverse
This section contains reference documentation for the reverse function.
Reverse the string
Signature
REVERSE(col)
Usage Examples
name
remove
This section contains reference documentation for the remove function.
Removes all instances of search from string
Signature
remove(input, search)
Usage Examples
value
sqrt
This section contains reference documentation for the sqrt function.
Square root of a value
Signature
SQRT(col1)
Usage Examples
value
arrayContainsInt
This section contains reference documentation for the arrayContainsInt function.
Checks if int value exists in array.
Signature
arrayContainsInt('colName', valueToFind)
ARRAYLENGTH
This section contains reference documentation for the ARRAYLENGTH function.
Returns the length of a multi-value column
Signature
ARRAYLENGTH('colName')
arraySliceString
This section contains reference documentation for the arraySliceString function.
Returns the values in the array between the start and end positions.
Signature
arraySliceString('colName', start, end)
arraySortInt
This section contains reference documentation for the arraySortInt function.
Sorts array of ints.
Signature
arraySortInt('colName')
arrayUnionInt
This section contains reference documentation for the arrayUnionInt function.
Create a union of two arrays of ints.
Signature
arrayUnionInt('colName1', 'colName2')
arrayDistinctString
This section contains reference documentation for the arrayDistinctString function.
Returns unique values in an array of strings.
Signature
arrayDistinctString('colName')
arrayUnionString
This section contains reference documentation for the arrayUnionString function.
Create a union of two arrays of strings.
Signature
arrayUnionString('colName1', 'colName2')
DISTINCTCOUNTBITMAPMV
This section contains reference documentation for the DISTINCTCOUNTHLLMV function.
Returns an approximate distinct count using HyperLogLog in a group
Signature
DISTINCTCOUNTHLLMV(colName)
arrayIndexOfString
This section contains reference documentation for the arrayIndexOfString function.
Finds the last index of the given value in the array starting at the given index.
Signature
arrayIndexOfString('colName', valueToFind)
DISTINCTCOUNTMV
This section contains reference documentation for the DISTINCTCOUNTMV function.
Returns the count of distinct row values in a group
Signature
DISTINCTCOUNTMV(colName)
FromEpoch
This section contains reference documentation for the fromEpoch functions.
Convert epoch to epoch milliseconds. The following time units are supported:
SECONDS
MINUTES
DISTINCTCOUNTRAWHLLMV
This section contains reference documentation for the DISTINCTCOUNTRAWHLLMV function.
Returns HLL response serialized as string. The serialized HLL can be converted back into an HLL and then aggregated with other HLLs. A common use case may be to merge HLL responses from different Pinot tables, or to allow aggregation after client-side batching.
Signature
DISTINCTCOUNTRAWHLLMV(colName, log2m)
FromDateTime
This section contains reference documentation for the FromDateTime function.
Converts a formatted date-time string to milliseconds, based on the provided .
Signature
FromDateTime(dateTimeString, pattern)
JSONFORMAT
This section contains reference documentation for the JSONFORMAT function.
Extracts the object value from jsonField based on 'jsonPath', the result type is inferred based on JSON value. This function can only be used in an .
Signature
JSONFORMAT(object)
MINMAXRANGEMV
This section contains reference documentation for the MINMAXRANGEMV function.
Returns the max - min value in a group
Signature
MINMAXRANGEMV(colName)
jsonextractkey
This section contains reference documentation for the JSONEXTRACTKEY function.
Extracts all matched JSON field keys based on 'jsonPath' into a STRING_ARRAY.
Signature
JSONEXTRACTKEY(jsonField, 'jsonPath')
Arguments
percentile
This section contains reference documentation for the percentile function.
Returns the max - min value in a group
Signature
percentile(colName, percentile)
mult
This section contains reference documentation for the MULT function.
Product of at least two values
Signature
MULT(col1, col2, col3...)
max
This section contains reference documentation for the max function.
Get the maximum value in a group
Signature
MAX(colName)
now
This section contains reference documentation for the now function.
Return current time as epoch millis.
Signature
now()
DISTINCT
This section contains reference documentation for the DISTINCT function.
Returns the distinct row values in a group
Signature
DISTINCT(colName)
millisecond
This section contains reference documentation for the millisecond function.
Returns the millisecond of the second from the given epoch millis in UTC or specified timezone. The value ranges from 0 to 999.
Signature
millisecond(tsInMillis)
millisecond(tsInMillis, timeZoneId)
hour
This section contains reference documentation for the hour function.
Returns the hour of the day from the given epoch millis in UTC or specified timezone. The value ranges from 0 to 23.
Signature
hour(tsInMillis)
hour(tsInMillis, timeZoneId)
MAXMV
This section contains reference documentation for the MAXMV function.
Get the maximum value in a group
Signature
MAXMV(colName)
MOD
This section contains reference documentation for the MOD function.
Modulo of two values
Signature
MOD(col1, col2)
mode
This section contains reference documentation for the mode function.
Get the most frequent value in a group. When multiple modes are present it gives the minimum of all the modes. This behavior can be overridden to get the maximum or the average mode.
Signature
MODE(colName, [reducerType])
sha512
This section contains reference documentation for the SHA512 function.
Return SHA-512 digest of binary column(bytes type) as hex string
Signature
SHA512(bytesCol)
regexpExtract
This section contains reference documentation for the regexpExtract function.
Extracts values that match the provided regular expression
Signature
regexpExtract(value, regexp)
regexpExtract(value, regexp, group)
sha
This section contains reference documentation for the SHA function.
Return SHA-1 digest of binary column(bytes type) as hex string
SELECT concat('Apache', 'Pinot', ' ') AS value
FROM ignoreMe
Apache Pinot
SELECT concat('real-time', 'analytics', '__') AS value
FROM ignoreMe
real-time__analytics
select homeRuns, numberOfGames, DIV(homeRuns, numberOfGames) AS total
from baseballStats
WHERE teamID = 'ML1'
AND yearID = 1956
AND playerName = 'Henry Louis'
select homeRuns, baseOnBalls, MULT(homeRuns, baseOnBalls) AS total
from baseballStats
WHERE teamID = 'ML1'
AND yearID = 1956
AND playerName = 'Henry Louis'
The row returned will be different if you run this example as the data is ingested in real-time.
select event_id, location, SHA(location) AS hash
from meetupRsvp
limit 1
select DivAirportIDs,
arrayIndexOfInt(DivAirportIDs, 14683) AS index
from airlineStats
WHERE arraylength(DivAirportIDs) >= 2
limit 5
select RandomAirports,
arrayRemoveString(RandomAirports, 'SEA') AS value
from airlineStats
WHERE arraylength(RandomAirports) BETWEEN 2 AND 4
limit 5
select FlightNum,
arrayReverseString(RandomAirports) AS reversedAirports,
RandomAirports
from airlineStats
WHERE arraylength(RandomAirports) BETWEEN 2 AND 4
limit 5
select FlightNum,
arraySliceInt(DivAirportIDs, 0, 1) AS airports,
DivAirportIDs
from airlineStats
WHERE arraylength(DivAirportIDs) >= 2
limit 5
select FlightNum,
arraySortString(RandomAirports) AS sortedAirports,
RandomAirports
from airlineStats
WHERE arraylength(RandomAirports) BETWEEN 2 AND 4
limit 5
select COUNTMV(DivTailNums) AS value
from airlineStats
where arraylength(DivTailNums) > 1
select DivAirportIDs,
arrayReverseInt(DivAirportIDs) AS reversedIds
from airlineStats
WHERE arraylength(DivAirportIDs) >= 2
limit 5
select DISTINCTCOUNTBITMAPMV(DivTailNums) AS value
from airlineStats
where arraylength(DivTailNums) > 1
select DivAirportIDs,
arrayRemoveInt(DivAirportIDs, 12892) AS value
from airlineStats
WHERE arraylength(DivAirportIDs) >= 2
AND arrayContainsInt(DivAirportIDs, 12892) = 1
limit 5
select DivAirportIDs,
arrayDistinctInt(DivAirportIDs) AS unique
from airlineStats
WHERE arraylength(DivAirportIDs) >= 2
limit 5
select DISTINCTCOUNTHLL(teamID, 12) AS value
from baseballStats
select DISTINCTCOUNTRAWHLL(teamID, 1) AS value
from baseballStats
select DivAirportIDs,
arrayContainsInt(DivAirportIDs, 14683) AS containsValue
from airlineStats
WHERE arraylength(DivAirportIDs) >= 2
limit 5
select ARRAYLENGTH(RandomAirports) AS length, count(*)
from airlineStats
GROUP BY length
ORDER BY count(*) DESC
LIMIT 5
select DivAirportIDs,
arraySortInt(DivAirportIDs) AS sortedIds
from airlineStats
WHERE arraylength(DivAirportIDs) >= 2
limit 5
select DivTailNums,
arrayDistinctString(DivTailNums) AS unique
from airlineStats
WHERE arraylength(DivTailNums) >= 2
limit 5
select DISTINCTCOUNTMV(DivTailNums) AS value
from airlineStats
where arraylength(DivTailNums) > 1
select FromEpochSeconds(1613472303) AS epochMillis
FROM ignoreMe
select FromEpochMinutes(26891205) AS epochMillis
FROM ignoreMe
select FromEpochHours(448186) AS epochMillis
FROM ignoreMe
select FromEpochDays(18674) AS epochMillis
FROM ignoreMe
select DISTINCTCOUNTRAWHLLMV(DivAirports, 1) AS value
from airlineStats
where arraylength(DivAirports) > 1
SELECT FromDateTime('2019-08-07', 'yyyy-MM-dd') AS epochMillis
FROM ignoreMe
SELECT FromDateTime(
'2019-08-07 3:12:13 PM',
'yyyy-MM-dd hh:mm:ss a'
) AS epochMillis
FROM ignoreMe
SELECT FromDateTime(
'2019-08-07T15:12:13',
'yyyy-MM-dd''T''HH:mm:ss'
) AS epochMillis
FROM ignoreMe
SELECT FromDateTime(
'2019-08-07T07:12:13-0800',
'yyyy-MM-dd''T''HH:mm:ssZ'
) AS epochMillis
FROM ignoreMe
This section contains reference documentation for the SEGMENTPARTITIONEDDISTINCTCOUNT function.
Returns the count of distinct values of a column when the column is pre-partitioned for each segment, where there is no common value within different segments. This function calculates the exact count of distinct values within the segment, then simply sums up the results from different segments to get the final result.
This function relies on the expression values being partitioned for each segment, where there are no common values within different segments.
Signature
SEGMENTPARTITIONEDDISTINCTCOUNT(colName)
Usage Examples
These examples are based on the .
value
second
This section contains reference documentation for the second function.
Returns the second of the minute from the given epoch millis in UTC or specified timezone. The value ranges from 0 to 59.
Signature
second(tsInMillis)
second(tsInMillis, timeZoneId)
Usage Examples
second
second
replace
This section contains reference documentation for the REPLACE function.
replace all instances of find with replace in input
Signature
REPLACE(col, find, replace)
Usage Examples
value
value
sha256
This section contains reference documentation for the SHA256 function.
Return SHA-256 digest of binary column(bytes type) as hex string
Signature
SHA256(bytesCol)
Usage Examples
These examples are based on the .
event_id
location
hash
The row returned will be different if you run this example as the data is ingested in real-time.
round
This section contains reference documentation for the round function.
Round the given time value to nearest bucket start value.
Signature
round(timeValue, bucketSize)
Usage Examples
Round seconds epoch value to the start value of the 30 seconds bucket to which it belongs.
rounded
Round milliseconds epoch value to the start value of the 5,000 milliseconds bucket to which it belongs.
rounded
DISTINCTCOUNTTHETASKETCH
This section contains reference documentation for the DISTINCTCOUNTTHETASKETCH function.
The framework enables set operations over a stream of data, and can also be used for cardinality estimation. Pinot leverages the and its extensions from the library org.apache.datasketches:datasketches-java:1.2.0-incubating to perform distinct counting as well as evaluating set operations.
Signature
jsonextractscalar
This section contains reference documentation for the JSONEXTRACTSCALAR function.
Evaluates the 'jsonPath' on jsonField, returns the result as the type 'resultsType', use optional defaultValuefor null or parsing error.
This section contains reference documentation for the JSONPATHARRAY function.
Extracts an array from jsonField based on 'jsonPath', the result type is inferred based on JSON value. This function can only be used in an .
Signature
JSONPATHARRAY(jsonField, 'jsonPath')
JSONPATHDOUBLE
This section contains reference documentation for the JSONPATHDOUBLE function.
Extracts the Double value from jsonField based on 'jsonPath', use optional defaultValuefor null or parsing error. This function can only be used in an .
Signature
select FlightNum,
arraySliceString(RandomAirports, 0, 2) AS airports,
RandomAirports
from airlineStats
WHERE arraylength(RandomAirports) BETWEEN 2 AND 4
limit 5
select DivWheelsOffs,
DivWheelsOns,
arrayUnionInt(DivWheelsOffs, DivWheelsOns) AS unionIds
from airlineStats
WHERE arraylength(DivWheelsOffs) >= 2
limit 5
select DivTailNums,
DivAirports,
arrayUnionString(DivTailNums, DivAirports) AS unionIds
from airlineStats
WHERE arraylength(DivTailNums) >= 2
limit 5
select id, repo, JSONEXTRACTKEY(repo, '$.*') AS keys
from githubEvents
WHERE id = 7044874109
thetaSketchColumn (required): Name of the column to aggregate on.
thetaSketchParams (required): Parameters for constructing the intermediate theta-sketches.
Currently, the only supported parameter is nominalEntries (defaults to 4096).
predicates (optional)_: _ These are individual predicates of form lhs <op> rhs which are applied on rows selected by the where clause. During intermediate sketch aggregation, sketches from the thetaSketchColumn that satisfies these predicates are unionized individually. For example, all filtered rows that match country=USA are unionized into a single sketch. Complex predicates that are created by combining (AND/OR) of individual predicates is supported.
postAggregationExpressionToEvaluate (required): The set operation to perform on the individual intermediate sketches for each of the predicates. Currently supported operations are SET_DIFF, SET_UNION, SET_INTERSECT , where DIFF requires two arguments and the UNION/INTERSECT allow more than two arguments.
We can also provide predicates and a post aggregation expression to compute more complicated cardinalities. For example, we could can find the intersection of the following queries:
select dayOfWeek(1639351800000, 'CET') AS dayOfWeek
FROM ignoreMe
select dow(1639351800000) AS dayOfWeek
FROM ignoreMe
select dow(1639351800000, 'CET') AS dayOfWeek
FROM ignoreMe
select dayOfYear(1639351800000, 'CET') AS dayOfYear
FROM ignoreMe
select doy(1639351800000) AS dayOfYear
FROM ignoreMe
select doy(1639351800000, 'CET') AS dayOfYear
FROM ignoreMe
select PERCENTILEMV(DivLongestGTimes, 90) AS value
from airlineStats
where arraylength(DivLongestGTimes) > 1
select PERCENTILEMV(DivLongestGTimes, 99.9) AS value
from airlineStats
where arraylength(DivLongestGTimes) > 1
select percentileest(homeRuns, 80) AS value
from baseballStats
select percentileest(homeRuns, 99.9) AS value
from baseballStats
select quarter(1633046399000, 'UTC') AS quarter
FROM ignoreMe
select quarter(1633046399000, 'CET') AS quarter
FROM ignoreMe
select PERCENTILEESTMV(DivLongestGTimes, 90) AS value
from airlineStats
where arraylength(DivLongestGTimes) > 1
select PERCENTILEESTMV(DivLongestGTimes, 99.9) AS value
from airlineStats
where arraylength(DivLongestGTimes) > 1
select PERCENTILETDigest(homeRuns, 80) AS value
from baseballStats
select PERCENTILETDigest(homeRuns, 99.9) AS value
from baseballStats
select distinctCountThetaSketch(teamID) AS value
from baseballStats
select distinctCountThetaSketch(teamID, 'nominalEntries=10') AS value
from baseballStats
select yearID
from baseballStats
where teamID = 'SFN' AND numberOfGames = 28 AND homeRuns = 1
select yearID
from baseballStats
where teamID = 'CHN' AND numberOfGames = 28 AND homeRuns = 1
select distinctCountThetaSketch(
yearID,
'nominalEntries=4096',
'teamID = ''SFN'' AND numberOfGames=28 AND homeRuns=1',
'teamID = ''CHN'' AND numberOfGames=28 AND homeRuns=1',
'SET_INTERSECT($1, $2)'
) AS value
from baseballStats
Arguments
Description
jsonField
An Identifier/Expression contains JSON documents.
'jsonPath'
Follows to read values from JSON documents.
'results_type'
One of the Pinot supported data types:INT, LONG, FLOAT, DOUBLE, BOOLEAN, TIMESTAMP, STRING,
This function can be used in the table config to extract the name, score, and second value of homework_grades into their respective columns , as described below:
This section contains reference documentation for the JSONPATHARRAYDEFAULTEMPTY function.
Extracts an array from jsonField based on 'jsonPath', the result type is inferred based on JSON value. Returns empty array for null or parsing error. This function can only be used in an ingestion transformation function.
Signature
JSONPATHARRAYDEFAULTEMPTY(jsonField, 'jsonPath')
Arguments
Description
'jsonPath'` is a literal. Pinot uses single quotes to distinguish them from identifiers.
Usage Examples
The usage examples are based on extracting fields from the following JSON document:
Expression
Value
This function can be used in the to extract the name, score, and second value of homework_grades into their respective columns , as described below:
JSONPATHSTRING
This section contains reference documentation for the JSONPATHSTRING function.
Extracts the String value from jsonField based on 'jsonPath', use optional defaultValuefor null or parsing error. This function can only be used in an ingestion transformation function.
'jsonPath'` is a literal. Pinot uses single quotes to distinguish them from identifiers.
Usage Examples
The usage examples are based on extracting fields from the following JSON document:
Expression
Value
This function can be used in the to extract the age property into the age column, as described below:
JSONPATH
This section contains reference documentation for the JSONPATH function.
Extracts the object value from jsonField based on 'jsonPath', the result type is inferred based on JSON value. This function can only be used in an ingestion transformation function.
Signature
JSONPATH(jsonField, 'jsonPath')
Arguments
Description
'jsonPath'` is a literal. Pinot uses single quotes to distinguish them from identifiers.
Usage Examples
The usage examples are based on extracting fields from the following JSON document:
Expression
Value
This function can be used in the to extract the name property into the name column and age property into the age column, as described below:
DATETRUNC
This section contains reference documentation for the DATETRUNC function.
(Presto) SQL compatible date truncation, equivalent to the Presto function date_trunc.
Converts the value into a specified output granularity seconds since UTC epoch that is bucketed on a unit in a specified timezone.
inputTimeUnitStr and outputTimeUnitStr support the following values:
NANOSECONDS
MICROSECONDS
MILLISECONDS
Usage Examples
Truncates an epoch in milliseconds at WEEK (where a Week starts at Monday UTC midnight):
or
ts
Truncates an epoch in milliseconds at WEEK (where a Week starts at Monday UTC midnight) in the UTC time zone, returning a result in epoch in seconds in UTC timezone:
ts
Truncates an epoch in milliseconds at WEEK (where a Week starts at Monday UTC midnight) in the CET time zone, returning a result in epoch in seconds in UTC timezone:
ts
Truncates an epoch in milliseconds at QUARTER in the Los Angeles time zone (where a Quarter begins on Jan 1st, April 1st, July 1st, October 1st in Los Angeles timezone), returning a result in hours since UTC epoch:
ts
ltrim
This section contains reference documentation for the ltrim function.
trim spaces from left side of the string
Signature
LTRIM(col)
Usage Examples
notTrimmed
trimmed
MINMV
This section contains reference documentation for the MINMV function.
Get the minimum value in a group
Signature
MINMV(colName)
ln
This section contains reference documentation for the ln function.
Natural log of value i.e. ln(col1)
Signature
LN(col1)
day
This section contains reference documentation for the day function.
Returns the day of the month from the given epoch millis in UTC or specified timezone. The value ranges from 1 to 31.
Signature
day(tsInMillis)
day(tsInMillis, timeZoneId)
select repo
from githubEvents
WHERE id = 7044874109
select id, jsonextractscalar(repo, '$.name', 'STRING') AS name
from githubEvents
WHERE id = 7044874109
select id, jsonextractscalar(repo, '$.foo', 'STRING') AS name
from githubEvents
WHERE id = 7044874109
inputFormat and outputFormat are defined using the following structure:
<time size>:<time unit>:<time format>:<pattern>
where:
time size - size of the time unit eg: 1, 10
time unit - DAYS, HOURS, MINUTES, SECONDS, MILLISECONDS
granularity is specified in the format <time size>:<time unit>.
Usage Examples
These examples are based on the .
created_at_timestamp from milliseconds since epoch to days since epoch, bucketed to 1 day granularity:
id
created_at_timestamp
timeInMs
convertedTime
created_at_timestamp bucketed to 15 minutes granularity:
id
created_at_timestamp
timeInMs
convertedTime
created_at_timestamp to format yyyy-MM-dd, bucketed to 1 days granularity:
id
created_at_timestamp
timeInMs
convertedTime
created_at_timestamp to format yyyy-MM-dd HH:mm, in timezone Pacific/Kiritimati:
id
created_at_timestamp
timeInMs
convertedTime
created_at_timestamp to format yyyy-MM-dd, in timezone Pacific/Kiritimati and bucketed to 1 day granularity:
id
created_at_timestamp
timeInMs
convertedTime
DISTINCTCOUNTRAWTHETASKETCH
This section contains reference documentation for the DISTINCTCOUNTRAWTHETASKETCH function.
The framework enables set operations over a stream of data, and can also be used for cardinality estimation. Pinot leverages the and its extensions from the library org.apache.datasketches:datasketches-java:1.2.0-incubating to perform distinct counting as well as evaluating set operations.
Signature
arrayConcatString
This section contains reference documentation for the arrayConcatString function.
Concatenates two arrays of strings.
Signature
arrayConcatString('colName1', 'colName2')
percentiletdigestmv
This section contains reference documentation for the PERCENTILETDIGESTMV function.
Returns the Nth percentile of the group using .
Signature
PERCENTILETDIGESTMV(colName, N)
,
MICROSECONDS
,
NANOSECONDS
time format
EPOCH
SIMPLE_DATE_FORMAT pattern - defined in case of SIMPLE_DATE_FORMAT e.g. yyyy-MM-dd. A specific timezone can be passed using tz(timezone). Timezone can be long or short string format timezone. e.g. Asia/Kolkata or PDT
thetaSketchColumn (required): Name of the column to aggregate on.
thetaSketchParams (required): Parameters for constructing the intermediate theta-sketches.
Currently, the only supported parameter is nominalEntries (defaults to 4096).
predicates (optional)_: _ These are individual predicates of form lhs <op> rhs which are applied on rows selected by the where clause. During intermediate sketch aggregation, sketches from the thetaSketchColumn that satisfies these predicates are unionized individually. For example, all filtered rows that match country=USA are unionized into a single sketch. Complex predicates that are created by combining (AND/OR) of individual predicates is supported.
postAggregationExpressionToEvaluate (required): The set operation to perform on the individual intermediate sketches for each of the predicates. Currently supported operations are SET_DIFF, SET_UNION, SET_INTERSECT , where DIFF requires two arguments and the UNION/INTERSECT allow more than two arguments.
select PERCENTILETDIGESTMV(DivLongestGTimes, 50) AS value
from airlineStats
where arraylength(DivLongestGTimes) > 1
select id,
created_at_timestamp,
cast(created_at_timestamp AS long) AS timeInMs,
DATETIMECONVERT(
created_at_timestamp,
'1:MILLISECONDS:EPOCH',
'1:DAYS:EPOCH',
'1:DAYS'
) AS convertedTime
from githubEvents
WHERE id = 7044874134
select id,
created_at_timestamp,
cast(created_at_timestamp AS long) AS timeInMs,
DATETIMECONVERT(
created_at_timestamp,
'1:MILLISECONDS:EPOCH',
'1:MILLISECONDS:EPOCH',
'15:MINUTES'
) AS convertedTime
from githubEvents
WHERE id = 7044874134
select id,
created_at_timestamp,
cast(created_at_timestamp AS long) AS timeInMs,
DATETIMECONVERT(
created_at_timestamp,
'1:MILLISECONDS:EPOCH',
'1:DAYS:SIMPLE_DATE_FORMAT:yyyy-MM-dd',
'1:DAYS'
) AS convertedTime
from githubEvents
WHERE id = 7044874134
select id,
created_at_timestamp,
cast(created_at_timestamp AS long) AS timeInMs,
DATETIMECONVERT(
created_at_timestamp,
'1:MILLISECONDS:EPOCH',
'1:MILLISECONDS:SIMPLE_DATE_FORMAT:yyyy-MM-dd HH:mm tz(Pacific/Kiritimati)',
'1:MILLISECONDS'
) AS convertedTime
from githubEvents
WHERE id = 7044874134
select id,
created_at_timestamp,
cast(created_at_timestamp AS long) AS timeInMs,
DATETIMECONVERT(
created_at_timestamp,
'1:MILLISECONDS:EPOCH',
'1:MILLISECONDS:SIMPLE_DATE_FORMAT:yyyy-MM-dd HH:mm tz(Pacific/Kiritimati)',
'1:DAYS'
) AS convertedTime
from githubEvents
WHERE id = 7044874134
select distinctCountRawThetaSketch(teamID) AS value
from baseballStats
select distinctCountRawThetaSketch(teamID, 'nominalEntries=10') AS value
from baseballStats
select distinctCountRawThetaSketch(
yearID,
'nominalEntries=4096',
'teamID = ''SFN'' AND numberOfGames=28 AND homeRuns=1',
'teamID = ''CHN'' AND numberOfGames=28 AND homeRuns=1',
'SET_INTERSECT($1, $2)'
) AS value
from baseballStats
select DivTailNums,
arrayConcatString(DivTailNums, DivTailNums) AS concatIds
from airlineStats
WHERE arraylength(DivTailNums) >= 2
limit 5
select PERCENTILETDIGESTMV(DivLongestGTimes, 90) AS value
from airlineStats
where arraylength(DivLongestGTimes) > 1
select PERCENTILETDIGESTMV(DivLongestGTimes, 99.9) AS value
from airlineStats
where arraylength(DivLongestGTimes) > 1
JSONPATHLONG
This section contains reference documentation for the JSONPATHLONG function.
Extracts the Long value from jsonField based on 'jsonPath', use optional defaultValuefor null or parsing error. This function can only be used in an ingestion transformation function.