Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
This section contains reference documentation for the ago function.
Return time as epoch millis before the given period (in ISO-8601 duration format).
Examples:
"PT20.345S" -- parses as "20.345 seconds"
"PT15M" -- parses as "15 minutes" (where a minute is 60 seconds)
"PT10H" -- parses as "10 hours" (where an hour is 3600 seconds)
"P2D" -- parses as "2 days" (where a day is 24 hours or 86400 seconds)
"P2DT3H4M" -- parses as "2 days, 3 hours and 4 minutes"
"P-6H3M" -- parses as "-6 hours and +3 minutes"
"-P6H3M" -- parses as "-6 hours and -3 minutes"
"-P-6H+3M" -- parses as "+6 hours and -3 minutes"
ago()
This function is typically used in the predicate to filter on timestamps for recent data. e.g. filter data on recent 1 day.
This section contains reference documentation for the ARG_MIN and ARG_MAX function.
This function scans the given dataset to identify the maximum and minimum values in the specified measuring columns. Once these extreme values (the maxima and minima) are found, the function locates the corresponding entries in the projection column. These entries are associated with the rows where the extreme values were found in the measuring columns. The function then returns these projection column values, providing a way to link the extreme measurements with their corresponding data in another part of the dataset.
ARG_MIN (measuringCol1, measuringCol2, measuringCol3, projectionCol)
ARG_MAX (measuringCol1, measuringCol2, measuringCol3, projectionCol)
Find the user with maximum activity. If there are multiple users, break the tie with their last_activity_date. If still a tie, break with user_id. And project user_id.
More useful is that this multiple such aggregation function can be used with GROUP BY
Note:
In cases where multiple rows share the same extreme values in the measuring columns, all such rows will be returned by the function.
If the goal is to project multiple different columns that correspond to the same set of measuring columns, you can achieve this by invoking the function multiple times, each time specifying a different projection column.
This impl does not work with AS clause (e.g. SELECT argmin(longCol, doubleCol) AS argmin
won't work)
Putting argmin/argmax
column inside order by clause (e.g. SELECT intCol, argmin(longCol, doubleCol) FROM table GROUP BY intCol ORDER BY argmin(longCol, doubleCol)
) is not supported as semantically ordering multi-column multi-row argmin/argmax
results doesn't make sense
Currently projecting MV bytes column doesn't work for now due to an issue
This page contains reference documentation for functions in Apache Pinot.
This page contains reference documentation for functions in Apache Pinot.
This section contains reference documentation for the arrayConcatDouble function.
This section contains reference documentation for the arrayConcatLong function.
This section contains reference documentation for the arrayConcatFloat function.
value |
---|
value |
---|
oneDayAgo |
---|
For more detailed examples, see:
These examples are based on the .
homeRuns | baseOnBalls | total |
---|
These examples are based on the .
DivTailNums | index |
---|
These examples are based on the .
DivTailNums | concatIds |
---|
These examples are based on the .
DivAirportIDs | containsValue |
---|
These examples are based on the .
DivWheelsOffs | concatIds |
---|
12.1
12.1
1639150454255
26 | 37 | 63 |
N7713A,N7713A | true |
N344AA,N344AA | false |
N7713A,N7713A | true |
N7713A,N7713A | N7713A,N7713A,N7713A,N7713A |
N344AA,N344AA | N344AA,N344AA,N344AA,N344AA |
N344AA,N344AA | N344AA,N344AA,N344AA,N344AA |
N7713A,N7713A | N7713A,N7713A,N7713A,N7713A |
13891,12892 | false |
14683,14683 | true |
12339,12339 | false |
13487,13930 | false |
13029,11292 | false |
1453,1731 | 1453,1731,1415,1623 |
1908,1758 | 1908,1758,1339,2310 |
1453,1731 | 1453,1731,1415,1623 |
1908,1758 | 1908,1758,1339,2310 |
This section contains reference documentation for the arrayIndexOfString function.
Finds the last index of the given value in the array starting at the given index.
arrayIndexOfString('colName', valueToFind)
These examples are based on the Hybrid Quick Start.
This section contains reference documentation for the arrayRemoveInt function.
Removes value from array of ints.
arrayRemoveInt('colName', value)
These examples are based on the Hybrid Quick Start.
This section contains reference documentation for the arrayIndexOfInt function.
Finds the last index of the given value in the array starting at the given index.
arrayIndexOfInt('colName', valueToFind)
These examples are based on the Hybrid Quick Start.
This section contains reference documentation for the ARRAYLENGTH function.
Returns the length of a multi-value column
ARRAYLENGTH('colName')
These examples are based on the Hybrid Quick Start.
The count(*)
values will increase each time we execute the query as data is constantly being ingested by the Hybrid Quick Start.
This section contains reference documentation for the arrayDistinctString function.
Returns unique values in an array of strings.
arrayDistinctString('colName')
These examples are based on the Hybrid Quick Start.
This section contains reference documentation for the arrayDistinctInt function.
Returns unique values in an array of ints.
arrayDistinctInt('colName')
These examples are based on the Hybrid Quick Start.
DivTailNums | index |
---|---|
DivAirportIDs | value |
---|---|
DivAirportIDs | index |
---|---|
length | count(*) |
---|---|
DivTailNums | unique |
---|---|
DivAirportIDs | unique |
---|---|
These examples are based on the .
FlightNum | reversedAirports | RandomAirports |
---|
These examples are based on the .
FlightNum | airports | DivAirportIDs |
---|
These examples are based on the .
FlightNum | airports | RandomAirports |
---|
These examples are based on the .
DivAirportIDs | value |
---|
These examples are based on the .
DivAirportIDs | sortedIds |
---|
N7713A,N7713A
0
N344AA,N344AA
-1
N7713A,N7713A
0
13891,12892
13891
13198,12892
13198
11066,12892
11066
13198,12892
13198
13891,12892
13891
13891,12892
-1
14683,14683
0
12339,12339
-1
13487,13930
-1
13029,11292
-1
1
5382
37
267
33
223
17
166
22
160
N7713A,N7713A
N7713A
N344AA,N344AA
N344AA
N344AA,N344AA
N344AA
N7713A,N7713A
N7713A
15016,11066
15016,11066
10620,14869
10620,14869
13891,12892
13891,12892
12264,10397
12264,10397
11066,12892
11066,12892
1206 | PSC,SEA | SEA,PSC |
5300 | PSC,SEA | SEA,PSC |
3359 | MSY,PHX,PSC,SEA | SEA,PSC,PHX,MSY |
1023 | PHX,PSC,SEA | SEA,PSC,PHX |
963 | MSY,PHX,PSC,SEA | SEA,PSC,PHX,MSY |
1531 | 13891 | 13891,12892 |
19 | 14683 | 14683,14683 |
829 | 12339 | 12339,12339 |
24 | 13198 | 13198,10721 |
548 | 10721 | 10721,12478 |
671 | SEA,PSC | SEA,PSC,PHX,MSY |
1767 | SEA,PSC | SEA,PSC,PHX |
2522 | SEA,PSC | SEA,PSC |
424 | SEA,PSC | SEA,PSC,PHX,MSY |
3162 | SEA,PSC | SEA,PSC,PHX,MSY |
SEA,PSC | PSC |
SEA,PSC,PHX,MSY | PSC,PHX,MSY |
SEA,PSC,PHX,MSY | PSC,PHX,MSY |
SEA,PSC | PSC |
SEA,PSC | PSC |
13891,12892 | 12892,13891 |
14683,14683 | 14683,14683 |
12339,12339 | 12339,12339 |
13198,10721 | 10721,13198 |
10721,12478 | 10721,12478 |
This section contains reference documentation for the arrayUnionInt function.
Create a union of two arrays of ints.
arrayUnionInt('colName1', 'colName2')
These examples are based on the Hybrid Quick Start.
This section contains reference documentation for the arraySortString function.
Sorts array of strings.
arraySortString('colName')
These examples are based on the Hybrid Quick Start.
This section contains reference documentation for the arrayUnionString function.
Create a union of two arrays of strings.
arrayUnionString('colName1', 'colName2')
These examples are based on the Hybrid Quick Start.
This section contains reference documentation for the AVGMV function.
Get the avg of values in a group
AVGMV(colName)
These examples are based on the Hybrid Quick Start.
This section contains reference documentation for the caseWhen function.
Returns values depending on boolean expressions. This function can only be used in an ingestion transformation function.
caseWhen(booleanExpr1, valueIfExpr1True, booleanExpr2, valueIfExpr2True) caseWhen(booleanExpr1, valueIfExpr1True, booleanExpr2, valueIfExpr2True, ... ,valueIfFalse)
The usage examples are based on extracting fields from the following JSON documents:
This function can be used in the table config to add northernHemisphere
column:
This section contains reference documentation for the arrayReverseInt function.
Reverses array of ints.
arrayReverseInt('colName')
These examples are based on the Hybrid Quick Start.
This section contains reference documentation for the day function.
This section contains reference documentation for base64 encode and decode functions.
toBase64
returns Base64 encoded string of input binary data (bytes
type).
fromBase64
returns binary data (represented as a Hex string) from Base64-encoded string.
toBase64(bytesCol)
fromBase64(stringCol)
Note that the following query will throw compilation error as string is not a valid input type for toBase64
.
This section contains reference documentation for the dayOfWeek function.
DivWheelsOffs | DivWheelsOns | unionIds |
---|---|---|
FlightNum | sortedAirports | RandomAirports |
---|---|---|
DivTailNums | DivAirports | unionIds |
---|---|---|
value |
---|
value |
---|
Arguments | Description |
---|---|
Expression | Value |
---|---|
value |
---|
value |
---|
value |
---|
value |
---|
value |
---|
DivAirportIDs | reversedIds |
---|---|
These examples are based on the .
value |
---|
These examples are based on the .
covariance |
---|
day |
---|
day |
---|
day |
---|
day |
---|
Encoding scheme follows
For better readability, the following examples converts string hello!
into BYTES using function and converts the decoded BYTES into string using .
encoded |
---|
decoded |
---|
Note that without UTF8 string conversion, returned BYTES will be represented as a Hex string following Pinot's . See the example below.
decoded |
---|
These examples are based on the .
covariance |
---|
dayOfWeek |
---|
dayOfWeek |
---|
dayOfWeek |
---|
dayOfWeek |
---|
1453,1731
1415,1623
1453,1731,1415,1623
1908,1758
1339,2310
1908,1758,1339,2310
1453,1731
1415,1623
1453,1731,1415,1623
1908,1758
1339,2310
1908,1758,1339,2310
3846
PSC,SEA
SEA,PSC
3635
MSY,PHX,PSC,SEA
SEA,PSC,PHX,MSY
429
MSY,PHX,PSC,SEA
SEA,PSC,PHX,MSY
1206
PSC,SEA
SEA,PSC
5300
PSC,SEA
SEA,PSC
N7713A,N7713A
IND,IND
N7713A,IND
N344AA,N344AA
MCI,BOS
N344AA,MCI,BOS
N7713A,N7713A
IND,IND
N7713A,IND
N344AA,N344AA
MCI,BOS
N344AA,MCI,BOS
18.465753424657535
65
booleanExpr1
A boolean expression
valueIfExpr1True
, valueIfExpr2True
A value to return.
CASEWHEN(latitude > 0, 'North', 'South')
North
CASEWHEN(latitude > 0, 1, 0)
1
A
13
-12
Apache Pinot
real-time__analytics
13891,12892
12892,13891
14683,14683
14683,14683
12339,12339
12339,12339
13487,13930
13930,13487
13029,11292
11292,13029
97889 |
8270.973200974102 |
12 |
13 |
12 |
13 |
aGVsbG8h |
hello! |
68656c6c6f21 |
2314.249154477403 |
7 |
1 |
7 |
1 |
This section contains reference documentation for the DISTINCTAVGMV function.
Returns the average of distinct row values in a group
DISTINCTAVGMV(colName)
These examples are based on the Hybrid Quick Start.
This section contains reference documentation for the COUNTMV function.
Get the count of rows in a group
COUNTMV(colName)
These examples are based on the Hybrid Quick Start.
The following query returns the documents that have a DivTailNums
with more than one value:
You can count the number of items in these rows by running the following query:
This section contains reference documentation for the DISTINCT function.
Returns the distinct row values in a group
DISTINCT(colName)
These examples are based on the Batch Quick Start.
This section contains reference documentation for the DISTINCTCOUNT function.
Returns the count of distinct row values in a group
DISTINCTCOUNT(colName)
These examples are based on the Batch Quick Start.
This section contains reference documentation for the DISTINCTAVG function.
Returns the average of distinct row values in a group
DISTINCTAVG(colName) or avg(distinct col)
These examples are based on the Batch Quick Start.
This section contains reference documentation for the DISTINCTCOUNTBITMAP function.
Returns the count of distinct row values in a group. This function is accurate for INT column, but approximate for other cases where hash codes are used in distinct counting and there may be hash collisions. For accurate distinct counting on all column types, see DISTINCTCOUNT.
DISTINCTCOUNTBITMAP(colName)
These examples are based on the Batch Quick Start.
This section contains reference documentation for the dayOfYear function.
This section contains reference documentation for the DISTINCTCOUNTRAWHLLMV function.
Returns HLL response serialized as string. The serialized HLL can be converted back into an HLL and then aggregated with other HLLs. A common use case may be to merge HLL responses from different Pinot tables, or to allow aggregation after client-side batching.
DISTINCTCOUNTRAWHLLMV(colName, log2m)
These examples are based on the Hybrid Quick Start.
This section contains reference documentation for the DISTINCTCOUNTRAWHLL function.
Returns HLL response serialized as string. The serialized HLL can be converted back into an HLL and then aggregated with other HLLs. A common use case may be to merge HLL responses from different Pinot tables, or to allow aggregation after client-side batching.
DISTINCTCOUNTRAWHLL(colName, log2m)
These examples are based on the Batch Quick Start.
This section contains reference documentation for the DISTINCTCOUNTHLL function.
Returns an approximate distinct count using HyperLogLog. It also takes an optional second argument to configure the log2m for the HyperLogLog. For accurate distinct counting, see DISTINCTCOUNT.
DISTINCTCOUNTHLL(colName, log2m)
These examples are based on the Batch Quick Start.
This section contains reference documentation for the DISTINCTCOUNTMV function.
Returns the count of distinct row values in a group
DISTINCTCOUNTMV(colName)
These examples are based on the Hybrid Quick Start.
The following query returns the documents that have a DivTailNums
with more than one value:
You can count the distinct number of items in these rows by running the following query:
This section contains reference documentation for the DISTINCTCOUNTBITMAPMV function.
Returns the count of distinct row values in a group. This function is accurate for INT or dictionary encoded column, but approximate for other cases where hash codes are used in distinct counting and there may be hash collision.
DISTINCTCOUNTBITMAPMV(colName)
These examples are based on the Hybrid Quick Start.
This section contains reference documentation for the DISTINCTCOUNTHLLMV function.
Returns an approximate distinct count using HyperLogLog in a group
DISTINCTCOUNTHLLMV(colName)
These examples are based on the Hybrid Quick Start.
This section contains reference documentation for the DISTINCTCOUNTTHETASKETCH function.
DistinctCountThetaSketch(<thetaSketchColumn>, <thetaSketchParams>, predicate1, predicate2..., postAggregationExpressionToEvaluate) -> Long
thetaSketchColumn
(required): Name of the column to aggregate on.
thetaSketchParams
(required): Parameters for constructing the intermediate theta-sketches.
Currently, the only supported parameter is nominalEntries
(defaults to 4096).
predicates
(optional)_: _ These are individual predicates of form lhs <op> rhs
which are applied on rows selected by the where
clause. During intermediate sketch aggregation, sketches from the thetaSketchColumn
that satisfies these predicates are unionized individually. For example, all filtered rows that match country=USA
are unionized into a single sketch. Complex predicates that are created by combining (AND/OR) of individual predicates is supported.
postAggregationExpressionToEvaluate
(required): The set operation to perform on the individual intermediate sketches for each of the predicates. Currently supported operations are SET_DIFF, SET_UNION, SET_INTERSECT
, where DIFF requires two arguments and the UNION/INTERSECT allow more than two arguments.
We can also provide predicates and a post aggregation expression to compute more complicated cardinalities. For example, we could can find the intersection of the following queries:
(the yearId 1986
is the only one in common)
By running the following query:
This section contains reference documentation for the DISTINCTCOUNTRAWTHETASKETCH function.
DISTINCTCOUNTRAWTHETASKETCH(<thetaSketchColumn>, <thetaSketchParams>, predicate1, predicate2..., postAggregationExpressionToEvaluate) -> HexEncoded
thetaSketchColumn
(required): Name of the column to aggregate on.
thetaSketchParams
(required): Parameters for constructing the intermediate theta-sketches.
Currently, the only supported parameter is nominalEntries
(defaults to 4096).
predicates
(optional)_: _ These are individual predicates of form lhs <op> rhs
which are applied on rows selected by the where
clause. During intermediate sketch aggregation, sketches from the thetaSketchColumn
that satisfies these predicates are unionized individually. For example, all filtered rows that match country=USA
are unionized into a single sketch. Complex predicates that are created by combining (AND/OR) of individual predicates is supported.
postAggregationExpressionToEvaluate
(required): The set operation to perform on the individual intermediate sketches for each of the predicates. Currently supported operations are SET_DIFF, SET_UNION, SET_INTERSECT
, where DIFF requires two arguments and the UNION/INTERSECT allow more than two arguments.
We can also provide predicates and a post aggregation expression to compute more complicated cardinalities:
This section contains reference documentation for the fromEpochBucket functions.
This section contains reference documentation for the DATETIMECONVERT function.
Converts the value from a column that contains an epoch timestamp into another time unit and buckets based on the given time granularity.
DATETIMECONVERT(columnName, inputFormat, outputFormat, outputGranularity)
inputFormat
and outputFormat
are defined using the following structure:
<time size>:<time unit>:<time format>:<pattern>
where:
time size
- size of the time unit eg: 1, 10
time unit
- DAYS
, HOURS
, MINUTES
, SECONDS
, MILLISECONDS
, MICROSECONDS
, NANOSECONDS
time format
EPOCH
SIMPLE_DATE_FORMAT
pattern - defined in case of SIMPLE_DATE_FORMAT
e.g. yyyy-MM-dd
. A specific timezone can be passed using tz(timezone)
. Timezone can be long or short string format timezone. e.g. Asia/Kolkata
or PDT
granularity
is specified in the format <time size>:<time unit>
.
created_at_timestamp
from milliseconds since epoch to days since epoch, bucketed to 1 day granularity:
created_at_timestamp
bucketed to 15 minutes granularity:
created_at_timestamp
to format yyyy-MM-dd
, bucketed to 1 days granularity:
created_at_timestamp
to format yyyy-MM-dd HH:mm
, in timezone Pacific/Kiritimati
:
created_at_timestamp
to format yyyy-MM-dd
, in timezone Pacific/Kiritimati
and bucketed to 1 day granularity:
This section contains reference documentation for the DATETRUNC function.
(Presto) SQL compatible date truncation, equivalent to the Presto function date_trunc
.
Converts the value into a specified output granularity seconds since UTC epoch that is bucketed on a unit in a specified timezone.
DATETRUNC(unit, timeValue)
DATETRUNC(unit, timeValue, inputTimeUnitStr)
DATETRUNC(unit, timeValue, inputTimeUnitStr, timeZone)
DATETRUNC(unit, timeValue, inputTimeUnitStr, timeZone, outputTimeUnitStr)
unit
supports the following values:
millisecond
second
minute
hour
day
week
month
quarter
year
inputTimeUnitStr
and outputTimeUnitStr
support the following values:
NANOSECONDS
MICROSECONDS
MILLISECONDS
SECONDS
MINUTES
HOURS
DAYS
Truncates an epoch in milliseconds at WEEK
(where a Week starts at Monday UTC midnight):
or
Truncates an epoch in milliseconds at WEEK
(where a Week starts at Monday UTC midnight) in the UTC
time zone, returning a result in epoch in seconds in UTC timezone:
Truncates an epoch in milliseconds at WEEK
(where a Week starts at Monday UTC midnight) in the CET
time zone, returning a result in epoch in seconds in UTC timezone:
Truncates an epoch in milliseconds at QUARTER
in the Los Angeles time zone (where a Quarter begins on Jan 1st, April 1st, July 1st, October 1st in Los Angeles timezone), returning a result in hours since UTC epoch:
VALUE |
---|
DivTailNums |
---|
value |
---|
value |
---|
value |
---|
value |
---|
value |
---|
VALUE |
---|
VALUE |
---|
value |
---|
value |
---|
dayOfYear |
---|
dayOfYear |
---|
dayOfYear |
---|
dayOfYear |
---|
value |
---|
value |
---|
value |
---|
value |
---|
value |
---|
value |
---|
DivTailNums |
---|
value |
---|
value |
---|
value |
---|
value |
---|
The framework enables set operations over a stream of data, and can also be used for cardinality estimation. Pinot leverages the and its extensions from the library org.apache.datasketches:datasketches-java:1.2.0-incubating
to perform distinct counting as well as evaluating set operations.
These examples are based on the .
value |
---|
value |
---|
yearID |
---|
yearID |
---|
value |
---|
value |
---|
value |
---|
These examples are based on the .
VALUE |
---|
The framework enables set operations over a stream of data, and can also be used for cardinality estimation. Pinot leverages the and its extensions from the library org.apache.datasketches:datasketches-java:1.2.0-incubating
to perform distinct counting as well as evaluating set operations.
These examples are based on the .
value |
---|
value |
---|
value |
---|
bucket |
---|
bucket |
---|
bucket |
---|
bucket |
---|
bucket |
---|
epochMillis |
---|
epochMillis |
---|
epochMillis |
---|
epochMillis |
---|
value |
---|
value |
---|
These examples are based on the .
id | created_at_timestamp | timeInMs | convertedTime |
---|
id | created_at_timestamp | timeInMs | convertedTime |
---|
id | created_at_timestamp | timeInMs | convertedTime |
---|
id | created_at_timestamp | timeInMs | convertedTime |
---|
id | created_at_timestamp | timeInMs | convertedTime |
---|
ts |
---|
ts |
---|
ts |
---|
ts |
---|
These examples are based on the .
homeRuns | numberOfGames | total |
---|
Converts a formatted date-time string to milliseconds, based on the provided .
epochMillis |
---|
epochMillis |
---|
epochMillis |
---|
epochMillis |
---|
32.4
N7713A,N7713A
N344AA,N344AA
N344AA,N344AA
N7713A,N7713A
8
NL
UA
AL
NA
PL
AA
FL
NL
UA
AL
NA
PL
AA
FL
7
149
83.36526946107784
349.1158798283262
7
148
346
347
346
347
00000008000000ac00000000000000000000000500000020000000000030000202108000040000010000000300010400000000000000000000000463000000000000000000010001041000200000002000000000000000000a00000000028001000000010800000000010000001008000000804000000000020000040000880000000000000000000000000000000000000000000000800000000800020004000000840000000002000000000000000000001400
0000000100000004000000e4
00000008000000ac00000800000084000210000000000020001020220030042002100420002010020210000300008020040180400001300310001863024004220870800004400421040104610220080000020000040000030000800002108420000110400800000106000060000080020000082000218c0002000000020000010200100000018c0006000400022004a0000088000200800000320820021000000221842000000000025088000220080100009420
000000010000000400000106
158
149
N7713A,N7713A
N344AA,N344AA
N344AA,N344AA
N7713A,N7713A
2
34
2
34
149 |
146 |
1986 |
1985 |
1937 |
2003 |
1979 |
1900 |
1986 |
1978 |
2012 |
1 |
2.718281828459045 |
162754.79141900392 |
1134 |
AgMDAAAKzJOVAAAAAACAPwDAATjfLK5fBJQy2rIU1GYLOK5a09G+XQ1UHWt00/NwFTC4EwzexhE3CHBSU+YIUzkM0goIADEeFViAmzCRcx5FeHrMHfGsU/qrFvMP+Q87UYRC7LFzZ0FV3PIfAF1FMFsM+E9XRwZRYoR79VdK7z1jAD/WClziDmb4Cosm3ctidcRl9VxfNTR47OUFqFP4dYQkZwXIEZtEhngdkGfqkQCKZPX85HITAZrwVDpI4TY6paDTZwLQNiemHFCUlEZCKcOMpkXuYypOxjzXi1ES+07IIH7EqrQeKcssHvOUh2gpzIDajYdQ4UTS6IBoXPB6AtbomPBiMalFURDzh+xppzrg5HcUTMW4Iuzgv5Mz/xIm73yOe7seghzwmH+zXUfda/mkaBqU6XQEAQFagTkndhYHHcjLb0OeQg4BGDAHtRIDD8EqsonkilQT6TZq2uM3CRXJQTlaYewzFvHsKivVomgcQRojVnPKBh0d0GgYeF4eIEXtD1bZTw43eVR1Dk6sBj3pjleOW21dRsUCRmyEDGdIfWQVJXouaUnZqaC9gi1oSrG7GT8HO2xXeb32OzfiHVx5s9+5bGpFXoXTu1n7g2Jone8JMyGuam2x7Bt55a1JdtFCFxhZ2Gd7IajHY4lNBH2lDfUoJed4f7kGUEXmlU6BCfwOkJ1CIoWBTQY+NToDhpmmmPY+rVOH5coybBHlH4vpfPBbbQsOjl0YBSC9uEmZ3WubqnV0KZ1p5d7wq/F0p7Wgo8y4JVXAobKCB+hsVckBNIA4XrYMzdWVSWeQsXHSuR+mWmJPftadyrMlfvoy2mVr8R4Dih7k3XNhXZwjBeuNJQA5Dtci6w0uIUczvEL+nY+9CSHEPQhuT//aluJ2De4Fk94cfWgaxqhYyh10TTIWZFmsDxJeOMaPT1BCwVRF6taOjftNbVDC5Fy1BtVzVIIUOGeBcj5VbhHtqowIB1qGEDIJy9ZBXD73iFBN5kVgvicaFGSKHGQqeIVsgOFdcFKITQTuV2d0pkljkPXKUIc68M0KPpU6iZYuaBA4+hGR9nri0tVnbJZOM1Z/fi01ou5YLYCoHTqkImozpJMYXLCqKtTBm2o7sc5oQATXUBC9dqM8xQoGL8OmltUWc1cX35rtD2D2zHL2IncEKMzsN/c6S31W74VTBtcbJfP9rHENp7yO453qYhA7m++jl2MKFzdvtkHqGDUcs9FKisV9Hx+ruhaGsLkdISszkZ3sYykjx3NH6BbbaCZf9jTswuxHKheTbaEDmSgrx7BfK+Z2My4jdMqCrEtKMSuJqEJ22AM5U8MNFVkCPTobkCEdJx0ZQJu+Tk73t1v3nqLUQH4PbFJzcUrr9yZFZ0u+1mzNNQ5o0w+v1dSRLGsXsPyRqGkQchuz/DKyrjJzf9Vb8HY4Ni63XiaXwgJrjq9rgAp6EmWV2xXUOI9CWZa7HsuRWO95m58nIq9K8VCkO+T/rWwrPqZ/tCgEtkshqecNhszQiki0d5Kf26o/YcATx4ZkJ655y4PTVr+kY0Xbb/UwEo2pPd3Hyd4hVz1I5N9TpYaJk2Lok1+7N+3LG+3Lj3KZtd5/+j8RujEmogI= |
AwMDAAAKzJMQAAAAAACAP4vpfPBbbQsO5N1zYV2cIwWFgU0GPjU6A4Z4HZBn6pEAyQE0gDhetgyKZPX85HITAQ4BGDAHtRIDEDub76OXYwoxK4moQnbYA9LogGhc8HoCE+k2atrjNwlVbhHtqowIBzd5VHUOTqwG+aRoGpTpdAT6PxG6MSaiAnshqMdjiU0EHEEaI1ZzygY= |
AQMDAAA6zJN8QPYIsvHMNQ== |
1613472303000 |
3226944606000 |
1613472000000 |
1613466000000 |
1613088000000 |
1613472303000 |
1613472300000 |
1613469600000 |
1613433600000 |
12 |
-13 |
7044874134 | 2018-01-01 11:00:00.0 | 1514804402000 | 17532 |
7044874134 | 2018-01-01 11:00:00.0 | 1514804402000 | 1514804400000 |
7044874134 | 2018-01-01 11:00:00.0 | 1514804402000 | 2018-01-01 |
7044874134 | 2018-01-01 11:00:00.0 | 1514804402000 | 2018-01-02 01:00 |
7044874134 | 2018-01-01 11:00:00.0 | 1514804402000 | 2018-01-02 00:00 |
1639353600000 |
1639353600 |
1639350000 |
453631 |
26 | 153 | 0.16993464052287582 |
1565136000000 |
1565190733000 |
1565190733000 |
1565190733000 |
This section contains reference documentation for the isSubnetOf function.
Takes 2 arguments of type STRING. The first argument is an ipPrefix
, and the second argument is a single ipAddress
. This function handles both IPv4 and IPv6 arguments.
Returns a boolean value checking if ipAddress
is in the subnet of ipPrefix
isSubnetOf(ipPrefix, ipAddress) -> boolean
See the following sample queries where isSubnetOf
is used in different parts of the query.
This section contains reference documentation for the FUNNELCOUNT function.
Funnel analytics aggregation function.
Returns array of distinct correlated counts for each funnel step.
FUNNEL_COUNT (
STEPS ( predicate1, predicate2 ... ),
CORRELATED_BY ( correlation_column ),
SETTINGS ( setting1, setting2 ... ) )
Many datasets are time series in nature, tracking events of an entity over time. An example of such a dataset could be a user analytics activity log from a commerce web application.
We want to analyse the following checkout funnel:
/cart/add
/checkout/start
/checkout/confirmation
We want to answer the following questions about the above funnel:
How many users entered the top of the funnel?
How many of these users proceeded to the second step?
How many users reached the bottom of the funnel after completing all steps?
Notes
Notice that although U1 user added to cart twice, it still counted as one conversion in the first step, as we report on unique counts rather than total events. Also notice that although U2 events were logged out of order, we still counted the user as converted.
Equivalence
The above query is equivalent to the below presto SQL query:
For a large dataset we could use for example a theta_sketch strategy, or furthermore, partition the data by user_id and leverage a partitioned strategy. It is also important to filter in the where clause so to aggregate only necessary rows.
We now want to learn how many users checkout after a text search; as opposed to other entry points such as browsing a product category listing. We want to then analyse the following funnel:
/product/search
/cart/add
/checkout/start
/checkout/confirmation
Notes
Notice that U1 is not counted in this funnel, as the user did not perform any product search. Both U2 and U3 entered the top of the funnel and performed the second step, but only U2 converted to the bottom of the funnel.
This section contains reference documentation for the HISTOGRAM function.
Returns the count of data points that fall within each bin as a vector. The bins are left-inclusive and right-exclusive, i.e. [a, b)
, except for the last one which is inclusive on both sides [a, b]
.
Equal length bins (better performance):
HISTOGRAM(colName, lower, upper, numBins)
Arbitrary increasing bin edges:
HISTOGRAM(colName, ARRAY[binEdge1, binEdge2, binEdge3, ...])
These examples are based on the Batch Quick Start.
10 equal-length bins [0, 20), [20, 30) ... [180, 200]
6 bins (- ∞, 1), [1, 10), [10, 50), [50,100), [100,500), [500, 1000]
This section contains reference documentation for the JSONPATHARRAY function.
Extracts an array from jsonField
based on 'jsonPath'
, the result type is inferred based on JSON value. This function can only be used in an ingestion transformation function.
JSONPATHARRAY(jsonField, 'jsonPath')
'jsonPath'
` is a literal. Pinot uses single quotes to distinguish them from identifiers.
You can use the Jayway JsonPath Evaluator Tool to test JSON expressions before you import any data.
The usage examples are based on extracting fields from the following JSON document:
This function can be used in the table config to extract the name
, score
, and second value of homework_grades
into their respective columns , as described below:
This section contains reference documentation for the JSONPATHARRAYDEFAULTEMPTY function.
Extracts an array from jsonField
based on 'jsonPath'
, the result type is inferred based on JSON value. Returns empty array for null or parsing error. This function can only be used in an ingestion transformation function.
JSONPATHARRAYDEFAULTEMPTY(jsonField, 'jsonPath')
'jsonPath'
` is a literal. Pinot uses single quotes to distinguish them from identifiers.
You can use the Jayway JsonPath Evaluator Tool to test JSON expressions before you import any data.
The usage examples are based on extracting fields from the following JSON document:
This function can be used in the table config to extract the name
, score
, and second value of homework_grades
into their respective columns , as described below:
This section contains reference documentation for the JSONFORMAT function.
Extracts the object value from jsonField based on 'jsonPath', the result type is inferred based on JSON value. This function can only be used in an ingestion transformation function.
JSONFORMAT(object)
The usage examples are based on extracting fields from the following JSON document:
This function can be used in the table config to extract the meta
property into the data
column, as described below:
This section contains reference documentation for the JSONPATH function.
Extracts the object value from jsonField based on 'jsonPath', the result type is inferred based on JSON value. This function can only be used in an ingestion transformation function.
JSONPATH(jsonField, 'jsonPath')
'jsonPath'
` is a literal. Pinot uses single quotes to distinguish them from identifiers.
You can use the Jayway JsonPath Evaluator Tool to test JSON expressions before you import any data.
The usage examples are based on extracting fields from the following JSON document:
This function can be used in the table config to extract the name
property into the name
column and age
property into the age
column, as described below:
This section contains reference documentation for the JSONPATHSTRING function.
Extracts the String value from jsonField
based on 'jsonPath'
, use optional defaultValue
for null or parsing error. This function can only be used in an ingestion transformation function.
JSONPATHSTRING(jsonField, 'jsonPath', [defaultValue])
'jsonPath'
` is a literal. Pinot uses single quotes to distinguish them from identifiers.
You can use the JSONPath Online Evaluator to test JSON expressions before you import any data.
The usage examples are based on extracting fields from the following JSON document:
This function can be used in the table config to extract the age
property into the age
column, as described below:
This section contains reference documentation for the JSONPATHLONG function.
Extracts the Long value from jsonField
based on 'jsonPath'
, use optional defaultValue
for null or parsing error. This function can only be used in an ingestion transformation function.
JSONPATHLONG(jsonField, 'jsonPath', [defaultValue])
'jsonPath'
` is a literal. Pinot uses single quotes to distinguish them from identifiers.
You can use the Jayway JsonPath Evaluator Tool to test JSON expressions before you import any data.
The usage examples are based on extracting fields from the following JSON document:
This function can be used in the table config to extract the age
property into the age
column, as described below:
This section contains reference documentation for the JSONEXTRACTSCALAR function.
Evaluates the 'jsonPath' on jsonField, returns the result as the type 'resultsType', use optional defaultValuefor null or parsing error.
JSONEXTRACTSCALAR(jsonField, 'jsonPath', 'resultsType', [defaultValue])
'jsonPath'
and`` ``
'results_type'
are literals. Pinot uses single quotes to distinguish them from identifiers.
The examples in this section are based on the Batch JSON Quick Start. In particular we'll be querying the row WHERE id = 7044874109
:
The following examples show how to use the JSONEXTRACTSCALAR
function:
This section contains reference documentation for the JSONEXTRACTKEY function.
Extracts all matched JSON field keys based on 'jsonPath' into a STRING_ARRAY.
JSONEXTRACTKEY(jsonField, 'jsonPath')
'jsonPath'
` is a literal. Pinot uses single quotes to distinguish them from identifiers.
The examples in this section are based on the Batch JSON Quick Start. In particular we'll be querying the row WHERE id = 7044874109
.
This section contains reference documentation for the JSONPATHDOUBLE function.
Extracts the Double value from jsonField
based on 'jsonPath'
, use optional defaultValue
for null or parsing error. This function can only be used in an ingestion transformation function.
JSONPATHDOUBLE(jsonField, 'jsonPath', [defaultValue])
'jsonPath'
` is a literal. Pinot uses single quotes to distinguish them from identifiers.
You can use the Jayway JsonPath Evaluator Tool to test JSON expressions before you import any data.
The usage examples are based on extracting fields from the following JSON document:
This function can be used in the table config to extract the age
property into the age
column, as described below:
hour |
---|
hour |
---|
Parameter | Arguments | Description |
---|---|---|
user_id | event_time | url |
---|---|---|
counts |
---|
counts |
---|
counts |
---|
histogram |
---|
histogram |
---|
Arguments | Description |
---|---|
Expression | Value |
---|---|
Arguments | Description |
---|---|
Expression | Value |
---|---|
Expression | Value |
---|---|
Arguments | Description |
---|---|
Expression | Value |
---|---|
Arguments | Description |
---|---|
Expression | Value |
---|---|
Arguments | Description |
---|---|
Expression | Value |
---|---|
Arguments | Description |
---|---|
repo |
---|
id | name |
---|---|
id | name |
---|---|
Arguments | Description |
---|---|
id | repo | keys |
---|---|---|
Arguments | Description |
---|---|
Expression | Value |
---|---|
value |
---|
These examples are based on the .
value |
---|
name |
---|
notTrimmed | trimmed |
---|
value |
---|
value |
---|
These examples are based on the .
value |
---|
These examples are based on the .
value |
---|
millisecond |
---|
millisecond |
---|
These examples are based on the .
value |
---|
These examples are based on the .
value |
---|
These examples are based on the .
value |
---|
These examples are based on the .
event_id | location | hash |
---|
23
0
STEPS
predicates 1...n
(required) These are individual predicates representing funnel steps which are applied on rows selected by the where
clause. Distinct values from the correlation_column
that satisfy these predicates are counted per step. For example, all filtered rows that match url = '/checkout'
are unionized into a set. Sets are intersected with the sets resulted from the preceding steps, each step retaining only individuals present in previous steps. Finally, unique counts are returned for each step in the funnel.
CORRELATED_BY
correlation_column
(required) Column to leverage for funnel correlation, distinct values from this column are counted per step during aggregation. Only dictionary-encoded columns are supported.
SETTINGS
settings 1...n
(optional) Settings to select and configure a funnel counting strategy:
bitmap
(default): This strategy is accurate for INT column, but approximate for other cases where hash codes are used in distinct counting and there may be hash collisions. For accurate distinct counting on all column types, use 'set' instead. See also DISTINCTCOUNTBITMAP.
set
: This strategy uses fastutil hash sets. Use with care, unbounded memory cost. See also DISTINCTCOUNT.
theta_sketch
: This strategy leverages Theta Sketch framework to provide an approximate funnel count with a small memory footprint. See also DISTINCTCOUNTTHETASKETCH.
nominalEntries
: theta-sketch strategy parameter (defaults to 4096). Can only be used in conjunction with theta_sketch setting.
partitioned
: This strategy counts funnel steps per segment, then sums up step counts across segments. Correlation column should be configured as partition column for this strategy. See also SEGMENTPARTITIONEDDISTINCTCOUNT.
sorted
: This strategy counts funnel steps per segment with zero memory footprint. Correlation column should be configured as sort column for this strategy. Can only be used in conjunction with partitioned
setting.
U1
2021-10-01 09:01:00.000
/product/listing
U2
2021-10-01 09:17:00.000
/product/search
U1
2021-10-01 09:33:00.000
/product/details
U1
2021-10-01 09:47:00.000
/cart/add
U3
2021-10-01 10:02:00.000
/product/listing
U3
2021-10-01 10:05:00.000
/product/search
U2
2021-10-01 10:06:00.000
/product/search
U2
2021-10-01 10:15:00.000
/checkout/start
U2
2021-10-01 10:16:00.000
/cart/add
U3
2021-10-01 11:17:00.000
/product/details
U2
2021-10-01 11:18:00.000
/checkout/confirmation
U3
2021-10-01 11:21:00.000
/cart/add
U1
2021-10-01 11:33:00.000
/cart/add
U1
2021-10-01 11:46:00.000
/checkout/start
U1
2021-10-01 11:54:00.000
/checkout/confirmation
3, 2, 2
3, 2, 2
2, 2, 1, 1
32348,21519,11359,7587,5488,5360,6282,7361,585,0
13520,16506,18375,12403,28591,8494
jsonField
An Identifier/Expression contains JSON documents.
'jsonPath'
Follows JsonPath Syntax to read values from JSON documents.
JSONPATHARRAY(myJsonRecord, '$.subjects[*].name')
["maths", "english"]
JSONPATHARRAY(myJsonRecord, '$.subjects[*].score')
[90, 70]
JSONPATHARRAY(myJsonRecord, '$.subjects[*].homework_grades[1]')
[85, 65]
jsonField
An Identifier/Expression contains JSON documents.
'jsonPath'
Follows JsonPath Syntax to read values from JSON documents.
JSONPATHARRAYDEFAULTEMPTY(myJsonRecord, '$.subjects[*].name')
["maths", "english"]
JSONPATHARRAYDEFAULTEMPTY(myJsonRecord, '$.subjects[*].score')
[90, 70]
JSONPATHARRAYDEFAULTEMPTY(myJsonRecord, '$.subjects[*].homework_grades[1]')
[85, 65]
JSONPATHARRAYDEFAULTEMPTY(myJsonRecord, '$.subjects[*].homework_grades[7]')
[]
JSONFORMAT(meta)
"{\"age\":12}"
jsonField
An Identifier/Expression contains JSON documents.
'jsonPath'
Follows JsonPath Syntax to read values from JSON documents.
JSONPATH(data, '$.name')
"Pete"
JSONPATH(data, '$.age')
24
jsonField
An Identifier/Expression contains JSON documents.
'jsonPath'
Follows JsonPath Syntax to read values from JSON documents.
JSONPATHSTRING(data, '$.age')
"24"
JSONPATHSTRING(data, '$.name["nick.name"]')
"Pete"
jsonField
An Identifier/Expression contains JSON documents.
'jsonPath'
Follows JsonPath Syntax to read values from JSON documents.
JSONPATHLONG(data, '$.age')
24
jsonField
An Identifier/Expression contains JSON documents.
'jsonPath'
Follows JsonPath Syntax to read values from JSON documents.
'results_type'
One of the Pinot supported data types:INT, LONG, FLOAT, DOUBLE, BOOLEAN, TIMESTAMP, STRING,
INT_ARRAY, LONG_ARRAY, FLOAT_ARRAY, DOUBLE_ARRAY, STRING_ARRAY
.
{"id":115911530,"name":"LimeVista/Tapes","url":"https://api.github.com/repos/LimeVista/Tapes"}
7044874109
LimeVista/Tapes
7044874109
dummyValue
jsonField
An Identifier/Expression contains JSON documents.
'jsonPath'
Follows JsonPath Syntax to read values from JSON documents.
7044874109
{"id":115911530,"name":"LimeVista/Tapes","url":"https://api.github.com/repos/LimeVista/Tapes"}
$['id'],$['name'],$['url']
jsonField
An Identifier/Expression contains JSON documents.
'jsonPath'
Follows JsonPath Syntax to read values from JSON documents.
JSONPATHDOUBLE(data, '$.age')
24.0
********Hello, World |
108 |
pinot |
|
|
0 |
2.4849066497880004 |
1871 |
73 |
0 |
0 |
106 |
142 |
2 |
282776561 | 80406178a3d70a3d714041d5c28f5c28f6 | 92a8b787e81672261aad8afcf9de3aee |
This section contains reference documentation for the percentile function.
Returns the max
- min
value in a group
percentile(colName, percentile)
These examples are based on the Batch Quick Start.
This section contains reference documentation for the PERCENTILEMV function.
Returns the Nth percentile of the group where N is a decimal number between 0 and 100 inclusive
PERCENTILEMV(colName, N)
These examples are based on the Hybrid Quick Start.
This section contains reference documentation for the PERCENTILEESTMV function.
Returns the Nth percentile of the group using Quantile Digest algorithm.
PERCENTILEESTMV(colName, N)
These examples are based on the Hybrid Quick Start.
This section contains reference documentation for the MULT function.
Product of at least two values
MULT(col1, col2, col3...)
These examples are based on the Batch Quick Start.
This section contains reference documentation for the percentileest function.
Returns the Nth percentile of the group using Quantile Digest algorithm.
percentileest(colName, percentile)
These examples are based on the Batch Quick Start.
value |
---|
value |
---|
value |
---|
month |
---|
month |
---|
now |
---|
value |
---|
value |
---|
value |
---|
value |
---|
value |
---|
value |
---|
homeRuns | baseOnBalls | total |
---|---|---|
value |
---|
value |
---|
value |
---|
minute |
---|
minute |
---|
value |
---|
value |
---|
0
4
46
9
10
1639150454255
10
44
108
10
44
108
26
37
962
0
4
46
30
0
2
0
This section contains reference documentation for the DISTINCTSUM function.
Returns the sum of distinct row values in a group
DISTINCTSUM(colName) or sum(distinct col)
These examples are based on the Batch Quick Start.
This section contains reference documentation for the mode function.
Get the most frequent value in a group. When multiple modes are present it gives the minimum of all the modes. This behavior can be overridden to get the maximum or the average mode.
MODE(colName, [reducerType])
These examples are based on the Batch Quick Start.
value |
---|
VALUE |
---|
VALUE |
---|
value |
---|
value |
---|
value |
---|
value |
---|
5
13922
244032
2008
2010
2008
2012