arrow-left

All pages
gitbookPowered by GitBook
triangle-exclamation
Couldn't generate the PDF for 163 pages, generation stopped at 100.
Extend with 50 more pages.
1 of 100

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

arrayConcatLong

This section contains reference documentation for the arrayConcatLong function.

Concatenates two arrays of longs.

hashtag
Signature

arrayConcatLong('colName1', 'colName2')

hashtag
Usage Examples

This example assumes the multiValueTable columns mvCol1 and mvCol2 are both of type LONG with singleValueField in the table schema set to false.

ABS

This section contains reference documentation for the abs function.

Absolute of a value

hashtag
Signature

ABS(col1)

hashtag
Usage Examples

value
value

arrayConcatDouble

This section contains reference documentation for the arrayConcatDouble function.

Concatenates two arrays of doubles.

hashtag
Signature

arrayConcatDouble('colName1', 'colName2')

hashtag
Usage Examples

This example assumes the multiValueTable columns mvCol1 and mvCol2 are both of type DOUBLE with singleValueField in the table schema set to false.

arrayConcatFloat

This section contains reference documentation for the arrayConcatFloat function.

Concatenates two arrays of floats.

hashtag
Signature

arrayConcatFloat('colName1', 'colName2')

hashtag
Usage Examples

This example assumes the multiValueTable columns mvCol1 and mvCol2 are both of type FLOAT with singleValueField in the table schema set to false.

concat

This section contains reference documentation for the concat function.

Concatenate two input strings using the seperator

hashtag
Signature

CONCAT(col1, col2, seperator)

hashtag
Usage Examples

value
value

codepoint

This section contains reference documentation for the CODEPOINT function.

the Unicode codepoint of the first character of the string

hashtag
Signature

CODEPOINT(col)

hashtag
Usage Examples

value

ADD

This section contains reference documentation for the ADD function.

Sum of at least two values

hashtag
Signature

ADD(col1, col2, col3...)

hashtag
Usage Examples

These examples are based on the .

homeRuns
baseOnBalls
total

length

This section contains reference documentation for the length function.

calculate length of the string

hashtag
Signature

LENGTH(col)

hashtag
Usage Examples

value

COVAR_SAMP

This section contains reference documentation for the COVAR_SAMP function.

Returns the sample covariance between of 2 numerical columns.

COVAR_SAMP(col1, col2) = COVAR_POP(col1, col2) * besselCorrection

hashtag
Signatures

COVAR_SAMP(col1, col2) -> double

hashtag
Usage Examples

These examples are based on the .

covariance

count

This section contains reference documentation for the count function.

Get the count of rows in a group

hashtag
Signature

COUNT(colName)

hashtag
Usage Examples

These examples are based on the .

value

lower

This section contains reference documentation for the lower function.

Converts string to lower case.

hashtag
Signature

LOWER(col)

hashtag
Usage Examples

name

ltrim

This section contains reference documentation for the ltrim function.

trim spaces from left side of the string

hashtag
Signature

LTRIM(col)

hashtag
Usage Examples

notTrimmed
trimmed

ln

This section contains reference documentation for the ln function.

Natural log of value i.e. ln(col1)

hashtag
Signature

LN(col1)

hashtag
Usage Examples

value
value

COVAR_POP

This section contains reference documentation for the COVAR_POP function.

Returns the population covariance between of 2 numerical columns.

COVAR_POP(col1, col2) = E[col1 * col2] - E[col1]E[col2]

hashtag
Signatures

COVAR_POP(col1, col2) -> double

hashtag
Usage Examples

These examples are based on the .

covariance

DISTINCTCOUNTBITMAPMV

This section contains reference documentation for the DISTINCTCOUNTHLLMV function.

Returns an approximate distinct count using HyperLogLog in a group.

hashtag
Signature

DISTINCTCOUNTHLLMV(colName)

hashtag
Usage Examples

These examples are based on the .

value

max

This section contains reference documentation for the max function.

Get the maximum value in a group

hashtag
Signature

MAX(colName)

hashtag
Usage Examples

These examples are based on the .

value

minmaxrange

This section contains reference documentation for the minmaxrange function.

Returns the max - min value in a group

hashtag
Signature

MINMAXRANGE(colName)

hashtag
Usage Examples

These examples are based on the .

value

CHR

This section contains reference documentation for the CHR function.

the character corresponding to the Unicode codepoint

hashtag
Signature

CHR(codepoint)

hashtag
Usage Examples

value

MAXMV

This section contains reference documentation for the MAXMV function.

Get the maximum value in a group

hashtag
Signature

MAXMV(colName)

hashtag
Usage Examples

These examples are based on the .

value

lpad

This section contains reference documentation for the LPAD function.

string padded from the left side with pad to reach final size

hashtag
Signature

LPAD(col, size, pad)

hashtag
Usage Examples

value

MOD

This section contains reference documentation for the MOD function.

Modulo of two values

hashtag
Signature

MOD(col1, col2)

hashtag
Usage Examples

value
value

min

This section contains reference documentation for the min function.

Get the minimum value in a group

hashtag
Signature

MIN(colName)

hashtag
Usage Examples

These examples are based on the .

value

arraySortInt

This section contains reference documentation for the arraySortInt function.

Sorts array of ints.

hashtag
Signature

arraySortInt('colName')

arrayContainsString

This section contains reference documentation for the arrayContainsString function.

Checks if string value exists in array.

hashtag
Signature

arrayContainsString('colName', valueToFind)

arrayRemoveString

This section contains reference documentation for the arrayRemoveString function.

Removes value from array of strings.

hashtag
Signature

arrayRemoveString('colName', value)

arrayConcatString

This section contains reference documentation for the arrayConcatString function.

Concatenates two arrays of strings.

hashtag
Signature

arrayConcatString('colName1', 'colName2')

EXPR_MIN / EXPR_MAX

This section contains reference documentation for the EXPR_MIN and EXPR_MAX function.

This function scans the given dataset to identify the maximum and minimum values in the specified measuring columns. Once these extreme values (the maxima and minima) are found, the function locates the corresponding entries in the projection column. These entries are associated with the rows where the extreme values were found in the measuring columns. The function then returns these projection column values, providing a way to link the extreme measurements with their corresponding data in another part of the dataset.

hashtag
Signature

EXPR_MIN (projectionCol, measuringCol1, measuringCol2, measuringCol3)

dayOfYear

This section contains reference documentation for the dayOfYear function.

Returns the day of the year from the given epoch millis in UTC or specified timezone. The value ranges from 1 to 366.

hashtag
Signature

dayOfYear(tsInMillis)

dayOfYear(tsInMillis, timeZoneId)

DISTINCTAVGMV

This section contains reference documentation for the DISTINCTAVGMV function.

Returns the average of distinct row values in a group

hashtag
Signature

DISTINCTAVGMV(colName)

FromEpoch

This section contains reference documentation for the fromEpoch functions.

Convert epoch to epoch milliseconds. The following time units are supported:

  • SECONDS

  • MINUTES

DISTINCTCOUNTRAWHLL

This section contains reference documentation for the DISTINCTCOUNTRAWHLL function.

Returns HLL response serialized as string. The serialized HLL can be converted back into an HLL and then aggregated with other HLLs. A common use case may be to merge HLL responses from different Pinot tables, or to allow aggregation after client-side batching.

hashtag
Signature

DISTINCTCOUNTRAWHLL(colName, log2m)

day

This section contains reference documentation for the day function.

Returns the day of the month from the given epoch millis in UTC or specified timezone. The value ranges from 1 to 31.

hashtag
Signature

day(tsInMillis)

day(tsInMillis, timeZoneId)

arrayUnionInt

This section contains reference documentation for the arrayUnionInt function.

Create a union of two arrays of ints.

hashtag
Signature

arrayUnionInt('colName1', 'colName2')

arrayDistinctInt

This section contains reference documentation for the arrayDistinctInt function.

Returns unique values in an array of ints.

hashtag
Signature

arrayDistinctInt('colName')

arrayReverseInt

This section contains reference documentation for the arrayReverseInt function.

Reverses array of ints.

hashtag
Signature

arrayReverseInt('colName')

dayOfWeek

This section contains reference documentation for the dayOfWeek function.

Returns the day of the week from the given epoch millis in UTC timezone. The value ranges from 1(Monday) to 7(Sunday).

hashtag
Signature

dayOfWeek(tsInMillis)

dayOfWeek(tsInMillis, timeZoneId)

ceil

This section contains reference documentation for the CEIL function.

Rounded up to the nearest integer.

hashtag
Signature

CEIL(col1)

arrayIndexOfString

This section contains reference documentation for the arrayIndexOfString function.

Finds the last index of the given value in the array starting at the given index.

hashtag
Signature

arrayIndexOfString('colName', valueToFind)

ARRAYLENGTH

This section contains reference documentation for the ARRAYLENGTH function.

Returns the length of a multi-value column

hashtag
Signature

ARRAYLENGTH('colName')

arrayIndexOfInt

This section contains reference documentation for the arrayIndexOfInt function.

Finds the last index of the given value in the array starting at the given index.

hashtag
Signature

arrayIndexOfInt('colName', valueToFind)

DISTINCTCOUNTBITMAP

This section contains reference documentation for the DISTINCTCOUNTBITMAP function.

Returns the count of distinct row values in a group. This function is accurate for INT column, but approximate for other cases where hash codes are used in distinct counting and there may be hash collisions. For accurate distinct counting on all column types, see .

hashtag
Signature

DISTINCTCOUNTBITMAP(colName)

DISTINCTSUM

This section contains reference documentation for the DISTINCTSUM function.

Returns the sum of distinct row values in a group

hashtag
Signature

DISTINCTSUM(colName) or sum(distinct col)

arrayConcatInt

This section contains reference documentation for the arrayConcatInt function.

Concatenates two arrays of ints.

hashtag
Signature

arrayConcatInt('colName1', 'colName2')

mode

This section contains reference documentation for the mode function.

Get the most frequent value in a group. When multiple modes are present it gives the minimum of all the modes. This behavior can be overridden to get the maximum or the average mode.

hashtag
Signature

MODE(colName, [reducerType])

FIRSTWITHTIME

This section contains reference documentation for the firstwithtime function.

Returns the value of dataColumn with the smallest timeColumn value where:

  • timeColumn is used to define the time of dataColumn, which can be of type TIMESTAMP, INT, LONG

FLOOR

This section contains reference documentation for the FLOOR function.

Rounded down to the nearest integer.

hashtag
Signature

FLOOR(col1)

DISTINCTCOUNT

This section contains reference documentation for the DISTINCTCOUNT function.

Returns the count of distinct row values in a group

hashtag
Signature

DISTINCTCOUNT(colName)

DISTINCTCOUNTHLL

This section contains reference documentation for the DISTINCTCOUNTHLL function.

Returns an approximate distinct count using HyperLogLog. It also takes an optional second argument to configure the log2m for the HyperLogLog. For accurate distinct counting, see .

hashtag
Signature

DISTINCTCOUNTHLL(colName, log2m)

Histogram

This section contains reference documentation for the HISTOGRAM function.

Returns the count of data points that fall within each bin as a vector. The bins are left-inclusive and right-exclusive, i.e. [a, b), except for the last one which is inclusive on both sides [a, b].

hashtag
Signatures

DISTINCTCOUNTRAWHLLMV

This section contains reference documentation for the DISTINCTCOUNTRAWHLLMV function.

Returns HLL response serialized as string. The serialized HLL can be converted back into an HLL and then aggregated with other HLLs. A common use case may be to merge HLL responses from different Pinot tables, or to allow aggregation after client-side batching.

hashtag
Signature

DISTINCTCOUNTRAWHLLMV(colName, log2m)

AVGMV

This section contains reference documentation for the AVGMV function.

Get the avg of values in a group

hashtag
Signature

AVGMV(colName)

DISTINCTAVG

This section contains reference documentation for the DISTINCTAVG function.

Returns the average of distinct row values in a group

hashtag
Signature

DISTINCTAVG(colName) or avg(distinct col)

hour

This section contains reference documentation for the hour function.

Returns the hour of the day from the given epoch millis in UTC or specified timezone. The value ranges from 0 to 23.

hashtag
Signature

hour(tsInMillis)

hour(tsInMillis, timeZoneId)

DISTINCTSUMMV

This section contains reference documentation for the DISTINCTSUMMV function.

Returns the sum of the distinct row values in a group

hashtag
Signature

DISTINCTSUMMV(colName)

FrequentLongsSketch

This section contains reference documentation for the FREQUENTLONGSSKETCH function.

FREQUENTLONGSSKETCH is an estimation data-sketch function which can be used to estimate the frequency of an item. It is based on and returns a serialized sketch object which can be merged with other sketches.

hashtag
Signature

FREQUENTLONGSSKETCH(column, maxMapSize=256) -> Base64 encoded sketch object

MD5

This section contains reference documentation for the MD5 function.

Return MD5 digest of binary column(bytes type) as hex string

hashtag
Signature

MD5(bytesCol)

JSONFORMAT

This section contains reference documentation for the JSONFORMAT function.

Extracts the object value from jsonField based on 'jsonPath', the result type is inferred based on JSON value. This function can only be used in an .

hashtag
Signature

JSONFORMAT(object)

MINMV

This section contains reference documentation for the MINMV function.

Get the minimum value in a group

hashtag
Signature

MINMV(colName)

arrayUnionString

This section contains reference documentation for the arrayUnionString function.

Create a union of two arrays of strings.

hashtag
Signature

arrayUnionString('colName1', 'colName2')

DIV

This section contains reference documentation for the DIV function.

Quotient of two values

hashtag
Signature

DIV(col1, col2)

jsonextractkey

This section contains reference documentation for the JSONEXTRACTKEY function.

Extracts all matched JSON field keys based on 'jsonPath' into a STRING_ARRAY.

hashtag
Signature

JSONEXTRACTKEY(jsonField, 'jsonPath')

Arguments

millisecond

This section contains reference documentation for the millisecond function.

Returns the millisecond of the second from the given epoch millis in UTC or specified timezone. The value ranges from 0 to 999.

hashtag
Signature

millisecond(tsInMillis)

millisecond(tsInMillis, timeZoneId)

select mvCol1, 
       arrayConcatLong(mvCol1, mvCol2) AS concatLongs
from multiValueTable
WHERE arraylength(mvCol1) >= 2
limit 5
select ABS(-12.1) AS value
from ignoreMe

12.1

select ABS(12.1) AS value
from ignoreMe

12.1

select mvCol1, 
       arrayConcatDouble(mvCol1, mvCol2) AS concatDoubles
from multiValueTable
WHERE arraylength(mvCol1) >= 2
limit 5
select mvCol1, 
       arrayConcatFloat(mvCol1, mvCol2) AS concatFloats
from multiValueTable
WHERE arraylength(mvCol1) >= 2
limit 5
SELECT concat('Apache', 'Pinot', ' ') AS value
FROM ignoreMe

Apache Pinot

SELECT concat('real-time', 'analytics', '__') AS value
FROM ignoreMe

real-time__analytics

SELECT CODEPOINT('Apache Pinot') AS value
FROM ignoreMe

65

select homeRuns, baseOnBalls, ADD(homeRuns, baseOnBalls) AS total
from baseballStats 
WHERE teamID = 'ML1' 
AND yearID = 1956 
AND playerName = 'Henry Louis'

26

37

63

Batch Quick Start
SELECT length('Pinot') AS value
FROM ignoreMe

5

SELECT COVAR_SAMP(numberOfGames, AtBatting) AS covariance 
FROM baseballStats

8270.973200974102

Batch Quick Start
select count(*) AS value
from baseballStats 

97889

Batch Quick Start
select LOWER('Pinot') AS name
FROM ignoreMe

pinot

SELECT ' Pinot with spaces  ' AS notTrimmed,
       ltrim(' Pinot with spaces ') AS trimmed
FROM ignoreMe

" Pinot with spaces "

"Pinot with spaces "

select ln(1) AS value
from ignoreMe

0

select ln(12) AS value
from ignoreMe

2.4849066497880004

SELECT COVAR_POP(numberOfGames, hits) AS covariance 
FROM baseballStats

2314.249154477403

Batch Quick Start
select DISTINCTCOUNTHLLMV(DivLongestGTimes) AS value
from airlineStats 
where arraylength(DivLongestGTimes) > 1

34

Hybrid Quick Start
select max(homeRuns) AS value
from baseballStats 

73

Batch Quick Start
select MINMAXRANGE(yearID) AS value
from baseballStats 

142

Batch Quick Start
SELECT CHR(65) AS value
FROM ignoreMe

A

select MAXMV(DivLongestGTimes) AS value
from airlineStats 
where arraylength(DivLongestGTimes) > 1

108

Hybrid Quick Start
SELECT LPAD('Hello, World', '20', '*') AS value
FROM ignoreMe

********Hello, World

select MOD(12, 5) AS value
from ignoreMe

2

select MOD(12, 2) AS value
from ignoreMe

0

select min(yearID) AS value
from baseballStats 

1871

Batch Quick Start
hashtag
Usage Examples

These examples are based on the Hybrid Quick Start.

DivAirportIDs
sortedIds

13891,12892

12892,13891

14683,14683

14683,14683

12339,12339

12339,12339

13198,10721

10721,13198

10721,12478

10721,12478

hashtag
Usage Examples

These examples are based on the Hybrid Quick Start.

DivTailNums
index

N7713A,N7713A

true

N344AA,N344AA

false

N7713A,N7713A

true

select DivTailNums, 
       arrayContainsString(DivTailNums, 'N7713A') AS index
from airlineStats 
WHERE arraylength(DivTailNums) >= 2
limit 5
hashtag
Usage Examples

These examples are based on the Hybrid Quick Start.

DivAirportIDs
value

SEA,PSC

PSC

SEA,PSC,PHX,MSY

PSC,PHX,MSY

SEA,PSC,PHX,MSY

PSC,PHX,MSY

SEA,PSC

PSC

SEA,PSC

PSC

hashtag
Usage Examples

These examples are based on the Hybrid Quick Start.

DivTailNums
concatIds

N7713A,N7713A

N7713A,N7713A,N7713A,N7713A

N344AA,N344AA

N344AA,N344AA,N344AA,N344AA

N344AA,N344AA

N344AA,N344AA,N344AA,N344AA

N7713A,N7713A

N7713A,N7713A,N7713A,N7713A

EXPR_MAX (projectionCol, measuringCol1, measuringCol2, measuringCol3)

hashtag
Usage Examples

Find the user with maximum activity. If there are multiple users, break the tie with their last_activity_date. If still a tie, break with user_id. And project user_id.

More useful is that this multiple such aggregation function can be used with GROUP BY

Note:

  1. In cases where multiple rows share the same extreme values in the measuring columns, all such rows will be returned by the function.

  2. If the goal is to project multiple different columns that correspond to the same set of measuring columns, you can achieve this by invoking the function multiple times, each time specifying a different projection column.

  3. This impl does not work with AS clause (e.g. SELECT exprmin(longCol, doubleCol) AS exprmin won't work)

  4. Putting exprmin/exprmax column inside order by clause (e.g. SELECT intCol, exprmin(longCol, doubleCol) FROM table GROUP BY intCol ORDER BY exprmin(longCol, doubleCol)) is not supported as semantically ordering multi-column multi-row exprmin/exprmax results doesn't make sense

  5. Currently projecting MV bytes column doesn't work for now due to an issue

For more detailed examples, see: https://github.com/apache/pinot/pull/10636arrow-up-right

doy(tsInMillis)

doy(tsInMillis, timeZoneId)

hashtag
Usage Examples

dayOfYear

346

dayOfYear

347

dayOfYear

346

dayOfYear

347

select dayOfYear(1639351800000) AS dayOfYear
FROM ignoreMe
hashtag
Usage Examples

These examples are based on the Hybrid Quick Start.

VALUE

32.4

SELECT DISTINCTAVGMV(DivLongestGTimes) AS VALUE
FROM airlineStats
WHERE arraylength(DivLongestGTimes) > 1
HOURS
  • DAYS

  • hashtag
    Signature

    FromEpoch<TIME_UNIT>(timeIn<Time_UNIT>)

    hashtag
    Usage Examples

    epochMillis

    1613472303000

    epochMillis

    1613472300000

    epochMillis

    1613469600000

    epochMillis

    1613433600000

    hashtag
    Usage Examples

    These examples are based on the Batch Quick Start.

    value

    00000008000000ac00000800000084000210000000000020001020220030042002100420002010020210000300008020040180400001300310001863024004220870800004400421040104610220080000020000040000030000800002108420000110400800000106000060000080020000082000218c0002000000020000010200100000018c0006000400022004a0000088000200800000320820021000000221842000000000025088000220080100009420

    value

    000000010000000400000106

    select DISTINCTCOUNTRAWHLL(teamID) AS value
    from baseballStats 
    dayOfMonth(tsInMillis)

    dayOfMonth(tsInMillis, timeZoneId)

    hashtag
    Usage Examples

    day

    12

    day

    13

    day

    12

    day

    13

    select day(1639351800000) AS day
    FROM ignoreMe
    hashtag
    Usage Examples

    These examples are based on the Hybrid Quick Start.

    DivAirportIDs
    unique

    15016,11066

    15016,11066

    10620,14869

    10620,14869

    13891,12892

    13891,12892

    12264,10397

    12264,10397

    11066,12892

    11066,12892

    hashtag
    Usage Examples

    These examples are based on the Hybrid Quick Start.

    DivAirportIDs
    reversedIds

    13891,12892

    12892,13891

    14683,14683

    14683,14683

    12339,12339

    12339,12339

    13487,13930

    13930,13487

    13029,11292

    11292,13029

    dow(tsInMillis)

    dow(tsInMillis, timeZoneId)

    hashtag
    Usage Examples

    dayOfWeek

    7

    dayOfWeek

    1

    dayOfWeek

    7

    dayOfWeek

    1

    select dayOfWeek(1639351800000) AS dayOfWeek
    FROM ignoreMe
    hashtag
    Usage Examples
    value

    13

    value

    -12

    select CEIL(12.1) AS value
    from ignoreMe
    select CEIL(-12.1) AS value
    from ignoreMe
    hashtag
    Usage Examples

    These examples are based on the Hybrid Quick Start.

    DivTailNums
    index

    N7713A,N7713A

    0

    N344AA,N344AA

    -1

    N7713A,N7713A

    0

    select DivTailNums, 
           arrayIndexOfString(DivTailNums, 'N7713A') AS index
    from airlineStats 
    WHERE arraylength(DivTailNums) >= 2
    limit 5
    hashtag
    Usage Examples

    These examples are based on the Hybrid Quick Start.

    length
    count(*)

    1

    5382

    37

    267

    33

    223

    17

    166

    22

    160

    circle-info

    The count(*) values will increase each time we execute the query as data is constantly being ingested by the Hybrid Quick Start.

    hashtag
    Usage Examples

    These examples are based on the Hybrid Quick Start.

    DivAirportIDs
    index

    13891,12892

    -1

    14683,14683

    0

    12339,12339

    -1

    13487,13930

    -1

    13029,11292

    -1

    hashtag
    Usage Examples

    These examples are based on the Batch Quick Start.

    value

    7

    value

    148

    DISTINCTCOUNT
    select DISTINCTCOUNTBITMAP(league) AS value
    from baseballStats 
    hashtag
    Usage Examples

    These examples are based on the Batch Quick Start.

    VALUE

    13922

    VALUE

    244032

    SELECT DISTINCTSUM(runs) AS VALUE
    FROM baseballStats
    SELECT SUM(DISTINCT AtBatting) AS VALUE
    FROM baseballStats
    hashtag
    Usage Examples

    These examples are based on the Hybrid Quick Start.

    DivWheelsOffs
    concatIds

    1453,1731

    1453,1731,1415,1623

    1908,1758

    1908,1758,1339,2310

    1453,1731

    1453,1731,1415,1623

    1908,1758

    1908,1758,1339,2310

    hashtag
    Usage Examples

    These examples are based on the Batch Quick Start.

    value

    2008

    value

    2010

    value

    2008

    value

    2012

    select mode(yearID) AS value
    from baseballStats 
    WHERE AtBatting != 0 AND yearID > 2001
  • dataType specifies the type for dataColumn, which can be BOOLEAN, INT, LONG, FLOAT, DOUBLE, STRING

  • hashtag
    Signature

    FIRSTWITHTIME(dataColumn, timeColumn, 'dataType')

    hashtag
    Example

    This example is based on the Streaming Quick Start.

    value

    group_name1016303453

    hashtag
    Usage Examples
    value

    12

    value

    -13

    select FLOOR(12.1) AS value
    from ignoreMe
    select FLOOR(-12.1) AS value
    from ignoreMe
    hashtag
    Usage Examples

    These examples are based on the Batch Quick Start.

    value

    7

    value

    149

    select DISTINCTCOUNT(league) AS value
    from baseballStats 
    select DISTINCTCOUNT(teamID) AS value
    from baseballStats 
    hashtag
    Usage Examples

    These examples are based on the Batch Quick Start.

    value

    158

    value

    149

    DISTINCTCOUNT
    select DISTINCTCOUNTHLL(teamID) AS value
    from baseballStats 
    Equal length bins (better performance):

    HISTOGRAM(colName, lower, upper, numBins)

    1. Arbitrary increasing bin edges:

    HISTOGRAM(colName, ARRAY[binEdge1, binEdge2, binEdge3, ...])

    hashtag
    Usage Examples

    These examples are based on the Batch Quick Start.

    1. 10 equal-length bins [0, 20), [20, 30) ... [180, 200]

    histogram

    32348,21519,11359,7587,5488,5360,6282,7361,585,0

    1. 6 bins (- ∞, 1), [1, 10), [10, 50), [50,100), [100,500), [500, 1000]

    histogram

    13520,16506,18375,12403,28591,8494

    hashtag
    Usage Examples

    These examples are based on the Hybrid Quick Start.

    value

    00000008000000ac00000000000000000000000500000020000000000030000202108000040000010000000300010400000000000000000000000463000000000000000000010001041000200000002000000000000000000a00000000028001000000010800000000010000001008000000804000000000020000040000880000000000000000000000000000000000000000000000800000000800020004000000840000000002000000000000000000001400

    value

    0000000100000004000000e4

    select DISTINCTCOUNTRAWHLLMV(DivAirports) AS value
    from airlineStats 
    where arraylength(DivAirports) > 1
    hashtag
    Usage Examples

    These examples are based on the Hybrid Quick Start.

    value

    18.465753424657535

    select AVGMV(DivLongestGTimes) AS value
    from airlineStats 
    where arraylength(DivLongestGTimes) > 1
    hashtag
    Usage Examples

    These examples are based on the Batch Quick Start.

    VALUE

    83.36526946107784

    VALUE

    349.1158798283262

    SELECT DISTINCTAVG(runs) AS VALUE
    FROM baseballStats
    hashtag
    Usage Examples
    hour

    23

    hour

    0

    select hour(1639351800000) AS hour
    FROM ignoreMe
    select hour(1639351800000, 'CET') AS hour
    FROM ignoreMe
    hashtag
    Usage Examples

    These examples are based on the Hybrid Quick Startarrow-up-right.

    VALUE

    1134

    SELECT DISTINCTSUMMV(DivLongestGTimes) AS VALUE
    FROM airlineStats
    WHERE arraylength(DivLongestGTimes) > 1
  • column (required): Name of the column to aggregate on. Needs to be a type which can be cast into 'LONG'.

  • maxMapSize: This value specifies the maximum physical length of the internal hash map. The maxMapSize must be a power of 2 and the default value is 256.

  • hashtag
    Usage Example

    frequentlongssketch(AirlineID)

    BAEKCAUAAAAOAAAAAA...

    Which can be used, for example in Java as:

    For more examples on the sketch API, refer to the Datasketches documentationarrow-up-right.

    Apache Datasketches libraryarrow-up-right
    hashtag
    Usage Examples

    These examples are based on the Real time Quick Start.

    event_id
    location
    hash

    282776561

    80406178a3d70a3d714041d5c28f5c28f6

    92a8b787e81672261aad8afcf9de3aee

    circle-info

    The row returned will be different if you run this example as the data is ingested in real-time.

    select event_id, location, MD5(location) AS hash
    from meetupRsvp 
    limit 1
    hashtag
    Usage Examples

    The usage examples are based on extracting fields from the following JSON document:

    Expression
    Value

    JSONFORMAT(meta)

    "{\"age\":12}"

    This function can be used in the table config to extract the meta property into the data column, as described below:

    ingestion transformation function
    {"timestamp": "2019-10-09 21:25:25", "meta": {"age": 12}}
    hashtag
    Usage Examples

    These examples are based on the Hybrid Quick Start.

    value

    2

    select MINMV(DivLongestGTimes) AS value
    from airlineStats 
    where arraylength(DivLongestGTimes) > 1
    hashtag
    Usage Examples

    These examples are based on the Batch Quick Start.

    homeRuns
    numberOfGames
    total

    26

    153

    0.16993464052287582

    select homeRuns, numberOfGames, DIV(homeRuns, numberOfGames) AS total
    from baseballStats 
    WHERE teamID = 'ML1' 
    AND yearID = 1956 
    AND playerName = 'Henry Louis'
    hashtag
    Usage Examples
    millisecond

    0

    millisecond

    0

    select millisecond(1639351800000) AS millisecond
    FROM ignoreMe
    select millisecond(1639351800000, 'America/St_Johns') AS millisecond
    FROM ignoreMe
    select DivAirportIDs, 
           arraySortInt(DivAirportIDs) AS sortedIds
    from airlineStats 
    WHERE arraylength(DivAirportIDs) >= 2
    limit 5
    select RandomAirports, 
           arrayRemoveString(RandomAirports, 'SEA') AS value
    from airlineStats 
    WHERE arraylength(RandomAirports) BETWEEN 2 AND 4
    limit 5
    select DivTailNums, 
           arrayConcatString(DivTailNums, DivTailNums) AS concatIds
    from airlineStats 
    WHERE arraylength(DivTailNums) >= 2
    limit 5
    SELECT EXPR_MAX(user_id, activity, last_activity_date, user_id)
    FROM userEngagmentTable
    SELECT user_region, EXPR_MAX(user_id, activity, last_activity_date, user_id),
        EXPR_MIN(user_id, user_satisfaction)
    FROM userEngagmentTable
    GROUP BY user_region
    select dayOfYear(1639351800000, 'CET') AS dayOfYear
    FROM ignoreMe
    select doy(1639351800000) AS dayOfYear
    FROM ignoreMe
    select doy(1639351800000, 'CET') AS dayOfYear
    FROM ignoreMe
    select FromEpochSeconds(1613472303) AS epochMillis
    FROM ignoreMe
    select FromEpochMinutes(26891205) AS epochMillis
    FROM ignoreMe
    select FromEpochHours(448186) AS epochMillis
    FROM ignoreMe
    select FromEpochDays(18674) AS epochMillis
    FROM ignoreMe
    select DISTINCTCOUNTRAWHLL(teamID, 1) AS value
    from baseballStats 
    select day(1639351800000, 'CET') AS day
    FROM ignoreMe
    select dayOfMonth(1639351800000) AS day
    FROM ignoreMe
    select dayOfMonth(1639351800000, 'CET') AS day
    FROM ignoreMe
    select DivAirportIDs, 
           arrayDistinctInt(DivAirportIDs) AS unique
    from airlineStats 
    WHERE arraylength(DivAirportIDs) >= 2
    limit 5
    select DivAirportIDs, 
           arrayReverseInt(DivAirportIDs) AS reversedIds
    from airlineStats 
    WHERE arraylength(DivAirportIDs) >= 2
    limit 5
    select dayOfWeek(1639351800000, 'CET') AS dayOfWeek
    FROM ignoreMe
    select dow(1639351800000) AS dayOfWeek
    FROM ignoreMe
    select dow(1639351800000, 'CET') AS dayOfWeek
    FROM ignoreMe
    select ARRAYLENGTH(RandomAirports) AS length, count(*) 
    from airlineStats 
    GROUP BY length
    ORDER BY count(*) DESC
    LIMIT 5
    select DivAirportIDs, 
           arrayIndexOfInt(DivAirportIDs, 14683) AS index
    from airlineStats 
    WHERE arraylength(DivAirportIDs) >= 2
    limit 5
    select DISTINCTCOUNTBITMAP(teamID) AS value
    from baseballStats 
    select DivWheelsOffs, 
           arrayConcatInt(DivWheelsOffs, DivWheelsOns) AS concatIds
    from airlineStats 
    WHERE arraylength(DivWheelsOffs) >= 2
    limit 5
    select mode(yearID, 'AVG') AS value
    from baseballStats 
    WHERE AtBatting != 0 AND yearID > 2001
    select mode(yearID, 'MIN') AS value
    from baseballStats 
    WHERE AtBatting != 0 AND yearID > 2001
    select mode(yearID, 'MAX') AS value
    from baseballStats 
    WHERE AtBatting != 0 AND yearID > 2001
    select FIRSTWITHTIME(group_name, __metadata$recordTimestamp, 'STRING')
    from meetupRsvp 
    select DISTINCTCOUNTHLL(teamID, 12) AS value
    from baseballStats 
    SELECT HISTOGRAM(numberOfGames, 0, 200, 10) AS histogram
    FROM baseballStats 
    select HISTOGRAM(AtBatting, Array['-Infinity', 1, 10, 50, 100, 500, 1000]) AS histogram
    from baseballStats
    select DISTINCTCOUNTRAWHLLMV(DivAirports, 1) AS value
    from airlineStats 
    where arraylength(DivAirports) > 1
    SELECT AVG(DISTINCT AtBatting) AS VALUE
    FROM baseballStats
    select FREQUENTLONGSSKETCH(AirlineID, 16) from airlineStats
    byte[] byteArr = Base64.getDecoder().decode(encodedSketch);
    LongsSketch sketch = LongsSketch.getInstance(Memory.wrap(byteArr));
    
    LongsSketch.Row[] items = sketch.getFrequentItems(ErrorType.NO_FALSE_NEGATIVES);
    for (int i = 0; i < items.length; i++) {
      LongsSketch.Row item = items[i];
      System.out.printf("Airline: %d, Frequency: %d %n", item.getItem(), item.getEstimate());
    }
    {
       "tableConfig":{
          "ingestionConfig":{
             "transformConfigs":[
                {
                   "columnName":"data",
                   "transformFunction":"JSONFORMAT(meta)"
                }
             ]
          }
       }
    }
    hashtag
    Usage Examples

    These examples are based on the Hybrid Quick Start.

    DivWheelsOffs
    DivWheelsOns
    unionIds

    1453,1731

    1415,1623

    1453,1731,1415,1623

    1908,1758

    1339,2310

    1908,1758,1339,2310

    1453,1731

    1415,1623

    1453,1731,1415,1623

    hashtag
    Usage Examples

    These examples are based on the Hybrid Quick Start.

    DivTailNums
    DivAirports
    unionIds

    N7713A,N7713A

    IND,IND

    N7713A,IND

    N344AA,N344AA

    MCI,BOS

    N344AA,MCI,BOS

    N7713A,N7713A

    IND,IND

    N7713A,IND

    Description

    jsonField

    An Identifier/Expression contains JSON documents.

    'jsonPath'

    Follows to read values from JSON documents.

    circle-exclamation

    'jsonPath'` is a literal. Pinot uses single quotes to distinguish them from identifiers.

    hashtag
    Usage Examples

    The examples in this section are based on the Batch JSON Quick Start. In particular we'll be querying the row WHERE id = 7044874109.

    id
    repo
    keys

    7044874109

    {"id":115911530,"name":"LimeVista/Tapes","url":"https://api.github.com/repos/LimeVista/Tapes"}

    $['id'],$['name'],$['url']

    arraySliceInt

    This section contains reference documentation for the arraySliceInt function.

    Returns the values in the array between the start and end positions.

    hashtag
    Signature

    arraySliceInt('colName', start, end)

    hashtag
    Usage Examples

    These examples are based on the .

    FlightNum
    airports
    DivAirportIDs

    arrayRemoveInt

    This section contains reference documentation for the arrayRemoveInt function.

    Removes value from array of ints.

    hashtag
    Signature

    arrayRemoveInt('colName', value)

    hashtag
    Usage Examples

    These examples are based on the .

    DivAirportIDs
    value

    caseWhen

    This section contains reference documentation for the caseWhen function.

    Returns values depending on boolean expressions. This function can only be used in an ingestion transformation function.

    hashtag
    Signature

    caseWhen(booleanExpr1, valueIfExpr1True, booleanExpr2, valueIfExpr2True) caseWhen(booleanExpr1, valueIfExpr1True, booleanExpr2, valueIfExpr2True, ... ,valueIfFalse)

    Arguments
    Description

    hashtag
    Usage Examples

    The usage examples are based on extracting fields from the following JSON documents:

    Expression
    Value

    This function can be used in the to add northernHemisphere column:

    arraySliceString

    This section contains reference documentation for the arraySliceString function.

    Returns the values in the array between the start and end positions.

    hashtag
    Signature

    arraySliceString('colName', start, end)

    hashtag
    Usage Examples

    These examples are based on the .

    FlightNum
    airports
    RandomAirports

    COUNTMV

    This section contains reference documentation for the COUNTMV function.

    Get the count of rows in a group

    hashtag
    Signature

    COUNTMV(colName)

    hashtag
    Usage Examples

    These examples are based on the .

    The following query returns the documents that have a DivTailNums with more than one value:

    DivTailNums

    You can count the number of items in these rows by running the following query:

    value

    arrayContainsInt

    This section contains reference documentation for the arrayContainsInt function.

    Checks if int value exists in array.

    hashtag
    Signature

    arrayContainsInt('colName', valueToFind)

    hashtag
    Usage Examples

    These examples are based on the .

    DivAirportIDs
    containsValue

    FromDateTime

    This section contains reference documentation for the FromDateTime function.

    Converts a formatted date-time string to milliseconds, based on the provided Joda-Time patternarrow-up-right.

    hashtag
    Signature

    FromDateTime(dateTimeString, pattern)

    hashtag
    Usage Examples

    epochMillis
    epochMillis
    epochMillis
    epochMillis

    ago

    This section contains reference documentation for the ago function.

    Return time as epoch millis before the given period (in ISO-8601 duration format).

    Examples:

    • "PT20.345S" -- parses as "20.345 seconds"

    • "PT15M" -- parses as "15 minutes" (where a minute is 60 seconds)

    • "PT10H" -- parses as "10 hours" (where an hour is 3600 seconds)

    • "P2D" -- parses as "2 days" (where a day is 24 hours or 86400 seconds)

    • "P2DT3H4M" -- parses as "2 days, 3 hours and 4 minutes"

    • "P-6H3M" -- parses as "-6 hours and +3 minutes"

    • "-P6H3M" -- parses as "-6 hours and -3 minutes"

    • "-P-6H+3M" -- parses as "+6 hours and -3 minutes"

    hashtag
    Signature

    ago()

    hashtag
    Usage Examples

    oneDayAgo

    This function is typically used in the predicate to filter on timestamps for recent data. e.g. filter data on recent 1 day.

    exp

    This section contains reference documentation for the exp function.

    Euler’s number(e) raised to the power of col.

    hashtag
    Signature

    EXP(col1)

    hashtag
    Usage Examples

    value
    value

    arraySortString

    This section contains reference documentation for the arraySortString function.

    Sorts array of strings.

    hashtag
    Signature

    arraySortString('colName')

    hashtag
    Usage Examples

    These examples are based on the .

    FlightNum
    sortedAirports
    RandomAirports

    arrayDistinctString

    This section contains reference documentation for the arrayDistinctString function.

    Returns unique values in an array of strings.

    hashtag
    Signature

    arrayDistinctString('colName')

    hashtag
    Usage Examples

    These examples are based on the .

    DivTailNums
    unique

    LASTWITHTIME

    This section contains reference documentation for the lastwithtime function.

    Returns the value of dataColumn with the largest timeColumn value where:

    • timeColumn is used to define the time of dataColumn, which can be of type TIMESTAMP, INT, LONG

    • dataType specifies the type for dataColumn, which can be BOOLEAN, INT, LONG, FLOAT, DOUBLE, STRING

    hashtag
    Signature

    LASTWITHTIME(dataColumn, timeColumn, 'dataType')

    hashtag
    Example

    This example is based on the .

    value

    DISTINCT

    This section contains reference documentation for the DISTINCT function.

    Returns the distinct row values in a group

    hashtag
    Signature

    DISTINCT(colName)

    hashtag
    Usage Examples

    These examples are based on the .

    value
    value

    DISTINCTCOUNTHLLMV

    This section contains reference documentation for the DISTINCTCOUNTBITMAPMV function.

    Returns the count of distinct row values in a group. This function is accurate for an INT or dictionary encoded column, but approximate for other cases where hash codes are used in distinct counting and there may be hash collision.

    hashtag
    Signature

    DISTINCTCOUNTBITMAPMV(colName)

    hashtag
    Usage Examples

    These examples are based on the .

    value
    value

    minute

    This section contains reference documentation for the minute function.

    Returns the minute of the hour from the given epoch millis in UTC or specified timezone. The value ranges from 0 to 59.

    hashtag
    Signature

    minute(tsInMillis)

    minute(tsInMillis, timeZoneId)

    hashtag
    Usage Examples

    minute
    minute

    isSubnetOf

    This section contains reference documentation for the isSubnetOf function.

    Takes 2 arguments of type STRING. The first argument is an ipPrefix, and the second argument is a single ipAddress. This function handles both IPv4 and IPv6 arguments.

    Returns a boolean value checking if ipAddress is in the subnet of ipPrefix

    hashtag
    Signatures

    isSubnetOf(ipPrefix, ipAddress) -> boolean

    hashtag
    Usage Examples

    See the following sample queries where isSubnetOf is used in different parts of the query.

    arrayReverseString

    This section contains reference documentation for the arrayReverseString function.

    Reverses array of strings.

    hashtag
    Signature

    arrayReverseString('colName')

    hashtag
    Usage Examples

    These examples are based on the .

    FlightNum
    reversedAirports
    RandomAirports

    DISTINCTCOUNTRAWTHETASKETCH

    This section contains reference documentation for the DISTINCTCOUNTRAWTHETASKETCH function.

    The Theta Sketcharrow-up-right framework enables set operations over a stream of data, and can also be used for cardinality estimation. Pinot leverages the Sketch Classarrow-up-right and its extensions from the library org.apache.datasketches:datasketches-java:4.2.0 to perform distinct counting as well as evaluating set operations.

    hashtag
    Signature

    distinctCountRawThetaSketch(<thetaSketchColumn>, <thetaSketchParams>, predicate1, predicate2..., postAggregationExpressionToEvaluate) -> HexEncoded

    • thetaSketchColumn (required): Name of the column to aggregate on.

    • thetaSketchParams (required): Parameters for constructing the intermediate theta-sketches.

    hashtag
    Usage Examples

    These examples are based on the .

    value
    value

    We can also provide predicates and a post aggregation expression to compute more complicated cardinalities:

    value

    MINMAXRANGEMV

    This section contains reference documentation for the MINMAXRANGEMV function.

    Returns the max - min value in a group

    hashtag
    Signature

    MINMAXRANGEMV(colName)

    hashtag
    Usage Examples

    These examples are based on the .

    value

    DISTINCTCOUNTMV

    This section contains reference documentation for the DISTINCTCOUNTMV function.

    Returns the count of distinct row values in a group

    hashtag
    Signature

    DISTINCTCOUNTMV(colName)

    hashtag
    Usage Examples

    These examples are based on the .

    The following query returns the documents that have a DivTailNums with more than one value:

    DivTailNums

    You can count the distinct number of items in these rows by running the following query:

    value

    Functions

    This page contains reference documentation for functions in Apache Pinot.

    This page contains reference documentation for functions in Apache Pinot.

    Base64

    This section contains reference documentation for base64 encode and decode functions.

    Encoding scheme follows

    • toBase64 returns Base64 encoded string of input binary data (bytes type).

    • fromBase64

    JSONPATHLONG

    This section contains reference documentation for the JSONPATHLONG function.

    Extracts the Long value from jsonField based on 'jsonPath', use optional defaultValuefor null or parsing error. This function can only be used in an .

    hashtag
    Signature

    FrequentStringsSketch

    This section contains reference documentation for the FREQUENTSTRINGSSKETCH function.

    FREQUENTSTRINGSSKETCH is an estimation data-sketch function which can be used to estimate the frequency of an item. It is based on and returns a serialized sketch object which can be merged with other sketches.

    hashtag
    Signature

    FREQUENTSTRINGSSKETCH(column, maxMapSize=256) -> Base64 encoded sketch object

    FromEpochBucket

    This section contains reference documentation for the fromEpochBucket functions.

    Convert epoch to epoch milliseconds. e.g. 10 seconds since epoch or 5 minutes since Epoch. The following time units are supported:

    • SECONDS

    • MINUTES

    JSONPATHDOUBLE

    This section contains reference documentation for the JSONPATHDOUBLE function.

    Extracts the Double value from jsonField based on 'jsonPath', use optional defaultValuefor null or parsing error. This function can only be used in an .

    hashtag
    Signature

    select DivWheelsOffs, 
           DivWheelsOns,
           arrayUnionInt(DivWheelsOffs, DivWheelsOns) AS unionIds
    from airlineStats 
    WHERE arraylength(DivWheelsOffs) >= 2
    limit 5
    select DivTailNums, 
           DivAirports,
           arrayUnionString(DivTailNums, DivAirports) AS unionIds
    from airlineStats 
    WHERE arraylength(DivTailNums) >= 2
    limit 5
    select id, repo, JSONEXTRACTKEY(repo, '$.*') AS keys
    from githubEvents 
    WHERE id = 7044874109

    1908,1758

    1339,2310

    1908,1758,1339,2310

    N344AA,N344AA

    MCI,BOS

    N344AA,MCI,BOS

    JsonPath Syntaxarrow-up-right
    ABSchevron-right
    ADDchevron-right
    arrayConcatDoublechevron-right
    arrayConcatFloatchevron-right
    arrayConcatIntchevron-right
    arrayConcatLongchevron-right
    arrayConcatStringchevron-right
    arrayContainsIntchevron-right
    arrayContainsStringchevron-right
    arrayDistinctIntchevron-right
    arrayDistinctStringchevron-right
    arrayIndexOfIntchevron-right
    arrayIndexOfStringchevron-right
    ARRAYLENGTHchevron-right
    arrayRemoveIntchevron-right
    arrayRemoveStringchevron-right
    arrayReverseIntchevron-right
    arrayReverseStringchevron-right
    arraySliceIntchevron-right
    arraySliceStringchevron-right
    arraySortIntchevron-right
    arraySortStringchevron-right
    arrayUnionIntchevron-right
    arrayUnionStringchevron-right
    AVGMVchevron-right
    https://github.com/pinot-contrib/pinot-docs/blob/latest/configuration-reference/functions/avgvalueintegersumtuplesketch.mdchevron-right
    Base64chevron-right
    caseWhenchevron-right
    ceilchevron-right
    CHRchevron-right
    https://github.com/pinot-contrib/pinot-docs/blob/latest/configuration-reference/functions/clpdecode.mdchevron-right
    codepointchevron-right
    concatchevron-right
    countchevron-right
    COUNTMVchevron-right
    DATETIMECONVERTchevron-right
    DATETRUNCchevron-right
    daychevron-right
    dayOfWeekchevron-right
    dayOfYearchevron-right
    DISTINCTchevron-right
    DISTINCTCOUNTchevron-right
    DISTINCTCOUNTBITMAPchevron-right
    DISTINCTCOUNTHLLMVchevron-right
    DISTINCTCOUNTHLLchevron-right
    DISTINCTCOUNTBITMAPMVchevron-right
    DISTINCTCOUNTMVchevron-right
    DISTINCTCOUNTRAWHLLchevron-right
    DISTINCTCOUNTRAWHLLMVchevron-right
    https://github.com/pinot-contrib/pinot-docs/blob/latest/configuration-reference/functions/distinctcountrawcpcsketch.mdchevron-right
    DISTINCTCOUNTRAWTHETASKETCHchevron-right
    https://github.com/pinot-contrib/pinot-docs/blob/latest/configuration-reference/functions/distinctcountrawintegersumtuplesketch.mdchevron-right
    https://github.com/pinot-contrib/pinot-docs/blob/latest/configuration-reference/functions/distinctcountrawull.mdchevron-right
    https://github.com/pinot-contrib/pinot-docs/blob/latest/configuration-reference/functions/distinctcountcpcsketch.mdchevron-right
    DISTINCTCOUNTTHETASKETCHchevron-right
    https://github.com/pinot-contrib/pinot-docs/blob/latest/configuration-reference/functions/distinctcounttuplesketch.mdchevron-right
    https://github.com/pinot-contrib/pinot-docs/blob/latest/configuration-reference/functions/distinctcountull.mdchevron-right
    DIVchevron-right
    expchevron-right
    Extractchevron-right
    FIRSTWITHTIMEchevron-right
    https://github.com/pinot-contrib/pinot-docs/blob/latest/configuration-reference/functions/freqeuentlongssketch.mdchevron-right
    https://github.com/pinot-contrib/pinot-docs/blob/latest/configuration-reference/functions/frequentstringsketch.mdchevron-right
    FromDateTimechevron-right
    FLOORchevron-right
    FromEpochchevron-right
    FromEpochBucketchevron-right
    Histogramchevron-right
    hourchevron-right
    jsonextractkeychevron-right
    jsonextractscalarchevron-right
    JSONFORMATchevron-right
    JSONPATHchevron-right
    JSONPATHARRAYchevron-right
    JSONPATHARRAYDEFAULTEMPTYchevron-right
    JSONPATHDOUBLEchevron-right
    JSONPATHLONGchevron-right
    JSONPATHSTRINGchevron-right
    LASTWITHTIMEchevron-right
    lengthchevron-right
    lnchevron-right
    lowerchevron-right
    lpadchevron-right
    ltrimchevron-right
    maxchevron-right
    MAXMVchevron-right
    MD5chevron-right
    millisecondchevron-right
    minchevron-right
    minmaxrangechevron-right
    MINMAXRANGEMVchevron-right
    MINMVchevron-right
    minutechevron-right
    MODchevron-right
    modechevron-right
    monthchevron-right
    multchevron-right
    nowchevron-right
    percentilechevron-right
    percentileestchevron-right
    percentileestmvchevron-right
    percentiletdigestchevron-right
    percentiletdigestmvchevron-right
    percentilemvchevron-right
    percentilekllchevron-right
    percentilerawkllchevron-right
    percentilekllmvchevron-right
    percentilerawkllmvchevron-right
    quarterchevron-right
    regexpExtractchevron-right
    removechevron-right
    replacechevron-right
    reversechevron-right
    roundchevron-right
    ROW_NUMBERchevron-right
    rpadchevron-right
    rtrimchevron-right
    secondchevron-right
    SEGMENTPARTITIONEDDISTINCTCOUNTchevron-right
    shachevron-right
    sha256chevron-right
    sha512chevron-right
    sqrtchevron-right
    startswithchevron-right
    ST_AsBinarychevron-right
    ST_AsTextchevron-right
    ST_Containschevron-right
    ST_Distancechevron-right
    ST_GeogFromTextchevron-right
    ST_GeogFromWKBchevron-right
    ST_GeometryTypechevron-right
    ST_GeomFromTextchevron-right
    ST_GeomFromWKBchevron-right
    STPOINTchevron-right
    ST_Polygonchevron-right
    strposchevron-right
    ST_Unionchevron-right
    SUBchevron-right
    substrchevron-right
    sumchevron-right
    https://github.com/pinot-contrib/pinot-docs/blob/latest/configuration-reference/functions/sumvaluesintegersumtuplesketch.mdchevron-right
    TIMECONVERTchevron-right
    https://github.com/pinot-contrib/pinot-docs/blob/latest/configuration-reference/functions/timezoneHour.mdchevron-right
    https://github.com/pinot-contrib/pinot-docs/blob/latest/configuration-reference/functions/timezoneMinute.mdchevron-right
    ToDateTimechevron-right
    https://github.com/pinot-contrib/pinot-docs/blob/latest/configuration-reference/functions/toEpoch.mdchevron-right
    https://github.com/pinot-contrib/pinot-docs/blob/latest/configuration-reference/functions/toEpochBucket.mdchevron-right
    https://github.com/pinot-contrib/pinot-docs/blob/latest/configuration-reference/functions/toEpochRounded.mdchevron-right
    TOJSONMAPSTRchevron-right
    toSphericalGeographychevron-right
    trimchevron-right
    upperchevron-right
    Urlchevron-right
    UTF8chevron-right
    VALUEINchevron-right
    weekchevron-right
    yearchevron-right
    https://github.com/pinot-contrib/pinot-docs/blob/latest/configuration-reference/functions/yearOfWeek.mdchevron-right

    13198

    13198,10721

    548

    10721

    10721,12478

    1531

    13891

    13891,12892

    19

    14683

    14683,14683

    829

    12339

    12339,12339

    Hybrid Quick Start

    24

    13891,12892

    13891

    13198,12892

    13198

    11066,12892

    11066

    13198,12892

    13198

    13891,12892

    13891

    Hybrid Quick Start

    booleanExpr1

    A boolean expression

    valueIfExpr1True, valueIfExpr2True

    A value to return.

    CASEWHEN(latitude > 0, 'North', 'South')

    North

    CASEWHEN(latitude > 0, 1, 0)

    1

    table config

    SEA,PSC

    SEA,PSC,PHX,MSY

    3162

    SEA,PSC

    SEA,PSC,PHX,MSY

    671

    SEA,PSC

    SEA,PSC,PHX,MSY

    1767

    SEA,PSC

    SEA,PSC,PHX

    2522

    SEA,PSC

    SEA,PSC

    Hybrid Quick Start

    424

    select DivTailNums
    from airlineStats 
    where arraylength(DivTailNums) > 1

    N7713A,N7713A

    N344AA,N344AA

    N344AA,N344AA

    N7713A,N7713A

    8

    Hybrid Quick Start

    13891,12892

    false

    14683,14683

    true

    12339,12339

    false

    13487,13930

    false

    13029,11292

    false

    Hybrid Quick Start

    1565136000000

    1565190733000

    1565190733000

    1565190733000

    1639150454255

    select EXP(1) AS value
    from ignoreMe

    2.718281828459045

    select EXP(12) AS value
    from ignoreMe

    162754.79141900392

    PSC,SEA

    SEA,PSC

    5300

    PSC,SEA

    SEA,PSC

    3846

    PSC,SEA

    SEA,PSC

    3635

    MSY,PHX,PSC,SEA

    SEA,PSC,PHX,MSY

    429

    MSY,PHX,PSC,SEA

    SEA,PSC,PHX,MSY

    Hybrid Quick Start

    1206

    N7713A,N7713A

    N7713A

    N344AA,N344AA

    N344AA

    N344AA,N344AA

    N344AA

    N7713A,N7713A

    N7713A

    Hybrid Quick Start

    group_name809822304

    Streaming Quick Start

    NL

    UA

    AL

    NA

    PL

    AA

    FL

    NL

    UA

    AL

    NA

    PL

    AA

    FL

    Batch Quick Start
    select DISTINCTCOUNTBITMAPMV(DivLongestGTimes) AS value
    from airlineStats 
    where arraylength(DivLongestGTimes) > 1

    34

    2

    Hybrid Quick Start
    select minute(1639351800000) AS minute
    FROM ignoreMe

    30

    select minute(1639351800000, 'America/St_Johns') AS minute
    FROM ignoreMe

    0

    SELECT isSubnetOf('192.168.0.1/24', '192.168.0.225') 
    AS result
    FROM myTable;
    ---> returns true
    
    SELECT isSubnetOf('1.2.3.128/26', '1.2.5.1') 
    AS result
    FROM myTable;
    ---> returns false
    SELECT isSubnetOf('2001:4800:7825:103::/64', '2001:4800:7825:103::2050')
    AS result
    FROM myTable;
    ---> returns true
    
    SELECT isSubnetOf('7890:db8:113::8a2e:370:7334/127', '7890:db8:113::8a2e:370:7336') 
    AS result
    FROM myTable;
    ---> returns false

    PHX,PSC,SEA

    SEA,PSC,PHX

    963

    MSY,PHX,PSC,SEA

    SEA,PSC,PHX,MSY

    1206

    PSC,SEA

    SEA,PSC

    5300

    PSC,SEA

    SEA,PSC

    3359

    MSY,PHX,PSC,SEA

    SEA,PSC,PHX,MSY

    Hybrid Quick Start

    1023

    Currently, the only supported parameter is nominalEntries (defaults to 4096).
  • predicates (optional)_: _ These are individual predicates of form lhs <op> rhs which are applied on rows selected by the where clause. During intermediate sketch aggregation, sketches from the thetaSketchColumn that satisfies these predicates are unionized individually. For example, all filtered rows that match country=USA are unionized into a single sketch. Complex predicates that are created by combining (AND/OR) of individual predicates is supported.

  • postAggregationExpressionToEvaluate (required): The set operation to perform on the individual intermediate sketches for each of the predicates. Currently supported operations are SET_DIFF, SET_UNION, SET_INTERSECT , where DIFF requires two arguments and the UNION/INTERSECT allow more than two arguments.

  • AgMDAAAKzJOVAAAAAACAPwDAATj...

    AwMDAAAKzJMQAAAAAACAP4vpfPBbbQsO5N1zYV2c...

    AQMDAAA6zJN8QPYIsvHMNQ==

    Batch Quick Start
    select MINMAXRANGEMV(DivLongestGTimes) AS value
    from airlineStats 
    where arraylength(DivLongestGTimes) > 1

    106

    Hybrid Quick Start
    select DivTailNums
    from airlineStats 
    where arraylength(DivTailNums) > 1

    N7713A,N7713A

    N344AA,N344AA

    N344AA,N344AA

    N7713A,N7713A

    2

    Hybrid Quick Start
    returns binary data (represented as a Hex string) from Base64-encoded string.

    hashtag
    Signature

    toBase64(bytesCol)

    fromBase64(stringCol)

    hashtag
    Usage Examples

    circle-info

    For better readability, the following examples converts string hello! into BYTES using toUtf8arrow-up-right function and converts the decoded BYTES into string using fromUtf8arrow-up-right.

    encoded

    aGVsbG8h

    decoded

    hello!

    circle-info

    Note that without UTF8 string conversion, returned BYTES will be represented as a Hex string following Pinot's BYTES column representationarrow-up-right. See the example below.

    decoded

    68656c6c6f21

    circle-exclamation

    Note that the following query will throw compilation error as string is not a valid input type for toBase64.

    java.util.Base64.Encoderarrow-up-right
  • column (required): Name of the column to aggregate on. Needs to be a type which can be cast into 'STRING'.

  • maxMapSize: This value specifies the maximum physical length of the internal hash map. The maxMapSize must be a power of 2 and the default value is 256.

  • hashtag
    Usage Example

    frequentstringssketch(AirlineID)

    BAEKCAUAAAAOAA...

    Which can be used, for example in Java as:

    For more examples on the sketch API, refer to the Datasketches documentationarrow-up-right.

    Apache Datasketches libraryarrow-up-right
    HOURS
  • DAYS

  • hashtag
    Signature

    FromEpoch<TIME_UNIT>Bucket(timeInMillis, bucketSize)

    hashtag
    Usage Examples

    bucket

    1613472303000

    bucket

    3226944606000

    bucket

    1613472000000

    bucket

    1613466000000

    bucket

    1613088000000

    select FlightNum, 
           arraySliceInt(DivAirportIDs, 0, 1) AS airports, 
    	     DivAirportIDs
    from airlineStats 
    WHERE arraylength(DivAirportIDs) >= 2
    limit 5
    select DivAirportIDs, 
           arrayRemoveInt(DivAirportIDs, 12892) AS value
    from airlineStats 
    WHERE arraylength(DivAirportIDs) >= 2
    AND arrayContainsInt(DivAirportIDs, 12892) = 1
    limit 5
    {
      "latitude": 1.0
    }
    {
       "tableConfig":{
          "ingestionConfig":{
             "transformConfigs":[
                {
                   "columnName":"northernHemisphereStr",
                   "transformFunction":"CASEWHEN(latitude > 0, 'North', 'South')"
                }
             ]
          }
       }
    }
    select FlightNum, 
           arraySliceString(RandomAirports, 0, 2) AS airports, 
           RandomAirports
    from airlineStats 
    WHERE arraylength(RandomAirports) BETWEEN 2 AND 4
    limit 5
    select COUNTMV(DivTailNums) AS value
    from airlineStats 
    where arraylength(DivTailNums) > 1
    select DivAirportIDs, 
           arrayContainsInt(DivAirportIDs, 14683) AS containsValue
    from airlineStats 
    WHERE arraylength(DivAirportIDs) >= 2
    limit 5
    SELECT FromDateTime('2019-08-07', 'yyyy-MM-dd') AS epochMillis
    FROM ignoreMe
    SELECT FromDateTime(
        '2019-08-07 3:12:13 PM', 
        'yyyy-MM-dd hh:mm:ss a'
        ) AS epochMillis
    FROM ignoreMe
    SELECT FromDateTime(
        '2019-08-07T15:12:13', 
        'yyyy-MM-dd''T''HH:mm:ss'
        ) AS epochMillis
    FROM ignoreMe
    SELECT FromDateTime(
        '2019-08-07T07:12:13-0800', 
        'yyyy-MM-dd''T''HH:mm:ssZ'
        ) AS epochMillis
    FROM ignoreMe
    select ago('P1D') AS oneDayAgo
    FROM ignoreMe
    SELECT * 
    FROM tableName
    WHERE tsInMillis > ago('P1D')
    select FlightNum, 
           arraySortString(RandomAirports) AS sortedAirports, 
           RandomAirports
    from airlineStats 
    WHERE arraylength(RandomAirports) BETWEEN 2 AND 4
    limit 5
    select DivTailNums, 
           arrayDistinctString(DivTailNums) AS unique
    from airlineStats 
    WHERE arraylength(DivTailNums) >= 2
    limit 5
    select LASTWITHTIME(group_name, __metadata$recordTimestamp, 'STRING')
    from meetupRsvp 
    select DISTINCT league AS value
    from baseballStats 
    select DISTINCT(league) AS value
    from baseballStats 
    select DISTINCTCOUNTBITMAPMV(DivTailNums) AS value
    from airlineStats 
    where arraylength(DivTailNums) > 1
    SELECT count(*) 
    FROM myTable 
    WHERE isSubnetOf('192.168.0.1/24', ipAddressCol);
    
    SELECT count(*) 
    FROM myTable 
    WHERE isSubnetOf('192.168.0.1/24', ipAddressCol) 
    OR isSubnetOf(ipPrefixCol, '7890:db8:113::8a2e:370:7336');
    SELECT 
        CASE 
            WHEN isSubnetOf('105.25.245.115/27', srcIPAddress) THEN 'case1' 
            WHEN isSubnetOf('105.25.245.115/27', dstIPAddress) THEN 'case2'
            ELSE 'case3' 
        END AS differentFlow
    FROM myTable;
    select FlightNum, 
           arrayReverseString(RandomAirports) AS reversedAirports, 
           RandomAirports
    from airlineStats 
    WHERE arraylength(RandomAirports) BETWEEN 2 AND 4
    limit 5
    select distinctCountRawThetaSketch(teamID) AS value
    from baseballStats 
    select distinctCountRawThetaSketch(teamID, 'nominalEntries=10') AS value
    from baseballStats
    select distinctCountRawThetaSketch(
      yearID, 
      'nominalEntries=4096', 
      'teamID = ''SFN'' AND numberOfGames=28 AND homeRuns=1',
      'teamID = ''CHN'' AND numberOfGames=28 AND homeRuns=1',
      'SET_INTERSECT($1, $2)'
    ) AS value
    from baseballStats 
    select DISTINCTCOUNTMV(DivTailNums) AS value
    from airlineStats 
    where arraylength(DivTailNums) > 1
    SELECT toBase64(toUtf8('hello!')) AS encoded
    FROM ignoreMe
    SELECT fromUtf8(fromBase64('aGVsbG8h')) AS decoded
    FROM ignoreMe
    SELECT fromBase64('aGVsbG8h') AS decoded
    FROM ignoreMe
    SELECT toBase64('hello!') AS encoded
    FROM ignoreMe
    select FREQUENTSTRINGSSKETCH(AirlineID, 16) from airlineStats
    byte[] byteArr = Base64.getDecoder().decode(encodedSketch);
    ItemsSketch<String> sketch = ItemsSketch.getInstance(Memory.wrap(byteArr), new ArrayOfStringsSerDe());
    
    ItemsSketch.Row[] items = sketch.getFrequentItems(ErrorType.NO_FALSE_NEGATIVES);
    for (int i = 0; i < items.length; i++) {
      ItemsSketch.Row item = items[i];
      System.out.printf("Airline: %s, Frequency: %d %n", item.getItem(), item.getEstimate());
    }
    select FromEpochSecondsBucket(1613472303, 1) AS bucket
    FROM ignoreMe
    select FromEpochSecondsBucket(1613472303, 2) AS bucket
    FROM ignoreMe
    select FromEpochMinutesBucket(2689120, 10) AS bucket
    FROM ignoreMe
    select FromEpochHoursBucket(89637, 5) AS bucket
    FROM ignoreMe
    select FromEpochDaysBucket(1867, 10) AS bucket
    FROM ignoreMe
    JSONPATHLONG(jsonField, 'jsonPath', [defaultValue])
    Arguments
    Description

    jsonField

    An Identifier/Expression contains JSON documents.

    'jsonPath'

    Follows to read values from JSON documents.

    circle-exclamation

    'jsonPath'` is a literal. Pinot uses single quotes to distinguish them from identifiers. You can use the Jayway JsonPath Evaluator Toolarrow-up-right to test JSON expressions before you import any data.

    hashtag
    Usage Examples

    The usage examples are based on extracting fields from the following JSON document:

    Expression
    Value

    JSONPATHLONG(data, '$.age')

    24

    This function can be used in the table config to extract the age property into the age column, as described below:

    ingestion transformation function
    JSONPATHDOUBLE(jsonField, 'jsonPath', [defaultValue])
    Arguments
    Description

    jsonField

    An Identifier/Expression contains JSON documents.

    'jsonPath'

    Follows to read values from JSON documents.

    circle-exclamation

    'jsonPath'` is a literal. Pinot uses single quotes to distinguish them from identifiers. You can use the Jayway JsonPath Evaluator Toolarrow-up-right to test JSON expressions before you import any data.

    hashtag
    Usage Examples

    The usage examples are based on extracting fields from the following JSON document:

    Expression
    Value

    JSONPATHDOUBLE(data, '$.age')

    24.0

    This function can be used in the table config to extract the age property into the age column, as described below:

    ingestion transformation function

    JSONPATH

    This section contains reference documentation for the JSONPATH function.

    Extracts the object value from jsonField based on 'jsonPath', the result type is inferred based on JSON value. This function can only be used in an ingestion transformation function.

    hashtag
    Signature

    JSONPATH(jsonField, 'jsonPath')

    Arguments
    Description
    circle-exclamation

    'jsonPath'` is a literal. Pinot uses single quotes to distinguish them from identifiers. You can use the to test JSON expressions before you import any data.

    hashtag
    Usage Examples

    The usage examples are based on extracting fields from the following JSON document:

    Expression
    Value

    This function can be used in the to extract the name property into the name column and age property into the age column, as described below:

    DATETRUNC

    This section contains reference documentation for the DATETRUNC function.

    (Presto) SQL compatible date truncation, equivalent to the Presto function date_trunc.

    Converts the value into a specified output granularity seconds since UTC epoch that is bucketed on a unit in a specified timezone.

    hashtag
    Signature

    DATETRUNC(unit, timeValue)

    DATETRUNC(unit, timeValue, inputTimeUnitStr)

    DATETRUNC(unit, timeValue, inputTimeUnitStr, timeZone)

    DATETRUNC(unit, timeValue, inputTimeUnitStr, timeZone, outputTimeUnitStr)

    unit supports the following values:

    • millisecond

    • second

    • minute

    inputTimeUnitStr and outputTimeUnitStr support the following values:

    • NANOSECONDS

    • MICROSECONDS

    • MILLISECONDS

    hashtag
    Usage Examples

    Truncates an epoch in milliseconds at WEEK (where a Week starts at Monday UTC midnight):

    or

    ts

    Truncates an epoch in milliseconds at WEEK (where a Week starts at Monday UTC midnight) in the UTC time zone, returning a result in epoch in seconds in UTC timezone:

    ts

    Truncates an epoch in milliseconds at WEEK (where a Week starts at Monday UTC midnight) in the CET time zone, returning a result in epoch in seconds in UTC timezone:

    ts

    Truncates an epoch in milliseconds at QUARTER in the Los Angeles time zone (where a Quarter begins on Jan 1st, April 1st, July 1st, October 1st in Los Angeles timezone), returning a result in hours since UTC epoch:

    ts

    JSONPATHSTRING

    This section contains reference documentation for the JSONPATHSTRING function.

    Extracts the String value from jsonField based on 'jsonPath', use optional defaultValuefor null or parsing error. This function can only be used in an ingestion transformation function.

    hashtag
    Signature

    JSONPATHSTRING(jsonField, 'jsonPath', [defaultValue])

    Arguments
    Description
    circle-exclamation

    'jsonPath'` is a literal. Pinot uses single quotes to distinguish them from identifiers. You can use the to test JSON expressions before you import any data.

    hashtag
    Usage Examples

    The usage examples are based on extracting fields from the following JSON document:

    Expression
    Value

    This function can be used in the to extract the age property into the age column, as described below:

    JSONPATHARRAYDEFAULTEMPTY

    This section contains reference documentation for the JSONPATHARRAYDEFAULTEMPTY function.

    Extracts an array from jsonField based on 'jsonPath', the result type is inferred based on JSON value. Returns empty array for null or parsing error. This function can only be used in an ingestion transformation function.

    hashtag
    Signature

    JSONPATHARRAYDEFAULTEMPTY(jsonField, 'jsonPath')

    Arguments
    Description
    circle-exclamation

    'jsonPath'` is a literal. Pinot uses single quotes to distinguish them from identifiers. You can use the to test JSON expressions before you import any data.

    hashtag
    Usage Examples

    The usage examples are based on extracting fields from the following JSON document:

    Expression
    Value

    This function can be used in the to extract the name, score, and second value of homework_grades into their respective columns , as described below:

    DISTINCTCOUNTTHETASKETCH

    This section contains reference documentation for the DISTINCTCOUNTTHETASKETCH function.

    The framework enables set operations over a stream of data, and can also be used for cardinality estimation. Pinot leverages the and its extensions from the library org.apache.datasketches:datasketches-java:4.2.0 to perform distinct counting as well as evaluating set operations.

    hashtag
    Signature

    jsonextractscalar

    This section contains reference documentation for the JSONEXTRACTSCALAR function.

    Evaluates the 'jsonPath' on jsonField, returns the result as the type 'resultsType', use optional defaultValuefor null or parsing error.

    hashtag
    Signature

    JSONEXTRACTSCALAR(jsonField, 'jsonPath', 'resultsType', [defaultValue])

    mult

    This section contains reference documentation for the MULT function.

    Product of at least two values

    hashtag
    Signature

    MULT(col1, col2, col3...)

    month

    This section contains reference documentation for the month function.

    Returns the month of the year from the given epoch millis in UTC or specified timezone. The value ranges from 1 to 12.

    hashtag
    Signature

    month(tsInMillis)

    month(tsInMillis, timeZoneId)

    now

    This section contains reference documentation for the now function.

    Return current time as epoch millis.

    hashtag
    Signature

    now()

    {
      "data": {
        "name": "Pete",
        "age": 24,
        "subjects": [
          {
            "name": "maths",
            "homework_grades": [80, 85, 90, 95, 100],
            "grade": "A",
            "score": 90
          },
          {
            "name": "english",
            "homework_grades": [60, 65, 70, 85, 90],
            "grade": "B",
            "score": 70
          }
        ]
      }
    }
    {
       "tableConfig":{
          "ingestionConfig":{
             "transformConfigs":[
                {
                   "columnName":"age",
                   "transformFunction":"JSONPATHLONG(data, '$.age')"
                }
             ]
          }
       }
    }
    {
      "data": {
        "name": "Pete",
        "age": 24,
        "subjects": [
          {
            "name": "maths",
            "homework_grades": [80, 85, 90, 95, 100],
            "grade": "A",
            "score": 90
          },
          {
            "name": "english",
            "homework_grades": [60, 65, 70, 85, 90],
            "grade": "B",
            "score": 70
          }
        ]
      }
    }
    {
       "tableConfig":{
          "ingestionConfig":{
             "transformConfigs":[
                {
                   "columnName":"age",
                   "transformFunction":"JSONPATHDOUBLE(data, '$.age')"
                }
             ]
          }
       }
    }
    JsonPath Syntaxarrow-up-right
    JsonPath Syntaxarrow-up-right

    jsonField

    An Identifier/Expression contains JSON documents.

    'jsonPath'

    Follows JsonPath Syntaxarrow-up-right to read values from JSON documents.

    JSONPATH(data, '$.name')

    "Pete"

    JSONPATH(data, '$.age')

    24

    Jayway JsonPath Evaluator Toolarrow-up-right
    table config
    hour
  • day

  • week

  • month

  • quarter

  • year

  • SECONDS
  • MINUTES

  • HOURS

  • DAYS

  • 1639353600000

    1639353600

    1639350000

    453631

    jsonField

    An Identifier/Expression contains JSON documents.

    'jsonPath'

    Follows JsonPath Syntaxarrow-up-right to read values from JSON documents.

    JSONPATHSTRING(data, '$.age')

    "24"

    JSONPATHSTRING(data, '$.name["nick.name"]')

    "Pete"

    JSONPath Online Evaluatorarrow-up-right
    table config

    jsonField

    An Identifier/Expression contains JSON documents.

    'jsonPath'

    Follows JsonPath Syntaxarrow-up-right to read values from JSON documents.

    JSONPATHARRAYDEFAULTEMPTY(myJsonRecord, '$.subjects[*].name')

    ["maths", "english"]

    JSONPATHARRAYDEFAULTEMPTY(myJsonRecord, '$.subjects[*].score')

    [90, 70]

    JSONPATHARRAYDEFAULTEMPTY(myJsonRecord, '$.subjects[*].homework_grades[1]')

    [85, 65]

    JSONPATHARRAYDEFAULTEMPTY(myJsonRecord, '$.subjects[*].homework_grades[7]')

    []

    Jayway JsonPath Evaluator Toolarrow-up-right
    table config
    distinctCountThetaSketch(
    <thetaSketchColumn>, <thetaSketchParams>, predicate1, predicate2..., postAggregationExpressionToEvaluate
    ) -> Long
    • thetaSketchColumn (required): Name of the column to aggregate on.

    • thetaSketchParams (required): Parameters for constructing the intermediate theta-sketches.

      • Currently, the only supported parameter is nominalEntries (defaults to 4096).

    • predicates (optional)_: _ These are individual predicates of form lhs <op> rhs which are applied on rows selected by the where clause. During intermediate sketch aggregation, sketches from the thetaSketchColumn that satisfies these predicates are unionized individually. For example, all filtered rows that match country=USA are unionized into a single sketch. Complex predicates that are created by combining (AND/OR) of individual predicates is supported.

    • postAggregationExpressionToEvaluate (required): The set operation to perform on the individual intermediate sketches for each of the predicates. Currently supported operations are SET_DIFF, SET_UNION, SET_INTERSECT , where DIFF requires two arguments and the UNION/INTERSECT allow more than two arguments.

    hashtag
    Usage Examples

    These examples are based on the Batch Quick Start.

    value

    149

    value

    146

    We can also provide predicates and a post aggregation expression to compute more complicated cardinalities. For example, we could can find the intersection of the following queries:

    yearID

    1986

    1985

    yearID

    1937

    2003

    1979

    1900

    1986

    1978

    2012

    (the yearId 1986 is the only one in common)

    By running the following query:

    value

    1

    Theta Sketcharrow-up-right
    Sketch Classarrow-up-right
    hashtag
    Usage Examples

    These examples are based on the Batch Quick Start.

    homeRuns
    baseOnBalls
    total

    26

    37

    962

    select homeRuns, baseOnBalls, MULT(homeRuns, baseOnBalls) AS total
    from baseballStats 
    WHERE teamID = 'ML1' 
    AND yearID = 1956 
    AND playerName = 'Henry Louis'
    hashtag
    Usage Examples
    month

    9

    month

    10

    select month(1633046399000, 'UTC') AS month
    FROM ignoreMe
    select month(1633046399000, 'CET') AS month
    FROM ignoreMe
    hashtag
    Usage Examples
    now

    1639150454255

    This function is typically used in predicate to filter on timestamp for recent data. e.g. filter data on recent 1 day(86400 seconds)

    select now() AS now
    FROM ignoreMe
    SELECT * 
    FROM tableName
    WHERE tsInMillis > now() - 86400000
    {
      "data": {
        "name": "Pete",
        "age": 24,
        "subjects": [
          {
            "name": "maths",
            "homework_grades": [80, 85, 90, 95, 100],
            "grade": "A",
            "score": 90
          },
          {
            "name": "english",
            "homework_grades": [60, 65, 70, 85, 90],
            "grade": "B",
            "score": 70
          }
        ]
      }
    }
    {
       "tableConfig":{
          "ingestionConfig":{
             "transformConfigs":[
                {
                   "columnName":"name",
                   "transformFunction":"JSONPATHSTRING(data, '$.name')"
                },
                {
                   "columnName":"age",
                   "transformFunction":"JSONPATHSTRING(data, '$.age')"
                }
             ]
          }
       }
    }
    select dateTrunc('week', 1639480981746) AS ts
    FROM ignoreMe
    select dateTrunc('week', 1639480981746, 'MILLISECONDS') AS ts
    FROM ignoreMe
    select dateTrunc(
      'week', 
      1639480981746, 
      'MILLISECONDS', 
      'UTC', 
      'SECONDS'
    ) AS ts
    FROM ignoreMe
    select dateTrunc(
      'week', 
      1639480981746, 
      'MILLISECONDS', 
      'CET', 
      'SECONDS'
    ) AS ts
    FROM ignoreMe
    select dateTrunc(
      'quarter', 
      1639480981746, 
      'MILLISECONDS', 
      'America/Los_Angeles', 
      'HOURS'
    ) AS ts
    FROM ignoreMe
    {
      "data": {
        "name": {"full.name": "Peter", "nick.name": "Pete"},
        "age": 24,
        "subjects": [
          {
            "name": "maths",
            "homework_grades": [80, 85, 90, 95, 100],
            "grade": "A",
            "score": 90
          },
          {
            "name": "english",
            "homework_grades": [60, 65, 70, 85, 90],
            "grade": "B",
            "score": 70
          }
        ]
      }
    }
    {
       "tableConfig":{
          "ingestionConfig":{
             "transformConfigs":[
                {
                   "columnName":"age",
                   "transformFunction":"JSONPATHSTRING(data, '$.age')"
                },
                {
                   "columnName":"nickName",
                   "transformFunction":"JSONPATHSTRING(data, '$.name[\"nick.name\"]')"
                }
             ]
          }
       }
    }
    {
      "data": {
        "name": "Pete",
        "age": 24,
        "subjects": [
          {
            "name": "maths",
            "homework_grades": [80, 85, 90, 95, 100],
            "grade": "A",
            "score": 90
          },
          {
            "name": "english",
            "homework_grades": [60, 65, 70, 85, 90],
            "grade": "B",
            "score": 70
          }
        ]
      }
    }
    {
       "tableConfig":{
          "ingestionConfig":{
             "transformConfigs":[
                {
                   "columnName":"names",
                   "transformFunction":"JSONPATHARRAYDEFAULTEMPTY(data, '$.subjects[*].name')"
                },
                {
                   "columnName":"ages",
                   "transformFunction":"JSONPATHARRAYDEFAULTEMPTY(data, '$.subjects[*].score')"
                },
                {
                   "columnName":"homeworkGrades",
                   "transformFunction":"JSONPATHARRAYDEFAULTEMPTY(data, '$.subjects[*].homework_grades[1]')"
                }
             ]
          }
       }
    }
    select distinctCountThetaSketch(teamID) AS value
    from baseballStats 
    select distinctCountThetaSketch(teamID, 'nominalEntries=10') AS value
    from baseballStats
    select yearID
    from baseballStats
    where teamID = 'SFN' AND numberOfGames = 28 AND homeRuns = 1
    select yearID
    from baseballStats
    where teamID = 'CHN' AND numberOfGames = 28 AND homeRuns = 1
    select distinctCountThetaSketch(
      yearID, 
      'nominalEntries=4096', 
      'teamID = ''SFN'' AND numberOfGames=28 AND homeRuns=1',
      'teamID = ''CHN'' AND numberOfGames=28 AND homeRuns=1',
      'SET_INTERSECT($1, $2)'
    ) AS value
    from baseballStats 
    Arguments
    Description

    jsonField

    An Identifier/Expression contains JSON documents.

    'jsonPath'

    Follows to read values from JSON documents.

    'results_type'

    One of the Pinot supported data types:INT, LONG, FLOAT, DOUBLE, BOOLEAN, TIMESTAMP, STRING,

    INT_ARRAY, LONG_ARRAY, FLOAT_ARRAY, DOUBLE_ARRAY, STRING_ARRAY.

    circle-exclamation

    'jsonPath'and`` ``'results_type'are literals. Pinot uses single quotes to distinguish them from identifiers.

    hashtag
    Usage Examples

    The examples in this section are based on the Batch JSON Quick Start. In particular we'll be querying the row WHERE id = 7044874109:

    repo

    {"id":115911530,"name":"LimeVista/Tapes","url":"https://api.github.com/repos/LimeVista/Tapes"}

    The following examples show how to use the JSONEXTRACTSCALAR function:

    id
    name

    7044874109

    LimeVista/Tapes

    id
    name

    7044874109

    dummyValue

    JSONPATHARRAY

    This section contains reference documentation for the JSONPATHARRAY function.

    Extracts an array from jsonField based on 'jsonPath', the result type is inferred based on JSON value. This function can only be used in an ingestion transformation function.

    hashtag
    Signature

    JSONPATHARRAY(jsonField, 'jsonPath')

    Arguments
    Description
    circle-exclamation

    'jsonPath'` is a literal. Pinot uses single quotes to distinguish them from identifiers. You can use the to test JSON expressions before you import any data.

    hashtag
    Usage Examples

    The usage examples are based on extracting fields from the following JSON document:

    Expression
    Value

    This function can be used in the to extract the name, score, and second value of homework_grades into their respective columns , as described below:

    select repo
    from githubEvents 
    WHERE id = 7044874109
    select id, jsonextractscalar(repo, '$.name', 'STRING') AS name
    from githubEvents 
    WHERE id = 7044874109
    select id, jsonextractscalar(repo, '$.foo', 'STRING') AS name
    from githubEvents 
    WHERE id = 7044874109
    [
      {
        "message": "QueryExecutionError:\njava.lang.RuntimeException: Illegal Json Path: [$.foo], when reading [{\"id\":115911530,\"name\":\"LimeVista/Tapes\",\"url\":\"https://api.github.com/repos/LimeVista/Tapes\"}]\n\tat org.apache.pinot.core.operator.transform.function.JsonExtractScalarTransformFunction.transformToStringValuesSV(JsonExtractScalarTransformFunction.java:254)\n\tat org.apache.pinot.core.operator.docvalsets.TransformBlockValSet.getStringValuesSV(TransformBlockValSet.java:90)\n\tat org.apache.pinot.core.common.RowBasedBlockValueFetcher.createFetcher(RowBasedBlockValueFetcher.java:64)\n\tat org.apache.pinot.core.common.RowBasedBlockValueFetcher.<init>(RowBasedBlockValueFetcher.java:32)",
        "errorCode": 200
      }
    ]
    select id, jsonextractscalar(repo, '$.foo', 'STRING', 'dummyValue') AS name
    from githubEvents 
    WHERE id = 7044874109
    JsonPath Syntaxarrow-up-right

    jsonField

    An Identifier/Expression contains JSON documents.

    'jsonPath'

    Follows JsonPath Syntaxarrow-up-right to read values from JSON documents.

    JSONPATHARRAY(myJsonRecord, '$.subjects[*].name')

    ["maths", "english"]

    JSONPATHARRAY(myJsonRecord, '$.subjects[*].score')

    [90, 70]

    JSONPATHARRAY(myJsonRecord, '$.subjects[*].homework_grades[1]')

    [85, 65]

    Jayway JsonPath Evaluator Toolarrow-up-right
    table config
    {
      "data": {
        "name": "Pete",
        "age": 24,
        "subjects": [
          {
            "name": "maths",
            "homework_grades": [80, 85, 90, 95, 100],
            "grade": "A",
            "score": 90
          },
          {
            "name": "english",
            "homework_grades": [60, 65, 70, 85, 90],
            "grade": "B",
            "score": 70
          }
        ]
      }
    }
    {
       "tableConfig":{
          "ingestionConfig":{
             "transformConfigs":[
                {
                   "columnName":"names",
                   "transformFunction":"JSONPATHARRAY(data, '$.subjects[*].name')"
                },
                {
                   "columnName":"ages",
                   "transformFunction":"JSONPATHARRAY(data, '$.subjects[*].score')"
                },
                {
                   "columnName":"homeworkGrades",
                   "transformFunction":"JSONPATHARRAY(data, '$.subjects[*].homework_grades[1]')"
                }
             ]
          }
       }
    }

    DATETIMECONVERT

    This section contains reference documentation for the DATETIMECONVERT function.

    Converts the value from a column that contains an epoch timestamp into another time unit and buckets based on the given time granularity.

    hashtag
    Signature

    DATETIMECONVERT(columnName, inputFormat, outputFormat, outputGranularity)

    inputFormat and outputFormat are defined using the following structure:

    <time size>:<time unit>:<time format>:<pattern>

    where:

    • time size - size of the time unit eg: 1, 10

    • time unit - DAYS, HOURS, MINUTES, SECONDS, MILLISECONDS

    granularity is specified in the format <time size>:<time unit>.

    hashtag
    Usage Examples

    These examples are based on the .

    created_at_timestamp from milliseconds since epoch to days since epoch, bucketed to 1 day granularity:

    id
    created_at_timestamp
    timeInMs
    convertedTime

    created_at_timestamp bucketed to 15 minutes granularity:

    id
    created_at_timestamp
    timeInMs
    convertedTime

    created_at_timestamp to format yyyy-MM-dd, bucketed to 1 days granularity:

    id
    created_at_timestamp
    timeInMs
    convertedTime

    created_at_timestamp to format yyyy-MM-dd HH:mm, in timezone Pacific/Kiritimati:

    id
    created_at_timestamp
    timeInMs
    convertedTime

    created_at_timestamp to format yyyy-MM-dd, in timezone Pacific/Kiritimati and bucketed to 1 day granularity:

    id
    created_at_timestamp
    timeInMs
    convertedTime
    ,
    MICROSECONDS
    ,
    NANOSECONDS
  • time format

    • EPOCH

    • SIMPLE_DATE_FORMAT pattern - defined in case of SIMPLE_DATE_FORMAT e.g. yyyy-MM-dd. A specific timezone can be passed using tz(timezone). Timezone can be long or short string format timezone. e.g. Asia/Kolkata or PDT

  • 7044874134

    2018-01-01 11:00:00.0

    1514804402000

    17532

    7044874134

    2018-01-01 11:00:00.0

    1514804402000

    1514804400000

    7044874134

    2018-01-01 11:00:00.0

    1514804402000

    2018-01-01

    7044874134

    2018-01-01 11:00:00.0

    1514804402000

    2018-01-02 01:00

    7044874134

    2018-01-01 11:00:00.0

    1514804402000

    2018-01-02 00:00

    Batch JSON Quick Start
    select id, 
           created_at_timestamp, 
           cast(created_at_timestamp AS long) AS timeInMs,
           DATETIMECONVERT(
             created_at_timestamp, 
             '1:MILLISECONDS:EPOCH', 
             '1:DAYS:EPOCH', 
             '1:DAYS'
           ) AS convertedTime
    from githubEvents
    WHERE id = 7044874134
    select id, 
           created_at_timestamp, 
           cast(created_at_timestamp AS long) AS timeInMs,
           DATETIMECONVERT(
             created_at_timestamp, 
             '1:MILLISECONDS:EPOCH', 
             '1:MILLISECONDS:EPOCH', 
             '15:MINUTES'
           ) AS convertedTime
    from githubEvents
    WHERE id = 7044874134
    select id, 
           created_at_timestamp, 
           cast(created_at_timestamp AS long) AS timeInMs,
           DATETIMECONVERT(
             created_at_timestamp, 
             '1:MILLISECONDS:EPOCH', 
             '1:DAYS:SIMPLE_DATE_FORMAT:yyyy-MM-dd', 
             '1:DAYS'
           ) AS convertedTime
    from githubEvents
    WHERE id = 7044874134
    select id, 
           created_at_timestamp, 
           cast(created_at_timestamp AS long) AS timeInMs,
           DATETIMECONVERT(
             created_at_timestamp, 
             '1:MILLISECONDS:EPOCH', 
             '1:MILLISECONDS:SIMPLE_DATE_FORMAT:yyyy-MM-dd HH:mm tz(Pacific/Kiritimati)', 
             '1:MILLISECONDS'
           ) AS convertedTime
    from githubEvents
    WHERE id = 7044874134
    select id, 
           created_at_timestamp, 
           cast(created_at_timestamp AS long) AS timeInMs,
           DATETIMECONVERT(
             created_at_timestamp, 
             '1:MILLISECONDS:EPOCH', 
             '1:MILLISECONDS:SIMPLE_DATE_FORMAT:yyyy-MM-dd HH:mm tz(Pacific/Kiritimati)', 
             '1:DAYS'
           ) AS convertedTime
    from githubEvents
    WHERE id = 7044874134

    FUNNELCOUNT

    This section contains reference documentation for the FUNNELCOUNT function.

    Funnel analytics aggregation function.

    Returns array of distinct correlated counts for each funnel step.

    hashtag
    Signature

    FUNNEL_COUNT (

    STEPS ( predicate1, predicate2 ... ),

    CORRELATED_BY ( correlation_column ),

    SETTINGS ( setting1, setting2 ... ) )

    Parameter
    Arguments
    Description

    hashtag
    Usage Examples

    Many datasets are time series in nature, tracking events of an entity over time. An example of such a dataset could be a user analytics activity log from a commerce web application.

    hashtag
    Example

    user_id
    event_time
    url

    hashtag
    Funnel

    We want to analyse the following checkout funnel:

    • /cart/add

    • /checkout/start

    • /checkout/confirmation

    hashtag
    Counts

    We want to answer the following questions about the above funnel:

    • How many users entered the top of the funnel?

    • How many of these users proceeded to the second step?

    • How many users reached the bottom of the funnel after completing all steps?

    hashtag
    Query

    counts

    Notes

    Notice that although U1 user added to cart twice, it still counted as one conversion in the first step, as we report on unique counts rather than total events. Also notice that although U2 events were logged out of order, we still counted the user as converted.

    Equivalence

    The above query is equivalent to the below presto SQL query:

    hashtag
    Settings

    For a large dataset we could use for example a theta_sketch strategy, or furthermore, partition the data by user_id and leverage a partitioned strategy. It is also important to filter in the where clause so to aggregate only necessary rows.

    counts

    hashtag
    Another Example

    We now want to learn how many users checkout after a text search; as opposed to other entry points such as browsing a product category listing. We want to then analyse the following funnel:

    • /product/search

    • /cart/add

    • /checkout/start

    hashtag
    Query

    counts

    Notes

    Notice that U1 is not counted in this funnel, as the user did not perform any product search. Both U2 and U3 entered the top of the funnel and performed the second step, but only U2 converted to the bottom of the funnel.

    2021-10-01 09:47:00.000

    /cart/add

    U3

    2021-10-01 10:02:00.000

    /product/listing

    U3

    2021-10-01 10:05:00.000

    /product/search

    U2

    2021-10-01 10:06:00.000

    /product/search

    U2

    2021-10-01 10:15:00.000

    /checkout/start

    U2

    2021-10-01 10:16:00.000

    /cart/add

    U3

    2021-10-01 11:17:00.000

    /product/details

    U2

    2021-10-01 11:18:00.000

    /checkout/confirmation

    U3

    2021-10-01 11:21:00.000

    /cart/add

    U1

    2021-10-01 11:33:00.000

    /cart/add

    U1

    2021-10-01 11:46:00.000

    /checkout/start

    U1

    2021-10-01 11:54:00.000

    /checkout/confirmation

    /checkout/confirmation

    STEPS

    predicates 1...n

    (required) These are individual predicates representing funnel steps which are applied on rows selected by the where clause. Distinct values from the correlation_column that satisfy these predicates are counted per step. For example, all filtered rows that match url = '/checkout' are unionized into a set. Sets are intersected with the sets resulted from the preceding steps, each step retaining only individuals present in previous steps. Finally, unique counts are returned for each step in the funnel.

    CORRELATED_BY

    correlation_column

    (required) Column to leverage for funnel correlation, distinct values from this column are counted per step during aggregation. Only dictionary-encoded columns are supported.

    SETTINGS

    settings 1...n

    (optional) Settings to select and configure a funnel counting strategy:

    bitmap (default): This strategy is accurate for INT column, but approximate for other cases where hash codes are used in distinct counting and there may be hash collisions. For accurate distinct counting on all column types, use 'set' instead. See also DISTINCTCOUNTBITMAP.

    set: This strategy uses fastutilarrow-up-right hash sets. Use with care, unbounded memory cost. See also DISTINCTCOUNT.

    theta_sketch: This strategy leverages Theta Sketcharrow-up-right framework to provide an approximate funnel count with a small memory footprint. See also DISTINCTCOUNTTHETASKETCH.

    nominalEntries: theta-sketch strategy parameter (defaults to 4096). Can only be used in conjunction with theta_sketch setting.

    partitioned: This strategy counts funnel steps per segment, then sums up step counts across segments. Correlation column should be configured as partition column for this strategy. See also .

    sorted: This strategy counts funnel steps per segment with zero memory footprint. Correlation column should be configured as sort column for this strategy. Can only be used in conjunction with partitioned setting.

    U1

    2021-10-01 09:01:00.000

    /product/listing

    U2

    2021-10-01 09:17:00.000

    /product/search

    U1

    2021-10-01 09:33:00.000

    /product/details

    3, 2, 2

    3, 2, 2

    2, 2, 1, 1

    U1

    select 
      FUNNEL_COUNT(
        STEPS(
          url = '/cart/add', 
          url = '/checkout/start', 
          url = '/checkout/confirmation'),
        CORRELATED_BY(user_id)
      ) AS counts
    from user_log 
    select 
       ARRAY[
         count_if(steps[1]),
         count_if(steps[1] and steps[2]),
         count_if(steps[1] and steps[2] and steps[3])
       ] as counts
     from (
       select 
         ARRAY[
           bool_or(url = '/cart/add'),
           bool_or(url = '/checkout/start'),
           bool_or(url = '/checkout/confirmation')
         ] as steps
       from user_log
       group by user_id
     )
    select 
      FUNNEL_COUNT(
        STEPS(
          url = '/cart/add', 
          url = '/checkout/start', 
          url = '/checkout/confirmation'),
        CORRELATED_BY(user_id),
        SETTINGS('theta_sketch', 'nominalEntries=4096')
      ) AS counts
    from user_log 
    where url in ('/cart/add', '/checkout/start', '/checkout/confirmation')
    select 
      FUNNEL_COUNT(
        STEPS(
          url = '/product/search',
          url = '/cart/add', 
          url = '/checkout/start', 
          url = '/checkout/confirmation'),
        CORRELATED_BY(user_id)
      ) AS counts
    from user_log 
    SEGMENTPARTITIONEDDISTINCTCOUNT