arrow-left

All pages
gitbookPowered by GitBook
triangle-exclamation
Couldn't generate the PDF for 168 pages, generation stopped at 100.
Extend with 50 more pages.
1 of 100

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

ABS

This section contains reference documentation for the abs function.

Absolute of a value

hashtag
Signature

ABS(col1)

hashtag
Usage Examples

value
value
select ABS(-12.1) AS value
from ignoreMe

12.1

select ABS(12.1) AS value
from ignoreMe

12.1

Functions

This page contains reference documentation for functions in Apache Pinot.

This page contains reference documentation for functions in Apache Pinot.

ABSchevron-rightADDchevron-rightarrayConcatDoublechevron-rightarrayConcatFloatchevron-rightarrayConcatIntchevron-rightarrayConcatLongchevron-rightarrayConcatStringchevron-rightarrayContainsIntchevron-rightarrayContainsStringchevron-rightarrayDistinctIntchevron-rightarrayDistinctStringchevron-rightarrayIndexOfIntchevron-rightarrayIndexOfStringchevron-rightARRAYLENGTHchevron-rightarrayRemoveIntchevron-rightarrayRemoveStringchevron-rightarrayReverseIntchevron-rightarrayReverseStringchevron-rightarraySliceIntchevron-rightarraySliceStringchevron-rightarraySortIntchevron-rightarraySortStringchevron-rightarrayUnionIntchevron-rightarrayUnionStringchevron-rightAVGMVchevron-rightBase64chevron-rightcaseWhenchevron-rightceilchevron-rightCHRchevron-righthttps://github.com/pinot-contrib/pinot-docs/blob/latest/configuration-reference/functions/clpdecode.mdchevron-rightcodepointchevron-rightconcatchevron-rightcountchevron-rightCOUNTMVchevron-rightDATETIMECONVERTchevron-rightDATETRUNCchevron-rightdaychevron-rightdayOfWeekchevron-rightdayOfYearchevron-rightDISTINCTchevron-rightDISTINCTCOUNTchevron-rightDISTINCTCOUNTBITMAPchevron-rightDISTINCTCOUNTHLLMVchevron-rightDISTINCTCOUNTHLLchevron-rightDISTINCTCOUNTBITMAPMVchevron-righthttps://github.com/pinot-contrib/pinot-docs/blob/latest/configuration-reference/functions/distinctcounthllplus.mdchevron-righthttps://github.com/pinot-contrib/pinot-docs/blob/latest/configuration-reference/functions/distinctcounthllplusmv.mdchevron-rightDISTINCTCOUNTMVchevron-rightDISTINCTCOUNTRAWHLLchevron-righthttps://github.com/pinot-contrib/pinot-docs/blob/latest/configuration-reference/functions/distinctcountrawhllplus.mdchevron-rightDISTINCTCOUNTRAWHLLMVchevron-righthttps://github.com/pinot-contrib/pinot-docs/blob/latest/configuration-reference/functions/distinctcountrawhllplusmv.mdchevron-righthttps://github.com/pinot-contrib/pinot-docs/blob/latest/configuration-reference/functions/distinctcountrawcpcsketch.mdchevron-rightDISTINCTCOUNTRAWTHETASKETCHchevron-rightDISTINCTCOUNTRAWHLLchevron-rightDISTINCTCOUNTTHETASKETCHchevron-rightDISTINCTCOUNTTHETASKETCHchevron-rightDISTINCTCOUNTHLLchevron-rightDISTINCTSUMchevron-rightDISTINCTSUMMVchevron-rightDIVchevron-rightexpchevron-rightExtractchevron-rightFIRSTWITHTIMEchevron-rightFrequentLongsSketchchevron-rightFrequentStringsSketchchevron-rightFromDateTimechevron-rightFLOORchevron-rightFromEpochchevron-rightFromEpochBucketchevron-rightHistogramchevron-righthourchevron-rightjsonextractkeychevron-rightjsonextractscalarchevron-rightJSONFORMATchevron-rightJSONPATHchevron-rightJSONPATHARRAYchevron-rightJSONPATHARRAYDEFAULTEMPTYchevron-rightJSONPATHDOUBLEchevron-rightJSONPATHLONGchevron-rightJSONPATHSTRINGchevron-rightLAGchevron-rightLASTWITHTIMEchevron-rightLEADchevron-rightlengthchevron-rightlnchevron-rightlowerchevron-rightlpadchevron-rightltrimchevron-rightmaxchevron-rightMAXMVchevron-rightMD5chevron-rightmillisecondchevron-rightminchevron-rightminmaxrangechevron-rightMINMAXRANGEMVchevron-rightMINMVchevron-rightminutechevron-rightMODchevron-rightmodechevron-rightmonthchevron-rightmultchevron-rightnowchevron-rightpercentilechevron-rightpercentileestchevron-rightpercentileestmvchevron-rightpercentiletdigestchevron-rightpercentiletdigestmvchevron-rightpercentilemvchevron-rightpercentilekllchevron-rightpercentilerawkllchevron-rightpercentilekllmvchevron-rightpercentilerawkllmvchevron-rightquarterchevron-rightregexpExtractchevron-rightremovechevron-rightreplacechevron-rightreversechevron-rightroundchevron-rightROW_NUMBERchevron-rightrpadchevron-rightrtrimchevron-rightsecondchevron-rightSEGMENTPARTITIONEDDISTINCTCOUNTchevron-rightshachevron-rightsha256chevron-rightsha512chevron-rightsqrtchevron-rightstartswithchevron-rightST_AsBinarychevron-rightST_AsTextchevron-rightST_Containschevron-rightST_Distancechevron-rightST_GeogFromTextchevron-rightST_GeogFromWKBchevron-rightST_GeometryTypechevron-rightST_GeomFromTextchevron-rightST_GeomFromWKBchevron-rightSTPOINTchevron-rightST_Polygonchevron-rightstrposchevron-rightST_Unionchevron-rightSUBchevron-rightsubstrchevron-rightsumchevron-rightsummvchevron-rightTIMECONVERTchevron-righttimezoneHourchevron-righttimezoneMinutechevron-rightToDateTimechevron-rightToEpochchevron-rightToEpochBucketchevron-rightToEpochRoundedchevron-rightTOJSONMAPSTRchevron-righttoSphericalGeographychevron-righttrimchevron-rightupperchevron-rightUrlchevron-rightUTF8chevron-rightVALUEINchevron-rightweekchevron-rightyearchevron-rightyearOfWeekchevron-right

codepoint

This section contains reference documentation for the CODEPOINT function.

the Unicode codepoint of the first character of the string

hashtag
Signature

CODEPOINT(col)

hashtag
Usage Examples

value

arrayConcatFloat

This section contains reference documentation for the arrayConcatFloat function.

Concatenates two arrays of floats.

hashtag
Signature

arrayConcatFloat('colName1', 'colName2')

SELECT CODEPOINT('Apache Pinot') AS value
FROM ignoreMe

65

hashtag
Usage Examples

This example assumes the multiValueTable columns mvCol1 and mvCol2 are both of type FLOAT with singleValueField in the table schema set to false.

select mvCol1, 
       arrayConcatFloat(mvCol1, mvCol2) AS concatFloats
from multiValueTable
WHERE arraylength(mvCol1) >= 2
limit 5

ago

This section contains reference documentation for the ago function.

Return time as epoch millis before the given period in ISO-8601 duration format (PnDTnHnMn.nS with days considered to be exactly 24 hours).

Examples:

  • "PT20.345S" -- parses as "20.345 seconds"

  • "PT15M" -- parses as "15 minutes" (where a minute is 60 seconds)

  • "PT10H" -- parses as "10 hours" (where an hour is 3600 seconds)

  • "P2D" -- parses as "2 days" (where a day is 24 hours or 86400 seconds)

  • "P2DT3H4M" -- parses as "2 days, 3 hours and 4 minutes"

  • "P-6H3M" -- parses as "-6 hours and +3 minutes"

  • "-P6H3M" -- parses as "-6 hours and -3 minutes"

  • "-P-6H+3M" -- parses as "+6 hours and -3 minutes"

hashtag
Signature

ago()

hashtag
Usage Examples

oneDayAgo

This function is typically used in the predicate to filter on timestamps for recent data. e.g. filter data on recent 1 day.

arrayContainsInt

This section contains reference documentation for the arrayContainsInt function.

Checks if int value exists in array.

hashtag
Signature

arrayContainsInt('colName', valueToFind)

hashtag
Usage Examples

These examples are based on the .

DivAirportIDs
containsValue

arrayDistinctString

This section contains reference documentation for the arrayDistinctString function.

Returns unique values in an array of strings.

hashtag
Signature

arrayDistinctString('colName')

hashtag
Usage Examples

These examples are based on the .

DivTailNums
unique

arrayReverseInt

This section contains reference documentation for the arrayReverseInt function.

Reverses array of ints.

hashtag
Signature

arrayReverseInt('colName')

hashtag
Usage Examples

These examples are based on the .

DivAirportIDs
reversedIds

arrayConcatInt

This section contains reference documentation for the arrayConcatInt function.

Concatenates two arrays of ints.

hashtag
Signature

arrayConcatInt('colName1', 'colName2')

hashtag
Usage Examples

These examples are based on the .

DivWheelsOffs
concatIds

count

This section contains reference documentation for the count function.

Get the count of rows in a group

hashtag
Signature

COUNT(colName)

hashtag
Usage Examples

These examples are based on the .

value

COVAR_SAMP

This section contains reference documentation for the COVAR_SAMP function.

Returns the sample covariance between of 2 numerical columns.

COVAR_SAMP(col1, col2) = COVAR_POP(col1, col2) * besselCorrection

hashtag
Signatures

COVAR_SAMP(col1, col2) -> double

hashtag
Usage Examples

These examples are based on the .

covariance

arrayDistinctInt

This section contains reference documentation for the arrayDistinctInt function.

Returns unique values in an array of ints.

hashtag
Signature

arrayDistinctInt('colName')

1639150454255

13891,12892

false

14683,14683

true

12339,12339

false

13487,13930

false

13029,11292

false

Hybrid Quick Start

N7713A,N7713A

N7713A

N344AA,N344AA

N344AA

N344AA,N344AA

N344AA

N7713A,N7713A

N7713A

Hybrid Quick Start

13891,12892

12892,13891

14683,14683

14683,14683

12339,12339

12339,12339

13487,13930

13930,13487

13029,11292

11292,13029

Hybrid Quick Start

1453,1731

1453,1731,1415,1623

1908,1758

1908,1758,1339,2310

1453,1731

1453,1731,1415,1623

1908,1758

1908,1758,1339,2310

Hybrid Quick Start
select count(*) AS value
from baseballStats 

97889

Batch Quick Start
SELECT COVAR_SAMP(numberOfGames, AtBatting) AS covariance 
FROM baseballStats

8270.973200974102

Batch Quick Start
hashtag
Usage Examples

These examples are based on the Hybrid Quick Start.

DivAirportIDs
unique

15016,11066

15016,11066

10620,14869

10620,14869

13891,12892

13891,12892

12264,10397

12264,10397

11066,12892

11066,12892

select ago('P1D') AS oneDayAgo
FROM ignoreMe
SELECT * 
FROM tableName
WHERE tsInMillis > ago('P1D')
select DivAirportIDs, 
       arrayContainsInt(DivAirportIDs, 14683) AS containsValue
from airlineStats 
WHERE arraylength(DivAirportIDs) >= 2
limit 5
select DivTailNums, 
       arrayDistinctString(DivTailNums) AS unique
from airlineStats 
WHERE arraylength(DivTailNums) >= 2
limit 5
select DivAirportIDs, 
       arrayReverseInt(DivAirportIDs) AS reversedIds
from airlineStats 
WHERE arraylength(DivAirportIDs) >= 2
limit 5
select DivWheelsOffs, 
       arrayConcatInt(DivWheelsOffs, DivWheelsOns) AS concatIds
from airlineStats 
WHERE arraylength(DivWheelsOffs) >= 2
limit 5
select DivAirportIDs, 
       arrayDistinctInt(DivAirportIDs) AS unique
from airlineStats 
WHERE arraylength(DivAirportIDs) >= 2
limit 5

ceil

This section contains reference documentation for the CEIL function.

Rounded up to the nearest integer.

hashtag
Signature

CEIL(col1)

hashtag
Usage Examples

value
value

arrayConcatString

This section contains reference documentation for the arrayConcatString function.

Concatenates two arrays of strings.

hashtag
Signature

arrayConcatString('colName1', 'colName2')

hashtag
Usage Examples

These examples are based on the .

DivTailNums
concatIds

lpad

This section contains reference documentation for the LPAD function.

string padded from the left side with pad to reach final size

hashtag
Signature

LPAD(col, size, pad)

hashtag
Usage Examples

value

FLOOR

This section contains reference documentation for the FLOOR function.

Rounded down to the nearest integer.

hashtag
Signature

FLOOR(col1)

hashtag
Usage Examples

value
value

caseWhen

This section contains reference documentation for the caseWhen function.

Returns values depending on boolean expressions. This function can only be used in an .

hashtag
Signature

caseWhen(booleanExpr1, valueIfExpr1True, booleanExpr2, valueIfExpr2True) caseWhen(booleanExpr1, valueIfExpr1True, booleanExpr2, valueIfExpr2True, ... ,valueIfFalse)

DISTINCT

This section contains reference documentation for the DISTINCT function.

Returns the distinct row values in a group

hashtag
Signature

DISTINCT(colName)

DISTINCTCOUNT

This section contains reference documentation for the DISTINCTCOUNT function.

Returns the count of distinct row values in a group

hashtag
Signature

DISTINCTCOUNT(colName)

day

This section contains reference documentation for the day function.

Returns the day of the month from the given epoch millis in UTC or specified timezone. The value ranges from 1 to 31.

hashtag
Signature

day(tsInMillis)

day(tsInMillis, timeZoneId)

MD5

This section contains reference documentation for the MD5 function.

Return MD5 digest of binary column(bytes type) as hex string

hashtag
Signature

MD5(bytesCol)

DISTINCTCOUNTHLLMV

This section contains reference documentation for the DISTINCTCOUNTBITMAPMV function.

Returns the count of distinct row values in a group. This function is accurate for an INT or dictionary encoded column, but approximate for other cases where hash codes are used in distinct counting and there may be hash collision.

hashtag
Signature

DISTINCTCOUNTBITMAPMV(colName)

DISTINCTCOUNTBITMAPMV

This section contains reference documentation for the DISTINCTCOUNTHLLMV function.

Returns an approximate distinct count using HyperLogLog in a group.

hashtag
Signature

DISTINCTCOUNTHLLMV(colName)

ltrim

This section contains reference documentation for the ltrim function.

trim spaces from left side of the string

hashtag
Signature

LTRIM(col)

arrayIndexOfString

This section contains reference documentation for the arrayIndexOfString function.

Finds the last index of the given value in the array starting at the given index.

hashtag
Signature

arrayIndexOfString('colName', valueToFind)

DISTINCTCOUNTHLL

This section contains reference documentation for the DISTINCTCOUNTHLL function.

Returns an approximate distinct count using HyperLogLog. It also takes an optional second argument to configure the log2m for the HyperLogLog. For accurate distinct counting, see .

hashtag
Signature

DISTINCTCOUNTHLL(colName, log2m)

exp

This section contains reference documentation for the exp function.

Euler’s number(e) raised to the power of col.

hashtag
Signature

EXP(col1)

arrayConcatLong

This section contains reference documentation for the arrayConcatLong function.

Concatenates two arrays of longs.

hashtag
Signature

arrayConcatLong('colName1', 'colName2')

select CEIL(12.1) AS value
from ignoreMe

13

select CEIL(-12.1) AS value
from ignoreMe

-12

N7713A,N7713A

N7713A,N7713A,N7713A,N7713A

N344AA,N344AA

N344AA,N344AA,N344AA,N344AA

N344AA,N344AA

N344AA,N344AA,N344AA,N344AA

N7713A,N7713A

N7713A,N7713A,N7713A,N7713A

Hybrid Quick Start
SELECT LPAD('Hello, World', '20', '*') AS value
FROM ignoreMe

********Hello, World

select FLOOR(12.1) AS value
from ignoreMe

12

select FLOOR(-12.1) AS value
from ignoreMe

-13

Arguments
Description

booleanExpr1

A boolean expression

valueIfExpr1True, valueIfExpr2True

A value to return.

hashtag
Usage Examples

The usage examples are based on extracting fields from the following JSON documents:

Expression
Value

CASEWHEN(latitude > 0, 'North', 'South')

North

CASEWHEN(latitude > 0, 1, 0)

1

This function can be used in the table config to add northernHemisphere column:

ingestion transformation function
hashtag
Usage Examples

These examples are based on the Batch Quick Start.

value

NL

UA

AL

NA

PL

AA

FL

value

NL

UA

AL

NA

PL

AA

FL

hashtag
Usage Examples

These examples are based on the Batch Quick Start.

value

7

value

149

select DISTINCTCOUNT(league) AS value
from baseballStats 
select DISTINCTCOUNT(teamID) AS value
from baseballStats 
dayOfMonth(tsInMillis)

dayOfMonth(tsInMillis, timeZoneId)

hashtag
Usage Examples

day

12

day

13

day

12

day

13

select day(1639351800000) AS day
FROM ignoreMe
hashtag
Usage Examples

These examples are based on the Real time Quick Start.

event_id
location
hash

282776561

80406178a3d70a3d714041d5c28f5c28f6

92a8b787e81672261aad8afcf9de3aee

circle-info

The row returned will be different if you run this example as the data is ingested in real-time.

select event_id, location, MD5(location) AS hash
from meetupRsvp 
limit 1
hashtag
Usage Examples

These examples are based on the Hybrid Quick Start.

value

34

value

2

select DISTINCTCOUNTBITMAPMV(DivLongestGTimes) AS value
from airlineStats 
where arraylength(DivLongestGTimes) > 1
hashtag
Usage Examples

These examples are based on the Hybrid Quick Start.

value

34

select DISTINCTCOUNTHLLMV(DivLongestGTimes) AS value
from airlineStats 
where arraylength(DivLongestGTimes) > 1
hashtag
Usage Examples
notTrimmed
trimmed

" Pinot with spaces "

"Pinot with spaces "

SELECT ' Pinot with spaces  ' AS notTrimmed,
       ltrim(' Pinot with spaces ') AS trimmed
FROM ignoreMe
hashtag
Usage Examples

These examples are based on the Hybrid Quick Start.

DivTailNums
index

N7713A,N7713A

0

N344AA,N344AA

-1

N7713A,N7713A

0

select DivTailNums, 
       arrayIndexOfString(DivTailNums, 'N7713A') AS index
from airlineStats 
WHERE arraylength(DivTailNums) >= 2
limit 5
hashtag
Usage Examples

These examples are based on the Batch Quick Start.

value

158

value

149

DISTINCTCOUNT
select DISTINCTCOUNTHLL(teamID) AS value
from baseballStats 
hashtag
Usage Examples
value

2.718281828459045

value

162754.79141900392

select EXP(1) AS value
from ignoreMe
select EXP(12) AS value
from ignoreMe
hashtag
Usage Examples

This example assumes the multiValueTable columns mvCol1 and mvCol2 are both of type LONG with singleValueField in the table schema set to false.

select mvCol1, 
       arrayConcatLong(mvCol1, mvCol2) AS concatLongs
from multiValueTable
WHERE arraylength(mvCol1) >= 2
limit 5
select DivTailNums, 
       arrayConcatString(DivTailNums, DivTailNums) AS concatIds
from airlineStats 
WHERE arraylength(DivTailNums) >= 2
limit 5
{
  "latitude": 1.0
}
{
   "tableConfig":{
      "ingestionConfig":{
         "transformConfigs":[
            {
               "columnName":"northernHemisphereStr",
               "transformFunction":"CASEWHEN(latitude > 0, 'North', 'South')"
            }
         ]
      }
   }
}
select DISTINCT league AS value
from baseballStats 
select DISTINCT(league) AS value
from baseballStats 
select day(1639351800000, 'CET') AS day
FROM ignoreMe
select dayOfMonth(1639351800000) AS day
FROM ignoreMe
select dayOfMonth(1639351800000, 'CET') AS day
FROM ignoreMe
select DISTINCTCOUNTBITMAPMV(DivTailNums) AS value
from airlineStats 
where arraylength(DivTailNums) > 1
select DISTINCTCOUNTHLL(teamID, 12) AS value
from baseballStats 

arrayIndexOfInt

This section contains reference documentation for the arrayIndexOfInt function.

Finds the last index of the given value in the array starting at the given index.

hashtag
Signature

arrayIndexOfInt('colName', valueToFind)

hashtag
Usage Examples

These examples are based on the .

DivAirportIDs
index

arrayRemoveInt

This section contains reference documentation for the arrayRemoveInt function.

Removes value from array of ints.

hashtag
Signature

arrayRemoveInt('colName', value)

hashtag
Usage Examples

These examples are based on the .

DivAirportIDs
value

DISTINCTAVGMV

This section contains reference documentation for the DISTINCTAVGMV function.

Returns the average of distinct row values in a group

hashtag
Signature

DISTINCTAVGMV(colName)

hashtag
Usage Examples

These examples are based on the .

VALUE

ARRAYLENGTH

This section contains reference documentation for the ARRAYLENGTH function.

Returns the length of a multi-value column

hashtag
Signature

ARRAYLENGTH('colName')

hashtag
Usage Examples

These examples are based on the .

length
count(*)
circle-info

The count(*) values will increase each time we execute the query as data is constantly being ingested by the Hybrid Quick Start.

DISTINCTSUM

This section contains reference documentation for the DISTINCTSUM function.

Returns the sum of distinct row values in a group

hashtag
Signature

DISTINCTSUM(colName) or sum(distinct col)

hashtag
Usage Examples

These examples are based on the .

VALUE

VALUE

JSONFORMAT

This section contains reference documentation for the JSONFORMAT function.

Extracts the object value from jsonField based on 'jsonPath', the result type is inferred based on JSON value. This function can only be used in an ingestion transformation function.

hashtag
Signature

JSONFORMAT(object)

hashtag
Usage Examples

The usage examples are based on extracting fields from the following JSON document:

Expression
Value

This function can be used in the to extract the meta property into the data column, as described below:

minute

This section contains reference documentation for the minute function.

Returns the minute of the hour from the given epoch millis in UTC or specified timezone. The value ranges from 0 to 59.

hashtag
Signature

minute(tsInMillis)

minute(tsInMillis, timeZoneId)

hashtag
Usage Examples

minute
minute

FIRSTWITHTIME

This section contains reference documentation for the firstwithtime function.

Returns the value of dataColumn with the smallest timeColumn value where:

  • timeColumn is used to define the time of dataColumn, which can be of type TIMESTAMP, INT, LONG

  • dataType specifies the type for dataColumn, which can be BOOLEAN, INT, LONG, FLOAT, DOUBLE, STRING

hashtag
Signature

FIRSTWITHTIME(dataColumn, timeColumn, 'dataType')

hashtag
Example

This example is based on the .

value

DISTINCTCOUNTRAWHLLMV

This section contains reference documentation for the DISTINCTCOUNTRAWHLLMV function.

Returns HLL response serialized as string. The serialized HLL can be converted back into an HLL and then aggregated with other HLLs. A common use case may be to merge HLL responses from different Pinot tables, or to allow aggregation after client-side batching.

hashtag
Signature

DISTINCTCOUNTRAWHLLMV(colName, log2m)

hashtag
Usage Examples

These examples are based on the .

value
value

DIV

This section contains reference documentation for the DIV function.

Quotient of two values

hashtag
Signature

DIV(col1, col2)

hashtag
Usage Examples

These examples are based on the .

homeRuns
numberOfGames
total

JSONPATHDOUBLE

This section contains reference documentation for the JSONPATHDOUBLE function.

Extracts the Double value from jsonField based on 'jsonPath', use optional defaultValuefor null or parsing error. This function can only be used in an .

hashtag
Signature

ADD

This section contains reference documentation for the ADD function.

Sum of at least two values

hashtag
Signature

ADD(col1, col2, col3...)

arraySortInt

This section contains reference documentation for the arraySortInt function.

Sorts array of ints.

hashtag
Signature

arraySortInt('colName')

EXPR_MIN / EXPR_MAX

This section contains reference documentation for the EXPR_MIN and EXPR_MAX function.

This function scans the given dataset to identify the maximum and minimum values in the specified measuring columns. Once these extreme values (the maxima and minima) are found, the function locates the corresponding entries in the projection column. These entries are associated with the rows where the extreme values were found in the measuring columns. The function then returns these projection column values, providing a way to link the extreme measurements with their corresponding data in another part of the dataset.

hashtag
Prerequisite

This function has to be used with the following configuration on the broker:

millisecond

This section contains reference documentation for the millisecond function.

Returns the millisecond of the second from the given epoch millis in UTC or specified timezone. The value ranges from 0 to 999.

hashtag
Signature

millisecond(tsInMillis)

millisecond(tsInMillis, timeZoneId)

MINMV

This section contains reference documentation for the MINMV function.

Get the minimum value in a group

hashtag
Signature

MINMV(colName)

13891,12892

-1

14683,14683

0

12339,12339

-1

13487,13930

-1

13029,11292

-1

Hybrid Quick Start

13891,12892

13891

13198,12892

13198

11066,12892

11066

13198,12892

13198

13891,12892

13891

Hybrid Quick Start
SELECT DISTINCTAVGMV(DivLongestGTimes) AS VALUE
FROM airlineStats
WHERE arraylength(DivLongestGTimes) > 1

32.4

Hybrid Quick Start

1

5382

37

267

33

223

17

166

22

160

Hybrid Quick Start
SELECT DISTINCTSUM(runs) AS VALUE
FROM baseballStats

13922

SELECT SUM(DISTINCT AtBatting) AS VALUE
FROM baseballStats

244032

Batch Quick Start
{"timestamp": "2019-10-09 21:25:25", "meta": {"age": 12}}

JSONFORMAT(meta)

"{\"age\":12}"

table config
select minute(1639351800000) AS minute
FROM ignoreMe

30

select minute(1639351800000, 'America/St_Johns') AS minute
FROM ignoreMe

0

group_name1016303453

Streaming Quick Start
select DISTINCTCOUNTRAWHLLMV(DivAirports) AS value
from airlineStats 
where arraylength(DivAirports) > 1

00000008000000ac00000000000000000000000500000020000000000030000202108000040000010000000300010400000000000000000000000463000000000000000000010001041000200000002000000000000000000a00000000028001000000010800000000010000001008000000804000000000020000040000880000000000000000000000000000000000000000000000800000000800020004000000840000000002000000000000000000001400

0000000100000004000000e4

Hybrid Quick Start
select homeRuns, numberOfGames, DIV(homeRuns, numberOfGames) AS total
from baseballStats 
WHERE teamID = 'ML1' 
AND yearID = 1956 
AND playerName = 'Henry Louis'

26

153

0.16993464052287582

Batch Quick Start
hashtag
Usage Examples

These examples are based on the Batch Quick Start.

homeRuns
baseOnBalls
total

26

37

63

select homeRuns, baseOnBalls, ADD(homeRuns, baseOnBalls) AS total
from baseballStats 
WHERE teamID = 'ML1' 
AND yearID = 1956 
AND playerName = 'Henry Louis'
hashtag
Usage Examples

These examples are based on the Hybrid Quick Start.

DivAirportIDs
sortedIds

13891,12892

12892,13891

14683,14683

14683,14683

12339,12339

12339,12339

13198,10721

10721,13198

10721,12478

10721,12478

hashtag
Signature

EXPR_MIN (projectionCol, measuringCol1, measuringCol2, measuringCol3)

EXPR_MAX (projectionCol, measuringCol1, measuringCol2, measuringCol3)

hashtag
Usage Examples

Find the user with maximum activity. If there are multiple users, break the tie with their last_activity_date. If still a tie, break with user_id. And project user_id.

More useful is that this multiple such aggregation function can be used with GROUP BY

Note:

  1. In cases where multiple rows share the same extreme values in the measuring columns, all such rows will be returned by the function.

  2. If the goal is to project multiple different columns that correspond to the same set of measuring columns, you can achieve this by invoking the function multiple times, each time specifying a different projection column.

  3. This impl does not work with AS clause (e.g. SELECT exprmin(longCol, doubleCol) AS exprmin won't work)

  4. Putting exprmin/exprmax column inside order by clause (e.g. SELECT intCol, exprmin(longCol, doubleCol) FROM table GROUP BY intCol ORDER BY exprmin(longCol, doubleCol)) is not supported as semantically ordering multi-column multi-row exprmin/exprmax results doesn't make sense

  5. Currently projecting MV bytes column doesn't work for now due to an issue

For more detailed examples, see: https://github.com/apache/pinot/pull/10636arrow-up-right

hashtag
Usage Examples
millisecond

0

millisecond

0

select millisecond(1639351800000) AS millisecond
FROM ignoreMe
select millisecond(1639351800000, 'America/St_Johns') AS millisecond
FROM ignoreMe
hashtag
Usage Examples

These examples are based on the Hybrid Quick Start.

value

2

select MINMV(DivLongestGTimes) AS value
from airlineStats 
where arraylength(DivLongestGTimes) > 1
select DivAirportIDs, 
       arrayIndexOfInt(DivAirportIDs, 14683) AS index
from airlineStats 
WHERE arraylength(DivAirportIDs) >= 2
limit 5
select DivAirportIDs, 
       arrayRemoveInt(DivAirportIDs, 12892) AS value
from airlineStats 
WHERE arraylength(DivAirportIDs) >= 2
AND arrayContainsInt(DivAirportIDs, 12892) = 1
limit 5
select ARRAYLENGTH(RandomAirports) AS length, count(*) 
from airlineStats 
GROUP BY length
ORDER BY count(*) DESC
LIMIT 5
{
   "tableConfig":{
      "ingestionConfig":{
         "transformConfigs":[
            {
               "columnName":"data",
               "transformFunction":"JSONFORMAT(meta)"
            }
         ]
      }
   }
}
select FIRSTWITHTIME(group_name, __metadata$recordTimestamp, 'STRING')
from meetupRsvp 
select DISTINCTCOUNTRAWHLLMV(DivAirports, 1) AS value
from airlineStats 
where arraylength(DivAirports) > 1
select DivAirportIDs, 
       arraySortInt(DivAirportIDs) AS sortedIds
from airlineStats 
WHERE arraylength(DivAirportIDs) >= 2
limit 5
pinot.broker.query.rewriter.class.names: DEFAULT_QUERY_REWRITERS_CLASS_NAMES + ",org.apache.pinot.sql.parsers.rewriter.ExprMinMaxRewriter"
pinot.broker.result.rewriter.class.names: "org.apache.pinot.core.query.utils.rewriter.ParentAggregationResultRewriter"
SELECT EXPR_MAX(user_id, activity, last_activity_date, user_id)
FROM userEngagmentTable
SELECT user_region, EXPR_MAX(user_id, activity, last_activity_date, user_id),
    EXPR_MIN(user_id, user_satisfaction)
FROM userEngagmentTable
GROUP BY user_region
JSONPATHDOUBLE(jsonField, 'jsonPath', [defaultValue])
Arguments
Description

jsonField

An Identifier/Expression contains JSON documents.

'jsonPath'

Follows to read values from JSON documents.

circle-exclamation

'jsonPath'` is a literal. Pinot uses single quotes to distinguish them from identifiers. You can use the Jayway JsonPath Evaluator Toolarrow-up-right to test JSON expressions before you import any data.

hashtag
Usage Examples

The usage examples are based on extracting fields from the following JSON document:

Expression
Value

JSONPATHDOUBLE(data, '$.age')

24.0

This function can be used in the table config to extract the age property into the age column, as described below:

ingestion transformation function

JSONPATHLONG

This section contains reference documentation for the JSONPATHLONG function.

Extracts the Long value from jsonField based on 'jsonPath', use optional defaultValuefor null or parsing error. This function can only be used in an ingestion transformation function.

hashtag
Signature

JSONPATHLONG(jsonField, 'jsonPath', [defaultValue])

Arguments
Description
circle-exclamation

'jsonPath'` is a literal. Pinot uses single quotes to distinguish them from identifiers. You can use the to test JSON expressions before you import any data.

hashtag
Usage Examples

The usage examples are based on extracting fields from the following JSON document:

Expression
Value

This function can be used in the to extract the age property into the age column, as described below:

LAG

This section contains reference documentation for the LAG function.

Returns the value from a preceding row in the same result set, based on a specified physical offset. It can be used to compare values in the current row with values in a previous row.

hashtag
Signature

hashtag
Arguments

  • expression: The column or calculation from which the value is to be returned.

  • offset: The number of rows before the current row from which to retrieve the value. The default is 1 if not specified.

  • default: The value to return if the offset goes beyond the scope of the window. If not specified, NULL is returned.

hashtag
Example

This example calculates the difference in sales between the current day and the previous day.

Retrieve the previous payment amount for comparison.

Identify trends by comparing current data with historical data.

Calculate the difference in sales between the current day and the previous day This example shows how to use the LAG function to find the sales difference between consecutive days.

Output:

sales_date
sales_amount
previous_day_sales
difference

Retrieve the previous payment amount for comparison This query retrieves the last payment amount for each payment to see if the amount is increasing or decreasing.

Output:

payment_date
amount
previous_amount

Identify trends by comparing current data with historical data Use the LAG function to compare the current month's data with the same month from the previous year to identify trends or significant changes.

Output:

month
year
data_value
previous_year_data

Use with CTE:

jsonextractscalar

This section contains reference documentation for the JSONEXTRACTSCALAR function.

Evaluates the 'jsonPath' on jsonField, returns the result as the type 'resultsType', use optional defaultValuefor null or parsing error.

hashtag
Signature

JSONEXTRACTSCALAR(jsonField, 'jsonPath', 'resultsType', [defaultValue])

Arguments
Description
circle-exclamation

'jsonPath'and`` ``'results_type'are literals. Pinot uses single quotes to distinguish them from identifiers.

hashtag
Usage Examples

The examples in this section are based on the . In particular we'll be querying the row WHERE id = 7044874109:

repo

The following examples show how to use the JSONEXTRACTSCALAR function:

id
name
id
name

DISTINCTCOUNTTHETASKETCH

This section contains reference documentation for the DISTINCTCOUNTTHETASKETCH function.

The Theta Sketcharrow-up-right framework enables set operations over a stream of data, and can also be used for cardinality estimation. Pinot leverages the Sketch Classarrow-up-right and its extensions from the library org.apache.datasketches:datasketches-java:4.2.0 to perform distinct counting as well as evaluating set operations.

hashtag
Signature

distinctCountThetaSketch(<thetaSketchColumn>, <thetaSketchParams>, predicate1, predicate2..., postAggregationExpressionToEvaluate) -> Long

  • thetaSketchColumn (required): Name of the column to aggregate on.

  • thetaSketchParams (required): Semicolon-separated parameter string for constructing the intermediate theta-sketches.

    • The supported parameters are:

hashtag
Usage Examples

These examples are based on the .

value
value

We can also provide predicates and a post aggregation expression to compute more complicated cardinalities. For example, we could can find the intersection of the following queries:

yearID
yearID

(the yearId 1986 is the only one in common)

By running the following query:

value

DATETRUNC

This section contains reference documentation for the DATETRUNC function.

(Presto) SQL compatible date truncation, equivalent to the Presto function date_trunc.

Converts the value into a specified output granularity seconds since UTC epoch that is bucketed on a unit in a specified timezone.

hashtag
Signature

DATETRUNC(unit, timeValue)

DATETRUNC(unit, timeValue, inputTimeUnitStr)

DATETRUNC(unit, timeValue, inputTimeUnitStr, timeZone)

DATETRUNC(unit, timeValue, inputTimeUnitStr, timeZone, outputTimeUnitStr)

unit supports the following values:

  • millisecond

  • second

  • minute

inputTimeUnitStr and outputTimeUnitStr support the following values:

  • NANOSECONDS

  • MICROSECONDS

  • MILLISECONDS

hashtag
Usage Examples

Truncates an epoch in milliseconds at WEEK (where a Week starts at Monday UTC midnight):

or

ts

Truncates an epoch in milliseconds at WEEK (where a Week starts at Monday UTC midnight) in the UTC time zone, returning a result in epoch in seconds in UTC timezone:

ts

Truncates an epoch in milliseconds at WEEK (where a Week starts at Monday UTC midnight) in the CET time zone, returning a result in epoch in seconds in UTC timezone:

ts

Truncates an epoch in milliseconds at QUARTER in the Los Angeles time zone (where a Quarter begins on Jan 1st, April 1st, July 1st, October 1st in Los Angeles timezone), returning a result in hours since UTC epoch:

ts

DISTINCTCOUNTRAWTHETASKETCH

This section contains reference documentation for the DISTINCTCOUNTRAWTHETASKETCH function.

The Theta Sketcharrow-up-right framework enables set operations over a stream of data, and can also be used for cardinality estimation. Pinot leverages the Sketch Classarrow-up-right and its extensions from the library org.apache.datasketches:datasketches-java:4.2.0 to perform distinct counting as well as evaluating set operations.

hashtag
Signature

distinctCountRawThetaSketch(<thetaSketchColumn>, <thetaSketchParams>, predicate1, predicate2..., postAggregationExpressionToEvaluate) -> HexEncoded

  • thetaSketchColumn (required): Name of the column to aggregate on.

  • thetaSketchParams (required): Semicolon-separated parameter string for constructing the intermediate theta-sketches.

    • The supported parameters are:

hashtag
Usage Examples

These examples are based on the .

value
value

We can also provide predicates and a post aggregation expression to compute more complicated cardinalities:

value

JSONPATHSTRING

This section contains reference documentation for the JSONPATHSTRING function.

Extracts the String value from jsonField based on 'jsonPath', use optional defaultValuefor null or parsing error. This function can only be used in an .

hashtag
Signature

JSONPATHARRAY

This section contains reference documentation for the JSONPATHARRAY function.

Extracts an array from jsonField based on 'jsonPath', the result type is inferred based on JSON value. This function can only be used in an .

hashtag
Signature

JSONPATHARRAY(jsonField, 'jsonPath')

{
  "data": {
    "name": "Pete",
    "age": 24,
    "subjects": [
      {
        "name": "maths",
        "homework_grades": [80, 85, 90, 95, 100],
        "grade": "A",
        "score": 90
      },
      {
        "name": "english",
        "homework_grades": [60, 65, 70, 85, 90],
        "grade": "B",
        "score": 70
      }
    ]
  }
}
{
   "tableConfig":{
      "ingestionConfig":{
         "transformConfigs":[
            {
               "columnName":"age",
               "transformFunction":"JSONPATHDOUBLE(data, '$.age')"
            }
         ]
      }
   }
}
LAG(any expression [, bigint offset [, any default]])
JsonPath Syntaxarrow-up-right

jsonField

An Identifier/Expression contains JSON documents.

'jsonPath'

Follows JsonPath Syntaxarrow-up-right to read values from JSON documents.

JSONPATHLONG(data, '$.age')

24

Jayway JsonPath Evaluator Toolarrow-up-right
table config

jsonField

An Identifier/Expression contains JSON documents.

'jsonPath'

Follows JsonPath Syntaxarrow-up-right to read values from JSON documents.

'results_type'

One of the Pinot supported data types:INT, LONG, FLOAT, DOUBLE, BOOLEAN, TIMESTAMP, STRING,

INT_ARRAY, LONG_ARRAY, FLOAT_ARRAY, DOUBLE_ARRAY, STRING_ARRAY.

{"id":115911530,"name":"LimeVista/Tapes","url":"https://api.github.com/repos/LimeVista/Tapes"}

7044874109

LimeVista/Tapes

7044874109

dummyValue

Batch JSON Quick Start

nominalEntries: The nominal entries used to create the sketch. (Default 4096)

  • samplingProbability: Sets the upfront uniform sampling probability, p. (Default 1.0)

  • accumulatorThreshold: How many sketches should be kept in memory before merging. (Default 2)

  • predicates (optional)_: _ These are individual predicates of form lhs <op> rhs which are applied on rows selected by the where clause. During intermediate sketch aggregation, sketches from the thetaSketchColumn that satisfies these predicates are unionized individually. For example, all filtered rows that match country=USA are unionized into a single sketch. Complex predicates that are created by combining (AND/OR) of individual predicates is supported.

  • postAggregationExpressionToEvaluate (required): The set operation to perform on the individual intermediate sketches for each of the predicates. Currently supported operations are SET_DIFF, SET_UNION, SET_INTERSECT , where DIFF requires two arguments and the UNION/INTERSECT allow more than two arguments.

  • 149

    146

    1986

    1985

    1937

    2003

    1979

    1900

    1986

    1978

    2012

    1

    Batch Quick Start
    hour
  • day

  • week

  • month

  • quarter

  • year

  • SECONDS
  • MINUTES

  • HOURS

  • DAYS

  • 1639353600000

    1639353600

    1639350000

    453631

    nominalEntries: The nominal entries used to create the sketch. (Default 4096)

  • samplingProbability: Sets the upfront uniform sampling probability, p. (Default 1.0)

  • accumulatorThreshold: How many sketches should be kept in memory before merging. (Default 2)

  • Currently, the only supported parameter is nominalEntries (defaults to 4096).

  • predicates (optional)_: _ These are individual predicates of form lhs <op> rhs which are applied on rows selected by the where clause. During intermediate sketch aggregation, sketches from the thetaSketchColumn that satisfies these predicates are unionized individually. For example, all filtered rows that match country=USA are unionized into a single sketch. Complex predicates that are created by combining (AND/OR) of individual predicates is supported.

  • postAggregationExpressionToEvaluate (required): The set operation to perform on the individual intermediate sketches for each of the predicates. Currently supported operations are SET_DIFF, SET_UNION, SET_INTERSECT , where DIFF requires two arguments and the UNION/INTERSECT allow more than two arguments.

  • AgMDAAAKzJOVAAAAAACAPwDAATj...

    AwMDAAAKzJMQAAAAAACAP4vpfPBbbQsO5N1zYV2c...

    AQMDAAA6zJN8QPYIsvHMNQ==

    Batch Quick Start
    {
      "data": {
        "name": "Pete",
        "age": 24,
        "subjects": [
          {
            "name": "maths",
            "homework_grades": [80, 85, 90, 95, 100],
            "grade": "A",
            "score": 90
          },
          {
            "name": "english",
            "homework_grades": [60, 65, 70, 85, 90],
            "grade": "B",
            "score": 70
          }
        ]
      }
    }
    {
       "tableConfig":{
          "ingestionConfig":{
             "transformConfigs":[
                {
                   "columnName":"age",
                   "transformFunction":"JSONPATHLONG(data, '$.age')"
                }
             ]
          }
       }
    }
    select repo
    from githubEvents 
    WHERE id = 7044874109
    select id, jsonextractscalar(repo, '$.name', 'STRING') AS name
    from githubEvents 
    WHERE id = 7044874109
    select id, jsonextractscalar(repo, '$.foo', 'STRING') AS name
    from githubEvents 
    WHERE id = 7044874109
    [
      {
        "message": "QueryExecutionError:\njava.lang.RuntimeException: Illegal Json Path: [$.foo], when reading [{\"id\":115911530,\"name\":\"LimeVista/Tapes\",\"url\":\"https://api.github.com/repos/LimeVista/Tapes\"}]\n\tat org.apache.pinot.core.operator.transform.function.JsonExtractScalarTransformFunction.transformToStringValuesSV(JsonExtractScalarTransformFunction.java:254)\n\tat org.apache.pinot.core.operator.docvalsets.TransformBlockValSet.getStringValuesSV(TransformBlockValSet.java:90)\n\tat org.apache.pinot.core.common.RowBasedBlockValueFetcher.createFetcher(RowBasedBlockValueFetcher.java:64)\n\tat org.apache.pinot.core.common.RowBasedBlockValueFetcher.<init>(RowBasedBlockValueFetcher.java:32)",
        "errorCode": 200
      }
    ]
    select id, jsonextractscalar(repo, '$.foo', 'STRING', 'dummyValue') AS name
    from githubEvents 
    WHERE id = 7044874109
    select distinctCountThetaSketch(teamID) AS value
    from baseballStats 
    select distinctCountThetaSketch(teamID, 'nominalEntries=10') AS value
    from baseballStats
    select yearID
    from baseballStats
    where teamID = 'SFN' AND numberOfGames = 28 AND homeRuns = 1
    select yearID
    from baseballStats
    where teamID = 'CHN' AND numberOfGames = 28 AND homeRuns = 1
    select distinctCountThetaSketch(
      yearID, 
      'nominalEntries=4096', 
      'teamID = ''SFN'' AND numberOfGames=28 AND homeRuns=1',
      'teamID = ''CHN'' AND numberOfGames=28 AND homeRuns=1',
      'SET_INTERSECT($1, $2)'
    ) AS value
    from baseballStats 
    select dateTrunc('week', 1639480981746) AS ts
    FROM ignoreMe
    select dateTrunc('week', 1639480981746, 'MILLISECONDS') AS ts
    FROM ignoreMe
    select dateTrunc(
      'week', 
      1639480981746, 
      'MILLISECONDS', 
      'UTC', 
      'SECONDS'
    ) AS ts
    FROM ignoreMe
    select dateTrunc(
      'week', 
      1639480981746, 
      'MILLISECONDS', 
      'CET', 
      'SECONDS'
    ) AS ts
    FROM ignoreMe
    select dateTrunc(
      'quarter', 
      1639480981746, 
      'MILLISECONDS', 
      'America/Los_Angeles', 
      'HOURS'
    ) AS ts
    FROM ignoreMe
    select distinctCountRawThetaSketch(teamID) AS value
    from baseballStats 
    select distinctCountRawThetaSketch(teamID, 'nominalEntries=10') AS value
    from baseballStats
    select distinctCountRawThetaSketch(
      yearID, 
      'nominalEntries=4096', 
      'teamID = ''SFN'' AND numberOfGames=28 AND homeRuns=1',
      'teamID = ''CHN'' AND numberOfGames=28 AND homeRuns=1',
      'SET_INTERSECT($1, $2)'
    ) AS value
    from baseballStats 
    JSONPATHSTRING(jsonField, 'jsonPath', [defaultValue])
    Arguments
    Description

    jsonField

    An Identifier/Expression contains JSON documents.

    'jsonPath'

    Follows to read values from JSON documents.

    circle-exclamation

    'jsonPath'` is a literal. Pinot uses single quotes to distinguish them from identifiers. You can use the JSONPath Online Evaluatorarrow-up-right to test JSON expressions before you import any data.

    hashtag
    Usage Examples

    The usage examples are based on extracting fields from the following JSON document:

    Expression
    Value

    JSONPATHSTRING(data, '$.age')

    "24"

    JSONPATHSTRING(data, '$.name["nick.name"]')

    "Pete"

    This function can be used in the table config to extract the age property into the age column, as described below:

    ingestion transformation function
    Arguments
    Description

    jsonField

    An Identifier/Expression contains JSON documents.

    'jsonPath'

    Follows to read values from JSON documents.

    circle-exclamation

    'jsonPath'` is a literal. Pinot uses single quotes to distinguish them from identifiers. You can use the Jayway JsonPath Evaluator Toolarrow-up-right to test JSON expressions before you import any data.

    hashtag
    Usage Examples

    The usage examples are based on extracting fields from the following JSON document:

    Expression
    Value

    JSONPATHARRAY(myJsonRecord, '$.subjects[*].name')

    ["maths", "english"]

    JSONPATHARRAY(myJsonRecord, '$.subjects[*].score')

    [90, 70]

    JSONPATHARRAY(myJsonRecord, '$.subjects[*].homework_grades[1]')

    [85, 65]

    This function can be used in the table config to extract the name, score, and second value of homework_grades into their respective columns , as described below:

    ingestion transformation function

    2023-02-14

    200

    NULL

    NULL

    2023-02-15

    180

    200

    -20

    2023-02-16

    220

    180

    2023-02-14 21:21:59.996577

    2.99

    NULL

    2023-02-14 21:23:39.996577

    4.99

    2.99

    2023-02-14 21:29:00.996577

    4.99

    4.99

    1

    2023

    150

    NULL

    1

    2024

    170

    150

    CTE based LAG query

    40

    {
      "data": {
        "name": {"full.name": "Peter", "nick.name": "Pete"},
        "age": 24,
        "subjects": [
          {
            "name": "maths",
            "homework_grades": [80, 85, 90, 95, 100],
            "grade": "A",
            "score": 90
          },
          {
            "name": "english",
            "homework_grades": [60, 65, 70, 85, 90],
            "grade": "B",
            "score": 70
          }
        ]
      }
    }
    {
       "tableConfig":{
          "ingestionConfig":{
             "transformConfigs":[
                {
                   "columnName":"age",
                   "transformFunction":"JSONPATHSTRING(data, '$.age')"
                },
                {
                   "columnName":"nickName",
                   "transformFunction":"JSONPATHSTRING(data, '$.name[\"nick.name\"]')"
                }
             ]
          }
       }
    }
    {
      "data": {
        "name": "Pete",
        "age": 24,
        "subjects": [
          {
            "name": "maths",
            "homework_grades": [80, 85, 90, 95, 100],
            "grade": "A",
            "score": 90
          },
          {
            "name": "english",
            "homework_grades": [60, 65, 70, 85, 90],
            "grade": "B",
            "score": 70
          }
        ]
      }
    }
    {
       "tableConfig":{
          "ingestionConfig":{
             "transformConfigs":[
                {
                   "columnName":"names",
                   "transformFunction":"JSONPATHARRAY(data, '$.subjects[*].name')"
                },
                {
                   "columnName":"ages",
                   "transformFunction":"JSONPATHARRAY(data, '$.subjects[*].score')"
                },
                {
                   "columnName":"homeworkGrades",
                   "transformFunction":"JSONPATHARRAY(data, '$.subjects[*].homework_grades[1]')"
                }
             ]
          }
       }
    }
    SELECT
        sales_date,
        sales_amount,
        LAG(sales_amount, 1) OVER (ORDER BY sales_date) AS previous_day_sales,
        sales_amount - LAG(sales_amount, 1) OVER (ORDER BY sales_date) AS difference
    FROM
        daily_sales;
    SELECT
        payment_date,
        amount,
        LAG(amount, 1) OVER (ORDER BY payment_date) AS previous_amount
    FROM
        payment;
    SELECT
        month,
        year,
        data_value,
        LAG(data_value, 12) OVER (ORDER BY year, month) AS previous_year_data
    FROM
        monthly_data;
    WITH tmp AS (
      select count(*) as num_trips,
        DaysSinceEpoch
      from airlineStats
      GROUP BY DaysSinceEpoch
    )
    
    SELECT DaysSinceEpoch,
      num_trips,
      LAG(num_trips, 2) OVER (
        ORDER BY DaysSinceEpoch
      ) AS previous_num_trips,
      num_trips - LAG(num_trips, 2) OVER (
        ORDER BY DaysSinceEpoch
      ) AS difference
    FROM tmp;
    JsonPath Syntaxarrow-up-right
    JsonPath Syntaxarrow-up-right

    DISTINCTSUMMV

    This section contains reference documentation for the DISTINCTSUMMV function.

    Returns the sum of the distinct row values in a group

    hashtag
    Signature

    DISTINCTSUMMV(colName)

    hashtag
    Usage Examples

    These examples are based on the .

    VALUE

    arraySliceString

    This section contains reference documentation for the arraySliceString function.

    Returns the values in the array between the start and end positions.

    hashtag
    Signature

    arraySliceString('colName', start, end)

    hashtag
    Usage Examples

    These examples are based on the .

    FlightNum
    airports
    RandomAirports

    DATETIMECONVERT

    This section contains reference documentation for the DATETIMECONVERT function.

    Converts the value from a column that contains an epoch timestamp into another time unit and buckets based on the given time granularity.

    hashtag
    Signature

    DATETIMECONVERT(columnName, inputFormat, outputFormat, outputGranularity)

    inputFormat

    arrayContainsString

    This section contains reference documentation for the arrayContainsString function.

    Checks if string value exists in array.

    hashtag
    Signature

    arrayContainsString('colName', valueToFind)

    jsonextractkey

    This section contains reference documentation for the JSONEXTRACTKEY function.

    Extracts all matched JSON field keys based on 'jsonPath' into a STRING_ARRAY.

    hashtag
    Signature

    JSONEXTRACTKEY(jsonField, 'jsonPath')

    Arguments
    SELECT DISTINCTSUMMV(DivLongestGTimes) AS VALUE
    FROM airlineStats
    WHERE arraylength(DivLongestGTimes) > 1

    1134

    Hybrid Quick Startarrow-up-right

    SEA,PSC

    SEA,PSC,PHX,MSY

    3162

    SEA,PSC

    SEA,PSC,PHX,MSY

    671

    SEA,PSC

    SEA,PSC,PHX,MSY

    1767

    SEA,PSC

    SEA,PSC,PHX

    2522

    SEA,PSC

    SEA,PSC

    Hybrid Quick Start

    424

    hashtag
    Usage Examples

    These examples are based on the Hybrid Quick Start.

    DivTailNums
    index

    N7713A,N7713A

    true

    N344AA,N344AA

    false

    N7713A,N7713A

    true

    select DivTailNums, 
           arrayContainsString(DivTailNums, 'N7713A') AS index
    from airlineStats 
    WHERE arraylength(DivTailNums) >= 2
    limit 5
    select FlightNum, 
           arraySliceString(RandomAirports, 0, 2) AS airports, 
           RandomAirports
    from airlineStats 
    WHERE arraylength(RandomAirports) BETWEEN 2 AND 4
    limit 5
    and
    outputFormat
    are defined using the following structure:

    <time size>:<time unit>:<time format>:<pattern>

    where:

    • time size - size of the time unit eg: 1, 10

    • time unit - DAYS, HOURS, MINUTES, SECONDS, MILLISECONDS, MICROSECONDS, NANOSECONDS

    • time format

      • EPOCH

      • SIMPLE_DATE_FORMAT pattern - defined in case of

    granularity is specified in the format <time size>:<time unit>.

    hashtag
    Usage Examples

    These examples are based on the Batch JSON Quick Start.

    created_at_timestamp from milliseconds since epoch to days since epoch, bucketed to 1 day granularity:

    id
    created_at_timestamp
    timeInMs
    convertedTime

    7044874134

    2018-01-01 11:00:00.0

    1514804402000

    17532

    created_at_timestamp bucketed to 15 minutes granularity:

    id
    created_at_timestamp
    timeInMs
    convertedTime

    7044874134

    2018-01-01 11:00:00.0

    1514804402000

    1514804400000

    created_at_timestamp to format yyyy-MM-dd, bucketed to 1 days granularity:

    id
    created_at_timestamp
    timeInMs
    convertedTime

    7044874134

    2018-01-01 11:00:00.0

    1514804402000

    2018-01-01

    created_at_timestamp to format yyyy-MM-dd HH:mm, in timezone Pacific/Kiritimati:

    id
    created_at_timestamp
    timeInMs
    convertedTime

    7044874134

    2018-01-01 11:00:00.0

    1514804402000

    2018-01-02 01:00

    created_at_timestamp to format yyyy-MM-dd, in timezone Pacific/Kiritimati and bucketed to 1 day granularity:

    id
    created_at_timestamp
    timeInMs
    convertedTime

    7044874134

    2018-01-01 11:00:00.0

    1514804402000

    2018-01-02 00:00

    Description

    jsonField

    An Identifier/Expression contains JSON documents.

    'jsonPath'

    Follows to read values from JSON documents.

    circle-exclamation

    'jsonPath'` is a literal. Pinot uses single quotes to distinguish them from identifiers.

    hashtag
    Usage Examples

    The examples in this section are based on the Batch JSON Quick Start. In particular we'll be querying the row WHERE id = 7044874109.

    id
    repo
    keys

    7044874109

    {"id":115911530,"name":"LimeVista/Tapes","url":"https://api.github.com/repos/LimeVista/Tapes"}

    $['id'],$['name'],$['url']

    FrequentStringsSketch

    This section contains reference documentation for the FREQUENTSTRINGSSKETCH function.

    FREQUENTSTRINGSSKETCH is an estimation data-sketch function which can be used to estimate the frequency of an item. It is based on Apache Datasketches libraryarrow-up-right and returns a serialized sketch object which can be merged with other sketches.

    hashtag
    Signature

    FREQUENTSTRINGSSKETCH(column, maxMapSize=256) -> Base64 encoded sketch object

    • column (required): Name of the column to aggregate on. Needs to be a type which can be cast into 'STRING'.

    • maxMapSize: This value specifies the maximum physical length of the internal hash map. The maxMapSize must be a power of 2 and the default value is 256.

    hashtag
    Usage Example

    frequentstringssketch(AirlineID)

    Which can be used, for example in Java as:

    For more examples on the sketch API, refer to the Datasketches .

    select id, 
           created_at_timestamp, 
           cast(created_at_timestamp AS long) AS timeInMs,
           DATETIMECONVERT(
             created_at_timestamp, 
             '1:MILLISECONDS:EPOCH', 
             '1:DAYS:EPOCH', 
             '1:DAYS'
           ) AS convertedTime
    from githubEvents
    WHERE id = 7044874134
    select id, 
           created_at_timestamp, 
           cast(created_at_timestamp AS long) AS timeInMs,
           DATETIMECONVERT(
             created_at_timestamp, 
             '1:MILLISECONDS:EPOCH', 
             '1:MILLISECONDS:EPOCH', 
             '15:MINUTES'
           ) AS convertedTime
    from githubEvents
    WHERE id = 7044874134
    select id, 
           created_at_timestamp, 
           cast(created_at_timestamp AS long) AS timeInMs,
           DATETIMECONVERT(
             created_at_timestamp, 
             '1:MILLISECONDS:EPOCH', 
             '1:DAYS:SIMPLE_DATE_FORMAT:yyyy-MM-dd', 
             '1:DAYS'
           ) AS convertedTime
    from githubEvents
    WHERE id = 7044874134
    select id, 
           created_at_timestamp, 
           cast(created_at_timestamp AS long) AS timeInMs,
           DATETIMECONVERT(
             created_at_timestamp, 
             '1:MILLISECONDS:EPOCH', 
             '1:MILLISECONDS:SIMPLE_DATE_FORMAT:yyyy-MM-dd HH:mm tz(Pacific/Kiritimati)', 
             '1:MILLISECONDS'
           ) AS convertedTime
    from githubEvents
    WHERE id = 7044874134
    select id, 
           created_at_timestamp, 
           cast(created_at_timestamp AS long) AS timeInMs,
           DATETIMECONVERT(
             created_at_timestamp, 
             '1:MILLISECONDS:EPOCH', 
             '1:MILLISECONDS:SIMPLE_DATE_FORMAT:yyyy-MM-dd HH:mm tz(Pacific/Kiritimati)', 
             '1:DAYS'
           ) AS convertedTime
    from githubEvents
    WHERE id = 7044874134
    select id, repo, JSONEXTRACTKEY(repo, '$.*') AS keys
    from githubEvents 
    WHERE id = 7044874109
    SIMPLE_DATE_FORMAT
    e.g.
    yyyy-MM-dd
    . A specific timezone can be passed using
    tz(timezone)
    . Timezone can be long or short string format timezone. e.g.
    Asia/Kolkata
    or
    PDT
    JsonPath Syntaxarrow-up-right

    BAEKCAUAAAAOAA...

    documentationarrow-up-right
    select FREQUENTSTRINGSSKETCH(AirlineID, 16) from airlineStats
    byte[] byteArr = Base64.getDecoder().decode(encodedSketch);
    ItemsSketch<String> sketch = ItemsSketch.getInstance(Memory.wrap(byteArr), new ArrayOfStringsSerDe());
    
    ItemsSketch.Row[] items = sketch.getFrequentItems(ErrorType.NO_FALSE_NEGATIVES);
    for (int i = 0; i < items.length; i++) {
      ItemsSketch.Row item = items[i];
      System.out.printf("Airline: %s, Frequency: %d %n", item.getItem(), item.getEstimate());
    }

    isSubnetOf

    This section contains reference documentation for the isSubnetOf function.

    Takes 2 arguments of type STRING. The first argument is an ipPrefix, and the second argument is a single ipAddress. This function handles both IPv4 and IPv6 arguments.

    Returns a boolean value checking if ipAddress is in the subnet of ipPrefix

    hashtag
    Signatures

    isSubnetOf(ipPrefix, ipAddress) -> boolean

    hashtag
    Usage Examples

    See the following sample queries where isSubnetOf is used in different parts of the query.

    min

    This section contains reference documentation for the min function.

    Get the minimum value in a group

    hashtag
    Signature

    MIN(colName)

    hashtag
    Usage Examples

    These examples are based on the .

    value

    CHR

    This section contains reference documentation for the CHR function.

    the character corresponding to the Unicode codepoint

    hashtag
    Signature

    CHR(codepoint)

    hashtag
    Usage Examples

    value

    FunnelMatchStep

    The FunnelMatchStep function in Pinot is designed to track user progress through a predefined series of steps or stages in a funnel, such as user interactions on a website from page views to purchases. This function is particularly useful for analyzing how far users progress through a conversion process within a specified time window.

    hashtag
    Syntax

    arrayReverseString

    This section contains reference documentation for the arrayReverseString function.

    Reverses array of strings.

    hashtag
    Signature

    arrayReverseString('colName')

    FromDateTime

    This section contains reference documentation for the FromDateTime function.

    Converts a formatted date-time string to milliseconds, based on the provided .

    hashtag
    Signature

    FromDateTime(dateTimeString, pattern)

    LASTWITHTIME

    This section contains reference documentation for the lastwithtime function.

    Returns the value of dataColumn with the largest timeColumn value where:

    • timeColumn is used to define the time of dataColumn, which can be of type TIMESTAMP, INT, LONG

    arrayConcatDouble

    This section contains reference documentation for the arrayConcatDouble function.

    Concatenates two arrays of doubles.

    hashtag
    Signature

    arrayConcatDouble('colName1', 'colName2')

    DISTINCTAVG

    This section contains reference documentation for the DISTINCTAVG function.

    Returns the average of distinct row values in a group

    hashtag
    Signature

    DISTINCTAVG(colName) or avg(distinct col)

    SELECT isSubnetOf('192.168.0.1/24', '192.168.0.225') 
    AS result
    FROM myTable;
    ---> returns true
    
    SELECT isSubnetOf('1.2.3.128/26', '1.2.5.1') 
    AS result
    FROM myTable;
    ---> returns false
    SELECT isSubnetOf('2001:4800:7825:103::/64', '2001:4800:7825:103::2050')
    AS result
    FROM myTable;
    ---> returns true
    
    SELECT isSubnetOf('7890:db8:113::8a2e:370:7334/127', '7890:db8:113::8a2e:370:7336') 
    AS result
    FROM myTable;
    ---> returns false
    select min(yearID) AS value
    from baseballStats 

    1871

    Batch Quick Start
    SELECT CHR(65) AS value
    FROM ignoreMe

    A

    hashtag
    Usage Examples
    epochMillis

    1565136000000

    epochMillis

    1565190733000

    epochMillis

    1565190733000

    epochMillis

    1565190733000

    Joda-Time patternarrow-up-right
  • dataType specifies the type for dataColumn, which can be BOOLEAN, INT, LONG, FLOAT, DOUBLE, STRING

  • hashtag
    Signature

    LASTWITHTIME(dataColumn, timeColumn, 'dataType')

    hashtag
    Example

    This example is based on the Streaming Quick Start.

    value

    group_name809822304

    hashtag
    Usage Examples

    This example assumes the multiValueTable columns mvCol1 and mvCol2 are both of type DOUBLE with singleValueField in the table schema set to false.

    select mvCol1, 
           arrayConcatDouble(mvCol1, mvCol2) AS concatDoubles
    from multiValueTable
    WHERE arraylength(mvCol1) >= 2
    limit 5
    hashtag
    Usage Examples

    These examples are based on the Batch Quick Start.

    VALUE

    83.36526946107784

    VALUE

    349.1158798283262

    SELECT DISTINCTAVG(runs) AS VALUE
    FROM baseballStats
    SELECT count(*) 
    FROM myTable 
    WHERE isSubnetOf('192.168.0.1/24', ipAddressCol);
    
    SELECT count(*) 
    FROM myTable 
    WHERE isSubnetOf('192.168.0.1/24', ipAddressCol) 
    OR isSubnetOf(ipPrefixCol, '7890:db8:113::8a2e:370:7336');
    SELECT 
        CASE 
            WHEN isSubnetOf('105.25.245.115/27', srcIPAddress) THEN 'case1' 
            WHEN isSubnetOf('105.25.245.115/27', dstIPAddress) THEN 'case2'
            ELSE 'case3' 
        END AS differentFlow
    FROM myTable;
    SELECT FromDateTime('2019-08-07', 'yyyy-MM-dd') AS epochMillis
    FROM ignoreMe
    SELECT FromDateTime(
        '2019-08-07 3:12:13 PM', 
        'yyyy-MM-dd hh:mm:ss a'
        ) AS epochMillis
    FROM ignoreMe
    SELECT FromDateTime(
        '2019-08-07T15:12:13', 
        'yyyy-MM-dd''T''HH:mm:ss'
        ) AS epochMillis
    FROM ignoreMe
    SELECT FromDateTime(
        '2019-08-07T07:12:13-0800', 
        'yyyy-MM-dd''T''HH:mm:ssZ'
        ) AS epochMillis
    FROM ignoreMe
    select LASTWITHTIME(group_name, __metadata$recordTimestamp, 'STRING')
    from meetupRsvp 
    SELECT AVG(DISTINCT AtBatting) AS VALUE
    FROM baseballStats
    hashtag
    Return

    This function is similar to the function FunnelMaxStep, instead of returning the number of max step, it returns an array of the size 'number of steps', and marked the matched steps as 1, non-matching as 0.

    E.g.

    hashtag
    Arguments

    1. timestampExpression:

      • Type: Expression in TIMESTAMP or LONG

      • Description: This is an expression that evaluates to the timestamp of each event. It's used to determine the order of events for a particular user or session. The timestamp is crucial for evaluating whether subsequent actions fall within the specified window.

    2. windowSize:

      • Type: LONG

      • Description: Specifies the size of the time window in which the sequence of funnel steps must occur. The window is defined in milliseconds. This parameter sets the maximum allowed time between the first and the last step in the funnel for them to be considered as part of the same user journey.

    3. numberSteps:

      • Type: Integer

      • Description: Defines the total number of distinct steps in the funnel. This count should match the number of stepExpression

    4. stepExpression:

      • Type: Boolean Expression

      • Description: These are expressions that define each step in the funnel. Typically, these are conditions that evaluate whether a specific event type or action has occurred. Multiple step expressions are separated by commas, with each expression corresponding to a step in the funnel sequence.

    5. mode (optional):

      • Type: String

      • Description: Defines additional modes or options that alter how the funnel analysis is calculated. Common modes might include settings to handle overlapping events, reset the window upon each step, or other custom behaviors specific to the needs of the funnel analysis. If unspecified, the default behavior as defined by Pinot is used.

    hashtag
    Optional Mode Supported

    hashtag
    STRICT_DEDUPLICATION

    The STRICT_DEDUPLICATION mode ensures that repeating occurrences of the same event condition within a funnel sequence disrupt further processing of the funnel for that user session. This mode is crucial when it's important to identify and measure unique, non-repeated actions in a sequence, ensuring each step of the funnel represents a distinct action.

    hashtag
    Practical Impact

    • Event Sequence Interruption: When an event that satisfies a current step condition occurs repeatedly without progression to the next step, strict_deduplication interrupts and essentially ends the analysis of the funnel for that sequence. This prevents the funnel from incorrectly advancing if the same action is merely repeated instead of moving through the intended steps.

    • Enhanced Accuracy in Funnel Progression: This mode is useful for scenarios where the continuity and progression of distinct steps are critical for accurate conversion analysis. It avoids the misinterpretation of user engagement where repeated similar actions might otherwise suggest a false progression through the funnel.

    hashtag
    Example

    For instance, if a funnel is designed to track user progression from a homepage visit, to a search, to adding an item to a cart, and then to checkout, the strict_deduplication mode would stop processing the funnel sequence if the user performs multiple searches without proceeding to add an item to the cart. This ensures that only a linear, non-repetitive progression through these steps is considered as valid funnel movement.

    This mode helps maintain the integrity of each step in the user's journey, ensuring that the data reflects true user behavior without overcounting repetitive actions that do not lead to actual progression.

    hashtag
    STRICT_ORDER

    The strict_order mode enforces a stringent sequence order for events within a funnel. This mode ensures that the progression through the steps follows the exact specified order without any intervening events that are not part of the defined sequence.

    hashtag
    Behavior of strict_order

    • Sequence Adherence: The strict_order mode requires that the events occur in the exact order specified without any other types of events intervening. If an event occurs that is not the next expected step in the defined sequence, the analysis of the funnel for that user session is halted.

    • Early Termination: In the presence of an out-of-sequence event, the analysis stops, and the maximum event level is determined as the last correct step in the sequence before the interruption. For instance, in a specified sequence of A -> B -> C, if the sequence is A -> B -> D, then the funnel analysis terminates after B because D is not the expected next step (C).

    hashtag
    Practical Impact

    • Enhanced Precision in Path Analysis: This mode is particularly valuable when the precise order of actions is critical for the analysis, such as in strict process flows where each step must be followed in a specific order to be considered successful.

    • Avoids Misinterpretation: It prevents the misinterpretation of funnel progress where intervening or unordered events could suggest a misleading path through the funnel.

    hashtag
    Example

    Consider a scenario where a funnel is set up to track user progression through the following steps: logging in (A), searching for products (B), adding a product to the cart (C), and completing a purchase (D). Using the strict_order mode, if the sequence goes A -> B -> E -> C, the analysis will terminate after B because E (an unexpected event like viewing account details) intervenes before C, the expected next step. Therefore, the maximum step reached is reported as 2, representing the successful completion of steps A and B only.

    This mode is crucial for scenarios requiring strict compliance to process steps, ensuring that only users who follow the exact intended sequence are counted in the funnel analysis.

    hashtag
    STRICT_INCREASE

    The strict_increase is designed to ensure that the sequence of events being analyzed has strictly increasing timestamps. This mode is crucial for accurately tracking and analyzing user behavior in scenarios where the chronological order of events directly impacts the interpretation of user actions within a funnel.

    hashtag
    Behavior of strict_increase

    • Timestamp Order: This mode requires that each subsequent event in the funnel must have a timestamp greater than the previous event. It ensures that the user's actions are not only in the correct sequence but also follow a temporal progression without any backtracking or simultaneous actions.

    • Analysis Integrity: If any event in the sequence does not adhere to the strictly increasing order by timestamp, the analysis for that sequence either stops at that point or ignores the out-of-order event, depending on how critical the temporal sequence is to the funnel's logic.

    hashtag
    Practical Impact

    • Temporal Validation: This mode is particularly useful in scenarios where the timing of events is crucial, such as in sessions where actions must follow one another in real-time to be considered valid. It validates the sequence not just by the type of event, but also by ensuring that these events are progressively happening over time.

    • Avoiding Data Errors: It helps in avoiding potential data errors or anomalies where timestamps might not have been recorded correctly, or events may appear out of order due to system errors or delays in logging events.

    hashtag
    Example

    Consider a funnel designed to analyze a user's journey from visiting a website to making a purchase, defined by the following steps: page visit (A), item addition (B), checkout initiation (C), and payment completion (D). Using the strict_increase mode, the funnel will only consider sequences where each action occurs later than the previous. If a user's sequence is A (t1) -> B (t2) -> A (t3) -> C (t4) with t3 being less than or equal to t2, then the analysis will ignore the second occurrence of A or terminate, depending on the specific implementation and requirements of the analysis.

    This mode helps ensure that the funnel analysis reflects true, linear progress through the intended actions, with each step occurring in a timely, sequential manner.

    hashtag
    KEEP_ALL

    The KEEP_ALL mode is designed to ensure that all events in the data set are considered in the analysis, even if they do not match any of the specified step conditions in the funnel sequence. This mode is particularly useful for comprehensive data analysis where the context of non-matching events may still provide valuable insights about user behavior or system performance.

    hashtag
    Behavior of KEEP_ALL

    • Inclusive Analysis: In the KEEP_ALL mode, the funnel function includes every event within the specified time window in the analysis, regardless of whether these events correspond to the predefined steps in the funnel. This allows for a more holistic view of the user's actions during the session.

    • Context Retention: By including all events, this mode helps retain the full context of a user's session, capturing activities that may not be directly related to the funnel but could influence or explain the user's behavior and decisions at other points.

    hashtag
    Practical Impact

    • Enhanced Insight: This mode is invaluable for scenarios where understanding the entirety of user interactions is crucial, such as in complex user journeys where additional actions between the main funnel steps might influence the outcomes or indicate other patterns of interest.

    • Data Completeness: It prevents data loss from filtering out non-matching events, which can be important when analyzing sessions for comprehensive patterns, troubleshooting issues, or performing detailed user journey analysis.

    hashtag
    Example

    Consider a scenario where a funnel is set up to track user progress through steps like logging in, searching for a product, and making a purchase. With KEEP_ALL mode enabled, if a user performs additional actions such as updating profile information or viewing terms and conditions, these events are also included in the analysis. This comprehensive inclusion allows analysts to see a fuller picture of what the user did during their session, not just the actions that directly relate to the funnel. This can reveal if other activities are detracting from the main conversion goals, or if they are part of a broader user engagement that doesn't neatly fit into the primary funnel steps.

    This mode helps to ensure that no potential insights are lost by excluding events, making it a powerful option for detailed analysis and understanding of user interactions beyond the strict confines of the predefined funnel steps.

    hashtag
    Examples

    hashtag
    Data Set

    event_name
    ts
    user_id

    screen_viewed

    1718112402

    1

    screen_clicked

    1718112403

    1

    purchased

    1718112404

    1

    screen_viewed

    1718112405

    1

    hashtag
    Queries

    hashtag
    Query funnels

    Response

    user_id

    matchedSteps

    1

    [1, 1, 0, 0]

    2

    [1, 1, 0, 0]

    3

    [1, 1, 1, 1]

    4

    [1, 1, 0, 0]

    hashtag
    Query with funnel count analysis

    The below query puts the above query in the CTE, then use sumArrayLong to show the funnel transitions for each steps.

    Response

    funnelCounts

    [4, 4, 1, 1]

    hashtag
    Usage Examples

    These examples are based on the Hybrid Quick Start.

    FlightNum
    reversedAirports
    RandomAirports

    1206

    PSC,SEA

    SEA,PSC

    5300

    PSC,SEA

    SEA,PSC

    3359

    MSY,PHX,PSC,SEA

    SEA,PSC,PHX,MSY

    arrayUnionInt

    This section contains reference documentation for the arrayUnionInt function.

    Create a union of two arrays of ints.

    hashtag
    Signature

    arrayUnionInt('colName1', 'colName2')

    hashtag
    Usage Examples

    These examples are based on the .

    DivWheelsOffs
    DivWheelsOns
    unionIds

    FunnelCompleteCount

    The FunnelCompleteCount function in Pinot is designed to track user progress through a predefined series of steps or stages in a funnel, such as user interactions on a website from page views to purchases. This function is particularly useful for analyzing how many times users progress through the whole conversion processes within a specified time window.

    hashtag
    Syntax

    concat

    This section contains reference documentation for the concat function.

    Concatenate two input strings using the seperator

    hashtag
    Signature

    CONCAT(col1, col2, seperator)

    lower

    This section contains reference documentation for the lower function.

    Converts string to lower case.

    hashtag
    Signature

    LOWER(col)

    MAXMV

    This section contains reference documentation for the MAXMV function.

    Get the maximum value in a group

    hashtag
    Signature

    MAXMV(colName)

    FunnelMatchStep(
        timestampExpression, 
        windowSize, 
        numberSteps, stepExpression
        [, stepExpression[, stepExpression, ...]]
        [, mode [, mode, ... ]]
    )
    numberSteps = 3, maxStep = 0 -> [0, 0, 0]
    numberSteps = 4, maxStep = 2 -> [1, 1, 0, 0]
    SELECT user_id,
      funnelMatchStep(
        ts,
        '1000000',
        4,
        event_name = 'screen_viewed',
        event_name = 'screen_clicked',
        event_name = 'cart_viewed',
        event_name = 'purchased'
      ) as matchedsteps
    FROM clickstreamFunnel
    GROUP BY user_id
    ORDER BY user_id
    WITH funnelMatchSteps AS (
      SELECT user_id,
      funnelMatchStep(
        ts,
        '1000000',
        4,
        event_name = 'screen_viewed',
        event_name = 'screen_clicked',
        event_name = 'cart_viewed',
        event_name = 'purchased'
      ) as matchedsteps
      FROM clickstreamFunnel
      GROUP BY user_id
    )
    
    SELECT sumArrayLong(matchedsteps) as funnelCounts FROM funnelMatchSteps 
    select FlightNum, 
           arrayReverseString(RandomAirports) AS reversedAirports, 
           RandomAirports
    from airlineStats 
    WHERE arraylength(RandomAirports) BETWEEN 2 AND 4
    limit 5
    parameters provided.

    screen_clicked

    1718112406

    1

    purchased

    1718112407

    1

    screen_viewed

    1718112405

    2

    screen_clicked

    1718112406

    2

    purchased

    1718112407

    2

    screen_viewed

    1718112404

    3

    screen_clicked

    1718112405

    3

    cart_viewed

    1718112406

    3

    purchased

    1718112407

    3

    screen_viewed

    1717939609

    4

    screen_clicked

    1718112405

    4

    purchased

    1718112405

    4

    1023

    PHX,PSC,SEA

    SEA,PSC,PHX

    963

    MSY,PHX,PSC,SEA

    SEA,PSC,PHX,MSY

    1339,2310

    1908,1758,1339,2310

    1453,1731

    1415,1623

    1453,1731,1415,1623

    1908,1758

    1339,2310

    1908,1758,1339,2310

    1453,1731

    1415,1623

    1453,1731,1415,1623

    Hybrid Quick Start

    1908,1758

    hashtag
    Usage Examples
    value

    Apache Pinot

    value

    real-time__analytics

    SELECT concat('Apache', 'Pinot', ' ') AS value
    FROM ignoreMe
    SELECT concat('real-time', 'analytics', '__') AS value
    FROM ignoreMe
    hashtag
    Usage Examples
    name

    pinot

    select LOWER('Pinot') AS name
    FROM ignoreMe
    hashtag
    Usage Examples

    These examples are based on the Hybrid Quick Start.

    value

    108

    select MAXMV(DivLongestGTimes) AS value
    from airlineStats 
    where arraylength(DivLongestGTimes) > 1
    select DivWheelsOffs, 
           DivWheelsOns,
           arrayUnionInt(DivWheelsOffs, DivWheelsOns) AS unionIds
    from airlineStats 
    WHERE arraylength(DivWheelsOffs) >= 2
    limit 5
    hashtag
    Return

    This function returns how many times the funnel has been went through.

    hashtag
    Arguments

    1. timestampExpression:

      • Type: Expression in TIMESTAMP or LONG

      • Description: This is an expression that evaluates to the timestamp of each event. It's used to determine the order of events for a particular user or session. The timestamp is crucial for evaluating whether subsequent actions fall within the specified window.

    2. windowSize:

      • Type: LONG

      • Description: Specifies the size of the time window in which the sequence of funnel steps must occur. The window is defined in milliseconds. This parameter sets the maximum allowed time between the first and the last step in the funnel for them to be considered as part of the same user journey.

    3. numberSteps:

      • Type: Integer

      • Description: Defines the total number of distinct steps in the funnel. This count should match the number of stepExpression

    4. stepExpression:

      • Type: Boolean Expression

      • Description: These are expressions that define each step in the funnel. Typically, these are conditions that evaluate whether a specific event type or action has occurred. Multiple step expressions are separated by commas, with each expression corresponding to a step in the funnel sequence.

    5. mode (optional):

      • Type: String

      • Description: Defines additional modes or options that alter how the funnel analysis is calculated. Common modes might include settings to handle overlapping events, reset the window upon each step, or other custom behaviors specific to the needs of the funnel analysis. If unspecified, the default behavior as defined by Pinot is used.

    hashtag
    Optional Mode Supported

    hashtag
    STRICT_DEDUPLICATION

    The STRICT_DEDUPLICATION mode ensures that repeating occurrences of the same event condition within a funnel sequence disrupt further processing of the funnel for that user session. This mode is crucial when it's important to identify and measure unique, non-repeated actions in a sequence, ensuring each step of the funnel represents a distinct action.

    hashtag
    Practical Impact

    • Event Sequence Interruption: When an event that satisfies a current step condition occurs repeatedly without progression to the next step, strict_deduplication interrupts and essentially ends the analysis of the funnel for that sequence. This prevents the funnel from incorrectly advancing if the same action is merely repeated instead of moving through the intended steps.

    • Enhanced Accuracy in Funnel Progression: This mode is useful for scenarios where the continuity and progression of distinct steps are critical for accurate conversion analysis. It avoids the misinterpretation of user engagement where repeated similar actions might otherwise suggest a false progression through the funnel.

    hashtag
    Example

    For instance, if a funnel is designed to track user progression from a homepage visit, to a search, to adding an item to a cart, and then to checkout, the strict_deduplication mode would stop processing the funnel sequence if the user performs multiple searches without proceeding to add an item to the cart. This ensures that only a linear, non-repetitive progression through these steps is considered as valid funnel movement.

    This mode helps maintain the integrity of each step in the user's journey, ensuring that the data reflects true user behavior without overcounting repetitive actions that do not lead to actual progression.

    hashtag
    STRICT_ORDER

    The strict_order mode enforces a stringent sequence order for events within a funnel. This mode ensures that the progression through the steps follows the exact specified order without any intervening events that are not part of the defined sequence.

    hashtag
    Behavior of strict_order

    • Sequence Adherence: The strict_order mode requires that the events occur in the exact order specified without any other types of events intervening. If an event occurs that is not the next expected step in the defined sequence, the analysis of the funnel for that user session is halted.

    • Early Termination: In the presence of an out-of-sequence event, the analysis stops, and the maximum event level is determined as the last correct step in the sequence before the interruption. For instance, in a specified sequence of A -> B -> C, if the sequence is A -> B -> D, then the funnel analysis terminates after B because D is not the expected next step (C).

    hashtag
    Practical Impact

    • Enhanced Precision in Path Analysis: This mode is particularly valuable when the precise order of actions is critical for the analysis, such as in strict process flows where each step must be followed in a specific order to be considered successful.

    • Avoids Misinterpretation: It prevents the misinterpretation of funnel progress where intervening or unordered events could suggest a misleading path through the funnel.

    hashtag
    Example

    Consider a scenario where a funnel is set up to track user progression through the following steps: logging in (A), searching for products (B), adding a product to the cart (C), and completing a purchase (D). Using the strict_order mode, if the sequence goes A -> B -> E -> C, the analysis will terminate after B because E (an unexpected event like viewing account details) intervenes before C, the expected next step. Therefore, the maximum step reached is reported as 2, representing the successful completion of steps A and B only.

    This mode is crucial for scenarios requiring strict compliance to process steps, ensuring that only users who follow the exact intended sequence are counted in the funnel analysis.

    hashtag
    STRICT_INCREASE

    The strict_increase is designed to ensure that the sequence of events being analyzed has strictly increasing timestamps. This mode is crucial for accurately tracking and analyzing user behavior in scenarios where the chronological order of events directly impacts the interpretation of user actions within a funnel.

    hashtag
    Behavior of strict_increase

    • Timestamp Order: This mode requires that each subsequent event in the funnel must have a timestamp greater than the previous event. It ensures that the user's actions are not only in the correct sequence but also follow a temporal progression without any backtracking or simultaneous actions.

    • Analysis Integrity: If any event in the sequence does not adhere to the strictly increasing order by timestamp, the analysis for that sequence either stops at that point or ignores the out-of-order event, depending on how critical the temporal sequence is to the funnel's logic.

    hashtag
    Practical Impact

    • Temporal Validation: This mode is particularly useful in scenarios where the timing of events is crucial, such as in sessions where actions must follow one another in real-time to be considered valid. It validates the sequence not just by the type of event, but also by ensuring that these events are progressively happening over time.

    • Avoiding Data Errors: It helps in avoiding potential data errors or anomalies where timestamps might not have been recorded correctly, or events may appear out of order due to system errors or delays in logging events.

    hashtag
    Example

    Consider a funnel designed to analyze a user's journey from visiting a website to making a purchase, defined by the following steps: page visit (A), item addition (B), checkout initiation (C), and payment completion (D). Using the strict_increase mode, the funnel will only consider sequences where each action occurs later than the previous. If a user's sequence is A (t1) -> B (t2) -> A (t3) -> C (t4) with t3 being less than or equal to t2, then the analysis will ignore the second occurrence of A or terminate, depending on the specific implementation and requirements of the analysis.

    This mode helps ensure that the funnel analysis reflects true, linear progress through the intended actions, with each step occurring in a timely, sequential manner.

    hashtag
    KEEP_ALL

    The KEEP_ALL mode is designed to ensure that all events in the data set are considered in the analysis, even if they do not match any of the specified step conditions in the funnel sequence. This mode is particularly useful for comprehensive data analysis where the context of non-matching events may still provide valuable insights about user behavior or system performance.

    hashtag
    Behavior of KEEP_ALL

    • Inclusive Analysis: In the KEEP_ALL mode, the funnel function includes every event within the specified time window in the analysis, regardless of whether these events correspond to the predefined steps in the funnel. This allows for a more holistic view of the user's actions during the session.

    • Context Retention: By including all events, this mode helps retain the full context of a user's session, capturing activities that may not be directly related to the funnel but could influence or explain the user's behavior and decisions at other points.

    hashtag
    Practical Impact

    • Enhanced Insight: This mode is invaluable for scenarios where understanding the entirety of user interactions is crucial, such as in complex user journeys where additional actions between the main funnel steps might influence the outcomes or indicate other patterns of interest.

    • Data Completeness: It prevents data loss from filtering out non-matching events, which can be important when analyzing sessions for comprehensive patterns, troubleshooting issues, or performing detailed user journey analysis.

    hashtag
    Example

    Consider a scenario where a funnel is set up to track user progress through steps like logging in, searching for a product, and making a purchase. With KEEP_ALL mode enabled, if a user performs additional actions such as updating profile information or viewing terms and conditions, these events are also included in the analysis. This comprehensive inclusion allows analysts to see a fuller picture of what the user did during their session, not just the actions that directly relate to the funnel. This can reveal if other activities are detracting from the main conversion goals, or if they are part of a broader user engagement that doesn't neatly fit into the primary funnel steps.

    This mode helps to ensure that no potential insights are lost by excluding events, making it a powerful option for detailed analysis and understanding of user interactions beyond the strict confines of the predefined funnel steps.

    hashtag
    Examples

    hashtag
    Data Set

    event_name
    ts
    user_id

    screen_viewed

    1718112402

    1

    screen_clicked

    1718112403

    1

    purchased

    1718112404

    1

    screen_viewed

    1718112405

    1

    hashtag
    Queries

    hashtag
    Query funnels

    Response

    user_id

    rounds

    1

    0

    2

    0

    3

    1

    4

    0

    hashtag
    Query with strict_order

    Response

    user_id

    rounds

    1

    2

    2

    1

    3

    1

    4

    0

    hashtag
    Query with strict_order and keep_all

    Response

    user_id

    rounds

    1

    2

    2

    1

    3

    0

    4

    0

    hashtag
    Query with longer window

    Response

    user_id

    rounds

    1

    2

    2

    1

    3

    1

    4

    1

    length

    This section contains reference documentation for the length function.

    calculate length of the string

    hashtag
    Signature

    LENGTH(col)

    hashtag
    Usage Examples

    value

    arrayUnionString

    This section contains reference documentation for the arrayUnionString function.

    Create a union of two arrays of strings.

    hashtag
    Signature

    arrayUnionString('colName1', 'colName2')

    Base64

    This section contains reference documentation for base64 encode and decode functions.

    Encoding scheme follows

    • toBase64 returns Base64 encoded string of input binary data (bytes type).

    • fromBase64

    DISTINCTCOUNTMV

    This section contains reference documentation for the DISTINCTCOUNTMV function.

    Returns the count of distinct row values in a group

    hashtag
    Signature

    DISTINCTCOUNTMV(colName)

    AVGMV

    This section contains reference documentation for the AVGMV function.

    Get the avg of values in a group

    hashtag
    Signature

    AVGMV(colName)

    COUNTMV

    This section contains reference documentation for the COUNTMV function.

    Get the count of rows in a group

    hashtag
    Signature

    COUNTMV(colName)

    COVAR_POP

    This section contains reference documentation for the COVAR_POP function.

    Returns the population covariance between of 2 numerical columns.

    hashtag
    Signatures

    COVAR_POP(col1, col2) -> double

    FunnelCompleteCount(
        timestampExpression, 
        windowSize, 
        numberSteps, stepExpression
        [, stepExpression[, stepExpression, ...]]
        [, mode [, mode, ... ]]
    )
    SELECT user_id,
      funnelCompleteCount(
        ts,
        '1000000',
        4,
        event_name = 'screen_viewed',
        event_name = 'screen_clicked',
        event_name = 'cart_viewed',
        event_name = 'purchased'
      ) as rounds
    FROM clickstreamFunnel
    GROUP BY user_id
    ORDER BY user_id
    SELECT user_id,
      funnelCompleteCount(
        ts,
        '1000000',
        3,
        event_name = 'screen_viewed',
        event_name = 'screen_clicked',
        event_name = 'purchased',
        'strict_order'
      ) as rounds
    FROM clickstreamFunnel
    GROUP BY user_id
    ORDER BY user_id
    SELECT user_id,
      funnelCompleteCount(
        ts,
        '100000',
        3,
        event_name = 'screen_viewed',
        event_name = 'screen_clicked',
        event_name = 'purchased',
        'strict_order',
        'keep_all'
      ) as rounds
    FROM clickstreamFunnel
    GROUP BY user_id
    ORDER BY user_id
    SELECT user_id,
      funnelMaxStep(
        ts,
        '1000000',
        3,
        event_name = 'screen_viewed',
        event_name = 'screen_clicked',
        event_name = 'purchased',
        'strict_order'
      ) as rounds
    FROM clickstreamFunnel
    GROUP BY user_id
    ORDER BY user_id
    parameters provided.

    screen_clicked

    1718112406

    1

    purchased

    1718112407

    1

    screen_viewed

    1718112405

    2

    screen_clicked

    1718112406

    2

    purchased

    1718112407

    2

    screen_viewed

    1718112404

    3

    screen_clicked

    1718112405

    3

    cart_viewed

    1718112406

    3

    purchased

    1718112407

    3

    screen_viewed

    1717939609

    4

    screen_clicked

    1718112405

    4

    purchased

    1718112405

    4

    SELECT length('Pinot') AS value
    FROM ignoreMe

    5

    returns binary data (represented as a Hex string) from Base64-encoded string.

    hashtag
    Signature

    toBase64(bytesCol)

    fromBase64(stringCol)

    hashtag
    Usage Examples

    circle-info

    For better readability, the following examples converts string hello! into BYTES using toUtf8arrow-up-right function and converts the decoded BYTES into string using fromUtf8arrow-up-right.

    encoded

    aGVsbG8h

    decoded

    hello!

    circle-info

    Note that without UTF8 string conversion, returned BYTES will be represented as a Hex string following Pinot's BYTES column representationarrow-up-right. See the example below.

    decoded

    68656c6c6f21

    circle-exclamation

    Note that the following query will throw compilation error as string is not a valid input type for toBase64.

    java.util.Base64.Encoderarrow-up-right
    hashtag
    Usage Examples

    These examples are based on the Hybrid Quick Start.

    The following query returns the documents that have a DivTailNums with more than one value:

    DivTailNums

    N7713A,N7713A

    N344AA,N344AA

    N344AA,N344AA

    N7713A,N7713A

    You can count the distinct number of items in these rows by running the following query:

    value

    2

    select DivTailNums
    from airlineStats 
    where arraylength(DivTailNums) > 1
    hashtag
    Usage Examples

    These examples are based on the Hybrid Quick Start.

    value

    18.465753424657535

    select AVGMV(DivLongestGTimes) AS value
    from airlineStats 
    where arraylength(DivLongestGTimes) > 1
    hashtag
    Usage Examples

    These examples are based on the Hybrid Quick Start.

    The following query returns the documents that have a DivTailNums with more than one value:

    DivTailNums

    N7713A,N7713A

    N344AA,N344AA

    N344AA,N344AA

    N7713A,N7713A

    You can count the number of items in these rows by running the following query:

    value

    8

    select DivTailNums
    from airlineStats 
    where arraylength(DivTailNums) > 1
    hashtag
    Usage Examples

    These examples are based on the Batch Quick Start.

    covariance

    2314.249154477403

    COVAR_POP(col1, col2) = E[col1 * col2] - E[col1]E[col2]
    SELECT COVAR_POP(numberOfGames, hits) AS covariance 
    FROM baseballStats
    SELECT toBase64(toUtf8('hello!')) AS encoded
    FROM ignoreMe
    SELECT fromUtf8(fromBase64('aGVsbG8h')) AS decoded
    FROM ignoreMe
    SELECT fromBase64('aGVsbG8h') AS decoded
    FROM ignoreMe
    SELECT toBase64('hello!') AS encoded
    FROM ignoreMe
    select DISTINCTCOUNTMV(DivTailNums) AS value
    from airlineStats 
    where arraylength(DivTailNums) > 1
    select COUNTMV(DivTailNums) AS value
    from airlineStats 
    where arraylength(DivTailNums) > 1
    hashtag
    Usage Examples

    These examples are based on the Hybrid Quick Start.

    DivTailNums
    DivAirports
    unionIds

    N7713A,N7713A

    IND,IND

    N7713A,IND

    N344AA,N344AA

    MCI,BOS

    N344AA,MCI,BOS

    N7713A,N7713A

    IND,IND

    N7713A,IND

    Histogram

    This section contains reference documentation for the HISTOGRAM function.

    Returns the count of data points that fall within each bin as a vector. The bins are left-inclusive and right-exclusive, i.e. [a, b), except for the last one which is inclusive on both sides [a, b].

    hashtag
    Signatures

    1. Equal length bins (better performance):

    HISTOGRAM(colName, lower, upper, numBins)

    1. Arbitrary increasing bin edges:

    HISTOGRAM(colName, ARRAY[binEdge1, binEdge2, binEdge3, ...])

    hashtag
    Usage Examples

    These examples are based on the .

    1. 10 equal-length bins [0, 20), [20, 30) ... [180, 200]

    histogram
    1. 6 bins (- ∞, 1), [1, 10), [10, 50), [50,100), [100,500), [500, 1000]

    histogram

    dayOfYear

    This section contains reference documentation for the dayOfYear function.

    Returns the day of the year from the given epoch millis in UTC or specified timezone. The value ranges from 1 to 366.

    hashtag
    Signature

    dayOfYear(tsInMillis)

    dayOfYear(tsInMillis, timeZoneId)

    doy(tsInMillis)

    doy(tsInMillis, timeZoneId)

    hashtag
    Usage Examples

    dayOfYear
    dayOfYear
    dayOfYear
    dayOfYear

    arrayRemoveString

    This section contains reference documentation for the arrayRemoveString function.

    Removes value from array of strings.

    hashtag
    Signature

    arrayRemoveString('colName', value)

    hashtag
    Usage Examples

    These examples are based on the .

    DivAirportIDs
    value
    select DivTailNums, 
           DivAirports,
           arrayUnionString(DivTailNums, DivAirports) AS unionIds
    from airlineStats 
    WHERE arraylength(DivTailNums) >= 2
    limit 5

    N344AA,N344AA

    MCI,BOS

    N344AA,MCI,BOS

    32348,21519,11359,7587,5488,5360,6282,7361,585,0

    13520,16506,18375,12403,28591,8494

    Batch Quick Start
    select dayOfYear(1639351800000) AS dayOfYear
    FROM ignoreMe

    346

    347

    346

    347

    SEA,PSC

    PSC

    SEA,PSC,PHX,MSY

    PSC,PHX,MSY

    SEA,PSC,PHX,MSY

    PSC,PHX,MSY

    SEA,PSC

    PSC

    SEA,PSC

    PSC

    Hybrid Quick Start
    SELECT HISTOGRAM(numberOfGames, 0, 200, 10) AS histogram
    FROM baseballStats 
    select HISTOGRAM(AtBatting, Array['-Infinity', 1, 10, 50, 100, 500, 1000]) AS histogram
    from baseballStats
    select dayOfYear(1639351800000, 'CET') AS dayOfYear
    FROM ignoreMe
    select doy(1639351800000) AS dayOfYear
    FROM ignoreMe
    select doy(1639351800000, 'CET') AS dayOfYear
    FROM ignoreMe
    select RandomAirports, 
           arrayRemoveString(RandomAirports, 'SEA') AS value
    from airlineStats 
    WHERE arraylength(RandomAirports) BETWEEN 2 AND 4
    limit 5

    FrequentLongsSketch

    This section contains reference documentation for the FREQUENTLONGSSKETCH function.

    FREQUENTLONGSSKETCH is an estimation data-sketch function which can be used to estimate the frequency of an item. It is based on Apache Datasketches libraryarrow-up-right and returns a serialized sketch object which can be merged with other sketches.

    hashtag
    Signature

    FREQUENTLONGSSKETCH(column, maxMapSize=256) -> Base64 encoded sketch object

    • column (required): Name of the column to aggregate on. Needs to be a type which can be cast into 'LONG'.

    • maxMapSize: This value specifies the maximum physical length of the internal hash map. The maxMapSize must be a power of 2 and the default value is 256.

    hashtag
    Usage Example

    frequentlongssketch(AirlineID)

    Which can be used, for example in Java as:

    For more examples on the sketch API, refer to the Datasketches .

    DISTINCTCOUNTRAWHLL

    This section contains reference documentation for the DISTINCTCOUNTRAWHLL function.

    Returns HLL response serialized as string. The serialized HLL can be converted back into an HLL and then aggregated with other HLLs. A common use case may be to merge HLL responses from different Pinot tables, or to allow aggregation after client-side batching.

    hashtag
    Signature

    DISTINCTCOUNTRAWHLL(colName, log2m)

    hashtag
    Usage Examples

    These examples are based on the .

    value
    value

    FromEpoch

    This section contains reference documentation for the fromEpoch functions.

    Convert epoch to epoch milliseconds. The following time units are supported:

    • SECONDS

    • MINUTES

    • HOURS

    • DAYS

    hashtag
    Signature

    FromEpoch<TIME_UNIT>(timeIn<Time_UNIT>)

    hashtag
    Usage Examples

    epochMillis
    epochMillis
    epochMillis
    epochMillis

    dayOfWeek

    This section contains reference documentation for the dayOfWeek function.

    Returns the day of the week from the given epoch millis in UTC timezone. The value ranges from 1(Monday) to 7(Sunday).

    hashtag
    Signature

    dayOfWeek(tsInMillis)

    dayOfWeek(tsInMillis, timeZoneId)

    BAEKCAUAAAAOAAAAAA...

    documentationarrow-up-right
    select DISTINCTCOUNTRAWHLL(teamID) AS value
    from baseballStats 

    00000008000000ac00000800000084000210000000000020001020220030042002100420002010020210000300008020040180400001300310001863024004220870800004400421040104610220080000020000040000030000800002108420000110400800000106000060000080020000082000218c0002000000020000010200100000018c0006000400022004a0000088000200800000320820021000000221842000000000025088000220080100009420

    000000010000000400000106

    Batch Quick Start

    1613472303000

    1613472300000

    1613469600000

    1613433600000

    dow(tsInMillis)

    dow(tsInMillis, timeZoneId)

    hashtag
    Usage Examples

    dayOfWeek

    7

    dayOfWeek

    1

    dayOfWeek

    7

    dayOfWeek

    1

    select dayOfWeek(1639351800000) AS dayOfWeek
    FROM ignoreMe
    select FREQUENTLONGSSKETCH(AirlineID, 16) from airlineStats
    byte[] byteArr = Base64.getDecoder().decode(encodedSketch);
    LongsSketch sketch = LongsSketch.getInstance(Memory.wrap(byteArr));
    
    LongsSketch.Row[] items = sketch.getFrequentItems(ErrorType.NO_FALSE_NEGATIVES);
    for (int i = 0; i < items.length; i++) {
      LongsSketch.Row item = items[i];
      System.out.printf("Airline: %d, Frequency: %d %n", item.getItem(), item.getEstimate());
    }
    select DISTINCTCOUNTRAWHLL(teamID, 1) AS value
    from baseballStats 
    select FromEpochSeconds(1613472303) AS epochMillis
    FROM ignoreMe
    select FromEpochMinutes(26891205) AS epochMillis
    FROM ignoreMe
    select FromEpochHours(448186) AS epochMillis
    FROM ignoreMe
    select FromEpochDays(18674) AS epochMillis
    FROM ignoreMe
    select dayOfWeek(1639351800000, 'CET') AS dayOfWeek
    FROM ignoreMe
    select dow(1639351800000) AS dayOfWeek
    FROM ignoreMe
    select dow(1639351800000, 'CET') AS dayOfWeek
    FROM ignoreMe

    LEAD

    This section contains reference documentation for the LEAD function.

    Returns the value from a following row in the same result set, based on a specified physical offset. It can be used to compare values in the current row with values in a subsequent row.

    hashtag
    Signature

    hashtag
    Arguments

    • expression: The column or calculation from which the value is to be returned.

    • offset: The number of rows before the current row from which to retrieve the value. The default is 1 if not specified.

    • default: The value to return if the offset goes beyond the scope of the window. If not specified, NULL is returned.

    hashtag
    Example

    Forecast next day's sales based on current data.

    Anticipate the next payment amount for budget planning.

    Identify potential increases in expenses or revenue.

    Forecast next day's sales based on current data This example shows how to use the LEAD function to anticipate sales for the next day.

    Output:

    sales_date
    sales_amount
    next_day_sales

    Anticipate the next payment amount for budget planning This query retrieves the next payment amount for each transaction to assist in financial forecasting and budgeting.

    Output:

    transaction_id
    payment_date
    amount
    next_payment_amount

    Identify potential increases in expenses or revenue Utilize the LEAD function to examine monthly data and predict potential increases or trends in expenses or revenue for future planning.

    Output:

    month
    year
    expenses
    next_month_expenses

    Use with CTE:

    FromEpochBucket

    This section contains reference documentation for the fromEpochBucket functions.

    Convert epoch to epoch milliseconds. e.g. 10 seconds since epoch or 5 minutes since Epoch. The following time units are supported:

    • SECONDS

    • MINUTES

    • HOURS

    • DAYS

    hashtag
    Signature

    FromEpoch<TIME_UNIT>Bucket(timeInMillis, bucketSize)

    hashtag
    Usage Examples

    bucket
    bucket
    bucket
    bucket
    bucket

    hour

    This section contains reference documentation for the hour function.

    Returns the hour of the day from the given epoch millis in UTC or specified timezone. The value ranges from 0 to 23.

    hashtag
    Signature

    hour(tsInMillis)

    hour(tsInMillis, timeZoneId)

    hashtag
    Usage Examples

    hour
    hour

    ln

    This section contains reference documentation for the ln function.

    Natural log of value i.e. ln(col1)

    hashtag
    Signature

    LN(col1)

    hashtag
    Usage Examples

    value
    value

    arraySortString

    This section contains reference documentation for the arraySortString function.

    Sorts array of strings.

    hashtag
    Signature

    arraySortString('colName')

    max

    This section contains reference documentation for the max function.

    Get the maximum value in a group

    hashtag
    Signature

    MAX(colName)

    DISTINCTCOUNTBITMAP

    This section contains reference documentation for the DISTINCTCOUNTBITMAP function.

    Returns the count of distinct row values in a group. This function is accurate for INT column, but approximate for other cases where hash codes are used in distinct counting and there may be hash collisions. For accurate distinct counting on all column types, see .

    hashtag
    Signature

    DISTINCTCOUNTBITMAP(colName)

    LEAD(any expression [, bigint offset [, any default]])

    1613472303000

    3226944606000

    1613472000000

    1613466000000

    1613088000000

    select hour(1639351800000) AS hour
    FROM ignoreMe

    23

    select hour(1639351800000, 'CET') AS hour
    FROM ignoreMe

    0

    select ln(1) AS value
    from ignoreMe

    0

    select ln(12) AS value
    from ignoreMe

    2.4849066497880004

    hashtag
    Usage Examples

    These examples are based on the Batch Quick Start.

    value

    73

    select max(homeRuns) AS value
    from baseballStats 
    hashtag
    Usage Examples

    These examples are based on the Batch Quick Start.

    value

    7

    value

    148

    DISTINCTCOUNT
    select DISTINCTCOUNTBITMAP(league) AS value
    from baseballStats 
    select FromEpochSecondsBucket(1613472303, 1) AS bucket
    FROM ignoreMe
    select FromEpochSecondsBucket(1613472303, 2) AS bucket
    FROM ignoreMe
    select FromEpochMinutesBucket(2689120, 10) AS bucket
    FROM ignoreMe
    select FromEpochHoursBucket(89637, 5) AS bucket
    FROM ignoreMe
    select FromEpochDaysBucket(1867, 10) AS bucket
    FROM ignoreMe
    select DISTINCTCOUNTBITMAP(teamID) AS value
    from baseballStats 
    hashtag
    Usage Examples

    These examples are based on the Hybrid Quick Start.

    FlightNum
    sortedAirports
    RandomAirports

    3846

    PSC,SEA

    SEA,PSC

    3635

    MSY,PHX,PSC,SEA

    SEA,PSC,PHX,MSY

    429

    MSY,PHX,PSC,SEA

    SEA,PSC,PHX,MSY

    2023-02-14

    200

    180

    2023-02-15

    180

    220

    2023-02-16

    220

    NULL

    416

    2023-02-14 21:21:59.996577

    2.99

    4.99

    516

    2023-02-14 21:23:39.996577

    4.99

    4.99

    239

    2023-02-14 21:29:00.996577

    4.99

    1

    2023

    1000

    1100

    2

    2023

    1100

    1200

    3

    2023

    1200

    6.99

    NULL

    JSONPATH

    This section contains reference documentation for the JSONPATH function.

    Extracts the object value from jsonField based on 'jsonPath', the result type is inferred based on JSON value. This function can only be used in an ingestion transformation function.

    hashtag
    Signature

    JSONPATH(jsonField, 'jsonPath')

    Arguments
    Description
    circle-exclamation

    'jsonPath'` is a literal. Pinot uses single quotes to distinguish them from identifiers. You can use the to test JSON expressions before you import any data.

    hashtag
    Usage Examples

    The usage examples are based on extracting fields from the following JSON document:

    Expression
    Value

    This function can be used in the to extract the name property into the name column and age property into the age column, as described below:

    MINMAXRANGEMV

    This section contains reference documentation for the MINMAXRANGEMV function.

    Returns the max - min value in a group

    hashtag
    Signature

    MINMAXRANGEMV(colName)

    hashtag
    Usage Examples

    These examples are based on the .

    value

    arraySliceInt

    This section contains reference documentation for the arraySliceInt function.

    Returns the values in the array between the start and end positions.

    hashtag
    Signature

    arraySliceInt('colName', start, end)

    select FlightNum, 
           arraySortString(RandomAirports) AS sortedAirports, 
           RandomAirports
    from airlineStats 
    WHERE arraylength(RandomAirports) BETWEEN 2 AND 4
    limit 5
    SELECT
        sales_date,
        sales_amount,
        LEAD(sales_amount, 1) OVER (ORDER BY sales_date) AS next_day_sales
    FROM
        daily_sales;
    SELECT
        transaction_id,
        payment_date,
        amount,
        LEAD(amount, 1) OVER (ORDER BY payment_date) AS next_payment_amount
    FROM
        payments;
    SELECT
        month,
        year,
        expenses,
        LEAD(expenses, 1) OVER (ORDER BY year, month) AS next_month_expenses
    FROM
        financials;
    WITH tmp AS (
      select count(*) as num_trips,
        DaysSinceEpoch
      from airlineStats
      GROUP BY DaysSinceEpoch
    )
    
    SELECT DaysSinceEpoch,
      num_trips,
      LEAD(num_trips, 2) OVER (
        ORDER BY DaysSinceEpoch
      ) AS previous_num_trips,
      num_trips - LEAD(num_trips, 2) OVER (
        ORDER BY DaysSinceEpoch
      ) AS difference
    FROM tmp;

    1206

    PSC,SEA

    SEA,PSC

    5300

    PSC,SEA

    SEA,PSC

    jsonField

    An Identifier/Expression contains JSON documents.

    'jsonPath'

    Follows JsonPath Syntaxarrow-up-right to read values from JSON documents.

    JSONPATH(data, '$.name')

    "Pete"

    JSONPATH(data, '$.age')

    24

    Jayway JsonPath Evaluator Toolarrow-up-right
    table config
    select MINMAXRANGEMV(DivLongestGTimes) AS value
    from airlineStats 
    where arraylength(DivLongestGTimes) > 1

    106

    Hybrid Quick Start
    {
      "data": {
        "name": "Pete",
        "age": 24,
        "subjects": [
          {
            "name": "maths",
            "homework_grades": [80, 85, 90, 95, 100],
            "grade": "A",
            "score": 90
          },
          {
            "name": "english",
            "homework_grades": [60, 65, 70, 85, 90],
            "grade": "B",
            "score": 70
          }
        ]
      }
    }
    {
       "tableConfig":{
          "ingestionConfig":{
             "transformConfigs":[
                {
                   "columnName":"name",
                   "transformFunction":"JSONPATHSTRING(data, '$.name')"
                },
                {
                   "columnName":"age",
                   "transformFunction":"JSONPATHSTRING(data, '$.age')"
                }
             ]
          }
       }
    }
    hashtag
    Usage Examples

    These examples are based on the Hybrid Quick Start.

    FlightNum
    airports
    DivAirportIDs

    1531

    13891

    13891,12892

    19

    14683

    14683,14683

    829

    12339

    12339,12339

    select FlightNum, 
           arraySliceInt(DivAirportIDs, 0, 1) AS airports, 
    	     DivAirportIDs
    from airlineStats 
    WHERE arraylength(DivAirportIDs) >= 2
    limit 5

    24

    13198

    13198,10721

    548

    10721

    10721,12478

    JSONPATHARRAYDEFAULTEMPTY

    This section contains reference documentation for the JSONPATHARRAYDEFAULTEMPTY function.

    Extracts an array from jsonField based on 'jsonPath', the result type is inferred based on JSON value. Returns empty array for null or parsing error. This function can only be used in an ingestion transformation function.

    hashtag
    Signature

    JSONPATHARRAYDEFAULTEMPTY(jsonField, 'jsonPath')

    Arguments
    Description
    circle-exclamation

    'jsonPath'` is a literal. Pinot uses single quotes to distinguish them from identifiers. You can use the to test JSON expressions before you import any data.

    hashtag
    Usage Examples

    The usage examples are based on extracting fields from the following JSON document:

    Expression
    Value

    This function can be used in the to extract the name, score, and second value of homework_grades into their respective columns , as described below:

    jsonField

    An Identifier/Expression contains JSON documents.

    'jsonPath'

    Follows to read values from JSON documents.

    JSONPATHARRAYDEFAULTEMPTY(myJsonRecord, '$.subjects[*].name')

    ["maths", "english"]

    JSONPATHARRAYDEFAULTEMPTY(myJsonRecord, '$.subjects[*].score')

    [90, 70]

    JSONPATHARRAYDEFAULTEMPTY(myJsonRecord, '$.subjects[*].homework_grades[1]')

    [85, 65]

    JSONPATHARRAYDEFAULTEMPTY(myJsonRecord, '$.subjects[*].homework_grades[7]')

    []

    Jayway JsonPath Evaluator Toolarrow-up-right
    table config
    {
      "data": {
        "name": "Pete",
        "age": 24,
        "subjects": [
          {
            "name": "maths",
            "homework_grades": [80, 85, 90, 95, 100],
            "grade": "A",
            "score": 90
          },
          {
            "name": "english",
            "homework_grades": [60, 65, 70, 85, 90],
            "grade": "B",
            "score": 70
          }
        ]
      }
    }
    {
       "tableConfig":{
          "ingestionConfig":{
             "transformConfigs":[
                {
                   "columnName":"names",
                   "transformFunction":"JSONPATHARRAYDEFAULTEMPTY(data, '$.subjects[*].name')"
                },
                {
                   "columnName":"ages",
                   "transformFunction":"JSONPATHARRAYDEFAULTEMPTY(data, '$.subjects[*].score')"
                },
                {
                   "columnName":"homeworkGrades",
                   "transformFunction":"JSONPATHARRAYDEFAULTEMPTY(data, '$.subjects[*].homework_grades[1]')"
                }
             ]
          }
       }
    }
    JsonPath Syntaxarrow-up-right

    FunnelMaxStep

    The FunnelMaxStep function in Pinot is designed to track user progress through a predefined series of steps or stages in a funnel, such as user interactions on a website from page views to purchases. This function is particularly useful for analyzing how far users progress through a conversion process within a specified time window.

    hashtag
    Syntax

    hashtag
    Return

    This function returns the Integer value of the max steps that window funnel could proceed forward.

    hashtag
    Arguments

    1. timestampExpression:

      • Type: Expression in TIMESTAMP or LONG

    hashtag
    Optional Mode Supported

    hashtag
    STRICT_DEDUPLICATION

    The STRICT_DEDUPLICATION mode ensures that repeating occurrences of the same event condition within a funnel sequence disrupt further processing of the funnel for that user session. This mode is crucial when it's important to identify and measure unique, non-repeated actions in a sequence, ensuring each step of the funnel represents a distinct action.

    hashtag
    Practical Impact

    • Event Sequence Interruption: When an event that satisfies a current step condition occurs repeatedly without progression to the next step, strict_deduplication interrupts and essentially ends the analysis of the funnel for that sequence. This prevents the funnel from incorrectly advancing if the same action is merely repeated instead of moving through the intended steps.

    • Enhanced Accuracy in Funnel Progression: This mode is useful for scenarios where the continuity and progression of distinct steps are critical for accurate conversion analysis. It avoids the misinterpretation of user engagement where repeated similar actions might otherwise suggest a false progression through the funnel.

    hashtag
    Example

    For instance, if a funnel is designed to track user progression from a homepage visit, to a search, to adding an item to a cart, and then to checkout, the strict_deduplication mode would stop processing the funnel sequence if the user performs multiple searches without proceeding to add an item to the cart. This ensures that only a linear, non-repetitive progression through these steps is considered as valid funnel movement.

    This mode helps maintain the integrity of each step in the user's journey, ensuring that the data reflects true user behavior without overcounting repetitive actions that do not lead to actual progression.

    hashtag
    STRICT_ORDER

    The strict_order mode enforces a stringent sequence order for events within a funnel. This mode ensures that the progression through the steps follows the exact specified order without any intervening events that are not part of the defined sequence.

    hashtag
    Behavior of strict_order

    • Sequence Adherence: The strict_order mode requires that the events occur in the exact order specified without any other types of events intervening. If an event occurs that is not the next expected step in the defined sequence, the analysis of the funnel for that user session is halted.

    • Early Termination: In the presence of an out-of-sequence event, the analysis stops, and the maximum event level is determined as the last correct step in the sequence before the interruption. For instance, in a specified sequence of A -> B -> C, if the sequence is A -> B -> D, then the funnel analysis terminates after B because D is not the expected next step (C).

    hashtag
    Practical Impact

    • Enhanced Precision in Path Analysis: This mode is particularly valuable when the precise order of actions is critical for the analysis, such as in strict process flows where each step must be followed in a specific order to be considered successful.

    • Avoids Misinterpretation: It prevents the misinterpretation of funnel progress where intervening or unordered events could suggest a misleading path through the funnel.

    hashtag
    Example

    Consider a scenario where a funnel is set up to track user progression through the following steps: logging in (A), searching for products (B), adding a product to the cart (C), and completing a purchase (D). Using the strict_order mode, if the sequence goes A -> B -> E -> C, the analysis will terminate after B because E (an unexpected event like viewing account details) intervenes before C, the expected next step. Therefore, the maximum step reached is reported as 2, representing the successful completion of steps A and B only.

    This mode is crucial for scenarios requiring strict compliance to process steps, ensuring that only users who follow the exact intended sequence are counted in the funnel analysis.

    hashtag
    STRICT_INCREASE

    The strict_increase is designed to ensure that the sequence of events being analyzed has strictly increasing timestamps. This mode is crucial for accurately tracking and analyzing user behavior in scenarios where the chronological order of events directly impacts the interpretation of user actions within a funnel.

    hashtag
    Behavior of strict_increase

    • Timestamp Order: This mode requires that each subsequent event in the funnel must have a timestamp greater than the previous event. It ensures that the user's actions are not only in the correct sequence but also follow a temporal progression without any backtracking or simultaneous actions.

    • Analysis Integrity: If any event in the sequence does not adhere to the strictly increasing order by timestamp, the analysis for that sequence either stops at that point or ignores the out-of-order event, depending on how critical the temporal sequence is to the funnel's logic.

    hashtag
    Practical Impact

    • Temporal Validation: This mode is particularly useful in scenarios where the timing of events is crucial, such as in sessions where actions must follow one another in real-time to be considered valid. It validates the sequence not just by the type of event, but also by ensuring that these events are progressively happening over time.

    • Avoiding Data Errors: It helps in avoiding potential data errors or anomalies where timestamps might not have been recorded correctly, or events may appear out of order due to system errors or delays in logging events.

    hashtag
    Example

    Consider a funnel designed to analyze a user's journey from visiting a website to making a purchase, defined by the following steps: page visit (A), item addition (B), checkout initiation (C), and payment completion (D). Using the strict_increase mode, the funnel will only consider sequences where each action occurs later than the previous. If a user's sequence is A (t1) -> B (t2) -> A (t3) -> C (t4) with t3 being less than or equal to t2, then the analysis will ignore the second occurrence of A or terminate, depending on the specific implementation and requirements of the analysis.

    This mode helps ensure that the funnel analysis reflects true, linear progress through the intended actions, with each step occurring in a timely, sequential manner.

    hashtag
    KEEP_ALL

    The KEEP_ALL mode is designed to ensure that all events in the data set are considered in the analysis, even if they do not match any of the specified step conditions in the funnel sequence. This mode is particularly useful for comprehensive data analysis where the context of non-matching events may still provide valuable insights about user behavior or system performance.

    hashtag
    Behavior of KEEP_ALL

    • Inclusive Analysis: In the KEEP_ALL mode, the funnel function includes every event within the specified time window in the analysis, regardless of whether these events correspond to the predefined steps in the funnel. This allows for a more holistic view of the user's actions during the session.

    • Context Retention: By including all events, this mode helps retain the full context of a user's session, capturing activities that may not be directly related to the funnel but could influence or explain the user's behavior and decisions at other points.

    hashtag
    Practical Impact

    • Enhanced Insight: This mode is invaluable for scenarios where understanding the entirety of user interactions is crucial, such as in complex user journeys where additional actions between the main funnel steps might influence the outcomes or indicate other patterns of interest.

    • Data Completeness: It prevents data loss from filtering out non-matching events, which can be important when analyzing sessions for comprehensive patterns, troubleshooting issues, or performing detailed user journey analysis.

    hashtag
    Example

    Consider a scenario where a funnel is set up to track user progress through steps like logging in, searching for a product, and making a purchase. With KEEP_ALL mode enabled, if a user performs additional actions such as updating profile information or viewing terms and conditions, these events are also included in the analysis. This comprehensive inclusion allows analysts to see a fuller picture of what the user did during their session, not just the actions that directly relate to the funnel. This can reveal if other activities are detracting from the main conversion goals, or if they are part of a broader user engagement that doesn't neatly fit into the primary funnel steps.

    This mode helps to ensure that no potential insights are lost by excluding events, making it a powerful option for detailed analysis and understanding of user interactions beyond the strict confines of the predefined funnel steps.

    hashtag
    Examples

    hashtag
    Data Set

    event_name
    ts
    user_id

    hashtag
    Queries

    hashtag
    Query funnels

    Response

    hashtag
    Query with strict_order

    Response

    hashtag
    Query with strict_order and keep_all

    Response

    hashtag
    Query with longer window

    Response

    FunnelMaxStep(
        timestampExpression, 
        windowSize, 
        numberSteps, stepExpression
        [, stepExpression[, stepExpression, ...]]
        [, mode [, mode, ... ]]
    )
    Description: This is an expression that evaluates to the timestamp of each event. It's used to determine the order of events for a particular user or session. The timestamp is crucial for evaluating whether subsequent actions fall within the specified window.
  • windowSize:

    • Type: LONG

    • Description: Specifies the size of the time window in which the sequence of funnel steps must occur. The window is defined in milliseconds. This parameter sets the maximum allowed time between the first and the last step in the funnel for them to be considered as part of the same user journey.

  • numberSteps:

    • Type: Integer

    • Description: Defines the total number of distinct steps in the funnel. This count should match the number of stepExpression parameters provided.

  • stepExpression:

    • Type: Boolean Expression

    • Description: These are expressions that define each step in the funnel. Typically, these are conditions that evaluate whether a specific event type or action has occurred. Multiple step expressions are separated by commas, with each expression corresponding to a step in the funnel sequence.

  • mode (optional):

    • Type: String

    • Description: Defines additional modes or options that alter how the funnel analysis is calculated. Common modes might include settings to handle overlapping events, reset the window upon each step, or other custom behaviors specific to the needs of the funnel analysis. If unspecified, the default behavior as defined by Pinot is used.

  • screen_clicked

    1718112406

    1

    purchased

    1718112407

    1

    screen_viewed

    1718112405

    2

    screen_clicked

    1718112406

    2

    purchased

    1718112407

    2

    screen_viewed

    1718112404

    3

    screen_clicked

    1718112405

    3

    cart_viewed

    1718112406

    3

    purchased

    1718112407

    3

    screen_viewed

    1717939609

    4

    screen_clicked

    1718112405

    4

    purchased

    1718112405

    4

    screen_viewed

    1718112402

    1

    screen_clicked

    1718112403

    1

    purchased

    1718112404

    1

    screen_viewed

    1718112405

    1

    user_id

    steps

    1

    2

    2

    2

    3

    4

    4

    2

    user_id

    steps

    1

    3

    2

    3

    3

    3

    4

    1

    user_id

    steps

    1

    3

    2

    3

    3

    2

    4

    1

    user_id

    steps

    1

    3

    2

    3

    3

    3

    4

    3

    SELECT user_id,
      funnelMaxStep(
        ts,
        '1000000',
        4,
        event_name = 'screen_viewed',
        event_name = 'screen_clicked',
        event_name = 'cart_viewed',
        event_name = 'purchased'
      ) as steps
    FROM clickstreamFunnel
    GROUP BY user_id
    ORDER BY user_id
    SELECT user_id,
      funnelMaxStep(
        ts,
        '100000',
        3,
        event_name = 'screen_viewed',
        event_name = 'screen_clicked',
        event_name = 'purchased',
        'strict_order'
      ) as steps
    FROM clickstreamFunnel
    GROUP BY user_id
    ORDER BY user_id
    SELECT user_id,
      funnelMaxStep(
        ts,
        '100000',
        3,
        event_name = 'screen_viewed',
        event_name = 'screen_clicked',
        event_name = 'purchased',
        'strict_order',
        'keep_all'
      ) as steps
    FROM clickstreamFunnel
    GROUP BY user_id
    ORDER BY user_id
    SELECT user_id,
      funnelMaxStep(
        ts,
        '1000000',
        3,
        event_name = 'screen_viewed',
        event_name = 'screen_clicked',
        event_name = 'purchased',
        'strict_order'
      ) as steps
    FROM clickstreamFunnel
    GROUP BY user_id
    ORDER BY user_id

    FUNNELCOUNT

    This section contains reference documentation for the FUNNELCOUNT function.

    Funnel analytics aggregation function.

    Returns array of distinct correlated counts for each funnel step.

    hashtag
    Signature

    FUNNEL_COUNT (

    STEPS ( predicate1, predicate2 ... ),

    CORRELATE_BY ( correlation_column ),

    SETTINGS ( setting1, setting2 ... ) )

    Parameter
    Arguments
    Description

    hashtag
    Usage Examples

    Many datasets are time series in nature, tracking events of an entity over time. An example of such a dataset could be a user analytics activity log from a commerce web application.

    hashtag
    Example

    user_id
    event_time
    url

    hashtag
    Funnel

    We want to analyse the following checkout funnel:

    • /cart/add

    • /checkout/start

    • /checkout/confirmation

    hashtag
    Counts

    We want to answer the following questions about the above funnel:

    • How many users entered the top of the funnel?

    • How many of these users proceeded to the second step?

    • How many users reached the bottom of the funnel after completing all steps?

    hashtag
    Query

    counts

    Notes

    Notice that although U1 user added to cart twice, it still counted as one conversion in the first step, as we report on unique counts rather than total events. Also notice that although U2 events were logged out of order, we still counted the user as converted.

    Equivalence

    The above query is equivalent to the below presto SQL query:

    hashtag
    Settings

    For a large dataset we could use for example a theta_sketch strategy, or furthermore, partition the data by user_id and leverage a partitioned strategy. It is also important to filter in the where clause so to aggregate only necessary rows.

    counts

    hashtag
    Another Example

    We now want to learn how many users checkout after a text search; as opposed to other entry points such as browsing a product category listing. We want to then analyse the following funnel:

    • /product/search

    • /cart/add

    • /checkout/start

    hashtag
    Query

    counts

    Notes

    Notice that U1 is not counted in this funnel, as the user did not perform any product search. Both U2 and U3 entered the top of the funnel and performed the second step, but only U2 converted to the bottom of the funnel.

    U3

    2021-10-01 10:02:00.000

    /product/listing

    U3

    2021-10-01 10:05:00.000

    /product/search

    U2

    2021-10-01 10:06:00.000

    /product/search

    U2

    2021-10-01 10:15:00.000

    /checkout/start

    U2

    2021-10-01 10:16:00.000

    /cart/add

    U3

    2021-10-01 11:17:00.000

    /product/details

    U2

    2021-10-01 11:18:00.000

    /checkout/confirmation

    U3

    2021-10-01 11:21:00.000

    /cart/add

    U1

    2021-10-01 11:33:00.000

    /cart/add

    U1

    2021-10-01 11:46:00.000

    /checkout/start

    U1

    2021-10-01 11:54:00.000

    /checkout/confirmation

    /checkout/confirmation

    STEPS

    predicates 1...n

    (required) These are individual predicates representing funnel steps which are applied on rows selected by the where clause. Distinct values from the correlation_column that satisfy these predicates are counted per step. For example, all filtered rows that match url = '/checkout' are unionized into a set. Sets are intersected with the sets resulted from the preceding steps, each step retaining only individuals present in previous steps. Finally, unique counts are returned for each step in the funnel.

    CORRELATE_BY

    correlation_column

    (required) Column to leverage for funnel correlation, distinct values from this column are counted per step during aggregation. Only dictionary-encoded columns are supported.

    SETTINGS

    settings 1...n

    (optional) Settings to select and configure a funnel counting strategy:

    bitmap (default): This strategy is accurate for INT column, but approximate for other cases where hash codes are used in distinct counting and there may be hash collisions. For accurate distinct counting on all column types, use 'set' instead. See also DISTINCTCOUNTBITMAP.

    set: This strategy uses fastutilarrow-up-right hash sets. Use with care, unbounded memory cost. See also DISTINCTCOUNT.

    theta_sketch: This strategy leverages Theta Sketcharrow-up-right framework to provide an approximate funnel count with a small memory footprint. See also DISTINCTCOUNTTHETASKETCH.

    nominalEntries: theta-sketch strategy parameter (defaults to 4096). Can only be used in conjunction with theta_sketch setting.

    partitioned: This strategy counts funnel steps per segment, then sums up step counts across segments. Correlation column should be configured as partition column for this strategy. See also .

    sorted: This strategy counts funnel steps per segment with zero memory footprint. Correlation column should be configured as sort column for this strategy. Can only be used in conjunction with partitioned setting.

    U1

    2021-10-01 09:01:00.000

    /product/listing

    U2

    2021-10-01 09:17:00.000

    /product/search

    U1

    2021-10-01 09:33:00.000

    /product/details

    U1

    2021-10-01 09:47:00.000

    3, 2, 2

    3, 2, 2

    2, 2, 1, 1

    /cart/add

    select 
      FUNNEL_COUNT(
        STEPS(
          url = '/cart/add', 
          url = '/checkout/start', 
          url = '/checkout/confirmation'),
        CORRELATE_BY(user_id)
      ) AS counts
    from user_log 
    select 
       ARRAY[
         count_if(steps[1]),
         count_if(steps[1] and steps[2]),
         count_if(steps[1] and steps[2] and steps[3])
       ] as counts
     from (
       select 
         ARRAY[
           bool_or(url = '/cart/add'),
           bool_or(url = '/checkout/start'),
           bool_or(url = '/checkout/confirmation')
         ] as steps
       from user_log
       group by user_id
     )
    select 
      FUNNEL_COUNT(
        STEPS(
          url = '/cart/add', 
          url = '/checkout/start', 
          url = '/checkout/confirmation'),
        CORRELATE_BY(user_id),
        SETTINGS('theta_sketch', 'nominalEntries=4096')
      ) AS counts
    from user_log 
    where url in ('/cart/add', '/checkout/start', '/checkout/confirmation')
    select 
      FUNNEL_COUNT(
        STEPS(
          url = '/product/search',
          url = '/cart/add', 
          url = '/checkout/start', 
          url = '/checkout/confirmation'),
        CORRELATE_BY(user_id)
      ) AS counts
    from user_log 
    SEGMENTPARTITIONEDDISTINCTCOUNT