
Functions

This page contains reference documentation for functions in Apache Pinot.

ABS

This section contains reference documentation for the abs function.

Returns the absolute value of a number.

Signature

ABS(col1)

Usage Examples

select ABS(-12.1) AS value
from ignoreMe

value

12.1

select ABS(12.1) AS value
from ignoreMe

value

12.1

ADD

This section contains reference documentation for the ADD function.

Returns the sum of two or more values.

Signature

ADD(col1, col2, col3...)

Usage Examples

These examples are based on the Batch Quick Start.

select homeRuns, baseOnBalls, ADD(homeRuns, baseOnBalls) AS total
from baseballStats 
WHERE teamID = 'ML1' 
AND yearID = 1956 
AND playerName = 'Henry Louis'

homeRuns

baseOnBalls

total

arrayConcatInt

This section contains reference documentation for the arrayConcatInt function.

Concatenates two arrays of ints.

Signature

arrayConcatInt('colName1', 'colName2')

Usage Examples

These examples are based on the Hybrid Quick Start.

DivWheelsOffs

concatIds

arrayConcatString

This section contains reference documentation for the arrayConcatString function.

Concatenates two arrays of strings.

Signature

arrayConcatString('colName1', 'colName2')

Usage Examples

These examples are based on the Hybrid Quick Start.

select DivTailNums, 
       arrayConcatString(DivTailNums, DivTailNums) AS concatIds
from airlineStats 
WHERE arraylength(DivTailNums) >= 2
limit 5

DivTailNums

concatIds

N7713A,N7713A

N7713A,N7713A,N7713A,N7713A

N344AA,N344AA

N344AA,N344AA,N344AA,N344AA

N344AA,N344AA

N344AA,N344AA,N344AA,N344AA

N7713A,N7713A

N7713A,N7713A,N7713A,N7713A

arrayContainsInt

This section contains reference documentation for the arrayContainsInt function.

Checks if int value exists in array.

Signature

arrayContainsInt('colName', valueToFind)

Usage Examples

These examples are based on the Hybrid Quick Start.

select DivAirportIDs, 
       arrayContainsInt(DivAirportIDs, 14683) AS containsValue
from airlineStats 
WHERE arraylength(DivAirportIDs) >= 2
limit 5

DivAirportIDs

containsValue

13891,12892

false

14683,14683

true

12339,12339

false

13487,13930

false

13029,11292

false

arrayContainsString

This section contains reference documentation for the arrayContainsString function.

Checks if string value exists in array.

Signature

arrayContainsString('colName', valueToFind)

Usage Examples

These examples are based on the Hybrid Quick Start.

select DivTailNums, 
       arrayContainsString(DivTailNums, 'N7713A') AS index
from airlineStats 
WHERE arraylength(DivTailNums) >= 2
limit 5

DivTailNums

index

N7713A,N7713A

true

N344AA,N344AA

false

N7713A,N7713A

true

arrayDistinctString

This section contains reference documentation for the arrayDistinctString function.

Returns unique values in an array of strings.

Signature

arrayDistinctString('colName')

Usage Examples

These examples are based on the Hybrid Quick Start.

select DivTailNums, 
       arrayDistinctString(DivTailNums) AS unique
from airlineStats 
WHERE arraylength(DivTailNums) >= 2
limit 5

DivTailNums

unique

N7713A,N7713A

N7713A

N344AA,N344AA

N344AA

N344AA,N344AA

N344AA

N7713A,N7713A

N7713A
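As a rough model of what the examples above show, the following Python sketch removes repeated values from a list. It assumes the first occurrence of each value is kept in its original position; the examples only contain arrays of repeated values, so the ordering for mixed arrays is an assumption, and `array_distinct` is a hypothetical name, not a Pinot API.

```python
from typing import Hashable, List, Sequence, TypeVar

T = TypeVar("T", bound=Hashable)


def array_distinct(values: Sequence[T]) -> List[T]:
    """Drop repeated values, keeping the first occurrence of each one."""
    # dict preserves insertion order (Python 3.7+), so first-seen order survives
    return list(dict.fromkeys(values))


print(array_distinct(["N7713A", "N7713A"]))  # ['N7713A']
print(array_distinct([13891, 12892]))        # [13891, 12892]
```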

arrayDistinctInt

This section contains reference documentation for the arrayDistinctInt function.

Returns unique values in an array of ints.

Signature

arrayDistinctInt('colName')

Usage Examples

These examples are based on the Hybrid Quick Start.

select DivAirportIDs, 
       arrayDistinctInt(DivAirportIDs) AS unique
from airlineStats 
WHERE arraylength(DivAirportIDs) >= 2
limit 5

DivAirportIDs

unique

15016,11066

15016,11066

10620,14869

10620,14869

13891,12892

13891,12892

12264,10397

12264,10397

11066,12892

11066,12892

arrayIndexOfInt

This section contains reference documentation for the arrayIndexOfInt function.

Returns the zero-based index of the given value in the array, or -1 if the value is not present.

Signature

arrayIndexOfInt('colName', valueToFind)

Usage Examples

These examples are based on the Hybrid Quick Start.

select DivAirportIDs, 
       arrayIndexOfInt(DivAirportIDs, 14683) AS index
from airlineStats 
WHERE arraylength(DivAirportIDs) >= 2
limit 5

DivAirportIDs

index

13891,12892

-1

14683,14683

12339,12339

-1

13487,13930

-1

13029,11292

-1

arrayIndexOfString

This section contains reference documentation for the arrayIndexOfString function.

Returns the zero-based index of the given value in the array, or -1 if the value is not present.

Signature

arrayIndexOfString('colName', valueToFind)

Usage Examples

These examples are based on the Hybrid Quick Start.

select DivTailNums, 
       arrayIndexOfString(DivTailNums, 'N7713A') AS index
from airlineStats 
WHERE arraylength(DivTailNums) >= 2
limit 5

DivTailNums

index

N7713A,N7713A

N344AA,N344AA

-1

N7713A,N7713A
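The lookup semantics suggested by the examples (a match yields its position, a miss yields -1) can be modelled in Python as below. This is a sketch, not Pinot's implementation: it assumes zero-based positions and that the first matching position is returned, and `array_index_of` is a hypothetical helper name.

```python
from typing import Sequence, TypeVar

T = TypeVar("T")


def array_index_of(values: Sequence[T], value_to_find: T) -> int:
    """Zero-based position of the first match, or -1 if the value is absent."""
    for position, value in enumerate(values):
        if value == value_to_find:
            return position
    return -1


print(array_index_of([13891, 12892], 14683))           # -1
print(array_index_of(["N7713A", "N7713A"], "N7713A"))  # 0
```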

ARRAYLENGTH

This section contains reference documentation for the ARRAYLENGTH function.

Returns the length of a multi-value column

Signature

ARRAYLENGTH('colName')

Usage Examples

These examples are based on the Hybrid Quick Start.

select ARRAYLENGTH(RandomAirports) AS length, count(*) 
from airlineStats 
GROUP BY length
ORDER BY count(*) DESC
LIMIT 5

length

count(*)

5382

267

223

166

160

The count(*) values increase each time the query is run, because the Hybrid Quick Start continuously ingests data.

arrayRemoveInt

This section contains reference documentation for the arrayRemoveInt function.

Removes value from array of ints.

Signature

arrayRemoveInt('colName', value)

Usage Examples

These examples are based on the Hybrid Quick Start.

select DivAirportIDs, 
       arrayRemoveInt(DivAirportIDs, 12892) AS value
from airlineStats 
WHERE arraylength(DivAirportIDs) >= 2
AND arrayContainsInt(DivAirportIDs, 12892) = 1
limit 5

DivAirportIDs

value

13891,12892

13891

13198,12892

13198

11066,12892

11066

13198,12892

13198

13891,12892

13891

arrayRemoveString

This section contains reference documentation for the arrayRemoveString function.

Removes value from array of strings.

Signature

arrayRemoveString('colName', value)

Usage Examples

These examples are based on the Hybrid Quick Start.

select RandomAirports, 
       arrayRemoveString(RandomAirports, 'SEA') AS value
from airlineStats 
WHERE arraylength(RandomAirports) BETWEEN 2 AND 4
limit 5

RandomAirports

value

SEA,PSC

PSC

SEA,PSC,PHX,MSY

PSC,PHX,MSY

SEA,PSC,PHX,MSY

PSC,PHX,MSY

SEA,PSC

PSC

SEA,PSC

PSC

arrayReverseInt

This section contains reference documentation for the arrayReverseInt function.

Reverses array of ints.

Signature

arrayReverseInt('colName')

Usage Examples

These examples are based on the Hybrid Quick Start.

select DivAirportIDs, 
       arrayReverseInt(DivAirportIDs) AS reversedIds
from airlineStats 
WHERE arraylength(DivAirportIDs) >= 2
limit 5

DivAirportIDs

reversedIds

13891,12892

12892,13891

14683,14683

14683,14683

12339,12339

12339,12339

13487,13930

13930,13487

13029,11292

11292,13029

arrayReverseString

This section contains reference documentation for the arrayReverseString function.

Reverses array of strings.

Signature

arrayReverseString('colName')

Usage Examples

These examples are based on the Hybrid Quick Start.

select FlightNum, 
       arrayReverseString(RandomAirports) AS reversedAirports, 
       RandomAirports
from airlineStats 
WHERE arraylength(RandomAirports) BETWEEN 2 AND 4
limit 5

FlightNum

reversedAirports

RandomAirports

1206

PSC,SEA

SEA,PSC

5300

PSC,SEA

SEA,PSC

3359

MSY,PHX,PSC,SEA

SEA,PSC,PHX,MSY

1023

PHX,PSC,SEA

SEA,PSC,PHX

963

MSY,PHX,PSC,SEA

SEA,PSC,PHX,MSY

arraySliceInt

This section contains reference documentation for the arraySliceInt function.

Returns the values in the array from the start position (inclusive) to the end position (exclusive).

Signature

arraySliceInt('colName', start, end)

Usage Examples

These examples are based on the Hybrid Quick Start.

select FlightNum, 
       arraySliceInt(DivAirportIDs, 0, 1) AS airports, 
       DivAirportIDs
from airlineStats 
WHERE arraylength(DivAirportIDs) >= 2
limit 5

FlightNum

airports

DivAirportIDs

1531

13891

13891,12892

14683

14683,14683

829

12339

12339,12339

13198

13198,10721

548

10721

10721,12478

arraySliceString

This section contains reference documentation for the arraySliceString function.

Returns the values in the array from the start position (inclusive) to the end position (exclusive).

Signature

arraySliceString('colName', start, end)

Usage Examples

These examples are based on the Hybrid Quick Start.

select FlightNum, 
       arraySliceString(RandomAirports, 0, 2) AS airports, 
       RandomAirports
from airlineStats 
WHERE arraylength(RandomAirports) BETWEEN 2 AND 4
limit 5

FlightNum

airports

RandomAirports

671

SEA,PSC

SEA,PSC,PHX,MSY

1767

SEA,PSC

SEA,PSC,PHX

2522

SEA,PSC

424

SEA,PSC

SEA,PSC,PHX,MSY

3162

SEA,PSC

SEA,PSC,PHX,MSY
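Both examples behave like a half-open range: arraySliceInt(DivAirportIDs, 0, 1) yields one value and arraySliceString(RandomAirports, 0, 2) yields two. A Python slice follows the same start-inclusive, end-exclusive convention, as this sketch shows. `array_slice` is a hypothetical name, and Pinot's handling of negative or out-of-range positions is not modelled here (Python silently clamps them).

```python
from typing import List, TypeVar

T = TypeVar("T")


def array_slice(values: List[T], start: int, end: int) -> List[T]:
    """Elements from position start (inclusive) up to end (exclusive)."""
    return values[start:end]


print(array_slice([13891, 12892], 0, 1))                # [13891]
print(array_slice(["SEA", "PSC", "PHX", "MSY"], 0, 2))  # ['SEA', 'PSC']
```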

arraySortInt

This section contains reference documentation for the arraySortInt function.

Sorts array of ints.

Signature

arraySortInt('colName')

Usage Examples

These examples are based on the Hybrid Quick Start.

select DivAirportIDs, 
       arraySortInt(DivAirportIDs) AS sortedIds
from airlineStats 
WHERE arraylength(DivAirportIDs) >= 2
limit 5

DivAirportIDs

sortedIds

13891,12892

12892,13891

14683,14683

14683,14683

12339,12339

12339,12339

13198,10721

10721,13198

10721,12478

10721,12478

arraySortString

This section contains reference documentation for the arraySortString function.

Sorts array of strings.

Signature

arraySortString('colName')

Usage Examples

These examples are based on the Hybrid Quick Start.

FlightNum

sortedAirports

RandomAirports

arrayUnionInt

This section contains reference documentation for the arrayUnionInt function.

Creates a union of two arrays of ints.

Signature

arrayUnionInt('colName1', 'colName2')

Usage Examples

These examples are based on the Hybrid Quick Start.

select DivWheelsOffs, 
       DivWheelsOns,
       arrayUnionInt(DivWheelsOffs, DivWheelsOns) AS unionIds
from airlineStats 
WHERE arraylength(DivWheelsOffs) >= 2
limit 5

DivWheelsOffs

DivWheelsOns

unionIds

1453,1731

1415,1623

1453,1731,1415,1623

1908,1758

1339,2310

1908,1758,1339,2310

1453,1731

1415,1623

1453,1731,1415,1623

1908,1758

1339,2310

1908,1758,1339,2310

arrayUnionString

This section contains reference documentation for the arrayUnionString function.

Creates a union of two arrays of strings.

Signature

arrayUnionString('colName1', 'colName2')

Usage Examples

These examples are based on the Hybrid Quick Start.

select DivTailNums, 
       DivAirports,
       arrayUnionString(DivTailNums, DivAirports) AS unionIds
from airlineStats 
WHERE arraylength(DivTailNums) >= 2
limit 5

DivTailNums

DivAirports

unionIds

N7713A,N7713A

IND,IND

N7713A,IND

N344AA,N344AA

MCI,BOS

N344AA,MCI,BOS

N7713A,N7713A

IND,IND

N7713A,IND

N344AA,N344AA

MCI,BOS

N344AA,MCI,BOS
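Unlike arrayConcatString, which keeps every element (N7713A,N7713A concatenated with itself gives four entries), the union examples drop repeated values. A minimal Python model of that difference, assuming elements keep their first-seen order as in the outputs above; `array_union` and `array_concat` are hypothetical names.

```python
from typing import Hashable, List, Sequence, TypeVar

T = TypeVar("T", bound=Hashable)


def array_union(first: Sequence[T], second: Sequence[T]) -> List[T]:
    """Concatenate both arrays, then keep only the first occurrence of each value."""
    return list(dict.fromkeys([*first, *second]))


def array_concat(first: Sequence[T], second: Sequence[T]) -> List[T]:
    """Plain concatenation for comparison: duplicates are kept."""
    return [*first, *second]


print(array_union(["N7713A", "N7713A"], ["IND", "IND"]))
# ['N7713A', 'IND']
print(array_concat(["N7713A", "N7713A"], ["N7713A", "N7713A"]))
# ['N7713A', 'N7713A', 'N7713A', 'N7713A']
```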

AVGMV

This section contains reference documentation for the AVGMV function.

Returns the average of all values in a multi-value column across the rows in a group.

Signature

AVGMV(colName)

Usage Examples

These examples are based on the Hybrid Quick Start.

select AVGMV(DivLongestGTimes) AS value
from airlineStats 
where arraylength(DivLongestGTimes) > 1

value

18.465753424657535
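To make the aggregation explicit, here is a small Python model under the assumption that AVGMV pools every value from every row, dividing the total sum by the total number of values, rather than averaging per-row averages. The rows are made up for illustration, and `avg_mv` is a hypothetical name.

```python
from typing import Sequence


def avg_mv(rows: Sequence[Sequence[float]]) -> float:
    """Pooled average: total of all values divided by the total number of values."""
    total = sum(sum(row) for row in rows)
    count = sum(len(row) for row in rows)
    return total / count


rows = [[10, 20], [30]]  # made-up multi-value rows
print(avg_mv(rows))  # 20.0, whereas averaging the per-row averages would give 22.5
```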

ceil

This section contains reference documentation for the CEIL function.

Rounds a value up to the nearest integer.

Signature

CEIL(col1)

Usage Examples

select CEIL(12.1) AS value
from ignoreMe

value

13

select CEIL(-12.1) AS value
from ignoreMe

value

-12

CHR

This section contains reference documentation for the CHR function.

Returns the character corresponding to the given Unicode code point.

Signature

CHR(codepoint)

Usage Examples

SELECT CHR(65) AS value
FROM ignoreMe

value

A

codepoint

This section contains reference documentation for the CODEPOINT function.

Returns the Unicode code point of the first character of the string.

Signature

CODEPOINT(col)

Usage Examples

SELECT CODEPOINT('Apache Pinot') AS value
FROM ignoreMe

value

65
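Python's built-in chr and ord perform the same mapping between characters and Unicode code points, which makes them a handy way to check expected CHR and CODEPOINT results. The wrapper names below are hypothetical.

```python
def chr_of(code_point: int) -> str:
    """Character for a Unicode code point, like CHR."""
    return chr(code_point)


def codepoint(text: str) -> int:
    """Code point of the first character of the string, like CODEPOINT."""
    return ord(text[0])


print(chr_of(65))                 # A
print(codepoint("Apache Pinot"))  # 65
```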

concat

This section contains reference documentation for the concat function.

Concatenates two input strings, joined by the separator.

Signature

CONCAT(col1, col2, separator)

Usage Examples

SELECT concat('Apache', 'Pinot', ' ') AS value
FROM ignoreMe

value

Apache Pinot

SELECT concat('real-time', 'analytics', '__') AS value
FROM ignoreMe

value

real-time__analytics

count

This section contains reference documentation for the count function.

Returns the count of rows in a group.

Signature

COUNT(colName)

Usage Examples

These examples are based on the Batch Quick Start.

select count(*) AS value
from baseballStats

value

97889

COUNTMV

This section contains reference documentation for the COUNTMV function.

Returns the total number of values in a multi-value column across the rows in a group.

Signature

COUNTMV(colName)

Usage Examples

These examples are based on the Hybrid Quick Start.

The following query returns the documents that have a DivTailNums with more than one value:

DivTailNums

You can count the number of items in these rows by running the following query:

day

This section contains reference documentation for the day function.

Returns the day of the month from the given epoch millis in UTC or specified timezone. The value ranges from 1 to 31.

Signature

day(tsInMillis)
day(tsInMillis, timeZoneId)
dayOfMonth(tsInMillis)
dayOfMonth(tsInMillis, timeZoneId)

Usage Examples

select day(1639351800000) AS day
FROM ignoreMe

day

12

select day(1639351800000, 'CET') AS day
FROM ignoreMe

day

13

select dayOfMonth(1639351800000) AS day
FROM ignoreMe

day

12

select dayOfMonth(1639351800000, 'CET') AS day
FROM ignoreMe

day

13
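The example timestamp sits close to midnight, which is why the time zone changes the result. This Python sketch recomputes the day of the month; it models CET as a fixed UTC+1 offset, which is valid for this December date but ignores daylight saving time, and `day_of_month` is a hypothetical name.

```python
from datetime import datetime, timedelta, timezone

TS_MILLIS = 1639351800000                  # 2021-12-12 23:30:00 UTC
CET_WINTER = timezone(timedelta(hours=1))  # assumption: fixed UTC+1, no DST


def day_of_month(ts_millis: int, tz: timezone = timezone.utc) -> int:
    """Day of the month (1-31) of an epoch-millis timestamp in the given zone."""
    return datetime.fromtimestamp(ts_millis / 1000, tz).day


print(day_of_month(TS_MILLIS))              # 12
print(day_of_month(TS_MILLIS, CET_WINTER))  # 13 (00:30 on 2021-12-13 in CET)
```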

dayOfWeek

This section contains reference documentation for the dayOfWeek function.

Returns the day of the week from the given epoch millis in UTC or specified timezone. The value ranges from 1 (Monday) to 7 (Sunday).

Signature

dayOfWeek(tsInMillis)
dayOfWeek(tsInMillis, timeZoneId)
dow(tsInMillis)
dow(tsInMillis, timeZoneId)

Usage Examples

select dayOfWeek(1639351800000) AS dayOfWeek
FROM ignoreMe

dayOfWeek

7

select dayOfWeek(1639351800000, 'CET') AS dayOfWeek
FROM ignoreMe

dayOfWeek

1

select dow(1639351800000) AS dayOfWeek
FROM ignoreMe

dayOfWeek

7

select dow(1639351800000, 'CET') AS dayOfWeek
FROM ignoreMe

dayOfWeek

1
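The example timestamp falls on a Sunday in UTC and on a Monday in CET. Python's isoweekday() uses the same 1 (Monday) to 7 (Sunday) numbering described above, so it can serve as a quick cross-check. The sketch models CET as a fixed UTC+1 offset (an assumption that only holds outside daylight saving time), and `day_of_week` is a hypothetical name.

```python
from datetime import datetime, timedelta, timezone

TS_MILLIS = 1639351800000                  # 2021-12-12 23:30:00 UTC, a Sunday
CET_WINTER = timezone(timedelta(hours=1))  # assumption: fixed UTC+1, no DST


def day_of_week(ts_millis: int, tz: timezone = timezone.utc) -> int:
    """ISO day of the week: 1 = Monday ... 7 = Sunday."""
    return datetime.fromtimestamp(ts_millis / 1000, tz).isoweekday()


print(day_of_week(TS_MILLIS))              # 7
print(day_of_week(TS_MILLIS, CET_WINTER))  # 1
```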

dayOfYear

This section contains reference documentation for the dayOfYear function.

Returns the day of the year from the given epoch millis in UTC or specified timezone. The value ranges from 1 to 366.

Signature

dayOfYear(tsInMillis)
dayOfYear(tsInMillis, timeZoneId)
doy(tsInMillis)
doy(tsInMillis, timeZoneId)

Usage Examples

dayOfYear
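No example query is shown for dayOfYear, so the following Python sketch illustrates the computation rather than reproducing Pinot output. It reuses the timestamp 1639351800000 from the day and dayOfWeek examples and assumes a fixed UTC+1 offset for CET; `day_of_year` is a hypothetical name.

```python
from datetime import datetime, timedelta, timezone

TS_MILLIS = 1639351800000                  # 2021-12-12 23:30:00 UTC
CET_WINTER = timezone(timedelta(hours=1))  # assumption: fixed UTC+1, no DST


def day_of_year(ts_millis: int, tz: timezone = timezone.utc) -> int:
    """Day of the year (1-366) of an epoch-millis timestamp in the given zone."""
    return datetime.fromtimestamp(ts_millis / 1000, tz).timetuple().tm_yday


print(day_of_year(TS_MILLIS))              # 346
print(day_of_year(TS_MILLIS, CET_WINTER))  # 347
```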

DISTINCT

This section contains reference documentation for the DISTINCT function.

Returns the distinct row values in a group

Signature

DISTINCT(colName)

Usage Examples

These examples are based on the Batch Quick Start.

select DISTINCT league AS value
from baseballStats

value

select DISTINCT(league) AS value
from baseballStats

value

DISTINCTCOUNT

This section contains reference documentation for the DISTINCTCOUNT function.

Returns the count of distinct row values in a group

Signature

DISTINCTCOUNT(colName)

Usage Examples

These examples are based on the Batch Quick Start.

select DISTINCTCOUNT(league) AS value
from baseballStats

value

select DISTINCTCOUNT(teamID) AS value
from baseballStats

value

149

DISTINCTCOUNTBITMAP

This section contains reference documentation for the DISTINCTCOUNTBITMAP function.

Returns the count of distinct row values in a group. This function is accurate for INT columns, but approximate for other column types, where hash codes are used for distinct counting and hash collisions are possible. For accurate distinct counting on all column types, see DISTINCTCOUNT.

Signature

DISTINCTCOUNTBITMAP(colName)

Usage Examples

These examples are based on the Batch Quick Start.

select DISTINCTCOUNTBITMAP(league) AS value
from baseballStats

value

select DISTINCTCOUNTBITMAP(teamID) AS value
from baseballStats

value

148

DISTINCTCOUNTHLLMV

This section contains reference documentation for the DISTINCTCOUNTHLLMV function.

Returns an approximate distinct count of the values in a multi-value column, computed with HyperLogLog, across the rows in a group.

Signature

DISTINCTCOUNTHLLMV(colName)

Usage Examples

These examples are based on the Hybrid Quick Start.

select DISTINCTCOUNTHLLMV(DivLongestGTimes) AS value
from airlineStats 
where arraylength(DivLongestGTimes) > 1

value

DISTINCTCOUNTHLL

This section contains reference documentation for the DISTINCTCOUNTHLL function.

Returns an approximate distinct count using HyperLogLog. It also takes an optional second argument to configure the log2m for the HyperLogLog. For accurate distinct counting, see DISTINCTCOUNT.

Signature

DISTINCTCOUNTHLL(colName, log2m)

Usage Examples

These examples are based on the Batch Quick Start.

select DISTINCTCOUNTHLL(teamID) AS value
from baseballStats

value

158

select DISTINCTCOUNTHLL(teamID, 12) AS value
from baseballStats

value

149
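To show what the optional log2m argument controls, here is a minimal textbook HyperLogLog in Python. It keeps 2^log2m registers, each recording the largest position of the leftmost 1-bit seen among the hashes routed to it, so the relative standard error shrinks roughly as 1.04/sqrt(2^log2m). This is a sketch of the general algorithm, not Pinot's implementation: the BLAKE2b hash, the linear-counting correction for small counts, and the function names are choices made for illustration. The effect of more registers is consistent with the examples above, where DISTINCTCOUNTHLL(teamID, 12) returns 149, the same value DISTINCTCOUNT(teamID) returns.

```python
import hashlib
import math
from typing import Hashable, Iterable


def _hash64(value: Hashable) -> int:
    """64-bit hash of the value's string form (BLAKE2b chosen only for illustration)."""
    digest = hashlib.blake2b(str(value).encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def hll_estimate(values: Iterable[Hashable], log2m: int = 8) -> float:
    """Textbook HyperLogLog estimate; assumes log2m >= 7 so the alpha formula applies."""
    m = 1 << log2m                            # number of registers
    width = 64 - log2m                        # hash bits left after choosing a register
    registers = [0] * m
    for v in values:
        h = _hash64(v)
        index = h >> width                    # top log2m bits select the register
        rest = h & ((1 << width) - 1)         # remaining bits
        rank = width - rest.bit_length() + 1  # 1-based position of the leftmost 1-bit
        registers[index] = max(registers[index], rank)

    alpha = 0.7213 / (1 + 1.079 / m)          # bias correction constant for m >= 128
    raw = alpha * m * m / sum(2.0 ** -r for r in registers)
    zeros = registers.count(0)
    if raw <= 2.5 * m and zeros > 0:
        return m * math.log(m / zeros)        # linear counting for small cardinalities
    return raw


values = range(10_000)
for log2m in (8, 12):
    est = hll_estimate(values, log2m)
    print(log2m, round(est), f"typical error ~{1.04 / math.sqrt(1 << log2m):.1%}")
```

Because each register only keeps a maximum, adding a value twice never changes the estimate; that is what makes HLL states mergeable across segments.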

DISTINCTCOUNTBITMAPMV

This section contains reference documentation for the DISTINCTCOUNTBITMAPMV function.

Returns the count of distinct row values in a group. This function is accurate for INT or dictionary-encoded columns, but approximate for other cases, where hash codes are used for distinct counting and hash collisions are possible.

Signature

DISTINCTCOUNTBITMAPMV(colName)

Usage Examples

These examples are based on the Hybrid Quick Start.

select DISTINCTCOUNTBITMAPMV(DivLongestGTimes) AS value
from airlineStats 
where arraylength(DivLongestGTimes) > 1

value

select DISTINCTCOUNTBITMAPMV(DivTailNums) AS value
from airlineStats 
where arraylength(DivTailNums) > 1

value

DISTINCTCOUNTMV

This section contains reference documentation for the DISTINCTCOUNTMV function.

Returns the count of distinct values in a multi-value column across the rows in a group.

Signature

DISTINCTCOUNTMV(colName)

Usage Examples

These examples are based on the Hybrid Quick Start.

The following query returns the documents that have a DivTailNums with more than one value:

DivTailNums

You can count the distinct number of items in these rows by running the following query:

DISTINCTCOUNTRAWHLL

This section contains reference documentation for the DISTINCTCOUNTRAWHLL function.

Returns the HyperLogLog response serialized as a hex string. The serialized HLL can be converted back into an HLL and then aggregated with other HLLs. Common use cases include merging HLL responses from different Pinot tables and aggregating after client-side batching.

Signature

DISTINCTCOUNTRAWHLL(colName, log2m)

Usage Examples

These examples are based on the Batch Quick Start.

select DISTINCTCOUNTRAWHLL(teamID) AS value
from baseballStats

value

00000008000000ac00000800000084000210000000000020001020220030042002100420002010020210000300008020040180400001300310001863024004220870800004400421040104610220080000020000040000030000800002108420000110400800000106000060000080020000082000218c0002000000020000010200100000018c0006000400022004a0000088000200800000320820021000000221842000000000025088000220080100009420

select DISTINCTCOUNTRAWHLL(teamID, 1) AS value
from baseballStats

value

000000010000000400000106

DISTINCTCOUNTRAWHLLMV

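This section contains reference documentation for the DISTINCTCOUNTRAWHLLMV function.

Returns the HyperLogLog response for a multi-value column, serialized as a hex string. The serialized HLL can be converted back into an HLL and aggregated with other HLLs.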

Signature

DISTINCTCOUNTRAWHLLMV(colName, log2m)

Usage Examples

These examples are based on the Hybrid Quick Start.

select DISTINCTCOUNTRAWHLLMV(DivAirports) AS value
from airlineStats 
where arraylength(DivAirports) > 1

value

00000008000000ac00000000000000000000000500000020000000000030000202108000040000010000000300010400000000000000000000000463000000000000000000010001041000200000002000000000000000000a00000000028001000000010800000000010000001008000000804000000000020000040000880000000000000000000000000000000000000000000000800000000800020004000000840000000002000000000000000000001400

select DISTINCTCOUNTRAWHLLMV(DivAirports, 1) AS value
from airlineStats 
where arraylength(DivAirports) > 1

value

0000000100000004000000e4

DISTINCTCOUNTRAWTHETASKETCH

This section contains reference documentation for the DISTINCTCOUNTRAWTHETASKETCH function.

The Theta Sketch framework enables set operations over a stream of data, and can also be used for cardinality estimation. Pinot leverages the Sketch Class and its extensions from the library org.apache.datasketches:datasketches-java:1.2.0-incubating to perform distinct counting as well as evaluating set operations.

Signature

DISTINCTCOUNTRAWTHETASKETCH(<thetaSketchColumn>, <thetaSketchParams>, predicate1, predicate2..., postAggregationExpressionToEvaluate) -> Base64-encoded sketch

thetaSketchColumn (required): Name of the column to aggregate on.
thetaSketchParams (optional): Parameters for constructing the intermediate theta sketches.
- Currently, the only supported parameter is nominalEntries (defaults to 4096).
predicates (optional): Individual predicates of the form lhs <op> rhs, applied to the rows selected by the WHERE clause. During intermediate sketch aggregation, the sketches from thetaSketchColumn that satisfy each predicate are unioned individually. For example, all filtered rows that match country=USA are unioned into a single sketch. Complex predicates that combine individual predicates with AND/OR are supported.
postAggregationExpressionToEvaluate (required when predicates are supplied): The set operation to perform on the intermediate sketches of the predicates, which are referenced as $1, $2, and so on. Supported operations are SET_DIFF, SET_UNION, and SET_INTERSECT; SET_DIFF takes exactly two arguments, while SET_UNION and SET_INTERSECT accept two or more.

Usage Examples

These examples are based on the Batch Quick Start.

select distinctCountRawThetaSketch(teamID) AS value
from baseballStats

value

AgMDAAAKzJOVAAAAAACAPwDAATjfLK5fBJQy2rIU1GYLOK5a09G+XQ1UHWt00/NwFTC4EwzexhE3CHBSU+YIUzkM0goIADEeFViAmzCRcx5FeHrMHfGsU/qrFvMP+Q87UYRC7LFzZ0FV3PIfAF1FMFsM+E9XRwZRYoR79VdK7z1jAD/WClziDmb4Cosm3ctidcRl9VxfNTR47OUFqFP4dYQkZwXIEZtEhngdkGfqkQCKZPX85HITAZrwVDpI4TY6paDTZwLQNiemHFCUlEZCKcOMpkXuYypOxjzXi1ES+07IIH7EqrQeKcssHvOUh2gpzIDajYdQ4UTS6IBoXPB6AtbomPBiMalFURDzh+xppzrg5HcUTMW4Iuzgv5Mz/xIm73yOe7seghzwmH+zXUfda/mkaBqU6XQEAQFagTkndhYHHcjLb0OeQg4BGDAHtRIDD8EqsonkilQT6TZq2uM3CRXJQTlaYewzFvHsKivVomgcQRojVnPKBh0d0GgYeF4eIEXtD1bZTw43eVR1Dk6sBj3pjleOW21dRsUCRmyEDGdIfWQVJXouaUnZqaC9gi1oSrG7GT8HO2xXeb32OzfiHVx5s9+5bGpFXoXTu1n7g2Jone8JMyGuam2x7Bt55a1JdtFCFxhZ2Gd7IajHY4lNBH2lDfUoJed4f7kGUEXmlU6BCfwOkJ1CIoWBTQY+NToDhpmmmPY+rVOH5coybBHlH4vpfPBbbQsOjl0YBSC9uEmZ3WubqnV0KZ1p5d7wq/F0p7Wgo8y4JVXAobKCB+hsVckBNIA4XrYMzdWVSWeQsXHSuR+mWmJPftadyrMlfvoy2mVr8R4Dih7k3XNhXZwjBeuNJQA5Dtci6w0uIUczvEL+nY+9CSHEPQhuT//aluJ2De4Fk94cfWgaxqhYyh10TTIWZFmsDxJeOMaPT1BCwVRF6taOjftNbVDC5Fy1BtVzVIIUOGeBcj5VbhHtqowIB1qGEDIJy9ZBXD73iFBN5kVgvicaFGSKHGQqeIVsgOFdcFKITQTuV2d0pkljkPXKUIc68M0KPpU6iZYuaBA4+hGR9nri0tVnbJZOM1Z/fi01ou5YLYCoHTqkImozpJMYXLCqKtTBm2o7sc5oQATXUBC9dqM8xQoGL8OmltUWc1cX35rtD2D2zHL2IncEKMzsN/c6S31W74VTBtcbJfP9rHENp7yO453qYhA7m++jl2MKFzdvtkHqGDUcs9FKisV9Hx+ruhaGsLkdISszkZ3sYykjx3NH6BbbaCZf9jTswuxHKheTbaEDmSgrx7BfK+Z2My4jdMqCrEtKMSuJqEJ22AM5U8MNFVkCPTobkCEdJx0ZQJu+Tk73t1v3nqLUQH4PbFJzcUrr9yZFZ0u+1mzNNQ5o0w+v1dSRLGsXsPyRqGkQchuz/DKyrjJzf9Vb8HY4Ni63XiaXwgJrjq9rgAp6EmWV2xXUOI9CWZa7HsuRWO95m58nIq9K8VCkO+T/rWwrPqZ/tCgEtkshqecNhszQiki0d5Kf26o/YcATx4ZkJ655y4PTVr+kY0Xbb/UwEo2pPd3Hyd4hVz1I5N9TpYaJk2Lok1+7N+3LG+3Lj3KZtd5/+j8RujEmogI=

select distinctCountRawThetaSketch(teamID, 'nominalEntries=10') AS value
from baseballStats

value

AwMDAAAKzJMQAAAAAACAP4vpfPBbbQsO5N1zYV2cIwWFgU0GPjU6A4Z4HZBn6pEAyQE0gDhetgyKZPX85HITAQ4BGDAHtRIDEDub76OXYwoxK4moQnbYA9LogGhc8HoCE+k2atrjNwlVbhHtqowIBzd5VHUOTqwG+aRoGpTpdAT6PxG6MSaiAnshqMdjiU0EHEEaI1ZzygY=

We can also provide predicates and a post aggregation expression to compute more complicated cardinalities:

select distinctCountRawThetaSketch(
  yearID, 
  'nominalEntries=4096', 
  'teamID = ''SFN'' AND numberOfGames=28 AND homeRuns=1',
  'teamID = ''CHN'' AND numberOfGames=28 AND homeRuns=1',
  'SET_INTERSECT($1, $2)'
) AS value
from baseballStats

value

AQMDAAA6zJN8QPYIsvHMNQ==

DISTINCTCOUNTTHETASKETCH

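This section contains reference documentation for the DISTINCTCOUNTTHETASKETCH function.

Returns an approximate distinct count computed with a Theta Sketch. It accepts the same arguments as DISTINCTCOUNTRAWTHETASKETCH, but returns the estimated count instead of the serialized sketch.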

Signature

DistinctCountThetaSketch(<thetaSketchColumn>, <thetaSketchParams>, predicate1, predicate2..., postAggregationExpressionToEvaluate) -> Long

thetaSketchColumn (required): Name of the column to aggregate on.
thetaSketchParams (optional): Parameters for constructing the intermediate theta sketches.
- Currently, the only supported parameter is nominalEntries (defaults to 4096).
predicates (optional): Individual predicates of the form lhs <op> rhs, applied to the rows selected by the WHERE clause. During intermediate sketch aggregation, the sketches from thetaSketchColumn that satisfy each predicate are unioned individually. For example, all filtered rows that match country=USA are unioned into a single sketch. Complex predicates that combine individual predicates with AND/OR are supported.
postAggregationExpressionToEvaluate (required when predicates are supplied): The set operation to perform on the intermediate sketches of the predicates, which are referenced as $1, $2, and so on. Supported operations are SET_DIFF, SET_UNION, and SET_INTERSECT; SET_DIFF takes exactly two arguments, while SET_UNION and SET_INTERSECT accept two or more.

Usage Examples

These examples are based on the Batch Quick Start.

select distinctCountThetaSketch(teamID) AS value
from baseballStats

value

149

select distinctCountThetaSketch(teamID, 'nominalEntries=10') AS value
from baseballStats

value

146

We can also provide predicates and a post-aggregation expression to compute more complicated cardinalities. For example, we can find the size of the intersection of the results of the following queries:

select yearID
from baseballStats
where teamID = 'SFN' AND numberOfGames = 28 AND homeRuns = 1

yearID

1986

1985

select yearID
from baseballStats
where teamID = 'CHN' AND numberOfGames = 28 AND homeRuns = 1

yearID

1937

2003

1979

1900

1986

1978

2012

The only yearID the two results have in common is 1986.

We can compute the size of this intersection directly by running the following query:

select distinctCountThetaSketch(
  yearID, 
  'nominalEntries=4096', 
  'teamID = ''SFN'' AND numberOfGames=28 AND homeRuns=1',
  'teamID = ''CHN'' AND numberOfGames=28 AND homeRuns=1',
  'SET_INTERSECT($1, $2)'
) AS value
from baseballStats

value

1
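The arithmetic behind this query can be checked with exact Python sets built from the two yearID lists above. This models the set operations only, not the sketches: with nominalEntries=4096 and sets this small, a theta sketch is expected to retain every value, so its estimates should match these exact counts.

```python
# yearID results of the two predicate queries above
sfn_years = {1986, 1985}
chn_years = {1937, 2003, 1979, 1900, 1986, 1978, 2012}

intersect = sfn_years & chn_years  # SET_INTERSECT($1, $2)
union = sfn_years | chn_years      # SET_UNION($1, $2)
diff = sfn_years - chn_years       # SET_DIFF($1, $2): in $1 but not in $2

print(sorted(intersect), len(intersect))  # [1986] 1
print(len(union), sorted(diff))           # 8 [1985]
```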

DIV

This section contains reference documentation for the DIV function.

Returns the quotient of two values.

Signature

DIV(col1, col2)

Usage Examples

These examples are based on the Batch Quick Start.

select homeRuns, numberOfGames, DIV(homeRuns, numberOfGames) AS total
from baseballStats 
WHERE teamID = 'ML1' 
AND yearID = 1956 
AND playerName = 'Henry Louis'

homeRuns

numberOfGames

total

26

153

0.16993464052287582

DATETIMECONVERT

This section contains reference documentation for the DATETIMECONVERT function.

Converts the value from a column that contains an epoch timestamp into another time unit and buckets based on the given time granularity.

Signature

DATETIMECONVERT(columnName, inputFormat, outputFormat, outputGranularity)

inputFormat and outputFormat are defined using the following structure:

<time size>:<time unit>:<time format>:<pattern>

where:

time size - size of the time unit, e.g. 1, 10
time unit - DAYS, HOURS, MINUTES, SECONDS, MILLISECONDS, MICROSECONDS, NANOSECONDS
time format - one of:
- EPOCH
- SIMPLE_DATE_FORMAT
pattern - defined only for SIMPLE_DATE_FORMAT, e.g. yyyy-MM-dd. A specific timezone can be passed using tz(timezone). The timezone can be given as a long or short identifier, e.g. Asia/Kolkata or PDT.

granularity is specified in the format <time size>:<time unit>.

Usage Examples

These examples are based on the Batch JSON Quick Start.

created_at_timestamp from milliseconds since epoch to days since epoch, bucketed to 1 day granularity:

select id, 
       created_at_timestamp, 
       cast(created_at_timestamp AS long) AS timeInMs,
       DATETIMECONVERT(
         created_at_timestamp, 
         '1:MILLISECONDS:EPOCH', 
         '1:DAYS:EPOCH', 
         '1:DAYS'
       ) AS convertedTime
from githubEvents
WHERE id = 7044874109

id

created_at_timestamp

timeInMs

convertedTime

7044874109

2018-01-01 11:00:00.0

1514804402000

17532

created_at_timestamp bucketed to 15 minutes granularity:

select id, 
       created_at_timestamp, 
       cast(created_at_timestamp AS long) AS timeInMs,
       DATETIMECONVERT(
         created_at_timestamp, 
         '1:MILLISECONDS:EPOCH', 
         '1:MILLISECONDS:EPOCH', 
         '15:MINUTES'
       ) AS convertedTime
from githubEvents
WHERE id = 7044874109

id

created_at_timestamp

timeInMs

convertedTime

7044874109

2018-01-01 11:00:00.0

1514804402000

1514804400000

created_at_timestamp to format yyyy-MM-dd, bucketed to 1 day granularity:

select id, 
       created_at_timestamp, 
       cast(created_at_timestamp AS long) AS timeInMs,
       DATETIMECONVERT(
         created_at_timestamp, 
         '1:MILLISECONDS:EPOCH', 
         '1:DAYS:SIMPLE_DATE_FORMAT:yyyy-MM-dd', 
         '1:DAYS'
       ) AS convertedTime
from githubEvents
WHERE id = 7044874109

id

created_at_timestamp

timeInMs

convertedTime

7044874109

2018-01-01 11:00:00.0

1514804402000

2018-01-01

created_at_timestamp to format yyyy-MM-dd HH:mm, in timezone Pacific/Kiritimati:

select id, 
       created_at_timestamp, 
       cast(created_at_timestamp AS long) AS timeInMs,
       DATETIMECONVERT(
         created_at_timestamp, 
         '1:MILLISECONDS:EPOCH', 
         '1:MILLISECONDS:SIMPLE_DATE_FORMAT:yyyy-MM-dd HH:mm tz(Pacific/Kiritimati)', 
         '1:MILLISECONDS'
       ) AS convertedTime
from githubEvents
WHERE id = 7044874109

id

created_at_timestamp

timeInMs

convertedTime

7044874109

2018-01-01 11:00:00.0

1514804402000

2018-01-02 01:00

created_at_timestamp to format yyyy-MM-dd, in timezone Pacific/Kiritimati and bucketed to 1 day granularity:

select id, 
       created_at_timestamp, 
       cast(created_at_timestamp AS long) AS timeInMs,
       DATETIMECONVERT(
         created_at_timestamp, 
         '1:MILLISECONDS:EPOCH', 
         '1:MILLISECONDS:SIMPLE_DATE_FORMAT:yyyy-MM-dd HH:mm tz(Pacific/Kiritimati)', 
         '1:DAYS'
       ) AS convertedTime
from githubEvents
WHERE id = 7044874109

id

created_at_timestamp

timeInMs

convertedTime

7044874109

2018-01-01 11:00:00.0

1514804402000

2018-01-02 00:00
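The results above follow from plain integer arithmetic on the epoch value, sketched here in Python. Pacific/Kiritimati is modelled as a fixed UTC+14 offset (it observes no daylight saving time), and Python strftime codes stand in for the Java-style yyyy-MM-dd HH:mm patterns; neither is how Pinot implements the function.

```python
from datetime import datetime, timedelta, timezone

TS_MILLIS = 1514804402000  # cast(created_at_timestamp AS long) from the examples
MILLIS_PER_DAY = 24 * 60 * 60 * 1000
MILLIS_PER_15_MIN = 15 * 60 * 1000
KIRITIMATI = timezone(timedelta(hours=14))  # fixed offset, no DST

# '1:MILLISECONDS:EPOCH' -> '1:DAYS:EPOCH' with granularity '1:DAYS'
days_since_epoch = TS_MILLIS // MILLIS_PER_DAY

# '1:MILLISECONDS:EPOCH' -> '1:MILLISECONDS:EPOCH' with granularity '15:MINUTES'
bucket_15_min = TS_MILLIS - TS_MILLIS % MILLIS_PER_15_MIN

utc = datetime.fromtimestamp(TS_MILLIS / 1000, timezone.utc)
local = datetime.fromtimestamp(TS_MILLIS / 1000, KIRITIMATI)

day_utc = utc.strftime("%Y-%m-%d")                    # yyyy-MM-dd
minute_kiritimati = local.strftime("%Y-%m-%d %H:%M")  # yyyy-MM-dd HH:mm
# Granularity '1:DAYS' in Kiritimati: truncate to local midnight before formatting
day_kiritimati = local.replace(hour=0, minute=0, second=0, microsecond=0).strftime("%Y-%m-%d %H:%M")

print(days_since_epoch, bucket_15_min)  # 17532 1514804400000
print(day_utc, minute_kiritimati, day_kiritimati)
```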

DATETRUNC

This section contains reference documentation for the DATETRUNC function.

Presto-compatible SQL date truncation, equivalent to the Presto function date_trunc.

Truncates the time value to the start of the given unit, evaluated in the specified time zone, and returns the result as time since the UTC epoch in the specified output time unit.

Signature

DATETRUNC(unit, timeValue)
DATETRUNC(unit, timeValue, inputTimeUnitStr)
DATETRUNC(unit, timeValue, inputTimeUnitStr, timeZone)
DATETRUNC(unit, timeValue, inputTimeUnitStr, timeZone, outputTimeUnitStr)

unit supports the following values:

millisecond
second
minute
hour
day
week
month
quarter
year

inputTimeUnitStr and outputTimeUnitStr support the following values:

NANOSECONDS
MICROSECONDS
MILLISECONDS
SECONDS
MINUTES
HOURS
DAYS

Usage Examples

Truncates an epoch timestamp in milliseconds to the WEEK (a week starts at midnight UTC on Monday):

Truncates an epoch timestamp in milliseconds to the WEEK in the UTC time zone (a week starts at midnight UTC on Monday), returning the result in seconds since the UTC epoch:

Truncates an epoch timestamp in milliseconds to the WEEK in the CET time zone (a week starts at midnight CET on Monday), returning the result in seconds since the UTC epoch:

Truncates an epoch timestamp in milliseconds to the QUARTER in the Los Angeles time zone (quarters begin on January 1, April 1, July 1, and October 1 in Los Angeles time), returning the result in hours since the UTC epoch:
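The truncation rule described in this section can be sketched in Python: convert to local time, move back to the start of the unit, and express the result in seconds since the UTC epoch. The sketch covers only day, week, and quarter, uses fixed UTC offsets (CET as UTC+1, so no daylight saving), and `date_trunc_seconds` is a hypothetical name; a DST-aware zone such as Los Angeles would need zoneinfo instead.

```python
from datetime import datetime, timedelta, timezone

# Assumption: CET modelled as a fixed UTC+1 offset (no daylight saving time)
CET_WINTER = timezone(timedelta(hours=1))


def date_trunc_seconds(unit: str, ts_millis: int, tz: timezone = timezone.utc) -> int:
    """Truncate ts_millis to the start of `unit` in `tz`; return seconds since the UTC epoch."""
    local = datetime.fromtimestamp(ts_millis / 1000, tz)
    start = local.replace(hour=0, minute=0, second=0, microsecond=0)  # start of the local day
    if unit == "week":
        start -= timedelta(days=start.weekday())  # weekday(): Monday is 0
    elif unit == "quarter":
        first_month = 3 * ((start.month - 1) // 3) + 1  # 1, 4, 7 or 10
        start = start.replace(month=first_month, day=1)
    elif unit != "day":
        raise ValueError(f"unit not covered by this sketch: {unit}")
    return int(start.timestamp())


TS = 1639351800000  # 2021-12-12 23:30 UTC, a Sunday
print(date_trunc_seconds("week", TS))              # 1638748800 (Mon 2021-12-06 00:00 UTC)
print(date_trunc_seconds("week", TS, CET_WINTER))  # 1639350000 (Mon 2021-12-13 00:00 CET)
print(date_trunc_seconds("quarter", TS))           # 1633046400 (2021-10-01 00:00 UTC)
```

The CET case lands in a different week because 23:30 UTC on Sunday is already 00:30 on Monday in CET.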

exp

This section contains reference documentation for the exp function.

Returns Euler’s number (e) raised to the power of col1.

Signature

EXP(col1)

Usage Examples

value

FLOOR

This section contains reference documentation for the FLOOR function.

Rounds a value down to the nearest integer.

Signature

FLOOR(col1)

Usage Examples

select FLOOR(12.1) AS value
from ignoreMe

value

12

select FLOOR(-12.1) AS value
from ignoreMe

value

-13
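CEIL rounds toward positive infinity and FLOOR toward negative infinity, so for negative inputs CEIL moves toward zero while FLOOR moves away from it. Python's math.ceil and math.floor follow the same rule and match the negative-input results shown above; they return Python ints, and this sketch makes no claim about the type Pinot returns.

```python
import math

for x in (12.1, -12.1):
    # ceil -> toward +infinity, floor -> toward -infinity
    print(x, math.ceil(x), math.floor(x))
# 12.1 13 12
# -12.1 -12 -13
```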

FromDateTime

This section contains reference documentation for the FromDateTime function.

Converts a formatted date-time string to milliseconds since the epoch, based on the provided pattern.

Signature

FromDateTime(dateTimeString, pattern)

Usage Examples

epochMillis

FromEpoch

This section contains reference documentation for the fromEpoch functions.

Converts a time since epoch, expressed in the given unit, to milliseconds since epoch. The following time units are supported:

SECONDS
MINUTES
HOURS
DAYS

Signature

FromEpoch<TIME_UNIT>(timeIn<TIME_UNIT>)
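Each of these conversions amounts to multiplying by the number of milliseconds in the unit. The sketch below assumes exactly that; from_epoch_sketch and MILLIS_PER_UNIT are hypothetical names, and the unit is passed as a string instead of being part of the function name.

```python
# Hypothetical table: milliseconds in one unit of each supported time unit.
MILLIS_PER_UNIT = {"SECONDS": 1_000, "MINUTES": 60_000,
                   "HOURS": 3_600_000, "DAYS": 86_400_000}


def from_epoch_sketch(time_in_unit: int, unit: str) -> int:
    """Hypothetical model of FromEpoch<TIME_UNIT>: scale to milliseconds."""
    return time_in_unit * MILLIS_PER_UNIT[unit]
```

For example, from_epoch_sketch(18978, "DAYS") returns 1639699200000, which is 2021-12-17 00:00 UTC.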

Usage Examples

epochMillis

FromEpochBucket

This section contains reference documentation for the fromEpochBucket functions.

Converts a bucketed time since epoch (for example, a number of 10-second or 5-minute buckets since the epoch) to milliseconds since epoch. The following time units are supported:

SECONDS
MINUTES
HOURS
DAYS

Signature

FromEpoch<TIME_UNIT>Bucket(bucketsSinceEpoch, bucketSize)
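A sketch of the bucketed variant, assuming the first argument counts buckets of bucketSize units since the epoch, so the result is the bucket count multiplied by the bucket size and by the milliseconds per unit. The names below are hypothetical.

```python
# Hypothetical table: milliseconds in one unit of each supported time unit.
MILLIS_PER_UNIT = {"SECONDS": 1_000, "MINUTES": 60_000,
                   "HOURS": 3_600_000, "DAYS": 86_400_000}


def from_epoch_bucket_sketch(buckets_since_epoch: int, bucket_size: int, unit: str) -> int:
    """Hypothetical model of FromEpoch<TIME_UNIT>Bucket."""
    # Buckets -> units -> milliseconds.
    return buckets_since_epoch * bucket_size * MILLIS_PER_UNIT[unit]
```

For example, 5465934 five-minute buckets, from_epoch_bucket_sketch(5465934, 5, "MINUTES"), give 1639780200000 milliseconds.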

Usage Examples

bucket

hour

This section contains reference documentation for the hour function.

Returns the hour of the day for the given epoch milliseconds, in UTC or in the specified time zone. The value ranges from 0 to 23.

Signature

hour(tsInMillis)
hour(tsInMillis, timeZoneId)
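The sketch below models hour with Python's zoneinfo module, assuming timeZoneId is an IANA zone name such as America/Los_Angeles; hour_sketch is a hypothetical name.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def hour_sketch(ts_in_millis: int, time_zone_id: str = "UTC") -> int:
    """Hypothetical model of hour(): hour of day (0-23) in the given zone."""
    instant = EPOCH + timedelta(milliseconds=ts_in_millis)
    return instant.astimezone(ZoneInfo(time_zone_id)).hour
```

For 1639780200000 (2021-12-17 22:30 UTC), the sketch returns 22 in UTC and 14 in America/Los_Angeles.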

Usage Examples

hour

JSONFORMAT

This section contains reference documentation for the JSONFORMAT function.

Converts an object into its JSON string representation. This function can only be used in an ingestion transformation function.

Signature

JSONFORMAT(object)
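A rough Python equivalent is shown below. It assumes compact output with no spaces after separators and non-ASCII characters left unescaped; Pinot's own serializer may format some values differently. json_format_sketch is a hypothetical name.

```python
import json
from typing import Any


def json_format_sketch(obj: Any) -> str:
    """Hypothetical model of JSONFORMAT: serialize a value as compact JSON text."""
    return json.dumps(obj, separators=(",", ":"), ensure_ascii=False)
```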

Usage Examples

The usage examples are based on extracting fields from the following JSON document:

Expression

Value

This function can be used in the table config to extract the meta property into the data column, as described below:

JSONPATH

This section contains reference documentation for the JSONPATH function.

Extracts the object value from jsonField based on 'jsonPath'; the result type is inferred from the JSON value. This function can only be used in an ingestion transformation function.

Signature

JSONPATH(jsonField, 'jsonPath')

Arguments

Description

jsonField

An identifier or expression that contains JSON documents.

'jsonPath'

'jsonPath' is a literal. Pinot uses single quotes to distinguish literals from identifiers.

Usage Examples

The usage examples are based on extracting fields from the following JSON document:

{
  "data": {
    "name": "Pete",
    "age": 24,
    "subjects": [
      {
        "name": "maths",
        "homework_grades": [80, 85, 90, 95, 100],
        "grade": "A",
        "score": 90
      },
      {
        "name": "english",
        "homework_grades": [60, 65, 70, 85, 90],
        "grade": "B",
        "score": 70
      }
    ]
  }
}

Expression

Value

JSONPATH(data, '$.name')

"Pete"

JSONPATH(data, '$.age')

24

This function can be used in the table config to extract the name property into the name column and age property into the age column, as described below:

{
   "tableConfig":{
      "ingestionConfig":{
         "transformConfigs":[
            {
               "columnName":"name",
               "transformFunction":"JSONPATH(data, '$.name')"
            },
            {
               "columnName":"age",
               "transformFunction":"JSONPATH(data, '$.age')"
            }
         ]
      }
   }
}
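The Python sketch below resolves a small subset of JSONPath (a leading $, .key steps, and [index] steps) against the JSON text of the data object shown above. It is not Pinot's JSONPath engine, and jsonpath_sketch is a hypothetical name; it illustrates how the result type follows the JSON value, with "Pete" coming back as a string and 24 as an integer.

```python
import json
import re
from typing import Any

# One path step: ".key" or "[index]" (a deliberately small JSONPath subset).
_STEP = re.compile(r"\.([A-Za-z_][A-Za-z0-9_]*)|\[(\d+)\]")


def jsonpath_sketch(json_field: str, json_path: str) -> Any:
    """Hypothetical model of JSONPATH for definite paths such as '$.name'."""
    if not json_path.startswith("$"):
        raise ValueError("path must start with '$'")
    value = json.loads(json_field)
    pos = 1
    while pos < len(json_path):
        match = _STEP.match(json_path, pos)
        if match is None:
            raise ValueError(f"unsupported path syntax at {json_path[pos:]!r}")
        key, index = match.groups()
        # Object steps use the key; array steps use the integer index.
        value = value[key] if key is not None else value[int(index)]
        pos = match.end()
    return value
```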

JSONPATHARRAY

This section contains reference documentation for the JSONPATHARRAY function.

Extracts an array from jsonField based on 'jsonPath'; the result type is inferred from the JSON value. This function can only be used in an ingestion transformation function.

Signature

JSONPATHARRAY(jsonField, 'jsonPath')

Arguments

Description

jsonField

An identifier or expression that contains JSON documents.

'jsonPath'

'jsonPath' is a literal. Pinot uses single quotes to distinguish literals from identifiers.

Usage Examples

The usage examples are based on extracting fields from the following JSON document:

{
  "data": {
    "name": "Pete",
    "age": 24,
    "subjects": [
      {
        "name": "maths",
        "homework_grades": [80, 85, 90, 95, 100],
        "grade": "A",
        "score": 90
      },
      {
        "name": "english",
        "homework_grades": [60, 65, 70, 85, 90],
        "grade": "B",
        "score": 70
      }
    ]
  }
}

Expression

Value

JSONPATHARRAY(data, '$.subjects[*].name')

["maths", "english"]

JSONPATHARRAY(data, '$.subjects[*].score')

[90, 70]

JSONPATHARRAY(data, '$.subjects[*].homework_grades[1]')

[85, 65]

This function can be used in the table config to extract the name, score, and second value of homework_grades into their respective columns, as described below:

{
   "tableConfig":{
      "ingestionConfig":{
         "transformConfigs":[
            {
               "columnName":"names",
               "transformFunction":"JSONPATHARRAY(data, '$.subjects[*].name')"
            },
            {
               "columnName":"scores",
               "transformFunction":"JSONPATHARRAY(data, '$.subjects[*].score')"
            },
            {
               "columnName":"homeworkGrades",
               "transformFunction":"JSONPATHARRAY(data, '$.subjects[*].homework_grades[1]')"
            }
         ]
      }
   }
}
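The sketch below extends the same idea with [*] wildcards and collects every match into a list. It also includes a variant modeling JSONPATHARRAYDEFAULTEMPTY, documented in the next section, which returns an empty list for null input or unparseable JSON. Skipping missing keys and out-of-range indexes is an assumption of this sketch, and all function names are hypothetical.

```python
import json
import re
from typing import Any, List, Optional

# One path step: ".key", "[index]" or "[*]" (a deliberately small JSONPath subset).
_STEP = re.compile(r"\.([A-Za-z_][A-Za-z0-9_]*)|\[(\d+|\*)\]")


def _apply_step(values: List[Any], key: Optional[str], index: Optional[str]) -> List[Any]:
    out: List[Any] = []
    for value in values:
        if key is not None:
            if isinstance(value, dict) and key in value:
                out.append(value[key])
        elif index == "*":
            if isinstance(value, list):
                out.extend(value)  # the wildcard fans out over every element
        elif isinstance(value, list) and int(index) < len(value):
            out.append(value[int(index)])
        # Anything else (missing key, out-of-range index) is skipped.
    return out


def jsonpath_array_sketch(json_field: str, json_path: str) -> List[Any]:
    """Hypothetical model of JSONPATHARRAY: collect every match into a list."""
    if not json_path.startswith("$"):
        raise ValueError("path must start with '$'")
    values: List[Any] = [json.loads(json_field)]
    pos = 1
    while pos < len(json_path):
        match = _STEP.match(json_path, pos)
        if match is None:
            raise ValueError(f"unsupported path syntax at {json_path[pos:]!r}")
        values = _apply_step(values, *match.groups())
        pos = match.end()
    return values


def jsonpath_array_default_empty_sketch(json_field: Optional[str], json_path: str) -> List[Any]:
    """Hypothetical model of JSONPATHARRAYDEFAULTEMPTY: [] on null input or errors."""
    try:
        return jsonpath_array_sketch(json_field, json_path)
    except (TypeError, ValueError):  # json.loads(None) raises TypeError
        return []
```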

JSONPATHARRAYDEFAULTEMPTY

This section contains reference documentation for the JSONPATHARRAYDEFAULTEMPTY function.

Extracts an array from jsonField based on 'jsonPath'; the result type is inferred from the JSON value. Returns an empty array for null input or a parsing error. This function can only be used in an ingestion transformation function.

Signature

JSONPATHARRAYDEFAULTEMPTY(jsonField, 'jsonPath')

Arguments

Description

jsonField

An identifier or expression that contains JSON documents.

'jsonPath'

'jsonPath' is a literal. Pinot uses single quotes to distinguish literals from identifiers.

Usage Examples

The usage examples are based on extracting fields from the following JSON document:

{
  "data": {
    "name": "Pete",
    "age": 24,
    "subjects": [
      {
        "name": "maths",
        "homework_grades": [80, 85, 90, 95, 100],
        "grade": "A",
        "score": 90
      },
      {
        "name": "english",
        "homework_grades": [60, 65, 70, 85, 90],
        "grade": "B",
        "score": 70
      }
    ]
  }
}

Expression

Value

JSONPATHARRAYDEFAULTEMPTY(data, '$.subjects[*].name')

["maths", "english"]

JSONPATHARRAYDEFAULTEMPTY(data, '$.subjects[*].score')

[90, 70]

JSONPATHARRAYDEFAULTEMPTY(data, '$.subjects[*].homework_grades[1]')

[85, 65]

JSONPATHARRAYDEFAULTEMPTY(data, '$.subjects[*].homework_grades[7]')

[]

This function can be used in the table config to extract the name, score, and second value of homework_grades into their respective columns, as described below:

{
   "tableConfig":{
      "ingestionConfig":{
         "transformConfigs":[
            {
               "columnName":"names",
               "transformFunction":"JSONPATHARRAYDEFAULTEMPTY(data, '$.subjects[*].name')"
            },
            {
               "columnName":"scores",
               "transformFunction":"JSONPATHARRAYDEFAULTEMPTY(data, '$.subjects[*].score')"
            },
            {
               "columnName":"homeworkGrades",
               "transformFunction":"JSONPATHARRAYDEFAULTEMPTY(data, '$.subjects[*].homework_grades[1]')"
            }
         ]
      }
   }
}

JSONPATHDOUBLE

This section contains reference documentation for the JSONPATHDOUBLE function.

Extracts the Double value from jsonField based on 'jsonPath', returning the optional defaultValue on null input or a parsing error. This function can only be used in an ingestion transformation function.

Signature

JSONPATHDOUBLE(jsonField, 'jsonPath', [defaultValue])
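A minimal sketch of the defaultValue behavior, assuming the path has already been split into keys and indexes (a hypothetical simplification that avoids writing a JSONPath parser). When no default is supplied, the sketch simply re-raises the error; this page does not say what Pinot returns in that case. jsonpath_double_sketch is a hypothetical name.

```python
import json
from typing import Any, Optional, Sequence, Union


def jsonpath_double_sketch(json_field: Optional[str],
                           path: Sequence[Union[str, int]],
                           default: Optional[float] = None) -> float:
    """Hypothetical model of JSONPATHDOUBLE with an optional default value."""
    try:
        value: Any = json.loads(json_field)  # TypeError if json_field is None
        for step in path:                    # e.g. ["subjects", 0, "score"]
            value = value[step]
        return float(value)                  # fails if the value cannot be converted
    except (TypeError, ValueError, KeyError, IndexError):
        if default is None:
            raise
        return default
```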

Arguments

Description

'jsonPath' is a literal. Pinot uses single quotes to distinguish literals from identifiers.

Usage Examples

The usage examples are based on extracting fields from the following JSON document:

JSONPATHLONG

This section contains reference documentation for the JSONPATHLONG function.

Extracts the Long value from jsonField based on 'jsonPath', returning the optional defaultValue on null input or a parsing error. This function can only be used in an ingestion transformation function.

Signature

JSONPATHLONG(jsonField, 'jsonPath', [defaultValue])

Arguments

Description

'jsonPath' is a literal. Pinot uses single quotes to distinguish literals from identifiers.

Usage Examples

The usage examples are based on extracting fields from the following JSON document:

JSONPATHSTRING

This section contains reference documentation for the JSONPATHSTRING function.

Extracts the String value from jsonField based on 'jsonPath', returning the optional defaultValue on null input or a parsing error. This function can only be used in an ingestion transformation function.

Signature

JSONPATHSTRING(jsonField, 'jsonPath', [defaultValue])

Arguments

Description

'jsonPath' is a literal. Pinot uses single quotes to distinguish literals from identifiers.

Usage Examples

The usage examples are based on extracting fields from the following JSON document:

jsonextractkey

This section contains reference documentation for the JSONEXTRACTKEY function.

Extracts all matched JSON field keys based on 'jsonPath' into a STRING_ARRAY.

Signature

JSONEXTRACTKEY(jsonField, 'jsonPath')
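A simplified sketch that supports only paths of the form $.* or $.a.b.* and returns the keys of the selected object. The exact string format Pinot uses for each returned key (for example, a bare name or a full path) is not shown on this page, so returning bare key names is an assumption; json_extract_key_sketch is a hypothetical name.

```python
import json
from typing import List


def json_extract_key_sketch(json_field: str, json_path: str) -> List[str]:
    """Hypothetical model of JSONEXTRACTKEY for paths like '$.*' or '$.a.b.*'."""
    parts = json_path.split(".")
    if len(parts) < 2 or parts[0] != "$" or parts[-1] != "*":
        raise ValueError("sketch only supports paths like '$.*' or '$.a.b.*'")
    node = json.loads(json_field)
    for key in parts[1:-1]:  # walk down to the object whose keys are listed
        node = node[key]
    return list(node.keys()) if isinstance(node, dict) else []
```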

Arguments

Description

'jsonPath' is a literal. Pinot uses single quotes to distinguish literals from identifiers.

Usage Examples

The examples in this section are based on the . In particular, we query the row WHERE id = 7044874109.

repo

keys

jsonextractscalar

This section contains reference documentation for the JSONEXTRACTSCALAR function.

Evaluates 'jsonPath' on jsonField and returns the result as type 'resultsType', using the optional defaultValue on null values or parsing errors.

Signature

JSONEXTRACTSCALAR(jsonField, 'jsonPath', 'resultsType', [defaultValue])
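A sketch of the resultsType conversion and default handling, supporting only dot-separated paths and a hypothetical subset of result types (INT, LONG, FLOAT, DOUBLE, STRING). Non-string values requested as STRING are re-serialized as JSON text, which is an assumption, and json_extract_scalar_sketch is a hypothetical name.

```python
import json
from typing import Any, Callable, Dict, Optional


def _to_string(value: Any) -> str:
    # Strings pass through; other JSON values are re-serialized as JSON text.
    return value if isinstance(value, str) else json.dumps(value)


# Hypothetical subset of result types; Pinot may accept more.
_CASTS: Dict[str, Callable[[Any], Any]] = {
    "INT": int, "LONG": int, "FLOAT": float, "DOUBLE": float, "STRING": _to_string,
}


def json_extract_scalar_sketch(json_field: Optional[str], json_path: str,
                               results_type: str, default: Any = None) -> Any:
    """Hypothetical model of JSONEXTRACTSCALAR for paths like '$.a.b'."""
    cast = _CASTS[results_type.upper()]  # unsupported types fail fast
    if not json_path.startswith("$"):
        raise ValueError("path must start with '$'")
    try:
        value = json.loads(json_field)
        for key in filter(None, json_path[1:].split(".")):
            value = value[key]
        if value is None:
            raise ValueError("JSON null")
        return cast(value)
    except (TypeError, ValueError, KeyError):
        if default is None:
            raise
        return default
```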

Arguments

Description

'jsonPath' and 'resultsType' are literals. Pinot uses single quotes to distinguish them from identifiers.

Usage Examples

The examples in this section are based on the . In particular, we query the row WHERE id = 7044874109:

repo

The following examples show how to use the JSONEXTRACTSCALAR function:

length

This section contains reference documentation for the length function.

Calculates the length of a string.

Signature

LENGTH(col)
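If LENGTH follows Java's String.length(), which is an assumption here, it counts UTF-16 code units rather than Unicode code points. The sketch below mimics that in Python; length_sketch is a hypothetical name.

```python
def length_sketch(s: str) -> int:
    """Count UTF-16 code units, as Java's String.length() does (assumed for LENGTH)."""
    # Each UTF-16 code unit is two bytes in the little-endian encoding (no BOM).
    return len(s.encode("utf-16-le")) // 2
```

Under this assumption, length_sketch("pinot") returns 5, while a single emoji such as "\U0001F600" counts as 2 even though Python's len returns 1 for it.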

Usage Examples

value