Query

Learn how to query Apache Pinot using SQL or explore data using the web-based Pinot query console.

This section covers:

  • Querying Pinot

  • Supported Transformations

  • Supported Aggregations

  • User-Defined Functions (UDFs)

  • Cardinality Estimation

  • Lookup UDF Join

  • Querying JSON data

Lookup UDF Join

The lookUp UDF retrieves dimension data from a dimension table via primary key, enabling decoration-join functionality. The lookUp UDF can only be used with a dimension table in Pinot. The UDF signature is as follows:

lookUp('dimTableName', 'dimColToLookUp', 'dimJoinKey1', factJoinKeyVal1, 'dimJoinKey2', factJoinKeyVal2 ... )
  • dimTableName: Name of the dimension table to perform the lookup on.

  • dimColToLookUp: The column of the dimension table to be retrieved, used to decorate the result.

  • dimJoinKey: The column to perform the lookup on, i.e. the join column of the dimension table.

  • factJoinKeyVal: The value of the dimension-table join column for which dimColToLookUp is retrieved, for the scope and invocation.

The return type of the UDF is the type of the dimColToLookUp column. There can also be multiple primary keys and corresponding values.

Note: If the dimension table uses a composite primary key (i.e. multiple primary keys), ensure that the order of keys appearing in the lookUp() UDF is the same as the order defined in "primaryKeyColumns" in the dimension table schema.
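As a sketch, a decoration join with the UDF above could look like the following; the table and column names (baseballStats, dimBaseballTeams, teamID, teamName) are hypothetical:

```sql
-- Decorate each fact row with the team name, looked up from the
-- dimension table dimBaseballTeams by its primary key teamID.
SELECT playerName,
       teamID,
       lookUp('dimBaseballTeams', 'teamName', 'teamID', teamID) AS teamName
FROM baseballStats
```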

Querying Pinot

Learn how to query Pinot using SQL

Dialect

Pinot uses the Calcite SQL parser to parse queries and uses the MYSQL_ANSI dialect. You can see the grammar in the Calcite documentation.

Cardinality Estimation

Cardinality estimation is a classic problem. Pinot solves it in multiple ways, each of which has a trade-off between accuracy and latency.

Accurate Results

Functions:

DistinctCount(x) -> LONG

Returns the accurate count of all unique values in a column.

The underlying implementation uses an IntOpenHashSet from the library it.unimi.dsi:fastutil:8.2.3 to hold all the unique values.

Approximation Results

It usually takes a lot of resources and time to compute exact results for unique counting on large datasets. In some circumstances, we can tolerate a certain error rate, in which case we can use approximation functions to tackle this problem.

HyperLogLog

HyperLogLog is an approximation algorithm for unique counting. It uses a fixed number of bits to estimate the cardinality of a given data set.

Pinot leverages the HyperLogLog class in the library com.clearspring.analytics:stream:2.7.0 as the data structure to hold intermediate results.

Functions:

  • DistinctCountHLL(x) -> LONG

For column types INT/LONG/FLOAT/DOUBLE/STRING, Pinot treats each value as an individual entry to add into a HyperLogLog object, then computes the approximation by calling cardinality().

For column type BYTES, Pinot treats each value as a serialized HyperLogLog object with pre-aggregated values inside. The bytes value is generated by org.apache.pinot.core.common.ObjectSerDeUtils.HYPER_LOG_LOG_SER_DE.serialize(hyperLogLog).

All deserialized HyperLogLog objects are merged into one, then cardinality() is called to get the approximate unique count.
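As a minimal sketch of the trade-off described above, the two functions can be run side by side; myTable and userID are hypothetical names:

```sql
-- Exact distinct count (more memory/latency) vs. HyperLogLog approximation.
SELECT DistinctCount(userID)    AS exactCount,
       DistinctCountHLL(userID) AS approxCount
FROM myTable
```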

Theta Sketches

The Theta Sketch framework enables set operations over a stream of data, and can also be used for cardinality estimation. Pinot leverages the Sketch class and its extensions from the library org.apache.datasketches:datasketches-java:1.2.0-incubating to perform distinct counting as well as evaluating set operations.

Functions:

  • DistinctCountThetaSketch(<thetaSketchColumn>, <thetaSketchParams>, predicate1, predicate2..., postAggregationExpressionToEvaluate) -> LONG

    • thetaSketchColumn (required): Name of the column to aggregate on.

    • thetaSketchParams (required): Parameters for constructing the intermediate theta sketches. Currently, the only supported parameter is nominalEntries.

    • predicates (optional): Individual predicates of the form lhs <op> rhs which are applied to rows selected by the WHERE clause. During intermediate sketch aggregation, sketches from the thetaSketchColumn that satisfy these predicates are unionized individually. For example, all filtered rows that match country=USA are unionized into a single sketch. Complex predicates created by combining (AND/OR) individual predicates are supported.

    • postAggregationExpressionToEvaluate (required): The set operation to perform on the individual intermediate sketches for each of the predicates. Currently supported operations are SET_DIFF, SET_UNION, SET_INTERSECT, where SET_DIFF requires two arguments and SET_UNION/SET_INTERSECT allow more than two arguments.

In the example query below, the WHERE clause is responsible for identifying the matching rows. Note that the WHERE clause can be completely independent of the postAggregationExpression. Once matching rows are identified, each server unionizes all the sketches that match the individual predicates, i.e. country='USA' and device='mobile' in this case. Once the broker receives the intermediate sketches for each of these individual predicates from all servers, it performs the final aggregation by evaluating the postAggregationExpression and returns the final cardinality of the resulting sketch.

  • DistinctCountRawThetaSketch(<thetaSketchColumn>, <thetaSketchParams>, predicate1, predicate2..., postAggregationExpressionToEvaluate) -> HexEncoded Serialized Sketch Bytes

This is the same as the previous function, except it returns the byte-serialized sketch instead of its cardinality. Since Pinot returns responses as JSON strings, bytes are returned as hex-encoded strings. The hex-encoded string can be deserialized into a sketch by using the library org.apache.commons.codec.binary as Hex.decodeHex(stringValue.toCharArray()).

Limitations

Pinot does not support joins or nested subqueries; we recommend using Presto for queries that span multiple tables. Read Engineering Full SQL support for Pinot at Uber for more info.

There is no DDL support. Tables can be created via the REST API.

Identifier vs Literal

In Pinot SQL:

  • Double quotes (") are used to force string identifiers, e.g. column names.

  • Single quotes (') are used to enclose string literals.

Misusing them may cause unexpected query results, e.g.:

  • WHERE a='b' means the predicate on the column a equals the string literal value 'b'

  • WHERE a="b" means the predicate on the column a equals the value of the column b
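A quick sketch of the difference; myTable, firstName, and lastName are hypothetical columns:

```sql
-- Matches rows whose firstName is the literal string 'lastName'.
SELECT COUNT(*) FROM myTable WHERE firstName = 'lastName'

-- Matches rows whose firstName column equals their lastName column.
SELECT COUNT(*) FROM myTable WHERE firstName = "lastName"
```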

Example Queries

  • Use single quotes for literals and double quotes (optional) for identifiers (column names).

  • If you name columns with reserved keywords such as timestamp or date, or the column name includes special characters, you need to use double quotes when you refer to them in a query.

Example DistinctCountThetaSketch query (for the Theta Sketches section above):

    select distinctCountThetaSketch(
      sketchCol,
      'nominalEntries=1024',
      'country=''USA'' AND state=''CA''',
      'device=''mobile''',
      'SET_INTERSECT($1, $2)'
    )
    from table
    where country = 'USA' or device = 'mobile'

Simple selection

    SELECT *
    FROM myTable
    -- defaults to LIMIT 10

    SELECT *
    FROM myTable
    LIMIT 100

Aggregation

    SELECT COUNT(*), MAX(foo), SUM(bar)
    FROM myTable

Grouping on Aggregation

    SELECT MIN(foo), MAX(foo), SUM(foo), AVG(foo), bar, baz
    FROM myTable
    GROUP BY bar, baz
    LIMIT 50

Ordering on Aggregation

    SELECT MIN(foo), MAX(foo), SUM(foo), AVG(foo), bar, baz
    FROM myTable
    GROUP BY bar, baz
    ORDER BY bar, MAX(foo) DESC
    LIMIT 50

Filtering

For performant filtering of ids in a list, see Filtering with IdSet.

    SELECT COUNT(*)
    FROM myTable
      WHERE foo = 'foo'
      AND bar BETWEEN 1 AND 20
      OR (baz < 42 AND quux IN ('hello', 'goodbye') AND quuux NOT IN (42, 69))

Filtering with NULL predicate

    SELECT COUNT(*)
    FROM myTable
      WHERE foo IS NOT NULL
      AND foo = 'foo'
      AND bar BETWEEN 1 AND 20
      OR (baz < 42 AND quux IN ('hello', 'goodbye') AND quuux NOT IN (42, 69))

Selection (Projection)

    SELECT *
    FROM myTable
      WHERE quux < 5
      LIMIT 50

Ordering on Selection

    SELECT foo, bar
    FROM myTable
      WHERE baz > 20
      ORDER BY bar DESC
      LIMIT 100

Pagination on Selection

Note: results might not be consistent if the column ordered by has the same value in multiple rows.

    SELECT foo, bar
    FROM myTable
      WHERE baz > 20
      ORDER BY bar DESC
      LIMIT 50, 100

Wild-card match (in WHERE clause only)

To count rows where the column airlineName starts with U:

    SELECT COUNT(*)
    FROM myTable
      WHERE REGEXP_LIKE(airlineName, '^U.*')
      GROUP BY airlineName LIMIT 10

Case-When Statement

Pinot supports the CASE-WHEN-ELSE statement.

Example 1:

    SELECT
        CASE
          WHEN price > 30 THEN 3
          WHEN price > 20 THEN 2
          WHEN price > 10 THEN 1
          ELSE 0
        END AS price_category
    FROM myTable

Example 2:

    SELECT
      SUM(
        CASE
          WHEN price > 30 THEN 30
          WHEN price > 20 THEN 20
          WHEN price > 10 THEN 10
          ELSE 0
        END) AS total_cost
    FROM myTable

UDF

Functions have to be implemented within Pinot. Injecting functions is not yet supported. The example below demonstrates the use of UDFs. More examples in Transform Function in Aggregation Grouping.

    SELECT COUNT(*)
    FROM myTable
    GROUP BY DATETIMECONVERT(timeColumnName, '1:MILLISECONDS:EPOCH', '1:HOURS:EPOCH', '1:HOURS')

BYTES column

Pinot supports queries on BYTES columns using HEX strings. The query response also uses hex strings to represent bytes values.

E.g. the query below fetches all the rows for a given UID:

    SELECT *
    FROM myTable
    WHERE UID = 'c8b3bce0b378fc5ce8067fc271a34892'

Filtering with IdSet

Learn how to write fast queries for looking up ids in a list of values.

A common use case is filtering on an id field with a list of values. This can be done with the IN clause, but this approach doesn't perform well with large lists of ids. In these cases, you can use an IdSet.
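For contrast, the naive approach described above looks like the following (the yearID values are illustrative); with thousands of ids, the IN list becomes expensive to ship and evaluate:

```sql
-- Naive large-IN-list filter that IdSet is designed to replace.
SELECT yearID, COUNT(*)
FROM baseballStats
WHERE yearID IN (1901, 1902, 1903)  -- imagine thousands of ids here
GROUP BY yearID
```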

Functions

ID_SET

ID_SET(columnName, 'sizeThresholdInBytes=8388608;expectedInsertions=5000000;fpp=0.03')

This function returns a base 64 encoded IdSet of the values for a single column. The IdSet implementation used depends on the column data type:

  • INT - RoaringBitmap unless sizeThresholdInBytes is exceeded, in which case Bloom Filter.

  • LONG - Roaring64NavigableMap unless sizeThresholdInBytes is exceeded, in which case Bloom Filter.

  • Other types - Bloom Filter

The following parameters are used to configure the Bloom Filter:

  • expectedInsertions - Number of expected insertions for the BloomFilter, must be positive

  • fpp - Desired false positive probability for the BloomFilter, must be positive and < 1.0

Note that when a Bloom Filter is used, the filter results are approximate: you can get false-positive results (for membership in the set), leading to potentially unexpected results.

IN_ID_SET

IN_ID_SET(columnName, base64EncodedIdSet)

This function returns 1 if a column contains a value specified in the IdSet and 0 if it does not.

IN_SUBQUERY

IN_SUBQUERY(columnName, subQuery)

This function generates an IdSet from a subquery and then filters ids based on that IdSet on a Pinot broker.

IN_PARTITIONED_SUBQUERY

IN_PARTITIONED_SUBQUERY(columnName, subQuery)

This function generates an IdSet from a subquery and then filters ids based on that IdSet on a Pinot server.

This function works best when the data is partitioned by the id column and each server contains all the data for a partition. The generated IdSet for the subquery will be smaller, as it will only contain the ids for the partitions served by that server, which gives better performance.

Examples

Create IdSet

You can create an IdSet of the values in the yearID column by running the ID_SET(yearID) query shown below.

When creating an IdSet for values in non-INT/LONG columns, we can configure the expectedInsertions, e.g. ID_SET(playerName, 'expectedInsertions=10').

We can also configure the fpp parameter, e.g. ID_SET(playerName, 'expectedInsertions=100;fpp=0.01').

Filter by values in IdSet

We can use the IN_ID_SET function to filter a query based on an IdSet. To return rows for yearIDs in the IdSet, run the corresponding IN_ID_SET(...) = 1 query below.

Filter by values not in IdSet

To return rows for yearIDs not in the IdSet, run the corresponding IN_ID_SET(...) = 0 query below.

Filter on broker

To filter rows for yearIDs in the IdSet on a Pinot broker, run the IN_SUBQUERY query below with = 1; to filter rows for yearIDs not in the IdSet, use = 0.

Filter on server

To filter rows for yearIDs in the IdSet on a Pinot server, run the IN_PARTITIONED_SUBQUERY query below with = 1; to filter rows for yearIDs not in the IdSet, use = 0.
Example base64-encoded IdSets returned by the ID_SET queries above (the first for yearID, the others for playerName with different Bloom Filter parameters):

    ATowAAABAAAAAAA7ABAAAABtB24HbwdwB3EHcgdzB3QHdQd2B3cHeAd5B3oHewd8B30Hfgd/B4AHgQeCB4MHhAeFB4YHhweIB4kHigeLB4wHjQeOB48HkAeRB5IHkweUB5UHlgeXB5gHmQeaB5sHnAedB54HnwegB6EHogejB6QHpQemB6cHqAc=

    AwIBBQAAAAL/////////////////////

    AwIBBQAAAAz///////////////////////////////////////////////9///////f///9/////7///////////////+/////////////////////////////////////////////8=

    AwIBBwAAAA/////////////////////////////////////////////////////////////////////////////////////////////////////////9///////////////////////////////////////////////7//////8=

User-Defined Functions (UDFs)

Pinot currently supports two ways for you to implement your own functions:

  • Groovy Scripts

  • Scalar Functions

Groovy Scripts

Pinot allows you to run any function using Apache Groovy scripts. The syntax for executing a Groovy script within a query is as follows:

GROOVY('result value metadata json', 'groovy script', arg0, arg1, arg2...)

This function will execute the Groovy script using the arguments provided and return the result that matches the provided result value metadata. The function requires the following arguments:

  • Result value metadata json - a JSON string representing the result value metadata. Must contain non-null keys resultType and isSingleValue.

  • Groovy script to execute - the Groovy script string, which uses arg0, arg1, arg2, etc. to refer to the arguments provided within the script.

  • arguments - Pinot columns/other transform functions that are arguments to the Groovy script.

Examples

  • Add colA and colB and return a single-value INT

    groovy('{"returnType":"INT","isSingleValue":true}', 'arg0 + arg1', colA, colB)

  • Find the max element in mvColumn array and return a single-value INT

    groovy('{"returnType":"INT","isSingleValue":true}', 'arg0.toList().max()', mvColumn)

  • Find all elements of the array mvColumn and return as a multi-value LONG column

    groovy('{"returnType":"LONG","isSingleValue":false}', 'arg0.findIndexValues{ it > 5 }', mvColumn)

  • Multiply length of array mvColumn with colB and return a single-value DOUBLE

    groovy('{"returnType":"DOUBLE","isSingleValue":true}', 'arg0 * arg1', arraylength(mvColumn), colB)

  • Find all indexes in mvColumnA which have value foo, add values at those indexes in mvColumnB

    groovy('{"returnType":"DOUBLE","isSingleValue":true}', 'def x = 0; arg0.eachWithIndex{item, idx-> if (item == "foo") {x = x + arg1[idx] }}; return x', mvColumnA, mvColumnB)

  • Switch case which returns a FLOAT value depending on length of mvCol array

    groovy('{"returnType":"FLOAT", "isSingleValue":true}', 'def result; switch(arg0.length()) { case 10: result = 1.1; break; case 20: result = 1.2; break; default: result = 1.3;}; return result.floatValue()', mvCol)

  • Any Groovy script which takes no arguments

    groovy('{"returnType":"STRING","isSingleValue":true}', 'new Date().format( "yyyyMMdd" )')

Scalar Functions

Since the 0.5.0 release, Pinot supports custom functions that return a single output for multiple inputs. Examples of scalar functions can be found in StringFunctions and DateTimeFunctions.

Pinot automatically identifies and registers all the functions that have the @ScalarFunction annotation.

Only Java methods are supported.

Adding user defined scalar functions

You can add new scalar functions as follows:

  • Create a new Java project. Make sure you keep the package name as org.apache.pinot.scalar.XXXX

  • In your Java project, include the dependency (shown below for Maven and Gradle).

  • Annotate your methods with the @ScalarFunction annotation. Make sure the method is static and returns only a single value as output. The input and output can have one of the following types:

    • Integer

    • Long

    • Double

    • String

  • Place the compiled JAR in the /plugins directory in Pinot. You will need to restart all Pinot instances if they are already running.

  • Now, you can use the function in a query (see the example query below).

Note that the function name in SQL is the same as the function name in Java. The SQL function name is case-insensitive as well.
Example queries for the Filtering with IdSet section above.

Create IdSet:

    SELECT ID_SET(yearID)
    FROM baseballStats
    WHERE teamID = 'WS1'

    SELECT ID_SET(playerName, 'expectedInsertions=10')
    FROM baseballStats
    WHERE teamID = 'WS1'

    SELECT ID_SET(playerName, 'expectedInsertions=100')
    FROM baseballStats
    WHERE teamID = 'WS1'

    SELECT ID_SET(playerName, 'expectedInsertions=100;fpp=0.01')
    FROM baseballStats
    WHERE teamID = 'WS1'

Filter by values in the IdSet (= 1) and not in the IdSet (= 0):

    SELECT yearID, count(*)
    FROM baseballStats
    WHERE IN_ID_SET(
      yearID,
      'ATowAAABAAAAAAA7ABAAAABtB24HbwdwB3EHcgdzB3QHdQd2B3cHeAd5B3oHewd8B30Hfgd/B4AHgQeCB4MHhAeFB4YHhweIB4kHigeLB4wHjQeOB48HkAeRB5IHkweUB5UHlgeXB5gHmQeaB5sHnAedB54HnwegB6EHogejB6QHpQemB6cHqAc='
      ) = 1
    GROUP BY yearID

    SELECT yearID, count(*)
    FROM baseballStats
    WHERE IN_ID_SET(
      yearID,
      'ATowAAABAAAAAAA7ABAAAABtB24HbwdwB3EHcgdzB3QHdQd2B3cHeAd5B3oHewd8B30Hfgd/B4AHgQeCB4MHhAeFB4YHhweIB4kHigeLB4wHjQeOB48HkAeRB5IHkweUB5UHlgeXB5gHmQeaB5sHnAedB54HnwegB6EHogejB6QHpQemB6cHqAc='
      ) = 0
    GROUP BY yearID

Filter on broker:

    SELECT yearID, count(*)
    FROM baseballStats
    WHERE IN_SUBQUERY(
      yearID,
      'SELECT ID_SET(yearID) FROM baseballStats WHERE teamID = ''WS1'''
      ) = 1
    GROUP BY yearID

    SELECT yearID, count(*)
    FROM baseballStats
    WHERE IN_SUBQUERY(
      yearID,
      'SELECT ID_SET(yearID) FROM baseballStats WHERE teamID = ''WS1'''
      ) = 0
    GROUP BY yearID

Filter on server:

    SELECT yearID, count(*)
    FROM baseballStats
    WHERE IN_PARTITIONED_SUBQUERY(
      yearID,
      'SELECT ID_SET(yearID) FROM baseballStats WHERE teamID = ''WS1'''
      ) = 1
    GROUP BY yearID

    SELECT yearID, count(*)
    FROM baseballStats
    WHERE IN_PARTITIONED_SUBQUERY(
      yearID,
      'SELECT ID_SET(yearID) FROM baseballStats WHERE teamID = ''WS1'''
      ) = 0
    GROUP BY yearID

Dependency for user-defined scalar functions (Maven, then Gradle):

    <dependency>
      <groupId>org.apache.pinot</groupId>
      <artifactId>pinot-common</artifactId>
      <version>0.5.0</version>
    </dependency>

    include 'org.apache.pinot:pinot-common:0.5.0'

Example scalar function and query:

    //Example Scalar function
    @ScalarFunction
    static String mySubStr(String input, Integer beginIndex) {
      return input.substring(beginIndex);
    }

    SELECT mysubstr(playerName, 4)
    FROM baseballStats

Querying JSON data

To see how JSON data can be queried, assume that we have the following table:

    Table myTable:
      id            INTEGER
      jsoncolumn    JSON

    Table data:
    101,{"name":{"first":"daffy"\,"last":"duck"}\,"score":101\,"data":["a"\,"b"\,"c"\,"d"]}
    102,{"name":{"first":"donald"\,"last":"duck"}\,"score":102\,"data":["a"\,"b"\,"e"\,"f"]}
    103,{"name":{"first":"mickey"\,"last":"mouse"}\,"score":103\,"data":["a"\,"b"\,"g"\,"h"]}
    104,{"name":{"first":"minnie"\,"last":"mouse"}\,"score":104\,"data":["a"\,"b"\,"i"\,"j"]}
    105,{"name":{"first":"goofy"\,"last":"dwag"}\,"score":104\,"data":["a"\,"b"\,"i"\,"j"]}
    106,{"person":{"name":"daffy duck"\,"companies":[{"name":"n1"\,"title":"t1"}\,{"name":"n2"\,"title":"t2"}]}}
    107,{"person":{"name":"scrooge mcduck"\,"companies":[{"name":"n1"\,"title":"t1"}\,{"name":"n2"\,"title":"t2"}]}}

We also assume that "jsoncolumn" has a JSON index on it. Note that the last two rows in the table have a different structure than the rest of the rows. In keeping with the JSON specification, a JSON column can contain any valid JSON document and doesn't need to adhere to a predefined schema. To pull out the entire JSON document for each row, we can run the query below:

    SELECT id, jsoncolumn
    FROM myTable

    id    jsoncolumn
    101   {"name":{"first":"daffy","last":"duck"},"score":101,"data":["a","b","c","d"]}
    102   {"name":{"first":"donald","last":"duck"},"score":102,"data":["a","b","e","f"]}
    103   {"name":{"first":"mickey","last":"mouse"},"score":103,"data":["a","b","g","h"]}
    104   {"name":{"first":"minnie","last":"mouse"},"score":104,"data":["a","b","i","j"]}
    105   {"name":{"first":"goofy","last":"dwag"},"score":104,"data":["a","b","i","j"]}
    106   {"person":{"name":"daffy duck","companies":[{"name":"n1","title":"t1"},{"name":"n2","title":"t2"}]}}
    107   {"person":{"name":"scrooge mcduck","companies":[{"name":"n1","title":"t1"},{"name":"n2","title":"t2"}]}}

To drill down and pull out specific keys within the JSON column, we simply append the JsonPath expression of those keys to the end of the column name:

    SELECT id, jsoncolumn.name.last, jsoncolumn.name.first, jsoncolumn.data[1]
    FROM myTable

    id    jsoncolumn.name.last    jsoncolumn.name.first    jsoncolumn.data[1]
    101   duck                    daffy                    b
    102   duck                    donald                   b
    103   mouse                   mickey                   b
    104   mouse                   minnie                   b
    105   dwag                    goofy                    b
    106   null                    null                     null
    107   null                    null                     null

Note that the last column (jsoncolumn.data[1]) is null for rows with id 106 and 107. This is because these rows have JSON documents that don't have a key with JsonPath jsoncolumn.data[1]. We can filter out these rows:

    SELECT id, jsoncolumn.name.last, jsoncolumn.name.first, jsoncolumn.data[1]
    FROM myTable
    WHERE jsoncolumn.data[1] IS NOT NULL

    id    jsoncolumn.name.last    jsoncolumn.name.first    jsoncolumn.data[1]
    101   duck                    daffy                    b
    102   duck                    donald                   b
    103   mouse                   mickey                   b
    104   mouse                   minnie                   b
    105   dwag                    goofy                    b

Notice that certain last names (duck and mouse, for example) repeat in the data above. We can get a count of each last name by running a GROUP BY query on a JsonPath expression:

    SELECT jsoncolumn.name.last, count(*)
    FROM myTable
    WHERE jsoncolumn.data[1] IS NOT NULL
    GROUP BY jsoncolumn.name.last
    ORDER BY 2 DESC

    jsoncolumn.name.last    count(*)
    mouse                   2
    duck                    2
    dwag                    1

Also, there is numerical information (jsoncolumn.score) embedded within the JSON document. We can extract those numerical values from the JSON data into SQL and sum them up using the query below:

    SELECT jsoncolumn.name.last, sum(jsoncolumn.score)
    FROM myTable
    WHERE jsoncolumn.name.last IS NOT NULL
    GROUP BY jsoncolumn.name.last

    jsoncolumn.name.last    sum(jsoncolumn.score)
    mouse                   207
    duck                    203
    dwag                    104

In short, JSON querying support in Pinot allows you to use a JsonPath expression wherever you can use a column name, with the only difference being that to query a column with data type JSON, you must append a JsonPath expression after the name of the column.

Supported Aggregations

Pinot provides support for aggregations using GROUP BY. You can use the following functions to get the aggregated value.

COUNT: Get the count of rows in a group. Example: COUNT(*)

MIN: Get the minimum value in a group. Example: MIN(playerScore)

MAX: Get the maximum value in a group. Example: MAX(playerScore)

SUM: Get the sum of values in a group. Example: SUM(playerScore)

AVG: Get the average of the values in a group. Example: AVG(playerScore)

MODE: Get the most frequent value in a group. When multiple modes are present, it gives the minimum of all the modes. This behavior can be overridden to get the maximum or the average mode. Examples: MODE(playerScore), MODE(playerScore, 'MIN'), MODE(playerScore, 'MAX'), MODE(playerScore, 'AVG')

MINMAXRANGE: Returns the max - min value in a group. Example: MINMAXRANGE(playerScore)

PERCENTILE(column, N): Returns the Nth percentile of the group, where N is a decimal number between 0 and 100 inclusive. Examples: PERCENTILE(playerScore, 50), PERCENTILE(playerScore, 99.9)

PERCENTILEEST(column, N): Returns the Nth percentile of the group using the Quantile Digest algorithm. Examples: PERCENTILEEST(playerScore, 50), PERCENTILEEST(playerScore, 99.9)

PERCENTILETDIGEST(column, N): Returns the Nth percentile of the group using the T-digest algorithm. Examples: PERCENTILETDIGEST(playerScore, 50), PERCENTILETDIGEST(playerScore, 99.9)

DISTINCT: Returns the distinct row values in a group. Example: DISTINCT(playerName)

DISTINCTCOUNT: Returns the count of distinct row values in a group. Example: DISTINCTCOUNT(playerName)

DISTINCTCOUNTBITMAP: Returns the count of distinct row values in a group. This function is accurate for an INT column, but approximate for other cases where hash codes are used in distinct counting and there may be hash collisions. Example: DISTINCTCOUNTBITMAP(playerName)

DISTINCTCOUNTHLL: Returns an approximate distinct count using HyperLogLog. It also takes an optional second argument to configure the log2m for the HyperLogLog. Example: DISTINCTCOUNTHLL(playerName, 12)

DISTINCTCOUNTRAWHLL: Returns the HLL response serialized as a string. The serialized HLL can be converted back into an HLL and then aggregated with other HLLs. A common use case may be to merge HLL responses from different Pinot tables, or to allow aggregation after client-side batching. Example: DISTINCTCOUNTRAWHLL(playerName)

FASTHLL (deprecated): FASTHLL stores serialized HyperLogLog in String format, which performs worse than DISTINCTCOUNTHLL, which supports serialized HyperLogLog in BYTES (byte array) format. Example: FASTHLL(playerName)

DISTINCTCOUNTTHETASKETCH: See the Cardinality Estimation section.

DISTINCTCOUNTRAWTHETASKETCH: See the Cardinality Estimation section.

SEGMENTPARTITIONEDDISTINCTCOUNT: Returns the count of distinct values of a column when the column is pre-partitioned for each segment, so that there is no common value across different segments. This function calculates the exact count of distinct values within each segment, then simply sums up the results from different segments to get the final result. Example: SEGMENTPARTITIONEDDISTINCTCOUNT(playerName)

Multi-value column functions

The following aggregation functions can be used for multi-value columns:

COUNTMV: Get the count of rows in a group. Example: COUNTMV(playerName)

MINMV: Get the minimum value in a group. Example: MINMV(playerScores)

MAXMV: Get the maximum value in a group. Example: MAXMV(playerScores)

SUMMV: Get the sum of values in a group. Example: SUMMV(playerScores)

AVGMV: Get the avg of values in a group. Example: AVGMV(playerScores)

MINMAXRANGEMV: Returns the max - min value in a group. Example: MINMAXRANGEMV(playerScores)

PERCENTILEMV(column, N): Returns the Nth percentile of the group, where N is a decimal number between 0 and 100 inclusive. Examples: PERCENTILEMV(playerScores, 50), PERCENTILEMV(playerScores, 99.9)

PERCENTILEESTMV(column, N): Returns the Nth percentile of the group using the Quantile Digest algorithm. Examples: PERCENTILEESTMV(playerScores, 50), PERCENTILEESTMV(playerScores, 99.9)

PERCENTILETDIGESTMV(column, N): Returns the Nth percentile of the group using the T-digest algorithm. Examples: PERCENTILETDIGESTMV(playerScores, 50), PERCENTILETDIGESTMV(playerScores, 99.9)

DISTINCTCOUNTMV: Returns the count of distinct row values in a group. Example: DISTINCTCOUNTMV(playerNames)

DISTINCTCOUNTBITMAPMV: Returns the count of distinct row values in a group. This function is accurate for an INT or dictionary-encoded column, but approximate for other cases where hash codes are used in distinct counting and there may be hash collisions. Example: DISTINCTCOUNTBITMAPMV(playerNames)

DISTINCTCOUNTHLLMV: Returns an approximate distinct count using HyperLogLog in a group. Example: DISTINCTCOUNTHLLMV(playerNames)

DISTINCTCOUNTRAWHLLMV: Returns the HLL response serialized as a string. The serialized HLL can be converted back into an HLL and then aggregated with other HLLs. A common use case may be to merge HLL responses from different Pinot tables, or to allow aggregation after client-side batching. Example: DISTINCTCOUNTRAWHLLMV(playerNames)

FASTHLLMV (deprecated): Stores serialized HyperLogLog in String format, which performs worse than DISTINCTCOUNTHLLMV, which supports serialized HyperLogLog in BYTES (byte array) format. Example: FASTHLLMV(playerNames)
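A quick usage sketch combining a few of the aggregations above in one GROUP BY query; the table and column names (baseballStats, teamID, playerScore, playerName) are illustrative:

```sql
SELECT teamID,
       MAX(playerScore),
       AVG(playerScore),
       DISTINCTCOUNTHLL(playerName)
FROM baseballStats
GROUP BY teamID
LIMIT 10
```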

Supported Transformations

This document contains the list of all the transformation functions supported by Pinot SQL.

Math Functions

ADD(col1, col2, col3...): Sum of at least two values.

String Functions

Multiple string functions are supported out of the box from release-0.5.0; see the string function reference below.

DateTime Functions

Date time functions allow you to perform transformations on columns that contain timestamps or dates; see the DateTime function reference below.

JSON Functions

Usage

Note: 'jsonPath' and 'results_type' are literals. Pinot uses single quotes to distinguish them from identifiers.

Note: Transform functions can only be used in Pinot SQL. Scalar functions can be used for column transformation in table ingestion configs.

Examples

The examples below are based on three sample profile JSON documents.

Query 1: Extract string values from the field 'name'.

Query 2: Extract integer values from the field 'age'.

Query 3: Extract Bob's age from the JSON profile.

Query 4: Extract all field keys of the JSON profile.
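As a sketch of the extraction queries above, Pinot's JSONEXTRACTSCALAR and JSONEXTRACTKEY transform functions can be used like this; the table name profiles and its column json are hypothetical:

```sql
-- Query 1-style extraction: pull the string field 'name' out of each document.
SELECT JSONEXTRACTSCALAR(json, '$.name', 'STRING')
FROM profiles

-- Query 4-style extraction: list all field keys of each document.
SELECT JSONEXTRACTKEY(json, '$.*')
FROM profiles
```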

Binary Functions

Multi-value Column Functions

All of the functions mentioned till now only support single-value columns. You can use the following functions to perform operations on multi-value columns.

Advanced Queries

Geospatial Queries

Pinot supports geospatial queries on columns containing text-based geographies. For more details on these queries and how to enable them, see the Geospatial documentation.

Text Queries

Pinot supports pattern matching on text-based columns. Only columns marked as text columns in the table config can be queried using this method. For more details on how to enable pattern matching, see the Text Search documentation.

SUBSTR(col, startIndex, endIndex)

get substring of the input string from startIndex to endIndex. Index begins at 0. Set endIndex to -1 to calculate till the end of the string

SUBSTR(playerName, 1, -1)

SUBSTR(playerName, 1, 4)

CONCAT(col1, col2, separator)

Concatenate the two input strings using the separator

    CONCAT(firstName, lastName, '-')

    TRIM(col)

    trim spaces from both side of the string

    TRIM(playerName)

    LTRIM(col)

    trim spaces from left side of the string

    LTRIM(playerName)

    RTRIM(col)

    trim spaces from right side of the string

    RTRIM(playerName)

    LENGTH(col)

    calculate length of the string

    LENGTH(playerName)

    STRPOS(col, find, N)

find the Nth instance of the find string in the input string. Returns 0 if the input string is empty, and -1 if the Nth instance is not found or the input string is null.

    STRPOS(playerName, 'david', 1)

    STARTSWITH(col, prefix)

returns true if the column starts with the prefix string.

    STARTSWITH(playerName, 'david')

    REPLACE(col, find, substitute)

replace all instances of find with substitute in the input string

    REPLACE(playerName, 'david', 'henry')

    RPAD(col, size, pad)

    string padded from the right side with pad to reach final size

    RPAD(playerName, 20, 'foo')

    LPAD(col, size, pad)

    string padded from the left side with pad to reach final size

    LPAD(playerName, 20, 'foo')

    CODEPOINT(col)

    the Unicode codepoint of the first character of the string

    CODEPOINT(playerName)

    CHR(codepoint)

    the character corresponding to the Unicode codepoint

    CHR(68)
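The STRPOS semantics described above (find the Nth occurrence, return -1 when it does not exist) can be sketched in plain Python. This is an illustration of the documented behavior, not Pinot's implementation; the function name and sample strings are made up for the example.

```python
# Sketch of STRPOS(col, find, N): 0-based index of the Nth occurrence
# of `find` in `s`, or -1 if there are fewer than N occurrences.
def strpos(s: str, find: str, n: int) -> int:
    idx = -1
    for _ in range(n):
        idx = s.find(find, idx + 1)  # search after the previous hit
        if idx == -1:
            return -1
    return idx

print(strpos("david beckham, david silva", "david", 2))  # 15
print(strpos("david beckham, david silva", "zidane", 1))  # -1
```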

ToEpoch<TIME_UNIT>(timeInMillis)

Convert epoch milliseconds to epoch <Time Unit>. Supported <Time Unit>: SECONDS/MINUTES/HOURS/DAYS

ToEpochSeconds(tsInMillis): Converts column tsInMillis value from epoch milliseconds to epoch seconds.

ToEpochDays(tsInMillis): Converts column tsInMillis value from epoch milliseconds to epoch days.

    ToEpoch<TIME_UNIT>Rounded(timeInMillis, bucketSize)

    Convert epoch milliseconds to epoch <Time Unit>, round to nearest rounding bucket(Bucket size is defined in <Time Unit>). Supported <Time Unit>: SECONDS/MINUTES/HOURS/DAYS

ToEpochSecondsRounded(tsInMillis, 10): Converts column tsInMillis value from epoch milliseconds to epoch seconds and rounds down to the 10-second bucket value. E.g. ToEpochSecondsRounded(1613472303000, 10) = 1613472300

ToEpochMinutesRounded(tsInMillis, 1440): Converts column tsInMillis value from epoch milliseconds to epoch minutes and rounds down to the 1-day bucket value. E.g. ToEpochMinutesRounded(1613472303000, 1440) = 26890560

    ToEpoch<TIME_UNIT>Bucket(timeInMillis, bucketSize)

    Convert epoch milliseconds to epoch <Time Unit>, and divided by bucket size(Bucket size is defined in <Time Unit>). Supported <Time Unit>: SECONDS/MINUTES/HOURS/DAYS

ToEpochSecondsBucket(tsInMillis, 10): Converts column tsInMillis value from epoch milliseconds to epoch seconds, then divides by 10 to get the number of 10-second intervals since epoch. E.g.

ToEpochSecondsBucket(1613472303000, 10) = 161347230

ToEpochHoursBucket(tsInMillis, 24): Converts column tsInMillis value from epoch milliseconds to epoch hours, then divides by 24 to get the number of 24-hour intervals since epoch.

    FromEpoch<TIME_UNIT>(timeIn<Time_UNIT>)

    Convert epoch <Time Unit> to epoch milliseconds. Supported <Time Unit>: SECONDS/MINUTES/HOURS/DAYS

FromEpochSeconds(tsInSeconds): Converts column tsInSeconds value from epoch seconds to epoch milliseconds. E.g.

    FromEpochSeconds(1613472303) = 1613472303000

    FromEpoch<TIME_UNIT>Bucket(timeIn<Time_UNIT>, bucketSizeIn<Time_UNIT>)

    Convert epoch <Bucket Size><Time Unit> to epoch milliseconds. E.g. 10 seconds since epoch or 5 minutes since Epoch. Supported <Time Unit>: SECONDS/MINUTES/HOURS/DAYS

FromEpochSecondsBucket(tsInSeconds, 10): Converts column tsInSeconds value from 10-second intervals since epoch to epoch milliseconds. E.g.

FromEpochSecondsBucket(161347231, 10) = 1613472310000

    ToDateTime(timeInMillis, pattern[, timezoneId])

    Convert epoch millis value to DateTime string represented by pattern. Time zone will be set to UTC if timezoneId is not specified.

    ToDateTime(tsInMillis, 'yyyy-MM-dd') converts tsInMillis value to date time pattern yyyy-MM-dd

    ToDateTime(tsInMillis, 'yyyy-MM-dd ZZZ', 'America/Los_Angeles') converts tsInMillis value to date time pattern yyyy-MM-dd ZZZ in America/Los_Angeles time zone

    FromDateTime(dateTimeString, pattern)

    Convert DateTime string represented by pattern to epoch millis.

    FromDateTime(dateTime, 'yyyy-MM-dd')converts dateTime string value to millis epoch value

    round(timeValue, bucketSize)

    Round the given time value to nearest bucket start value.

round(tsInSeconds, 60) rounds an epoch-seconds value to the start value of the 60-second bucket it belongs to. E.g. round(161347231, 60) = 161347200

    now()

    Return current time as epoch millis

Typically used in a predicate to filter on timestamp for recent data, e.g. to filter data from the most recent day (86400 seconds): WHERE tsInMillis > now() - 86400000

    timezoneHour(timeZoneId)

    Returns the hour of the time zone offset.

    timezoneMinute(timeZoneId)

    Returns the minute of the time zone offset.

    year(tsInMillis)

    Returns the year from the given epoch millis in UTC timezone.

    year(tsInMillis, timeZoneId)

    Returns the year from the given epoch millis and timezone id.

    yearOfWeek(tsInMillis)

Returns the year of the ISO week from the given epoch millis in UTC timezone. Alias yow is also supported.

    yearOfWeek(tsInMillis, timeZoneId)

Returns the year of the ISO week from the given epoch millis and timezone id. Alias yow is also supported.

    quarter(tsInMillis)

    Returns the quarter of the year from the given epoch millis in UTC timezone. The value ranges from 1 to 4.

    quarter(tsInMillis, timeZoneId)

    Returns the quarter of the year from the given epoch millis and timezone id. The value ranges from 1 to 4.

    month(tsInMillis)

    Returns the month of the year from the given epoch millis in UTC timezone. The value ranges from 1 to 12.

    month(tsInMillis, timeZoneId)

    Returns the month of the year from the given epoch millis and timezone id. The value ranges from 1 to 12.

    week(tsInMillis)

    Returns the ISO week of the year from the given epoch millis in UTC timezone. The value ranges from 1 to 53. Alias weekOfYear is also supported.

    week(tsInMillis, timeZoneId)

    Returns the ISO week of the year from the given epoch millis and timezone id. The value ranges from 1 to 53. Alias weekOfYear is also supported.

    dayOfYear(tsInMillis)

    Returns the day of the year from the given epoch millis in UTC timezone. The value ranges from 1 to 366. Alias doy is also supported.

    dayOfYear(tsInMillis, timeZoneId)

    Returns the day of the year from the given epoch millis and timezone id. The value ranges from 1 to 366. Alias doy is also supported.

    day(tsInMillis)

    Returns the day of the month from the given epoch millis in UTC timezone. The value ranges from 1 to 31. Alias dayOfMonth is also supported.

    day(tsInMillis, timeZoneId)

    Returns the day of the month from the given epoch millis and timezone id. The value ranges from 1 to 31. Alias dayOfMonth is also supported.

    dayOfWeek(tsInMillis)

    Returns the day of the week from the given epoch millis in UTC timezone. The value ranges from 1(Monday) to 7(Sunday). Alias dow is also supported.

    dayOfWeek(tsInMillis, timeZoneId)

    Returns the day of the week from the given epoch millis and timezone id. The value ranges from 1(Monday) to 7(Sunday). Alias dow is also supported.

    hour(tsInMillis)

    Returns the hour of the day from the given epoch millis in UTC timezone. The value ranges from 0 to 23.

    hour(tsInMillis, timeZoneId)

    Returns the hour of the day from the given epoch millis and timezone id. The value ranges from 0 to 23.

    minute(tsInMillis)

    Returns the minute of the hour from the given epoch millis in UTC timezone. The value ranges from 0 to 59.

    minute(tsInMillis, timeZoneId)

    Returns the minute of the hour from the given epoch millis and timezone id. The value ranges from 0 to 59.

    second(tsInMillis)

    Returns the second of the minute from the given epoch millis in UTC timezone. The value ranges from 0 to 59.

    second(tsInMillis, timeZoneId)

    Returns the second of the minute from the given epoch millis and timezone id. The value ranges from 0 to 59.

    millisecond(tsInMillis)

    Returns the millisecond of the second from the given epoch millis in UTC timezone. The value ranges from 0 to 999.

    millisecond(tsInMillis, timeZoneId)

    Returns the millisecond of the second from the given epoch millis and timezone id. The value ranges from 0 to 999.
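A minimal Python sketch of the epoch conversion and bucketing semantics described above, assuming floor (integer) arithmetic throughout. The helper names mirror the Pinot functions but are illustrative only, not Pinot source code.

```python
# Milliseconds per supported <Time Unit>.
MILLIS = {"SECONDS": 1_000, "MINUTES": 60_000, "HOURS": 3_600_000, "DAYS": 86_400_000}

def to_epoch(ts_millis, unit):                  # ToEpoch<TIME_UNIT>
    return ts_millis // MILLIS[unit]

def to_epoch_rounded(ts_millis, unit, bucket):  # ToEpoch<TIME_UNIT>Rounded
    v = to_epoch(ts_millis, unit)
    return v - v % bucket                       # round down to bucket start

def to_epoch_bucket(ts_millis, unit, bucket):   # ToEpoch<TIME_UNIT>Bucket
    return to_epoch(ts_millis, unit) // bucket  # number of buckets since epoch

def from_epoch_bucket(v, unit, bucket):         # FromEpoch<TIME_UNIT>Bucket
    return v * bucket * MILLIS[unit]

print(to_epoch_rounded(1613472303000, "SECONDS", 10))    # 1613472300
print(to_epoch_rounded(1613472303000, "MINUTES", 1440))  # 26890560
print(to_epoch_bucket(1613472303000, "SECONDS", 10))     # 161347230
print(from_epoch_bucket(161347231, "SECONDS", 10))       # 1613472310000
```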


    JSONPATH(jsonField, 'jsonPath')

    Scalar

Extracts the object value from jsonField based on 'jsonPath'; the result type is inferred from the JSON value. Cannot be used in queries because the data type is not specified.

    JSONPATHLONG(jsonField, 'jsonPath', [defaultValue])

    Scalar

Extracts the Long value from jsonField based on 'jsonPath'; uses the optional defaultValue for null or parsing errors.

    JSONPATHDOUBLE(jsonField, 'jsonPath', [defaultValue])

    Scalar

Extracts the Double value from jsonField based on 'jsonPath'; uses the optional defaultValue for null or parsing errors.

    JSONPATHSTRING(jsonField, 'jsonPath', [defaultValue])

    Scalar

Extracts the String value from jsonField based on 'jsonPath'; uses the optional defaultValue for null or parsing errors.

    JSONPATHARRAY(jsonField, 'jsonPath')

    Scalar

Extracts an array from jsonField based on 'jsonPath'; the result type is inferred from the JSON value. Cannot be used in queries because the data type is not specified.

    JSONPATHARRAYDEFAULTEMPTY(jsonField, 'jsonPath')

    Scalar

Extracts an array from jsonField based on 'jsonPath'; the result type is inferred from the JSON value. Returns an empty array for null or parsing errors. Cannot be used in queries because the data type is not specified.

  • JSONEXTRACTSCALAR(profile_json_str, '$.name', 'STRING') is valid.

  • JSONEXTRACTSCALAR(profile_json_str, "$.name", "STRING") is invalid.


MD5(bytesCol)

Return MD5 digest of binary column(bytes type) as hex string

MD5(rawData)

    Sum of at least two values

    ADD(score_maths, score_science, score_history)

    SUB(col1, col2)

    Difference between two values

    SUB(total_score, score_science)

    MULT(col1, col2, col3...)

    Product of at least two values

MULT(score_maths, score_science, score_history)

    DIV(col1, col2)

    Quotient of two values

DIV(total_score, total_subjects)

    MOD(col1, col2)

    Modulo of two values

    MOD(total_score, total_subjects)

    ABS(col1)

    Absolute of a value

    ABS(score)

    CEIL(col1)

    Rounded up to the nearest integer.

    CEIL(percentage)

    FLOOR(col1)

    Rounded down to the nearest integer.

    FLOOR(percentage)

    EXP(col1)

    Euler’s number(e) raised to the power of col.

    EXP(age)

    LN(col1)

    Natural log of value i.e. ln(col1)

    LN(age)

    SQRT(col1)

    Square root of a value

    SQRT(height)

    UPPER(col)

    convert string to upper case

    UPPER(playerName)

    LOWER(col)

    convert string to lower case

    LOWER(playerName)

    REVERSE(col)

    reverse the string

    REVERSE(playerName)

    TIMECONVERT

    (col, fromUnit, toUnit)

Converts the value into another time unit. The column should be an epoch timestamp. Supported units are DAYS HOURS MINUTES SECONDS MILLISECONDS MICROSECONDS NANOSECONDS

TIMECONVERT(time, 'MILLISECONDS', 'SECONDS') This expression converts the value of column time (taken to be in milliseconds) to seconds, i.e. the nearest second value that is not greater than the value of the time column.

    DATETIMECONVERT

    (columnName, inputFormat, outputFormat, outputGranularity)

Takes 4 arguments: converts the value into another date-time format, and buckets time based on the given time granularity. Note that for weeks/months/quarters/years, use the DATETRUNC function instead.

    The format is expressed as <time size>:<time unit>:<time format>:<pattern> where,

    time size - size of the time unit eg: 1, 10

    time unit - DAYS HOURS MINUTES SECONDS MILLISECONDS MICROSECONDS NANOSECONDS

    time format - EPOCH or SIMPLE_DATE_FORMAT

pattern - this is defined in case of SIMPLE_DATE_FORMAT, e.g. yyyy-MM-dd. A specific timezone can be passed using tz(timezone). The timezone can be given in long or short string format, e.g. Asia/Kolkata or PDT

granularity - specified in the format <time size>:<time unit>

    • Date from hoursSinceEpoch to daysSinceEpoch and bucket it to 1 day granularity DATETIMECONVERT(Date, '1:HOURS:EPOCH', '1:DAYS:EPOCH', '1:DAYS')

    • Date to 15 minutes granularity DATETIMECONVERT(Date, '1:MILLISECONDS:EPOCH', '1:MILLISECONDS:EPOCH', '15:MINUTES')

    • Date from hoursSinceEpoch to format yyyyMdd and bucket it to 1 days granularity DATETIMECONVERT(Date, '1:HOURS:EPOCH', '1:DAYS:SIMPLE_DATE_FORMAT:yyyyMMdd', '1:DAYS')

    • Date from milliseconds to format yyyyMdd in timezone PST DATETIMECONVERT(Date, '1:MILLISECONDS:EPOCH', '1:DAYS:SIMPLE_DATE_FORMAT:yyyyMMdd tz(America/Los_Angeles)', '1:DAYS')
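The output-granularity bucketing that DATETIMECONVERT applies for EPOCH formats can be sketched as follows, assuming the bucket start is the floor of the value to the granularity. This is an illustration of the described behavior, not Pinot code.

```python
# Bucket an epoch-millis value to <size>:<unit> granularity,
# e.g. size=15, unit_millis=60_000 for '15:MINUTES'.
def bucket_millis(ts_millis: int, size: int, unit_millis: int) -> int:
    granularity = size * unit_millis
    return ts_millis - ts_millis % granularity  # start of the bucket

# '1:MILLISECONDS:EPOCH' input bucketed to '15:MINUTES' granularity:
print(bucket_millis(1613472303000, 15, 60_000))  # 1613472300000
```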

    DATETRUNC

(Presto) SQL compatible date truncation, equivalent to the Presto function date_trunc.

Takes at least 3 and up to 5 arguments; converts the value into seconds since UTC epoch at a specified output granularity, bucketed on a unit in a specified timezone.

DATETRUNC('week', time_in_seconds, 'SECONDS') This expression converts the column time_in_seconds, a long containing seconds since UTC epoch, to the value truncated at WEEK (a week starts at Monday UTC midnight). The output is a long containing seconds since UTC epoch.

DATETRUNC('quarter', DIV(time_milliseconds, 1000), 'SECONDS', 'America/Los_Angeles', 'HOURS') This expression converts the expression time_milliseconds/1000 into hours that are truncated to QUARTER in the Los Angeles time zone (where a quarter begins on 1/1, 4/1, 7/1, 10/1 in Los Angeles time). The output is expressed as hours since UTC epoch (note that the output is not in the Los Angeles timezone).
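The week truncation above can be sketched with Python's standard datetime module: take an epoch-seconds value and return the epoch-seconds value of Monday 00:00 UTC of its week. This is an illustration of the documented semantics, not the Pinot/Presto implementation.

```python
from datetime import datetime, timedelta, timezone

# Sketch of DATETRUNC('week', ts, 'SECONDS'): truncate an epoch-seconds
# value to the Monday UTC midnight of the week it falls in.
def datetrunc_week_seconds(ts_seconds: int) -> int:
    dt = datetime.fromtimestamp(ts_seconds, tz=timezone.utc)
    monday = (dt - timedelta(days=dt.weekday())).replace(
        hour=0, minute=0, second=0, microsecond=0)
    return int(monday.timestamp())

# 2021-02-16 10:45:03 UTC (a Tuesday) truncates to 2021-02-15 00:00 UTC:
print(datetrunc_week_seconds(1613472303))  # 1613347200
```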

    Function

    Type

    Description

    JSONEXTRACTSCALAR

    (jsonField, 'jsonPath', 'resultsType', [defaultValue])

    Transform

Evaluates the 'jsonPath' on jsonField and returns the result as type 'resultsType'; uses the optional defaultValue for null or parsing errors.

    JSONEXTRACTKEY

    (jsonField, 'jsonPath')

    Transform

Extracts all matched JSON field keys based on 'jsonPath' into a STRING_ARRAY.

    TOJSONMAPSTR(map)

    Scalar

    Convert map to JSON String

JSONFORMAT(object)

Scalar

Convert object to JSON String

    Arguments

    Description

    jsonField

    An Identifier/Expression contains JSON documents.

    'jsonPath'

Follows JsonPath Syntax to read values from JSON documents.

    'results_type'

One of the Pinot supported data types: INT, LONG, FLOAT, DOUBLE, BOOLEAN, TIMESTAMP, STRING,

    INT_ARRAY, LONG_ARRAY, FLOAT_ARRAY, DOUBLE_ARRAY, STRING_ARRAY.

    JSONPATH(myJsonRecord, '$.name')

    "Pete"

    JSONPATH(myJsonRecord, '$.age')

    24

    JSONPATHSTRING(myJsonRecord, '$.age')

    "24"

    JSONPATHARRAY(myJsonRecord, '$.subjects[*].name')

    ["maths", "english"]

JSONPATHARRAY(myJsonRecord, '$.subjects[*].score')

[90, 70]

JSONPATHARRAY(myJsonRecord, '$.subjects[*].homework_grades[1]')

[85, 65]

    SHA(bytesCol)

    Return SHA-1 digest of binary column(bytes type) as hex string

    SHA(rawData)

    SHA256(bytesCol)

    Return SHA-256 digest of binary column(bytes type) as hex string

    SHA256(rawData)

    SHA512(bytesCol)

    Return SHA-512 digest of binary column(bytes type) as hex string

    SHA512(rawData)
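The digest functions above map a bytes column to a hex string. The same transformation can be reproduced with Python's standard hashlib module; rawData here is an arbitrary sample byte string, for illustration only.

```python
import hashlib

raw = b"hello pinot"  # stands in for the rawData bytes column

print(hashlib.md5(raw).hexdigest())     # like MD5(rawData), 32 hex chars
print(hashlib.sha1(raw).hexdigest())    # like SHA(rawData), 40 hex chars
print(hashlib.sha256(raw).hexdigest())  # like SHA256(rawData), 64 hex chars
print(hashlib.sha512(raw).hexdigest())  # like SHA512(rawData), 128 hex chars
```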

    ARRAYLENGTH

    Returns the length of a multi-value column

    MAP_VALUE

    Select the value for a key from Map stored in Pinot.

    MAP_VALUE(mapColumn, 'myKey', valueColumn)

    VALUEIN

Takes at least 2 arguments, where the first argument is a multi-valued column and the following arguments are constant values. The transform function filters values from the multi-valued column against the given constant values. VALUEIN is especially useful when the same multi-valued column is both a filtering column and a grouping column.

    VALUEIN(mvColumn, 3, 5, 15)
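Per row, VALUEIN(mvColumn, 3, 5, 15) keeps only the multi-value entries that match one of the constants. A minimal sketch of that row-level behavior (the function name and sample row are illustrative):

```python
# Sketch of VALUEIN on one row of a multi-valued column.
def value_in(mv_row, *constants):
    allowed = set(constants)
    return [v for v in mv_row if v in allowed]

print(value_in([1, 3, 7, 15, 20], 3, 5, 15))  # [3, 15]
```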


    {
      "name" : "Bob",
      "age" : 37,
      "gender": "male",
      "location": "San Francisco"
    },{
      "name" : "Alice",
      "age" : 25,
      "gender": "female",
      "location": "New York"
    },{
      "name" : "Mia",
      "age" : 18,
      "gender": "female",
      "location": "Chicago"
    }
    SELECT
        JSONEXTRACTSCALAR(profile_json_str, '$.name', 'STRING')
    FROM
        myTable
    ["Bob", "Alice", "Mia"]
    SELECT
        JSONEXTRACTSCALAR(profile_json_str, '$.age', 'INT')
    FROM
        myTable
    [37, 25, 18]
    SELECT
        JSONEXTRACTSCALAR(myMapStr,'$.age','INT')
    FROM
        myTable
    WHERE
        JSONEXTRACTSCALAR(myMapStr,'$.name','STRING') = 'Bob'
    [37]
    SELECT
        JSONEXTRACTKEY(myMapStr,'$.*')
    FROM
        myTable
    ["name", "age", "gender", "location"]
    {
            "name": "Pete",
            "age": 24,
            "subjects": [{
                            "name": "maths",
                            "homework_grades": [80, 85, 90, 95, 100],
                            "grade": "A",
                            "score": 90
                    },
                    {
                            "name": "english",
                            "homework_grades": [60, 65, 70, 85, 90],
                            "grade": "B",
                            "score": 70
                    }
            ]
    }
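The JSONPATH expression/value pairs listed for this record can be reproduced with plain Python over the same document, to confirm the expected values. An illustration of the JsonPath results, not a JsonPath engine.

```python
# The sample myJsonRecord document from above.
record = {
    "name": "Pete",
    "age": 24,
    "subjects": [
        {"name": "maths", "homework_grades": [80, 85, 90, 95, 100], "grade": "A", "score": 90},
        {"name": "english", "homework_grades": [60, 65, 70, 85, 90], "grade": "B", "score": 70},
    ],
}

print(record["name"])                                          # $.name -> "Pete"
print([s["name"] for s in record["subjects"]])                 # $.subjects[*].name
print([s["score"] for s in record["subjects"]])                # $.subjects[*].score
print([s["homework_grades"][1] for s in record["subjects"]])   # $.subjects[*].homework_grades[1]
```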