This section contains reference documentation for the DISTINCTCOUNTRAWTHETASKETCH function.
The Theta Sketch framework enables set operations over a stream of data, and can also be used for cardinality estimation. Pinot leverages the Sketch class and its extensions from the library
org.apache.datasketches:datasketches-java:1.2.0-incubating to perform distinct counting and to evaluate set operations.
DISTINCTCOUNTRAWTHETASKETCH(<thetaSketchColumn>, <thetaSketchParams>, predicate1, predicate2..., postAggregationExpressionToEvaluate) -> HexEncoded
thetaSketchColumn (required): Name of the column to aggregate on.
thetaSketchParams (required): Parameters for constructing the intermediate theta-sketches.
- Currently, the only supported parameter is nominalEntries (defaults to 4096).
predicates (optional): Individual predicates of the form lhs <op> rhs, applied to the rows selected by the where clause. During intermediate sketch aggregation, the sketches from thetaSketchColumn whose rows satisfy a given predicate are unioned into one sketch per predicate. For example, all filtered rows that match country=USA are unioned into a single sketch. Complex predicates built by combining individual predicates with AND/OR are supported.
postAggregationExpressionToEvaluate (required): The set operation to perform on the individual intermediate sketches for each of the predicates. Currently supported operations are SET_DIFF, SET_UNION, and SET_INTERSECT, where SET_DIFF requires two arguments and SET_UNION and SET_INTERSECT allow more than two arguments.
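To make the roles of the predicates and the post-aggregation expression concrete, here is a minimal Python sketch that mimics the flow with exact Python sets in place of theta sketches. The rows, the column names, and the helper functions (collect, set_union, set_intersect, set_diff) are hypothetical and exist only for illustration; Pinot operates on approximate sketches, not exact sets.

```python
# Hypothetical rows that have already passed the where clause.
rows = [
    {"userId": "u1", "country": "USA", "device": "mobile"},
    {"userId": "u2", "country": "USA", "device": "desktop"},
    {"userId": "u3", "country": "CAN", "device": "mobile"},
    {"userId": "u1", "country": "USA", "device": "mobile"},  # duplicate user
    {"userId": "u4", "country": "USA", "device": "mobile"},
]


def collect(rows, column, predicate):
    """Union the column values of every row that satisfies the predicate."""
    return {row[column] for row in rows if predicate(row)}


def set_union(*sets):
    """Stand-in for SET_UNION: accepts two or more sets."""
    return set().union(*sets)


def set_intersect(first, *rest):
    """Stand-in for SET_INTERSECT: accepts two or more sets."""
    return first.intersection(*rest)


def set_diff(a, b):
    """Stand-in for SET_DIFF: takes exactly two sets."""
    return a - b


# One intermediate set per predicate.
usa = collect(rows, "userId", lambda r: r["country"] == "USA")
mobile = collect(rows, "userId", lambda r: r["device"] == "mobile")

# The post-aggregation step combines the per-predicate sets.
usa_and_mobile = set_intersect(usa, mobile)
usa_or_mobile = set_union(usa, mobile)
usa_not_mobile = set_diff(usa, mobile)

print(len(usa_and_mobile), len(usa_or_mobile), len(usa_not_mobile))
```

Each predicate yields its own intermediate set, and the final expression only combines those sets; it never revisits the rows.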
select distinctCountRawThetaSketch(teamID) AS value
from baseballStats

select distinctCountRawThetaSketch(teamID, 'nominalEntries=10') AS value
from baseballStats
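The nominalEntries parameter bounds how many hash values a sketch retains, which trades memory for accuracy. The following is a minimal Python sketch of a K-Minimum-Values (KMV) estimator, the simpler scheme that the Theta Sketch framework generalizes. It is not the DataSketches implementation: the KmvSketch class, the hash01 helper, and the choice of SHA-256 are assumptions made purely for illustration.

```python
import hashlib
import heapq


def hash01(value):
    """Map a value to a deterministic, roughly uniform float in [0, 1)."""
    digest = hashlib.sha256(str(value).encode("utf-8")).digest()
    # Keep 53 bits so the result is exactly representable as a float.
    return (int.from_bytes(digest[:8], "big") >> 11) / 2**53


class KmvSketch:
    """Illustrative KMV sketch: retains only the k smallest distinct hashes."""

    def __init__(self, nominal_entries=4096):
        self.k = nominal_entries
        self._heap = []        # max-heap (via negation) of the k smallest hashes
        self._members = set()  # hashes currently retained, for de-duplication

    def update(self, value):
        h = hash01(value)
        if h in self._members:
            return  # already retained, duplicates do not change the sketch
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, -h)
            self._members.add(h)
        elif h < -self._heap[0]:
            # New hash is smaller than the largest retained one: swap them.
            evicted = -heapq.heappushpop(self._heap, -h)
            self._members.discard(evicted)
            self._members.add(h)

    def estimate(self):
        if len(self._heap) < self.k:
            # Fewer than k distinct hashes seen, so the count is exact.
            return float(len(self._heap))
        kth_smallest = -self._heap[0]
        return (self.k - 1) / kth_smallest


small = KmvSketch(nominal_entries=4096)
for i in range(100):
    small.update(f"team-{i}")
    small.update(f"team-{i}")  # repeated values are ignored

large = KmvSketch(nominal_entries=4096)
for i in range(100_000):
    large.update(i)

print(small.estimate(), round(large.estimate()))
```

In this simplified model the relative error shrinks roughly as 1/sqrt(nominalEntries), so a very small value such as 10 gives a much noisier estimate than the default of 4096 once the number of distinct values exceeds it.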
We can also provide predicates and a post aggregation expression to compute more complicated cardinalities:
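select distinctCountRawThetaSketch(
  yearID,
  'nominalEntries=4096',
  'teamID = ''SFN'' AND numberOfGames=28 AND homeRuns=1',
  'teamID = ''CHN'' AND numberOfGames=28 AND homeRuns=1',
  'SET_INTERSECT($1, $2)'
) AS value
from baseballStats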