it.unimi.dsi:fastutil:8.2.3 to hold all the unique values.
com.clearspring.analytics:stream:2.7.0 as the data structure to hold intermediate results.
org.apache.datasketches:datasketches-java:1.2.0-incubating to perform distinct counting as well as evaluating set operations.
lhs <op> rhs which are applied on rows selected by the where clause. During intermediate sketch aggregation, sketches from the thetaSketchColumn that satisfy these predicates are unionized individually. For example, all filtered rows that match country=USA are unionized into a single sketch. Complex predicates created by combining individual predicates with AND/OR are supported.
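The per-predicate union described above can be sketched as follows. This is an illustration only: exact Python sets stand in for theta sketches, the rows are assumed to have already been selected by the where clause, and the row layout, the `predicates` mapping, and `union_per_predicate` are hypothetical names, not part of any library.

```python
# Illustration only: exact Python sets stand in for theta sketches.
# Rows are assumed to be already filtered by the where clause.
rows = [
    {"country": "USA", "state": "CA", "device": "mobile", "sketch": {1, 2, 3}},
    {"country": "USA", "state": "NY", "device": "desktop", "sketch": {3, 4}},
    {"country": "CAN", "state": "ON", "device": "mobile", "sketch": {5, 6}},
    {"country": "USA", "state": "CA", "device": "desktop", "sketch": {2, 7}},
]

# Hypothetical predicates: one simple, one built by combining two with AND.
predicates = {
    "country=USA": lambda r: r["country"] == "USA",
    "country=USA AND state=CA": lambda r: r["country"] == "USA" and r["state"] == "CA",
}


def union_per_predicate(rows, predicates):
    """Keep one running union per predicate, fed by every row that matches it."""
    merged = {name: set() for name in predicates}
    for row in rows:
        for name, matches in predicates.items():
            if matches(row):
                merged[name] |= row["sketch"]
    return merged


per_predicate = union_per_predicate(rows, predicates)
```

Each predicate ends up with its own intermediate result, so a row can contribute to several of them at once.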
SET_DIFF, SET_UNION, SET_INTERSECT, where SET_DIFF requires two arguments and SET_UNION/SET_INTERSECT also accept more than two arguments.
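A minimal sketch of these three operations and their argument counts, again using exact Python sets in place of theta sketches. The function name `apply_set_op` is hypothetical, and the lower bound of two arguments for SET_UNION and SET_INTERSECT is an assumption made for this example.

```python
import operator
from functools import reduce


def apply_set_op(op, *sketches):
    """Apply a named set operation to exact sets standing in for sketches."""
    if op == "SET_DIFF":
        # SET_DIFF is binary: elements of the first argument not in the second.
        if len(sketches) != 2:
            raise ValueError("SET_DIFF requires exactly two arguments")
        return sketches[0] - sketches[1]
    if len(sketches) < 2:
        # Assumption for this sketch: the n-ary operations need at least two inputs.
        raise ValueError(f"{op} requires at least two arguments")
    if op == "SET_UNION":
        return reduce(operator.or_, sketches)
    if op == "SET_INTERSECT":
        return reduce(operator.and_, sketches)
    raise ValueError(f"unknown set operation: {op}")


a, b, c = {1, 2, 3}, {2, 3, 4}, {3, 5}
union_abc = apply_set_op("SET_UNION", a, b, c)
intersect_abc = apply_set_op("SET_INTERSECT", a, b, c)
diff_ab = apply_set_op("SET_DIFF", a, b)
```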
where clause is responsible for identifying the matching rows. Note that the where clause can be completely independent of the postAggregationExpression. Once matching rows are identified, each server unionizes all the sketches that match the individual predicates, i.e. device='mobile' in this case. Once the broker receives the intermediate sketches for each of these individual predicates from all servers, it performs the final aggregation by evaluating the postAggregationExpression and returns the final cardinality of the resulting sketch.
select distinctCountThetaSketch(sketchCol, 'nominalEntries=1024', 'country'=''USA'' AND 'state'=''CA'', 'device'=''mobile'', 'SET_INTERSECT($1, $2)') from table where country = 'USA' or device = 'mobile...'
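The broker-side step can be sketched roughly as below. This is not Pinot's implementation: exact Python sets replace theta sketches, `server_results`, `merge_servers`, and `evaluate` are hypothetical names, `$1` and `$2` are assumed to refer to the predicates in the order they appear in the query, and only a single non-nested expression is handled.

```python
import re

# Hypothetical intermediate results: each server returns one "sketch" per
# predicate, in the order the predicates appear in the query ($1, $2).
server_results = [
    [{1, 2}, {2, 3}],  # server A
    [{4, 5}, {5, 1}],  # server B
]


def merge_servers(server_results):
    """Union the per-predicate results coming from all servers."""
    merged = [set() for _ in server_results[0]]
    for result in server_results:
        for i, sketch in enumerate(result):
            merged[i] |= sketch
    return merged


def evaluate(expression, sketches):
    """Evaluate one flat expression such as 'SET_INTERSECT($1, $2)'."""
    match = re.fullmatch(
        r"\s*(SET_UNION|SET_INTERSECT|SET_DIFF)\((.*)\)\s*", expression
    )
    if match is None:
        raise ValueError(f"unsupported expression: {expression}")
    op, arg_text = match.groups()
    # "$1" refers to the first sketch, "$2" to the second, and so on.
    args = [sketches[int(tok.strip()[1:]) - 1] for tok in arg_text.split(",")]
    if op == "SET_UNION":
        result = set().union(*args)
    elif op == "SET_INTERSECT":
        result = set(args[0]).intersection(*args[1:])
    else:
        if len(args) != 2:
            raise ValueError("SET_DIFF requires exactly two arguments")
        result = args[0] - args[1]
    return len(result)  # the final cardinality


merged = merge_servers(server_results)
cardinality = evaluate("SET_INTERSECT($1, $2)", merged)
```

In this toy data, element 1 matches the first predicate on server A and the second on server B, so it is only counted in the intersection because the per-predicate results are merged across servers before the expression is evaluated.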