DISTINCTCOUNTSMARTHLL
This section contains reference documentation for the DISTINCT_COUNT_SMART_HLL function.
Signature
DISTINCT_COUNT_SMART_HLL(col[, params])
col (required): Name of the column to aggregate on.
params (optional): Semicolon-separated parameter key-value pairs:
  threshold: The threshold to convert the value set into a HyperLogLog (default 100_000).
  log2m: log2m for the HyperLogLog (default 12).
Example:
DISTINCT_COUNT_SMART_HLL(col, 'threshold=10000;log2m=8')
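The params argument is a single string of semicolon-separated key=value pairs. As a minimal sketch of how a client might assemble that string, the Python helpers below build the expression shown above. The names smart_hll_params and smart_hll_expr are hypothetical and are not part of any Pinot client library.

```python
def smart_hll_params(threshold=None, log2m=None):
    """Build the optional params string, e.g. 'threshold=10000;log2m=8'.

    Hypothetical helper for illustration only. Keys left as None are
    omitted so that the function's own defaults apply.
    """
    pairs = []
    if threshold is not None:
        pairs.append(f"threshold={int(threshold)}")
    if log2m is not None:
        pairs.append(f"log2m={int(log2m)}")
    return ";".join(pairs)


def smart_hll_expr(col, threshold=None, log2m=None):
    """Return the aggregation expression as it would appear in a query."""
    params = smart_hll_params(threshold, log2m)
    if params:
        return f"DISTINCT_COUNT_SMART_HLL({col}, '{params}')"
    return f"DISTINCT_COUNT_SMART_HLL({col})"


print(smart_hll_expr("col", threshold=10000, log2m=8))
# prints: DISTINCT_COUNT_SMART_HLL(col, 'threshold=10000;log2m=8')
```

Leaving both parameters unset yields the one-argument form, which uses the documented defaults.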
Usage Examples
These examples are based on the Batch Quick Start.
DISTINCTCOUNTSMARTHLL considerations
DISTINCTCOUNTHLL() is faster than DISTINCTCOUNT() if data is pre-aggregated at ingestion or aggregated at a server with enough records. This performance improvement increases when comparing large datasets.
If very few records are pre-aggregated, DISTINCTCOUNTHLL() will not be as fast as DISTINCTCOUNT() because the serialized HLL size is larger than sending individual values.
DISTINCTCOUNTHLLPLUS() provides more precise results than DISTINCTCOUNTHLL() with the same performance.
DISTINCTCOUNTSMARTHLL() automatically shifts to HLL when reaching a threshold, and comes with some overhead.
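The threshold behavior described above can be illustrated with a small Python sketch: values are kept in an exact set until the set grows past the threshold, then folded into HyperLogLog registers. This is not Pinot's implementation. The class name SmartDistinctCounter, the choice of hash, the exact switch condition, and the simplified HLL estimator are all assumptions made for illustration.

```python
import hashlib
import math


class SmartDistinctCounter:
    """Illustrative only: exact set below the threshold, HLL above it."""

    def __init__(self, threshold=100_000, log2m=12):
        self.threshold = threshold
        self.log2m = log2m
        self.values = set()    # exact mode storage
        self.registers = None  # allocated when switching to HLL mode

    def is_approximate(self):
        return self.registers is not None

    @staticmethod
    def _hash64(value):
        # Assumption: any well-mixed 64-bit hash works for the sketch.
        digest = hashlib.blake2b(repr(value).encode("utf-8"), digest_size=8).digest()
        return int.from_bytes(digest, "big")

    def _offer_hll(self, value):
        h = self._hash64(value)
        width = 64 - self.log2m
        index = h >> width               # top log2m bits pick a register
        rest = h & ((1 << width) - 1)    # remaining bits give the rank
        rank = width - rest.bit_length() + 1  # 1-based position of first 1-bit
        self.registers[index] = max(self.registers[index], rank)

    def add(self, value):
        if self.registers is not None:
            self._offer_hll(value)
            return
        self.values.add(value)
        if len(self.values) > self.threshold:
            # Switch: replay the exact set into HLL registers, then drop it.
            self.registers = [0] * (1 << self.log2m)
            for v in self.values:
                self._offer_hll(v)
            self.values = None

    def count(self):
        if self.registers is None:
            return len(self.values)
        m = len(self.registers)
        alpha = 0.7213 / (1 + 1.079 / m)  # standard constant, valid for m >= 128
        raw = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * m and zeros:
            # Small-range correction (linear counting).
            return round(m * math.log(m / zeros))
        return round(raw)


counter = SmartDistinctCounter(threshold=1_000, log2m=12)
for i in range(50_000):
    counter.add(f"user-{i}")
print(counter.is_approximate(), counter.count())
```

Below the threshold the count is exact. Above it the result is an estimate whose accuracy depends on log2m, and the one-time conversion is part of the overhead mentioned above.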

