DISTINCTCOUNTSMARTHLL

This section contains reference documentation for the DISTINCT_COUNT_SMART_HLL function.

Signature

DISTINCT_COUNT_SMART_HLL(col[, params])

  • col (required): Name of the column to aggregate on.

  • params (optional): Semicolon-separated parameter key-value pairs:

    • threshold: Size of the value set at which it is converted into a HyperLogLog (default 100_000).

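    • log2m: Base-2 logarithm of the number of HyperLogLog registers (default 12, i.e. 4,096 registers). A larger value gives a more accurate estimate and a larger HLL.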

  • Example: DISTINCT_COUNT_SMART_HLL(col, 'threshold=10000;log2m=8')
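The params string is a list of key=value pairs separated by semicolons. The sketch below shows how such a string could map to the two settings, filling in the defaults listed above. parse_smart_hll_params and hll_relative_std_error are hypothetical helpers, not part of Pinot, and rejecting unknown keys is an assumption of this sketch. The second helper uses the textbook HyperLogLog relative standard error, 1.04 / sqrt(2^log2m), to show how log2m trades HLL size for accuracy.

```python
import math

# Hypothetical helper: mirrors the documented defaults, not Pinot's own parser.
DEFAULTS = {"threshold": 100_000, "log2m": 12}


def parse_smart_hll_params(params: str = "") -> dict:
    """Parse 'key=value;key=value' into a dict of ints, filling in defaults."""
    parsed = dict(DEFAULTS)
    for pair in params.split(";"):
        pair = pair.strip()
        if not pair:
            continue  # tolerate empty segments, e.g. a trailing ';'
        key, sep, value = pair.partition("=")
        key = key.strip()
        if not sep or key not in DEFAULTS:
            # Assumption: malformed pairs and unknown keys are rejected.
            raise ValueError(f"invalid parameter: {pair!r}")
        parsed[key] = int(value.strip())
    return parsed


def hll_relative_std_error(log2m: int) -> float:
    """Textbook HyperLogLog error: 1.04 / sqrt(m), with m = 2**log2m registers."""
    return 1.04 / math.sqrt(2 ** log2m)


params = parse_smart_hll_params("threshold=10000;log2m=8")
print(params)  # {'threshold': 10000, 'log2m': 8}
print(f"{hll_relative_std_error(params['log2m']):.3%}")  # 6.500%
print(f"{hll_relative_std_error(12):.3%}")  # 1.625%
```

With the example parameters, log2m=8 gives 256 registers and roughly 6.5% relative error, while the default log2m=12 gives 4,096 registers and roughly 1.6%.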

Usage Examples

These examples are based on the Batch Quick Start.

DISTINCTCOUNTSMARTHLL considerations

  • DISTINCTCOUNTHLL() is faster than DISTINCTCOUNT() if data is pre-aggregated at ingestion time or if each server aggregates enough records. The advantage grows with the size of the dataset.

  • If very few records are pre-aggregated, DISTINCTCOUNTHLL() will not be as fast as DISTINCTCOUNT(), because a serialized HLL is larger than the individual values it replaces.

  • DISTINCTCOUNTHLLPLUS() provides more precise results than DISTINCTCOUNTHLL() with the same performance.

  • DISTINCTCOUNTSMARTHLL() keeps an exact value set and automatically switches to an HLL once the set reaches the configured threshold. This switching logic comes with some overhead.
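The switching behavior described in the last item can be sketched as follows. This is a minimal illustration under stated assumptions, not Pinot's implementation: SmartDistinctCounter and HyperLogLog are hypothetical classes, values are hashed with SHA-256 rather than whatever hash Pinot uses, and the switch happens as soon as the set grows past threshold (whether Pinot compares with > or >= is not stated here).

```python
import hashlib
import math


class HyperLogLog:
    """Minimal HyperLogLog with 2**log2m registers (illustrative only)."""

    def __init__(self, log2m: int = 12):
        if log2m < 4:
            raise ValueError("this sketch requires log2m >= 4")
        self.log2m = log2m
        self.m = 1 << log2m
        self.registers = [0] * self.m

    @staticmethod
    def _hash64(value) -> int:
        # Assumption: a 64-bit hash taken from SHA-256 of the value's repr.
        digest = hashlib.sha256(repr(value).encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def add(self, value) -> None:
        h = self._hash64(value)
        rest_bits = 64 - self.log2m
        idx = h >> rest_bits  # top log2m bits select a register
        rest = h & ((1 << rest_bits) - 1)  # remaining bits
        rank = rest_bits - rest.bit_length() + 1  # position of leftmost 1-bit
        self.registers[idx] = max(self.registers[idx], rank)

    def estimate(self) -> float:
        m = self.m
        alpha = {16: 0.673, 32: 0.697, 64: 0.709}.get(m, 0.7213 / (1 + 1.079 / m))
        raw = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * m and zeros:
            return m * math.log(m / zeros)  # linear counting for small cardinalities
        return raw


class SmartDistinctCounter:
    """Exact set until the threshold is exceeded, then a HyperLogLog."""

    def __init__(self, threshold: int = 100_000, log2m: int = 12):
        self.threshold = threshold
        self.log2m = log2m
        self.values = set()
        self.hll = None

    def add(self, value) -> None:
        if self.hll is not None:
            self.hll.add(value)
            return
        self.values.add(value)
        if len(self.values) > self.threshold:
            # Conversion: re-hash every buffered value into a fresh HLL.
            self.hll = HyperLogLog(self.log2m)
            for v in self.values:
                self.hll.add(v)
            self.values = set()  # free the exact set

    def count(self) -> int:
        if self.hll is None:
            return len(self.values)  # exact below the threshold
        return round(self.hll.estimate())


counter = SmartDistinctCounter(threshold=1000, log2m=12)
for i in range(500):
    counter.add(i)
print(counter.count())  # 500
for i in range(500, 50_000):
    counter.add(i)
print(counter.count())  # approximate, close to 50000
```

Below the threshold the count is exact; past it, memory stays fixed at 2**log2m registers at the cost of an approximation error of roughly 1.04 / sqrt(2**log2m). Re-hashing the buffered set during conversion is one plausible source of the overhead mentioned above.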
