DISTINCTCOUNTHLL

This section contains reference documentation for the DISTINCTCOUNTHLL function.

Returns an approximate count of the distinct values in a column, computed with the HyperLogLog (HLL) algorithm. An optional second argument, log2m, configures the precision of the HyperLogLog sketch.
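The role of log2m can be illustrated with a minimal Python sketch of the general HyperLogLog technique. It is not Pinot's implementation: the SHA-1-based 64-bit hash, the class name HyperLogLog, and the accepted log2m range are assumptions chosen for illustration. The top log2m bits of each hash select one of m = 2^log2m registers, each register keeps the largest rank (position of the leftmost 1 bit) it has seen, and the estimate is a bias-corrected harmonic mean over the registers, with linear counting used for small cardinalities.

```python
import hashlib
import math


def _hash64(value) -> int:
    """64-bit hash of a value (SHA-1 is an assumption made for this sketch only)."""
    digest = hashlib.sha1(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class HyperLogLog:
    """Minimal HyperLogLog with m = 2**log2m registers (illustrative, not Pinot's code)."""

    def __init__(self, log2m: int = 8):
        if not 4 <= log2m <= 16:  # range chosen for this sketch, not Pinot's limits
            raise ValueError("log2m must be between 4 and 16")
        self.log2m = log2m
        self.m = 1 << log2m
        self.registers = [0] * self.m

    def add(self, value) -> None:
        h = _hash64(value)
        width = 64 - self.log2m
        index = h >> width                     # top log2m bits pick a register
        rest = h & ((1 << width) - 1)          # remaining bits
        rank = width - rest.bit_length() + 1   # position of the leftmost 1 bit
        if rank > self.registers[index]:
            self.registers[index] = rank

    def merge(self, other: "HyperLogLog") -> None:
        """Combine two sketches by taking the register-wise maximum."""
        if other.m != self.m:
            raise ValueError("sketches must use the same log2m")
        self.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]

    def estimate(self) -> float:
        m = self.m
        alpha = {16: 0.673, 32: 0.697, 64: 0.709}.get(m, 0.7213 / (1 + 1.079 / m))
        raw = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * m and zeros:
            # Small-range correction: linear counting over empty registers
            return m * math.log(m / zeros)
        return raw
```

Raising log2m from 8 to 12 grows the sketch from 256 to 4096 registers; more registers cost more memory but typically narrow the estimation error, which is the trade-off the second argument exposes.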

For accurate distinct counting, see DISTINCTCOUNT. Review DISTINCTCOUNTHLL considerations for your use case.

Signature

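DISTINCTCOUNTHLL(colName, log2m)

colName: the column whose distinct values are counted.
log2m: optional. The HyperLogLog precision; the sketch uses 2^log2m registers, so a larger value gives a more accurate estimate at the cost of more memory.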

Usage Examples

These examples are based on the Batch Quick Start.

SELECT DISTINCTCOUNTHLL(teamID) AS value
FROM baseballStats

value
158

SELECT DISTINCTCOUNTHLL(teamID, 12) AS value
FROM baseballStats

value
149
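The two queries can return different counts because log2m controls the expected error. The worked arithmetic below applies the standard HyperLogLog relative standard error from Flajolet et al. It assumes the one-argument form uses log2m = 8; that default is an assumption here, so check it against your Pinot release.

```latex
m = 2^{\mathrm{log2m}}, \qquad \sigma \approx \frac{1.04}{\sqrt{m}}

\mathrm{log2m} = 8: \quad m = 256, \quad \sigma \approx \frac{1.04}{16} = 0.065 \;(\approx 6.5\%)

\mathrm{log2m} = 12: \quad m = 4096, \quad \sigma \approx \frac{1.04}{64} \approx 0.0163 \;(\approx 1.6\%)
```

Sixteen times as many registers reduce the expected relative error by a factor of four, since the error scales with 1/\sqrt{m}.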

DISTINCTCOUNTHLL considerations

  • DISTINCTCOUNTHLL() is faster than DISTINCTCOUNT() when data is pre-aggregated at ingestion or when each server aggregates enough records. The advantage grows with the size of the dataset.

  • If very few records are pre-aggregated, DISTINCTCOUNTHLL() can be slower than DISTINCTCOUNT(), because the serialized HLL sketch is larger than the individual values it replaces.

  • DISTINCTCOUNTHLLPLUS() provides more precise results than DISTINCTCOUNTHLL() with the same performance.

  • DISTINCTCOUNTSMARTHLL() counts exact distinct values until the count reaches a threshold and then automatically switches to HLL; this switching adds some overhead.
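The first two considerations can be made concrete with a small Python sketch of how partial HLL sketches combine. Everything here is a hypothetical illustration: the helper names registers_for, merge, sketch_bytes, and raw_bytes, the 32-bit SHA-1-derived hash, log2m = 8, 5-bit packed registers, and 8 bytes per raw value are all assumptions, and Pinot's actual serialization may differ. Partial sketches merge by register-wise maximum, so a merged sketch stays the same size no matter how many rows it summarizes, while a set of raw values grows with the number of distinct values.

```python
import hashlib

LOG2M = 8               # assumed precision for this sketch
M = 1 << LOG2M          # number of registers
BITS_PER_REGISTER = 5   # assumed packed width; ranks here never exceed 25
WIDTH = 32 - LOG2M      # hash bits left after choosing a register


def registers_for(values):
    """Build HLL registers for one server's values (hypothetical helper)."""
    regs = [0] * M
    for v in values:
        # 32-bit hash taken from SHA-1 (an assumption for illustration)
        h = int.from_bytes(hashlib.sha1(str(v).encode("utf-8")).digest()[:4], "big")
        index = h >> WIDTH                    # top LOG2M bits pick a register
        rest = h & ((1 << WIDTH) - 1)
        rank = WIDTH - rest.bit_length() + 1  # 1..25 with these parameters
        regs[index] = max(regs[index], rank)
    return regs


def merge(a, b):
    """Register-wise max: how per-server sketches combine into one."""
    return [max(x, y) for x, y in zip(a, b)]


def sketch_bytes():
    """Serialized size of one sketch under the packing assumption."""
    return (M * BITS_PER_REGISTER + 7) // 8


def raw_bytes(values, bytes_per_value=8):
    """Size of shipping the distinct raw values instead of a sketch."""
    return len(set(values)) * bytes_per_value
```

Under these assumptions a sketch costs 160 bytes, so a server holding fewer than 20 distinct values would send less data as raw values, while a server holding thousands of distinct values sends far less as a sketch.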
