Sketch Functions

Reference documentation for approximate distinct count and sketch-based aggregation functions in Apache Pinot.

Pinot supports several sketch-based algorithms for approximate distinct counting and summarization. These functions trade a small amount of accuracy for significant memory and performance savings at scale.
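
To make the accuracy-for-memory trade concrete, the following is a minimal, illustrative HyperLogLog written from the textbook algorithm. It is not Pinot's implementation, and the names `ToyHyperLogLog` and `hash64` are hypothetical. It assumes a 64-bit hash taken from BLAKE2b and a fixed array of `2^p` one-byte registers, so memory stays constant however many distinct values are added.

```python
import hashlib
import math


def hash64(value):
    """Map any value to a 64-bit integer hash (illustrative choice of hash)."""
    digest = hashlib.blake2b(str(value).encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


class ToyHyperLogLog:
    """Textbook HyperLogLog: 2**p registers, each storing a small rank."""

    def __init__(self, p=12):
        self.p = p
        self.m = 1 << p                    # number of registers
        self.registers = bytearray(self.m)  # fixed memory, one byte each

    def add(self, value):
        h = hash64(value)
        idx = h >> (64 - self.p)                   # top p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)      # remaining 64 - p bits
        # Rank = 1-based position of the leftmost 1 bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def estimate(self):
        m = self.m
        alpha = 0.7213 / (1 + 1.079 / m)           # bias constant for m >= 128
        raw = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * m and zeros:
            # Small-range correction: fall back to linear counting.
            return m * math.log(m / zeros)
        return raw


if __name__ == "__main__":
    sketch = ToyHyperLogLog(p=12)
    for i in range(50_000):
        sketch.add(f"user-{i}")
    print(f"registers: {len(sketch.registers)} bytes, estimate: {sketch.estimate():.0f}")
```

An exact count would need to remember every distinct value, whereas this sketch keeps 4,096 bytes regardless of input size and accepts an error of a few percent.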

For exact distinct counting, see DISTINCTCOUNT.

CPC Sketch

The Compressed Probability Counting (CPC) Sketch enables extremely space-efficient cardinality estimation, using about 40% less space than an HLL sketch of comparable accuracy.

Function
Description

Returns approximate distinct count using CPC sketch

Returns raw CPC sketch as hex string

HyperLogLog Plus

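HyperLogLogPlus (HLL++) provides approximate distinct counts with configurable precision: p sets the precision of the normal (dense) representation, and sp sets the precision of the sparse representation used at low cardinalities.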

Function
Description

Approximate distinct count using HLL++

HLL++ for multi-value columns

Returns serialized HLL++ sketch

Serialized HLL++ sketch for multi-value columns
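
As a rough guide to choosing the precision p, the sketch below tabulates the register count and the classic HyperLogLog relative standard error of about `1.04 / sqrt(2^p)`. That figure is the asymptotic result for plain HyperLogLog. Treat it as an approximation for HLL++, whose bias corrections and sparse mode can behave somewhat better at low cardinalities. The function name `hll_precision_tradeoff` is hypothetical.

```python
import math


def hll_precision_tradeoff(p):
    """Return (register_count, approx_relative_standard_error) for precision p."""
    m = 1 << p                     # 2**p registers
    return m, 1.04 / math.sqrt(m)  # classic HyperLogLog error approximation


if __name__ == "__main__":
    for p in (10, 12, 14, 16):
        registers, err = hll_precision_tradeoff(p)
        print(f"p={p}: {registers} registers, ~{err:.2%} relative standard error")
```

Each increment of p doubles the number of registers and reduces the expected error by a factor of about the square root of two.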

UltraLogLog

The UltraLogLog Sketch from Dynatrace requires less space than HyperLogLog and provides a simpler, faster estimator. It is implemented via Hash4j.

Function
Description

Approximate distinct count using ULL (default p=12)

Returns serialized ULL sketch

Tuple Sketch

The Tuple Sketch extends the Theta Sketch with additional summary values per entry, which makes it well suited to summarizing attributes such as impressions or clicks.

Function
Description

Distinct count from tuple sketch

Average of summary values

Sum of summary values
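
The idea of a distinct-count sketch that also carries a summary per entry can be illustrated with a toy k-minimum-values sketch. This is a simplified stand-in, not the Apache DataSketches or Pinot implementation. The class `ToyTupleSketch`, the helper `unit_hash`, and the choice of an integer-sum summary are assumptions made for the example.

```python
import hashlib


def unit_hash(key):
    """Hash a key to a float in [0, 1) (illustrative choice of hash)."""
    digest = hashlib.blake2b(str(key).encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big") / 2**64


class ToyTupleSketch:
    """Keep the k smallest key hashes, each with a summed summary value."""

    def __init__(self, k=1024):
        self.k = k
        self.theta = 1.0   # only hashes below theta are retained
        self.entries = {}  # hash -> summed summary value

    def update(self, key, value):
        h = unit_hash(key)
        if h >= self.theta:
            return  # key is outside the sampled region
        self.entries[h] = self.entries.get(h, 0) + value
        if len(self.entries) > self.k:
            # Evict the largest hash and shrink theta to it.
            largest = max(self.entries)
            del self.entries[largest]
            self.theta = largest

    def distinct_estimate(self):
        # Retained keys are a uniform sample with sampling rate theta.
        return len(self.entries) / self.theta

    def sum_estimate(self):
        return sum(self.entries.values()) / self.theta

    def avg_estimate(self):
        return sum(self.entries.values()) / len(self.entries)


if __name__ == "__main__":
    sketch = ToyTupleSketch(k=1024)
    for user in range(20_000):
        for _ in range(3):  # three clicks per user
            sketch.update(f"user-{user}", 1)
    print(sketch.distinct_estimate(), sketch.sum_estimate(), sketch.avg_estimate())
```

Because the retained keys form a uniform sample of all distinct keys, scaling the retained count and the retained summaries by `1 / theta` yields the distinct-count and sum estimates, while the average needs no scaling.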

Frequency Sketches

Function
Description

Frequent items sketch for long values

Frequent items sketch for string values
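
For intuition about how a frequent items sketch can find heavy hitters in bounded memory, here is a minimal Misra-Gries summary. It is the classic algorithm that this family of sketches builds on, shown as an illustration only. It should not be read as the exact algorithm behind the functions above, and the function name `misra_gries` is hypothetical.

```python
def misra_gries(stream, k):
    """Track at most k - 1 counters; any item occurring more than n / k times survives.

    Reported counts underestimate true counts by at most n / k,
    where n is the length of the stream.
    """
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # No free counter: decrement every counter and drop those at zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


if __name__ == "__main__":
    stream = ["a"] * 50 + ["b"] * 30 + [f"rare-{i}" for i in range(20)]
    print(misra_gries(stream, k=5))
```

The summary never holds more than `k - 1` counters, so memory is fixed by `k` rather than by the number of distinct items in the stream.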
