Sketch Functions

Reference documentation for approximate distinct count and sketch-based aggregation functions in Apache Pinot.

Pinot supports several sketch-based algorithms for approximate distinct counting and summarization. These functions trade a small amount of accuracy for significant memory and performance savings at scale.

For exact distinct counting, see DISTINCTCOUNT.
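To make the accuracy-for-memory trade concrete, here is a minimal Python sketch of a K-Minimum-Values (KMV) estimator. It is a generic illustration of sketch-based counting, not the algorithm behind any Pinot function on this page. The name `kmv_estimate` and the default `k=256` are assumptions chosen for the example.

```python
import hashlib
import heapq


def kmv_estimate(values, k=256):
    """Estimate the number of distinct values while storing at most k hashes.

    Illustrative only: a K-Minimum-Values estimator, not Pinot's implementation.
    """
    heap = []     # max-heap (via negation) holding the k smallest hashes seen
    kept = set()  # the same hashes, for fast duplicate checks
    for v in values:
        digest = hashlib.sha256(str(v).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big") / 2**64  # roughly uniform in [0, 1)
        if h in kept:
            continue  # duplicate of a value we already track
        if len(heap) < k:
            heapq.heappush(heap, -h)
            kept.add(h)
        elif h < -heap[0]:
            # New hash is smaller than the current k-th smallest: swap it in.
            evicted = -heapq.heapreplace(heap, -h)
            kept.discard(evicted)
            kept.add(h)
    if len(heap) < k:
        return len(heap)  # fewer than k distinct hashes: the count is exact
    kth_smallest = -heap[0]
    return (k - 1) / kth_smallest


# 10,000 distinct values, each seen twice; memory stays bounded by k hashes.
data = list(range(10_000)) * 2
print("estimate:", round(kmv_estimate(data, k=256)))
```

The estimator never holds more than `k` hashes, however large the input grows. Raising `k` lowers the expected error at the cost of more memory, which is the same trade the functions below expose through their precision parameters.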

CPC Sketch

The Compressed Probability Counting (CPC) Sketch enables extremely space-efficient cardinality estimation, using about 40% less space than an HLL sketch of comparable accuracy.

| Function | Description |
| --- | --- |
| DISTINCTCOUNTCPCSKETCH | Returns approximate distinct count using CPC sketch |
| DISTINCTCOUNTRAWCPCSKETCH | Returns raw CPC sketch as hex string |

HyperLogLog Plus

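HyperLogLogPlus (HLL++) provides approximate distinct counts with configurable precision, set through two parameters: p (precision of the normal, dense representation) and sp (precision of the sparse representation used at low cardinalities).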

| Function | Description |
| --- | --- |
| DISTINCTCOUNTHLLPLUS | Approximate distinct count using HLL++ |
| DISTINCTCOUNTHLLPLUSMV | HLL++ for multi-value columns |
| DISTINCTCOUNTRAWHLLPLUS | Returns serialized HLL++ sketch |
| DISTINCTCOUNTRAWHLLPLUSMV | Serialized HLL++ sketch for multi-value columns |
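As a rough guide to choosing p, the following Python sketch uses the commonly cited HyperLogLog approximation that relative standard error is about 1.04 / sqrt(m), where m = 2^p is the number of registers. This is a general property of HyperLogLog-family estimators and is assumed here to carry over to HLL++ in dense mode. Behavior at low cardinalities, where the sparse precision sp applies, can differ. The helper names are hypothetical.

```python
import math


def hll_register_count(p: int) -> int:
    """Number of registers for precision p (m = 2^p)."""
    return 1 << p


def hll_relative_standard_error(p: int) -> float:
    """Approximate relative standard error, using the 1.04 / sqrt(m) rule of thumb."""
    return 1.04 / math.sqrt(hll_register_count(p))


# Each +2 in p quadruples the register count and halves the expected error.
for p in (10, 12, 14, 16):
    m = hll_register_count(p)
    err = hll_relative_standard_error(p)
    print(f"p={p:2d}  registers={m:6d}  approx. error={err:.4f}")
```

Because the error falls with the square root of the register count, halving it costs four times the registers.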

UltraLogLog

The UltraLogLog Sketch from Dynatrace requires less space than HyperLogLog and provides a simpler, faster estimator. It is implemented via Hash4j.

| Function | Description |
| --- | --- |
| DISTINCTCOUNTULL | Approximate distinct count using ULL (default p=12) |
| DISTINCTCOUNTRAWULL | Returns serialized ULL sketch |

Tuple Sketch

The Tuple Sketch extends the Theta Sketch with additional summary values per entry, which makes it well suited to summarizing attributes such as impressions or clicks.

| Function | Description |
| --- | --- |
| DISTINCTCOUNTTUPLESKETCH | Distinct count from tuple sketch |
| DISTINCTCOUNTRAWINTEGERSUMTUPLESKETCH | Raw tuple sketch as hex |
| AVGVALUEINTEGERSUMTUPLESKETCH | Average of summary values |
| SUMVALUESINTEGERSUMTUPLESKETCH | Sum of summary values |
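The relationship between these aggregations can be illustrated with an exact, dictionary-based stand-in for an integer-sum tuple sketch. A real tuple sketch retains only a sample of keys and scales its estimates, so this toy model shows the semantics only (key plus summed integer summary) and not the algorithm. All names and the sample rows are hypothetical.

```python
from collections import defaultdict


def build_integer_sum_summary(rows):
    """Map each key to the sum of its integer values (exact, no sampling)."""
    summary = defaultdict(int)
    for key, value in rows:
        summary[key] += value
    return dict(summary)


def distinct_count(summary):
    """Number of distinct keys."""
    return len(summary)


def sum_values(summary):
    """Sum of the per-key summary values."""
    return sum(summary.values())


def avg_value(summary):
    """Average summary value per distinct key."""
    return sum_values(summary) / len(summary) if summary else 0.0


# Hypothetical (user, clicks) rows; user_a appears twice and is merged.
rows = [("user_a", 3), ("user_b", 5), ("user_a", 2), ("user_c", 10)]
summary = build_integer_sum_summary(rows)

print("distinct keys:", distinct_count(summary))
print("sum of summaries:", sum_values(summary))
print("average summary:", avg_value(summary))
```

In this model the distinct count, the sum, and the average all derive from the same per-key state, which is why a single sketch can answer all three questions.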

Frequency Sketches

| Function | Description |
| --- | --- |
| FREQUENTLONGSSKETCH | Frequent items sketch for long values |
| FREQUENTSTRINGSSKETCH | Frequent items sketch for string values |
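To show what a frequent-items summary does, the sketch below implements the classic Misra-Gries algorithm in Python. It is a stand-in chosen for clarity, and the sketches behind the functions above may use a different algorithm and different parameters. The name `misra_gries` and the `max_counters` argument are assumptions for this example.

```python
def misra_gries(stream, max_counters):
    """Track heavy hitters using at most max_counters counters.

    Each returned count underestimates the true frequency by at most
    n / (max_counters + 1), where n is the stream length.
    """
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < max_counters:
            counters[item] = 1
        else:
            # No free counter: decrement every counter and drop those at zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


# Hypothetical stream of long values: two heavy hitters plus 20 one-off values.
stream = [7] * 50 + [3] * 30 + list(range(100, 120))
result = misra_gries(stream, max_counters=2)
print(result)
```

Any value occurring more than n / (max_counters + 1) times is guaranteed to survive in the result, while rare values are evicted. The reported counts are lower bounds, not exact frequencies.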
