# Sketch Functions

Pinot supports several sketch-based algorithms for approximate distinct counting and summarization. These functions trade a small amount of accuracy for significant memory and performance savings at scale.

For exact distinct counting, see [DISTINCTCOUNT](https://docs.pinot.apache.org/functions/aggregation/distinctcount).

## CPC Sketch

The [Compressed Probability Counting (CPC) Sketch](https://datasketches.apache.org/docs/CPC/CPC.html) enables extremely space-efficient cardinality estimation, using about 40% less space than an HLL sketch of comparable accuracy.

| Function                                                                                              | Description                                         |
| ----------------------------------------------------------------------------------------------------- | --------------------------------------------------- |
| [DISTINCTCOUNTCPCSKETCH](https://docs.pinot.apache.org/functions/sketch/distinctcountcpcsketch)       | Returns approximate distinct count using CPC sketch |
| [DISTINCTCOUNTRAWCPCSKETCH](https://docs.pinot.apache.org/functions/sketch/distinctcountrawcpcsketch) | Returns raw CPC sketch as hex string                |

## HyperLogLog Plus

HyperLogLogPlus (HLL++) provides approximate distinct counts with configurable precision, controlled by the `p` (normal-mode precision) and `sp` (sparse-mode precision) parameters.
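
To illustrate what a precision parameter controls, here is a minimal sketch of a classic HyperLogLog estimator in Python. It is not Pinot's HLL++ implementation: the class name `TinyHLL`, the SHA-1 based hash, and the default `p=12` are assumptions chosen for illustration, and the sparse mode governed by `sp` is not modeled. It relies on the well-known property that `p` index bits give `2^p` registers and a relative standard error of roughly `1.04 / sqrt(2^p)`.

```python
import hashlib
import math


def _hash64(value):
    """Map any value to a 64-bit integer (hash choice is illustrative)."""
    digest = hashlib.sha1(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def standard_error(p):
    """Approximate relative standard error of classic HLL with 2**p registers."""
    return 1.04 / math.sqrt(1 << p)


class TinyHLL:
    """Hypothetical, simplified HyperLogLog (dense mode only, p >= 7)."""

    def __init__(self, p=12):
        self.p = p
        self.m = 1 << p                 # number of registers
        self.registers = [0] * self.m

    def add(self, value):
        h = _hash64(value)
        width = 64 - self.p
        idx = h >> width                # top p bits select a register
        rest = h & ((1 << width) - 1)   # remaining bits feed the rank
        # Rank = 1-based position of the leftmost 1-bit in `rest`.
        rank = width - rest.bit_length() + 1
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def estimate(self):
        m = self.m
        alpha = 0.7213 / (1 + 1.079 / m)    # bias constant, valid for m >= 128
        raw = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * m and zeros:
            # Small-range correction: fall back to linear counting.
            return m * math.log(m / zeros)
        return raw


if __name__ == "__main__":
    hll = TinyHLL(p=12)
    for i in range(100_000):
        hll.add(i)
    print("estimate:", round(hll.estimate()))
    print("expected relative error:", standard_error(12))
```

Raising `p` by one doubles the register count and shrinks the expected error by a factor of about 1.41, which is the memory-versus-accuracy trade-off a precision setting exposes.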

| Function                                                                                                            | Description                                          |
| ------------------------------------------------------------------------------------------------------------------- | ---------------------------------------------------- |
| [DISTINCTCOUNTHLLPLUS](https://docs.pinot.apache.org/functions/sketch/distinctcounthllplus)                         | Approximate distinct count using HLL++               |
| [DISTINCTCOUNTSMARTHLLPLUS](https://docs.pinot.apache.org/functions/distinctcounthllplus#distinctcountsmarthllplus) | Starts exact and converts to HLL++ after a threshold |
| [DISTINCTCOUNTHLLPLUSMV](https://docs.pinot.apache.org/functions/sketch/distinctcounthllplusmv)                     | HLL++ for multi-value columns                        |
| [DISTINCTCOUNTRAWHLLPLUS](https://docs.pinot.apache.org/functions/sketch/distinctcountrawhllplus)                   | Returns serialized HLL++ sketch                      |
| [DISTINCTCOUNTRAWHLLPLUSMV](https://docs.pinot.apache.org/functions/sketch/distinctcountrawhllplusmv)               | Serialized HLL++ sketch for multi-value columns      |

## UltraLogLog

The [UltraLogLog Sketch](https://arxiv.org/abs/2308.16862) from Dynatrace requires less space than HyperLogLog and provides a simpler, faster estimator. It is implemented via [Hash4j](https://github.com/dynatrace-oss/hash4j/tree/main).

| Function                                                                                  | Description                                         |
| ----------------------------------------------------------------------------------------- | --------------------------------------------------- |
| [DISTINCTCOUNTULL](https://docs.pinot.apache.org/functions/sketch/distinctcountull)       | Approximate distinct count using ULL (default p=12) |
| [DISTINCTCOUNTRAWULL](https://docs.pinot.apache.org/functions/sketch/distinctcountrawull) | Returns serialized ULL sketch                       |

## Tuple Sketch

The [Tuple Sketch](https://datasketches.apache.org/docs/Tuple/TupleOverview.html) extends the Theta Sketch by attaching summary values to each retained entry, which makes it well suited to summarizing attributes such as impressions or clicks.
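
The idea of "a distinct-count sketch plus a summary value per entry" can be sketched in a few lines of Python. The example below is a simplified, hypothetical model and not the DataSketches or Pinot implementation: the class name `TinyIntegerSumTupleSketch`, the SHA-1 based hash, and the parameter `k` are assumptions. It keeps the `k` smallest hashes, sums an integer per retained key, and scales the retained totals by the sampling threshold `theta`.

```python
import hashlib

_MAX = 1 << 64  # size of the 64-bit hash space


def _hash64(key):
    """Map a key to a 64-bit integer (hash choice is illustrative)."""
    digest = hashlib.sha1(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class TinyIntegerSumTupleSketch:
    """Hypothetical KMV-style sketch with an integer-sum summary per key."""

    def __init__(self, k=1024):
        self.k = k
        self.entries = {}    # hash -> summed integer value
        self.theta = _MAX    # only hashes below theta are retained

    def update(self, key, value):
        h = _hash64(key)
        if h >= self.theta:
            return           # key falls outside the current sample
        self.entries[h] = self.entries.get(h, 0) + value
        if len(self.entries) > self.k:
            # Evict the largest hash and lower theta to it.
            # (A linear scan keeps the example short; a heap would be faster.)
            largest = max(self.entries)
            del self.entries[largest]
            self.theta = largest

    def distinct_estimate(self):
        return len(self.entries) * _MAX / self.theta

    def sum_estimate(self):
        return sum(self.entries.values()) * _MAX / self.theta

    def avg_estimate(self):
        if not self.entries:
            return 0.0
        return sum(self.entries.values()) / len(self.entries)


if __name__ == "__main__":
    sketch = TinyIntegerSumTupleSketch(k=1024)
    sketch.update("user-1", 3)   # e.g. 3 clicks
    sketch.update("user-1", 2)
    sketch.update("user-2", 5)
    print(sketch.distinct_estimate(), sketch.sum_estimate(), sketch.avg_estimate())
```

While fewer than `k` distinct keys have been seen, `theta` stays at its maximum and all three results are exact; beyond that point they become estimates whose error shrinks as `k` grows.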

| Function                                                                                                                      | Description                      |
| ----------------------------------------------------------------------------------------------------------------------------- | -------------------------------- |
| [DISTINCTCOUNTTUPLESKETCH](https://docs.pinot.apache.org/functions/sketch/distinctcounttuplesketch)                           | Distinct count from tuple sketch |
| [DISTINCTCOUNTRAWINTEGERSUMTUPLESKETCH](https://docs.pinot.apache.org/functions/sketch/distinctcountrawintegersumtuplesketch) | Raw tuple sketch as hex          |
| [AVGVALUEINTEGERSUMTUPLESKETCH](https://docs.pinot.apache.org/functions/sketch/avgvalueintegersumtuplesketch)                 | Average of summary values        |
| [SUMVALUESINTEGERSUMTUPLESKETCH](https://docs.pinot.apache.org/functions/sketch/sumvaluesintegersumtuplesketch)               | Sum of summary values            |

## Frequency Sketches

| Function                                                                                      | Description                             |
| --------------------------------------------------------------------------------------------- | --------------------------------------- |
| [FREQUENTLONGSSKETCH](https://docs.pinot.apache.org/functions/sketch/frequentlongssketch)     | Frequent items sketch for long values   |
| [FREQUENTSTRINGSSKETCH](https://docs.pinot.apache.org/functions/sketch/frequentstringssketch) | Frequent items sketch for string values |
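
For intuition about how frequent items can be tracked in bounded memory, here is a minimal Python sketch of the classic Misra-Gries summary. It is an illustration of the general problem only, not necessarily the algorithm behind the functions above; the function name `misra_gries` and the parameter `k` are assumptions. With at most `k - 1` counters, any item that occurs more than `n / k` times in a stream of length `n` is guaranteed to survive, and each reported count undershoots the true count by at most `n / k`.

```python
def misra_gries(stream, k):
    """Return approximate counts for heavy hitters using at most k - 1 counters."""
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # No free counter: decrement everything and drop counters at zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


if __name__ == "__main__":
    # 50 x "a", 30 x "b", and 20 one-off items, interleaved.
    stream = []
    for i in range(50):
        stream.append("a")
        if i < 30:
            stream.append("b")
        if i < 20:
            stream.append(f"rare-{i}")
    summary = misra_gries(stream, k=4)
    print(sorted(summary))
```

Here `n = 100` and `k = 4`, so both `"a"` (50 occurrences) and `"b"` (30 occurrences) exceed `n / k = 25` and must appear in the summary, while most one-off items are evicted.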
