Supported Aggregations

Pinot provides support for aggregations using GROUP BY. You can use the following functions to get the aggregated value.

| Function | Description | Example |
| --- | --- | --- |
| COUNT | Get the count of rows in a group | `COUNT(*)` |
| MIN | Get the minimum value in a group | `MIN(playerScore)` |
| MAX | Get the maximum value in a group | `MAX(playerScore)` |
| SUM | Get the sum of values in a group | `SUM(playerScore)` |
| AVG | Get the average of the values in a group | `AVG(playerScore)` |
| MODE | Get the most frequent value in a group. When multiple modes are present, it returns the minimum of all the modes by default. This behavior can be overridden to return the maximum or the average of the modes. | `MODE(playerScore)`, `MODE(playerScore, 'MIN')`, `MODE(playerScore, 'MAX')`, `MODE(playerScore, 'AVG')` |
| MINMAXRANGE | Returns the difference between the maximum and minimum values in a group (max - min) | `MINMAXRANGE(playerScore)` |
| PERCENTILE(column, N) | Returns the Nth percentile of the group, where N is a decimal number between 0 and 100 inclusive | `PERCENTILE(playerScore, 50)`, `PERCENTILE(playerScore, 99.9)` |
| PERCENTILEEST(column, N) | | `PERCENTILEEST(playerScore, 50)`, `PERCENTILEEST(playerScore, 99.9)` |
| PERCENTILETDIGEST(column, N) | | `PERCENTILETDIGEST(playerScore, 50)`, `PERCENTILETDIGEST(playerScore, 99.9)` |
| DISTINCT | Returns the distinct row values in a group | `DISTINCT(playerName)` |
| DISTINCTCOUNT | Returns the count of distinct row values in a group | `DISTINCTCOUNT(playerName)` |
| DISTINCTCOUNTBITMAP | Returns the count of distinct row values in a group. This function is accurate for INT columns, but approximate in other cases, where hash codes are used for distinct counting and hash collisions may occur. | `DISTINCTCOUNTBITMAP(playerName)` |
| DISTINCTCOUNTHLL | Returns an approximate distinct count using HyperLogLog. It also takes an optional second argument to configure the log2m for the HyperLogLog. | `DISTINCTCOUNTHLL(playerName, 12)` |
| DISTINCTCOUNTRAWHLL | Returns the HLL response serialized as a string. The serialized HLL can be converted back into an HLL and then aggregated with other HLLs. A common use case is to merge HLL responses from different Pinot tables, or to allow aggregation after client-side batching. | `DISTINCTCOUNTRAWHLL(playerName)` |
| FASTHLL (Deprecated) | Warning: FASTHLL will be deprecated soon. It stores serialized HyperLogLog in String format, which performs worse than DISTINCTCOUNTHLL, which supports serialized HyperLogLog in BYTES (byte array) format. | `FASTHLL(playerName)` |
| DISTINCTCOUNTTHETASKETCH | | |
| DISTINCTCOUNTRAWTHETASKETCH | | |
| SEGMENTPARTITIONEDDISTINCTCOUNT | Returns the count of distinct values of a column when the column is pre-partitioned for each segment, so that no value is shared between segments. This function calculates the exact count of distinct values within each segment, then sums the per-segment results to get the final result. | `SEGMENTPARTITIONEDDISTINCTCOUNT(playerName)` |
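
The tie-breaking rule for MODE and the per-segment summing behind SEGMENTPARTITIONEDDISTINCTCOUNT can be easier to follow with concrete numbers. The following is a minimal plain-Python sketch of the semantics described above, not Pinot's implementation. The function names `mode` and `segment_partitioned_distinct_count` and all sample values are hypothetical and chosen only for illustration.

```python
from collections import Counter


def mode(values, reducer="MIN"):
    """Most frequent value; ties are resolved by MIN (default), MAX, or AVG."""
    counts = Counter(values)
    highest = max(counts.values())
    modes = [value for value, count in counts.items() if count == highest]
    if reducer == "MIN":
        return min(modes)
    if reducer == "MAX":
        return max(modes)
    if reducer == "AVG":
        return sum(modes) / len(modes)
    raise ValueError(f"unknown reducer: {reducer}")


def segment_partitioned_distinct_count(segments):
    """Exact distinct count inside each segment, then a plain sum."""
    return sum(len(set(segment)) for segment in segments)


# Hypothetical playerScore values: 20 and 30 both occur twice.
scores = [10, 20, 20, 30, 30, 40]
print(mode(scores))          # 20   (minimum of the tied modes)
print(mode(scores, "MAX"))   # 30
print(mode(scores, "AVG"))   # 25.0

# Hypothetical segments with no shared values: the sum is exact.
partitioned = [["alice", "bob", "alice"], ["carol", "dave"]]
print(segment_partitioned_distinct_count(partitioned))  # 4

# "bob" appears in both segments, so it is counted twice (true answer: 3).
overlapping = [["alice", "bob"], ["bob", "carol"]]
print(segment_partitioned_distinct_count(overlapping))  # 4
```

The last case shows why the column has to be partitioned so that no value spans segments: the per-segment sum overcounts any value that appears in more than one segment.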

Multi-value column functions

The following aggregation functions can be used for multi-value columns:

| Function | Description | Example |
| --- | --- | --- |
| COUNTMV | Get the count of values in a group, counting every element of the multi-value column | `COUNTMV(playerName)` |
| MINMV | Get the minimum value in a group | `MINMV(playerScores)` |
| MAXMV | Get the maximum value in a group | `MAXMV(playerScores)` |
| SUMMV | Get the sum of values in a group | `SUMMV(playerScores)` |
| AVGMV | Get the average of values in a group | `AVGMV(playerScores)` |
| MINMAXRANGEMV | Returns the difference between the maximum and minimum values in a group (max - min) | `MINMAXRANGEMV(playerScores)` |
| PERCENTILEMV(column, N) | Returns the Nth percentile of the group, where N is a decimal number between 0 and 100 inclusive | `PERCENTILEMV(playerScores, 50)`, `PERCENTILEMV(playerScores, 99.9)` |
| PERCENTILEESTMV(column, N) | | `PERCENTILEESTMV(playerScores, 50)`, `PERCENTILEESTMV(playerScores, 99.9)` |
| PERCENTILETDIGESTMV(column, N) | | `PERCENTILETDIGESTMV(playerScores, 50)`, `PERCENTILETDIGESTMV(playerScores, 99.9)` |
| DISTINCTCOUNTMV | Returns the count of distinct row values in a group | `DISTINCTCOUNTMV(playerNames)` |
| DISTINCTCOUNTBITMAPMV | Returns the count of distinct row values in a group. This function is accurate for INT or dictionary-encoded columns, but approximate in other cases, where hash codes are used for distinct counting and hash collisions may occur. | `DISTINCTCOUNTBITMAPMV(playerNames)` |
| DISTINCTCOUNTHLLMV | Returns an approximate distinct count using HyperLogLog in a group | `DISTINCTCOUNTHLLMV(playerNames)` |
| DISTINCTCOUNTRAWHLLMV | Returns the HLL response serialized as a string. The serialized HLL can be converted back into an HLL and then aggregated with other HLLs. A common use case is to merge HLL responses from different Pinot tables, or to allow aggregation after client-side batching. | `DISTINCTCOUNTRAWHLLMV(playerNames)` |
| FASTHLLMV (Deprecated) | FASTHLLMV stores serialized HyperLogLog in String format, which performs worse than DISTINCTCOUNTHLL, which supports serialized HyperLogLog in BYTES (byte array) format. | `FASTHLLMV(playerNames)` |
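
As a rough mental model, the multi-value functions can be read as the single-value aggregations applied to every array element in the group. The following plain-Python sketch works under that assumption. It is not Pinot code, and the `rows` data and the `flatten` helper are hypothetical.

```python
from itertools import chain

# Hypothetical group of three rows with a multi-value playerScores column.
rows = [
    {"playerScores": [10, 20]},
    {"playerScores": [30]},
    {"playerScores": [20, 40, 40]},
]


def flatten(rows, column):
    """Collect every element of a multi-value column across the rows."""
    return list(chain.from_iterable(row[column] for row in rows))


values = flatten(rows, "playerScores")

count_mv = len(values)                       # 6 values across 3 rows
min_mv = min(values)                         # 10
max_mv = max(values)                         # 40
sum_mv = sum(values)                         # 160
avg_mv = sum_mv / count_mv                   # sum divided by number of values
min_max_range_mv = max_mv - min_mv           # 30
distinct_count_mv = len(set(values))         # 4 distinct values

print(count_mv, min_mv, max_mv, sum_mv, min_max_range_mv, distinct_count_mv)
```

Under this reading the count is taken over array elements, so three rows contribute six values here.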

Descriptions for the functions listed above without one:

PERCENTILEEST, PERCENTILEESTMV: Returns the Nth percentile of the group using the Quantile Digest algorithm.

PERCENTILETDIGEST, PERCENTILETDIGESTMV: Returns the Nth percentile of the group using the T-digest algorithm.

DISTINCTCOUNTTHETASKETCH, DISTINCTCOUNTRAWTHETASKETCH: See Cardinality Estimation.