Here you will find a collection of ready-made sample applications and examples for real-world data
A cluster is a set of nodes comprising servers, brokers, controllers, and minions.
Pinot uses Apache Helix for cluster management. Helix is a cluster management framework that manages replicated, partitioned resources in a distributed system. Helix uses Zookeeper to store cluster state and metadata.
Helix divides nodes into logical components based on their responsibilities:
Participant: The nodes that host distributed, partitioned resources.
Pinot Servers are modeled as Participants. For more details about server nodes, see Server.
Spectator: The nodes that observe the current state of each Participant and use that information to access the resources. Spectators are notified of state changes in the cluster (state of a participant, or that of a partition in a participant).
Pinot Brokers are modeled as Spectators. For more details about broker nodes, see Broker.
Controller: The node that observes and controls the Participant nodes. It is responsible for coordinating all transitions in the cluster and ensuring that state constraints are satisfied while maintaining cluster stability.
Pinot Controllers are modeled as Controllers. For more details about controller nodes, see Controller.
Another way to visualize the cluster is a logical view, wherein a cluster contains tenants, tenants contain tables, and tables contain segments.
Typically, there is only one cluster per environment/data center. There is no need to create multiple Pinot clusters since Pinot supports the concept of tenants. At LinkedIn, the largest Pinot cluster consists of 1000+ nodes.
To set up a cluster, see one of the following guides:
Learn about the various components of Pinot and terminologies used to describe data stored in Pinot
Pinot is designed to deliver low latency queries on large datasets. In order to achieve this performance, Pinot stores data in a columnar format and adds additional indices to perform fast filtering, aggregation and group by.
Pinot uses a variety of terms that can refer to either abstractions that model the storage of data or infrastructure components that drive the functionality of the system.
In contrast to RDBMS schemas, multiple tables in Pinot (real-time or batch) can inherit a single schema definition. Tables are independently configured for concerns such as indexing strategies, partitioning, tenants, data sources, and/or replication.
By default, all tables belong to a default tenant named "default". The concept of tenants is very important, as it satisfies the architectural principle of a "database per service/application" without having to operate many independent data stores. Further, tenants will schedule resources so that segments (shards) are able to restrict a table's data to reside only on a specified set of nodes. Similar to the kind of isolation that is ubiquitously used in Linux containers, compute resources in Pinot can be scheduled to prevent resource contention between tenants.
Auto-scaling is also achievable; however, a fixed number of nodes is recommended to keep QPS consistent when query loads vary suddenly and unpredictably with end-user usage.
A Pinot cluster comprises multiple distributed system components. These components are useful to understand for operators that are monitoring system usage or are debugging an issue with a cluster deployment.
Controller
Server
Broker
Minion (optional)
Helix is a cluster management solution that was designed and created by the authors of Pinot at LinkedIn. Helix drives the state of a Pinot cluster from a transient state to an ideal state, acting as the fault-tolerant distributed state store that guarantees consistency. Helix is embedded as agents that operate within a controller, broker, and server, and does not exist as an independent and horizontally scaled component.
In addition to cluster management, resource allocation, and scheduling, the controller is also the HTTP gateway for REST API administration of a Pinot deployment. A web-based query console is also provided for operators to quickly and easily run SQL/PQL queries.
A real-time and offline server have very different resource usage requirements, where real-time servers are continually consuming new messages from external systems (such as Kafka topics) that are ingested and allocated on segments of a tenant. Because of this, resource isolation can be used to prioritize high-throughput real-time data streams that are ingested and then made available for query through a broker.
Raw data is broken into small data shards and each shard is converted into a unit known as a segment. One or more segments together form a table, which is the logical container for querying Pinot using SQL.
Similar to traditional databases, Pinot has the concept of a table, a logical abstraction that refers to a collection of related data.
As is the case with RDBMS, a table is a construct that consists of columns and rows (documents) that are queried using SQL. A table is associated with a schema that defines the columns in a table as well as their data types.
Pinot has a distributed systems architecture that scales horizontally. Pinot expects the size of a table to grow infinitely over time. In order to achieve this, all data needs to be distributed across multiple nodes. Pinot achieves this by breaking data into smaller chunks known as segments (similar to shards/partitions in HA relational databases). Segments can also be seen as time-based partitions.
In order to support multi-tenancy, Pinot has first-class support for tenants. A table is associated with a tenant. This allows all tables belonging to a particular logical namespace to be grouped under a single tenant name and isolated from other tenants. This isolation between tenants provides different namespaces for applications and teams to prevent sharing tables or schemas. Development teams building applications will never have to operate an independent deployment of Pinot. An organization can operate a single cluster and scale it out as new tenants increase the overall volume of queries. Developers can manage their own schemas and tables without being impacted by any other tenant on a cluster.
Logically, a cluster is simply a group of tenants. As with the classical definition of a cluster, it is also a grouping of a set of compute nodes. Typically, there is only one cluster per environment/data center. There is no need to create multiple clusters since Pinot supports the concept of tenants. At LinkedIn, the largest Pinot cluster consists of 1000+ nodes distributed across a data center. Nodes can be added to a cluster in a way that linearly increases the performance and availability of queries. The number of nodes and the compute resources per node will reliably predict the QPS for a Pinot cluster, and as such, capacity planning can be easily achieved using SLAs that assert performance expectations for end-user applications.
The benefits of scale that make Pinot linearly scalable for an unbounded number of nodes are made possible through its integration with Apache Zookeeper and Apache Helix.
A controller is the core orchestrator that drives the consistency and routing in a Pinot cluster. Controllers are horizontally scaled as an independent component (container) and have visibility into the state of all other components in a cluster. The controller reacts and responds to state changes in the system and schedules the allocation of resources for tables, segments, or nodes. As mentioned earlier, Helix is embedded within the controller as an agent that is a participant responsible for observing and driving state changes that are subscribed to by other components.
A broker receives queries from a client and routes their execution to one or more Pinot servers before returning a consolidated response.
Servers host segments (shards) that are scheduled and allocated across multiple nodes and routed on an assignment to a tenant (there is a single tenant by default). Servers are independent containers that scale horizontally and are notified by Helix through state changes driven by the controller. A server can either be a real-time server or an offline server.
Pinot Minion is an optional component that can be used to run background tasks such as "purge" for GDPR (General Data Protection Regulation). As Pinot is an immutable aggregate store, records containing sensitive private data need to be purged on a request-by-request basis. Minion provides a solution for this purpose that complies with GDPR while optimizing Pinot segments and building additional indices to maintain performance even when data may be deleted. One can also write a custom task that runs on a periodic basis. While it's possible to perform these tasks on the Pinot servers directly, having a separate process (Minion) lessens the overall degradation of query latency as segments are impacted by mutable writes.
For a general overview that ties together all of the reference material in this section, see .
The Pinot Controller is responsible for the following:
Maintaining global metadata (e.g. configs and schemas) of the system with the help of Zookeeper which is used as the persistent metadata store.
Hosting the Helix Controller and managing other Pinot components (brokers, servers, minions)
Maintaining the mapping of which servers are responsible for which segments. This mapping is used by the servers to download the portion of the segments that they are responsible for. This mapping is also used by the broker to decide which servers to route the queries to.
Serving admin endpoints for viewing, creating, updating, and deleting configs, which are used to manage and operate the cluster.
Serving endpoints for segment uploads, which are used in offline data pushes. They are responsible for initializing real-time consumption and coordination of persisting real-time segments into the segment store periodically.
Undertaking other management activities such as managing segment retention and validations.
For redundancy, there can be multiple instances of Pinot controllers. Pinot expects that all controllers are configured with the same back-end storage system so that they have a common view of the segments (e.g. NFS). Pinot can use other storage systems such as HDFS or ADLS.
The controller runs several periodic tasks in the background to perform activities such as management and validation. Each periodic task has its own configs that define its run frequency and default frequency. Each task runs on its own schedule or can also be triggered manually if needed. The task runs on the lead controller for each table.
Here's a list of all the periodic tasks:
This task rebuilds the BrokerResource if the instance set has changed.
controller.broker.resource.validation.frequencyPeriod (default: 1h)
controller.broker.resource.validation.initialDelayInSeconds (default: between 2m and 5m)
This task periodically cleans up stale Pinot broker/server/minion instances.
controller.stale.instances.cleanup.task.frequencyPeriod (default: 1h)
controller.stale.instances.cleanup.task.initialDelaySeconds (default: between 2m and 5m)
controller.stale.instances.cleanup.task.minOfflineTimeBeforeDeletionPeriod (default: 1h)
This task manages the segment ValidationMetrics (missingSegmentCount, offlineSegmentDelayHours, lastPushTimeDelayHours, TotalDocumentCount, NonConsumingPartitionCount, SegmentCount), to ensure that all offline segments are contiguous (no missing segments) and that the offline push delay isn't too high.
controller.offline.segment.interval.checker.frequencyPeriod (default: 24h)
controller.statuschecker.waitForPushTimePeriod (default: 10m)
controller.offlineSegmentIntervalChecker.initialDelayInSeconds (default: between 2m and 5m)
TBD
This task validates the ideal state and segment zk metadata of realtime tables,
fixing any partitions which have stopped consuming
starting consumption from new partitions
uploading segments to deep store if segment download url is missing
This task ensures that the consumption of the realtime tables gets fixed and keeps going when met with erroneous conditions.
This task does not fix consumption stalled due to
CONSUMING segment being deleted
Kafka OOR exceptions
controller.realtime.segment.validation.frequencyPeriod (default: 1h)
controller.realtime.segment.validation.initialDelayInSeconds (default: between 2m and 5m)
This task manages retention of segments for all tables. During the run, it looks at the retentionTimeUnit and retentionTimeValue inside the segmentsConfig of every table, and deletes segments which are older than the retention. The deleted segments are moved to a DeletedSegments folder colocated with the dataDir on the segment store, and permanently deleted from that folder after a configurable number of days.
controller.retention.frequencyPeriod (default: 6h)
controller.retentionManager.initialDelayInSeconds (default: between 2m and 5m)
controller.deleted.segments.retentionInDays (default: 7d)
This task is applicable only if you have tierConfig or tagOverrideConfig. It runs rebalance in the background to
relocate COMPLETED segments to tag overrides
relocate ONLINE segments to tiers if tier configs are set
At most one replica is allowed to be unavailable during rebalance.
controller.segment.relocator.frequencyPeriod (default: 1h)
controller.segmentRelocator.initialDelayInSeconds (default: between 2m and 5m)
This task manages segment status metrics such as realtimeTableCount, offlineTableCount, disableTableCount, numberOfReplicas, percentOfReplicas, percentOfSegments, idealStateZnodeSize, idealStateZnodeByteSize, segmentCount, segmentsInErrorState, tableCompressedSize.
controller.statuschecker.frequencyPeriod (default: 5m)
controller.statusChecker.initialDelayInSeconds (default: between 2m and 5m)
TBD
Use the GET /periodictask/names API to fetch the names of all the periodic tasks running on your Pinot cluster.
To manually run a named periodic task, use the GET /periodictask/run API.
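For example, assuming a controller running at localhost:9000, the calls might look like the sketch below. The taskname query parameter and the example task name are assumptions; the tableName and type parameters are the ones described below.

```bash
# List the names of all periodic tasks registered on the cluster.
curl -X GET "http://localhost:9000/periodictask/names"

# Manually trigger a periodic task for a specific table
# (omit tableName to run against all tables).
curl -X GET "http://localhost:9000/periodictask/run?taskname=SegmentStatusChecker&tableName=myTable&type=OFFLINE"
```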
The Log Request Id (api-09630c07) can be used to search through the pinot-controller log file to see log entries related to the execution of the periodic task that was manually run.
If tableName (and its type OFFLINE or REALTIME) is not provided, the task will run against all tables.
Make sure you've set up Zookeeper. If you're using Docker, make sure to pull the Pinot Docker image. To start a controller:
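A minimal sketch, assuming a local Zookeeper at localhost:2181 and the default controller port 9000 (cluster name and image tag are illustrative):

```bash
# Using the launcher scripts from a Pinot distribution:
bin/pinot-admin.sh StartController \
  -zkAddress localhost:2181 \
  -clusterName PinotCluster \
  -controllerPort 9000

# Or, using Docker (network settings are illustrative):
docker run --rm -p 9000:9000 apachepinot/pinot:latest StartController \
  -zkAddress host.docker.internal:2181
```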
A tenant is a logical component defined as a group of server/broker nodes with the same Helix tag.
In order to support multi-tenancy, Pinot has first-class support for tenants. Every table is associated with a server tenant and a broker tenant. This controls the nodes that will be used by this table as servers and brokers. This allows all tables belonging to a particular use case to be grouped under a single tenant name.
The concept of tenants is very important when multiple use cases are using Pinot and there is a need to provide quotas or some sort of isolation across tenants. For example, consider two tables, Table A and Table B, in the same Pinot cluster.
We can configure Table A with server tenant Tenant A and Table B with server tenant Tenant B. We can tag some of the server nodes for Tenant A and some for Tenant B. This ensures that segments of Table A only reside on servers tagged with Tenant A, and segments of Table B only reside on servers tagged with Tenant B. The same isolation can be achieved at the broker level, by configuring broker tenants for the tables.
No need to create separate clusters for every table or use case!
This tenant is defined in the tenants section of the table config.
This section contains two main fields, broker and server, which decide the tenants used for the broker and server components of this table.
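A minimal sketch of what that section can look like; only the tenants block is shown, and the rest of the table config is omitted:

```bash
# Hypothetical fragment of a table config illustrating the tenants section.
cat > /tmp/tenants-section.json <<'EOF'
{
  "tenants": {
    "broker": "brokerTenantName",
    "server": "serverTenantName"
  }
}
EOF
```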
In the above example:
The table will be served by brokers that have been tagged as brokerTenantName_BROKER in Helix.
If this were an offline table, the offline segments for the table will be hosted on Pinot servers tagged in Helix as serverTenantName_OFFLINE.
If this were a real-time table, the real-time segments (both consuming as well as completed ones) will be hosted on Pinot servers tagged in Helix as serverTenantName_REALTIME.
Here's a sample broker tenant config. This will create a broker tenant sampleBrokerTenant by tagging 3 untagged broker nodes as sampleBrokerTenant_BROKER.
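A sketch of that broker tenant definition; the field names follow the controller's tenant payload, so treat the exact shape as an assumption:

```bash
cat > /tmp/sample-broker-tenant.json <<'EOF'
{
  "tenantRole": "BROKER",
  "tenantName": "sampleBrokerTenant",
  "numberOfInstances": 3
}
EOF
```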
To create this tenant, use the following command. The creation will fail if the number of untagged broker nodes is less than numberOfInstances.
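For example, posting the definition above to a controller assumed to be running at localhost:9000:

```bash
# Create the broker tenant via the controller REST API.
curl -X POST "http://localhost:9000/tenants" \
  -H "Content-Type: application/json" \
  -d @/tmp/sample-broker-tenant.json
```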
Follow instructions in Getting Pinot to get Pinot locally, and then
Check out the table config in the Rest API to make sure it was successfully uploaded.
Here's a sample server tenant config. This will create a server tenant sampleServerTenant by tagging 1 untagged server node as sampleServerTenant_OFFLINE and 1 untagged server node as sampleServerTenant_REALTIME.
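A sketch of that server tenant definition, under the same payload assumptions as the broker tenant above:

```bash
cat > /tmp/sample-server-tenant.json <<'EOF'
{
  "tenantRole": "SERVER",
  "tenantName": "sampleServerTenant",
  "offlineInstances": 1,
  "realtimeInstances": 1
}
EOF
```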
To create this tenant, use the following command. The creation will fail if the number of untagged server nodes is less than offlineInstances + realtimeInstances.
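For example, again assuming a controller at localhost:9000:

```bash
# Create the server tenant via the controller REST API.
curl -X POST "http://localhost:9000/tenants" \
  -H "Content-Type: application/json" \
  -d @/tmp/sample-server-tenant.json
```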
Follow instructions in Getting Pinot to get Pinot locally, and then
Check out the table config in the Rest API to make sure it was successfully uploaded.
Apache Pinot is a real-time distributed OLAP datastore, purpose-built for low-latency, high-throughput analytics, and perfect for user-facing analytical workloads.
Join us in our Slack channel for questions, troubleshooting, and feedback. You can request an invite from - https://communityinviter.com/apps/apache-pinot/apache-pinot.
We'd love to hear from you!
Pinot is a real-time distributed OLAP datastore, purpose-built to provide ultra low-latency analytics, even at extremely high throughput. It can ingest directly from streaming data sources - such as Apache Kafka and Amazon Kinesis - and make the events available for querying instantly. It can also ingest from batch data sources such as Hadoop HDFS, Amazon S3, Azure ADLS, and Google Cloud Storage.
At the heart of the system is a columnar store, with several smart indexing and pre-aggregation techniques for low latency. This makes Pinot a perfect fit for user-facing realtime analytics. At the same time, Pinot is also a great choice for other analytical use-cases, such as internal dashboards, anomaly detection, and ad-hoc data exploration.
Pinot was built by engineers at LinkedIn and Uber and is designed to scale up and out with no upper bound. Performance remains consistent based on the size of your cluster and an expected queries-per-second (QPS) threshold.
User-facing analytics, or site-facing analytics, is the analytical tools and applications that you would expose directly to the end-users of your product. In a user-facing analytics application, think of the user-base as ALL end users of an App. This App could be a social networking app, or a food delivery app - anything at all. It’s not just a few analysts doing offline analysis, or a handful of data scientists in a company running ad-hoc queries. This is ALL end-users, receiving personalized analytics on their personal devices (think 100s of 1000s of queries per second). These queries are triggered by apps, and not written by people, and so the scale will be as much as the active users on that App (think millions of events/sec)
And, this is for all the freshest possible data, which touches on the other aspect here - realtime analytics. "Yesterday" might be a long time ago for some businesses and they cannot wait for ETLs and batch jobs. The data needs to be used for analytics, as soon as it is generated (think latencies < 1s).
Wanting such a user-facing analytics application, using realtime events, sounds great. But what does it mean for the underlying infrastructure, to support such an analytical workload?
Such applications require the freshest possible data, and so the system needs to be able to ingest data in real time and make it available for querying, also in real time.
Data for such apps tend to be event data, for a wide range of actions, coming from multiple sources, and so the data comes in at a very high velocity and tends to be highly dimensional.
Queries are triggered by end-users interacting with apps - with queries per second in hundreds of thousands, with arbitrary query patterns, and latencies are expected to be in milliseconds for good user-experience.
And further do all of the above, while being scalable, reliable, highly available, and having a low cost to serve.
This video talks more about user-facing real-time analytics, and how Pinot is used to achieve that.
Here's another great video that goes into the details of how Pinot tackles some of the challenges faced in handling a user-facing analytics workload.
Pinot originated at LinkedIn, which currently has one of the largest deployments, powering more than 50 user-facing applications such as Viewed My Profile, Talent Analytics, Company Analytics, Ad Analytics and many more. At LinkedIn, Pinot also serves as the backend to visualize and monitor 10,000+ business metrics.
With Pinot's growing popularity, several companies are now using it in production to power a variety of analytics use cases. A detailed list of companies using Pinot can be found here.
A column-oriented database with various compression schemes such as Run Length, Fixed Bit Length
Pluggable indexing technologies - Sorted Index, Bitmap Index, Inverted Index, StarTree Index, Bloom Filter, Range Index, Text Search Index (Lucene/FST), JSON Index, Geospatial Index
Ability to optimize query/execution plan based on query and segment metadata
Near real-time ingestion from streams such as Kafka, Kinesis and batch ingestion from sources such as Hadoop, S3, Azure, GCS
SQL-like language that supports selection, aggregation, filtering, group by, order by, distinct queries on data
Support for multi-valued fields
Horizontally scalable and fault-tolerant
Pinot is designed to execute OLAP queries with low latency. It is suited to contexts where fast analytics, such as aggregations, are needed on immutable data, possibly with real-time data ingestion.
User facing Analytics Products
Pinot is the perfect choice for user-facing analytics products. Pinot was originally built at LinkedIn to power rich interactive real-time analytic applications such as Who Viewed Profile, Company Analytics, Talent Insights, and many more. UberEats Restaurant Manager is another example of a customer-facing Analytics App. At LinkedIn, Pinot powers 50+ user-facing products, ingesting millions of events per second and serving 100k+ queries per second at millisecond latency.
Real-time Dashboard for Business Metrics
Pinot can also be used to perform typical analytical operations such as slice and dice, drill down, roll up, and pivot on large scale multi-dimensional data. For instance, at LinkedIn, Pinot powers dashboards for thousands of business metrics. One can connect various BI tools such as Superset, Tableau, or PowerBI to visualize data in Pinot.
Instructions to connect Pinot with Superset can be found here.
Anomaly Detection
In addition to visualizing data in Pinot, one can run Machine Learning Algorithms to detect Anomalies in the data stored in Pinot. See ThirdEye for more information on how to use Pinot for Anomaly Detection and Root Cause Analysis.
While Pinot doesn't match the typical mold of a database product, it is best understood based on your role as either an analyst, data scientist, or application developer.
Enterprise business intelligence
For analysts and data scientists, Pinot is best viewed as a highly-scalable data platform for business intelligence. In this view, Pinot converges big data platforms with the traditional role of a data warehouse, making it a suitable replacement for analysis and reporting.
Enterprise application development
For application developers, Pinot is best viewed as an immutable aggregate store that sources events from streaming data sources, such as Kafka, and makes them available for querying using SQL.
As is the case with a microservice architecture, data encapsulation ends up requiring each application to provide its own data store, as opposed to sharing one OLTP database for reads and writes. In this case, it becomes difficult to query the complete view of a domain because the data is stored in many different databases. This is costly in terms of performance since it requires joins across multiple microservices that expose their data over HTTP under a REST API. To prevent this, Pinot can be used to aggregate all of the data across a microservice architecture into one easily queryable view of the domain.
Pinot tenants prevent any possibility of sharing ownership of database tables across microservice teams. Developers can create their own query models of data from multiple systems of record depending on their use case and needs. As with all aggregate stores, query models are eventually consistent and immutable.
Our documentation is structured to let you quickly get to the content you need and is organized around the different concerns of users, operators, and developers. If you're new to Pinot and want to learn things by example, please take a look at our getting started section.
To start importing data into Pinot, check out our guides on batch import and stream ingestion based on our plugin architecture.
Pinot works very well for querying time series data with many dimensions and metrics over a vast unbounded space of records that scales linearly on a per-node basis. Filters and aggregations are both easy and fast.
Pinot supports SQL for querying read-only data. Learn more about querying Pinot for time series data in our PQL (Pinot Query Language) guide.
Pinot may be deployed to and operated on a cloud provider or a local or virtual machine. You may get started either with a bare-metal installation or a Kubernetes one (either locally or in the cloud). To get immediately started with Pinot, check out these quick start guides for bootstrapping a Pinot cluster using Docker or Kubernetes.
For a high-level overview that explains how Pinot works, please take a look at our basic concepts section.
To understand the distributed systems architecture that explains Pinot's operating model, please take a look at our basic architecture section.
Each table in Pinot is associated with a Schema. A schema defines what fields are present in the table along with the data types.
The schema is stored in Zookeeper, along with the table configuration.
A schema also defines what category a column belongs to. Columns in a Pinot table fall into three categories:
Pinot does not enforce strict rules on which of these categories columns belong to; rather, the categories can be thought of as hints to Pinot to do internal optimizations.
For example, metrics may be stored without a dictionary and can have a different default null value.
The categories are also relevant when doing segment merge and rollups. Pinot uses the dimension and time fields to identify records against which to apply merge/rollups.
Metrics aggregation is another example where Pinot uses the dimension and time columns as the key and automatically aggregates values for the metric columns.
Data types determine the operations that can be performed on a column. Pinot supports the following data types:
BOOLEAN, TIMESTAMP, and JSON were added after release 0.7.1. In release 0.7.1 and older releases, BOOLEAN is equivalent to STRING.
BIG_DECIMAL was added after release 0.10.0.
The lowest granularity the TIMESTAMP type supports is millisecond epoch; nanoseconds are not supported.
There are several built-in virtual columns inside the schema that can be used for debugging purposes:
These virtual columns can be used in queries in a similar way to regular columns.
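For example, assuming a broker at localhost:8099 and a hypothetical table myTable, you could group by virtual columns to see how records are spread across segments and hosts:

```bash
# Query virtual columns just like regular columns via the broker SQL endpoint.
curl -X POST "http://localhost:8099/query/sql" \
  -H "Content-Type: application/json" \
  -d '{"sql": "SELECT $segmentName, $hostName, COUNT(*) FROM myTable GROUP BY $segmentName, $hostName LIMIT 10"}'
```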
Let's create a schema and put it in a JSON file. For this example, we have created a schema for flight data.
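A minimal sketch of such a schema file; the column names and granularity are illustrative rather than the exact sample shipped with Pinot:

```bash
cat > /tmp/flights-schema.json <<'EOF'
{
  "schemaName": "flights",
  "dimensionFieldSpecs": [
    { "name": "airline", "dataType": "STRING" },
    { "name": "origin", "dataType": "STRING" },
    { "name": "destination", "dataType": "STRING" }
  ],
  "metricFieldSpecs": [
    { "name": "delayMinutes", "dataType": "INT" }
  ],
  "dateTimeFieldSpecs": [
    {
      "name": "departureTimeMillis",
      "dataType": "LONG",
      "format": "1:MILLISECONDS:EPOCH",
      "granularity": "1:MINUTES"
    }
  ]
}
EOF
```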
Then, we can upload the sample schema provided above using either a Bash command or REST API call.
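For example, assuming the file above and a controller at localhost:9000 (the multipart form-field name in the REST call is an assumption):

```bash
# Option 1: launcher script
bin/pinot-admin.sh AddSchema \
  -schemaFile /tmp/flights-schema.json \
  -controllerHost localhost -controllerPort 9000 \
  -exec

# Option 2: controller REST API (multipart upload)
curl -F schemaFile=@/tmp/flights-schema.json "http://localhost:9000/schemas"
```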
Pinot supports the following types of table:
The user querying the database does not need to know the type of the table. They only need to specify the table name in the query.
For example, regardless of whether we have an offline table myTable_OFFLINE, a real-time table myTable_REALTIME, or a hybrid table containing both of these, the query will be:
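A sketch of such a query, counting records in the logical table via a broker assumed to be at localhost:8099; the same statement works regardless of the underlying table type:

```bash
curl -X POST "http://localhost:8099/query/sql" \
  -H "Content-Type: application/json" \
  -d '{"sql": "SELECT COUNT(*) FROM myTable"}'
```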
You can use the following properties to make your tables faster or leaner:
Segment
Indexing
Tenants
For real-time tables, segments are built in a specific interval inside Pinot. You can tune the following for the real-time segments:
The Pinot real-time consumer ingests the data, creates the segment, and then flushes the in-memory segment to disk. Pinot allows you to configure when to flush the segment in the following ways:
Number of consumed rows - After consuming X no. of rows from the stream, Pinot will persist the segment to disk
Number of desired rows per segment - Pinot learns and then estimates the number of rows that need to be consumed so that the persisted segment is approximately the size. The learning phase starts by setting the number of rows to 100,000 (this value can be changed) and adjusts it to reach the desired segment size. The segment size may go significantly over the desired size during the learning phase. Pinot corrects the estimation as it goes along, so it is not guaranteed that the resulting completed segments are of the exact size as configured. You should set this value to optimize the performance of queries.
Max time duration to wait - Pinot consumers wait for the configured time duration after which segments are persisted to the disk.
Replicas
A segment can have multiple replicas to provide higher availability. You can configure the number of replicas for a table's segments in the table config.
However, in certain scenarios, the segment build can get very memory intensive. It might be desirable to have the non-committer servers simply download the segment from the controller instead of building it again. You can do this by setting completionMode: "DOWNLOAD" in the table configuration.
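A sketch of where this setting might sit in a real-time table config, assuming the completionConfig block lives under segmentsConfig; verify the exact nesting against your Pinot version:

```bash
# Hypothetical fragment to merge into the real-time table config.
cat > /tmp/completion-mode-fragment.json <<'EOF'
{
  "segmentsConfig": {
    "completionConfig": {
      "completionMode": "DOWNLOAD"
    }
  }
}
EOF
```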
Download Scheme
A Pinot server may fail to download segments from the deep store (such as HDFS) after segment completion. However, you can configure servers to download these segments from peer servers instead of the deep store. Currently, only HTTP and HTTPS download schemes are supported. More methods such as gRPC/Thrift can be added in the future.
You can create multiple indices on a table to increase the performance of the queries. The following types of indices are supported:
Dictionary-encoded forward index with bit compression
Raw value forward index
Sorted forward index with run-length encoding
Bitmap inverted index
Sorted inverted index
You can aggregate the real-time stream data as it is consumed to reduce segment sizes. We sum the metric column values of all rows that have the same values for all dimension and time columns and create a single row in the segment. This feature is only available on REALTIME tables.
The only supported aggregation is SUM. The columns on which pre-aggregation is to be done need to satisfy the following requirements:
All metrics should be listed in noDictionaryColumns.
There should not be any multi-value dimensions.
All dimension columns are treated as having a dictionary, even if they appear in noDictionaryColumns in the config.
The following table config snippet shows an example of enabling pre-aggregation during real-time ingestion.
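A sketch of such a snippet, assuming the aggregateMetrics flag in tableIndexConfig; the metric column names are hypothetical:

```bash
cat > /tmp/preaggregation-fragment.json <<'EOF'
{
  "tableIndexConfig": {
    "noDictionaryColumns": ["totalClicks", "totalImpressions"],
    "aggregateMetrics": true
  }
}
EOF
```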
You can also override whether a table's segments should move to a server with a different tenant based on segment status.
A tagOverrideConfig can be added under the tenants section for real-time tables, to override tags for consuming and completed segments. For example:
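A sketch of that override, assuming the realtimeConsuming and realtimeCompleted keys; the tag values are illustrative:

```bash
cat > /tmp/tag-override-fragment.json <<'EOF'
{
  "tenants": {
    "broker": "brokerTenantName",
    "server": "serverTenantName",
    "tagOverrideConfig": {
      "realtimeConsuming": "serverTenantName_REALTIME",
      "realtimeCompleted": "serverTenantName_OFFLINE"
    }
  }
}
EOF
```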
A hybrid table is a table composed of two tables, one offline and one real-time, that share the same name. In such a table, offline segments may be pushed periodically. The retention on the offline table can be set to a high value since segments are coming in on a periodic basis, whereas the retention on the real-time part can be small.
Once an offline segment is pushed to cover a recent time period, the brokers automatically switch to using the offline table for segments for that time period and use the real-time table only for data not available in the offline table.
A typical scenario is pushing deduped, cleaned-up data into an offline table every day while consuming real-time data as and when it arrives. The data can be kept in offline tables for even a few years, while the real-time data would be cleaned every few days.
Prerequisites
Sample Console Output
Start Kafka
Create a Kafka Topic
Create a Streaming table
Sample output
Start Kafka-Zookeeper
Start Kafka
Create stream table
Creating a hybrid table has to be done in 2 separate steps of creating an offline and real-time table individually. You don't need to create a separate overlay/hybrid table.
A minion can be attached to an existing Pinot cluster and then execute tasks as provided by the controller. Custom tasks can be plugged via annotations into the cluster. Some typical minion tasks are:
Segment creation
Segment purge
Segment merge
PinotTaskGenerator: defines the APIs for the controller to generate tasks for minions to execute.
PinotTaskExecutorFactory: factory for PinotTaskExecutor, which defines the APIs for the minion to execute the tasks.
MinionEventObserverFactory: factory for MinionEventObserver, which defines the APIs for task event callbacks on the minion.
The PushTask can fetch files from an input folder, e.g. an S3 bucket, and convert them into segments. The PushTask converts one file into one segment and keeps the file name in segment metadata to avoid duplicate ingestion. Below is an example task config to put in the TableConfig to enable this task. The task is scheduled every 10 minutes to keep ingesting remaining files, with at most 10 parallel tasks and 1 file per task.
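A sketch of such a task config inside the table config, assuming the task type is registered as SegmentGenerationAndPushTask and using an illustrative S3 input folder; the per-task file limit key is omitted here:

```bash
cat > /tmp/push-task-fragment.json <<'EOF'
{
  "task": {
    "taskTypeConfigsMap": {
      "SegmentGenerationAndPushTask": {
        "inputDirURI": "s3://my-bucket/input/",
        "inputFormat": "csv",
        "schedule": "0 */10 * * * ?",
        "tableMaxNumTasks": "10"
      }
    }
  }
}
EOF
```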
NOTE: You may want to simply omit "tableMaxNumTasks" due to this caveat: the task generates one segment per file and derives the segment name based on the time column of the file. If two files happen to have the same time range and are ingested by tasks from different schedules, there might be a segment name conflict. To overcome this issue for now, you can omit "tableMaxNumTasks"; by default it is Integer.MAX_VALUE, meaning as many tasks as possible are scheduled to ingest all input files in a single batch. Within one batch, a sequence number suffix is used to ensure no segment name conflict. Because the sequence number suffix is scoped within one batch, tasks from different batches might encounter the segment name conflict issue described above.
Tasks are enabled on a per-table basis. To enable a certain task type (e.g. myTask) on a table, update the table config to include the task type:
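A minimal sketch of the relevant table config section, using the hypothetical myTask type from above:

```bash
cat > /tmp/enable-task-fragment.json <<'EOF'
{
  "task": {
    "taskTypeConfigsMap": {
      "myTask": {}
    }
  }
}
EOF
```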
Under each enabled task type, custom properties can be configured for the task type.
There are also two task configs to be set as part of the cluster configs, as shown below. One controls the task's overall timeout (1 hour by default) and the other controls how many tasks can run on a single minion worker (1 by default).
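A sketch of setting these via the controller's cluster-config endpoint, assuming per-task-type keys of the form <TaskType>.timeoutMs and <TaskType>.numConcurrentTasksPerInstance (key names are assumptions; check your release):

```bash
# Set a 1-hour timeout and allow 4 concurrent sub-tasks per minion for myTask.
curl -X POST "http://localhost:9000/cluster/configs" \
  -H "Content-Type: application/json" \
  -d '{"myTask.timeoutMs": "3600000", "myTask.numConcurrentTasksPerInstance": "4"}'
```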
There are 2 ways to enable task scheduling:
Tasks can be scheduled periodically for all task types on all enabled tables. Enable auto task scheduling by configuring the schedule frequency in the controller config with the key controller.task.frequencyPeriod. This takes period strings as values, e.g. 2h, 30m, 1d.
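For example, to schedule all enabled task types every hour, you might add this to the controller configuration file (the file path is a placeholder):

```bash
# Append to the controller configuration before starting the controller.
cat >> /path/to/controller.conf <<'EOF'
controller.task.frequencyPeriod=1h
EOF
```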
Tasks can also be scheduled based on cron expressions. The cron expression is set in the schedule config for each task type separately. The controller config controller.task.scheduler.enabled should be set to true to enable cron scheduling.
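A sketch of both pieces together; the task type and cron expression are illustrative:

```bash
# Controller config: enable the cron-based task scheduler.
cat >> /path/to/controller.conf <<'EOF'
controller.task.scheduler.enabled=true
EOF

# Table config fragment: run myTask at the first second of every 5th minute.
cat > /tmp/cron-task-fragment.json <<'EOF'
{
  "task": {
    "taskTypeConfigsMap": {
      "myTask": {
        "schedule": "0 */5 * * * ?"
      }
    }
  }
}
EOF
```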
Tasks can be manually scheduled using the following controller rest APIs:
To plug in a custom task, implement PinotTaskGenerator, PinotTaskExecutorFactory, and MinionEventObserverFactory (optional) for the task type (all of them should return the same string for getTaskType()), and annotate them with the following annotations:
After annotating the classes, put them under the package org.apache.pinot.*.plugin.minion.tasks.*, and they will be auto-registered by the controller and minion.
In the Pinot UI, there is a Minion Task Manager tab under the Cluster Manager page. From that tab, one can find a lot of task-related info for troubleshooting. This info is mainly collected from the Pinot controller that schedules tasks, or from Helix, which tracks task runtime status. There are also buttons to schedule tasks in an ad hoc way. Below are brief introductions to some pages under the Minion Task Manager tab.
This page shows which types of minion task have been used, essentially which task types have created their task queues in Helix.
Clicking into a task type, one can see the tables using that task, along with a few buttons to stop the task queue, clean up ended tasks, and so on.
Clicking into any table in this list, one can see how the task is configured for that table, as well as the task metadata if there is any in ZK. For example, MergeRollupTask tracks a watermark in ZK. If the task is cron scheduled, the current and next schedules are also shown on this page.
At the bottom of this page, one can see a list of tasks generated for this table for this specific task type. Here, for example, one MergeRollup task has been generated and completed.
Clicking into a task from that list, we can see its start/end time and the sub-tasks generated for that task (as context, one minion task can have multiple sub-tasks to process data in parallel). In this example, there happens to be one sub-task, and the page shows when it started and stopped and which minion worker it ran on.
Clicking into this sub-task, one can see more details about it, such as the input task configs and error info if the task failed.
There is a controller job that runs every 5 minutes by default and emits metrics about Minion tasks scheduled in Pinot. The following metrics are emitted for each task type:
NumMinionTasksInProgress: Number of running tasks
NumMinionSubtasksRunning: Number of running sub-tasks
NumMinionSubtasksWaiting: Number of waiting sub-tasks (unassigned to a minion as yet)
NumMinionSubtasksError: Number of error sub-tasks (completed with an error/exception)
PercentMinionSubtasksInQueue: Percent of sub-tasks in waiting or running states
PercentMinionSubtasksInError: Percent of sub-tasks in error
The controller also emits metrics about how tasks are cron scheduled:
cronSchedulerJobScheduled: Number of current cron schedules registered to be triggered regularly according to their cron expressions. It's a Gauge.
cronSchedulerJobTrigger: Number of cron schedules triggered, as a Meter.
cronSchedulerJobSkipped: Number of late cron schedules skipped, as a Meter.
cronSchedulerJobExecutionTimeMs: Time used to complete task generation, as a Timer.
For each task, the Minion will emit these metrics:
TASK_QUEUEING: Task queueing time (task_dequeue_time - task_inqueue_time), assuming the time drift between helix controller and pinot minion is minor, otherwise the value may be negative
TASK_EXECUTION: Task execution time, which is the time spent on executing the task
NUMBER_OF_TASKS: number of tasks in progress on that minion. Whenever a Minion starts a task, increase the Gauge by 1, whenever a Minion completes (either succeeded or failed) a task, decrease it by 1
NUMBER_TASKS_EXECUTED: Number of tasks executed, as a Meter.
NUMBER_TASKS_COMPLETED: Number of tasks completed, as a Meter.
NUMBER_TASKS_CANCELLED: Number of tasks cancelled, as a Meter.
NUMBER_TASKS_FAILED: Number of tasks failed, as a Meter. Different from fatal failure, the task encountered an error which can not be recovered from this run, but it may still succeed by retrying the task.
NUMBER_TASKS_FATAL_FAILED: Number of tasks fatal failed, as a Meter. Different from failure, the task encountered an error, which will not be recoverable even with retrying the task.
This page covers everything you need to know about how queries are computed in Pinot's distributed systems architecture.
This page will introduce you to the guiding principles behind the design of Apache Pinot. Here you will learn the distributed systems architecture that allows Pinot to scale the performance of queries linearly based on the number of nodes in a cluster. You'll also be introduced to the two different types of tables used to ingest and query data in offline (batch) or real-time (stream) mode.
Pinot was designed by engineers at LinkedIn and Uber to scale query performance based on the number of nodes in a cluster. As you add more nodes, query performance will always improve based on the expected query volume per second quota. To achieve horizontal scalability to an unbounded number of nodes and data storage, without performance degradation, the following guiding design principles were established.
Highly available: Pinot is built to serve low latency analytical queries for customer facing applications. By design, there is no single point of failure in Pinot. The system continues to serve queries when a node goes down.
Horizontally scalable: Ability to scale by adding new nodes as a workload changes.
Latency vs Storage: Pinot is built to provide low latency even at high-throughput. Features such as segment assignment strategy, routing strategy, star-tree indexing were developed to achieve this.
Immutable data: Pinot assumes that all data stored is immutable. For GDPR compliance, we provide an add-on solution for purging data while maintaining performance guarantees.
Dynamic configuration changes: Operations such as adding new tables, expanding a cluster, ingesting data, modifying indexing config, and re-balancing must be performed without impacting query availability or performance.
Helix divides nodes into three logical components based on their responsibilities:
Participant: These are the nodes in the cluster that actually host the distributed storage resources.
Spectator: These nodes observe the current state of each participant and route requests accordingly. Routers, for example, need to know the instance on which a partition is hosted and its state in order to route the request to the appropriate endpoint. Routing is continually being changed to optimize cluster performance as storage primitives are added and changed.
Controller: This node observes and manages the state of the participant nodes, coordinating state transitions in the cluster.
Helix uses Zookeeper to maintain cluster state. Each component in a Pinot cluster takes a Zookeeper address as a startup parameter. The various components that are distributed in a Pinot cluster will watch Zookeeper notifications and issue updates via its embedded Helix-defined agent.
Helix agents use Zookeeper to store and update configurations, as well as for distributed coordination. Zookeeper stores the following information about the cluster:
Knowing the ZNode layout structure in Zookeeper for Helix agents in a cluster is useful for operations and/or troubleshooting cluster state and health.
To achieve fault tolerance, one can start multiple controllers (typically three) and one of them will act as a leader. If the leader crashes or dies, another leader is automatically elected. Leader election is achieved using Apache Helix. Having at least one controller is required to perform any DDL-equivalent operation on the cluster, such as adding a table or a segment.
The controller does not interfere with query execution. Query execution is not impacted even when all controller nodes are offline. If all controller nodes are offline, the state of the cluster will stay as it was when the last leader went down. When a new leader comes online, the cluster resumes rebalancing activity and can accept new tables or segments.
Brokers need three key things to start.
Cluster name
Zookeeper address
Broker instance name
At the start, a broker registers as a Helix Participant and awaits notifications from other Helix agents. These notifications will be handled for table creation, a new segment being loaded, or a server starting up/or going down, in addition to any configuration changes.
Service Discovery/Routing Table
Irrespective of the kind of notification, the key responsibility of a broker is to maintain the query routing table. The query routing table is simply a mapping between segments and the servers that a segment resides on. Typically, a segment resides on more than one server. The broker computes multiple routing tables depending on the configured routing strategy for a table. The default strategy is to balance the query load across all available servers.
There are advanced routing strategies available such as ReplicaAware routing, partition-based routing, and minimal server selection routing. These strategies are meant for special or generic cases that serve very high throughput queries.
Query processing
For every query, a cluster's broker performs the following:
Scatter-Gather: sends the requests to each server and gathers the responses.
Merge: merges the query results returned from each server.
Sends the query result to the client.
Fault tolerance
Broker instances scale horizontally without an upper bound. In a majority of cases, only three brokers are required. If most query results that are returned to a client are <1MB in size per query, one can run a broker and servers inside the same instance container. This lowers the overall footprint of a cluster deployment for use cases that do not need to guarantee a strict SLA on query performance in production.
In theory, a server can host both real-time segments and offline segments. However, in practice, we use different types of machine SKUs for real-time servers and offline servers. The advantage of separating real-time servers and offline servers is to allow each to scale independently.
Offline servers
Real-time servers
The two types of tables also scale differently.
Real-time tables have a smaller retention period and scale query performance based on the ingestion rate.
Offline tables have larger retention and scale performance based on the size of stored data.
Tables for real-time and offline can be configured differently depending on usage requirements. For example, you can choose to enable star-tree indexing for an offline table, while the real-time table with the same schema may not need it.
At table creation, a controller creates a new entry in Zookeeper for the consuming segment. Helix notices the new segment and notifies the real-time server, which starts consuming data from the streaming source. The broker, which watches for changes, detects the new segments and adds them to the list of segments to query (segment-to-server routing table).
Whenever the segment is complete (i.e. full), the real-time server notifies the Controller, which checks with all replicas and picks a winner to commit the segment to. The winner commits the segment and uploads it to the cluster's segment store, updating the state of the segment from "consuming" to "online". The controller then prepares a new segment in a "consuming" state.
Queries are received by brokers, which check the request against the segment-to-server routing table and scatter the request between real-time and offline servers.
The two tables then process the request by filtering and aggregating the queried data, which is then returned back to the broker. Finally, the broker gathers together all of the pieces of the query response and responds back to the client with the result.
Learn about the deep store that stores a compressed copy of segment files in Pinot.
Note: Deep Store by itself is not sufficient for restore operations. Pinot stores metadata such as table config, schema, segment metadata in Zookeeper. For restore operations, both Deep Store as well as Zookeeper metadata are required.
There are several different ways that segments are persisted in the deep store.
For offline tables, the batch ingestion job writes the segment directly into the deep store, as shown in the diagram below:
The ingestion job then sends a notification about the new segment to the controller, which in turn notifies the appropriate server to pull down that segment.
For real-time tables, by default, a segment is first built in memory by the server. It is then uploaded to the lead controller (as part of the Segment Completion Protocol sequence), which writes the segment into the deep store, as shown in the diagram below:
When using this configuration the server will directly write a completed segment to the deep store, as shown in the diagram below:
For hands-on examples of how to configure the deep store, see the following tutorials:
Pinot also supports columns that contain lists or arrays of items, but there isn't an explicit data type to represent these lists or arrays. Instead, you can indicate that a dimension column accepts multiple values. For more information, see the Schema configuration reference.
Since Pinot doesn't have dedicated DATETIME datatype support, you need to input time in either STRING, LONG, or INT format. However, Pinot needs to convert the date into an understandable format, such as an epoch timestamp, to do operations. Refer to the schema reference for more details on supported formats.
First, make sure your cluster is up and running.
For more details on constructing a schema file, see the .
Check out the schema in the Rest API to make sure it was successfully uploaded.
A table is a logical abstraction that represents a collection of related data. It is composed of columns and rows (known as documents in Pinot). The columns, data types, and other metadata related to the table are defined using a schema.
Pinot breaks a table into multiple segments and stores these segments in a deep store such as HDFS as well as on Pinot servers.
In the Pinot cluster, a table is modeled as a Helix resource and each segment of a table is modeled as a Helix partition.
A table config is used to define the table properties, such as name, type, indexing, routing, retention etc. It is written in JSON format and is stored in Zookeeper, along with the table schema.
A table is composed of small chunks of data known as segments. To learn more about how Pinot creates and manages segments, see the Segment section.
For offline tables, segments are built outside of Pinot and uploaded using a distributed executor such as Spark or Hadoop. For more details, see .
Completion Mode
By default, if the in-memory segment on the non-winner server is equivalent to the committed segment, then the non-winner server builds and replaces the segment. If the available segment is not equivalent to the committed segment, the server simply downloads the committed segment from the controller.
For more details on why this is needed, see
For more details about peer segment download during real-time ingestion, please refer to this design doc on
For more details on each indexing mechanism and corresponding configurations, see .
You can also set up bloom filters on columns to make queries faster. Further, you can also keep segments in off-heap instead of on-heap memory for faster queries.
Each table is associated with a tenant. A segment resides on servers that have the same tenant as the table. For more details on how tenants work, see .
In the above example, the consuming segments will still be assigned to serverTenantName_REALTIME hosts, but once they are completed, the segments will be moved to serverTenantName_OFFLINE. It is possible to specify the full name of any tag in this section (so, for example, you could decide that completed segments for this table should be on Pinot servers tagged as allTables_COMPLETED). To learn more about this config, see the section.
To understand how time boundary works in the case of a hybrid table, see .
Create a table config for your data, or see for all possible batch/streaming tables.
Check out the table config in the Rest API to make sure it was successfully uploaded.
Check out the table config in the Rest API to make sure it was successfully uploaded.
A Minion is a standby component that leverages the Helix Task Framework to offload computationally intensive tasks from other components.
Make sure you've set up Zookeeper. If you're using Docker, make sure to pull the Pinot Docker image. To start a minion:
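A minimal sketch, assuming a local Zookeeper at localhost:2181 and the same illustrative cluster name used for the controller above:

```bash
bin/pinot-admin.sh StartMinion \
  -zkAddress localhost:2181 \
  -clusterName PinotCluster
```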
See for details.
See for details.
As shown below, the RealtimeToOfflineSegmentsTask will be scheduled at the first second of every minute (following the syntax ).
See where the TestTask is plugged in.
It's recommended that you read to better understand the terms used in this guide.
As described in the concepts section, Pinot has multiple distributed system components: controllers, brokers, servers, and minions.
Pinot uses Apache Helix for cluster management. Helix is embedded as an agent within the different components and uses Apache Zookeeper for coordination and maintaining the overall cluster state and health.
All Pinot servers and brokers are managed by Helix. Helix is a generic cluster management framework to manage partitions and replicas in a distributed system. It's helpful to think of Helix as an event-driven discovery service with push and pull notifications that drives the state of a cluster to an ideal configuration. A finite-state machine maintains a contract of stateful operations that drives the health of the cluster towards its optimal configuration. Query load is optimized as Helix updates routing configurations between nodes based on where data is stored in the cluster.
Controller: The controller observes and manages the state of participant nodes. The controller is responsible for coordinating all state transitions in the cluster and ensures that state constraints are satisfied while maintaining cluster stability.
Pinot's controller acts as the driver of the cluster's overall state and health. Because of its role as a Helix participant and spectator, which drives the state of other components, it is the first component that is typically started after Zookeeper. Two parameters are required for starting a controller: the Zookeeper address and the cluster name. The controller will automatically create a cluster via Helix if it does not yet exist.
The controller provides a REST interface to perform CRUD operations on all logical storage resources (servers, brokers, tables, and segments).
See for more information on the web-based admin tool.
The responsibility of the broker is to route a given query to an appropriate server instance. A broker will collect and merge the responses from all servers into a final result and send it back to the requesting client. The broker provides HTTP endpoints that accept SQL queries and returns the response in JSON format.
Fetches the routes that are computed for a query based on the routing strategy defined in a configuration.
Computes the list of segments to query on each server.
Servers host segments and do most of the heavy lifting during query processing. Though the architecture shows that there are two kinds of servers, real-time and offline, a server does not really know if it's going to be a real-time server or an offline server. The responsibility of a server depends on the assignment strategy.
Offline servers typically host segments that are immutable. In this case, segments are created outside of a cluster and uploaded via a shell-based request. Based on the replication factor and the segment assignment strategy, the controller picks one or more servers to host the segment. Servers are notified via Helix about the new segments. Servers fetch the segments from deep store and load them before being ready to serve query requests. At this point, the cluster's broker detects that new segments are available and starts including them in query responses.
Real-time servers are different from the offline servers. Real-time nodes ingest data from streaming sources, such as Kafka, and generate the indexed segments in memory (flushing segments to disk periodically). In-memory segments are also known as consuming segments. These consuming segments get flushed periodically based on a completion threshold (based on the number of rows, time, or segment size). At this point, they are known as completed segments. Completed segments are similar to the offline server's segments. Queries go over the in-flight (consuming) segments and the completed segments.
Minion is an optional component and is not required to get started with Pinot. Minion is used for purging data from a Pinot cluster (for reasons such as GDPR compliance in the UK).
Within Pinot, a logical table is modeled as one of two types of physical tables: offline or real-time. The reason for having two types of tables is because each one follows a different state model.
A real-time and offline table provide different configuration options for indexing and, in the case of real-time, the connector properties for the stream data source (i.e. Kafka). Table types also allow users to use different containers for real-time and offline nodes. For instance, offline servers might use virtual machines with larger storage capacity where real-time servers might need higher system memory and/or more CPU cores.
There are a few things to keep in mind when configuring the different types of tables for your workloads. When ingesting data from the same source, you can have two tables, configured differently for real-time and offline queries, that ingest the same data. Even though the two tables contain the same data, performance will scale differently based on your query requirements. In this scenario, the real-time and offline tables must share the same schema.
In batch mode, data is ingested into Pinot via an ingestion job. An ingestion job transforms a raw data source (such as a CSV file) into segments. Once segments are generated for the imported data, the ingestion job stores them in the cluster's segment store (also known as the deep store) and notifies the controller. The notification is processed, and as a result the Helix agent on the controller updates the ideal state configuration in Zookeeper. Helix then notifies the offline server that new segments are available. In response to the notification from the controller, the offline server downloads the newly created segments directly from the cluster's segment store. The cluster's broker, which watches for state changes in Helix, detects the new segments and adds them to the list of segments to query (the segment-to-server routing table).
The deep store (or deep storage) is the permanent store for segment files.
It is used for backup and restore operations. New nodes in a cluster will pull down a copy of segment files from the deep store. If the local segment files on a server get damaged in some way (or accidentally deleted), a new copy will be pulled down from the deep store on server restart.
The deep store stores a compressed version of the segment files, and it typically won't include any indexes. These compressed files can be stored on a local file system or on a variety of other file systems. For more details, see the list of supported file systems.
Having all segments go through the controller can become a system bottleneck under heavy load, in which case you can use the peer download policy, as described in Decoupling Controller from the Data Path.
Dimension
Dimension columns are typically used in slice and dice operations for answering business queries. Some operations for which dimension columns are used:
GROUP BY - group by one or more dimension columns along with aggregations on one or more metric columns
Filter clauses such as WHERE
Metric
These columns represent the quantitative data of the table. Such columns are used for aggregation. In data warehouse terminology, these can also be referred to as fact or measure columns.
Some operations for which metric columns are used:
Aggregation - SUM, MIN, MAX, COUNT, AVG, etc.
Filter clauses such as WHERE
DateTime
This column represents time columns in the data. There can be multiple time columns in a table, but only one of them can be treated as primary. The primary time column is the one that is present in the segment config.
The primary time column is used by Pinot to maintain the time boundary between offline and real-time data in a hybrid table and for retention management. A primary time column is mandatory if the table's push type is APPEND and optional if the push type is REFRESH.
Common operations that can be done on time column:
GROUP BY
Filter clauses such as WHERE
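As a minimal sketch (the table and field names are hypothetical), a primary time column is declared in the schema's dateTimeFieldSpecs:

```bash
cat > /tmp/myTable-schema.json <<'EOF'
{
  "schemaName": "myTable",
  "dimensionFieldSpecs": [{"name": "country", "dataType": "STRING"}],
  "metricFieldSpecs":    [{"name": "clicks", "dataType": "LONG"}],
  "dateTimeFieldSpecs": [{
    "name": "ts",
    "dataType": "LONG",
    "format": "1:MILLISECONDS:EPOCH",
    "granularity": "1:MILLISECONDS"
  }]
}
EOF
```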
Data Type - Default Null Value
INT - 0
LONG - 0
FLOAT - 0.0
DOUBLE - 0.0
BIG_DECIMAL - Not supported for dimension columns; 0.0 for metric columns
BOOLEAN - 0 (false); N/A for metric columns
TIMESTAMP - 0 (1970-01-01 00:00:00 UTC); N/A for metric columns
STRING - "null"; N/A for metric columns
JSON - "null"; N/A for metric columns
BYTES - byte array of length 0
Pinot also defines a few built-in virtual columns that exist in every table:
$hostName (Dimension, STRING) - Name of the server hosting the data
$segmentName (Dimension, STRING) - Name of the segment containing the record
$docId (Dimension, INT) - Document id of the record within the segment
Offline - Offline tables ingest pre-built Pinot segments from external data stores. This is generally used for batch ingestion.
Realtime - Realtime tables ingest data from streams (such as Kafka) and build segments from the consumed data.
Hybrid - A hybrid Pinot table has both realtime and offline tables under the hood. By default, all tables in Pinot are hybrid in nature.
POST /tasks/schedule - Schedule tasks for all task types on all enabled tables
POST /tasks/schedule?taskType=myTask - Schedule tasks for the given task type on all enabled tables
POST /tasks/schedule?tableName=myTable_OFFLINE - Schedule tasks for all task types on the given table
POST /tasks/schedule?taskType=myTask&tableName=myTable_OFFLINE - Schedule tasks for the given task type on the given table
To plug a custom task into the Minion framework, implement the following interfaces and register each implementation with the corresponding annotation:
PinotTaskGenerator - @TaskGenerator
PinotTaskExecutorFactory - @TaskExecutorFactory
MinionEventObserverFactory - @EventObserverFactory
Component - Helix Mapping
Segment - Modeled as a Helix Partition. Each segment can have multiple copies, referred to as replicas.
Table - Modeled as a Helix Resource. Multiple segments are grouped into a table. All segments belonging to a Pinot table have the same schema.
Controller - Embeds the Helix agent that drives the overall state of the cluster.
Server - Modeled as a Helix Participant that hosts segments.
Broker - Modeled as a Helix Spectator that observes the cluster for changes in the state of segments and servers. In order to support multi-tenancy, brokers are also modeled as Helix Participants.
Minion - Pinot Minion is modeled as a Helix Participant.
Resource - Stored Properties
Controller - The controller that is assigned as the current leader.
Servers/Brokers - A list of servers/brokers and their configuration; health status.
Tables - List of tables; table configurations; table schema information; list of segments within a table.
Segment - Exact server location(s) of a segment (routing table); state of each segment (online/offline/error/consuming); metadata about each segment.
This section describes quick start commands that launch all Pinot components in a single process.
Pinot ships with QuickStart commands that launch Pinot components in a single process and import pre-built datasets. These QuickStarts are a good place to start if you're new to Pinot.
Prerequisites
You will need to have installed Pinot locally or have Docker installed if you want to use the Pinot Docker image.
macOS Monterey Users
By default, the AirPlay Receiver runs on port 7000, which is also the port used by the Pinot Server in the quick start. You may see the following error when running these examples:
If you disable the AirPlay Receiver and try again, you shouldn't see this error message anymore.
This example demonstrates how to do batch processing with Pinot. The command:
Starts Apache Zookeeper, Pinot Controller, Pinot Broker, and Pinot Server.
Creates the baseballStats table
Launches a standalone data ingestion job that builds one segment for a given CSV data file for the baseballStats table and pushes the segment to the Pinot Controller.
Issues sample queries to Pinot
This example demonstrates how to import and query JSON documents in Pinot. The command:
Starts Apache Zookeeper, Pinot Controller, Pinot Broker, and Pinot Server.
Creates the githubEvents table
Launches a standalone data ingestion job that builds one segment for a given JSON data file for the githubEvents table and pushes the segment to the Pinot Controller.
Issues sample queries to Pinot
This example demonstrates how to do batch processing in Pinot where the data items have complex fields that need to be unnested. The command:
Starts Apache Zookeeper, Pinot Controller, Pinot Broker, and Pinot Server.
Creates the githubEvents table
Launches a standalone data ingestion job that builds one segment for a given JSON data file for the githubEvents table and pushes the segment to the Pinot Controller.
Issues sample queries to Pinot
This example demonstrates how to do stream processing with Pinot. The command:
Starts Apache Kafka, Apache Zookeeper, Pinot Controller, Pinot Broker, and Pinot Server.
Creates the meetupRsvp table
Launches a meetup stream
Publishes data to a Kafka topic meetupRSVPEvents that is subscribed to by Pinot.
Issues sample queries to Pinot
This example demonstrates how to do stream processing with JSON documents in Pinot. The command:
Starts Apache Kafka, Apache Zookeeper, Pinot Controller, Pinot Broker, and Pinot Server.
Creates the meetupRsvp table
Launches a meetup stream
Publishes data to a Kafka topic meetupRSVPEvents that is subscribed to by Pinot.
Issues sample queries to Pinot
This example demonstrates how to do stream processing in Pinot with RealtimeToOfflineSegmentsTask and MergeRollupTask minion tasks continuously optimizing segments as data gets ingested. The command:
Starts Apache Kafka, Apache Zookeeper, Pinot Controller, Pinot Broker, Pinot Minion, and Pinot Server.
Creates the githubEvents table
Launches a GitHub events stream
Publishes data to a Kafka topic githubEvents that is subscribed to by Pinot.
Issues sample queries to Pinot
This example demonstrates how to do stream processing in Pinot where the stream contains items that have complex fields that need to be unnested. The command:
Starts Apache Kafka, Apache Zookeeper, Pinot Controller, Pinot Broker, Pinot Minion, and Pinot Server.
Creates the meetupRsvp table
Launches a meetup stream
Publishes data to a Kafka topic meetupRSVPEvents that is subscribed to by Pinot.
Issues sample queries to Pinot
This example demonstrates how to do stream processing with upsert with Pinot. The command:
Starts Apache Kafka, Apache Zookeeper, Pinot Controller, Pinot Broker, and Pinot Server.
Creates the meetupRsvp table
Launches a meetup stream
Publishes data to a Kafka topic meetupRSVPEvents that is subscribed to by Pinot.
Issues sample queries to Pinot
This example demonstrates how to do stream processing with upsert with JSON documents in Pinot. The command:
Starts Apache Kafka, Apache Zookeeper, Pinot Controller, Pinot Broker, and Pinot Server.
Creates the meetupRsvp table
Launches a meetup stream
Publishes data to a Kafka topic meetupRSVPEvents that is subscribed to by Pinot.
Issues sample queries to Pinot
This example demonstrates how to do hybrid stream and batch processing with Pinot. The command:
Starts Apache Kafka, Apache Zookeeper, Pinot Controller, Pinot Broker, and Pinot Server.
Creates the airlineStats table
Launches a standalone data ingestion job that builds segments under a given directory of Avro files for the airlineStats table and pushes the segments to the Pinot Controller.
Launches a stream of flights stats
Publishes data to a Kafka topic airlineStatsEvents that is subscribed to by Pinot.
Issues sample queries to Pinot
This example demonstrates how to do joins in Pinot using the Lookup UDF. The command:
Starts Apache Zookeeper, Pinot Controller, Pinot Broker, and Pinot Server in the same container.
Creates the baseballStats table
Launches a data ingestion job that builds one segment for a given CSV data file for the baseballStats table and pushes the segment to the Pinot Controller.
Creates the dimBaseballTeams table
Launches a data ingestion job that builds one segment for a given CSV data file for the dimBaseballTeams table and pushes the segment to the Pinot Controller.
Issues sample queries to Pinot
This section contains quick start guides to help you get up and running with Pinot.
To simplify the getting started experience, Pinot ships with quick start guides that launch Pinot components in a single process and import pre-built datasets.
For a full list of these guides, see Quick Start Examples.
Getting data into Pinot is easy. Take a look at these two quick start guides which will help you get up and running with sample data for offline and real-time tables.
Explore the data on our Pinot cluster
Once you have set up the Cluster, you can start exploring the data and the APIs using the Pinot Data Explorer.
Navigate to http://localhost:9000 in your browser to open the controller UI.
The first screen that you'll see when you open the Pinot Data Explorer is the Cluster Manager. The Cluster Manager provides a UI to operate and manage your cluster.
If you want to view the contents of a server, click on its instance name. You'll then see the following:
To view the baseballStats table, click on its name, which will show the following screen:
From this screen, we can edit or delete the table, edit or adjust its schema, and perform several other operations.
For example, if we want to add yearID to the list of inverted indexes, click on Edit Table, add the extra column, and click Save:
Let us run some queries on the data in the Pinot cluster. Head over to Query Console to see the querying interface.
We can see our baseballStats table listed on the left (you will see meetupRsvp or airlineStats if you used the streaming or the hybrid quick start). Click on the table name to display the names and data types of the table's columns.
You can also execute a sample query select * from baseballStats limit 10 by typing it in the text box and clicking the Run Query button. Cmd + Enter can also be used to run the query when focused on the console.
You can also try out the following queries:
Pinot supports a subset of standard SQL. For more information, see Pinot Query Language.
The Pinot Admin UI contains all the APIs that you will need to operate and manage your cluster. It provides a set of APIs for Pinot cluster management, including health checks, instance management, schema and table management, and data segment management.
Let's check out the tables in this cluster by going to Table -> List all tables in cluster, click Try it out, and then click Execute. We can see the baseballStats table listed here. We can also see the exact cURL call made to the controller API.
You can look at the configuration of this table by going to Tables -> Get/Enable/Disable/Drop a table, click Try it out, type baseballStats in the table name, and then click Execute.
Let's check out the schemas in the cluster by going to Schema -> List all schemas in the cluster, click Try it out, and then click Execute. We can see a schema called baseballStats in this list.
Take a look at the schema by going to Schema -> Get a schema, click Try it out, type baseballStats in the schema name, and then click Execute.
Finally, let's check out the data segments in the cluster by going to Segment -> List all segments, click Try it out, type baseballStats in the table name, and then click Execute. There is one segment for this table, called baseballStats_OFFLINE_0.
To learn how to upload your own data and schema, see Batch Ingestion or Stream ingestion.
This guide will show you how to run a Pinot cluster using Docker.
In this guide we will learn about running Pinot in Docker.
This guide assumes that you have installed Docker and have configured it with enough memory. A sample config is shown below:
The latest Pinot Docker image is published at apachepinot/pinot:latest, and you can see a list of all published tags on Docker Hub.
You can pull the Docker image onto your machine by running the following command:
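For example, to pull the latest tag referenced above:

```bash
docker pull apachepinot/pinot:latest
```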
Or if you want to use a specific version:
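For instance, with a released tag (the version shown is only an example; pick any tag from Docker Hub):

```bash
docker pull apachepinot/pinot:1.2.0
```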
Now that we've downloaded the Pinot Docker image, it's time to set up a cluster. There are two ways to do this:
Pinot comes with quick-start commands that launch instances of Pinot components in the same process and import pre-built datasets.
For example, the following quick-start launches Pinot with a baseball dataset pre-loaded:
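A sketch of that command, assuming the image's default entrypoint is the pinot-admin.sh script:

```bash
docker run -ti \
  --name pinot-quickstart \
  -p 9000:9000 \
  apachepinot/pinot:latest QuickStart -type batch
```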
For a list of all the available quick starts, see the Quick Start Examples.
The quick start scripts launch Pinot with minimal resources. If you want to play with bigger datasets (more than a few MB), you can launch each of the Pinot components individually.
Create an isolated bridge network in docker
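For example (the network name pinot-demo is an arbitrary choice reused in the commands below):

```bash
docker network create -d bridge pinot-demo
```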
Start Zookeeper in daemon mode. This is a single node zookeeper setup. Zookeeper is the central metadata store for Pinot and should be set up with replication for production use. For more information, see Running Replicated Zookeeper.
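A sketch of a single-node Zookeeper container on that network (the image tag is illustrative):

```bash
docker run -d \
  --network=pinot-demo \
  --name pinot-zookeeper \
  -p 2181:2181 \
  zookeeper:latest
```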
Start the Pinot Controller in daemon mode and connect it to Zookeeper.
The command below expects a 4GB memory container. Tune -Xms and -Xmx if your machine doesn't have enough resources.
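A sketch of the controller container, assuming the Zookeeper container above and that the image honors the JAVA_OPTS environment variable:

```bash
docker run -d \
  --network=pinot-demo \
  --name pinot-controller \
  -p 9000:9000 \
  -e JAVA_OPTS="-Xms1G -Xmx4G" \
  apachepinot/pinot:latest StartController \
  -zkAddress pinot-zookeeper:2181
```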
Start the Pinot Broker in daemon mode and connect it to Zookeeper.
The command below expects a 4GB memory container. Tune -Xms and -Xmx if your machine doesn't have enough resources.
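A sketch of the broker container, under the same assumptions:

```bash
docker run -d \
  --network=pinot-demo \
  --name pinot-broker \
  -p 8099:8099 \
  -e JAVA_OPTS="-Xms1G -Xmx4G" \
  apachepinot/pinot:latest StartBroker \
  -zkAddress pinot-zookeeper:2181
```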
Start the Pinot Server in daemon mode and connect it to Zookeeper.
The command below expects a 16GB memory container. Tune -Xms and -Xmx if your machine doesn't have enough resources.
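A sketch of the server container, under the same assumptions:

```bash
docker run -d \
  --network=pinot-demo \
  --name pinot-server \
  -p 8098:8098 \
  -e JAVA_OPTS="-Xms4G -Xmx16G" \
  apachepinot/pinot:latest StartServer \
  -zkAddress pinot-zookeeper:2181
```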
Optionally, you can also start Kafka for setting up realtime streams. This brings up the Kafka broker on port 9092.
Now all Pinot-related components are started as an empty cluster.
You can run the below command to check container status.
Sample Console Output
Create a file called docker-compose.yml that contains the following:
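A minimal sketch of such a file (the image tags, ports, and service layout are illustrative; adjust them to your environment):

```bash
cat > docker-compose.yml <<'EOF'
version: '3.7'
services:
  zookeeper:
    image: zookeeper:latest
    ports:
      - "2181:2181"
  pinot-controller:
    image: apachepinot/pinot:latest
    command: "StartController -zkAddress zookeeper:2181"
    ports:
      - "9000:9000"
    depends_on:
      - zookeeper
  pinot-broker:
    image: apachepinot/pinot:latest
    command: "StartBroker -zkAddress zookeeper:2181"
    ports:
      - "8099:8099"
    depends_on:
      - pinot-controller
  pinot-server:
    image: apachepinot/pinot:latest
    command: "StartServer -zkAddress zookeeper:2181"
    ports:
      - "8098:8098"
    depends_on:
      - pinot-broker
EOF
```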
Run the following command to launch all the components:
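For example:

```bash
docker-compose up -d
```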
You can run the below command to check the container status.
Sample Console Output
Once your cluster is up and running, you can head over to Exploring Pinot to learn how to run queries against the data.
If you have minikube or Docker Kubernetes installed, you could also try running the Kubernetes quick start.
Note: These are sample configs to be used as references. For a production setup, you may want to customize them to your needs.
This quick start guide will help you bootstrap a Pinot standalone instance on your local machine.
In this guide, you'll learn how to download and install Apache Pinot as a standalone instance.
First, let's download the Pinot distribution for this tutorial. You can either download a packaged release or build a distribution from the source code.
Prerequisites
Install JDK 11 or higher (JDK 16 is not yet supported). For JDK 8 support, use Pinot 0.7.1 or compile from the source code.
You can build from source or download the distribution:
Download the latest binary release from Apache Pinot, or use this command
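A sketch of the download, assuming the standard Apache download URL layout (set the version variable to the latest release):

```bash
PINOT_VERSION=1.2.0   # example version; use the latest release
wget "https://downloads.apache.org/pinot/apache-pinot-${PINOT_VERSION}/apache-pinot-${PINOT_VERSION}-bin.tar.gz"
```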
Once you have the tar file:
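Extract it and move into the distribution directory, for example (reusing the PINOT_VERSION variable from above):

```bash
tar -zxvf "apache-pinot-${PINOT_VERSION}-bin.tar.gz"
cd "apache-pinot-${PINOT_VERSION}-bin"
```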
You can find older versions of Apache Pinot at https://archive.apache.org/dist/pinot/. For example, if you wanted to download Pinot 0.10.0, you could run the following command:
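Assuming the same file-name layout as the main download site:

```bash
wget https://archive.apache.org/dist/pinot/apache-pinot-0.10.0/apache-pinot-0.10.0-bin.tar.gz
```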
Follow these steps to check out the code from GitHub and build Pinot locally.
Prerequisites
Install Apache Maven 3.6 or higher
Add the Maven option -Djdk.version=8 when building with JDK 8.
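A sketch of the checkout and build (the Maven profile name follows the project's standard build instructions; verify against the repository README for your version):

```bash
git clone https://github.com/apache/pinot.git
cd pinot
mvn install package -DskipTests -Pbin-dist
```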
Note that the Pinot scripts are located under pinot-distribution/target, not the target directory under the root.
Currently, Apache Pinot doesn't provide official binaries for M1 Macs. You can, however, build from source using the steps provided above. In addition to those steps, you will need to add the following to your ~/.m2/settings.xml prior to the build.
Also make sure to install Rosetta: softwareupdate --install-rosetta
Note that some installations of the JDK do not contain the JNI bindings necessary to run all tests. If you see any java.lang.UnsatisfiedLinkError while running tests, you may need to change your JDK. If using Homebrew, you may install AdoptOpenJDK 11 using: brew install --cask adoptopenjdk11
Now that we've downloaded Pinot, it's time to set up a cluster. There are two ways to do this:
Pinot comes with quick-start commands that launch instances of Pinot components in the same process and import pre-built datasets.
For example, the following quick-start launches Pinot with a baseball dataset pre-loaded:
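For example, from the root of the extracted distribution:

```bash
./bin/pinot-admin.sh QuickStart -type batch
```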
For a list of all the available quick starts, see the Quick Start Examples.
If you want to play with bigger datasets (more than a few MB), you can launch all the components individually.
The video below is a step-by-step walk through for launching the individual components of Pinot and scaling them to multiple instances.
You can find the commands that are shown in this video in the github.com/npawar/pinot-tutorial GitHub repository.
The examples below assume that you are using Java 8.
If you are using Java 11+, remove the GC settings inside JAVA_OPTS.
So, for example, instead of JAVA_OPTS that include GC flags (such as -XX:+UseG1GC and GC-log settings), you would keep only the heap settings, e.g. JAVA_OPTS="-Xms4G -Xmx8G".
You can use Zooinspector to browse the Zookeeper instance.
Once your cluster is up and running, you can head over to Exploring Pinot to learn how to run queries against the data.
Starting a Pinot component of interest in IntelliJ using debug mode can be useful for development purposes: you can set breakpoints and inspect variables. Taking the server as an example, one can start zookeeper, controller, and broker using the steps in Manual Cluster, then use the following run configuration (placed under $PROJECT_DIR$/.run) to start the server. This commit is an example of how it can be used. Replace the metrics-core version and cluster name as needed.
The Docker instructions on this page are still WIP
So far, we set up our cluster, ran some queries on the demo tables, and explored the admin endpoints. We also uploaded some sample batch data for the transcript table.
Let's set up a demo Kafka cluster locally and create a sample topic, transcript-topic.
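A sketch using the standard Kafka CLI (the broker address, partition count, and replication factor are illustrative; adjust the port to however Kafka was started in your setup, e.g. 9876 if you used the quick-start Kafka):

```bash
bin/kafka-topics.sh --create \
  --bootstrap-server localhost:9092 \
  --topic transcript-topic \
  --partitions 1 \
  --replication-factor 1
```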
Now that we have our table and schema, let's upload them to the cluster. As soon as the realtime table is created, it will begin ingesting from the Kafka topic.
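For example (the file paths are hypothetical; point them at your schema and realtime table config):

```bash
bin/pinot-admin.sh AddTable \
  -schemaFile /tmp/pinot-quick-start/transcript-schema.json \
  -tableConfigFile /tmp/pinot-quick-start/transcript-table-realtime.json \
  -exec
```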
Here's a JSON file for transcript table data:
Push sample JSON into Kafka topic, using the Kafka script from the Kafka download
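A sketch, assuming the sample data was saved as transcript.json and a recent Kafka CLI (older releases use --broker-list instead of --bootstrap-server):

```bash
bin/kafka-console-producer.sh \
  --bootstrap-server localhost:9092 \
  --topic transcript-topic < transcript.json
```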
This starter provides a quick start for running Pinot on Google Cloud Platform (GCP)
For Mac User
Please check kubectl version after installation.
QuickStart scripts are tested under kubectl client version v1.16.3 and server version v1.13.12
For Mac User
Please check helm version after installation.
This QuickStart provides helm supports for helm v3.0.0 and v2.12.1. Please pick the script based on your helm version.
Install Google Cloud SDK
Restart your shell
The script below will create a 3-node cluster named pinot-quickstart in us-west1-b with n1-standard-2 machines for demo purposes.
Please modify the parameters in the example command below:
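A sketch of the command (the zone, node count, and machine type are the example values mentioned above; the project is a placeholder):

```bash
gcloud container clusters create pinot-quickstart \
  --project my-gcp-project \
  --zone us-west1-b \
  --num-nodes 3 \
  --machine-type n1-standard-2
```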
You can monitor the cluster status with the following command:
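For example:

```bash
gcloud container clusters list
```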
Once the cluster is in RUNNING status, it's ready to be used.
Simply run the command below to get the credentials for the pinot-quickstart cluster you just created, or for your existing cluster.
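For example:

```bash
gcloud container clusters get-credentials pinot-quickstart --zone us-west1-b
```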
To verify the connection, you can run:
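For example:

```bash
kubectl get nodes
```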
Step-by-step guide on pushing your own data into the Pinot cluster
Let's gather our data files and put them in pinot-quick-start/rawdata.
Supported file formats are CSV, JSON, AVRO, PARQUET, THRIFT, ORC. If you don't have sample data, you can use this sample CSV.
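A sketch using the column names described in the next paragraph (the data rows are made up for illustration):

```bash
mkdir -p pinot-quick-start/rawdata
cat > pinot-quick-start/rawdata/transcript.csv <<'EOF'
studentID,firstName,lastName,gender,subject,score,timestampInEpoch
200,Lucy,Smith,Female,Maths,3.8,1570863600000
201,Bob,King,Male,English,3.2,1571036400000
202,Nick,Young,Male,Physics,3.6,1571900400000
EOF
```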
Briefly, we categorize our columns into 3 types
For example, in our sample table, the studentID, firstName, lastName, gender, and subject columns are the dimensions, the score column is the metric, and timestampInEpoch is the time column.
Once you have identified the dimensions, metrics and time columns, create a schema for your data, using the reference below.
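A sketch of such a schema for the sample data (the file location and exact field specs are illustrative; adjust the data types to your data):

```bash
cat > pinot-quick-start/transcript-schema.json <<'EOF'
{
  "schemaName": "transcript",
  "dimensionFieldSpecs": [
    {"name": "studentID", "dataType": "INT"},
    {"name": "firstName", "dataType": "STRING"},
    {"name": "lastName",  "dataType": "STRING"},
    {"name": "gender",    "dataType": "STRING"},
    {"name": "subject",   "dataType": "STRING"}
  ],
  "metricFieldSpecs": [
    {"name": "score", "dataType": "FLOAT"}
  ],
  "dateTimeFieldSpecs": [
    {"name": "timestampInEpoch", "dataType": "LONG",
     "format": "1:MILLISECONDS:EPOCH", "granularity": "1:MILLISECONDS"}
  ]
}
EOF
```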
Here's the table config for the sample CSV file. You can use this as a reference to build your own table config. Simply edit the tableName and schemaName.
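A minimal sketch of an offline table config for that schema (the file name and index settings are illustrative):

```bash
cat > pinot-quick-start/transcript-table-offline.json <<'EOF'
{
  "tableName": "transcript",
  "tableType": "OFFLINE",
  "segmentsConfig": {
    "schemaName": "transcript",
    "timeColumnName": "timestampInEpoch",
    "replication": "1"
  },
  "tenants": {},
  "tableIndexConfig": {
    "loadMode": "MMAP"
  },
  "metadata": {}
}
EOF
```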
Check the directory structure so far
Upload the table config using the following command
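For example, assuming the file names used above and a controller running on localhost:9000:

```bash
bin/pinot-admin.sh AddTable \
  -schemaFile pinot-quick-start/transcript-schema.json \
  -tableConfigFile pinot-quick-start/transcript-table-offline.json \
  -exec
```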
To generate a segment, we first need to create a job spec yaml file. The job spec yaml file has all the information regarding the data format, input data location, and Pinot cluster coordinates. You can just copy over this job spec file. If you're using your own data, be sure to 1) replace transcript with your table name, and 2) set the right recordReaderSpec.
Use the following command to generate a segment and upload it
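For example, assuming the job spec was saved as batch-job-spec.yml:

```bash
bin/pinot-admin.sh LaunchDataIngestionJob \
  -jobSpecFile pinot-quick-start/batch-job-spec.yml
```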
Sample output
Now, it's time to ingest from a sample stream into Pinot. The rest of the instructions assume you're using the setup from the previous sections.
First, we need to set up a stream. Pinot has out-of-the-box realtime ingestion support for Kafka. Other streams can be plugged in; see the stream ingestion plugin documentation for more details.
Start Kafka
Start a Kafka cluster on port 9876, using the same Zookeeper from the quick-start examples.
Create a Kafka topic
Download the latest Kafka release and create the topic.
If you followed the batch quick start, you have already pushed a schema for your sample table. If not, head over to the schema section on that page to learn how to create a schema for your sample data.
If you followed the batch quick start, you learned how to push an offline table and schema. Similar to the offline table config, we will create a realtime table config for the sample. Here's the realtime table config for the transcript table. For a more detailed overview of tables, check out the table documentation.
As soon as data flows into the stream, the Pinot table will consume it and it will be ready for querying. Head over to the Query Console to check out the realtime data.
This document provides basic instructions to set up a Kubernetes cluster on Google Kubernetes Engine (GKE).
Follow the official installation instructions to install kubectl.
Follow the official installation instructions to install helm.
Follow the official installation instructions to install the Google Cloud SDK.
Once the cluster is ready, follow the Kubernetes quickstart to deploy your Pinot demo.
So far, we have set up our cluster, ran some queries, and explored the admin endpoints. Now, it's time to get our own data into Pinot. The rest of the instructions assume you're using the setup from the previous sections.
A schema is used to define the columns and data types of the Pinot table. A detailed overview can be found in the schema documentation.
A table config is used to define the config related to the Pinot table. A detailed overview can be found in the table documentation.
Check out the table config and schema via the controller REST API to make sure they were successfully uploaded.
A Pinot table's data is stored as Pinot segments. A detailed overview can be found in the segment documentation.
Check that your segment made it to the table using the controller REST API.
You're all set! You should see your table in the Data Explorer and be able to run queries against it now.
Dimensions - Typically used in filters and group by, for slicing and dicing into data
Metrics - Typically used in aggregations; represents the quantitative data
Time - Optional column; represents the timestamp associated with each row
While Pinot can work with segments of various sizes, for optimal use of Pinot you want your segments sized in the 100MB to 500MB (un-tarred/uncompressed) range. Note that having too many tiny segments (thousands or more) for a single table creates overhead in terms of metadata storage in Zookeeper as well as in the Pinot servers' heap, while having too few very large (multi-GB) segments reduces the parallelism of query execution, since on the server side the thread parallelism of query execution is at the segment level.
Yes. Each table can be independently configured to consume from any given Kafka topic, regardless of whether there are other tables that are also consuming from the same Kafka topic.
Pinot automatically detects new partitions in Kafka topics. It checks for new partitions whenever RealtimeSegmentValidationManager periodic job runs and starts consumers for new partitions.
You can configure the interval for this job using the controller.realtime.segment.validation.frequencyPeriod property in the controller configuration.
Setup partitioner in the Kafka producer: https://docs.confluent.io/current/clients/producer.html
The partitioning logic in the stream should match the partitioning config in Pinot. Kafka uses murmur2, and the equivalent in Pinot is the Murmur function.
Set the partitioning config as shown in the sketch below, using the same column that is used for partitioning in Kafka, and also set the routing config so that the broker prunes segments by partition.
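A sketch of the relevant table-config fragments (the column name and partition count are hypothetical; these keys go inside your table config):

```bash
# Print the table-config fragments for partition-aware routing.
cat <<'EOF'
{
  "tableIndexConfig": {
    "segmentPartitionConfig": {
      "columnPartitionMap": {
        "memberId": {"functionName": "Murmur", "numPartitions": 32}
      }
    }
  },
  "routing": {
    "segmentPrunerTypes": ["partition"]
  }
}
EOF
```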
More details about how partitioner works in Pinot here.
For JSON, you can use a hex encoded string to ingest BYTES.
See the json_format(field) function which can store a top level json field as a STRING in Pinot.
Then you can use these json functions during query time, to extract fields from the json string.
NOTE: This works well if some of your fields are nested JSON, but most of your fields are top-level JSON keys. If all of your fields are within a nested JSON key, you will have to store the entire payload as one column, which is not ideal.
Support for flattening during ingestion is on the roadmap: https://github.com/apache/pinot/issues/5264
To use explicit code points, you must double-quote (not single-quote) the string, and escape the code point via "\uHHHH", where HHHH is the four digit hex code for the character. See https://yaml.org/spec/spec.html#escaping/in%20double-quoted%20scalars/ for more details.
By default, Pinot limits the length of a String column to 512 bytes. If you want to overwrite this value, you can set the maxLength attribute in the schema as follows:
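For example (the column name and length are illustrative), the relevant fragment of a schema would look like this:

```bash
# Print a schema fragment with an increased maxLength on a STRING dimension.
cat <<'EOF'
{
  "dimensionFieldSpecs": [
    {"name": "textCol", "dataType": "STRING", "maxLength": 1000}
  ]
}
EOF
```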
Events are available to queries as soon as they are ingested. This is because events are instantly indexed in memory upon ingestion.
The ingestion of events into the real-time table is not transactional, so replicas of the open segment are not immediately consistent. Pinot trades consistency for availability upon network partitioning (CAP theorem) to provide ultra-low ingestion latencies at high throughput.
However, when the open segment is closed and its in-memory indexes are flushed to persistent storage, all its replicas are guaranteed to be consistent, with the commit protocol.
This typically happens if
The consumer is lagging a lot
The consumer was down (server down, cluster down), and the stream moved on, resulting in offset not found when consumer comes back up.
In case of Kafka, to recover, set property "auto.offset.reset":"earliest" in the streamConfigs section and reset the CONSUMING segment. See Realtime table configs for more details about the config.
You can also use the "Resume Consumption" endpoint with the "resumeFrom" parameter set to "smallest" (or "largest" if you want). Refer to Pause Stream Ingestion for more details.
Inverted indexes are set in the tableConfig's tableIndexConfig -> invertedIndexColumns list. For documentation on table config, see Table Config Reference. For an example showing how to configure an inverted index, see Inverted Index.
Applying inverted indexes to a table config will generate an inverted index for all new segments. To apply the inverted indexes to all existing segments, see How to apply an inverted index to existing segments?
Add the columns you wish to index to the tableIndexConfig -> invertedIndexColumns list. To update the table config use the Pinot Swagger API: http://localhost:9000/help#!/Table/updateTableConfig
Invoke the reload API: http://localhost:9000/help#!/Segment/reloadAllSegments
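For example, for a (hypothetical) offline table on a local controller:

```bash
curl -X POST "http://localhost:9000/segments/myTable_OFFLINE/reload"
```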
Once you've done that, you can check whether the index has been applied by querying the segment metadata API at http://localhost:9000/help#/Segment/getServerMetadata. Don't forget to include the names of the columns on which you have applied the index.
The output from this API should look something like the following:
Not all indexes can be retrospectively applied to existing segments.
If you want to add or change the sorted index column or adjust the dictionary encoding of the default forward index you will need to manually re-load any existing segments.
Star-tree indexes are configured in the table config under the tableIndexConfig -> starTreeIndexConfigs (list) and enableDefaultStarTree (boolean). Read more about how to configure star-tree indexes: https://docs.pinot.apache.org/basics/indexing/star-tree-index#index-generation
The new segments will have star-tree indexes generated after applying the star-tree index configs to the table config. Currently, Pinot does not support adding star-tree indexes to the existing segments.
Pinot does not require ordering of event time stamps. Out of order events are still consumed and indexed into the "currently consuming" segment. In a pathological case, if you have a 2 day old event come in "now", it will still be stored in the segment that is open for consumption "now". There is no strict time-based partitioning for segments, but star-indexes and hybrid tables will handle this as appropriate.
See Components > Broker for more details about how hybrid tables handle this. Specifically, the time boundary is computed as max(OfflineTime) - 1 unit of granularity. Pinot does store the min-max time for each segment and uses it for pruning segments, so segments with multiple time intervals may not be perfectly pruned.
When generating star-tree indexes, the time column will be part of the star-tree, so the tree can still be efficiently queried for segments with multiple time intervals.
Why not use strict max(OfflineTime) to determine the time boundary, and instead use an offset?
This lets you have an old event come in without building complex offline pipelines that perfectly partition your events by event timestamps. With this offset, even if your offline data pipeline produces segments with a maximum timestamp, Pinot will not use the offline dataset for that last chunk of segments. The expectation is that if you process the next time range of data offline, your data pipeline will include any late events.
It might seem odd that segments are not strictly time-partitioned, unlike similar systems such as Apache Druid. This allows real-time ingestion to consume out-of-order events. Even though segments are not strictly time-partitioned, Pinot will still index, prune, and query segments intelligently by time intervals for the performance of hybrid tables and time-filtered data.
When generating offline segments, it is recommended that segments are generated such that each segment contains only one time interval and is well partitioned by the time column.
FAQ for general questions around Pinot
When data is pushed into Pinot, Pinot makes a backup copy of the data and stores it on the configured deep storage (S3/GCP/ADLS/NFS/etc.). This copy is stored as tar.gz Pinot segments. Note that Pinot servers keep an (untarred) copy of the segments on their local disk as well, for performance reasons.
Pinot uses Apache Helix for cluster management, which in turn is built on top of Zookeeper. Helix uses Zookeeper to store the cluster state, including Ideal State, External View, Participants, etc. Besides that, Pinot uses Zookeeper to store other information such as Table configs, schema, Segment Metadata, etc.
Please check the JDK version you are using. The release 0.8.0 binary is built with JDK 11. You may be getting this error if you are using JDK 8. In that case, consider using JDK 11, or download the source code for the release and build it locally.
This page has a collection of frequently asked questions with answers from the community.
This is a list of frequent questions most often asked in our troubleshooting channel on Slack. Please feel free to contribute your questions and answers here and make a pull request.
Below is an example of AWS EKS.
In the K8s cluster, check the storage class: in AWS, it should be gp2.
Then update StorageClass to ensure:
Once StorageClass is updated, it should be like:
Once the storage class is updated, then we can update PVC for the server disk size.
Now we want to double the disk size for pinot-server-3.
Below is an example of current disks:
Below is the output of data-pinot-server-3
Now, let's change the PVC size to 2T by editing the server PVC.
Once updated, the spec's PVC size is updated to 2T, but the status's PVC size is still 1T.
Restart pinot-server-3 pod:
Recheck PVC size:
Pinot offers various ways to assist with troubleshooting and debugging problems that might happen. It is recommended to start with the debug API, which may quickly surface some of the commonly occurring problems. The debug API provides information such as tableSize, ingestion status, and any error messages related to state transitions on the server, among other things.
The table debug api can be invoked via the Swagger UI as follows:
It can also be invoked directly by accessing the URL. The API requires the tableName, and can optionally take tableType (offline|realtime) and verbosity level.
Pinot also provides a wide-variety of operational metrics that can be used for creating dashboards, alerting and monitoring. Also, all pinot components log debug information related to error conditions that can be used for troubleshooting.
Please use these steps:
If the query executes, look at the query result. Specifically, look at numEntriesScannedInFilter and numDocsScanned.
If numEntriesScannedInFilter is very high, consider adding indexes for the corresponding columns being used in the filter predicates. You should also think about partitioning the incoming data based on the dimension most heavily used in your filter queries.
If numDocsScanned is very high, that means the selectivity for the query is low and lots of documents need to be processed after the filtering. Consider refining the filter to increase the selectivity of the query.
If the query is not executing, you can extend the query timeout by appending a timeoutMs parameter to the query (e.g. select * from mytable limit 10 option(timeoutMs=60000)). Then you can repeat step 1.
You can also look at GC stats for the corresponding Pinot servers. If a particular server seems to be running full GC all the time, you can do a couple of things such as
Increase JVM heap (Xmx)
Consider using off-heap memory for segments
Decrease the total number of segments per server (by partitioning the data in a better way)
Pinot supports Apache Spark as a processor to create and push segment files to the database. The Pinot distribution is bundled with the Spark code to process your files, convert them to segments, and upload them to Pinot.
We support both Spark 2.X and 3.X
If you build Pinot from source, you should consider opting in to the build-shaded-jar profile with -Pbuild-shaded-jar. While Pinot does not bundle Spark into its jar, it does bundle certain Hadoop libraries.
You can check out the sample job spec here.
To run Spark ingestion, you need the following jars in your classpath:
pinot-batch-ingestion-spark plugin jar - available in the plugins-external directory in the package
pinot-all jar - available in the lib directory in the package
These jars can be specified using spark.driver.extraClassPath or any other option.
For loading any other plugins that you want to use, you can use -
The complete spark-submit command should look as follows
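A rough sketch for Spark 2.4 in local mode, assuming the ingestion command class and jar layout described above (verify the class name and jar paths against your Pinot version and distribution layout):

```bash
export PINOT_VERSION=0.10.0   # example version
export PINOT_ROOT_DIR=/path/to/apache-pinot-${PINOT_VERSION}-bin

spark-submit \
  --class org.apache.pinot.tools.admin.command.LaunchDataIngestionJobCommand \
  --master "local[2]" \
  --deploy-mode client \
  --conf "spark.driver.extraClassPath=${PINOT_ROOT_DIR}/plugins-external/pinot-batch-ingestion/pinot-batch-ingestion-spark-2.4/pinot-batch-ingestion-spark-2.4-${PINOT_VERSION}-shaded.jar:${PINOT_ROOT_DIR}/lib/pinot-all-${PINOT_VERSION}-jar-with-dependencies.jar" \
  "${PINOT_ROOT_DIR}/lib/pinot-all-${PINOT_VERSION}-jar-with-dependencies.jar" \
  -jobSpecFile /path/to/spark-job-spec.yaml
```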
Please ensure the environment variables PINOT_ROOT_DIR and PINOT_VERSION are set properly.
Note: You should change the master to yarn and deploy-mode to cluster for production environments.
If you want to run the spark job in cluster mode on YARN/EMR cluster, the following needs to be done -
Build Pinot from source with option -DuseProvidedHadoop
Copy Pinot binaries to S3, HDFS or any other distributed storage that is accessible from all nodes.
Copy the ingestion spec YAML file to S3, HDFS, or any other distributed storage. Mention this path as part of the --files argument in the command.
Add --jars options that contain the s3/hdfs paths to all the required plugin and pinot-all jars.
Point classPath to the Spark working directory. Generally, just specifying the jar names without any paths works. The same should be done for the main jar as well as the spec YAML file.
Example
For Spark 3.x, replace pinot-batch-ingestion-spark-2.4 with pinot-batch-ingestion-spark-3.2 in all places in the commands.
Also, ensure the classpath in the ingestion spec is changed from org.apache.pinot.plugin.ingestion.batch.spark. to org.apache.pinot.plugin.ingestion.batch.spark3.
Q - I am getting the following exception - Class has been compiled by a more recent version of the Java Runtime (class file version 55.0), this version of the Java Runtime only recognizes class file versions up to 52.0
Q - I am not able to find the pinot-batch-ingestion-spark jar.
For Pinot versions prior to 0.10.0, the Spark plugin is located in the plugins dir of the binary distribution. For 0.10.0 and later, it is located in the plugins-external dir.
Q - Spark is not able to find the jars, leading to java.nio.file.NoSuchFileException.
This means the classpath for the Spark job has not been configured properly. If you are running Spark in a distributed environment such as YARN or k8s, make sure both spark.driver.classpath and spark.executor.classpath are set. Also, the jars in driver.classpath should be added to the --jars argument in spark-submit so that Spark can distribute those jars to all the nodes in your cluster. You also need to provide the appropriate scheme with the file path when running the jar. In this doc we have used local://, but it can be different depending on your cluster setup.
Q - Spark job failing while pushing the segments.
This can be because of a misconfigured controllerURI in the job spec yaml file. If the controllerURI is correct, make sure it is accessible from all the nodes of your YARN or k8s cluster.
Q - My data gets overwritten during ingestion.
Q - I am getting java.lang.RuntimeException: java.io.IOException: Failed to create directory: pinot-plugins-dir-0/plugins/*
Removing -Dplugins.dir=${PINOT_DISTRIBUTION_DIR}/plugins from spark.driver.extraJavaOptions should fix this. As long as plugins are mentioned in the classpath and the --jars argument, it should not be an issue.
Q - Getting a Class not found: exception
Check whether the extraClassPath arguments contain all the plugin jars for both driver and executors, and whether all the plugin jars are mentioned in the --jars argument. If both of these are correct, check whether the extraClassPath contains local filesystem classpaths and not s3, hdfs, or other distributed file system classpaths.
You can follow the steps above to build the Pinot distribution from source. The resulting JAR file can be found in pinot/target/pinot-all-${PINOT_VERSION}-jar-with-dependencies.jar. Next, you need to change the execution config in the job spec accordingly.
We have stopped including the spark-core dependency in our jars post the 0.10.0 release. Users can try 0.11.0-SNAPSHOT and later versions of pinot-batch-ingestion-spark in case of any runtime issues. You can either build from source or download the latest master build jars.
Since the 0.8.0 release, Pinot binaries are compiled with JDK 11. If you are using Spark along with Hadoop 2.7+, you need to use the Java 8 version of Pinot, which currently has to be built from source.
Set segmentPushType to APPEND in the tableConfig.
If it is already set to APPEND, this is likely due to a missing timeColumnName in your table config. If you can't provide a time column, use the segment name generation configs in the ingestion spec; generally, using the inputFile segment name generator should fix your issue.