Frequently Asked Questions (FAQs)

This page lists frequently asked questions with answers from the community.

This is a list of questions frequently asked in our troubleshooting channel on Slack. To contribute additional questions and answers, make a pull request.

  • General

  • Pinot On Kubernetes FAQ

  • Ingestion FAQ

  • Query FAQ

  • Operations FAQ

Getting Started

This section contains quick start guides to help you get up and running with Pinot.

Running Pinot

To simplify the getting started experience, Pinot ships with quick start guides that launch Pinot components in a single process and import pre-built datasets.

For a full list of these guides, see Quick Start Examples.

  • Running Pinot locally

  • Running Pinot in Docker

  • Running in Kubernetes

Deploy to a public cloud

  • Running on Azure

  • Running on GCP

  • Running on AWS

Data import examples

Getting data into Pinot is easy. Take a look at these two quick start guides, which will help you get up and running with sample data for offline and real-time tables.

  • Batch import example

  • Stream ingestion example

Running Pinot locally

This quick start guide will help you bootstrap a Pinot standalone instance on your local machine.

In this guide, you'll learn how to download and install Apache Pinot as a standalone instance.

  • Download Apache Pinot

  • Set up a cluster

Download Apache Pinot

First, download the Pinot distribution for this tutorial. You can either download a packaged release or build a distribution from the source code.

Prerequisites

  • Install JDK11 or higher (JDK16 is not yet supported).

  • For JDK 8 support, use Pinot 0.7.1 or compile from the source code.

Note that some installations of the JDK do not contain the JNI bindings necessary to run all tests. If you see an error like java.lang.UnsatisfiedLinkError while running tests, you might need to change your JDK.

If using Homebrew, install AdoptOpenJDK 11 using brew install --cask adoptopenjdk11.

Support for M1 and M2 Mac systems

Currently, Apache Pinot doesn't provide official binaries for M1 or M2 Macs. For instructions, see M1 and M2 Mac Support below.

Download the distribution or build from source by selecting one of the following tabs:

Download the latest binary release from Apache Pinot, or use this command:

Extract the TAR file:

Navigate to the directory containing the launcher scripts:

You can also find older versions of Apache Pinot at https://archive.apache.org/dist/pinot/. For example, to download Pinot 0.10.0, run the following command:

Follow these steps to check out code from GitHub and build Pinot locally:

M1 and M2 Mac Support

Currently, Apache Pinot doesn't provide official binaries for M1 or M2 Mac systems. Follow the instructions below to run on an M1 or M2 Mac:

  1. Add the following to your ~/.m2/settings.xml:

  2. Install Rosetta:

Set up a cluster

Now that we've downloaded Pinot, it's time to set up a cluster. There are two ways to do this: through quick start or through setting up a cluster manually.

Quick start

Pinot comes with quick start commands that launch instances of Pinot components in the same process and import pre-built datasets.

For example, the following quick start command launches Pinot with a baseball dataset pre-loaded:

For a list of all the available quick start commands, see the Quick Start Examples.

Manual cluster

If you want to play with bigger datasets (more than a few megabytes), you can launch each component individually.

The video below is a step-by-step walkthrough for launching the individual components of Pinot and scaling them to multiple instances.

You can find the commands that are shown in this video in this Github repository.

The examples below assume that you are using Java 8.

If you are using Java 11+, remove the GC settings inside JAVA_OPTS. For example, instead of this:

Use the following:

Start Zookeeper

You can use Zooinspector to browse the Zookeeper instance.

Start Pinot Controller

Start Pinot Broker

Start Pinot Server

Start Kafka

Once your cluster is up and running, you can head over to Exploring Pinot to learn how to run queries against the data.
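You can also sanity-check the cluster from the command line. A hedged sketch, assuming the default broker port 8099 and that a quick start dataset such as baseballStats has been loaded:

```shell
# Send a SQL query to the broker's standard query endpoint
curl -H "Content-Type: application/json" -X POST \
  -d '{"sql":"select count(*) from baseballStats"}' \
  http://localhost:8099/query/sql
```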

Start a Pinot component in debug mode with IntelliJ

Set break points and inspect variables by starting a Pinot component with debug mode in IntelliJ.

The following example demonstrates server debugging:

  1. First, start Zookeeper, the controller, and the broker using the steps described above.

  2. Then, use the following run configuration under $PROJECT_DIR$/.run to start the server, replacing the metrics-core version and cluster name as needed.

Pinot On Kubernetes FAQ

This page has a collection of frequently asked questions about Pinot on Kubernetes with answers from the community.

This is a list of questions frequently asked in our troubleshooting channel on Slack. To contribute additional questions and answers, make a pull request.

How to increase server disk size on AWS

General

This page has a collection of frequently asked questions of a general nature with answers from the community.

This is a list of questions frequently asked in our troubleshooting channel on Slack. To contribute additional questions and answers, make a pull request.

How does Apache Pinot use deep storage?

When data is pushed to Apache Pinot, Pinot makes a backup copy of the data and stores it on the configured deep storage (S3/GCS/ADLS/NFS/etc.). This copy is stored as tar.gz Pinot segments. Note that Pinot servers also keep an untarred copy of the segments on their local disk, for performance reasons.

Running on public clouds

This page links to multiple quick start guides for deploying Pinot to different public cloud providers. These quickstart guides show you how to run an Apache Pinot cluster using Kubernetes on each provider.

  • Running on Azure

  • Running on GCP

  • Running on AWS

How does Pinot use Zookeeper?

Pinot uses Apache Helix for cluster management, which in turn is built on top of Zookeeper. Helix uses Zookeeper to store the cluster state, including Ideal State, External View, Participants, and so on. Pinot also uses Zookeeper to store information such as Table configurations, schemas, Segment Metadata, and so on.

Why am I getting a "Could not find or load class" error when running the Quickstart using the 0.8.0 release?

Check the JDK version you are using. You may get this error if you are using an older JDK than the one the current Pinot binary release was built with. If so, either switch to the same JDK release that Pinot was built with, or download the source code for the Pinot release and build it locally.

Prerequisites

Install Apache Maven 3.6 or higher.

For M1 and M2 Macs, follow the steps in M1 and M2 Mac Support first.

Check out Pinot:

Build Pinot:


If you're building with JDK 8, add Maven option -Djdk.version=8.

Navigate to the directory containing the setup scripts. Note that Pinot scripts are located under pinot-distribution/target, not the target directory under root.

tar -zxvf apache-pinot-$PINOT_VERSION-bin.tar.gz
cd apache-pinot-$PINOT_VERSION-bin
OLDER_VERSION="0.10.0"
wget https://archive.apache.org/dist/pinot/apache-pinot-$OLDER_VERSION/apache-pinot-$OLDER_VERSION-bin.tar.gz
PINOT_VERSION=0.12.0 #set to the Pinot version you decide to use

wget https://downloads.apache.org/pinot/apache-pinot-$PINOT_VERSION/apache-pinot-$PINOT_VERSION-bin.tar.gz
git clone https://github.com/apache/pinot.git
cd pinot
mvn install package -DskipTests -Pbin-dist
cd build

The following is an example using Amazon Elastic Kubernetes Service (Amazon EKS).

1. Update Storage Class

In the Kubernetes (k8s) cluster, check the storage class: in Amazon EKS, it should be gp2.

Then update the StorageClass to ensure allowVolumeExpansion: true is set.
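A sketch of inspecting and patching the StorageClass with kubectl, assuming the default gp2 class on Amazon EKS:

```shell
# Check the current StorageClass definition
kubectl get storageclass gp2 -o yaml

# Allow volumes created from this class to be expanded
kubectl patch storageclass gp2 -p '{"allowVolumeExpansion": true}'
```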

Once the StorageClass is updated, its definition should include allowVolumeExpansion: true.

2. Update PVC

Once the storage class is updated, update the PersistentVolumeClaim (PVC) to increase the server disk size.

Now we want to double the disk size for pinot-server-3.

The following is an example of current disks:

The following is the output of data-pinot-server-3:

PVC data-pinot-server-3

Now, let's change the PVC size to 2T by editing the server PVC.

Once updated, the specification's PVC size is updated to 2T, but the status's PVC size is still 1T.
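Besides kubectl edit, the same change can be made non-interactively; a sketch assuming the pinot namespace used in this example:

```shell
# Request 2T of storage on the PVC spec; the status catches up after restart
kubectl patch pvc data-pinot-server-3 -n pinot \
  -p '{"spec":{"resources":{"requests":{"storage":"2T"}}}}'
```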

3. Restart the pod to apply the change

Restart the pinot-server-3 pod:
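Assuming the server runs as a StatefulSet (so the pod is recreated automatically), a sketch:

```shell
# Delete the pod; the StatefulSet recreates it against the resized volume
kubectl delete pod pinot-server-3 -n pinot
```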

Recheck the PVC size:
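For example:

```shell
# The CAPACITY column should now show the expanded size
kubectl get pvc data-pinot-server-3 -n pinot
```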


Running on GCP

This quickstart guide helps you get started running Pinot on Google Cloud Platform (GCP).

In this quickstart guide, you will set up a Kubernetes cluster on Google Kubernetes Engine (GKE).

1. Tooling Installation

1.1 Install Kubectl

Follow this link (https://kubernetes.io/docs/tasks/tools/install-kubectl) to install kubectl.

For Mac users

Check kubectl version after installation.

Quickstart scripts are tested under kubectl client version v1.16.3 and server version v1.13.12.

1.2 Install Helm

Follow this link (https://helm.sh/docs/using_helm/#installing-helm) to install Helm.

For Mac users

Check helm version after installation.

This quickstart provides Helm support for Helm v3.0.0 and v2.12.1. Choose the script based on your Helm version.

1.3 Install Google Cloud SDK

To install Google Cloud SDK, see Install the gcloud CLI.

1.3.1 For Mac users

  • Install Google Cloud SDK

Restart your shell

2. (Optional) Initialize Google Cloud Environment

3. (Optional) Create a Kubernetes cluster (GKE) in Google Cloud

This script will create a 3-node cluster named pinot-quickstart in us-west1-b with n1-standard-2 machines for demo purposes.

Modify the parameters in the following example command with your gcloud details:

Use the following command to monitor cluster status:

Once the cluster is in RUNNING status, it's ready to be used.

4. Connect to an existing cluster

Run the following command to get the credential for the cluster pinot-quickstart that you just created:

To verify the connection, run the following:

5. Pinot quickstart

Follow this Kubernetes quickstart to deploy your Pinot demo.

6. Delete a Kubernetes Cluster

Running on Azure

This quickstart guide helps you get started running Pinot on Microsoft Azure.

In this quickstart guide, you will set up a Kubernetes cluster on Azure Kubernetes Service (AKS).

1. Tooling Installation


<settings>
  <activeProfiles>
    <activeProfile>
      apple-silicon
    </activeProfile>
  </activeProfiles>
  <profiles>
    <profile>
      <id>apple-silicon</id>
      <properties>
        <os.detected.classifier>osx-x86_64</os.detected.classifier>
      </properties>
    </profile>
  </profiles>
</settings>  
softwareupdate --install-rosetta
./bin/pinot-admin.sh QuickStart -type batch
export JAVA_OPTS="-Xms4G -Xmx8G -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -Xloggc:gc-pinot-controller.log"
export JAVA_OPTS="-Xms4G -Xmx8G"
./bin/pinot-admin.sh StartZookeeper \
  -zkPort 2191
export JAVA_OPTS="-Xms4G -Xmx8G -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -Xloggc:gc-pinot-controller.log"
./bin/pinot-admin.sh StartController \
    -zkAddress localhost:2191 \
    -controllerPort 9000
export JAVA_OPTS="-Xms4G -Xmx4G -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -Xloggc:gc-pinot-broker.log"
./bin/pinot-admin.sh StartBroker \
    -zkAddress localhost:2191
export JAVA_OPTS="-Xms4G -Xmx16G -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -Xloggc:gc-pinot-server.log"
./bin/pinot-admin.sh StartServer \
    -zkAddress localhost:2191
./bin/pinot-admin.sh StartKafka \
  -zkAddress=localhost:2191/kafka \
  -port 19092
<component name="ProjectRunConfigurationManager">
  <configuration default="false" name="HelixServerStarter" type="Application" factoryName="Application" nameIsGenerated="true">
    <classpathModifications>
      <entry path="$PROJECT_DIR$/pinot-plugins/pinot-metrics/pinot-yammer/target/classes" />
      <entry path="$MAVEN_REPOSITORY$/com/yammer/metrics/metrics-core/2.2.0/metrics-core-2.2.0.jar" />
    </classpathModifications>
    <option name="MAIN_CLASS_NAME" value="org.apache.pinot.server.starter.helix.HelixServerStarter" />
    <module name="pinot-server" />
    <extension name="coverage">
      <pattern>
        <option name="PATTERN" value="org.apache.pinot.server.starter.helix.*" />
        <option name="ENABLED" value="true" />
      </pattern>
    </extension>
    <method v="2">
      <option name="Make" enabled="true" />
    </method>
  </configuration>
</component>
allowVolumeExpansion: true
kubectl edit pvc data-pinot-server-3 -n pinot
1.1 Install Kubectl

Follow this link (https://kubernetes.io/docs/tasks/tools/install-kubectl) to install kubectl.

For Mac users

Check kubectl version after installation.

Quickstart scripts are tested under kubectl client version v1.16.3 and server version v1.13.12.

1.2 Install Helm

To install Helm, see Installing Helm.

For Mac users

Check helm version after installation.

This quickstart provides Helm support for Helm v3.0.0 and v2.12.1. Pick the script based on your Helm version.

1.3 Install Azure CLI

Follow this link (https://docs.microsoft.com/en-us/cli/azure/install-azure-cli?view=azure-cli-latest) to install the Azure CLI.

For Mac users

2. (Optional) Log in to your Azure account

This script will open your default browser to sign in to your Azure account.

3. (Optional) Create a Resource Group

Use the following script to create a resource group in the location eastus.

4. (Optional) Create a Kubernetes cluster (AKS) in Azure

This script will create a 3-node cluster named pinot-quickstart for demo purposes.

Modify the parameters in the following example command with your resource group and cluster details:

Once the command succeeds, the cluster is ready to be used.

5. Connect to an existing cluster

Run the following command to get the credential for the cluster pinot-quickstart that you just created:

To verify the connection, run the following:

6. Pinot quickstart

Follow this Kubernetes quickstart to deploy your Pinot demo.

7. Delete a Kubernetes Cluster

brew install kubernetes-cli
kubectl version
brew install kubernetes-helm
helm version
curl https://sdk.cloud.google.com | bash
exec -l $SHELL
gcloud init
GCLOUD_PROJECT=[your gcloud project name]
GCLOUD_ZONE=us-west1-b
GCLOUD_CLUSTER=pinot-quickstart
GCLOUD_MACHINE_TYPE=n1-standard-2
GCLOUD_NUM_NODES=3
gcloud container clusters create ${GCLOUD_CLUSTER} \
  --num-nodes=${GCLOUD_NUM_NODES} \
  --machine-type=${GCLOUD_MACHINE_TYPE} \
  --zone=${GCLOUD_ZONE} \
  --project=${GCLOUD_PROJECT}
gcloud compute instances list
GCLOUD_PROJECT=[your gcloud project name]
GCLOUD_ZONE=us-west1-b
GCLOUD_CLUSTER=pinot-quickstart
gcloud container clusters get-credentials ${GCLOUD_CLUSTER} --zone ${GCLOUD_ZONE} --project ${GCLOUD_PROJECT}
kubectl get nodes
GCLOUD_ZONE=us-west1-b
gcloud container clusters delete pinot-quickstart --zone=${GCLOUD_ZONE}
brew install kubernetes-cli
kubectl version
brew install kubernetes-helm
helm version
brew update && brew install azure-cli
az login
AKS_RESOURCE_GROUP=pinot-demo
AKS_RESOURCE_GROUP_LOCATION=eastus
az group create --name ${AKS_RESOURCE_GROUP} \
                --location ${AKS_RESOURCE_GROUP_LOCATION}
AKS_RESOURCE_GROUP=pinot-demo
AKS_CLUSTER_NAME=pinot-quickstart
az aks create --resource-group ${AKS_RESOURCE_GROUP} \
              --name ${AKS_CLUSTER_NAME} \
              --node-count 3
AKS_RESOURCE_GROUP=pinot-demo
AKS_CLUSTER_NAME=pinot-quickstart
az aks get-credentials --resource-group ${AKS_RESOURCE_GROUP} \
                       --name ${AKS_CLUSTER_NAME}
kubectl get nodes
AKS_RESOURCE_GROUP=pinot-demo
AKS_CLUSTER_NAME=pinot-quickstart
az aks delete --resource-group ${AKS_RESOURCE_GROUP} \
              --name ${AKS_CLUSTER_NAME}

HDFS as Deep Storage

This guide shows how to set up HDFS as deep storage for Pinot segments.

To use HDFS as deep storage you need to include HDFS dependency jars and plugins.

Server Setup

Configuration

Executable

Controller Setup

Configuration

Executable

Broker Setup

Configuration

Executable

Troubleshooting

If you receive an error that says No FileSystem for scheme "hdfs", the problem is likely a class loading issue.

To fix it, try adding the following property to core-site.xml:

<property>
  <name>fs.hdfs.impl</name>
  <value>org.apache.hadoop.hdfs.DistributedFileSystem</value>
</property>

Then add /opt/pinot/lib/hadoop-common-<release-version>.jar to the classpath.

Troubleshooting Pinot

Find debug information in Pinot

Pinot offers various ways to assist with troubleshooting and debugging problems that might happen.

Start with the debug API, which surfaces many of the commonly occurring problems. The debug API provides information such as tableSize, ingestion status, and error messages related to state transitions on the server.

The table debug API can be invoked via the Swagger UI.

It can also be invoked directly by accessing the URL, as follows. The API requires the tableName, and can optionally take tableType (offline|realtime) and a verbosity level.
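For example, a sketch assuming a controller on localhost:9000 and the quick start airlineStats table:

```shell
curl -X GET "http://localhost:9000/debug/tables/airlineStats?verbosity=0" \
  -H "accept: application/json"
```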

Pinot also provides a variety of operational metrics that can be used for creating dashboards, alerting, and monitoring.

Finally, all Pinot components log debug information related to error conditions.

Debug a slow query or a query which keeps timing out

Use the following steps:

  1. If the query executes, look at the query result. Specifically look at numEntriesScannedInFilter and numDocsScanned.

    1. If numEntriesScannedInFilter is very high, consider adding indexes for the corresponding columns being used in the filter predicates. You should also think about partitioning the incoming data based on the dimension most heavily used in your filter queries.

pinot.server.instance.enable.split.commit=true
pinot.server.storage.factory.class.hdfs=org.apache.pinot.plugin.filesystem.HadoopPinotFS
pinot.server.storage.factory.hdfs.hadoop.conf.path=/path/to/hadoop/conf/directory/
pinot.server.segment.fetcher.protocols=file,http,hdfs
pinot.server.segment.fetcher.hdfs.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher
pinot.server.segment.fetcher.hdfs.hadoop.kerberos.principle=<your kerberos principal>
pinot.server.segment.fetcher.hdfs.hadoop.kerberos.keytab=<your kerberos keytab>
pinot.set.instance.id.to.hostname=true
pinot.server.instance.dataDir=/path/in/local/filesystem/for/pinot/data/server/index
pinot.server.instance.segmentTarDir=/path/in/local/filesystem/for/pinot/data/server/segment
pinot.server.grpc.enable=true
pinot.server.grpc.port=8090
export HADOOP_HOME=/path/to/hadoop/home
export HADOOP_VERSION=2.7.1
export HADOOP_GUAVA_VERSION=11.0.2
export HADOOP_GSON_VERSION=2.2.4
export GC_LOG_LOCATION=/path/to/gc/log/file
export PINOT_VERSION=0.10.0
export PINOT_DISTRIBUTION_DIR=/path/to/apache-pinot-${PINOT_VERSION}-bin/
export SERVER_CONF_DIR=/path/to/pinot/conf/dir/
export ZOOKEEPER_ADDRESS=localhost:2181


export CLASSPATH_PREFIX="${HADOOP_HOME}/share/hadoop/hdfs/hadoop-hdfs-${HADOOP_VERSION}.jar:${HADOOP_HOME}/share/hadoop/common/lib/hadoop-annotations-${HADOOP_VERSION}.jar:${HADOOP_HOME}/share/hadoop/common/lib/hadoop-auth-${HADOOP_VERSION}.jar:${HADOOP_HOME}/share/hadoop/common/hadoop-common-${HADOOP_VERSION}.jar:${HADOOP_HOME}/share/hadoop/common/lib/guava-${HADOOP_GUAVA_VERSION}.jar:${HADOOP_HOME}/share/hadoop/common/lib/gson-${HADOOP_GSON_VERSION}.jar"
export JAVA_OPTS="-Xms4G -Xmx16G -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -Xloggc:${GC_LOG_LOCATION}/gc-pinot-server.log"
${PINOT_DISTRIBUTION_DIR}/bin/start-server.sh  -zkAddress ${ZOOKEEPER_ADDRESS} -configFileName ${SERVER_CONF_DIR}/server.conf
controller.data.dir=hdfs://path/in/hdfs/for/controller/segment
controller.local.temp.dir=/tmp/pinot/
controller.zk.str=<ZOOKEEPER_HOST:ZOOKEEPER_PORT>
controller.enable.split.commit=true
controller.access.protocols.http.port=9000
controller.helix.cluster.name=PinotCluster
pinot.controller.storage.factory.class.hdfs=org.apache.pinot.plugin.filesystem.HadoopPinotFS
pinot.controller.storage.factory.hdfs.hadoop.conf.path=/path/to/hadoop/conf/directory/
pinot.controller.segment.fetcher.protocols=file,http,hdfs
pinot.controller.segment.fetcher.hdfs.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher
pinot.controller.segment.fetcher.hdfs.hadoop.kerberos.principle=<your kerberos principal>
pinot.controller.segment.fetcher.hdfs.hadoop.kerberos.keytab=<your kerberos keytab>
controller.vip.port=9000
controller.port=9000
pinot.set.instance.id.to.hostname=true
pinot.server.grpc.enable=true
export HADOOP_HOME=/path/to/hadoop/home
export HADOOP_VERSION=2.7.1
export HADOOP_GUAVA_VERSION=11.0.2
export HADOOP_GSON_VERSION=2.2.4
export GC_LOG_LOCATION=/path/to/gc/log/file
export PINOT_VERSION=0.10.0
export PINOT_DISTRIBUTION_DIR=/path/to/apache-pinot-${PINOT_VERSION}-bin/
export SERVER_CONF_DIR=/path/to/pinot/conf/dir/
export ZOOKEEPER_ADDRESS=localhost:2181


export CLASSPATH_PREFIX="${HADOOP_HOME}/share/hadoop/hdfs/hadoop-hdfs-${HADOOP_VERSION}.jar:${HADOOP_HOME}/share/hadoop/common/lib/hadoop-annotations-${HADOOP_VERSION}.jar:${HADOOP_HOME}/share/hadoop/common/lib/hadoop-auth-${HADOOP_VERSION}.jar:${HADOOP_HOME}/share/hadoop/common/hadoop-common-${HADOOP_VERSION}.jar:${HADOOP_HOME}/share/hadoop/common/lib/guava-${HADOOP_GUAVA_VERSION}.jar:${HADOOP_HOME}/share/hadoop/common/lib/gson-${HADOOP_GSON_VERSION}.jar"
export JAVA_OPTS="-Xms8G -Xmx12G -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -Xloggc:${GC_LOG_LOCATION}/gc-pinot-controller.log"
${PINOT_DISTRIBUTION_DIR}/bin/start-controller.sh -configFileName ${SERVER_CONF_DIR}/controller.conf
pinot.set.instance.id.to.hostname=true
pinot.server.grpc.enable=true
export HADOOP_HOME=/path/to/hadoop/home
export HADOOP_VERSION=2.7.1
export HADOOP_GUAVA_VERSION=11.0.2
export HADOOP_GSON_VERSION=2.2.4
export GC_LOG_LOCATION=/path/to/gc/log/file
export PINOT_VERSION=0.10.0
export PINOT_DISTRIBUTION_DIR=/path/to/apache-pinot-${PINOT_VERSION}-bin/
export SERVER_CONF_DIR=/path/to/pinot/conf/dir/
export ZOOKEEPER_ADDRESS=localhost:2181


export CLASSPATH_PREFIX="${HADOOP_HOME}/share/hadoop/hdfs/hadoop-hdfs-${HADOOP_VERSION}.jar:${HADOOP_HOME}/share/hadoop/common/lib/hadoop-annotations-${HADOOP_VERSION}.jar:${HADOOP_HOME}/share/hadoop/common/lib/hadoop-auth-${HADOOP_VERSION}.jar:${HADOOP_HOME}/share/hadoop/common/hadoop-common-${HADOOP_VERSION}.jar:${HADOOP_HOME}/share/hadoop/common/lib/guava-${HADOOP_GUAVA_VERSION}.jar:${HADOOP_HOME}/share/hadoop/common/lib/gson-${HADOOP_GSON_VERSION}.jar"
export JAVA_OPTS="-Xms4G -Xmx4G -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -Xloggc:${GC_LOG_LOCATION}/gc-pinot-broker.log"
${PINOT_DISTRIBUTION_DIR}/bin/start-broker.sh -zkAddress ${ZOOKEEPER_ADDRESS} -configFileName  ${SERVER_CONF_DIR}/broker.conf

    2. If numDocsScanned is very high, that means the selectivity for the query is low and lots of documents need to be processed after the filtering. Consider refining the filter to increase the selectivity of the query.

  • If the query is not executing, you can extend the query timeout by appending a timeoutMs parameter to the query, for example, select * from mytable limit 10 option(timeoutMs=60000). Then repeat step 1, as needed.

  • Look at garbage collection (GC) stats for the corresponding Pinot servers. If a particular server seems to be running full GC all the time, you can do one of the following:

    1. Increase Java Virtual Machine (JVM) heap (java -Xmx<size>).

    2. Consider using off-heap memory for segments.

    3. Decrease the total number of segments per server (by partitioning the data in a more efficient way).


    Running Pinot in Docker

    This guide will show you how to run a Pinot cluster using Docker.

    Get started setting up a Pinot cluster with Docker using the guide below.

    Prerequisites:

    • Install Docker

    • Configure Docker memory with the following minimum resources:

      • CPUs: 8

      • Memory: 16.00 GB

      • Swap: 4 GB

      • Disk image size: 60 GB

    The latest Pinot Docker image is published at apachepinot/pinot:latest. View a list of all published tags on Docker Hub.

    Pull the latest Docker image onto your machine by running the following command:

    To pull a specific version, modify the command like below:

    Set up a cluster

    Once you've downloaded the Pinot Docker image, it's time to set up a cluster. There are two ways to do this.

    Quick start

    Pinot comes with quick start commands that launch instances of Pinot components in the same process and import pre-built datasets.

    For example, the following quick start command launches Pinot with a baseball dataset pre-loaded:

    For a list of all available quick start commands, see Quick Start Examples.

    Manual cluster

    The quick start scripts launch Pinot with minimal resources. If you want to play with bigger datasets (more than a few MB), you can launch each of the Pinot components individually.

    Note that these are sample configurations to be used as references. You will likely want to customize them for production use.

    Docker

    Create a Network

    Create an isolated bridge network in Docker:

    Start Zookeeper

    Start Zookeeper in daemon mode. This is a single-node Zookeeper setup. Zookeeper is the central metadata store for Pinot and should be set up with replication for production use. For more information, see Running Replicated Zookeeper.

    Start Pinot Controller

    Start the Pinot Controller in daemon mode and connect it to Zookeeper.

    The command below expects a 4GB memory container. Tune -Xms and -Xmx if your machine doesn't have enough resources.

    Start Pinot Broker

    Start the Pinot Broker in daemon mode and connect it to Zookeeper.

    The command below expects a 4GB memory container. Tune -Xms and -Xmx if your machine doesn't have enough resources.

    Start Pinot Server

    Start the Pinot Server in daemon mode and connect it to Zookeeper.

    The command below expects a 16GB memory container. Tune -Xms and -Xmx if your machine doesn't have enough resources.

    Start Kafka

    Optionally, you can also start Kafka for setting up real-time streams. This brings up the Kafka broker on port 9092.

    Now all Pinot-related components are started as an empty cluster.

    Run the below command to check container status:

    Sample Console Output

    Docker Compose

    Create a file called docker-compose.yml that contains the following:
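A minimal sketch of such a file, assuming the images, ports, and Zookeeper address used in the docker run commands above (the service names and structure here are illustrative, not canonical):

```yaml
version: "3.7"
services:
  pinot-zookeeper:
    image: zookeeper:3.5.6
    ports:
      - "2181:2181"
  pinot-controller:
    image: apachepinot/pinot:latest
    command: "StartController -zkAddress pinot-zookeeper:2181"
    ports:
      - "9000:9000"
    depends_on:
      - pinot-zookeeper
  pinot-broker:
    image: apachepinot/pinot:latest
    command: "StartBroker -zkAddress pinot-zookeeper:2181"
    ports:
      - "8099:8099"
    depends_on:
      - pinot-controller
  pinot-server:
    image: apachepinot/pinot:latest
    command: "StartServer -zkAddress pinot-zookeeper:2181"
    depends_on:
      - pinot-broker
```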

    Run docker-compose up -d to launch all the components:

    Run the below command to check the container status:

    Sample Console Output

    Once your cluster is up and running, see Exploring Pinot to learn how to run queries against the data.

    If you have minikube or Docker Kubernetes installed, you can also try running the Kubernetes quick start.

    Stream ingestion example

    The Docker instructions on this page are still WIP

    This example assumes you have already set up your cluster.

    Data Stream

    First, we need to set up a stream. Pinot has out-of-the-box real-time ingestion support for Kafka. Other streams can also be plugged in.

    Let's set up a demo Kafka cluster locally and create a sample topic, transcript-topic:
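A hedged sketch of topic creation against the bitnami/kafka container started in the Docker guide above; the kafka-topics.sh path and the --bootstrap-server vs --zookeeper flag vary by image and Kafka version:

```shell
docker exec -t kafka \
  /opt/bitnami/kafka/bin/kafka-topics.sh \
  --bootstrap-server localhost:9092 \
  --create --topic transcript-topic \
  --partitions 1 --replication-factor 1
```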

    Running on AWS

    This quickstart guide helps you get started running Pinot on Amazon Web Services (AWS).

    In this quickstart guide, you will set up a Kubernetes cluster on Amazon Elastic Kubernetes Service (Amazon EKS).

    1. Tooling Installation

    1.1 Install Kubectl

    To install kubectl, see Install kubectl.

    For Mac users

    Check kubectl version after installation.

    Quickstart scripts are tested under kubectl client version v1.16.3 and server version v1.13.12.

    1.2 Install Helm

    Follow this link (https://helm.sh/docs/using_helm/#installing-helm) to install Helm.

    For Mac users

    Check helm version after installation.

    This quickstart provides Helm support for Helm v3.0.0 and v2.12.1. Pick the script based on your Helm version.

    1.3 Install AWS CLI

    Follow this link (https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-install.html#install-tool-bundled) to install the AWS CLI.

    For Mac users
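A sketch using Homebrew:

```shell
brew install awscli
aws --version
```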

    1.4 Install Eksctl

    Follow this link (https://docs.aws.amazon.com/eks/latest/userguide/eksctl.html#installing-eksctl) to install eksctl.

    For Mac users
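A sketch using the Homebrew tap documented by the eksctl project:

```shell
brew tap weaveworks/tap
brew install weaveworks/tap/eksctl
eksctl version
```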

    2. (Optional) Log in to your AWS account

    For first-time AWS users, register your account at https://aws.amazon.com/.

    Once you have created the account, go to AWS Identity and Access Management (IAM) to create a user, and create access keys under the Security Credentials tab.

    Environment variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY will override the AWS configuration stored in the file ~/.aws/credentials.
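A sketch of both approaches (the placeholder values are yours to fill in):

```shell
# Interactive: writes ~/.aws/credentials and ~/.aws/config
aws configure

# Or export credentials for the current shell session only
export AWS_ACCESS_KEY_ID=<your access key id>
export AWS_SECRET_ACCESS_KEY=<your secret access key>
```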

    3. (Optional) Create a Kubernetes cluster (EKS) in AWS

    The script below will create a 1-node cluster named pinot-quickstart in us-west-2 with a t3.xlarge machine for demo purposes:
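A sketch of such a script, assuming eksctl and the region and instance type mentioned above:

```shell
EKS_CLUSTER_NAME=pinot-quickstart
eksctl create cluster \
  --name ${EKS_CLUSTER_NAME} \
  --region us-west-2 \
  --nodes 1 \
  --node-type t3.xlarge
```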

    For k8s 1.23+, run the following commands to allow the containers to provision their storage:
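One hedged way to do this is to install the EBS CSI driver add-on with eksctl (IAM details are simplified here; production clusters typically attach a dedicated IAM role to the driver):

```shell
# Let the cluster use IAM roles for service accounts
eksctl utils associate-iam-oidc-provider \
  --region us-west-2 --cluster pinot-quickstart --approve

# Install the EBS CSI driver so PVCs can be provisioned
eksctl create addon --name aws-ebs-csi-driver \
  --cluster pinot-quickstart --region us-west-2 --force
```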

    Use the following command to monitor the cluster status:
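For example, with the AWS CLI:

```shell
aws eks describe-cluster \
  --name pinot-quickstart --region us-west-2 \
  --query cluster.status
```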

    Once the cluster is in ACTIVE status, it's ready to be used.

    4. Connect to an existing cluster

    Run the following command to get the credential for the cluster pinot-quickstart that you just created:

    To verify the connection, run the following:
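Both steps sketched together, assuming the cluster name and region used above:

```shell
# Merge the cluster credentials into ~/.kube/config
aws eks update-kubeconfig --name pinot-quickstart --region us-west-2

# Verify the connection
kubectl get nodes
```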

    5. Pinot quickstart

    Follow this Kubernetes quickstart to deploy your Pinot demo.

    6. Delete a Kubernetes Cluster
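A sketch with eksctl:

```shell
eksctl delete cluster --name pinot-quickstart --region us-west-2
```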

    docker pull apachepinot/pinot:latest
    docker pull apachepinot/pinot:0.12.0
    docker run \
        -p 9000:9000 \
        apachepinot/pinot:0.12.0 QuickStart \
        -type batch
    docker network create -d bridge pinot-demo
    docker run \
        --network=pinot-demo \
        --name pinot-zookeeper \
        --restart always \
        -p 2181:2181 \
        -d zookeeper:3.5.6
    docker run --rm -ti \
        --network=pinot-demo \
        --name pinot-controller \
        -p 9000:9000 \
        -e JAVA_OPTS="-Dplugins.dir=/opt/pinot/plugins -Xms1G -Xmx4G -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -Xloggc:gc-pinot-controller.log" \
        -d ${PINOT_IMAGE} StartController \
        -zkAddress pinot-zookeeper:2181
    docker run --rm -ti \
        --network=pinot-demo \
        --name pinot-broker \
        -p 8099:8099 \
        -e JAVA_OPTS="-Dplugins.dir=/opt/pinot/plugins -Xms4G -Xmx4G -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -Xloggc:gc-pinot-broker.log" \
        -d ${PINOT_IMAGE} StartBroker \
        -zkAddress pinot-zookeeper:2181
    docker run --rm -ti \
        --network=pinot-demo \
        --name pinot-server \
        -p 8098:8098 \
        -e JAVA_OPTS="-Dplugins.dir=/opt/pinot/plugins -Xms4G -Xmx16G -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -Xloggc:gc-pinot-server.log" \
        -d ${PINOT_IMAGE} StartServer \
        -zkAddress pinot-zookeeper:2181
    docker run --rm -ti \
        --network pinot-demo --name=kafka \
        -e KAFKA_ZOOKEEPER_CONNECT=pinot-zookeeper:2181/kafka \
        -e KAFKA_BROKER_ID=0 \
        -e KAFKA_ADVERTISED_HOST_NAME=kafka \
        -p 9092:9092 \
        -d bitnami/kafka:latest
    docker container ls -a
    CONTAINER ID        IMAGE                       COMMAND                  CREATED             STATUS              PORTS                                                  NAMES
    9ec20e4463fa        bitnami/kafka:latest        "start-kafka.sh"         43 minutes ago      Up 43 minutes                                                              kafka
    0775f5d8d6bf        apachepinot/pinot:latest    "./bin/pinot-admin.s…"   44 minutes ago      Up 44 minutes       8096-8099/tcp, 9000/tcp                                pinot-server
    64c6392b2e04        apachepinot/pinot:latest    "./bin/pinot-admin.s…"   44 minutes ago      Up 44 minutes       8096-8099/tcp, 9000/tcp                                pinot-broker
    b6d0f2bd26a3        apachepinot/pinot:latest    "./bin/pinot-admin.s…"   45 minutes ago      Up 45 minutes       8096-8099/tcp, 0.0.0.0:9000->9000/tcp                  pinot-controller
    570416fc530e        zookeeper:3.5.6             "/docker-entrypoint.…"   45 minutes ago      Up 45 minutes       2888/tcp, 3888/tcp, 0.0.0.0:2181->2181/tcp, 8080/tcp   pinot-zookeeper
    docker-compose.yml
    version: '3.7'
    services:
      pinot-zookeeper:
        image: zookeeper:3.5.6
        container_name: pinot-zookeeper
        ports:
          - "2181:2181"
        environment:
          ZOOKEEPER_CLIENT_PORT: 2181
          ZOOKEEPER_TICK_TIME: 2000
      pinot-controller:
        image: apachepinot/pinot:0.12.0
        command: "StartController -zkAddress pinot-zookeeper:2181"
        container_name: pinot-controller
        restart: unless-stopped
        ports:
          - "9000:9000"
        environment:
          JAVA_OPTS: "-Dplugins.dir=/opt/pinot/plugins -Xms1G -Xmx4G -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -Xloggc:gc-pinot-controller.log"
        depends_on:
          - pinot-zookeeper
      pinot-broker:
        image: apachepinot/pinot:0.12.0
        command: "StartBroker -zkAddress pinot-zookeeper:2181"
        restart: unless-stopped
        container_name: "pinot-broker"
        ports:
          - "8099:8099"
        environment:
          JAVA_OPTS: "-Dplugins.dir=/opt/pinot/plugins -Xms4G -Xmx4G -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -Xloggc:gc-pinot-broker.log"
        depends_on:
          - pinot-controller
      pinot-server:
        image: apachepinot/pinot:0.12.0
        command: "StartServer -zkAddress pinot-zookeeper:2181"
        restart: unless-stopped
        container_name: "pinot-server"
        ports:
          - "8098:8098"
        environment:
          JAVA_OPTS: "-Dplugins.dir=/opt/pinot/plugins -Xms4G -Xmx16G -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -Xloggc:gc-pinot-server.log"
        depends_on:
          - pinot-broker
    docker-compose --project-name pinot-demo up
    docker container ls 
    CONTAINER ID   IMAGE                     COMMAND                  CREATED              STATUS              PORTS                                                                     NAMES
    ba5cb0868350   apachepinot/pinot:0.9.3   "./bin/pinot-admin.s…"   About a minute ago   Up About a minute   8096-8099/tcp, 9000/tcp                                                   pinot-server
    698f160852f9   apachepinot/pinot:0.9.3   "./bin/pinot-admin.s…"   About a minute ago   Up About a minute   8096-8098/tcp, 9000/tcp, 0.0.0.0:8099->8099/tcp, :::8099->8099/tcp        pinot-broker
    b1ba8cf60d69   apachepinot/pinot:0.9.3   "./bin/pinot-admin.s…"   About a minute ago   Up About a minute   8096-8099/tcp, 0.0.0.0:9000->9000/tcp, :::9000->9000/tcp                  pinot-controller
    54e7e114cd53   zookeeper:3.5.6           "/docker-entrypoint.…"   About a minute ago   Up About a minute   2888/tcp, 3888/tcp, 0.0.0.0:2181->2181/tcp, :::2181->2181/tcp, 8080/tcp   pinot-zookeeper
    brew install kubernetes-cli
    kubectl version
    brew install kubernetes-helm
    helm version
    curl "https://d1vvhvl2y92vvt.cloudfront.net/awscli-exe-macos.zip" -o "awscliv2.zip"
    unzip awscliv2.zip
    sudo ./aws/install
    brew tap weaveworks/tap
    brew install weaveworks/tap/eksctl
    aws configure
    EKS_CLUSTER_NAME=pinot-quickstart
    eksctl create cluster \
    --name ${EKS_CLUSTER_NAME} \
    --version 1.16 \
    --region us-west-2 \
    --nodegroup-name standard-workers \
    --node-type t3.xlarge \
    --nodes 1 \
    --nodes-min 1 \
    --nodes-max 1
eksctl utils associate-iam-oidc-provider --region=us-west-2 --cluster=pinot-quickstart --approve
    
    eksctl create iamserviceaccount \
      --name ebs-csi-controller-sa \
      --namespace kube-system \
      --cluster pinot-quickstart \
      --attach-policy-arn arn:aws:iam::aws:policy/service-role/AmazonEBSCSIDriverPolicy \
      --approve \
      --role-only \
      --role-name AmazonEKS_EBS_CSI_DriverRole
    
    eksctl create addon --name aws-ebs-csi-driver --cluster pinot-quickstart --service-account-role-arn arn:aws:iam::$(aws sts get-caller-identity --query Account --output text):role/AmazonEKS_EBS_CSI_DriverRole --force
    EKS_CLUSTER_NAME=pinot-quickstart
    aws eks describe-cluster --name ${EKS_CLUSTER_NAME} --region us-west-2
    EKS_CLUSTER_NAME=pinot-quickstart
    aws eks update-kubeconfig --name ${EKS_CLUSTER_NAME}
    kubectl get nodes
    EKS_CLUSTER_NAME=pinot-quickstart
    aws eks delete-cluster --name ${EKS_CLUSTER_NAME}

    Start Kafka

    docker run \
        --network pinot-demo --name=kafka \
        -e KAFKA_ZOOKEEPER_CONNECT=manual-zookeeper:2181/kafka \
        -e KAFKA_BROKER_ID=0 \
        -e KAFKA_ADVERTISED_HOST_NAME=kafka \
        -d bitnami/kafka:latest

    Create a Kafka Topic

    docker exec \
      -t kafka \
      /opt/kafka/bin/kafka-topics.sh \
      --zookeeper manual-zookeeper:2181/kafka \
      --partitions=1 --replication-factor=1 \
      --create --topic transcript-topic

    Start Kafka

    Start Kafka cluster on port 9876 using the same Zookeeper from the quick-start examples.

    bin/pinot-admin.sh  StartKafka -zkAddress=localhost:2123/kafka -port 9876

    Create a Kafka topic

Download the latest Kafka release, then create a topic.

    hashtag
    Creating a schema

    If you followed Batch upload sample data, you have already pushed a schema for your sample table. If not, see Creating a schema to learn how to create a schema for your sample data.

    hashtag
    Creating a table configuration

If you followed Batch upload sample data, you have already pushed an offline table and schema. To create a real-time table configuration for the sample, use the following table configuration for the transcript table. For a more detailed overview of tables, see Table.

    hashtag
    Uploading your schema and table configuration

    Next, upload the table and schema to the cluster. As soon as the real-time table is created, it will begin ingesting from the Kafka topic.

    hashtag
    Loading sample data into stream

    Use the following sample JSON file for transcript table data in the following step.

    Push the sample JSON file into the Kafka topic, using the Kafka script from the Kafka download.

    hashtag
    Ingesting streaming data

As soon as data flows into the stream, the Pinot table will consume it and it will be ready for querying. Browse to the Query Consolearrow-up-right running in your Pinot instance (this link uses localhost as an example) to examine the real-time data.

    Pinot in Dockerarrow-up-right
    Pluggable Streams

    Query FAQ

    This page has a collection of frequently asked questions about queries with answers from the community.

    circle-info

This is a list of questions frequently asked in our troubleshooting channel on Slack. To contribute additional questions and answers, make a pull request.

    hashtag
    Querying

    docker run \
        --network=pinot-demo \
        -v /tmp/pinot-quick-start:/tmp/pinot-quick-start \
        --name pinot-streaming-table-creation \
        apachepinot/pinot:latest AddTable \
        -schemaFile /tmp/pinot-quick-start/transcript-schema.json \
        -tableConfigFile /tmp/pinot-quick-start/transcript-table-realtime.json \
        -controllerHost manual-pinot-controller \
        -controllerPort 9000 \
        -exec
    bin/pinot-admin.sh AddTable \
        -schemaFile /tmp/pinot-quick-start/transcript-schema.json \
        -tableConfigFile /tmp/pinot-quick-start/transcript-table-realtime.json \
        -exec
    /tmp/pinot-quick-start/transcript-table-realtime.json
    {
      "tableName": "transcript",
      "tableType": "REALTIME",
      "segmentsConfig": {
        "timeColumnName": "timestampInEpoch",
        "timeType": "MILLISECONDS",
        "schemaName": "transcript",
        "replicasPerPartition": "1"
      },
      "tenants": {},
      "tableIndexConfig": {
        "loadMode": "MMAP",
        "streamConfigs": {
          "streamType": "kafka",
          "stream.kafka.consumer.type": "lowlevel",
          "stream.kafka.topic.name": "transcript-topic",
          "stream.kafka.decoder.class.name": "org.apache.pinot.plugin.stream.kafka.KafkaJSONMessageDecoder",
          "stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
          "stream.kafka.broker.list": "kafka:9092",
          "realtime.segment.flush.threshold.rows": "0",
          "realtime.segment.flush.threshold.time": "24h",
          "realtime.segment.flush.threshold.segment.size": "50M",
          "stream.kafka.consumer.prop.auto.offset.reset": "smallest"
        }
      },
      "metadata": {
        "customConfigs": {}
      }
    }
    /tmp/pinot-quick-start/rawData/transcript.json
    {"studentID":205,"firstName":"Natalie","lastName":"Jones","gender":"Female","subject":"Maths","score":3.8,"timestampInEpoch":1571900400000}
    {"studentID":205,"firstName":"Natalie","lastName":"Jones","gender":"Female","subject":"History","score":3.5,"timestampInEpoch":1571900400000}
    {"studentID":207,"firstName":"Bob","lastName":"Lewis","gender":"Male","subject":"Maths","score":3.2,"timestampInEpoch":1571900400000}
    {"studentID":207,"firstName":"Bob","lastName":"Lewis","gender":"Male","subject":"Chemistry","score":3.6,"timestampInEpoch":1572418800000}
    {"studentID":209,"firstName":"Jane","lastName":"Doe","gender":"Female","subject":"Geography","score":3.8,"timestampInEpoch":1572505200000}
    {"studentID":209,"firstName":"Jane","lastName":"Doe","gender":"Female","subject":"English","score":3.5,"timestampInEpoch":1572505200000}
    {"studentID":209,"firstName":"Jane","lastName":"Doe","gender":"Female","subject":"Maths","score":3.2,"timestampInEpoch":1572678000000}
    {"studentID":209,"firstName":"Jane","lastName":"Doe","gender":"Female","subject":"Physics","score":3.6,"timestampInEpoch":1572678000000}
    {"studentID":211,"firstName":"John","lastName":"Doe","gender":"Male","subject":"Maths","score":3.8,"timestampInEpoch":1572678000000}
    {"studentID":211,"firstName":"John","lastName":"Doe","gender":"Male","subject":"English","score":3.5,"timestampInEpoch":1572678000000}
    {"studentID":211,"firstName":"John","lastName":"Doe","gender":"Male","subject":"History","score":3.2,"timestampInEpoch":1572854400000}
    {"studentID":212,"firstName":"Nick","lastName":"Young","gender":"Male","subject":"History","score":3.6,"timestampInEpoch":1572854400000}
    bin/kafka-console-producer.sh \
        --broker-list localhost:9876 \
        --topic transcript-topic < /tmp/pinot-quick-start/rawData/transcript.json
    hashtag
    I get the following error when running a query, what does it mean?

This means that the Pinot broker assigned to the table specified in the query was not found. A common root cause is a typo in the table name in the query. A less common cause is that no broker actually carries the broker tenant tag required for the table.

    hashtag
    What are all the fields in the Pinot query's JSON response?

    See this page explaining the Pinot response format: https://docs.pinot.apache.org/users/api/querying-pinot-using-standard-sql/response-formatarrow-up-right.

    hashtag
    SQL Query fails with "Encountered 'timestamp' was expecting one of..."

    "timestamp" is a reserved keyword in SQL. Escape timestamp with double quotes.

    Other commonly encountered reserved keywords are date, time, table.

    hashtag
    Filtering on STRING column WHERE column = "foo" does not work?

    For filtering on STRING columns, use single quotes:

    hashtag
    ORDER BY using an alias doesn't work?

The fields in the ORDER BY clause must be one of the group-by columns or aggregations, before applying the alias. Therefore, this will not work:

    But, this will work:

    hashtag
    Does pagination work in GROUP BY queries?

    No. Pagination only works for SELECTION queries.

    hashtag
How do I increase the timeout for a query?

You can add this at the end of your query: option(timeoutMs=X). The following example uses a timeout of 20 seconds for the query:

    You can also use SET "timeoutMs" = 20000; SELECT COUNT(*) from myTable.

    For changing the timeout on the entire cluster, set this property pinot.broker.timeoutMs in either broker configs or cluster configs (using the POST /cluster/configs API from Swagger).

    hashtag
    How do I cancel a query?

Add these two configs for the Pinot server and broker to start tracking running queries. Tracking entries are added when a query starts and cleaned up when it ends, so they should not consume many resources.

    Then use the Rest APIs on Pinot controller to list running queries and cancel them via the query ID and broker ID (as query ID is only local to broker), like in the following:

    hashtag
How do I optimize my Pinot table for doing aggregations and group-by on high cardinality columns?

    In order to speed up aggregations, you can enable metrics aggregation on the required column by adding a metric fieldarrow-up-right in the corresponding schema and setting aggregateMetrics to true in the table configuration. You can also use a star-tree index config for columns like these (see here for more about star-treearrow-up-right).

    hashtag
How do I verify that an index is created on a particular column?

    There are two ways to verify this:

    1. Log in to a server that hosts segments of this table. Inside the data directory, locate the segment directory for this table. In this directory, there is a file named index_map which lists all the indexes and other data structures created for each segment. Verify that the requested index is present here.

2. During query: use the column in a filter predicate and check the value of numEntriesScannedInFilter in the query response. If this value is 0, the index is being used as expected (this check works for inverted indexes).
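The second check can be automated against the broker's query response stats. A minimal sketch (the field names below are real Pinot response stats, but the values and helper name are made up for illustration):

```python
# Hypothetical broker response fragment: the field names are real Pinot query
# response stats, but the values here are made up for illustration.
response_stats = {
    "numDocsScanned": 1200,
    "numEntriesScannedInFilter": 0,
}

def inverted_index_likely_used(stats: dict) -> bool:
    # With an inverted index on the filter column, the filter phase resolves
    # matching docIds from the index, so it scans zero entries.
    return stats["numEntriesScannedInFilter"] == 0

print(inverted_index_likely_used(response_stats))  # True
```

A non-zero value means the filter fell back to scanning the forward index for that predicate.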

    hashtag
    Does Pinot use a default value for LIMIT in queries?

    Yes, Pinot uses a default value of LIMIT 10 in queries. The reason behind this default value is to avoid unintentionally submitting expensive queries that end up fetching or processing a lot of data from Pinot. Users can always overwrite this by explicitly specifying a LIMIT value.

    hashtag
    Does Pinot cache query results?

Pinot does not cache query results. Each query is computed in its entirety. Note, though, that running the same or a similar query multiple times will naturally pull segment pages into memory, making subsequent calls faster. Also, for real-time systems the data is changing in real time, so results cannot be cached. For offline-only systems, a caching layer can be built on top of Pinot, with an invalidation mechanism to invalidate the cache when data is pushed into Pinot.

    hashtag
    I'm noticing that the first query is slower than subsequent queries. Why is that?

    Pinot memory maps segments. It warms up during the first query, when segments are pulled into the memory by the OS. Subsequent queries will have the segment already loaded in memory, and hence will be faster. The OS is responsible for bringing the segments into memory, and also removing them in favor of other segments when other segments not already in memory are accessed.

    hashtag
    How do I determine if the star-tree index is being used for my query?

    The query execution engine will prefer to use the star-tree index for all queries where it can be used. The criteria to determine whether the star-tree index can be used is as follows:

    • All aggregation function + column pairs in the query must exist in the star-tree index.

    • All dimensions that appear in filter predicates and group-by should be star-tree dimensions.

For queries that meet both criteria, a star-tree index is used. For other queries, the execution engine defaults to the next best index available.

    make a pull requestarrow-up-right
    {'errorCode': 410, 'message': 'BrokerResourceMissingError'}
    select "timestamp" from myTable
    SELECT COUNT(*) from myTable WHERE column = 'foo'
    SELECT count(colA) as aliasA, colA from tableA GROUP BY colA ORDER BY aliasA
    SELECT count(colA) as sumA, colA from tableA GROUP BY colA ORDER BY count(colA)
    SELECT COUNT(*) from myTable option(timeoutMs=20000)
    pinot.server.enable.query.cancellation=true // false by default
    pinot.broker.enable.query.cancellation=true // false by default
    GET /queries: to show running queries as tracked by all brokers
    Response example: `{
      "Broker_192.168.0.105_8000": {
        "7": "select G_old from baseballStats limit 10",
        "8": "select G_old from baseballStats limit 100"
      }
    }`
    
    DELETE /query/{brokerId}/{queryId}[?verbose=false/true]: to cancel a running query 
    with queryId and brokerId. The verbose is false by default, but if set to true, 
    responses from servers running the query also return.
    
    Response example: `Cancelled query: 8 with responses from servers: 
    {192.168.0.105:7501=404, 192.168.0.105:7502=200, 192.168.0.105:7500=200}`
    Kafkaarrow-up-right
    bin/kafka-topics.sh --create --bootstrap-server localhost:9876 --replication-factor 1 --partitions 1 --topic transcript-topic
    Neha Pawar from the Apache Pinot team shows you how to set up a Pinot cluster

    Batch import example

    Step-by-step guide for pushing your own data into the Pinot cluster

    This example assumes you have set up your cluster using Pinot in Dockerarrow-up-right.

    hashtag
    Preparing your data

Let's gather our data files and put them in /tmp/pinot-quick-start/rawdata.

    Supported file formats are CSV, JSON, AVRO, PARQUET, THRIFT, ORC. If you don't have sample data, you can use this sample CSV.

    hashtag
    Creating a schema

A schema is used to define the columns and data types of the Pinot table. A detailed overview of schemas can be found in Schema.

Columns are categorized into three types: dimensions, metrics, and time.

In our example transcript-schema, the studentID, firstName, lastName, gender, and subject columns are the dimensions, the score column is the metric, and timestampInEpoch is the time column.

    Once you have identified the dimensions, metrics and time columns, create a schema for your data, using the following reference.
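If you don't have a schema file handy, the sketch below matches the transcript sample described above, following the Pinot schema format (dimensionFieldSpecs, metricFieldSpecs, dateTimeFieldSpecs); double-check the column names and data types against your own data:

```json
{
  "schemaName": "transcript",
  "dimensionFieldSpecs": [
    {"name": "studentID", "dataType": "INT"},
    {"name": "firstName", "dataType": "STRING"},
    {"name": "lastName", "dataType": "STRING"},
    {"name": "gender", "dataType": "STRING"},
    {"name": "subject", "dataType": "STRING"}
  ],
  "metricFieldSpecs": [
    {"name": "score", "dataType": "FLOAT"}
  ],
  "dateTimeFieldSpecs": [
    {
      "name": "timestampInEpoch",
      "dataType": "LONG",
      "format": "1:MILLISECONDS:EPOCH",
      "granularity": "1:MILLISECONDS"
    }
  ]
}
```

Saving it as /tmp/pinot-quick-start/transcript-schema.json matches the paths used by the AddTable commands elsewhere in this guide.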

    hashtag
    Creating a table configuration

A table configuration is used to define the configuration related to the Pinot table. A detailed overview of table configuration can be found in Table.

    Here's the table configuration for the sample CSV file. You can use this as a reference to build your own table configuration. Edit the tableName and schemaName.
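In case the referenced configuration doesn't render on this page, a minimal offline table configuration for the transcript sample looks roughly like the sketch below; values such as replication are illustrative, so see Table for the full reference:

```json
{
  "tableName": "transcript",
  "tableType": "OFFLINE",
  "segmentsConfig": {
    "timeColumnName": "timestampInEpoch",
    "timeType": "MILLISECONDS",
    "replication": "1",
    "schemaName": "transcript"
  },
  "tenants": {},
  "tableIndexConfig": {
    "loadMode": "MMAP"
  },
  "metadata": {}
}
```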

    hashtag
    Uploading your table configuration and schema

    Review the directory structure so far.

    Upload the table configuration using the following command.

Use the Rest API that is running on your Pinot instance to review the table configuration and schema and make sure they were successfully uploaded. This link uses localhost as an example.

    hashtag
    Creating a segment

A Pinot table's data is stored as Pinot segments. A detailed overview of segments can be found in Segment.

To generate a segment, first create a job specification (JobSpec) YAML file. A JobSpec file contains all the information regarding data format, input data location, and Pinot cluster coordinates. Copy the following job specification file to begin. If you're using your own data, be sure to 1) replace transcript with your table name, and 2) set the correct recordReaderSpec.
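If the job specification file itself is not visible on this page, the sketch below shows the general shape of a standalone batch JobSpec for the transcript CSV sample. The paths and URIs are assumptions based on the directory layout used in this guide; adjust them for your environment:

```yaml
executionFrameworkSpec:
  name: 'standalone'
  segmentGenerationJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner'
  segmentTarPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentTarPushJobRunner'
jobType: SegmentCreationAndTarPush
inputDirURI: '/tmp/pinot-quick-start/rawdata/'
includeFileNamePattern: 'glob:**/*.csv'
outputDirURI: '/tmp/pinot-quick-start/segments/'
overwriteOutput: true
pinotFSSpecs:
  - scheme: file
    className: org.apache.pinot.spi.filesystem.LocalPinotFS
recordReaderSpec:
  dataFormat: 'csv'
  className: 'org.apache.pinot.plugin.inputformat.csv.CSVRecordReader'
  configClassName: 'org.apache.pinot.plugin.inputformat.csv.CSVRecordReaderConfig'
tableSpec:
  tableName: 'transcript'
  schemaURI: 'http://localhost:9000/tables/transcript/schema'
  tableConfigURI: 'http://localhost:9000/tables/transcript'
pinotClusterSpecs:
  - controllerURI: 'http://localhost:9000'
```

With the file saved as, say, /tmp/pinot-quick-start/batch-job-spec.yml, the segment generation command in the next step would reference it via -jobSpecFile.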

    Use the following command to generate a segment and upload it.

    Here is some sample output.

Confirm that your segment made it into the table using the Rest API.

    hashtag
    Querying your data

If everything worked, find your table in the Query Console and run queries against it.

    Ingestion FAQ

    This page has a collection of frequently asked questions about ingestion with answers from the community.

    circle-info

This is a list of questions frequently asked in our troubleshooting channel on Slack. To contribute additional questions and answers, make a pull request.

    hashtag
    Data processing

    mkdir -p /tmp/pinot-quick-start/rawdata
    hashtag
    What is a good segment size?

    While Apache Pinot can work with segments of various sizes, for optimal use of Pinot, you want to get your segments sized in the 100MB to 500MB (un-tarred/uncompressed) range. Having too many (thousands or more) tiny segments for a single table creates overhead in terms of the metadata storage in Zookeeper as well as in the Pinot servers' heap. At the same time, having too few really large (GBs) segments reduces parallelism of query execution, as on the server side, the thread parallelism of query execution is at segment level.

    hashtag
    Can multiple Pinot tables consume from the same Kafka topic?

    Yes. Each table can be independently configured to consume from any given Kafka topic, regardless of whether there are other tables that are also consuming from the same Kafka topic.

    hashtag
    If I add a partition to a Kafka topic, will Pinot automatically ingest data from this partition?

Pinot automatically detects new partitions in Kafka topics. It checks for new partitions whenever the RealtimeSegmentValidationManager periodic job runs, and starts consumers for any new partitions.

You can configure the interval for this job using the controller.realtime.segment.validation.frequencyPeriod property in the controller configuration.

    hashtag
    Does Pinot support partition pruning on multiple partition columns?

    Pinot supports multi-column partitioning for offline tables. Map multiple columns under tableIndexConfig.segmentPartitionConfig.columnPartitionMaparrow-up-right. Pinot assigns the input data to each partition according to the partition configuration individually for each column.

The following example partitions the segment based on two columns, memberID and caseNumber. Note that each partition column is handled separately, so in this case the segment is partitioned on memberID (partition ID 1) and also partitioned on caseNumber (partition ID 2).

For multi-column partitioning to work, you must also set routing.segmentPrunerTypes as follows:

    hashtag
    How do I enable partitioning in Pinot when using Kafka stream?

    Set up partitioner in the Kafka producer: https://docs.confluent.io/current/clients/producer.htmlarrow-up-right

    The partitioning logic in the stream should match the partitioning config in Pinot. Kafka uses murmur2, and the equivalent in Pinot is the Murmur function.

Set the partitioning configuration as below, using the same column used in Kafka:

    and also set:

    To learn how partition works, see routing tuning.
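To see why the two sides must agree, here is a minimal sketch of partition-based pruning. It uses Pinot's simple Modulo partition function (from the multi-column example above) rather than Kafka's default murmur2, purely because Modulo is easy to verify by hand; the column name and record values are hypothetical:

```python
def modulo_partition(key: int, num_partitions: int) -> int:
    # Pinot's "Modulo" partition function: partition id = key % numPartitions.
    return key % num_partitions

# Must match both numPartitions in segmentPartitionConfig and the
# Kafka topic's partition count.
NUM_PARTITIONS = 3

# Producer side: route each record by its partition column.
records = [{"memberId": 101}, {"memberId": 102}, {"memberId": 204}]
by_partition = {}
for rec in records:
    pid = modulo_partition(rec["memberId"], NUM_PARTITIONS)
    by_partition.setdefault(pid, []).append(rec)

# Query side: a filter memberId = 204 only needs partition 204 % 3 = 0,
# so the broker can prune segments belonging to the other partitions.
print(sorted(by_partition))  # [0, 2]
```

If the producer and Pinot disagree on the partition function, the pruner would skip segments that actually contain matching rows, so the configs must describe the same logic.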

    hashtag
    How do I store BYTES column in JSON data?

    For JSON, you can use a hex encoded string to ingest BYTES.
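For example, a producer can hex-encode the raw bytes before writing the JSON record. A minimal sketch (the payloadBytes field name is hypothetical):

```python
# Raw bytes destined for a Pinot BYTES column.
payload = b"\x00\x01\xfa"

# JSON has no binary type, so hex-encode the bytes into a string field.
# The field name "payloadBytes" is hypothetical.
record = {"payloadBytes": payload.hex()}

# The original bytes are recoverable from the hex string.
assert bytes.fromhex(record["payloadBytes"]) == payload
print(record)  # {'payloadBytes': '0001fa'}
```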

    hashtag
    How do I flatten my JSON Kafka stream?

    See the json_format(field)arrow-up-right function which can store a top level json field as a STRING in Pinot.

    Then you can use these json functionsarrow-up-right during query time, to extract fields from the json string.

    circle-exclamation

NOTE: This works well when some of your fields are nested JSON but most of your fields are top-level keys. If all of your fields are within a nested JSON key, you will have to store the entire payload as one column, which is not ideal.

    hashtag
    How do I escape Unicode in my Job Spec YAML file?

To use explicit code points, you must double-quote (not single-quote) the string, and escape the code point via "\uHHHH", where HHHH is the four-digit hex code for the character. See https://yaml.org/spec/spec.html#escaping/in%20double-quoted%20scalars/arrow-up-right for more details.
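For example, assuming a path value that needs a non-ASCII character, the double-quoted form below is valid YAML, while the same escape inside single quotes would be kept literally as backslash-u text:

```yaml
# "\u00E9" is the code point for "é"; \uHHHH escapes only work in double quotes.
outputDirURI: "/tmp/pinot-quick-start/r\u00E9sum\u00E9-segments/"
```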

    hashtag
    Is there a limit on the maximum length of a string column in Pinot?

    By default, Pinot limits the length of a String column to 512 bytes. If you want to overwrite this value, you can set the maxLength attribute in the schema as follows:

    hashtag
    When are new events queryable when getting ingested into a real-time table?

    Events are available to queries as soon as they are ingested. This is because events are instantly indexed in memory upon ingestion.

    The ingestion of events into the real-time table is not transactional, so replicas of the open segment are not immediately consistent. Pinot trades consistency for availability upon network partitioning (CAP theorem) to provide ultra-low ingestion latencies at high throughput.

    However, when the open segment is closed and its in-memory indexes are flushed to persistent storage, all its replicas are guaranteed to be consistent, with the commit protocolarrow-up-right.

    hashtag
    How to reset a CONSUMING segment stuck on an offset which has expired from the stream?

    This typically happens if:

    1. The consumer is lagging a lot.

    2. The consumer was down (server down, cluster down), and the stream moved on, resulting in offset not found when consumer comes back up.

    In case of Kafka, to recover, set property "auto.offset.reset":"earliest" in the streamConfigs section and reset the CONSUMING segment. See Real-time table configsarrow-up-right for more details about the configuration.

You can also use the "Resume Consumption" endpoint with the "resumeFrom" parameter set to "smallest" (or "largest" if you want). See Pause Stream Ingestionarrow-up-right for more details.
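In the transcript table configuration shown elsewhere in this guide, this consumer property lives inside streamConfigs under its Kafka-plugin key, as in the fragment below (that quick-start config uses "smallest", the older Kafka spelling of "earliest"; only the offset-reset line needs to change):

```json
"streamConfigs": {
  "streamType": "kafka",
  "stream.kafka.consumer.prop.auto.offset.reset": "earliest"
}
```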

    hashtag
    Indexing

    hashtag
    How to set inverted indexes?

    Inverted indexes are set in the tableConfig's tableIndexConfig -> invertedIndexColumns list. For more info on table configuration, see Table Config Reference. For an example showing how to configure an inverted index, see Inverted Index.

    Applying inverted indexes to a table configuration will generate an inverted index for all new segments. To apply the inverted indexes to all existing segments, see How to apply an inverted index to existing segments?
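As a concrete sketch, an inverted index on two columns would be declared like this inside the table configuration (the column names here are hypothetical examples):

```json
"tableIndexConfig": {
  "loadMode": "MMAP",
  "invertedIndexColumns": ["gender", "subject"]
}
```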

    hashtag
    How to apply an inverted index to existing segments?

1. Add the columns you wish to index to the tableIndexConfig -> invertedIndexColumns list. To update the table configuration, use the Pinot Swagger API: http://localhost:9000/help#!/Table/updateTableConfigarrow-up-right.

    2. Invoke the reload API: http://localhost:9000/help#!/Segment/reloadAllSegmentsarrow-up-right.

Once you've done that, you can check whether the index has been applied by querying the segment metadata API at http://localhost:9000/help#/Segment/getServerMetadataarrow-up-right. Don't forget to include the names of the columns on which you have applied the index.

    The output from this API should look something like the following:

    hashtag
    Can I retrospectively add an index to any segment?

    Not all indexes can be retrospectively applied to existing segments.

    If you want to add or change the sorted index column or adjust the dictionary encoding of the default forward index you will need to manually re-load any existing segments.

    hashtag
    How to create star-tree indexes?

    Star-tree indexes are configured in the table config under the tableIndexConfig -> starTreeIndexConfigs (list) and enableDefaultStarTree (boolean). See here for more about how to configure star-tree indexes: https://docs.pinot.apache.org/basics/indexing/star-tree-index#index-generationarrow-up-right

    The new segments will have star-tree indexes generated after applying the star-tree index configurations to the table configuration. Currently, Pinot does not support adding star-tree indexes to the existing segments.

    hashtag
    Handling time in Pinot

    hashtag
    How does Pinot’s real-time ingestion handle out-of-order events?

Pinot does not require ordering of event timestamps. Out-of-order events are still consumed and indexed into the "currently consuming" segment. In a pathological case, if a two-day-old event arrives now, it will still be stored in the segment that is currently open for consumption. There is no strict time-based partitioning for segments, but star-tree indexes and hybrid tables will handle this as appropriate.

See Components > Brokerarrow-up-right for more details about how hybrid tables handle this. Specifically, the time boundary is computed as max(OfflineTime) - 1 unit of granularity. Pinot does store the min-max time for each segment and uses it for pruning segments, so segments spanning multiple time intervals may not be perfectly pruned.

When generating star-tree indexes, the time column will be part of the star-tree, so the tree can still be efficiently queried for segments with multiple time intervals.
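The time-boundary rule above can be sketched numerically. The helper below is illustrative only (not a Pinot API); it assumes millisecond epoch timestamps and a one-day push granularity:

```python
from datetime import datetime, timedelta, timezone

def hybrid_time_boundary(max_offline_time_ms: int, granularity: timedelta) -> int:
    # Sketch of the rule above: queries for timestamps at or below the boundary
    # go to the OFFLINE table; everything after it goes to the REALTIME table.
    return max_offline_time_ms - int(granularity.total_seconds() * 1000)

# Assume the newest offline segment ends at 2019-11-01 00:00:00 UTC and
# offline pushes happen daily, i.e. the granularity unit is one day.
max_offline_ms = int(datetime(2019, 11, 1, tzinfo=timezone.utc).timestamp() * 1000)
boundary = hybrid_time_boundary(max_offline_ms, timedelta(days=1))
print(boundary == max_offline_ms - 86_400_000)  # True
```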

    hashtag
Why does a hybrid table use an offset from max(OfflineTime) to determine the time boundary, instead of max(OfflineTime) itself?

This lets late events come in without requiring complex offline pipelines that perfectly partition your events by event timestamps. With this offset, even if your offline data pipeline produces segments up to some maximum timestamp, Pinot will not use the offline dataset for that last chunk of segments. The expectation is that when you process the next time range of data offline, your data pipeline will include any late events.

    hashtag
    Why are segments not strictly time-partitioned?

    It might seem odd that segments are not strictly time-partitioned, unlike similar systems such as Apache Druid. This allows real-time ingestion to consume out-of-order events. Even though segments are not strictly time-partitioned, Pinot will still index, prune, and query segments intelligently by time intervals for the performance of hybrid tables and time-filtered data.

When generating offline segments, generate them such that each segment contains only one time interval and the data is well partitioned by the time column.

    make a pull requestarrow-up-right
    "tableIndexConfig": {
          ..
          "segmentPartitionConfig": {
            "columnPartitionMap": {
              "memberId": {
                "functionName": "Modulo",
                "numPartitions": 3 
              },
              "caseNumber": {
                "functionName": "Murmur",
                "numPartitions": 12 
              }
            }
          }
    "routing": {
          "segmentPrunerTypes": ["partition"]
        }
    "tableIndexConfig": {
          ..
          "segmentPartitionConfig": {
            "columnPartitionMap": {
              "column_foo": {
                "functionName": "Murmur",
                "numPartitions": 12 // same as number of kafka partitions
              }
            }
          }
    "routing": {
          "segmentPrunerTypes": ["partition"]
        }
        {
          "dataType": "STRING",
          "maxLength": 1000,
          "name": "textDim1"
        },
    {
      "<segment-name>": {
        "segmentName": "<segment-name>",
        "indexes": {
          "<columnName>": {
            "bloom-filter": "NO",
            "dictionary": "YES",
            "forward-index": "YES",
            "inverted-index": "YES",
            "null-value-vector-reader": "NO",
            "range-index": "NO",
            "json-index": "NO"
          }
        }
      }
    }

Dimensions: Typically used in filters and group by, for slicing and dicing into data

Metrics: Typically used in aggregations, represents the quantitative data

Time: Optional column, represents the timestamp associated with each row

    /tmp/pinot-quick-start/rawdata/transcript.csv
    studentID,firstName,lastName,gender,subject,score,timestampInEpoch
    200,Lucy,Smith,Female,Maths,3.8,1570863600000
    200,Lucy,Smith,Female,English,3.5,1571036400000
    201,Bob,King,Male,Maths,3.2,1571900400000
    202,Nick,Young,Male,Physics,3.6,1572418800000
    /tmp/pinot-quick-start/transcript-schema.json
    {
      "schemaName": "transcript",
      "dimensionFieldSpecs": [
        {
          "name": "studentID",
          "dataType": "INT"
        },
        {
          "name": "firstName",
          "dataType": "STRING"
        },
        {
          "name": "lastName",
          "dataType": "STRING"
        },
        {
          "name": "gender",
          "dataType": "STRING"
        },
        {
          "name": "subject",
          "dataType": "STRING"
        }
      ],
      "metricFieldSpecs": [
        {
          "name": "score",
          "dataType": "FLOAT"
        }
      ],
      "dateTimeFieldSpecs": [{
        "name": "timestampInEpoch",
        "dataType": "LONG",
        "format" : "1:MILLISECONDS:EPOCH",
        "granularity": "1:MILLISECONDS"
      }]
    }
    /tmp/pinot-quick-start/transcript-table-offline.json
    {
      "tableName": "transcript",
      "segmentsConfig" : {
        "timeColumnName": "timestampInEpoch",
        "timeType": "MILLISECONDS",
        "replication" : "1",
        "schemaName" : "transcript"
      },
      "tableIndexConfig" : {
        "invertedIndexColumns" : [],
        "loadMode"  : "MMAP"
      },
      "tenants" : {
        "broker":"DefaultTenant",
        "server":"DefaultTenant"
      },
      "tableType":"OFFLINE",
      "metadata": {}
    }
    $ ls /tmp/pinot-quick-start
    rawdata			transcript-schema.json	transcript-table-offline.json
    
    $ ls /tmp/pinot-quick-start/rawdata 
    transcript.csv
    docker run --rm -ti \
        --network=pinot-demo \
        -v /tmp/pinot-quick-start:/tmp/pinot-quick-start \
        --name pinot-batch-table-creation \
        apachepinot/pinot:latest AddTable \
        -schemaFile /tmp/pinot-quick-start/transcript-schema.json \
        -tableConfigFile /tmp/pinot-quick-start/transcript-table-offline.json \
        -controllerHost manual-pinot-controller \
        -controllerPort 9000 -exec
    bin/pinot-admin.sh AddTable \
      -tableConfigFile /tmp/pinot-quick-start/transcript-table-offline.json \
      -schemaFile /tmp/pinot-quick-start/transcript-schema.json -exec
    /tmp/pinot-quick-start/docker-job-spec.yml
    executionFrameworkSpec:
      name: 'standalone'
      segmentGenerationJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner'
      segmentTarPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentTarPushJobRunner'
      segmentUriPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentUriPushJobRunner'
    jobType: SegmentCreationAndTarPush
    inputDirURI: '/tmp/pinot-quick-start/rawdata/'
    includeFileNamePattern: 'glob:**/*.csv'
    outputDirURI: '/tmp/pinot-quick-start/segments/'
    overwriteOutput: true
    pinotFSSpecs:
      - scheme: file
        className: org.apache.pinot.spi.filesystem.LocalPinotFS
    recordReaderSpec:
      dataFormat: 'csv'
      className: 'org.apache.pinot.plugin.inputformat.csv.CSVRecordReader'
      configClassName: 'org.apache.pinot.plugin.inputformat.csv.CSVRecordReaderConfig'
    tableSpec:
      tableName: 'transcript'
      schemaURI: 'http://manual-pinot-controller:9000/tables/transcript/schema'
      tableConfigURI: 'http://manual-pinot-controller:9000/tables/transcript'
    pinotClusterSpecs:
      - controllerURI: 'http://manual-pinot-controller:9000'
    /tmp/pinot-quick-start/batch-job-spec.yml
    executionFrameworkSpec:
      name: 'standalone'
      segmentGenerationJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner'
      segmentTarPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentTarPushJobRunner'
      segmentUriPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentUriPushJobRunner'
    jobType: SegmentCreationAndTarPush
    inputDirURI: '/tmp/pinot-quick-start/rawdata/'
    includeFileNamePattern: 'glob:**/*.csv'
    outputDirURI: '/tmp/pinot-quick-start/segments/'
    overwriteOutput: true
    pushJobSpec:
      pushFileNamePattern: 'glob:**/*.tar.gz'
    pinotFSSpecs:
      - scheme: file
        className: org.apache.pinot.spi.filesystem.LocalPinotFS
    recordReaderSpec:
      dataFormat: 'csv'
      className: 'org.apache.pinot.plugin.inputformat.csv.CSVRecordReader'
      configClassName: 'org.apache.pinot.plugin.inputformat.csv.CSVRecordReaderConfig'
    tableSpec:
      tableName: 'transcript'
      schemaURI: 'http://localhost:9000/tables/transcript/schema'
      tableConfigURI: 'http://localhost:9000/tables/transcript'
    pinotClusterSpecs:
      - controllerURI: 'http://localhost:9000'
    docker run --rm -ti \
        --network=pinot-demo \
        -v /tmp/pinot-quick-start:/tmp/pinot-quick-start \
        --name pinot-data-ingestion-job \
        apachepinot/pinot:latest LaunchDataIngestionJob \
        -jobSpecFile /tmp/pinot-quick-start/docker-job-spec.yml
    bin/pinot-admin.sh LaunchDataIngestionJob \
        -jobSpecFile /tmp/pinot-quick-start/batch-job-spec.yml
    SegmentGenerationJobSpec: 
    !!org.apache.pinot.spi.ingestion.batch.spec.SegmentGenerationJobSpec
    excludeFileNamePattern: null
    executionFrameworkSpec: {extraConfigs: null, name: standalone, segmentGenerationJobRunnerClassName: org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner,
      segmentTarPushJobRunnerClassName: org.apache.pinot.plugin.ingestion.batch.standalone.SegmentTarPushJobRunner,
      segmentUriPushJobRunnerClassName: org.apache.pinot.plugin.ingestion.batch.standalone.SegmentUriPushJobRunner}
    includeFileNamePattern: glob:**\/*.csv
    inputDirURI: /tmp/pinot-quick-start/rawdata/
    jobType: SegmentCreationAndTarPush
    outputDirURI: /tmp/pinot-quick-start/segments
    overwriteOutput: true
    pinotClusterSpecs:
    - {controllerURI: 'http://localhost:9000'}
    pinotFSSpecs:
    - {className: org.apache.pinot.spi.filesystem.LocalPinotFS, configs: null, scheme: file}
    pushJobSpec: null
    recordReaderSpec: {className: org.apache.pinot.plugin.inputformat.csv.CSVRecordReader,
      configClassName: org.apache.pinot.plugin.inputformat.csv.CSVRecordReaderConfig,
      configs: null, dataFormat: csv}
    segmentNameGeneratorSpec: null
    tableSpec: {schemaURI: 'http://localhost:9000/tables/transcript/schema', tableConfigURI: 'http://localhost:9000/tables/transcript',
      tableName: transcript}
    
    Trying to create instance for class org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner
    Initializing PinotFS for scheme file, classname org.apache.pinot.spi.filesystem.LocalPinotFS
    Finished building StatsCollector!
    Collected stats for 4 documents
    Using fixed bytes value dictionary for column: studentID, size: 9
    Created dictionary for STRING column: studentID with cardinality: 3, max length in bytes: 3, range: 200 to 202
    Using fixed bytes value dictionary for column: firstName, size: 12
    Created dictionary for STRING column: firstName with cardinality: 3, max length in bytes: 4, range: Bob to Nick
    Using fixed bytes value dictionary for column: lastName, size: 15
    Created dictionary for STRING column: lastName with cardinality: 3, max length in bytes: 5, range: King to Young
    Created dictionary for FLOAT column: score with cardinality: 4, range: 3.2 to 3.8
    Using fixed bytes value dictionary for column: gender, size: 12
    Created dictionary for STRING column: gender with cardinality: 2, max length in bytes: 6, range: Female to Male
    Using fixed bytes value dictionary for column: subject, size: 21
    Created dictionary for STRING column: subject with cardinality: 3, max length in bytes: 7, range: English to Physics
    Created dictionary for LONG column: timestampInEpoch with cardinality: 4, range: 1570863600000 to 1572418800000
    Start building IndexCreator!
    Finished records indexing in IndexCreator!
    Finished segment seal!
    Converting segment: /var/folders/3z/qn6k60qs6ps1bb6s2c26gx040000gn/T/pinot-1583443148720/output/transcript_OFFLINE_1570863600000_1572418800000_0 to v3 format
    v3 segment location for segment: transcript_OFFLINE_1570863600000_1572418800000_0 is /var/folders/3z/qn6k60qs6ps1bb6s2c26gx040000gn/T/pinot-1583443148720/output/transcript_OFFLINE_1570863600000_1572418800000_0/v3
    Deleting files in v1 segment directory: /var/folders/3z/qn6k60qs6ps1bb6s2c26gx040000gn/T/pinot-1583443148720/output/transcript_OFFLINE_1570863600000_1572418800000_0
    Starting building 1 star-trees with configs: [StarTreeV2BuilderConfig[splitOrder=[studentID, firstName],skipStarNodeCreation=[],functionColumnPairs=[org.apache.pinot.core.startree.v2.AggregationFunctionColumnPair@3a48efdc],maxLeafRecords=1]] using OFF_HEAP builder
    Starting building star-tree with config: StarTreeV2BuilderConfig[splitOrder=[studentID, firstName],skipStarNodeCreation=[],functionColumnPairs=[org.apache.pinot.core.startree.v2.AggregationFunctionColumnPair@3a48efdc],maxLeafRecords=1]
    Generated 3 star-tree records from 4 segment records
    Finished constructing star-tree, got 9 tree nodes and 4 records under star-node
    Finished creating aggregated documents, got 6 aggregated records
    Finished building star-tree in 10ms
    Finished building 1 star-trees in 27ms
    Computed crc = 3454627653, based on files [/var/folders/3z/qn6k60qs6ps1bb6s2c26gx040000gn/T/pinot-1583443148720/output/transcript_OFFLINE_1570863600000_1572418800000_0/v3/columns.psf, /var/folders/3z/qn6k60qs6ps1bb6s2c26gx040000gn/T/pinot-1583443148720/output/transcript_OFFLINE_1570863600000_1572418800000_0/v3/index_map, /var/folders/3z/qn6k60qs6ps1bb6s2c26gx040000gn/T/pinot-1583443148720/output/transcript_OFFLINE_1570863600000_1572418800000_0/v3/metadata.properties, /var/folders/3z/qn6k60qs6ps1bb6s2c26gx040000gn/T/pinot-1583443148720/output/transcript_OFFLINE_1570863600000_1572418800000_0/v3/star_tree_index, /var/folders/3z/qn6k60qs6ps1bb6s2c26gx040000gn/T/pinot-1583443148720/output/transcript_OFFLINE_1570863600000_1572418800000_0/v3/star_tree_index_map]
    Driver, record read time : 0
    Driver, stats collector time : 0
    Driver, indexing time : 0
    Tarring segment from: /var/folders/3z/qn6k60qs6ps1bb6s2c26gx040000gn/T/pinot-1583443148720/output/transcript_OFFLINE_1570863600000_1572418800000_0 to: /var/folders/3z/qn6k60qs6ps1bb6s2c26gx040000gn/T/pinot-1583443148720/output/transcript_OFFLINE_1570863600000_1572418800000_0.tar.gz
    Size for segment: transcript_OFFLINE_1570863600000_1572418800000_0, uncompressed: 6.73KB, compressed: 1.89KB
    Trying to create instance for class org.apache.pinot.plugin.ingestion.batch.standalone.SegmentTarPushJobRunner
    Initializing PinotFS for scheme file, classname org.apache.pinot.spi.filesystem.LocalPinotFS
    Start pushing segments: [/tmp/pinot-quick-start/segments/transcript_OFFLINE_1570863600000_1572418800000_0.tar.gz]... to locations: [org.apache.pinot.spi.ingestion.batch.spec.PinotClusterSpec@243c4f91] for table transcript
    Pushing segment: transcript_OFFLINE_1570863600000_1572418800000_0 to location: http://localhost:9000 for table transcript
    Sending request: http://localhost:9000/v2/segments?tableName=transcript to controller: nehas-mbp.hsd1.ca.comcast.net, version: Unknown
    Response for pushing table transcript segment transcript_OFFLINE_1570863600000_1572418800000_0 to location http://localhost:9000 - 200: {"status":"Successfully uploaded segment: transcript_OFFLINE_1570863600000_1572418800000_0 of table: transcript"}

    Operations FAQ

    This page has a collection of frequently asked questions about operations with answers from the community.

    circle-info

This is a list of questions frequently asked in our troubleshooting channel on Slack. To contribute additional questions and answers, make a pull request.

    hashtag
    Memory

    hashtag
    How much heap should I allocate for my Pinot instances?

    Typically, Apache Pinot components try to use as much off-heap (MMAP/DirectMemory) wherever possible. For example, Pinot servers load segments in memory-mapped files in MMAP mode (recommended), or direct memory in HEAP mode. Heap memory is used mostly for query execution and storing some metadata. We have seen production deployments with high throughput and low-latency work well with just 16 GB of heap for Pinot servers and brokers. The Pinot controller may also cache some metadata (table configurations etc) in heap, so if there are just a few tables in the Pinot cluster, a few GB of heap should suffice.

    hashtag
    DR

    hashtag
    Does Pinot provide any backup/restore mechanism?

Pinot relies on deep storage for storing a backup copy of segments (offline as well as real-time), and on ZooKeeper for metadata (table configurations, schemas, cluster state, and so on). It does not explicitly provide tools to back up or restore this data, but relies on the deep storage (ADLS/S3/GCS/etc.) and ZooKeeper to persist it.

    hashtag
    Alter Table

    hashtag
    Can I change a column name in my table, without losing data?

Changing a column name or data type is considered a backward-incompatible change. While Pinot supports schema evolution for backward-compatible changes, it does not support backward-incompatible changes such as changing the name or data type of a column.

    hashtag
    How to change number of replicas of a table?

You can change the number of replicas by updating the table configuration's segmentsConfig section. Make sure you have at least as many servers as the replication.

For offline tables, update replication:

For real-time tables, update replicasPerPartition:

After changing the replication, run a table rebalance.

Note that if you are using replica groups, these configurations are expected to equal numReplicaGroups. If they do not match, Pinot will use numReplicaGroups.

    hashtag
    How to set or change table retention?

By default, there is no retention set for a table in Apache Pinot. You may, however, set retention with the following properties in the segmentsConfig section of the table config:

    • retentionTimeUnit

    • retentionTimeValue

Updating the retention value in the table config is sufficient; there is no need to rebalance the table or reload its segments.
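For illustration, a segmentsConfig block with a 30-day retention might look like the following; the table name, time column, and values are illustrative, so set the unit and value to match your data:

```json
{
  "tableName": "myTable",
  "tableType": "OFFLINE",
  "segmentsConfig": {
    "timeColumnName": "timestampInEpoch",
    "retentionTimeUnit": "DAYS",
    "retentionTimeValue": "30",
    "replication": "1"
  }
}
```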

    hashtag
    Rebalance

    hashtag
    How to run a rebalance on a table?

See Rebalance.

    hashtag
    Why does my real-time table not use the new nodes I added to the cluster?

    Likely explanation: num partitions * num replicas < num servers.

In real-time tables, segments of the same partition always remain on the same node. This sticky assignment is needed for replica groups and is critical when using upserts. For instance, if you have 3 partitions, 1 replica, and 4 nodes, only 3 of the 4 nodes will be used: all of p0's segments will be on one node, p1's on another, and p2's on a third. One server will remain unused, even through rebalances.
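A minimal sketch of why this happens, using an illustrative round-robin spray (not Pinot's actual assignment code):

```python
# Sketch of sticky partition-to-server assignment for a real-time table:
# each stream partition's segments live on a fixed set of servers, so with
# 3 partitions, 1 replica, and 4 servers one server stays idle. This is an
# illustrative round-robin spray, not Pinot's actual assignment code.

def assign(num_partitions, num_replicas, num_servers):
    assignment, s = {}, 0
    for p in range(num_partitions):
        assignment[f"p{p}"] = [f"S{(s + r) % num_servers}"
                               for r in range(num_replicas)]
        s = (s + num_replicas) % num_servers
    return assignment

a = assign(num_partitions=3, num_replicas=1, num_servers=4)
used = {srv for replicas in a.values() for srv in replicas}
# a == {"p0": ["S0"], "p1": ["S1"], "p2": ["S2"]}; "S3" is never used
```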

There's nothing to be done about CONSUMING segments; they will continue to use only 3 nodes if you have 3 partitions. But you can rebalance so that completed segments use all nodes. If you want to force the completed segments of the table to use the new server, use this config:

    hashtag
    Segments

    hashtag
    How to control the number of segments generated?

The number of segments generated depends on the number of input files. If you provide only one input file, you will get one segment. If you break the input up into multiple files, you will get as many segments as there are input files.

    hashtag
What are the common reasons my segment is in a BAD state?

This typically happens when the server is unable to load the segment. Possible causes include running out of memory, running out of disk space, or being unable to download the segment from the deep store. Check the server logs for more information.

    hashtag
    How to reset a segment when it runs into a BAD state?

    Use the segment reset controller REST API to reset the segment:

    hashtag
    How do I pause real-time ingestion?

Refer to Pause Stream Ingestion.

    hashtag
    What's the difference between Reset, Refresh, and Reload?

    • Reset: Gets a segment in ERROR state back to ONLINE or CONSUMING state. Behind the scenes, the Pinot controller takes the segment to the OFFLINE state, waits for External View to stabilize, and then moves it back to ONLINE or CONSUMING state, thus effectively resetting segments or consumers in error states.
• Refresh: Replaces the segment with a new one, with the same name but often different data. Under the hood, the Pinot controller sets new segment metadata in ZooKeeper and notifies brokers and servers to check their local state for this segment and update accordingly. Servers also download the new segment to replace the old one when the checksums differ. There is no separate REST API for refreshing; it is done as part of the SegmentUpload API.

• Reload: Loads the segment again, often to generate a new index as updated in the table configuration. Under the hood, the Pinot server gets the new table configuration from ZooKeeper and uses it to guide the segment reload. In fact, the last step of REFRESH as explained above is to load the segment into memory to serve queries. There is a dedicated REST API for reloading. By default it doesn't download segments, but an option is provided to force the server to download the segment to replace the local one cleanly.

In addition, RESET brings the segment OFFLINE temporarily, while REFRESH and RELOAD swap the segment on the server atomically, without taking the segment offline or affecting ongoing queries.

    hashtag
    Tenants

    hashtag
    How can I make brokers/servers join the cluster without the DefaultTenant tag?

    Set this property in your controller.conf file:

    Now your brokers and servers should join the cluster as broker_untagged and server_untagged. You can then directly use the POST /tenants API to create the desired tenants, as in the following:

    hashtag
    Minion

    hashtag
    How do I tune minion task timeout and parallelism on each worker?

There are two task configurations, but they are set as part of the cluster configuration, as in the following example. One controls the task's overall timeout (1 hour by default) and the other sets how many tasks can run concurrently on a single minion worker (1 by default). Here, <taskType> is the task to tune, such as MergeRollupTask or RealtimeToOfflineSegmentsTask.

    hashtag
How do I manually run a Periodic Task?

See Running a Periodic Task Manually.

    hashtag
    Tuning and Optimizations

    hashtag
    Do replica groups work for real-time?

Yes, replica groups work for real-time tables. There are two parts to enabling replica groups:

    1. Replica groups segment assignment.

    2. Replica group query routing.

    Replica group segment assignment

Replica group segment assignment is achieved in real-time tables if the number of servers is a multiple of the number of replicas. The partitions get uniformly sprayed across the servers, creating replica groups. For example, consider 6 partitions, 2 replicas, and 4 servers.

Partition   r1   r2
p1          S0   S1
p2          S2   S3
p3          S0   S1
p4          S2   S3
p5          S0   S1
p6          S2   S3

As you can see, the set (S0, S2) contains r1 of every partition, and (S1, S3) contains r2 of every partition. A query will be routed to only one of the sets rather than spanning every server. If you are adding or removing servers from an existing table setup, you have to run a rebalance for the segment assignment changes to take effect.
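The layout above can be reproduced with a small sketch of the uniform spray; this illustrates the idea only and is not Pinot's actual assignment logic.

```python
# Sketch of replica-group assignment: spraying 6 partitions x 2 replicas
# over 4 servers (a multiple of the replica count) yields two replica
# groups, (S0, S2) and (S1, S3). Illustrative only, not Pinot's code.

def assign(num_partitions, num_replicas, num_servers):
    assignment, s = {}, 0
    for p in range(1, num_partitions + 1):
        assignment[f"p{p}"] = [f"S{(s + r) % num_servers}"
                               for r in range(num_replicas)]
        s = (s + num_replicas) % num_servers
    return assignment

a = assign(num_partitions=6, num_replicas=2, num_servers=4)
replica_group_1 = {replicas[0] for replicas in a.values()}  # r1 of each partition
replica_group_2 = {replicas[1] for replicas in a.values()}  # r2 of each partition
# replica_group_1 == {"S0", "S2"}; replica_group_2 == {"S1", "S3"}
```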

    Replica group query routing

Once replica group segment assignment is in effect, query routing can take advantage of it. For replica-group-based query routing, set the following in the table config's routing section, and then restart the brokers:

    hashtag
    Overwrite index configs at tier level

When using tiered storage, you may want different encoding and index types for a column in different tiers, to balance query latency and cost savings more flexibly. For example, segments in the hot tier can use dictionary encoding, bloom filters, and all relevant index types for very fast query execution. But for segments in the cold tier, where cost saving matters more than low query latency, you may want to use raw values and bloom filters only.

The following two examples show how to overwrite encoding types and index configs for tiers. Similar changes are also demonstrated in the MultiDirQuickStart example.

1. Overwriting single-column index configs using fieldConfigList. All top-level fields in the FieldConfig class can be overwritten; fields not overwritten are kept intact.

2. Overwriting star-tree index configurations using tableIndexConfig. The starTreeIndexConfigs is overwritten as a whole. In fact, all top-level fields defined in the IndexingConfig class can be overwritten, so single-column index configs defined in tableIndexConfig can also be overwritten, but this is less clear than using fieldConfigList.

    hashtag
    Credential

    hashtag
    How do I update credentials for real-time upstream without downtime?

1. Pause the stream ingestion.

    2. Wait for the pause status to change to success.

3. Update the credential in the table config.

4. Resume the consumption.
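Assuming the controller's pause/resume consumption endpoints (verify the exact paths in your version's Swagger UI), the sequence might look like the following; myTable and the config file name are placeholders:

```
# 1. Pause the stream ingestion
curl -X POST "http://localhost:9000/tables/myTable/pauseConsumption"

# 2. Poll until the pause status reports success
curl "http://localhost:9000/tables/myTable/pauseStatus"

# 3. Update the table config with the new credential
curl -X PUT "http://localhost:9000/tables/myTable" \
  -H "Content-Type: application/json" \
  -d @table-config-with-new-credential.json

# 4. Resume the consumption
curl -X POST "http://localhost:9000/tables/myTable/resumeConsumption"
```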

    Quick Start Examples

    This section describes quick start commands that launch all Pinot components in a single process.

Pinot ships with QuickStart commands that launch Pinot components in a single process and import pre-built datasets. These quick start examples are a good place to begin if you're just getting started with Pinot. The examples start with Batch Processing, after the following notes:

    • Prerequisites

      You must have either installed Pinot locally or have Docker installed if you want to use the Pinot Docker image. The examples are available in each option and work the same. The decision of which to choose depends on your installation preference and how you generally like to work. If you don't know which to choose, using Docker will make your cleanup easier after you are done with the examples.

    • Pinot versions in examples

      The Docker-based examples on this page use pinot:latest, which instructs Docker to pull and use the most recent release of Apache Pinot. If you prefer to use a specific release instead, you can designate it by replacing latest with the release number, like this: pinot:0.12.1.

      The local install-based examples that are run using the launcher scripts will use the Apache Pinot version you installed.

    • Running examples with Docker on a Mac with an M1 or M2 CPU

      Add the -arm64 suffix to the run commands, like this:

    • Stopping a running example

      To stop a running example, enter Ctrl+C in the same terminal where you ran the docker run command to start the example.

    circle-exclamation

    macOS Monterey Users

By default, the AirPlay Receiver server runs on port 7000, which is also the port used by the Pinot server in the quick start. You may see the following error when running these examples:

If you disable the AirPlay Receiver server and try again, you shouldn't see this error message anymore.

    hashtag
    Batch Processing

    This example demonstrates how to do batch processing with Pinot. The command:

    • Starts Apache Zookeeper, Pinot Controller, Pinot Broker, and Pinot Server.

    • Creates the baseballStats table

    • Launches a standalone data ingestion job that builds one segment for a given CSV data file for the baseballStats table and pushes the segment to the Pinot Controller.
• Issues sample queries to Pinot

    hashtag
    Batch JSON

    This example demonstrates how to import and query JSON documents in Pinot. The command:

    • Starts Apache Zookeeper, Pinot Controller, Pinot Broker, and Pinot Server.

    • Creates the githubEvents table

    • Launches a standalone data ingestion job that builds one segment for a given JSON data file for the githubEvents table and pushes the segment to the Pinot Controller.
• Issues sample queries to Pinot

    hashtag
    Batch with complex data types

This example demonstrates how to do batch processing in Pinot where the data items have complex fields that need to be unnested. The command:

    • Starts Apache Zookeeper, Pinot Controller, Pinot Broker, and Pinot Server.

    • Creates the githubEvents table

    • Launches a standalone data ingestion job that builds one segment for a given JSON data file for the githubEvents table and pushes the segment to the Pinot Controller.
• Issues sample queries to Pinot

    hashtag
    Streaming

    This example demonstrates how to do stream processing with Pinot. The command:

    • Starts Apache Kafka, Apache Zookeeper, Pinot Controller, Pinot Broker, and Pinot Server.

    • Creates meetupRsvp table

    • Launches a meetup stream
• Publishes data to a Kafka topic meetupRSVPEvents that is subscribed to by Pinot.

• Issues sample queries to Pinot

    hashtag
    Streaming JSON

    This example demonstrates how to do stream processing with JSON documents in Pinot. The command:

    • Starts Apache Kafka, Apache Zookeeper, Pinot Controller, Pinot Broker, and Pinot Server.

    • Creates meetupRsvp table

    • Launches a meetup stream
• Publishes data to a Kafka topic meetupRSVPEvents that is subscribed to by Pinot.

• Issues sample queries to Pinot

    hashtag
    Streaming with minion cleanup

    This example demonstrates how to do stream processing in Pinot with RealtimeToOfflineSegmentsTask and MergeRollupTask minion tasks continuously optimizing segments as data gets ingested. The command:

    • Starts Apache Kafka, Apache Zookeeper, Pinot Controller, Pinot Broker, Pinot Minion, and Pinot Server.

    • Creates githubEvents table

    • Launches a GitHub events stream
• Publishes data to a Kafka topic githubEvents that is subscribed to by Pinot.

• Issues sample queries to Pinot

    hashtag
    Streaming with complex data types

    This example demonstrates how to do stream processing in Pinot where the stream contains items that have complex fields that need to be unnested. The command:

    • Starts Apache Kafka, Apache Zookeeper, Pinot Controller, Pinot Broker, Pinot Minion, and Pinot Server.

    • Creates meetupRsvp table

    • Launches a meetup stream
• Publishes data to a Kafka topic meetupRSVPEvents that is subscribed to by Pinot.

• Issues sample queries to Pinot

    hashtag
    Upsert

This example demonstrates how to do stream processing with upsert with Pinot. The command:

    • Starts Apache Kafka, Apache Zookeeper, Pinot Controller, Pinot Broker, and Pinot Server.

    • Creates meetupRsvp table

    • Launches a meetup stream
• Publishes data to a Kafka topic meetupRSVPEvents that is subscribed to by Pinot.

• Issues sample queries to Pinot

    hashtag
    Upsert JSON

This example demonstrates how to do stream processing with upsert with JSON documents in Pinot. The command:

    • Starts Apache Kafka, Apache Zookeeper, Pinot Controller, Pinot Broker, and Pinot Server.

    • Creates meetupRsvp table

    • Launches a meetup stream
• Publishes data to a Kafka topic meetupRSVPEvents that is subscribed to by Pinot.

• Issues sample queries to Pinot

    hashtag
    Hybrid

    This example demonstrates how to do hybrid stream and batch processing with Pinot. The command:

    1. Starts Apache Kafka, Apache Zookeeper, Pinot Controller, Pinot Broker, and Pinot Server.

    2. Creates airlineStats table

    3. Launches a standalone data ingestion job that builds segments under a given directory of Avro files for the airlineStats table and pushes the segments to the Pinot Controller.
4. Launches a stream of flight stats.

5. Publishes data to a Kafka topic airlineStatsEvents that is subscribed to by Pinot.

6. Issues sample queries to Pinot

    hashtag
    Join

This example demonstrates how to do joins in Pinot using the Lookup UDF. The command:

    • Starts Apache Zookeeper, Pinot Controller, Pinot Broker, and Pinot Server in the same container.

    • Creates the baseballStats table

    • Launches a data ingestion job that builds one segment for a given CSV data file for the baseballStats table and pushes the segment to the Pinot Controller.
• Creates the dimBaseballTeams table

• Launches a data ingestion job that builds one segment for a given CSV data file for the dimBaseballStats table and pushes the segment to the Pinot Controller.

• Issues sample queries to Pinot





    { 
        "tableName": "pinotTable", 
        "tableType": "OFFLINE", 
        "segmentsConfig": {
          "replication": "3", 
          ... 
        }
        ..
    { 
        "tableName": "pinotTable", 
        "tableType": "REALTIME", 
        "segmentsConfig": {
          "replicasPerPartition": "3", 
          ... 
        }
        ..
    "instanceAssignmentConfigMap": {
          "COMPLETED": {
            "tagPoolConfig": {
              "tag": "DefaultTenant_OFFLINE"
            },
            "replicaGroupPartitionConfig": {
            }
          }
        },
    curl -X POST "{host}/segments/{tableNameWithType}/{segmentName}/reset"
    cluster.tenant.isolation.enable=false
curl -X POST "http://localhost:9000/tenants" \
  -H "accept: application/json" \
  -H "Content-Type: application/json" \
  -d "{\"tenantRole\":\"BROKER\",\"tenantName\":\"foo\",\"numberOfInstances\":1}"
    Use the POST /cluster/configs API on the CLUSTER tab in Swagger with this payload:
    {
    	"<taskType>.timeoutMs": "600000",
    	"<taskType>.numConcurrentTasksPerInstance": "4"
    }
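As a concrete illustration, with the `<taskType>` placeholder filled in (substitute a Minion task type actually configured in your cluster; RealtimeToOfflineSegmentsTask is used here as an example), the payload would look like:

```json
{
    "RealtimeToOfflineSegmentsTask.timeoutMs": "600000",
    "RealtimeToOfflineSegmentsTask.numConcurrentTasksPerInstance": "4"
}
```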
    {
        "tableName": "pinotTable", 
        "tableType": "REALTIME",
        "routing": {
            "instanceSelectorType": "replicaGroup"
        }
        ..
    }
    {
      ...
      "fieldConfigList": [    
        {
          "name": "ArrTimeBlk",
          "encodingType": "DICTIONARY",
          "indexes": {
            "inverted": {
              "enabled": "true"
            }
          },
          "tierOverwrites": {
            "hotTier": {
              "encodingType": "DICTIONARY",
              "indexes": { // change index types for this tier
                "bloom": {
                  "enabled": "true"
                }
              }
            },
            "coldTier": {
              "encodingType": "RAW", // change encoding type for this tier
              "indexes": { } // remove all indexes
            }
          }
        }
      ],
      "tableIndexConfig": {
        "starTreeIndexConfigs": [
          {
            "dimensionsSplitOrder": [
              "AirlineID",
              "Origin",
              "Dest"
            ],
            "skipStarNodeCreationForDimensions": [],
            "functionColumnPairs": [
              "COUNT__*",
              "MAX__ArrDelay"
            ],
            "maxLeafRecords": 10
          }
        ],
    ...
        "tierOverwrites": {
          "hotTier": {
            "starTreeIndexConfigs": [ // create different STrTree index on this tier
              {
                "dimensionsSplitOrder": [
                  "Carrier",
                  "CancellationCode",
                  "Origin",
                  "Dest"
                ],
                "skipStarNodeCreationForDimensions": [],
                "functionColumnPairs": [
                  "MAX__CarrierDelay",
                  "AVG__CarrierDelay"
                ],
                "maxLeafRecords": 10
              }
            ]
          },
          "coldTier": {
            "starTreeIndexConfigs": [] // removes ST index for this tier
          }
        }
      },
     ...
    docker run \
        -p 9000:9000 \
        apachepinot/pinot:latest-arm64 QuickStart \
        -type batch
    Failed to start a Pinot [SERVER]
    java.lang.RuntimeException: java.net.BindException: Address already in use
    	at org.apache.pinot.core.transport.QueryServer.start(QueryServer.java:103) ~[pinot-all-0.9.0-jar-with-dependencies.jar:0.9.0-cf8b84e8b0d6ab62374048de586ce7da21132906]
    	at org.apache.pinot.server.starter.ServerInstance.start(ServerInstance.java:158) ~[pinot-all-0.9.0-jar-with-dependencies.jar:0.9.0-cf8b84e8b0d6ab62374048de586ce7da21132906]
    	at org.apache.helix.manager.zk.ParticipantManager.handleNewSession(ParticipantManager.java:110) ~[pinot-all-0.9.0-jar-with-dependencies.jar:0.9.0-cf8b84e8b0d6ab62374048de586ce7da2113
    docker run \
        -p 9000:9000 \
        apachepinot/pinot:latest QuickStart \
        -type batch
    ./bin/pinot-admin.sh QuickStart -type batch
    docker run \
        -p 9000:9000 \
        apachepinot/pinot:latest QuickStart \
        -type batch_json_index
    ./bin/pinot-admin.sh QuickStart -type batch_json_index
    docker run \
        -p 9000:9000 \
        apachepinot/pinot:latest QuickStart \
        -type batch_complex_type
    ./bin/pinot-admin.sh QuickStart -type batch_complex_type
    docker run \
        -p 9000:9000 \
        apachepinot/pinot:latest QuickStart \
        -type stream
    ./bin/pinot-admin.sh QuickStart -type stream
    docker run \
        -p 9000:9000 \
        apachepinot/pinot:latest QuickStart \
        -type stream_json_index
    ./bin/pinot-admin.sh QuickStart -type stream_json_index
    docker run \
        -p 9000:9000 \
        apachepinot/pinot:latest QuickStart \
        -type realtime_minion
    ./bin/pinot-admin.sh QuickStart -type realtime_minion
    docker run \
        -p 9000:9000 \
        apachepinot/pinot:latest QuickStart \
        -type stream_complex_type
    ./bin/pinot-admin.sh QuickStart -type stream_complex_type
    docker run \
        -p 9000:9000 \
        apachepinot/pinot:latest QuickStart \
        -type upsert
    ./bin/pinot-admin.sh QuickStart -type upsert
    docker run \
        -p 9000:9000 \
        apachepinot/pinot:latest QuickStart \
        -type upsert_json_index
    ./bin/pinot-admin.sh QuickStart -type upsert_json_index
    docker run \
        -p 9000:9000 \
        apachepinot/pinot:latest QuickStart \
        -type hybrid
    ./bin/pinot-admin.sh QuickStart -type hybrid
    docker run \
        -p 9000:9000 \
        apachepinot/pinot:latest QuickStart \
        -type join
    ./bin/pinot-admin.sh QuickStart -type join

    Running in Kubernetes

    Pinot quick start in Kubernetes

    Get started running Pinot in Kubernetes.

    circle-info

    Note: The examples in this guide are sample configurations to be used as a reference. For a production setup, customize them to your needs.

    hashtag

    Prerequisites

    hashtag
    Kubernetes

    This guide assumes that you already have a running Kubernetes cluster.

    If you haven't yet set up a Kubernetes cluster, see the links below for instructions:

    • Enable Kubernetes on Docker-Desktoparrow-up-right

    • Install Minikube for local setuparrow-up-right

      • Make sure to run with enough resources: minikube start --vm=true --cpus=4 --memory=8g --disk-size=50g

    hashtag
    Pinot

    Make sure that you've downloaded Apache Pinot. The scripts for the setup in this guide can be found in our open source project on GitHubarrow-up-right.

    # checkout pinot
    git clone https://github.com/apache/pinot.git
    cd pinot/helm/pinot

    hashtag
    Set up a Pinot cluster in Kubernetes

    hashtag
    Start Pinot with Helm

    The Pinot repository has pre-packaged Helm charts for Pinot and Presto. The Helm repository index file is herearrow-up-right.

    Note: Specify StorageClass based on your cloud vendor. Don't mount a blob store (such as AzureFile, GoogleCloudStorage, or S3) as the data serving file system. Use only Amazon EBS/GCP Persistent Disk/Azure Disk-style disks.

    • For AWS: "gp2"

    • For GCP: "pd-ssd" or "standard"

    • For Azure: "AzureDisk"

    • For Docker-Desktop: "hostpath"
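
    As a sketch of where this is set (the exact value paths may differ across chart versions; verify against your chart), the storage class can be supplied via the Helm values, for example on AWS:

```yaml
# Hypothetical values override for AWS; verify the key names against
# your chart version with `helm inspect values pinot/pinot`.
controller:
  persistence:
    storageClass: "gp2"
server:
  persistence:
    storageClass: "gp2"
```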

    1.1.1 Update Helm dependency

    1.1.2 Start Pinot with Helm

    For Helm v2.12.1:

    If your Kubernetes cluster is recently provisioned, ensure Helm is initialized by running:

    Then deploy a new HA Pinot cluster using the following command:

    For Helm v3.0.0:

    hashtag
    Check Pinot deployment status

    hashtag
    Load data into Pinot using Kafka

    hashtag
    Bring up a Kafka cluster for real-time data ingestion

    hashtag
    Check Kafka deployment status

    Ensure the Kafka deployment is ready before executing the scripts in the following steps. Run the following command:

    Below is an example output showing the deployment is ready:

    hashtag
    Create Kafka topics

    Run the scripts below to create two Kafka topics for data ingestion:

    hashtag
    Load data into Kafka and create Pinot schema/tables

    The script below does the following:

    • Ingests 19492 JSON messages to Kafka topic flights-realtime at a speed of 1 msg/sec

    • Ingests 19492 Avro messages to Kafka topic flights-realtime-avro at a speed of 1 msg/sec

    • Uploads Pinot schema airlineStats

    • Creates Pinot table airlineStats to ingest data from JSON encoded Kafka topic flights-realtime

    • Creates Pinot table airlineStatsAvro to ingest data from Avro encoded Kafka topic flights-realtime-avro

    hashtag
    Query with the Pinot Data Explorer

    hashtag
    Pinot Data Explorer

    The script below, located at ./pinot/helm/pinot, performs local port forwarding and opens the Pinot query console in your default web browser.

    hashtag
    Query Pinot with Superset

    hashtag
    Bring up Superset using Helm

    1. Install the Superset Helm repository:

    2. Get the Helm values configuration file:

    3. For Superset to install Pinot dependencies, edit the /tmp/superset-values.yaml file to add a pinotdb pip dependency to the bootstrapScript field.

    4. Alternatively, build your own image with this dependency or use the image apachepinot/pinot-superset:latest instead.

    5. Replace the default admin credentials inside the init section with a meaningful user profile and a stronger password.

    6. Install Superset using Helm:

    7. Ensure your cluster is up by running:

    hashtag
    Access the Superset UI

    1. Run the below command to port forward Superset to localhost:18088.

    2. Navigate to Superset in your browser with the admin credentials you set in the previous section.

    3. Create a new database connection with the following URI: pinot+http://pinot-broker.pinot-quickstart:8099/query?controller=http://pinot-controller.pinot-quickstart:9000/

    4. Once the database is added, you can add more datasets and explore the dashboard options.

    hashtag
    Access Pinot with Trino

    hashtag
    Deploy Trino

    1. Deploy Trino with the Pinot plugin installed:

    2. See the charts in the Trino Helm chart repository:

    3. To connect Trino to Pinot, you'll need to add the Pinot catalog, which requires extra configuration. Run the below command to get all the configurable values.

    4. To add the Pinot catalog, edit the additionalCatalogs section by adding:

    circle-info

    Pinot is deployed in the pinot-quickstart namespace, so the controller service URL is pinot-controller.pinot-quickstart:9000

    5. After modifying the /tmp/trino-values.yaml file, deploy Trino with:

    6. Once you've deployed Trino, check the deployment status:

    hashtag
    Query Pinot with the Trino CLI

    Once Trino is deployed, run the below command to get a runnable Trino CLI.

    1. Download the Trino CLI:

    2. Port forward the Trino service to your local machine if it's not already exposed:

    3. Use the Trino console client to connect to the Trino service:

    4. Query Pinot data using the Trino CLI, like in the sample queries below.

    hashtag
    Sample queries to execute

    hashtag
    List all catalogs

    hashtag
    List all tables

    hashtag
    Show schema

    hashtag
    Count total documents

    hashtag
    Access Pinot with Presto

    hashtag
    Deploy Presto with the Pinot plugin

    1. First, deploy Presto with default configurations:

    2. To customize your deployment, run the below command to get all the configurable values.

    3. After modifying the /tmp/presto-values.yaml file, deploy Presto:

    4. Once you've deployed the Presto instance, check the deployment status:

    Sample Output of K8s Deployment Status

    hashtag
    Query Presto using the Presto CLI

    Once Presto is deployed, you can run the below command from herearrow-up-right, or follow the steps below.

    1. Download the Presto CLI:

    2. Port forward presto-coordinator port 8080 to localhost port 18080:

    3. Start the Presto CLI with the Pinot catalog:

    4. Query Pinot data with the Presto CLI, like in the sample queries below.

    hashtag
    Sample queries to execute

    hashtag
    List all catalogs

    hashtag
    List all tables

    hashtag
    Show schema

    hashtag
    Count total documents

    hashtag
    Delete a Pinot cluster in Kubernetes

    To delete your Pinot cluster in Kubernetes, run the following command:

    helm repo add pinot https://raw.githubusercontent.com/apache/pinot/master/helm
    kubectl create ns pinot-quickstart
    helm install pinot pinot/pinot \
        -n pinot-quickstart \
        --set cluster.name=pinot \
        --set server.replicaCount=2
    helm install presto pinot/presto -n pinot-quickstart
    kubectl apply -f presto-coordinator.yaml
    kubectl get all -n pinot-quickstart
    helm repo add kafka https://charts.bitnami.com/bitnami
    helm install -n pinot-quickstart kafka kafka/kafka --set replicas=1,zookeeper.image.tag=latest
    kubectl get all -n pinot-quickstart | grep kafka
    pod/kafka-0                                                 1/1     Running     0          2m
    pod/kafka-zookeeper-0                                       1/1     Running     0          10m
    pod/kafka-zookeeper-1                                       1/1     Running     0          9m
    pod/kafka-zookeeper-2                                       1/1     Running     0          8m
    kubectl -n pinot-quickstart exec kafka-0 -- kafka-topics.sh --bootstrap-server kafka-0:9092 --topic flights-realtime --create --partitions 1 --replication-factor 1
    kubectl -n pinot-quickstart exec kafka-0 -- kafka-topics.sh --bootstrap-server kafka-0:9092 --topic flights-realtime-avro --create --partitions 1 --replication-factor 1
    kubectl apply -f pinot/pinot-realtime-quickstart.yml
    ./query-pinot-data.sh
    helm repo add superset https://apache.github.io/superset
    helm inspect values superset/superset > /tmp/superset-values.yaml
    kubectl create ns superset
    helm upgrade --install --values /tmp/superset-values.yaml superset superset/superset -n superset
    kubectl get all -n superset
    kubectl port-forward service/superset 18088:8088 -n superset
    helm repo add trino https://trinodb.github.io/charts/
    helm search repo trino
    helm inspect values trino/trino > /tmp/trino-values.yaml
    additionalCatalogs:
      pinot: |
        connector.name=pinot
        pinot.controller-urls=pinot-controller.pinot-quickstart:9000
    kubectl create ns trino-quickstart
    helm install my-trino trino/trino --version 0.2.0 -n trino-quickstart --values /tmp/trino-values.yaml
    kubectl get pods -n trino-quickstart
    curl -L https://repo1.maven.org/maven2/io/trino/trino-cli/363/trino-cli-363-executable.jar -o /tmp/trino && chmod +x /tmp/trino
    echo "Visit http://127.0.0.1:18080 to use your application"
    kubectl port-forward service/my-trino 18080:8080 -n trino-quickstart
    /tmp/trino --server localhost:18080 --catalog pinot --schema default
    trino:default> show catalogs;
      Catalog
    ---------
     pinot
     system
     tpcds
     tpch
    (4 rows)
    
    Query 20211025_010256_00002_mxcvx, FINISHED, 2 nodes
    Splits: 36 total, 36 done (100.00%)
    0.70 [0 rows, 0B] [0 rows/s, 0B/s]
    trino:default> show tables;
        Table
    --------------
     airlinestats
    (1 row)
    
    Query 20211025_010326_00003_mxcvx, FINISHED, 3 nodes
    Splits: 36 total, 36 done (100.00%)
    0.28 [1 rows, 29B] [3 rows/s, 104B/s]
    trino:default> DESCRIBE airlinestats;
            Column        |      Type      | Extra | Comment
    ----------------------+----------------+-------+---------
     flightnum            | integer        |       |
     origin               | varchar        |       |
     quarter              | integer        |       |
     lateaircraftdelay    | integer        |       |
     divactualelapsedtime | integer        |       |
     divwheelsons         | array(integer) |       |
     divwheelsoffs        | array(integer) |       |
    ......
    
    Query 20211025_010414_00006_mxcvx, FINISHED, 3 nodes
    Splits: 36 total, 36 done (100.00%)
    0.37 [79 rows, 5.96KB] [212 rows/s, 16KB/s]
    trino:default> select count(*) as cnt from airlinestats limit 10;
     cnt
    ------
     9746
    (1 row)
    
    Query 20211025_015607_00009_mxcvx, FINISHED, 2 nodes
    Splits: 17 total, 17 done (100.00%)
    0.24 [1 rows, 9B] [4 rows/s, 38B/s]
    helm inspect values pinot/presto > /tmp/presto-values.yaml
    helm install presto pinot/presto -n pinot-quickstart --values /tmp/presto-values.yaml
    kubectl get pods -n pinot-quickstart
    ./pinot-presto-cli.sh
    curl -L https://repo1.maven.org/maven2/com/facebook/presto/presto-cli/0.246/presto-cli-0.246-executable.jar -o /tmp/presto-cli && chmod +x /tmp/presto-cli
    kubectl port-forward service/presto-coordinator 18080:8080 -n pinot-quickstart > /dev/null &
    /tmp/presto-cli --server localhost:18080 --catalog pinot --schema default
    presto:default> show catalogs;
     Catalog
    ---------
     pinot
     system
    (2 rows)
    
    Query 20191112_050827_00003_xkm4g, FINISHED, 1 node
    Splits: 19 total, 19 done (100.00%)
    0:01 [0 rows, 0B] [0 rows/s, 0B/s]
    presto:default> show tables;
        Table
    --------------
     airlinestats
    (1 row)
    
    Query 20191112_050907_00004_xkm4g, FINISHED, 1 node
    Splits: 19 total, 19 done (100.00%)
    0:01 [1 rows, 29B] [1 rows/s, 41B/s]
    presto:default> DESCRIBE pinot.dontcare.airlinestats;
            Column        |  Type   | Extra | Comment
    ----------------------+---------+-------+---------
     flightnum            | integer |       |
     origin               | varchar |       |
     quarter              | integer |       |
     lateaircraftdelay    | integer |       |
     divactualelapsedtime | integer |       |
    ......
    
    Query 20191112_051021_00005_xkm4g, FINISHED, 1 node
    Splits: 19 total, 19 done (100.00%)
    0:02 [80 rows, 6.06KB] [35 rows/s, 2.66KB/s]
    presto:default> select count(*) as cnt from pinot.dontcare.airlinestats limit 10;
     cnt
    ------
     9745
    (1 row)
    
    Query 20191112_051114_00006_xkm4g, FINISHED, 1 node
    Splits: 17 total, 17 done (100.00%)
    0:00 [1 rows, 8B] [2 rows/s, 19B/s]
    kubectl delete ns pinot-quickstart
    1.1.3 Troubleshooting (for Helm v2.12.1)

    If you see the error below:

    Run the following:

    If you encounter a permission issue, like the following:

    Error: release pinot failed: namespaces "pinot-quickstart" is forbidden: User "system:serviceaccount:kube-system:default" cannot get resource "namespaces" in API group "" in the namespace "pinot-quickstart"

    Run the command below:

    helm dependency update
    helm init --service-account tiller
    helm install --namespace "pinot-quickstart" --name "pinot" pinot
    kubectl create ns pinot-quickstart
    helm install -n pinot-quickstart pinot ./pinot
    Error: could not find tiller.
    kubectl -n kube-system delete deployment tiller-deploy
    kubectl -n kube-system delete service/tiller-deploy
    helm init --service-account tiller
    kubectl apply -f helm-rbac.yaml