
Getting Started

This section contains quick start guides to help you get up and running with Pinot.

Running Pinot

To simplify the getting started experience, Pinot ships with quick start guides that launch Pinot components in a single process and import pre-built datasets.

For a full list of these guides, see Quick Start Examples.

- Running Pinot locally
- Running Pinot in Docker
- Running in Kubernetes
- Deploy to a public cloud

Data import examples

Getting data into Pinot is easy. Take a look at these two quick start guides, which will help you get up and running with sample data for offline and real-time tables.

Running Pinot locally

This quick start guide will help you bootstrap a Pinot standalone instance on your local machine.

In this guide, you'll learn how to download and install Apache Pinot as a standalone instance.

Download Apache Pinot
Set up a cluster

Download Apache Pinot

First, download the Pinot distribution for this tutorial. You can either download a packaged release or build a distribution from the source code.

Prerequisites

Install JDK 11 or higher (JDK 16 is not yet supported).
For JDK 8 support, use Pinot 0.7.1 or compile from the source code.

Note that some installations of the JDK do not contain the JNI bindings necessary to run all tests. If you see an error like java.lang.UnsatisfiedLinkError while running tests, you might need to change your JDK.

If using Homebrew, install AdoptOpenJDK 11 using brew install --cask adoptopenjdk11.

Support for M1 and M2 Mac systems

Currently, Apache Pinot doesn't provide official binaries for M1 or M2 Macs. For instructions, see M1 and M2 Mac Support.

Download the distribution or build from source by selecting one of the following tabs:

Download the latest binary release from , or use this command:

Extract the TAR file:

Navigate to the directory containing the launcher scripts:

You can also find older versions of Apache Pinot at . For example, to download Pinot 0.10.0, run the following command:

Follow these steps to check out the code from GitHub and build Pinot locally:

M1 and M2 Mac Support

Currently, Apache Pinot doesn't provide official binaries for M1 or M2 Mac systems. Follow the instructions below to run on an M1 or M2 Mac:

Add the following to your ~/.m2/settings.xml:

Install Rosetta:

Set up a cluster

Now that we've downloaded Pinot, it's time to set up a cluster. There are two ways to do this: through quick start or through setting up a cluster manually.

Quick start

Pinot comes with quick start commands that launch instances of Pinot components in the same process and import pre-built datasets.

For example, the following quick start command launches Pinot with a baseball dataset pre-loaded:

For a list of all the available quick start commands, see Quick Start Examples.

Manual cluster

If you want to play with bigger datasets (more than a few megabytes), you can launch each component individually.

The video below is a step-by-step walkthrough for launching the individual components of Pinot and scaling them to multiple instances.

You can find the commands that are shown in this video in the .

The examples below assume that you are using Java 8.

If you are using Java 11+, remove the GC settings inside JAVA_OPTS. For example, instead of this:

Use the following:

Start Zookeeper

You can use to browse the Zookeeper instance.

Start Pinot Controller

Start Pinot Broker

Start Pinot Server

Start Kafka

Once your cluster is up and running, you can head over to to learn how to run queries against the data.
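As a quick connectivity check, you can also send SQL to the broker over HTTP. The following Python sketch uses only the standard library and the broker's /query/sql endpoint. The broker address (for example http://localhost:8099) is an assumption: use whatever port you started the broker on.

```python
import json
import urllib.request


def build_query_request(broker_url: str, sql: str) -> urllib.request.Request:
    """Build a POST request that sends one SQL statement to a Pinot broker."""
    payload = json.dumps({"sql": sql}).encode("utf-8")
    return urllib.request.Request(
        url=broker_url.rstrip("/") + "/query/sql",  # broker SQL endpoint
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_query(broker_url: str, sql: str, timeout: float = 10.0) -> dict:
    """Send the query and return the broker's decoded JSON response."""
    with urllib.request.urlopen(build_query_request(broker_url, sql), timeout=timeout) as resp:
        return json.load(resp)
```

For example, `run_query("http://localhost:8099", "SELECT COUNT(*) FROM baseballStats")` returns the broker response as a dictionary. The table name assumes the baseball quick start dataset. A manually started cluster is empty until you add a table.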

Start a Pinot component in debug mode with IntelliJ

Set breakpoints and inspect variables by starting a Pinot component in debug mode in IntelliJ.

The following example demonstrates server debugging:

First, start Zookeeper, the controller, and the broker using the steps described above.
Then, use the following configuration (under $PROJECT_DIR$\.run) to start the server, replacing the metrics-core version and cluster name as needed. This is an example of how to use it.

Running Pinot in Docker

This guide shows you how to run a Pinot cluster using Docker.

Get started setting up a Pinot cluster with Docker using the guide below.

Prerequisites:

Install Docker
Configure Docker memory with the following minimum resources:
- CPUs: 8
- Memory: 16.00 GB
- Swap: 4 GB

The latest Pinot Docker image is published at apachepinot/pinot:latest. View a list of .

Pull the latest Docker image onto your machine by running the following command:

To pull a specific version, modify the command like below:

Set up a cluster

Once you've downloaded the Pinot Docker image, it's time to set up a cluster. There are two ways to do this.

Quick start

Pinot comes with quick start commands that launch instances of Pinot components in the same process and import pre-built datasets.

For example, the following quick start command launches Pinot with a baseball dataset pre-loaded:

For a list of all available quick start commands, see Quick Start Examples.

Manual cluster

The quick start scripts launch Pinot with minimal resources. If you want to play with bigger datasets (more than a few MB), you can launch each of the Pinot components individually.

Note that these are sample configurations to be used as references. You will likely want to customize them to meet your needs for production use.

Docker

Create a Network

Create an isolated bridge network in Docker.

Start Zookeeper

Start Zookeeper in daemon mode. This is a single-node Zookeeper setup. Zookeeper is the central metadata store for Pinot and should be set up with replication for production use. For more information, see .

Start Pinot Controller

Start the Pinot Controller in daemon mode and connect it to Zookeeper.

The command below expects a container with 4GB of memory. Tune -Xms and -Xmx if your machine doesn't have enough resources.

Start Pinot Broker

Start the Pinot Broker in daemon mode and connect it to Zookeeper.

The command below expects a container with 4GB of memory. Tune -Xms and -Xmx if your machine doesn't have enough resources.

Start Pinot Server

Start the Pinot Server in daemon mode and connect it to Zookeeper.

The command below expects a container with 16GB of memory. Tune -Xms and -Xmx if your machine doesn't have enough resources.

Start Kafka

Optionally, you can also start Kafka for setting up real-time streams. This brings up the Kafka broker on port 9092.

All Pinot-related components are now running as an empty cluster.

Run the following command to check the container status:

Sample Console Output

Docker Compose

Create a file called docker-compose.yml that contains the following:

Run the following command to launch all the components:

Run the following command to check the container status:

Sample Console Output

Once your cluster is up and running, see to learn how to run queries against the data.

If you have or installed, you can also try running the .

Running on public clouds

This page links to multiple quick start guides for deploying Pinot to different public cloud providers.

These quickstart guides show you how to run an Apache Pinot cluster using Kubernetes on different public cloud providers.

Running on Azure

This quickstart guide helps you get started running Pinot on Microsoft Azure.

In this quickstart guide, you will set up a Kubernetes cluster on Azure Kubernetes Service (AKS).

1. Tooling Installation

1.1 Install Kubectl

Follow the official kubectl installation instructions to install kubectl.

For Mac users

Check the kubectl version after installation.

The quickstart scripts are tested with kubectl client version v1.16.3 and server version v1.13.12.

1.2 Install Helm

To install Helm, see .

For Mac users

Check the Helm version after installation.

This quickstart supports Helm v3.0.0 and v2.12.1. Pick the script based on your Helm version.

1.3 Install Azure CLI

Follow the official Azure CLI installation instructions to install the Azure CLI.

For Mac users

2. (Optional) Log in to your Azure account

This script opens your default browser so you can sign in to your Azure account.

3. (Optional) Create a Resource Group

Use the following script to create a resource group in the eastus location.

4. (Optional) Create a Kubernetes cluster (AKS) in Azure

This script creates a 3-node cluster named pinot-quickstart for demo purposes.

Modify the parameters in the following example command with your resource group and cluster details:

Once the command succeeds, the cluster is ready to be used.

5. Connect to an existing cluster

Run the following command to get the credential for the cluster pinot-quickstart that you just created:

To verify the connection, run the following:

6. Pinot quickstart

Follow this to deploy your Pinot demo.

7. Delete a Kubernetes Cluster

Running on GCP

This quickstart guide helps you get started running Pinot on Google Cloud Platform (GCP).

In this quickstart guide, you will set up a Kubernetes cluster on Google Kubernetes Engine (GKE).

1. Tooling Installation

Running on AWS

This quickstart guide helps you get started running Pinot on Amazon Web Services (AWS).

In this quickstart guide, you will set up a Kubernetes cluster on Amazon Elastic Kubernetes Service (Amazon EKS).

1. Tooling Installation

HDFS as Deep Storage

This guide shows how to set up HDFS as deep storage for Pinot segments.

To use HDFS as deep storage, you need to include the HDFS dependency jars and plugins.

Server Setup

Configuration

Executable

Controller Setup

Configuration

Executable

Broker Setup

Configuration

Executable

Troubleshooting

If you receive an error that says No FileSystem for scheme "hdfs", the problem is likely a class loading issue.

To fix it, try adding the following property to core-site.xml:

<property>
  <name>fs.hdfs.impl</name>
  <value>org.apache.hadoop.hdfs.DistributedFileSystem</value>
</property>

Then add /opt/pinot/lib/hadoop-common-<release-version>.jar to the classpath.
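The classpath step can be scripted. Below is a minimal Python sketch that finds the hadoop-common jar in a lib directory (default /opt/pinot/lib, as in the path above) and prepends it to an existing classpath string. The helper name is hypothetical. How you pass the result to the JVM, for example through an environment variable your start script reads, depends on your setup.

```python
import glob
import os


def hadoop_common_classpath(lib_dir: str = "/opt/pinot/lib", existing: str = "") -> str:
    """Return a classpath string with the hadoop-common jar(s) from lib_dir prepended."""
    jars = sorted(glob.glob(os.path.join(lib_dir, "hadoop-common-*.jar")))
    if not jars:
        raise FileNotFoundError(f"no hadoop-common-*.jar found in {lib_dir}")
    # os.pathsep is ":" on Linux/macOS, the separator the JVM expects there.
    entries = jars + ([existing] if existing else [])
    return os.pathsep.join(entries)
```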

Frequently Asked Questions (FAQs)

This page links to collections of frequently asked questions, with answers from the community.

This is a list of questions frequently asked in our troubleshooting channel on Slack. To contribute additional questions and answers, make a pull request.

General

This page has a collection of frequently asked questions of a general nature with answers from the community.

This is a list of questions frequently asked in our troubleshooting channel on Slack. To contribute additional questions and answers, make a pull request.

How does Apache Pinot use deep storage?

When data is pushed to Apache Pinot, Pinot makes a backup copy of the data and stores it on the configured deep storage (S3/GCP/ADLS/NFS/etc.). This copy is stored as tar.gz Pinot segments. Pinot servers also keep an untarred copy of the segments on their local disk for performance reasons.
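To show the relationship between the two copies, here is a Python illustration using the standard tarfile module. It is not Pinot's own code: it packages a segment directory as <segment_name>.tar.gz (the deep-storage form) and extracts it back (the local, untarred form). Directory and segment names are hypothetical.

```python
import os
import tarfile


def tar_segment(segment_dir: str, output_dir: str) -> str:
    """Package a segment directory as <segment_name>.tar.gz (the deep-storage form)."""
    name = os.path.basename(os.path.normpath(segment_dir))
    tar_path = os.path.join(output_dir, name + ".tar.gz")
    with tarfile.open(tar_path, "w:gz") as tar:
        tar.add(segment_dir, arcname=name)  # keep the segment name as the top-level entry
    return tar_path


def untar_segment(tar_path: str, local_dir: str) -> str:
    """Extract a segment tarball into local_dir (the untarred local-disk form)."""
    os.makedirs(local_dir, exist_ok=True)
    # Use the safe "data" extraction filter where this Python version supports it.
    extra = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    with tarfile.open(tar_path, "r:gz") as tar:
        tar.extractall(local_dir, **extra)
    return os.path.join(local_dir, os.path.basename(tar_path)[: -len(".tar.gz")])
```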

How does Pinot use Zookeeper?

Pinot uses Apache Helix for cluster management, which in turn is built on top of Zookeeper. Helix uses Zookeeper to store the cluster state, including Ideal State, External View, Participants, and so on. Pinot also uses Zookeeper to store information such as Table configurations, schemas, Segment Metadata, and so on.
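If you browse Zookeeper directly, it helps to know roughly where these items live. The following Python sketch builds the paths for one table. The layout is an assumption based on the usual Helix conventions (IDEALSTATES, EXTERNALVIEW, LIVEINSTANCES) and Pinot's Helix property store (PROPERTYSTORE), so verify it against your own cluster. The cluster and table names in the usage are hypothetical.

```python
def zk_paths(cluster: str, table_with_type: str, schema: str) -> dict:
    """Return commonly inspected Zookeeper paths for one Pinot table (assumed layout)."""
    root = "/" + cluster
    store = root + "/PROPERTYSTORE"
    return {
        # Helix-managed cluster state
        "ideal_state": f"{root}/IDEALSTATES/{table_with_type}",
        "external_view": f"{root}/EXTERNALVIEW/{table_with_type}",
        "live_instances": f"{root}/LIVEINSTANCES",
        # Pinot metadata kept in the Helix property store
        "table_config": f"{store}/CONFIGS/TABLE/{table_with_type}",
        "schema": f"{store}/SCHEMAS/{schema}",
        "segment_metadata": f"{store}/SEGMENTS/{table_with_type}",
    }
```

For example, `zk_paths("PinotCluster", "baseballStats_OFFLINE", "baseballStats")["ideal_state"]` gives `/PinotCluster/IDEALSTATES/baseballStats_OFFLINE`.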

Why am I getting a "Could not find or load class" error when running Quickstart using the 0.8.0 release?

Check the JDK version you are using. You may see this error if your JDK is older than the one the Pinot binary release was built with. If so, you have two options: switch to the same JDK release that Pinot was built with, or download the source code for the Pinot release and build it locally.

Pinot On Kubernetes FAQ

This page has a collection of frequently asked questions about Pinot on Kubernetes with answers from the community.

This is a list of questions frequently asked in our troubleshooting channel on Slack. To contribute additional questions and answers, make a pull request.

How to increase server disk size on AWS

The following is an example using Amazon Elastic Kubernetes Service (Amazon EKS).

1. Update Storage Class

In the Kubernetes (k8s) cluster, check the storage class: in Amazon EKS, it should be gp2.

Then update StorageClass to ensure:

Once StorageClass is updated, it should look like this:

2. Update PVC

Once the storage class is updated, then we can update the PersistentVolumeClaim (PVC) for the server disk size.

Now we want to double the disk size for pinot-server-3.

The following is an example of current disks:

The following is the output of data-pinot-server-3:

Now, let's change the PVC size to 2T by editing the server PVC.

After the update, the PVC's spec shows 2T, but its status still shows 1T.

3. Restart the pod to apply the change
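The storage class check and the PVC edit can also be scripted. The following Python sketch assumes the StorageClass requirement is the Kubernetes field allowVolumeExpansion: true, which must be set before a PVC can be resized. It builds the patch body for the PVC's spec.resources.requests.storage. The helper names are hypothetical. Only simple integer quantities such as 1T or 500Gi are handled.

```python
import json
import re


def allows_expansion(storage_class: dict) -> bool:
    """True if a StorageClass manifest (parsed into a dict) permits PVC resizing."""
    return storage_class.get("allowVolumeExpansion") is True


def double_quantity(quantity: str) -> str:
    """Double an integer Kubernetes storage quantity, for example '1T' -> '2T'."""
    match = re.fullmatch(r"(\d+)([A-Za-z]*)", quantity.strip())
    if match is None:
        raise ValueError(f"unsupported quantity: {quantity!r}")
    return f"{int(match.group(1)) * 2}{match.group(2)}"


def pvc_resize_patch(new_size: str) -> str:
    """Return a JSON patch body that sets a PVC's requested storage size."""
    return json.dumps({"spec": {"resources": {"requests": {"storage": new_size}}}})
```

For example, you could pass `pvc_resize_patch(double_quantity("1T"))` to `kubectl patch pvc data-pinot-server-3 -p '<patch>'` instead of editing the PVC interactively.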

Restart the pinot-server-3 pod:

Recheck the PVC size:

Query FAQ

This page has a collection of frequently asked questions about queries with answers from the community.

This is a list of questions frequently asked in our troubleshooting channel on Slack. To contribute additional questions and answers, make a pull request.

Querying


Quick Start Examples

This section describes quick start commands that launch all Pinot components in a single process.

Pinot ships with QuickStart commands that launch Pinot components in a single process and import pre-built datasets. These quick start examples are a good place to start if you're new to Pinot. The examples begin with the Batch Processing example, after the following notes:

Prerequisites
You must have either installed Pinot locally or have Docker installed if you want to use the Pinot Docker image. The examples are available in each option and work the same. The decision of which to choose depends on your installation preference and how you generally like to work. If you don't know which to choose, using Docker will make your cleanup easier after you are done with the examples.
Pinot versions in examples
The Docker-based examples on this page use pinot:latest, which instructs Docker to pull and use the most recent release of Apache Pinot. If you prefer to use a specific release instead, you can designate it by replacing latest with the release number, like this: pinot:0.12.1.
The local install-based examples that are run using the launcher scripts will use the Apache Pinot version you installed.
Running examples with Docker on a Mac with an M1 or M2 CPU
Add the -arm64 suffix to the run commands, like this:
Stopping a running example
To stop a running example, press Ctrl+C in the same terminal where you ran the docker run command to start the example.
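The notes above (image tag, optional -arm64 suffix) can be combined in a small Python helper that assembles the docker run command. The QuickStart -type entry point, the published port 9000, and appending -arm64 to the image tag are assumptions based on common usage of the apachepinot/pinot image. Check them against the official examples.

```python
import subprocess
from typing import List


def quickstart_docker_cmd(qs_type: str, version: str = "latest", arm64: bool = False) -> List[str]:
    """Build a `docker run` argument list for one Pinot quick start example."""
    # Assumption: the M1/M2 variant is selected by appending "-arm64" to the image tag.
    tag = version + ("-arm64" if arm64 else "")
    return [
        "docker", "run",
        "-p", "9000:9000",  # publish the controller's UI/REST port to the host
        f"apachepinot/pinot:{tag}",
        "QuickStart", "-type", qs_type,
    ]


def run_quickstart(qs_type: str, version: str = "latest", arm64: bool = False) -> None:
    """Run the example in the foreground; Ctrl+C stops it."""
    subprocess.run(quickstart_docker_cmd(qs_type, version, arm64), check=True)
```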

macOS Monterey Users

By default, the AirPlay Receiver runs on port 7000, which is also the port used by the Pinot Server in the Quick Start. You may see the following error when running these examples:

If you disable the AirPlay Receiver and try again, you shouldn't see this error message anymore.
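To check whether something is already listening on port 7000 before you start an example, you can try a TCP connection. This is a minimal Python sketch using the standard socket module. It only detects a listener on the given host and port. It does not tell you which process owns the port.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1", timeout: float = 1.0) -> bool:
    """Return True if something is already accepting TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 when the connection succeeds, i.e. the port is taken.
        return sock.connect_ex((host, port)) == 0
```

For example, if `port_in_use(7000)` returns True on macOS Monterey, the AirPlay Receiver is a likely cause.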

Batch Processing

This example demonstrates how to do batch processing with Pinot. The command:

Starts Apache Zookeeper, Pinot Controller, Pinot Broker, and Pinot Server.
Creates the baseballStats table
Launches a standalone data ingestion job that builds one segment for a given CSV data file for the baseballStats table and pushes the segment to the Pinot Controller.

Batch JSON

This example demonstrates how to import and query JSON documents in Pinot. The command:

Starts Apache Zookeeper, Pinot Controller, Pinot Broker, and Pinot Server.
Creates the githubEvents table
Launches a standalone data ingestion job that builds one segment for a given JSON data file for the githubEvents table and pushes the segment to the Pinot Controller.

Batch with complex data types

This example demonstrates how to do batch processing in Pinot where the data items have complex fields that need to be unnested. The command:

Starts Apache Zookeeper, Pinot Controller, Pinot Broker, and Pinot Server.
Creates the githubEvents table
Launches a standalone data ingestion job that builds one segment for a given JSON data file for the githubEvents table and pushes the segment to the Pinot Controller.

Streaming

This example demonstrates how to do stream processing with Pinot. The command:

Starts Apache Kafka, Apache Zookeeper, Pinot Controller, Pinot Broker, and Pinot Server.
Creates meetupRsvp table
Launches a meetup stream

Streaming JSON

This example demonstrates how to do stream processing with JSON documents in Pinot. The command:

Starts Apache Kafka, Apache Zookeeper, Pinot Controller, Pinot Broker, and Pinot Server.
Creates meetupRsvp table
Launches a meetup stream

Streaming with minion cleanup

This example demonstrates how to do stream processing in Pinot with RealtimeToOfflineSegmentsTask and MergeRollupTask minion tasks continuously optimizing segments as data gets ingested. The command:

Starts Apache Kafka, Apache Zookeeper, Pinot Controller, Pinot Broker, Pinot Minion, and Pinot Server.
Creates githubEvents table
Launches a GitHub events stream

Streaming with complex data types

This example demonstrates how to do stream processing in Pinot where the stream contains items that have complex fields that need to be unnested. The command:

Starts Apache Kafka, Apache Zookeeper, Pinot Controller, Pinot Broker, Pinot Minion, and Pinot Server.
Creates meetupRsvp table
Launches a meetup stream

Upsert

This example demonstrates how to do stream processing with upsert in Pinot. The command:

Starts Apache Kafka, Apache Zookeeper, Pinot Controller, Pinot Broker, and Pinot Server.
Creates meetupRsvp table
Launches a meetup stream

Upsert JSON

This example demonstrates how to do stream processing with upsert on JSON documents in Pinot. The command:

Starts Apache Kafka, Apache Zookeeper, Pinot Controller, Pinot Broker, and Pinot Server.
Creates meetupRsvp table
Launches a meetup stream

Hybrid

This example demonstrates how to do hybrid stream and batch processing with Pinot. The command:

Starts Apache Kafka, Apache Zookeeper, Pinot Controller, Pinot Broker, and Pinot Server.
Creates airlineStats table
Launches a standalone data ingestion job that builds segments under a given directory of Avro files for the airlineStats table and pushes the segments to the Pinot Controller.

Join

This example demonstrates how to do joins in Pinot using the . The command:

Starts Apache Zookeeper, Pinot Controller, Pinot Broker, and Pinot Server in the same container.
Creates the baseballStats table
Launches a data ingestion job that builds one segment for a given CSV data file for the baseballStats table and pushes the segment to the Pinot Controller.

SegmentGenerationJobSpec: 
!!org.apache.pinot.spi.ingestion.batch.spec.SegmentGenerationJobSpec
excludeFileNamePattern: null
executionFrameworkSpec: {extraConfigs: null, name: standalone, segmentGenerationJobRunnerClassName: org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner,
  segmentTarPushJobRunnerClassName: org.apache.pinot.plugin.ingestion.batch.standalone.SegmentTarPushJobRunner,
  segmentUriPushJobRunnerClassName: org.apache.pinot.plugin.ingestion.batch.standalone.SegmentUriPushJobRunner}
includeFileNamePattern: glob:**\/*.csv
inputDirURI: /tmp/pinot-quick-start/rawdata/
jobType: SegmentCreationAndTarPush
outputDirURI: /tmp/pinot-quick-start/segments
overwriteOutput: true
pinotClusterSpecs:
- {controllerURI: 'http://localhost:9000'}
pinotFSSpecs:
- {className: org.apache.pinot.spi.filesystem.LocalPinotFS, configs: null, scheme: file}
pushJobSpec: null
recordReaderSpec: {className: org.apache.pinot.plugin.inputformat.csv.CSVRecordReader,
  configClassName: org.apache.pinot.plugin.inputformat.csv.CSVRecordReaderConfig,
  configs: null, dataFormat: csv}
segmentNameGeneratorSpec: null
tableSpec: {schemaURI: 'http://localhost:9000/tables/transcript/schema', tableConfigURI: 'http://localhost:9000/tables/transcript',
  tableName: transcript}

Trying to create instance for class org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner
Initializing PinotFS for scheme file, classname org.apache.pinot.spi.filesystem.LocalPinotFS
Finished building StatsCollector!
Collected stats for 4 documents
Using fixed bytes value dictionary for column: studentID, size: 9
Created dictionary for STRING column: studentID with cardinality: 3, max length in bytes: 3, range: 200 to 202
Using fixed bytes value dictionary for column: firstName, size: 12
Created dictionary for STRING column: firstName with cardinality: 3, max length in bytes: 4, range: Bob to Nick
Using fixed bytes value dictionary for column: lastName, size: 15
Created dictionary for STRING column: lastName with cardinality: 3, max length in bytes: 5, range: King to Young
Created dictionary for FLOAT column: score with cardinality: 4, range: 3.2 to 3.8
Using fixed bytes value dictionary for column: gender, size: 12
Created dictionary for STRING column: gender with cardinality: 2, max length in bytes: 6, range: Female to Male
Using fixed bytes value dictionary for column: subject, size: 21
Created dictionary for STRING column: subject with cardinality: 3, max length in bytes: 7, range: English to Physics
Created dictionary for LONG column: timestampInEpoch with cardinality: 4, range: 1570863600000 to 1572418800000
Start building IndexCreator!
Finished records indexing in IndexCreator!
Finished segment seal!
Converting segment: /var/folders/3z/qn6k60qs6ps1bb6s2c26gx040000gn/T/pinot-1583443148720/output/transcript_OFFLINE_1570863600000_1572418800000_0 to v3 format
v3 segment location for segment: transcript_OFFLINE_1570863600000_1572418800000_0 is /var/folders/3z/qn6k60qs6ps1bb6s2c26gx040000gn/T/pinot-1583443148720/output/transcript_OFFLINE_1570863600000_1572418800000_0/v3
Deleting files in v1 segment directory: /var/folders/3z/qn6k60qs6ps1bb6s2c26gx040000gn/T/pinot-1583443148720/output/transcript_OFFLINE_1570863600000_1572418800000_0
Starting building 1 star-trees with configs: [StarTreeV2BuilderConfig[splitOrder=[studentID, firstName],skipStarNodeCreation=[],functionColumnPairs=[org.apache.pinot.core.startree.v2.AggregationFunctionColumnPair@3a48efdc],maxLeafRecords=1]] using OFF_HEAP builder
Starting building star-tree with config: StarTreeV2BuilderConfig[splitOrder=[studentID, firstName],skipStarNodeCreation=[],functionColumnPairs=[org.apache.pinot.core.startree.v2.AggregationFunctionColumnPair@3a48efdc],maxLeafRecords=1]
Generated 3 star-tree records from 4 segment records
Finished constructing star-tree, got 9 tree nodes and 4 records under star-node
Finished creating aggregated documents, got 6 aggregated records
Finished building star-tree in 10ms
Finished building 1 star-trees in 27ms
Computed crc = 3454627653, based on files [/var/folders/3z/qn6k60qs6ps1bb6s2c26gx040000gn/T/pinot-1583443148720/output/transcript_OFFLINE_1570863600000_1572418800000_0/v3/columns.psf, /var/folders/3z/qn6k60qs6ps1bb6s2c26gx040000gn/T/pinot-1583443148720/output/transcript_OFFLINE_1570863600000_1572418800000_0/v3/index_map, /var/folders/3z/qn6k60qs6ps1bb6s2c26gx040000gn/T/pinot-1583443148720/output/transcript_OFFLINE_1570863600000_1572418800000_0/v3/metadata.properties, /var/folders/3z/qn6k60qs6ps1bb6s2c26gx040000gn/T/pinot-1583443148720/output/transcript_OFFLINE_1570863600000_1572418800000_0/v3/star_tree_index, /var/folders/3z/qn6k60qs6ps1bb6s2c26gx040000gn/T/pinot-1583443148720/output/transcript_OFFLINE_1570863600000_1572418800000_0/v3/star_tree_index_map]
Driver, record read time : 0
Driver, stats collector time : 0
Driver, indexing time : 0
Tarring segment from: /var/folders/3z/qn6k60qs6ps1bb6s2c26gx040000gn/T/pinot-1583443148720/output/transcript_OFFLINE_1570863600000_1572418800000_0 to: /var/folders/3z/qn6k60qs6ps1bb6s2c26gx040000gn/T/pinot-1583443148720/output/transcript_OFFLINE_1570863600000_1572418800000_0.tar.gz
Size for segment: transcript_OFFLINE_1570863600000_1572418800000_0, uncompressed: 6.73KB, compressed: 1.89KB
Trying to create instance for class org.apache.pinot.plugin.ingestion.batch.standalone.SegmentTarPushJobRunner
Initializing PinotFS for scheme file, classname org.apache.pinot.spi.filesystem.LocalPinotFS
Start pushing segments: [/tmp/pinot-quick-start/segments/transcript_OFFLINE_1570863600000_1572418800000_0.tar.gz]... to locations: [org.apache.pinot.spi.ingestion.batch.spec.PinotClusterSpec@243c4f91] for table transcript
Pushing segment: transcript_OFFLINE_1570863600000_1572418800000_0 to location: http://localhost:9000 for table transcript
Sending request: http://localhost:9000/v2/segments?tableName=transcript to controller: nehas-mbp.hsd1.ca.comcast.net, version: Unknown
Response for pushing table transcript segment transcript_OFFLINE_1570863600000_1572418800000_0 to location http://localhost:9000 - 200: {"status":"Successfully uploaded segment: transcript_OFFLINE_1570863600000_1572418800000_0 of table: transcript"}

Operations FAQ

This page has a collection of frequently asked questions about operations with answers from the community.

This is a list of questions frequently asked in our troubleshooting channel on Slack. To contribute additional questions and answers, make a pull request.

Memory

How much heap should I allocate for my Pinot instances?

Typically, Apache Pinot components try to use off-heap memory (MMAP/DirectMemory) wherever possible. For example, Pinot servers load segments as memory-mapped files in MMAP mode (recommended), or into direct memory in HEAP mode. Heap memory is used mostly for query execution and for storing some metadata. We have seen high-throughput, low-latency production deployments work well with just 16 GB of heap for Pinot servers and brokers. The Pinot controller may also cache some metadata (table configurations, etc.) in heap, so if there are just a few tables in the Pinot cluster, a few GB of heap should suffice.
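Because servers depend on memory-mapped and direct memory, the heap should leave room in the container for off-heap use. The following Python sketch turns that idea into -Xms/-Xmx flags. The 50% off-heap reserve is a hypothetical rule of thumb for illustration, not a Pinot recommendation.

```python
def heap_opts(container_gb: int, heap_gb: int, offheap_reserve: float = 0.5) -> str:
    """Return -Xms/-Xmx flags, refusing a heap that eats into the off-heap reserve."""
    if not 0.0 <= offheap_reserve < 1.0:
        raise ValueError("offheap_reserve must be in [0, 1)")
    max_heap = container_gb * (1.0 - offheap_reserve)
    if heap_gb > max_heap:
        raise ValueError(
            f"{heap_gb}G heap leaves less than {offheap_reserve:.0%} of "
            f"{container_gb}G for off-heap memory"
        )
    # Equal initial and maximum heap avoids resizing the heap at runtime.
    return f"-Xms{heap_gb}G -Xmx{heap_gb}G"
```

For example, `heap_opts(32, 16)` returns `-Xms16G -Xmx16G`, while `heap_opts(16, 16)` raises an error because it leaves no room for memory-mapped segments.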

DR

Does Pinot provide any backup/restore mechanism?

Pinot relies on deep storage to hold a backup copy of segments (offline as well as real-time), and on Zookeeper to store metadata (table configurations, schemas, cluster state, and so on). It does not provide explicit tools to back up or restore this data. Instead, it relies on the deep storage (ADLS/S3/GCP/etc.) and Zookeeper to persist the data and metadata.

Alter Table

Can I change a column name in my table, without losing data?

Changing a column name or data type is considered a backward-incompatible change. While Pinot supports schema evolution for backward-compatible changes, it does not support backward-incompatible changes such as changing the name or data type of a column.

How to change the number of replicas of a table?

You can change the number of replicas by updating the table configuration's section. Make sure you have at least as many servers as the replication.

For offline tables, update :

For real-time tables, update :

After changing the replication, run a .

Note that if you are using replica groups, these configurations are expected to equal numReplicaGroups. If they do not match, Pinot uses numReplicaGroups.
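The replication change can be sketched as a transformation on a table config loaded as a Python dict. This assumes the usual Pinot layout, where the setting lives under `segmentsConfig` as `replication` for offline tables and `replicasPerPartition` for real-time tables; verify these key names against your Pinot version. The helper `set_replication` is hypothetical and does not touch replica-group settings.

```python
import copy


def set_replication(table_config: dict, replicas: int, num_servers: int) -> dict:
    """Return a copy of table_config with its replica count updated.

    Assumption: replication lives under segmentsConfig, as 'replication'
    for OFFLINE tables and 'replicasPerPartition' for REALTIME tables.
    """
    if replicas > num_servers:
        # Each replica of a segment must land on a different server.
        raise ValueError(f"need at least {replicas} servers, have {num_servers}")
    updated = copy.deepcopy(table_config)
    segments = updated.setdefault("segmentsConfig", {})
    key = "replicasPerPartition" if updated.get("tableType") == "REALTIME" else "replication"
    segments[key] = str(replicas)  # written as a string, as in many Pinot examples
    return updated


offline = {"tableName": "transcript", "tableType": "OFFLINE",
           "segmentsConfig": {"replication": "1"}}
print(set_replication(offline, replicas=2, num_servers=3)["segmentsConfig"])
```

The original dict is left untouched, so you can diff old and new configs before uploading. Remember that the new replica count only takes effect on existing segments after a rebalance.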

How to set or change table retention?

By default, no retention is set for a table in Apache Pinot. However, you can set retention with the following properties in the section inside the table config:

retentionTimeUnit
retentionTimeValue

Updating the retention value in the table config is sufficient; there is no need to rebalance the table or reload its segments.
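As a minimal sketch, the two retention properties can be set on a table config held as a dict. It assumes they sit under `segmentsConfig` and that `retentionTimeUnit` takes Java `TimeUnit` names such as `DAYS`; treat both as assumptions to check against your version. `set_retention` is a hypothetical helper.

```python
import copy

# Java TimeUnit names; retentionTimeUnit is assumed to accept these values.
VALID_UNITS = {"NANOSECONDS", "MICROSECONDS", "MILLISECONDS",
               "SECONDS", "MINUTES", "HOURS", "DAYS"}


def set_retention(table_config: dict, unit: str, value: int) -> dict:
    """Return a copy of table_config with retention set in segmentsConfig."""
    unit = unit.upper()
    if unit not in VALID_UNITS:
        raise ValueError(f"unknown time unit: {unit}")
    if value <= 0:
        raise ValueError("retention value must be positive")
    updated = copy.deepcopy(table_config)
    segments = updated.setdefault("segmentsConfig", {})
    segments["retentionTimeUnit"] = unit
    segments["retentionTimeValue"] = str(value)
    return updated


config = set_retention({"tableName": "transcript", "tableType": "OFFLINE"}, "days", 365)
print(config["segmentsConfig"])
```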

Rebalance

How to run a rebalance on a table?

See .
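For scripted use, a rebalance can be triggered through the controller REST API. This sketch assumes the endpoint `POST /tables/{tableName}/rebalance` with `type` and `dryRun` query parameters, and a controller at `http://localhost:9000`; confirm both in your controller's Swagger UI. The request is only built, not sent. Starting with a dry run lets you inspect the planned assignment before any segments move.

```python
from urllib.parse import urlencode
from urllib.request import Request


def rebalance_request(controller: str, table: str, table_type: str,
                      dry_run: bool = True) -> Request:
    """Build (but do not send) a rebalance request for one table type."""
    params = urlencode({"type": table_type, "dryRun": str(dry_run).lower()})
    url = f"{controller.rstrip('/')}/tables/{table}/rebalance?{params}"
    return Request(url, method="POST")


req = rebalance_request("http://localhost:9000", "transcript", "OFFLINE")
print(req.get_method(), req.full_url)
# To actually send it: urllib.request.urlopen(req)
```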

Why does my real-time table not use the new nodes I added to the cluster?

Likely explanation: num partitions * num replicas < num servers.

In real-time tables, segments of the same partition always remain on the same node. This sticky assignment is needed for replica groups and is critical if you use upserts. For instance, if you have 3 partitions, 1 replica, and 4 nodes, only 3 of the 4 nodes are used: all p0 segments are on one node, all p1 segments on another, and all p2 segments on a third. The fourth server is unused, and remains unused through rebalances.
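The arithmetic behind "num partitions * num replicas < num servers" can be checked with a short model. It assumes each (partition, replica) pair is pinned round-robin to server `(partition * num_replicas + replica) % num_servers`; this is an illustrative model, not necessarily Pinot's exact assignment code. `servers_used` is a hypothetical helper.

```python
def servers_used(num_partitions: int, num_replicas: int, num_servers: int) -> set:
    """Server indices that host CONSUMING segments, assuming each
    (partition, replica) pair is pinned to server
    (partition * num_replicas + replica) % num_servers."""
    return {
        (p * num_replicas + r) % num_servers
        for p in range(num_partitions)
        for r in range(num_replicas)
    }


used = servers_used(num_partitions=3, num_replicas=1, num_servers=4)
print(f"{len(used)} of 4 servers used: {sorted(used)}")  # 3 of 4 servers used: [0, 1, 2]
```

At most `num_partitions * num_replicas` distinct servers can ever appear in the set, which is why adding servers beyond that product does not help consuming segments.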

Nothing can be done about CONSUMING segments: with 3 partitions, they continue to use only 3 nodes. However, you can rebalance so that completed segments use all nodes. To force the completed segments of the table onto the new server, use this config:

Segments

How to control the number of segments generated?

The number of segments generated depends on the number of input files. If you provide only 1 input file, you get 1 segment. If you split the input into multiple files, you get one segment per input file.
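Since segment count follows input file count, one way to get more segments is to split the input before running the ingestion job. The sketch below splits CSV text into contiguous chunks, repeating the header in each; contiguous chunks (rather than round-robin rows) keep each file's time range narrow if the input is time-ordered. The helper name is hypothetical, and writing the chunks to your job's input directory is left to you.

```python
import csv
import io


def split_csv(text: str, num_files: int) -> list:
    """Split CSV text into at most num_files contiguous chunks with headers."""
    if num_files < 1:
        raise ValueError("num_files must be >= 1")
    rows = list(csv.reader(io.StringIO(text)))
    if len(rows) < 2:
        return []  # header only (or empty): nothing to split
    header, body = rows[0], rows[1:]
    size = -(-len(body) // num_files)  # ceiling division: rows per chunk
    chunks = []
    for start in range(0, len(body), size):
        out = io.StringIO()
        writer = csv.writer(out, lineterminator="\n")
        writer.writerow(header)
        writer.writerows(body[start:start + size])
        chunks.append(out.getvalue())
    return chunks


sample = "id,name\n1,a\n2,b\n3,c\n4,d\n"
for i, chunk in enumerate(split_csv(sample, 2)):
    # In practice, write each chunk to its own file, e.g. part-0.csv.
    print(f"--- part-{i}.csv ---")
    print(chunk, end="")
```

Because of the ceiling division, a small input may yield fewer chunks than requested (for example, 4 rows split 3 ways gives 2 files).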

What are the common reasons my segment is in a BAD state?

This typically happens when the server is unable to load the segment. Possible causes include running out of memory, running out of disk space, failing to download the segment from the deep store, and similar errors. Check the server logs for more information.

How to reset a segment when it runs into a BAD state?

Use the segment reset controller REST API to reset the segment:
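A hedged sketch of that call: it assumes the controller exposes `POST /segments/{tableNameWithType}/{segmentName}/reset` at `http://localhost:9000` (check your controller's Swagger UI). The segment name is the one uploaded in the earlier batch-import example; the helper only builds the request.

```python
from urllib.parse import quote
from urllib.request import Request


def reset_segment_request(controller: str, table_name_with_type: str,
                          segment_name: str) -> Request:
    """Build (but do not send) a request that resets one segment."""
    url = (f"{controller.rstrip('/')}/segments/"
           f"{quote(table_name_with_type)}/{quote(segment_name)}/reset")
    return Request(url, method="POST")


req = reset_segment_request("http://localhost:9000", "transcript_OFFLINE",
                            "transcript_OFFLINE_1570863600000_1572418800000_0")
print(req.get_method(), req.full_url)
# To actually send it: urllib.request.urlopen(req)
```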

How do I pause real-time ingestion?

Refer to .

What's the difference between Reset, Refresh, and Reload?

Reset: Gets a segment in ERROR state back to ONLINE or CONSUMING state. Behind the scenes, the Pinot controller takes the segment to the OFFLINE state, waits for the external view to stabilize, and then moves it back to ONLINE or CONSUMING state, effectively resetting segments or consumers in error states.

In addition, RESET brings the segment OFFLINE temporarily, while REFRESH and RELOAD swap the segment on the server atomically, without bringing down the segment or affecting ongoing queries.

Tenants

How can I make brokers/servers join the cluster without the DefaultTenant tag?

Set this property in your controller.conf file:

Now your brokers and servers should join the cluster as broker_untagged and server_untagged. You can then directly use the POST /tenants API to create the desired tenants, as in the following:
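A sketch of building those tenant requests in Python. The payload fields (`tenantRole`, `tenantName`, `numberOfInstances` for brokers; `offlineInstances` and `realtimeInstances` for servers) and the controller address are assumptions to verify against your version's Swagger UI; the tenant names are hypothetical.

```python
import json
from urllib.request import Request

# Example payloads; tenant names are hypothetical.
BROKER_TENANT = {"tenantRole": "BROKER", "tenantName": "sampleBrokerTenant",
                 "numberOfInstances": 2}
SERVER_TENANT = {"tenantRole": "SERVER", "tenantName": "sampleServerTenant",
                 "offlineInstances": 1, "realtimeInstances": 1}


def create_tenant_request(controller: str, tenant: dict) -> Request:
    """Build (but do not send) a POST /tenants request for one tenant."""
    return Request(
        f"{controller.rstrip('/')}/tenants",
        data=json.dumps(tenant).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


pending = [create_tenant_request("http://localhost:9000", t)
           for t in (BROKER_TENANT, SERVER_TENANT)]
# Send each with urllib.request.urlopen(request).
```

The instances for each tenant are expected to come from the untagged pool described above, so make sure enough untagged brokers and servers have joined first.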

Minion

How do I tune minion task timeout and parallelism on each worker?

Two task configurations control this, and both are set as part of the cluster configurations, as in the following example. One controls the task's overall timeout (1 hour by default), and the other sets how many tasks run on a single minion worker (1 by default). The <taskType> is the task to tune, such as MergeRollupTask or RealtimeToOfflineSegmentsTask.
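The following sketch builds the two cluster config entries for a task type and a request to apply them. It assumes the keys are `<taskType>.timeoutMs` and `<taskType>.numConcurrentTasksPerInstance`, that values are strings, and that they are applied via `POST /cluster/configs`; all of these are assumptions to verify for your version.

```python
import json
from urllib.request import Request


def minion_task_configs(task_type: str, timeout_ms: int, tasks_per_worker: int) -> dict:
    """Cluster config entries for one minion task type (values as strings)."""
    return {
        f"{task_type}.timeoutMs": str(timeout_ms),
        f"{task_type}.numConcurrentTasksPerInstance": str(tasks_per_worker),
    }


def cluster_config_request(controller: str, configs: dict) -> Request:
    """Build (but do not send) a POST /cluster/configs request."""
    return Request(
        f"{controller.rstrip('/')}/cluster/configs",
        data=json.dumps(configs).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


configs = minion_task_configs("RealtimeToOfflineSegmentsTask", 600_000, 4)
request = cluster_config_request("http://localhost:9000", configs)
print(json.dumps(configs, indent=2))
```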

How do I manually run a Periodic Task?

See .

Tuning and Optimizations

Do replica groups work for real-time?

Yes, replica groups work for real-time tables. There are two parts to enabling replica groups:

Replica groups segment assignment.
Replica group query routing.

Replica group segment assignment

For real-time tables, replica group segment assignment is achieved when the number of servers is a multiple of the number of replicas. The partitions are spread uniformly across the servers, creating replica groups. For example, consider 6 partitions, 2 replicas, and 4 servers.

As you can see, the set (S0, S2) contains r1 of every partition, and (S1, S3) contains r2 of every partition. A query is routed to only one of these sets instead of spanning every server. If you are adding or removing servers in an existing table setup, you have to run for segment assignment changes to take effect.
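The 6-partition, 2-replica, 4-server example can be reproduced with a simple round-robin model in which replica `r` of partition `p` goes to server `(p * num_replicas + r) % num_servers`. This is an illustrative model that yields the layout described above, not necessarily Pinot's exact implementation; the helper names are hypothetical.

```python
def assign(num_partitions: int, num_replicas: int, num_servers: int) -> dict:
    """Map (partition, replica) -> server index by round-robin spraying."""
    if num_servers % num_replicas != 0:
        raise ValueError("servers must be a multiple of replicas")
    return {
        (p, r): (p * num_replicas + r) % num_servers
        for p in range(num_partitions)
        for r in range(num_replicas)
    }


def replica_groups(assignment: dict, num_replicas: int) -> list:
    """Collect the servers that host each replica index."""
    groups = [set() for _ in range(num_replicas)]
    for (_, replica), server in assignment.items():
        groups[replica].add(server)
    return groups


layout = assign(num_partitions=6, num_replicas=2, num_servers=4)
for p in range(6):
    print(f"p{p}: r1 -> S{layout[(p, 0)]}, r2 -> S{layout[(p, 1)]}")
print(replica_groups(layout, 2))  # [{0, 2}, {1, 3}]
```

The multiple-of-replicas requirement matters: with 3 servers and 2 replicas (the case the guard rejects), replica r1 would land on S0, S2, and S1 in turn, so no server set would hold a complete copy on its own.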

Replica group query routing

Once replica group segment assignment is in effect, query routing can take advantage of it. For replica group based query routing, set the following in the table config's section, and then restart the brokers.
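As a sketch, assuming the routing setting is `routing.instanceSelectorType` with the value `replicaGroup` (verify against your version), the change to a table config held as a dict looks like this; the helper name is hypothetical.

```python
import copy
import json


def enable_replica_group_routing(table_config: dict) -> dict:
    """Return a copy of table_config that routes queries by replica group."""
    updated = copy.deepcopy(table_config)
    updated.setdefault("routing", {})["instanceSelectorType"] = "replicaGroup"
    return updated


config = enable_replica_group_routing({"tableName": "events", "tableType": "REALTIME"})
print(json.dumps(config["routing"]))  # {"instanceSelectorType": "replicaGroup"}
```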

Overwrite index configs at tier level

When using , users may want different encoding and index types for a column in different tiers, to balance query latency and cost savings more flexibly. For example, segments in the hot tier can use dictionary encoding, bloom filters, and other relevant index types for very fast query execution. For segments in the cold tier, where cost savings matter more than low query latency, one may want to use raw values and bloom filters only.

The following two examples show how to overwrite encoding type and index configs for tiers. Similar changes are also demonstrated in the .

Overwriting single-column index configs using fieldConfigList. All top-level fields in can be overwritten, and fields that are not overwritten are kept intact.

Overwriting star-tree index configurations using tableIndexConfig. The StarTreeIndexConfigs are overwritten as a whole. In fact, all top-level fields defined in can be overwritten, so single-column index configs defined in tableIndexConfig can also be overwritten, but this is less clear than using fieldConfigList.
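The overwrite semantics described above (top-level fields are replaced as a whole, other fields kept intact) amount to a shallow merge. This sketch models that for one column; the `tierOverwrites` key, the `indexes` layout, the column name, and the tier names are assumptions for illustration, and this is not Pinot's own merge code.

```python
def effective_field_config(field_config: dict, tier: str) -> dict:
    """Resolve a column's config for one tier by shallow-merging its overwrites."""
    base = {k: v for k, v in field_config.items() if k != "tierOverwrites"}
    overwrites = field_config.get("tierOverwrites", {}).get(tier, {})
    # Each top-level key in the tier's overwrites replaces the base value as a
    # whole; top-level keys that are not overwritten are kept intact.
    return {**base, **overwrites}


column = {
    "name": "ArrTimeBlk",  # hypothetical column name
    "encodingType": "DICTIONARY",
    "indexes": {"inverted": {}, "bloom": {}},
    "tierOverwrites": {
        # hypothetical tier name: cold segments use raw values and a bloom filter only
        "coldTier": {"encodingType": "RAW", "indexes": {"bloom": {}}},
    },
}

print(effective_field_config(column, "coldTier"))
print(effective_field_config(column, "hotTier"))
```

Note that `indexes` is replaced wholesale for the cold tier, so the inverted index disappears there rather than being merged; the same whole-value replacement is what makes StarTreeIndexConfigs overwrite as a unit.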

Credential

How do I update credentials for real-time upstream without downtime?

.
Wait for the pause status to change to success.
Update the credential in the table config.
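The rotation steps can be scripted. This sketch assumes the controller endpoints `POST /tables/{table}/pauseConsumption`, `GET /tables/{table}/pauseStatus`, and `POST /tables/{table}/resumeConsumption`, and treats the pause as complete once `pauseFlag` is true and `consumingSegments` is empty; the response shape is an assumption to check against your version. Resuming consumption at the end is likewise assumed. `client` is a hypothetical wrapper around the controller API, and `update_table_config` is whatever writes the new credentials.

```python
import time


def rotate_stream_credentials(client, table: str, update_table_config,
                              poll_seconds: float = 5.0,
                              timeout_seconds: float = 600.0) -> None:
    """Pause ingestion, wait until it is paused, update config, then resume.

    `client` is a hypothetical controller wrapper exposing post(path) and
    get(path), each returning the parsed JSON response as a dict.
    """
    client.post(f"/tables/{table}/pauseConsumption")
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = client.get(f"/tables/{table}/pauseStatus")
        # Assumed response shape: paused once pauseFlag is true and no
        # segments are still consuming.
        if status.get("pauseFlag") and not status.get("consumingSegments"):
            break
        if time.monotonic() > deadline:
            raise TimeoutError(f"{table} did not pause within {timeout_seconds}s")
        time.sleep(poll_seconds)
    update_table_config(table)  # e.g. write the new credentials into the stream config
    client.post(f"/tables/{table}/resumeConsumption")
```

Injecting `client` keeps the control flow testable without a running cluster, and the timeout prevents the script from waiting forever if a segment never stops consuming.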
