This quick start guide will help you bootstrap a Pinot standalone instance on your local machine.
In this guide, you'll learn how to download and install Apache Pinot as a standalone instance.
Download Apache Pinot
First, let's download the Pinot distribution for this tutorial. You can either download a packaged release or build a distribution from the source code.
Prerequisites
Install JDK11 or higher (JDK16 is not yet supported)
For JDK 8 support use Pinot 0.7.1 or compile from the source code.
You can build from source or download the distribution:
Download the latest binary release from the Apache Pinot downloads page, or use this command:
Once you have the tar file:
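For example (the version variable matches the download command shown in this guide):
tar -zxvf apache-pinot-$PINOT_VERSION-bin.tar.gz
cd apache-pinot-$PINOT_VERSION-bin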
You can find older versions of Apache Pinot in the release archives. For example, if you wanted to download Pinot 0.10.0, you could run the following command:
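For example, assuming older releases are kept in the Apache archive (the URL layout mirrors the main download site):
PINOT_VERSION=0.10.0
wget https://archive.apache.org/dist/pinot/apache-pinot-$PINOT_VERSION/apache-pinot-$PINOT_VERSION-bin.tar.gz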
Follow these steps to check out the code from the Apache Pinot GitHub repository and build Pinot locally.
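A rough sketch of the typical flow (the Maven profile shown is an assumption; check the project README for the exact build command):
git clone https://github.com/apache/pinot.git
cd pinot
mvn install package -DskipTests -Pbin-dist
# the launch scripts end up under the build directory used in the quick start below
cd build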
M1 Mac Support
Currently, Apache Pinot doesn't provide official binaries for M1 Macs. You can, however, build from source using the steps above. In addition, you will need to add the following to your ~/.m2/settings.xml prior to the build.
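A sketch of the kind of profile typically added, which forces an x86_64 os classifier so that native-dependency downloads resolve under Rosetta (the profile id and property name here are assumptions, adjust as needed):
<settings>
  <activeProfiles>
    <activeProfile>apple-silicon</activeProfile>
  </activeProfiles>
  <profiles>
    <profile>
      <id>apple-silicon</id>
      <properties>
        <os.detected.classifier>osx-x86_64</os.detected.classifier>
      </properties>
    </profile>
  </profiles>
</settings>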
Also make sure to install Rosetta:
softwareupdate --install-rosetta
Note that some installations of the JDK do not contain the JNI bindings necessary to run all tests. If you see a java.lang.UnsatisfiedLinkError while running tests, you may need to change your JDK. If using Homebrew, you can install AdoptOpenJDK 11 with: brew install --cask adoptopenjdk11
Now that we've downloaded Pinot, it's time to set up a cluster. There are two ways to do this:
Quick Start
Pinot comes with quick-start commands that launch instances of Pinot components in the same process and import pre-built datasets.
For example, the following quick-start launches Pinot with a baseball dataset pre-loaded:
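A minimal sketch (run from the distribution directory; see the quick start examples section below for the full list of types):
./bin/pinot-admin.sh QuickStart -type batch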
For a list of all the available quick starts, see the quick start examples section later in this guide.
Manual Cluster
If you want to play with bigger datasets (more than a few MB), you can launch all the components individually.
The video below is a step-by-step walk through for launching the individual components of Pinot and scaling them to multiple instances.
You can find the commands that are shown in this video in the GitHub repository.
The examples below assume that you are using Java 8.
If you are using Java 11+, remove the GC settings inside JAVA_OPTS. So, for example, instead of:
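# illustrative example; your actual flags may differ
export JAVA_OPTS="-Xms4G -Xmx8G -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -Xloggc:gc-pinot-controller.log"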
You'd have:
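export JAVA_OPTS="-Xms4G -Xmx8G"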
Start Zookeeper
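A minimal sketch using the admin launcher (run from the distribution directory; the port is illustrative):
./bin/pinot-admin.sh StartZookeeper -zkPort 2181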
You can use a Zookeeper browser tool to browse the Zookeeper instance.
Start Pinot Controller
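For example (addresses and ports are illustrative):
./bin/pinot-admin.sh StartController -zkAddress localhost:2181 -controllerPort 9000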
Start Pinot Broker
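For example (Zookeeper address is illustrative):
./bin/pinot-admin.sh StartBroker -zkAddress localhost:2181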
Start Pinot Server
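For example (Zookeeper address is illustrative):
./bin/pinot-admin.sh StartServer -zkAddress localhost:2181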
Start Kafka
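For example (the port and the Zookeeper path are illustrative):
./bin/pinot-admin.sh StartKafka -zkAddress localhost:2181/kafka -port 9092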
Once your cluster is up and running, you can head over to Exploring Pinot to learn how to run queries against the data.
Start Pinot Component in Debug Mode with IntelliJ
Starting a Pinot component of interest in IntelliJ in debug mode can be useful for development: you can set breakpoints and inspect variables. Taking the server as an example, you can start Zookeeper, the controller, and the broker using the steps in the manual cluster setup above, then use the following run configuration (placed under $PROJECT_DIR$/.run) to start the server. This is an example of how it can be used. Please replace the metrics-core version and cluster name as needed.
PINOT_VERSION=0.12.0 # set to the Pinot version you decide to use
wget https://downloads.apache.org/pinot/apache-pinot-$PINOT_VERSION/apache-pinot-$PINOT_VERSION-bin.tar.gz
This section contains quick start guides to help you get up and running with Pinot.
Running Pinot
To simplify the getting started experience, Pinot ships with quick start guides that launch Pinot components in a single process and import pre-built datasets.
# navigate to directory containing the setup scripts
cd build
Deploy to a public cloud
Data import examples
Getting data into Pinot is easy. Take a look at these two quick start guides, which will help you get up and running with sample data for offline and real-time tables.
This page has a collection of frequently asked questions with answers from the community.
This is a list of questions frequently asked in our troubleshooting channel on Slack. Please feel free to contribute your questions and answers here by making a pull request.
When data is pushed into Pinot, Pinot makes a backup copy of the data and stores it on the configured deep storage (S3/GCP/ADLS/NFS/etc.). This copy is stored as tar.gz Pinot segments. Note that Pinot servers keep an untarred copy of the segments on their local disk as well, for performance reasons.
How does Pinot use Zookeeper?
Pinot uses Apache Helix for cluster management, which in turn is built on top of Zookeeper. Helix uses Zookeeper to store the cluster state, including Ideal State, External View, Participants, etc. Besides that, Pinot uses Zookeeper to store other information such as Table configs, schema, Segment Metadata, etc.
Why am I getting "Could not find or load class" error when running Quickstart using 0.8.0 release?
Please check the JDK version you are using. The 0.8.0 release binary is built with JDK 11. You may be getting this error if you are using JDK 8. In that case, please consider using JDK 11, or download the source code for the release and build it locally.
Running Pinot in Docker
This guide will show you how to run a Pinot cluster using Docker.
In this guide we will learn about running Pinot in Docker.
This guide assumes that you have installed Docker and have configured it with enough memory. A sample config is shown below:
The latest Pinot Docker image is published at apachepinot/pinot:latest, and you can see a list of all published tags on Docker Hub.
You can pull the Docker image onto your machine by running the following command:
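For example:
docker pull apachepinot/pinot:latest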
The quick start scripts launch Pinot with minimal resources. If you want to play with bigger datasets (more than a few MB), you can launch each of the Pinot components individually.
Docker
Create a Network
Create an isolated bridge network in docker
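For example (the network name pinot-demo is just a label reused by the commands below):
docker network create -d bridge pinot-demo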
Start Zookeeper
Start Zookeeper in daemon mode. This is a single node zookeeper setup. Zookeeper is the central metadata store for Pinot and should be set up with replication for production use. For more information, see Running Replicated Zookeeper.
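A minimal sketch (the image tag matches the sample console output below; adjust as needed):
docker run -d --network=pinot-demo --name pinot-zookeeper --restart always -p 2181:2181 zookeeper:3.5.6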
Start Pinot Controller
Start Pinot Controller in daemon and connect to Zookeeper.
The command below expects a 4GB memory container. Tune -Xms and -Xmx if your machine doesn't have enough resources.
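A minimal sketch (image tag and heap sizes are illustrative):
docker run -d --network=pinot-demo --name pinot-controller -p 9000:9000 -e JAVA_OPTS="-Xms1G -Xmx4G" apachepinot/pinot:0.9.3 StartController -zkAddress pinot-zookeeper:2181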
Start Pinot Broker
Start Pinot Broker in daemon and connect to Zookeeper.
The command below expects a 4GB memory container. Tune -Xms and -Xmx if your machine doesn't have enough resources.
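A minimal sketch (image tag and heap sizes are illustrative):
docker run -d --network=pinot-demo --name pinot-broker -p 8099:8099 -e JAVA_OPTS="-Xms4G -Xmx4G" apachepinot/pinot:0.9.3 StartBroker -zkAddress pinot-zookeeper:2181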
Start Pinot Server
Start Pinot Server in daemon and connect to Zookeeper.
The command below expects a 16GB memory container. Tune -Xms and -Xmx if your machine doesn't have enough resources.
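A minimal sketch (image tag and heap sizes are illustrative):
docker run -d --network=pinot-demo --name pinot-server -e JAVA_OPTS="-Xms8G -Xmx16G" apachepinot/pinot:0.9.3 StartServer -zkAddress pinot-zookeeper:2181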
Start Kafka
Optionally, you can also start Kafka for setting up realtime streams. This brings up the Kafka broker on port 9092.
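One common approach is a Kafka image wired to the same Zookeeper; the image and environment variable names below are assumptions, so adjust them for whichever Kafka image you use:
docker run -d --network=pinot-demo --name kafka -p 9092:9092 -e KAFKA_ZOOKEEPER_CONNECT=pinot-zookeeper:2181/kafka -e KAFKA_BROKER_ID=0 -e KAFKA_ADVERTISED_HOST_NAME=kafka wurstmeister/kafka:latest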
Now all Pinot-related components are started as an empty cluster.
You can run the below command to check container status.
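For example:
docker container ls -a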
Sample Console Output
Docker Compose
Create a file called docker-compose.yml that contains the following:
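A minimal sketch of such a file (image tags, ports, and heap settings are illustrative):
version: '3.7'
services:
  pinot-zookeeper:
    image: zookeeper:3.5.6
    ports:
      - "2181:2181"
  pinot-controller:
    image: apachepinot/pinot:0.9.3
    command: "StartController -zkAddress pinot-zookeeper:2181"
    ports:
      - "9000:9000"
    depends_on:
      - pinot-zookeeper
  pinot-broker:
    image: apachepinot/pinot:0.9.3
    command: "StartBroker -zkAddress pinot-zookeeper:2181"
    ports:
      - "8099:8099"
    depends_on:
      - pinot-controller
  pinot-server:
    image: apachepinot/pinot:0.9.3
    command: "StartServer -zkAddress pinot-zookeeper:2181"
    depends_on:
      - pinot-broker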
Run the following command to launch all the components:
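For example, from the directory containing the file:
docker-compose up -d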
You can run the below command to check the container status.
Sample Console Output
Once your cluster is up and running, you can head over to Exploring Pinot to learn how to run queries against the data.
Is there any debug information available in Pinot?
Pinot offers various ways to assist with troubleshooting and debugging problems that might happen. It is recommended to start with the debug API, which may quickly surface some of the most commonly occurring problems. The debug API provides information such as tableSize, ingestion status, and any error messages related to state transitions on the server, among other things.
The table debug api can be invoked via the Swagger UI as follows:
Swagger - Table Debug Api
It can also be invoked directly by accessing the URL as follows. The API requires the tableName and can optionally take tableType (offline|realtime) and a verbosity level.
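For example (host, table name, and parameters are illustrative; see the Swagger entry above for the exact parameters):
curl -X GET "http://localhost:9000/debug/tables/myTable?type=offline&verbosity=0" -H "accept: application/json"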
Pinot also provides a wide variety of operational metrics that can be used for creating dashboards, alerting, and monitoring. Also, all Pinot components log debug information related to error conditions that can be used for troubleshooting.
How do I debug a slow query or a query which keeps timing out
Please use these steps:
If the query executes, look at the query result. Specifically look at numEntriesScannedInFilter and numDocsScanned.
If numEntriesScannedInFilter is very high, consider adding indexes for the corresponding columns being used in the filter predicates. You should also think about partitioning the incoming data based on the dimension most heavily used in your filter queries.
Pinot On Kubernetes FAQ
How to increase server disk size on AWS
Below is an example of AWS EKS.
1. Update Storage Class
In the K8s cluster, check the storage class: in AWS, it should be gp2.
Then update StorageClass to ensure:
Once StorageClass is updated, it should be like:
2. Update PVC
Once the storage class is updated, then we can update PVC for the server disk size.
Now we want to double the disk size for pinot-server-3.
Below is an example of current disks:
Below is the output of data-pinot-server-3
Now, let's change the PVC size to 2T by editing the server PVC.
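For example (the namespace is an assumption, use the one your Pinot release is deployed in):
kubectl edit pvc data-pinot-server-3 -n pinot-quickstart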
Once updated, the spec's PVC size is updated to 2T, but the status's PVC size is still 1T.
3. Restart pod to let it reflect
Restart pinot-server-3 pod:
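For example (namespace is an assumption):
kubectl delete pod pinot-server-3 -n pinot-quickstart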
Recheck PVC size:
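For example (namespace is an assumption):
kubectl get pvc data-pinot-server-3 -n pinot-quickstart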
Running on Azure
This starter guide provides a quick start for running Pinot on Microsoft Azure
This document provides basic instructions to set up a Kubernetes cluster on Azure Kubernetes Service (AKS).
1. Tooling Installation
Running on GCP
This starter provides a quick start for running Pinot on Google Cloud Platform (GCP)
This document provides basic instructions to set up a Kubernetes cluster on Google Kubernetes Engine (GKE).
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
ba5cb0868350 apachepinot/pinot:0.9.3 "./bin/pinot-admin.s…" About a minute ago Up About a minute 8096-8099/tcp, 9000/tcp pinot-server
698f160852f9 apachepinot/pinot:0.9.3 "./bin/pinot-admin.s…" About a minute ago Up About a minute 8096-8098/tcp, 9000/tcp, 0.0.0.0:8099->8099/tcp, :::8099->8099/tcp pinot-broker
b1ba8cf60d69 apachepinot/pinot:0.9.3 "./bin/pinot-admin.s…" About a minute ago Up About a minute 8096-8099/tcp, 0.0.0.0:9000->9000/tcp, :::9000->9000/tcp pinot-controller
54e7e114cd53 zookeeper:3.5.6 "/docker-entrypoint.…" About a minute ago Up About a minute 2888/tcp, 3888/tcp, 0.0.0.0:2181->2181/tcp, :::2181->2181/tcp, 8080/tcp pinot-zookeeper
AKS_RESOURCE_GROUP=pinot-demo
AKS_RESOURCE_GROUP_LOCATION=eastus
az group create --name ${AKS_RESOURCE_GROUP} \
--location ${AKS_RESOURCE_GROUP_LOCATION}
AKS_RESOURCE_GROUP=pinot-demo
AKS_CLUSTER_NAME=pinot-quickstart
az aks create --resource-group ${AKS_RESOURCE_GROUP} \
--name ${AKS_CLUSTER_NAME} \
--node-count 3
AKS_RESOURCE_GROUP=pinot-demo
AKS_CLUSTER_NAME=pinot-quickstart
az aks get-credentials --resource-group ${AKS_RESOURCE_GROUP} \
--name ${AKS_CLUSTER_NAME}
kubectl get nodes
AKS_RESOURCE_GROUP=pinot-demo
AKS_CLUSTER_NAME=pinot-quickstart
az aks delete --resource-group ${AKS_RESOURCE_GROUP} \
--name ${AKS_CLUSTER_NAME}
If numDocsScanned is very high, that means the selectivity for the query is low and lots of documents need to be processed after the filtering. Consider refining the filter to increase the selectivity of the query.
If the query is not executing, you can extend the query timeout by appending a timeoutMs parameter to the query (eg: select * from mytable limit 10 option(timeoutMs=60000)). Then you can repeat step 1.
You can also look at GC stats for the corresponding Pinot servers. If a particular server seems to be running full GC all the time, you can do a couple of things such as
Increase JVM heap (Xmx)
Consider using off-heap memory for segments
Decrease the total number of segments per server (by partitioning the data in a better way)
I get the following error when running a query, what does it mean?
This essentially implies that the Pinot broker assigned to the table specified in the query was not found. A common root cause is a typo in the table name in the query. Another, less common reason could be that there isn't actually a broker with the required broker tenant tag for the table.
What are all the fields in the Pinot query's JSON response?
Here's the page explaining the Pinot response format:
SQL Query fails with "Encountered 'timestamp' was expecting one of..."
"timestamp" is a reserved keyword in SQL. Escape timestamp with double quotes.
Other commonly encountered reserved keywords are date, time, table.
Filtering on STRING column WHERE column = "foo" does not work?
For filtering on STRING columns, use single quotes: WHERE column = 'foo'
ORDER BY using an alias doesn't work?
The fields in the ORDER BY clause must be one of the group by clauses or aggregations, BEFORE applying the alias. Therefore, this will not work
Instead, this will work
Does pagination work in GROUP BY queries?
No. Pagination only works for SELECTION queries
How do I increase the timeout for a query?
You can add this at the end of your query: option(timeoutMs=X). For example, the following query will use a timeout of 20 seconds:
You can also use SET "timeoutMs" = 20000; SELECT COUNT(*) from myTable
To change the timeout for the entire cluster, set the property pinot.broker.timeoutMs in either the broker configs or the cluster configs (using the POST /cluster/configs API from Swagger).
How do I cancel a query?
Add these two configs for the Pinot server and broker to start tracking running queries. The query tracking entries are added and cleaned up as queries start and end, so they should not consume many resources.
Then use the Rest APIs on Pinot controller to list running queries and cancel them via the query ID and broker ID (as query ID is only local to broker), like below:
How do I optimize my Pinot table for doing aggregations and group-by on high cardinality columns?
In order to speed up aggregations, you can enable metrics aggregation on the required column by adding it as a metric field in the corresponding schema and setting aggregateMetrics to true in the table config. You can also use a star-tree index config for such columns.
How do I verify that an index is created on a particular column ?
There are 2 ways to verify this:
Log in to a server that hosts segments of this table. Inside the data directory, locate the segment directory for this table. In this directory, there is a file named index_map which lists all the indexes and other data structures created for each segment. Verify that the requested index is present here.
During query: use the column in a filter predicate and check the value of numEntriesScannedInFilter. If this value is 0, then the index is working as expected (this check applies to inverted indexes).
Does Pinot use a default value for LIMIT in queries?
Yes, Pinot uses a default value of LIMIT 10 in queries. The reason behind this default value is to avoid unintentionally submitting expensive queries that end up fetching or processing a lot of data from Pinot. Users can always overwrite this by explicitly specifying a LIMIT value.
Does Pinot cache query results?
Pinot does not cache query results, each query is computed in its entirety. Note though, running the same or similar query multiple times will naturally pull in segment pages into memory making subsequent calls faster. Also, for realtime systems, the data is changing in realtime, so results cannot be cached. For offline-only systems, caching layer can be built on top of Pinot, with invalidation mechanism built-in to invalidate the cache when data is pushed into Pinot.
I'm noticing that the first query is slower than subsequent queries, why is that?
Pinot memory maps segments. It warms up during the first query, when segments are pulled into the memory by the OS. Subsequent queries will have the segment already loaded in memory, and hence will be faster. The OS is responsible for bringing the segments into memory, and also removing them in favor of other segments when other segments not already in memory are accessed.
How do I determine if StarTree index is being used for my query?
The query execution engine will prefer to use StarTree index for all queries where it can be used. The criteria to determine whether StarTree index can be used is as follows:
All aggregation function + column pairs in the query must exist in the StarTree index.
All dimensions that appear in filter predicates and group-by should be StarTree dimensions.
For queries where above is true, StarTree index is used. For other queries, the execution engine will default to using the next best index available.
Stream ingestion example
The Docker instructions on this page are still WIP
So far, we have set up our cluster, run some queries on the demo tables, and explored the admin endpoints. We also uploaded some sample batch data for the transcript table.
Now, it's time to ingest from a sample stream into Pinot. The rest of the instructions assume you're using Pinot in Docker.
Data Stream
First, we need to set up a stream. Pinot has out-of-the-box realtime ingestion support for Kafka. Other streams can be plugged in as well.
SELECT count(colA) as aliasA, colA from tableA GROUP BY colA ORDER BY aliasA
SELECT count(colA) as sumA, colA from tableA GROUP BY colA ORDER BY count(colA)
SELECT COUNT(*) from myTable option(timeoutMs=20000)
pinot.server.enable.query.cancellation=true // false by default
pinot.broker.enable.query.cancellation=true // false by default
GET /queries: to show running queries as tracked by all brokers
Response example: `{
"Broker_192.168.0.105_8000": {
"7": "select G_old from baseballStats limit 10",
"8": "select G_old from baseballStats limit 100"
}
}`
DELETE /query/{brokerId}/{queryId}[?verbose=false/true]: to cancel a running query
with queryId and brokerId. The verbose is false by default, but if set to true,
responses from servers running the query also return.
Response example: `Cancelled query: 8 with responses from servers:
{192.168.0.105:7501=404, 192.168.0.105:7502=200, 192.168.0.105:7500=200}`
Let's set up a demo Kafka cluster locally and create a sample topic, transcript-topic.
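A minimal sketch using the scripts from a local Kafka download (broker address and partition settings are illustrative):
bin/kafka-topics.sh --create --topic transcript-topic --bootstrap-server localhost:9092 --partitions 1 --replication-factor 1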
If you followed the Batch upload sample data, you have already pushed a schema for your sample table. If not, head over to Creating a schema on that page, to learn how to create a schema for your sample data.
Creating a table config
If you followed Batch upload sample data, you learned how to push an offline table and schema. Similar to the offline table config, we will create a realtime table config for the sample. Here's the realtime table config for the transcript table. For a more detailed overview, check out Table.
Uploading your schema and table config
Now that we have our table and schema, let's upload them to the cluster. As soon as the realtime table is created, it will begin ingesting from the Kafka topic.
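A minimal sketch using the admin launcher (file names and paths are assumptions, point them at your schema and realtime table config):
bin/pinot-admin.sh AddTable \
  -schemaFile /tmp/pinot-quick-start/transcript-schema.json \
  -tableConfigFile /tmp/pinot-quick-start/transcript-table-realtime.json \
  -exec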
Loading sample data into stream
Here's a JSON file for transcript table data:
Push sample JSON into Kafka topic, using the Kafka script from the Kafka download
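For example (file path and broker address are assumptions):
bin/kafka-console-producer.sh --broker-list localhost:9092 --topic transcript-topic < /tmp/pinot-quick-start/rawdata/transcript.json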
Ingesting streaming data
As soon as data flows into the stream, the Pinot table will consume it and it will be ready for querying. Head over to the Query Console to check out the realtime data.
This section describes quick start commands that launch all Pinot components in a single process.
Pinot ships with QuickStart commands that launch Pinot components in a single process and import pre-built datasets. These QuickStarts are a good place to start if you're just getting started with Pinot.
By default, the macOS AirPlay receiver server runs on port 7000, which is also the port used by the Pinot server in the quick start. You may see the following error when running these examples:
If you disable the Airplay receiver server and try again, you shouldn't see this error message anymore.
Batch
This example demonstrates how to do batch processing with Pinot. The command:
Starts Apache Zookeeper, Pinot Controller, Pinot Broker, and Pinot Server.
Creates the baseballStats table
Launches a standalone data ingestion job that builds one segment for a given CSV data file for the baseballStats table and pushes the segment to the Pinot Controller.
Batch JSON
This example demonstrates how to import and query JSON documents in Pinot. The command:
Starts Apache Zookeeper, Pinot Controller, Pinot Broker, and Pinot Server.
Creates the githubEvents table
Launches a standalone data ingestion job that builds one segment for a given JSON data file for the githubEvents table and pushes the segment to the Pinot Controller.
Batch with complex data types
This example demonstrates how to do batch processing in Pinot where the data items have complex fields that need to be unnested. The command:
Starts Apache Zookeeper, Pinot Controller, Pinot Broker, and Pinot Server.
Creates the githubEvents table
Launches a standalone data ingestion job that builds one segment for a given JSON data file for the githubEvents table and pushes the segment to the Pinot Controller.
Streaming
This example demonstrates how to do stream processing with Pinot. The command:
This example demonstrates how to do stream processing in Pinot with RealtimeToOfflineSegmentsTask and MergeRollupTask minion tasks continuously optimizing segments as data gets ingested. The command:
This example demonstrates how to do stream processing in Pinot where the stream contains items that have complex fields that need to be unnested. The command:
Launches a standalone data ingestion job that builds segments under a given directory of Avro files for the airlineStats table and pushes the segments to the Pinot Controller.
Join
This example demonstrates how to do joins in Pinot using the lookup UDF join. The command:
Starts Apache Zookeeper, Pinot Controller, Pinot Broker, and Pinot Server in the same container.
Creates the baseballStats table
Launches a data ingestion job that builds one segment for a given CSV data file for the baseballStats table and pushes the segment to the Pinot Controller.
Ingestion FAQ
Data processing
What is a good segment size?
While Pinot can work with segments of various sizes, for optimal use of Pinot you want your segments sized in the 100MB to 500MB (untarred/uncompressed) range. Note that having too many (thousands or more) tiny segments for a single table just creates more overhead in terms of metadata storage in Zookeeper as well as in the Pinot servers' heap. At the same time, having too few really large (multi-GB) segments reduces the parallelism of query execution, as on the server side the thread parallelism of query execution is at the segment level.
Publishes data to a Kafka topic meetupRSVPEvents that is subscribed to by Pinot.
Issues sample queries to Pinot
Publishes data to a Kafka topic meetupRSVPEvents that is subscribed to by Pinot
Issues sample queries to Pinot
Publishes data to a Kafka topic githubEvents that is subscribed to by Pinot.
Issues sample queries to Pinot
Publishes data to a Kafka topic meetupRSVPEvents that is subscribed to by Pinot.
Issues sample queries to Pinot
Publishes data to a Kafka topic meetupRSVPEvents that is subscribed to by Pinot
Issues sample queries to Pinot
Publishes data to a Kafka topic meetupRSVPEvents that is subscribed to by Pinot
Issues sample queries to Pinot
Launches a stream of flights stats
Publishes data to a Kafka topic airlineStatsEvents that is subscribed to by Pinot.
Issues sample queries to Pinot
Creates the dimBaseballTeams table
Launches a data ingestion job that builds one segment for a given CSV data file for the dimBaseballTeams table and pushes the segment to the Pinot Controller.
Can multiple Pinot tables consume from the same Kafka topic?
Yes. Each table can be independently configured to consume from any given Kafka topic, regardless of whether there are other tables that are also consuming from the same Kafka topic.
If I add a partition to a Kafka topic, will Pinot automatically ingest data from this partition?
Pinot automatically detects new partitions in Kafka topics. It checks for new partitions whenever RealtimeSegmentValidationManager periodic job runs and starts consumers for new partitions.
You can configure the interval for this job using the controller.realtime.segment.validation.frequencyPeriod property in the controller configuration.
How do I enable partitioning in Pinot, when using Kafka stream?
The partitioning logic in the stream should match the partitioning config in Pinot. Kafka uses murmur2, and the equivalent in Pinot is the Murmur partition function.
Set the partitioning config as below, using the same column used in Kafka:
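A sketch of the corresponding table config snippet (the column name and partition count are illustrative):
"tableIndexConfig": {
  "segmentPartitionConfig": {
    "columnPartitionMap": {
      "memberId": {
        "functionName": "Murmur",
        "numPartitions": 3
      }
    }
  }
}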
and also set:
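A sketch of the routing snippet (pruner name as typically used for partition-based pruning):
"routing": {
  "segmentPrunerTypes": ["partition"]
}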
More details about how the partitioner works in Pinot can be found here.
How do I store BYTES column in JSON data?
For JSON, you can use a hex-encoded string to ingest BYTES.
How do I flatten my JSON Kafka stream?
See the json_format(field) function which can store a top level json field as a STRING in Pinot.
Then you can use these json functions during query time, to extract fields from the json string.
NOTE
This works well if some of your fields are nested json, but most of your fields are top level json keys. If all of your fields are within a nested JSON key, you will have to store the entire payload as 1 column, which is not ideal.
Is there a limit on the maximum length of a string column in Pinot?
By default, Pinot limits the length of a String column to 512 bytes. If you want to overwrite this value, you can set the maxLength attribute in the schema as follows:
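For example, in the field spec of the column (name and length are illustrative):
{
  "name": "textDimension",
  "dataType": "STRING",
  "maxLength": 1000
}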
When do new events become queryable after being ingested into a real-time table?
Events are available to queries as soon as they are ingested. This is because events are instantly indexed in memory upon ingestion.
The ingestion of events into the real-time table is not transactional, so replicas of the open segment are not immediately consistent. Pinot trades consistency for availability upon network partitioning (CAP theorem) to provide ultra-low ingestion latencies at high throughput.
However, when the open segment is closed and its in-memory indexes are flushed to persistent storage, all its replicas are guaranteed to be consistent, with the commit protocol.
How to reset a CONSUMING segment stuck on an offset which has expired from the stream?
This typically happens if
The consumer is lagging a lot
The consumer was down (server down, cluster down), and the stream moved on, resulting in offset not found when consumer comes back up.
In case of Kafka, to recover, set property "auto.offset.reset":"earliest" in the streamConfigs section and reset the CONSUMING segment. See Realtime table configs for more details about the config.
You can also use the "Resume Consumption" endpoint with the "resumeFrom" parameter set to "smallest" (or "largest" if you want). Refer to Pause Stream Ingestion for more details.
Indexing
How to set inverted indexes?
Inverted indexes are set in the tableConfig's tableIndexConfig -> invertedIndexColumns list. For documentation on table config, see Table Config Reference. For an example showing how to configure an inverted index, see Inverted Index.
Applying inverted indexes to a table config will generate an inverted index for all new segments. To apply the inverted indexes to all existing segments, see How to apply an inverted index to existing segments?
How to apply an inverted index to existing segments?
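To apply the new index to segments that already exist, invoke a segment reload from the controller. A minimal sketch (host and table name are illustrative; the API also accepts a table type parameter):
curl -X POST "http://localhost:9000/segments/myTable/reload"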
Once you've done that, you can check whether the index has been applied by querying the segment metadata API at http://localhost:9000/help#/Segment/getServerMetadata. Don't forget to include the names of the columns on which you have applied the index.
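For example (host, table, and column names are illustrative; see the Swagger entry referenced above for the exact parameters):
curl -X GET "http://localhost:9000/segments/myTable/metadata?columns=myColumn"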
The output from this API should look something like the following:
Can I retrospectively add an index to any segment?
Not all indexes can be retrospectively applied to existing segments.
The new segments will have star-tree indexes generated after applying the star-tree index configs to the table config. Currently, Pinot does not support adding star-tree indexes to the existing segments.
Handling time in Pinot
How does Pinot’s real-time ingestion handle out-of-order events?
Pinot does not require ordering of event timestamps. Out-of-order events are still consumed and indexed into the "currently consuming" segment. In a pathological case, if you have a 2-day-old event come in "now", it will still be stored in the segment that is open for consumption "now". There is no strict time-based partitioning for segments, but star-indexes and hybrid tables will handle this as appropriate.
See Components > Broker for more details about how hybrid tables handle this. Specifically, the time boundary is computed as max(OfflineTime) - 1 unit of granularity. Pinot does store the min-max time for each segment and uses it for pruning segments, so segments with multiple time intervals may not be perfectly pruned.
When generating star-indexes, the time column will be part of the star-tree so the tree can still be efficiently queried for segments with multiple time intervals.
Why does a hybrid table use an offset instead of max(OfflineTime) to determine the time boundary?
This lets you have a late event come in without building complex offline pipelines that perfectly partition your events by event timestamps. With this offset, even if your offline data pipeline produces segments with a maximum timestamp, Pinot will not use the offline dataset for that last chunk of segments. The expectation is that if you process the next time range of data offline, your data pipeline will include any late events.
Why are segments not strictly time-partitioned?
It might seem odd that segments are not strictly time-partitioned, unlike similar systems such as Apache Druid. This allows real-time ingestion to consume out-of-order events. Even though segments are not strictly time-partitioned, Pinot will still index, prune, and query segments intelligently by time intervals for the performance of hybrid tables and time-filtered data.
When generating offline segments, the segments should be generated such that each segment only contains one time interval and is well partitioned by the time column.
Failed to start a Pinot [SERVER]
java.lang.RuntimeException: java.net.BindException: Address already in use
at org.apache.pinot.core.transport.QueryServer.start(QueryServer.java:103) ~[pinot-all-0.9.0-jar-with-dependencies.jar:0.9.0-cf8b84e8b0d6ab62374048de586ce7da21132906]
at org.apache.pinot.server.starter.ServerInstance.start(ServerInstance.java:158) ~[pinot-all-0.9.0-jar-with-dependencies.jar:0.9.0-cf8b84e8b0d6ab62374048de586ce7da21132906]
at org.apache.helix.manager.zk.ParticipantManager.handleNewSession(ParticipantManager.java:110) ~[pinot-all-0.9.0-jar-with-dependencies.jar:0.9.0-cf8b84e8b0d6ab62374048de586ce7da2113
How much heap should I allocate for my Pinot instances?
Typically, Pinot components try to use as much off-heap memory (MMAP/DirectMemory) as possible. For example, Pinot servers load segments in memory-mapped files in MMAP mode (recommended), or in direct memory in HEAP mode. Heap memory is used mostly for query execution and storing some metadata. We have seen production deployments with high throughput and low latency work well with just 16 GB of heap for Pinot servers and brokers. The Pinot controller may also cache some metadata (table configs etc.) in heap, so if there are just a few tables in the Pinot cluster, a few GB of heap should suffice.
DR
Does Pinot provide any backup/restore mechanism?
Pinot relies on deep storage for storing a backup copy of segments (offline as well as realtime). It relies on Zookeeper to store metadata (table configs, schema, cluster state, etc.). It does not explicitly provide tools to take backups or restore this data, but relies on the deep storage (ADLS/S3/GCP/etc.) and ZK to persist the data and metadata.
Alter Table
Can I change a column name in my table, without losing data?
Changing a column name or data type is considered a backward-incompatible change. While Pinot does support schema evolution for backward-compatible changes, it does not support backward-incompatible changes like changing the name or data type of a column.
How to change number of replicas of a table?
You can change the number of replicas by updating the table config's segmentsConfig section. Make sure you have at least as many servers as the replication.
For an OFFLINE table, update:
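A sketch of the snippet, with an illustrative value:
"segmentsConfig": {
  "replication": "3"
}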
For a REALTIME table, update:
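A sketch of the snippet, with an illustrative value:
"segmentsConfig": {
  "replicasPerPartition": "3"
}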
After changing the replication, run a table rebalance.
Rebalance
How to run a rebalance on a table?
Refer to the rebalance documentation.
Why does my REALTIME table not use the new nodes I added to the cluster?
Likely explanation: num partitions * num replicas < num servers
In realtime tables, segments of the same partition always continue to remain on the same node. This sticky assignment is needed for replica groups and is critical if using upserts. For instance, if you have 3 partitions, 1 replica, and 4 nodes, only 3 of the 4 nodes will be used: all of p0's segments will be on one node, p1 on another, and p2 on another. One server will be unused, and will remain unused through rebalances.
There's nothing we can do about CONSUMING segments; they will continue to use only 3 nodes if you have 3 partitions. But we can rebalance such that completed segments use all nodes. If you want to force the completed segments of the table to use the new server, use this config:
Segments
How to control number of segments generated?
The number of segments generated depends on the number of input files. If you provide only 1 input file, you will get 1 segment. If you break up the input file into multiple files, you will get as many segments as the input files.
What are the common reasons my segment is in a BAD state?
This typically happens when the server is unable to load the segment. Possible causes: out of memory, no disk space, inability to download the segment from deep store, or similar errors. Please check the server logs for more information.
How to reset a segment when it runs into a BAD state?
Use the segment reset controller REST API to reset the segment:
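For example (table and segment names are illustrative):
curl -X POST "http://localhost:9000/segments/myTable_REALTIME/mySegmentName/reset"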
How to pause realtime ingestion?
Refer to Pause Stream Ingestion.
What's the difference between resetting, refreshing, and reloading a segment?
RESET: this gets a segment in ERROR state back to ONLINE or CONSUMING state. Behind the scenes, Pinot controller takes the segment to OFFLINE state, waits for External View to stabilize, and then moves it back to ONLINE/CONSUMING state, thus effectively resetting segments or consumers in error states.
REFRESH: this replaces the segment with a new one, with the same name but often different data. Under the hood, Pinot controller sets new segment metadata in Zookeeper, and notifies brokers and servers to check their local states about this segment and update accordingly. Servers also download the new segment to replace the old one, when both have different checksums. There is no separate rest API for refreshing, and it is done as part of SegmentUpload API today.
RELOAD: this reloads the segment, often to generate a new index as updated in the table config. Under the hood, the Pinot server gets the new table config from Zookeeper and uses it to guide the segment reload. In fact, the last step of REFRESH as explained above is to load the segment into memory to serve queries. There is a dedicated REST API for reloading. By default, it doesn't download the segment, but an option is provided to force the server to download the segment and replace the local one cleanly.
In addition, RESET brings the segment OFFLINE temporarily; while REFRESH and RELOAD swap the segment on server atomically without bringing down the segment or affecting ongoing queries.
Tenants
How can I make brokers/servers join the cluster without the DefaultTenant tag?
Set this property in your controller.conf file
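A sketch, assuming the tenant isolation flag is the relevant property:
cluster.tenant.isolation.enable=false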
Now your brokers and servers should join the cluster as broker_untagged and server_untagged. You can then directly use the POST /tenants API to create the desired tenants.
Minion
How to tune minion task timeout and parallelism on each worker
There are two task configs, set as part of the cluster configs as shown below. One controls the task's overall timeout (1 hour by default) and the other controls how many tasks can run concurrently on a single minion worker (1 by default). The <taskType> is the task to tune, e.g. MergeRollupTask or RealtimeToOfflineSegmentsTask.
How do I manually run a Periodic Task?
Refer to the Periodic Task documentation.
Tuning and Optimizations
Do replica groups work for real-time?
Yes, replica groups work for realtime tables. There are two parts to enabling replica groups:
Replica groups segment assignment
Replica group query routing
Replica group segment assignment
Replica group segment assignment is achieved in realtime tables if the number of servers is a multiple of the number of replicas. The partitions get uniformly sprayed across the servers, creating replica groups.
For example, consider we have 6 partitions, 2 replicas, and 4 servers.
      r1    r2
p1    S0    S1
p2    S2    S3
p3    S0    S1
p4    S2    S3
p5    S0    S1
p6    S2    S3
As you can see, the set (S0, S2) contains r1 of every partition, and (S1, S3) contains r2 of every partition. The query will only be routed to one of the sets, and not span every server.
If you are adding/removing servers from an existing table setup, you have to run a rebalance for the segment assignment changes to take effect.
Replica group query routing
Once replica group segment assignment is in effect, query routing can take advantage of it. For replica-group-based query routing, set the following in the table config's routing section, and then restart the brokers:
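A sketch of the routing snippet (selector name as typically used for replica-group routing):
"routing": {
  "instanceSelectorType": "replicaGroup"
}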
Credential
How to update credentials for the realtime upstream without downtime
Using "POST /cluster/configs" API on CLUSTER tab in Swagger, with this payload
{
"<taskType>.timeoutMs": "600000",
"<taskType>.numConcurrentTasksPerInstance": "4"
}
Step-by-step guide on pushing your own data into the Pinot cluster
So far, we have set up our cluster, run some queries, and explored the admin endpoints. Now, it's time to get our own data into Pinot. The rest of the instructions assume you're using Pinot in Docker.
Preparing your data
Let's gather our data files and put them in pinot-quick-start/rawdata.
Supported file formats are CSV, JSON, AVRO, PARQUET, THRIFT, ORC. If you don't have sample data, you can use this sample CSV.
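For example, using the directory layout assumed by the job spec later in this guide (the file name is illustrative):
mkdir -p /tmp/pinot-quick-start/rawdata
cp transcript.csv /tmp/pinot-quick-start/rawdata/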
Creating a schema
A schema is used to define the columns and data types of the Pinot table. A detailed overview of the schema can be found in Schema.
Briefly, we categorize our columns into 3 types:
Column Type
Description
For example, in our sample table, the studentID, firstName, lastName, gender, and subject columns are the dimensions, the score column is the metric, and timestampInEpoch is the time column.
Once you have identified the dimensions, metrics and time columns, create a schema for your data, using the reference below.
Creating a table config
A table config is used to define the config related to the Pinot table. A detailed overview of the table can be found in Table.
Here's the table config for the sample CSV file. You can use this as a reference to build your own table config. Simply edit the tableName and schemaName.
Uploading your table config and schema
Check the directory structure so far
Upload the table config using the following command
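A minimal sketch using the admin launcher (file names, host, and port are assumptions):
bin/pinot-admin.sh AddTable \
  -schemaFile /tmp/pinot-quick-start/transcript-schema.json \
  -tableConfigFile /tmp/pinot-quick-start/transcript-table-offline.json \
  -controllerHost localhost -controllerPort 9000 \
  -exec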
Check out the table config and schema in the Rest API to make sure they were successfully uploaded.
Creating a segment
A Pinot table's data is stored as Pinot segments. A detailed overview of the segment can be found in Segment.
To generate a segment, we need to first create a job spec YAML file. The job spec YAML file has all the information regarding the data format, input data location, and Pinot cluster coordinates. You can just copy over this job spec file. If you're using your own data, be sure to 1) replace transcript with your table name and 2) set the right recordReaderSpec.
Use the following command to generate a segment and upload it
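For example (the spec path is an assumption, point it at the job spec you created above):
bin/pinot-admin.sh LaunchDataIngestionJob -jobSpecFile /tmp/pinot-quick-start/batch-job-spec.yml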
Sample output
Check that your segment made it to the table using the Rest API.
Querying your data
You're all set! You should see your table in the Query Console and be able to run queries against it now.
Running in Kubernetes
Pinot quick start in Kubernetes
1. Prerequisites
This quickstart assumes that you already have a running Kubernetes cluster. Please follow the links below to set up a Kubernetes cluster.
Before continuing, please make sure that you've downloaded Apache Pinot. The scripts for the setup in this guide can be found in our open source project on GitHub.
The scripts can be found in the Pinot source at ./pinot/kubernetes/helm
NOTE: Please specify StorageClass based on your cloud vendor. For Pinot Server, please don't mount blob store like AzureFile/GoogleCloudStorage/S3 as the data serving file system.
Only use Amazon EBS/GCP Persistent Disk/Azure Disk style disks.
For AWS: "gp2"
For GCP: "pd-ssd" or "standard"
For Azure: "AzureDisk"
For Docker-Desktop: "hostpath"
2.1.1 Update helm dependency
2.1.2 Start Pinot with Helm
For Helm v2.12.1
If your Kubernetes cluster is recently provisioned, ensure Helm is initialized by running:
Then deploy a new HA Pinot cluster using the following command:
2.2 Check Pinot deployment status
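For example (namespace is an assumption):
kubectl get all -n pinot-quickstart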
3. Load data into Pinot using Kafka
3.1 Bring up a Kafka cluster for real-time data ingestion
3.2 Check Kafka deployment status
Ensure the Kafka deployment is ready before executing the scripts in the following steps.
3.3 Create Kafka topics
The scripts below will create two Kafka topics for data ingestion:
3.4 Load data into Kafka and create Pinot schema/tables
The script below will deploy 3 batch jobs.
Ingest 19492 JSON messages to Kafka topic flights-realtime at a speed of 1 msg/sec
Ingest 19492 Avro messages to Kafka topic flights-realtime-avro at a speed of 1 msg/sec
Upload Pinot schema airlineStats
Create Pinot table airlineStats to ingest data from JSON encoded Kafka topic flights-realtime
Create Pinot table airlineStatsAvro to ingest data from Avro encoded Kafka topic flights-realtime-avro
4. Query using Pinot Data Explorer
4.1 Pinot Data Explorer
Please use the script below to perform local port-forwarding, which will also open Pinot query console in your default web browser.
This script can be found in the Pinot source at ./pinot/kubernetes/helm/pinot
5. Using Superset to query Pinot
5.1 Bring up Superset using helm
Install SuperSet Helm Repo
Get Helm values config file:
Edit the /tmp/superset-values.yaml file and add the pinotdb pip dependency to the bootstrapScript field, so that Superset will install the Pinot dependencies at bootstrap time.
You can also build your own image with this dependency or just use image: apachepinot/pinot-superset:latest instead.
Also remember to change the admin credentials inside the init section to a meaningful user profile and a stronger password.
Install Superset using helm
Ensure your cluster is up by running:
5.2 Access Superset UI
You can run the command below to port-forward Superset to your localhost:18088. Then you can navigate to Superset in your browser with the previously set admin credentials.
Once the database is added, you can add more data sets and explore the dashboarding.
6. Access Pinot using Trino
6.1 Deploy Trino
You can run the command below to deploy Trino with the Pinot plugin installed.
The above command adds Trino HelmChart repo. You can then run the below command to see the charts.
In order to connect Trino to Pinot, we need to add Pinot catalog, which requires extra configurations. You can run the below command to get all the configurable values.
To add Pinot catalog, you can edit the additionalCatalogs section by adding:
Pinot is deployed at namespace pinot-quickstart, so the controller serviceURL is pinot-controller.pinot-quickstart:9000
After modifying the /tmp/trino-values.yaml file, you can deploy Trino with:
Once Trino is deployed, you can check the deployment status with:
6.2 Query Trino using Trino CLI
Once Trino is deployed, you can run the below command to get a runnable Trino CLI.
6.2.1 Download Trino CLI
6.2.2 Port forward Trino service to your local if it's not already exposed
6.2.3 Use Trino console client to connect to Trino service
6.2.4 Query Pinot data using Trino CLI
6.3 Sample queries to execute
List all catalogs
List All tables
Show schema
Count total documents
7. Access Pinot using Presto
7.1 Deploy Presto using Pinot plugin
You can run the command below to deploy a customized Presto with the Pinot plugin installed.
The above command deploys Presto with default configs. For customizing your deployment, you can run the below command to get all the configurable values.
After modifying the /tmp/presto-values.yaml file, you can deploy Presto with:
Once Presto is deployed, you can check the deployment status with:
Sample Output of K8s Deployment Status
7.2 Query Presto using Presto CLI
Once Presto is deployed, you can run the commands below, or just follow steps 6.2.1 to 6.2.3.
6.2.1 Download Presto CLI
6.2.2 Port forward presto-coordinator port 8080 to localhost port 18080
6.2.3 Start Presto CLI with pinot catalog to query it then query it
6.2.4 Query Pinot data using Presto CLI
7.3 Sample queries to execute
List all catalogs
List All tables
Show schema
Count total documents
8. Deleting the Pinot cluster in Kubernetes
Note: These are sample configs to be used as reference. For production setup, you may want to customize it to your needs.
Dimensions
Typically used in filters and group by, for slicing and dicing into data
Metrics
Typically used in aggregations, represents the quantitative data
Time
Optional column, represents the timestamp associated with each row
SegmentGenerationJobSpec:
!!org.apache.pinot.spi.ingestion.batch.spec.SegmentGenerationJobSpec
excludeFileNamePattern: null
executionFrameworkSpec: {extraConfigs: null, name: standalone, segmentGenerationJobRunnerClassName: org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner,
segmentTarPushJobRunnerClassName: org.apache.pinot.plugin.ingestion.batch.standalone.SegmentTarPushJobRunner,
segmentUriPushJobRunnerClassName: org.apache.pinot.plugin.ingestion.batch.standalone.SegmentUriPushJobRunner}
includeFileNamePattern: glob:**\/*.csv
inputDirURI: /tmp/pinot-quick-start/rawdata/
jobType: SegmentCreationAndTarPush
outputDirURI: /tmp/pinot-quick-start/segments
overwriteOutput: true
pinotClusterSpecs:
- {controllerURI: 'http://localhost:9000'}
pinotFSSpecs:
- {className: org.apache.pinot.spi.filesystem.LocalPinotFS, configs: null, scheme: file}
pushJobSpec: null
recordReaderSpec: {className: org.apache.pinot.plugin.inputformat.csv.CSVRecordReader,
configClassName: org.apache.pinot.plugin.inputformat.csv.CSVRecordReaderConfig,
configs: null, dataFormat: csv}
segmentNameGeneratorSpec: null
tableSpec: {schemaURI: 'http://localhost:9000/tables/transcript/schema', tableConfigURI: 'http://localhost:9000/tables/transcript',
tableName: transcript}
Trying to create instance for class org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner
Initializing PinotFS for scheme file, classname org.apache.pinot.spi.filesystem.LocalPinotFS
Finished building StatsCollector!
Collected stats for 4 documents
Using fixed bytes value dictionary for column: studentID, size: 9
Created dictionary for STRING column: studentID with cardinality: 3, max length in bytes: 3, range: 200 to 202
Using fixed bytes value dictionary for column: firstName, size: 12
Created dictionary for STRING column: firstName with cardinality: 3, max length in bytes: 4, range: Bob to Nick
Using fixed bytes value dictionary for column: lastName, size: 15
Created dictionary for STRING column: lastName with cardinality: 3, max length in bytes: 5, range: King to Young
Created dictionary for FLOAT column: score with cardinality: 4, range: 3.2 to 3.8
Using fixed bytes value dictionary for column: gender, size: 12
Created dictionary for STRING column: gender with cardinality: 2, max length in bytes: 6, range: Female to Male
Using fixed bytes value dictionary for column: subject, size: 21
Created dictionary for STRING column: subject with cardinality: 3, max length in bytes: 7, range: English to Physics
Created dictionary for LONG column: timestampInEpoch with cardinality: 4, range: 1570863600000 to 1572418800000
Start building IndexCreator!
Finished records indexing in IndexCreator!
Finished segment seal!
Converting segment: /var/folders/3z/qn6k60qs6ps1bb6s2c26gx040000gn/T/pinot-1583443148720/output/transcript_OFFLINE_1570863600000_1572418800000_0 to v3 format
v3 segment location for segment: transcript_OFFLINE_1570863600000_1572418800000_0 is /var/folders/3z/qn6k60qs6ps1bb6s2c26gx040000gn/T/pinot-1583443148720/output/transcript_OFFLINE_1570863600000_1572418800000_0/v3
Deleting files in v1 segment directory: /var/folders/3z/qn6k60qs6ps1bb6s2c26gx040000gn/T/pinot-1583443148720/output/transcript_OFFLINE_1570863600000_1572418800000_0
Starting building 1 star-trees with configs: [StarTreeV2BuilderConfig[splitOrder=[studentID, firstName],skipStarNodeCreation=[],functionColumnPairs=[org.apache.pinot.core.startree.v2.AggregationFunctionColumnPair@3a48efdc],maxLeafRecords=1]] using OFF_HEAP builder
Starting building star-tree with config: StarTreeV2BuilderConfig[splitOrder=[studentID, firstName],skipStarNodeCreation=[],functionColumnPairs=[org.apache.pinot.core.startree.v2.AggregationFunctionColumnPair@3a48efdc],maxLeafRecords=1]
Generated 3 star-tree records from 4 segment records
Finished constructing star-tree, got 9 tree nodes and 4 records under star-node
Finished creating aggregated documents, got 6 aggregated records
Finished building star-tree in 10ms
Finished building 1 star-trees in 27ms
Computed crc = 3454627653, based on files [/var/folders/3z/qn6k60qs6ps1bb6s2c26gx040000gn/T/pinot-1583443148720/output/transcript_OFFLINE_1570863600000_1572418800000_0/v3/columns.psf, /var/folders/3z/qn6k60qs6ps1bb6s2c26gx040000gn/T/pinot-1583443148720/output/transcript_OFFLINE_1570863600000_1572418800000_0/v3/index_map, /var/folders/3z/qn6k60qs6ps1bb6s2c26gx040000gn/T/pinot-1583443148720/output/transcript_OFFLINE_1570863600000_1572418800000_0/v3/metadata.properties, /var/folders/3z/qn6k60qs6ps1bb6s2c26gx040000gn/T/pinot-1583443148720/output/transcript_OFFLINE_1570863600000_1572418800000_0/v3/star_tree_index, /var/folders/3z/qn6k60qs6ps1bb6s2c26gx040000gn/T/pinot-1583443148720/output/transcript_OFFLINE_1570863600000_1572418800000_0/v3/star_tree_index_map]
Driver, record read time : 0
Driver, stats collector time : 0
Driver, indexing time : 0
Tarring segment from: /var/folders/3z/qn6k60qs6ps1bb6s2c26gx040000gn/T/pinot-1583443148720/output/transcript_OFFLINE_1570863600000_1572418800000_0 to: /var/folders/3z/qn6k60qs6ps1bb6s2c26gx040000gn/T/pinot-1583443148720/output/transcript_OFFLINE_1570863600000_1572418800000_0.tar.gz
Size for segment: transcript_OFFLINE_1570863600000_1572418800000_0, uncompressed: 6.73KB, compressed: 1.89KB
Trying to create instance for class org.apache.pinot.plugin.ingestion.batch.standalone.SegmentTarPushJobRunner
Initializing PinotFS for scheme file, classname org.apache.pinot.spi.filesystem.LocalPinotFS
Start pushing segments: [/tmp/pinot-quick-start/segments/transcript_OFFLINE_1570863600000_1572418800000_0.tar.gz]... to locations: [org.apache.pinot.spi.ingestion.batch.spec.PinotClusterSpec@243c4f91] for table transcript
Pushing segment: transcript_OFFLINE_1570863600000_1572418800000_0 to location: http://localhost:9000 for table transcript
Sending request: http://localhost:9000/v2/segments?tableName=transcript to controller: nehas-mbp.hsd1.ca.comcast.net, version: Unknown
Response for pushing table transcript segment transcript_OFFLINE_1570863600000_1572418800000_0 to location http://localhost:9000 - 200: {"status":"Successfully uploaded segment: transcript_OFFLINE_1570863600000_1572418800000_0 of table: transcript"}
For Helm v3.0.0
2.1.3 Troubleshooting (For helm v2.12.1)
Error: Please run the below command if encountering the following issue:
Resolution:
Error: Please run the command below if encountering a permission issue:
Error: release pinot failed: namespaces "pinot-quickstart" is forbidden: User "system:serviceaccount:kube-system:default" cannot get resource "namespaces" in API group "" in the namespace "pinot-quickstart"