This quick start guide will help you bootstrap a Pinot standalone instance on your local machine.
In this guide, you'll learn how to download and install Apache Pinot as a standalone instance.
Download Apache Pinot
First, let's download the Pinot distribution for this tutorial. You can either download a packaged release or build a distribution from the source code.
Prerequisites
Install JDK11 or higher (JDK16 is not yet supported)
For JDK 8 support use Pinot 0.7.1 or compile from the source code.
You can build from source or download the distribution:
Download the latest binary release from the Apache Pinot downloads page, or use this command:
Once you have the tar file:
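For example (the version variable matches the download command shown in this guide):
tar -zxvf apache-pinot-$PINOT_VERSION-bin.tar.gz
cd apache-pinot-$PINOT_VERSION-bin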
You can find older versions of Apache Pinot in the release archives. For example, if you wanted to download Pinot 0.10.0, you could run the following command:
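For example, assuming older releases are kept in the Apache archive (the URL layout mirrors the main download site):
PINOT_VERSION=0.10.0
wget https://archive.apache.org/dist/pinot/apache-pinot-$PINOT_VERSION/apache-pinot-$PINOT_VERSION-bin.tar.gz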
Follow these steps to check out the code from the Apache Pinot GitHub repository and build Pinot locally.
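A rough sketch of the typical flow (the Maven profile shown is an assumption; check the project README for the exact build command):
git clone https://github.com/apache/pinot.git
cd pinot
mvn install package -DskipTests -Pbin-dist
# the launch scripts end up under the build directory used in the quick start below
cd build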
M1 Mac Support
Currently, Apache Pinot doesn't provide official binaries for M1 Macs. You can, however, build from source using the steps above. In addition, you will need to add the following to your ~/.m2/settings.xml prior to the build.
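A sketch of the kind of profile typically added, which forces an x86_64 os classifier so that native-dependency downloads resolve under Rosetta (the profile id and property name here are assumptions, adjust as needed):
<settings>
  <activeProfiles>
    <activeProfile>apple-silicon</activeProfile>
  </activeProfiles>
  <profiles>
    <profile>
      <id>apple-silicon</id>
      <properties>
        <os.detected.classifier>osx-x86_64</os.detected.classifier>
      </properties>
    </profile>
  </profiles>
</settings>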
Also make sure to install Rosetta:
softwareupdate --install-rosetta
Note that some installations of the JDK do not contain the JNI bindings necessary to run all tests. If you see a java.lang.UnsatisfiedLinkError while running tests, you may need to change your JDK. If using Homebrew, you can install AdoptOpenJDK 11 with: brew install --cask adoptopenjdk11
Now that we've downloaded Pinot, it's time to set up a cluster. There are two ways to do this:
Quick Start
Pinot comes with quick-start commands that launch instances of Pinot components in the same process and import pre-built datasets.
For example, the following quick-start launches Pinot with a baseball dataset pre-loaded:
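A minimal sketch (run from the distribution directory; see the quick start examples section below for the full list of types):
./bin/pinot-admin.sh QuickStart -type batch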
For a list of all the available quick starts, see the quick start examples section later in this guide.
Manual Cluster
If you want to play with bigger datasets (more than a few MB), you can launch all the components individually.
The video below is a step-by-step walk through for launching the individual components of Pinot and scaling them to multiple instances.
You can find the commands that are shown in this video in the GitHub repository.
The examples below assume that you are using Java 8.
If you are using Java 11+, remove the GC settings inside JAVA_OPTS. So, for example, instead of:
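# illustrative example; your actual flags may differ
export JAVA_OPTS="-Xms4G -Xmx8G -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -Xloggc:gc-pinot-controller.log"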
You'd have:
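export JAVA_OPTS="-Xms4G -Xmx8G"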
Start Zookeeper
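A minimal sketch using the admin launcher (run from the distribution directory; the port is illustrative):
./bin/pinot-admin.sh StartZookeeper -zkPort 2181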
You can use a Zookeeper browser tool to browse the Zookeeper instance.
Start Pinot Controller
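For example (addresses and ports are illustrative):
./bin/pinot-admin.sh StartController -zkAddress localhost:2181 -controllerPort 9000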
Start Pinot Broker
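For example (Zookeeper address is illustrative):
./bin/pinot-admin.sh StartBroker -zkAddress localhost:2181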
Start Pinot Server
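For example (Zookeeper address is illustrative):
./bin/pinot-admin.sh StartServer -zkAddress localhost:2181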
Start Kafka
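For example (the port and the Zookeeper path are illustrative):
./bin/pinot-admin.sh StartKafka -zkAddress localhost:2181/kafka -port 9092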
Once your cluster is up and running, you can head over to Exploring Pinot to learn how to run queries against the data.
Start Pinot Component in Debug Mode with IntelliJ
Starting a Pinot component of interest in IntelliJ in debug mode can be useful for development: you can set breakpoints and inspect variables. Taking the server as an example, you can start Zookeeper, the controller, and the broker using the steps in the manual cluster setup above, then use the following run configuration (placed under $PROJECT_DIR$/.run) to start the server. This is an example of how it can be used. Please replace the metrics-core version and cluster name as needed.
PINOT_VERSION=0.12.0 # set to the Pinot version you decide to use
wget https://downloads.apache.org/pinot/apache-pinot-$PINOT_VERSION/apache-pinot-$PINOT_VERSION-bin.tar.gz
This section contains quick start guides to help you get up and running with Pinot.
Running Pinot
To simplify the getting started experience, Pinot ships with quick start guides that launch Pinot components in a single process and import pre-built datasets.
# navigate to directory containing the setup scripts
cd build
Deploy to a public cloud
Data import examples
Getting data into Pinot is easy. Take a look at these two quick start guides, which will help you get up and running with sample data for offline and real-time tables.
This page has a collection of frequently asked questions with answers from the community.
This is a list of questions frequently asked in our troubleshooting channel on Slack. Please feel free to contribute your questions and answers here by making a pull request.
When data is pushed into Pinot, Pinot makes a backup copy of the data and stores it on the configured deep storage (S3/GCP/ADLS/NFS/etc.). This copy is stored as tar.gz Pinot segments. Note that Pinot servers keep an untarred copy of the segments on their local disk as well, for performance reasons.
How does Pinot use Zookeeper?
Pinot uses Apache Helix for cluster management, which in turn is built on top of Zookeeper. Helix uses Zookeeper to store the cluster state, including Ideal State, External View, Participants, etc. Besides that, Pinot uses Zookeeper to store other information such as Table configs, schema, Segment Metadata, etc.
Why am I getting "Could not find or load class" error when running Quickstart using 0.8.0 release?
Please check the JDK version you are using. The 0.8.0 release binary is built with JDK 11. You may be getting this error if you are using JDK 8. In that case, please consider using JDK 11, or download the source code for the release and build it locally.
Running Pinot in Docker
This guide will show you how to run a Pinot cluster using Docker.
In this guide we will learn about running Pinot in Docker.
This guide assumes that you have installed Docker and have configured it with enough memory. A sample config is shown below:
The latest Pinot Docker image is published at apachepinot/pinot:latest, and you can see a list of all published tags on Docker Hub.
You can pull the Docker image onto your machine by running the following command:
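For example:
docker pull apachepinot/pinot:latest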
The quick start scripts launch Pinot with minimal resources. If you want to play with bigger datasets (more than a few MB), you can launch each of the Pinot components individually.
Docker
Create a Network
Create an isolated bridge network in docker
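For example (the network name pinot-demo is just a label reused by the commands below):
docker network create -d bridge pinot-demo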
Start Zookeeper
Start Zookeeper in daemon mode. This is a single node zookeeper setup. Zookeeper is the central metadata store for Pinot and should be set up with replication for production use. For more information, see Running Replicated Zookeeper.
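A minimal sketch (the image tag matches the sample console output below; adjust as needed):
docker run -d --network=pinot-demo --name pinot-zookeeper --restart always -p 2181:2181 zookeeper:3.5.6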
Start Pinot Controller
Start Pinot Controller in daemon and connect to Zookeeper.
The command below expects a 4GB memory container. Tune -Xms and -Xmx if your machine doesn't have enough resources.
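A minimal sketch (image tag and heap sizes are illustrative):
docker run -d --network=pinot-demo --name pinot-controller -p 9000:9000 -e JAVA_OPTS="-Xms1G -Xmx4G" apachepinot/pinot:0.9.3 StartController -zkAddress pinot-zookeeper:2181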
Start Pinot Broker
Start Pinot Broker in daemon and connect to Zookeeper.
The command below expects a 4GB memory container. Tune -Xms and -Xmx if your machine doesn't have enough resources.
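A minimal sketch (image tag and heap sizes are illustrative):
docker run -d --network=pinot-demo --name pinot-broker -p 8099:8099 -e JAVA_OPTS="-Xms4G -Xmx4G" apachepinot/pinot:0.9.3 StartBroker -zkAddress pinot-zookeeper:2181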
Start Pinot Server
Start Pinot Server in daemon and connect to Zookeeper.
The command below expects a 16GB memory container. Tune -Xms and -Xmx if your machine doesn't have enough resources.
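A minimal sketch (image tag and heap sizes are illustrative):
docker run -d --network=pinot-demo --name pinot-server -e JAVA_OPTS="-Xms8G -Xmx16G" apachepinot/pinot:0.9.3 StartServer -zkAddress pinot-zookeeper:2181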
Start Kafka
Optionally, you can also start Kafka for setting up realtime streams. This brings up the Kafka broker on port 9092.
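One common approach is a Kafka image wired to the same Zookeeper; the image and environment variable names below are assumptions, so adjust them for whichever Kafka image you use:
docker run -d --network=pinot-demo --name kafka -p 9092:9092 -e KAFKA_ZOOKEEPER_CONNECT=pinot-zookeeper:2181/kafka -e KAFKA_BROKER_ID=0 -e KAFKA_ADVERTISED_HOST_NAME=kafka wurstmeister/kafka:latest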
Now all Pinot-related components are started as an empty cluster.
You can run the below command to check container status.
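For example:
docker container ls -a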
Sample Console Output
Docker Compose
Create a file called docker-compose.yml that contains the following:
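A minimal sketch of such a file (image tags, ports, and heap settings are illustrative):
version: '3.7'
services:
  pinot-zookeeper:
    image: zookeeper:3.5.6
    ports:
      - "2181:2181"
  pinot-controller:
    image: apachepinot/pinot:0.9.3
    command: "StartController -zkAddress pinot-zookeeper:2181"
    ports:
      - "9000:9000"
    depends_on:
      - pinot-zookeeper
  pinot-broker:
    image: apachepinot/pinot:0.9.3
    command: "StartBroker -zkAddress pinot-zookeeper:2181"
    ports:
      - "8099:8099"
    depends_on:
      - pinot-controller
  pinot-server:
    image: apachepinot/pinot:0.9.3
    command: "StartServer -zkAddress pinot-zookeeper:2181"
    depends_on:
      - pinot-broker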
Run the following command to launch all the components:
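For example, from the directory containing the file:
docker-compose up -d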
You can run the below command to check the container status.
Sample Console Output
Once your cluster is up and running, you can head over to Exploring Pinot to learn how to run queries against the data.
Is there any debug information available in Pinot?
Pinot offers various ways to assist with troubleshooting and debugging problems that might happen. It is recommended to start with the debug API, which may quickly surface some of the most commonly occurring problems. The debug API provides information such as tableSize, ingestion status, and any error messages related to state transitions on the server, among other things.
The table debug api can be invoked via the Swagger UI as follows:
Swagger - Table Debug Api
It can also be invoked directly by accessing the URL as follows. The API requires the tableName and can optionally take tableType (offline|realtime) and a verbosity level.
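For example (host, table name, and parameters are illustrative; see the Swagger entry above for the exact parameters):
curl -X GET "http://localhost:9000/debug/tables/myTable?type=offline&verbosity=0" -H "accept: application/json"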
Pinot also provides a wide variety of operational metrics that can be used for creating dashboards, alerting, and monitoring. Also, all Pinot components log debug information related to error conditions that can be used for troubleshooting.
How do I debug a slow query or a query which keeps timing out
Please use these steps:
If the query executes, look at the query result. Specifically look at numEntriesScannedInFilter and numDocsScanned.
If numEntriesScannedInFilter is very high, consider adding indexes for the corresponding columns being used in the filter predicates. You should also think about partitioning the incoming data based on the dimension most heavily used in your filter queries.
Pinot On Kubernetes FAQ
How to increase server disk size on AWS
Below is an example of AWS EKS.
1. Update Storage Class
In the K8s cluster, check the storage class: in AWS, it should be gp2.
Then update StorageClass to ensure:
Once StorageClass is updated, it should be like:
2. Update PVC
Once the storage class is updated, then we can update PVC for the server disk size.
Now we want to double the disk size for pinot-server-3.
Below is an example of current disks:
Below is the output of data-pinot-server-3
Now, let's change the PVC size to 2T by editing the server PVC.
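For example (the namespace is an assumption, use the one your Pinot release is deployed in):
kubectl edit pvc data-pinot-server-3 -n pinot-quickstart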
Once updated, the spec's PVC size is updated to 2T, but the status's PVC size is still 1T.
3. Restart pod to let it reflect
Restart pinot-server-3 pod:
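For example (namespace is an assumption):
kubectl delete pod pinot-server-3 -n pinot-quickstart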
Recheck PVC size:
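For example (namespace is an assumption):
kubectl get pvc data-pinot-server-3 -n pinot-quickstart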
Running on Azure
This starter guide provides a quick start for running Pinot on Microsoft Azure
This document provides basic instructions to set up a Kubernetes cluster on Azure Kubernetes Service (AKS).
1. Tooling Installation
Running on GCP
This starter provides a quick start for running Pinot on Google Cloud Platform (GCP)
This document provides basic instructions to set up a Kubernetes cluster on Google Kubernetes Engine (GKE).
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
ba5cb0868350 apachepinot/pinot:0.9.3 "./bin/pinot-admin.s…" About a minute ago Up About a minute 8096-8099/tcp, 9000/tcp pinot-server
698f160852f9 apachepinot/pinot:0.9.3 "./bin/pinot-admin.s…" About a minute ago Up About a minute 8096-8098/tcp, 9000/tcp, 0.0.0.0:8099->8099/tcp, :::8099->8099/tcp pinot-broker
b1ba8cf60d69 apachepinot/pinot:0.9.3 "./bin/pinot-admin.s…" About a minute ago Up About a minute 8096-8099/tcp, 0.0.0.0:9000->9000/tcp, :::9000->9000/tcp pinot-controller
54e7e114cd53 zookeeper:3.5.6 "/docker-entrypoint.…" About a minute ago Up About a minute 2888/tcp, 3888/tcp, 0.0.0.0:2181->2181/tcp, :::2181->2181/tcp, 8080/tcp pinot-zookeeper
AKS_RESOURCE_GROUP=pinot-demo
AKS_RESOURCE_GROUP_LOCATION=eastus
az group create --name ${AKS_RESOURCE_GROUP} \
--location ${AKS_RESOURCE_GROUP_LOCATION}
AKS_RESOURCE_GROUP=pinot-demo
AKS_CLUSTER_NAME=pinot-quickstart
az aks create --resource-group ${AKS_RESOURCE_GROUP} \
--name ${AKS_CLUSTER_NAME} \
--node-count 3
AKS_RESOURCE_GROUP=pinot-demo
AKS_CLUSTER_NAME=pinot-quickstart
az aks get-credentials --resource-group ${AKS_RESOURCE_GROUP} \
--name ${AKS_CLUSTER_NAME}
kubectl get nodes
AKS_RESOURCE_GROUP=pinot-demo
AKS_CLUSTER_NAME=pinot-quickstart
az aks delete --resource-group ${AKS_RESOURCE_GROUP} \
--name ${AKS_CLUSTER_NAME}
If numDocsScanned is very high, that means the selectivity for the query is low and lots of documents need to be processed after the filtering. Consider refining the filter to increase the selectivity of the query.
If the query is not executing, you can extend the query timeout by appending a timeoutMs parameter to the query (eg: select * from mytable limit 10 option(timeoutMs=60000)). Then you can repeat step 1.
You can also look at GC stats for the corresponding Pinot servers. If a particular server seems to be running full GC all the time, you can do a couple of things such as
Increase JVM heap (Xmx)
Consider using off-heap memory for segments
Decrease the total number of segments per server (by partitioning the data in a better way)
I get the following error when running a query, what does it mean?
This essentially implies that the Pinot broker assigned to the table specified in the query was not found. A common root cause is a typo in the table name in the query. Another, less common reason could be that there isn't actually a broker with the required broker tenant tag for the table.
What are all the fields in the Pinot query's JSON response?
Here's the page explaining the Pinot response format:
SQL Query fails with "Encountered 'timestamp' was expecting one of..."
"timestamp" is a reserved keyword in SQL. Escape timestamp with double quotes.
Other commonly encountered reserved keywords are date, time, table.
Filtering on STRING column WHERE column = "foo" does not work?
For filtering on STRING columns, use single quotes: WHERE column = 'foo'
ORDER BY using an alias doesn't work?
The fields in the ORDER BY clause must be one of the group by clauses or aggregations, BEFORE applying the alias. Therefore, this will not work
Instead, this will work
Does pagination work in GROUP BY queries?
No. Pagination only works for SELECTION queries
How do I increase the timeout for a query?
You can add this at the end of your query: option(timeoutMs=X). For example, the following query will use a timeout of 20 seconds:
You can also use SET "timeoutMs" = 20000; SELECT COUNT(*) from myTable
To change the timeout for the entire cluster, set the property pinot.broker.timeoutMs in either the broker configs or the cluster configs (using the POST /cluster/configs API from Swagger).
How do I cancel a query?
Add these two configs for the Pinot server and broker to start tracking running queries. The query tracking entries are added and cleaned up as queries start and end, so they should not consume many resources.
Then use the Rest APIs on Pinot controller to list running queries and cancel them via the query ID and broker ID (as query ID is only local to broker), like below:
How do I optimize my Pinot table for doing aggregations and group-by on high cardinality columns?
In order to speed up aggregations, you can enable metrics aggregation on the required column by adding it as a metric field in the corresponding schema and setting aggregateMetrics to true in the table config. You can also use a star-tree index config for such columns.
How do I verify that an index is created on a particular column ?
There are 2 ways to verify this:
Log in to a server that hosts segments of this table. Inside the data directory, locate the segment directory for this table. In this directory, there is a file named index_map which lists all the indexes and other data structures created for each segment. Verify that the requested index is present here.
During query: use the column in a filter predicate and check the value of numEntriesScannedInFilter. If this value is 0, then the index is working as expected (this check applies to inverted indexes).
Does Pinot use a default value for LIMIT in queries?
Yes, Pinot uses a default value of LIMIT 10 in queries. The reason behind this default value is to avoid unintentionally submitting expensive queries that end up fetching or processing a lot of data from Pinot. Users can always overwrite this by explicitly specifying a LIMIT value.
Does Pinot cache query results?
Pinot does not cache query results, each query is computed in its entirety. Note though, running the same or similar query multiple times will naturally pull in segment pages into memory making subsequent calls faster. Also, for realtime systems, the data is changing in realtime, so results cannot be cached. For offline-only systems, caching layer can be built on top of Pinot, with invalidation mechanism built-in to invalidate the cache when data is pushed into Pinot.
I'm noticing that the first query is slower than subsequent queries, why is that?
Pinot memory maps segments. It warms up during the first query, when segments are pulled into the memory by the OS. Subsequent queries will have the segment already loaded in memory, and hence will be faster. The OS is responsible for bringing the segments into memory, and also removing them in favor of other segments when other segments not already in memory are accessed.
How do I determine if StarTree index is being used for my query?
The query execution engine will prefer to use StarTree index for all queries where it can be used. The criteria to determine whether StarTree index can be used is as follows:
All aggregation function + column pairs in the query must exist in the StarTree index.
All dimensions that appear in filter predicates and group-by should be StarTree dimensions.
For queries where above is true, StarTree index is used. For other queries, the execution engine will default to using the next best index available.
Stream ingestion example
The Docker instructions on this page are still WIP
So far, we have set up our cluster, run some queries on the demo tables, and explored the admin endpoints. We also uploaded some sample batch data for the transcript table.
Now, it's time to ingest from a sample stream into Pinot. The rest of the instructions assume you're using Pinot in Docker.
Data Stream
First, we need to set up a stream. Pinot has out-of-the-box realtime ingestion support for Kafka. Other streams can be plugged in as well.
SELECT count(colA) as aliasA, colA from tableA GROUP BY colA ORDER BY aliasA
SELECT count(colA) as sumA, colA from tableA GROUP BY colA ORDER BY count(colA)
SELECT COUNT(*) from myTable option(timeoutMs=20000)
pinot.server.enable.query.cancellation=true // false by default
pinot.broker.enable.query.cancellation=true // false by default
GET /queries: to show running queries as tracked by all brokers
Response example: `{
"Broker_192.168.0.105_8000": {
"7": "select G_old from baseballStats limit 10",
"8": "select G_old from baseballStats limit 100"
}
}`
DELETE /query/{brokerId}/{queryId}[?verbose=false/true]: to cancel a running query
with queryId and brokerId. The verbose is false by default, but if set to true,
responses from servers running the query also return.
Response example: `Cancelled query: 8 with responses from servers:
{192.168.0.105:7501=404, 192.168.0.105:7502=200, 192.168.0.105:7500=200}`
Let's set up a demo Kafka cluster locally and create a sample topic, transcript-topic.
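A minimal sketch using the scripts from a local Kafka download (broker address and partition settings are illustrative):
bin/kafka-topics.sh --create --topic transcript-topic --bootstrap-server localhost:9092 --partitions 1 --replication-factor 1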
If you followed the Batch upload sample data, you have already pushed a schema for your sample table. If not, head over to Creating a schema on that page, to learn how to create a schema for your sample data.
Creating a table config
If you followed Batch upload sample data, you learned how to push an offline table and schema. Similar to the offline table config, we will create a realtime table config for the sample. Here's the realtime table config for the transcript table. For a more detailed overview, check out Table.
Uploading your schema and table config
Now that we have our table and schema, let's upload them to the cluster. As soon as the realtime table is created, it will begin ingesting from the Kafka topic.
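A minimal sketch using the admin launcher (file names and paths are assumptions, point them at your schema and realtime table config):
bin/pinot-admin.sh AddTable \
  -schemaFile /tmp/pinot-quick-start/transcript-schema.json \
  -tableConfigFile /tmp/pinot-quick-start/transcript-table-realtime.json \
  -exec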
Loading sample data into stream
Here's a JSON file for transcript table data:
Push sample JSON into Kafka topic, using the Kafka script from the Kafka download
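For example (file path and broker address are assumptions):
bin/kafka-console-producer.sh --broker-list localhost:9092 --topic transcript-topic < /tmp/pinot-quick-start/rawdata/transcript.json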
Ingesting streaming data
As soon as data flows into the stream, the Pinot table will consume it and it will be ready for querying. Head over to the Query Console to check out the realtime data.
This section describes quick start commands that launch all Pinot components in a single process.
Pinot ships with QuickStart commands that launch Pinot components in a single process and import pre-built datasets. These QuickStarts are a good place to start if you're just getting started with Pinot.
By default, the macOS AirPlay receiver server runs on port 7000, which is also the port used by the Pinot server in the quick start. You may see the following error when running these examples:
If you disable the Airplay receiver server and try again, you shouldn't see this error message anymore.
Batch
This example demonstrates how to do batch processing with Pinot. The command:
Starts Apache Zookeeper, Pinot Controller, Pinot Broker, and Pinot Server.
Creates the baseballStats table
Launches a standalone data ingestion job that builds one segment for a given CSV data file for the baseballStats table and pushes the segment to the Pinot Controller.
Batch JSON
This example demonstrates how to import and query JSON documents in Pinot. The command:
Starts Apache Zookeeper, Pinot Controller, Pinot Broker, and Pinot Server.
Creates the githubEvents table
Launches a standalone data ingestion job that builds one segment for a given JSON data file for the githubEvents table and pushes the segment to the Pinot Controller.
Batch with complex data types
This example demonstrates how to do batch processing in Pinot where the data items have complex fields that need to be unnested. The command:
Starts Apache Zookeeper, Pinot Controller, Pinot Broker, and Pinot Server.
Creates the githubEvents table
Launches a standalone data ingestion job that builds one segment for a given JSON data file for the githubEvents table and pushes the segment to the Pinot Controller.
Streaming
This example demonstrates how to do stream processing with Pinot. The command:
This example demonstrates how to do stream processing in Pinot with RealtimeToOfflineSegmentsTask and MergeRollupTask minion tasks continuously optimizing segments as data gets ingested. The command:
This example demonstrates how to do stream processing in Pinot where the stream contains items that have complex fields that need to be unnested. The command:
Launches a standalone data ingestion job that builds segments under a given directory of Avro files for the airlineStats table and pushes the segments to the Pinot Controller.
Join
This example demonstrates how to do joins in Pinot using the lookup UDF join. The command:
Starts Apache Zookeeper, Pinot Controller, Pinot Broker, and Pinot Server in the same container.
Creates the baseballStats table
Launches a data ingestion job that builds one segment for a given CSV data file for the baseballStats table and pushes the segment to the Pinot Controller.
Ingestion FAQ
Data processing
What is a good segment size?
While Pinot can work with segments of various sizes, for optimal use of Pinot you want your segments sized in the 100MB to 500MB (untarred/uncompressed) range. Note that having too many (thousands or more) tiny segments for a single table just creates more overhead in terms of metadata storage in Zookeeper as well as in the Pinot servers' heap. At the same time, having too few really large (multi-GB) segments reduces the parallelism of query execution, as on the server side the thread parallelism of query execution is at the segment level.
Publishes data to a Kafka topic meetupRSVPEvents that is subscribed to by Pinot.
Issues sample queries to Pinot
Publishes data to a Kafka topic meetupRSVPEvents that is subscribed to by Pinot
Issues sample queries to Pinot
Publishes data to a Kafka topic githubEvents that is subscribed to by Pinot.
Issues sample queries to Pinot
Publishes data to a Kafka topic meetupRSVPEvents that is subscribed to by Pinot.
Issues sample queries to Pinot
Publishes data to a Kafka topic meetupRSVPEvents that is subscribed to by Pinot
Issues sample queries to Pinot
Publishes data to a Kafka topic meetupRSVPEvents that is subscribed to by Pinot
Issues sample queries to Pinot
Launches a stream of flights stats
Publishes data to a Kafka topic airlineStatsEvents that is subscribed to by Pinot.
Issues sample queries to Pinot
Creates the dimBaseballTeams table
Launches a data ingestion job that builds one segment for a given CSV data file for the dimBaseballTeams table and pushes the segment to the Pinot Controller.
Can multiple Pinot tables consume from the same Kafka topic?
Yes. Each table can be independently configured to consume from any given Kafka topic, regardless of whether there are other tables that are also consuming from the same Kafka topic.
If I add a partition to a Kafka topic, will Pinot automatically ingest data from this partition?
Pinot automatically detects new partitions in Kafka topics. It checks for new partitions whenever RealtimeSegmentValidationManager periodic job runs and starts consumers for new partitions.
You can configure the interval for this job using the controller.realtime.segment.validation.frequencyPeriod property in the controller configuration.
How do I enable partitioning in Pinot, when using Kafka stream?
The partitioning logic in the stream should match the partitioning config in Pinot. Kafka uses murmur2, and the equivalent in Pinot is the Murmur partition function.
Set the partitioning config as below, using the same column used in Kafka:
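A sketch of the corresponding table config snippet (the column name and partition count are illustrative):
"tableIndexConfig": {
  "segmentPartitionConfig": {
    "columnPartitionMap": {
      "memberId": {
        "functionName": "Murmur",
        "numPartitions": 3
      }
    }
  }
}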
and also set:
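A sketch of the routing snippet (pruner name as typically used for partition-based pruning):
"routing": {
  "segmentPrunerTypes": ["partition"]
}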
More details about how the partitioner works in Pinot can be found here.
How do I store BYTES column in JSON data?
For JSON, you can use a hex-encoded string to ingest BYTES.
How do I flatten my JSON Kafka stream?
See the json_format(field) function which can store a top level json field as a STRING in Pinot.
Then you can use these json functions during query time, to extract fields from the json string.
NOTE
This works well if some of your fields are nested json, but most of your fields are top level json keys. If all of your fields are within a nested JSON key, you will have to store the entire payload as 1 column, which is not ideal.
Is there a limit on the maximum length of a string column in Pinot?
By default, Pinot limits the length of a String column to 512 bytes. If you want to overwrite this value, you can set the maxLength attribute in the schema as follows:
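For example, in the field spec of the column (name and length are illustrative):
{
  "name": "textDimension",
  "dataType": "STRING",
  "maxLength": 1000
}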
When do new events become queryable after being ingested into a real-time table?
Events are available to queries as soon as they are ingested. This is because events are instantly indexed in memory upon ingestion.
The ingestion of events into the real-time table is not transactional, so replicas of the open segment are not immediately consistent. Pinot trades consistency for availability upon network partitioning (CAP theorem) to provide ultra-low ingestion latencies at high throughput.
However, when the open segment is closed and its in-memory indexes are flushed to persistent storage, all its replicas are guaranteed to be consistent, with the commit protocol.
How to reset a CONSUMING segment stuck on an offset which has expired from the stream?
This typically happens if
The consumer is lagging a lot
The consumer was down (server down, cluster down), and the stream moved on, resulting in offset not found when consumer comes back up.
In case of Kafka, to recover, set property "auto.offset.reset":"earliest" in the streamConfigs section and reset the CONSUMING segment. See Realtime table configs for more details about the config.
You can also use the "Resume Consumption" endpoint with the "resumeFrom" parameter set to "smallest" (or "largest" if you want). Refer to Pause Stream Ingestion for more details.
Indexing
How to set inverted indexes?
Inverted indexes are set in the tableConfig's tableIndexConfig -> invertedIndexColumns list. For documentation on table config, see Table Config Reference. For an example showing how to configure an inverted index, see Inverted Index.
Applying inverted indexes to a table config will generate an inverted index for all new segments. To apply the inverted indexes to all existing segments, see How to apply an inverted index to existing segments?
How to apply an inverted index to existing segments?
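To apply the new index to segments that already exist, invoke a segment reload from the controller. A minimal sketch (host and table name are illustrative; the API also accepts a table type parameter):
curl -X POST "http://localhost:9000/segments/myTable/reload"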
Once you've done that, you can check whether the index has been applied by querying the segment metadata API at http://localhost:9000/help#/Segment/getServerMetadata. Don't forget to include the names of the columns on which you have applied the index.
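For example (host, table, and column names are illustrative; see the Swagger entry referenced above for the exact parameters):
curl -X GET "http://localhost:9000/segments/myTable/metadata?columns=myColumn"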
The output from this API should look something like the following:
Can I retrospectively add an index to any segment?
Not all indexes can be retrospectively applied to existing segments.
The new segments will have star-tree indexes generated after applying the star-tree index configs to the table config. Currently, Pinot does not support adding star-tree indexes to the existing segments.
Handling time in Pinot
How does Pinot’s real-time ingestion handle out-of-order events?
Pinot does not require ordering of event timestamps. Out-of-order events are still consumed and indexed into the "currently consuming" segment. In a pathological case, if you have a 2-day-old event come in "now", it will still be stored in the segment that is open for consumption "now". There is no strict time-based partitioning for segments, but star-indexes and hybrid tables will handle this as appropriate.
See Components > Broker for more details about how hybrid tables handle this. Specifically, the time boundary is computed as max(OfflineTime) - 1 unit of granularity. Pinot does store the min-max time for each segment and uses it for pruning segments, so segments with multiple time intervals may not be perfectly pruned.
When generating star-indexes, the time column will be part of the star-tree so the tree can still be efficiently queried for segments with multiple time intervals.
Why does a hybrid table use an offset instead of max(OfflineTime) to determine the time boundary?
This lets you have a late event come in without building complex offline pipelines that perfectly partition your events by event timestamps. With this offset, even if your offline data pipeline produces segments with a maximum timestamp, Pinot will not use the offline dataset for that last chunk of segments. The expectation is that if you process the next time range of data offline, your data pipeline will include any late events.
Why are segments not strictly time-partitioned?
It might seem odd that segments are not strictly time-partitioned, unlike similar systems such as Apache Druid. This allows real-time ingestion to consume out-of-order events. Even though segments are not strictly time-partitioned, Pinot will still index, prune, and query segments intelligently by time intervals for the performance of hybrid tables and time-filtered data.
When generating offline segments, the segments should be generated such that each segment only contains one time interval and is well partitioned by the time column.
Failed to start a Pinot [SERVER]
java.lang.RuntimeException: java.net.BindException: Address already in use
at org.apache.pinot.core.transport.QueryServer.start(QueryServer.java:103) ~[pinot-all-0.9.0-jar-with-dependencies.jar:0.9.0-cf8b84e8b0d6ab62374048de586ce7da21132906]
at org.apache.pinot.server.starter.ServerInstance.start(ServerInstance.java:158) ~[pinot-all-0.9.0-jar-with-dependencies.jar:0.9.0-cf8b84e8b0d6ab62374048de586ce7da21132906]
at org.apache.helix.manager.zk.ParticipantManager.handleNewSession(ParticipantManager.java:110) ~[pinot-all-0.9.0-jar-with-dependencies.jar:0.9.0-cf8b84e8b0d6ab62374048de586ce7da2113
How much heap should I allocate for my Pinot instances?
Typically, Pinot components try to use as much off-heap memory (MMAP/DirectMemory) as possible. For example, Pinot servers load segments in memory-mapped files in MMAP mode (recommended), or in direct memory in HEAP mode. Heap memory is used mostly for query execution and storing some metadata. We have seen production deployments with high throughput and low latency work well with just 16 GB of heap for Pinot servers and brokers. The Pinot controller may also cache some metadata (table configs etc.) in heap, so if there are just a few tables in the Pinot cluster, a few GB of heap should suffice.
DR
Does Pinot provide any backup/restore mechanism?
Pinot relies on deep storage for storing a backup copy of segments (offline as well as realtime). It relies on Zookeeper to store metadata (table configs, schema, cluster state, etc.). It does not explicitly provide tools to take backups or restore this data, but relies on the deep storage (ADLS/S3/GCP/etc.) and ZK to persist the data and metadata.
Alter Table
Can I change a column name in my table, without losing data?
Changing a column name or data type is considered a backward-incompatible change. While Pinot does support schema evolution for backward-compatible changes, it does not support backward-incompatible changes like changing the name or data type of a column.
How to change number of replicas of a table?
You can change the number of replicas by updating the table config's segmentsConfig section. Make sure you have at least as many servers as the replication.
For an OFFLINE table, update:
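A sketch of the snippet, with an illustrative value:
"segmentsConfig": {
  "replication": "3"
}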
For a REALTIME table, update:
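A sketch of the snippet, with an illustrative value:
"segmentsConfig": {
  "replicasPerPartition": "3"
}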
After changing the replication, run a table rebalance.
Rebalance
How to run a rebalance on a table?
Refer to the rebalance documentation.
Why does my REALTIME table not use the new nodes I added to the cluster?
Likely explanation: num partitions * num replicas < num servers
In realtime tables, segments of the same partition always continue to remain on the same node. This sticky assignment is needed for replica groups and is critical if using upserts. For instance, if you have 3 partitions, 1 replica, and 4 nodes, only 3 of the 4 nodes will be used: all of p0's segments will be on one node, p1 on another, and p2 on another. One server will be unused, and will remain unused through rebalances.
There's nothing we can do about CONSUMING segments; they will continue to use only 3 nodes if you have 3 partitions. But we can rebalance such that completed segments use all nodes. If you want to force the completed segments of the table to use the new server, use this config:
Segments
How to control number of segments generated?
The number of segments generated depends on the number of input files. If you provide only 1 input file, you will get 1 segment. If you break up the input file into multiple files, you will get as many segments as the input files.
What are the common reasons my segment is in a BAD state?
This typically happens when the server is unable to load the segment. Possible causes: out of memory, no disk space, inability to download the segment from deep store, or similar errors. Please check the server logs for more information.
How to reset a segment when it runs into a BAD state?
Use the segment reset controller REST API to reset the segment:
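For example (table and segment names are illustrative):
curl -X POST "http://localhost:9000/segments/myTable_REALTIME/mySegmentName/reset"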
How to pause realtime ingestion?
Refer to Pause Stream Ingestion.
What's the difference between resetting, refreshing, and reloading a segment?
RESET: this gets a segment in ERROR state back to ONLINE or CONSUMING state. Behind the scenes, Pinot controller takes the segment to OFFLINE state, waits for External View to stabilize, and then moves it back to ONLINE/CONSUMING state, thus effectively resetting segments or consumers in error states.
REFRESH: this replaces the segment with a new one, with the same name but often different data. Under the hood, Pinot controller sets new segment metadata in Zookeeper, and notifies brokers and servers to check their local states about this segment and update accordingly. Servers also download the new segment to replace the old one, when both have different checksums. There is no separate rest API for refreshing, and it is done as part of SegmentUpload API today.
RELOAD: this reloads the segment, often to generate a new index as updated in the table config. Under the hood, the Pinot server gets the new table config from Zookeeper and uses it to guide the segment reload. In fact, the last step of REFRESH as explained above is to load the segment into memory to serve queries. There is a dedicated REST API for reloading. By default, it doesn't download the segment, but an option is provided to force the server to download the segment and replace the local one cleanly.
In addition, RESET brings the segment OFFLINE temporarily; while REFRESH and RELOAD swap the segment on server atomically without bringing down the segment or affecting ongoing queries.
Tenants
How can I make brokers/servers join the cluster without the DefaultTenant tag?
Set this property in your controller.conf file
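A sketch, assuming the tenant isolation flag is the relevant property:
cluster.tenant.isolation.enable=false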
Now your brokers and servers should join the cluster as broker_untagged and server_untagged. You can then directly use the POST /tenants API to create the desired tenants.
Minion
How to tune minion task timeout and parallelism on each worker
There are two task configs, set as part of the cluster configs as shown below. One controls the task's overall timeout (1 hour by default) and the other controls how many tasks can run concurrently on a single minion worker (1 by default). The <taskType> is the task to tune, e.g. MergeRollupTask or RealtimeToOfflineSegmentsTask.
How do I manually run a Periodic Task?
Refer to the Periodic Task documentation.
Tuning and Optimizations
Do replica groups work for real-time?
Yes, replica groups work for realtime tables. There are two parts to enabling replica groups:
Replica groups segment assignment
Replica group query routing
Replica group segment assignment
Replica group segment assignment is achieved in realtime tables if the number of servers is a multiple of the number of replicas. The partitions get uniformly sprayed across the servers, creating replica groups.
For example, consider we have 6 partitions, 2 replicas, and 4 servers.
      r1    r2
p1    S0    S1
p2    S2    S3
p3    S0    S1
p4    S2    S3
p5    S0    S1
p6    S2    S3
As you can see, the set (S0, S2) contains r1 of every partition, and (S1, S3) contains r2 of every partition. The query will only be routed to one of the sets, and not span every server.
If you are adding/removing servers from an existing table setup, you have to run a rebalance for the segment assignment changes to take effect.
Replica group query routing
Once replica group segment assignment is in effect, query routing can take advantage of it. For replica-group-based query routing, set the following in the table config's routing section, and then restart the brokers:
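A sketch of the routing snippet (selector name as typically used for replica-group routing):
"routing": {
  "instanceSelectorType": "replicaGroup"
}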
Credential
How to update credentials for the realtime upstream without downtime
Using "POST /cluster/configs" API on CLUSTER tab in Swagger, with this payload
{
"<taskType>.timeoutMs": "600000",
"<taskType>.numConcurrentTasksPerInstance": "4"
}
Step-by-step guide on pushing your own data into the Pinot cluster
So far, we have set up our cluster, run some queries, and explored the admin endpoints. Now, it's time to get our own data into Pinot. The rest of the instructions assume you're using Pinot in Docker.
Preparing your data
Let's gather our data files and put them in pinot-quick-start/rawdata.
Supported file formats are CSV, JSON, AVRO, PARQUET, THRIFT, ORC. If you don't have sample data, you can use this sample CSV.
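For example, using the directory layout assumed by the job spec later in this guide (the file name is illustrative):
mkdir -p /tmp/pinot-quick-start/rawdata
cp transcript.csv /tmp/pinot-quick-start/rawdata/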
Creating a schema
A schema is used to define the columns and data types of the Pinot table. A detailed overview of the schema can be found in Schema.
Briefly, we categorize our columns into 3 types:
Column Type
Description
For example, in our sample table, the studentID, firstName, lastName, gender, and subject columns are the dimensions, the score column is the metric, and timestampInEpoch is the time column.
Once you have identified the dimensions, metrics and time columns, create a schema for your data, using the reference below.
Creating a table config
A table config is used to define the config related to the Pinot table. A detailed overview of the table can be found in Table.
Here's the table config for the sample CSV file. You can use this as a reference to build your own table config. Simply edit the tableName and schemaName.
Uploading your table config and schema
Check the directory structure so far
Upload the table config using the following command
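A minimal sketch using the admin launcher (file names, host, and port are assumptions):
bin/pinot-admin.sh AddTable \
  -schemaFile /tmp/pinot-quick-start/transcript-schema.json \
  -tableConfigFile /tmp/pinot-quick-start/transcript-table-offline.json \
  -controllerHost localhost -controllerPort 9000 \
  -exec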
Check out the table config and schema in the Rest API to make sure they were successfully uploaded.
Creating a segment
A Pinot table's data is stored as Pinot segments. A detailed overview of the segment can be found in Segment.
To generate a segment, we need to first create a job spec YAML file. The job spec YAML file has all the information regarding the data format, input data location, and Pinot cluster coordinates. You can just copy over this job spec file. If you're using your own data, be sure to 1) replace transcript with your table name and 2) set the right recordReaderSpec.
Use the following command to generate a segment and upload it
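For example (the spec path is an assumption, point it at the job spec you created above):
bin/pinot-admin.sh LaunchDataIngestionJob -jobSpecFile /tmp/pinot-quick-start/batch-job-spec.yml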
Sample output
Check that your segment made it to the table using the Rest API.
Querying your data
You're all set! You should see your table in the Query Console and be able to run queries against it now.
Running in Kubernetes
Pinot quick start in Kubernetes
1. Prerequisites
This quickstart assumes that you already have a running Kubernetes cluster. Please follow the links below to set up a Kubernetes cluster.
Before continuing, please make sure that you've downloaded Apache Pinot. The scripts for the setup in this guide can be found in our open source project on GitHub.
The scripts can be found in the Pinot source at ./pinot/kubernetes/helm
NOTE: Please specify StorageClass based on your cloud vendor. For Pinot Server, please don't mount blob store like AzureFile/GoogleCloudStorage/S3 as the data serving file system.
Only use Amazon EBS/GCP Persistent Disk/Azure Disk style disks.
For AWS: "gp2"
For GCP: "pd-ssd" or "standard"
For Azure: "AzureDisk"
For Docker-Desktop: "hostpath"
2.1.1 Update helm dependency
2.1.2 Start Pinot with Helm
For Helm v2.12.1
If your Kubernetes cluster is recently provisioned, ensure Helm is initialized by running:
Then deploy a new HA Pinot cluster using the following command:
2.2 Check Pinot deployment status
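For example (namespace is an assumption):
kubectl get all -n pinot-quickstart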
3. Load data into Pinot using Kafka
3.1 Bring up a Kafka cluster for real-time data ingestion
3.2 Check Kafka deployment status
Ensure the Kafka deployment is ready before executing the scripts in the following steps.
3.3 Create Kafka topics
The scripts below will create two Kafka topics for data ingestion:
3.4 Load data into Kafka and create Pinot schema/tables
The script below will deploy 3 batch jobs.
Ingest 19492 JSON messages to Kafka topic flights-realtime at a speed of 1 msg/sec
Ingest 19492 Avro messages to Kafka topic flights-realtime-avro at a speed of 1 msg/sec
Upload Pinot schema airlineStats
Create Pinot table airlineStats to ingest data from JSON encoded Kafka topic flights-realtime
Create Pinot table airlineStatsAvro to ingest data from Avro encoded Kafka topic flights-realtime-avro
4. Query using Pinot Data Explorer
4.1 Pinot Data Explorer
Please use the script below to perform local port-forwarding, which will also open Pinot query console in your default web browser.
This script can be found in the Pinot source at ./pinot/kubernetes/helm/pinot
5. Using Superset to query Pinot
5.1 Bring up Superset using helm
Install SuperSet Helm Repo
Get Helm values config file:
Edit the /tmp/superset-values.yaml file and add the pinotdb pip dependency to the bootstrapScript field, so that Superset will install the Pinot dependencies at bootstrap time.
You can also build your own image with this dependency or just use image: apachepinot/pinot-superset:latest instead.
Also remember to change the admin credentials inside the init section to a meaningful user profile and a stronger password.
Install Superset using helm
Ensure your cluster is up by running:
5.2 Access Superset UI
You can run the command below to port-forward Superset to your localhost:18088. Then you can navigate to Superset in your browser with the previously set admin credentials.
Once the database is added, you can add more data sets and explore the dashboarding.
6. Access Pinot using Trino
6.1 Deploy Trino
You can run the command below to deploy Trino with the Pinot plugin installed.
The above command adds Trino HelmChart repo. You can then run the below command to see the charts.
In order to connect Trino to Pinot, we need to add Pinot catalog, which requires extra configurations. You can run the below command to get all the configurable values.
To add Pinot catalog, you can edit the additionalCatalogs section by adding:
Pinot is deployed at namespace pinot-quickstart, so the controller serviceURL is pinot-controller.pinot-quickstart:9000
After modifying the /tmp/trino-values.yaml file, you can deploy Trino with:
Once Trino is deployed, you can check the deployment status with:
6.2 Query Trino using Trino CLI
Once Trino is deployed, you can run the below command to get a runnable Trino CLI.
6.2.1 Download Trino CLI
6.2.2 Port forward Trino service to your local if it's not already exposed
6.2.3 Use Trino console client to connect to Trino service
6.2.4 Query Pinot data using Trino CLI
6.3 Sample queries to execute
List all catalogs
List All tables
Show schema
Count total documents
7. Access Pinot using Presto
7.1 Deploy Presto using Pinot plugin
You can run the command below to deploy a customized Presto with the Pinot plugin installed.
The above command deploys Presto with default configs. For customizing your deployment, you can run the below command to get all the configurable values.
After modifying the /tmp/presto-values.yaml file, you can deploy Presto with:
Once Presto is deployed, you can check the deployment status with:
Sample Output of K8s Deployment Status
7.2 Query Presto using Presto CLI
Once Presto is deployed, you can run the commands below, or just follow steps 6.2.1 to 6.2.3.
6.2.1 Download Presto CLI
6.2.2 Port forward presto-coordinator port 8080 to localhost port 18080
6.2.3 Start Presto CLI with pinot catalog to query it then query it
6.2.4 Query Pinot data using Presto CLI
7.3 Sample queries to execute
List all catalogs
List All tables
Show schema
Count total documents
8. Deleting the Pinot cluster in Kubernetes
Note: These are sample configs to be used as reference. For production setup, you may want to customize it to your needs.
Dimensions
Typically used in filters and group by, for slicing and dicing into data
Metrics
Typically used in aggregations, represents the quantitative data
Time
Optional column, represents the timestamp associated with each row
SegmentGenerationJobSpec:
!!org.apache.pinot.spi.ingestion.batch.spec.SegmentGenerationJobSpec
excludeFileNamePattern: null
executionFrameworkSpec: {extraConfigs: null, name: standalone, segmentGenerationJobRunnerClassName: org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner,
segmentTarPushJobRunnerClassName: org.apache.pinot.plugin.ingestion.batch.standalone.SegmentTarPushJobRunner,
segmentUriPushJobRunnerClassName: org.apache.pinot.plugin.ingestion.batch.standalone.SegmentUriPushJobRunner}
includeFileNamePattern: glob:**\/*.csv
inputDirURI: /tmp/pinot-quick-start/rawdata/
jobType: SegmentCreationAndTarPush
outputDirURI: /tmp/pinot-quick-start/segments
overwriteOutput: true
pinotClusterSpecs:
- {controllerURI: 'http://localhost:9000'}
pinotFSSpecs:
- {className: org.apache.pinot.spi.filesystem.LocalPinotFS, configs: null, scheme: file}
pushJobSpec: null
recordReaderSpec: {className: org.apache.pinot.plugin.inputformat.csv.CSVRecordReader,
configClassName: org.apache.pinot.plugin.inputformat.csv.CSVRecordReaderConfig,
configs: null, dataFormat: csv}
segmentNameGeneratorSpec: null
tableSpec: {schemaURI: 'http://localhost:9000/tables/transcript/schema', tableConfigURI: 'http://localhost:9000/tables/transcript',
tableName: transcript}
Trying to create instance for class org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner
Initializing PinotFS for scheme file, classname org.apache.pinot.spi.filesystem.LocalPinotFS
Finished building StatsCollector!
Collected stats for 4 documents
Using fixed bytes value dictionary for column: studentID, size: 9
Created dictionary for STRING column: studentID with cardinality: 3, max length in bytes: 3, range: 200 to 202
Using fixed bytes value dictionary for column: firstName, size: 12
Created dictionary for STRING column: firstName with cardinality: 3, max length in bytes: 4, range: Bob to Nick
Using fixed bytes value dictionary for column: lastName, size: 15
Created dictionary for STRING column: lastName with cardinality: 3, max length in bytes: 5, range: King to Young
Created dictionary for FLOAT column: score with cardinality: 4, range: 3.2 to 3.8
Using fixed bytes value dictionary for column: gender, size: 12
Created dictionary for STRING column: gender with cardinality: 2, max length in bytes: 6, range: Female to Male
Using fixed bytes value dictionary for column: subject, size: 21
Created dictionary for STRING column: subject with cardinality: 3, max length in bytes: 7, range: English to Physics
Created dictionary for LONG column: timestampInEpoch with cardinality: 4, range: 1570863600000 to 1572418800000
Start building IndexCreator!
Finished records indexing in IndexCreator!
Finished segment seal!
Converting segment: /var/folders/3z/qn6k60qs6ps1bb6s2c26gx040000gn/T/pinot-1583443148720/output/transcript_OFFLINE_1570863600000_1572418800000_0 to v3 format
v3 segment location for segment: transcript_OFFLINE_1570863600000_1572418800000_0 is /var/folders/3z/qn6k60qs6ps1bb6s2c26gx040000gn/T/pinot-1583443148720/output/transcript_OFFLINE_1570863600000_1572418800000_0/v3
Deleting files in v1 segment directory: /var/folders/3z/qn6k60qs6ps1bb6s2c26gx040000gn/T/pinot-1583443148720/output/transcript_OFFLINE_1570863600000_1572418800000_0
Starting building 1 star-trees with configs: [StarTreeV2BuilderConfig[splitOrder=[studentID, firstName],skipStarNodeCreation=[],functionColumnPairs=[org.apache.pinot.core.startree.v2.AggregationFunctionColumnPair@3a48efdc],maxLeafRecords=1]] using OFF_HEAP builder
Starting building star-tree with config: StarTreeV2BuilderConfig[splitOrder=[studentID, firstName],skipStarNodeCreation=[],functionColumnPairs=[org.apache.pinot.core.startree.v2.AggregationFunctionColumnPair@3a48efdc],maxLeafRecords=1]
Generated 3 star-tree records from 4 segment records
Finished constructing star-tree, got 9 tree nodes and 4 records under star-node
Finished creating aggregated documents, got 6 aggregated records
Finished building star-tree in 10ms
Finished building 1 star-trees in 27ms
Computed crc = 3454627653, based on files [/var/folders/3z/qn6k60qs6ps1bb6s2c26gx040000gn/T/pinot-1583443148720/output/transcript_OFFLINE_1570863600000_1572418800000_0/v3/columns.psf, /var/folders/3z/qn6k60qs6ps1bb6s2c26gx040000gn/T/pinot-1583443148720/output/transcript_OFFLINE_1570863600000_1572418800000_0/v3/index_map, /var/folders/3z/qn6k60qs6ps1bb6s2c26gx040000gn/T/pinot-1583443148720/output/transcript_OFFLINE_1570863600000_1572418800000_0/v3/metadata.properties, /var/folders/3z/qn6k60qs6ps1bb6s2c26gx040000gn/T/pinot-1583443148720/output/transcript_OFFLINE_1570863600000_1572418800000_0/v3/star_tree_index, /var/folders/3z/qn6k60qs6ps1bb6s2c26gx040000gn/T/pinot-1583443148720/output/transcript_OFFLINE_1570863600000_1572418800000_0/v3/star_tree_index_map]
Driver, record read time : 0
Driver, stats collector time : 0
Driver, indexing time : 0
Tarring segment from: /var/folders/3z/qn6k60qs6ps1bb6s2c26gx040000gn/T/pinot-1583443148720/output/transcript_OFFLINE_1570863600000_1572418800000_0 to: /var/folders/3z/qn6k60qs6ps1bb6s2c26gx040000gn/T/pinot-1583443148720/output/transcript_OFFLINE_1570863600000_1572418800000_0.tar.gz
Size for segment: transcript_OFFLINE_1570863600000_1572418800000_0, uncompressed: 6.73KB, compressed: 1.89KB
Trying to create instance for class org.apache.pinot.plugin.ingestion.batch.standalone.SegmentTarPushJobRunner
Initializing PinotFS for scheme file, classname org.apache.pinot.spi.filesystem.LocalPinotFS
Start pushing segments: [/tmp/pinot-quick-start/segments/transcript_OFFLINE_1570863600000_1572418800000_0.tar.gz]... to locations: [org.apache.pinot.spi.ingestion.batch.spec.PinotClusterSpec@243c4f91] for table transcript
Pushing segment: transcript_OFFLINE_1570863600000_1572418800000_0 to location: http://localhost:9000 for table transcript
Sending request: http://localhost:9000/v2/segments?tableName=transcript to controller: nehas-mbp.hsd1.ca.comcast.net, version: Unknown
Response for pushing table transcript segment transcript_OFFLINE_1570863600000_1572418800000_0 to location http://localhost:9000 - 200: {"status":"Successfully uploaded segment: transcript_OFFLINE_1570863600000_1572418800000_0 of table: transcript"}
For Helm v3.0.0
2.1.3 Troubleshooting (For helm v2.12.1)
Error: Please run the below command if encountering the following issue:
Resolution:
Error: Please run the command below if encountering a permission issue:
Error: release pinot failed: namespaces "pinot-quickstart" is forbidden: User "system:serviceaccount:kube-system:default" cannot get resource "namespaces" in API group "" in the namespace "pinot-quickstart"