Running in Kubernetes
Pinot quick start in Kubernetes
Get started running Pinot in Kubernetes.
Prerequisites
Kubernetes
This guide assumes that you already have a running Kubernetes cluster.
If you haven't yet set up a Kubernetes cluster, see the links below for instructions:
Install Minikube for local setup
Make sure to run with enough resources:
minikube start --vm=true --cpus=4 --memory=8g --disk-size=50g
Pinot
Make sure that you've downloaded Apache Pinot. The scripts for the setup in this guide can be found in our open source project on GitHub.
# checkout pinot
git clone https://github.com/apache/pinot.git
cd pinot/helm/pinot
Set up a Pinot cluster in Kubernetes
Start Pinot with Helm
The Pinot repository has pre-packaged Helm charts for Pinot and Presto. The Helm repository index file is here.
helm repo add pinot https://raw.githubusercontent.com/apache/pinot/master/helm
kubectl create ns pinot-quickstart
helm install pinot pinot/pinot \
-n pinot-quickstart \
--set cluster.name=pinot \
--set server.replicaCount=2
Note: Specify StorageClass based on your cloud vendor. Don't mount a blob store (such as AzureFile, GoogleCloudStorage, or S3) as the data serving file system. Use only Amazon EBS/GCP Persistent Disk/Azure Disk-style disks.
For AWS: "gp2"
For GCP: "pd-ssd" or "standard"
For Azure: "AzureDisk"
For Docker-Desktop: "hostpath"
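As a rough sketch of how a storage class might be passed at install time, the helper below prints (but does not run) a helm install command with storage class overrides. The value paths controller.persistence.storageClass and server.persistence.storageClass are assumptions about the chart's values layout, and pinot_install_cmd is a hypothetical helper name. Confirm the paths with `helm inspect values pinot/pinot` before running the printed command.

```shell
# Hypothetical helper: prints (does not run) a helm install command that
# sets a storage class for the Pinot controller and server volumes.
# Assumption: the chart exposes <component>.persistence.storageClass.
pinot_install_cmd() {
  storage_class="$1"
  printf '%s\n' "helm install pinot pinot/pinot -n pinot-quickstart --set cluster.name=pinot --set server.replicaCount=2 --set controller.persistence.storageClass=${storage_class} --set server.persistence.storageClass=${storage_class}"
}

# Example: print the command for an AWS cluster using the "gp2" class.
pinot_install_cmd gp2
```

Printing the command first makes it easy to review the overrides before applying them to a real cluster.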
Check Pinot deployment status
kubectl get all -n pinot-quickstart
Load data into Pinot using Kafka
Bring up a Kafka cluster for real-time data ingestion
helm repo add kafka https://charts.bitnami.com/bitnami
helm install -n pinot-quickstart kafka kafka/kafka --set replicas=1,zookeeper.image.tag=latest
Check Kafka deployment status
Ensure the Kafka deployment is ready before executing the scripts in the following steps. Run the following command:
kubectl get all -n pinot-quickstart | grep kafka
Below is an example output showing the deployment is ready:
pod/kafka-0 1/1 Running 0 2m
pod/kafka-zookeeper-0 1/1 Running 0 10m
pod/kafka-zookeeper-1 1/1 Running 0 9m
pod/kafka-zookeeper-2 1/1 Running 0 8m
Create Kafka topics
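Before creating the topics, the readiness check can be scripted instead of read by eye. The sketch below defines a hypothetical helper, pods_ready, which reads `kubectl get pods --no-headers` output on standard input and succeeds only when at least one pod is listed and every pod is Running with all of its containers ready. It assumes the default five-column pod listing (NAME, READY, STATUS, RESTARTS, AGE).

```shell
# Hypothetical helper: reads pod listing lines on stdin and exits 0 only if
# every pod is Running and its READY column shows all containers up (n/n).
pods_ready() {
  awk '
    {
      seen = 1
      split($2, ready, "/")
      if ($3 != "Running" || ready[1] != ready[2]) bad = 1
    }
    END {
      if (seen && !bad) exit 0
      exit 1
    }
  '
}

# Usage against a live cluster (not executed here):
#   kubectl get pods -n pinot-quickstart --no-headers | grep kafka | pods_ready \
#     && echo "Kafka is ready"
```

An empty listing is treated as not ready, so the check also fails when the grep filter matches nothing.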
Run the scripts below to create two Kafka topics for data ingestion:
kubectl -n pinot-quickstart exec kafka-0 -- kafka-topics.sh --bootstrap-server kafka-0:9092 --topic flights-realtime --create --partitions 1 --replication-factor 1
kubectl -n pinot-quickstart exec kafka-0 -- kafka-topics.sh --bootstrap-server kafka-0:9092 --topic flights-realtime-avro --create --partitions 1 --replication-factor 1
Load data into Kafka and create Pinot schema/tables
The script below does the following:
Ingests 19492 JSON messages to Kafka topic flights-realtime at a speed of 1 msg/sec
Ingests 19492 Avro messages to Kafka topic flights-realtime-avro at a speed of 1 msg/sec
Uploads Pinot schema airlineStats
Creates Pinot table airlineStats to ingest data from JSON encoded Kafka topic flights-realtime
Creates Pinot table airlineStatsAvro to ingest data from Avro encoded Kafka topic flights-realtime-avro
kubectl apply -f pinot/pinot-realtime-quickstart.yml
Query with the Pinot Data Explorer
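Because each topic receives 19492 messages at 1 msg/sec, the tables fill gradually, so queries issued early return only part of the data. As a rough estimate, assuming the rate stays constant, the arithmetic below shows how long a full load takes. ingest_duration is a hypothetical helper, not part of the quickstart scripts.

```shell
# Estimate how long a fixed-rate producer needs to publish all messages.
# Arguments: total message count, messages per second.
ingest_duration() {
  total_messages="$1"
  rate_per_sec="$2"
  seconds=$(( total_messages / rate_per_sec ))
  printf '%dh %dm %ds\n' $(( seconds / 3600 )) $(( seconds % 3600 / 60 )) $(( seconds % 60 ))
}

ingest_duration 19492 1   # prints: 5h 24m 52s
```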
Pinot Data Explorer
The script below, located at ./pinot/helm/pinot, performs local port forwarding, and opens the Pinot query console in your default web browser.
./query-pinot-data.sh
Query Pinot with Superset
Bring up Superset using Helm
Add the Superset Helm repository:
helm repo add superset https://apache.github.io/superset
Get the Helm values configuration file:
helm inspect values superset/superset > /tmp/superset-values.yaml
For Superset to install Pinot dependencies, edit the /tmp/superset-values.yaml file to add a pinotdb pip dependency to the bootstrapScript field.
You can also build your own image with this dependency or use the image apachepinot/pinot-superset:latest instead.

Replace the default admin credentials inside the init section with a meaningful user profile and stronger password.
Install Superset using Helm:
kubectl create ns superset
helm upgrade --install --values /tmp/superset-values.yaml superset superset/superset -n superset
Ensure your cluster is up by running:
kubectl get all -n superset
Access the Superset UI
Run the command below to port forward Superset to localhost:18088:
kubectl port-forward service/superset 18088:8088 -n superset
Navigate to Superset in your browser and log in with the admin credentials you set in the previous section.
Create a new database connection with the following URI:
pinot+http://pinot-broker.pinot-quickstart:8099/query?controller=http://pinot-controller.pinot-quickstart:9000/
Once the database is added, you can add more data sets and explore the dashboard options.
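The host names in the connection URI use the `<service>.<namespace>` form, which Kubernetes DNS resolves across namespaces. That is how Superset in the superset namespace reaches the broker and controller in pinot-quickstart. If Pinot runs in a different namespace, the URI changes accordingly. The sketch below, using a hypothetical helper named pinot_superset_uri, assumes the service names and ports shown above.

```shell
# Hypothetical helper: builds the Superset connection URI for a Pinot
# cluster in the given namespace, assuming services named pinot-broker
# (port 8099) and pinot-controller (port 9000) as in this guide.
pinot_superset_uri() {
  ns="$1"
  printf 'pinot+http://pinot-broker.%s:8099/query?controller=http://pinot-controller.%s:9000/\n' "$ns" "$ns"
}

pinot_superset_uri pinot-quickstart
```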
Access Pinot with Trino
Deploy Trino
Deploy Trino with the Pinot plugin installed:
helm repo add trino https://trinodb.github.io/charts/
See the charts in the Trino Helm chart repository:
helm search repo trino
To connect Trino to Pinot, you'll need to add the Pinot catalog, which requires extra configuration. Run the command below to get all the configurable values:
helm inspect values trino/trino > /tmp/trino-values.yaml
To add the Pinot catalog, edit the additionalCatalogs section by adding:
additionalCatalogs:
  pinot: |
    connector.name=pinot
    pinot.controller-urls=pinot-controller.pinot-quickstart:9000
After modifying the /tmp/trino-values.yaml file, deploy Trino with:
kubectl create ns trino-quickstart
helm install my-trino trino/trino --version 0.2.0 -n trino-quickstart --values /tmp/trino-values.yaml
Once you've deployed Trino, check the deployment status:
kubectl get pods -n trino-quickstart
Query Pinot with the Trino CLI
Once Trino is deployed, follow the steps below to get a runnable Trino CLI.
Download the Trino CLI:
curl -L https://repo1.maven.org/maven2/io/trino/trino-cli/363/trino-cli-363-executable.jar -o /tmp/trino && chmod +x /tmp/trino
Port forward the Trino service to your local machine if it's not already exposed:
echo "Visit http://127.0.0.1:18080 to use your application"
kubectl port-forward service/my-trino 18080:8080 -n trino-quickstart
Use the Trino console client to connect to the Trino service:
/tmp/trino --server localhost:18080 --catalog pinot --schema default
Query Pinot data using the Trino CLI, like in the sample queries below.
Sample queries to execute
List all catalogs
trino:default> show catalogs;
 Catalog
---------
 pinot
 system
 tpcds
 tpch
(4 rows)

Query 20211025_010256_00002_mxcvx, FINISHED, 2 nodes
Splits: 36 total, 36 done (100.00%)
0.70 [0 rows, 0B] [0 rows/s, 0B/s]
List all tables
trino:default> show tables;
    Table
--------------
 airlinestats
(1 row)

Query 20211025_010326_00003_mxcvx, FINISHED, 3 nodes
Splits: 36 total, 36 done (100.00%)
0.28 [1 rows, 29B] [3 rows/s, 104B/s]
Show schema
trino:default> DESCRIBE airlinestats;
        Column        |      Type      | Extra | Comment
----------------------+----------------+-------+---------
 flightnum            | integer        |       |
 origin               | varchar        |       |
 quarter              | integer        |       |
 lateaircraftdelay    | integer        |       |
 divactualelapsedtime | integer        |       |
 divwheelsons         | array(integer) |       |
 divwheelsoffs        | array(integer) |       |
......

Query 20211025_010414_00006_mxcvx, FINISHED, 3 nodes
Splits: 36 total, 36 done (100.00%)
0.37 [79 rows, 5.96KB] [212 rows/s, 16KB/s]
Count total documents
trino:default> select count(*) as cnt from airlinestats limit 10;
 cnt
------
 9746
(1 row)

Query 20211025_015607_00009_mxcvx, FINISHED, 2 nodes
Splits: 17 total, 17 done (100.00%)
0.24 [1 rows, 9B] [4 rows/s, 38B/s]
Access Pinot with Presto
Deploy Presto with the Pinot plugin
First, deploy Presto with default configurations:
helm install presto pinot/presto -n pinot-quickstart
To customize your deployment instead, run the command below to get all the configurable values:
helm inspect values pinot/presto > /tmp/presto-values.yaml
After modifying the /tmp/presto-values.yaml file, deploy Presto:
helm install presto pinot/presto -n pinot-quickstart --values /tmp/presto-values.yaml
Once you've deployed the Presto instance, check the deployment status:
kubectl get pods -n pinot-quickstart
Query Presto using the Presto CLI
Once Presto is deployed, you can run the below command from here, or follow the steps below.
./pinot-presto-cli.sh
Download the Presto CLI:
curl -L https://repo1.maven.org/maven2/com/facebook/presto/presto-cli/0.246/presto-cli-0.246-executable.jar -o /tmp/presto-cli && chmod +x /tmp/presto-cli
Port forward presto-coordinator port 8080 to localhost port 18080:
kubectl port-forward service/presto-coordinator 18080:8080 -n pinot-quickstart > /dev/null &
Start the Presto CLI with the Pinot catalog:
/tmp/presto-cli --server localhost:18080 --catalog pinot --schema default
Query Pinot data with the Presto CLI, like in the sample queries below.
Sample queries to execute
List all catalogs
presto:default> show catalogs;
 Catalog
---------
 pinot
 system
(2 rows)

Query 20191112_050827_00003_xkm4g, FINISHED, 1 node
Splits: 19 total, 19 done (100.00%)
0:01 [0 rows, 0B] [0 rows/s, 0B/s]
List all tables
presto:default> show tables;
    Table
--------------
 airlinestats
(1 row)

Query 20191112_050907_00004_xkm4g, FINISHED, 1 node
Splits: 19 total, 19 done (100.00%)
0:01 [1 rows, 29B] [1 rows/s, 41B/s]
Show schema
presto:default> DESCRIBE pinot.dontcare.airlinestats;
        Column        |  Type   | Extra | Comment
----------------------+---------+-------+---------
 flightnum            | integer |       |
 origin               | varchar |       |
 quarter              | integer |       |
 lateaircraftdelay    | integer |       |
 divactualelapsedtime | integer |       |
......

Query 20191112_051021_00005_xkm4g, FINISHED, 1 node
Splits: 19 total, 19 done (100.00%)
0:02 [80 rows, 6.06KB] [35 rows/s, 2.66KB/s]
Count total documents
presto:default> select count(*) as cnt from pinot.dontcare.airlinestats limit 10;
 cnt
------
 9745
(1 row)

Query 20191112_051114_00006_xkm4g, FINISHED, 1 node
Splits: 17 total, 17 done (100.00%)
0:00 [1 rows, 8B] [2 rows/s, 19B/s]
Delete a Pinot cluster in Kubernetes
To delete your Pinot cluster in Kubernetes, run the following command:
kubectl delete ns pinot-quickstart
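The command above removes only the pinot-quickstart namespace. If you also followed the Superset and Trino sections, the superset and trino-quickstart namespaces created there remain. The sketch below loops over all three. KUBECTL is a hypothetical override variable, introduced here so the loop can be dry-run before anything is deleted.

```shell
# Delete every namespace created in this guide. Set KUBECTL="echo kubectl"
# to print the commands instead of running them (hypothetical override).
delete_quickstart_namespaces() {
  for ns in pinot-quickstart superset trino-quickstart; do
    ${KUBECTL:-kubectl} delete ns "$ns" --ignore-not-found
  done
}

# Dry run in a subshell: prints the three delete commands without executing them.
( KUBECTL="echo kubectl"; delete_quickstart_namespaces )
```

To perform the deletion for real, call delete_quickstart_namespaces without setting KUBECTL. The --ignore-not-found flag keeps the loop from failing on namespaces you never created.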

