Running Pinot in Kubernetes
Pinot quick start in Kubernetes

1. Prerequisites

This quickstart assumes that you already have a running Kubernetes cluster. If you do not, set one up before continuing.

2. Setting up a Pinot cluster in Kubernetes

Before continuing, make sure that you have downloaded Apache Pinot. The setup scripts used in this guide are part of the open source project on GitHub, in the Pinot source at ./pinot/kubernetes/helm
Clone the project source:
# Clone Pinot and change to the Helm directory
git clone https://github.com/apache/pinot.git
cd pinot/kubernetes/helm

2.1 Start Pinot with Helm

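Pinot can be started with Helm in two ways: with the pre-installed package (the commands directly below), or with the Helm scripts in the Git repo (sections 2.1.1 to 2.1.3).
The Pinot repo has pre-packaged Helm charts for Pinot and Presto, and it also hosts the Helm repo index file.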
helm repo add pinot https://raw.githubusercontent.com/apache/pinot/master/kubernetes/helm
kubectl create ns pinot-quickstart
helm install pinot pinot/pinot \
    -n pinot-quickstart \
    --set cluster.name=pinot \
    --set server.replicaCount=2
NOTE: Specify the StorageClass based on your cloud vendor. For Pinot Server, do not mount a blob store such as AzureFile, GoogleCloudStorage, or S3 as the data-serving file system.
Use only disks in the style of Amazon EBS, GCP Persistent Disk, or Azure Disk.
    For AWS: "gp2"
    For GCP: "pd-ssd" or "standard"
    For Azure: "AzureDisk"
    For Docker-Desktop: "hostpath"
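As a rough sketch of how the note above could be applied, the snippet below maps a platform to one of the StorageClass names listed and prints a matching helm install command. The value keys controller.persistence.storageClass and server.persistence.storageClass are assumptions about the chart's values layout, not something this guide states; check them against the output of helm inspect values pinot/pinot before use. The helper name storage_class_for is hypothetical.

```shell
# Map a platform name to a StorageClass taken from the list above.
storage_class_for() {
  case "$1" in
    aws)            echo "gp2" ;;
    gcp)            echo "pd-ssd" ;;       # "standard" is the other listed option
    azure)          echo "AzureDisk" ;;
    docker-desktop) echo "hostpath" ;;
    *)              echo "unknown platform: $1" >&2; return 1 ;;
  esac
}

sc="$(storage_class_for aws)"

# Print (not run) the install command. The two persistence keys are ASSUMED
# chart values; verify them with `helm inspect values pinot/pinot`.
echo helm install pinot pinot/pinot -n pinot-quickstart \
  --set cluster.name=pinot \
  --set server.replicaCount=2 \
  --set "controller.persistence.storageClass=${sc}" \
  --set "server.persistence.storageClass=${sc}"
```

Once the keys are confirmed for your chart version, drop the leading echo to run the command.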

2.1.1 Update helm dependency

helm dependency update

2.1.2 Start Pinot with Helm

    For Helm v2.12.1
If your Kubernetes cluster was recently provisioned, ensure Helm is initialized by running:
helm init --service-account tiller
Then deploy a new HA Pinot cluster using the following command:
helm install --namespace "pinot-quickstart" --name "pinot" pinot
    For Helm v3.0.0
kubectl create ns pinot-quickstart
helm install -n pinot-quickstart pinot pinot

2.1.3 Troubleshooting (For helm v2.12.1)

    Error: If you encounter the following issue:
Error: could not find tiller.
    Resolution:
kubectl -n kube-system delete deployment tiller-deploy
kubectl -n kube-system delete service/tiller-deploy
helm init --service-account tiller
    Error: If you encounter the following permission issue:
Error: release pinot failed: namespaces "pinot-quickstart" is forbidden: User "system:serviceaccount:kube-system:default" cannot get resource "namespaces" in API group "" in the namespace "pinot-quickstart"
    Resolution:
kubectl apply -f helm-rbac.yaml

2.2 Check Pinot deployment status

kubectl get all -n pinot-quickstart

3. Load data into Pinot using Kafka

3.1 Bring up a Kafka cluster for real-time data ingestion

    For Helm v3.0.0
helm repo add incubator https://charts.helm.sh/incubator
helm install -n pinot-quickstart kafka incubator/kafka --set replicas=1
    For Helm v2.12.1
helm repo add incubator https://charts.helm.sh/incubator
helm install --namespace "pinot-quickstart" --name kafka incubator/kafka

3.2 Check Kafka deployment status

kubectl get all -n pinot-quickstart | grep kafka
Ensure the Kafka deployment is ready before running the scripts in the following steps. The pods should look similar to:
pod/kafka-0 1/1 Running 0 2m
pod/kafka-zookeeper-0 1/1 Running 0 10m
pod/kafka-zookeeper-1 1/1 Running 0 9m
pod/kafka-zookeeper-2 1/1 Running 0 8m
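The readiness check can also be scripted instead of read by eye. The sketch below is one possible approach, not part of the Pinot scripts: a hypothetical helper, all_pods_ready, reads kubectl get all style lines on standard input and succeeds only when every pod line is Running with all containers ready. The demo feeds it lines in the same shape as the sample output so it runs without a cluster.

```shell
# Succeed only if stdin contains at least one pod/ line and every pod line
# shows STATUS "Running" with a READY column of the form n/n.
all_pods_ready() {
  awk '
    $1 ~ /^pod\// {
      seen++
      split($2, ready, "/")
      if ($3 != "Running" || ready[1] != ready[2]) bad++
    }
    END {
      if (seen > 0 && bad == 0) exit 0
      exit 1
    }
  '
}

# Demo with lines shaped like the sample output.
all_pods_ready <<'EOF' && echo "Kafka is ready"
pod/kafka-0 1/1 Running 0 2m
pod/kafka-zookeeper-0 1/1 Running 0 10m
EOF
```

Against a live cluster, the same helper could be used as: kubectl get all -n pinot-quickstart | grep kafka | all_pods_ready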

3.3 Create Kafka topics

The commands below create two Kafka topics for data ingestion:
kubectl -n pinot-quickstart exec kafka-0 -- kafka-topics --zookeeper kafka-zookeeper:2181 --topic flights-realtime --create --partitions 1 --replication-factor 1
kubectl -n pinot-quickstart exec kafka-0 -- kafka-topics --zookeeper kafka-zookeeper:2181 --topic flights-realtime-avro --create --partitions 1 --replication-factor 1
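The two commands differ only in the topic name, so they can be driven from a loop. The sketch below is one way to do that; the helper names run and kafka_topics and the DRY_RUN switch are illustrative and not part of the Pinot scripts. By default it only prints the commands; set DRY_RUN=0 to execute them against the cluster. The final --list call is an optional check that both topics exist.

```shell
NAMESPACE="pinot-quickstart"
ZOOKEEPER="kafka-zookeeper:2181"

# Print the command when DRY_RUN=1 (the default here); execute it when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Run kafka-topics inside the kafka-0 pod with the given extra arguments.
kafka_topics() {
  run kubectl -n "$NAMESPACE" exec kafka-0 -- kafka-topics --zookeeper "$ZOOKEEPER" "$@"
}

for topic in flights-realtime flights-realtime-avro; do
  kafka_topics --topic "$topic" --create --partitions 1 --replication-factor 1
done

# Optional: list topics to confirm that both were created.
kafka_topics --list
```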

3.4 Load data into Kafka and create Pinot schema/tables

The script below deploys 3 batch jobs that perform the following tasks:
    Ingest 19492 JSON messages to Kafka topic flights-realtime at a rate of 1 msg/sec
    Ingest 19492 Avro messages to Kafka topic flights-realtime-avro at a rate of 1 msg/sec
    Upload Pinot schema airlineStats
    Create Pinot table airlineStats to ingest data from JSON encoded Kafka topic flights-realtime
    Create Pinot table airlineStatsAvro to ingest data from Avro encoded Kafka topic flights-realtime-avro
kubectl apply -f pinot/pinot-realtime-quickstart.yml
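Because each ingestion job publishes at 1 msg/sec, a full load takes several hours, and a query issued before it finishes sees only part of the data. The arithmetic below makes that concrete; the function name ingest_duration is hypothetical.

```shell
# ingest_duration MESSAGES RATE: time needed to publish MESSAGES at
# RATE messages per second, printed as hours, minutes, and seconds.
ingest_duration() {
  seconds=$(( $1 / $2 ))
  printf '%dh %dm %ds\n' \
    $(( seconds / 3600 )) $(( seconds % 3600 / 60 )) $(( seconds % 60 ))
}

ingest_duration 19492 1   # prints: 5h 24m 52s
```

A document count lower than 19492 shortly after deployment is therefore expected rather than a sign of a failed job.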

4. Query using Pinot Data Explorer

4.1 Pinot Data Explorer

Use the script below to set up local port-forwarding; it also opens the Pinot query console in your default web browser.
This script can be found in the Pinot source at ./pinot/kubernetes/helm/pinot
./query-pinot-data.sh

5. Using Superset to query Pinot

5.1 Bring up Superset

Open the superset.yaml file, go to the line showing storageClass, and change it based on your cloud vendor. Running kubectl get sc lists the storageClass values available in your Kubernetes system. For example:
    For AWS: "gp2"
    For GCP: "pd-ssd" or "standard"
    For Azure: "AzureDisk"
    For Docker-Desktop: "hostpath"
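If you prefer not to edit the file by hand, the change can be scripted. The sketch below assumes the setting appears on its own line as storageClass: or storageClassName: followed by a value; that layout is an assumption about superset.yaml, so inspect the file first. The helper name set_storage_class is hypothetical, and the demo runs against a throwaway file rather than the real manifest.

```shell
# set_storage_class FILE CLASS: rewrite the value of every storageClass or
# storageClassName line in FILE to CLASS. A temporary file is used instead of
# `sed -i` so the function behaves the same with GNU and BSD sed.
set_storage_class() {
  file="$1"
  class="$2"
  tmp="$(mktemp)" || return 1
  if sed -E "s|^([[:space:]]*storageClass(Name)?:).*|\1 \"${class}\"|" "$file" > "$tmp"; then
    cat "$tmp" > "$file"
    status=$?
  else
    status=1
  fi
  rm -f "$tmp"
  return "$status"
}

# Demo on a throwaway file with an assumed line layout.
demo="$(mktemp)"
printf '      storageClass: "gp2"\n' > "$demo"
set_storage_class "$demo" hostpath
cat "$demo"
rm -f "$demo"
```

For the real file, the call would be set_storage_class superset.yaml followed by a class name reported by kubectl get sc.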
Then run:
kubectl apply -f superset.yaml
Ensure your cluster is up by running:
kubectl get all -n pinot-quickstart | grep superset

5.2 (First time) Set up Admin account

kubectl exec -it pod/superset-0 -n pinot-quickstart -- bash -c 'flask fab create-admin'

5.3 (First time) Init Superset

kubectl exec -it pod/superset-0 -n pinot-quickstart -- bash -c 'superset db upgrade'
kubectl exec -it pod/superset-0 -n pinot-quickstart -- bash -c 'superset init'

5.4 Load Demo data source

kubectl exec -it pod/superset-0 -n pinot-quickstart -- bash -c 'superset import_datasources -p /etc/superset/pinot_example_datasource.yaml'
kubectl exec -it pod/superset-0 -n pinot-quickstart -- bash -c 'superset import_dashboards -p /etc/superset/pinot_example_dashboard.json'

5.5 Access Superset UI

Run the command below to open Superset in your browser, then log in with the admin credentials created earlier.
./open-superset-ui.sh
To open the imported dashboard, click the Dashboards banner and then click AirlineStats.

6. Access Pinot using Presto

6.1 Deploy Presto using Pinot plugin

You can run one of the commands below to deploy a customized Presto with the Pinot plugin installed.
    Helm
helm install presto pinot/presto -n pinot-quickstart
    K8s Scripts
kubectl apply -f presto-coordinator.yaml
The Helm command above deploys Presto with default configs. To customize your deployment, run the command below to get all the configurable values.
helm inspect values pinot/presto > /tmp/presto-values.yaml
After modifying the /tmp/presto-values.yaml file, you can deploy Presto with:
helm install presto pinot/presto -n pinot-quickstart --values /tmp/presto-values.yaml
Once Presto is deployed, you can check its deployment status by running:
kubectl get pods -n pinot-quickstart
(Figure: sample output of K8s deployment status)

6.2 Query Presto using Presto CLI

Once Presto is deployed, you can run the script below, or follow steps 6.2.1 to 6.2.3 manually.
./pinot-presto-cli.sh

6.2.1 Download Presto CLI

curl -L https://repo1.maven.org/maven2/com/facebook/presto/presto-cli/0.246/presto-cli-0.246-executable.jar -o /tmp/presto-cli && chmod +x /tmp/presto-cli

6.2.2 Port forward presto-coordinator port 8080 to localhost port 18080

kubectl port-forward service/presto-coordinator 18080:8080 -n pinot-quickstart > /dev/null &

6.2.3 Start Presto CLI with the pinot catalog

/tmp/presto-cli --server localhost:18080 --catalog pinot --schema default

6.2.4 Query Pinot data using Presto CLI

6.3 Sample queries to execute

    List all catalogs
presto:default> show catalogs;
 Catalog
---------
 pinot
 system
(2 rows)

Query 20191112_050827_00003_xkm4g, FINISHED, 1 node
Splits: 19 total, 19 done (100.00%)
0:01 [0 rows, 0B] [0 rows/s, 0B/s]
    List all tables
presto:default> show tables;
    Table
--------------
 airlinestats
(1 row)

Query 20191112_050907_00004_xkm4g, FINISHED, 1 node
Splits: 19 total, 19 done (100.00%)
0:01 [1 rows, 29B] [1 rows/s, 41B/s]
    Show schema
presto:default> DESCRIBE pinot.dontcare.airlinestats;
        Column        |  Type   | Extra | Comment
----------------------+---------+-------+---------
 flightnum            | integer |       |
 origin               | varchar |       |
 quarter              | integer |       |
 lateaircraftdelay    | integer |       |
 divactualelapsedtime | integer |       |
......

Query 20191112_051021_00005_xkm4g, FINISHED, 1 node
Splits: 19 total, 19 done (100.00%)
0:02 [80 rows, 6.06KB] [35 rows/s, 2.66KB/s]
    Count total documents
presto:default> select count(*) as cnt from pinot.dontcare.airlinestats limit 10;
 cnt
------
 9745
(1 row)

Query 20191112_051114_00006_xkm4g, FINISHED, 1 node
Splits: 17 total, 17 done (100.00%)
0:00 [1 rows, 8B] [2 rows/s, 19B/s]

7. Deleting the Pinot cluster in Kubernetes

kubectl delete ns pinot-quickstart