1. Prerequisites
This quickstart assumes that you already have a running Kubernetes cluster. If you don't, set one up first, for example a managed cluster from your cloud vendor or the Kubernetes cluster built into Docker Desktop.
2. Setting up a Pinot cluster in Kubernetes
Before continuing, make sure that you've downloaded Apache Pinot. The setup scripts used in this guide are in the Pinot open source project on GitHub, at ./pinot/kubernetes/helm in the Pinot source.
Clone the project source:
# checkout pinot
git clone https://github.com/apache/pinot.git
cd pinot/kubernetes/helm
2.1 Start Pinot with Helm
Run Helm with Pre-installed Package
The Pinot repo has pre-packaged Helm charts for Pinot and Presto. Helm reads the repo index file (index.yaml) from the repository URL added below.
helm repo add pinot https://raw.githubusercontent.com/apache/pinot/master/kubernetes/helm
kubectl create ns pinot-quickstart
helm install pinot pinot/pinot \
-n pinot-quickstart \
--set cluster.name=pinot \
--set server.replicaCount=2
NOTE: Specify a StorageClass that matches your cloud vendor. For Pinot Server, don't mount a blob store such as AzureFile/GoogleCloudStorage/S3 as the data-serving file system.
Use only block-storage disks such as Amazon EBS, GCP Persistent Disk, or Azure Disk. For example:
For GCP: "pd-ssd" or "standard"
For Docker-Desktop: "hostpath"
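If you prefer to set the storage class on the command line, a minimal sketch follows. It assumes the pinot/pinot chart exposes controller.persistence.storageClass and server.persistence.storageClass; confirm the key names with helm inspect values pinot/pinot before relying on them. The helper names pinot_storage_class and pinot_storage_flags are hypothetical, and the mapping only covers the examples listed above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: map a platform name to one of the storage classes
# listed in this guide. Verify the result against `kubectl get sc`.
pinot_storage_class() {
  case "$1" in
    gcp)            echo "pd-ssd" ;;    # "standard" is the other GCP option
    docker-desktop) echo "hostpath" ;;
    *)
      echo "unknown platform: $1 (run 'kubectl get sc' and pick a block-storage class)" >&2
      return 1
      ;;
  esac
}

# Print helm --set flags for the controller and server volumes.
# ASSUMPTION: these value keys exist in the pinot/pinot chart.
pinot_storage_flags() {
  local sc
  sc=$(pinot_storage_class "$1") || return 1
  echo "--set controller.persistence.storageClass=$sc --set server.persistence.storageClass=$sc"
}
```

For example, append $(pinot_storage_flags gcp) unquoted to the helm install command above, so that the shell splits it into separate --set arguments.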
Run Helm Script within Git Repo
2.1.1 Update helm dependency
helm dependency update
2.1.2 Start Pinot with Helm
For Helm v2 only: if your Kubernetes cluster was recently provisioned, make sure Helm is initialized by running:
helm init --service-account tiller
Then deploy a new HA Pinot cluster using the command for your Helm version.
For Helm v2:
helm install --namespace "pinot-quickstart" --name "pinot" pinot
For Helm v3:
kubectl create ns pinot-quickstart
helm install -n pinot-quickstart pinot pinot
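Since the install syntax differs between Helm v2 and v3, the sketch below picks the matching command from helm version --short output. The helper names are hypothetical, and the function only prints the command so you can review it before running it.

```bash
#!/usr/bin/env bash
# Extract the major version from `helm version --short` output on stdin.
# Helm v2 prints "Client: v2.x.y+g..." (plus a "Server:" line if Tiller is
# reachable); Helm v3 prints "v3.x.y+g...".
helm_major_version() {
  grep -oE 'v[0-9]+' | sed -n '1s/^v//p'
}

# Print the install command used in this guide for a Helm major version.
pinot_install_command() {
  case "$1" in
    2) echo 'helm install --namespace "pinot-quickstart" --name "pinot" pinot' ;;
    3) echo 'helm install -n pinot-quickstart pinot pinot' ;;
    *) echo "unsupported Helm major version: $1" >&2; return 1 ;;
  esac
}
```

Usage: pinot_install_command "$(helm version --short | helm_major_version)". With Helm v2 before Tiller is installed, use helm version --client --short instead; with Helm v3, create the namespace first.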
2.1.3 Troubleshooting (For helm v2.12.1)
If you encounter the following error:
Error: could not find tiller.
run the commands below:
kubectl -n kube-system delete deployment tiller-deploy
kubectl -n kube-system delete service/tiller-deploy
helm init --service-account tiller
If you encounter a permission issue such as the following:
Error: release pinot failed: namespaces "pinot-quickstart" is forbidden: User "system:serviceaccount:kube-system:default" cannot get resource "namespaces" in API group "" in the namespace "pinot-quickstart"
run the command below:
kubectl apply -f helm-rbac.yaml
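If helm-rbac.yaml is not at hand, the usual Helm v2 remedy is to run Tiller under its own service account bound to the cluster-admin role. The sketch below does this imperatively; it assumes helm-rbac.yaml defines an equivalent ServiceAccount and ClusterRoleBinding named tiller, which you should confirm against the file. If Tiller is already installed, delete it first as in the previous troubleshooting step. Binding cluster-admin is acceptable for a throwaway quickstart but too broad for production. The function name is hypothetical.

```bash
#!/usr/bin/env bash
# Helm v2 only: create a Tiller service account with cluster-admin rights,
# then initialize Helm with it.
# ASSUMPTION: this matches what helm-rbac.yaml sets up.
setup_tiller_rbac() {
  kubectl -n kube-system create serviceaccount tiller
  kubectl create clusterrolebinding tiller \
    --clusterrole=cluster-admin --serviceaccount=kube-system:tiller
  helm init --service-account tiller
}
```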
2.2 Check Pinot deployment status
kubectl get all -n pinot-quickstart
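kubectl get all prints a one-off snapshot. To wait in a script, the sketch below parses kubectl get pods --no-headers output and polls until every pod is Running with all containers ready. The function names are hypothetical; kubectl wait --for=condition=Ready pod --all -n pinot-quickstart --timeout=600s is a built-in alternative.

```bash
#!/usr/bin/env bash
# Read `kubectl get pods --no-headers` output on stdin. Succeed only if there
# is at least one pod and every pod is Running with READY n/n.
# Pods in the Completed state (finished jobs) are skipped.
all_pods_ready() {
  awk '
    NF == 0 || $1 == "NAME" { next }
    $3 == "Completed" { next }
    {
      total++
      split($2, r, "/")
      if (r[1] != r[2] || $3 != "Running") not_ready++
    }
    END { exit !(total > 0 && not_ready == 0) }
  '
}

# Poll a namespace every 10 seconds until all pods are ready or TIMEOUT
# (in seconds, default 600) expires.
wait_for_pods() {
  local ns=$1 timeout=${2:-600} waited=0
  until kubectl get pods -n "$ns" --no-headers 2>/dev/null | all_pods_ready; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "timed out waiting for pods in $ns" >&2
      return 1
    fi
    sleep 10
    waited=$((waited + 10))
  done
}
```

Usage: wait_for_pods pinot-quickstart 900.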
3. Load data into Pinot using Kafka
3.1 Bring up a Kafka cluster for real-time data ingestion
For Helm v3.0.0
helm repo add incubator https://charts.helm.sh/incubator
helm install -n pinot-quickstart kafka incubator/kafka --set replicas=1
For Helm v2.12.1
helm repo add incubator https://charts.helm.sh/incubator
helm install --namespace "pinot-quickstart" --name kafka incubator/kafka
3.2 Check Kafka deployment status
kubectl get all -n pinot-quickstart | grep kafka
Ensure the Kafka deployment is ready before running the scripts in the next steps. The output should look similar to the following:
pod/kafka-0 1/1 Running 0 2m
pod/kafka-zookeeper-0 1/1 Running 0 10m
pod/kafka-zookeeper-1 1/1 Running 0 9m
pod/kafka-zookeeper-2 1/1 Running 0 8m
3.3 Create Kafka topics
The scripts below will create two Kafka topics for data ingestion:
kubectl -n pinot-quickstart exec kafka-0 -- kafka-topics --zookeeper kafka-zookeeper:2181 --topic flights-realtime --create --partitions 1 --replication-factor 1
kubectl -n pinot-quickstart exec kafka-0 -- kafka-topics --zookeeper kafka-zookeeper:2181 --topic flights-realtime-avro --create --partitions 1 --replication-factor 1
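The two commands above differ only in the topic name. A sketch that generates them in a loop is below; it adds the kafka-topics --if-not-exists flag so that re-running is harmless, and ends with --list to confirm the result. The function name is hypothetical; the pod name kafka-0 and the ZooKeeper address are taken from the commands above.

```bash
#!/usr/bin/env bash
# Print one kafka-topics command per quickstart topic, plus a final --list.
# Review the output, then pipe it to `sh` to execute it.
quickstart_topic_commands() {
  local base="kubectl -n pinot-quickstart exec kafka-0 -- kafka-topics --zookeeper kafka-zookeeper:2181"
  local topic
  for topic in flights-realtime flights-realtime-avro; do
    # --if-not-exists makes re-runs harmless when the topic already exists.
    echo "$base --topic $topic --create --if-not-exists --partitions 1 --replication-factor 1"
  done
  echo "$base --list"
}
```

Usage: inspect the output of quickstart_topic_commands, then run quickstart_topic_commands | sh.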
3.4 Load data into Kafka and create Pinot schema/tables
The script below deploys 3 batch jobs:
- Ingest 19492 JSON messages to Kafka topic flights-realtime at a speed of 1 msg/sec.
- Ingest 19492 Avro messages to Kafka topic flights-realtime-avro at a speed of 1 msg/sec.
- Upload Pinot schema airlineStats, create Pinot table airlineStats to ingest data from the JSON-encoded Kafka topic flights-realtime, and create Pinot table airlineStatsAvro to ingest data from the Avro-encoded Kafka topic flights-realtime-avro.
kubectl apply -f pinot/pinot-realtime-quickstart.yml
4. Query using Pinot Data Explorer
4.1 Pinot Data Explorer
Use the script below to set up local port forwarding; it also opens the Pinot query console in your default web browser.
This script can be found in the Pinot source at ./pinot/kubernetes/helm/pinot
./query-pinot-data.sh
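You can also query Pinot from the command line through the broker's REST API (POST /query/sql with a JSON body of the form {"sql": "..."}). The sketch below assumes the broker service is named pinot-broker, listens on the default broker port 8099, and has been port-forwarded, e.g. kubectl port-forward service/pinot-broker 8099:8099 -n pinot-quickstart. The helper names are hypothetical, and only single-line SQL is handled.

```bash
#!/usr/bin/env bash
# Build the JSON body for the broker's POST /query/sql endpoint,
# escaping backslashes and double quotes in the (single-line) SQL text.
pinot_sql_payload() {
  local sql
  sql=$(printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')
  printf '{"sql":"%s"}\n' "$sql"
}

# Send a query to a port-forwarded broker.
# ASSUMPTION: the broker is reachable at localhost:8099.
pinot_query() {
  curl -s -X POST -H 'Content-Type: application/json' \
    -d "$(pinot_sql_payload "$1")" "http://localhost:8099/query/sql"
}
```

Usage: pinot_query 'select count(*) from airlineStats'.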
5. Using Superset to query Pinot
5.1 Bring up Superset
Open the superset.yaml file, go to the line showing storageClass, and change it based on your cloud vendor. Running kubectl get sc lists the storageClass values available in your Kubernetes cluster. For example:
For GCP: "pd-ssd" or "standard"
For Docker-Desktop: "hostpath"
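The storageClass edit can be scripted with sed. This sketch assumes the value sits on a line of the form storageClass: ... or storageClassName: ... in superset.yaml, so check the file first. It keeps the original as superset.yaml.bak. The helper name is hypothetical.

```bash
#!/usr/bin/env bash
# Replace the value on storageClass/storageClassName lines of a YAML file,
# keeping the original as FILE.bak. Usage: set_storage_class FILE CLASS
# CLASS must not contain '/' or '&' (they are special in the sed replacement).
set_storage_class() {
  local file=$1 class=$2
  sed -i.bak -E "s/^([[:space:]]*storageClass[A-Za-z]*:[[:space:]]*).*\$/\\1\"$class\"/" "$file"
}
```

For example, on Docker Desktop: set_storage_class superset.yaml hostpath, then grep -n storageClass superset.yaml to confirm the change.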
Then run:
kubectl apply -f superset.yaml
Make sure Superset is up by running:
kubectl get all -n pinot-quickstart | grep superset
5.2 (First time) Set up Admin account
kubectl exec -it pod/superset-0 -n pinot-quickstart -- bash -c 'flask fab create-admin'
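flask fab create-admin prompts for each field. For scripted setups, Flask-AppBuilder's create-admin command also accepts the fields as options, as in the sketch below. It assumes, like the command above, that flask is on the container's PATH with the Superset app configured; the function name and the Admin/User names are placeholders. A password passed on the command line ends up in your shell history, so prefer the interactive form outside a throwaway setup.

```bash
#!/usr/bin/env bash
# Create the Superset admin account without interactive prompts.
# Usage: superset_create_admin USERNAME EMAIL PASSWORD
# First and last names are placeholders; change them if you like.
superset_create_admin() {
  local user=$1 email=$2 password=$3
  kubectl exec pod/superset-0 -n pinot-quickstart -- \
    flask fab create-admin \
      --username "$user" --firstname Admin --lastname User \
      --email "$email" --password "$password"
}
```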
5.3 (First time) Init Superset
kubectl exec -it pod/superset-0 -n pinot-quickstart -- bash -c 'superset db upgrade'
kubectl exec -it pod/superset-0 -n pinot-quickstart -- bash -c 'superset init'
5.4 Load Demo data source
kubectl exec -it pod/superset-0 -n pinot-quickstart -- bash -c 'superset import_datasources -p /etc/superset/pinot_example_datasource.yaml'
kubectl exec -it pod/superset-0 -n pinot-quickstart -- bash -c 'superset import_dashboards -p /etc/superset/pinot_example_dashboard.json'
5.5 Access Superset UI
Run the command below to open Superset in your browser, then log in with the admin credentials you created earlier.
./open-superset-ui.sh
You can open the imported dashboard by clicking the Dashboards banner and then clicking AirlineStats.
6. Access Pinot using Trino
6.1 Deploy Trino
The steps below deploy Trino with the Pinot plugin installed. First, add the Trino Helm chart repo:
helm repo add trino https://trinodb.github.io/charts/
You can then run the command below to see the available charts:
helm search repo trino
To connect Trino to Pinot, you need to add a Pinot catalog, which requires extra configuration. Run the command below to save all configurable values to a file:
helm inspect values trino/trino > /tmp/trino-values.yaml
To add the Pinot catalog, edit the additionalCatalogs section by adding:
additionalCatalogs:
  pinot: |
    connector.name=pinot
    pinot.controller-urls=pinot-controller.pinot-quickstart:9000
Pinot is deployed in the pinot-quickstart namespace, so the controller service URL is pinot-controller.pinot-quickstart:9000.
After modifying the /tmp/trino-values.yaml file, you can deploy Trino with:
kubectl create ns trino-quickstart
helm install my-trino trino/trino --version 0.2.0 -n trino-quickstart --values /tmp/trino-values.yaml
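As an alternative to editing the dumped /tmp/trino-values.yaml by hand, you can keep the Pinot catalog in a small override file and pass it as a second --values flag; Helm merges multiple values files, with later files taking precedence. The file path /tmp/trino-pinot-catalog.yaml and the function name are hypothetical.

```bash
#!/usr/bin/env bash
# Write a Helm values override that adds the Pinot catalog to the Trino chart.
# Usage: write_trino_pinot_values OUTPUT_FILE [PINOT_NAMESPACE]
write_trino_pinot_values() {
  local out=$1 ns=${2:-pinot-quickstart}
  cat > "$out" <<EOF
additionalCatalogs:
  pinot: |
    connector.name=pinot
    pinot.controller-urls=pinot-controller.${ns}:9000
EOF
}
```

Usage:
write_trino_pinot_values /tmp/trino-pinot-catalog.yaml
helm install my-trino trino/trino --version 0.2.0 -n trino-quickstart --values /tmp/trino-values.yaml --values /tmp/trino-pinot-catalog.yaml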
Once Trino is deployed, you can check its deployment status with:
kubectl get pods -n trino-quickstart
6.2 Query Trino using Trino CLI
Once Trino is deployed, run the commands below to get a runnable Trino CLI and connect it to Trino.
6.2.1 Download Trino CLI
curl -L https://repo1.maven.org/maven2/io/trino/trino-cli/363/trino-cli-363-executable.jar -o /tmp/trino && chmod +x /tmp/trino
6.2.2 Port forward the Trino service to your local machine if it's not already exposed
echo "Visit http://127.0.0.1:18080 to use your application"
kubectl port-forward service/my-trino 18080:8080 -n trino-quickstart
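Before starting the CLI, you can check whether the coordinator has finished starting up. Trino's REST endpoint /v1/info returns JSON that includes a "starting" field; the sketch below polls it through the port-forward above (localhost:18080), from a second terminal. The function names are hypothetical, and curl is assumed to be installed.

```bash
#!/usr/bin/env bash
# Succeed if a /v1/info JSON document on stdin reports "starting": false.
trino_info_ready() {
  tr -d ' \n' | grep '"starting":false' >/dev/null
}

# Poll the info endpoint every 5 seconds until Trino is ready or the
# attempts run out. Defaults: http://localhost:18080/v1/info, 30 attempts.
wait_for_trino() {
  local url=${1:-http://localhost:18080/v1/info} attempts=${2:-30} i
  for ((i = 0; i < attempts; i++)); do
    if curl -fsS "$url" 2>/dev/null | trino_info_ready; then
      return 0
    fi
    sleep 5
  done
  echo "Trino at $url did not become ready" >&2
  return 1
}
```

Usage: wait_for_trino && /tmp/trino --server localhost:18080 --catalog pinot --schema default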
6.2.3 Use Trino console client to connect to Trino service
/tmp/trino --server localhost:18080 --catalog pinot --schema default
6.2.4 Query Pinot data using Trino CLI
6.3 Sample queries to execute
trino:default> show catalogs;
Catalog
---------
pinot
system
tpcds
tpch
(4 rows)
Query 20211025_010256_00002_mxcvx, FINISHED, 2 nodes
Splits: 36 total, 36 done (100.00%)
0.70 [0 rows, 0B] [0 rows/s, 0B/s]
trino:default> show tables;
Table
--------------
airlinestats
(1 row)
Query 20211025_010326_00003_mxcvx, FINISHED, 3 nodes
Splits: 36 total, 36 done (100.00%)
0.28 [1 rows, 29B] [3 rows/s, 104B/s]
trino:default> DESCRIBE airlinestats;
Column | Type | Extra | Comment
----------------------+----------------+-------+---------
flightnum | integer | |
origin | varchar | |
quarter | integer | |
lateaircraftdelay | integer | |
divactualelapsedtime | integer | |
divwheelsons | array(integer) | |
divwheelsoffs | array(integer) | |
......
Query 20211025_010414_00006_mxcvx, FINISHED, 3 nodes
Splits: 36 total, 36 done (100.00%)
0.37 [79 rows, 5.96KB] [212 rows/s, 16KB/s]
trino:default> select count(*) as cnt from airlinestats limit 10;
cnt
------
9746
(1 row)
Query 20211025_015607_00009_mxcvx, FINISHED, 2 nodes
Splits: 17 total, 17 done (100.00%)
0.24 [1 rows, 9B] [4 rows/s, 38B/s]
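The Trino CLI can also run a single statement non-interactively through its --execute and --output-format options, which is handy in scripts. The sketch below wraps the server, catalog, and schema used above; the function name and the TRINO_CLI override variable are hypothetical.

```bash
#!/usr/bin/env bash
# Run one SQL statement against the Pinot catalog and print CSV rows.
# Uses the CLI downloaded to /tmp/trino unless TRINO_CLI points elsewhere,
# and the port-forward on localhost:18080.
trino_pinot_query() {
  "${TRINO_CLI:-/tmp/trino}" --server localhost:18080 --catalog pinot --schema default \
    --output-format CSV --execute "$1"
}
```

For example, trino_pinot_query 'select count(*) from airlinestats'. Because the quickstart jobs ingest at 1 msg/sec, the count keeps growing while ingestion is running, so expect a different number on each run.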
7. Access Pinot using Presto
7.1 Deploy Presto using Pinot plugin
You can run the command below to deploy a customized Presto with the Pinot plugin installed.
Helm
helm install presto pinot/presto -n pinot-quickstart
K8s Scripts
kubectl apply -f presto-coordinator.yaml
The commands above deploy Presto with the default configs. To customize your deployment, run the command below to save all configurable values to a file:
helm inspect values pinot/presto > /tmp/presto-values.yaml
After modifying the /tmp/presto-values.yaml file, you can deploy Presto with:
helm install presto pinot/presto -n pinot-quickstart --values /tmp/presto-values.yaml
Once Presto is deployed, you can check its deployment status with:
kubectl get pods -n pinot-quickstart
7.2 Query Presto using Presto CLI
Once Presto is deployed, you can run the script below, or follow the manual steps that come after it.
./pinot-presto-cli.sh
7.2.1 Download Presto CLI
curl -L https://repo1.maven.org/maven2/com/facebook/presto/presto-cli/0.246/presto-cli-0.246-executable.jar -o /tmp/presto-cli && chmod +x /tmp/presto-cli
7.2.2 Port forward presto-coordinator port 8080 to localhost port 18080
kubectl port-forward service/presto-coordinator 18080:8080 -n pinot-quickstart > /dev/null &
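The trailing & above leaves the port-forward running after you are done with it. For scripted use, the sketch below starts a background command (such as the port-forward) only for the duration of one foreground command and then stops it. The function name with_background and the PF_WARMUP_SECONDS variable are hypothetical, and the 2-second default warm-up is a guess, not a guarantee that the port is already bound.

```bash
#!/usr/bin/env bash
# Usage: with_background BG_CMD... -- FG_CMD...
# Starts BG_CMD in the background, waits PF_WARMUP_SECONDS (default 2),
# runs FG_CMD, then stops BG_CMD. Returns FG_CMD's exit status.
with_background() {
  local -a bg_cmd=()
  while [ "$#" -gt 0 ] && [ "$1" != "--" ]; do
    bg_cmd+=("$1")
    shift
  done
  if [ "$#" -eq 0 ] || [ "${#bg_cmd[@]}" -eq 0 ]; then
    echo "usage: with_background BG_CMD... -- FG_CMD..." >&2
    return 2
  fi
  shift  # drop the "--" separator

  "${bg_cmd[@]}" >/dev/null 2>&1 &
  local bg_pid=$!
  sleep "${PF_WARMUP_SECONDS:-2}"  # give the port-forward time to bind

  local rc=0
  "$@" || rc=$?
  kill "$bg_pid" 2>/dev/null || true
  wait "$bg_pid" 2>/dev/null || true
  return "$rc"
}
```

Usage: with_background kubectl port-forward service/presto-coordinator 18080:8080 -n pinot-quickstart -- /tmp/presto-cli --server localhost:18080 --catalog pinot --schema default --execute 'show tables'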
7.2.3 Start the Presto CLI with the pinot catalog
/tmp/presto-cli --server localhost:18080 --catalog pinot --schema default
7.2.4 Query Pinot data using Presto CLI
7.3 Sample queries to execute
presto:default> show catalogs;
Catalog
---------
pinot
system
(2 rows)
Query 20191112_050827_00003_xkm4g, FINISHED, 1 node
Splits: 19 total, 19 done (100.00%)
0:01 [0 rows, 0B] [0 rows/s, 0B/s]
presto:default> show tables;
Table
--------------
airlinestats
(1 row)
Query 20191112_050907_00004_xkm4g, FINISHED, 1 node
Splits: 19 total, 19 done (100.00%)
0:01 [1 rows, 29B] [1 rows/s, 41B/s]
presto:default> DESCRIBE pinot.dontcare.airlinestats;
Column | Type | Extra | Comment
----------------------+---------+-------+---------
flightnum | integer | |
origin | varchar | |
quarter | integer | |
lateaircraftdelay | integer | |
divactualelapsedtime | integer | |
......
Query 20191112_051021_00005_xkm4g, FINISHED, 1 node
Splits: 19 total, 19 done (100.00%)
0:02 [80 rows, 6.06KB] [35 rows/s, 2.66KB/s]
presto:default> select count(*) as cnt from pinot.dontcare.airlinestats limit 10;
cnt
------
9745
(1 row)
Query 20191112_051114_00006_xkm4g, FINISHED, 1 node
Splits: 17 total, 17 done (100.00%)
0:00 [1 rows, 8B] [2 rows/s, 19B/s]
8. Deleting the Pinot cluster in Kubernetes
kubectl delete ns pinot-quickstart
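Deleting the namespace removes everything installed into it, and with Helm v3 the release records go with it because they are stored in the release's namespace. Two things are left behind by the command above: the trino-quickstart namespace from section 6 and, with Helm v2, the release records that Tiller keeps in kube-system. The sketch below covers both; the function name is hypothetical, and the release names match the Helm v2 commands used in this guide.

```bash
#!/usr/bin/env bash
# Tear down the quickstart. Usage: teardown_quickstart [HELM_MAJOR_VERSION]
teardown_quickstart() {
  local helm_major=${1:-3} release
  if [ "$helm_major" = "2" ]; then
    # Helm v2: Tiller keeps release records in kube-system; purge them first.
    for release in kafka pinot; do
      helm delete --purge "$release" || true
    done
  fi
  # Deleting the namespaces removes every resource created in them
  # (and, with Helm v3, the release records as well).
  kubectl delete ns pinot-quickstart trino-quickstart --ignore-not-found
}
```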