This quickstart assumes that you already have a running Kubernetes cluster. Please follow the links below to set up a Kubernetes cluster.
Before continuing, please make sure that you've downloaded Apache Pinot. The scripts for the setup in this guide can be found in our open source project on GitHub.
The scripts can be found in the Pinot source at ./incubator-pinot/kubernetes/helm
# checkout pinot
git clone https://github.com/apache/incubator-pinot.git
cd incubator-pinot/kubernetes/helm
The Pinot repo has pre-packaged Helm charts for Pinot and Presto. The Helm repo index file is here.
helm repo add pinot https://raw.githubusercontent.com/apache/incubator-pinot/master/kubernetes/helm
kubectl create ns pinot-quickstart
helm install pinot pinot/pinot \
    -n pinot-quickstart \
    --set cluster.name=pinot \
    --set server.replicaCount=2
NOTE: Specify the StorageClass based on your cloud vendor. For Pinot Server, do not mount a blob store such as AzureFile, GoogleCloudStorage, or S3 as the data serving file system.
Only use block-storage disks such as Amazon EBS, GCP Persistent Disk, or Azure Disk.
For AWS: "gp2"
For GCP: "pd-ssd" or "standard"
For Azure: "AzureDisk"
For Docker-Desktop: "hostpath"
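The vendor-to-StorageClass choices above can be captured in a small shell helper. This is only a sketch: the function name `storage_class_for` and the vendor keywords are hypothetical, and for GCP it picks "pd-ssd" although "standard" is listed as equally valid.

```shell
# Map a vendor keyword to one of the StorageClass names listed above.
# The keywords (aws, gcp, azure, docker-desktop) are illustrative, not part of Pinot.
storage_class_for() {
  case "$1" in
    aws)            echo "gp2" ;;
    gcp)            echo "pd-ssd" ;;      # "standard" is the other listed option
    azure)          echo "AzureDisk" ;;
    docker-desktop) echo "hostpath" ;;
    *) echo "unknown vendor: $1" >&2; return 1 ;;
  esac
}

storage_class_for aws
```

The printed name is the value to put wherever the chart or manifest asks for a storage class.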
helm dependency update
For Helm v2.12.1
If your Kubernetes cluster is recently provisioned, ensure Helm is initialized by running:
helm init --service-account tiller
Then deploy a new HA Pinot cluster using the following command:
helm install --namespace "pinot-quickstart" --name "pinot" .
For Helm v3.0.0
kubectl create ns pinot-quickstart
helm install -n pinot-quickstart pinot .
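Because the install syntax differs between Helm v2 and v3, it can help to pick the command from the installed version. The sketch below parses a version string of the kind `helm version --short` prints. The function names are hypothetical and the two sample strings are only examples of that output format.

```shell
# Extract the major version from a string such as
# "v3.0.0+ge29ce2a" (Helm v3) or "Client: v2.12.1+g02a47c7" (Helm v2).
helm_major() {
  printf '%s\n' "$1" | sed -n 's/.*v\([0-9][0-9]*\)\..*/\1/p'
}

# Print (without running) the install command from this guide for that version.
pinot_install_cmd() {
  case "$(helm_major "$1")" in
    2) echo 'helm install --namespace "pinot-quickstart" --name "pinot" .' ;;
    3) echo 'helm install -n pinot-quickstart pinot .' ;;
    *) echo "unrecognized Helm version: $1" >&2; return 1 ;;
  esac
}

# Against a real installation, one would pass "$(helm version --short)" instead.
pinot_install_cmd 'v3.0.0+ge29ce2a'
```

For Helm v3, the namespace still has to be created first with kubectl create ns pinot-quickstart.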
Error: Please run the below command if encountering the following issue:
Error: could not find tiller.
Resolution:
kubectl -n kube-system delete deployment tiller-deploy
kubectl -n kube-system delete service/tiller-deploy
helm init --service-account tiller
Error: Please run the command below if encountering a permission issue:
Error: release pinot failed: namespaces "pinot-quickstart" is forbidden: User "system:serviceaccount:kube-system:default" cannot get resource "namespaces" in API group "" in the namespace "pinot-quickstart"
Resolution:
kubectl apply -f helm-rbac.yaml
kubectl get all -n pinot-quickstart
For Helm v3.0.0
helm repo add incubator https://charts.helm.sh/incubator
helm install -n pinot-quickstart kafka incubator/kafka --set replicas=1
For Helm v2.12.1
helm repo add incubator https://charts.helm.sh/incubator
helm install --namespace "pinot-quickstart" --name kafka incubator/kafka
kubectl get all -n pinot-quickstart |grep kafka
Ensure the Kafka deployment is ready before executing the scripts in the next steps.
pod/kafka-0             1/1   Running   0   2m
pod/kafka-zookeeper-0   1/1   Running   0   10m
pod/kafka-zookeeper-1   1/1   Running   0   9m
pod/kafka-zookeeper-2   1/1   Running   0   8m
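The readiness check can also be scripted instead of read by eye. The sketch below assumes pod lines in the column order shown above (name, ready count, status, restarts, age). `all_pods_ready` is a hypothetical helper, not part of kubectl or Pinot.

```shell
# Succeed only if every input line shows READY n/n and STATUS Running.
# Empty input counts as "not ready".
all_pods_ready() {
  awk '
    {
      seen = 1
      split($2, ready, "/")
      if (ready[1] != ready[2] || $3 != "Running") bad = 1
    }
    END { if (bad || !seen) exit 1; exit 0 }
  '
}

# Against a live cluster (not run here):
#   kubectl get pods -n pinot-quickstart --no-headers | grep kafka | all_pods_ready

printf 'pod/kafka-0 1/1 Running 0 2m\npod/kafka-zookeeper-0 1/1 Running 0 10m\n' \
  | all_pods_ready && echo "kafka is ready"
```

The commented pipeline uses kubectl get pods rather than kubectl get all so that service and statefulset lines do not reach the parser.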
The scripts below will create two Kafka topics for data ingestion:
kubectl -n pinot-quickstart exec kafka-0 -- kafka-topics --zookeeper kafka-zookeeper:2181 --topic flights-realtime --create --partitions 1 --replication-factor 1
kubectl -n pinot-quickstart exec kafka-0 -- kafka-topics --zookeeper kafka-zookeeper:2181 --topic flights-realtime-avro --create --partitions 1 --replication-factor 1
The script below will deploy 3 batch jobs.
Ingest 19492 JSON messages to Kafka topic flights-realtime at a speed of 1 msg/sec
Ingest 19492 Avro messages to Kafka topic flights-realtime-avro at a speed of 1 msg/sec
Upload Pinot schema airlineStats
Create Pinot table airlineStats to ingest data from JSON encoded Kafka topic flights-realtime
Create Pinot table airlineStatsAvro to ingest data from Avro encoded Kafka topic flights-realtime-avro
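At 1 msg/sec, each ingestion job takes several hours to publish all of its messages, so queries issued soon after deployment see only part of the data. The arithmetic can be sketched as follows. `ingest_duration` is a hypothetical helper that uses integer division, so it assumes the rate divides the message count evenly.

```shell
# Convert "messages at a given rate (msg/sec)" into hours, minutes, seconds.
ingest_duration() {
  total=$(( $1 / $2 ))   # total seconds, integer division
  printf '%dh %dm %ds\n' $(( total / 3600 )) $(( total % 3600 / 60 )) $(( total % 60 ))
}

ingest_duration 19492 1   # prints: 5h 24m 52s
```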
kubectl apply -f pinot-realtime-quickstart.yml
Please use the script below to perform local port-forwarding, which will also open Pinot query console in your default web browser.
This script can be found in the Pinot source at ./incubator-pinot/kubernetes/helm
./query-pinot-data.sh
Open the superset.yaml file, go to the line showing storageclass, and change it based on your cloud vendor:
For AWS: "gp2"
For GCP: "pd-ssd" or "standard"
For Azure: "AzureDisk"
For Docker-Desktop: "hostpath"
Then run:
kubectl apply -f superset.yaml
Ensure your cluster is up by running:
kubectl get all -n pinot-quickstart | grep superset
kubectl exec -it pod/superset-0 -n pinot-quickstart -- bash -c 'flask fab create-admin'
kubectl exec -it pod/superset-0 -n pinot-quickstart -- bash -c 'superset db upgrade'
kubectl exec -it pod/superset-0 -n pinot-quickstart -- bash -c 'superset init'
kubectl exec -it pod/superset-0 -n pinot-quickstart -- bash -c 'superset import_datasources -p /etc/superset/pinot_example_datasource.yaml'
kubectl exec -it pod/superset-0 -n pinot-quickstart -- bash -c 'superset import_dashboards -p /etc/superset/pinot_example_dashboard.json'
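The Superset setup commands depend on each other, so it can be useful to run them in sequence and stop at the first failure. The sketch below is one way to do that. `run_superset_steps` and `superset_exec` are hypothetical helper names, and the runner is passed in as an argument so the sequencing can be tried without a cluster.

```shell
# Run each step through the given runner command; stop at the first failure.
run_superset_steps() {
  runner=$1
  shift
  for step in "$@"; do
    "$runner" "$step" || { echo "failed: $step" >&2; return 1; }
  done
}

# Runner that executes a step inside the Superset pod, as in the commands above.
superset_exec() {
  kubectl exec -it pod/superset-0 -n pinot-quickstart -- bash -c "$1"
}

# Against a live cluster (not run here):
#   run_superset_steps superset_exec 'superset db upgrade' 'superset init'

# Dry run: use echo as the runner to list the steps in order.
run_superset_steps echo 'superset db upgrade' 'superset init'
```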
Run the command below to open Superset in your browser, then log in with the admin credentials created earlier.
./open-superset-ui.sh
You can open the imported dashboard by clicking the Dashboards banner and then clicking AirlineStats.
You can run the command below to deploy a customized Presto with the Pinot plugin installed.
helm install presto pinot/presto -n pinot-quickstart
kubectl apply -f presto-coordinator.yaml
The above command deploys Presto with default configs. For customizing your deployment, you can run the below command to get all the configurable values.
helm inspect values pinot/presto > /tmp/presto-values.yaml
After modifying the /tmp/presto-values.yaml file, you can deploy Presto with:
helm install presto pinot/presto -n pinot-quickstart --values /tmp/presto-values.yaml
Once you have deployed Presto, you can check the deployment status with:
kubectl get pod -n pinot-quickstart
Once Presto is deployed, you can run the below command from here, or just follow steps 6.2.1 to 6.2.3.
./pinot-presto-cli.sh
6.2.1 Download Presto CLI
curl -L https://repo1.maven.org/maven2/com/facebook/presto/presto-cli/0.246/presto-cli-0.246-executable.jar -o /tmp/presto-cli && chmod +x /tmp/presto-cli
6.2.2 Port forward presto-coordinator port 8080 to localhost port 18080
kubectl port-forward service/presto-coordinator 18080:8080 -n pinot-quickstart > /dev/null &
6.2.3 Start Presto CLI with the pinot catalog
/tmp/presto-cli --server localhost:18080 --catalog pinot --schema default
6.2.4 Query Pinot data using Presto CLI
List all catalogs
presto:default> show catalogs;
 Catalog
---------
 pinot
 system
(2 rows)

Query 20191112_050827_00003_xkm4g, FINISHED, 1 node
Splits: 19 total, 19 done (100.00%)
0:01 [0 rows, 0B] [0 rows/s, 0B/s]
List All tables
presto:default> show tables;
    Table
--------------
 airlinestats
(1 row)

Query 20191112_050907_00004_xkm4g, FINISHED, 1 node
Splits: 19 total, 19 done (100.00%)
0:01 [1 rows, 29B] [1 rows/s, 41B/s]
Show schema
presto:default> DESCRIBE pinot.dontcare.airlinestats;
        Column        |  Type   | Extra | Comment
----------------------+---------+-------+---------
 flightnum            | integer |       |
 origin               | varchar |       |
 quarter              | integer |       |
 lateaircraftdelay    | integer |       |
 divactualelapsedtime | integer |       |
......

Query 20191112_051021_00005_xkm4g, FINISHED, 1 node
Splits: 19 total, 19 done (100.00%)
0:02 [80 rows, 6.06KB] [35 rows/s, 2.66KB/s]
Count total documents
presto:default> select count(*) as cnt from pinot.dontcare.airlinestats limit 10;
 cnt
------
 9745
(1 row)

Query 20191112_051114_00006_xkm4g, FINISHED, 1 node
Splits: 17 total, 17 done (100.00%)
0:00 [1 rows, 8B] [2 rows/s, 19B/s]
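The same count can be fetched non-interactively, which is convenient in scripts. The sketch below assumes the Presto CLI's `--execute` and `--output-format CSV` options and that CSV output wraps each value in double quotes. `csv_first_value` is a hypothetical helper.

```shell
# Take the first line of CSV output and strip the surrounding double quotes.
csv_first_value() {
  head -n 1 | tr -d '"'
}

# Against a live deployment with the port-forward active (not run here):
#   /tmp/presto-cli --server localhost:18080 --catalog pinot --schema default \
#     --output-format CSV \
#     --execute 'select count(*) as cnt from pinot.dontcare.airlinestats' \
#     | csv_first_value

# Offline demonstration with a sample CSV line:
printf '"9745"\n' | csv_first_value
```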
kubectl delete ns pinot-quickstart