Running Pinot in Kubernetes
Pinot quick start in Kubernetes
1. Prerequisites
2. Setting up a Pinot cluster in Kubernetes
Before continuing, please make sure that you've downloaded Apache Pinot. The scripts used in this guide are part of our open source project on GitHub, in the Pinot source tree at ./incubator-pinot/kubernetes/helm
# Check out the Pinot source
git clone https://github.com/apache/incubator-pinot.git
cd incubator-pinot/kubernetes/helm
2.1 Start Pinot with Helm
The Pinot repo has pre-packaged Helm charts for Pinot and Presto. The Helm repo index file is here.
helm repo add pinot https://raw.githubusercontent.com/apache/incubator-pinot/master/kubernetes/helm
kubectl create ns pinot-quickstart
helm install pinot pinot/pinot \
    -n pinot-quickstart \
    --set cluster.name=pinot \
    --set server.replicaCount=2
2.2 Check Pinot deployment status
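Besides the `kubectl get all` listing that follows, Helm itself can report on the release installed in step 2.1. A minimal sketch, assuming the release name `pinot` and namespace `pinot-quickstart` used above; the function name `pinot_release_status` is hypothetical:

```shell
# Hypothetical helper: summarize the Helm release installed in step 2.1.
# Assumes release "pinot" in namespace "pinot-quickstart"; both can be overridden.
pinot_release_status() {
  local release="${1:-pinot}"
  local namespace="${2:-pinot-quickstart}"
  # List every release in the namespace (name, revision, status, chart).
  helm list -n "$namespace"
  # Show the detailed status of the Pinot release.
  helm status "$release" -n "$namespace"
}
```

Usage: `pinot_release_status`. A `STATUS: deployed` line means Helm applied the chart; the pods may still be starting, which is what `kubectl get all` shows.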
kubectl get all -n pinot-quickstart
3. Load data into Pinot using Kafka
3.1 Bring up a Kafka cluster for real-time data ingestion
helm repo add incubator http://storage.googleapis.com/kubernetes-charts-incubator
helm install -n pinot-quickstart kafka incubator/kafka --set replicas=1
3.2 Check Kafka deployment status
kubectl get all -n pinot-quickstart | grep kafka
Ensure the Kafka deployment is ready before running the scripts in the following steps. The output should look similar to this:
pod/kafka-0                  1/1     Running     0          2m
pod/kafka-zookeeper-0        1/1     Running     0          10m
pod/kafka-zookeeper-1        1/1     Running     0          9m
pod/kafka-zookeeper-2        1/1     Running     0          8m
3.3 Create Kafka topics
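The topic-creation commands exec into `kafka-0`, so they fail if the broker or ZooKeeper is not ready yet. Instead of re-running the `grep` check by hand, you can block until those pods are Ready with `kubectl wait`. A sketch, assuming the pod names from the step 3.2 output; the function name `wait_for_kafka` and the 300-second timeout are illustrative choices:

```shell
# Hypothetical helper: block until the Kafka broker and ZooKeeper pods are Ready.
# Pod names come from the `kubectl get all` output in step 3.2;
# adjust them if your replica counts differ.
wait_for_kafka() {
  local namespace="${1:-pinot-quickstart}"
  kubectl wait --for=condition=Ready \
    pod/kafka-0 pod/kafka-zookeeper-0 pod/kafka-zookeeper-1 pod/kafka-zookeeper-2 \
    -n "$namespace" --timeout=300s
}
```

Usage: `wait_for_kafka && echo "Kafka is ready"`. `kubectl wait` exits non-zero if the timeout expires first.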
The scripts below will create two Kafka topics for data ingestion:
kubectl -n pinot-quickstart exec kafka-0 -- kafka-topics --zookeeper kafka-zookeeper:2181 --topic flights-realtime --create --partitions 1 --replication-factor 1
kubectl -n pinot-quickstart exec kafka-0 -- kafka-topics --zookeeper kafka-zookeeper:2181 --topic flights-realtime-avro --create --partitions 1 --replication-factor 1
3.4 Load data into Kafka and create Pinot schema/tables
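Before loading data, you can confirm that both topics from step 3.3 exist. This sketch reuses the `kafka-topics` invocation from that step with `--list` in place of `--create`; the function name `check_topics` is hypothetical:

```shell
# Hypothetical helper: verify that the two quickstart topics exist in Kafka.
check_topics() {
  local namespace="${1:-pinot-quickstart}"
  local topics t
  # List all topics through the same broker pod and ZooKeeper address used to create them.
  topics="$(kubectl -n "$namespace" exec kafka-0 -- \
    kafka-topics --zookeeper kafka-zookeeper:2181 --list)" || return 1
  for t in flights-realtime flights-realtime-avro; do
    # -x matches whole lines and -F treats the name literally.
    if ! printf '%s\n' "$topics" | grep -qxF "$t"; then
      echo "missing topic: $t" >&2
      return 1
    fi
  done
  echo "all topics present"
}
```

Matching whole lines with `grep -x` matters here, because `flights-realtime` is a prefix of `flights-realtime-avro`.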
The script below deploys 3 batch jobs, which together:
- Ingest 19492 JSON messages into the Kafka topic flights-realtime at a rate of 1 msg/sec
- Ingest 19492 Avro messages into the Kafka topic flights-realtime-avro at a rate of 1 msg/sec
- Upload the Pinot schema airlineStats
- Create the Pinot table airlineStats to ingest data from the JSON-encoded Kafka topic flights-realtime
- Create the Pinot table airlineStatsAvro to ingest data from the Avro-encoded Kafka topic flights-realtime-avro
kubectl apply -f pinot-realtime-quickstart.yml
4. Query using Pinot Data Explorer
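Once the table-creation job from step 3.4 has run, you can optionally confirm that the tables exist by calling the Pinot controller's REST API directly, without the browser. This sketch assumes the controller is exposed as a Kubernetes service named `pinot-controller` on port 9000, inferred from the Helm release name `pinot` rather than stated in this guide; the function name `list_pinot_tables` and the 3-second startup pause are hypothetical choices:

```shell
# Hypothetical helper: list Pinot tables through a temporary port-forward.
# ASSUMPTION: the controller service is "pinot-controller", listening on port 9000.
list_pinot_tables() {
  local namespace="${1:-pinot-quickstart}"
  local pf_pid status
  # Start the port-forward in the background; discard its output.
  kubectl port-forward service/pinot-controller 9000:9000 -n "$namespace" >/dev/null 2>&1 &
  pf_pid=$!
  # Give the port-forward a moment to establish before calling the API.
  sleep 3
  # GET /tables returns a JSON document listing table names.
  curl -s http://localhost:9000/tables
  status=$?
  # Stop the background port-forward.
  kill "$pf_pid" 2>/dev/null
  return "$status"
}
```

Usage: `list_pinot_tables`. The response should include `airlineStats` and `airlineStatsAvro` once the tables have been created.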
4.1 Pinot Data Explorer
Use the script below to set up local port-forwarding; it also opens the Pinot query console in your default web browser.
This script can be found in the Pinot source at ./incubator-pinot/kubernetes/helm
./query-pinot-data.sh
5. Using Superset to query Pinot
5.1 Bring up Superset
kubectl apply -f superset.yaml
5.2 (First time) Set up Admin account
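The `flask fab create-admin` command below prompts interactively for the admin's username, name, email, and password. For a scripted setup, Flask-AppBuilder's `create-admin` also accepts these as options, and waiting for `superset-0` to become Ready first avoids exec errors while the pod is still starting. A sketch; the function name `create_superset_admin` and every credential value are placeholders to replace with your own:

```shell
# Hypothetical helper: non-interactive version of the admin setup in step 5.2.
# All credential values below are placeholders; choose your own.
create_superset_admin() {
  local namespace="${1:-pinot-quickstart}"
  # Wait until the Superset pod reports Ready before exec-ing into it.
  kubectl wait --for=condition=Ready pod/superset-0 -n "$namespace" --timeout=300s || return 1
  # create-admin skips each prompt whose option is supplied on the command line.
  kubectl exec pod/superset-0 -n "$namespace" -- bash -c \
    "flask fab create-admin --username admin --firstname Admin --lastname User --email admin@example.com --password 'change-me'"
}
```

Usage: `create_superset_admin`. Use these credentials when logging in to the Superset UI later.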
kubectl exec -it pod/superset-0 -n pinot-quickstart -- bash -c 'flask fab create-admin'
5.3 (First time) Init Superset
kubectl exec -it pod/superset-0 -n pinot-quickstart -- bash -c 'superset db upgrade'
kubectl exec -it pod/superset-0 -n pinot-quickstart -- bash -c 'superset init'
5.4 Load Demo data source
kubectl exec -it pod/superset-0 -n pinot-quickstart -- bash -c 'superset import_datasources -p /etc/superset/pinot_example_datasource.yaml'
kubectl exec -it pod/superset-0 -n pinot-quickstart -- bash -c 'superset import_dashboards -p /etc/superset/pinot_example_dashboard.json'
5.5 Access Superset UI
Run the command below to open Superset in your browser, then log in with the admin credentials created in step 5.2.
./open-superset-ui.sh
To open the imported dashboard, click the Dashboards banner and then click AirlineStats.
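If you prefer not to use `open-superset-ui.sh`, a manual port-forward to the Superset pod is one alternative. This sketch assumes Superset listens on its default port 8088 inside `superset-0`, which this guide does not state; check `superset.yaml` if the UI does not load. The function name `forward_superset` is hypothetical:

```shell
# Hypothetical helper: forward a local port to the Superset pod.
# ASSUMPTION: Superset listens on its default port 8088 inside superset-0.
forward_superset() {
  local namespace="${1:-pinot-quickstart}"
  local local_port="${2:-8088}"
  echo "Superset UI: http://localhost:${local_port} (Ctrl-C to stop)"
  # Runs in the foreground until interrupted.
  kubectl port-forward pod/superset-0 "${local_port}:8088" -n "$namespace"
}
```

Usage: `forward_superset`, then browse to http://localhost:8088 and log in with the admin account from step 5.2.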
6. Access Pinot using Presto
6.1 Deploy Presto using Pinot plugin
Run the command below to deploy a customized Presto with the Pinot plugin installed.
helm install presto pinot/presto -n pinot-quickstart
6.2 Query Presto using Presto CLI
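Presto may take a few minutes to start. Because this guide does not list the Presto pod names, the sketch below polls `kubectl get pods` for any pod whose name contains `presto` and returns once all of them are Running with every container ready. Pass the namespace Presto was installed into as the first argument. The function name `wait_for_presto`, the 10-second interval, and the 30-attempt limit are illustrative assumptions:

```shell
# Hypothetical helper: wait until all pods whose names contain "presto" are fully ready.
wait_for_presto() {
  local namespace="${1:-pinot-quickstart}"
  local attempt=0 pods total ready
  while [ "$attempt" -lt 30 ]; do
    # Columns of `kubectl get pods --no-headers`: NAME READY STATUS RESTARTS AGE.
    pods="$(kubectl get pods -n "$namespace" --no-headers)"
    # Count all presto pods.
    total="$(printf '%s\n' "$pods" | awk '$1 ~ /presto/ {n++} END {print n+0}')"
    # Count presto pods whose READY column is "n/n" and whose STATUS is Running.
    ready="$(printf '%s\n' "$pods" | awk '$1 ~ /presto/ {split($2, r, "/"); if (r[1] == r[2] && $3 == "Running") n++} END {print n+0}')"
    if [ "$total" -gt 0 ] && [ "$total" -eq "$ready" ]; then
      echo "Presto is ready ($ready pod(s))"
      return 0
    fi
    attempt=$((attempt + 1))
    sleep 10
  done
  echo "timed out waiting for Presto pods" >&2
  return 1
}
```

Usage: `wait_for_presto pinot-quickstart && ./pinot-presto-cli.sh`.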
Once Presto is deployed, you can run the command below.
./pinot-presto-cli.sh
6.3 Sample queries to execute
- List all catalogs 
presto:default> show catalogs;
 Catalog
---------
 pinot
 system
(2 rows)
Query 20191112_050827_00003_xkm4g, FINISHED, 1 node
Splits: 19 total, 19 done (100.00%)
0:01 [0 rows, 0B] [0 rows/s, 0B/s]
- List all tables
presto:default> show tables;
    Table
--------------
 airlinestats
(1 row)
Query 20191112_050907_00004_xkm4g, FINISHED, 1 node
Splits: 19 total, 19 done (100.00%)
0:01 [1 rows, 29B] [1 rows/s, 41B/s]
- Show schema
presto:default> DESCRIBE pinot.dontcare.airlinestats;
        Column        |  Type   | Extra | Comment
----------------------+---------+-------+---------
 flightnum            | integer |       |
 origin               | varchar |       |
 quarter              | integer |       |
 lateaircraftdelay    | integer |       |
 divactualelapsedtime | integer |       |
......
Query 20191112_051021_00005_xkm4g, FINISHED, 1 node
Splits: 19 total, 19 done (100.00%)
0:02 [80 rows, 6.06KB] [35 rows/s, 2.66KB/s]
- Count total documents
presto:default> select count(*) as cnt from pinot.dontcare.airlinestats limit 10;
 cnt
------
 9745
(1 row)
Query 20191112_051114_00006_xkm4g, FINISHED, 1 node
Splits: 17 total, 17 done (100.00%)
0:00 [1 rows, 8B] [2 rows/s, 19B/s]
7. Deleting the Pinot cluster in Kubernetes
kubectl delete ns pinot-quickstart
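Deleting the namespace removes the resources created in it. If you also want to remove the Helm chart repositories added in steps 2.1 and 3.1, a possible teardown sketch (the function name `teardown_quickstart` is hypothetical) is:

```shell
# Hypothetical helper: full teardown of the quickstart.
teardown_quickstart() {
  local namespace="${1:-pinot-quickstart}"
  # Delete the namespace and everything in it; kubectl waits for deletion by default.
  kubectl delete ns "$namespace"
  # Remove the Helm chart repositories added earlier in this guide.
  helm repo remove pinot
  helm repo remove incubator
}
```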

