Get started running Pinot in Kubernetes.
Note: The examples in this guide are sample configurations intended as a reference. For a production setup, customize them to your needs.
Prerequisites
Kubernetes
This guide assumes that you already have a running Kubernetes cluster.
If you haven't yet set up a Kubernetes cluster, see the links below for instructions:
Pinot
Make sure that you've downloaded Apache Pinot. The scripts for the setup in this guide can be found in our open source project on GitHub.
Git clone project source
# checkout pinot
git clone https://github.com/apache/pinot.git
cd pinot/helm/pinot
Set up a Pinot cluster in Kubernetes
Start Pinot with Helm
Run Helm with pre-installed package
The Pinot repository has pre-packaged Helm charts for Pinot and Presto. The Helm repository index file is here.
helm repo add pinot https://raw.githubusercontent.com/apache/pinot/master/helm
kubectl create ns pinot-quickstart
helm install pinot pinot/pinot \
    -n pinot-quickstart \
    --set cluster.name=pinot \
    --set server.replicaCount=2
Note: Specify StorageClass based on your cloud vendor. Don't mount a blob store (such as AzureFile, GoogleCloudStorage, or S3) as the data serving file system. Use only Amazon EBS/GCP Persistent Disk/Azure Disk-style disks.
For GCP: "pd-ssd" or "standard"
For Docker-Desktop: "hostpath"
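As a rough sketch of how a StorageClass might be passed to the chart, the helper below only prints the extra --set flags. The value keys controller.persistence.storageClass and server.persistence.storageClass are assumptions about the chart's values layout, and pinot_storage_flags is a hypothetical helper name. Confirm the keys with `helm inspect values pinot/pinot` before relying on them.

```shell
# Hypothetical helper: print the --set flags that select a StorageClass.
# ASSUMPTION: the chart exposes controller.persistence.storageClass and
# server.persistence.storageClass; check `helm inspect values pinot/pinot`.
pinot_storage_flags() {
  storage_class="$1"
  printf '%s %s\n' \
    "--set controller.persistence.storageClass=${storage_class}" \
    "--set server.persistence.storageClass=${storage_class}"
}
```

List the classes available in your cluster with `kubectl get storageclass`, then append the helper's output to the install command, for example `helm install pinot pinot/pinot -n pinot-quickstart $(pinot_storage_flags hostpath)`.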
Run Helm script within Git repo
1.1.1 Update Helm dependency
helm dependency update
1.1.2 Start Pinot with Helm
For Helm v2.12.1:
If your Kubernetes cluster is recently provisioned, ensure Helm is initialized by running:
helm init --service-account tiller
Then deploy a new HA Pinot cluster using the following command:
helm install --namespace "pinot-quickstart" --name "pinot" pinot
For Helm v3.0.0:
kubectl create ns pinot-quickstart
helm install -n pinot-quickstart pinot ./pinot
1.1.3 Troubleshooting (for Helm v2.12.1)
If you see the error below:
Error: could not find tiller.
Run the following:
kubectl -n kube-system delete deployment tiller-deploy
kubectl -n kube-system delete service/tiller-deploy
helm init --service-account tiller
If you encounter a permission issue, like the following:
Error: release pinot failed: namespaces "pinot-quickstart" is forbidden: User "system:serviceaccount:kube-system:default" cannot get resource "namespaces" in API group "" in the namespace "pinot-quickstart"
Run the command below:
kubectl apply -f helm-rbac.yaml
Check Pinot deployment status
kubectl get all -n pinot-quickstart
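Rather than re-running the status command by hand, one option is to block until the pods report Ready. Here is a minimal sketch using `kubectl wait`. The function name wait_for_pods and the default timeout are illustrative choices, not part of the Pinot scripts.

```shell
# Block until every pod in a namespace is Ready, or the timeout expires.
# wait_for_pods is a hypothetical helper; 300s is an arbitrary default.
wait_for_pods() {
  namespace="$1"
  timeout="${2:-300s}"
  kubectl wait --for=condition=Ready pod --all \
    -n "$namespace" --timeout="$timeout"
}
```

For example, `wait_for_pods pinot-quickstart 600s`. Pods that belong to completed Jobs never become Ready, so this check is most useful right after installation, before any data-loading jobs have run in the namespace.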
Load data into Pinot using Kafka
Bring up a Kafka cluster for real-time data ingestion
helm repo add kafka https://charts.bitnami.com/bitnami
helm install -n pinot-quickstart kafka kafka/kafka --set replicas=1,zookeeper.image.tag=latest
Check Kafka deployment status
Ensure the Kafka deployment is ready before executing the scripts in the following steps. Run the following command:
kubectl get all -n pinot-quickstart | grep kafka
Below is an example output showing the deployment is ready:
pod/kafka-0                 1/1     Running   0          2m
pod/kafka-zookeeper-0       1/1     Running   0          10m
pod/kafka-zookeeper-1       1/1     Running   0          9m
pod/kafka-zookeeper-2       1/1     Running   0          8m
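The readiness check above can also be scripted. The sketch below reads `kubectl get` output on standard input and succeeds only if every pod/ line shows all containers ready and a Running status. pods_ready is a hypothetical helper, and it assumes the default column layout (NAME, READY, STATUS, RESTARTS, AGE) seen in the example output.

```shell
# Succeed only when every pod/ line on stdin is fully ready and Running.
# ASSUMPTION: default `kubectl get` columns: NAME READY STATUS RESTARTS AGE.
pods_ready() {
  awk '
    $1 ~ /^pod\// {
      seen = 1
      split($2, ready, "/")          # READY column looks like "1/1"
      if (ready[1] != ready[2] || $3 != "Running") bad = 1
    }
    END { exit (seen && !bad) ? 0 : 1 }
  '
}
```

A possible use is `kubectl get all -n pinot-quickstart | grep kafka | pods_ready && echo "Kafka is ready"`. The helper also fails when no pod lines are seen at all, so an empty result is not mistaken for a ready deployment.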
Create Kafka topics
Run the scripts below to create two Kafka topics for data ingestion:
kubectl -n pinot-quickstart exec kafka-0 -- kafka-topics.sh --bootstrap-server kafka-0:9092 --topic flights-realtime --create --partitions 1 --replication-factor 1
kubectl -n pinot-quickstart exec kafka-0 -- kafka-topics.sh --bootstrap-server kafka-0:9092 --topic flights-realtime-avro --create --partitions 1 --replication-factor 1
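If you re-run this step, creating a topic that already exists returns an error. A sketch of a repeatable variant follows. It relies on the --if-not-exists option of kafka-topics.sh (worth confirming against the Kafka version in your image), and create_topics is a hypothetical wrapper around the same command used above.

```shell
# Create each named topic with 1 partition and replication factor 1,
# skipping topics that already exist. Stops at the first failure.
create_topics() {
  for topic in "$@"; do
    kubectl -n pinot-quickstart exec kafka-0 -- kafka-topics.sh \
      --bootstrap-server kafka-0:9092 --create --if-not-exists \
      --topic "$topic" --partitions 1 --replication-factor 1 || return 1
  done
}
```

For this guide the call would be `create_topics flights-realtime flights-realtime-avro`.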
Load data into Kafka and create Pinot schema/tables
The script below does the following:
Ingests 19492 JSON messages to Kafka topic flights-realtime at a speed of 1 msg/sec
Ingests 19492 Avro messages to Kafka topic flights-realtime-avro at a speed of 1 msg/sec
Uploads Pinot schema airlineStats
Creates Pinot table airlineStats to ingest data from JSON encoded Kafka topic flights-realtime
Creates Pinot table airlineStatsAvro to ingest data from Avro encoded Kafka topic flights-realtime-avro
kubectl apply -f pinot/pinot-realtime-quickstart.yml
Query with the Pinot Data Explorer
Pinot Data Explorer
The script below, located at ./pinot/helm/pinot, performs local port forwarding and opens the Pinot query console in your default web browser.
./query-pinot-data.sh
Query Pinot with Superset
Bring up Superset using Helm
Add the Superset Helm repository:
helm repo add superset https://apache.github.io/superset
Get the Helm values configuration file:
helm inspect values superset/superset > /tmp/superset-values.yaml
For Superset to install Pinot dependencies, edit the /tmp/superset-values.yaml file to add a pinotdb pip dependency to the bootstrapScript field.
You can also build your own image with this dependency or use the image apachepinot/pinot-superset:latest instead.
Replace the default admin credentials inside the init section with a meaningful user profile and a stronger password.
Install Superset using Helm:
kubectl create ns superset
helm upgrade --install --values /tmp/superset-values.yaml superset superset/superset -n superset
Ensure your cluster is up by running:
kubectl get all -n superset
Access the Superset UI
Run the command below to port forward Superset to your localhost:18088.
kubectl port-forward service/superset 18088:8088 -n superset
Navigate to Superset in your browser with the admin credentials you set in the previous section.
Create a new database connection with the following URI: pinot+http://pinot-broker.pinot-quickstart:8099/query?controller=http://pinot-controller.pinot-quickstart:9000/
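The URI combines the broker and controller service addresses, each written as service.namespace:port. If Pinot runs in a different namespace, the URI changes accordingly. A small sketch follows. The helper name pinot_superset_uri is hypothetical, while the service names and ports are the ones used in this guide.

```shell
# Print the Superset database URI for a Pinot cluster in the given namespace.
# Defaults to the pinot-quickstart namespace used throughout this guide.
pinot_superset_uri() {
  namespace="${1:-pinot-quickstart}"
  printf 'pinot+http://pinot-broker.%s:8099/query?controller=http://pinot-controller.%s:9000/\n' \
    "$namespace" "$namespace"
}
```

Calling `pinot_superset_uri` with no argument prints the URI shown above. Pass another namespace name if your deployment differs.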
Once the database is added, you can add more data sets and explore the dashboard options.
Access Pinot with Trino
Deploy Trino
Deploy Trino with the Pinot plugin installed. First, add the Trino Helm repository:
helm repo add trino https://trinodb.github.io/charts/
See the charts in the Trino Helm chart repository:
helm search repo trino
To connect Trino to Pinot, you'll need to add the Pinot catalog, which requires extra configuration. Run the command below to get all the configurable values.
helm inspect values trino/trino > /tmp/trino-values.yaml
To add the Pinot catalog, edit the additionalCatalogs section by adding:
additionalCatalogs:
  pinot: |
    connector.name=pinot
    pinot.controller-urls=pinot-controller.pinot-quickstart:9000
Pinot is deployed in the namespace pinot-quickstart, so the controller service URL is pinot-controller.pinot-quickstart:9000.
After modifying the /tmp/trino-values.yaml file, deploy Trino with:
kubectl create ns trino-quickstart
helm install my-trino trino/trino --version 0.2.0 -n trino-quickstart --values /tmp/trino-values.yaml
Once you've deployed Trino, check the deployment status:
kubectl get pods -n trino-quickstart
Query Pinot with the Trino CLI
Once Trino is deployed, run the command below to download an executable Trino CLI.
curl -L https://repo1.maven.org/maven2/io/trino/trino-cli/363/trino-cli-363-executable.jar -o /tmp/trino && chmod +x /tmp/trino
Port forward the Trino service to your local machine if it's not already exposed:
echo "Visit http://127.0.0.1:18080 to use your application"
kubectl port-forward service/my-trino 18080:8080 -n trino-quickstart
Use the Trino console client to connect to the Trino service:
/tmp/trino --server localhost:18080 --catalog pinot --schema default
Query Pinot data using the Trino CLI, as in the sample queries below.
Sample queries to execute
List all catalogs
trino:default> show catalogs;
Catalog
---------
pinot
system
tpcds
tpch
(4 rows)
Query 20211025_010256_00002_mxcvx, FINISHED, 2 nodes
Splits: 36 total, 36 done (100.00%)
0.70 [0 rows, 0B] [0 rows/s, 0B/s]
List all tables
trino:default> show tables;
Table
--------------
airlinestats
(1 row)
Query 20211025_010326_00003_mxcvx, FINISHED, 3 nodes
Splits: 36 total, 36 done (100.00%)
0.28 [1 rows, 29B] [3 rows/s, 104B/s]
Show schema
trino:default> DESCRIBE airlinestats;
        Column        |      Type      | Extra | Comment
----------------------+----------------+-------+---------
 flightnum            | integer        |       |
 origin               | varchar        |       |
 quarter              | integer        |       |
 lateaircraftdelay    | integer        |       |
 divactualelapsedtime | integer        |       |
 divwheelsons         | array(integer) |       |
 divwheelsoffs        | array(integer) |       |
 ......
Query 20211025_010414_00006_mxcvx, FINISHED, 3 nodes
Splits: 36 total, 36 done (100.00%)
0.37 [79 rows, 5.96KB] [212 rows/s, 16KB/s]
Count total documents
trino:default> select count(*) as cnt from airlinestats limit 10;
cnt
------
9746
(1 row)
Query 20211025_015607_00009_mxcvx, FINISHED, 2 nodes
Splits: 17 total, 17 done (100.00%)
0.24 [1 rows, 9B] [4 rows/s, 38B/s]
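The sample queries above can also be run without the interactive prompt, which is handy for scripted checks. Here is a sketch using the Trino CLI's --execute option. trino_query and the TRINO_CLI variable are illustrative names, and the port forward from the earlier step is assumed to still be running.

```shell
# Path to the CLI downloaded earlier; override if you stored it elsewhere.
TRINO_CLI="${TRINO_CLI:-/tmp/trino}"

# Run a single SQL statement against the pinot catalog and print the result.
trino_query() {
  "$TRINO_CLI" --server localhost:18080 --catalog pinot --schema default \
    --execute "$1"
}
```

For example, `trino_query 'select count(*) as cnt from airlinestats'`. The count you get back depends on how many messages have been ingested so far.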
Access Pinot with Presto
Deploy Presto with the Pinot plugin
First, deploy Presto with default configurations:
Helm
helm install presto pinot/presto -n pinot-quickstart
K8s Scripts
kubectl apply -f presto-coordinator.yaml
To customize your deployment, run the command below to get all the configurable values.
helm inspect values pinot/presto > /tmp/presto-values.yaml
After modifying the /tmp/presto-values.yaml file, deploy Presto:
helm install presto pinot/presto -n pinot-quickstart --values /tmp/presto-values.yaml
Once you've deployed the Presto instance, check the deployment status:
kubectl get pods -n pinot-quickstart
Query Presto using the Presto CLI
Once Presto is deployed, you can run the command below from here, or follow the steps below.
./pinot-presto-cli.sh
curl -L https://repo1.maven.org/maven2/com/facebook/presto/presto-cli/0.246/presto-cli-0.246-executable.jar -o /tmp/presto-cli && chmod +x /tmp/presto-cli
Port forward presto-coordinator port 8080 to localhost port 18080:
kubectl port-forward service/presto-coordinator 18080:8080 -n pinot-quickstart > /dev/null &
Start the Presto CLI with the Pinot catalog:
/tmp/presto-cli --server localhost:18080 --catalog pinot --schema default
Query Pinot data with the Presto CLI, as in the sample queries below.
Sample queries to execute
List all catalogs
presto:default> show catalogs;
Catalog
---------
pinot
system
(2 rows)
Query 20191112_050827_00003_xkm4g, FINISHED, 1 node
Splits: 19 total, 19 done (100.00%)
0:01 [0 rows, 0B] [0 rows/s, 0B/s]
List all tables
presto:default> show tables;
Table
--------------
airlinestats
(1 row)
Query 20191112_050907_00004_xkm4g, FINISHED, 1 node
Splits: 19 total, 19 done (100.00%)
0:01 [1 rows, 29B] [1 rows/s, 41B/s]
Show schema
presto:default> DESCRIBE pinot.dontcare.airlinestats;
        Column        |  Type   | Extra | Comment
----------------------+---------+-------+---------
 flightnum            | integer |       |
 origin               | varchar |       |
 quarter              | integer |       |
 lateaircraftdelay    | integer |       |
 divactualelapsedtime | integer |       |
 ......
Query 20191112_051021_00005_xkm4g, FINISHED, 1 node
Splits: 19 total, 19 done (100.00%)
0:02 [80 rows, 6.06KB] [35 rows/s, 2.66KB/s]
Count total documents
presto:default> select count(*) as cnt from pinot.dontcare.airlinestats limit 10;
cnt
------
9745
(1 row)
Query 20191112_051114_00006_xkm4g, FINISHED, 1 node
Splits: 17 total, 17 done (100.00%)
0:00 [1 rows, 8B] [2 rows/s, 19B/s]
Delete a Pinot cluster in Kubernetes
To delete your Pinot cluster in Kubernetes, run the following command:
kubectl delete ns pinot-quickstart
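Deleting pinot-quickstart removes everything this guide installed into that namespace (Pinot, Kafka, and Presto), but not the separate superset and trino-quickstart namespaces created earlier. If you deployed those as well, a fuller cleanup could look like the sketch below. delete_quickstart_namespaces is a hypothetical helper, and --ignore-not-found keeps it from failing on namespaces you never created.

```shell
# Delete every namespace created in this guide, skipping any that are absent.
delete_quickstart_namespaces() {
  for namespace in pinot-quickstart superset trino-quickstart; do
    kubectl delete ns "$namespace" --ignore-not-found || return 1
  done
}
```

Deleting a namespace removes all resources inside it, so double-check that nothing else you care about lives in these namespaces before calling `delete_quickstart_namespaces`.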