Running in Kubernetes
Pinot quick start in Kubernetes
1. Prerequisites
This quickstart assumes that you already have a running Kubernetes cluster. Please follow the links below to set up a Kubernetes cluster.
Install Minikube for local setup (make sure to run with enough resources, e.g. minikube start --vm=true --cpus=4 --memory=8g --disk-size=50g)
2. Setting up a Pinot cluster in Kubernetes
Before continuing, make sure that you've downloaded Apache Pinot. The scripts for the setup in this guide are part of our open source project on GitHub and can be found in the Pinot source at ./pinot/kubernetes/helm
2.1 Start Pinot with Helm
The Pinot repo has pre-packaged Helm charts for Pinot and Presto. Helm Repo index file is here.
NOTE: Please specify the StorageClass based on your cloud vendor. For Pinot Server, don't mount a blob store such as AzureFile/GoogleCloudStorage/S3 as the data serving file system. Only use Amazon EBS/GCP Persistent Disk/Azure Disk style disks.
For AWS: "gp2"
For GCP: "pd-ssd" or "standard"
For Azure: "AzureDisk"
For Docker-Desktop: "hostpath"
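As a minimal sketch, the Helm installation with an explicit StorageClass could look like the function below. The repo alias pinot, the release name pinot, and the controller.persistence.storageClass / server.persistence.storageClass value keys are assumptions to verify against the chart's values file. PINOT_HELM_REPO_URL is a hypothetical variable that you set to the Helm repo URL behind the index file mentioned above; pinot-quickstart is the namespace used later in this guide.

```shell
# Sketch only: repo alias, release name and --set keys are assumptions;
# check them against the chart's values.yaml before relying on them.
PINOT_NAMESPACE="${PINOT_NAMESPACE:-pinot-quickstart}"
STORAGE_CLASS="${STORAGE_CLASS:-gp2}"   # gp2 (AWS), pd-ssd (GCP), hostpath (Docker-Desktop), ...

install_pinot() {
  # PINOT_HELM_REPO_URL (hypothetical variable) must point at the Pinot Helm repo.
  if [ -z "${PINOT_HELM_REPO_URL:-}" ]; then
    echo "set PINOT_HELM_REPO_URL to the Pinot Helm repo URL" >&2
    return 1
  fi
  helm repo add pinot "$PINOT_HELM_REPO_URL" &&
    kubectl create ns "$PINOT_NAMESPACE" &&
    helm install pinot pinot/pinot \
      -n "$PINOT_NAMESPACE" \
      --set controller.persistence.storageClass="$STORAGE_CLASS" \
      --set server.persistence.storageClass="$STORAGE_CLASS"
}
```

After sourcing the snippet, you would run something like STORAGE_CLASS=pd-ssd install_pinot with PINOT_HELM_REPO_URL exported.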
2.2 Check Pinot deployment status
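One way to check the deployment is to list everything in the Pinot namespace. This sketch assumes Pinot was installed into pinot-quickstart, the namespace referenced later in this guide; pinot_status is a name made up for illustration.

```shell
# Assumes Pinot was installed into the pinot-quickstart namespace.
PINOT_NAMESPACE="${PINOT_NAMESPACE:-pinot-quickstart}"

pinot_status() {
  # Lists the pods, services and workload objects in the namespace.
  kubectl get all -n "$PINOT_NAMESPACE"
}
```

The deployment is usable once the controller, broker, server and ZooKeeper pods report a Running status.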
3. Load data into Pinot using Kafka
3.1 Bring up a Kafka cluster for real-time data ingestion
3.2 Check Kafka deployment status
Ensure the Kafka deployment is ready before executing the scripts in the next steps.
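A quick readiness check could filter the pod list for Kafka. This is a sketch that assumes Kafka was installed into the same pinot-quickstart namespace and that its pod names contain "kafka"; both depend on how you installed the chart.

```shell
# Assumption: Kafka runs in the Pinot namespace and its pods are named *kafka*.
PINOT_NAMESPACE="${PINOT_NAMESPACE:-pinot-quickstart}"

kafka_status() {
  # Show only the Kafka pods; look for READY n/n and STATUS Running.
  kubectl get pods -n "$PINOT_NAMESPACE" | grep kafka
}
```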
3.3 Create Kafka topics
The scripts below will create two Kafka topics for data ingestion:
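As a rough sketch, the two topics (flights-realtime and flights-realtime-avro, as used in the next step) could be created by running kafka-topics.sh inside a broker pod. The pod name kafka-0, the bootstrap address kafka-0:9092, the single partition and replica, and the presence of kafka-topics.sh on the container's PATH are all assumptions that depend on your Kafka chart.

```shell
# Sketch only: pod name, bootstrap address and topic sizing are assumptions.
PINOT_NAMESPACE="${PINOT_NAMESPACE:-pinot-quickstart}"
KAFKA_POD="${KAFKA_POD:-kafka-0}"
KAFKA_BOOTSTRAP="${KAFKA_BOOTSTRAP:-kafka-0:9092}"

create_kafka_topics() {
  # Create one topic for JSON messages and one for Avro messages.
  for topic in flights-realtime flights-realtime-avro; do
    kubectl -n "$PINOT_NAMESPACE" exec "$KAFKA_POD" -- \
      kafka-topics.sh --bootstrap-server "$KAFKA_BOOTSTRAP" \
      --create --topic "$topic" --partitions 1 --replication-factor 1 || return 1
  done
}
```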
3.4 Load data into Kafka and create Pinot schema/tables
The script below will deploy 3 batch jobs.
Ingest 19492 JSON messages to Kafka topic flights-realtime at a speed of 1 msg/sec
Ingest 19492 Avro messages to Kafka topic flights-realtime-avro at a speed of 1 msg/sec
Upload Pinot schema airlineStats
Create Pinot table airlineStats to ingest data from JSON encoded Kafka topic flights-realtime
Create Pinot table airlineStatsAvro to ingest data from Avro encoded Kafka topic flights-realtime-avro
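A hedged sketch of how these jobs could be deployed and monitored: apply the batch-job manifest that ships with the Helm scripts, then list the resulting jobs. The manifest path is passed in as an argument because its exact name is not given here; load_quickstart_data is a made-up helper name.

```shell
# Sketch only: the manifest path is supplied by the caller, not assumed.
PINOT_NAMESPACE="${PINOT_NAMESPACE:-pinot-quickstart}"

load_quickstart_data() {
  manifest="${1:-}"
  if [ -z "$manifest" ]; then
    echo "usage: load_quickstart_data <manifest.yml>" >&2
    return 1
  fi
  # Deploy the batch jobs, then show their completion status.
  kubectl apply -f "$manifest" &&
    kubectl get jobs -n "$PINOT_NAMESPACE"
}
```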
4. Query using Pinot Data Explorer
4.1 Pinot Data Explorer
Use the script below to set up local port-forwarding; it also opens the Pinot query console in your default web browser.
This script can be found in the Pinot source at ./pinot/kubernetes/helm/pinot
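If you prefer to do the port-forwarding by hand, a minimal equivalent might look like this. It assumes the controller is exposed as service pinot-controller on port 9000 in the pinot-quickstart namespace (the address used for the controller elsewhere in this guide); the local port is an arbitrary choice.

```shell
# Assumes service/pinot-controller listens on 9000 in pinot-quickstart.
PINOT_NAMESPACE="${PINOT_NAMESPACE:-pinot-quickstart}"
LOCAL_PORT="${LOCAL_PORT:-9000}"

forward_pinot_controller() {
  echo "Open http://localhost:${LOCAL_PORT} in your browser (Ctrl+C stops forwarding)"
  # Runs in the foreground until interrupted.
  kubectl port-forward service/pinot-controller "${LOCAL_PORT}:9000" -n "$PINOT_NAMESPACE"
}
```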
5. Using Superset to query Pinot
5.1 Bring up Superset using helm
Add the Superset Helm repo
Get Helm values config file:
Edit the /tmp/superset-values.yaml file and add the pinotdb pip dependency to the bootstrapScript field, so that Superset installs the Pinot dependencies at bootstrap time.
You can also build your own image with this dependency, or just use image: apachepinot/pinot-superset:latest instead.
Also remember to change the admin credentials inside the init section to a meaningful user profile and a stronger password.
Install Superset using helm
Ensure your cluster is up by running:
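The Superset steps above can be sketched as two shell functions: one that adds the repo and dumps the default values to /tmp/superset-values.yaml (run it before editing the file), and one that installs the chart and lists the resulting resources. The superset namespace and release name are assumptions.

```shell
# Sketch only: namespace and release name "superset" are assumptions.
SUPERSET_NAMESPACE="${SUPERSET_NAMESPACE:-superset}"
SUPERSET_VALUES="${SUPERSET_VALUES:-/tmp/superset-values.yaml}"

fetch_superset_values() {
  # Add the Superset chart repo and write its default values to a file.
  helm repo add superset https://apache.github.io/superset &&
    helm show values superset/superset > "$SUPERSET_VALUES"
}

install_superset() {
  # Install (or upgrade) with the edited values, then check the rollout.
  kubectl create ns "$SUPERSET_NAMESPACE" &&
    helm upgrade --install superset superset/superset \
      --values "$SUPERSET_VALUES" -n "$SUPERSET_NAMESPACE" &&
    kubectl get all -n "$SUPERSET_NAMESPACE"
}
```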
5.2 Access Superset UI
You can run the command below to port-forward Superset to localhost:18088. You can then open Superset in your browser and log in with the admin credentials you set earlier.
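A minimal sketch of the port-forward, assuming the chart created a service named superset on port 8088 in a superset namespace (check kubectl get svc if your names differ):

```shell
# Assumption: service/superset exposes port 8088 in the superset namespace.
SUPERSET_NAMESPACE="${SUPERSET_NAMESPACE:-superset}"

forward_superset() {
  echo "Superset UI: http://localhost:18088 (Ctrl+C stops forwarding)"
  kubectl port-forward service/superset 18088:8088 -n "$SUPERSET_NAMESPACE"
}
```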
Create Pinot Database using URI:
pinot+http://pinot-broker.pinot-quickstart:8099/query?controller=http://pinot-controller.pinot-quickstart:9000/
Once the database is added, you can add datasets and start building dashboards.
6. Access Pinot using Trino
6.1 Deploy Trino
You can run the command below to deploy Trino with the Pinot plugin installed.
The above command adds the Trino Helm chart repo. You can then run the command below to see the charts.
To connect Trino to Pinot, we need to add a Pinot catalog, which requires extra configuration. You can run the command below to get all the configurable values.
To add the Pinot catalog, edit the additionalCatalogs section by adding:
Pinot is deployed in namespace pinot-quickstart, so the controller serviceURL is pinot-controller.pinot-quickstart:9000
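For illustration, the catalog entry could look like the fragment printed by the function below. It assumes the chart accepts catalog property files under additionalCatalogs and uses the Trino Pinot connector properties connector.name and pinot.controller-urls; print_pinot_catalog is a made-up helper, and the output is meant to be merged into /tmp/trino-values.yaml by hand.

```shell
# Sketch only: prints a YAML fragment to merge into the Trino values file.
PINOT_CONTROLLER_URL="${PINOT_CONTROLLER_URL:-pinot-controller.pinot-quickstart:9000}"

print_pinot_catalog() {
  cat <<EOF
additionalCatalogs:
  pinot: |
    connector.name=pinot
    pinot.controller-urls=${PINOT_CONTROLLER_URL}
EOF
}
```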
After modifying the /tmp/trino-values.yaml file, you can deploy Trino with:
Once Trino is deployed, you can check its deployment status with:
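The Helm steps in this section can be sketched as two shell functions: the first adds the Trino chart repo, lists its charts and dumps the default values to /tmp/trino-values.yaml (run it before editing the file); the second installs Trino and checks the pods. The namespace trino-quickstart and the release name my-trino are assumptions.

```shell
# Sketch only: namespace and release name are assumptions.
TRINO_NAMESPACE="${TRINO_NAMESPACE:-trino-quickstart}"
TRINO_RELEASE="${TRINO_RELEASE:-my-trino}"
TRINO_VALUES="${TRINO_VALUES:-/tmp/trino-values.yaml}"

fetch_trino_values() {
  # Add the repo, show the available charts, save the configurable values.
  helm repo add trino https://trinodb.github.io/charts/ &&
    helm search repo trino &&
    helm show values trino/trino > "$TRINO_VALUES"
}

install_trino() {
  # Deploy with the edited values file, then check the pod status.
  kubectl create ns "$TRINO_NAMESPACE" &&
    helm install "$TRINO_RELEASE" trino/trino \
      -n "$TRINO_NAMESPACE" --values "$TRINO_VALUES" &&
    kubectl get pods -n "$TRINO_NAMESPACE"
}
```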
6.2 Query Trino using Trino CLI
Once Trino is deployed, you can run the below command to get a runnable Trino CLI.
6.2.1 Download Trino CLI
6.2.2 Port forward Trino service to your local machine if it's not already exposed
6.2.3 Use Trino console client to connect to Trino service
6.2.4 Query Pinot data using Trino CLI
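Steps 6.2.1 to 6.2.3 could be sketched as follows. The CLI version 363, the download location /tmp/trino, the service name my-trino, the trino-quickstart namespace and the local port 18080 are assumptions; adjust them to match your deployment.

```shell
# Sketch only: version, paths, service name and ports are assumptions.
TRINO_CLI_VERSION="${TRINO_CLI_VERSION:-363}"
TRINO_CLI="${TRINO_CLI:-/tmp/trino}"
TRINO_RELEASE="${TRINO_RELEASE:-my-trino}"
TRINO_NAMESPACE="${TRINO_NAMESPACE:-trino-quickstart}"

download_trino_cli() {
  # Fetch the self-executing CLI jar from Maven Central and make it runnable.
  curl -fL "https://repo1.maven.org/maven2/io/trino/trino-cli/${TRINO_CLI_VERSION}/trino-cli-${TRINO_CLI_VERSION}-executable.jar" -o "$TRINO_CLI" &&
    chmod +x "$TRINO_CLI"
}

forward_trino() {
  # Run this in a separate terminal; it blocks until interrupted.
  kubectl port-forward "service/${TRINO_RELEASE}" 18080:8080 -n "$TRINO_NAMESPACE"
}

connect_trino() {
  # Open an interactive session against the pinot catalog.
  "$TRINO_CLI" --server localhost:18080 --catalog pinot --schema default
}
```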
6.3 Sample queries to execute
List all catalogs
List all tables
Show schema
Count total documents
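The four sample queries might be run non-interactively as in this sketch, which uses the CLI's --execute option. It assumes the CLI was downloaded to /tmp/trino, that Trino is reachable on localhost:18080, and that the airlineStats table appears in Trino under the lowercase name airlinestats.

```shell
# Sketch only: CLI path, server address and table name are assumptions.
TRINO_CLI="${TRINO_CLI:-/tmp/trino}"

trino_sample_queries() {
  # List catalogs, list tables, show the schema, count documents.
  for query in \
    "SHOW CATALOGS" \
    "SHOW TABLES" \
    "DESCRIBE airlinestats" \
    "SELECT count(*) AS cnt FROM airlinestats"
  do
    "$TRINO_CLI" --server localhost:18080 --catalog pinot --schema default \
      --execute "$query" || return 1
  done
}
```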
7. Access Pinot using Presto
7.1 Deploy Presto using Pinot plugin
You can run the command below to deploy a customized Presto with the Pinot plugin installed.
The above command deploys Presto with default configs. For customizing your deployment, you can run the below command to get all the configurable values.
After modifying the /tmp/presto-values.yaml file, you can deploy Presto with:
Once Presto is deployed, you can check its deployment status with:
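A hedged sketch of the Presto steps above: dump the chart's configurable values, then install with the edited file and check the pods. It assumes the Pinot Helm repo was already added under the alias pinot, that the chart is named presto, and that Presto goes into the pinot-quickstart namespace under the release name presto.

```shell
# Sketch only: repo alias, chart name and release name are assumptions.
PINOT_NAMESPACE="${PINOT_NAMESPACE:-pinot-quickstart}"
PRESTO_VALUES="${PRESTO_VALUES:-/tmp/presto-values.yaml}"

fetch_presto_values() {
  # Save the configurable values so they can be edited.
  helm show values pinot/presto > "$PRESTO_VALUES"
}

install_presto() {
  # Deploy with the edited values file, then check the pod status.
  helm install presto pinot/presto \
    -n "$PINOT_NAMESPACE" --values "$PRESTO_VALUES" &&
    kubectl get pods -n "$PINOT_NAMESPACE"
}
```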
7.2 Query Presto using Presto CLI
Once Presto is deployed, you can run the command below from here, or just follow steps 7.2.1 to 7.2.3.
7.2.1 Download Presto CLI
7.2.2 Port forward presto-coordinator port 8080 to localhost port 18080
7.2.3 Start Presto CLI with the pinot catalog
7.2.4 Query Pinot data using Presto CLI
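The manual steps could look like the sketch below. The service name presto-coordinator and the 8080 to 18080 port mapping come from the steps above; the CLI version 0.246, the download location /tmp/presto-cli and the pinot-quickstart namespace are assumptions.

```shell
# Sketch only: CLI version, download path and namespace are assumptions.
PRESTO_CLI_VERSION="${PRESTO_CLI_VERSION:-0.246}"
PRESTO_CLI="${PRESTO_CLI:-/tmp/presto-cli}"
PINOT_NAMESPACE="${PINOT_NAMESPACE:-pinot-quickstart}"

download_presto_cli() {
  # Fetch the self-executing CLI jar from Maven Central and make it runnable.
  curl -fL "https://repo1.maven.org/maven2/com/facebook/presto/presto-cli/${PRESTO_CLI_VERSION}/presto-cli-${PRESTO_CLI_VERSION}-executable.jar" -o "$PRESTO_CLI" &&
    chmod +x "$PRESTO_CLI"
}

forward_presto() {
  # Run this in a separate terminal; it blocks until interrupted.
  kubectl port-forward service/presto-coordinator 18080:8080 -n "$PINOT_NAMESPACE"
}

connect_presto() {
  # Open an interactive session against the pinot catalog.
  "$PRESTO_CLI" --server localhost:18080 --catalog pinot --schema default
}
```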
7.3 Sample queries to execute
List all catalogs
List all tables
Show schema
Count total documents
8. Deleting the Pinot cluster in Kubernetes
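Deleting the namespace removes everything deployed into it. A minimal sketch, assuming the cluster lives in pinot-quickstart; note that anything else installed into the same namespace (for example Kafka or Presto) is deleted as well.

```shell
# Assumes the Pinot cluster was installed into pinot-quickstart.
PINOT_NAMESPACE="${PINOT_NAMESPACE:-pinot-quickstart}"

delete_pinot_cluster() {
  # Removes the namespace and every resource inside it.
  kubectl delete ns "$PINOT_NAMESPACE"
}
```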
Note: These are sample configs to be used as a reference. For a production setup, you may want to customize them to your needs.