Running in Kubernetes

Pinot quick start in Kubernetes

Get started running Pinot in Kubernetes.

Note: The examples in this guide are sample configurations to be used as reference. For production setup, you may want to customize it to your needs.

Prerequisites

Kubernetes

This guide assumes that you already have a running Kubernetes cluster.

If you haven't yet set up a Kubernetes cluster, see the links below for instructions:

Enable Kubernetes on Docker-Desktop
Install Minikube for local setup
- Make sure to run with enough resources: minikube start --vm=true --cpus=4 --memory=8g --disk-size=50g
Set up a Kubernetes Cluster using Amazon Elastic Kubernetes Service (Amazon EKS)
Set up a Kubernetes Cluster using Google Kubernetes Engine (GKE)
Set up a Kubernetes Cluster using Azure Kubernetes Service (AKS)

Pinot

Make sure that you've downloaded Apache Pinot. The scripts for the setup in this guide can be found in our open source project on GitHub.

# checkout pinot
git clone https://github.com/apache/pinot.git
cd pinot/helm/pinot

Set up a Pinot cluster in Kubernetes

Start Pinot with Helm

The Pinot repository has pre-packaged Helm charts for Pinot and Presto. The Helm repository index file is here.

helm repo add pinot https://raw.githubusercontent.com/apache/pinot/master/helm
kubectl create ns pinot-quickstart
helm install pinot pinot/pinot \
    -n pinot-quickstart \
    --set cluster.name=pinot \
    --set server.replicaCount=2

Note: Specify StorageClass based on your cloud vendor. Don't mount a blob store (such as AzureFile, GoogleCloudStorage, or S3) as the data serving file system. Use only Amazon EBS/GCP Persistent Disk/Azure Disk-style disks.

For AWS: "gp2"
For GCP: "pd-ssd" or "standard"
For Azure: "AzureDisk"
For Docker-Desktop: "hostpath"

1.1.1 Update Helm dependency

helm dependency update

1.1.2 Start Pinot with Helm

For Helm v2.12.1:

If your Kubernetes cluster is recently provisioned, ensure Helm is initialized by running:

helm init --service-account tiller

Then deploy a new HA Pinot cluster using the following command:

helm install --namespace "pinot-quickstart" --name "pinot" pinot

For Helm v3.0.0:

kubectl create ns pinot-quickstart
helm install -n pinot-quickstart pinot ./pinot

1.1.3 Troubleshooting (For helm v2.12.1)

If you see the error below:

Error: could not find tiller.

Run the following:

kubectl -n kube-system delete deployment tiller-deploy
kubectl -n kube-system delete service/tiller-deploy
helm init --service-account tiller

If you encounter a permission issue, like the following:

Error: release pinot failed: namespaces "pinot-quickstart" is forbidden: User "system:serviceaccount:kube-system:default" cannot get resource "namespaces" in API group "" in the namespace "pinot-quickstart"

Run the command below:

kubectl apply -f helm-rbac.yaml

Check Pinot deployment status

kubectl get all -n pinot-quickstart

Load data into Pinot using Kafka

Bring up a Kafka cluster for real-time data ingestion

helm repo add kafka https://charts.bitnami.com/bitnami
helm install -n pinot-quickstart kafka kafka/kafka --set replicas=1,zookeeper.image.tag=latest

Check Kafka deployment status

Ensure the Kafka deployment is ready before executing the scripts in the following steps. Run the following command:

kubectl get all -n pinot-quickstart | grep kafka

Below is an example output showing the deployment is ready:

pod/kafka-0                                                 1/1     Running     0          2m
pod/kafka-zookeeper-0                                       1/1     Running     0          10m
pod/kafka-zookeeper-1                                       1/1     Running     0          9m
pod/kafka-zookeeper-2                                       1/1     Running     0          8m

Create Kafka topics

Run the scripts below to create two Kafka topics for data ingestion:

kubectl -n pinot-quickstart exec kafka-0 -- kafka-topics.sh --bootstrap-server kafka-0:9092 --topic flights-realtime --create --partitions 1 --replication-factor 1
kubectl -n pinot-quickstart exec kafka-0 -- kafka-topics.sh --bootstrap-server kafka-0:9092 --topic flights-realtime-avro --create --partitions 1 --replication-factor 1

Load data into Kafka and create Pinot schema/tables

The script below does the following:

Ingests 19492 JSON messages to Kafka topic flights-realtime at a speed of 1 msg/sec
Ingests 19492 Avro messages to Kafka topic flights-realtime-avro at a speed of 1 msg/sec
Uploads Pinot schema airlineStats
Creates Pinot table airlineStats to ingest data from JSON encoded Kafka topic flights-realtime
Creates Pinot table airlineStatsAvro to ingest data from Avro encoded Kafka topic flights-realtime-avro

kubectl apply -f pinot/pinot-realtime-quickstart.yml

Query with the Pinot Data Explorer

Pinot Data Explorer

The script below, located at ./pinot/helm/pinot, performs local port forwarding, and opens the Pinot query console in your default web browser.

./query-pinot-data.sh

Query Pinot with Superset

Bring up Superset using Helm

Install the SuperSet Helm repository:

helm repo add superset https://apache.github.io/superset

Get the Helm values configuration file:

helm inspect values superset/superset > /tmp/superset-values.yaml

For Superset to install Pinot dependencies, edit /tmp/superset-values.yaml file to add apinotdb pip dependency into bootstrapScript field.
You can also build your own image with this dependency or use the image apachepinot/pinot-superset:latest instead.

Replace the default admin credentials inside the init section with a meaningful user profile and stronger password.
Install Superset using Helm:

kubectl create ns superset
helm upgrade --install --values /tmp/superset-values.yaml superset superset/superset -n superset

Ensure your cluster is up by running:

kubectl get all -n superset

Access the Superset UI

Run the below command to port forward Superset to your localhost:18088.

kubectl port-forward service/superset 18088:8088 -n superset

Navigate to Superset in your browser with the admin credentials you set in the previous section.
Create a new database connection with the following URI: pinot+http://pinot-broker.pinot-quickstart:8099/query?controller=http://pinot-controller.pinot-quickstart:9000/
Once the database is added, you can add more data sets and explore the dashboard options.

Access Pinot with Trino

Deploy Trino

Deploy Trino with the Pinot plugin installed:

helm repo add trino https://trinodb.github.io/charts/

See the charts in the Trino Helm chart repository:

helm search repo trino

In order to connect Trino to Pinot, you'll need to add the Pinot catalog, which requires extra configurations. Run the below command to get all the configurable values.

helm inspect values trino/trino > /tmp/trino-values.yaml

To add the Pinot catalog, edit the additionalCatalogs section by adding:

additionalCatalogs:
  pinot: |
    connector.name=pinot
    pinot.controller-urls=pinot-controller.pinot-quickstart:9000

Pinot is deployed at namespace pinot-quickstart, so the controller serviceURL is pinot-controller.pinot-quickstart:9000

After modifying the /tmp/trino-values.yaml file, deploy Trino with:

kubectl create ns trino-quickstart
helm install my-trino trino/trino --version 0.2.0 -n trino-quickstart --values /tmp/trino-values.yaml

Once you've deployed Trino, check the deployment status:

kubectl get pods -n trino-quickstart

Query Pinot with the Trino CLI

Once Trino is deployed, run the below command to get a runnable Trino CLI.

Download the Trino CLI:

curl -L https://repo1.maven.org/maven2/io/trino/trino-cli/363/trino-cli-363-executable.jar -o /tmp/trino && chmod +x /tmp/trino

Port forward Trino service to your local if it's not already exposed:

echo "Visit http://127.0.0.1:18080 to use your application"
kubectl port-forward service/my-trino 18080:8080 -n trino-quickstart

Use the Trino console client to connect to the Trino service:

/tmp/trino --server localhost:18080 --catalog pinot --schema default

Query Pinot data using the Trino CLI, like in the sample queries below.

Sample queries to execute

List all catalogs

trino:default> show catalogs;

  Catalog
---------
 pinot
 system
 tpcds
 tpch
(4 rows)

Query 20211025_010256_00002_mxcvx, FINISHED, 2 nodes
Splits: 36 total, 36 done (100.00%)
0.70 [0 rows, 0B] [0 rows/s, 0B/s]

List all tables

trino:default> show tables;

    Table
--------------
 airlinestats
(1 row)

Query 20211025_010326_00003_mxcvx, FINISHED, 3 nodes
Splits: 36 total, 36 done (100.00%)
0.28 [1 rows, 29B] [3 rows/s, 104B/s]

Show schema

trino:default> DESCRIBE airlinestats;

        Column        |      Type      | Extra | Comment
----------------------+----------------+-------+---------
 flightnum            | integer        |       |
 origin               | varchar        |       |
 quarter              | integer        |       |
 lateaircraftdelay    | integer        |       |
 divactualelapsedtime | integer        |       |
 divwheelsons         | array(integer) |       |
 divwheelsoffs        | array(integer) |       |
......

Query 20211025_010414_00006_mxcvx, FINISHED, 3 nodes
Splits: 36 total, 36 done (100.00%)
0.37 [79 rows, 5.96KB] [212 rows/s, 16KB/s]

Count total documents

trino:default> select count(*) as cnt from airlinestats limit 10;

 cnt
------
 9746
(1 row)

Query 20211025_015607_00009_mxcvx, FINISHED, 2 nodes
Splits: 17 total, 17 done (100.00%)
0.24 [1 rows, 9B] [4 rows/s, 38B/s]

Access Pinot with Presto

Deploy Presto with the Pinot plugin

First, deploy Presto with default configurations:

helm install presto pinot/presto -n pinot-quickstart

kubectl apply -f presto-coordinator.yaml

To customize your deployment, run the below command to get all the configurable values.

helm inspect values pinot/presto > /tmp/presto-values.yaml

After modifying the /tmp/presto-values.yaml file, deploy Presto:

helm install presto pinot/presto -n pinot-quickstart --values /tmp/presto-values.yaml

Once you've deployed the Presto instance, check the deployment status:

kubectl get pods -n pinot-quickstart

Query Presto using the Presto CLI

Once Presto is deployed, you can run the below command from here, or follow the steps below.

./pinot-presto-cli.sh

Download the Presto CLI:

curl -L https://repo1.maven.org/maven2/com/facebook/presto/presto-cli/0.246/presto-cli-0.246-executable.jar -o /tmp/presto-cli && chmod +x /tmp/presto-cli

Port forward presto-coordinator port 8080 to localhost port 18080:

kubectl port-forward service/presto-coordinator 18080:8080 -n pinot-quickstart> /dev/null &

Start the Presto CLI with the Pinot catalog:

/tmp/presto-cli --server localhost:18080 --catalog pinot --schema default

Query Pinot data with the Presto CLI, like in the sample queries below.

Sample queries to execute

List all catalogs

presto:default> show catalogs;

 Catalog
---------
 pinot
 system
(2 rows)

Query 20191112_050827_00003_xkm4g, FINISHED, 1 node
Splits: 19 total, 19 done (100.00%)
0:01 [0 rows, 0B] [0 rows/s, 0B/s]

List all tables

presto:default> show tables;

    Table
--------------
 airlinestats
(1 row)

Query 20191112_050907_00004_xkm4g, FINISHED, 1 node
Splits: 19 total, 19 done (100.00%)
0:01 [1 rows, 29B] [1 rows/s, 41B/s]

Show schema

presto:default> DESCRIBE pinot.dontcare.airlinestats;

        Column        |  Type   | Extra | Comment
----------------------+---------+-------+---------
 flightnum            | integer |       |
 origin               | varchar |       |
 quarter              | integer |       |
 lateaircraftdelay    | integer |       |
 divactualelapsedtime | integer |       |
......

Query 20191112_051021_00005_xkm4g, FINISHED, 1 node
Splits: 19 total, 19 done (100.00%)
0:02 [80 rows, 6.06KB] [35 rows/s, 2.66KB/s]

Count total documents

presto:default> select count(*) as cnt from pinot.dontcare.airlinestats limit 10;

 cnt
------
 9745
(1 row)

Query 20191112_051114_00006_xkm4g, FINISHED, 1 node
Splits: 17 total, 17 done (100.00%)
0:00 [1 rows, 8B] [2 rows/s, 19B/s]

Delete a Pinot cluster in Kubernetes

To delete your Pinot cluster in Kubernetes, run the following command:

kubectl delete ns pinot-quickstart

PreviousQuick Start Examples NextRunning on public clouds

Last updated 1 year ago

Was this helpful?