release-0.4.0

Running Pinot in Kubernetes

Pinot quick start in Kubernetes

Last updated 4 years ago

1. Prerequisites

This quick start assumes that you already have a Kubernetes cluster. If you do not, follow one of the setup guides listed at the end of this page.

2. Setting up a Pinot cluster in Kubernetes

Before continuing, make sure that you have downloaded Apache Pinot. The setup scripts used in this guide are part of the open source project on GitHub and can be found in the Pinot source at ./incubator-pinot/kubernetes/helm

# checkout pinot
git clone https://github.com/apache/incubator-pinot.git
cd incubator-pinot/kubernetes/helm

2.1 Start Pinot with Helm

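The Pinot repo has pre-packaged Helm charts for Pinot and Presto. The Helm repo index file is the index.yaml served at the repository URL added below. To install from the packaged charts, run: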

helm repo add pinot https://raw.githubusercontent.com/apache/incubator-pinot/master/kubernetes/helm
kubectl create ns pinot-quickstart
helm install pinot pinot/pinot \
    -n pinot-quickstart \
    --set cluster.name=pinot \
    --set server.replicaCount=2

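Alternatively, install from the chart in the cloned source tree, as described in 2.1.1 to 2.1.3. Run these commands from ./incubator-pinot/kubernetes/helm

2.1.1 Update helm dependency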

helm dependency update

2.1.2 Start Pinot with Helm

  • For Helm v2.12.1

If your Kubernetes cluster is recently provisioned, ensure Helm is initialized by running:

helm init --service-account tiller

Then deploy a new HA Pinot cluster using the following command:

helm install --namespace "pinot-quickstart" --name "pinot" .

  • For Helm v3.0.0

kubectl create ns pinot-quickstart
helm install -n pinot-quickstart pinot .
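
Because the install command differs between Helm v2 and v3, it can help to check which client is on your PATH first. The following is a minimal sketch, not part of the Pinot scripts: the function names helm_major_version and pinot_install_command are hypothetical, and it assumes that `helm version --short` prints a string such as "v3.0.0+g..." (Helm 3) or "Client: v2.12.1+g..." (Helm 2).

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of the Pinot repository.

# Print the major version (2 or 3) of the Helm client on PATH.
# Both "v3.0.0+g..." and "Client: v2.12.1+g..." contain "v<major>.".
helm_major_version() {
  helm version --short 2>/dev/null \
    | sed -n 's/^[^v]*v\([0-9][0-9]*\)\..*/\1/p' \
    | head -n 1
}

# Print the install command from 2.1.2 that matches a given major version.
pinot_install_command() {
  case "$1" in
    2) echo 'helm install --namespace "pinot-quickstart" --name "pinot" .' ;;
    3) echo 'helm install -n pinot-quickstart pinot .' ;;
    *) echo "unsupported Helm major version: $1" >&2; return 1 ;;
  esac
}
```

For example, `pinot_install_command "$(helm_major_version)"` prints the command to run. With Helm v3, the namespace still has to be created first with `kubectl create ns pinot-quickstart`, as shown above.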

2.1.3 Troubleshooting (For helm v2.12.1)

  • Error: if you encounter the following issue:

Error: could not find tiller.

  • Resolution:

kubectl -n kube-system delete deployment tiller-deploy
kubectl -n kube-system delete service/tiller-deploy
helm init --service-account tiller

  • Error: if you encounter the following permission issue:

Error: release pinot failed: namespaces "pinot-quickstart" is forbidden: User "system:serviceaccount:kube-system:default" cannot get resource "namespaces" in API group "" in the namespace "pinot-quickstart"

  • Resolution:

kubectl apply -f helm-rbac.yaml

2.2 Check Pinot deployment status

kubectl get all -n pinot-quickstart

3. Load data into Pinot using Kafka

3.1 Bring up a Kafka cluster for real-time data ingestion

  • For Helm v3.0.0

helm repo add incubator http://storage.googleapis.com/kubernetes-charts-incubator
helm install -n pinot-quickstart kafka incubator/kafka --set replicas=1

  • For Helm v2.12.1

helm repo add incubator http://storage.googleapis.com/kubernetes-charts-incubator
helm install --namespace "pinot-quickstart" --name kafka incubator/kafka

3.2 Check Kafka deployment status

kubectl get all -n pinot-quickstart | grep kafka

Ensure the Kafka deployment is ready before running the commands in the following steps. The Kafka and ZooKeeper pods should all be in the Running state:

pod/kafka-0                 1/1     Running     0          2m
pod/kafka-zookeeper-0       1/1     Running     0          10m
pod/kafka-zookeeper-1       1/1     Running     0          9m
pod/kafka-zookeeper-2       1/1     Running     0          8m
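
The readiness check above can also be scripted. The sketch below is an illustration rather than part of the Pinot scripts: the function names unready_pods and wait_for_pods are hypothetical, and it assumes the usual NAME, READY, STATUS column order of `kubectl get pods --no-headers`.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of the Pinot repository.

# Print the pods whose name starts with a prefix and that are not ready.
# A pod counts as ready when READY is "n/n" and STATUS is "Running".
# If no pod matches the prefix at all, a placeholder line is printed,
# so an empty result always means "every matching pod is ready".
unready_pods() {
  local namespace="$1"
  local prefix="$2"
  kubectl get pods -n "$namespace" --no-headers \
    | awk -v prefix="$prefix" '
        index($1, prefix) == 1 {
          seen = 1
          split($2, ready, "/")
          if (ready[1] != ready[2] || $3 != "Running") print $1
        }
        END { if (!seen) print "(no pod named " prefix "*)" }'
}

# Poll until every matching pod is ready; fail after max_tries polls.
wait_for_pods() {
  local namespace="$1"
  local prefix="$2"
  local max_tries="${3:-60}"
  local delay="${4:-5}"
  local try=0
  while [ "$try" -lt "$max_tries" ]; do
    if [ -z "$(unready_pods "$namespace" "$prefix")" ]; then
      return 0
    fi
    try=$((try + 1))
    sleep "$delay"
  done
  return 1
}
```

For example, `wait_for_pods pinot-quickstart kafka` polls every 5 seconds, up to 60 times, and returns a non-zero status if the Kafka pods are still not ready.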

3.3 Create Kafka topics

The commands below create two Kafka topics for data ingestion:

kubectl -n pinot-quickstart exec kafka-0 -- kafka-topics --zookeeper kafka-zookeeper:2181 --topic flights-realtime --create --partitions 1 --replication-factor 1
kubectl -n pinot-quickstart exec kafka-0 -- kafka-topics --zookeeper kafka-zookeeper:2181 --topic flights-realtime-avro --create --partitions 1 --replication-factor 1
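
To confirm that both topics exist before loading data, you can list the topics with the same tool. This is a minimal sketch under the same assumptions as the commands above (pod kafka-0, ZooKeeper at kafka-zookeeper:2181); the function name missing_topics is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the Pinot repository.

# Print every expected topic that `kafka-topics --list` does not report.
# Prints nothing when all expected topics exist.
missing_topics() {
  local listed
  local topic
  listed="$(kubectl -n pinot-quickstart exec kafka-0 -- \
    kafka-topics --zookeeper kafka-zookeeper:2181 --list)"
  for topic in "$@"; do
    # -x matches whole lines only, -F treats the topic name literally.
    printf '%s\n' "$listed" | grep -qxF -- "$topic" || printf '%s\n' "$topic"
  done
}
```

For example, `missing_topics flights-realtime flights-realtime-avro` prints the name of any topic that still has to be created.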

3.4 Load data into Kafka and create Pinot schema/tables

The manifest below deploys 3 batch jobs, which together perform the following tasks:

  • Ingest 19492 JSON messages to Kafka topic flights-realtime at a speed of 1 msg/sec

  • Ingest 19492 Avro messages to Kafka topic flights-realtime-avro at a speed of 1 msg/sec

  • Upload Pinot schema airlineStats

  • Create Pinot table airlineStats to ingest data from JSON encoded Kafka topic flights-realtime

  • Create Pinot table airlineStatsAvro to ingest data from Avro encoded Kafka topic flights-realtime-avro

kubectl apply -f pinot-realtime-quickstart.yml
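
Since each topic receives 19492 messages at 1 msg/sec, ingestion takes several hours, and document counts queried earlier will be lower than the full total. The arithmetic can be sketched as follows; the function name format_duration is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the Pinot repository.

# Format a duration given in seconds as "<h>h <m>m <s>s".
format_duration() {
  local total="$1"
  printf '%dh %dm %ds\n' \
    "$((total / 3600))" "$((total % 3600 / 60))" "$((total % 60))"
}

messages=19492      # messages per topic, from the list above
rate_per_second=1   # publishing speed, from the list above
format_duration "$((messages / rate_per_second))"   # prints "5h 24m 52s"
```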

4. Query using Pinot Data Explorer

4.1 Pinot Data Explorer

Use the script below to set up local port forwarding; it also opens the Pinot query console in your default web browser. The script can be found in the Pinot source at ./incubator-pinot/kubernetes/helm

./query-pinot-data.sh
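
If you prefer to set up the port forwarding by hand, a minimal sketch follows. It does not reproduce the contents of query-pinot-data.sh. The service name pinot-controller and port 9000 are assumptions about the chart defaults for a release named pinot, and the function name forward_pinot_controller is hypothetical; check `kubectl get svc -n pinot-quickstart` for the actual values.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the Pinot repository.

# Forward a local port to the Pinot controller service until interrupted.
# Assumption: the service is named "pinot-controller" and listens on 9000.
forward_pinot_controller() {
  local namespace="${1:-pinot-quickstart}"
  local local_port="${2:-9000}"
  kubectl port-forward service/pinot-controller "${local_port}:9000" -n "$namespace"
}
```

Under these assumptions, running `forward_pinot_controller` and then opening http://localhost:9000 in a browser should reach the controller for as long as the command keeps running.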

5. Using Superset to query Pinot

5.1 Bring up Superset

kubectl apply -f superset.yaml

5.2 (First time) Set up Admin account

kubectl exec -it pod/superset-0 -n pinot-quickstart -- bash -c 'flask fab create-admin'

5.3 (First time) Init Superset

kubectl exec -it pod/superset-0 -n pinot-quickstart -- bash -c 'superset db upgrade'
kubectl exec -it pod/superset-0 -n pinot-quickstart -- bash -c 'superset init'

5.4 Load Demo data source

kubectl exec -it pod/superset-0 -n pinot-quickstart -- bash -c 'superset import_datasources -p /etc/superset/pinot_example_datasource.yaml'
kubectl exec -it pod/superset-0 -n pinot-quickstart -- bash -c 'superset import_dashboards -p /etc/superset/pinot_example_dashboard.json'

5.5 Access Superset UI

Run the command below to open Superset in your browser, then log in with the admin credentials created in 5.2.

./open-superset-ui.sh

To open the imported dashboard, click Dashboards in the banner and then click AirlineStats.

6. Access Pinot using Presto

6.1 Deploy Presto using Pinot plugin

You can deploy a customized Presto with the Pinot plugin installed in either of two ways.

  • Using the Helm chart

helm install presto pinot/presto -n pinot-quickstart

  • Using the manifest in the Pinot source tree

kubectl apply -f presto-coordinator.yaml

6.2 Query Presto using Presto CLI

Once Presto is deployed, run the script below to start a Presto CLI session.

./pinot-presto-cli.sh

6.3 Sample queries to execute

  • List all catalogs

presto:default> show catalogs;
 Catalog
---------
 pinot
 system
(2 rows)

Query 20191112_050827_00003_xkm4g, FINISHED, 1 node
Splits: 19 total, 19 done (100.00%)
0:01 [0 rows, 0B] [0 rows/s, 0B/s]

  • List all tables

presto:default> show tables;
    Table
--------------
 airlinestats
(1 row)

Query 20191112_050907_00004_xkm4g, FINISHED, 1 node
Splits: 19 total, 19 done (100.00%)
0:01 [1 rows, 29B] [1 rows/s, 41B/s]

  • Show schema

presto:default> DESCRIBE pinot.dontcare.airlinestats;
        Column        |  Type   | Extra | Comment
----------------------+---------+-------+---------
 flightnum            | integer |       |
 origin               | varchar |       |
 quarter              | integer |       |
 lateaircraftdelay    | integer |       |
 divactualelapsedtime | integer |       |
......

Query 20191112_051021_00005_xkm4g, FINISHED, 1 node
Splits: 19 total, 19 done (100.00%)
0:02 [80 rows, 6.06KB] [35 rows/s, 2.66KB/s]

  • Count total documents

presto:default> select count(*) as cnt from pinot.dontcare.airlinestats limit 10;
 cnt
------
 9745
(1 row)

Query 20191112_051114_00006_xkm4g, FINISHED, 1 node
Splits: 17 total, 17 done (100.00%)
0:00 [1 rows, 8B] [2 rows/s, 19B/s]

7. Deleting the Pinot cluster in Kubernetes

kubectl delete ns pinot-quickstart

Kubernetes cluster setup guides (see 1. Prerequisites):

  • Enable Kubernetes on Docker Desktop
  • Install Minikube for local setup
  • Set up a Kubernetes cluster using Amazon Elastic Kubernetes Service (Amazon EKS)
  • Set up a Kubernetes cluster using Google Kubernetes Engine (GKE)
  • Set up a Kubernetes cluster using Azure Kubernetes Service (AKS)