Running in Kubernetes

Pinot quick start in Kubernetes


1. Prerequisites

This quickstart assumes that you already have a running Kubernetes cluster. Please follow the links below to set up a Kubernetes cluster.

  • Enable Kubernetes on Docker-Desktop

  • Install Minikube for local setup (make sure to run with enough resources, e.g. minikube start --vm=true --cpus=4 --memory=8g --disk-size=50g)

  • Setup a Kubernetes Cluster using Amazon Elastic Kubernetes Service (Amazon EKS)

  • Setup a Kubernetes Cluster using Google Kubernetes Engine (GKE)

  • Setup a Kubernetes Cluster using Azure Kubernetes Service (AKS)

2. Setting up a Pinot cluster in Kubernetes

Before continuing, please make sure that you've downloaded Apache Pinot. The setup scripts for this guide live in the open source project on GitHub, under ./pinot/kubernetes/helm in the Pinot source:

# checkout pinot
git clone https://github.com/apache/pinot.git
cd pinot/kubernetes/helm

2.1 Start Pinot with Helm

The Pinot repo has pre-packaged Helm charts for Pinot and Presto. The Helm repo index file lives alongside the charts (see the repo URL in the helm repo add command below).

helm repo add pinot https://raw.githubusercontent.com/apache/pinot/master/kubernetes/helm
kubectl create ns pinot-quickstart
helm install pinot pinot/pinot \
    -n pinot-quickstart \
    --set cluster.name=pinot \
    --set server.replicaCount=2

NOTE: Please specify a StorageClass based on your cloud vendor; a sketch of passing it to Helm follows the list below. For the Pinot server, don't mount a blob store (such as AzureFile, Google Cloud Storage, or S3) as the data-serving file system.

Only use block-storage disks in the style of Amazon EBS, GCP Persistent Disk, or Azure Disk:

  • For AWS: "gp2"

  • For GCP: "pd-ssd" or "standard"

  • For Azure: "AzureDisk"

  • For Docker-Desktop: "hostpath"
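
As a sketch, you could pass the StorageClass straight to Helm; the persistence keys below are assumptions based on the chart's values.yaml, so confirm them with helm inspect values pinot/pinot before relying on them:

# hypothetical AWS example; substitute your vendor's StorageClass
helm install pinot pinot/pinot \
    -n pinot-quickstart \
    --set controller.persistence.storageClass=gp2 \
    --set server.persistence.storageClass=gp2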

2.1.1 Update helm dependency

helm dependency update

2.1.2 Start Pinot with Helm

  • For Helm v2.12.1

If your Kubernetes cluster was recently provisioned, ensure Helm is initialized by running:

helm init --service-account tiller

Then deploy a new HA Pinot cluster using the following command:

helm install --namespace "pinot-quickstart" --name "pinot" pinot

  • For Helm v3.0.0

kubectl create ns pinot-quickstart
helm install -n pinot-quickstart pinot pinot

2.1.3 Troubleshooting (For helm v2.12.1)

  • Error: you may encounter the following issue:

Error: could not find tiller.

  • Resolution:

kubectl -n kube-system delete deployment tiller-deploy
kubectl -n kube-system delete service/tiller-deploy
helm init --service-account tiller

  • Error: you may encounter a permission issue like the following:

Error: release pinot failed: namespaces "pinot-quickstart" is forbidden: User "system:serviceaccount:kube-system:default" cannot get resource "namespaces" in API group "" in the namespace "pinot-quickstart"

  • Resolution:

kubectl apply -f helm-rbac.yaml

2.2 Check Pinot deployment status

kubectl get all -n pinot-quickstart
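
If you'd rather block until everything is up than poll, a convenience sketch (assumes every pod in the namespace belongs to this Pinot release):

# wait up to 10 minutes for all pods in the namespace to become Ready
kubectl wait --for=condition=ready pod --all -n pinot-quickstart --timeout=600s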

3. Load data into Pinot using Kafka

3.1 Bring up a Kafka cluster for real-time data ingestion

  • For Helm v3.0.0

helm repo add incubator https://charts.helm.sh/incubator
helm install -n pinot-quickstart kafka incubator/kafka --set replicas=1,zookeeper.image.tag=latest

  • For Helm v2.12.1

helm repo add incubator https://charts.helm.sh/incubator
helm install --namespace "pinot-quickstart" --name kafka incubator/kafka --set zookeeper.image.tag=latest

3.2 Check Kafka deployment status

kubectl get all -n pinot-quickstart | grep kafka

Ensure the Kafka deployment is ready before executing the scripts in the next steps. The output should look similar to:

pod/kafka-0                                                 1/1     Running     0          2m
pod/kafka-zookeeper-0                                       1/1     Running     0          10m
pod/kafka-zookeeper-1                                       1/1     Running     0          9m
pod/kafka-zookeeper-2                                       1/1     Running     0          8m

3.3 Create Kafka topics

The commands below create two Kafka topics for data ingestion:

kubectl -n pinot-quickstart exec kafka-0 -- kafka-topics --zookeeper kafka-zookeeper:2181 --topic flights-realtime --create --partitions 1 --replication-factor 1
kubectl -n pinot-quickstart exec kafka-0 -- kafka-topics --zookeeper kafka-zookeeper:2181 --topic flights-realtime-avro --create --partitions 1 --replication-factor 1
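
To confirm both topics exist, you can list them with the same tooling (reusing the --zookeeper flag from the commands above):

# optional check: both flights-realtime topics should appear
kubectl -n pinot-quickstart exec kafka-0 -- kafka-topics --zookeeper kafka-zookeeper:2181 --list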

3.4 Load data into Kafka and create Pinot schema/tables

The script below deploys batch jobs that will:

  • Ingest 19492 JSON messages to Kafka topic flights-realtime at a speed of 1 msg/sec

  • Ingest 19492 Avro messages to Kafka topic flights-realtime-avro at a speed of 1 msg/sec

  • Upload Pinot schema airlineStats

  • Create Pinot table airlineStats to ingest data from JSON encoded Kafka topic flights-realtime

  • Create Pinot table airlineStatsAvro to ingest data from Avro encoded Kafka topic flights-realtime-avro

kubectl apply -f pinot/pinot-realtime-quickstart.yml
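
To keep an eye on those jobs, a convenience sketch (job names come from the manifest, and the two ingestion jobs run for hours at 1 msg/sec, so don't wait for them to complete):

# watch the schema/table jobs finish; Ctrl-C to stop watching
kubectl get jobs -n pinot-quickstart -w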

4. Query using Pinot Data Explorer

4.1 Pinot Data Explorer

Please use the script below to perform local port-forwarding; it will also open the Pinot query console in your default web browser.

This script can be found in the Pinot source at ./pinot/kubernetes/helm/pinot

./query-pinot-data.sh
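
If you'd rather port-forward manually, the query console is served by the controller on port 9000 (the service name below matches the pinot-controller service referenced elsewhere in this guide):

kubectl port-forward service/pinot-controller 9000:9000 -n pinot-quickstart
# then open http://localhost:9000 in your browser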

5. Using Superset to query Pinot

5.1 Bring up Superset using Helm

Install the Superset Helm repo:

helm repo add superset https://apache.github.io/superset

Get Helm values config file:

helm inspect values superset/superset > /tmp/superset-values.yaml

Edit the /tmp/superset-values.yaml file and add the pinotdb pip dependency to the bootstrapScript field, so that Superset installs the Pinot dependencies at bootstrap time; a sketch follows.
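
A minimal sketch of the edit (the exact bootstrapScript contents shipped with the chart vary by version, so treat this as illustrative rather than a drop-in):

bootstrapScript: |
  #!/bin/bash
  # install the Pinot DB driver so Superset can query Pinot
  pip install pinotdb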

You can also build your own image with this dependency or just use image: apachepinot/pinot-superset:latest instead.

Also remember to change the admin credentials inside the init section to a meaningful user profile and a stronger password.

Install Superset using Helm:

kubectl create ns superset
helm upgrade --install --values /tmp/superset-values.yaml superset superset/superset -n superset

Ensure your cluster is up by running:

kubectl get all -n superset

5.2 Access Superset UI

You can run the command below to port-forward Superset to localhost:18088. Then you can open Superset in your browser and log in with the admin credentials you set earlier.

kubectl port-forward service/superset 18088:8088 -n superset

Create the Pinot database in Superset using the URI below (it points the pinotdb driver at the Pinot broker on port 8099 and the controller on port 9000, both in the pinot-quickstart namespace):

pinot+http://pinot-broker.pinot-quickstart:8099/query?controller=http://pinot-controller.pinot-quickstart:9000/

Once the database is added, you can add datasets and start exploring dashboards.

6. Access Pinot using Trino

6.1 Deploy Trino

You can run the command below to deploy Trino with the Pinot plugin installed.

helm repo add trino https://trinodb.github.io/charts/

The above command adds the Trino Helm chart repo. You can then run the command below to see the available charts.

helm search repo trino

To connect Trino to Pinot, we need to add a Pinot catalog, which requires extra configuration. You can run the command below to get all the configurable values.

helm inspect values trino/trino > /tmp/trino-values.yaml

To add the Pinot catalog, edit the additionalCatalogs section by adding:

additionalCatalogs:
  pinot: |
    connector.name=pinot
    pinot.controller-urls=pinot-controller.pinot-quickstart:9000

Pinot is deployed in the pinot-quickstart namespace, so the controller service URL is pinot-controller.pinot-quickstart:9000.

After modifying the /tmp/trino-values.yaml file, you can deploy Trino with:

kubectl create ns trino-quickstart
helm install my-trino trino/trino --version 0.2.0 -n trino-quickstart --values /tmp/trino-values.yaml

Once Trino is deployed, you can check the deployment status with:

kubectl get pods -n trino-quickstart

6.2 Query Trino using Trino CLI

Once Trino is deployed, you can download a runnable Trino CLI as shown below.

6.2.1 Download Trino CLI

curl -L https://repo1.maven.org/maven2/io/trino/trino-cli/363/trino-cli-363-executable.jar -o /tmp/trino && chmod +x /tmp/trino
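
As a quick sanity check that the downloaded CLI is runnable:

/tmp/trino --version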

6.2.2 Port-forward the Trino service to your local machine if it's not already exposed

echo "Visit http://127.0.0.1:18080 to use your application"
kubectl port-forward service/my-trino 18080:8080 -n trino-quickstart

6.2.3 Use the Trino console client to connect to the Trino service

/tmp/trino --server localhost:18080 --catalog pinot --schema default

6.2.4 Query Pinot data using Trino CLI

6.3 Sample queries to execute

  • List all catalogs

trino:default> show catalogs;
  Catalog
---------
 pinot
 system
 tpcds
 tpch
(4 rows)

Query 20211025_010256_00002_mxcvx, FINISHED, 2 nodes
Splits: 36 total, 36 done (100.00%)
0.70 [0 rows, 0B] [0 rows/s, 0B/s]

  • List all tables

trino:default> show tables;
    Table
--------------
 airlinestats
(1 row)

Query 20211025_010326_00003_mxcvx, FINISHED, 3 nodes
Splits: 36 total, 36 done (100.00%)
0.28 [1 rows, 29B] [3 rows/s, 104B/s]

  • Show schema

trino:default> DESCRIBE airlinestats;
        Column        |      Type      | Extra | Comment
----------------------+----------------+-------+---------
 flightnum            | integer        |       |
 origin               | varchar        |       |
 quarter              | integer        |       |
 lateaircraftdelay    | integer        |       |
 divactualelapsedtime | integer        |       |
 divwheelsons         | array(integer) |       |
 divwheelsoffs        | array(integer) |       |
......

Query 20211025_010414_00006_mxcvx, FINISHED, 3 nodes
Splits: 36 total, 36 done (100.00%)
0.37 [79 rows, 5.96KB] [212 rows/s, 16KB/s]

  • Count total documents

trino:default> select count(*) as cnt from airlinestats limit 10;
 cnt
------
 9746
(1 row)

Query 20211025_015607_00009_mxcvx, FINISHED, 2 nodes
Splits: 17 total, 17 done (100.00%)
0.24 [1 rows, 9B] [4 rows/s, 38B/s]

7. Access Pinot using Presto

7.1 Deploy Presto using Pinot plugin

You can deploy a customized Presto with the Pinot plugin installed, either with Helm or by applying the Presto coordinator manifest from the Pinot source directly:

helm install presto pinot/presto -n pinot-quickstart

kubectl apply -f presto-coordinator.yaml

The above command deploys Presto with default configs. To customize your deployment, run the command below to get all the configurable values.

helm inspect values pinot/presto > /tmp/presto-values.yaml

After modifying the /tmp/presto-values.yaml file, you can deploy Presto with:

helm install presto pinot/presto -n pinot-quickstart --values /tmp/presto-values.yaml

Once Presto is deployed, you can check the deployment status with:

kubectl get pods -n pinot-quickstart

7.2 Query Presto using Presto CLI

Once Presto is deployed, you can run the script below from the Pinot source checkout, or just follow steps 7.2.1 to 7.2.3.

./pinot-presto-cli.sh

7.2.1 Download Presto CLI

curl -L https://repo1.maven.org/maven2/com/facebook/presto/presto-cli/0.246/presto-cli-0.246-executable.jar -o /tmp/presto-cli && chmod +x /tmp/presto-cli

7.2.2 Port-forward the presto-coordinator port 8080 to localhost port 18080

kubectl port-forward service/presto-coordinator 18080:8080 -n pinot-quickstart > /dev/null &

7.2.3 Start the Presto CLI with the pinot catalog

/tmp/presto-cli --server localhost:18080 --catalog pinot --schema default

7.2.4 Query Pinot data using the Presto CLI

7.3 Sample queries to execute

  • List all catalogs

presto:default> show catalogs;
 Catalog
---------
 pinot
 system
(2 rows)

Query 20191112_050827_00003_xkm4g, FINISHED, 1 node
Splits: 19 total, 19 done (100.00%)
0:01 [0 rows, 0B] [0 rows/s, 0B/s]

  • List all tables

presto:default> show tables;
    Table
--------------
 airlinestats
(1 row)

Query 20191112_050907_00004_xkm4g, FINISHED, 1 node
Splits: 19 total, 19 done (100.00%)
0:01 [1 rows, 29B] [1 rows/s, 41B/s]

  • Show schema

presto:default> DESCRIBE pinot.dontcare.airlinestats;
        Column        |  Type   | Extra | Comment
----------------------+---------+-------+---------
 flightnum            | integer |       |
 origin               | varchar |       |
 quarter              | integer |       |
 lateaircraftdelay    | integer |       |
 divactualelapsedtime | integer |       |
......

Query 20191112_051021_00005_xkm4g, FINISHED, 1 node
Splits: 19 total, 19 done (100.00%)
0:02 [80 rows, 6.06KB] [35 rows/s, 2.66KB/s]

  • Count total documents

presto:default> select count(*) as cnt from pinot.dontcare.airlinestats limit 10;
 cnt
------
 9745
(1 row)

Query 20191112_051114_00006_xkm4g, FINISHED, 1 node
Splits: 17 total, 17 done (100.00%)
0:00 [1 rows, 8B] [2 rows/s, 19B/s]

8. Deleting the Pinot cluster in Kubernetes

kubectl delete ns pinot-quickstart
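
If you also created the Superset and Trino namespaces while following this guide, clean those up too:

kubectl delete ns superset trino-quickstart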
