Running Pinot in Kubernetes

Pinot quick start in Kubernetes

1. Prerequisites

This quick start assumes that you already have a Kubernetes cluster. If you do not, follow the links below to set one up.

2. Setting up a Pinot cluster in Kubernetes

Before continuing, please make sure that you've downloaded Apache Pinot. The scripts for the setup in this guide can be found in our open source project on GitHub.

The scripts can be found in the Pinot source at ./incubator-pinot/kubernetes/helm

2.1 Start Pinot with Helm

The Pinot repo has pre-packaged Helm charts for Pinot and Presto. The Helm repo index file is hosted at the repo URL used by the helm repo add command in this guide.

2.1.1 Update helm dependency

2.2 Check Pinot deployment status

3. Load data into Pinot using Kafka

3.1 Bring up a Kafka cluster for real-time data ingestion

3.2 Check Kafka deployment status

Ensure the Kafka deployment is ready before executing the scripts in the following steps.
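
One way to make this check scriptable is `kubectl wait`, which blocks until a pod reports the Ready condition. The sketch below assumes the broker pod is named kafka-0, as in the sample status output in this guide; the function name and the 300-second default timeout are illustrative choices, not part of the Pinot scripts.

```shell
# Hypothetical helper: block until the Kafka broker pod is Ready.
# Assumes the pod name kafka-0 and the pinot-quickstart namespace used in this guide.
wait_for_kafka() {
  wait_timeout="${1:-300s}"
  kubectl wait --for=condition=Ready pod/kafka-0 \
    -n pinot-quickstart --timeout="$wait_timeout"
}
```

Calling `wait_for_kafka 120s` returns a non-zero status if the pod is not ready within two minutes, so it can gate the later steps in a script.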

3.3 Create Kafka topics

The scripts below create two Kafka topics for data ingestion, flights-realtime and flights-realtime-avro.
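
After the topics are created, listing them is a quick sanity check. This sketch reuses the kafka-0 pod and the kafka-zookeeper:2181 address from the create commands in this guide; the function names are hypothetical, and the `--zookeeper` flag assumes the same older Kafka tooling those commands rely on.

```shell
# Hypothetical helper: list topics through the broker pod used in this guide.
list_kafka_topics() {
  kubectl -n pinot-quickstart exec kafka-0 -- \
    kafka-topics --zookeeper kafka-zookeeper:2181 --list
}

# Succeeds only when both quick-start topics appear in the listing.
topics_ready() {
  topics="$(list_kafka_topics)" || return 1
  printf '%s\n' "$topics" | grep -qx 'flights-realtime' &&
    printf '%s\n' "$topics" | grep -qx 'flights-realtime-avro'
}
```

The `-x` flag makes grep match whole lines, so flights-realtime-avro alone does not satisfy the check for flights-realtime.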

3.4 Load data into Kafka and create Pinot schema/tables

The script below will deploy 3 batch jobs.

  • Ingest 19492 JSON messages to Kafka topic flights-realtime at a speed of 1 msg/sec

  • Ingest 19492 Avro messages to Kafka topic flights-realtime-avro at a speed of 1 msg/sec

  • Upload Pinot schema airlineStats
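
At 1 msg/sec, each ingestion job needs a while to publish its 19492 messages, so row counts queried shortly after deployment will be partial. The arithmetic can be sketched as follows; the function name is illustrative only.

```shell
# Convert a message count at a given rate (msg/sec) into hours, minutes, seconds.
# Uses integer division, so any fractional second is dropped.
ingest_duration() {
  total=$(( $1 / $2 ))
  printf '%dh %dm %ds\n' $(( total / 3600 )) $(( total % 3600 / 60 )) $(( total % 60 ))
}

ingest_duration 19492 1   # prints "5h 24m 52s"
```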

4. Query using Pinot Data Explorer

4.1 Pinot Data Explorer

Use the script below to set up local port-forwarding. It also opens the Pinot query console in your default web browser.

This script can be found in the Pinot source at ./incubator-pinot/kubernetes/helm
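
For readers curious what local port-forwarding involves, a minimal sketch follows. It assumes the controller is exposed as a service named pinot-controller on port 9000; both values are assumptions not stated in this guide, and the actual query-pinot-data.sh script may do this differently.

```shell
# Hypothetical stand-in for the port-forwarding step.
# The service name and port defaults are assumptions; verify them with
# `kubectl get svc -n pinot-quickstart` before relying on this.
forward_pinot_controller() {
  service="${1:-pinot-controller}"
  port="${2:-9000}"
  kubectl port-forward "service/$service" "$port:$port" -n pinot-quickstart
}
```

While the command runs, the console would be reachable at http://localhost:9000 under these assumptions. Stopping the command ends the forwarding.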

5. Using Superset to query Pinot

5.1 Bring up Superset

5.2 (First time) Set up Admin account

5.3 (First time) Init Superset

5.4 Load Demo data source

5.5 Access Superset UI

Run the command below to open Superset in your browser, then log in with the admin credentials you created earlier.

To open the imported dashboard, click Dashboards in the banner and then click AirlineStats.

6. Access Pinot using Presto

6.1 Deploy Presto using Pinot plugin

You can run the command below to deploy a customized Presto with the Pinot plugin installed.

6.2 Query Presto using Presto CLI

Once Presto is deployed, you can run the command below.

6.3 Sample queries to execute

  • List all catalogs

  • List all tables

  • Show schema

  • Count total documents

7. Deleting the Pinot cluster in Kubernetes

Setup a Kubernetes Cluster using Google Kubernetes Engine (GKE)
  • Setup a Kubernetes Cluster using Azure Kubernetes Service (AKS)

  • 2.1.2 Start Pinot with Helm
    • For Helm v2.12.1

    If your Kubernetes cluster is recently provisioned, ensure Helm is initialized by running:

    Then deploy a new HA Pinot cluster using the following command:

    • For Helm v3.0.0

    2.1.3 Troubleshooting (For helm v2.12.1)

    • Error: Please run the below command if encountering the following issue:

    • Resolution:

    • Error: Please run the command below if encountering a permission issue:

    Error: release pinot failed: namespaces "pinot-quickstart" is forbidden: User "system:serviceaccount:kube-system:default" cannot get resource "namespaces" in API group "" in the namespace "pinot-quickstart"

    • Resolution:

  • Create Pinot table airlineStats to ingest data from JSON encoded Kafka topic flights-realtime

  • Create Pinot table airlineStatsAvro to ingest data from Avro encoded Kafka topic flights-realtime-avro

  • helm repo add pinot https://raw.githubusercontent.com/apache/incubator-pinot/master/kubernetes/helm
    kubectl create ns pinot-quickstart
    helm install pinot pinot/pinot \
        -n pinot-quickstart \
        --set cluster.name=pinot \
        --set server.replicaCount=2
    helm dependency update
    Enable Kubernetes on Docker-Desktop
    Install Minikube for local setup
    Setup a Kubernetes Cluster using Amazon Elastic Kubernetes Service (Amazon EKS)
    # checkout pinot
    git clone https://github.com/apache/incubator-pinot.git
    cd incubator-pinot/kubernetes/helm
    helm init --service-account tiller
    helm install --namespace "pinot-quickstart" --name "pinot" .
    kubectl create ns pinot-quickstart
    helm install -n pinot-quickstart pinot .
    Error: could not find tiller.
    kubectl -n kube-system delete deployment tiller-deploy
    kubectl -n kube-system delete service/tiller-deploy
    helm init --service-account tiller
    kubectl get all -n pinot-quickstart
    helm repo add incubator http://storage.googleapis.com/kubernetes-charts-incubator
    helm install -n pinot-quickstart kafka incubator/kafka --set replicas=1
    helm repo add incubator http://storage.googleapis.com/kubernetes-charts-incubator
    helm install --namespace "pinot-quickstart"  --name kafka incubator/kafka
    kubectl get all -n pinot-quickstart |grep kafka
    pod/kafka-0                                          1/1     Running     0          2m
    pod/kafka-zookeeper-0                                       1/1     Running     0          10m
    pod/kafka-zookeeper-1                                       1/1     Running     0          9m
    pod/kafka-zookeeper-2                                       1/1     Running     0          8m
    kubectl -n pinot-quickstart exec kafka-0 -- kafka-topics --zookeeper kafka-zookeeper:2181 --topic flights-realtime --create --partitions 1 --replication-factor 1
    kubectl -n pinot-quickstart exec kafka-0 -- kafka-topics --zookeeper kafka-zookeeper:2181 --topic flights-realtime-avro --create --partitions 1 --replication-factor 1
    kubectl apply -f pinot-realtime-quickstart.yml
    ./query-pinot-data.sh
    kubectl apply -f superset.yaml
    kubectl exec -it pod/superset-0 -n pinot-quickstart -- bash -c 'flask fab create-admin'
    kubectl exec -it pod/superset-0 -n pinot-quickstart -- bash -c 'superset db upgrade'
    kubectl exec -it pod/superset-0 -n pinot-quickstart -- bash -c 'superset init'
    kubectl exec -it pod/superset-0 -n pinot-quickstart -- bash -c 'superset import_datasources -p /etc/superset/pinot_example_datasource.yaml'
    kubectl exec -it pod/superset-0 -n pinot-quickstart -- bash -c 'superset import_dashboards -p /etc/superset/pinot_example_dashboard.json'
    ./open-superset-ui.sh
    helm install presto pinot/presto -n pinot-quickstart
    kubectl apply -f presto-coordinator.yaml
    ./pinot-presto-cli.sh
    presto:default> show catalogs;
     Catalog
    ---------
     pinot
     system
    (2 rows)
    
    Query 20191112_050827_00003_xkm4g, FINISHED, 1 node
    Splits: 19 total, 19 done (100.00%)
    0:01 [0 rows, 0B] [0 rows/s, 0B/s]
    presto:default> show tables;
        Table
    --------------
     airlinestats
    (1 row)
    
    Query 20191112_050907_00004_xkm4g, FINISHED, 1 node
    Splits: 19 total, 19 done (100.00%)
    0:01 [1 rows, 29B] [1 rows/s, 41B/s]
    presto:default> DESCRIBE pinot.dontcare.airlinestats;
            Column        |  Type   | Extra | Comment
    ----------------------+---------+-------+---------
     flightnum            | integer |       |
     origin               | varchar |       |
     quarter              | integer |       |
     lateaircraftdelay    | integer |       |
     divactualelapsedtime | integer |       |
    ......
    
    Query 20191112_051021_00005_xkm4g, FINISHED, 1 node
    Splits: 19 total, 19 done (100.00%)
    0:02 [80 rows, 6.06KB] [35 rows/s, 2.66KB/s]
    presto:default> select count(*) as cnt from pinot.dontcare.airlinestats limit 10;
     cnt
    ------
     9745
    (1 row)
    
    Query 20191112_051114_00006_xkm4g, FINISHED, 1 node
    Splits: 17 total, 17 done (100.00%)
    0:00 [1 rows, 8B] [2 rows/s, 19B/s]
    kubectl delete ns pinot-quickstart
    kubectl apply -f helm-rbac.yaml
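
The permission error in the troubleshooting section arises because Tiller is running as the default service account in kube-system, which is not allowed to read namespaces. The contents of helm-rbac.yaml are not shown in this guide; a common Helm v2 arrangement, sketched below purely as an assumption, is a dedicated tiller service account bound to the cluster-admin role.

```shell
# Assumed equivalent of applying helm-rbac.yaml (not taken from the Pinot repo).
# Creates a tiller service account and grants it cluster-admin.
setup_tiller_rbac() {
  kubectl create serviceaccount tiller -n kube-system &&
    kubectl create clusterrolebinding tiller \
      --clusterrole=cluster-admin \
      --serviceaccount=kube-system:tiller
}
```

With the account in place, `helm init --service-account tiller` makes Tiller run under it. The cluster-admin role is broad, which suits a quick start but is usually too permissive for a shared cluster.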