Running Pinot locally

This quick start guide will help you bootstrap a Pinot standalone instance on your local machine.

In this guide you'll learn how to download and install Apache Pinot as a standalone instance.

This is a quick start guide that shows you how to run an example recipe in a standalone instance; it is meant for learning. To run Pinot in cluster mode, take a look at Manual cluster setup.

Download Apache Pinot

First, let's download the Pinot distribution for this tutorial. You can either build the distribution from source or download a packaged release.

Prerequisites

Install JDK 8 or higher.
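
A quick way to confirm the JDK on your path (this verification step is our addition, not part of the original guide):

java -version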

Build from source or download the distribution

Follow these steps to check out the code from GitHub and build Pinot locally.

Prerequisites

Install Apache Maven 3.6 or higher.

# define the pinot version
PINOT_VERSION=0.3.0

# checkout pinot
git clone https://github.com/apache/incubator-pinot.git
cd incubator-pinot

# build pinot
mvn install package -DskipTests -Pbin-dist

# navigate to directory containing the setup scripts
cd pinot-distribution/target/apache-pinot-incubating-$PINOT_VERSION-bin/apache-pinot-incubating-$PINOT_VERSION-bin

Note that the Pinot scripts are located under pinot-distribution/target, not the target directory under the root.

Alternatively, download the latest binary release from Apache Pinot, or use this command:

wget https://downloads.apache.org/incubator/pinot/apache-pinot-incubating-$PINOT_VERSION/apache-pinot-incubating-$PINOT_VERSION-bin.tar.gz

Once you have the tar file:

# untar it
tar -zxvf apache-pinot-incubating-$PINOT_VERSION-bin.tar.gz

# navigate to directory containing the launcher scripts
cd apache-pinot-incubating-$PINOT_VERSION-bin

Setting up a Pinot cluster

We'll be using a quick-start script, which does the following:

  1. Sets up the Pinot cluster QuickStartCluster

  2. Creates a sample table and loads sample data

There are 3 kinds of quick start:

Batch

Batch quick start creates the Pinot cluster, creates an offline table baseballStats and pushes sample offline data to the table.

bin/quick-start-batch.sh

That's it! We've spun up a Pinot cluster. You can continue playing with other types of quick start, or simply head on over to the Pinot Data Explorer to check out the data in the baseballStats table.

Streaming

Streaming quick start sets up a Kafka cluster and pushes sample data to a Kafka topic. Then, it creates the Pinot cluster and creates a realtime table meetupRSVP which ingests data from the Kafka topic.

# stop previous quick start cluster, if any
bin/quick-start-streaming.sh

We now have a Pinot cluster with a realtime table! You can head over to the Pinot Data Explorer to check out the data in the meetupRSVP table.

Hybrid

Hybrid quick start sets up a Kafka cluster and pushes sample data to a Kafka topic. Then, it creates the Pinot cluster and creates a hybrid table airlineStats. The realtime table ingests data from the Kafka topic. Lastly, sample data is pushed into the offline table.

# stop previous quick start cluster, if any
bin/quick-start-hybrid.sh

Let's head over to the Pinot Data Explorer to check out the data we pushed to the airlineStats table.

Getting started

This section contains quick start guides to help you get up and running with Pinot.

Running Pinot

We want your experience getting started with Pinot to be both low effort and high reward. Here you'll find a collection of quick start guides that contain starter distributions of the Pinot platform.

Running Pinot locally
Running Pinot in Docker
Running Pinot in Kubernetes

Bootstrapping a cluster

Deploy to a public cloud

Running on Azure
Running on GCP
Running on AWS

How to setup a Pinot cluster

This video will show you a step-by-step walk through for launching the individual components of Pinot and scaling them to multiple instances. This is an excellent resource for developers and operators that want to understand setting up each component and debugging a cluster. Neha Pawar from the Apache Pinot team shows you how to set up a Pinot cluster.

You can find the commands that are shown in this video on GitHub: https://github.com/npawar/pinot-tutorial

We also have a step-by-step guide for manually setting up a Pinot cluster using Docker or shell scripts: see Manual cluster setup.

Data import examples

Getting data into Pinot is easy. Take a look at these two quick start guides which will help you get up and running with sample data for offline and real-time tables.

Batch import example
Stream ingestion example

Frequent questions

This page has a collection of frequently asked questions with answers from the community.

This is a list of frequent questions most often asked in our troubleshooting channel on Slack. Please feel free to contribute your questions and answers here and make a pull request.

Ingestion

How do I flatten my JSON Kafka stream?

We have the toJsonStr(key) function, which can store a top-level JSON field as a STRING in Pinot.

Then you can use the jsonExtractScalar(JSON_STRING_FIELD, JSON_PATH, OUTPUT_FORMAT) function at query time to fetch the desired field from the JSON string. For example:

Select jsonExtractScalar(myJsonMapStr,'$.k1','STRING')
    from myTable
    where jsonExtractScalar(myJsonMapStr,'$.k1','STRING') = 'value-k1-0'

Select sum(jsonExtractScalar(complexMapStr,'$.k4.met','INT'))
    from myTable
    group by jsonExtractScalar(complexMapStr,'$.k1','STRING')

NOTE: This works well if some of your fields are nested JSON, but most of your fields are top-level JSON keys. If all of your fields are within a nested JSON key, you will have to store the entire payload as one column, which is not ideal.

Support for flattening during ingestion is on the roadmap: https://github.com/apache/incubator-pinot/issues/5264

Indexing

How to set inverted indexes?

Inverted indexes are set in the tableConfig's tableIndexConfig -> invertedIndexColumns list. Here's the documentation for tableIndexConfig: https://docs.pinot.apache.org/basics/components/table#tableindexconfig-1 along with a sample table that has inverted indexes set on some columns.
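
For instance, a minimal sketch of the relevant table config section (the column names here are hypothetical; the field shape matches the sample table configs later in this document):

"tableIndexConfig": {
  "invertedIndexColumns": ["colA", "colB"],
  "loadMode": "MMAP"
}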

Applying inverted indexes to a table config will generate an inverted index for all new segments. In order to apply the inverted indexes to all existing segments, follow the steps in How to apply inverted index to existing setup?

How to apply inverted index to existing setup?

  1. Add the columns you wish to index to the tableIndexConfig -> invertedIndexColumns list. This sample table config shows inverted indexes set: https://docs.pinot.apache.org/basics/components/table#offline-table-config. To update the table config, use the Pinot Swagger API: http://localhost:9000/help#!/Table/updateTableConfig

  2. Invoke the reload API: http://localhost:9000/help#!/Segment/reloadAllSegments
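
As a sketch, invoking that reload endpoint with curl might look like this (table name hypothetical; the exact path can vary across Pinot versions, so confirm it in your controller's Swagger UI):

curl -X POST "http://localhost:9000/tables/myTable/segments/reload"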

Right now, there's no easy way to confirm that the reload succeeded. One way is to check the index_map file inside the segment metadata; you should see inverted index entries for the new columns. An API for this is coming soon: https://github.com/apache/incubator-pinot/issues/5390

How to apply star tree index?
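
As a hedged sketch, a star-tree index is declared in the tableIndexConfig's starTreeIndexConfigs list; the field names below follow the StarTreeV2 builder config visible in the batch import sample output (split order, function-column pairs, max leaf records), while the columns and aggregation chosen here are hypothetical:

"tableIndexConfig": {
  "starTreeIndexConfigs": [{
    "dimensionsSplitOrder": ["studentID", "firstName"],
    "skipStarNodeCreationForDimensions": [],
    "functionColumnPairs": ["COUNT__*"],
    "maxLeafRecords": 1
  }]
}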

Querying

What are all the fields in the Pinot query's JSON response?

Here's the page explaining the Pinot response format: https://docs.pinot.apache.org/users/api/querying-pinot-using-standard-sql/response-format
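
For quick reference, a sketch of the response shape (abridged; the linked page has the authoritative field list, and the values here are made up):

{
  "resultTable": {
    "dataSchema": {
      "columnNames": ["colA", "cnt"],
      "columnDataTypes": ["STRING", "LONG"]
    },
    "rows": [["value1", 42]]
  },
  "exceptions": [],
  "numDocsScanned": 4,
  "totalDocs": 4,
  "timeUsedMs": 5
}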

SQL Query fails with "Encountered 'timestamp' was expecting one of..."

"timestamp" is a reserved keyword in SQL. Escape timestamp with double quotes:

select "timestamp" from myTable

Other commonly encountered reserved keywords are date, time, table.
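
These can be escaped the same way; for example, assuming the table has columns named date and time:

select "date", "time" from myTable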

Filtering on STRING column WHERE column = "foo" does not work?

For filtering on STRING columns, use single quotes:

SELECT COUNT(*) from myTable WHERE column = 'foo'

ORDER BY using an alias doesn't work?

The fields in the ORDER BY clause must be one of the group by clauses or aggregations, BEFORE applying the alias. Therefore, this will not work:

SELECT count(colA) as aliasA, colA from tableA GROUP BY colA ORDER BY aliasA

Instead, this will work:

SELECT count(colA) as sumA, colA from tableA GROUP BY colA ORDER BY count(colA)

Operations

Can I change a column name in my table, without losing data?

How to change number of replicas of a table?

You can change the number of replicas by updating the table config's segmentsConfig section. Make sure you have at least as many servers as the replication.

For OFFLINE table, update replication:

{
    "tableName": "pinotTable",
    "tableType": "OFFLINE",
    "segmentsConfig": {
      "replication": "3",
      ...
    }
    ..

For REALTIME table, update replicasPerPartition:

{
    "tableName": "pinotTable",
    "tableType": "REALTIME",
    "segmentsConfig": {
      "replicasPerPartition": "3",
      ...
    }
    ..

After changing the replication, run a table rebalance.

How to run a rebalance on a table?

A rebalance is run to reassign all the segments of a table to the available servers. This is typically done when capacity changes occur, i.e. when adding servers to or removing servers from a table.

Offline

Use the rebalance API from the Swagger APIs on the controller (http://localhost:9000/help#!/Table/rebalance), with tableType OFFLINE.

Realtime

Use the rebalance API from the Swagger APIs on the controller (http://localhost:9000/help#!/Table/rebalance), with tableType REALTIME. A realtime table has 2 components, the consuming segments and the completed segments. By default, only the completed segments will be rebalanced. The consuming segments will pick the right assignment once they complete. But you can force the consuming segments to also be included in the rebalance by setting the param includeConsuming to true. Note that rebalancing the consuming segments means the consuming segments will drop the data consumed so far and restart consumption from the last offset, which may lead to a short duration of data staleness.
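
As a sketch, a rebalance call with curl might look like the following (table name hypothetical; parameter names as listed in the Swagger UI linked above, so confirm them for your version):

curl -X POST "http://localhost:9000/tables/myTable/rebalance?type=REALTIME&includeConsuming=true"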

You can check the status of the rebalance by

  1. Checking the controller logs

  2. Running rebalance again after a while; you should receive status "status": "NO_OP"

  3. Checking the External View of the table, to see that the changes in capacity/replicas have taken effect

Tuning and Optimizations

Do replica groups work for real-time?

Yes, replica groups work for realtime. There are 2 parts to enabling replica groups:

  1. Replica groups segment assignment

  2. Replica group query routing

Replica group segment assignment

Replica group segment assignment is achieved in realtime if the number of servers is a multiple of the number of replicas. The partitions get uniformly sprayed across the servers, creating replica groups.

For example, consider that we have 6 partitions, 2 replicas, and 4 servers.

Partition | r1 | r2
p1        | S0 | S1
p2        | S2 | S3
p3        | S0 | S1
p4        | S2 | S3
p5        | S0 | S1
p6        | S2 | S3

As you can see, the set (S0, S2) contains r1 of every partition, and (S1, S3) contains r2 of every partition. The query will only be routed to one of the sets, and not span every server. If you are adding/removing servers from an existing table setup, you have to run a rebalance for segment assignment changes to take effect.

Replica group query routing

Once replica group segment assignment is in effect, the query routing can take advantage of it. For replica group based query routing, set the following in the table config's routing section, and then restart the brokers:

{
    "tableName": "pinotTable",
    "tableType": "REALTIME",
    "routing": {
        "instanceSelectorType": "replicaGroup"
    }
    ..
}

Public cloud examples

This page contains multiple quick start guides for deploying Pinot to a public cloud provider.

The following quick start guides will show you how to run an Apache Pinot cluster using Kubernetes on different public cloud providers.

Running on Azure
Running on GCP
Running on AWS

Running Pinot in Docker

This quick start guide will show you how to run a Pinot cluster using Docker.

Prerequisites

Install Docker

You can also try the Kubernetes quick start if you already have a local minikube cluster installed or a Docker Kubernetes setup.

Create an isolated bridge network in docker:

docker network create -d bridge pinot-demo

We'll be using our docker image apachepinot/pinot:latest to run this quick start, which does the following:

  • Sets up the Pinot cluster

  • Creates a sample table and loads sample data

There are 3 types of quick start examples.

  • Batch example

  • Streaming example

  • Hybrid example

Batch example

In this example we demonstrate how to do batch processing with Pinot.

  • Starts Pinot deployment by starting

    • Apache Zookeeper

    • Pinot Controller

    • Pinot Broker

    • Pinot Server

  • Creates a demo table

    • baseballStats

  • Launches a standalone data ingestion job

    • Builds one Pinot segment for a given CSV data file for table baseballStats

    • Pushes the built segment to the Pinot controller

  • Issues sample queries to Pinot

docker run \
    --network=pinot-demo \
    --name pinot-quickstart \
    -p 9000:9000 \
    -d apachepinot/pinot:latest QuickStart \
    -type batch

Once the Docker container is running, you can view the logs by running the following command:

docker logs pinot-quickstart -f

That's it! We've spun up a Pinot cluster.

It may take a while for all the Pinot components to start and for the sample data to be loaded.

Use the below command to check the status in the container logs:

docker logs pinot-quickstart -f

Your cluster is ready once you see the cluster setup completion messages and sample queries in the logs.

You can head over to Exploring Pinot to check out the data in the baseballStats table.

Streaming example

In this example we demonstrate how to do stream processing with Pinot.

  • Starts Pinot deployment by starting

    • Apache Kafka

    • Apache Zookeeper

    • Pinot Controller

    • Pinot Broker

    • Pinot Server

  • Creates a demo table

    • meetupRsvp

  • Launches a meetup RSVP stream

  • Publishes data to a Kafka topic meetupRSVPEvents to be subscribed to by Pinot

  • Issues sample queries to Pinot

# stop previous container, if any, or use different network
docker run \
    --network=pinot-demo \
    --name pinot-quickstart \
    -p 9000:9000 \
    -d apachepinot/pinot:latest QuickStart \
    -type stream

Once the cluster is up, you can head over to Exploring Pinot to check out the data in the meetupRSVPEvents table.

Hybrid example

In this example we demonstrate how to do hybrid stream and batch processing with Pinot.

  • Starts Pinot deployment by starting

    • Apache Kafka

    • Apache Zookeeper

    • Pinot Controller

    • Pinot Broker

    • Pinot Server

  • Creates a demo table

    • airlineStats

  • Launches a standalone data ingestion job

    • Builds Pinot segments under a given directory of Avro files for table airlineStats

    • Pushes built segments to Pinot controller

  • Launches a stream of flights stats

  • Publishes data to a Kafka topic airlineStatsEvents to be subscribed to by Pinot

  • Issues sample queries to Pinot

# stop previous container, if any, or use different network
docker run \
    --network=pinot-demo \
    --name pinot-quickstart \
    -p 9000:9000 \
    -d apachepinot/pinot:latest QuickStart \
    -type hybrid

Once the cluster is up, you can head over to Exploring Pinot to check out the data in the airlineStats table.


Running on GCP

This starter provides a quick start for running Pinot on Google Cloud Platform (GCP).

This document provides the basic instructions to set up a Kubernetes Cluster on Google Kubernetes Engine (GKE).

1. Tooling Installation

1.1 Install Kubectl

Please follow this link (https://kubernetes.io/docs/tasks/tools/install-kubectl) to install kubectl.

For Mac User

brew install kubernetes-cli

Please check kubectl version after installation:

kubectl version

QuickStart scripts are tested under kubectl client version v1.16.3 and server version v1.13.12.

1.2 Install Helm

Please follow this link (https://helm.sh/docs/using_helm/#installing-helm) to install helm.

For Mac User

brew install kubernetes-helm

Please check helm version after installation:

helm version

This QuickStart provides helm support for helm v3.0.0 and v2.12.1. Please pick the script based on your helm version.

1.3 Install Google Cloud SDK

Please follow this link (https://cloud.google.com/sdk/install) to install Google Cloud SDK.

1.3.1 For Mac User

  • Install Google Cloud SDK

curl https://sdk.cloud.google.com | bash

  • Restart your shell

exec -l $SHELL

2. (Optional) Initialize Google Cloud Environment

gcloud init

3. (Optional) Create a Kubernetes cluster(GKE) in Google Cloud

The script below will create a 3-node cluster named pinot-quickstart in us-west1-b with n1-standard-2 machines for demo purposes.

Please modify the parameters in the example command below:

GCLOUD_PROJECT=[your gcloud project name]
GCLOUD_ZONE=us-west1-b
GCLOUD_CLUSTER=pinot-quickstart
GCLOUD_MACHINE_TYPE=n1-standard-2
GCLOUD_NUM_NODES=3
gcloud container clusters create ${GCLOUD_CLUSTER} \
  --num-nodes=${GCLOUD_NUM_NODES} \
  --machine-type=${GCLOUD_MACHINE_TYPE} \
  --zone=${GCLOUD_ZONE} \
  --project=${GCLOUD_PROJECT}

You can monitor the cluster status with this command:

gcloud compute instances list

Once the cluster is in RUNNING status, it's ready to be used.

4. Connect to an existing cluster

Simply run the below command to get the credentials for the cluster pinot-quickstart that you just created or for your existing cluster.

GCLOUD_PROJECT=[your gcloud project name]
GCLOUD_ZONE=us-west1-b
GCLOUD_CLUSTER=pinot-quickstart
gcloud container clusters get-credentials ${GCLOUD_CLUSTER} --zone ${GCLOUD_ZONE} --project ${GCLOUD_PROJECT}

To verify the connection, you can run:

kubectl get nodes

5. Pinot Quickstart

Please follow this Kubernetes QuickStart to deploy your Pinot Demo.

6. Delete a Kubernetes Cluster

GCLOUD_ZONE=us-west1-b
gcloud container clusters delete pinot-quickstart --zone=${GCLOUD_ZONE}

Running on Azure

This starter guide provides a quick start for running Pinot on Microsoft Azure.

This document provides the basic instructions to set up a Kubernetes Cluster on Azure Kubernetes Service (AKS).

1. Tooling Installation

1.1 Install Kubectl

Please follow this link (https://kubernetes.io/docs/tasks/tools/install-kubectl) to install kubectl.

For Mac User

brew install kubernetes-cli

Please check kubectl version after installation:

kubectl version

QuickStart scripts are tested under kubectl client version v1.16.3 and server version v1.13.12.

1.2 Install Helm

Please follow this link (https://helm.sh/docs/using_helm/#installing-helm) to install helm.

For Mac User

brew install kubernetes-helm

Please check helm version after installation:

helm version

This QuickStart provides helm support for helm v3.0.0 and v2.12.1. Please pick the script based on your helm version.

1.3 Install Azure CLI

Please follow this link (https://docs.microsoft.com/en-us/cli/azure/install-azure-cli?view=azure-cli-latest) to install Azure CLI.

For Mac User

brew update && brew install azure-cli

2. (Optional) Login to your Azure account

The script below will open your default browser to sign in to your Azure account:

az login

3. (Optional) Create a Resource Group

The script below will create a resource group in location eastus:

AKS_RESOURCE_GROUP=pinot-demo
AKS_RESOURCE_GROUP_LOCATION=eastus
az group create --name ${AKS_RESOURCE_GROUP} \
                --location ${AKS_RESOURCE_GROUP_LOCATION}

4. (Optional) Create a Kubernetes cluster(AKS) in Azure

The script below will create a 3-node cluster named pinot-quickstart for demo purposes.

Please modify the parameters in the example command below:

AKS_RESOURCE_GROUP=pinot-demo
AKS_CLUSTER_NAME=pinot-quickstart
az aks create --resource-group ${AKS_RESOURCE_GROUP} \
              --name ${AKS_CLUSTER_NAME} \
              --node-count 3

Once the command succeeds, the cluster is ready to be used.

5. Connect to an existing cluster

Simply run the below command to get the credentials for the cluster pinot-quickstart that you just created or for your existing cluster.

AKS_RESOURCE_GROUP=pinot-demo
AKS_CLUSTER_NAME=pinot-quickstart
az aks get-credentials --resource-group ${AKS_RESOURCE_GROUP} \
                       --name ${AKS_CLUSTER_NAME}

To verify the connection, you can run:

kubectl get nodes

6. Pinot Quickstart

Please follow this Kubernetes QuickStart to deploy your Pinot Demo.

7. Delete a Kubernetes Cluster

AKS_RESOURCE_GROUP=pinot-demo
AKS_CLUSTER_NAME=pinot-quickstart
az aks delete --resource-group ${AKS_RESOURCE_GROUP} \
              --name ${AKS_CLUSTER_NAME}

Manual cluster setup

This quick start guide will show you how to set up a Pinot cluster manually.

Start Pinot components (scripts or docker images)

Running on AWS

This guide provides a quick start for running Pinot on Amazon Web Services (AWS).

This document provides the basic instructions to set up a Kubernetes Cluster on Amazon Elastic Kubernetes Service (Amazon EKS).

1. Tooling Installation

1.1 Install Kubectl

Please follow this link (https://kubernetes.io/docs/tasks/tools/install-kubectl) to install kubectl.

For Mac User

brew install kubernetes-cli

Please check kubectl version after installation:

kubectl version

QuickStart scripts are tested under kubectl client version v1.16.3 and server version v1.13.12.

1.2 Install Helm

Please follow this link (https://helm.sh/docs/using_helm/#installing-helm) to install helm.

For Mac User

brew install kubernetes-helm

Please check helm version after installation:

helm version

This QuickStart provides helm support for helm v3.0.0 and v2.12.1. Please pick the script based on your helm version.

1.3 Install AWS CLI

Please follow this link (https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-install.html#install-tool-bundled) to install AWS CLI.

For Mac User

curl "https://d1vvhvl2y92vvt.cloudfront.net/awscli-exe-macos.zip" -o "awscliv2.zip"
unzip awscliv2.zip
sudo ./aws/install

1.4 Install Eksctl

Please follow this link (https://docs.aws.amazon.com/eks/latest/userguide/eksctl.html#installing-eksctl) to install eksctl.

For Mac User

brew tap weaveworks/tap
brew install weaveworks/tap/eksctl

2. (Optional) Login to your AWS account

For first-time AWS users, please register your account at https://aws.amazon.com/.

Once you have created the account, go to AWS Identity and Access Management (IAM) to create a user and create access keys under the Security Credentials tab.

aws configure

Note: The environment variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY will override the AWS configuration stored in the file ~/.aws/credentials.

3. (Optional) Create a Kubernetes cluster(EKS) in AWS

The script below will create a 3-node cluster named pinot-quickstart in us-west-2 with t3.small machines for demo purposes.

Please modify the parameters in the example command below:

EKS_CLUSTER_NAME=pinot-quickstart
eksctl create cluster \
--name ${EKS_CLUSTER_NAME} \
--version 1.14 \
--region us-west-2 \
--nodegroup-name standard-workers \
--node-type t3.small \
--nodes 3 \
--nodes-min 3 \
--nodes-max 4 \
--node-ami auto

You can monitor the cluster status with this command:

EKS_CLUSTER_NAME=pinot-quickstart
aws eks describe-cluster --name ${EKS_CLUSTER_NAME}

Once the cluster is in ACTIVE status, it's ready to be used.

4. Connect to an existing cluster

Simply run the below command to get the credentials for the cluster pinot-quickstart that you just created or for your existing cluster.

EKS_CLUSTER_NAME=pinot-quickstart
aws eks update-kubeconfig --name ${EKS_CLUSTER_NAME}

To verify the connection, you can run:

kubectl get nodes

5. Pinot Quickstart

Please follow this Kubernetes QuickStart to deploy your Pinot Demo.

6. Delete a Kubernetes Cluster

EKS_CLUSTER_NAME=pinot-quickstart
aws eks delete-cluster --name ${EKS_CLUSTER_NAME}

Start Pinot Components using docker

Pull docker image

You can try out the pre-built Pinot all-in-one docker image.

export PINOT_VERSION=0.3.0-SNAPSHOT
export PINOT_IMAGE=apachepinot/pinot:${PINOT_VERSION}
docker pull ${PINOT_IMAGE}

(Optional) You can also follow the instructions here to build your own images.

0. Create a Network

Create an isolated bridge network in docker:

docker network create -d bridge pinot-demo

1. Start Zookeeper

Start Zookeeper in daemon mode. This is a single-node Zookeeper setup. Zookeeper is the central metadata store for Pinot and should be set up with replication for production use. See https://zookeeper.apache.org/doc/r3.6.0/zookeeperStarted.html#sc_RunningReplicatedZooKeeper for more information.

docker run \
    --network=pinot-demo \
    --name  pinot-zookeeper \
    --restart always \
    -p 2181:2181 \
    -d zookeeper:3.5.6

Start ZKUI to browse Zookeeper data at http://localhost:9090:

docker run --rm -ti \
    --network pinot-demo --name=zkui \
    -p 9090:9090 \
    -e ZK_SERVER=pinot-zookeeper:2181 \
    -d qnib/plain-zkui:latest

Alternately, you can use Zooinspector.

2. Start Pinot Controller

Start Pinot Controller in daemon mode and connect to Zookeeper.

docker run --rm -ti \
    --network=pinot-demo \
    --name pinot-controller \
    -p 9000:9000 \
    -d ${PINOT_IMAGE} StartController \
    -zkAddress pinot-zookeeper:2181

3. Start Pinot Broker

Start Pinot Broker in daemon mode and connect to Zookeeper.

docker run --rm -ti \
    --network=pinot-demo \
    --name pinot-broker \
    -d ${PINOT_IMAGE} StartBroker \
    -zkAddress pinot-zookeeper:2181

4. Start Pinot Server

Start Pinot Server in daemon mode and connect to Zookeeper.

docker run --rm -ti \
    --network=pinot-demo \
    --name pinot-server \
    -d ${PINOT_IMAGE} StartServer \
    -zkAddress pinot-zookeeper:2181

5. Start Kafka

Optionally, you can also start Kafka for setting up realtime streams. This brings up the Kafka broker on port 9092.

docker run --rm -ti \
    --network pinot-demo --name=kafka \
    -e KAFKA_ZOOKEEPER_CONNECT=pinot-zookeeper:2181/kafka \
    -e KAFKA_BROKER_ID=0 \
    -e KAFKA_ADVERTISED_HOST_NAME=kafka \
    -d wurstmeister/kafka:latest

Now all Pinot related components are started as an empty cluster.

You can run the below command to check container status:

docker container ls -a

Sample Console Output

CONTAINER ID        IMAGE                       COMMAND                  CREATED             STATUS              PORTS                                                  NAMES
9ec20e4463fa        wurstmeister/kafka:latest   "start-kafka.sh"         43 minutes ago      Up 43 minutes                                                              kafka
0775f5d8d6bf        apachepinot/pinot:latest    "./bin/pinot-admin.s…"   44 minutes ago      Up 44 minutes       8096-8099/tcp, 9000/tcp                                pinot-server
64c6392b2e04        apachepinot/pinot:latest    "./bin/pinot-admin.s…"   44 minutes ago      Up 44 minutes       8096-8099/tcp, 9000/tcp                                pinot-broker
b6d0f2bd26a3        apachepinot/pinot:latest    "./bin/pinot-admin.s…"   45 minutes ago      Up 45 minutes       8096-8099/tcp, 0.0.0.0:9000->9000/tcp                  pinot-controller
570416fc530e        zookeeper:3.5.6             "/docker-entrypoint.…"   45 minutes ago      Up 45 minutes       2888/tcp, 3888/tcp, 0.0.0.0:2181->2181/tcp, 8080/tcp   pinot-zookeeper

Prerequisites

Follow the instructions in Getting Pinot to get Pinot.

Start Pinot components via launcher scripts

1. Start Zookeeper

cd apache-pinot-incubating-${PINOT_VERSION}-bin
bin/pinot-admin.sh StartZookeeper \
  -zkPort 2191

You can use Zooinspector to browse the Zookeeper instance.

2. Start Pinot Controller

bin/pinot-admin.sh StartController \
    -zkAddress localhost:2191 \
    -controllerPort 9000

3. Start Pinot Broker

bin/pinot-admin.sh StartBroker \
    -zkAddress localhost:2191

4. Start Pinot Server

bin/pinot-admin.sh StartServer \
    -zkAddress localhost:2191

5. Start Kafka

bin/pinot-admin.sh StartKafka \
  -zkAddress=localhost:2191/kafka \
  -port 19092

Now all Pinot related components are started as an empty cluster.

Now it's time to start adding data to the cluster. Check out some of the Recipes, or follow the Batch upload sample data and Stream sample data guides for instructions on loading your own data.


Stream ingestion example

The Docker instructions on this page are still WIP.

So far, we have set up our cluster, run some queries on the demo tables and explored the admin endpoints. We also uploaded some sample batch data for the transcript table.

Now, it's time to ingest from a sample stream into Pinot.

Data Stream

First, we need to set up a stream. Pinot has out-of-the-box realtime ingestion support for Kafka. Other streams can be plugged in; more details in Pluggable Streams.

Let's set up a demo Kafka cluster locally, and create a sample topic transcript-topic.

Using Docker:

Start Kafka

docker run \
    --network pinot-demo --name=kafka \
    -e KAFKA_ZOOKEEPER_CONNECT=pinot-quickstart:2123/kafka \
    -e KAFKA_BROKER_ID=0 \
    -e KAFKA_ADVERTISED_HOST_NAME=kafka \
    -d wurstmeister/kafka:latest

Create a Kafka Topic

docker exec \
  -t kafka \
  /opt/kafka/bin/kafka-topics.sh \
  --zookeeper pinot-quickstart:2123/kafka \
  --partitions=1 --replication-factor=1 \
  --create --topic transcript-topic

Using launcher scripts:

Start Kafka

Start the Kafka cluster on port 9876 using the same Zookeeper from the quick-start examples:

bin/pinot-admin.sh StartKafka -zkAddress=localhost:2123/kafka -port 9876

Create a Kafka topic

Download the latest Kafka. Create a topic:

bin/kafka-topics.sh --create --bootstrap-server localhost:9876 --replication-factor 1 --partitions 1 --topic transcript-topic

Creating a Schema

If you followed the Batch upload sample data guide, you have already pushed a schema for your sample table. If not, head over to Creating a schema on that page to learn how to create a schema for your sample data.

Creating a table config

If you followed Batch upload sample data, you learnt how to push an offline table and schema. Similar to the offline table config, we will create a realtime table config for the sample. Here's the realtime table config for the transcript table. For a more detailed overview about tables, check out Table.

/tmp/pinot-quick-start/transcript-table-realtime.json

{
  "tableName": "transcript",
  "tableType": "REALTIME",
  "segmentsConfig": {
    "timeColumnName": "timestamp",
    "timeType": "MILLISECONDS",
    "schemaName": "transcript",
    "replicasPerPartition": "1"
  },
  "tenants": {},
  "tableIndexConfig": {
    "loadMode": "MMAP",
    "streamConfigs": {
      "streamType": "kafka",
      "stream.kafka.consumer.type": "lowlevel",
      "stream.kafka.topic.name": "transcript-topic",
      "stream.kafka.decoder.class.name": "org.apache.pinot.plugin.stream.kafka.KafkaJSONMessageDecoder",
      "stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
      "stream.kafka.broker.list": "localhost:9876",
      "realtime.segment.flush.threshold.time": "3600000",
      "realtime.segment.flush.threshold.size": "50000",
      "stream.kafka.consumer.prop.auto.offset.reset": "smallest"
    }
  },
  "metadata": {
    "customConfigs": {}
  }
}

Uploading your schema and table config

Now that we have our table and schema, let's upload them to the cluster. As soon as the realtime table is created, it will begin ingesting from the Kafka topic.

Using Docker:

docker run \
    --network=pinot-demo \
    -v /tmp/pinot-quick-start:/tmp/pinot-quick-start \
    --name pinot-streaming-table-creation \
    apachepinot/pinot:latest AddTable \
    -schemaFile /tmp/pinot-quick-start/transcript-schema.json \
    -tableConfigFile /tmp/pinot-quick-start/transcript-table-realtime.json \
    -controllerHost pinot-quickstart \
    -controllerPort 9000 \
    -exec

Using launcher scripts:

bin/pinot-admin.sh AddTable \
    -schemaFile /tmp/pinot-quick-start/transcript-schema.json \
    -tableConfigFile /tmp/pinot-quick-start/transcript-table-realtime.json \
    -exec

Loading sample data into stream

Here's a JSON file with sample data for the transcript table:

/tmp/pinot-quick-start/rawData/transcript.json

{"studentID":205,"firstName":"Natalie","lastName":"Jones","gender":"Female","subject":"Maths","score":3.8,"timestamp":1571900400000}
{"studentID":205,"firstName":"Natalie","lastName":"Jones","gender":"Female","subject":"History","score":3.5,"timestamp":1571900400000}
{"studentID":207,"firstName":"Bob","lastName":"Lewis","gender":"Male","subject":"Maths","score":3.2,"timestamp":1571900400000}
{"studentID":207,"firstName":"Bob","lastName":"Lewis","gender":"Male","subject":"Chemistry","score":3.6,"timestamp":1572418800000}
{"studentID":209,"firstName":"Jane","lastName":"Doe","gender":"Female","subject":"Geography","score":3.8,"timestamp":1572505200000}
{"studentID":209,"firstName":"Jane","lastName":"Doe","gender":"Female","subject":"English","score":3.5,"timestamp":1572505200000}
{"studentID":209,"firstName":"Jane","lastName":"Doe","gender":"Female","subject":"Maths","score":3.2,"timestamp":1572678000000}
{"studentID":209,"firstName":"Jane","lastName":"Doe","gender":"Female","subject":"Physics","score":3.6,"timestamp":1572678000000}
{"studentID":211,"firstName":"John","lastName":"Doe","gender":"Male","subject":"Maths","score":3.8,"timestamp":1572678000000}
{"studentID":211,"firstName":"John","lastName":"Doe","gender":"Male","subject":"English","score":3.5,"timestamp":1572678000000}
{"studentID":211,"firstName":"John","lastName":"Doe","gender":"Male","subject":"History","score":3.2,"timestamp":1572854400000}
{"studentID":212,"firstName":"Nick","lastName":"Young","gender":"Male","subject":"History","score":3.6,"timestamp":1572854400000}

Push the sample JSON into the Kafka topic, using the Kafka console producer from the Kafka download:

bin/kafka-console-producer.sh \
    --broker-list localhost:9876 \
    --topic transcript-topic < /tmp/pinot-quick-start/rawData/transcript.json

Ingesting streaming data

As soon as data flows into the stream, the Pinot table will consume it and it will be ready for querying. Head over to the Query Console to check out the realtime data.


Running Pinot in Kubernetes

Pinot quick start in Kubernetes

1. Prerequisites

This quick start assumes the existence of a Kubernetes cluster. Please follow the links below to set up your Kubernetes cluster.

  • Enable Kubernetes on Docker-Desktop

  • Install Minikube for local setup

  • Setup a Kubernetes Cluster using Amazon Elastic Kubernetes Service (Amazon EKS)

  • Setup a Kubernetes Cluster using Google Kubernetes Engine (GKE)

  • Setup a Kubernetes Cluster using Azure Kubernetes Service (AKS)

2. Setting up a Pinot cluster in Kubernetes

Before continuing, please make sure that you've downloaded Apache Pinot. The scripts for the setup in this guide can be found in our open source project on GitHub.

The scripts can be found in the Pinot source at ./incubator-pinot/kubernetes/helm

# checkout pinot
git clone https://github.com/apache/incubator-pinot.git
cd incubator-pinot/kubernetes/helm

2.1 Start Pinot with Helm

The Pinot repo has pre-packaged Helm charts for Pinot and Presto. The Helm repo index file is here.

2.1.1 Update helm dependency

helm dependency update

2.1.2 Start Pinot with Helm

For Helm v2.12.1

If your Kubernetes cluster is recently provisioned, ensure Helm is initialized by running:

helm init --service-account tiller

Then deploy a new HA Pinot cluster using the following command:

helm install --namespace "pinot-quickstart" --name "pinot" .

For Helm v3.0.0

kubectl create ns pinot-quickstart
helm install -n pinot-quickstart pinot .

You can also install from the packaged Helm repo:

helm repo add pinot https://raw.githubusercontent.com/apache/incubator-pinot/master/kubernetes/helm
kubectl create ns pinot-quickstart
helm install pinot pinot/pinot \
    -n pinot-quickstart \
    --set cluster.name=pinot \
    --set server.replicaCount=2

2.1.3 Troubleshooting (For helm v2.12.1)

Error: Please run the below command if you encounter the following issue:

Error: could not find tiller.

Resolution:

kubectl -n kube-system delete deployment tiller-deploy
kubectl -n kube-system delete service/tiller-deploy
helm init --service-account tiller

Error: Please run the command below if you encounter a permission issue:

Error: release pinot failed: namespaces "pinot-quickstart" is forbidden: User "system:serviceaccount:kube-system:default" cannot get resource "namespaces" in API group "" in the namespace "pinot-quickstart"

Resolution:

kubectl apply -f helm-rbac.yaml

2.2 Check Pinot deployment status

kubectl get all -n pinot-quickstart

3. Load data into Pinot using Kafka

3.1 Bring up a Kafka cluster for real-time data ingestion

For Helm v3.0.0:

helm repo add incubator http://storage.googleapis.com/kubernetes-charts-incubator
helm install -n pinot-quickstart kafka incubator/kafka --set replicas=1

For Helm v2.12.1:

helm repo add incubator http://storage.googleapis.com/kubernetes-charts-incubator
helm install --namespace "pinot-quickstart"  --name kafka incubator/kafka

3.2 Check Kafka deployment status

kubectl get all -n pinot-quickstart | grep kafka

Ensure the Kafka deployment is ready before executing the scripts in the following steps. You should see pods like:

pod/kafka-0                                          1/1     Running     0          2m
pod/kafka-zookeeper-0                                       1/1     Running     0          10m
pod/kafka-zookeeper-1                                       1/1     Running     0          9m
pod/kafka-zookeeper-2                                       1/1     Running     0          8m

3.3 Create Kafka topics

The scripts below will create two Kafka topics for data ingestion:

kubectl -n pinot-quickstart exec kafka-0 -- kafka-topics --zookeeper kafka-zookeeper:2181 --topic flights-realtime --create --partitions 1 --replication-factor 1
kubectl -n pinot-quickstart exec kafka-0 -- kafka-topics --zookeeper kafka-zookeeper:2181 --topic flights-realtime-avro --create --partitions 1 --replication-factor 1

3.4 Load data into Kafka and create Pinot schema/tables

The script below will deploy 3 batch jobs:

  • Ingest 19492 JSON messages to Kafka topic flights-realtime at a speed of 1 msg/sec

  • Ingest 19492 Avro messages to Kafka topic flights-realtime-avro at a speed of 1 msg/sec

  • Upload Pinot schema airlineStats

  • Create Pinot table airlineStats to ingest data from JSON encoded Kafka topic flights-realtime

  • Create Pinot table airlineStatsAvro to ingest data from Avro encoded Kafka topic flights-realtime-avro

kubectl apply -f pinot-realtime-quickstart.yml

4. Query using Pinot Data Explorer

4.1 Pinot Data Explorer

Please use the script below to perform local port-forwarding, which will also open the Pinot query console in your default web browser.

This script can be found in the Pinot source at ./incubator-pinot/kubernetes/helm

./query-pinot-data.sh

5. Using Superset to query Pinot

5.1 Bring up Superset

kubectl apply -f superset.yaml

5.2 (First time) Set up Admin account

kubectl exec -it pod/superset-0 -n pinot-quickstart -- bash -c 'flask fab create-admin'

5.3 (First time) Init Superset

kubectl exec -it pod/superset-0 -n pinot-quickstart -- bash -c 'superset db upgrade'
kubectl exec -it pod/superset-0 -n pinot-quickstart -- bash -c 'superset init'

5.4 Load Demo data source

kubectl exec -it pod/superset-0 -n pinot-quickstart -- bash -c 'superset import_datasources -p /etc/superset/pinot_example_datasource.yaml'
kubectl exec -it pod/superset-0 -n pinot-quickstart -- bash -c 'superset import_dashboards -p /etc/superset/pinot_example_dashboard.json'

5.5 Access Superset UI

You can run the below command to open Superset in your browser with the previous admin credentials.

./open-superset-ui.sh

You can open the imported dashboard by clicking the Dashboards banner and then clicking on AirlineStats.

6. Access Pinot using Presto

6.1 Deploy Presto using Pinot plugin

You can run the command below to deploy a customized Presto with the Pinot plugin installed:

helm install presto pinot/presto -n pinot

or apply the Presto coordinator manifest from the same directory:

kubectl apply -f presto-coordinator.yaml

6.2 Query Presto using Presto CLI

Once Presto is deployed, you can run the command below.

./pinot-presto-cli.sh

6.3 Sample queries to execute

List all catalogs:

presto:default> show catalogs;
 Catalog
---------
 pinot
 system
(2 rows)

Query 20191112_050827_00003_xkm4g, FINISHED, 1 node
Splits: 19 total, 19 done (100.00%)
0:01 [0 rows, 0B] [0 rows/s, 0B/s]

List all tables:

presto:default> show tables;
    Table
--------------
 airlinestats
(1 row)

Query 20191112_050907_00004_xkm4g, FINISHED, 1 node
Splits: 19 total, 19 done (100.00%)
0:01 [1 rows, 29B] [1 rows/s, 41B/s]

Show schema:

presto:default> DESCRIBE pinot.dontcare.airlinestats;
        Column        |  Type   | Extra | Comment
----------------------+---------+-------+---------
 flightnum            | integer |       |
 origin               | varchar |       |
 quarter              | integer |       |
 lateaircraftdelay    | integer |       |
 divactualelapsedtime | integer |       |
......

Query 20191112_051021_00005_xkm4g, FINISHED, 1 node
Splits: 19 total, 19 done (100.00%)
0:02 [80 rows, 6.06KB] [35 rows/s, 2.66KB/s]

Count total documents:

presto:default> select count(*) as cnt from pinot.dontcare.airlinestats limit 10;
 cnt
------
 9745
(1 row)

Query 20191112_051114_00006_xkm4g, FINISHED, 1 node
Splits: 17 total, 17 done (100.00%)
0:00 [1 rows, 8B] [2 rows/s, 19B/s]

7. Deleting the Pinot cluster in Kubernetes

kubectl delete ns pinot-quickstart


Batch import example

Step-by-step guide on pushing your own data into the Pinot cluster.

So far, we have set up our cluster, run some queries and explored the admin endpoints. Now, it's time to get our own data into Pinot.

Preparing your data

Let's gather our data files and put them in pinot-quick-start/rawdata.

mkdir -p /tmp/pinot-quick-start/rawdata

Supported file formats are CSV, JSON, AVRO, PARQUET, THRIFT and ORC. If you don't have sample data, you can use this sample CSV:

/tmp/pinot-quick-start/rawdata/transcript.csv

studentID,firstName,lastName,gender,subject,score,timestamp
200,Lucy,Smith,Female,Maths,3.8,1570863600000
200,Lucy,Smith,Female,English,3.5,1571036400000
201,Bob,King,Male,Maths,3.2,1571900400000
202,Nick,Young,Male,Physics,3.6,1572418800000

Creating a schema

A schema is used to define the columns and data types of the Pinot table. A detailed overview of the schema can be found in Schema.

Briefly, we categorize our columns into 3 types:

Column Type | Description
Dimensions  | Typically used in filters and group by, for slicing and dicing into data
Metrics     | Typically used in aggregations, represents the quantitative data
Time        | Optional column, represents the timestamp associated with each row

For example, in our sample table, the playerID, yearID, teamID, league, playerName columns are the dimensions; the playerStint, numberOfgames, numberOfGamesAsBatter, AtBatting, runs, hits, doubles, triples, homeRuns, runsBattedIn, stolenBases, caughtStealing, baseOnBalls, strikeouts, intentionalWalks, hitsByPitch, sacrificeHits, sacrificeFlies, groundedIntoDoublePlays, G_old columns are the metrics; and there is no time column.

Once you have identified the dimensions, metrics and time columns, create a schema for your data, using the reference below.

/tmp/pinot-quick-start/transcript-schema.json

{
  "schemaName": "transcript",
  "dimensionFieldSpecs": [
    {
      "name": "studentID",
      "dataType": "INT"
    },
    {
      "name": "firstName",
      "dataType": "STRING"
    },
    {
      "name": "lastName",
      "dataType": "STRING"
    },
    {
      "name": "gender",
      "dataType": "STRING"
    },
    {
      "name": "subject",
      "dataType": "STRING"
    }
  ],
  "metricFieldSpecs": [
    {
      "name": "score",
      "dataType": "FLOAT"
    }
  ],
  "dateTimeFieldSpecs": [{
    "name": "timestamp",
    "dataType": "LONG",
    "format" : "1:MILLISECONDS:EPOCH",
    "granularity": "1:MILLISECONDS"
  }]
}

Creating a table config

A table config is used to define the config related to the Pinot table. A detailed overview of the table can be found in Table.

Here's the table config for the sample CSV file. You can use this as a reference to build your own table config. Simply edit the tableName and schemaName.

/tmp/pinot-quick-start/transcript-table-offline.json

{
  "tableName": "transcript",
  "segmentsConfig" : {
    "timeColumnName": "timestamp",
    "timeType": "MILLISECONDS",
    "replication" : "1",
    "schemaName" : "transcript"
  },
  "tableIndexConfig" : {
    "invertedIndexColumns" : [],
    "loadMode"  : "MMAP"
  },
  "tenants" : {
    "broker":"DefaultTenant",
    "server":"DefaultTenant"
  },
  "tableType":"OFFLINE",
  "metadata": {}
}

Uploading your table config and schema

Check the directory structure so far:

$ ls /tmp/pinot-quick-start
rawdata            transcript-schema.json    transcript-table-offline.json

$ ls /tmp/pinot-quick-start/rawdata
transcript.csv

Upload the table config using the following command.

Using Docker:

docker run --rm -ti \
    --network=pinot-demo \
    -v /tmp/pinot-quick-start:/tmp/pinot-quick-start \
    --name pinot-batch-table-creation \
    apachepinot/pinot:latest AddTable \
    -schemaFile /tmp/pinot-quick-start/transcript-schema.json \
    -tableConfigFile /tmp/pinot-quick-start/transcript-table-offline.json \
    -controllerHost pinot-quickstart \
    -controllerPort 9000 -exec

Using launcher scripts:

bin/pinot-admin.sh AddTable \
  -tableConfigFile /tmp/pinot-quick-start/transcript-table-offline.json \
  -schemaFile /tmp/pinot-quick-start/transcript-schema.json -exec

Check out the table config and schema in the Rest API to make sure they were successfully uploaded.

Creating a segment

A Pinot table's data is stored as Pinot segments. A detailed overview of the segment can be found in Segment.

To generate a segment, we need to first create a job spec yaml file. The JobSpec yaml file has all the information regarding the data format, input data location and Pinot cluster coordinates. You can just copy over this job spec file. If you're using your own data, be sure to 1) replace transcript with your table name and 2) set the right recordReaderSpec.

Using Docker:

/tmp/pinot-quick-start/docker-job-spec.yml

executionFrameworkSpec:
  name: 'standalone'
  segmentGenerationJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner'
  segmentTarPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentTarPushJobRunner'
  segmentUriPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentUriPushJobRunner'
jobType: SegmentCreationAndTarPush
inputDirURI: '/tmp/pinot-quick-start/rawdata/'
includeFileNamePattern: 'glob:**/*.csv'
outputDirURI: '/tmp/pinot-quick-start/segments/'
overwriteOutput: true
pinotFSSpecs:
  - scheme: file
    className: org.apache.pinot.spi.filesystem.LocalPinotFS
recordReaderSpec:
  dataFormat: 'csv'
  className: 'org.apache.pinot.plugin.inputformat.csv.CSVRecordReader'
  configClassName: 'org.apache.pinot.plugin.inputformat.csv.CSVRecordReaderConfig'
tableSpec:
  tableName: 'transcript'
  schemaURI: 'http://pinot-quickstart:9000/tables/transcript/schema'
  tableConfigURI: 'http://pinot-quickstart:9000/tables/transcript'
pinotClusterSpecs:
  - controllerURI: 'http://pinot-quickstart:9000'

Using launcher scripts:

/tmp/pinot-quick-start/batch-job-spec.yml

executionFrameworkSpec:
  name: 'standalone'
  segmentGenerationJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner'
  segmentTarPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentTarPushJobRunner'
  segmentUriPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentUriPushJobRunner'
jobType: SegmentCreationAndTarPush
inputDirURI: '/tmp/pinot-quick-start/rawdata/'
includeFileNamePattern: 'glob:**/*.csv'
outputDirURI: '/tmp/pinot-quick-start/segments/'
overwriteOutput: true
pinotFSSpecs:
  - scheme: file
    className: org.apache.pinot.spi.filesystem.LocalPinotFS
recordReaderSpec:
  dataFormat: 'csv'
  className: 'org.apache.pinot.plugin.inputformat.csv.CSVRecordReader'
  configClassName: 'org.apache.pinot.plugin.inputformat.csv.CSVRecordReaderConfig'
tableSpec:
  tableName: 'transcript'
  schemaURI: 'http://localhost:9000/tables/transcript/schema'
  tableConfigURI: 'http://localhost:9000/tables/transcript'
pinotClusterSpecs:
  - controllerURI: 'http://localhost:9000'

Use the following command to generate a segment and upload it.

Using Docker:

docker run --rm -ti \
    --network=pinot-demo \
    -v /tmp/pinot-quick-start:/tmp/pinot-quick-start \
    --name pinot-data-ingestion-job \
    apachepinot/pinot:latest LaunchDataIngestionJob \
    -jobSpecFile /tmp/pinot-quick-start/docker-job-spec.yml

Using launcher scripts:

bin/pinot-admin.sh LaunchDataIngestionJob \
    -jobSpecFile /tmp/pinot-quick-start/batch-job-spec.yml

Sample output:

    SegmentGenerationJobSpec: 
    !!org.apache.pinot.spi.ingestion.batch.spec.SegmentGenerationJobSpec
    excludeFileNamePattern: null
    executionFrameworkSpec: {extraConfigs: null, name: standalone, segmentGenerationJobRunnerClassName: org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner,
      segmentTarPushJobRunnerClassName: org.apache.pinot.plugin.ingestion.batch.standalone.SegmentTarPushJobRunner,
      segmentUriPushJobRunnerClassName: org.apache.pinot.plugin.ingestion.batch.standalone.SegmentUriPushJobRunner}
    includeFileNamePattern: glob:**\/*.csv
    inputDirURI: /tmp/pinot-quick-start/rawdata/
    jobType: SegmentCreationAndTarPush
    outputDirURI: /tmp/pinot-quick-start/segments
    overwriteOutput: true
    pinotClusterSpecs:
    - {controllerURI: 'http://localhost:9000'}
    pinotFSSpecs:
    - {className: org.apache.pinot.spi.filesystem.LocalPinotFS, configs: null, scheme: file}
    pushJobSpec: null
    recordReaderSpec: {className: org.apache.pinot.plugin.inputformat.csv.CSVRecordReader,
      configClassName: org.apache.pinot.plugin.inputformat.csv.CSVRecordReaderConfig,
      configs: null, dataFormat: csv}
    segmentNameGeneratorSpec: null
    tableSpec: {schemaURI: 'http://localhost:9000/tables/transcript/schema', tableConfigURI: 'http://localhost:9000/tables/transcript',
      tableName: transcript}
    
    Trying to create instance for class org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner
    Initializing PinotFS for scheme file, classname org.apache.pinot.spi.filesystem.LocalPinotFS
    Finished building StatsCollector!
    Collected stats for 4 documents
    Using fixed bytes value dictionary for column: studentID, size: 9
    Created dictionary for STRING column: studentID with cardinality: 3, max length in bytes: 3, range: 200 to 202
    Using fixed bytes value dictionary for column: firstName, size: 12
    Created dictionary for STRING column: firstName with cardinality: 3, max length in bytes: 4, range: Bob to Nick
    Using fixed bytes value dictionary for column: lastName, size: 15
    Created dictionary for STRING column: lastName with cardinality: 3, max length in bytes: 5, range: King to Young
    Created dictionary for FLOAT column: score with cardinality: 4, range: 3.2 to 3.8
    Using fixed bytes value dictionary for column: gender, size: 12
    Created dictionary for STRING column: gender with cardinality: 2, max length in bytes: 6, range: Female to Male
    Using fixed bytes value dictionary for column: subject, size: 21
    Created dictionary for STRING column: subject with cardinality: 3, max length in bytes: 7, range: English to Physics
    Created dictionary for LONG column: timestamp with cardinality: 4, range: 1570863600000 to 1572418800000
    Start building IndexCreator!
    Finished records indexing in IndexCreator!
    Finished segment seal!
    Converting segment: /var/folders/3z/qn6k60qs6ps1bb6s2c26gx040000gn/T/pinot-1583443148720/output/transcript_OFFLINE_1570863600000_1572418800000_0 to v3 format
    v3 segment location for segment: transcript_OFFLINE_1570863600000_1572418800000_0 is /var/folders/3z/qn6k60qs6ps1bb6s2c26gx040000gn/T/pinot-1583443148720/output/transcript_OFFLINE_1570863600000_1572418800000_0/v3
    Deleting files in v1 segment directory: /var/folders/3z/qn6k60qs6ps1bb6s2c26gx040000gn/T/pinot-1583443148720/output/transcript_OFFLINE_1570863600000_1572418800000_0
    Starting building 1 star-trees with configs: [StarTreeV2BuilderConfig[splitOrder=[studentID, firstName],skipStarNodeCreation=[],functionColumnPairs=[org.apache.pinot.core.startree.v2.AggregationFunctionColumnPair@3a48efdc],maxLeafRecords=1]] using OFF_HEAP builder
    Starting building star-tree with config: StarTreeV2BuilderConfig[splitOrder=[studentID, firstName],skipStarNodeCreation=[],functionColumnPairs=[org.apache.pinot.core.startree.v2.AggregationFunctionColumnPair@3a48efdc],maxLeafRecords=1]
    Generated 3 star-tree records from 4 segment records
    Finished constructing star-tree, got 9 tree nodes and 4 records under star-node
    Finished creating aggregated documents, got 6 aggregated records
    Finished building star-tree in 10ms
    Finished building 1 star-trees in 27ms
    Computed crc = 3454627653, based on files [/var/folders/3z/qn6k60qs6ps1bb6s2c26gx040000gn/T/pinot-1583443148720/output/transcript_OFFLINE_1570863600000_1572418800000_0/v3/columns.psf, /var/folders/3z/qn6k60qs6ps1bb6s2c26gx040000gn/T/pinot-1583443148720/output/transcript_OFFLINE_1570863600000_1572418800000_0/v3/index_map, /var/folders/3z/qn6k60qs6ps1bb6s2c26gx040000gn/T/pinot-1583443148720/output/transcript_OFFLINE_1570863600000_1572418800000_0/v3/metadata.properties, /var/folders/3z/qn6k60qs6ps1bb6s2c26gx040000gn/T/pinot-1583443148720/output/transcript_OFFLINE_1570863600000_1572418800000_0/v3/star_tree_index, /var/folders/3z/qn6k60qs6ps1bb6s2c26gx040000gn/T/pinot-1583443148720/output/transcript_OFFLINE_1570863600000_1572418800000_0/v3/star_tree_index_map]
    Driver, record read time : 0
    Driver, stats collector time : 0
    Driver, indexing time : 0
    Tarring segment from: /var/folders/3z/qn6k60qs6ps1bb6s2c26gx040000gn/T/pinot-1583443148720/output/transcript_OFFLINE_1570863600000_1572418800000_0 to: /var/folders/3z/qn6k60qs6ps1bb6s2c26gx040000gn/T/pinot-1583443148720/output/transcript_OFFLINE_1570863600000_1572418800000_0.tar.gz
    Size for segment: transcript_OFFLINE_1570863600000_1572418800000_0, uncompressed: 6.73KB, compressed: 1.89KB
    Trying to create instance for class org.apache.pinot.plugin.ingestion.batch.standalone.SegmentTarPushJobRunner
    Initializing PinotFS for scheme file, classname org.apache.pinot.spi.filesystem.LocalPinotFS
    Start pushing segments: [/tmp/pinot-quick-start/segments/transcript_OFFLINE_1570863600000_1572418800000_0.tar.gz]... to locations: [org.apache.pinot.spi.ingestion.batch.spec.PinotClusterSpec@243c4f91] for table transcript
    Pushing segment: transcript_OFFLINE_1570863600000_1572418800000_0 to location: http://localhost:9000 for table transcript
    Sending request: http://localhost:9000/v2/segments?tableName=transcript to controller: nehas-mbp.hsd1.ca.comcast.net, version: Unknown
    Response for pushing table transcript segment transcript_OFFLINE_1570863600000_1572418800000_0 to location http://localhost:9000 - 200: {"status":"Successfully uploaded segment: transcript_OFFLINE_1570863600000_1572418800000_0 of table: transcript"}

Check that your segment made it to the table using the Rest API.

Querying your data

You're all set! You should see your table in the Query Console and be able to run queries against it now:

select * from transcript