Pinot Minion is a new component which leverages the Helix Task Framework. It can be attached to an existing Pinot cluster and then executes tasks provided by the controller. It is a generic, single place for running background jobs, and it helps offload computationally intensive tasks, such as adding indexes to segments and merging segments, from other components.
// coming soon
This page contains guides related to importing data from Apache Kafka using stream ingestion.
This guide shows you how to import a CSV file of records into Pinot.
This guide is a work in progress.
We're actively working on improving our documentation. This doc will be available very soon. Please check back in a day or two for more details.
VERSION=0.3.0
wget https://downloads.apache.org/incubator/pinot/apache-pinot-incubating-$VERSION/apache-pinot-incubating-$VERSION-bin.tar.gz
tar vxf apache-pinot-incubating-*-bin.tar.gz
cd apache-pinot-incubating-*-bin
bin/quick-start-batch.sh
This guide shows you how to import data from HDFS.
This guide is a work in progress.
We're actively working on improving our documentation. This doc will be available very soon. Please check back in a day or two for more details.
VERSION=0.3.0
wget https://downloads.apache.org/incubator/pinot/apache-pinot-incubating-$VERSION/apache-pinot-incubating-$VERSION-bin.tar.gz
tar vxf apache-pinot-incubating-*-bin.tar.gz
cd apache-pinot-incubating-*-bin
bin/quick-start-batch.sh
This guide shows you how to import records from a Parquet file into Pinot.
This guide is a work in progress.
We're actively working on improving our documentation. This doc will be available very soon. Please check back in a day or two for more details.
VERSION=0.3.0
wget https://downloads.apache.org/incubator/pinot/apache-pinot-incubating-$VERSION/apache-pinot-incubating-$VERSION-bin.tar.gz
tar vxf apache-pinot-incubating-*-bin.tar.gz
cd apache-pinot-incubating-*-bin
bin/quick-start-batch.sh
This section contains a collection of guides that will show you how to import data from a Pinot supported input format.
This section contains a collection of short guides to show you how to import from a Pinot supported file system.
Learn about the different components and logical abstractions
This page contains multiple quick start guides for deploying Pinot to a public cloud provider.
The following quick start guides will show you how to run an Apache Pinot cluster using Kubernetes on different public cloud providers.
This guide shows you how to import data from files stored in Azure Data Lake Storage (ADLS)
How to turn on the water valve
There are two ways to get data ingested into Pinot:
In your Pinot controller/server configuration, you will need to provide the following configs:
pinot.controller.segment.fetcher.hdfs.hadoop.conf.path=<file path to hadoop conf folder>
or
pinot.server.segment.fetcher.hdfs.hadoop.conf.path=<file path to hadoop conf folder>
This path should point to the local folder containing core-site.xml and hdfs-site.xml files from your Hadoop installation.
pinot.controller.segment.fetcher.hdfs.hadoop.kerberos.principle=<your kerberos principal>
pinot.controller.segment.fetcher.hdfs.hadoop.kerberos.keytab=<your kerberos keytab>
or
pinot.server.segment.fetcher.hdfs.hadoop.kerberos.principle=<your kerberos principal>
pinot.server.segment.fetcher.hdfs.hadoop.kerberos.keytab=<your kerberos keytab>
These two configs should be the corresponding Kerberos configuration if your Hadoop installation is secured with Kerberos. Please check the Hadoop Kerberos guide on how to generate Kerberos security identification.
You will also need to provide proper Hadoop dependency jars from your Hadoop installation to your Pinot startup scripts.
To push HDFS segment files to the Pinot controller, you just need to ensure you have proper Hadoop configuration as mentioned in the previous part. Then your remote segment creation/push job can send the HDFS path of your newly created segment files to the Pinot Controller and let it download the files.
For example, the following curl request to the Controller will notify it to download segment files to the proper table:
curl -X POST -H "UPLOAD_TYPE:URI" -H "DOWNLOAD_URI:hdfs://nameservice1/hadoop/path/to/segment/file.gz" -H "content-type:application/json" -d '' localhost:9000/segments
VERSION=0.3.0
wget https://downloads.apache.org/incubator/pinot/apache-pinot-incubating-$VERSION/apache-pinot-incubating-$VERSION-bin.tar.gz
tar vxf apache-pinot-incubating-*-bin.tar.gz
cd apache-pinot-incubating-*-bin
bin/quick-start-batch.sh
VERSION=0.3.0
wget https://downloads.apache.org/incubator/pinot/apache-pinot-incubating-$VERSION/apache-pinot-incubating-$VERSION-bin.tar.gz
tar vxf apache-pinot-incubating-*-bin.tar.gz
cd apache-pinot-incubating-*-bin
bin/quick-start-batch.sh
VERSION=0.3.0
wget https://downloads.apache.org/incubator/pinot/apache-pinot-incubating-$VERSION/apache-pinot-incubating-$VERSION-bin.tar.gz
tar vxf apache-pinot-incubating-*-bin.tar.gz
cd apache-pinot-incubating-*-bin
bin/quick-start-batch.sh
This guide shows you how to import records into Pinot using the ORC file format.
This guide is a work in progress.
We're actively working on improving our documentation. This doc will be available very soon. Please check back in a day or two for more details.
VERSION=0.3.0
wget https://downloads.apache.org/incubator/pinot/apache-pinot-incubating-$VERSION/apache-pinot-incubating-$VERSION-bin.tar.gz
tar vxf apache-pinot-incubating-*-bin.tar.gz
cd apache-pinot-incubating-*-bin
bin/quick-start-batch.sh
This guide shows you how to import records into Pinot using the Avro file format.
This guide is a work in progress.
We're actively working on improving our documentation. This doc will be available very soon. Please check back in a day or two for more details.
VERSION=0.3.0
wget https://downloads.apache.org/incubator/pinot/apache-pinot-incubating-$VERSION/apache-pinot-incubating-$VERSION-bin.tar.gz
tar vxf apache-pinot-incubating-*-bin.tar.gz
cd apache-pinot-incubating-*-bin
bin/quick-start-batch.sh
This section contains articles that provide technical and implementation details of Pinot features
This guide shows you how to ingest a stream of records from an Apache Kafka topic into a Pinot table.
This guide is a work in progress.
We're actively working on improving our documentation. This doc will be available very soon. Please check back in a day or two for more details.
VERSION=0.3.0
wget https://downloads.apache.org/incubator/pinot/apache-pinot-incubating-$VERSION/apache-pinot-incubating-$VERSION-bin.tar.gz
tar vxf apache-pinot-incubating-*-bin.tar.gz
cd apache-pinot-incubating-*-bin
bin/quick-start-batch.sh
Here you will find a collection of ready-made sample applications and examples for real-world data
These properties for the stream implementation are to be set in your controller and server configurations.
In your controller and server configs, please set the FS class you would like to support: set pinot.controller.storage.factory.class.${YOUR_URI_SCHEME} to the full path of the FS class you would like to include.
You also need to configure pinot.controller.local.temp.dir for the local dir on the controller machine.
For filesystem specific configs, you can pass in the following with either the pinot.controller prefix or the pinot.server prefix.
All the following configs need to be prefixed with storage.factory.
AzurePinotFS requires the following configs according to your environment:
adl.accountId, adl.authEndpoint, adl.clientId, adl.clientSecret
Sample Controller Config
"pinot.controller.storage.factory.class.adl": "org.apache.pinot.filesystem.AzurePinotFS"
"pinot.controller.storage.factory.adl.accountId": "xxxx"
"pinot.controller.storage.factory.adl.authEndpoint": "xxxx"
"pinot.controller.storage.factory.adl.clientId": "xxxx"
"pinot.controller.storage.factory.adl.clientId": "xxxx"
"pinot.controller.segment.fetcher.protocols": "adl"
Sample Server Config
"pinot.server.storage.factory.class.adl": "org.apache.pinot.filesystem.AzurePinotFS"
"pinot.server.storage.factory.adl.accountId": "xxxx"
"pinot.server.storage.factory.adl.authEndpoint": "xxxx"
"pinot.server.storage.factory.adl.clientId": "xxxx"
"pinot.server.storage.factory.adl.clientId": "xxxx"
"pinot.server.segment.fetcher.protocols": "adl"
You can find the parameters in your account as follows: https://stackoverflow.com/questions/56349040/what-is-clientid-authtokenendpoint-clientkey-for-accessing-azure-data-lake
Please also make sure to set the following config with the value “adl”
"segment.fetcher.protocols" : "adl"
To see how to upload segments to different storage systems, check ../segment_fetcher.rst.
HadoopPinotFS requires the following configs according to your environment:
hadoop.kerberos.principle, hadoop.kerberos.keytab, hadoop.conf.path
Please make sure to also set the following config with the value “hdfs”
"segment.fetcher.protocols" : "hdfs"
When Pinot segment files are created in external systems (Hadoop/Spark/etc.), there are several ways to push the data to the Pinot Controller and Server:
Push segments to a shared NFS and let Pinot pull segment files from that NFS location.
Push segments to a Web server and let Pinot pull segment files from the Web server with an http/https link.
Push segments to HDFS and let Pinot pull segment files from HDFS with an hdfs location URI.
Push segments to another system and implement your own segment fetcher to pull data from that system.
The first two options are supported out of the box with the Pinot package. As long as your remote jobs send the Pinot controller the corresponding URI to the files, it will pick up the files and allocate them to the proper Pinot Servers and brokers. To enable Pinot support for HDFS, you will need to provide Pinot Hadoop configuration and proper Hadoop dependencies.
Learn about the various components of Pinot and terminologies used to describe data stored in Pinot
Pinot is designed to deliver low latency queries on large datasets. In order to achieve this performance, Pinot stores data in a columnar format and adds additional indices to perform fast filtering, aggregation and group by.
Raw data is broken into small data shards and each shard is converted into a unit known as a segment. One or more segments together form a table, which is the logical container for querying Pinot using SQL/PQL.
Pinot uses a variety of terms which can refer to either abstractions that model the storage of data or infrastructure components that drive the functionality of the system.
Similar to traditional databases, Pinot has the concept of a table—a logical abstraction to refer to a collection of related data. As is the case with RDBMS, a table is a construct that consists of columns and rows (documents) that are queried using SQL. A table is associated with a schema which defines the columns in a table as well as their data types.
As opposed to RDBMS schemas, multiple tables can be created in Pinot (real-time or batch) that inherit a single schema definition. Tables are independently configured for concerns such as indexing strategies, partitioning, tenants, data sources, and/or replication.
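For illustration, a minimal schema might look like the sketch below; the schema name, columns and data types are placeholders for illustration only, not anything defined elsewhere in this guide:
{
  "schemaName": "myTable",
  "dimensionFieldSpecs": [
    { "name": "country", "dataType": "STRING" }
  ],
  "metricFieldSpecs": [
    { "name": "clicks", "dataType": "LONG" }
  ],
  "timeFieldSpec": {
    "incomingGranularitySpec": { "name": "daysSinceEpoch", "dataType": "INT", "timeType": "DAYS" }
  }
}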
Pinot has a distributed systems architecture that scales horizontally. Pinot expects the size of a table to grow infinitely over time. In order to achieve this, all data needs to be distributed across multiple nodes. Pinot achieves this by breaking data into smaller chunks known as segments (this is similar to shards/partitions in HA relational databases). Segments can also be seen as time-based partitions.
In order to support multi-tenancy, Pinot has first class support for tenants. A table is associated with a tenant. This allows all tables belonging to a particular logical namespace to be grouped under a single tenant name and isolated from other tenants. This isolation between tenants provides different namespaces for applications and teams to prevent sharing tables or schemas. Development teams building applications will never have to operate an independent deployment of Pinot. An organization can operate a single cluster and scale it out as new tenants increase the overall volume of queries. Developers can manage their own schemas and tables without being impacted by any other tenant on a cluster.
By default, all tables belong to a default tenant named "default". The concept of tenants is very important, as it satisfies the architectural principle of a "database per service/application" without having to operate many independent data stores. Further, tenants schedule resources so that a table's segments (shards) reside only on a specified set of nodes. Similar to the kind of isolation that is ubiquitously used in Linux containers, compute resources in Pinot can be scheduled to prevent resource contention between tenants.
Logically, a cluster is simply a group of tenants. As with the classical definition of a cluster, it is also a grouping of a set of compute nodes. Typically, there is only one cluster per environment/data center. There is no need to create multiple clusters since Pinot supports the concept of tenants. At LinkedIn, the largest Pinot cluster consists of 1000+ nodes distributed across a data center. Nodes can be added to a cluster in a way that linearly increases the performance and availability of queries. The number of nodes and the compute resources per node will reliably predict the QPS for a Pinot cluster, and as such, capacity planning can be easily achieved using SLAs that assert performance expectations for end-user applications.
A Pinot cluster is comprised of multiple distributed system components. These components are useful to understand for operators that are monitoring system usage or are debugging an issue with a cluster deployment.
Controller
Server
Broker
Minion (optional)
The benefits of scale that make Pinot linearly scalable for an unbounded number of nodes are made possible through its integration with Apache Zookeeper and Apache Helix.
A controller is the core orchestrator that drives the consistency and routing in a Pinot cluster. Controllers are horizontally scaled as an independent component (container) and have visibility of the state of all other components in a cluster. The controller reacts and responds to state changes in the system and schedules the allocation of resources for tables, segments, or nodes. As mentioned earlier, Helix is embedded within the controller as an agent that is a participant responsible for observing and driving state changes that are subscribed to by other components.
In addition to cluster management, resource allocation, and scheduling, the controller is also the HTTP gateway for REST API administration of a Pinot deployment. A web-based query console is also provided for operators to quickly and easily run SQL/PQL queries.
A broker receives queries from a client and routes their execution to one or more Pinot servers before returning a consolidated response.
Servers host segments (shards) that are scheduled and allocated across multiple nodes and routed on an assignment to a tenant (there is a single tenant by default). Servers are independent containers that scale horizontally and are notified by Helix through state changes driven by the controller. A server can either be a real-time server or an offline server.
Real-time and offline servers have very different resource usage requirements, where real-time servers are continually consuming new messages from external systems (such as Kafka topics) that are ingested and allocated on segments of a tenant. Because of this, resource isolation can be used to prioritize high-throughput real-time data streams that are ingested and then made available for query through a broker.
Pinot minion is an optional component that can be used to run background tasks such as "purge" for GDPR (General Data Protection Regulation). As Pinot is an immutable aggregate store, records containing sensitive private data need to be purged on a request-by-request basis. Minion provides a solution for this purpose that complies with GDPR while optimizing Pinot segments and building additional indices that guarantee performance even in the presence of data deletion. One can also write a custom task that runs on a periodic basis. While it's possible to perform these tasks on the Pinot servers directly, having a separate process (Minion) lessens the overall degradation of query latency as segments are impacted by mutable writes.
Brokers are the components that handle Pinot queries. They accept queries from clients and forward them to the right servers. They collect results back from the servers and consolidate them into a single response, which is sent back to the client.
Pinot Brokers are modeled as Spectators. They need to know the location of each segment of a table (and each replica of the segments) and route requests to the appropriate server that hosts the segments of the table being queried. The broker ensures that all the rows of the table are queried exactly once so as to return correct, consistent results for a query. The brokers may optimize to prune some of the segments as long as accuracy is not sacrificed. Helix provides the framework by which spectators can learn the location in which each partition of a resource (i.e. participant) resides. The brokers use this mechanism to learn the servers that host specific segments of a table.
In case of hybrid tables, the brokers ensure that the overlap between realtime and offline segment data is queried exactly once, by performing offline and realtime federation. Let's take this example: we have realtime data for 5 days - March 23 to March 27 - and offline data has been pushed until Mar 25, which is 2 days behind realtime. The brokers maintain this time boundary.
Suppose we get a query to this table: select sum(metric) from table. The broker will split the query into 2 queries based on this time boundary - one for offline and one for realtime. This query becomes select sum(metric) from table_REALTIME where date >= Mar 25 and select sum(metric) from table_OFFLINE where date < Mar 25.
The broker then merges results from both these queries before returning to the client.
Make sure you've set up Zookeeper. If you're using Docker, make sure to pull the Pinot Docker image. To start a broker
docker run \
--network=pinot-demo \
--name pinot-broker \
-d ${PINOT_IMAGE} StartBroker \
-zkAddress pinot-zookeeper:2181
bin/pinot-admin.sh StartBroker \
-zkAddress localhost:2181 \
-clusterName PinotCluster \
-brokerPort 7000
Servers host the data segments and serve queries off the data they host. There are two types of servers:
Offline Offline servers are responsible for downloading segments from the segment store, to host and serve queries off. When a new segment is uploaded to the controller, the controller decides the servers (as many as the replication factor) that will host the new segment and notifies them to download the segment from the segment store. On receiving this notification, the servers download the segment file and load the segment onto the server, to serve queries off them.
Realtime Real time servers directly ingest from a real time stream (such as Kafka, EventHubs). Periodically, they make segments of the in-memory ingested data, based on certain thresholds. These segments are then persisted onto the segment store.
Pinot Servers are modeled as Helix Participants, hosting Pinot tables (referred to as resources in helix terminology). Segments of a table are modeled as Helix partitions (of a resource). Thus, a Pinot server hosts one or more helix partitions of one or more helix resources (i.e. one or more segments of one or more tables).
Make sure you've set up Zookeeper. If you're using Docker, make sure to pull the Pinot Docker image. To start a server
Usage: StartServer
-serverHost <String> : Host name for controller. (required=false)
-serverPort <int> : Port number to start the server at. (required=false)
-serverAdminPort <int> : Port number to serve the server admin API at. (required=false)
-dataDir <string> : Path to directory containing data. (required=false)
-segmentDir <string> : Path to directory containing segments. (required=false)
-zkAddress <http> : Http address of Zookeeper. (required=false)
-clusterName <String> : Pinot cluster name. (required=false)
-configFileName <Config File Name> : Server Starter Config file. (required=false)
-help : Print this message. (required=false)
docker run \
--network=pinot-demo \
--name pinot-server \
-d ${PINOT_IMAGE} StartServer \
-zkAddress pinot-zookeeper:2181
bin/pinot-admin.sh StartServer \
-zkAddress localhost:2181
USAGE
Usage: StartServer
-serverHost <String> : Host name for controller. (required=false)
-serverPort <int> : Port number to start the server at. (required=false)
-serverAdminPort <int> : Port number to serve the server admin API at. (required=false)
-dataDir <string> : Path to directory containing data. (required=false)
-segmentDir <string> : Path to directory containing segments. (required=false)
-zkAddress <http> : Http address of Zookeeper. (required=false)
-clusterName <String> : Pinot cluster name. (required=false)
-configFileName <Config File Name> : Server Starter Config file. (required=false)
-help : Print this message. (required=false)
This section contains quick start guides to help you get up and running with Pinot.
We want your experience getting started with Pinot to be both low effort and high reward. Here you'll find a collection of quick start guides that contain starter distributions of the Pinot platform.
This video will show you a step-by-step walk through for launching the individual components of Pinot and scaling them to multiple instances. This is an excellent resource for developers and operators that want to understand setting up each component and debugging a cluster.
We also have a step-by-step guide for manually setting up a Pinot cluster using Docker or shell scripts.
Getting data into Pinot is easy. Take a look at these two quick start guides which will help you get up and running with sample data for offline and real-time tables.
This section is an overview of the various options for importing data into Pinot.
There are multiple options for importing data into Pinot. These guides are ready-made examples that show you step-by-step instructions for importing records into Pinot, supported by our plugin architecture. These guides are meant to get you up and running with imported data as quickly as possible. Pinot supports multiple file input formats without needing to change anything other than the file name. Each example imports a ready-made dataset so you can see how things work without needing to bring your own dataset.
These guides will show you how to import data from a supported file system.
These guides will show you how to import data from a Pinot supported input format.
This guide will show you how to import data using stream ingestion from Apache Kafka topics.
The following summarizes Pinot's releases, from the latest one to the earliest one.
The Pinot Controller is responsible for a number of things
Controllers maintain the global metadata (e.g. configs and schemas) of the system with the help of Zookeeper which is used as the persistent metadata store.
Controllers host the Helix Controller and are responsible for managing other Pinot components (brokers, servers, minions).
They maintain the mapping of which servers are responsible for which segments. This mapping is used by the servers, to download the portion of the segments that they are responsible for. This mapping is also used by the broker to decide which servers to route the queries to.
Controller has admin endpoints for viewing, creating, updating and deleting configs which help us manage and operate the cluster.
Controllers also have endpoints for segment uploads which are used in offline data pushes. They are responsible for initializing realtime consumption and coordination of persisting the realtime segments into the segment store periodically.
They undertake other management activities such as managing segment retention and validations.
There can be multiple instances of Pinot controller for redundancy. If there are multiple controllers, Pinot expects that all of them are configured with the same back-end storage system so that they have a common view of the segments (e.g. NFS). Pinot can use other storage systems such as HDFS or .
Make sure you've set up Zookeeper. If you're using Docker, make sure to pull the Pinot Docker image. To start a controller
Ways to query Pinot
Pinot can be queried via a broker endpoint as follows. This example assumes the broker is running on localhost:8099.
The Pinot REST API can be accessed by invoking a POST operation with a JSON body containing the parameter sql to the /query/sql endpoint on a broker.
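For example, assuming the broker is running on localhost:8099 as above, a query on the baseballStats table can be posted like this:
curl -H "Content-Type: application/json" -X POST \
  -d '{"sql":"select count(*) from baseballStats"}' \
  http://localhost:8099/query/sql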
Note
This endpoint is deprecated, and will soon be removed. The standard-SQL endpoint is the recommended endpoint.
The PQL endpoint can be accessed by invoking a POST operation with a JSON body containing the parameter pql to the /query endpoint on a broker.
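As a sketch, and again assuming a broker at localhost:8099, the same query can be posted to the deprecated PQL endpoint like this:
curl -H "Content-Type: application/json" -X POST \
  -d '{"pql":"select count(*) from baseballStats"}' \
  http://localhost:8099/query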
Query Console can be used for running ad-hoc queries (a checkbox is available to query the PQL endpoint). The Query Console can be accessed by entering <controller host>:<controller port> in your browser.
You can also query using the pinot-admin scripts. Make sure you follow the instructions in to get Pinot locally, and then
Here's a list of the clients available to query Pinot from your application
Coming soon - JDBC client
First, a note on naming. Pinot has different components, and different ways of representing the data. In particular, data is represented by:
A table is a logical abstraction to refer to a collection of related data. It consists of columns and rows (documents).
Data in a table is divided into (horizontal) shards referred to as segments.
Manages other Pinot components (brokers, servers) and controls the assignment of tables/segments to servers.
Hosts one or more segments and serves queries from those segments.
Accepts queries from clients, routes them to one or more servers, and returns a consolidated response to the client.
Pinot leverages Apache Helix for cluster management. Helix is a cluster management framework to manage replicated, partitioned resources in a distributed system. Helix uses Zookeeper to store cluster state and metadata.
Briefly, Helix divides nodes into three logical components based on their responsibilities:
The nodes that host distributed, partitioned resources.
The nodes that observe the current state of each Participant and use that information to access the resources. Spectators are notified of state changes in the cluster (state of a participant, or that of a partition in a participant).
The node that observes and controls the Participant nodes. It is responsible for coordinating all transitions in the cluster and ensuring that state constraints are satisfied while maintaining cluster stability.
Pinot Controller hosts Helix Controller, in addition to hosting REST APIs for Pinot cluster administration and data ingestion. There can be multiple instances of Pinot controller for redundancy. If there are multiple controllers, Pinot expects that all of them are configured with the same back-end storage system so that they have a common view of the segments (e.g. NFS). Pinot can use other storage systems such as HDFS or .
Pinot Servers are modeled as Helix Participants, hosting Pinot tables (referred to as resources in helix terminology). Segments of a table are modeled as Helix partitions (of a resource). Thus, a Pinot server hosts one or more helix partitions of one or more helix resources (i.e. one or more segments of one or more tables).
Pinot Brokers are modeled as Spectators. They need to know the location of each segment of a table (and each replica of the segments) and route requests to the appropriate server that hosts the segments of the table being queried. The broker ensures that all the rows of the table are queried exactly once so as to return correct, consistent results for a query. The brokers (or servers) may optimize to prune some of the segments as long as accuracy is not sacrificed. In case of hybrid tables, the brokers ensure that the overlap between realtime and offline segment data is queried exactly once. Helix provides the framework by which spectators can learn the location (i.e. participant) in which each partition of a resource resides. The brokers use this mechanism to learn the servers that host specific segments of a table.
You can also implement your own segment fetchers for other file systems and load them into the Pinot system with an external jar. All you need to do is implement a class that extends the SegmentFetcher interface and provide configs to the Pinot Controller and Server as follows:
or
You can also provide other configs to your fetcher under config-root pinot.server.segment.fetcher.<protocol>
By default, Pinot does not come with a storage layer, so the data sent won't be stored in case of a system crash. In order to persistently store the generated segments, you will need a storage layer.
Pinot enables its users to write a PinotFS abstraction layer to store data in a data layer of their choice for realtime and offline segments.
Some examples of storage backends (other than local storage) currently supported are:
If the above two filesystems do not meet your needs, you can extend the current PinotFS abstraction to customize it for your needs.
In order to add a new type of storage backend (say, Amazon S3), implement the following class:
S3FS extends PinotFS
To set up a Pinot cluster, follow these steps:
Controller instances
Broker instances
Server instances
docker run \
--network=pinot-demo \
--name pinot-controller \
-p 9000:9000 \
-d ${PINOT_IMAGE} StartController \
-zkAddress pinot-zookeeper:2181
bin/pinot-admin.sh StartController \
-zkAddress localhost:2181 \
-clusterName PinotCluster \
-controllerPort 9000
pinot.controller.segment.fetcher.<protocol>.class=<class path to your implementation>
pinot.server.segment.fetcher.<protocol>.class=<class path to your implementation>
When Pinot segment files are created in external systems (Hadoop/Spark/etc.), there are several ways to push the data to the Pinot Controller and Server:
Push segments to a shared NFS and let Pinot pull segment files from that NFS location.
Push segments to a Web server and let Pinot pull segment files from the Web server with an http/https link.
Push segments to HDFS and let Pinot pull segment files from HDFS with an hdfs location URI.
Push segments to another system and implement your own segment fetcher to pull data from that system.
The first two options are supported out of the box with the Pinot package. As long as your remote jobs send the Pinot controller the corresponding URI to the files, it will pick up the files and allocate them to the proper Pinot Servers and brokers. To enable Pinot support for HDFS, you will need to provide Pinot Hadoop configuration and proper Hadoop dependencies.
In your Pinot controller/server configuration, you will need to provide the following configs:
pinot.controller.segment.fetcher.hdfs.hadoop.conf.path=<file path to hadoop conf folder>
or
pinot.server.segment.fetcher.hdfs.hadoop.conf.path=<file path to hadoop conf folder>
This path should point to the local folder containing core-site.xml and hdfs-site.xml files from your Hadoop installation.
pinot.controller.segment.fetcher.hdfs.hadoop.kerberos.principle=<your kerberos principal>
pinot.controller.segment.fetcher.hdfs.hadoop.kerberos.keytab=<your kerberos keytab>
or
pinot.server.segment.fetcher.hdfs.hadoop.kerberos.principle=<your kerberos principal>
pinot.server.segment.fetcher.hdfs.hadoop.kerberos.keytab=<your kerberos keytab>
These two configs should be the corresponding Kerberos configuration if your Hadoop installation is secured with Kerberos. Please check the Hadoop Kerberos guide on how to generate Kerberos security identification.
You will also need to provide proper Hadoop dependency jars from your Hadoop installation to your Pinot startup scripts.
To push HDFS segment files to the Pinot controller, you just need to ensure you have proper Hadoop configuration as mentioned in the previous part. Then your remote segment creation/push job can send the HDFS path of your newly created segment files to the Pinot Controller and let it download the files.
For example, the following curl request to the Controller will notify it to download segment files to the proper table:
curl -X POST -H "UPLOAD_TYPE:URI" -H "DOWNLOAD_URI:hdfs://nameservice1/hadoop/path/to/segment/file.gz" -H "content-type:application/json" -d '' localhost:9000/segments
You can also implement your own segment fetchers for other file systems and load them into the Pinot system with an external jar. All you need to do is implement a class that extends the SegmentFetcher interface and provide configs to the Pinot Controller and Server as follows:
pinot.controller.segment.fetcher.<protocol>.class=<class path to your implementation>
or
pinot.server.segment.fetcher.<protocol>.class=<class path to your implementation>
You can also provide other configs to your fetcher under the config root pinot.server.segment.fetcher.<protocol>.
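As a purely hypothetical sketch (the protocol name, class and extra properties below are made-up placeholders, not part of Pinot), a custom fetcher for an s3 protocol could be wired up like this:
pinot.server.segment.fetcher.s3.class=com.example.pinot.S3SegmentFetcher
pinot.server.segment.fetcher.s3.region=us-west-2
pinot.server.segment.fetcher.s3.bucket=my-pinot-segments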
The example here uses the existing org.apache.pinot.filesystem.HadoopPinotFS to store realtime segments in an HDFS filesystem. In the Pinot controller config, add the following new configs:
"controller.data.dir": "SET_TO_YOUR_HDFS_ROOT_DIR"
"controller.local.temp.dir": "SET_TO_A_LOCAL_FILESYSTEM_DIR"
"pinot.controller.storage.factory.class.hdfs": "org.apache.pinot.filesystem.HadoopPinotFS"
"pinot.controller.storage.factory.hdfs.hadoop.conf.path": "SET_TO_YOUR_HDFS_CONFIG_DIR"
"pinot.controller.storage.factory.hdfs.hadoop.kerberos.principle": "SET_IF_YOU_USE_KERBEROS"
"pinot.controller.storage.factory.hdfs.hadoop.kerberos.keytab": "SET_IF_YOU_USE_KERBEROS"
"controller.enable.split.commit": "true"
In the Pinot server config, add the following new configs:
"pinot.server.instance.enable.split.commit": "true"
Note: currently there is a bug in the controller (see issue https://github.com/apache/incubator-pinot/issues/3847); for now you can cherry-pick the PR https://github.com/apache/incubator-pinot/pull/3849, which has already been tested, to fix the issue. The PR is under review now.
Unique counting is a classic problem. Pinot offers multiple ways to solve it, trading off between accuracy and latency.
Functions:
DistinctCount(x) -> LONG
Returns accurate count for all unique values in a column.
The underlying implementation uses an IntOpenHashSet from the library it.unimi.dsi:fastutil:8.2.3 to hold all the unique values.
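For example, with a hypothetical column memberId in a table myTable, an exact unique count looks like:
SELECT DISTINCTCOUNT(memberId) FROM myTable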
Usually it takes a lot of resources and time to compute exact results for unique counting. In some circumstances, users can tolerate a certain error rate, in which case we can use approximation functions to tackle this problem.
HyperLogLog is one approximation algorithm for unique counting. It uses a fixed number of bits to estimate the cardinality of a given data set.
Pinot leverages the HyperLogLog class from the library com.clearspring.analytics:stream:2.7.0 as the data structure to hold intermediate results.
Functions:
DistinctCountHLL(x) -> LONG
For column types INT/LONG/FLOAT/DOUBLE/STRING, Pinot treats each value as an individual entry to add into the HyperLogLog object, then computes the approximation by calling the method cardinality().
For column type BYTES, Pinot treats each value as a serialized HyperLogLog object with pre-aggregated values inside. The bytes value is generated by org.apache.pinot.core.common.ObjectSerDeUtils.HYPER_LOG_LOG_SER_DE.serialize(hyperLogLog). All deserialized HyperLogLog objects are merged into one, and then the method cardinality() is called to get the approximated unique count.
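For example, the approximate counterpart of the query shown earlier (same hypothetical column and table names) is:
SELECT DISTINCTCOUNTHLL(memberId) FROM myTable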
Pinot documentation is powered by Gitbook, and a bi-directional Github integration is set up to back up all the changes.
The git repo is here: https://github.com/pinot-contrib/pinot-docs
For Pinot contributors, there are two main ways to update the documentation.
This follows the traditional way of updating documentation.
You can check out the pinot-docs repo, modify the documentation accordingly, and then submit a pull request for review.
Once the PR is merged, the changes will be automatically applied to the corresponding Gitbook pages.
Please note that all Gitbook documentation follows Markdown Syntax.
Once granted edit permission, contributors can edit any page on Gitbook and then save and merge the changes by themselves. This is one example commit on the Github repo to reflect the updates coming from Gitbook: Adding Update Document Page Commit.
Usually we grant edit permission to committers and active contributors.
Please contact an admin (email [email protected] with the content you want to add) to ask for edit permission for the Pinot Gitbook.
Once granted the permission, you can work directly in the Pinot Gitbook UI to modify the documentation and merge changes.
cd incubator-pinot/pinot-tools/target/pinot-tools-pkg
bin/pinot-admin.sh PostQuery \
-queryType sql \
-brokerPort 8000 \
-query "select count(*) from baseballStats"
2020/03/04 12:46:33.459 INFO [PostQueryCommand] [main] Executing command: PostQuery -brokerHost localhost -brokerPort 8000 -queryType sql -query select count(*) from baseballStats
2020/03/04 12:46:33.854 INFO [PostQueryCommand] [main] Result: {"resultTable":{"dataSchema":{"columnDataTypes":["LONG"],"columnNames":["count(*)"]},"rows":[[97889]]},"exceptions":[],"numServersQueried":1,"numServersResponded":1,"numSegmentsQueried":1,"numSegmentsProcessed":1,"numSegmentsMatched":1,"numConsumingSegmentsQueried":0,"numDocsScanned":97889,"numEntriesScannedInFilter":0,"numEntriesScannedPostFilter":0,"numGroupsLimitReached":false,"totalDocs":97889,"timeUsedMs":185,"segmentStatistics":[],"traceInfo":{},"minConsumingFreshnessTimeMs":0}
A cluster is a set of nodes comprising servers, brokers, controllers and minions.
Pinot leverages Apache Helix for cluster management. Helix is a cluster management framework to manage replicated, partitioned resources in a distributed system. Helix uses Zookeeper to store cluster state and metadata.
Briefly, Helix divides nodes into three logical components based on their responsibilities
The nodes that host distributed, partitioned resources
The nodes that observe the current state of each Participant and use that information to access the resources. Spectators are notified of state changes in the cluster (state of a participant, or that of a partition in a participant).
The node that observes and controls the Participant nodes. It is responsible for coordinating all transitions in the cluster and ensuring that state constraints are satisfied while maintaining cluster stability.
Pinot Servers are modeled as Participants; more details about server nodes can be found in the Server section. Pinot Brokers are modeled as Spectators; more details about broker nodes can be found in the Broker section. Pinot Controllers are modeled as Controllers; more details about controller nodes can be found in the Controller section.
Another way to visualize the cluster is a logical view, wherein a cluster contains tenants, tenants contain tables, and tables contain segments.
Typically, there is only one cluster per environment/data center. There is no need to create multiple Pinot clusters since Pinot supports the concept of tenants. At LinkedIn, the largest Pinot cluster consists of 1000+ nodes.
To setup a Pinot cluster, we need to first start Zookeeper.
Once we've started Zookeeper, we can start other components to join this cluster. If you're using Docker, pull the latest apachepinot/pinot image.
To start other components to join the cluster
Explore your cluster via Pinot Data Explorer.
A tenant is a logical component, defined as a group of server/broker nodes with the same Helix tag.
In order to support multi-tenancy, Pinot has first class support for tenants. Every table is associated with a server tenant and a broker tenant. This controls the nodes that will be used by this table as servers and brokers. This allows all tables belonging to a particular use case to be grouped under a single tenant name.
The concept of tenants is very important when multiple use cases are using Pinot and there is a need to provide quotas or some sort of isolation across tenants. For example, consider we have two tables Table A and Table B in the same Pinot cluster.
We can configure Table A with server tenant Tenant A and Table B with server tenant Tenant B. We can tag some of the server nodes for Tenant A and some for Tenant B. This will ensure that segments of Table A only reside on servers tagged with Tenant A, and segments of Table B only reside on servers tagged with Tenant B. The same isolation can be achieved at the broker level, by configuring broker tenants to the tables.
No need to create separate clusters for every table or use case!
This tenant is defined in the tenants section of the table config. This section contains 2 main fields, broker and server, which decide the tenants used for the broker and server components of this table.
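A minimal sketch of this section is shown below; the tenant names here are placeholders used only for illustration:
"tenants": {
  "broker": "brokerTenantName",
  "server": "serverTenantName"
}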
In the above example:
The table will be served by brokers that have been tagged as brokerTenantName_BROKER in Helix.
If this were an offline table, the offline segments for the table will be hosted on Pinot servers tagged in Helix as serverTenantName_OFFLINE.
If this were a realtime table, the realtime segments (both consuming as well as completed ones) will be hosted on Pinot servers tagged in Helix as serverTenantName_REALTIME.
Here's a sample broker tenant config. This will create a broker tenant sampleBrokerTenant by tagging 3 untagged broker nodes as sampleBrokerTenant_BROKER.
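A sketch of such a broker tenant config, assuming the standard tenant fields, might look like this:
{
  "tenantRole": "BROKER",
  "tenantName": "sampleBrokerTenant",
  "numberOfInstances": 3
}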
To create this tenant use the following command. The creation will fail if the number of untagged broker nodes is less than numberOfInstances.
Follow instructions in to get Pinot locally, and then
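One possible way (a sketch, not necessarily the exact command from the original guide) is to POST the config to the controller's /tenants endpoint, assuming the controller runs on localhost:9000 and the config above is saved as sample-broker-tenant.json:
curl -X POST -H "Content-Type: application/json" \
  -d @sample-broker-tenant.json \
  http://localhost:9000/tenants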
Check out the table config in the to make sure it was successfully uploaded.
Here's a sample server tenant config. This will create a server tenant sampleServerTenant by tagging 1 untagged server node as sampleServerTenant_OFFLINE and 1 untagged server node as sampleServerTenant_REALTIME.
To create this tenant use the following command. The creation will fail if the number of untagged server nodes is less than offlineInstances + realtimeInstances.
Follow instructions in to get Pinot locally, and then
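As with the broker tenant, one possible way (a sketch, assuming a controller at localhost:9000 and the config above saved as sample-server-tenant.json) is:
curl -X POST -H "Content-Type: application/json" \
  -d @sample-server-tenant.json \
  http://localhost:9000/tenants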
Check out the table config in the to make sure it was successfully uploaded.
This starter provides a quick start for running Pinot on Google Cloud Platform (GCP)
This document provides the basic instruction to set up a Kubernetes Cluster on Google Kubernetes Engine (GKE).
Please follow this link (https://kubernetes.io/docs/tasks/tools/install-kubectl) to install kubectl.
For Mac User
Please check kubectl version after installation.
Please follow this link (https://helm.sh/docs/using_helm/#installing-helm) to install helm.
For Mac User
Please check helm version after installation.
Please follow this link () to install Google Cloud SDK.
Install Google Cloud SDK
Restart your shell
The script below will create a 3-node cluster named pinot-quickstart in us-west1-b with n1-standard-2 machines for demo purposes.
Please modify the parameters in the example command below:
You can monitor cluster status by command:
Once the cluster is in RUNNING status, it's ready to be used.
Simply run the command below to get the credentials for the cluster pinot-quickstart that you just created, or for your existing cluster.
To verify the connection, you can run:
Please follow this Kubernetes QuickStart to deploy your Pinot Demo.
This quick start guide will help you bootstrap a Pinot standalone instance on your local machine.
In this guide you'll learn how to download and install Apache Pinot as a standalone instance.
This is a quickstart guide that will show you how to quickly start an example recipe in a standalone instance and is meant for learning. To run Pinot in cluster mode, please take a look at .
First, let's download the Pinot distribution for this tutorial. You can either build the distribution from source or download a packaged release.
Follow these steps to check out the code from the Pinot GitHub repository and build Pinot locally.
Download the latest binary release from the Apache Pinot downloads page, or use this command
Once you have the tar file,
We'll be using a quick-start script, which does the following:
Sets up the Pinot cluster QuickStartCluster
Creates a sample table and loads sample data
There are 3 kinds of quick start:
Batch quick start creates the Pinot cluster, creates an offline table baseballStats, and pushes sample offline data to the table.
That's it! We've spun up a Pinot cluster. You can continue playing with other types of quick start, or simply head on to check out the data in the baseballStats table.
Streaming quick start sets up a Kafka cluster and pushes sample data to a Kafka topic. Then, it creates the Pinot cluster and creates a realtime table meetupRSVP which ingests data from the Kafka topic.
We now have a Pinot cluster with a realtime table! You can head over to check out the data in the meetupRSVP table.
Hybrid quick start sets up a Kafka cluster and pushes sample data to a Kafka topic. Then, it creates the Pinot cluster and creates a hybrid table airlineStats. The realtime table ingests data from the Kafka topic. Lastly, sample data is pushed into the offline table.
Let's head over to check out the data we pushed to the airlineStats table.
brew install kubernetes-cli
kubectl version
brew install kubernetes-helm
helm version
curl https://sdk.cloud.google.com | bash
exec -l $SHELL
gcloud init
GCLOUD_PROJECT=[your gcloud project name]
GCLOUD_ZONE=us-west1-b
GCLOUD_CLUSTER=pinot-quickstart
GCLOUD_MACHINE_TYPE=n1-standard-2
GCLOUD_NUM_NODES=3
gcloud container clusters create ${GCLOUD_CLUSTER} \
--num-nodes=${GCLOUD_NUM_NODES} \
--machine-type=${GCLOUD_MACHINE_TYPE} \
--zone=${GCLOUD_ZONE} \
--project=${GCLOUD_PROJECT}
gcloud compute instances list
GCLOUD_PROJECT=[your gcloud project name]
GCLOUD_ZONE=us-west1-b
GCLOUD_CLUSTER=pinot-quickstart
gcloud container clusters get-credentials ${GCLOUD_CLUSTER} --zone ${GCLOUD_ZONE} --project ${GCLOUD_PROJECT}
kubectl get nodes
GCLOUD_ZONE=us-west1-b
gcloud container clusters delete pinot-quickstart --zone=${GCLOUD_ZONE}
# define the pinot version
PINOT_VERSION=0.3.0
bin/quick-start-batch.sh
# stop previous quick start cluster, if any
bin/quick-start-streaming.sh
# stop previous quick start cluster, if any
bin/quick-start-hybrid.sh
# checkout pinot
git clone https://github.com/apache/incubator-pinot.git
cd incubator-pinot
# build pinot
mvn install package -DskipTests -Pbin-dist
# navigate to directory containing the setup scripts
cd pinot-distribution/target/apache-pinot-incubating-$PINOT_VERSION-bin/apache-pinot-incubating-$PINOT_VERSION-bin
wget https://downloads.apache.org/incubator/pinot/apache-pinot-incubating-$PINOT_VERSION/apache-pinot-incubating-$PINOT_VERSION-bin.tar.gz
# untar it
tar -zxvf apache-pinot-incubating-$PINOT_VERSION-bin.tar.gz
# navigate to directory containing the launcher scripts
cd apache-pinot-incubating-$PINOT_VERSION-bin
This starter guide provides a quick start for running Pinot on Microsoft Azure
This document provides the basic instruction to set up a Kubernetes Cluster on Azure Kubernetes Service (AKS)
Please follow this link (https://kubernetes.io/docs/tasks/tools/install-kubectl) to install kubectl.
For Mac User
brew install kubernetes-cli
Please check kubectl version after installation.
kubectl version
Please follow this link (https://helm.sh/docs/using_helm/#installing-helm) to install helm.
For Mac User
brew install kubernetes-helm
Please check helm version after installation.
helm version
Please follow this link (https://docs.microsoft.com/en-us/cli/azure/install-azure-cli?view=azure-cli-latest) to install Azure CLI.
For Mac User
brew update && brew install azure-cli
The script below will open your default browser to sign in to your Azure account.
az login
The script below will create a resource group in the location eastus.
AKS_RESOURCE_GROUP=pinot-demo
AKS_RESOURCE_GROUP_LOCATION=eastus
az group create --name ${AKS_RESOURCE_GROUP} \
--location ${AKS_RESOURCE_GROUP_LOCATION}
The script below will create a 3-node cluster named pinot-quickstart for demo purposes.
Please modify the parameters in the example command below:
AKS_RESOURCE_GROUP=pinot-demo
AKS_CLUSTER_NAME=pinot-quickstart
az aks create --resource-group ${AKS_RESOURCE_GROUP} \
--name ${AKS_CLUSTER_NAME} \
--node-count 3
Once the command succeeds, the cluster is ready to be used.
Simply run the command below to get the credentials for the cluster pinot-quickstart that you just created, or for your existing cluster.
AKS_RESOURCE_GROUP=pinot-demo
AKS_CLUSTER_NAME=pinot-quickstart
az aks get-credentials --resource-group ${AKS_RESOURCE_GROUP} \
--name ${AKS_CLUSTER_NAME}
To verify the connection, you can run:
kubectl get nodes
Please follow this Kubernetes QuickStart to deploy your Pinot Demo.
AKS_RESOURCE_GROUP=pinot-demo
AKS_CLUSTER_NAME=pinot-quickstart
az aks delete --resource-group ${AKS_RESOURCE_GROUP} \
--name ${AKS_CLUSTER_NAME}
The 0.2.0 release is the first release after the initial one and includes the improvements listed below.
Added support for Kafka 2.0
Table rebalancer now supports a minimum number of serving replicas during rebalance
Added support for UDF in filter predicates and selection
Added support to use hex string as the representation of byte array for queries (see PR #4041)
Added support for parquet reader (see PR #3852)
Introduced interface stability and audience annotations (see PR #4063)
Refactor HelixBrokerStarter to separate constructor and start() - backwards incompatible (see PR #4100)
Admin tool for listing segments with invalid intervals for offline tables
Migrated to log4j2 (see PR #4139)
Added simple avro msg decoder
Added support for passing headers in Pinot client
Support transform functions with AVG aggregation function (see PR #4557)
Configurations additions/changes
Allow customized metrics prefix (see PR #4392)
Controller.enable.batch.message.mode to false by default (see PR #3928)
RetentionManager and OfflineSegmentIntervalChecker initial delays configurable (see PR #3946)
Config to control kafka fetcher size and increase default (see PR #3869)
Added a percent threshold to consider startup of services (see PR #4011)
Make SingleConnectionBrokerRequestHandler as default (see PR #4048)
Always enable default column feature, remove the configuration (see PR #4074)
Remove redundant default broker configurations (see PR #4106)
Removed some config keys in server(see PR #4222)
Add config to disable HLC realtime segment (see PR #4235)
The following config variables are deprecated and will be removed in the next release:
pinot.broker.requestHandlerType will be removed, in favor of using the "singleConnection" broker request handler. If you have set this configuration, please remove it and use the default type ("singleConnection") for broker request handler.
We are in the process of separating Helix and Pinot controllers, so that administrators can have the option of running independent Helix controllers and Pinot controllers.
We are in the process of moving towards supporting SQL query format and results.
We are in the process of separating instance and segment assignment using instance pools to optimize the number of Helix state transitions in Pinot clusters with thousands of tables.
Task management does not work correctly in this release, due to bugs in Helix. We will upgrade to Helix 0.9.2 (or later) version to get this fixed.
You must upgrade to this release before moving onto newer versions of Pinot release. The protocol between Pinot-broker and Pinot-server has been changed and this release has the code to retain compatibility moving forward. Skipping this release may (depending on your environment) cause query errors if brokers are upgraded and servers are in the process of being upgraded.
As always, we recommend that you upgrade controllers first, and then brokers and lastly the servers in order to have zero downtime in production clusters.
Pull Request #4100 introduces a backwards incompatible change to the Pinot broker. If you use the Java constructor on the HelixBrokerStarter class, then you will face a compilation error with this version. You will need to construct the object and call the start() method in order to start the broker.
Pull Request #4139 introduces a backwards incompatible change for log4j configuration. If you used a custom log4j configuration (log4j.xml), you need to write a new log4j2 configuration (log4j2.xml). In addition, you may need to change the arguments on the command line to start Pinot components.
If you used Pinot-admin command to start Pinot components, you don't need any change. If you used your own commands to start pinot components, you will need to pass the new log4j2 config as a jvm parameter (i.e. substitute -Dlog4j.configuration or -Dlog4j.configurationFile argument with -Dlog4j2.configurationFile=log4j2.xml).
This guide provides a quick start for running Pinot on Amazon Web Services (AWS).
This document provides the basic instruction to set up a Kubernetes Cluster on Amazon Elastic Kubernetes Service (Amazon EKS)
Please follow this link (https://kubernetes.io/docs/tasks/tools/install-kubectl) to install kubectl.
For Mac User
brew install kubernetes-cli
Please check kubectl version after installation.
kubectl version
Please follow this link (https://helm.sh/docs/using_helm/#installing-helm) to install helm.
For Mac User
brew install kubernetes-helm
Please check helm version after installation.
helm version
Please follow this link (https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-install.html#install-tool-bundled) to install AWS CLI.
For Mac User
curl "https://d1vvhvl2y92vvt.cloudfront.net/awscli-exe-macos.zip" -o "awscliv2.zip"
unzip awscliv2.zip
sudo ./aws/install
Please follow this link (https://docs.aws.amazon.com/eks/latest/userguide/eksctl.html#installing-eksctl) to install eksctl.
For Mac User
brew tap weaveworks/tap
brew install weaveworks/tap/eksctl
If you are a first-time AWS user, please register your account at https://aws.amazon.com/.
Once you have created the account, you can go to AWS Identity and Access Management (IAM) to create a user and create access keys under the Security Credentials tab.
aws configure
The script below will create a 3-node cluster named pinot-quickstart in us-west-2 with t3.small machines for demo purposes.
Please modify the parameters in the example command below:
EKS_CLUSTER_NAME=pinot-quickstart
eksctl create cluster \
--name ${EKS_CLUSTER_NAME} \
--version 1.14 \
--region us-west-2 \
--nodegroup-name standard-workers \
--node-type t3.small \
--nodes 3 \
--nodes-min 3 \
--nodes-max 4 \
--node-ami auto
You can monitor cluster status by command:
EKS_CLUSTER_NAME=pinot-quickstart
aws eks describe-cluster --name ${EKS_CLUSTER_NAME}
Once the cluster is in ACTIVE status, it's ready to be used.
Simply run the command below to get the credentials for the cluster pinot-quickstart that you just created, or for your existing cluster.
EKS_CLUSTER_NAME=pinot-quickstart
aws eks update-kubeconfig --name ${EKS_CLUSTER_NAME}
To verify the connection, you can run:
kubectl get nodes
Please follow this Kubernetes QuickStart to deploy your Pinot Demo.
EKS_CLUSTER_NAME=pinot-quickstart
aws eks delete-cluster --name ${EKS_CLUSTER_NAME}
Segments for offline tables are constructed outside of Pinot, typically in Hadoop via map-reduce jobs, and ingested into Pinot via the REST API provided by the Controller. Pinot provides libraries to create Pinot segments out of input files in AVRO, JSON or CSV formats in a Hadoop job, and push the constructed segments to the controllers via REST APIs.
When an Offline segment is ingested, the controller looks up the table’s configuration and assigns the segment to the servers that host the table. It may assign multiple servers for each segment depending on the number of replicas configured for that table.
Pinot supports different segment assignment strategies that are optimized for various use cases.
Once segments are assigned, Pinot servers get notified via Helix to “host” the segment. The servers download the segments (as a cached local copy to serve queries) and load them into local memory. All segment data is maintained in memory as long as the server hosts that segment.
Once the server has loaded the segment, Helix notifies brokers of the availability of these segments. The brokers start including the new segments in queries. Brokers support different routing strategies depending on the type of table, the segment assignment strategy and the use case.
Data in offline segments is immutable (rows cannot be added, deleted, or modified). However, segments may be replaced with modified data.
Segments for realtime tables are constructed by Pinot servers with rows ingested from data streams such as Kafka. Rows ingested from streams are made available for query processing as soon as they are ingested, thus enabling applications such as those that need real-time charts on analytics.
In large scale installations, data in streams is typically split across multiple stream partitions. The underlying stream may provide consumer implementations that allow applications to consume data from any subset of partitions, including all partitions (or, just from one partition).
A pinot table can be configured to consume from streams in one of two modes:
LowLevel
: This is the preferred mode of consumption. Pinot creates independent partition-level consumers for each partition. Depending on the configured number of replicas, multiple consumers may be created for each partition, taking care that no two replicas exist on the same server host. Therefore, you need to provision at least as many hosts as the number of replicas configured.
HighLevel
: Pinot creates one stream-level consumer that consumes from all partitions. Each message consumed could be from any of the partitions of the stream. Depending on the configured number of replicas, multiple stream-level consumers are created, taking care that no two replicas exist on the same server host. Therefore you need to provision exactly as many hosts as the number of replicas configured.
Of course, the underlying stream should support either mode of consumption in order for a Pinot table to use that mode. Kafka has support for both of these modes. See Pluggable Streams for more information on support of other data streams in Pinot.
In either mode, Pinot servers store the ingested rows in volatile memory until either one of the following conditions is met:
A certain number of rows are consumed
The consumption has gone on for a certain length of time
(See the StreamConfigs section on how to set these values, or have Pinot compute them for you.)
Upon reaching either one of these limits, the servers do the following:
Pause consumption
Persist the rows consumed so far into non-volatile storage
Continue consuming new rows into volatile memory again.
The persisted rows form what we call a completed segment (as opposed to a consuming segment that resides in volatile memory).
In LowLevel
mode, the completed segments are persisted into the local non-volatile store of the Pinot server as well as the segment store of the Pinot cluster (see Pinot Architecture Overview). This allows for easy and automated mechanisms for replacing Pinot servers or expanding capacity. Pinot has special mechanisms that ensure that the completed segment is equivalent across all replicas.
During segment completion, one winner is chosen by the controller from all the replicas as the committer server
. The committer server
builds the segment and uploads it to the controller. All the other non-committer servers
follow one of these two paths:
If the in-memory segment is equivalent to the committed segment, the non-committer
server also builds the segment locally and replaces the in-memory segment
If the in-memory segment is not equivalent to the committed segment, the non-committer
server downloads the segment from the controller.
For more details on this protocol, please refer to this doc.
In HighLevel
mode, the servers persist the consumed rows into the local store (and not the segment store). Since consumption of rows can be from any partition, it is not possible to guarantee equivalence of segments across replicas.
See Consuming and Indexing rows in Realtime for details.
Pinot supports indexing data from various file formats. To support reading from a file format, a record reader needs to be provided to read the file and convert records into the general format which the indexing engine can understand. The record reader serves as the connector from each individual file format to the Pinot record format.
Pinot package provides the following record readers out of the box:
Avro record reader: record reader for Avro format files
CSV record reader: record reader for CSV format files
JSON record reader: record reader for JSON format files
ORC record reader: record reader for ORC format files
Thrift record reader: record reader for Thrift format files
Pinot segment record reader: record reader for Pinot segment
To initialize a record reader, the data file and table schema should be provided (for the Pinot segment record reader, only the index directory needs to be provided because the schema can be derived from the segment). The output record will follow the table schema provided.
For Avro/JSON/ORC/Pinot segment record reader, no extra configuration is required as column names and multi-values are embedded in the data file.
For the CSV/Thrift record readers, extra configuration may need to be provided to determine the column names and multi-values for the data.
The CSV record reader config contains the following settings:
Header: the header for the CSV file (column names)
Column delimiter: delimiter for each column
Multi-value delimiter: delimiter for each value for a multi-valued column
If no config is provided, the following default settings are used:
Use the first row in the data file as the header
Use ‘,’ as the column delimiter
Use ‘;’ as the multi-value delimiter
The Thrift record reader config is mandatory. It contains the Thrift class name for the record reader to de-serialize the Thrift objects.
The following property is to be set during segment generation in your Hadoop properties.
record.reader.path: ${FULL_PATH_OF_YOUR_RECORD_READER_CLASS}
For ORC, it would be:
record.reader.path: org.apache.pinot.orc.data.readers.ORCRecordReader
For other file formats, we provide a general record reader interface - RecordReader. To index a file into a Pinot segment, simply implement the interface and plug it into the index engine - SegmentCreationDriverImpl. We use a two-pass algorithm to index the file into a Pinot segment, hence the rewind() method is required for the record reader.
GenericRow is the record abstraction which the index engine can read and index with. It is a map from column name (String) to column value (Object). For multi-valued column, the value should be an object array (Object[]).
There are several contracts for record readers that developers should follow when implementing their own record readers:
The output GenericRow should follow the table schema provided, in the sense that:
All the columns in the schema should be preserved (if a column does not exist in the original record, put the default value instead)
Columns not in the schema should not be included
Values for the column should follow the field spec from the schema (data type, single-valued/multi-valued)
For the time column (refer to TimeFieldSpec), the record reader should be able to read both incoming and outgoing time (we allow conversion from incoming time - the time value in the original data - to outgoing time - the time value stored in Pinot - during index creation).
If incoming and outgoing time column name are the same, use incoming time field spec
If incoming and outgoing time column name are different, put both of them as time field spec
We keep both incoming and outgoing time columns to handle cases where the input file contains time values that are already converted. A hypothetical reader illustrating these contracts is sketched below.
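To make these contracts concrete, here is a minimal, hypothetical reader for a pipe-delimited text format. The class name and file format are made up for illustration, and the helper calls (GenericRow.putField, Schema.getAllFieldSpecs, FieldSpec.getDefaultNullValue) are assumptions based on the 0.3.0-era code layout; verify them against the RecordReader interface in your Pinot version and treat this as a sketch of the contract rather than a drop-in implementation.
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.pinot.common.data.FieldSpec;
import org.apache.pinot.common.data.Schema;
import org.apache.pinot.core.data.GenericRow;
// Hypothetical reader for a pipe-delimited format; it mirrors the RecordReader contract
// described above (schema-driven output, Object[] for multi-values, rewind() for the
// two-pass segment creation).
public class MyFormatRecordReader {
  private final Schema _schema;
  private List<String> _lines = new ArrayList<>();
  private int _nextLine;
  public MyFormatRecordReader(File dataFile, Schema schema) throws IOException {
    _schema = schema;
    try (BufferedReader reader = new BufferedReader(new FileReader(dataFile))) {
      String line;
      while ((line = reader.readLine()) != null) {
        _lines.add(line);
      }
    }
  }
  public boolean hasNext() {
    return _nextLine < _lines.size();
  }
  // Convert one raw record into a GenericRow that follows the table schema.
  public GenericRow next(GenericRow reuse) {
    String[] tokens = _lines.get(_nextLine++).split("\\|");
    int tokenIndex = 0;
    for (FieldSpec fieldSpec : _schema.getAllFieldSpecs()) {
      Object value;
      if (tokenIndex < tokens.length) {
        String token = tokens[tokenIndex++];
        // Multi-valued columns must be set as an Object[]; a real reader would also
        // convert tokens to the data type declared in the field spec.
        value = fieldSpec.isSingleValueField() ? token : token.split(";");
      } else {
        // Columns missing from the input record still need to be present: use the default value.
        value = fieldSpec.getDefaultNullValue();
      }
      reuse.putField(fieldSpec.getName(), value);
    }
    return reuse;
  }
  // Required because segment creation makes two passes over the data.
  public void rewind() {
    _nextLine = 0;
  }
  public void close() {
    _lines = null;
  }
}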
This guide shows you how to import records into Pinot using a Thrift file.
This guide is a work in progress.
We're actively working on improving our documentation. This doc will be available very soon. Please check back in a day or two for more details.
VERSION=0.3.0
wget https://downloads.apache.org/incubator/pinot/apache-pinot-incubating-$VERSION/apache-pinot-incubating-$VERSION-bin.tar.gz
tar vxf apache-pinot-incubating-*-bin.tar.gz
cd apache-pinot-incubating-*-bin
bin/quick-start-batch.sh
docker network create -d bridge pinot-demo
docker run \
--network=pinot-demo \
--name pinot-zookeeper \
--restart always \
-p 2181:2181 \
-d zookeeper:3.5.6
docker run \
--network pinot-demo --name=zkui \
-p 9090:9090 \
-e ZK_SERVER=pinot-zookeeper:2181 \
-d qnib/plain-zkui:latest
bin/pinot-admin.sh StartZookeeper -zkPort 2181
export PINOT_VERSION=0.3.0-SNAPSHOT
export PINOT_IMAGE=apachepinot/pinot:${PINOT_VERSION}
docker pull ${PINOT_IMAGE}
"tenants": {
"broker": "brokerTenantName",
"server": "serverTenantName"
}
{
"tenantRole" : "BROKER",
"tenantName" : "sampleBrokerTenant",
"numberOfInstances" : 3
}
bin/pinot-admin.sh AddTenant \
-name sampleBrokerTenant
-role BROKER
-instanceCount 3 -exec
curl -i -X POST -H 'Content-Type: application/json' -d @sample-broker-tenant.json localhost:9000/tenants
{
"tenantRole" : "SERVER",
"tenantName" : "sampleServerTenant",
"offlineInstances" : 1,
"realtimeInstances" : 1
}
bin/pinot-admin.sh AddTenant \
-name sampleServerTenant \
-role SERVER \
-offlineInstanceCount 1 \
-realtimeInstanceCount 1 -exec
curl -i -X POST -H 'Content-Type: application/json' -d @sample-server-tenant.json localhost:9000/tenants
To contribute to Pinot, please follow the instructions below.
Pinot uses git for source code management. If you are new to Git, it is worth reviewing the basics of Git and common tasks like managing branches and rebasing.
To limit the number of branches created on the Apache Pinot repository, we recommend that you create a fork by clicking on the fork button on this page. Read more about the fork workflow here
$ mkdir workspace
$ cd workspace
$ git clone git@github.com:<github username>/pinot.git
$ cd pinot
# set upstream
$ git remote add upstream https://github.com/apache/incubator-pinot
# check that the upstream shows up correctly
$ git remote -v
Pinot is a Maven project and familiarity with Maven will help you work with Pinot code. If you are new to Maven, you can read about Maven here and get a quick overview here.
Run the following maven command to set up the project.
# compile, download sources
$ mvn install package -DskipTests -Pbin-dist -DdownloadSources -DdownloadJavadocs
Import the project into your favorite IDE. Set up the stylesheet according to your IDE. We have provided instructions for IntelliJ and Eclipse. If you are using other IDEs, please ensure you use a stylesheet based on this.
To import the Pinot stylesheet, launch IntelliJ and navigate to Preferences
(on Mac) or Settings
on Linux.
Navigate to Editor
-> Code Style
-> Java
Select Import Scheme
-> IntelliJ IDEA code style XML
Choose codestyle-intellij.xml
from incubator-pinot/config
folder of your workspace. Click Apply.
To import the Pinot stylesheet, launch Eclipse and navigate to Preferences
(on Mac) or Settings
on Linux.
Navigate to Java->Code Style->Formatter
Choose codestyle-eclipse.xml
from incubator-pinot/config folder
of your workspace. Click Apply.
This quick start guide will show you how to set up a Pinot cluster manually.
You can try out the pre-built Pinot all-in-one Docker image.
(Optional) You can also follow the instructions to build your own images.
Create an isolated bridge network in docker
Start Zookeeper in daemon mode. This is a single node zookeeper setup. Zookeeper is the central metadata store for Pinot and should be set up with replication for production use. See for more information.
Start to browse Zookeeper data at .
Alternately, you can use .
Start Pinot Controller in daemon mode and connect to Zookeeper.
Start Pinot Broker in daemon mode and connect to Zookeeper.
Start Pinot Server in daemon mode and connect to Zookeeper.
Optionally, you can also start Kafka for setting up realtime streams. This brings up the Kafka broker on port 9092.
Now all Pinot related components are started as an empty cluster.
You can run the command below to check the container status.
Sample Console Output
Now it's time to start adding data to the cluster. Check out some of the or follow the and for instructions on loading your own data.
This page describes how to write your own stream plug-ins for Pinot. Two modes are available: high level and low level.
The stream should provide the following guarantees:
Exactly once delivery (unless restarting from a checkpoint) for each consumer of the stream.
(Optionally) support a mechanism to split events (in some arbitrary fashion) so that each event in the stream is delivered to exactly one host out of a set of hosts.
Provide ways to save a checkpoint for the data consumed so far. If the stream is partitioned, then this checkpoint is a vector of checkpoints for events consumed from individual partitions.
The checkpoints should be recorded only when Pinot makes a call to do so.
The consumer should be able to start consumption from one of:
latest available data
earliest available data
last saved checkpoint
While consuming rows at a partition level, the stream should support the following properties:
Stream should provide a mechanism to get the current number of partitions.
Each event in a partition should have a unique offset that is not more than 64 bits long.
Each partition should be referred to by a number not exceeding 32 bits.
Stream should provide the following mechanisms to get an offset for a given partition of the stream:
get the offset of the oldest event available (assuming events are aged out periodically) in the partition.
get the offset of the most recent event published in the partition
(optionally) get the offset of an event that was published at a specified time
Stream should provide a mechanism to consume a set of events from a partition starting from a specified offset.
Pinot assumes that the offsets of incoming events are monotonically increasing; i.e., if Pinot consumes an event at offset o1, then the offset o2 of the following event should be such that o2 > o1.
In addition, we have an operational requirement that the number of partitions should not be reduced over time.
In order to add a new type of stream (say, Foo), implement the following classes:
FooConsumerFactory extends
FooPartitionLevelConsumer implements
FooStreamLevelConsumer implements
FooMetadataProvider implements
FooMessageDecoder implements
Depending on stream level or partition level, your implementation needs to include StreamLevelConsumer or PartitionLevelConsumer.
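As a structural sketch, the classes for a hypothetical foo stream might be declared as shown below. Only PartitionLevelConsumer and StreamLevelConsumer are named above; the remaining base types (StreamConsumerFactory, StreamMetadataProvider, StreamMessageDecoder) and the package in the imports are assumptions based on Pinot's stream SPI and should be checked against your Pinot version. The classes are declared abstract here purely so the sketch can omit method bodies; a real plug-in implements every method of these interfaces.
import org.apache.pinot.core.realtime.stream.PartitionLevelConsumer;
import org.apache.pinot.core.realtime.stream.StreamConsumerFactory;
import org.apache.pinot.core.realtime.stream.StreamLevelConsumer;
import org.apache.pinot.core.realtime.stream.StreamMessageDecoder;
import org.apache.pinot.core.realtime.stream.StreamMetadataProvider;
// Factory referenced from "stream.foo.consumer.factory.class.name" in the table config;
// it wires up the consumers, metadata provider and decoder for the foo stream.
public abstract class FooConsumerFactory extends StreamConsumerFactory {
}
// Used for LowLevel consumption: one consumer per stream partition, fetching batches
// of messages between two offsets.
abstract class FooPartitionLevelConsumer implements PartitionLevelConsumer {
}
// Used for HighLevel consumption: a single stream-level consumer across all partitions,
// committing checkpoints when Pinot asks it to.
abstract class FooStreamLevelConsumer implements StreamLevelConsumer {
}
// Answers metadata questions such as the current number of partitions and the
// oldest/newest offsets of a partition.
abstract class FooMetadataProvider implements StreamMetadataProvider {
}
// Decodes raw stream payloads into rows that follow the table schema.
abstract class FooMessageDecoder implements StreamMessageDecoder {
}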
The properties for the stream implementation are to be set in the table configuration, inside the streamConfigs section.
Use the streamType
property to define the stream type. For example, for the implementation of stream foo
, set the property "streamType" : "foo"
.
The rest of the configuration properties for your stream should be set with the prefix "stream.foo"
. Be sure to use the same suffix for the following properties (see examples below):
topic
consumer type
stream consumer factory
offset
decoder class name
decoder properties
connection timeout
fetch timeout
All values should be strings. For example:
You can have additional properties that are specific to your stream. For example:
In addition to these properties, you can define thresholds for the consuming segments:
rows threshold
time threshold
The properties for the thresholds are as follows:
An example of this implementation can be found in the , which is an implementation for the kafka stream.
$ curl -H "Content-Type: application/json" -X POST \
-d '{"sql":"select foo, count(*) from myTable group by foo limit 100"}' \
http://localhost:8099/query/sql
$ curl -H "Content-Type: application/json" -X POST \
-d '{"pql":"select count(*) from myTable group by foo top 100"}' \
http://localhost:8099/query
docker run --rm -ti \
--network pinot-demo --name=zkui \
-p 9090:9090 \
-e ZK_SERVER=pinot-zookeeper:2181 \
-d qnib/plain-zkui:latest
docker run --rm -ti \
--network=pinot-demo \
--name pinot-controller \
-p 9000:9000 \
-d ${PINOT_IMAGE} StartController \
-zkAddress pinot-zookeeper:2181
docker run --rm -ti \
--network=pinot-demo \
--name pinot-broker \
-d ${PINOT_IMAGE} StartBroker \
-zkAddress pinot-zookeeper:2181
docker run --rm -ti \
--network=pinot-demo \
--name pinot-server \
-d ${PINOT_IMAGE} StartServer \
-zkAddress pinot-zookeeper:2181
docker run --rm -ti \
--network pinot-demo --name=kafka \
-e KAFKA_ZOOKEEPER_CONNECT=pinot-zookeeper:2181/kafka \
-e KAFKA_BROKER_ID=0 \
-e KAFKA_ADVERTISED_HOST_NAME=kafka \
-d wurstmeister/kafka:latest
docker container ls -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
9ec20e4463fa wurstmeister/kafka:latest "start-kafka.sh" 43 minutes ago Up 43 minutes kafka
0775f5d8d6bf apachepinot/pinot:latest "./bin/pinot-admin.s…" 44 minutes ago Up 44 minutes 8096-8099/tcp, 9000/tcp pinot-server
64c6392b2e04 apachepinot/pinot:latest "./bin/pinot-admin.s…" 44 minutes ago Up 44 minutes 8096-8099/tcp, 9000/tcp pinot-broker
b6d0f2bd26a3 apachepinot/pinot:latest "./bin/pinot-admin.s…" 45 minutes ago Up 45 minutes 8096-8099/tcp, 0.0.0.0:9000->9000/tcp pinot-controller
570416fc530e zookeeper:3.5.6 "/docker-entrypoint.…" 45 minutes ago Up 45 minutes 2888/tcp, 3888/tcp, 0.0.0.0:2181->2181/tcp, 8080/tcp pinot-zookeeper
cd apache-pinot-incubating-${PINOT_VERSION}-bin
bin/pinot-admin.sh StartZookeeper \
-zkPort 2191
bin/pinot-admin.sh StartController \
-zkAddress localhost:2191 \
-controllerPort 9000
bin/pinot-admin.sh StartBroker \
-zkAddress localhost:2191
bin/pinot-admin.sh StartServer \
-zkAddress localhost:2191
bin/pinot-admin.sh StartKafka \
-zkAddress=localhost:2191/kafka \
-port 19092
export PINOT_VERSION=0.3.0-SNAPSHOT
export PINOT_IMAGE=apachepinot/pinot:${PINOT_VERSION}
docker pull ${PINOT_IMAGE}
docker network create -d bridge pinot-demo
docker run \
--network=pinot-demo \
--name pinot-zookeeper \
--restart always \
-p 2181:2181 \
-d zookeeper:3.5.6
"streamType" : "foo",
"stream.foo.topic.name" : "SomeTopic",
"stream.foo.consumer.type": "LowLevel",
"stream.foo.consumer.factory.class.name": "fully.qualified.pkg.ConsumerFactoryClassName",
"stream.foo.consumer.prop.auto.offset.reset": "largest",
"stream.foo.decoder.class.name" : "fully.qualified.pkg.DecoderClassName",
"stream.foo.decoder.prop.a.decoder.property" : "decoderPropValue",
"stream.foo.connection.timeout.millis" : "10000", // default 30_000
"stream.foo.fetch.timeout.millis" : "10000" // default 5_000
"stream.foo.some.buffer.size" : "24g"
"realtime.segment.flush.threshold.size" : "100000"
"realtime.segment.flush.threshold.time" : "6h"
Pinot Client for Golang
Applications can use this golang client library to query Apache Pinot.
Source Code Repo: https://github.com/fx19880617/pinot-client-go
Please follow this Pinot Quickstart link to install and start Pinot batch QuickStart locally.
bin/quick-start-batch.sh
Check out Client library Github Repo
git clone git@github.com:fx19880617/pinot-client-go.git
cd pinot-client-go
Build and run the example application to query from Pinot Batch Quickstart
go build ./examples/batch-quickstart
./batch-quickstart
The Pinot client can be initialized in any of the following ways:
pinotClient := pinot.NewFromZookeeper([]string{"localhost:2123"}, "", "QuickStartCluster")
pinotClient := pinot.NewFromBrokerList([]string{"localhost:8000"})
pinotClient := pinot.NewWithConfig(&pinot.ClientConfig{
ZkConfig: &pinot.ZookeeperConfig{
ZookeeperPath: zkPath,
PathPrefix: strings.Join([]string{zkPathPrefix, pinotCluster}, "/"),
SessionTimeoutSec: defaultZkSessionTimeoutSec,
},
ExtraHTTPHeader: map[string]string{
"extra-header":"value",
},
})
Please see this example for your reference.
Code snippet:
pinotClient, err := pinot.NewFromZookeeper([]string{"localhost:2123"}, "", "QuickStartCluster")
if err != nil {
log.Error(err)
}
brokerResp, err := pinotClient.ExecuteSQL("baseballStats", "select count(*) as cnt, sum(homeRuns) as sum_homeRuns from baseballStats group by teamID limit 10")
if err != nil {
log.Error(err)
}
log.Infof("Query Stats: response time - %d ms, scanned docs - %d, total docs - %d", brokerResp.TimeUsedMs, brokerResp.NumDocsScanned, brokerResp.TotalDocs)
Query Response is defined as the struct of following:
type BrokerResponse struct {
AggregationResults []*AggregationResult `json:"aggregationResults,omitempty"`
SelectionResults *SelectionResults `json:"SelectionResults,omitempty"`
ResultTable *ResultTable `json:"resultTable,omitempty"`
Exceptions []Exception `json:"exceptions"`
TraceInfo map[string]string `json:"traceInfo,omitempty"`
NumServersQueried int `json:"numServersQueried"`
NumServersResponded int `json:"numServersResponded"`
NumSegmentsQueried int `json:"numSegmentsQueried"`
NumSegmentsProcessed int `json:"numSegmentsProcessed"`
NumSegmentsMatched int `json:"numSegmentsMatched"`
NumConsumingSegmentsQueried int `json:"numConsumingSegmentsQueried"`
NumDocsScanned int64 `json:"numDocsScanned"`
NumEntriesScannedInFilter int64 `json:"numEntriesScannedInFilter"`
NumEntriesScannedPostFilter int64 `json:"numEntriesScannedPostFilter"`
NumGroupsLimitReached bool `json:"numGroupsLimitReached"`
TotalDocs int64 `json:"totalDocs"`
TimeUsedMs int `json:"timeUsedMs"`
MinConsumingFreshnessTimeMs int64 `json:"minConsumingFreshnessTimeMs"`
}
Note that AggregationResults
and SelectionResults
are holders for PQL queries.
Meanwhile ResultTable
is the holder for SQL queries. ResultTable
is defined as:
// ResultTable is a ResultTable
type ResultTable struct {
DataSchema RespSchema `json:"dataSchema"`
Rows [][]interface{} `json:"rows"`
}
RespSchema
is defined as:
// RespSchema is response schema
type RespSchema struct {
ColumnDataTypes []string `json:"columnDataTypes"`
ColumnNames []string `json:"columnNames"`
}
There are multiple functions defined for ResultTable
, like:
func (r ResultTable) GetRowCount() int
func (r ResultTable) GetColumnCount() int
func (r ResultTable) GetColumnName(columnIndex int) string
func (r ResultTable) GetColumnDataType(columnIndex int) string
func (r ResultTable) Get(rowIndex int, columnIndex int) interface{}
func (r ResultTable) GetString(rowIndex int, columnIndex int) string
func (r ResultTable) GetInt(rowIndex int, columnIndex int) int
func (r ResultTable) GetLong(rowIndex int, columnIndex int) int64
func (r ResultTable) GetFloat(rowIndex int, columnIndex int) float32
func (r ResultTable) GetDouble(rowIndex int, columnIndex int) float64
Sample Usage is here
// Print Response Schema
for c := 0; c < brokerResp.ResultTable.GetColumnCount(); c++ {
fmt.Printf("%s(%s)\t", brokerResp.ResultTable.GetColumnName(c), brokerResp.ResultTable.GetColumnDataType(c))
}
fmt.Println()
// Print Row Table
for r := 0; r < brokerResp.ResultTable.GetRowCount(); r++ {
for c := 0; c < brokerResp.ResultTable.GetColumnCount(); c++ {
fmt.Printf("%v\t", brokerResp.ResultTable.Get(r, c))
}
fmt.Println()
}
Pinot has many built-in aggregation functions such as MIN, MAX, SUM, AVG, etc. See the PQL page for the full list of aggregation functions.
Adding a new AggregationFunction requires two things:
Implement the AggregationFunction interface and make it available as part of the classpath
Register the function in AggregationFunctionFactory. As of today, this requires a code change in Pinot, but we plan to add the ability to plug in functions without having to change Pinot code.
To get an overall idea, see MAX Aggregation Function implementation. All other implementations can be found here.
Let's look at the key methods to implement in AggregationFunction
interface AggregationFunction {
AggregationResultHolder createAggregationResultHolder();
GroupByResultHolder createGroupByResultHolder(int initialCapacity, int maxCapacity);
void aggregate(int length, AggregationResultHolder aggregationResultHolder, Map<String, BlockValSet> blockValSetMap);
void aggregateGroupBySV(int length, int[] groupKeyArray, GroupByResultHolder groupByResultHolder,
Map<String, BlockValSet> blockValSets);
void aggregateGroupByMV(int length, int[][] groupKeysArray, GroupByResultHolder groupByResultHolder,
Map<String, BlockValSet> blockValSets);
IntermediateResult extractAggregationResult(AggregationResultHolder aggregationResultHolder);
IntermediateResult extractGroupByResult(GroupByResultHolder groupByResultHolder, int groupKey);
IntermediateResult merge(IntermediateResult intermediateResult1, IntermediateResult intermediateResult2);
FinalResult extractFinalResult(IntermediateResult intermediateResult);
}
Before getting into the implementation, it's important to understand how Aggregation works in Pinot.
This is an advanced topic and assumes you know Pinot concepts. All the data in Pinot is stored in segments across multiple nodes. At a high level, the query plan comprises 3 phases
1. Map phase
This phase works on the individual segments in Pinot.
Initialization: Depending on the query type, the following methods are invoked to set up the result holder. While having different methods and return types adds complexity, it helps performance.
AGGREGATION : createAggregationResultHolder
This must return an instance of type AggregationResultHolder. You can either use the DoubleAggregationResultHolder or ObjectAggregationResultHolder
GROUP BY: createGroupByResultHolder
This method must return an instance of type GroupByResultHolder. Depending on the type of result object, you might be able to use one of the existing implementations.
Callback: For every record that matches the filter condition in the query, one of the following methods is invoked, depending on the query type (aggregation vs. group by) and column type (single-value vs. multi-value). Note that we invoke this method for a batch of records instead of every row for performance reasons; this allows the JVM to vectorize some parts of the execution if possible.
AGGREGATION: aggregate(int length, AggregationResultHolder aggregationResultHolder, Map<String,BlockValSet> blockValSetMap)
length: This represents the length of the block. Typically < 10k
aggregationResultHolder: this is the object returned from createAggregationResultHolder
blockValSetMap: Map of blockValSets depending on the arguments to the AggFunction
Group By Single Value: aggregateGroupBySV(int length, int[] groupKeyArray, GroupByResultHolder groupByResultHolder, Map blockValSets)
length: This represents the length of the block. Typically < 10k
groupKeyArray: Pinot internally maintains a value-to-int mapping, and this groupKeyArray refers to that internal mapping. These values together form a unique key.
groupByResultHolder: This is the object returned from createGroupByResultHolder
blockValSetMap: Map of blockValSets depending on the arguments to the AggFunction
Group By Multi Value: aggregateGroupByMV(int length, int[][] groupKeysArray, GroupByResultHolder groupByResultHolder, Map blockValSets)
length: This represents the length of the block. Typically < 10k
groupKeysArray: Pinot internally maintains a value-to-int mapping, and this groupKeysArray maps each row to the group keys it belongs to. These values together form unique keys.
groupByResultHolder: This is the object returned from createGroupByResultHolder
blockValSetMap: Map of blockValSets depending on the arguments to the AggFunction
2. Combine phase
In this phase, the results from all segments within a single Pinot server are combined into an IntermediateResult. The type of IntermediateResult is based on the generic type defined in the AggregationFunction implementation.
public interface AggregationFunction<IntermediateResult, FinalResult extends Comparable> {
IntermediateResult merge(IntermediateResult intermediateResult1, IntermediateResult intermediateResult2);
}
3. Reduce phase
There are two steps in the Reduce Phase
Merge all the IntermediateResults from the various servers using the merge function
Extract the final results by invoking the extractFinalResult method. In most cases, FinalResult is the same type as IntermediateResult. AverageAggregationFunction is an example where IntermediateResult (AvgPair) is different from FinalResult (Double). A condensed example implementation is sketched below.
FinalResult extractFinalResult(IntermediateResult intermediateResult);
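Putting the three phases together, here is a condensed, hypothetical SUM-style implementation written against the interface shown above. The helper types (DoubleAggregationResultHolder, DoubleGroupByResultHolder), the BlockValSet.getDoubleValuesSV() accessor and the import packages are assumptions based on the Pinot codebase and may differ in your version; the real interface also has a few more methods than the ones listed on this page.
import java.util.Map;
// Packages below follow the 0.3.0-era layout and may differ in your version.
import org.apache.pinot.core.common.BlockValSet;
import org.apache.pinot.core.query.aggregation.AggregationResultHolder;
import org.apache.pinot.core.query.aggregation.DoubleAggregationResultHolder;
import org.apache.pinot.core.query.aggregation.function.AggregationFunction;
import org.apache.pinot.core.query.aggregation.groupby.DoubleGroupByResultHolder;
import org.apache.pinot.core.query.aggregation.groupby.GroupByResultHolder;
// Hypothetical SUM over a single double column; IntermediateResult and FinalResult are both Double.
public class MySumAggregationFunction implements AggregationFunction<Double, Double> {
  private final String _column;
  public MySumAggregationFunction(String column) {
    _column = column;
  }
  public AggregationResultHolder createAggregationResultHolder() {
    // A running sum fits in a primitive double, so the double-based holder is enough.
    return new DoubleAggregationResultHolder(0.0);
  }
  public GroupByResultHolder createGroupByResultHolder(int initialCapacity, int maxCapacity) {
    return new DoubleGroupByResultHolder(initialCapacity, maxCapacity, 0.0);
  }
  // Map phase: fold one block of matching rows into the per-segment result.
  public void aggregate(int length, AggregationResultHolder holder, Map<String, BlockValSet> blockValSetMap) {
    double[] values = blockValSetMap.get(_column).getDoubleValuesSV();
    double sum = holder.getDoubleResult();
    for (int i = 0; i < length; i++) {
      sum += values[i];
    }
    holder.setValue(sum);
  }
  public void aggregateGroupBySV(int length, int[] groupKeyArray, GroupByResultHolder holder,
      Map<String, BlockValSet> blockValSets) {
    double[] values = blockValSets.get(_column).getDoubleValuesSV();
    for (int i = 0; i < length; i++) {
      int groupKey = groupKeyArray[i];
      holder.setValueForKey(groupKey, holder.getDoubleResult(groupKey) + values[i]);
    }
  }
  public void aggregateGroupByMV(int length, int[][] groupKeysArray, GroupByResultHolder holder,
      Map<String, BlockValSet> blockValSets) {
    double[] values = blockValSets.get(_column).getDoubleValuesSV();
    for (int i = 0; i < length; i++) {
      // A row whose multi-valued group-by column has several values contributes to every group.
      for (int groupKey : groupKeysArray[i]) {
        holder.setValueForKey(groupKey, holder.getDoubleResult(groupKey) + values[i]);
      }
    }
  }
  public Double extractAggregationResult(AggregationResultHolder holder) {
    return holder.getDoubleResult();
  }
  public Double extractGroupByResult(GroupByResultHolder holder, int groupKey) {
    return holder.getDoubleResult(groupKey);
  }
  // Combine/reduce phases: partial sums are merged by simple addition.
  public Double merge(Double intermediateResult1, Double intermediateResult2) {
    return intermediateResult1 + intermediateResult2;
  }
  // For SUM the final result is the same as the intermediate result.
  public Double extractFinalResult(Double intermediateResult) {
    return intermediateResult;
  }
}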
The java client can be found in pinot-clients/pinot-java-client. Here's an example of how to use the pinot-java-client
to query Pinot.
import org.apache.pinot.client.Connection;
import org.apache.pinot.client.ConnectionFactory;
import org.apache.pinot.client.Request;
import org.apache.pinot.client.ResultSetGroup;
import org.apache.pinot.client.ResultSet;
/**
* Demonstrates the use of the pinot-client to query Pinot from Java
*/
public class PinotClientExample {
public static void main(String[] args) {
// pinot connection
String zkUrl = "localhost:2181";
String pinotClusterName = "PinotCluster";
Connection pinotConnection = ConnectionFactory.fromZookeeper(zkUrl + "/" + pinotClusterName);
String query = "SELECT COUNT(*) FROM myTable GROUP BY foo";
// set queryType=sql for querying the sql endpoint
Request pinotClientRequest = new Request("sql", query);
ResultSetGroup pinotResultSetGroup = pinotConnection.execute(pinotClientRequest);
ResultSet resultTableResultSet = pinotResultSetGroup.getResultSet(0);
int numRows = resultTableResultSet.getRowCount();
int numColumns = resultTableResultSet.getColumnCount();
String columnValue = resultTableResultSet.getString(0, 1);
String columnName = resultTableResultSet.getColumnName(1);
System.out.println("ColumnName: " + columnName + ", ColumnValue: " + columnValue);
}
}
Connections to Pinot are created using the ConnectionFactory
class’ utility methods to create connections to a Pinot cluster given a Zookeeper URL, a Java Properties object or a list of broker addresses to connect to.
Connection connection = ConnectionFactory.fromZookeeper
("some-zookeeper-server:2191/zookeeperPath");
Connection connection = ConnectionFactory.fromProperties("demo.properties");
Connection connection = ConnectionFactory.fromHostList
("some-server:1234", "some-other-server:1234", ...);
Queries can be sent directly to the Pinot cluster using the Connection.execute(org.apache.pinot.client.Request)
and Connection.executeAsync(org.apache.pinot.client.Request)
methods of Connection:
ResultSetGroup resultSetGroup =
connection.execute(new Request("sql", "select * from foo..."));
// OR
Future<ResultSetGroup> futureResultSetGroup =
connection.executeAsync(new Request("sql", "select * from foo..."));
Queries can also use a PreparedStatement
to escape query parameters:
PreparedStatement statement =
connection.prepareStatement(new Request("sql", "select * from foo where a = ?"));
statement.setString(1, "bar");
ResultSetGroup resultSetGroup = statement.execute();
// OR
Future<ResultSetGroup> futureResultSetGroup = statement.executeAsync();
Results can be obtained with the various get methods in the first ResultSet, obtained through the getResultSet(int)
method:
Request request = new Request("sql", "select foo, bar from baz where quux = 'quuux'");
ResultSetGroup resultSetGroup = connection.execute(request);
ResultSet resultSet = resultSetGroup.getResultSet(0);
for (int i = 0; i < resultSet.getRowCount(); ++i) {
System.out.println("foo: " + resultSet.getString(i, 0));
System.out.println("bar: " + resultSet.getInt(i, 1));
}
Note
The examples in the sections below this note are for querying the PQL endpoint, which is deprecated and will be deleted soon. For more information about the two endpoints, visit Querying Pinot.
If queryFormat pql
is used in the Request
, there are some differences in how the results can be accessed, depending on the query.
In the case of aggregation, each aggregation function is within its own ResultSet. A query with multiple aggregation functions will return one result set per aggregation function, as they are computed in parallel.
ResultSetGroup resultSetGroup =
connection.execute(new Request("pql", "select max(foo), min(foo) from bar"));
System.out.println("Number of result groups:" +
resultSetGroup.getResultSetCount()); // 2, min(foo) and max(foo)
ResultSet resultSetMax = resultSetGroup.getResultSet(0);
System.out.println("Max foo: " + resultSetMax.getInt(0));
ResultSet resultSetMin = resultSetGroup.getResultSet(1);
System.out.println("Min foo: " + resultSetMin.getInt(0));
In case of aggregation group by, there will be as many ResultSets as the number of aggregations, each of which will contain multiple results grouped by a group key.
ResultSetGroup resultSetGroup =
connection.execute(
new Request("pql", "select min(foo), max(foo) from bar group by baz"));
System.out.println("Number of result groups:" +
resultSetGroup.getResultSetCount()); // 2, min(foo) and max(foo)
ResultSet minResultSet = resultSetGroup.getResultSet(0);
for(int i = 0; i < minResultSet.length(); ++i) {
System.out.println("Minimum foo for " + minResultSet.getGroupKeyString(i, 1) +
": " + minResultSet.getInt(i));
}
ResultSet maxResultSet = resultSetGroup.getResultSet(1);
for(int i = 0; i < maxResultSet.length(); ++i) {
System.out.println("Maximum foo for " + maxResultSet.getGroupKeyString(i, 1) +
": " + maxResultSet.getInt(i));
}
The Pinot Admin UI contains all the APIs that you will need to operate and manage your cluster. It provides a set of APIs for Pinot cluster management including health check, instances management, schema and table management, data segments management.
Let's check out the tables in this cluster by going to Table -> List all tables in cluster and click on Try it out!
. We can see the baseballStats
table listed here. We can also see the exact curl
call made to the controller API.
You can look at the configuration of this table by going to Tables -> Get/Enable/Disable/Drop a table, type in baseballStats
in the table name, and click Try it out!
Let's check out the schemas in the cluster by going to Schema -> List all schemas in the cluster and click Try it out!
. We can see a schema called baseballStats
in this list.
Take a look at the schema by going to Schema -> Get a schema, type baseballStats
in the schema name, and click Try it out!
.
{
"schemaName": "baseballStats",
"dimensionFieldSpecs": [
{
"name": "playerID",
"dataType": "STRING"
},
{
"name": "yearID",
"dataType": "INT"
},
{
"name": "teamID",
"dataType": "STRING"
},
{
"name": "league",
"dataType": "STRING"
},
{
"name": "playerName",
"dataType": "STRING"
}
],
"metricFieldSpecs": [
{
"name": "playerStint",
"dataType": "INT"
},
{
"name": "numberOfGames",
"dataType": "INT"
},
{
"name": "numberOfGamesAsBatter",
"dataType": "INT"
},
{
"name": "AtBatting",
"dataType": "INT"
},
{
"name": "runs",
"dataType": "INT"
},
{
"name": "hits",
"dataType": "INT"
},
{
"name": "doules",
"dataType": "INT"
},
{
"name": "tripples",
"dataType": "INT"
},
{
"name": "homeRuns",
"dataType": "INT"
},
{
"name": "runsBattedIn",
"dataType": "INT"
},
{
"name": "stolenBases",
"dataType": "INT"
},
{
"name": "caughtStealing",
"dataType": "INT"
},
{
"name": "baseOnBalls",
"dataType": "INT"
},
{
"name": "strikeouts",
"dataType": "INT"
},
{
"name": "intentionalWalks",
"dataType": "INT"
},
{
"name": "hitsByPitch",
"dataType": "INT"
},
{
"name": "sacrificeHits",
"dataType": "INT"
},
{
"name": "sacrificeFlies",
"dataType": "INT"
},
{
"name": "groundedIntoDoublePlays",
"dataType": "INT"
},
{
"name": "G_old",
"dataType": "INT"
}
]
}
Finally, let's checkout the data segments in the cluster by going to Segment -> List all segments, type in baseballStats
in the table name, and click Try it out!
. There's 1 segment for this table, called baseballStats_OFFLINE_0
.
You might have figured out by now, in order to get data into the Pinot cluster, we need a table, a schema and segments. Let's head over to Batch upload sample data, to find out more about these components and learn how to create them for your own data.
The Docker instructions on this page are still WIP
So far, we set up our cluster, ran some queries on the demo tables and explored the admin endpoints. We also uploaded some sample batch data for the transcript table.
Now, it's time to ingest from a sample stream into Pinot.
First, we need to setup a stream. Pinot has out-of-the-box realtime ingestion support for Kafka. Other streams can be plugged in, more details in Pluggable Streams.
Let's set up a demo Kafka cluster locally and create a sample topic transcript-topic
Start Kafka
docker run \
--network pinot-demo --name=kafka \
-e KAFKA_ZOOKEEPER_CONNECT=pinot-quickstart:2123/kafka \
-e KAFKA_BROKER_ID=0 \
-e KAFKA_ADVERTISED_HOST_NAME=kafka \
-d wurstmeister/kafka:latest
Create a Kafka Topic
docker exec \
-t kafka \
/opt/kafka/bin/kafka-topics.sh \
--zookeeper pinot-quickstart:2123/kafka \
--partitions=1 --replication-factor=1 \
--create --topic transcript-topic
Start Kafka
Start Kafka cluster on port 9876
using the same Zookeeper from the quick-start examples
bin/pinot-admin.sh StartKafka -zkAddress=localhost:2123/kafka -port 9876
Create a Kafka topic
Download the latest Kafka. Create a topic
If you followed the Batch upload sample data, you have already pushed a schema for your sample table. If not, head over to Creating a schema on that page, to learn how to create a schema for your sample data.
If you followed Batch upload sample data, you learned how to push an offline table and schema. Similar to the offline table config, we will create a realtime table config for the sample. Here's the realtime table config for the transcript table. For a more detailed overview about tables, check out Table.
{
"tableName": "transcript",
"tableType": "REALTIME",
"segmentsConfig": {
"timeColumnName": "timestamp",
"timeType": "MILLISECONDS",
"schemaName": "transcript",
"replicasPerPartition": "1"
},
"tenants": {},
"tableIndexConfig": {
"loadMode": "MMAP",
"streamConfigs": {
"streamType": "kafka",
"stream.kafka.consumer.type": "lowlevel",
"stream.kafka.topic.name": "transcript-topic",
"stream.kafka.decoder.class.name": "org.apache.pinot.plugin.stream.kafka.KafkaJSONMessageDecoder",
"stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
"stream.kafka.broker.list": "localhost:9876",
"realtime.segment.flush.threshold.time": "3600000",
"realtime.segment.flush.threshold.size": "50000",
"stream.kafka.consumer.prop.auto.offset.reset": "smallest"
}
},
"metadata": {
"customConfigs": {}
}
}
Now that we have our table and schema, let's upload them to the cluster. As soon as the realtime table is created, it will begin ingesting from the Kafka topic.
docker run \
--network=pinot-demo \
-v /tmp/pinot-quick-start:/tmp/pinot-quick-start \
--name pinot-streaming-table-creation \
apachepinot/pinot:latest AddTable \
-schemaFile /tmp/pinot-quick-start/transcript-schema.json \
-tableConfigFile /tmp/pinot-quick-start/transcript-table-realtime.json \
-controllerHost pinot-quickstart \
-controllerPort 9000 \
-exec
bin/pinot-admin.sh AddTable \
-schemaFile /tmp/pinot-quick-start/transcript-schema.json \
-tableConfigFile /tmp/pinot-quick-start/transcript-table-realtime.json \
-exec
Here's a JSON file for transcript table data:
{"studentID":205,"firstName":"Natalie","lastName":"Jones","gender":"Female","subject":"Maths","score":3.8,"timestamp":1571900400000}
{"studentID":205,"firstName":"Natalie","lastName":"Jones","gender":"Female","subject":"History","score":3.5,"timestamp":1571900400000}
{"studentID":207,"firstName":"Bob","lastName":"Lewis","gender":"Male","subject":"Maths","score":3.2,"timestamp":1571900400000}
{"studentID":207,"firstName":"Bob","lastName":"Lewis","gender":"Male","subject":"Chemistry","score":3.6,"timestamp":1572418800000}
{"studentID":209,"firstName":"Jane","lastName":"Doe","gender":"Female","subject":"Geography","score":3.8,"timestamp":1572505200000}
{"studentID":209,"firstName":"Jane","lastName":"Doe","gender":"Female","subject":"English","score":3.5,"timestamp":1572505200000}
{"studentID":209,"firstName":"Jane","lastName":"Doe","gender":"Female","subject":"Maths","score":3.2,"timestamp":1572678000000}
{"studentID":209,"firstName":"Jane","lastName":"Doe","gender":"Female","subject":"Physics","score":3.6,"timestamp":1572678000000}
{"studentID":211,"firstName":"John","lastName":"Doe","gender":"Male","subject":"Maths","score":3.8,"timestamp":1572678000000}
{"studentID":211,"firstName":"John","lastName":"Doe","gender":"Male","subject":"English","score":3.5,"timestamp":1572678000000}
{"studentID":211,"firstName":"John","lastName":"Doe","gender":"Male","subject":"History","score":3.2,"timestamp":1572854400000}
{"studentID":212,"firstName":"Nick","lastName":"Young","gender":"Male","subject":"History","score":3.6,"timestamp":1572854400000}
Push sample JSON into Kafka topic, using the Kafka script from the Kafka download
bin/kafka-console-producer.sh \
--broker-list localhost:9876 \
--topic transcript-topic < /tmp/pinot-quick-start/rawData/transcript.json
As soon as data flows into the stream, the Pinot table will consume it and it will be ready for querying. Head over to the Query Console to check out the realtime data
Explore the data on our Pinot cluster
Now that the QuickStartCluster is set up, we can start exploring the data and the APIs. Head over to in your browser.
You are now connected to the Pinot controller. Let's take a look at the following two features.
lets us run queries on the data in the Pinot cluster
We can see our baseballStats
table listed on the left (you will see meetupRSVP
or airlineStats
if you used the streaming or the hybrid quick start). Clicking on the table name should display all the names and data types of the columns of the table, and also execute a sample query select * from baseballStats limit 10
. You can query this table by typing your query in the text box and clicking the Run Query
button.
Here are some other queries you can try out:
select playerName, max(hits) from baseballStats group by playerName order by max(hits) desc
select sum(hits), sum(homeRuns), sum(numberOfGames) from baseballStats where yearID > 2010
select * from baseballStats order by league
Pinot supports a subset of standard SQL. See for more information.
The contains all the APIs that you will need to operate and manage your cluster. It provides a set of APIs for Pinot cluster management including health check, instances management, schema and table management, data segments management.
Let's check out the tables in this cluster by going to and click on Try it out!
. We can see the baseballStats
table listed here. We can also see the exact curl
call made to the controller API.
You can look at the configuration of this table by going to , type in baseballStats
in the table name, and click Try it out!
Let's check out the schemas in the cluster by going to and click Try it out!
. We can see a schema called baseballStats
in this list.
Take a look at the schema by going to , type baseballStats
in the schema name, and click Try it out!
.
Finally, let's checkout the data segments in the cluster by going to , type in baseballStats
in the table name, and click Try it out!
. There's 1 segment for this table, called baseballStats_OFFLINE_0
.
You might have figured out by now, in order to get data into the Pinot cluster, we need a table, a schema and segments. Let's head over to , to find out more about these components and learn how to create them for your own data.
To consume in realtime, we simply need to create a table with the same name as the schema and point to the Kafka topic to consume from, using a table definition such as this one:
First, we’ll start a local instance of Kafka and start streaming data into it:
This will stream one event per second from the Avro file to the Kafka topic. Then, we’ll create a realtime table, which will start consuming from the Kafka topic.
We can then query the table with the following query to see the events stream in:
Repeating the query multiple times should show the events slowly being streamed into the table.
{
"tableName": "flights",
"tableType": "REALTIME",
"segmentsConfig": {
"retentionTimeUnit": "DAYS",
"retentionTimeValue": "7",
"segmentPushFrequency": "daily",
"segmentPushType": "APPEND",
"replication": "1",
"timeColumnName": "daysSinceEpoch",
"timeType": "DAYS",
"segmentAssignmentStrategy": "BalanceNumSegmentAssignmentStrategy"
},
"tableIndexConfig": {
"invertedIndexColumns": [
"flightNumber",
"tags",
"daysSinceEpoch"
],
"loadMode": "MMAP",
"streamConfigs": {
"streamType": "kafka",
"stream.kafka.consumer.type": "highLevel",
"stream.kafka.topic.name": "flights-realtime",
"stream.kafka.decoder.class.name": "org.apache.pinot.core.realtime.impl.kafka.KafkaJSONMessageDecoder",
"stream.kafka.zk.broker.url": "localhost:2181",
"stream.kafka.hlc.zk.connect.string": "localhost:2181"
}
},
"tenants": {
"broker": "brokerTenant",
"server": "serverTenant"
},
"metadata": {
}
}
bin/pinot-admin.sh StartKafka &
bin/pinot-admin.sh StreamAvroIntoKafka -avroFile flights-2014.avro -kafkaTopic flights-realtime &
bin/pinot-admin.sh AddTable -filePath flights-definition-realtime.json
SELECT COUNT(*) FROM flights
Note
This section is a pre-read if you are planning to develop plug-ins for streams other than Kafka. Pinot supports Kafka out of the box.
Prior to commit ba9f2d, Pinot was only able to support consuming from a Kafka stream.
Pinot now enables its users to write plug-ins to consume from pub-sub streams other than Kafka. (Please refer to Issue #2583)
Some of the streams for which plug-ins can be added are:
You may encounter some limitations either in Pinot or in the stream system while developing plug-ins. Please feel free to get in touch with us when you start writing a stream plug-in, and we can help you out. We are open to receiving PRs in order to improve these abstractions if they do not work for a certain stream implementation.
Refer to Consuming and Indexing rows in Realtime for details on how Pinot consumes streaming data.
{
"schemaName": "baseballStats",
"dimensionFieldSpecs": [
{
"name": "playerID",
"dataType": "STRING"
},
{
"name": "yearID",
"dataType": "INT"
},
{
"name": "teamID",
"dataType": "STRING"
},
{
"name": "league",
"dataType": "STRING"
},
{
"name": "playerName",
"dataType": "STRING"
}
],
"metricFieldSpecs": [
{
"name": "playerStint",
"dataType": "INT"
},
{
"name": "numberOfGames",
"dataType": "INT"
},
{
"name": "numberOfGamesAsBatter",
"dataType": "INT"
},
{
"name": "AtBatting",
"dataType": "INT"
},
{
"name": "runs",
"dataType": "INT"
},
{
"name": "hits",
"dataType": "INT"
},
{
"name": "doules",
"dataType": "INT"
},
{
"name": "tripples",
"dataType": "INT"
},
{
"name": "homeRuns",
"dataType": "INT"
},
{
"name": "runsBattedIn",
"dataType": "INT"
},
{
"name": "stolenBases",
"dataType": "INT"
},
{
"name": "caughtStealing",
"dataType": "INT"
},
{
"name": "baseOnBalls",
"dataType": "INT"
},
{
"name": "strikeouts",
"dataType": "INT"
},
{
"name": "intentionalWalks",
"dataType": "INT"
},
{
"name": "hitsByPitch",
"dataType": "INT"
},
{
"name": "sacrificeHits",
"dataType": "INT"
},
{
"name": "sacrificeFlies",
"dataType": "INT"
},
{
"name": "groundedIntoDoublePlays",
"dataType": "INT"
},
{
"name": "G_old",
"dataType": "INT"
}
]
}
Introduction to Apache Pinot, a real-time distributed OLAP datastore.
Pinot is a real-time distributed OLAP datastore, built to deliver scalable real-time analytics with low latency. It can ingest from batch data sources (such as Hadoop HDFS, Amazon S3, Azure ADLS, Google Cloud Storage) as well as stream data sources (such as Apache Kafka).
Pinot was built by engineers at LinkedIn and Uber and is designed to scale up and out with no upper bound. Performance remains predictable for a given cluster size and expected queries-per-second (QPS) threshold.
Our documentation is structured to let you quickly get to the content you need and is organized around the different concerns of users, operators, and developers. If you're new to Pinot and want to learn things by example, please take a look at our getting started section.
To start importing data into Pinot, check out our guides on batch import and stream ingestion based on our plugin architecture.
Pinot works very well for querying time series data with many dimensions and metrics over a vast unbounded space of records that scales linearly on a per node basis. Filters and aggregations are both easy and fast.
SELECT sum(clicks), sum(impressions) FROM AdAnalyticsTable
WHERE
((daysSinceEpoch >= 17849 AND daysSinceEpoch <= 17856)) AND
accountId IN (123456789)
GROUP BY
daysSinceEpoch TOP 100
Pinot supports SQL for querying read-only data. Learn more about querying Pinot for time series data in our PQL (Pinot Query Language) guide.
Pinot may be deployed to and operated on a cloud provider or a local or virtual machine. You may get started either with a bare-metal installation or a Kubernetes one (either locally or in the cloud). To get immediately started with Pinot, check out these quick start guides for bootstrapping a Pinot cluster using Docker or Kubernetes.
For a high-level overview that explains how Pinot works, please take a look at our basic concepts section.
To understand the distributed systems architecture that explains Pinot's operating model, please take a look at our basic architecture section.
This section focuses on answering the most frequently asked questions for people exploring the newly evolving category of distributed OLAP engines. Pinot was created by authors at both Uber and LinkedIn and has been hardened and battle tested at the very highest of load and scale.
While Pinot doesn't match the typical mold of a database product, it is best understood based on your role as either an analyst, data scientist, or application developer.
Enterprise business intelligence
For analysts and data scientists, Pinot is best viewed as a highly-scalable data platform for business intelligence. In this view, Pinot converges big data platforms with the traditional role of a data warehouse, making it a suitable replacement for analysis and reporting.
Enterprise application development
For application developers, Pinot is best viewed as an immutable aggregate store that sources events from streaming data sources, such as Kafka, and makes it available for query using SQL.
As is the case with a microservice architecture, data encapsulation ends up requiring each application to provision its own data store, as opposed to sharing one OLTP database for reads and writes. In this case, it becomes difficult to query the complete view of a domain because the data is stored in many different databases. This is costly in terms of performance, since it requires joins across multiple microservices that expose their data over HTTP under a REST API. To prevent this, Pinot can be used to aggregate all of the data across a microservice architecture into one easily queryable view of the domain.
Pinot tenants prevent any possibility of sharing ownership of database tables across microservice teams. Developers can create their own query models of data from multiple systems of record depending on their use case and needs. As with all aggregate stores, query models are eventually consistent and immutable.
Company
Notes
Pinot originated at LinkedIn, where it powers 50+ user-facing applications such as Who Viewed My Profile, Talent Analytics, Company Analytics, Ad Analytics and many more. Pinot also serves as the backend used to visualize and monitor 10,000+ business metrics.
Pinot runs on 1000+ nodes serving 100k+ queries while ingesting 1.5M+ events per second.
Uber
Pinot powers many internal and external dashboards as well as external site facing analytics applications like .
Microsoft
Microsoft Teams uses Pinot for analytics on Teams product usage data.
Weibo uses Pinot for realtime analytics on CDN & Weibo Video data to make business decisions, optimize service performance and improve user experience.
Factual
Insight Product -
A column-oriented database with various compression schemes such as Run Length, Fixed Bit Length
Pluggable indexing technologies - Sorted Index, Bitmap Index, Inverted Index
Ability to optimize query/execution plan based on query and segment metadata
Near real time ingestion from streams and batch ingestion from Hadoop
SQL-like language that supports selection, aggregation, filtering, group by, order by, distinct queries on data
Support for multi-valued fields
Horizontally scalable and fault-tolerant
Pinot is designed to execute OLAP queries with low latency. It is suited to contexts where fast analytics, such as aggregations, are needed on immutable data, possibly with real-time data ingestion.
User facing Analytics Products
Pinot was originally built at LinkedIn to power rich interactive real-time analytic applications such as Who Viewed Profile, Company Analytics, Talent Insights, and many more. UberEats Restaurant Manager is another example of a customer facing Analytics App. At LinkedIn, Pinot powers 50+ user-facing products, ingesting millions of events per second and serving 100k+ queries per second at millisecond latency.
Real-time Dashboard for Business Metrics
Pinot can also be used to perform typical analytical operations such as slice and dice, drill down, roll up, and pivot on large-scale multi-dimensional data. For instance, at LinkedIn, Pinot powers dashboards for thousands of business metrics. One can connect various BI tools such as Superset, Tableau, or Power BI to visualize data in Pinot.
Instructions to connect Pinot with Superset can be found here.
Anomaly Detection
In addition to visualizing data in Pinot, one can run Machine Learning Algorithms to detect Anomalies on the data stored in Pinot. See ThirdEye for more information on how to use Pinot for Anomaly Detection and Root Cause Analysis.
This page has a collection of frequently asked questions with answers from the community.
We have toJsonStr(key)
function, which can store a top-level JSON field as a STRING in Pinot.
Then you can use jsonExtractScalar(JSON_STRING_FIELD, JSON_PATH, OUTPUT_FORMAT)
function during query time to fetch the desired field from the json string. For example
NOTE: This works well if some of your fields are nested JSON, but most of your fields are top-level JSON keys. If all of your fields are within a nested JSON key, you will have to store the entire payload as one column, which is not ideal.
Support for flattening during ingestion is on the roadmap:
Inverted indexes are set in the tableConfig's tableIndexConfig -> invertedIndexColumns list. Here's the documentation for tableIndexConfig: along with a sample table that has set inverted indexes on some columns.
Applying inverted indexes to a table config will generate inverted indexes for all new segments. In order to apply the inverted indexes to all existing segments, follow the steps in
Add the columns you wish to index to the tableIndexConfig -> invertedIndexColumns list. This sample table config shows inverted indexes set: To update the table config, use the Pinot Swagger API:
Invoke the reload API:
Right now, there’s no easy way to confirm that the reload succeeded. One way is to check the index_map file inside the segment metadata; you should see inverted index entries for the new columns. An API for this is coming soon:
Here's the page explaining the Pinot response format:
"timestamp" is a reserved keyword in SQL. Escape timestamp with double quotes.
Other commonly encountered reserved keywords are date, time, table.
For filtering on STRING columns, use single quotes
The fields in the ORDER BY
clause must be one of the group by clauses or aggregations, BEFORE applying the alias. Therefore, this will not work
Instead, this will work
You can change the number of replicas by updating the table config's section. Make sure you have at least as many servers as the replication.
For OFFLINE table, update
For REALTIME table update
After changing the replication, run a rebalance.
A rebalance is run to reassign all the segments of a table to the available servers. This is typically done when capacity changes occur, i.e. adding servers to or removing servers from a table.
Offline
Use the rebalance API from the Swagger APIs on the controller , with tableType OFFLINE
Realtime
Use the rebalance API from the Swagger APIs on the controller , with tableType REALTIME.
A realtime table has 2 components, the consuming segments and the completed segments. By default, only the completed segments will get rebalanced. The consuming segments will pick the right assignment once they complete. But you can enforce the consuming segments to also be included in the rebalance, by setting the param includeConsuming
to true. Note that rebalancing the consuming segments would mean the consuming segment will drop the consumed data so far, and restart consumption from the last offset, which may lead to a short duration of data staleness.
You can check the status of the rebalance by
Checking the controller logs
Running the rebalance again after a while; it should return "status": "NO_OP"
Checking the External View of the table, to see that the changes in capacity/replicas have taken effect (an example request is shown below)
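For instance, a hedged example of fetching the external view from the controller (assuming it runs on localhost:9000 and the table is named myTable):
# fetch the table's external view from the controller
curl -X GET "http://localhost:9000/tables/myTable/externalview"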
Yes, replica groups work for realtime tables. There are two parts to enabling replica groups:
Replica groups segment assignment
Replica group query routing
Replica group segment assignment
Replica group segment assignment is achieved for realtime tables if the number of servers is a multiple of the number of replicas. The partitions get uniformly sprayed across the servers, creating replica groups.
For example, consider we have 6 partitions, 2 replicas, and 4 servers.
As you can see, the set (S0, S2) contains r1 of every partition, and (S1, S3) contains r2 of every partition. A query will only be routed to one of the sets, and will not span every server. If you are adding/removing servers from an existing table setup, you have to run a rebalance for the segment assignment changes to take effect.
Replica group query routing
Once replica group segment assignment is in effect, query routing can take advantage of it. For replica group based query routing, set the following in the table config's routing section, and then restart the brokers:
Pinot enables its users to implement the PinotFS abstraction layer to store realtime and offline segments in a data layer of their choice.
Some examples of storage backends (other than local storage) currently supported are HDFS and Azure Data Lake Storage.
If these filesystems do not meet your needs, you can extend the current PinotFS abstraction to customize it for your needs.
In order to add a new type of storage backend (say, Amazon S3), implement the following class:
S3FS extends PinotFS
The example here uses the existing org.apache.pinot.filesystem.HadoopPinotFS to store realtime segments in an HDFS filesystem. In the Pinot controller config, add the following new configs:
Note: there is currently a bug in the controller; for now, you can cherry-pick the PR that fixes the issue, which has already been tested (the PR is under review).
These properties for the filesystem implementation are to be set in your controller and server configurations.
In your controller and server configs, set the FS class you would like to support: set pinot.controller.storage.factory.class.${YOUR_URI_SCHEME} to the full path of the FS class you would like to include.
You also need to configure pinot.controller.local.temp.dir for the local dir on the controller machine.
For filesystem specific configs, you can pass in the following with either the pinot.controller prefix or the pinot.server prefix.
All the following configs need to be prefixed with storage.factory.
AzurePinotFS requires the following configs according to your environment:
adl.accountId, adl.authEndpoint, adl.clientId, adl.clientSecret
Sample Controller Config
Sample Server Config
You can find the parameters in your account as follows:
Please also make sure to set the following config with the value “adl”
To see how to upload segments to different storage systems, check ../segment_fetcher.rst.
HadoopPinotFS requires the following configs according to your environment:
hadoop.kerberos.principle, hadoop.kerberos.keytab, hadoop.conf.path
Please make sure to also set the following config with the value “hdfs”
This page describes how to connect Kafka to Pinot
Pinot provides stream plugin support for Kafka 2.x. Although the version used in this implementation is Kafka 2.0.0, it is possible to compile it with a higher Kafka library version, e.g. 2.1.1.
Use Kafka Stream(High) Level Consumer
Below is a sample streamConfigs used to create a realtime table with the Kafka Stream (High) level consumer. The Kafka 2.x HLC consumer uses org.apache.pinot.core.realtime.impl.kafka2.KafkaConsumerFactory in the config stream.kafka.consumer.factory.class.name.
Use Kafka Partition(Low) Level Consumer
Below is a sample table config used to create a realtime table with Kafka Partition(Low) level consumer:
Please note:
Config replicasPerPartition under segmentsConfig is required to specify table replication.
Config stream.kafka.consumer.type should be specified as LowLevel to use the partition level consumer. (The use of simple instead of LowLevel is deprecated.)
Configs stream.kafka.zk.broker.url and stream.kafka.broker.list are required under tableIndexConfig.streamConfigs to provide Kafka related information.
To update the table config for both the high level and low level consumer: update the config stream.kafka.consumer.factory.class.name from org.apache.pinot.core.realtime.impl.kafka.KafkaConsumerFactory to org.apache.pinot.core.realtime.impl.kafka2.KafkaConsumerFactory.
If using the Stream (High) level consumer, also add the config stream.kafka.hlc.bootstrap.server into tableIndexConfig.streamConfigs. This config should be the URI of the Kafka broker list, e.g. localhost:9092.
This connector is also suitable for Kafka library versions higher than 2.0.0. In pinot-connector-kafka-2.0/pom.xml, changing the kafka.lib.version from 2.0.0 to 2.1.1 will make this connector work with Kafka 2.1.1.
mvn clean package -DskipTests -Pbin-dist -Dkafka.version=2.0
"streamConfigs": {
"streamType": "kafka",
"stream.kafka.consumer.type": "highLevel",
"stream.kafka.topic.name": "meetupRSVPEvents",
"stream.kafka.decoder.class.name": "org.apache.pinot.core.realtime.impl.kafka.KafkaJSONMessageDecoder",
"stream.kafka.hlc.zk.connect.string": "localhost:2191/kafka",
"stream.kafka.consumer.factory.class.name": "org.apache.pinot.core.realtime.impl.kafka2.KafkaConsumerFactory",
"stream.kafka.zk.broker.url": "localhost:2191/kafka",
"stream.kafka.hlc.bootstrap.server": "localhost:19092"
}
{
"tableName": "meetupRsvp",
"tableType": "REALTIME",
"segmentsConfig": {
"timeColumnName": "mtime",
"timeType": "MILLISECONDS",
"segmentPushType": "APPEND",
"segmentAssignmentStrategy": "BalanceNumSegmentAssignmentStrategy",
"schemaName": "meetupRsvp",
"replication": "1",
"replicasPerPartition": "1"
},
"tenants": {},
"tableIndexConfig": {
"loadMode": "MMAP",
"streamConfigs": {
"streamType": "kafka",
"stream.kafka.consumer.type": "LowLevel",
"stream.kafka.topic.name": "meetupRSVPEvents",
"stream.kafka.decoder.class.name": "org.apache.pinot.core.realtime.impl.kafka.KafkaJSONMessageDecoder",
"stream.kafka.consumer.factory.class.name": "org.apache.pinot.core.realtime.impl.kafka2.KafkaConsumerFactory",
"stream.kafka.zk.broker.url": "localhost:2191/kafka",
"stream.kafka.broker.list": "localhost:19092"
}
},
"metadata": {
"customConfigs": {}
}
}
Select jsonExtractScalar(myJsonMapStr,'$.k1','STRING')
from myTable
where jsonExtractScalar(myJsonMapStr,'$.k1','STRING') = 'value-k1-0'
Select sum(jsonExtractScalar(complexMapStr,'$.k4.met','INT'))
from myTable
group by jsonExtractScalar(complexMapStr,'$.k1','STRING')
select "timestamp" from myTable
SELECT COUNT(*) from myTable WHERE column = 'foo'
SELECT count(colA) as aliasA, colA from tableA GROUP BY colA ORDER BY aliasA
SELECT count(colA) as sumA, colA from tableA GROUP BY colA ORDER BY count(colA)
{
"tableName": "pinotTable",
"tableType": "OFFLINE",
"segmentsConfig": {
"replication": "3",
...
}
..
{
"tableName": "pinotTable",
"tableType": "REALTIME",
"segmentsConfig": {
"replicasPerPartition": "3",
...
}
..
     | r1 | r2
-----+----+----
 p1  | S0 | S1
 p2  | S2 | S3
 p3  | S0 | S1
 p4  | S2 | S3
 p5  | S0 | S1
 p6  | S2 | S3
{
"tableName": "pinotTable",
"tableType": "REALTIME",
"routing": {
"instanceSelectorType": "replicaGroup"
}
..
}
"controller.data.dir": "SET_TO_YOUR_HDFS_ROOT_DIR"
"controller.local.temp.dir": "SET_TO_A_LOCAL_FILESYSTEM_DIR"
"pinot.controller.storage.factory.class.hdfs": "org.apache.pinot.filesystem.HadoopPinotFS"
"pinot.controller.storage.factory.hdfs.hadoop.conf.path": "SET_TO_YOUR_HDFS_CONFIG_DIR"
"pinot.controller.storage.factory.hdfs.hadoop.kerberos.principle": "SET_IF_YOU_USE_KERBEROS"
"pinot.controller.storage.factory.hdfs.hadoop.kerberos.keytab": "SET_IF_YOU_USE_KERBEROS"
"controller.enable.split.commit": "true"
"pinot.server.instance.enable.split.commit": "true"
"pinot.controller.storage.factory.class.adl": "org.apache.pinot.filesystem.AzurePinotFS"
"pinot.controller.storage.factory.adl.accountId": "xxxx"
"pinot.controller.storage.factory.adl.authEndpoint": "xxxx"
"pinot.controller.storage.factory.adl.clientId": "xxxx"
"pinot.controller.segment.fetcher.protocols": "adl"
"pinot.server.storage.factory.class.adl": "org.apache.pinot.filesystem.AzurePinotFS"
"pinot.server.storage.factory.adl.accountId": "xxxx"
"pinot.server.storage.factory.adl.authEndpoint": "xxxx"
"pinot.server.storage.factory.adl.clientId": "xxxx"
"pinot.server.segment.fetcher.protocols": "adl"
"segment.fetcher.protocols" : "adl"
"segment.fetcher.protocols" : "hdfs"
bin/kafka-topics.sh --create --bootstrap-server localhost:9876 --replication-factor 1 --partitions 1 --topic transcript-topic
Pinot quick start in Kubernetes
Before continuing, please make sure that you've downloaded Apache Pinot. The scripts for the setup in this guide can be found in our open source project on GitHub.
The scripts can be found in the Pinot source at ./incubator-pinot/kubernetes/helm
# checkout pinot
git clone https://github.com/apache/incubator-pinot.git
cd incubator-pinot/kubernetes/helm
The Pinot repo has pre-packaged Helm charts for Pinot and Presto. The Helm repo index file is here.
helm repo add pinot https://raw.githubusercontent.com/apache/incubator-pinot/master/kubernetes/helm
kubectl create ns pinot-quickstart
helm install pinot pinot/pinot \
-n pinot-quickstart \
--set cluster.name=pinot \
--set server.replicaCount=2
helm dependency update
For Helm v2.12.1
If your Kubernetes cluster is recently provisioned, ensure Helm is initialized by running:
helm init --service-account tiller
Then deploy a new HA Pinot cluster using the following command:
helm install --namespace "pinot-quickstart" --name "pinot" .
For Helm v3.0.0
kubectl create ns pinot-quickstart
helm install -n pinot-quickstart pinot .
Run the commands under Resolution below if you encounter the following issue:
Error: could not find tiller.
Resolution:
kubectl -n kube-system delete deployment tiller-deploy
kubectl -n kube-system delete service/tiller-deploy
helm init --service-account tiller
Run the command under Resolution below if you encounter a permission issue like the following:
Error: release pinot failed: namespaces "pinot-quickstart" is forbidden: User "system:serviceaccount:kube-system:default" cannot get resource "namespaces" in API group "" in the namespace "pinot-quickstart"
Resolution:
kubectl apply -f helm-rbac.yaml
kubectl get all -n pinot-quickstart
helm repo add incubator http://storage.googleapis.com/kubernetes-charts-incubator
helm install -n pinot-quickstart kafka incubator/kafka --set replicas=1
helm repo add incubator http://storage.googleapis.com/kubernetes-charts-incubator
helm install --namespace "pinot-quickstart" --name kafka incubator/kafka
kubectl get all -n pinot-quickstart |grep kafka
Ensure the Kafka deployment is ready before executing the scripts in the following steps.
pod/kafka-0 1/1 Running 0 2m
pod/kafka-zookeeper-0 1/1 Running 0 10m
pod/kafka-zookeeper-1 1/1 Running 0 9m
pod/kafka-zookeeper-2 1/1 Running 0 8m
The scripts below will create two Kafka topics for data ingestion:
kubectl -n pinot-quickstart exec kafka-0 -- kafka-topics --zookeeper kafka-zookeeper:2181 --topic flights-realtime --create --partitions 1 --replication-factor 1
kubectl -n pinot-quickstart exec kafka-0 -- kafka-topics --zookeeper kafka-zookeeper:2181 --topic flights-realtime-avro --create --partitions 1 --replication-factor 1
The script below will deploy 3 batch jobs.
Ingest 19492 JSON messages to Kafka topic flights-realtime at a speed of 1 msg/sec
Ingest 19492 Avro messages to Kafka topic flights-realtime-avro at a speed of 1 msg/sec
Upload Pinot schema airlineStats
Create Pinot table airlineStats to ingest data from the JSON encoded Kafka topic flights-realtime
Create Pinot table airlineStatsAvro to ingest data from the Avro encoded Kafka topic flights-realtime-avro
kubectl apply -f pinot-realtime-quickstart.yml
Please use the script below to perform local port-forwarding, which will also open Pinot query console in your default web browser.
This script can be found in the Pinot source at ./incubator-pinot/kubernetes/helm
./query-pinot-data.sh
kubectl apply -f superset.yaml
kubectl exec -it pod/superset-0 -n pinot-quickstart -- bash -c 'flask fab create-admin'
kubectl exec -it pod/superset-0 -n pinot-quickstart -- bash -c 'superset db upgrade'
kubectl exec -it pod/superset-0 -n pinot-quickstart -- bash -c 'superset init'
kubectl exec -it pod/superset-0 -n pinot-quickstart -- bash -c 'superset import_datasources -p /etc/superset/pinot_example_datasource.yaml'
kubectl exec -it pod/superset-0 -n pinot-quickstart -- bash -c 'superset import_dashboards -p /etc/superset/pinot_example_dashboard.json'
You can run the command below to open Superset in your browser and log in with the admin credentials created above.
./open-superset-ui.sh
You can open the imported dashboard by clicking the Dashboards banner and then clicking on AirlineStats.
You can run the command below to deploy a customized Presto with Pinot plugin installed.
helm install presto pinot/presto -n pinot
kubectl apply -f presto-coordinator.yaml
Once Presto is deployed, you can run the command below.
./pinot-presto-cli.sh
List all catalogs
presto:default> show catalogs;
Catalog
---------
pinot
system
(2 rows)
Query 20191112_050827_00003_xkm4g, FINISHED, 1 node
Splits: 19 total, 19 done (100.00%)
0:01 [0 rows, 0B] [0 rows/s, 0B/s]
List All tables
presto:default> show tables;
Table
--------------
airlinestats
(1 row)
Query 20191112_050907_00004_xkm4g, FINISHED, 1 node
Splits: 19 total, 19 done (100.00%)
0:01 [1 rows, 29B] [1 rows/s, 41B/s]
Show schema
presto:default> DESCRIBE pinot.dontcare.airlinestats;
Column | Type | Extra | Comment
----------------------+---------+-------+---------
flightnum | integer | |
origin | varchar | |
quarter | integer | |
lateaircraftdelay | integer | |
divactualelapsedtime | integer | |
......
Query 20191112_051021_00005_xkm4g, FINISHED, 1 node
Splits: 19 total, 19 done (100.00%)
0:02 [80 rows, 6.06KB] [35 rows/s, 2.66KB/s]
Count total documents
presto:default> select count(*) as cnt from pinot.dontcare.airlinestats limit 10;
cnt
------
9745
(1 row)
Query 20191112_051114_00006_xkm4g, FINISHED, 1 node
Splits: 17 total, 17 done (100.00%)
0:00 [1 rows, 8B] [2 rows/s, 19B/s]
kubectl delete ns pinot-quickstart
Prior to commit ba9f2d, Pinot was only able to support consuming from a Kafka stream.
Pinot now enables its users to write plug-ins to consume from pub-sub streams other than Kafka. (Please refer to Issue #2583)
Some of the streams for which plug-ins can be added are:
You may encounter some limitations either in Pinot or in the stream system while developing plug-ins. Please feel free to get in touch with us when you start writing a stream plug-in, and we can help you out. We are open to receiving PRs in order to improve these abstractions if they do not work for a certain stream implementation.
Refer to Consuming and Indexing rows in Realtime for details on how Pinot consumes streaming data.
The stream should provide the following guarantees:
Exactly once delivery (unless restarting from a checkpoint) for each consumer of the stream.
(Optionally) support a mechanism to split events (in some arbitrary fashion) so that each event in the stream is delivered to exactly one host out of a set of hosts.
Provide ways to save a checkpoint for the data consumed so far. If the stream is partitioned, then this checkpoint is a vector of checkpoints for events consumed from individual partitions.
The checkpoints should be recorded only when Pinot makes a call to do so.
The consumer should be able to start consumption from one of:
latest available data
earliest available data
last saved checkpoint
While consuming rows at a partition level, the stream should support the following properties:
Stream should provide a mechanism to get the current number of partitions.
Each event in a partition should have a unique offset that is not more than 64 bits long.
Refer to a partition by a number not exceeding 32 bits in length.
Stream should provide the following mechanisms to get an offset for a given partition of the stream:
get the offset of the oldest event available (assuming events are aged out periodically) in the partition.
get the offset of the most recent event published in the partition
(optionally) get the offset of an event that was published at a specified time
Stream should provide a mechanism to consume a set of events from a partition starting from a specified offset.
Pinot assumes that the offsets of incoming events are monotonically increasing; i.e., if Pinot consumes an event at offset o1, then the offset o2 of the following event should be such that o2 > o1.
In addition, we have an operational requirement that the number of partitions should not be reduced over time.
In order to add a new type of stream (say, Foo), implement the following classes:
FooConsumerFactory extends StreamConsumerFactory
FooPartitionLevelConsumer implements PartitionLevelConsumer
FooStreamLevelConsumer implements StreamLevelConsumer
FooMetadataProvider implements StreamMetadataProvider
FooMessageDecoder implements StreamMessageDecoder
Depending on stream level or partition level, your implementation needs to include StreamLevelConsumer or PartitionLevelConsumer.
The properties for the stream implementation are to be set in the table configuration, inside streamConfigs section.
Use the streamType property to define the stream type. For example, for the implementation of stream foo, set the property "streamType" : "foo". The rest of the configuration properties for your stream should be set with the prefix "stream.foo". Be sure to use the same suffix for the following (see the examples below):
topic
consumer type
stream consumer factory
offset
decoder class name
decoder properties
connection timeout
fetch timeout
All values should be strings. For example:
"streamType" : "foo",
"stream.foo.topic.name" : "SomeTopic",
"stream.foo.consumer.type": "LowLevel",
"stream.foo.consumer.factory.class.name": "fully.qualified.pkg.ConsumerFactoryClassName",
"stream.foo.consumer.prop.auto.offset.reset": "largest",
"stream.foo.decoder.class.name" : "fully.qualified.pkg.DecoderClassName",
"stream.foo.decoder.prop.a.decoder.property" : "decoderPropValue",
"stream.foo.connection.timeout.millis" : "10000", // default 30_000
"stream.foo.fetch.timeout.millis" : "10000" // default 5_000
You can have additional properties that are specific to your stream. For example:
"stream.foo.some.buffer.size" : "24g"
In addition to these properties, you can define thresholds for the consuming segments:
rows threshold
time threshold
The properties for the thresholds are as follows:
"realtime.segment.flush.threshold.size" : "100000"
"realtime.segment.flush.threshold.time" : "6h"
An example of this implementation can be found in the KafkaConsumerFactory, which is an implementation for the kafka stream.
Before proceeding to contributing changes to Pinot, review the contents of this section.
Pinot depends on a number of external projects, the most notable of which are:
Apache Zookeeper
Apache Helix
Apache Kafka
Apache Thrift
Netty
Google Guava
Yammer
Helix is used for cluster management, and Pinot code is tightly integrated with Helix and Zookeeper interfaces.
Kafka is the default realtime stream provider, but can be replaced with others. See customizations section for more info.
Thrift is used for message exchange between broker and server components, with Netty providing the server functionality for processing messages in a non-blocking fashion.
Guava is used for a number of auxiliary components such as caches and rate limiters. Yammer metrics is used to register and expose metrics from Pinot components.
In addition, Pinot relies on several key external libraries for some of its core functionality: Pinot's inverted indices are built using the RoaringBitmap library, and Pinot's digest based percentile calculations are based on the t-digest library.
Pinot is a multi-module project, with each module providing specific functionality that helps us to build services from a combination of modules. This helps keep clean interface contracts between different modules as well as reduce the overall executable size for individually deployable components.
Each module has a src/main/java folder where the code resides and a src/test/java folder where the unit tests corresponding to the module's code reside.
The following figure provides a high-level overview of the foundational Pinot modules.
pinot-common provides classes common to Pinot components. Some key classes you will find here are:
config: Definitions for various elements of Pinot's table config.
metrics: Definitions for base metrics provided by Controller, Broker and Server.
metadata: Definitions of metadata stored in Zookeeper.
pql.parsers: Code to compile PQL strings into corresponding Abstract Syntax Trees (ASTs).
request: Autogenerated thrift classes representing various parts of PQL requests.
response: Definitions of the response format returned by the Broker.
filesystem: Provides abstractions for working with segments on local or remote filesystems. This module allows users to plug in filesystems specific to their use case. Extensions to the base PinotFS should ideally be housed in their specific modules so as not to pull in unnecessary dependencies for all users.
The pinot-transport module provides classes required to handle scatter-gather on the Pinot Broker, and Netty wrapper classes used by the Server to handle connections from the Broker.
The pinot-core module provides the core functionality of Pinot, specifically for handling segments, various index structures, query execution (filters, transformations, aggregations, etc.) and support for realtime segments.
pinot-server provides server specific functionality, including server startup and the REST APIs exposed by the server.
pinot-controller houses all the controller specific functionality, including many cluster administration APIs, segment upload (for both offline and realtime), segment assignment, retention strategies, etc.
pinot-broker provides broker functionality, which includes wiring the broker startup sequence, building broker routing tables and PQL request handling.
pinot-minion provides functionality for running auxiliary/periodic tasks on a Pinot cluster, such as purging records for compliance with regulations like GDPR.
pinot-hadoop provides classes for segment generation jobs using Hadoop infrastructure.
In addition to the core modules described above, Pinot code provides the following modules:
pinot-tools: This module is a collection of many tools useful for setting up a Pinot cluster and creating/updating segments. It also houses the Pinot quick start guide code.
pinot-perf: This module has a collection of benchmark test code used to evaluate design options.
pinot-client-api: This module houses the Java client API. See Executing queries via Java Client API for more info.
pinot-integration-tests: This module holds integration tests that test functionality across multiple classes or components. These tests typically do not rely on mocking and provide more end-to-end coverage for the code.
pinot-hadoop-filesystem and pinot-azure-filesystem are modules added to support extensions to the Pinot filesystem. The functionality is broken down into modules of their own to avoid polluting the common modules with additional large libraries. These libraries bring in transitive dependencies of their own that can cause classpath conflicts at runtime; we would like to avoid this for the common usage of Pinot as much as possible.
Before you begin to contribute, make sure you have reviewed the preceding sections and that you have created your own fork of the Pinot source code.
If your change is relatively minor, you can skip this step. If you are adding new major feature, we suggest that you add a design document and solicit comments from the community before submitting any code.
Here is a list of current design documents.
Create a Pinot issue for the change you would like to make. Provide information on why the change is needed and how you plan to address it. Use the conversations on the issue as a way to validate assumptions and the right way to proceed. Be sure to review the related sections of this guide.
If you have a design document, please refer to the design documents in your Issue. You may even want to create multiple issues depending on the extent of your change.
Once you are clear about what you want to do, proceed with the next steps listed below.
Make the necessary changes. If the changes you plan to make are too big, make sure you break it down into smaller tasks.
Follow the recommendations/best-practices noted here when you are making changes.
Please ensure your code is adequately documented. Some things to consider for documentation:
Always include class level java docs. At the top class level, we are looking for information about what functionality is provided by the class, what state is maintained by the class, whether there are concurrency/thread-safety concerns and any exceptional behavior that the class might exhibit.
Document public methods and their parameters.
Ensure there is adequate logging for positive paths as well as exceptional paths. As a corollary to this, ensure logs are not noisy.
Do not use System.out.println to log messages; use the slf4j loggers.
Use logging levels correctly: set the level to debug for verbose logs useful only for debugging.
Do not log stack traces via the printStackTrace method of the exception.
Where possible, throw specific exceptions, preferably checked exceptions, so the callers can easily determine which erroneous conditions need to be handled.
Avoid catching broad exceptions (i.e., catch (Exception e) blocks), except when this is in the run() method of a thread/runnable.
Current Pinot code does not strictly adhere to this, but we would like to change this over time and adopt best practices around exception handling.
If you are making any changes to state stored, either in Zookeeper or in segments, make sure you consider both backward and forward compatibility issues.
For backward compatibility, consider cases where one component is using the new version and another is still on the old version. E.g., when the request format between broker and server is updated, consider resulting behaviors when a new broker is talking to an older server. Will it break?
For forward compatibility, consider rollback cases. E.g., consider what happens when state persisted by new code is handled by old code. Does the old code skip over new fields?
Be cautious about pulling in external dependencies. You will need to consider multiple things when faced with a need to pull in a new library.
What capability is the addition of the library providing you with? Can existing libraries provide this functionality (may be with a little bit of effort)?
Is the external library maintained by an active community of contributors?
What are the licensing terms for the library? For more information about handling licenses, see the licensing section below.
Are you adding the library to the foundational modules? This will affect the rest of the Pinot code base. If the new library pulls in a lot of transitive dependencies, then we might encounter unexpected issues with multiple classes in the classpath. These issues are hard to catch with tests as the order of loading the libraries at runtime matters. If you absolutely need the support, consider adding it via extension modules.
Automated tests are always recommended for contributions. Make sure you write tests so that:
You verify the correctness of your contribution. This serves as proof to you as well as the reviewers.
You future proof your contributions against code refactors or other changes. While this may not always be possible, it's a good goal to aim for.
Identify a list of tests for the changes you have made. Depending on the scope of changes, you may need one or more of the following tests:
Unit Tests
Make sure your code has the necessary class or method level unit tests. It is important to write both positive case as well as negative case tests. Document your tests well and add meaningful assertions in the tests; when the assertions fail, ensure that the right messages are logged with information that allows others to debug.
Integration Tests
Add integration tests to cover end-to-end paths without relying on mocking (see note below). You MUST add integration tests for REST APIs, and must include tests that cover different error codes, i.e., 200 OK, 4xx or 5xx errors that are explicit contracts of the API.
Mocking
Use a mocking framework to mock classes to control specific behaviors - e.g., simulate various error conditions.
Validate assumptions in tests
Make sure that adequate asserts are added in the tests to verify that the tests are passing for the right reasons.
Write reliable tests
Make sure you are writing tests that are reliable. If the tests depend on asynchronous events to be fired, do not add sleep to your tests. Where possible, use appropriate mocking or condition based triggers.
All source code files should have license headers. To automatically add the header for any new file you plan to check in, run the following in the pinot top-level folder:
Note
If you check in third-party code or files, please make sure you review the Apache guidelines:
Once you determine that the code you are pulling in adheres to the guidelines above, go ahead and pull the changes in. Do not add license headers for them. Follow these instructions to ensure we are compliant with the Apache licensing process:
Under pinot/licenses, add a LICENSE-<newlib> file that has the license terms of the included library.
Update the pinot/LICENSE file to indicate the newly added library file paths under the corresponding supported licenses.
Update the exclusion rules for the license and rat maven plugins in the parent pom: pinot/pom.xml.
If attention is not paid to the licensing terms early on, they will be caught much later in the process, when we prepare to make a new release. Updating the code to work with the right libraries at that point might require bigger refactoring changes and delay the release process.
Verifying code-style
Run the following command to verify the code-style before posting a PR
Run tests
Before you create a review request for the changes, make sure you have run the corresponding unit tests for your changes. You can run individual tests via the IDE or via the maven command line. Finally, run all tests locally by running mvn clean install -Pbin-dist.
For changes that are related to performance issues or race conditions, it is hard to write reliable tests, so we recommend running manual stress tests to validate the changes. You MUST note the manual tests done in the PR description.
Push changes and create a PR for review
Commit your changes with a meaningful commit message.
Once you receive comments on github on your changes, be sure to respond to them on github and address the concerns. If any discussions happen offline for the changes in question, make sure to capture the outcome of the discussion, so others can follow along as well.
It is possible that while your change is being reviewed, other changes were made to the master branch. Be sure to pull and rebase your change on the new changes thus:
When you have addressed all comments and have an approved PR, one of the committers can merge your PR.
After your change is merged, check to see if any documentation needs to be updated. If so, create a PR for documentation.
Usually, for new features, functionality, or API changes, a documentation update is required to keep users up to date and keep track of our development.
Please follow this link to update the documentation accordingly.
Pinot segments can be created offline on Hadoop, or via command line from data files. Controller REST endpoint can then be used to add the segment to the table to which the segment belongs. Pinot segments can also be created by ingesting data from realtime resources (such as Kafka).
Offline Pinot workflow
To create Pinot segments on Hadoop, a workflow can be created to complete the following steps:
Pre-aggregate, clean up and prepare the data, writing it as Avro format files in a single HDFS directory
Create segments
Upload segments to the Pinot cluster
Step one can be done using your favorite tool (such as Pig, Hive or Spark); Pinot provides two MapReduce jobs to do steps two and three.
Create a job properties configuration file, such as one below:
The Pinot Hadoop module contains a job that you can incorporate into your workflow to generate Pinot segments.
You can then use the SegmentTarPush job to push segments via the controller REST API.
Here is how you can create Pinot segments from standard formats like CSV/JSON/AVRO.
Follow the steps described in the build section to build Pinot. Locate pinot-admin.sh in pinot-tools/target/pinot-tools-pkg/bin/pinot-admin.sh.
Create a top level directory containing all the CSV/JSON/AVRO files that need to be converted into segments.
The file name extensions are expected to be the same as the format name (i.e. .csv, .json or .avro), and are case insensitive. Note that the converter expects the .csv extension even if the data is delimited using tabs or spaces instead.
Prepare a schema file describing the schema of the input data. The schema needs to be in JSON format. See example later in this section.
Specifically for CSV format, an optional csv config file can be provided (also in JSON format). This is used to configure parameters like the delimiter/header for the CSV file etc. A detailed description of this follows below.
Run the pinot-admin command to generate the segments. The command can be invoked as follows. Options within “[ ]” are optional. For -format, the default value is AVRO.
To configure various parameters for CSV a config file in JSON format can be provided. This file is optional, as are each of its parameters. When not provided, default values used for these parameters are described below:
fileFormat: Specify one of the following. Default is EXCEL.
EXCEL
MYSQL
RFC4180
TDF
header: If the input CSV file does not contain a header, it can be specified using this field. Note, if this is specified, then the input file is expected to not contain the header row, or else it will result in parse error. The columns in the header must be delimited by the same delimiter character as the rest of the CSV file.
delimiter: Use this to specify a delimiter character. The default value is “,”.
multiValueDelimiter: Use this to specify a delimiter character for each value in multi-valued columns. The default value is “;”.
Below is a sample config file.
Sample Schema:
You can use curl to push a segment to pinot:
Alternatively you can use the pinot-admin.sh utility to upload one or more segments:
The command uploads all the segments found in segmentDirectoryPath. The segments could be either tar-compressed (in which case it is a file under segmentDirectoryPath) or uncompressed (in which case it is a directory under segmentDirectoryPath).
$ cd pinot
#
# ensure you are starting from the latest code base
# the following steps, ensure your fork's (origin's) master is up-to-date
#
$ git fetch upstream
$ git checkout master
$ git merge upstream/master
# create a branch for your issue
$ git checkout -b <your issue branch>
mvn license:format
mvn checkstyle:check
$ git add <files required for the change>
$ git commit -m "Meaningful oneliner for the change"
$ git push origin <your issue branch>
After this, create a Pull Request in github (https://github.com/apache/incubator-pinot/pulls). Include the following information in the description:
* The changes that are included in the PR.
* Design document, if any.
* Information on any implementation choices that were made.
* Evidence of sufficient testing. You MUST indicate the tests done, either manually or automated.
Once the PR is created, the code base is compiled and all tests are run via travis. Make sure you follow up on any issues flagged by travis and address them.
If you see test failures that are intermittent, please create an issue to track them.
Once the travis run is clear, request reviews from at least 2 committers on the project and be sure to gently follow up on the issue with the reviewers.
# commit your changes
$ git add <updated files>
$ git commit -m "Meaningful message for the udpate"
# pull new changes
$ git checkout master
$ git merge upstream/master
$ git checkout <your issue branch>
$ git rebase master
At this time, if rebase flags any conflicts, resolve the conflicts and follow the instructions provided by the rebase command.
Run additional tests/validations for the new changes and update the PR by pushing your changes:
$ git push origin <your issue branch>
# === Index segment creation job config ===
# path.to.input: Input directory containing Avro files
path.to.input=/user/pinot/input/data
# path.to.output: Output directory containing Pinot segments
path.to.output=/user/pinot/output
# path.to.schema: Schema file for the table, stored locally
path.to.schema=flights-schema.json
# segment.table.name: Name of the table for which to generate segments
segment.table.name=flights
# === Segment tar push job config ===
# push.to.hosts: Comma separated list of controllers host names to which to push
push.to.hosts=controller_host_0,controller_host_1
# push.to.port: The port on which the controller runs
push.to.port=8888
mvn clean install -DskipTests -Pbuild-shaded-jar
hadoop jar pinot-hadoop-<version>-SNAPSHOT-shaded.jar SegmentCreation job.properties
hadoop jar pinot-hadoop-<version>-SNAPSHOT-shaded.jar SegmentTarPush job.properties
bin/pinot-admin.sh CreateSegment -dataDir <input_data_dir> [-format [CSV/JSON/AVRO]] [-readerConfigFile <csv_config_file>] [-generatorConfigFile <generator_config_file>] -segmentName <segment_name> -schemaFile <input_schema_file> -tableName <table_name> -outDir <output_data_dir> [-overwrite]
{
"fileFormat": "EXCEL",
"header": "col1,col2,col3,col4",
"delimiter": "\t",
"multiValueDelimiter": ","
}
{
"schemaName": "flights",
"dimensionFieldSpecs": [
{
"name": "flightNumber",
"dataType": "LONG"
},
{
"name": "tags",
"dataType": "STRING",
"singleValueField": false
}
],
"metricFieldSpecs": [
{
"name": "price",
"dataType": "DOUBLE"
}
],
"timeFieldSpec": {
"incomingGranularitySpec": {
"name": "daysSinceEpoch",
"dataType": "INT",
"timeType": "DAYS"
}
}
}
curl -X POST -F segment=@<segment-tar-file-path> http://controllerHost:controllerPort/segments
pinot-tools/target/pinot-tools-pkg/bin//pinot-admin.sh UploadSegment -controllerHost <hostname> -controllerPort <port> -segmentDir <segmentDirectoryPath>
This quick start guide will show you how to run a Pinot cluster using Docker.
Create an isolated bridge network in docker
docker network create -d bridge pinot-demo
We'll be using our docker image apachepinot/pinot:latest to run this quick start, which does the following:
Sets up the Pinot cluster
Creates a sample table and loads sample data
There are 3 types of quick start examples.
Batch example
Streaming example
Hybrid example
In this example we demonstrate how to do batch processing with Pinot.
Starts Pinot deployment by starting
Apache Zookeeper
Pinot Controller
Pinot Broker
Pinot Server
Creates a demo table baseballStats
Launches a standalone data ingestion job
Builds one Pinot segment for a given CSV data file for table baseballStats
Pushes the built segment to the Pinot controller
Issues sample queries to Pinot
docker run \
--network=pinot-demo \
--name pinot-quickstart \
-p 9000:9000 \
-d apachepinot/pinot:latest QuickStart \
-type batch
Once the Docker container is running, you can view the logs by running the following command.
docker logs pinot-quickstart -f
That's it! We've spun up a Pinot cluster.
docker logs pinot-quickstart -f
Your cluster is ready once you see the cluster setup completion messages and sample queries, as demonstrated below.
You can head over to Exploring Pinot to check out the data in the baseballStats table.
In this example we demonstrate how to do stream processing with Pinot.
Starts Pinot deployment by starting
Apache Kafka
Apache Zookeeper
Pinot Controller
Pinot Broker
Pinot Server
Creates a demo table meetupRsvp
Launches a meetup stream
Publishes data to a Kafka topic meetupRSVPEvents to be subscribed to by Pinot
Issues sample queries to Pinot
# stop previous container, if any, or use different network
docker run \
--network=pinot-demo \
--name pinot-quickstart \
-p 9000:9000 \
-d apachepinot/pinot:latest QuickStart \
-type stream
Once the cluster is up, you can head over to Exploring Pinot to check out the data in the meetupRSVPEvents table.
In this example we demonstrate how to do hybrid stream and batch processing with Pinot.
Starts Pinot deployment by starting
Apache Kafka
Apache Zookeeper
Pinot Controller
Pinot Broker
Pinot Server
Creates a demo table airlineStats
Launches a standalone data ingestion job
Builds Pinot segments under a given directory of Avro files for table airlineStats
Pushes built segments to the Pinot controller
Launches a stream of flight stats
Publishes data to a Kafka topic airlineStatsEvents to be subscribed to by Pinot
Issues sample queries to Pinot
# stop previous container, if any, or use different network
docker run \
--network=pinot-demo \
--name pinot-quickstart \
-p 9000:9000 \
-d apachepinot/pinot:latest QuickStart \
-type hybrid
Once the cluster is up, you can head over to Exploring Pinot to check out the data in the airlineStats table.
Step-by-step guide on pushing your own data into the Pinot cluster
So far, we have set up our cluster, run some queries, and explored the admin endpoints. Now it's time to get our own data into Pinot.
Let's gather our data files and put them in pinot-quick-start/rawdata.
Supported file formats are CSV, JSON, AVRO, PARQUET, THRIFT and ORC. If you don't have sample data, you can use this sample CSV.
A schema is used to define the columns and data types of the Pinot table. A detailed overview can be found in the schema documentation.
Briefly, we categorize our columns into 3 types
For example, in our sample table, the playerID, yearID, teamID, league, playerName columns are the dimensions, the playerStint, numberOfgames, numberOfGamesAsBatter, AtBatting, runs, hits, doules, triples, homeRuns, runsBattedIn, stolenBases, caughtStealing, baseOnBalls, strikeouts, intentionalWalks, hitsByPitch, sacrificeHits, sacrificeFlies, groundedIntoDoublePlays, G_old columns are the metrics, and there is no time column.
Once you have identified the dimensions, metrics and time columns, create a schema for your data, using the reference below.
A table config is used to define the config related to the Pinot table. A detailed overview can be found in the table documentation.
Here's the table config for the sample CSV file. You can use this as a reference to build your own table config. Simply edit the tableName and schemaName.
Check the directory structure so far
Upload the table config using the following command
Check out the table config and schema via the Rest API to make sure they were successfully uploaded.
A Pinot table's data is stored as Pinot segments. A detailed overview can be found in the segment documentation.
To generate a segment, we first need to create a job spec yaml file. The job spec yaml file has all the information regarding data format, input data location and Pinot cluster coordinates. You can just copy over this job spec file. If you're using your own data, be sure to 1) replace transcript with your table name and 2) set the right recordReaderSpec.
Use the following command to generate a segment and upload it
Sample output
Check that your segment made it to the table using the Rest API.
You're all set! You should see your table in the Query Console and be able to run queries against it now.
mkdir -p /tmp/pinot-quick-start/rawdata
studentID,firstName,lastName,gender,subject,score,timestamp
200,Lucy,Smith,Female,Maths,3.8,1570863600000
200,Lucy,Smith,Female,English,3.5,1571036400000
201,Bob,King,Male,Maths,3.2,1571900400000
202,Nick,Young,Male,Physics,3.6,1572418800000
Column Type | Description
------------+---------------------------------------------------------------------------
 Dimensions | Typically used in filters and group by, for slicing and dicing into data
 Metrics    | Typically used in aggregations, represents the quantitative data
 Time       | Optional column, represents the timestamp associated with each row
{
"schemaName": "transcript",
"dimensionFieldSpecs": [
{
"name": "studentID",
"dataType": "INT"
},
{
"name": "firstName",
"dataType": "STRING"
},
{
"name": "lastName",
"dataType": "STRING"
},
{
"name": "gender",
"dataType": "STRING"
},
{
"name": "subject",
"dataType": "STRING"
}
],
"metricFieldSpecs": [
{
"name": "score",
"dataType": "FLOAT"
}
],
"dateTimeFieldSpecs": [{
"name": "timestamp",
"dataType": "LONG",
"format" : "1:MILLISECONDS:EPOCH",
"granularity": "1:MILLISECONDS"
}]
}
{
"tableName": "transcript",
"segmentsConfig" : {
"timeColumnName": "timestamp",
"timeType": "MILLISECONDS",
"replication" : "1",
"schemaName" : "transcript"
},
"tableIndexConfig" : {
"invertedIndexColumns" : [],
"loadMode" : "MMAP"
},
"tenants" : {
"broker":"DefaultTenant",
"server":"DefaultTenant"
},
"tableType":"OFFLINE",
"metadata": {}
}
$ ls /tmp/pinot-quick-start
rawdata transcript-schema.json transcript-table-offline.json
$ ls /tmp/pinot-quick-start/rawdata
transcript.csv
docker run --rm -ti \
--network=pinot-demo \
-v /tmp/pinot-quick-start:/tmp/pinot-quick-start \
--name pinot-batch-table-creation \
apachepinot/pinot:latest AddTable \
-schemaFile /tmp/pinot-quick-start/transcript-schema.json \
-tableConfigFile /tmp/pinot-quick-start/transcript-table-offline.json \
-controllerHost pinot-quickstart \
-controllerPort 9000 -exec
bin/pinot-admin.sh AddTable \
-tableConfigFile /tmp/pinot-quick-start/transcript-table-offline.json \
-schemaFile /tmp/pinot-quick-start/transcript-schema.json -exec
executionFrameworkSpec:
name: 'standalone'
segmentGenerationJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner'
segmentTarPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentTarPushJobRunner'
segmentUriPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentUriPushJobRunner'
jobType: SegmentCreationAndTarPush
inputDirURI: '/tmp/pinot-quick-start/rawdata/'
includeFileNamePattern: 'glob:**/*.csv'
outputDirURI: '/tmp/pinot-quick-start/segments/'
overwriteOutput: true
pinotFSSpecs:
- scheme: file
className: org.apache.pinot.spi.filesystem.LocalPinotFS
recordReaderSpec:
dataFormat: 'csv'
className: 'org.apache.pinot.plugin.inputformat.csv.CSVRecordReader'
configClassName: 'org.apache.pinot.plugin.inputformat.csv.CSVRecordReaderConfig'
tableSpec:
tableName: 'transcript'
schemaURI: 'http://localhost:9000/tables/transcript/schema'
tableConfigURI: 'http://localhost:9000/tables/transcript'
pinotClusterSpecs:
- controllerURI: 'http://localhost:9000'
executionFrameworkSpec:
name: 'standalone'
segmentGenerationJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner'
segmentTarPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentTarPushJobRunner'
segmentUriPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentUriPushJobRunner'
jobType: SegmentCreationAndTarPush
inputDirURI: '/tmp/pinot-quick-start/rawdata/'
includeFileNamePattern: 'glob:**/*.csv'
outputDirURI: '/tmp/pinot-quick-start/segments/'
overwriteOutput: true
pinotFSSpecs:
- scheme: file
className: org.apache.pinot.spi.filesystem.LocalPinotFS
recordReaderSpec:
dataFormat: 'csv'
className: 'org.apache.pinot.plugin.inputformat.csv.CSVRecordReader'
configClassName: 'org.apache.pinot.plugin.inputformat.csv.CSVRecordReaderConfig'
tableSpec:
tableName: 'transcript'
schemaURI: 'http://pinot-quickstart:9000/tables/transcript/schema'
tableConfigURI: 'http://pinot-quickstart:9000/tables/transcript'
pinotClusterSpecs:
- controllerURI: 'http://pinot-quickstart:9000'
docker run --rm -ti \
--network=pinot-demo \
-v /tmp/pinot-quick-start:/tmp/pinot-quick-start \
--name pinot-data-ingestion-job \
apachepinot/pinot:latest LaunchDataIngestionJob \
-jobSpecFile /tmp/pinot-quick-start/docker-job-spec.yml
bin/pinot-admin.sh LaunchDataIngestionJob \
-jobSpecFile /tmp/pinot-quick-start/batch-job-spec.yml
SegmentGenerationJobSpec:
!!org.apache.pinot.spi.ingestion.batch.spec.SegmentGenerationJobSpec
excludeFileNamePattern: null
executionFrameworkSpec: {extraConfigs: null, name: standalone, segmentGenerationJobRunnerClassName: org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner,
segmentTarPushJobRunnerClassName: org.apache.pinot.plugin.ingestion.batch.standalone.SegmentTarPushJobRunner,
segmentUriPushJobRunnerClassName: org.apache.pinot.plugin.ingestion.batch.standalone.SegmentUriPushJobRunner}
includeFileNamePattern: glob:**\/*.csv
inputDirURI: /tmp/pinot-quick-start/rawdata/
jobType: SegmentCreationAndTarPush
outputDirURI: /tmp/pinot-quick-start/segments
overwriteOutput: true
pinotClusterSpecs:
- {controllerURI: 'http://localhost:9000'}
pinotFSSpecs:
- {className: org.apache.pinot.spi.filesystem.LocalPinotFS, configs: null, scheme: file}
pushJobSpec: null
recordReaderSpec: {className: org.apache.pinot.plugin.inputformat.csv.CSVRecordReader,
configClassName: org.apache.pinot.plugin.inputformat.csv.CSVRecordReaderConfig,
configs: null, dataFormat: csv}
segmentNameGeneratorSpec: null
tableSpec: {schemaURI: 'http://localhost:9000/tables/transcript/schema', tableConfigURI: 'http://localhost:9000/tables/transcript',
tableName: transcript}
Trying to create instance for class org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner
Initializing PinotFS for scheme file, classname org.apache.pinot.spi.filesystem.LocalPinotFS
Finished building StatsCollector!
Collected stats for 4 documents
Using fixed bytes value dictionary for column: studentID, size: 9
Created dictionary for STRING column: studentID with cardinality: 3, max length in bytes: 3, range: 200 to 202
Using fixed bytes value dictionary for column: firstName, size: 12
Created dictionary for STRING column: firstName with cardinality: 3, max length in bytes: 4, range: Bob to Nick
Using fixed bytes value dictionary for column: lastName, size: 15
Created dictionary for STRING column: lastName with cardinality: 3, max length in bytes: 5, range: King to Young
Created dictionary for FLOAT column: score with cardinality: 4, range: 3.2 to 3.8
Using fixed bytes value dictionary for column: gender, size: 12
Created dictionary for STRING column: gender with cardinality: 2, max length in bytes: 6, range: Female to Male
Using fixed bytes value dictionary for column: subject, size: 21
Created dictionary for STRING column: subject with cardinality: 3, max length in bytes: 7, range: English to Physics
Created dictionary for LONG column: timestamp with cardinality: 4, range: 1570863600000 to 1572418800000
Start building IndexCreator!
Finished records indexing in IndexCreator!
Finished segment seal!
Converting segment: /var/folders/3z/qn6k60qs6ps1bb6s2c26gx040000gn/T/pinot-1583443148720/output/transcript_OFFLINE_1570863600000_1572418800000_0 to v3 format
v3 segment location for segment: transcript_OFFLINE_1570863600000_1572418800000_0 is /var/folders/3z/qn6k60qs6ps1bb6s2c26gx040000gn/T/pinot-1583443148720/output/transcript_OFFLINE_1570863600000_1572418800000_0/v3
Deleting files in v1 segment directory: /var/folders/3z/qn6k60qs6ps1bb6s2c26gx040000gn/T/pinot-1583443148720/output/transcript_OFFLINE_1570863600000_1572418800000_0
Starting building 1 star-trees with configs: [StarTreeV2BuilderConfig[splitOrder=[studentID, firstName],skipStarNodeCreation=[],functionColumnPairs=[org.apache.pinot.core.startree.v2.AggregationFunctionColumnPair@3a48efdc],maxLeafRecords=1]] using OFF_HEAP builder
Starting building star-tree with config: StarTreeV2BuilderConfig[splitOrder=[studentID, firstName],skipStarNodeCreation=[],functionColumnPairs=[org.apache.pinot.core.startree.v2.AggregationFunctionColumnPair@3a48efdc],maxLeafRecords=1]
Generated 3 star-tree records from 4 segment records
Finished constructing star-tree, got 9 tree nodes and 4 records under star-node
Finished creating aggregated documents, got 6 aggregated records
Finished building star-tree in 10ms
Finished building 1 star-trees in 27ms
Computed crc = 3454627653, based on files [/var/folders/3z/qn6k60qs6ps1bb6s2c26gx040000gn/T/pinot-1583443148720/output/transcript_OFFLINE_1570863600000_1572418800000_0/v3/columns.psf, /var/folders/3z/qn6k60qs6ps1bb6s2c26gx040000gn/T/pinot-1583443148720/output/transcript_OFFLINE_1570863600000_1572418800000_0/v3/index_map, /var/folders/3z/qn6k60qs6ps1bb6s2c26gx040000gn/T/pinot-1583443148720/output/transcript_OFFLINE_1570863600000_1572418800000_0/v3/metadata.properties, /var/folders/3z/qn6k60qs6ps1bb6s2c26gx040000gn/T/pinot-1583443148720/output/transcript_OFFLINE_1570863600000_1572418800000_0/v3/star_tree_index, /var/folders/3z/qn6k60qs6ps1bb6s2c26gx040000gn/T/pinot-1583443148720/output/transcript_OFFLINE_1570863600000_1572418800000_0/v3/star_tree_index_map]
Driver, record read time : 0
Driver, stats collector time : 0
Driver, indexing time : 0
Tarring segment from: /var/folders/3z/qn6k60qs6ps1bb6s2c26gx040000gn/T/pinot-1583443148720/output/transcript_OFFLINE_1570863600000_1572418800000_0 to: /var/folders/3z/qn6k60qs6ps1bb6s2c26gx040000gn/T/pinot-1583443148720/output/transcript_OFFLINE_1570863600000_1572418800000_0.tar.gz
Size for segment: transcript_OFFLINE_1570863600000_1572418800000_0, uncompressed: 6.73KB, compressed: 1.89KB
Trying to create instance for class org.apache.pinot.plugin.ingestion.batch.standalone.SegmentTarPushJobRunner
Initializing PinotFS for scheme file, classname org.apache.pinot.spi.filesystem.LocalPinotFS
Start pushing segments: [/tmp/pinot-quick-start/segments/transcript_OFFLINE_1570863600000_1572418800000_0.tar.gz]... to locations: [org.apache.pinot.spi.ingestion.batch.spec.PinotClusterSpec@243c4f91] for table transcript
Pushing segment: transcript_OFFLINE_1570863600000_1572418800000_0 to location: http://localhost:9000 for table transcript
Sending request: http://localhost:9000/v2/segments?tableName=transcript to controller: nehas-mbp.hsd1.ca.comcast.net, version: Unknown
Response for pushing table transcript segment transcript_OFFLINE_1570863600000_1572418800000_0 to location http://localhost:9000 - 200: {"status":"Successfully uploaded segment: transcript_OFFLINE_1570863600000_1572418800000_0 of table: transcript"}
This page covers everything you need to know about how queries are computed in Pinot's distributed systems architecture.
This page will introduce you to the guiding principles behind the design of Apache Pinot. Here you will learn the distributed systems architecture that allows Pinot to scale the performance of queries linearly based on the number of nodes in a cluster. You'll also be introduced to the two different types of tables used to ingest and query data in offline (batch) or real-time (stream) mode.
Pinot was designed by engineers at LinkedIn and Uber to scale query performance based on the number of nodes in a cluster. As you add more nodes, query performance improves in proportion to the expected queries-per-second quota. To achieve horizontal scalability to an unbounded number of nodes and amount of data storage, without performance degradation, the following guiding design principles were established.
Highly available: Pinot is built to serve low latency analytical queries for customer facing applications. By design, there is no single point of failure in Pinot. The system continues to serve queries when a node goes down.
Horizontally scalable: Ability to scale by adding new nodes as a workload changes.
Latency vs Storage: Pinot is built to provide low latency even at high throughput. Features such as segment assignment strategy, routing strategy, and star-tree indexing were developed to achieve this.
Immutable data: Pinot assumes that all data stored is immutable. For GDPR compliance, we provide an add-on solution for purging data while maintaining performance guarantees.
Dynamic configuration changes: Operations such as adding new tables, expanding a cluster, ingesting data, modifying indexing config, and re-balancing must be performed without impacting query availability or performance.
As described in the concepts, Pinot has multiple distributed system components: Controller, Broker, Server, and Minion.
Pinot uses Apache Helix for cluster management. Helix is embedded as an agent within the different components and uses Apache Zookeeper for coordination and maintaining the overall cluster state and health.
All Pinot servers and brokers are managed by Helix. Helix is a generic cluster management framework to manage partitions and replicas in a distributed system. It's helpful to think of Helix as an event-driven discovery service with push and pull notifications that drives the state of a cluster to an ideal configuration. A finite-state machine maintains a contract of stateful operations that drives the health of the cluster towards its optimal configuration. Query load is optimized as Helix updates routing configurations between nodes based on where data is stored in the cluster.
Helix divides nodes into three logical components based on their responsibilities:
Participant: These are the nodes in the cluster that actually host the distributed storage resources.
Spectator: These nodes observe the current state of each participant and route requests accordingly. Routers, for example, need to know the instance on which a partition is hosted and its state in order to route the request to the appropriate endpoint. Routing is continually changed to optimize cluster performance as storage primitives are added and changed.
Controller: The controller observes and manages the state of participant nodes. The controller is responsible for coordinating all state transitions in the cluster and ensures that state constraints are satisfied while maintaining cluster stability.
Helix uses Zookeeper to maintain cluster state. Each component in a Pinot cluster takes a Zookeeper address as a startup parameter. The various components that are distributed in a Pinot cluster watch Zookeeper notifications and issue updates via their embedded Helix agents.
Component-to-Helix mapping:
Segment: Modeled as a Helix Partition. Each segment can have multiple copies, referred to as replicas.
Table: Modeled as a Helix Resource. Multiple segments are grouped into a single table. All segments belonging to a Pinot table have the same schema.
Controller: Embeds the Helix agent that drives the overall state of the cluster.
Server: A server is modeled as a Helix Participant and hosts segments.
Broker: A broker is modeled as a Helix Spectator that observes the cluster for changes in the state of segments and servers. In order to support multi-tenancy, brokers are also modeled as Helix Participants.
Minion: A Pinot minion is modeled as a Helix Participant.
Helix agents use Zookeeper to store and update configurations, as well as for distributed coordination. Zookeeper stores the following information about the cluster:
Resources and their stored properties:
Controller: the controller that is assigned as the current leader.
Servers/Brokers: a list of servers/brokers and their configuration; health status.
Tables: list of tables; table configurations; table schema information; list of segments within a table.
Segment: exact server location(s) of a segment (routing table); state of each segment (online/offline/error/consuming); metadata about each segment.
Knowing the ZNode layout structure in Zookeeper for Helix agents in a cluster is useful for operations and/or troubleshooting cluster state and health.
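As a hedged illustration (the cluster name PinotCluster and the Zookeeper address are assumptions), the Helix ZNode layout can be browsed with ZooKeeper's own zkCli.sh:
bin/zkCli.sh -server localhost:2181
# inside the ZooKeeper shell: list the cluster's Helix znodes
ls /PinotCluster
ls /PinotCluster/CONFIGS/PARTICIPANT
ls /PinotCluster/EXTERNALVIEW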
Pinot's controller acts as the driver of the cluster's overall state and health. Because of its role as a Helix participant and spectator, which drives the state of other components, it is the first component that is typically started after Zookeeper. Two parameters are required for starting a controller: Zookeeper address and cluster name. The controller will automatically create a cluster via Helix if it does not yet exist.
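A minimal sketch of starting a controller with these two parameters via pinot-admin.sh (the Zookeeper address, cluster name, and port below are illustrative assumptions):
bin/pinot-admin.sh StartController \
  -zkAddress localhost:2181 \
  -clusterName PinotCluster \
  -controllerPort 9000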
To achieve fault tolerance, one can start multiple controllers (typically three) and one of them will act as a leader. If the leader crashes or dies, another leader is automatically elected. Leader election is achieved using Apache Helix. Having at least one controller is required to perform any DDL-equivalent operation on the cluster, such as adding a table or a segment.
The controller does not interfere with query execution. Query execution is not impacted even when all controller nodes are offline. If all controller nodes are offline, the state of the cluster will stay as it was when the last leader went down. When a new leader comes online, the cluster resumes rebalancing activity and can accept new tables or segments.
The controller provides a REST interface to perform CRUD operations on all logical storage resources (servers, brokers, tables, and segments).
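For example, assuming a controller at localhost:9000, the tables in the cluster can be listed through this REST interface:
curl -X GET http://localhost:9000/tables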
The responsibility of the broker is to route a given query to an appropriate server instance. A broker collects and merges the responses from all servers into a final result and sends it back to the requesting client. The broker provides HTTP endpoints that accept SQL queries and return the response in JSON format.
Brokers need three key things to start (a sample start command follows this list):
Cluster name
Zookeeper address
Broker instance name
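A minimal sketch of starting a broker via pinot-admin.sh (the values below are illustrative assumptions; the broker instance name is derived from the broker host and port by default):
bin/pinot-admin.sh StartBroker \
  -zkAddress localhost:2181 \
  -clusterName PinotCluster \
  -brokerPort 8099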
At startup, a broker registers as a Helix Participant and awaits notifications from other Helix agents. These notifications cover table creation, a new segment being loaded, a server starting up or going down, and any configuration changes.
Service Discovery/Routing Table
Irrespective of the kind of notification, the key responsibility of a broker is to maintain the query routing table. The query routing table is simply a mapping between segments and the servers that a segment resides on. Typically, a segment resides on more than one server. The broker computes multiple routing tables depending on the configured routing strategy for a table. The default strategy is to balance the query load across all available servers.
//This is an example ZNode config for EXTERNAL VIEW in Helix
{
"id" : "baseballStats_OFFLINE",
"simpleFields" : {
...
},
"mapFields" : {
"baseballStats_OFFLINE_0" : {
"Server_10.1.10.82_7000" : "ONLINE"
}
},
...
}
Query processing
For every query, a cluster's broker performs the following:
Fetches the routes that are computed for a query based on the routing strategy defined in a table's configuration.
Computes the list of segments to query on each server.
Scatter-Gather: sends the requests to each server and gathers the responses.
Merge: merges the query results returned from each server.
Sends the query result to the client.
// Query: select count(*) from baseballStats limit 10
// RESPONSE
// ========
{
"resultTable": {
"dataSchema": {
"columnDataTypes": ["LONG"],
"columnNames": ["count(*)"]
},
"rows": [
[97889]
]
},
"exceptions": [],
"numServersQueried": 1,
"numServersResponded": 1,
"numSegmentsQueried": 1,
"numSegmentsProcessed": 1,
"numSegmentsMatched": 1,
"numConsumingSegmentsQueried": 0,
"numDocsScanned": 97889,
"numEntriesScannedInFilter": 0,
"numEntriesScannedPostFilter": 0,
"numGroupsLimitReached": false,
"totalDocs": 97889,
"timeUsedMs": 5,
"segmentStatistics": [],
"traceInfo": {},
"minConsumingFreshnessTimeMs": 0
}
Fault tolerance
Broker instances scale horizontally without an upper bound. In a majority of cases, only three brokers are required. If most query results that are returned to a client are <1MB in size per query, one can run a broker and servers inside the same instance container. This lowers the overall footprint of a cluster deployment for use cases that do not need to guarantee a strict SLA on query performance in production.
Servers host segments and do most of the heavy lifting during query processing. Though the architecture shows that there are two kinds of servers, real-time and offline, a server does not really know if it's going to be a real-time server or an offline server. The responsibility of a server depends on the table assignment strategy.
Offline servers
Offline servers typically host segments that are immutable. In this case, segments are created outside of the cluster and uploaded via a shell-based curl request. Based on the replication factor and the segment assignment strategy, the controller picks one or more servers to host the segment. Servers are notified via Helix about the new segments. Servers fetch the segments from deep store and load them before being ready to serve query requests. At this point, the cluster's broker detects that new segments are available and starts including them in query responses.
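A hedged sketch of such a curl upload, using the /v2/segments endpoint that appears in the ingestion log earlier on this page (the controller address, table name, segment file, and multipart field name are assumptions):
curl -X POST \
  -F segment=@transcript_OFFLINE_1570863600000_1572418800000_0.tar.gz \
  "http://localhost:9000/v2/segments?tableName=transcript"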
Real-time servers
Real-time servers are different from offline servers. Real-time server nodes ingest data from streaming sources, such as Kafka, and generate indexed segments in memory (flushing segments to disk periodically). In-memory segments are also known as consuming segments. These consuming segments get flushed periodically based on a completion threshold (number of rows, time, or segment size). At this point, they are known as completed segments. Completed segments are similar to an offline server's segments. Queries go over both the in-flight (consuming) segments and the completed segments.
Minion is an optional component and is not required to get started with Pinot. Minion is used for purging data from a Pinot cluster (for reasons such as GDPR compliance in the UK).
Within Pinot, a logical table is modeled as one of two types of physical tables: offline or real-time. The reason for having two types of tables is because each one follows a different state model.
Real-time and offline tables provide different configuration options for indexing and, in the case of real-time tables, the connector properties for the stream data source (i.e. Kafka). Table types also allow users to use different containers for real-time and offline server nodes. For instance, offline servers might use virtual machines with larger storage capacity, whereas real-time servers might need higher system memory and/or more CPU cores.
The two types of tables also scale differently.
Real-time tables have a smaller retention period and scale query performance based on the ingestion rate.
Offline tables have a larger retention period and scale performance based on the size of stored data.
There are a few things to keep in mind when configuring the different types of tables for your workloads. When ingesting data from the same source, you can have two tables that ingest the same data that are configured differently for real-time and offline queries. Even though the two tables have the same data, performance will scale differently for queries based on your requirements. In this scenario, real-time and offline tables must share the same schema.
In batch mode, data is ingested into Pinot via an ingestion job. An ingestion job transforms a raw data source (such as a CSV file) into segments. Once segments are generated for the imported data, an ingestion job stores them into the cluster's segment store (a.k.a deep store) and notifies the controller. The notification is processed and the result is that the Helix agent on the controller updates the ideal state configuration in Zookeeper. Helix will then notify the offline server that there are new segments available. In response to the notification from the controller, the offline server downloads the newly created segments directly from the cluster's segment store. The cluster's broker, which watches for state changes in Helix, detects the new segments and adds them to the list of segments to query (segment-to-server routing table).
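A minimal sketch of launching such an ingestion job with pinot-admin.sh (the yaml job spec path is an assumption; the spec defines the segment generation and segment push steps):
bin/pinot-admin.sh LaunchDataIngestionJob \
  -jobSpecFile /tmp/pinot-quick-start/batch-job-spec.yml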
At table creation, a controller creates a new entry in Zookeeper for the consuming segment. Helix notices the new segment and notifies the real-time server, which starts consuming data from the streaming source. The broker, which watches for changes, detects the new segments and adds them to the list of segments to query (segment-to-server routing table).
Whenever the segment is complete (i.e. full), the real-time server notifies the Controller, which checks with all replicas and picks a winner to commit the segment to. The winner commits the segment and uploads it to the cluster's segment store, updating the state of the segment from "consuming" to "online". The controller then prepares a new segment in a "consuming" state.
Queries are received by brokers, which check the request against the segment-to-server routing table and scatter the request between the real-time and offline servers.
The two tables then process the request by filtering and aggregating the queried data, which is then returned to the broker. Finally, the broker gathers all of the pieces of the query response together and returns the result to the client.
The 0.3.0 release of Apache Pinot introduces the concept of plugins, which make it easy to extend Pinot and integrate it with other systems.
The reason behind the architectural change between the previous release (0.2.0) and this release (0.3.0) is extensibility. The 0.2.0 release was not flexible enough to support new storage types or new stream types; adding new functionality required changing too much code. The Pinot team therefore went through an extensive refactoring and improvement of the source code.
For instance, the picture below shows the module dependencies of the 0.2.X and previous releases. If we wanted to support a new storage type, we would have had to change several modules. Pretty bad, huh?
To address this challenge, the following major changes were made:
Refactored common interfaces into the pinot-spi module
Settled on four types of plugin modules:
Pinot input format: how to read records from various data/file formats, e.g. Avro/CSV/JSON/ORC/Parquet/Thrift
Pinot filesystem: how to operate on files in various filesystems, e.g. Azure Data Lake/Google Cloud Storage/S3/HDFS
Pinot stream ingestion: how to ingest a data stream from various upstream systems, e.g. Kafka/Kinesis/Eventhub
Pinot batch ingestion: how to run Pinot batch ingestion jobs in various frameworks, like Standalone, Hadoop, Spark.
Built shaded jars for each individual plugin
Added support to dynamically load pinot plugins at server startup time
The architecture now supports a plug-and-play fashion, where new tools can be supported with small, simple extensions, without affecting big chunks of code. Integrations with new streaming services and data formats can be developed in a much simpler and more convenient way.
SQL Support
Added support for DISTINCT (#4535)
Added support for default value for BYTES column (#4583)
JDK 11 Support
Added support to tune size vs accuracy for approximation aggregation functions: DistinctCountHLL, PercentileEst, PercentileTDigest (#4666)
Added Data Anonymizer Tool (#4747)
Deprecated pinot-hadoop and pinot-spark modules, replaced with pinot-batch-ingestion-hadoop and pinot-batch-ingestion-spark
Support STRING and BYTES for no-dictionary columns in realtime consuming segments (#4791)
Made pinot-distribution build a pinot-all jar and assemble it (#4977)
Added support for case-insensitive PQL (#4983)
Added experimental support for Text Search (#4993)
Upgraded Helix to version 0.9.4; task management now works as expected (#5020)
Added date_trunc transformation function (#4740)
Support schema evolution for consuming segments (#4954)
APIs Additions/Changes
Pinot Controller Rest APIs
Get Table leader controller resource (#4545)
Support HTTP POST/PUT to upload JSON encoded schema (#4639)
Table rebalance API now requires both table name and type as parameters. (#4824)
Refactored Segments APIs (#4806)
Added segment batch deletion REST API (#4828)
Update schema API to reload table on schema change when applicable (#4838)
Enhance the task related REST APIs (#5054)
Added PinotClusterConfig REST APIs (#5073)
GET /cluster/configs
POST /cluster/configs
DELETE /cluster/configs/{configName}
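Hedged examples of calling the cluster config endpoints listed above (the controller address and the config name/value in the payload are assumptions):
curl -X GET http://localhost:9000/cluster/configs
curl -X POST -H "Content-Type: application/json" \
  -d '{"pinot.broker.enable.query.limit.override": "true"}' \
  http://localhost:9000/cluster/configs
curl -X DELETE http://localhost:9000/cluster/configs/pinot.broker.enable.query.limit.override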
Configurations Additions/Changes
Config controller.host is now optional in Pinot Controller
Added instance config queriesDisabled to disable query sending to a running server (#4767)
Added broker config pinot.broker.enable.query.limit.override for a configurable max query response size (#5040)
Removed deprecated server configs (#4903)
pinot.server.starter.enableSegmentsLoadingCheck
pinot.server.starter.timeoutInSeconds
pinot.server.instance.enable.shutdown.delay
pinot.server.instance.starter.maxShutdownWaitTime
pinot.server.instance.starter.checkIntervalTime
Decoupled server instance id from hostname/port config (#4995)
Added FieldConfig to encapsulate encoding and indexing info for a field (#5006)
Fixed the bug of releasing the segment when there are still threads working on it. (#4764)
Fixed the bug of uneven task distribution for threads (#4793)
Fixed encryption for .tar.gz segment file upload (#4855)
Fixed controller REST API to download segments from non-local FS (#4808)
Fixed the bug of not releasing segment lock if segment recovery throws exception (#4882)
Fixed the issue of server not registering state model factory before connecting the Helix manager (#4929)
Fixed the exception in server instance when Helix starts a new ZK session (#4976)
Fixed ThreadLocal DocIdSet issue in ExpressionFilterOperator (#5114)
Fixed the bug in default value provider classes (#5137)
Fixed the bug when no segment exists in RealtimeSegmentSelector (#5138)
We are in the process of supporting text search query functionalities.
Upgrading from version 0.1.0 to this release is disruptive because of protocol changes between the Pinot broker and server. Please ensure that you upgrade to release 0.2.0 first, then upgrade to this version.
If you build your own startable or WAR without using the scripts generated by the pinot-distribution module: for Java 8, the environment variable "plugins.dir" is required for Pinot to find out where to load all the Pinot plugin jars; for Java 11, the plugins directory must be explicitly added to the classpath. Please see pinot-admin.sh as an example.
As always, we recommend that you upgrade controllers first, and then brokers and lastly the servers in order to have zero downtime in production clusters.
Kafka 0.9 is no longer included in the release distribution.
Pull request #4806 introduces a backward incompatible API change for segments management.
Removed segment toggle APIs
Removed list all segments in cluster APIs
Deprecated the following APIs:
GET /tables/{tableName}/segments
GET /tables/{tableName}/segments/metadata
GET /tables/{tableName}/segments/crc
GET /tables/{tableName}/segments/{segmentName}
GET /tables/{tableName}/segments/{segmentName}/metadata
GET /tables/{tableName}/segments/{segmentName}/reload
POST /tables/{tableName}/segments/{segmentName}/reload
GET /tables/{tableName}/segments/reload
POST /tables/{tableName}/segments/reload
Pull request #5054 deprecated the following task-related APIs:
GET:
/tasks/taskqueues: List all task queues
/tasks/taskqueuestate/{taskType} -> /tasks/{taskType}/state
/tasks/tasks/{taskType} -> /tasks/{taskType}/tasks
/tasks/taskstates/{taskType} -> /tasks/{taskType}/taskstates
/tasks/taskstate/{taskName} -> /tasks/task/{taskName}/taskstate
/tasks/taskconfig/{taskName} -> /tasks/task/{taskName}/taskconfig
PUT:
/tasks/scheduletasks -> POST /tasks/schedule
/tasks/cleanuptasks/{taskType} -> /tasks/{taskType}/cleanup
/tasks/taskqueue/{taskType}: Toggle a task queue
DELETE:
/tasks/taskqueue/{taskType} -> /tasks/{taskType}
Deprecated modules pinot-hadoop and pinot-spark and replaced with pinot-batch-ingestion-hadoop and pinot-batch-ingestion-spark.
Introduced new Pinot batch ingestion jobs and yaml based job specs to define segment generation jobs and segment push jobs.
You may see exceptions like the one below in Pinot brokers during a cluster upgrade; it is safe to ignore them.
2020/03/09 23:37:19.879 ERROR [HelixTaskExecutor] [CallbackProcessor@b808af5-pinot] [pinot-broker] [] Message cannot be processed: 78816abe-5288-4f08-88c0-f8aa596114fe, {CREATE_TIMESTAMP=1583797034542, MSG_ID=78816abe-5288-4f08-88c0-f8aa596114fe, MSG_STATE=unprocessable, MSG_SUBTYPE=REFRESH_SEGMENT, MSG_TYPE=USER_DEFINE_MSG, PARTITION_NAME=fooBar_OFFLINE, RESOURCE_NAME=brokerResource, RETRY_COUNT=0, SRC_CLUSTER=pinot, SRC_INSTANCE_TYPE=PARTICIPANT, SRC_NAME=Controller_hostname.domain,com_9000, TGT_NAME=Broker_hostname,domain.com_6998, TGT_SESSION_ID=f6e19a457b80db5, TIMEOUT=-1, segmentName=fooBar_559, tableName=fooBar_OFFLINE}{}{}
java.lang.UnsupportedOperationException: Unsupported user defined message sub type: REFRESH_SEGMENT
at org.apache.pinot.broker.broker.helix.TimeboundaryRefreshMessageHandlerFactory.createHandler(TimeboundaryRefreshMessageHandlerFactory.java:68) ~[pinot-broker-0.2.1172.jar:0.3.0-SNAPSHOT-c9d88e47e02d799dc334d7dd1446a38d9ce161a3]
at org.apache.helix.messaging.handling.HelixTaskExecutor.createMessageHandler(HelixTaskExecutor.java:1096) ~[helix-core-0.9.1.509.jar:0.9.1.509]
at org.apache.helix.messaging.handling.HelixTaskExecutor.onMessage(HelixTaskExecutor.java:866) [helix-core-0.9.1.509.jar:0.9.1.509]
The response is returned in a SQL-like tabular structure. Note that this is the response returned from the standard-SQL endpoint. For the PQL endpoint response, skip to the PQL endpoint response section.
$ curl -H "Content-Type: application/json" -X POST \
-d '{"sql":"SELECT moo, bar, foo FROM myTable ORDER BY foo DESC"}' \
http://localhost:8099/query/sql
{
"exceptions": [],
"minConsumingFreshnessTimeMs": 0,
"numConsumingSegmentsQueried": 0,
"numDocsScanned": 6,
"numEntriesScannedInFilter": 0,
"numEntriesScannedPostFilter": 18,
"numGroupsLimitReached": false,
"numSegmentsMatched": 2,
"numSegmentsProcessed": 2,
"numSegmentsQueried": 2,
"numServersQueried": 1,
"numServersResponded": 1,
"resultTable": {
"dataSchema": {
"columnDataTypes": [
"LONG",
"INT",
"STRING"
],
"columnNames": [
"moo",
"bar",
"foo"
]
},
"rows": [
[
40015,
2019,
"xyz"
],
[
1002,
2001,
"pqr"
],
[
20555,
1988,
"pqr"
],
[
203,
2010,
"pqr"
],
[
500,
2008,
"abc"
],
[
60,
2003,
"abc"
]
]
},
"segmentStatistics": [],
"timeUsedMs": 4,
"totalDocs": 6,
"traceInfo": {}
}
$ curl -X POST \
-d '{"sql":"SELECT SUM(moo), MAX(bar), COUNT(*) FROM myTable"}' \
localhost:8099/query/sql -H "Content-Type: application/json"
{
"exceptions": [],
"minConsumingFreshnessTimeMs": 0,
"numConsumingSegmentsQueried": 0,
"numDocsScanned": 6,
"numEntriesScannedInFilter": 0,
"numEntriesScannedPostFilter": 12,
"numGroupsLimitReached": false,
"numSegmentsMatched": 2,
"numSegmentsProcessed": 2,
"numSegmentsQueried": 2,
"numServersQueried": 1,
"numServersResponded": 1,
"resultTable": {
"dataSchema": {
"columnDataTypes": [
"DOUBLE",
"DOUBLE",
"LONG"
],
"columnNames": [
"sum(moo)",
"max(bar)",
"count(*)"
]
},
"rows": [
[
62335,
2019.0,
6
]
]
},
"segmentStatistics": [],
"timeUsedMs": 87,
"totalDocs": 6,
"traceInfo": {}
}
$ curl -X POST \
-d '{"sql":"SELECT SUM(moo), MAX(bar) FROM myTable GROUP BY foo ORDER BY foo"}' \
localhost:8099/query/sql -H "Content-Type: application/json"
{
"exceptions": [],
"minConsumingFreshnessTimeMs": 0,
"numConsumingSegmentsQueried": 0,
"numDocsScanned": 6,
"numEntriesScannedInFilter": 0,
"numEntriesScannedPostFilter": 18,
"numGroupsLimitReached": false,
"numSegmentsMatched": 2,
"numSegmentsProcessed": 2,
"numSegmentsQueried": 2,
"numServersQueried": 1,
"numServersResponded": 1,
"resultTable": {
"dataSchema": {
"columnDataTypes": [
"STRING",
"DOUBLE",
"DOUBLE"
],
"columnNames": [
"foo",
"sum(moo)",
"max(bar)"
]
},
"rows": [
[
"abc",
560.0,
2008.0
],
[
"pqr",
21760.0,
2010.0
],
[
"xyz",
40015.0,
2019.0
]
]
},
"segmentStatistics": [],
"timeUsedMs": 15,
"totalDocs": 6,
"traceInfo": {}
}
where:
resultTable: This contains everything needed to process the response.
resultTable.dataSchema: This describes the schema of the response (columnNames and their dataTypes).
resultTable.dataSchema.columnNames: Column names in the response.
resultTable.dataSchema.columnDataTypes: Data types for each column.
resultTable.rows: Actual content with values. This is an array of arrays. The number of rows depends on the limit value in the query. The number of columns in each row is equal to the length of resultTable.dataSchema.columnNames.
timeUsedMs: Total time taken as seen by the broker before sending the response back to the client.
totalDocs: The number of documents/records in the table.
numServersQueried: The number of servers queried by the broker (note that this may be less than the total number of servers, since the broker can apply some optimizations to minimize the number of servers).
numServersResponded: This should be equal to numServersQueried. If it is not, one or more servers might have timed out. If numServersQueried != numServersResponded, the results can be considered partial and clients can retry the query with exponential back-off.
numSegmentsQueried: Total number of segments queried for this query. It may be less than the total number of segments, since the broker can apply optimizations.
numSegmentsMatched: This is the number of segments actually processed. This indicates the effectiveness of the pruning logic (based on partitioning, time, etc).
numSegmentsProcessed: Actual number of segments that were processed. This is where the majority of the time is spent.
numDocsScanned: The number of docs/records that were scanned to process the query. This includes the docs scanned in the filter phase (this can be zero if the columns in the query are indexed) and post filter.
numEntriesScannedInFilter: This, along with numEntriesScannedPostFilter, should give an idea of where most of the time is spent during query processing. If this is high, enabling indexing for the filter columns in the tableConfig is one way to bring it down.
numEntriesScannedPostFilter: This, along with numEntriesScannedInFilter, should give an idea of where most of the time is spent during query processing. A high number here means the selectivity is low (i.e. Pinot needs to scan a lot of records to answer the query). If this is high, adding a regular inverted/bitmap index will not help; consider using a star-tree index instead.
numGroupsLimitReached: If the query has a group by clause and top K, Pinot drops new entries after numGroupsLimit is reached. If this boolean is set to true, the query result may not be accurate. Note that the default value for numGroupsLimit is 100k and should be sufficient for most use cases.
exceptions: Will contain the stack trace if there is any exception processing the query.
segmentStatistics: N/A
traceInfo: If trace is enabled (can be enabled for each query), this will contain the timing for each stage and each segment. Advanced feature, intended for dev/debugging purposes.
Note: The PQL endpoint is deprecated and will soon be removed. The standard SQL endpoint is the recommended endpoint.
The response received from the PQL endpoint differs depending on the type of the query.
curl -X POST \
-d '{"pql":"select * from flights limit 3"}' \
http://localhost:8099/query
{
"selectionResults":{
"columns":[
"Cancelled",
"Carrier",
"DaysSinceEpoch",
"Delayed",
"Dest",
"DivAirports",
"Diverted",
"Month",
"Origin",
"Year"
],
"results":[
[
"0",
"AA",
"16130",
"0",
"SFO",
[],
"0",
"3",
"LAX",
"2014"
],
[
"0",
"AA",
"16130",
"0",
"LAX",
[],
"0",
"3",
"SFO",
"2014"
],
[
"0",
"AA",
"16130",
"0",
"SFO",
[],
"0",
"3",
"LAX",
"2014"
]
]
},
"traceInfo":{},
"numDocsScanned":3,
"aggregationResults":[],
"timeUsedMs":10,
"segmentStatistics":[],
"exceptions":[],
"totalDocs":102
}
curl -X POST \
-d '{"pql":"select count(*) from flights"}' \
http://localhost:8099/query
{
"traceInfo":{},
"numDocsScanned":17,
"aggregationResults":[
{
"function":"count_star",
"value":"17"
}
],
"timeUsedMs":27,
"segmentStatistics":[],
"exceptions":[],
"totalDocs":17
}
curl -X POST \
-d '{"pql":"select count(*) from flights group by Carrier"}' \
http://localhost:8099/query
{
"traceInfo":{},
"numDocsScanned":23,
"aggregationResults":[
{
"groupByResult":[
{
"value":"10",
"group":["AA"]
},
{
"value":"9",
"group":["VX"]
},
{
"value":"4",
"group":["WN"]
}
],
"function":"count_star",
"groupByColumns":["Carrier"]
}
],
"timeUsedMs":47,
"segmentStatistics":[],
"exceptions":[],
"totalDocs":23
}
Learn how to query Pinot using PQL
PQL is a derivative of SQL that supports selection, projection, aggregation, and grouping aggregation.
PQL is only a derivative of SQL and does not support joins or subqueries. To support them, we suggest relying on PrestoDB (https://prestodb.io/), although subqueries are not completely supported by PrestoDB at the time of writing.
The Pinot Query Language (PQL) is very similar to standard SQL:
SELECT COUNT(*) FROM myTable
SELECT COUNT(*), MAX(foo), SUM(bar) FROM myTable
SELECT MIN(foo), MAX(foo), SUM(foo), AVG(foo) FROM myTable
GROUP BY bar, baz LIMIT 50
SELECT MIN(foo), MAX(foo), SUM(foo), AVG(foo) FROM myTable
GROUP BY bar, baz
ORDER BY bar, MAX(foo) DESC LIMIT 50
SELECT COUNT(*) FROM myTable
WHERE foo = 'foo'
AND bar BETWEEN 1 AND 20
OR (baz < 42 AND quux IN ('hello', 'goodbye') AND quuux NOT IN (42, 69))
SELECT * FROM myTable
WHERE quux < 5
LIMIT 50
SELECT foo, bar FROM myTable
WHERE baz > 20
ORDER BY bar DESC
LIMIT 100
Note: results might not be consistent if the column being ordered by has the same value in multiple rows.
SELECT foo, bar FROM myTable
WHERE baz > 20
ORDER BY bar DESC
LIMIT 50, 100
To count rows where the column airlineName starts with U:
SELECT count(*) FROM SomeTable
WHERE regexp_like(airlineName, '^U.*')
GROUP BY airlineName TOP 10
As of now, functions have to be implemented within Pinot. Injecting functions is not allowed yet. The example below demonstrates the use of UDFs. More examples are in Transform Function in Aggregation Grouping.
SELECT count(*) FROM myTable
GROUP BY dateTimeConvert(timeColumnName, '1:MILLISECONDS:EPOCH', '1:HOURS:EPOCH', '1:HOURS')
Pinot supports queries on BYTES columns using HEX strings. The query response also uses hex strings to represent bytes values.
E.g. the query below fetches all the rows for a given UID.
SELECT * FROM myTable
WHERE UID = "c8b3bce0b378fc5ce8067fc271a34892"
The select statement is as follows
SELECT <outputColumn> (, outputColumn, outputColumn,...)
FROM <tableName>
(WHERE ... | GROUP BY ... | ORDER BY ... | TOP ... | LIMIT ...)
outputColumn can be * to project all columns, columns (foo, bar, baz), or aggregation functions like (MIN(foo), MAX(bar), AVG(baz)).
EQUALS
IN
NOT IN
GT
LT
BETWEEN
REGEXP_LIKE
For Multi-Valued columns, EQUALS is similar to CONTAINS.
COUNT
MIN
MAX
SUM
AVG
MINMAXRANGE
DISTINCT
DISTINCTCOUNT
DISTINCTCOUNTHLL
DISTINCTCOUNTRAWHLL: Returns HLL response serialized as string. The serialized HLL can be converted back into an HLL (see pinot-core/**/HllUtil.java as an example) and then aggregated with other HLLs. A common use case may be to merge HLL responses from different Pinot tables, or to allow aggregation after client-side batching.
FASTHLL (WARN: will be deprecated soon. FASTHLL stores serialized HyperLogLog in String format, which performs worse than DISTINCTCOUNTHLL, which supports serialized HyperLogLog in BYTES (byte array) format)
PERCENTILE[0-100]: e.g. PERCENTILE5, PERCENTILE50, PERCENTILE99, etc.
PERCENTILEEST[0-100]: e.g. PERCENTILEEST5, PERCENTILEEST50, PERCENTILEEST99, etc.
COUNTMV
MINMV
MAXMV
SUMMV
AVGMV
MINMAXRANGEMV
DISTINCTCOUNTMV
DISTINCTCOUNTHLLMV
DISTINCTCOUNTRAWHLLMV: Returns HLL response serialized as string. The serialized HLL can be converted back into an HLL (see pinot-core/**/HllUtil.java as an example) and then aggregated with other HLLs. A common use case may be to merge HLL responses from different Pinot tables, or to allow aggregation after client-side batching.
FASTHLLMV (WARN: will be deprecated soon. It does not make much sense to configure a serialized HyperLogLog column as a dimension)
PERCENTILE[0-100]MV: e.g. PERCENTILE5MV, PERCENTILE50MV, PERCENTILE99MV, etc.
PERCENTILEEST[0-100]MV: e.g. PERCENTILEEST5MV, PERCENTILEEST50MV, PERCENTILEEST99MV, etc.
Supported predicates are comparisons with a constant using the standard SQL operators (=, <, <=, >, >=, <>, !=), range comparisons using BETWEEN (foo BETWEEN 42 AND 69), set membership (foo IN (1, 2, 4, 8)) and exclusion (foo NOT IN (1, 2, 4, 8)). For BETWEEN, the range is inclusive.
Comparison with a regular expression is supported using the regexp_like function, as in WHERE regexp_like(columnName, 'regular expression')
The GROUP BY clause groups aggregation results by a list of columns, or transform functions on columns (see below).
The ORDER BY clause orders selection results or group by results by a list of columns. PQL supports ordering DESC or ASC.
The TOP n clause causes the 'n' largest group results to be returned. If not specified, the top 10 groups are returned.
The LIMIT n clause causes the selection results to contain at most 'n' results. The LIMIT a, b clause paginates the selection results, starting from the a-th result and returning at most 'b' results. By default, 10 records are returned in the result.
In aggregation and grouping, each column can be transformed from one or multiple columns. For example, the following query will calculate the maximum value of column foo divided by column bar, grouping on the column time converted from time unit MILLISECONDS to SECONDS:
SELECT MAX(DIV(foo, bar)) FROM myTable
GROUP BY DATETIMECONVERT(time, '1:MILLISECONDS:EPOCH', '1:SECONDS:EPOCH', '1:SECONDS')
Function reference:
ADD: Sum of at least two values
SUB: Difference between two values
MULT: Product of at least two values
DIV: Quotient of two values
MOD: Modulo of two values
ABS: Absolute value of a value
CEIL: Rounded up to the nearest integer
FLOOR: Rounded down to the nearest integer
EXP: Euler's number raised to the power of the value (e^x)
LN: Natural logarithm of the value
SQRT: Square root of a value
TIMECONVERT: Takes 3 arguments, converts the value into another time unit.
Examples
TIMECONVERT(time, 'MILLISECONDS', 'SECONDS') - This expression converts the value of column time (taken to be in milliseconds) to the nearest second (i.e. the nearest second that is lower than the value of the time column)
DATETIMECONVERT: Takes 4 arguments; converts the value into another date time format and buckets time based on the given time granularity.
DATETIMECONVERT(columnName, inputFormat, outputFormat, outputGranularity)
where:
columnName - column name to convert
inputFormat - format of the column columnName
outputFormat - format of the result desired after conversion
outputGranularity - the granularity in which to bucket the result
Format is expressed as <time size>:<time unit>:<time format>:<pattern>
where:
time size - size of the time unit, e.g. 1, 10
time unit - HOURS, DAYS, etc.
time format - EPOCH or SIMPLE_DATE_FORMAT
pattern - this is defined in case of SIMPLE_DATE_FORMAT, e.g. yyyyMMdd. A specific timezone can be passed using tz(timezone).
timezone - can be expressed in long form tz(Asia/Kolkata), in short form tz(IST), or in terms of GMT tz(GMT+0530). Default is UTC. It is recommended to use the long form timezone, as short forms are ambiguous with daylight savings (e.g. PDT works during daylight savings, PST otherwise)
Granularity is expressed as <time size>:<time unit>
Examples
1) To convert column "Date" from hoursSinceEpoch to daysSinceEpoch and bucket it to 1 day granularity
dateTimeConvert(Date, '1:HOURS:EPOCH', '1:DAYS:EPOCH', '1:DAYS')
2) To simply bucket millis "Date" to 15 minutes granularity
dateTimeConvert(Date, '1:MILLISECONDS:EPOCH', '1:MILLISECONDS:EPOCH', '15:MINUTES')
3) To convert column "Date" from hoursSinceEpoch to format yyyyMMdd and bucket it to 1 day granularity
dateTimeConvert(Date, '1:HOURS:EPOCH', '1:DAYS:SIMPLE_DATE_FORMAT:yyyyMMdd', '1:DAYS')
4) To convert column "Date" from format yyyy/MM/dd to weeksSinceEpoch and bucket it to 1 weeks granularity
dateTimeConvert(Date, '1:DAYS:SIMPLE_DATE_FORMAT:yyyy/MM/dd', '1:WEEKS:EPOCH', '1:WEEKS')
5) To convert column "Date" from millis to format yyyyMMdd in timezone PST
dateTimeConvert(Date, '1:MILLISECONDS:EPOCH', '1:DAYS:SIMPLE_DATE_FORMAT:yyyyMMdd tz(America/Los_Angeles)', '1:DAYS')
DATETRUNC: (Presto) SQL compatible date truncation, equivalent to the Presto function date_trunc. Takes at least 3 and up to 5 arguments; converts the value into a specified output granularity in seconds since UTC epoch that is bucketed on a unit in a specified timezone.
Examples
DATETRUNC('week', time_in_seconds, 'SECONDS') - This expression converts the column time_in_seconds, which is a long containing seconds since UTC epoch, truncated at WEEK (where a week starts at Monday UTC midnight). The output is a long of seconds since UTC epoch.
DATETRUNC('quarter', DIV(time_milliseconds, 1000), 'SECONDS', 'America/Los_Angeles', 'HOURS') - This expression converts the expression time_in_milliseconds/1000 (which is thus in seconds) into hours that are truncated at QUARTER in the Los Angeles time zone (where a quarter begins on 1/1, 4/1, 7/1, 10/1 in the Los Angeles timezone). The output is expressed as hours since UTC epoch (note that the output is not in the Los Angeles timezone).
ARRAYLENGTH: Returns the length of a multi-value column
VALUEIN: Takes at least 2 arguments, where the first argument is a multi-valued column, and the following arguments are constant values. The transform function will filter the value from the multi-valued column with the given constant values. The VALUEIN transform function is especially useful when the same multi-valued column is both a filtering column and a grouping column.
Examples
VALUEIN(mvColumn, 3, 5, 15)
JSONEXTRACTSCALAR
JSONEXTRACTSCALAR(jsonField, 'jsonPath', 'resultsType') evaluates the jsonPath on jsonField (a string containing JSON) and returns the result as the type resultsType.
jsonFieldName is a String field containing a JSON document.
jsonPath is a JsonPath expression used to read from the JSON document.
results_type refers to the result's data type; it can be INT, LONG, FLOAT, DOUBLE, STRING, INT_ARRAY, LONG_ARRAY, FLOAT_ARRAY, DOUBLE_ARRAY, or STRING_ARRAY.
Examples
JSONEXTRACTSCALAR(profile_json_str, '$.name', 'STRING') -> "bob"
JSONEXTRACTSCALAR(profile_json_str, '$.age', 'INT') -> 37
JSONEXTRACTKEY
JSONEXTRACTKEY(jsonField, 'jsonPath') extracts all field names based on jsonPath as a STRING_ARRAY.
jsonFieldName is a String field containing a JSON document.
jsonPath is a JsonPath expression used to read from the JSON document.
Examples
JSONEXTRACTKEY(profile_json_str, '$.*') -> ["name", "age", "phone"...]
TOP works like LIMIT for truncation in group by queries.
No need to select the columns to group with. The following two queries are both supported in PQL, where the non-aggregation columns are ignored.
SELECT MIN(foo), MAX(foo), SUM(foo), AVG(foo) FROM mytable
GROUP BY bar, baz
TOP 50
SELECT bar, baz, MIN(foo), MAX(foo), SUM(foo), AVG(foo) FROM mytable
GROUP BY bar, baz
TOP 50
The results will always order by the aggregated value (descending). The results for query
SELECT MIN(foo), MAX(foo) FROM myTable
GROUP BY bar
TOP 50
will be the same as combining the results from the following queries
SELECT MIN(foo) FROM myTable
GROUP BY bar
TOP 50
SELECT MAX(foo) FROM myTable
GROUP BY bar
TOP 50
where we don’t put the results for the same group together.
No support for ORDER BY in aggregation group by. However, ORDER BY support was added recently and is available in the standard-SQL endpoint. It can be used in the PQL endpoint by passing queryOptions into the payload as follows:
{
"pql" : "SELECT SUM(foo), SUM(bar) from myTable GROUP BY moo ORDER BY SUM(bar) ASC, moo DESC TOP 10",
"queryOptions" : "groupByMode=sql;responseFormat=sql"
}
where:
groupByMode=sql - standard SQL way of executing group by, hence accepting order by
responseFormat=sql - standard SQL way of displaying results, in a tabular manner
Links to all the design docs
This page contains links to all the design documents
Authors and dates:
Neha - May 2020
Subbu - May 2020
Kishore - May 2020
Kishore - Apr 2020
Neha - Apr 2020
Yupeng - Apr 2020
Neha - Mar 2020
Neha - Mar 2020
Alex Pucher - Mar 2020
Siddharth Teotia - Nov 2019
Xiang Fu - Nov 2019
Jackie Jiang - Nov 2019
Neha - Oct 2019
Jialiang Li - Sep 2019
Jialiang Li - Jun 2019
James - Jun 2019
Sunitha Beeram - May 2019
Ting, Chinmay - May 2019
Jia Guo - May 2019
Subbu - Feb 2019
Seunghyun Lee - 2018
Subbu - 2017
Subbu - 2017
Subbu - 2016
Subbu - 2016
This page talks about support for text search functionality in Pinot.
Pinot supports super fast query processing through its indexes on non-BLOB like columns. Queries with exact match filters are run efficiently through a combination of dictionary encoding, inverted index and sorted index. An example:
SELECT COUNT(*) FROM Foo WHERE STRING_COL = "ABCDCD" AND INT_COL > 2000
In the above query, we are doing exact match on two columns of type STRING and INT respectively.
For arbitrary text data, which falls into the BLOB/CLOB territory, we need more than exact matches. Users are interested in doing regex, phrase, and fuzzy queries on BLOB-like data. Before 0.3.0, one had to use regexp_like to achieve this. However, regexp_like is scan-based, which is not performant, and features like fuzzy search (edit distance search) were not possible.
In version 0.3.0, we added support for text indexes to efficiently do arbitrary search on STRING columns where each column value is a large BLOB of text. This can be achieved by using the new built-in function TEXT_MATCH.
SELECT COUNT(*) FROM Foo WHERE TEXT_MATCH(<column_name>, <search_expression>)
where <column_name> is the column the text index is created on and <search_expression> can be one of the following:
Supported search expression types and examples:
Phrase query: TEXT_MATCH(<column_name>, '\"distributed system\"')
Term query: TEXT_MATCH(<column_name>, 'Java')
Boolean query: TEXT_MATCH(<column_name>, 'Java AND c++')
Prefix query: TEXT_MATCH(<column_name>, 'stream*')
Regex query: TEXT_MATCH(<column_name>, '/Exception.*/')
Text search should ideally be used on STRING columns where doing standard filter operations (EQUALITY, RANGE, BETWEEN) doesn't fit the bill because each column value is a reasonably large blob of text.
Consider the following snippet from an Apache access log. Each line in the log consists of arbitrary data (IP addresses, URLs, timestamps, symbols, etc.) and represents a column value. Data like this is a good candidate for doing text search.
Let's say the following snippet of data is stored in the ACCESS_LOG_COL column in a Pinot table.
109.169.248.247 - - [12/Dec/2015:18:25:11 +0100] "GET /administrator/ HTTP/1.1" 200 4263 "-" "Mozilla/5.0 (Windows NT 6.0; rv:34.0) Gecko/20100101 Firefox/34.0" "-
109.169.248.247 - - [12/Dec/2015:18:25:11 +0100] "POST /administrator/index.php HTTP/1.1" 200 4494 "http://almhuette-raith.at/administrator/" "Mozilla/5.0 (Windows NT 6.0; rv:34.0) Gecko/20100101 Firefox/34.0" "-"
46.72.177.4 - - [12/Dec/2015:18:31:08 +0100] "GET /administrator/ HTTP/1.1" 200 4263 "-" "Mozilla/5.0 (Windows NT 6.0; rv:34.0) Gecko/20100101 Firefox/34.0" "-"
46.72.177.4 - - [12/Dec/2015:18:31:08 +0100] "POST /administrator/index.php HTTP/1.1" 200 4494 "http://almhuette-raith.at/administrator/" "Mozilla/5.0 (Windows NT 6.0; rv:34.0) Gecko/20100101 Firefox/34.0" "-"
83.167.113.100 - - [12/Dec/2015:18:31:25 +0100] "GET /administrator/ HTTP/1.1" 200 4263 "-" "Mozilla/5.0 (Windows NT 6.0; rv:34.0) Gecko/20100101 Firefox/34.0" "-"
83.167.113.100 - - [12/Dec/2015:18:31:25 +0100] "POST /administrator/index.php HTTP/1.1" 200 4494 "http://almhuette-raith.at/administrator/" "Mozilla/5.0 (Windows NT 6.0; rv:34.0) Gecko/20100101 Firefox/34.0" "-"
95.29.198.15 - - [12/Dec/2015:18:32:10 +0100] "GET /administrator/ HTTP/1.1" 200 4263 "-" "Mozilla/5.0 (Windows NT 6.0; rv:34.0) Gecko/20100101 Firefox/34.0" "-"
95.29.198.15 - - [12/Dec/2015:18:32:11 +0100] "POST /administrator/index.php HTTP/1.1" 200 4494 "http://almhuette-raith.at/administrator/" "Mozilla/5.0 (Windows NT 6.0; rv:34.0) Gecko/20100101 Firefox/34.0" "-"
109.184.11.34 - - [12/Dec/2015:18:32:56 +0100] "GET /administrator/ HTTP/1.1" 200 4263 "-" "Mozilla/5.0 (Windows NT 6.0; rv:34.0) Gecko/20100101 Firefox/34.0" "-"
109.184.11.34 - - [12/Dec/2015:18:32:56 +0100] "POST /administrator/index.php HTTP/1.1" 200 4494 "http://almhuette-raith.at/administrator/" "Mozilla/5.0 (Windows NT 6.0; rv:34.0) Gecko/20100101 Firefox/34.0" "-"
91.227.29.79 - - [12/Dec/2015:18:33:51 +0100] "GET /administrator/ HTTP/1.1" 200 4263 "-" "Mozilla/5.0 (Windows NT 6.0; rv:34.0) Gecko/20100101 Firefox/34.0" "-"
A few examples of search queries on this data:
Count the number of GET requests.
SELECT COUNT(*) FROM MyTable WHERE TEXT_MATCH(ACCESS_LOG_COL, 'GET')
Count the number of POST requests that have administrator in the URL (administrator/index)
SELECT COUNT(*) FROM MyTable WHERE TEXT_MATCH(ACCESS_LOG_COL, 'post AND administrator AND index')
Count the number of POST requests that have a particular URL and are handled by the Firefox browser
SELECT COUNT(*) FROM MyTable WHERE TEXT_MATCH(ACCESS_LOG_COL, 'post AND administrator AND index AND firefox')
Consider another example of simple resume text. Each line in the file represents skill data from the resumes of different candidates.
Let's say the following snippet of data is stored in the SKILLS_COL column in a Pinot table. Each line in the input text represents a column value.
Distributed systems, Java, C++, Go, distributed query engines for analytics and data warehouses, Machine learning, spark, Kubernetes, transaction processing
Java, Python, C++, Machine learning, building and deploying large scale production systems, concurrency, multi-threading, CPU processing
C++, Python, Tensor flow, database kernel, storage, indexing and transaction processing, building large scale systems, Machine learning
Amazon EC2, AWS, hadoop, big data, spark, building high performance scalable systems, building and deploying large scale production systems, concurrency, multi-threading, Java, C++, CPU processing
Distributed systems, database development, columnar query engine, database kernel, storage, indexing and transaction processing, building large scale systems
Distributed systems, Java, realtime streaming systems, Machine learning, spark, Kubernetes, distributed storage, concurrency, multi-threading
CUDA, GPU, Python, Machine learning, database kernel, storage, indexing and transaction processing, building large scale systems
Distributed systems, Java, database engine, cluster management, docker image building and distribution
Kubernetes, cluster management, operating systems, concurrency, multi-threading, apache airflow, Apache Spark,
Apache spark, Java, C++, query processing, transaction processing, distributed storage, concurrency, multi-threading, apache airflow
Big data stream processing, Apache Flink, Apache Beam, database kernel, distributed query engines for analytics and data warehouses
CUDA, GPU processing, Tensor flow, Pandas, Python, Jupyter notebook, spark, Machine learning, building high performance scalable systems
Distributed systems, Apache Kafka, publish-subscribe, building and deploying large scale production systems, concurrency, multi-threading, C++, CPU processing, Java
Realtime stream processing, publish subscribe, columnar processing for data warehouses, concurrency, Java, multi-threading, C++,
A few examples of search queries on this data:
Find the candidates that have "machine learning" and "gpu processing": a phrase search (more on this further in the document) where we are looking for an exact match of the phrases "machine learning" and "gpu processing", not necessarily in the same order in the original data.
SELECT SKILLS_COL FROM MyTable WHERE TEXT_MATCH(SKILLS_COL, '\"Machine learning\" AND \"gpu processing\"')
Find the candidates that have "distributed systems" and either 'Java' or 'C++': a combination of searching for the exact phrase "distributed systems" along with other terms.
SELECT SKILLS_COL FROM MyTable WHERE TEXT_MATCH(SKILLS_COL, '\"distributed systems\" AND (Java C++)')
Consider a snippet from a log file containing SQL queries handled by a database. Each line (query) in the file represents a column value in the QUERY_LOG_COL column in a Pinot table.
SELECT count(dimensionCol2) FROM FOO WHERE dimensionCol1 = 18616904 AND timestamp BETWEEN 1560988800000 AND 1568764800000 GROUP BY dimensionCol3 TOP 2500
SELECT count(dimensionCol2) FROM FOO WHERE dimensionCol1 = 18616904 AND timestamp BETWEEN 1560988800000 AND 1568764800000 GROUP BY dimensionCol3 TOP 2500
SELECT count(dimensionCol2) FROM FOO WHERE dimensionCol1 = 18616904 AND timestamp BETWEEN 1545436800000 AND 1553212800000 GROUP BY dimensionCol3 TOP 2500
SELECT count(dimensionCol2) FROM FOO WHERE dimensionCol1 = 18616904 AND timestamp BETWEEN 1537228800000 AND 1537660800000 GROUP BY dimensionCol3 TOP 2500
SELECT dimensionCol2, dimensionCol4, timestamp, dimensionCol5, dimensionCol6 FROM FOO WHERE dimensionCol1 = 18616904 AND timestamp BETWEEN 1561366800000 AND 1561370399999 AND dimensionCol3 = 2019062409 LIMIT 10000
SELECT dimensionCol2, dimensionCol4, timestamp, dimensionCol5, dimensionCol6 FROM FOO WHERE dimensionCol1 = 18616904 AND timestamp BETWEEN 1563807600000 AND 1563811199999 AND dimensionCol3 = 2019072215 LIMIT 10000
SELECT dimensionCol2, dimensionCol4, timestamp, dimensionCol5, dimensionCol6 FROM FOO WHERE dimensionCol1 = 18616904 AND timestamp BETWEEN 1563811200000 AND 1563814799999 AND dimensionCol3 = 2019072216 LIMIT 10000
SELECT dimensionCol2, dimensionCol4, timestamp, dimensionCol5, dimensionCol6 FROM FOO WHERE dimensionCol1 = 18616904 AND timestamp BETWEEN 1566327600000 AND 1566329400000 AND dimensionCol3 = 2019082019 LIMIT 10000
SELECT count(dimensionCol2) FROM FOO WHERE dimensionCol1 = 18616904 AND timestamp BETWEEN 1560834000000 AND 1560837599999 AND dimensionCol3 = 2019061805 LIMIT 0
SELECT count(dimensionCol2) FROM FOO WHERE dimensionCol1 = 18616904 AND timestamp BETWEEN 1560870000000 AND 1560871800000 AND dimensionCol3 = 2019061815 LIMIT 0
SELECT count(dimensionCol2) FROM FOO WHERE dimensionCol1 = 18616904 AND timestamp BETWEEN 1560871800001 AND 1560873599999 AND dimensionCol3 = 2019061815 LIMIT 0
SELECT count(dimensionCol2) FROM FOO WHERE dimensionCol1 = 18616904 AND timestamp BETWEEN 1560873600000 AND 1560877199999 AND dimensionCol3 = 2019061816 LIMIT 0
A few examples of search queries on this data:
Count the number of queries that have GROUP BY
SELECT COUNT(*) FROM MyTable WHERE TEXT_MATCH(QUERY_LOG_COL, '\"group by\"')
Count the number of queries that have the SELECT count... pattern
SELECT COUNT(*) FROM MyTable WHERE TEXT_MATCH(QUERY_LOG_COL, '\"select count\"')
Count the number of queries that use BETWEEN filter on timestamp column along with GROUP BY
SELECT COUNT(*) FROM MyTable WHERE TEXT_MATCH(QUERY_LOG_COL, '\"timestamp between\" AND \"group by\"')
Further sections in the document cover several concrete examples of each kind of query and a step-by-step guide on how to write text search queries in Pinot.
Currently we support text search in a restricted manner. More specifically, we have the following constraints:
The column type should be STRING.
The column should be single-valued.
Co-existence of text index with other Pinot indexes is currently not supported.
The last two restrictions are going to be relaxed very soon in the upcoming releases.
Currently, a column in Pinot can be dictionary encoded or stored RAW. Furthermore, we can create an inverted index on a dictionary-encoded column. We can also create a sorted index on a dictionary-encoded column.
A text index is an addition to the types of per-column indexes users can create in Pinot. However, the current implementation supports text indexes on RAW columns only. In other words, the column should not be dictionary encoded. As we relax this constraint in upcoming releases, a text index will be able to be created on a dictionary-encoded column that also has other indexes (inverted, sorted, etc).
Similar to other indexes, users can enable text index on a column through table config. As part of text-search feature, we have also introduced a new generic way of specifying the per-column encoding and index information. In the table config, there will be a new section with name "fieldConfigList".
IMPORTANT: This mechanism of using "fieldConfigList" is currently ONLY used for text indexes. Our plan is to migrate all other indexes to this model. We are going to do that in upcoming releases and accordingly user documentation and new guidelines will be published. So please continue to specify other index info in table config as you have done till now and use the "fieldConfigList" only for text indexes.
"fieldConfigList":[
{
"name":"text_col_1",
"encodingType":"RAW",
"indexType":"TEXT"
},
{
"name":"text_col_2",
"encodingType":"RAW",
"indexType":"TEXT"
}
]
"fieldConfigList" will be a new section in table config. It is essentially a list of per-column encoding and index information. In the above example, the list contains text index information for two columns text_col_1 and text_col_2. Each object in fieldConfigList contains the following information
name - Name of the column text index is enabled on
encodingType - As mentioned earlier, we can store a column either as RAW or dictionary encoded. Since for now we have a restriction on the text index, this should always be RAW.
indexType - This should be TEXT.
Also, since we haven't yet removed the old way of specifying the index info, each column that text index is enabled on should also be specified in noDictionaryColumns in tableIndexConfig
"tableIndexConfig": {
"noDictionaryColumns": [
"text_col_1",
"text_col_2"
]}
The above mechanism should allow the user to use text index in all of the following scenarios:
Adding new table with text index enabled on one or more columns.
Adding a new column with text index enabled to an existing table.
Enabling text index on an existing column.
Once the text index is enabled on one or more columns through table config, our segment generation code will pick up the config and automatically create text index (per column). This is exactly how other indexes in Pinot are created.
Text index is supported for both offline and realtime segments.
The original text document (a value in the column with text index enabled) is parsed, tokenized and individual "indexable" terms are extracted. These terms are inserted into the index.
Pinot's text index is built on top of Lucene. Lucene's standard English text tokenizer generally works well for most classes of text. We might want to build a custom text parser and tokenizer to suit particular user requirements. Accordingly, we can make this configurable for the user to specify on a per-column text index basis.
A new built-in function TEXT_MATCH has been introduced for using text search in SQL/PQL.
TEXT_MATCH(text_column_name, search_expression)
text_column_name - name of the column to do text search on.
search_expression - search query
We can use TEXT_MATCH function as part of our queries in the WHERE clause. Examples:
SELECT COUNT(*) FROM Foo WHERE TEXT_MATCH(...)
SELECT * FROM Foo WHERE TEXT_MATCH(...)
We can also use the TEXT_MATCH filter clause with other filter operators. For example:
SELECT COUNT(*) FROM Foo WHERE TEXT_MATCH(...) AND some_other_column_1 > 20000
SELECT COUNT(*) FROM Foo WHERE TEXT_MATCH(...) AND some_other_column_1 > 20000 AND some_other_column_2 < 100000
Combining multiple TEXT_MATCH filter clauses:
SELECT COUNT(*) FROM Foo WHERE TEXT_MATCH(text_col_1, ....) AND TEXT_MATCH(text_col_2, ...)
TEXT_MATCH can be used in the WHERE clause of all kinds of queries supported by Pinot (examples of each follow the list below):
Selection queries which project one or more columns
The text column itself can also be included in the select list
Aggregation queries
Aggregation GROUP BY queries
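For illustration, the queries below sketch each of these shapes, reusing the hypothetical table Foo and columns from the earlier examples; some_dimension_col is also hypothetical, and '...' stands for any valid search expression:
-- Selection query projecting columns, including the text column itself
SELECT some_other_column_1, text_col_1 FROM Foo WHERE TEXT_MATCH(text_col_1, '...')
-- Aggregation query
SELECT COUNT(*), MAX(some_other_column_1) FROM Foo WHERE TEXT_MATCH(text_col_1, '...')
-- Aggregation GROUP BY query
SELECT some_dimension_col, COUNT(*) FROM Foo WHERE TEXT_MATCH(text_col_1, '...') GROUP BY some_dimension_col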
The search expression (second argument to the TEXT_MATCH function) is the query string that Pinot uses to perform the text search on the column's text index. The following expression types are supported:
A phrase query is used to do an exact match of a given phrase. Exact match implies that the terms in the user-specified phrase should appear in exactly the same order in the original text document. (Note that "document" here refers to the column value.)
Let's walk through queries using an example of resume text data. The data is stored in a column named SKILLS_COL and we have created a text index on this column. Each line below is one document (i.e. one value of SKILLS_COL):
Java, C++, worked on open source projects, coursera machine learning
Machine learning, Tensor flow, Java, Stanford university,
Distributed systems, Java, C++, Go, distributed query engines for analytics and data warehouses, Machine learning, spark, Kubernetes, transaction processing
Java, Python, C++, Machine learning, building and deploying large scale production systems, concurrency, multi-threading, CPU processing
C++, Python, Tensor flow, database kernel, storage, indexing and transaction processing, building large scale systems, Machine learning
Amazon EC2, AWS, hadoop, big data, spark, building high performance scalable systems, building and deploying large scale production systems, concurrency, multi-threading, Java, C++, CPU processing
Distributed systems, database development, columnar query engine, database kernel, storage, indexing and transaction processing, building large scale systems
Distributed systems, Java, realtime streaming systems, Machine learning, spark, Kubernetes, distributed storage, concurrency, multi-threading
CUDA, GPU, Python, Machine learning, database kernel, storage, indexing and transaction processing, building large scale systems
Distributed systems, Java, database engine, cluster management, docker image building and distribution
Kubernetes, cluster management, operating systems, concurrency, multi-threading, apache airflow, Apache Spark,
Apache spark, Java, C++, query processing, transaction processing, distributed storage, concurrency, multi-threading, apache airflow
Big data stream processing, Apache Flink, Apache Beam, database kernel, distributed query engines for analytics and data warehouses
CUDA, GPU processing, Tensor flow, Pandas, Python, Jupyter notebook, spark, Machine learning, building high performance scalable systems
Distributed systems, Apache Kafka, publish-subscribe, building and deploying large scale production systems, concurrency, multi-threading, C++, CPU processing, Java
Realtime stream processing, publish subscribe, columnar processing for data warehouses, concurrency, Java, multi-threading, C++,
C++, Java, Python, realtime streaming systems, Machine learning, spark, Kubernetes, transaction processing, distributed storage, concurrency, multi-threading, apache airflow
Databases, columnar query processing, Apache Arrow, distributed systems, Machine learning, cluster management, docker image building and distribution
Database engine, OLAP systems, OLTP transaction processing at large scale, concurrency, multi-threading, GO, building large scale systems
Example 1 - Search in the SKILLS_COL column to look for documents where each matching document MUST contain the phrase "Distributed systems" as is:
SELECT SKILLS_COL FROM MyTable WHERE TEXT_MATCH(SKILLS_COL, '\"Distributed systems\"')
The search expression is '\"Distributed systems\"'
The search expression is always specified within single quotes '<your expression>'
Since we are doing a phrase search, the phrase must be specified within double quotes inside the single quotes, and the double quotes must be escaped:
'\"<your phrase>\"'
The above query will match the following documents:
Distributed systems, Java, C++, Go, distributed query engines for analytics and data warehouses, Machine learning, spark, Kubernetes, transaction processing
Distributed systems, database development, columnar query engine, database kernel, storage, indexing and transaction processing, building large scale systems
Distributed systems, Java, realtime streaming systems, Machine learning, spark, Kubernetes, distributed storage, concurrency, multi-threading
Distributed systems, Java, database engine, cluster management, docker image building and distribution
Distributed systems, Apache Kafka, publish-subscribe, building and deploying large scale production systems, concurrency, multi-threading, C++, CPU processing, Java
Databases, columnar query processing, Apache Arrow, distributed systems, Machine learning, cluster management, docker image building and distribution