Only this pageAll pages
Powered by GitBook
Couldn't generate the PDF for 172 pages, generation stopped at 100.
Extend with 50 more pages.
1 of 100

release-0.9.0

Loading...

Basics

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

For Users

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

For Developers

Controller

The Pinot Controller is responsible for the following:

  • Maintaining global metadata (e.g. configs and schemas) of the system with the help of Zookeeper which is used as the persistent metadata store.

  • Hosting the Helix Controller and managing other Pinot components (brokers, servers, minions)

  • Maintaining the mapping of which servers are responsible for which segments. This mapping is used by the servers to download the portion of the segments that they are responsible for. This mapping is also used by the broker to decide which servers to route the queries to.

  • Serving admin endpoints for viewing, creating, updating, and deleting configs, which are used to manage and operate the cluster.

  • Serving endpoints for segment uploads, which are used in offline data pushes. They are responsible for initializing real-time consumption and coordination of persisting real-time segments into the segment store periodically.

  • Undertaking other management activities such as managing retention of segments, validations.

For redundancy, there can be multiple instances of Pinot controllers. Pinot expects that all controllers are configured with the same back-end storage system so that they have a common view of the segments (e.g. NFS). Pinot can use other storage systems such as HDFS or ADLS.

Starting a Controller

Make sure you've setup Zookeeper. If you're using docker, make sure to pull the pinot docker image. To start a controller

docker run \
    --network=pinot-demo \
    --name pinot-controller \
    -p 9000:9000 \
    -d ${PINOT_IMAGE} StartController \
    -zkAddress pinot-zookeeper:2181
bin/pinot-admin.sh StartController \
  -zkAddress localhost:2181 \
  -clusterName PinotCluster \
  -controllerPort 9000

APIs

Components

Learn about the different components and logical abstractions

This section is a reference for the definition of major components and logical abstractions used in Pinot. Please visit the Basic Concepts section to get a general overview that ties together all of the reference material in this section.

Operator reference

Developer reference

Server

Servers host the data segments and serve queries off the data they host. There are two types of servers:

Offline Offline servers are responsible for downloading segments from the segment store, to host and serve queries off. When a new segment is uploaded to the controller, the controller decides the servers (as many as replication) that will host the new segment and notifies them to download the segment from the segment store. On receiving this notification, the servers download the segment file and load the segment onto the server, to server queries off them.

Real-time Real-time servers directly ingest from a real-time stream (such as Kafka, EventHubs). Periodically, they make segments of the in-memory ingested data, based on certain thresholds. This segment is then persisted onto the segment store.

Pinot Servers are modeled as Helix Participants, hosting Pinot tables (referred to as resources in Helix terminology). Segments of a table are modeled as Helix partitions (of a resource). Thus, a Pinot server hosts one or more helix partitions of one or more helix resources (i.e. one or more segments of one or more tables).

Starting a Server

Make sure you've setup Zookeeper. If you're using docker, make sure to pull the pinot docker image. To start a server

Usage: StartServer
	-serverHost               <String>                      : Host name for controller. (required=false)
	-serverPort               <int>                         : Port number to start the server at. (required=false)
	-serverAdminPort          <int>                         : Port number to serve the server admin API at. (required=false)
	-dataDir                  <string>                      : Path to directory containing data. (required=false)
	-segmentDir               <string>                      : Path to directory containing segments. (required=false)
	-zkAddress                <http>                        : Http address of Zookeeper. (required=false)
	-clusterName              <String>                      : Pinot cluster name. (required=false)
	-configFileName           <Config File Name>            : Broker Starter Config file. (required=false)
	-help                                                   : Print this message. (required=false)

docker run \
    --network=pinot-demo \
    --name pinot-server \
    -d ${PINOT_IMAGE} StartServer \
    -zkAddress pinot-zookeeper:2181
bin/pinot-admin.sh StartServer \
    -zkAddress localhost:2181

USAGE

Usage: StartServer
	-serverHost               <String>                      : Host name for controller. (required=false)
	-serverPort               <int>                         : Port number to start the server at. (required=false)
	-serverAdminPort          <int>                         : Port number to serve the server admin API at. (required=false)
	-dataDir                  <string>                      : Path to directory containing data. (required=false)
	-segmentDir               <string>                      : Path to directory containing segments. (required=false)
	-zkAddress                <http>                        : Http address of Zookeeper. (required=false)
	-clusterName              <String>                      : Pinot cluster name. (required=false)
	-configFileName           <Config File Name>            : Server Starter Config file. (required=false)
	-help                                                   : Print this message. (required=false)

External Clients

A lot of times the user wants to query data from an external application instead of using the inbuilt query explorer. Pinot provides external query client for this purpose. All of the clients have pretty standard interfaces so that the learning curve is minimum.

Currently Pinot provides the following clients

0.1.0

The 0.1.0 is first release of Pinot as an Apache project

New Features

  • First release

  • Off-line data ingestion from Apache Hadoop

  • Real-time data ingestion from Apache Kafka

JDBC
Java
Python
Golang

Recipes

Here you will find a collection of ready-made sample applications and examples for real-world data

Tutorials

Cluster
Controller
Broker
Server
Minion
Tenant
Table
Schema
Segment

Concepts

Learn about the various components of Pinot and terminologies used to describe data stored in Pinot

Pinot is designed to deliver low latency queries on large datasets. In order to achieve this performance, Pinot stores data in a columnar format and adds additional indices to perform fast filtering, aggregation and group by.

Raw data is broken into small data shards and each shard is converted into a unit known as a segment. One or more segments together form a table, which is the logical container for querying Pinot using SQL/PQL.

Pinot Storage Model

Pinot uses a variety of terms which can refer to either abstractions that model the storage of data or infrastructure components that drive the functionality of the system.

Pinot Storage Model Abstraction

Table

Similar to traditional databases, Pinot has the concept of a table—a logical abstraction to refer to a collection of related data. As is the case with RDBMS, a table is a construct that consists of columns and rows (documents) that are queried using SQL. A table is associated with a schema which defines the columns in a table as well as their data types.

As opposed to RDBMS schemas, multiple tables can be created in Pinot (real-time or batch) that inherit a single schema definition. Tables are independently configured for concerns such as indexing strategies, partitioning, tenants, data sources, and/or replication.

Segment

Pinot has a distributed systems architecture that scales horizontally. Pinot expects the size of a table to grow infinitely over time. In order to achieve this, all data needs to be distributed across multiple nodes. Pinot achieves this by breaking data into smaller chunks known as segments (this is similar to shards/partitions in HA relational databases). Segments can also be seen as time-based partitions.

Tenant

In order to support multi-tenancy, Pinot has first class support for tenants. A table is associated with a tenant. This allows all tables belonging to a particular logical namespace to be grouped under a single tenant name and isolated from other tenants. This isolation between tenants provides different namespaces for applications and teams to prevent sharing tables or schemas. Development teams building applications will never have to operate an independent deployment of Pinot. An organization can operate a single cluster and scale it out as new tenants increase the overall volume of queries. Developers can manage their own schemas and tables without being impacted by any other tenant on a cluster.

By default, all tables belong to a default tenant named "default". The concept of tenants is very important, as it satisfies the architectural principle of a "database per service/application" without having to operate many independent data stores. Further, tenants will schedule resources so that segments (shards) are able to restrict a table's data to reside only on a specified set of nodes. Similar to the kind of isolation that is ubiquitously used in Linux containers, compute resources in Pinot can be scheduled to prevent resource contention between tenants.

Cluster

Logically, a cluster is simply a group of tenants. As with the classical definition of a cluster, it is also a grouping of a set of compute nodes. Typically, there is only one cluster per environment/data center. There is no needed to create multiple clusters since Pinot supports the concept of tenants. At LinkedIn, the largest Pinot cluster consists of 1000+ nodes distributed across a data center. The number of nodes in a cluster can be added in a way that will linearly increase performance and availability of queries. The number of nodes and the compute resources per node will reliably predict the QPS for a Pinot cluster, and as such, capacity planning can be easily achieved using SLAs that assert performance expectations for end-user applications.

Auto-scaling is also achievable, however, a set amount of nodes is recommended to keep QPS consistent when query loads vary in sudden unpredictable end-user usage scenarios.

Pinot Components

A Pinot cluster is comprised of multiple distributed system components. These components are useful to understand for operators that are monitoring system usage or are debugging an issue with a cluster deployment.

  • Controller

  • Server

  • Broker

  • Minion (optional)

The benefits of scale that make Pinot linearly scalable for an unbounded number of nodes is made possible through its integration with Apache Zookeeper and Apache Helix.

Helix is a cluster management solution that was designed and created by the authors of Pinot at LinkedIn. Helix drives the state of a Pinot cluster from a transient state to an ideal state, acting as the fault-tolerant distributed state store that guarantees consistency. Helix is embedded as agents that operate within a controller, broker, and server, and does not exist as an independent and horizontally scaled component.

Pinot Controller

A controller is the core orchestrator that drives the consistency and routing in a Pinot cluster. Controllers are horizontally scaled as an independent component (container) and has visibility of the state of all other components in a cluster. The controller reacts and responds to state changes in the system and schedules the allocation of resources for tables, segments, or nodes. As mentioned earlier, Helix is embedded within the controller as an agent that is a participant responsible for observing and driving state changes that are subscribed to by other components.

In addition to cluster management, resource allocation, and scheduling, the controller is also the HTTP gateway for REST API administration of a Pinot deployment. A web-based query console is also provided for operators to quickly and easily run SQL/PQL queries.

Pinot Broker

A broker receives queries from a client and routes their execution to one or more Pinot servers before returning a consolidated response.

Pinot Server

Servers host segments (shards) that are scheduled and allocated across multiple nodes and routed on an assignment to a tenant (there is a single tenant by default). Servers are independent containers that scale horizontally and are notified by Helix through state changes driven by the controller. A server can either be a real-time server or an offline server.

A real-time and offline server have very different resource usage requirements, where real-time servers are continually consuming new messages from external systems (such as Kafka topics) that are ingested and allocated on segments of a tenant. Because of this, resource isolation can be used to prioritize high-throughput real-time data streams that are ingested and then made available for query through a broker.

Pinot Minion

Pinot minion is an optional component that can be used to run background tasks such as "purge" for GDPR (General Data Protection Regulation). As Pinot is an immutable aggregate store, records containing sensitive private data need to be purged on a request-by-request basis. Minion provides a solution for this purpose that complies with GDPR while optimizing Pinot segments and building additional indices that guarantees performance in the presence of the possibility of data deletion. One can also write a custom task that runs on a periodic basis. While it's possible to perform these tasks on the Pinot servers directly, having a separate process (Minion) lessens the overall degradation of query latency as segments are impacted by mutable writes.

Broker

Brokers handle Pinot queries. They accept queries from clients and forward them to the right servers. They collect results back from the servers and consolidate them into a single response, to send back to the client.

Broker interaction with other components

Pinot Brokers are modeled as Helix Spectators. They need to know the location of each segment of a table (and each replica of the segments) and route requests to the appropriate server that hosts the segments of the table being queried.

The broker ensures that all the rows of the table are queried exactly once so as to return correct, consistent results for a query. The brokers may optimize to prune some of the segments as long as accuracy is not sacrificed.

Helix provides the framework by which spectators can learn the location in which each partition of a resource (i.e. participant) resides. The brokers use this mechanism to learn the servers that host specific segments of a table.

In the case of hybrid tables, the brokers ensure that the overlap between real-time and offline segment data is queried exactly once, by performing offline and real-time federation.

Let's take this example, we have real-time data for 5 days - March 23 to March 27, and offline data has been pushed until Mar 25, which is 2 days behind real-time. The brokers maintain this time boundary.

Suppose, we get a query to this table : select sum(metric) from table. The broker will split the query into 2 queries based on this time boundary - one for offline and one for realtime. This query becomes - select sum(metric) from table_REALTIME where date >= Mar 25 and select sum(metric) from table_OFFLINE where date < Mar 25

The broker merges results from both these queries before returning the result to the client.

Starting a Broker

Make sure you've setup Zookeeper. If you're using docker, make sure to pull the pinot docker image. To start a broker

docker run \
    --network=pinot-demo \
    --name pinot-broker \
    -d ${PINOT_IMAGE} StartBroker \
    -zkAddress pinot-zookeeper:2181
bin/pinot-admin.sh StartBroker \
  -zkAddress localhost:2181 \
  -clusterName PinotCluster \
  -brokerPort 7000

Frequently Asked Questions (FAQs)

This page has a collection of frequently asked questions with answers from the community.

This is a list of frequent questions most often asked in our troubleshooting channel on Slack. Please feel free to contribute your questions and answers here and make a pull request.

Query

Learn how to query Apache Pinot using SQL or explore data using the web-based Pinot query console.

Getting Started

This section contains quick start guides to help you get up and running with Pinot.

Running Pinot

We want your experience getting started with Pinot to be both low effort and high reward. Here you'll find a collection of quick start guides that contain starter distributions of the Pinot platform.

Bootstrapping a cluster

Deploy to a public cloud

How to setup a Pinot cluster

This video will show you a step-by-step walk through for launching the individual components of Pinot and scaling them to multiple instances. This is an excellent resource for developers and operators that want to understand setting up each component and debugging a cluster.

You can find the commands that are shown in this video on GitHub

We also have a step-by-step guide for manually setting up a Pinot cluster using Docker or shell scripts.

Data import examples

Getting data into Pinot is easy. Take a look at these two quick start guides which will help you get up and running with sample data for offline and real-time .

Public cloud examples

This page contains multiple quick start guides for deploying Pinot to a public cloud provider.

The following quick start guides will show you how to run an Apache Pinot cluster using Kubernetes on different public cloud providers.

General

FAQ for general questions around Pinot

How does Pinot use deep storage?

When data is pushed in to Pinot, it makes a backup copy of the data and stores it on the configured deep-storage (S3/GCP/ADLS/NFS/etc). This copy is stored as tar.gz Pinot segments. Note, that pinot servers keep a (untarred) copy of the segments on their local disk as well. This is done for performance reasons.

How does Pinot use Zookeeper?

Pinot uses Apache Helix for cluster management, which in turn is built on top of Zookeeper. Helix uses Zookeeper to store the cluster state, including Ideal State, External View, Participants, etc. Besides that, Pinot uses Zookeeper to store other information such as Table configs, schema, Segment Metadata, etc.

Why am I getting "Could not find or load class" error when running Quickstart using 0.8.0 release?

Please check the JDK version you are using. The release 0.8.0 binary is on JDK 11. You may be getting this error if you are using JDK8. In that case, please consider using JDK11, or you will need to download the for the release and it locally.

Dimension Table

Dimension tables in Apache Pinot.

Dimension tables are a special kind of offline tables from which data can be looked up via the lookup UDF, providing a join like functionality. These dimension tables are replicated on all the hosts for a given tenant to allow faster lookups.

To mark an offline table as a dim table the configuration isDimTable should be set to true in the table config as shown below

As dimension table are used to perform lookups of dimension values, they are required to have a primary key (can be a composite key).

As mentioned above, when a table is marked as a dimension table it will be replicated on all the hosts, because of this the size of the dim table has to be small. The maximum size quota for a dimension table in a cluster is controlled by controller.dimTable.maxSize controller property. Table creation will fail if the storage quota exceeds this maximum size.

File systems

This section contains a collection of short guides to show you how to import from a Pinot supported file system.

FileSystem is an abstraction provided by Pinot to access data in distributed file systems

Pinot uses the distributed file systems or the following purposes -

  • Batch Ingestion Job - To read the input data (CSV, Avro, Thrift etc.) and to write generated segments to DFS

  • Controller - When a segment is uploaded to controller, the controller saves it in the DFS configured.

  • Server - When a server(s) is notified of a new segment, the server copy the segment from remote DFS to their local node using the DFS abstraction

Pinot allows you to choose the distributed file system provider. Currently, the following file systems are supported by Pinot out of the box.

To use a distributed file-system, you need to enable plugins in pinot. To do that, specify the plugin directory and include the required plugins -

Now, You can proceed to change the filesystem in the controller and server config as follows -

Here, scheme refers to the prefix used in URI of the filesystem. e.g. for the URI, s3://bucket/path/to/file the scheme is s3

You can also change the filesystem during ingestion. In the ingestion job spec, specify the filesystem with the following config -

Bloom Filter

Bloom filter helps prune segments that do not contain any record matching a EQUALITY predicate, e.g.

SELECT COUNT(*) from baseballStats where playerID = 12345

There are 3 parameters to configure the bloom filter:

  • fpp: False positive probability of the bloom filter (from 0 to 1, 0.05 by default). The lower the fpp , the higher accuracy the bloom filter has, but it will also increase the size of the bloom filter.

  • maxSizeInBytes: Maximum size of the bloom filter (unlimited by default). If a certain fpp generates a bloom filter larger than this size, we will increase the fpp to keep the bloom filter size within this limit.

  • loadOnHeap: Whether to load the bloom filter using heap memory or off-heap memory (false by default).

There are 2 ways of configuring bloom filter for a table in the table config:

  • Configure bloom filter columns with default settings

  • Configure bloom filter columns with customized parameters

Currently bloom filter can only be applied to the dictionary-encoded columns. Bloom filter support for raw value columns is WIP.

Range Index

Range indexing allows user to get better performance for queries which involve filtering over a range. e.g.

SELECT COUNT(*) from baseballStats where hits > 11

Range index is just a variant of inverted index. Instead of creating mapping from values to columns, we create mapping of a range of values to columns. You can use the range index by setting the following config in the table config json.

Currently, range indexing is only supported for dictionary columns. Range indexing support for raw value columns is WIP.

When to use Range Index?

A good thumb rule is to use range index when you want to apply range predicates on metric columns which have very large number of unique values. Using inverted index for such columns will create a very large index that is inefficient in terms of storage and performance.

Releases

The following summarizes Pinot's releases, from the latest one to the earliest one.

Note

Before upgrading from one version to another one, please read the release notes. While the Pinot committers strive to keep releases backward-compatible and introduce new features in a compatible manner, your environment may have a unique combination of configurations/data/schema that may have been somehow overlooked. Before you roll out a new release of Pinot on your cluster, it is best that you run the that Pinot provides. The tests can be easily customized to suit the configurations and tables in your pinot cluster(s). As a good practice, you should build your own test suite, mirroring the table configurations, schema, sample data, and queries that are used in your cluster.

0.9.0 (November 2021)

0.8.0 (August 2021)

0.7.1 (April 2021)

0.6.0 (November 2020)

0.5.0 (September 2020)

0.4.0 (June 2020)

0.3.0 (March 2020)

0.2.0 (November 2019)

0.1.0 (March 2019, First release)

Ingestion FAQ
Query FAQ
Operations FAQ
{
  "OFFLINE": {
    "tableName": "dimBaseballTeams_OFFLINE",
    "tableType": "OFFLINE",
    "segmentsConfig": {
      "schemaName": "dimBaseballTeams",
    },
    "metadata": {},
    "quota": {
      "storage": "200M"
    },
    "isDimTable": true
  }
}
{
  "dimensionFieldSpecs": [
    {
      "dataType": "STRING",
      "name": "teamID"
    },
    {
      "dataType": "STRING",
      "name": "teamName"
    }
  ],
  "schemaName": "dimBaseballTeams",
  "primaryKeyColumns": ["teamID"]
}
{
  "tableIndexConfig": {
    "bloomFilterColumns": [
      "playerID",
      ...
    ],
    ...
  },
  ...
}
{
  "tableIndexConfig": {
    "bloomFilterConfigs": {
      "playerID": {
        "fpp": 0.01,
        "maxSizeInBytes": 1000000,
        "loadOnHeap": true
      },
      ...
    },
    ...
  },
  ...
}
{
    "tableIndexConfig": {
        "rangeIndexColumns": [
            "column_name",
            ...
        ],
        ...
    }
}
source code
build
-Dplugins.dir=/opt/pinot/plugins -Dplugins.include=pinot-plugin-to-include-1,pinot-plugin-to-include-2
#CONTROLLER

pinot.controller.storage.factory.class.[scheme]=className of the pinot file systems
pinot.controller.segment.fetcher.protocols=file,http,[scheme]
pinot.controller.segment.fetcher.[scheme].class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher
#SERVER

pinot.server.storage.factory.class.[scheme]=className of the pinotfile systems
pinot.server.segment.fetcher.protocols=file,http,[scheme]
pinot.server.segment.fetcher.[scheme].class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher
pinotFSSpecs
  - scheme: file
    className: org.apache.pinot.spi.filesystem.LocalPinotFS
Amazon S3
Google Cloud Storage
HDFS
Azure Data Lake Storage

Backfill Data

Introduction

Pinot batch ingestion involves two parts: routing ingestion job(hourly/daily) and backfill. Here are some tutorials on how routine batch ingestion works in Pinot Offline Table:

  • Batch Ingestion Overview

  • Batch Ingestion in Practice

High Level Idea

  1. Organize raw data into buckets (eg: /var/pinot/airlineStats/rawdata/2014/01/01). Each bucket typically contains several files (eg: /var/pinot/airlineStats/rawdata/2014/01/01/airlineStats_data_2014-01-01_0.avro)

  2. Run a Pinot batch ingestion job, which points to a specific date folder like ‘/var/pinot/airlineStats/rawdata/2014/01/01’. The segment generation job will convert each such avro file into a Pinot segment for that day and give it a unique name.

  3. Run Pinot segment push job to upload those segments with those uniques names via a Controller API

IMPORTANT: The segment name is the unique identifier used to uniquely identify that segment in Pinot. If the controller gets an upload request for a segment with the same name - it will attempt to replace it with the new one.

This newly uploaded data can now be queried in Pinot. However, sometimes users will make changes to the raw data which need to be reflected in Pinot. This process is known as 'Backfill'.

How to Backfill data in Pinot

Pinot supports data modification only at the segment level, which means we should update entire segments for doing backfills. The high level idea is to repeat steps 2 (segment generation) and 3 (segment upload) mentioned above:

  • Backfill jobs must run at the same granularity as the daily job. E.g., if you need to backfill data for 2014/01/01, specify that input folder for your backfill job (e.g.: ‘/var/pinot/airlineStats/rawdata/2014/01/01’)

  • The backfill job will then generate segments with the same name as the original job (with the new data).

  • When uploading those segments to Pinot, the controller will replace the old segments with the new ones (segment names act like primary keys within Pinot) one by one.

Edge case

Backfill jobs expect the same number of (or more) data files on the backfill date. So the segment generation job will create the same number of (or more) segments than the original run.

E.g. assuming table airlineStats has 2 segments(airlineStats_2014-01-01_2014-01-01_0, airlineStats_2014-01-01_2014-01-01_1) on date 2014/01/01 and the backfill input directory contains only 1 input file. Then the segment generation job will create just one segment: airlineStats_2014-01-01_2014-01-01_0. After the segment push job, only segment airlineStats_2014-01-01_2014-01-01_0 got replaced and stale data in segment airlineStats_2014-01-01_2014-01-01_1 are still there.

In case the raw data is modified in such a way that the original time bucket has fewer input files than the first ingestion run, backfill will fail.

Lookup UDF Join

Lookup UDF is used to get dimension data via primary key from a dimension table allowing a decoration join functionality. Lookup UDF can only be used with a dimension table in Pinot. The UDF signature is as below:

lookUp('dimTableName', 'dimColToLookUp', 'dimJoinKey1', factJoinKeyVal1, 'dimJoinKey2', factJoinKeyVal2 ... )
  • dimTableName Name of the dim table to perform the lookup on.

  • dimColToLookUp The column name of the dim table to be retrieved to decorate our result.

  • dimJoinKey The column name on which we want to perform the lookup i.e. the join column name for dim table.

  • factJoinKeyVal The value of the dim table join column for which we will retrieve the dimColToLookUp for the scope and invocation.

Return type of the UDF will be that of the dimColToLookUp column type. There can also be multiple primary keys and corresponding values.

Note: If the dimension table uses a composite primary key i.e multiple primary keys, then ensure that the order of keys appearing in the lookup() UDF is same as the order defined for "primaryKeyColumns" in the dimension table schema.

Running on Azure
Running on GCP
Running on AWS
compatibility test suite
0.9.0
0.8.0
0.7.1
0.6.0
0.5.0
0.4.0
0.3.0
0.2.0
0.1.0
Querying Pinot
Supported Transformations
Supported Aggregations
User-Defined Functions (UDFs)
Cardinality Estimation
Lookup UDF Join
Querying JSON data
https://github.com/npawar/pinot-tutorial
tables
Running Pinot locally
Running Pinot in Docker
Running Pinot in Kubernetes
Running on Azure
Running on GCP
Running on AWS
Manual cluster setup
Batch import example
Stream ingestion example

Cluster

Cluster is a set of nodes comprising of servers, brokers, controllers and minions.

Pinot cluster components

Pinot leverages Apache Helix for cluster management. Helix is a cluster management framework to manage replicated, partitioned resources in a distributed system. Helix uses Zookeeper to store cluster state and metadata.

Cluster components

Briefly, Helix divides nodes into three logical components based on their responsibilities

Participant

The nodes that host distributed, partitioned resources

Spectator

The nodes that observe the current state of each Participant and use that information to access the resources. Spectators are notified of state changes in the cluster (state of a participant, or that of a partition in a participant).

Controller

The node that observes and controls the Participant nodes. It is responsible for coordinating all transitions in the cluster and ensuring that state constraints are satisfied while maintaining cluster stability.

Pinot Servers are modeled as Participants, more details about server nodes can be found in Server. Pinot Brokers are modeled as Spectators, more details about broker nodes can be found in Broker. Pinot Controllers are modeled as Controllers, more details about controller nodes can be found in Controller.

Logical view

Another way to visualize the cluster is a logical view, wherein a cluster contains tenants, tenants contain tables, and tables contain segments.

Setup a Pinot Cluster

Typically, there is only one cluster per environment/data center. There is no needed to create multiple Pinot clusters since Pinot supports the concept of tenants. At LinkedIn, the largest Pinot cluster consists of 1000+ nodes.

To setup a Pinot cluster, we need to first start Zookeeper.

0. Create a Network

Create an isolated bridge network in docker

docker network create -d bridge pinot-demo

1. Start Zookeeper

Start Zookeeper in daemon.

docker run \
    --network=pinot-demo \
    --name pinot-zookeeper \
    --restart always \
    -p 2181:2181 \
    -d zookeeper:3.5.6

2. Start Zookeeper UI

Start ZKUI to browse Zookeeper data at http://localhost:9090.

docker run \
    --network pinot-demo --name=zkui \
    -p 9090:9090 \
    -e ZK_SERVER=pinot-zookeeper:2181 \
    -d qnib/plain-zkui:latest

Download Pinot Distribution using instructions in Download

1. Start Zookeeper

bin/pinot-admin.sh StartZookeeper -zkPort 2181

2. Start Zooinspector

Install zooinspector to view the data in Zookeeper, and connect to localhost:2181

Once we've started Zookeeper, we can start other components to join this cluster. If you're using docker, pull the latest apachepinot/pinot image.

Pull pinot docker image

You can try out pre-built Pinot all-in-one docker image.

export PINOT_VERSION=latest
export PINOT_IMAGE=apachepinot/pinot:${PINOT_VERSION}
docker pull ${PINOT_IMAGE}

(Optional) You can also follow the instructions here to build your own images.

To start other components to join the cluster

  1. Start Controller

  2. Start Broker

  3. Start Server

Explore your cluster via Pinot Data Explorer

Troubleshooting Pinot

Is there any debug information available in Pinot?

Pinot offers various ways to assist with troubleshooting and debugging problems that might happen. It is recommended to start off with the debug api which may quickly surface some of the commonly occurring problems. The debug api provides information such as tableSize, ingestion status, any error messages related to state transition in server, among other things.

The table debug api can be invoked via the Swagger UI as follows:

Swagger - Table Debug Api

It can also be invoked directly by accessing the URL as follows. The api requires the tableName, and can optionally take tableType (offline|realtime) and verbosity level.

curl -X GET "http://localhost:9000/debug/tables/airlineStats?verbosity=0" -H "accept: application/json"

Pinot also provides a wide-variety of operational metrics that can be used for creating dashboards, alerting and monitoring. Also, all pinot components log debug information related to error conditions that can be used for troubleshooting.

How do I debug a slow query or a query which keeps timing out

Please use these steps:

  1. If the query executes, look at the query result. Specifically look at numEntriesScannedInFilter and numDocsScanned.

    1. If numEntriesScannedInFilter is very high, consider adding indexes for the corresponding columns being used in the filter predicates. You should also think about partitioning the incoming data based on the dimension most heavily used in your filter queries.

    2. If numDocsScanned is very high, that means the selectivity for the query is low and lots of documents need to be processed after the filtering. Consider refining the filter to increase the selectivity of the query.

  2. If the query is not executing, you can extend the query timeout by appending a timeoutMs parameter to the query (eg: select * from mytable limit 10 option(timeoutMs=60000)). Then you can repeat step 1.

  3. You can also look at GC stats for the corresponding Pinot servers. If a particular server seems to be running full GC all the time, you can do a couple of things such as

    1. Increase JVM heap (Xmx)

    2. Consider using off-heap memory for segments

    3. Decrease the total number of segments per server (by partitioning the data in a better way)

Pinot On Kubernetes FAQ

How to increase server disk size on AWS

Below is an example of AWS EKS.

1. Update Storage Class

In the K8s cluster, check the storage class: in AWS, it should be gp2.

Then update StorageClass to ensure:

allowVolumeExpansion: true

Once StorageClass is updated, it should be like:

2. Update PVC

Once the storage class is updated, then we can update PVC for the server disk size.

Now we want to double the disk size for pinot-server-3.

Below is an example of current disks:

Below is the output of data-pinot-server-3

PVC data-pinot-server-3

Now, let's change the PVC size to 2T by editing the server PVC.

kubectl edit pvc data-pinot-server-3 -n pinot

Once updated, the spec's PVC size is updated to 2T, but the status's PVC size is still 1T.

3. Restart pod to let it reflect

Restart pinot-server-3 pod:

Recheck PVC size:

Import Data

This section is an overview of the various options for importing data into Pinot.

There are multiple options for importing data into Pinot. These guides are ready-made examples that show you step-by-step instructions for importing records into Pinot, supported by our plugin architecture.

These guides are meant to get you up and running with imported data as quick as possible. Pinot supports multiple file input formats without needing to change anything other than the file name. Each example imports a ready-made dataset so you can see how things work without needing to bring your own dataset.

Pinot Batch Ingestion

Pinot Stream Ingestion

This guide will show you how to import data using stream ingestion from Apache Kafka topics.

This guide will show you how to import data using stream ingestion with upsert.

Pinot File Systems

By default, Pinot does not come with a storage layer, so all the data sent, won't be stored in case of system crash. In order to persistently store the generated segments, you will need to change controller and server configs to add a deep storage. Checkout File systems for all the info and related configs.

These guides will show you how to import data as well as persist it in the file systems.

Pinot Input Formats

These guides will show you how to import data from a Pinot supported input format.

This guide will show you how to handle the complex type in the ingested data, such as map and array.

Inverted Index

Bitmap inverted index

When inverted index is enabled for a column, Pinot maintains a map from each value to a bitmap, which makes value lookup to be constant time. When you have a column that is used for filtering frequently, adding an inverted index will improve the performance greatly.

Inverted index can be configured for a table by setting it in the table config as

{
    "tableIndexConfig": {
        "invertedIndexColumns": [
            "column_name",
            ...
        ],
        ...
    }
}

Sorted inverted index

Sorted forward index can directly be used as inverted index, with log(n) time lookup and it can benefit from data locality.

For the below example, if the query has a filter on memberId, Pinot will perform binary search on memberId values to find the range pair of docIds for corresponding filtering value. If the query requires to scan values for other columns after filtering, values within the range docId pair will be located together; therefore, we can benefit a lot from data locality.

_images/sorted-inverted.png

Sorted index performs much better than inverted index; however, it can only be applied to one column. When the query performance with inverted index is not good enough and most of queries have a filter on a specific column (e.g. memberId), sorted index can improve the query performance.

Hdfs as Deep Storage

This guide helps to setup HDFS as deepstorage for Pinot Segment.

To use HDFS as deep storage you need to include HDFS dependency jars and plugins.

Server Setup

Configuration.

Executable.

Controller Setup

Configuration.

Executable.

Broker Setup

Configuration.

Executable.

Forward Index

Forward index

Dictionary-encoded forward index with bit compression (default)

For each unique value from a column, we assign an id to it, and build a dictionary from the id to the value. Then in the forward index, we only store the bit-compressed ids instead of the values. With few number of unique values, dictionary-encoding can significantly improve the space efficiency of the storage.

The below diagram shows the dictionary encoding for two columns with integer and string types. As seen in the colA, dictionary encoding will save significant amount of space for duplicated values. On the other hand, colB has no duplicated data. Dictionary encoding will not compress much data in this case where there are a lot of unique values in the column. For string type, we pick the length of the longest value and use it as the length for dictionary’s fixed length value array. In this case, padding overhead can be high if there are a large number of unique values for a column.

Raw value forward index

In contrast to the dictionary-encoded forward index, raw value forward index directly stores values instead of ids.

Without the dictionary, the dictionary lookup step can be skipped for each value fetch. Also, the index can take advantage of the good locality of the values, thus improve the performance of scanning large number of values.

A typical use case to apply raw value forward index is when the column has a large number of unique values and the dictionary does not provide much compression. As seen the above diagram for dictionary encoding, scanning values with a dictionary involves a lot of random access because we need to perform dictionary look up. On the other hand, we can scan values sequentially with raw value forward index and this can improve performance a lot when applied appropriately.

Raw value forward index can be configured for a table by setting it in the table config as

Sorted forward index with run-length encoding

When a column is physically sorted, Pinot uses a sorted forward index with run-length encoding on top of the dictionary-encoding. Instead of saving dictionary ids for each document id, we store a pair of start and end document id for each value. (The below diagram does not include dictionary encoding layer for simplicity.)

Sorted forward index has the advantages of both good compression and data locality. Sorted forward index can also be used as inverted index.

Sorted index can be configured for a table by setting it in the table config as

Note: A given Pinot table can only have 1 sorted column

Real-time server will sort data on sortedColumn when generating segment internally. For offline push, input data needs to be sorted before running Pinot segment conversion and push job.

When applied correctly, one can find the following information on the segment metadata.

Broker Query API

REST API on the Broker

Pinot can be queried via a broker endpoint as follows. This example assumes broker is running on localhost:8099

The Pinot REST API can be accessed by invoking POST operation with a JSON body containing the parameter sql to the /query/sql endpoint on a broker.

Note

This endpoint is deprecated, and will soon be removed. The standard-SQL endpoint is the recommended endpoint.

The PQL endpoint can be accessed by invoking POST operation with a JSON body containing the parameter pql to the /query endpoint on a broker.

Query Console

Query Console can be used for running ad-hoc queries (checkbox available to query the PQL endpoint). The Query Console can be accessed by entering the <controller host>:<controller port> in your browser

pinot-admin

You can also query using the pinot-admin scripts. Make sure you follow instructions in to get Pinot locally, and then

JDBC

Pinot offers standard JDBC interface to query the database. This makes it easier to integrate Pinot with other applications such as Tableau.

Installation

You can include the JDBC dependency in your code as follows -

You can also compile the into a JAR and place the JAR in the Drivers directory of your application.

There is no need to register the driver manually as it will automatically register itself at the startup of the application.

Usage

Here's an example of how to use the pinot-jdbc-client for querying. The client only requires the controller URL.

You can also use PreparedStatements. The placeholder parameters are represented using ? ** (question mark) symbol.

Limitation

The JDBC client doesn't support INSERT, DELETE or UPDATE statements due to the database limitations. You can only use the client to query the database. The driver is also not completely ANSI SQL 92 compliant.

If you want to use JDBC driver to integrate Pinot with other applications, do make sure to check JDBC ConnectionMetadata in your code. This will help in determining which features cannot be supported by Pinot since it is an OLAP database.

Spark
Hadoop
Apache Kafka
Stream Ingestion with Upsert
Amazon S3
Azure Data Lake Storage
Google Cloud Storage
HDFS
Input formats
Complex Type (Array, Map) Handling
pinot.server.instance.enable.split.commit=true
pinot.server.storage.factory.class.hdfs=org.apache.pinot.plugin.filesystem.HadoopPinotFS
pinot.server.storage.factory.hdfs.hadoop.conf.path=/path/to/hadoop/conf/directory/
pinot.server.segment.fetcher.protocols=file,http,hdfs
pinot.server.segment.fetcher.hdfs.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher
pinot.server.segment.fetcher.hdfs.hadoop.kerberos.principle=<your kerberos principal>
pinot.server.segment.fetcher.hdfs.hadoop.kerberos.keytab=<your kerberos keytab>
pinot.set.instance.id.to.hostname=true
pinot.server.instance.dataDir=/path/in/local/filesystem/for/pinot/data/server/index
pinot.server.instance.segmentTarDir=/path/in/local/filesystem/for/pinot/data/server/segment
pinot.server.grpc.enable=true
pinot.server.grpc.port=8090
export HADOOP_HOME=/path/to/hadoop/home
export HADOOP_VERSION=2.7.1
export HADOOP_GUAVA_VERSION=11.0.2
export HADOOP_GSON_VERSION=2.2.4
export GC_LOG_LOCATION=/path/to/gc/log/file
export PINOT_VERSION=0.8.0
export PINOT_DISTRIBUTION_DIR=/path/to/apache-pinot-${PINOT_VERSION}-bin/
export SERVER_CONF_DIR=/path/to/pinot/conf/dir/
export ZOOKEEPER_ADDRESS=localhost:2181


export CLASSPATH_PREFIX="${HADOOP_HOME}/share/hadoop/hdfs/hadoop-hdfs-${HADOOP_VERSION}.jar:${HADOOP_HOME}/share/hadoop/common/lib/hadoop-annotations-${HADOOP_VERSION}.jar:${HADOOP_HOME}/share/hadoop/common/lib/hadoop-auth-${HADOOP_VERSION}.jar:${HADOOP_HOME}/share/hadoop/common/hadoop-common-${HADOOP_VERSION}.jar:${HADOOP_HOME}/share/hadoop/common/lib/guava-${HADOOP_GUAVA_VERSION}.jar:${HADOOP_HOME}/share/hadoop/common/lib/gson-${HADOOP_GSON_VERSION}.jar"
export JAVA_OPTS="-Xms4G -Xmx16G -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -Xloggc:${GC_LOG_LOCATION}/gc-pinot-server.log"
${PINOT_DISTRIBUTION_DIR}/bin/start-server.sh  -zkAddress ${ZOOKEEPER_ADDRESS} -configFileName ${SERVER_CONF_DIR}/server.conf
controller.data.dir=hdfs://path/in/hdfs/for/controller/segment
controller.local.temp.dir=/tmp/pinot/
controller.zk.str=<ZOOKEEPER_HOST:ZOOKEEPER_PORT>
controller.enable.split.commit=true
controller.access.protocols.http.port=9000
controller.helix.cluster.name=PinotCluster
pinot.controller.storage.factory.class.hdfs=org.apache.pinot.plugin.filesystem.HadoopPinotFS
pinot.controller.storage.factory.hdfs.hadoop.conf.path=/path/to/hadoop/conf/directory/
pinot.controller.segment.fetcher.protocols=file,http,hdfs
pinot.controller.segment.fetcher.hdfs.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher
pinot.controller.segment.fetcher.hdfs.hadoop.kerberos.principle=<your kerberos principal>
pinot.controller.segment.fetcher.hdfs.hadoop.kerberos.keytab=<your kerberos keytab>
controller.vip.port=9000
controller.port=9000
pinot.set.instance.id.to.hostname=true
pinot.server.grpc.enable=true
export HADOOP_HOME=/path/to/hadoop/home
export HADOOP_VERSION=2.7.1
export HADOOP_GUAVA_VERSION=11.0.2
export HADOOP_GSON_VERSION=2.2.4
export GC_LOG_LOCATION=/path/to/gc/log/file
export PINOT_VERSION=0.8.0
export PINOT_DISTRIBUTION_DIR=/path/to/apache-pinot-${PINOT_VERSION}-bin/
export SERVER_CONF_DIR=/path/to/pinot/conf/dir/
export ZOOKEEPER_ADDRESS=localhost:2181


export CLASSPATH_PREFIX="${HADOOP_HOME}/share/hadoop/hdfs/hadoop-hdfs-${HADOOP_VERSION}.jar:${HADOOP_HOME}/share/hadoop/common/lib/hadoop-annotations-${HADOOP_VERSION}.jar:${HADOOP_HOME}/share/hadoop/common/lib/hadoop-auth-${HADOOP_VERSION}.jar:${HADOOP_HOME}/share/hadoop/common/hadoop-common-${HADOOP_VERSION}.jar:${HADOOP_HOME}/share/hadoop/common/lib/guava-${HADOOP_GUAVA_VERSION}.jar:${HADOOP_HOME}/share/hadoop/common/lib/gson-${HADOOP_GSON_VERSION}.jar"
export JAVA_OPTS="-Xms8G -Xmx12G -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -Xloggc:${GC_LOG_LOCATION}/gc-pinot-controller.log"
${PINOT_DISTRIBUTION_DIR}/bin/start-controller.sh -configFileName ${SERVER_CONF_DIR}/controller.conf
pinot.set.instance.id.to.hostname=true
pinot.server.grpc.enable=true
export HADOOP_HOME=/path/to/hadoop/home
export HADOOP_VERSION=2.7.1
export HADOOP_GUAVA_VERSION=11.0.2
export HADOOP_GSON_VERSION=2.2.4
export GC_LOG_LOCATION=/path/to/gc/log/file
export PINOT_VERSION=0.8.0
export PINOT_DISTRIBUTION_DIR=/path/to/apache-pinot-${PINOT_VERSION}-bin/
export SERVER_CONF_DIR=/path/to/pinot/conf/dir/
export ZOOKEEPER_ADDRESS=localhost:2181


export CLASSPATH_PREFIX="${HADOOP_HOME}/share/hadoop/hdfs/hadoop-hdfs-${HADOOP_VERSION}.jar:${HADOOP_HOME}/share/hadoop/common/lib/hadoop-annotations-${HADOOP_VERSION}.jar:${HADOOP_HOME}/share/hadoop/common/lib/hadoop-auth-${HADOOP_VERSION}.jar:${HADOOP_HOME}/share/hadoop/common/hadoop-common-${HADOOP_VERSION}.jar:${HADOOP_HOME}/share/hadoop/common/lib/guava-${HADOOP_GUAVA_VERSION}.jar:${HADOOP_HOME}/share/hadoop/common/lib/gson-${HADOOP_GSON_VERSION}.jar"
export JAVA_OPTS="-Xms4G -Xmx4G -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -Xloggc:${GC_LOG_LOCATION}/gc-pinot-broker.log"
${PINOT_DISTRIBUTION_DIR}/bin/start-broker.sh -zkAddress ${ZOOKEEPER_ADDRESS} -configFileName  ${SERVER_CONF_DIR}/broker.conf
public static final String DB_URL = "jdbc:pinot://localhost:9000"
DriverManager.registerDriver(new PinotDriver());
Connection conn = DriverManager.getConnection(DB_URL);
Statement statement = conn.createStatement();
Integer limitResults = 10;
ResultSet rs = statement.executeQuery(String.format("SELECT UPPER(playerName) AS name FROM baseballStats LIMIT %d", limitResults));
Set<String> results = new HashSet<>();

while(rs.next()){
 String playerName = rs.getString("name");
 results.add(playerName);
}

conn.close();
Connection conn = DriverManager.getConnection(DB_URL);
PreparedStatement statement = conn.prepareStatement("SELECT UPPER(playerName) AS name FROM baseballStats WHERE age = ?");
statement.setInt(1, 20);

ResultSet rs = statement.executeQuery();
Set<String> results = new HashSet<>();

while(rs.next()){
 String playerName = rs.getString("name");
 results.add(playerName);
}

conn.close();
JDBC code
<dependency>
    <groupId>org.apache.pinot</groupId>
    <artifactId>pinot-jdbc-client</artifactId>
    <version>0.8.0</version>
</dependency>
include 'org.apache.pinot:pinot-jdbc-client:0.8.0'

Running Pinot locally

This quick start guide will help you bootstrap a Pinot standalone instance on your local machine.

In this guide you'll learn how to download and install Apache Pinot as a standalone instance.

This is a quick start guide that will show you how to quickly start an example recipe in a standalone instance and is meant for learning. To run Pinot in cluster mode, please take a look at Manual cluster setup.

Download Apache Pinot

First, let's download the Pinot distribution for this tutorial. You can either build the distribution from source or download a packaged release.

Prerequisites

Install JDK11 or higher (JDK16 is not yet supported) For JDK 8 support use Pinot 0.7.1 or compile from source

Build from source or download the distribution

Follow these steps to checkout code from Github and build Pinot locally

Prerequisites

Install Apache Maven 3.6 or higher

# checkout pinot
git clone https://github.com/apache/pinot.git
cd pinot

# build pinot
mvn install package -DskipTests -Pbin-dist

# navigate to directory containing the setup scripts
cd pinot-distribution/target/apache-pinot-$PINOT_VERSION-bin/apache-pinot-$PINOT_VERSION-bin

Add maven option -Djdk.version=8 when building with JDK 8

Note that Pinot scripts is located under pinot-distribution/target not target directory under root.

Download the latest binary release from Apache Pinot, or use this command

PINOT_VERSION=0.8.0 #set to the Pinot version you decide to use

wget https://downloads.apache.org/pinot/apache-pinot-$PINOT_VERSION/apache-pinot-$PINOT_VERSION-bin.tar.gz

Once you have the tar file,

# untar it
tar -zxvf apache-pinot-$PINOT_VERSION-bin.tar.gz

# navigate to directory containing the launcher scripts
cd apache-pinot-$PINOT_VERSION-bin

Setting up a Pinot cluster

We'll be using the quick-start scripts provided along with pinot distribution, which do the following:

  1. Set up the Pinot cluster QuickStartCluster

  2. Create a sample table and load sample data

The following quick start scripts are available. Please note though, these scripts launch the Pinot cluster with minimal resources. If you intend to play with sizable data (more than few MB), you may want to follow the Manual cluster setup and provide required resources.

Batch

Batch quick start creates the pinot cluster, creates an offline table baseballStats and pushes sample offline data to the table.

bin/quick-start-batch.sh

That's it! We've spun up a Pinot cluster. You can continue playing with other types of quick start, or simply head on to Pinot Data Explorer to check out the data in the baseballStats table.

Streaming

Streaming quick start sets up a Kafka cluster and pushes sample data to a Kafka topic. Then, it creates the Pinot cluster and creates a realtime table meetupRSVP which ingests data from the Kafka topic.

# stop previous quick start cluster, if any
bin/quick-start-streaming.sh

We now have a Pinot cluster with a realtime table! You can head over to Pinot Data Explorer to check out the data in the meetupRSVP table.

Hybrid

Hybrid quick start sets up a Kafka cluster and pushes sample data to a Kafka topic. Then, it creates the Pinot cluster and creates a hybrid table airlineStats . The realtime table ingests data from the Kafka topic. Lastly, sample data is pushed into the offline table.

# stop previous quick start cluster, if any
bin/quick-start-hybrid.sh

Let's head over to Pinot Data Explorer to check out the data we pushed to the airlineStats table.

Running on Azure

This starter guide provides a quick start for running Pinot on Microsoft Azure

This document provides the basic instruction to set up a Kubernetes Cluster on Azure Kubernetes Service (AKS)

1. Tooling Installation

1.1 Install Kubectl

Please follow this link (https://kubernetes.io/docs/tasks/tools/install-kubectl) to install kubectl.

For Mac User

brew install kubernetes-cli

Please check kubectl version after installation.

kubectl version

QuickStart scripts are tested under kubectl client version v1.16.3 and server version v1.13.12

1.2 Install Helm

Please follow this link (https://helm.sh/docs/using_helm/#installing-helm) to install helm.

For Mac User

brew install kubernetes-helm

Please check helm version after installation.

helm version

This QuickStart provides helm supports for helm v3.0.0 and v2.12.1. Please pick the script based on your helm version.

1.3 Install Azure CLI

Please follow this link (https://docs.microsoft.com/en-us/cli/azure/install-azure-cli?view=azure-cli-latest) to install Azure CLI.

For Mac User

brew update && brew install azure-cli

2. (Optional) Login to your Azure account

Below script will open default browser to sign-in to your Azure Account.

az login

3. (Optional) Create a Resource Group

Below script will create a resource group in location eastus.

AKS_RESOURCE_GROUP=pinot-demo
AKS_RESOURCE_GROUP_LOCATION=eastus
az group create --name ${AKS_RESOURCE_GROUP} \
                --location ${AKS_RESOURCE_GROUP_LOCATION}

4. (Optional) Create a Kubernetes cluster(AKS) in Azure

Below script will create a 3 nodes cluster named pinot-quickstart for demo purposes.

Please modify the parameters in the example command below:

AKS_RESOURCE_GROUP=pinot-demo
AKS_CLUSTER_NAME=pinot-quickstart
az aks create --resource-group ${AKS_RESOURCE_GROUP} \
              --name ${AKS_CLUSTER_NAME} \
              --node-count 3

Once the command is succeed, it's ready to be used.

5. Connect to an existing cluster

Simply run below command to get the credential for the cluster pinot-quickstart that you just created or your existing cluster.

AKS_RESOURCE_GROUP=pinot-demo
AKS_CLUSTER_NAME=pinot-quickstart
az aks get-credentials --resource-group ${AKS_RESOURCE_GROUP} \
                       --name ${AKS_CLUSTER_NAME}

To verify the connection, you can run:

kubectl get nodes

6. Pinot Quickstart

Please follow this Kubernetes QuickStart to deploy your Pinot Demo.

7. Delete a Kubernetes Cluster

AKS_RESOURCE_GROUP=pinot-demo
AKS_CLUSTER_NAME=pinot-quickstart
az aks delete --resource-group ${AKS_RESOURCE_GROUP} \
              --name ${AKS_CLUSTER_NAME}

Running on GCP

This starter provides a quick start for running Pinot on Google Cloud Platform (GCP)

This document provides the basic instruction to set up a Kubernetes Cluster on Google Kubernetes Engine(GKE)

1. Tooling Installation

1.1 Install Kubectl

Please follow this link (https://kubernetes.io/docs/tasks/tools/install-kubectl) to install kubectl.

For Mac User

brew install kubernetes-cli

Please check kubectl version after installation.

kubectl version

QuickStart scripts are tested under kubectl client version v1.16.3 and server version v1.13.12

1.2 Install Helm

Please follow this link (https://helm.sh/docs/using_helm/#installing-helm) to install helm.

For Mac User

brew install kubernetes-helm

Please check helm version after installation.

helm version

This QuickStart provides helm supports for helm v3.0.0 and v2.12.1. Please pick the script based on your helm version.

1.3 Install Google Cloud SDK

Please follow this link (https://cloud.google.com/sdk/install) to install Google Cloud SDK.

1.3.1 For Mac User

  • Install Google Cloud SDK

curl https://sdk.cloud.google.com | bash
  • Restart your shell

exec -l $SHELL

2. (Optional) Initialize Google Cloud Environment

gcloud init

3. (Optional) Create a Kubernetes cluster(GKE) in Google Cloud

Below script will create a 3 nodes cluster named pinot-quickstart in us-west1-b with n1-standard-2 machines for demo purposes.

Please modify the parameters in the example command below:

GCLOUD_PROJECT=[your gcloud project name]
GCLOUD_ZONE=us-west1-b
GCLOUD_CLUSTER=pinot-quickstart
GCLOUD_MACHINE_TYPE=n1-standard-2
GCLOUD_NUM_NODES=3
gcloud container clusters create ${GCLOUD_CLUSTER} \
  --num-nodes=${GCLOUD_NUM_NODES} \
  --machine-type=${GCLOUD_MACHINE_TYPE} \
  --zone=${GCLOUD_ZONE} \
  --project=${GCLOUD_PROJECT}

You can monitor cluster status by command:

gcloud compute instances list

Once the cluster is in RUNNING status, it's ready to be used.

4. Connect to an existing cluster

Simply run below command to get the credential for the cluster pinot-quickstart that you just created or your existing cluster.

GCLOUD_PROJECT=[your gcloud project name]
GCLOUD_ZONE=us-west1-b
GCLOUD_CLUSTER=pinot-quickstart
gcloud container clusters get-credentials ${GCLOUD_CLUSTER} --zone ${GCLOUD_ZONE} --project ${GCLOUD_PROJECT}

To verify the connection, you can run:

kubectl get nodes

5. Pinot Quickstart

Please follow this Kubernetes QuickStart to deploy your Pinot Demo.

6. Delete a Kubernetes Cluster

GCLOUD_ZONE=us-west1-b
gcloud container clusters delete pinot-quickstart --zone=${GCLOUD_ZONE}

Running on AWS

This guide provides a quick start for running Pinot on Amazon Web Services (AWS).

This document provides the basic instruction to set up a Kubernetes Cluster on Amazon Elastic Kubernetes Service (Amazon EKS)

1. Tooling Installation

1.1 Install Kubectl

Please follow this link (https://kubernetes.io/docs/tasks/tools/install-kubectl) to install kubectl.

For Mac User

brew install kubernetes-cli

Please check kubectl version after installation.

kubectl version

QuickStart scripts are tested under kubectl client version v1.16.3 and server version v1.13.12

1.2 Install Helm

Please follow this link (https://helm.sh/docs/using_helm/#installing-helm) to install helm.

For Mac User

brew install kubernetes-helm

Please check helm version after installation.

helm version

This QuickStart provides helm supports for helm v3.0.0 and v2.12.1. Please pick the script based on your helm version.

1.3 Install AWS CLI

Please follow this link (https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-install.html#install-tool-bundled) to install AWS CLI.

For Mac User

curl "https://d1vvhvl2y92vvt.cloudfront.net/awscli-exe-macos.zip" -o "awscliv2.zip"
unzip awscliv2.zip
sudo ./aws/install

1.4 Install Eksctl

Please follow this link (https://docs.aws.amazon.com/eks/latest/userguide/eksctl.html#installing-eksctl) to install AWS CLI.

For Mac User

brew tap weaveworks/tap
brew install weaveworks/tap/eksctl

2. (Optional) Login to your AWS account.

For first time AWS user, please register your account at https://aws.amazon.com/.

Once created the account, you can go to AWS Identity and Access Management (IAM) to create a user and create access keys under Security Credential tab.

aws configure

Environment variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY will override AWS configuration stored in file ~/.aws/credentials

3. (Optional) Create a Kubernetes cluster(EKS) in AWS

The script below will create a 1 node cluster named pinot-quickstart in us-west-2 with a t3.xlarge machine for demo purposes:

EKS_CLUSTER_NAME=pinot-quickstart
eksctl create cluster \
--name ${EKS_CLUSTER_NAME} \
--version 1.16 \
--region us-west-2 \
--nodegroup-name standard-workers \
--node-type t3.xlarge \
--nodes 1 \
--nodes-min 1 \
--nodes-max 1 \
--node-ami auto

You can monitor the cluster status via this command:

EKS_CLUSTER_NAME=pinot-quickstart
aws eks describe-cluster --name ${EKS_CLUSTER_NAME}

Once the cluster is in ACTIVE status, it's ready to be used.

4. Connect to an existing cluster

Simply run below command to get the credential for the cluster pinot-quickstart that you just created or your existing cluster.

EKS_CLUSTER_NAME=pinot-quickstart
aws eks update-kubeconfig --name ${EKS_CLUSTER_NAME}

To verify the connection, you can run:

kubectl get nodes

5. Pinot Quickstart

Please follow this Kubernetes QuickStart to deploy your Pinot Demo.

6. Delete a Kubernetes Cluster

EKS_CLUSTER_NAME=pinot-quickstart
aws eks delete-cluster --name ${EKS_CLUSTER_NAME}

Spark

Pinot supports Apache spark as a processor to create and push segment files to the database. Pinot distribution is bundled with the Spark code to process your files and convert and upload them to Pinot.

You can follow the wiki to build pinot distribution from source. The resulting JAR file can be found in pinot/target/pinot-all-${PINOT_VERSION}-jar-with-dependencies.jar

Next, you need to change the execution config in the job spec to the following -

# executionFrameworkSpec: Defines ingestion jobs to be running.
executionFrameworkSpec:

  # name: execution framework name
  name: 'spark'

  # segmentGenerationJobRunnerClassName: class name implements org.apache.pinot.spi.ingestion.batch.runner.IngestionJobRunner interface.
  segmentGenerationJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.spark.SparkSegmentGenerationJobRunner'

  # segmentTarPushJobRunnerClassName: class name implements org.apache.pinot.spi.ingestion.batch.runner.IngestionJobRunner interface.
  segmentTarPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.spark.SparkSegmentTarPushJobRunner'

  # segmentUriPushJobRunnerClassName: class name implements org.apache.pinot.spi.ingestion.batch.runner.IngestionJobRunner interface.
  segmentUriPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.spark.SparkSegmentUriPushJobRunner'

  #segmentMetadataPushJobRunnerClassName: class name implements org.apache.pinot.spi.ingestion.batch.runner.IngestionJobRunner interface
  segmentMetadataPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.spark.SparkSegmentMetadataPushJobRunner'

  # extraConfigs: extra configs for execution framework.
  extraConfigs:

    # stagingDir is used in distributed filesystem to host all the segments then move this directory entirely to output directory.
    stagingDir: your/local/dir/staging

You can check out the sample job spec here.

Now, add the pinot jar to spark's classpath using following options -

spark.driver.extraJavaOptions =>
-Dplugins.dir=${PINOT_DISTRIBUTION_DIR}/plugins

OR

spark.driver.extraClassPath =>
pinot-all-${PINOT_VERSION}-jar-with-dependencies.jar

Please ensure environment variables PINOT_ROOT_DIR and PINOT_VERSION are set properly.

Finally execute the spark job using the command -

export PINOT_VERSION=0.8.0
export PINOT_DISTRIBUTION_DIR=${PINOT_ROOT_DIR}/pinot-distribution/target/apache-pinot-${PINOT_VERSION}-bin/apache-pinot-${PINOT_VERSION}-bin

cd ${PINOT_DISTRIBUTION_DIR}

${SPARK_HOME}/bin/spark-submit \\
  --class org.apache.pinot.tools.admin.command.LaunchDataIngestionJobCommand \\
  --master "local[2]" \\
  --deploy-mode client \\
  --conf "spark.driver.extraJavaOptions=-Dplugins.dir=${PINOT_DISTRIBUTION_DIR}/plugins -Dlog4j2.configurationFile=${PINOT_DISTRIBUTION_DIR}/conf/pinot-ingestion-job-log4j2.xml" \\
  --conf "spark.driver.extraClassPath=${PINOT_DISTRIBUTION_DIR}/lib/pinot-all-${PINOT_VERSION}-jar-with-dependencies.jar" \\
  local://${PINOT_DISTRIBUTION_DIR}/lib/pinot-all-${PINOT_VERSION}-jar-with-dependencies.jar \\
  -jobSpecFile ${PINOT_DISTRIBUTION_DIR}/examples/batch/airlineStats/sparkIngestionJobSpec.yaml

Note: You should change the master to yarn and deploy-mode to cluster for production.

Azure Data Lake Storage

This guide shows you how to import data from files stored in Azure Data Lake Storage (ADLS)

You can enable the Azure Data Lake Storage using the plugin pinot-adls. In the controller or server, add the config -

-Dplugins.dir=/opt/pinot/plugins -Dplugins.include=pinot-adls

By default Pinot loads all the plugins, so you can just drop this plugin there. Also, if you specify -Dplugins.include, you need to put all the plugins you want to use, e.g. pinot-json, pinot-avro , pinot-kafka-2.0...

Azure Blob Storage provides the following options -

  • accountName : Name of the azure account under which the storage is created

  • accessKey : access key required for the authentication

  • fileSystemName - name of the filesystem to use i.e. container name (container name is similar to bucket name in S3)

  • enableChecksum - enable MD5 checksum for verification. Default is false.

Each of these properties should be prefixed by pinot.[node].storage.factory.class.abfss. where node is either controller or server depending on the config

e.g.

pinot.controller.storage.factory.class.adl.accountName=test-user

Examples

Job spec

executionFrameworkSpec:
    name: 'standalone'
    segmentGenerationJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner'
    segmentTarPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentTarPushJobRunner'
    segmentUriPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentUriPushJobRunner'
jobType: SegmentCreationAndTarPush
inputDirURI: 'adl://path/to/input/directory/'
outputDirURI: 'adl://path/to/output/directory/'
overwriteOutput: true
pinotFSSpecs:
    - scheme: adl
      className: org.apache.pinot.plugin.filesystem.ADLSGen2PinotFS
      configs:
        accountName: 'my-account'
        accessKey: 'foo-bar-1234'
        fileSystemName: 'fs-name'
recordReaderSpec:
    dataFormat: 'csv'
    className: 'org.apache.pinot.plugin.inputformat.csv.CSVRecordReader'
    configClassName: 'org.apache.pinot.plugin.inputformat.csv.CSVRecordReaderConfig'
tableSpec:
    tableName: 'students'
pinotClusterSpecs:
    - controllerURI: 'http://localhost:9000'

Controller config

controller.data.dir=adl://path/to/data/directory/
controller.local.temp.dir=/path/to/local/temp/directory
controller.enable.split.commit=true
pinot.controller.storage.factory.class.adl=org.apache.pinot.plugin.filesystem.ADLSGen2PinotFS
pinot.controller.storage.factory.adl.accountName=my-account
pinot.controller.storage.factory.adl.accessKey=foo-bar-1234
pinot.controller.storage.factory.adl.fileSystemName=fs-name
pinot.controller.segment.fetcher.protocols=file,http,adl
pinot.controller.segment.fetcher.adl.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher

Server config

pinot.server.instance.enable.split.commit=true
pinot.server.storage.factory.class.adl=org.apache.pinot.plugin.filesystem.ADLSGen2PinotFS
pinot.server.storage.factory.adl.accountName=my-account
pinot.server.storage.factory.adl.accessKey=foo-bar-1234
pinot.controller.storage.factory.adl.fileSystemName=fs-name
pinot.server.segment.fetcher.protocols=file,http,adl
pinot.server.segment.fetcher.adl.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher

Minion config

storage.factory.class.adl=org.apache.pinot.plugin.filesystem.ADLSGen2PinotFS
storage.factory.adl.accountName=my-account
storage.factory.adl.fileSystemName=fs-name
storage.factory.adl.accessKey=foo-bar-1234
segment.fetcher.protocols=file,http,adl
segment.fetcher.adl.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher

Google Cloud Storage

This guide shows you how to import data from GCP (Google Cloud Platform).

You can enable the Google Cloud Storage using the plugin pinot-gcs. In the controller or server, add the config -

-Dplugins.dir=/opt/pinot/plugins -Dplugins.include=pinot-gcs

By default Pinot loads all the plugins, so you can just drop this plugin there. Also, if you specify -Dplugins.include, you need to put all the plugins you want to use, e.g. pinot-json, pinot-avro , pinot-kafka-2.0...

GCP filesystems provides the following options -

  • projectId - The name of the Google Cloud Platform project under which you have created your storage bucket.

  • gcpKey - Location of the json file containing GCP keys. You can refer Creating and managing service account keys to download the keys.

Each of these properties should be prefixed by pinot.[node].storage.factory.class.gs. where node is either controller or server depending on the config

e.g.

pinot.controller.storage.factory.class.gs.projectId=test-project

Examples

Job spec

executionFrameworkSpec:
    name: 'standalone'
    segmentGenerationJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner'
    segmentTarPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentTarPushJobRunner'
    segmentUriPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentUriPushJobRunner'
jobType: SegmentCreationAndTarPush
inputDirURI: 'gs://my-bucket/path/to/input/directory/'
outputDirURI: 'gs://my-bucket/path/to/output/directory/'
overwriteOutput: true
pinotFSSpecs:
    - scheme: gs
      className: org.apache.pinot.plugin.filesystem.GcsPinotFS
      configs:
        projectId: 'my-project'
        gcpKey: 'path-to-gcp json key file'
recordReaderSpec:
    dataFormat: 'csv'
    className: 'org.apache.pinot.plugin.inputformat.csv.CSVRecordReader'
    configClassName: 'org.apache.pinot.plugin.inputformat.csv.CSVRecordReaderConfig'
tableSpec:
    tableName: 'students'
pinotClusterSpecs:
    - controllerURI: 'http://localhost:9000'

Controller config

controller.data.dir=gs://path/to/data/directory/
controller.local.temp.dir=/path/to/local/temp/directory
controller.enable.split.commit=true
pinot.controller.storage.factory.class.gs=org.apache.pinot.plugin.filesystem.GcsPinotFS
pinot.controller.storage.factory.gs.projectId=my-project
pinot.controller.storage.factory.gs.gcpKey=path/to/gcp/key.json
pinot.controller.segment.fetcher.protocols=file,http,gs
pinot.controller.segment.fetcher.gs.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher

Server config

pinot.server.instance.enable.split.commit=true
pinot.server.storage.factory.class.gs=org.apache.pinot.plugin.filesystem.GcsPinotFS
pinot.server.storage.factory.gs.projectId=my-project
pinot.server.storage.factory.gs.gcpKey=path/to/gcp/key.json
pinot.server.segment.fetcher.protocols=file,http,gs
pinot.server.segment.fetcher.gs.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher

Minion config

storage.factory.class.gs=org.apache.pinot.plugin.filesystem.GcsPinotFS
storage.factory.gs.projectId=my-project
storage.factory.gs.gcpKey=path/to/gcp/key.json
segment.fetcher.protocols=file,http,gs
segment.fetcher.gs.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher

Indexing

This page describes the different indexing techniques available in Pinot

Pinot supports the following indexing techniques:

  • Forward Index

    • Dictionary-encoded forward index with bit compression

    • Raw value forward index

    • Sorted forward index with run-length encoding

  • Inverted Index

    • Bitmap inverted index

    • Sorted inverted index

  • Star-tree Index

  • Bloom Filter

  • Range Index

  • Text Index

  • Geospatial

  • JSON Index

Each of these techniques has advantages in different query scenarios. By default, Pinot creates a dictionary-encoded forward index for each column.

Enabling indexes

There are 2 ways to create indexes for a Pinot table.

As part of ingestion, during Pinot segment generation

Indexing is enabled by specifying the desired column names in the table config. More details about how to configure each type of index can be found in the respective index's section above or in the Table Config section.

Dynamically added or removed

Indexes can also be dynamically added to or removed from segments at any point. Update your table config with the latest set of indexes you wish to have. For example, if you had an inverted index on foo and now want to include bar, you would update your table config from this:

"tableIndexConfig": {
        "invertedIndexColumns": ["foo"],
        ...
    }

To this:

"tableIndexConfig": {
        "invertedIndexColumns": ["foo", "bar"],
        ...
    }

Next, invoke reload API. This API sends reload messages via Helix to all servers, as part of which indexes are added or removed from the local segments. This happens without any downtime and is completely transparent to the queries. In the case of the addition of an index, only the new index is created and appended to the existing segment. In the case of the removal of an index, its related states are cleaned up from Pinot servers. You can find this API under Segments tab on Swagger:

curl -X POST 
"http://localhost:9000/segments/myTable/reload" 
-H "accept: application/json"

Or you can also find this action on the Pinot UI, on the specific table's page.

Tuning Index

The inverted index will provide good performance for most use cases, especially if your use case doesn't have a strict low latency requirement. You should start by using this, and if your queries aren't fast enough, switch to advanced indices like the sorted or Star-Tree index.

0.2.0

The 0.2.0 release is the first release after the initial one and includes several improvements, reported following.

New Features and Bug Fixes

  • Added support for Kafka 2.0

  • Table rebalancer now supports a minimum number of serving replicas during rebalance

  • Added support for UDF in filter predicates and selection

  • Added support to use hex string as the representation of byte array for queries (see PR #4041)

  • Added support for parquet reader (see PR #3852)

  • Introduced interface stability and audience annotations (see PR #4063)

  • Refactor HelixBrokerStarter to separate constructor and start() - backwards incompatible (see PR #4100)

  • Admin tool for listing segments with invalid intervals for offline tables

  • Migrated to log4j2 (see PR #4139)

  • Added simple avro msg decoder

  • Added support for passing headers in Pinot client

  • Table rebalancer now supports a minimum number of serving replicas during rebalance

  • Support transform functions with AVG aggregation function (see PR #4557)

  • Configurations additions/changes

    • Allow customized metrics prefix (see PR #4392)

    • Controller.enable.batch.message.mode to false by default (see PR #3928)

    • RetentionManager and OfflineSegmentIntervalChecker initial delays configurable (see PR #3946)

    • Config to control kafka fetcher size and increase default (see PR #3869)

    • Added a percent threshold to consider startup of services (see PR #4011)

    • Make SingleConnectionBrokerRequestHandler as default (see PR #4048)

    • Always enable default column feature, remove the configuration (see PR #4074)

    • Remove redundant default broker configurations (see PR #4106)

    • Removed some config keys in server(see PR #4222)

    • Add config to disable HLC realtime segment (see PR #4235)

    • Make RetentionManager and OfflineSegmentIntervalChecker initial delays configurable (see PR #3946)

    • The following config variables are deprecated and will be removed in the next release:

      • pinot.broker.requestHandlerType will be removed, in favor of using the "singleConnection" broker request handler. If you have set this configuration, please remove it and use the default type ("singleConnection") for broker request handler.

Work in Progress

  • We are in the process of separating Helix and Pinot controllers, so that administrators can have the option of running independent Helix controllers and Pinot controllers.

  • We are in the process of moving towards supporting SQL query format and results.

  • We are in the process of separating instance and segment assignment using instance pools to optimize the number of Helix state transitions in Pinot clusters with thousands of tables.

Other Notes

  • Task management does not work correctly in this release, due to bugs in Helix. We will upgrade to Helix 0.9.2 (or later) version to get this fixed.

  • You must upgrade to this release before moving onto newer versions of Pinot release. The protocol between Pinot-broker and Pinot-server has been changed and this release has the code to retain compatibility moving forward. Skipping this release may (depending on your environment) cause query errors if brokers are upgraded and servers are in the process of being upgraded.

  • As always, we recommend that you upgrade controllers first, and then brokers and lastly the servers in order to have zero downtime in production clusters.

  • Pull Request #4100 introduces a backwards incompatible change to Pinot broker. If you use the Java constructor on HelixBrokerStarter class, then you will face a compilation error with this version. You will need to construct the object and call start() method in order to start the broker.

  • Pull Request #4139 introduces a backwards incompatible change for log4j configuration. If you used a custom log4j configuration (log4j.xml), you need to write a new log4j2 configuration (log4j2.xml). In addition, you may need to change the arguments on the command line to start Pinot components.

    If you used Pinot-admin command to start Pinot components, you don't need any change. If you used your own commands to start pinot components, you will need to pass the new log4j2 config as a jvm parameter (i.e. substitute -Dlog4j.configuration or -Dlog4j.configurationFile argument with -Dlog4j2.configurationFile=log4j2.xml).

Cardinality Estimation

Cardinality estimation is a classic problem. Pinot solves it with multiple ways each of which has a trade-off between accuracy and latency.

Accurate Results

Functions:

  • DistinctCount(x) -> LONG

Returns accurate count for all unique values in a column.

The underlying implementation is using a IntOpenHashSet in library: it.unimi.dsi:fastutil:8.2.3 to hold all the unique values.

Approximation Results

It usually takes a lot of resources and time to compute accurate results for unique counting on large datasets. In some circumstances, we can tolerate a certain error rate, in which case we can use approximation functions to tackle this problem.

HyperLogLog

HyperLogLog is an approximation algorithm for unique counting. It uses fixed number of bits to estimate the cardinality of given data set.

Pinot leverages HyperLogLog Class in library com.clearspring.analytics:stream:2.7.0as the data structure to hold intermediate results.

Functions:

  • DistinctCountHLL(x)_ -> LONG_

For column type INT/LONG/FLOAT/DOUBLE/STRING , Pinot treats each value as an individual entry to add into HyperLogLog Object, then compute the approximation by calling method cardinality().

For column type BYTES, Pinot treats each value as a serialized HyperLogLog Object with pre-aggregated values inside. The bytes value is generated by org.apache.pinot.core.common.ObjectSerDeUtils.HYPER_LOG_LOG_SER_DE.serialize(hyperLogLog).

All deserialized HyperLogLog object will be merged into one then calling method **cardinality() **to get the approximated unique count.

Theta Sketches

The Theta Sketch framework enables set operations over a stream of data, and can also be used for cardinality estimation. Pinot leverages the Sketch Class and its extensions from the library org.apache.datasketches:datasketches-java:1.2.0-incubating to perform distinct counting as well as evaluating set operations.

Functions:

  • DistinctCountThetaSketch(<thetaSketchColumn>, <thetaSketchParams>, predicate1, predicate2..., postAggregationExpressionToEvaluate**) **-> LONG

    • thetaSketchColumn (required): Name of the column to aggregate on.

    • thetaSketchParams (required): Parameters for constructing the intermediate theta-sketches. Currently, the only supported parameter is nominalEntries.

    • predicates (optional)_: _ These are individual predicates of form lhs <op> rhs which are applied on rows selected by the where clause. During intermediate sketch aggregation, sketches from the thetaSketchColumn that satisfies these predicates are unionized individually. For example, all filtered rows that match country=USA are unionized into a single sketch. Complex predicates that are created by combining (AND/OR) of individual predicates is supported.

    • postAggregationExpressionToEvaluate (required): The set operation to perform on the individual intermediate sketches for each of the predicates. Currently supported operations are SET_DIFF, SET_UNION, SET_INTERSECT , where DIFF requires two arguments and the UNION/INTERSECT allow more than two arguments.

In the example query below, the where clause is responsible for identifying the matching rows. Note, the where clause can be completely independent of the postAggregationExpression. Once matching rows are identified, each server unionizes all the sketches that match the individual predicates, i.e. country='USA' , device='mobile' in this case. Once the broker receives the intermediate sketches for each of these individual predicates from all servers, it performs the final aggregation by evaluating the postAggregationExpression and returns the final cardinality of the resulting sketch.

select distinctCountThetaSketch(
  sketchCol, 
  'nominalEntries=1024', 
  'country'=''USA'' AND 'state'=''CA'', 'device'=''mobile'', 'SET_INTERSECT($1, $2)'
) 
from table 
where country = 'USA' or device = 'mobile...' 
  • DistinctCountRawThetaSketch(<thetaSketchColumn>, <thetaSketchParams>, predicate1, predicate2..., postAggregationExpressionToEvaluate**)** -> HexEncoded Serialized Sketch Bytes

This is the same as the previous function, except it returns the byte serialized sketch instead of the cardinality sketch. Since Pinot returns responses as JSON strings, bytes are returned as hex encoded strings. The hex encoded string can be deserialized into sketch by using the library org.apache.commons.codec.binaryas Hex.decodeHex(stringValue.toCharArray()).

Use OSS as Deep Storage for Pinot

Configure AliCloud Object Storage Service (OSS) as Pinot deep storage

OSS can be used as HDFS deep storage for Apache Pinot without implementing the OSS file system plugin. You should follow the steps below; 1. Configure hdfs-site.xml and core-site.xml files. After that, put these configurations under any desired path, then set the value of pinot.<node>.storage.factory.oss.hadoop.conf config on the controller/server configs to this path.

For hdfs-site.xml; you do not have to give any configuration;

<?xml version="1.0" encoding="UTF-8"?>
<configuration>
</configuration>

For core-site.xml; you have to give OSS access/secret and bucket configurations like below;

<?xml version="1.0"?>
<configuration>
    <property>
	      <name>fs.defaultFS</name>
	      <value>oss://your-bucket-name/</value>
	  </property>
    <property>
        <name>fs.oss.accessKeyId</name>
        <value>your-access-key-id</value>
    </property>
    <property>
        <name>fs.oss.accessKeySecret</name>
        <value>your-access-key-secret</value>
    </property>
    <property>
        <name>fs.oss.impl</name>
        <value>com.aliyun.emr.fs.oss.OssFileSystem</value>
    </property>
    <property>
        <name>fs.oss.endpoint</name>
        <value>your-oss-endpoint</value>
    </property>
</configuration>

2. In order to access OSS, find your HDFS jars related to OSS and put them under the PINOT_DIR/lib. You can use jars below but be careful about versions to avoid conflict.

  • smartdata-aliyun-oss

  • smartdata-hadoop-common

  • guava

3. Set OSS deep storage configs on controller.conf and server.conf;

Controller config

controller.data.dir=oss://your-bucket-name/path/to/segments
controller.local.temp.dir=/path/to/local/temp/directory
controller.enable.split.commit=true
pinot.controller.storage.factory.class.oss=org.apache.pinot.plugin.filesystem.HadoopPinotFS
pinot.controller.storage.factory.oss.hadoop.conf.path=path/to/conf/directory/
pinot.controller.segment.fetcher.protocols=file,http,oss
pinot.controller.segment.fetcher.oss.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher

Server config

pinot.server.instance.enable.split.commit=true
pinot.server.storage.factory.class.oss=org.apache.pinot.plugin.filesystem.HadoopPinotFS
pinot.server.storage.factory.oss.hadoop.conf.path=path/to/conf/directory/
pinot.server.segment.fetcher.protocols=file,http,oss
pinot.server.segment.fetcher.oss.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher

Example Job Spec

Using the same HDFS deep storage configs and jars, you can read data from OSS, then create segments and push them to OSS again. An example standalone batch ingestion job can be like below;

executionFrameworkSpec:
  name: 'standalone'
  segmentGenerationJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner'
  segmentTarPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentTarPushJobRunner'
  segmentUriPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentUriPushJobRunner'
  segmentMetadataPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentMetadataPushJobRunner'
jobType: SegmentCreationAndMetadataPush
inputDirURI: 'oss://your-bucket-name/input'
includeFileNamePattern: 'glob:**/*.csv'
outputDirURI: 'oss://your-bucket-name/output'
overwriteOutput: true
pinotFSSpecs:
  - scheme: oss
    className: org.apache.pinot.plugin.filesystem.HadoopPinotFS
    configs:
      hadoop.conf.path: '/path/to/hadoop/conf'
recordReaderSpec:
  dataFormat: 'csv'
  className: 'org.apache.pinot.plugin.inputformat.csv.CSVRecordReader'
  configClassName: 'org.apache.pinot.plugin.inputformat.csv.CSVRecordReaderConfig'
tableSpec:
  tableName: 'transcript'
pinotClusterSpecs:
  - controllerURI: '<http://localhost:9000>'
{
    "tableIndexConfig": {
        "noDictionaryColumns": [
            "column_name",
            ...
        ],
        ...
    }
}
{
    "tableIndexConfig": {
        "sortedColumn": [
            "column_name"
        ],
        ...
    }
}
$ grep memberId <segment_name>/v3/metadata.properties | grep isSorted
column.memberId.isSorted = true
cd incubator-pinot/pinot-tools/target/pinot-tools-pkg 
bin/pinot-admin.sh PostQuery \
  -queryType sql \
  -brokerPort 8000 \
  -query "select count(*) from baseballStats"
2020/03/04 12:46:33.459 INFO [PostQueryCommand] [main] Executing command: PostQuery -brokerHost localhost -brokerPort 8000 -queryType sql -query select count(*) from baseballStats
2020/03/04 12:46:33.854 INFO [PostQueryCommand] [main] Result: {"resultTable":{"dataSchema":{"columnDataTypes":["LONG"],"columnNames":["count(*)"]},"rows":[[97889]]},"exceptions":[],"numServersQueried":1,"numServersResponded":1,"numSegmentsQueried":1,"numSegmentsProcessed":1,"numSegmentsMatched":1,"numConsumingSegmentsQueried":0,"numDocsScanned":97889,"numEntriesScannedInFilter":0,"numEntriesScannedPostFilter":0,"numGroupsLimitReached":false,"totalDocs":97889,"timeUsedMs":185,"segmentStatistics":[],"traceInfo":{},"minConsumingFreshnessTimeMs":0}
Getting Pinot
$ curl -H "Content-Type: application/json" -X POST \
   -d '{"sql":"select foo, count(*) from myTable group by foo limit 100"}' \
   http://localhost:8099/query/sql
$ curl -H "Content-Type: application/json" -X POST \
   -d '{"pql":"select count(*) from myTable group by foo top 100"}' \
   http://localhost:8099/query

Tenant

A tenant is a logical component defined as a group of server/broker nodes with the same Helix tag.

In order to support multi-tenancy, Pinot has first-class support for tenants. Every table is associated with a server tenant and a broker tenant. This controls the nodes that will be used by this table as servers and brokers. This allows all tables belonging to a particular use case to be grouped under a single tenant name.

The concept of tenants is very important when the multiple use cases are using Pinot and there is a need to provide quotas or some sort of isolation across tenants. For example, consider we have two tables Table A and Table B in the same Pinot cluster.

We can configure Table A with server tenant Tenant A and Table B with server tenant Tenant B. We can tag some of the server nodes for Tenant A and some for Tenant B. This will ensure that segments of Table A only reside on servers tagged with Tenant A, and segment of Table B only reside on servers tagged with Tenant B. The same isolation can be achieved at the broker level, by configuring broker tenants to the tables.

No need to create separate clusters for every table or use case!

Tenant Config

This tenant is defined in the section of the table config.

This section contains 2 main fields broker and server , which decide the tenants used for the broker and server components of this table.

In the above example:

  • The table will be served by brokers that have been tagged as brokerTenantName_BROKER in Helix.

  • If this were an offline table, the offline segments for the table will be hosted in Pinot servers tagged in Helix as serverTenantName_OFFLINE

  • If this were a real-time table, the real-time segments (both consuming as well as completed ones) will be hosted in pinot servers tagged in Helix as serverTenantName_REALTIME.

Creating a tenant

Broker tenant

Here's a sample broker tenant config. This will create a broker tenant sampleBrokerTenant by tagging 3 untagged broker nodes as sampleBrokerTenant_BROKER.

To create this tenant use the following command. The creation will fail if number of untagged broker nodes is less than numberOfInstances.

Follow instructions in to get Pinot locally, and then

Check out the table config in the to make sure it was successfully uploaded.

Server tenant

Here's a sample server tenant config. This will create a server tenant sampleServerTenant by tagging 1 untagged server node as sampleServerTenant_OFFLINE and 1 untagged server node as sampleServerTenant_REALTIME.

To create this tenant use the following command. The creation will fail if number of untagged server nodes is less than offlineInstances + realtimeInstances.

Follow instructions in to get Pinot locally, and then

Check out the table config in the to make sure it was successfully uploaded.

Query FAQ

Querying

I get the following error when running a query, what does it mean?

This essentially implies that the Pinot Broker assigned to the table specified in the query was not found. A common root cause for this is a typo in the table name in the query. Another uncommon reason could be if there wasn't actually a broker with required broker tenant tag for the table.

What are all the fields in the Pinot query's JSON response?

Here's the page explaining the Pinot response format:

SQL Query fails with "Encountered 'timestamp' was expecting one of..."

"timestamp" is a reserved keyword in SQL. Escape timestamp with double quotes.

Other commonly encountered reserved keywords are date, time, table.

Filtering on STRING column WHERE column = "foo" does not work?

For filtering on STRING columns, use single quotes

ORDER BY using an alias doesn't work?

The fields in the ORDER BY clause must be one of the group by clauses or aggregations, BEFORE applying the alias. Therefore, this will not work

Instead, this will work

Does pagination work in GROUP BY queries?

No. Pagination only works for SELECTION queries

How do I increase timeout for a query ?

You can add this at the end of your query: option(timeoutMs=X). For eg: the following example will use a timeout of 20 seconds for the query:

How do I optimize my Pinot table for doing aggregations and group-by on high cardinality columns ?

In order to speed up aggregations, you can enable metrics aggregation on the required column by adding a in the corresponding schema and setting aggregateMetrics to true in the table config. You can also use a star-tree index config for such columns ()

How do I verify that an index is created on a particular column ?

There are 2 ways to verify this:

  1. Log in to a server that hosts segments of this table. Inside the data directory, locate the segment directory for this table. In this directory, there is a file named index_map which lists all the indexes and other data structures created for each segment. Verify that the requested index is present here.

  2. During query: Use the column in the filter predicate and check the value of numEntriesScannedInFilter . If this value is 0, then indexing is working as expected (works for Inverted index)

Does Pinot use a default value for LIMIT in queries?

Yes, Pinot uses a default value of LIMIT 10 in queries. The reason behind this default value is to avoid unintentionally submitting expensive queries that end up fetching or processing a lot of data from Pinot. Users can always overwrite this by explicitly specifying a LIMIT value.

Does Pinot cache query results?

Pinot does not cache query results, each query is computed in its entirety. Note though, running the same or similar query multiple times will naturally pull in segment pages into memory making subsequent calls faster. Also, for realtime systems, the data is changing in realtime, so results cannot be cached. For offline-only systems, caching layer can be built on top of Pinot, with invalidation mechanism built-in to invalidate the cache when data is pushed into Pinot.

How do I determine if StarTree index is being used for my query?

The query execution engine will prefer to use StarTree index for all queries where it can be used. The criteria to determine whether StarTree index can be used is as follows:

  • All aggregation function + column pairs in the query must exist in the StarTree index.

  • All dimensions that appear in filter predicates and group-by should be StarTree dimensions.

For queries where above is true, StarTree index is used. For other queries, the execution engine will default to using the next best index available.

Amazon Kinesis

This is not tested in production. You may hit some snags while trying to use this.

To ingest events from an Amazon Kinesis stream into Pinot, set the following configs into the table config

where the Kinesis specific properties are:

Kinesis supports authentication using the . The credential provider looks for the credentials in the following order -

  • Environment Variables - AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY (RECOMMENDED since they are recognized by all the AWS SDKs and CLI except for .NET), or AWS_ACCESS_KEY and AWS_SECRET_KEY (only recognized by Java SDK)

  • Java System Properties - aws.accessKeyId and aws.secretKey

  • Web Identity Token credentials from the environment or container

  • Credential profiles file at the default location (~/.aws/credentials) shared by all AWS SDKs and the AWS CLI

  • Credentials delivered through the Amazon EC2 container service if AWS_CONTAINER_CREDENTIALS_RELATIVE_URI environment variable is set and security manager has permission to access the variable,

  • Instance profile credentials delivered through the Amazon EC2 metadata service

You can also specify the accessKey and secretKey using the properties. However, this method is not secure and should be used only for POC setups. You can also specify other aws fields such as AWS_SESSION_TOKEN as environment variables and config and it will work.

Limitations

  1. ShardID is of the format "shardId-000000000001". We use the numeric part as partitionId. Our partitionId variable is integer. If shardIds grow beyond Integer.MAX_VALUE, we will overflow

  2. Segment size based thresholds for segment completion will not work. It assumes that partition "0" always exists. However, once the shard 0 is split/merged, we will no longer have partition 0.

HDFS

This guide shows you how to import data from HDFS.

You can enable the using the plugin pinot-hdfs. In the controller or server, add the config:

By default Pinot loads all the plugins, so you can just drop this plugin there. Also, if you specify -Dplugins.include, you need to put all the plugins you want to use, e.g. pinot-json, pinot-avro , pinot-kafka-2.0...

HDFS implementation provides the following options -

  • hadoop.conf.path : Absolute path of the directory containing hadoop XML configuration files such as hdfs-site.xml, core-site.xml .

  • hadoop.write.checksum : create checksum while pushing an object. Default is false

  • hadoop.kerberos.principle

  • hadoop.kerberos.keytab

Each of these properties should be prefixed by pinot.[node].storage.factory.class.hdfs. where node is either controller or server depending on the config

The kerberos configs should be used only if your Hadoop installation is secured with Kerberos. Please check on how to generate Kerberos security identification.

You will also need to provide proper Hadoop dependencies jars from your Hadoop installation to your Pinot startup scripts.

Push HDFS segment to Pinot Controller

To push HDFS segment files to Pinot controller, you just need to ensure you have proper Hadoop configuration as we mentioned in the previous part. Then your remote segment creation/push job can send the HDFS path of your newly created segment files to the Pinot Controller and let it download the files.

For example, the following curl requests to Controller will notify it to download segment files to the proper table:

Examples

Job spec

Standalone Job:

Hadoop Job:

Controller config

Server config

Minion config

User-Defined Functions (UDFs)

Pinot currently supports two ways for you to implement your own functions:

  • Groovy Scripts

  • Scalar Functions

Groovy Scripts

Pinot allows you to run any function using scripts. The syntax for executing Groovy script within the query is as follows:

GROOVY('result value metadata json', ''groovy script', arg0, arg1, arg2...)

This function will execute the groovy script using the arguments provided and return the result that matches the provided result value metadata. The function requires the following arguments:

  • Result value metadata json - json string representing result value metadata. Must contain non-null keys resultType and isSingleValue.

  • Groovy script to execute- groovy script string, which uses arg0, arg1, arg2 etc to refer to the arguments provided within the script

  • arguments - pinot columns/other transform functions that are arguments to the groovy script

Examples

  • Add colA and colB and return a single-value INT groovy( '{"returnType":"INT","isSingleValue":true}', 'arg0 + arg1', colA, colB)

  • Find the max element in mvColumn array and return a single-value INT

    groovy('{"returnType":"INT","isSingleValue":true}', 'arg0.toList().max()', mvColumn)

  • Find all elements of the array mvColumn and return as a multi-value LONG column

    groovy('{"returnType":"LONG","isSingleValue":false}', 'arg0.findIndexValues{ it > 5 }', mvColumn)

  • Multiply length of array mvColumn with colB and return a single-value DOUBLE

    groovy('{"returnType":"DOUBLE","isSingleValue":true}', 'arg0 * arg1', arraylength(mvColumn), colB)

  • Find all indexes in mvColumnA which have value foo, add values at those indexes in mvColumnB

    groovy( '{"returnType":"DOUBLE","isSingleValue":true}', 'def x = 0; arg0.eachWithIndex{item, idx-> if (item == "foo") {x = x + arg1[idx] }}; return x' , mvColumnA, mvColumnB)

  • Switch case which returns a FLOAT value depending on length of mvCol array

    groovy('{\"returnType\":\"FLOAT\", \"isSingleValue\":true}', 'def result; switch(arg0.length()) { case 10: result = 1.1; break; case 20: result = 1.2; break; default: result = 1.3;}; return result.floatValue()', mvCol)

  • Any Groovy script which takes no arguments

    groovy('new Date().format( "yyyyMMdd" )', '{"returnType":"STRING","isSingleValue":true}')

Scalar Functions

Since the 0.5.0 release, Pinot supports custom functions that return a single output for multiple inputs. Examples of scalar functions can be found in and

Pinot automatically identifies and registers all the functions that have the @ScalarFunction annotation.

Only Java methods are supported.

Adding user defined scalar functions

You can add new scalar functions as follows:

  • Create a new java project. Make sure you keep the package name as org.apache.pinot.scalar.XXXX

  • In your java project include the dependency

  • Annotate your methods with @ScalarFunction annotation. Make sure the method is static and returns only a single value output. The input and output can have one of the following types -

    • Integer

    • Long

    • Double

    • String

  • Place the compiled JAR in the /plugins directory in pinot. You will need to restart all Pinot instances if they are already running.

  • Now, you can use the function in a query as follows:

Note that the function name in SQL is the same as the function name in Java. The SQL function name is case-insensitive as well.

Schema Evolution

So far, you've seen how to for a Pinot table. In this tutorial, we'll see how to evolve the schema (e.g. add a new column to the schema). This guide assumes you have a Pinot cluster up and running (eg: as mentioned in ). We will also assume there's an existing table baseballStats created as part of the .

Pinot only allows adding new columns to the schema. In order to drop a column, change the column name or data type, a new table has to be created.

Get the existing schema

Let's begin by first fetching the existing schema. We can do this using the controller API:

Add a new column

Let's add a new column at the end of the schema, something like this (by editing baseballStats.schema

In this example, we're adding a new column called yearsOfExperience with a default value of 1.

Update the schema

You can now update the schema using the following command

Please note: this will not be reflected immediately. You can use the following command to reload the table segments for this column to show up. This can be done as follows:

After the reload, now you can query the new column as shown below:

Real-Time Pinot table: In case of real-time tables, make sure the "pinot.server.instance.reload.consumingSegment" config is set to true inside . Without this, the current consuming segment(s) will not reflect the default null value for newly added columns.

Note that the real values for the newly added columns won't be reflected within the current consuming segment(s). The next consuming segment(s) will start consuming the real values.

Derived Column

New columns can be added with . If all the source columns for the new column exist in the schema, the transformed values will be generated for the new column instead of filling default values.

Backfilling the Data

As you can observe, the current query returns the defaultNullValue for the newly added column. In order to populate this column with real values, you will need to re-run the batch ingestion job for the past dates.

Real-Time Pinot table: Backfilling data does not work for real-time tables. If you only have a real-time table, you can convert it to a hybrid table, by adding an offline counterpart that uses the same schema. Then you can backfill the offline table and fill in values for the newly added column. More on .

{'errorCode': 410, 'message': 'BrokerResourceMissingError'}
select "timestamp" from myTable
SELECT COUNT(*) from myTable WHERE column = 'foo'
SELECT count(colA) as aliasA, colA from tableA GROUP BY colA ORDER BY aliasA
SELECT count(colA) as sumA, colA from tableA GROUP BY colA ORDER BY count(colA)
SELECT COUNT(*) from myTable option(timeoutMs=20000)
https://docs.pinot.apache.org/users/api/querying-pinot-using-standard-sql/response-format
metric field
read more about star-tree here
{
  "tableName": "kinesisTable",
  "tableType": "REALTIME",
  "segmentsConfig": {
    "timeColumnName": "timestamp",
    "replicasPerPartition": "1"
  },
  "tenants": {},
  "tableIndexConfig": {
    "loadMode": "MMAP",
    "streamConfigs": {
      "streamType": "kinesis",
      "stream.kinesis.topic.name": "<your kinesis stream name>",
      "region": "<your region>",
      "accessKey": "<your access key>",
      "secretKey": "<your secret key>",
      "shardIteratorType": "AFTER_SEQUENCE_NUMBER",
      "stream.kinesis.consumer.type": "lowlevel",
      "stream.kinesis.fetch.timeout.millis": "30000",
      "stream.kinesis.decoder.class.name": "org.apache.pinot.plugin.stream.kafka.KafkaJSONMessageDecoder",
      "stream.kinesis.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kinesis.KinesisConsumerFactory",
      "realtime.segment.flush.threshold.size": "1000000",
      "realtime.segment.flush.threshold.time": "6h"
    }
  },
  "metadata": {
    "customConfigs": {}
  }
}

Property

Description

streamType

This should be set to "kinesis"

stream.kinesis.topic.name

Kinesis stream name

region

Kinesis region e.g. us-west-1

accessKey

Kinesis access key

secretKey

Kinesis secret key

shardIteratorType

Set to "LATEST" for largest offset (default),TRIM_HORIZON for earliest offset.

maxRecordsToFetch

... Default is 20.

DefaultCredentialsProviderChain
-Dplugins.dir=/opt/pinot/plugins -Dplugins.include=pinot-hdfs
export HADOOP_HOME=/local/hadoop/
export HADOOP_VERSION=2.7.1
export HADOOP_GUAVA_VERSION=11.0.2
export HADOOP_GSON_VERSION=2.2.4
export CLASSPATH_PREFIX="${HADOOP_HOME}/share/hadoop/hdfs/hadoop-hdfs-${HADOOP_VERSION}.jar:${HADOOP_HOME}/share/hadoop/common/lib/hadoop-annotations-${HADOOP_VERSION}.jar:${HADOOP_HOME}/share/hadoop/common/lib/hadoop-auth-${HADOOP_VERSION}.jar:${HADOOP_HOME}/share/hadoop/common/hadoop-common-${HADOOP_VERSION}.jar:${HADOOP_HOME}/share/hadoop/common/lib/guava-${HADOOP_GUAVA_VERSION}.jar:${HADOOP_HOME}/share/hadoop/common/lib/gson-${HADOOP_GSON_VERSION}.jar"
curl -X POST -H "UPLOAD_TYPE:URI" -H "DOWNLOAD_URI:hdfs://nameservice1/hadoop/path/to/segment/file.
executionFrameworkSpec:
    name: 'standalone'
    segmentGenerationJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner'
    segmentTarPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentTarPushJobRunner'
    segmentUriPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentUriPushJobRunner'
jobType: SegmentCreationAndTarPush
inputDirURI: 'hdfs:///path/to/input/directory/'
outputDirURI: 'hdfs:///path/to/output/directory/'
includeFileNamePath: 'glob:**/*.csv'
overwriteOutput: true
pinotFSSpecs:
    - scheme: hdfs
      className: org.apache.pinot.plugin.filesystem.HadoopPinotFS
      configs:
        hadoop.conf.path: 'path/to/conf/directory/' 
recordReaderSpec:
    dataFormat: 'csv'
    className: 'org.apache.pinot.plugin.inputformat.csv.CSVRecordReader'
    configClassName: 'org.apache.pinot.plugin.inputformat.csv.CSVRecordReaderConfig'
tableSpec:
    tableName: 'students'
pinotClusterSpecs:
    - controllerURI: 'http://localhost:9000'
executionFrameworkSpec:
    name: 'hadoop'
    segmentGenerationJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.hadoop.HadoopSegmentGenerationJobRunner'
    segmentTarPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.hadoop.HadoopSegmentTarPushJobRunner'
    segmentUriPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.hadoop.HadoopSegmentUriPushJobRunner'
    extraConfigs:
      stagingDir: 'hdfs:///path/to/staging/directory/'
jobType: SegmentCreationAndTarPush
inputDirURI: 'hdfs:///path/to/input/directory/'
outputDirURI: 'hdfs:///path/to/output/directory/'
includeFileNamePath: 'glob:**/*.csv'
overwriteOutput: true
pinotFSSpecs:
    - scheme: hdfs
      className: org.apache.pinot.plugin.filesystem.HadoopPinotFS
      configs:
        hadoop.conf.path: '/etc/hadoop/conf/' 
recordReaderSpec:
    dataFormat: 'csv'
    className: 'org.apache.pinot.plugin.inputformat.csv.CSVRecordReader'
    configClassName: 'org.apache.pinot.plugin.inputformat.csv.CSVRecordReaderConfig'
tableSpec:
    tableName: 'students'
pinotClusterSpecs:
    - controllerURI: 'http://localhost:9000'
controller.data.dir=hdfs://path/to/data/directory/
controller.local.temp.dir=/path/to/local/temp/directory
controller.enable.split.commit=true
pinot.controller.storage.factory.class.hdfs=org.apache.pinot.plugin.filesystem.HadoopPinotFS
pinot.controller.storage.factory.hdfs.hadoop.conf.path=path/to/conf/directory/
pinot.controller.segment.fetcher.protocols=file,http,hdfs
pinot.controller.segment.fetcher.hdfs.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher
pinot.controller.segment.fetcher.hdfs.hadoop.kerberos.principle=<your kerberos principal>
pinot.controller.segment.fetcher.hdfs.hadoop.kerberos.keytab=<your kerberos keytab>
pinot.server.instance.enable.split.commit=true
pinot.server.storage.factory.class.hdfs=org.apache.pinot.plugin.filesystem.HadoopPinotFS
pinot.server.storage.factory.hdfs.hadoop.conf.path=path/to/conf/directory/
pinot.server.segment.fetcher.protocols=file,http,hdfs
pinot.server.segment.fetcher.hdfs.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher
pinot.server.segment.fetcher.hdfs.hadoop.kerberos.principle=<your kerberos principal>
pinot.server.segment.fetcher.hdfs.hadoop.kerberos.keytab=<your kerberos keytab>
storage.factory.class.hdfs=org.apache.pinot.plugin.filesystem.HadoopPinotFS
storage.factory.hdfs.hadoop.conf.path=path/to/conf/directory
segment.fetcher.protocols=file,http,hdfs
segment.fetcher.hdfs.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher
segment.fetcher.hdfs.hadoop.kerberos.principle=<your kerberos principal>
segment.fetcher.hdfs.hadoop.kerberos.keytab=<your kerberos keytab>
Hadoop DFS
Hadoop Kerberos guide
<dependency>
  <groupId>org.apache.pinot</groupId>
  <artifactId>pinot-common</artifactId>
  <version>0.5.0</version>
 </dependency>
include 'org.apache.pinot:pinot-common:0.5.0'
//Example Scalar function

@ScalarFunction
static String mySubStr(String input, Integer beginIndex) {
  return input.substring(beginIndex);
}
SELECT mysubstr(playerName, 4) 
FROM baseballStats
Apache Groovy
StringFunctions
DateTimeFunctions
$ curl localhost:9000/schemas/baseballStats > baseballStats.schema
{
  "schemaName" : "baseballStats",
  "dimensionFieldSpecs" : [ {
  
    ...
    
    }, {
    "name" : "myNewColumn",
    "dataType" : "INT",
    "defaultNullValue": 1
  } ]
}
bin/pinot-admin.sh AddSchema -schemaFile baseballStats.schema -exec
$ curl -F [email protected] localhost:9000/schemas
$ curl -X POST localhost:9000/segments/baseballStats/reload
$ bin/pinot-admin.sh PostQuery \
  -queryType sql \
  -brokerPort 8000 \
  -query "select playerID, yearsOfExperience from baseballStats limit 10" 2>/dev/null
Executing command: PostQuery -brokerHost 192.168.86.234 -brokerPort 8000 -queryType sql -query select playerID, yearsOfExperience from baseballStats limit 10
Result: {"resultTable":{"dataSchema":{"columnNames":["playerID","yearsOfExperience"],"columnDataTypes":["STRING","INT"]},"rows":[["aardsda01",1],["aardsda01",1],["aardsda01",1],["aardsda01",1],["aardsda01",1],["aardsda01",1],["aardsda01",1],["aaronha01",1],["aaronha01",1],["aaronha01",1]]},"exceptions":[],"numServersQueried":1,"numServersResponded":1,"numSegmentsQueried":1,"numSegmentsProcessed":1,"numSegmentsMatched":1,"numConsumingSegmentsQueried":0,"numDocsScanned":10,"numEntriesScannedInFilter":0,"numEntriesScannedPostFilter":20,"numGroupsLimitReached":false,"totalDocs":97889,"timeUsedMs":3,"segmentStatistics":[],"traceInfo":{},"minConsumingFreshnessTimeMs":0}
create a new schema
https://docs.pinot.apache.org/basics/getting-started/running-pinot-locally
batch quick start
Server config
ingestion transforms
hybrid tables here

Hadoop

Segment Creation and Push

Pinot supports Apache Hadoop as a processor to create and push segment files to the database. Pinot distribution is bundled with the Spark code to process your files and convert and upload them to Pinot.

You can follow the [wiki] to build pinot distribution from source. The resulting JAR file can be found in pinot/target/pinot-all-${PINOT_VERSION}-jar-with-dependencies.jar

Next, you need to change the execution config in the job spec to the following -

# executionFrameworkSpec: Defines ingestion jobs to be running.
executionFrameworkSpec:

    # name: execution framework name
  name: 'hadoop'

  # segmentGenerationJobRunnerClassName: class name implements org.apache.pinot.spi.ingestion.batch.runner.IngestionJobRunner interface.
  segmentGenerationJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.hadoop.HadoopSegmentGenerationJobRunner'

  # segmentTarPushJobRunnerClassName: class name implements org.apache.pinot.spi.ingestion.batch.runner.IngestionJobRunner interface.
  segmentTarPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.hadoop.HadoopSegmentTarPushJobRunner'

  # segmentUriPushJobRunnerClassName: class name implements org.apache.pinot.spi.ingestion.batch.runner.IngestionJobRunner interface.
  segmentUriPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.hadoop.HadoopSegmentUriPushJobRunner'

  # segmentMetadataPushJobRunnerClassName: class name implements org.apache.pinot.spi.ingestion.batch.runner.IngestionJobRunner interface.
  segmentMetadataPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.hadoop.HadoopSegmentMetadataPushJobRunner'

    # extraConfigs: extra configs for execution framework.
  extraConfigs:

    # stagingDir is used in distributed filesystem to host all the segments then move this directory entirely to output directory.
    stagingDir: your/local/dir/staging

You can check out the sample job spec here.

Finally execute the hadoop job using the command -

export PINOT_VERSION=0.8.0
export PINOT_DISTRIBUTION_DIR=${PINOT_ROOT_DIR}/pinot-distribution/target/apache-pinot-${PINOT_VERSION}-bin/apache-pinot-${PINOT_VERSION}-bin
export HADOOP_CLIENT_OPTS="-Dplugins.dir=${PINOT_DISTRIBUTION_DIR}/plugins -Dlog4j2.configurationFile=${PINOT_DISTRIBUTION_DIR}/conf/pinot-ingestion-job-log4j2.xml"

hadoop jar  \\
        ${PINOT_DISTRIBUTION_DIR}/lib/pinot-all-${PINOT_VERSION}-jar-with-dependencies.jar \\
        org.apache.pinot.tools.admin.command.LaunchDataIngestionJobCommand \\
        -jobSpecFile ${PINOT_DISTRIBUTION_DIR}/examples/batch/airlineStats/hadoopIngestionJobSpec.yaml

Please ensure environment variables PINOT_ROOT_DIR and PINOT_VERSION are set properly.

Data Preprocessing before Segment Creation

We’ve seen some requests that data should be massaged (like partitioning, sorting, resizing) before creating and pushing segments to Pinot.

The MapReduce job called SegmentPreprocessingJob would be the best fit for this use case, regardless of whether the input data is of AVRO or ORC format.

Check the below example to see how to use SegmentPreprocessingJob.

In Hadoop properties, set the following to enable this job:

enable.preprocessing = true
preprocess.path.to.output = <output_path>

In table config, specify the operations in preprocessing.operations that you'd like to enable in the MR job, and then specify the exact configs regarding those operations:

{
    "OFFLINE": {
        "metadata": {
            "customConfigs": {
                “preprocessing.operations”: “resize, partition, sort”, // To enable the following preprocessing operations
                "preprocessing.max.num.records.per.file": "100",       // To enable resizing
                "preprocessing.num.reducers": "3"                      // To enable resizing
            }
        },
        ...
        "tableIndexConfig": {
            "aggregateMetrics": false,
            "autoGeneratedInvertedIndex": false,
            "bloomFilterColumns": [],
            "createInvertedIndexDuringSegmentGeneration": false,
            "invertedIndexColumns": [],
            "loadMode": "MMAP",
            "nullHandlingEnabled": false,
            "segmentPartitionConfig": {       // To enable partitioning
                "columnPartitionMap": {
                    "item": {
                        "functionName": "murmur",
                        "numPartitions": 4
                    }
                }
            },
            "sortedColumn": [                // To enable sorting
                "actorId"
            ],
            "streamConfigs": {}
        },
        "tableName": "tableName_OFFLINE",
        "tableType": "OFFLINE",
        "tenants": {
            ...
        }
    }
}

preprocessing.num.reducers

Minimum number of reducers. Optional. Fetched when partitioning gets disabled and resizing is enabled. This parameter is to avoid having too many small input files for Pinot, which leads to the case where Pinot server is holding too many small segments, causing too many threads.

preprocessing.max.num.records.per.file

Maximum number of records per reducer. Optional.Unlike, “preprocessing.num.reducers”, this parameter is to avoid having too few large input files for Pinot, which misses the advantage of muti-threading when querying. When not set, each reducer will finally generate one output file. When set (e.g. M), the original output file will be split into multiple files and each new output file contains at most M records. It does not matter whether partitioning is enabled or not.

For more details on this MR job, please refer to this document.

Amazon S3

You can enable Amazon S3 Filesystem backend by including the plugin pinot-s3 .

-Dplugins.dir=/opt/pinot/plugins -Dplugins.include=pinot-s3

By default Pinot loads all the plugins, so you can just drop this plugin there. Also, if you specify -Dplugins.include, you need to put all the plugins you want to use, e.g. pinot-json, pinot-avro , pinot-kafka-2.0...

You can also configure the S3 filesystem using the following options:

Configuration

Description

region

The AWS Data center region in which the bucket is located

accessKey

(Optional) AWS access key required for authentication. This should only be used for testing purposes as we don't store these keys in secret.

secretKey

(Optional) AWS secret key required for authentication. This should only be used for testing purposes as we don't store these keys in secret.

endpoint

(Optional) Override endpoint for s3 client.

disableAcl

If this is set tofalse, bucket owner is granted full access to the objects created by pinot. Default value is true.

serverSideEncryption

(Optional) The server-side encryption algorithm used when storing this object in Amazon S3 (Now supports aws:kms), set to null to disable SSE.

ssekmsKeyId

(Optional, but required when serverSideEncryption=aws:kms) Specifies the AWS KMS key ID to use for object encryption. All GET and PUT requests for an object protected by AWS KMS will fail if not made via SSL or using SigV4.

ssekmsEncryptionContext

(Optional) Specifies the AWS KMS Encryption Context to use for object encryption. The value of this header is a base64-encoded UTF-8 string holding JSON with the encryption context key-value pairs.

Each of these properties should be prefixed by pinot.[node].storage.factory.s3. where node is either controller or server depending on the config

e.g.

pinot.controller.storage.factory.s3.region=ap-southeast-1

S3 Filesystem supports authentication using the DefaultCredentialsProviderChain. The credential provider looks for the credentials in the following order -

  • Environment Variables - AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY (RECOMMENDED since they are recognized by all the AWS SDKs and CLI except for .NET), or AWS_ACCESS_KEY and AWS_SECRET_KEY (only recognized by Java SDK)

  • Java System Properties - aws.accessKeyId and aws.secretKey

  • Web Identity Token credentials from the environment or container

  • Credential profiles file at the default location (~/.aws/credentials) shared by all AWS SDKs and the AWS CLI

  • Credentials delivered through the Amazon EC2 container service if AWS_CONTAINER_CREDENTIALS_RELATIVE_URI environment variable is set and security manager has permission to access the variable,

  • Instance profile credentials delivered through the Amazon EC2 metadata service

You can also specify the accessKey and secretKey using the properties. However, this method is not secure and should be used only for POC setups.

Examples

Job spec

executionFrameworkSpec:
    name: 'standalone'
    segmentGenerationJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner'
    segmentTarPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentTarPushJobRunner'
    segmentUriPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentUriPushJobRunner'
jobType: SegmentCreationAndTarPush
inputDirURI: 's3://pinot-bucket/pinot-ingestion/batch-input/'
outputDirURI: 's3://pinot-bucket/pinot-ingestion/batch-output/'
overwriteOutput: true
pinotFSSpecs:
    - scheme: s3
      className: org.apache.pinot.plugin.filesystem.S3PinotFS
      configs:
        region: 'ap-southeast-1'
recordReaderSpec:
    dataFormat: 'csv'
    className: 'org.apache.pinot.plugin.inputformat.csv.CSVRecordReader'
    configClassName: 'org.apache.pinot.plugin.inputformat.csv.CSVRecordReaderConfig'
tableSpec:
    tableName: 'students'
pinotClusterSpecs:
    - controllerURI: 'http://localhost:9000'

Controller config

controller.data.dir=s3://path/to/data/directory/
controller.local.temp.dir=/path/to/local/temp/directory
controller.enable.split.commit=true
pinot.controller.storage.factory.class.s3=org.apache.pinot.plugin.filesystem.S3PinotFS
pinot.controller.storage.factory.s3.region=ap-southeast-1
pinot.controller.segment.fetcher.protocols=file,http,s3
pinot.controller.segment.fetcher.s3.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher

Server config

pinot.server.instance.enable.split.commit=true
pinot.server.storage.factory.class.s3=org.apache.pinot.plugin.filesystem.S3PinotFS
pinot.server.storage.factory.s3.region=ap-southeast-1
pinot.server.segment.fetcher.protocols=file,http,s3
pinot.server.segment.fetcher.s3.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher

Minion config

storage.factory.class.s3=org.apache.pinot.plugin.filesystem.S3PinotFS
storage.factory.s3.region=ap-southeast-1
segment.fetcher.protocols=file,http,s3
segment.fetcher.s3.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher
"tenants": {
  "broker": "brokerTenantName",
  "server": "serverTenantName"
}
sample-broker-tenant.json
{
     "tenantRole" : "BROKER",
     "tenantName" : "sampleBrokerTenant",
     "numberOfInstances" : 3
}
bin/pinot-admin.sh AddTenant \
    -name sampleBrokerTenant 
    -role BROKER 
    -instanceCount 3 -exec
curl -i -X POST -H 'Content-Type: application/json' -d @sample-broker-tenant.json localhost:9000/tenants
sample-server-tenant.json
{
     "tenantRole" : "SERVER",
     "tenantName" : "sampleServerTenant",
     "offlineInstances" : 1,
     "realtimeInstances" : 1
}
bin/pinot-admin.sh AddTenant \
    -name sampleServerTenant \
    -role SERVER \
    -offlineInstanceCount 1 \
    -realtimeInstanceCount 1 -exec
curl -i -X POST -H 'Content-Type: application/json' -d @sample-server-tenant.json localhost:9000/tenants
tenants
Getting Pinot
Rest API
Getting Pinot
Rest API
Defining tenants for tables
Table isolation using tenants

Running Pinot in Docker

This quick start guide will show you how to run a Pinot cluster using Docker.

This is a quickstart guide that will show you how to quickly start an example recipe in a standalone instance and is meant for learning. To run Pinot in cluster mode, please take a look at .

Prerequisites

Install

You can also try if you already have a local cluster installed or setup.

If running locally, please ensure your docker cluster has enough resources, below is a sample config.

We'll be using our docker image apachepinot/pinot:latest to run this quick start, which does the following:

  • Sets up the Pinot cluster

  • Creates a sample table and loads sample data

The following quick-start scripts are available

  • Batch example

  • Streaming example

  • Hybrid example

Before running the scripts, create an isolated bridge network pinot-demo in docker. This will allow all docker containers to easily communicate with each other. You can create the network using the following command -

Batch example

In this example we demonstrate how to do batch processing with Pinot.

  • Starts Pinot deployment by starting

    • Apache Zookeeper

    • Pinot Controller

    • Pinot Broker

    • Pinot Server

  • Creates a demo table

    • baseballStats

  • Launches a standalone data ingestion job

    • Builds one Pinot segment for a given CSV data file for table baseballStats

    • Pushes the built segment to the Pinot controller

  • Issues sample queries to Pinot

Once the Docker container is running, you can view the logs by running the following command.

That's it! We've spun up a Pinot cluster.

It may take a while for all the Pinot components to start and for the sample data to be loaded.

Use the below command to check the status in the container logs.

Your cluster is ready once you see the cluster setup completion messages and sample queries, as demonstrated below.

You can head over to to check out the data in the baseballStats table.

Streaming example

In this example we demonstrate how to do stream processing with Pinot.

  • Starts Pinot deployment by starting

    • Apache Kafka

    • Apache Zookeeper

    • Pinot Controller

    • Pinot Broker

    • Pinot Server

  • Creates a demo table

    • meetupRsvp

  • Launches a meetup stream

  • Publishes data to a Kafka topic meetupRSVPEvents to be subscribed to by Pinot

  • Issues sample queries to Pinot

Once the cluster is up, you can head over to to check out the data in the meetupRSVPEvents table.

Hybrid example

In this example we demonstrate how to do hybrid stream and batch processing with Pinot.

  1. Starts Pinot deployment by starting

    • Apache Kafka

    • Apache Zookeeper

    • Pinot Controller

    • Pinot Broker

    • Pinot Server

  2. Creates a demo table

    • airlineStats

  3. Launches a standalone data ingestion job

    • Builds Pinot segments under a given directory of Avro files for table airlineStats

    • Pushes built segments to Pinot controller

  4. Launches a stream of flights stats

  5. Publishes data to a Kafka topic airlineStatsEvents to be subscribed to by Pinot

  6. Issues sample queries to Pinot

Once the cluster is up, you can head over to to check out the data in the airlineStats table.

Ingestion FAQ

Data processing

What is a good segment size?

While Pinot can work with segments of various sizes, for optimal use of Pinot, you want to get your segments sized in the 100MB to 500MB (un-tarred/uncompressed) range. Please note that having too many (thousands or more) of tiny segments for a single table just creates more overhead in terms of the metadata storage in Zookeeper as well as in the Pinot servers' heap. At the same time, having too few really large (GBs) segments reduces parallelism of query execution, as on the server side, the thread parallelism of query execution is at segment level.

Can multiple Pinot tables consume from the same Kafka topic?

Yes. Each table can be independently configured to consume from any given Kafka topic, regardless of whether there are other tables that are also consuming from the same Kafka topic.

How do I enable partitioning in Pinot, when using Kafka stream?

Setup partitioner in the Kafka producer:

The partitioning logic in the stream should match the partitioning config in Pinot. Kafka uses murmur2, and the equivalent in Pinot is Murmur function.

Set partitioning config as below using same column used in Kafka

and also set

More details about how partitioner works in Pinot .

How do I store BYTES column in JSON data?

For JSON, you can use hex encoded string to ingest BYTES

How do I flatten my JSON Kafka stream?

We have function which can store a top level json field as a STRING in Pinot.

Then you can use these during query time, to extract fields from the json string.

NOTE This works well if some of your fields are nested json, but most of your fields are top level json keys. If all of your fields are within a nested JSON key, you will have to store the entire payload as 1 column, which is not ideal.

Support for flattening during ingestion is on the roadmap:

How do I escape Unicode in my Job Spec YAML file?

To use explicit code points, you must double-quote (not single-quote) the string, and escape the code point via "\uHHHH", where HHHH is the four digit hex code for the character. See for more details.

Is there a limit on the maximum length of a string column in Pinot?

By default, Pinot limits the length of a String column to 512 bytes. If you want to overwrite this value, you can set the maxLength attribute in the schema as follows:

When can new events become queryable when getting ingested into a real-time table?

Events are available to be read by queries as soon as they are ingested. This is because events are instantly indexed in-memory upon ingestion.

The ingestion of events into the real-time table is not transactional, so replicas of the open segment are not immediately consistent. Pinot trades consistency for availability upon network partitioning (CAP theorem) to provide ultra-low ingestion latencies at high throughput.

However, when the open segment is closed and its in-memory indexes are flushed to persistent storage, all its replicas are guaranteed to be consistent, with the .

Indexing

How to set inverted indexes?

Inverted indexes are set in the tableConfig's tableIndexConfig -> invertedIndexColumns list. Here's the documentation for tableIndexConfig: along with a sample table that has set inverted indexes on some columns.

Applying inverted indexes to a table config will generate inverted index to all new segments. In order to apply the inverted indexes to all existing segments, follow steps in

How to apply inverted index to existing setup?

  1. Add the columns you wish to index to the tableIndexConfig-> invertedIndexColumns list. This sample table config show inverted indexes set: To update the table config use the Pinot Swagger API:

  2. Invoke the reload API:

Right now, there’s no easy way to confirm that reload succeeded. One way it to check out the index_map file inside the segment metadata, you should see inverted index entries for the new columns. An API for this is coming soon:

How to create star-tree indexes?

Star-tree indexes are configured in the table config under the tableIndexConfig -> starTreeIndexConfigs (list) and enableDefaultStarTree (boolean). Read more about how to configure star-tree indexes:

The new segments will have star-tree indexes generated after applying the star-tree index configs to the table config. Currently Pinot does not support adding star-tree indexes to the existing segments.

Handling time in Pinot

How does Pinot’s real-time ingestion handle out-of-order events?

Pinot does not require ordering of event time stamps. Out of order events are still consumed and indexed into the "currently consuming" segment. In a pathological case, if you have a 2 day old event come in "now", it will still be stored in the segment that is open for consumption "now". There is no strict time-based partitioning for segments, but star-indexes and hybrid tables will handle this as appropriate.

See the for more details about how hybrid tables handle this. Specifically, the time-boundary is computed as max(OfflineTIme) - 1 unit of granularity. Pinot does store the min-max time for each segment and uses it for pruning segments, so segments with multiple time intervals may not be perfectly pruned.

When generating star-indexes, the time column will be part of the star-tree so the tree can still be efficiently queried for segments with multiple time intervals.

What is the purpose of a hybrid table not using max(OfflineTime) to determine the time-boundary, and instead using an offset?

This lets you have an old event up come in without building complex offline pipelines that perfectly partition your events by event timestamps. With this offset, even if your offline data pipeline produces segments with a maximum timestamp, Pinot will not use the offline dataset for that last chunk of segments. The expectation is if you process offline the next time-range of data, your data pipeline will include any late events.

Why are segments not strictly time-partitioned?

It might seem odd that segments are not strictly time-partitioned, unlike similar systems such as Apache Druid. This allows real-time ingestion to consume out-of-order events. Even though segments are not strictly time-partitioned, Pinot will still index, prune, and query segments intelligently by time-intervals to for performance of hybrid tables and time-filtered data.

When generating offline segments, the segments generated such that segments only contain one time-interval and are well partitioned by the time column.

JSON Index

JSON index can be applied to JSON string columns to accelerate the value lookup and filtering for the column.

When to use JSON index

JSON string can be used to represent the array, map, nested field without forcing a fixed schema. It is very flexible, but the flexibility comes with a cost - filtering on JSON string column is very expensive.

Suppose we have some JSON records similar to the following sample record stored in the person column:

Without an index, in order to look up a key and filter records based on the value, we need to scan and reconstruct the JSON object from the JSON string for every record, look up the key and then compare the value.

For example, in order to find all persons whose name is "adam", the query will look like:

JSON index is designed to accelerate the filtering on JSON string columns without scanning and reconstructing all the JSON objects.

Configure JSON index

To enable the JSON index, set the following config in the table config:

Note that JSON index can only be applied to STRING columns whose values are JSON strings.

How to use JSON index

JSON index can be used via the JSON_MATCH predicate: JSON_MATCH(<column>, '<filterExpression>'). For example, to find all persons whose name is "adam", the query will look like:

Note that the quotes within the filter expression need to be escaped.

In release 0.7.1, we use the old syntax for filterExpression: 'name=''adam'''

Supported filter expressions

Simple key lookup

Find all persons whose name is "adam":

In release 0.7.1, we use the old syntax for filterExpression: 'name=''adam'''

Chained key lookup

Find all persons who have an address (one of the addresses) with number 112:

In release 0.7.1, we use the old syntax for filterExpression: 'addresses.number=112'

Nested filter expression

Find all persons whose name is "adam" and also have an address (one of the addresses) with number 112:

In release 0.7.1, we use the old syntax for filterExpression: 'name=''adam'' AND addresses.number=112'

Array access

Find all persons whose first address has number 112:

In release 0.7.1, we use the old syntax for filterExpression: '"addresses[0].number"=112'

Existence check

Find all persons who have phone field within the JSON:

In release 0.7.1, we use the old syntax for filterExpression: 'phone IS NOT NULL'

Find all persons whose first address does not contain floor field within the JSON:

In release 0.7.1, we use the old syntax for filterExpression: '"addresses[0].floor" IS NULL'

JSON context is maintained

The JSON context is maintained for object elements within an array, i.e. the filter won't cross match different objects in the array.

To find all persons who live on "main st" in "ca":

This query won't match "adam" because none of his addresses matches both the street and the country.

If JSON context is not desired, use multiple separate JSON_MATCH predicates. E.g. to find all persons who have addresses on "main st" and have addressed in "ca" (doesn't have to be the same address):

This query will match "adam" because one of his addressed matches the street and another one matches the country.

Note that the array index is maintained as a separate entry within the element, so in order to query different elements within an array, multiple JSON_MATCH predicates are required. E.g. to find all persons who have first address on "main st" and second address on "second st":

Supported JSON values

Object

See examples above.

Array

To find the records with array element "item1" in "arrayCol":

To find the records with second array element "item2" in "arrayCol":

Value

To find the records with value 123 in "valueCol":

Null

To find the records with null in "nullableCol":

In release 0.7.1, json string must be object (cannot be null, value or array); multi-dimensional array is not supported.

Limitations

  1. The key (left-hand side) of the filter expression must be the leaf level of the JSON object, e.g. "$.addresses[*]"='main st' won't work.

Querying Pinot

Learn how to query Pinot using SQL

DIALECT

Pinot uses Calcite SQL Parser to parse queries and uses MYSQL_ANSI dialect. You can see the grammar .

Limitations

Pinot does not support Joins or nested Subqueries and we recommend using Presto for queries that span multiple tables. Read for more info.

No DDL support. Tables can be created via the .

Identifier vs Literal

In Pinot SQL:

  • Double quotes(") are used to force string identifiers, e.g. column name.

  • Single quotes(') are used to enclose string literals.

Mis-using those might cause unexpected query results:

E.g.

  • WHERE a='b' means the predicate on the column a equals to a string literal value 'b'

  • WHERE a="b" means the predicate on the column a equals to the value of the column b

Example Queries

  • Use single quotes for literals and double quotes (optional) for identifiers (column names)

  • If you name the columns as timestamp, date, or other reserved keywords, or the column name includes special characters, you need to use double quotes when you refer to them in the query.

Simple selection

Aggregation

Grouping on Aggregation

Ordering on Aggregation

Filtering

For performant filtering of ids in a list, see .

Filtering with NULL predicate

Selection (Projection)

Ordering on Selection

Pagination on Selection

Note: results might not be consistent if column ordered by has same value in multiple rows.

Wild-card match (in WHERE clause only)

To count rows where the column airlineName starts with U

Case-When Statement

Pinot supports the CASE-WHEN-ELSE statement.

Example 1:

Example 2:

UDF

Functions have to be implemented within Pinot. Injecting functions is not yet supported. The example below demonstrate the use of UDFs. More examples in

BYTES column

Pinot supports queries on BYTES column using HEX string. The query response also uses hex string to represent bytes value.

E.g. the query below fetches all the rows for a given UID.

Filtering with IdSet

Learn how to write fast queries for looking up ids in a list of values.

A common use case is filtering on an id field with a list of values. This can be done with the IN clause, but this approach doesn't perform well with large lists of ids. In these cases, you can use an IdSet.

Functions

ID_SET

ID_SET(columnName, 'sizeThresholdInBytes=8388608;expectedInsertions=5000000;fpp=0.03' )

This function returns a base 64 encoded IdSet of the values for a single column. The IdSet implementation used depends on the column data type:

  • INT - RoaringBitmap unless sizeThresholdInBytes is exceeded, in which case Bloom Filter.

  • LONG - Roaring64NavigableMap unless sizeThresholdInBytes is exceeded, in which case Bloom Filter.

  • Other types - Bloom Filter

The following parameters are used to configure the Bloom Filter:

  • expectedInsertions - Number of expected insertions for the BloomFilter, must be positive

  • fpp - Desired false positive probability for the BloomFilter, must be positive and < 1.0

Note that when a Bloom Filter is used, the filter results are approximate - you can get false-positive results (for membership in the set), leading to potentially unexpected results.

IN_ID_SET

IN_ID_SET(columnName, base64EncodedIdSet)

This function returns 1 if a column contains a value specified in the IdSet and 0 if it does not.

IN_SUBQUERY

IN_SUBQUERY(columnName, subQuery)

This function generates an IdSet from a subquery and then filters ids based on that IdSet on a Pinot broker.

IN__PARTITIONED__SUBQUERY

IN_PARTITIONED_SUBQUERY(columnName, subQuery)

This function generates an IdSet from a subquery and then filters ids based on that IdSet on a Pinot server.

This function works best when the data is partitioned by the id column and each server contains all the data for a partition. The generated IdSet for the subquery will be smaller as it will only contain the ids for the partitions served by the server. This will give better performance.

Examples

Create IdSet

You can create an IdSet of the values in the yearID column by running the following:

idset(yearID)

When creating an IdSet for values in non INT/LONG columns, we can configure the expectedInsertions:

idset(playerName)
idset(playerName)

We can also configure the fpp parameter:

idset(playerName)

Filter by values in IdSet

We can use the IN_ID_SET function to filter a query based on an IdSet. To return rows for yearIDs in the IdSet, run the following:

Filter by values not in IdSet

To return rows for yearIDs not in the IdSet, run the following:

Filter on broker

To filter rows for yearIDs in the IdSet on a Pinot Broker, run the following query:

To filter rows for yearIDs not in the IdSet on a Pinot Broker, run the following query:

Filter on server

To filter rows for yearIDs in the IdSet on a Pinot Server, run the following query:

To filter rows for yearIDs not in the IdSet on a Pinot Server, run the following query:

Controller Admin API

The contains all the APIs that you will need to operate and manage your cluster. It provides a set of APIs for Pinot cluster management including health check, instances management, schema and table management, data segments management.

Note: The controller API's are primarily for admin tasks. Even though the UI console queries Pinot when running queries from the query console, please use the for querying Pinot.

Let's check out the tables in this cluster by going to and click on Try it out!. We can see the baseballStats table listed here. We can also see the exact curl call made to the controller API.

You can look at the configuration of this table by going to , type in baseballStats in the table name, and click Try it out!

Let's check out the schemas in the cluster by going to and click Try it out!. We can see a schema called baseballStats in this list.

Take a look at the schema by going to , type baseballStats in the schema name, and click Try it out!.

Finally, let's checkout the data segments in the cluster by going to , type in baseballStats in the table name, and click Try it out!. There's 1 segment for this table, called baseballStats_OFFLINE_0.

You might have figured out by now, in order to get data into the Pinot cluster, we need a table, a schema and segments. Let's head over to , to find out more about these components and learn how to create them for your own data.

Golang

Pinot Client for Golang

Pinot also provides to query database directly from go application.

Installation

Please follow this link to install and start Pinot batch QuickStart locally.

Check out Client library Github Repo

Build and run the example application to query from Pinot Batch Quickstart

Usage

Create a Pinot Connection

Pinot client could be initialized through:

1. Zookeeper Path.

2. A list of broker addresses.

3. ClientConfig

Query Pinot

Please see this for your reference.

Code snippet:

Response Format

Query Response is defined as the struct of following:

Note that AggregationResults and SelectionResults are holders for PQL queries.

Meanwhile ResultTable is the holder for SQL queries. ResultTable is defined as:

RespSchema is defined as:

There are multiple functions defined for ResultTable, like:

Sample Usage is

{
  "name": "adam",
  "age": 30,
  "country": "us",
  "addresses":
  [
    {
      "number" : 112,
      "street" : "main st",
      "country" : "us"
    },
    {
      "number" : 2,
      "street" : "second st",
      "country" : "us"
    },
    {
      "number" : 3,
      "street" : "third st",
      "country" : "ca"
    }
  ]
}
SELECT ... FROM mytable WHERE JSON_EXTRACT_SCALAR(person, '$.name', 'STRING') = 'adam'
{
  "tableIndexConfig": {        
    "jsonIndexColumns": [
      "person",
      ...
    ],
    ...
  }
}
SELECT ... FROM mytable WHERE JSON_MATCH(person, '"$.name"=''adam''')
SELECT ... FROM mytable WHERE JSON_MATCH(person, '"$.name"=''adam''')
SELECT ... FROM mytable WHERE JSON_MATCH(person, '"$.addresses[*].number"=112')
SELECT ... FROM mytable WHERE JSON_MATCH(person, '"$.name"=''adam'' AND "$.addresses[*].number"=112')
SELECT ... FROM mytable WHERE JSON_MATCH(person, '"$.addresses[0].number"=112')
SELECT ... FROM mytable WHERE JSON_MATCH(person, '"$.phone" IS NOT NULL')
SELECT ... FROM mytable WHERE JSON_MATCH(person, '"$.addresses[0].floor" IS NULL')
SELECT ... FROM mytable WHERE JSON_MATCH(person, '"$.addresses[*].street"=''main st'' AND "$.addresses[*].country"=''ca''')
SELECT ... FROM mytable WHERE JSON_MATCH(person, '"$.addresses[*].street"=''main st''') AND JSON_MATCH(person, '"$.addresses[*].country"=''ca''')
SELECT ... FROM mytable WHERE JSON_MATCH(person, '"$.addresses[0].street"=''main st''') AND JSON_MATCH(person, '"$.addresses[1].street"=''second st''')
["item1", "item2", "item3"]
SELECT ... FROM mytable WHERE JSON_MATCH(arrayCol, '"$[*]"=''item1''')
SELECT ... FROM mytable WHERE JSON_MATCH(arrayCol, '"$[1]"=''item2''')
123
1.23
"Hello World"
SELECT ... FROM mytable WHERE JSON_MATCH(valueCol, '"$"=123')
null
SELECT ... FROM mytable WHERE JSON_MATCH(nullableCol, '"$" IS NULL')
SELECT ID_SET(yearID)
FROM baseballStats
WHERE teamID = 'WS1'

ATowAAABAAAAAAA7ABAAAABtB24HbwdwB3EHcgdzB3QHdQd2B3cHeAd5B3oHewd8B30Hfgd/B4AHgQeCB4MHhAeFB4YHhweIB4kHigeLB4wHjQeOB48HkAeRB5IHkweUB5UHlgeXB5gHmQeaB5sHnAedB54HnwegB6EHogejB6QHpQemB6cHqAc=

SELECT ID_SET(playerName, 'expectedInsertions=10')
FROM baseballStats
WHERE teamID = 'WS1'

AwIBBQAAAAL/////////////////////

SELECT ID_SET(playerName, 'expectedInsertions=100')
FROM baseballStats
WHERE teamID = 'WS1'

AwIBBQAAAAz///////////////////////////////////////////////9///////f///9/////7///////////////+/////////////////////////////////////////////8=

SELECT ID_SET(playerName, 'expectedInsertions=100;fpp=0.01')
FROM baseballStats
WHERE teamID = 'WS1'

AwIBBwAAAA/////////////////////////////////////////////////////////////////////////////////////////////////////////9///////////////////////////////////////////////7//////8=

SELECT yearID, count(*) 
FROM baseballStats 
WHERE IN_ID_SET(
 yearID,   
 'ATowAAABAAAAAAA7ABAAAABtB24HbwdwB3EHcgdzB3QHdQd2B3cHeAd5B3oHewd8B30Hfgd/B4AHgQeCB4MHhAeFB4YHhweIB4kHigeLB4wHjQeOB48HkAeRB5IHkweUB5UHlgeXB5gHmQeaB5sHnAedB54HnwegB6EHogejB6QHpQemB6cHqAc='
  ) = 1 
GROUP BY yearID
SELECT yearID, count(*) 
FROM baseballStats 
WHERE IN_ID_SET(
  yearID,   
  'ATowAAABAAAAAAA7ABAAAABtB24HbwdwB3EHcgdzB3QHdQd2B3cHeAd5B3oHewd8B30Hfgd/B4AHgQeCB4MHhAeFB4YHhweIB4kHigeLB4wHjQeOB48HkAeRB5IHkweUB5UHlgeXB5gHmQeaB5sHnAedB54HnwegB6EHogejB6QHpQemB6cHqAc='
  ) = 0 
GROUP BY yearID
SELECT yearID, count(*) 
FROM baseballStats 
WHERE IN_SUBQUERY(
  yearID, 
  'SELECT ID_SET(yearID) FROM baseballStats WHERE teamID = ''WS1'''
  ) = 1
GROUP BY yearID  
SELECT yearID, count(*) 
FROM baseballStats 
WHERE IN_SUBQUERY(
  yearID, 
  'SELECT ID_SET(yearID) FROM baseballStats WHERE teamID = ''WS1'''
  ) = 0
GROUP BY yearID  
SELECT yearID, count(*) 
FROM baseballStats 
WHERE IN_PARTITIONED_SUBQUERY(
  yearID, 
  'SELECT ID_SET(yearID) FROM baseballStats WHERE teamID = ''WS1'''
  ) = 1
GROUP BY yearID  
SELECT yearID, count(*) 
FROM baseballStats 
WHERE IN_PARTITIONED_SUBQUERY(
  yearID, 
  'SELECT ID_SET(yearID) FROM baseballStats WHERE teamID = ''WS1'''
  ) = 0
GROUP BY yearID  
"tableIndexConfig": {
      ..
      "segmentPartitionConfig": {
        "columnPartitionMap": {
          "column_foo": {
            "functionName": "Murmur",
            "numPartitions": 12 // same as number of kafka partitions
          }
        }
      }
"routing": {
      "segmentPrunerTypes": ["partition"]
    }
    {
      "dataType": "STRING",
      "maxLength": 1000,
      "name": "textDim1"
    },
https://docs.confluent.io/current/clients/producer.html
here
json_format(field)
json functions
https://github.com/apache/pinot/issues/5264
https://yaml.org/spec/spec.html#escaping/in%20double-quoted%20scalars/
commit protocol
https://docs.pinot.apache.org/basics/components/table#tableindexconfig-1
How to apply inverted index to existing setup?
https://docs.pinot.apache.org/basics/components/table#offline-table-config
http://localhost:9000/help#!/Table/updateTableConfig
http://localhost:9000/help#!/Segment/reloadAllSegments
https://github.com/apache/pinot/issues/5390
https://docs.pinot.apache.org/basics/indexing/star-tree-index#index-generation
Components > Broker
//default to limit 10
SELECT * 
FROM myTable 

SELECT * 
FROM myTable 
LIMIT 100
SELECT COUNT(*), MAX(foo), SUM(bar) 
FROM myTable
SELECT MIN(foo), MAX(foo), SUM(foo), AVG(foo), bar, baz 
FROM myTable
GROUP BY bar, baz 
LIMIT 50
SELECT MIN(foo), MAX(foo), SUM(foo), AVG(foo), bar, baz 
FROM myTable
GROUP BY bar, baz 
ORDER BY bar, MAX(foo) DESC 
LIMIT 50
SELECT COUNT(*) 
FROM myTable
  WHERE foo = 'foo'
  AND bar BETWEEN 1 AND 20
  OR (baz < 42 AND quux IN ('hello', 'goodbye') AND quuux NOT IN (42, 69))
SELECT COUNT(*) 
FROM myTable
  WHERE foo IS NOT NULL
  AND foo = 'foo'
  AND bar BETWEEN 1 AND 20
  OR (baz < 42 AND quux IN ('hello', 'goodbye') AND quuux NOT IN (42, 69))
SELECT * 
FROM myTable
  WHERE quux < 5
  LIMIT 50
SELECT foo, bar 
FROM myTable
  WHERE baz > 20
  ORDER BY bar DESC
  LIMIT 100
SELECT foo, bar 
FROM myTable
  WHERE baz > 20
  ORDER BY bar DESC
  LIMIT 50, 100
SELECT COUNT(*) 
FROM myTable
  WHERE REGEXP_LIKE(airlineName, '^U.*')
  GROUP BY airlineName LIMIT 10
SELECT
    CASE
      WHEN price > 30 THEN 3
      WHEN price > 20 THEN 2
      WHEN price > 10 THEN 1
      ELSE 0
    END AS price_category
FROM myTable
SELECT
  SUM(
    CASE
      WHEN price > 30 THEN 30
      WHEN price > 20 THEN 20
      WHEN price > 10 THEN 10
      ELSE 0
    END) AS total_cost
FROM myTable
SELECT COUNT(*)
FROM myTable
GROUP BY DATETIMECONVERT(timeColumnName, '1:MILLISECONDS:EPOCH', '1:HOURS:EPOCH', '1:HOURS')
SELECT * 
FROM myTable
WHERE UID = 'c8b3bce0b378fc5ce8067fc271a34892'
here
Engineering Full SQL support for Pinot at Uber
REST API
Filtering with IdSet
Transform Function in Aggregation Grouping
bin/quick-start-batch.sh
git clone [email protected]:xiangfu0/pinot-client-go.git
cd pinot-client-go
go build ./examples/batch-quickstart
./batch-quickstart
pinotClient := pinot.NewFromZookeeper([]string{"localhost:2123"}, "", "QuickStartCluster")
pinotClient := pinot.NewFromBrokerList([]string{"localhost:8000"})
pinotClient := pinot.NewWithConfig(&pinot.ClientConfig{
	ZkConfig: &pinot.ZookeeperConfig{
		ZookeeperPath:     zkPath,
		PathPrefix:        strings.Join([]string{zkPathPrefix, pinotCluster}, "/"),
		SessionTimeoutSec: defaultZkSessionTimeoutSec,
	},
    ExtraHTTPHeader: map[string]string{
        "extra-header":"value",
    },
})
pinotClient, err := pinot.NewFromZookeeper([]string{"localhost:2123"}, "", "QuickStartCluster")
if err != nil {
    log.Error(err)
}
brokerResp, err := pinotClient.ExecuteSQL("baseballStats", "select count(*) as cnt, sum(homeRuns) as sum_homeRuns from baseballStats group by teamID limit 10")
if err != nil {
    log.Error(err)
}
log.Infof("Query Stats: response time - %d ms, scanned docs - %d, total docs - %d", brokerResp.TimeUsedMs, brokerResp.NumDocsScanned, brokerResp.TotalDocs)
type BrokerResponse struct {
	AggregationResults          []*AggregationResult `json:"aggregationResults,omitempty"`
	SelectionResults            *SelectionResults    `json:"SelectionResults,omitempty"`
	ResultTable                 *ResultTable         `json:"resultTable,omitempty"`
	Exceptions                  []Exception          `json:"exceptions"`
	TraceInfo                   map[string]string    `json:"traceInfo,omitempty"`
	NumServersQueried           int                  `json:"numServersQueried"`
	NumServersResponded         int                  `json:"numServersResponded"`
	NumSegmentsQueried          int                  `json:"numSegmentsQueried"`
	NumSegmentsProcessed        int                  `json:"numSegmentsProcessed"`
	NumSegmentsMatched          int                  `json:"numSegmentsMatched"`
	NumConsumingSegmentsQueried int                  `json:"numConsumingSegmentsQueried"`
	NumDocsScanned              int64                `json:"numDocsScanned"`
	NumEntriesScannedInFilter   int64                `json:"numEntriesScannedInFilter"`
	NumEntriesScannedPostFilter int64                `json:"numEntriesScannedPostFilter"`
	NumGroupsLimitReached       bool                 `json:"numGroupsLimitReached"`
	TotalDocs                   int64                `json:"totalDocs"`
	TimeUsedMs                  int                  `json:"timeUsedMs"`
	MinConsumingFreshnessTimeMs int64                `json:"minConsumingFreshnessTimeMs"`
}
// ResultTable is a ResultTable
type ResultTable struct {
	DataSchema RespSchema      `json:"dataSchema"`
	Rows       [][]interface{} `json:"rows"`
}
// RespSchema is response schema
type RespSchema struct {
	ColumnDataTypes []string `json:"columnDataTypes"`
	ColumnNames     []string `json:"columnNames"`
}
func (r ResultTable) GetRowCount() int
func (r ResultTable) GetColumnCount() int
func (r ResultTable) GetColumnName(columnIndex int) string
func (r ResultTable) GetColumnDataType(columnIndex int) string
func (r ResultTable) Get(rowIndex int, columnIndex int) interface{}
func (r ResultTable) GetString(rowIndex int, columnIndex int) string
func (r ResultTable) GetInt(rowIndex int, columnIndex int) int
func (r ResultTable) GetLong(rowIndex int, columnIndex int) int64
func (r ResultTable) GetFloat(rowIndex int, columnIndex int) float32
func (r ResultTable) GetDouble(rowIndex int, columnIndex int) float64
// Print Response Schema
for c := 0; c < brokerResp.ResultTable.GetColumnCount(); c++ {
  fmt.Printf("%s(%s)\t", brokerResp.ResultTable.GetColumnName(c), brokerResp.ResultTable.GetColumnDataType(c))
}
fmt.Println()

// Print Row Table
for r := 0; r < brokerResp.ResultTable.GetRowCount(); r++ {
  for c := 0; c < brokerResp.ResultTable.GetColumnCount(); c++ {
    fmt.Printf("%v\t", brokerResp.ResultTable.Get(r, c))
  }
  fmt.Println()
}
a native go client
Pinot Quickstart
example
here

Operations FAQ

Operations

How much heap should I allocate for my Pinot instances?

Typically, Pinot components try to use as much off-heap (MMAP/DirectMemory) where ever possible. For example, Pinot servers load segments in memory-mapped files in MMAP mode (recommended), or direct memory in HEAP mode. Heap memory is used mostly for query execution and storing some metadata. We have seen production deployments with high throughput and low-latency work well with just 16 GB of heap for Pinot servers and brokers. Pinot controller may also cache some metadata (table configs etc) in heap, so if there are just a few tables in the Pinot cluster, a few GB of heap should suffice.

Does Pinot provide any backup/restore mechanism?

Pinot relies on deep-storage for storing backup copy of segments (offline as well as realtime). It relies on Zookeeper to store metadata (table configs, schema, cluster state, etc). It does not explicitly provide tools to take backups or restore these data, but relies on the deep-storage (ADLS/S3/GCP/etc), and ZK to persist these data/metadata.

Can I change a column name in my table, without losing data?

Changing a column name or data type is considered backward incompatible change. While Pinot does support schema evolution for backward compatible changes, it does not support backward incompatible changes like changing name/data-type of a column.

How to change number of replicas of a table?

You can change the number of replicas by updating the table config's segmentsConfig section. Make sure you have at least as many servers as the replication.

For OFFLINE table, update replication

{ 
    "tableName": "pinotTable", 
    "tableType": "OFFLINE", 
    "segmentsConfig": {
      "replication": "3", 
      ... 
    }
    ..

For REALTIME table update replicasPerPartition

{ 
    "tableName": "pinotTable", 
    "tableType": "REALTIME", 
    "segmentsConfig": {
      "replicasPerPartition": "3", 
      ... 
    }
    ..

After changing the replication, run a table rebalance.

How to run a rebalance on a table?

Refer to Rebalance.

How to control number of segments generated?

The number of segments generated depends on the number of input files. If you provide only 1 input file, you will get 1 segment. If you break up the input file into multiple files, you will get as many segments as the input files.

What are the common reasons my segment is in a BAD state ?

This typically happens when the server is unable to load the segment. Possible causes: Out-Of-Memory, no-disk space, unable to download segment from deep-store, and similar other errors. Please check server logs for more information.

How to reset a segment when it runs into a BAD state?

Use the segment reset controller REST API to reset the segment:

curl -X POST "{host}/segments/{tableNameWithType}/{segmentName}/reset"

What's the difference to Reset, Refresh, or Reload a segment?

RESET: this gets a segment in ERROR state back to ONLINE or CONSUMING state. Behind the scenes, Pinot controller takes the segment to OFFLINE state, waits for External View to stabilize, and then moves it back to ONLINE/CONSUMING state, thus effectively resetting segments or consumers in error states.

REFRESH: this replaces the segment with a new one, with the same name but often different data. Under the hood, Pinot controller sets new segment metadata in Zookeeper, and notifies brokers and servers to check their local states about this segment and update accordingly. Servers also download the new segment to replace the old one, when both have different checksums. There is no separate rest API for refreshing, and it is done as part of SegmentUpload API today.

RELOAD: this reloads the segment, often to generate a new index as updated in table config. Underlying, Pinot server gets the new table config from Zookeeper, and uses it to guide the segment reloading. In fact, the last step of REFRESH as explained above is to load the segment into memory to serve queries. There is a dedicated rest API for reloading. By default, it doesn't download segment. But option is provided to force server to download segment to replace the local one cleanly.

In addition, RESET brings the segment OFFLINE temporarily; while REFRESH and RELOAD swap the segment on server atomically without bringing down the segment or affecting ongoing queries.

How can I make brokers/servers join the cluster without the DefaultTenant tag?

Set this property in your controller.conf file

cluster.tenant.isolation.enable=false

Now your brokers and servers should join the cluster as broker_untagged and server_untagged . You can then directly use the POST /tenants API to create the desired tenants

curl -X POST "http://localhost:9000/tenants" 
-H "accept: application/json" 
-H "Content-Type: application/json" 
-d "{\"tenantRole\":\"BROKER\",\"tenantName\":\"foo\",\"numberOfInstances\":1}"

Tuning and Optimizations

Do replica groups work for real-time?

Yes, replica groups work for realtime. There's 2 parts to enabling replica groups:

  1. Replica groups segment assignment

  2. Replica group query routing

Replica group segment assignment

Replica group segment assignment is achieved in realtime, if number of servers is a multiple of number of replicas. The partitions get uniformly sprayed across the servers, creating replica groups. For example, consider we have 6 partitions, 2 replicas, and 4 servers.

r1

r2

p1

S0

S1

p2

S2

S3

p3

S0

S1

p4

S2

S3

p5

S0

S1

p6

S2

S3

As you can see, the set (S0, S2) contains r1 of every partition, and (s1, S3) contains r2 of every partition. The query will only be routed to one of the sets, and not span every server. If you are are adding/removing servers from an existing table setup, you have to run rebalance for segment assignment changes to take effect.

Replica group query routing

Once replica group segment assignment is in effect, the query routing can take advantage of it. For replica group based query routing, set the following in the table config's routing section, and then restart brokers

{
    "tableName": "pinotTable", 
    "tableType": "REALTIME",
    "routing": {
        "instanceSelectorType": "replicaGroup"
    }
    ..
}

0.4.0

0.4.0 release introduced the theta-sketch based distinct count function, an S3 filesystem plugin, a unified star-tree index implementation, migration from TimeFieldSpec to DateTimeFieldSpec, etc.

Summary

0.4.0 release introduced various new features, including the theta-sketch based distinct count aggregation function, an S3 filesystem plugin, a unified star-tree index implementation, deprecation of TimeFieldSpec in favor of DateTimeFieldSpec, etc. Miscellaneous refactoring, performance improvement and bug fixes were also included in this release. See details below.

Notable New Features

  • Made DateTimeFieldSpecs mainstream and deprecated TimeFieldSpec (#2756)

    • Used time column from table config instead of schema (#5320)

    • Included dateTimeFieldSpec in schema columns of Pinot Query Console #5392

    • Used DATE_TIME as the primary time column for Pinot tables (#5399)

  • Supported range queries using indexes (#5240)

  • Supported complex aggregation functions

    • Supported Aggregation functions with multiple arguments (#5261)

    • Added api in AggregationFunction to get compiled input expressions (#5339)

  • Added a simple PinotFS benchmark driver (#5160)

  • Supported default star-tree (#5147)

  • Added an initial implementation for theta-sketch based distinct count aggregation function (#5316)

    • One minor side effect: DataSchemaPruner won't work for DistinctCountThetaSketchAggregatinoFunction (#5382)

  • Added access control for Pinot server segment download api (#5260)

  • Added Pinot S3 Filesystem Plugin (#5249)

  • Text search improvement

    • Pruned stop words for text index (#5297)

    • Used 8byte offsets in chunk based raw index creator (#5285)

    • Derived num docs per chunk from max column value length for varbyte raw index creator (#5256)

    • Added inter segment tests for text search and fixed a bug for Lucene query parser creation (#5226)

    • Made text index query cache a configurable option (#5176)

    • Added Lucene DocId to PinotDocId cache to improve performance (#5177)

    • Removed the construction of second bitmap in text index reader to improve performance (#5199)

  • Tooling/usability improvement

    • Added template support for Pinot Ingestion Job Spec (#5341)

    • Allowed user to specify zk data dir and don't do clean up during zk shutdown (#5295)

    • Allowed configuring minion task timeout in the PinotTaskGenerator (#5317)

    • Update JVM settings for scripts (#5127)

    • Added Stream github events demo (#5189)

    • Moved docs link from gitbook to docs.pinot.apache.org (#5193)

  • Re-implemented ORCRecordReader (#5267)

  • Evaluated schema transform expressions during ingestion (#5238)

  • Handled count distinct query in selection list (#5223)

  • Enabled async processing in pinot broker query api (#5229)

  • Supported bootstrap mode for table rebalance (#5224)

  • Supported order-by on BYTES column (#5213)

  • Added Nightly publish to binary (#5190)

  • Shuffled the segments when rebalancing the table to avoid creating hotspot servers (#5197)

  • Supported inbuilt transform functions (#5312)

    • Added date time transform functions (#5326)

  • Deepstore by-pass in LLC: introduced segment uploader (#5277, #5314)

  • APIs Additions/Changes

    • Added a new server api for download of segments

      • /GET /segments/{tableNameWithType}/{segmentName}

  • Upgraded helix to 0.9.7 (#5411)

  • Added support to execute functions during query compilation (#5406)

  • Other notable refactoring

    • Moved table config into pinot-spi (#5194)

    • Cleaned up integration tests. Standardized the creation of schema, table config and segments (#5385)

    • Added jsonExtractScalar function to extract field from json object (#4597)

    • Added template support for Pinot Ingestion Job Spec #5372

    • Cleaned up AggregationFunctionContext (#5364)

    • Optimized real-time range predicate when cardinality is high (#5331)

    • Made PinotOutputFormat use table config and schema to create segments (#5350)

    • Tracked unavailable segments in InstanceSelector (#5337)

    • Added a new best effort segment uploader with bounded upload time (#5314)

    • In SegmentPurger, used table config to generate the segment (#5325)

    • Decoupled schema from RecordReader and StreamMessageDecoder (#5309)

    • Implemented ARRAYLENGTH UDF for multi-valued columns (#5301)

    • Improved GroupBy query performance (#5291)

    • Optimized ExpressionFilterOperator (#5132)

Major Bug Fixes

  • Do not release the PinotDataBuffer when closing the index (#5400)

  • Handled a no-arg function in query parsing and expression tree (#5375)

  • Fixed compatibility issues during rolling upgrade due to unknown json fields (#5376)

  • Fixed missing error message from pinot-admin command (#5305)

  • Fixed HDFS copy logic (#5218)

  • Fixed spark ingestion issue (#5216)

  • Fixed the capacity of the DistinctTable (#5204)

  • Fixed various links in the Pinot website

Work in Progress

  • Upsert: support overriding data in the real-time table (#4261).

    • Add pinot upsert features to pinot common (#5175)

  • Enhancements for theta-sketch, e.g. multiValue aggregation support, complex predicates, performance tuning, etc

Backward Incompatible Changes

  • TableConfig no longer support de-serialization from json string of nested json string (i.e. no \" inside the json) (#5194)

  • The following APIs are changed in AggregationFunction (use TransformExpressionTree instead of String as the key of blockValSetMap) (#5371):

    void aggregate(int length, AggregationResultHolder aggregationResultHolder, Map<TransformExpressionTree, BlockValSet> blockValSetMap);
    void aggregateGroupBySV(int length, int[] groupKeyArray, GroupByResultHolder groupByResultHolder, Map<TransformExpressionTree, BlockValSet> blockValSetMap);
    void aggregateGroupByMV(int length, int[][] groupKeysArray, GroupByResultHolder groupByResultHolder, Map<TransformExpressionTree, BlockValSet> blockValSetMap);
Neha Pawar from the Apache Pinot team shows you how to setup a Pinot cluster
docker network create -d bridge pinot-demo
docker run \
    --network=pinot-demo \
    --name pinot-quickstart \
    -p 9000:9000 \
    -d apachepinot/pinot:latest QuickStart \
    -type batch
docker logs pinot-quickstart -f
docker logs pinot-quickstart -f
# stop previous container, if any, or use different network
docker run \
    --network=pinot-demo \
    --name pinot-quickstart \
    -p 9000:9000 \
    -d apachepinot/pinot:latest QuickStart \
    -type stream
# stop previous container, if any, or use different network
docker run \
    --network=pinot-demo \
    --name pinot-quickstart \
    -p 9000:9000 \
    -d apachepinot/pinot:latest QuickStart \
    -type hybrid
Manual cluster setup
Docker
Kubernetes quick start
minikube
Docker Kubernetes
Exploring Pinot
Exploring Pinot
Exploring Pinot
Cluster Setup Completion Example
{
  "schemaName": "baseballStats",
  "dimensionFieldSpecs": [
    {
      "name": "playerID",
      "dataType": "STRING"
    },
    {
      "name": "yearID",
      "dataType": "INT"
    },
    {
      "name": "teamID",
      "dataType": "STRING"
    },
    {
      "name": "league",
      "dataType": "STRING"
    },
    {
      "name": "playerName",
      "dataType": "STRING"
    }
  ],
  "metricFieldSpecs": [
    {
      "name": "playerStint",
      "dataType": "INT"
    },
    {
      "name": "numberOfGames",
      "dataType": "INT"
    },
    {
      "name": "numberOfGamesAsBatter",
      "dataType": "INT"
    },
    {
      "name": "AtBatting",
      "dataType": "INT"
    },
    {
      "name": "runs",
      "dataType": "INT"
    },
    {
      "name": "hits",
      "dataType": "INT"
    },
    {
      "name": "doules",
      "dataType": "INT"
    },
    {
      "name": "tripples",
      "dataType": "INT"
    },
    {
      "name": "homeRuns",
      "dataType": "INT"
    },
    {
      "name": "runsBattedIn",
      "dataType": "INT"
    },
    {
      "name": "stolenBases",
      "dataType": "INT"
    },
    {
      "name": "caughtStealing",
      "dataType": "INT"
    },
    {
      "name": "baseOnBalls",
      "dataType": "INT"
    },
    {
      "name": "strikeouts",
      "dataType": "INT"
    },
    {
      "name": "intentionalWalks",
      "dataType": "INT"
    },
    {
      "name": "hitsByPitch",
      "dataType": "INT"
    },
    {
      "name": "sacrificeHits",
      "dataType": "INT"
    },
    {
      "name": "sacrificeFlies",
      "dataType": "INT"
    },
    {
      "name": "groundedIntoDoublePlays",
      "dataType": "INT"
    },
    {
      "name": "G_old",
      "dataType": "INT"
    }
  ]
}
Pinot Admin UI
Broker Query API
Table -> List all tables in cluster
Tables -> Get/Enable/Disable/Drop a table
Schema -> List all schemas in the cluster
Schema -> Get a schema
List all segments
Batch upload sample data
List all tables in cluster
List all schemas in the cluster

Complex Type (Array, Map) Handling

Complex-type handling in Apache Pinot.

It's common for the ingested data to have complex structure. For example, Avro schema has records and arrays, and JSON data has objects and arrays. In Apache Pinot, the data model supports primitive data types (including int, long, float, double, string, bytes), as well as limited multi-value types such as an array of primitive types. Such simple data types allow Pinot to build fast indexing structures for good query performance, but it requires some handling on the complex structures. There are in general two options for such handling: convert the complex-type data into JSON string and then build JSON index; or use the inbuilt complex-type handling rules in the ingestion config.

In this page, we'll show how to handle this complex-type structure with these two approaches, to process the example data in the following figure, which is a field group from the Meetup events Quickstart example. Note this object has two child fields, and the child group is a nested array with the element of object type.

Example JSON data

Handle the complex type with JSON indexing

Apache Pinot provides powerful JSON index to accelerate the value lookup and filtering for the column. To convert an object group with complex type to JSON, you can add the following config to table config.

json_meetupRsvp_realtime_table_config.json
{
  "transformConfigs": [
      {
        "columnName": "group_json",
        "transformFunction": "jsonFormat(\"group\")"
      }
    ],
    ...
    "tableIndexConfig": {
    "loadMode": "MMAP",
    "noDictionaryColumns": [
      "group_json"
    ],
    "jsonIndexColumns": [
      "group_json"
    ]
  },

}

Note the config transformConfigs transforms the object group to a JSON string group_json, which then creates the JSON indexing with config jsonIndexColumns. To read the full spec, please check out this file. Also note that group is a reserved keyword in SQL, and that's why it's quoted in the transformFunction.

Additionall, you need to overwrite the maxLength of the field group_json on the schema, because by default, a string column has a limited length. For example,

json_meetupRsvp_realtime_table_schema.json
{
  {
      "name": "group_json",
      "dataType": "STRING",
      "maxLength": 2147483647
    }
    ...
}

For the full spec, please check out this file.

With this, you can start to query the nested fields under group. For the deatils about the supported JSON function, please check out this guide).

Handle the complex type with ingestion configurations

Though JSON indexing is a handy way to process the complex types, there are some limitations:

  • It’s not performant to group by or order by a JSON field, because JSON_EXTRACT_SCALAR is needed to extract the values in the GROUP BY and ORDER BY clauses, which invokes the function evaluation.

  • For cases that you want to use Pinot's multi-column functions such as DISTINCTCOUNTMV

Alternatively, from Pinot 0.8, you can use the complex-type handling in ingestion configurations to flatten and unnest the complex structure and convert them into primitive types. Then you can reduce the complex-type data into a flattened Pinot table, and query it via SQL. With the inbuilt processing rules, you do not need to write ETL jobs in another compute framework such as Flink or Spark.

To process this complex-type, you can add the configuration complexTypeConfig to the ingestionConfig. For example:

complexTypeHandling_meetupRsvp_realtime_table_config.json
{
  "ingestionConfig": {    
    "complexTypeConfig": {
      "delimiter": '.',
      "fieldsToUnnest": ["group.group_topics"],
      "collectionNotUnnestedToJson": "NON_PRIMITIVE"
    }
  }
}

With the complexTypeConfig , all the map objects will be flattened to direct fields automatically. And with unnestFields , a record with the nested collection will unnest into multiple records. For instance, the example in the beginning will transform into two rows with this configuration example.

Flattened/unnested data

Note that

  • The nested field group_id under group is flattened to field group.group_id. The default value of the delimiter is ., you can choose other delimiter by changing the configuration delimiter under complexTypeConfig. This flattening rule also apllies on the maps in the collections to be unnested.

  • The nested array group_topics under group is unnested into the top-level, and convert the output to a collection of two rows. Note the handling of the nested field within group_topics, and the eventual top-level field of group.group_topics.urlkey. All the collections to unnest shall be included in configuration fieldsToUnnest.

  • For the collections not in specified in fieldsToUnnest, the ingestion by default will serialize them into JSON string, except for the array of primitive values, which will be ingested as multi-value column by default. The behavior is defined in config collectionNotUnnestedToJson with default value to NON_PRIMITIVE. Other behaviors include (1) ALL, which aslo convert the array of primitive values to JSON string; (2) NONE, this does not do conversion, but leave it to the users to use transform functions for handling.

You can find the full spec of the table config here and the table schema here.

With the flattening/unnesting, you can then query the table with primitive values using the SQL query like:

SELECT "group.group_topics.urlkey", "group.group_topics.topic_name", "group.group_id" 
FROM meetupRsvp
LIMIT 10

Note . is a reserved character in SQL, so you need to quote the flattened column.

Infer the Pinot schema from the Avro schema and JSON data

When there are complex structures, it could be challenging and tedious to figure out the Pinot schema manually. To help the schema inference, Pinot provides utility tools to take the Avro schema or JSON data as input and output the inferred Pinot schema.

To infer the Pinot schema from Avro schema, you can use the command like the following

bin/pinot-admin.sh AvroSchemaToPinotSchema -timeColumnName fields.hoursSinceEpoch -avroSchemaFile /tmp/test.avsc -pinotSchemaName myTable -outputDir /tmp/test -fieldsToUnnest entries

Note you can input configurations like fieldsToUnnest similar to the ones in complexTypeConfig. And this will simulate the complex-type handling rules on the Avro schema and output the Pinot schema in the file specified in outputDir.

Similarly, you can use the command like the following to infer the Pinot schema from a file of JSON objects.

bin/pinot-admin.sh JsonToPinotSchema -timeColumnName hoursSinceEpoch -jsonFile /tmp/test.json -pinotSchemaName myTable -outputDir /tmp/test -fieldsToUnnest payload.commits

You can check out an example of this run in this PR.

Creating Pinot Segments

Creating Pinot segments

Pinot segments can be created offline on Hadoop, or via command line from data files. Controller REST endpoint can then be used to add the segment to the table to which the segment belongs. Pinot segments can also be created by ingesting data from realtime resources (such as Kafka).

Creating segments using hadoop

Offline Pinot workflow

To create Pinot segments on Hadoop, a workflow can be created to complete the following steps:

  1. Pre-aggregate, clean up and prepare the data, writing it as Avro format files in a single HDFS directory

  2. Create segments

  3. Upload segments to the Pinot cluster

Step one can be done using your favorite tool (such as Pig, Hive or Spark), Pinot provides two MapReduce jobs to do step two and three.

Configuring the job

Create a job properties configuration file, such as one below:

# === Index segment creation job config ===

# path.to.input: Input directory containing Avro files
path.to.input=/user/pinot/input/data

# path.to.output: Output directory containing Pinot segments
path.to.output=/user/pinot/output

# path.to.schema: Schema file for the table, stored locally
path.to.schema=flights-schema.json

# segment.table.name: Name of the table for which to generate segments
segment.table.name=flights

# === Segment tar push job config ===

# push.to.hosts: Comma separated list of controllers host names to which to push
push.to.hosts=controller_host_0,controller_host_1

# push.to.port: The port on which the controller runs
push.to.port=8888

Executing the job

The Pinot Hadoop module contains a job that you can incorporate into your workflow to generate Pinot segments.

mvn clean install -DskipTests -Pbuild-shaded-jar
hadoop jar pinot-hadoop-<version>-SNAPSHOT-shaded.jar SegmentCreation job.properties

You can then use the SegmentTarPush job to push segments via the controller REST API.

hadoop jar pinot-hadoop-<version>-SNAPSHOT-shaded.jar SegmentTarPush job.properties

Creating Pinot segments outside of Hadoop

Here is how you can create Pinot segments from standard formats like CSV/JSON/AVRO.

  1. Follow the steps described in the section on Compiling the code to build pinot. Locate pinot-admin.sh in pinot-tools/target/pinot-tools=pkg/bin/pinot-admin.sh.

  2. Create a top level directory containing all the CSV/JSON/AVRO files that need to be converted into segments.

  3. The file name extensions are expected to be the same as the format name (i.e .csv, .json or .avro), and are case insensitive. Note that the converter expects the .csv extension even if the data is delimited using tabs or spaces instead.

  4. Prepare a schema file describing the schema of the input data. The schema needs to be in JSON format. See example later in this section.

  5. Specifically for CSV format, an optional csv config file can be provided (also in JSON format). This is used to configure parameters like the delimiter/header for the CSV file etc. A detailed description of this follows below.

Run the pinot-admin command to generate the segments. The command can be invoked as follows. Options within “[ ]” are optional. For -format, the default value is AVRO.

bin/pinot-admin.sh CreateSegment -dataDir <input_data_dir> [-format [CSV/JSON/AVRO]] [-readerConfigFile <csv_config_file>] [-generatorConfigFile <generator_config_file>] -segmentName <segment_name> -schemaFile <input_schema_file> -tableName <table_name> -outDir <output_data_dir> [-overwrite]

To configure various parameters for CSV a config file in JSON format can be provided. This file is optional, as are each of its parameters. When not provided, default values used for these parameters are described below:

  1. fileFormat: Specify one of the following. Default is EXCEL.

    1. EXCEL

    2. MYSQL

    3. RFC4180

    4. TDF

  2. header: If the input CSV file does not contain a header, it can be specified using this field. Note, if this is specified, then the input file is expected to not contain the header row, or else it will result in parse error. The columns in the header must be delimited by the same delimiter character as the rest of the CSV file.

  3. delimiter: Use this to specify a delimiter character. The default value is “,”.

  4. multiValueDelimiter: Use this to specify a delimiter character for each value in multi-valued columns. The default value is “;”.

Below is a sample config file.

{
  "fileFormat": "EXCEL",
  "header": "col1,col2,col3,col4",
  "delimiter": "\t",
  "multiValueDelimiter": ","
}

Sample Schema:

{
  "schemaName": "flights",
  "dimensionFieldSpecs": [
    {
      "name": "flightNumber",
      "dataType": "LONG"
    },
    {
      "name": "tags",
      "dataType": "STRING",
      "singleValueField": false
    }
  ],
  "metricFieldSpecs": [
    {
      "name": "price",
      "dataType": "DOUBLE"
    }
  ],
  "timeFieldSpec": {
    "incomingGranularitySpec": {
      "name": "daysSinceEpoch",
      "dataType": "INT",
      "timeType": "DAYS"
    }
  }
}

Pushing offline segments to Pinot

You can use curl to push a segment to pinot:

curl -X POST -F segment=@<segment-tar-file-path> http://controllerHost:controllerPort/segments

Alternatively you can use the pinot-admin.sh utility to upload one or more segments:

pinot-tools/target/pinot-tools-pkg/bin//pinot-admin.sh UploadSegment -controllerHost <hostname> -controllerPort <port> -segmentDir <segmentDirectoryPath>

The command uploads all the segments found in segmentDirectoryPath. The segments could be either tar-compressed (in which case it is a file under segmentDirectoryPath) or uncompressed (in which case it is a directory under segmentDirectoryPath).

Pinot Data Explorer

Explore the data on our Pinot cluster

Once you have set up the Cluster, you can start exploring the data and the APIs. Pinot 0.5.0 comes bundled with a completely new UI.

Navigate to in your browser to open the controller UI.

Let's take a look at the following two features on the UI

Query Console

Let us run some queries on the data in the Pinot cluster. Head over to to see the querying interface.

We can see our baseballStats table listed on the left (you will see meetupRSVP or airlineStats if you used the streaming or the hybrid quick start). Click on the table name to display all the names along with the data types of the columns of the table. You can also execute a sample query select * from baseballStats limit 10 by typing it in the text box and clicking the Run Query button.

Cmd + Enter can also be used to run the query when focused on the console.

You can also try out the following queries:

Pinot supports a subset of standard SQL. For more information, see .

Rest API

The contains all the APIs that you will need to operate and manage your cluster. It provides a set of APIs for Pinot cluster management including health check, instances management, schema and table management, data segments management.

Let's check out the tables in this cluster by going to and click on Try it out!. We can see thebaseballStats table listed here. We can also see the exactcurl call made to the controller API.

You can look at the configuration of this table by going to , type in baseballStats in the table name, and click Try it out!

Let's check out the schemas in the cluster by going to and click Try it out!. We can see a schema called baseballStats in this list.

Take a look at the schema by going to , type baseballStats in the schema name, and click Try it out!.

Finally, let's checkout the data segments in the cluster by going to , type in baseballStats in the table name, and click Try it out!. There's 1 segment for this table, called baseballStats_OFFLINE_0.

To learn how to uploaded your own data and schema, head over to or .

Stream Ingestion with Upsert

Upsert support in Apache Pinot.

Pinot provides native support of upsert during the real-time ingestion (v0.6.0+). There are scenarios that the records need modifications, such as correcting a ride fare and updating a delivery status.

With the foundation of full upsert support in Pinot, another category of use cases on partial upsert are enabled (v0.8.0+). Partial upsert is convenient to users so that they only need to specify the columns whose value changes, and ignore the others.

To enable upsert on a Pinot table, there are a couple of configurations to make on the table configurations as well as on the input stream.

Define the primary key in the schema

To update a record, a primary key is needed to uniquely identify the record. To define a primary key, add the field primaryKeyColumns to the schema definition. For example, the schema definition of UpsertMeetupRSVP in the quick start example has this definition.

Note this field expects a list of columns, as the primary key can be composite.

When two records of the same primary key are ingested, the record with the greater event time (as defined by the time column) is used. When records with the same primary key and event time, then the order is not determined. In most cases, the later ingested record will be used, but may not be so in the cases when the table has a column to sort by.

Partition the input stream by the primary key

An important requirement for the Pinot upsert table is to partition the input stream by the primary key. For Kafka messages, this means the producer shall set the key in the API. If the original stream is not partitioned, then a streaming processing job (e.g. Flink) is needed to shuffle and repartition the input stream into a partitioned one for Pinot's ingestion.

Enable upsert in the table configurations

There are a few configurations needed in the table configurations to enable upsert.

Upsert mode

For append-only tables, the upsert mode defaults to NONE. To enable the full upsert, set the mode to FULL for the full update. For example:

Pinot also added the partial update support in v0.8.0+. To enable the partial upsert, set the mode to PARTIAL and specify partialUpsertStrategies for partial upsert columns. For example:

Pinot supports the following partial upsert strategies -

Comparison Column

By default, Pinot uses the value in the time column to determine the latest record. That means, for two records with the same primary key, the record with the larger value of the time column is picked as the latest update. However, there are cases when users need to use another column to determine the order. In such case, you can use option comparisonColumn to override the column used for comparison. For example,

Use strictReplicaGroup for routing

The upsert Pinot table can use only the low-level consumer for the input streams. As a result, it uses the for the segments. Moreover,upsert poses the additional requirement that all segments of the same partition must be served from the same server to ensure the data consistency across the segments. Accordingly, it requires to use strictReplicaGroup as the routing strategy. To use that, configure instanceSelectorType in Routing as the following:

Limitations

There are some limitations for the upsert Pinot tables.

First, the high-level consumer is not allowed for the input stream ingestion, which means stream.kafka.consumer.type must be lowLevel.

Second, the star-tree index cannot be used for indexing, as the star-tree index performs pre-aggregation during the ingestion.

Example

Putting these together, you can find the table configurations of the quick start example as the following:

Quick Start

To illustrate how the full upsert works, the Pinot binary comes with a quick start example. Use the following command to creates a realtime upsert table meetupRSVP.

You can also run partial upsert demo with the following command

As soon as data flows into the stream, the Pinot table will consume it and it will be ready for querying. Head over to the Query Console to checkout the realtime data.

For partial upsert you can see only the value from configured column changed based on specified partial upsert strategy.

An example for partial upsert is shown below, each of the event_id kept being unique during ingestion, meanwhile the value of rsvp_count incremented.

To see the difference from the append-only table, you can use a query option skipUpsert to skip the upsert effect in the query result.

Input formats

This section contains a collection of guides that will show you how to import data from a Pinot supported input format.

Pinot offers support for various popular input formats during ingestion. By changing the input format, you can reduce the time that goes in serialization-deserialization and speed up the ingestion.

The input format can be changed using the recordReaderSpec config in the ingestion job spec.

The config consists of the following keys -

dataFormat - Name of the data format to consume.

className - name of the class that implements the RecordReader interface. This class is used for parsing the data.

configClassName - name of the class that implements the RecordReaderConfig interface. This class is used the parse the values mentioned in configs

configs - Key value pair for format specific configs. This field can be left out.

Pinot supports the multiple input formats out of the box. You just need to specify the corresponding readers and the associated custom configs to switch between the formats.

CSV

CSV Record Reader supports the following configs -

fileFormat - can be one of default, rfc4180, excel, tdf, mysql

header - header of the file. The columnNames should be seperated by the delimiter mentioned in the config

delimiter - The character seperating the columns

multiValueDelimiter - The character seperating multiple values in a single column. This can be used to split a column into a list.

Your CSV file may have raw text fields that cannot be reliably delimited using any character. In this case, explicitly set the multiValueDelimeter field to empty in the ingestion config. multiValueDelimiter: ''

AVRO

The Avro record reader converts the data in file to a GenericRecord. A java class or .avro file is not required.

JSON

Thrift

Note: Thrift requires the generated class using .thrift file to parse the data. The .class file should be available in the Pinot's classpath. You can put the files in the lib/ folder of pinot distribution directory.

Parquet

The above class doesn't read the Parquet INT96 and Decimaltype.

Please use the below class to handle INT96 and Decimaltype.

ORC

ORC record reader supports the following data types -

In LIST and MAP types, the object should only belong to one of the data types supported by Pinot.

Protocol Buffers

The reader requires a descriptor file to deserialize the data present in the files. You can generate the descriptor file (.desc) from the .proto file using the command -

Geospatial

This page talks about geospatial support in Pinot.

Pinot supports SQL/MM geospatial data and is compliant with the . This includes:

  • Geospatial data types, such as point, line and polygon;

  • Geospatial functions, for querying of spatial properties and relationships.

  • Geospatial indexing, used for efficient processing of spatial operations

Geospatial data types

Geospatial data types abstract and encapsulate spatial structures such as boundary and dimension. In many respects, spatial data types can be understood simply as shapes. Pinot supports the Well-Known Text (WKT) and Well-Known Binary (WKB) form of geospatial objects, for example:

  • POINT (0, 0)

  • LINESTRING (0 0, 1 1, 2 1, 2 2)

  • POLYGON (0 0, 10 0, 10 10, 0 10, 0 0),(1 1, 1 2, 2 2, 2 1, 1 1)

  • MULTIPOINT (0 0, 1 2)

  • MULTILINESTRING ((0 0, 1 1, 1 2), (2 3, 3 2, 5 4))

  • MULTIPOLYGON (((0 0, 4 0, 4 4, 0 4, 0 0), (1 1, 2 1, 2 2, 1 2, 1 1)), ((-1 -1, -1 -2, -2 -2, -2 -1, -1 -1)))

  • GEOMETRYCOLLECTION(POINT(2 0),POLYGON((0 0, 1 0, 1 1, 0 1, 0 0)))

Geometry vs Geography

It is common to have data in which the coordinate are geographics or latitude/longitude. Unlike coordinates in Mercator or UTM, geographic coordinates are not Cartesian coordinates.

  • Geographic coordinates do not represent a linear distance from an origin as plotted on a plane. Rather, these spherical coordinates describe angular coordinates on a globe.

  • Spherical coordinates specify a point by the angle of rotation from a reference meridian (longitude), and the angle from the equator (latitude).

You can treat geographic coordinates as approximate Cartesian coordinates and continue to do spatial calculations. However, measurements of distance, length and area will be nonsensical. Since spherical coordinates measure angular distance, the units are in degrees.

Pinot supports both geometry and geography types, which can be constructed by the corresponding functions as shown in . And for the geography types, the measurement functions such as ST_Distance and ST_Area calculate the spherical distance and area on earth respectively.

Geospatial functions

For manipulating geospatial data, Pinot provides a set of functions for analyzing geometric components, determining spatial relationships, and manipulating geometries. In particular, geospatial functions that begin with the ST_ prefix support the SQL/MM specification.

Following geospatial functions are available out of the box in Pinot-

Aggregations

ST_Union(geometry[] g1_array) → Geometry This aggregate function returns a MULTI geometry or NON-MULTI geometry from a set of geometries. it ignores NULL geometries.

Constructors

  • ST_GeomFromText(String wkt) → Geometry Returns a geometry type object from WKT representation, with the optional spatial system reference.

  • ST_GeomFromWKB(bytes wkb) → Geometry Returns a geometry type object from WKB representation.

  • ST_Point(double x, double y) → Point Returns a geometry type point object with the given coordinate values.

  • ST_Polygon(String wkt) → Polygon Returns a geometry type polygon object from .

  • ST_GeogFromWKB(bytes wkb) → Geography Creates a geography instance from a

  • ST_GeogFromText(String wkt) → Geography Return a specified geography value from .

Measurements

  • ST_Area(Geometry/Geography g) → double For geometry type, it returns the 2D Euclidean area of a geometry. For geography, returns the area of a polygon or multi-polygon in square meters using a spherical model for Earth.

  • ST_Distance(Geometry/Geography g1, Geometry/Geography g2) → double For geometry type, returns the 2-dimensional cartesian minimum distance (based on spatial ref) between two geometries in projected units. For geography, returns the great-circle distance in meters between two SphericalGeography points. Note that g1, g2 shall have the same type.

  • ST_GeometryType(Geometry g) → String Returns the type of the geometry as a string. e.g.: ST_Linestring, ST_Polygon,ST_MultiPolygon etc.

Outputs

  • ST_AsBinary(Geometry/Geography g) → bytes Returns the WKB representation of the geometry.

  • ST_AsText(Geometry/Geography g) → string Returns the WKT representation of the geometry/geography.

Relationship

  • ST_Contains(Geometry, Geometry) → boolean Returns true if and only if no points of the second geometry/geography lie in the exterior of the first geometry/geography, and at least one point of the interior of the first geometry lies in the interior of the second geometry.

  • ST_Equals(Geometry, Geometry) → boolean Returns true if the given geometries represent the same geometry/geography.

  • ST_Within(Geometry, Geometry) → boolean Returns true if first geometry is completely inside second geometry.

Geospatial index

Geospatial functions are typically expensive to evaluate, and using geoindex can greatly accelebrate the query evaluation. Geoindexing in Pinot is based on Uber’s , a hexagons-based hierarchical gridding. A given geospatial location (longitude, latitude) can map to one hexagon (represented as H3Index). And its neighbors in H3 can be approximated by a ring of hexagons. To quickly identify the distance between any given two geospatial locations, we can convert the two locations in the H3Index, and then check the H3 distance between them. H3 distance is measured as the number of hexagons. For example, in the diagram below, the red hexagons are within the 1 distance of the central hexagon. Moreover, the size of the hexagon is determined by the resolution of the indexing. Please check this table for the level of and the corresponding precision (measured in km).

How to use Geoindex

To use the geoindex, first declare the geolocation field as bytes in the schema, as in the example of the .

Note the use of transformFunction that converts the created point into SphericalGeography format, which is needed in the ST_Distance function.

Next, declare the geospatial index in the table config:

So the query below will use the geoindex to filter the Starbucks stores within 5km of the given point in the bay area.

How Geoindex works

Geoindex in Pinot accelerates the query evaluation without compromising the correctness of the query result. Currently, geoindex supports the ST_Distance function used in the range predicates in the WHERE clause, as shown in the query example in the previous section.

At the high level, geoindex is used for retrieving the records within the nearby hexagons of the given location, and then use ST_Distance to accurately filter the matched results.

As in the example diagram above, if we want to find all relevant points within a given distance at San Francisco (represented in the area within the red circle), then the algorithm with geoindex works as the following:

  • Find the H3 distance x that contains the range (i.e. red circle)

  • For the points within the H3 distance (i.e. covered by the hexagons within ), we can directly take those points without filtering

  • For the points falling into the H3 distance (i.e. in the hexagons of kRing(x)), we do filtering on them by evaluating the condition ST_Distance(loc1, loc2) < x

Stream ingestion example

The Docker instructions on this page are still WIP

So far, we setup our cluster, ran some queries on the demo tables and explored the admin endpoints. We also uploaded some sample batch data for transcript table.

Now, it's time to ingest from a sample stream into Pinot. The rest of the instructions assume you're using (inside a pinot-quickstart container).

Data Stream

First, we need to setup a stream. Pinot has out-of-the-box realtime ingestion support for Kafka. Other streams can be plugged in, more details in .

Let's setup a demo Kafka cluster locally, and create a sample topic transcript-topic

Start Kafka

Create a Kafka Topic

Start Kafka

Start Kafka cluster on port 9876 using the same Zookeeper from the quick-start examples

Create a Kafka topic

Download the latest . Create a topic

Creating a Schema

If you followed the , you have already pushed a schema for your sample table. If not, head over to on that page, to learn how to create a schema for your sample data.

Creating a table config

If you followed , you learnt how to push an offline table and schema. Similar to the offline table config, we will create a realtime table config for the sample. Here's the realtime table config for the transcript table. For a more detailed overview about table, checkout .

Uploading your schema and table config

Now that we have our table and schema, let's upload them to the cluster. As soon as the realtime table is created, it will begin ingesting from the Kafka topic.

Loading sample data into stream

Here's a JSON file for transcript table data:

Push sample JSON into Kafka topic, using the Kafka script from the Kafka download

Ingesting streaming data

As soon as data flows into the stream, the Pinot table will consume it and it will be ready for querying. Head over to the to checkout the realtime data

recordReaderSpec:
  dataFormat: 'csv'
  className: 'org.apache.pinot.plugin.inputformat.csv.CSVRecordReader'
  configClassName: 'org.apache.pinot.plugin.inputformat.csv.CSVRecordReaderConfig'
  configs: 
			key1 : 'value1'
			key2 : 'value2'
dataFormat: 'csv'
className: 'org.apache.pinot.plugin.inputformat.csv.CSVRecordReader'
configClassName: 'org.apache.pinot.plugin.inputformat.csv.CSVRecordReaderConfig'
configs:
	fileFormat: 'default' #should be one of default, rfc4180, excel, tdf, mysql
	header: 'columnName seperated by delimiter'
  delimiter: ','
  multiValueDelimiter: '-'
dataFormat: 'avro'
className: 'org.apache.pinot.plugin.inputformat.avro.AvroRecordReader'
dataFormat: 'json'
className: 'org.apache.pinot.plugin.inputformat.json.JSONRecordReader'
dataFormat: 'thrift'
className: 'org.apache.pinot.plugin.inputformat.thrift.ThriftRecordReader'
configs:
	thriftClass: 'ParserClassName'
dataFormat: 'parquet'
className: 'org.apache.pinot.plugin.inputformat.parquet.ParquetRecordReader'
dataFormat: 'parquet'
className: 'org.apache.pinot.plugin.inputformat.parquet.ParquetNativeRecordReader'

Parquet Data Type

Java Data Type

Comment

INT96

INT64

ParquetINT96 type converts nanoseconds

to Pinot INT64 type of milliseconds

DECIMAL

DOUBLE

dataFormat: 'orc'
className: 'org.apache.pinot.plugin.inputformat.orc.ORCRecordReader'

ORC Data Type

Java Data Type

BOOLEAN

String

SHORT

Integer

INT

Integer

LONG

Integer

FLOAT

Float

DOUBLE

Double

STRING

String

VARCHAR

String

CHAR

String

LIST

Object[]

MAP

Map<Object, Object>

DATE

Long

TIMESTAMP

Long

BINARY

byte[]

BYTE

Integer

dataFormat: 'proto'
className: 'org.apache.pinot.plugin.inputformat.protobuf.ProtoBufRecordReader'
configs:
	descriptorFile: 'file:///path/to/sample.desc'
protoc --include_imports --descriptor_set_out=/absolute/path/to/output.desc /absolute/path/to/input.proto

0.5.0

This release includes many new features on Pinot ingestion and connectors, query capability and a revamped controller UI.

Summary

This release includes many new features on Pinot ingestion and connectors (e.g., support for filtering during ingestion which is configurable in table config; support for json during ingestion; proto buf input format support and a new Pinot JDBC client), query capability (e.g., a new GROOVY transform function UDF) and admin functions (a revamped Cluster Manager UI & Query Console UI). It also contains many key bug fixes. See details below.

The release was cut from the following commit: d1b4586 and the following cherry-picks:

  • 63a4fd4

  • a7f7f46

  • dafbef1

  • ced3a70

  • d902c1a

Notable New Features

  • Allowing update on an existing instance config: PUT /instances/{instanceName} with Instance object as the pay-load (#PR4952)

  • Add PinotServiceManager to start Pinot components (#PR5266)

  • Support for protocol buffers input format. (#PR5293)

  • Add GenericTransformFunction wrapper for simple ScalarFunctions (PR#5440) — Adding support to invoke any scalar function via GenericTransformFunction

  • Add Support for SQL CASE Statement (PR#5461)

  • Support distinctCountRawThetaSketch aggregation that returns serialized sketch. (PR#5465)

  • Add multi-value support to SegmentDumpTool (PR#5487) — add segment dump tool as part of the pinot-tool.sh script

  • Add json_format function to convert json object to string during ingestion. (PR#5492) — Can be used to store complex objects as a json string (which can later be queries using jsonExtractScalar)

  • Support escaping single quote for SQL literal (PR#5501) — This is especially useful for DistinctCountThetaSketch because it stores expression as literal E.g. DistinctCountThetaSketch(..., 'foo=''bar''', ...)

  • Support expression as the left-hand side for BETWEEN and IN clause (PR#5502)

  • Add a new field IngestionConfig in TableConfig — FilterConfig: ingestion level filtering of records, based on filter function. (PR#5597) — TransformConfig: ingestion level column transformations. This was previously introduced in Schema (FieldSpec#transformFunction), and has now been moved to TableConfig. It continues to remain under schema, but we recommend users to set it in the TableConfig starting this release (PR#5681).

  • Allow star-tree creation during segment load (#PR5641) — Introduced a new boolean config enableDynamicStarTreeCreation in IndexingConfig to enable/disable star-tree creation during segment load.

  • Support for Pinot clients using JDBC connection (#PR5602)

  • Support customized accuracy for distinctCountHLL, distinctCountHLLMV functions by adding log2m value as the second parameter in the function. (#PR5564) —Adding cluster config: default.hyperloglog.log2m to allow user set default log2m value.

  • Add segment encryption on Controller based on table config (PR#5617)

  • Add a constraint to the message queue for all instances in Helix, with a large default value of 100000. (PR#5631)

  • Support order-by aggregations not present in SELECT (PR#5637) — Example: "select subject from transcript group by subject order by count() desc" This is equivalent to the following query but the return response should not contain count(). "select subject, count() from transcript group by subject order by count() desc"

  • Add geo support for Pinot queries (PR#5654) — Added geo-spatial data model and geospatial functions

  • Cluster Manager UI & Query Console UI revamp (PR#5684 and PR#5732) — updated cluster manage UI and added table details page and segment details page

  • Add Controller API to explore Zookeeper (PR#5687)

  • Support BYTES type for dictinctCount and group-by (PR#5701 and PR#5708) —Add BYTES type support to DistinctCountAggregationFunction —Correctly handle BYTES type in DictionaryBasedAggregationOperator for DistinctCount

  • Support for ingestion job spec in JSON format (#PR5729)

  • Improvements to RealtimeProvisioningHelper command (#PR5737) — Improved docs related to ingestion and plugins

  • Added GROOVY transform function UDF (#PR5748) — Ability to run a groovy script in the query as a UDF. e.g. string concatenation: SELECT GROOVY('{"returnType": "INT", "isSingleValue": true}', 'arg0 + " " + arg1', columnA, columnB) FROM myTable

Special notes

  • Changed the stream and metadata interface (PR#5542) — This PR concludes the work for the issue #5359 to extend offset support for other streams

  • TransformConfig: ingestion level column transformations. This was previously introduced in Schema (FieldSpec#transformFunction), and has now been moved to TableConfig. It continues to remain under schema, but we recommend users to set it in the TableConfig starting this release (PR#5681).

  • Config key enable.case.insensitive.pql in Helix cluster config is deprecated, and replaced with enable.case.insensitive. (#PR5546)

  • Change default segment load mode to MMAP. (PR#5539) —The load mode for segments currently defaults to heap.

Major Bug fixes

  • Fix bug in distinctCountRawHLL on SQL path (#5494)

  • Fix backward incompatibility for existing stream implementations (#5549)

  • Fix backward incompatibility in StreamFactoryConsumerProvider (#5557)

  • Fix logic in isLiteralOnlyExpression. (#5611)

  • Fix double memory allocation during operator setup (#5619)

  • Allow segment download url in Zookeeper to be deep store uri instead of hardcoded controller uri (#5639)

  • Fix a backward compatible issue of converting BrokerRequest to QueryContext when querying from Presto segment splits (#5676)

  • Fix the issue that PinotSegmentToAvroConverter does not handle BYTES data type. (#5789)

Backward Incompatible Changes

  • PQL queries with HAVING clause will no longer be accepted for the following reasons: (#PR5570) — HAVING clause does not apply to PQL GROUP-BY semantic where each aggregation column is ordered individually — The current behavior can produce inaccurate results without any notice — HAVING support will be added for SQL queries in the next release

  • Because of the standardization of the DistinctCountThetaSketch predicate strings, please upgrade Broker before Server. The new Broker can handle both standard and non-standard predicate strings for backward-compatibility. (#PR5613)

Java

Pinot provides a native java client to execute queries on the cluster. The client makes it easier for user to query data. The client is also tenant-aware and thus is able to redirect the queries to the correct broker.

Installation

You can use the client by including the following dependency -

<dependency>
    <groupId>org.apache.pinot</groupId>
    <artifactId>pinot-java-client</artifactId>
    <version>0.5.0</version>
</dependency>
include 'org.apache.pinot:pinot-java-client:0.5.0'

You can also build the code for java client locally and use it.

Usage

Here's an example of how to use the pinot-java-client to query Pinot.

import org.apache.pinot.client.Connection;
import org.apache.pinot.client.ConnectionFactory;
import org.apache.pinot.client.Request;
import org.apache.pinot.client.ResultSetGroup;
import org.apache.pinot.client.ResultSet;

/**
 * Demonstrates the use of the pinot-client to query Pinot from Java
 */
public class PinotClientExample {

  public static void main(String[] args) {

    // pinot connection
    String zkUrl = "localhost:2181";
    String pinotClusterName = "PinotCluster";
    Connection pinotConnection = ConnectionFactory.fromZookeeper(zkUrl + "/" + pinotClusterName);

    String query = "SELECT COUNT(*) FROM myTable GROUP BY foo";

    // set queryType=sql for querying the sql endpoint
    Request pinotClientRequest = new Request("sql", query);
    ResultSetGroup pinotResultSetGroup = pinotConnection.execute(pinotClientRequest);
    ResultSet resultTableResultSet = pinotResultSetGroup.getResultSet(0);

    int numRows = resultTableResultSet.getRowCount();
    int numColumns = resultTableResultSet.getColumnCount();
    String columnValue = resultTableResultSet.getString(0, 1);
    String columnName = resultTableResultSet.getColumnName(1);

    System.out.println("ColumnName: " + columnName + ", ColumnValue: " + columnValue);
  }
}

Connection Factory

The client provides a ConnectionFactory class to create connections to a Pinot cluster. The factory supports the following methods to create a connection -

  • Zookeeper (Recommended) - Comma seperated list of zookeeper of the cluster. This is the recommended method which can redirect queries to appropriate brokers based on tenant/table.

  • Broker list - Comma seperated list of the brokers in the cluster. This should only be used in standalone setups or for POC, unless you have a load balancer setup for brokers.

  • Properties file - You can also put the broker list as brokerList in a properties file and provide the path to that file to the factory. This should only be used in standalone setups or for POC, unless you have a load balancer setup for brokers.

Here's an example demonstrating all methods of Connection factory -

Connection connection = ConnectionFactory.fromZookeeper
  ("some-zookeeper-server:2191/zookeeperPath");

Connection connection = ConnectionFactory.fromProperties("demo.properties");

Connection connection = ConnectionFactory.fromHostList
  ("broker-1:1234", "broker-2:1234", ...);

Query Methods

You can run the query in both blocking as well as async manner. Use

  • Connection.execute(org.apache.pinot.client.Request) for blocking queries

  • Connection.executeAsync(org.apache.pinot.client.Request) for asynchronous queries that return a future object.

ResultSetGroup resultSetGroup = 
  connection.execute(new Request("sql", "select * from foo..."));
// OR
Future<ResultSetGroup> futureResultSetGroup = 
  connection.executeAsync(new Request("sql", "select * from foo..."));

You can also use PreparedStatement to escape query parameters. We don't store the Prepared Statement in the database and hence it won't increase the subsequent query performance.

PreparedStatement statement = 
    connection.prepareStatement(new Request("sql", "select * from foo where a = ?"));
statement.setString(1, "bar");

ResultSetGroup resultSetGroup = statement.execute();
// OR
Future<ResultSetGroup> futureResultSetGroup = statement.executeAsync();

Result Set

Results can be obtained with the various get methods in the first ResultSet, obtained through the getResultSet(int) method:

Request request = new Request("sql", "select foo, bar from baz where quux = 'quuux'");
ResultSetGroup resultSetGroup = connection.execute(request);
ResultSet resultTableResultSet = pinotResultSetGroup.getResultSet(0);

for (int i = 0; i < resultSet.getRowCount(); ++i) {
  System.out.println("foo: " + resultSet.getString(i, 0));
  System.out.println("bar: " + resultSet.getInt(i, 1));
}

PQL Queries

If queryFormat pql is used in the Request, there are some differences in how the results can be accessed, depending on the query.

In the case of aggregation, each aggregation function is within its own ResultSet. A query with multiple aggregation function will return one result set per aggregation function, as they are computed in parallel.

ResultSetGroup resultSetGroup = 
    connection.execute(new Request("pql", "select max(foo), min(foo) from bar"));

System.out.println("Number of result groups:" +
    resultSetGroup.getResultSetCount(); // 2, min(foo) and max(foo)
ResultSet resultSetMax = resultSetGroup.getResultSet(0);
System.out.println("Max foo: " + resultSetMax.getInt(0));
ResultSet resultSetMin = resultSetGroup.getResultSet(1);
System.out.println("Min foo: " + resultSetMin.getInt(0));

In case of aggregation with GROUP BY, there will be as many ResultSets as the number of aggregations, each of which will contain multiple results grouped by a grouping key.

ResultSetGroup resultSetGroup = 
    connection.execute(
        new Request("pql", "select min(foo), max(foo) from bar group by baz"));

System.out.println("Number of result groups:" +
    resultSetGroup.getResultSetCount(); // 2, min(foo) and max(foo)

ResultSet minResultSet = resultSetGroup.getResultSet(0);
for(int i = 0; i < minResultSet.length(); ++i) {
    System.out.println("Minimum foo for " + minResultSet.getGroupKeyString(i, 1) +
        ": " + minResultSet.getInt(i));
}

ResultSet maxResultSet = resultSetGroup.getResultSet(1);
for(int i = 0; i < maxResultSet.length(); ++i) {
    System.out.println("Maximum foo for " + maxResultSet.getGroupKeyString(i, 1) +
        ": " + maxResultSet.getInt(i));
}

This section is only applicable for PQL endpoint, which is deprecated and will be deleted soon. For more information about the endpoints, visit Querying Pinot.

select playerName, max(hits) 
from baseballStats 
group by playerName 
order by max(hits) desc
select sum(hits), sum(homeRuns), sum(numberOfGames) 
from baseballStats 
where yearID > 2010
select * 
from baseballStats 
order by league
{
  "schemaName": "baseballStats",
  "dimensionFieldSpecs": [
    {
      "name": "playerID",
      "dataType": "STRING"
    },
    {
      "name": "yearID",
      "dataType": "INT"
    },
    {
      "name": "teamID",
      "dataType": "STRING"
    },
    {
      "name": "league",
      "dataType": "STRING"
    },
    {
      "name": "playerName",
      "dataType": "STRING"
    }
  ],
  "metricFieldSpecs": [
    {
      "name": "playerStint",
      "dataType": "INT"
    },
    {
      "name": "numberOfGames",
      "dataType": "INT"
    },
    {
      "name": "numberOfGamesAsBatter",
      "dataType": "INT"
    },
    {
      "name": "AtBatting",
      "dataType": "INT"
    },
    {
      "name": "runs",
      "dataType": "INT"
    },
    {
      "name": "hits",
      "dataType": "INT"
    },
    {
      "name": "doules",
      "dataType": "INT"
    },
    {
      "name": "tripples",
      "dataType": "INT"
    },
    {
      "name": "homeRuns",
      "dataType": "INT"
    },
    {
      "name": "runsBattedIn",
      "dataType": "INT"
    },
    {
      "name": "stolenBases",
      "dataType": "INT"
    },
    {
      "name": "caughtStealing",
      "dataType": "INT"
    },
    {
      "name": "baseOnBalls",
      "dataType": "INT"
    },
    {
      "name": "strikeouts",
      "dataType": "INT"
    },
    {
      "name": "intentionalWalks",
      "dataType": "INT"
    },
    {
      "name": "hitsByPitch",
      "dataType": "INT"
    },
    {
      "name": "sacrificeHits",
      "dataType": "INT"
    },
    {
      "name": "sacrificeFlies",
      "dataType": "INT"
    },
    {
      "name": "groundedIntoDoublePlays",
      "dataType": "INT"
    },
    {
      "name": "G_old",
      "dataType": "INT"
    }
  ]
}
http://localhost:9000
Query Console
Pinot Query Language
Pinot Admin UI
Table -> List all tables in cluster
Tables -> Get/Enable/Disable/Drop a table
Schema -> List all schemas in the cluster
Schema -> Get a schema
Segment -> List all segments
Batch Ingestion
Stream ingestion
List all tables in cluster
List all schemas in the cluster
upsert_meetupRsvp_schema.json
{
    "primaryKeyColumns": ["event_id"]
}
upsert mode: full
{
  "upsertConfig": {
    "mode": "FULL"
  }
}
upsert mode: partial
{
  "upsertConfig": {
    "mode": "PARTIAL",
    "partialUpsertStrategies":{
      "rsvp_count": "INCREMENT",
      "group_name": "UNION",
      "venue_name": "APPEND"
    }
  }
}

Strategy

Description

OVERWRITE

Overwrite the column of the last record

INCREMENT

Add the new value to the existing values

APPEND

Add the new item to the Pinot unordered set

UNION

Add the new item to the Pinot unordered set if not exists

comparison column
{
  "upsertConfig": {
    "mode": "FULL",
    "comparisonColumn": "anotherTimeColumn"
  }
}
routing
{
  "routing": {
    "instanceSelectorType": "strictReplicaGroup"
  }
}
upsert_meetupRsvp_realtime_table_config.json
{
  "tableName": "meetupRsvp",
  "tableType": "REALTIME",
  "segmentsConfig": {
    "timeColumnName": "mtime",
    "timeType": "MILLISECONDS",
    "retentionTimeUnit": "DAYS",
    "retentionTimeValue": "1",
    "segmentPushType": "APPEND",
    "segmentAssignmentStrategy": "BalanceNumSegmentAssignmentStrategy",
    "schemaName": "meetupRsvp",
    "replicasPerPartition": "1"
  },
  "tenants": {},
  "tableIndexConfig": {
    "loadMode": "MMAP",
    "streamConfigs": {
      "streamType": "kafka",
      "stream.kafka.consumer.type": "lowLevel",
      "stream.kafka.topic.name": "meetupRSVPEvents",
      "stream.kafka.decoder.class.name": "org.apache.pinot.plugin.stream.kafka.KafkaJSONMessageDecoder",
      "stream.kafka.hlc.zk.connect.string": "localhost:2191/kafka",
      "stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
      "stream.kafka.zk.broker.url": "localhost:2191/kafka",
      "stream.kafka.broker.list": "localhost:19092",
      "realtime.segment.flush.threshold.size": 30,
      "realtime.segment.flush.threshold.rows": 30
    }
  },
  "metadata": {
    "customConfigs": {}
  },
  "routing": {
    "instanceSelectorType": "strictReplicaGroup"
  },
  "upsertConfig": {
    "mode": "FULL"
  }
}
# stop previous quick start cluster, if any
bin/quick-start-upsert-streaming.sh
# stop previous quick start cluster, if any
bin/quick-start-partial-upsert-streaming.sh
send
partitioned replica-group assignment
Query the upsert table
Query the partial upsert table
Explain partial upsert table
Disable the upsert during query via query option
geoindex schema
{
      "dataType": "BYTES",
      "name": "location_st_point",
      "transformFunction": "toSphericalGeography(stPoint(lon,lat))"
}
geoindex tableConfig
{
  "fieldConfigList": [
  {
    "name": "location_st_point",
    "encodingType":"RAW",
    "indexType":"H3",
    "properties": {
    "resolutions": "5"
     }
    }
  ],
  "tableIndexConfig": {
    "loadMode": "MMAP",
    "noDictionaryColumns": [
      "location_st_point"
    ]
  },
}
SELECT address, ST_DISTANCE(location_st_point, ST_Point(-122, 37, 1))
FROM starbucksStores
WHERE ST_DISTANCE(location_st_point, ST_Point(-122, 37, 1)) < 5000
limit 1000
Open Geospatial Consortium’s (OGC) OpenGIS Specifications
section
WKT representation
Well-Known Binary geometry representation (WKB)
Well-Known Text representation or extended (WKT)
H3
resolutions
QuickStart example
kRing(x)
Hexagonal grid in H3
Geoindex example
/tmp/pinot-quick-start/transcript-table-realtime.json
{
  "tableName": "transcript",
  "tableType": "REALTIME",
  "segmentsConfig": {
    "timeColumnName": "timestampInEpoch",
    "timeType": "MILLISECONDS",
    "schemaName": "transcript",
    "replicasPerPartition": "1"
  },
  "tenants": {},
  "tableIndexConfig": {
    "loadMode": "MMAP",
    "streamConfigs": {
      "streamType": "kafka",
      "stream.kafka.consumer.type": "lowlevel",
      "stream.kafka.topic.name": "transcript-topic",
      "stream.kafka.decoder.class.name": "org.apache.pinot.plugin.stream.kafka.KafkaJSONMessageDecoder",
      "stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
      "stream.kafka.broker.list": "localhost:9876",
      "realtime.segment.flush.threshold.size": "0",
      "realtime.segment.flush.threshold.time": "24h",
      "realtime.segment.flush.desired.size": "50M",
      "stream.kafka.consumer.prop.auto.offset.reset": "smallest"
    }
  },
  "metadata": {
    "customConfigs": {}
  }
}
docker run \
    --network=pinot-demo \
    -v /tmp/pinot-quick-start:/tmp/pinot-quick-start \
    --name pinot-streaming-table-creation \
    apachepinot/pinot:latest AddTable \
    -schemaFile /tmp/pinot-quick-start/transcript-schema.json \
    -tableConfigFile /tmp/pinot-quick-start/transcript-table-realtime.json \
    -controllerHost pinot-quickstart \
    -controllerPort 9000 \
    -exec
bin/pinot-admin.sh AddTable \
    -schemaFile /tmp/pinot-quick-start/transcript-schema.json \
    -tableConfigFile /tmp/pinot-quick-start/transcript-table-realtime.json \
    -exec
/tmp/pinot-quick-start/rawData/transcript.json
{"studentID":205,"firstName":"Natalie","lastName":"Jones","gender":"Female","subject":"Maths","score":3.8,"timestampInEpoch":1571900400000}
{"studentID":205,"firstName":"Natalie","lastName":"Jones","gender":"Female","subject":"History","score":3.5,"timestampInEpoch":1571900400000}
{"studentID":207,"firstName":"Bob","lastName":"Lewis","gender":"Male","subject":"Maths","score":3.2,"timestampInEpoch":1571900400000}
{"studentID":207,"firstName":"Bob","lastName":"Lewis","gender":"Male","subject":"Chemistry","score":3.6,"timestampInEpoch":1572418800000}
{"studentID":209,"firstName":"Jane","lastName":"Doe","gender":"Female","subject":"Geography","score":3.8,"timestampInEpoch":1572505200000}
{"studentID":209,"firstName":"Jane","lastName":"Doe","gender":"Female","subject":"English","score":3.5,"timestampInEpoch":1572505200000}
{"studentID":209,"firstName":"Jane","lastName":"Doe","gender":"Female","subject":"Maths","score":3.2,"timestampInEpoch":1572678000000}
{"studentID":209,"firstName":"Jane","lastName":"Doe","gender":"Female","subject":"Physics","score":3.6,"timestampInEpoch":1572678000000}
{"studentID":211,"firstName":"John","lastName":"Doe","gender":"Male","subject":"Maths","score":3.8,"timestampInEpoch":1572678000000}
{"studentID":211,"firstName":"John","lastName":"Doe","gender":"Male","subject":"English","score":3.5,"timestampInEpoch":1572678000000}
{"studentID":211,"firstName":"John","lastName":"Doe","gender":"Male","subject":"History","score":3.2,"timestampInEpoch":1572854400000}
{"studentID":212,"firstName":"Nick","lastName":"Young","gender":"Male","subject":"History","score":3.6,"timestampInEpoch":1572854400000}
bin/kafka-console-producer.sh \
    --broker-list localhost:9876 \
    --topic transcript-topic < /tmp/pinot-quick-start/rawData/transcript.json
Pinot running in Docker
Pluggable Streams
docker run \
    --network pinot-demo --name=kafka \
    -e KAFKA_ZOOKEEPER_CONNECT=pinot-quickstart:2123/kafka \
    -e KAFKA_BROKER_ID=0 \
    -e KAFKA_ADVERTISED_HOST_NAME=kafka \
    -d wurstmeister/kafka:latest
docker exec \
  -t kafka \
  /opt/kafka/bin/kafka-topics.sh \
  --zookeeper pinot-quickstart:2123/kafka \
  --partitions=1 --replication-factor=1 \
  --create --topic transcript-topic
bin/pinot-admin.sh  StartKafka -zkAddress=localhost:2123/kafka -port 9876
Kafka
Batch upload sample data
Creating a schema
Batch upload sample data
Table
Query Console
bin/kafka-topics.sh --create --bootstrap-server localhost:9876 --replication-factor 1 --partitions 1 --topic transcript-topic

Manual cluster setup

This quick start guide will show you how to set up a Pinot cluster manually.

Start Pinot components (scripts or Docker images)

A manual cluster setup consists of the following components - 1. Zookeeper 2. Controller 3. Broker 4. Server 5. Kafka

We will run each of these components in separate containers

Start Pinot Components using docker

Prerequisites

If running locally, please ensure your docker cluster has enough resources, below is a sample config.

Pull docker image

You can try out the pre-built Pinot all-in-one docker image.

(Optional) You can also follow the instructions to build your own images.

0. Create a Network

Create an isolated bridge network in docker

1. Start Zookeeper

Start Zookeeper in daemon mode. This is a single node zookeeper setup. Zookeeper is the central metadata store for Pinot and should be set up with replication for production use. For more information, see .

2. Start Pinot Controller

Start Pinot Controller in daemon and connect to Zookeeper.

The command below expects a 4GB memory container. Please tune-Xms and-Xmx if your machine doesn't have enough resources.

3. Start Pinot Broker

Start Pinot Broker in daemon and connect to Zookeeper.

The command below expects a 4GB memory container. Please tune-Xms and-Xmx if your machine doesn't have enough resources.

4. Start Pinot Server

Start Pinot Server in daemon and connect to Zookeeper.

The command below expects a 16GB memory container. Please tune-Xms and-Xmx if your machine doesn't have enough resources.

5. Start Kafka

Optionally, you can also start Kafka for setting up realtime streams. This brings up the Kafka broker on port 9092.

Now all Pinot related components are started as an empty cluster.

You can run the below command to check container status.

Sample Console Output

Start Pinot Components using Docker Compose

Prerequisites

If running locally, please ensure your docker cluster has enough resources, below is a sample config.

Create a file called docker-compose.yml that contains the following:

Prerequisites

Follow this instruction in to get Pinot

Start Pinot components via launcher scripts

1. Start Zookeeper

You can use to browse the Zookeeper instance.

2. Start Pinot Controller

The examples below are for Java 8 users.

For Java 11+ users, please remove the GC settings insideJAVA_OPTS. So it looks like: export JAVA_OPTS="-Xms4G -Xmx8G"

3. Start Pinot Broker

4. Start Pinot Server

5. Start Kafka

Now all Pinot related components are started as an empty cluster.

Run docker-compose up to launch all the components.

You can run the below command to check container status.

Sample Console Output

Now it's time to start adding data to the cluster. Check out some of the or follow the and for instructions on loading your own data.

Stream ingestion

Apache Pinot allows user to consume data from streams and push it directly to pinot database. This process is known as Stream Ingestion. Stream Ingestion allows user to query data within seconds of publishing.

Stream Ingestion provides support for checkpoints out of the box for preventing data loss.

Stream ingestion requires the following steps -

  1. Create schema configuration

  2. Create table configuration

  3. Upload table and schema spec

Let's take a look at each of the following steps in a bit more detail. Let us assume the data to be ingested is in the following format -

Create Schema Configuration

Schema defines the fields along with their data types which are available in the datasource. Schema also defines the fields which serve as dimensions , metrics and timestamp respectively.

Follow for more details on schema configuration. For our sample data, the schema configuration should look as follows

Create Table Configuration

The next step is to create a table where all the ingested data will flow and can be queried. Unlike batch ingestion, table configuration for realtime ingestion also triggers the data ingestion job.For a more detailed overview about tables, check out the reference.

The realtime table configuration consists of the the following fields -

  • tableName - The name of the table where the data should flow

  • tableType - The internal type for the table. Should always be set to REALTIME for realtime ingestion

  • segmentsConfig -

  • tableIndexConfig - defines which column to use for indexing along with the type of index. You can refer [Indexing Configs] for full configuration. It consists of the following required fields -

    • loadMode - specifies how the segments should be loaded. Should be one of heap or mmap. Here's the difference between both the configs

    • streamConfig - specifies the datasource along with the necessary configs to start consuming the realtime data. The streamConfig can be thought of as equivalent of job spec in case of batch ingestion. The following options are supported in this config -

You can also specify additional configs for the consumer by prefixing the key with stream.[streamType] where streamType is the name of the streaming platform. For our sample data and schema, the table config will look like -

Upload schema and table config

Now that we have our table and schema configurations, let's upload them to the Pinot cluster. As soon as the configs are uploaded, pinot will start ingesting available records from the topic.

Custom Ingestion Support

We are working on adding more integrations such as Kinesis out of the box. You can easily write your on ingestion plugin in case it is not supported out of the box. Follow for a walkthrough.

Supported Aggregations

Pinot provides support for aggregations using GROUP BY. You can use the following functions to get the aggregated value.

Multi-value column functions

The following aggregation functions can be used for multi-value columns

Introduction

Apache Pinot, a real-time distributed OLAP datastore, purpose-built for low-latency high throughput analytics, perfect for user-facing analytical workloads.

Join us in our Slack channel for questions, troubleshooting, and feedback. You can request an invite from - .

We'd love to hear from you!

Pinot is a real-time distributed OLAP datastore, purpose-built to provide ultra low-latency analytics, even at extremely high throughput. It can ingest directly from streaming data sources - such as Apache Kafka and Amazon Kinesis - and make the events available for querying instantly. It can also ingest from batch data sources such as Hadoop HDFS, Amazon S3, Azure ADLS, and Google Cloud Storage.

At the heart of the system is a columnar store, with several smart indexing and pre-aggregation techniques for low latency. This makes Pinot the most perfect fit for user-facing realtime analytics. At the same time, Pinot is also a great choice for other analytical use-cases, such as internal dashboards, anomaly detection, and ad-hoc data exploration.

Pinot was built by engineers at LinkedIn and Uber and is designed to scale up and out with no upper bound. Performance always remains constant based on the size of your cluster and an expected query per second (QPS) threshold.

User-facing realtime analytics

User-facing analytics, or site-facing analytics, is the analytical tools and applications that you would expose directly to the end-users of your product. In a user-facing analytics application, think of the user-base as ALL end users of an App. This App could be a social networking app, or a food delivery app - anything at all. It’s not just a few analysts doing offline analysis, or a handful of data scientists in a company running ad-hoc queries. This is ALL end-users, receiving personalized analytics on their personal devices (think 100s of 1000s of queries per second). These queries are triggered by apps, and not written by people, and so the scale will be as much as the active users on that App (think millions of events/sec)

And, this is for all the freshest possible data, which touches on the other aspect here - realtime analytics. "Yesterday" might be a long time ago for some businesses and they cannot wait for ETLs and batch jobs. The data needs to be used for analytics, as soon as it is generated (think latencies < 1s).

Why is user-facing realtime analytics is so challenging?

Wanting such a user-facing analytics application, using realtime events, sounds great. But what does it mean for the underlying infrastructure, to support such an analytical workload?

  1. Such applications require the freshest possible data, and so the system needs to be able to ingest data in realtime and make it available for querying, also in realtime.

  2. Data for such apps tend to be event data, for a wide range of actions, coming from multiple sources, and so the data comes in at a very high velocity and tends to be highly dimensional.

  3. Queries are triggered by end-users interacting with apps - with queries per second in hundreds of thousands, with arbitrary query patterns, and latencies are expected to be in milliseconds for good user-experience.

  4. And further do all of the above, while being scalable, reliable, highly available and have a low cost to serve.

This video talks more about user-facing realtime analytics, and how Pinot is used to achieve that.

Here's another great video that goes into the details of how Pinot tackles some of the challenges faced in handling a user-facing analytics workload.

Companies using Pinot

Pinot originated at LinkedIn which currently has one of the largest deployment powering more than 50+ user facing applications such as Viewed My Profile, Talent Analytics, Company Analytics, Ad Analytics and many more. At LinkedIn, Pinot also serves as the backend for to visualize and monitor 10,000+ business metrics.

With Pinot's growing popularity, several companies are now using it in production to power a variety of analytics use cases. A detailed list of companies using Pinot can be found .​

Features

  • A column-oriented database with various compression schemes such as Run Length, Fixed Bit Length

  • Pluggable indexing technologies - Sorted Index, Bitmap Index, Inverted Index, StarTree Index, Range Index,

  • Ability to optimize query/execution plan based on query and segment metadata

  • Near real-time ingestion from streams such as Kafka, Kinesis and batch ingestion from sources such as Hadoop, S3, Azure, GCS

  • SQL-like language that supports selection, aggregation, filtering, group by, order by, distinct queries on data

  • Support for multi-valued fields

  • Horizontally scalable and fault-tolerant

When should I use it?

Pinot is designed to execute OLAP queries with low latency. It is suited in contexts where fast analytics, such as aggregations, are needed on immutable data, possibly, with real-time data ingestion.

User facing Analytics Products

Pinot is the perfect choice for user-facing analytics products. Pinot was originally built at LinkedIn to power rich interactive real-time analytic applications such as , , , and many more. is another example of a customer-facing Analytics App. At LinkedIn, Pinot powers 50+ user-facing products, ingesting millions of events per second and serving 100k+ queries per second at millisecond latency.

Real-time Dashboard for Business Metrics

Pinot can be also be used to perform typical analytical operations such as slice and dice, drill down, roll up, and pivot on large scale multi-dimensional data. For instance, at LinkedIn, Pinot powers dashboards for thousands of business metrics. One can connect various BI tools such Superset, Tableau, or PowerBI to visualize data in Pinot.

Instructions to connect Pinot with Superset can found .

Anomaly Detection

In addition to visualizing data in Pinot, one can run Machine Learning Algorithms to detect Anomalies on the data stored in Pinot. See for more information on how to use Pinot for Anomaly Detection and Root Cause Analysis.

Frequently asked question when getting started

Is Pinot a data warehouse or a database?

While Pinot doesn't match the typical mold of a database product, it is best understood based on your role as either an analyst, data scientist, or application developer.

Enterprise business intelligence

For analysts and data scientists, Pinot is best viewed as a highly-scalable data platform for business intelligence. In this view, Pinot converges big data platforms with the traditional role of a data warehouse, making it a suitable replacement for analysis and reporting.

Enterprise application development

For application developers, Pinot is best viewed as an immutable aggregate store that sources events from streaming data sources, such as Kafka, and makes it available for query using SQL.

As is the case with a microservice architecture, data encapsulation ends up requiring each application to provision its own data store, as opposed to sharing one OLTP database for reads and writes. In this case, it becomes difficult to query the complete view of a domain because it becomes stored in many different databases. This is costly in terms of performance, since it requires joins across multiple microservices that expose their data over HTTP under a REST API. To prevent this, Pinot can be used to aggregate all of the data across a microservice architecture into one easily queryable view of the domain.

Pinot prevent any possibility of sharing ownership of database tables across microservice teams. Developers can create their own query models of data from multiple systems of record depending on their use case and needs. As with all aggregate stores, query models are eventually consistent and immutable.

Get started

Our documentation is structured to let you quickly get to the content you need and is organized around the different concerns of users, operators, and developers. If you're new to Pinot and want to learn things by example, please take a look at our getting started section.

Starter guides

To start importing data into Pinot, check out our guides on batch import and stream ingestion based on our .

Query example

Pinot works very well for querying time series data with many dimensions and metrics over a vast unbounded space of records that scales linearly on a per node basis. Filters and aggregations are both easy and fast.

Pinot supports SQL for querying read-only data. Learn more about querying Pinot for time series data in our guide.

Installation

Pinot may be deployed to and operated on a cloud provider or a local or virtual machine. You may get started either with a bare-metal installation or a Kubernetes one (either locally or in the cloud). To get immediately started with Pinot, check out these quick start guides for bootstrapping a Pinot cluster using Docker or Kubernetes.

Standalone mode

Cluster mode

Learn

For a high-level overview that explains how Pinot works, please take a look at our basic concepts section.

To understand the distributed systems architecture that explains Pinot's operating model, please take a look at our basic architecture section.

Function

Description

Example

COUNT

Get the count of rows in a group

COUNT(*)

MIN

Get the minimum value in a group

MIN(playerScore)

MAX

Get the maximum value in a group

MAX(playerScore)

SUM

Get the sum of values in a group

SUM(playerScore)

AVG

Get the average of the values in a group

AVG(playerScore)

MODE

Get the most frequent value in a group. When multiple modes are present it gives the minimum of all the modes. This behavior can be overridden to get the maximum or the average mode.

MODE(playerScore)

MODE(playerScore, 'MIN')

MODE(playerScore, 'MAX')

MODE(playerScore, 'AVG')

MINMAXRANGE

Returns the max - min value in a group

MINMAXRANGE(playerScore)

PERCENTILE(column, N)

Returns the Nth percentile of the group where N is a decimal number between 0 and 100 inclusive

PERCENTILE(playerScore, 50), PERCENTILE(playerScore, 99.9)

PERCENTILEEST(column, N)

Returns the Nth percentile of the group using Quantile Digest algorithm

PERCENTILEEST(playerScore, 50), PERCENTILEEST(playerScore, 99.9)

PERCENTILETDigest(column, N)

Returns the Nth percentile of the group using T-digest algorithm

PERCENTILETDIGEST(playerScore, 50), PERCENTILETDIGEST(playerScore, 99.9)

DISTINCT

Returns the distinct row values in a group

DISTINCT(playerName)

DISTINCTCOUNT

Returns the count of distinct row values in a group

DISTINCTCOUNT(playerName)

DISTINCTCOUNTBITMAP

Returns the count of distinct row values in a group. This function is accurate for INT column, but approximate for other cases where hash codes are used in distinct counting and there may be hash collisions.

DISTINCTCOUNTBITMAP(playerName)

DISTINCTCOUNTHLL

Returns an approximate distinct count using HyperLogLog. It also takes an optional second argument to configure the log2m for the HyperLogLog.

DISTINCTCOUNTHLL(playerName, 12)

DISTINCTCOUNTRAWHLL

Returns HLL response serialized as string. The serialized HLL can be converted back into an HLL and then aggregated with other HLLs. A common use case may be to merge HLL responses from different Pinot tables, or to allow aggregation after client-side batching.

DISTINCTCOUNTRAWHLL(playerName)

FASTHLL (Deprecated)

WARN: will be deprecated soon. FASTHLL stores serialized HyperLogLog in String format, which performs worse than DISTINCTCOUNTHLL, which supports serialized HyperLogLog in BYTES (byte array) format

FASTHLL(playerName)

DISTINCTCOUNTTHETASKETCH

See Cardinality Estimation

DISTINCTCOUNTRAWTHETASKETCH

See Cardinality Estimation

SEGMENTPARTITIONEDDISTINCTCOUNT

Returns the count of distinct values of a column when the column is pre-partitioned for each segment, where there is no common value within different segments. This function calculates the exact count of distinct values within the segment, then simply sums up the results from different segments to get the final result.

SEGMENTPARTITIONEDDISTINCTCOUNT(playerName)

Function

Description

Example

COUNTMV

Get the count of rows in a group

COUNTMV(playerName)

MINMV

Get the minimum value in a group

MINMV(playerScores)

MAXMV

Get the maximum value in a group

MAXMV(playerScores)

SUMMV

Get the sum of values in a group

SUMMV(playerScores)

AVGMV

Get the avg of values in a group

AVGMV(playerScores)

MINMAXRANGEMV

Returns the max - min value in a group

MINMAXRANGEMV(playerScores)

PERCENTILEMV(column, N)

Returns the Nth percentile of the group where N is a decimal number between 0 and 100 inclusive

PERCENTILEMV(playerScores, 50),

PERCENTILEMV(playerScores, 99.9)

PERCENTILEESTMV(column, N)

Returns the Nth percentile of the group using Quantile Digest algorithm

PERCENTILEESTMV(playerScores, 50),

PERCENTILEESTMV(playerScores, 99.9)

PERCENTILETDIGESTMV(column, N)

Returns the Nth percentile of the group using T-digest algorithm

PERCENTILETDIGESTMV(playerScores, 50),

PERCENTILETDIGESTMV(playerScores, 99.9),

DISTINCTCOUNTMV

Returns the count of distinct row values in a group

DISTINCTCOUNTMV(playerNames)

DISTINCTCOUNTBITMAPMV

Returns the count of distinct row values in a group. This function is accurate for INT or dictionary encoded column, but approximate for other cases where hash codes are used in distinct counting and there may be hash collision.

DISTINCTCOUNTBITMAPMV(playerNames)

DISTINCTCOUNTHLLMV

Returns an approximate distinct count using HyperLogLog in a group

DISTINCTCOUNTHLLMV(playerNames)

DISTINCTCOUNTRAWHLLMV

Returns HLL response serialized as string. The serialized HLL can be converted back into an HLL and then aggregated with other HLLs. A common use case may be to merge HLL responses from different Pinot tables, or to allow aggregation after client-side batching.

DISTINCTCOUNTRAWHLLMV(playerNames)

FASTHLLMV (Deprecated)

stores serialized HyperLogLog in String format, which performs worse than DISTINCTCOUNTHLL, which supports serialized HyperLogLog in BYTES (byte array) format

FASTHLLMV(playerNames)

{"studentID":205,"firstName":"Natalie","lastName":"Jones","gender":"Female","subject":"Maths","score":3.8,"timestamp":1571900400000}
{"studentID":205,"firstName":"Natalie","lastName":"Jones","gender":"Female","subject":"History","score":3.5,"timestamp":1571900400000}
{"studentID":207,"firstName":"Bob","lastName":"Lewis","gender":"Male","subject":"Maths","score":3.2,"timestamp":1571900400000}
{"studentID":207,"firstName":"Bob","lastName":"Lewis","gender":"Male","subject":"Chemistry","score":3.6,"timestamp":1572418800000}
{"studentID":209,"firstName":"Jane","lastName":"Doe","gender":"Female","subject":"Geography","score":3.8,"timestamp":1572505200000}
{"studentID":209,"firstName":"Jane","lastName":"Doe","gender":"Female","subject":"English","score":3.5,"timestamp":1572505200000}
{"studentID":209,"firstName":"Jane","lastName":"Doe","gender":"Female","subject":"Maths","score":3.2,"timestamp":1572678000000}
{"studentID":209,"firstName":"Jane","lastName":"Doe","gender":"Female","subject":"Physics","score":3.6,"timestamp":1572678000000}
{"studentID":211,"firstName":"John","lastName":"Doe","gender":"Male","subject":"Maths","score":3.8,"timestamp":1572678000000}
{"studentID":211,"firstName":"John","lastName":"Doe","gender":"Male","subject":"English","score":3.5,"timestamp":1572678000000}
{"studentID":211,"firstName":"John","lastName":"Doe","gender":"Male","subject":"History","score":3.2,"timestamp":1572854400000}
{"studentID":212,"firstName":"Nick","lastName":"Young","gender":"Male","subject":"History","score":3.6,"timestamp":1572854400000}
/tmp/pinot-quick-start/transcript-schema.json
{
  "schemaName": "transcript",
  "dimensionFieldSpecs": [
    {
      "name": "studentID",
      "dataType": "INT"
    },
    {
      "name": "firstName",
      "dataType": "STRING"
    },
    {
      "name": "lastName",
      "dataType": "STRING"
    },
    {
      "name": "gender",
      "dataType": "STRING"
    },
    {
      "name": "subject",
      "dataType": "STRING"
    }
  ],
  "metricFieldSpecs": [
    {
      "name": "score",
      "dataType": "FLOAT"
    }
  ],
  "dateTimeFieldSpecs": [{
    "name": "timestamp",
    "dataType": "LONG",
    "format" : "1:MILLISECONDS:EPOCH",
    "granularity": "1:MILLISECONDS"
  }]
}
heap: Segments are loaded on direct-memory. Note, 'heap' here is a legacy misnomer, and it does not
      imply JVM heap. This mode should only be used when we want faster performance than memory-mapped files,
       and are also sure that we will never run into OOM.
mmap: Segments are loaded on memory-mapped file. This is the default mode.

Config key

Description

Supported values

streamType

the streaming platform from which to consume the data

kafka

stream.[streamType].consumer.type

whether to use per partition low-level consumer or high-level stream consumer

lowLevel or highLevel

stream.[streamType].topic.name

the datasource (e.g. topic, data stream) from which to consume the data

String

stream.[streamType].decoder.class.name

name of the class to be used for parsing the data. The class should implement org.apache.pinot.spi.stream.StreamMessageDecoder interface

String

stream.[streamType].consumer.factory.class.name

name of the factory class to be used to provide the appropriate implementation of low level and high level consumer as well as the metadata

String

stream.[streamType].consumer.prop.auto.offset.reset

determines the offset from which to start the ingestion

smallest largest or timestamp in milliseconds

realtime.segment.flush.threshold.time

Time threshold that will keep the realtime segment open for before we complete the segment

realtime.segment.flush.threshold.size

Row count flush threshold for realtime segments. This behaves in a similar way for HLC and LLC. For HLC,

since there is only one consumer per server, this size is used as the size of the consumption buffer and determines after how many rows we flush to disk. For example, if this threshold is set to two million rows,

then a high level consumer would have a buffer size of two million.

If this value is set to 0, then the consumers adjust the number of rows consumed by a partition such that the size of the completed segment is the desired size (unless

threshold.time is reached first)

realtime.segment.flush.desired.size

The desired size of a completed realtime segment.This config is used only if threshold.size is set to 0.

{
  "tableName": "transcript",
  "tableType": "REALTIME",
  "segmentsConfig": {
    "timeColumnName": "timestamp",
    "timeType": "MILLISECONDS",
    "schemaName": "transcript",
    "replicasPerPartition": "1"
  },
  "tenants": {},
  "tableIndexConfig": {
    "loadMode": "MMAP",
    "streamConfigs": {
      "streamType": "kafka",
      "stream.kafka.consumer.type": "lowlevel",
      "stream.kafka.topic.name": "transcript-topic",
      "stream.kafka.decoder.class.name": "org.apache.pinot.plugin.stream.kafka.KafkaJSONMessageDecoder",
      "stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
      "stream.kafka.broker.list": "localhost:9876",
      "realtime.segment.flush.threshold.time": "3600000",
      "realtime.segment.flush.threshold.size": "50000",
      "stream.kafka.consumer.prop.auto.offset.reset": "smallest"
    }
  },
  "metadata": {
    "customConfigs": {}
  }
}
docker run \
    --network=pinot-demo \
    -v /tmp/pinot-quick-start:/tmp/pinot-quick-start \
    --name pinot-streaming-table-creation \
    apachepinot/pinot:latest AddTable \
    -schemaFile /tmp/pinot-quick-start/transcript-schema.json \
    -tableConfigFile /tmp/pinot-quick-start/transcript-table-realtime.json \
    -controllerHost pinot-quickstart \
    -controllerPort 9000 \
    -exec
bin/pinot-admin.sh AddTable \
    -schemaFile /path/to/transcript-schema.json \
    -tableConfigFile /path/to/transcript-table-realtime.json \
    -exec
creating a schema
table
Stream Ingestion Plugin

Segment

Pinot has the concept of a table, which is a logical abstraction to refer to a collection of related data. Pinot has a distributed architecture and scales horizontally. Pinot expects the size of a table to grow infinitely over time. In order to achieve this, the entire data needs to be distributed across multiple nodes.

Pinot achieves this by breaking the data into smaller chunks known as segments (similar to shards/partitions in relational databases). Segments can be seen as time-based partitions.

Thus, a segment is a horizontal shard representing a chunk of table data with some number of rows. The segment stores data for all columns of the table. Each segment packs the data in a columnar fashion, along with the dictionaries and indices for the columns. The segment is laid out in a columnar format so that it can be directly mapped into memory for serving queries.

Columns can be single or multi-valued and the following types are supported: STRING, INT, LONG, FLOAT, DOUBLE or BYTES.

Columns may be declared to be metric or dimension (or specifically as a time dimension) in the schema. Columns can have default null values. For example, the default null value of a integer column can be 0. The default value for bytes columns must be hex-encoded before it's added to the schema.

Pinot uses dictionary encoding to store values as a dictionary ID. Columns may be configured to be “no-dictionary” column in which case raw values are stored. Dictionary IDs are encoded using minimum number of bits for efficient storage (e.g. a column with a cardinality of 3 will use only 2 bits for each dictionary ID).

A forward index is built for each column and compressed for efficient memory use. In addition, you can optionally configure inverted indices for any set of columns. Inverted indices take up more storage, but improve query performance. Specialized indexes like Star-Tree index are also supported. For more details, see Indexing.

Creating a segment

Once the table is configured, we can load some data. Loading data involves generating pinot segments from raw data and pushing them to the pinot cluster. Data can be loaded in batch mode or streaming mode. For more details, see the ingestion overview page.

Load Data in Batch

Prerequisites

  1. Setup a cluster

  2. Create broker and server tenants

  3. Create an offline table

Below are instructions to generate and push segments to Pinot via standalone scripts. For a production setup, you should use frameworks such as Hadoop or Spark. For more details on setting up data ingestion jobs, see Import Data.

Job Spec YAML

To generate a segment, we need to first create a job spec YAML file. This file contains all the information regarding data format, input data location, and pinot cluster coordinates. Note that this assumes that the controller is RUNNING to fetch the table config and schema. If not, you will have to configure the spec to point at their location. For full configurations, see Ingestion Job Spec.

job-spec.yml
executionFrameworkSpec:
  name: 'standalone'
  segmentGenerationJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner'
  segmentTarPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentTarPushJobRunner'
  segmentUriPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentUriPushJobRunner'

jobType: SegmentCreationAndTarPush
inputDirURI: 'examples/batch/baseballStats/rawdata'
includeFileNamePattern: 'glob:**/*.csv'
excludeFileNamePattern: 'glob:**/*.tmp'
outputDirURI: 'examples/batch/baseballStats/segments'
overwriteOutput: true

pinotFSSpecs:
  - scheme: file
    className: org.apache.pinot.spi.filesystem.LocalPinotFS

recordReaderSpec:
  dataFormat: 'csv'
  className: 'org.apache.pinot.plugin.inputformat.csv.CSVRecordReader'
  configClassName: 'org.apache.pinot.plugin.inputformat.csv.CSVRecordReaderConfig'
  configs:

tableSpec:
  tableName: 'baseballStats'
  schemaURI: 'http://localhost:9000/tables/baseballStats/schema'
  tableConfigURI: 'http://localhost:9000/tables/baseballStats'
  
segmentNameGeneratorSpec:

pinotClusterSpecs:
  - controllerURI: 'http://localhost:9000'

pushJobSpec:
  pushParallelism: 2
  pushAttempts: 2
  pushRetryIntervalMillis: 1000

Create and push segment

To create and push the segment in one go, use

docker run \
    --network=pinot-demo \
    --name pinot-data-ingestion-job \
    ${PINOT_IMAGE} LaunchDataIngestionJob \
    -jobSpecFile examples/docker/ingestion-job-specs/airlineStats.yaml

Sample Console Output

SegmentGenerationJobSpec:
!!org.apache.pinot.spi.ingestion.batch.spec.SegmentGenerationJobSpec
excludeFileNamePattern: null
executionFrameworkSpec: {extraConfigs: null, name: standalone, segmentGenerationJobRunnerClassName: org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner,
  segmentTarPushJobRunnerClassName: org.apache.pinot.plugin.ingestion.batch.standalone.SegmentTarPushJobRunner,
  segmentUriPushJobRunnerClassName: org.apache.pinot.plugin.ingestion.batch.standalone.SegmentUriPushJobRunner}
includeFileNamePattern: glob:**/*.avro
inputDirURI: examples/batch/airlineStats/rawdata
jobType: SegmentCreationAndTarPush
outputDirURI: examples/batch/airlineStats/segments
overwriteOutput: true
pinotClusterSpecs:
- {controllerURI: 'http://pinot-controller:9000'}
pinotFSSpecs:
- {className: org.apache.pinot.spi.filesystem.LocalPinotFS, configs: null, scheme: file}
pushJobSpec: {pushAttempts: 2, pushParallelism: 1, pushRetryIntervalMillis: 1000,
  segmentUriPrefix: null, segmentUriSuffix: null}
recordReaderSpec: {className: org.apache.pinot.plugin.inputformat.avro.AvroRecordReader,
  configClassName: null, configs: null, dataFormat: avro}
segmentNameGeneratorSpec: null
tableSpec: {schemaURI: 'http://pinot-controller:9000/tables/airlineStats/schema',
  tableConfigURI: 'http://pinot-controller:9000/tables/airlineStats', tableName: airlineStats}

Trying to create instance for class org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner
Initializing PinotFS for scheme file, classname org.apache.pinot.spi.filesystem.LocalPinotFS
Finished building StatsCollector!
Collected stats for 403 documents
Created dictionary for INT column: FlightNum with cardinality: 386, range: 14 to 7389
Using fixed bytes value dictionary for column: Origin, size: 294
Created dictionary for STRING column: Origin with cardinality: 98, max length in bytes: 3, range: ABQ to VPS
Created dictionary for INT column: Quarter with cardinality: 1, range: 1 to 1
Created dictionary for INT column: LateAircraftDelay with cardinality: 50, range: -2147483648 to 303
......
......
Pushing segment: airlineStats_OFFLINE_16085_16085_29 to location: http://pinot-controller:9000 for table airlineStats
Sending request: http://pinot-controller:9000/v2/segments?tableName=airlineStats to controller: a413b0013806, version: Unknown
Response for pushing table airlineStats segment airlineStats_OFFLINE_16085_16085_29 to location http://pinot-controller:9000 - 200: {"status":"Successfully uploaded segment: airlineStats_OFFLINE_16085_16085_29 of table: airlineStats"}
Pushing segment: airlineStats_OFFLINE_16084_16084_30 to location: http://pinot-controller:9000 for table airlineStats
Sending request: http://pinot-controller:9000/v2/segments?tableName=airlineStats to controller: a413b0013806, version: Unknown
Response for pushing table airlineStats segment airlineStats_OFFLINE_16084_16084_30 to location http://pinot-controller:9000 - 200: {"status":"Successfully uploaded segment: airlineStats_OFFLINE_16084_16084_30 of table: airlineStats"}
bin/pinot-admin.sh LaunchDataIngestionJob \
    -jobSpecFile examples/batch/airlineStats/ingestionJobSpec.yaml

Alternately, you can separately create and then push, by changing the jobType to SegmentCreation or SegmenTarPush.

Templating Ingestion Job Spec

The Ingestion job spec supports templating with Groovy Syntax.

This is convenient if you want to generate one ingestion job template file and schedule it on a daily basis with extra parameters updated daily.

e.g. you could set inputDirURI with parameters to indicate the date, so that the ingestion job only processes the data for a particular date. Below is an example that templates the date for input and output directories.

inputDirURI: 'examples/batch/airlineStats/rawdata/${year}/${month}/${day}'
outputDirURI: 'examples/batch/airlineStats/segments/${year}/${month}/${day}'

You can pass in arguments containing values for ${year}, ${month}, ${day} when kicking off the ingestion job: -values $param=value1 $param2=value2...

docker run \
    --network=pinot-demo \
    --name pinot-data-ingestion-job \
    ${PINOT_IMAGE} LaunchDataIngestionJob \
    -jobSpecFile examples/docker/ingestion-job-specs/airlineStats.yaml
    -values year=2014 month=01 day=03

This ingestion job only generates segments for date 2014-01-03

Load Data in Streaming

Prerequisites

  1. Setup a cluster

  2. Create broker and server tenants

  3. Create a realtime table and setup a realtime stream

Below is an example of how to publish sample data to your stream. As soon as data is available to the realtime stream, it starts getting consumed by the realtime servers

Kafka

Run below command to stream JSON data into Kafka topic: flights-realtime

docker run \
  --network pinot-demo \
  --name=loading-airlineStats-data-to-kafka \
  ${PINOT_IMAGE} StreamAvroIntoKafka \
  -avroFile examples/stream/airlineStats/sample_data/airlineStats_data.avro \
  -kafkaTopic flights-realtime -kafkaBrokerList kafka:9092 -zkAddress pinot-zookeeper:2181/kafka

Run below command to stream JSON data into Kafka topic: flights-realtime

bin/pinot-admin.sh StreamAvroIntoKafka \
  -avroFile examples/stream/airlineStats/sample_data/airlineStats_data.avro \
  -kafkaTopic flights-realtime -kafkaBrokerList localhost:19092 -zkAddress localhost:2191/kafka
export PINOT_VERSION=0.9.0
export PINOT_IMAGE=apachepinot/pinot:${PINOT_VERSION}
docker pull ${PINOT_IMAGE}
docker network create -d bridge pinot-demo
docker run \
    --network=pinot-demo \
    --name  pinot-zookeeper \
    --restart always \
    -p 2181:2181 \
    -d zookeeper:3.5.6
docker run --rm -ti \
    --network=pinot-demo \
    --name pinot-controller \
    -p 9000:9000 \
    -e JAVA_OPTS="-Dplugins.dir=/opt/pinot/plugins -Xms1G -Xmx4G -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -Xloggc:gc-pinot-controller.log" \
    -d ${PINOT_IMAGE} StartController \
    -zkAddress pinot-zookeeper:2181
docker run --rm -ti \
    --network=pinot-demo \
    --name pinot-broker \
    -p 8099:8099 \
    -e JAVA_OPTS="-Dplugins.dir=/opt/pinot/plugins -Xms4G -Xmx4G -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -Xloggc:gc-pinot-broker.log" \
    -d ${PINOT_IMAGE} StartBroker \
    -zkAddress pinot-zookeeper:2181
docker run --rm -ti \
    --network=pinot-demo \
    --name pinot-server \
    -e JAVA_OPTS="-Dplugins.dir=/opt/pinot/plugins -Xms4G -Xmx16G -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -Xloggc:gc-pinot-server.log" \
    -d ${PINOT_IMAGE} StartServer \
    -zkAddress pinot-zookeeper:2181
docker run --rm -ti \
    --network pinot-demo --name=kafka \
    -e KAFKA_ZOOKEEPER_CONNECT=pinot-zookeeper:2181/kafka \
    -e KAFKA_BROKER_ID=0 \
    -e KAFKA_ADVERTISED_HOST_NAME=kafka \
    -d wurstmeister/kafka:latest
docker container ls -a
CONTAINER ID        IMAGE                       COMMAND                  CREATED             STATUS              PORTS                                                  NAMES
9ec20e4463fa        wurstmeister/kafka:latest   "start-kafka.sh"         43 minutes ago      Up 43 minutes                                                              kafka
0775f5d8d6bf        apachepinot/pinot:latest    "./bin/pinot-admin.s…"   44 minutes ago      Up 44 minutes       8096-8099/tcp, 9000/tcp                                pinot-server
64c6392b2e04        apachepinot/pinot:latest    "./bin/pinot-admin.s…"   44 minutes ago      Up 44 minutes       8096-8099/tcp, 9000/tcp                                pinot-broker
b6d0f2bd26a3        apachepinot/pinot:latest    "./bin/pinot-admin.s…"   45 minutes ago      Up 45 minutes       8096-8099/tcp, 0.0.0.0:9000->9000/tcp                  pinot-quickstart
570416fc530e        zookeeper:3.5.6             "/docker-entrypoint.…"   45 minutes ago      Up 45 minutes       2888/tcp, 3888/tcp, 0.0.0.0:2181->2181/tcp, 8080/tcp   pinot-zookeeper
docker-compose.yml
version: '3.7'
services:
  zookeeper:
    image: zookeeper:3.5.6
    hostname: zookeeper
    container_name: manual-zookeeper
    ports:
      - "2181:2181"
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181
      ZOOKEEPER_TICK_TIME: 2000
  pinot-controller:
    image: apachepinot/pinot:0.9.0
    command: "StartController -zkAddress manual-zookeeper:2181"
    container_name: "manual-pinot-controller"
    restart: unless-stopped
    ports:
      - "9000:9000"
    environment:
      JAVA_OPTS: "-Dplugins.dir=/opt/pinot/plugins -Xms1G -Xmx4G -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -Xloggc:gc-pinot-controller.log"
    depends_on:
      - zookeeper
  pinot-broker:
    image: apachepinot/pinot:0.9.0
    command: "StartBroker -zkAddress manual-zookeeper:2181"
    restart: unless-stopped
    container_name: "manual-pinot-broker"
    ports:
      - "8099:8099"
    environment:
      JAVA_OPTS: "-Dplugins.dir=/opt/pinot/plugins -Xms4G -Xmx4G -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -Xloggc:gc-pinot-broker.log"
    depends_on:
      - pinot-controller
  pinot-server:
    image: apachepinot/pinot:0.9.0
    command: "StartServer -zkAddress manual-zookeeper:2181"
    restart: unless-stopped
    container_name: "manual-pinot-server" 
    environment:
      JAVA_OPTS: "-Dplugins.dir=/opt/pinot/plugins -Xms4G -Xmx16G -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -Xloggc:gc-pinot-server.log"
    depends_on:
      - pinot-broker
cd apache-pinot-${PINOT_VERSION}-bin
bin/pinot-admin.sh StartZookeeper \
  -zkPort 2191
export JAVA_OPTS="-Xms4G -Xmx8G -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -Xloggc:gc-pinot-controller.log"
bin/pinot-admin.sh StartController \
    -zkAddress localhost:2191 \
    -controllerPort 9000
export JAVA_OPTS="-Xms4G -Xmx4G -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -Xloggc:gc-pinot-broker.log"
bin/pinot-admin.sh StartBroker \
    -zkAddress localhost:2191
export JAVA_OPTS="-Xms4G -Xmx16G -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -Xloggc:gc-pinot-server.log"
bin/pinot-admin.sh StartServer \
    -zkAddress localhost:2191
bin/pinot-admin.sh  StartKafka \ 
  -zkAddress=localhost:2191/kafka \
  -port 19092
docker container ls 
CONTAINER ID   IMAGE                     COMMAND                  CREATED              STATUS              PORTS                                                                     NAMES
ba5cb0868350   apachepinot/pinot:0.9.0   "./bin/pinot-admin.s…"   About a minute ago   Up About a minute   8096-8099/tcp, 9000/tcp                                                   manual-pinot-server
698f160852f9   apachepinot/pinot:0.9.0   "./bin/pinot-admin.s…"   About a minute ago   Up About a minute   8096-8098/tcp, 9000/tcp, 0.0.0.0:8099->8099/tcp, :::8099->8099/tcp        manual-pinot-broker
b1ba8cf60d69   apachepinot/pinot:0.9.0   "./bin/pinot-admin.s…"   About a minute ago   Up About a minute   8096-8099/tcp, 0.0.0.0:9000->9000/tcp, :::9000->9000/tcp                  manual-pinot-controller
54e7e114cd53   zookeeper:3.5.6           "/docker-entrypoint.…"   About a minute ago   Up About a minute   2888/tcp, 3888/tcp, 0.0.0.0:2181->2181/tcp, :::2181->2181/tcp, 8080/tcp   manual-zookeeper
here
Running Replicated Zookeeper
Getting Pinot
Zooinspector
Recipes
Batch upload sample data
Stream sample data
Sample docker resources
Sample docker resources
SELECT sum(clicks), sum(impressions) FROM AdAnalyticsTable
  WHERE 
       ((daysSinceEpoch >= 17849 AND daysSinceEpoch <= 17856)) AND 
       accountId IN (123456789)
  GROUP BY 
       daysSinceEpoch TOP 100
https://communityinviter.com/apps/apache-pinot/apache-pinot
here
Who Viewed Profile
Company Analytics
Talent Insights
UberEats Restaurant Manager
here
ThirdEye
tenants
plugin architecture
PQL (Pinot Query Language)
Challenges of user-facing realtime analytics
Getting Started
Import Data
Running Pinot locally
Running Pinot in Docker
Running Pinot in Kubernetes
Introduction
Concepts
Architecture

Schema

Each table in Pinot is associated with a Schema. A schema defines what fields are present in the table along with the data types.

The schema is stored in the Zookeeper, along with the table configuration.

Categories

A schema also defines what category a column belongs to. Columns in a Pinot table can be categorized into three categories:

Data Types

Data types determine the operations that can be performed on a column. Pinot supports the following data types:

BOOLEAN, TIMESTAMP, JSON are added after release 0.7.1. In release 0.7.1 and older releases, BOOLEAN is equivalent to STRING.

Pinot also supports columns that contain lists or arrays of items, but there isn't an explicit data type to represent these lists or arrays. Instead, you can indicate that a dimension column accepts multiple values. For more information, see in the Schema configuration reference.

Date Time Fields

Since Pinot doesn't have a dedicated DATETIME datatype support, you need to input time in either STRING, LONG, or INT format. However, Pinot needs to convert the date into an understandable format such as epoch timestamp to do operations.

To achieve this conversion, you will need to provide the format of the date along with the data type in the schema. The format is described using the following syntax: timeSize:timeUnit:timeFormat:pattern .

  • time size - the size of the time unit. This size is multiplied to the value present in the time column to get an actual timestamp. e.g. if timesize is 5 and value in time column is 4996308 minutes. The value that will be converted to epoch timestamp will be 4996308 * 5 * 60 * 1000 = 1498892400000 milliseconds. If your date is not in EPOCH format, this value is not used and can be set to 1 or any other integer.

  • time unit - one of enum values. e.g. HOURS , MINUTES etc. If your date is not in EPOCH format, this value is not used and can be set to MILLISECONDS or any other unit.

  • timeFormat - can be either EPOCH or SIMPLE_DATE_FORMAT. If it is SIMPLE_DATE_FORMAT, the pattern string is also specified.

  • pattern - This is optional and is only specified when the date is in SIMPLE_DATE_FORMAT . The pattern should be specified using the java representation. e.g. 2020-08-21 can be represented as yyyy-MM-dd.

Here are some sample date-time formats you can use in the schema:

  • 1:MILLISECONDS:EPOCH - used when timestamp is in the epoch milliseconds and stored in LONG format

  • 1:HOURS:EPOCH - used when timestamp is in the epoch hours and stored in LONG or INT format

  • 1:DAYS:SIMPLE_DATE_FORMAT:yyyy-MM-dd - when the date is in STRING format and has the pattern year-month-date. e.g. 2020-08-21

  • 1:HOURS:SIMPLE_DATE_FORMAT:EEE MMM dd HH:mm:ss ZZZ yyyy - when date is in STRING format. e.g. Mon Aug 24 12:36:50 America/Los_Angeles 2019

Built-in Virtual Columns

There are several built-in virtual columns inside the schema the can be used for debugging purposes:

These virtual columns can be used in queries in a similar way to regular columns.

Creating a Schema

First, Make sure your and running.

Let's create a schema and put it in a JSON file. For this example, we have created a schema for flight data.

For more details on constructing a schema file, see the .

Then, we can upload the sample schema provided above using either a Bash command or REST API call.

Check out the schema in the to make sure it was successfully uploaded

Apache Kafka

This guide shows you how to ingest a stream of records from an Apache Kafka topic into a Pinot table.

Introduction

In this guide, you'll learn how to import data into Pinot using Apache Kafka for real-time stream ingestion. Pinot has out-of-the-box real-time ingestion support for Kafka.

Let's setup a demo Kafka cluster locally, and create a sample topic transcript-topic

Start Kafka

Create a Kafka Topic

Start Kafka

Start Kafka cluster on port 9876 using the same Zookeeper from the .

Create a Kafka topic

Download the latest . Create a topic.

Creating Schema Configuration

We will publish the data in the same format as mentioned in the docs. So you can use the same schema mentioned under .

Creating a table configuration

The real-time table configuration for the transcript table described in the schema from the previous step.

For Kafka, we use streamType as kafka . Currently only JSON format is supported but you can easily write your own decoder by extending the StreamMessageDecoder interface. You can then access your decoder class by putting the jar file in plugins directory

The lowLevel consumer reads data per partition whereas the highLevel consumer utilises Kafka high level consumer to read data from the whole stream. It doesn't have the control over which partition to read at a particular momemt.

For Kafka versions below 2.X, use org.apache.pinot.plugin.stream.kafka09.KafkaConsumerFactory

For Kafka version 2.X and above, use org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory

You can set the offset to -

  • smallest to start consumer from the earliest offset

  • largest to start consumer from the latest offset

  • timestamp in milliseconds to start the consumer from the offset after the timestamp.

The resulting configuration should look as follows -

Upgrade from Kafka 0.9 connector to Kafka 2.x connector

  • Update table config for both high level and low level consumer: Update config: stream.kafka.consumer.factory.class.name from org.apache.pinot.core.realtime.impl.kafka.KafkaConsumerFactory to org.apache.pinot.core.realtime.impl.kafka2.KafkaConsumerFactory.

  • If using Stream(High) level consumer: Please also add config stream.kafka.hlc.bootstrap.server into tableIndexConfig.streamConfigs. This config should be the URI of Kafka broker lists, e.g. localhost:9092.

How to consume from higher Kafka version?

This connector is also suitable for Kafka lib version higher than 2.0.0. In , change the kafka.lib.version from 2.0.0 to 2.1.1 will make this Connector working with Kafka 2.1.1.

Upload schema and table

Now that we have our table and schema configurations, let's upload them to the Pinot cluster. As soon as the real-time table is created, it will begin ingesting available records from the Kafka topic.

Add sample data to the Kafka topic

We will publish data in the following format to Kafka. Let us save the data in a file named as transcript.json.

Push sample JSON into the transcript-topic Kafka topic, using the Kafka console producer. This will add 12 records to the topic described in the transcript.json file.

Ingesting streaming data

As soon as data flows into the stream, the Pinot table will consume it and it will be ready for querying. Head over to the to checkout the real-time data.

Some More kafka ingestion configs

Use Kafka Partition(Low) Level Consumer with SSL

Here is an example config which uses SSL based authentication to talk with kafka and schema-registry. Notice there are two sets of SSL options, ones starting with ssl. are for kafka consumer and ones with stream.kafka.decoder.prop.schema.registry. are for SchemaRegistryClient used by KafkaConfluentSchemaRegistryAvroMessageDecoder.

Ingest transactionally committed messages only from Kafka

With Kafka consumer 2.0, you can ingest transactionally committed messages only by configuring kafka.isolation.level to read_committed. For example,

Note that the default value of this config read_uncommitted to read all messages. Also, this config supports low-level consumer only.

0.6.0

This release introduced some excellent new features, including upsert, tiered storage, pinot-spark-connector, support of having clause, more validations on table config and schema, support of ordinals

Summary

This release introduced some excellent new features, including upsert, tiered storage, pinot-spark-connector, support of having clause, more validations on table config and schema, support of ordinals in GROUP BY and ORDER BY clause, array transform functions, adding push job type of segment metadata only mode, and some new APIs like updating instance tags, new health check endpoint. It also contains many key bug fixes. See details below.

The release was cut from the following commit: and the following cherry-picks:

Notable New Features

  • Tiered storage ()

  • Upsert feature (, , , , )

  • Pre-generate aggregation functions in QueryContext ()

  • Adding controller healthcheck endpoint: /health ()

  • Add pinot-spark-connector ()

  • Support multi-value non-dictionary group by ()

  • Support type conversion for all scalar functions ()

  • Add additional datetime functionality ()

  • Support post-aggregation in ORDER-BY ()

  • Support post-aggregation in SELECT ()

  • Add RANGE FilterKind to support merging ranges for SQL ()

  • Add HAVING support (5889)

  • Support for exact distinct count for non int data types ()

  • Add max qps bucket count ()

  • Add Range Indexing support for raw values ()

  • Add IdSet and IdSetAggregationFunction ()

  • [Deepstore by-pass]Add a Deepstore bypass integration test with minor bug fixes. ()

  • Add Hadoop counters for detecting schema mismatch ()

  • Add RawThetaSketchAggregationFunction ()

  • Instance API to directly updateTags ()

  • Add streaming query handler ()

  • Add InIdSetTransformFunction ()

  • Add ingestion descriptor in the header ()

  • Zookeeper put api ()

  • Feature/#5390 segment indexing reload status api ()

  • Segment processing framework ()

  • Support streaming query in QueryExecutor ()

  • Add list of allowed tables for emitting table level metrics ()

  • Add FilterOptimizer which supports optimizing both PQL and SQL query filter ()

  • Adding push job type of segment metadata only mode ()

  • Minion taskExecutor for RealtimeToOfflineSegments task (, )

  • Adding array transform functions: array_average, array_max, array_min, array_sum ()

  • Allow modifying/removing existing star-trees during segment reload ()

  • Implement off-heap bloom filter reader ()

  • Support for multi-threaded Group By reducer for SQL. ()

  • Add OnHeapGuavaBloomFilterReader ()

  • Support using ordinals in GROUP BY and ORDER BY clause ()

  • Merge common APIs for Dictionary ()

  • Add table level lock for segment upload ([#6165])

  • Added recursive functions validation check for group by ()

  • Add StrictReplicaGroupInstanceSelector ()

  • Add IN_SUBQUERY support ()

  • Add IN_PARTITIONED_SUBQUERY support ()

  • Some UI features (, , , )

Special notes

  • Brokers should be upgraded before servers in order to keep backward-compatible:

    • Change group key delimiter from '\t' to '\0' ()

    • Support for exact distinct count for non int data types ()

  • Pinot Components have to be deployed in the following order:

    (PinotServiceManager -> Bootstrap services in role ServiceRole.CONTROLLER -> All remaining bootstrap services in parallel)

    • Starts Broker and Server in parallel when using ServiceManager ()

    • New settings introduced and old ones deprecated:

    • Make realtime threshold property names less ambiguous ()

    • Change Signature of Broker API in Controller ()

  • This aggregation function is still in beta version. This PR involves change on the format of data sent from server to broker, so it works only when both broker and server are upgraded to the new version:

    • Enhance DistinctCountThetaSketchAggregationFunction ()

Major Bug fixes

  • Improve performance of DistinctCountThetaSketch by eliminating empty sketches and unions. ()

  • Enhance VarByteChunkSVForwardIndexReader to directly read from data buffer for uncompressed data ()

  • Fixing backward-compatible issue of schema fetch call ()

  • Fix race condition in MetricsHelper ()

  • Fixing the race condition that segment finished before ControllerLeaderLocator created. ()

  • Fix CSV and JSON converter on BYTES column ()

  • Fixing the issue that transform UDFs are parsed as function name 'OTHER', not the real function names ()

  • Incorporating embedded exception while trying to fetch stream offset ()

  • Use query timeout for planning phase ()

  • Add null check while fetching the schema ()

  • Validate timeColumnName when adding/updating schema/tableConfig ()

  • Handle the partitioning mismatch between table config and stream ()

  • Fix built-in virtual columns for immutable segment ()

  • Refresh the routing when realtime segment is committed ()

  • Add support for Decimal with Precision Sum aggregation ()

  • Fixing the calls to Helix to throw exception if zk connection is broken ()

  • Allow modifying/removing existing star-trees during segment reload ()

  • Add max length support in schema builder ()

  • Enhance star-tree to skip matching-all predicate on non-star-tree dimension ()

Backward Incompatible Changes

  • Make realtime threshold property names less ambiguous ()

  • Enhance DistinctCountThetaSketchAggregationFunction ()

  • Deep Extraction Support for ORC, Thrift, and ProtoBuf Records ()

Category

Description

Dimension

Dimension columns are typically used in slice and dice operations for answering business queries. Some operations for which dimension columns are used:

  • GROUP BY - group by one or more dimension columns along with aggregations on one or more metric columns

  • Filter clauses such as WHERE

Metric

These columns represent the quantitative data of the table. Such columns are used for aggregation. In data warehouse terminology, these can also be referred to as fact or measure columns.

Some operation for which metric columns are used:

  • Aggregation - SUM, MIN, MAX, COUNT, AVG etc

  • Filter clause such as WHERE

DateTime

This column represents time columns in the data. There can be multiple time columns in a table, but only one of them can be treated as primary. The primary time column is the one that is present in the segment config. The primary time column is used by Pinot to maintain the time boundary between offline and real-time data in a hybrid table and for retention management. A primary time column is mandatory if the table's push type is APPEND and optional if the push type is REFRESH .

Common operations that can be done on time column:

  • GROUP BY

  • Filter clauses such as WHERE

Data Type

Default Dimension Value

Default Metric Value

INT

Integer.MIN_VALUE

0

LONG

Long.MIN_VALUE

0

FLOAT

Float.NEGATIVE_INFINITY

0.0

DOUBLE

Double.NEGATIVE_INFINITY

0.0

BOOLEAN

0 (false)

N/A

TIMESTAMP

0 (1970-01-01 00:00:00 UTC)

N/A

STRING

"null"

N/A

JSON

"null"

N/A

BYTES

byte array of length 0

byte array of length 0

Column Name

Column Type

Data Type

Description

$hostName

Dimension

STRING

Name of the server hosting the data

$segmentName

Dimension

STRING

Name of the segment containing the record

$docId

Dimension

INT

Document id of the record within the segment

flights-schema.json
{
  "schemaName": "flights",
  "dimensionFieldSpecs": [
    {
      "name": "flightNumber",
      "dataType": "LONG"
    },
    {
      "name": "tags",
      "dataType": "STRING",
      "singleValueField": false,
      "defaultNullValue": "null"
    }
  ],
  "metricFieldSpecs": [
    {
      "name": "price",
      "dataType": "DOUBLE",
      "defaultNullValue": 0
    }
  ],
  "dateTimeFieldSpecs": [
    {
      "name": "millisSinceEpoch",
      "dataType": "LONG",
      "format": "1:MILLISECONDS:EPOCH",
      "granularity": "15:MINUTES"
    },
    {
      "name": "hoursSinceEpoch",
      "dataType": "INT",
      "format": "1:HOURS:EPOCH",
      "granularity": "1:HOURS"
    },
    {
      "name": "dateString",
      "dataType": "STRING",
      "format": "1:DAYS:SIMPLE_DATE_FORMAT:yyyy-MM-dd",
      "granularity": "1:DAYS"
    }
  ]
}
bin/pinot-admin.sh AddSchema -schemaFile flights-schema.json -exec

OR

bin/pinot-admin.sh AddTable -schemaFile flights-schema.json -tableFile flights-table.json -exec
curl -F [email protected]  localhost:9000/schemas
DimensionFieldSpec
TimeUnit
SimpleDateFormat
cluster is up
Schema configuration reference
Rest API
e5c9bec
d033a11
#5793
#6096
#6113
#6141
#6149
#6167
#5805
#5846
#5787
#5851
#5849
#5438
#5856
#5867
#5898
#5889
#5872
#5922
#5853
#5926
#5857
#5873
#5970
#5902
#5717
#5973
#5995
#5949
#5718
#5934
#6027
#6037
#6056
#5967
#6050
#6124
#6084
#6100
#6118
#6044
#6147
#6152
#6176
#6186
#6208
#6022
#6043
#5810
#5981
#6117
#6215
#5858
#5872
#5917
#5953
#6119
#6004
#5798
#5816
#5885
#5887
#5864
#5931
#5940
#5956
#5990
#5994
#5966
#6031
#6042
#6078
#6053
#6069
#6100
#6112
#6109
#5953
#6004
#6046
/tmp/pinot-quick-start/transcript-table-realtime.json
 {
  "tableName": "transcript",
  "tableType": "REALTIME",
  "segmentsConfig": {
    "timeColumnName": "timestamp",
    "timeType": "MILLISECONDS",
    "schemaName": "transcript",
    "replicasPerPartition": "1"
  },
  "tenants": {},
  "tableIndexConfig": {
    "loadMode": "MMAP",
    "streamConfigs": {
      "streamType": "kafka",
      "stream.kafka.consumer.type": "lowlevel",
      "stream.kafka.topic.name": "transcript-topic",
      "stream.kafka.decoder.class.name": "org.apache.pinot.plugin.stream.kafka.KafkaJSONMessageDecoder",
      "stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
      "stream.kafka.broker.list": "localhost:9876",
      "realtime.segment.flush.threshold.time": "3600000",
      "realtime.segment.flush.threshold.size": "50000",
      "stream.kafka.consumer.prop.auto.offset.reset": "smallest"
    }
  },
  "metadata": {
    "customConfigs": {}
  }
}
docker run \
    --network=pinot-demo \
    -v /tmp/pinot-quick-start:/tmp/pinot-quick-start \
    --name pinot-streaming-table-creation \
    apachepinot/pinot:latest AddTable \
    -schemaFile /tmp/pinot-quick-start/transcript-schema.json \
    -tableConfigFile /tmp/pinot-quick-start/transcript-table-realtime.json \
    -controllerHost pinot-quickstart \
    -controllerPort 9000 \
    -exec
bin/pinot-admin.sh AddTable \
    -schemaFile /tmp/pinot-quick-start/transcript-schema.json \
    -tableConfigFile /tmp/pinot-quick-start/transcript-table-realtime.json \
    -exec
transcript.json
{"studentID":205,"firstName":"Natalie","lastName":"Jones","gender":"Female","subject":"Maths","score":3.8,"timestamp":1571900400000}
{"studentID":205,"firstName":"Natalie","lastName":"Jones","gender":"Female","subject":"History","score":3.5,"timestamp":1571900400000}
{"studentID":207,"firstName":"Bob","lastName":"Lewis","gender":"Male","subject":"Maths","score":3.2,"timestamp":1571900400000}
{"studentID":207,"firstName":"Bob","lastName":"Lewis","gender":"Male","subject":"Chemistry","score":3.6,"timestamp":1572418800000}
{"studentID":209,"firstName":"Jane","lastName":"Doe","gender":"Female","subject":"Geography","score":3.8,"timestamp":1572505200000}
{"studentID":209,"firstName":"Jane","lastName":"Doe","gender":"Female","subject":"English","score":3.5,"timestamp":1572505200000}
{"studentID":209,"firstName":"Jane","lastName":"Doe","gender":"Female","subject":"Maths","score":3.2,"timestamp":1572678000000}
{"studentID":209,"firstName":"Jane","lastName":"Doe","gender":"Female","subject":"Physics","score":3.6,"timestamp":1572678000000}
{"studentID":211,"firstName":"John","lastName":"Doe","gender":"Male","subject":"Maths","score":3.8,"timestamp":1572678000000}
{"studentID":211,"firstName":"John","lastName":"Doe","gender":"Male","subject":"English","score":3.5,"timestamp":1572678000000}
{"studentID":211,"firstName":"John","lastName":"Doe","gender":"Male","subject":"History","score":3.2,"timestamp":1572854400000}
{"studentID":212,"firstName":"Nick","lastName":"Young","gender":"Male","subject":"History","score":3.6,"timestamp":1572854400000}
bin/kafka-console-producer.sh \
    --broker-list localhost:9876 \
    --topic transcript-topic < transcript.json
SELECT * FROM transcript
  {
    "tableName": "transcript",
    "tableType": "REALTIME",
    "segmentsConfig": {
    "timeColumnName": "timestamp",
    "timeType": "MILLISECONDS",
    "schemaName": "transcript",
    "replicasPerPartition": "1"
    },
    "tenants": {},
    "tableIndexConfig": {
      "loadMode": "MMAP",
      "streamConfigs": {
        "streamType": "kafka",
        "stream.kafka.consumer.type": "LowLevel",
        "stream.kafka.topic.name": "transcript-topic",
        "stream.kafka.decoder.class.name": "org.apache.pinot.plugin.inputformat.avro.confluent.KafkaConfluentSchemaRegistryAvroMessageDecoder",
        "stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
        "stream.kafka.zk.broker.url": "localhost:2191/kafka",
        "stream.kafka.broker.list": "localhost:9876",
        "schema.registry.url": "",
        "security.protocol": "SSL",
        "ssl.truststore.location": "",
        "ssl.keystore.location": "",
        "ssl.truststore.password": "",
        "ssl.keystore.password": "",
        "ssl.key.password": "",
        "stream.kafka.decoder.prop.schema.registry.rest.url": "",
        "stream.kafka.decoder.prop.schema.registry.ssl.truststore.location": "",
        "stream.kafka.decoder.prop.schema.registry.ssl.keystore.location": "",
        "stream.kafka.decoder.prop.schema.registry.ssl.truststore.password": "",
        "stream.kafka.decoder.prop.schema.registry.ssl.keystore.password": "",
        "stream.kafka.decoder.prop.schema.registry.ssl.keystore.type": "",
        "stream.kafka.decoder.prop.schema.registry.ssl.truststore.type": "",
        "stream.kafka.decoder.prop.schema.registry.ssl.key.password": "",
        "stream.kafka.decoder.prop.schema.registry.ssl.protocol": "",
      }
    },
    "metadata": {
      "customConfigs": {}
    }
  }
  {
    "tableName": "transcript",
    "tableType": "REALTIME",
    "segmentsConfig": {
    "timeColumnName": "timestamp",
    "timeType": "MILLISECONDS",
    "schemaName": "transcript",
    "replicasPerPartition": "1"
    },
    "tenants": {},
    "tableIndexConfig": {
      "loadMode": "MMAP",
      "streamConfigs": {
        "streamType": "kafka",
        "stream.kafka.consumer.type": "LowLevel",
        "stream.kafka.topic.name": "transcript-topic",
        "stream.kafka.decoder.class.name": "org.apache.pinot.plugin.inputformat.avro.confluent.KafkaConfluentSchemaRegistryAvroMessageDecoder",
        "stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
        "stream.kafka.zk.broker.url": "localhost:2191/kafka",
        "stream.kafka.broker.list": "localhost:9876",
        "stream.kafka.isolation.level": "read_committed"
      }
    },
    "metadata": {
      "customConfigs": {}
    }
  }
quick-start examples
bin/pinot-admin.sh  StartKafka -zkAddress=localhost:2123/kafka -port 9876
Kafka
Stream ingestion
Create Schema Configuration
Kafka 2.0 connector pom.xml
Query Console
docker run \
    --network pinot-demo --name=kafka \
    -e KAFKA_ZOOKEEPER_CONNECT=pinot-quickstart:2123/kafka \
    -e KAFKA_BROKER_ID=0 \
    -e KAFKA_ADVERTISED_HOST_NAME=kafka \
    -d wurstmeister/kafka:latest
docker exec \
  -t kafka \
  /opt/kafka/bin/kafka-topics.sh \
  --zookeeper pinot-quickstart:2123/kafka \
  --partitions=1 --replication-factor=1 \
  --create --topic transcript-topic
bin/kafka-topics.sh --create --bootstrap-server localhost:9876 --replication-factor 1 --partitions 1 --topic transcript-topic

Python

Python DB-API and SQLAlchemy dialect for Pinot

Applications can use this python client library to query Apache Pinot.

Pypi Repo: https://pypi.org/project/pinotdb/

Source Code Repo: https://github.com/python-pinot-dbapi/pinot-dbapi

Installation

pip install pinotdb

Please note:

  • pinotdb version >= 0.3.2 is using Pinot SQL API (added in Pinot >= 0.3.0) and drops support for PQL API. So this client requires Pinot server version >= 0.3.0 in order to access Pinot.

  • pinotdb version in 0.2.x is using Pinot PQL API, which works with pinot version <= 0.3.0, but may miss some new SQL query features added in newer Pinot version.

Usage

Using the DB API to query Pinot Broker directly:

from pinotdb import connect

conn = connect(host='localhost', port=8099, path='/query/sql', scheme='http')
curs = conn.cursor()
curs.execute("""
    SELECT place,
           CAST(REGEXP_EXTRACT(place, '(.*),', 1) AS FLOAT) AS lat,
           CAST(REGEXP_EXTRACT(place, ',(.*)', 1) AS FLOAT) AS lon
      FROM places
     LIMIT 10
""")
for row in curs:
    print(row)

Using SQLAlchemy:

The db engine connection string is format as: pinot://:?controller=://:/

from sqlalchemy import *
from sqlalchemy.engine import create_engine
from sqlalchemy.schema import *

engine = create_engine('pinot://localhost:8099/query/sql?controller=http://localhost:9000/')  # uses HTTP by default :(
# engine = create_engine('pinot+http://localhost:8099/query/sql?controller=http://localhost:9000/')
# engine = create_engine('pinot+https://localhost:8099/query/sql?controller=http://localhost:9000/')

places = Table('places', MetaData(bind=engine), autoload=True)
print(select([func.count('*')], from_obj=places).scalar())

Examples with Pinot Quickstart

Pinot Batch Quickstart

Run below command to start Pinot Batch Quickstart in docker and expose Pinot controller port 9000 and Pinot broker port 8000.

docker run --name pinot-quickstart -p 2123:2123 -p 9000:9000 -p 8000:8000 -d apachepinot/pinot:latest QuickStart -type batch

Once pinot batch quickstart is up, you can run below sample code snippet to query Pinot:

python3 examples/pinot-quickstart-batch.py

Sample Output:

Sending SQL to Pinot: SELECT * FROM baseballStats LIMIT 5
[0, 11, 0, 0, 0, 0, 0, 0, 0, 0, 'NL', 11, 11, 'aardsda01', 'David Allan', 1, 0, 0, 0, 0, 0, 0, 'SFN', 0, 2004]
[2, 45, 0, 0, 0, 0, 0, 0, 0, 0, 'NL', 45, 43, 'aardsda01', 'David Allan', 1, 0, 0, 0, 1, 0, 0, 'CHN', 0, 2006]
[0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 'AL', 25, 2, 'aardsda01', 'David Allan', 1, 0, 0, 0, 0, 0, 0, 'CHA', 0, 2007]
[1, 5, 0, 0, 0, 0, 0, 0, 0, 0, 'AL', 47, 5, 'aardsda01', 'David Allan', 1, 0, 0, 0, 0, 0, 1, 'BOS', 0, 2008]
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 'AL', 73, 3, 'aardsda01', 'David Allan', 1, 0, 0, 0, 0, 0, 0, 'SEA', 0, 2009]

Sending SQL to Pinot: SELECT playerName, sum(runs) FROM baseballStats WHERE yearID>=2000 GROUP BY playerName LIMIT 5
['Scott Michael', 26.0]
['Justin Morgan', 0.0]
['Jason Andre', 0.0]
['Jeffrey Ellis', 0.0]
['Maximiliano R.', 16.0]

Sending SQL to Pinot: SELECT playerName,sum(runs) AS sum_runs FROM baseballStats WHERE yearID>=2000 GROUP BY playerName ORDER BY sum_runs DESC LIMIT 5
['Adrian', 1820.0]
['Jose Antonio', 1692.0]
['Rafael', 1565.0]
['Brian Michael', 1500.0]
['Alexander Emmanuel', 1426.0]

Pinot Hybrid Quickstart

Run below command to start Pinot Hybrid Quickstart in docker and expose Pinot controller port 9000 and Pinot broker port 8000.

docker run --name pinot-quickstart -p 2123:2123 -p 9000:9000 -p 8000:8000 -d apachepinot/pinot:latest QuickStart -type hybrid

Below is an example to query against Pinot Quickstart Hybrid:

python3 examples/pinot-quickstart-hybrid.py
Sending SQL to Pinot: SELECT * FROM airlineStats LIMIT 5
[171, 153, 19393, 0, 8, 8, 1433, '1400-1459', 0, 1425, 1240, 165, 'null', 0, 'WN', -2147483648, 1, 27, 17540, 0, 2, 2, 1242, '1200-1259', 0, 'MDW', 13232, 1323202, 30977, 'Chicago, IL', 'IL', 17, 'Illinois', 41, 861, 4, -2147483648, [-2147483648], 0, [-2147483648], ['null'], -2147483648, -2147483648, [-2147483648], -2147483648, ['null'], [-2147483648], [-2147483648], [-2147483648], 0, -2147483648, '2014-01-27', 402, 1, -2147483648, -2147483648, 1, -2147483648, 'BOS', 10721, 1072102, 30721, 'Boston, MA', 'MA', 25, 'Massachusetts', 13, 1, ['null'], -2147483648, 'N556WN', 6, 12, -2147483648, 'WN', -2147483648, 1254, 1427, 2014]
[183, 141, 20398, 1, 17, 17, 1302, '1200-1259', 1, 1245, 1005, 160, 'null', 0, 'MQ', 0, 1, 27, 17540, 0, -6, 0, 959, '1000-1059', -1, 'CMH', 11066, 1106603, 31066, 'Columbus, OH', 'OH', 39, 'Ohio', 44, 990, 4, -2147483648, [-2147483648], 0, [-2147483648], ['null'], -2147483648, -2147483648, [-2147483648], -2147483648, ['null'], [-2147483648], [-2147483648], [-2147483648], 0, -2147483648, '2014-01-27', 3574, 1, 0, -2147483648, 1, 17, 'MIA', 13303, 1330303, 32467, 'Miami, FL', 'FL', 12, 'Florida', 33, 1, ['null'], 0, 'N605MQ', 13, 29, -2147483648, 'MQ', 0, 1028, 1249, 2014]
[-2147483648, -2147483648, 20304, -2147483648, -2147483648, -2147483648, -2147483648, '2100-2159', -2147483648, 2131, 2005, 146, 'null', 0, 'OO', -2147483648, 1, 27, 17541, 1, 52, 52, 2057, '2000-2059', 3, 'COS', 11109, 1110902, 30189, 'Colorado Springs, CO', 'CO', 8, 'Colorado', 82, 809, 4, -2147483648, [11292], 1, [1129202], ['DEN'], -2147483648, 73, [9], 0, ['null'], [9], [-2147483648], [2304], 1, -2147483648, '2014-01-27', 5554, 1, -2147483648, -2147483648, 1, -2147483648, 'IAH', 12266, 1226603, 31453, 'Houston, TX', 'TX', 48, 'Texas', 74, 1, ['SEA', 'PSC', 'PHX', 'MSY', 'ATL', 'TYS', 'DEN', 'CHS', 'PDX', 'LAX', 'EWR', 'SFO', 'PIT', 'RDU', 'RAP', 'LSE', 'SAN', 'SBN', 'IAH', 'OAK', 'BRO', 'JFK', 'SAT', 'ORD', 'ACY', 'DFW', 'BWI'], -2147483648, 'N795SK', -2147483648, 19, -2147483648, 'OO', -2147483648, 2116, -2147483648, 2014]
[153, 125, 20436, 1, 41, 41, 1442, '1400-1459', 2, 1401, 1035, 146, 'null', 0, 'F9', 2, 1, 27, 17541, 1, 34, 34, 1109, '1000-1059', 2, 'DEN', 11292, 1129202, 30325, 'Denver, CO', 'CO', 8, 'Colorado', 82, 967, 4, -2147483648, [-2147483648], 0, [-2147483648], ['null'], -2147483648, -2147483648, [-2147483648], -2147483648, ['null'], [-2147483648], [-2147483648], [-2147483648], 0, -2147483648, '2014-01-27', 658, 1, 8, -2147483648, 1, 31, 'SFO', 14771, 1477101, 32457, 'San Francisco, CA', 'CA', 6, 'California', 91, 1, ['null'], 0, 'N923FR', 11, 17, -2147483648, 'F9', 0, 1126, 1431, 2014]
[-2147483648, -2147483648, 20304, -2147483648, -2147483648, -2147483648, -2147483648, '1400-1459', -2147483648, 1432, 1314, 78, 'B', 1, 'OO', -2147483648, 1, 27, 17541, -2147483648, -2147483648, -2147483648, -2147483648, '1300-1359', -2147483648, 'EAU', 11471, 1147103, 31471, 'Eau Claire, WI', 'WI', 55, 'Wisconsin', 45, 268, 2, -2147483648, [-2147483648], 0, [-2147483648], ['null'], -2147483648, -2147483648, [-2147483648], -2147483648, ['null'], [-2147483648], [-2147483648], [-2147483648], 0, -2147483648, '2014-01-27', 5455, 1, -2147483648, -2147483648, 1, -2147483648, 'ORD', 13930, 1393003, 30977, 'Chicago, IL', 'IL', 17, 'Illinois', 41, 1, ['null'], -2147483648, 'N903SW', -2147483648, -2147483648, -2147483648, 'OO', -2147483648, -2147483648, -2147483648, 2014]

Sending SQL to Pinot: SELECT count(*) FROM airlineStats LIMIT 5
[17772]

Sending SQL to Pinot: SELECT AirlineID, sum(Cancelled) FROM airlineStats WHERE Year > 2010 GROUP BY AirlineID LIMIT 5
[20409, 40.0]
[19930, 16.0]
[19805, 60.0]
[19790, 115.0]
[20366, 172.0]

Sending SQL to Pinot: select OriginCityName, max(Flights) from airlineStats group by OriginCityName ORDER BY max(Flights) DESC LIMIT 5
['Casper, WY', 1.0]
['Deadhorse, AK', 1.0]
['Austin, TX', 1.0]
['Chicago, IL', 1.0]
['Monterey, CA', 1.0]

Sending SQL to Pinot: SELECT OriginCityName, sum(Cancelled) AS sum_cancelled FROM airlineStats WHERE Year>2010 GROUP BY OriginCityName ORDER BY sum_cancelled DESC LIMIT 5
['Chicago, IL', 178.0]
['Atlanta, GA', 111.0]
['New York, NY', 65.0]
['Houston, TX', 62.0]
['Denver, CO', 49.0]

Sending Count(*) SQL to Pinot
17773

Sending SQL: "SELECT OriginCityName, sum(Cancelled) AS sum_cancelled FROM "airlineStats" WHERE Year>2010 GROUP BY OriginCityName ORDER BY sum_cancelled DESC LIMIT 5" to Pinot
[('Chicago, IL', 178.0), ('Atlanta, GA', 111.0), ('New York, NY', 65.0), ('Houston, TX', 62.0), ('Denver, CO', 49.0)]

Minion

A Minion is a standby component that leverages the Helix Task Framework to offload computationally intensive tasks from other components.

It can be attached to an existing Pinot cluster and then execute tasks as provided by the controller. Custom tasks can be plugged via annotations into the cluster. Some typical minion tasks are:

  • Segment creation

  • Segment purge

  • Segment merge

Starting a Minion

Make sure you've setup Zookeeper. If you're using docker, make sure to pull the pinot docker image. To start a minion

Usage: StartMinion
    -help                                                   : Print this message. (required=false)
    -minionHost               <String>                      : Host name for minion. (required=false)
    -minionPort               <int>                         : Port number to start the minion at. (required=false)
    -zkAddress                <http>                        : HTTP address of Zookeeper. (required=false)
    -clusterName              <String>                      : Pinot cluster name. (required=false)
    -configFileName           <Config File Name>            : Minion Starter Config file. (required=false)
docker run \
    --network=pinot-demo \
    --name pinot-minion \
    -d ${PINOT_IMAGE} StartMinion \
    -zkAddress pinot-zookeeper:2181
bin/pinot-admin.sh StartMinion \
    -zkAddress localhost:2181

Interfaces

PinotTaskGenerator

PinotTaskGenerator interface defines the APIs for the controller to generate tasks for minions to execute.

public interface PinotTaskGenerator {

  /**
   * Initializes the task generator.
   */
  void init(ClusterInfoAccessor clusterInfoAccessor);

  /**
   * Returns the task type of the generator.
   */
  String getTaskType();

  /**
   * Generates a list of tasks to schedule based on the given table configs.
   */
  List<PinotTaskConfig> generateTasks(List<TableConfig> tableConfigs);

  /**
   * Returns the timeout in milliseconds for each task, 3600000 (1 hour) by default.
   */
  default long getTaskTimeoutMs() {
    return JobConfig.DEFAULT_TIMEOUT_PER_TASK;
  }

  /**
   * Returns the maximum number of concurrent tasks allowed per instance, 1 by default.
   */
  default int getNumConcurrentTasksPerInstance() {
    return JobConfig.DEFAULT_NUM_CONCURRENT_TASKS_PER_INSTANCE;
  }

  /**
   * Performs necessary cleanups (e.g. remove metrics) when the controller leadership changes.
   */
  default void nonLeaderCleanUp() {
  }
}

PinotTaskExecutorFactory

Factory for PinotTaskExecutor which defines the APIs for Minion to execute the tasks.

public interface PinotTaskExecutorFactory {

  /**
   * Initializes the task executor factory.
   */
  void init(MinionTaskZkMetadataManager zkMetadataManager);

  /**
   * Returns the task type of the executor.
   */
  String getTaskType();

  /**
   * Creates a new task executor.
   */
  PinotTaskExecutor create();
}
public interface PinotTaskExecutor {

  /**
   * Executes the task based on the given task config and returns the execution result.
   */
  Object executeTask(PinotTaskConfig pinotTaskConfig)
      throws Exception;

  /**
   * Tries to cancel the task.
   */
  void cancel();
}

MinionEventObserverFactory

Factory for MinionEventObserver which defines the APIs for task event callbacks on minion.

public interface MinionEventObserverFactory {

  /**
   * Initializes the task executor factory.
   */
  void init(MinionTaskZkMetadataManager zkMetadataManager);

  /**
   * Returns the task type of the event observer.
   */
  String getTaskType();

  /**
   * Creates a new task event observer.
   */
  MinionEventObserver create();
}
public interface MinionEventObserver {

  /**
   * Invoked when a minion task starts.
   *
   * @param pinotTaskConfig Pinot task config
   */
  void notifyTaskStart(PinotTaskConfig pinotTaskConfig);

  /**
   * Invoked when a minion task succeeds.
   *
   * @param pinotTaskConfig Pinot task config
   * @param executionResult Execution result
   */
  void notifyTaskSuccess(PinotTaskConfig pinotTaskConfig, @Nullable Object executionResult);

  /**
   * Invoked when a minion task gets cancelled.
   *
   * @param pinotTaskConfig Pinot task config
   */
  void notifyTaskCancelled(PinotTaskConfig pinotTaskConfig);

  /**
   * Invoked when a minion task encounters exception.
   *
   * @param pinotTaskConfig Pinot task config
   * @param exception Exception encountered during execution
   */
  void notifyTaskError(PinotTaskConfig pinotTaskConfig, Exception exception);
}

Built-in Tasks

SegmentGenerationAndPushTask

To be added

RealtimeToOfflineSegmentsTask

See Pinot managed Offline flows for details.

MergeRollupTask

See Minion merge rollup task for details.

ConvertToRawIndexTask

To be added

Enable Tasks

Tasks are enabled on a per-table basis. To enable a certain task type (e.g. myTask) on a table, update the table config to include the task type:

{
  ...
  "task": {
    "taskTypeConfigsMap": {
      "myTask": {
        "myProperty1": "value1",
        "myProperty2": "value2"
      }
    }
  }
}

Under each enable task type, custom properties can be configured for the task type.

Schedule Tasks

Auto-Schedule

Tasks can be scheduled periodically for all task types on all enabled tables. Enable auto task scheduling by configuring the schedule frequency in the controller config with the key controller.task.frequencyInSeconds.

Tasks can also be scheduled based on cron expressions. The cron expression is set in the schedule config for each task type separately. Thie optioncontroller.task.scheduler.enabled should be set to true to enable cron scheduling.

As shown below, the RealtimeToOfflineSegmentsTask will be scheduled at the first second of every minute (following the syntax defined here).

  "task": {
    "taskTypeConfigsMap": {
      "RealtimeToOfflineSegmentsTask": {
        "bucketTimePeriod": "1h",
        "bufferTimePeriod": "1h",
        "schedule": "0 * * * * ?"
      }
    }
  },

Manual Schedule

Tasks can be manually scheduled using the following controller rest APIs:

Rest API
Description

POST /tasks/schedule

Schedule tasks for all task types on all enabled tables

POST /tasks/schedule?taskType=myTask

Schedule tasks for the given task type on all enabled tables

POST /tasks/schedule?tableName=myTable_OFFLINE

Schedule tasks for all task types on the given table

POST /tasks/schedule?taskType=myTask&tableName=myTable_OFFLINE

Schedule tasks for the given task type on the given table

Plug-in Custom Tasks

To plug in a custom task, implement PinotTaskGenerator, PinotTaskExecutorFactory and MinionEventObserverFactory (optional) for the task type (all of them should return the same string for getTaskType()), and annotate them with the following annotations:

Implementation
Annotation

PinotTaskGenerator

@TaskGenerator

PinotTaskExecutorFactory

@TaskExecutorFactory

MinionEventObserverFactory

@EventObserverFactory

After annotating the classes, put them under the package of name org.apache.pinot.*.plugin.minion.tasks.*, then they will be auto-registered by the controller and minion.

Example

See SimpleMinionClusterIntegrationTest where the TestTask is plugged-in.

Task-related metrics

There is a controller job that runs every 5 minutes by default and emits metrics about Minion tasks scheduled in Pinot. The following metrics are emitted for each task type:

  • NumMinionTasksInProgress: Number of running tasks

  • NumMinionSubtasksRunning: Number of running sub-tasks

  • NumMinionSubtasksWaiting: Number of waiting sub-tasks (unassigned to a minion as yet)

  • NumMinionSubtasksError: Number of error sub-tasks (completed with an error/exception)

  • PercentMinionSubtasksInQueue: Percent of sub-tasks in waiting or running states

  • PercentMinionSubtasksInError: Percent of sub-tasks in error

For each task, the Minion will emit these metrics:

  • TASK_QUEUEING: Task queueing time (task_dequeue_time - task_inqueue_time), assuming the time drift between helix controller and pinot minion is minor, otherwise the value may be negative

  • TASK_EXECUTION: Task execution time, which is the time spent on executing the task

  • NUMBER_OF_TASKS: number of tasks in progress on that minion. Whenever a Minion starts a task, increase the Gauge by 1, whenever a Minion completes (either succeeded or failed) a task, decrease it by 1

Batch Ingestion

Batch ingestion allows users to create a table using data already present in a file system such as S3. This is particularly useful for the cases where the user wants to utilize Pinot's ability to query large data with minimal latency or test out new features using a simple data file.

Ingesting data from a filesystem involves the following steps -

  1. Define Schema

  2. Define Table Config

  3. Upload Schema and Table configs

  4. Upload data

Batch Ingestion currently supports the following mechanisms to upload the data -

  • Standalone

  • Hadoop

  • Spark

Here we'll take a look at the standalone local processing to get you started.

Let's create a table for the following CSV data source.

studentID,firstName,lastName,gender,subject,score,timestampInEpoch
200,Lucy,Smith,Female,Maths,3.8,1570863600000
200,Lucy,Smith,Female,English,3.5,1571036400000
201,Bob,King,Male,Maths,3.2,1571900400000
202,Nick,Young,Male,Physics,3.6,1572418800000

Create Schema Configuration

In our data, the only column on which aggregations can be performed is score. Secondly, timestampInEpoch is the only timestamp column. So, on our schema, we keep score as metric and timestampInEpoch as timestamp column.

{
  "schemaName": "transcript",
  "dimensionFieldSpecs": [
    {
      "name": "studentID",
      "dataType": "INT"
    },
    {
      "name": "firstName",
      "dataType": "STRING"
    },
    {
      "name": "lastName",
      "dataType": "STRING"
    },
    {
      "name": "gender",
      "dataType": "STRING"
    },
    {
      "name": "subject",
      "dataType": "STRING"
    }
  ],
  "metricFieldSpecs": [
    {
      "name": "score",
      "dataType": "FLOAT"
    }
  ],
  "dateTimeFieldSpecs": [{
    "name": "timestampInEpoch",
    "dataType": "LONG",
    "format" : "1:MILLISECONDS:EPOCH",
    "granularity": "1:MILLISECONDS"
  }]
}

Here, we have also defined two extra fields - format and granularity. The format specifies the formatting of our timestamp column in the data source. Currently, it is in milliseconds hence we have specified 1:MILLISECONDS:EPOCH

Create Table Configuration

We define a tabletranscriptand map the schema created in the previous step to the table. For batch data, we keep the tableType as OFFLINE

{
  "tableName": "transcript",
  "tableType": "OFFLINE",
  "segmentsConfig": {
    "replication": 1,
    "timeColumnName": "timestampInEpoch",
    "timeType": "MILLISECONDS",
    "retentionTimeUnit": "DAYS",
    "retentionTimeValue": 365
  },
  "tenants": {
    "broker":"DefaultTenant",
    "server":"DefaultTenant"
  },
  "tableIndexConfig": {
    "loadMode": "MMAP"
  },
  "ingestionConfig": {
    "batchIngestionConfig": {
      "segmentIngestionType": "APPEND",
      "segmentIngestionFrequency": "DAILY"
    }
  },
  "metadata": {}
}

Upload Schema and Table

Now that we have both the configs, we can simply upload them and create a table. To achieve that, just run the command -

bin/pinot-admin.sh AddTable \\
  -tableConfigFile /path/to/table-config.json \\
  -schemaFile /path/to/table-schema.json -exec

Check out the table config and schema in the [Rest API] to make sure it was successfully uploaded.

Upload data

We now have an empty table in pinot. So as the next step we will upload our CSV file to this table.

A table is composed of multiple segments. The segments can be created using three ways

1) Minion based ingestion 2) Upload API 3) Ingestion jobs

Minion Based Ingestion

Refer to SegmentGenerationAndPushTask

Upload API

There are 2 Controller APIs that can be used for a quick ingestion test using a small file.

When these APIs are invoked, the controller has to download the file and build the segment locally.

Hence, these APIs are NOT meant for production environments and for large input files.

/ingestFromFile

This API creates a segment using the given file and pushes it to Pinot. All steps happen on the controller. Example usage:

To upload a JSON file data.json to a table called foo_OFFLINE, use below command

Note that query params need to be URLEncoded. For example, {"inputFormat":"json"} in the command below needs to be converted to %7B%22inputFormat%22%3A%22json%22%7D.

curl -X POST -F [email protected] \
  -H "Content-Type: multipart/form-data" \
  "http://localhost:9000/ingestFromFile?tableNameWithType=foo_OFFLINE&
  batchConfigMapStr={"inputFormat":"json"}"

The batchConfigMapStr can be used to pass in additional properties needed for decoding the file. For example, in case of csv, you may need to provide the delimiter

curl -X POST -F [email protected] \
  -H "Content-Type: multipart/form-data" \
  "http://localhost:9000/ingestFromFile?tableNameWithType=foo_OFFLINE&
batchConfigMapStr={
  "inputFormat":"csv",
  "recordReader.prop.delimiter":"|"
}"

/ingestFromURI

This API creates a segment using file at the given URI and pushes it to Pinot. Properties to access the FS need to be provided in the batchConfigMap. All steps happen on the controller. Example usage:

curl -X POST "http://localhost:9000/ingestFromURI?tableNameWithType=foo_OFFLINE
&batchConfigMapStr={
  "inputFormat":"json",
  "input.fs.className":"org.apache.pinot.plugin.filesystem.S3PinotFS",
  "input.fs.prop.region":"us-central",
  "input.fs.prop.accessKey":"foo",
  "input.fs.prop.secretKey":"bar"
}
&sourceURIStr=s3://test.bucket/path/to/json/data/data.json"

Ingestion Jobs

Segments can be created and uploaded using tasks known as DataIngestionJobs. A job also needs a config of its own. We call this config the JobSpec.

For our CSV file and table, the job spec should look like below.

executionFrameworkSpec:
  name: 'standalone'
  segmentGenerationJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner'
  segmentTarPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentTarPushJobRunner'
  segmentUriPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentUriPushJobRunner'
  segmentMetadataPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentMetadataPushJobRunner'

# Recommended to set jobType to SegmentCreationAndMetadataPush for production environment where Pinot Deep Store is configured  
jobType: SegmentCreationAndTarPush

inputDirURI: '/tmp/pinot-quick-start/rawdata/'
includeFileNamePattern: 'glob:**/*.csv'
outputDirURI: '/tmp/pinot-quick-start/segments/'
overwriteOutput: true
pinotFSSpecs:
  - scheme: file
    className: org.apache.pinot.spi.filesystem.LocalPinotFS
recordReaderSpec:
  dataFormat: 'csv'
  className: 'org.apache.pinot.plugin.inputformat.csv.CSVRecordReader'
  configClassName: 'org.apache.pinot.plugin.inputformat.csv.CSVRecordReaderConfig'
tableSpec:
  tableName: 'transcript'
pinotClusterSpecs:
  - controllerURI: 'http://localhost:9000'
pushJobSpec:
  pushAttempts: 2
  pushRetryIntervalMillis: 1000

You can refer to Ingestion Job Spec for more details.

Now that we have the job spec for our table transcript , we can trigger the job using the following command

bin/pinot-admin.sh LaunchDataIngestionJob \\
    -jobSpecFile /tmp/pinot-quick-start/batch-job-spec.yaml

Once the job has successfully finished, you can head over to the [query console] and start playing with the data.

Segment Push Job Type

There are 3 ways to upload a Pinot segment:

1. Segment Tar Push

This is the original and default push mechanism.

Tar push requires the segment to be stored locally or can be opened as an InputStream on PinotFS. So we can stream the entire segment tar file to the controller.

The push job will:

  1. Upload the entire segment tar file to the Pinot controller.

Pinot controller will:

  1. Save the segment into the controller segment directory(Local or any PinotFS).

  2. Extract segment metadata.

  3. Add the segment to the table.

2. Segment URI Push

This push mechanism requires the segment Tar file stored on a deep store with a globally accessible segment tar URI.

URI push is light-weight on the client-side, and the controller side requires equivalent work as the Tar push.

The push job will:

  1. POST this segment Tar URI to the Pinot controller.

Pinot controller will:

  1. Download segment from the URI and save it to controller segment directory(Local or any PinotFS).

  2. Extract segment metadata.

  3. Add the segment to the table.

3. Segment Metadata Push

This push mechanism also requires the segment Tar file stored on a deep store with a globally accessible segment tar URI.

Metadata push is light-weight on the controller side, there is no deep store download involves from the controller side.

The push job will:

  1. Download the segment based on URI.

  2. Extract metadata.

  3. Upload metadata to the Pinot Controller.

Pinot Controller will:

  1. Add the segment to the table based on the metadata.

Segment Fetchers

When pinot segment files are created in external systems (Hadoop/spark/etc), there are several ways to push those data to the Pinot Controller and Server:

  1. Push segment to shared NFS and let pinot pull segment files from the location of that NFS. See Segment URI Push.

  2. Push segment to a Web server and let pinot pull segment files from the Web server with HTTP/HTTPS link. See Segment URI Push.

  3. Push segment to PinotFS(HDFS/S3/GCS/ADLS) and let pinot pull segment files from PinotFS URI. See Segment URI Push and Segment Metadata Push.

  4. Push segment to other systems and implement your own segment fetcher to pull data from those systems.

The first three options are supported out of the box within the Pinot package. As long your remote jobs send Pinot controller with the corresponding URI to the files it will pick up the file and allocate it to proper Pinot Servers and brokers. To enable Pinot support for PinotFS, you will need to provide PinotFS configuration and proper Hadoop dependencies.

Persistence

By default, Pinot does not come with a storage layer, so all the data sent, won't be stored in case of a system crash. In order to persistently store the generated segments, you will need to change controller and server configs to add deep storage. Checkout File systems for all the info and related configs.

Tuning

Standalone

Since pinot is written in Java, you can set the following basic java configurations to tune the segment runner job -

  • Log4j2 file location with -Dlog4j2.configurationFile

  • Plugin directory location with -Dplugins.dir=/opt/pinot/plugins

  • JVM props, like -Xmx8g -Xms4G

If you are using the docker, you can set the following under JAVA_OPTS variable.

Hadoop

You can set -D mapreduce.map.memory.mb=8192 to set the mapper memory size when submitting the Hadoop job.

Spark

You can add config spark.executor.memory to tune the memory usage for segment creation when submitting the Spark job.

Use S3 and Pinot in Docker

Setup Pinot Cluster

In order to setup Pinot in Docker to use S3 as deep store, we need to put extra configs for Controller and Server.

Create a docker network

docker network create -d bridge pinot-demo

Start Zookeeper

docker run \
    --name zookeeper \
    --restart always \
    --network=pinot-demo \
    -d zookeeper:3.5.6

Prepare Pinot configuration files

Below sections will prepare 3 config files under /tmp/pinot-s3-docker to mount to the container.

/tmp/pinot-s3-docker/
                     controller.conf
                     server.conf
                     ingestionJobSpec.yaml

Start Controller

Below is a sample controller.conf file.

Please config:

controller.data.dirto your s3 bucket. All the uploaded segments will be stored there.

And add s3 as a pinot storage with configs:

pinot.controller.storage.factory.class.s3=org.apache.pinot.plugin.filesystem.S3PinotFS
pinot.controller.storage.factory.s3.region=us-west-2

Regarding AWS Credential, we also follow the convention of DefaultAWSCredentialsProviderChain.

You can specify AccessKey and Secret using:

  • Environment Variables - AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY (RECOMMENDED since they are recognized by all the AWS SDKs and CLI except for .NET), or AWS_ACCESS_KEY and AWS_SECRET_KEY (only recognized by Java SDK)

  • Java System Properties - aws.accessKeyId and aws.secretKey

  • Credential profiles file at the default location (~/.aws/credentials) shared by all AWS SDKs and the AWS CLI

  • Configure AWS credential in pinot config files, e.g. set pinot.controller.storage.factory.s3.accessKey and pinot.controller.storage.factory.s3.secretKey in the config file. (Not recommended)

pinot.controller.storage.factory.s3.accessKey=****************LFVX
pinot.controller.storage.factory.s3.secretKey=****************gfhz

Add s3 to pinot.controller.segment.fetcher.protocols

and set pinot.controller.segment.fetcher.s3.class toorg.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher

pinot.role=controller
pinot.controller.storage.factory.class.s3=org.apache.pinot.plugin.filesystem.S3PinotFS
pinot.controller.storage.factory.s3.region=us-west-2
controller.data.dir=s3://<my-bucket>/pinot-data/pinot-s3-example-docker/controller-data/
controller.local.temp.dir=/tmp/pinot-tmp-data/
controller.helix.cluster.name=pinot-s3-example-docker
controller.zk.str=zookeeper:2181
controller.port=9000
controller.enable.split.commit=true
pinot.controller.segment.fetcher.protocols=file,http,s3
pinot.controller.segment.fetcher.s3.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher

Then start pinot controller with:

docker run --rm -ti \
    --name pinot-controller \
    --network=pinot-demo \
    -p 9000:9000 \
    --env AWS_ACCESS_KEY_ID=<aws-access-key-id> \
    --env AWS_SECRET_ACCESS_KEY=<aws-secret-access-key> \
    --mount type=bind,source=/tmp/pinot-s3-docker,target=/tmp \
    apachepinot/pinot:0.6.0-SNAPSHOT-ca8545b29-20201105-jdk11 StartController \
    -configFileName /tmp/controller.conf

Start Broker

Broker is a simple one you can just start it with default:

docker run --rm -ti \
    --name pinot-broker \
    --network=pinot-demo \
    --env AWS_ACCESS_KEY_ID=<aws-access-key-id> \
    --env AWS_SECRET_ACCESS_KEY=<aws-secret-access-key> \
    apachepinot/pinot:0.6.0-SNAPSHOT-ca8545b29-20201105-jdk11 StartBroker \
    -zkAddress zookeeper:2181 -clusterName pinot-s3-example-docker

Start Server

Below is a sample server.conf file

Similar to controller config, please also set s3 configs in pinot server.

pinot.server.netty.port=8098
pinot.server.adminapi.port=8097
pinot.server.instance.dataDir=/tmp/pinot-tmp/server/index
pinot.server.instance.segmentTarDir=/tmp/pinot-tmp/server/segmentTars


pinot.server.storage.factory.class.s3=org.apache.pinot.plugin.filesystem.S3PinotFS
pinot.server.storage.factory.s3.region=us-west-2
pinot.server.segment.fetcher.protocols=file,http,s3
pinot.server.segment.fetcher.s3.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher

Then start pinot server with:

docker run --rm -ti \
    --name pinot-server \
    --network=pinot-demo \
    --env AWS_ACCESS_KEY_ID=<aws-access-key-id> \
    --env AWS_SECRET_ACCESS_KEY=<aws-secret-access-key> \
    --mount type=bind,source=/tmp/pinot-s3-docker,target=/tmp \
    apachepinot/pinot:0.6.0-SNAPSHOT-ca8545b29-20201105-jdk11 StartServer \
    -zkAddress zookeeper:2181 -clusterName pinot-s3-example-docker \
    -configFileName /tmp/server.conf

Setup Table

In this demo, we just use airlineStats table as an example which is already packaged inside the docker image.

You can also mount your table conf and schema files to the container and run it.

docker run --rm -ti \
    --name pinot-ingestion-job \
    --network=pinot-demo \
    apachepinot/pinot:0.6.0-SNAPSHOT-ca8545b29-20201105-jdk11 AddTable \
    -controllerHost pinot-controller \
    -controllerPort 9000 \
    -schemaFile examples/batch/airlineStats/airlineStats_schema.json \
    -tableConfigFile examples/batch/airlineStats/airlineStats_offline_table_config.json \
    -exec

Set up Ingestion Jobs

Standalone Job

Below is a sample standalone ingestion job spec with certain notable changes:

  • jobType is SegmentCreationAndMetadataPush (this job will bypass controller download segment )

  • inputDirURI is set to a s3 location s3://my.bucket/batch/airlineStats/rawdata/

  • outputDirURI is set to a s3 location s3://my.bucket/output/airlineStats/segments

  • Add a new PinotFs under pinotFSSpecs

    - scheme: s3
      className: org.apache.pinot.plugin.filesystem.S3PinotFS
      configs:
        region: 'us-west-2'

Sample ingestionJobSpec.yaml

# executionFrameworkSpec: Defines ingestion jobs to be running.
executionFrameworkSpec:

  # name: execution framework name
  name: 'standalone'

  # segmentGenerationJobRunnerClassName: class name implements org.apache.pinot.spi.batch.ingestion.runner.SegmentGenerationJobRunner interface.
  segmentGenerationJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner'

  # segmentTarPushJobRunnerClassName: class name implements org.apache.pinot.spi.batch.ingestion.runner.SegmentTarPushJobRunner interface.
  segmentTarPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentTarPushJobRunner'

  # segmentUriPushJobRunnerClassName: class name implements org.apache.pinot.spi.batch.ingestion.runner.SegmentUriPushJobRunner interface.
  segmentUriPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentUriPushJobRunner'

  # segmentMetadataPushJobRunnerClassName: class name implements org.apache.pinot.spi.batch.ingestion.runner.IngestionJobRunner interface.
  segmentMetadataPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentMetadataPushJobRunner'

# jobType: Pinot ingestion job type.
# Supported job types are:
#   'SegmentCreation'
#   'SegmentTarPush'
#   'SegmentUriPush'
#   'SegmentCreationAndTarPush'
#   'SegmentCreationAndUriPush'
jobType: SegmentCreationAndMetadataPush

# inputDirURI: Root directory of input data, expected to have scheme configured in PinotFS.
inputDirURI: 's3://<my-bucket>/pinot-data/rawdata/airlineStats/rawdata/'

# includeFileNamePattern: include file name pattern, supported glob pattern.
# Sample usage:
#   'glob:*.avro' will include all avro files just under the inputDirURI, not sub directories;
#   'glob:**/*.avro' will include all the avro files under inputDirURI recursively.
includeFileNamePattern: 'glob:**/*.avro'

# excludeFileNamePattern: exclude file name pattern, supported glob pattern.
# Sample usage:
#   'glob:*.avro' will exclude all avro files just under the inputDirURI, not sub directories;
#   'glob:**/*.avro' will exclude all the avro files under inputDirURI recursively.
# _excludeFileNamePattern: ''

# outputDirURI: Root directory of output segments, expected to have scheme configured in PinotFS.
outputDirURI: 's3://<my-bucket>/pinot-data/pinot-s3-docker/segments/airlineStats'

# segmentCreationJobParallelism: The parallelism to create egments.
segmentCreationJobParallelism: 5

# overwriteOutput: Overwrite output segments if existed.
overwriteOutput: true

# pinotFSSpecs: defines all related Pinot file systems.
pinotFSSpecs:

  - # scheme: used to identify a PinotFS.
    # E.g. local, hdfs, dbfs, etc
    scheme: file

    # className: Class name used to create the PinotFS instance.
    # E.g.
    #   org.apache.pinot.spi.filesystem.LocalPinotFS is used for local filesystem
    #   org.apache.pinot.plugin.filesystem.AzurePinotFS is used for Azure Data Lake
    #   org.apache.pinot.plugin.filesystem.HadoopPinotFS is used for HDFS
    className: org.apache.pinot.spi.filesystem.LocalPinotFS


  - scheme: s3
    className: org.apache.pinot.plugin.filesystem.S3PinotFS
    configs:
      region: 'us-west-2'

# recordReaderSpec: defines all record reader
recordReaderSpec:

  # dataFormat: Record data format, e.g. 'avro', 'parquet', 'orc', 'csv', 'json', 'thrift' etc.
  dataFormat: 'avro'

  # className: Corresponding RecordReader class name.
  # E.g.
  #   org.apache.pinot.plugin.inputformat.avro.AvroRecordReader
  #   org.apache.pinot.plugin.inputformat.csv.CSVRecordReader
  #   org.apache.pinot.plugin.inputformat.parquet.ParquetRecordReader
  #   org.apache.pinot.plugin.inputformat.json.JSONRecordReader
  #   org.apache.pinot.plugin.inputformat.orc.ORCRecordReader
  #   org.apache.pinot.plugin.inputformat.thrift.ThriftRecordReader
  className: 'org.apache.pinot.plugin.inputformat.avro.AvroRecordReader'

# tableSpec: defines table name and where to fetch corresponding table config and table schema.
tableSpec:

  # tableName: Table name
  tableName: 'airlineStats'

  # schemaURI: defines where to read the table schema, supports PinotFS or HTTP.
  # E.g.
  #   hdfs://path/to/table_schema.json
  #   http://localhost:9000/tables/myTable/schema
  schemaURI: 'http://pinot-controller:9000/tables/airlineStats/schema'

  # tableConfigURI: defines where to reade the table config.
  # Supports using PinotFS or HTTP.
  # E.g.
  #   hdfs://path/to/table_config.json
  #   http://localhost:9000/tables/myTable
  # Note that the API to read Pinot table config directly from pinot controller contains a JSON wrapper.
  # The real table config is the object under the field 'OFFLINE'.
  tableConfigURI: 'http://pinot-controller:9000/tables/airlineStats'

# pinotClusterSpecs: defines the Pinot Cluster Access Point.
pinotClusterSpecs:
  - # controllerURI: used to fetch table/schema information and data push.
    # E.g. http://localhost:9000
    controllerURI: 'http://pinot-controller:9000'

# pushJobSpec: defines segment push job related configuration.
pushJobSpec:

  # pushAttempts: number of attempts for push job, default is 1, which means no retry.
  pushAttempts: 2

  # pushRetryIntervalMillis: retry wait Ms, default to 1 second.
  pushRetryIntervalMillis: 1000

Launch the data ingestion job:

docker run --rm -ti \
    --name pinot-ingestion-job \
    --network=pinot-demo \
    --env AWS_ACCESS_KEY_ID=<aws-access-key-id> \
    --env AWS_SECRET_ACCESS_KEY=<aws-secret-access-key> \
    --mount type=bind,source=/tmp/pinot-s3-docker,target=/tmp \
    apachepinot/pinot:0.6.0-SNAPSHOT-ca8545b29-20201105-jdk11 LaunchDataIngestionJob \
    -jobSpecFile /tmp/ingestionJobSpec.yaml

Using Kafka and Pinot for Real-time User-facing Analytics
Building Latency Sensitive User-facing Analytics via Apache Pinot

Architecture

This page covers everything you need to know about how queries are computed in Pinot's distributed systems architecture.

This page will introduce you to the guiding principles behind the design of Apache Pinot. Here you will learn the distributed systems architecture that allows Pinot to scale the performance of queries linearly based on the number of nodes in a cluster. You'll also be introduced to the two different types of tables used to ingest and query data in offline (batch) or real-time (stream) mode.

It's recommended that you read Basic Concepts to better understand the terms used in this guide.

Guiding design principles

Pinot was designed by engineers at LinkedIn and Uber to scale query performance based on the number of nodes in a cluster. As you add more nodes, query performance will always improve based on the expected query volume per second quota. To achieve horizontal scalability to an unbounded number of nodes and data storage, without performance degradation, the following guiding design principles were established.

  • Highly available: Pinot is built to serve low latency analytical queries for customer facing applications. By design, there is no single point of failure in Pinot. The system continues to serve queries when a node goes down.

  • Horizontally scalable: Ability to scale by adding new nodes as a workload changes.

  • Latency vs Storage: Pinot is built to provide low latency even at high-throughput. Features such as segment assignment strategy, routing strategy, star-tree indexing were developed to achieve this.

  • Immutable data: Pinot assumes that all data stored is immutable. For GDPR compliance, we provide an add-on solution for purging data while maintaining performance guarantees.

  • Dynamic configuration changes: Operations such as adding new tables, expanding a cluster, ingesting data, modifying indexing config, and re-balancing must be performed without impacting query availability or performance.

Core components

As described in the concepts, Pinot has multiple distributed system components: Controller, Broker, Server, and Minion.

Pinot uses Apache Helix for cluster management. Helix is embedded as an agent within the different components and uses Apache Zookeeper for coordination and maintaining the overall cluster state and health.

Apache Helix and Zookeeper

All Pinot servers and brokers are managed by Helix. Helix is a generic cluster management framework to manage partitions and replicas in a distributed system. It's helpful to think of Helix as an event-driven discovery service with push and pull notifications that drives the state of a cluster to an ideal configuration. A finite-state machine maintains a contract of stateful operations that drives the health of the cluster towards its optimal configuration. Query load is optimized as Helix updates routing configurations between nodes based on where data is stored in the cluster.

Helix divides nodes into three logical components based on their responsibilities:

  1. Participant: These are the nodes in the cluster that actually host the distributed storage resources.

  2. Spectator: These nodes observe the current state of each participant and routes requests accordingly. Routers, for example, need to know the instance on which a partition is hosted and its state in order to route the request to the appropriate endpoint. Routing is continually being changed to optimize cluster performance as storage primitives are added and changed.

  3. Controller: The controller observes and manages the state of participant nodes. The controller is responsible for coordinating all state transitions in the cluster and ensures that state constraints are satisfied while maintaining cluster stability.

Helix uses Zookeeper to maintain cluster state. Each component in a Pinot cluster takes a Zookeeper address as a startup parameter. The various components that are distributed in a Pinot cluster will watch Zookeeper notifications and issue updates via its embedded Helix-defined agent.

Component

Helix Mapping

Segment

Modeled as a Helix Partition. Each can have multiple copies referred to as Replicas.

Table

Modeled as a Helix Resource. Multiple segments are grouped into a . All segments belonging to a Pinot Table have the same schema.

Controller

Embeds the Helix agent that drives the overall state of the cluster.

Server

is modeled as a Helix Participant and hosts .

Broker

Broker is modeled as a Helix Spectator that observes the cluster for changes in the state of segments and servers. In order to support multi-tenancy, brokers are also modeled as Helix Participants.

Minion

Pinot Minion is modeled as a Helix Participant.

Helix agents use Zookeeper to store and update configurations, as well as for distributed coordination. Zookeeper stores the following information about the cluster:

Resource

Stored Properties

Controller

  • The controller that is assigned as the current leader

Servers/Brokers

  • A list of servers/brokers and their configuration

  • Health status

Tables

  • List of tables

  • Table configurations

  • Table schema information

  • List of segments within a table

Segment

  • Exact server location(s) of a segment (routing table)

  • State of each segment (online/offline/error/consuming)

  • Meta data about each segment

Knowing the ZNode layout structure in Zookeeper for Helix agents in a cluster is useful for operations and/or troubleshooting cluster state and health.

Pinot's Zookeeper Browser UI

Controller

Pinot's controller acts as the driver of the cluster's overall state and health. Because of its role as a Helix participant and spectator, which drives the state of other components, it is the first component that is typically started after Zookeeper. Two parameters are required for starting a controller: Zookeeper address and cluster name. The controller will automatically create a cluster via Helix if it does not yet exist.

Fault tolerance

To achieve fault tolerance, one can start multiple controllers (typically three) and one of them will act as a leader. If the leader crashes or dies, another leader is automatically elected. Leader election is achieved using Apache Helix. Having at-least one controller is required to perform any DDL equivalent operation on the cluster, such as adding a table or a segment.

The controller does not interfere with query execution. Query execution is not impacted even when all controllers nodes are offline. If all controller nodes are offline, the state of the cluster will stay as it was when the last leader went down. When a new leader comes online, a cluster resumes re-balancing activity and can accept new tables or segments.

Controller REST interface

The controller provides a REST interface to perform CRUD operations on all logical storage resources (servers, brokers, tables, and segments).

See Pinot Data Explorer for more information on the web-based admin tool.

Broker

The responsibility of the broker is to route a given query to an appropriate server instance. A broker will collect and merge the responses from all servers into a final result and send it back to the requesting client. The broker provides HTTP endpoints that accept SQL queries and returns the response in JSON format.

Brokers need three key things to start.

  • Cluster name

  • Zookeeper address

  • Broker instance name

At the start, a broker registers as a Helix Participant and awaits notifications from other Helix agents. These notifications will be handled for table creation, a new segment being loaded, or a server starting up/or going down, in addition to any configuration changes.

Service Discovery/Routing Table

Irrespective of the kind of notification, the key responsibility of a broker is to maintain the query routing table. The query routing table is simply a mapping between segments and the servers that a segment resides on. Typically, a segment resides on more than one server. The broker computes multiple routing tables depending on the configured routing strategy for a table. The default strategy is to balance the query load across all available servers.

There are advanced routing strategies available such as ReplicaAware routing, partition-based routing, and minimal server selection routing. These strategies are meant for special or generic cases that are meant to serve very high throughput queries.

//This is an example ZNode config for EXTERNAL VIEW in Helix
{
  "id" : "baseballStats_OFFLINE",
  "simpleFields" : {
    ...
  },
  "mapFields" : {
    "baseballStats_OFFLINE_0" : {
      "Server_10.1.10.82_7000" : "ONLINE"
    }
  },
  ...
}

Query processing

For every query, a cluster's broker performs the following:

  • Fetches the routes that are computed for a query based on the routing strategy defined in a table's configuration.

  • Computes the list of segments to query from on each server.

  • Scatter-Gather: sends the requests to each server and gathers the responses.

  • Merge: merges the query results returned from each server.

  • Sends the query result to the client.

// Query: select count(*) from baseballStats limit 10

// RESPONSE
// ========
{
    "resultTable": {
        "dataSchema": {
            "columnDataTypes": ["LONG"],
            "columnNames": ["count(*)"]
        },
        "rows": [
            [97889]
        ]
    },
    "exceptions": [],
    "numServersQueried": 1,
    "numServersResponded": 1,
    "numSegmentsQueried": 1,
    "numSegmentsProcessed": 1,
    "numSegmentsMatched": 1,
    "numConsumingSegmentsQueried": 0,
    "numDocsScanned": 97889,
    "numEntriesScannedInFilter": 0,
    "numEntriesScannedPostFilter": 0,
    "numGroupsLimitReached": false,
    "totalDocs": 97889,
    "timeUsedMs": 5,
    "segmentStatistics": [],
    "traceInfo": {},
    "minConsumingFreshnessTimeMs": 0
}

Fault tolerance

Broker instances scale horizontally without an upper bound. In a majority of cases, only three brokers are required. If most query results that are returned to a client are <1MB in size per query, one can run a broker and servers inside the same instance container. This lowers the overall footprint of a cluster deployment for use cases that do not need to guarantee a strict SLA on query performance in production.

Server

Servers host segments and do most of the heavy lifting during query processing. Though the architecture shows that there are two kinds of servers, real-time and offline, a server does not really know if it's going to be a real-time server or an offline server. The responsibility of a server depends on the table assignment strategy.

In theory, a server can host both real-time segments and offline segments. However, in practice, we use different types of machine SKUs for real-time servers and offline servers. The advantage of separating real-time servers and offline servers is to allow each to scale independently.

Offline servers

Offline servers typically host segments that are immutable. In this case, segments are created outside of a cluster and uploaded via a shell-based curl request. Based on the replication factor and the segment assignment strategy, the controller picks one or more servers to host the segment. Servers are notified via Helix about the new segments. Servers fetch the segments from deep store and load them before being ready to serve query requests. At this point, the cluster's broker detects that new segments are available and starts including them in query responses.

Real-time servers

Real-time servers are different from the offline servers. Real-time server nodes ingest data from streaming sources, such as Kafka, and generate the indexed segments in-memory (flushing segments to disk periodically). In memory segments are also known as consuming segments. These consuming segments get flushed periodically based on completion threshold (based on number of rows, time or segment size). At this point, they are known as completed segments. Completed segments are similar to the offline server's segments. Queries go over the in-flight (consuming) segments and the completed segments.

Minion

Minion is an optional component and is not required to get started with Pinot. Minion is used for purging data from a Pinot cluster (for reasons such as GDPR compliance in the UK).

Data ingestion overview

Within Pinot, a logical table is modeled as one of two types of physical tables: offline or real-time. The reason for having two types of tables is because each one follows a different state model.

A real-time and offline table provide different configuration options for indexing and, in the case of real-time, the connector properties for the stream data source (i.e. Kafka). Table types also allow users to use different containers for real-time and offline server nodes. For instance, offline servers might use virtual machines with larger storage capacity where real-time servers might need higher system memory and/or more CPU cores.

The two types of tables also scale differently.

  • Real-time tables have a smaller retention period and scales query performance based on the ingestion rate.

  • Offline tables have larger retention and scales performance based on the size of stored data.

There are a few things to keep in mind when configuring the different types of tables for your workloads. When ingesting data from the same source, you can have two tables that ingest the same data that are configured differently for real-time and offline queries. Even though the two tables have the same data, performance will scale differently for queries based on your requirements. In this scenario, real-time and offline tables must share the same schema.

Tables for real-time and offline can be configured differently depending on usage requirements. For example, you can choose to enable star-tree indexing for an offline table, while the real-time table with the same schema may not need it.

Batch data flow

In batch mode, data is ingested into Pinot via an ingestion job. An ingestion job transforms a raw data source (such as a CSV file) into segments. Once segments are generated for the imported data, an ingestion job stores them into the cluster's segment store (a.k.a deep store) and notifies the controller. The notification is processed and the result is that the Helix agent on the controller updates the ideal state configuration in Zookeeper. Helix will then notify the offline server that there are new segments available. In response to the notification from the controller, the offline server downloads the newly created segments directly from the cluster's segment store. The cluster's broker, which watches for state changes in Helix, detects the new segments and adds them to the list of segments to query (segment-to-server routing table).

Real-time data flow

At table creation, a controller creates a new entry in Zookeeper for the consuming segment. Helix notices the new segment and notifies the real-time server, which starts consuming data from the streaming source. The broker, which watches for changes, detects the new segments and adds them to the list of segments to query (segment-to-server routing table).

Whenever the segment is complete (i.e. full), the real-time server notifies the Controller, which checks with all replicas and picks a winner to commit the segment to. The winner commits the segment and uploads it to the cluster's segment store, updating the state of the segment from "consuming" to "online". The controller then prepares a new segment in a "consuming" state.

Query overview

Queries are received by brokers—which checks the request against the segment-to-server routing table—scattering the request between real-time and offline servers.

Pinot query overview

The two tables then process the request by filtering and aggregating the queried data, which is then returned back to the broker. Finally, the broker gathers together all of the pieces of the query response and responds back to the client with the result.

Batch import example

Step-by-step guide on pushing your own data into the Pinot cluster

So far, we setup our cluster, ran some queries, explored the admin endpoints. Now, it's time to get our own data into Pinot. The rest of the instructions assume you're using (inside a pinot-quickstart container).

Preparing your data

Let's gather our data files and put it in pinot-quick-start/rawdata.

Supported file formats are CVS, JSON, AVRO, PARQUET, THRIFT, ORC. If you don't have sample data, you can use this sample CSV.

Creating a schema

Schema is used to define the columns and data types of the Pinot table. A detailed overview of the schema can be found in .

Briefly, we categorize our columns into 3 types

For example, in our sample table, the playerID, yearID, teamID, league, playerName columns are the dimensions, the playerStint, numberOfgames, numberOfGamesAsBatter, AtBatting, runs, hits, doules, triples, homeRuns, runsBattedIn, stolenBases, caughtStealing, baseOnBalls, strikeouts, intentionalWalks, hitsByPitch, sacrificeHits, sacrificeFlies, groundedIntoDoublePlays, G_old columns are the metrics and there is no time column.

Once you have identified the dimensions, metrics and time columns, create a schema for your data, using the reference below.

Creating a table config

A table config is used to define the config related to the Pinot table. A detailed overview of the table can be found in .

Here's the table config for the sample CSV file. You can use this as a reference to build your own table config. Simply edit the tableName and schemaName.

Uploading your table config and schema

Check the directory structure so far

Upload the table config using the following command

Check out the table config and schema in the to make sure it was successfully uploaded.

Creating a segment

A Pinot table's data is stored as Pinot segments. A detailed overview of the segment can be found in .

To generate a segment, we need to first create a job spec yaml file. JobSpec yaml file has all the information regarding data format, input data location and pinot cluster coordinates. You can just copy over this job spec file. If you're using your own data, be sure to 1) replace transcript with your table name 2) set the right recordReaderSpec

Use the following command to generate a segment and upload it

Sample output

Check that your segment made it to the table using the

Querying your data

You're all set! You should see your table in the and be able to run queries against it now.

0.3.0

0.3.0 release of Apache Pinot introduces the concept of plugins that makes it easy to extend and integrate with other systems.

What's the big change?

The reason behind the architectural change from the previous release (0.2.0) and this release (0.3.0), is the possibility of extending Apache Pinot. The 0.2.0 release was not flexible enough to support new storage types nor new stream types. Basically, inserting a new functionality required to change too much code. Thus, the Pinot team went through an extensive refactoring and improvement of the source code.

For instance, the picture below shows the module dependencies of the 0.2.X or previous releases. If we wanted to support a new storage type, we would have had to change several modules. Pretty bad, huh?

In order to conquer this challenge, below major changes are made:

  • Refactored common interfaces to pinot-spi module

  • Concluded four types of modules:

    • Pinot input format: How to read records from various data/file formats: e.g. Avro/CSV/JSON/ORC/Parquet/Thrift

    • Pinot filesystem: How to operate files on various filesystems: e.g. Azure Data Lake/Google Cloud Storage/S3/HDFS

    • Pinot stream ingestion: How to ingest data stream from various upstream systems, e.g. Kafka/Kinesis/Eventhub

    • Pinot batch ingestion: How to run Pinot batch ingestion jobs in various frameworks, like Standalone, Hadoop, Spark.

  • Built shaded jars for each individual plugin

  • Added support to dynamically load pinot plugins at server startup time

Now the architecture supports a plug-and-play fashion, where new tools can be supported with little and simple extensions, without affecting big chunks of code. Integrations with new streaming services and data formats can be developed in a much more simple and convenient way.

Notable New Features

  • SQL Support

    • Added Calcite SQL compiler

    • Added SQL response format (, )

    • Added support for GROUP BY with ORDER BY ()

    • Query console defaults to use SQL syntax ()

    • Support column alias (, )

    • Added SQL query endpoint: /query/sql ()

    • Support arithmetic operators ()

    • Support non-literal expressions for right-side operand in predicate comparison()

  • Added support for DISTINCT ()

  • Added support default value for BYTES column ()

  • JDK 11 Support

  • Added support to tune size vs accuracy for approximation aggregation functions: DistinctCountHLL, PercentileEst, PercentileTDigest ()

  • Added Data Anonymizer Tool ()

  • Deprecated pinot-hadoop and pinot-spark modules, replace with pinot-batch-ingestion-hadoop and pinot-batch-ingestion-spark

  • Support STRING and BYTES for no dictionary columns in realtime consuming segments ()

  • Make pinot-distribution to build a pinot-all jar and assemble it ()

  • Added support for PQL case insensitive ()

  • Enhanced TableRebalancer logics

    • Moved to new rebalance strategy ()

    • Supported rebalancing tables under any condition()

    • Supported reassigning completed segments along with Consuming segments for LLC realtime table ()

  • Added experimental support for Text Search‌ ()

  • Upgraded Helix to version 0.9.4, task management now works as expected ()

  • Added date_trunc transformation function. ()

  • Support schema evolution for consuming segment. ()

  • APIs Additions/Changes

    • Pinot Admin Command

      • Added -queryType option in PinotAdmin PostQuery subcommand ()

      • Added -schemaFile as option in AddTable command ()

      • Added OperateClusterConfig sub command in PinotAdmin ()

    • Pinot Controller Rest APIs

      • Get Table leader controller resource ()

      • Support HTTP POST/PUT to upload JSON encoded schema ()

      • Table rebalance API now requires both table name and type as parameters. ()

      • Refactored Segments APIs ()

      • Added segment batch deletion REST API ()

      • Update schema API to reload table on schema change when applicable ()

      • Enhance the task related REST APIs ()

      • Added PinotClusterConfig REST APIs ()

        • GET /cluster/configs

        • POST /cluster/configs

        • DELETE /cluster/configs/{configName}

  • Configurations Additions/Changes

    • Config: controller.host is now optional in Pinot Controller

    • Added instance config: queriesDisabled to disable query sending to a running server ()

    • Added broker config: pinot.broker.enable.query.limit.override configurable max query response size ()

    • Removed deprecated server configs ()

      • pinot.server.starter.enableSegmentsLoadingCheck

      • pinot.server.starter.timeoutInSeconds

      • pinot.server.instance.enable.shutdown.delay

      • pinot.server.instance.starter.maxShutdownWaitTime

      • pinot.server.instance.starter.checkIntervalTime

    • Decouple server instance id with hostname/port config. ()

    • Add FieldConfig to encapsulate encoding, indexing info for a field.()

Major Bug Fixes

  • Fixed the bug of releasing the segment when there are still threads working on it. ()

  • Fixed the bug of uneven task distribution for threads ()

  • Fixed encryption for .tar.gz segment file upload ()

  • Fixed controller rest API to download segment from non local FS. ()

  • Fixed the bug of not releasing segment lock if segment recovery throws exception ()

  • Fixed the issue of server not registering state model factory before connecting the Helix manager ()

  • Fixed the exception in server instance when Helix starts a new ZK session ()

  • Fixed ThreadLocal DocIdSet issue in ExpressionFilterOperator ()

  • Fixed the bug in default value provider classes ()

  • Fixed the bug when no segment exists in RealtimeSegmentSelector ()

Work in Progress

  • We are in the process of supporting text search query functionalities.

  • We are in the process of supporting null value (), currently limited query feature is supported

    • Added Presence Vector to represent null value ()

    • Added null predicate support for leaf predicates ()

Backward Incompatible Changes

  • It’s a disruptive upgrade from version 0.1.0 to this because of the protocol changes between Pinot Broker and Pinot Server. Please ensure that you upgrade to release 0.2.0 first, then upgrade to this version.

  • If you build your own startable or war without using scripts generated in Pinot-distribution module. For Java 8, an environment variable “plugins.dir” is required for Pinot to find out where to load all the Pinot plugin jars. For Java 11, plugins directory is required to be explicitly set into classpath. Please see pinot-admin.sh as an example.

  • As always, we recommend that you upgrade controllers first, and then brokers and lastly the servers in order to have zero downtime in production clusters.

  • Kafka 0.9 is no longer included in the release distribution.

  • Pull request introduces a backward incompatible API change for segments management.

    • Removed segment toggle APIs

    • Removed list all segments in cluster APIs

    • Deprecated below APIs:

      • GET /tables/{tableName}/segments

      • GET /tables/{tableName}/segments/metadata

      • GET /tables/{tableName}/segments/crc

      • GET /tables/{tableName}/segments/{segmentName}

      • GET /tables/{tableName}/segments/{segmentName}/metadata

      • GET /tables/{tableName}/segments/{segmentName}/reload

      • POST /tables/{tableName}/segments/{segmentName}/reload

      • GET /tables/{tableName}/segments/reload

      • POST /tables/{tableName}/segments/reload

  • Pull request deprecated below task related APIs:

    • GET:

      • /tasks/taskqueues: List all task queues

      • /tasks/taskqueuestate/{taskType} -> /tasks/{taskType}/state

      • /tasks/tasks/{taskType} -> /tasks/{taskType}/tasks

      • /tasks/taskstates/{taskType} -> /tasks/{taskType}/taskstates

      • /tasks/taskstate/{taskName} -> /tasks/task/{taskName}/taskstate

      • /tasks/taskconfig/{taskName} -> /tasks/task/{taskName}/taskconfig

    • PUT:

      • /tasks/scheduletasks -> POST /tasks/schedule

      • /tasks/cleanuptasks/{taskType} -> /tasks/{taskType}/cleanup

      • /tasks/taskqueue/{taskType}: Toggle a task queue

    • DELETE:

      • /tasks/taskqueue/{taskType} -> /tasks/{taskType}

  • Deprecated modules pinot-hadoop and pinot-spark and replaced with pinot-batch-ingestion-hadoop and pinot-batch-ingestion-spark.

  • Introduced new Pinot batch ingestion jobs and yaml based job specs to define segment generation jobs and segment push jobs.

  • You may see exceptions like below in pinot-brokers during cluster upgrade, but it's safe to ignore them.

Table

A table is a logical abstraction that represents a collection of related data. It is composed of columns and rows (known as documents in Pinot). The columns, data types, and other metadata related to the table are defined using a schema.

Pinot breaks a table into multiple segments and stores these segments in a deep-store such as HDFS as well as Pinot servers.

In the Pinot cluster, a table is modeled as a Helix resource and each segment of a table is modeled as a Helix Partition.

Pinot supports the following types of table:

Type
Description

Offline

Offline tables ingest pre-built pinot-segments from external data stores. This is generally used for batch ingestion.

Realtime

Realtime tables ingest data from streams (such as Kafka) and build segments from the consumed data.

Hybrid

A hybrid Pinot table has both realtime as well as offline tables under the hood. By default, all tables in Pinot are Hybrid in nature.

The user querying the database does not need to know the type of the table. They only need to specify the table name in the query.

e.g. regardless of whether we have an offline table myTable_OFFLINE, a real-time table myTable_REALTIME, or a hybrid table containing both of these, the query will be:

select count(*)
from myTable

Table Configuration is used to define the table properties, such as name, type, indexing, routing, retention etc. It is written in JSON format and is stored in Zookeeper, along with the table schema.

You can use the following properties to make your tables faster or leaner:

  • Segment

  • Indexing

  • Tenants

Segments

A table is comprised of small chunks of data. These chunks are known as Segments. To learn more about how Pinot creates and manages segments see the official documentation

For offline tables, Segments are built outside of pinot and uploaded using a distributed executor such as Spark or Hadoop. For more details, see Batch Ingestion.

For real-time tables, segments are built in a specific interval inside Pinot. You can tune the following for the real-time segments:

Flush

The Pinot real-time consumer ingests the data, creates the segment, and then flushes the in-memory segment to disk. Pinot allows you to configure when to flush the segment in the following ways:

  • Number of consumed rows - After consuming X no. of rows from the stream, Pinot will persist the segment to disk

  • Number of desired rows per segment - Pinot learns and then estimates the number of rows that need to be consumed so that the persisted segment is approximately the size. The learning phase starts by setting the number of rows to 100,000 (this value can be changed) and adjusts it to reach the desired segment size. The segment size may go significantly over the desired size during the learning phase. Pinot corrects the estimation as it goes along, so it is not guaranteed that the resulting completed segments are of the exact size as configured. You should set this value to optimize the performance of queries.

  • Max time duration to wait - Pinot consumers wait for the configured time duration after which segments are persisted to the disk.

Replicas A segment can have multiple replicas to provide higher availability. You can configure the number of replicas for a table segment using

Completion Mode By default, if the in-memory segment in the non-winner server is equivalent to the committed segment, then the non-winner server builds and replaces the segment. If the available segment is not equivalent to the committed segment, the server simply downloads the committed segment from the controller.

However, in certain scenarios, the segment build can get very memory intensive. It might be desirable to enforce the non-committer servers to just download the segment from the controller, instead of building it again. You can do this by setting completionMode: "DOWNLOAD" in the table configuration

For more details on why this is needed, see Completion Config

Download Scheme

A Pinot server may fail to download segments from the deep store such as HDFS after its completion. However, you can configure servers to download these segments from peer servers instead of the deep store. Currently, only HTTP and HTTPS download schemes are supported. More methods such as gRPC/Thrift can be added in the future.

For more details about peer segment download during real-time ingestion, please refer to this design doc on bypass deep store for segment completion.

Indexing

You can create multiple indices on a table to increase the performance of the queries. The following types of indices are supported:

  • Forward Index

    • Dictionary-encoded forward index with bit compression

    • Raw value forward index

    • Sorted forward index with run-length encoding

  • Inverted Index

    • Bitmap inverted index

    • Sorted inverted index

  • Star-tree Index

  • Range Index

  • Text Index

  • Geospatial

For more details on each indexing mechanism and corresponding configurations, see Indexing.

You can also set up Bloomfilters on columns to make queries faster. Further, you can also keep segments in off-heap instead of on-heap memory for faster queries.

Pre-aggregation

You can aggregate the real-time stream data as it is consumed to reduce segment sizes. We sum the metric column values of all rows that have the same dimensions and create a single row in the segment. This feature is only available on REALTIME tables.

The only supported aggregation is SUM. The columns on which pre-aggregation is to be done need to satisfy the following requirements:

  • All metrics should be listed in noDictionaryColumns .

  • There should not be any multi-value dimensions.

  • All dimension columns are treated to have a dictionary, even if they appear as noDictionaryColumns in the config.

The following table config snippet shows an example of enabling pre-aggregation during real-time ingestion.

pinot-table-realtime.json
    "tableIndexConfig": { 
      "noDictionaryColumns": ["metric1", "metric2"],
      "aggregateMetrics": true,
      ...
    }

Tenants

Each table is associated with a tenant. A segment resides on the server, which has the same tenant as itself. For more details on how tenants work, see Tenant.

You can also override if a table should move to a server with different tenant based on segment status.

A tagOverrideConfig can be added under the tenants section for realtime tables, to override tags for consuming and completed segments. For example:

  "broker": "brokerTenantName",
  "server": "serverTenantName",
  "tagOverrideConfig" : {
    "realtimeConsuming" : "serverTenantName_REALTIME"
    "realtimeCompleted" : "serverTenantName_OFFLINE"
  }
}

In the above example, the consuming segments will still be assigned to serverTenantName_REALTIME hosts, but once they are completed, the segments will be moved to serverTeantnName_OFFLINE. It is possible to specify the full name of any tag in this section (so, for example, you could decide that completed segments for this table should be in pinot servers tagged as allTables_COMPLETED). To learn more about this config, see the Moving Completed Segments section.

Hybrid Table

A hybrid table is a table composed of 2 tables, one offline and one real-time that share the same name. In such a table, offline segments may be pushed periodically. The retention on the offline table can be set to a high value since segments are coming in on a periodic basis, whereas the retention on the real-time part can be small.

Once an offline segment is pushed to cover a recent time period, the brokers automatically switch to using the offline table for segments for that time period and use the real-time table only for data not available in the offline table.

To understand how time boundary works in the case of a hybrid table, see Broker.

A typical scenario is pushing a deduped cleaned up data into an offline table every day while consuming real-time data as and when it arrives. The data can be kept in offline tables for even a few years while the real-time data would be cleaned every few days.

Examples

Create a table config for your data, or see examples for all possible batch/streaming tables.

Prerequisites

  1. Setup the cluster

  2. Create broker and server tenants

Offline Table Creation

docker run \
    --network=pinot-demo \
    --name pinot-batch-table-creation \
    ${PINOT_IMAGE} AddTable \
    -schemaFile examples/batch/airlineStats/airlineStats_schema.json \
    -tableConfigFile examples/batch/airlineStats/airlineStats_offline_table_config.json \
    -controllerHost pinot-controller \
    -controllerPort 9000 \
    -exec

Sample Console Output

Executing command: AddTable -tableConfigFile examples/batch/airlineStats/airlineStats_offline_table_config.json -schemaFile examples/batch/airlineStats/airlineStats_schema.json -controllerHost pinot-controller -controllerPort 9000 -exec
Sending request: http://pinot-controller:9000/schemas to controller: a413b0013806, version: Unknown
{"status":"Table airlineStats_OFFLINE succesfully added"}
bin/pinot-admin.sh AddTable \
    -schemaFile examples/batch/airlineStats/airlineStats_schema.json \
    -tableConfigFile examples/batch/airlineStats/airlineStats_offline_table_config.json \
    -exec
# add schema
curl -F schemaName=@airlineStats_schema.json  localhost:9000/schemas

# add table
curl -i -X POST -H 'Content-Type: application/json' \
    -d @airlineStats_offline_table_config.json localhost:9000/tables

Check out the table config in the Rest API to make sure it was successfully uploaded.

Streaming Table Creation

Start Kafka

docker run \
    --network pinot-demo --name=kafka \
    -e KAFKA_ZOOKEEPER_CONNECT=pinot-zookeeper:2181/kafka \
    -e KAFKA_BROKER_ID=0 \
    -e KAFKA_ADVERTISED_HOST_NAME=kafka \
    -d wurstmeister/kafka:latest

Create a Kafka Topic

docker exec \
  -t kafka \
  /opt/kafka/bin/kafka-topics.sh \
  --zookeeper pinot-zookeeper:2181/kafka \
  --partitions=1 --replication-factor=1 \
  --create --topic flights-realtime

Create a Streaming table

docker run \
    --network=pinot-demo \
    --name pinot-streaming-table-creation \
    ${PINOT_IMAGE} AddTable \
    -schemaFile examples/stream/airlineStats/airlineStats_schema.json \
    -tableConfigFile examples/docker/table-configs/airlineStats_realtime_table_config.json \
    -controllerHost pinot-controller \
    -controllerPort 9000 \
    -exec

Sample output

Executing command: AddTable -tableConfigFile examples/docker/table-configs/airlineStats_realtime_table_config.json -schemaFile examples/stream/airlineStats/airlineStats_schema.json -controllerHost pinot-controller -controllerPort 9000 -exec
Sending request: http://pinot-controller:9000/schemas to controller: 8fbe601012f3, version: Unknown
{"status":"Table airlineStats_REALTIME succesfully added"}

Start Kafka-Zookeeper

bin/pinot-admin.sh StartZookeeper -zkPort 2191

Start Kafka

bin/pinot-admin.sh  StartKafka -zkAddress=localhost:2191/kafka -port 19092

Create stream table

bin/pinot-admin.sh AddTable \
    -schemaFile examples/stream/airlineStats/airlineStats_schema.json \
    -tableConfigFile examples/stream/airlineStats/airlineStats_realtime_table_config.json \
    -exec

Check out the table config in the Rest API to make sure it was successfully uploaded.

Hybrid Table creation

"OFFLINE": {
    "tableName": "pinotTable", 
    "tableType": "OFFLINE", 
    "segmentsConfig": {
      ... 
    }, 
    "tableIndexConfig": { 
      ... 
    },  
    "tenants": {
      "broker": "myBrokerTenant", 
      "server": "myServerTenant"
    },
    "metadata": {
      ...
    }
  },
  "REALTIME": { 
    "tableName": "pinotTable", 
    "tableType": "REALTIME", 
    "segmentsConfig": {
      ...
    }, 
    "tableIndexConfig": { 
      ... 
      "streamConfigs": {
        ...
      },  
    },  
    "tenants": {
      "broker": "myBrokerTenant", 
      "server": "myServerTenant"
    },
    "metadata": {
    ...
    }
  }
}

Note that creating a hybrid table has to be done in 2 separate steps of creating an offline and real-time table individually.

0.7.1

This release introduced several awesome new features, including JSON index, lookup-based join support, geospatial support, TLS support for pinot connections, and various performance optimizations.

Summary

This release introduced several awesome new features, including JSON index, lookup-based join support, geospatial support, TLS support for pinot connections, and various performance optimizations and improvements.

It also adds several new APIs to better manage the segments and upload data to the offline table. It also contains many key bug fixes. See details below.

The release was cut from the following commit: 78152cd

and the following cherry-picks:

  • b527af3

  • 84d59e3

  • a18dc60

  • 4ec38f7

  • b48dac0

  • 5d2bc0c

  • 913492e

  • 50a4531

  • 1f21403

  • 8dbb70b

Notable New Features

  • Add a server metric: queriesDisabled to check if queries disabled or not. (#6586)

  • Optimization on GroupKey to save the overhead of ser/de the group keys (#6593) (#6559)

  • Support validation for jsonExtractKey and jsonExtractScalar functions (#6246) (#6594)

  • Real Time Provisioning Helper tool improvement to take data characteristics as input instead of an actual segment (#6546)

  • Add the isolation level config isolation.level to Kafka consumer (2.0) to ingest transactionally committed messages only (#6580)

  • Enhance StarTreeIndexViewer to support multiple trees (#6569)

  • Improves ADLSGen2PinotFS with service principal based auth, auto create container on initial run. It's backwards compatible with key based auth. (#6531)

  • Add metrics for minion tasks status (#6549)

  • Use minion data directory as tmp directory for SegmentGenerationAndPushTask to ensure directory is always cleaned up (#6560)

  • Add optional HTTP basic auth to pinot broker, which enables user- and table-level authentication of incoming queries. (#6552)

  • Add Access Control for REST endpoints of Controller (#6507)

  • Add date_trunc to scalar functions to support date_trunc during ingestion (#6538)

  • Allow tar gz with > 8gb size (#6533)

  • Add Lookup UDF Join support (#6530), (#6465), (#6383) (#6286)

  • Add cron scheduler metrics reporting (#6502)

  • Support generating derived column during segment load, so that derived columns can be added on-the-fly (#6494)

  • Support chained transform functions (#6495)

  • Add scalar function JsonPathArray to extract arrays from json (#6490)

  • Add a guard against multiple consuming segments for same partition (#6483)

  • Remove the usage of deprecated range delimiter (#6475)

  • Handle scheduler calls with proper response when it's disabled. (#6474)

  • Simplify SegmentGenerationAndPushTask handling getting schema and table config (#6469)

  • Add a cluster config to config number of concurrent tasks per instance for minion task: SegmentGenerationAndPushTaskGenerator (#6468)

  • Replace BrokerRequestOptimizer with QueryOptimizer to also optimize the PinotQuery (#6423)

  • Add additional string scalar functions (#6458)

  • Add additional scalar functions for array type (#6446)

  • Add CRON scheduler for Pinot tasks (#6451)

  • Set default Data Type while setting type in Add Schema UI dialog (#6452)

  • Add ImportData sub command in pinot admin (#6396)

  • H3-based geospatial index (#6409) (#6306)

  • Add JSON index support (#6408) (#6216) (#6346)

  • Make minion tasks pluggable via reflection (#6395)

  • Add compatibility test for segment operations upload and delete (#6382)

  • Add segment reset API that disables and then enables the segment (#6336)

  • Add Pinot minion segment generation and push task. (#6340)

  • Add a version option to pinot admin to show all the component versions (#6380)

  • Add FST index using lucene lib to speedup REGEXP_LIKE operator on text (#6120)

  • Add APIs for uploading data to an offline table. (#6354)

  • Allow the use of environment variables in stream configs (#6373)

  • Enhance task schedule api for single type/table support (#6352)

  • Add broker time range based pruner for routing. Query operators supported: RANGE, =, <, <=, >, >=, AND, OR(#6259)

  • Add json path functions to extract values from json object (#6347)

  • Create a pluggable interface for Table config tuner (#6255)

  • Add a Controller endpoint to return table creation time (#6331)

  • Add tooltips, ability to enable-disable table state to the UI (#6327)

  • Add Pinot Minion client (#6339)

  • Add more efficient use of RoaringBitmap in OnHeapBitmapInvertedIndexCreator and OffHeapBitmapInvertedIndexCreator (#6320)

  • Add decimal percentile support. (#6323)

  • Add API to get status of consumption of a table (#6322)

  • Add support to add offline and realtime tables, individually able to add schema and schema listing in UI (#6296)

  • Improve performance for distinct queries (#6285)

  • Allow adding custom configs during the segment creation phase (#6299)

  • Use sorted index based filtering only for dictionary encoded column (#6288)

  • Enhance forward index reader for better performance (#6262)

  • Support for text index without raw (#6284)

  • Add api for cluster manager to get table state (#6211)

  • Perf optimization for SQL GROUP BY ORDER BY (#6225)

  • Add support using environment variables in the format of ${VAR_NAME:DEFAULT_VALUE} in Pinot table configs. (#6271)

Special notes

  • Pinot controller metrics prefix is fixed to add a missing dot (#6499). This is a backward-incompatible change that JMX query on controller metrics must be updated

  • Legacy group key delimiter (\t) was removed to be backward-compatible with release 0.5.0 (#6589)

  • Upgrade zookeeper version to 3.5.8 to fix ZOOKEEPER-2184: Zookeeper Client should re-resolve hosts when connection attempts fail. (#6558)

  • Add TLS-support for client-pinot and pinot-internode connections (#6418) Upgrades to a TLS-enabled cluster can be performed safely and without downtime. To achieve a live-upgrade, go through the following steps:

    • First, configure alternate ingress ports for https/netty-tls on brokers, controllers, and servers. Restart the components with a rolling strategy to avoid cluster downtime.

    • Second, verify manually that https access to controllers and brokers is live. Then, configure all components to prefer TLS-enabled connections (while still allowing unsecured access). Restart the individual components.

    • Third, disable insecure connections via configuration. You may also have to set controller.vip.protocol and controller.vip.port and update the configuration files of any ingestion jobs. Restart components a final time and verify that insecure ingress via http is not available anymore.

  • PQL endpoint on Broker is deprecated (#6607)

    • Apache Pinot has adopted SQL syntax and semantics. Legacy PQL (Pinot Query Language) is deprecated and no longer supported. Please use SQL syntax to query Pinot on broker endpoint /query/sql and controller endpoint /sql

Major Bug fixes

  • Fix the SIGSEGV for large index (#6577)

  • Handle creation of segments with 0 rows so segment creation does not fail if data source has 0 rows. (#6466)

  • Fix QueryRunner tool for multiple runs (#6582)

  • Use URL encoding for the generated segment tar name to handle characters that cannot be parsed to URI. (#6571)

  • Fix a bug of miscounting the top nodes in StarTreeIndexViewer (#6569)

  • Fix the raw bytes column in real-time segment (#6574)

  • Fixes a bug to allow using JSON_MATCH predicate in SQL queries (#6535)

  • Fix the overflow issue when loading the large dictionary into the buffer (#6476)

  • Fix empty data table for distinct query (#6363)

  • Fix the default map return value in DictionaryBasedGroupKeyGenerator (#6712)

  • Fix log message in ControllerPeriodicTask (#6709)

  • Fix bug #6671: RealtimeTableDataManager shuts down SegmentBuildTimeLeaseExtender for all tables in the host (#6682)

  • Fix license headers and plugin checks

Ingest Parquet Files from S3 Using Spark

One of the primary advantage of using Pinot is its pluggable architecture. The plugins make it easy to add support for any third-party system which can be an execution framework, a filesystem or input format.

In this tutorial, we will use three such plugins to easily ingest data and push it to our pinot cluster. The plugins we will be using are -

  • pinot-batch-ingestion-spark

  • pinot-s3

  • pinot-parquet

You can check out Batch Ingestion, File systems and Input formats for all the available plugins.

Setup

We are using the following tools and frameworks for this tutorial -

  • Apache Spark 2.2.3 (Although any spark 2.X should work)

  • Apache Parquet 1.8.2

  • Amazon S3

  • Apache Pinot 0.4.0

Input Data

We need to get input data to ingest first. For our demo, we'll just create some small parquet files and upload them to our S3 bucket. The easiest way is to create CSV files and then convert them to parquet. CSV makes it human-readable and thus easier to modify input in case of some failure in our demo. We will call this file students.csv

timestampInEpoch,id,name,age,score
1597044264380,1,david,15,98
1597044264381,2,henry,16,97
1597044264382,3,katie,14,99
1597044264383,4,catelyn,15,96
1597044264384,5,emma,13,93
1597044264390,6,john,15,100
1597044264396,7,isabella,13,89
1597044264399,8,linda,17,91
1597044264502,9,mark,16,67
1597044264670,10,tom,14,78

Now, we'll create parquet files from the above CSV file using Spark. Since this is a small program, we will be using Spark shell instead of writing a full fledged Spark code.

scala> val df = spark.read.format("csv").option("header", true).load("path/to/students.csv")
scala> df.write.option("compression","none").mode("overwrite").parquet("/path/to/batch_input/")

The .parquet files can now be found in /path/to/batch_input directory. You can now upload this directory to S3 either using their UI or running the command

aws s3 cp /path/to/batch_input s3://my-bucket/batch-input/ --recursive

Create Schema and Table

We need to create a table to query the data that will be ingested. All tables in pinot are associated with a schema. You can check out Table configuration and Schema configuration for more details on creating configurations.

For our demo, we will have the following schema and table configs

student_schema.json
{
    "schemaName": "students",
    "dimensionFieldSpecs": [
        {
            "name": "id",
            "dataType": "INT"
        },
        {
            "name": "name",
            "dataType": "STRING"
        },
        {
            "name": "age",
            "dataType": "INT"
        }
    ],
    "metricFieldSpecs": [
        {
            "name": "score",
            "dataType": "INT"
        }
    ],
    "dateTimeFieldSpecs": [
        {
            "name": "timestampInEpoch",
            "dataType": "LONG",
            "format": "1:MILLISECONDS:EPOCH",
            "granularity": "1:MILLISECONDS"
        }
    ]
}
student_table.json
{
    "tableName": "students",
    "segmentsConfig": {
        "timeColumnName": "timestampInEpoch",
        "timeType": "MILLISECONDS",
        "replication": "1",
        "schemaName": "students"
    },
    "tableIndexConfig": {
        "invertedIndexColumns": [],
        "loadMode": "MMAP"
    },
    "tenants": {
        "broker": "DefaultTenant",
        "server": "DefaultTenant"
    },
    "tableType": "OFFLINE",
    "metadata": {}
}

We can now upload these configurations to pinot and create an empty table. We will be using pinot-admin.sh CLI for these purpose.

pinot-admin.sh AddTable -tableConfigFile /path/to/student_table.json -schemaFile /path/to/student_schema.json -controllerHost localhost -controllerPort 9000 -exec

You can check out Command-Line Interface (CLI) for all the available commands.

Our table will now be available in the Pinot data explorer

Ingest Data

Now that our data is available in S3 as well as we have the Tables in Pinot, we can start the process of ingesting the data. Data ingestion in Pinot involves the following steps -

  • Read data and generate compressed segment files from input

  • Upload the compressed segment files to output location

  • Push the location of the segment files to the controller

Once the location is available to the controller, it can notify the servers to download the segment files and populate the tables.

The above steps can be performed using any distributed executor of your choice such as Hadoop, Spark, Flink etc. For this demo we will be using Apache Spark to execute the steps.

Pinot provides runners for Spark out of the box. So as a user, you don't need to write a single line of code. You can write runners for any other executor using our provided interfaces.

Firstly, we will create a job spec configuration file for our data ingestion process.

spark_job_spec.yaml
executionFrameworkSpec:
  name: 'spark'
  segmentGenerationJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.spark.SparkSegmentGenerationJobRunner'
  segmentTarPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.spark.SparkSegmentTarPushJobRunner'
  segmentUriPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.spark.SparkSegmentUriPushJobRunner'
  segmentMetadataPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentMetadataPushJobRunner'

# jobType: Pinot ingestion job type.
# Supported job types are:
#   'SegmentCreation'
#   'SegmentTarPush'
#   'SegmentUriPush'
#   'SegmentCreationAndTarPush'
#   'SegmentCreationAndUriPush'
#   'SegmentCreationAndMetadataPush'
jobType: SegmentCreationAndMetadataPush
extraConfigs:
    stagingDir: s3://my-bucket/spark/staging/
inputDirURI: 's3://my-bucket/path/to/batch-input/'
outputDirURI: 's3:///my-bucket/path/to/batch-output/'
overwriteOutput: true
pinotFSSpecs:
  - scheme: s3
    className: org.apache.pinot.plugin.filesystem.S3PinotFS
    configs:    
      region: 'us-west-2'
recordReaderSpec:
  dataFormat: 'parquet'
  className: 'org.apache.pinot.plugin.inputformat.parquet.ParquetRecordReader'
tableSpec:
  tableName: 'students'
pinotClusterSpecs:
  - controllerURI: 'http://localhost:9000'
pushJobSpec:
  pushParallelism: 2
  pushAttempts: 2
  pushRetryIntervalMillis: 1000

In the job spec, we have kept execution framework as spark and configured the appropriate runners for each of our steps. We also need a temporary stagingDir for our spark job. This directory is cleaned up after our job has executed.

We also provide the S3 Filesystem and Parquet reader implementation in the config to use. You can refer Ingestion Job Spec for complete list of configuration.

We can now run our spark job to execute all the steps and populate data in pinot.

export PINOT_VERSION=0.8.0
export PINOT_DISTRIBUTION_DIR=/path/to/apache-pinot-${PINOT_VERSION}-bin

spark-submit //
--class org.apache.pinot.tools.admin.command.LaunchDataIngestionJobCommand //
--master local --deploy-mode client //
--conf "spark.driver.extraJavaOptions=-Dplugins.dir=${PINOT_DISTRIBUTION_DIR}/plugins -Dplugins.include=pinot-s3,pinot-parquet -Dlog4j2.configurationFile=${PINOT_DISTRIBUTION_DIR}/conf/pinot-ingestion-job-log4j2.xml" //
--conf "spark.driver.extraClassPath=${PINOT_DISTRIBUTION_DIR}/plugins/pinot-batch-ingestion/pinot-batch-ingestion-spark/pinot-batch-ingestion-spark-${PINOT_VERSION}-shaded.jar:${PINOT_DISTRIBUTION_DIR}/lib/pinot-all-${PINOT_VERSION}-jar-with-dependencies.jar:${PINOT_DISTRIBUTION_DIR}/plugins/pinot-file-system/pinot-s3/pinot-s3-${PINOT_VERSION}-shaded.jar:${PINOT_DISTRIBUTION_DIR}/plugins/pinot-input-format/pinot-parquet/pinot-parquet-${PINOT_VERSION}-shaded.jar" //
local://${PINOT_DISTRIBUTION_DIR}/lib/pinot-all-${PINOT_VERSION}-jar-with-dependencies.jar -jobSpecFile /path/to/spark_job_spec.yaml

In the command , we have included the JARs of all the required plugins in the spark's driver classpath. In practice, you only need to do this if you get a ClassNotFoundException.

Voila! Now our data is successfully ingested. Let's try to query it from Pinot's broker

bin/pinot-admin.sh PostQuery -brokerHost localhost -brokerPort 8000 -queryType sql -query "SELECT * FROM students LIMIT 10"

If everything went right, you should receive the following output

{
  "resultTable": {
    "dataSchema": {
      "columnNames": [
        "age",
        "id",
        "name",
        "score",
        "timestampInEpoch"
      ],
      "columnDataTypes": [
        "INT",
        "INT",
        "STRING",
        "INT",
        "LONG"
      ]
    },
    "rows": [
      [
        15,
        1,
        "david",
        98,
        1597044264380
      ],
      [
        16,
        2,
        "henry",
        97,
        1597044264381
      ],
      [
        14,
        3,
        "katie",
        99,
        1597044264382
      ],
      [
        15,
        4,
        "catelyn",
        96,
        1597044264383
      ],
      [
        13,
        5,
        "emma",
        93,
        1597044264384
      ],
      [
        15,
        6,
        "john",
        100,
        1597044264390
      ],
      [
        13,
        7,
        "isabella",
        89,
        1597044264396
      ],
      [
        17,
        8,
        "linda",
        91,
        1597044264399
      ],
      [
        16,
        9,
        "mark",
        67,
        1597044264502
      ],
      [
        14,
        10,
        "tom",
        78,
        1597044264670
      ]
    ]
  },
  "exceptions": [],
  "numServersQueried": 1,
  "numServersResponded": 1,
  "numSegmentsQueried": 1,
  "numSegmentsProcessed": 1,
  "numSegmentsMatched": 1,
  "numConsumingSegmentsQueried": 0,
  "numDocsScanned": 10,
  "numEntriesScannedInFilter": 0,
  "numEntriesScannedPostFilter": 50,
  "numGroupsLimitReached": false,
  "totalDocs": 10,
  "timeUsedMs": 11,
  "segmentStatistics": [],
  "traceInfo": {},
  "minConsumingFreshnessTimeMs": 0
}
mkdir -p /tmp/pinot-quick-start/rawdata
/tmp/pinot-quick-start/rawdata/transcript.csv
studentID,firstName,lastName,gender,subject,score,timestampInEpoch
200,Lucy,Smith,Female,Maths,3.8,1570863600000
200,Lucy,Smith,Female,English,3.5,1571036400000
201,Bob,King,Male,Maths,3.2,1571900400000
202,Nick,Young,Male,Physics,3.6,1572418800000

Column Type

Description

Dimensions

Typically used in filters and group by, for slicing and dicing into data

Metrics

Typically used in aggregations, represents the quantitative data

Time

Optional column, represents the timestamp associated with each row

/tmp/pinot-quick-start/transcript-schema.json
{
  "schemaName": "transcript",
  "dimensionFieldSpecs": [
    {
      "name": "studentID",
      "dataType": "INT"
    },
    {
      "name": "firstName",
      "dataType": "STRING"
    },
    {
      "name": "lastName",
      "dataType": "STRING"
    },
    {
      "name": "gender",
      "dataType": "STRING"
    },
    {
      "name": "subject",
      "dataType": "STRING"
    }
  ],
  "metricFieldSpecs": [
    {
      "name": "score",
      "dataType": "FLOAT"
    }
  ],
  "dateTimeFieldSpecs": [{
    "name": "timestampInEpoch",
    "dataType": "LONG",
    "format" : "1:MILLISECONDS:EPOCH",
    "granularity": "1:MILLISECONDS"
  }]
}
/tmp/pinot-quick-start/transcript-table-offline.json
{
  "tableName": "transcript",
  "segmentsConfig" : {
    "timeColumnName": "timestampInEpoch",
    "timeType": "MILLISECONDS",
    "replication" : "1",
    "schemaName" : "transcript"
  },
  "tableIndexConfig" : {
    "invertedIndexColumns" : [],
    "loadMode"  : "MMAP"
  },
  "tenants" : {
    "broker":"DefaultTenant",
    "server":"DefaultTenant"
  },
  "tableType":"OFFLINE",
  "metadata": {}
}
$ ls /tmp/pinot-quick-start
rawdata			transcript-schema.json	transcript-table-offline.json

$ ls /tmp/pinot-quick-start/rawdata 
transcript.csv
docker run --rm -ti \
    --network=pinot-demo \
    -v /tmp/pinot-quick-start:/tmp/pinot-quick-start \
    --name pinot-batch-table-creation \
    apachepinot/pinot:latest AddTable \
    -schemaFile /tmp/pinot-quick-start/transcript-schema.json \
    -tableConfigFile /tmp/pinot-quick-start/transcript-table-offline.json \
    -controllerHost pinot-quickstart \
    -controllerPort 9000 -exec
bin/pinot-admin.sh AddTable \
  -tableConfigFile /tmp/pinot-quick-start/transcript-table-offline.json \
  -schemaFile /tmp/pinot-quick-start/transcript-schema.json -exec
/tmp/pinot-quick-start/docker-job-spec.yml
executionFrameworkSpec:
  name: 'standalone'
  segmentGenerationJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner'
  segmentTarPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentTarPushJobRunner'
  segmentUriPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentUriPushJobRunner'
jobType: SegmentCreationAndTarPush
inputDirURI: '/tmp/pinot-quick-start/rawdata/'
includeFileNamePattern: 'glob:**/*.csv'
outputDirURI: '/tmp/pinot-quick-start/segments/'
overwriteOutput: true
pinotFSSpecs:
  - scheme: file
    className: org.apache.pinot.spi.filesystem.LocalPinotFS
recordReaderSpec:
  dataFormat: 'csv'
  className: 'org.apache.pinot.plugin.inputformat.csv.CSVRecordReader'
  configClassName: 'org.apache.pinot.plugin.inputformat.csv.CSVRecordReaderConfig'
tableSpec:
  tableName: 'transcript'
  schemaURI: 'http://pinot-quickstart:9000/tables/transcript/schema'
  tableConfigURI: 'http://pinot-quickstart:9000/tables/transcript'
pinotClusterSpecs:
  - controllerURI: 'http://pinot-quickstart:9000'
/tmp/pinot-quick-start/batch-job-spec.yml
executionFrameworkSpec:
  name: 'standalone'
  segmentGenerationJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner'
  segmentTarPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentTarPushJobRunner'
  segmentUriPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentUriPushJobRunner'
jobType: SegmentCreationAndTarPush
inputDirURI: '/tmp/pinot-quick-start/rawdata/'
includeFileNamePattern: 'glob:**/*.csv'
outputDirURI: '/tmp/pinot-quick-start/segments/'
overwriteOutput: true
pinotFSSpecs:
  - scheme: file
    className: org.apache.pinot.spi.filesystem.LocalPinotFS
recordReaderSpec:
  dataFormat: 'csv'
  className: 'org.apache.pinot.plugin.inputformat.csv.CSVRecordReader'
  configClassName: 'org.apache.pinot.plugin.inputformat.csv.CSVRecordReaderConfig'
tableSpec:
  tableName: 'transcript'
  schemaURI: 'http://localhost:9000/tables/transcript/schema'
  tableConfigURI: 'http://localhost:9000/tables/transcript'
pinotClusterSpecs:
  - controllerURI: 'http://localhost:9000'
docker run --rm -ti \
    --network=pinot-demo \
    -v /tmp/pinot-quick-start:/tmp/pinot-quick-start \
    --name pinot-data-ingestion-job \
    apachepinot/pinot:latest LaunchDataIngestionJob \
    -jobSpecFile /tmp/pinot-quick-start/docker-job-spec.yml
bin/pinot-admin.sh LaunchDataIngestionJob \
    -jobSpecFile /tmp/pinot-quick-start/batch-job-spec.yml
SegmentGenerationJobSpec: 
!!org.apache.pinot.spi.ingestion.batch.spec.SegmentGenerationJobSpec
excludeFileNamePattern: null
executionFrameworkSpec: {extraConfigs: null, name: standalone, segmentGenerationJobRunnerClassName: org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner,
  segmentTarPushJobRunnerClassName: org.apache.pinot.plugin.ingestion.batch.standalone.SegmentTarPushJobRunner,
  segmentUriPushJobRunnerClassName: org.apache.pinot.plugin.ingestion.batch.standalone.SegmentUriPushJobRunner}
includeFileNamePattern: glob:**\/*.csv
inputDirURI: /tmp/pinot-quick-start/rawdata/
jobType: SegmentCreationAndTarPush
outputDirURI: /tmp/pinot-quick-start/segments
overwriteOutput: true
pinotClusterSpecs:
- {controllerURI: 'http://localhost:9000'}
pinotFSSpecs:
- {className: org.apache.pinot.spi.filesystem.LocalPinotFS, configs: null, scheme: file}
pushJobSpec: null
recordReaderSpec: {className: org.apache.pinot.plugin.inputformat.csv.CSVRecordReader,
  configClassName: org.apache.pinot.plugin.inputformat.csv.CSVRecordReaderConfig,
  configs: null, dataFormat: csv}
segmentNameGeneratorSpec: null
tableSpec: {schemaURI: 'http://localhost:9000/tables/transcript/schema', tableConfigURI: 'http://localhost:9000/tables/transcript',
  tableName: transcript}

Trying to create instance for class org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner
Initializing PinotFS for scheme file, classname org.apache.pinot.spi.filesystem.LocalPinotFS
Finished building StatsCollector!
Collected stats for 4 documents
Using fixed bytes value dictionary for column: studentID, size: 9
Created dictionary for STRING column: studentID with cardinality: 3, max length in bytes: 3, range: 200 to 202
Using fixed bytes value dictionary for column: firstName, size: 12
Created dictionary for STRING column: firstName with cardinality: 3, max length in bytes: 4, range: Bob to Nick
Using fixed bytes value dictionary for column: lastName, size: 15
Created dictionary for STRING column: lastName with cardinality: 3, max length in bytes: 5, range: King to Young
Created dictionary for FLOAT column: score with cardinality: 4, range: 3.2 to 3.8
Using fixed bytes value dictionary for column: gender, size: 12
Created dictionary for STRING column: gender with cardinality: 2, max length in bytes: 6, range: Female to Male
Using fixed bytes value dictionary for column: subject, size: 21
Created dictionary for STRING column: subject with cardinality: 3, max length in bytes: 7, range: English to Physics
Created dictionary for LONG column: timestampInEpoch with cardinality: 4, range: 1570863600000 to 1572418800000
Start building IndexCreator!
Finished records indexing in IndexCreator!
Finished segment seal!
Converting segment: /var/folders/3z/qn6k60qs6ps1bb6s2c26gx040000gn/T/pinot-1583443148720/output/transcript_OFFLINE_1570863600000_1572418800000_0 to v3 format
v3 segment location for segment: transcript_OFFLINE_1570863600000_1572418800000_0 is /var/folders/3z/qn6k60qs6ps1bb6s2c26gx040000gn/T/pinot-1583443148720/output/transcript_OFFLINE_1570863600000_1572418800000_0/v3
Deleting files in v1 segment directory: /var/folders/3z/qn6k60qs6ps1bb6s2c26gx040000gn/T/pinot-1583443148720/output/transcript_OFFLINE_1570863600000_1572418800000_0
Starting building 1 star-trees with configs: [StarTreeV2BuilderConfig[splitOrder=[studentID, firstName],skipStarNodeCreation=[],functionColumnPairs=[org.apache.pinot.core.startree.v2.AggregationFunctionColumnPair@3a48efdc],maxLeafRecords=1]] using OFF_HEAP builder
Starting building star-tree with config: StarTreeV2BuilderConfig[splitOrder=[studentID, firstName],skipStarNodeCreation=[],functionColumnPairs=[org.apache.pinot.core.startree.v2.AggregationFunctionColumnPair@3a48efdc],maxLeafRecords=1]
Generated 3 star-tree records from 4 segment records
Finished constructing star-tree, got 9 tree nodes and 4 records under star-node
Finished creating aggregated documents, got 6 aggregated records
Finished building star-tree in 10ms
Finished building 1 star-trees in 27ms
Computed crc = 3454627653, based on files [/var/folders/3z/qn6k60qs6ps1bb6s2c26gx040000gn/T/pinot-1583443148720/output/transcript_OFFLINE_1570863600000_1572418800000_0/v3/columns.psf, /var/folders/3z/qn6k60qs6ps1bb6s2c26gx040000gn/T/pinot-1583443148720/output/transcript_OFFLINE_1570863600000_1572418800000_0/v3/index_map, /var/folders/3z/qn6k60qs6ps1bb6s2c26gx040000gn/T/pinot-1583443148720/output/transcript_OFFLINE_1570863600000_1572418800000_0/v3/metadata.properties, /var/folders/3z/qn6k60qs6ps1bb6s2c26gx040000gn/T/pinot-1583443148720/output/transcript_OFFLINE_1570863600000_1572418800000_0/v3/star_tree_index, /var/folders/3z/qn6k60qs6ps1bb6s2c26gx040000gn/T/pinot-1583443148720/output/transcript_OFFLINE_1570863600000_1572418800000_0/v3/star_tree_index_map]
Driver, record read time : 0
Driver, stats collector time : 0
Driver, indexing time : 0
Tarring segment from: /var/folders/3z/qn6k60qs6ps1bb6s2c26gx040000gn/T/pinot-1583443148720/output/transcript_OFFLINE_1570863600000_1572418800000_0 to: /var/folders/3z/qn6k60qs6ps1bb6s2c26gx040000gn/T/pinot-1583443148720/output/transcript_OFFLINE_1570863600000_1572418800000_0.tar.gz
Size for segment: transcript_OFFLINE_1570863600000_1572418800000_0, uncompressed: 6.73KB, compressed: 1.89KB
Trying to create instance for class org.apache.pinot.plugin.ingestion.batch.standalone.SegmentTarPushJobRunner
Initializing PinotFS for scheme file, classname org.apache.pinot.spi.filesystem.LocalPinotFS
Start pushing segments: [/tmp/pinot-quick-start/segments/transcript_OFFLINE_1570863600000_1572418800000_0.tar.gz]... to locations: [org.apache.pinot.spi.ingestion.batch.spec.PinotClusterSpec@243c4f91] for table transcript
Pushing segment: transcript_OFFLINE_1570863600000_1572418800000_0 to location: http://localhost:9000 for table transcript
Sending request: http://localhost:9000/v2/segments?tableName=transcript to controller: nehas-mbp.hsd1.ca.comcast.net, version: Unknown
Response for pushing table transcript segment transcript_OFFLINE_1570863600000_1572418800000_0 to location http://localhost:9000 - 200: {"status":"Successfully uploaded segment: transcript_OFFLINE_1570863600000_1572418800000_0 of table: transcript"}
Pinot running in Docker
Schema
Table
Rest API
Segment
Rest API
Query Console
2020/03/09 23:37:19.879 ERROR [HelixTaskExecutor] [CallbackProcessor@b808af5-pinot] [pinot-broker] [] Message cannot be processed: 78816abe-5288-4f08-88c0-f8aa596114fe, {CREATE_TIMESTAMP=1583797034542, MSG_ID=78816abe-5288-4f08-88c0-f8aa596114fe, MSG_STATE=unprocessable, MSG_SUBTYPE=REFRESH_SEGMENT, MSG_TYPE=USER_DEFINE_MSG, PARTITION_NAME=fooBar_OFFLINE, RESOURCE_NAME=brokerResource, RETRY_COUNT=0, SRC_CLUSTER=pinot, SRC_INSTANCE_TYPE=PARTICIPANT, SRC_NAME=Controller_hostname.domain,com_9000, TGT_NAME=Broker_hostname,domain.com_6998, TGT_SESSION_ID=f6e19a457b80db5, TIMEOUT=-1, segmentName=fooBar_559, tableName=fooBar_OFFLINE}{}{}
java.lang.UnsupportedOperationException: Unsupported user defined message sub type: REFRESH_SEGMENT
      at org.apache.pinot.broker.broker.helix.TimeboundaryRefreshMessageHandlerFactory.createHandler(TimeboundaryRefreshMessageHandlerFactory.java:68) ~[pinot-broker-0.2.1172.jar:0.3.0-SNAPSHOT-c9d88e47e02d799dc334d7dd1446a38d9ce161a3]
      at org.apache.helix.messaging.handling.HelixTaskExecutor.createMessageHandler(HelixTaskExecutor.java:1096) ~[helix-core-0.9.1.509.jar:0.9.1.509]
      at org.apache.helix.messaging.handling.HelixTaskExecutor.onMessage(HelixTaskExecutor.java:866) [helix-core-0.9.1.509.jar:0.9.1.509]
#4694
#4877
#4602
#4994
#5016
#5033
#4964
#5018
#5070
#4535
#4583
#4666
#4747
#4791
#4977
#4983
#4695
#4990
#5015
#4993
#5020
#4740
#4954
#4726
#4959
#5073
#4545
#4639
#4824
#4806
#4828
#4838
#5054
#5073
#4767
#5040
#4903
#4995
#5006
#4764
#4793
#4855
#4808
#4882
#4929
#4976
#5114
#5137
#5138
#4230
#4585
#4943
#4806
#5054
0.2.0 and before Pinot Module Dependency Diagram
Dependency graph after introducing pinot-plugin in 0.3.0
segment
table
Server
segments

Querying JSON data

To see how JSON data can be queried, assume that we have the following table:

Table myTable:
  id        INTEGER
  jsoncolumn    JSON 

Table data:
101,{"name":{"first":"daffy"\,"last":"duck"}\,"score":101\,"data":["a"\,"b"\,"c"\,"d"]}
102,{"name":{"first":"donald"\,"last":"duck"}\,"score":102\,"data":["a"\,"b"\,"e"\,"f"]}
103,{"name":{"first":"mickey"\,"last":"mouse"}\,"score":103\,"data":["a"\,"b"\,"g"\,"h"]}
104,{"name":{"first":"minnie"\,"last":"mouse"}\,"score":104\,"data":["a"\,"b"\,"i"\,"j"]}
105,{"name":{"first":"goofy"\,"last":"dwag"}\,"score":104\,"data":["a"\,"b"\,"i"\,"j"]}
106,{"person":{"name":"daffy duck"\,"companies":[{"name":"n1"\,"title":"t1"}\,{"name":"n2"\,"title":"t2"}]}}
107,{"person":{"name":"scrooge mcduck"\,"companies":[{"name":"n1"\,"title":"t1"}\,{"name":"n2"\,"title":"t2"}]}}

We also assume that "jsoncolumn" has a Json Index on it. Note that the last two rows in the table have different structure than the rest of the rows. In keeping with JSON specification, a JSON column can contain any valid JSON data and doesn't need to adhere to a predefined schema. To pull out the entire JSON document for each row, we can run the query below:

SELECT id, jsoncolumn 
FROM myTable
id
jsoncolumn

"101"

"{"name":{"first":"daffy","last":"duck"},"score":101,"data":["a","b","c","d"]}"

102"

"{"name":{"first":"donald","last":"duck"},"score":102,"data":["a","b","e","f"]}

"103"

"{"name":{"first":"mickey","last":"mouse"},"score":103,"data":["a","b","g","h"]}

"104"

"{"name":{"first":"minnie","last":"mouse"},"score":104,"data":["a","b","i","j"]}"

"105"

"{"name":{"first":"goofy","last":"dwag"},"score":104,"data":["a","b","i","j"]}"

"106"

"{"person":{"name":"daffy duck","companies":[{"name":"n1","title":"t1"},{"name":"n2","title":"t2"}]}}"

"107"

"{"person":{"name":"scrooge mcduck","companies":[{"name":"n1","title":"t1"},{"name":"n2","title":"t2"}]}}"

To drill down and pull out specific keys within the JSON column, we simply append the JsonPath expression of those keys to the end of the column name.

SELECT id, jsoncolumn.name.last, jsoncolumn.name.first, jsoncolumn.data[1] 
FROM myTable
id
jsoncolumn.name.last
jsoncolumn.name.first
jsoncolumn.data[1]

"101"

"duck"

"daffy"

"b"

"102"

"duck"

"donald"

"b"

"103"

"mouse"

"mickey"

"b"

"104"

"mouse"

"minnie"

"b"

"105"

"dwag"

"goofy"

"b"

"106"

"null"

"null"

"null"

"107"

"null"

"null"

"null"

Note that the third column (jsoncolumn.data[1]) is null for rows with id 106 and 107. This is because these rows have JSON documents that don't have a key with JsonPath jsoncolumn.data[1]. We can filter out these rows.

SELECT id, jsoncolumn.name.last, jsoncolumn.name.first, jsoncolumn.data[1] 
FROM myTable 
WHERE jsoncolumn.data[1] IS NOT NULL
id
jsoncolumn.name.last
jsoncolumn.name.first
jsoncolumn.data[1]

"101"

"duck"

"daffy"

"b"

"102"

"duck"

"donald"

"b"

"103"

"mouse"

"mickey"

"b"

"104"

"mouse"

"minnie"

"b"

"105"

"dwag"

"goofy"

"b"

Notice that certain last names (duck and mouse for example) repeat in the data above. We can get a count of each last name by running a GROUP BY query on a JsonPath expression.

SELECT jsoncolumn.name.last, count(*) 
FROM myTable 
WHERE jsoncolumn.data[1] IS NOT NULL 
GROUP BY jsoncolumn.name.last 
ORDER BY 2 DESC
jsoncolumn.name.last
count(*)

"mouse"

"2"

"duck"

"2"

"dwag"

"1"

Also there is numerical information (jsconcolumn.score) embeded within the JSON document. We can extract those numerical values from JSON data into SQL and sum them up using the query below.

SELECT jsoncolumn.name.last, sum(jsoncolumn.score) 
FROM myTable 
WHERE jsoncolumn.name.last IS NOT NULL 
GROUP BY jsoncolumn.name.last
jsoncolumn.name.last
sum(jsoncolumn.score)

"mouse"

"207"

"dwag"

"104"

"duck"

"203"

In short, JSON querying support in Pinot will allow you to use a JsonPath expression whereever you can use a column name with the only difference being that to query a column with data type JSON, you must append a JsonPath expression after the name of the column.

Running Pinot in Kubernetes

Pinot quick start in Kubernetes

1. Prerequisites

This quickstart assumes that you already have a running Kubernetes cluster. Please follow the links below to set up a Kubernetes cluster.

  • (make sure to run with enough resources e.g. minikube start --vm=true --cpus=4 --memory=8g --disk-size=50g)

2. Setting up a Pinot cluster in Kubernetes

Before continuing, please make sure that you've downloaded Apache Pinot. The scripts for the setup in this guide can be found in our open source project on GitHub.

The scripts can be found in the Pinot source at ./pinot/kubernetes/helm

2.1 Start Pinot with Helm

Pinot repo has pre-packaged HelmCharts for Pinot and Presto. Helm Repo index file is .

NOTE: Please specify StorageClass based on your cloud vendor. For Pinot Server, please don't mount blob store like AzureFile/GoogleCloudStorage/S3 as the data serving file system.

Only use Amazon EBS/GCP Persistent Disk/Azure Disk style disks.

  • For AWS: "gp2"

  • For GCP: "pd-ssd" or "standard"

  • For Azure: "AzureDisk"

  • For Docker-Desktop: "hostpath"

2.1.1 Update helm dependency

2.1.2 Start Pinot with Helm

  • For Helm v2.12.1

If your Kubernetes cluster is recently provisioned, ensure Helm is initialized by running:

Then deploy a new HA Pinot cluster using the following command:

  • For Helm v3.0.0

2.1.3 Troubleshooting (For helm v2.12.1)

  • Error: Please run the below command if encountering the following issue:

  • Resolution:

  • Error: Please run the command below if encountering a permission issue:

Error: release pinot failed: namespaces "pinot-quickstart" is forbidden: User "system:serviceaccount:kube-system:default" cannot get resource "namespaces" in API group "" in the namespace "pinot-quickstart"

  • Resolution:

2.2 Check Pinot deployment status

3. Load data into Pinot using Kafka

3.1 Bring up a Kafka cluster for real-time data ingestion

3.2 Check Kafka deployment status

Ensure the Kafka deployment is ready before executing the scripts in the following next steps.

3.3 Create Kafka topics

The scripts below will create two Kafka topics for data ingestion:

3.4 Load data into Kafka and create Pinot schema/tables

The script below will deploy 3 batch jobs.

  • Ingest 19492 JSON messages to Kafka topic flights-realtime at a speed of 1 msg/sec

  • Ingest 19492 Avro messages to Kafka topic flights-realtime-avro at a speed of 1 msg/sec

  • Upload Pinot schema airlineStats

  • Create Pinot table airlineStats to ingest data from JSON encoded Kafka topic flights-realtime

  • Create Pinot table airlineStatsAvro to ingest data from Avro encoded Kafka topic flights-realtime-avro

4. Query using Pinot Data Explorer

4.1 Pinot Data Explorer

Please use the script below to perform local port-forwarding, which will also open Pinot query console in your default web browser.

This script can be found in the Pinot source at ./pinot/kubernetes/helm/pinot

5. Using Superset to query Pinot

5.1 Bring up Superset

Open superset.yaml file and goto the line showing storageClass. And change it based on your cloud vendor. kubectl get sc will get you the storageClass value for your Kubernetes system. E.g.

  • For AWS: "gp2"

  • For GCP: "pd-ssd" or "standard"

  • For Azure: "AzureDisk"

  • For Docker-Desktop: "hostpath"

Then run:

Ensure your cluster is up by running:

5.2 (First time) Set up Admin account

5.3 (First time) Init Superset

5.4 Load Demo data source

5.5 Access Superset UI

You can run below command to navigate superset in your browser with the previous admin credential.

You can open the imported dashboard by clicking Dashboards banner and then click on AirlineStats.

6. Access Pinot using Trino

6.1 Deploy Trino

You can run the command below to deploy Trino with the Pinot plugin installed.

The above command adds Trino HelmChart repo. You can then run the below command to see the charts.

In order to connect Trino to Pinot, we need to add Pinot catalog, which requires extra configurations. You can run the below command to get all the configurable values.

To add Pinot catalog, you can edit the additionalCatalogs section by adding:

Pinot is deployed at namespace pinot-quickstart, so the controller serviceURL is pinot-controller.pinot-quickstart:9000

After modifying the /tmp/trino-values.yaml file, you can deploy Trino with:

Once you deployed the Trino, You can check Trino deployment status by:

6.2 Query Trino using Trino CLI

Once Trino is deployed, you can run the below command to get a runnable Trino CLI.

6.2.1 Download Trino CLI

6.2.2 Port forward Trino service to your local if it's not already exposed

6.2.3 Use Trino console client to connect to Trino service

6.2.4 Query Pinot data using Trino CLI

6.3 Sample queries to execute

  • List all catalogs

  • List All tables

  • Show schema

  • Count total documents

7. Access Pinot using Presto

7.1 Deploy Presto using Pinot plugin

You can run the command below to deploy a customized Presto with the Pinot plugin installed.

The above command deploys Presto with default configs. For customizing your deployment, you can run the below command to get all the configurable values.

After modifying the /tmp/presto-values.yaml file, you can deploy Presto with:

Once you deployed the Presto, You can check Presto deployment status by:

7.2 Query Presto using Presto CLI

Once Presto is deployed, you can run the below command from , or just follow steps 6.2.1 to 6.2.3.

6.2.1 Download Presto CLI

6.2.2 Port forward presto-coordinator port 8080 to localhost port 18080

6.2.3 Start Presto CLI with pinot catalog to query it then query it

6.2.4 Query Pinot data using Presto CLI

7.3 Sample queries to execute

  • List all catalogs

  • List All tables

  • Show schema

  • Count total documents

8. Deleting the Pinot cluster in Kubernetes

0.8.0

This release introduced several new features, including compatibility tests, enhanced complex type and Json support, partial upsert support, and new stream ingestion plugins.

Summary

This release introduced several awesome new features, including compatibility tests, enhanced complex type and Json support, partial upsert support, and new stream ingestion plugins (AWS Kinesis, Apache Pulsar). It contains a lot of query enhancements such as new timestamp and boolean type support and flexible numerical column comparison. It also includes many key bug fixes. See details below.

The release was cut from the following commit: fe83e95aa9124ee59787c580846793ff7456eaa5

and the following cherry-picks:

Notable New Features

  • Extract time handling for SegmentProcessorFramework ()

  • Add Apache Pulsar low level and high level connector ()

  • Enable parallel builds for compat checker ()

  • Add controller/server API to fetch aggregated segment metadata ()

  • Support Dictionary Based Plan For DISTINCT ()

  • Provide HTTP client to kinesis builder ()

  • Add datetime function with 2 arguments ()

  • Adding ability to check ingestion status for Offline Pinot table ()

  • Add timestamp datatype support in JDBC ()

  • Allow updating controller and broker helix hostname ()

  • Cancel running Kinesis consumer tasks when timeout occurs ()

  • Implement Append merger for partial upsert ()

    `* SegmentProcessorFramework Enhancement ()

  • Added TaskMetricsEmitted periodic controler job ()

  • Support json path expressions in query. ()

  • Support data preprocessing for AVRO and ORC formats ()

  • Add partial upsert config and mergers ()

  • Add support for range index rule recommendation(#7034) ()

  • Allow reloading consuming segment by default ()

  • Add LZ4 Compression Codec (#6804) ([#7035](

    ))

  • Make Pinot JDK 11 Compilable (\

  • Introduce in-Segment Trim for GroupBy OrderBy Query ()

  • Produce GenericRow file in segment processing mapper ()

  • Add ago() scalar transform function ()

  • Add Bloom Filter support for IN predicate(#7005) ()

  • Add genericRow file reader and writer ()

  • Normalize LHS and RHS numerical types for >, >=, <, and <= operators. ()

  • Add Kinesis Stream Ingestion Plugin ()

  • feature/#6766 JSON and Startree index information in API ()

  • Support null value fields in generic row ser/de ()

  • Implement PassThroughTransformOperator to optimize select queries(#6972) ()

  • Optimize TIME_CONVERT/DATE_TIME_CONVERT predicates ()

  • Prefetch call to fetch buffers of columns seen in the query ()

  • Enabling compatibility tests in the script ()

  • Add collectionToJsonMode to schema inference ()

  • Add the complex-type support to decoder/reader ()

  • Adding a new Controller API to retrieve ingestion status for realtime… ()

  • Add support for Long in Modulo partition function. ()

  • Enhance PinotSegmentRecordReader to preserve null values ()

  • add complex-type support to avro-to-pinot schema inference ()

  • Add correct yaml files for real time data(#6787) ()

  • Add complex-type transformation to offline segment creation ()

  • Add config File support(#6787) ()

  • Enhance JSON index to support nested array ()

  • Add debug endpoint for tables. ()

  • JSON column datatype support. ()

  • Allow empty string in MV column ()

  • Add Zstandard compression support with JMH benchmarking(#6804) ()

  • Normalize LHS and RHS numerical types for = and != operator. ()

  • Change ConcatCollector implementation to use off-heap ()

  • [PQL Deprecation] Clean up the old BrokerRequestOptimizer ()

  • [PQL Deprecation] Do not compile PQL broker request for SQL query ()

  • Add TIMESTAMP and BOOLEAN data type support ()

  • Add admin endpoint for Pinot Minon. ()

  • Remove the usage of PQL compiler ()

  • Add endpoints in Pinot Controller, Broker and Server to get system and application configs. ()

  • Support IN predicate in ColumnValue SegmentPruner(#6756) ()

  • Enable adding new segments to a upsert-enabled realtime table ()

  • Interface changes for Kinesis connector ()

  • Pinot Minion SegmentGenerationAndPush task: PinotFS configs inside taskSpec is always temporary and has higher priority than default PinotFS created by the minion server configs ()

  • DataTable V3 implementation and measure data table serialization cost on server ()

  • add uploadLLCSegment endpoint in TableResource ()

  • File-based SegmentWriter implementation ()

  • Basic Auth for pinot-controller ()

  • UI integration with Authentication API and added login page ()

  • Support data ingestion for offline segment in one pass ()

  • SumPrecision: support all data types and star-tree ()

  • complete compatibility regression testing ()

  • Kinesis implementation Part 1: Rename partitionId to partitionGroupId ()

  • Make Pinot metrics pluggable ()

  • Recover the segment from controller when LLC table cannot load it ()

  • Adding a new API for validating specified TableConfig and Schema ()

  • Introduce a metric for query/response size on broker. ()

  • Adding a controller periodic task to clean up dead minion instances ()

  • Adding new validation for Json, TEXT indexing ()

  • Always return a response from query execution. ()

Special notes

  • After the 0.8.0 release, we will officially support jdk 11, and can now safely start to use jdk 11 features. Code is still compilable with jdk 8 ()

  • RealtimeToOfflineSegmentsTask config has some backward incompatible changes ()

    — timeColumnTransformFunction is removed (backward-incompatible, but rollup is not supported anyway)

    — Deprecate collectorType and replace it with mergeType

    — Add roundBucketTimePeriod and partitionBucketTimePeriod to config the time bucket for round and partition

  • Regex path for pluggable MinionEventObserverFactory is changed from org.apache.pinot.*.event.* to org.apache.pinot.*.plugin.minion.tasks.* ()

  • Moved all pinot built-in minion tasks to the pinot-minion-builtin-tasks module and package them into a shaded jar ()

  • Reloading consuming segment flag pinot.server.instance.reload.consumingSegment will be true by default ()

  • Move JSON decoder from pinot-kafka to pinot-json package. ()

  • Backward incompatible schema change through controller rest API PUT /schemas/{schemaName} will be blocked. ()

  • Deprecated /tables/validateTableAndSchema in favor of the new configs/validate API and introduced new APIs for /tableConfigs to operate on the realtime table config, offline table config and schema in one shot. ()

Major Bug fixes

  • Fix race condition in MinionInstancesCleanupTask ()

  • Fix custom instance id for controller/broker/minion ()

  • Fix UpsertConfig JSON deserialization. ()

  • Fix the memory issue for selection query with large limit ()

  • Fix the deleted segments directory not exist warning ()

  • Fixing docker build scripts by providing JDK_VERSION as parameter ()

  • Misc fixes for json data type ()

  • Fix handling of date time columns in query recommender(#7018) ()

  • fixing pinot-hadoop and pinot-spark test ()

  • Fixing HadoopPinotFS listFiles method to always contain scheme ()

  • fixed GenericRow compare for different _fieldToValueMap size ()

  • Fix NPE in NumericalFilterOptimizer due to IS NULL and IS NOT NULL operator. ()

  • Fix the race condition in realtime text index refresh thread (#6858) ()

  • Fix deep store directory structure ()

  • Fix NPE issue when consumed kafka message is null or the record value is null. ()

  • Mitigate calcite NPE bug. ()

  • Fix the exception thrown in the case that a specified table name does not exist (#6328) ()

  • Fix CAST transform function for chained transforms ()

  • Fixed failing pinot-controller npm build ()

Query Response Format

Standard-SQL response

Response is returned in a SQL-like tabular structure. Note, this is the response returned from the standard-SQL endpoint. For PQL endpoint response, skip to

PQL response

Note

PQL endpoint is deprecated, and will soon be removed. The standard sql endpoint is the recommended endpoint.

The response received from PQL endpoint is different depending on the type of the query.

668b5e0
ee887b9
c2f7fcc
c1ac8a1
4da1dae
573651b
c6c407d
0d96c7f
c2637d1
#7158
#7026
#7149
#7102
#7141
#7148
#7116
#7070
#7117
#7064
#7109
#7087
#7092
#7091
#6998
#7062
#6899
#7063
#7078
https://github.com/apache/pinot/pull/7035
#6424
#6991
#7013
#6820
#7007
#6997
#6927
#6661
#6873
#6968
#6973
#6957
#6967
#6959
#6946
#6945
#6890
#6929
#6922
#6928
#6916
#6914
#6901
#6877
#6897
#6878
#6879
#6876
#6811
#6847
#6859
#6855
#6719
#6822
#6808
#6817
#6776
#6567
#6667
#6744
#6710
#6653
#6718
#6613
#6686
#6479
#6668
#6650
#6655
#6640
#6647
#6620
#6590
#6543
#6541
#6596
#6424
#7158
#6980
#6618
#7078
#7021
#6737
#6840
#7122
#7127
#7125
#7112
#7097
#7095
#7057
#7031
#7030
#7027
#6964
#7001
#6990
#6976
#6950
#6908
#6765
#6941
#6795
$ curl -X POST \
  -d '{"sql":"SELECT SUM(moo), MAX(bar), COUNT(*) FROM myTable"}' \
  localhost:8099/query/sql -H "Content-Type: application/json" 
{
  "exceptions": [], 
  "minConsumingFreshnessTimeMs": 0, 
  "numConsumingSegmentsQueried": 0, 
  "numDocsScanned": 6, 
  "numEntriesScannedInFilter": 0, 
  "numEntriesScannedPostFilter": 12, 
  "numGroupsLimitReached": false, 
  "numSegmentsMatched": 2, 
  "numSegmentsProcessed": 2, 
  "numSegmentsQueried": 2, 
  "numServersQueried": 1, 
  "numServersResponded": 1, 
  "resultTable": {
    "dataSchema": {
      "columnDataTypes": [
        "DOUBLE", 
        "DOUBLE", 
        "LONG"
      ], 
      "columnNames": [
        "sum(moo)", 
        "max(bar)", 
        "count(*)"
      ]
    }, 
    "rows": [
      [
        62335, 
        2019.0, 
        6
      ]
    ]
  }, 
  "segmentStatistics": [], 
  "timeUsedMs": 87, 
  "totalDocs": 6, 
  "traceInfo": {}
}
$ curl -X POST \
  -d '{"sql":"SELECT SUM(moo), MAX(bar) FROM myTable GROUP BY foo ORDER BY foo"}' \
  localhost:8099/query/sql -H "Content-Type: application/json" 
{
  "exceptions": [], 
  "minConsumingFreshnessTimeMs": 0, 
  "numConsumingSegmentsQueried": 0, 
  "numDocsScanned": 6, 
  "numEntriesScannedInFilter": 0, 
  "numEntriesScannedPostFilter": 18, 
  "numGroupsLimitReached": false, 
  "numSegmentsMatched": 2, 
  "numSegmentsProcessed": 2, 
  "numSegmentsQueried": 2, 
  "numServersQueried": 1, 
  "numServersResponded": 1, 
  "resultTable": {
    "dataSchema": {
      "columnDataTypes": [
        "STRING", 
        "DOUBLE", 
        "DOUBLE"
      ], 
      "columnNames": [
        "foo", 
        "sum(moo)", 
        "max(bar)"
      ]
    }, 
    "rows": [
      [
        "abc", 
        560.0, 
        2008.0
      ], 
      [
        "pqr", 
        21760.0, 
        2010.0
      ], 
      [
        "xyz", 
        40015.0, 
        2019.0
      ]
    ]
  }, 
  "segmentStatistics": [], 
  "timeUsedMs": 15, 
  "totalDocs": 6, 
  "traceInfo": {}
}

Response Field

Description

resultTable

This contains everything needed to process the response

resultTable.dataSchema

This describes schema of the response (columnNames and their dataTypes)

resultTable.dataSchema.columnNames

columnNames in the response.

resultTable.dataSchema.columnDataTypes

DataTypes for each column

resultTable.rows

Actual content with values. This is an array of arrays. number of rows depends on the limit value in the query. The number of columns in each row is equal to the length of (resultTable.dataSchema.columnNames)

timeUsedms

Total time taken as seen by the broker before sending the response back to the client

totalDocs

This is number of documents/records in the table

numServersQueried

represents the number of servers queried by the broker (note that this may be less than the total number of servers since broker can apply some optimizations to minimize the number of servers)

numServersResponded

This should be equal to the numServersQueried. If this is not the same, then one of more servers might have timed out. If numServersQueried != numServersResponded the results can be considered partial and clients can retry the query with exponential back off.

numSegmentsQueried

Total number of segmentsQueried for this query. it may be less than the total number of segments since broker can apply optimizations.

numSegmentsMatched

This is the number of segments actually processed. This indicates the effectiveness of pruning logic (based on partitioning, time etc).

numSegmentsProcessed

Actual number of segments that were processed. This is where the majority of the time is spent.

numDocScanned

The number of docs/records that were scanned to process the query. This includes the docs scanned in filter phase (this can be zero if columns in query are indexed) and post filter.

numEntriesScannedInFilter

This along with numEntriesScannedInPostFilter should give an idea on where most of the time is spent during query processing. If this is high, enabling indexing for columns in tableConfig can be one way to bring it down.

numEntriesScannedPostFilter

This along with numEntriesScannedInPostFilter should give an idea on where most of the time is spent during query processing. A high number for this means the selectivity is low (i.e. pinot needs to scan a lot of records to answer the query). If this is high, adding regular inverted/bitmap index will not help. However, consider using start-tree index.

numGroupsLimitReached

If the query has group by clause and top K, pinot drops new entries after the numGroupsLimit is reached. If this boolean is set to true then the query result may not be accurate. Note that the default value for numGroupsLimit is 100k and should be sufficient for most use cases.

exceptions

Will contain the stack trace if there is any exception processing the query.

segmentStatistics

N/A

traceInfo

If trace is enabled (can be enabled for each query), this will contain the timing for each stage and each segment. Advanced feature and intended for dev/debugging purposes

curl -X POST \
  -d '{"pql":"select * from flights limit 3"}' \
  http://localhost:8099/query


{
 "selectionResults":{
    "columns":[
       "Cancelled",
       "Carrier",
       "DaysSinceEpoch",
       "Delayed",
       "Dest",
       "DivAirports",
       "Diverted",
       "Month",
       "Origin",
       "Year"
    ],
    "results":[
       [
          "0",
          "AA",
          "16130",
          "0",
          "SFO",
          [],
          "0",
          "3",
          "LAX",
          "2014"
       ],
       [
          "0",
          "AA",
          "16130",
          "0",
          "LAX",
          [],
          "0",
          "3",
          "SFO",
          "2014"
       ],
       [
          "0",
          "AA",
          "16130",
          "0",
          "SFO",
          [],
          "0",
          "3",
          "LAX",
          "2014"
       ]
    ]
 },
 "traceInfo":{},
 "numDocsScanned":3,
 "aggregationResults":[],
 "timeUsedMs":10,
 "segmentStatistics":[],
 "exceptions":[],
 "totalDocs":102
}
curl -X POST \
  -d '{"pql":"select count(*) from flights"}' \
  http://localhost:8099/query


{
 "traceInfo":{},
 "numDocsScanned":17,
 "aggregationResults":[
    {
       "function":"count_star",
       "value":"17"
    }
 ],
 "timeUsedMs":27,
 "segmentStatistics":[],
 "exceptions":[],
 "totalDocs":17
}
curl -X POST \
  -d '{"pql":"select count(*) from flights group by Carrier"}' \
  http://localhost:8099/query


{
 "traceInfo":{},
 "numDocsScanned":23,
 "aggregationResults":[
    {
       "groupByResult":[
          {
             "value":"10",
             "group":["AA"]
          },
          {
             "value":"9",
             "group":["VX"]
          },
          {
             "value":"4",
             "group":["WN"]
          }
       ],
       "function":"count_star",
       "groupByColumns":["Carrier"]
    }
 ],
 "timeUsedMs":47,
 "segmentStatistics":[],
 "exceptions":[],
 "totalDocs":23
}
PQL endpoint response
$ curl -H "Content-Type: application/json" -X POST \
   -d '{"sql":"SELECT moo, bar, foo FROM myTable ORDER BY foo DESC"}' \
   http://localhost:8099/query/sql
{
  "exceptions": [], 
  "minConsumingFreshnessTimeMs": 0, 
  "numConsumingSegmentsQueried": 0, 
  "numDocsScanned": 6, 
  "numEntriesScannedInFilter": 0, 
  "numEntriesScannedPostFilter": 18, 
  "numGroupsLimitReached": false, 
  "numSegmentsMatched": 2, 
  "numSegmentsProcessed": 2, 
  "numSegmentsQueried": 2, 
  "numServersQueried": 1, 
  "numServersResponded": 1, 
  "resultTable": {
    "dataSchema": {
      "columnDataTypes": [
        "LONG",
        "INT",
        "STRING"
      ], 
      "columnNames": [
        "moo", 
        "bar",
        "foo"
      ]
    }, 
    "rows": [
      [ 
        40015, 
        2019,
        "xyz"
      ], 
      [
        1002,
        2001,
        "pqr"
      ], 
      [
        20555,
        1988,
        "pqr"
      ],
      [ 
        203,
        2010,
        "pqr"
      ], 
      [
        500,
        2008,
        "abc"
      ], 
      [
        60, 
        2003,
        "abc"
      ]
    ]
  }, 
  "segmentStatistics": [], 
  "timeUsedMs": 4, 
  "totalDocs": 6, 
  "traceInfo": {}
}
kubectl apply -f helm-rbac.yaml
kubectl get all -n pinot-quickstart
helm repo add incubator https://charts.helm.sh/incubator
helm install -n pinot-quickstart kafka incubator/kafka --set replicas=1
helm repo add incubator https://charts.helm.sh/incubator
helm install --namespace "pinot-quickstart"  --name kafka incubator/kafka
kubectl get all -n pinot-quickstart | grep kafka
pod/kafka-0                                                 1/1     Running     0          2m
pod/kafka-zookeeper-0                                       1/1     Running     0          10m
pod/kafka-zookeeper-1                                       1/1     Running     0          9m
pod/kafka-zookeeper-2                                       1/1     Running     0          8m
kubectl -n pinot-quickstart exec kafka-0 -- kafka-topics --zookeeper kafka-zookeeper:2181 --topic flights-realtime --create --partitions 1 --replication-factor 1
kubectl -n pinot-quickstart exec kafka-0 -- kafka-topics --zookeeper kafka-zookeeper:2181 --topic flights-realtime-avro --create --partitions 1 --replication-factor 1
kubectl apply -f pinot/pinot-realtime-quickstart.yml
./query-pinot-data.sh
kubectl apply -f superset.yaml
kubectl get all -n pinot-quickstart | grep superset
kubectl exec -it pod/superset-0 -n pinot-quickstart -- bash -c 'flask fab create-admin'
kubectl exec -it pod/superset-0 -n pinot-quickstart -- bash -c 'superset db upgrade'
kubectl exec -it pod/superset-0 -n pinot-quickstart -- bash -c 'superset init'
kubectl exec -it pod/superset-0 -n pinot-quickstart -- bash -c 'superset import_datasources -p /etc/superset/pinot_example_datasource.yaml'
kubectl exec -it pod/superset-0 -n pinot-quickstart -- bash -c 'superset import_dashboards -p /etc/superset/pinot_example_dashboard.json'
./open-superset-ui.sh
helm repo add trino https://trinodb.github.io/charts/
helm search repo trino
helm inspect values trino/trino > /tmp/trino-values.yaml
additionalCatalogs:
  pinot: |
    connector.name=pinot
    pinot.controller-urls=pinot-controller.pinot-quickstart:9000
kubectl create ns trino-quickstart
helm install my-trino trino/trino --version 0.2.0 -n trino-quickstart --values /tmp/trino-values.yaml
kubectl get pods -n trino-quickstart
curl -L https://repo1.maven.org/maven2/io/trino/trino-cli/363/trino-cli-363-executable.jar -o /tmp/trino && chmod +x /tmp/trino
echo "Visit http://127.0.0.1:18080 to use your application"
kubectl port-forward service/my-trino 18080:8080 -n trino-quickstart
/tmp/trino --server localhost:18080 --catalog pinot --schema default
trino:default> show catalogs;
  Catalog
---------
 pinot
 system
 tpcds
 tpch
(4 rows)

Query 20211025_010256_00002_mxcvx, FINISHED, 2 nodes
Splits: 36 total, 36 done (100.00%)
0.70 [0 rows, 0B] [0 rows/s, 0B/s]
trino:default> show tables;
    Table
--------------
 airlinestats
(1 row)

Query 20211025_010326_00003_mxcvx, FINISHED, 3 nodes
Splits: 36 total, 36 done (100.00%)
0.28 [1 rows, 29B] [3 rows/s, 104B/s]
trino:default> DESCRIBE airlinestats;
        Column        |      Type      | Extra | Comment
----------------------+----------------+-------+---------
 flightnum            | integer        |       |
 origin               | varchar        |       |
 quarter              | integer        |       |
 lateaircraftdelay    | integer        |       |
 divactualelapsedtime | integer        |       |
 divwheelsons         | array(integer) |       |
 divwheelsoffs        | array(integer) |       |
......

Query 20211025_010414_00006_mxcvx, FINISHED, 3 nodes
Splits: 36 total, 36 done (100.00%)
0.37 [79 rows, 5.96KB] [212 rows/s, 16KB/s]
trino:default> select count(*) as cnt from airlinestats limit 10;
 cnt
------
 9746
(1 row)

Query 20211025_015607_00009_mxcvx, FINISHED, 2 nodes
Splits: 17 total, 17 done (100.00%)
0.24 [1 rows, 9B] [4 rows/s, 38B/s]
helm install presto pinot/presto -n pinot-quickstart
kubectl apply -f presto-coordinator.yaml
helm inspect values pinot/presto > /tmp/presto-values.yaml
helm install presto pinot/presto -n pinot-quickstart --values /tmp/presto-values.yaml
kubectl get pods -n pinot-quickstart
./pinot-presto-cli.sh
curl -L https://repo1.maven.org/maven2/com/facebook/presto/presto-cli/0.246/presto-cli-0.246-executable.jar -o /tmp/presto-cli && chmod +x /tmp/presto-cli
kubectl port-forward service/presto-coordinator 18080:8080 -n pinot-quickstart> /dev/null &
/tmp/presto-cli --server localhost:18080 --catalog pinot --schema default
presto:default> show catalogs;
 Catalog
---------
 pinot
 system
(2 rows)

Query 20191112_050827_00003_xkm4g, FINISHED, 1 node
Splits: 19 total, 19 done (100.00%)
0:01 [0 rows, 0B] [0 rows/s, 0B/s]
presto:default> show tables;
    Table
--------------
 airlinestats
(1 row)

Query 20191112_050907_00004_xkm4g, FINISHED, 1 node
Splits: 19 total, 19 done (100.00%)
0:01 [1 rows, 29B] [1 rows/s, 41B/s]
presto:default> DESCRIBE pinot.dontcare.airlinestats;
        Column        |  Type   | Extra | Comment
----------------------+---------+-------+---------
 flightnum            | integer |       |
 origin               | varchar |       |
 quarter              | integer |       |
 lateaircraftdelay    | integer |       |
 divactualelapsedtime | integer |       |
......

Query 20191112_051021_00005_xkm4g, FINISHED, 1 node
Splits: 19 total, 19 done (100.00%)
0:02 [80 rows, 6.06KB] [35 rows/s, 2.66KB/s]
presto:default> select count(*) as cnt from pinot.dontcare.airlinestats limit 10;
 cnt
------
 9745
(1 row)

Query 20191112_051114_00006_xkm4g, FINISHED, 1 node
Splits: 17 total, 17 done (100.00%)
0:00 [1 rows, 8B] [2 rows/s, 19B/s]
kubectl delete ns pinot-quickstart
Enable Kubernetes on Docker-Desktop
Install Minikube for local setup
Setup a Kubernetes Cluster using Amazon Elastic Kubernetes Service (Amazon EKS)
Setup a Kubernetes Cluster using Google Kubernetes Engine (GKE)
Setup a Kubernetes Cluster using Azure Kubernetes Service (AKS)
here
helm repo add pinot https://raw.githubusercontent.com/apache/pinot/master/kubernetes/helm
kubectl create ns pinot-quickstart
helm install pinot pinot/pinot \
    -n pinot-quickstart \
    --set cluster.name=pinot \
    --set server.replicaCount=2
helm dependency update
helm init --service-account tiller
helm install --namespace "pinot-quickstart" --name "pinot" pinot
kubectl create ns pinot-quickstart
helm install -n pinot-quickstart pinot pinot
Error: could not find tiller.
kubectl -n kube-system delete deployment tiller-deploy
kubectl -n kube-system delete service/tiller-deploy
helm init --service-account tiller
here
Sample Output of K8s Deployment Status
# checkout pinot
git clone https://github.com/apache/pinot.git
cd pinot/kubernetes/helm

Text search support

This page talks about support for text search functionality in Pinot.

Why do we need text search?

Pinot supports super fast query processing through its indexes on non-BLOB like columns. Queries with exact match filters are run efficiently through a combination of dictionary encoding, inverted index and sorted index. An example:

SELECT COUNT(*) FROM Foo WHERE STRING_COL = "ABCDCD" AND INT_COL > 2000

In the above query, we are doing exact match on two columns of type STRING and INT respectively.

For arbitrary text data which falls into the BLOB/CLOB territory, we need more than exact matches. Users are interested in doing regex, phrase, fuzzy queries on BLOB like data. Before 0.3.0, one had to use regexp_like to achieve this. However, this was scan based which was not performant and features like fuzzy search (edit distance search) were not possible.

In version 0.3.0, we added support for text indexes to efficiently do arbitrary search on STRING columns where each column value is a large BLOB of text. This can be achieved by using the new built-in function TEXT_MATCH.

SELECT COUNT(*) FROM Foo WHERE TEXT_MATCH (<column_name>, <search_expression)

where <column_name> is the column text index is created on and <search_expression> can be:

Search Expression Type

Example

Phrase query

TEXT_MATCH (<column_name>, '\"distributed system\"')

Term Query

TEXT_MATCH (<column_name>, 'Java')

Boolean Query

TEXT_MATCH (<column_name>, 'Java and c++')

Prefix Query

TEXT_MATCH (<column_name>, 'stream*')

Regex Query

TEXT_MATCH (<column_name>, '/Exception.*/')

Sample Datasets

Text search should ideally be used on STRING columns where doing standard filter operations (EQUALITY, RANGE, BETWEEN) doesn't fit the bill because each column value is a reasonably large blob of text.

Apache Access Log

Consider the following snippet from Apache access log. Each line in the log consists of arbitrary data (IP addresses, URLs, timestamps, symbols etc) and represents a column value. Data like this is a good candidate for doing text search.

Let's say the following snippet of data is stored in ACCESS_LOG_COL column in Pinot table.

109.169.248.247 - - [12/Dec/2015:18:25:11 +0100] "GET /administrator/ HTTP/1.1" 200 4263 "-" "Mozilla/5.0 (Windows NT 6.0; rv:34.0) Gecko/20100101 Firefox/34.0" "-
109.169.248.247 - - [12/Dec/2015:18:25:11 +0100] "POST /administrator/index.php HTTP/1.1" 200 4494 "http://almhuette-raith.at/administrator/" "Mozilla/5.0 (Windows NT 6.0; rv:34.0) Gecko/20100101 Firefox/34.0" "-"
46.72.177.4 - - [12/Dec/2015:18:31:08 +0100] "GET /administrator/ HTTP/1.1" 200 4263 "-" "Mozilla/5.0 (Windows NT 6.0; rv:34.0) Gecko/20100101 Firefox/34.0" "-"
46.72.177.4 - - [12/Dec/2015:18:31:08 +0100] "POST /administrator/index.php HTTP/1.1" 200 4494 "http://almhuette-raith.at/administrator/" "Mozilla/5.0 (Windows NT 6.0; rv:34.0) Gecko/20100101 Firefox/34.0" "-"
83.167.113.100 - - [12/Dec/2015:18:31:25 +0100] "GET /administrator/ HTTP/1.1" 200 4263 "-" "Mozilla/5.0 (Windows NT 6.0; rv:34.0) Gecko/20100101 Firefox/34.0" "-"
83.167.113.100 - - [12/Dec/2015:18:31:25 +0100] "POST /administrator/index.php HTTP/1.1" 200 4494 "http://almhuette-raith.at/administrator/" "Mozilla/5.0 (Windows NT 6.0; rv:34.0) Gecko/20100101 Firefox/34.0" "-"
95.29.198.15 - - [12/Dec/2015:18:32:10 +0100] "GET /administrator/ HTTP/1.1" 200 4263 "-" "Mozilla/5.0 (Windows NT 6.0; rv:34.0) Gecko/20100101 Firefox/34.0" "-"
95.29.198.15 - - [12/Dec/2015:18:32:11 +0100] "POST /administrator/index.php HTTP/1.1" 200 4494 "http://almhuette-raith.at/administrator/" "Mozilla/5.0 (Windows NT 6.0; rv:34.0) Gecko/20100101 Firefox/34.0" "-"
109.184.11.34 - - [12/Dec/2015:18:32:56 +0100] "GET /administrator/ HTTP/1.1" 200 4263 "-" "Mozilla/5.0 (Windows NT 6.0; rv:34.0) Gecko/20100101 Firefox/34.0" "-"
109.184.11.34 - - [12/Dec/2015:18:32:56 +0100] "POST /administrator/index.php HTTP/1.1" 200 4494 "http://almhuette-raith.at/administrator/" "Mozilla/5.0 (Windows NT 6.0; rv:34.0) Gecko/20100101 Firefox/34.0" "-"
91.227.29.79 - - [12/Dec/2015:18:33:51 +0100] "GET /administrator/ HTTP/1.1" 200 4263 "-" "Mozilla/5.0 (Windows NT 6.0; rv:34.0) Gecko/20100101 Firefox/34.0" "-"

Few examples of search queries on this data:

Count the number of GET requests.

SELECT COUNT(*) FROM MyTable WHERE TEXT_MATCH(ACCESS_LOG_COL, 'GET')

Count the number of POST requests that have administrator in the URL (administrator/index)

SELECT COUNT(*) FROM MyTable WHERE TEXT_MATCH(ACCESS_LOG_COL, 'post AND administrator AND index')

Count the number of POST requests that have a particular URL and handled by Firefox browser

SELECT COUNT(*) FROM MyTable WHERE TEXT_MATCH(ACCESS_LOG_COL, 'post AND administrator AND index AND firefox')

Resume text

Consider another example of simple resume text. Each line in the file represents skill-data from resumes of different candidates

Let's say the following snippet of data is stored in SKILLS_COL column in Pinot table. Each line in the input text represents a column value.

Distributed systems, Java, C++, Go, distributed query engines for analytics and data warehouses, Machine learning, spark, Kubernetes, transaction processing
Java, Python, C++, Machine learning, building and deploying large scale production systems, concurrency, multi-threading, CPU processing
C++, Python, Tensor flow, database kernel, storage, indexing and transaction processing, building large scale systems, Machine learning
Amazon EC2, AWS, hadoop, big data, spark, building high performance scalable systems, building and deploying large scale production systems, concurrency, multi-threading, Java, C++, CPU processing
Distributed systems, database development, columnar query engine, database kernel, storage, indexing and transaction processing, building large scale systems
Distributed systems, Java, realtime streaming systems, Machine learning, spark, Kubernetes, distributed storage, concurrency, multi-threading
CUDA, GPU, Python, Machine learning, database kernel, storage, indexing and transaction processing, building large scale systems
Distributed systems, Java, database engine, cluster management, docker image building and distribution
Kubernetes, cluster management, operating systems, concurrency, multi-threading, apache airflow, Apache Spark,
Apache spark, Java, C++, query processing, transaction processing, distributed storage, concurrency, multi-threading, apache airflow
Big data stream processing, Apache Flink, Apache Beam, database kernel, distributed query engines for analytics and data warehouses
CUDA, GPU processing, Tensor flow, Pandas, Python, Jupyter notebook, spark, Machine learning, building high performance scalable systems
Distributed systems, Apache Kafka, publish-subscribe, building and deploying large scale production systems, concurrency, multi-threading, C++, CPU processing, Java
Realtime stream processing, publish subscribe, columnar processing for data warehouses, concurrency, Java, multi-threading, C++,

Few examples of search queries on this data:

Count the number of candidates that have "machine learning" and "gpu processing" - a phrase search (more on this further in the document) where we are looking for exact match of phrases "machine learning" and "gpu processing" not necessarily in the same order in original data.

SELECT SKILLS_COL FROM MyTable WHERE TEXT_MATCH(SKILLS_COL, '\"Machine learning\" AND \"gpu processing\"')

Count the number of candidates that have "distributed systems" and either 'Java' or 'C++' - a combination of searching for exact phrase "distributed systems" along with other terms.

SELECT SKILLS_COL FROM MyTable WHERE TEXT_MATCH(SKILLS_COL, '\"distributed systems\" AND (Java C++)')

Query Log

Consider a snippet from a log file containing SQL queries handled by a database. Each line (query) in the file represents a column value in QUERY_LOG_COL column in Pinot table.

SELECT count(dimensionCol2) FROM FOO WHERE dimensionCol1 = 18616904 AND timestamp BETWEEN 1560988800000 AND 1568764800000 GROUP BY dimensionCol3 TOP 2500
SELECT count(dimensionCol2) FROM FOO WHERE dimensionCol1 = 18616904 AND timestamp BETWEEN 1560988800000 AND 1568764800000 GROUP BY dimensionCol3 TOP 2500
SELECT count(dimensionCol2) FROM FOO WHERE dimensionCol1 = 18616904 AND timestamp BETWEEN 1545436800000 AND 1553212800000 GROUP BY dimensionCol3 TOP 2500
SELECT count(dimensionCol2) FROM FOO WHERE dimensionCol1 = 18616904 AND timestamp BETWEEN 1537228800000 AND 1537660800000 GROUP BY dimensionCol3 TOP 2500
SELECT dimensionCol2, dimensionCol4, timestamp, dimensionCol5, dimensionCol6 FROM FOO WHERE dimensionCol1 = 18616904 AND timestamp BETWEEN 1561366800000 AND 1561370399999 AND dimensionCol3 = 2019062409 LIMIT 10000
SELECT dimensionCol2, dimensionCol4, timestamp, dimensionCol5, dimensionCol6 FROM FOO WHERE dimensionCol1 = 18616904 AND timestamp BETWEEN 1563807600000 AND 1563811199999 AND dimensionCol3 = 2019072215 LIMIT 10000
SELECT dimensionCol2, dimensionCol4, timestamp, dimensionCol5, dimensionCol6 FROM FOO WHERE dimensionCol1 = 18616904 AND timestamp BETWEEN 1563811200000 AND 1563814799999 AND dimensionCol3 = 2019072216 LIMIT 10000
SELECT dimensionCol2, dimensionCol4, timestamp, dimensionCol5, dimensionCol6 FROM FOO WHERE dimensionCol1 = 18616904 AND timestamp BETWEEN 1566327600000 AND 1566329400000 AND dimensionCol3 = 2019082019 LIMIT 10000
SELECT count(dimensionCol2) FROM FOO WHERE dimensionCol1 = 18616904 AND timestamp BETWEEN 1560834000000 AND 1560837599999 AND dimensionCol3 = 2019061805 LIMIT 0
SELECT count(dimensionCol2) FROM FOO WHERE dimensionCol1 = 18616904 AND timestamp BETWEEN 1560870000000 AND 1560871800000 AND dimensionCol3 = 2019061815 LIMIT 0
SELECT count(dimensionCol2) FROM FOO WHERE dimensionCol1 = 18616904 AND timestamp BETWEEN 1560871800001 AND 1560873599999 AND dimensionCol3 = 2019061815 LIMIT 0
SELECT count(dimensionCol2) FROM FOO WHERE dimensionCol1 = 18616904 AND timestamp BETWEEN 1560873600000 AND 1560877199999 AND dimensionCol3 = 2019061816 LIMIT 0

Few examples of search queries on this data:

Count the number of queries that have GROUP BY

SELECT COUNT(*) FROM MyTable WHERE TEXT_MATCH(QUERY_LOG_COL, '\"group by\"')

Count the number of queries that have the SELECT count... pattern

SELECT COUNT(*) FROM MyTable WHERE TEXT_MATCH(QUERY_LOG_COL, '\"select count\"')

Count the number of queries that use BETWEEN filter on timestamp column along with GROUP BY

SELECT COUNT(*) FROM MyTable WHERE TEXT_MATCH(QUERY_LOG_COL, '\"timestamp between\" AND \"group by\"')

Further sections in the document cover several concrete examples on each kind of query and step-by-step guide on how to write text search queries in Pinot.

Current restrictions

Currently we support text search in a restricted manner. More specifically, we have the following constraints:

  • The column type should be STRING.

  • The column should be single-valued.

  • Co-existence of text index with other Pinot indexes is currently not supported.

The last two restrictions are going to be relaxed very soon in the upcoming releases.

Co-existence with other indexes

Currently, a column in Pinot can be dictionary encoded or stored RAW. Furthermore, we can create inverted index on the dictionary encoded column. We can also create a sorted index on the dictionary encoded column.

Text index is an addition to the type of per-column indexes users can create in Pinot. However, the current implementation supports text index on RAW column. In other words, the column should not be dictionary encoded. As we relax this constraint in upcoming releases, text index can be created on a dictionary encoded column that also has other indexes (inverted, sorted etc).

How to enable text index?

Similar to other indexes, users can enable text index on a column through table config. As part of text-search feature, we have also introduced a new generic way of specifying the per-column encoding and index information. In the table config, there will be a new section with name "fieldConfigList".

fieldConfigListis currently ONLY used for text indexes. Our plan is to migrate all other indexes to this model. We are going to do that in upcoming releases and accordingly modify user documentation. So please continue to specify other index info in table config as you have done till now and use the fieldConfigList only for text indexes.

"fieldConfigList":[
  {
     "name":"text_col_1",
     "encodingType":"RAW",
     "indexType":"TEXT"
  },
  {
     "name":"text_col_2",
     "encodingType":"RAW",
     "indexType":"TEXT"
  }
]

"fieldConfigList" will be a new section in table config. It is essentially a list of per-column encoding and index information. In the above example, the list contains text index information for two columns text_col_1 and text_col_2. Each object in fieldConfigList contains the following information

  • name - Name of the column text index is enabled on

  • encodingType - As mentioned earlier, we can store a column either as RAW or dictionary encoded. Since for now we have a restriction on the text index, this should always be RAW.

  • indexType - This should be TEXT.

Since we haven't yet removed the old way of specifying the index info, each column that has a text index should also be specified as noDictionaryColumns in tableIndexConfig:

"tableIndexConfig": {
   "noDictionaryColumns": [
     "text_col_1",
     "text_col_2"
 ]}

The above mechanism can be used to configure text indexes in the following scenarios:

  • Adding a new table with text index enabled on one or more columns.

  • Adding a new column with text index enabled to an existing table.

  • Enabling text index on an existing column.

Text Index Creation

Once the text index is enabled on one or more columns through table config, our segment generation code will pick up the config and automatically create text index (per column). This is exactly how other indexes in Pinot are created.

Text index is supported for both offline and realtime segments.

Text parsing and tokenization

The original text document (a value in the column with text index enabled) is parsed, tokenized and individual "indexable" terms are extracted. These terms are inserted into the index.

Pinot's text index is built on top of Lucene. Lucene's standard english text tokenizer generally works well for most classes of text. We might want to build custom text parser and tokenizer to suit particular user requirements. Accordingly, we can make this configurable for the user to specify on per column text index basis.

Writing Text Search Queries

A new built-in function TEXT_MATCH has been introduced for using text search in SQL/PQL.

TEXT_MATCH(text_column_name, search_expression)

  • text_column_name - name of the column to do text search on.

  • search_expression - search query

We can use TEXT_MATCH function as part of our queries in the WHERE clause. Examples:

SELECT COUNT(*) FROM Foo WHERE TEXT_MATCH(...)
SELECT * FROM Foo WHERE TEXT_MATCH(...)

We can also use the TEXT_MATCH filter clause with other filter operators. For example:

SELECT COUNT(*) FROM Foo WHERE TEXT_MATCH(...) AND some_other_column_1 > 20000
SELECT COUNT(*) FROM Foo WHERE TEXT_MATCH(...) AND some_other_column_1 > 20000 AND some_other_column_2 < 100000

Combining multiple TEXT_MATCH filter clauses

SELECT COUNT(*) FROM Foo WHERE TEXT_MATCH(text_col_1, ....) AND TEXT_MATCH(text_col_2, ...)

TEXT_MATCH can be used in WHERE clause of all kinds of queries supported by Pinot

  • Selection query which projects one or more columns

    • User can also include the text column name in select list

  • Aggregation query

  • Aggregation GROUP BY query

The search expression (second argument to TEXT_MATCH function) is the query string that Pinot will use to perform text search on the column's text index. **Following expression types are supported

Phrase Query

This query is used to do exact match of a given phrase. Exact match implies that terms in the user specified phrase should appear in the exact same order in the original text document. Note that document is referred to as the column value.

Let's take the example of resume text data containing 14 documents to walk through queries. The data is stored in column named SKILLS_COL and we have created a text index on this column.

Java, C++, worked on open source projects, coursera machine learning
Machine learning, Tensor flow, Java, Stanford university,
Distributed systems, Java, C++, Go, distributed query engines for analytics and data warehouses, Machine learning, spark, Kubernetes, transaction processing
Java, Python, C++, Machine learning, building and deploying large scale production systems, concurrency, multi-threading, CPU processing
C++, Python, Tensor flow, database kernel, storage, indexing and transaction processing, building large scale systems, Machine learning
Amazon EC2, AWS, hadoop, big data, spark, building high performance scalable systems, building and deploying large scale production systems, concurrency, multi-threading, Java, C++, CPU processing
Distributed systems, database development, columnar query engine, database kernel, storage, indexing and transaction processing, building large scale systems
Distributed systems, Java, realtime streaming systems, Machine learning, spark, Kubernetes, distributed storage, concurrency, multi-threading
CUDA, GPU, Python, Machine learning, database kernel, storage, indexing and transaction processing, building large scale systems
Distributed systems, Java, database engine, cluster management, docker image building and distribution
Kubernetes, cluster management, operating systems, concurrency, multi-threading, apache airflow, Apache Spark,
Apache spark, Java, C++, query processing, transaction processing, distributed storage, concurrency, multi-threading, apache airflow
Big data stream processing, Apache Flink, Apache Beam, database kernel, distributed query engines for analytics and data warehouses
CUDA, GPU processing, Tensor flow, Pandas, Python, Jupyter notebook, spark, Machine learning, building high performance scalable systems
Distributed systems, Apache Kafka, publish-subscribe, building and deploying large scale production systems, concurrency, multi-threading, C++, CPU processing, Java
Realtime stream processing, publish subscribe, columnar processing for data warehouses, concurrency, Java, multi-threading, C++,
C++, Java, Python, realtime streaming systems, Machine learning, spark, Kubernetes, transaction processing, distributed storage, concurrency, multi-threading, apache airflow
Databases, columnar query processing, Apache Arrow, distributed systems, Machine learning, cluster management, docker image building and distribution
Database engine, OLAP systems, OLTP transaction processing at large scale, concurrency, multi-threading, GO, building large scale systems

Example 1 - Search in SKILL_COL column to look for documents where each matching document MUST contain phrase "distributed systems" as is

SELECT SKILLS_COL FROM MyTable WHERE TEXT_MATCH(SKILLS_COL, '\"Distributed systems\"')

The search expression is '\"Distributed systems\"'

  • The search expression is always specified within single quotes '<your expression>'

  • Since we are doing a phrase search, the phrase should be specified within double quotes inside the single quotes and the double quotes should be escaped

    • '\"<your phrase>\"'

The above query will match the following documents:

Distributed systems, Java, C++, Go, distributed query engines for analytics and data warehouses, Machine learning, spark, Kubernetes, transaction processing
Distributed systems, database development, columnar query engine, database kernel, storage, indexing and transaction processing, building large scale systems
Distributed systems, Java, realtime streaming systems, Machine learning, spark, Kubernetes, distributed storage, concurrency, multi-threading
Distributed systems, Java, database engine, cluster management, docker image building and distribution
Distributed systems, Apache Kafka, publish-subscribe, building and deploying large scale production systems, concurrency, multi-threading, C++, CPU processing, Java
Databases, columnar query processing, Apache Arrow, distributed systems, Machine learning, cluster management, docker image building and distribution

But it won't match the following document:

Distributed data processing, systems design experience

This is because the phrase query looks for the phrase occurring in the original document "as is". The terms as specified by the user in phrase should be in the exact same order in the original document for the document to be considered as a match.

NOTE: Matching is always done in a case-insensitive manner.

Example 2 - Search in SKILL_COL column to look for documents where each matching document MUST contain phrase "query processing" as is

SELECT SKILLS_COL FROM MyTable WHERE TEXT_MATCH(SKILLS_COL, '\"query processing\"')

The above query will match the following documents:

Apache spark, Java, C++, query processing, transaction processing, distributed storage, concurrency, multi-threading, apache airflow
Databases, columnar query processing, Apache Arrow, distributed systems, Machine learning, cluster management, docker image building and distribution"

Term Query

Term queries are used to search for individual terms

Example 3 - Search in SKILL_COL column to look for documents where each matching document MUST contain the term 'java'

As mentioned earlier, the search expression is always within single quotes. However, since this is a term query, we don't have to use double quotes within single quotes.

SELECT SKILLS_COL FROM MyTable WHERE TEXT_MATCH(SKILLS_COL, 'Java')

Composite Query using Boolean Operators

Boolean operators AND, OR are supported and we can use them to build a composite query. Boolean operators can be used to combine phrase and term queries in any arbitrary manner

Example 4 - Search in SKILL_COL column to look for documents where each matching document MUST contain phrases "distributed systems" and "tensor flow". This combines two phrases using AND boolean operator

SELECT SKILLS_COL FROM MyTable WHERE TEXT_MATCH(SKILLS_COL, '\"Machine learning\" AND \"Tensor Flow\"')

The above query will match the following documents:

Machine learning, Tensor flow, Java, Stanford university,
C++, Python, Tensor flow, database kernel, storage, indexing and transaction processing, building large scale systems, Machine learning
CUDA, GPU processing, Tensor flow, Pandas, Python, Jupyter notebook, spark, Machine learning, building high performance scalable systems

Example 5 - Search in SKILL_COL column to look for documents where each document MUST contain phrase "machine learning" and term 'gpu' and term 'python'. This combines a phrase and two terms using boolean operator

SELECT SKILLS_COL FROM MyTable WHERE TEXT_MATCH(SKILLS_COL, '\"Machine learning\" AND gpu AND python')

The above query will match the following documents:

CUDA, GPU, Python, Machine learning, database kernel, storage, indexing and transaction processing, building large scale systems
CUDA, GPU processing, Tensor flow, Pandas, Python, Jupyter notebook, spark, Machine learning, building high performance scalable systems

When using boolean operators to combine term(s) and phrase(s) or both, please note that:

  • The matching document can contain the terms and phrases in any order.

  • The matching document may not have the terms adjacent to each other (if this is needed, please use appropriate phrase query for the concerned terms).

Use of OR operator is implicit. In other words, if phrase(s) and term(s) are not combined using AND operator in the search expression, OR operator is used by default:

Example 6 - Search in SKILL_COL column to look for documents where each document MUST contain ANY one of:

  • phrase "distributed systems" OR

  • term 'java' OR

  • term 'C++'.

SELECT SKILLS_COL FROM MyTable WHERE TEXT_MATCH(SKILLS_COL, '\"distributed systems\" Java C++')

We can also do grouping using parentheses:

Example 7 - Search in SKILL_COL column to look for documents where each document MUST contain

  • phrase "distributed systems" AND

  • at least one of the terms Java or C++

In the below query, we group terms Java and C++ without any operator which implies the use of OR. The root operator AND is used to combine this with phrase "distributed systems"

SELECT SKILLS_COL FROM MyTable WHERE TEXT_MATCH(SKILLS_COL, '\"distributed systems\" AND (Java C++)')

Prefix Query

Prefix searches can also be done in the context of a single term. We can't use prefix matches for phrases.

Example 8 - Search in SKILL_COL column to look for documents where each document MUST contain text like stream, streaming, streams etc

SELECT SKILLS_COL FROM MyTable WHERE TEXT_MATCH(SKILLS_COL, 'stream*')

The above query will match the following documents:

Distributed systems, Java, realtime streaming systems, Machine learning, spark, Kubernetes, distributed storage, concurrency, multi-threading
Big data stream processing, Apache Flink, Apache Beam, database kernel, distributed query engines for analytics and data warehouses
Realtime stream processing, publish subscribe, columnar processing for data warehouses, concurrency, Java, multi-threading, C++,
C++, Java, Python, realtime streaming systems, Machine learning, spark, Kubernetes, transaction processing, distributed storage, concurrency, multi-threading, apache airflow

Regular Expression Query

Phrase and term queries work on the fundamental logic of looking up the terms (aka tokens) in the text index. The original text document (a value in the column with text index enabled) is parsed, tokenized and individual "indexable" terms are extracted. These terms are inserted into the index.

Based on the nature of original text and how the text is segmented into tokens, it is possible that some terms don't get indexed individually. In such cases, it is better to use regular expression queries on the text index.

Consider server log as an example and we want to look for exceptions. A regex query is suitable for this scenario as it is unlikely that 'exception' is present as an individual indexed token.

Syntax of a regex query is slightly different from queries mentioned earlier. The regular expression is written between a pair of forward slashes (/).

SELECT SKILLS_COL FROM MyTable WHERE text_match(SKILLS_COL, '/.*Exception/')

The above query will match any text document containing exception.

Deciding Query Types

Generally, a combination of phrase and term queries using boolean operators and grouping should allow us to build a complex text search query expression.

The key thing to remember is that phrases should be used when the order of terms in the document is important and if separating the phrase into individual terms doesn't make sense from end user's perspective.

An example would be phrase "machine learning".

TEXT_MATCH(column, '\"machine learning\"')

However, if we are searching for documents matching Java and C++ terms, using phrase query "Java C++" will actually result in in partial results (could be empty too) since now we are relying the on the user specifying these skills in the exact same order (adjacent to each other) in the resume text.

TEXT_MATCH(column, '\"Java C++\"')

Term query using boolean AND operator is more appropriate for such cases

TEXT_MATCH(column, 'Java AND C++')

0.9.0

Summary

This release introduces a new features: Segment Merge and Rollup to simplify users day to day operational work. A new metrics plugin is added to support dropwizard. As usual, new functionalities and many UI/ Performance improvements.

The release was cut from the following commit: 13c9ee9 and the following cherry-picks: 668b5e0, ee887b9

Support Segment Merge and Roll-up

LinkedIn operates a large multi-tenant cluster that serves a business metrics dashboard, and noticed that their tables consisted of millions of small segments. This was leading to slow operations in Helix/Zookeeper, long running queries due to having too many tasks to process, as well as using more space because of a lack of compression.

To solve this problem they added the Segment Merge task, which compresses segments based on timestamps and rolls up/aggregates older data. The task can be run on a schedule or triggered manually via the Pinot REST API.

At the moment this feature is only available for offline tables, but will be added for real-time tables in a future release.

Major Changes:

  • Integrate enhanced SegmentProcessorFramework into MergeRollupTaskExecutor (#7180)

  • Merge/Rollup task scheduler for offline tables. (#7178)

  • Fix MergeRollupTask uploading segments not updating their metadata (#7289)

  • MergeRollupTask integration tests (#7283)

  • Add mergeRollupTask delay metrics (#7368)

  • MergeRollupTaskGenerator enhancement: enable parallel buckets scheduling (#7481)

  • Use maxEndTimeMs for merge/roll-up delay metrics. (#7617)

UI Improvement

This release also sees improvements to Pinot’s query console UI.

  • Cmd+Enter shortcut to run query in query console (#7359)

  • Showing tooltip in SQL Editor (#7387)

  • Make the SQL Editor box expandable (#7381)

  • Fix tables ordering by number of segments (#7564)

SQL Improvements

There have also been improvements and additions to Pinot’s SQL implementation.

New functions:

  • IN (#7542)

  • LASTWITHTIME (#7584)

  • ID_SET on MV columns (#7355)

  • Raw results for Percentile TDigest and Est (#7226),

  • Add timezone as argument in function toDateTime (#7552)

New predicates are supported:

  • LIKE(#7214)

  • REGEXP_EXTRACT(#7114)

  • FILTER(#7566)

Query compatibility improvements:

  • Infer data type for Literal (#7332)

  • Support logical identifier in predicate (#7347)

  • Support JSON queries with top-level array path expression. (#7511)

  • Support configurable group by trim size to improve results accuracy (#7241)

Performance Improvements

This release contains many performance improvement, you may sense it for you day to day queries. Thanks to all the great contributions listed below:

  • Reduce the disk usage for segment conversion task (#7193)

  • Simplify association between Java Class and PinotDataType for faster mapping (#7402)

  • Avoid creating stateless ParseContextImpl once per jsonpath evaluation, avoid varargs allocation (#7412)

  • Replace MINUS with STRCMP (#7394)

  • Bit-sliced range index for int, long, float, double, dictionarized SV columns (#7454)

  • Use MethodHandle to access vectorized unsigned comparison on JDK9+ (#7487)

  • Add option to limit thread usage per query (#7492)

  • Improved range queries (#7513)

  • Faster bitmap scans (#7530)

  • Optimize EmptySegmentPruner to skip pruning when there is no empty segments (#7531)

  • Map bitmaps through a bounded window to avoid excessive disk pressure (#7535)

  • Allow RLE compression of bitmaps for smaller file sizes (#7582)

  • Support raw index properties for columns with JSON and RANGE indexes (#7615)

  • Enhance BloomFilter rule to include IN predicate(#7444) (#7624)

  • Introduce LZ4_WITH_LENGTH chunk compression type (#7655)

  • Enhance ColumnValueSegmentPruner and support bloom filter prefetch (#7654)

  • Apply the optimization on dictIds within the segment to DistinctCountHLL aggregation func (#7630)

  • During segment pruning, release the bloom filter after each segment is processed (#7668)

  • Fix JSONPath cache inefficient issue (#7409)

  • Optimize getUnpaddedString with SWAR padding search (#7708)

  • Lighter weight LiteralTransformFunction, avoid excessive array fills (#7707)

  • Inline binary comparison ops to prevent function call overhead (#7709)

  • Memoize literals in query context in order to deduplicate them (#7720)

Other Notable New Features and Changes

  • Human Readable Controller Configs (#7173)

  • Add the support of geoToH3 function (#7182)

  • Add Apache Pulsar as Pinot Plugin (#7223) (#7247)

  • Add dropwizard metrics plugin (#7263)

  • Introduce OR Predicate Execution On Star Tree Index (#7184)

  • Allow to extract values from array of objects with jsonPathArray (#7208)

  • Add Realtime table metadata and indexes API. (#7169)

  • Support array with mixing data types (#7234)

  • Support force download segment in reload API (#7249)

  • Show uncompressed znRecord from zk api (#7304)

  • Add debug endpoint to get minion task status. (#7300)

  • Validate CSV Header For Configured Delimiter (#7237)

  • Add auth tokens and user/password support to ingestion job command (#7233)

  • Add option to store the hash of the upsert primary key (#7246)

  • Add null support for time column (#7269)

  • Add mode aggregation function (#7318)

  • Support disable swagger in Pinot servers (#7341)

  • Delete metadata properly on table deletion (#7329)

  • Add basic Obfuscator Support (#7407)

  • Add AWS sts dependency to enable auth using web identity token. (#7017)(#7445)

  • Mask credentials in debug endpoint /appconfigs (#7452)

  • Fix /sql query endpoint now compatible with auth (#7230)

  • Fix case sensitive issue in BasicAuthPrincipal permission check (#7354)

  • Fix auth token injection in SegmentGenerationAndPushTaskExecutor (#7464)

  • Add segmentNameGeneratorType config to IndexingConfig (#7346)

  • Support trigger PeriodicTask manually (#7174)

  • Add endpoint to check minion task status for a single task. (#7353)

  • Showing partial status of segment and counting CONSUMING state as good segment status (#7327)

  • Add "num rows in segments" and "num segments queried per host" to the output of Realtime Provisioning Rule (#7282)

  • Check schema backward-compatibility when updating schema through addSchema with override (#7374)

  • Optimize IndexedTable (#7373)

  • Support indices remove in V3 segment format (#7301)

  • Optimize TableResizer (#7392)

  • Introduce resultSize in IndexedTable (#7420)

  • Offset based realtime consumption status checker (#7267)

  • Add causes to stack trace return (#7460)

  • Create controller resource packages config key (#7488)

  • Enhance TableCache to support schema name different from table name (#7525)

  • Add validation for realtimeToOffline task (#7523)

  • Unify CombineOperator multi-threading logic (#7450)

  • Support no downtime rebalance for table with 1 replica in TableRebalancer (#7532)

  • Introduce MinionConf, move END_REPLACE_SEGMENTS_TIMEOUT_MS to minion config instead of task config. (#7516)

  • Adjust tuner api (#7553)

  • Adding config for metrics library (#7551)

  • Add geo type conversion scalar functions (#7573)

  • Add BOOLEAN_ARRAY and TIMESTAMP_ARRAY types (#7581)

  • Add MV raw forward index and MV BYTES data type (#7595)

  • Enhance TableRebalancer to offload the segments from most loaded instances first (#7574)

  • Improve get tenant API to differentiate offline and realtime tenants (#7548)

  • Refactor query rewriter to interfaces and implementations to allow customization (#7576)

  • In ServiceStartable, apply global cluster config in ZK to instance config (#7593)

  • Make dimension tables creation bypass tenant validation (#7559)

  • Allow Metadata and Dictionary Based Plans for No Op Filters (#7563)

  • Reject query with identifiers not in schema (#7590)

  • Round Robin IP addresses when retry uploading/downloading segments (#7585)

  • Support multi-value derived column in offline table reload (#7632)

  • Support segmentNamePostfix in segment name (#7646)

  • Add select segments API (#7651)

  • Controller getTableInstance() call now returns the list of live brokers of a table. (#7556)

  • Allow MV Field Support For Raw Columns in Text Indices (#7638)

  • Allow override distinctCount to segmentPartitionedDistinctCount (#7664)

  • Add a quick start with both UPSERT and JSON index (#7669)

  • Add revertSegmentReplacement API (#7662)

  • Smooth segment reloading with non blocking semantic (#7675)

  • Clear the reused record in PartitionUpsertMetadataManager (#7676)

  • Replace args4j with picocli (#7665)

  • Handle datetime column consistently (#7645)(#7705)

  • Allow to carry headers with query requests (#7696) (#7712)

  • Allow adding JSON data type for dimension column types (#7718)

  • Separate SegmentDirectoryLoader and tierBackend concepts (#7737)

  • Implement size balanced V4 raw chunk format (#7661)

  • Add presto-pinot-driver lib (#7384)

Major Bug fixes

  • Fix null pointer exception for non-existed metric columns in schema for JDBC driver (#7175)

  • Fix the config key for TASK_MANAGER_FREQUENCY_PERIOD (#7198)

  • Fixed pinot java client to add zkClient close (#7196)

  • Ignore query json parse errors (#7165)

  • Fix shutdown hook for PinotServiceManager (#7251) (#7253)

  • Make STRING to BOOLEAN data type change as backward compatible schema change (#7259)

  • Replace gcp hardcoded values with generic annotations (#6985)

  • Fix segment conversion executor for in-place conversion (#7265)

  • Fix reporting consuming rate when the Kafka partition level consumer isn't stopped (#7322)

  • Fix the issue with concurrent modification for segment lineage (#7343)

  • Fix TableNotFound error message in PinotHelixResourceManager (#7340)

  • Fix upload LLC segment endpoint truncated download URL (#7361)

  • Fix task scheduling on table update (#7362)

  • Fix metric method for ONLINE_MINION_INSTANCES metric (#7363)

  • Fix JsonToPinotSchema behavior to be consistent with AvroSchemaToPinotSchema (#7366)

  • Fix currentOffset volatility in consuming segment(#7365)

  • Fix misleading error msg for missing URI (#7367)

  • Fix the correctness of getColumnIndices method (#7370)

  • Fix SegmentZKMetadta time handling (#7375)

  • Fix retention for cleaning up segment lineage (#7424)

  • Fix segment generator to not return illegal filenames (#7085)

  • Fix missing LLC segments in segment store by adding controller periodic task to upload them (#6778)

  • Fix parsing error messages returned to FileUploadDownloadClient (#7428)

  • Fix manifest scan which drives /version endpoint (#7456)

  • Fix missing rate limiter if brokerResourceEV becomes null due to ZK connection (#7470)

  • Fix race conditions between segment merge/roll-up and purge (or convertToRawIndex) tasks: (#7427)

  • Fix pql double quote checker exception (#7485)

  • Fix minion metrics exporter config (#7496)

  • Fix segment unable to retry issue by catching timeout exception during segment replace (#7509)

  • Add Exception to Broker Response When Not All Segments Are Available (Partial Response) (#7397)

  • Fix segment generation commands (#7527)

  • Return non zero from main with exception (#7482)

  • Fix parquet plugin shading error (#7570)

  • Fix the lowest partition id is not 0 for LLC (#7066)

  • Fix star-tree index map when column name contains '.' (#7623)

  • Fix cluster manager URLs encoding issue(#7639)

  • Fix fieldConfig nullable validation (#7648)

  • Fix verifyHostname issue in FileUploadDownloadClient (#7703)

  • Fix TableCache schema to include the built-in virtual columns (#7706)

  • Fix DISTINCT with AS function (#7678)

  • Fix SDF pattern in DataPreprocessingHelper (#7721)

  • Fix fields missing issue in the source in ParquetNativeRecordReader (#7742)

Batch Data Ingestion In Practice

In practice, we need to run Pinot data ingestion as a pipeline or a scheduled job.

Assuming pinot-distribution is already built, inside examples directory, you could find several sample table layouts.

Table Layout

Usually each table deserves its own directory, like airlineStats.

Inside the table directory, rawdata is created to put all the input data.

Typically, for data events with timestamp, we partition those data and store them into a daily folder. E.g. a typically layout would follow this pattern: rawdata/%yyyy%/%mm%/%dd%/[daily_input_files].

/var/pinot/airlineStats/rawdata/2014/01/01/airlineStats_data_2014-01-01.avro
/var/pinot/airlineStats/rawdata/2014/01/02/airlineStats_data_2014-01-02.avro
/var/pinot/airlineStats/rawdata/2014/01/03/airlineStats_data_2014-01-03.avro
/var/pinot/airlineStats/rawdata/2014/01/04/airlineStats_data_2014-01-04.avro
/var/pinot/airlineStats/rawdata/2014/01/05/airlineStats_data_2014-01-05.avro
/var/pinot/airlineStats/rawdata/2014/01/06/airlineStats_data_2014-01-06.avro
/var/pinot/airlineStats/rawdata/2014/01/07/airlineStats_data_2014-01-07.avro
/var/pinot/airlineStats/rawdata/2014/01/08/airlineStats_data_2014-01-08.avro
/var/pinot/airlineStats/rawdata/2014/01/09/airlineStats_data_2014-01-09.avro
/var/pinot/airlineStats/rawdata/2014/01/10/airlineStats_data_2014-01-10.avro
/var/pinot/airlineStats/rawdata/2014/01/11/airlineStats_data_2014-01-11.avro
/var/pinot/airlineStats/rawdata/2014/01/12/airlineStats_data_2014-01-12.avro
/var/pinot/airlineStats/rawdata/2014/01/13/airlineStats_data_2014-01-13.avro
/var/pinot/airlineStats/rawdata/2014/01/14/airlineStats_data_2014-01-14.avro
/var/pinot/airlineStats/rawdata/2014/01/15/airlineStats_data_2014-01-15.avro
/var/pinot/airlineStats/rawdata/2014/01/16/airlineStats_data_2014-01-16.avro
/var/pinot/airlineStats/rawdata/2014/01/17/airlineStats_data_2014-01-17.avro
/var/pinot/airlineStats/rawdata/2014/01/18/airlineStats_data_2014-01-18.avro
/var/pinot/airlineStats/rawdata/2014/01/19/airlineStats_data_2014-01-19.avro
/var/pinot/airlineStats/rawdata/2014/01/20/airlineStats_data_2014-01-20.avro
/var/pinot/airlineStats/rawdata/2014/01/21/airlineStats_data_2014-01-21.avro
/var/pinot/airlineStats/rawdata/2014/01/22/airlineStats_data_2014-01-22.avro
/var/pinot/airlineStats/rawdata/2014/01/23/airlineStats_data_2014-01-23.avro
/var/pinot/airlineStats/rawdata/2014/01/24/airlineStats_data_2014-01-24.avro
/var/pinot/airlineStats/rawdata/2014/01/25/airlineStats_data_2014-01-25.avro
/var/pinot/airlineStats/rawdata/2014/01/26/airlineStats_data_2014-01-26.avro
/var/pinot/airlineStats/rawdata/2014/01/27/airlineStats_data_2014-01-27.avro
/var/pinot/airlineStats/rawdata/2014/01/28/airlineStats_data_2014-01-28.avro
/var/pinot/airlineStats/rawdata/2014/01/29/airlineStats_data_2014-01-29.avro
/var/pinot/airlineStats/rawdata/2014/01/30/airlineStats_data_2014-01-30.avro
/var/pinot/airlineStats/rawdata/2014/01/31/airlineStats_data_2014-01-31.avro

Configuring batch ingestion job

Create a batch ingestion job spec file to describe how to ingest the data.

Below is an example (also located at examples/batch/airlineStats/ingestionJobSpec.yaml)

# executionFrameworkSpec: Defines ingestion jobs to be running.
executionFrameworkSpec:

  # name: execution framework name
  name: 'standalone'

  # segmentGenerationJobRunnerClassName: class name implements org.apache.pinot.spi.batch.ingestion.runner.SegmentGenerationJobRunner interface.
  segmentGenerationJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner'

  # segmentTarPushJobRunnerClassName: class name implements org.apache.pinot.spi.batch.ingestion.runner.SegmentTarPushJobRunner interface.
  segmentTarPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentTarPushJobRunner'

  # segmentUriPushJobRunnerClassName: class name implements org.apache.pinot.spi.batch.ingestion.runner.SegmentUriPushJobRunner interface.
  segmentUriPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentUriPushJobRunner'

# jobType: Pinot ingestion job type.
# Supported job types are:
#   'SegmentCreation'
#   'SegmentTarPush'
#   'SegmentUriPush'
#   'SegmentCreationAndTarPush'
#   'SegmentCreationAndUriPush'
jobType: SegmentCreationAndTarPush

# inputDirURI: Root directory of input data, expected to have scheme configured in PinotFS.
inputDirURI: 'examples/batch/airlineStats/rawdata'

# includeFileNamePattern: include file name pattern, supported glob pattern.
# Sample usage:
#   'glob:*.avro' will include all avro files just under the inputDirURI, not sub directories;
#   'glob:**\/*.avro' will include all the avro files under inputDirURI recursively.
includeFileNamePattern: 'glob:**/*.avro'

# excludeFileNamePattern: exclude file name pattern, supported glob pattern.
# Sample usage:
#   'glob:*.avro' will exclude all avro files just under the inputDirURI, not sub directories;
#   'glob:**\/*.avro' will exclude all the avro files under inputDirURI recursively.
# _excludeFileNamePattern: ''

# outputDirURI: Root directory of output segments, expected to have scheme configured in PinotFS.
outputDirURI: 'examples/batch/airlineStats/segments'

# overwriteOutput: Overwrite output segments if existed.
overwriteOutput: true

# pinotFSSpecs: defines all related Pinot file systems.
pinotFSSpecs:

  - # scheme: used to identify a PinotFS.
    # E.g. local, hdfs, dbfs, etc
    scheme: file

    # className: Class name used to create the PinotFS instance.
    # E.g.
    #   org.apache.pinot.spi.filesystem.LocalPinotFS is used for local filesystem
    #   org.apache.pinot.plugin.filesystem.AzurePinotFS is used for Azure Data Lake
    #   org.apache.pinot.plugin.filesystem.HadoopPinotFS is used for HDFS
    className: org.apache.pinot.spi.filesystem.LocalPinotFS

# recordReaderSpec: defines all record reader
recordReaderSpec:

  # dataFormat: Record data format, e.g. 'avro', 'parquet', 'orc', 'csv', 'json', 'thrift' etc.
  dataFormat: 'avro'

  # className: Corresponding RecordReader class name.
  # E.g.
  #   org.apache.pinot.plugin.inputformat.avro.AvroRecordReader
  #   org.apache.pinot.plugin.inputformat.csv.CSVRecordReader
  #   org.apache.pinot.plugin.inputformat.parquet.ParquetRecordReader
  #   org.apache.pinot.plugin.inputformat.json.JSONRecordReader
  #   org.apache.pinot.plugin.inputformat.orc.ORCRecordReader
  #   org.apache.pinot.plugin.inputformat.thrift.ThriftRecordReader
  className: 'org.apache.pinot.plugin.inputformat.avro.AvroRecordReader'

# tableSpec: defines table name and where to fetch corresponding table config and table schema.
tableSpec:

  # tableName: Table name
  tableName: 'airlineStats'

  # schemaURI: defines where to read the table schema, supports PinotFS or HTTP.
  # E.g.
  #   hdfs://path/to/table_schema.json
  #   http://localhost:9000/tables/myTable/schema
  schemaURI: 'http://localhost:9000/tables/airlineStats/schema'

  # tableConfigURI: defines where to reade the table config.
  # Supports using PinotFS or HTTP.
  # E.g.
  #   hdfs://path/to/table_config.json
  #   http://localhost:9000/tables/myTable
  # Note that the API to read Pinot table config directly from pinot controller contains a JSON wrapper.
  # The real table config is the object under the field 'OFFLINE'.
  tableConfigURI: 'http://localhost:9000/tables/airlineStats'

# segmentNameGeneratorSpec: defines how to init a SegmentNameGenerator.
segmentNameGeneratorSpec:

  # type: Current supported type is 'simple' and 'normalizedDate'.
  type: normalizedDate

  # configs: Configs to init SegmentNameGenerator.
  configs:
    segment.name.prefix: 'airlineStats_batch'
    exclude.sequence.id: true

# pinotClusterSpecs: defines the Pinot Cluster Access Point.
pinotClusterSpecs:
  - # controllerURI: used to fetch table/schema information and data push.
    # E.g. http://localhost:9000
    controllerURI: 'http://localhost:9000'

# pushJobSpec: defines segment push job related configuration.
pushJobSpec:

  # pushAttempts: number of attempts for push job, default is 1, which means no retry.
  pushAttempts: 2

  # pushRetryIntervalMillis: retry wait Ms, default to 1 second.
  pushRetryIntervalMillis: 1000

Executing the job

Below command will create example table into Pinot cluster.

bin/pinot-admin.sh AddTable  -schemaFile examples/batch/airlineStats/airlineStats_schema.json -tableConfigFile examples/batch/airlineStats/airlineStats_offline_table_config.json -exec

Below command will kick off the ingestion job to generate Pinot segments and push them into the cluster.

bin/pinot-admin.sh LaunchDataIngestionJob -jobSpecFile examples/batch/airlineStats/ingestionJobSpec.yaml

After job finished, segments are stored in examples/batch/airlineStats/segments following same layout of input directory layout.

/var/pinot/airlineStats/segments/2014/01/01/airlineStats_batch_2014-01-01_2014-01-01.tar.gz
/var/pinot/airlineStats/segments/2014/01/02/airlineStats_batch_2014-01-02_2014-01-02.tar.gz
/var/pinot/airlineStats/segments/2014/01/03/airlineStats_batch_2014-01-03_2014-01-03.tar.gz
/var/pinot/airlineStats/segments/2014/01/04/airlineStats_batch_2014-01-04_2014-01-04.tar.gz
/var/pinot/airlineStats/segments/2014/01/05/airlineStats_batch_2014-01-05_2014-01-05.tar.gz
/var/pinot/airlineStats/segments/2014/01/06/airlineStats_batch_2014-01-06_2014-01-06.tar.gz
/var/pinot/airlineStats/segments/2014/01/07/airlineStats_batch_2014-01-07_2014-01-07.tar.gz
/var/pinot/airlineStats/segments/2014/01/08/airlineStats_batch_2014-01-08_2014-01-08.tar.gz
/var/pinot/airlineStats/segments/2014/01/09/airlineStats_batch_2014-01-09_2014-01-09.tar.gz
/var/pinot/airlineStats/segments/2014/01/10/airlineStats_batch_2014-01-10_2014-01-10.tar.gz
/var/pinot/airlineStats/segments/2014/01/11/airlineStats_batch_2014-01-11_2014-01-11.tar.gz
/var/pinot/airlineStats/segments/2014/01/12/airlineStats_batch_2014-01-12_2014-01-12.tar.gz
/var/pinot/airlineStats/segments/2014/01/13/airlineStats_batch_2014-01-13_2014-01-13.tar.gz
/var/pinot/airlineStats/segments/2014/01/14/airlineStats_batch_2014-01-14_2014-01-14.tar.gz
/var/pinot/airlineStats/segments/2014/01/15/airlineStats_batch_2014-01-15_2014-01-15.tar.gz
/var/pinot/airlineStats/segments/2014/01/16/airlineStats_batch_2014-01-16_2014-01-16.tar.gz
/var/pinot/airlineStats/segments/2014/01/17/airlineStats_batch_2014-01-17_2014-01-17.tar.gz
/var/pinot/airlineStats/segments/2014/01/18/airlineStats_batch_2014-01-18_2014-01-18.tar.gz
/var/pinot/airlineStats/segments/2014/01/19/airlineStats_batch_2014-01-19_2014-01-19.tar.gz
/var/pinot/airlineStats/segments/2014/01/20/airlineStats_batch_2014-01-20_2014-01-20.tar.gz
/var/pinot/airlineStats/segments/2014/01/21/airlineStats_batch_2014-01-21_2014-01-21.tar.gz
/var/pinot/airlineStats/segments/2014/01/22/airlineStats_batch_2014-01-22_2014-01-22.tar.gz
/var/pinot/airlineStats/segments/2014/01/23/airlineStats_batch_2014-01-23_2014-01-23.tar.gz
/var/pinot/airlineStats/segments/2014/01/24/airlineStats_batch_2014-01-24_2014-01-24.tar.gz
/var/pinot/airlineStats/segments/2014/01/25/airlineStats_batch_2014-01-25_2014-01-25.tar.gz
/var/pinot/airlineStats/segments/2014/01/26/airlineStats_batch_2014-01-26_2014-01-26.tar.gz
/var/pinot/airlineStats/segments/2014/01/27/airlineStats_batch_2014-01-27_2014-01-27.tar.gz
/var/pinot/airlineStats/segments/2014/01/28/airlineStats_batch_2014-01-28_2014-01-28.tar.gz
/var/pinot/airlineStats/segments/2014/01/29/airlineStats_batch_2014-01-29_2014-01-29.tar.gz
/var/pinot/airlineStats/segments/2014/01/30/airlineStats_batch_2014-01-30_2014-01-30.tar.gz
/var/pinot/airlineStats/segments/2014/01/31/airlineStats_batch_2014-01-31_2014-01-31.tar.gz

Executing the job using Spark

Below example is running in a spark local mode. You can download spark distribution and start it by running:

wget https://downloads.apache.org/spark/spark-2.4.6/spark-2.4.6-bin-hadoop2.7.tgz
tar xvf spark-2.4.6-bin-hadoop2.7.tgz
cd spark-2.4.6-bin-hadoop2.7
./bin/spark-shell --master 'local[2]'

Build latest Pinot Distribution following this Wiki.

Below command shows how to use spark-submit command to submit a spark job using pinot-all-${PINOT_VERSION}-jar-with-dependencies.jar.

Sample Spark ingestion job spec yaml, (also located at examples/batch/airlineStats/sparkIngestionJobSpec.yaml):

# executionFrameworkSpec: Defines ingestion jobs to be running.
executionFrameworkSpec:

  # name: execution framework name
  name: 'spark'

  # segmentGenerationJobRunnerClassName: class name implements org.apache.pinot.spi.batch.ingestion.runner.SegmentGenerationJobRunner interface.
  segmentGenerationJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.spark.SparkSegmentGenerationJobRunner'

  # segmentTarPushJobRunnerClassName: class name implements org.apache.pinot.spi.batch.ingestion.runner.SegmentTarPushJobRunner interface.
  segmentTarPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.spark.SparkSegmentTarPushJobRunner'

  # segmentUriPushJobRunnerClassName: class name implements org.apache.pinot.spi.batch.ingestion.runner.SegmentUriPushJobRunner interface.
  segmentUriPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.spark.SparkSegmentUriPushJobRunner'

  # extraConfigs: extra configs for execution framework.
  extraConfigs:

    # stagingDir is used in distributed filesystem to host all the segments then move this directory entirely to output directory.
    stagingDir: examples/batch/airlineStats/staging

# jobType: Pinot ingestion job type.
# Supported job types are:
#   'SegmentCreation'
#   'SegmentTarPush'
#   'SegmentUriPush'
#   'SegmentCreationAndTarPush'
#   'SegmentCreationAndUriPush'
jobType: SegmentCreationAndTarPush

# inputDirURI: Root directory of input data, expected to have scheme configured in PinotFS.
inputDirURI: 'examples/batch/airlineStats/rawdata'

# includeFileNamePattern: include file name pattern, supported glob pattern.
# Sample usage:
#   'glob:*.avro' will include all avro files just under the inputDirURI, not sub directories;
#   'glob:**\/*.avro' will include all the avro files under inputDirURI recursively.
includeFileNamePattern: 'glob:**/*.avro'

# excludeFileNamePattern: exclude file name pattern, supported glob pattern.
# Sample usage:
#   'glob:*.avro' will exclude all avro files just under the inputDirURI, not sub directories;
#   'glob:**\/*.avro' will exclude all the avro files under inputDirURI recursively.
# excludeFileNamePattern: ''

# outputDirURI: Root directory of output segments, expected to have scheme configured in PinotFS.
outputDirURI: 'examples/batch/airlineStats/segments'

# overwriteOutput: Overwrite output segments if existed.
overwriteOutput: true

# pinotFSSpecs: defines all related Pinot file systems.
pinotFSSpecs:

  - # scheme: used to identify a PinotFS.
    # E.g. local, hdfs, dbfs, etc
    scheme: file

    # className: Class name used to create the PinotFS instance.
    # E.g.
    #   org.apache.pinot.spi.filesystem.LocalPinotFS is used for local filesystem
    #   org.apache.pinot.plugin.filesystem.AzurePinotFS is used for Azure Data Lake
    #   org.apache.pinot.plugin.filesystem.HadoopPinotFS is used for HDFS
    className: org.apache.pinot.plugin.filesystem.HadoopPinotFS

# recordReaderSpec: defines all record reader
recordReaderSpec:

  # dataFormat: Record data format, e.g. 'avro', 'parquet', 'orc', 'csv', 'json', 'thrift' etc.
  dataFormat: 'avro'

  # className: Corresponding RecordReader class name.
  # E.g.
  #   org.apache.pinot.plugin.inputformat.avro.AvroRecordReader
  #   org.apache.pinot.plugin.inputformat.csv.CSVRecordReader
  #   org.apache.pinot.plugin.inputformat.parquet.ParquetRecordReader
  #   org.apache.pinot.plugin.inputformat.json.JSONRecordReader
  #   org.apache.pinot.plugin.inputformat.orc.ORCRecordReader
  #   org.apache.pinot.plugin.inputformat.thrift.ThriftRecordReader
  className: 'org.apache.pinot.plugin.inputformat.avro.AvroRecordReader'

# tableSpec: defines table name and where to fetch corresponding table config and table schema.
tableSpec:

  # tableName: Table name
  tableName: 'airlineStats'

  # schemaURI: defines where to read the table schema, supports PinotFS or HTTP.
  # E.g.
  #   hdfs://path/to/table_schema.json
  #   http://localhost:9000/tables/myTable/schema
  schemaURI: 'http://localhost:9000/tables/airlineStats/schema'

  # tableConfigURI: defines where to reade the table config.
  # Supports using PinotFS or HTTP.
  # E.g.
  #   hdfs://path/to/table_config.json
  #   http://localhost:9000/tables/myTable
  # Note that the API to read Pinot table config directly from pinot controller contains a JSON wrapper.
  # The real table config is the object under the field 'OFFLINE'.
  tableConfigURI: 'http://localhost:9000/tables/airlineStats'

# segmentNameGeneratorSpec: defines how to init a SegmentNameGenerator.
segmentNameGeneratorSpec:

  # type: Current supported type is 'simple' and 'normalizedDate'.
  type: normalizedDate

  # configs: Configs to init SegmentNameGenerator.
  configs:
    segment.name.prefix: 'airlineStats_batch'
    exclude.sequence.id: true

# pinotClusterSpecs: defines the Pinot Cluster Access Point.
pinotClusterSpecs:
  - # controllerURI: used to fetch table/schema information and data push.
    # E.g. http://localhost:9000
    controllerURI: 'http://localhost:9000'

# pushJobSpec: defines segment push job related configuration.
pushJobSpec:

  # pushParallelism: push job parallelism, default is 1.
  pushParallelism: 2

  # pushAttempts: number of attempts for push job, default is 1, which means no retry.
  pushAttempts: 2

  # pushRetryIntervalMillis: retry wait Ms, default to 1 second.
  pushRetryIntervalMillis: 1000

Please ensure parameter PINOT_ROOT_DIR and PINOT_VERSION are set properly.

Please ensure you set

  • spark.driver.extraJavaOptions =>

    -Dplugins.dir=${PINOT_DISTRIBUTION_DIR}/plugins

Or put all the required plugins jars to CLASSPATH, then set -Dplugins.dir=${CLASSPATH}

  • spark.driver.extraClassPath =>

    pinot-all-${PINOT_VERSION}-jar-with-depdencies.jar

export PINOT_VERSION=0.5.0-SNAPSHOT
export PINOT_DISTRIBUTION_DIR=${PINOT_ROOT_DIR}/pinot-distribution/target/apache-pinot-${PINOT_VERSION}-bin/apache-pinot-${PINOT_VERSION}-bin
cd ${PINOT_DISTRIBUTION_DIR}
${SPARK_HOME}/bin/spark-submit \
  --class org.apache.pinot.tools.admin.command.LaunchDataIngestionJobCommand \
  --master "local[2]" \
  --deploy-mode client \
  --conf "spark.driver.extraJavaOptions=-Dplugins.dir=${PINOT_DISTRIBUTION_DIR}/plugins -Dlog4j2.configurationFile=${PINOT_DISTRIBUTION_DIR}/conf/pinot-ingestion-job-log4j2.xml" \
  --conf "spark.driver.extraClassPath=${PINOT_DISTRIBUTION_DIR}/lib/pinot-all-${PINOT_VERSION}-jar-with-dependencies.jar" \
  local://${PINOT_DISTRIBUTION_DIR}/lib/pinot-all-${PINOT_VERSION}-jar-with-dependencies.jar \
  -jobSpecFile ${PINOT_DISTRIBUTION_DIR}/examples/batch/airlineStats/sparkIngestionJobSpec.yaml

Executing the job using Hadoop

Below command shows how to use Hadoop jar command to run a Hadoop job using pinot-all-${PINOT_VERSION}-jar-with-dependencies.jar.

Sample Hadoop ingestion job spec yaml(also located at examples/batch/airlineStats/hadoopIngestionJobSpec.yaml):

# executionFrameworkSpec: Defines ingestion jobs to be running.
executionFrameworkSpec:

  # name: execution framework name
  name: 'hadoop'

  # segmentGenerationJobRunnerClassName: class name implements org.apache.pinot.spi.batch.ingestion.runner.SegmentGenerationJobRunner interface.
  segmentGenerationJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.hadoop.HadoopSegmentGenerationJobRunner'

  # segmentTarPushJobRunnerClassName: class name implements org.apache.pinot.spi.batch.ingestion.runner.SegmentTarPushJobRunner interface.
  segmentTarPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.hadoop.HadoopSegmentTarPushJobRunner'

  # segmentUriPushJobRunnerClassName: class name implements org.apache.pinot.spi.batch.ingestion.runner.SegmentUriPushJobRunner interface.
  segmentUriPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.hadoop.HadoopSegmentUriPushJobRunner'

  # extraConfigs: extra configs for execution framework.
  extraConfigs:

    # stagingDir is used in distributed filesystem to host all the segments then move this directory entirely to output directory.
    stagingDir: examples/batch/airlineStats/staging

# jobType: Pinot ingestion job type.
# Supported job types are:
#   'SegmentCreation'
#   'SegmentTarPush'
#   'SegmentUriPush'
#   'SegmentCreationAndTarPush'
#   'SegmentCreationAndUriPush'
jobType: SegmentCreationAndTarPush

# inputDirURI: Root directory of input data, expected to have scheme configured in PinotFS.
inputDirURI: 'examples/batch/airlineStats/rawdata'

# includeFileNamePattern: include file name pattern, supported glob pattern.
# Sample usage:
#   'glob:*.avro' will include all avro files just under the inputDirURI, not sub directories;
#   'glob:**\/*.avro' will include all the avro files under inputDirURI recursively.
includeFileNamePattern: 'glob:**/*.avro'

# excludeFileNamePattern: exclude file name pattern, supported glob pattern.
# Sample usage:
#   'glob:*.avro' will exclude all avro files just under the inputDirURI, not sub directories;
#   'glob:**\/*.avro' will exclude all the avro files under inputDirURI recursively.
# _excludeFileNamePattern: ''

# outputDirURI: Root directory of output segments, expected to have scheme configured in PinotFS.
outputDirURI: 'examples/batch/airlineStats/segments'

# overwriteOutput: Overwrite output segments if existed.
overwriteOutput: true

# pinotFSSpecs: defines all related Pinot file systems.
pinotFSSpecs:

  - # scheme: used to identify a PinotFS.
    # E.g. local, hdfs, dbfs, etc
    scheme: file

    # className: Class name used to create the PinotFS instance.
    # E.g.
    #   org.apache.pinot.spi.filesystem.LocalPinotFS is used for local filesystem
    #   org.apache.pinot.plugin.filesystem.AzurePinotFS is used for Azure Data Lake
    #   org.apache.pinot.plugin.filesystem.HadoopPinotFS is used for HDFS
    className: org.apache.pinot.plugin.filesystem.HadoopPinotFS

# recordReaderSpec: defines all record reader
recordReaderSpec:

  # dataFormat: Record data format, e.g. 'avro', 'parquet', 'orc', 'csv', 'json', 'thrift' etc.
  dataFormat: 'avro'

  # className: Corresponding RecordReader class name.
  # E.g.
  #   org.apache.pinot.plugin.inputformat.avro.AvroRecordReader
  #   org.apache.pinot.plugin.inputformat.csv.CSVRecordReader
  #   org.apache.pinot.plugin.inputformat.parquet.ParquetRecordReader
  #   org.apache.pinot.plugin.inputformat.json.JSONRecordReader
  #   org.apache.pinot.plugin.inputformat.orc.ORCRecordReader
  #   org.apache.pinot.plugin.inputformat.thrift.ThriftRecordReader
  className: 'org.apache.pinot.plugin.inputformat.avro.AvroRecordReader'

# tableSpec: defines table name and where to fetch corresponding table config and table schema.
tableSpec:

  # tableName: Table name
  tableName: 'airlineStats'

  # schemaURI: defines where to read the table schema, supports PinotFS or HTTP.
  # E.g.
  #   hdfs://path/to/table_schema.json
  #   http://localhost:9000/tables/myTable/schema
  schemaURI: 'http://localhost:9000/tables/airlineStats/schema'

  # tableConfigURI: defines where to reade the table config.
  # Supports using PinotFS or HTTP.
  # E.g.
  #   hdfs://path/to/table_config.json
  #   http://localhost:9000/tables/myTable
  # Note that the API to read Pinot table config directly from pinot controller contains a JSON wrapper.
  # The real table config is the object under the field 'OFFLINE'.
  tableConfigURI: 'http://localhost:9000/tables/airlineStats'

# segmentNameGeneratorSpec: defines how to init a SegmentNameGenerator.
segmentNameGeneratorSpec:

  # type: Current supported type is 'simple' and 'normalizedDate'.
  type: normalizedDate

  # configs: Configs to init SegmentNameGenerator.
  configs:
    segment.name.prefix: 'airlineStats_batch'
    exclude.sequence.id: true

# pinotClusterSpecs: defines the Pinot Cluster Access Point.
pinotClusterSpecs:
  - # controllerURI: used to fetch table/schema information and data push.
    # E.g. http://localhost:9000
    controllerURI: 'http://localhost:9000'

# pushJobSpec: defines segment push job related configuration.
pushJobSpec:

  # pushParallelism: push job parallelism, default is 1.
  pushParallelism: 2

  # pushAttempts: number of attempts for push job, default is 1, which means no retry.
  pushAttempts: 2

  # pushRetryIntervalMillis: retry wait Ms, default to 1 second.
  pushRetryIntervalMillis: 1000

Please ensure parameter PINOT_ROOT_DIR and PINOT_VERSION are set properly.

export PINOT_VERSION=0.5.0-SNAPSHOT
export PINOT_DISTRIBUTION_DIR=${PINOT_ROOT_DIR}/pinot-distribution/target/apache-pinot-${PINOT_VERSION}-bin/apache-pinot-${PINOT_VERSION}-bin
export HADOOP_CLIENT_OPTS="-Dplugins.dir=${PINOT_DISTRIBUTION_DIR}/plugins -Dlog4j2.configurationFile=${PINOT_DISTRIBUTION_DIR}/conf/pinot-ingestion-job-log4j2.xml"
hadoop jar  \
        ${PINOT_DISTRIBUTION_DIR}/lib/pinot-all-${PINOT_VERSION}-jar-with-dependencies.jar \
        org.apache.pinot.tools.admin.command.LaunchDataIngestionJobCommand \
        -jobSpecFile ${PINOT_DISTRIBUTION_DIR}/examples/batch/airlineStats/hadoopIngestionJobSpec.yaml

Tunning

You can set Environment Variable: JAVA_OPTS to modify:

  • Log4j2 file location with -Dlog4j2.configurationFile

  • Plugin directory location with -Dplugins.dir=/opt/pinot/plugins

  • JVM props, like -Xmx8g -Xms4G

Please note that you need to config above three all together in JAVA_OPTS. If you only config JAVA_OPTS="-Xmx4g" then plugins.dir is empty usually will cause job failure.

E.g.

docker run --rm -ti -e JAVA_OPTS="-Xms8G -Dlog4j2.configurationFile=/opt/pinot/conf/pinot-admin-log4j2.xml  -Dplugins.dir=/opt/pinot/plugins" --name pinot-data-ingestion-job apachepinot/pinot:latest LaunchDataIngestionJob -jobSpecFile /path/to/ingestion_job_spec.yaml

You can also add your customized JAVA_OPTS if necessary.

Star-Tree Index

Unlike other index techniques which work on single column, Star-Tree index is built on multiple columns, and utilize the pre-aggregated results to significantly reduce the number of values to be processed, thus improve the query performance.

One of the biggest challenges in realtime OLAP systems is achieving and maintaining tight SLA’s on latency and throughput on large data sets. Existing techniques such as sorted index or inverted index help improve query latencies, but speed-ups are still limited by number of documents necessary to process for computing the results. On the other hand, pre-aggregating the results ensures a constant upper bound on query latencies, but can lead to storage space explosion.

Here we introduce star-tree index to utilize the pre-aggregated documents in a smart way to achieve low query latencies but also use the storage space efficiently for aggregation/group-by queries.

Existing solutions

Consider the following data set as an example to discuss the existing approaches:

Country

Browser

Locale

Impressions

CA

Chrome

en

400

CA

Firefox

fr

200

MX

Safari

es

300

MX

Safari

en

100

USA

Chrome

en

600

USA

Firefox

es

200

USA

Firefox

en

400

Sorted index

In this approach, data is sorted on a primary key, which is likely to appear as filter in most queries in the query set.

This reduces the time to search the documents for a given primary key value from linear scan O(n) to binary search O(logn), and also keeps good locality for the documents selected.

While this is a good improvement over linear scan, there are still a few issues with this approach:

  • While sorting on one column does not require additional space, sorting on additional columns would require additional storage space to re-index the records for the various sort orders.

  • While search time is reduced from O(n) to O(logn), overall latency is still a function of total number of documents need to be processed to answer a query.

Inverted index

In this approach, for each value of a given column, we maintain a list of document id’s where this value appears.

Below are the inverted indexes for columns ‘Browser’ and ‘Locale’ for our example data set:

Browser

Doc Id

Firefox

1,5,6

Chrome

0,4

Safari

2,3

Locale

Doc Id

en

0,3,4,6

es

2,5

fr

1

For example, if we want to get all the documents where ‘Browser’ is ‘Firefox’, we can simply look up the inverted index for ‘Browser’ and identify that it appears in documents [1, 5, 6].

Using inverted index, we can reduce the search time to constant time O(1). However, the query latency is still a function of the selectivity of the query, i.e. increases with the number of documents need to be processed to answer the query.

Pre-aggregation

In this technique, we pre-compute the answer for a given query set upfront.

In the example below, we have pre-aggregated the total impressions for each country:

Country

Impressions

CA

600

MX

400

USA

1200

Doing so makes answering queries about total impressions for a country just a value lookup, by eliminating the need of processing a large number of documents. However, to be able to answer with multiple predicates implies pre-aggregating for various combinations of different dimensions. This leads to exponential explosion in storage space.

Star-tree solution

On one end of the spectrum we have indexing techniques that improve search times with limited increase in space, but do not guarantee a hard upper bound on query latencies. On the other end of the spectrum we have pre-aggregation techniques that offer hard upper bound on query latencies, but suffer from exponential explosion of storage space

Space-Time Trade Off Between Different Techniques

We propose the Star-Tree data structure that offers a configurable trade-off between space and time and allows us to achieve hard upper bound for query latencies for a given use case. In the following sections we will define the Star-Tree data structure, and discuss how it is utilized within Pinot for achieving low latencies with high throughput.

Definitions

Tree structure

Star-tree is a tree data structure that is consisted of the following properties:

Star-tree Structure

  • Root Node (Orange): Single root node, from which the rest of the tree can be traversed.

  • Leaf Node (Blue): A leaf node can containing at most T records, where T is configurable.

  • Non-leaf Node (Green): Nodes with more than T records are further split into children nodes.

  • Star-Node (Yellow): Non-leaf nodes can also have a special child node called the Star-Node. This node contains the pre-aggregated records after removing the dimension on which the data was split for this level.

  • Dimensions Split Order ([D1, D2]): Nodes at a given level in the tree are split into children nodes on all values of a particular dimension. The dimensions split order is an ordered list of dimensions that is used to determine the dimension to split on for a given level in the tree.

Node properties

The properties stored in each node are as follows:

  • Dimension: The dimension which the node is split on

  • Start/End Document Id: The range of documents this node points to

  • Aggregated Document Id: One single document which is the aggregation result of all documents pointed by this node

Index generation

Star-tree index is generated in the following steps:

  • The data is first projected as per the dimensionsSplitOrder. Only the dimensions from the split order are reserved, others are dropped. For each unique combination of reserved dimensions, metrics are aggregated per configuration. The aggregated documents are written to a file and served as the initial Star-Tree documents (separate from the original documents).

  • Sort the Star-Tree documents based on the dimensionsSplitOrder. It is primary-sorted on the first dimension in this list, and then secondary sorted on the rest of the dimensions based on their order in the list. Each node in the tree points to a range in the sorted documents.

  • The tree structure can be created recursively (starting at root node) as follows:

    • If a node has more than T records, it is split into multiple children nodes, one for each value of the dimension in the split order corresponding to current level in the tree.

    • A Star-Node can be created (per configuration) for the current node, by dropping the dimension being split on, and aggregating the metrics for rows containing dimensions with identical values. These aggregated documents are appended to the end of the Star-Tree documents.

      If there is only one value for the current dimension, Star-Node won’t be created because the documents under the Star-Node are identical to the single node.

  • The above step is repeated recursively until there are no more nodes to split.

  • Multiple Star-Trees can be generated based on different configurations (dimensionsSplitOrder, aggregations, T)

Aggregation

Aggregation is configured as a pair of aggregation function and the column to apply the aggregation.

All types of aggregation function with bounded-sized intermediate result are supported.

Supported functions

  • COUNT

  • MIN

  • MAX

  • SUM

  • AVG

  • MIN_MAX_RANGE

  • DISTINCT_COUNT_HLL

  • PERCENTILE_EST

  • PERCENTILE_TDIGEST

  • DISTINCT_COUNT_BITMAP

    • NOTE: The intermediate result RoaringBitmap is not bounded-sized, use carefully on high cardinality columns)

Unsupported functions

  • DISTINCT_COUNT

    • Intermediate result Set is unbounded

  • SEGMENT_PARTITIONED_DISTINCT_COUNT:

    • Intermediate result Set is unbounded

  • PERCENTILE

    • Intermediate result List is unbounded

Functions to be supported

  • DISTINCT_COUNT_THETA_SKETCH

  • ST_UNION

Index generation configuration

Multiple index generation configurations can be provided to generate multiple star-trees. Each configuration should contain the following properties:

  • dimensionsSplitOrder: An ordered list of dimension names can be specified to configure the split order. Only the dimensions in this list are reserved in the aggregated documents. The nodes will be split based on the order of this list. For example, split at level i is performed on the values of dimension at index i in the list.

    • The star-tree dimension does not have to be a dimension column in the table, it can also be time column, date-time column, or metric column if necessary.

    • The star-tree dimension column should be dictionary encoded in order to generate the star-tree index.

    • All columns in the filter and group-by clause of a query should be included in this list in order to use the star-tree index.

  • skipStarNodeCreationForDimensions (Optional, default empty): A list of dimension names for which to not create the Star-Node.

  • functionColumnPairs: A list of aggregation function and column pairs (split by double underscore “__”). E.g. SUM__Impressions (SUM of column Impressions) or COUNT__*.

    • The column within the function-column pair can be either dictionary encoded or raw.

    • All aggregations of a query should be included in this list in order to use the star-tree index.

  • maxLeafRecords (Optional, default 10000): The threshold T to determine whether to further split each node.

Default index generation configuration

Default star-tree index can be added to the segment with a boolean config enableDefaultStarTree under the tableIndexConfig.

The default star-tree will have the following configuration:

  • All dictionary-encoded single-value dimensions with cardinality smaller or equal to a threshold (10000) will be included in the dimensionsSplitOrder, sorted by their cardinality in descending order.

  • All dictionary-encoded Time/DateTime columns will be appended to the dimensionsSplitOrder following the dimensions, sorted by their cardinality in descending order. Here we assume that time columns will be included in most queries as the range filter column and/or the group by column, so for better performance, we always include them as the last elements in the dimensionsSplitOrder.

  • Include COUNT(*) and SUM for all numeric metrics in the functionColumnPairs.

  • Use default maxLeafRecords (10000).

Example

For our example data set, in order to solve the following query efficiently:

SELECT SUM(Impressions) FROM myTable WHERE Country = 'USA' AND Browser = 'Chrome' GROUP BY Locale

We may config the star-tree index as following:

"tableIndexConfig": {
  "starTreeIndexConfigs": [{
    "dimensionsSplitOrder": [
      "Country",
      "Browser",
      "Locale"
    ],
    "skipStarNodeCreationForDimensions": [
    ],
    "functionColumnPairs": [
      "SUM__Impressions"
    ],
    "maxLeafRecords": 1
  }],
  ...
}

The star-tree and documents should be something like below:

Tree structure

The values in the parentheses are the aggregated sum of Impressions for all the documents under the node.

Star-tree documents

Country

Browser

Locale

SUM__Impressions

CA

Chrome

en

400

CA

Firefox

fr

200

MX

Safari

en

100

MX

Safari

es

300

USA

Chrome

en

600

USA

Firefox

en

400

USA

Firefox

es

200

CA

*

en

400

CA

*

fr

200

CA

*

*

600

MX

Safari

*

400

USA

Firefox

*

600

USA

*

en

1000

USA

*

es

200

USA

*

*

1200

*

Chrome

en

1000

*

Firefox

en

400

*

Firefox

es

200

*

Firefox

fr

200

*

Firefox

*

800

*

Safari

en

100

*

Safari

es

300

*

Safari

*

400

*

*

en

1500

*

*

es

500

*

*

fr

200

*

*

*

2200

Query execution

For query execution, the idea is to first check metadata to determine whether the query can be solved with the Star-Tree documents, then traverse the Star-Tree to identify documents that satisfy all the predicates. After applying any remaining predicates that were missed while traversing the Star-Tree to the identified documents, apply aggregation/group-by on the qualified documents.

The algorithm to traverse the tree can be described as follows:

  • Start from root node.

  • For each level, what child node(s) to select depends on whether there are any predicates/group-by on the split dimension for the level in the query.

    • If there is no predicate or group-by on the split dimension, select the Star-Node if exists, or all child nodes to traverse further.

    • If there are predicate(s) on the split dimension, select the child node(s) that satisfy the predicate(s).

    • If there is no predicate, but there is a group-by on the split dimension, select all child nodes except Star-Node.

  • Recursively repeat the previous step until all leaf nodes are reached, or all predicates are satisfied.

  • Collect all the documents pointed by the selected nodes.

    • If all predicates and group-by's are satisfied, pick the single aggregated document from each selected node.

    • Otherwise, collect all the documents in the document range from each selected node.

GitHub Events Stream

Steps for setting up a Pinot cluster and a realtime table which consumes from the GitHub events stream.

Pull Request Merged Events Stream

In this recipe, we will

  1. Set up a Pinot cluster, in the steps

    a. Start zookeeper

    b. Start controller

    c. Start broker

    d. Start server

  2. Set up a Kafka cluster

  3. Create a Kafka topic - pullRequestMergedEvents

  4. Create a realtime table - pullRequestMergedEvents and a schema

  5. Start a task which reads from GitHub events API and publishes events about merged pull requests to the topic.

  6. Query the realtime data

Steps

Using Docker images or Launcher Scripts

Pull docker image

Get the latest Docker image.

export PINOT_VERSION=latest
export PINOT_IMAGE=apachepinot/pinot:${PINOT_VERSION}
docker pull ${PINOT_IMAGE}

Long Version

Set up the Pinot cluster

Follow the instructions in Advanced Pinot Setup to setup the Pinot cluster with the components:

  1. Zookeeper

  2. Controller

  3. Broker

  4. Server

  5. Kafka

Create a Kafka topic

Create a Kafka topic called pullRequestMergedEvents for the demo.

docker exec \
  -t kafka \
  /opt/kafka/bin/kafka-topics.sh \
  --zookeeper pinot-zookeeper:2181/kafka \
  --partitions=1 --replication-factor=1 \
  --create --topic pullRequestMergedEvents

Add Pinot table and schema

The schema is present at examples/stream/githubEvents/pullRequestMergedEvents_schema.json and is also pasted below

pullRequestMergedEvents_schema.json
{
  "schemaName": "pullRequestMergedEvents",
  "dimensionFieldSpecs": [
    {
      "name": "title",
      "dataType": "STRING",
      "defaultNullValue": ""
    },
    {
      "name": "labels",
      "dataType": "STRING",
      "singleValueField": false,
      "defaultNullValue": ""
    },
    {
      "name": "userId",
      "dataType": "STRING",
      "defaultNullValue": ""
    },
    {
      "name": "userType",
      "dataType": "STRING",
      "defaultNullValue": ""
    },
    {
      "name": "authorAssociation",
      "dataType": "STRING",
      "defaultNullValue": ""
    },
    {
      "name": "mergedBy",
      "dataType": "STRING",
      "defaultNullValue": ""
    },
    {
      "name": "assignees",
      "dataType": "STRING",
      "singleValueField": false,
      "defaultNullValue": ""
    },
    {
      "name": "authors",
      "dataType": "STRING",
      "singleValueField": false,
      "defaultNullValue": ""
    },
    {
      "name": "committers",
      "dataType": "STRING",
      "singleValueField": false,
      "defaultNullValue": ""
    },
    {
      "name": "requestedReviewers",
      "dataType": "STRING",
      "singleValueField": false,
      "defaultNullValue": ""
    },
    {
      "name": "requestedTeams",
      "dataType": "STRING",
      "singleValueField": false,
      "defaultNullValue": ""
    },
    {
      "name": "reviewers",
      "dataType": "STRING",
      "singleValueField": false,
      "defaultNullValue": ""
    },
    {
      "name": "commenters",
      "dataType": "STRING",
      "singleValueField": false,
      "defaultNullValue": ""
    },
    {
      "name": "repo",
      "dataType": "STRING",
      "defaultNullValue": ""
    },
    {
      "name": "organization",
      "dataType": "STRING",
      "defaultNullValue": ""
    }
  ],
  "metricFieldSpecs": [
    {
      "name": "count",
      "dataType": "LONG",
      "defaultNullValue": 1
    },
    {
      "name": "numComments",
      "dataType": "LONG"
    },
    {
      "name": "numReviewComments",
      "dataType": "LONG"
    },
    {
      "name": "numCommits",
      "dataType": "LONG"
    },
    {
      "name": "numLinesAdded",
      "dataType": "LONG"
    },
    {
      "name": "numLinesDeleted",
      "dataType": "LONG"
    },
    {
      "name": "numFilesChanged",
      "dataType": "LONG"
    },
    {
      "name": "numAuthors",
      "dataType": "LONG"
    },
    {
      "name": "numCommitters",
      "dataType": "LONG"
    },
    {
      "name": "numReviewers",
      "dataType": "LONG"
    },
    {
      "name": "numCommenters",
      "dataType": "LONG"
    },
    {
      "name": "createdTimeMillis",
      "dataType": "LONG"
    },
    {
      "name": "elapsedTimeMillis",
      "dataType": "LONG"
    }
  ],
  "dateTimeFieldSpecs": [
    {
      "name": "mergedTimeMillis",
      "dataType": "TIMESTAMP",
      "format": "1:MILLISECONDS:TIMESTAMP",
      "granularity": "1:MILLISECONDS"
    }
  ]
}

The table config is present at examples/stream/githubEvents/docker/pullRequestMergedEvents_realtime_table_config.json and is also pasted below.

Note If you're setting this up on a pre-configured cluster, set the properties stream.kafka.zk.broker.url and stream.kafka.broker.list correctly, depending on the configuration of your Kafka cluster.

pullRequestMergedEvents_realtime_table_config.json
{
  "tableName": "pullRequestMergedEvents",
  "tableType": "REALTIME",
  "segmentsConfig": {
    "timeColumnName": "mergedTimeMillis",
    "timeType": "MILLISECONDS",
    "retentionTimeUnit": "DAYS",
    "retentionTimeValue": "60",
    "schemaName": "pullRequestMergedEvents",
    "replication": "1",
    "replicasPerPartition": "1"
  },
  "tenants": {},
  "tableIndexConfig": {
    "loadMode": "MMAP",
    "invertedIndexColumns": [
      "organization",
      "repo"
    ],
    "streamConfigs": {
      "streamType": "kafka",
      "stream.kafka.consumer.type": "simple",
      "stream.kafka.topic.name": "pullRequestMergedEvents",
      "stream.kafka.decoder.class.name": "org.apache.pinot.plugin.stream.kafka.KafkaJSONMessageDecoder",
      "stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
      "stream.kafka.zk.broker.url": "pinot-zookeeper:2181/kafka",
      "stream.kafka.broker.list": "kafka:9092",
      "realtime.segment.flush.threshold.time": "12h",
      "realtime.segment.flush.threshold.size": "100000",
      "stream.kafka.consumer.prop.auto.offset.reset": "smallest"
    }
  },
  "metadata": {
    "customConfigs": {}
  }
}

Add the table and schema using the following command

$ docker run \
    --network=pinot-demo \
    --name pinot-streaming-table-creation \
    ${PINOT_IMAGE} AddTable \
    -schemaFile examples/stream/githubEvents/pullRequestMergedEvents_schema.json \
    -tableConfigFile examples/stream/githubEvents/docker/pullRequestMergedEvents_realtime_table_config.json \
    -controllerHost pinot-controller \
    -controllerPort 9000 \
    -exec
Executing command: AddTable -tableConfigFile examples/stream/githubEvents/docker/pullRequestMergedEvents_realtime_table_config.json -schemaFile examples/stream/githubEvents/pullRequestMergedEvents_schema.json -controllerHost pinot-controller -controllerPort 9000 -exec
Sending request: http://pinot-controller:9000/schemas to controller: 20c241022a96, version: Unknown
{"status":"Table pullRequestMergedEvents_REALTIME succesfully added"}

Publish events

Start streaming GitHub events into the Kafka topic

Prerequisites

Generate a personal access token on GitHub.

$ docker run --rm -ti \
    --network=pinot-demo \
    --name pinot-github-events-into-kafka \
    -d ${PINOT_IMAGE} StreamGitHubEvents \
    -schemaFile examples/stream/githubEvents/pullRequestMergedEvents_schema.json \
    -topic pullRequestMergedEvents \
    -personalAccessToken <your_github_personal_access_token> \
    -kafkaBrokerList kafka:9092

Short Version

For a single command to setup all the above steps, use the following command. Make sure to stop any previous running Pinot services.

$ docker run --rm -ti \
    --network=pinot-demo \
    --name pinot-github-events-quick-start \
     ${PINOT_IMAGE} GitHubEventsQuickStart \
    -personalAccessToken <your_github_personal_access_token> 

Get Pinot

Follow instructions in Build from source to get the latest Pinot code

Long Version

Set up the Pinot cluster

Follow the instructions in Advanced Pinot Setup to setup the Pinot cluster with the components:

  1. Zookeeper

  2. Controller

  3. Broker

  4. Server

  5. Kafka

Create a Kafka topic

Download Apache Kafka release.

Create a Kafka topic called pullRequestMergedEvents for the demo.

$ bin/kafka-topics.sh \
  --create \
  --bootstrap-server localhost:19092 \
  --replication-factor 1 \
  --partitions 1 \
  --topic pullRequestMergedEvents

Add Pinot table and schema

Schema can be found at /examples/stream/githubevents/ in the release, and is also pasted below:

{
  "schemaName": "pullRequestMergedEvents",
  "dimensionFieldSpecs": [
    {
      "name": "title",
      "dataType": "STRING",
      "defaultNullValue": ""
    },
    {
      "name": "labels",
      "dataType": "STRING",
      "singleValueField": false,
      "defaultNullValue": ""
    },
    {
      "name": "userId",
      "dataType": "STRING",
      "defaultNullValue": ""
    },
    {
      "name": "userType",
      "dataType": "STRING",
      "defaultNullValue": ""
    },
    {
      "name": "authorAssociation",
      "dataType": "STRING",
      "defaultNullValue": ""
    },
    {
      "name": "mergedBy",
      "dataType": "STRING",
      "defaultNullValue": ""
    },
    {
      "name": "assignees",
      "dataType": "STRING",
      "singleValueField": false,
      "defaultNullValue": ""
    },
    {
      "name": "authors",
      "dataType": "STRING",
      "singleValueField": false,
      "defaultNullValue": ""
    },
    {
      "name": "committers",
      "dataType": "STRING",
      "singleValueField": false,
      "defaultNullValue": ""
    },
    {
      "name": "requestedReviewers",
      "dataType": "STRING",
      "singleValueField": false,
      "defaultNullValue": ""
    },
    {
      "name": "requestedTeams",
      "dataType": "STRING",
      "singleValueField": false,
      "defaultNullValue": ""
    },
    {
      "name": "reviewers",
      "dataType": "STRING",
      "singleValueField": false,
      "defaultNullValue": ""
    },
    {
      "name": "commenters",
      "dataType": "STRING",
      "singleValueField": false,
      "defaultNullValue": ""
    },
    {
      "name": "repo",
      "dataType": "STRING",
      "defaultNullValue": ""
    },
    {
      "name": "organization",
      "dataType": "STRING",
      "defaultNullValue": ""
    }
  ],
  "metricFieldSpecs": [
    {
      "name": "count",
      "dataType": "LONG",
      "defaultNullValue": 1
    },
    {
      "name": "numComments",
      "dataType": "LONG"
    },
    {
      "name": "numReviewComments",
      "dataType": "LONG"
    },
    {
      "name": "numCommits",
      "dataType": "LONG"
    },
    {
      "name": "numLinesAdded",
      "dataType": "LONG"
    },
    {
      "name": "numLinesDeleted",
      "dataType": "LONG"
    },
    {
      "name": "numFilesChanged",
      "dataType": "LONG"
    },
    {
      "name": "numAuthors",
      "dataType": "LONG"
    },
    {
      "name": "numCommitters",
      "dataType": "LONG"
    },
    {
      "name": "numReviewers",
      "dataType": "LONG"
    },
    {
      "name": "numCommenters",
      "dataType": "LONG"
    },
    {
      "name": "createdTimeMillis",
      "dataType": "LONG"
    },
    {
      "name": "elapsedTimeMillis",
      "dataType": "LONG"
    }
  ],
  "timeFieldSpec": {
    "incomingGranularitySpec": {
      "timeType": "MILLISECONDS",
      "timeFormat": "EPOCH",
      "dataType": "LONG",
      "name": "mergedTimeMillis"
    }
  }
}

Table config can be found at /examples/stream/githubevents/ in the release, and is also pasted below.

Note

If you're setting this up on a pre-configured cluster, set the properties stream.kafka.zk.broker.url and stream.kafka.broker.list correctly, depending on the configuration of your Kafka cluster.

{
  "tableName": "pullRequestMergedEvents",
  "tableType": "REALTIME",
  "segmentsConfig": {
    "timeColumnName": "mergedTimeMillis",
    "timeType": "MILLISECONDS",
    "retentionTimeUnit": "DAYS",
    "retentionTimeValue": "60",
    "schemaName": "pullRequestMergedEvents",
    "replication": "1",
    "replicasPerPartition": "1"
  },
  "tenants": {},
  "tableIndexConfig": {
    "loadMode": "MMAP",
    "invertedIndexColumns": [
      "organization",
      "repo"
    ],
    "streamConfigs": {
      "streamType": "kafka",
      "stream.kafka.consumer.type": "simple",
      "stream.kafka.topic.name": "pullRequestMergedEvents",
      "stream.kafka.decoder.class.name": "org.apache.pinot.plugin.stream.kafka.KafkaJSONMessageDecoder",
      "stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
      "stream.kafka.zk.broker.url": "localhost:2191/kafka",
      "stream.kafka.broker.list": "localhost:19092",
      "realtime.segment.flush.threshold.time": "12h",
      "realtime.segment.flush.threshold.size": "100000",
      "stream.kafka.consumer.prop.auto.offset.reset": "smallest"
    }
  },
  "metadata": {
    "customConfigs": {}
  }
}

Add the table and schema using the command

$ bin/pinot-admin.sh AddTable \
  -tableConfigFile $PATH_TO_CONFIGS/examples/stream/githubEvents/pullRequestMergedEvents_realtime_table_config.json \
  -schemaFile $PATH_TO_CONFIGS/examples/stream/githubEvents/pullRequestMergedEvents_schema.json \
  -exec

Publish events

Start streaming GitHub events into the Kafka topic

Prerequisites

Generate a personal access token on GitHub.

$ bin/pinot-admin.sh StreamGitHubEvents \
  -topic pullRequestMergedEvents \
  -personalAccessToken <your_github_personal_access_token> \
  -kafkaBrokerList localhost:19092 \
  -schemaFile $PATH_TO_CONFIGS/examples/stream/githubEvents/pullRequestMergedEvents_schema.json

Short Version

For a single command to setup all the above steps

$ bin/pinot-admin.sh GitHubEventsQuickStart \
  -personalAccessToken <your_github_personal_access_token>

Kubernetes cluster

If you already have a Kubernetes cluster with Pinot and Kafka (see Running Pinot in Kubernetes), first create the topic and then setup the table and streaming using

$ cd kubernetes/helm
$ kubectl apply -f pinot-github-realtime-events.yml

Query

Head over to the Query Console to checkout the data!

Visualizing on SuperSet

You can use SuperSet to visualize this data. Some of the interesting insights we captures were

Most Active organizations during the lockdown

Repositories by number of commits in the Apache organization

To integrate with SuperSet you can check out the SuperSet Integrations page.

Use S3 as Deep Storage for Pinot

Below commands are based on pinot distribution binary.

Setup Pinot Cluster

In order to setup Pinot to use S3 as deep store, we need to put extra configs for Controller and Server.

Start Controller

Below is a sample controller.conf file.

Please config:

controller.data.dirto your s3 bucket. All the uploaded segments will be stored there.

And add s3 as a pinot storage with configs:

pinot.controller.storage.factory.class.s3=org.apache.pinot.plugin.filesystem.S3PinotFS
pinot.controller.storage.factory.s3.region=us-west-2

Regarding AWS Credential, we also follow the convention of DefaultAWSCredentialsProviderChain.

You can specify AccessKey and Secret using:

  • Environment Variables - AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY (RECOMMENDED since they are recognized by all the AWS SDKs and CLI except for .NET), or AWS_ACCESS_KEY and AWS_SECRET_KEY (only recognized by Java SDK)

  • Java System Properties - aws.accessKeyId and aws.secretKey

  • Credential profiles file at the default location (~/.aws/credentials) shared by all AWS SDKs and the AWS CLI

  • Configure AWS credential in pinot config files, e.g. set pinot.controller.storage.factory.s3.accessKey and pinot.controller.storage.factory.s3.secretKey in the config file. (Not recommended)

pinot.controller.storage.factory.s3.accessKey=****************LFVX
pinot.controller.storage.factory.s3.secretKey=****************gfhz

Add s3 to pinot.controller.segment.fetcher.protocols

and set pinot.controller.segment.fetcher.s3.class toorg.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher

controller.data.dir=s3://my.bucket/pinot-data/pinot-s3-example/controller-data
controller.local.temp.dir=/tmp/pinot-tmp-data/
controller.zk.str=localhost:2181
controller.host=127.0.0.1
controller.port=9000
controller.helix.cluster.name=pinot-s3-example
pinot.controller.storage.factory.class.s3=org.apache.pinot.plugin.filesystem.S3PinotFS
pinot.controller.storage.factory.s3.region=us-west-2

pinot.controller.segment.fetcher.protocols=file,http,s3
pinot.controller.segment.fetcher.s3.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher

If you to grant full control to bucket owner, then add this to the config:

pinot.controller.storage.factory.s3.disableAcl=false

Then start pinot controller with:

bin/pinot-admin.sh StartController -configFileName conf/controller.conf

Start Broker

Broker is a simple one you can just start it with default:

bin/pinot-admin.sh StartBroker -zkAddress localhost:2181 -clusterName pinot-s3-example

Start Server

Below is a sample server.conf file

Similar to controller config, please also set s3 configs in pinot server.

pinot.server.netty.port=8098
pinot.server.adminapi.port=8097
pinot.server.instance.dataDir=/tmp/pinot-tmp/server/index
pinot.server.instance.segmentTarDir=/tmp/pinot-tmp/server/segmentTars


pinot.server.storage.factory.class.s3=org.apache.pinot.plugin.filesystem.S3PinotFS
pinot.server.storage.factory.s3.region=us-west-2
pinot.server.segment.fetcher.protocols=file,http,s3
pinot.server.segment.fetcher.s3.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher

If you to grant full control to bucket owner, then add this to the config:

pinot.controller.storage.factory.s3.disableAcl=false

Then start pinot controller with:

bin/pinot-admin.sh StartServer -configFileName conf/server.conf -zkAddress localhost:2181 -clusterName pinot-s3-example

Setup Table

In this demo, we just use airlineStats table as an example.

Create table with below command:

bin/pinot-admin.sh AddTable  -schemaFile examples/batch/airlineStats/airlineStats_schema.json -tableConfigFile examples/batch/airlineStats/airlineStats_offline_table_config.json -exec

Set up Ingestion Jobs

Standalone Job

Below is a sample standalone ingestion job spec with certain notable changes:

  • jobType is SegmentCreationAndUriPush

  • inputDirURI is set to a s3 location s3://my.bucket/batch/airlineStats/rawdata/

  • outputDirURI is set to a s3 location s3://my.bucket/output/airlineStats/segments

  • Add a new PinotFs under pinotFSSpecs

    - scheme: s3
      className: org.apache.pinot.plugin.filesystem.S3PinotFS
      configs:
        region: 'us-west-2'
  • For library version < 0.6.0, please set segmentUriPrefix to [scheme]://[bucket.name], e.g. s3://my.bucket , from version 0.6.0, you can put empty string or just ignore segmentUriPrefix.

Sample ingestionJobSpec.yaml

# executionFrameworkSpec: Defines ingestion jobs to be running.
executionFrameworkSpec:

  # name: execution framework name
  name: 'standalone'

  # segmentGenerationJobRunnerClassName: class name implements org.apache.pinot.spi.batch.ingestion.runner.SegmentGenerationJobRunner interface.
  segmentGenerationJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner'

  # segmentTarPushJobRunnerClassName: class name implements org.apache.pinot.spi.batch.ingestion.runner.SegmentTarPushJobRunner interface.
  segmentTarPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentTarPushJobRunner'

  # segmentUriPushJobRunnerClassName: class name implements org.apache.pinot.spi.batch.ingestion.runner.SegmentUriPushJobRunner interface.
  segmentUriPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentUriPushJobRunner'

# jobType: Pinot ingestion job type.
# Supported job types are:
#   'SegmentCreation'
#   'SegmentTarPush'
#   'SegmentUriPush'
#   'SegmentCreationAndTarPush'
#   'SegmentCreationAndUriPush'
#jobType: SegmentCreationAndUriPush
jobType: SegmentCreationAndUriPush
# inputDirURI: Root directory of input data, expected to have scheme configured in PinotFS.
inputDirURI: 's3://my.bucket/batch/airlineStats/rawdata/'

# includeFileNamePattern: include file name pattern, supported glob pattern.
# Sample usage:
#   'glob:*.avro' will include all avro files just under the inputDirURI, not sub directories;
#   'glob:**/*.avro' will include all the avro files under inputDirURI recursively.
includeFileNamePattern: 'glob:**/*.avro'

# excludeFileNamePattern: exclude file name pattern, supported glob pattern.
# Sample usage:
#   'glob:*.avro' will exclude all avro files just under the inputDirURI, not sub directories;
#   'glob:**/*.avro' will exclude all the avro files under inputDirURI recursively.
# _excludeFileNamePattern: ''

# outputDirURI: Root directory of output segments, expected to have scheme configured in PinotFS.
outputDirURI: 's3://my.bucket/examples/output/airlineStats/segments'

# overwriteOutput: Overwrite output segments if existed.
overwriteOutput: true

# pinotFSSpecs: defines all related Pinot file systems.
pinotFSSpecs:

  - # scheme: used to identify a PinotFS.
    # E.g. local, hdfs, dbfs, etc
    scheme: file

    # className: Class name used to create the PinotFS instance.
    # E.g.
    #   org.apache.pinot.spi.filesystem.LocalPinotFS is used for local filesystem
    #   org.apache.pinot.plugin.filesystem.AzurePinotFS is used for Azure Data Lake
    #   org.apache.pinot.plugin.filesystem.HadoopPinotFS is used for HDFS
    className: org.apache.pinot.spi.filesystem.LocalPinotFS


  - scheme: s3
    className: org.apache.pinot.plugin.filesystem.S3PinotFS
    configs:
      region: 'us-west-2'

# recordReaderSpec: defines all record reader
recordReaderSpec:

  # dataFormat: Record data format, e.g. 'avro', 'parquet', 'orc', 'csv', 'json', 'thrift' etc.
  dataFormat: 'avro'

  # className: Corresponding RecordReader class name.
  # E.g.
  #   org.apache.pinot.plugin.inputformat.avro.AvroRecordReader
  #   org.apache.pinot.plugin.inputformat.csv.CSVRecordReader
  #   org.apache.pinot.plugin.inputformat.parquet.ParquetRecordReader
  #   org.apache.pinot.plugin.inputformat.json.JSONRecordReader
  #   org.apache.pinot.plugin.inputformat.orc.ORCRecordReader
  #   org.apache.pinot.plugin.inputformat.thrift.ThriftRecordReader
  className: 'org.apache.pinot.plugin.inputformat.avro.AvroRecordReader'

# tableSpec: defines table name and where to fetch corresponding table config and table schema.
tableSpec:

  # tableName: Table name
  tableName: 'airlineStats'

  # schemaURI: defines where to read the table schema, supports PinotFS or HTTP.
  # E.g.
  #   hdfs://path/to/table_schema.json
  #   http://localhost:9000/tables/myTable/schema
  schemaURI: 'http://localhost:9000/tables/airlineStats/schema'

  # tableConfigURI: defines where to reade the table config.
  # Supports using PinotFS or HTTP.
  # E.g.
  #   hdfs://path/to/table_config.json
  #   http://localhost:9000/tables/myTable
  # Note that the API to read Pinot table config directly from pinot controller contains a JSON wrapper.
  # The real table config is the object under the field 'OFFLINE'.
  tableConfigURI: 'http://localhost:9000/tables/airlineStats'

# pinotClusterSpecs: defines the Pinot Cluster Access Point.
pinotClusterSpecs:
  - # controllerURI: used to fetch table/schema information and data push.
    # E.g. http://localhost:9000
    controllerURI: 'http://localhost:9000'

# pushJobSpec: defines segment push job related configuration.
pushJobSpec:

  # pushAttempts: number of attempts for push job, default is 1, which means no retry.
  pushAttempts: 2

  # pushRetryIntervalMillis: retry wait Ms, default to 1 second.
  pushRetryIntervalMillis: 1000

  # For Pinot version < 0.6.0, use [scheme]://[bucket.name] as prefix.
  # E.g. s3://my.bucket
  segmentUriPrefix: 's3://my.bucket'
  segmentUriSuffix: ''

Below is a sample job output:

➜ bin/pinot-ingestion-job.sh -jobSpecFile  ~/temp/pinot/pinot-s3-test/ingestionJobSpec.yaml
2020/08/18 16:11:03.521 INFO [IngestionJobLauncher] [main] SegmentGenerationJobSpec:
!!org.apache.pinot.spi.ingestion.batch.spec.SegmentGenerationJobSpec
excludeFileNamePattern: null
executionFrameworkSpec: {extraConfigs: null, name: standalone, segmentGenerationJobRunnerClassName: org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner,
  segmentTarPushJobRunnerClassName: org.apache.pinot.plugin.ingestion.batch.standalone.SegmentTarPushJobRunner,
  segmentUriPushJobRunnerClassName: org.apache.pinot.plugin.ingestion.batch.standalone.SegmentUriPushJobRunner}
includeFileNamePattern: glob:**/*.avro
inputDirURI: s3://my.bucket/batch/airlineStats/rawdata/
jobType: SegmentUriPush
outputDirURI: s3://my.bucket/examples/output/airlineStats/segments
overwriteOutput: true
pinotClusterSpecs:
- {controllerURI: 'http://localhost:9000'}
pinotFSSpecs:
- {className: org.apache.pinot.spi.filesystem.LocalPinotFS, configs: null, scheme: file}
- className: org.apache.pinot.plugin.filesystem.S3PinotFS
  configs: {region: us-west-2}
  scheme: s3
pushJobSpec: {pushAttempts: 2, pushParallelism: 1, pushRetryIntervalMillis: 1000,
  segmentUriPrefix: '', segmentUriSuffix: ''}
recordReaderSpec: {className: org.apache.pinot.plugin.inputformat.avro.AvroRecordReader,
  configClassName: null, configs: null, dataFormat: avro}
segmentNameGeneratorSpec: null
tableSpec: {schemaURI: 'http://localhost:9000/tables/airlineStats/schema', tableConfigURI: 'http://localhost:9000/tables/airlineStats',
  tableName: airlineStats}

2020/08/18 16:11:03.531 INFO [IngestionJobLauncher] [main] Trying to create instance for class org.apache.pinot.plugin.ingestion.batch.standalone.SegmentUriPushJobRunner
2020/08/18 16:11:03.654 INFO [PinotFSFactory] [main] Initializing PinotFS for scheme file, classname org.apache.pinot.spi.filesystem.LocalPinotFS
2020/08/18 16:11:03.656 INFO [PinotFSFactory] [main] Initializing PinotFS for scheme s3, classname org.apache.pinot.plugin.filesystem.S3PinotFS
2020/08/18 16:11:05.520 INFO [SegmentPushUtils] [main] Start sending table airlineStats segment URIs: [s3://my.bucket/examples/output/airlineStats/segments/2014/01/01/airlineStats_OFFLINE_16071_16071_0.tar.gz, s3://my.bucket/examples/output/airlineStats/segments/2014/01/02/airlineStats_OFFLINE_16072_16072_1.tar.gz, s3://my.bucket/examples/output/airlineStats/segments/2014/01/03/airlineStats_OFFLINE_16073_16073_2.tar.gz, s3://my.bucket/examples/output/airlineStats/segments/2014/01/04/airlineStats_OFFLINE_16074_16074_3.tar.gz, s3://my.bucket/examples/output/airlineStats/segments/2014/01/05/airlineStats_OFFLINE_16075_16075_4.tar.gz] to locations: [org.apache.pinot.spi.ingestion.batch.spec.PinotClusterSpec@4e07b95f]
2020/08/18 16:11:05.521 INFO [SegmentPushUtils] [main] Sending table airlineStats segment URI: s3://my.bucket/examples/output/airlineStats/segments/2014/01/01/airlineStats_OFFLINE_16071_16071_0.tar.gz to location: http://localhost:9000 for
2020/08/18 16:11:09.356 INFO [FileUploadDownloadClient] [main] Sending request: http://localhost:9000/v2/segments to controller: localhost, version: Unknown
2020/08/18 16:11:09.358 INFO [SegmentPushUtils] [main] Response for pushing table airlineStats segment uri s3://my.bucket/examples/output/airlineStats/segments/2014/01/01/airlineStats_OFFLINE_16071_16071_0.tar.gz to location http://localhost:9000 - 200: {"status":"Successfully uploaded segment: airlineStats_OFFLINE_16071_16071_0 of table: airlineStats_OFFLINE"}
2020/08/18 16:11:09.359 INFO [SegmentPushUtils] [main] Sending table airlineStats segment URI: s3://my.bucket/examples/output/airlineStats/segments/2014/01/02/airlineStats_OFFLINE_16072_16072_1.tar.gz to location: http://localhost:9000 for
2020/08/18 16:11:09.824 INFO [FileUploadDownloadClient] [main] Sending request: http://localhost:9000/v2/segments to controller: localhost, version: Unknown
2020/08/18 16:11:09.825 INFO [SegmentPushUtils] [main] Response for pushing table airlineStats segment uri s3://my.bucket/examples/output/airlineStats/segments/2014/01/02/airlineStats_OFFLINE_16072_16072_1.tar.gz to location http://localhost:9000 - 200: {"status":"Successfully uploaded segment: airlineStats_OFFLINE_16072_16072_1 of table: airlineStats_OFFLINE"}
2020/08/18 16:11:09.825 INFO [SegmentPushUtils] [main] Sending table airlineStats segment URI: s3://my.bucket/examples/output/airlineStats/segments/2014/01/03/airlineStats_OFFLINE_16073_16073_2.tar.gz to location: http://localhost:9000 for
2020/08/18 16:11:10.500 INFO [FileUploadDownloadClient] [main] Sending request: http://localhost:9000/v2/segments to controller: localhost, version: Unknown
2020/08/18 16:11:10.501 INFO [SegmentPushUtils] [main] Response for pushing table airlineStats segment uri s3://my.bucket/examples/output/airlineStats/segments/2014/01/03/airlineStats_OFFLINE_16073_16073_2.tar.gz to location http://localhost:9000 - 200: {"status":"Successfully uploaded segment: airlineStats_OFFLINE_16073_16073_2 of table: airlineStats_OFFLINE"}
2020/08/18 16:11:10.501 INFO [SegmentPushUtils] [main] Sending table airlineStats segment URI: s3://my.bucket/examples/output/airlineStats/segments/2014/01/04/airlineStats_OFFLINE_16074_16074_3.tar.gz to location: http://localhost:9000 for
2020/08/18 16:11:10.967 INFO [FileUploadDownloadClient] [main] Sending request: http://localhost:9000/v2/segments to controller: localhost, version: Unknown
2020/08/18 16:11:10.968 INFO [SegmentPushUtils] [main] Response for pushing table airlineStats segment uri s3://my.bucket/examples/output/airlineStats/segments/2014/01/04/airlineStats_OFFLINE_16074_16074_3.tar.gz to location http://localhost:9000 - 200: {"status":"Successfully uploaded segment: airlineStats_OFFLINE_16074_16074_3 of table: airlineStats_OFFLINE"}
2020/08/18 16:11:10.969 INFO [SegmentPushUtils] [main] Sending table airlineStats segment URI: s3://my.bucket/examples/output/airlineStats/segments/2014/01/05/airlineStats_OFFLINE_16075_16075_4.tar.gz to location: http://localhost:9000 for
2020/08/18 16:11:11.420 INFO [FileUploadDownloadClient] [main] Sending request: http://localhost:9000/v2/segments to controller: localhost, version: Unknown
2020/08/18 16:11:11.420 INFO [SegmentPushUtils] [main] Response for pushing table airlineStats segment uri s3://my.bucket/examples/output/airlineStats/segments/2014/01/05/airlineStats_OFFLINE_16075_16075_4.tar.gz to location http://localhost:9000 - 200: {"status":"Successfully uploaded segment: airlineStats_OFFLINE_16075_16075_4 of table: airlineStats_OFFLINE"}
2020/08/18 16:11:11.421 INFO [SegmentPushUtils] [main] Sending table airlineStats segment URI: s3://my.bucket/examples/output/airlineStats/segments/2014/01/06/airlineStats_OFFLINE_16076_16076_5.tar.gz to location: http://localhost:9000 for
2020/08/18 16:11:11.872 INFO [FileUploadDownloadClient] [main] Sending request: http://localhost:9000/v2/segments to controller: localhost, version: Unknown
2020/08/18 16:11:11.873 INFO [SegmentPushUtils] [main] Response for pushing table airlineStats segment uri s3://my.bucket/examples/output/airlineStats/segments/2014/01/06/airlineStats_OFFLINE_16076_16076_5.tar.gz to location http://localhost:9000 - 200: {"status":"Successfully uploaded segment: airlineStats_OFFLINE_16076_16076_5 of table: airlineStats_OFFLINE"}
2020/08/18 16:11:11.877 INFO [SegmentPushUtils] [main] Sending table airlineStats segment URI: s3://my.bucket/examples/output/airlineStats/segments/2014/01/07/airlineStats_OFFLINE_16077_16077_6.tar.gz to location: http://localhost:9000 for
2020/08/18 16:11:12.293 INFO [FileUploadDownloadClient] [main] Sending request: http://localhost:9000/v2/segments to controller: localhost, version: Unknown
2020/08/18 16:11:12.294 INFO [SegmentPushUtils] [main] Response for pushing table airlineStats segment uri s3://my.bucket/examples/output/airlineStats/segments/2014/01/07/airlineStats_OFFLINE_16077_16077_6.tar.gz to location http://localhost:9000 - 200: {"status":"Successfully uploaded segment: airlineStats_OFFLINE_16077_16077_6 of table: airlineStats_OFFLINE"}
2020/08/18 16:11:12.295 INFO [SegmentPushUtils] [main] Sending table airlineStats segment URI: s3://my.bucket/examples/output/airlineStats/segments/2014/01/08/airlineStats_OFFLINE_16078_16078_7.tar.gz to location: http://localhost:9000 for
2020/08/18 16:11:12.672 INFO [FileUploadDownloadClient] [main] Sending request: http://localhost:9000/v2/segments to controller: localhost, version: Unknown
2020/08/18 16:11:12.673 INFO [SegmentPushUtils] [main] Response for pushing table airlineStats segment uri s3://my.bucket/examples/output/airlineStats/segments/2014/01/08/airlineStats_OFFLINE_16078_16078_7.tar.gz to location http://localhost:9000 - 200: {"status":"Successfully uploaded segment: airlineStats_OFFLINE_16078_16078_7 of table: airlineStats_OFFLINE"}
2020/08/18 16:11:12.674 INFO [SegmentPushUtils] [main] Sending table airlineStats segment URI: s3://my.bucket/examples/output/airlineStats/segments/2014/01/09/airlineStats_OFFLINE_16079_16079_8.tar.gz to location: http://localhost:9000 for
2020/08/18 16:11:13.048 INFO [FileUploadDownloadClient] [main] Sending request: http://localhost:9000/v2/segments to controller: localhost, version: Unknown
2020/08/18 16:11:13.050 INFO [SegmentPushUtils] [main] Response for pushing table airlineStats segment uri s3://my.bucket/examples/output/airlineStats/segments/2014/01/09/airlineStats_OFFLINE_16079_16079_8.tar.gz to location http://localhost:9000 - 200: {"status":"Successfully uploaded segment: airlineStats_OFFLINE_16079_16079_8 of table: airlineStats_OFFLINE"}
2020/08/18 16:11:13.051 INFO [SegmentPushUtils] [main] Sending table airlineStats segment URI: s3://my.bucket/examples/output/airlineStats/segments/2014/01/10/airlineStats_OFFLINE_16080_16080_9.tar.gz to location: http://localhost:9000 for
2020/08/18 16:11:13.483 INFO [FileUploadDownloadClient] [main] Sending request: http://localhost:9000/v2/segments to controller: localhost, version: Unknown
2020/08/18 16:11:13.485 INFO [SegmentPushUtils] [main] Response for pushing table airlineStats segment uri s3://my.bucket/examples/output/airlineStats/segments/2014/01/10/airlineStats_OFFLINE_16080_16080_9.tar.gz to location http://localhost:9000 - 200: {"status":"Successfully uploaded segment: airlineStats_OFFLINE_16080_16080_9 of table: airlineStats_OFFLINE"}
2020/08/18 16:11:13.486 INFO [SegmentPushUtils] [main] Sending table airlineStats segment URI: s3://my.bucket/examples/output/airlineStats/segments/2014/01/11/airlineStats_OFFLINE_16081_16081_10.tar.gz to location: http://localhost:9000 for
2020/08/18 16:11:14.080 INFO [FileUploadDownloadClient] [main] Sending request: http://localhost:9000/v2/segments to controller: localhost, version: Unknown
2020/08/18 16:11:14.081 INFO [SegmentPushUtils] [main] Response for pushing table airlineStats segment uri s3://my.bucket/examples/output/airlineStats/segments/2014/01/11/airlineStats_OFFLINE_16081_16081_10.tar.gz to location http://localhost:9000 - 200: {"status":"Successfully uploaded segment: airlineStats_OFFLINE_16081_16081_10 of table: airlineStats_OFFLINE"}
2020/08/18 16:11:14.082 INFO [SegmentPushUtils] [main] Sending table airlineStats segment URI: s3://my.bucket/examples/output/airlineStats/segments/2014/01/12/airlineStats_OFFLINE_16082_16082_11.tar.gz to location: http://localhost:9000 for
2020/08/18 16:11:14.477 INFO [FileUploadDownloadClient] [main] Sending request: http://localhost:9000/v2/segments to controller: localhost, version: Unknown
2020/08/18 16:11:14.477 INFO [SegmentPushUtils] [main] Response for pushing table airlineStats segment uri s3://my.bucket/examples/output/airlineStats/segments/2014/01/12/airlineStats_OFFLINE_16082_16082_11.tar.gz to location http://localhost:9000 - 200: {"status":"Successfully uploaded segment: airlineStats_OFFLINE_16082_16082_11 of table: airlineStats_OFFLINE"}
2020/08/18 16:11:14.478 INFO [SegmentPushUtils] [main] Sending table airlineStats segment URI: s3://my.bucket/examples/output/airlineStats/segments/2014/01/13/airlineStats_OFFLINE_16083_16083_12.tar.gz to location: http://localhost:9000 for
2020/08/18 16:11:14.865 INFO [FileUploadDownloadClient] [main] Sending request: http://localhost:9000/v2/segments to controller: localhost, version: Unknown
2020/08/18 16:11:14.866 INFO [SegmentPushUtils] [main] Response for pushing table airlineStats segment uri s3://my.bucket/examples/output/airlineStats/segments/2014/01/13/airlineStats_OFFLINE_16083_16083_12.tar.gz to location http://localhost:9000 - 200: {"status":"Successfully uploaded segment: airlineStats_OFFLINE_16083_16083_12 of table: airlineStats_OFFLINE"}
2020/08/18 16:11:14.867 INFO [SegmentPushUtils] [main] Sending table airlineStats segment URI: s3://my.bucket/examples/output/airlineStats/segments/2014/01/14/airlineStats_OFFLINE_16084_16084_13.tar.gz to location: http://localhost:9000 for
2020/08/18 16:11:15.257 INFO [FileUploadDownloadClient] [main] Sending request: http://localhost:9000/v2/segments to controller: localhost, version: Unknown
2020/08/18 16:11:15.258 INFO [SegmentPushUtils] [main] Response for pushing table airlineStats segment uri s3://my.bucket/examples/output/airlineStats/segments/2014/01/14/airlineStats_OFFLINE_16084_16084_13.tar.gz to location http://localhost:9000 - 200: {"status":"Successfully uploaded segment: airlineStats_OFFLINE_16084_16084_13 of table: airlineStats_OFFLINE"}
2020/08/18 16:11:15.258 INFO [SegmentPushUtils] [main] Sending table airlineStats segment URI: s3://my.bucket/examples/output/airlineStats/segments/2014/01/15/airlineStats_OFFLINE_16085_16085_14.tar.gz to location: http://localhost:9000 for
2020/08/18 16:11:15.917 INFO [FileUploadDownloadClient] [main] Sending request: http://localhost:9000/v2/segments to controller: localhost, version: Unknown
2020/08/18 16:11:15.919 INFO [SegmentPushUtils] [main] Response for pushing table airlineStats segment uri s3://my.bucket/examples/output/airlineStats/segments/2014/01/15/airlineStats_OFFLINE_16085_16085_14.tar.gz to location http://localhost:9000 - 200: {"status":"Successfully uploaded segment: airlineStats_OFFLINE_16085_16085_14 of table: airlineStats_OFFLINE"}
2020/08/18 16:11:15.919 INFO [SegmentPushUtils] [main] Sending table airlineStats segment URI: s3://my.bucket/examples/output/airlineStats/segments/2014/01/16/airlineStats_OFFLINE_16086_16086_15.tar.gz to location: http://localhost:9000 for
2020/08/18 16:11:16.719 INFO [FileUploadDownloadClient] [main] Sending request: http://localhost:9000/v2/segments to controller: localhost, version: Unknown
2020/08/18 16:11:16.720 INFO [SegmentPushUtils] [main] Response for pushing table airlineStats segment uri s3://my.bucket/examples/output/airlineStats/segments/2014/01/16/airlineStats_OFFLINE_16086_16086_15.tar.gz to location http://localhost:9000 - 200: {"status":"Successfully uploaded segment: airlineStats_OFFLINE_16086_16086_15 of table: airlineStats_OFFLINE"}
2020/08/18 16:11:16.720 INFO [SegmentPushUtils] [main] Sending table airlineStats segment URI: s3://my.bucket/examples/output/airlineStats/segments/2014/01/17/airlineStats_OFFLINE_16087_16087_16.tar.gz to location: http://localhost:9000 for
2020/08/18 16:11:17.346 INFO [FileUploadDownloadClient] [main] Sending request: http://localhost:9000/v2/segments to controller: localhost, version: Unknown
2020/08/18 16:11:17.347 INFO [SegmentPushUtils] [main] Response for pushing table airlineStats segment uri s3://my.bucket/examples/output/airlineStats/segments/2014/01/17/airlineStats_OFFLINE_16087_16087_16.tar.gz to location http://localhost:9000 - 200: {"status":"Successfully uploaded segment: airlineStats_OFFLINE_16087_16087_16 of table: airlineStats_OFFLINE"}
2020/08/18 16:11:17.347 INFO [SegmentPushUtils] [main] Sending table airlineStats segment URI: s3://my.bucket/examples/output/airlineStats/segments/2014/01/18/airlineStats_OFFLINE_16088_16088_17.tar.gz to location: http://localhost:9000 for
2020/08/18 16:11:17.815 INFO [FileUploadDownloadClient] [main] Sending request: http://localhost:9000/v2/segments to controller: localhost, version: Unknown
2020/08/18 16:11:17.816 INFO [SegmentPushUtils] [main] Response for pushing table airlineStats segment uri s3://my.bucket/examples/output/airlineStats/segments/2014/01/18/airlineStats_OFFLINE_16088_16088_17.tar.gz to location http://localhost:9000 - 200: {"status":"Successfully uploaded segment: airlineStats_OFFLINE_16088_16088_17 of table: airlineStats_OFFLINE"}
2020/08/18 16:11:17.816 INFO [SegmentPushUtils] [main] Sending table airlineStats segment URI: s3://my.bucket/examples/output/airlineStats/segments/2014/01/19/airlineStats_OFFLINE_16089_16089_18.tar.gz to location: http://localhost:9000 for
2020/08/18 16:11:18.389 INFO [FileUploadDownloadClient] [main] Sending request: http://localhost:9000/v2/segments to controller: localhost, version: Unknown
2020/08/18 16:11:18.389 INFO [SegmentPushUtils] [main] Response for pushing table airlineStats segment uri s3://my.bucket/examples/output/airlineStats/segments/2014/01/19/airlineStats_OFFLINE_16089_16089_18.tar.gz to location http://localhost:9000 - 200: {"status":"Successfully uploaded segment: airlineStats_OFFLINE_16089_16089_18 of table: airlineStats_OFFLINE"}
2020/08/18 16:11:18.390 INFO [SegmentPushUtils] [main] Sending table airlineStats segment URI: s3://my.bucket/examples/output/airlineStats/segments/2014/01/20/airlineStats_OFFLINE_16090_16090_19.tar.gz to location: http://localhost:9000 for
2020/08/18 16:11:18.978 INFO [FileUploadDownloadClient] [main] Sending request: http://localhost:9000/v2/segments to controller: localhost, version: Unknown
2020/08/18 16:11:18.978 INFO [SegmentPushUtils] [main] Response for pushing table airlineStats segment uri s3://my.bucket/examples/output/airlineStats/segments/2014/01/20/airlineStats_OFFLINE_16090_16090_19.tar.gz to location http://localhost:9000 - 200: {"status":"Successfully uploaded segment: airlineStats_OFFLINE_16090_16090_19 of table: airlineStats_OFFLINE"}
2020/08/18 16:11:18.979 INFO [SegmentPushUtils] [main] Sending table airlineStats segment URI: s3://my.bucket/examples/output/airlineStats/segments/2014/01/21/airlineStats_OFFLINE_16091_16091_20.tar.gz to location: http://localhost:9000 for
2020/08/18 16:11:19.586 INFO [FileUploadDownloadClient] [main] Sending request: http://localhost:9000/v2/segments to controller: localhost, version: Unknown
2020/08/18 16:11:19.587 INFO [SegmentPushUtils] [main] Response for pushing table airlineStats segment uri s3://my.bucket/examples/output/airlineStats/segments/2014/01/21/airlineStats_OFFLINE_16091_16091_20.tar.gz to location http://localhost:9000 - 200: {"status":"Successfully uploaded segment: airlineStats_OFFLINE_16091_16091_20 of table: airlineStats_OFFLINE"}
2020/08/18 16:11:19.589 INFO [SegmentPushUtils] [main] Sending table airlineStats segment URI: s3://my.bucket/examples/output/airlineStats/segments/2014/01/22/airlineStats_OFFLINE_16092_16092_21.tar.gz to location: http://localhost:9000 for
2020/08/18 16:11:20.087 INFO [FileUploadDownloadClient] [main] Sending request: http://localhost:9000/v2/segments to controller: localhost, version: Unknown
2020/08/18 16:11:20.087 INFO [SegmentPushUtils] [main] Response for pushing table airlineStats segment uri s3://my.bucket/examples/output/airlineStats/segments/2014/01/22/airlineStats_OFFLINE_16092_16092_21.tar.gz to location http://localhost:9000 - 200: {"status":"Successfully uploaded segment: airlineStats_OFFLINE_16092_16092_21 of table: airlineStats_OFFLINE"}
2020/08/18 16:11:20.088 INFO [SegmentPushUtils] [main] Sending table airlineStats segment URI: s3://my.bucket/examples/output/airlineStats/segments/2014/01/23/airlineStats_OFFLINE_16093_16093_22.tar.gz to location: http://localhost:9000 for
2020/08/18 16:11:20.550 INFO [FileUploadDownloadClient] [main] Sending request: http://localhost:9000/v2/segments to controller: localhost, version: Unknown
2020/08/18 16:11:20.551 INFO [SegmentPushUtils] [main] Response for pushing table airlineStats segment uri s3://my.bucket/examples/output/airlineStats/segments/2014/01/23/airlineStats_OFFLINE_16093_16093_22.tar.gz to location http://localhost:9000 - 200: {"status":"Successfully uploaded segment: airlineStats_OFFLINE_16093_16093_22 of table: airlineStats_OFFLINE"}
2020/08/18 16:11:20.552 INFO [SegmentPushUtils] [main] Sending table airlineStats segment URI: s3://my.bucket/examples/output/airlineStats/segments/2014/01/24/airlineStats_OFFLINE_16094_16094_23.tar.gz to location: http://localhost:9000 for
2020/08/18 16:11:20.978 INFO [FileUploadDownloadClient] [main] Sending request: http://localhost:9000/v2/segments to controller: localhost, version: Unknown
2020/08/18 16:11:20.979 INFO [SegmentPushUtils] [main] Response for pushing table airlineStats segment uri s3://my.bucket/examples/output/airlineStats/segments/2014/01/24/airlineStats_OFFLINE_16094_16094_23.tar.gz to location http://localhost:9000 - 200: {"status":"Successfully uploaded segment: airlineStats_OFFLINE_16094_16094_23 of table: airlineStats_OFFLINE"}
2020/08/18 16:11:20.979 INFO [SegmentPushUtils] [main] Sending table airlineStats segment URI: s3://my.bucket/examples/output/airlineStats/segments/2014/01/25/airlineStats_OFFLINE_16095_16095_24.tar.gz to location: http://localhost:9000 for
2020/08/18 16:11:21.626 INFO [FileUploadDownloadClient] [main] Sending request: http://localhost:9000/v2/segments to controller: localhost, version: Unknown
2020/08/18 16:11:21.628 INFO [SegmentPushUtils] [main] Response for pushing table airlineStats segment uri s3://my.bucket/examples/output/airlineStats/segments/2014/01/25/airlineStats_OFFLINE_16095_16095_24.tar.gz to location http://localhost:9000 - 200: {"status":"Successfully uploaded segment: airlineStats_OFFLINE_16095_16095_24 of table: airlineStats_OFFLINE"}
2020/08/18 16:11:21.628 INFO [SegmentPushUtils] [main] Sending table airlineStats segment URI: s3://my.bucket/examples/output/airlineStats/segments/2014/01/26/airlineStats_OFFLINE_16096_16096_25.tar.gz to location: http://localhost:9000 for
2020/08/18 16:11:22.121 INFO [FileUploadDownloadClient] [main] Sending request: http://localhost:9000/v2/segments to controller: localhost, version: Unknown
2020/08/18 16:11:22.122 INFO [SegmentPushUtils] [main] Response for pushing table airlineStats segment uri s3://my.bucket/examples/output/airlineStats/segments/2014/01/26/airlineStats_OFFLINE_16096_16096_25.tar.gz to location http://localhost:9000 - 200: {"status":"Successfully uploaded segment: airlineStats_OFFLINE_16096_16096_25 of table: airlineStats_OFFLINE"}
2020/08/18 16:11:22.123 INFO [SegmentPushUtils] [main] Sending table airlineStats segment URI: s3://my.bucket/examples/output/airlineStats/segments/2014/01/27/airlineStats_OFFLINE_16097_16097_26.tar.gz to location: http://localhost:9000 for
2020/08/18 16:11:22.679 INFO [FileUploadDownloadClient] [main] Sending request: http://localhost:9000/v2/segments to controller: localhost, version: Unknown
2020/08/18 16:11:22.679 INFO [SegmentPushUtils] [main] Response for pushing table airlineStats segment uri s3://my.bucket/examples/output/airlineStats/segments/2014/01/27/airlineStats_OFFLINE_16097_16097_26.tar.gz to location http://localhost:9000 - 200: {"status":"Successfully uploaded segment: airlineStats_OFFLINE_16097_16097_26 of table: airlineStats_OFFLINE"}
2020/08/18 16:11:22.680 INFO [SegmentPushUtils] [main] Sending table airlineStats segment URI: s3://my.bucket/examples/output/airlineStats/segments/2014/01/28/airlineStats_OFFLINE_16098_16098_27.tar.gz to location: http://localhost:9000 for
2020/08/18 16:11:23.373 INFO [FileUploadDownloadClient] [main] Sending request: http://localhost:9000/v2/segments to controller: localhost, version: Unknown
2020/08/18 16:11:23.374 INFO [SegmentPushUtils] [main] Response for pushing table airlineStats segment uri s3://my.bucket/examples/output/airlineStats/segments/2014/01/28/airlineStats_OFFLINE_16098_16098_27.tar.gz to location http://localhost:9000 - 200: {"status":"Successfully uploaded segment: airlineStats_OFFLINE_16098_16098_27 of table: airlineStats_OFFLINE"}
2020/08/18 16:11:23.375 INFO [SegmentPushUtils] [main] Sending table airlineStats segment URI: s3://my.bucket/examples/output/airlineStats/segments/2014/01/29/airlineStats_OFFLINE_16099_16099_28.tar.gz to location: http://localhost:9000 for
2020/08/18 16:11:23.787 INFO [FileUploadDownloadClient] [main] Sending request: http://localhost:9000/v2/segments to controller: localhost, version: Unknown
2020/08/18 16:11:23.788 INFO [SegmentPushUtils] [main] Response for pushing table airlineStats segment uri s3://my.bucket/examples/output/airlineStats/segments/2014/01/29/airlineStats_OFFLINE_16099_16099_28.tar.gz to location http://localhost:9000 - 200: {"status":"Successfully uploaded segment: airlineStats_OFFLINE_16099_16099_28 of table: airlineStats_OFFLINE"}
2020/08/18 16:11:23.788 INFO [SegmentPushUtils] [main] Sending table airlineStats segment URI: s3://my.bucket/examples/output/airlineStats/segments/2014/01/30/airlineStats_OFFLINE_16100_16100_29.tar.gz to location: http://localhost:9000 for
2020/08/18 16:11:24.298 INFO [FileUploadDownloadClient] [main] Sending request: http://localhost:9000/v2/segments to controller: localhost, version: Unknown
2020/08/18 16:11:24.299 INFO [SegmentPushUtils] [main] Response for pushing table airlineStats segment uri s3://my.bucket/examples/output/airlineStats/segments/2014/01/30/airlineStats_OFFLINE_16100_16100_29.tar.gz to location http://localhost:9000 - 200: {"status":"Successfully uploaded segment: airlineStats_OFFLINE_16100_16100_29 of table: airlineStats_OFFLINE"}
2020/08/18 16:11:24.299 INFO [SegmentPushUtils] [main] Sending table airlineStats segment URI: s3://my.bucket/examples/output/airlineStats/segments/2014/01/31/airlineStats_OFFLINE_16101_16101_30.tar.gz to location: http://localhost:9000 for
2020/08/18 16:11:24.987 INFO [FileUploadDownloadClient] [main] Sending request: http://localhost:9000/v2/segments to controller: localhost, version: Unknown
2020/08/18 16:11:24.987 INFO [SegmentPushUtils] [main] Response for pushing table airlineStats segment uri s3://my.bucket/examples/output/airlineStats/segments/2014/01/31/airlineStats_OFFLINE_16101_16101_30.tar.gz to location http://localhost:9000 - 200: {"status":"Successfully uploaded segment: airlineStats_OFFLINE_16101_16101_30 of table: airlineStats_OFFLINE"}

Spark Job

Setup Spark Cluster (Skip if you already have one)

Please follow this page to setup a local spark cluster.

Submit Spark Job

Below is a sample Spark Ingestion job

# executionFrameworkSpec: Defines ingestion jobs to be running.
executionFrameworkSpec:

  # name: execution framework name
  name: 'spark'

  # segmentGenerationJobRunnerClassName: class name implements org.apache.pinot.spi.batch.ingestion.runner.SegmentGenerationJobRunner interface.
  segmentGenerationJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.spark.SparkSegmentGenerationJobRunner'

  # segmentTarPushJobRunnerClassName: class name implements org.apache.pinot.spi.batch.ingestion.runner.SegmentTarPushJobRunner interface.
  segmentTarPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.spark.SparkSegmentTarPushJobRunner'

  # segmentUriPushJobRunnerClassName: class name implements org.apache.pinot.spi.batch.ingestion.runner.SegmentUriPushJobRunner interface.
  segmentUriPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.spark.SparkSegmentUriPushJobRunner'

# jobType: Pinot ingestion job type.
# Supported job types are:
#   'SegmentCreation'
#   'SegmentTarPush'
#   'SegmentUriPush'
#   'SegmentCreationAndTarPush'
#   'SegmentCreationAndUriPush'
jobType: SegmentCreationAndUriPush

# inputDirURI: Root directory of input data, expected to have scheme configured in PinotFS.
inputDirURI: 's3://my.bucket/batch/airlineStats/rawdata/'

# includeFileNamePattern: include file name pattern, supported glob pattern.
# Sample usage:
#   'glob:*.avro' will include all avro files just under the inputDirURI, not sub directories;
#   'glob:**/*.avro' will include all the avro files under inputDirURI recursively.
includeFileNamePattern: 'glob:**/*.avro'

# excludeFileNamePattern: exclude file name pattern, supported glob pattern.
# Sample usage:
#   'glob:*.avro' will exclude all avro files just under the inputDirURI, not sub directories;
#   'glob:**/*.avro' will exclude all the avro files under inputDirURI recursively.
# _excludeFileNamePattern: ''

# outputDirURI: Root directory of output segments, expected to have scheme configured in PinotFS.
outputDirURI: 's3://my.bucket/examples/output/airlineStats/segments'

# overwriteOutput: Overwrite output segments if existed.
overwriteOutput: true

# pinotFSSpecs: defines all related Pinot file systems.
pinotFSSpecs:

  - # scheme: used to identify a PinotFS.
    # E.g. local, hdfs, dbfs, etc
    scheme: file

    # className: Class name used to create the PinotFS instance.
    # E.g.
    #   org.apache.pinot.spi.filesystem.LocalPinotFS is used for local filesystem
    #   org.apache.pinot.plugin.filesystem.AzurePinotFS is used for Azure Data Lake
    #   org.apache.pinot.plugin.filesystem.HadoopPinotFS is used for HDFS
    className: org.apache.pinot.plugin.filesystem.HadoopPinotFS
    configs:
      'hadoop.conf.path': ''

  - scheme: s3
    className: org.apache.pinot.plugin.filesystem.S3PinotFS
    configs:
      region: 'us-west-2'

# recordReaderSpec: defines all record reader
recordReaderSpec:

  # dataFormat: Record data format, e.g. 'avro', 'parquet', 'orc', 'csv', 'json', 'thrift' etc.
  dataFormat: 'avro'

  # className: Corresponding RecordReader class name.
  # E.g.
  #   org.apache.pinot.plugin.inputformat.avro.AvroRecordReader
  #   org.apache.pinot.plugin.inputformat.csv.CSVRecordReader
  #   org.apache.pinot.plugin.inputformat.parquet.ParquetRecordReader
  #   org.apache.pinot.plugin.inputformat.json.JSONRecordReader
  #   org.apache.pinot.plugin.inputformat.orc.ORCRecordReader
  #   org.apache.pinot.plugin.inputformat.thrift.ThriftRecordReader
  className: 'org.apache.pinot.plugin.inputformat.avro.AvroRecordReader'

# tableSpec: defines table name and where to fetch corresponding table config and table schema.
tableSpec:

  # tableName: Table name
  tableName: 'airlineStats'

  # schemaURI: defines where to read the table schema, supports PinotFS or HTTP.
  # E.g.
  #   hdfs://path/to/table_schema.json
  #   http://localhost:9000/tables/myTable/schema
  schemaURI: 'http://localhost:9000/tables/airlineStats/schema'

  # tableConfigURI: defines where to reade the table config.
  # Supports using PinotFS or HTTP.
  # E.g.
  #   hdfs://path/to/table_config.json
  #   http://localhost:9000/tables/myTable
  # Note that the API to read Pinot table config directly from pinot controller contains a JSON wrapper.
  # The real table config is the object under the field 'OFFLINE'.
  tableConfigURI: 'http://localhost:9000/tables/airlineStats'

# segmentNameGeneratorSpec: defines how to init a SegmentNameGenerator.
segmentNameGeneratorSpec:

  # type: Current supported type is 'simple' and 'normalizedDate'.
  type: normalizedDate

  # configs: Configs to init SegmentNameGenerator.
  configs:
    segment.name.prefix: 'airlineStats_batch'
    exclude.sequence.id: true

# pinotClusterSpecs: defines the Pinot Cluster Access Point.
pinotClusterSpecs:
  - # controllerURI: used to fetch table/schema information and data push.
    # E.g. http://localhost:9000
    controllerURI: 'http://localhost:9000'

# pushJobSpec: defines segment push job related configuration.
pushJobSpec:

  # pushParallelism: push job parallelism, default is 1.
  pushParallelism: 2

  # pushAttempts: number of attempts for push job, default is 1, which means no retry.
  pushAttempts: 2

  # pushRetryIntervalMillis: retry wait Ms, default to 1 second.
  pushRetryIntervalMillis: 1000

Submit spark job with the ingestion job:

${SPARK_HOME}/bin/spark-submit \
  --class org.apache.pinot.tools.admin.command.LaunchDataIngestionJobCommand \
  --master "local[2]" \
  --deploy-mode client \
  --conf "spark.driver.extraJavaOptions=-Dplugins.dir=${PINOT_DISTRIBUTION_DIR}/plugins -Dlog4j2.configurationFile=${PINOT_DISTRIBUTION_DIR}/conf/pinot-ingestion-job-log4j2.xml" \
  --conf "spark.driver.extraClassPath=${PINOT_DISTRIBUTION_DIR}/lib/pinot-all-${PINOT_VERSION}-jar-with-dependencies.jar" \
  local://${PINOT_DISTRIBUTION_DIR}/lib/pinot-all-${PINOT_VERSION}-jar-with-dependencies.jar \
  -jobSpecFile ${PINOT_DISTRIBUTION_DIR}/examples/sparkIngestionJobSpec.yaml

Sample Results/Snapshots

Below is the sample snapshot of s3 location for controller:

Sample S3 Controller Storage

Below is a sample download URI in PropertyStore, we expect the segment download uri is started with s3://

Sample segment download URI in PropertyStore

Supported Transformations

This document contains the list of all the transformation functions supported by Pinot SQL.

Math Functions

Function
Description
Example

ADD(col1, col2, col3...)

Sum of at least two values

ADD(score_maths, score_science, score_history)

SUB(col1, col2)

Difference between two values

SUB(total_score, score_science)

MULT(col1, col2, col3...)

Product of at least two values

MUL(score_maths, score_science, score_history)

DIV(col1, col2)

Quotient of two values

SUB(total_score, total_subjects)

MOD(col1, col2)

Modulo of two values

MOD(total_score, total_subjects)

ABS(col1)

Absolute of a value

ABS(score)

CEIL(col1)

Rounded up to the nearest integer.

CEIL(percentage)

FLOOR(col1)

Rounded down to the nearest integer.

FLOOR(percentage)

EXP(col1)

Euler’s number(e) raised to the power of col.

EXP(age)

LN(col1)

Natural log of value i.e. ln(col1)

LN(age)

SQRT(col1)

Square root of a value

SQRT(height)

String Functions

Multiple string functions are supported out of the box from release-0.5.0 .

Function
Description
Example

UPPER(col)

convert string to upper case

UPPER(playerName)

LOWER(col)

convert string to lower case

LOWER(playerName)

REVERSE(col)

reverse the string

REVERSE(playerName)

SUBSTR(col, startIndex, endIndex)

get substring of the input string from start to endIndex. Index begins at 0. Set endIndex to -1 to calculate till end of the string

SUBSTR(playerName, 1, -1)

<code></code>

SUBSTR(playerName, 1, 4)

CONCAT(col1, col2, seperator)

Concatenate two input strings using the seperator

CONCAT(firstName, lastName, '-')

TRIM(col)

trim spaces from both side of the string

TRIM(playerName)

LTRIM(col)

trim spaces from left side of the string

LTRIM(playerName)

RTRIM(col)

trim spaces from right side of the string

RTRIM(playerName)

LENGTH(col)

calculate length of the string

LENGTH(playerName)

STRPOS(col, find, N)

find Nth instance of find string in input. Returns 0 if input string is empty. Returns -1 if the Nth instance is not found or input string is null.

STRPOS(playerName, 'david', 1)

STARTSWITH(col, prefix)

returns true if columns starts with prefix string.

STARTSWITH(playerName, 'david')

REPLACE(col, find, substitute)

replace all instances of find with replace in input

REPLACE(playerName, 'david', 'henry')

RPAD(col, size, pad)

string padded from the right side with pad to reach final size

RPAD(playerName, 20, 'foo')

LPAD(col, size, pad)

string padded from the left side with pad to reach final size

LPAD(playerName, 20, 'foo')

CODEPOINT(col)

the Unicode codepoint of the first character of the string

CODEPOINT(playerName)

CHR(codepoint)

the character corresponding to the Unicode codepoint

CHR(68)

DateTime Functions

Date time functions allow you to perform transformations on columns that contain timestamps or dates.

Function
Description
Example

TIMECONVERT

(col, fromUnit, toUnit)

Converts the value into another time unit. the column should be an epoch timestamp. Supported units are DAYS HOURS MINUTES SECONDS MILLISECONDS MICROSECONDS NANOSECONDS

TIMECONVERT(time, 'MILLISECONDS', 'SECONDS')This expression converts the value of column time (taken to be in milliseconds) to the nearest seconds (i.e. the nearest seconds that is lower than the value of date column)

DATETIMECONVERT

(columnName, inputFormat, outputFormat, outputGranularity)

Takes 4 arguments, converts the value into another date time format, and buckets time based on the given time granularity. Note that, for weeks/months/quarters/years, please use function: DateTrunc.

The format is expressed as <time size>:<time unit>:<time format>:<pattern> where,

time size - size of the time unit eg: 1, 10

time unit - DAYS HOURS MINUTES SECONDS MILLISECONDS MICROSECONDS NANOSECONDS

time format - EPOCH or SIMPLE_DATE_FORMAT

pattern - this is defined in case of SIMPLE_DATE_FORMAT eg: yyyy-MM-dd. A specific timezone can be passed using tz(timezone). Timezone can be long or short string format timezone. e.g. Asia/Kolkata or PDT

granularity - specified in the format<time size>:<time unit>

  • Date from hoursSinceEpoch to daysSinceEpoch and bucket it to 1 day granularity DATETIMECONVERT(Date, '1:HOURS:EPOCH', '1:DAYS:EPOCH', '1:DAYS')

  • Date to 15 minutes granularity DATETIMECONVERT(Date, '1:MILLISECONDS:EPOCH', '1:MILLISECONDS:EPOCH', '15:MINUTES')

  • Date from hoursSinceEpoch to format yyyyMdd and bucket it to 1 days granularity DATETIMECONVERT(Date, '1:HOURS:EPOCH', '1:DAYS:SIMPLE_DATE_FORMAT:yyyyMMdd', '1:DAYS')

  • Date from milliseconds to format yyyyMdd in timezone PST DATETIMECONVERT(Date, '1:MILLISECONDS:EPOCH', '1:DAYS:SIMPLE_DATE_FORMAT:yyyyMMdd tz(America/Los_Angeles)', '1:DAYS')

DATETRUNC

(Presto) SQL compatible date truncation, equivalent to the Presto function .

Takes at least 3 and upto 5 arguments, converts the value into a specified output granularity seconds since UTC epoch that is bucketed on a unit in a specified timezone.

DATETRUNC('week', time_in_seconds, 'SECONDS') This expression converts the column time_in_seconds, which is a long containing seconds since UTC epoch truncated at WEEK (where a Week starts at Monday UTC midnight). The output is a long seconds since UTC epoch.

DATETRUNC('quarter', DIV(time_milliseconds/1000), 'SECONDS', 'America/Los_Angeles', 'HOURS') This expression converts the expression time_in_milliseconds/1000into hours that are truncated to QUARTER at the Los Angeles time zone (where a Quarter begins on 1/1, 4/1, 7/1, 10/1 in Los Angeles timezone). The output is expressed as hours since UTC epoch (note that the output is not Los Angeles timezone)

ToEpoch<TIME_UNIT>(timeInMillis)

Convert epoch milliseconds to epoch <Time Unit>. Supported <Time Unit>: SECONDS/MINUTES/HOURS/DAYS

ToEpochSeconds(tsInMillis):Converts column tsInMillis value from epoch milliseconds to epoch seconds.

ToEpochDays(tsInMillis):Converts column tsInMillis value from epoch milliseconds to epoch days.

ToEpoch<TIME_UNIT>Rounded(timeInMillis, bucketSize)

Convert epoch milliseconds to epoch <Time Unit>, round to nearest rounding bucket(Bucket size is defined in <Time Unit>). Supported <Time Unit>: SECONDS/MINUTES/HOURS/DAYS

ToEpochSecondsRound(tsInMillis, 10):Converts column tsInMillis value from epoch milliseconds to epoch seconds and round to the 10-minute bucket value. E.g.ToEpochSecondsRound(1613472303000, 10) = 1613472300

ToEpochMinutesRound(tsInMillis, 1440):Converts column tsInMillis value from epoch milliseconds to epoch Minutes, but round to 1-day bucket value. E.g.ToEpochMinutesRound(1613472303000, 1440) = 26890560

ToEpoch<TIME_UNIT>Bucket(timeInMillis, bucketSize)

Convert epoch milliseconds to epoch <Time Unit>, and divided by bucket size(Bucket size is defined in <Time Unit>). Supported <Time Unit>: SECONDS/MINUTES/HOURS/DAYS

ToEpochSecondsBucket(tsInMillis, 10):Converts column tsInMillis value from epoch milliseconds to epoch seconds then divide by 10 to get the 10 seconds since epoch value. E.g.

ToEpochSecondsBucket(1613472303000, 10) = 161347230

ToEpochHoursBucket(tsInMillis, 24):Converts column tsInMillis value from epoch milliseconds to epoch Hours, then divide by 24 to get 24 hours since epoch value.

FromEpoch<TIME_UNIT>(timeIn<Time_UNIT>)

Convert epoch <Time Unit> to epoch milliseconds. Supported <Time Unit>: SECONDS/MINUTES/HOURS/DAYS

FromEpochSeconds(tsInSeconds):Converts column tsInSeconds value from epoch seconds to epoch milliseconds. E.g.

FromEpochSeconds(1613472303) = 1613472303000

FromEpoch<TIME_UNIT>Bucket(timeIn<Time_UNIT>, bucketSizeIn<Time_UNIT>)

Convert epoch <Bucket Size><Time Unit> to epoch milliseconds. E.g. 10 seconds since epoch or 5 minutes since Epoch. Supported <Time Unit>: SECONDS/MINUTES/HOURS/DAYS

FromEpochSecondsBucket(tsInSeconds, 10):Converts column tsInSeconds value from epoch 10-seconds to epoch milliseconds. E.g.

FromEpochSeconds(161347231)= 1613472310000

ToDateTime(timeInMillis, pattern[, timezoneId])

Convert epoch millis value to DateTime string represented by pattern. Time zone will be set to UTC if timezoneId is not specified.

ToDateTime(tsInMillis, 'yyyy-MM-dd') converts tsInMillis value to date time pattern yyyy-MM-dd

ToDateTime(tsInMillis, 'yyyy-MM-dd ZZZ', 'America/Los_Angeles') converts tsInMillis value to date time pattern yyyy-MM-dd ZZZ in America/Los_Angeles time zone

FromDateTime(dateTimeString, pattern)

Convert DateTime string represented by pattern to epoch millis.

FromDateTime(dateTime, 'yyyy-MM-dd')converts dateTime string value to millis epoch value

round(timeValue, bucketSize)

Round the given time value to nearest bucket start value.

round(tsInSeconds, 60) round seconds epoch value to the start value of the 60 seconds bucket it belongs to. E.g. round(161347231, 60)= 161347200

now()

Return current time as epoch millis

Typically used in predicate to filter on timestamp for recent data. E.g. filter data on recent 1 day(86400 seconds).WHERE tsInMillis > now() - 86400000

timezoneHour(timeZoneId)

Returns the hour of the time zone offset.

timezoneMinute(timeZoneId)

Returns the minute of the time zone offset.

year(tsInMillis)

Returns the year from the given epoch millis in UTC timezone.

year(tsInMillis, timeZoneId)

Returns the year from the given epoch millis and timezone id.

yearOfWeek(tsInMillis)

Returns the year of the ISO week from the given epoch millis in UTC timezone. Alias yowis also supported.

yearOfWeek(tsInMillis, timeZoneId)

Returns the year of the ISO week from the given epoch millis and timezone id. Alias yowis also supported.

quarter(tsInMillis)

Returns the quarter of the year from the given epoch millis in UTC timezone. The value ranges from 1 to 4.

quarter(tsInMillis, timeZoneId)

Returns the quarter of the year from the given epoch millis and timezone id. The value ranges from 1 to 4.

month(tsInMillis)

Returns the month of the year from the given epoch millis in UTC timezone. The value ranges from 1 to 12.

month(tsInMillis, timeZoneId)

Returns the month of the year from the given epoch millis and timezone id. The value ranges from 1 to 12.

week(tsInMillis)

Returns the ISO week of the year from the given epoch millis in UTC timezone. The value ranges from 1 to 53. Alias weekOfYear is also supported.

week(tsInMillis, timeZoneId)

Returns the ISO week of the year from the given epoch millis and timezone id. The value ranges from 1 to 53. Alias weekOfYear is also supported.

dayOfYear(tsInMillis)

Returns the day of the year from the given epoch millis in UTC timezone. The value ranges from 1 to 366. Alias doy is also supported.

dayOfYear(tsInMillis, timeZoneId)

Returns the day of the year from the given epoch millis and timezone id. The value ranges from 1 to 366. Alias doy is also supported.

day(tsInMillis)

Returns the day of the month from the given epoch millis in UTC timezone. The value ranges from 1 to 31. Alias dayOfMonth is also supported.

day(tsInMillis, timeZoneId)

Returns the day of the month from the given epoch millis and timezone id. The value ranges from 1 to 31. Alias dayOfMonth is also supported.

dayOfWeek(tsInMillis)

Returns the day of the week from the given epoch millis in UTC timezone. The value ranges from 1(Monday) to 7(Sunday). Alias dow is also supported.

dayOfWeek(tsInMillis, timeZoneId)

Returns the day of the week from the given epoch millis and timezone id. The value ranges from 1(Monday) to 7(Sunday). Alias dow is also supported.

hour(tsInMillis)

Returns the hour of the day from the given epoch millis in UTC timezone. The value ranges from 0 to 23.

hour(tsInMillis, timeZoneId)

Returns the hour of the day from the given epoch millis and timezone id. The value ranges from 0 to 23.

minute(tsInMillis)

Returns the minute of the hour from the given epoch millis in UTC timezone. The value ranges from 0 to 59.

minute(tsInMillis, timeZoneId)

Returns the minute of the hour from the given epoch millis and timezone id. The value ranges from 0 to 59.

second(tsInMillis)

Returns the second of the minute from the given epoch millis in UTC timezone. The value ranges from 0 to 59.

second(tsInMillis, timeZoneId)

Returns the second of the minute from the given epoch millis and timezone id. The value ranges from 0 to 59.

millisecond(tsInMillis)

Returns the millisecond of the second from the given epoch millis in UTC timezone. The value ranges from 0 to 999.

millisecond(tsInMillis, timeZoneId)

Returns the millisecond of the second from the given epoch millis and timezone id. The value ranges from 0 to 999.

JSON Functions

Function

Type

Description

JSONEXTRACTSCALAR

(jsonField, 'jsonPath', 'resultsType', [defaultValue])

Transform

Evaluates the 'jsonPath' on jsonField,

returns the result as the type 'resultsType', use optional defaultValuefor null or parsing error.

JSONEXTRACTKEY

(jsonField, 'jsonPath')

Transform

Extracts all matched JSON field keys based on 'jsonPath'

Into aSTRING_ARRAY.

TOJSONMAPSTR(map)

Scalar

Convert map to JSON String

JSONFORMAT(object)

Scalar

Convert object to JSON String

JSONPATH(jsonField, 'jsonPath')

Scalar

Extracts the object value from jsonField based on 'jsonPath', the result type is inferred based on JSON value. Cannot be used in query because data type is not specified.

JSONPATHLONG(jsonField, 'jsonPath', [defaultValue])

Scalar

Extracts the Long value from jsonField based on 'jsonPath', use optional defaultValuefor null or parsing error.

JSONPATHDOUBLE(jsonField, 'jsonPath', [defaultValue])

Scalar

Extracts the Double value from jsonField based on 'jsonPath', use optional defaultValuefor null or parsing error.

JSONPATHSTRING(jsonField, 'jsonPath', [defaultValue])

Scalar

Extracts the String value from jsonField based on 'jsonPath', use optional defaultValuefor null or parsing error.

JSONPATHARRAY(jsonField, 'jsonPath')

Scalar

Extracts an array from jsonField based on 'jsonPath', the result type is inferred based on JSON value. Cannot be used in query because data type is not specified.

JSONPATHARRAYDEFAULTEMPTY(jsonField, 'jsonPath')

Scalar

Extracts an array from jsonField based on 'jsonPath', the result type is inferred based on JSON value. Returns empty array for null or parsing error. Cannot be used in query because data type is not specified.

Usage

Arguments

Description

jsonField

An Identifier/Expression contains JSON documents.

'jsonPath'

Follows to read values from JSON documents.

'results_type'

One of the Pinot supported data types:INT, LONG, FLOAT, DOUBLE, BOOLEAN, TIMESTAMP, STRING,

INT_ARRAY, LONG_ARRAY, FLOAT_ARRAY, DOUBLE_ARRAY, STRING_ARRAY.

'jsonPath'and 'results_type'are literals. Pinot uses single quotes to distinguish them from identifiers.

e.g.

  • JSONEXTRACTSCALAR(profile_json_str, '$.name', 'STRING') is valid.

  • JSONEXTRACTSCALAR(profile_json_str, "$.name", "STRING") is invalid.

Transform functions can only be used in Pinot SQL. Scalar functions can be used for column transformation in table ingestion configs.

Examples

The examples below are based on these 3 sample profile JSON documents:

{
  "name" : "Bob",
  "age" : 37,
  "gender": "male",
  "location": "San Francisco"
},{
  "name" : "Alice",
  "age" : 25,
  "gender": "female",
  "location": "New York"
},{
  "name" : "Mia",
  "age" : 18,
  "gender": "female",
  "location": "Chicago"
}

Query 1: Extract string values from the field 'name'

SELECT
    JSONEXTRACTSCALAR(profile_json_str, '$.name', 'STRING')
FROM
    myTable

Results are

["Bob", "Alice", "Mia"]

Query 2: Extract integer values from the field 'age'

SELECT
    JSONEXTRACTSCALAR(profile_json_str, '$.age', 'INT')
FROM
    myTable

Results are

[37, 25, 18]

Query 3: Extract Bob's age from the JSON profile.

SELECT
    JSONEXTRACTSCALAR(myMapStr,'$.age','INT')
FROM
    myTable
WHERE
    JSONEXTRACTSCALAR(myMapStr,'$.name','STRING') = 'Bob'

Results are

[37]

Query 4: Extract all field keys of JSON profile.

SELECT
    JSONEXTRACTKEY(myMapStr,'$.*')
FROM
    myTable

Results are

["name", "age", "gender", "location"]

Another example of extracting JSON fields from below JSON record:

{
        "name": "Pete",
        "age": 24,
        "subjects": [{
                        "name": "maths",
                        "homework_grades": [80, 85, 90, 95, 100],
                        "grade": "A",
                        "score": 90
                },
                {
                        "name": "english",
                        "homework_grades": [60, 65, 70, 85, 90],
                        "grade": "B",
                        "score": 70
                }
        ]
}

Extract JSON fields:

Expression
Value

JSONPATH(myJsonRecord, '$.name')

"Pete"

JSONPATH(myJsonRecord, '$.age')

24

JSONPATHSTRING(myJsonRecord, '$.age')

"24"

JSONPATHARRAY(myJsonRecord, '$.subjects[*].name')

["maths", "english"]

JSONPATHARRAY(myJsonRecord, '$.subjects[*].score')

[90, 70]

JSONPATHARRAY(myJsonRecord, '$.subjects[*].homework_grades[1]')

[85, 65]

Binary Functions

Function
Description
Example

SHA(bytesCol)

Return SHA-1 digest of binary column(bytes type) as hex string

SHA(rawData)

SHA256(bytesCol)

Return SHA-256 digest of binary column(bytes type) as hex string

SHA256(rawData)

SHA512(bytesCol)

Return SHA-512 digest of binary column(bytes type) as hex string

SHA512(rawData)

MD5(bytesCol)

Return MD5 digest of binary column(bytes type) as hex string

MD5(rawData)

Multi-value Column Functions

All of the functions mentioned till now only support single value columns. You can use the following functions to do operations on multi-value columns.

Function
Description
Example

ARRAYLENGTH

Returns the length of a multi-value column

MAP_VALUE

Select the value for a key from Map stored in Pinot.

MAP_VALUE(mapColumn, 'myKey', valueColumn)

VALUEIN

Takes at least 2 arguments, where the first argument is a multi-valued column, and the following arguments are constant values. The transform function will filter the value from the multi-valued column with the given constant values. The VALUEIN transform function is especially useful when the same multi-valued column is both filtering column and grouping column.

VALUEIN(mvColumn, 3, 5, 15)

Advanced Queries

Geospatial Queries

Pinot supports Geospatial queries on columns containing text-based geographies. For more details on the queries and how to enable them, see Geospatial.

Text Queries

Pinot supports pattern matching on text-based columns. Only the columns mentioned as text columns in table config can be queried using this method. For more details on how to enable pattern matching, see Text search support.

date_trunc
JsonPath Syntax