Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
This page contains guides related to importing data from Apache Kafka using stream ingestion.
Loading...
This section contains a collection of short guides to show you how to import from a Pinot supported file system.
Loading...
Loading...
Loading...
This section contains a collection of guides that will show you how to import data from a Pinot supported input format.
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
This section contains articles that provide technical and implementation details of Pinot features
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Here you will find a collection of ready-made sample applications and examples for real-world data
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Introduction to Apache Pinot, a real-time distributed OLAP datastore.
Pinot is a real-time distributed OLAP datastore, built to deliver scalable real-time analytics with low latency. It can ingest from batch data sources (such as Hadoop HDFS, Amazon S3, Azure ADLS, Google Cloud Storage) as well as stream data sources (such as Apache Kafka).
Pinot was built by engineers at LinkedIn and Uber and is designed to scale up and out with no upper bound. Performance always remains constant based on the size of your cluster and an expected query per second (QPS) threshold.
Join us in our Slack channel for questions, troubleshooting, and feedback. We'd love to hear from you. https://communityinviter.com/apps/apache-pinot/apache-pinot
Our documentation is structured to let you quickly get to the content you need and is organized around the different concerns of users, operators, and developers. If you're new to Pinot and want to learn things by example, please take a look at our getting started section.
To start importing data into Pinot, check out our guides on batch import and stream ingestion based on our plugin architecture.
Pinot works very well for querying time series data with many dimensions and metrics over a vast unbounded space of records that scales linearly on a per node basis. Filters and aggregations are both easy and fast.
Pinot supports SQL for querying read-only data. Learn more about querying Pinot for time series data in our PQL (Pinot Query Language) guide.
Pinot may be deployed to and operated on a cloud provider or a local or virtual machine. You may get started either with a bare-metal installation or a Kubernetes one (either locally or in the cloud). To get immediately started with Pinot, check out these quick start guides for bootstrapping a Pinot cluster using Docker or Kubernetes.
For a high-level overview that explains how Pinot works, please take a look at our basic concepts section.
To understand the distributed systems architecture that explains Pinot's operating model, please take a look at our basic architecture section.
This section focuses on answering the most frequently asked questions for people exploring the newly evolving category of distributed OLAP engines. Pinot was created by authors at both Uber and LinkedIn and has been hardened and battle tested at the very highest of load and scale.
While Pinot doesn't match the typical mold of a database product, it is best understood based on your role as either an analyst, data scientist, or application developer.
Enterprise business intelligence
For analysts and data scientists, Pinot is best viewed as a highly-scalable data platform for business intelligence. In this view, Pinot converges big data platforms with the traditional role of a data warehouse, making it a suitable replacement for analysis and reporting.
Enterprise application development
For application developers, Pinot is best viewed as an immutable aggregate store that sources events from streaming data sources, such as Kafka, and makes it available for query using SQL.
As is the case with a microservice architecture, data encapsulation ends up requiring each application to provision its own data store, as opposed to sharing one OLTP database for reads and writes. In this case, it becomes difficult to query the complete view of a domain because it becomes stored in many different databases. This is costly in terms of performance, since it requires joins across multiple microservices that expose their data over HTTP under a REST API. To prevent this, Pinot can be used to aggregate all of the data across a microservice architecture into one easily queryable view of the domain.
Pinot tenants prevent any possibility of sharing ownership of database tables across microservice teams. Developers can create their own query models of data from multiple systems of record depending on their use case and needs. As with all aggregate stores, query models are eventually consistent and immutable.
Company
Notes
Pinot originated at LinkedIn and it powers more 50+ user facing applications such as Who Viewed My Profile, Talent Analytics, Company Analytics, Ad Analytics and many more. Pinot also serves as the backend for to visualize and monitor 10,000+ business metrics.
Pinot runs on 1000+ nodes serving 100k+ queries while ingesting 1.5M+ events per second.
Uber
Microsoft
Microsoft Teams uses Pinot for analytics on Teams product usage data.
Weibo uses Pinot for realtime analytics on CDN & Weibo Video data to make business decisions, optimize service performance and improve user experience.
Factual
A column-oriented database with various compression schemes such as Run Length, Fixed Bit Length
Pluggable indexing technologies - Sorted Index, Bitmap Index, Inverted Index
Ability to optimize query/execution plan based on query and segment metadata
Near real time ingestion from streams and batch ingestion from Hadoop
SQL-like language that supports selection, aggregation, filtering, group by, order by, distinct queries on data
Support for multi-valued fields
Horizontally scalable and fault-tolerant
Pinot is designed to execute OLAP queries with low latency. It is suited in contexts where fast analytics, such as aggregations, are needed on immutable data, possibly, with real-time data ingestion.
User facing Analytics Products
Pinot was originally built at LinkedIn to power rich interactive real-time analytic applications such as Who Viewed Profile, Company Analytics, Talent Insights, and many more. UberEats Restaurant Manager is another example of a customer facing Analytics App. At LinkedIn, Pinot powers 50+ user-facing products, ingesting millions of events per second and serving 100k+ queries per second at millisecond latency.
Real-time Dashboard for Business Metrics
Pinot can be also be used to perform typical analytical operations such as slice and dice, drill down, roll up, and pivot on large scale multi-dimensional data. For instance, at LinkedIn, Pinot powers dashboards for thousands of business metrics. One can connect various BI tools such Superset, Tableau, or PowerBI to visualize data in Pinot.
Instructions to connect Pinot with Superset can found here.
Anomaly Detection
In addition to visualizing data in Pinot, one can run Machine Learning Algorithms to detect Anomalies on the data stored in Pinot. See ThirdEye for more information on how to use Pinot for Anomaly Detection and Root Cause Analysis.
Pinot powers many internal and external dashboards as well as external site facing analytics applications like .
Insight Product -
The Pinot Controller is responsible for a number of things
Controllers maintain the global metadata (e.g. configs and schemas) of the system with the help of Zookeeper which is used as the persistent metadata store.
Controllers host Helix Controller and is responsible for managing other pinot components (brokers, servers, minions)
They maintain the mapping of which servers are responsible for which segments. This mapping is used by the servers, to download the portion of the segments that they are responsible for. This mapping is also used by the broker to decide which servers to route the queries to.
Controller has admin endpoints for viewing, creating, updating and deleting configs which help us manage and operate the cluster.
Controllers also have endpoints for segment uploads which are used in offline data pushes. They are responsible for initializing realtime consumption and coordination of persisting the realtime segments into the segment store periodically.
They undertake other management activities such as managing retention of segments, validations.
There can be multiple instances of Pinot controller for redundancy. If there are multiple controllers, Pinot expects that all of them are configured with the same back-end storage system so that they have a common view of the segments (e.g. NFS). Pinot can use other storage systems such as HDFS or ADLS.
Make sure you've setup Zookeeper. If you're using docker, make sure to pull the pinot docker image. To start a controller
Learn about the various components of Pinot and terminologies used to describe data stored in Pinot
Pinot is designed to deliver low latency queries on large datasets. In order to achieve this performance, Pinot stores data in a columnar format and adds additional indices to perform fast filtering, aggregation and group by.
Raw data is broken into small data shards and each shard is converted into a unit known as a segment. One or more segments together form a table, which is the logical container for querying Pinot using SQL/PQL.
Pinot uses a variety of terms which can refer to either abstractions that model the storage of data or infrastructure components that drive the functionality of the system.
Similar to traditional databases, Pinot has the concept of a table—a logical abstraction to refer to a collection of related data. As is the case with RDBMS, a table is a construct that consists of columns and rows (documents) that are queried using SQL. A table is associated with a schema which defines the columns in a table as well as their data types.
As opposed to RDBMS schemas, multiple tables can be created in Pinot (real-time or batch) that inherit a single schema definition. Tables are independently configured for concerns such as indexing strategies, partitioning, tenants, data sources, and/or replication.
Pinot has a distributed systems architecture that scales horizontally. Pinot expects the size of a table to grow infinitely over time. In order to achieve this, all data needs to be distributed across multiple nodes. Pinot achieves this by breaking data into smaller chunks known as segments **(this is similar to shards/partitions in HA relational databases). Segments can also be seen as time-based partitions.
In order to support multi-tenancy, Pinot has first class support for tenants. A table is associated with a tenant. This allows all tables belonging to a particular logical namespace to be grouped under a single tenant name and isolated from other tenants. This isolation between tenants provides different namespaces for applications and teams to prevent sharing tables or schemas. Development teams building applications will never have to operate an independent deployment of Pinot. An organization can operate a single cluster and scale it out as new tenants increase the overall volume of queries. Developers can manage their own schemas and tables without being impacted by any other tenant on a cluster.
By default, all tables belong to a default tenant named "default". The concept of tenants is very important, as it satisfies the architectural principle of a "database per service/application" without having to operate many independent data stores. Further, tenants will schedule resources so that segments (shards) are able to restrict a table's data to reside only on a specified set of nodes. Similar to the kind of isolation that is ubiquitously used in Linux containers, compute resources in Pinot can be scheduled to prevent resource contention between tenants.
Logically, a cluster is simply a group of tenants. As with the classical definition of a cluster, it is also a grouping of a set of compute nodes. Typically, there is only one cluster per environment/data center. There is no needed to create multiple clusters since Pinot supports the concept of tenants. At LinkedIn, the largest Pinot cluster consists of 1000+ nodes distributed across a data center. The number of nodes in a cluster can be added in a way that will linearly increase performance and availability of queries. The number of nodes and the compute resources per node will reliably predict the QPS for a Pinot cluster, and as such, capacity planning can be easily achieved using SLAs that assert performance expectations for end-user applications.
Auto-scaling is also achievable, however, a set amount of nodes is recommended to keep QPS consistent when query loads vary in sudden unpredictable end-user usage scenarios.
A Pinot cluster is comprised of multiple distributed system components. These components are useful to understand for operators that are monitoring system usage or are debugging an issue with a cluster deployment.
Controller
Server
Broker
Minion (optional)
The benefits of scale that make Pinot linearly scalable for an unbounded number of nodes is made possible through its integration with Apache Zookeeper and Apache Helix.
Helix is a cluster management solution that was designed and created by the authors of Pinot at LinkedIn. Helix drives the state of a Pinot cluster from a transient state to an ideal state, acting as the fault-tolerant distributed state store that guarantees consistency. Helix is embedded as agents that operate within a controller, broker, and server, and does not exist as an independent and horizontally scaled component.
A controller is the core orchestrator that drives the consistency and routing in a Pinot cluster. Controllers are horizontally scaled as an independent component (container) and has visibility of the state of all other components in a cluster. The controller reacts and responds to state changes in the system and schedules the allocation of resources for tables, segments, or nodes. As mentioned earlier, Helix is embedded within the controller as an agent that is a participant responsible for observing and driving state changes that are subscribed to by other components.
In addition to cluster management, resource allocation, and scheduling, the controller is also the HTTP gateway for REST API administration of a Pinot deployment. A web-based query console is also provided for operators to quickly and easily run SQL/PQL queries.
A broker receives queries from a client and routes their execution to one or more Pinot servers before returning a consolidated response.
Servers host segments (shards) that are scheduled and allocated across multiple nodes and routed on an assignment to a tenant (there is a single tenant by default). Servers are independent containers that scale horizontally and are notified by Helix through state changes driven by the controller. A server can either be a real-time server or an offline server.
A real-time and offline server have very different resource usage requirements, where real-time servers are continually consuming new messages from external systems (such as Kafka topics) that are ingested and allocated on segments of a tenant. Because of this, resource isolation can be used to prioritize high-throughput real-time data streams that are ingested and then made available for query through a broker.
Pinot minion is an optional component that can be used to run background tasks such as "purge" for GDPR (General Data Protection Regulation). As Pinot is an immutable aggregate store, records containing sensitive private data need to be purged on a request-by-request basis. Minion provides a solution for this purpose that complies with GDPR while optimizing Pinot segments and building additional indices that guarantees performance in the presence of the possibility of data deletion. One can also write a custom task that runs on a periodic basis. While it's possible to perform these tasks on the Pinot servers directly, having a separate process (Minion) lessens the overall degradation of query latency as segments are impacted by mutable writes.
Cluster is a set a nodes comprising of servers, brokers, controllers and minions.
Pinot leverages Apache Helix for cluster management. Helix is a cluster management framework to manage replicated, partitioned resources in a distributed system. Helix uses Zookeeper to store cluster state and metadata.
Briefly, Helix divides nodes into three logical components based on their responsibilities
The nodes that host distributed, partitioned resources
The nodes that observe the current state of each Participant and use that information to access the resources. Spectators are notified of state changes in the cluster (state of a participant, or that of a partition in a participant).
The node that observes and controls the Participant nodes. It is responsible for coordinating all transitions in the cluster and ensuring that state constraints are satisfied while maintaining cluster stability.
Pinot Servers are modeled as Participants, more details about server nodes can be found in Server. Pinot Brokers are modeled as Spectators, more details about broker nodes can be found in Broker. Pinot Controllers are modeled as Controllers, more details about controller nodes can be found in Controller.
Another way to visualize the cluster is a logical view, wherein a cluster contains tenants, tenants contain tables, and tables contain segments.
Typically, there is only cluster per environment/data center. There is no needed to create multiple Pinot clusters since Pinot supports the concept of tenants. At LinkedIn, the largest Pinot cluster consists of 1000+ nodes.
To setup a Pinot cluster, we need to first start Zookeeper.
Create an isolated bridge network in docker
Start Zookeeper in daemon.
Start ZKUI to browse Zookeeper data at http://localhost:9090.
Download Pinot Distribution using instructions in Download
Install zooinspector to view the data in Zookeeper, and connect to localhost:2181
Once we've started Zookeeper, we can start other components to join this cluster. If you're using docker, pull the latest apachepinot/pinot
image.
You can try out pre-built Pinot all-in-one docker image.
(Optional) You can also follow the instructions here to build your own images.
To start other components to join the cluster
Explore your cluster via Pinot Data Explorer
Learn about the different components and logical abstractions
This section is a reference for the definition of major components and logical abstractions used in Pinot. Please visit the Basic Concepts section to get a general overview that ties together all of the reference material in this section.
Brokers are the components that handle Pinot queries. They accept queries from clients and forward them to the right servers. They collect results back from the servers and consolidate them into a single response, to sent it back to the client.
Pinot Brokers are modeled as Spectators. They need to know the location of each segment of a table (and each replica of the segments) and route requests to the appropriate server that hosts the segments of the table being queried. The broker ensures that all the rows of the table are queried exactly once so as to return correct, consistent results for a query. The brokers may optimize to prune some of the segments as long as accuracy is not sacrificed. Helix provides the framework by which spectators can learn the location in which each partition of a resource (i.e. participant) resides. The brokers use this mechanism to learn the servers that host specific segments of a table.
In case of hybrid tables, the brokers ensure that the overlap between realtime and offline segment data is queried exactly once, by performing offline and realtime federation. Let's take this example, we have realtime data for 5 days - March 23 to March 27, and offline data has been pushed until Mar 25, which is 2 days behind realtime. The brokers maintain this time boundary.
Suppose, we get a query to this table : select sum(metric) from table
. The broker will split the query into 2 queries based on this time boundary - one for offline and one for realtime. This query becomes - select sum(metric) from table_REALTIME where date >= Mar 25
and
select sum(metric) from table_OFFLINE where date < Mar 25
The broker then merges results from both these queries before returning back to the client.
Servers host the data segments and serve queries off the data they host. There's two types of servers
Offline Offline servers are responsible for downloading segments from the segment store, to host and serve queries off. When a new segment is uploaded to the controller, the controller decides the servers (as many as replication) that will host the new segment and notifies them to download the segment from the segment store. On receiving this notification, the servers download the segment file and load the segment onto the server, to server queries off them.
Realtime Real time servers directly ingest from a real time stream (such as Kafka, EventHubs). Periodically, they make segments of the in-memory ingested data, based on certain thresholds. This segment is then persisted onto the segment store.
Pinot Servers are modeled as Helix Participants, hosting Pinot tables (referred to as resources in helix terminology). Segments of a table are modeled as Helix partitions (of a resource). Thus, a Pinot server hosts one or more helix partitions of one or more helix resources (i.e. one or more segments of one or more tables).
>
USAGE
Pinot Minion is a new component which leverages the . It can be attached to an existing Pinot cluster and then execute tasks as provided by the controller. It's a generic and single place for running background jobs. They help offload computationally intensive tasks—such as adding indexes to segments and merging segments—from other components.
A tenant is a logical component, defined as a group of server/broker nodes with the same Helix tag.
In order to support multi-tenancy, Pinot has first class support for tenants. Every table is associated with a server tenant and a broker tenant. This controls the nodes that will be used by this table as servers and brokers. This allows all tables belonging to a particular use case to be grouped under a single tenant name.
The concept of tenants is very important when the multiple use cases are using Pinot and there is a need to provide quotas or some sort of isolation across tenants. For example, consider we have two tables Table A
and Table B
in the same Pinot cluster.
We can configure Table A
with server tenant Tenant A
and Table B
with server tenant Tenant B
. We can tag some of the server nodes for Tenant A
and some for Tenant B
. This will ensure that segments of Table A
only reside on servers tagged with Tenant A
, and segment of Table B
only reside on servers tagged with Tenant B
. The same isolation can be achieved at the broker level, by configuring broker tenants to the tables.
No need to create separate clusters for every table or use case!
This section contains 2 main fields broker
and server
which decide the tenants used for the broker and server components of this table.
In the above example,
The table will be served by brokers that have been tagged as brokerTenantName_BROKER
in Helix.
If this were an offline table, the offline segments for the table will be hosted in pinot servers tagged in helix as serverTenantName_OFFLINE
If this were a realtime table, the realtime segments (both consuming as well as completed ones) will be hosted in pinot servers tagged in helix as serverTenantName_REALTIME
.
Here's a sample broker tenant config. This will create a broker tenant sampleBrokerTenant
by tagging 3 untagged broker nodes as sampleBrokerTenant_BROKER
.
To create this tenant use the following command. The creation will fail if number of untagged broker nodes is less than numberOfInstances
.
Here's a sample server tenant config. This will create a server tenant sampleServerTenant
by tagging 1 untagged server node as sampleServerTenant_OFFLINE
and 1 untagged server node as sampleServerTenant_REALTIME
.
To create this tenant use the following command. The creation will fail if number of untagged server nodes is less than offlineInstances
+ realtimeInstances
.
Schema is used to define the names, data types and other information for the columns of a Pinot table.
Columns in a Pinot table can be broadly categorized into three categories
A Pinot schema is written in JSON format. Here's an example which shows all the fields of a schema
The Pinot schema is composed of
Below is a detailed description of each type of field spec.
A dimensionFieldSpec is defined for each dimension column. Here's a list of the fields in the dimensionFieldSpec
A metricFieldSpec is defined for each metric column. Here's a list of fields in the metricFieldSpec
A dateTimeFieldSpec is used to define time columns of the table. Here's a list of the fields in a dateTimeFieldSpec
This has been deprecated. Older schemas containing timeFieldSpec will be supported. But for new schemas, use DateTimeFieldSpec instead.
A timeFieldSpec is defined for the time column. A timeFieldSpec is composed of an incomingGranularitySpec and an outgoingGranularitySpec. IncomingGranularitySpec in combination with outgoingGranularitySpec can be used to transform the time column from incoming format to the outgoing format. If both of them are specified, the segment creation process will convert the time column from the incoming format to the outgoing format. If no time column transformation is required, you can specify just the incomingGranularitySpec.
The incoming and outgoing granularitySpec are defined as:
Apart from these, there's some advanced fields. These are common to all field specs.
Transform functions can be defined on columns in the schema. For example:
Currently, we have support for 2 kinds of functions
Groovy functions
Inbuilt functions
Note
Currently, the arguments must be from the source data. They cannot be columns from the Pinot schema which have been created through transformations.
Groovy functions can be defined using the syntax:
Here's some examples of commonly needed functions. Any valid Groovy expression can be used.
Concat firstName
and lasName
to get fullName
Find max value in array bids
Convert timestamp
from MILLISECONDS
to HOURS
Simply change name of the column from user_id
to userId
If eventType
is IMPRESSION
set impression
to 1
. Similar for CLICK
.
Store an AVRO Map in Pinot as two multi-value columns. Sort the keys, to maintain the mapping.
1) The keys of the map as map_keys
2) The values of the map as map_values
We have several inbuilt functions that can be used directly in as ingestion transform functions
These are functions which enable commonly needed time transformations.
toEpochXXX
Converts from epoch milliseconds to a higher granularity.
toEpochXXXRounded
Converts from epoch milliseconds to another granularity, rounding to the nearest rounding bucket. For example, 1588469352000
(2020-05-01 42:29:12) is 26474489
minutesSinceEpoch. `toEpochMinutesRounded(1588469352000) = 26474480
(2020-05-01 42:20:00)
fromEpochXXX
Converts from an epoch granularity to milliseconds.
Simple date format
Converts simple date format strings to milliseconds and vice-a-versa, as per the provided pattern string.
Make sure you've . If you're using docker, make sure to . To start a broker
Make sure you've . If you're using docker, make sure to . To start a server
This tenant is defined in the section of the table config.
Follow instructions in to get Pinot locally, and then
Check out the table config in the to make sure it was successfully uploaded.
Follow instructions in to get Pinot locally, and then
Check out the table config in the to make sure it was successfully uploaded.
Create a schema for your data, or see for examples. Make sure you've
Note: schema can also be created as part of table creation, refer to .
Check out the schema in the to make sure it was successfully uploaded
schema fields
description
schemaName
Defines the name of the schema. This is usually the same as the table name. The offline and the realtime table of a hybrid table should use the same schema.
dimensionFieldSpecs
A dimensionFieldSpec is defined for each dimension column. For more details, scroll down to dimensionFieldSpec
metricFieldSpecs
A metricFieldSpec is defined for each metric column. For more details, scroll down to metricFieldSpec
dateTimeFieldSpec
A dateTimeFieldSpec is defined for the time columns. There can be multiple time columns. For more details, scroll down to dateTimeFieldSpec.
timeFieldSpec
Deprecated. Use dateTimeFieldSpec instead. A timeFieldSpec is defined for the time column. There can only be one time column. For more details, scroll down to timeFieldSpec
field
description
name
Name of the dimension column
dataType
Data type of the dimension column. Can be STRING, BOOLEAN, INT, LONG, DOUBLE, FLOAT, BYTES
<b></b>
defaultNullValue
Represents null values in the data, since Pinot doesn't support storing null column values natively (as part of its on-disk storage format). If not specified, an internal default null value is used as listed here
singleValueField
Boolean indicating if this is a single value or a multi value column. In the example above, the dimension tags
is multi-valued. This means that it can have multiple values for a particular row, say tag1, tag2, tag3
. For a multi-valued column, individual rows don’t necessarily need to have the same number of values. Typical use case for this would be a column such as skillSet
for a person (one row in the table) that can have multiple values such as Real Estate, Mortgages.
Data Type
Internal Default Null Value
INT
LONG
FLOAT
DOUBLE
STRING
"null"
BYTES
byte array of length 0
field
description
name
Name of the metric column
dataType
Data type of the column. Can be INT, LONG, DOUBLE, FLOAT, BYTES (for specialized representations such as HLL, TDigest, etc, where the column stores byte serialized version of the value)
defaultNullValue
Represents null values in the data. If not specified, an internal default null value is used, as listed here. The values are the same as those used for dimensionFieldSpec.
Data Type
Internal Default Null Value
INT
0
LONG
0
FLOAT
0.0
DOUBLE
0.0
STRING
"null"
BYTES
byte array of length 0
field
description
name
Name of the date time column
dataType
Data type of the date time column. Can be STRING, INT, LONG
format
The format of the time column. The syntax of the format is timeSize:timeUnit:timeFormat
timeFormat can be either EPOCH or SIMPLE_DATE_FORMAT. If it is SIMPLE_DATE_FORMAT, the pattern string is also specified. For example:
1:MILLISECONDS:EPOCH - epoch millis
1:HOURS:EPOCH - epoch hours
1:DAYS:SIMPLE_DATE_FORMAT:yyyyMMdd - date specified like 20191018
1:HOURS:SIMPLE_DATE_FORMAT:EEE MMM dd HH:mm:ss ZZZ yyyy - date specified like Mon Aug 24 12:36:50 America/Los_Angeles 2019
granularity
The granularity in which the column is bucketed. The syntax of granularity is
bucket size:bucket unit
For example, the format can be milliseconds 1:MILLISECONDS:EPOCH
, but bucketed to 15 minutes i.e. we only have one value for every 15 minute interval, in which case granularity can be specified as 15:MINUTES
defaultNullValue
Represents null values in the data. If not specified, an internal default null value is used, as listed here. The values are the same as those used for dimensionFieldSpec.
timeFieldSpec fields
Description
incomingGranularitySpec
Details of the time column in the incoming data
outgoingGranularitySpec
Details of the format to which the time column should be converted for using in Pinot
field
description
name
Name of the time column. If incomingGranularitySpec, this is the name of the time column in the incoming data. If outgoingGranularitySpec, this is the name of the column you wish to transform it to and see in Pinot
dataType
Data type of the time column. Can be INT, LONG or STRING
timeType
Indicates the time unit. Can be one of DAYS, SECONDS, HOURS, MILLISECONDS, MICROSECONDS and NANOSECONDS
timeUnitSize
Indicates the bucket length. By default 1. E.g. in the sample above outgoing time is in fiveMinutesSinceEpoch i.e. rounded to 5 minutes buckets
timeFormat
EPOCH (millisSinceEpoch, hoursSinceEpoch etc) or SIMPLE_DATE_FORMAT (yyyyMMdd, yyyyMMdd:hhssmm etc)
field name
description
maxLength
Max length of this column
transformFunction
Transform function to generate this column. See section below.
virtualColumnProvider
Column value provider
Function name
Description
toEpochSeconds
Converts epoch millis to epoch seconds.
Usage: "transformFunction": "toEpochSeconds(millis)"
toEpochMinutes
Converts epoch millis to epoch minutes
Usage: "transformFunction": "toEpochMinutes(millis)"
toEpochHours
Converts epoch millis to epoch hours
Usage: "transformFunction": "toEpochHours(millis)"
toEpochDays
Converts epoch millis to epoch days
Usage: "transformFunction": "toEpochDays(millis)"
Function Name
Description
toEpochSecondsRounded
Converts epoch millis to epoch seconds, rounding to nearest rounding bucket
"transformFunction": "toEpochSecondsRounded(millis, 30)"
toEpochMinutesRounded
Converts epoch millis to epoch seconds, rounding to nearest rounding bucket
"transformFunction": "toEpochMinutesRounded(millis, 10)"
toEpochHoursRounded
Converts epoch millis to epoch seconds, rounding to nearest rounding bucket
"transformFunction": "toEpochHoursRounded(millis, 6)"
toEpochDaysRounded
Converts epoch millis to epoch seconds, rounding to nearest rounding bucket
"transformFunction": "toEpochDaysRounded(millis, 7)"
Function Name
Description
fromEpochSeconds
Converts from epoch seconds to milliseconds
"transformFunction": "fromEpochSeconds(secondsSinceEpoch)"
fromEpochMinutes
Converts from epoch minutes to milliseconds
"transformFunction": "fromEpochMinutes(minutesSinceEpoch)"
fromEpochHours
Converts from epoch hours to milliseconds
"transformFunction": "fromEpochHours(hoursSinceEpoch)"
fromEpochDays
Converts from epoch days to milliseconds
"transformFunction": "fromEpochDays(daysSinceEpoch)"
Function name
Description
toDateTime
Converts from milliseconds to a formatted date time string, as per the provided pattern
"transformFunction": "toDateTime(millis, 'yyyy-MM-dd')"
fromDateTime
Converts a formatted date time string to milliseconds, as per the provided pattern
"transformFunction": "fromDateTime(dateTimeStr, 'EEE MMM dd HH:mm:ss ZZZ yyyy')"
Function name
Description
toJsonMapStr
Converts a JSON/Avro map to a string. This json map can then be queried using jsonExtractScalar function.
"transformFunction": "toJsonMapStr(jsonMapField)"
Column Category
Description
Dimension
Dimension columns are typically used in slice and dice operations for answering business queries. Frequent operations done on dimension columns:
GROUP BY - group by one or more dimension columns along with aggregations on one or more metric columns
Filter processing
Metric
These columns represent quantitative data of the table. Such columns are frequently used in aggregation operations. In data warehouse terminology, these are also referred to as fact or measure columns.
Frequent operations done on metric columns:
Aggregation - SUM, MIN, MAX, COUNT, AVG etc
Filter processing
DateTime
This column represents time columns in the data. There can be multiple time columns in a table, but only one of them is the primary time column. Primary time column is the one that is set in the segmentConfig. This primary time column is used by Pinot, for maintaining the time boundary between offline and realtime data in a hybrid table and for retention management. A primary time column is mandatory if the table's push type is APPEND
and optional if the push type is REFRESH
.
Common operations done on time column:
GROUP BY
Filter processing
Time
This has been deprecated. Use DateTime column type for time columns.
This column represents a timestamp. There can be at most one time column in a table. Common operations done on time column:
GROUP BY
Filter processing
The time column is also used internally by Pinot, for maintaining the time boundary between offline and realtime data in a hybrid table and for retention management. A time column is mandatory if the table's push type is APPEND
and optional if the push type is REFRESH
.
This page has a collection of frequently asked questions with answers from the community.
This is a list of frequent questions most often asked in our troubleshooting channel on Slack. Please feel free to contribute your questions and answers here and make a pull request.
We have toJsonStr(key)
function which can store a top level json field as a STRING in Pinot.
Then you can use jsonExtractScalar(JSON_STRING_FIELD, JSON_PATH, OUTPUT_FORMAT)
function during query time to fetch the desired field from the json string. For example
NOTE This works well if some of your fields are nested json, but most of your fields are top level json keys. If all of your fields are within a nested JSON key, you will have to store the entire payload as 1 column, which is not ideal.
Support for flattening during ingestion is on the roadmap: https://github.com/apache/incubator-pinot/issues/5264
Inverted indexes are set in the tableConfig's tableIndexConfig -> invertedIndexColumns list. Here's the documentation for tableIndexConfig: https://docs.pinot.apache.org/basics/components/table#tableindexconfig-1 along with a sample table that has set inverted indexes on some columns.
Applying inverted indexes to a table config will generate inverted index to all new segments. In order to apply the inverted indexes to all existing segments, follow steps in How to apply inverted index to existing setup?
Add the columns you wish to index to the tableIndexConfig-> invertedIndexColumns list. This sample table config show inverted indexes set: https://docs.pinot.apache.org/basics/components/table#offline-table-config To update the table config use the Pinot Swagger API: http://localhost:9000/help#!/Table/updateTableConfig
Invoke the reload API: http://localhost:9000/help#!/Segment/reloadAllSegments
Right now, there’s no easy way to confirm that reload succeeded. One way it to check out the index_map file inside the segment metadata, you should see inverted index entries for the new columns. An API for this is coming soon: https://github.com/apache/incubator-pinot/issues/5390
Here's the page explaining the Pinot response format: https://docs.pinot.apache.org/users/api/querying-pinot-using-standard-sql/response-format
"timestamp" is a reserved keyword in SQL. Escape timestamp with double quotes.
Other commonly encountered reserved keywords are date, time, table.
For filtering on STRING columns, use single quotes
The fields in the ORDER BY
clause must be one of the group by clauses or aggregations, BEFORE applying the alias. Therefore, this will not work
Instead, this will work
You can change the number of replicas by updating the table config's segmentsConfig section. Make sure you have at least as many servers as the replication.
For OFFLINE table, update replication
For REALTIME table update replicasPerPartition
After changing the replication, run a table rebalance.
A rebalance is run to reassign all the segments of a table to the available servers. This is typically done when capacity changes are done i.e. adding more servers or removing servers from a table.
Offline
Use the rebalance API from the Swagger APIs on the controller http://localhost:9000/help#!/Table/rebalance, with tableType OFFLINE
Realtime
Use the rebalance API from the Swagger APIs on the controller http://localhost:9000/help#!/Table/rebalance, with tableType REALTIME.
A realtime table has 2 components, the consuming segments and the completed segments. By default, only the completed segments will get rebalanced. The consuming segments will pick the right assignment once they complete. But you can enforce the consuming segments to also be included in the rebalance, by setting the param includeConsuming
to true. Note that rebalancing the consuming segments would mean the consuming segment will drop the consumed data so far, and restart consumption from the last offset, which may lead to a short duration of data staleness.
You can check the status of the rebalance by
Checking the controller logs
Running rebalance again after a while, you should receive status "status": "NO_OP"
Checking the External View of the table, to see the changes in capacity/replicas have taken effect.
Yes, replica groups work for realtime. There's 2 parts to enabling replica groups:
Replica groups segment assignment
Replica group query routing
Replica group segment assignment
Replica group segment assignment is achieved in realtime, if number of servers is a multiple of number of replicas. The partitions get uniformly sprayed across the servers, creating replica groups.
For example, consider we have 6 partitions, 2 replicas, and 4 servers.
r1
r2
p1
S0
S1
p2
S2
S3
p3
S0
S1
p4
S2
S3
p5
S0
S1
p6
S2
S3
As you can see, the set (S0, S2) contains r1 of every partition, and (s1, S3) contains r2 of every partition. The query will only be routed to one of the sets, and not span every server. If you are are adding/removing servers from an existing table setup, you have to run rebalance for segment assignment changes to take effect.
Replica group query routing
Once replica group segment assignment is in effect, the query routing can take advantage of it. For replica group based query routing, set the following in the table config's routing section, and then restart brokers
Pinot quick start in Kubernetes
This quick start assumes the existence of a Kubernetes cluster. Please follow the links below to setup your Kubernetes cluster.
Before continuing, please make sure that you've downloaded Apache Pinot. The scripts for the setup in this guide can be found in our open source project on GitHub.
The scripts can be found in the Pinot source at ./incubator-pinot/kubernetes/helm
Pinot repo has pre-packaged HelmCharts for Pinot and Presto. Helm Repo index file is here.
For Helm v2.12.1
If your Kubernetes cluster is recently provisioned, ensure Helm is initialized by running:
Then deploy a new HA Pinot cluster using the following command:
For Helm v3.0.0
Error: Please run the below command if encountering the following issue:
Resolution:
Error: Please run the command below if encountering a permission issue:
Error: release pinot failed: namespaces "pinot-quickstart" is forbidden: User "system:serviceaccount:kube-system:default" cannot get resource "namespaces" in API group "" in the namespace "pinot-quickstart"
Resolution:
Ensure the Kafka deployment is ready before executing the scripts in the following next steps.
The scripts below will create two Kafka topics for data ingestion:
The script below will deploy 3 batch jobs.
Ingest 19492 JSON messages to Kafka topic flights-realtime
at a speed of 1 msg/sec
Ingest 19492 Avro messages to Kafka topic flights-realtime-avro
at a speed of 1 msg/sec
Upload Pinot schema airlineStats
Create Pinot table airlineStats
to ingest data from JSON encoded Kafka topic flights-realtime
Create Pinot table airlineStatsAvro
to ingest data from Avro encoded Kafka topic flights-realtime-avro
Please use the script below to perform local port-forwarding, which will also open Pinot query console in your default web browser.
This script can be found in the Pinot source at ./incubator-pinot/kubernetes/helm
You can run below command to navigate superset in your browser with the previous admin credential.
You can open the imported dashboard by clicking Dashboards
banner and then click on AirlineStats
.
You can run the command below to deploy a customized Presto with Pinot plugin installed.
Once Presto is deployed, you can run the command below.
List all catalogs
List All tables
Show schema
Count total documents
This quick start guide will help you bootstrap a Pinot standalone instance on your local machine.
In this guide you'll learn how to download and install Apache Pinot as a standalone instance.
This is a quickstart guide that will show you how to quickly start an example recipe in a standalone instance and is meant for learning. To run Pinot in cluster mode, please take a look at Manual cluster setup.
First, let's download the Pinot distribution for this tutorial. You can either build the distribution from source or download a packaged release.
Prerequisites
Install JDK8 or higher.
Follow these steps to checkout code from Github and build Pinot locally
Prerequisites
Install Apache Maven 3.6 or higher
Note that Pinot scripts is located under pinot-distribution/target not target directory under root.
Download the latest binary release from Apache Pinot, or use this command
Once you have the tar file,
We'll be using a quick-start script, which does the following:
Sets up the Pinot cluster QuickStartCluster
Creates a sample table and loads sample data
There's 3 kinds of quick start
Batch quick start creates the pinot cluster, creates an offline table baseballStats
and pushes sample offline data to the table.
That's it! We've spun up a Pinot cluster. You can continue playing with other types of quick start, or simply head on to Pinot Data Explorer to check out the data in the baseballStats
table.
Streaming quick start sets up a Kafka cluster and pushes sample data to a Kafka topic. Then, it creates the Pinot cluster and creates a realtime table meetupRSVP
which ingests data from the Kafka topic.
We now have a Pinot cluster with a realtime table! You can head over to Pinot Data Explorer to check out the data in the meetupRSVP
table.
Hybrid quick start sets up a Kafka cluster and pushes sample data to a Kafka topic. Then, it creates the Pinot cluster and creates a hybrid table airlineStats
. The realtime table ingests data from the Kafka topic. Lastly, sample data is pushed into the offline table.
Let's head over to Pinot Data Explorer to check out the data we pushed to the airlineStats
table.
This quick start guide will show you how to run a Pinot cluster using Docker.
Prerequisites
Install Docker
You can also try Kubernetes quick start if you already have a local minikube cluster installed or Docker Kubernetes setup.
Create an isolated bridge network in docker
We'll be using our docker image apachepinot/pinot:latest
to run this quick start, which does the following:
Sets up the Pinot cluster
Creates a sample table and loads sample data
There are 3 types of quick start examples.
Batch example
Streaming example
Hybrid example
In this example we demonstrate how to do batch processing with Pinot.
Starts Pinot deployment by starting
Apache Zookeeper
Pinot Controller
Pinot Broker
Pinot Server
Creates a demo table
baseballStats
Launches a standalone data ingestion job
Builds one Pinot segment for a given CSV data file for table baseballStats
Pushes the built segment to the Pinot controller
Issues sample queries to Pinot
Once the Docker container is running, you can view the logs by running the following command.
That's it! We've spun up a Pinot cluster.
It may take a while for all the Pinot components to start and for the sample data to be loaded.
Use the below command to check the status in the container logs.
Your cluster is ready once you see the cluster setup completion messages and sample queries, as demonstrated below.
You can head over to Exploring Pinot to check out the data in the baseballStats
table.
In this example we demonstrate how to do stream processing with Pinot.
Starts Pinot deployment by starting
Apache Kafka
Apache Zookeeper
Pinot Controller
Pinot Broker
Pinot Server
Creates a demo table
meetupRsvp
Launches a meetup
**stream
Publishes data to a Kafka topic meetupRSVPEvents
to be subscribed to by Pinot
Issues sample queries to Pinot
Once the cluster is up, you can head over to Exploring Pinot to check out the data in the meetupRSVPEvents
table.
In this example we demonstrate how to do hybrid stream and batch processing with Pinot.
Starts Pinot deployment by starting
Apache Kafka
Apache Zookeeper
Pinot Controller
Pinot Broker
Pinot Server
Creates a demo table
airlineStats
Launches a standalone data ingestion job
Builds Pinot segments under a given directory of Avro files for table airlineStats
Pushes built segments to Pinot controller
Launches a **stream of flights stats
Publishes data to a Kafka topic airlineStatsEvents
to be subscribed to by Pinot
Issues sample queries to Pinot
Once the cluster is up, you can head over to Exploring Pinot to check out the data in the airlineStats
table.
Pinot has the concept of table, which is a logical abstraction to refer to a collection of related data. Pinot has a distributed architecture and scales horizontally. Pinot expects the size of a table to grow infinitely over time. In order to achieve this, the entire data needs to be distributed across multiple nodes. Pinot achieve this by breaking the data into smaller chunks known as segment (this is similar to shards/partitions in relational databases). Segments can also be seen as time based partitions.
Thus, a segment is a horizontal shard representing a chunk of table data with some number of rows. The segment stores data for all columns of the table. Each segment packs the data in a columnar fashion, along with the dictionaries and indices for the columns. The segment is laid out in a columnar format so that it can be directly mapped into memory for serving queries.
Columns may be single or multi-valued. Column types may be STRING, INT, LONG, FLOAT, DOUBLE or BYTES. Columns may be declared to be metric or dimension (or specifically as a time dimension) in the schema. Columns can have default null value. For example, the default null value of a integer column can be 0. Note: The default value of byte column has to be hex-encoded before adding to the schema.
Pinot uses dictionary encoding to store values as a dictionary ID. Columns may be configured to be “no-dictionary” column in which case raw values are stored. Dictionary IDs are encoded using minimum number of bits for efficient storage (e.g. a column with cardinality of 3 will use only 2 bits for each dictionary ID).
There is a forward index built for each column and compressed appropriately for efficient memory use. In addition, optional inverted indices can be configured for any set of columns. Inverted indices, while take up more storage, offer better query performance. Specialized indexes like Star-Tree index is also supported. Check out Indexing for more details.
Once the table is configured, we can load some data. Loading data involves generating pinot segments from raw data and pushing them to the pinot cluster. Data can be loaded in batch mode or streaming mode. See ingestion overview page for details.
Below are instructions to generate and push segments to Pinot via standalone scripts. For a production setup, you should use frameworks such as Hadoop or Spark. See this page for more details on setting up Data Ingestion Jobs.
To generate a segment, we need to first create a job spec yaml file. JobSpec yaml file has all the information regarding data format, input data location and pinot cluster coordinates. Note that this assumes that the controller is RUNNING to fetch the table config and schema. If not, you will have to configure the spec to point at their location.
where,
Top level field
Description
executionFrameworkSpec
jobType
Pinot ingestion job type. Supported job types are:
SegmentCreation - only create segment
SegmentTarPush - only upload segments
SegmentUriPush -
SegmentCreationAndTarPush - create and upload segment
SegmentCreationAndUriPush -
inputDirURI
Root directory of input data, expected to have scheme configured in PinotFS.
includeFileNamePattern
Include file name pattern, supported glob pattern. E.g.
'glob:*.avro'
will include all avro files just under the inputDirURI, not sub directories
'glob:**/*.avro'
will include all the avro files under inputDirURI recursively.
excludeFileNamePattern
Exclude file name pattern, supported glob pattern. Similar usage as includeFilePatternName
outputDirURI
Root directory of output segments, expected to have scheme configured in PinotFS.
overwriteOutput
Overwrite output segments if existed.
pinotFSSpecs
recordReaderSpec
tableSpec
segmentNameGeneratorSpec
pinotClusterSpecs
pushJobSpec
field
Description
name
execution framework name
segmentGenerationJobRunnerClassName
class name implements org.apache.pinot.spi.batch.ingestion.runner.SegmentGenerationJobRunner interface.
segmentTarPushJobRunnerClassName
class name implements org.apache.pinot.spi.batch.ingestion.runner.SegmentTarPushJobRunner interface.
segmentUriPushJobRunnerClassName
class name implements org.apache.pinot.spi.batch.ingestion.runner.SegmentUriPushJobRunner interface.
extraConfigs
Map of extra configs for execution framework
field
description
schema
used to identify a PinotFS. E.g. local, hdfs, dbfs, etc
className
Class name used to create the PinotFS instance. E.g.
org.apache.pinot.spi.filesystem.LocalPinotFS
is used for local filesystem
org.apache.pinot.plugin.filesystem.HadoopPinotFS
is used for HDFS
configs
configs used to init PinotFS instance
field
description
dataFormat
Record data format, e.g. 'avro', 'parquet', 'orc', 'csv', 'json', 'thrift' etc.
className
Corresponding RecordReader class name. E.g.
org.apache.pinot.plugin.inputformat.avro.AvroRecordReader
org.apache.pinot.plugin.inputformat.csv.CSVRecordReader
org.apache.pinot.plugin.inputformat.parquet.ParquetRecordReader
org.apache.pinot.plugin.inputformat.json.JsonRecordReader
org.apache.pinot.plugin.inputformat.orc.OrcRecordReader
org.apache.pinot.plugin.inputformat.thrift.ThriftRecordReader
configClassName
Corresponding RecordReaderConfig class name, it's mandatory for CSV and Thrift file format. E.g.
org.apache.pinot.plugin.inputformat.csv.CSVRecordReaderConfig
org.apache.pinot.plugin.inputformat.thrift.ThriftRecordReaderConfig
configs
Used to init RecordReaderConfig class name, this config is required for CSV and Thrift data format.
field
description
tableName
table name
schemaURI
defines where to read the table schema, supports PinotFS or HTTP. E.g.
hdfs://path/to/table_schema.json
http://localhost:9000/tables/myTable/schema
tableConfigURI
defines where to read the table config. Supports using PinotFS or HTTP. E.g.
hdfs://path/to/table_config.json
http://localhost:9000/tables/myTable
field
description
type
supported type is simple
and normalizedDate
configs
configs to init SegmentNameGenerator
field
description
controllerURI
used to fetch table/schema information and data push.
E.g. http://localhost:9000
field
description
pushAttempts
number of attempts for push job, default is 1, which means no retry.
pushRetryIntervalMillis
retry wait Ms, default to 1 second.
pushParallelism
push job parallelism, default is 1
To create and push the segment in one go, use
Sample Console Output
Alternately, you can separately create and then push, by changing the jobType to SegmentCreation
or SegmenTarPush
.
Ingestion job spec supports templating with Groovy Syntax.
This would be convenient for users to generate one ingestion job template file and schedule it in a daily basis with extra parameters updated daily.
E.g. users can set inputDirURI
with parameters to indicate date, so that ingestion job only process the data for a particular date.
Below is an example to specify the date templating for input and output path.
Then specify the value of ${year}, ${month}, ${day}
when kicking off the ingestion job with arguments: -values $param=value1 $param2=value2
...
This ingestion job only generates segment for date 2014-01-03
Prerequisites
Below is an example of how to publish sample data to your stream. As soon as data is available to the realtime stream, it starts getting consumed by the realtime servers
Run below command to stream JSON data into Kafka topic: flights-realtime
Run below command to stream JSON data into Kafka topic: flights-realtime
A table is a logical abstraction to refer to a collection of related data. It consists of columns and rows (documents).
Data in Pinot tables is sharded into segments. A Pinot table is modeled as a Helix resource. Each segment of a table is modeled as a Helix Partition.
A table is typically associated with a schema, which is used to define the names, data types and other information of the columns of the table.
There are 3 types of a Pinot table
Table type
Description
Offline
Offline tables ingest pre-built pinot-segments from external data stores.
Realtime
Realtime tables ingest data from streams (such as Kafka) and build segments.
Hybrid
A hybrid Pinot table has both realtime as well as offline tables under the hood.
Note that the query does not know the existence of offline or realtime tables. It only specifies the table name in the query. For example, regardless of whether we have an offline table myTable_OFFLINE
, or a realtime table myTable_REALTIME
or a hybrid table containing both of these, the query will simply use mytable
as select count(*) from myTable
.
A table config file is used to define the table properties, such as name, type, indexing, routing, retention etc. It is written in JSON format, and stored in the property store in Zookeeper, along with the table schema.
Here's an example table config for an offline table
We will now discuss each section of the table config in detail.
Top level field
Description
tableName
Specifies the name of the table. Should only contain alpha-numeric characters, hyphens (‘-‘), or underscores (‘’). (Using a double-underscore (‘_’) is not allowed and reserved for other features within Pinot)
tableType
Defines the table type - OFFLINE
for offline table, REALTIME
for realtime table. A hybrid table is essentially 2 table configs one of each type, with the same table name.
quota
routing
segmentsConfig
tableIndexConfig
tenants
metadata
This section is for keeping custom configs, which are expressed as key value pairs.
quota fields
Description
storage
The maximum storage space the table is allowed to use, before replication. For example, in the above table, the storage is 140G and replication is 3. Therefore, the maximum storage the table is allowed to use is 140*3=420G. The space used by the table is calculated by adding up the sizes of all segments from every server hosting this table. Once this limit is reached, offline segment push throws a 403
exception with message, Quota check failed for segment: segment_0 of table: pinotTable
.
maxQueriesPerSecond
The maximum queries per second allowed to execute on this table. If query volume exceeds this, a 429
exception with message,Request 123 exceeds query quota for table:pinotTable, query:select count(*) from pinotTable