Loading...
Explore the fundamental concepts of Apache Pinot for efficient data processing and analysis. Gain insights into the core principles and foundational ideas behind Pinot's capabilities.
Pinot is designed to deliver low latency queries on large datasets. To achieve this performance, Pinot stores data in a columnar format and adds additional indices to perform fast filtering, aggregation and group by.
Raw data is broken into small data shards. Each shard is converted into a unit called a segment. One or more segments together form a table, which is the logical container for querying Pinot using SQL/PQL.
Pinot's storage model and infrastructure components include segments, tables, tenants, and clusters.
Pinot has a distributed systems architecture that scales horizontally. Pinot expects the size of a table to grow infinitely over time. In order to achieve this, all data needs to be distributed across multiple nodes. Pinot achieves this by breaking data into smaller chunks known as segments (similar to shards/partitions in high-availability (HA) relational databases). Another way to describe segments is as time-based partitions.
Similar to traditional databases, Pinot has the concept of a table—a logical abstraction that refers to a collection of related data.
As is the case with relational database management systems (RDBMS), a table is a construct that consists of columns and rows (documents) that are queried using SQL. A table is associated with a schema that defines the columns in a table as well as their data types.
In contrast to RDBMS schemas, multiple tables in Pinot can share a single schema definition. Tables are independently configured for concerns such as indexing strategies, partitioning, tenants, data sources, or replication.
Pinot supports multi-tenancy. Every Pinot table is associated with a tenant. This allows all tables belonging to a particular logical namespace to be grouped under a single tenant name and isolated from other tenants. This isolation between tenants provides different namespaces for applications and teams to prevent sharing tables or schemas. Development teams building applications will never have to operate an independent deployment of Pinot. An organization can operate a single cluster and scale it out as new tenants increase the overall volume of queries. Developers can manage their own schemas and tables without being impacted by any other tenant on a cluster.
By default, all tables belong to a default tenant named "default". The concept of tenants is very important, as it satisfies the architectural principle of a "database per service/application" without having to operate many independent data stores. Further, tenants will schedule resources so that segments (shards) are able to restrict a table's data to reside only on a specified set of nodes. Similar to the kind of isolation that is ubiquitously used in Linux containers, compute resources in Pinot can be scheduled to prevent resource contention between tenants.
Logically, a cluster is simply a group of tenants. As with the classical definition of a cluster, it is also a grouping of a set of compute nodes. Typically, there is only one cluster per environment/data center. There is no needed to create multiple clusters since Pinot supports the concept of tenants. At LinkedIn, the largest Pinot cluster consists of 1000+ nodes distributed across a data center. The number of nodes in a cluster can be added in a way that will linearly increase performance and availability of queries. The number of nodes and the compute resources per node will reliably predict the QPS for a Pinot cluster, and as such, capacity planning can be easily achieved using SLAs that assert performance expectations for end-user applications.
Auto-scaling is also achievable, however, we recommend a set amount of nodes to keep QPS consistent when query loads vary in sudden unpredictable end-user usage scenarios.
A Pinot cluster consists of multiple distributed system components. These components are useful to understand for operators that are monitoring system usage or are debugging an issue with a cluster deployment.
Controller
Broker
Server
Minion (optional)
Pinot's integration with Apache Zookeeper and Apache Helix allow it to be linearly scalable for an unbounded number of nodes.
Helix is a cluster management solution designed and created by the authors of Pinot at LinkedIn. Helix drives the state of a Pinot cluster from a transient state to an ideal state, acting as the fault-tolerant distributed state store that guarantees consistency. Helix is embedded as agents that operate within a controller, broker, and server, and does not exist as an independent and horizontally scaled component.
A controller is the core orchestrator that drives the consistency and routing in a Pinot cluster. Controllers are horizontally scaled as an independent component (container) and has visibility of the state of all other components in a cluster. The controller reacts and responds to state changes in the system and schedules the allocation of resources for tables, segments, or nodes. As mentioned earlier, Helix is embedded within the controller as an agent that is a participant responsible for observing and driving state changes that are subscribed to by other components.
In addition to cluster management, resource allocation, and scheduling, the controller is also the HTTP gateway for REST API administration of a Pinot deployment. A web-based query console is also provided for operators to quickly and easily run SQL/PQL queries.
A broker receives queries from a client and routes its execution to one or more Pinot servers before returning a consolidated response.
Servers host segments (shards) that are scheduled and allocated across multiple nodes and routed on an assignment to a tenant (there is a single-tenant by default). Servers are independent containers that scale horizontally and are notified by Helix through state changes driven by the controller. A server can either be a real-time server or an offline server.
A real-time and offline server have very different resource usage requirements, where real-time servers are continually consuming new messages from external systems (such as Kafka topics) that are ingested and allocated on segments of a tenant. Because of this, resource isolation can be used to prioritize high-throughput real-time data streams that are ingested and then made available for query through a broker.
Pinot minion is an optional component that can be used to run background tasks such as "purge" for GDPR (General Data Protection Regulation). As Pinot is an immutable aggregate store, records containing sensitive private data need to be purged on a request-by-request basis. Minion provides a solution for this purpose that complies with GDPR while optimizing Pinot segments and building additional indices that guarantee performance in the presence of the possibility of data deletion. One can also write a custom task that runs on a periodic basis. While it's possible to perform these tasks on the Pinot servers directly, having a separate process (Minion) lessens the overall degradation of query latency as segments are impacted by mutable writes.
Loading...
Loading...
Loading...
Loading...
Loading...
Discover the controller component of Apache Pinot, enabling efficient data and query management.
The Pinot controller is responsible for the following:
Maintaining global metadata (e.g., configs and schemas) of the system with the help of Zookeeper which is used as the persistent metadata store.
Hosting the Helix Controller and managing other Pinot components (brokers, servers, minions)
Maintaining the mapping of which servers are responsible for which segments. This mapping is used by the servers to download the portion of the segments that they are responsible for. This mapping is also used by the broker to decide which servers to route the queries to.
Serving admin endpoints for viewing, creating, updating, and deleting configs, which are used to manage and operate the cluster.
Serving endpoints for segment uploads, which are used in offline data pushes. They are responsible for initializing real-time consumption and coordination of persisting real-time segments into the segment store periodically.
Undertaking other management activities such as managing retention of segments, validations.
For redundancy, there can be multiple instances of Pinot controllers. Pinot expects that all controllers are configured with the same back-end storage system so that they have a common view of the segments (e.g. NFS). Pinot can use other storage systems such as HDFS or ADLS.
The controller runs several periodic tasks in the background, to perform activities such as management and validation. Each periodic task has its own configuration to define the run frequency and default frequency. Each task runs at its own schedule or can also be triggered manually if needed. The task runs on the lead controller for each table.
For period task configuration details, see Controller configuration reference.
Use the GET /periodictask/names
API to fetch the names of all the periodic tasks running on your Pinot cluster.
To manually run a named periodic task, use the GET /periodictask/run
API:
The Log Request Id
(api-09630c07
) can be used to search through pinot-controller log file to see log entries related to execution of the Periodic task that was manually run.
If tableName
(and its type OFFLINE
or REALTIME
) is not provided, the task will run against all tables.
Make sure you've set up Zookeeper. If you're using Docker, make sure to pull the Pinot Docker image. To start a controller:
Loading...
Loading...
Explore the table component in Apache Pinot, a fundamental building block for organizing and managing data in Pinot clusters, enabling effective data processing and analysis.
A table is a logical abstraction that represents a collection of related data. It is composed of columns and rows (known as documents in Pinot). The columns, data types, and other metadata related to the table are defined using a schema.
Pinot breaks a table into multiple segments and stores these segments in a deep-store such as Hadoop Distributed File System (HDFS) as well as Pinot servers.
In the Pinot cluster, a table is modeled as a Helix resource and each segment of a table is modeled as a Helix Partition.
Table naming in Pinot follows typical naming conventions, such as starting names with a letter, not ending with an underscore, and using only alphanumeric characters.
Pinot supports the following types of tables:
Offline
Offline tables ingest pre-built Pinot segments from external data stores and are generally used for batch ingestion.
Real-time
Real-time tables ingest data from streams (such as Kafka) and build segments from the consumed data.
Hybrid
Hybrid Pinot tables have both real-time as well as offline tables under the hood. By default, all tables in Pinot are hybrid.
The user querying the database does not need to know the type of the table. They only need to specify the table name in the query.
e.g. regardless of whether we have an offline table myTable_OFFLINE
, a real-time table myTable_REALTIME
, or a hybrid table containing both of these, the query will be:
Table configuration is used to define the table properties, such as name, type, indexing, routing, and retention. It is written in JSON format and is stored in Zookeeper, along with the table schema.
Use the following properties to make your tables faster or leaner:
Segment
Indexing
Tenants
A table is comprised of small chunks of data known as segments. Learn more about how Pinot creates and manages segments here.
For offline tables, segments are built outside of Pinot and uploaded using a distributed executor such as Spark or Hadoop. For details, see Batch Ingestion.
For real-time tables, segments are built in a specific interval inside Pinot. You can tune the following for the real-time segments.
The Pinot real-time consumer ingests the data, creates the segment, and then flushes the in-memory segment to disk. Pinot allows you to configure when to flush the segment in the following ways:
Number of consumed rows: After consuming the specified number of rows from the stream, Pinot will persist the segment to disk.
Number of desired rows per segment: Pinot learns and then estimates the number of rows that need to be consumed. The learning phase starts by setting the number of rows to 100,000 (this value can be changed) and adjusts it to reach the appropriate segment size. Because Pinot corrects the estimate as it goes along, the segment size might go significantly over the correct size during the learning phase. You should set this value to optimize the performance of queries.
Max time duration to wait: Pinot consumers wait for the configured time duration after which segments are persisted to the disk.
Replicas A segment can have multiple replicas to provide higher availability. You can configure the number of replicas for a table segment using the CLI.
Completion Mode By default, if the in-memory segment in the non-winner server is equivalent to the committed segment, then the non-winner server builds and replaces the segment. If the available segment is not equivalent to the committed segment, the server just downloads the committed segment from the controller.
However, in certain scenarios, the segment build can get very memory-intensive. In these cases, you might want to enforce the non-committer servers to just download the segment from the controller instead of building it again. You can do this by setting completionMode: "DOWNLOAD"
in the table configuration.
For details, see Completion Config.
Download Scheme
A Pinot server might fail to download segments from the deep store, such as HDFS, after its completion. However, you can configure servers to download these segments from peer servers instead of the deep store. Currently, only HTTP and HTTPS download schemes are supported. More methods, such as gRPC/Thrift, are planned be added in the future.
For more details about peer segment download during real-time ingestion, refer to this design doc on bypass deep store for segment completion.
You can create multiple indices on a table to increase the performance of the queries. The following types of indices are supported:
Dictionary-encoded forward index with bit compression
Raw value forward index
Sorted forward index with run-length encoding
Bitmap inverted index
Sorted inverted index
For more details on each indexing mechanism and corresponding configurations, see Indexing.
Set up Bloomfilters on columns to make queries faster. You can also keep segments in off-heap instead of on-heap memory for faster queries.
Aggregate the real-time stream data as it is consumed to reduce segment sizes. We add the metric column values of all rows that have the same values for all dimension and time columns and create a single row in the segment. This feature is only available on REALTIME
tables.
The only supported aggregation is SUM
. The columns to pre-aggregate need to satisfy the following requirements:
All metrics should be listed in noDictionaryColumns
.
No multi-value dimensions
All dimension columns are treated to have a dictionary, even if they appear as noDictionaryColumns
in the config.
The following table config snippet shows an example of enabling pre-aggregation during real-time ingestion:
Each table is associated with a tenant. A segment resides on the server, which has the same tenant as itself. For details, see Tenant.
Optionally, override if a table should move to a server with different tenant based on segment status. The example below adds a tagOverrideConfig
under the tenants
section for real-time tables to override tags for consuming and completed segments.
In the above example, the consuming segments will still be assigned to serverTenantName_REALTIME
hosts, but once they are completed, the segments will be moved to serverTeantnName_OFFLINE
.
You can specify the full name of any tag in this section. For example, you could decide that completed segments for this table should be in Pinot servers tagged as allTables_COMPLETED
). To learn more about, see the Moving Completed Segments section.
A hybrid table is a table composed of two tables, one offline and one real-time, that share the same name. In a hybrid table, offline segments can be pushed periodically. The retention on the offline table can be set to a high value because segments are coming in on a periodic basis, whereas the retention on the real-time part can be small.
Once an offline segment is pushed to cover a recent time period, the brokers automatically switch to using the offline table for segments for that time period and use the real-time table only for data not available in the offline table.
To learn how time boundaries work for hybrid tables, see Broker.
A typical use case for hybrid tables is pushing deduplicated, cleaned-up data into an offline table every day while consuming real-time data as it arrives. Data can remain in offline tables for as long as a few years, while the real-time data would be cleaned every few days.
Create a table config for your data, or see examples
for all possible batch/streaming tables.
Prerequisites
Sample console output
Check out the table config in the Rest API to make sure it was successfully uploaded.
Start Kafka
Create a Kafka topic
Create a streaming table
Sample output
Start Kafka-Zookeeper
Start Kafka
Create stream table
Check out the table config in the Rest API to make sure it was successfully uploaded.
To create a hybrid table, you have to create the offline and real-time tables individually. You don't need to create a separate hybrid table.
Discover the segment component in Apache Pinot for efficient data storage and querying within Pinot clusters, enabling optimized data processing and analysis.
Pinot has the concept of a table, which is a logical abstraction to refer to a collection of related data. Pinot has a distributed architecture and scales horizontally. Pinot expects the size of a table to grow infinitely over time. In order to achieve this, the entire data needs to be distributed across multiple nodes.
Pinot achieves this by breaking the data into smaller chunks known as segments (similar to shards/partitions in relational databases). Segments can be seen as time-based partitions.
A segment is a horizontal shard representing a chunk of table data with some number of rows. The segment stores data for all columns of the table. Each segment packs the data in a columnar fashion, along with the dictionaries and indices for the columns. The segment is laid out in a columnar format so that it can be directly mapped into memory for serving queries.
Columns can be single or multi-valued and the following types are supported: STRING, BOOLEAN, INT, LONG, FLOAT, DOUBLE, TIMESTAMP or BYTES. Only single-valued BIG_DECIMAL data type is supported.
Columns may be declared to be metric or dimension (or specifically as a time dimension) in the schema. Columns can have default null values. For example, the default null value of a integer column can be 0. The default value for bytes columns must be hex-encoded before it's added to the schema.
Pinot uses dictionary encoding to store values as a dictionary ID. Columns may be configured to be “no-dictionary” column in which case raw values are stored. Dictionary IDs are encoded using minimum number of bits for efficient storage (e.g. a column with a cardinality of 3 will use only 2 bits for each dictionary ID).
A forward index is built for each column and compressed for efficient memory use. In addition, you can optionally configure inverted indices for any set of columns. Inverted indices take up more storage, but improve query performance. Specialized indexes like Star-Tree index are also supported. For more details, see Indexing.
Once the table is configured, we can load some data. Loading data involves generating pinot segments from raw data and pushing them to the pinot cluster. Data can be loaded in batch mode or streaming mode. For more details, see the ingestion overview page.
Below are instructions to generate and push segments to Pinot via standalone scripts. For a production setup, you should use frameworks such as Hadoop or Spark. For more details on setting up data ingestion jobs, see Import Data.
To generate a segment, we need to first create a job spec YAML file. This file contains all the information regarding data format, input data location, and pinot cluster coordinates. Note that this assumes that the controller is RUNNING to fetch the table config and schema. If not, you will have to configure the spec to point at their location. For full configurations, see Ingestion Job Spec.
To create and push the segment in one go, use the following:
Sample Console Output
Alternately, you can separately create and then push, by changing the jobType to SegmentCreation
or SegmenTarPush
.
The Ingestion job spec supports templating with Groovy Syntax.
This is convenient if you want to generate one ingestion job template file and schedule it on a daily basis with extra parameters updated daily.
e.g. you could set inputDirURI
with parameters to indicate the date, so that the ingestion job only processes the data for a particular date. Below is an example that templates the date for input and output directories.
You can pass in arguments containing values for ${year}, ${month}, ${day}
when kicking off the ingestion job: -values $param=value1 $param2=value2
...
This ingestion job only generates segments for date 2014-01-03
Prerequisites
Below is an example of how to publish sample data to your stream. As soon as data is available to the real-time stream, it starts getting consumed by the real-time servers.
Run below command to stream JSON data into Kafka topic: flights-realtime
Run below command to stream JSON data into Kafka topic: flights-realtime
Loading...
Explore the Schema component in Apache Pinot, vital for defining the structure and data types of Pinot tables, enabling efficient data processing and analysis.
Each table in Pinot is associated with a schema. A schema defines what fields are present in the table along with the data types.
The schema is stored in Zookeeper along with the table configuration.
Schema naming in Pinot follows typical database table naming conventions, such as starting names with a letter, not ending with an underscore, and using only alphanumeric characters
A schema also defines what category a column belongs to. Columns in a Pinot table can be categorized into three categories:
Dimension
Dimension columns are typically used in slice and dice operations for answering business queries. Some operations for which dimension columns are used:
GROUP BY
- group by one or more dimension columns along with aggregations on one or more metric columns
Filter clauses such as WHERE
Metric
These columns represent the quantitative data of the table. Such columns are used for aggregation. In data warehouse terminology, these can also be referred to as fact or measure columns.
Some operation for which metric columns are used:
Aggregation - SUM
, MIN
, MAX
, COUNT
, AVG
etc
Filter clause such as WHERE
DateTime
Common operations that can be done on time column:
GROUP BY
Filter clauses such as WHERE
Pinot does not enforce strict rules on which of these categories columns belong to, rather the categories can be thought of as hints to Pinot to do internal optimizations.
For example, metrics may be stored without a dictionary and can have a different default null value.
The categories are also relevant when doing segment merge and rollups. Pinot uses the dimension and time fields to identify records against which to apply merge/rollups.
Metrics aggregation is another example where Pinot uses dimensions and time are used as the key, and automatically aggregates values for the metric columns.
For configuration details, see Schema configuration reference.
Since Pinot doesn't have a dedicated DATETIME
datatype support, you need to input time in either STRING, LONG, or INT format. However, Pinot needs to convert the date into an understandable format such as epoch timestamp to do operations. You can refer to DateTime field spec configs for more details on supported formats.
First, Make sure your cluster is up and running.
Let's create a schema and put it in a JSON file. For this example, we have created a schema for flight data.
For more details on constructing a schema file, see the Schema configuration reference.
Then, we can upload the sample schema provided above using either a Bash command or REST API call.
Check out the schema in the Rest API to make sure it was successfully uploaded
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Here you will find a collection of ready-made sample applications and examples for real-world data
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Apache Pinot is a real-time distributed OLAP datastore purpose-built for low-latency, high-throughput analytics.
We'd love to hear from you! Join us in our Slack channel to ask questions, troubleshoot, and share feedback.
Apache Pinot is a real-time distributed online analytical processing (OLAP) datastore. Use Pinot to ingest and immediately query data from streaming or batch data sources (including, Apache Kafka, Amazon Kinesis, Hadoop HDFS, Amazon S3, Azure ADLS, and Google Cloud Storage).
Apache Pinot includes the following:
Ultra low-latency analytics even at extremely high throughput.
Columnar data store with several smart indexing and pre-aggregation techniques.
Scaling up and out with no upper bound.
Consistent performance based on the size of your cluster and an expected query per second (QPS) threshold.
It's perfect for user-facing real-time analytics and other analytical use cases, including internal dashboards, anomaly detection, and ad hoc data exploration.
User-facing analytics refers to the analytical tools exposed to the end users of your product. In a user-facing analytics application, all users receive personalized analytics on their devices, resulting in hundreds of thousands of queries per second. Queries triggered by apps may grow quickly in proportion to the number of active users on the app, as many as millions of events per second. Data generated in Pinot is immediately available for analytics in latencies under one second.
User-facing real-time analytics requires the following:
Fresh data. The system needs to be able to ingest data in real time and make it available for querying, also in real time.
Support for high-velocity, highly dimensional event data from a wide range of actions and from multiple sources.
Low latency. Queries are triggered by end users interacting with apps, resulting in hundreds of thousands of queries per second with arbitrary patterns.
Reliability and high availability.
Scalability.
Low cost to serve.
Pinot is designed to execute OLAP queries with low latency. It works well where you need fast analytics, such as aggregations, on both mutable and immutable data.
User-facing, real-time analytics
Pinot was originally built at LinkedIn to power rich interactive real-time analytics applications, such as Who Viewed Profile, Company Analytics, Talent Insights, and many more. UberEats Restaurant Manager is another example of a user-facing analytics app built with Pinot.
Real-time dashboards for business metrics
Pinot can perform typical analytical operations such as slice and dice, drill down, roll up, and pivot on large scale multi-dimensional data. For instance, at LinkedIn, Pinot powers dashboards for thousands of business metrics. Connect various business intelligence (BI) tools such as Superset, Tableau, or PowerBI to visualize data in Pinot.
Enterprise business intelligence
For analysts and data scientists, Pinot works well as a highly-scalable data platform for business intelligence. Pinot converges big data platforms with the traditional role of a data warehouse, making it a suitable replacement for analysis and reporting.
Enterprise application development
For application developers, Pinot works well as an aggregate store that sources events from streaming data sources, such as Kafka, and makes it available for a query using SQL. You can also use Pinot to aggregate data across a microservice architecture into one easily queryable view of the domain.
Pinot tenants prevent any possibility of sharing ownership of database tables across microservice teams. Developers can create their own query models of data from multiple systems of record depending on their use case and needs. As with all aggregate stores, query models are eventually consistent.
If you're new to Pinot, take a look at our Getting Started guide:
Getting StartedTo start importing data into Pinot, see how to import batch and stream data:
Import DataTo start querying data in Pinot, check out our Query guide:
QueryFor a conceptual overview that explains how Pinot works, check out the Concepts guide:
ConceptsTo understand the distributed systems architecture that explains Pinot's operating model, take a look at our basic architecture section:
ArchitectureDiscover the core components of Apache Pinot, enabling efficient data processing and analytics. Unleash the power of Pinot's building blocks for high-performance data-driven applications.
Pages in this section define and describe the major components and logical abstractions used in Pinot.
For a general overview that ties all these components together, see Basic Concepts.
Learn to build and manage Apache Pinot clusters, uncovering key components for efficient data processing and optimized analysis.
A cluster is a set of nodes comprising of servers, brokers, controllers and minions.
Pinot uses Apache Helix for cluster management. Helix is a cluster management framework that manages replicated, partitioned resources in a distributed system. Helix uses Zookeeper to store cluster state and metadata.
For details of cluster configuration settings, see Cluster configuration reference.
Helix divides nodes into logical components based on their responsibilities:
Participants are the nodes that host distributed, partitioned resources
Pinot servers are modeled as participants. For details about server nodes, see Server.
Spectators are the nodes that observe the current state of each participant and use that information to access the resources. Spectators are notified of state changes in the cluster (state of a participant, or that of a partition in a participant).
Pinot brokers are modeled as spectators. For details about broker nodes, see Broker.
The node that observes and controls the Participant nodes. It is responsible for coordinating all transitions in the cluster and ensuring that state constraints are satisfied while maintaining cluster stability.
Pinot controllers are modeled as controllers. For details about controller nodes, see Controller.
Another way to visualize the cluster is a logical view, where:
Typically, there is only one cluster per environment/data center. There is no need to create multiple Pinot clusters because Pinot supports tenants.
To set up a cluster, see one of the following guides:
Discover the tenant component of Apache Pinot, which facilitates efficient data isolation and resource management within Pinot clusters.
A tenant is a logical component defined as a group of server/broker nodes with the same Helix tag.
In order to support multi-tenancy, Pinot has first-class support for tenants. Every table is associated with a server tenant and a broker tenant. This controls the nodes that will be used by this table as servers and brokers. This allows all tables belonging to a particular use case to be grouped under a single tenant name.
The concept of tenants is very important when the multiple use cases are using Pinot and there is a need to provide quotas or some sort of isolation across tenants. For example, consider we have two tables Table A
and Table B
in the same Pinot cluster.
We can configure Table A
with server tenant Tenant A
and Table B
with server tenant Tenant B
. We can tag some of the server nodes for Tenant A
and some for Tenant B
. This will ensure that segments of Table A
only reside on servers tagged with Tenant A
, and segment of Table B
only reside on servers tagged with Tenant B
. The same isolation can be achieved at the broker level, by configuring broker tenants to the tables.
No need to create separate clusters for every table or use case!
This tenant is defined in the tenants section of the table config.
This section contains two main fields broker
and server
, which decide the tenants used for the broker and server components of this table.
In the above example:
The table will be served by brokers that have been tagged as brokerTenantName_BROKER
in Helix.
If this were an offline table, the offline segments for the table will be hosted in Pinot servers tagged in Helix as serverTenantName_OFFLINE
If this were a real-time table, the real-time segments (both consuming as well as completed ones) will be hosted in pinot servers tagged in Helix as serverTenantName_REALTIME
.
Here's a sample broker tenant config. This will create a broker tenant sampleBrokerTenant
by tagging three untagged broker nodes as sampleBrokerTenant_BROKER
.
To create this tenant use the following command. The creation will fail if number of untagged broker nodes is less than numberOfInstances
.
Follow instructions in Getting Pinot to get Pinot locally, and then
Check out the table config in the Rest API to make sure it was successfully uploaded.
Here's a sample server tenant config. This will create a server tenant sampleServerTenant
by tagging 1 untagged server node as sampleServerTenant_OFFLINE
and 1 untagged server node as sampleServerTenant_REALTIME
.
To create this tenant use the following command. The creation will fail if number of untagged server nodes is less than offlineInstances
+ realtimeInstances
.
Follow instructions in Getting Pinot to get Pinot locally, and then
Check out the table config in the Rest API to make sure it was successfully uploaded.
Uncover the efficient data processing and storage capabilities of Apache Pinot's server component, optimizing performance for data-driven applications.
Servers host the data segments and serve queries off the data they host. There are two types of servers:
Offline Offline servers are responsible for downloading segments from the segment store, to host and serve queries off. When a new segment is uploaded to the controller, the controller decides the servers (as many as replication) that will host the new segment and notifies them to download the segment from the segment store. On receiving this notification, the servers download the segment file and load the segment onto the server, to server queries off them.
Real-time Real-time servers directly ingest from a real-time stream (such as Kafka or EventHubs). Periodically, they make segments of the in-memory ingested data, based on certain thresholds. This segment is then persisted onto the segment store.
Pinot servers are modeled as Helix participants, hosting Pinot tables (referred to as resources in Helix terminology). Segments of a table are modeled as Helix partitions (of a resource). Thus, a Pinot server hosts one or more Helix partitions of one or more helix resources (i.e. one or more segments of one or more tables).
Make sure you've set up Zookeeper. If you're using Docker, make sure to pull the Pinot Docker image. To start a server:
Discover how Apache Pinot's broker component optimizes query processing, data retrieval, and enhances data-driven applications.
Brokers handle Pinot queries. They accept queries from clients and forward them to the right servers. They collect results back from the servers and consolidate them into a single response, to send back to the client.
Pinot brokers are modeled as Helix spectators. They need to know the location of each segment of a table (and each replica of the segments) and route requests to the appropriate server that hosts the segments of the table being queried.
The broker ensures that all the rows of the table are queried exactly once so as to return correct, consistent results for a query. The brokers may optimize to prune some of the segments as long as accuracy is not sacrificed.
Helix provides the framework by which spectators can learn the location in which each partition of a resource (i.e. participant) resides. The brokers use this mechanism to learn the servers that host specific segments of a table.
In the case of hybrid tables, the brokers ensure that the overlap between real-time and offline segment data is queried exactly once, by performing offline and real-time federation.
Let's take this example, we have real-time data for 5 days - March 23 to March 27, and offline data has been pushed until Mar 25, which is 2 days behind real-time. The brokers maintain this time boundary.
Suppose, we get a query to this table : select sum(metric) from table
. The broker will split the query into 2 queries based on this time boundary – one for offline and one for real-time. This query becomes select sum(metric) from table_REALTIME where date >= Mar 25
and select sum(metric) from table_OFFLINE where date < Mar 25
The broker merges results from both these queries before returning the result to the client.
Make sure you've set up Zookeeper. If you're using Docker, make sure to pull the Pinot Docker image. To start a broker:
Leverage Apache Pinot's deep store component for efficient large-scale data storage and management, enabling impactful data processing and analysis.
The deep store (or deep storage) is the permanent store for segment files.
It is used for backup and restore operations. New server nodes in a cluster will pull down a copy of segment files from the deep store. If the local segment files on a server gets damaged in some way (or accidentally deleted), a new copy will be pulled down from the deep store on server restart.
The deep store stores a compressed version of the segment files and it typically won't include any indexes. These compressed files can be stored on a local file system or on a variety of other file systems. For more details on supported file systems, see File Systems.
Note: Deep store by itself is not sufficient for restore operations. Pinot stores metadata such as table config, schema, segment metadata in Zookeeper. For restore operations, both Deep Store as well as Zookeeper metadata are required.
There are several different ways that segments are persisted in the deep store.
For offline tables, the batch ingestion job writes the segment directly into the deep store, as shown in the diagram below:
The ingestion job then sends a notification about the new segment to the controller, which in turn notifies the appropriate server to pull down that segment.
For real-time tables, by default, a segment is first built-in memory by the server. It is then uploaded to the lead controller (as part of the Segment Completion Protocol sequence), which writes the segment into the deep store, as shown in the diagram below:
Having all segments go through the controller can become a system bottleneck under heavy load, in which case you can use the peer download policy, as described in Decoupling Controller from the Data Path.
When using this configuration, the server will directly write a completed segment to the deep store, as shown in the diagram below:
For hands-on examples of how to configure the deep store, see the following tutorials:
This section contains quick start guides to help you get up and running with Pinot.
To simplify the getting started experience, Pinot ships with quick start guides that launch Pinot components in a single process and import pre-built datasets.
For a full list of these guides, see Quick Start Examples.
Getting data into Pinot is easy. Take a look at these two quick start guides which will help you get up and running with sample data for offline and real-time tables.
Pinot Data Explorer is a user-friendly interface in Apache Pinot for interactive data exploration, querying, and visualization.
Once you have set up a cluster, you can start exploring the data and the APIs using the Pinot Data Explorer.
Navigate to http://localhost:9000 in your browser to open the Data Explorer UI.
The first screen that you'll see when you open the Pinot Data Explorer is the Cluster Manager. The Cluster Manager provides a UI to operate and manage your cluster.
If you want to view the contents of a server, click on its instance name. You'll then see the following:
To view the baseballStats table, click on its name, which will show the following screen:
From this screen, we can edit or delete the table, edit or adjust its schema, as well as several other operations.
For example, if we want to add yearID to the list of inverted indexes, click on Edit Table, add the extra column, and click Save:
Let's run some queries on the data in the Pinot cluster. Navigate to Query Console to see the querying interface.
We can see our baseballStats
table listed on the left (you will see meetupRSVP
or airlineStats
if you used the streaming or the hybrid quick start). Click on the table name to display all the names along with the data types of the columns of the table.
You can also execute a sample query select * from baseballStats limit 10
by typing it in the text box and clicking the Run Query button.
Cmd + Enter
can also be used to run the query when focused on the console.
Here are some sample queries you can try:
Pinot supports a subset of standard SQL. For more information, see Pinot Query Language.
The Pinot Admin UI contains all the APIs that you will need to operate and manage your cluster. It provides a set of APIs for Pinot cluster management including health check, instances management, schema and table management, data segments management.
Let's check out the tables in this cluster by going to Table -> List all tables in cluster, click Try it out, and then click Execute. We can see thebaseballStats
table listed here. We can also see the exact cURL call made to the controller API.
You can look at the configuration of this table by going to Tables -> Get/Enable/Disable/Drop a table, click Try it out, type baseballStats
in the table name, and then click Execute.
Let's check out the schemas in the cluster by going to Schema -> List all schemas in the cluster, click Try it out, and then click Execute. We can see a schema called baseballStats
in this list.
Take a look at the schema by going to Schema -> Get a schema, click Try it out, type baseballStats
in the schema name, and then click Execute.
Finally, let's check out the data segments in the cluster by going to Segment -> List all segments, click Try it out, type in baseballStats
in the table name, and then click Execute. There's 1 segment for this table, called baseballStats_OFFLINE_0
.
To learn how to upload your own data and schema, see Batch Ingestion or Stream ingestion.
Explore the minion component in Apache Pinot, empowering efficient data movement and segment generation within Pinot clusters.
A minion is a standby component that leverages the to offload computationally intensive tasks from other components.
It can be attached to an existing Pinot cluster and then execute tasks as provided by the controller. Custom tasks can be plugged via annotations into the cluster. Some typical minion tasks are:
Segment creation
Segment purge
Segment merge
Make sure you've . If you're using Docker, make sure to . To start a minion:
PinotTaskGenerator interface defines the APIs for the controller to generate tasks for minions to execute.
Factory for PinotTaskExecutor
which defines the APIs for Minion to execute the tasks.
Factory for MinionEventObserver
which defines the APIs for task event callbacks on minion.
The PushTask can fetch files from an input folder e.g. from a S3 bucket and converts them into segments. The PushTask converts one file into one segment and keeps file name in segment metadata to avoid duplicate ingestion. Below is an example task config to put in TableConfig to enable this task. The task is scheduled every 10min to keep ingesting remaining files, with 10 parallel task at max and 1 file per task.
NOTE: You may want to simply omit "tableMaxNumTasks" due to this caveat: the task generates one segment per file, and derives segment name based on the time column of the file. If two files happen to have same time range and are ingested by tasks from different schedules, there might be segment name conflict. To overcome this issue for now, you can omit “tableMaxNumTasks” and by default it’s Integer.MAX_VALUE, meaning to schedule as many tasks as possible to ingest all input files in a single batch. Within one batch, a sequence number suffix is used to ensure no segment name conflict. Because the sequence number suffix is scoped within one batch, tasks from different batches might encounter segment name conflict issue said above.
When performing ingestion at scale remember that Pinot will list all of the files contained in the `inputDirURI` every time a `SegmentGenerationAndPushTask` job gets scheduled. This could become a bottleneck when fetching files from a cloud bucket like GCS. To prevent this make `inputDirURI` point to the least number of files possible.
Tasks are enabled on a per-table basis. To enable a certain task type (e.g. myTask
) on a table, update the table config to include the task type:
Under each enable task type, custom properties can be configured for the task type.
There are also two task configs to be set as part of cluster configs like below. One controls task's overall timeout (1hr by default) and one for how many tasks to run on a single minion worker (1 by default).
There are 2 ways to enable task scheduling:
Tasks can be scheduled periodically for all task types on all enabled tables. Enable auto task scheduling by configuring the schedule frequency in the controller config with the key controller.task.frequencyPeriod
. This takes period strings as values, e.g. 2h, 30m, 1d.
Tasks can also be scheduled based on cron expressions. The cron expression is set in the schedule
config for each task type separately. This config in the controller config, controller.task.scheduler.enabled
should be set to true
to enable cron scheduling.
Tasks can be manually scheduled using the following controller rest APIs:
To plug in a custom task, implement PinotTaskGenerator
, PinotTaskExecutorFactory
and MinionEventObserverFactory
(optional) for the task type (all of them should return the same string for getTaskType()
), and annotate them with the following annotations:
After annotating the classes, put them under the package of name org.apache.pinot.*.plugin.minion.tasks.*
, then they will be auto-registered by the controller and minion.
In the Pinot UI, there is Minion Task Manager tab under Cluster Manager page. From that minion task manager tab, one can find a lot of task related info for troubleshooting. Those info are mainly collected from the Pinot controller that schedules tasks or Helix that tracks task runtime status. There are also buttons to schedule tasks in an ad hoc way. Below are some brief introductions to some pages under the minion task manager tab.
This one shows which types of Minion Task have been used. Essentially which task types have created their task queues in Helix.
Clicking into a task type, one can see the tables using that task. And a few buttons to stop the task queue, cleaning up ended tasks etc.
Then clicking into any table in this list, one can see how the task is configured for that table. And the task metadata if there is one in ZK. For example, MergeRollupTask tracks a watermark in ZK. If the task is cron scheduled, the current and next schedules are also shown in this page like below.
At the bottom of this page is a list of tasks generated for this table for this specific task type. Like here, one MergeRollup task has been generated and completed.
Clicking into a task from that list, we can see start/end time for it, and the sub tasks generated for that task (as context, one minion task can have multiple sub-tasks to process data in parallel). In this example, it happened to have one sub-task here, and it shows when it starts and stops and which minion worker it's running.
Clicking into this subtask, one can see more details about it like the input task configs and error info if the task failed.
There is a controller job that runs every 5 minutes by default and emits metrics about Minion tasks scheduled in Pinot. The following metrics are emitted for each task type:
NumMinionTasksInProgress: Number of running tasks
NumMinionSubtasksRunning: Number of running sub-tasks
NumMinionSubtasksWaiting: Number of waiting sub-tasks (unassigned to a minion as yet)
NumMinionSubtasksError: Number of error sub-tasks (completed with an error/exception)
PercentMinionSubtasksInQueue: Percent of sub-tasks in waiting or running states
PercentMinionSubtasksInError: Percent of sub-tasks in error
The controller also emits metrics about how tasks are cron scheduled:
cronSchedulerJobScheduled: Number of current cron schedules registered to be triggered regularly according their cron expressions. It's a Gauge.
cronSchedulerJobTrigger: Number of cron scheduled triggered, as a Meter.
cronSchedulerJobSkipped: Number of late cron scheduled skipped, as a Meter.
cronSchedulerJobExecutionTimeMs: Time used to complete task generation, as a Timer.
For each task, the minion will emit these metrics:
TASK_QUEUEING: Task queueing time (task_dequeue_time - task_inqueue_time), assuming the time drift between helix controller and pinot minion is minor, otherwise the value may be negative
TASK_EXECUTION: Task execution time, which is the time spent on executing the task
NUMBER_OF_TASKS: number of tasks in progress on that minion. Whenever a Minion starts a task, increase the Gauge by 1, whenever a Minion completes (either succeeded or failed) a task, decrease it by 1
NUMBER_TASKS_EXECUTED: Number of tasks executed, as a Meter.
NUMBER_TASKS_COMPLETED: Number of tasks completed, as a Meter.
NUMBER_TASKS_CANCELLED: Number of tasks cancelled, as a Meter.
NUMBER_TASKS_FAILED: Number of tasks failed, as a Meter. Different from fatal failure, the task encountered an error which can not be recovered from this run, but it may still succeed by retrying the task.
NUMBER_TASKS_FATAL_FAILED: Number of tasks fatal failed, as a Meter. Different from failure, the task encountered an error, which will not be recoverable even with retrying the task.
This quickstart guide helps you get started running Pinot on Microsoft Azure.
In this quickstart guide, you will set up a Kubernetes Cluster on
Follow this link () to install kubectl.
For Mac users
Check kubectl version after installation.
Quickstart scripts are tested under kubectl client version v1.16.3 and server version v1.13.12
To install Helm, see .
For Mac users
Check helm version after installation.
This quickstart provides helm supports for helm v3.0.0 and v2.12.1. Pick the script based on your helm version.
For Mac users
This script will open your default browser to sign-in to your Azure Account.
Use the following script create a resource group in location eastus.
This script will create a 3 node cluster named pinot-quickstart for demo purposes.
Modify the parameters in the following example command with your resource group and cluster details:
Once the command succeeds, the cluster is ready to be used.
Run the following command to get the credential for the cluster pinot-quickstart that you just created:
To verify the connection, run the following:
This page links to multiple quick start guides for deploying Pinot to different public cloud providers.
These quickstart guides show you how to run an Apache Pinot cluster using Kubernetes on different public cloud providers.
This guide will show you to run a Pinot cluster using Docker.
Get started setting up a Pinot cluster with Docker using the guide below.
Prerequisites:
Install
Configure Docker memory with the following minimum resources:
CPUs: 8
Memory: 16.00 GB
Swap: 4 GB
Disk Image size: 60 GB
The latest Pinot Docker image is published at apachepinot/pinot:latest
. View a list of .
Pull the latest Docker image onto your machine by running the following command:
To pull a specific version, modify the command like below:
Once you've downloaded the Pinot Docker image, it's time to set up a cluster. There are two ways to do this.
Pinot comes with quick start commands that launch instances of Pinot components in the same process and import pre-built datasets.
For example, the following quick start command launches Pinot with a baseball dataset pre-loaded:
The quick start scripts launch Pinot with minimal resources. If you want to play with bigger datasets (more than a few MB), you can launch each of the Pinot components individually.
Note that these are sample configurations to be used as references. You will likely want to customize them to meet your needs for production use.
Create an isolated bridge network in docker
Start Pinot Controller in daemon and connect to Zookeeper.
The command below expects a 4GB memory container. Tune-Xms
and-Xmx
if your machine doesn't have enough resources.
Start Pinot Broker in daemon and connect to Zookeeper.
The command below expects a 4GB memory container. Tune-Xms
and-Xmx
if your machine doesn't have enough resources.
Start Pinot Server in daemon and connect to Zookeeper.
The command below expects a 16GB memory container. Tune-Xms
and-Xmx
if your machine doesn't have enough resources.
Optionally, you can also start Kafka for setting up real-time streams. This brings up the Kafka broker on port 9092.
Now all Pinot related components are started as an empty cluster.
Run the below command to check container status:
Sample Console Output
Create a file called docker-compose.yml that contains the following:
Run the following command to launch all the components:
Run the below command to check the container status:
Sample Console Output
Uncover the efficient data processing architecture of Apache Pinot, empowering impactful analytics. Explore its powerful components and design principles for actionable insights.
This page introduces you to the guiding principles behind the design of Apache Pinot. Here you will learn the distributed systems architecture that allows Pinot to scale the performance of queries linearly based on the number of nodes in a cluster. You'll also learn about the two different types of tables used to ingest and query data in offline (batch) or real-time (stream) mode.
We recommend that you read to better understand the terms used in this guide.
Engineers at LinkedIn and Uber designed Pinot to scale query performance based on the number of nodes in a cluster. As you add more nodes, query performance improves based on the expected query volume per second quota. To achieve horizontal scalability to an unbounded number of nodes and data storage, without performance degradation, we used the following principles:
Highly available: Pinot is built to serve low latency analytical queries for customer-facing applications. By design, there is no single point of failure in Pinot. The system continues to serve queries when a node goes down.
Horizontally scalable: Pinot scales by adding new nodes as a workload changes.
Latency vs. storage: Pinot is built to provide low latency even at high-throughput. Features such as segment assignment strategy, routing strategy, star-tree indexing were developed to achieve this.
Immutable data: Pinot assumes that all data stored is immutable. For GDPR compliance, we provide an add-on solution for purging data while maintaining performance guarantees.
Dynamic configuration changes: Operations such as adding new tables, expanding a cluster, ingesting data, modifying indexing config, and re-balancing must not impact query availability or performance.
As described in the , Pinot has multiple distributed system components:, , , and .
Pinot uses for cluster management. Helix is embedded as an agent within the different components and uses for coordination and maintaining the overall cluster state and health.
Helix divides nodes into three logical components based on their responsibilities:
Participant: These are the nodes in the cluster that actually host the distributed storage resources.
Spectator: These nodes observe the current state of each participant and route requests accordingly. Routers, for example, need to know the instance on which a partition is hosted and its state to route the request to the appropriate endpoint. Routing is continually updated to optimize cluster performance as storage primitives are added and changed.
Helix uses Zookeeper to maintain cluster state. Each component in a Pinot cluster takes a Zookeeper address as a startup parameter. The various components distributed in a Pinot cluster watch Zookeeper notifications and issue updates via its embedded Helix-defined agent.
Helix agents use Zookeeper to store and update configurations, as well as for distributed coordination. Zookeeper stores the following information about the cluster:
Knowing the ZNode
layout structure in Zookeeper for Helix agents in a cluster is useful for operations and/or troubleshooting cluster state and health.
Starting a controller requires two parameters: Zookeeper address and cluster name. The controller will automatically create a cluster via Helix if it does not yet exist.
To configure fault tolerance, start multiple controllers (typically three) and one of them will act as a leader. If the leader crashes or dies, another leader is automatically elected. Leader election is achieved using Apache Helix. Having at-least one controller is required to perform any DDL equivalent operation on the cluster, such as adding a table or a segment.
The controller does not interfere with query execution. Query execution is not impacted even when all controllers nodes are offline. If all controller nodes are offline, the state of the cluster will stay as it was when the last leader went down. When a new leader comes online, a cluster resumes rebalancing activity and can accept new tables or segments.
The controller provides a REST interface to perform CRUD operations on all logical storage resources (servers, brokers, tables, and segments).
Brokers need three key things to start:
Cluster name
Zookeeper address
Broker instance name
Initially, the broker registers as a Helix Participant and waits for notifications from other Helix agents. The broker handles these notifications for table creation, a new segment being loaded, or a server starting up or going down, in addition to any configuration changes.
Service Discovery/Routing Table
Regardless of the kind of notification, the key responsibility of a broker is to maintain the query routing table. The query routing table is simply a mapping between segments and the servers that a segment resides on. Typically, a segment resides on more than one server. The broker computes multiple routing tables depending on the configured routing strategy for a table. The default strategy is to balance the query load across all available servers.
For special or generic cases that serve very high throughput queries, there are advanced routing strategies available such as ReplicaAware routing, partition-based routing, and minimal server selection routing.
Query processing
For every query, a cluster's broker performs the following:
Scatter-gather: sends the requests to each server and gathers the responses.
Merge: merges the query results returned from each server.
Sends the query result to the client.
Fault tolerance
Broker instances scale horizontally without an upper bound. In a majority of cases, only three brokers are required. If most query results that are returned to a client are less than 1 MB in size per query, you can run a broker and servers inside the same instance container. This lowers the overall footprint of a cluster deployment for use cases that don't need to guarantee a strict SLA on query performance in production.
In theory, a server can host both real-time segments and offline segments. However, in practice, we use different types of machine SKUs for real-time servers and offline servers. The advantage of separating real-time servers and offline servers is to allow each to scale independently.
Offline servers
Real-time servers
The two types of tables also scale differently.
Real-time tables have a smaller retention period and scale query performance based on the ingestion rate.
Offline tables have larger retention and scale performance based on the size of stored data.
You can configure real-time an offline tables differently depending on usage requirements. For example, you might choose to enable star-tree indexing for an offline table, while the real-time table with the same schema may not need it.
The controller processes the notification, resulting in the Helix agent on the controller updating the ideal state configuration in Zookeeper.
In response to the notification from the controller, the offline server downloads the newly created segments directly from the cluster's segment store.
The cluster's broker, which watches for state changes in Helix, detects the new segments and adds them to the list of segments to query (segment-to-server routing table).
At table creation, a controller creates a new entry in Zookeeper for the consuming segment. Helix notices the new segment and notifies the real-time server, which starts consuming data from the streaming source. The broker, which watches for changes, detects the new segments and adds them to the list of segments to query (segment-to-server routing table).
Whenever the segment is complete (full), the real-time server notifies the controller, which checks with all replicas and picks a winner to commit the segment to. The winner commits the segment and uploads it to the cluster's segment store, updating the state of the segment from "consuming" to "online". The controller then prepares a new segment in a "consuming" state.
Queries are received by brokers, which check the request against the segment-to-server routing table and scatter the request between real-time and offline servers.
The two tables process the request by filtering and aggregating the queried data, which is then returned back to the broker. Finally, the broker gathers together all of the pieces of the query response and responds back to the client with the result.
This section describes quick start commands that launch all Pinot components in a single process.
Pinot ships with QuickStart
commands that launch Pinot components in a single process and import pre-built datasets. These quick start examples are a good place if you're just getting started with Pinot. The examples begin with the example, after the following notes:
Prerequisites
You must have either or . The examples are available in each option and work the same. The decision of which to choose depends on your installation preference and how you generally like to work. If you don't know which to choose, using Docker will make your cleanup easier after you are done with the examples.
Pinot versions in examples
The Docker-based examples on this page use pinot:latest
, which instructs Docker to pull and use the most recent release of Apache Pinot. If you prefer to use a specific release instead, you can designate it by replacing latest
with the release number, like this: pinot:0.12.1
.
The local install-based examples that are run using the launcher scripts will use the Apache Pinot version you installed.
Running examples with Docker on a Mac with an M1 or M2 CPU
Add the -arm64
suffix to the run
commands, like this:
Stopping a running example
To stop a running example, enter Ctrl+C
in the same terminal where you ran the docker run
command to start the example.
macOS Monterey Users
By default the Airplay receiver server runs on port 7000, which is also the port used by the Pinot Server in the Quick Start. You may see the following error when running these examples:
If you disable the Airplay receiver server and try again, you shouldn't see this error message anymore.
This example demonstrates how to do batch processing with Pinot. The command:
Starts Apache Zookeeper, Pinot Controller, Pinot Broker, and Pinot Server.
Creates the baseballStats
table
Launches a standalone data ingestion job that builds one segment for a given CSV data file for the baseballStats
table and pushes the segment to the Pinot Controller.
Issues sample queries to Pinot
This example demonstrates how to import and query JSON documents in Pinot. The command:
Starts Apache Zookeeper, Pinot Controller, Pinot Broker, and Pinot Server.
Creates the githubEvents
table
Launches a standalone data ingestion job that builds one segment for a given JSON data file for the githubEvents
table and pushes the segment to the Pinot Controller.
Issues sample queries to Pinot
This example demonstrates how to do batch processing in Pinot where the the data items have complex fields that need to be unnested. The command:
Starts Apache Zookeeper, Pinot Controller, Pinot Broker, and Pinot Server.
Creates the githubEvents
table
Launches a standalone data ingestion job that builds one segment for a given JSON data file for the githubEvents
table and pushes the segment to the Pinot Controller.
Issues sample queries to Pinot
This example demonstrates how to do stream processing with Pinot. The command:
Starts Apache Kafka, Apache Zookeeper, Pinot Controller, Pinot Broker, and Pinot Server.
Creates meetupRsvp
table
Launches a meetup
stream
Publishes data to a Kafka topic meetupRSVPEvents
that is subscribed to by Pinot.
Issues sample queries to Pinot
This example demonstrates how to do stream processing with JSON documents in Pinot. The command:
Starts Apache Kafka, Apache Zookeeper, Pinot Controller, Pinot Broker, and Pinot Server.
Creates meetupRsvp
table
Launches a meetup
stream
Publishes data to a Kafka topic meetupRSVPEvents
that is subscribed to by Pinot
Issues sample queries to Pinot
This example demonstrates how to do stream processing in Pinot with RealtimeToOfflineSegmentsTask and MergeRollupTask minion tasks continuously optimizing segments as data gets ingested. The command:
Starts Apache Kafka, Apache Zookeeper, Pinot Controller, Pinot Broker, Pinot Minion, and Pinot Server.
Creates githubEvents
table
Launches a GitHub events stream
Publishes data to a Kafka topic githubEvents
that is subscribed to by Pinot.
Issues sample queries to Pinot
This example demonstrates how to do stream processing in Pinot where the stream contains items that have complex fields that need to be unnested. The command:
Starts Apache Kafka, Apache Zookeeper, Pinot Controller, Pinot Broker, Pinot Minion, and Pinot Server.
Creates meetupRsvp
table
Launches a meetup
stream
Publishes data to a Kafka topic meetupRSVPEvents
that is subscribed to by Pinot.
Issues sample queries to Pinot
Starts Apache Kafka, Apache Zookeeper, Pinot Controller, Pinot Broker, and Pinot Server.
Creates meetupRsvp
table
Launches a meetup
stream
Publishes data to a Kafka topic meetupRSVPEvents
that is subscribed to by Pinot
Issues sample queries to Pinot
Starts Apache Kafka, Apache Zookeeper, Pinot Controller, Pinot Broker, and Pinot Server.
Creates meetupRsvp
table
Launches a meetup
stream
Publishes data to a Kafka topic meetupRSVPEvents
that is subscribed to by Pinot
Issues sample queries to Pinot
This example demonstrates how to do hybrid stream and batch processing with Pinot. The command:
Starts Apache Kafka, Apache Zookeeper, Pinot Controller, Pinot Broker, and Pinot Server.
Creates airlineStats
table
Launches a standalone data ingestion job that builds segments under a given directory of Avro files for the airlineStats
table and pushes the segments to the Pinot Controller.
Launches a stream of flights stats
Publishes data to a Kafka topic airlineStatsEvents
that is subscribed to by Pinot.
Issues sample queries to Pinot
Starts Apache Zookeeper, Pinot Controller, Pinot Broker, and Pinot Server in the same container.
Creates the baseballStats
table
Launches a data ingestion job that builds one segment for a given CSV data file for the baseballStats
table and pushes the segment to the Pinot Controller.
Creates the dimBaseballTeams
table
Launches a data ingestion job that builds one segment for a given CSV data file for the dimBaseballStats
table and pushes the segment to the Pinot Controller.
Issues sample queries to Pinot
This quick start guide will help you bootstrap a Pinot standalone instance on your local machine.
In this guide, you'll learn how to download and install Apache Pinot as a standalone instance.
First, download the Pinot distribution for this tutorial. You can either download a packaged release or build a distribution from the source code.
Install JDK11 or higher (JDK16 is not yet supported).
For JDK 8 support, use Pinot 0.7.1 or compile from the source code.
Note that some installations of the JDK do not contain the JNI bindings necessary to run all tests. If you see an error like java.lang.UnsatisfiedLinkError
while running tests, you might need to change your JDK.
If using Homebrew, install AdoptOpenJDK 11 using brew install --cask adoptopenjdk11
.
Support for M1 and M2 Mac systems
Currently, Apache Pinot doesn't provide official binaries for M1 or M2 Macs. For instructions, see .
Download the distribution or build from source by selecting one of the following tabs:
Download the latest binary release from , or use this command:
Extract the TAR file:
Navigate to the directory containing the launcher scripts:
You can also find older versions of Apache Pinot at . For example, to download Pinot 0.10.0, run the following command:
Follow these steps to checkout code from and build Pinot locally
Prerequisites
Install 3.6 or higher
For M1 and M2 Macs, first follow first.
Check out Pinot:
Build Pinot:
If you're building with JDK 8, add Maven option -Djdk.version=8.
Navigate to the directory containing the setup scripts. Note that Pinot scripts are located under pinot-distribution/target
, not the target
directory under root
.
Currently, Apache Pinot doesn't provide official binaries for M1 or M2 Mac systems. Follow the instructions below to run on an M1 or M2 Mac:
Add the following to your ~/.m2/settings.xml
:
Install Rosetta:
Now that we've downloaded Pinot, it's time to set up a cluster. There are two ways to do this: through quick start or through setting up a cluster manually.
Pinot comes with quick start commands that launch instances of Pinot components in the same process and import pre-built datasets.
For example, the following quick start command launches Pinot with a baseball dataset pre-loaded:
If you want to play with bigger datasets (more than a few megabytes), you can launch each component individually.
The video below is a step-by-step walk through for launching the individual components of Pinot and scaling them to multiple instances.
The examples below assume that you are using Java 8.
If you are using Java 11+ users, remove the GC settings insideJAVA_OPTS
. So, for example, instead of this:
Use the following:
Set break points and inspect variables by starting a Pinot component with debug mode in IntelliJ.
The following example demonstrates server debugging:
Pinot quick start in Kubernetes
Get started running Pinot in Kubernetes.
Note: The examples in this guide are sample configurations to be used as reference. For production setup, you may want to customize it to your needs.
This guide assumes that you already have a running Kubernetes cluster.
If you haven't yet set up a Kubernetes cluster, see the links below for instructions:
Make sure to run with enough resources: minikube start --vm=true --cpus=4 --memory=8g --disk-size=50g
Make sure that you've downloaded Apache Pinot. The scripts for the setup in this guide can be found in our.
Note: Specify StorageClass based on your cloud vendor. Don't mount a blob store (such as AzureFile, GoogleCloudStorage, or S3) as the data serving file system. Use only Amazon EBS/GCP Persistent Disk/Azure Disk-style disks.
For AWS: "gp2"
For GCP: "pd-ssd" or "standard"
For Azure: "AzureDisk"
For Docker-Desktop: "hostpath"
1.1.1 Update Helm dependency
1.1.2 Start Pinot with Helm
For Helm v2.12.1:
If your Kubernetes cluster is recently provisioned, ensure Helm is initialized by running:
Then deploy a new HA Pinot cluster using the following command:
For Helm v3.0.0:
1.1.3 Troubleshooting (For helm v2.12.1)
If you see the error below:
Run the following:
If you encounter a permission issue, like the following:
Error: release pinot failed: namespaces "pinot-quickstart" is forbidden: User "system:serviceaccount:kube-system:default" cannot get resource "namespaces" in API group "" in the namespace "pinot-quickstart"
Run the command below:
Ensure the Kafka deployment is ready before executing the scripts in the following steps. Run the following command:
Below is an example output showing the deployment is ready:
Run the scripts below to create two Kafka topics for data ingestion:
The script below does the following:
Ingests 19492 JSON messages to Kafka topic flights-realtime
at a speed of 1 msg/sec
Ingests 19492 Avro messages to Kafka topic flights-realtime-avro
at a speed of 1 msg/sec
Uploads Pinot schema airlineStats
Creates Pinot table airlineStats
to ingest data from JSON encoded Kafka topic flights-realtime
Creates Pinot table airlineStatsAvro
to ingest data from Avro encoded Kafka topic flights-realtime-avro
The script below, located at ./pinot/helm/pinot
, performs local port forwarding, and opens the Pinot query console in your default web browser.
Install the SuperSet Helm repository:
Get the Helm values configuration file:
For Superset to install Pinot dependencies, edit /tmp/superset-values.yaml
file to add apinotdb
pip dependency into bootstrapScript
field.
You can also build your own image with this dependency or use the image apachepinot/pinot-superset:latest
instead.
Replace the default admin credentials inside the init
section with a meaningful user profile and stronger password.
Install Superset using Helm:
Ensure your cluster is up by running:
Run the below command to port forward Superset to your localhost:18088
.
Navigate to Superset in your browser with the admin credentials you set in the previous section.
Create a new database connection with the following URI: pinot+http://pinot-broker.pinot-quickstart:8099/query?controller=http://pinot-controller.pinot-quickstart:9000/
Once the database is added, you can add more data sets and explore the dashboard options.
Deploy Trino with the Pinot plugin installed:
See the charts in the Trino Helm chart repository:
In order to connect Trino to Pinot, you'll need to add the Pinot catalog, which requires extra configurations. Run the below command to get all the configurable values.
To add the Pinot catalog, edit the additionalCatalogs
section by adding:
Pinot is deployed at namespace pinot-quickstart
, so the controller serviceURL is pinot-controller.pinot-quickstart:9000
After modifying the /tmp/trino-values.yaml
file, deploy Trino with:
Once you've deployed Trino, check the deployment status:
Once Trino is deployed, run the below command to get a runnable Trino CLI.
Download the Trino CLI:
Port forward Trino service to your local if it's not already exposed:
Use the Trino console client to connect to the Trino service:
Query Pinot data using the Trino CLI, like in the sample queries below.
First, deploy Presto with default configurations:
To customize your deployment, run the below command to get all the configurable values.
After modifying the /tmp/presto-values.yaml
file, deploy Presto:
Once you've deployed the Presto instance, check the deployment status:
Download the Presto CLI:
Port forward presto-coordinator
port 8080 to localhost
port 18080:
Start the Presto CLI with the Pinot catalog:
Query Pinot data with the Presto CLI, like in the sample queries below.
To delete your Pinot cluster in Kubernetes, run the following command:
This column represents time columns in the data. There can be multiple time columns in a table, but only one of them can be treated as primary. The primary time column is the one that is present in the .
The primary time column is used by Pinot to maintain the time boundary between offline and real-time data in a hybrid table and for retention management. A primary time column is mandatory if the table's push type is APPEND
and optional if the push type is REFRESH
.
See for details.
See for details.
As shown below, the RealtimeToOfflineSegmentsTask will be scheduled at the first second of every minute (following the syntax ).
See where the TestTask
is plugged-in.
Follow this link () to install Azure CLI.
Follow this to deploy your Pinot demo.
For a list of all available quick start commands, see .
Start Zookeeper in daemon mode. This is a single node zookeeper setup. Zookeeper is the central metadata store for Pinot and should be set up with replication for production use. For more information, see .
Once your cluster is up and running, see to learn how to run queries against the data.
If you have or installed, you can also try running the .
Helix, a generic cluster management framework to manage partitions and replicas in a distributed system, manages all Pinot and . It's helpful to think of Helix as an event-driven discovery service with push and pull notifications that drives the state of a cluster to an ideal configuration. A finite-state machine maintains a contract of stateful operations that drives the health of the cluster towards its optimal configuration. Helix optimizes query load by updating routing configurations between nodes based on where data is stored in the cluster.
Controller: The observes and manages the state of participant nodes. The controller is responsible for coordinating all state transitions in the cluster and ensures that state constraints are satisfied while maintaining cluster stability.
Pinot's acts as the driver of the cluster's overall state and health. Because of its role as a Helix participant and spectator, which drives the state of other components, it's the first component that is typically started after Zookeeper.
See for more information on the web-based admin tool.
The responsibility is to route a given query to an appropriate instance. A broker collects and merges the responses from all servers into a final result, then sends it back to the requesting client. The broker provides HTTP endpoints that accept SQL queries and returns the response in JSON format.
Fetches the routes that are computed for a query based on the routing strategy defined in a configuration.
Computes the list of segments to query from on each . To learn more about this, check out .
host and do most of the heavy lifting during query processing. Though the architecture shows that there are two kinds of servers, real-time and offline, a server doesn't really "know" if it's going to be a real-time server or an offline server. The server's responsibility depends on the assignment strategy.
Offline servers typically host segments that are immutable. In this case, segments are created outside of a cluster and uploaded via a shell-based request. Based on the replication factor and the segment assignment strategy, the controller picks one or more servers to host the segment. Helix notifies the servers about the new segments. Servers fetch the segments from deep store and load them. At this point, the cluster's detects that new segments are available and starts including them in query responses.
Unlike offline servers, real-time nodes ingest data from streaming sources, such as Kafka, and generate the indexed segments in-memory while flushing segments to disk periodically. In-memory segments are also known as consuming segments. Consuming segments get flushed periodically based on completion threshold (calculated with number of rows, time or segment size). At this point, they become completed segments. Completed segments are similar to offline servers' segments. Queries go over the in-memory (consuming) segments and the completed segments.
is an optional component used for purging data from a Pinot cluster. For example, you might need to purge data for GDPR compliance in the UK.
Within Pinot, a logical is modeled as one of two types of physical tables: offline or real-time. Each table type follows a different state model.
Real-time and offline tables provide different configuration options for indexing. For real-time tables, you can also configure the connector properties for the stream data source, like Kafka. The two table types also allow users to use different containers for real-time and offline nodes. For instance, offline servers might use virtual machines with larger storage capacity, whereas real-time servers might need higher system memory or more CPU cores.
When ingesting data from the same source, you can have two tables that ingest the same data that are configured differently for real-time and offline queries. Even though the two tables have the same data, performance will scale differently for queries based on your requirements. In this scenario, real-time and offline tables must share the same .
In batch mode, Pinot ingests data via an , which works in the following way:
An ingestion job transforms a raw data source (such as a CSV file) into .
Once segments are generated for the imported data, the ingestion job stores them into the cluster's segment store (also known as deep store) and notifies the .
Helix then notifies the offline that there are new segments available.
This example demonstrates how to do with Pinot. The command:
This example demonstrates how to do with JSON documents in Pinot. The command:
This example demonstrates how to do joins in Pinot using the . The command:
For a list of all the available quick start commands, see the .
You can find the commands that are shown in this video in the .
You can use to browse the Zookeeper instance.
Once your cluster is up and running, you can head over to to learn how to run queries against the data.
First, startzookeeper
, controller
, and broker
using the .
Then, use the following configuration under $PROJECT_DIR$\.run
) to start the server, replacing the metrics-core
version and cluster name as needed.
This is an example of how to use it.
The Pinot repository has pre-packaged Helm charts for Pinot and Presto. The Helm repository index file is .
Once Presto is deployed, you can run the below command from , or follow the steps below.