To set up a Pinot cluster, follow these steps:
Start Controller instances
Start Broker instances
Start Server instances
For more details on how to set up ingestion, refer to the ingestion documentation.
Several sections of the documentation can help you get started with operating a Pinot cluster. If you are new to Pinot, start with the basics.
To get started with operating a Pinot cluster, first look at the tutorials on how to run a basic Pinot cluster in various environments.
You can then proceed to the more advanced Pinot setup for a production environment.
Here are some related blog posts from the Apache Pinot community. You can find all of our blog posts on our .
This page introduces all the segment assignment strategies, when to use them, and how to configure them.
Segment assignment refers to the strategy of assigning each segment from a table to the servers hosting the table. Picking the best segment assignment strategy can help reduce the overhead of the query routing, thus providing better performance.
Balanced Segment Assignment is the default assignment strategy, where each segment is assigned to the server with the fewest segments already assigned. With this strategy, each server has a balanced query load, and each query is routed to all the servers. It requires minimal configuration and works well for small use cases.
Balanced Segment Assignment is ideal for small use cases with a small number of servers, but as the number of servers increases, routing each query to all the servers could harm the query performance due to the overhead of the increased fanout.
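The "fewest segments first" rule described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Pinot's implementation: the function name assign_balanced, the server and segment names, and the tie-breaking by server name are all assumptions made for the example.

```python
def assign_balanced(segments, servers, replication=1):
    """Assign each segment to the servers that currently hold the fewest segments."""
    load = {server: 0 for server in servers}
    assignment = {}
    for segment in segments:
        # Pick the `replication` least-loaded servers; ties are broken by name
        # (an arbitrary choice made for this sketch).
        chosen = sorted(servers, key=lambda s: (load[s], s))[:replication]
        for server in chosen:
            load[server] += 1
        assignment[segment] = chosen
    return assignment, load


# Hypothetical segment and server names.
segments = [f"seg_{i}" for i in range(6)]
servers = ["server_1", "server_2", "server_3"]
assignment, load = assign_balanced(segments, servers)
print(load)
```

Because every server ends up hosting part of the table, a query against the table has to fan out to all of them, which is the scaling limit described above.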
Replica-Group Segment Assignment was introduced to solve the horizontal scalability problem of large use cases, and it makes Pinot linearly scalable. This strategy breaks the servers into multiple replica-groups, where each replica-group contains a full copy of all the segments.
When executing queries, each query is routed only to the servers within the same replica-group. To scale up the cluster, more replica-groups can be added without affecting the fanout of the query, so query performance is not impacted while the overall throughput increases linearly.
To further increase query performance, we can reduce the number of segments processed for each query by partitioning the data and using the Partitioned Replica-Group Segment Assignment.
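A minimal sketch of the replica-group idea, under simplifying assumptions: servers divide evenly into groups, segments are spread round-robin inside each group, and a query picks its group by a simple modulo. The function names and the selection rule are hypothetical and are not taken from Pinot.

```python
def build_replica_groups(servers, num_replica_groups):
    """Split the servers into equally sized replica-groups."""
    if len(servers) % num_replica_groups != 0:
        raise ValueError("servers must divide evenly into replica-groups")
    size = len(servers) // num_replica_groups
    return [servers[i * size:(i + 1) * size] for i in range(num_replica_groups)]


def assign_replica_group(segments, replica_groups):
    """Place one copy of every segment in each replica-group (round-robin inside a group)."""
    return {
        segment: [group[index % len(group)] for group in replica_groups]
        for index, segment in enumerate(segments)
    }


def servers_for_query(replica_groups, query_id):
    """Route a query to a single replica-group; the modulo choice is an assumption."""
    return replica_groups[query_id % len(replica_groups)]


groups = build_replica_groups(["s1", "s2", "s3", "s4"], num_replica_groups=2)
rg_assignment = assign_replica_group(["seg_0", "seg_1", "seg_2", "seg_3"], groups)
print(servers_for_query(groups, query_id=0))
```

Adding a third replica-group in this sketch would add servers and throughput, but each query would still touch only two servers.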
Partitioned Replica-Group Segment Assignment extends the Replica-Group Segment Assignment by assigning the segments from the same partition to the same set of servers. To solve a query which hits only one partition (e.g. SELECT * FROM myTable WHERE memberId = 123, where myTable is partitioned on the memberId column), the query only needs to be routed to the servers for the target partition, which can significantly reduce the number of segments to be processed. This strategy is especially useful to achieve high throughput and low latency for use cases that filter on an id field.
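The routing benefit for the memberId = 123 query above can be illustrated with a small sketch. It assumes modulo partitioning and one server per partition inside a replica-group, both chosen purely for illustration; the function names and server names are hypothetical.

```python
def partition_of(member_id, num_partitions):
    """Map a memberId to a partition; modulo is an assumed partition function."""
    return member_id % num_partitions


def servers_for_partition(replica_group, partition_id):
    """Servers of one replica-group hosting the partition (one server per partition here)."""
    return [replica_group[partition_id % len(replica_group)]]


# Hypothetical replica-group of four servers hosting a table with four partitions.
replica_group = ["server_1", "server_2", "server_3", "server_4"]
partition_id = partition_of(123, num_partitions=4)
targets = servers_for_partition(replica_group, partition_id)
print(partition_id, targets)
```

In this sketch the query is sent to one server instead of four, and only the segments of that partition are processed.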
Segment assignment is configured along with the instance assignment; see Instance Assignment for details.
For more details on how to set up a table, refer to
Rebalance operation is used to recompute assignment of brokers or servers in the cluster. This is not a single command, but more of a series of steps that need to be taken.
For brokers, the rebalance operation recalculates the broker assignment to the tables. This is typically done after capacity changes, such as scaling a cluster down or up, or replacing nodes of a cluster.
Every broker added to the Pinot cluster has tags associated with it. A group of brokers with the same tag forms a Broker Tenant. By default, a broker in the cluster gets added to the DefaultTenant, i.e. it gets tagged as DefaultTenant_BROKER. The tag is stored in the broker's znode and can be viewed in ZooInspector.
Using the tenant defined above, a mapping from table name to brokers is created and stored in IDEALSTATES/brokerResource. This mapping can be used by external services that need to pick a broker for querying.
If you want to scale up brokers, add new brokers to the cluster, and then tag them based on the tenant used by the table. If you're using DefaultTenant, no tagging needs to be done, as every broker node by default joins with the tag DefaultTenant_BROKER.
If you want to scale down brokers, untag the brokers you wish to remove.
To update the tags on the broker, use the following API:
PUT /instances/{instanceName}/updateTags?tags=<comma separated tags>
Example for tagging the broker as per your custom tenant:
PUT /instances/Broker_10.20.151.8_8000/updateTags?tags=customTenant_BROKER
Example for untagging a broker:
PUT /instances/Broker_10.20.151.8_8000/updateTags?tags=untagged_BROKER
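The two calls above could be issued from Python roughly as follows. This sketch only builds the requests and does not send them. The controller address http://localhost:9000 and the helper name update_tags_request are assumptions; the path and the tag values come from the examples above.

```python
from urllib.parse import quote
from urllib.request import Request

# Assumed controller address; replace with your own controller host and port.
CONTROLLER_URL = "http://localhost:9000"


def update_tags_request(instance_name, tags):
    """Build the PUT request that replaces the tags of an instance."""
    tag_list = quote(",".join(tags), safe=",")
    url = f"{CONTROLLER_URL}/instances/{quote(instance_name)}/updateTags?tags={tag_list}"
    return Request(url, method="PUT")


tag_req = update_tags_request("Broker_10.20.151.8_8000", ["customTenant_BROKER"])
untag_req = update_tags_request("Broker_10.20.151.8_8000", ["untagged_BROKER"])
print(tag_req.get_method(), tag_req.full_url)

# To actually send a request against a running controller:
# from urllib.request import urlopen
# with urlopen(tag_req) as response:
#     print(response.status)
```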
After making any capacity changes to the broker, the brokerResource needs to be rebuilt. This can be done with the below API:
POST /tables/{tableNameWithType}/rebuildBrokerResourceFromHelixTags
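As a sketch, the rebuild call can be prepared in Python like this. The controller address and the table name myTable_OFFLINE are assumptions used for illustration; the request is built but not sent.

```python
from urllib.parse import quote
from urllib.request import Request

# Assumed controller address; replace with your own controller host and port.
CONTROLLER_URL = "http://localhost:9000"


def rebuild_broker_resource_request(table_name_with_type):
    """Build the POST request that rebuilds brokerResource for one table."""
    url = (
        f"{CONTROLLER_URL}/tables/{quote(table_name_with_type)}"
        "/rebuildBrokerResourceFromHelixTags"
    )
    return Request(url, method="POST")


# "myTable_OFFLINE" is a hypothetical table name with its type suffix.
rebuild_req = rebuild_broker_resource_request("myTable_OFFLINE")
print(rebuild_req.get_method(), rebuild_req.full_url)
```

The call is per table, so it would be repeated for each table whose tenant was affected by the tag change.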
This step applies when you have untagged a broker and now want to remove the node from the cluster.
First, shut down the broker. Then, use the API below to remove the node from the cluster.
DELETE /instances/{instanceName}
If you encounter the below message when dropping, it means the broker process hasn't been shut down.
If you encounter the below message, it means the broker has not been removed from the ideal state. Check that the untagging and rebuild steps went through successfully.
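The drop step above can be sketched in Python as follows. The controller address is an assumption, and the instance name is reused from the earlier tagging example. The request is only built here; send it after the broker process has been shut down.

```python
from urllib.parse import quote
from urllib.request import Request

# Assumed controller address; replace with your own controller host and port.
CONTROLLER_URL = "http://localhost:9000"


def drop_instance_request(instance_name):
    """Build the DELETE request that removes an instance from the cluster."""
    return Request(f"{CONTROLLER_URL}/instances/{quote(instance_name)}", method="DELETE")


drop_req = drop_instance_request("Broker_10.20.151.8_8000")
print(drop_req.get_method(), drop_req.full_url)
```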
This page describes how to rebalance a table.
For servers, the rebalance operation balances the distribution of segments among the servers used by a Pinot table. This is typically done after capacity changes, or after config changes such as replication or segment assignment strategies.
For brokers, the rebalance operation recalculates the broker assignment to the tables. This is typically done after capacity changes (scaling brokers up or down).
A Pinot table config has a tenants section, which defines the tenants to be used by the table. More details about this in the section.
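For orientation, a tenants section might look like the fragment below, written as a Python dictionary and printed as JSON. The table name and tenant names are hypothetical, and the rest of the table config is omitted.

```python
import json

# Hypothetical table config fragment; only the tenants section is shown in full.
table_config = {
    "tableName": "myTable",
    "tableType": "OFFLINE",
    "tenants": {
        "broker": "customTenant",   # served by brokers tagged customTenant_BROKER
        "server": "DefaultTenant",  # hosted on servers of the default tenant
    },
}

tenants_json = json.dumps(table_config["tenants"], indent=2)
print(tenants_json)
```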