Learn to build and manage Apache Pinot clusters, uncovering key components for efficient data processing and optimized analysis.
A cluster is a set of nodes comprising servers, brokers, controllers, and minions.
Pinot cluster components
Pinot uses Apache Helix for cluster management. Helix is a cluster management framework that manages replicated, partitioned resources in a distributed system. Helix uses Apache ZooKeeper to store cluster state and metadata.
Helix divides nodes into logical components based on their responsibilities:
Participant: The nodes that host distributed, partitioned resources.
Spectator: The nodes that observe the current state of each Participant and use that information to access the resources. Spectators are notified of state changes in the cluster (the state of a Participant, or of a partition in a Participant).
Controller: The node that observes and controls the Participant nodes. It is responsible for coordinating all state transitions in the cluster and ensuring that state constraints are satisfied while maintaining cluster stability.
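The division of responsibilities above can be sketched as a toy in-memory model. This is only an illustration of the three roles, not the Helix API: the class names, the round-robin placement, and the node and partition names are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Participant:
    """Hosts partitions of a resource (toy stand-in, not a Helix class)."""
    name: str
    partitions: set = field(default_factory=set)


@dataclass
class Spectator:
    """Builds a routing view from the state changes it is notified about."""
    name: str
    routing: dict = field(default_factory=dict)

    def on_state_change(self, partition, owners):
        # Record which participants currently host this partition.
        self.routing[partition] = sorted(owners)


class Controller:
    """Places partitions on participants and notifies spectators."""

    def __init__(self, participants, spectators, replicas=2):
        if replicas > len(participants):
            raise ValueError("replicas cannot exceed the number of participants")
        self.participants = participants
        self.spectators = spectators
        self.replicas = replicas

    def assign(self, partitions):
        # Simple round-robin placement, chosen only to keep the sketch short.
        n = len(self.participants)
        for i, partition in enumerate(partitions):
            owners = []
            for r in range(self.replicas):
                participant = self.participants[(i + r) % n]
                participant.partitions.add(partition)
                owners.append(participant.name)
            for spectator in self.spectators:
                spectator.on_state_change(partition, owners)


# Hypothetical node and partition names.
participants = [Participant(f"participant-{i}") for i in range(3)]
spectator = Spectator("spectator-0")
controller = Controller(participants, [spectator], replicas=2)
controller.assign(["part_0", "part_1", "part_2"])
print(spectator.routing)
```

In this sketch the spectator never talks to the controller directly when it needs to reach a partition; it relies on the view it has accumulated from notifications, which mirrors how the Spectator role is described above.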
Another way to visualize the cluster is a logical view, where:
Typically, there is only one cluster per environment/data center. There is no need to create multiple Pinot clusters because Pinot supports tenants. At LinkedIn, the largest Pinot cluster consists of 1000+ nodes.
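As a rough sketch of how tenants let several workloads share one cluster, the snippet below builds the tenant-related part of two table configs. The `tenants` section with `broker` and `server` keys follows the Pinot table config format; the table names and tenant names are hypothetical, and the fragments are intentionally incomplete (a real table config needs further sections).

```python
import json


def table_config_fragment(table_name, broker_tenant, server_tenant):
    """Return only the part of a table config that assigns tenants."""
    return {
        "tableName": table_name,
        "tableType": "OFFLINE",
        "tenants": {"broker": broker_tenant, "server": server_tenant},
    }


# Two hypothetical tables in the same cluster, isolated by tenant.
ads = table_config_fragment("adsEvents", "adsTenant", "adsTenant")
metrics = table_config_fragment("siteMetrics", "metricsTenant", "metricsTenant")

print(json.dumps(ads, indent=2))
print(json.dumps(metrics, indent=2))
```

Both tables live in the same cluster; what differs is the set of broker and server nodes each table is assigned to.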
To set up a cluster, see one of the following guides: