Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Here you will find a collection of ready-made sample applications and examples for real-world data
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
The Pinot Controller is responsible for the following:
Maintaining global metadata (e.g. configs and schemas) of the system with the help of Zookeeper which is used as the persistent metadata store.
Hosting the Helix Controller and managing other Pinot components (brokers, servers, minions)
Maintaining the mapping of which servers are responsible for which segments. This mapping is used by the servers to download the portion of the segments that they are responsible for. This mapping is also used by the broker to decide which servers to route the queries to.
Serving admin endpoints for viewing, creating, updating, and deleting configs, which are used to manage and operate the cluster.
Serving endpoints for segment uploads, which are used in offline data pushes. They are responsible for initializing real-time consumption and coordination of persisting real-time segments into the segment store periodically.
Undertaking other management activities such as managing retention of segments, validations.
For redundancy, there can be multiple instances of Pinot controllers. Pinot expects that all controllers are configured with the same back-end storage system so that they have a common view of the segments (e.g. NFS). Pinot can use other storage systems such as HDFS or ADLS.
The Controller runs several periodic tasks in the background, to perform activities such as management and validation. Each periodic task has its own configs to define the run frequency and default frequency. Each task runs at its own schedule or can also be triggered manually if needed. The task runs on the lead controller for each table.
Here's a list of all the periodic tasks
This task rebuilds the BrokerResource if the instance set has changed.
controller.broker.resource.validation.frequencyPeriod
1h
controller.broker.resource.validation.initialDelayInSeconds
between 2m-5m
TBD
This task manages the segment ValidationMetrics (missingSegmentCount, offlineSegmentDelayHours, lastPushTimeDelayHours, TotalDocumentCount, NonConsumingPartitionCount, SegmentCount), to ensure that all offline segments are contiguous (no missing segments) and that the offline push delay isn't too high.
controller.offline.segment.interval.checker.frequencyPeriod
24h
controller.statuschecker.waitForPushTimePeriod
10m
controller.offlineSegmentIntervalChecker.initialDelayInSeconds
between 2m-5m
TBD
This task validates the ideal state and segment zk metadata of realtime tables,
fixing any partitions which have stopped consuming
starting consumption from new partitions
uploading segments to deep store if segment download url is missing
This task ensures that the consumption of the realtime tables gets fixed and keeps going when met with erroneous conditions.
This task does not fix consumption stalled due to
CONSUMING segment being deleted
Kafka OOR exceptions
controller.realtime.segment.validation.frequencyPeriod
1h
controller.realtime.segment.validation.initialDelayInSeconds
between 2m-5m
This task manages retention of segments for all tables. During the run, it looks at the retentionTimeUnit
and retentionTimeValue
inside the segmentsConfig
of every table, and deletes segments which are older than the retention. The deleted segments are moved to a DeletedSegments folder colocated with the dataDir on segment store, and permanently deleted from that folder in a configurable number of days.
controller.retention.frequencyPeriod
6h
controller.retentionManager.initialDelayInSeconds
between 2m-5m
controller.deleted.segments.retentionInDays
7d
This task is applicable only if you have tierConfig or tagOverrideConfig. It runs rebalance in the background to
relocate COMPLETED segments to tag overrides
relocate ONLINE segments to tiers if tier configs are set
At most one replica is allowed to be unavailable during rebalance.
controller.segment.relocator.frequencyPeriod
1h
controller.segmentRelocator.initialDelayInSeconds
between 2m-5m
This task manages segment status metrics such as realtimeTableCount, offlineTableCount, disableTableCount, numberOfReplicas, percentOfReplicas, percentOfSegments, idealStateZnodeSize, idealStateZnodeByteSize, segmentCount, segmentsInErrorState, tableCompressedSize.
controller.statuschecker.frequencyPeriod
5m
controller.statusChecker.initialDelayInSeconds
between 2m-5m
TBD
Use the GET /periodictask/names
API to fetch the names of all the Periodic Tasks running on your Pinot cluster.
To manually run a named Periodic Task use the GET /periodictask/run
API
The Log Request Id
(api-09630c07
) can be used to search through pinot-controller log file to see log entries related to execution of the Periodic task that was manually run.
If tableName
(and its type OFFLINE
or REALTIME
) is not provided, the task will run against all tables.
Make sure you've setup Zookeeper. If you're using docker, make sure to pull the pinot docker image. To start a controller
Cluster is a set of nodes comprising of servers, brokers, controllers and minions.
Pinot uses for cluster management. Helix is a cluster management framework that manages replicated, partitioned resources in a distributed system. Helix uses Zookeeper to store cluster state and metadata.
Helix divides nodes into logical components based on their responsibilities:
The nodes that host distributed, partitioned resources
The nodes that observe the current state of each Participant and use that information to access the resources. Spectators are notified of state changes in the cluster (state of a participant, or that of a partition in a participant).
The node that observes and controls the Participant nodes. It is responsible for coordinating all transitions in the cluster and ensuring that state constraints are satisfied while maintaining cluster stability.
Another way to visualize the cluster is a logical view, where:
To set up a cluster, see one of the following guides:
Pinot Servers are modeled as Participants. For more details about server nodes, see .
Pinot Brokers are modeled as Spectators. For more details about broker nodes, see .
Pinot Controllers are modeled as Controllers. For more details about controller nodes, see .
A cluster contains
Tenants contain
Tables contain .
Typically, there is only one cluster per environment/data center. There is no need to create multiple Pinot clusters since Pinot supports the concept of . At LinkedIn, the largest Pinot cluster consists of 1000+ nodes.