Minion
A Minion is a standby component that leverages the Helix Task Framework to offload computationally intensive tasks from other components.
It can be attached to an existing Pinot cluster and then execute tasks as provided by the controller. Custom tasks can be plugged into the cluster via annotations. Some typical minion tasks are:
Segment creation
Segment purge
Segment merge
Starting a Minion
Make sure you've set up Zookeeper. If you're using Docker, make sure to pull the Pinot Docker image. To start a minion, run the StartMinion command of pinot-admin.sh and point it at your Zookeeper address.
Interfaces
PinotTaskGenerator
The PinotTaskGenerator interface defines the APIs for the controller to generate tasks for minions to execute.
PinotTaskExecutorFactory
Factory for PinotTaskExecutor, which defines the APIs for the minion to execute the tasks.
MinionEventObserverFactory
Factory for MinionEventObserver, which defines the APIs for task event callbacks on the minion.
Built-in Tasks
SegmentGenerationAndPushTask
To be added
RealtimeToOfflineSegmentsTask
See Pinot managed Offline flows for details.
MergeRollupTask
See Minion merge rollup task for details.
ConvertToRawIndexTask
To be added
Enable Tasks
Tasks are enabled on a per-table basis. To enable a certain task type (e.g. myTask) on a table, update the table config to include the task type. Under each enabled task type, custom properties can be configured for that task type.
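As a minimal sketch, a table config fragment that enables myTask might look like the following. It assumes the task section uses a taskTypeConfigsMap keyed by task type, as in Pinot's table config format; the table name and the two properties (myProperty1, myProperty2) are hypothetical placeholders, and the rest of the table config is omitted.

```json
{
  "tableName": "myTable_OFFLINE",
  "task": {
    "taskTypeConfigsMap": {
      "myTask": {
        "myProperty1": "value1",
        "myProperty2": "value2"
      }
    }
  }
}
```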
Schedule Tasks
Auto-Schedule
Tasks can be scheduled periodically for all task types on all enabled tables. Enable auto task scheduling by configuring the schedule frequency in the controller config with the key controller.task.frequencyInSeconds.
Tasks can also be scheduled based on cron expressions. The cron expression is set in the schedule config for each task type separately. The option controller.task.scheduler.enabled should be set to true to enable cron scheduling.
For example, the RealtimeToOfflineSegmentsTask can be scheduled to run at the first second of every minute (the expression follows Quartz cron syntax).
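A table config fragment for a once-a-minute schedule might look like the sketch below. It assumes the same taskTypeConfigsMap layout used to enable tasks and a Quartz-style expression in the schedule property; any other properties the task needs are left out here.

```json
{
  "task": {
    "taskTypeConfigsMap": {
      "RealtimeToOfflineSegmentsTask": {
        "schedule": "0 * * * * ?"
      }
    }
  }
}
```

In the Quartz format the fields are seconds, minutes, hours, day of month, month, and day of week, so 0 in the first field with wildcards elsewhere fires at second 0 of every minute.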
Manual Schedule
Tasks can be manually scheduled using the following controller REST APIs:
REST API | Description |
---|---|
POST /tasks/schedule | Schedule tasks for all task types on all enabled tables |
POST /tasks/schedule?taskType=myTask | Schedule tasks for the given task type on all enabled tables |
POST /tasks/schedule?tableName=myTable_OFFLINE | Schedule tasks for all task types on the given table |
POST /tasks/schedule?taskType=myTask&tableName=myTable_OFFLINE | Schedule tasks for the given task type on the given table |
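The four variants in the table differ only in their query parameters, which a small client can build programmatically. The following is a sketch, not part of Pinot: the class and method names are hypothetical, it uses only the JDK's java.net.http client (Java 11+), and the controller base URL (for example http://localhost:9000) is an assumption about your deployment.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class; not part of the Pinot codebase.
class MinionTaskScheduler {

    // Builds the schedule endpoint URI. A null taskType or tableName means
    // "all task types" or "all enabled tables" respectively.
    static URI buildScheduleUri(String controllerBaseUrl, String taskType, String tableName) {
        List<String> params = new ArrayList<>();
        if (taskType != null) {
            params.add("taskType=" + URLEncoder.encode(taskType, StandardCharsets.UTF_8));
        }
        if (tableName != null) {
            params.add("tableName=" + URLEncoder.encode(tableName, StandardCharsets.UTF_8));
        }
        String query = params.isEmpty() ? "" : "?" + String.join("&", params);
        return URI.create(controllerBaseUrl + "/tasks/schedule" + query);
    }

    // Sends the POST request with an empty body and returns the raw response body.
    static String schedule(String controllerBaseUrl, String taskType, String tableName)
            throws Exception {
        HttpRequest request = HttpRequest.newBuilder(buildScheduleUri(controllerBaseUrl, taskType, tableName))
                .POST(HttpRequest.BodyPublishers.noBody())
                .build();
        HttpResponse<String> response =
                HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }
}
```

Calling schedule(baseUrl, "myTask", "myTable_OFFLINE") corresponds to the last row of the table, while passing null for both arguments corresponds to the first row.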
Plug-in Custom Tasks
To plug in a custom task, implement PinotTaskGenerator, PinotTaskExecutorFactory and (optionally) MinionEventObserverFactory for the task type (all of them should return the same string for getTaskType()), and annotate them with the following annotations:
Implementation | Annotation |
---|---|
PinotTaskGenerator | @TaskGenerator |
PinotTaskExecutorFactory | @TaskExecutorFactory |
MinionEventObserverFactory | @EventObserverFactory |
After annotating the classes, place them in a package whose name matches org.apache.pinot.*.plugin.minion.tasks.*; the controller and minion will then auto-register them.
Example
See SimpleMinionClusterIntegrationTest, where the TestTask is plugged in.
Task-related metrics
There is a controller job that runs every 5 minutes by default and emits metrics about Minion tasks scheduled in Pinot. The following metrics are emitted for each task type:
NumMinionTasksInProgress: Number of running tasks
NumMinionSubtasksRunning: Number of running sub-tasks
NumMinionSubtasksWaiting: Number of waiting sub-tasks (not yet assigned to a minion)
NumMinionSubtasksError: Number of sub-tasks in error (completed with an error or exception)
PercentMinionSubtasksInQueue: Percent of sub-tasks in waiting or running states
PercentMinionSubtasksInError: Percent of sub-tasks in error
For each task, the Minion will emit these metrics:
TASK_QUEUEING: Task queueing time (task_dequeue_time - task_inqueue_time). This assumes the clock drift between the Helix controller and the Pinot minion is minor; otherwise the value may be negative
TASK_EXECUTION: Task execution time, which is the time spent on executing the task
NUMBER_OF_TASKS: Number of tasks in progress on that minion. The gauge is increased by 1 whenever the minion starts a task and decreased by 1 whenever it completes a task (whether it succeeded or failed)
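The bookkeeping behind these three per-task metrics can be illustrated with a small standalone sketch. This is not Pinot's implementation: the class, fields, and method are hypothetical, and an AtomicLong stands in for the NUMBER_OF_TASKS gauge. The enqueue and dequeue timestamps are passed in explicitly to make the possible negative value visible.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical illustration of the per-task metrics; not Pinot code.
class MinionTaskMetricsSketch {
    // Stand-in for the NUMBER_OF_TASKS gauge.
    final AtomicLong numberOfTasks = new AtomicLong();
    // Stand-ins for the most recent TASK_QUEUEING and TASK_EXECUTION values.
    long lastQueueingMs;
    long lastExecutionMs;

    void runTask(long inqueueTimeMs, long dequeueTimeMs, Runnable task) {
        // Queueing time mixes two clocks (Helix controller and minion),
        // so it can be negative if those clocks drift apart.
        lastQueueingMs = dequeueTimeMs - inqueueTimeMs;

        numberOfTasks.incrementAndGet(); // task started
        long startNanos = System.nanoTime();
        try {
            task.run();
        } finally {
            // Runs on success and on failure, so the gauge always drops back.
            lastExecutionMs = (System.nanoTime() - startNanos) / 1_000_000;
            numberOfTasks.decrementAndGet();
        }
    }
}
```

Placing the decrement in a finally block is what keeps the gauge accurate when a task throws, matching the "either succeeded or failed" wording above.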