Minion
Pinot Minion is a standby component that leverages the Helix Task Framework to offload computationally intensive tasks from other components. It can be attached to an existing Pinot cluster, where it executes tasks provided by the controller. Custom tasks can be plugged into the cluster via annotations. Typical minion tasks include segment creation, segment purge, and segment merge.

Starting a Minion

Make sure you've set up ZooKeeper. If you're using Docker, make sure to pull the Pinot Docker image. To start a minion:

Usage: StartMinion
    -help : Print this message. (required=false)
    -minionHost <String> : Host name for minion. (required=false)
    -minionPort <int> : Port number to start the minion at. (required=false)
    -zkAddress <http> : HTTP address of Zookeeper. (required=false)
    -clusterName <String> : Pinot cluster name. (required=false)
    -configFileName <Config File Name> : Minion Starter Config file. (required=false)

Docker Image

docker run \
    --network=pinot-demo \
    --name pinot-minion \
    -d ${PINOT_IMAGE} StartMinion \
    -zkAddress pinot-zookeeper:2181

Launcher Scripts

bin/pinot-admin.sh StartMinion \
    -zkAddress localhost:2181

Interfaces

PinotTaskGenerator

The PinotTaskGenerator interface defines the APIs the controller uses to generate tasks for minions to execute.

public interface PinotTaskGenerator {

  /**
   * Initializes the task generator.
   */
  void init(ClusterInfoAccessor clusterInfoAccessor);

  /**
   * Returns the task type of the generator.
   */
  String getTaskType();

  /**
   * Generates a list of tasks to schedule based on the given table configs.
   */
  List<PinotTaskConfig> generateTasks(List<TableConfig> tableConfigs);

  /**
   * Returns the timeout in milliseconds for each task, 3600000 (1 hour) by default.
   */
  default long getTaskTimeoutMs() {
    return JobConfig.DEFAULT_TIMEOUT_PER_TASK;
  }

  /**
   * Returns the maximum number of concurrent tasks allowed per instance, 1 by default.
   */
  default int getNumConcurrentTasksPerInstance() {
    return JobConfig.DEFAULT_NUM_CONCURRENT_TASKS_PER_INSTANCE;
  }

  /**
   * Performs necessary cleanups (e.g. remove metrics) when the controller leadership changes.
   */
  default void nonLeaderCleanUp() {
  }
}

PinotTaskExecutorFactory

Factory for PinotTaskExecutor, which defines the APIs the minion uses to execute tasks.

public interface PinotTaskExecutorFactory {

  /**
   * Initializes the task executor factory.
   */
  void init(MinionTaskZkMetadataManager zkMetadataManager);

  /**
   * Returns the task type of the executor.
   */
  String getTaskType();

  /**
   * Creates a new task executor.
   */
  PinotTaskExecutor create();
}

public interface PinotTaskExecutor {

  /**
   * Executes the task based on the given task config and returns the execution result.
   */
  Object executeTask(PinotTaskConfig pinotTaskConfig)
      throws Exception;

  /**
   * Tries to cancel the task.
   */
  void cancel();
}
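
As a rough illustration of how an executor and its factory fit together, the sketch below uses local stand-in types rather than the real Pinot classes. TaskConfigStandIn, TaskExecutorStandIn and TaskExecutorFactoryStandIn are hypothetical simplifications of the interfaces above (the init method is omitted), and the countingTask type and its steps property are made up for the example. Throwing an exception on cancellation is also an assumption, not necessarily how Pinot's built-in tasks behave.

```java
import java.util.Map;

// Hypothetical stand-in for PinotTaskConfig, reduced to what this sketch needs.
final class TaskConfigStandIn {
  final String taskType;
  final Map<String, String> configs;

  TaskConfigStandIn(String taskType, Map<String, String> configs) {
    this.taskType = taskType;
    this.configs = configs;
  }
}

// Hypothetical local copies of the executor interfaces, using the stand-in config.
interface TaskExecutorStandIn {
  Object executeTask(TaskConfigStandIn taskConfig) throws Exception;

  void cancel();
}

interface TaskExecutorFactoryStandIn {
  String getTaskType();

  TaskExecutorStandIn create();
}

// Executor that counts up to "steps" and checks the cancel flag before each step.
final class CountingTaskExecutor implements TaskExecutorStandIn {
  // volatile so a cancel() call from another thread is visible to the running task
  private volatile boolean cancelled;

  @Override
  public Object executeTask(TaskConfigStandIn taskConfig) throws Exception {
    int steps = Integer.parseInt(taskConfig.configs.getOrDefault("steps", "3"));
    int done = 0;
    for (int i = 0; i < steps; i++) {
      if (cancelled) {
        throw new InterruptedException("Task cancelled after " + done + " steps");
      }
      done++;
    }
    return done;
  }

  @Override
  public void cancel() {
    cancelled = true;
  }
}

// Factory that hands out a fresh executor per task, so cancel flags are never shared.
final class CountingTaskExecutorFactory implements TaskExecutorFactoryStandIn {
  @Override
  public String getTaskType() {
    return "countingTask";
  }

  @Override
  public TaskExecutorStandIn create() {
    return new CountingTaskExecutor();
  }
}
```

Creating a new executor on every create() call keeps per-task state, such as the cancel flag, isolated between tasks.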

MinionEventObserverFactory

Factory for MinionEventObserver, which defines the APIs for task event callbacks on the minion.

public interface MinionEventObserverFactory {

  /**
   * Initializes the event observer factory.
   */
  void init(MinionTaskZkMetadataManager zkMetadataManager);

  /**
   * Returns the task type of the event observer.
   */
  String getTaskType();

  /**
   * Creates a new task event observer.
   */
  MinionEventObserver create();
}

public interface MinionEventObserver {

  /**
   * Invoked when a minion task starts.
   *
   * @param pinotTaskConfig Pinot task config
   */
  void notifyTaskStart(PinotTaskConfig pinotTaskConfig);

  /**
   * Invoked when a minion task succeeds.
   *
   * @param pinotTaskConfig Pinot task config
   * @param executionResult Execution result
   */
  void notifyTaskSuccess(PinotTaskConfig pinotTaskConfig, @Nullable Object executionResult);

  /**
   * Invoked when a minion task gets cancelled.
   *
   * @param pinotTaskConfig Pinot task config
   */
  void notifyTaskCancelled(PinotTaskConfig pinotTaskConfig);

  /**
   * Invoked when a minion task encounters an exception.
   *
   * @param pinotTaskConfig Pinot task config
   * @param exception Exception encountered during execution
   */
  void notifyTaskError(PinotTaskConfig pinotTaskConfig, Exception exception);
}

Built-in Tasks

SegmentGenerationAndPushTask

To be added

RealtimeToOfflineSegmentsTask

ConvertToRawIndexTask

To be added

Enable Tasks

Tasks are enabled on a per-table basis. To enable a certain task type (e.g. myTask) on a table, update the table config to include the task type:

{
  ...
  "task": {
    "taskTypeConfigsMap": {
      "myTask": {
        "myProperty1": "value1",
        "myProperty2": "value2"
      }
    }
  }
}

Under each enabled task type, custom properties can be configured for that task type.

Schedule Tasks

Auto Schedule

Tasks can be scheduled periodically for all task types on all enabled tables. Enable auto task scheduling by configuring the schedule frequency in the controller config with key controller.task.frequencyInSeconds.

Manual Schedule

Tasks can be manually scheduled using the following controller REST APIs:

| REST API | Description |
| --- | --- |
| POST /tasks/schedule | Schedule tasks for all task types on all enabled tables |
| POST /tasks/schedule?taskType=myTask | Schedule tasks for the given task type on all enabled tables |
| POST /tasks/schedule?tableName=myTable_OFFLINE | Schedule tasks for all task types on the given table |
| POST /tasks/schedule?taskType=myTask&tableName=myTable_OFFLINE | Schedule tasks for the given task type on the given table |
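
The four endpoint variants differ only in which query parameters are present, so they can be produced by one small helper. The following is a minimal sketch; the ScheduleTaskUri class, its build method and the controller address http://localhost:9000 are assumptions made for illustration and are not part of Pinot.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that builds the URI for POST /tasks/schedule.
final class ScheduleTaskUri {
  private ScheduleTaskUri() {
  }

  /**
   * Builds the schedule URI. Pass null for taskType or tableName to leave that filter out.
   */
  static URI build(String controllerBaseUrl, String taskType, String tableName) {
    List<String> params = new ArrayList<>();
    if (taskType != null) {
      params.add("taskType=" + URLEncoder.encode(taskType, StandardCharsets.UTF_8));
    }
    if (tableName != null) {
      params.add("tableName=" + URLEncoder.encode(tableName, StandardCharsets.UTF_8));
    }
    String query = params.isEmpty() ? "" : "?" + String.join("&", params);
    return URI.create(controllerBaseUrl + "/tasks/schedule" + query);
  }

  public static void main(String[] args) {
    // Assumed controller address; replace with the address of your own controller.
    System.out.println(build("http://localhost:9000", "myTask", "myTable_OFFLINE"));
  }
}
```

The resulting URI can be sent as a POST request with any HTTP client, for example java.net.http.HttpClient with an empty request body.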

Plug-in Custom Tasks

To plug in a custom task, implement PinotTaskGenerator, PinotTaskExecutorFactory and, optionally, MinionEventObserverFactory for the task type. All of them must return the same string from getTaskType(). Annotate the implementations as follows:

| Implementation | Annotation |
| --- | --- |
| PinotTaskGenerator | @TaskGenerator |
| PinotTaskExecutorFactory | @TaskExecutorFactory |
| MinionEventObserverFactory | @EventObserverFactory |

After annotating the classes, place them in a package whose name matches org.apache.pinot.*.plugin.minion.tasks.*; the controller and minion will then register them automatically.

Example

See SimpleMinionClusterIntegrationTest, where the TestTask is plugged in.

Task related metrics

A controller job runs every 5 minutes by default and emits metrics about the Minion tasks scheduled in Pinot. Currently, the following metrics are emitted for each task type:
    NumMinionTasksInProgress: Number of running tasks
    NumMinionSubtasksRunning: Number of running sub-tasks
    NumMinionSubtasksWaiting: Number of waiting sub-tasks (not yet assigned to a minion)
    NumMinionSubtasksError: Number of error sub-tasks (completed with an error/exception)
    PercentMinionSubtasksInQueue: Percent of sub-tasks in waiting or running states
    PercentMinionSubtasksInError: Percent of sub-tasks in error
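
To make the two percentage metrics concrete, the sketch below derives them from sub-task counts. The SubtaskPercentages class is hypothetical, and the rounding and the handling of a zero total are assumptions; the controller's actual computation may differ.

```java
// Hypothetical helper; not part of Pinot.
final class SubtaskPercentages {
  private SubtaskPercentages() {
  }

  /** Percent of sub-tasks that are waiting or running, rounded to the nearest integer. */
  static long percentInQueue(int waiting, int running, int total) {
    // Assumption: report 0 when there are no sub-tasks at all.
    return total == 0 ? 0 : Math.round(100.0 * (waiting + running) / total);
  }

  /** Percent of sub-tasks that completed with an error, rounded to the nearest integer. */
  static long percentInError(int error, int total) {
    return total == 0 ? 0 : Math.round(100.0 * error / total);
  }
}
```

For example, with 10 sub-tasks of which 2 are waiting and 3 are running, percentInQueue(2, 3, 10) yields 50.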
For each task, the minion emits the following metrics:
    TASK_QUEUEING: Task queueing time (task_dequeue_time - task_inqueue_time). This assumes the clock drift between the Helix controller and the Pinot minion is minor; otherwise the value may be negative
    TASK_EXECUTION: Task execution time, i.e. the time spent executing the task
    NUMBER_OF_TASKS: Number of tasks in progress on that minion. The gauge is increased by 1 whenever the minion starts a task and decreased by 1 whenever the minion completes a task (whether it succeeded or failed)
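
The bookkeeping behind these three metrics can be sketched as follows. The MinionTaskMetricsSketch class and its method names are hypothetical and do not reflect Pinot's metrics classes; the sketch only mirrors the descriptions above.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of the per-minion bookkeeping; not Pinot's implementation.
final class MinionTaskMetricsSketch {
  // Plays the role of the NUMBER_OF_TASKS gauge.
  private final AtomicInteger numberOfTasks = new AtomicInteger();

  /** TASK_QUEUEING: negative if the enqueuing clock runs ahead of the dequeuing clock. */
  static long queueingTimeMs(long enqueueTimeMs, long dequeueTimeMs) {
    return dequeueTimeMs - enqueueTimeMs;
  }

  /** Runs the task and returns its execution time in milliseconds (TASK_EXECUTION). */
  long runTask(Runnable task) {
    numberOfTasks.incrementAndGet();
    long startNanos = System.nanoTime();
    try {
      task.run();
    } finally {
      // Decrement on success and on failure, so the gauge only counts tasks in progress.
      numberOfTasks.decrementAndGet();
    }
    return (System.nanoTime() - startNanos) / 1_000_000;
  }

  int tasksInProgress() {
    return numberOfTasks.get();
  }
}
```

Decrementing in a finally block keeps the gauge accurate even when a task throws.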