Writing Custom Aggregation Function
Pinot has many built-in Aggregation Functions such as MIN, MAX, SUM, AVG etc. See PQL page for the list of aggregation functions.
Adding a new AggregationFunction requires two things
Register the function in . As of today, this requires code change in Pinot but we plan to add the ability to plugin Functions without having to change Pinot code.
To get an overall idea, see Aggregation Function implementation. All other implementations can be found .
Lets look at the key methods to implements in AggregationFunction
Before getting into the implementation, it's important to understand how Aggregation works in Pinot.
This is advanced topic and assumes you know Pinot . All the data in Pinot is stored in segments across multiple nodes. The query plan at a high level comprises of 3 phases
1. Map phase
This phase works on the individual segments in Pinot.
Initialization: Depending on the query type the following methods are invoked to set up the result holder. While having different methods and return types adds complexity, it helps in performance.
AGGREGATION : createAggregationResultHolderThis must return an instance of type . You can either use the or
2. Combine phase
In this phase, the results from all segments within a single pinot server are combined into IntermediateResult. The type of IntermediateResult is based on the Generic Type defined in the AggregationFunction implementation.
3. Reduce phase
There are two steps in the Reduce Phase
Merge all the IntermediateResult's from various servers using the merge function
Extract the final results by invoking the extractFinalResult method. In most cases, FinalResult is same type as IntermediateResult. is an example where IntermediateResult (AvgPair) is different from FinalResult(Double)