Adding a new AggregationFunction requires two things
Let's look at the key methods to implement in AggregationFunction.
Before getting into the implementation, it's important to understand how Aggregation works in Pinot.
1. Map phase
This phase works on the individual segments in Pinot.
Initialization: Depending on the query type, the following methods are invoked to set up the result holder. Having different methods and return types adds complexity, but it improves performance.
Callback: For every record that matches the filter condition in the query, one of the following methods is invoked, depending on the query type (aggregation vs. group by) and the column type (single-value vs. multi-value). Note that the method is invoked on a batch of records rather than on every row. This improves performance and lets the JVM vectorize parts of the execution where possible.
AGGREGATION: aggregate(int length, AggregationResultHolder aggregationResultHolder, Map<String,BlockValSet> blockValSetMap)
length: The length of the block. Typically < 10k
aggregationResultHolder: The object returned from createAggregationResultHolder
blockValSetMap: A map of BlockValSets for the arguments to the aggregation function
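The block-at-a-time callback described above can be sketched roughly as follows. This is a simplified illustration for a SUM-like function, not Pinot's actual code: DoubleResultHolder is a hypothetical stand-in for AggregationResultHolder, a plain double[] stands in for a BlockValSet, and the column name "metric" used below is made up.

```java
import java.util.Map;

// Simplified stand-ins for illustration only; these are NOT Pinot's real classes.
final class SumAggregationSketch {
    /** Hypothetical stand-in for AggregationResultHolder: one running double. */
    static final class DoubleResultHolder {
        private double value;
        double getDoubleResult() { return value; }
        void setValue(double newValue) { value = newValue; }
    }

    private final String column; // the single argument of this SUM-like function

    SumAggregationSketch(String column) { this.column = column; }

    /** Initialization: creates the holder that later callbacks update. */
    DoubleResultHolder createAggregationResultHolder() { return new DoubleResultHolder(); }

    /** Callback: folds one block of `length` matching rows into the holder. */
    void aggregate(int length, DoubleResultHolder holder, Map<String, double[]> blockValSetMap) {
        double[] values = blockValSetMap.get(column); // double[] stands in for a BlockValSet
        double sum = holder.getDoubleResult();
        for (int i = 0; i < length; i++) { // only the first `length` entries are valid
            sum += values[i];
        }
        holder.setValue(sum);
    }
}
```

Calling aggregate twice with the same holder, once per block, accumulates across blocks, which is why the holder is created once up front and passed back on every callback.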
Group By Single Value: aggregateGroupBySV(int length, int[] groupKeyArray, GroupByResultHolder groupByResultHolder, Map blockValSets)
length: The length of the block. Typically < 10k
groupKeyArray: Pinot internally maintains a value-to-int mapping, and groupKeyArray holds the ints from that internal mapping. These values together form a unique key.
groupByResultHolder: The object returned from createGroupByResultHolder
blockValSetMap: A map of BlockValSets for the arguments to the aggregation function
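For the single-value group-by case, the idea is that row i of the block carries exactly one group key, and the callback updates that group's partial result. A minimal sketch, assuming a double[] indexed by group key as a stand-in for GroupByResultHolder and a double[] of values in place of the BlockValSet map (both are simplifications, not Pinot's real types):

```java
// Simplified sketch: a double[] indexed by group key stands in for GroupByResultHolder.
final class GroupBySvSketch {
    /** Adds values[i] to the running sum of the single group that row i belongs to. */
    static void aggregateGroupBySV(int length, int[] groupKeyArray, double[] groupSums, double[] values) {
        for (int i = 0; i < length; i++) {
            groupSums[groupKeyArray[i]] += values[i];
        }
    }
}
```

With group keys {0, 1, 0, 2} and values {10, 20, 30, 40}, rows 0 and 2 share key 0, so their values land in the same slot.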
Group By Multi Value: aggregateGroupByMV(int length, int[][] groupKeysArray, GroupByResultHolder groupByResultHolder, Map blockValSets)
length: The length of the block. Typically < 10k
groupKeysArray: Pinot internally maintains a value-to-int mapping, and groupKeysArray holds the ints from that internal mapping. Because a multi-value column can place one record in several groups, each record has an array of group keys. These values together form a unique key.
groupByResultHolder: The object returned from createGroupByResultHolder
blockValSetMap: A map of BlockValSets for the arguments to the aggregation function
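In the multi-value group-by case, a single row can belong to several groups. A minimal sketch, assuming the group keys arrive as an int[][] with one array of keys per row, and again using a double[] indexed by group key as a hypothetical stand-in for GroupByResultHolder:

```java
// Simplified sketch: each row may carry several group keys, so its value is
// added to every group it belongs to.
final class GroupByMvSketch {
    static void aggregateGroupByMV(int length, int[][] groupKeysArray, double[] groupSums, double[] values) {
        for (int i = 0; i < length; i++) {
            for (int groupKey : groupKeysArray[i]) {
                groupSums[groupKey] += values[i];
            }
        }
    }
}
```

The only difference from the single-value variant is the inner loop over the keys of each row.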
2. Combine phase
In this phase, the results from all segments within a single Pinot server are combined into an IntermediateResult. The type of IntermediateResult is the generic type declared by the AggregationFunction implementation.
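To make the role of the generic intermediate type concrete, here is a rough sketch of combining per-segment results for an average. All names (MergeableAggregation, SumCount, AvgSketch, combine) are hypothetical and chosen for illustration. They mirror the idea of a typed intermediate result with a merge function, not Pinot's actual interface.

```java
import java.util.List;

// Hypothetical names throughout; this mirrors the idea, not Pinot's actual interface.
interface MergeableAggregation<I> {
    /** Merges two intermediate results of type I into one. */
    I merge(I left, I right);
}

/** Hypothetical intermediate result for an average: partial sum and row count. */
final class SumCount {
    final double sum;
    final long count;

    SumCount(double sum, long count) {
        this.sum = sum;
        this.count = count;
    }
}

final class AvgSketch implements MergeableAggregation<SumCount> {
    @Override
    public SumCount merge(SumCount left, SumCount right) {
        return new SumCount(left.sum + right.sum, left.count + right.count);
    }

    /** Combine: folds the per-segment results of one server into a single intermediate result. */
    SumCount combine(List<SumCount> segmentResults) {
        SumCount combined = new SumCount(0.0, 0L);
        for (SumCount segmentResult : segmentResults) {
            combined = merge(combined, segmentResult);
        }
        return combined;
    }
}
```

Carrying the sum and the count separately, rather than a per-segment average, keeps the merge correct when segments contain different numbers of matching rows.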
3. Reduce phase
There are two steps in the Reduce phase:
Merge all the IntermediateResults from the various servers using the merge function