Time Series Language Plugin
Describes how to add support for custom or novel Time Series Query Languages such as PromQL or M3QL.
Overview
Time Series Query Languages like PromQL, though limited in what they can do, are quite convenient for Time Series analysis. There are many Time Series Query Languages out there, and Pinot should be able to support any of them via its Time Series Language Plugin.
Plugin High Level Design
We have a dedicated SPI for implementing the Time Series Plugin: pinot-timeseries-spi.
To build support for your own language, you need to implement the following key components:
TimeSeriesLogicalPlanner takes in a Time Series Request, and is expected to return a Plan Tree consisting of BaseTimeSeriesPlanNode nodes. The leaves of the plan tree should be LeafTimeSeriesPlanNode. The planner can also define the TimeBuckets as it sees fit.
BaseTimeSeriesPlanNode allows you to implement your own plan nodes. Each plan node is expected to return a BaseTimeSeriesOperator via its run method.
BaseTimeSeriesOperator allows you to define and implement your operators.
BaseTimeSeriesBuilder can be implemented for each aggregation type you want to support.
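The relationship between plan nodes and operators can be sketched in a few lines. The example below is only an illustration under stated assumptions: SketchPlanNode, SketchOperator, SketchLeafNode and SketchSumNode are hypothetical stand-ins written for this page, not the classes from pinot-timeseries-spi, and their signatures do not reflect the real SPI. It only shows the general shape: a tree of plan nodes, each of which returns an operator from its run method, with leaves at the bottom.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for an operator: yields one value per time bucket.
interface SketchOperator {
  double[] nextBlock();
}

// Hypothetical stand-in for a plan node. It is NOT the SPI's BaseTimeSeriesPlanNode.
abstract class SketchPlanNode {
  final List<SketchPlanNode> inputs;

  SketchPlanNode(List<SketchPlanNode> inputs) {
    this.inputs = inputs;
  }

  // Mirrors the idea that every plan node returns an operator via its run method.
  abstract SketchOperator run();
}

// Leaf of the plan tree. A real leaf would read from Pinot; this one returns fixed values.
final class SketchLeafNode extends SketchPlanNode {
  private final double[] values;

  SketchLeafNode(double[] values) {
    super(List.of());
    this.values = values;
  }

  @Override
  SketchOperator run() {
    return () -> values.clone();
  }
}

// Inner node that sums its children bucket by bucket (assumes equal-length inputs).
final class SketchSumNode extends SketchPlanNode {
  SketchSumNode(List<SketchPlanNode> inputs) {
    super(inputs);
  }

  @Override
  SketchOperator run() {
    List<SketchOperator> childOperators = new ArrayList<>();
    for (SketchPlanNode input : inputs) {
      childOperators.add(input.run());
    }
    return () -> {
      double[] result = null;
      for (SketchOperator child : childOperators) {
        double[] block = child.nextBlock();
        if (result == null) {
          result = block;
        } else {
          for (int i = 0; i < result.length; i++) {
            result[i] += block[i];
          }
        }
      }
      return result;
    };
  }
}

final class PlanTreeSketch {
  // Builds a two-leaf plan tree, converts it to operators, and evaluates it.
  static double[] evaluate() {
    SketchPlanNode root = new SketchSumNode(List.of(
        new SketchLeafNode(new double[] {1, 2, 3}),
        new SketchLeafNode(new double[] {10, 20, 30})));
    return root.run().nextBlock();
  }

  public static void main(String[] args) {
    System.out.println(Arrays.toString(evaluate())); // prints [11.0, 22.0, 33.0]
  }
}
```

In a real plugin the planner would build such a tree from the parsed query, and the example implementation under pinot-plugins/pinot-timeseries-lang is the authoritative reference for the actual class signatures.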
There is an example Plugin implementation in the apache/pinot repo under pinot-plugins/pinot-timeseries-lang.
Plugin Implementation Tips
Consider using frameworks like JavaCC for implementing your Query Parser.
TimeBuckets that you return from your planner can spill beyond the [start, end] of the original request. You can also define the resolution based on your language's semantics.
Choose the resolution of TimeBuckets wisely. Setting a very fine resolution can make your visualization tool slow or consume a lot of heap memory in Pinot.
Use the limit in the LeafTimeSeriesPlanNode to control the maximum number of series that can be returned from the leaf stage.
Note that for each series returned by the Leaf Operator, there will be a Double[] array whose length matches that of TimeBuckets#getTimeBuckets(). Ideally you should tune your limit and the resolution of TimeBuckets based on a fixed upper bound on the number of data points you want to allow.
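The arithmetic behind the last few tips can be sketched as follows. This is a minimal illustration, not code from Pinot: BucketBudgetSketch and its methods are hypothetical names, and aligning the start down to a multiple of the resolution is just one possible convention that a language might choose. It shows how buckets can spill beyond the requested [start, end], and how a series limit can be derived from a fixed data point budget, given that each series holds one value per bucket.

```java
final class BucketBudgetSketch {
  // Rounds the request start down to a multiple of the resolution (an assumed convention).
  static long alignedStart(long startSeconds, long resolutionSeconds) {
    return Math.floorDiv(startSeconds, resolutionSeconds) * resolutionSeconds;
  }

  // Number of buckets needed to cover [alignedStart, end], rounding up.
  static int numBuckets(long startSeconds, long endSeconds, long resolutionSeconds) {
    long span = endSeconds - alignedStart(startSeconds, resolutionSeconds);
    long buckets = (span + resolutionSeconds - 1) / resolutionSeconds;
    return (int) Math.max(1, buckets);
  }

  // Largest series limit such that limit * numBuckets stays within the budget.
  static int seriesLimit(long maxDataPoints, int numBuckets) {
    return (int) Math.max(1, maxDataPoints / numBuckets);
  }

  public static void main(String[] args) {
    long start = 1000;
    long end = 4600;
    long resolution = 60;
    int buckets = numBuckets(start, end, resolution);
    System.out.println(alignedStart(start, resolution)); // prints 960
    System.out.println(buckets);                         // prints 61
    System.out.println(seriesLimit(100_000, buckets));   // prints 1639
  }
}
```

Here the 61 buckets cover 960 to 4620 seconds, slightly wider than the requested 1000 to 4600. Halving the resolution roughly doubles the bucket count, so the series limit has to be halved to stay within the same budget.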