Speed up your time query with different granularities
This feature is supported from Pinot 0.11+.
Pinot introduces the TIMESTAMP data type from Pinot 0.8.0 release. This data type stores value as millisecond epoch long value internally.
Typically for analytics queries, users won't need this low level granularity, scanning the data and time value conversion can be costly for the big size of data.
A common query pattern for timestamp columns is filtering on a time range and then group by with different time granularities(days/month/etc).
The existing implementation requires the query executor to extract values, apply the transform functions then do filter/groupBy, no leverage on the dictionary or index.
Hence the inspiration of TIMESTAMP INDEX, which is used to improve the query performance for range query and group by queries on TIMESTAMP columns.
TIMESTAMP index can only be created on TIMESTAMP data type.
Users can configure the most useful granularities for a Timestamp data type column.
Pinot will pre-generate one column per time granularity with forward index and range index. The naming convention is $${ts_column_name}$${ts_granularity}
, e.g. Timestamp column ts
with granularities DAY
, MONTH
will have two extra columns generated: $ts$DAY
and $ts$MONTH
.
Query overwrite for predicate and selection/group by:
2.1 GROUP BY: functions like dateTrunc('DAY', ts)
will be translated to use the underly column $ts$DAY
to fetch data.
2.2 PREDICATE: range index is auto-built for all granularity columns.
Example query usage:
Some preliminary benchmark shows the query perf over 2.7 billion records improved from 45 secs to 4.2 secs
vs.
Timestamp index is configured per column basis inside the fieldConfigList section in table config.
Users need to specify TIMESTAMP
as part of the indexTypes
. Then in the field timestampConfig, specify the granularities that you want to index.
Sample config: