Timestamp index
Use a timestamp index to speed up your time query with different granularities
This feature is supported from Pinot 0.11+.
Background
The TIMESTAMP
data type introduced in the Pinot 0.8.0 release stores value as millisecond epoch long value.
Typically, users won't need this low level granularity for analytics queries. Scanning the data and time value conversion can be costly for big data.
A common query pattern for timestamp columns is filtering on a time range and then grouping by using different time granularities(days/month/etc).
Typically, this requires the query executor to extract values, apply the transform functions then do filter/groupBy, with no leverage on the dictionary or index.
This was the inspiration for the Pinot timestamp index, which is used to improve the query performance for range query and group by queries on TIMESTAMP
columns.
Supported data type
A TIMESTAMP
index can only be created on the TIMESTAMP
data type.
Timestamp Index
You can configure the granularity for a Timestamp data type column. Then:
Pinot will pre-generate one column per time granularity using a forward index and range index. The naming convention is
$${ts_column_name}$${ts_granularity}
, where the timestamp columnts
with granularitiesDAY
,MONTH
will have two extra columns generated:$ts$DAY
and$ts$MONTH
.Query overwrite for predicate and selection/group by: 2.1 GROUP BY: Functions like
dateTrunc('DAY', ts)
will be translated to use the underly column$ts$DAY
to fetch data. 2.2 PREDICATE: range index is auto-built for all granularity columns.
Example query usage:
Some preliminary benchmarking shows the query performance across 2.7 billion records improved from 45 secs to 4.2 secs using a timestamp index and a query like this:
vs.
Usage
The timestamp index is configured on a per column basis inside the fieldConfigList
section in the table configuration.
Specify the timestampConfig
field. This object must contain a field called granularities
, which is an array with at least one of the following values:
MILLISECOND
SECOND
MINUTE
HOUR
DAY
WEEK
MONTH
QUARTER
YEAR
Sample config: