Tiered storage allows you to split your server storage into multiple tiers. All the tiers can use different filesystem to hold the data. Tiered storage can be used to optimise the cost to latency tradeoff in production Pinot systems.
Some example scenarios in which tiered storage can be used -
- Tables with very long retention (more than 2 years) but most frequently queries are performed on the recent data.
- Reduce storage cost for older data while tolerating slightly higher latencies In order to optimize for low latency, we often recommend using high performance SSDs. But if such a use case has 2 years of data, and need the high performance only when querying 1 month of data, it might become desirable to keep only the recent time ranges on SSDs, and keep the less frequently queried ones on cheaper nodes such as HDDs or a DFS such as S3.
The data age based tiers is just one of the examples. The logic to split data into tiers may change depending on the use case.
You can configured tiered storage by setting the
tieredConfigskey in your table config json.
In this example, the table uses servers tagged with
base_OFFLINE. We have created two tiers of Pinot servers, tagged with
tier_b_OFFLINE. Segments older than 7 days will move from
tier_a_OFFLINE, and segments older than 15 days will move to
Following properties are supported under
On adding tier config, a periodic task on the pinot-controller called "SegmentRelocator" will move segments from one tenant to another, as and when the segment crosses the segment age.
This periodic task runs every hour by default. You can configure this frequency by setting the config with any period string (60s, 2h, 5d)
This job can also be triggered manually
curl -X GET "https://localhost:9000/periodictask/run?
-H "accept: application/json"