Using multiple tenants

With this feature, you can create multiple tenants, each with servers of different specs, and use them within the same table. This brings down the cost of storing historical data: older segments can live on lower-spec nodes (e.g. HDDs instead of SSDs) for storage and compute, at the cost of slightly higher latency.

Config

You can configure separate tenants for a table by setting the tierConfigs list in your table config JSON.

Example

In this example, the table uses servers tagged with base_OFFLINE. We have created two tenants of Pinot servers, tagged with ssd_OFFLINE and hdd_OFFLINE. Segments older than 7 days will move from base_OFFLINE to ssd_OFFLINE, and segments older than 15 days will move to hdd_OFFLINE.

How does data move from one tenant to another?

Once this config is added, the SegmentRelocator periodic task will move segments from one tenant to another as segments cross the configured segment age.

Under the hood, this job runs a rebalance, so you can achieve the same effect with a manual trigger by running a rebalance.
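To make the manual trigger concrete, here is a sketch using the controller's table rebalance REST endpoint. The controller address and table name are assumptions for illustration, and the exact query parameters may vary by Pinot version:

```shell
# Hypothetical controller address and table name; adjust to your deployment.
TABLE="myTable"
CONTROLLER="http://localhost:9000"
URL="${CONTROLLER}/tables/${TABLE}/rebalance?type=OFFLINE&dryRun=true"
echo "Would POST to: ${URL}"
# Uncomment to run against a live controller:
# curl -X POST "${URL}"
```

Running with dryRun=true first lets you inspect the proposed segment assignment before any data actually moves.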

name
Name of the server group. Every group in the list must have a unique name.

segmentSelectorType
The strategy used for selecting segments. The only supported strategy as of now is time, which picks segments based on segment age.

segmentAge
This property is required when segmentSelectorType is time. Set a period string, e.g. 15d, 24h, 60m. Segments older than this age are moved to the specified tenant.

storageType
The type of storage. The only supported type is pinot_server.

serverTag
This property is required when storageType is pinot_server. Set the tag of the Pinot servers you wish to use for this selection criterion.
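To make the time selector concrete, here is a small sketch, not actual Pinot code: parse_period_ms and select_tier are hypothetical helpers. It assumes that when a segment's age crosses several thresholds, the last matching tier in the list wins, so that older segments land on the colder tier as in the example below:

```python
import re

# Milliseconds per supported period unit: days, hours, minutes.
_UNIT_MS = {"d": 86_400_000, "h": 3_600_000, "m": 60_000}

def parse_period_ms(period: str) -> int:
    """Parse a period string like '7d', '24h', or '60m' into milliseconds."""
    match = re.fullmatch(r"(\d+)([dhm])", period)
    if not match:
        raise ValueError(f"unsupported period string: {period}")
    return int(match.group(1)) * _UNIT_MS[match.group(2)]

def select_tier(segment_end_ms: int, tiers: list, now_ms: int):
    """Pick the tier whose segmentAge threshold the segment has crossed.

    `tiers` is assumed to be ordered as in tierConfigs, hottest first;
    the largest crossed threshold (last match) wins. Returns None if the
    segment is younger than every threshold, i.e. it stays on the base tenant.
    """
    chosen = None
    for tier in tiers:
        if now_ms - segment_end_ms > parse_period_ms(tier["segmentAge"]):
            chosen = tier["name"]
    return chosen
```

For example, with the 7d/15d tiers from the config below, a 10-day-old segment selects the first tier and a 20-day-old segment selects the second.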

{
  "tableName": "myTable",
  "tableType": ...,
  "tenants": {
    "server": "base_OFFLINE",
    "broker": "base_BROKER"
  },
  "tierConfigs": [{
    "name": "ssdGroup",
    "segmentSelectorType": "time",
    "segmentAge": "7d",
    "storageType": "pinot_server",
    "serverTag": "ssd_OFFLINE"
  }, {
    "name": "hddGroup",
    "segmentSelectorType": "time",
    "segmentAge": "15d",
    "storageType": "pinot_server",
    "serverTag": "hdd_OFFLINE"
  }] 
}

Using multiple directories

With this feature, you have a single tenant, but each server in the tenant can have multiple data directories, e.g. one data path backed by SSD to keep recent data and one backed by HDD to keep older data, bringing down the cost of keeping long-term historical data.

Config

The servers should start with the configs below to enable multiple data directories. In fact, only the first one is required: the tierBased directory loader is what makes the server aware of multiple data directories. The tierNames and per-tier dataDir settings are optional, but it is still recommended to set them in the server config so that they are consistent across the cluster for easier management. Their values can be overwritten in the TableConfig as shown below.

The controllers should enable local tier migration for the segment relocator.

The table config specifies which data should be put on which storage tier, as in the example below.

In this example, segments older than 7 days are kept on hotTier, under the path /tmp/multidir_test/hotTier; segments older than 15 days are kept on coldTier, under the data path /tmp/multidir_test/my_custom_colddir (overwritten in the table config, although this is not recommended).

The configs are the same as those in Using multiple tenants. But instead of moving data across tenants, the data is moved across data paths locally on the servers, driven by the SegmentRelocator periodic task running on the controller.

pinot.server.instance.segment.directory.loader=tierBased
pinot.server.instance.tierConfigs.tierNames=hotTier,coldTier
pinot.server.instance.tierConfigs.hotTier.dataDir=/tmp/multidir_test/hotTier
pinot.server.instance.tierConfigs.coldTier.dataDir=/tmp/multidir_test/coldTier
controller.segmentRelocator.enableLocalTierMigration=true
# Defaults, for reference:
# controller.segment.relocator.frequencyPeriod=3600s
# controller.segmentRelocator.initialDelayInSeconds=random value in [120, 300)
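For reference, the server settings above can be written into a config file and passed at server startup. A sketch, using the hypothetical paths from this example and the standard pinot-admin.sh launcher:

```shell
# Write the multi-datadir settings to a server config file
# (paths are the hypothetical ones used in this example).
cat > /tmp/pinot-server-tiered.conf <<'EOF'
pinot.server.instance.segment.directory.loader=tierBased
pinot.server.instance.tierConfigs.tierNames=hotTier,coldTier
pinot.server.instance.tierConfigs.hotTier.dataDir=/tmp/multidir_test/hotTier
pinot.server.instance.tierConfigs.coldTier.dataDir=/tmp/multidir_test/coldTier
EOF
# Ensure the per-tier data directories exist on the server host.
mkdir -p /tmp/multidir_test/hotTier /tmp/multidir_test/coldTier
# Uncomment to start a server with these settings:
# bin/pinot-admin.sh StartServer -configFileName /tmp/pinot-server-tiered.conf
```

Each server in the tenant needs the same directory layout so that segments land on the intended disk everywhere.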
{
  "tableName": "myTable",
  "tableType": ...,
  "tenants": {
    "server": "base_OFFLINE",
    "broker": "base_BROKER"
  },
  "tierConfigs": [{
    "name": "hotTier",
    "segmentSelectorType": "time",
    "segmentAge": "7d",
    "storageType": "pinot_server",
    "serverTag": "base_OFFLINE"
  }, {
    "name": "coldTier",
    "segmentSelectorType": "time",
    "segmentAge": "15d",
    "storageType": "pinot_server",
    "serverTag": "base_OFFLINE",
    "tierBackendProperties": { // overwriting is not recommended, but can be done as below
       "dataDir": "/tmp/multidir_test/my_custom_colddir" // assume path exists on servers.
    }        
  }] 
}

Separating data storage by age

To optimize for low latency, we often recommend high-performance SSDs for server nodes. But if such a use case has a vast amount of data and needs high performance only when querying the few most recent days of data, it can be desirable to keep only the recent time ranges on SSDs and the less frequently queried ones on cheaper nodes such as HDDs.

By storing data at different storage tiers, you can keep large amounts of data in Pinot while keeping control over cluster cost. Usually, the most recent data should be kept in a storage tier with fast disk access to support low-latency, high-throughput real-time analytics queries, while older data can go in cheaper, slower storage tiers for analytics where higher query latency is acceptable.

Note that separating data storage by age is not meant to achieve a compute-storage decoupled architecture for Pinot.
