Apache Pinot Docs
Dimension Table
Dimension tables in Apache Pinot.
Dimension tables are a special kind of offline table whose data can be looked up via the lookup UDF, providing join-like functionality.
Dimension tables are replicated on all the hosts for a given tenant to allow faster lookups.
To mark an offline table as a dimension table, set isDimTable to true and segmentsConfig.segmentPushType to REFRESH in the table config, as shown below:
{
  "OFFLINE": {
    "tableName": "dimBaseballTeams_OFFLINE",
    "tableType": "OFFLINE",
    "segmentsConfig": {
      "schemaName": "dimBaseballTeams",
      "segmentPushType": "REFRESH"
    },
    "metadata": {},
    "quota": {
      "storage": "200M"
    },
    "isDimTable": true
  }
}
Because dimension tables are used to look up dimension values, they must have a primary key, which can be a composite key. The primary key is declared in the schema through primaryKeyColumns:
{
  "dimensionFieldSpecs": [
    {
      "dataType": "STRING",
      "name": "teamID"
    },
    {
      "dataType": "STRING",
      "name": "teamName"
    }
  ],
  "schemaName": "dimBaseballTeams",
  "primaryKeyColumns": ["teamID"]
}
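To make the role of the primary key concrete, here is a minimal Python sketch of what a primary-key lookup amounts to conceptually. It is an illustration only, not Pinot's implementation. The helper names build_index and look_up, and the sample team rows, are hypothetical. Returning None for a missing key and letting a later row overwrite an earlier one with the same key are choices made for this sketch.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Row = Dict[str, str]
Key = Tuple[str, ...]


def build_index(rows: List[Row], primary_key_columns: Sequence[str]) -> Dict[Key, Row]:
    """Index rows by their primary key; a composite key becomes a longer tuple."""
    index: Dict[Key, Row] = {}
    for row in rows:
        key = tuple(row[column] for column in primary_key_columns)
        index[key] = row  # in this sketch, a later row with the same key wins
    return index


def look_up(index: Dict[Key, Row], column: str, key_values: Sequence[str]) -> Optional[str]:
    """Return the requested column for the row matching the key, or None."""
    row = index.get(tuple(key_values))
    return None if row is None else row.get(column)


# Made-up sample rows shaped like the dimBaseballTeams schema above.
teams = [
    {"teamID": "BOS", "teamName": "Boston Red Sox"},
    {"teamID": "NYA", "teamName": "New York Yankees"},
]
index = build_index(teams, ["teamID"])
team_name = look_up(index, "teamName", ["BOS"])
```

Because every lookup resolves through the key tuple, a composite key only changes the length of that tuple; the lookup itself is unchanged.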
When a table is marked as a dimension table, it is replicated on all the hosts, so these tables must be small.
The maximum size quota for a dimension table in a cluster is controlled by the controller.dimTable.maxSize controller property. Table creation will fail if the storage quota exceeds this maximum size.
A dimension table cannot be part of a hybrid table.
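The requirements on this page can be collected into a small pre-flight check to run before submitting a config. The following Python sketch is hypothetical: check_dim_table and parse_size are not part of Pinot, the max_size argument stands in for whatever controller.dimTable.maxSize is set to in your cluster, and the size parser assumes binary units (1M = 1024 × 1024 bytes), which may not match how Pinot interprets quota strings.

```python
import re
from typing import Any, Dict, List

# Assumed binary multipliers; Pinot's own parsing may differ.
_UNIT_FACTORS = {"": 1, "K": 1024, "M": 1024 ** 2, "G": 1024 ** 3, "T": 1024 ** 4}


def parse_size(text: str) -> int:
    """Convert a size string such as "200M" into a number of bytes."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)\s*([KMGT]?)B?", text.strip().upper())
    if match is None:
        raise ValueError(f"unrecognized size: {text!r}")
    number, unit = match.groups()
    return int(float(number) * _UNIT_FACTORS[unit])


def check_dim_table(table_config: Dict[str, Any], schema: Dict[str, Any], max_size: str) -> List[str]:
    """Return a list of problems; an empty list means all checks passed."""
    problems: List[str] = []
    if table_config.get("isDimTable") is not True:
        problems.append("isDimTable must be true")
    push_type = table_config.get("segmentsConfig", {}).get("segmentPushType")
    if push_type != "REFRESH":
        problems.append("segmentsConfig.segmentPushType must be REFRESH")
    if not schema.get("primaryKeyColumns"):
        problems.append("schema must declare primaryKeyColumns")
    storage = table_config.get("quota", {}).get("storage")
    if storage is not None and parse_size(storage) > parse_size(max_size):
        problems.append(f"storage quota {storage} exceeds maximum {max_size}")
    return problems


# The inner table config (the value under "OFFLINE") and schema from this page.
table_config = {
    "tableName": "dimBaseballTeams_OFFLINE",
    "tableType": "OFFLINE",
    "segmentsConfig": {"schemaName": "dimBaseballTeams", "segmentPushType": "REFRESH"},
    "metadata": {},
    "quota": {"storage": "200M"},
    "isDimTable": True,
}
schema = {"schemaName": "dimBaseballTeams", "primaryKeyColumns": ["teamID"]}

problems = check_dim_table(table_config, schema, max_size="200M")
```

The check reports every problem it finds instead of stopping at the first, so a misconfigured table can be fixed in one pass. It does not cover the hybrid-table restriction, which depends on what other tables already exist in the cluster.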