Apache Pinot Docs
Dimension Table
Dimension tables in Apache Pinot.
Dimension tables are a special kind of offline table from which data can be looked up with the lookup UDF, providing join-like functionality. Dimension tables are replicated on all the hosts of a given tenant so that every host holds a full copy, which makes lookups faster.
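Conceptually, a lookup resolves each fact row's join key against the dimension table's key and fetches one column, much like a left join. The sketch below is a minimal in-memory model of that idea only, not Pinot's implementation or API: the DimTable class, its lookup method, the sample rows, and returning None for a missing key are all hypothetical assumptions.

```python
from typing import Any, Dict, List, Optional


class DimTable:
    """Hypothetical in-memory dimension table indexed by one primary-key column."""

    def __init__(self, primary_key: str, rows: List[Dict[str, Any]]) -> None:
        self.primary_key = primary_key
        # Hash index: primary-key value -> full row.
        self.index: Dict[Any, Dict[str, Any]] = {row[primary_key]: row for row in rows}

    def lookup(self, column: str, key: Any) -> Optional[Any]:
        """Return `column` from the row whose primary key equals `key`, or None if absent."""
        row = self.index.get(key)
        return None if row is None else row.get(column)


# Hypothetical sample dimension rows.
teams = DimTable("teamID", [
    {"teamID": "BOS", "teamName": "Boston Red Sox"},
    {"teamID": "NYA", "teamName": "New York Yankees"},
])

# Hypothetical fact rows that carry only the join key.
stats = [
    {"playerName": "Player A", "teamID": "BOS"},
    {"playerName": "Player B", "teamID": "XXX"},
]

# Enrich each fact row with the looked-up dimension value (left-join semantics).
enriched = [dict(r, teamName=teams.lookup("teamName", r["teamID"])) for r in stats]
for r in enriched:
    print(r["playerName"], r["teamName"])
```

Because every host holds a full copy of the dimension table, each host can run this kind of hash lookup against local data.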
To mark an offline table as a dimension table, set isDimTable to true in the table config, as shown below:
```json
{
  "OFFLINE": {
    "tableName": "dimBaseballTeams_OFFLINE",
    "tableType": "OFFLINE",
    "segmentsConfig": {
      "schemaName": "dimBaseballTeams"
    },
    "metadata": {},
    "quota": {
      "storage": "200M"
    },
    "isDimTable": true
  }
}
```
Because dimension tables are used to look up dimension values, they must have a primary key, which can be a composite key. For example, the dimBaseballTeams schema declares teamID as its primary key:
```json
{
  "dimensionFieldSpecs": [
    {
      "dataType": "STRING",
      "name": "teamID"
    },
    {
      "dataType": "STRING",
      "name": "teamName"
    }
  ],
  "schemaName": "dimBaseballTeams",
  "primaryKeyColumns": ["teamID"]
}
```
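With a composite primary key, the lookup key becomes the tuple of all primary-key column values. The following is a minimal sketch of that indexing idea, assuming a hypothetical two-column key (teamID, yearID) that is not part of the schema above; the build_index helper and the rule that a later row with the same key replaces an earlier one are assumptions, not documented Pinot behavior.

```python
from typing import Any, Dict, List, Sequence, Tuple


def build_index(
    rows: List[Dict[str, Any]], pk_columns: Sequence[str]
) -> Dict[Tuple[Any, ...], Dict[str, Any]]:
    """Index rows by the tuple of their primary-key column values (hypothetical helper)."""
    index: Dict[Tuple[Any, ...], Dict[str, Any]] = {}
    for row in rows:
        key = tuple(row[c] for c in pk_columns)
        # Assumption: a later row with the same key replaces the earlier one.
        index[key] = row
    return index


# Hypothetical rows: the same teamID appears in two years with different names.
rows = [
    {"teamID": "T1", "yearID": 2004, "teamName": "Old Name"},
    {"teamID": "T1", "yearID": 2005, "teamName": "New Name"},
]

index = build_index(rows, ["teamID", "yearID"])
print(index[("T1", 2004)]["teamName"])
print(index[("T1", 2005)]["teamName"])
```

A single-column key on teamID alone could not distinguish these two rows; the composite key keeps both.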
As mentioned above, a table marked as a dimension table is replicated on all the hosts, so it must be small. The maximum storage quota for a dimension table in a cluster is set by the controller property controller.dimTable.maxSize. Table creation fails if the table's storage quota exceeds this maximum.
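The sketch below models the creation-time check described above, plus a rough estimate of the replicated footprint (quota multiplied by the number of hosts). It is a minimal illustration, not Pinot's code: parse_size, validate_dim_table_quota and cluster_footprint are hypothetical helpers, and the assumptions that size strings use K/M/G/T suffixes with binary multipliers and that a missing quota skips the check may not match Pinot's actual rules.

```python
import re

# Assumption: binary multipliers (1K = 1024 bytes); Pinot's exact parsing rules may differ.
_UNITS = {"": 1, "K": 1024, "M": 1024 ** 2, "G": 1024 ** 3, "T": 1024 ** 4}


def parse_size(size: str) -> int:
    """Convert a size string such as '200M' to bytes (hypothetical helper)."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([KMGT]?)B?\s*", size.upper())
    if match is None:
        raise ValueError(f"invalid size: {size!r}")
    value, unit = match.groups()
    return int(float(value) * _UNITS[unit])


def validate_dim_table_quota(table_config: dict, max_size: str) -> None:
    """Raise if a dimension table's storage quota exceeds the cluster maximum."""
    if not table_config.get("isDimTable", False):
        return  # The limit applies only to dimension tables.
    quota = table_config.get("quota", {}).get("storage")
    # Assumption: a table with no storage quota is not checked in this sketch.
    if quota is not None and parse_size(quota) > parse_size(max_size):
        raise ValueError(f"storage quota {quota} exceeds max dimension table size {max_size}")


def cluster_footprint(size: str, num_hosts: int) -> int:
    """Rough total bytes stored when every host keeps a full copy of the table."""
    return parse_size(size) * num_hosts


config = {"tableName": "dimBaseballTeams_OFFLINE", "isDimTable": True, "quota": {"storage": "200M"}}
validate_dim_table_quota(config, "500M")  # Passes: 200M is within the limit.
print(cluster_footprint("200M", 3))
```

Calling validate_dim_table_quota(config, "100M") would raise ValueError, mirroring the failed table creation. The footprint estimate shows why the quota matters: the cost of a dimension table grows with the number of hosts, not just with its own size.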