Dimension table
Batch ingestion of data into Apache Pinot using dimension tables.
Dimension tables are a special kind of offline table from which data can be looked up via the lookup UDF, providing join-like functionality.
Dimension tables are replicated on all the hosts for a given tenant to allow faster lookups.
To mark an offline table as a dimension table, set isDimTable to true and segmentsConfig.segmentPushType to REFRESH in the table config.
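A minimal sketch of a table config with these two settings might look like the following. The table name dimBaseballTeams and the 200M storage quota are illustrative assumptions, not values required by Pinot, and a real config will usually carry further sections (tenants, index config, and so on) that are left out here.

```json
{
  "tableName": "dimBaseballTeams",
  "tableType": "OFFLINE",
  "isDimTable": true,
  "segmentsConfig": {
    "segmentPushType": "REFRESH"
  },
  "quota": {
    "storage": "200M"
  },
  "metadata": {}
}
```

Note that isDimTable sits at the top level of the table config, while segmentPushType is nested under segmentsConfig.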
Because dimension tables are used to look up dimension values, they must have a primary key, which can be a composite key.
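The primary key is declared in the table's schema through primaryKeyColumns. The sketch below assumes a hypothetical dimBaseballTeams schema with a single-column key; the column names are made up for illustration, and a composite key would list several columns in the array.

```json
{
  "schemaName": "dimBaseballTeams",
  "primaryKeyColumns": ["teamID"],
  "dimensionFieldSpecs": [
    { "name": "teamID", "dataType": "STRING" },
    { "name": "teamName", "dataType": "STRING" }
  ]
}
```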
When a table is marked as a dimension table, it is replicated on all the hosts, so it must be small in size.
The maximum size quota for a dimension table in a cluster is controlled by the controller.dimTable.maxSize controller property. Table creation will fail if the storage quota exceeds this maximum size.
A dimension table cannot be part of a hybrid table.