Table
A table is a logical abstraction to refer to a collection of related data. It consists of columns and rows (documents).
Data in Pinot tables is sharded into segments. A Pinot table is modeled as a Helix resource. Each segment of a table is modeled as a Helix Partition.
A table is typically associated with a schema, which is used to define the names, data types and other information of the columns of the table.
There are three types of Pinot table:
Table type | Description |
Offline | Offline tables ingest pre-built Pinot segments from external data stores. |
Realtime | Realtime tables ingest data from streams (such as Kafka) and build segments. |
Hybrid | A hybrid Pinot table has both a realtime and an offline table under the hood. |
Note that a query is not aware of whether the underlying tables are offline or realtime. It specifies only the table name. For example, regardless of whether we have an offline table myTable_OFFLINE, a realtime table myTable_REALTIME, or a hybrid table containing both of these, the query simply uses myTable, as in select count(*) from myTable.
A table config file defines the table properties, such as name, type, indexing, routing, and retention. It is written in JSON and stored in the property store in ZooKeeper, along with the table schema.
Offline Table Config
Here's an example table config for an offline table
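A minimal sketch of such a config, restricted to the top-level fields described below, might look like the following. The storage value of 140G and the replication of 3 match the figures used in the quota discussion; everything else is an illustrative assumption: the table name myTable, the tenant names, the limit of 300 queries per second, and the placement of replication under segmentsConfig. The sections left empty would need real settings in an actual deployment.

```json
{
  "tableName": "myTable",
  "tableType": "OFFLINE",
  "quota": {
    "storage": "140G",
    "maxQueriesPerSecond": 300
  },
  "routing": {},
  "segmentsConfig": {
    "replication": "3"
  },
  "tableIndexConfig": {},
  "tenants": {
    "broker": "DefaultTenant",
    "server": "DefaultTenant"
  },
  "metadata": {}
}
```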
We will now discuss each section of the table config in detail.
Top level fields
Top level field | Description |
tableName | Specifies the name of the table. Should contain only alphanumeric characters, hyphens ('-'), or underscores ('_'). A double underscore ('__') is not allowed, because it is reserved for other features within Pinot. |
tableType | Defines the table type: OFFLINE for an offline table, REALTIME for a realtime table. |
quota | This section defines properties related to quotas, such as storage quota and query quota. For more details scroll down to quota |
routing | This section defines how the broker selects the servers to route queries to, and how the broker can prune segments based on segment metadata. For more details, scroll down to routing |
segmentsConfig | This section defines the properties related to the segments of the table, such as segment push frequency, type, retention, schema, time column etc. For more details scroll down to segmentsConfig |
tableIndexConfig | This section helps configure indexing and dictionary encoding related information for the Pinot table. For more details head over to tableIndexConfig |
tenants | Defines the server and broker tenants used for this table. More details about tenants can be found in Tenant. |
metadata | This section is for keeping custom configs, which are expressed as key value pairs. |
Second level fields
quota
quota fields | Description |
storage | The maximum storage space the table is allowed to use, before replication. For example, in the above table, the storage is 140G and the replication is 3. Therefore, the maximum total storage the table is allowed to use across all replicas is 140G × 3 = 420G. The space used by the table is calculated by adding up the sizes of all segments from every server hosting this table. Once this limit is reached, offline segment push throws a |
maxQueriesPerSecond | The maximum queries per second allowed to execute on this table. If query volume exceeds this, a |