Table
The tables below show the properties that can be set at the table level.
Top-level fields
Property | Description |
---|---|
tableName | Specifies the name of the table. It should contain only alphanumeric characters, hyphens ('-'), or underscores ('_'). Two notes: although the hyphen is allowed in table names, it is also a reserved character in SQL, so if you use it you must double-quote the table name in your queries. A double underscore ('__') is not allowed because it is reserved for other features within Pinot. |
tableType | Defines the table type: |
isDimTable | Boolean field to indicate whether the table is a dimension table |
quota | Defines properties related to quotas, such as storage quota and query quota. For details, see the Quota table below. |
task | Defines the enabled minion tasks for the table. See Minion for more details. |
routing | Defines the properties that determine how the broker selects the servers to route queries to, and how the broker can prune segments based on segment metadata. For details, see the Routing table below. |
query | Defines the properties related to query execution. For details, see the Query table below. |
segmentsConfig | Defines the properties related to the segments of the table, such as segment push frequency, type, retention, schema, time column etc. For details, see the segmentsConfig table below. |
tableIndexConfig | Defines the indexing related information for the Pinot table. For details, see Table indexing config below. |
fieldConfigList | Specifies the columns and the type of indices to be created on those columns. See Field config list for sub-properties. |
tenants | Defines the server and broker tenant used for this table. For details, see Tenant below. |
ingestionConfig | Defines the configurations needed for ingestion level transformations. For details, see Ingestion Level Transformations and Ingestion Level Aggregations. |
upsertConfig | Sets upsert configurations. For details, see Stream ingestion with upsert. |
dedupConfig | Sets deduplication configurations. For details, see Stream ingestion with Dedup. |
tierConfigs | Defines configurations for tiered storage. For details, see Tiered Storage. |
metadata | Contains other metadata of the table. It has a string-to-string map field "customConfigs" that holds custom configurations as key-value pairs. |
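The naming rules for tableName can be checked before a config is submitted. The following is a minimal sketch, not Pinot's own validation code: the helper names `check_table_name` and `quote_for_sql` are hypothetical, and the checks cover only the rules stated in the table above.

```python
import re

# Allowed characters per the tableName description: alphanumerics, '-' and '_'.
_ALLOWED = re.compile(r"[A-Za-z0-9_-]+")


def check_table_name(name):
    """Return a list of problems; an empty list means the name passes these checks."""
    problems = []
    if not _ALLOWED.fullmatch(name):
        problems.append("contains characters other than alphanumerics, '-' or '_'")
    if "__" in name:
        problems.append("contains the reserved double underscore '__'")
    return problems


def quote_for_sql(name):
    """Double-quote names that contain a hyphen, since '-' is reserved in SQL."""
    return f'"{name}"' if "-" in name else name
```

With this sketch, a name such as `web-events` passes the checks but must be written as `"web-events"` in queries, while `web__events` is rejected.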
Second-level fields
The following properties can be nested inside the top-level configurations.
Quota
Property | Description |
---|---|
storage | The maximum storage space the table is allowed to use before replication. For example, in the above table, the storage is 140G and replication is 3, so the maximum storage the table is allowed to use is 140G x 3 = 420G. The space the table uses is calculated by adding up the sizes of all segments from every server hosting this table. Once this limit is reached, offline segment push throws a |
maxQueriesPerSecond | The maximum queries per second allowed to execute on this table. If query volume exceeds this, a |
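The storage quota arithmetic described above can be sketched as follows. This is an illustration only: the function names are hypothetical, sizes are plain numbers in gigabytes, and treating "limit reached" as greater than or equal is an assumption of the sketch.

```python
def max_total_storage_gb(storage_quota_gb, replication):
    """Total space allowed across all replicas: quota (before replication) x replication."""
    return storage_quota_gb * replication


def is_quota_reached(segment_sizes_gb, storage_quota_gb, replication):
    """Sum the segment sizes reported by every server and compare with the limit."""
    used_gb = sum(segment_sizes_gb)
    return used_gb >= max_total_storage_gb(storage_quota_gb, replication)
```

For the example in the table, `max_total_storage_gb(140, 3)` gives 420, matching 140G x 3 = 420G.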
Routing
Find details on configuring routing here.
Property | Description |
---|---|
segmentPrunerTypes | The list of segment pruners to be enabled. The segment pruner prunes the selected segments based on the query. Supported values:
|
instanceSelectorType | The server instances to serve the query based on selected segments. Supported values:
|
Query
Property | Description |
---|---|
timeoutMs | Query timeout in milliseconds |
disableGroovy | Whether to disable groovy in query. This overrides the broker instance level config ( |
useApproximateFunction | Whether to automatically use approximate function for expensive aggregates, such as |
expressionOverrideMap | A map that configures the expressions to override in the query. This can be useful when users cannot control the queries sent to Pinot (e.g. queries auto-generated by some other tools), but want to override the expressions within the query (e.g. override a transform function to a derived column). Example: |
Segments config
Property | Description |
---|---|
schemaName | Name of the schema associated with the table |
timeColumnName | The name of the time column for this table. This must match with the time column name in the schema. This is mandatory for tables with push type |
replication | Number of replicas for the table. A replication value of 1 means segments won't be replicated across servers. |
retentionTimeUnit | Unit for the retention, such as For example, |
retentionTimeValue | A numeric value for the retention. This, in combination with |
segmentPushType (Deprecated starting 0.7.0 or commit 9eaea9. Use IngestionConfig -> BatchIngestionConfig -> segmentPushType ) | Can be either:
|
segmentPushFrequency (Deprecated starting 0.7.0 or commit 9eaea9. Use IngestionConfig -> BatchIngestionConfig -> segmentPushFrequency ) | The cadence at which segments are pushed, such as |
Table index config
Property | Description |
---|---|
invertedIndexColumns | The list of columns that inverted index should be created on. The name of columns should match the schema. e.g. in the table above, inverted index has been created on three columns |
createInvertedIndexDuringSegmentGeneration | Boolean to indicate whether to create inverted indexes during segment creation. Defaults to false, meaning inverted indexes are created when the segments are loaded on the server. |
sortedColumn | The column which is sorted in the data and hence will have a sorted index. This does not need to be specified for the offline table, as the segment generation job will automatically detect the sorted column in the data and create a sorted index for it. |
bloomFilterColumns | The list of columns to apply bloom filter on. The names of the columns should match the schema. For more details about using bloom filters refer to Bloom Filter. |
bloomFilterConfigs | The map from the column to the bloom filter config. The names of the columns should match the schema. For more details about using bloom filters refer to Bloom Filter. |
rangeIndexColumns | The list of columns that range index should be created on. Typically used for numeric columns and mostly on metrics. e.g. |
rangeIndexVersion | Version of the range index, 2 (latest) by default. |
starTreeIndexConfigs | The list of StarTree indexing configs for creating StarTree indexes. For details on how to configure this, see StarTree Index. |
enableDefaultStarTree | Boolean to indicate whether to create a default StarTree index for the segment. For details, see StarTree Index. |
enableDynamicStarTreeCreation | Boolean to indicate whether to allow creating StarTree when server loads the segment. StarTree creation could potentially consume a lot of system resources, so this config should be enabled when the servers have the free system resources to create the StarTree. |
noDictionaryColumns | The set of columns that should not be dictionary-encoded. The name of columns should match the schema. NoDictionary dimension columns are LZ4 compressed, while the metrics are not compressed. |
onHeapDictionaryColumns | The list of columns for which the dictionary should be created on heap |
varLengthDictionaryColumns | The list of columns for which the variable length dictionary needs to be enabled in offline segments. This is only valid for string and bytes columns and has no impact for columns of other data types. |
jsonIndexColumns | The list of columns to create the JSON index. See JSON Index for more details. |
jsonIndexConfigs | The map from column to JSON index config. See JSON Index for more details. |
segmentPartitionConfig | Use
Example:
|
loadMode | Indicates how the segments will be loaded onto the server:
|
columnMinMaxValueGeneratorMode | Generate min max values for columns. Supported values:
|
nullHandlingEnabled | Boolean to indicate whether to keep track of null values as part of the segment generation. This is required when using |
aggregateMetrics | (deprecated, use Ingestion Aggregation) (only applicable for stream) set to |
optimizeDictionaryForMetrics | Set to |
noDictionarySizeRatioThreshold | If |
segmentNameGeneratorType | Type of segmentNameGenerator, default is See more on Segment Name Generator Spec |
Field Config List
Specify the columns and the type of indices to be created on those columns. Currently, not all index types can use this property. The following indexes are supported:
Property | Description |
---|---|
name | Name of the column |
encodingType | Should be one of |
indexTypes | List of indexes to create on this column. Valid values are the ids of the index types (text, fst, h3, etc) |
properties | JSON of key-value pairs containing additional properties associated with the index. The following properties are supported currently -
|
The property indexType (singular, accepting a single index id as a string) is also supported for compatibility reasons, but we recommend using the plural form so that several indexes can be defined for the same column.
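When migrating older configs, the singular indexType can be folded into the plural indexTypes. The following is a hedged sketch of such a migration step, not part of Pinot: `normalize_index_types` is a hypothetical helper that works on a field config already parsed into a Python dict, and the column name used in the usage note is made up.

```python
def normalize_index_types(field_config):
    """Return a copy of a field config that uses only the plural 'indexTypes' key."""
    normalized = dict(field_config)
    # Remove the legacy singular key, if present.
    legacy = normalized.pop("indexType", None)
    index_types = list(normalized.get("indexTypes", []))
    # Keep the legacy index id unless the plural list already names it.
    if legacy is not None and legacy not in index_types:
        index_types.append(legacy)
    normalized["indexTypes"] = index_types
    return normalized
```

For example, `{"name": "notes", "indexType": "text"}` becomes `{"name": "notes", "indexTypes": ["text"]}`. The input dict is left unchanged.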
Warning:
If you remove the forwardIndexDisabled property above to regenerate the forward index for multi-value (MV) columns, note that the following invariants cannot be maintained after regenerating the forward index for a column that had its forward index disabled:
Ordering guarantees of the MV values within a row
If entries within an MV row are duplicated, the duplicates will be lost. Regenerate the segments via your offline jobs and re-push / refresh the data to get back the original MV data with duplicates.
We will work on removing the second invariant in the future.
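A toy model helps show why both invariants are lost. It assumes the forward index is rebuilt only from per-value row membership (as an inverted index would record it). This is a simplification for illustration, not Pinot's actual regeneration code, and both function names are hypothetical.

```python
def build_inverted_index(rows):
    """Map each value to the set of row ids that contain it."""
    inverted = {}
    for row_id, values in enumerate(rows):
        for value in values:
            # A set records membership only: no position, no repeat count.
            inverted.setdefault(value, set()).add(row_id)
    return inverted


def rebuild_rows(inverted, num_rows):
    """Reconstruct MV rows from the inverted index alone."""
    rows = [[] for _ in range(num_rows)]
    for value in sorted(inverted):
        for row_id in inverted[value]:
            rows[row_id].append(value)
    return rows
```

In this model, an original row `["b", "a", "a"]` comes back as `["a", "b"]`: the original ordering is gone and the duplicate entry has been dropped.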
Real-time table config
The sections below apply to real-time tables only.
segmentsConfig
Property | Description |
---|---|
replicasPerPartition | The number of replicas per partition for the stream |
completionMode | Determines whether the segment should be downloaded from another server or built in memory. Can be |
peerSegmentDownloadScheme | Protocol to use to download segments from the server. Can be one of |
Indexing config
The streamConfigs section has been deprecated as of release 0.7.0. See streamConfigMaps instead.
Tenants
Property | Description |
---|---|
broker | Broker tenant in which the segment should reside |
server | Server tenant in which the segment should reside |
tagOverrideConfig | Override the tenant for segment if it fulfills certain conditions. Currently, only support override on |
Example
Environment variables override
Pinot allows users to define environment variables in the format of ${ENV_NAME} or ${ENV_NAME:DEFAULT_VALUE} as field values in the table config. The Pinot instance replaces them with the actual values at runtime.
Braces are required when defining the environment variable. "$ENV_NAME" is not supported.
Environment variables used without default value in table config have to be available to all Pinot components - Controller, Broker, Server, and Minion. Otherwise, querying/consumption will be affected depending on the service to which these variables are not available.
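The placeholder syntax can be illustrated with a small resolver. This is a minimal sketch of the substitution rule described above, not Pinot's implementation: the `resolve` function and its regular expression are assumptions, and raising an error for a missing variable without a default is a choice made for the sketch.

```python
import os
import re

# Matches ${ENV_NAME} and ${ENV_NAME:DEFAULT_VALUE}; a bare $ENV_NAME does not match.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)(?::([^}]*))?\}")


def resolve(value, env=None):
    """Replace placeholders in a field value using env (defaults to os.environ)."""
    env = os.environ if env is None else env

    def substitute(match):
        name, default = match.group(1), match.group(2)
        if name in env:
            return env[name]
        if default is not None:
            return default
        # Sketch-only behavior: fail loudly when nothing can be substituted.
        raise KeyError(f"environment variable {name} is not set and has no default")

    return _PLACEHOLDER.sub(substitute, value)
```

Passing a dict as `env` makes the behavior easy to try without touching the real environment: `resolve("${REGION:us-west-2}", {})` falls back to the default, while a value written as `$REGION` is returned unchanged because the braces are missing.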
Below is an example of setting AWS credentials as part of the table config using environment variables.
Example:
Sample configurations
Offline table
Real-time table
Here's an example table config for a real-time table. All the fields from the offline table config are valid for the real-time table. Additionally, real-time tables use some extra fields.