Ingestion
The ingestion configuration ('ingestionConfig') is a section of the table configuration that specifies how to ingest streaming data into Pinot.
ingestionConfig
| Config key | Description |
| --- | --- |
| | See the streamConfigMaps section for details. |
| | Set to |
| | Set to |
| | Set to |
streamConfigMaps
| Config key | Description | Supported values |
| --- | --- | --- |
| | The streaming platform to ingest data from | |
| | Whether to use a per-partition low-level consumer or a high-level stream consumer | |
| | Topic or data source to ingest data from | String |
| | List of brokers | |
| | Name of the class used to parse the data. The class should implement the | String. Available options: |
| | Name of the factory class that provides the appropriate low-level and high-level consumer implementations, as well as the metadata | String. Available options: |
| | Determines the offset from which to start ingestion | |
| | Specifies the data format to ingest via a stream. The value of this property should match the format of the data in the stream. | |
| | Maximum elapsed time after which a consuming segment is persisted. This time should be smaller than the Kafka retention period configured for the corresponding topic. | String, such as |
| `realtime.segment.flush.threshold.rows` | The maximum number of rows to consume before persisting the consuming segment. This value is divided across the partitions each server consumes (see the formula below). If it is set to 0, the segment size threshold in the last row of this table is used instead. | Default is 5,000,000 |
| `realtime.segment.flush.threshold.segment.rows` | The maximum number of rows to consume before persisting the consuming segment, applied directly to each segment. Added in `release-1.2.0`. | Int |
| | Size the completed segments should be. This value is used when `realtime.segment.flush.threshold.rows` is set to 0. | String, such as |
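The flush threshold time must be smaller than the Kafka retention period of the topic, otherwise Kafka may delete messages before the consuming segment that holds them is persisted. A minimal Python sketch of that sanity check, assuming the flush time has already been parsed into a `timedelta` and the retention is read from the topic's Kafka `retention.ms` setting; the helper `flush_time_within_retention` is hypothetical and not part of Pinot.

```python
from datetime import timedelta


def flush_time_within_retention(flush_threshold_time: timedelta, retention_ms: int) -> bool:
    """Check that a consuming segment is persisted before Kafka can delete its source data.

    Hypothetical helper (not a Pinot API). `flush_threshold_time` is assumed to be
    already parsed from the table's flush time setting; `retention_ms` is the
    topic's Kafka `retention.ms` value in milliseconds.
    """
    retention = timedelta(milliseconds=retention_ms)
    # The flush time must be strictly smaller than the retention period.
    return flush_threshold_time < retention


# A 6-hour flush time against a topic that keeps data for 7 days (604,800,000 ms).
print(flush_time_within_retention(timedelta(hours=6), 604_800_000))  # True
# A 2-day flush time against a topic that keeps data for only 1 day.
print(flush_time_within_retention(timedelta(days=2), 86_400_000))  # False
```

Comparing `timedelta` objects keeps the check independent of how the duration string was written.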
The number of rows per segment is computed using the following formula:

`rows per segment = realtime.segment.flush.threshold.rows / maxPartitionsConsumedByServer`

For example, if you set `realtime.segment.flush.threshold.rows = 1000` and each server consumes 10 partitions, the rows per segment is 1000 / 10 = 100.
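The formula can be sketched in Python as follows. The function name is hypothetical, and rounding down with integer division is an assumption, since the documented example divides evenly.

```python
def rows_per_segment(flush_threshold_rows: int, max_partitions_consumed_by_server: int) -> int:
    """Rows per segment derived from realtime.segment.flush.threshold.rows.

    Hypothetical helper illustrating the formula; integer division (rounding
    down) is an assumption, not documented Pinot behavior.
    """
    if max_partitions_consumed_by_server <= 0:
        raise ValueError("a server must consume at least one partition")
    # The threshold is shared across all partitions the server consumes.
    return flush_threshold_rows // max_partitions_consumed_by_server


print(rows_per_segment(1000, 10))  # 100
```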
Since `release-1.2.0`, we introduced `realtime.segment.flush.threshold.segment.rows`, which is used directly as the number of rows per segment.
Taking the example above, if you set `realtime.segment.flush.threshold.segment.rows = 1000` and each server consumes 10 partitions, the rows per segment is 1000.
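The sketch below contrasts the two row-based settings for a given partition count. It is illustrative only: the function name is hypothetical, the precedence of `realtime.segment.flush.threshold.segment.rows` over `realtime.segment.flush.threshold.rows` when both are set is an assumption, and returning `None` stands in for the size-based threshold that applies when `realtime.segment.flush.threshold.rows` is 0.

```python
from typing import Optional


def effective_rows_per_segment(
    max_partitions_consumed_by_server: int,
    threshold_rows: int = 5_000_000,
    threshold_segment_rows: Optional[int] = None,
) -> Optional[int]:
    """Per-segment row limit under the two row-based flush settings (sketch).

    threshold_rows         -> realtime.segment.flush.threshold.rows (default 5,000,000)
    threshold_segment_rows -> realtime.segment.flush.threshold.segment.rows
    Assumption: when both are set, the segment-level value takes precedence.
    Returns None when threshold_rows is 0, meaning the size threshold governs.
    """
    if max_partitions_consumed_by_server <= 0:
        raise ValueError("a server must consume at least one partition")
    if threshold_segment_rows is not None:
        # Used directly; the partition count does not affect it.
        return threshold_segment_rows
    if threshold_rows == 0:
        # Row-based flushing is disabled; the segment size threshold applies.
        return None
    # Divided across the partitions the server consumes.
    return threshold_rows // max_partitions_consumed_by_server


print(effective_rows_per_segment(10, threshold_rows=1000))  # 100
print(effective_rows_per_segment(10, threshold_segment_rows=1000))  # 1000
print(effective_rows_per_segment(10, threshold_rows=0))  # None
```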
Example table config with ingestionConfig