Ingestion Job Spec
The ingestion job spec is used while generating, running, and pushing segments from the input files.
The job spec can be written in either YAML or JSON format (0.5.0 onwards). Property names remain the same in both formats.
To use the JSON format, add the property `job-spec-format=json` to the properties file while launching the ingestion job. The properties file can be passed as follows:
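For instance, a sketch of such a launch (the `LaunchDataIngestionJob` command and its `-jobSpecFile`/`-propertyFile` flags are from the Pinot admin CLI; all paths are illustrative):

```properties
# job.config: tells the launcher the job spec is JSON rather than YAML
job-spec-format=json
```

```bash
bin/pinot-admin.sh LaunchDataIngestionJob \
  -jobSpecFile /path/to/jobSpec.json \
  -propertyFile /path/to/job.config
```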
Template your job spec file
You can define variables in the job spec file, turning it into a template, and then pass in the variable values at runtime.
Templating is based on Groovy's SimpleTemplateEngine.
For example, a user can specify the following in the job spec file:
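A hypothetical spec fragment (the variable names `year`, `month`, `day`, and `hour` are illustrative):

```yaml
inputDirURI: 'file:///path/to/input/${year}/${month}/${day}/${hour}'
```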
The values for the template strings in the job spec file can be passed in one of the following three ways, listed in order of precedence: for the same key, (1) overrides (2), which overrides (3).
Values from the `-values` array passed on the command line
Values from the environment variables
Values from the propertyFile
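The override order can be sketched in plain shell (this only mimics the precedence; Pinot resolves template values internally):

```shell
# 3) lowest precedence: a value from the property file
printf 'year=2019\n' > job.config
prop_year=$(grep '^year=' job.config | cut -d= -f2)

# 2) an environment variable overrides the property file
env_year=2020

# 1) a -values entry from the command line overrides both
cli_year=2021

# Resolve using first-non-empty, mirroring the precedence 1 > 2 > 3
resolved=${cli_year:-${env_year:-$prop_year}}
echo "$resolved"   # prints 2021

rm -f job.config
```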
Taking the `inputDirURI` above as an example, we can define a `job.config` file with the following content:
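For example (hypothetical keys and values):

```properties
year=2020
month=05
day=01
hour=00
```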
These properties can be overridden by environment variables:
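For example, overriding a hypothetical `month` key:

```shell
# Environment variables take precedence over property-file values
export month=06
echo "$month"
```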
From the command line, users can further override those keys using the `-values` flag:
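A sketch of such an invocation (flag shape per the Pinot admin CLI; keys and values are illustrative):

```bash
bin/pinot-admin.sh LaunchDataIngestionJob \
  -jobSpecFile /path/to/jobSpec.yaml \
  -propertyFile /path/to/job.config \
  -values day=03 hour=04
```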
After that, the actual ingestion spec passed to the ingestion job will have `inputDirURI` set to 'file:///path/to/input/2020/06/03/04'.
Ingestion Job Spec
The following configurations are supported by Pinot:
Execution Framework Spec
These configs specify the execution framework used to ingest data. Each supported framework has its own set of related configs.
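A representative fragment for the standalone framework (runner class names as found in Pinot's batch-ingestion plugins; verify them against your Pinot version):

```yaml
executionFrameworkSpec:
  name: 'standalone'
  segmentGenerationJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner'
  segmentTarPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentTarPushJobRunner'
  segmentUriPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentUriPushJobRunner'
```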
Table Spec
The table spec specifies the table into which data should be populated, along with its schema.
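A minimal sketch (the table name and controller address are hypothetical):

```yaml
tableSpec:
  tableName: 'myTable'
  schemaURI: 'http://localhost:9000/tables/myTable/schema'
  tableConfigURI: 'http://localhost:9000/tables/myTable'
```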
Record Reader Spec
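For example, a CSV reader spec (class names as shipped in Pinot's CSV input-format plugin; verify against your version):

```yaml
recordReaderSpec:
  dataFormat: 'csv'
  className: 'org.apache.pinot.plugin.inputformat.csv.CSVRecordReader'
  configClassName: 'org.apache.pinot.plugin.inputformat.csv.CSVRecordReaderConfig'
```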
Segment Name Generator Spec
To set the segment name to be the same as the input file name (without the trailing .gz), use:
Note that `$` in the YAML file must be escaped, since Pinot uses Groovy's SimpleTemplateEngine to process the YAML file, and a raw `$` is treated as a template specifier.
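One way to express this (assuming the `inputFile` generator type and its `file.path.pattern` / `segment.name.template` configs); note the escaped `\$`:

```yaml
segmentNameGeneratorSpec:
  type: inputFile
  configs:
    file.path.pattern: '.+/(.+)\.gz'
    segment.name.template: '\${filePathPattern:\1}'
```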
Pinot Cluster Spec
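A minimal sketch (the controller address is hypothetical):

```yaml
pinotClusterSpecs:
  - controllerURI: 'http://localhost:9000'
```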