This section is an overview of the various options for importing data into Pinot.
There are multiple options for importing data into Pinot. These guides are ready-made examples that show you step-by-step instructions for importing records into Pinot, supported by our plugin architecture.
These guides are meant to get you up and running with imported data as quickly as possible. Pinot supports multiple file input formats without needing to change anything other than the file name. Each example imports a ready-made dataset so you can see how things work without needing to bring your own dataset.
This guide will show you how to import data using stream ingestion from Apache Kafka topics.
This guide will show you how to import data using stream ingestion with upsert.
Pinot File Systems
By default, Pinot does not come with a storage layer, so the data sent to it won't survive a system crash. In order to persistently store the generated segments, you will need to change the controller and server configs to add a deep storage. The individual file system guides cover all the related configs; the general pattern for the controller and server looks like this, where [scheme] is the URI scheme of the file system (e.g. hdfs, s3, gs, abfss):
#CONTROLLER
pinot.controller.storage.factory.class.[scheme]=className of the Pinot file system
pinot.controller.segment.fetcher.protocols=file,http,[scheme]
pinot.controller.segment.fetcher.[scheme].class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher
#SERVER
pinot.server.storage.factory.class.[scheme]=className of the Pinot file system
pinot.server.segment.fetcher.protocols=file,http,[scheme]
pinot.server.segment.fetcher.[scheme].class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher
These guides will show you how to import data as well as persist it in the file systems.
Pinot Input Formats
These guides will show you how to import data from a Pinot supported input format.
This guide will show you how to handle complex types in the ingested data, such as maps and arrays.
Spark
Pinot supports Apache Spark as a processor to create and push segment files to the database. The Pinot distribution is bundled with the Spark code to process your files and convert and upload them to Pinot.
You can follow the build instructions to build the Pinot distribution from source. The resulting JAR file can be found in pinot/target/pinot-all-${PINOT_VERSION}-jar-with-dependencies.jar
Next, you need to change the execution config in the job spec to the following -
You can check out the sample job spec here.
Now, add the Pinot JAR to Spark's classpath using the following options -
Please ensure environment variables PINOT_ROOT_DIR and PINOT_VERSION are set properly.
Finally execute the spark job using the command -
Note: You should change the master to yarn and deploy-mode to cluster for production.
# executionFrameworkSpec: Defines ingestion jobs to be running.
executionFrameworkSpec:
  # name: execution framework name
  name: 'spark'
  # segmentGenerationJobRunnerClassName: class name implements org.apache.pinot.spi.ingestion.batch.runner.IngestionJobRunner interface.
  segmentGenerationJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.spark.SparkSegmentGenerationJobRunner'
  # segmentTarPushJobRunnerClassName: class name implements org.apache.pinot.spi.ingestion.batch.runner.IngestionJobRunner interface.
  segmentTarPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.spark.SparkSegmentTarPushJobRunner'
  # segmentUriPushJobRunnerClassName: class name implements org.apache.pinot.spi.ingestion.batch.runner.IngestionJobRunner interface.
  segmentUriPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.spark.SparkSegmentUriPushJobRunner'
  # segmentMetadataPushJobRunnerClassName: class name implements org.apache.pinot.spi.ingestion.batch.runner.IngestionJobRunner interface.
  segmentMetadataPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.spark.SparkSegmentMetadataPushJobRunner'
  # extraConfigs: extra configs for execution framework.
  extraConfigs:
    # stagingDir is used in distributed filesystem to host all the segments then move this directory entirely to output directory.
    stagingDir: your/local/dir/staging
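For reference, the classpath options and spark-submit invocation mentioned above might look like the following sketch. It assumes PINOT_ROOT_DIR and PINOT_VERSION are set as noted below; the deploy mode, paths, and job spec file name are assumptions.
# Sketch only: paths and the job spec file are assumptions for your environment.
spark-submit \
  --class org.apache.pinot.tools.admin.command.LaunchDataIngestionJobCommand \
  --master local --deploy-mode client \
  --conf "spark.driver.extraJavaOptions=-Dplugins.dir=${PINOT_ROOT_DIR}/plugins" \
  --conf "spark.driver.extraClassPath=${PINOT_ROOT_DIR}/lib/pinot-all-${PINOT_VERSION}-jar-with-dependencies.jar" \
  ${PINOT_ROOT_DIR}/lib/pinot-all-${PINOT_VERSION}-jar-with-dependencies.jar \
  -jobSpecFile /path/to/spark-job-spec.yaml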
Pinot batch ingestion involves two parts: the routine ingestion job (hourly/daily) and backfill. Here are some tutorials on how routine batch ingestion works for a Pinot offline table:
Organize raw data into buckets (eg: /var/pinot/airlineStats/rawdata/2014/01/01). Each bucket typically contains several files (eg: /var/pinot/airlineStats/rawdata/2014/01/01/airlineStats_data_2014-01-01_0.avro)
Run a Pinot batch ingestion job, which points to a specific date folder like ‘/var/pinot/airlineStats/rawdata/2014/01/01’. The segment generation job will convert each such avro file into a Pinot segment for that day and give it a unique name.
Run a Pinot segment push job to upload those segments with those unique names via a Controller API
IMPORTANT: The segment name is the unique identifier used to uniquely identify that segment in Pinot. If the controller gets an upload request for a segment with the same name - it will attempt to replace it with the new one.
This newly uploaded data can now be queried in Pinot. However, sometimes users will make changes to the raw data which need to be reflected in Pinot. This process is known as 'Backfill'.
How to Backfill data in Pinot
Pinot supports data modification only at the segment level, which means we should update entire segments for doing backfills. The high level idea is to repeat steps 2 (segment generation) and 3 (segment upload) mentioned above:
Backfill jobs must run at the same granularity as the daily job. E.g., if you need to backfill data for 2014/01/01, specify that input folder for your backfill job (e.g.: ‘/var/pinot/airlineStats/rawdata/2014/01/01’)
The backfill job will then generate segments with the same name as the original job (with the new data).
When uploading those segments to Pinot, the controller will replace the old segments with the new ones (segment names act like primary keys within Pinot) one by one.
Edge case
Backfill jobs expect the same number of (or more) data files on the backfill date, so that the segment generation job creates the same number of (or more) segments as the original run.
E.g. assuming table airlineStats has 2 segments (airlineStats_2014-01-01_2014-01-01_0, airlineStats_2014-01-01_2014-01-01_1) on date 2014/01/01 and the backfill input directory contains only 1 input file, the segment generation job will create just one segment: airlineStats_2014-01-01_2014-01-01_0. After the segment push job, only segment airlineStats_2014-01-01_2014-01-01_0 is replaced, and the stale data in segment airlineStats_2014-01-01_2014-01-01_1 is still there.
In case the raw data is modified in such a way that the original time bucket has fewer input files than the first ingestion run, backfill will fail.
Hadoop
Segment Creation and Push
Pinot supports Apache Hadoop as a processor to create and push segment files to the database. The Pinot distribution is bundled with the Hadoop code to process your files and convert and upload them to Pinot.
You can follow the build instructions to build the Pinot distribution from source. The resulting JAR file can be found in pinot/target/pinot-all-${PINOT_VERSION}-jar-with-dependencies.jar
Next, you need to change the execution config in the job spec to the following -
# executionFrameworkSpec: Defines ingestion jobs to be running.
executionFrameworkSpec:
  # name: execution framework name
  name: 'hadoop'
  # segmentGenerationJobRunnerClassName: class name implements org.apache.pinot.spi.ingestion.batch.runner.IngestionJobRunner interface.
  segmentGenerationJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.hadoop.HadoopSegmentGenerationJobRunner'
  # segmentTarPushJobRunnerClassName: class name implements org.apache.pinot.spi.ingestion.batch.runner.IngestionJobRunner interface.
  segmentTarPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.hadoop.HadoopSegmentTarPushJobRunner'
  # segmentUriPushJobRunnerClassName: class name implements org.apache.pinot.spi.ingestion.batch.runner.IngestionJobRunner interface.
  segmentUriPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.hadoop.HadoopSegmentUriPushJobRunner'
  # segmentMetadataPushJobRunnerClassName: class name implements org.apache.pinot.spi.ingestion.batch.runner.IngestionJobRunner interface.
  segmentMetadataPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.hadoop.HadoopSegmentMetadataPushJobRunner'
  # extraConfigs: extra configs for execution framework.
  extraConfigs:
    # stagingDir is used in distributed filesystem to host all the segments then move this directory entirely to output directory.
    stagingDir: your/local/dir/staging
You can check out the sample job spec here.
Finally execute the hadoop job using the command -
Please ensure environment variables PINOT_ROOT_DIR and PINOT_VERSION are set properly.
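The invocation referred to above might look something like this sketch; the main class, JAR path, and job spec path are assumptions, and PINOT_ROOT_DIR and PINOT_VERSION are assumed to be set as noted.
# Sketch only: the admin main class and file paths are assumptions.
hadoop jar \
  ${PINOT_ROOT_DIR}/lib/pinot-all-${PINOT_VERSION}-jar-with-dependencies.jar \
  org.apache.pinot.tools.admin.PinotAdministrator \
  LaunchDataIngestionJob \
  -jobSpecFile /path/to/hadoop-job-spec.yaml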
Data Preprocessing before Segment Creation
We’ve seen some requests that data should be massaged (like partitioning, sorting, resizing) before creating and pushing segments to Pinot.
The MapReduce job called SegmentPreprocessingJob would be the best fit for this use case, regardless of whether the input data is of AVRO or ORC format.
Check the below example to see how to use SegmentPreprocessingJob.
In Hadoop properties, set the following to enable this job:
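The exact property is not reproduced here; as an assumption based on common Pinot setups, it may look like the following. Verify the key name against your Pinot version.
# Assumption: property name may differ by version.
enable.preprocessing = true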
In table config, specify the operations in preprocessing.operations that you'd like to enable in the MR job, and then specify the exact configs regarding those operations:
preprocessing.num.reducers
Minimum number of reducers. Optional. Used when partitioning is disabled and resizing is enabled. This parameter is to avoid having too many small input files for Pinot, which leads to the Pinot server holding too many small segments and hence too many threads.
preprocessing.max.num.records.per.file
Maximum number of records per reducer. Optional. Unlike preprocessing.num.reducers, this parameter is to avoid having too few large input files for Pinot, which misses the advantage of multi-threading when querying. When not set, each reducer will generate one output file. When set (e.g. to M), the original output file will be split into multiple files, and each new output file will contain at most M records. It does not matter whether partitioning is enabled or not.
For more details, please refer to the documentation for this MR job.
Dimension Table
Dimension tables in Apache Pinot.
Dimension tables are a special kind of offline table from which data can be looked up via the lookup UDF, providing join-like functionality. These dimension tables are replicated on all the hosts for a given tenant to allow faster lookups.
To mark an offline table as a dimension table, the configuration isDimTable should be set to true in the table config, as shown below.
As dimension tables are used to perform lookups of dimension values, they are required to have a primary key (which can be a composite key).
As mentioned above, when a table is marked as a dimension table, it will be replicated on all the hosts; because of this, the size of the dimension table has to be small. The maximum size quota for a dimension table in a cluster is controlled by the controller.dimTable.maxSize controller property. Table creation will fail if the storage quota exceeds this maximum size.
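A sketch of such a table config; the table name and segment settings are assumptions for illustration.
{
  "tableName": "dimBaseballTeams",
  "tableType": "OFFLINE",
  "isDimTable": true,
  "segmentsConfig": {
    "schemaName": "dimBaseballTeams",
    "segmentPushType": "REFRESH"
  },
  "tenants": {},
  "tableIndexConfig": {
    "loadMode": "mmap"
  },
  "metadata": {}
}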
Batch ingestion allows users to create a table using data already present in a file system such as S3. This is particularly useful for the cases where the user wants to utilize Pinot's ability to query large data with minimal latency or test out new features using a simple data file.
Ingesting data from a filesystem involves the following steps -
Define Schema
Define Table Config
Upload Schema and Table configs
Upload data
Batch Ingestion currently supports the following mechanisms to upload the data -
Standalone
Here we'll take a look at the standalone local processing to get you started.
Let's create a table for the following CSV data source.
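For illustration, assume transcript data along these lines; only score and timestampInEpoch are named in the text, so the remaining columns and values are assumptions.
studentID,firstName,lastName,gender,subject,score,timestampInEpoch
200,Lucy,Smith,Female,Maths,3.8,1570863600000
201,Bob,King,Male,Physics,3.2,1571036400000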
Create Schema Configuration
In our data, the only column on which aggregations can be performed is score. Secondly, timestampInEpoch is the only timestamp column. So, on our schema, we keep score as metric and timestampInEpoch as timestamp column.
Here, we have also defined two extra fields - format and granularity. The format specifies the formatting of our timestamp column in the data source. Currently it is in milliseconds, hence we have specified 1:MILLISECONDS:EPOCH.
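A sketch of the schema, assuming the CSV columns shown earlier (the dimension column names are assumptions).
{
  "schemaName": "transcript",
  "dimensionFieldSpecs": [
    {"name": "studentID", "dataType": "INT"},
    {"name": "firstName", "dataType": "STRING"},
    {"name": "lastName", "dataType": "STRING"},
    {"name": "gender", "dataType": "STRING"},
    {"name": "subject", "dataType": "STRING"}
  ],
  "metricFieldSpecs": [
    {"name": "score", "dataType": "FLOAT"}
  ],
  "dateTimeFieldSpecs": [
    {
      "name": "timestampInEpoch",
      "dataType": "LONG",
      "format": "1:MILLISECONDS:EPOCH",
      "granularity": "1:MILLISECONDS"
    }
  ]
}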
Create Table Configuration
We define a table transcript and map the schema created in the previous step to the table. For batch data, we keep the tableType as OFFLINE.
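A minimal OFFLINE table config sketch; replication and the other settings shown are assumptions.
{
  "tableName": "transcript",
  "tableType": "OFFLINE",
  "segmentsConfig": {
    "timeColumnName": "timestampInEpoch",
    "schemaName": "transcript",
    "replication": "1"
  },
  "tenants": {},
  "tableIndexConfig": {
    "loadMode": "mmap"
  },
  "metadata": {}
}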
Upload Schema and Table
Now that we have both the configs, we can simply upload them and create a table. To achieve that, just run the command -
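Assuming the config and schema are saved locally, the admin command would be along these lines (file paths are assumptions).
# Sketch only: adjust file paths to your environment.
bin/pinot-admin.sh AddTable \
  -tableConfigFile /path/to/transcript-table-offline.json \
  -schemaFile /path/to/transcript-schema.json \
  -exec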
Check out the table config and schema in the Rest API to make sure they were successfully uploaded.
Upload data
We now have an empty table in Pinot. As the next step, we will upload our CSV file to this table.
A table is composed of multiple segments. The segments can be created in three ways:
1) Minion based ingestion
2) Upload API
3) Ingestion jobs
Minion Based Ingestion
Refer to the Minion-based ingestion documentation.
Upload API
There are 2 Controller APIs that can be used for a quick ingestion test using a small file.
When these APIs are invoked, the controller has to download the file and build the segment locally.
Hence, these APIs are NOT meant for production environments or for large input files.
/ingestFromFile
This API creates a segment using the given file and pushes it to Pinot. All steps happen on the controller. Example usage:
To upload a JSON file data.json to a table called foo_OFFLINE, use the command below.
Note that query params need to be URLEncoded. For example, {"inputFormat":"json"} in the command below needs to be converted to %7B%22inputFormat%22%3A%22json%22%7D.
The batchConfigMapStr can be used to pass in additional properties needed for decoding the file. For example, in case of csv, you may need to provide the delimiter
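A sketch of such a request, assuming a controller at localhost:9000.
# Sketch only: controller host/port are assumptions.
curl -X POST -F file=@data.json \
  "http://localhost:9000/ingestFromFile?tableNameWithType=foo_OFFLINE&batchConfigMapStr=%7B%22inputFormat%22%3A%22json%22%7D"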
/ingestFromURI
This API creates a segment using file at the given URI and pushes it to Pinot. Properties to access the FS need to be provided in the batchConfigMap. All steps happen on the controller.
Example usage:
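A sketch of such a request; the sourceURIStr parameter name, the controller address, and the S3 URI are assumptions, and the batchConfigMapStr value must be URL-encoded as described above.
# Sketch only: parameter names and URI are assumptions.
curl -X POST \
  "http://localhost:9000/ingestFromURI?tableNameWithType=foo_OFFLINE&batchConfigMapStr=<url-encoded-batch-config>&sourceURIStr=s3://bucket/path/to/data.json"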
Ingestion Jobs
Segments can be created and uploaded using tasks known as DataIngestionJobs. A job also needs a config of its own. We call this config the JobSpec.
For our CSV file and table, the job spec should look like below.
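A sketch of a standalone job spec for this table; the directories and the controller URI are assumptions.
executionFrameworkSpec:
  name: 'standalone'
  segmentGenerationJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner'
  segmentTarPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentTarPushJobRunner'
jobType: SegmentCreationAndTarPush
inputDirURI: '/path/to/rawdata/transcript/'        # assumption
includeFileNamePattern: 'glob:**/*.csv'
outputDirURI: '/path/to/segments/transcript/'      # assumption
overwriteOutput: true
pinotFSSpecs:
  - scheme: file
    className: org.apache.pinot.spi.filesystem.LocalPinotFS
recordReaderSpec:
  dataFormat: 'csv'
  className: 'org.apache.pinot.plugin.inputformat.csv.CSVRecordReader'
  configClassName: 'org.apache.pinot.plugin.inputformat.csv.CSVRecordReaderConfig'
tableSpec:
  tableName: 'transcript'
pinotClusterSpecs:
  - controllerURI: 'http://localhost:9000'          # assumption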
You can refer to the ingestion job spec documentation for more details.
Now that we have the job spec for our table transcript, we can trigger the job using the following command:
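A sketch of the launch command; the job spec path is an assumption.
# Sketch only: adjust the job spec path.
bin/pinot-admin.sh LaunchDataIngestionJob -jobSpecFile /path/to/transcript-job-spec.yaml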
Once the job has successfully finished, you can head over to the query console and start playing with the data.
Segment Push Job Type
There are 3 ways to upload a Pinot segment:
1. Segment Tar Push
This is the original and default push mechanism.
Tar push requires the segment to be stored locally, or to be opened as an InputStream on PinotFS, so that the entire segment tar file can be streamed to the controller.
The push job will:
Upload the entire segment tar file to the Pinot controller.
Pinot controller will:
Save the segment into the controller segment directory (local or any PinotFS).
Extract segment metadata.
Add the segment to the table.
2. Segment URI Push
This push mechanism requires the segment Tar file stored on a deep store with a globally accessible segment tar URI.
URI push is lightweight on the client side, while the controller side requires the same amount of work as the Tar push.
The push job will:
POST this segment Tar URI to the Pinot controller.
Pinot controller will:
Download the segment from the URI and save it to the controller segment directory (local or any PinotFS).
Extract segment metadata.
Add the segment to the table.
3. Segment Metadata Push
This push mechanism also requires the segment Tar file stored on a deep store with a globally accessible segment tar URI.
Metadata push is lightweight on the controller side; no deep store download is involved on the controller side.
The push job will:
Download the segment based on URI.
Extract metadata.
Upload metadata to the Pinot Controller.
Pinot Controller will:
Add the segment to the table based on the metadata.
Segment Fetchers
When Pinot segment files are created in external systems (Hadoop/Spark/etc.), there are several ways to push that data to the Pinot Controller and Server:
Push the segment to a shared NFS and let Pinot pull the segment files from that NFS location.
Push the segment to a web server and let Pinot pull the segment files from the web server with an HTTP/HTTPS link.
Push the segment to PinotFS (HDFS/S3/GCS/ADLS) and let Pinot pull the segment files from the PinotFS URI.
These options are supported out of the box within the Pinot package. As long as your remote jobs send the Pinot controller the corresponding URI to the files, it will pick up the files and allocate them to the proper Pinot servers and brokers. To enable Pinot support for PinotFS, you will need to provide the configuration and proper Hadoop dependencies.
Persistence
By default, Pinot does not come with a storage layer, so the data sent to it won't survive a system crash. In order to persistently store the generated segments, you will need to change the controller and server configs to add deep storage. Check out the file system guides for all the info and related configs.
Tuning
Standalone
Since Pinot is written in Java, you can set the following basic Java configurations to tune the segment runner job -
Log4j2 file location with -Dlog4j2.configurationFile
Plugin directory location with -Dplugins.dir=/opt/pinot/plugins
JVM props, like -Xmx8g -Xms4G
If you are using Docker, you can set the following under the JAVA_OPTS variable.
Hadoop
You can set -D mapreduce.map.memory.mb=8192 to set the mapper memory size when submitting the Hadoop job.
Spark
You can add config spark.executor.memory to tune the memory usage for segment creation when submitting the Spark job.
Amazon Kinesis
This is not tested in production. You may hit some snags while trying to use this.
To ingest events from an Amazon Kinesis stream into Pinot, set the following configs into the table config
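A minimal streamConfigs sketch for a Kinesis table; the stream name, region, and especially the consumer factory and decoder class names are assumptions.
{
  "streamType": "kinesis",
  "stream.kinesis.topic.name": "my-kinesis-stream",
  "region": "us-west-1",
  "shardIteratorType": "LATEST",
  "stream.kinesis.consumer.type": "lowlevel",
  "stream.kinesis.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kinesis.KinesisConsumerFactory",
  "stream.kinesis.decoder.class.name": "org.apache.pinot.plugin.inputformat.json.JSONMessageDecoder"
}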
where the Kinesis specific properties are:
streamType - This should be set to "kinesis"
stream.kinesis.topic.name - Kinesis stream name
region - Kinesis region, e.g. us-west-1
accessKey - Kinesis access key
secretKey - Kinesis secret key
shardIteratorType - Set to LATEST to consume from the latest offset (default), or TRIM_HORIZON to consume from the earliest offset
Kinesis supports authentication using the AWS default credential provider chain. The credential provider looks for the credentials in the following order -
Environment variables - AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY (RECOMMENDED since they are recognized by all the AWS SDKs and the CLI except for .NET), or AWS_ACCESS_KEY and AWS_SECRET_KEY (only recognized by the Java SDK)
Java system properties - aws.accessKeyId and aws.secretKey
Web Identity Token credentials from the environment or container
Credential profiles file at the default location (~/.aws/credentials) shared by all AWS SDKs and the AWS CLI
Credentials delivered through the Amazon EC2 container service if the AWS_CONTAINER_CREDENTIALS_RELATIVE_URI environment variable is set and the security manager has permission to access the variable
Instance profile credentials delivered through the Amazon EC2 metadata service
You can also specify the accessKey and secretKey using the properties above. However, this method is not secure and should be used only for POC setups. You can also specify other AWS fields such as AWS_SESSION_TOKEN as environment variables and config, and it will work.
Limitations
ShardID is of the format "shardId-000000000001". We use the numeric part as the partitionId. Our partitionId variable is an integer, so if shardIds grow beyond Integer.MAX_VALUE, we will overflow.
Segment-size-based thresholds for segment completion will not work. The implementation assumes that partition "0" always exists; however, once shard 0 is split/merged, we will no longer have partition 0.
This guide shows you how to import data from HDFS.
You can enable the Hadoop DFS using the plugin pinot-hdfs. In the controller or server, add the config:
By default Pinot loads all the plugins, so you can just drop this plugin there. Also, if you specify -Dplugins.include, you need to put all the plugins you want to use, e.g. pinot-json, pinot-avro , pinot-kafka-2.0...
HDFS implementation provides the following options -
hadoop.conf.path : Absolute path of the directory containing hadoop XML configuration files such as hdfs-site.xml, core-site.xml .
hadoop.write.checksum : create checksum while pushing an object. Default is false
Each of these properties should be prefixed by pinot.[node].storage.factory.class.hdfs., where node is either controller or server depending on the component being configured, as in the snippet below.
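A controller-side sketch; the HadoopPinotFS class name and the Hadoop conf path are assumptions about the pinot-hdfs plugin.
# Sketch only: class name and paths are assumptions.
pinot.controller.storage.factory.class.hdfs=org.apache.pinot.plugin.filesystem.HadoopPinotFS
pinot.controller.storage.factory.class.hdfs.hadoop.conf.path=/path/to/hadoop/conf
pinot.controller.storage.factory.class.hdfs.hadoop.write.checksum=true
pinot.controller.segment.fetcher.protocols=file,http,hdfs
pinot.controller.segment.fetcher.hdfs.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher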
The Kerberos configs should be used only if your Hadoop installation is secured with Kerberos. Please check the Hadoop documentation on how to generate Kerberos security identification.
You will also need to provide proper Hadoop dependencies jars from your Hadoop installation to your Pinot startup scripts.
Push HDFS segment to Pinot Controller
To push HDFS segment files to Pinot controller, you just need to ensure you have proper Hadoop configuration as we mentioned in the previous part. Then your remote segment creation/push job can send the HDFS path of your newly created segment files to the Pinot Controller and let it download the files.
For example, the following curl request to the Controller will notify it to download the segment files into the proper table:
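A sketch of such a request; the URI-upload headers, controller address, table name, and HDFS path are assumptions.
# Sketch only: headers, host, and paths are assumptions.
curl -X POST -H "UPLOAD_TYPE:URI" \
  -H "DOWNLOAD_URI:hdfs://namenode/path/to/segment.tar.gz" \
  -H "content-type:application/json" -d '' \
  "http://localhost:9000/segments?tableName=myTable"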
Examples
Job spec
Standalone Job:
Hadoop Job:
Controller config
Server config
Minion config
Amazon S3
You can enable Amazon S3 Filesystem backend by including the plugin pinot-s3 .
By default Pinot loads all the plugins, so you can just drop this plugin there. Also, if you specify -Dplugins.include, you need to put all the plugins you want to use, e.g. pinot-json, pinot-avro , pinot-kafka-2.0...
You can also configure the S3 filesystem using the following options:
region - The AWS data center region in which the bucket is located
accessKey - (Optional) AWS access key required for authentication. This should only be used for testing purposes, as we don't store these keys in a secret store.
secretKey - (Optional) AWS secret key required for authentication. This should only be used for testing purposes, as we don't store these keys in a secret store.
endpoint - (Optional) Override endpoint for the S3 client.
disableAcl - If this is set to false, the bucket owner is granted full access to the objects created by Pinot. The default value is true.
serverSideEncryption - (Optional) The server-side encryption algorithm used when storing this object in Amazon S3 (currently aws:kms is supported); set to null to disable SSE.
ssekmsKeyId - (Optional, but required when serverSideEncryption=aws:kms) Specifies the AWS KMS key ID to use for object encryption. All GET and PUT requests for an object protected by AWS KMS will fail if they are not made via SSL or using SigV4.
ssekmsEncryptionContext - (Optional) Specifies the AWS KMS encryption context to use for object encryption. The value of this header is a base64-encoded UTF-8 string holding JSON with the encryption context key-value pairs.
The S3 filesystem supports authentication using the AWS default credential provider chain. The credential provider looks for the credentials in the following order -
Environment variables - AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY (RECOMMENDED since they are recognized by all the AWS SDKs and the CLI except for .NET), or AWS_ACCESS_KEY and AWS_SECRET_KEY (only recognized by the Java SDK)
Java system properties - aws.accessKeyId and aws.secretKey
Web Identity Token credentials from the environment or container
Credential profiles file at the default location (~/.aws/credentials) shared by all AWS SDKs and the AWS CLI
Credentials delivered through the Amazon EC2 container service if the AWS_CONTAINER_CREDENTIALS_RELATIVE_URI environment variable is set and the security manager has permission to access the variable
Instance profile credentials delivered through the Amazon EC2 metadata service
You can also specify the accessKey and secretKey using the properties above. However, this method is not secure and should be used only for POC setups.
Each of these properties should be prefixed by pinot.[node].storage.factory.s3., where node is either controller or server depending on the component being configured. For example:
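A controller-side sketch; the S3PinotFS class name and region value are assumptions about the pinot-s3 plugin.
# Sketch only: class name and region are assumptions.
pinot.controller.storage.factory.class.s3=org.apache.pinot.plugin.filesystem.S3PinotFS
pinot.controller.storage.factory.s3.region=us-west-2
pinot.controller.segment.fetcher.protocols=file,http,s3
pinot.controller.segment.fetcher.s3.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher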
Examples
Job spec
Controller config
Server config
Minion config
Google Cloud Storage
This guide shows you how to import data from GCP (Google Cloud Platform).
You can enable the Google Cloud Storage using the plugin pinot-gcs. In the controller or server, add the config -
By default Pinot loads all the plugins, so you can just drop this plugin there. Also, if you specify -Dplugins.include, you need to put all the plugins you want to use, e.g. pinot-json, pinot-avro , pinot-kafka-2.0...
The GCS filesystem provides the following options -
projectId - The name of the Google Cloud Platform project under which you have created your storage bucket.
gcpKey - Location of the JSON file containing the GCP keys. Refer to the GCP documentation on creating service account keys to download the keys.
Each of these properties should be prefixed by pinot.[node].storage.factory.class.gs., where node is either controller or server depending on the component being configured,
e.g.
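A controller-side sketch; the GcsPinotFS class name, project ID, and key path are assumptions about the pinot-gcs plugin.
# Sketch only: class name, project, and key path are assumptions.
pinot.controller.storage.factory.class.gs=org.apache.pinot.plugin.filesystem.GcsPinotFS
pinot.controller.storage.factory.class.gs.projectId=my-project
pinot.controller.storage.factory.class.gs.gcpKey=/path/to/gcp-key.json
pinot.controller.segment.fetcher.protocols=file,http,gs
pinot.controller.segment.fetcher.gs.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher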
Examples
Job spec
Controller config
Server config
Minion config
Azure Data Lake Storage
This guide shows you how to import data from files stored in Azure Data Lake Storage (ADLS)
You can enable the Azure Data Lake Storage using the plugin pinot-adls. In the controller or server, add the config -
By default Pinot loads all the plugins, so you can just drop this plugin there. Also, if you specify -Dplugins.include, you need to put all the plugins you want to use, e.g. pinot-json, pinot-avro , pinot-kafka-2.0...
Azure Data Lake Storage provides the following options -
accountName : Name of the Azure account under which the storage is created
accessKey : Access key required for authentication
fileSystemName : Name of the file system (container) to use
Each of these properties should be prefixed by pinot.[node].storage.factory.class.abfss., where node is either controller or server depending on the component being configured,
e.g.
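A controller-side sketch; the ADLSGen2PinotFS class name and the account/container values are assumptions about the pinot-adls plugin.
# Sketch only: class name and account values are assumptions.
pinot.controller.storage.factory.class.abfss=org.apache.pinot.plugin.filesystem.ADLSGen2PinotFS
pinot.controller.storage.factory.class.abfss.accountName=my-account
pinot.controller.storage.factory.class.abfss.accessKey=<access-key>
pinot.controller.storage.factory.class.abfss.fileSystemName=my-container
pinot.controller.segment.fetcher.protocols=file,http,abfss
pinot.controller.segment.fetcher.abfss.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher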
Examples
Job spec
Controller config
Server config
Minion config
Stream Ingestion with Upsert
Upsert support in Apache Pinot.
Pinot provides native support for upsert during real-time ingestion (v0.6.0+). There are scenarios in which records need modifications, such as correcting a ride fare or updating a delivery status.
With the foundation of full upsert support in Pinot, another category of use cases, partial upsert, is enabled (v0.8.0+). Partial upsert is convenient because users only need to specify the columns whose values change and can ignore the others.
To enable upsert on a Pinot table, there are a couple of configurations to make in the table configuration as well as on the input stream.
To update a record, a primary key is needed to uniquely identify the record. To define a primary key, add the field primaryKeyColumns to the schema definition. For example, the schema definition of UpsertMeetupRSVP in the quick start example has such a definition.
Note this field expects a list of columns, as the primary key can be composite.
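A sketch of the relevant schema fragment; the column name event_id is taken from the partial upsert example later on this page and is an assumption for other tables.
{
  "schemaName": "meetupRsvp",
  "primaryKeyColumns": ["event_id"]
}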
When two records with the same primary key are ingested, the record with the greater event time (as defined by the time column) is used. When records have the same primary key and the same event time, the order is not determined. In most cases the later ingested record will be used, but this may not hold when the table has a column to sort by.
Partition the input stream by the primary key
An important requirement for the Pinot upsert table is to partition the input stream by the primary key. For Kafka messages, this means the producer shall set the key in the send API. If the original stream is not partitioned, then a streaming processing job (e.g. Flink) is needed to shuffle and repartition the input stream into a partitioned one for Pinot's ingestion.
Enable upsert in the table configurations
There are a few configurations needed in the table configurations to enable upsert.
Upsert mode
For append-only tables, the upsert mode defaults to NONE. To enable full upsert, set the mode to FULL.
Pinot also added partial update support in v0.8.0+. To enable partial upsert, set the mode to PARTIAL and specify partialUpsertStrategies for the partial upsert columns. Examples of both modes are shown below.
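Sketches of the corresponding upsertConfig blocks in the table config; the column name rsvp_count is an assumption based on the example later on this page.
"upsertConfig": {
  "mode": "FULL"
}

"upsertConfig": {
  "mode": "PARTIAL",
  "partialUpsertStrategies": {
    "rsvp_count": "INCREMENT"
  }
}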
Pinot supports the following partial upsert strategies -
OVERWRITE - Overwrite the column of the last record
INCREMENT - Add the new value to the existing values
APPEND - Add the new item to the Pinot unordered set
UNION - Add the new item to the Pinot unordered set if it does not exist
Comparison Column
By default, Pinot uses the value in the time column to determine the latest record. That means, for two records with the same primary key, the record with the larger value of the time column is picked as the latest update. However, there are cases when users need to use another column to determine the order. In such cases, you can use the option comparisonColumn to override the column used for comparison. For example:
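A sketch of the comparison column override; the column name is an assumption.
"upsertConfig": {
  "mode": "FULL",
  "comparisonColumn": "anotherTimeColumn"
}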
Use strictReplicaGroup for routing
An upsert Pinot table can use only the low-level consumer for the input streams. As a result, it uses the partitioned replica-group assignment for the segments. Moreover, upsert poses the additional requirement that all segments of the same partition must be served from the same server to ensure data consistency across the segments. Accordingly, it requires using strictReplicaGroup as the routing strategy. To use that, configure instanceSelectorType in Routing as follows:
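A sketch of the routing fragment in the table config.
"routing": {
  "instanceSelectorType": "strictReplicaGroup"
}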
Limitations
There are some limitations for the upsert Pinot tables.
First, the high-level consumer is not allowed for the input stream ingestion, which means stream.kafka.consumer.type must be lowLevel.
Second, the star-tree index cannot be used for indexing, as the star-tree index performs pre-aggregation during the ingestion.
Example
Putting these together, you can find the table configurations of the quick start example as the following:
Quick Start
To illustrate how full upsert works, the Pinot binary comes with a quick start example. Use the following command to create a realtime upsert table meetupRSVP.
You can also run the partial upsert demo with the following command.
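A sketch of the quick start invocation, assuming the standard pinot-admin script; the -type value is an assumption, and the partial upsert demo uses a different -type value not shown here.
# Sketch only: the quick start type name is an assumption.
bin/pinot-admin.sh QuickStart -type upsert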
As soon as data flows into the stream, the Pinot table will consume it and it will be ready for querying. Head over to the Query Console to check out the realtime data.
Query the upsert table
An example for partial upsert is shown below: each event_id stays unique during ingestion, while the value of rsvp_count is incremented. With partial upsert, only the values of the configured columns change, based on the specified partial upsert strategy.
Query the partial upsert table
Explain partial upsert table
To see the difference from the append-only table, you can use a query option skipUpsert to skip the upsert effect in the query result.
Disable the upsert during query via query option
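For example, a query like the following returns all ingested records, including older versions of each primary key; the option syntax may differ across Pinot versions, so treat this as a sketch.
select event_id, rsvp_count from meetupRsvp option(skipUpsert=true)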
Apache Kafka
This guide shows you how to ingest a stream of records from an Apache Kafka topic into a Pinot table.
Introduction
In this guide, you'll learn how to import data into Pinot using Apache Kafka for real-time stream ingestion. Pinot has out-of-the-box real-time ingestion support for Kafka.
Let's set up a demo Kafka cluster locally and create a sample topic transcript-topic.
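A sketch of such a local setup on a Docker network named pinot-demo; the Kafka image, ZooKeeper address, and script paths are assumptions.
# Sketch only: image, ZooKeeper address, and paths are assumptions.
docker run --network pinot-demo --name=kafka \
  -e KAFKA_ZOOKEEPER_CONNECT=pinot-quickstart:2123/kafka \
  -e KAFKA_BROKER_ID=0 \
  -e KAFKA_ADVERTISED_HOST_NAME=kafka \
  -d wurstmeister/kafka:latest

docker exec -t kafka /opt/kafka/bin/kafka-topics.sh \
  --zookeeper pinot-quickstart:2123/kafka \
  --create --topic transcript-topic --partitions 1 --replication-factor 1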
Stream ingestion
Apache Pinot allows users to consume data from streams and push it directly to the Pinot database. This process is known as Stream Ingestion. Stream Ingestion allows users to query data within seconds of publishing.
Stream Ingestion provides support for checkpoints out of the box for preventing data loss.
Next, we create the real-time table configuration for the transcript table, using the schema described in the previous step.
For Kafka, we use streamType as kafka. Currently only the JSON format is supported, but you can easily write your own decoder by extending the StreamMessageDecoder interface. You can then make your decoder class available by putting the JAR file in the plugins directory.
The lowLevel consumer reads data per partition, whereas the highLevel consumer utilises the Kafka high-level consumer to read data from the whole stream. It doesn't have control over which partition to read at a particular moment.
For Kafka versions below 2.X, use org.apache.pinot.plugin.stream.kafka09.KafkaConsumerFactory
For Kafka version 2.X and above, use
org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory
You can set the offset to -
smallest to start consumer from the earliest offset
largest to start consumer from the latest offset
timestamp in milliseconds to start the consumer from the offset after the timestamp.
The resulting configuration should look as follows -
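A sketch of the streamConfigs block that goes under tableIndexConfig; the broker address and flush thresholds are assumptions.
"streamConfigs": {
  "streamType": "kafka",
  "stream.kafka.consumer.type": "lowlevel",
  "stream.kafka.topic.name": "transcript-topic",
  "stream.kafka.decoder.class.name": "org.apache.pinot.plugin.stream.kafka.KafkaJSONMessageDecoder",
  "stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
  "stream.kafka.broker.list": "localhost:9092",
  "stream.kafka.consumer.prop.auto.offset.reset": "smallest",
  "realtime.segment.flush.threshold.time": "3600000",
  "realtime.segment.flush.threshold.size": "50000"
}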
Upgrade from Kafka 0.9 connector to Kafka 2.x connector
Update table config for both high level and low level consumer: Update config: stream.kafka.consumer.factory.class.name from org.apache.pinot.core.realtime.impl.kafka.KafkaConsumerFactory to org.apache.pinot.core.realtime.impl.kafka2.KafkaConsumerFactory.
If using Stream(High) level consumer: Please also add config stream.kafka.hlc.bootstrap.server into tableIndexConfig.streamConfigs. This config should be the URI of Kafka broker lists, e.g. localhost:9092.
How to consume from higher Kafka version?
This connector is also suitable for Kafka lib versions higher than 2.0.0. In the Kafka 2.0 connector pom.xml, changing kafka.lib.version from 2.0.0 to 2.1.1 will make this connector work with Kafka 2.1.1.
Upload schema and table
Now that we have our table and schema configurations, let's upload them to the Pinot cluster. As soon as the real-time table is created, it will begin ingesting available records from the Kafka topic.
Add sample data to the Kafka topic
We will publish data in the following format to Kafka. Let us save the data in a file named transcript.json.
Push the sample JSON into the transcript-topic Kafka topic using the Kafka console producer. This will add the 12 records described in the transcript.json file to the topic.
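A sketch using the Kafka console producer; the script path and broker address are assumptions.
# Sketch only: script path and broker address are assumptions.
bin/kafka-console-producer.sh --broker-list localhost:9092 --topic transcript-topic < transcript.json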
Ingesting streaming data
As soon as data flows into the stream, the Pinot table will consume it and it will be ready for querying. Head over to the Query Console to check out the real-time data.
Some more Kafka ingestion configs
Use Kafka Partition(Low) Level Consumer with SSL
Here is an example config which uses SSL-based authentication to talk to Kafka and the schema registry. Notice there are two sets of SSL options: the ones starting with ssl. are for the Kafka consumer, and the ones starting with stream.kafka.decoder.prop.schema.registry. are for the SchemaRegistryClient used by KafkaConfluentSchemaRegistryAvroMessageDecoder.
Ingest transactionally committed messages only from Kafka
With Kafka consumer 2.0, you can ingest transactionally committed messages only by configuring kafka.isolation.level to read_committed. For example,
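A minimal streamConfigs fragment; this single property is added alongside the other Kafka settings shown earlier, which are omitted here.
"stream.kafka.isolation.level": "read_committed"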
Note that the default value of this config is read_uncommitted, which reads all messages. Also, this config is supported by the low-level consumer only.
Create table configuration
Upload table and schema spec
Let's take a look at each of the following steps in a bit more detail. Let us assume the data to be ingested is in the following format -
Create Schema Configuration
The schema defines the fields available in the datasource, along with their data types. The schema also defines which fields serve as dimensions, metrics, and the timestamp, respectively.
Follow creating a schema for more details on schema configuration. For our sample data, the schema configuration should look as follows
Create Table Configuration
The next step is to create a table where all the ingested data will flow and can be queried. Unlike batch ingestion, the table configuration for realtime ingestion also triggers the data ingestion job. For a more detailed overview about tables, check out the table reference.
The realtime table configuration consists of the following fields -
tableName - The name of the table where the data should flow
tableType - The internal type for the table. Should always be set to REALTIME for realtime ingestion
segmentsConfig - segment-related configs such as the time column, replication, and retention
tableIndexConfig - defines which column to use for indexing along with the type of index. You can refer to the Indexing Configs reference for the full configuration. It consists of the following required fields -
loadMode - specifies how the segments should be loaded. Should be one of heap or mmap. Here's the difference between the two configs:
heap: Segments are loaded on direct memory. Note, 'heap' here is a legacy misnomer; it does not imply JVM heap. This mode should only be used when we want faster performance than memory-mapped files, and are also sure that we will never run into OOM.
mmap: Segments are loaded on memory-mapped files. This is the default mode.
streamConfig - specifies the datasource along with the necessary configs to start consuming the realtime data. The streamConfig can be thought of as the equivalent of the job spec in the case of batch ingestion. The following options are supported in this config -
streamType - the streaming platform from which to consume the data (e.g. kafka)
stream.[streamType].consumer.type - whether to use a per-partition low-level consumer or the high-level stream consumer (lowLevel or highLevel)
stream.[streamType].topic.name - the datasource (e.g. topic, data stream) from which to consume the data (String)
stream.[streamType].decoder.class.name - name of the class to be used for parsing the data. The class should implement the org.apache.pinot.spi.stream.StreamMessageDecoder interface (String)
stream.[streamType].consumer.factory.class.name - name of the factory class to be used to provide the appropriate implementation of the low-level and high-level consumer as well as the metadata (String)
stream.[streamType].consumer.prop.auto.offset.reset - determines the offset from which to start the ingestion (smallest, largest, or a timestamp in milliseconds)
realtime.segment.flush.threshold.time - Time threshold that the realtime segment will be kept open for before we complete the segment
realtime.segment.flush.threshold.size - Row count flush threshold for realtime segments. This behaves in a similar way for HLC and LLC. For HLC, since there is only one consumer per server, this size is used as the size of the consumption buffer and determines after how many rows we flush to disk. For example, if this threshold is set to two million rows, then a high-level consumer would have a buffer size of two million. If this value is set to 0, then the consumers adjust the number of rows consumed by a partition such that the size of the completed segment is the desired size (unless threshold.time is reached first)
realtime.segment.flush.desired.size - The desired size of a completed realtime segment. This config is used only if threshold.size is set to 0.
You can also specify additional configs for the consumer by prefixing the key with stream.[streamType], where streamType is the name of the streaming platform. For our sample data and schema, the table config will look like the example below -
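A sketch of the resulting REALTIME table config, assuming the transcript schema and the Kafka settings shown earlier; the broker address, replication, and tenant settings are assumptions.
{
  "tableName": "transcript",
  "tableType": "REALTIME",
  "segmentsConfig": {
    "timeColumnName": "timestampInEpoch",
    "schemaName": "transcript",
    "replicasPerPartition": "1"
  },
  "tenants": {},
  "tableIndexConfig": {
    "loadMode": "mmap",
    "streamConfigs": {
      "streamType": "kafka",
      "stream.kafka.consumer.type": "lowlevel",
      "stream.kafka.topic.name": "transcript-topic",
      "stream.kafka.decoder.class.name": "org.apache.pinot.plugin.stream.kafka.KafkaJSONMessageDecoder",
      "stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
      "stream.kafka.broker.list": "localhost:9092"
    }
  },
  "metadata": {}
}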
Upload schema and table config
Now that we have our table and schema configurations, let's upload them to the Pinot cluster. As soon as the configs are uploaded, Pinot will start ingesting available records from the topic.
Custom Ingestion Support
We are working on adding more integrations, such as Kinesis, out of the box. You can easily write your own ingestion plugin in case it is not supported out of the box. Follow Stream Ingestion Plugin for a walkthrough.
Input formats
This section contains a collection of guides that will show you how to import data from a Pinot supported input format.
Pinot offers support for various popular input formats during ingestion. By changing the input format, you can reduce the time spent in serialization and deserialization and speed up the ingestion.
The input format can be changed using the recordReaderSpec config in the ingestion job spec.
The config consists of the following keys -
dataFormat - Name of the data format to consume.
className - name of the class that implements the RecordReader interface. This class is used for parsing the data.
configClassName - name of the class that implements the RecordReaderConfig interface. This class is used to parse the values mentioned in configs.
configs - Key value pair for format specific configs. This field can be left out.
Pinot supports multiple input formats out of the box. You just need to specify the corresponding readers and the associated custom configs to switch between formats.
CSV
CSV Record Reader supports the following configs -
fileFormat - can be one of default, rfc4180, excel, tdf, mysql
header - header of the file. The column names should be separated by the delimiter mentioned in the config
delimiter - The character separating the columns
multiValueDelimiter - The character separating multiple values in a single column. This can be used to split a column into a list.
dataFormat: 'csv'
className: 'org.apache.pinot.plugin.inputformat.csv.CSVRecordReader'
configClassName: 'org.apache.pinot.plugin.inputformat.csv.CSVRecordReaderConfig'
configs:
  fileFormat: 'default' # should be one of default, rfc4180, excel, tdf, mysql
  header: 'columnNames separated by delimiter'
  delimiter: ','
  multiValueDelimiter: '-'
Your CSV file may have raw text fields that cannot be reliably delimited using any character. In this case, explicitly set the multiValueDelimiter field to empty in the ingestion config.
multiValueDelimiter: ''
AVRO
The Avro record reader converts the data in the file to a GenericRecord. A Java class or .avro file is not required.
JSON
Thrift
Note: Thrift requires the class generated from the .thrift file to parse the data. The .class file should be available in Pinot's classpath. You can put the files in the lib/ folder of the Pinot distribution directory.
Parquet
The default Parquet record reader does not read the Parquet INT96 and DECIMAL types.
A separate record reader handles the INT96 and DECIMAL types, with the following conversions (see the sketch below for the reader spec):
INT96 - converted to INT64. The Parquet INT96 type (nanoseconds) is converted to the Pinot INT64 type (milliseconds).
DECIMAL - converted to DOUBLE.
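A recordReaderSpec sketch for the INT96/DECIMAL-capable reader; the class name is an assumption about the pinot-parquet plugin.
dataFormat: 'parquet'
className: 'org.apache.pinot.plugin.inputformat.parquet.ParquetNativeRecordReader'  # assumption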
ORC
ORC record reader supports the following data types -
ORC Data Type - Java Data Type
BOOLEAN - String
SHORT - Integer
INT - Integer
LONG - Integer
FLOAT - Float
DOUBLE - Double
STRING - String
VARCHAR - String
CHAR - String
LIST - Object[]
MAP - Map<Object, Object>
DATE - Long
TIMESTAMP - Long
BINARY - byte[]
BYTE - Integer
In LIST and MAP types, the object should only belong to one of the data types supported by Pinot.
Protocol Buffers
The reader requires a descriptor file to deserialize the data present in the files. You can generate the descriptor file (.desc) from the .proto file using the command -
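For example, assuming protoc is installed and sample.proto is your message definition (paths are assumptions):
# Sketch only: file paths are assumptions.
protoc --include_imports --descriptor_set_out=/path/to/sample.desc /path/to/sample.proto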
Complex Type (Array, Map) Handling
Complex-type handling in Apache Pinot.
It's common for ingested data to have a complex structure. For example, an Avro schema has records and arrays, and JSON data has objects and arrays. In Apache Pinot, the data model supports primitive data types (including int, long, float, double, string, bytes), as well as limited multi-value types such as an array of primitive types. Such simple data types allow Pinot to build fast indexing structures for good query performance, but they require some handling of complex structures. There are, in general, two options for such handling: convert the complex-type data into a JSON string and then build a JSON index; or use the inbuilt complex-type handling rules in the ingestion config.
On this page, we'll show how to handle complex-type structures with these two approaches, using as an example the group field from the Meetup events dataset. Note this object has two child fields, and the child group_topics is a nested array with elements of object type.
Apache Pinot provides a powerful JSON index to accelerate value lookup and filtering for the column. To convert an object group with a complex type to JSON, you can add the following config to the table config.
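A sketch of such a table config fragment; it uses the jsonFormat transform and the column names described below, with the rest of the table config omitted.
"ingestionConfig": {
  "transformConfigs": [
    {
      "columnName": "group_json",
      "transformFunction": "jsonFormat(\"group\")"
    }
  ]
},
"tableIndexConfig": {
  "jsonIndexColumns": ["group_json"]
}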
Note that the config transformConfigs transforms the object group into a JSON string group_json, which is then indexed via the jsonIndexColumns config. To read the full spec, please check out the example table config file. Also note that group is a reserved keyword in SQL, which is why it's quoted in the transformFunction.
Additionally, you need to overwrite the maxLength of the field group_json in the schema, because by default a string column has a limited length. For example,
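A schema fragment like the following; the maximum value shown is an assumption, so pick a limit that fits your data.
{
  "name": "group_json",
  "dataType": "STRING",
  "maxLength": 2147483647
}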
With this, you can start to query the nested fields under group. For details about the supported JSON functions, please check out the JSON index guide.
Handle the complex type with ingestion configurations
Though JSON indexing is a handy way to process the complex types, there are some limitations:
It’s not performant to group by or order by a JSON field, because JSON_EXTRACT_SCALAR is needed to extract the values in the GROUP BY and ORDER BY clauses, which invokes the function evaluation.
Alternatively, from Pinot 0.8, you can use the complex-type handling in ingestion configurations to flatten and unnest the complex structure and convert them into primitive types. Then you can reduce the complex-type data into a flattened Pinot table, and query it via SQL. With the inbuilt processing rules, you do not need to write ETL jobs in another compute framework such as Flink or Spark.
To process this complex-type, you can add the configuration complexTypeConfig to the ingestionConfig. For example:
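A sketch of the complexTypeConfig fragment, using the field names discussed below.
"ingestionConfig": {
  "complexTypeConfig": {
    "fieldsToUnnest": ["group.group_topics"],
    "delimiter": ".",
    "collectionNotUnnestedToJson": "NON_PRIMITIVE"
  }
}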
With the complexTypeConfig, all the map objects will be flattened to direct fields automatically. And with fieldsToUnnest, a record with a nested collection will be unnested into multiple records. For instance, the example at the beginning will transform into two rows with this configuration example.
Flattened/unnested data
Note that
The nested field group_id under group is flattened to the field group.group_id. The default value of the delimiter is .; you can choose another delimiter by changing the configuration delimiter under complexTypeConfig. This flattening rule also applies to maps in the collections to be unnested.
The nested array group_topics under group is unnested into the top level, converting the output into a collection of two rows. Note the handling of the nested field within group_topics, and the eventual top-level field group.group_topics.urlkey. All the collections to unnest shall be included in the configuration fieldsToUnnest.
For the collections not specified in fieldsToUnnest, the ingestion by default will serialize them into a JSON string, except for arrays of primitive values, which will be ingested as a multi-value column by default. This behavior is defined in the config collectionNotUnnestedToJson, with a default value of NON_PRIMITIVE. Other behaviors include (1) ALL, which also converts arrays of primitive values to JSON strings; and (2) NONE, which does no conversion and leaves it to the users to use transform functions for handling.
You can find the full spec of the table config here and the table schema here.
With the flattening/unnesting, you can then query the table with primitive values using a SQL query like the following:
Note . is a reserved character in SQL, so you need to quote the flattened column.
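For instance (the table name meetupRsvp is an assumption):
SELECT "group.group_topics.urlkey", "group.group_id" FROM meetupRsvp LIMIT 10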
Infer the Pinot schema from the Avro schema and JSON data
When there are complex structures, it could be challenging and tedious to figure out the Pinot schema manually. To help the schema inference, Pinot provides utility tools to take the Avro schema or JSON data as input and output the inferred Pinot schema.
To infer the Pinot schema from an Avro schema, you can use a command like the following:
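A sketch of the Avro-based inference command; the command and flag names are assumptions based on the standard pinot-admin script.
# Sketch only: command, flags, and paths are assumptions.
bin/pinot-admin.sh AvroSchemaToPinotSchema \
  -avroSchemaFile /path/to/data.avsc \
  -outputDir /path/to/output \
  -pinotSchemaName myTable \
  -fieldsToUnnest group.group_topics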
Note that you can pass in configurations like fieldsToUnnest, similar to the ones in complexTypeConfig. This simulates the complex-type handling rules on the Avro schema and outputs the inferred Pinot schema in the file specified by outputDir.
Similarly, you can use a command like the following to infer the Pinot schema from a file of JSON objects.
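A sketch of the JSON-based inference command; the command and flag names are assumptions based on the standard pinot-admin script.
# Sketch only: command, flags, and paths are assumptions.
bin/pinot-admin.sh JsonToPinotSchema \
  -jsonFile /path/to/data.json \
  -outputDir /path/to/output \
  -pinotSchemaName myTable \
  -fieldsToUnnest group.group_topics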
You can check out an example of this run in this PR.