Batch ingestion of data into Apache Pinot using dimension tables.
isDimTable: Set to true.

SET taskName = 'myTask-s3';
SET input.fs.className = 'org.apache.pinot.plugin.filesystem.S3PinotFS';
SET input.fs.prop.accessKey = 'my-key';
SET input.fs.prop.secretKey = 'my-secret';
SET input.fs.prop.region = 'us-west-2';
INSERT INTO "baseballStats"
FROM FILE 's3://my-bucket/public_data_set/baseballStats/rawdata/'

// some environmental setup
StreamExecutionEnvironment execEnv = StreamExecutionEnvironment.getExecutionEnvironment();
DataStream<Row> srcRows = execEnv.addSource(new FlinkKafkaConsumer<Row>(...));
RowTypeInfo typeInfo = new RowTypeInfo(
new TypeInformation[]{Types.FLOAT, Types.FLOAT, Types.STRING, Types.STRING},
new String[]{"lon", "lat", "address", "name"});
// add processing logic for the data stream for example:
DataStream<Row> processedRows = srcRows.keyBy(r -> r.getField(0));
...
// configurations for PinotSinkFunction
Schema pinotSchema = ...
TableConfig pinotTableConfig = ...
processedRows.addSink(new PinotSinkFunction<>(
new FlinkRowGenericRowConverter(typeInfo),
pinotTableConfig,
pinotSchema));
// execute the program
execEnv.execute();

{
"tableName" : "tbl_OFFLINE",
"tableType" : "OFFLINE",
"segmentsConfig" : {
// ...
},
"tenants" : {
// ...
},
"tableIndexConfig" : {
// ....
},
"ingestionConfig": {
"batchIngestionConfig": {
"segmentIngestionType": "APPEND",
"segmentIngestionFrequency": "HOURLY",
"batchConfigMaps": [
{
"outputDirURI": "file://path/to/flink/segmentwriter/output/dir",
"overwriteOutput": "false",
"push.controllerUri": "https://target.pinot.cluster.controller.url"
}
]
}
}
}
{
"OFFLINE": {
"tableName": "dimBaseballTeams_OFFLINE",
"tableType": "OFFLINE",
"segmentsConfig": {
"schemaName": "dimBaseballTeams",
"segmentPushType": "REFRESH"
},
"metadata": {},
"quota": {
"storage": "200M"
},
"isDimTable": true
}.
"dimensionTableConfig": {
"disablePreload": true
}
}
}
...
{
"dimensionFieldSpecs": [
{
"dataType": "STRING",
"name": "teamID"
},
{
"dataType": "STRING",
"name": "teamName"
}
],
"schemaName": "dimBaseballTeams",
"primaryKeyColumns": ["teamID"]
}

This guide shows you how to import data from GCP (Google Cloud Platform).
This guide shows you how to import data from files stored in Azure Data Lake Storage Gen2 (ADLS Gen2)
POST /segments/{tableName}/{segmentName}/reload

{
"status": "200"
}

-Dplugins.dir=/opt/pinot/plugins -Dplugins.include=pinot-plugin-to-include-1,pinot-plugin-to-include-2

#CONTROLLER
pinot.controller.storage.factory.class.[scheme]=className of the pinot file system
pinot.controller.segment.fetcher.protocols=file,http,[scheme]
pinot.controller.segment.fetcher.[scheme].class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher

#SERVER
pinot.server.storage.factory.class.[scheme]=className of the Pinot file system
pinot.server.segment.fetcher.protocols=file,http,[scheme]
pinot.server.segment.fetcher.[scheme].class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher

pinotFSSpecs:
- scheme: file
className: org.apache.pinot.spi.filesystem.LocalPinotFS

-Dplugins.dir=/opt/pinot/plugins -Dplugins.include=pinot-gcs

pinot.controller.storage.factory.class.gs.projectId=test-project

executionFrameworkSpec:
name: 'standalone'
segmentGenerationJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner'
segmentTarPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentTarPushJobRunner'
segmentUriPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentUriPushJobRunner'
jobType: SegmentCreationAndTarPush
inputDirURI: 'gs://my-bucket/path/to/input/directory/'
outputDirURI: 'gs://my-bucket/path/to/output/directory/'
overwriteOutput: true
pinotFSSpecs:
- scheme: gs
className: org.apache.pinot.plugin.filesystem.GcsPinotFS
configs:
projectId: 'my-project'
gcpKey: 'path-to-gcp json key file'
recordReaderSpec:
dataFormat: 'csv'
className: 'org.apache.pinot.plugin.inputformat.csv.CSVRecordReader'
configClassName: 'org.apache.pinot.plugin.inputformat.csv.CSVRecordReaderConfig'
tableSpec:
tableName: 'students'
pinotClusterSpecs:
- controllerURI: 'http://localhost:9000'

controller.data.dir=gs://path/to/data/directory/
controller.local.temp.dir=/path/to/local/temp/directory
controller.enable.split.commit=true
pinot.controller.storage.factory.class.gs=org.apache.pinot.plugin.filesystem.GcsPinotFS
pinot.controller.storage.factory.gs.projectId=my-project
pinot.controller.storage.factory.gs.gcpKey=path/to/gcp/key.json
pinot.controller.segment.fetcher.protocols=file,http,gs
pinot.controller.segment.fetcher.gs.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher

pinot.server.instance.enable.split.commit=true
pinot.server.storage.factory.class.gs=org.apache.pinot.plugin.filesystem.GcsPinotFS
pinot.server.storage.factory.gs.projectId=my-project
pinot.server.storage.factory.gs.gcpKey=path/to/gcp/key.json
pinot.server.segment.fetcher.protocols=file,http,gs
pinot.server.segment.fetcher.gs.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher

pinot.minion.storage.factory.class.gs=org.apache.pinot.plugin.filesystem.GcsPinotFS
pinot.minion.storage.factory.gs.projectId=my-project
pinot.minion.storage.factory.gs.gcpKey=path/to/gcp/key.json
pinot.minion.segment.fetcher.protocols=file,http,gs
pinot.minion.segment.fetcher.gs.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher

-Dplugins.dir=/opt/pinot/plugins -Dplugins.include=pinot-adls

pinot.controller.storage.factory.class.adl2.accountName=test-user

executionFrameworkSpec:
name: 'standalone'
segmentGenerationJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner'
segmentTarPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentTarPushJobRunner'
segmentUriPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentUriPushJobRunner'
jobType: SegmentCreationAndTarPush
inputDirURI: 'adl2://path/to/input/directory/'
outputDirURI: 'adl2://path/to/output/directory/'
overwriteOutput: true
pinotFSSpecs:
- scheme: adl2
className: org.apache.pinot.plugin.filesystem.ADLSGen2PinotFS
configs:
accountName: 'my-account'
accessKey: 'foo-bar-1234'
fileSystemName: 'fs-name'
recordReaderSpec:
dataFormat: 'csv'
className: 'org.apache.pinot.plugin.inputformat.csv.CSVRecordReader'
configClassName: 'org.apache.pinot.plugin.inputformat.csv.CSVRecordReaderConfig'
tableSpec:
tableName: 'students'
pinotClusterSpecs:
- controllerURI: 'http://localhost:9000'

controller.data.dir=adl2://path/to/data/directory/
controller.local.temp.dir=/path/to/local/temp/directory
controller.enable.split.commit=true
pinot.controller.storage.factory.class.adl2=org.apache.pinot.plugin.filesystem.ADLSGen2PinotFS
pinot.controller.storage.factory.adl2.accountName=my-account
pinot.controller.storage.factory.adl2.accessKey=foo-bar-1234
pinot.controller.storage.factory.adl2.fileSystemName=fs-name
pinot.controller.segment.fetcher.protocols=file,http,adl2
pinot.controller.segment.fetcher.adl2.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher

pinot.server.instance.enable.split.commit=true
pinot.server.storage.factory.class.adl2=org.apache.pinot.plugin.filesystem.ADLSGen2PinotFS
pinot.server.storage.factory.adl2.accountName=my-account
pinot.server.storage.factory.adl2.accessKey=foo-bar-1234
pinot.server.storage.factory.adl2.fileSystemName=fs-name
pinot.server.segment.fetcher.protocols=file,http,adl2
pinot.server.segment.fetcher.adl2.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher

storage.factory.class.adl2=org.apache.pinot.plugin.filesystem.ADLSGen2PinotFS
storage.factory.adl2.accountName=my-account
storage.factory.adl2.fileSystemName=fs-name
storage.factory.adl2.accessKey=foo-bar-1234
segment.fetcher.protocols=file,http,adl2
segment.fetcher.adl2.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher

This guide shows you how to ingest a stream of records from an Amazon Kinesis topic into a Pinot table.

-Dplugins.dir=/opt/pinot/plugins -Dplugins.include=pinot-hdfs

{
"timestamp": 1687786535928,
"hostname": "host1",
"HOSTNAME": "host1",
"level": "INFO",
"message": "Started processing job1",
"tags": {
"platform": "data",
"service": "serializer",
"params": {
"queueLength": 5,
"timeout": 299,
"userData_noIndex": {
"nth": 99
}
}
}
}

Fields with the _noIndex suffix (for example, userData_noIndex above) are moved into the unindexableExtras field.

# executionFrameworkSpec: Defines ingestion jobs to be running.
executionFrameworkSpec:
# name: execution framework name
name: 'spark'
# segmentGenerationJobRunnerClassName: class name implements org.apache.pinot.spi.ingestion.batch.runner.IngestionJobRunner interface.
segmentGenerationJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.spark.SparkSegmentGenerationJobRunner'
# segmentTarPushJobRunnerClassName: class name implements org.apache.pinot.spi.ingestion.batch.runner.IngestionJobRunner interface.
segmentTarPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.spark.SparkSegmentTarPushJobRunner'
# segmentUriPushJobRunnerClassName: class name implements org.apache.pinot.spi.ingestion.batch.runner.IngestionJobRunner interface.
segmentUriPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.spark.SparkSegmentUriPushJobRunner'
#segmentMetadataPushJobRunnerClassName: class name implements org.apache.pinot.spi.ingestion.batch.runner.IngestionJobRunner interface
segmentMetadataPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.spark.SparkSegmentMetadataPushJobRunner'
# extraConfigs: extra configs for execution framework.
extraConfigs:
# stagingDir is used in distributed filesystem to host all the segments then move this directory entirely to output directory.
stagingDir: your/local/dir/staging

spark.driver.extraClassPath =>
pinot-batch-ingestion-spark-${PINOT_VERSION}-shaded.jar:pinot-all-${PINOT_VERSION}-jar-with-dependencies.jar

spark.driver.extraJavaOptions =>
-Dplugins.dir=${PINOT_DISTRIBUTION_DIR}/plugins

export PINOT_VERSION=0.10.0
export PINOT_DISTRIBUTION_DIR=/path/to/apache-pinot-${PINOT_VERSION}-bin
spark-submit \\
--class org.apache.pinot.tools.admin.command.LaunchDataIngestionJobCommand \\
--master local --deploy-mode client \\
--conf "spark.driver.extraJavaOptions=-Dplugins.dir=${PINOT_DISTRIBUTION_DIR}/plugins" \\
--conf "spark.driver.extraClassPath=${PINOT_DISTRIBUTION_DIR}/plugins-external/pinot-batch-ingestion/pinot-batch-ingestion-spark-2.4/pinot-batch-ingestion-spark-2.4-${PINOT_VERSION}-shaded.jar:${PINOT_DISTRIBUTION_DIR}/lib/pinot-all-${PINOT_VERSION}-jar-with-dependencies.jar" \\
--conf "spark.executor.extraClassPath=${PINOT_DISTRIBUTION_DIR}/plugins-external/pinot-batch-ingestion/pinot-batch-ingestion-spark-2.4/pinot-batch-ingestion-spark-2.4-${PINOT_VERSION}-shaded.jar:${PINOT_DISTRIBUTION_DIR}/lib/pinot-all-${PINOT_VERSION}-jar-with-dependencies.jar" \\
local://${PINOT_DISTRIBUTION_DIR}/lib/pinot-all-${PINOT_VERSION}-jar-with-dependencies.jar -jobSpecFile /path/to/spark_job_spec.yaml

spark-submit \\
--class org.apache.pinot.tools.admin.command.LaunchDataIngestionJobCommand \\
--master yarn --deploy-mode cluster \\
--conf "spark.driver.extraJavaOptions=-Dplugins.dir=${PINOT_DISTRIBUTION_DIR}/plugins" \\
--conf "spark.driver.extraClassPath=pinot-batch-ingestion-spark-2.4/pinot-batch-ingestion-spark-2.4-${PINOT_VERSION}-shaded.jar:pinot-all-${PINOT_VERSION}-jar-with-dependencies.jar" \\
--conf "spark.executor.extraClassPath=pinot-batch-ingestion-spark-2.4/pinot-batch-ingestion-spark-2.4-${PINOT_VERSION}-shaded.jar:pinot-all-${PINOT_VERSION}-jar-with-dependencies.jar" \\
--jars "${PINOT_DISTRIBUTION_DIR}/plugins-external/pinot-batch-ingestion/pinot-batch-ingestion-spark-2.4/pinot-batch-ingestion-spark-2.4-${PINOT_VERSION}-shaded.jar,${PINOT_DISTRIBUTION_DIR}/lib/pinot-all-${PINOT_VERSION}-jar-with-dependencies.jar" \\
--files s3://path/to/spark_job_spec.yaml \\
local://pinot-all-${PINOT_VERSION}-jar-with-dependencies.jar -jobSpecFile spark_job_spec.yaml

# executionFrameworkSpec: Defines ingestion jobs to be running.
executionFrameworkSpec:
# name: execution framework name
name: 'hadoop'
# segmentGenerationJobRunnerClassName: class name implements org.apache.pinot.spi.ingestion.batch.runner.IngestionJobRunner interface.
segmentGenerationJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.hadoop.HadoopSegmentGenerationJobRunner'
# segmentTarPushJobRunnerClassName: class name implements org.apache.pinot.spi.ingestion.batch.runner.IngestionJobRunner interface.
segmentTarPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.hadoop.HadoopSegmentTarPushJobRunner'
# segmentUriPushJobRunnerClassName: class name implements org.apache.pinot.spi.ingestion.batch.runner.IngestionJobRunner interface.
segmentUriPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.hadoop.HadoopSegmentUriPushJobRunner'
# segmentMetadataPushJobRunnerClassName: class name implements org.apache.pinot.spi.ingestion.batch.runner.IngestionJobRunner interface.
segmentMetadataPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.hadoop.HadoopSegmentMetadataPushJobRunner'
# extraConfigs: extra configs for execution framework.
extraConfigs:
# stagingDir is used in distributed filesystem to host all the segments then move this directory entirely to output directory.
stagingDir: your/local/dir/staging

export PINOT_VERSION=0.10.0
export PINOT_DISTRIBUTION_DIR=${PINOT_ROOT_DIR}/build/
export HADOOP_CLIENT_OPTS="-Dplugins.dir=${PINOT_DISTRIBUTION_DIR}/plugins -Dlog4j2.configurationFile=${PINOT_DISTRIBUTION_DIR}/conf/pinot-ingestion-job-log4j2.xml"
hadoop jar \\
${PINOT_DISTRIBUTION_DIR}/lib/pinot-all-${PINOT_VERSION}-jar-with-dependencies.jar \\
org.apache.pinot.tools.admin.PinotAdministrator \\
LaunchDataIngestionJob \\
-jobSpecFile ${PINOT_DISTRIBUTION_DIR}/examples/batch/airlineStats/hadoopIngestionJobSpec.yaml

enable.preprocessing = true
preprocess.path.to.output = <output_path>

{
"OFFLINE": {
"metadata": {
"customConfigs": {
"preprocessing.operations": "resize, partition, sort", // To enable the following preprocessing operations
"preprocessing.max.num.records.per.file": "100", // To enable resizing
"preprocessing.num.reducers": "3" // To enable resizing
}
},
...
"tableIndexConfig": {
"aggregateMetrics": false,
"autoGeneratedInvertedIndex": false,
"bloomFilterColumns": [],
"createInvertedIndexDuringSegmentGeneration": false,
"invertedIndexColumns": [],
"loadMode": "MMAP",
"nullHandlingEnabled": false,
"segmentPartitionConfig": { // To enable partitioning
"columnPartitionMap": {
"item": {
"functionName": "murmur",
"numPartitions": 4
}
}
},
"sortedColumn": [ // To enable sorting
"actorId"
],
"streamConfigs": {}
},
"tableName": "tableName_OFFLINE",
"tableType": "OFFLINE",
"tenants": {
...
}
}
}

export HADOOP_HOME=/local/hadoop/
export HADOOP_VERSION=2.7.1
export HADOOP_GUAVA_VERSION=11.0.2
export HADOOP_GSON_VERSION=2.2.4
export CLASSPATH_PREFIX="${HADOOP_HOME}/share/hadoop/hdfs/hadoop-hdfs-${HADOOP_VERSION}.jar:${HADOOP_HOME}/share/hadoop/common/lib/hadoop-annotations-${HADOOP_VERSION}.jar:${HADOOP_HOME}/share/hadoop/common/lib/hadoop-auth-${HADOOP_VERSION}.jar:${HADOOP_HOME}/share/hadoop/common/hadoop-common-${HADOOP_VERSION}.jar:${HADOOP_HOME}/share/hadoop/common/lib/guava-${HADOOP_GUAVA_VERSION}.jar:${HADOOP_HOME}/share/hadoop/common/lib/gson-${HADOOP_GSON_VERSION}.jar"

curl -X POST -H "UPLOAD_TYPE:URI" -H "DOWNLOAD_URI:hdfs://nameservice1/hadoop/path/to/segment/file.

executionFrameworkSpec:
name: 'standalone'
segmentGenerationJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner'
segmentTarPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentTarPushJobRunner'
segmentUriPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentUriPushJobRunner'
jobType: SegmentCreationAndTarPush
inputDirURI: 'hdfs:///path/to/input/directory/'
outputDirURI: 'hdfs:///path/to/output/directory/'
includeFileNamePath: 'glob:**/*.csv'
overwriteOutput: true
pinotFSSpecs:
- scheme: hdfs
className: org.apache.pinot.plugin.filesystem.HadoopPinotFS
configs:
hadoop.conf.path: 'path/to/conf/directory/'
recordReaderSpec:
dataFormat: 'csv'
className: 'org.apache.pinot.plugin.inputformat.csv.CSVRecordReader'
configClassName: 'org.apache.pinot.plugin.inputformat.csv.CSVRecordReaderConfig'
tableSpec:
tableName: 'students'
pinotClusterSpecs:
- controllerURI: 'http://localhost:9000'

executionFrameworkSpec:
name: 'hadoop'
segmentGenerationJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.hadoop.HadoopSegmentGenerationJobRunner'
segmentTarPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.hadoop.HadoopSegmentTarPushJobRunner'
segmentUriPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.hadoop.HadoopSegmentUriPushJobRunner'
extraConfigs:
stagingDir: 'hdfs:///path/to/staging/directory/'
jobType: SegmentCreationAndTarPush
inputDirURI: 'hdfs:///path/to/input/directory/'
outputDirURI: 'hdfs:///path/to/output/directory/'
includeFileNamePath: 'glob:**/*.csv'
overwriteOutput: true
pinotFSSpecs:
- scheme: hdfs
className: org.apache.pinot.plugin.filesystem.HadoopPinotFS
configs:
hadoop.conf.path: '/etc/hadoop/conf/'
recordReaderSpec:
dataFormat: 'csv'
className: 'org.apache.pinot.plugin.inputformat.csv.CSVRecordReader'
configClassName: 'org.apache.pinot.plugin.inputformat.csv.CSVRecordReaderConfig'
tableSpec:
tableName: 'students'
pinotClusterSpecs:
- controllerURI: 'http://localhost:9000'

controller.data.dir=hdfs://path/to/data/directory/
controller.local.temp.dir=/path/to/local/temp/directory
controller.enable.split.commit=true
pinot.controller.storage.factory.class.hdfs=org.apache.pinot.plugin.filesystem.HadoopPinotFS
pinot.controller.storage.factory.hdfs.hadoop.conf.path=path/to/conf/directory/
pinot.controller.segment.fetcher.protocols=file,http,hdfs
pinot.controller.segment.fetcher.hdfs.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher
pinot.controller.segment.fetcher.hdfs.hadoop.kerberos.principle=<your kerberos principal>
pinot.controller.segment.fetcher.hdfs.hadoop.kerberos.keytab=<your kerberos keytab>

pinot.server.instance.enable.split.commit=true
pinot.server.storage.factory.class.hdfs=org.apache.pinot.plugin.filesystem.HadoopPinotFS
pinot.server.storage.factory.hdfs.hadoop.conf.path=path/to/conf/directory/
pinot.server.segment.fetcher.protocols=file,http,hdfs
pinot.server.segment.fetcher.hdfs.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher
pinot.server.segment.fetcher.hdfs.hadoop.kerberos.principle=<your kerberos principal>
pinot.server.segment.fetcher.hdfs.hadoop.kerberos.keytab=<your kerberos keytab>

storage.factory.class.hdfs=org.apache.pinot.plugin.filesystem.HadoopPinotFS
storage.factory.hdfs.hadoop.conf.path=path/to/conf/directory
segment.fetcher.protocols=file,http,hdfs
segment.fetcher.hdfs.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher
segment.fetcher.hdfs.hadoop.kerberos.principle=<your kerberos principal>
segment.fetcher.hdfs.hadoop.kerberos.keytab=<your kerberos keytab>

{
"timestamp": 1687786535928,
"hostname": "host1",
"level": "INFO",
"message": "Started processing job1",
"tags.platform": "data",
"tags.service": "serializer",
"indexableExtras": {
"tags": {
"params": {
"queueLength": 5,
"timeout": 299
}
}
},
"unindexableExtras": {
"tags": {
"userData_noIndex": {
"nth": 99
}
}
}
}

{
"ingestionConfig": {
"schemaConformingTransformerConfig": {
"indexableExtrasField": "extras",
"unindexableExtrasField": "extrasNoIndex",
"unindexableFieldSuffix": "_no_index",
"fieldPathsToDrop": [
"HOSTNAME"
]
}
}
}

{
"tableName": "kinesisTable",
"tableType": "REALTIME",
"segmentsConfig": {
"timeColumnName": "timestamp",
"replicasPerPartition": "1"
},
"tenants": {},
"tableIndexConfig": {
"loadMode": "MMAP",
"streamConfigs": {
"streamType": "kinesis",
"stream.kinesis.topic.name": "<your kinesis stream name>",
"region": "<your region>",
"accessKey": "<your access key>",
"secretKey": "<your secret key>",
"shardIteratorType": "AFTER_SEQUENCE_NUMBER",
"stream.kinesis.consumer.type": "lowlevel",
"stream.kinesis.fetch.timeout.millis": "30000",
"stream.kinesis.decoder.class.name": "org.apache.pinot.plugin.stream.kafka.KafkaJSONMessageDecoder",
"stream.kinesis.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kinesis.KinesisConsumerFactory",
"realtime.segment.flush.threshold.rows": "1000000",
"realtime.segment.flush.threshold.time": "6h"
}
},
"metadata": {
"customConfigs": {}
}
}"task": {
"taskTypeConfigsMap": {
"UpsertCompactionTask": {
"schedule": "0 */5 * ? * *",
"bufferTimePeriod": "7d",
"invalidRecordsThresholdPercent": "30",
"invalidRecordsThresholdCount": "100000",
"tableMaxNumTasks": "100",
"validDocIdsType": "SNAPSHOT"
}
}
}

-Dplugins.dir=/opt/pinot/plugins -Dplugins.include=pinot-s3

pinot.controller.storage.factory.s3.region=ap-southeast-1

executionFrameworkSpec:
name: 'standalone'
segmentGenerationJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner'
segmentTarPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentTarPushJobRunner'
segmentUriPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentUriPushJobRunner'
jobType: SegmentCreationAndTarPush
inputDirURI: 's3://pinot-bucket/pinot-ingestion/batch-input/'
outputDirURI: 's3://pinot-bucket/pinot-ingestion/batch-output/'
overwriteOutput: true
pinotFSSpecs:
- scheme: s3
className: org.apache.pinot.plugin.filesystem.S3PinotFS
configs:
region: 'ap-southeast-1'
recordReaderSpec:
dataFormat: 'csv'
className: 'org.apache.pinot.plugin.inputformat.csv.CSVRecordReader'
configClassName: 'org.apache.pinot.plugin.inputformat.csv.CSVRecordReaderConfig'
tableSpec:
tableName: 'students'
pinotClusterSpecs:
- controllerURI: 'http://localhost:9000'

controller.data.dir=s3://path/to/data/directory/
controller.local.temp.dir=/path/to/local/temp/directory
controller.enable.split.commit=true
pinot.controller.storage.factory.class.s3=org.apache.pinot.plugin.filesystem.S3PinotFS
pinot.controller.storage.factory.s3.region=ap-southeast-1
pinot.controller.segment.fetcher.protocols=file,http,s3
pinot.controller.segment.fetcher.s3.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher

pinot.server.instance.enable.split.commit=true
pinot.server.storage.factory.class.s3=org.apache.pinot.plugin.filesystem.S3PinotFS
pinot.server.storage.factory.s3.region=ap-southeast-1
pinot.server.storage.factory.s3.httpclient.maxConnections=50
pinot.server.storage.factory.s3.httpclient.socketTimeout=30s
pinot.server.storage.factory.s3.httpclient.connectionTimeout=2s
pinot.server.storage.factory.s3.httpclient.connectionTimeToLive=0s
pinot.server.storage.factory.s3.httpclient.connectionAcquisitionTimeout=10s
pinot.server.segment.fetcher.protocols=file,http,s3
pinot.server.segment.fetcher.s3.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher

pinot.minion.storage.factory.class.s3=org.apache.pinot.plugin.filesystem.S3PinotFS
pinot.minion.storage.factory.s3.region=ap-southeast-1
pinot.minion.segment.fetcher.protocols=file,http,s3
pinot.minion.segment.fetcher.s3.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher

{
"timestamp": 1672531200000,
"message": "INFO Task task_12 assigned to container: [ContainerID:container_15], operation took 0.335 seconds. 8 tasks remaining.",
"logPath": "/mnt/data/application_123/container_15/stdout"
}

{
"timestamp": 1672531200000,
"message_logtype": "INFO Task \\x12 assigned to container: [ContainerID:\\x12], operation took \\x13 seconds. \\x11 tasks remaining.",
"message_dictionaryVars": [
"task_12",
"container_15"
],
"message_encodedVars": [
1801439850948198735,
8
],
"logPath_logtype": "/mnt/data/\\x12/\\x12/stdout",
"logPath_dictionaryVars": [
"application_123",
"container_15"
],
"logPath_encodedVars": []
}

{
"tableIndexConfig": {
"streamConfigs": {
"stream.kafka.decoder.class.name": "org.apache.pinot.plugin.inputformat.clplog.CLPLogMessageDecoder",
"stream.kafka.decoder.prop.fieldsForClpEncoding": "message,logPath"
},
"varLengthDictionaryColumns": [
"message_logtype",
"message_dictionaryVars",
"logPath_logtype",
"logPath_dictionaryVars"
]
}
}

{
"dimensionFieldSpecs": [
{
"name": "message_logtype",
"dataType": "STRING",
"maxLength": 2147483647
},
{
"name": "message_encodedVars",
"dataType": "LONG",
"singleValueField": false
},
{
"name": "message_dictionaryVars",
"dataType": "STRING",
"maxLength": 2147483647,
"singleValueField": false
},
{
"name": "message_logtype",
"dataType": "STRING",
"maxLength": 2147483647
},
{
"name": "message_encodedVars",
"dataType": "LONG",
"singleValueField": false
},
{
"name": "message_dictionaryVars",
"dataType": "STRING",
"maxLength": 2147483647,
"singleValueField": false
}
]
}

{
"primaryKeyColumns": ["id"]
}

{
"routing": {
"instanceSelectorType": "strictReplicaGroup"
}
}

{
...
"dedupConfig": {
"dedupEnabled": true,
"hashFunction": "NONE"
},
...
}
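As a quick illustration of the dedup behaviour configured above (the records and the "value" field are hypothetical), suppose these two records arrive on the stream with the same primary key:

{"id": "event-1", "value": 10}
{"id": "event-1", "value": 20}

Only the first record is ingested; the second is dropped at ingestion time because a record with primary key "event-1" already exists.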
This guide shows you how to ingest a stream of records from an Apache Pulsar topic into a Pinot table.

pinot-admin.sh AddTable \\
-tableConfigFile /path/to/table-config.json \\
-schemaFile /path/to/table-schema.json -exec

pinot-admin.sh UploadSegment \\
-controllerHost localhost \\
-controllerPort 9000 \\
-segmentDir /path/to/local/dir \\
-tableName myTable

pinot-admin.sh LaunchDataIngestionJob \\
-jobSpecFile /file/location/my-job-spec.yaml

pinot-admin.sh LaunchDataIngestionJob \\
-jobSpecFile /file/location/my-job-spec.yaml

Place the plugin JAR in the libs or plugins directory in Pinot.

group_topics under group is unnested to the top level, converting the output into a collection of two rows. Note the handling of the nested field within group_topics, and the resulting top-level field group.group_topics.urlkey. All collections to unnest must be included in the configuration field fieldsToUnnest.
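For illustration only (the field values here are hypothetical), an input record shaped like this:

{"group": {"group_id": 1, "group_topics": [{"urlkey": "topic-a", "topic_name": "Topic A"}, {"urlkey": "topic-b", "topic_name": "Topic B"}]}}

would be unnested into two rows along these lines:

{"group.group_id": 1, "group.group_topics.urlkey": "topic-a", "group.group_topics.topic_name": "Topic A"}
{"group.group_id": 1, "group.group_topics.urlkey": "topic-b", "group.group_topics.topic_name": "Topic B"}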
"ingestionConfig":{
"transformConfigs": [
{
"columnName": "group_json",
"transformFunction": "jsonFormat(\"group\")"
}
],
},
...
"tableIndexConfig": {
"loadMode": "MMAP",
"noDictionaryColumns": [
"group_json"
],
"jsonIndexColumns": [
"group_json"
]
},
}

{
{
"name": "group_json",
"dataType": "JSON",
}
...
}

{
"ingestionConfig": {
"complexTypeConfig": {
"delimiter": '.',
"fieldsToUnnest": ["group.group_topics"],
"collectionNotUnnestedToJson": "NON_PRIMITIVE"
}
}
}

SELECT "group.group_topics.urlkey",
"group.group_topics.topic_name",
"group.group_id"
FROM meetupRsvp
LIMIT 10

bin/pinot-admin.sh AvroSchemaToPinotSchema \
-timeColumnName fields.hoursSinceEpoch \
-avroSchemaFile /tmp/test.avsc \
-pinotSchemaName myTable \
-outputDir /tmp/test \
-fieldsToUnnest entries

bin/pinot-admin.sh JsonToPinotSchema \
-timeColumnName hoursSinceEpoch \
-jsonFile /tmp/test.json \
-pinotSchemaName myTable \
-outputDir /tmp/test \
-fieldsToUnnest payload.commits

{
"tableName": "pulsarTable",
"tableType": "REALTIME",
"segmentsConfig": {
"timeColumnName": "timestamp",
"replicasPerPartition": "1"
},
"tenants": {},
"tableIndexConfig": {
"loadMode": "MMAP",
"streamConfigs": {
"streamType": "pulsar",
"stream.pulsar.topic.name": "<your pulsar topic name>",
"stream.pulsar.bootstrap.servers": "pulsar://localhost:6650,pulsar://localhost:6651",
"stream.pulsar.consumer.prop.auto.offset.reset" : "smallest",
"stream.pulsar.consumer.type": "lowlevel",
"stream.pulsar.fetch.timeout.millis": "30000",
"stream.pulsar.decoder.class.name": "org.apache.pinot.plugin.inputformat.json.JSONMessageDecoder",
"stream.pulsar.consumer.factory.class.name": "org.apache.pinot.plugin.stream.pulsar.PulsarConsumerFactory",
"realtime.segment.flush.threshold.rows": "1000000",
"realtime.segment.flush.threshold.time": "6h"
}
},
"metadata": {
"customConfigs": {}
}
}"stream.pulsar.authenticationToken":"your-auth-token""stream.pulsar.issuerUrl": "https://auth.streamnative.cloud"
"stream.pulsar.credsFilePath": "file:///path/to/private_creds_file
"stream.pulsar.audience": "urn:sn:pulsar:test:test-cluster""stream.pulsar.tlsTrustCertsFilePath": "/path/to/ca.cert.pem"
"streamConfigs": {
...
"stream.pulsar.metadata.populate": "true",
"stream.pulsar.metadata.fields": "messageId,messageIdBytes,eventTime,topicName",
...
} "dimensionFieldSpecs": [
{
"name": "__key",
"dataType": "STRING"
},
{
"name": "__metadata$messageId",
"dataType": "STRING"
},
...
],



Batch ingestion of data into Apache Pinot.
This guide shows you how to ingest a stream of records from an Apache Kafka topic into a Pinot table.
studentID,firstName,lastName,gender,subject,score,timestampInEpoch
200,Lucy,Smith,Female,Maths,3.8,1570863600000
200,Lucy,Smith,Female,English,3.5,1571036400000
201,Bob,King,Male,Maths,3.2,1571900400000
202,Nick,Young,Male,Physics,3.6,1572418800000

{
"schemaName": "transcript",
"dimensionFieldSpecs": [
{
"name": "studentID",
"dataType": "INT"
},
{
"name": "firstName",
"dataType": "STRING"
},
{
"name": "lastName",
"dataType": "STRING"
},
{
"name": "gender",
"dataType": "STRING"
},
{
"name": "subject",
"dataType": "STRING"
}
],
"metricFieldSpecs": [
{
"name": "score",
"dataType": "FLOAT"
}
],
"dateTimeFieldSpecs": [{
"name": "timestampInEpoch",
"dataType": "LONG",
"format" : "1:MILLISECONDS:EPOCH",
"granularity": "1:MILLISECONDS"
}]
}

{
"tableName": "transcript",
"tableType": "OFFLINE",
"segmentsConfig": {
"replication": 1,
"timeColumnName": "timestampInEpoch",
"timeType": "MILLISECONDS",
"retentionTimeUnit": "DAYS",
"retentionTimeValue": 365
},
"tenants": {
"broker":"DefaultTenant",
"server":"DefaultTenant"
},
"tableIndexConfig": {
"loadMode": "MMAP"
},
"ingestionConfig": {
"batchIngestionConfig": {
"segmentIngestionType": "APPEND",
"segmentIngestionFrequency": "DAILY"
},
"continueOnError": true,
"rowTimeValueCheck": true,
"segmentTimeValueCheck": false
},
"metadata": {}
}

bin/pinot-admin.sh AddTable \\
-tableConfigFile /path/to/table-config.json \\
-schemaFile /path/to/table-schema.json -exec

curl -X POST -F [email protected] \
-H "Content-Type: multipart/form-data" \
"http://localhost:9000/ingestFromFile?tableNameWithType=foo_OFFLINE&
batchConfigMapStr={"inputFormat":"json"}"curl -X POST -F [email protected] \
-H "Content-Type: multipart/form-data" \
"http://localhost:9000/ingestFromFile?tableNameWithType=foo_OFFLINE&
batchConfigMapStr={
"inputFormat":"csv",
"recordReader.prop.delimiter":"|"
}"curl -X POST "http://localhost:9000/ingestFromURI?tableNameWithType=foo_OFFLINE
&batchConfigMapStr={
"inputFormat":"json",
"input.fs.className":"org.apache.pinot.plugin.filesystem.S3PinotFS",
"input.fs.prop.region":"us-central",
"input.fs.prop.accessKey":"foo",
"input.fs.prop.secretKey":"bar"
}
&sourceURIStr=s3://test.bucket/path/to/json/data/data.json"executionFrameworkSpec:
name: 'standalone'
segmentGenerationJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner'
segmentTarPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentTarPushJobRunner'
segmentUriPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentUriPushJobRunner'
segmentMetadataPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentMetadataPushJobRunner'
# Recommended to set jobType to SegmentCreationAndMetadataPush for production environment where Pinot Deep Store is configured
jobType: SegmentCreationAndTarPush
inputDirURI: '/tmp/pinot-quick-start/rawdata/'
includeFileNamePattern: 'glob:**/*.csv'
outputDirURI: '/tmp/pinot-quick-start/segments/'
overwriteOutput: true
pinotFSSpecs:
- scheme: file
className: org.apache.pinot.spi.filesystem.LocalPinotFS
recordReaderSpec:
dataFormat: 'csv'
className: 'org.apache.pinot.plugin.inputformat.csv.CSVRecordReader'
configClassName: 'org.apache.pinot.plugin.inputformat.csv.CSVRecordReaderConfig'
tableSpec:
tableName: 'transcript'
pinotClusterSpecs:
- controllerURI: 'http://localhost:9000'
pushJobSpec:
pushAttempts: 2
pushRetryIntervalMillis: 1000

bin/pinot-admin.sh LaunchDataIngestionJob \\
-jobSpecFile /tmp/pinot-quick-start/batch-job-spec.yaml

{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": "s3:ListAllMyBuckets",
"Resource": "*"
},
{
"Effect": "Allow",
"Action": "s3:*",
"Resource": [
"arn:aws:s3:::metadata-push-staging",
"arn:aws:s3:::metadata-push-staging/*"
]
}
]
}

...
jobType: SegmentCreationAndMetadataPush
...
outputDirURI: 's3://metadata-push-staging/stagingDir/'
...
pushJobSpec:
copyToDeepStoreForMetadataPush: true
...

docker run \
--network pinot-demo --name=kafka \
-e KAFKA_ZOOKEEPER_CONNECT=pinot-zookeeper:2181/kafka \
-e KAFKA_BROKER_ID=0 \
-e KAFKA_ADVERTISED_HOST_NAME=kafka \
-p 2181:2181 \
-d wurstmeister/kafka:latest

docker exec \
-t kafka \
/opt/kafka/bin/kafka-topics.sh \
--zookeeper pinot-zookeeper:2181/kafka \
--partitions=1 --replication-factor=1 \
--create --topic transcript-topic

bin/pinot-admin.sh StartKafka -zkAddress=localhost:2181/kafka -port 9092

bin/kafka-topics.sh --create --bootstrap-server kafka:9092 --replication-factor 1 --partitions 1 --topic transcript-topic

This guide shows you how to ingest a stream of records into a Pinot table.
docker run \
--network=pinot-demo \
-v /tmp/pinot-quick-start:/tmp/pinot-quick-start \
--name pinot-streaming-table-creation \
apachepinot/pinot:latest AddTable \
-schemaFile /tmp/pinot-quick-start/transcript-schema.json \
-tableConfigFile /tmp/pinot-quick-start/transcript-table-realtime.json \
-controllerHost pinot-quickstart \
-controllerPort 9000 \
-execbin/pinot-admin.sh AddTable \
-schemaFile /tmp/pinot-quick-start/transcript-schema.json \
-tableConfigFile /tmp/pinot-quick-start/transcript-table-realtime.json \
-exec

{
"tableName": "transcript",
"tableType": "REALTIME",
"segmentsConfig": {
"timeColumnName": "timestamp",
"timeType": "MILLISECONDS",
"schemaName": "transcript",
"replicasPerPartition": "1"
},
"tenants": {},
"tableIndexConfig": {
"loadMode": "MMAP",
"streamConfigs": {
"streamType": "kafka",
"stream.kafka.consumer.type": "lowlevel",
"stream.kafka.topic.name": "transcript-topic",
"stream.kafka.decoder.class.name": "org.apache.pinot.plugin.stream.kafka.KafkaJSONMessageDecoder",
"stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
"stream.kafka.broker.list": "kafka:9092",
"realtime.segment.flush.threshold.time": "3600000",
"realtime.segment.flush.threshold.rows": "50000",
"stream.kafka.consumer.prop.auto.offset.reset": "smallest"
}
},
"metadata": {
"customConfigs": {}
}
}{"studentID":205,"firstName":"Natalie","lastName":"Jones","gender":"Female","subject":"Maths","score":3.8,"timestamp":1571900400000}
{"studentID":205,"firstName":"Natalie","lastName":"Jones","gender":"Female","subject":"History","score":3.5,"timestamp":1571900400000}
{"studentID":207,"firstName":"Bob","lastName":"Lewis","gender":"Male","subject":"Maths","score":3.2,"timestamp":1571900400000}
{"studentID":207,"firstName":"Bob","lastName":"Lewis","gender":"Male","subject":"Chemistry","score":3.6,"timestamp":1572418800000}
{"studentID":209,"firstName":"Jane","lastName":"Doe","gender":"Female","subject":"Geography","score":3.8,"timestamp":1572505200000}
{"studentID":209,"firstName":"Jane","lastName":"Doe","gender":"Female","subject":"English","score":3.5,"timestamp":1572505200000}
{"studentID":209,"firstName":"Jane","lastName":"Doe","gender":"Female","subject":"Maths","score":3.2,"timestamp":1572678000000}
{"studentID":209,"firstName":"Jane","lastName":"Doe","gender":"Female","subject":"Physics","score":3.6,"timestamp":1572678000000}
{"studentID":211,"firstName":"John","lastName":"Doe","gender":"Male","subject":"Maths","score":3.8,"timestamp":1572678000000}
{"studentID":211,"firstName":"John","lastName":"Doe","gender":"Male","subject":"English","score":3.5,"timestamp":1572678000000}
{"studentID":211,"firstName":"John","lastName":"Doe","gender":"Male","subject":"History","score":3.2,"timestamp":1572854400000}
{"studentID":212,"firstName":"Nick","lastName":"Young","gender":"Male","subject":"History","score":3.6,"timestamp":1572854400000}docker exec -ti kafka bashbin/kafka-console-producer.sh \
--broker-list localhost:9092 \
--topic transcript-topic < transcript.json

SELECT * FROM transcript

{
"tableName": "transcript",
"tableType": "REALTIME",
"segmentsConfig": {
"timeColumnName": "timestamp",
"timeType": "MILLISECONDS",
"schemaName": "transcript",
"replicasPerPartition": "1"
},
"tenants": {},
"tableIndexConfig": {
"loadMode": "MMAP",
"streamConfigs": {
"streamType": "kafka",
"stream.kafka.consumer.type": "LowLevel",
"stream.kafka.topic.name": "transcript-topic",
"stream.kafka.decoder.class.name": "org.apache.pinot.plugin.inputformat.avro.confluent.KafkaConfluentSchemaRegistryAvroMessageDecoder",
"stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
"stream.kafka.zk.broker.url": "pinot-zookeeper:2191/kafka",
"stream.kafka.broker.list": "localhost:9092",
"schema.registry.url": "",
"security.protocol": "SSL",
"ssl.truststore.location": "",
"ssl.keystore.location": "",
"ssl.truststore.password": "",
"ssl.keystore.password": "",
"ssl.key.password": "",
"stream.kafka.decoder.prop.schema.registry.rest.url": "",
"stream.kafka.decoder.prop.schema.registry.ssl.truststore.location": "",
"stream.kafka.decoder.prop.schema.registry.ssl.keystore.location": "",
"stream.kafka.decoder.prop.schema.registry.ssl.truststore.password": "",
"stream.kafka.decoder.prop.schema.registry.ssl.keystore.password": "",
"stream.kafka.decoder.prop.schema.registry.ssl.keystore.type": "",
"stream.kafka.decoder.prop.schema.registry.ssl.truststore.type": "",
"stream.kafka.decoder.prop.schema.registry.ssl.key.password": "",
"stream.kafka.decoder.prop.schema.registry.ssl.protocol": ""
}
},
"metadata": {
"customConfigs": {}
}
}

{
"tableName": "transcript",
"tableType": "REALTIME",
"segmentsConfig": {
"timeColumnName": "timestamp",
"timeType": "MILLISECONDS",
"schemaName": "transcript",
"replicasPerPartition": "1"
},
"tenants": {},
"tableIndexConfig": {
"loadMode": "MMAP",
"streamConfigs": {
"streamType": "kafka",
"stream.kafka.consumer.type": "LowLevel",
"stream.kafka.topic.name": "transcript-topic",
"stream.kafka.decoder.class.name": "org.apache.pinot.plugin.inputformat.avro.confluent.KafkaConfluentSchemaRegistryAvroMessageDecoder",
"stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
"stream.kafka.zk.broker.url": "pinot-zookeeper:2191/kafka",
"stream.kafka.broker.list": "kafka:9092",
"stream.kafka.isolation.level": "read_committed"
}
},
"metadata": {
"customConfigs": {}
}
}"streamConfigs": {
"streamType": "kafka",
"stream.kafka.consumer.type": "lowlevel",
"stream.kafka.topic.name": "mytopic",
"stream.kafka.consumer.prop.auto.offset.reset": "largest",
"stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
"stream.kafka.broker.list": "kafka:9092",
"stream.kafka.schema.registry.url": "https://xxx",
"stream.kafka.decoder.class.name": "org.apache.pinot.plugin.inputformat.avro.confluent.KafkaConfluentSchemaRegistryAvroMessageDecoder",
"stream.kafka.decoder.prop.schema.registry.rest.url": "https://xxx",
"stream.kafka.decoder.prop.basic.auth.credentials.source": "USER_INFO",
"stream.kafka.decoder.prop.schema.registry.basic.auth.user.info": "schema_registry_username:schema_registry_password",
"sasl.mechanism": "PLAIN" ,
"security.protocol": "SASL_SSL" ,
"sasl.jaas.config":"org.apache.kafka.common.security.scram.ScramLoginModule required username=\"kafkausername\" password=\"kafkapassword\";",
"realtime.segment.flush.threshold.rows": "0",
"realtime.segment.flush.threshold.time": "24h",
"realtime.segment.flush.autotune.initialRows": "3000000",
"realtime.segment.flush.threshold.segment.size": "500M"
}, "dimensionFieldSpecs": [
{
"name": "__key",
"dataType": "STRING"
},
{
"name": "__metadata$offset",
"dataType": "STRING"
},
...
],

...
"streamConfigs": {
"streamType": "kafka",
"stream.kafka.consumer.type": "lowlevel",
"stream.kafka.topic.name": "",
"stream.kafka.decoder.class.name": "org.apache.pinot.plugin.inputformat.avro.SimpleAvroMessageDecoder",
"stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
"stream.kafka.broker.list": "",
"stream.kafka.consumer.prop.auto.offset.reset": "largest"
...
}

recordReaderSpec:
dataFormat: 'csv'
className: 'org.apache.pinot.plugin.inputformat.csv.CSVRecordReader'
configClassName: 'org.apache.pinot.plugin.inputformat.csv.CSVRecordReaderConfig'
configs:
key1 : 'value1'
key2 : 'value2'

dataFormat: 'csv'
className: 'org.apache.pinot.plugin.inputformat.csv.CSVRecordReader'
configClassName: 'org.apache.pinot.plugin.inputformat.csv.CSVRecordReaderConfig'
configs:
fileFormat: 'default' #should be one of default, rfc4180, excel, tdf, mysql
header: 'columnName separated by delimiter'
delimiter: ','
multiValueDelimiter: '-'

dataFormat: 'avro'
className: 'org.apache.pinot.plugin.inputformat.avro.AvroRecordReader'
configs:
enableLogicalTypes: true

dataFormat: 'json'
className: 'org.apache.pinot.plugin.inputformat.json.JSONRecordReader'

dataFormat: 'thrift'
className: 'org.apache.pinot.plugin.inputformat.thrift.ThriftRecordReader'
configs:
thriftClass: 'ParserClassName'

dataFormat: 'parquet'
className: 'org.apache.pinot.plugin.inputformat.parquet.ParquetRecordReader'

dataFormat: 'parquet'
className: 'org.apache.pinot.plugin.inputformat.parquet.ParquetNativeRecordReader'

dataFormat: 'orc'
className: 'org.apache.pinot.plugin.inputformat.orc.ORCRecordReader'

dataFormat: 'proto'
className: 'org.apache.pinot.plugin.inputformat.protobuf.ProtoBufRecordReader'
configs:
descriptorFile: 'file:///path/to/sample.desc'

protoc --include_imports --descriptor_set_out=/absolute/path/to/output.desc /absolute/path/to/input.proto

docker run \
--network=pinot-demo \
-v /tmp/pinot-quick-start:/tmp/pinot-quick-start \
--name pinot-streaming-table-creation \
apachepinot/pinot:latest AddTable \
-schemaFile /tmp/pinot-quick-start/transcript-schema.json \
-tableConfigFile /tmp/pinot-quick-start/transcript-table-realtime.json \
-controllerHost pinot-quickstart \
-controllerPort 9000 \
-exec

bin/pinot-admin.sh AddTable \
-schemaFile /path/to/transcript-schema.json \
-tableConfigFile /path/to/transcript-table-realtime.json \
-exec{"studentID":205,"firstName":"Natalie","lastName":"Jones","gender":"Female","subject":"Maths","score":3.8,"timestamp":1571900400000}
{"studentID":205,"firstName":"Natalie","lastName":"Jones","gender":"Female","subject":"History","score":3.5,"timestamp":1571900400000}
{"studentID":207,"firstName":"Bob","lastName":"Lewis","gender":"Male","subject":"Maths","score":3.2,"timestamp":1571900400000}
{"studentID":207,"firstName":"Bob","lastName":"Lewis","gender":"Male","subject":"Chemistry","score":3.6,"timestamp":1572418800000}
{"studentID":209,"firstName":"Jane","lastName":"Doe","gender":"Female","subject":"Geography","score":3.8,"timestamp":1572505200000}
{"studentID":209,"firstName":"Jane","lastName":"Doe","gender":"Female","subject":"English","score":3.5,"timestamp":1572505200000}
{"studentID":209,"firstName":"Jane","lastName":"Doe","gender":"Female","subject":"Maths","score":3.2,"timestamp":1572678000000}
{"studentID":209,"firstName":"Jane","lastName":"Doe","gender":"Female","subject":"Physics","score":3.6,"timestamp":1572678000000}
{"studentID":211,"firstName":"John","lastName":"Doe","gender":"Male","subject":"Maths","score":3.8,"timestamp":1572678000000}
{"studentID":211,"firstName":"John","lastName":"Doe","gender":"Male","subject":"English","score":3.5,"timestamp":1572678000000}
{"studentID":211,"firstName":"John","lastName":"Doe","gender":"Male","subject":"History","score":3.2,"timestamp":1572854400000}
{"studentID":212,"firstName":"Nick","lastName":"Young","gender":"Male","subject":"History","score":3.6,"timestamp":1572854400000}{
"schemaName": "transcript",
"dimensionFieldSpecs": [
{
"name": "studentID",
"dataType": "INT"
},
{
"name": "firstName",
"dataType": "STRING"
},
{
"name": "lastName",
"dataType": "STRING"
},
{
"name": "gender",
"dataType": "STRING"
},
{
"name": "subject",
"dataType": "STRING"
}
],
"metricFieldSpecs": [
{
"name": "score",
"dataType": "FLOAT"
}
],
"dateTimeFieldSpecs": [{
"name": "timestamp",
"dataType": "LONG",
"format" : "1:MILLISECONDS:EPOCH",
"granularity": "1:MILLISECONDS"
}]
}

{
"tableName": "transcript",
"tableType": "REALTIME",
"segmentsConfig": {
"timeColumnName": "timestamp",
"timeType": "MILLISECONDS",
"schemaName": "transcript",
"replicasPerPartition": "1"
},
"tenants": {},
"tableIndexConfig": {
"loadMode": "MMAP",
},
"metadata": {
"customConfigs": {}
},
"ingestionConfig": {
"streamIngestionConfig": {
"streamConfigMaps": [
{
"realtime.segment.flush.threshold.rows": "0",
"stream.kafka.decoder.prop.format": "JSON",
"key.serializer": "org.apache.kafka.common.serialization.ByteArraySerializer",
"stream.kafka.decoder.class.name": "org.apache.pinot.plugin.stream.kafka.KafkaJSONMessageDecoder",
"streamType": "kafka",
"value.serializer": "org.apache.kafka.common.serialization.ByteArraySerializer",
"stream.kafka.consumer.type": "LOWLEVEL",
"realtime.segment.flush.threshold.segment.rows": "50000",
"stream.kafka.broker.list": "localhost:9876",
"realtime.segment.flush.threshold.time": "3600000",
"stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
"stream.kafka.consumer.prop.auto.offset.reset": "smallest",
"stream.kafka.topic.name": "transcript-topic"
}
]
},
"transformConfigs": [],
"continueOnError": true,
"rowTimeValueCheck": true,
"segmentTimeValueCheck": false
},
"isDimTable": false
}
{
"tableName": "transcript",
"tableType": "REALTIME",
...
"ingestionConfig": {
"streamIngestionConfig":,
"streamConfigMaps": {
"streamType": "kafka",
"stream.kafka.consumer.type": "lowlevel",
"stream.kafka.topic.name": "transcript-topic",
...
"topic.consumption.rate.limit": 1000
}
},
...

A consumption rate limiter is set up for topic <topic_name> in table <tableName> with rate limit: <rate_limit> (topic rate limit: <topic_rate_limit>, partition count: <partition_count>)

$ curl -X POST {controllerHost}/tables/{tableName}/forceCommit

$ curl -X POST {controllerHost}/tables/{tableName}/pauseConsumption
$ curl -X POST {controllerHost}/tables/{tableName}/resumeConsumption

$ curl -X POST {controllerHost}/tables/{tableName}/pauseStatus

$ curl -X POST {controllerHost}/tables/{tableName}/forceCommit

$ curl -X POST {controllerHost}/tables/{tableName}/forceCommit
{
"forceCommitJobId": "6757284f-b75b-45ce-91d8-a277bdbc06ae",
"forceCommitStatus": "SUCCESS",
"jobMetaZKWriteStatus": "SUCCESS"
}
$ curl -X GET {controllerHost}/tables/forceCommitStatus/6757284f-b75b-45ce-91d8-a277bdbc06ae
{
"jobId": "6757284f-b75b-45ce-91d8-a277bdbc06ae",
"segmentsForceCommitted": "[\"airlineStats__0__0__20230119T0700Z\",\"airlineStats__1__0__20230119T0700Z\",\"airlineStats__2__0__20230119T0700Z\"]",
"submissionTimeMs": "1674111682977",
"numberOfSegmentsYetToBeCommitted": 0,
"jobType": "FORCE_COMMIT",
"segmentsYetToBeCommitted": [],
"tableName": "airlineStats_REALTIME"
}

$ curl -X POST {controllerHost}/tables/{tableName}/resumeConsumption?resumeFrom=smallest
$ curl -X POST {controllerHost}/tables/{tableName}/resumeConsumption?resumeFrom=largest

# GET /tables/{tableName}/consumingSegmentsInfo
curl -X GET "http://<controller_url:controller_admin_port>/tables/meetupRsvp/consumingSegmentsInfo" -H "accept: application/json"
# GET /debug/tables/{tableName}
curl -X GET "http://localhost:9000/debug/tables/meetupRsvp?type=REALTIME&verbosity=1" -H "accept: application/json"{
"_segmentToConsumingInfoMap": {
"meetupRsvp__0__0__20221019T0639Z": [
{
"serverName": "Server_192.168.0.103_7000",
"consumerState": "CONSUMING",
"lastConsumedTimestamp": 1666161593904,
"partitionToOffsetMap": { // <<-- Deprecated. See currentOffsetsMap for same info
"0": "6"
},
"partitionOffsetInfo": {
"currentOffsetsMap": {
"0": "6" // <-- Current consumer position
},
"latestUpstreamOffsetMap": {
"0": "6" // <-- Upstream latest position
},
"recordsLagMap": {
"0": "0" // <-- Lag, in terms of #records behind latest
},
"recordsAvailabilityLagMap": {
"0": "2" // <-- Lag, in terms of time
}
}
}
],

Upsert support in Apache Pinot.
{
"primaryKeyColumns": ["event_id"]
}
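As a quick illustration of full upsert with the primary key above (the records and values here are made up for this example), suppose the following two records are ingested for the same event_id:

{"event_id": "aa", "rsvp_count": 1, "mtime": 1567205394}
{"event_id": "aa", "rsvp_count": 2, "mtime": 1567205397}

With the FULL mode configured below, queries return only the latest record for event_id "aa" (the one with the larger time/comparison column value); earlier versions remain stored but are skipped at query time unless the skipUpsert query option is used.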
"upsertConfig": {
"mode": "FULL"
}
}

{
"upsertConfig": {
"mode": "PARTIAL",
"partialUpsertStrategies":{
"rsvp_count": "INCREMENT",
"group_name": "IGNORE",
"venue_name": "OVERWRITE"
}
},
"tableIndexConfig": {
"nullHandlingEnabled": true
}
}

{
"upsertConfig": {
"mode": "PARTIAL",
"defaultPartialUpsertStrategy": "OVERWRITE",
"partialUpsertStrategies":{
"rsvp_count": "INCREMENT",
"group_name": "IGNORE"
}
},
"tableIndexConfig": {
"nullHandlingEnabled": true
}
}

{
"upsertConfig": {
"mode": "FULL",
"comparisonColumn": "anotherTimeColumn"
}
}

{
"upsertConfig": {
"mode": "PARTIAL",
"defaultPartialUpsertStrategy": "OVERWRITE",
"partialUpsertStrategies":{},
"comparisonColumns": ["secondsSinceEpoch", "otherComparisonColumn"]
}
}

[
{
"event_id": "aa",
"orderReceived": 1,
"description" : "first",
"secondsSinceEpoch": 1567205394
},
{
"event_id": "aa",
"orderReceived": 2,
"description" : "update",
"secondsSinceEpoch": 1567205397
},
{
"event_id": "aa",
"orderReceived": 3,
"description" : "update",
"secondsSinceEpoch": 1567205396
},
{
"event_id": "aa",
"orderReceived": 4,
"description" : "first arrival, other column",
"otherComparisonColumn": 1567205395
},
{
"event_id": "aa",
"orderReceived": 5,
"description" : "late arrival, other column",
"otherComparisonColumn": 1567205392
},
{
"event_id": "aa",
"orderReceived": 6,
"description" : "update, other column",
"otherComparisonColumn": 1567205398
}
]

{
"upsertConfig": {
...
"deleteRecordColumn": <column_name>
}
}

// In the Schema
{
...
{
"name": "<delete_column_name>",
"dataType": "BOOLEAN"
},
...
} "upsertConfig": {
"mode": "FULL",
"deleteRecordColumn": <column_name>,
"deletedKeysTTL": 86400
}
}

{
"routing": {
"instanceSelectorType": "strictReplicaGroup"
}
}

{
"upsertConfig": {
"mode": "FULL",
"enableSnapshot": true
}
}

{
"upsertConfig": {
"mode": "FULL",
"enableSnapshot": true,
"enablePreload": true
}
} "upsertConfig": {
"mode": "FULL",
"enableSnapshot": true,
"enablePreload": true,
"upsertTTL": 3d
}
}

{
"upsertConfig": {
...,
"dropOutOfOrderRecord": true
}
}

{
"upsertConfig": {
...,
"outOfOrderRecordColumn": "isOutOfOrder"
}
}

select key, val from tbl1 where isOutOfOrder = false option(skipUpsert=false)

{
"tableName": "upsertMeetupRsvp",
"tableType": "REALTIME",
"tenants": {},
"segmentsConfig": {
"timeColumnName": "mtime",
"retentionTimeUnit": "DAYS",
"retentionTimeValue": "1",
"replication": "1"
},
"tableIndexConfig": {
"segmentPartitionConfig": {
"columnPartitionMap": {
"event_id": {
"functionName": "Hashcode",
"numPartitions": 2
}
}
}
},
"instanceAssignmentConfigMap": {
"CONSUMING": {
"tagPoolConfig": {
"tag": "DefaultTenant_REALTIME"
},
"replicaGroupPartitionConfig": {
"replicaGroupBased": true,
"numReplicaGroups": 1,
"partitionColumn": "event_id",
"numPartitions": 2,
"numInstancesPerPartition": 1
}
}
},
"routing": {
"segmentPrunerTypes": [
"partition"
],
"instanceSelectorType": "strictReplicaGroup"
},
"ingestionConfig": {
"streamIngestionConfig": {
"streamConfigMaps": [
{
"streamType": "kafka",
"stream.kafka.topic.name": "upsertMeetupRSVPEvents",
"stream.kafka.decoder.class.name": "org.apache.pinot.plugin.stream.kafka.KafkaJSONMessageDecoder",
"stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
"stream.kafka.zk.broker.url": "localhost:2191/kafka",
"stream.kafka.broker.list": "localhost:19092"
}
]
}
},
"upsertConfig": {
"mode": "FULL",
"enableSnapshot": true,
"enablePreload": true
},
"fieldConfigList": [
{
"name": "location",
"encodingType": "RAW",
"indexType": "H3",
"properties": {
"resolutions": "5"
}
}
],
"metadata": {
"customConfigs": {}
}
}

{
"tableName": "upsertPartialMeetupRsvp",
"tableType": "REALTIME",
"tenants": {},
"segmentsConfig": {
"timeColumnName": "mtime",
"retentionTimeUnit": "DAYS",
"retentionTimeValue": "1",
"replication": "1"
},
"tableIndexConfig": {
"segmentPartitionConfig": {
"columnPartitionMap": {
"event_id": {
"functionName": "Hashcode",
"numPartitions": 2
}
}
},
"nullHandlingEnabled": true
},
"instanceAssignmentConfigMap": {
"CONSUMING": {
"tagPoolConfig": {
"tag": "DefaultTenant_REALTIME"
},
"replicaGroupPartitionConfig": {
"replicaGroupBased": true,
"numReplicaGroups": 1,
"partitionColumn": "event_id",
"numPartitions": 2,
"numInstancesPerPartition": 1
}
}
},
"routing": {
"segmentPrunerTypes": [
"partition"
],
"instanceSelectorType": "strictReplicaGroup"
},
"ingestionConfig": {
"streamIngestionConfig": {
"streamConfigMaps": [
{
"streamType": "kafka",
"stream.kafka.topic.name": "upsertPartialMeetupRSVPEvents",
"stream.kafka.decoder.class.name": "org.apache.pinot.plugin.stream.kafka.KafkaJSONMessageDecoder",
"stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
"stream.kafka.zk.broker.url": "localhost:2191/kafka",
"stream.kafka.broker.list": "localhost:19092"
}
]
}
},
"upsertConfig": {
"mode": "PARTIAL",
"partialUpsertStrategies": {
"rsvp_count": "INCREMENT",
"group_name": "UNION",
"venue_name": "APPEND"
}
},
"fieldConfigList": [
{
"name": "location",
"encodingType": "RAW",
"indexType": "H3",
"properties": {
"resolutions": "5"
}
}
],
"metadata": {
"customConfigs": {}
}
}

# stop previous quick start cluster, if any
bin/quick-start-upsert-streaming.sh

# stop previous quick start cluster, if any
bin/quick-start-partial-upsert-streaming.sh


