Creating Pinot Segments

Pinot segments can be created offline on Hadoop, or from data files via the command line. The controller REST endpoint can then be used to add each segment to the table it belongs to. Pinot segments can also be created by ingesting data from realtime sources (such as Kafka).

Creating segments using Hadoop

Offline Pinot workflow

To create Pinot segments on Hadoop, a workflow can be created to complete the following steps:

1. Pre-aggregate, clean up and prepare the data, writing it as Avro format files in a single HDFS directory
2. Create segments
3. Upload segments to the Pinot cluster

Step one can be done using your favorite tool (such as Pig, Hive or Spark). Pinot provides two MapReduce jobs for steps two and three.

Configuring the job

Create a job properties configuration file, such as the sample job properties listed further down this page.

Executing the job

The Pinot Hadoop module contains a job that you can incorporate into your workflow to generate Pinot segments.

You can then use the SegmentTarPush job to push segments via the controller REST API.
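To make the push step concrete, here is a minimal Python sketch of how the push.to.hosts and push.to.port values from the sample job properties on this page could expand into one base URL per controller. The function name controller_urls and the plain http scheme are assumptions made for illustration; this is not the SegmentTarPush implementation.

```python
from typing import List


def controller_urls(hosts_csv: str, port: str, scheme: str = "http") -> List[str]:
    """Expand push.to.hosts / push.to.port into one base URL per controller.

    Hypothetical helper: the http scheme is an assumption, not taken from Pinot.
    """
    # push.to.hosts is a comma-separated list; ignore empty entries and stray spaces.
    hosts = [h.strip() for h in hosts_csv.split(",") if h.strip()]
    if not hosts:
        raise ValueError("push.to.hosts must name at least one controller")
    if not port.isdigit():
        raise ValueError(f"push.to.port must be numeric, got {port!r}")
    return [f"{scheme}://{host}:{port}" for host in hosts]


# Values copied from the sample job properties on this page.
for url in controller_urls("controller_host_0,controller_host_1", "8888"):
    print(url)
```

Validating the host list and port before the job runs surfaces a misconfigured properties file early, rather than as a failed upload after segments have been built.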

Creating Pinot segments outside of Hadoop

Here is how you can create Pinot segments from standard formats like CSV/JSON/AVRO.

1. Follow the build instructions to build Pinot. Locate pinot-admin.sh in pinot-tools/target/pinot-tools-pkg/bin/pinot-admin.sh.
2. Create a top-level directory containing all the CSV/JSON/AVRO files that need to be converted into segments.
3. Run the pinot-admin command to generate the segments. The command can be invoked as follows. Options within “[ ]” are optional. For -format, the default value is AVRO.

To configure CSV parsing parameters, you can provide a config file in JSON format. This file is optional, as is each of its parameters. The default values used when a parameter is not provided are described below:

fileFormat: Specify one of the following. Default is EXCEL.
1. EXCEL
2. MYSQL
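As an illustration of the fileFormat parameter, the following Python sketch produces a JSON config that sets only that one key, falling back to EXCEL when no value is given. The helper name build_csv_reader_config, the case-insensitive input handling, and the restriction to the two values listed above are assumptions of this sketch; the actual tool may accept other values and parameters.

```python
import json

# Only the two values listed above; treating this as the full set is an assumption.
ALLOWED_FILE_FORMATS = ("EXCEL", "MYSQL")
DEFAULT_FILE_FORMAT = "EXCEL"


def build_csv_reader_config(file_format=None):
    """Return JSON text for a CSV config that sets only fileFormat (hypothetical helper)."""
    # Fall back to the documented default when nothing is specified.
    chosen = DEFAULT_FILE_FORMAT if file_format is None else file_format.upper()
    if chosen not in ALLOWED_FILE_FORMATS:
        raise ValueError(
            f"fileFormat must be one of {ALLOWED_FILE_FORMATS}, got {file_format!r}"
        )
    return json.dumps({"fileFormat": chosen}, indent=2)


print(build_csv_reader_config("mysql"))
```

Because every parameter is optional, omitting the file entirely should behave the same as passing the defaults explicitly; the sketch only makes the default visible.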

Below is a sample config file.

Sample Schema:

Pushing offline segments to Pinot

You can use curl to push a segment to Pinot:

Alternatively, you can use the pinot-admin.sh utility to upload one or more segments:

The command uploads all the segments found in segmentDirectoryPath. Each segment can be either tar-compressed (in which case it is a file under segmentDirectoryPath) or uncompressed (in which case it is a directory under segmentDirectoryPath).
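The distinction between the two segment layouts can be sketched in Python as a directory scan that separates tar files from directories. This only illustrates the rule described above; it is not how pinot-admin.sh is implemented, and the function name classify_segments and the demo file names are hypothetical.

```python
import io
import os
import tarfile
import tempfile


def classify_segments(segment_dir):
    """Split the entries of segment_dir into (tar-compressed files, uncompressed directories)."""
    tarred, uncompressed = [], []
    for name in sorted(os.listdir(segment_dir)):
        path = os.path.join(segment_dir, name)
        if os.path.isdir(path):
            uncompressed.append(name)   # a directory is treated as an uncompressed segment
        elif tarfile.is_tarfile(path):
            tarred.append(name)         # a readable tar archive is treated as a compressed segment
        # anything else is ignored by this sketch
    return tarred, uncompressed


# Demo with placeholder names: one directory and one small tar.gz archive.
with tempfile.TemporaryDirectory() as root:
    os.mkdir(os.path.join(root, "flights_0"))
    payload = b"placeholder"
    info = tarfile.TarInfo(name="placeholder.txt")
    info.size = len(payload)
    with tarfile.open(os.path.join(root, "flights_1.tar.gz"), "w:gz") as tar:
        tar.addfile(info, io.BytesIO(payload))
    print(classify_segments(root))
```

Running a check like this before an upload is a cheap way to spot stray files in segmentDirectoryPath that are neither archives nor segment directories.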

Sample job properties file for the segment creation and push jobs

# === Index segment creation job config ===

# path.to.input: Input directory containing Avro files
path.to.input=/user/pinot/input/data

# path.to.output: Output directory containing Pinot segments
path.to.output=/user/pinot/output

# path.to.schema: Schema file for the table, stored locally
path.to.schema=flights-schema.json

# segment.table.name: Name of the table for which to generate segments
segment.table.name=flights

# === Segment tar push job config ===

# push.to.hosts: Comma-separated list of controller host names to which to push
push.to.hosts=controller_host_0,controller_host_1

# push.to.port: The port on which the controller runs
push.to.port=8888
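A properties file like the one above can be sanity-checked before it is handed to the jobs. The Python sketch below parses simple key=value lines and reports which of the six keys shown above are absent. Treating all six as required is an assumption, the function names are hypothetical, and the parser handles only the plain syntax used in the sample, not the full Java properties format.

```python
# Keys that appear in the sample above; treating all of them as required is an assumption.
REQUIRED_KEYS = (
    "path.to.input",
    "path.to.output",
    "path.to.schema",
    "segment.table.name",
    "push.to.hosts",
    "push.to.port",
)


def parse_job_properties(text):
    """Parse simple key=value lines, skipping blank lines and '#' comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"not a key=value line: {raw!r}")
        props[key.strip()] = value.strip()
    return props


def missing_keys(props):
    """Return the expected keys that are absent, in REQUIRED_KEYS order."""
    return [key for key in REQUIRED_KEYS if key not in props]


sample = """\
# path.to.input: Input directory containing Avro files
path.to.input=/user/pinot/input/data
segment.table.name=flights
"""
print(missing_keys(parse_job_properties(sample)))
```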
