Advanced Pinot Setup

Start Pinot components (scripts or Docker images)

Set up Pinot by starting each component individually, using either Docker images or launcher scripts.

Start Pinot Components using docker

Prerequisites

If running locally, please ensure that your Docker daemon has enough resources (CPUs, memory, and disk) allocated to run all of the containers below.

Pull docker image

You can try out the pre-built Pinot all-in-one Docker image.

```shell
export PINOT_VERSION=0.7.0-SNAPSHOT
export PINOT_IMAGE=apachepinot/pinot:${PINOT_VERSION}
docker pull ${PINOT_IMAGE}
```
(Optional) You can also follow the instructions here to build your own images.
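If you want a reproducible setup, you can pin to a released tag instead of a snapshot. A minimal sketch (the tag 0.7.1 is just an illustrative released version; substitute whichever release you need):

```shell
# Pin to a released tag instead of a snapshot build.
# 0.7.1 is only an example tag; use any release published on Docker Hub.
export PINOT_VERSION=0.7.1
export PINOT_IMAGE=apachepinot/pinot:${PINOT_VERSION}
echo "Will pull: ${PINOT_IMAGE}"
```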

0. Create a Network

Create an isolated bridge network in Docker.

```shell
docker network create -d bridge pinot-demo
```

1. Start Zookeeper

Start Zookeeper in daemon mode.

```shell
docker run \
  --network=pinot-demo \
  --name pinot-zookeeper \
  --restart always \
  -p 2181:2181 \
  -d zookeeper:3.5.6
```
Start ZKUI to browse Zookeeper data at http://localhost:9090.

```shell
docker run \
  --network pinot-demo --name=zkui \
  -p 9090:9090 \
  -e ZK_SERVER=pinot-zookeeper:2181 \
  -d qnib/plain-zkui:latest
```

2. Start Pinot Controller

Start Pinot Controller in daemon mode and connect to Zookeeper.

```shell
docker run \
  --network=pinot-demo \
  --name pinot-controller \
  -p 9000:9000 \
  -d ${PINOT_IMAGE} StartController \
  -zkAddress pinot-zookeeper:2181
```

3. Start Pinot Broker

Start Pinot Broker in daemon mode and connect to Zookeeper.

```shell
docker run \
  --network=pinot-demo \
  --name pinot-broker \
  -d ${PINOT_IMAGE} StartBroker \
  -zkAddress pinot-zookeeper:2181
```

4. Start Pinot Server

Start Pinot Server in daemon mode and connect to Zookeeper.

```shell
docker run \
  --network=pinot-demo \
  --name pinot-server \
  -d ${PINOT_IMAGE} StartServer \
  -zkAddress pinot-zookeeper:2181
```
Now all Pinot-related components are started as an empty cluster.
Run the command below to check the container status.

```shell
docker container ls -a
```
Sample Console Output

```
CONTAINER ID   IMAGE                              COMMAND                  CREATED              STATUS              PORTS                                                  NAMES
9e80c3fcd29b   apachepinot/pinot:0.3.0-SNAPSHOT   "./bin/pinot-admin.s…"   18 seconds ago       Up 17 seconds       8096-8099/tcp, 9000/tcp                                pinot-server
f4c42a5865c7   apachepinot/pinot:0.3.0-SNAPSHOT   "./bin/pinot-admin.s…"   21 seconds ago       Up 21 seconds       8096-8099/tcp, 9000/tcp                                pinot-broker
a413b0013806   apachepinot/pinot:0.3.0-SNAPSHOT   "./bin/pinot-admin.s…"   26 seconds ago       Up 25 seconds       8096-8099/tcp, 0.0.0.0:9000->9000/tcp                  pinot-controller
9d3b9c4d454b   zookeeper:3.5.6                    "/docker-entrypoint.…"   About a minute ago   Up About a minute   2888/tcp, 3888/tcp, 0.0.0.0:2181->2181/tcp, 8080/tcp   pinot-zookeeper
```
Download the Pinot distribution from http://pinot.apache.org/download/ and untar it.

```shell
$ export PINOT_VERSION=0.7.0
$ tar -xvf apache-pinot-${PINOT_VERSION}-bin.tar.gz

$ cd apache-pinot-${PINOT_VERSION}-bin
$ ls
DISCLAIMER  LICENSE  NOTICE  bin  conf  lib  licenses  query_console  sample_data

$ PINOT_INSTALL_DIR=`pwd`
```

Start Pinot components via launcher scripts

Start Zookeeper

```shell
cd apache-pinot-${PINOT_VERSION}-bin
bin/pinot-admin.sh StartZookeeper
```

Start Pinot Controller

See the controller page for more details.

```shell
bin/pinot-admin.sh StartController \
  -zkAddress localhost:2181
```

Start Pinot Broker

```shell
bin/pinot-admin.sh StartBroker \
  -zkAddress localhost:2181
```

Start Pinot Server

```shell
bin/pinot-admin.sh StartServer \
  -zkAddress localhost:2181
```

Start Pinot Using Config Files

Oftentimes we need to customize the setup of Pinot components; in that case you can write a config file and use it to start the component.
Below are example config files and the commands to start the Pinot components with them.

Pinot Controller

Below is a sample pinot-controller.conf used in the Helm chart setup.

```
controller.helix.cluster.name=pinot-quickstart
controller.port=9000
controller.vip.host=pinot-controller
controller.vip.port=9000
controller.data.dir=/var/pinot/controller/data
controller.zk.str=pinot-zookeeper:2181
pinot.set.instance.id.to.hostname=true
```
To run Pinot Controller with this config file, the command is:

```shell
bin/pinot-admin.sh StartController -configFileName config/pinot-controller.conf
```

Configure Controller

Below are some configurations you can set in Pinot Controller. You can head over to Controller for the complete list of available configs.

| Config Name | Description | Default Value |
| --- | --- | --- |
| controller.helix.cluster.name | Pinot cluster name | PinotCluster |
| controller.host | Pinot Controller host | Required if `pinot.set.instance.id.to.hostname` is false |
| pinot.set.instance.id.to.hostname | When enabled, use the server hostname to infer `controller.host` | false |
| controller.port | Pinot Controller port | 9000 |
| controller.vip.host | The VIP hostname used to set the download URL for segments | ${controller.host} |
| controller.vip.port | The VIP port used to set the download URL for segments | ${controller.port} |
| controller.data.dir | Directory to host segment data | ${java.io.tmpdir}/PinotController |
| controller.zk.str | Zookeeper URL | localhost:2181 |
| cluster.tenant.isolation.enable | Enable tenant isolation (default is a single-tenant cluster) | true |
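To tie the table back to a config file: if `pinot.set.instance.id.to.hostname` stays at its default of false, `controller.host` must be set explicitly. A minimal sketch, assuming a hypothetical host name and data directory (both are placeholders, not recommendations):

```
controller.helix.cluster.name=pinot-quickstart
controller.host=controller-1.example.com
controller.port=9000
controller.data.dir=/data/pinot/controller
controller.zk.str=pinot-zookeeper:2181
```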

Pinot Broker

Below is a sample pinot-broker.conf used in the Helm chart setup.

```
pinot.broker.client.queryPort=8099
pinot.broker.routing.table.builder.class=random
pinot.set.instance.id.to.hostname=true
```

To run Pinot Broker with this config file, the command is:

```shell
bin/pinot-admin.sh StartBroker -clusterName pinot-quickstart -zkAddress pinot-zookeeper:2181 -configFileName config/pinot-broker.conf
```

Configure Broker

Below are some configurations you can set in Pinot Broker. You can head over to Broker for the complete list of available configs.

| Config Name | Description | Default Value |
| --- | --- | --- |
| instanceId | Unique id to register Pinot Broker in the cluster | BROKER_${BROKER_HOST}_${pinot.broker.client.queryPort} |
| pinot.set.instance.id.to.hostname | When enabled, use the server hostname to set ${BROKER_HOST} in the config above; otherwise use the IP address | false |
| pinot.broker.client.queryPort | Port to query Pinot Broker | 8099 |
| pinot.broker.timeoutMs | Timeout for broker queries, in milliseconds | 10000 |
| pinot.broker.enable.query.limit.override | Enable a query LIMIT override to protect Pinot Broker and Server from fetching back too many records | false |
| pinot.broker.query.response.limit | When `pinot.broker.enable.query.limit.override` is enabled, reset the limit for a selection query if it exceeds this value | 2147483647 |
| pinot.broker.startup.minResourcePercent | Consider the broker ServiceStatus as STARTED once the percentage of resources (tables) that are ONLINE for this broker has crossed this threshold of the total number of tables it is expected to serve | 100.0 |
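As an illustration of the configs above, a broker config that tightens the query timeout and enables the LIMIT override might look like the sketch below (the numeric values are arbitrary examples, not recommendations):

```
pinot.broker.client.queryPort=8099
pinot.broker.timeoutMs=30000
pinot.broker.enable.query.limit.override=true
pinot.broker.query.response.limit=100000
```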

Pinot Server

Below is a sample pinot-server.conf used in the Helm chart setup.

```
pinot.server.netty.port=8098
pinot.server.adminapi.port=8097
pinot.server.instance.dataDir=/var/pinot/server/data/index
pinot.server.instance.segmentTarDir=/var/pinot/server/data/segment
pinot.set.instance.id.to.hostname=true
```

To run Pinot Server with this config file, the command is:

```shell
bin/pinot-admin.sh StartServer -clusterName pinot-quickstart -zkAddress pinot-zookeeper:2181 -configFileName config/pinot-server.conf
```

Configure Server

Below are some notable configurations you can set in Pinot Server. You can head over to Server for the complete list of available configs.

| Config Name | Description | Default Value |
| --- | --- | --- |
| instanceId | Unique id to register Pinot Server in the cluster | Server_${SERVER_HOST}_${pinot.server.netty.port} |
| pinot.set.instance.id.to.hostname | When enabled, use the server hostname to set ${SERVER_HOST} in the config above; otherwise use the IP address | false |
| pinot.server.netty.port | Port to query Pinot Server | 8098 |
| pinot.server.adminapi.port | Port for the Pinot Server admin UI | 8097 |
| pinot.server.instance.dataDir | Directory to hold all the data | ${java.io.tmpDir}/PinotServer/index |
| pinot.server.instance.segmentTarDir | Directory to hold temporary segments downloaded from the Controller or Deep Store | ${java.io.tmpDir}/PinotServer/segmentTar |
| pinot.server.query.executor.timeout | Timeout for the server to process a query, in milliseconds | 15000 |
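Similarly, a server config overriding the query timeout and the data directories from the table above might look like this sketch (the paths and timeout are illustrative placeholders):

```
pinot.server.netty.port=8098
pinot.server.adminapi.port=8097
pinot.server.query.executor.timeout=30000
pinot.server.instance.dataDir=/data/pinot/server/index
pinot.server.instance.segmentTarDir=/data/pinot/server/segmentTar
```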

Create and Configure table

A table in the regular database world is represented as `<TABLE>_OFFLINE` and/or `<TABLE>_REALTIME` in Pinot, depending on the ingestion mode (batch, real-time, or hybrid).
See examples for all possible batch/streaming tables.
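The `_OFFLINE` / `_REALTIME` suffix is derived from the `tableType` field of the table config. A heavily trimmed sketch of an offline table config (the field values are illustrative; see the linked examples for complete, working configs):

```json
{
  "tableName": "airlineStats",
  "tableType": "OFFLINE",
  "segmentsConfig": {
    "timeColumnName": "DaysSinceEpoch",
    "replication": "1"
  },
  "tenants": {},
  "tableIndexConfig": {},
  "metadata": {}
}
```

With this config, the table is registered in the cluster as airlineStats_OFFLINE.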

Batch Table Creation

Please see Batch Tables for table configuration details and how to customize it.
Docker

```shell
docker run \
  --network=pinot-demo \
  --name pinot-batch-table-creation \
  ${PINOT_IMAGE} AddTable \
  -schemaFile examples/batch/airlineStats/airlineStats_schema.json \
  -tableConfigFile examples/batch/airlineStats/airlineStats_offline_table_config.json \
  -controllerHost pinot-controller \
  -controllerPort 9000 \
  -exec
```
Sample Console Output

```
Executing command: AddTable -tableConfigFile examples/batch/airlineStats/airlineStats_offline_table_config.json -schemaFile examples/batch/airlineStats/airlineStats_schema.json -controllerHost pinot-controller -controllerPort 9000 -exec
Sending request: http://pinot-controller:9000/schemas to controller: a413b0013806, version: Unknown
{"status":"Table airlineStats_OFFLINE succesfully added"}
```
Using launcher scripts

```shell
bin/pinot-admin.sh AddTable \
  -schemaFile examples/batch/airlineStats/airlineStats_schema.json \
  -tableConfigFile examples/batch/airlineStats/airlineStats_offline_table_config.json \
  -exec
```

Streaming Table Creation

Please see Streaming Tables for table configuration details and how to customize it.
Docker

Start Kafka

```shell
docker run \
  --network pinot-demo --name=kafka \
  -e KAFKA_ZOOKEEPER_CONNECT=pinot-zookeeper:2181/kafka \
  -e KAFKA_BROKER_ID=0 \
  -e KAFKA_ADVERTISED_HOST_NAME=kafka \
  -d wurstmeister/kafka:latest
```
Create a Kafka topic

```shell
docker exec \
  -t kafka \
  /opt/kafka/bin/kafka-topics.sh \
  --zookeeper pinot-zookeeper:2181/kafka \
  --partitions=1 --replication-factor=1 \
  --create --topic flights-realtime
```
Create a streaming table

```shell
docker run \
  --network=pinot-demo \
  --name pinot-streaming-table-creation \
  ${PINOT_IMAGE} AddTable \
  -schemaFile examples/stream/airlineStats/airlineStats_schema.json \
  -tableConfigFile examples/docker/table-configs/airlineStats_realtime_table_config.json \
  -controllerHost pinot-controller \
  -controllerPort 9000 \
  -exec
```
Sample output

```
Executing command: AddTable -tableConfigFile examples/docker/table-configs/airlineStats_realtime_table_config.json -schemaFile examples/stream/airlineStats/airlineStats_schema.json -controllerHost pinot-controller -controllerPort 9000 -exec
Sending request: http://pinot-controller:9000/schemas to controller: 8fbe601012f3, version: Unknown
{"status":"Table airlineStats_REALTIME succesfully added"}
```
Using launcher scripts

Start Kafka-Zookeeper

```shell
bin/pinot-admin.sh StartZookeeper -zkPort 2191
```

Start Kafka

```shell
bin/pinot-admin.sh StartKafka -zkAddress=localhost:2191/kafka -port 19092
```
Create a streaming table

```shell
bin/pinot-admin.sh AddTable \
  -schemaFile examples/stream/airlineStats/airlineStats_schema.json \
  -tableConfigFile examples/stream/airlineStats/airlineStats_realtime_table_config.json \
  -exec
```

Load Data

Now that the table is configured, let's load some data. Data can be loaded in batch mode or streaming mode. See the ingestion overview page for details. Loading data involves generating Pinot segments from raw data and pushing them to the Pinot cluster.

Load Data in Batch

You can generate and push segments to Pinot via standalone scripts or using frameworks such as Hadoop or Spark. See this page for more details on setting up data ingestion jobs.
The example below uses the standalone mode.
Docker

```shell
docker run \
  --network=pinot-demo \
  --name pinot-data-ingestion-job \
  ${PINOT_IMAGE} LaunchDataIngestionJob \
  -jobSpecFile examples/docker/ingestion-job-specs/airlineStats.yaml
```
Sample Console Output

```
SegmentGenerationJobSpec:
!!org.apache.pinot.spi.ingestion.batch.spec.SegmentGenerationJobSpec
excludeFileNamePattern: null
executionFrameworkSpec: {extraConfigs: null, name: standalone, segmentGenerationJobRunnerClassName: org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner,
  segmentTarPushJobRunnerClassName: org.apache.pinot.plugin.ingestion.batch.standalone.SegmentTarPushJobRunner,
  segmentUriPushJobRunnerClassName: org.apache.pinot.plugin.ingestion.batch.standalone.SegmentUriPushJobRunner}
includeFileNamePattern: glob:**/*.avro
inputDirURI: examples/batch/airlineStats/rawdata
jobType: SegmentCreationAndTarPush
outputDirURI: examples/batch/airlineStats/segments
overwriteOutput: true
pinotClusterSpecs:
- {controllerURI: 'http://pinot-controller:9000'}
pinotFSSpecs:
- {className: org.apache.pinot.spi.filesystem.LocalPinotFS, configs: null, scheme: file}
pushJobSpec: {pushAttempts: 2, pushParallelism: 1, pushRetryIntervalMillis: 1000,
  segmentUriPrefix: null, segmentUriSuffix: null}
recordReaderSpec: {className: org.apache.pinot.plugin.inputformat.avro.AvroRecordReader,
  configClassName: null, configs: null, dataFormat: avro}
segmentNameGeneratorSpec: null
tableSpec: {schemaURI: 'http://pinot-controller:9000/tables/airlineStats/schema',
  tableConfigURI: 'http://pinot-controller:9000/tables/airlineStats', tableName: airlineStats}

Trying to create instance for class org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner
Initializing PinotFS for scheme file, classname org.apache.pinot.spi.filesystem.LocalPinotFS
Finished building StatsCollector!
Collected stats for 403 documents
Created dictionary for INT column: FlightNum with cardinality: 386, range: 14 to 7389
Using fixed bytes value dictionary for column: Origin, size: 294
Created dictionary for STRING column: Origin with cardinality: 98, max length in bytes: 3, range: ABQ to VPS
Created dictionary for INT column: Quarter with cardinality: 1, range: 1 to 1
Created dictionary for INT column: LateAircraftDelay with cardinality: 50, range: -2147483648 to 303
......
......
Pushing segment: airlineStats_OFFLINE_16085_16085_29 to location: http://pinot-controller:9000 for table airlineStats
Sending request: http://pinot-controller:9000/v2/segments?tableName=airlineStats to controller: a413b0013806, version: Unknown
Response for pushing table airlineStats segment airlineStats_OFFLINE_16085_16085_29 to location http://pinot-controller:9000 - 200: {"status":"Successfully uploaded segment: airlineStats_OFFLINE_16085_16085_29 of table: airlineStats"}
Pushing segment: airlineStats_OFFLINE_16084_16084_30 to location: http://pinot-controller:9000 for table airlineStats
Sending request: http://pinot-controller:9000/v2/segments?tableName=airlineStats to controller: a413b0013806, version: Unknown
Response for pushing table airlineStats segment airlineStats_OFFLINE_16084_16084_30 to location http://pinot-controller:9000 - 200: {"status":"Successfully uploaded segment: airlineStats_OFFLINE_16084_16084_30 of table: airlineStats"}
```
Using launcher scripts

```shell
bin/pinot-admin.sh LaunchDataIngestionJob \
  -jobSpecFile examples/batch/airlineStats/ingestionJobSpec.yaml
```
The job spec YAML file has all the information regarding the data format, the input data location, and the Pinot cluster coordinates. Note that this assumes the controller is RUNNING so the job can fetch the table config and schema; if not, you will have to configure the spec to point at their location. See Pinot Ingestion Job for more details.
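Distilled from the sample console output above, the skeleton of a standalone job spec looks roughly like this (the paths and URIs are the example values from above; adjust them for your own setup):

```yaml
executionFrameworkSpec:
  name: standalone
  segmentGenerationJobRunnerClassName: org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner
  segmentTarPushJobRunnerClassName: org.apache.pinot.plugin.ingestion.batch.standalone.SegmentTarPushJobRunner
jobType: SegmentCreationAndTarPush
inputDirURI: examples/batch/airlineStats/rawdata
includeFileNamePattern: glob:**/*.avro
outputDirURI: examples/batch/airlineStats/segments
overwriteOutput: true
pinotFSSpecs:
  - scheme: file
    className: org.apache.pinot.spi.filesystem.LocalPinotFS
recordReaderSpec:
  dataFormat: avro
  className: org.apache.pinot.plugin.inputformat.avro.AvroRecordReader
tableSpec:
  tableName: airlineStats
pinotClusterSpecs:
  - controllerURI: 'http://pinot-controller:9000'
pushJobSpec:
  pushAttempts: 2
  pushRetryIntervalMillis: 1000
```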

Load Data in Streaming

Kafka

Docker

Run the command below to stream JSON data into the Kafka topic flights-realtime.

```shell
docker run \
  --network pinot-demo \
  --name=loading-airlineStats-data-to-kafka \
  ${PINOT_IMAGE} StreamAvroIntoKafka \
  -avroFile examples/stream/airlineStats/sample_data/airlineStats_data.avro \
  -kafkaTopic flights-realtime -kafkaBrokerList kafka:9092 -zkAddress pinot-zookeeper:2181/kafka
```
Using launcher scripts

Run the command below to stream JSON data into the Kafka topic flights-realtime.

```shell
bin/pinot-admin.sh StreamAvroIntoKafka \
  -avroFile examples/stream/airlineStats/sample_data/airlineStats_data.avro \
  -kafkaTopic flights-realtime -kafkaBrokerList localhost:19092 -zkAddress localhost:2191/kafka
```