release-1.1.0

Controller

Most Pinot properties are set in ZooKeeper via Helix. However, you can also set properties in the controller.conf file. Specify the path to the configuration file at startup as follows:

bin/pinot-admin.sh StartController -configFileName /path/to/controller.conf

The controller.conf file supports the following properties.

Primary configuration

| Property | Default | Description |
| --- | --- | --- |
| controller.vip.host | same as controller.host | VIP hostname used to set the download URL for segments |
| controller.vip.port | same as controller.port | |
| controller.vip.protocol | | |
| controller.host | localhost | Hostname or IP address of the host on which the controller runs |
| controller.port | 9000 | Port on which the controller runs |
| controller.access.protocol | | |
| controller.data.dir | ${java.io.tmpdir}/PinotController | Directory that hosts segment data |
| controller.local.temp.dir | | |
| controller.zk.str | localhost:2181 | ZooKeeper connection string (host:port) |
| controller.update_segment_state_model | false | |
| controller.helix.cluster.name | | Name of the Pinot cluster (required) |
| cluster.tenant.isolation.enable | true | Enables tenant isolation; by default the cluster is single-tenant |
| controller.enable.split.commit | false | |
| controller.query.console.useHttps | false | Use HTTPS instead of HTTP for the cluster |
| controller.upload.onlineToOfflineTimeout | 2 minutes | |
| controller.mode | dual | Must be one of helix_only, pinot_only, or dual |
| controller.resource.rebalance.strategy | org.apache.helix.controller.rebalancer.strategy.AutoRebalanceStrategy | |
| controller.realtime.segment.commit.timeoutSeconds | 120 seconds | Request timeout for segment commit |
| controller.deleted.segments.retentionInDays | 7 days | Number of days to retain deleted segments |
| controller.admin.access.control.factory.class | org.apache.pinot.controller.api.access.AllowAllAccessFactory | |
| controller.segment.upload.timeoutInMillis | 10 minutes (600000) | Timeout for segment uploads |
| controller.realtime.segment.metadata.commit.numLocks | 64 | |
| controller.enable.storage.quota.check | true | |
| controller.enable.batch.message.mode | false | |
| controller.allow.hlc.tables | true | |
| controller.storage.factory.class.file | org.apache.pinot.spi.filesystem.LocalPinotFS | |
| table.minReplicas | 1 | |
| controller.access.protocols | http | Ingress protocols for accessing the controller: http, https, or http,https |
| controller.access.protocols.http.port | | Port for accessing the controller over HTTP |
| controller.access.protocols.https.port | | Port for accessing the controller over HTTPS |
| controller.broker.protocol | http | Protocol used to forward query requests to brokers (http or https) |
| controller.broker.port.override | | Overrides the broker port used when forwarding query requests (useful in multi-ingress setups) |
| controller.tls.keystore.path | | Path to the controller TLS keystore |
| controller.tls.keystore.password | | Password for the controller TLS keystore |
| controller.tls.truststore.path | | Path to the controller TLS truststore |
| controller.tls.truststore.password | | Password for the controller TLS truststore |
| controller.tls.client.auth | false | Whether to require TLS client authentication |
| pinot.controller.http.server.thread.pool.corePoolSize | 2 * cores | Core size of the thread pool used by the controller's HTTP server |
| pinot.controller.http.server.thread.pool.maxPoolSize | 2 * cores | Maximum size of the thread pool used by the controller's HTTP server |
| pinot.controller.segment.fetcher.http.client.maxConnTotal | | Maximum total connections for the HTTP client that HttpSegmentFetcher uses to download segments |
| pinot.controller.segment.fetcher.http.client.maxConnPerRoute | | Maximum connections per route for the HTTP client that HttpSegmentFetcher uses to download segments |
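As a rough illustration of how some of these defaults interact, the following Java sketch reads controller.conf-style key=value text with java.util.Properties and falls back to a few documented defaults (controller.port 9000, controller.zk.str localhost:2181, and controller.vip.host/controller.vip.port falling back to controller.host/controller.port). The ControllerConfSketch class and its methods are hypothetical and are not Pinot's own configuration classes; Pinot's actual loader may parse the file differently.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.file.Paths;
import java.util.Properties;

// Hypothetical reader for controller.conf text, NOT Pinot's ControllerConf class.
// It applies a few of the defaults listed in the table above.
final class ControllerConfSketch {
  private final Properties props = new Properties();

  private ControllerConfSketch() {}

  // Parses key=value lines using java.util.Properties syntax.
  static ControllerConfSketch fromText(String confText) {
    ControllerConfSketch conf = new ControllerConfSketch();
    try {
      conf.props.load(new StringReader(confText));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return conf;
  }

  String host() {
    return props.getProperty("controller.host", "localhost");
  }

  int port() {
    return Integer.parseInt(props.getProperty("controller.port", "9000"));
  }

  // controller.vip.host / controller.vip.port fall back to controller.host / controller.port.
  String vipHost() {
    return props.getProperty("controller.vip.host", host());
  }

  int vipPort() {
    String value = props.getProperty("controller.vip.port");
    return value == null ? port() : Integer.parseInt(value);
  }

  String zkStr() {
    return props.getProperty("controller.zk.str", "localhost:2181");
  }

  String dataDir() {
    return props.getProperty("controller.data.dir",
        Paths.get(System.getProperty("java.io.tmpdir"), "PinotController").toString());
  }

  // controller.helix.cluster.name has no default and is required.
  String clusterName() {
    String name = props.getProperty("controller.helix.cluster.name");
    if (name == null || name.trim().isEmpty()) {
      throw new IllegalStateException("controller.helix.cluster.name is required");
    }
    return name.trim();
  }

  public static void main(String[] args) {
    ControllerConfSketch conf = fromText(String.join("\n",
        "controller.helix.cluster.name=PinotCluster",
        "controller.zk.str=zk-1:2181",
        "controller.port=9000"));
    System.out.println(conf.clusterName() + " -> " + conf.vipHost() + ":" + conf.vipPort()
        + " (ZooKeeper " + conf.zkStr() + ", data dir " + conf.dataDir() + ")");
  }
}
```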

Periodic task configuration

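The controller runs the following periodic tasks. Each task has a configurable run frequency and a configurable initial delay before its first run.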

BrokerResourceValidationManager

This task rebuilds the BrokerResource if the instance set has changed.

| Config | Default Value |
| --- | --- |
| controller.broker.resource.validation.frequencyPeriod | 1h |
| controller.broker.resource.validation.initialDelayInSeconds | between 2m and 5m |
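The periodic tasks in this section share the same two settings: a period string such as 1h or 24h, and an initial delay between 2m and 5m. Below is a minimal Java sketch of that scheduling model, assuming a simple <number><unit> period syntax (units d, h, m, s) and a uniformly random initial delay from 120 to 300 seconds. PeriodicTaskScheduleSketch, parsePeriod, and randomInitialDelaySeconds are hypothetical names, not Pinot APIs, and Pinot's own period parser may accept more formats.

```java
import java.time.Duration;
import java.util.Locale;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of how a controller periodic task could be scheduled.
final class PeriodicTaskScheduleSketch {
  // One or more <number><unit> parts, e.g. "1h", "24h", "5m", "1h30m".
  private static final Pattern PART = Pattern.compile("(\\d+)([dhms])");

  static Duration parsePeriod(String period) {
    String s = period.trim().toLowerCase(Locale.ROOT);
    Matcher m = PART.matcher(s);
    Duration total = Duration.ZERO;
    int end = 0;
    while (m.find()) {
      // Parts must be adjacent; anything between them makes the period invalid.
      if (m.start() != end) {
        throw new IllegalArgumentException("Invalid period: " + period);
      }
      long n = Long.parseLong(m.group(1));
      switch (m.group(2)) {
        case "d": total = total.plusDays(n); break;
        case "h": total = total.plusHours(n); break;
        case "m": total = total.plusMinutes(n); break;
        default: total = total.plusSeconds(n); break;
      }
      end = m.end();
    }
    if (end == 0 || end != s.length()) {
      throw new IllegalArgumentException("Invalid period: " + period);
    }
    return total;
  }

  // Assumed: uniform random delay in [120 s, 300 s], i.e. "between 2m and 5m".
  static long randomInitialDelaySeconds() {
    return ThreadLocalRandom.current().nextLong(120, 301);
  }

  public static void main(String[] args) {
    ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
    long periodSeconds = parsePeriod("1h").getSeconds();
    long delaySeconds = randomInitialDelaySeconds();
    scheduler.scheduleAtFixedRate(
        () -> System.out.println("running BrokerResourceValidationManager"),
        delaySeconds, periodSeconds, TimeUnit.SECONDS);
    System.out.println("first run in " + delaySeconds + "s, then every " + periodSeconds + "s");
    // Demo only: stop before the first run instead of waiting for it.
    scheduler.shutdownNow();
  }
}
```

Randomizing the initial delay spreads the first runs of the different tasks over a few minutes after startup instead of starting them all at once.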

StaleInstancesCleanupTask

This task periodically cleans up stale Pinot broker/server/minion instances.

| Config | Default Value |
| --- | --- |
| controller.stale.instances.cleanup.task.frequencyPeriod | 1h |
| controller.stale.instances.cleanup.task.initialDelaySeconds | between 2m and 5m |
| controller.stale.instances.cleanup.task.minOfflineTimeBeforeDeletionPeriod | 1h |
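A minimal sketch of the minOfflineTimeBeforeDeletionPeriod threshold: an instance becomes a cleanup candidate once it has been offline for at least the configured period (1h by default). StaleInstanceSketch and its offlineSince input are hypothetical, the instance names are made up, and the real task may apply further checks before deleting an instance.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical filter: only the minOfflineTimeBeforeDeletionPeriod threshold is modeled.
final class StaleInstanceSketch {
  static List<String> staleInstances(Map<String, Instant> offlineSince, Instant now, Duration minOffline) {
    return offlineSince.entrySet().stream()
        // An instance qualifies once it has been offline for at least minOffline.
        .filter(e -> Duration.between(e.getValue(), now).compareTo(minOffline) >= 0)
        .map(Map.Entry::getKey)
        .sorted()
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    Instant now = Instant.parse("2024-01-01T12:00:00Z");
    Map<String, Instant> offlineSince = Map.of(
        "Server_host-a_8098", now.minus(Duration.ofHours(2)),
        "Broker_host-b_8099", now.minus(Duration.ofMinutes(30)),
        "Minion_host-c_9514", now.minus(Duration.ofHours(1)));
    // Uses the default threshold from the table above (1h).
    System.out.println(staleInstances(offlineSince, now, Duration.ofHours(1)));
    // prints [Minion_host-c_9514, Server_host-a_8098]
  }
}
```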

OfflineSegmentIntervalChecker

This task maintains the segment ValidationMetrics (missingSegmentCount, offlineSegmentDelayHours, lastPushTimeDelayHours, TotalDocumentCount, NonConsumingPartitionCount, SegmentCount) to ensure that all offline segments are contiguous (no missing segments) and that the offline push delay isn't too high.

| Config | Default Value |
| --- | --- |
| controller.offline.segment.interval.checker.frequencyPeriod | 24h |
| controller.statuschecker.waitForPushTimePeriod | 10m |
| controller.offlineSegmentIntervalChecker.initialDelayInSeconds | between 2m and 5m |
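To make "contiguous (no missing segments)" concrete, here is a hedged sketch that counts uncovered days between offline segments. It assumes each segment covers an inclusive range of whole days; MissingSegmentSketch and DayInterval are hypothetical, and Pinot's actual missingSegmentCount computation may differ, for example in the time granularity it expects.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical: counts uncovered days between offline segments, assuming each
// segment covers an inclusive range of days [startDay, endDay] (e.g. epoch days).
final class MissingSegmentSketch {
  static final class DayInterval {
    final long startDay;
    final long endDay;

    DayInterval(long startDay, long endDay) {
      if (endDay < startDay) {
        throw new IllegalArgumentException("endDay < startDay");
      }
      this.startDay = startDay;
      this.endDay = endDay;
    }
  }

  static long missingDays(List<DayInterval> segments) {
    List<DayInterval> sorted = new ArrayList<>(segments);
    sorted.sort(Comparator.comparingLong((DayInterval s) -> s.startDay));
    long missing = 0;
    boolean first = true;
    long coveredUntil = 0;
    for (DayInterval s : sorted) {
      if (!first && s.startDay > coveredUntil + 1) {
        // Days strictly between the covered range and this segment have no data.
        missing += s.startDay - coveredUntil - 1;
      }
      coveredUntil = first ? s.endDay : Math.max(coveredUntil, s.endDay);
      first = false;
    }
    return missing;
  }

  public static void main(String[] args) {
    List<DayInterval> segments = List.of(
        new DayInterval(0, 0), new DayInterval(1, 1), new DayInterval(4, 4),
        new DayInterval(5, 6), new DayInterval(9, 9));
    System.out.println(missingDays(segments)); // prints 4 (days 2, 3, 7, 8)
  }
}
```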

PinotTaskManager

TBD

RealtimeSegmentValidationManager

This task validates the ideal state and segment ZK metadata of real-time tables by:

  • fixing any partitions that have stopped consuming

  • starting consumption from new partitions

  • uploading segments to the deep store if the segment download URL is missing

This task ensures that consumption of real-time tables recovers and keeps going when it encounters error conditions.

This task does not fix consumption that has stalled because of:

  • a CONSUMING segment being deleted

  • Kafka offset out of range (OOR) exceptions

| Config | Default Value |
| --- | --- |
| controller.realtime.segment.validation.frequencyPeriod | 1h |
| controller.realtime.segment.validation.initialDelayInSeconds | between 2m and 5m |
| controller.realtime.segment.deepStoreUploadRetryEnabled | false |
| controller.realtime.segment.deepStoreUploadRetry.timeoutMs | -1 |
| controller.realtime.segment.deepStoreUploadRetry.parallelism | 1 |

RetentionManager

This task manages segment retention for all tables. During each run, it reads retentionTimeUnit and retentionTimeValue from the segmentsConfig of every table and deletes segments that are older than the retention period. Deleted segments are moved to a DeletedSegments folder colocated with the dataDir on the segment store, and are permanently deleted from that folder after the number of days set by controller.deleted.segments.retentionInDays.

| Config | Default Value |
| --- | --- |
| controller.retention.frequencyPeriod | 6h |
| controller.retentionManager.initialDelayInSeconds | between 2m and 5m |
| controller.deleted.segments.retentionInDays | 7d |
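The two retention rules described above can be sketched as simple time comparisons. This assumes a segment is judged by its end time and that retentionTimeUnit is a java.util.concurrent.TimeUnit name such as DAYS; RetentionSketch, isExpired, and canPurgeDeleted are hypothetical helpers, not RetentionManager code.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical retention checks; not Pinot's RetentionManager code.
final class RetentionSketch {
  // retentionTimeUnit / retentionTimeValue as in a table's segmentsConfig, e.g. "DAYS" and 7.
  static boolean isExpired(long segmentEndTimeMs, long nowMs, String retentionTimeUnit, long retentionTimeValue) {
    long retentionMs = TimeUnit.valueOf(retentionTimeUnit).toMillis(retentionTimeValue);
    // A segment is deleted only when it is strictly older than the retention period.
    return nowMs - segmentEndTimeMs > retentionMs;
  }

  // Segments in the DeletedSegments folder are purged after
  // controller.deleted.segments.retentionInDays days.
  static boolean canPurgeDeleted(long deletedAtMs, long nowMs, int deletedRetentionInDays) {
    return nowMs - deletedAtMs > TimeUnit.DAYS.toMillis(deletedRetentionInDays);
  }

  public static void main(String[] args) {
    long now = TimeUnit.DAYS.toMillis(100);
    long segmentEnd = TimeUnit.DAYS.toMillis(90);
    System.out.println(isExpired(segmentEnd, now, "DAYS", 7));  // prints true
    System.out.println(isExpired(segmentEnd, now, "DAYS", 30)); // prints false
  }
}
```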

SegmentRelocator

This task applies only if you have a tierConfig or tagOverrideConfig. It runs a rebalance in the background to:

  1. relocate COMPLETED segments to tag overrides

  2. relocate ONLINE segments to tiers if tier configs are set

At most one replica is allowed to be unavailable during rebalance.

| Config | Default Value |
| --- | --- |
| controller.segment.relocator.frequencyPeriod | 1h |
| controller.segmentRelocator.initialDelayInSeconds | between 2m and 5m |

SegmentStatusChecker

This task manages segment status metrics such as realtimeTableCount, offlineTableCount, disableTableCount, numberOfReplicas, percentOfReplicas, percentOfSegments, idealStateZnodeSize, idealStateZnodeByteSize, segmentCount, segmentsInErrorState, tableCompressedSize.

| Config | Default Value |
| --- | --- |
| controller.statuschecker.frequencyPeriod | 5m |
| controller.statusChecker.initialDelayInSeconds | between 2m and 5m |

RebalanceChecker

A user-triggered table rebalance currently runs on a best-effort basis. It can fail if the controller running it restarts, or if some servers are unstable and the rebalance times out while waiting for the external view to converge with the ideal state. This task detects failed rebalances and retries them automatically, up to a configured number of attempts.

| Config | Default Value |
| --- | --- |
| controller.rebalance.checker.frequencyPeriod | 5m |
| controller.rebalanceChecker.initialDelayInSeconds | between 2m and 5m |
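The retry behavior can be pictured as a filter over recorded rebalance jobs: failed jobs that have not used up their attempts are picked up again on the next run. The job model below (RebalanceJob, Status, attempts, maxAttempts) is hypothetical and only illustrates the idea; it does not reflect how Pinot stores rebalance job metadata or names its retry settings.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical model of rebalance jobs; field names are illustrative only.
final class RebalanceRetrySketch {
  enum Status { IN_PROGRESS, COMPLETED, FAILED }

  static final class RebalanceJob {
    final String tableName;
    final Status status;
    final int attempts;

    RebalanceJob(String tableName, Status status, int attempts) {
      this.tableName = tableName;
      this.status = status;
      this.attempts = attempts;
    }
  }

  // Retry failed jobs that still have attempts left; leave all other jobs alone.
  static List<String> tablesToRetry(List<RebalanceJob> jobs, int maxAttempts) {
    return jobs.stream()
        .filter(j -> j.status == Status.FAILED && j.attempts < maxAttempts)
        .map(j -> j.tableName)
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<RebalanceJob> jobs = List.of(
        new RebalanceJob("orders_OFFLINE", Status.FAILED, 1),
        new RebalanceJob("events_REALTIME", Status.FAILED, 3),
        new RebalanceJob("users_OFFLINE", Status.COMPLETED, 1));
    System.out.println(tablesToRetry(jobs, 3)); // prints [orders_OFFLINE]
  }
}
```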

TaskMetricsEmitter

TBD
