Running Pinot in Production

Requirements

You will need the following in order to run Pinot in production:

  • Hardware for controller/broker/servers as per your load

  • Working installation of ZooKeeper that Pinot can use. We recommend setting aside a path within ZooKeeper and including that path in pinot.controller.zkStr. Pinot will create its own cluster under this path (the cluster name is decided by pinot.controller.helixClusterName); see the configuration sketch after this list.

  • Shared storage mounted on controllers (if you plan to have multiple controllers for the same cluster). Alternatively, an implementation of PinotFS that the Pinot hosts have access to.

  • HTTP load balancers for spraying queries across brokers (or another mechanism to balance queries)

  • HTTP load balancers for spraying controller requests (e.g., segment push or other controller API calls), or another mechanism to distribute these requests
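
For example, a minimal controller configuration that reserves a dedicated ZooKeeper path might look like the following sketch (the ZooKeeper hosts, the /pinot chroot, and the PinotCluster name are illustrative assumptions, not defaults):

# controller.conf (sketch) - Pinot will create its cluster under the /pinot path
pinot.controller.zkStr=zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181/pinot
pinot.controller.helixClusterName=PinotCluster

$ bin/pinot-admin.sh StartController -configFileName controller.conf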

Deploying Pinot

In general, when deploying Pinot services, it is best to adhere to a specific ordering in which the various components should be deployed. This ensures that, if a release introduces protocol changes or other significant differences between components, the deployments go out in a predictable order and failures due to those changes can be avoided.

The ordering is as follows:

  1. pinot-controller

  2. pinot-broker

  3. pinot-server

  4. pinot-minion
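
For example, using the pinot-admin.sh utility described below, a rollout in this order might look like the following sketch (the ZooKeeper address and cluster name are illustrative assumptions):

$ bin/pinot-admin.sh StartController -zkAddress zk1.example.com:2181 -clusterName PinotCluster
$ bin/pinot-admin.sh StartBroker -zkAddress zk1.example.com:2181 -clusterName PinotCluster
$ bin/pinot-admin.sh StartServer -zkAddress zk1.example.com:2181 -clusterName PinotCluster
$ bin/pinot-admin.sh StartMinion -zkAddress zk1.example.com:2181 -clusterName PinotCluster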

Managing Pinot

Pinot provides a web-based management console and a command-line utility (pinot-admin.sh) to help provision and manage Pinot clusters.

Pinot Management Console

The web-based management console allows operations on tables, tenants, segments, and schemas. You can access the console via http://controller-host:port/help. The console also allows you to enter queries for interactive debugging. Here are some screenshots from the console.

Listing all the schemas in the Pinot cluster:

Rebalancing segments of a table:
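
The same operations are also exposed through the controller's REST API, which can be handy for scripting; a sketch, assuming a controller at localhost:9000 and the baseballStats table used in the examples below:

# List all schemas in the cluster
$ curl -s http://localhost:9000/schemas
# Rebalance the segments of an offline table
$ curl -s -X POST "http://localhost:9000/tables/baseballStats/rebalance?type=OFFLINE"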

Command line utility (pinot-admin.sh)

The command line utility (pinot-admin.sh) can be generated by running mvn install package -DskipTests -Pbin-dist in the directory in which you checked out Pinot.
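
For example, from a fresh checkout (a sketch; the version in the generated directory name will vary by release):

$ git clone https://github.com/apache/pinot.git
$ cd pinot
$ mvn install package -DskipTests -Pbin-dist
# The utility is generated under the distribution target directory
$ ls pinot-distribution/target/apache-pinot-*-bin/apache-pinot-*-bin/bin/pinot-admin.sh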

Here is an example of invoking the command to create a Pinot segment:

$ ./pinot-distribution/target/apache-pinot-0.8.0-SNAPSHOT-bin/apache-pinot-0.8.0-SNAPSHOT-bin/bin/pinot-admin.sh CreateSegment -dataDir /Users/host1/Desktop/test/ -format CSV -outDir /Users/host1/Desktop/test2/ -tableName baseballStats -segmentName baseballStats_data -overwrite -schemaFile ./pinot-distribution/target/apache-pinot-0.8.0-SNAPSHOT-bin/apache-pinot-0.8.0-SNAPSHOT-bin/sample_data/baseballStats_schema.json
Executing command: CreateSegment  -generatorConfigFile null -dataDir /Users/host1/Desktop/test/ -format CSV -outDir /Users/host1/Desktop/test2/ -overwrite true -tableName baseballStats -segmentName baseballStats_data -timeColumnName null -schemaFile ./pinot-distribution/target/apache-pinot-0.8.0-SNAPSHOT-bin/apache-pinot-0.8.0-SNAPSHOT-bin/sample_data/baseballStats_schema.json -readerConfigFile null -enableStarTreeIndex false -starTreeIndexSpecFile null -hllSize 9 -hllColumns null -hllSuffix _hll -numThreads 1
Accepted files: [/Users/host1/Desktop/test/baseballStats_data.csv]
Finished building StatsCollector!
Collected stats for 97889 documents
Created dictionary for INT column: homeRuns with cardinality: 67, range: 0 to 73
Created dictionary for INT column: playerStint with cardinality: 5, range: 1 to 5
Created dictionary for INT column: groundedIntoDoublePlays with cardinality: 35, range: 0 to 36
Created dictionary for INT column: numberOfGames with cardinality: 165, range: 1 to 165
Created dictionary for INT column: AtBatting with cardinality: 699, range: 0 to 716
Created dictionary for INT column: stolenBases with cardinality: 114, range: 0 to 138
Created dictionary for INT column: tripples with cardinality: 32, range: 0 to 36
Created dictionary for INT column: hitsByPitch with cardinality: 41, range: 0 to 51
Created dictionary for STRING column: teamID with cardinality: 149, max length in bytes: 3, range: ALT to WSU
Created dictionary for INT column: numberOfGamesAsBatter with cardinality: 166, range: 0 to 165
Created dictionary for INT column: strikeouts with cardinality: 199, range: 0 to 223
Created dictionary for INT column: sacrificeFlies with cardinality: 20, range: 0 to 19
Created dictionary for INT column: caughtStealing with cardinality: 36, range: 0 to 42
Created dictionary for INT column: baseOnBalls with cardinality: 154, range: 0 to 232
Created dictionary for STRING column: playerName with cardinality: 11976, max length in bytes: 43, range:  to Zoilo Casanova
Created dictionary for INT column: doules with cardinality: 64, range: 0 to 67
Created dictionary for STRING column: league with cardinality: 7, max length in bytes: 2, range: AA to UA
Created dictionary for INT column: yearID with cardinality: 143, range: 1871 to 2013
Created dictionary for INT column: hits with cardinality: 250, range: 0 to 262
Created dictionary for INT column: runsBattedIn with cardinality: 175, range: 0 to 191
Created dictionary for INT column: G_old with cardinality: 166, range: 0 to 165
Created dictionary for INT column: sacrificeHits with cardinality: 54, range: 0 to 67
Created dictionary for INT column: intentionalWalks with cardinality: 45, range: 0 to 120
Created dictionary for INT column: runs with cardinality: 167, range: 0 to 192
Created dictionary for STRING column: playerID with cardinality: 18107, max length in bytes: 9, range: aardsda01 to zwilldu01
Start building IndexCreator!
Finished records indexing in IndexCreator!
Finished segment seal!
Converting segment: /Users/host1/Desktop/test2/baseballStats_data_0 to v3 format
v3 segment location for segment: baseballStats_data_0 is /Users/host1/Desktop/test2/baseballStats_data_0/v3
Deleting files in v1 segment directory: /Users/host1/Desktop/test2/baseballStats_data_0
Driver, record read time : 369
Driver, stats collector time : 0
Driver, indexing time : 373

Here is an example of executing a query on a Pinot table:

$ ./pinot-distribution/target/apache-pinot-0.8.0-SNAPSHOT-bin/apache-pinot-0.8.0-SNAPSHOT-bin/bin/pinot-admin.sh PostQuery -query "select count(*) from baseballStats"
Executing command: PostQuery -brokerHost [broker_host] -brokerPort [broker_port] -query select count(*) from baseballStats
Result: {"aggregationResults":[{"function":"count_star","value":"97889"}],"exceptions":[],"numServersQueried":1,"numServersResponded":1,"numSegmentsQueried":1,"numSegmentsProcessed":1,"numSegmentsMatched":1,"numDocsScanned":97889,"numEntriesScannedInFilter":0,"numEntriesScannedPostFilter":0,"numGroupsLimitReached":false,"totalDocs":97889,"timeUsedMs":107,"segmentStatistics":[],"traceInfo":{}}

Monitoring Pinot

Pinot exposes several metrics to monitor the service and ensure that Pinot users are not experiencing issues. In this section we discuss some of the key metrics that are useful to monitor. A full list of metrics is available in the Metrics section.

Pinot Server

    • Missing Segments - NUM_MISSING_SEGMENTS: Number of missing segments that the broker queried for (expected to be on the server) but the server didn't have. This can be due to retention or a stale routing table.

    • Query latency - TOTAL_QUERY_TIME: Total time taken from receiving the query to finishing its execution.

    • Query Execution Exceptions - QUERY_EXECUTION_EXCEPTIONS: The number of exceptions that occurred during query execution.

    • Realtime Consumption Status - LLC_PARTITION_CONSUMING: A binary value indicating whether low-level consumption is healthy (1) or unhealthy (0). It's important to ensure that at least a single replica of each partition is consuming.

    • Realtime Highest Offset Consumed - HIGHEST_STREAM_OFFSET_CONSUMED: The highest offset that has been consumed so far.
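
These metrics are emitted via JMX. A common way to collect them is to attach the Prometheus JMX exporter as a Java agent when starting each component; the sketch below assumes an agent jar and exporter config at illustrative paths:

$ export JAVA_OPTS="-javaagent:/opt/jmx_prometheus_javaagent.jar=8008:/opt/pinot/jmx_pinot.yml"
$ bin/pinot-admin.sh StartServer -zkAddress zk1.example.com:2181 -clusterName PinotCluster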

Pinot Broker

    • Incoming QPS (per broker) - QUERIES: The rate at which an individual broker is receiving queries, in queries per second.

    • Dropped Requests - REQUEST_DROPPED_DUE_TO_SEND_ERROR, REQUEST_DROPPED_DUE_TO_CONNECTION_ERROR, REQUEST_DROPPED_DUE_TO_ACCESS_ERROR: These metrics indicate when a query is dropped, i.e. the processing of that query has been forfeited for some reason.

    • Partial Responses - BROKER_RESPONSES_WITH_PARTIAL_SERVERS_RESPONDED: A count of partial responses. A partial response occurs when at least one of the requested servers fails to respond to the query.

    • Table QPS quota exceeded - QUERY_QUOTA_EXCEEDED: A binary metric indicating whether the configured QPS quota for a table has been exceeded (1) or there is capacity remaining (0).

    • Table QPS quota usage percent - QUERY_QUOTA_CAPACITY_UTILIZATION_RATE: Percentage of the configured QPS quota being utilized.
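
Brokers also expose a simple health endpoint that works well for load-balancer checks (a sketch, assuming the default broker port 8099):

$ curl -s http://localhost:8099/health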

Pinot Controller

Many of the controller metrics include a table name and thus are dynamically generated in the code. The metrics below point to the classes which generate the corresponding metrics.

To get the real metric name, the easiest route is to spin up a controller instance, create a table with the desired name, and look through the generated metrics.

Todo

Give a more detailed explanation of how metrics are generated, how to identify real metric names, and where to find them in the code.

    • Percent Segments Available - PERCENT_SEGMENTS_AVAILABLE: Percentage of complete online replicas in external view as compared to replicas in ideal state.

    • Segments in Error State - SEGMENTS_IN_ERROR_STATE: Number of segments in an ERROR state for a given table.

    • Last push delay - Generated in the ValidationMetrics class: The time in hours since the last time an offline segment was pushed to the controller.

    • Percent of replicas up - PERCENT_OF_REPLICAS: Percentage of complete online replicas in external view as compared to replicas in ideal state.

    • Table storage quota usage percent - TABLE_STORAGE_QUOTA_UTILIZATION: Shows how much of the table's storage quota is currently being used, as a percentage of the entire quota.
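
Several of these metrics compare a table's external view against its ideal state; when debugging, you can inspect both directly through the controller API (a sketch, assuming a controller at localhost:9000 and the baseballStats table from the examples above):

$ curl -s http://localhost:9000/tables/baseballStats/idealstate
$ curl -s http://localhost:9000/tables/baseballStats/externalview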
