Monitoring Metrics
Pinot provides metrics out of the box so that you can monitor the performance and robustness of a Pinot cluster. Most metrics are available at either the table level or the instance level. For plotting purposes, the metrics fall into the following major types:
  • Gauge - These represent a single value at any point in time.
  • Meter - These represent the rate of a metric over a time window, such as per minute or per 5 minutes.
  • Timer - These record durations and can be used to fetch the average duration over the last 5 minutes, the 75th, 99th, and 99.9th percentile values, min and max values, etc.
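
As a rough illustration of how the three types differ, the sketch below models each one as a small Python class. The class and method names are hypothetical and are not Pinot's implementation; the percentile uses a simple nearest-rank rule chosen only for the example.

```python
import math
import time
from collections import deque


class Gauge:
    """Holds a single value that can be read at any point in time."""

    def __init__(self, value=0):
        self.value = value

    def set(self, value):
        self.value = value


class Meter:
    """Records event timestamps and reports a rate over a trailing window."""

    def __init__(self):
        self.events = deque()  # event timestamps in seconds

    def mark(self, now=None):
        self.events.append(time.time() if now is None else now)

    def rate_per_minute(self, window_seconds, now=None):
        now = time.time() if now is None else now
        recent = sum(1 for t in self.events if now - t <= window_seconds)
        return recent * 60.0 / window_seconds


class Timer:
    """Records durations and reports summary statistics."""

    def __init__(self):
        self.durations_ms = []

    def record(self, duration_ms):
        self.durations_ms.append(duration_ms)

    def mean(self):
        return sum(self.durations_ms) / len(self.durations_ms)

    def percentile(self, p):
        # Nearest-rank percentile (an assumption made for this sketch).
        ordered = sorted(self.durations_ms)
        rank = max(1, math.ceil(p * len(ordered) / 100.0))
        return ordered[rank - 1]
```

A gauge is read as-is, a meter is read as a rate over a window (for example `rate_per_minute(300)` for the last 5 minutes), and a timer is read through aggregates such as `mean()` or `percentile(99.9)`.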

Pinot Server

Metric Name
Description
Metric type
LLC-PARTITION-CONSUMING
This gives a binary value based on whether low-level consumption is healthy (1) or unhealthy (0). It’s important to ensure at least a single replica of each partition is consuming.
Gauge
HIGHEST-STREAM-OFFSET-CONSUMED
The highest offset which has been consumed so far
Gauge
DOCUMENT_COUNT
total number of records in table
Gauge
SEGMENT_COUNT
total number of segments in table
Gauge
UPSERT_PRIMARY_KEYS_COUNT
total unique primary keys in table
Gauge
LAST_REALTIME_SEGMENT_CREATION_DURATION_SECONDS
time in seconds it took for latest realtime segment to get created
Gauge
LAST_REALTIME_SEGMENT_CREATION_WAIT_TIME_SECONDS
time in seconds it took for segment creation to start (generally due to waiting for a lock to get acquired)
Gauge
LAST_REALTIME_SEGMENT_INITIAL_CONSUMPTION_DURATION_SECONDS
time in seconds spent consuming records for latest segment
Gauge
LAST_REALTIME_SEGMENT_CATCHUP_DURATION_SECONDS
time in seconds spent on catching up to the latest offset in metadata. This can happen when multiple servers are consuming from same partition.
Gauge
LAST_REALTIME_SEGMENT_COMPLETION_DURATION_SECONDS
time in seconds between when we stopped consuming records and when the segment gets committed
Gauge
REALTIME_OFFHEAP_MEMORY_USED
Off-heap memory in bytes currently used by realtime segments
Gauge
REALTIME_SEGMENT_NUM_PARTITIONS
Number of partitions for a table
Gauge
LLC_SIMULTANEOUS_SEGMENT_BUILDS
Number of segments being built currently
Gauge
ROWS_WITH_ERRORS
number of rows that either didn't get transformed or didn't get indexed.
Meter
REALTIME_ROWS_CONSUMED
total number of records consumed from input
Meter
INVALID_REALTIME_ROWS_DROPPED
number of records that were filtered based on FilterConfig specified in table config
Meter
REALTIME_CONSUMPTION_EXCEPTIONS
number of rows that were not consumed because of some exception. It doesn't track exceptions during transformation and indexing.
Meter
RELOAD_FAILURES
Number of failures that occurred while reloading segments
Meter
REFRESH_FAILURES
Number of failures that occurred while refreshing segments
Meter
UNTAR_FAILURES
Number of failures that occurred while uncompressing segments
Meter
SEGMENT_DOWNLOAD_FAILURES
Number of failures that occurred while downloading segments from deep store to local storage
Meter
DELETED_SEGMENT_COUNT
Number of segments deleted either because of retention policies, explicit delete request etc.
Meter
QUERIES
Number of queries executed
Meter
QUERY_EXECUTION_EXCEPTIONS
Number of exceptions encountered during query execution
Meter
NUM-MISSING-SEGMENTS
Number of missing segments that the broker queried for (expected to be on the server) but the server didn’t have. This can be due to retention or stale routing table
Meter
NO_TABLE_ACCESS
number of query requests for which table access was denied either due to table not being present or access control restrictions.
Meter
HELIX_ZOOKEEPER_RECONNECTS
Number of times Server instance re-connected to zookeeper.
Meter
NETTY_CONNECTION_BYTES_RECEIVED
total bytes received by the server
Meter
NETTY_CONNECTION_BYTES_SENT
total bytes sent by the server
Meter
NETTY_CONNECTION_RESPONSES_SENT
total responses sent by the server
Meter
FRESHNESS_LAG_MS
time period between when the data was last updated in the table and the current time
Timer
NETTY_CONNECTION_SEND_RESPONSE_LATENCY
time spent in sending response to brokers after the results are available
Timer
EXECUTION_THREAD_CPU_TIME_NS
time spent by all threads processing the query and results (doesn't include time spent in system activities)
Timer
SYSTEM_ACTIVITIES_CPU_TIME_NS
time spent in nanoseconds processing query on the servers (only counts system activities such as GC, OS paging etc.)
Timer
RESPONSE_SER_CPU_TIME_NS
time spent in nanoseconds serializing query response on servers
Timer
TOTAL_CPU_TIME_NS
total time spent in nanoseconds processing query on the servers
Timer
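
The LLC-PARTITION-CONSUMING description above says that at least one replica of each partition should be consuming. A minimal sketch of that check follows, assuming the gauge values have already been collected into a dictionary keyed by `(partition_id, server_name)`; the function name and input shape are hypothetical, and how you gather the values depends on your metrics system.

```python
def partitions_without_consumer(gauge_by_replica):
    """Return the partitions for which no replica reports a healthy gauge.

    gauge_by_replica maps (partition_id, server_name) to the
    LLC-PARTITION-CONSUMING value: 1 (healthy) or 0 (unhealthy).
    """
    healthy = {}
    for (partition, _server), value in gauge_by_replica.items():
        # A partition is fine as soon as any one replica reports 1.
        healthy[partition] = healthy.get(partition, False) or value == 1
    return sorted(p for p, ok in healthy.items() if not ok)
```

An alert would typically fire whenever the returned list is non-empty.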

The following timers track the time spent, in milliseconds, in the various phases of query execution:

Metric Name
Description
Metric Type
REQUEST_DESERIALIZATION
Time spent in deserializing query request
Timer
SEGMENT_PRUNING
Time spent in Segment Pruning
Timer
BUILD_QUERY_PLAN
Time spent in building query plan
Timer
QUERY_PLAN_EXECUTION
Time spent in executing query plan
Timer
QUERY_PROCESSING
Total Time spent in processing the query request from receiving the parsed query to getting data. Doesn't include ser-de time.
Timer
SCHEDULER_WAIT
Time spent in the scheduler queue waiting for the query to be executed
Timer
RESPONSE_SERIALIZATION
Time spent in serializing query response
Timer
TOTAL_QUERY_TIME
Total time taken from receiving the query to returning the response.
Timer

Pinot Broker

Metric Name
Description
Metric Type
UNHEALTHY_SERVERS
Number of unhealthy servers detected
Gauge
QUERY_QUOTA_CAPACITY_UTILIZATION_RATE
percentage of configured rate limit being used on each broker
Gauge
MAX_BURST_QPS
Gauge
QUERY_RATE_LIMIT_DISABLED
1 if the rate limit is disabled on the broker, 0 otherwise
Gauge
REQUEST_SIZE
Query String length on each broker
Gauge
RESIZE_TIME_MS
Time spent resizing results for the output, whether because of LIMIT, the maximum allowed number of group-by keys, or any other criterion
Gauge
QUERIES
The rate at which an individual broker is receiving queries, in QPS
Meter
REQUEST_COMPILATION_EXCEPTIONS
Number of queries which failed during compilation
Meter
RESOURCE_MISSING_EXCEPTIONS
Number of queries for which the table doesn't exist
Meter
QUERY_VALIDATION_EXCEPTIONS
Number of invalid queries
Meter
UNKNOWN_COLUMN_EXCEPTIONS
Number of queries with unknown columns
Meter
NO_SERVER_FOUND_EXCEPTIONS
Number of queries for which no server was found to contain its data
Meter
REQUEST_TIMEOUT_BEFORE_SCATTERED_EXCEPTIONS
Number of times query timed out before even being sent to the servers
Meter
REQUEST_CHANNEL_LOCK_TIMEOUT_EXCEPTIONS
Number of times a query failed while trying to acquire a lock on server connections
Meter
REQUEST_SEND_EXCEPTIONS
Number of queries failed while sending to server
Meter
RESPONSE_FETCH_EXCEPTIONS
Number of queries failed while handling response from servers
Meter
DATA_TABLE_DESERIALIZATION_EXCEPTIONS
Number of queries failed while deserializing response data from servers
Meter
RESPONSE_MERGE_EXCEPTIONS
Number of queries that failed while merging responses from multiple servers. This can be due to schema inconsistency or other issues
Meter
BROKER_RESPONSES_WITH_PROCESSING_EXCEPTIONS
Number of queries where at least one exception occurred
Meter
BROKER_RESPONSES_WITH_PARTIAL_SERVERS_RESPONDED
Number of queries with incomplete results due to missing responses from servers
Meter
BROKER_RESPONSES_WITH_NUM_GROUPS_LIMIT_REACHED
Number of queries where total number of groups exceeded configured limit (default limit - 100K)
Meter
DOCUMENTS_SCANNED
Total number of documents read from segments in each query
Meter
ENTRIES_SCANNED_IN_FILTER
Meter
ENTRIES_SCANNED_POST_FILTER
Meter
NUM_RESIZES
Number of result resizes for queries
Meter
REQUEST_DROPPED_DUE_TO_ACCESS_ERROR
Number of queries dropped due to invalid access permissions on table
Meter
GROUP_BY_SIZE
Number of rows in group by queries
Meter
TOTAL_SERVER_RESPONSE_SIZE
Total number of bytes received from servers for queries
Meter
QUERY_QUOTA_EXCEEDED
Number of queries failed due to query rate limit being breached
Meter
NO_SERVING_HOST_FOR_SEGMENT
Number of segments per query for which no servers are available
Meter
SERVER_MISSING_FOR_ROUTING
Number of servers that could not be added to routing table for query
Meter
NETTY_CONNECTION_REQUESTS_SENT
total number of requests sent to servers
Meter
NETTY_CONNECTION_BYTES_SENT
total bytes sent to servers
Meter
NETTY_CONNECTION_BYTES_RECEIVED
total bytes received from servers
Meter
PROACTIVE_CLUSTER_CHANGE_CHECK
Number of requests raised to zookeeper to check the cluster state such as IDEAL STATES, EXTERNAL VIEW etc.
Meter
HELIX_ZOOKEEPER_RECONNECTS
Number of times broker instance re-connected to zookeeper.
Meter
CLUSTER_CHANGE_QUEUE_TIME
Time spent in milliseconds in queue for cluster change requests
Timer
FRESHNESS_LAG_MS
time period between when the data was last updated in the table and the current time
Timer
NETTY_CONNECTION_SEND_REQUEST_LATENCY
latency of sending the request from broker to server
Timer
OFFLINE_THREAD_CPU_TIME_NS
aggregated thread cpu time in nanoseconds for query processing from offline servers
Timer
REALTIME_THREAD_CPU_TIME_NS
aggregated thread cpu time in nanoseconds for query processing from realtime servers
Timer
OFFLINE_SYSTEM_ACTIVITIES_CPU_TIME_NS
aggregated system activities cpu time in nanoseconds for query processing from offline servers (e.g. GC, OS paging etc.)
Timer
REALTIME_SYSTEM_ACTIVITIES_CPU_TIME_NS
aggregated system activities cpu time in nanoseconds for query processing from realtime servers (e.g. GC, OS paging etc.)
Timer
OFFLINE_RESPONSE_SER_CPU_TIME_NS
aggregated response serialization cpu time in nanoseconds for query processing from offline servers
Timer
REALTIME_RESPONSE_SER_CPU_TIME_NS
aggregated response serialization cpu time in nanoseconds for query processing from realtime servers
Timer
OFFLINE_TOTAL_CPU_TIME_NS
aggregated total cpu time (thread + system activities + response serialization) in nanoseconds for query processing from offline servers
Timer
REALTIME_TOTAL_CPU_TIME_NS
aggregated total cpu time (thread + system activities + response serialization) in nanoseconds for query processing from realtime servers
Timer
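
The total CPU time rows above are described as thread + system activities + response serialization. The sketch below shows that arithmetic and derives each component's share of the total, which can help spot, for example, a query workload dominated by GC. The function name and the returned structure are hypothetical; only the sum itself comes from the descriptions above.

```python
def cpu_time_breakdown(thread_ns, system_activities_ns, response_ser_ns):
    """Return (total_ns, shares), where shares maps each component to its
    fraction of the total CPU time."""
    total_ns = thread_ns + system_activities_ns + response_ser_ns
    if total_ns == 0:
        # No CPU time recorded, so there is nothing to apportion.
        return 0, {}
    shares = {
        "thread": thread_ns / total_ns,
        "system_activities": system_activities_ns / total_ns,
        "response_serialization": response_ser_ns / total_ns,
    }
    return total_ns, shares
```

The same calculation applies to the offline and the realtime variants of these timers.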

The following timers track the time spent, in milliseconds, in the various phases of query execution:

Metric Name
Description
Metric Type
REQUEST_COMPILATION
Time spent in compiling SQL query
Timer
QUERY_EXECUTION
Total time spent in query execution
Timer
QUERY_ROUTING
Time spent in creating a routing table for segments
Timer
SCATTER_GATHER
Time spent in sending and collecting responses from servers.
Timer
REDUCE
Time spent in combining query results from multiple servers
Timer
AUTHORIZATION
Time spent checking table access after query compilation
Timer

Pinot Controller

Metric Name
Description
Metric Type
PERCENT_SEGMENTS_AVAILABLE
Percentage of complete online replicas in external view as compared to replicas in ideal state
Gauge
NUMBER_OF_REPLICAS
Total number of replicas available for table
Gauge
SEGMENTS_IN_ERROR_STATE
Number of segments in an ERROR state for a given table.
Gauge
TABLE_STORAGE_QUOTA_UTILIZATION
Shows how much of the table’s storage quota is currently being used, expressed as a percentage of the entire quota.
Gauge
LAST_PUSH_TIME_DELAY_HOURS
The time in hours since the last time an offline segment has been pushed to the controller.
Gauge
HEALTHCHECK_OK_CALLS
Number of health check requests for which controller was healthy
Meter
HEALTHCHECK_BAD_CALLS
Number of health check requests for which controller was unhealthy
Meter
CONTROLLER_INSTANCE_POST_ERROR
Errors occurred while updating state for an instance (server and broker)
Meter
CONTROLLER_SEGMENT_UPLOAD_ERROR
Errors occurred while sending a segment upload request
Meter
CONTROLLER_SCHEMA_UPLOAD_ERROR
Errors occurred while uploading schema
Meter
CONTROLLER_TABLE_SCHEMA_UPDATE_ERROR
Errors occurred while updating schema
Meter
CONTROLLER_TABLE_ADD_ERROR
Errors occurred while adding table config
Meter
CONTROLLER_TABLE_UPDATE_ERROR
Errors occurred while updating table config
Meter
CONTROLLER_TABLE_TENANT_UPDATE_ERROR
Errors occurred while updating a Tenant
Meter
CONTROLLER_TABLE_TENANT_CREATE_ERROR
Errors occurred while creating a Tenant
Meter
CONTROLLER_TABLE_TENANT_DELETE_ERROR
Errors occurred while deleting a Tenant
Meter
CONTROLLER_REALTIME_TABLE_SEGMENT_ASSIGNMENT_ERROR
Errors occurred while assigning a realtime segment to a server instance
Meter
CONTROLLER_LEADERSHIP_CHANGE_WITHOUT_CALLBACK
Number of times a controller loses/gains leadership without a callback from Helix
Meter
CONTROLLER_PERIODIC_TASK_RUN
Number of Periodic tasks running currently
Meter
CONTROLLER_PERIODIC_TASK_ERROR
Number of Periodic tasks that failed due to error
Meter
NUMBER_TIMES_SCHEDULE_TASKS_CALLED
Minion tasks schedule request sent to controller
Meter
NUMBER_TASKS_SUBMITTED
Number of minion tasks submitted to the controller.
Meter
NUMBER_SEGMENT_UPLOAD_TIMEOUT_EXCEEDED
Number of segment uploads that failed due to timeout. In this case the controller re-uploads the segments itself.
Meter
CRON_SCHEDULER_JOB_TRIGGERED
Number of minion tasks triggered that use cron
Meter
NUMBER_ADHOC_TASKS_SUBMITTED
Number of minion ad-hoc tasks submitted
Meter
LLC_STATE_MACHINE_ABORTS
Number of times a realtime segment commit operation was aborted
Meter
LLC_ZOOKEEPER_FETCH_FAILURES
Number of Zookeeper metadata fetch requests failed
Meter
LLC_ZOOKEEPER_UPDATE_FAILURES
Number of Zookeeper metadata update requests failed
Meter
LLC_STREAM_DATA_LOSS
Indicates data loss for a table, either because the offsets available to consume from the topic are larger than the last offset stored in Pinot, or because a segment was lost in the CONSUMING state
Meter
HELIX_ZOOKEEPER_RECONNECTS
Number of times controller instance re-connected to zookeeper.
Meter
CRON_SCHEDULER_JOB_EXECUTION_TIME_MS
Time spent in scheduling cron jobs
Timer
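
To make the "online replicas in external view as compared to replicas in ideal state" description above concrete, here is a minimal sketch of such a percentage. It assumes both views are available as dictionaries of the form `{segment: {server: state}}`; the function name and data shapes are hypothetical, and this is an illustration of the description rather than the controller's actual computation.

```python
def percent_online_replicas(ideal_state, external_view):
    """Percentage of replicas marked ONLINE in the ideal state that are
    also ONLINE in the external view."""
    expected = 0
    online = 0
    for segment, replicas in ideal_state.items():
        for server, state in replicas.items():
            if state != "ONLINE":
                continue  # only count replicas that are meant to be online
            expected += 1
            if external_view.get(segment, {}).get(server) == "ONLINE":
                online += 1
    # With nothing expected, report 100 (a choice made for this sketch).
    return 100.0 * online / expected if expected else 100.0
```

A value below 100 means that some replicas the ideal state expects to be online are missing or in another state in the external view.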

Pinot Minion

Metric Name
Description
Metric Type
NUMBER_OF_TASKS
Number of tasks currently running
Gauge
NUMBER_TASKS_EXECUTED
Number of tasks triggered in Minion
Meter
NUMBER_TASKS_COMPLETED
Number of tasks completed successfully
Meter
NUMBER_TASKS_CANCELLED
Number of tasks that were cancelled
Meter
NUMBER_TASKS_FAILED
Number of tasks that failed
Meter
NUMBER_TASKS_FATAL_FAILED
Number of tasks that failed with unretryable exceptions
Meter
TASK_QUEUEING
Time spent by tasks in queue
Timer
TASK_EXECUTION
Time spent by tasks in execution
Timer