Query Correlation ID

Tracking a query across a distributed cluster can be difficult. To make this easier, Pinot assigns a correlation ID (also known as cid) to each query it receives. This correlation ID is added to the Mapped Diagnostic Context (MDC), so it can be included in the logs. For example, with the default Log4j2 configuration provided by pinot-tools, a query whose correlation ID is "1234" produces log lines like the following:

2025/06/04 13:59:17.405 INFO [QueryLogger] [jersey-server-managed-async-executor-0] [cid=1234] SQL query for request 1234: SELECT ...
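
Because the correlation ID appears as a `[cid=...]` field in each log line, the lines belonging to one query can be filtered out of a larger log. The following is a minimal sketch in Python, assuming the log layout matches the sample line above. The helper names `extract_cid` and `lines_for_cid` are hypothetical and are not part of Pinot, and the second sample line is made up for illustration.

```python
import re

# Matches the "[cid=...]" field as it appears in the sample log line above.
# Assumption: the log pattern in use keeps this bracketed layout.
CID_PATTERN = re.compile(r"\[cid=([^\]]*)\]")


def extract_cid(line):
    """Return the correlation ID found in a log line, or None if absent."""
    match = CID_PATTERN.search(line)
    return match.group(1) if match else None


def lines_for_cid(lines, cid):
    """Keep only the log lines tagged with the given correlation ID."""
    return [line for line in lines if extract_cid(line) == cid]


sample = [
    # Sample line from the documentation.
    "2025/06/04 13:59:17.405 INFO [QueryLogger] "
    "[jersey-server-managed-async-executor-0] [cid=1234] "
    "SQL query for request 1234: SELECT ...",
    # Hypothetical second line, modeled on the first, for illustration only.
    "2025/06/04 13:59:17.410 INFO [QueryLogger] "
    "[jersey-server-managed-async-executor-1] [cid=5678] "
    "SQL query for request 5678: SELECT ...",
]

matching = lines_for_cid(sample, "1234")
```

If the Log4j2 pattern has been customized, the regular expression would need to be adjusted to match the actual layout.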

Custom Correlation ID

By default, Pinot assigns a random correlation ID to a query as soon as the broker receives it, but clients can provide their own ID by using the clientQueryId query option. When this option is set, Pinot uses its value as the correlation ID. For example, the following query:

set clientQueryId='myCustomCid';
select * 
from userAttributes
limit 10

will use myCustomCid as its correlation ID, so the logs will look something like:

2025/06/04 13:59:17.405 INFO [QueryLogger] [jersey-server-managed-async-executor-0] [cid=myCustomCid] SQL query for request 1234: SELECT ...

This feature can be helpful when queries are produced by programs that already have their own correlation IDs. Keep in mind that correlation IDs generated by Pinot are unique, but different clients can provide the same clientQueryId, and even a single client may reuse the same clientQueryId for two different queries. For this reason, it is recommended to use high-cardinality IDs such as UUIDs when providing a custom clientQueryId.
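
One way to follow the UUID recommendation is to generate the ID on the client and prepend the `set clientQueryId` statement to the query text. The sketch below, in Python, only builds the query string. How the query is then submitted to the broker is outside its scope. The function name `with_client_query_id` is hypothetical, and rejecting IDs that contain a single quote is a defensive assumption of this sketch rather than a documented Pinot rule.

```python
import uuid


def with_client_query_id(sql, client_query_id=None):
    """Prefix a SQL query with a `set clientQueryId` statement.

    Returns a (client_query_id, full_query) tuple. When no ID is supplied,
    a random UUID is generated so that collisions are unlikely.
    """
    if client_query_id is None:
        client_query_id = str(uuid.uuid4())
    # Assumption of this sketch: refuse IDs that would break the quoted literal.
    if "'" in client_query_id:
        raise ValueError("clientQueryId must not contain a single quote")
    full_query = f"set clientQueryId='{client_query_id}';\n{sql}"
    return client_query_id, full_query


cid, query = with_client_query_id("select * from userAttributes limit 10")
```

Returning the generated ID alongside the query lets the caller record it in its own logs, so application logs and Pinot logs can be joined on the same value.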

The clientQueryId option is also used for Query Cancellation.
