# Query Options

## Query Options

Query options are per-query switches that let you choose the execution engine, control runtime limits, and adjust diagnostics.

### The options that matter most

* `useMultistageEngine` selects the multi-stage engine for queries that need advanced SQL features.
* `timeoutMs` bounds how long Pinot should spend on the query.
* `clientQueryId` gives the query a stable identifier for tracking and cancellation.
* `explainPlanVerbose` asks Pinot for richer plan output.
* `maxExecutionThreads` limits concurrency for expensive queries.

```sql
SET useMultistageEngine = true;
SET timeoutMs = 5000;
SET clientQueryId = 'query-2026-03-24-001';
SELECT city, COUNT(*)
FROM stores
GROUP BY city
LIMIT 10;
```

### When to reach for options

Use query options when you want to change behavior without changing table configuration.

Typical cases include:

* lowering timeout for a user-facing request
* forcing multi-stage execution for a query that needs joins
* attaching a custom query ID for logs and cancellation
* widening or narrowing execution resources for a single query

### Where the deeper knobs live

Some query behavior is controlled by dedicated pages rather than a raw option list:

* use [Query quotas](/build-with-pinot/querying-and-sql/query-execution-controls/query-quotas.md) for rate limiting
* use [Query cancellation](/build-with-pinot/querying-and-sql/query-execution-controls/query-cancellation.md) when a query must be stopped
* use [Cursor pagination](/build-with-pinot/querying-and-sql/query-execution-controls/query-using-cursors.md) when the result set is too large

### What this page covered

This page covered the main query-level switches, why you would set them, and which dedicated pages handle related controls.

### Next step

If you need to track or stop a query, read [Query cancellation](/build-with-pinot/querying-and-sql/query-execution-controls/query-cancellation.md). If you need to shape result delivery, read [Cursor pagination](/build-with-pinot/querying-and-sql/query-execution-controls/query-using-cursors.md).

### Related pages

* [Querying Pinot](/build-with-pinot/querying-and-sql/querying-pinot.md)
* [SQL syntax](/build-with-pinot/querying-and-sql/sql-syntax.md)
* [Query quotas](/build-with-pinot/querying-and-sql/query-execution-controls/query-quotas.md)
* [Correlation IDs](/build-with-pinot/querying-and-sql/query-execution-controls/query-correlation-id.md)

***

### description: This document contains all the available query options

## Query Options

### Supported Query Options

| Key                                                 | Description                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               | Default Behavior                                                                                                                                                                                                                                                                                         |
| --------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **timeoutMs**                                       | Timeout of the query in milliseconds                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      | Use table/broker level timeout                                                                                                                                                                                                                                                                           |
| **extraPassiveTimeoutMs**                           | For multi-stage queries, add this many milliseconds to the passive deadline Pinot uses while waiting on upstream stages or mailbox data. This does not change the active query deadline from `timeoutMs`; it only gives passive waits more time before Pinot treats them as timed out. This overrides the broker-level `pinot.broker.extraPassiveTimeoutMs` setting for the query.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        | Broker level config (default `100`)                                                                                                                                                                                                                                                                      |
| **enableNullHandling**                              | Enables advanced null handling. See [Null value support](/build-with-pinot/querying-and-sql/sql-syntax/null-value-support.md) for more information.(since 0.11.0)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         | `false` (disabled)                                                                                                                                                                                                                                                                                       |
| **explainPlanVerbose**                              | Return verbose result for `EXPLAIN` query (since 0.11.0)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  | `false` (not verbose)                                                                                                                                                                                                                                                                                    |
| **useMultistageEngine**                             | Use multi-stage engine to execute the query (since 0.11.0)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                | `false` (use single-stage engine)                                                                                                                                                                                                                                                                        |
| **useMSEToFillEmptyResponseSchema**                 | For single-stage queries that return zero rows, try the multi-stage engine compiler to populate a more accurate empty result schema instead of relying only on table metadata or the default single-stage fallback. This overrides the broker-level `pinot.broker.use.mse.to.fill.empty.response.schema` setting for the query. If the query does not compile in the multi-stage engine, Pinot falls back to the normal empty-response schema behavior.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   | `false`                                                                                                                                                                                                                                                                                                  |
| **maxExecutionThreads**                             | Maximum threads to use to execute the query. Useful to limit the resource usage for expensive queries                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     | Half of the CPU cores for non-group-by queries; all CPU cores for group-by queries                                                                                                                                                                                                                       |
| **numReplicaGroupsToQuery**                         | When replica-group based routing is enabled, use it to query multiple replica-groups (since 0.11.0)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       | `1` (only query servers within the same replica-group)                                                                                                                                                                                                                                                   |
| **useFixedReplica**                                 | Route each segment to a consistent replica for the query instead of relying on the broker's normal per-query server selection. This overrides the table-level or broker-level fixed-replica setting for the query. When replica-group based routing is enabled, Pinot still honors `numReplicaGroupsToQuery`; fixed-replica routing only makes the replica choice deterministic within each queried replica-group.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        | Use the table-level or broker-level fixed-replica setting; when neither is set, `false`                                                                                                                                                                                                                  |
| **sortAggregateLimitThreshold**                     | For group-by query, in case of ordering by all group keys, use sort-aggregation when query `LIMIT` is below this threshold. Under such cases, sort-aggregation allows trimming that does not affect result correctness, and is often more efficient than hash aggregation due to significantly reduced memory pressure. Example query that uses sort-aggregation: `SELECT col1, COUNT(*) FROM t1 GROUP BY col1, col2 ORDER BY col1, col2 DESC LIMIT X;` , where `X` is smaller than this threshold.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       | `10000`                                                                                                                                                                                                                                                                                                  |
| **sortAggregateSingleThreadedNumSegmentsThreshold** | When sort-aggregation is used under the above cases described in `sortAggregateLimitThreshold`, use single-threaded sequential combine instead of parallel pair-wise combine when the number of segments to merge is below this threshold.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                | Number of CPU cores                                                                                                                                                                                                                                                                                      |
| **numGroupsLimit**                                  | For group-by queries, cap how many groups each Pinot query operator keeps before it stops admitting new groups. This applies to both leaf and intermediate stages in the multi-stage engine. When the limit is reached, Pinot keeps aggregating rows for groups it has already seen and sets `numGroupsLimitReached=true` in the response metadata. See [Grouping algorithm](/build-with-pinot/querying-and-sql/sql-syntax/grouping-algorithm.md) for the related server config and trimming behavior.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    | Server level config (default `100000`)                                                                                                                                                                                                                                                                   |
| **numGroupsWarningLimit**                           | For group-by queries, set the warning threshold for the number of groups a query operator accumulates. When the threshold is reached, Pinot logs a warning and sets `numGroupsWarningLimitReached=true` in the response metadata, but continues query execution.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          | Server level config (default `150000`)                                                                                                                                                                                                                                                                   |
| **errorOnNumGroupsLimit**                           | For multi-stage group-by queries, throw an exception instead of returning partial results when `numGroupsLimit` or the aggregate hint `num_groups_limit` is reached. The aggregate hint `error_on_num_groups_limit='true'` provides the same behavior for a specific aggregate node.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      | `false`                                                                                                                                                                                                                                                                                                  |
| **allowReverseOrder**                               | For single-stage selection queries, allow Pinot to read a sorted segment in descending order for `ORDER BY ... DESC` instead of scanning the segment in ascending order and reordering it. This can enable early termination when the full sorted prefix of the `ORDER BY` clause is descending and the query has a small `LIMIT`. If the segment cannot provide descending blocks, or if later `ORDER BY` expressions still require additional ordering work, Pinot falls back to the normal path and can still scan the full segment.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   | `false`                                                                                                                                                                                                                                                                                                  |
| **minInitialIndexedTableCapacity**                  | For group-by queries, set the minimum initial capacity Pinot uses when creating `IndexedTable` instances to merge grouped results. Higher values can reduce rehashing for queries with many groups, but increase memory usage for smaller queries. This query option overrides both the server-side and broker-side defaults.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             | `128`                                                                                                                                                                                                                                                                                                    |
| **minSegmentGroupTrimSize**                         | Minimum groups to keep when trimming groups at the segment level for group-by queries. See [#configuration-parameters](/build-with-pinot/querying-and-sql/sql-syntax/grouping-algorithm.md#configuration-parameters)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      | Server level config                                                                                                                                                                                                                                                                                      |
| **minServerGroupTrimSize**                          | Minimum groups to keep when trimming groups at the server level for group-by queries. See [#configuration-parameters](/build-with-pinot/querying-and-sql/sql-syntax/grouping-algorithm.md#configuration-parameters)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       | Server level config                                                                                                                                                                                                                                                                                      |
| **serverReturnFinalResult**                         | For aggregation and group-by queries, ask servers to directly return final results instead of intermediate results for aggregations. Can be applied when the group key is server partitioned, i.e. the column(s) is partitioned, and all the data for a partition is served by the same server.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           | `true` when a single server is queried, `false` otherwise                                                                                                                                                                                                                                                |
| **serverReturnFinalResultKeyUnpartitioned**         | For group-by queries, ask servers to directly return final results instead of intermediate results for aggregations. Different from **serverReturnFinalResult**, this option should be used when the group key is not server partitioned, but the aggregated column is server partitioned. It is particularly useful for distinct count queries. When this option is enabled, server will return final results, but won't directly trim the result to the query limit.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    | `false`                                                                                                                                                                                                                                                                                                  |
| **skipIndexes**                                     | Which indexes to skip usage of (i.e. scan instead), per-column. This is useful for side-by-side comparison/debugging. There can be cases where the use of an index is actually more expensive than performing a scan of the docs which match other filters. One such example could be a low-selectivity inverted index used in conjunction with another highly selective filter. Config can be specified using url parameter format: `skipIndexes='col1=inverted,range&col2=inverted'` Possible index types to skip are: `sorted, range, inverted, H3`. To find out which indexes are used to resolve a given query, use the `EXPLAIN` query.                                                                                                                                                                                                                                                                                                                                                                                             | `null/empty` (use all available indexes)                                                                                                                                                                                                                                                                 |
| **vectorNprobe**                                    | For IVF\_FLAT, IVF\_PQ, and IVF\_ON\_DISK vector search, set how many inverted lists Pinot probes per segment before collecting ANN candidates. Higher values usually improve recall at the cost of latency. Pinot ignores this option for HNSW indexes and for the exact-scan fallback path used on segments without a vector index. See [Vector index](/build-with-pinot/indexing/vector-index.md) for the related index configuration.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 | `4` for IVF\_FLAT, IVF\_PQ, and IVF\_ON\_DISK; ignored for HNSW and exact-scan fallback                                                                                                                                                                                                                  |
| **vectorExactRerank**                               | For vector search that uses an ANN index, re-score the ANN candidates with exact distance from the forward index before Pinot returns the final top-K. This improves accuracy but does not turn ANN search into a full exact scan. See [Vector index](/build-with-pinot/indexing/vector-index.md) for the related ANN search behavior.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    | `true` for IVF\_PQ (PQ distances are approximate by construction); `false` for HNSW and IVF\_FLAT                                                                                                                                                                                                        |
| **vectorMaxCandidates**                             | When `vectorExactRerank=true`, set how many ANN candidates Pinot retrieves before exact reranking. Pinot always uses at least `topK`, and ignores this option when exact rerank is disabled. See [Vector index](/build-with-pinot/indexing/vector-index.md) for tuning guidance.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          | `topK * 10` when exact rerank is enabled                                                                                                                                                                                                                                                                 |
| **vectorDistanceThreshold**                         | For vector similarity queries, return all results within this distance threshold instead of limiting to top-K. This enables confidence-based retrieval where all results meeting a quality threshold are returned. Applies to `VECTOR_SIMILARITY` queries; the distance function used in the WHERE clause determines which threshold is applied. See [Vector Query Execution Semantics](/build-with-pinot/querying-and-sql/sql-syntax/vector-query-execution.md) for usage examples.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      | Not set (uses top-K retrieval)                                                                                                                                                                                                                                                                           |
| **vectorEfSearch**                                  | For HNSW vector search, control how many nodes the graph traversal visits (search beam width). Higher values improve recall at the cost of query latency. Typical ranges: 100–150 for low latency, 200–300 for balanced, 400–800 for high recall. Pinot ignores this option for IVF indexes and the exact-scan fallback path. Applies to both mutable (consuming) and immutable (offline) HNSW segments. See [Vector index](/build-with-pinot/indexing/vector-index.md) for tuning guidance.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              | From index config (defaults to `topK` when not set)                                                                                                                                                                                                                                                      |
| **vectorUseRelativeDistance**                       | For HNSW vector search, toggle competitive (relative-distance) pruning during graph traversal. Disabling this (`false`) can improve recall on some data distributions at the cost of higher latency. Pinot ignores this option for non-HNSW indexes. See [Vector index](/build-with-pinot/indexing/vector-index.md) for tuning guidance.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  | `true`                                                                                                                                                                                                                                                                                                   |
| **vectorUseBoundedQueue**                           | For HNSW vector search, toggle the bounded top-K collector during graph traversal. Disabling this (`false`) may improve recall in certain scenarios. Pinot ignores this option for non-HNSW indexes. See [Vector index](/build-with-pinot/indexing/vector-index.md) for tuning guidance.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  | `true`                                                                                                                                                                                                                                                                                                   |
| **skipPlannerRules**                                | Which defaultly enabled query planner rules should be disabled. This is useful when `EXPLAIN PLAN FOR` suggests a rule evaluation is taking too long or when it is known that a rule produces sub-optimal plan for the query. Currently this only applies to rules in optProgram that are mostly logical transformations. Config can be specified using rule names delimited by comma: `skipPlannerRules='FilterProjectTranspose,PruneEmptySort'` The rule name used here is consistent with the output of `EXPLAIN PLAN FOR`, which is the rule description.                                                                                                                                                                                                                                                                                                                                                                                                                                                                             | `null/empty` (no defaultly enabled rules are skipped)                                                                                                                                                                                                                                                    |
| **usePlannerRules**                                 | Which query planner rules that are disabled by default should be used. This is useful when the defaultly disabled rules could help query execution. Config can be specified using rule names delimited by comma: `usePlannerRules='AggregateJoinTransposeExtended,JoinToEnrichedJoin'`. By default this query option applies to the built-in disabled-rule set listed in [Default Disabled Rules](/build-with-pinot/querying-and-sql/query-execution-controls/default-disabled-rules.md), but brokers can replace that set with `pinot.broker.mse.planner.disabled.rules`. At this point, Pinot does not have a cost-based optimizer and the multi-stage query engine uses Calcite's HepPlanner for query optimization. The rules that are disabled by default are those that are only helpful under certain circumstances. For a more detailed description on what these rules do and when they are helpful, please see [Default Disabled Rules](/build-with-pinot/querying-and-sql/query-execution-controls/default-disabled-rules.md). | `null/empty` (no defaultly disabled rules are used)                                                                                                                                                                                                                                                      |
| **skipUpsert**                                      | For upsert-enabled table, skip the effect of upsert and query all the records. See [upsert.md](/build-with-pinot/ingestion/upsert-dedup/upsert.md)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        | `false` (exclude the replaced records)                                                                                                                                                                                                                                                                   |
| **upsertViewFreshnessMs**                           | For upsert tables using `SNAPSHOT` consistency mode, override the query-time freshness window for the upsert view. By default, Pinot uses the table's `upsertViewRefreshIntervalMs`. Smaller values can refresh the view sooner for a query, and `0` forces a refresh for every query. See [upsert.md](/build-with-pinot/ingestion/upsert-dedup/upsert.md) for the table-level setting and consistency-mode details.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      | Use the table's `upsertViewRefreshIntervalMs`                                                                                                                                                                                                                                                            |
| **skipUpsertView**                                  | For debugging upsert tables, bypass the consistent upsert view maintained by `SYNC` or `SNAPSHOT` consistency mode and query as if the table were using `NONE` mode. See [upsert.md](/build-with-pinot/ingestion/upsert-dedup/upsert.md) for the upsert-view behavior this skips.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         | `false`                                                                                                                                                                                                                                                                                                  |
| **useStarTree**                                     | Useful to debug the star-tree index (since 0.11.0)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        | `true` (use star-tree if available)                                                                                                                                                                                                                                                                      |
| **AndScanReordering**                               | [See detailed description](/operate-pinot/tuning/performance-optimization-configurations.md)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              | disabled                                                                                                                                                                                                                                                                                                 |
| **maxRowsInJoin**                                   | Configure maximum rows allowed in a join operation. This limit is applied to both the hash table build phase for the join's right input as well as the number of joined rows emitted after matching with the join's left input.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           | default value read from cluster config`pinot.query.join.max.rows` if not set, the default will be **2^20 (1024\*1024)**                                                                                                                                                                                  |
| **inPredicatePreSorted**                            | (Only apply to STRING columns) Indicates that the values in the IN clause is already sorted, so that Pinot doesn't need to sort them again at query time                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  | `false` (values in IN predicate is not pre-sorted)                                                                                                                                                                                                                                                       |
| **inPredicateLookupAlgorithm**                      | (Only apply to STRING columns) The algorithm to use to look up the dictionary ids for the IN clause values. - `DIVIDE_BINARY_SEARCH`: Sort the IN clause values and do binary search on both dictionary and IN clause values at same time to reduce the value lookups - `SCAN`: Sort the IN clause values and scan both dictionary and IN clause values to get the matching dictionary ids - `PLAIN_BINARY_SEARCH`: Do not sort the IN clause values, but directly binary search each IN clause value in the dictionary                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   | `DIVIDE_BINARY_SEARCH`                                                                                                                                                                                                                                                                                   |
| **maxServerResponseSizeBytes**                      | Long value config indicating the maximum length of the serialized response per server for a query.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        | Overriding priortiy order: 1. QueryOption -> maxServerResponseSizeBytes 2. QueryOption -> maxQueryResponseSizeBytes 3. TableConfig -> maxServerResponseSizeBytes 4. TableConfig -> maxQueryResponseSizeBytes 5. BrokerConfig -> maxServerResponseSizeBytes 6. BrokerConfig -> maxServerResponseSizeBytes |
| **maxQueryResponseSizeBytes**                       | Long value config indicating the maximum serialized response size across all servers for a query. This value is equally divided across all servers processing the query.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  | Overriding priortiy order: 1. QueryOption -> maxServerResponseSizeBytes 2. QueryOption -> maxQueryResponseSizeBytes 3. TableConfig -> maxServerResponseSizeBytes 4. TableConfig -> maxQueryResponseSizeBytes 5. BrokerConfig -> maxServerResponseSizeBytes 6. BrokerConfig -> maxServerResponseSizeBytes |
| **filteredAggregationsSkipEmptyGroups**             | This config can be set to `true` to avoid computing all the groups in a group by query with only filtered aggregations (and no non-filtered aggregations). By default, the groups are computed over all the rows returned by the main filter, even if certain rows will never match any of the aggregation filters. This is the standard SQL behavior. However, if the selectivity of the main filter is very high as compared to the selectivity of the aggregation filters, this query option can help provide a big performance boost if the empty groups aren't required. For instance, a query like `SELECT SUM(X) FILTER (WHERE Y = 1) FROM mytable` will compute the groups over all the rows in the table by default since there's no main query filter. Setting this query option to `true` in such cases can massively improve performance if there's an inverted index on column `Y` for instance.                                                                                                                             | `false` (i.e., all groups are computed by default as per standard SQL)                                                                                                                                                                                                                                   |
| **dropResults**                                     | Set dropResults=true in the config to drop the resultTable from the response. Use this option to troubleshoot a customer's query (which may have sensitive data in the result) using metadata only.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       | `false`                                                                                                                                                                                                                                                                                                  |
| **skipUnavailableServers**                          | Set skipUnavailableServers=true in the config to continue sending queries to remaining servers if dispatching a query fails.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              | `false`                                                                                                                                                                                                                                                                                                  |
| **ignoreMissingSegments**                           | Ignore `SERVER_SEGMENT_MISSING` exceptions when a routed segment is unavailable on a server. When enabled, Pinot filters those exceptions during server execution and broker-side reduce so the query can continue. This is useful during short routing propagation windows after segment deletion or movement, but the query can succeed while silently omitting data from the missing segments.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         | `false`                                                                                                                                                                                                                                                                                                  |
| **sampler**                                         | Selects a named table sampler from the table config. Use this to run a query against a sampled subset of segments, for example `SET sampler='small'`. See [Table](/reference/configuration-reference/table.md#table-samplers) for the table-level configuration.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          | `null/empty` (normal routing, no table sampler)                                                                                                                                                                                                                                                          |
| **clientQueryId**                                   | Set to define a [custom correlation ID](/build-with-pinot/querying-and-sql/query-execution-controls/query-correlation-id.md) for a query and to [cancel queries](/build-with-pinot/querying-and-sql/query-execution-controls/query-cancellation.md).                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      | `null/empty`                                                                                                                                                                                                                                                                                             |
| **applicationName**                                 | Assign the query to a named application for application-level query quotas. Use this with the [Query Quotas](/build-with-pinot/querying-and-sql/query-execution-controls/query-quotas.md) APIs when you want Pinot to rate-limit queries from a specific caller or workload. If no per-application quota is configured for the name, Pinot can fall back to the global application quota default when one exists.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         | `null/empty` (no application name)                                                                                                                                                                                                                                                                       |
| **accurateGroupByWithoutOrderBy**                   | Improves correctness of group-by queries with LIMIT but without ORDER BY by applying better trimming on servers. See PR #15844. Set `accurateGroupByWithoutOrderBy=true` to enable                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        | `false` (disabled)                                                                                                                                                                                                                                                                                       |
| **traceRuleProductions**                            | Trace planner rule productions. Specify `SET traceRuleProductions=true` to collect and return planner rules that successfully produced new relations and the relation subtree before and after the production in time order along with rule attempt timing. Useful for debugging query planning.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          | `false`                                                                                                                                                                                                                                                                                                  |
| **excludeVirtualColumns**                           | When you want to ignore virtual columns (those starting with $) in a query — such as a NATURAL JOIN where they shouldn't be participating in join condition matching. This option helps remove all virtual columns from the schema during query planning and execution making NATURAL JOIN successful. This is currently implemented in MSE.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              | `false` (virtual columns are included by default for all queries during join-match)                                                                                                                                                                                                                      |
| **useSpools**                                       | Enable stage-level spooling for multi-stage queries. When enabled, Pinot can reuse equivalent stages within a query plan instead of executing them repeatedly. See [stage-level-spooling.md](/build-with-pinot/querying-and-sql/multi-stage-query/stage-level-spooling.md) for the feature behavior and limitations.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      | Broker level config (default `false`)                                                                                                                                                                                                                                                                    |
| **usePhysicalOptimizer**                            | Enable the Physical Optimizer for the multi-stage engine (MSE). The Physical Optimizer can automatically eliminate or simplify redundant Exchanges (shuffles) for arbitrarily complex queries without requiring query hints. Must be used with `useMultistageEngine=true`. See [physical-optimizer.md](/build-with-pinot/querying-and-sql/multi-stage-query/physical-optimizer.md) for details. (introduced in 1.4.0)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     | `false` (disabled)                                                                                                                                                                                                                                                                                       |
| **sortExchangeCopyThreshold**                       | For multi-stage `ORDER BY ... LIMIT` queries, control when Pinot pushes the sort-and-limit operation below the sort exchange so each upstream worker sends only its top rows. Pinot applies the rewrite only when the effective fetch is at or below this integer threshold. If there is no `LIMIT`, or the effective fetch is larger than the threshold, Pinot keeps the normal sort exchange plan.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      | Broker level config (default `10000`)                                                                                                                                                                                                                                                                    |
| **inferRealtimeSegmentPartition**                   | For partitioned realtime tables in the multi-stage engine, infer segment partitions from realtime segment names when Pinot cannot rely on the stored segment-partition metadata. Pinot still uses the table's `segmentPartitionConfig` for the partition column, function, and partition count. If Pinot cannot infer partitions for the routed realtime segments, it falls back to unpartitioned distribution.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           | `false`                                                                                                                                                                                                                                                                                                  |
| **useLiteMode**                                     | Enable Multistage Engine Lite Mode, which runs MSE queries using a scatter-gather paradigm (like the single-stage engine) with a configurable limit on rows returned by each leaf stage instance (default 100k). This allows safe access to MSE features like window functions, subqueries, and joins at high QPS with minimal reliability risks. Requires both `useMultistageEngine=true` and `usePhysicalOptimizer=true`. See [multistage-lite-mode.md](/build-with-pinot/querying-and-sql/multi-stage-query/multistage-lite-mode.md) for details. (introduced in 1.4.0)                                                                                                                                                                                                                                                                                                                                                                                                                                                                | `false` (disabled)                                                                                                                                                                                                                                                                                       |
| **liteModeLeafStageLimit**                          | For Multistage Engine Lite Mode, override the maximum number of records a leaf stage worker is allowed to return. The value must be a positive integer. See [multistage-lite-mode.md](/build-with-pinot/querying-and-sql/multi-stage-query/multistage-lite-mode.md) for Lite Mode behavior.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               | Broker level config (default `100000`)                                                                                                                                                                                                                                                                   |
| **liteModeLeafStageFanOutAdjustedLimit**            | For Multistage Engine Lite Mode, override the fan-out-adjusted leaf-stage limit with a positive integer. Pinot divides this value by the number of workers assigned to the leaf stage and uses the quotient as the per-worker hard limit instead of `liteModeLeafStageLimit`. See [multistage-lite-mode.md](/build-with-pinot/querying-and-sql/multi-stage-query/multistage-lite-mode.md) for Lite Mode behavior.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         | Broker level config (disabled when unset or non-positive)                                                                                                                                                                                                                                                |
| **runInBroker**                                     | For Multistage Engine Lite Mode, control where Pinot runs the non-leaf stages. When `true`, Pinot runs the non-leaf stages in the broker to preserve the scatter-gather execution model. When `false`, Pinot runs the non-leaf stages on servers instead. See [multistage-lite-mode.md](/build-with-pinot/querying-and-sql/multi-stage-query/multistage-lite-mode.md) for details.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        | Broker level config (default `true` when Lite Mode is enabled)                                                                                                                                                                                                                                           |
| **useLeafServerForIntermediateStage**               | For multi-stage queries, use the same servers chosen for leaf stages as the workers for intermediate stages instead of selecting from all enabled servers. This helps control fanout and reduce data shuffling. If a query has no leaf servers to reuse, such as an empty-table query, Pinot falls back to all enabled servers for the intermediate stage.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                | `false`                                                                                                                                                                                                                                                                                                  |
| **inferPartitionHint**                              | For multi-stage joins, tell Pinot to infer partition information from the joined tables so it can use colocated execution when the table partitioning and server assignment allow it. This overrides the broker-level `pinot.broker.multistage.infer.partition.hint` setting for the query. See [Colocated join strategy](/build-with-pinot/querying-and-sql/multi-stage-query/join-strategies/colocated-join-strategy.md) for the conditions and alternatives such as explicit `tableOptions` hints.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     | Broker level config (default `false`)                                                                                                                                                                                                                                                                    |
| **orderedPreferredPools**                           | Specify a prioritized list of server pools for broker query routing, provided as a vertical bar (`\|`) separated list of pool identifiers (integers). The broker uses the list as a routing hint, attempting to route queries to the specified pools in order and falling back gracefully to other available replicas if none of the preferred pools are available. Useful for canary deployments where directing traffic to specific replica groups is desired. Currently supported for Balanced and ReplicaGroup routing strategies with Adaptive Server Selection in non-MSE mode. (introduced in 1.4.0)                                                                                                                                                                                                                                                                                                                                                                                                                               | `null/empty` (no pool preference; use default routing)                                                                                                                                                                                                                                                   |
| **workloadName**                                    | Assigns the query to a named workload for CPU and memory accounting and workload budget enforcement. This works with the host-side `accounting.workload.*` controls, and when the `workload` scheduler is enabled the same option also selects the scheduler-managed workload. See [Workload-Based Query Resource Isolation](/operate-pinot/tuning/workload-query-isolation.md) (introduced in 1.4.0)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     | `null/empty` (query belongs to the default workload with no budget enforcement)                                                                                                                                                                                                                          |
| **isSecondaryWorkload**                             | Marks the query as a secondary workload query. With the `binary_workload` scheduler, secondary queries run with limited threads. With the `workload` scheduler, maps to the configured secondary workload budget. See [Workload-Based Query Resource Isolation](/operate-pinot/tuning/workload-query-isolation.md) (introduced in 1.4.0)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  | `false`                                                                                                                                                                                                                                                                                                  |
| **enableMultiClusterRouting**                       | Enable multi-cluster querying (federation) to route queries across multiple Pinot clusters. Requires broker to be configured with remote cluster connections using `MultiClusterHelixBrokerStarter`, and only applies to logical tables. See [Multi-Cluster Querying](/build-with-pinot/querying-and-sql/multi-cluster-querying.md) for details.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          | `false` (queries only execute against local cluster)                                                                                                                                                                                                                                                     |

#### Cursor Pagination

| Key               | Description                                                                                                                                                                   | Default Behavior                 |
| ----------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -------------------------------- |
| **getCursor**     | When set to `true`, a cursor is returned instead of the complete result set. This allows clients to fetch query results incrementally, which is useful for large result sets. | `false` (return full result set) |
| **cursorNumRows** | Number of rows each cursor page should contain. Only applies when `getCursor=true`.                                                                                           | Broker level config              |

#### Explain Plan

| Key                      | Description                                                                                                                                                                                                                                                                                | Default Behavior                      |
| ------------------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | ------------------------------------- |
| **explainAskingServers** | Controls the explain behavior in the multi-stage engine. When set to `true`, servers are asked to return the segment plan. When `false`, Pinot returns only the logical plan. This overrides the broker-level `pinot.query.multistage.explain.include.segment.plan` setting for the query. | Broker level config (default `false`) |

#### Group-By Trim

| Key                        | Description                                                                                                                                                                                                                                                                                                                                                                                                        | Default Behavior                        |
| -------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | --------------------------------------- |
| **minBrokerGroupTrimSize** | Minimum number of groups to keep when trimming groups at the broker level for group-by queries (SSE only). Similar to **minSegmentGroupTrimSize** and **minServerGroupTrimSize** but applied at the broker reduce phase. Setting to a non-positive value disables broker-level trim. See [#configuration-parameters](/build-with-pinot/querying-and-sql/sql-syntax/grouping-algorithm.md#configuration-parameters) | Broker level config (default `5000`)    |
| **mseMinGroupTrimSize**    | For multi-stage group-by queries, set how many groups Pinot keeps when trimming intermediate-stage aggregation results. This provides the same override as the `mse_min_group_trim_size` aggregate hint described in [Grouping algorithm](/build-with-pinot/querying-and-sql/sql-syntax/grouping-algorithm.md#configuration-parameters).                                                                           | Server level config (default `5000`)    |
| **groupTrimThreshold**     | Threshold for group-by trimming at the broker level. Controls the maximum number of groups that can be held before trimming is triggered during the broker reduce phase.                                                                                                                                                                                                                                           | Broker level config (default `1000000`) |

#### Multi-Stage Engine Group-By Streaming

| Key                                | Description                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         | Default Behavior                                                                                                                    |
| ---------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------- |
| **streamingGroupByFlushThreshold** | For GROUP BY queries in the multi-stage engine, flushes partial group-by results when the accumulated number of groups reaches this threshold. This bounds server memory usage for high-cardinality GROUP BY queries by periodically emitting intermediate results instead of buffering all groups in memory. When set to a positive value, enables the `StreamingGroupByCombineOperator` which flushes partial results; when unset or 0, uses the standard `GroupByCombineOperator` (backward compatible). This is a middle ground between completely skipping leaf-stage group by (`is_skip_leaf_stage_group_by=true`) and full leaf-stage group by (the default). Note: result trimming is disabled in streaming mode to prevent incorrect partial aggregates. Recommended for queries with GROUP BY cardinality exceeding typical memory limits. The broker-level `pinot.broker.mse.streaming.group.by.flush.threshold` setting supplies the default for queries that do not set this option; an explicit query-level value always wins, including `0` to disable streaming mode for one query. | Broker level config when `pinot.broker.mse.streaming.group.by.flush.threshold` is set to a positive value; otherwise `0` (disabled) |

**Example: Streaming Group-By with Flush Threshold**

For a high-cardinality GROUP BY query that might cause memory pressure, you can enable streaming mode:

```sql
-- Enable multi-stage engine with streaming group-by
SET useMultistageEngine = true;
SET streamingGroupByFlushThreshold = 5000;

-- Query with potentially high cardinality GROUP BY
SELECT user_id, country, COUNT(*) as event_count
FROM events
WHERE date = '2026-04-01'
GROUP BY user_id, country
LIMIT 100000;
```

In this example, the query will flush partial group-by results every time the number of accumulated groups reaches 5000, preventing unbounded memory growth. This is especially useful for queries where the GROUP BY cardinality is unknown or very high.

#### Join and Window Overflow

| Key                    | Description                                                                                                                                                                   | Default Behavior                                                                                                    |
| ---------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------- |
| **joinOverflowMode**   | Controls behavior when a join operation exceeds **maxRowsInJoin**. Possible values: `THROW` (throw an exception) or `BREAK` (stop processing and return partial results).     | `THROW`                                                                                                             |
| **maxRowsInWindow**    | Configure maximum rows allowed in a window function operation. This helps prevent excessive memory usage when processing large window frames.                                 | Default value read from cluster config `pinot.query.window.max.rows`; if not set, defaults to **2^20 (1024\*1024)** |
| **windowOverflowMode** | Controls behavior when a window operation exceeds **maxRowsInWindow**. Possible values: `THROW` (throw an exception) or `BREAK` (stop processing and return partial results). | `THROW`                                                                                                             |

#### Multi-Cluster Routing

| Key                           | Description                                                                                                                                                                                                                                                                                                                                      | Default Behavior                                     |
| ----------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | ---------------------------------------------------- |
| **enableMultiClusterRouting** | Enable multi-cluster querying (federation) to route queries across multiple Pinot clusters. Requires broker to be configured with remote cluster connections using `MultiClusterHelixBrokerStarter`, and only applies to logical tables. See [Multi-Cluster Querying](/build-with-pinot/querying-and-sql/multi-cluster-querying.md) for details. | `false` (queries only execute against local cluster) |

#### Broker Pruning

| Key                  | Description                                                                                                                                                                                                                                                                                                                                                                                                                                                                  | Default Behavior                                                                                                                    |
| -------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------- |
| **useBrokerPruning** | When set to `true`, enables broker-side segment pruning for multi-stage leaf-stage routing. On the physical optimizer path, Pinot uses the broker default from `pinot.broker.multistage.use.broker.pruning`. On the logical planner path, Pinot currently applies this to non-partitioned leaf stages and uses the broker default from `pinot.broker.multistage.logical.planner.use.broker.pruning`. Query-level `useBrokerPruning` overrides the applicable broker default. | Physical optimizer path: broker config (default `true`); logical planner non-partitioned leaf path: broker config (default `false`) |

#### Index-Based DISTINCT

| Key                                | Description                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                | Default Behavior                                            |
| ---------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | ----------------------------------------------------------- |
| **useIndexBasedDistinctOperator**  | When `true`, enables index-based DISTINCT operators when applicable. Pinot routes eligible `SELECT DISTINCT jsonExtractIndex(col, path, type)` queries to `JsonIndexDistinctOperator`, which reads distinct values directly from the JSON index without scanning documents. This requires the queried JSON path to have a backing JSON index and supports INT, LONG, FLOAT, DOUBLE, BIG\_DECIMAL, and STRING result types. Pinot also routes eligible single-column `SELECT DISTINCT col` queries on dictionary-encoded columns with an inverted index to `InvertedIndexDistinctOperator`. | `false` (disabled)                                          |
| **invertedIndexDistinctCostRatio** | Overrides the cost heuristic for `InvertedIndexDistinctOperator`. Pinot chooses the bitmap inverted-index path when `dictionaryCardinality * costRatio <= filteredDocCount`. Set this option to `0` to force the bitmap inverted-index path whenever the filter matches at least one row. Requires `useIndexBasedDistinctOperator=true`.                                                                                                                                                                                                                                                   | `null/empty` (use the built-in cardinality-based heuristic) |

#### DISTINCT Early-Termination (Single-Stage Engine)

| Key                                | Description                                                                                                                                                                                                                                                                                                                                                                                                                                                            | Default Behavior        |
| ---------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ----------------------- |
| **maxRowsInDistinct**              | Maximum number of rows to scan across all segments in a DISTINCT query before early termination. When this limit is reached, the query stops scanning additional segments and returns partial results. Only applies to single-stage engine (SSE/v1) DISTINCT queries. See `maxRowsInDistinctReached` in the response to determine if this limit was triggered.                                                                                                         | `null/empty` (no limit) |
| **maxRowsWithoutChangeInDistinct** | Maximum number of rows to scan in a DISTINCT query without producing any new distinct values before early termination. This option is useful to optimize queries when the distinct value set converges quickly. When this limit is reached, the query stops scanning and returns partial results. Only applies to single-stage engine (SSE/v1) DISTINCT queries. See `maxRowsWithoutChangeInDistinctReached` in the response to determine if this limit was triggered. | `null/empty` (no limit) |
| **maxExecutionTimeMsInDistinct**   | Wall-clock time budget in milliseconds for the combine operator in a DISTINCT query. When this time limit is exceeded, the query returns partial results. Only applies to single-stage engine (SSE/v1) DISTINCT queries. See `maxExecutionTimeInDistinctReached` in the response to determine if this limit was triggered.                                                                                                                                             | `null/empty` (no limit) |

**Example DISTINCT Queries with Early Termination**

```sql
-- Stop after scanning 100,000 rows
SELECT DISTINCT country FROM users
OPTION (maxRowsInDistinct=100000)

-- Stop after 10,000 rows without new distinct values
SELECT DISTINCT state FROM users
OPTION (maxRowsWithoutChangeInDistinct=10000)

-- Stop after 5 seconds of execution
SELECT DISTINCT user_id FROM events
OPTION (maxExecutionTimeMsInDistinct=5000)

-- Combine multiple limits
SELECT DISTINCT product_id FROM orders
OPTION (maxRowsInDistinct=50000, maxRowsWithoutChangeInDistinct=5000)
```

When any of these early-termination budgets is exceeded, the response will include the following flags to indicate which limit was reached:

* `partialResult` (boolean): Set to `true` when the result is partial due to early termination
* `maxRowsInDistinctReached` (boolean): Set to `true` if `maxRowsInDistinct` limit was exceeded
* `maxRowsWithoutChangeInDistinctReached` (boolean): Set to `true` if `maxRowsWithoutChangeInDistinct` limit was exceeded
* `maxExecutionTimeInDistinctReached` (boolean): Set to `true` if `maxExecutionTimeMsInDistinct` limit was exceeded

**Example Response with Early Termination**

```json
{
  "resultTable": {
    "dataSchema": {
      "columnNames": ["country"],
      "columnDataTypes": ["STRING"]
    },
    "rows": [
      ["USA"],
      ["Canada"],
      ["Mexico"]
    ]
  },
  "numDocsScanned": 100000,
  "numEntriesScannedInFilter": 100000,
  "numEntriesScannedPostFilter": 100000,
  "totalDocs": 10000000,
  "numSegmentsQueried": 10,
  "numSegmentsProcessed": 6,
  "partialResult": true,
  "maxRowsInDistinctReached": true,
  "maxRowsWithoutChangeInDistinctReached": false,
  "maxExecutionTimeInDistinctReached": false,
  "timeUsedms": 245
}
```


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://docs.pinot.apache.org/build-with-pinot/querying-and-sql/query-execution-controls/query-options.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.