Multi-Cluster Querying
Query data across multiple Pinot clusters using multi-cluster querying (federation)
Multi-cluster querying (also known as federation) enables Apache Pinot brokers to execute queries across multiple Pinot clusters using logical tables. By creating a logical table that references physical tables in different clusters, you can query data distributed across clusters as if it were in a single unified table.
Overview
Multi-cluster querying allows you to:
Create logical tables that combine physical tables from multiple clusters
Query federated logical tables as a single unified view
Execute cross-cluster joins using the multi-stage query engine
Aggregate data from multiple clusters in a single query
Important: Only logical tables support multi-cluster federation. Physical tables in remote clusters cannot be queried directly — you must create a logical table that references them.
This feature is particularly useful for:
Geographic distribution: Query data from clusters in different regions
Data isolation: Query across clusters with different data retention policies or security boundaries
Logical table federation: Combine physical tables from different clusters into a single logical view
How It Works
Multi-cluster querying extends the broker's routing capabilities to include remote clusters:
Broker startup: The broker connects to remote clusters as a Helix spectator via their ZooKeeper instances
Routing table building: The broker maintains routing tables for both local and all remote clusters, periodically checking for table changes
Query execution: When enableMultiClusterRouting=true is set:
The broker queries routing managers for all clusters
Routes are combined across local and remote clusters
Queries are scattered to servers in all applicable clusters
Results are merged before returning to the client
Prerequisites
Before enabling multi-cluster querying:
All clusters must be accessible via ZooKeeper from the broker host
Network connectivity must exist between the broker and servers in all clusters
For logical table federation, table schemas must be compatible across clusters
Configuration
Broker Configuration
To enable multi-cluster querying, configure your broker to use MultiClusterHelixBrokerStarter and specify remote cluster connections:
Configuration Properties:
pinot.remote.cluster.names: Comma-separated list of remote cluster names
pinot.remote.zk.server.<clusterName>: ZooKeeper address for each remote cluster
Example Configuration:
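For example, a broker in a local cluster federating with a remote cluster named SecondaryCluster might use a configuration like the following (cluster name and ZooKeeper address are illustrative):

```properties
# Remote clusters this broker should route to
pinot.remote.cluster.names=SecondaryCluster

# ZooKeeper address for each remote cluster listed above
pinot.remote.zk.server.SecondaryCluster=zk-secondary.example.com:2181
```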
Broker Startup Class
Add the multi-cluster broker starter to your configuration file:
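In the broker configuration file, this is a single property:

```properties
pinot.broker.startable.class=org.apache.pinot.broker.broker.helix.MultiClusterHelixBrokerStarter
```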
The pinot.broker.startable.class property is required. Without it, the broker uses the default HelixBrokerStarter, which only supports single-cluster querying.
Starting the Broker
Start the broker with your configuration file:
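One way to do this is with the bundled admin script (paths, cluster name, and file names are illustrative; check pinot-admin.sh StartBroker -help for the options available in your version):

```shell
bin/pinot-admin.sh StartBroker \
  -zkAddress localhost:2181 \
  -clusterName PrimaryCluster \
  -configFileName conf/multi-cluster-broker.conf
```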
Or use config overrides for quick testing:
Verifying Broker Startup
Check the broker logs for messages confirming a successful connection to each remote cluster. If a remote cluster fails to connect, the broker logs a warning but still starts.
Running Multi-Cluster Queries
Multi-cluster routing must be explicitly enabled for each query using the enableMultiClusterRouting query option.
Without enableMultiClusterRouting=true, queries execute only against the local cluster, even if the broker is configured for multi-cluster querying.
Using SET Statement
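The option can be set inline with a SET statement before the query (the table name here is illustrative):

```sql
SET enableMultiClusterRouting = true;
SELECT COUNT(*) FROM sales;
```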
Cross-Cluster Joins with Multi-Stage Engine
Multi-cluster querying works with the multi-stage query engine for complex queries including joins:
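For example, a join between two federated logical tables might look like the following sketch (useMultistageEngine enables the multi-stage engine; table and column names are illustrative):

```sql
SET useMultistageEngine = true;
SET enableMultiClusterRouting = true;

SELECT o.customerId, c.region, SUM(o.amount) AS total
FROM orders o
JOIN customers c ON o.customerId = c.customerId
GROUP BY o.customerId, c.region;
```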
Each table in the join must still be a logical table. Multi-cluster routing rejects direct queries to physical tables.
Logical Tables with Multi-Cluster
Only logical tables can be federated across clusters. Physical tables cannot be directly queried across clusters. To query data from multiple clusters, you must create a logical table that references the physical tables in each cluster.
Multi-cluster querying integrates with logical tables, allowing physical tables from different clusters to be combined:
Creating a Federated Logical Table
Step 1: Ensure physical tables exist in each cluster:
sales_east_OFFLINE in PrimaryCluster (local)
sales_west_OFFLINE in SecondaryCluster (remote)
Step 2: Create the logical table configuration:
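A sketch of such a configuration, based on the fields described below (the exact JSON schema may vary by Pinot version; table names match the example above):

```json
{
  "tableName": "sales",
  "refOfflineTableName": "sales_east_OFFLINE",
  "physicalTableConfigMap": {
    "sales_east_OFFLINE": { "multiCluster": false },
    "sales_west_OFFLINE": { "multiCluster": true }
  }
}
```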
tableName: Name of the logical table
physicalTableConfigMap: Map of physical table names to their configurations
multiCluster: Set to true for tables in remote clusters, false for local tables
refOfflineTableName: Reference table for schema and config inheritance
Step 3: Create the logical table via the Controller API:
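For example, assuming the controller exposes a logical-table endpoint (verify the exact path in your controller's Swagger UI; host, port, and file name are illustrative):

```shell
curl -X POST "http://controller-host:9000/logicalTables" \
  -H "Content-Type: application/json" \
  -d @sales_logical_table.json
```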
Step 4: Query the logical table:
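Querying the federated logical table looks the same as querying any other table, with multi-cluster routing enabled (column names are illustrative):

```sql
SET enableMultiClusterRouting = true;
SELECT region, SUM(amount) FROM sales GROUP BY region;
```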
The logical table must be created on each cluster's controller, with appropriate multiCluster flags indicating which tables are local vs. remote from that cluster's perspective.
Query Behavior
Unavailable Clusters
If a remote cluster is unavailable (cannot connect to its ZooKeeper), the broker will:
Continue processing queries using available clusters
Add a warning to the query response indicating which clusters are unavailable
Return potentially incomplete results if data exists only in the unavailable clusters
Example Response with Unavailable Cluster:
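A response in this situation might look like the following sketch (the structure follows Pinot's standard exceptions field; the message text and cluster name are illustrative):

```json
{
  "exceptions": [
    {
      "errorCode": 510,
      "message": "RemoteClusterUnavailable: cluster SecondaryCluster is unavailable; results may be incomplete"
    }
  ]
}
```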
Error code 510 (RemoteClusterUnavailable) indicates query results may be incomplete.
Configuration Reference
Broker Properties
pinot.remote.cluster.names: Comma-separated list of remote cluster names. Required: yes.
pinot.remote.zk.server.<clusterName>: ZooKeeper address for the specified remote cluster. Required: yes (one per cluster).
pinot.broker.startable.class: Must be org.apache.pinot.broker.broker.helix.MultiClusterHelixBrokerStarter. Required: yes.
Query Options
enableMultiClusterRouting (boolean, default false): Enable querying across all configured clusters
Performance Considerations
Network latency: Cross-cluster queries add network round trips. Place brokers with low latency to all clusters.
Query timeouts: The timeout applies to the entire federated query. Set it considering worst-case cluster latency.
Routing table size: Broker memory increases with the number of remote tables. Monitor broker heap usage.
ZK connections: The broker maintains spectator connections to every cluster's ZooKeeper. Ensure each ZooKeeper can handle the additional connections.
Limitations
Logical tables only: Only logical tables can be federated across clusters; physical tables cannot be directly queried from remote clusters
No cross-cluster ingestion: Multi-cluster querying is read-only; data ingestion is cluster-specific
Schema compatibility: Logical tables require compatible schemas across physical tables
Single query timeout: Cannot set per-cluster timeouts
ZK accessibility: All remote ZooKeepers must be reachable from the broker host
Troubleshooting
Broker fails to start or connect to remote cluster
Symptoms: Broker logs show connection failures to remote ZooKeeper
Resolution:
Verify the ZooKeeper address in pinot.remote.zk.server.<clusterName> is correct
Test network connectivity: nc -zv <zk-host> <zk-port>
Check that ZooKeeper is running and accessible
Verify no firewall rules are blocking the connection
Queries don't return data from remote clusters
Symptoms: Query results seem to only include local data
Resolution:
Confirm enableMultiClusterRouting=true is set in the query
Check broker logs for remote cluster connection status
Verify the table exists in the remote cluster
For logical tables, verify multiCluster: true is set correctly
Error code 510 in query response
Symptoms: Query response contains exception with error code 510
Resolution: This indicates a remote cluster was unavailable. Check:
Remote cluster health and ZooKeeper status
Network connectivity from broker to remote ZooKeeper
If the outage is temporary, the query still succeeds, but results may be incomplete
High query latency
Symptoms: Multi-cluster queries are significantly slower than single-cluster
Resolution:
Check network latency to remote cluster servers (not just ZK)
Use EXPLAIN PLAN FOR to understand query execution
Consider data locality optimizations
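For example (table and column names are illustrative):

```sql
SET enableMultiClusterRouting = true;
EXPLAIN PLAN FOR
SELECT region, SUM(amount) FROM sales GROUP BY region;
```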
Related Documentation
Query Options - Complete list of query options, including enableMultiClusterRouting
Multi-Stage Query Engine - Advanced query execution across clusters
Tables - Table types and concepts including logical tables
Broker Configuration - Broker configuration reference