Replica Group, which allows us to control the number of servers to fan out for each query.
Replica Groupis a set of servers that contains a ‘complete’ set of segments of a table. Once we assign the segment based on replica group, each query can be answered by fanning out to a single replica group instead of all servers.
Replica Groupcan be configured by setting the
InstanceAssignmentConfigin the table config. Replica group based routing can be configured by setting
numReplicaGroupsto control the number of replica groups (replications), and use
numInstancesPerReplicaGroupto control the number of servers to span. For instance, let’s say that you have 12 servers in the cluster. Above configuration will generate 3 replica groups (
numReplicaGroups=3), and each replica group will contain 4 servers (
numInstancesPerPartition=4). In this example, each query will span to a single replica group (4 servers).
numInstancesPerReplicaGroup, you should consider the trade-off between throughput and latency. Given a fixed number of servers, increasing
numReplicaGroupsfactor while decreasing
numInstancesPerReplicaGroupwill give you more throughput because each server requires to process less number of queries. However, each server will need to process more number of segments per query, thus increasing overall latency. Similarly, decreasing
numInstancesPerReplicaGroupwill make each server processing more number of queries but each server needs to process less number of segments per query. So, this number has to be decided based on the use case requirements.
Partitoningcan be enabled by setting the following configuration in the table config.
HashCodehash functions. After setting the above config, data needs to be partitioned with the same partition function and number of partitions before running Pinot segment build and push job for offline push. Realtime partitioning depends on the kafka for partitioning. When emitting an event to kafka, a user need to feed partitioning key and partition function for Kafka producer API.
RoutingConfig. Note that the current implementation for partitioning only works for EQUALITY and IN filter (e.g.
memberId = xx,
memberId IN (x, y, z)).