Replica Group
, which allows us to control the number of servers to fan out for each query.Replica Group
is a set of servers that contains a ‘complete’ set of segments of a table. Once we assign the segment based on replica group, each query can be answered by fanning out to a single replica group instead of all servers.Replica Group
can be configured by setting the InstanceAssignmentConfig
in the table config. Replica group based routing can be configured by setting replicaGroup
as the instanceSelectorType
in the RoutingConfig
.numReplicaGroups
to control the number of replica groups (replications), and use numInstancesPerReplicaGroup
to control the number of servers to span. For instance, let’s say that you have 12 servers in the cluster. Above configuration will generate 3 replica groups (numReplicaGroups=3
), and each replica group will contain 4 servers (numInstancesPerPartition=4
). In this example, each query will span to a single replica group (4 servers).numReplicaGroups
and numInstancesPerReplicaGroup
, you should consider the trade-off between throughput and latency. Given a fixed number of servers, increasing numReplicaGroups
factor while decreasing numInstancesPerReplicaGroup
will give you more throughput because each server requires to process less number of queries. However, each server will need to process more number of segments per query, thus increasing overall latency. Similarly, decreasing numReplicaGroups
while increasing numInstancesPerReplicaGroup
will make each server processing more number of queries but each server needs to process less number of segments per query. So, this number has to be decided based on the use case requirements.Partitoning
can be enabled by setting the following configuration in the table config.Modulo
, Murmur
, ByteArray
and HashCode
hash functions. After setting the above config, data needs to be partitioned with the same partition function and number of partitions before running Pinot segment build and push job for offline push. Realtime partitioning depends on the kafka for partitioning. When emitting an event to kafka, a user need to feed partitioning key and partition function for Kafka producer API.segmentPrunerTypes
in the RoutingConfig
. Note that the current implementation for partitioning only works for EQUALITY and IN filter (e.g. memberId = xx
, memberId IN (x, y, z)
).