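`primaryKeyColumns` to the schema definition.

The input stream must be partitioned by the primary key. For Kafka messages, this means the producer should set the primary key as the message key when calling the `send` API. If the original stream is not partitioned, a stream processing job (e.g., Flink) is needed to shuffle and repartition the input stream by the primary key before Pinot ingests it.

Dedup also requires all segments of the same partition to be served by the same server, so that data stays consistent across those segments. The table must therefore use `strictReplicaGroup` as the routing strategy. To enable it, set `instanceSelectorType` to `strictReplicaGroup` in the `routing` section of the table config.

Only the low-level consumer is supported for ingestion, so `stream.kafka.consumer.type` must be `lowLevel`.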
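The sketch below shows the table-config fields touched by the routing and consumer requirements above. It is a partial config, not a complete one. Only the relevant keys are included, and a real REALTIME table config needs more fields, such as `segmentsConfig`. The table name `myTable` and topic `myTopic` are hypothetical placeholders. The exact location of `streamConfigs` may also differ between Pinot versions.

```json
{
  "tableName": "myTable",
  "tableType": "REALTIME",
  "routing": {
    "instanceSelectorType": "strictReplicaGroup"
  },
  "tableIndexConfig": {
    "streamConfigs": {
      "streamType": "kafka",
      "stream.kafka.topic.name": "myTopic",
      "stream.kafka.consumer.type": "lowLevel"
    }
  }
}
```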
Supported values for `hashFunction` are `NONE`, `MD5`, and `MURMUR3`, with the default being `NONE`.
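The following is a minimal sketch of the dedup section of a table config. It assumes the `dedupEnabled` flag switches dedup on, and it shows `hashFunction` with its default value `NONE`:

```json
{
  "dedupConfig": {
    "dedupEnabled": true,
    "hashFunction": "NONE"
  }
}
```

Switching to `MD5` or `MURMUR3` keeps the same structure. Only the value of `hashFunction` changes, which changes how primary keys are stored.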
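To reduce the memory used for primary keys, set the `hashFunction` config in the dedup config to `MD5` or `MURMUR3`. Pinot then stores a 128-bit hash of each primary key instead of the key itself. This helps when primary keys are large, for example long strings or keys built from several columns. Keep in mind that hashing can introduce collisions, although the probability is very low. A collision would cause a distinct record to be treated as a duplicate and dropped.

Monitor the metric `pinot.server.dedupPrimaryKeysCount.tableName` to track the number of primary keys in a table partition. Memory usage grows in proportion to this count, so the metric is a useful signal for tracking memory growth.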