Bloom Filter
This page describes how to configure the bloom filter for Apache Pinot.
The bloom filter prunes segments that do not contain any record matching an EQUALITY predicate.
This is useful for a query like the following:
SELECT COUNT(*)
FROM baseballStats
WHERE playerID = 12345
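To illustrate the pruning idea, here is a minimal Python sketch. It is not Pinot's implementation: `ToyBloomFilter`, `build_filter`, `segments_to_scan`, and the segment names are hypothetical, and the hashing scheme is chosen only for readability. It shows the property the pruning relies on: a bloom filter can answer "definitely absent" for a value, so a segment whose filter rejects `12345` can be skipped without being scanned.

```python
import hashlib


class ToyBloomFilter:
    """Illustrative bloom filter; not Pinot's implementation."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for simplicity

    def _positions(self, value):
        # Derive num_hashes bit positions by salting a SHA-256 digest.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{value}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, value):
        for pos in self._positions(value):
            self.bits[pos] = 1

    def might_contain(self, value):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(value))


def build_filter(values):
    """Build a filter over the values of one column in one segment."""
    bf = ToyBloomFilter()
    for v in values:
        bf.add(v)
    return bf


def segments_to_scan(segment_filters, value):
    """Return the segments that cannot be ruled out for `column = value`."""
    return [name for name, bf in segment_filters.items()
            if bf.might_contain(value)]


# Hypothetical segments, each with a filter over its playerID values.
segment_filters = {
    "segment_0": build_filter([111, 222, 333]),
    "segment_1": build_filter([12345, 67890]),
}
print(segments_to_scan(segment_filters, 12345))
```

A segment that contains the value is always returned, because a bloom filter has no false negatives. A segment that does not contain it is usually pruned, but may occasionally be kept because of a false positive, which is what the `fpp` parameter controls.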
There are 3 parameters to configure the bloom filter:
- fpp: False positive probability of the bloom filter (from 0 to 1, 0.05 by default). The lower the fpp, the more accurate the bloom filter is, but the larger it becomes.
- maxSizeInBytes: Maximum size of the bloom filter (unlimited by default). If an fpp setting generates a bloom filter larger than this size, the fpp is increased to keep the bloom filter size within this limit.
- loadOnHeap: Whether to load the bloom filter using heap memory or off-heap memory (false by default, which loads it off-heap).
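The trade-off between fpp and size can be estimated with the textbook formula for an optimally configured bloom filter, where the number of bits is m = -n ln(p) / (ln 2)^2 for n distinct values and false positive probability p. The Python sketch below applies that formula and shows how a size cap raises the effective fpp. It is an approximation under that assumption: the function names are hypothetical, and the sizes Pinot actually produces may differ.

```python
import math


def bloom_filter_bytes(num_values, fpp):
    """Approximate size in bytes of an optimally configured bloom filter."""
    bits = -num_values * math.log(fpp) / (math.log(2) ** 2)
    return math.ceil(bits / 8)


def effective_fpp(num_values, fpp, max_size_in_bytes=None):
    """Estimate the fpp after applying a size cap (None means unlimited)."""
    if max_size_in_bytes is None:
        return fpp
    if bloom_filter_bytes(num_values, fpp) <= max_size_in_bytes:
        return fpp  # the requested fpp already fits within the cap
    # Invert the sizing formula for the number of bits the cap allows.
    bits = max_size_in_bytes * 8
    return math.exp(-bits * (math.log(2) ** 2) / num_values)


# Assumed workload: 10 million distinct values in the column.
n = 10_000_000
print(bloom_filter_bytes(n, 0.05))        # size at the default fpp
print(bloom_filter_bytes(n, 0.01))        # a lower fpp needs more space
print(effective_fpp(n, 0.01, 1_000_000))  # capped size forces a higher fpp
```

Under these assumptions, an fpp of 0.01 over 10 million distinct values needs roughly 12 MB, so a 1,000,000-byte cap pushes the effective fpp to roughly 0.68. At that point the filter prunes very little, which is worth checking before combining a low fpp with a small maxSizeInBytes on a high-cardinality column.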
- Default settings
{
  "tableIndexConfig": {
    "bloomFilterColumns": [
      "playerID",
      ...
    ],
    ...
  },
  ...
}
- Customized parameters
{
  "tableIndexConfig": {
    "bloomFilterConfigs": {
      "playerID": {
        "fpp": 0.01,
        "maxSizeInBytes": 1000000,
        "loadOnHeap": true
      },
      ...
    },
    ...
  },
  ...
}