Bloom Filter
A bloom filter helps prune segments that do not contain any record matching an EQUALITY predicate, for example:
SELECT COUNT(*) FROM baseballStats WHERE playerID = 12345
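The pruning idea can be illustrated with a toy example. The following is a minimal sketch, not Pinot's implementation: the `ToyBloomFilter` class, the hash scheme, the segment names, and the sample `playerID` values are all assumptions made for illustration. It shows the key property that a bloom filter never reports a value as absent when it is present, so a segment can be skipped safely whenever its filter says "definitely absent".

```python
import hashlib


class ToyBloomFilter:
    """A minimal Bloom filter: k hash probes into an m-bit array (illustrative only)."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for clarity

    def _positions(self, value):
        # Derive k positions by salting the value with the probe index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{value}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, value):
        for pos in self._positions(value):
            self.bits[pos] = 1

    def might_contain(self, value) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(value))


def build_filter(values):
    bloom = ToyBloomFilter()
    for v in values:
        bloom.add(v)
    return bloom


# Hypothetical segments, each holding a few playerID values.
segments = {
    "segment_0": [10001, 10002, 10003],
    "segment_1": [12345, 20000],
    "segment_2": [30001, 30002],
}
filters = {name: build_filter(ids) for name, ids in segments.items()}


def segments_to_scan(player_id):
    """Keep only segments whose filter cannot rule out the value."""
    return [name for name, bloom in filters.items() if bloom.might_contain(player_id)]


candidates = segments_to_scan(12345)
print(candidates)
```

The segment that really contains 12345 is always among the candidates. Other segments are usually pruned, but any of them may occasionally survive as a false positive, which costs an unnecessary scan and never a wrong result.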
There are 3 parameters to configure the bloom filter:
fpp: False positive probability of the bloom filter (from 0 to 1, 0.05 by default). The lower the fpp, the more accurate the bloom filter, but the larger it becomes.
maxSizeInBytes: Maximum size of the bloom filter (unlimited by default). If a given fpp would produce a bloom filter larger than this size, the fpp is increased to keep the bloom filter within the limit.
loadOnHeap: Whether to load the bloom filter using heap memory or off-heap memory (false by default).
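The trade-off between fpp and size can be estimated with the textbook sizing formula for an optimally configured bloom filter, m = -n ln(p) / (ln 2)^2 bits for n entries and false positive probability p. The sketch below assumes that formula; Pinot's actual sizing may differ, and the function names are hypothetical helpers, not part of any Pinot API.

```python
import math
from typing import Optional


def optimal_size_bytes(num_entries: int, fpp: float) -> int:
    """Textbook size of an optimally configured Bloom filter, in bytes."""
    bits = -num_entries * math.log(fpp) / (math.log(2) ** 2)
    return math.ceil(bits / 8)


def fpp_for_size(num_entries: int, size_bytes: int) -> float:
    """Lowest fpp an optimal filter of the given size can achieve."""
    bits = size_bytes * 8
    return math.exp(-bits * (math.log(2) ** 2) / num_entries)


def effective_fpp(num_entries: int, fpp: float,
                  max_size_bytes: Optional[int] = None) -> float:
    """Mimic the maxSizeInBytes rule: raise fpp only if the size cap is exceeded."""
    if max_size_bytes is None:
        return fpp  # unlimited size: the requested fpp is used as-is
    if optimal_size_bytes(num_entries, fpp) <= max_size_bytes:
        return fpp
    return fpp_for_size(num_entries, max_size_bytes)


# Assumed workload: a column with 1,000,000 distinct values.
print(optimal_size_bytes(1_000_000, 0.05))
print(optimal_size_bytes(1_000_000, 0.01))
print(effective_fpp(1_000_000, 0.01, max_size_bytes=1_000_000))
```

Under these assumptions, an fpp of 0.01 for one million distinct values needs roughly 1.2 MB, so a cap of 1,000,000 bytes would push the effective fpp up to around 0.02, while the default fpp of 0.05 fits under the same cap unchanged.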
There are 2 ways to configure a bloom filter for a table in the table config:
Configure bloom filter columns with default settings
{
  "tableIndexConfig": {
    "bloomFilterColumns": [
      "playerID",
      ...
    ],
    ...
  },
  ...
}
Configure bloom filter columns with customized parameters
{
  "tableIndexConfig": {
    "bloomFilterConfigs": {
      "playerID": {
        "fpp": 0.01,
        "maxSizeInBytes": 1000000,
        "loadOnHeap": true
      },
      ...
    },
    ...
  },
  ...
}
Currently, bloom filters can only be applied to dictionary-encoded columns. Bloom filter support for raw value columns is a work in progress.
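When table configs are generated by a script, the two configuration styles above can be produced from a small helper. This is a sketch only: `bloom_filter_index_config` is a hypothetical function, not a Pinot API, and treating the fpp range as exclusive of 0 and 1 is an assumption. The keys it emits (`bloomFilterColumns`, `bloomFilterConfigs`, `fpp`, `maxSizeInBytes`, `loadOnHeap`) are the ones shown in the examples above.

```python
import json


def bloom_filter_index_config(columns=None, custom=None):
    """Build a tableIndexConfig fragment (hypothetical helper, not a Pinot API)."""
    index_config = {}
    if columns:
        # Default settings: just list the column names.
        index_config["bloomFilterColumns"] = list(columns)
    if custom:
        for column, params in custom.items():
            fpp = params.get("fpp", 0.05)
            # Assumption: fpp must lie strictly between 0 and 1.
            if not 0 < fpp < 1:
                raise ValueError(f"fpp for {column} must be between 0 and 1, got {fpp}")
        # Customized settings: one parameter object per column.
        index_config["bloomFilterConfigs"] = dict(custom)
    return {"tableIndexConfig": index_config}


fragment = bloom_filter_index_config(
    custom={"playerID": {"fpp": 0.01, "maxSizeInBytes": 1000000, "loadOnHeap": True}}
)
print(json.dumps(fragment, indent=2))
```

The printed fragment would still need to be merged into a complete table config before it is submitted.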

