Bloom Filter

This page describes how to configure the bloom filter for Apache Pinot.

The bloom filter prunes segments that do not contain any record matching an EQUALITY predicate.

This is useful for a query like the following:

SELECT COUNT(*) 
FROM baseballStats 
WHERE playerID = 12345
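
To illustrate the pruning idea, here is a minimal sketch of a toy bloom filter kept per segment. It is not Pinot's implementation: the class `ToyBloomFilter`, the helpers `build_filter` and `segments_to_scan`, and the choice of SHA-256 hashing are all hypothetical names and choices made for this example.

```python
import hashlib


class ToyBloomFilter:
    """Illustrative bloom filter; NOT Pinot's actual implementation."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for simplicity

    def _positions(self, value):
        # Derive num_hashes bit positions by salting a SHA-256 hash.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{value}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, value):
        for pos in self._positions(value):
            self.bits[pos] = 1

    def might_contain(self, value):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(value))


def build_filter(values):
    """Hypothetical helper: build a filter over one segment's column values."""
    bloom = ToyBloomFilter()
    for value in values:
        bloom.add(value)
    return bloom


def segments_to_scan(segment_filters, value):
    """Keep only the segments whose filter cannot rule out the value."""
    return [name for name, bloom in segment_filters.items()
            if bloom.might_contain(value)]


filters = {
    "segment_a": build_filter([12345, 777]),
    "segment_b": build_filter([1, 2, 3]),
}
print(segments_to_scan(filters, 12345))
```

A bloom filter never produces a false negative, so `segment_a` is always kept. A segment that does not contain the value is pruned unless the filter reports a false positive, in which case the segment is scanned and simply yields no matching records.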

There are 3 parameters to configure the bloom filter:

  • fpp: False positive probability of the bloom filter (from 0 to 1, 0.05 by default). A lower fpp makes the bloom filter more accurate, but also increases its size.

  • maxSizeInBytes: Maximum size of the bloom filter (unlimited by default). If the configured fpp would produce a bloom filter larger than this size, the fpp is increased to keep the bloom filter within the limit.

  • loadOnHeap: Whether to load the bloom filter into heap memory (true) or off-heap memory (false). The default is false.
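
The trade-off between fpp and maxSizeInBytes can be sketched with the textbook bloom filter sizing formulas. This is only an approximation under stated assumptions: Pinot's actual sizing logic may differ, and the function names `optimal_num_bits`, `effective_fpp`, and `plan_filter` are hypothetical.

```python
import math


def optimal_num_bits(expected_entries, fpp):
    # Textbook formula: m = -n * ln(p) / (ln 2)^2
    return math.ceil(-expected_entries * math.log(fpp) / (math.log(2) ** 2))


def effective_fpp(expected_entries, num_bits):
    # Inverse of the formula above, assuming an optimal number of hash functions.
    return math.exp(-(num_bits / expected_entries) * math.log(2) ** 2)


def plan_filter(cardinality, fpp=0.05, max_size_in_bytes=None):
    """Return (size_in_bytes, fpp) after applying the optional size cap."""
    size_bytes = math.ceil(optimal_num_bits(cardinality, fpp) / 8)
    if max_size_in_bytes is not None and size_bytes > max_size_in_bytes:
        # The filter would be too large: shrink it and accept a higher fpp.
        size_bytes = max_size_in_bytes
        fpp = effective_fpp(cardinality, size_bytes * 8)
    return size_bytes, fpp


# Assumed example: a column with 1,000,000 distinct values.
print(plan_filter(1_000_000, fpp=0.01))
print(plan_filter(1_000_000, fpp=0.01, max_size_in_bytes=1_000_000))
```

Under these assumptions, an fpp of 0.01 over one million distinct values needs somewhat more than 1,000,000 bytes, so a maxSizeInBytes of 1000000 caps the filter and raises the effective fpp above the configured value.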

There are 2 ways to configure a bloom filter for a table in the table configuration:

  • Default settings

{
  "tableIndexConfig": {
    "bloomFilterColumns": [
      "playerID",
      ...
    ],
    ...
  },
  ...
}

  • Customized parameters

{
  "tableIndexConfig": {
    "bloomFilterConfigs": {
      "playerID": {
        "fpp": 0.01,
        "maxSizeInBytes": 1000000,
        "loadOnHeap": true
      },
      ...
    },
    ...
  },
  ...
}