Everyone is always raving about bloom filters. But what exactly are they, and what are they useful for?
The Bloom filter is a space-efficient, probabilistic data structure – used to test whether an item does not belong to a collection.
The basic bloom filter supports two operations: add and query.
Query is used to check whether a given element is in the set or not. It can only return a boolean value:
- true, if the element is probably in the set.
- false, if the element is definitely not in the set.
Add simply adds an element to the set.
Removal is impossible without introducing false negatives, but there are versions where is possible to remove the element e.g. counting filters.
Internally Bloom filters use a bit array, and multiple different hash functions.
Let’s say for instance we have a bit array of a 100 elements and 3 hash functions.
Add, when we want to insert the word “Maciej” into the filter:
- We pass it through hash functions:
- hash 1, returns 33
- hash 2, returns 7
- hash 3, returns 22
- Next, we go to each of those elements in the array and set them to 1.
Query, now to test whether the word might be in the collection:
- We pass it through hash functions, and check those elements in the bit array:
- true, if all 3 elements are set to 1.
- false, if any one of the elements are set to zero.