· 7 min read ·

How Binary Fuse Filters Improve on Xor: Space, Cache, and the Segmented Array Structure

Source: lobsters

Probabilistic set membership is one of those problems where the standard solution works well enough that people stop looking for better ones. A cache, a database, a content delivery system: all of them regularly need to answer “have I seen this key before?” without the memory cost of storing every key. Bloom filters answer that question with a controlled false positive rate and modest memory overhead. They have been good enough since 1970.

But good enough has costs that compound at scale. At a billion keys, every extra bit per element is another gigabyte of RAM. Construction time matters when you are building filters over large key sets during compaction or query planning. Cache behavior matters when your filter is hot but not hot enough to live in L1.

The 2022 ACM paper by Thomas Mueller Graf and Daniel Lemire describes Binary Fuse Filters, which sit at the current frontier of static probabilistic filter efficiency. They improve on Xor Filters, which Graf and Lemire introduced in their 2020 Journal of Experimental Algorithmics paper. Understanding why Binary Fuse Filters are better requires understanding what Xor Filters got right and where they left room for improvement.

How Xor Filters Work

A Bloom filter maps each key to k positions in a bit array using k independent hash functions, then sets those bits. A query checks whether all k bits are set. With optimal k, space usage is about 1.44 * log2(1/ε) bits per element, which works out to roughly 9.59 bits per element at a 1% false positive rate.

Xor filters take a different approach. Instead of asking “are these k bits set?”, they ask “do these three fingerprint values XOR to the expected result?” For an 8-bit variant, the filter maintains an array of 8-bit values divided into three equal blocks. For each key x, the construction guarantees:

array[h0(x)] XOR array[h1(x)] XOR array[h2(x)] == fingerprint(x)

Where h0, h1, h2 map into their respective blocks, and fingerprint(x) is an 8-bit hash of x. A membership query computes the three positions, fetches the values, XORs them, and checks against the expected fingerprint. Three array accesses and two XOR operations, regardless of dataset size.

The array must be larger than the number of keys to make construction possible. Xor8 requires an array of size approximately 1.23n, which at 8 bits per slot comes to 9.84 bits per key. The false positive rate is roughly 1/256, around 0.39%, because an arbitrary key has a 1/256 chance of the XOR equation holding by coincidence.

Construction uses a process from combinatorics called hypergraph peeling. Think of the keys as hyperedges in a 3-uniform hypergraph, where each hyperedge connects the three positions a key hashes to. You iteratively remove hyperedges that have at least one position covered by only that edge. If the entire graph peels down to empty, you have a valid peeling order. Then you assign fingerprint values in reverse peeling order: when you place a key, two of its positions already have values, so you set the third to make the XOR equation hold.

This works with high probability when m ≈ 1.23n. Below that threshold, the hypergraph develops a non-trivial 3-core, a subset of edges where every position is covered by at least two edges, which defeats peeling. The 23% overhead is the price of guaranteed peelability over random inputs.

What Binary Fuse Filters Change

Binary Fuse Filters use a segmented array structure. The fingerprint array is divided into many equal-sized segments, and the three hash functions are constrained to map within a local span of contiguous segments rather than across three globally separate blocks. A key’s primary hash determines a starting segment, and all three lookup positions fall within a bounded window starting from there.

This change looks minor but has two significant effects.

First, it reduces the required overhead. Constraining positions to a local window creates better load balancing across the array. The resulting hypergraph structure requires less slack to guarantee peelability with high probability, because the locality of positions reduces the chance of globally difficult configurations. Binary Fuse Filters need an array of approximately 1.125n rather than 1.23n, reducing space from 9.84 to roughly 9.0 bits per key at 8-bit fingerprints. That is an 8-9% reduction, which matters at the scale where you would choose a specialized filter in the first place.

For 100 million keys, xor8 needs roughly 123 MB; the same key set in Binary Fuse 8 needs around 113 MB. The savings increase with key count, since both are linear but with different constants.

Second, construction becomes more cache-friendly. Xor filter construction accesses positions scattered across three globally separate blocks, generating cache misses proportional to array size when the filter is too large to fit in cache. With the segmented structure, the three positions for any given key fall within a small local range of the array. For large filters built during database compaction or bulk index construction, this difference is measurable in wall-clock time.

Query performance is similar between the two. Both require exactly three array accesses per lookup.

Using Binary Fuse Filters in Rust

The xorf crate provides both Xor and Binary Fuse filter implementations. Construction is straightforward:

use xorf::{BinaryFuse8, Filter};

fn main() {
    let keys: Vec<u64> = (0u64..1_000_000).collect();

    let filter = BinaryFuse8::try_from(&keys)
        .expect("construction failed");

    // No false negatives: every inserted key is found
    assert!(filter.contains(&0));
    assert!(filter.contains(&999_999));

    // False positive rate ~0.39% for random keys not in the set
    let outside: u64 = 1_000_001;
    println!("Contains key outside set: {}", filter.contains(&outside));

    // With serde + bincode, roughly 1.125 MB for 1M keys
    let bytes = bincode::serialize(&filter).unwrap();
    println!("Filter size: {} bytes", bytes.len());
}

The try_from constructor retries internally with different seeds if the first construction attempt does not converge, which happens with negligible probability. For lower false positive rates, BinaryFuse16 uses 16-bit fingerprints and drops FPR to roughly 1/65536 at about 18 bits per key.

Go users can use the xorfilter package from the FastFilter organization, which includes Binary Fuse support. Daniel Lemire also maintains a single-header C implementation suitable for embedding in systems code where you cannot take external dependencies.

Where These Filters Actually Get Used

Static filters appear in a few recurring scenarios. Database query optimizers use them to avoid scanning partitions that cannot possibly contain a queried key. LSM-tree databases like RocksDB use Bloom filters per SST file to skip unnecessary disk reads; Binary Fuse Filters would provide better space utilization for the same false positive rate, and faster filter construction during compaction.

DNS filtering and network routing systems build large static sets of known-bad domains or prefixes. Filters over tens of millions of entries where the set refreshes periodically are a good fit for static construction. The space savings translate directly to better cache utilization for the filter itself.

Cryptocurrency light clients use probabilistic filters to request only relevant blocks from full nodes. BIP 158 standardized Golomb-coded sets for this purpose, and newer systems have been evaluating denser alternatives. Binary Fuse Filters are a natural fit because a block’s transaction set is fixed once the block is mined.

For join processing in analytical databases, filters over the smaller join side allow early elimination of non-matching rows from the probe side. DuckDB and similar systems use Bloom filters in this role today; the space reduction from Binary Fuse would matter when join filters are built over large intermediate result sets.

The Static Filter Tradeoff

All of xor, Binary Fuse, and most high-performance filter variants are static: you build them once over a fixed key set and cannot add or remove keys without rebuilding. This is fine for the use cases above, where the key set is immutable or rebuilt periodically. If you need dynamic membership with streaming inserts or deletions, you are back to Bloom filters with counting variants, or Cuckoo filters, which support deletions at some cost in complexity and space.

The right choice depends on your update pattern. A join filter built fresh per query can be static. A cache membership oracle tracking keys as they age out needs something dynamic.

Binary Fuse Filters occupy the current frontier of static filter efficiency. The paper’s theoretical analysis shows the space overhead approaching the information-theoretic minimum more closely than prior constructions. The minimum for a false positive rate of ε is log2(1/ε) bits per key; for 8-bit fingerprints and ε = 1/256, that minimum is 8 bits, and Binary Fuse 8 lands at roughly 9.0, compared to xor8’s 9.84 and Bloom’s 9.59 at similar false positive rates.

Future work could narrow that remaining gap further, but at this point the practical bottleneck for most systems is elsewhere. If you are building something that queries a large static set many times with construction cost amortized over many lookups, these filters are worth knowing about. The original 2022 paper is worth reading for the theoretical construction, particularly if you want to understand why the segmented structure achieves better peelability than the blocked structure that Xor Filters use.

Was this interesting?