Ayush's Personal Blog

BIP157 and BIP158

Introduction

There are several modes in which a bitcoin client can operate.

A fullnode downloads all the blocks present in the blockchain from its peers. While it requires significant bandwidth and memory usage, it can confirm the validity of all transactions and blocks. As long as a full node is connected to atleast one honest node it is extremely difficult (if not impossible) to feed it with malicious information.

By contrast an SPV client downloads only the block headers from its peers. Using the block headers the SPV client can verify that it is on the longest PoW( Proof of work) chain. The SPV client can check if a certain transaction is included in a block by using the merkle path (which you get by requesting from a full node), However an SPV client still cannot validate if the transaction spends a valid UTXO because it does not have access to all the transactions. Since the SPV client has to specifically request the full nodes for the transaction it is interested in, it might lead to a breach of privacy as the full nodes would then be able to figure out which addresses are of interest to the SPV client.

This ends up creating a privacy vs bandwidth tradeoff, In this post we look at some of the BIPs which help address these concerns.

BIP37 - The Bloom Filters

Bitcoin developers proposed BIP37 as a way to help with these concerns. BIP37 utilizes Bloom Filters to introduce some false positive rate when requesting for transactions from a full node. This makes it difficult for a full node to accurately ascertain which addresses are of interest to the SPV client. Bloom filters A bloom filter uses multiple hash functions to map elements to some index which is an integer between 0 and m-1 where m is the size of bloom filter. When we add an element to a bloom filter we pass the element through all the hash functions and set that bit of bloom filter to 1. Now to test the membership of some element in the bloom filter, we pass the element through all the hash functions and check if we get a result from all the hash functions such that the bit at that position is set to 1. This test never produces false negatives but it does produce false positives. In the context of bitcoin, we add all the elements we are interested in (for e.g transaction outputs, addresses, etc.) to the bloom filter and then send it to a full node. Then the following steps occur:

Problems with bloom filter

BIP157 and BIP158

To address these problems bitcoin developers proposed BIP157 and BIP158. BIP158 describes the filtering process while the BIP157 describes the networking protocols associated with these filters.

The idea of these proposals is to essentially reverse the process of client sending a filter to node to that of a node sending filters to client. The node sends deterministic filters for each record in the blockchain to the client, the client can use the filters to decide if that entry is of interest to the client.

Golomb Filters

BIP158 makes the use of golomb coding. BIP158 constructs Golomb Coded Sets by

In bitcoin repository, the number of items N and M both are constraint to less than 2^32. In golomb rice coding every value is divided into two parts: quotient and remainder after dividing it with 2^P respectively. The quotient is stored in unary while the remainder is represented with P bits in big-endian format.

For querying the hash of elements are reconstructed by reversing the process. Then the hash of the query element can be used to check if it exists in the set or not.

Currently the BIP defines only a single filter type with the value of M=784931 and P=19, However it can be extended to support multiple filter types. A part of my project at Summer of bitcoin 2022 this year was to add support for multiple filter types in bcoin repo.

Conclusion

Thereby GCS filters helps in compressing the data and at the same time helps with the privacy concerns as now a client can request for the transactions of its interest from multiple nodes instead of relying on just one node. It also helps with security concerns as the filters only need to be generated once and stored on the disk. Hence it is not possible to conduct a DoS attack on a node.