IPQ BDB filter - Homepage

ibd is used as a prefix for the commands that can be issued. It stands for IP filter based on a Berkeley Database, a further contraction of IPQ BDB (pronounced I-peek-you b'oody-bud). The diagram on the left illustrates the concept. It may refer to a single server, a cluster, or a local subnet.

The blue part represents netfilter, the kernel module normally configured via the iptables command, running on the bastion host. Netfilter includes a netfilter queue part (NFQUEUE) that marshals packets to a user-space daemon. ibd-judge sniffs the packets and looks up the relevant IPv4 addresses in a Berkeley DB. If the record is found, its probability of being blocked is compared to a random number, and the result determines the verdict.
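The verdict logic described above can be sketched in a few lines. This is only an illustration, not ibd-judge's actual code: a plain dictionary stands in for the Berkeley DB, the function name `judge` and the "ACCEPT"/"DROP" strings are hypothetical, and letting unknown addresses through is an assumption drawn from the description.

```python
import random

# Hypothetical stand-in for the Berkeley DB: IPv4 address -> block probability.
db = {"192.0.2.1": 1.0, "192.0.2.2": 0.0, "192.0.2.3": 0.5}

def judge(ip, db, rng=random.random):
    """Return a verdict for one packet coming from `ip`."""
    probability = db.get(ip)
    if probability is None:
        # Assumption: addresses with no record are always let through.
        return "ACCEPT"
    # rng() is uniform in [0, 1), so the comparison is true
    # with exactly the stored probability.
    return "DROP" if rng() < probability else "ACCEPT"
```

With a probability of 0.5, roughly every other connection attempt from that address would be dropped, which is what produces the occasional timeouts seen by clients in the middle range.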

Behind the firewall, web and/or mail daemons listen for client connections. Bad client behavior can be recognized by a local script or by parsing log files. Symptoms include a wrong userid/password pair, requests for non-existent users or web pages, and delivery to spamtraps. Scripts can report offenders with ibd-ban, while ibd-parse reads log files. A reported IP is either inserted into the db or, if it already exists, has its probability of being blocked doubled. The first few attempts are harmless, but if the client persists the probability grows high enough to affect the daemon's next verdict for that IP.
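The insert-or-double rule can be illustrated as follows. Again this is a sketch under assumptions rather than what ibd-ban or ibd-parse really do: the dictionary replaces the database, the name `report` is made up, and capping the value at 1.0 is my reading of "100% probability".

```python
def report(db, ip, initial_probability):
    """Record one instance of bad behavior by `ip`; return its new probability."""
    if ip in db:
        # Known offender: double the probability, never exceeding 100%.
        db[ip] = min(1.0, db[ip] * 2)
    else:
        # First offense: insert the record with the configured initial value.
        db[ip] = initial_probability
    return db[ip]

# Example: an address caught four times, starting from 1/4.
db = {}
history = [report(db, "192.0.2.9", 0.25) for _ in range(4)]
```

In this example the probabilities run 0.25, 0.5, 1.0, 1.0: the address is fully blocked from the third report on.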

As time passes the probability of being blocked decays, and the IP is eventually rehabilitated without human intervention. In the twilight range of middle probabilities, gray clients experience occasional timeouts in their attempts to connect to the server, while the server reserves more bandwidth for good clients. For new records, the initial probability is specified by the initial count; that is, the number of doublings required to reach 100% probability, one doubling for each time the IP is caught. The decay is expressed as the number of seconds required for the probability to halve. These values, as well as a human-readable reason, accompany each invocation of ibd-ban and each regular expression configured for ibd-parse.
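The initial count and the decay can be read as two small formulas. The sketch below assumes that a new record with initial count n starts at 2^-n, and that decay is a continuous exponential with the given half-life. Both function names are hypothetical, and the real programs may round or step these values differently.

```python
def initial_probability(initial_count):
    """Probability that reaches 100% after `initial_count` doublings."""
    return 2.0 ** -initial_count

def decayed(probability, elapsed_seconds, decay_seconds):
    """Probability left after `elapsed_seconds`, halving every `decay_seconds`."""
    return probability * 0.5 ** (elapsed_seconds / decay_seconds)

# Example: initial count 3 and a one-day half-life.
p0 = initial_probability(3)             # 0.125
p_after_one_day = decayed(p0, 86400, 86400)  # 0.0625
```

A larger initial count makes the filter more forgiving of first offenses, and a shorter decay makes it quicker to forget them.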

I've been running this since December 2008. Version 1, in January 2010, changed the database record structure, so I restarted collecting IPs from scratch. Berkeley DB's Concurrent Data Store model is simple and effective for controlling access to the database. With a nominal record size of 64 bytes, at the moment (March 2011, testing the v1.03 candidate) my block.db is 36 Mbytes, holding 319691 records at an average of 118.54 bytes each.

Maintenance consists of running ibd-del once a day. That command (not shown in the diagram) is used to list or delete selected records.

Size is a minor problem, though. The btree algorithm performs as O(log N), where N is the number of keys and the logarithm's base is the average number of keys per page. IPv4 addresses result in tiny 4-byte keys, so we may easily fit 1000 keys per page. Each additional tree level then multiplies the capacity by 1000, which means that a database of a million addresses would have to grow a millionfold before its access time roughly doubled! With such performance, I could map the entire IPv4 address space, up to the end of the world.
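To put numbers on that estimate, here is a back-of-the-envelope calculation. It takes the figure of 1000 keys per page at face value and treats the logarithm as the number of page levels to traverse. The function name is made up, and real Berkeley DB page fill will differ.

```python
import math

def btree_levels(num_keys, keys_per_page=1000):
    """Estimated lookup cost in page levels: log of num_keys, base keys_per_page."""
    return math.log(num_keys) / math.log(keys_per_page)

current = btree_levels(319691)      # the database size quoted above
whole_ipv4 = btree_levels(2 ** 32)  # every possible IPv4 address
ratio = whole_ipv4 / current
```

Under these assumptions the current database sits at about 1.8 levels and the full IPv4 space at about 3.2, so mapping every address would cost less than twice the present lookup time.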