Botnet Mitigation with ipset

The Internet is a hostile place. Public servers are under constant attack. The GNU & FSF servers are no exception. A large part of the task of keeping our servers online is to deal with these ongoing attacks.

DDoS (Distributed Denial of Service) attacks are now so strong that no single server can stand under a large attack. These attacks take down even well-equipped large commercial providers. If we come under such an attack then we will fall the same as others have fallen.

But much abuse from the network is not intended to take our servers offline; it does so anyway through carelessness. Currently the biggest problem routinely hitting our servers is "AI" scrapers. Almost all of our data is already public, but these poorly designed web scrapers will most often scrape every URL on every page, including the browsable version control pages. Whereas it would be most efficient to git clone the entire repository, they instead scrape every version of every project in the most inefficient way. This browns out our servers.

We also get abuse from scrapers that hit completely broken URLs. When these hit static pages they are handled fairly well with 404 returns from the web servers. But when they hit dynamically generated pages, such as the FastCGI cgit pages which are rather heavyweight processes, it hits the server's system resources hard. These botnets can take the entire server down, browning it out by hammering on the cgit interface.

When this happens, and a large botnet is hammering on the dynamic FastCGI cgit.cgi interface (the cgit web UI for browsing git repositories), it pegs the load average of the system at the maximum number of those processes that we configure. Since we configure this to be the maximum tolerable on the system, the resource drain becomes large. All CPUs run at 100% and legitimate clients are starved out of the system.

In one case the URL pattern the botnet was hitting was easily identifiable. The pattern was a mangled, impossible one, with multiple project.git strings one after the other in the URL. This easily identifiable pattern was used in two ways to mitigate the attack.

Nginx Configuration

The URL pattern could be identified within the Nginx configuration file and used to immediately return a 429 HTTP code (Too Many Requests) without invoking the much heavier cgit.cgi process.

location /cgit/ {
        location ~ ^/cgit/.*\.git/.*\.git/.*\.git/ {
                return 429;
        }
        ... serve cgit.cgi ...
}
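
The block can be verified with a request for a URL matching the pattern. A quick sketch, where the hostname and path are illustrative:

# print only the HTTP status code; a matching URL should yield 429
curl -s -o /dev/null -w '%{http_code}\n' 'https://vcs.example.org/cgit/a.git/b.git/c.git/'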

iptables and ipset

Traditionally we would use fail2ban rules to recognize and then block abusive IP addresses. This could still be useful, but initially we had two problems. One is that fail2ban only works with IPv4 addresses at this time, and this attack included both IPv4 and IPv6. Another is that this was a huge botnet, which we knew was over a million remote IP addresses strong from all over the globe, and that many individual rules would not work well with iptables. Additionally, writing fail2ban rules is tedious.

We were aware of ipset but had not previously had experience using it. Jing provided the initial documentation, motivation, and energy for using ipset, and this worked well. Thanks Jing! The result was a simple script to extract the IP addresses that were hitting the URL pattern and put them into an ipset to be blocked.

I wrote a Perl script to do this blocking. The main part of the pattern was the same as the one above in the Nginx configuration. This pattern was used to build the ipset block list.

m{/cgit/.*\.git/.*\.git/.*\.git/}
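
The essence of that script is small enough to sketch as a pipeline. This is a minimal illustration rather than the actual script: it assumes the default combined Nginx log format with the client address in the first field, and the log path is an assumption. IPv6 addresses would need an analogous match feeding cgit-blv6.

# emit "add cgit-bl ADDR" for each IPv4 client that hit the URL pattern,
# then load the addresses; -exist ignores entries already in the set
perl -ne 'print "add cgit-bl $1\n"
    if m{/cgit/.*\.git/.*\.git/.*\.git/} and m{^(\d+\.\d+\.\d+\.\d+) };' \
  /var/log/nginx/access.log | sort -u | ipset restore -exist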

Before the ipset can be used it must be created. I knew the ipset for IPv4 would be a big list and started with these initial sizes.

ipset create cgit-bl hash:ip family inet hashsize 1048576 maxelem 2500000
ipset create cgit-blv6 hash:ip family inet6
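
The parameters can be checked immediately with the terse listing option, which prints only the set header and not the (potentially huge) member list.

ipset list -t cgit-bl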

After the ipsets are created they are hooked into iptables using the following.

iptables -w -I INPUT -m set --match-set cgit-bl src -p tcp -m multiport --dports 80,443 -j DROP
ip6tables -w -I INPUT -m set --match-set cgit-blv6 src -p tcp -m multiport --dports 80,443 -j DROP
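
With the rules in place, the blocking script only needs to add each offending address to the appropriate set. The addresses below are documentation examples (RFC 5737 and RFC 3849), not real botnet members:

ipset add cgit-bl 192.0.2.1
ipset add cgit-blv6 2001:db8::1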

The size of the ipset turned out to be not large enough. After running for a full day and night we hit the 2500000 maxelem limit! This required destroying the ipset and creating it again with a larger size, done by saving the previous ipset, editing the saved maxelem value, and restoring it at the larger size. To destroy the ipset it must first be removed from use in iptables. Saving and restoring is very fast and takes only a few seconds.

ipset save cgit-bl > /var/tmp/ipset-cgit-bl-save
iptables -w -D INPUT -m set --match-set cgit-bl src -p tcp -m multiport --dports 80,443 -j DROP
ipset destroy cgit-bl
sed 's/maxelem 2500000/maxelem 3000000/' /var/tmp/ipset-cgit-bl-save > /var/tmp/ipset-cgit-bl-save2
ipset restore < /var/tmp/ipset-cgit-bl-save2
iptables -w -I INPUT -m set --match-set cgit-bl src -p tcp -m multiport --dports 80,443 -j DROP

The modified, larger ipset is now equivalent to one created with these larger values.

ipset create cgit-bl hash:ip family inet hashsize 2097152 maxelem 3000000
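
As an aside, ipset can also atomically swap two sets of the same type, which would have avoided detaching the set from iptables during the resize. A minimal sketch of that alternative, where cgit-bl-new is a temporary name of our choosing:

# create a replacement set with the larger sizes
ipset create cgit-bl-new hash:ip family inet hashsize 2097152 maxelem 3000000
# copy the members over under the new name
ipset save cgit-bl | sed -n 's/^add cgit-bl /add cgit-bl-new /p' | ipset restore
# atomically exchange the contents of the two sets, then drop the old one
ipset swap cgit-bl-new cgit-bl
ipset destroy cgit-bl-new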

The resulting ipset sizes after most of the botnet IPs have been accrued are the following.

root@vcs3:/var/log# ipset list cgit-bl | head
Name: cgit-bl
Type: hash:ip
Revision: 5
Header: family inet hashsize 2097152 maxelem 3000000 bucketsize 12 initval 0x73846ad0
Size in memory: 72030328
References: 1
Number of entries: 2635555
Members:
38.61.137.112
165.16.161.199

root@vcs3:/var/log# ipset list cgit-blv6 | head
Name: cgit-blv6
Type: hash:ip
Revision: 5
Header: family inet6 hashsize 1024 maxelem 65536 bucketsize 12 initval 0x385c59aa
Size in memory: 113104
References: 1
Number of entries: 3017
Members:
2400:cb00:413:1000:c28a:c512:99d1:be1c
2400:cb00:573:1000:e954:b34f:c108:519a

Currently it feels like we are in the tail of the botnet, where the IPs of all of the fast machines have already been collected. But there is always a long, thin tail of botnet systems which trickle in for a long time. At this writing the botnet is still showing us new members, and those new members are immediately added to the ipset block list. It's astounding that this botnet is so large. Though it is possible that some of these systems are on dynamic addresses and are simply rotating to new ones.

The result at this time is that the system is able to keep operating acceptably in spite of this very large abuse from a botnet some 2.63 million strong. That's mainly because this feels like a misconfigured "AI" scraper and not an actual DDoS attack. No single system would survive a DDoS from such a large botnet. But since this seems to be an incorrectly configured scraper botnet, it could be mostly mitigated.

Conclusion

Using ipset worked extremely well! It will be useful to configure fail2ban to insert IPs into an ipset, rather than into individual iptables rules, and use it to manage the bans on these IPs. Then everything would be fully automatic without further human intervention.
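
Recent fail2ban releases ship ipset-based ban actions. Assuming one of those is available (the action name below may vary by fail2ban version), the change could be as small as a banaction setting in jail.local:

[DEFAULT]
# ban via an ipset rather than individual iptables rules
banaction = iptables-ipset-proto6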