Scraper-defense reverse proxy
Self-hosted abuse detection and rule enforcement against low-effort mass AI scraping and bots.
Blogging platform for minimalists.
Detailed discussions on excessive crawling targeting code.forgejo.org:

- [February 2025](https://codeberg.org/forgejo/discussions/issues/297)
- [April 2025](https://codeberg.org/forgejo/discussions/issues/331)

---

Codeberg and the Forgejo infrastructure (which are entirely separate) were both ...
Hopefully-useful tidbits of code and information.
Nowadays it seems like every tech company is eager to scrape the web. Unfortunately, it seems like the majority of the traffic that comes to this small site is scrapers. While my static website is able to handle the load, the same cannot be said about everyone. Overall, the techniques I’ve seen website owners use aim to make scraping more difficult. It’s a balance, though: the harder we make it for bots to access a website, the more we turn away regular humans as well. Here’s a short and non-exhaustive list of techniques:
This morning I was once again thinking about how to put some proper anti-bot protection onto my websites without relying on Cloudflare. There are plenty of fronting proxies, like Anubis and Go Away, which put a simple proof-of-work task in front of a website. This is pretty effective, but it adds more of an admin tax (and is often quite difficult to configure for servers that host multiple websites, such as mine), and false positives can have other bad effects, such as blocking feed readers and the like.
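
To make the proof-of-work idea concrete, here is a minimal sketch of a generic hash-based challenge. It is not the protocol that Anubis or Go Away actually use; the function names, the `challenge:nonce` string format, and the difficulty value are all assumptions made for illustration. The server hands out a random challenge, the client searches for a nonce whose SHA-256 digest starts with enough zero bits, and the server checks the answer with a single hash.

```python
import hashlib
import secrets


def make_challenge() -> str:
    """Server side: issue a random, single-use challenge string (hypothetical format)."""
    return secrets.token_hex(16)


def leading_zero_bits(digest: bytes) -> int:
    """Count how many zero bits the digest starts with."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        # bit_length() gives the position of the highest set bit in this byte
        bits += 8 - byte.bit_length()
        break
    return bits


def verify(challenge: str, nonce: int, difficulty: int) -> bool:
    """Server side: one hash is enough to check a submitted nonce."""
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
    return leading_zero_bits(digest) >= difficulty


def solve(challenge: str, difficulty: int) -> int:
    """Client side: brute-force nonces until one passes verification."""
    nonce = 0
    while not verify(challenge, nonce, difficulty):
        nonce += 1
    return nonce


if __name__ == "__main__":
    difficulty = 12  # assumed value; roughly 2**12 hashes on average
    challenge = make_challenge()
    nonce = solve(challenge, difficulty)
    print("solved:", verify(challenge, nonce, difficulty))
```

The asymmetry is the point of the design: solving costs the client thousands of hashes, while verifying costs the server one. It also hints at the false-positive problem mentioned above, since a feed reader that cannot run the solver never gets past the challenge.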