GeistHaus

go-away

git.gammaspectra.live

Self-hosted abuse detection and rule enforcement against low-effort mass AI scraping and bots.

9 pages link to this URL
powxy

Scraper-defense reverse proxy

2 inbound links object en git, non-profit, foss, oss, freesoftware, opensource, codehosting
Mataroa blog

Blogging platform for minimalists.

0 inbound links en blog, blogging, platform, fast, simple, minimal
Crawlers hitting Forgejo instances - global abuse trend

Detailed discussions on excessive crawling targeting code.forgejo.org:
- [February 2025](https://codeberg.org/forgejo/discussions/issues/297)
- [April 2025](https://codeberg.org/forgejo/discussions/issues/331)

---

Codeberg and the Forgejo infrastructure (which are entirely separate) were both ...

2 inbound links object en git, non-profit, foss, oss, freesoftware, opensource, codehosting
Code

Hopefully-useful tidbits of code and information.

0 inbound links webpage en
Dealing with Web Scrapers

Nowadays it seems like every tech company is eager to scrape the web. Unfortunately, it seems like the majority of traffic that comes to this small site is from scrapers. While my static website is able to handle the load, the same cannot be said about everyone. Overall, the techniques I’ve seen website owners use aim to make scraping more difficult, though it’s a balance: the harder we make it for bots to access a website, the more we turn away regular humans as well. Here’s a short and non-exhaustive list of techniques:

0 inbound links article en blog Web Scraping, CAPTCHA, Rate Limiting, Robots.txt, Proof of Work
Preventing bot scraping on Publ and Flask

This morning I was once again thinking about how to put some proper antibot behavior onto my websites, without relying on Cloudflare. There are plenty of fronting proxies like Anubis and Go Away which put a simple proof-of-work task in front of a website. This is pretty effective, but it adds more of an admin tax (and is often quite difficult to configure for servers that host multiple websites, such as mine), and sometimes the false positive rates can have some other bad effects, such as disallowing feed readers and the like.

0 inbound links website en