Wednesday, March 11, 2026

Re: cvsweb news

On 06/03/2026 22:28, Dave Polaschek wrote:
> Identifying the crawlers is (almost) as simple as three ifs in a trenchcoat: https://chronicles.mad-scientist.club/tales/surviving-the-crawlers/#three-ifs-in-a-trenchcoat

only 1 of those 3 ifs would be applicable here, but that's better than none

> As for filtering by ip address, I found on my test website that blocking AWS, Azure, Digital Ocean, and a few other "cloud providers" stopped most of the AI crawlers, but the ones that got through after I blocked those were using an automated chrome running from residential IP addresses, and were even more aggressive about crawling than the ones coming from the cloud providers. These are nasty, because they are real browsers from real residential addresses, using some sort of browser extension to do the crawling unbeknownst to the human user.

no, don't block IPs. score requests. what rspamd does for mail, you
should do for http requests, more or less.

neutral request: 0 points

bad protocol compliance? +1
bad IP? +1
bad user agent? +1

so a normal browser gets 0 points, a suspected bot gets 1 point or more,
and that's that.
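A minimal sketch of the +1-per-check scoring above, in Python. Everything concrete here is a hypothetical stand-in, not a real detection rule: the "bad IP" list is just the 203.0.113.0/24 documentation range, the "bad user agent" substrings are made up, and "bad protocol compliance" is reduced to "missing `Accept` or `Accept-Language` headers". A real setup would plug in its own checks (and could give them different weights, rspamd style); the point is only that each check adds to a score rather than blocking outright.

```python
import ipaddress
from dataclasses import dataclass, field

# Hypothetical "bad IP" list: 203.0.113.0/24 is a documentation range (TEST-NET-3),
# used here as a placeholder for e.g. known cloud-provider prefixes.
BAD_NETWORKS = [ipaddress.ip_network("203.0.113.0/24")]

# Hypothetical user-agent substrings treated as suspicious.
BAD_UA_SUBSTRINGS = ("python-requests", "curl", "scrapy")


@dataclass
class Request:
    ip: str
    user_agent: str
    headers: dict[str, str] = field(default_factory=dict)


def bad_protocol(req: Request) -> bool:
    # Hypothetical heuristic: browsers normally send Accept and Accept-Language.
    names = {k.lower() for k in req.headers}
    return not {"accept", "accept-language"} <= names


def bad_ip(req: Request) -> bool:
    # Mixed IPv4/IPv6 membership tests simply return False.
    addr = ipaddress.ip_address(req.ip)
    return any(addr in net for net in BAD_NETWORKS)


def bad_user_agent(req: Request) -> bool:
    ua = req.user_agent.lower()
    return ua == "" or any(s in ua for s in BAD_UA_SUBSTRINGS)


def score(req: Request) -> int:
    # Neutral request: 0 points; each failed check adds +1 (bools sum as ints).
    return sum([bad_protocol(req), bad_ip(req), bad_user_agent(req)])


def is_suspected_bot(req: Request, threshold: int = 1) -> bool:
    return score(req) >= threshold


# Usage: a browser-like request vs. a scripted client from a "bad" range.
browser = Request(
    "198.51.100.7",
    "Mozilla/5.0 (X11; Linux x86_64)",
    {"Accept": "text/html", "Accept-Language": "en"},
)
bot = Request("203.0.113.5", "python-requests/2.31", {})

print(score(browser), score(bot))  # 0 3
```

Because each signal only contributes points, a VPN user with a normal browser trips at most the IP check and lands on 1 instead of being cut off outright; what happens at a given score (serve, rate-limit, challenge) is left open here.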

blocking requests by IP ends up cutting off legit VPN traffic... and
now that we need VPNs to access our pornhubs and spankbangs here in
Europe, VPN usage will only intensify
