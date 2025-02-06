Geekzone: technology news, blogs, forums
Last night our NZ site was 'attacked' by a ChatGPT bot. Anyone else?
atomjump

3 posts

Wannabe Geek


#318639 6-Feb-2025 09:49

Hi - we are a self-hosted website in New Zealand (AtomJump), and during a five-hour period last night we registered 70K hits by OpenAI's ChatGPT bot, i.e. an average of about 4 requests per second, peaking at maybe 10 requests/second.
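
As a quick sanity check on the numbers quoted above, the average rate follows directly from the hit count and the length of the window. This is only the arithmetic; the peak figure cannot be derived this way and would have to come from the access logs.

```python
# Figures quoted in the post: roughly 70K hits over a five-hour period.
total_hits = 70_000
window_hours = 5

# Average request rate over the whole window.
avg_per_second = total_hits / (window_hours * 3600)
print(f"average: {avg_per_second:.2f} requests/second")
```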

 

We scrambled and blocked their bot on our robots.txt file. Our service stayed up, but probably only because we have a few load-balanced servers.

 

OpenAI have not got back to us, though they seemed to stop the bot 'attack' after our robots.txt change.  Is anyone else in NZ seeing similar behaviour?

davidcole
6008 posts

Uber Geek

Trusted

  #3339840 6-Feb-2025 11:46

Oh no, it’s started.  It’s become self-aware and decided our websites are the problem.  Next it will be us!!!!  Where’s John Connor when we need him?




atomjump

3 posts

Wannabe Geek


  #3339881 6-Feb-2025 13:15

😀 Curiously I don't think their legal team (the only sensible contact address I could find on OpenAI's website) got the AI memo. No answer so far.  Maybe the bots have seen the future, already, and terminated them before they could reply?

marpada
471 posts

Ultimate Geek


  #3339887 6-Feb-2025 13:30

How do you know the requests came from ChatGPT? User Agent header is trivial to spoof.
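
One way to go beyond the User-Agent string is to compare the source IP of each request against the address ranges the crawler operator publishes. The sketch below assumes you have already obtained such a list of CIDR ranges (OpenAI's crawler documentation is understood to publish one, but verify that yourself). The function name and the ranges shown are hypothetical placeholders drawn from the reserved documentation address blocks, not real crawler addresses.

```python
import ipaddress

def ip_in_published_ranges(client_ip, cidr_ranges):
    """Return True if client_ip falls inside any of the given CIDR ranges."""
    addr = ipaddress.ip_address(client_ip)
    for cidr in cidr_ranges:
        net = ipaddress.ip_network(cidr, strict=False)
        # Only compare like with like (IPv4 vs IPv6).
        if addr.version == net.version and addr in net:
            return True
    return False

# Placeholder ranges (RFC 5737 / RFC 3849 documentation blocks), NOT real
# crawler ranges. Substitute the list published by the crawler operator.
EXAMPLE_RANGES = ["192.0.2.0/24", "2001:db8::/32"]

claimed_bot_ip = "192.0.2.17"     # hypothetical request claiming to be GPTBot
unrelated_ip = "198.51.100.5"     # hypothetical request from elsewhere

print(ip_in_published_ranges(claimed_bot_ip, EXAMPLE_RANGES))
print(ip_in_published_ranges(unrelated_ip, EXAMPLE_RANGES))
```

A request that carries the GPTBot user agent but comes from outside the published ranges is a reasonable candidate for a spoofed header.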



freitasm
BDFL - Memuneh
78984 posts

Uber Geek

Administrator
ID Verified
Trusted
Geekzone
Lifetime subscriber

  #3339888 6-Feb-2025 13:40

atomjump:

 

We scrambled and blocked their bot on our robots.txt file. Our service stayed up, but probably only because we have a few load-balanced servers.

 

 

This has no effect at all. It's well known that most of these bots ignore robots.txt. Very few bots are "good netizens". Also, the robots.txt file is not checked on every request, so the bot would probably keep going anyway.

 

 

OpenAI have not got back to us, though they seemed to stop the bot 'attack' after our robots.txt change.  Is anyone else in NZ seeing similar behaviour?

 

 

You won't hear back from them. They aren't good netizens either.

 

If you are worried you'd need a service like Cloudflare Security. This is the free Bot detection (which also includes blocking RSS feeds, and might impact inbound API requests):

 

 

This is the paid version:

 

 

 

 

 

Or you can define some Web Application Firewall rules for this:

 

 

Again, robots.txt does nothing if they ignore it. And they mostly do.




atomjump

3 posts

Wannabe Geek


  #3339901 6-Feb-2025 15:16

Thanks both. Cloudflare is an interesting option, although we are purposely independent, as an organisation, from any US-based company.  Our router firewall is probably the next best bet.

 

Re: spoofing the user agent header. Yes, we can't be 100% certain it is them.

 

The user agent was:

 

"Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.2; +https://openai.com/gptbot)"

 

On adding the recommended

 

User-agent: GPTBot
Disallow: /

 

it did seem to back down within a few hours (unless it simply ran its course of URIs).
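
For anyone wanting to confirm that a rule like this parses the way they expect, Python's standard `urllib.robotparser` can evaluate it locally. This is a minimal sketch: the `example.org` URL and the `SomeOtherBot` name are placeholders, and it only shows how a compliant parser reads the rule, not whether any given crawler actually honours it.

```python
from urllib.robotparser import RobotFileParser

# The rule from the post, with each directive on its own line.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Placeholder URL; any path on the site is covered by "Disallow: /".
url = "https://example.org/any/page"
gptbot_allowed = parser.can_fetch("GPTBot", url)
other_allowed = parser.can_fetch("SomeOtherBot", url)

print("GPTBot allowed:", gptbot_allowed)
print("Other bots allowed:", other_allowed)
```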

 

Other theories are malicious use of their API: https://www.theregister.com/2025/01/19/openais_chatgpt_crawler_vulnerability/

 

 

philipnewmannz80
2 posts

Wannabe Geek

Trusted

  #3366587 22-Apr-2025 18:57

Late to the party, but I was looking to see what others were doing without using a 3rd party service and came across a few Apache ModSecurity rules to rate limit different types of bot traffic. You will find 99% of the bot traffic will come through HTTP/1.1, while normal website users will mostly be on HTTP/2.
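
The rules themselves are not reproduced here, so the following is not ModSecurity configuration. It is a small Python sketch of the same idea, assuming a sliding one-second window per client with a stricter cap for HTTP/1.1 than for HTTP/2. The class name and the limits are hypothetical and would need tuning against real traffic.

```python
import time
from collections import defaultdict, deque

# Hypothetical per-second limits: tighter for HTTP/1.1, looser for HTTP/2.
LIMITS = {"HTTP/1.1": 2, "HTTP/2": 10}
WINDOW_SECONDS = 1.0

class ProtocolRateLimiter:
    def __init__(self, limits=LIMITS, window=WINDOW_SECONDS):
        self.limits = limits
        self.window = window
        self.hits = defaultdict(deque)  # (ip, protocol) -> recent timestamps

    def allow(self, client_ip, protocol, now=None):
        """Return True if the request is within the limit for its protocol."""
        now = time.monotonic() if now is None else now
        # Unknown protocols fall back to the strictest configured limit.
        limit = self.limits.get(protocol, min(self.limits.values()))
        recent = self.hits[(client_ip, protocol)]
        # Drop timestamps that have aged out of the window.
        while recent and now - recent[0] >= self.window:
            recent.popleft()
        if len(recent) >= limit:
            return False
        recent.append(now)
        return True

limiter = ProtocolRateLimiter()
# Three HTTP/1.1 requests from one (placeholder) address in the same instant.
burst = [limiter.allow("203.0.113.9", "HTTP/1.1", now=0.0) for _ in range(3)]
print(burst)
```

Passing `now` explicitly keeps the example deterministic; in a live server you would leave it out and let the limiter read the clock.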

Behodar
10353 posts

Uber Geek

Trusted
Lifetime subscriber

  #3366589 22-Apr-2025 19:18

I've also seen a trick of putting a "honeypot" URL in robots.txt (and nowhere else): if a given client hits that disallowed URL then you can automatically block them at the firewall. You'd probably want to set a reasonable timeout that auto-unblocks them again in case the IP address gets reallocated to a legitimate user.
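
A rough sketch of that honeypot logic, assuming the bookkeeping lives in application code and the actual firewall block is applied by whatever mechanism you already use. The path, the 24-hour timeout, and the class name are all hypothetical.

```python
import time

HONEYPOT_PATH = "/private-do-not-crawl/"  # hypothetical; listed only as Disallow in robots.txt
BLOCK_SECONDS = 24 * 3600                 # hypothetical auto-unblock timeout

class HoneypotBlocklist:
    def __init__(self, honeypot_path=HONEYPOT_PATH, block_seconds=BLOCK_SECONDS):
        self.honeypot_path = honeypot_path
        self.block_seconds = block_seconds
        self.blocked_until = {}  # client IP -> expiry timestamp

    def is_blocked(self, client_ip, now=None):
        now = time.time() if now is None else now
        expiry = self.blocked_until.get(client_ip)
        if expiry is None:
            return False
        if now >= expiry:
            # Timeout reached: forget the block in case the address has
            # been reallocated to a legitimate user.
            del self.blocked_until[client_ip]
            return False
        return True

    def observe(self, client_ip, path, now=None):
        """Record a request; return True if it should be served."""
        now = time.time() if now is None else now
        if self.is_blocked(client_ip, now):
            return False
        if path.startswith(self.honeypot_path):
            # Only a client that ignored robots.txt should ever get here.
            self.blocked_until[client_ip] = now + self.block_seconds
            return False
        return True

blocklist = HoneypotBlocklist()
print(blocklist.observe("203.0.113.7", "/index.html", now=0))
print(blocklist.observe("203.0.113.7", HONEYPOT_PATH, now=10))
print(blocklist.observe("203.0.113.7", "/index.html", now=20))
```

The addresses are from a reserved documentation range. The timeout is the main design choice: long enough to deter a crawler, short enough that a recycled address is not locked out for good.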

