
Today my server was flooded with hundreds of requests to the contact page of my site (/contact) in just 2 minutes.

I got hundreds of these lines in my Apache log:

31.13.115.6 - - [18/Jun/2019:10:54:39 +0200] "GET /contacto HTTP/1.1" 301 331 "-" "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)" 232
31.13.115.25 - - [18/Jun/2019:10:54:39 +0200] "GET /contacto HTTP/1.1" 301 331 "-" "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)" 232

I'm not sure what caused this, but my server went down because of it. I want to make sure this will not happen again.

My hosting provider told me that I can block these requests by adding a rule to my .htaccess using RewriteCond.

I know that I will have to use something like:

RewriteCond %{HTTP_USER_AGENT} "facebookexternalhit/1.1" 

but I don't have much knowledge about this.



UPDATE for MrWhite:

I think I know what the problem could be. I have an old site, oldsite.com, which is redirected to my new site, newsite.com. In the .htaccess of oldsite.com I added these lines to create the redirection:

Rules in oldsite.com/.htaccess

RewriteEngine on
RewriteRule ^(.*)$ https://www.newsite.com/$1 [R=301,L]

I created this rule when I changed the domain of my site; its goal is to redirect traffic from the old site to the new site without hurting SEO.
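For reference, if both domains ever resolve to the same document root, a redirect like this is usually conditioned on the requested host so that it only fires for the old domain and cannot loop. A minimal sketch, using the oldsite.com/newsite.com placeholders from above:

RewriteEngine on
# Sketch only: redirect just the requests that arrive on the old host
RewriteCond %{HTTP_HOST} ^(www\.)?oldsite\.com$ [NC]
RewriteRule ^(.*)$ https://www.newsite.com/$1 [R=301,L]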

It worked fine until now. Do you think this could be the cause? If so, do you think I need to change this rule in www.oldsite.com/.htaccess instead of adding other rules in www.newsite.com/.htaccess?

  • Presumably oldsite.com and newsite.com aren't pointing to the same place, ie. they don't share the same .htaccess file? Presumably the log entries you posted above are from newsite.com, not oldsite.com? What about HTTP vs HTTPS - these would normally be in separate logs. What are the log entries that follow the above? The redirect directives you posted above shouldn't be a problem. But if these requests are for oldsite.com (as opposed to newsite.com) then these blocking directives need to be applied to oldsite.com, not newsite.com. Commented Jun 20, 2019 at 9:38
  • @MrWhite 1- oldsite.com is pointing to newsite.com. 2- newsite.com has not any pointing rule. 3- The log entries are from newsite.com. I will add the blocking rules to newsite.com. Commented Jun 21, 2019 at 7:22
  • When I say "pointing to the same place" I mean that the domains resolve to the same place (before redirection) - the same place on the filesystem. But I think you have taken it to mean "redirect"? If oldsite.com and newsite.com reside on separate filesystems and you are simply redirecting from the old to the new, and the log entries are from the new, then the redirect you posted above would seem to have nothing to do with this problem. The "301 redirects" in the log entries above are caused by something else at newsite.com. Commented Jun 21, 2019 at 7:58

1 Answer


You state that these requests are for your contact page /contact; however, the log entries you've posted are for /contacto (an extra "o"), and they show a 301 redirect response, which will trigger a second request to your server (provided the crawler follows redirects). Why is there a 301 redirect? To what page are you redirecting?

These do appear to be from the genuine Facebook "crawler", but as noted in numerous Stack Overflow questions, the Facebook crawler does seem prone to being rather aggressive!

RewriteCond %{HTTP_USER_AGENT} "facebookexternalhit/1.1" 

The RewriteCond (condition) directive alone does nothing. You need a RewriteRule to actually do something.

For example:

RewriteCond %{HTTP_USER_AGENT} ^facebookexternalhit/1\.1
RewriteRule ^contact$ - [F]

The above will send a 403 Forbidden for all requests to /contact where the user-agent starts with facebookexternalhit/1.1. (It's a regex, so the literal dot should be backslash escaped.)

The request is naturally still hitting your application server (to block the request entirely you would need some kind of proxy), but it's now not doing much when it does.
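If it turns out the crawler is also hammering other URLs (an assumption, not something your log excerpt shows), the same condition can be applied site-wide rather than only to /contact. A minimal sketch:

# Sketch: block every request from this crawler, not just /contact
RewriteCond %{HTTP_USER_AGENT} ^facebookexternalhit/1\.1
RewriteRule ^ - [F]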

The accepted answer on the linked question above talks about sending a 429 Too Many Requests status instead (together with a Retry-After header) - but only after a certain number of requests in quick succession (a PHP script is provided).
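mod_rewrite alone can't count requests, so it can't reproduce that "only after N requests" behaviour; but on Apache 2.4 the R flag accepts non-3xx status codes, so an always-on 429 variant would look roughly like this (a sketch, not a replacement for the rate-limiting script, and it does not set Retry-After):

# Sketch (Apache 2.4+): always answer 429 for this crawler on /contact
RewriteCond %{HTTP_USER_AGENT} ^facebookexternalhit/1\.1
RewriteRule ^contact$ - [R=429,L]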

  • I added an update to my question for you. I think you hit the cause of this issue. Commented Jun 20, 2019 at 7:23
