Fri, 12 Jan 2007
Graceful Reload
Yesterday we tried to solve a problem we had probably had for a month or so: we observed very high load spikes on our application cluster servers. There were usually only a few such spikes a day, and they usually did not occur on all servers simultaneously. I think the problem had existed since we moved to the new system (Apache 2.2 based, native x86_64). Here is a load graph (the problem was solved around 5:30pm):
Mirek found that during these load peaks there was an extraordinary number of Apache processes serving our title page (which is quite computationally intensive, but rarely requested on such a massive scale). So we suspected that somebody was DDoSing us. But according to the Apache status page, the clients came from the 127.0.0.1 address.
I don't know of any case where our application would want to access our title page over HTTP (we do some self-referencing requests, for example for WAP access, but none for the title page). After increasing the server log level we found that these requests had a strange User-Agent value: "internal dummy connection". A quick search for this string gave us the answer:
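To check how often these requests occur, the access log can be scanned for that User-Agent. The following is only a minimal sketch, assuming the log uses Apache's "combined" format (with the User-Agent as the last quoted field); the function name count_dummy_requests and the regular expression are my own illustration, not anything Apache provides. It matches the string as a substring, so a longer User-Agent that merely contains it is counted too.

```python
import re
from collections import Counter

# Assumption: Apache "combined" log format, e.g.
# host ident user [time] "request" status bytes "referer" "user-agent"
LINE_RE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

# The User-Agent string reported in the post.
DUMMY_AGENT = "internal dummy connection"


def count_dummy_requests(lines):
    """Count dummy-connection requests per (minute, request line)."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if m is None:
            continue  # skip lines that are not in the assumed format
        if DUMMY_AGENT not in m.group("agent"):
            continue  # an ordinary client request
        # Truncate "12/Jan/2007:17:05:31 +0100" to "12/Jan/2007:17:05".
        minute = m.group("time")[:17]
        counts[(minute, m.group("request"))] += 1
    return counts
```

Calling count_dummy_requests(open(path)) on an access log (the path depends on your setup) gives one counter entry per minute and request line, so a burst of "GET /" entries in a single minute should line up with a reload.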
During a "graceful reload", the main Apache process apparently contacts its children not by sending them the SIGUSR1 signal, as in previous Apache releases, but by sending them a dummy "GET /" request, so that after serving the request they can check (and find out) that the configuration has changed, and terminate themselves.
So every time we changed something in our applications (which happens several times a day), there were many Apache processes trying to serve our dynamic title page to Apache itself. Because there are some other (service-only) Apache processes as well, the load spike was sometimes far bigger than what an ordinary remote DDoS attack could cause.
A mod_rewrite hack in the server configuration solved the problem: we redirect such dummy requests to /robots.txt instead of the dynamic title page:
<Directory /documentroot>
    RewriteEngine on
    RewriteCond %{HTTP_USER_AGENT} internal\ dummy\ connection
    RewriteRule ^$ /robots.txt [L]
    ...
</Directory>
If you ask me, I think it is a pretty lame way for Apache to restart itself. The URL in the internal request is not even configurable (what would Apache do when not configured to listen on 127.0.0.1 at all?), and from my searches it looks like we are not the first to run into this problem.