I recently ran into a problem where my customers received a 503 error whenever they tried to access their Trac instances, which we serve via mod_wsgi. Under the hood, we saw an error of the form:
No such file or directory: mod_wsgi (pid=[redacted]): Unable to connect to WSGI daemon process '[redacted]' on '/var/run/wsgi.[redacted].sock'
The socket name contains three numbers. If you see a socket of the form /var/run/wsgi.1.2.3.sock, 1 identifies the server process that spawned it (its PID), 2 is the ap_my_generation value (a number related to gracefuls), and 3 is an entry_id, which I didn't need to look into.
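To make the naming scheme concrete, here is a minimal sketch that pulls the three numbers out of a socket path. The function and field names are my own invention for illustration, not anything mod_wsgi provides, and the pattern simply assumes the three-number layout described above.

```python
import re
from typing import NamedTuple, Optional


class WsgiSocket(NamedTuple):
    """The three numbers in a mod_wsgi socket name (field names are hypothetical)."""
    server_pid: int   # process that spawned the socket
    generation: int   # number related to gracefuls
    entry_id: int     # third number; not examined further here


# Three dot-separated integers right before ".sock". The separator after
# "wsgi" is matched loosely so either a "." or a "/" is accepted.
_SOCKET_RE = re.compile(r"wsgi[./](\d+)\.(\d+)\.(\d+)\.sock$")


def parse_wsgi_socket(path: str) -> Optional[WsgiSocket]:
    """Return the parsed numbers, or None if the path does not match."""
    match = _SOCKET_RE.search(path)
    if match is None:
        return None
    return WsgiSocket(*(int(group) for group in match.groups()))


print(parse_wsgi_socket("/var/run/wsgi.1.2.3.sock"))
```

The first field is the one that matters for this problem, since it tells you which process to go looking for.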
I concluded that Apache was essentially saying to mod_wsgi, "You promised there would be a socket at this place we agreed on, and there's not." mod_wsgi is no longer serving sockets there because it has been respawned by a new child process.
Using the socket number, I was able to determine that there was a stray httpd process that should have died many gracefuls ago, but did not. We do graceful restarts dozens of times per day to accommodate changes on the server, so it's unsurprising that one would occasionally get dropped. In our case, the immediate solution was to kill the process named in the socket number (1 in the example above). Once we did, the problem went away.
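If you want to check whether the process named in the socket is still hanging around before you kill anything, something like the following works on Unix. It is only a sketch: pid_is_alive is a name I made up, and it tells you that a process with that PID exists, not that it is the stray httpd, so confirm with ps before sending a real signal.

```python
import os


def pid_is_alive(pid: int) -> bool:
    """Return True if a process with this PID currently exists (Unix only)."""
    try:
        # Signal 0 performs the existence and permission checks
        # without actually delivering a signal.
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        # The process exists but belongs to another user.
        return True
    return True


# Example: the current process is, by definition, alive.
print(pid_is_alive(os.getpid()))
```

If this returns True for the first number in the socket name and that process is not the Apache parent you expect to be running, it is a good candidate for the leftover process.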
Thursday, June 19, 2014