Recent Server Downtime

I have been a member of the Purdue Amateur Radio Club for nearly 4 years. I have made some great friends there and eventually rose to the position of Vice President.  One of the benefits of being an officer is that I get to colocate my server in the Shack with the club.  This helps a lot with keeping my small apartment quiet, and in return I opened the server up so that anyone in the club who is willing to go through the training can control my packet radio remotely.  It does, however, have the side effect of making my server prone to accidental power loss if somebody trips over the power cords.

On Saturday, I was asked to come up to the Shack to help reorganize the rack.  The club had recently received a lot of equipment that needed to be mounted in the rack, so we decided to shut everything down and redo the layout from the ground up.  The rack looks much nicer now, with properly managed cables and a better layout for our gear, but we had to unplug the server to do it.  This is where the problems started.

My server runs a hypervisor known as ESXi.  For those who aren’t in the know, that means the server runs virtual machines which then host the various services.  The VM that hosts this site and a few other services, known as Ubuntu Main, is my only “high availability” VM on the machine.  I keep that in quotes because my VMs typically stay up for only around 1 month at a time.  The hypervisor, on the other hand, can stay running for much longer.  When I shut it down, I was surprised to find that the server had been on for 199 days!  That’s the longest I’ve managed to keep a machine running without rebooting, and it probably would have kept going until it experienced a hardware failure.

After reorganizing and wiring up the rack, we began powering everything up and inspecting it.  From the outside, everything seemed to be working normally, but once I got back home and tried to log in to WordPress I realized something wasn’t right.  Whenever I tried to log in, the site would error out and give an HTTP Error 500.  This is very bad.

Since my server is public facing, meaning it has nothing between it and the internet, automated bots scan it daily looking for potential hacking targets.  For this reason, I run firewalls on everything, but they can only do so much to protect the server.  When a server returns an Error 500, some of those bots treat the error as a sign of a possible vulnerability and will continue to scan further to see if they can compromise the system.  To add to this, I wasn’t able to log into the site’s admin interface to change anything until I fixed the error.

After googling the issue, I found out that this error is actually not uncommon.  WordPress has options to add plugins for firewalls, text editors, and many other functions which can improve the site, and as it turns out, an Error 500 is commonly caused by one of those plugins becoming corrupted.  Further testing showed that the Captcha login plugin was primarily to blame for the error.  I added that plugin when I started the site because of the rampant number of bots posting ads with their own accounts (seriously, anybody visiting this site, a UAS blog, isn’t going to care about growing weed indoors.  Just think about the irony in that one for a moment).
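
For anyone hitting the same error, that kind of testing can be roughly sketched in Python.  This is only a sketch under assumptions, not a record of the exact steps taken here: it assumes the plugins live in the default wp-content/plugins directory, relies on the fact that WordPress deactivates a plugin whose folder it can no longer find, and takes a site_ok callback (a hypothetical helper you supply) that reports whether the site loads again.  The function names and the ".disabled" suffix are made up for illustration.

```python
from pathlib import Path
from typing import Callable, Optional

# Hypothetical marker appended to a plugin folder to hide it from WordPress.
SUFFIX = ".disabled"


def disable_plugin(plugins_dir: Path, name: str) -> Path:
    """Rename the plugin folder so WordPress can no longer load it."""
    disabled = plugins_dir / (name + SUFFIX)
    (plugins_dir / name).rename(disabled)
    return disabled


def enable_plugin(plugins_dir: Path, name: str) -> Path:
    """Undo disable_plugin by restoring the original folder name."""
    restored = plugins_dir / name
    (plugins_dir / (name + SUFFIX)).rename(restored)
    return restored


def find_faulty_plugin(
    plugins_dir: Path, site_ok: Callable[[], bool]
) -> Optional[str]:
    """Disable plugins one at a time and report the first one whose
    removal makes site_ok() return True. Every folder is restored
    afterwards, so the directory ends up exactly as it started."""
    names = sorted(
        p.name
        for p in plugins_dir.iterdir()
        if p.is_dir() and not p.name.endswith(SUFFIX)
    )
    for name in names:
        disable_plugin(plugins_dir, name)
        try:
            if site_ok():
                return name
        finally:
            # Always put the folder back, even if site_ok() raises.
            enable_plugin(plugins_dir, name)
    return None
```

In practice site_ok could request the login page and check that it no longer returns a 500.  Once the culprit is known, calling disable_plugin on it keeps it out of the way until it can be reinstalled or replaced.  Back up the plugins directory before trying anything like this on a live site.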

I have restored the functionality of the site after discovering that bug, but since the captchas were the only thing keeping the bots at bay, I will be turning off the sign-up option to keep the spam to a minimum.  If you would like to be able to post comments, feel free to email me using the info on my contact page.
