System of a Down, or How I Rebuilt This Site in a Week

In some of my previous posts, I mentioned that I host this site from my own server. This gives me a lot of freedom to modify the site in any way I want and add features that would be difficult or impossible if I were paying for a VPS in a datacenter. It does, however, mean that I am a lot more prone to downtime in the event of a hardware or software failure. This is a story about one such failure.

About two years ago, I bought my first server. It was an R710 with 16 GB of RAM and two Xeon 5000 series processors. This is a pretty basic configuration with very limited capabilities, but it was good enough for learning.

When I got the server, I very quickly learned how to use ESXi, the hypervisor that ran on it. It allowed me to create virtual machines for different functions using industry-accepted software. Over time I had to upgrade the server and add more configurations to it, which made it “organically grow” as I learned more about what was possible. This was where I made my mistake.

Since I didn’t have much knowledge about what options were out there when I set up the server, I had to learn through trial and error. I learned how to set up network interfaces, adjust firewall settings for better security, manage a website, and many other things. I am fortunate, though, that I started playing with archive.org when I did, or I would be in a lot more trouble than I am right now.

For those who haven’t heard of it, archive.org is a website that takes snapshots of websites so you can go back in time and see what was on a site at a particular date. To get it to archive a page, a user has to tell it which page to save. This site doesn’t have too many pages, so I had the whole thing up on archive.org in about half an hour. Less than a day later, I did a software update that took the site down.
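For anyone who wants to confirm that a page really was captured, archive.org offers a Wayback Machine availability endpoint that returns the closest snapshot as JSON. The following is only a minimal sketch of how that check might look, not the process I actually followed: the function names are my own invention, and example.com stands in for a real page address.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Wayback Machine availability endpoint.
AVAILABILITY_API = "https://archive.org/wayback/available"


def availability_url(page_url, timestamp=None):
    """Build the query URL for one page (hypothetical helper name)."""
    params = {"url": page_url}
    if timestamp:
        # Optional YYYYMMDD[hhmmss] hint to find the snapshot nearest a date.
        params["timestamp"] = timestamp
    return f"{AVAILABILITY_API}?{urlencode(params)}"


def closest_snapshot(response):
    """Return the snapshot URL from a decoded JSON response, or None."""
    closest = response.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"]
    return None


def lookup(page_url):
    """Ask archive.org whether a snapshot exists (needs network access)."""
    with urlopen(availability_url(page_url), timeout=30) as resp:
        return closest_snapshot(json.load(resp))


if __name__ == "__main__":
    # example.com is a placeholder, not this site's address.
    print(lookup("example.com"))
```

Running lookup over a list of post addresses would show which pages have a snapshot and which ones were missed.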

My initial attempts to repair the server met with little success. Apache, one of the programs used to host the website, had seemingly lost all of its configuration. WordPress also had a lot of broken settings that didn’t make much sense. After correcting the configuration files, I was able to get Apache to show its default page, but it still didn’t see the WordPress site. At this point I decided it would be faster to start from scratch, since the problems ran a lot deeper than I expected.

The rebuild

This isn’t a completely bad thing. I had actually been wanting to rebuild the server so I could correct some of the mistakes I made during its initial setup. Among the things I wanted to do was adding a RAID 6 array to the server. This lets the server tolerate up to two failed hard drives, which reduces the risk from mechanical failure. I also wanted to ditch ESXi, since VMware wanted me to pay exorbitant amounts of money to do something as simple as taking a backup of my system. Fortunately I found an open source alternative, Proxmox.
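The trade-off with RAID 6 is capacity: two drives’ worth of space goes to parity, so an array of n equal drives leaves room for n - 2 drives of data. Here is a rough sketch of that arithmetic. The drive counts and the 2 TB size are made-up numbers for illustration, not my actual hardware, and the function name is my own.

```python
def raid6_usable_tb(drive_count, drive_size_tb):
    """Usable capacity of a RAID 6 array built from equal-sized drives.

    RAID 6 stores two parity blocks per stripe, which is what lets the
    array survive any two drive failures.
    """
    if drive_count < 4:
        raise ValueError("RAID 6 needs at least four drives")
    return (drive_count - 2) * drive_size_tb


if __name__ == "__main__":
    # Hypothetical array sizes, all using 2 TB drives.
    for count in (4, 6, 8):
        usable = raid6_usable_tb(count, 2)
        print(f"{count} drives: {usable} TB usable of {count * 2} TB raw")
```

The overhead shrinks as the array grows: four drives give up half their raw space to parity, while eight drives give up only a quarter.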

Similar to ESXi, Proxmox is a hypervisor that allows multiple virtual machines to run on one server. It supports a bunch of features, such as high availability clustering and Ceph storage, that would be useful if I wanted to add more servers for fault tolerance. The main draw for me, though, was the ability to take snapshots. If you have ever used quicksaves in games, then snapshots are the server equivalent. If I want to experiment with something that could ruin the server, I can take a snapshot and work from there, knowing I can always roll back to it. An example of this can be seen in image 1 below.

Image 1: Taking snapshots is as easy as hitting a button

While I was setting up the server, I also took the chance to set up the individual services using Docker containers. These are small self-contained software packages that can be easily moved from one machine to another. If I upgrade my server in the future, these will make the migration to the new machine much easier since I won’t have to install every dependency manually.

With everything running through Docker, I have a lot of options for what software I want to install on the server. Since I am experimenting with OpenDroneMap, a free, open source program used to stitch images into 3D maps, I decided to install the ODM containers first. I was surprised at how easy it was to install: it only took one command to set up everything needed to run the service. It really drove home why so many organizations are transitioning their services to Docker.

Image 2: WebODM dashboard with two processes running

WordPress was also very easy to set up. It took only an hour or two to get everything working between the database, plugins, and themes. This is where the time-consuming part begins. Since I don’t have the database from the previous server, I have to build everything from the ground up. That means using copy/paste to move all of the posts from the archive.org copy to the website. I have this process about halfway done, but it is slow going since I have to find all of the files on my computer and update all of the links in each post. It is also possible that some posts might lose their images forever, but only time will tell.
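Part of the link cleanup could probably be scripted. Pages copied from the Wayback Machine usually have their links rewritten to start with a web.archive.org/web/ prefix followed by a timestamp, so stripping that prefix recovers the original address. This is just a sketch under that assumption, not what I actually did (I did it by hand); the pattern and function name are my own, and the example.com links are placeholders.

```python
import re

# Matches the archive prefix placed in front of an original URL, e.g.
# https://web.archive.org/web/20200101000000/ or, for images and other
# embedded files, the same prefix with a two-letter modifier such as "im_".
WAYBACK_PREFIX = re.compile(
    r"https?://web\.archive\.org/web/\d{4,14}(?:[a-z]{2}_)?/"
)


def strip_wayback_prefix(html):
    """Remove archive prefixes so links point at the original site again."""
    return WAYBACK_PREFIX.sub("", html)


if __name__ == "__main__":
    # Placeholder markup standing in for a post copied from the archive.
    archived = (
        '<a href="https://web.archive.org/web/20200101000000/'
        'https://example.com/post-1/">Post</a>'
    )
    print(strip_wayback_prefix(archived))
```

This only handles absolute links. Anything the archive rewrote as a relative path, and any image that was never captured in the first place, would still need to be fixed by hand.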

This has been a very stressful week for me, but I am glad it is over. The server is now back up and running, and it’s going to be a lot safer in the future. I still have to install some smaller services that used to run on the old server, but since they aren’t as commonly used, I can put those off until I have time to work on them. The Docker containers will also let me experiment more with high availability services in the future, with the possibility of failover to another server if this issue happens again.
