The server does not boot. It doesn’t show up on the network, and there is no display output when it’s connected to a monitor. How did this happen?
We were minding our own business and watching television when ZAP… power outage… Oh well, I thought, it will be back in a moment. And so it was. The outage lasted for about 10-15 minutes and was caused by a lightning strike in a converter somewhere. After that we had no internet. Curious, I thought, but perhaps the internet provider was affected by the outage and it will take a short while for it to come back up.
The next day the internet was still offline. On top of that, I couldn’t listen to my music on Plex, even on the LAN. This is where I noticed that the server was offline as well. I gave up on studying and devoted myself to fixing the server, but why was the internet still offline? Then it dawned on me. Of course! Pi-hole!
I had set up Pi-hole on my server, and the router has the server set as primary AND secondary DNS, so when the server is down, nothing on the network can resolve names and there is effectively no internet… With that sorted, the internet was back online! But what happened to my server? Why wasn’t it booting? After rebooting the machine a couple of times without any change, I accepted that some part of the OS involved in the boot sequence had probably been corrupted by the sudden power outage.
Backup? What backup?
On my previous setup I used the automatic backup plugin for OpenMediaVault (OMV). Unfortunately I didn’t back up the boot and/or root partition, only the data volumes for my Docker containers. So how do I restore the server? In hindsight I could probably have replaced just the boot partition and it would have come back online, but instead I downloaded a new OMV image and burned it onto the eMMC module. The server was back online, but all settings and setups were gone… Even worse, the Docker interface from the older versions of OMV was gone as well. In its place we now have Docker Compose.
To be honest, Docker Compose is a much better setup than the old Docker interface, but I was used to the interface. The old graphical web interface made it intuitive for me, as a Docker novice, to set up containers. The problem was that it didn’t help you understand how Docker really works, and it felt a lot like getting the services to work was completely up to chance. Docker Compose is based on a setup file (a YAML file), and once you learn how it works you can very easily set up (or orchestrate) a whole set of services in a heartbeat (or at least a few minutes). If I had used Docker Compose to begin with, I wouldn’t have had such a problem getting my server back online.
I still can’t say that I have gotten the hang of Docker Compose, nor have I got all services back online. One service that is causing me problems is Plex, which is giving me very strange errors, and I still haven’t installed Pi-hole. I am working on it though!
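To give an idea of what such a setup file contains, here is a minimal Python sketch that describes two services as plain data and renders them as Compose-style YAML. Everything in it is an assumption for illustration: the service names, image tags, port mappings and volume paths are not my actual configuration, and the hand-rolled `render_compose` helper is hypothetical and only covers the handful of keys shown.

```python
# Minimal sketch: describe services as plain data, then render a compose file.
# All service names, images, ports and paths below are illustrative assumptions.

def render_compose(services):
    """Render a dict of service definitions as Compose-style YAML text."""
    lines = ["services:"]
    for name, spec in services.items():
        lines.append(f"  {name}:")
        lines.append(f"    image: {spec['image']}")
        lines.append("    restart: unless-stopped")
        # Only list-valued keys used in this sketch are handled here.
        for key in ("ports", "volumes"):
            if spec.get(key):
                lines.append(f"    {key}:")
                for item in spec[key]:
                    lines.append(f'      - "{item}"')
    return "\n".join(lines) + "\n"


SERVICES = {
    "pihole": {
        "image": "pihole/pihole:latest",
        "ports": ["53:53/tcp", "53:53/udp", "8080:80/tcp"],
        "volumes": ["./pihole/etc-pihole:/etc/pihole"],
    },
    "plex": {
        "image": "plexinc/pms-docker:latest",
        "ports": ["32400:32400/tcp"],
        "volumes": ["./plex/config:/config", "/srv/media:/data"],
    },
}

# Print the rendered file; redirect the output to compose.yaml to keep it.
print(render_compose(SERVICES))
```

The point is less the script than the shape of the result: one text file lists every service with its image, ports and volumes, so the whole set can be recreated from that file instead of being clicked together again by hand.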
Protection against future problems
So how do I avoid having to redo all of this setup when disaster strikes again? For one, I am now backing up both the boot and root partitions as an image on my RAID array. “How are you going to access the image if you can’t boot your machine?” Well, I’ll just pop in a temporary SD card with an OMV image on it, boot from that, fetch the backup image from the RAID array and burn it onto the eMMC module. Sure, it’s not seamless, but at least I don’t have to redo all of the setup I have done on the server. In case that doesn’t work, I now have a Docker Compose YAML file that can orchestrate the setup of all my Docker containers. That way I can easily redo that part of my setup. To be honest, the Docker containers are the most time-consuming part and the rest is pretty easy.
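For the curious, an image backup boils down to copying a partition byte for byte into a file and making sure the copy is intact. The Python sketch below shows that idea under some assumptions: it is not what the OMV backup plugin does internally, the device and destination paths in the usage comment are hypothetical, and reading a block device normally requires root. Imaging a partition that is mounted and in use can also produce an inconsistent image, so treat this as an illustration rather than a finished tool.

```python
import hashlib

CHUNK = 4 * 1024 * 1024  # read 4 MiB at a time to keep memory use low


def copy_with_checksum(source, destination):
    """Copy source to destination in chunks; return the SHA-256 of the bytes read."""
    digest = hashlib.sha256()
    with open(source, "rb") as src, open(destination, "wb") as dst:
        while True:
            chunk = src.read(CHUNK)
            if not chunk:
                break
            dst.write(chunk)
            digest.update(chunk)
    return digest.hexdigest()


def file_checksum(path):
    """Return the SHA-256 of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(CHUNK), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_is_intact(source, destination):
    """Copy source to destination and re-read the copy to confirm it matches."""
    return copy_with_checksum(source, destination) == file_checksum(destination)


# Hypothetical usage (paths are assumptions, run as root):
#   backup_is_intact("/dev/mmcblk1p1", "/srv/raid/backup/boot.img")
```

Re-reading the image after writing it is the cheap part of "checking that the backup is adequate". The expensive part, actually restoring from it once, still has to be done by hand.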
So why did a simple blackout corrupt my OS?
Well, I’m no engineer, but I suspect it’s because the eMMC module lacks power-loss protection, so a write that was in progress when the power was cut could have been left half finished. A solution for this would be to invest in a UPS that keeps the server up during an outage. The machine is very low power, so I wouldn’t need a powerful one.
The lesson learned here is that you need to check that your backup is adequate and that you understand how to resurrect your system from the type of backup you have.