Power Outage Corrupted XFS Filesystem | How I Fixed It

This past Monday, 27 May 2019, a fairly severe storm rolled through Southwestern Michigan and disrupted power. I have numerous computers in the house, most of which run some variation of openSUSE. Most of them are also battery backed in some form, except for one: my Kitchen Command Center. In many ways, I think it is rather crazy that computers don’t have battery backups by default by now. Since I didn’t take the time and care to put a UPS (Uninterruptible Power Supply) on this computer, it lost power, and so my troubles began.

Post Storm

After the storm had cleared and the likelihood of another power outage had diminished, I felt it was safe to power my devices back on. The machine booted up like it normally would, but logging in and launching applications were dreadfully slow. I must emphasize just how dreadful the slowness was: I could see that the disk was thrashing while the RAM was hardly being used. Looking at the System Monitor, I/O was eating up all of the CPU time. This was most certainly not the normal behavior of this machine and I was becoming a sad “Geeko.” RAM usage was less than 2 GiB and Plasma Desktop kept hanging, a behavior with which I am most certainly not familiar. I was starting to worry that there may have been hardware damage.

After doing a little digging, I determined that the problem was a corrupted file system. Based on how the computer behaved, my estimation was that it was the /home partition and not the root partition. When I looked at System Activity, whatever application I was trying to use had “disk sleep” next to it in the table. My first course of action was to do a file system repair.
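
The same “disk sleep” symptom can also be spotted from a terminal. What follows is only a minimal sketch, assuming a procps-style ps; the helper name list_disk_sleep is hypothetical. It keeps the processes whose state starts with D (uninterruptible sleep), which is the state the kernel itself labels “disk sleep.”

```shell
#!/bin/sh
# Hypothetical helper: keep only rows whose state column starts with "D"
# (uninterruptible sleep, shown as "disk sleep" in /proc/<pid>/status).
# Expects lines of the form: PID STAT COMMAND
list_disk_sleep() {
    awk '$2 ~ /^D/'
}

# The "=" after each column name suppresses the header line.
ps -eo pid=,stat=,comm= | list_disk_sleep
```

On a healthy system this usually prints little or nothing. A long list that stays put across repeated runs suggests processes stuck waiting on the disk.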

The Fix

I rebooted the machine and, instead of logging in to the desktop environment, dropped down to a terminal (Ctrl + Alt + F2) and logged in as root.

I unmounted /home, which in my case is located at /dev/sda4:

umount /dev/sda4

Since the terminal didn’t give me any confirmation that the drive was unmounted, I checked:

df -h

Looking through the list (you can omit the -h), I saw that there was indeed nothing mounted at /home, so I was able to conduct the repair.
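
Reading the df output by eye works, but the check can also be scripted. This is a rough sketch, assuming findmnt from util-linux is installed (it normally is on openSUSE); the function name ensure_unmounted is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the given device is not mounted anywhere.
ensure_unmounted() {
    dev="$1"
    # findmnt exits non-zero when nothing matches the requested source device.
    if findmnt --source "$dev" >/dev/null 2>&1; then
        echo "$dev is still mounted" >&2
        return 1
    fi
    echo "$dev is not mounted"
}
```

Calling ensure_unmounted /dev/sda4 prints a confirmation and returns success only when the partition is no longer mounted. With that confirmed, the repair itself is the next command.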

xfs_repair /dev/sda4

After several minutes, the process completed, seemingly without any errors. I rebooted the system and crossed my fingers.
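
A more cautious variation is to let xfs_repair inspect the file system in no-modify mode (-n) before allowing it to write anything. The sketch below is an assumption about how that could be wrapped, not a record of what was run here; check_then_repair is a hypothetical name. According to the man page, xfs_repair -n exits with status 1 when it detects corruption and 0 when it does not.

```shell
#!/bin/sh
# Hypothetical wrapper: inspect first, repair only if the check reports problems.
check_then_repair() {
    dev="$1"
    # -n = no-modify mode: report what would be fixed without writing to disk.
    if xfs_repair -n "$dev"; then
        echo "No corruption reported on $dev; skipping repair"
        return 0
    fi
    # Any non-zero status from the dry run lands here, so read its output first.
    echo "Problems reported on $dev; running repair"
    xfs_repair "$dev"
}
```

It would be called as root on the unmounted partition, for example check_then_repair /dev/sda4.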

Post Repair

Seemingly everything is back to normal. Whatever was causing the “Disk Sleep” is not happening anymore. I performed another update on the machine,

sudo zypper dup

rebooted it, and it is continuing to function just as it had before. I have not lost any data on the computer and I am using it like none of this ever happened. I don’t know the exact cause and depth of the corruption but I am just glad to be back to normal.

Final Thoughts

I have had to forcibly shut down systems with XFS before, and this is the first time I have had to do a file system repair. I could see that someone without technical expertise might just think their computer was broken and take more intrusive actions. I am also not sure whether some sort of file system integrity check, one that would normally have verified and repaired the file system automatically, failed to run. Regardless, the fix was relatively straightforward and the computer is back to normal. Furthermore, it might also behoove me to gift the machine with a UPS.

After losing a few hours of use out of the computer, I was able to add another tool to my open source / Linux toolbox. The storm, although inconvenient, has given me further confidence in the technology I have chosen.

Further Reading

https://linux.die.net/man/8/xfs_repair
Dell Inspiron 20 3048 All-In-One Desktop

Tuning Snapper | BTRFS Snapshot Management on openSUSE

Throughout my time helping users with openSUSE, one recurring issue that I have heard or read about has been system snapshots by Snapper filling up the root file system. Users have complained that their root file system fills up, which ultimately locks up their system. This is often caused by setting up the root partition with an insufficient size, less than 40 GiB. Some users may not want to allocate that much space, so a common course of action is to use BTRFS without snapshots, or to use XFS or ext4 instead.

There is this misguided impression that BTRFS is not a file system to be trusted, but I can tell you with great assurance that I have yet to have an issue with it. If you disagree with this, then your perception is based either on a non-openSUSE implementation or, if you had problems on openSUSE, on not having satisfied its recommendations.

BTRFS with snapshots is a good option for newer machines, but your disk partition size may be less than the recommended 40 GiB for root. If so, here is what you can do to adjust Snapper. As root, open the following file in your editor of choice:

/etc/snapper/configs/root

Scan down the configuration file and look for the “# limit for number cleanup” section. To limit the total number of snapshots, adjust the NUMBER_LIMIT and NUMBER_LIMIT_IMPORTANT lines.
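
To see the current values without scrolling through the whole file, the relevant lines can be pulled out with grep. This is just a small sketch; show_number_limits is a hypothetical helper name, and it falls back to the root config path given above when no file is passed.

```shell
#!/bin/sh
# Hypothetical helper: print the active (uncommented) number-cleanup settings
# from a Snapper config file; defaults to the root config.
show_number_limits() {
    grep -E '^NUMBER_(MIN_AGE|LIMIT|LIMIT_IMPORTANT)=' "${1:-/etc/snapper/configs/root}"
}
```

Running show_number_limits as root before and after editing makes it easy to confirm the change took. The command snapper -c root get-config shows the same settings along with all the others.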

I changed mine to the following:

# limit for number cleanup
NUMBER_MIN_AGE="1800"
NUMBER_LIMIT="2-6"
NUMBER_LIMIT_IMPORTANT="4-6"

After this adjustment, I have no more than 6 total file system snapshots and it reduced the space taken up by snapshots by about 10 GiB. It should be understood that your mileage may vary depending on how much you fiddle with your system and how much software you have installed.
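
The same limits can also be applied from the command line instead of an editor, and the cleanup can be triggered right away rather than waiting for Snapper to run it on its own schedule. Treat this as a sketch under assumptions: apply_number_limits is a made-up name, and the values simply mirror the ones shown above.

```shell
#!/bin/sh
# Hypothetical helper: set the number-cleanup limits, then prune right away.
apply_number_limits() {
    cfg="${1:-root}"
    snapper -c "$cfg" set-config "NUMBER_LIMIT=2-6" "NUMBER_LIMIT_IMPORTANT=4-6" || return 1
    # Run the "number" cleanup algorithm now instead of waiting for the schedule.
    snapper -c "$cfg" cleanup number || return 1
    # Show which snapshots are left.
    snapper -c "$cfg" list
}
```

Run as root, apply_number_limits root updates the config, prunes old snapshots, and lists what remains. How much space comes back depends on how much the snapshots differ from each other.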

Final Thoughts

openSUSE is such a stable distribution, even the rolling release, Tumbleweed, that snapshots are almost not necessary. I personally look at snapshots as a kind of insurance policy, but the fact is, as long as I have a working internet connection and a working terminal, entering sudo zypper dup (in Tumbleweed) will likely fix any issues I may have caused. As far as Leap is concerned, I haven’t seen an update that broke a system badly enough to require a rollback. That doesn’t mean something couldn’t slip past openQA that may affect your system; I just haven’t seen it.

Also note, I have such confidence in openSUSE Tumbleweed with BTRFS that it is what is on my home server. In over a year, not one update has broken any of the servers or messed with any configurations. It should also be noted that I run older and generally Linux-friendly hardware, so my chances of running into issues are much lower.

Further Reading

SUSE.com Snapper Cleanup