"I'm not proud of being a congenital pain in the ass. But I will take money for it."

Unwedging /dev/loop voodoo

Thu 08 December 2022

At $dayjob we make heavy use of a build system which creates pre-formatted bootable disk images for various platforms. Some of the tricks involved in getting this to work are kind of nasty. In particular, GRUB puts up a fight when asked to configure boot information for something it doesn't really believe is a disk. The build system knows this and uses various strange tricks to convince GRUB that it's working on a disk. All of which works surprisingly well...except occasionally when debugging changes to the build system itself, which of course can introduce bugs that crash the build system in ways that it doesn't quite know how to clean up.

Upshot is that sometimes we find that the build system has left a "stuck" /dev/loop device that needs to be unwedged by hand.

The first few steps are obvious. Assuming for purposes of discussion that we're trying to free up /dev/loop0:

lsblk -f

will show what block devices are present and what filesystems are mounted. If a filesystem is mounted, say on /dev/loop0p3, unmounting it with umount is a good first cleanup step.
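When there are several partitions involved, picking the mount points out of the lsblk output by eye gets old fast. Here's a rough sketch of a filter for that. The function name mounted_targets is made up for illustration, and it assumes mount points without spaces in them, since lsblk's raw output mode escapes those.

```shell
# Hypothetical helper: reads the output of
#   lsblk -nro NAME,MOUNTPOINT /dev/loop0
# on stdin (-n: no header, -r: raw columns, -o: pick columns)
# and prints only the mount points, one per line.
# Rows for unmounted devices have an empty second column and are skipped.
mounted_targets() {
    awk 'NF >= 2 { print $2 }'
}

# Possible usage (needs root for the umount; not run here):
#   lsblk -nro NAME,MOUNTPOINT /dev/loop0 | mounted_targets |
#       while read -r target; do sudo umount "$target"; done
```

This doesn't try to order nested mounts, so if one mount point sits inside another you may have to run it twice.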

Once all filesystems are dismounted, in theory we can just do:

sudo losetup -d /dev/loop0
sudo losetup -l

If the second losetup shows that the loopback device is gone, you're done. But sometimes it's not, even if the first losetup returns successfully, because all losetup is really doing here is scheduling the device for removal as soon as nothing is using it anymore. If something still has the device open, it won't go away (yet).
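Since a successful exit from losetup -d doesn't prove anything, it can help to have an explicit check rather than squinting at the listing. A minimal sketch, with still_attached being a name invented here:

```shell
# Hypothetical helper: reads the output of
#   losetup --list --noheadings --output NAME
# on stdin and succeeds (exit 0) if the given device is still listed.
# Exact comparison, so /dev/loop1 does not match /dev/loop10.
still_attached() {
    awk -v dev="$1" '$1 == dev { found = 1 } END { exit !found }'
}

# Possible usage (not run here):
#   sudo losetup -d /dev/loop0
#   sudo losetup --list --noheadings --output NAME |
#       still_attached /dev/loop0 && echo "/dev/loop0 is still wedged"
```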

At this point things get weird. At least with this build system, sometimes it's necessary to use dmsetup to remove device entries from the device mapper:

sudo dmsetup remove /dev/mapper/loop0p1
sudo dmsetup remove /dev/mapper/loop0p2
sudo dmsetup remove /dev/mapper/loop0p3

If the device mapping was all that was keeping the loopback device open, losetup should show it gone now.
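If you don't know offhand how many partition mappings got left behind, something like the following sketch can find them in the output of dmsetup ls. The helper name mappings_for is hypothetical, and it assumes the mappings follow the loopNpM naming seen above, which may not hold for every build system.

```shell
# Hypothetical helper: reads `dmsetup ls` output on stdin and prints the
# names of mappings that look like partitions of the given loop device
# (for "loop0": loop0p1, loop0p2, ...). Other mappings, and the
# "No devices found" message, are ignored.
mappings_for() {
    awk -v base="$1" '$1 ~ ("^" base "p[0-9]+$") { print $1 }'
}

# Possible usage (needs root; not run here):
#   sudo dmsetup ls | mappings_for loop0 |
#       while read -r name; do sudo dmsetup remove "$name"; done
```

dmsetup remove takes the bare mapping name as well as the /dev/mapper path, so the names can be passed through as-is.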

But sometimes even that's not enough, because some process still has the device open even though tools like lsof don't show it. A process with such a device open will cause the dmsetup call to fail. So now we look for processes that have the device open:

sudo grep -E 'loop|dm-' /proc/*/mountinfo

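For whatever reason, this incantation seems able to spot processes that have handles on loopback or mapped devices even when lsof can't. The likely explanation is that each mountinfo file lists the mounts visible in that process's mount namespace, so a mount lingering in some other namespace shows up here even though lsof, which reports open files, has nothing to say about it. If this command returns anything at all, the process IDs are right there in the names of the matching files. Use them to investigate the processes in question, and whack them if they look like the culprit(s). Then re-run the failed dmsetup call. At that point, if nothing else goes wrong, losetup will finally show /dev/loop0 gone.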
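The grep output can be noisy, since every matching mount line gets printed for every process. A small sketch for boiling it down to a list of unique process IDs follows; pids_from_matches is an invented name, and it relies on grep prefixing each match with the file name, which it does whenever it is given more than one file.

```shell
# Hypothetical helper: reads grep output of the form
#   /proc/1234/mountinfo:36 35 7:0 / /mnt ...
# on stdin and prints each numeric PID once, in ascending order.
# Non-numeric entries such as /proc/self are dropped.
pids_from_matches() {
    sed -n 's|^/proc/\([0-9][0-9]*\)/mountinfo:.*|\1|p' | sort -un
}

# Possible usage (not run here):
#   sudo grep -E 'loop|dm-' /proc/*/mountinfo | pids_from_matches |
#       while read -r pid; do ps -o pid=,args= -p "$pid"; done
```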

If you think this write-up is primarily a reference for the next time I find myself down this particular rabbit hole, you're not wrong.