* rbd unmap fails with "Device or resource busy"
@ 2022-09-13  1:20 Chris Dunlop
  2022-09-13 11:43 ` Ilya Dryomov
  0 siblings, 1 reply; 16+ messages in thread
From: Chris Dunlop @ 2022-09-13  1:20 UTC (permalink / raw)
  To: ceph-devel

Hi,

What can make a "rbd unmap" fail, assuming the device is not mounted and not 
(obviously) open by any other processes?

linux-5.15.58
ceph-16.2.9

I have multiple XFS on rbd filesystems, and often create rbd snapshots, map 
and read-only mount the snapshot, perform some work on the fs, then unmount 
and unmap. The unmap regularly (about 1 in 10 times) fails like:

$ sudo rbd unmap /dev/rbd29
rbd: sysfs write failed
rbd: unmap failed: (16) Device or resource busy

I've double checked the device is no longer mounted, and, using "lsof" etc., 
nothing has the device open.

A "rbd unmap -f" can unmap the "busy" device but I'm concerned this may have 
undesirable consequences, e.g. ceph resource leakage, or even potential data 
corruption on non-read-only mounts.

I've found that waiting "a while", e.g. 5-30 minutes, will usually allow the 
"busy" device to be unmapped without the -f flag.

A simple "map/mount/read/unmount/unmap" test sees the unmap fail about 1 in 10 
times. When it fails it often takes 30 min or more for the unmap to finally 
succeed. E.g.:

----------------------------------------
#!/bin/bash

set -e

rbdname=pool/name

for ((i=0; ++i<=50; )); do
   dev=$(rbd map "${rbdname}")
   mount -oro,norecovery,nouuid "${dev}" /mnt/test

   dd if="/mnt/test/big-file" of=/dev/null bs=1G count=1
   umount /mnt/test
   # blockdev --flushbufs "${dev}"
   for ((j=0; ++j; )); do
     rbd unmap "${dev}" && break
     sleep 5m
   done
done
----------------------------------------

Running "blockdev --flushbufs" prior to the unmap doesn't change the unmap 
failures.

What can I look at to see what's causing these unmaps to fail?

Chris


* Re: rbd unmap fails with "Device or resource busy"
  2022-09-13  1:20 rbd unmap fails with "Device or resource busy" Chris Dunlop
@ 2022-09-13 11:43 ` Ilya Dryomov
  2022-09-14  3:49   ` Chris Dunlop
  0 siblings, 1 reply; 16+ messages in thread
From: Ilya Dryomov @ 2022-09-13 11:43 UTC (permalink / raw)
  To: Chris Dunlop; +Cc: ceph-devel

On Tue, Sep 13, 2022 at 3:44 AM Chris Dunlop <chris@onthe.net.au> wrote:
>
> Hi,
>
> What can make a "rbd unmap" fail, assuming the device is not mounted and not
> (obviously) open by any other processes?
>
> linux-5.15.58
> ceph-16.2.9
>
> I have multiple XFS on rbd filesystems, and often create rbd snapshots, map
> and read-only mount the snapshot, perform some work on the fs, then unmount
> and unmap. The unmap regularly (about 1 in 10 times) fails like:
>
> $ sudo rbd unmap /dev/rbd29
> rbd: sysfs write failed
> rbd: unmap failed: (16) Device or resource busy
>
> I've double checked the device is no longer mounted, and, using "lsof" etc.,
> nothing has the device open.

Hi Chris,

One thing that "lsof" is oblivious to is multipath, see
https://tracker.ceph.com/issues/12763.
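
A quick way to check for a device-mapper/multipath holder, which "lsof"
won't show, is something like this (only a sketch, using /dev/rbd29 from
your example):

  ls /sys/block/rbd29/holders/   # any dm-* entries mean a dm holder
  dmsetup deps -o devname        # which devices each dm target sits on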

>
> A "rbd unmap -f" can unmap the "busy" device but I'm concerned this may have
> undesirable consequences, e.g. ceph resource leakage, or even potential data
> corruption on non-read-only mounts.
>
> I've found that waiting "a while", e.g. 5-30 minutes, will usually allow the
> "busy" device to be unmapped without the -f flag.

"Device or resource busy" error from "rbd unmap" clearly indicates
that the block device is still open by something.  In this case -- you
are mounting a block-level snapshot of an XFS filesystem whose "HEAD"
is already mounted -- perhaps it could be some background XFS worker
thread?  I'm not sure if "nouuid" mount option solves all issues there.

>
> A simple "map/mount/read/unmount/unmap" test sees the unmap fail about 1 in 10
> times. When it fails it often takes 30 min or more for the unmap to finally
> succeed. E.g.:
>
> ----------------------------------------
> #!/bin/bash
>
> set -e
>
> rbdname=pool/name
>
> for ((i=0; ++i<=50; )); do
>    dev=$(rbd map "${rbdname}")
>    mount -oro,norecovery,nouuid "${dev}" /mnt/test
>
>    dd if="/mnt/test/big-file" of=/dev/null bs=1G count=1
>    umount /mnt/test
>    # blockdev --flushbufs "${dev}"
>    for ((j=0; ++j; )); do
>      rbd unmap "${dev}" && break
>      sleep 5m
>    done
> done
> ----------------------------------------
>
> Running "blockdev --flushbufs" prior to the unmap doesn't change the unmap
> failures.

Yeah, I wouldn't expect that to affect anything there.

Have you encountered this error in other scenarios, i.e. without
mounting snapshots this way or with ext4 instead of XFS?

Thanks,

                Ilya


* Re: rbd unmap fails with "Device or resource busy"
  2022-09-13 11:43 ` Ilya Dryomov
@ 2022-09-14  3:49   ` Chris Dunlop
  2022-09-14  8:41     ` Ilya Dryomov
  0 siblings, 1 reply; 16+ messages in thread
From: Chris Dunlop @ 2022-09-14  3:49 UTC (permalink / raw)
  To: Ilya Dryomov; +Cc: ceph-devel

Hi Ilya,

On Tue, Sep 13, 2022 at 01:43:16PM +0200, Ilya Dryomov wrote:
> On Tue, Sep 13, 2022 at 3:44 AM Chris Dunlop <chris@onthe.net.au> wrote:
>> What can make a "rbd unmap" fail, assuming the device is not mounted 
>> and not (obviously) open by any other processes?
>>
>> linux-5.15.58
>> ceph-16.2.9
>>
>> I have multiple XFS on rbd filesystems, and often create rbd snapshots, 
>> map and read-only mount the snapshot, perform some work on the fs, then 
>> unmount and unmap. The unmap regularly (about 1 in 10 times) fails 
>> like:
>>
>> $ sudo rbd unmap /dev/rbd29
>> rbd: sysfs write failed
>> rbd: unmap failed: (16) Device or resource busy
>>
>> I've double checked the device is no longer mounted, and, using "lsof" 
>> etc., nothing has the device open.
>
> One thing that "lsof" is oblivious to is multipath, see
> https://tracker.ceph.com/issues/12763.

The server is not using multipath - e.g. there's no multipathd, and:

$ find /dev/mapper/ -name '*mpath*'

...finds nothing.

>> I've found that waiting "a while", e.g. 5-30 minutes, will usually 
>> allow the "busy" device to be unmapped without the -f flag.
>
> "Device or resource busy" error from "rbd unmap" clearly indicates
> that the block device is still open by something.  In this case -- you
> are mounting a block-level snapshot of an XFS filesystem whose "HEAD"
> is already mounted -- perhaps it could be some background XFS worker
> thread?  I'm not sure if "nouuid" mount option solves all issues there.

Good suggestion, I should have considered that first. I've now tried it 
without the mount at all, i.e. with no XFS or other filesystem:

------------------------------------------------------------------------------
#!/bin/bash
set -e
rbdname=pool/name
for ((i=0; ++i<=50; )); do
   dev=$(rbd map "${rbdname}")
   ts "${i}: ${dev}"
   dd if="${dev}" of=/dev/null bs=1G count=1
   for ((j=0; ++j; )); do
     rbd unmap "${dev}" && break
     sleep 1m
   done
   (( j > 1 )) && echo "$j minutes to unmap"
done
------------------------------------------------------------------------------

This failed at about the same rate, i.e. around 1 in 10. This time it only 
took 2 minutes each time to successfully unmap after the initial unmap 
failed - I'm not sure if this is due to the test change (no mount), or 
related to how busy the machine is otherwise.

The upshot is, it definitely looks like there's something related to the 
underlying rbd that's preventing the unmap.

> Have you encountered this error in other scenarios, i.e. without
> mounting snapshots this way or with ext4 instead of XFS?

I've seen the same issue after unmounting r/w filesystems, but I don't do 
that nearly as often so it hasn't been a pain point. However, per the test 
above, the issue is unrelated to the mount.

Cheers,

Chris


* Re: rbd unmap fails with "Device or resource busy"
  2022-09-14  3:49   ` Chris Dunlop
@ 2022-09-14  8:41     ` Ilya Dryomov
  2022-09-15  8:29       ` Chris Dunlop
  0 siblings, 1 reply; 16+ messages in thread
From: Ilya Dryomov @ 2022-09-14  8:41 UTC (permalink / raw)
  To: Chris Dunlop; +Cc: ceph-devel

On Wed, Sep 14, 2022 at 5:49 AM Chris Dunlop <chris@onthe.net.au> wrote:
>
> Hi Ilya,
>
> On Tue, Sep 13, 2022 at 01:43:16PM +0200, Ilya Dryomov wrote:
> > On Tue, Sep 13, 2022 at 3:44 AM Chris Dunlop <chris@onthe.net.au> wrote:
> >> What can make a "rbd unmap" fail, assuming the device is not mounted
> >> and not (obviously) open by any other processes?
> >>
> >> linux-5.15.58
> >> ceph-16.2.9
> >>
> >> I have multiple XFS on rbd filesystems, and often create rbd snapshots,
> >> map and read-only mount the snapshot, perform some work on the fs, then
> >> unmount and unmap. The unmap regularly (about 1 in 10 times) fails
> >> like:
> >>
> >> $ sudo rbd unmap /dev/rbd29
> >> rbd: sysfs write failed
> >> rbd: unmap failed: (16) Device or resource busy
> >>
> >> I've double checked the device is no longer mounted, and, using "lsof"
> >> etc., nothing has the device open.
> >
> > One thing that "lsof" is oblivious to is multipath, see
> > https://tracker.ceph.com/issues/12763.
>
> The server is not using multipath - e.g. there's no multipathd, and:
>
> $ find /dev/mapper/ -name '*mpath*'
>
> ...finds nothing.
>
> >> I've found that waiting "a while", e.g. 5-30 minutes, will usually
> >> allow the "busy" device to be unmapped without the -f flag.
> >
> > "Device or resource busy" error from "rbd unmap" clearly indicates
> > that the block device is still open by something.  In this case -- you
> > are mounting a block-level snapshot of an XFS filesystem whose "HEAD"
> > is already mounted -- perhaps it could be some background XFS worker
> > thread?  I'm not sure if "nouuid" mount option solves all issues there.
>
> Good suggestion, I should have considered that first. I've now tried it
> without the mount at all, i.e. with no XFS or other filesystem:
>
> ------------------------------------------------------------------------------
> #!/bin/bash
> set -e
> rbdname=pool/name
> for ((i=0; ++i<=50; )); do
>    dev=$(rbd map "${rbdname}")
>    ts "${i}: ${dev}"
>    dd if="${dev}" of=/dev/null bs=1G count=1
>    for ((j=0; ++j; )); do
>      rbd unmap "${dev}" && break
>      sleep 1m
>    done
>    (( j > 1 )) && echo "$j minutes to unmap"
> done
> ------------------------------------------------------------------------------
>
> This failed at about the same rate, i.e. around 1 in 10. This time it only
> took 2 minutes each time to successfully unmap after the initial unmap
> failed - I'm not sure if this is due to the test change (no mount), or
> related to how busy the machine is otherwise.

I would suggest repeating this test with "sleep 1s" to get a better
idea of how long it really takes.

>
> The upshot is, it definitely looks like there's something related to the
> underlying rbd that's preventing the unmap.

I don't think so.  To confirm, now that there is no filesystem in the
mix, replace "rbd unmap" with "rbd unmap -o force".  If that fixes the
issue, RBD is very unlikely to have anything to do with it because all
"force" does is it overrides the "is this device still open" check
at the very top of "rbd unmap" handler in the kernel.
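
In other words (sketch only, with the device path from your earlier
output):

  rbd unmap /dev/rbd29             # fails with EBUSY while the device is held open
  rbd unmap -o force /dev/rbd29    # skips that open-count check in the kernel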

systemd-udevd may open block devices behind your back.  "rbd unmap"
command actually does a retry internally to work around that:

  /*
   * On final device close(), kernel sends a block change event, in
   * response to which udev apparently runs blkid on the device.  This
   * makes unmap fail with EBUSY, if issued right after final close().
   * Try to circumvent this with a retry before turning to udev.
   */
  for (int tries = 0; ; tries++) {
    int sysfs_r = sysfs_write_rbd_remove(buf);
    if (sysfs_r == -EBUSY && tries < 2) {
      if (!tries) {
        usleep(250 * 1000);
      } else if (!(flags & KRBD_CTX_F_NOUDEV)) {
        /*
         * libudev does not provide the "wait until the queue is empty"
         * API or the sufficient amount of primitives to build it from.
         */
        std::string err = run_cmd("udevadm", "settle", "--timeout", "10",
                                  (char *)NULL);
        if (!err.empty())
          std::cerr << "rbd: " << err << std::endl;
      }

Perhaps it is hitting "udevadm settle" timeout on your system?
"strace -f" might be useful here.

Thanks,

                Ilya


* Re: rbd unmap fails with "Device or resource busy"
  2022-09-14  8:41     ` Ilya Dryomov
@ 2022-09-15  8:29       ` Chris Dunlop
  2022-09-19  7:43         ` Chris Dunlop
  0 siblings, 1 reply; 16+ messages in thread
From: Chris Dunlop @ 2022-09-15  8:29 UTC (permalink / raw)
  To: Ilya Dryomov; +Cc: ceph-devel

On Wed, Sep 14, 2022 at 10:41:05AM +0200, Ilya Dryomov wrote:
> On Wed, Sep 14, 2022 at 5:49 AM Chris Dunlop <chris@onthe.net.au> wrote:
>> On Tue, Sep 13, 2022 at 01:43:16PM +0200, Ilya Dryomov wrote:
>>> On Tue, Sep 13, 2022 at 3:44 AM Chris Dunlop <chris@onthe.net.au> wrote:
>>>> What can make a "rbd unmap" fail, assuming the device is not mounted
>>>> and not (obviously) open by any other processes?
>>>>
>>>> linux-5.15.58
>>>> ceph-16.2.9
>>>>
>>>> I have multiple XFS on rbd filesystems, and often create rbd snapshots,
>>>> map and read-only mount the snapshot, perform some work on the fs, then
>>>> unmount and unmap. The unmap regularly (about 1 in 10 times) fails
>>>> like:
>>>>
>>>> $ sudo rbd unmap /dev/rbd29
>>>> rbd: sysfs write failed
>>>> rbd: unmap failed: (16) Device or resource busy

tl;dr problem solved: there WAS a process holding the rbd device open.

The culprit was a 'pvs' command being run periodically by 'ceph-volume'. 
When the 'rbd unmap' was attempted while the 'pvs' command was running,
the unmap would fail.

It turns out the 'dd' command in my test script was only instrumental
insofar as it made the test run long enough to intersect with the
periodic 'pvs'. I had been thinking the 'dd' was causing the rbd data
to be buffered in the kernel, and that perhaps the buffers would
sometimes not be cleared immediately, causing the rbd unmap to fail.

The conflicting 'pvs' command was a bit tricky to catch because it was 
only running for a very short time, so the 'pvs' would be gone by the 
time I'd run 'lsof'. The key to finding the problem was to look through 
the processes as quickly as possible upon an unmap failure, e.g.:

----------------------------------------------------------------------
if ! rbd device unmap "${dev}"; then
   while read -r p; do
     p=${p#/proc/}; p=${p%%/*}
     (( p == prevp )) && continue
     prevp=$p

     printf '%(%F %T)T %d\t%s\n' -1 "${p}" "$(tr '\0' ' ' < /proc/${p}/cmdline)"

     pp=$(awk '$1=="PPid:"{print $2}' /proc/${p}/status)
     printf '+ %d\t%s\n' "${pp}" "$(tr '\0' ' ' < /proc/${pp}/cmdline)"

     ppp=$(awk '$1=="PPid:"{print $2}' /proc/${pp}/status)
     printf '+ %d\t%s\n' "${ppp}" "$(tr '\0' ' ' < /proc/${ppp}/cmdline)"
   done < <(
     find /proc/[0-9]*/fd -lname "${dev}" 2> /dev/null
   )
fi
----------------------------------------------------------------------
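
A simpler variant of the same check, assuming the psmisc 'fuser' tool is
available, is to print any process that still has the device node open
as soon as the unmap fails, e.g. (sketch only):

----------------------------------------------------------------------
if ! rbd device unmap "${dev}"; then
   fuser -v "${dev}"
fi
----------------------------------------------------------------------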

Note that 'pvs' normally does NOT scan rbd devices: you have to 
explicitly add "rbd" to the lvm.conf element for "List of additional 
acceptable block device types", e.g.:

/etc/lvm/lvm.conf
--
devices {
         types = [ "rbd", 1024 ]
}
--

I'd previously enabled the rbd scanning when testing some lvm-on-rbd 
stuff.

After removing rbd from the lvm.conf I was able to run through my unmap 
test 150 times without a single unmap failure.
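
For anyone wanting to reproduce the two behaviours without editing
lvm.conf, a --config override on the command line should do it (sketch
only):

----------------------------------------------------------------------
pvs --config 'devices { types = [ "rbd", 1024 ] }'   # scans /dev/rbd*
pvs                                                  # default: rbd devices ignored
----------------------------------------------------------------------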

>> ---------------------------------------------------------------------
>> #!/bin/bash
>> set -e
>> rbdname=pool/name
>> for ((i=0; ++i<=50; )); do
>>    dev=$(rbd map "${rbdname}")
>>    ts "${i}: ${dev}"
>>    dd if="${dev}" of=/dev/null bs=1G count=1
>>    for ((j=0; ++j; )); do
>>      rbd unmap "${dev}" && break
>>      sleep 1m
>>    done
>>    (( j > 1 )) && echo "$j minutes to unmap"
>> done
>> ---------------------------------------------------------------------
>>
>> This failed at about the same rate, i.e. around 1 in 10. This time it 
>> only took 2 minutes each time to successfully unmap after the initial 
>> unmap failed - I'm not sure if this is due to the test change (no 
>> mount), or related to how busy the machine is otherwise.
>
> I would suggest repeating this test with "sleep 1s" to get a better 
> idea of how long it really takes.

With "sleep 1s" it was generally successful the 2nd time around. I'm a 
bit puzzled at this because I'm certain, before I started scripting this 
test, I was doing many unmap attempts before finally successfully 
unmapping. I was convinced it was a matter of waiting for "something" to 
time out before the device was released, and in the meantime 'lsof' 
wasn't showing anything with the device open. It's implausible I was 
running into the 'pvs' command each of those times, so what was actually 
going on there is a bit of a mystery.

> I don't think so.  To confirm, now that there is no filesystem in the
> mix, replace "rbd unmap" with "rbd unmap -o force".  If that fixes the
> issue, RBD is very unlikely to have anything to do with it because all
> "force" does is it overrides the "is this device still open" check
> at the very top of "rbd unmap" handler in the kernel.

I'd already confirmed "-o force" (or --force) would remove the device 
but I was concerned that could possibly cause data corruption if/when 
using a writable rbd so I wanted to get to the bottom of the problem.

> systemd-udevd may open block devices behind your back.  "rbd unmap"
> command actually does a retry internally to work around that:

Huh, interesting.

> Perhaps it is hitting "udevadm settle" timeout on your system?
> "strace -f" might be useful here.

A good suggestion although using 'strace' wasn't necessary in the end.


Thanks for your help!

Chris


* Re: rbd unmap fails with "Device or resource busy"
  2022-09-15  8:29       ` Chris Dunlop
@ 2022-09-19  7:43         ` Chris Dunlop
  2022-09-19 10:14           ` Ilya Dryomov
  0 siblings, 1 reply; 16+ messages in thread
From: Chris Dunlop @ 2022-09-19  7:43 UTC (permalink / raw)
  To: Ilya Dryomov; +Cc: ceph-devel

On Thu, Sep 15, 2022 at 06:29:20PM +1000, Chris Dunlop wrote:
> On Tue, Sep 13, 2022 at 3:44 AM Chris Dunlop <chris@onthe.net.au> wrote:
>> What can make a "rbd unmap" fail, assuming the device is not mounted 
>> and not (obviously) open by any other processes?
>>
>> linux-5.15.58
>> ceph-16.2.9
>>
>> I have multiple XFS on rbd filesystems, and often create rbd 
>> snapshots, map and read-only mount the snapshot, perform some work on 
>> the fs, then unmount and unmap. The unmap regularly (about 1 in 10 
>> times) fails like:
>>
>> $ sudo rbd unmap /dev/rbd29
>> rbd: sysfs write failed
>> rbd: unmap failed: (16) Device or resource busy
>
> tl;dr problem solved: there WAS a process holding the rbd device open.

Sigh. It turns out the problem is NOT solved.

I've stopped 'pvs' from scanning the rbd devices. This was sufficient to 
allow my minimal test script to work without unmap failures, but my full 
production process is still suffering from the unmap failures.

I now have 51 rbd devices which I haven't been able to unmap for the 
last three days (in contrast to my earlier statement where I said I'd 
always been able to unmap eventually, generally after 30 minutes or so).  
That's out of maybe 80-90 mapped rbds over that time.

I've no idea why the unmap failures are so common this time, and why, 
this time, I haven't been able to unmap them in 3 days.

I had been trying an unmap of one specific rbd (randomly selected) every 
second for 3 hours whilst simultaneously, in a tight loop, looking for 
any other processes that have the device open. The unmaps continued to 
fail and I haven't caught any other process with the device open.

I also tried a back-off strategy by linearly increasing a sleep between 
unmap attempts.  By the time the sleep was up to 4 hours I gave up, with 
unmaps of that device still failing. Unmap attempts at random times 
since then on that particular device and all of the other 51 
un-unmappable devices continue to fail.

I'm sure I can unmap the devices using '--force' but at this point I'd 
rather try to work out WHY the unmap is failing: it seems to be pointing 
to /something/ going wrong, somewhere. Given no user processes can be 
seen to have the device open, it seems that "something" might be in the 
kernel somewhere.

I'm trying to put together a test using a cut down version of the 
production process to see if I can make the unmap failures happen a 
little more repeatably.

I'm open to suggestions as to what I can look at.

E.g. maybe there's some way of using ebpf or similar to look at the 
'rbd_dev->open_count' in the live kernel?

And/or maybe there's some way, again using ebpf or similar, to record 
sufficient info (e.g. a stack trace?) from rbd_open() and rbd_release() 
to try to identify something that's opening the device and not releasing 
it?

If anyone knows how that could be done that would be great, otherwise 
it's going to take me a bit of time to try to work out how that might be 
done.

Chris


* Re: rbd unmap fails with "Device or resource busy"
  2022-09-19  7:43         ` Chris Dunlop
@ 2022-09-19 10:14           ` Ilya Dryomov
  2022-09-21  1:36             ` Chris Dunlop
  0 siblings, 1 reply; 16+ messages in thread
From: Ilya Dryomov @ 2022-09-19 10:14 UTC (permalink / raw)
  To: Chris Dunlop; +Cc: ceph-devel

On Mon, Sep 19, 2022 at 9:43 AM Chris Dunlop <chris@onthe.net.au> wrote:
>
> On Thu, Sep 15, 2022 at 06:29:20PM +1000, Chris Dunlop wrote:
> > On Tue, Sep 13, 2022 at 3:44 AM Chris Dunlop <chris@onthe.net.au> wrote:
> >> What can make a "rbd unmap" fail, assuming the device is not mounted
> >> and not (obviously) open by any other processes?
> >>
> >> linux-5.15.58
> >> ceph-16.2.9
> >>
> >> I have multiple XFS on rbd filesystems, and often create rbd
> >> snapshots, map and read-only mount the snapshot, perform some work on
> >> the fs, then unmount and unmap. The unmap regularly (about 1 in 10
> >> times) fails like:
> >>
> >> $ sudo rbd unmap /dev/rbd29
> >> rbd: sysfs write failed
> >> rbd: unmap failed: (16) Device or resource busy
> >
> > tl;dr problem solved: there WAS a process holding the rbd device open.
>
> Sigh. It turns out the problem is NOT solved.
>
> I've stopped 'pvs' from scanning the rbd devices. This was sufficient to
> allow my minimal test script to work without unmap failures, but my full
> production process is still suffering from the unmap failures.
>
> I now have 51 rbd devices which I haven't been able to unmap for the
> last three days (in contrast to my earlier statement where I said I'd
> always been able to unmap eventually, generally after 30 minutes or so).
> That's out of maybe 80-90 mapped rbds over that time.
>
> I've no idea why the unmap failures are so common this time, and why,
> this time, I haven't been able to unmap them in 3 days.
>
> I had been trying an unmap of one specific rbd (randomly selected) every
> second for 3 hours whilst simultaneously, in a tight loop, looking for
> any other processes that have the device open. The unmaps continued to
> fail and I haven't caught any other process with the device open.
>
> I also tried a back-off strategy by linearly increasing a sleep between
> unmap attempts.  By the time the sleep was up to 4 hours I gave up, with
> unmaps of that device still failing. Unmap attempts at random times
> since then on that particular device and all of the other 51
> un-unmappable devices continue to fail.
>
> I'm sure I can unmap the devices using '--force' but at this point I'd
> rather try to work out WHY the unmap is failing: it seems to be pointing
> to /something/ going wrong, somewhere. Given no user processes can be
> seen to have the device open, it seems that "something" might be in the
> kernel somewhere.
>
> I'm trying to put together a test using a cut down version of the
> production process to see if I can make the unmap failures happen a
> little more repeatably.
>
> I'm open to suggestions as to what I can look at.
>
> E.g. maybe there's some way of using ebpf or similar to look at the
> 'rbd_dev->open_count' in the live kernel?
>
> And/or maybe there's some way, again using ebpf or similar, to record
> sufficient info (e.g. a stack trace?) from rbd_open() and rbd_release()
> to try to identify something that's opening the device and not releasing
> it?

Hi Chris,

Attaching kprobes to rbd_open() and rbd_release() is probably the
fastest option.  I don't think you even need a stack trace, PID and
comm (process name) should do.  I would start with something like:

# bpftrace -e 'kprobe:rbd_open { printf("open pid %d comm %s\n", pid,
comm) } kprobe:rbd_release { printf("release pid %d comm %s\n", pid,
comm) }'

Fetching the actual rbd_dev->open_count value is more involved but
also doable.

Thanks,

                Ilya


* Re: rbd unmap fails with "Device or resource busy"
  2022-09-19 10:14           ` Ilya Dryomov
@ 2022-09-21  1:36             ` Chris Dunlop
  2022-09-21 10:40               ` Ilya Dryomov
  0 siblings, 1 reply; 16+ messages in thread
From: Chris Dunlop @ 2022-09-21  1:36 UTC (permalink / raw)
  To: Ilya Dryomov; +Cc: ceph-devel

Hi Ilya,

On Mon, Sep 19, 2022 at 12:14:06PM +0200, Ilya Dryomov wrote:
> On Mon, Sep 19, 2022 at 9:43 AM Chris Dunlop <chris@onthe.net.au> wrote:
>>> On Tue, Sep 13, 2022 at 3:44 AM Chris Dunlop <chris@onthe.net.au> wrote:
>>>> What can make a "rbd unmap" fail, assuming the device is not 
>>>> mounted and not (obviously) open by any other processes?
>>
>> E.g. maybe there's some way of using ebpf or similar to look at the 
>> 'rbd_dev->open_count' in the live kernel?
>>
>> And/or maybe there's some way, again using ebpf or similar, to record 
>> sufficient info (e.g. a stack trace?) from rbd_open() and 
>> rbd_release() to try to identify something that's opening the device 
>> and not releasing it?
>
> Attaching kprobes to rbd_open() and rbd_release() is probably the 
> fastest option.  I don't think you even need a stack trace, PID and 
> comm (process name) should do.  I would start with something like:
>
> # bpftrace -e 'kprobe:rbd_open { printf("open pid %d comm %s\n", pid, 
> comm) } kprobe:rbd_release { printf("release pid %d comm %s\n", pid, 
> comm) }'
>
> Fetching the actual rbd_dev->open_count value is more involved but 
> also doable.

Excellent! Thanks!

tl;dr there's something other than the open_count causing the unmap 
failures - or something's elevating and decrementing open_count without 
going through rbd_open and rbd_release. Or perhaps there's some situation 
whereby bpftrace "misses" recording calls to rbd_open and rbd_release.

FYI, the production process is:

- create snapshot of rbd
- map
- mount with ro,norecovery,nouuid (the original live fs is still mounted)
- export via NFS
- mount on Windows NFS client
- process on Windows
- remove Windows NFS mount
- unexport from NFS
- unmount
- unmap

(I haven't mentioned the NFS export previously because I thought the 
issue was replicable without it - but that might simply have been due to 
the 'pvs' issue which has been resolved.)

I now have a script that mimics the above production sequence in a loop 
and left it running all night. Out of 288 iterations it had 13 instances 
where the unmap was failing for some time (i.e. in all cases it 
eventually succeeded, unlike the 51 rbd devices I can't seem to unmap at 
all without using --force). In the failing cases the unmap was retried 
at 1 second intervals. The shortest time taken to eventually unmap was 
521 seconds, the longest was 793 seconds.

Note, in the below I'm using "successful" for the tests where the first 
unmap succeeded, and "failed" for the tests where the first unmap 
failed, although in all cases the unmap eventually succeeded.

I ended up with a bpftrace script (see below) that logs the timestamp, 
open or release (O/R), pid, device name, open_count (at entry to the 
function), and process name.

A successful iteration of that process mostly looks like this:

Timestamp     O/R Pid    Device Count Process
18:21:18.235870 O 3269426 rbd29 0 mapper
18:21:20.088873 R 3269426 rbd29 1 mapper
18:21:20.089346 O 3269447 rbd29 0 systemd-udevd
18:21:20.105281 O 3269457 rbd29 1 blkid
18:21:31.858621 R 3269457 rbd29 2 blkid
18:21:31.861762 R 3269447 rbd29 1 systemd-udevd
18:21:31.882235 O 3269475 rbd29 0 mount
18:21:38.241808 R 3269475 rbd29 1 mount
18:21:38.242174 O 3269475 rbd29 0 mount
18:22:49.646608 O 2364320 rbd29 1 rpc.mountd
18:22:58.715634 R 2364320 rbd29 2 rpc.mountd
18:23:55.564512 R 3270060 rbd29 1 umount

Or occasionally it looks like this, with "rpc.mountd" disappearing:

18:35:49.539224 O 3277664 rbd29 0 mapper
18:35:50.515777 R 3277664 rbd29 1 mapper
18:35:50.516224 O 3277685 rbd29 0 systemd-udevd
18:35:50.531978 O 3277694 rbd29 1 blkid
18:35:57.361799 R 3277694 rbd29 2 blkid
18:35:57.365263 R 3277685 rbd29 1 systemd-udevd
18:35:57.384316 O 3277713 rbd29 0 mount
18:36:01.234337 R 3277713 rbd29 1 mount
18:36:01.234849 O 3277713 rbd29 0 mount
18:37:21.304270 R 3289527 rbd29 1 umount

Of the 288 iterations, only 20 didn't include the rpc.mountd lines.

An unsuccessful iteration looks like this:

18:37:31.885408 O 3294108 rbd29 0 mapper
18:37:33.181607 R 3294108 rbd29 1 mapper
18:37:33.182086 O 3294175 rbd29 0 systemd-udevd
18:37:33.197982 O 3294691 rbd29 1 blkid
18:37:42.712870 R 3294691 rbd29 2 blkid
18:37:42.716296 R 3294175 rbd29 1 systemd-udevd
18:37:42.738469 O 3298073 rbd29 0 mount
18:37:49.339012 R 3298073 rbd29 1 mount
18:37:49.339352 O 3298073 rbd29 0 mount
18:38:51.390166 O 2364320 rbd29 1 rpc.mountd
18:39:00.989050 R 2364320 rbd29 2 rpc.mountd
18:53:56.054685 R 3313923 rbd29 1 init

According to my script log, the first unmap attempt was at 18:39:42, 
i.e. 42 seconds after rpc.mountd released the device. At that point 
the open_count was (or should have been?) 1 again, allowing the unmap to 
succeed - but it didn't. The unmap was retried every second until it 
eventually succeeded at 18:53:56, the same time as the mysterious "init" 
process ran - but also note there is NO "umount" process in there so I 
don't know if the name of the process recorded by bpftrace is simply 
incorrect (but how would that happen??) or what else could be going on.

All 13 of the failed iterations recorded that weird "init" instead of 
"umount".

12 of the 13 failed iterations included rpc.mountd in the trace, but one 
didn't (i.e. it went direct from mount to init/umount, like the 2nd 
successful example above), i.e. around the same proportion as the 
successful iterations.

So it seems there's something other than the open_count causing the unmap 
failures - or something's elevating and decrementing open_count without 
going through rbd_open and rbd_release. Or perhaps there's some situation 
whereby bpftrace "misses" recording calls to rbd_open and rbd_release.


The bpftrace script looks like this:
--------------------------------------------------------------------
//
// bunches of defines and structure definitions extracted from 
// drivers/block/rbd.c elided here...
//
kprobe:rbd_open {
   $bdev = (struct block_device *)arg0;
   $rbd_dev = (struct rbd_device *)($bdev->bd_disk->private_data);

   printf("%s O %d %s %lu %s\n",
     strftime("%T.%f", nsecs), pid, $rbd_dev->name,
     $rbd_dev->open_count, comm
   );
}
kprobe:rbd_release {
   $disk = (struct gendisk *)arg0;
   $rbd_dev = (struct rbd_device *)($disk->private_data);

   printf("%s R %d %s %lu %s\n",
     strftime("%T.%f", nsecs), pid, $rbd_dev->name,
     $rbd_dev->open_count, comm
   );
}
--------------------------------------------------------------------


Cheers,

Chris


* Re: rbd unmap fails with "Device or resource busy"
  2022-09-21  1:36             ` Chris Dunlop
@ 2022-09-21 10:40               ` Ilya Dryomov
  2022-09-23  3:58                 ` Chris Dunlop
  0 siblings, 1 reply; 16+ messages in thread
From: Ilya Dryomov @ 2022-09-21 10:40 UTC (permalink / raw)
  To: Chris Dunlop; +Cc: ceph-devel

On Wed, Sep 21, 2022 at 3:36 AM Chris Dunlop <chris@onthe.net.au> wrote:
>
> Hi Ilya,
>
> On Mon, Sep 19, 2022 at 12:14:06PM +0200, Ilya Dryomov wrote:
> > On Mon, Sep 19, 2022 at 9:43 AM Chris Dunlop <chris@onthe.net.au> wrote:
> >>> On Tue, Sep 13, 2022 at 3:44 AM Chris Dunlop <chris@onthe.net.au> wrote:
> >>>> What can make a "rbd unmap" fail, assuming the device is not
> >>>> mounted and not (obviously) open by any other processes?
> >>
> >> E.g. maybe there's some way of using ebpf or similar to look at the
> >> 'rbd_dev->open_count' in the live kernel?
> >>
> >> And/or maybe there's some way, again using ebpf or similar, to record
> >> sufficient info (e.g. a stack trace?) from rbd_open() and
> >> rbd_release() to try to identify something that's opening the device
> >> and not releasing it?
> >
> > Attaching kprobes to rbd_open() and rbd_release() is probably the
> > fastest option.  I don't think you even need a stack trace, PID and
> > comm (process name) should do.  I would start with something like:
> >
> > # bpftrace -e 'kprobe:rbd_open { printf("open pid %d comm %s\n", pid,
> > comm) } kprobe:rbd_release { printf("release pid %d comm %s\n", pid,
> > comm) }'
> >
> > Fetching the actual rbd_dev->open_count value is more involved but
> > also doable.
>
> Excellent! Thanks!
>
> tl;dr there's something other than the open_count causing the unmap
> failures - or something's elevating and decrementing open_count without
> going through rbd_open and rbd_release. Or perhaps there's some situation
> whereby bpftrace "misses" recording calls to rbd_open and rbd_release.
>
> FYI, the production process is:
>
> - create snapshot of rbd
> - map
> - mount with ro,norecovery,nouuid (the original live fs is still mounted)
> - export via NFS
> - mount on Windows NFS client
> - process on Windows
> - remove Windows NFS mount
> - unexport from NFS
> - unmount
> - unmap
>
> (I haven't mentioned the NFS export previously because I thought the
> issue was replicable without it - but that might simply have been due to
> the 'pvs' issue which has been resolved.)
>
> I now have a script that mimics the above production sequence in a loop
> and left it running all night. Out of 288 iterations it had 13 instances
> where the unmap was failing for some time (i.e. in all cases it
> eventually succeeded, unlike the 51 rbd devices I can't seem to unmap at
> all without using --force). In the failing cases the unmap was retried
> at 1 second intervals. The shortest time taken to eventually unmap was
> 521 seconds, the longest was 793 seconds.
>
> Note, in the below I'm using "successful" for the tests where the first
> unmap succeeded, and "failed" for the tests where the first unmap
> failed, although in all cases the unmap eventually succeeded.
>
> I ended up with a bpftrace script (see below) that logs the timestamp,
> open or release (O/R), pid, device name, open_count (at entry to the
> function), and process name.
>
> A successful iteration of that process mostly looks like this:
>
> Timestamp     O/R Pid    Device Count Process
> 18:21:18.235870 O 3269426 rbd29 0 mapper
> 18:21:20.088873 R 3269426 rbd29 1 mapper
> 18:21:20.089346 O 3269447 rbd29 0 systemd-udevd
> 18:21:20.105281 O 3269457 rbd29 1 blkid
> 18:21:31.858621 R 3269457 rbd29 2 blkid
> 18:21:31.861762 R 3269447 rbd29 1 systemd-udevd
> 18:21:31.882235 O 3269475 rbd29 0 mount
> 18:21:38.241808 R 3269475 rbd29 1 mount
> 18:21:38.242174 O 3269475 rbd29 0 mount
> 18:22:49.646608 O 2364320 rbd29 1 rpc.mountd
> 18:22:58.715634 R 2364320 rbd29 2 rpc.mountd
> 18:23:55.564512 R 3270060 rbd29 1 umount
>
> Or occasionally it looks like this, with "rpc.mountd" disappearing:
>
> 18:35:49.539224 O 3277664 rbd29 0 mapper
> 18:35:50.515777 R 3277664 rbd29 1 mapper
> 18:35:50.516224 O 3277685 rbd29 0 systemd-udevd
> 18:35:50.531978 O 3277694 rbd29 1 blkid
> 18:35:57.361799 R 3277694 rbd29 2 blkid
> 18:35:57.365263 R 3277685 rbd29 1 systemd-udevd
> 18:35:57.384316 O 3277713 rbd29 0 mount
> 18:36:01.234337 R 3277713 rbd29 1 mount
> 18:36:01.234849 O 3277713 rbd29 0 mount
> 18:37:21.304270 R 3289527 rbd29 1 umount
>
> Of the 288 iterations, only 20 didn't include the rpc.mountd lines.
>
> An unsuccessful iteration looks like this:
>
> 18:37:31.885408 O 3294108 rbd29 0 mapper
> 18:37:33.181607 R 3294108 rbd29 1 mapper
> 18:37:33.182086 O 3294175 rbd29 0 systemd-udevd
> 18:37:33.197982 O 3294691 rbd29 1 blkid
> 18:37:42.712870 R 3294691 rbd29 2 blkid
> 18:37:42.716296 R 3294175 rbd29 1 systemd-udevd
> 18:37:42.738469 O 3298073 rbd29 0 mount
> 18:37:49.339012 R 3298073 rbd29 1 mount
> 18:37:49.339352 O 3298073 rbd29 0 mount
> 18:38:51.390166 O 2364320 rbd29 1 rpc.mountd
> 18:39:00.989050 R 2364320 rbd29 2 rpc.mountd
> 18:53:56.054685 R 3313923 rbd29 1 init
>
> According to my script log, the first unmap attempt was at 18:39:42,
> i.e. 42 seconds after rpc.mountd released the device. At that point
> the open_count was (or should have been?) 1 again, allowing the unmap to
> succeed - but it didn't. The unmap was retried every second until it

Hi Chris,

For unmap to go through, open_count must be 0.  rpc.mountd at
18:39:00.989050 just decremented it from 2 to 1, it didn't release
the device.

> eventually succeeded at 18:53:56, the same time as the mysterious "init"
> process ran - but also note there is NO "umount" process in there so I
> don't know if the name of the process recorded by bpftrace is simply
> incorrect (but how would that happen??) or what else could be going on.

I would suggest adding the PID and the kernel stack trace at this
point.
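
Something like this, in the same style as the earlier one-liner
("kstack" is a bpftrace built-in; a reasonably recent bpftrace is
assumed), should capture it:

# bpftrace -e 'kprobe:rbd_release { printf("release pid %d comm %s\n%s\n",
pid, comm, kstack) }'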

>
> All 13 of the failed iterations recorded that weird "init" instead of
> "umount".

Yeah, that seems to be the culprit.

>
> 12 of the 13 failed iterations included rpc.mountd in the trace, but one
> didn't (i.e. it went direct from mount to init/umount, like the 2nd
> successful example above), i.e. around the same proportion as the
> successful iterations.
>
> So it seems there's something other than the open_count causing the unmap
> failures - or something's elevating and decrementing open_count without
> going through rbd_open and rbd_release. Or perhaps there's some situation
> whereby bpftrace "misses" recording calls to rbd_open and rbd_release.
>
>
> The bpftrace script looks like this:
> --------------------------------------------------------------------
> //
> // bunches of defines and structure definitions extracted from
> // drivers/block/rbd.c elided here...
> //

It would be good to attach the entire script, just in case someone runs
into a similar issue in the future and tries to debug the same way.

Thanks,

                Ilya


* Re: rbd unmap fails with "Device or resource busy"
  2022-09-21 10:40               ` Ilya Dryomov
@ 2022-09-23  3:58                 ` Chris Dunlop
  2022-09-23  9:47                   ` Ilya Dryomov
       [not found]                   ` <CANqTTH4dPibtJ_4ayDch5rKVG=ykGAJhWnCyWmG9vvm1zHEg1w@mail.gmail.com>
  0 siblings, 2 replies; 16+ messages in thread
From: Chris Dunlop @ 2022-09-23  3:58 UTC (permalink / raw)
  To: Ilya Dryomov; +Cc: ceph-devel

[-- Attachment #1: Type: text/plain, Size: 6321 bytes --]

Hi Ilya,

On Wed, Sep 21, 2022 at 12:40:54PM +0200, Ilya Dryomov wrote:
> On Wed, Sep 21, 2022 at 3:36 AM Chris Dunlop <chris@onthe.net.au> wrote:
>> On Tue, Sep 13, 2022 at 3:44 AM Chris Dunlop <chris@onthe.net.au> wrote:
>>> What can make a "rbd unmap" fail, assuming the device is not
>>> mounted and not (obviously) open by any other processes?

OK, I'm confident I now understand the cause of this problem. The 
particular machine where I'm mounting the rbd snapshots is also running 
some containerised ceph services. The ceph containers are 
(bind-)mounting the entire host filesystem hierarchy on startup, and if 
a ceph container happens to start up whilst a rbd device is mounted, the 
container also has the rbd mounted, preventing the host from unmapping 
the device even after the host has unmounted it. (More below.)

This brings up a couple of issues...

Why is the ceph container getting access to the entire host filesystem 
in the first place?

Even if I mount an rbd device with the "unbindable" mount option, which 
is specifically supposed to prevent bind mounts to that filesystem, the 
ceph containers still get the mount - how / why??

If the ceph containers really do need access to the entire host 
filesystem, perhaps it would be better to do a "slave" mount, so if/when 
the host unmounts a filesystem it's also unmounted in the container[s].  
(Of course this also means any filesystems newly mounted in the host 
would also appear in the containers - but that happens anyway if the 
container is newly started).

>> An unsuccessful iteration looks like this:
>>
>> 18:37:31.885408 O 3294108 rbd29 0 mapper
>> 18:37:33.181607 R 3294108 rbd29 1 mapper
>> 18:37:33.182086 O 3294175 rbd29 0 systemd-udevd
>> 18:37:33.197982 O 3294691 rbd29 1 blkid
>> 18:37:42.712870 R 3294691 rbd29 2 blkid
>> 18:37:42.716296 R 3294175 rbd29 1 systemd-udevd
>> 18:37:42.738469 O 3298073 rbd29 0 mount
>> 18:37:49.339012 R 3298073 rbd29 1 mount
>> 18:37:49.339352 O 3298073 rbd29 0 mount
>> 18:38:51.390166 O 2364320 rbd29 1 rpc.mountd
>> 18:39:00.989050 R 2364320 rbd29 2 rpc.mountd
>> 18:53:56.054685 R 3313923 rbd29 1 init
>>
>> According to my script log, the first unmap attempt was at 18:39:42,
>> i.e. 42 seconds after rpc.mountd released the device. At that point
>> the open_count was (or should have been?) 1 again, allowing the unmap to
>> succeed - but it didn't. The unmap was retried every second until it
>
> For unmap to go through, open_count must be 0.  rpc.mountd at
> 18:39:00.989050 just decremented it from 2 to 1, it didn't release
> the device.

Yes - but my poorly made point was that, per the normal test iteration, 
some time shortly after rpc.mountd decremented open_count to 1, an 
"umount" command was run successfully (the test would have aborted if 
the umount didn't succeed) - but the "umount" didn't show up in the 
bpftrace output. Immediately after the umount a "rbd unmap" was run, 
which failed with "busy" - i.e. the open_count was still incremented.

>> eventually succeeded at 18:53:56, the same time as the mysterious 
>> "init" process ran - but also note there is NO "umount" process in 
>> there so I don't know if the name of the process recorded by bpftrace 
>> is simply incorrect (but how would that happen??) or what else could 
>> be going on.

Using "ps" once the unmap starts failing, then cross checking against 
the process id recorded for the mysterious "init" in the bpftrace 
output, reveals the full command line for the "init" is:

/dev/init -- /usr/sbin/ceph-volume inventory --format=json-pretty --filter-for-batch

I.e. it's the 'init' process of a ceph-volume container that eventually 
releases the open_count.

After doing a lot of learning about ceph and containers (podman in this 
case) and namespaces etc. etc., the problem is now known...

Ceph containers are started with '-v "/:/rootfs"' which bind mounts the 
entire host's filesystem hierarchy into the container. Specifically, if 
the host has mounted filesystems, they're also mounted within the 
container when it starts up. So, if a ceph container starts up whilst 
there is a filesystem mounted from an rbd mapped device, the container 
also has that mount - and it retains the mount even if the filesystem is 
unmounted in the host. So the rbd device can't be unmapped in the host 
until the filesystem is released by the container, either via an explicit 
umount within the container, or a umount from the host targeting the 
container namespace, or the container exits.

This explains the mysterious 51 rbd devices that I haven't been able to 
unmap for a week: they're all mounted within long-running ceph containers 
that happened to start up whilst those 51 devices were all mounted 
somewhere.  I've now been able to unmap those devices after unmounting the 
filesystems within those containers using:

umount --namespace "${pid_of_container}" "${fs}"


------------------------------------------------------------
An example demonstrating the problem
------------------------------------------------------------
#
# Mount a snapshot, with "unbindable"
#
host# {
   rbd=pool/name@snap
   dev=$(rbd device map "${rbd}")
   declare -p dev
   mount -oro,norecovery,nouuid,unbindable "${dev}" "/mnt"
   echo --
   grep "${dev}" /proc/self/mountinfo
   echo --
   ls /mnt
   echo --
}
declare -- dev="/dev/rbd30"
--
1463 22 252:480 / /mnt ro unbindable - xfs /dev/rbd30 ro,nouuid,norecovery
--
file1 file2 file3

#
# The mount is still visible if we start a ceph container
#
host# cephadm shell
root@host:/# ls /mnt
file1 file2 file3

#
# The device is not unmappable from the host...
#
host# umount /mnt
host# rbd device unmap "${dev}"
rbd: sysfs write failed
rbd: unmap failed: (16) Device or resource busy

#
# ...until we umount the filesystem within the container
#
#
host# lsns -t mnt
         NS TYPE NPROCS     PID USER             COMMAND
4026533050 mnt       2 3105356 root             /dev/init -- bash
host# umount --namespace 3105356 /mnt
host# rbd device unmap "${dev}"
   ## success
------------------------------------------------------------
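
For the record, here is a sketch of how to find every process whose
mount namespace still has the device mounted (the "umount --namespace"
step above then releases it):

------------------------------------------------------------
grep -l "${dev}" /proc/[0-9]*/mountinfo 2>/dev/null |
while read -r mi; do
   pid=${mi#/proc/}; pid=${pid%%/*}
   printf '%s\t%s\n' "${pid}" "$(tr '\0' ' ' < /proc/${pid}/cmdline)"
   # umount --namespace "${pid}" /mnt   # as above, to release it
done
------------------------------------------------------------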


>> The bpftrace script looks like this:
>
> It would be good to attach the entire script, just in case someone runs
> into a similar issue in the future and tries to debug the same way.

Attached.

Cheers,

Chris

[-- Attachment #2: rbd-open-release.bpf --]
[-- Type: text/plain, Size: 4345 bytes --]

#!/usr/bin/bpftrace
/*
 * log rbd opens and releases
 *
 * run like:
 *
 * bpftrace -I /lib/modules/$(uname -r)/source/drivers/block -I /lib/modules/$(uname -r)/build this_script
 *
 * This assumes you have the appropriate linux source and build
 * artifacts available on the machine where you're running bpftrace.
 *
 * Note:
 *   https://github.com/iovisor/bpftrace/pull/2315
 *   BTF for kernel modules
 *
 * Once that lands in your local bpftrace you hopefully don't need the linux
 * source and build stuff, nor the 'extracted' stuff below, and you should be
 * able to simply run this script like:
 *
 * chmod +x ./this_script
 * ./this_script
 *   
 */

////////////////////////////////////////////////////////////
// extracted from
//   linux/drivers/block/rbd.c
//
#include <linux/ceph/osdmap.h>

#include <linux/kernel.h>
#include <linux/device.h>
#include <linux/blk-mq.h>

#include "rbd_types.h"

/*
 * An RBD device name will be "rbd#", where the "rbd" comes from
 * RBD_DRV_NAME above, and # is a unique integer identifier.
 */
#define DEV_NAME_LEN		32

/*
 * block device image metadata (in-memory version)
 */
struct rbd_image_header {
	/* These six fields never change for a given rbd image */
	char *object_prefix;
	__u8 obj_order;
	u64 stripe_unit;
	u64 stripe_count;
	s64 data_pool_id;
	u64 features;		/* Might be changeable someday? */

	/* The remaining fields need to be updated occasionally */
	u64 image_size;
	struct ceph_snap_context *snapc;
	char *snap_names;	/* format 1 only */
	u64 *snap_sizes;	/* format 1 only */
};

enum rbd_watch_state {
	RBD_WATCH_STATE_UNREGISTERED,
	RBD_WATCH_STATE_REGISTERED,
	RBD_WATCH_STATE_ERROR,
};

enum rbd_lock_state {
	RBD_LOCK_STATE_UNLOCKED,
	RBD_LOCK_STATE_LOCKED,
	RBD_LOCK_STATE_RELEASING,
};

/* WatchNotify::ClientId */
struct rbd_client_id {
	u64 gid;
	u64 handle;
};

struct rbd_mapping {
	u64                     size;
};

/*
 * a single device
 */
struct rbd_device {
	int			dev_id;		/* blkdev unique id */

	int			major;		/* blkdev assigned major */
	int			minor;
	struct gendisk		*disk;		/* blkdev's gendisk and rq */

	u32			image_format;	/* Either 1 or 2 */
	struct rbd_client	*rbd_client;

	char			name[DEV_NAME_LEN]; /* blkdev name, e.g. rbd3 */

	spinlock_t		lock;		/* queue, flags, open_count */

	struct rbd_image_header	header;
	unsigned long		flags;		/* possibly lock protected */
	struct rbd_spec		*spec;
	struct rbd_options	*opts;
	char			*config_info;	/* add{,_single_major} string */

	struct ceph_object_id	header_oid;
	struct ceph_object_locator header_oloc;

	struct ceph_file_layout	layout;		/* used for all rbd requests */

	struct mutex		watch_mutex;
	enum rbd_watch_state	watch_state;
	struct ceph_osd_linger_request *watch_handle;
	u64			watch_cookie;
	struct delayed_work	watch_dwork;

	struct rw_semaphore	lock_rwsem;
	enum rbd_lock_state	lock_state;
	char			lock_cookie[32];
	struct rbd_client_id	owner_cid;
	struct work_struct	acquired_lock_work;
	struct work_struct	released_lock_work;
	struct delayed_work	lock_dwork;
	struct work_struct	unlock_work;
	spinlock_t		lock_lists_lock;
	struct list_head	acquiring_list;
	struct list_head	running_list;
	struct completion	acquire_wait;
	int			acquire_err;
	struct completion	releasing_wait;

	spinlock_t		object_map_lock;
	u8			*object_map;
	u64			object_map_size;	/* in objects */
	u64			object_map_flags;

	struct workqueue_struct	*task_wq;

	struct rbd_spec		*parent_spec;
	u64			parent_overlap;
	atomic_t		parent_ref;
	struct rbd_device	*parent;

	/* Block layer tags. */
	struct blk_mq_tag_set	tag_set;

	/* protects updating the header */
	struct rw_semaphore     header_rwsem;

	struct rbd_mapping	mapping;

	struct list_head	node;

	/* sysfs related */
	struct device		dev;
	unsigned long		open_count;	/* protected by lock */
};

//
// end of extraction
////////////////////////////////////////////////////////////

kprobe:rbd_open {
  $bdev = (struct block_device *)arg0;
  $rbd_dev = (struct rbd_device *)($bdev->bd_disk->private_data);

  printf("%s O %d %s %lu %s\n",
    strftime("%T.%f", nsecs), pid, $rbd_dev->name, $rbd_dev->open_count, comm
  );
}

kprobe:rbd_release {
  $disk = (struct gendisk *)arg0;
  $rbd_dev = (struct rbd_device *)($disk->private_data);

  printf("%s R %d %s %lu %s\n",
    strftime("%T.%f", nsecs), pid, $rbd_dev->name, $rbd_dev->open_count, comm
  );
}


* Re: rbd unmap fails with "Device or resource busy"
  2022-09-23  3:58                 ` Chris Dunlop
@ 2022-09-23  9:47                   ` Ilya Dryomov
  2022-09-28  0:22                     ` Chris Dunlop
       [not found]                   ` <CANqTTH4dPibtJ_4ayDch5rKVG=ykGAJhWnCyWmG9vvm1zHEg1w@mail.gmail.com>
  1 sibling, 1 reply; 16+ messages in thread
From: Ilya Dryomov @ 2022-09-23  9:47 UTC (permalink / raw)
  To: Chris Dunlop, Adam King, Guillaume Abrioux; +Cc: ceph-devel

On Fri, Sep 23, 2022 at 5:58 AM Chris Dunlop <chris@onthe.net.au> wrote:
>
> Hi Ilya,
>
> On Wed, Sep 21, 2022 at 12:40:54PM +0200, Ilya Dryomov wrote:
> > On Wed, Sep 21, 2022 at 3:36 AM Chris Dunlop <chris@onthe.net.au> wrote:
> >> On Tue, Sep 13, 2022 at 3:44 AM Chris Dunlop <chris@onthe.net.au> wrote:
> >>> What can make a "rbd unmap" fail, assuming the device is not
> >>> mounted and not (obviously) open by any other processes?
>
> OK, I'm confident I now understand the cause of this problem. The
> particular machine where I'm mounting the rbd snapshots is also running
> some containerised ceph services. The ceph containers are
> (bind-)mounting the entire host filesystem hierarchy on startup, and if
> a ceph container happens to start up whilst a rbd device is mounted, the
> container also has the rbd mounted, preventing the host from unmapping
> the device even after the host has unmounted it. (More below.)
>
> This brings up a couple of issues...
>
> Why is the ceph container getting access to the entire host filesystem
> in the first place?
>
> Even if I mount an rbd device with the "unbindable" mount option, which
> is specifically supposed to prevent bind mounts to that filesystem, the
> ceph containers still get the mount - how / why??
>
> If the ceph containers really do need access to the entire host
> filesystem, perhaps it would be better to do a "slave" mount, so if/when
> the host unmounts a filesystem it's also unmounted in the container[s].
> (Of course this also means any filesystems newly mounted in the host
> would also appear in the containers - but that happens anyway if the
> container is newly started).
>
> >> An unsuccessful iteration looks like this:
> >>
> >> 18:37:31.885408 O 3294108 rbd29 0 mapper
> >> 18:37:33.181607 R 3294108 rbd29 1 mapper
> >> 18:37:33.182086 O 3294175 rbd29 0 systemd-udevd
> >> 18:37:33.197982 O 3294691 rbd29 1 blkid
> >> 18:37:42.712870 R 3294691 rbd29 2 blkid
> >> 18:37:42.716296 R 3294175 rbd29 1 systemd-udevd
> >> 18:37:42.738469 O 3298073 rbd29 0 mount
> >> 18:37:49.339012 R 3298073 rbd29 1 mount
> >> 18:37:49.339352 O 3298073 rbd29 0 mount
> >> 18:38:51.390166 O 2364320 rbd29 1 rpc.mountd
> >> 18:39:00.989050 R 2364320 rbd29 2 rpc.mountd
> >> 18:53:56.054685 R 3313923 rbd29 1 init
> >>
> >> According to my script log, the first unmap attempt was at 18:39:42,
> >> i.e. 42 seconds after rpc.mountd released the device. At that point
> >> the open_count was (or should have been?) 1 again, allowing the unmap to
> >> succeed - but it didn't. The unmap was retried every second until it
> >
> > For unmap to go through, open_count must be 0.  rpc.mountd at
> > 18:39:00.989050 just decremented it from 2 to 1, it didn't release
> > the device.
>
> Yes - but my poorly made point was that, per the normal test iteration,
> some time shortly after rpc.mountd decremented open_count to 1, an
> "umount" command was run successfully (the test would have aborted if
> the umount didn't succeed) - but the "umount" didn't show up in the
> bpftrace output. Immediately after the umount a "rbd unmap" was run,
> which failed with "busy" - i.e. the open_count was still incremented.
>
> >> eventually succeeded at 18:53:56, the same time as the mysterious
> >> "init" process ran - but also note there is NO "umount" process in
> >> there so I don't know if the name of the process recorded by bpftrace
> >> is simply incorrect (but how would that happen??) or what else could
> >> be going on.
>
> Using "ps" once the unmap starts failing, then cross checking against
> the process id recorded for the mysterious "init" in the bpftrace
> output, reveals the full command line for the "init" is:
>
> /dev/init -- /usr/sbin/ceph-volume inventory --format=json-pretty --filter-for-batch
>
> I.e. it's the 'init' process of a ceph-volume container that eventually
> releases the open_count.
>
> After doing a lot of learning about ceph and containers (podman in this
> case) and namespaces etc. etc., the problem is now known...
>
> Ceph containers are started with '-v "/:/rootfs"' which bind mounts the
> entire host's filesystem hierarchy into the container. Specifically, if
> the host has mounted filesystems, they're also mounted within the
> container when it starts up. So, if a ceph container starts up whilst
> there is a filesystem mounted from an rbd mapped device, the container
> also has that mount - and it retains the mount even if the filesystem is
> unmounted in the host. So the rbd device can't be unmapped in the host
> until the filesystem is released by the container, either via an explicit
> umount within the container, or a umount from the host targeting the
> container namespace, or the container exits.
>
> This explains the mysterious 51 rbd devices that I haven't been able to
> unmap for a week: they're all mounted within long-running ceph containers
> that happened to start up whilst those 51 devices were all mounted
> somewhere.  I've now been able to unmap those devices after unmounting the
> filesystems within those containers using:
>
> umount --namespace "${pid_of_container}" "${fs}"
>
>
> ------------------------------------------------------------
> An example demonstrating the problem
> ------------------------------------------------------------
> #
> # Mount a snapshot, with "unbindable"
> #
> host# {
>    rbd=pool/name@snap
>    dev=$(rbd device map "${rbd}")
>    declare -p dev
>    mount -oro,norecovery,nouuid,unbindable "${dev}" "/mnt"
>    echo --
>    grep "${dev}" /proc/self/mountinfo
>    echo --
>    ls /mnt
>    echo --
> }
> declare -- dev="/dev/rbd30"
> --
> 1463 22 252:480 / /mnt ro unbindable - xfs /dev/rbd30 ro,nouuid,norecovery
> --
> file1 file2 file3
>
> #
> # The mount is still visible if we start a ceph container
> #
> host# cephadm shell
> root@host:/# ls /mnt
> file1 file2 file3
>
> #
> # The device is not unmappable from the host...
> #
> host# umount /mnt
> host# rbd device unmap "${dev}"
> rbd: sysfs write failed
> rbd: unmap failed: (16) Device or resource busy
>
> #
> # ...until we umount the filesystem within the container
> #
> #
> host# lsns -t mnt
>          NS TYPE NPROCS     PID USER             COMMAND
> 4026533050 mnt       2 3105356 root             /dev/init -- bash
> host# umount --namespace 3105356 /mnt
> host# rbd device unmap "${dev}"
>    ## success
> ------------------------------------------------------------

Hi Chris,

Thanks for the great analysis!  I think ceph-volume container does
it because of [1].  I'm not sure about "cephadm shell".  There is also
node-exporter container that needs access to the host for gathering
metrics.

I'm adding Adam (cephadm maintainer) and Guillaume (ceph-volume
maintainer) as this is something that clearly wasn't intended.

[1] https://tracker.ceph.com/issues/52926

                Ilya

>
>
> >> The bpftrace script looks like this:
> >
> > It would be good to attach the entire script, just in case someone runs
> > into a similar issue in the future and tries to debug the same way.
>
> Attached.
>
> Cheers,
>
> Chris

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: rbd unmap fails with "Device or resource busy"
       [not found]                   ` <CANqTTH4dPibtJ_4ayDch5rKVG=ykGAJhWnCyWmG9vvm1zHEg1w@mail.gmail.com>
@ 2022-09-27 10:55                     ` Ilya Dryomov
  0 siblings, 0 replies; 16+ messages in thread
From: Ilya Dryomov @ 2022-09-27 10:55 UTC (permalink / raw)
  To: Guillaume Abrioux; +Cc: Chris Dunlop, ceph-devel

On Fri, Sep 23, 2022 at 3:06 PM Guillaume Abrioux <gabrioux@redhat.com> wrote:
>
> Hi Chris,
>
> On Fri, 23 Sept 2022 at 05:59, Chris Dunlop <chris@onthe.net.au> wrote:
>>
>>
>> If the ceph containers really do need access to the entire host
>> filesystem, perhaps it would be better to do a "slave" mount,
>
>
> Yes, I think a mount with 'slave' propagation should fix your issue.
> I plan to do some tests next week and work on a patch.

Hi Guillaume,

I wanted to share an observation that there seem to be two cases here:
actual containers (e.g. an OSD container) and "cephadm shell" which is
technically also a container but may be regarded by users as a shell
("window") with some binaries and configuration files injected into it.

For the former, a unidirectional propagation such that when something
is unmounted on the host it is also unmounted in the container is all
that is needed.  However, for the latter, a bidirectional propagation
such that when something is mounted in this shell it is also mounted on
the host (and therefore in all other windows) seems desirable.

What do you think about going with MS_SLAVE for the former and MS_SHARED
for the latter?
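
For concreteness (this is only a sketch using a stock alpine image, not
the actual invocations cephadm generates), both propagation modes map
directly onto the bind mount options that podman already accepts:

  # MS_SLAVE flavour: unmounts on the host propagate into the container,
  # but nothing mounted inside the container leaks back to the host
  podman run --rm -it -v /:/rootfs:rslave docker.io/library/alpine sh

  # MS_SHARED flavour: mounts made inside the container also show up on
  # the host (needs a privileged container, and / on the host must have
  # shared propagation, which is the systemd default)
  podman run --rm -it --privileged -v /:/rootfs:rshared \
      docker.io/library/alpine sh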

Thanks,

                Ilya

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: rbd unmap fails with "Device or resource busy"
  2022-09-23  9:47                   ` Ilya Dryomov
@ 2022-09-28  0:22                     ` Chris Dunlop
  2022-09-29 11:14                       ` Ilya Dryomov
  0 siblings, 1 reply; 16+ messages in thread
From: Chris Dunlop @ 2022-09-28  0:22 UTC (permalink / raw)
  To: Ilya Dryomov; +Cc: Adam King, Guillaume Abrioux, ceph-devel

Hi all,

On Fri, Sep 23, 2022 at 11:47:11AM +0200, Ilya Dryomov wrote:
> On Fri, Sep 23, 2022 at 5:58 AM Chris Dunlop <chris@onthe.net.au> wrote:
>> On Wed, Sep 21, 2022 at 12:40:54PM +0200, Ilya Dryomov wrote:
>>> On Wed, Sep 21, 2022 at 3:36 AM Chris Dunlop <chris@onthe.net.au> wrote:
>>>> On Tue, Sep 13, 2022 at 3:44 AM Chris Dunlop <chris@onthe.net.au> wrote:
>>>>> What can make a "rbd unmap" fail, assuming the device is not 
>>>>> mounted and not (obviously) open by any other processes?
>>
>> OK, I'm confident I now understand the cause of this problem. The 
>> particular machine where I'm mounting the rbd snapshots is also 
>> running some containerised ceph services. The ceph containers are 
>> (bind-)mounting the entire host filesystem hierarchy on startup, and 
>> if a ceph container happens to start up whilst a rbd device is 
>> mounted, the container also has the rbd mounted, preventing the host 
>> from unmapping the device even after the host has unmounted it. (More 
>> below.)
>>
>> This brings up a couple of issues...
>>
>> Why is the ceph container getting access to the entire host 
>> filesystem in the first place?
>>
>> Even if I mount an rbd device with the "unbindable" mount option, 
>> which is specifically supposed to prevent bind mounts to that 
>> filesystem, the ceph containers still get the mount - how / why??
>>
>> If the ceph containers really do need access to the entire host 
>> filesystem, perhaps it would be better to do a "slave" mount, so 
>> if/when the host unmounts a filesystem it's also unmounted in the 
>> container[s].  (Of course this also means any filesystems newly 
>> mounted in the host would also appear in the containers - but that 
>> happens anyway if the container is newly started).
>
> Thanks for the great analysis!  I think ceph-volume container does it 
> because of [1].  I'm not sure about "cephadm shell".  There is also
> node-exporter container that needs access to the host for gathering 
> metrics.
>
> [1] https://tracker.ceph.com/issues/52926

I'm guessing ceph-volume may need to see the host mounts so it can 
detect a disk is being used. Could this also be done in the host (like 
issue 52926 says is being done with pv/vg/lv commands), removing the 
need to have the entire host filesystem hierarchy available in the 
container?

Similarly, I would have thought the node-exporter container only needs 
access to ceph-specific files/directories rather than the whole system.

On Tue, Sep 27, 2022 at 12:55:37PM +0200, Ilya Dryomov wrote:
> On Fri, Sep 23, 2022 at 3:06 PM Guillaume Abrioux <gabrioux@redhat.com> wrote:
>> On Fri, 23 Sept 2022 at 05:59, Chris Dunlop <chris@onthe.net.au> wrote:
>>> If the ceph containers really do need access to the entire host 
>>> filesystem, perhaps it would be better to do a "slave" mount,
>>
>> Yes, I think a mount with 'slave' propagation should fix your issue.  
>> I plan to do some tests next week and work on a patch.

Thanks Guillaume.

> I wanted to share an observation that there seem to be two cases here: 
> actual containers (e.g. an OSD container) and cephadm shell which is 
> technically also a container but may be regarded by users as a shell 
> ("window") with some binaries and configuration files injected into 
> it.

For my part I don't see or use a cephadm shell as a normal shell with 
additional stuff injected. At the very least the host root filesystem 
location has changed to /rootfs so it's obviously not a standard shell.

In fact I was quite surprised that the rootfs and all the other mounts 
unrelated to ceph were available at all. I'm still not convinced it's a 
good idea.

In my conception a cephadm shell is a mini virtual machine specifically 
for inspecting and managing ceph specific areas *only*.

I guess it's really a difference of philosophy. I only use cephadm shell 
when I'm explicitly needing to do something with ceph, and I drop back 
out of the cephadm shell (and its associated privileges!) as soon as I'm 
done with that specific task. For everything else I'll be in my 
(non-privileged) host shell. I can imagine (although I must say I'd be 
surprised) that others may use the cephadm shell as a matter of course, 
for managing the whole machine? Then again, given issue 52926 quoted 
above, it sounds like that would be a bad idea if, for instance, the lvm 
commands should NOT be run in the container "in order to avoid lvm metadata 
corruption" - i.e. it's not safe to assume a cephadm shell is a normal 
shell.

I would argue the goal should be to remove access to the general host 
filesystem(s) from the ceph containers altogether where possible.

I'll also admit that, generally, it's probably a bad idea to be doing 
things unrelated to ceph on a box hosting ceph. But that's the way this 
particular system has grown and unfortunately it will take quite a bit 
of time, effort, and expense to change this now.

> For the former, a unidirectional propagation such that when something 
> is unmounted on the host it is also unmounted in the container is all 
> that is needed.  However, for the latter, a bidirectional propagation 
> such that when something is mounted in this shell it is also mounted 
> on the host (and therefore in all other windows) seems desirable.
>
> What do you think about going with MS_SLAVE for the former and 
> MS_SHARED for the latter?

Personally I would find it surprising and unexpected (i.e. potentially a 
source of trouble) for mount changes done in a container (including a 
"shell" container) to affect the host. But again, that may be that 
difference of philosophy regarding the cephadm shell mentioned above.

Chris

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: rbd unmap fails with "Device or resource busy"
  2022-09-28  0:22                     ` Chris Dunlop
@ 2022-09-29 11:14                       ` Ilya Dryomov
  2022-09-30  0:04                         ` Chris Dunlop
  0 siblings, 1 reply; 16+ messages in thread
From: Ilya Dryomov @ 2022-09-29 11:14 UTC (permalink / raw)
  To: Chris Dunlop; +Cc: Adam King, Guillaume Abrioux, ceph-devel

On Wed, Sep 28, 2022 at 2:22 AM Chris Dunlop <chris@onthe.net.au> wrote:
>
> Hi all,
>
> On Fri, Sep 23, 2022 at 11:47:11AM +0200, Ilya Dryomov wrote:
> > On Fri, Sep 23, 2022 at 5:58 AM Chris Dunlop <chris@onthe.net.au> wrote:
> >> On Wed, Sep 21, 2022 at 12:40:54PM +0200, Ilya Dryomov wrote:
> >>> On Wed, Sep 21, 2022 at 3:36 AM Chris Dunlop <chris@onthe.net.au> wrote:
> >>>> On Tue, Sep 13, 2022 at 3:44 AM Chris Dunlop <chris@onthe.net.au> wrote:
> >>>>> What can make a "rbd unmap" fail, assuming the device is not
> >>>>> mounted and not (obviously) open by any other processes?
> >>
> >> OK, I'm confident I now understand the cause of this problem. The
> >> particular machine where I'm mounting the rbd snapshots is also
> >> running some containerised ceph services. The ceph containers are
> >> (bind-)mounting the entire host filesystem hierarchy on startup, and
> >> if a ceph container happens to start up whilst a rbd device is
> >> mounted, the container also has the rbd mounted, preventing the host
> >> from unmapping the device even after the host has unmounted it. (More
> >> below.)
> >>
> >> This brings up a couple of issues...
> >>
> >> Why is the ceph container getting access to the entire host
> >> filesystem in the first place?
> >>
> >> Even if I mount an rbd device with the "unbindable" mount option,
> >> which is specifically supposed to prevent bind mounts to that
> >> filesystem, the ceph containers still get the mount - how / why??
> >>
> >> If the ceph containers really do need access to the entire host
> >> filesystem, perhaps it would be better to do a "slave" mount, so
> >> if/when the host unmounts a filesystem it's also unmounted in the
> >> container[s].  (Of course this also means any filesystems newly
> >> mounted in the host would also appear in the containers - but that
> >> happens anyway if the container is newly started).
> >
> > Thanks for the great analysis!  I think ceph-volume container does it
> > because of [1].  I'm not sure about "cephadm shell".  There is also
> > node-exporter container that needs access to the host for gathering
> > metrics.
> >
> > [1] https://tracker.ceph.com/issues/52926
>
> I'm guessing ceph-volume may need to see the host mounts so it can
> detect a disk is being used. Could this also be done in the host (like
> issue 52926 says is being done with pv/vg/lv commands), removing the
> need to have the entire host filesystem hierarchy available in the
> container?
>
> Similarly, I would have thought the node-exporter container only needs
> access to ceph-specific files/directories rather than the whole system.
>
> On Tue, Sep 27, 2022 at 12:55:37PM +0200, Ilya Dryomov wrote:
> > On Fri, Sep 23, 2022 at 3:06 PM Guillaume Abrioux <gabrioux@redhat.com> wrote:
> >> On Fri, 23 Sept 2022 at 05:59, Chris Dunlop <chris@onthe.net.au> wrote:
> >>> If the ceph containers really do need access to the entire host
> >>> filesystem, perhaps it would be better to do a "slave" mount,
> >>
> >> Yes, I think a mount with 'slave' propagation should fix your issue.
> >> I plan to do some tests next week and work on a patch.
>
> Thanks Guillaume.
>
> > I wanted to share an observation that there seem to be two cases here:
> > actual containers (e.g. an OSD container) and cephadm shell which is
> > technically also a container but may be regarded by users as a shell
> > ("window") with some binaries and configuration files injected into
> > it.
>
> For my part I don't see or use a cephadm shell as a normal shell with
> additional stuff injected. At the very least the host root filesystem
> location has changed to /rootfs so it's obviously not a standard shell.
>
> In fact I was quite surprised that the rootfs and all the other mounts
> unrelated to ceph were available at all. I'm still not convinced it's a
> good idea.
>
> In my conception a cephadm shell is a mini virtual machine specifically
> for inspecting and managing ceph specific areas *only*.
>
> I guess it's really a difference of philosophy. I only use cephadm shell
> when I'm explicitly needing to do something with ceph, and I drop back
> out of the cephadm shell (and its associated privileges!) as soon as I'm
> done with that specific task. For everything else I'll be in my
> (non-privileged) host shell. I can imagine (although I must say I'd be
> surprised) that others may use the cephadm shell as a matter of course,
> for managing the whole machine? Then again, given issue 52926 quoted
> above, it sounds like that would be a bad idea if, for instance, the lvm
> commands should NOT be run in the container "in order to avoid lvm metadata
> corruption" - i.e. it's not safe to assume a cephadm shell is a normal
> shell.
>
> I would argue the goal should be to remove access to the general host
> filesystem(s) from the ceph containers altogether where possible.
>
> I'll also admit that, generally, it's probably a bad idea to be doing
> things unrelated to ceph on a box hosting ceph. But that's the way this
> particular system has grown and unfortunately it will take quite a bit
> of time, effort, and expense to change this now.
>
> > For the former, a unidirectional propagation such that when something
> > is unmounted on the host it is also unmounted in the container is all
> > that is needed.  However, for the latter, a bidirectional propagation
> > such that when something is mounted in this shell it is also mounted
> > on the host (and therefore in all other windows) seems desirable.
> >
> > What do you think about going with MS_SLAVE for the former and
> > MS_SHARED for the latter?
>
> Personally I would find it surprising and unexpected (i.e. potentially a
> source of trouble) for mount changes done in a container (including a
> "shell" container) to affect the host. But again, that may be that
> difference of philosophy regarding the cephadm shell mentioned above.

Hi Chris,

Right, I see your point, particularly around /rootfs location making it
obvious that it's not a standard shell.  I don't have a strong opinion
here, ultimately the fix is up to Adam and Guillaume (although I would
definitely prefer a set of targeted mounts over a blanket -v /:/rootfs
mount, whether slave or not).
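
Just to illustrate what I mean (the paths below are purely illustrative;
the actual set each daemon needs is something Adam and Guillaume would
know much better than me), something along these lines:

  # instead of the blanket  -v /:/rootfs  bind mount
  podman run --rm -it \
      -v /var/lib/ceph:/var/lib/ceph \
      -v /var/log/ceph:/var/log/ceph \
      -v /dev:/dev \
      -v /run/udev:/run/udev \
      docker.io/library/alpine sh

That way an unrelated host mount (such as an XFS-on-rbd snapshot) never
ends up pinned inside a ceph container in the first place.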

Thanks,

                Ilya

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: rbd unmap fails with "Device or resource busy"
  2022-09-29 11:14                       ` Ilya Dryomov
@ 2022-09-30  0:04                         ` Chris Dunlop
  2022-09-30 13:26                           ` Ilya Dryomov
  0 siblings, 1 reply; 16+ messages in thread
From: Chris Dunlop @ 2022-09-30  0:04 UTC (permalink / raw)
  To: Ilya Dryomov; +Cc: Adam King, Guillaume Abrioux, ceph-devel

Hi all,

On Thu, Sep 29, 2022 at 01:14:17PM +0200, Ilya Dryomov wrote:
> On Fri, Sep 23, 2022 at 5:58 AM Chris Dunlop <chris@onthe.net.au> wrote:
>> Why is the ceph container getting access to the entire host
>> filesystem in the first place?
...
> Right, I see your point, particularly around /rootfs location making it
> obvious that it's not a standard shell.  I don't have a strong opinion
> here, ultimately the fix is up to Adam and Guillaume (although I would
> definitely prefer a set of targeted mounts over a blanket -v /:/rootfs
> mount, whether slave or not).

Perhaps this topic should be raised at a team meeting or however project 
directions are managed - i.e. whether to keep the blanket mount of the 
entire host filesystem, or whether the containers should aim for the 
minimal filesystem access required to run. If such a discussion were to 
take place I think the general safety principles around providing 
minimum-privilege access should be noted.


Cheers,

Chris

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: rbd unmap fails with "Device or resource busy"
  2022-09-30  0:04                         ` Chris Dunlop
@ 2022-09-30 13:26                           ` Ilya Dryomov
  0 siblings, 0 replies; 16+ messages in thread
From: Ilya Dryomov @ 2022-09-30 13:26 UTC (permalink / raw)
  To: Chris Dunlop; +Cc: Adam King, Guillaume Abrioux, ceph-devel

On Fri, Sep 30, 2022 at 2:04 AM Chris Dunlop <chris@onthe.net.au> wrote:
>
> Hi all,
>
> On Thu, Sep 29, 2022 at 01:14:17PM +0200, Ilya Dryomov wrote:
> > On Fri, Sep 23, 2022 at 5:58 AM Chris Dunlop <chris@onthe.net.au> wrote:
> >> Why is the ceph container getting access to the entire host
> >> filesystem in the first place?
> ...
> > Right, I see your point, particularly around /rootfs location making it
> > obvious that it's not a standard shell.  I don't have a strong opinion
> > here, ultimately the fix is up to Adam and Guillaume (although I would
> > definitely prefer a set of targeted mounts over a blanket -v /:/rootfs
> > mount, whether slave or not).
>
> Perhaps this topic should be raised at a team meeting or however project
> directions are managed - i.e. whether to keep the blanket mount of the
> entire host filesystem, or whether the containers should aim for the
> minimal filesystem access required to run. If such a discussion were to
> take place I think the general safety principles around providing
> minimum-privilege access should be noted.

Indeed.  I added this as a topic for the upcoming Ceph Developer
Monthly meeting [1].

[1] https://lists.ceph.io/hyperkitty/list/dev@ceph.io/thread/VDV5YVZSLFMUAAUI2NBZMYSKCFRC5AIV/

Thanks,

                Ilya

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2022-09-30 13:27 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-09-13  1:20 rbd unmap fails with "Device or resource busy" Chris Dunlop
2022-09-13 11:43 ` Ilya Dryomov
2022-09-14  3:49   ` Chris Dunlop
2022-09-14  8:41     ` Ilya Dryomov
2022-09-15  8:29       ` Chris Dunlop
2022-09-19  7:43         ` Chris Dunlop
2022-09-19 10:14           ` Ilya Dryomov
2022-09-21  1:36             ` Chris Dunlop
2022-09-21 10:40               ` Ilya Dryomov
2022-09-23  3:58                 ` Chris Dunlop
2022-09-23  9:47                   ` Ilya Dryomov
2022-09-28  0:22                     ` Chris Dunlop
2022-09-29 11:14                       ` Ilya Dryomov
2022-09-30  0:04                         ` Chris Dunlop
2022-09-30 13:26                           ` Ilya Dryomov
     [not found]                   ` <CANqTTH4dPibtJ_4ayDch5rKVG=ykGAJhWnCyWmG9vvm1zHEg1w@mail.gmail.com>
2022-09-27 10:55                     ` Ilya Dryomov
