btrfs degraded raid1 gone read-only
From: Dominique Martinet @ 2021-05-12 23:02 UTC
  To: linux-btrfs

Hi,

this looks a bit like the warning on the wiki[1], but since that entry
is filed under old kernels it's probably a slightly different problem.

[1] https://btrfs.wiki.kernel.org/index.php/Gotchas#raid1_volumes_only_mountable_once_RW_if_degraded


Kernel 5.11.16-300.fc34.x86_64

How to get there (I haven't tried to reproduce it yet; I'll keep the
broken device around a few days in case more traces are requested, but
it looks reproducible; see the command sketch after this list):
 - two disks in a raid1 profile
 - at some point I was short on slot space, so I took one disk out and
mounted just the other. I didn't take any special care, so it was most
likely mounted rw, but I didn't write any data on it as I expected to
put the drives back together later.
 - I put the disks back together, time passed and tons of data was
written...
 - this time I had to "sacrifice" a disk to a silly appliance which
wouldn't let me ssh into it without formatting a disk first, so I backed
up the data as well as I could, ran a full scrub and picked one of the
two disks to feed the monster.
The whole volume was also brought down during the operation (no
hotplug), so I didn't take any special care in separating the disks,
e.g. I didn't do anything like btrfs device remove...
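
Roughly the same thing in commands (device names hypothetical, and
again I haven't actually re-run this to confirm):

  # two disks in a raid1 profile
  $ mkfs.btrfs -d raid1 -m raid1 /dev/sdX /dev/sdY
  $ mount /dev/sdX /mnt
  # ... normal use for a while ...

  # one disk pulled out, the survivor mounted alone (rw, but with no
  # conscious writes from my side)
  $ mount -o degraded /dev/sdX /mnt

  # both disks back in, tons of data written over time
  $ mount /dev/sdX /mnt

  # one disk finally retired for good, without 'btrfs device remove'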

That's done, and I'd like to add the disk back to the raid1, but the
original volume can no longer be mounted read-write, so all device
operations fail saying the volume is read-only.
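
For reference, that means even re-adding a second device is rejected
up front (hypothetical device name):

  $ btrfs device add /dev/sdZ /mnt
  # fails immediately with a read-only filesystem error (EROFS)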



dmesg, first mounting with -o degraded,ro
[ 3324.135037] BTRFS: device fsid c3f4826a-6936-445d-8bae-98233ddda48f devid 1 transid 22031 /dev/dm-1 scanned by systemd-udevd (4812)
[ 3340.207257] BTRFS info (device dm-1): allowing degraded mounts
[ 3340.207273] BTRFS info (device dm-1): using free space tree
[ 3340.207278] BTRFS info (device dm-1): has skinny extents
[ 3340.208125] BTRFS warning (device dm-1): devid 2 uuid 0bab0bad-2cd2-42f4-96e8-8dff89d44552 is missing
[ 3340.218286] BTRFS warning (device dm-1): devid 2 uuid 0bab0bad-2cd2-42f4-96e8-8dff89d44552 is missing
[ 3340.516655] BTRFS info (device dm-1): bdev /dev/mapper/data1 errs: wr 0, rd 0, flush 0, corrupt 0, gen 1799

then mount -o remount,rw failing:
[ 3512.420569] BTRFS info (device dm-1): allowing degraded mounts
[ 3512.420615] BTRFS info (device dm-1): using free space tree
[ 3512.421180] BTRFS warning (device dm-1): chunk 3475900465152 missing 1 devices, max tolerance is 0 for writable mount
[ 3512.421189] BTRFS warning (device dm-1): too many missing devices, writable remount is not allowed
[ 3512.421206] BTRFS info (device dm-1): allowing degraded mounts
[ 3512.421218] BTRFS info (device dm-1): using free space tree


and `btrfs ins dump-tree -t chunk /dev/mapper/data1` does show that the
chunk is indeed on the missing device; from what I can see, only these
two chunks are single-copy (num_stripes 1):
	item 109 key (FIRST_CHUNK_TREE CHUNK_ITEM 3475900465152) itemoff 3995 itemsize 80
		length 1073741824 owner 2 stripe_len 65536 type METADATA
		io_align 65536 io_width 65536 sector_size 4096
		num_stripes 1 sub_stripes 1
			stripe 0 devid 2 offset 1752381259776
			dev_uuid 0bab0bad-2cd2-42f4-96e8-8dff89d44552
	item 110 key (FIRST_CHUNK_TREE CHUNK_ITEM 3476974206976) itemoff 3915 itemsize 80
		length 33554432 owner 2 stripe_len 65536 type SYSTEM
		io_align 65536 io_width 65536 sector_size 4096
		num_stripes 1 sub_stripes 1
			stripe 0 devid 2 offset 1271311368192
			dev_uuid 0bab0bad-2cd2-42f4-96e8-8dff89d44552
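
(For anyone else looking, I spotted them with something along these
lines; the trailing space in the pattern avoids matching num_stripes
10 and up:)

  $ btrfs ins dump-tree -t chunk /dev/mapper/data1 | grep -B3 'num_stripes 1 '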

This can also be confirmed with `btrfs fi df`, which prints a warning:
Data, RAID1: total=2.81TiB, used=2.81TiB
System, RAID1: total=32.00MiB, used=416.00KiB
System, single: total=32.00MiB, used=0.00B
Metadata, RAID1: total=4.00GiB, used=3.61GiB
Metadata, single: total=1.00GiB, used=0.00B
GlobalReserve, single: total=512.00MiB, used=0.00B
WARNING: Multiple block group profiles detected, see 'man btrfs(5)'.
WARNING:   Metadata: single, raid1
WARNING:   System: single, raid1
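
(For the record: as far as I understand, had I caught this while both
disks were still attached, a convert balance with the soft filter would
have rewritten just the stray single chunks back to raid1, something
like the following; -f is needed because balance refuses to touch
system chunks without it:)

  $ btrfs balance start -f -mconvert=raid1,soft -sconvert=raid1,soft /mnt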



So my guess is that when I mounted the other disk alone a while ago,
these two chunks got created to accommodate e.g. atime updates or
whatever else it was it wanted to write.

Later, when I put the two disks back together, I didn't notice there
were single chunks in the pool, and scrub only checks that each
existing extent is coherent, so it didn't reveal any problem. Data
written later with both disks present correctly went out as raid1, so
in practice none of the data is affected, and as far as I can see all
files are readable and intact.

If I had never put the disks back together, or if I had cleanly removed
the disk first while they were together, I suspect things would have
worked fine, and the rebalance of those single chunks would have copied
whatever there was to copy just fine... But that doesn't really help me
now!
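
(For completeness, the clean variant as I understand it; on a two-disk
raid1 the remove is refused because there would be no second device
left to mirror to, so the profiles have to be converted down first.
Device name hypothetical:)

  # convert everything to single-device profiles first...
  $ btrfs balance start -f -dconvert=single -mconvert=dup -sconvert=dup /mnt
  # ...then the remove relocates whatever still lives on the outgoing disk
  $ btrfs device remove /dev/sdY /mnt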

As it mounts read-only fine, I just created a new btrfs filesystem and
copied the data over from the old volume overnight, but while this was
possible for me I think it's worth addressing.




So, two questions:
 - first, for my own education: is there a way to tell btrfs to
ignore/remove these chunks, as in "trust me, there's no data I care
about in there"? At that point a rw mount would probably work again.
I don't think anything automated can be done at this point.

 - for other people (read: future me), if like me they don't check the
btrfs fi df output, would it make sense to be a bit more verbose about
the mixed profiles and suggest running a rebalance? (A strawman sketch
of a userspace stopgap follows below.)
Things were really working well when I put the two disks back together,
so I think most people wouldn't notice it either, but I'm not sure what
to suggest... For all I know there could already have been warnings in
dmesg when I put the two disks back together, before my check.
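
A strawman sketch of the kind of after-mount check I mean (my own
script, not an existing tool; it just reuses the warning that btrfs fi
df already prints, and a kernel-side message would obviously be nicer):

  #!/bin/sh
  # check-btrfs-profiles: warn loudly if a mounted btrfs carries mixed
  # block group profiles, by reusing the 'btrfs fi df' warning.
  mnt="${1:?usage: check-btrfs-profiles <mountpoint>}"
  if btrfs filesystem df "$mnt" 2>&1 \
          | grep -q 'Multiple block group profiles'; then
      echo "$mnt: mixed block group profiles detected," >&2
      echo "consider a convert balance before removing any device" >&2
      exit 1
  fi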


Thanks,
-- 
Dominique
