From: Qu Wenruo <wqu@suse.com>
To: linux-btrfs@vger.kernel.org
Subject: [PATCH] btrfs: volumes: Allow missing devices to be writeable
Date: Thu, 29 Aug 2019 15:17:31 +0800
Message-ID: <20190829071731.11521-1-wqu@suse.com>
[BUG]
There is a long-standing bug where a degraded-mounted btrfs can allocate
new SINGLE/DUP chunks on a RAID1 fs:
#!/bin/bash
dev1=/dev/test/scratch1
dev2=/dev/test/scratch2
mnt=/mnt/btrfs
umount $mnt &> /dev/null
umount $dev1 &> /dev/null
umount $dev2 &> /dev/null
dmesg -C
mkfs.btrfs -f -m raid1 -d raid1 $dev1 $dev2
wipefs -fa $dev2
mount -o degraded $dev1 $mnt
btrfs balance start --full $mnt
umount $mnt
echo "=== chunk after degraded mount ==="
btrfs ins dump-tree -t chunk $dev1 | grep stripe_len.*type
The resulting fs has only SINGLE and DUP chunks:
=== chunk after degraded mount ===
length 33554432 owner 2 stripe_len 65536 type SYSTEM
length 1073741824 owner 2 stripe_len 65536 type DATA
length 1073741824 owner 2 stripe_len 65536 type DATA|DUP
length 219676672 owner 2 stripe_len 65536 type METADATA|DUP
length 33554432 owner 2 stripe_len 65536 type SYSTEM|DUP
This behavior badly breaks the RAID1 tolerance.
Even after the missing device is replaced, if the device holding the
DUP/SINGLE chunks goes missing, the whole fs can no longer be mounted RW.
And we already have reports of users who can't even mount the fs because
some essential tree blocks got written to those DUP chunks.
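The chunk listing above can also be checked mechanically. Below is a
minimal sketch, not part of this patch: list_non_raid1_chunks is a
hypothetical helper name, and it assumes the chunk type is the last field
of each "stripe_len ... type" line, as in the output shown above.

```shell
# Hypothetical helper (not part of the patch).
# Reads "btrfs ins dump-tree -t chunk" output on stdin and prints every
# chunk line whose type does not end in "|RAID1" (i.e. SINGLE or DUP).
# Exit status: 0 if every chunk is RAID1, 1 if any other chunk was found.
list_non_raid1_chunks() {
    awk '
        /stripe_len.*type/ {
            # $NF is the type field, e.g. "DATA|RAID1" or "METADATA|DUP"
            if ($NF !~ /[|]RAID1$/) { print; bad = 1 }
        }
        END { exit bad + 0 }
    '
}

# Intended usage on an unmounted device, mirroring the reproducer:
#   btrfs ins dump-tree -t chunk "$dev1" | list_non_raid1_chunks
```

An empty result with exit status 0 would mean no SINGLE/DUP chunk was
created during the degraded mount.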
[CAUSE]
The cause is pretty simple: we treat missing devices as non-writable.
Thus when we need to allocate chunks, we can only fall back to
single-device profiles (SINGLE and DUP).
[FIX]
Just treat the missing devices as writable, so that new chunks are
allocated on them and the existing profiles are maintained.
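The cause and the fix above can be illustrated with a toy model. This is
only a sketch under stated assumptions: devs_min and pick_profile are
hypothetical names, the model knows just RAID1 (needs 2 devices) and
SINGLE/DUP (need 1), and the kernel's real chunk allocator is far more
involved.

```shell
# Toy model only, not kernel code.
# devs_min PROFILE: minimum number of writable devices the profile needs.
devs_min() {
    case "$1" in
        RAID1) echo 2 ;;
        SINGLE|DUP) echo 1 ;;
        *) echo "unknown profile: $1" >&2; return 1 ;;
    esac
}

# pick_profile WANTED PRESENT MISSING COUNT_MISSING
# COUNT_MISSING=0 models the old behavior (missing devices are not
# writable); COUNT_MISSING=1 models the behavior with this patch.
pick_profile() {
    wanted=$1
    rw_devices=$2
    if [ "$4" -eq 1 ]; then
        rw_devices=$(($2 + $3))
    fi
    min=$(devs_min "$wanted") || return 1
    if [ "$rw_devices" -ge "$min" ]; then
        echo "$wanted"
    else
        # Not enough writable devices: fall back to a single-device profile
        echo "SINGLE/DUP"
    fi
}

pick_profile RAID1 1 1 0   # old behavior: prints SINGLE/DUP
pick_profile RAID1 1 1 1   # with the patch: prints RAID1
```

With one present and one missing device, counting the missing device as
writable is what lets the model keep RAID1.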
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
fs/btrfs/volumes.c | 12 ++++++++++++
1 file changed, 12 insertions(+)
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 56f751192a6c..cc30b1fa9306 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -7002,6 +7002,18 @@ static int read_one_dev(struct extent_buffer *leaf,
fill_device_from_item(leaf, dev_item, device);
set_bit(BTRFS_DEV_STATE_IN_FS_METADATA, &device->dev_state);
+
+ /*
+ * We treat missing devices as writable, so that we can maintain
+ * the existing profiles without degrading to DUP/SINGLE.
+ */
+ if (test_bit(BTRFS_DEV_STATE_MISSING, &device->dev_state)) {
+ set_bit(BTRFS_DEV_STATE_WRITEABLE, &device->dev_state);
+ list_add(&device->dev_alloc_list,
+ &fs_devices->alloc_list);
+ fs_devices->rw_devices++;
+ }
+
if (test_bit(BTRFS_DEV_STATE_WRITEABLE, &device->dev_state) &&
!test_bit(BTRFS_DEV_STATE_REPLACE_TGT, &device->dev_state)) {
device->fs_devices->total_rw_bytes += device->total_bytes;
--
2.23.0