linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Qu Wenruo <wqu@suse.com>
To: linux-btrfs@vger.kernel.org
Subject: [PATCH 3/3] btrfs: volumes: Allocate degraded chunks if rw devices can't fullfil a chunk
Date: Thu,  7 Nov 2019 14:27:10 +0800	[thread overview]
Message-ID: <20191107062710.67964-4-wqu@suse.com> (raw)
In-Reply-To: <20191107062710.67964-1-wqu@suse.com>

[PROBLEM]
Btrfs degraded mount will fallback to SINGLE profile if there are not
enough devices:

 # mkfs.btrfs -f /dev/test/scratch[12] -m raid1 -d raid1
 # wipefs -fa /dev/test/scratch2
 # mount -o degraded /dev/test/scratch1 /mnt/btrfs
 # fallocate -l 1G /mnt/btrfs/foobar
 # btrfs ins dump-tree -t chunk /dev/test/scratch1
        item 7 key (FIRST_CHUNK_TREE CHUNK_ITEM 1674575872) itemoff 15511 itemsize 80
                length 536870912 owner 2 stripe_len 65536 type DATA
 New data chunk will fallback to SINGLE.

If user doesn't balance those SINGLE chunks, even with missing devices
replaced, the fs is no longer full RAID1, and a missing device can break
the tolerance.

[CAUSE]
The cause is pretty simple, when mounted degraded, missing devices can't
be used for chunk allocation.
Thus btrfs has to fall back to SINGLE profile.

[ENHANCEMENT]
To avoid such problem, this patch will:
- Make all profiler reducer/updater to consider missing devices as part
  of num_devices
- Make chunk allocator to fallback to missing_list as last resort

If we have enough rw_devices, then go regular chunk allocation code.
This can avoid allocating degraded chunks.
E.g. for 3 devices RAID1 degraded mount, we will use the 2 existing
devices to allocate chunk, avoid degraded chunk.

But if we don't have enough rw_devices, then we check missing devices to
allocate degraded chunks.
E.g. for 2 devices RAID1 degraded mount, we have to allocate degraded
chunks to keep the RAID1 profile.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/block-group.c | 10 +++++++---
 fs/btrfs/volumes.c     | 18 +++++++++++++++---
 2 files changed, 22 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index bf7e3f23bba7..1686fd31679b 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -52,11 +52,13 @@ static u64 get_restripe_target(struct btrfs_fs_info *fs_info, u64 flags)
  */
 static u64 btrfs_reduce_alloc_profile(struct btrfs_fs_info *fs_info, u64 flags)
 {
-	u64 num_devices = fs_info->fs_devices->rw_devices;
+	u64 num_devices;
 	u64 target;
 	u64 raid_type;
 	u64 allowed = 0;
 
+	num_devices = fs_info->fs_devices->rw_devices +
+		      fs_info->fs_devices->missing_devices;
 	/*
 	 * See if restripe for this chunk_type is in progress, if so try to
 	 * reduce to the target profile
@@ -1986,7 +1988,8 @@ static u64 update_block_group_flags(struct btrfs_fs_info *fs_info, u64 flags)
 	if (stripped)
 		return extended_to_chunk(stripped);
 
-	num_devices = fs_info->fs_devices->rw_devices;
+	num_devices = fs_info->fs_devices->rw_devices +
+		      fs_info->fs_devices->missing_devices;
 
 	stripped = BTRFS_BLOCK_GROUP_RAID0 | BTRFS_BLOCK_GROUP_RAID56_MASK |
 		BTRFS_BLOCK_GROUP_RAID1_MASK | BTRFS_BLOCK_GROUP_RAID10;
@@ -2981,7 +2984,8 @@ static u64 get_profile_num_devs(struct btrfs_fs_info *fs_info, u64 type)
 
 	num_dev = btrfs_raid_array[btrfs_bg_flags_to_raid_index(type)].devs_max;
 	if (!num_dev)
-		num_dev = fs_info->fs_devices->rw_devices;
+		num_dev = fs_info->fs_devices->rw_devices +
+			  fs_info->fs_devices->missing_devices;
 
 	return num_dev;
 }
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index a462d8de5d2a..4dee1974ceb7 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -5052,8 +5052,9 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
 	max_chunk_size = min(div_factor(fs_devices->total_rw_bytes, 1),
 			     max_chunk_size);
 
-	devices_info = kcalloc(fs_devices->rw_devices, sizeof(*devices_info),
-			       GFP_NOFS);
+	devices_info = kcalloc(fs_devices->rw_devices +
+			       fs_devices->missing_devices,
+			       sizeof(*devices_info), GFP_NOFS);
 	if (!devices_info)
 		return -ENOMEM;
 
@@ -5067,7 +5068,18 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
 			max_stripe_size, dev_stripes);
 	if (ret < 0)
 		goto error;
-
+	/*
+	 * If rw devices can't fullfil the request, fallback to missing devices
+	 * as last resort.
+	 */
+	if (ndevs < devs_min) {
+		ret = gather_dev_holes(info, devices_info + ndevs, &ndevs,
+				&fs_devices->missing_list,
+				fs_devices->missing_devices,
+				max_stripe_size, dev_stripes);
+		if (ret < 0)
+			goto error;
+	}
 	/*
 	 * now sort the devices by hole size / available space
 	 */
-- 
2.24.0


  parent reply	other threads:[~2019-11-07  6:27 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-11-07  6:27 [PATCH 0/3] btrfs: More intelligent degraded chunk allocator Qu Wenruo
2019-11-07  6:27 ` [PATCH 1/3] btrfs: volumes: Refactor device holes gathering into a separate function Qu Wenruo
2019-11-07  9:20   ` Johannes Thumshirn
2019-11-07  9:33     ` Qu Wenruo
2019-11-07  9:45       ` Johannes Thumshirn
2019-11-07  6:27 ` [PATCH 2/3] btrfs: volumes: Add btrfs_fs_devices::missing_list to collect missing devices Qu Wenruo
2019-11-07  9:31   ` Johannes Thumshirn
2019-11-19 10:03   ` Anand Jain
2019-11-19 10:29     ` Qu Wenruo
2019-11-27 19:36       ` David Sterba
2019-11-07  6:27 ` Qu Wenruo [this message]
2019-11-19 10:05   ` [PATCH 3/3] btrfs: volumes: Allocate degraded chunks if rw devices can't fullfil a chunk Anand Jain
2019-11-19 10:41     ` Qu Wenruo
2019-11-27 19:23       ` David Sterba
2019-11-27 23:36         ` Qu Wenruo
2019-11-28 11:24           ` David Sterba
2019-11-28 12:29             ` Qu Wenruo
2019-11-28 12:30             ` Qu WenRuo
2019-11-28 12:39               ` Qu Wenruo
2019-11-18 20:18 ` [PATCH 0/3] btrfs: More intelligent degraded chunk allocator David Sterba
2019-11-18 23:32   ` Qu Wenruo
2019-11-19  5:18     ` Alberto Bursi
2019-11-27 19:26       ` David Sterba
2019-12-02  3:22     ` Zygo Blaxell
2019-12-02  4:41       ` Qu Wenruo
2019-12-02 19:27         ` Zygo Blaxell

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20191107062710.67964-4-wqu@suse.com \
    --to=wqu@suse.com \
    --cc=linux-btrfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).