linux-btrfs.vger.kernel.org archive mirror
* [PATCH] btrfs: Allow more disks missing for RAID10
@ 2019-07-18  6:27 Qu Wenruo
  2019-07-25 18:37 ` David Sterba
  0 siblings, 1 reply; 7+ messages in thread
From: Qu Wenruo @ 2019-07-18  6:27 UTC (permalink / raw)
  To: linux-btrfs

RAID10 can tolerate up to half of its disks missing, as long as each
sub-stripe still has a good mirror.

Thanks to the per-chunk degradable check, we can handle it pretty easily
now.

So add a special check for RAID10 to allow users to be creative (or
crazy) with btrfs RAID10.
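
To illustrate the rule (an example layout, with sub_stripes == 2 as the
code asserts): a 4-disk RAID10 chunk is laid out as

    group 0: stripe 0 (devA) | stripe 1 (devB)
    group 1: stripe 2 (devC) | stripe 3 (devD)

Losing devA and devC (one per group) keeps every sub-stripe readable, so
a degraded rw mount is fine; losing devA and devB leaves group 0 with no
mirror and must be rejected.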

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/volumes.c | 45 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 45 insertions(+)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index f209127a8bc6..65b10d13fc2d 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -7088,6 +7088,42 @@ int btrfs_read_sys_array(struct btrfs_fs_info *fs_info)
 	return -EIO;
 }
 
+static bool check_raid10_rw_degradable(struct btrfs_fs_info *fs_info,
+				       struct extent_map *em)
+{
+	struct map_lookup *map = em->map_lookup;
+	int sub_stripes = map->sub_stripes;
+	int num_stripes = map->num_stripes;
+	int tolerance = 1;
+	int i, j;
+
+	ASSERT(sub_stripes == 2);
+	ASSERT(num_stripes % sub_stripes == 0);
+	/*
+	 * Check sub-stripes group by group; each group must keep at
+	 * least one good mirror.
+	 */
+	for (i = 0; i < num_stripes; i += sub_stripes) {
+		int missing = 0;
+		for (j = 0; j < sub_stripes; j++) {
+			struct btrfs_device *dev = map->stripes[i + j].dev;
+
+			if (!dev || !dev->bdev ||
+			    test_bit(BTRFS_DEV_STATE_MISSING, &dev->dev_state) ||
+			    dev->last_flush_error)
+				missing++;
+		}
+		if (missing > tolerance) {
+			btrfs_warn(fs_info,
+"chunk %llu stripes %d,%d missing %d devices, max tolerance is %d for writable mount",
+				   em->start, i, i + sub_stripes - 1, missing,
+				   tolerance);
+			return false;
+		}
+	}
+	return true;
+}
+
 /*
  * Check if all chunks in the fs are OK for read-write degraded mount
  *
@@ -7119,6 +7155,14 @@ bool btrfs_check_rw_degradable(struct btrfs_fs_info *fs_info,
 		int i;
 
 		map = em->map_lookup;
+		if (map->type & BTRFS_BLOCK_GROUP_RAID10) {
+			ret = check_raid10_rw_degradable(fs_info, em);
+			if (!ret) {
+				free_extent_map(em);
+				goto out;
+			}
+			goto next;
+		}
 		max_tolerated =
 			btrfs_get_num_tolerated_disk_barrier_failures(
 					map->type);
@@ -7141,6 +7185,7 @@ bool btrfs_check_rw_degradable(struct btrfs_fs_info *fs_info,
 			ret = false;
 			goto out;
 		}
+next:
 		next_start = extent_map_end(em);
 		free_extent_map(em);
 
-- 
2.22.0


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH] btrfs: Allow more disks missing for RAID10
  2019-07-18  6:27 [PATCH] btrfs: Allow more disks missing for RAID10 Qu Wenruo
@ 2019-07-25 18:37 ` David Sterba
  2019-07-25 19:14   ` Austin S. Hemmelgarn
  2019-07-25 23:41   ` Qu Wenruo
  0 siblings, 2 replies; 7+ messages in thread
From: David Sterba @ 2019-07-25 18:37 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Thu, Jul 18, 2019 at 02:27:49PM +0800, Qu Wenruo wrote:
> RAID10 can tolerate up to half of its disks missing, as long as each
> sub-stripe still has a good mirror.

Can you please make a test case for that?

I think the number of devices that can be lost can be higher than half
in some extreme cases: one device holds a copy of every stripe and the
2nd copies are scattered across the other devices, but that's highly
unlikely to happen.

On average it's the same as raid1, but the more exact check can
potentially utilize the stripe layout.
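
To make the extreme case concrete (a hypothetical layout, nothing the
allocator aims for): take 6 devices A-F where, due to how full the
devices are, every chunk happens to be striped across only four of them:

    group 0: A | one of C-F
    group 1: B | one of C-F

Then C, D, E and F (4 of 6 devices, more than half) could all be lost
and every group would still keep a mirror on A or B.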

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] btrfs: Allow more disks missing for RAID10
  2019-07-25 18:37 ` David Sterba
@ 2019-07-25 19:14   ` Austin S. Hemmelgarn
  2019-07-25 23:41   ` Qu Wenruo
  1 sibling, 0 replies; 7+ messages in thread
From: Austin S. Hemmelgarn @ 2019-07-25 19:14 UTC (permalink / raw)
  To: dsterba, Qu Wenruo, linux-btrfs

On 2019-07-25 14:37, David Sterba wrote:
> On Thu, Jul 18, 2019 at 02:27:49PM +0800, Qu Wenruo wrote:
>> RAID10 can tolerate up to half of its disks missing, as long as each
>> sub-stripe still has a good mirror.
> 
> Can you please make a test case for that?
> 
> I think the number of devices that can be lost can be higher than half
> in some extreme cases: one device holds a copy of every stripe and the
> 2nd copies are scattered across the other devices, but that's highly
> unlikely to happen.
It is possible but, as you mention, highly unlikely.  It's also possible
in raid1 mode, and a lot less unlikely there; in fact, it's almost
guaranteed to happen in certain configurations (see the sketch below).
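
One such configuration (an illustrative layout, assuming the usual
most-free-space chunk allocation): raid1 across one 2 TB device and two
1 TB devices.  The 2 TB device always has the most free space, so every
chunk places one copy there:

    chunk 1: big | small1
    chunk 2: big | small2
    ...

Losing both 1 TB devices (2 of 3, more than half) still leaves a
complete copy on the 2 TB device.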
> 
> On average it's the same as raid1, but the more exact check can
> potentially utilize the stripe layout.
> 

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] btrfs: Allow more disks missing for RAID10
  2019-07-25 18:37 ` David Sterba
  2019-07-25 19:14   ` Austin S. Hemmelgarn
@ 2019-07-25 23:41   ` Qu Wenruo
  2019-07-26 10:39     ` David Sterba
  1 sibling, 1 reply; 7+ messages in thread
From: Qu Wenruo @ 2019-07-25 23:41 UTC (permalink / raw)
  To: dsterba, Qu Wenruo, linux-btrfs


On 2019/7/26 2:37 AM, David Sterba wrote:
> On Thu, Jul 18, 2019 at 02:27:49PM +0800, Qu Wenruo wrote:
>> RAID10 can tolerate up to half of its disks missing, as long as each
>> sub-stripe still has a good mirror.
> 
> Can you please make a test case for that?

Fstests one or btrfs-progs one?

> 
> I think the number of devices that can be lost can be higher than half
> in some extreme cases: one device holds a copy of every stripe and the
> 2nd copies are scattered across the other devices, but that's highly
> unlikely to happen.
> 
> On average it's the same as raid1, but the more exact check can
> potentially utilize the stripe layout.
> 
That would be at the extent level, which to me is a layering violation,
far from what we want to improve here.

So it doesn't seem worth it.

Thanks,
Qu



^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] btrfs: Allow more disks missing for RAID10
  2019-07-25 23:41   ` Qu Wenruo
@ 2019-07-26 10:39     ` David Sterba
  2019-07-31  6:58       ` Qu Wenruo
  0 siblings, 1 reply; 7+ messages in thread
From: David Sterba @ 2019-07-26 10:39 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: dsterba, Qu Wenruo, linux-btrfs

On Fri, Jul 26, 2019 at 07:41:41AM +0800, Qu Wenruo wrote:
> 
> 
> On 2019/7/26 2:37 AM, David Sterba wrote:
> > On Thu, Jul 18, 2019 at 02:27:49PM +0800, Qu Wenruo wrote:
> >> RAID10 can tolerate up to half of its disks missing, as long as each
> >> sub-stripe still has a good mirror.
> > 
> > Can you please make a test case for that?
> 
> Fstests one or btrfs-progs one?

For fstests.

> > I think the number of devices that can be lost can be higher than half
> > in some extreme cases: one device holds a copy of every stripe and the
> > 2nd copies are scattered across the other devices, but that's highly
> > unlikely to happen.
> > 
> > On average it's the same as raid1, but the more exact check can
> > potentially utilize the stripe layout.
> > 
> That would be at the extent level, which to me is a layering violation,
> far from what we want to improve here.

Ah, I don't mean going down to the extent level; what you implemented is
enough and an improvement.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] btrfs: Allow more disks missing for RAID10
  2019-07-26 10:39     ` David Sterba
@ 2019-07-31  6:58       ` Qu Wenruo
  2019-07-31 13:23         ` David Sterba
  0 siblings, 1 reply; 7+ messages in thread
From: Qu Wenruo @ 2019-07-31  6:58 UTC (permalink / raw)
  To: dsterba, Qu Wenruo, linux-btrfs


On 2019/7/26 6:39 PM, David Sterba wrote:
> On Fri, Jul 26, 2019 at 07:41:41AM +0800, Qu Wenruo wrote:
>>
>>
>> On 2019/7/26 2:37 AM, David Sterba wrote:
>>> On Thu, Jul 18, 2019 at 02:27:49PM +0800, Qu Wenruo wrote:
>>>> RAID10 can tolerate up to half of its disks missing, as long as each
>>>> sub-stripe still has a good mirror.
>>>
>>> Can you please make a test case for that?
>>
>> Fstests one or btrfs-progs one?
> 
> For fstests.

OK, that test case in fact exposed a long-standing bug: we can't create
degraded chunks.

So if we're replacing the missing devices on a 4-disk RAID10 btrfs, we
will hit ENOSPC because the allocator can't find 4 devices to fill a new
chunk, and that eventually triggers a transaction abort.

Please discard this patch until we solve that problem.
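
A rough sketch of where the allocator gives up (paraphrasing
__btrfs_alloc_chunk() in fs/btrfs/volumes.c; simplified, not the
literal code):

	/*
	 * Missing devices are skipped while gathering candidate
	 * stripes, so on a 4-disk RAID10 with absent members ndevs
	 * drops below devs_min (4 for RAID10) and allocation fails
	 * instead of creating a degraded chunk.
	 */
	ndevs = rounddown(ndevs, devs_increment);
	if (ndevs < devs_min)
		return -ENOSPC;	/* caller ends up aborting the transaction */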

Thanks,
Qu

> 
>>> I think the number of devices that can be lost can be higher than half
>>> in some extreme cases: one device holds a copy of every stripe and the
>>> 2nd copies are scattered across the other devices, but that's highly
>>> unlikely to happen.
>>>
>>> On average it's the same as raid1, but the more exact check can
>>> potentially utilize the stripe layout.
>>>
>> That would be at the extent level, which to me is a layering violation,
>> far from what we want to improve here.
> 
> Ah, I don't mean going down to the extent level; what you implemented is
> enough and an improvement.
> 



^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] btrfs: Allow more disks missing for RAID10
  2019-07-31  6:58       ` Qu Wenruo
@ 2019-07-31 13:23         ` David Sterba
  0 siblings, 0 replies; 7+ messages in thread
From: David Sterba @ 2019-07-31 13:23 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: dsterba, Qu Wenruo, linux-btrfs

On Wed, Jul 31, 2019 at 02:58:02PM +0800, Qu Wenruo wrote:
> 
> 
> On 2019/7/26 6:39 PM, David Sterba wrote:
> > On Fri, Jul 26, 2019 at 07:41:41AM +0800, Qu Wenruo wrote:
> >>
> >>
> >> On 2019/7/26 2:37 AM, David Sterba wrote:
> >>> On Thu, Jul 18, 2019 at 02:27:49PM +0800, Qu Wenruo wrote:
> >>>> RAID10 can tolerate up to half of its disks missing, as long as each
> >>>> sub-stripe still has a good mirror.
> >>>
> >>> Can you please make a test case for that?
> >>
> >> Fstests one or btrfs-progs one?
> > 
> > For fstests.
> 
> OK, that test case in fact exposed a long-standing bug: we can't create
> degraded chunks.
> 
> So if we're replacing the missing devices on a 4-disk RAID10 btrfs, we
> will hit ENOSPC because the allocator can't find 4 devices to fill a new
> chunk, and that eventually triggers a transaction abort.
> 
> Please discard this patch until we solve that problem.

Ok, done.

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2019-07-31 13:23 UTC | newest]

Thread overview: 7+ messages
2019-07-18  6:27 [PATCH] btrfs: Allow more disks missing for RAID10 Qu Wenruo
2019-07-25 18:37 ` David Sterba
2019-07-25 19:14   ` Austin S. Hemmelgarn
2019-07-25 23:41   ` Qu Wenruo
2019-07-26 10:39     ` David Sterba
2019-07-31  6:58       ` Qu Wenruo
2019-07-31 13:23         ` David Sterba
