All of lore.kernel.org
* [PATCH 0/3] Btrfs: loop retry on raid6 read failures
@ 2017-12-04 22:40 Liu Bo
  2017-12-04 22:40 ` [PATCH 1/3] Btrfs: remove redundant check in rbio_can_merge Liu Bo
                   ` (4 more replies)
  0 siblings, 5 replies; 16+ messages in thread
From: Liu Bo @ 2017-12-04 22:40 UTC (permalink / raw)
  To: linux-btrfs

Patch 1 is a simple cleanup.
Patch 2 fixes a bug in raid56 rbio merging code.
Patch 3 fixes a bug in the raid6 reconstruction process which can end up
returning a read failure even when it could still rebuild good data.

Liu Bo (3):
  Btrfs: remove redundant check in rbio_can_merge
  Btrfs: do not merge rbios if their fail stripe index are not identical
  Btrfs: make raid6 rebuild retry more

 fs/btrfs/raid56.c  | 33 +++++++++++++++++++++++++--------
 fs/btrfs/volumes.c |  9 ++++++++-
 2 files changed, 33 insertions(+), 9 deletions(-)

-- 
2.9.4


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 1/3] Btrfs: remove redundant check in rbio_can_merge
  2017-12-04 22:40 [PATCH 0/3] Btrfs: loop retry on raid6 read failures Liu Bo
@ 2017-12-04 22:40 ` Liu Bo
  2017-12-05 18:20   ` David Sterba
  2017-12-04 22:40 ` [PATCH 2/3] Btrfs: do not merge rbios if their fail stripe index are not identical Liu Bo
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 16+ messages in thread
From: Liu Bo @ 2017-12-04 22:40 UTC (permalink / raw)
  To: linux-btrfs

Given the preceding check
'
if (last->operation != cur->operation)
	return 0;
',
it's guaranteed that the two operations are the same.

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
---
 fs/btrfs/raid56.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
index 24a6222..c4188fb 100644
--- a/fs/btrfs/raid56.c
+++ b/fs/btrfs/raid56.c
@@ -595,12 +595,10 @@ static int rbio_can_merge(struct btrfs_raid_bio *last,
 	 * bio list here, anyone else that wants to
 	 * change this stripe needs to do their own rmw.
 	 */
-	if (last->operation == BTRFS_RBIO_PARITY_SCRUB ||
-	    cur->operation == BTRFS_RBIO_PARITY_SCRUB)
+	if (last->operation == BTRFS_RBIO_PARITY_SCRUB)
 		return 0;
 
-	if (last->operation == BTRFS_RBIO_REBUILD_MISSING ||
-	    cur->operation == BTRFS_RBIO_REBUILD_MISSING)
+	if (last->operation == BTRFS_RBIO_REBUILD_MISSING)
 		return 0;
 
 	return 1;
-- 
2.9.4


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 2/3] Btrfs: do not merge rbios if their fail stripe index are not identical
  2017-12-04 22:40 [PATCH 0/3] Btrfs: loop retry on raid6 read failures Liu Bo
  2017-12-04 22:40 ` [PATCH 1/3] Btrfs: remove redundant check in rbio_can_merge Liu Bo
@ 2017-12-04 22:40 ` Liu Bo
  2017-12-05 18:24   ` David Sterba
  2017-12-04 22:40 ` [PATCH 3/3] Btrfs: make raid6 rebuild retry more Liu Bo
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 16+ messages in thread
From: Liu Bo @ 2017-12-04 22:40 UTC (permalink / raw)
  To: linux-btrfs

Since the fail stripe index in an rbio is used to decide which
reconstruction algorithm to run, we cannot merge rbios whose fail
stripe indexes differ; otherwise, one of the two reconstructions
would fail.

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
---
 fs/btrfs/raid56.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
index c4188fb..8d09535 100644
--- a/fs/btrfs/raid56.c
+++ b/fs/btrfs/raid56.c
@@ -601,6 +601,15 @@ static int rbio_can_merge(struct btrfs_raid_bio *last,
 	if (last->operation == BTRFS_RBIO_REBUILD_MISSING)
 		return 0;
 
+	if (last->operation == BTRFS_RBIO_READ_REBUILD) {
+		int fa = (last->faila < last->failb) ? last->faila : last->failb;
+		int fb = (last->faila < last->failb) ? last->failb : last->faila;
+		int cur_fa = (cur->faila < cur->failb) ? cur->faila : cur->failb;
+		int cur_fb = (cur->faila < cur->failb) ? cur->failb : cur->faila;
+
+		if (fa != cur_fa || fb != cur_fb)
+			return 0;
+	}
 	return 1;
 }
 
-- 
2.9.4


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 3/3] Btrfs: make raid6 rebuild retry more
  2017-12-04 22:40 [PATCH 0/3] Btrfs: loop retry on raid6 read failures Liu Bo
  2017-12-04 22:40 ` [PATCH 1/3] Btrfs: remove redundant check in rbio_can_merge Liu Bo
  2017-12-04 22:40 ` [PATCH 2/3] Btrfs: do not merge rbios if their fail stripe index are not identical Liu Bo
@ 2017-12-04 22:40 ` Liu Bo
  2017-12-05  8:07   ` Qu Wenruo
  2017-12-05  8:08   ` Qu Wenruo
  2017-12-05 18:26 ` [PATCH 0/3] Btrfs: loop retry on raid6 read failures David Sterba
  2018-01-05 15:54 ` David Sterba
  4 siblings, 2 replies; 16+ messages in thread
From: Liu Bo @ 2017-12-04 22:40 UTC (permalink / raw)
  To: linux-btrfs

There is a scenario that can end up with the rebuild process failing to
return good content, i.e.
suppose that all disks can be read without problems but the content
that was read out doesn't match its checksum; currently for raid6
btrfs retries at most twice:

- the 1st retry is to rebuild with all other stripes, which eventually
  becomes a raid5 xor rebuild,
- if the 1st fails, the 2nd retry deliberately fails parity p so
  that it will do a raid6 style rebuild,

however, chances are that another non-parity stripe's content is also
corrupted, so the above retries are not able to return correct
content, and users will see this as data loss.
More seriously, if the loss happens on some important internal btree
roots, the filesystem could refuse to mount.

This extends btrfs to do more retries, where each retry fails only one
stripe.  Since raid6 can tolerate 2 disk failures, if there is one
more failure besides the failure we're recovering from, this can
always work.

The worst case is to retry as many times as the number of raid6 disks,
but given that such a scenario is really rare in practice,
it's still acceptable.

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
---
 fs/btrfs/raid56.c  | 18 ++++++++++++++----
 fs/btrfs/volumes.c |  9 ++++++++-
 2 files changed, 22 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
index 8d09535..064d5bc 100644
--- a/fs/btrfs/raid56.c
+++ b/fs/btrfs/raid56.c
@@ -2166,11 +2166,21 @@ int raid56_parity_recover(struct btrfs_fs_info *fs_info, struct bio *bio,
 	}
 
 	/*
-	 * reconstruct from the q stripe if they are
-	 * asking for mirror 3
+	 * Loop retry:
+	 * for 'mirror == 2', reconstruct from all other stripes.
+	 * for 'mirror_num > 2', select a stripe to fail on every retry.
 	 */
-	if (mirror_num == 3)
-		rbio->failb = rbio->real_stripes - 2;
+	if (mirror_num > 2) {
+		/*
+		 * 'mirror == 3' is to fail the p stripe and
+		 * reconstruct from the q stripe.  'mirror > 3' is to
+		 * fail a data stripe and reconstruct from p+q stripe.
+		 */
+		rbio->failb = rbio->real_stripes - (mirror_num - 1);
+		ASSERT(rbio->failb > 0);
+		if (rbio->failb <= rbio->faila)
+			rbio->failb--;
+	}
 
 	ret = lock_stripe_add(rbio);
 
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index b397375..95371f8 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -5094,7 +5094,14 @@ int btrfs_num_copies(struct btrfs_fs_info *fs_info, u64 logical, u64 len)
 	else if (map->type & BTRFS_BLOCK_GROUP_RAID5)
 		ret = 2;
 	else if (map->type & BTRFS_BLOCK_GROUP_RAID6)
-		ret = 3;
+		/*
+		 * There could be two corrupted data stripes, we need
+		 * to loop retry in order to rebuild the correct data.
+		 * 
+		 * Fail a stripe at a time on every retry except the
+		 * stripe under reconstruction.
+		 */
+		ret = map->num_stripes;
 	else
 		ret = 1;
 	free_extent_map(em);
-- 
2.9.4


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH 3/3] Btrfs: make raid6 rebuild retry more
  2017-12-04 22:40 ` [PATCH 3/3] Btrfs: make raid6 rebuild retry more Liu Bo
@ 2017-12-05  8:07   ` Qu Wenruo
  2017-12-05 18:04     ` Liu Bo
  2017-12-05 18:09     ` David Sterba
  2017-12-05  8:08   ` Qu Wenruo
  1 sibling, 2 replies; 16+ messages in thread
From: Qu Wenruo @ 2017-12-05  8:07 UTC (permalink / raw)
  To: Liu Bo, linux-btrfs





On 2017-12-05 06:40, Liu Bo wrote:
> There is a scenario that can end up with rebuild process failing to
> return good content, i.e.
> suppose that all disks can be read without problems and if the content
> that was read out doesn't match its checksum, currently for raid6
> btrfs at most retries twice,
> 
> - the 1st retry is to rebuild with all other stripes, it'll eventually
>   be a raid5 xor rebuild,
> - if the 1st fails, the 2nd retry will deliberately fail parity p so
>   that it will do raid6 style rebuild,
> 
> however, the chances are that another non-parity stripe content also
> has something corrupted, so that the above retries are not able to
> return correct content, and users will think of this as data loss.
> More seriouly, if the loss happens on some important internal btree
> roots, it could refuse to mount.
> 
> This extends btrfs to do more retries and each retry fails only one
> stripe.  Since raid6 can tolerate 2 disk failures, if there is one
> more failure besides the failure on which we're recovering, this can
> always work.

This should be the correct behavior for RAID6: try all possible
combinations until they are exhausted or the correct data can be
recovered.

> 
> The worst case is to retry as many times as the number of raid6 disks,
> but given the fact that such a scenario is really rare in practice,
> it's still acceptable.

And even if we retried that many times, I don't think it will be a big
problem.  Since most of that happens purely in memory, it should be
fast enough that no obvious impact can be observed.

Aside from a small nitpick inlined below, the idea looks pretty good
to me.

Reviewed-by: Qu Wenruo <wqu@suse.com>

> 
> Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
> ---
>  fs/btrfs/raid56.c  | 18 ++++++++++++++----
>  fs/btrfs/volumes.c |  9 ++++++++-
>  2 files changed, 22 insertions(+), 5 deletions(-)
> 
> diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
> index 8d09535..064d5bc 100644
> --- a/fs/btrfs/raid56.c
> +++ b/fs/btrfs/raid56.c
> @@ -2166,11 +2166,21 @@ int raid56_parity_recover(struct btrfs_fs_info *fs_info, struct bio *bio,
>  	}
>  
>  	/*
> -	 * reconstruct from the q stripe if they are
> -	 * asking for mirror 3
> +	 * Loop retry:
> +	 * for 'mirror == 2', reconstruct from all other stripes.

What about using a macro to make the reassemble method more human readable?

And for the mirror == 2 case, by "rebuild from all" do you mean rebuilding
using all remaining data stripes + P?  The word "all" here is a little confusing.

Thanks,
Qu

> +	 * for 'mirror_num > 2', select a stripe to fail on every retry.
>  	 */> -	if (mirror_num == 3)
> -		rbio->failb = rbio->real_stripes - 2;
> +	if (mirror_num > 2) {
> +		/*
> +		 * 'mirror == 3' is to fail the p stripe and
> +		 * reconstruct from the q stripe.  'mirror > 3' is to
> +		 * fail a data stripe and reconstruct from p+q stripe.
> +		 */
> +		rbio->failb = rbio->real_stripes - (mirror_num - 1);
> +		ASSERT(rbio->failb > 0);
> +		if (rbio->failb <= rbio->faila)
> +			rbio->failb--;
> +	}
>  
>  	ret = lock_stripe_add(rbio);
>  
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index b397375..95371f8 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -5094,7 +5094,14 @@ int btrfs_num_copies(struct btrfs_fs_info *fs_info, u64 logical, u64 len)
>  	else if (map->type & BTRFS_BLOCK_GROUP_RAID5)
>  		ret = 2;
>  	else if (map->type & BTRFS_BLOCK_GROUP_RAID6)
> -		ret = 3;
> +		/*
> +		 * There could be two corrupted data stripes, we need
> +		 * to loop retry in order to rebuild the correct data.
> +		 * 
> +		 * Fail a stripe at a time on every retry except the
> +		 * stripe under reconstruction.
> +		 */
> +		ret = map->num_stripes;
>  	else
>  		ret = 1;
>  	free_extent_map(em);
> 



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 3/3] Btrfs: make raid6 rebuild retry more
  2017-12-05  8:07   ` Qu Wenruo
@ 2017-12-05 18:04     ` Liu Bo
  2017-12-05 19:29       ` David Sterba
  2017-12-05 18:09     ` David Sterba
  1 sibling, 1 reply; 16+ messages in thread
From: Liu Bo @ 2017-12-05 18:04 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Tue, Dec 05, 2017 at 04:07:35PM +0800, Qu Wenruo wrote:
> 
> 
> On 2017-12-05 06:40, Liu Bo wrote:
> > There is a scenario that can end up with rebuild process failing to
> > return good content, i.e.
> > suppose that all disks can be read without problems and if the content
> > that was read out doesn't match its checksum, currently for raid6
> > btrfs at most retries twice,
> > 
> > - the 1st retry is to rebuild with all other stripes, it'll eventually
> >   be a raid5 xor rebuild,
> > - if the 1st fails, the 2nd retry will deliberately fail parity p so
> >   that it will do raid6 style rebuild,
> > 
> > however, the chances are that another non-parity stripe content also
> > has something corrupted, so that the above retries are not able to
> > return correct content, and users will think of this as data loss.
> > More seriouly, if the loss happens on some important internal btree
> > roots, it could refuse to mount.
> > 
> > This extends btrfs to do more retries and each retry fails only one
> > stripe.  Since raid6 can tolerate 2 disk failures, if there is one
> > more failure besides the failure on which we're recovering, this can
> > always work.
> 
> This should be the correct behavior for RAID6, try all possible
> combination until all combination is exhausted or correct data can be
> recovered.
> 
> > 
> > The worst case is to retry as many times as the number of raid6 disks,
> > but given the fact that such a scenario is really rare in practice,
> > it's still acceptable.
> 
> And even we tried that much times, I don't think it will be a big problem.
> Since most of the that happens purely in memory, it should be so fast
> that no obvious impact can be observed.
>

It's basically a while loop, so it may cause some delay or hang; anyway,
such a case is rare.

> While with some small nitpick inlined below, the idea looks pretty good
> to me.
> 
> Reviewed-by: Qu Wenruo <wqu@suse.com>
> 
> > 
> > Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
> > ---
> >  fs/btrfs/raid56.c  | 18 ++++++++++++++----
> >  fs/btrfs/volumes.c |  9 ++++++++-
> >  2 files changed, 22 insertions(+), 5 deletions(-)
> > 
> > diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
> > index 8d09535..064d5bc 100644
> > --- a/fs/btrfs/raid56.c
> > +++ b/fs/btrfs/raid56.c
> > @@ -2166,11 +2166,21 @@ int raid56_parity_recover(struct btrfs_fs_info *fs_info, struct bio *bio,
> >  	}
> >  
> >  	/*
> > -	 * reconstruct from the q stripe if they are
> > -	 * asking for mirror 3
> > +	 * Loop retry:
> > +	 * for 'mirror == 2', reconstruct from all other stripes.
> 
> What about using macro to makes the reassemble method more human readable?
> 
> And for mirror == 2 case, "rebuild from all" do you mean rebuild using
> all remaining data stripe + P? The word "all" here is a little confusing.
>

Thank you for the comments.

It depends: if all other stripes are good to read, then it'd do
'data+p', which is a raid5 xor rebuild; if some disks also fail, then
it may do 'data+p+q' or 'data+q'.

Is it better to say "for mirror == 2, reconstruct from other available
stripes"?

Thanks,

-liubo

> Thanks,
> Qu
> 
> > +	 * for 'mirror_num > 2', select a stripe to fail on every retry.
> >  	 */> -	if (mirror_num == 3)
> > -		rbio->failb = rbio->real_stripes - 2;
> > +	if (mirror_num > 2) {
> > +		/*
> > +		 * 'mirror == 3' is to fail the p stripe and
> > +		 * reconstruct from the q stripe.  'mirror > 3' is to
> > +		 * fail a data stripe and reconstruct from p+q stripe.
> > +		 */
> > +		rbio->failb = rbio->real_stripes - (mirror_num - 1);
> > +		ASSERT(rbio->failb > 0);
> > +		if (rbio->failb <= rbio->faila)
> > +			rbio->failb--;
> > +	}
> >  
> >  	ret = lock_stripe_add(rbio);
> >  
> > diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> > index b397375..95371f8 100644
> > --- a/fs/btrfs/volumes.c
> > +++ b/fs/btrfs/volumes.c
> > @@ -5094,7 +5094,14 @@ int btrfs_num_copies(struct btrfs_fs_info *fs_info, u64 logical, u64 len)
> >  	else if (map->type & BTRFS_BLOCK_GROUP_RAID5)
> >  		ret = 2;
> >  	else if (map->type & BTRFS_BLOCK_GROUP_RAID6)
> > -		ret = 3;
> > +		/*
> > +		 * There could be two corrupted data stripes, we need
> > +		 * to loop retry in order to rebuild the correct data.
> > +		 * 
> > +		 * Fail a stripe at a time on every retry except the
> > +		 * stripe under reconstruction.
> > +		 */
> > +		ret = map->num_stripes;
> >  	else
> >  		ret = 1;
> >  	free_extent_map(em);
> > 
> 



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 3/3] Btrfs: make raid6 rebuild retry more
  2017-12-05  8:07   ` Qu Wenruo
  2017-12-05 18:04     ` Liu Bo
@ 2017-12-05 18:09     ` David Sterba
  2017-12-05 22:55       ` Liu Bo
  1 sibling, 1 reply; 16+ messages in thread
From: David Sterba @ 2017-12-05 18:09 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: Liu Bo, linux-btrfs

On Tue, Dec 05, 2017 at 04:07:35PM +0800, Qu Wenruo wrote:
> > @@ -2166,11 +2166,21 @@ int raid56_parity_recover(struct btrfs_fs_info *fs_info, struct bio *bio,
> >  	}
> >  
> >  	/*
> > -	 * reconstruct from the q stripe if they are
> > -	 * asking for mirror 3
> > +	 * Loop retry:
> > +	 * for 'mirror == 2', reconstruct from all other stripes.
> 
> What about using macro to makes the reassemble method more human readable?

Yeah, that's definitely needed and should be based on
BTRFS_MAX_MIRRORS, not just hardcoded to 3.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/3] Btrfs: remove redundant check in rbio_can_merge
  2017-12-04 22:40 ` [PATCH 1/3] Btrfs: remove redundant check in rbio_can_merge Liu Bo
@ 2017-12-05 18:20   ` David Sterba
  0 siblings, 0 replies; 16+ messages in thread
From: David Sterba @ 2017-12-05 18:20 UTC (permalink / raw)
  To: Liu Bo; +Cc: linux-btrfs

On Mon, Dec 04, 2017 at 03:40:35PM -0700, Liu Bo wrote:
> Given the above
> '
> if (last->operation != cur->operation)
> 	return 0;
> ',
> it's guaranteed that two operations are same.
> 
> Signed-off-by: Liu Bo <bo.li.liu@oracle.com>

Reviewed-by: David Sterba <dsterba@suse.com>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/3] Btrfs: do not merge rbios if their fail stripe index are not identical
  2017-12-04 22:40 ` [PATCH 2/3] Btrfs: do not merge rbios if their fail stripe index are not identical Liu Bo
@ 2017-12-05 18:24   ` David Sterba
  0 siblings, 0 replies; 16+ messages in thread
From: David Sterba @ 2017-12-05 18:24 UTC (permalink / raw)
  To: Liu Bo; +Cc: linux-btrfs

On Mon, Dec 04, 2017 at 03:40:36PM -0700, Liu Bo wrote:
> Since fail stripe index in rbio would be used to decide which
> algorithm reconstruction would be run, we cannot merge rbios if
> their's fail striped index are different, otherwise, one of the two
> reconstructions would fail.
> 
> Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
> ---
>  fs/btrfs/raid56.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
> 
> diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
> index c4188fb..8d09535 100644
> --- a/fs/btrfs/raid56.c
> +++ b/fs/btrfs/raid56.c
> @@ -601,6 +601,15 @@ static int rbio_can_merge(struct btrfs_raid_bio *last,
>  	if (last->operation == BTRFS_RBIO_REBUILD_MISSING)
>  		return 0;
>  
> +	if (last->operation == BTRFS_RBIO_READ_REBUILD) {
> +		int fa = (last->faila < last->failb) ? last->faila : last->failb;
> +		int fb = (last->faila < last->failb) ? last->failb : last->faila;
> +		int cur_fa = (cur->faila < cur->failb) ? cur->faila : cur->failb;
> +		int cur_fb = (cur->faila < cur->failb) ? cur->failb : cur->faila;

Can you please convert it to if/else? All the lines look very similar.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 0/3] Btrfs: loop retry on raid6 read failures
  2017-12-04 22:40 [PATCH 0/3] Btrfs: loop retry on raid6 read failures Liu Bo
                   ` (2 preceding siblings ...)
  2017-12-04 22:40 ` [PATCH 3/3] Btrfs: make raid6 rebuild retry more Liu Bo
@ 2017-12-05 18:26 ` David Sterba
  2018-01-05 15:54 ` David Sterba
  4 siblings, 0 replies; 16+ messages in thread
From: David Sterba @ 2017-12-05 18:26 UTC (permalink / raw)
  To: Liu Bo; +Cc: linux-btrfs

On Mon, Dec 04, 2017 at 03:40:34PM -0700, Liu Bo wrote:
> Patch 1 is a simple cleanup.
> Patch 2 fixes a bug in raid56 rbio merging code.
> Patch 3 fixes a bug in raid6 reconstruction process which can end up
> read failure when it can rebuild up good data.
> 
> Liu Bo (3):
>   Btrfs: remove redundant check in rbio_can_merge
>   Btrfs: do not merge rbios if their fail stripe index are not identical
>   Btrfs: make raid6 rebuild retry more

I'll add the patches to next so we can get some testing. The patch 3 can
stay as is, the cleanups for special mirror values would be good as a
separate patch.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 3/3] Btrfs: make raid6 rebuild retry more
  2017-12-05 18:04     ` Liu Bo
@ 2017-12-05 19:29       ` David Sterba
  0 siblings, 0 replies; 16+ messages in thread
From: David Sterba @ 2017-12-05 19:29 UTC (permalink / raw)
  To: Liu Bo; +Cc: Qu Wenruo, linux-btrfs

On Tue, Dec 05, 2017 at 11:04:03AM -0700, Liu Bo wrote:
> On Tue, Dec 05, 2017 at 04:07:35PM +0800, Qu Wenruo wrote:
> > 
> > 
> > On 2017-12-05 06:40, Liu Bo wrote:
> > > There is a scenario that can end up with rebuild process failing to
> > > return good content, i.e.
> > > suppose that all disks can be read without problems and if the content
> > > that was read out doesn't match its checksum, currently for raid6
> > > btrfs at most retries twice,
> > > 
> > > - the 1st retry is to rebuild with all other stripes, it'll eventually
> > >   be a raid5 xor rebuild,
> > > - if the 1st fails, the 2nd retry will deliberately fail parity p so
> > >   that it will do raid6 style rebuild,
> > > 
> > > however, the chances are that another non-parity stripe content also
> > > has something corrupted, so that the above retries are not able to
> > > return correct content, and users will think of this as data loss.
> > > More seriouly, if the loss happens on some important internal btree
> > > roots, it could refuse to mount.
> > > 
> > > This extends btrfs to do more retries and each retry fails only one
> > > stripe.  Since raid6 can tolerate 2 disk failures, if there is one
> > > more failure besides the failure on which we're recovering, this can
> > > always work.
> > 
> > This should be the correct behavior for RAID6, try all possible
> > combination until all combination is exhausted or correct data can be
> > recovered.
> > 
> > > 
> > > The worst case is to retry as many times as the number of raid6 disks,
> > > but given the fact that such a scenario is really rare in practice,
> > > it's still acceptable.
> > 
> > And even we tried that much times, I don't think it will be a big problem.
> > Since most of the that happens purely in memory, it should be so fast
> > that no obvious impact can be observed.
> >
> 
> It's basically a while loop, so it may cause some delay/hang, anyway,
> it's rare though.
> 
> > While with some small nitpick inlined below, the idea looks pretty good
> > to me.
> > 
> > Reviewed-by: Qu Wenruo <wqu@suse.com>
> > 
> > > 
> > > Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
> > > ---
> > >  fs/btrfs/raid56.c  | 18 ++++++++++++++----
> > >  fs/btrfs/volumes.c |  9 ++++++++-
> > >  2 files changed, 22 insertions(+), 5 deletions(-)
> > > 
> > > diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
> > > index 8d09535..064d5bc 100644
> > > --- a/fs/btrfs/raid56.c
> > > +++ b/fs/btrfs/raid56.c
> > > @@ -2166,11 +2166,21 @@ int raid56_parity_recover(struct btrfs_fs_info *fs_info, struct bio *bio,
> > >  	}
> > >  
> > >  	/*
> > > -	 * reconstruct from the q stripe if they are
> > > -	 * asking for mirror 3
> > > +	 * Loop retry:
> > > +	 * for 'mirror == 2', reconstruct from all other stripes.
> > 
> > What about using macro to makes the reassemble method more human readable?
> > 
> > And for mirror == 2 case, "rebuild from all" do you mean rebuild using
> > all remaining data stripe + P? The word "all" here is a little confusing.
> >
> 
> Thank you for the comments.
> 
> It depends, if all other stripes are good to read, then it'd do
> 'data+p' which is raid5 xor rebuild, if some disks also fail, then
> it'd may do 'data+p+q' or 'data+q'.
> 
> Is it better to say "for mirror == 2, reconstruct from other available
> stripes"?

Yes it is, you can also add the examples from the previous paragraph.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 3/3] Btrfs: make raid6 rebuild retry more
  2017-12-05 18:09     ` David Sterba
@ 2017-12-05 22:55       ` Liu Bo
  2017-12-06  0:11         ` Qu Wenruo
  0 siblings, 1 reply; 16+ messages in thread
From: Liu Bo @ 2017-12-05 22:55 UTC (permalink / raw)
  To: dsterba, Qu Wenruo, linux-btrfs

On Tue, Dec 05, 2017 at 07:09:25PM +0100, David Sterba wrote:
> On Tue, Dec 05, 2017 at 04:07:35PM +0800, Qu Wenruo wrote:
> > > @@ -2166,11 +2166,21 @@ int raid56_parity_recover(struct btrfs_fs_info *fs_info, struct bio *bio,
> > >  	}
> > >  
> > >  	/*
> > > -	 * reconstruct from the q stripe if they are
> > > -	 * asking for mirror 3
> > > +	 * Loop retry:
> > > +	 * for 'mirror == 2', reconstruct from all other stripes.
> > 
> > What about using macro to makes the reassemble method more human readable?
> 
> Yeah, that's definetelly needed and should be based on
> BTRFS_MAX_MIRRORS, not just hardcoded to 3.

OK.

In the case of raid5/6, BTRFS_MAX_MIRRORS is an abused name; it's more
of a raid1/10 concept.  Either BTRFS_RAID56_FULL_REBUILD or
BTRFS_RAID56_FULL_CHK looks better to me; which one do you guys like?

Thanks,

-liubo

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 3/3] Btrfs: make raid6 rebuild retry more
  2017-12-05 22:55       ` Liu Bo
@ 2017-12-06  0:11         ` Qu Wenruo
  2017-12-07  0:26           ` Liu Bo
  0 siblings, 1 reply; 16+ messages in thread
From: Qu Wenruo @ 2017-12-06  0:11 UTC (permalink / raw)
  To: bo.li.liu, dsterba, linux-btrfs





On 2017-12-06 06:55, Liu Bo wrote:
> On Tue, Dec 05, 2017 at 07:09:25PM +0100, David Sterba wrote:
>> On Tue, Dec 05, 2017 at 04:07:35PM +0800, Qu Wenruo wrote:
>>>> @@ -2166,11 +2166,21 @@ int raid56_parity_recover(struct btrfs_fs_info *fs_info, struct bio *bio,
>>>>  	}
>>>>  
>>>>  	/*
>>>> -	 * reconstruct from the q stripe if they are
>>>> -	 * asking for mirror 3
>>>> +	 * Loop retry:
>>>> +	 * for 'mirror == 2', reconstruct from all other stripes.
>>>
>>> What about using macro to makes the reassemble method more human readable?
>>
>> Yeah, that's definetelly needed and should be based on
>> BTRFS_MAX_MIRRORS, not just hardcoded to 3.
> 
> OK.
> 
> In case of raid5/6, BTRFS_MAX_MIRRORS is an abused name, it's more a
> raid1/10 concept, either BTRFS_RAID56_FULL_REBUILD or
> BTRFS_RAID56_FULL_CHK is better to me, which one do you guys like?

For the mirror > 2 case, mirror_num is no longer a single indicator but
a ranged iterator for later rebuild retries.

Something like set_raid_fail_from_mirror_num() seems better to me.

Thanks,
Qu

> 
> Thanks,
> 
> -liubo
> 



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 3/3] Btrfs: make raid6 rebuild retry more
  2017-12-06  0:11         ` Qu Wenruo
@ 2017-12-07  0:26           ` Liu Bo
  0 siblings, 0 replies; 16+ messages in thread
From: Liu Bo @ 2017-12-07  0:26 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: dsterba, linux-btrfs

On Wed, Dec 06, 2017 at 08:11:30AM +0800, Qu Wenruo wrote:
> 
> 
> On 2017-12-06 06:55, Liu Bo wrote:
> > On Tue, Dec 05, 2017 at 07:09:25PM +0100, David Sterba wrote:
> >> On Tue, Dec 05, 2017 at 04:07:35PM +0800, Qu Wenruo wrote:
> >>>> @@ -2166,11 +2166,21 @@ int raid56_parity_recover(struct btrfs_fs_info *fs_info, struct bio *bio,
> >>>>  	}
> >>>>  
> >>>>  	/*
> >>>> -	 * reconstruct from the q stripe if they are
> >>>> -	 * asking for mirror 3
> >>>> +	 * Loop retry:
> >>>> +	 * for 'mirror == 2', reconstruct from all other stripes.
> >>>
> >>> What about using a macro to make the reassembly method more human-readable?
> >>
> >> Yeah, that's definitely needed and should be based on
> >> BTRFS_MAX_MIRRORS, not just hardcoded to 3.
> > 
> > OK.
> > 
> > In the raid5/6 case, BTRFS_MAX_MIRRORS is a misleading name, as it's
> > more of a raid1/10 concept. Either BTRFS_RAID56_FULL_REBUILD or
> > BTRFS_RAID56_FULL_CHK would be better to me; which one do you guys
> > prefer?
> 
> For the mirror > 2 case, mirror_num is no longer a single indicator
> but a ranged iterator for the later rebuild retries.
> 
> Something like set_raid_fail_from_mirror_num() seems better to me.

I feel like having the logic open-coded plus the necessary comments is
probably better, as we can then see directly which stripe failb is.

Those who are new to this raid6 retry logic are likely to be confused
by a helper function like set_raid_fail_from_mirror_num() and will
have to go read the helper to understand the retry logic anyway.

Thanks,

-liubo


* Re: [PATCH 0/3] Btrfs: loop retry on raid6 read failures
  2017-12-04 22:40 [PATCH 0/3] Btrfs: loop retry on raid6 read failures Liu Bo
                   ` (3 preceding siblings ...)
  2017-12-05 18:26 ` [PATCH 0/3] Btrfs: loop retry on raid6 read failures David Sterba
@ 2018-01-05 15:54 ` David Sterba
  4 siblings, 0 replies; 16+ messages in thread
From: David Sterba @ 2018-01-05 15:54 UTC (permalink / raw)
  To: Liu Bo; +Cc: linux-btrfs

On Mon, Dec 04, 2017 at 03:40:34PM -0700, Liu Bo wrote:
> Patch 1 is a simple cleanup.
> Patch 2 fixes a bug in raid56 rbio merging code.
> Patch 3 fixes a bug in the raid6 reconstruction process which can end
> up with a read failure even when it could rebuild good data.
> 
> Liu Bo (3):
>   Btrfs: remove redundant check in rbio_can_merge
>   Btrfs: do not merge rbios if their fail stripe index are not identical
>   Btrfs: make raid6 rebuild retry more

All (their most recent version) added to 4.16 queue.


end of thread, other threads:[~2018-01-05 15:56 UTC | newest]

Thread overview: 16+ messages
2017-12-04 22:40 [PATCH 0/3] Btrfs: loop retry on raid6 read failures Liu Bo
2017-12-04 22:40 ` [PATCH 1/3] Btrfs: remove redundant check in rbio_can_merge Liu Bo
2017-12-05 18:20   ` David Sterba
2017-12-04 22:40 ` [PATCH 2/3] Btrfs: do not merge rbios if their fail stripe index are not identical Liu Bo
2017-12-05 18:24   ` David Sterba
2017-12-04 22:40 ` [PATCH 3/3] Btrfs: make raid6 rebuild retry more Liu Bo
2017-12-05  8:07   ` Qu Wenruo
2017-12-05 18:04     ` Liu Bo
2017-12-05 19:29       ` David Sterba
2017-12-05 18:09     ` David Sterba
2017-12-05 22:55       ` Liu Bo
2017-12-06  0:11         ` Qu Wenruo
2017-12-07  0:26           ` Liu Bo
2017-12-05  8:08   ` Qu Wenruo
2017-12-05 18:26 ` [PATCH 0/3] Btrfs: loop retry on raid6 read failures David Sterba
2018-01-05 15:54 ` David Sterba
