All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] BTRFS: Runs the xor function if a Block has failed
@ 2015-12-30  6:28 Sanidhya Solanki
  2015-12-30 17:18 ` David Sterba
  0 siblings, 1 reply; 5+ messages in thread
From: Sanidhya Solanki @ 2015-12-30  6:28 UTC (permalink / raw)
  To: clm, jbacik, dsterba, jpage.lkml; +Cc: linux-kernel, linux-btrfs

The patch adds the xor function after the P stripe
has failed, without bad data or the Q stripe.

Signed-off-by: Sanidhya Solanki <jpage.lkml@gmail.com>
---
 fs/btrfs/raid56.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
index 1a33d3e..d33734a 100644
--- a/fs/btrfs/raid56.c
+++ b/fs/btrfs/raid56.c
@@ -1864,8 +1864,8 @@ static void __raid_recover_end_io(struct btrfs_raid_bio *rbio)
 					/*
 					 * Just the P stripe has failed, without
 					 * a bad data or Q stripe.
-					 * TODO, we should redo the xor here.
 					 */
+					run_xor(pointers, rbio->nr_data - 1, PAGE_CACHE_SIZE);
 					err = -EIO;
 					goto cleanup;
 				}
-- 
2.5.0


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH] BTRFS: Runs the xor function if a Block has failed
  2015-12-30  6:28 [PATCH] BTRFS: Runs the xor function if a Block has failed Sanidhya Solanki
@ 2015-12-30 17:18 ` David Sterba
  2015-12-31  2:15   ` Sanidhya Solanki
  0 siblings, 1 reply; 5+ messages in thread
From: David Sterba @ 2015-12-30 17:18 UTC (permalink / raw)
  To: Sanidhya Solanki; +Cc: clm, jbacik, dsterba, linux-kernel, linux-btrfs

On Wed, Dec 30, 2015 at 01:28:36AM -0500, Sanidhya Solanki wrote:
> The patch adds the xor function after the P stripe
> has failed, without bad data or the Q stripe.

That's just the comment copied, the changelog does not explain why it's
ok to do just the run_xor there. It does not seem trivial to me. Please
describe that the end result after the code change is expected.

> @@ -1864,8 +1864,8 @@ static void __raid_recover_end_io(struct btrfs_raid_bio *rbio)
>  					/*
>  					 * Just the P stripe has failed, without
>  					 * a bad data or Q stripe.
> -					 * TODO, we should redo the xor here.
>  					 */
> +					run_xor(pointers, rbio->nr_data - 1, PAGE_CACHE_SIZE);
>  					err = -EIO;
>  					goto cleanup;

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] BTRFS: Runs the xor function if a Block has failed
  2015-12-30 17:18 ` David Sterba
@ 2015-12-31  2:15   ` Sanidhya Solanki
  2016-01-05  9:22     ` David Sterba
  0 siblings, 1 reply; 5+ messages in thread
From: Sanidhya Solanki @ 2015-12-31  2:15 UTC (permalink / raw)
  To: David Sterba; +Cc: clm, jbacik, dsterba, linux-kernel, linux-btrfs

On Wed, 30 Dec 2015 18:18:26 +0100
David Sterba <dsterba@suse.cz> wrote:

> That's just the comment copied, the changelog does not explain why
> it's ok to do just the run_xor there. It does not seem trivial to me.
> Please describe that the end result after the code change is expected.

In the RAID 6 case after a failure, we discover that the failure
affected the entire P stripe, without any bad data occurring. Hence, we
xor the previously stored parity data to return the data that was lost
in the P stripe failure.

The xor-red data is from the parity blocks. Hence, we are left with 
recovered data belonging to the P stripe.

If there is an error during the completion of the xor (provided by the
patch ), we got to the cleanup function.

Hope that is satisfactory.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] BTRFS: Runs the xor function if a Block has failed
  2015-12-31  2:15   ` Sanidhya Solanki
@ 2016-01-05  9:22     ` David Sterba
  2016-01-05 10:03       ` Sanidhya Solanki
  0 siblings, 1 reply; 5+ messages in thread
From: David Sterba @ 2016-01-05  9:22 UTC (permalink / raw)
  To: Sanidhya Solanki
  Cc: David Sterba, clm, jbacik, dsterba, linux-kernel, linux-btrfs

On Wed, Dec 30, 2015 at 09:15:44PM -0500, Sanidhya Solanki wrote:
> On Wed, 30 Dec 2015 18:18:26 +0100
> David Sterba <dsterba@suse.cz> wrote:
> 
> > That's just the comment copied, the changelog does not explain why
> > it's ok to do just the run_xor there. It does not seem trivial to me.
> > Please describe that the end result after the code change is expected.
> 
> In the RAID 6 case after a failure, we discover that the failure
> affected the entire P stripe, without any bad data occurring. Hence, we
> xor the previously stored parity data to return the data that was lost
> in the P stripe failure.
> 
> The xor-red data is from the parity blocks. Hence, we are left with 
> recovered data belonging to the P stripe.

If the data a rerecovered, why is -EIO still returned? Also, I see some
post-recovery steps eg. for the damaged P stripes (at label pstripes)
and I'd expect something similar for the case you're fixing.

I'm not familiar with the raid56 implementation but the fix looks
suspiciously trivial and I doubt that the xor was omitted out of
laziness.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] BTRFS: Runs the xor function if a Block has failed
  2016-01-05  9:22     ` David Sterba
@ 2016-01-05 10:03       ` Sanidhya Solanki
  0 siblings, 0 replies; 5+ messages in thread
From: Sanidhya Solanki @ 2016-01-05 10:03 UTC (permalink / raw)
  To: David Sterba; +Cc: clm, jbacik, dsterba, linux-kernel, linux-btrfs

On Tue, 5 Jan 2016 10:22:36 +0100
David Sterba <dsterba@suse.cz> wrote:

> If the data a rerecovered, why is -EIO still returned? 

In the other places in the file where the code appears, the submitted
patch is all that is required to do the xor. I think we also need to
include the following line:
memcpy(pointers[nr_data], pointers[0], PAGE_SIZE);
and delete the "return EIO" statement which I believe appears to be a
placeholder for the xor function.
In the end the final patch should look something like this:
> -					 * TODO, we should redo the xor here.
>  					 */
> +					memcpy(pointers[nr_data], pointers[0], PAGE_SIZE);
> +					run_xor(pointers, rbio->nr_data - 1, PAGE_CACHE_SIZE); 
> - err = -EIO;

So, I was just going to send you the mail as it is written above, but I
decided to investigate. The commit in question that added the todo was
53b381b3abeb86f12787a6c40fee9b2f71edc23b. Unfortunately, it was not
submitted  by the original author, nor was it by any means a small
dedicated patch. It adds the entire file, without much comment or
explanation and has not been touched since. 

However, we can get some idea of what is expected by looking at line
2398 in raid56.c, where a similar case of raid 5 recovery is handled.

So what the patch described above does is deal with a scenario where
no q stripe or bad data block exists, and, we can only rebuild from the
p-stripe, in effect like a raid 5 recovery.

So, if you are satisfied with the above retouched change, I can modify
my original patch with your suggestions and my changes, and, I can
forward them to the list again.

> Also, I see some post-recovery steps eg. for the damaged P stripes
> (at label pstripes) and I'd expect something similar for the case
> you're fixing.

I believe that is the case because the other cases still have the q-
stripe available tor rebuild from, which requires cleanup afterwards,
but the raid5-like scenario above does not. Let me know if anything
else is needed.

> I'm not familiar with the raid56 implementation but the fix looks
> suspiciously trivial and I doubt that the xor was omitted out of
> laziness.

I guess we will never know.

Thanks

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2016-01-05 14:05 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-12-30  6:28 [PATCH] BTRFS: Runs the xor function if a Block has failed Sanidhya Solanki
2015-12-30 17:18 ` David Sterba
2015-12-31  2:15   ` Sanidhya Solanki
2016-01-05  9:22     ` David Sterba
2016-01-05 10:03       ` Sanidhya Solanki

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.