[PATCH] MD: Do not increment resync_mismatches unless MD_RECOVERY

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH] MD:  Do not increment resync_mismatches unless MD_RECOVERY_REQUESTED
@ 2013-04-19 20:09 Jonathan Brassow
  2013-04-22  0:36 ` NeilBrown
  0 siblings, 1 reply; 6+ messages in thread
From: Jonathan Brassow @ 2013-04-19 20:09 UTC (permalink / raw)
  To: linux-raid; +Cc: neilb, jbrassow

MD:  Do not increment resync_mismatches unless MD_RECOVERY_REQUESTED

resync_mismatches is used to display the number of differences that
have been found or repaired during a scrubbing operation.  It is not
meant to count anything during resync or repair operations.  (How
much sense does it make to find resync_mismatches populated after an
initial synchronization of the array?  After cleaning-up an unclean
shutdown?  After [re]integrating a device into an existing array?)
The incrementing of the variable must be restricted to when the user
initiates a scrubbing operation (i.e. "check" or "repair").

Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>

Index: linux-upstream/drivers/md/raid1.c
===================================================================
--- linux-upstream.orig/drivers/md/raid1.c
+++ linux-upstream/drivers/md/raid1.c
@@ -1878,7 +1878,8 @@ static int process_checks(struct r1bio *
 			}
 		} else
 			j = 0;
-		if (j >= 0)
+		if ((j >= 0) &&
+		    (test_bit(MD_RECOVERY_REQUESTED, &mddev->recovery)))
 			atomic64_add(r1_bio->sectors, &mddev->resync_mismatches);
 		if (j < 0 || (test_bit(MD_RECOVERY_CHECK, &mddev->recovery)
 			      && test_bit(BIO_UPTODATE, &sbio->bi_flags))) {
Index: linux-upstream/drivers/md/raid10.c
===================================================================
--- linux-upstream.orig/drivers/md/raid10.c
+++ linux-upstream/drivers/md/raid10.c
@@ -2071,7 +2071,10 @@ static void sync_request_write(struct md
 					break;
 			if (j == vcnt)
 				continue;
-			atomic64_add(r10_bio->sectors, &mddev->resync_mismatches);
+
+			if (test_bit(MD_RECOVERY_REQUESTED, &mddev->recovery))
+				atomic64_add(r10_bio->sectors,
+					     &mddev->resync_mismatches);
 			if (test_bit(MD_RECOVERY_CHECK, &mddev->recovery))
 				/* Don't fix anything. */
 				continue;
Index: linux-upstream/drivers/md/raid5.c
===================================================================
--- linux-upstream.orig/drivers/md/raid5.c
+++ linux-upstream/drivers/md/raid5.c
@@ -2989,7 +2989,10 @@ static void handle_parity_checks5(struct
 			 */
 			set_bit(STRIPE_INSYNC, &sh->state);
 		else {
-			atomic64_add(STRIPE_SECTORS, &conf->mddev->resync_mismatches);
+			if (test_bit(MD_RECOVERY_REQUESTED,
+				     &conf->mddev->recovery))
+				atomic64_add(STRIPE_SECTORS,
+					     &conf->mddev->resync_mismatches);
 			if (test_bit(MD_RECOVERY_CHECK, &conf->mddev->recovery))
 				/* don't try to repair!! */
 				set_bit(STRIPE_INSYNC, &sh->state);
@@ -3141,7 +3144,10 @@ static void handle_parity_checks6(struct
 				 */
 			}
 		} else {
-			atomic64_add(STRIPE_SECTORS, &conf->mddev->resync_mismatches);
+			if (test_bit(MD_RECOVERY_REQUESTED,
+				     &conf->mddev->recovery))
+				atomic64_add(STRIPE_SECTORS,
+					     &conf->mddev->resync_mismatches);
 			if (test_bit(MD_RECOVERY_CHECK, &conf->mddev->recovery))
 				/* don't try to repair!! */
 				set_bit(STRIPE_INSYNC, &sh->state);



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] MD:  Do not increment resync_mismatches unless MD_RECOVERY_REQUESTED
  2013-04-19 20:09 [PATCH] MD: Do not increment resync_mismatches unless MD_RECOVERY_REQUESTED Jonathan Brassow
@ 2013-04-22  0:36 ` NeilBrown
  2013-04-22 16:26   ` Brassow Jonathan
  0 siblings, 1 reply; 6+ messages in thread
From: NeilBrown @ 2013-04-22  0:36 UTC (permalink / raw)
  To: Jonathan Brassow; +Cc: linux-raid

[-- Attachment #1: Type: text/plain, Size: 3850 bytes --]

On Fri, 19 Apr 2013 15:09:21 -0500 Jonathan Brassow <jbrassow@redhat.com>
wrote:

> MD:  Do not increment resync_mismatches unless MD_RECOVERY_REQUESTED
> 
> resync_mismatches is used to display the number of differences that
> have been found or repaired during a scrubbing operation.  It is not
> meant to count anything during resync or repair operations.  (How
> much sense does it make to find resync_mismatches populated after an
> initial synchronization of the array?  After cleaning-up an unclean
> shutdown?  After [re]integrating a device into an existing array?)
> The incrementing of the variable must be restricted to when the user
> initiates a scrubbing operation (i.e. "check" or "repair").

How do you know what it is "meant" to do? :-)

While it might not be particularly useful, I see no point in hiding
information, and no desire to change what mismatch_cnt reports.

It may well be useful to somehow report what the real meaning for
mismatch_cnt is.  i.e. to report what the last sync_action was.
Then any use-space tool that reported mismatch_cnt, could adjust according to
whether the last operation was check/repair or resync/recovery/reshape. etc.

So if you were to provide a patch which adds "last_sync_action" or similar,
I'd certainly consider that.

Thanks,
NeilBrown

> 
> Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
> 
> Index: linux-upstream/drivers/md/raid1.c
> ===================================================================
> --- linux-upstream.orig/drivers/md/raid1.c
> +++ linux-upstream/drivers/md/raid1.c
> @@ -1878,7 +1878,8 @@ static int process_checks(struct r1bio *
>  			}
>  		} else
>  			j = 0;
> -		if (j >= 0)
> +		if ((j >= 0) &&
> +		    (test_bit(MD_RECOVERY_REQUESTED, &mddev->recovery)))
>  			atomic64_add(r1_bio->sectors, &mddev->resync_mismatches);
>  		if (j < 0 || (test_bit(MD_RECOVERY_CHECK, &mddev->recovery)
>  			      && test_bit(BIO_UPTODATE, &sbio->bi_flags))) {
> Index: linux-upstream/drivers/md/raid10.c
> ===================================================================
> --- linux-upstream.orig/drivers/md/raid10.c
> +++ linux-upstream/drivers/md/raid10.c
> @@ -2071,7 +2071,10 @@ static void sync_request_write(struct md
>  					break;
>  			if (j == vcnt)
>  				continue;
> -			atomic64_add(r10_bio->sectors, &mddev->resync_mismatches);
> +
> +			if (test_bit(MD_RECOVERY_REQUESTED, &mddev->recovery))
> +				atomic64_add(r10_bio->sectors,
> +					     &mddev->resync_mismatches);
>  			if (test_bit(MD_RECOVERY_CHECK, &mddev->recovery))
>  				/* Don't fix anything. */
>  				continue;
> Index: linux-upstream/drivers/md/raid5.c
> ===================================================================
> --- linux-upstream.orig/drivers/md/raid5.c
> +++ linux-upstream/drivers/md/raid5.c
> @@ -2989,7 +2989,10 @@ static void handle_parity_checks5(struct
>  			 */
>  			set_bit(STRIPE_INSYNC, &sh->state);
>  		else {
> -			atomic64_add(STRIPE_SECTORS, &conf->mddev->resync_mismatches);
> +			if (test_bit(MD_RECOVERY_REQUESTED,
> +				     &conf->mddev->recovery))
> +				atomic64_add(STRIPE_SECTORS,
> +					     &conf->mddev->resync_mismatches);
>  			if (test_bit(MD_RECOVERY_CHECK, &conf->mddev->recovery))
>  				/* don't try to repair!! */
>  				set_bit(STRIPE_INSYNC, &sh->state);
> @@ -3141,7 +3144,10 @@ static void handle_parity_checks6(struct
>  				 */
>  			}
>  		} else {
> -			atomic64_add(STRIPE_SECTORS, &conf->mddev->resync_mismatches);
> +			if (test_bit(MD_RECOVERY_REQUESTED,
> +				     &conf->mddev->recovery))
> +				atomic64_add(STRIPE_SECTORS,
> +					     &conf->mddev->resync_mismatches);
>  			if (test_bit(MD_RECOVERY_CHECK, &conf->mddev->recovery))
>  				/* don't try to repair!! */
>  				set_bit(STRIPE_INSYNC, &sh->state);
> 


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] MD:  Do not increment resync_mismatches unless MD_RECOVERY_REQUESTED
  2013-04-22  0:36 ` NeilBrown
@ 2013-04-22 16:26   ` Brassow Jonathan
  2013-04-22 17:22     ` Doug Ledford
  2013-04-22 23:59     ` NeilBrown
  0 siblings, 2 replies; 6+ messages in thread
From: Brassow Jonathan @ 2013-04-22 16:26 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid@vger.kernel.org Raid, Jonathan Brassow


On Apr 21, 2013, at 7:36 PM, NeilBrown wrote:

> On Fri, 19 Apr 2013 15:09:21 -0500 Jonathan Brassow <jbrassow@redhat.com>
> wrote:
> 
>> MD:  Do not increment resync_mismatches unless MD_RECOVERY_REQUESTED
>> 
>> resync_mismatches is used to display the number of differences that
>> have been found or repaired during a scrubbing operation.  It is not
>> meant to count anything during resync or repair operations.  (How
>> much sense does it make to find resync_mismatches populated after an
>> initial synchronization of the array?  After cleaning-up an unclean
>> shutdown?  After [re]integrating a device into an existing array?)
>> The incrementing of the variable must be restricted to when the user
>> initiates a scrubbing operation (i.e. "check" or "repair").
> 
> How do you know what it is "meant" to do? :-)

Yes, I suppose I did infer the meaning, but I don't think it's too much of a stretch - especially given the commit message where 'resync_mismatches' was introduced.
commit 9d88883e68f404d5581bd391713ceef470ea53a9
Author: NeilBrown <neilb@suse.de>
Date:   Tue Nov 8 21:39:26 2005 -0800

    [PATCH] md: teach raid5 the difference between 'check' and 'repair'.
    
    With this, raid5 can be asked to check parity without repairing it.  It also
    keeps a count of the number of incorrect parity blocks found (mismatches) an
    reports them through sysfs.
    
    Signed-off-by: Neil Brown <neilb@suse.de>
    Cc: Greg KH <greg@kroah.com>
    Signed-off-by: Andrew Morton <akpm@osdl.org>
    Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Also, it seems that 'mismatches' is only ever talked about in reference to scrubbing operations.  (Although it does say in 'man 4 md', "This  is  set to zero when a scrub starts and is incremented whenever a sector is found that is a mismatch.", which could imply that 'mismatches' might be non-zero before a scrub operation starts.)  There is no indication anywhere that I can find what it would mean to have a non-zero 'mismatches' absent a scrub operation - perhaps we just assume "uninitialized".

However, if it has come to mean something more since then or provides any sort of useful information, then I can see your point.  I find the behavior to be problematic.  A user creates an array and sometimes the value is non-zero - that could be alarming, causing the user to think the array needs immediate "repair".

Anyway, I suppose I am simply bloviating if your mind is made up.  I can make due with the idea below of recording the "last_sync_action".  I'm thinking of recording the MD_RECOVERY_* flags that would define the type of sync_action - would you rather have a 'char *'?  If we kept the flags, I wonder if we could also use that to define acceptable sync_action transitions...  We don't, for example, particularly want "resync" -> "idle" -> "check" -> "idle" -> "resync" - especially if checkpoints are being made throughout.

> While it might not be particularly useful, I see no point in hiding
> information, and no desire to change what mismatch_cnt reports.
> 
> It may well be useful to somehow report what the real meaning for
> mismatch_cnt is.  i.e. to report what the last sync_action was.
> Then any use-space tool that reported mismatch_cnt, could adjust according to
> whether the last operation was check/repair or resync/recovery/reshape. etc.
> 
> So if you were to provide a patch which adds "last_sync_action" or similar,
> I'd certainly consider that.

 brassow

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] MD:  Do not increment resync_mismatches unless MD_RECOVERY_REQUESTED
  2013-04-22 16:26   ` Brassow Jonathan
@ 2013-04-22 17:22     ` Doug Ledford
  2013-04-22 23:44       ` NeilBrown
  2013-04-22 23:59     ` NeilBrown
  1 sibling, 1 reply; 6+ messages in thread
From: Doug Ledford @ 2013-04-22 17:22 UTC (permalink / raw)
  To: Brassow Jonathan; +Cc: NeilBrown, linux-raid@vger.kernel.org Raid

On 04/22/2013 12:26 PM, Brassow Jonathan wrote:
> 
> On Apr 21, 2013, at 7:36 PM, NeilBrown wrote:
> 
>> On Fri, 19 Apr 2013 15:09:21 -0500 Jonathan Brassow
>> <jbrassow@redhat.com> wrote:
>> 
>>> MD:  Do not increment resync_mismatches unless
>>> MD_RECOVERY_REQUESTED
>>> 
>>> resync_mismatches is used to display the number of differences
>>> that have been found or repaired during a scrubbing operation.
>>> It is not meant to count anything during resync or repair
>>> operations.  (How much sense does it make to find
>>> resync_mismatches populated after an initial synchronization of
>>> the array?  After cleaning-up an unclean shutdown?  After
>>> [re]integrating a device into an existing array?) The
>>> incrementing of the variable must be restricted to when the user 
>>> initiates a scrubbing operation (i.e. "check" or "repair").
>> 
>> How do you know what it is "meant" to do? :-)
> 
> Yes, I suppose I did infer the meaning, but I don't think it's too
> much of a stretch - especially given the commit message where
> 'resync_mismatches' was introduced.

Which also matches the understanding other people have had for a long
time ;-)

The information Neil doesn't want to throw away is a valid issue, but
maybe the proper thing to do here is to have two counters:
repaired_mistmatches and detected_mismatches, and then you can infer
total_mismatches from that, and add mismatch_cnt_since (which makes more
sense to me than last_sync_action if you're using it to delimit when you
started the current mismatch counts) as a representation of when you
started the current counts.  If it was since boot, or since disk add, or
since check/repair, whatever, you put that in the since file (which I
think answers your other question you had about a set of flags or
text...I personally think text since the last operation that we might
want to record could be assemble or something like that).  That way you
aren't throwing anything away, but you also aren't confusing people and
making them concerned since most people think mismatch_cnt is
uncorrected errors found, having that increment for corrected issues
found during assembly or something can certainly confuse people.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] MD:  Do not increment resync_mismatches unless MD_RECOVERY_REQUESTED
  2013-04-22 17:22     ` Doug Ledford
@ 2013-04-22 23:44       ` NeilBrown
  0 siblings, 0 replies; 6+ messages in thread
From: NeilBrown @ 2013-04-22 23:44 UTC (permalink / raw)
  To: Doug Ledford; +Cc: Brassow Jonathan, linux-raid@vger.kernel.org Raid

[-- Attachment #1: Type: text/plain, Size: 2848 bytes --]

On Mon, 22 Apr 2013 13:22:11 -0400 Doug Ledford <dledford@redhat.com> wrote:

> On 04/22/2013 12:26 PM, Brassow Jonathan wrote:
> > 
> > On Apr 21, 2013, at 7:36 PM, NeilBrown wrote:
> > 
> >> On Fri, 19 Apr 2013 15:09:21 -0500 Jonathan Brassow
> >> <jbrassow@redhat.com> wrote:
> >> 
> >>> MD:  Do not increment resync_mismatches unless
> >>> MD_RECOVERY_REQUESTED
> >>> 
> >>> resync_mismatches is used to display the number of differences
> >>> that have been found or repaired during a scrubbing operation.
> >>> It is not meant to count anything during resync or repair
> >>> operations.  (How much sense does it make to find
> >>> resync_mismatches populated after an initial synchronization of
> >>> the array?  After cleaning-up an unclean shutdown?  After
> >>> [re]integrating a device into an existing array?) The
> >>> incrementing of the variable must be restricted to when the user 
> >>> initiates a scrubbing operation (i.e. "check" or "repair").
> >> 
> >> How do you know what it is "meant" to do? :-)
> > 
> > Yes, I suppose I did infer the meaning, but I don't think it's too
> > much of a stretch - especially given the commit message where
> > 'resync_mismatches' was introduced.
> 
> Which also matches the understanding other people have had for a long
> time ;-)
> 
> The information Neil doesn't want to throw away is a valid issue, but
> maybe the proper thing to do here is to have two counters:
> repaired_mistmatches and detected_mismatches, and then you can infer
> total_mismatches from that, and add mismatch_cnt_since (which makes more
> sense to me than last_sync_action if you're using it to delimit when you
> started the current mismatch counts) as a representation of when you
> started the current counts.  If it was since boot, or since disk add, or
> since check/repair, whatever, you put that in the since file (which I
> think answers your other question you had about a set of flags or
> text...I personally think text since the last operation that we might
> want to record could be assemble or something like that).  That way you
> aren't throwing anything away, but you also aren't confusing people and
> making them concerned since most people think mismatch_cnt is
> uncorrected errors found, having that increment for corrected issues
> found during assembly or something can certainly confuse people.
> 

I don't think 'mismatch_cnt_since' really makes sense.  Mismatches aren't
just found at random points in time.  They are a direct result of a
sync_action.
So "mismatch_cnt_found_during" or "mismatch_cnt_detected_by".

If they were found by "sync" or "check" or "repair" then it means something
different in each case.

The issues isn't "when" as a time or time-range they were found, but what was
happening at the time.

NeilBrown

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] MD:  Do not increment resync_mismatches unless MD_RECOVERY_REQUESTED
  2013-04-22 16:26   ` Brassow Jonathan
  2013-04-22 17:22     ` Doug Ledford
@ 2013-04-22 23:59     ` NeilBrown
  1 sibling, 0 replies; 6+ messages in thread
From: NeilBrown @ 2013-04-22 23:59 UTC (permalink / raw)
  To: Brassow Jonathan; +Cc: linux-raid@vger.kernel.org Raid

[-- Attachment #1: Type: text/plain, Size: 5160 bytes --]

On Mon, 22 Apr 2013 11:26:16 -0500 Brassow Jonathan <jbrassow@redhat.com>
wrote:

> 
> On Apr 21, 2013, at 7:36 PM, NeilBrown wrote:
> 
> > On Fri, 19 Apr 2013 15:09:21 -0500 Jonathan Brassow <jbrassow@redhat.com>
> > wrote:
> > 
> >> MD:  Do not increment resync_mismatches unless MD_RECOVERY_REQUESTED
> >> 
> >> resync_mismatches is used to display the number of differences that
> >> have been found or repaired during a scrubbing operation.  It is not
> >> meant to count anything during resync or repair operations.  (How
> >> much sense does it make to find resync_mismatches populated after an
> >> initial synchronization of the array?  After cleaning-up an unclean
> >> shutdown?  After [re]integrating a device into an existing array?)
> >> The incrementing of the variable must be restricted to when the user
> >> initiates a scrubbing operation (i.e. "check" or "repair").
> > 
> > How do you know what it is "meant" to do? :-)
> 
> Yes, I suppose I did infer the meaning, but I don't think it's too much of a stretch - especially given the commit message where 'resync_mismatches' was introduced.
> commit 9d88883e68f404d5581bd391713ceef470ea53a9
> Author: NeilBrown <neilb@suse.de>
> Date:   Tue Nov 8 21:39:26 2005 -0800
> 
>     [PATCH] md: teach raid5 the difference between 'check' and 'repair'.
>     
>     With this, raid5 can be asked to check parity without repairing it.  It also
>     keeps a count of the number of incorrect parity blocks found (mismatches) an
>     reports them through sysfs.
>     
>     Signed-off-by: Neil Brown <neilb@suse.de>
>     Cc: Greg KH <greg@kroah.com>
>     Signed-off-by: Andrew Morton <akpm@osdl.org>
>     Signed-off-by: Linus Torvalds <torvalds@osdl.org>
> Also, it seems that 'mismatches' is only ever talked about in reference to scrubbing operations.  (Although it does say in 'man 4 md', "This  is  set to zero when a scrub starts and is incremented whenever a sector is found that is a mismatch.", which could imply that 'mismatches' might be non-zero before a scrub operation starts.)  There is no indication anywhere that I can find what it would mean to have a non-zero 'mismatches' absent a scrub operation - perhaps we just assume "uninitialized".
> 
> However, if it has come to mean something more since then or provides any sort of useful information, then I can see your point.  I find the behavior to be problematic.  A user creates an array and sometimes the value is non-zero - that could be alarming, causing the user to think the array needs immediate "repair".
> 
> Anyway, I suppose I am simply bloviating if your mind is made up.  I can make due with the idea below of recording the "last_sync_action".  I'm thinking of recording the MD_RECOVERY_* flags that would define the type of sync_action - would you rather have a 'char *'?  If we kept the flags, I wonder if we could also use that to define acceptable sync_action transitions...  We don't, for example, particularly want "resync" -> "idle" -> "check" -> "idle" -> "resync" - especially if checkpoints are being made throughout.
> 
> > While it might not be particularly useful, I see no point in hiding
> > information, and no desire to change what mismatch_cnt reports.
> > 
> > It may well be useful to somehow report what the real meaning for
> > mismatch_cnt is.  i.e. to report what the last sync_action was.
> > Then any use-space tool that reported mismatch_cnt, could adjust according to
> > whether the last operation was check/repair or resync/recovery/reshape. etc.
> > 
> > So if you were to provide a patch which adds "last_sync_action" or similar,
> > I'd certainly consider that.
> 
>  brassow

"bloviating" : Talk at length, esp. in an inflated or empty way.

I learnt a new word today - thanks!

On reflection, not all RAID personalities count mismatches for resync.  RAID1
doesn't (so your change to raid1.c was unnecessary).
So assuming that it means anything after a 'sync' is already invalid.

You are right that people get confused by it, but they get confused that it
is non-zero after a "repair" mostly, and we cannot "fix" that.

I think the really useful thing here is to record the cause of the
mismatch_cnt - why we even counted mismatches.  I'd probably record it as a
string but I don't think it matter much either way: whatever makes the code
simple is good.

And if we do record the action which generated mismatch_cnt, then having the
mismatch_cnt file appear empty when the action was not check or repair is
probably valid after all.

I don't think there is any extra need to enforce sync_action transitions.
If you write "idle" while a sync is needed, it will interrupt the sync then
immediately restart it again.
You could possible write "frozen" and then "check".  That might do something
odd, and possibly should  be fixed.
However the choice of next action doesn't depend on what has been done
previously, but on the current state of the array.  So there is no need to
record extra state, but there might be a need to validate some transitions,
particularly involving 'frozen'.

NeilBrown

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2013-04-22 23:59 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-04-19 20:09 [PATCH] MD: Do not increment resync_mismatches unless MD_RECOVERY_REQUESTED Jonathan Brassow
2013-04-22  0:36 ` NeilBrown
2013-04-22 16:26   ` Brassow Jonathan
2013-04-22 17:22     ` Doug Ledford
2013-04-22 23:44       ` NeilBrown
2013-04-22 23:59     ` NeilBrown

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.