* [PATCH] drivers/md/md.c: ignore recovery_offset if bitmap exists
@ 2015-07-28 19:28 Nate Dailey
  2015-07-29 20:46 ` Joe Lawrence
  0 siblings, 1 reply; 6+ messages in thread
From: Nate Dailey @ 2015-07-28 19:28 UTC (permalink / raw)
  To: linux-raid; +Cc: Nate Dailey

If a bitmap recovery is interrupted and later restarted, then
sectors below the recovery offset, written between interruption
and resumption, will not be copied. This results in corruption.

See https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=777511
for a script that can be used to repro this.

Seems like ignoring the recovery_offset if a bitmap exists is
the way to go.

Signed-off-by: Nate Dailey <nate.dailey@stratus.com>
---
 drivers/md/md.c | 24 +++++++++++++-----------
 1 file changed, 13 insertions(+), 11 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 0c2a4e8..79c6285 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -7738,16 +7738,18 @@ void md_do_sync(struct md_thread *thread)
 	else {
 		/* recovery follows the physical size of devices */
 		max_sectors = mddev->dev_sectors;
-		j = MaxSector;
-		rcu_read_lock();
-		rdev_for_each_rcu(rdev, mddev)
-			if (rdev->raid_disk >= 0 &&
-			    !test_bit(Faulty, &rdev->flags) &&
-			    !test_bit(In_sync, &rdev->flags) &&
-			    rdev->recovery_offset < j)
-				j = rdev->recovery_offset;
-		rcu_read_unlock();
-
+		/* we don't use the offset if there's a bitmap */
+		if (!mddev->bitmap) {
+			j = MaxSector;
+			rcu_read_lock();
+			rdev_for_each_rcu(rdev, mddev)
+				if (rdev->raid_disk >= 0 &&
+				    !test_bit(Faulty, &rdev->flags) &&
+				    !test_bit(In_sync, &rdev->flags) &&
+				    rdev->recovery_offset < j)
+					j = rdev->recovery_offset;
+			rcu_read_unlock();
+		}
 		/* If there is a bitmap, we need to make sure all
 		 * writes that started before we added a spare
 		 * complete before we start doing a recovery.
@@ -7756,7 +7758,7 @@ void md_do_sync(struct md_thread *thread)
 		 * recovery has checked that bit and skipped that
 		 * region.
 		 */
-		if (mddev->bitmap) {
+		else {
 			mddev->pers->quiesce(mddev, 1);
 			mddev->pers->quiesce(mddev, 0);
 		}
-- 
2.1.4
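
For readers following the diff, the loop being moved can be sketched as a standalone C model. This is only a minimal sketch, not kernel code: `struct toy_rdev`, `pick_resume_sector` and `pick_start` are hypothetical names, `MAX_SECTOR` stands in for the kernel's `MaxSector`, and falling back to sector 0 when a bitmap exists is an assumption about what "ignoring the offset" amounts to.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_SECTOR UINT64_MAX  /* stand-in for the kernel's MaxSector */

/* Hypothetical, heavily simplified stand-in for struct md_rdev. */
struct toy_rdev {
	int raid_disk;             /* slot in the array, < 0 if inactive */
	bool faulty;               /* models the Faulty flag */
	bool in_sync;              /* models the In_sync flag */
	uint64_t recovery_offset;  /* sectors below this are believed recovered */
};

/*
 * Models the loop in md_do_sync(): the resume point is the smallest
 * recovery_offset among active, non-faulty, not-yet-in-sync members.
 */
uint64_t pick_resume_sector(const struct toy_rdev *devs, size_t n)
{
	uint64_t j = MAX_SECTOR;

	assert(devs != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		const struct toy_rdev *r = &devs[i];

		if (r->raid_disk >= 0 && !r->faulty && !r->in_sync &&
		    r->recovery_offset < j)
			j = r->recovery_offset;
	}
	return j;
}

/*
 * Models the proposed behaviour: with a bitmap, the saved offset is not
 * consulted. Starting from sector 0 in that case is an assumption.
 */
uint64_t pick_start(const struct toy_rdev *devs, size_t n, bool has_bitmap)
{
	return has_bitmap ? 0 : pick_resume_sector(devs, n);
}
```

With one in-sync member and one recovering member at offset 4096, `pick_start(devs, n, false)` returns 4096 while `pick_start(devs, n, true)` returns 0, so a bitmap-driven pass would look at the whole device again.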



* Re: [PATCH] drivers/md/md.c: ignore recovery_offset if bitmap exists
  2015-07-28 19:28 [PATCH] drivers/md/md.c: ignore recovery_offset if bitmap exists Nate Dailey
@ 2015-07-29 20:46 ` Joe Lawrence
  2015-08-14 14:58   ` Nate Dailey
  0 siblings, 1 reply; 6+ messages in thread
From: Joe Lawrence @ 2015-07-29 20:46 UTC (permalink / raw)
  To: linux-raid; +Cc: Ben Hutchings, Cyril Vechera

On 07/28/2015 03:28 PM, Nate Dailey wrote:
> If a bitmap recovery is interrupted and later restarted, then
> sectors below the recovery offset, written between interruption
> and resumption, will not be copied. This results in corruption.
> 
> See https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=777511
> for a script that can be used to repro this.
> 
> Seems like ignoring the recovery_offset if a bitmap exists is
> the way to go.
> 
> Signed-off-by: Nate Dailey <nate.dailey@stratus.com>
> ---
>  drivers/md/md.c | 24 +++++++++++++-----------
>  1 file changed, 13 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index 0c2a4e8..79c6285 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -7738,16 +7738,18 @@ void md_do_sync(struct md_thread *thread)
>  	else {
>  		/* recovery follows the physical size of devices */
>  		max_sectors = mddev->dev_sectors;
> -		j = MaxSector;
> -		rcu_read_lock();
> -		rdev_for_each_rcu(rdev, mddev)
> -			if (rdev->raid_disk >= 0 &&
> -			    !test_bit(Faulty, &rdev->flags) &&
> -			    !test_bit(In_sync, &rdev->flags) &&
> -			    rdev->recovery_offset < j)
> -				j = rdev->recovery_offset;
> -		rcu_read_unlock();
> -
> +		/* we don't use the offset if there's a bitmap */
> +		if (!mddev->bitmap) {
> +			j = MaxSector;
> +			rcu_read_lock();
> +			rdev_for_each_rcu(rdev, mddev)
> +				if (rdev->raid_disk >= 0 &&
> +				    !test_bit(Faulty, &rdev->flags) &&
> +				    !test_bit(In_sync, &rdev->flags) &&
> +				    rdev->recovery_offset < j)
> +					j = rdev->recovery_offset;
> +			rcu_read_unlock();
> +		}
>  		/* If there is a bitmap, we need to make sure all
>  		 * writes that started before we added a spare
>  		 * complete before we start doing a recovery.
> @@ -7756,7 +7758,7 @@ void md_do_sync(struct md_thread *thread)
>  		 * recovery has checked that bit and skipped that
>  		 * region.
>  		 */
> -		if (mddev->bitmap) {
> +		else {
>  			mddev->pers->quiesce(mddev, 1);
>  			mddev->pers->quiesce(mddev, 0);
>  		}
> 

[+cc Ben & Cyril from the Debian bug report]

-- Joe



* Re: [PATCH] drivers/md/md.c: ignore recovery_offset if bitmap exists
  2015-07-29 20:46 ` Joe Lawrence
@ 2015-08-14 14:58   ` Nate Dailey
  2015-10-30  2:51     ` Neil Brown
  0 siblings, 1 reply; 6+ messages in thread
From: Nate Dailey @ 2015-08-14 14:58 UTC (permalink / raw)
  To: linux-raid; +Cc: neilb, Jes.Sorensen

I hate to nag... but looking for feedback on this change, which addresses what 
seems to me to be a serious bug.

Thanks,
Nate




On 07/29/2015 04:46 PM, Joe Lawrence wrote:
> On 07/28/2015 03:28 PM, Nate Dailey wrote:
>> If a bitmap recovery is interrupted and later restarted, then
>> sectors below the recovery offset, written between interruption
>> and resumption, will not be copied. This results in corruption.
>>
>> See https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=777511
>> for a script that can be used to repro this.
>>
>> Seems like ignoring the recovery_offset if a bitmap exists is
>> the way to go.
>>
>> Signed-off-by: Nate Dailey <nate.dailey@stratus.com>
>> ---
>>   drivers/md/md.c | 24 +++++++++++++-----------
>>   1 file changed, 13 insertions(+), 11 deletions(-)
>>
>> diff --git a/drivers/md/md.c b/drivers/md/md.c
>> index 0c2a4e8..79c6285 100644
>> --- a/drivers/md/md.c
>> +++ b/drivers/md/md.c
>> @@ -7738,16 +7738,18 @@ void md_do_sync(struct md_thread *thread)
>>   	else {
>>   		/* recovery follows the physical size of devices */
>>   		max_sectors = mddev->dev_sectors;
>> -		j = MaxSector;
>> -		rcu_read_lock();
>> -		rdev_for_each_rcu(rdev, mddev)
>> -			if (rdev->raid_disk >= 0 &&
>> -			    !test_bit(Faulty, &rdev->flags) &&
>> -			    !test_bit(In_sync, &rdev->flags) &&
>> -			    rdev->recovery_offset < j)
>> -				j = rdev->recovery_offset;
>> -		rcu_read_unlock();
>> -
>> +		/* we don't use the offset if there's a bitmap */
>> +		if (!mddev->bitmap) {
>> +			j = MaxSector;
>> +			rcu_read_lock();
>> +			rdev_for_each_rcu(rdev, mddev)
>> +				if (rdev->raid_disk >= 0 &&
>> +				    !test_bit(Faulty, &rdev->flags) &&
>> +				    !test_bit(In_sync, &rdev->flags) &&
>> +				    rdev->recovery_offset < j)
>> +					j = rdev->recovery_offset;
>> +			rcu_read_unlock();
>> +		}
>>   		/* If there is a bitmap, we need to make sure all
>>   		 * writes that started before we added a spare
>>   		 * complete before we start doing a recovery.
>> @@ -7756,7 +7758,7 @@ void md_do_sync(struct md_thread *thread)
>>   		 * recovery has checked that bit and skipped that
>>   		 * region.
>>   		 */
>> -		if (mddev->bitmap) {
>> +		else {
>>   			mddev->pers->quiesce(mddev, 1);
>>   			mddev->pers->quiesce(mddev, 0);
>>   		}
>>
> [+cc Ben & Cyril from the Debian bug report]
>
> -- Joe
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



* Re: [PATCH] drivers/md/md.c: ignore recovery_offset if bitmap exists
  2015-08-14 14:58   ` Nate Dailey
@ 2015-10-30  2:51     ` Neil Brown
  2015-10-30 13:30       ` Nate Dailey
  0 siblings, 1 reply; 6+ messages in thread
From: Neil Brown @ 2015-10-30  2:51 UTC (permalink / raw)
  To: Nate Dailey, linux-raid; +Cc: Jes.Sorensen

[-- Attachment #1: Type: text/plain, Size: 4241 bytes --]

On Sat, Aug 15 2015, Nate Dailey wrote:

> I hate to nag... but looking for feedback on this change, which addresses what 
> seems to me to be a serious bug.

Being a nag is good.  I don't have the earlier emails in my inbox - I
wonder what happened to them.... and for some reason this one was marked
"read".
But it arrived about when I converted over to notmuch and just before I
went on 3 weeks leave...

Anyway, Jes just poked me so I'm looking now.

>
> Thanks,
> Nate
>
>
>
>
> On 07/29/2015 04:46 PM, Joe Lawrence wrote:
>> On 07/28/2015 03:28 PM, Nate Dailey wrote:
>>> If a bitmap recovery is interrupted and later restarted, then
>>> sectors below the recovery offset, written between interruption
>>> and resumption, will not be copied. This results in corruption.
>>>
>>> See https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=777511
>>> for a script that can be used to repro this.
>>>
>>> Seems like ignoring the recovery_offset if a bitmap exists is
>>> the way to go.

This doesn't feel like the right solution.
Why does the presence of a bitmap affect the validity of
->recovery_offset?

Surely recovery_offset should always be reliable and we should always
use it.  Maybe it isn't being updated correctly in some situation when a
bitmap is present.

Does it ever make sense to honour the recovery-offset when a device is
re-added?
I don't think it does....

Oh.  Look what I found.
commit 7eb418851f3278de67126ea0c427641ab4792c57
Author: NeilBrown <neilb@suse.de>
Date:   Tue Jan 14 15:55:14 2014 +1100

    md: allow a partially recovered device to be hot-added to an array.

...
-               rdev->recovery_offset = 0;
+               if (rdev->saved_raid_disk < 0)
+                       rdev->recovery_offset = 0;


We used to clear recovery_offset for a re-add, but we don't any more.
I guess this patch introduced the bug.

I cannot find anything in my mail logs to suggest why I wrote that
patch.

Right now I cannot think of any real justification for that patch.
Could someone please test to see if reverting that patch fixes the
problem?
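
The failure mode discussed above can be reproduced in a toy model. The following is a minimal sketch under loud assumptions: `struct toy_array`, `degraded_write`, `recover`, `readd` and `run_scenario` are all hypothetical names, the "bitmap" is one flag per sector, and none of this is the md implementation. It only illustrates why resuming from a saved recovery_offset after a re-add can skip sectors written below that offset.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define NSECT 8

/* Toy two-way mirror: one healthy member, one member being recovered. */
struct toy_array {
	int good[NSECT];         /* contents of the healthy member */
	int stale[NSECT];        /* contents of the member being recovered */
	bool dirty[NSECT];       /* toy write-intent bitmap, one flag per sector */
	size_t recovery_offset;  /* how far the last recovery got */
};

/* A write that lands while the stale member is out of the array. */
void degraded_write(struct toy_array *a, size_t sector, int value)
{
	assert(sector < NSECT);
	a->good[sector] = value;
	a->dirty[sector] = true;
}

/* Bitmap-based recovery of [start, stop): copy only dirty sectors. */
void recover(struct toy_array *a, size_t start, size_t stop)
{
	assert(start <= stop && stop <= NSECT);
	for (size_t s = start; s < stop; s++) {
		if (a->dirty[s]) {
			a->stale[s] = a->good[s];
			a->dirty[s] = false;
		}
	}
	a->recovery_offset = stop;
}

/* Re-add the member; returns the sector recovery will start from. */
size_t readd(struct toy_array *a, bool reset_offset)
{
	if (reset_offset)
		a->recovery_offset = 0;
	return a->recovery_offset;
}

/* Returns true if both members end up identical. */
bool run_scenario(bool reset_offset)
{
	struct toy_array a = {0};

	degraded_write(&a, 6, 1);            /* member out, sector 6 written */
	recover(&a, readd(&a, true), 4);     /* recovery interrupted at sector 4 */
	degraded_write(&a, 2, 2);            /* member out again, write below offset */
	recover(&a, readd(&a, reset_offset), NSECT);
	return memcmp(a.good, a.stale, sizeof a.good) == 0;
}
```

In this model `run_scenario(false)` leaves sector 2 stale because the second recovery starts at sector 4, while `run_scenario(true)` restarts from sector 0 and the two members end up matching.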

Sorry for the delay in getting to this.

Thanks.
NeilBrown



>>>
>>> Signed-off-by: Nate Dailey <nate.dailey@stratus.com>
>>> ---
>>>   drivers/md/md.c | 24 +++++++++++++-----------
>>>   1 file changed, 13 insertions(+), 11 deletions(-)
>>>
>>> diff --git a/drivers/md/md.c b/drivers/md/md.c
>>> index 0c2a4e8..79c6285 100644
>>> --- a/drivers/md/md.c
>>> +++ b/drivers/md/md.c
>>> @@ -7738,16 +7738,18 @@ void md_do_sync(struct md_thread *thread)
>>>   	else {
>>>   		/* recovery follows the physical size of devices */
>>>   		max_sectors = mddev->dev_sectors;
>>> -		j = MaxSector;
>>> -		rcu_read_lock();
>>> -		rdev_for_each_rcu(rdev, mddev)
>>> -			if (rdev->raid_disk >= 0 &&
>>> -			    !test_bit(Faulty, &rdev->flags) &&
>>> -			    !test_bit(In_sync, &rdev->flags) &&
>>> -			    rdev->recovery_offset < j)
>>> -				j = rdev->recovery_offset;
>>> -		rcu_read_unlock();
>>> -
>>> +		/* we don't use the offset if there's a bitmap */
>>> +		if (!mddev->bitmap) {
>>> +			j = MaxSector;
>>> +			rcu_read_lock();
>>> +			rdev_for_each_rcu(rdev, mddev)
>>> +				if (rdev->raid_disk >= 0 &&
>>> +				    !test_bit(Faulty, &rdev->flags) &&
>>> +				    !test_bit(In_sync, &rdev->flags) &&
>>> +				    rdev->recovery_offset < j)
>>> +					j = rdev->recovery_offset;
>>> +			rcu_read_unlock();
>>> +		}
>>>   		/* If there is a bitmap, we need to make sure all
>>>   		 * writes that started before we added a spare
>>>   		 * complete before we start doing a recovery.
>>> @@ -7756,7 +7758,7 @@ void md_do_sync(struct md_thread *thread)
>>>   		 * recovery has checked that bit and skipped that
>>>   		 * region.
>>>   		 */
>>> -		if (mddev->bitmap) {
>>> +		else {
>>>   			mddev->pers->quiesce(mddev, 1);
>>>   			mddev->pers->quiesce(mddev, 0);
>>>   		}
>>>
>> [+cc Ben & Cyril from the Debian bug report]
>>
>> -- Joe
>>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 818 bytes --]


* Re: [PATCH] drivers/md/md.c: ignore recovery_offset if bitmap exists
  2015-10-30  2:51     ` Neil Brown
@ 2015-10-30 13:30       ` Nate Dailey
  2015-10-31  0:26         ` Neil Brown
  0 siblings, 1 reply; 6+ messages in thread
From: Nate Dailey @ 2015-10-30 13:30 UTC (permalink / raw)
  To: Neil Brown, linux-raid; +Cc: Jes.Sorensen

I first tested 4.3-rc6, which I already had lying around, and verified that the 
bug still happens.

Then I reverted 7eb418851f3278de67126ea0c427641ab4792c57, rebuilt & installed, 
and tested again. Reverting this patch did indeed fix the bug.

Thank you!

Nate



On 10/29/2015 10:51 PM, Neil Brown wrote:
> On Sat, Aug 15 2015, Nate Dailey wrote:
>
>> I hate to nag... but looking for feedback on this change, which addresses what
>> seems to me to be a serious bug.
> Being a nag is good.  I don't have the earlier emails in my inbox - I
> wonder what happened to them.... and for some reason this one was marked
> "read".
> But it arrived about when I converted over to notmuch and just before I
> went on 3 weeks leave...
>
> Anyway, Jes just poked me so I'm looking now.
>
>> Thanks,
>> Nate
>>
>>
>>
>>
>> On 07/29/2015 04:46 PM, Joe Lawrence wrote:
>>> On 07/28/2015 03:28 PM, Nate Dailey wrote:
>>>> If a bitmap recovery is interrupted and later restarted, then
>>>> sectors below the recovery offset, written between interruption
>>>> and resumption, will not be copied. This results in corruption.
>>>>
>>>> See https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=777511
>>>> for a script that can be used to repro this.
>>>>
>>>> Seems like ignoring the recovery_offset if a bitmap exists is
>>>> the way to go.
> This doesn't feel like the right solution.
> Why does the presence of a bitmap affect the validity of
> ->recovery_offset?
>
> Surely recovery_offset should always be reliable and we should always
> use it.  Maybe it isn't being updated correctly in some situation when a
> bitmap is present.
>
> Does it ever make sense to honour the recovery-offset when a device is
> re-added?
> I don't think it does....
>
> Oh.  Look what I found.
> commit 7eb418851f3278de67126ea0c427641ab4792c57
> Author: NeilBrown <neilb@suse.de>
> Date:   Tue Jan 14 15:55:14 2014 +1100
>
>      md: allow a partially recovered device to be hot-added to an array.
>
> ...
> -               rdev->recovery_offset = 0;
> +               if (rdev->saved_raid_disk < 0)
> +                       rdev->recovery_offset = 0;
>
>
> we used to clear recovery_offset for a re-add, but we don't any more.
> I guess this patch introduced the bug.
>
> I cannot find anything in my mail logs to suggest why I wrote that
> patch.
>
> Right now I cannot think of any real justification for that patch.
> Could someone please test to see if reverting that patch fixes the
> problem?
>
> sorry for the delay in getting to this.
>
> Thanks.
> NeilBrown
>
>
>
>>>> Signed-off-by: Nate Dailey <nate.dailey@stratus.com>
>>>> ---
>>>>    drivers/md/md.c | 24 +++++++++++++-----------
>>>>    1 file changed, 13 insertions(+), 11 deletions(-)
>>>>
>>>> diff --git a/drivers/md/md.c b/drivers/md/md.c
>>>> index 0c2a4e8..79c6285 100644
>>>> --- a/drivers/md/md.c
>>>> +++ b/drivers/md/md.c
>>>> @@ -7738,16 +7738,18 @@ void md_do_sync(struct md_thread *thread)
>>>>    	else {
>>>>    		/* recovery follows the physical size of devices */
>>>>    		max_sectors = mddev->dev_sectors;
>>>> -		j = MaxSector;
>>>> -		rcu_read_lock();
>>>> -		rdev_for_each_rcu(rdev, mddev)
>>>> -			if (rdev->raid_disk >= 0 &&
>>>> -			    !test_bit(Faulty, &rdev->flags) &&
>>>> -			    !test_bit(In_sync, &rdev->flags) &&
>>>> -			    rdev->recovery_offset < j)
>>>> -				j = rdev->recovery_offset;
>>>> -		rcu_read_unlock();
>>>> -
>>>> +		/* we don't use the offset if there's a bitmap */
>>>> +		if (!mddev->bitmap) {
>>>> +			j = MaxSector;
>>>> +			rcu_read_lock();
>>>> +			rdev_for_each_rcu(rdev, mddev)
>>>> +				if (rdev->raid_disk >= 0 &&
>>>> +				    !test_bit(Faulty, &rdev->flags) &&
>>>> +				    !test_bit(In_sync, &rdev->flags) &&
>>>> +				    rdev->recovery_offset < j)
>>>> +					j = rdev->recovery_offset;
>>>> +			rcu_read_unlock();
>>>> +		}
>>>>    		/* If there is a bitmap, we need to make sure all
>>>>    		 * writes that started before we added a spare
>>>>    		 * complete before we start doing a recovery.
>>>> @@ -7756,7 +7758,7 @@ void md_do_sync(struct md_thread *thread)
>>>>    		 * recovery has checked that bit and skipped that
>>>>    		 * region.
>>>>    		 */
>>>> -		if (mddev->bitmap) {
>>>> +		else {
>>>>    			mddev->pers->quiesce(mddev, 1);
>>>>    			mddev->pers->quiesce(mddev, 0);
>>>>    		}
>>>>
>>> [+cc Ben & Cyril from the Debian bug report]
>>>
>>> -- Joe
>>>



* Re: [PATCH] drivers/md/md.c: ignore recovery_offset if bitmap exists
  2015-10-30 13:30       ` Nate Dailey
@ 2015-10-31  0:26         ` Neil Brown
  0 siblings, 0 replies; 6+ messages in thread
From: Neil Brown @ 2015-10-31  0:26 UTC (permalink / raw)
  To: Nate Dailey, linux-raid; +Cc: Jes.Sorensen

[-- Attachment #1: Type: text/plain, Size: 2611 bytes --]

On Sat, Oct 31 2015, Nate Dailey wrote:

> I first tested 4.3-rc6, which I already had lying around, and verified that the 
> bug still happens.
>
> Then I reverted 7eb418851f3278de67126ea0c427641ab4792c57, rebuilt & installed, 
> and tested again. Reverting this patch did indeed fix the bug.
>
> Thank you!
>
> Nate
>

Thanks a lot for testing.
The following will go to Linus soon, hopefully in time for 4.3-final.
It should then be picked up by stable.

If anyone wants to try their hand at the "future patch" I mentioned, I
wouldn't object. MD_FEATURE_RECOVERY_BITMAP is an important part of the
picture.  I won't have an opportunity to work on it until December.

Thanks,
NeilBrown


commit d01552a76d71f9879af448e9142389ee9be6e95b
Author: NeilBrown <neilb@suse.com>
Date:   Sat Oct 31 11:00:56 2015 +1100

    Revert "md: allow a partially recovered device to be hot-added to an array."
    
    This reverts commit 7eb418851f3278de67126ea0c427641ab4792c57.
    
    This commit is poorly justified, I can find no discussion in email,
    and it clearly causes a problem.
    
    If a device which is being recovered fails and is subsequently
    re-added to an array, there could easily have been changes to the
    array *before* the point where the recovery was up to.  So the
    recovery must start again from the beginning.
    
    If a spare is being recovered and fails, then when it is re-added we
    really should do a bitmap-based recovery up to the recovery-offset,
    and then a full recovery from there.  Before this reversion, we only
    did the "full recovery from there" which is not correct.  After this
    reversion we will do a full recovery from the start, which is safer
    but not ideal.
    
    It will be left to a future patch to arrange the two different styles
    of recovery.
    
    Reported-and-tested-by: Nate Dailey <nate.dailey@stratus.com>
    Signed-off-by: NeilBrown <neilb@suse.com>
    Cc: stable@vger.kernel.org (3.14+)
    Fixes: 7eb418851f32 ("md: allow a partially recovered device to be hot-added to an array.")

diff --git a/drivers/md/md.c b/drivers/md/md.c
index c702de18207a..3fe3d04a968a 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -8040,8 +8040,7 @@ static int remove_and_add_spares(struct mddev *mddev,
 		       !test_bit(Bitmap_sync, &rdev->flags)))
 			continue;
 
-		if (rdev->saved_raid_disk < 0)
-			rdev->recovery_offset = 0;
+		rdev->recovery_offset = 0;
 		if (mddev->pers->
 		    hot_add_disk(mddev, rdev) == 0) {
 			if (sysfs_link_rdev(mddev, rdev))

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 818 bytes --]
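
The commit message above describes an ideal two-style recovery that is left to a future patch: bitmap-based up to the recovery offset, then a full copy from there. As a rough illustration only, that idea might look like the sketch below. `two_phase_recover` and its parameters are hypothetical, the bitmap is modelled as one flag per sector, and nothing here reflects how md would actually implement it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Toy two-phase recovery over n sectors:
 *   - below recovery_offset, copy only sectors flagged dirty
 *     (they were recovered once, then possibly overwritten);
 *   - at or above recovery_offset, copy every sector
 *     (they were never recovered at all).
 * Returns the number of sectors copied.
 */
size_t two_phase_recover(const int *good, int *target, bool *dirty,
			 size_t n, size_t recovery_offset)
{
	size_t copied = 0;

	assert(good != NULL && target != NULL && dirty != NULL);
	assert(recovery_offset <= n);

	for (size_t s = 0; s < n; s++) {
		if (s >= recovery_offset || dirty[s]) {
			target[s] = good[s];
			copied++;
		}
		dirty[s] = false;  /* sector is now known to be in sync */
	}
	return copied;
}
```

For an 8-sector device with a recovery offset of 4 and a single dirty sector below it, this copies 5 sectors instead of the 8 that a restart from sector 0 would copy, which is the saving the commit message calls "ideal".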


