From: Neil Brown <neilb@suse.de>
To: Nate Dailey <nate.dailey@stratus.com>, linux-raid@vger.kernel.org
Cc: Jes.Sorensen@redhat.com
Subject: Re: [PATCH] drivers/md/md.c: ignore recovery_offset if bitmap exists
Date: Fri, 30 Oct 2015 13:51:02 +1100
Message-ID: <87r3kdvzl5.fsf@notabene.neil.brown.name>
In-Reply-To: <55CE0207.1020707@stratus.com>


On Sat, Aug 15 2015, Nate Dailey wrote:

> I hate to nag... but looking for feedback on this change, which addresses what 
> seems to me to be a serious bug.

Being a nag is good.  I don't have the earlier emails in my inbox - I
wonder what happened to them.... and for some reason this one was marked
"read".
But it arrived about when I converted over to notmuch and just before I
went on 3 weeks leave...

Anyway, Jes just poked me so I'm looking now.

>
> Thanks,
> Nate
>
>
>
>
> On 07/29/2015 04:46 PM, Joe Lawrence wrote:
>> On 07/28/2015 03:28 PM, Nate Dailey wrote:
>>> If a bitmap recovery is interrupted and later restarted, then
>>> sectors below the recovery offset, written between interruption
>>> and resumption, will not be copied. This results in corruption.
>>>
>>> See https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=777511
>>> for a script that can be used to repro this.
>>>
>>> Seems like ignoring the recovery_offset if a bitmap exists is
>>> the way to go.

This doesn't feel like the right solution.
Why does the presence of a bitmap affect the validity of
->recovery_offset?

Surely recovery_offset should always be reliable and we should always
use it.  Maybe it isn't being updated correctly in some situation when a
bitmap is present.

Does it ever make sense to honour ->recovery_offset when a device is
re-added?
I don't think it does....

Oh.  Look what I found.
commit 7eb418851f3278de67126ea0c427641ab4792c57
Author: NeilBrown <neilb@suse.de>
Date:   Tue Jan 14 15:55:14 2014 +1100

    md: allow a partially recovered device to be hot-added to an array.

...
-               rdev->recovery_offset = 0;
+               if (rdev->saved_raid_disk < 0)
+                       rdev->recovery_offset = 0;


We used to clear recovery_offset for a re-add, but we don't any more.
I guess this patch introduced the bug.
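
To make the effect of that hunk concrete, here is a minimal userspace
sketch, not kernel code.  struct model_rdev, readd_old(), readd_new()
and recovery_visits() are hypothetical names invented for illustration;
only the two field names and the condition come from the hunk above.
It assumes that recovery simply skips every sector below
recovery_offset.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t sector_t;

/* Hypothetical, pared-down stand-in for the kernel's rdev structure. */
struct model_rdev {
	int saved_raid_disk;      /* slot held before removal, or -1 */
	sector_t recovery_offset; /* sectors below this are assumed in sync */
};

/* Behaviour before the commit: a re-add always restarts from sector 0. */
static void readd_old(struct model_rdev *rdev)
{
	assert(rdev);
	rdev->recovery_offset = 0;
}

/*
 * Behaviour after the commit: the offset is cleared only when the
 * device had no previous slot, so a re-added device keeps whatever
 * offset it had when recovery was interrupted.
 */
static void readd_new(struct model_rdev *rdev)
{
	assert(rdev);
	if (rdev->saved_raid_disk < 0)
		rdev->recovery_offset = 0;
}

/* Assumption: recovery only looks at sectors at or above the offset. */
static bool recovery_visits(const struct model_rdev *rdev, sector_t sector)
{
	assert(rdev);
	return sector >= rdev->recovery_offset;
}
```

With saved_raid_disk = 1 and recovery_offset = 1000, a write to sector
100 made while the device was out of the array is revisited after
readd_old() but not after readd_new(), which matches the corruption
described in the report.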

I cannot find anything in my mail logs to suggest why I wrote that
patch.

Right now I cannot think of any real justification for that patch.
Could someone please test to see if reverting that patch fixes the
problem?

Sorry for the delay in getting to this.

Thanks.
NeilBrown



>>>
>>> Signed-off-by: Nate Dailey <nate.dailey@stratus.com>
>>> ---
>>>   drivers/md/md.c | 24 +++++++++++++-----------
>>>   1 file changed, 13 insertions(+), 11 deletions(-)
>>>
>>> diff --git a/drivers/md/md.c b/drivers/md/md.c
>>> index 0c2a4e8..79c6285 100644
>>> --- a/drivers/md/md.c
>>> +++ b/drivers/md/md.c
>>> @@ -7738,16 +7738,18 @@ void md_do_sync(struct md_thread *thread)
>>>   	else {
>>>   		/* recovery follows the physical size of devices */
>>>   		max_sectors = mddev->dev_sectors;
>>> -		j = MaxSector;
>>> -		rcu_read_lock();
>>> -		rdev_for_each_rcu(rdev, mddev)
>>> -			if (rdev->raid_disk >= 0 &&
>>> -			    !test_bit(Faulty, &rdev->flags) &&
>>> -			    !test_bit(In_sync, &rdev->flags) &&
>>> -			    rdev->recovery_offset < j)
>>> -				j = rdev->recovery_offset;
>>> -		rcu_read_unlock();
>>> -
>>> +		/* we don't use the offset if there's a bitmap */
>>> +		if (!mddev->bitmap) {
>>> +			j = MaxSector;
>>> +			rcu_read_lock();
>>> +			rdev_for_each_rcu(rdev, mddev)
>>> +				if (rdev->raid_disk >= 0 &&
>>> +				    !test_bit(Faulty, &rdev->flags) &&
>>> +				    !test_bit(In_sync, &rdev->flags) &&
>>> +				    rdev->recovery_offset < j)
>>> +					j = rdev->recovery_offset;
>>> +			rcu_read_unlock();
>>> +		}
>>>   		/* If there is a bitmap, we need to make sure all
>>>   		 * writes that started before we added a spare
>>>   		 * complete before we start doing a recovery.
>>> @@ -7756,7 +7758,7 @@ void md_do_sync(struct md_thread *thread)
>>>   		 * recovery has checked that bit and skipped that
>>>   		 * region.
>>>   		 */
>>> -		if (mddev->bitmap) {
>>> +		else {
>>>   			mddev->pers->quiesce(mddev, 1);
>>>   			mddev->pers->quiesce(mddev, 0);
>>>   		}
>>>
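
For readers following the quoted diff, the loop it moves chooses the
recovery start sector as the smallest recovery_offset among devices
that hold a slot, are not faulty and are not yet in sync.  The sketch
below restates that selection as plain userspace C.  struct sync_rdev,
MAX_SECTOR and recovery_start() are hypothetical names, the flag bits
are modelled as booleans, and RCU locking is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t sector_t;
#define MAX_SECTOR (~(sector_t)0) /* stand-in for the kernel's MaxSector */

/* Hypothetical model of the fields the loop inspects. */
struct sync_rdev {
	int raid_disk;            /* >= 0 when the device holds a slot */
	bool faulty;              /* models test_bit(Faulty, ...) */
	bool in_sync;             /* models test_bit(In_sync, ...) */
	sector_t recovery_offset;
};

/*
 * Return the lowest recovery_offset among devices that still need
 * recovery, or MAX_SECTOR when no device qualifies.
 */
static sector_t recovery_start(const struct sync_rdev *devs, size_t n)
{
	sector_t j = MAX_SECTOR;

	assert(devs != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		const struct sync_rdev *r = &devs[i];

		if (r->raid_disk >= 0 && !r->faulty && !r->in_sync &&
		    r->recovery_offset < j)
			j = r->recovery_offset;
	}
	return j;
}
```

Because the result is a minimum, a single re-added device with a stale
non-zero offset is enough to make recovery begin above sector 0 when it
is the only device being recovered.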
>> [+cc Ben & Cyril from the Debian bug report]
>>
>> -- Joe
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html

Thread overview: 6+ messages
2015-07-28 19:28 [PATCH] drivers/md/md.c: ignore recovery_offset if bitmap exists Nate Dailey
2015-07-29 20:46 ` Joe Lawrence
2015-08-14 14:58   ` Nate Dailey
2015-10-30  2:51     ` Neil Brown [this message]
2015-10-30 13:30       ` Nate Dailey
2015-10-31  0:26         ` Neil Brown
