From mboxrd@z Thu Jan 1 00:00:00 1970 From: Nate Dailey Subject: Re: [PATCH] drivers/md/md.c: ignore recovery_offset if bitmap exists Date: Fri, 14 Aug 2015 10:58:15 -0400 Message-ID: <55CE0207.1020707@stratus.com> References: <1438111733-17778-1-git-send-email-nate.dailey@stratus.com> <55B93B9D.5000103@stratus.com> Mime-Version: 1.0 Content-Type: text/plain; charset="windows-1252"; format=flowed Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: <55B93B9D.5000103@stratus.com> Sender: linux-raid-owner@vger.kernel.org To: linux-raid@vger.kernel.org Cc: neilb@suse.de, Jes.Sorensen@redhat.com List-Id: linux-raid.ids I hate to nag... but looking for feedback on this change, which addresses what seems to me to be a serious bug. Thanks, Nate On 07/29/2015 04:46 PM, Joe Lawrence wrote: > On 07/28/2015 03:28 PM, Nate Dailey wrote: >> If a bitmap recovery is interrupted and later restarted, then >> sectors below the recovery offset, written between interruption >> and resumption, will not be copied. This results in corruption. >> >> See https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=777511 >> for a script that can be used to repro this. >> >> Seems like ignoring the recovery_offset if a bitmap exists is >> the way to go. >> >> Signed-off-by: Nate Dailey >> --- >> drivers/md/md.c | 24 +++++++++++++----------- >> 1 file changed, 13 insertions(+), 11 deletions(-) >> >> diff --git a/drivers/md/md.c b/drivers/md/md.c >> index 0c2a4e8..79c6285 100644 >> --- a/drivers/md/md.c >> +++ b/drivers/md/md.c >> @@ -7738,16 +7738,18 @@ void md_do_sync(struct md_thread *thread) >> else { >> /* recovery follows the physical size of devices */ >> max_sectors = mddev->dev_sectors; >> - j = MaxSector; >> - rcu_read_lock(); >> - rdev_for_each_rcu(rdev, mddev) >> - if (rdev->raid_disk >= 0 && >> - !test_bit(Faulty, &rdev->flags) && >> - !test_bit(In_sync, &rdev->flags) && >> - rdev->recovery_offset < j) >> - j = rdev->recovery_offset; >> - rcu_read_unlock(); >> - >> + /* we don't use the offset if there's a bitmap */ >> + if (!mddev->bitmap) { >> + j = MaxSector; >> + rcu_read_lock(); >> + rdev_for_each_rcu(rdev, mddev) >> + if (rdev->raid_disk >= 0 && >> + !test_bit(Faulty, &rdev->flags) && >> + !test_bit(In_sync, &rdev->flags) && >> + rdev->recovery_offset < j) >> + j = rdev->recovery_offset; >> + rcu_read_unlock(); >> + } >> /* If there is a bitmap, we need to make sure all >> * writes that started before we added a spare >> * complete before we start doing a recovery. >> @@ -7756,7 +7758,7 @@ void md_do_sync(struct md_thread *thread) >> * recovery has checked that bit and skipped that >> * region. >> */ >> - if (mddev->bitmap) { >> + else { >> mddev->pers->quiesce(mddev, 1); >> mddev->pers->quiesce(mddev, 0); >> } >> > [+cc Ben & Cyril from the Debian bug report] > > -- Joe > > -- > To unsubscribe from this list: send the line "unsubscribe linux-raid" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html