* [PATCH] md/bitmap: don't read page from device with Bitmap_sync @ 2017-06-16 9:19 Guoqing Jiang 2017-06-16 17:36 ` Shaohua Li 0 siblings, 1 reply; 8+ messages in thread From: Guoqing Jiang @ 2017-06-16 9:19 UTC (permalink / raw) To: shli, neilb; +Cc: linux-raid, Guoqing Jiang The device owns Bitmap_sync flag needs recovery to become in sync, and read page from this type device could get stale status. Signed-off-by: Guoqing Jiang <gqjiang@suse.com> --- When develop for clustered raid10 feature, if add a disk under grow mode in master node, I could get the "bitmap superblock UUID mismatch" warning due to the page is read from Bitmap_sync device. drivers/md/bitmap.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/md/bitmap.c b/drivers/md/bitmap.c index bf7419a..bf34cd8 100644 --- a/drivers/md/bitmap.c +++ b/drivers/md/bitmap.c @@ -156,7 +156,8 @@ static int read_sb_page(struct mddev *mddev, loff_t offset, rdev_for_each(rdev, mddev) { if (! test_bit(In_sync, &rdev->flags) - || test_bit(Faulty, &rdev->flags)) + || test_bit(Faulty, &rdev->flags) + || test_bit(Bitmap_sync, &rdev->flags)) continue; target = offset + index * (PAGE_SIZE/512); -- 2.10.0 ^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH] md/bitmap: don't read page from device with Bitmap_sync 2017-06-16 9:19 [PATCH] md/bitmap: don't read page from device with Bitmap_sync Guoqing Jiang @ 2017-06-16 17:36 ` Shaohua Li 2017-06-18 23:11 ` NeilBrown 0 siblings, 1 reply; 8+ messages in thread From: Shaohua Li @ 2017-06-16 17:36 UTC (permalink / raw) To: Guoqing Jiang; +Cc: neilb, linux-raid On Fri, Jun 16, 2017 at 05:19:27PM +0800, Guoqing Jiang wrote: > The device owns Bitmap_sync flag needs recovery > to become in sync, and read page from this type > device could get stale status. > > Signed-off-by: Guoqing Jiang <gqjiang@suse.com> > --- > When develop for clustered raid10 feature, if add a > disk under grow mode in master node, I could get > the "bitmap superblock UUID mismatch" warning due > to the page is read from Bitmap_sync device. > > drivers/md/bitmap.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/drivers/md/bitmap.c b/drivers/md/bitmap.c > index bf7419a..bf34cd8 100644 > --- a/drivers/md/bitmap.c > +++ b/drivers/md/bitmap.c > @@ -156,7 +156,8 @@ static int read_sb_page(struct mddev *mddev, loff_t offset, > > rdev_for_each(rdev, mddev) { > if (! test_bit(In_sync, &rdev->flags) Why In_sync is set for the rdev? I think it shoudn't. > - || test_bit(Faulty, &rdev->flags)) > + || test_bit(Faulty, &rdev->flags) > + || test_bit(Bitmap_sync, &rdev->flags)) I didn't see code clears the Bitmap_sync bit after disk is synced, so likely there is bug in this side. > continue; > > target = offset + index * (PAGE_SIZE/512); > -- > 2.10.0 > ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] md/bitmap: don't read page from device with Bitmap_sync 2017-06-16 17:36 ` Shaohua Li @ 2017-06-18 23:11 ` NeilBrown 2017-06-20 0:41 ` Shaohua Li 0 siblings, 1 reply; 8+ messages in thread From: NeilBrown @ 2017-06-18 23:11 UTC (permalink / raw) To: Shaohua Li, Guoqing Jiang; +Cc: linux-raid [-- Attachment #1: Type: text/plain, Size: 3056 bytes --] On Fri, Jun 16 2017, Shaohua Li wrote: > On Fri, Jun 16, 2017 at 05:19:27PM +0800, Guoqing Jiang wrote: >> The device owns Bitmap_sync flag needs recovery >> to become in sync, and read page from this type >> device could get stale status. >> >> Signed-off-by: Guoqing Jiang <gqjiang@suse.com> >> --- >> When develop for clustered raid10 feature, if add a >> disk under grow mode in master node, I could get >> the "bitmap superblock UUID mismatch" warning due >> to the page is read from Bitmap_sync device. >> >> drivers/md/bitmap.c | 3 ++- >> 1 file changed, 2 insertions(+), 1 deletion(-) >> >> diff --git a/drivers/md/bitmap.c b/drivers/md/bitmap.c >> index bf7419a..bf34cd8 100644 >> --- a/drivers/md/bitmap.c >> +++ b/drivers/md/bitmap.c >> @@ -156,7 +156,8 @@ static int read_sb_page(struct mddev *mddev, loff_t offset, >> >> rdev_for_each(rdev, mddev) { >> if (! test_bit(In_sync, &rdev->flags) > > Why In_sync is set for the rdev? I think it shoudn't. There are several states a device can be in. If ->raid_disk is < 0, then the device is a spare, and doesn't contain and valid data. Otherwise ->raid_disk >=0 and: In_sync is clear, which means that blocks before ->recovery_offset contain valid data, and blocks after there don't In_sync is set, which means ->recovery_offset is irrelevant and should be treated as MaxSector Bitmap_sync is set, which could apply in either of the above cases, and means blocks corresponding to bits that are set in the bitmap may not be up to date. Bitmap_sync is only relevant before a device has been handed to the personality. After it has been added, ->recovery_cp ensures that the blocks that might be wrong are not read until until the bitmap-based recovery has fixed them up. Bitmap_sync is only use to stop the device from being given to the personality if a recovery won't be started because the array is read-only. So it is perfectly valid for both In_sync and Bitmap_sync to be set. If they are, it makes sense to avoid reading bitmap information from the Bitmap_sync device, as that will be out-of-date. I'm not quite sure why Guoqing is getting a UUID mismatch ... it suggests that the new device wasn't initialized properly. So there might be another bug. But I think this is definitely a bug. > >> - || test_bit(Faulty, &rdev->flags)) >> + || test_bit(Faulty, &rdev->flags) >> + || test_bit(Bitmap_sync, &rdev->flags)) > > I didn't see code clears the Bitmap_sync bit after disk is synced, so likely > there is bug in this side. There is no need to clear Bitmap_sync. It stays set until it becomes irrelevant. Thanks, NeilBrown > >> continue; >> >> target = offset + index * (PAGE_SIZE/512); >> -- >> 2.10.0 >> > -- > To unsubscribe from this list: send the line "unsubscribe linux-raid" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 832 bytes --] ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] md/bitmap: don't read page from device with Bitmap_sync 2017-06-18 23:11 ` NeilBrown @ 2017-06-20 0:41 ` Shaohua Li 2017-06-20 3:59 ` NeilBrown 0 siblings, 1 reply; 8+ messages in thread From: Shaohua Li @ 2017-06-20 0:41 UTC (permalink / raw) To: NeilBrown; +Cc: Guoqing Jiang, linux-raid On Mon, Jun 19, 2017 at 09:11:23AM +1000, Neil Brown wrote: > On Fri, Jun 16 2017, Shaohua Li wrote: > > > On Fri, Jun 16, 2017 at 05:19:27PM +0800, Guoqing Jiang wrote: > >> The device owns Bitmap_sync flag needs recovery > >> to become in sync, and read page from this type > >> device could get stale status. > >> > >> Signed-off-by: Guoqing Jiang <gqjiang@suse.com> > >> --- > >> When develop for clustered raid10 feature, if add a > >> disk under grow mode in master node, I could get > >> the "bitmap superblock UUID mismatch" warning due > >> to the page is read from Bitmap_sync device. > >> > >> drivers/md/bitmap.c | 3 ++- > >> 1 file changed, 2 insertions(+), 1 deletion(-) > >> > >> diff --git a/drivers/md/bitmap.c b/drivers/md/bitmap.c > >> index bf7419a..bf34cd8 100644 > >> --- a/drivers/md/bitmap.c > >> +++ b/drivers/md/bitmap.c > >> @@ -156,7 +156,8 @@ static int read_sb_page(struct mddev *mddev, loff_t offset, > >> > >> rdev_for_each(rdev, mddev) { > >> if (! test_bit(In_sync, &rdev->flags) > > > > Why In_sync is set for the rdev? I think it shoudn't. > > There are several states a device can be in. > If ->raid_disk is < 0, then the device is a spare, and doesn't contain > and valid data. > Otherwise ->raid_disk >=0 and: > In_sync is clear, which means that blocks before ->recovery_offset > contain valid data, and blocks after there don't > In_sync is set, which means ->recovery_offset is irrelevant and > should be treated as MaxSector > Bitmap_sync is set, which could apply in either of the above cases, > and means blocks corresponding to bits that are set in the bitmap > may not be up to date. > > Bitmap_sync is only relevant before a device has been handed to the > personality. After it has been added, ->recovery_cp ensures that the > blocks that might be wrong are not read until until the bitmap-based > recovery has fixed them up. > > Bitmap_sync is only use to stop the device from being given to the > personality if a recovery won't be started because the array is > read-only. > > So it is perfectly valid for both In_sync and Bitmap_sync to be set. > If they are, it makes sense to avoid reading bitmap information from the > Bitmap_sync device, as that will be out-of-date. > > I'm not quite sure why Guoqing is getting a UUID mismatch ... it > suggests that the new device wasn't initialized properly. So there > might be another bug. But I think this is definitely a bug. Can you describe a scenario a disk has both bits set? I had hard time to imagine it. md_update_sb will update super and bitmap super at almost the same time. It's possible we update super but not bitmap super and there is power loss. Is this the case we have the both bits set? There seems no mechanism to prevent the first disk has older bitmap super than the second disk in this scenario. > > > >> - || test_bit(Faulty, &rdev->flags)) > >> + || test_bit(Faulty, &rdev->flags) > >> + || test_bit(Bitmap_sync, &rdev->flags)) > > > > I didn't see code clears the Bitmap_sync bit after disk is synced, so likely > > there is bug in this side. > > There is no need to clear Bitmap_sync. It stays set until it becomes > irrelevant. Hmm, if we use in this way, we should add comments there to declare this bit can only be used very early. Thanks, Shaohua ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] md/bitmap: don't read page from device with Bitmap_sync 2017-06-20 0:41 ` Shaohua Li @ 2017-06-20 3:59 ` NeilBrown 2017-06-21 9:05 ` Guoqing Jiang 2017-06-21 17:47 ` Shaohua Li 0 siblings, 2 replies; 8+ messages in thread From: NeilBrown @ 2017-06-20 3:59 UTC (permalink / raw) To: Shaohua Li; +Cc: Guoqing Jiang, linux-raid [-- Attachment #1: Type: text/plain, Size: 4794 bytes --] On Mon, Jun 19 2017, Shaohua Li wrote: > On Mon, Jun 19, 2017 at 09:11:23AM +1000, Neil Brown wrote: >> On Fri, Jun 16 2017, Shaohua Li wrote: >> >> > On Fri, Jun 16, 2017 at 05:19:27PM +0800, Guoqing Jiang wrote: >> >> The device owns Bitmap_sync flag needs recovery >> >> to become in sync, and read page from this type >> >> device could get stale status. >> >> >> >> Signed-off-by: Guoqing Jiang <gqjiang@suse.com> >> >> --- >> >> When develop for clustered raid10 feature, if add a >> >> disk under grow mode in master node, I could get >> >> the "bitmap superblock UUID mismatch" warning due >> >> to the page is read from Bitmap_sync device. >> >> >> >> drivers/md/bitmap.c | 3 ++- >> >> 1 file changed, 2 insertions(+), 1 deletion(-) >> >> >> >> diff --git a/drivers/md/bitmap.c b/drivers/md/bitmap.c >> >> index bf7419a..bf34cd8 100644 >> >> --- a/drivers/md/bitmap.c >> >> +++ b/drivers/md/bitmap.c >> >> @@ -156,7 +156,8 @@ static int read_sb_page(struct mddev *mddev, loff_t offset, >> >> >> >> rdev_for_each(rdev, mddev) { >> >> if (! test_bit(In_sync, &rdev->flags) >> > >> > Why In_sync is set for the rdev? I think it shoudn't. >> >> There are several states a device can be in. >> If ->raid_disk is < 0, then the device is a spare, and doesn't contain >> and valid data. >> Otherwise ->raid_disk >=0 and: >> In_sync is clear, which means that blocks before ->recovery_offset >> contain valid data, and blocks after there don't >> In_sync is set, which means ->recovery_offset is irrelevant and >> should be treated as MaxSector >> Bitmap_sync is set, which could apply in either of the above cases, >> and means blocks corresponding to bits that are set in the bitmap >> may not be up to date. >> >> Bitmap_sync is only relevant before a device has been handed to the >> personality. After it has been added, ->recovery_cp ensures that the >> blocks that might be wrong are not read until until the bitmap-based >> recovery has fixed them up. >> >> Bitmap_sync is only use to stop the device from being given to the >> personality if a recovery won't be started because the array is >> read-only. >> >> So it is perfectly valid for both In_sync and Bitmap_sync to be set. >> If they are, it makes sense to avoid reading bitmap information from the >> Bitmap_sync device, as that will be out-of-date. >> >> I'm not quite sure why Guoqing is getting a UUID mismatch ... it >> suggests that the new device wasn't initialized properly. So there >> might be another bug. But I think this is definitely a bug. > > Can you describe a scenario a disk has both bits set? I had hard time to > imagine it. An array with a bitmap has a device fail (cable break?). So the bitmap stops clearing bits and ->events_cleared stays where it was, with the same value as events in the superblock of the failed device. The device then starts working (cable is replaced) and it device is added back to the array. super_1_validate notices that the array has a bitmap and that the event count in the new device is >= mddev->bitmap->events_cleared, but < mddev->events. i.e. it is in the range for which a bitmap-based recovery is both possible and needed. So it sets Bitmap_sync. Then it notices that the 'role' in a valid role for the array, so that role is stored in rdev->saved_raid_disk and, as there is no MD_FEATURE_RECOVERY_OFFSET, In_sync is set. A similar sequence can happen in super_90_validate. > > md_update_sb will update super and bitmap super at almost the same time. It's > possible we update super but not bitmap super and there is power loss. Is this > the case we have the both bits set? There seems no mechanism to prevent the > first disk has older bitmap super than the second disk in this scenario. This isn't really about the superblock and bitmap on the one device being in sync or not. It is the superblock on one (failed) device having a different event count than the bitmap (on the other devices) that is actively being used by the array. > >> > >> >> - || test_bit(Faulty, &rdev->flags)) >> >> + || test_bit(Faulty, &rdev->flags) >> >> + || test_bit(Bitmap_sync, &rdev->flags)) >> > >> > I didn't see code clears the Bitmap_sync bit after disk is synced, so likely >> > there is bug in this side. >> >> There is no need to clear Bitmap_sync. It stays set until it becomes >> irrelevant. > > Hmm, if we use in this way, we should add comments there to declare this bit > can only be used very early. Certainly would be appropriate to document that it is only meaningful before the device has been passed to ->hot_add_disk(). Thanks, NeilBrown > > Thanks, > Shaohua [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 832 bytes --] ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] md/bitmap: don't read page from device with Bitmap_sync 2017-06-20 3:59 ` NeilBrown @ 2017-06-21 9:05 ` Guoqing Jiang 2017-06-21 17:47 ` Shaohua Li 1 sibling, 0 replies; 8+ messages in thread From: Guoqing Jiang @ 2017-06-21 9:05 UTC (permalink / raw) To: NeilBrown, Shaohua Li; +Cc: linux-raid On 06/20/2017 11:59 AM, NeilBrown wrote: > On Mon, Jun 19 2017, Shaohua Li wrote: > >> On Mon, Jun 19, 2017 at 09:11:23AM +1000, Neil Brown wrote: >>> On Fri, Jun 16 2017, Shaohua Li wrote: >>> >>>> On Fri, Jun 16, 2017 at 05:19:27PM +0800, Guoqing Jiang wrote: >>>>> The device owns Bitmap_sync flag needs recovery >>>>> to become in sync, and read page from this type >>>>> device could get stale status. >>>>> >>>>> Signed-off-by: Guoqing Jiang <gqjiang@suse.com> >>>>> --- >>>>> When develop for clustered raid10 feature, if add a >>>>> disk under grow mode in master node, I could get >>>>> the "bitmap superblock UUID mismatch" warning due >>>>> to the page is read from Bitmap_sync device. >>>>> >>>>> drivers/md/bitmap.c | 3 ++- >>>>> 1 file changed, 2 insertions(+), 1 deletion(-) >>>>> >>>>> diff --git a/drivers/md/bitmap.c b/drivers/md/bitmap.c >>>>> index bf7419a..bf34cd8 100644 >>>>> --- a/drivers/md/bitmap.c >>>>> +++ b/drivers/md/bitmap.c >>>>> @@ -156,7 +156,8 @@ static int read_sb_page(struct mddev *mddev, loff_t offset, >>>>> >>>>> rdev_for_each(rdev, mddev) { >>>>> if (! test_bit(In_sync, &rdev->flags) >>>> Why In_sync is set for the rdev? I think it shoudn't. >>> There are several states a device can be in. >>> If ->raid_disk is < 0, then the device is a spare, and doesn't contain >>> and valid data. >>> Otherwise ->raid_disk >=0 and: >>> In_sync is clear, which means that blocks before ->recovery_offset >>> contain valid data, and blocks after there don't >>> In_sync is set, which means ->recovery_offset is irrelevant and >>> should be treated as MaxSector >>> Bitmap_sync is set, which could apply in either of the above cases, >>> and means blocks corresponding to bits that are set in the bitmap >>> may not be up to date. >>> >>> Bitmap_sync is only relevant before a device has been handed to the >>> personality. After it has been added, ->recovery_cp ensures that the >>> blocks that might be wrong are not read until until the bitmap-based >>> recovery has fixed them up. >>> >>> Bitmap_sync is only use to stop the device from being given to the >>> personality if a recovery won't be started because the array is >>> read-only. >>> >>> So it is perfectly valid for both In_sync and Bitmap_sync to be set. >>> If they are, it makes sense to avoid reading bitmap information from the >>> Bitmap_sync device, as that will be out-of-date. >>> >>> I'm not quite sure why Guoqing is getting a UUID mismatch ... it >>> suggests that the new device wasn't initialized properly. So there >>> might be another bug. But I think this is definitely a bug. >> Can you describe a scenario a disk has both bits set? I had hard time to >> imagine it. > An array with a bitmap has a device fail (cable break?). > So the bitmap stops clearing bits and ->events_cleared stays > where it was, with the same value as events in the superblock > of the failed device. > > The device then starts working (cable is replaced) and > it device is added back to the array. > super_1_validate notices that the array has a bitmap and that the event > count in the new device is >= mddev->bitmap->events_cleared, but < > mddev->events. > i.e. it is in the range for which a bitmap-based recovery is both > possible and needed. > So it sets Bitmap_sync. > Then it notices that the 'role' in a valid role for the array, > so that role is stored in rdev->saved_raid_disk and, as there is no > MD_FEATURE_RECOVERY_OFFSET, In_sync is set. > A similar sequence can happen in super_90_validate. > >> md_update_sb will update super and bitmap super at almost the same time. It's >> possible we update super but not bitmap super and there is power loss. Is this >> the case we have the both bits set? There seems no mechanism to prevent the >> first disk has older bitmap super than the second disk in this scenario. > This isn't really about the superblock and bitmap on the one device > being in sync or not. > It is the superblock on one (failed) device having a different event > count than the bitmap (on the other devices) that is actively being used > by the array. > >> >>>>> - || test_bit(Faulty, &rdev->flags)) >>>>> + || test_bit(Faulty, &rdev->flags) >>>>> + || test_bit(Bitmap_sync, &rdev->flags)) >>>> I didn't see code clears the Bitmap_sync bit after disk is synced, so likely >>>> there is bug in this side. >>> There is no need to clear Bitmap_sync. It stays set until it becomes >>> irrelevant. >> Hmm, if we use in this way, we should add comments there to declare this bit >> can only be used very early. > Certainly would be appropriate to document that it is only meaningful > before the device has been passed to ->hot_add_disk(). Ok, I will send v2 version with the comment included. Thanks, Guoqing ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] md/bitmap: don't read page from device with Bitmap_sync 2017-06-20 3:59 ` NeilBrown 2017-06-21 9:05 ` Guoqing Jiang @ 2017-06-21 17:47 ` Shaohua Li 2017-06-22 6:45 ` Guoqing Jiang 1 sibling, 1 reply; 8+ messages in thread From: Shaohua Li @ 2017-06-21 17:47 UTC (permalink / raw) To: NeilBrown; +Cc: Guoqing Jiang, linux-raid On Tue, Jun 20, 2017 at 01:59:45PM +1000, Neil Brown wrote: > On Mon, Jun 19 2017, Shaohua Li wrote: > > > On Mon, Jun 19, 2017 at 09:11:23AM +1000, Neil Brown wrote: > >> On Fri, Jun 16 2017, Shaohua Li wrote: > >> > >> > On Fri, Jun 16, 2017 at 05:19:27PM +0800, Guoqing Jiang wrote: > >> >> The device owns Bitmap_sync flag needs recovery > >> >> to become in sync, and read page from this type > >> >> device could get stale status. > >> >> > >> >> Signed-off-by: Guoqing Jiang <gqjiang@suse.com> > >> >> --- > >> >> When develop for clustered raid10 feature, if add a > >> >> disk under grow mode in master node, I could get > >> >> the "bitmap superblock UUID mismatch" warning due > >> >> to the page is read from Bitmap_sync device. > >> >> > >> >> drivers/md/bitmap.c | 3 ++- > >> >> 1 file changed, 2 insertions(+), 1 deletion(-) > >> >> > >> >> diff --git a/drivers/md/bitmap.c b/drivers/md/bitmap.c > >> >> index bf7419a..bf34cd8 100644 > >> >> --- a/drivers/md/bitmap.c > >> >> +++ b/drivers/md/bitmap.c > >> >> @@ -156,7 +156,8 @@ static int read_sb_page(struct mddev *mddev, loff_t offset, > >> >> > >> >> rdev_for_each(rdev, mddev) { > >> >> if (! test_bit(In_sync, &rdev->flags) > >> > > >> > Why In_sync is set for the rdev? I think it shoudn't. > >> > >> There are several states a device can be in. > >> If ->raid_disk is < 0, then the device is a spare, and doesn't contain > >> and valid data. > >> Otherwise ->raid_disk >=0 and: > >> In_sync is clear, which means that blocks before ->recovery_offset > >> contain valid data, and blocks after there don't > >> In_sync is set, which means ->recovery_offset is irrelevant and > >> should be treated as MaxSector > >> Bitmap_sync is set, which could apply in either of the above cases, > >> and means blocks corresponding to bits that are set in the bitmap > >> may not be up to date. > >> > >> Bitmap_sync is only relevant before a device has been handed to the > >> personality. After it has been added, ->recovery_cp ensures that the > >> blocks that might be wrong are not read until until the bitmap-based > >> recovery has fixed them up. > >> > >> Bitmap_sync is only use to stop the device from being given to the > >> personality if a recovery won't be started because the array is > >> read-only. > >> > >> So it is perfectly valid for both In_sync and Bitmap_sync to be set. > >> If they are, it makes sense to avoid reading bitmap information from the > >> Bitmap_sync device, as that will be out-of-date. > >> > >> I'm not quite sure why Guoqing is getting a UUID mismatch ... it > >> suggests that the new device wasn't initialized properly. So there > >> might be another bug. But I think this is definitely a bug. > > > > Can you describe a scenario a disk has both bits set? I had hard time to > > imagine it. > > An array with a bitmap has a device fail (cable break?). > So the bitmap stops clearing bits and ->events_cleared stays > where it was, with the same value as events in the superblock > of the failed device. > > The device then starts working (cable is replaced) and > it device is added back to the array. > super_1_validate notices that the array has a bitmap and that the event > count in the new device is >= mddev->bitmap->events_cleared, but < > mddev->events. > i.e. it is in the range for which a bitmap-based recovery is both > possible and needed. > So it sets Bitmap_sync. > Then it notices that the 'role' in a valid role for the array, > so that role is stored in rdev->saved_raid_disk and, as there is no > MD_FEATURE_RECOVERY_OFFSET, In_sync is set. > A similar sequence can happen in super_90_validate. This part is quite confusing actually. The failed device is kicked out, so the array is degraded. When we readd the device, we shouldn't set In_sync. Actually we set In_sync in super_1_validate, we then clear it in add_bound_rdev and reset it till recovery finishes. Set it till recovery finishes makes sense to me, the super_1_validate is confusing. > > > > md_update_sb will update super and bitmap super at almost the same time. It's > > possible we update super but not bitmap super and there is power loss. Is this > > the case we have the both bits set? There seems no mechanism to prevent the > > first disk has older bitmap super than the second disk in this scenario. > > This isn't really about the superblock and bitmap on the one device > being in sync or not. > It is the superblock on one (failed) device having a different event > count than the bitmap (on the other devices) that is actively being used > by the array. My point is we could use a disk's superblock, but its bitmap superblock is stale (even is old) Anyway, I still didn't understand when the problem could happen. Maybe Guoqing can clarify. Thanks, Shaohua ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] md/bitmap: don't read page from device with Bitmap_sync 2017-06-21 17:47 ` Shaohua Li @ 2017-06-22 6:45 ` Guoqing Jiang 0 siblings, 0 replies; 8+ messages in thread From: Guoqing Jiang @ 2017-06-22 6:45 UTC (permalink / raw) To: Shaohua Li, NeilBrown; +Cc: linux-raid On 06/22/2017 01:47 AM, Shaohua Li wrote: > On Tue, Jun 20, 2017 at 01:59:45PM +1000, Neil Brown wrote: >> On Mon, Jun 19 2017, Shaohua Li wrote: >> >>> On Mon, Jun 19, 2017 at 09:11:23AM +1000, Neil Brown wrote: >>>> On Fri, Jun 16 2017, Shaohua Li wrote: >>>> >>>>> On Fri, Jun 16, 2017 at 05:19:27PM +0800, Guoqing Jiang wrote: >>>>>> The device owns Bitmap_sync flag needs recovery >>>>>> to become in sync, and read page from this type >>>>>> device could get stale status. >>>>>> >>>>>> Signed-off-by: Guoqing Jiang <gqjiang@suse.com> >>>>>> --- >>>>>> When develop for clustered raid10 feature, if add a >>>>>> disk under grow mode in master node, I could get >>>>>> the "bitmap superblock UUID mismatch" warning due >>>>>> to the page is read from Bitmap_sync device. >>>>>> >>>>>> drivers/md/bitmap.c | 3 ++- >>>>>> 1 file changed, 2 insertions(+), 1 deletion(-) >>>>>> >>>>>> diff --git a/drivers/md/bitmap.c b/drivers/md/bitmap.c >>>>>> index bf7419a..bf34cd8 100644 >>>>>> --- a/drivers/md/bitmap.c >>>>>> +++ b/drivers/md/bitmap.c >>>>>> @@ -156,7 +156,8 @@ static int read_sb_page(struct mddev *mddev, loff_t offset, >>>>>> >>>>>> rdev_for_each(rdev, mddev) { >>>>>> if (! test_bit(In_sync, &rdev->flags) >>>>> Why In_sync is set for the rdev? I think it shoudn't. >>>> There are several states a device can be in. >>>> If ->raid_disk is < 0, then the device is a spare, and doesn't contain >>>> and valid data. >>>> Otherwise ->raid_disk >=0 and: >>>> In_sync is clear, which means that blocks before ->recovery_offset >>>> contain valid data, and blocks after there don't >>>> In_sync is set, which means ->recovery_offset is irrelevant and >>>> should be treated as MaxSector >>>> Bitmap_sync is set, which could apply in either of the above cases, >>>> and means blocks corresponding to bits that are set in the bitmap >>>> may not be up to date. >>>> >>>> Bitmap_sync is only relevant before a device has been handed to the >>>> personality. After it has been added, ->recovery_cp ensures that the >>>> blocks that might be wrong are not read until until the bitmap-based >>>> recovery has fixed them up. >>>> >>>> Bitmap_sync is only use to stop the device from being given to the >>>> personality if a recovery won't be started because the array is >>>> read-only. >>>> >>>> So it is perfectly valid for both In_sync and Bitmap_sync to be set. >>>> If they are, it makes sense to avoid reading bitmap information from the >>>> Bitmap_sync device, as that will be out-of-date. >>>> >>>> I'm not quite sure why Guoqing is getting a UUID mismatch ... it >>>> suggests that the new device wasn't initialized properly. So there >>>> might be another bug. But I think this is definitely a bug. >>> Can you describe a scenario a disk has both bits set? I had hard time to >>> imagine it. >> An array with a bitmap has a device fail (cable break?). >> So the bitmap stops clearing bits and ->events_cleared stays >> where it was, with the same value as events in the superblock >> of the failed device. >> >> The device then starts working (cable is replaced) and >> it device is added back to the array. >> super_1_validate notices that the array has a bitmap and that the event >> count in the new device is >= mddev->bitmap->events_cleared, but < >> mddev->events. >> i.e. it is in the range for which a bitmap-based recovery is both >> possible and needed. >> So it sets Bitmap_sync. >> Then it notices that the 'role' in a valid role for the array, >> so that role is stored in rdev->saved_raid_disk and, as there is no >> MD_FEATURE_RECOVERY_OFFSET, In_sync is set. >> A similar sequence can happen in super_90_validate. > This part is quite confusing actually. The failed device is kicked out, so the > array is degraded. When we readd the device, we shouldn't set In_sync. > > Actually we set In_sync in super_1_validate, we then clear it in add_bound_rdev > and reset it till recovery finishes. Set it till recovery finishes makes sense > to me, the super_1_validate is confusing. > >>> md_update_sb will update super and bitmap super at almost the same time. It's >>> possible we update super but not bitmap super and there is power loss. Is this >>> the case we have the both bits set? There seems no mechanism to prevent the >>> first disk has older bitmap super than the second disk in this scenario. >> This isn't really about the superblock and bitmap on the one device >> being in sync or not. >> It is the superblock on one (failed) device having a different event >> count than the bitmap (on the other devices) that is actively being used >> by the array. > My point is we could use a disk's superblock, but its bitmap superblock is > stale (even is old) > > Anyway, I still didn't understand when the problem could happen. Maybe Guoqing > can clarify. My case is not same as what Neil described. But if add a disk to cluster raid10 under grow mode, I can see: 1. super_1_validate sets Bitmap_sync flag first, this is because the new added disk's event is less than array's event. But for native raid, both of the events are 0, so Bitmap_sync is not set, this difference is caused by do_md_run only calls md_allow_write for clustered raid, and IIRC we have to do it to resolve deadlock. 2. In_sync is set by raid10_start_reshape. This is called by either clustered raid or native raid, I don't think we can change it here ... Based on the above, it is possible that both of the two flags can exist together. Thanks, Guoqing ^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2017-06-22 6:45 UTC | newest] Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2017-06-16 9:19 [PATCH] md/bitmap: don't read page from device with Bitmap_sync Guoqing Jiang 2017-06-16 17:36 ` Shaohua Li 2017-06-18 23:11 ` NeilBrown 2017-06-20 0:41 ` Shaohua Li 2017-06-20 3:59 ` NeilBrown 2017-06-21 9:05 ` Guoqing Jiang 2017-06-21 17:47 ` Shaohua Li 2017-06-22 6:45 ` Guoqing Jiang
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.