* [PATCH RFC] md: call md_cluster_stop() in __md_stop()
@ 2022-08-12 2:16 NeilBrown
2022-08-15 6:19 ` Guoqing Jiang
0 siblings, 1 reply; 4+ messages in thread
From: NeilBrown @ 2022-08-12 2:16 UTC (permalink / raw)
To: Song Liu, Guoqing Jiang; +Cc: linux-raid
[[ I noticed the e151 patch recently which seems to admit that it broke
something. So I looked into it and came up with this.
It seems to make sense, but I'm not in a position to test it.
Guoqing: does it look OK to you?
- NeilBrown
]]
As described in Commit: 48df498daf62 ("md: move bitmap_destroy to the
beginning of __md_stop") md_cluster_stop() needs to run before the
mdddev->thread is stopped.
The change to make this happen was reverted in Commit: e151db8ecfb0
("md-raid: destroy the bitmap after destroying the thread") due to
problems it caused.
To restore correct behaviour, make md_cluster_stop() reentrant and
explicitly call it at the start of __md_stop(), after first calling
md_bitmap_wait_behind_writes().
Fixes: e151db8ecfb0 ("md-raid: destroy the bitmap after destroying the thread")
Signed-off-by: NeilBrown <neilb@suse.de>
---
drivers/md/md-cluster.c | 1 +
drivers/md/md.c | 3 +++
2 files changed, 4 insertions(+)
diff --git a/drivers/md/md-cluster.c b/drivers/md/md-cluster.c
index 742b2349fea3..37bf0aa4ed71 100644
--- a/drivers/md/md-cluster.c
+++ b/drivers/md/md-cluster.c
@@ -1009,6 +1009,7 @@ static int leave(struct mddev *mddev)
test_bit(MD_CLOSING, &mddev->flags)))
resync_bitmap(mddev);
+ mddev->cluster_info = NULL;
set_bit(MD_CLUSTER_HOLDING_MUTEX_FOR_RECVD, &cinfo->state);
md_unregister_thread(&cinfo->recovery_thread);
md_unregister_thread(&cinfo->recv_thread);
diff --git a/drivers/md/md.c b/drivers/md/md.c
index afaf36b2f6ab..a57b2dff64dd 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -6238,6 +6238,9 @@ static void mddev_detach(struct mddev *mddev)
static void __md_stop(struct mddev *mddev)
{
struct md_personality *pers = mddev->pers;
+
+ md_bitmap_wait_behind_writes(mddev);
+ md_cluster_stop(mddev);
mddev_detach(mddev);
/* Ensure ->event_work is done */
if (mddev->event_work.func)
--
2.36.1
^ permalink raw reply related [flat|nested] 4+ messages in thread
* Re: [PATCH RFC] md: call md_cluster_stop() in __md_stop()
2022-08-12 2:16 [PATCH RFC] md: call md_cluster_stop() in __md_stop() NeilBrown
@ 2022-08-15 6:19 ` Guoqing Jiang
2022-08-16 0:58 ` NeilBrown
0 siblings, 1 reply; 4+ messages in thread
From: Guoqing Jiang @ 2022-08-15 6:19 UTC (permalink / raw)
To: NeilBrown, Song Liu; +Cc: linux-raid
Hi Neil,
Sorry for later reply since I was on vacation last week.
On 8/12/22 10:16 AM, NeilBrown wrote:
> [[ I noticed the e151 patch recently which seems to admit that it broke
> something. So I looked into it and came up with this.
I just noticed it ...
> It seems to make sense, but I'm not in a position to test it.
> Guoqing: does it look OK to you?
> - NeilBrown
> ]]
>
> As described in Commit: 48df498daf62 ("md: move bitmap_destroy to the
> beginning of __md_stop") md_cluster_stop() needs to run before the
> mdddev->thread is stopped.
> The change to make this happen was reverted in Commit: e151db8ecfb0
> ("md-raid: destroy the bitmap after destroying the thread") due to
> problems it caused.
>
> To restore correct behaviour, make md_cluster_stop() reentrant and
> explicitly call it at the start of __md_stop(), after first calling
> md_bitmap_wait_behind_writes().
>
> Fixes: e151db8ecfb0 ("md-raid: destroy the bitmap after destroying the thread")
> Signed-off-by: NeilBrown <neilb@suse.de>
> ---
> drivers/md/md-cluster.c | 1 +
> drivers/md/md.c | 3 +++
> 2 files changed, 4 insertions(+)
>
> diff --git a/drivers/md/md-cluster.c b/drivers/md/md-cluster.c
> index 742b2349fea3..37bf0aa4ed71 100644
> --- a/drivers/md/md-cluster.c
> +++ b/drivers/md/md-cluster.c
> @@ -1009,6 +1009,7 @@ static int leave(struct mddev *mddev)
> test_bit(MD_CLOSING, &mddev->flags)))
> resync_bitmap(mddev);
>
> + mddev->cluster_info = NULL;
The above makes sense.
> set_bit(MD_CLUSTER_HOLDING_MUTEX_FOR_RECVD, &cinfo->state);
> md_unregister_thread(&cinfo->recovery_thread);
> md_unregister_thread(&cinfo->recv_thread);
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index afaf36b2f6ab..a57b2dff64dd 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -6238,6 +6238,9 @@ static void mddev_detach(struct mddev *mddev)
> static void __md_stop(struct mddev *mddev)
> {
> struct md_personality *pers = mddev->pers;
> +
> + md_bitmap_wait_behind_writes(mddev);
> + md_cluster_stop(mddev);
> mddev_detach(mddev);
> /* Ensure ->event_work is done */
> if (mddev->event_work.func)
The md_bitmap_destroy is called in __md_stop with or without e151db8ecfb0,
and it already invokes md_bitmap_wait_behind_writes and md_cluster_stop by
md_bitmap_free. So the above is sort of redundant to me.
For the issue described in e151db8ecfb, looks like raid1d was running after
__md_stop, I am wondering if dm-raid should stop write first just like
normal
md-raid.
diff --git a/drivers/md/md.c b/drivers/md/md.c
index afaf36b2f6ab..afc8d638eba0 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -6260,6 +6260,7 @@ void md_stop(struct mddev *mddev)
/* stop the array and free an attached data structures.
* This is called from dm-raid
*/
+ __md_stop_writes(mddev);
__md_stop(mddev);
bioset_exit(&mddev->bio_set);
bioset_exit(&mddev->sync_set);
Thanks,
Guoqing
^ permalink raw reply related [flat|nested] 4+ messages in thread
* Re: [PATCH RFC] md: call md_cluster_stop() in __md_stop()
2022-08-15 6:19 ` Guoqing Jiang
@ 2022-08-16 0:58 ` NeilBrown
2022-08-16 13:53 ` Guoqing Jiang
0 siblings, 1 reply; 4+ messages in thread
From: NeilBrown @ 2022-08-16 0:58 UTC (permalink / raw)
To: Guoqing Jiang; +Cc: Song Liu, linux-raid
On Mon, 15 Aug 2022, Guoqing Jiang wrote:
> Hi Neil,
>
> Sorry for later reply since I was on vacation last week.
>
> On 8/12/22 10:16 AM, NeilBrown wrote:
> > [[ I noticed the e151 patch recently which seems to admit that it broke
> > something. So I looked into it and came up with this.
>
> I just noticed it ...
>
> > It seems to make sense, but I'm not in a position to test it.
> > Guoqing: does it look OK to you?
> > - NeilBrown
> > ]]
> >
> > As described in Commit: 48df498daf62 ("md: move bitmap_destroy to the
> > beginning of __md_stop") md_cluster_stop() needs to run before the
> > mdddev->thread is stopped.
> > The change to make this happen was reverted in Commit: e151db8ecfb0
> > ("md-raid: destroy the bitmap after destroying the thread") due to
> > problems it caused.
> >
> > To restore correct behaviour, make md_cluster_stop() reentrant and
> > explicitly call it at the start of __md_stop(), after first calling
> > md_bitmap_wait_behind_writes().
> >
> > Fixes: e151db8ecfb0 ("md-raid: destroy the bitmap after destroying the thread")
> > Signed-off-by: NeilBrown <neilb@suse.de>
> > ---
> > drivers/md/md-cluster.c | 1 +
> > drivers/md/md.c | 3 +++
> > 2 files changed, 4 insertions(+)
> >
> > diff --git a/drivers/md/md-cluster.c b/drivers/md/md-cluster.c
> > index 742b2349fea3..37bf0aa4ed71 100644
> > --- a/drivers/md/md-cluster.c
> > +++ b/drivers/md/md-cluster.c
> > @@ -1009,6 +1009,7 @@ static int leave(struct mddev *mddev)
> > test_bit(MD_CLOSING, &mddev->flags)))
> > resync_bitmap(mddev);
> >
> > + mddev->cluster_info = NULL;
>
> The above makes sense.
Thanks.
>
> > set_bit(MD_CLUSTER_HOLDING_MUTEX_FOR_RECVD, &cinfo->state);
> > md_unregister_thread(&cinfo->recovery_thread);
> > md_unregister_thread(&cinfo->recv_thread);
> > diff --git a/drivers/md/md.c b/drivers/md/md.c
> > index afaf36b2f6ab..a57b2dff64dd 100644
> > --- a/drivers/md/md.c
> > +++ b/drivers/md/md.c
> > @@ -6238,6 +6238,9 @@ static void mddev_detach(struct mddev *mddev)
> > static void __md_stop(struct mddev *mddev)
> > {
> > struct md_personality *pers = mddev->pers;
> > +
> > + md_bitmap_wait_behind_writes(mddev);
> > + md_cluster_stop(mddev);
> > mddev_detach(mddev);
> > /* Ensure ->event_work is done */
> > if (mddev->event_work.func)
>
> The md_bitmap_destroy is called in __md_stop with or without e151db8ecfb0,
> and it already invokes md_bitmap_wait_behind_writes and md_cluster_stop by
> md_bitmap_free. So the above is sort of redundant to me.
The point is that md_cluster_stop() needs to run before mddev_detach()
shuts down the thread. If we don't call all of md_bitmap_destroy()
before mddev_detach() we need to at least run md_cluster_stop(), and I
think we need to run md_bitmap_wait_behind_writes() before that.
>
> For the issue described in e151db8ecfb, looks like raid1d was running after
> __md_stop, I am wondering if dm-raid should stop write first just like
> normal
> md-raid.
That looks like a really good idea, that should make it safe to move
md_bitmap_destroy() back before mddev_detach().
Would you like to post a patch to make those two changes, and include
Mikulas Patocka, or should I?
Thanks,
NeilBrown
>
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index afaf36b2f6ab..afc8d638eba0 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -6260,6 +6260,7 @@ void md_stop(struct mddev *mddev)
> /* stop the array and free an attached data structures.
> * This is called from dm-raid
> */
> + __md_stop_writes(mddev);
> __md_stop(mddev);
> bioset_exit(&mddev->bio_set);
> bioset_exit(&mddev->sync_set);
>
> Thanks,
> Guoqing
>
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH RFC] md: call md_cluster_stop() in __md_stop()
2022-08-16 0:58 ` NeilBrown
@ 2022-08-16 13:53 ` Guoqing Jiang
0 siblings, 0 replies; 4+ messages in thread
From: Guoqing Jiang @ 2022-08-16 13:53 UTC (permalink / raw)
To: NeilBrown; +Cc: Song Liu, linux-raid
On 8/16/22 8:58 AM, NeilBrown wrote:
> On Mon, 15 Aug 2022, Guoqing Jiang wrote:
>> Hi Neil,
>>
>> Sorry for later reply since I was on vacation last week.
>>
>> On 8/12/22 10:16 AM, NeilBrown wrote:
>>> [[ I noticed the e151 patch recently which seems to admit that it broke
>>> something. So I looked into it and came up with this.
>> I just noticed it ...
>>
>>> It seems to make sense, but I'm not in a position to test it.
>>> Guoqing: does it look OK to you?
>>> - NeilBrown
>>> ]]
>>>
>>> As described in Commit: 48df498daf62 ("md: move bitmap_destroy to the
>>> beginning of __md_stop") md_cluster_stop() needs to run before the
>>> mdddev->thread is stopped.
>>> The change to make this happen was reverted in Commit: e151db8ecfb0
>>> ("md-raid: destroy the bitmap after destroying the thread") due to
>>> problems it caused.
>>>
>>> To restore correct behaviour, make md_cluster_stop() reentrant and
>>> explicitly call it at the start of __md_stop(), after first calling
>>> md_bitmap_wait_behind_writes().
>>>
>>> Fixes: e151db8ecfb0 ("md-raid: destroy the bitmap after destroying the thread")
>>> Signed-off-by: NeilBrown <neilb@suse.de>
>>> ---
>>> drivers/md/md-cluster.c | 1 +
>>> drivers/md/md.c | 3 +++
>>> 2 files changed, 4 insertions(+)
>>>
>>> diff --git a/drivers/md/md-cluster.c b/drivers/md/md-cluster.c
>>> index 742b2349fea3..37bf0aa4ed71 100644
>>> --- a/drivers/md/md-cluster.c
>>> +++ b/drivers/md/md-cluster.c
>>> @@ -1009,6 +1009,7 @@ static int leave(struct mddev *mddev)
>>> test_bit(MD_CLOSING, &mddev->flags)))
>>> resync_bitmap(mddev);
>>>
>>> + mddev->cluster_info = NULL;
>> The above makes sense.
> Thanks.
>
>>> set_bit(MD_CLUSTER_HOLDING_MUTEX_FOR_RECVD, &cinfo->state);
>>> md_unregister_thread(&cinfo->recovery_thread);
>>> md_unregister_thread(&cinfo->recv_thread);
>>> diff --git a/drivers/md/md.c b/drivers/md/md.c
>>> index afaf36b2f6ab..a57b2dff64dd 100644
>>> --- a/drivers/md/md.c
>>> +++ b/drivers/md/md.c
>>> @@ -6238,6 +6238,9 @@ static void mddev_detach(struct mddev *mddev)
>>> static void __md_stop(struct mddev *mddev)
>>> {
>>> struct md_personality *pers = mddev->pers;
>>> +
>>> + md_bitmap_wait_behind_writes(mddev);
>>> + md_cluster_stop(mddev);
>>> mddev_detach(mddev);
>>> /* Ensure ->event_work is done */
>>> if (mddev->event_work.func)
>> The md_bitmap_destroy is called in __md_stop with or without e151db8ecfb0,
>> and it already invokes md_bitmap_wait_behind_writes and md_cluster_stop by
>> md_bitmap_free. So the above is sort of redundant to me.
> The point is that md_cluster_stop() needs to run before mddev_detach()
> shuts down the thread. If we don't call all of md_bitmap_destroy()
> before mddev_detach() we need to at least run md_cluster_stop(), and I
> think we need to run md_bitmap_wait_behind_writes() before that.
>
>
>> For the issue described in e151db8ecfb, looks like raid1d was running after
>> __md_stop, I am wondering if dm-raid should stop write first just like
>> normal md-raid.
> That looks like a really good idea, that should make it safe to move
> md_bitmap_destroy() back before mddev_detach().
> Would you like to post a patch to make those two changes, and include
> Mikulas Patocka, or should I?
NP, will send it later, thanks for your review.
BRs,
Guoqing
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2022-08-16 13:54 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-08-12 2:16 [PATCH RFC] md: call md_cluster_stop() in __md_stop() NeilBrown
2022-08-15 6:19 ` Guoqing Jiang
2022-08-16 0:58 ` NeilBrown
2022-08-16 13:53 ` Guoqing Jiang
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.