linux-raid.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Paul Menzel <pmenzel@molgen.mpg.de>
To: Song Liu <song@kernel.org>,
	Guoqing Jiang <guoqing.jiang@cloud.ionos.com>,
	Alasdair Kergon <agk@redhat.com>,
	Mike Snitzer <snitzer@redhat.com>
Cc: linux-raid <linux-raid@vger.kernel.org>,
	dm-devel@redhat.com, Donald Buczek <buczek@molgen.mpg.de>,
	it+raid@molgen.mpg.de
Subject: Re: [PATCH V2] md: don't unregister sync_thread with reconfig_mutex held
Date: Tue, 30 Nov 2021 18:25:06 +0100	[thread overview]
Message-ID: <a3a1fed7-b886-8603-aa20-20d667a837a7@molgen.mpg.de> (raw)
In-Reply-To: <CAPhsuW4V8JCCKePj11rf3zo4MJTz6TpW6DDeNmcJBfRSoN+NDA@mail.gmail.com>

Dear Linux folks,


Am 20.03.21 um 00:00 schrieb Song Liu:
> On Wed, Feb 24, 2021 at 1:26 AM Guoqing Jiang wrote:

>> On 2/24/21 10:09, Song Liu wrote:
>>> On Mon, Feb 15, 2021 at 3:08 AM Paul Menzel wrote:
>>>>
>>>> [+cc Donald]
>>>>
>>>> Am 13.02.21 um 01:49 schrieb Guoqing Jiang:
>>>>> Unregister sync_thread doesn't need to hold reconfig_mutex since it
>>>>> doesn't reconfigure array.
>>>>>
>>>>> And it could cause deadlock problem for raid5 as follows:
>>>>>
>>>>> 1. process A tried to reap sync thread with reconfig_mutex held after echo
>>>>>       idle to sync_action.
>>>>> 2. raid5 sync thread was blocked if there were too many active stripes.
>>>>> 3. SB_CHANGE_PENDING was set (because of write IO comes from upper layer)
>>>>>       which causes the number of active stripes can't be decreased.
>>>>> 4. SB_CHANGE_PENDING can't be cleared since md_check_recovery was not able
>>>>>       to hold reconfig_mutex.
>>>>>
>>>>> More details in the link:
>>>>> https://lore.kernel.org/linux-raid/5ed54ffc-ce82-bf66-4eff-390cb23bc1ac@molgen.mpg.de/T/#t
>>>>>
>>>>> And add one parameter to md_reap_sync_thread since it could be called by
>>>>> dm-raid which doesn't hold reconfig_mutex.
>>>>>
>>>>> Reported-and-tested-by: Donald Buczek <buczek@molgen.mpg.de>
>>>>> Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
>>>
>>> I don't really like this fix. But I haven't got a better (and not too
>>> complicated)
>>> alternative.
>>>
>>>>> ---
>>>>> V2:
>>>>> 1. add one parameter to md_reap_sync_thread per Jack's suggestion.
>>>>>
>>>>>     drivers/md/dm-raid.c |  2 +-
>>>>>     drivers/md/md.c      | 14 +++++++++-----
>>>>>     drivers/md/md.h      |  2 +-
>>>>>     3 files changed, 11 insertions(+), 7 deletions(-)
>>>>>
>>>>> diff --git a/drivers/md/dm-raid.c b/drivers/md/dm-raid.c
>>>>> index cab12b2..0c4cbba 100644
>>>>> --- a/drivers/md/dm-raid.c
>>>>> +++ b/drivers/md/dm-raid.c
>>>>> @@ -3668,7 +3668,7 @@ static int raid_message(struct dm_target *ti, unsigned int argc, char **argv,
>>>>>         if (!strcasecmp(argv[0], "idle") || !strcasecmp(argv[0], "frozen")) {
>>>>>                 if (mddev->sync_thread) {
>>>>>                         set_bit(MD_RECOVERY_INTR, &mddev->recovery);
>>>>> -                     md_reap_sync_thread(mddev);
>>>>> +                     md_reap_sync_thread(mddev, false);
>>>
>>> I think we can add mddev_lock() and mddev_unlock() here and then we don't
>>> need the extra parameter?
>>
>> I thought it too, but I would prefer get the input from DM people first.
>>
>> @ Mike or Alasdair
> 
> Hi Mike and Alasdair,
> 
> Could you please comment on this option: adding mddev_lock() and mddev_unlock()
> to raid_message() around md_reap_sync_thread()?

The issue is unfortunately still unresolved (at least Linux 5.10.82). 
How can we move forward?


Kind regards,

Paul

  reply	other threads:[~2021-11-30 17:25 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-02-13  0:49 [PATCH V2] md: don't unregister sync_thread with reconfig_mutex held Guoqing Jiang
2021-02-15 11:07 ` Paul Menzel
2021-02-24  9:09   ` Song Liu
2021-02-24  9:25     ` Guoqing Jiang
2021-03-19 23:00       ` Song Liu
2021-11-30 17:25         ` Paul Menzel [this message]
2021-11-30 17:27           ` Paul Menzel
2021-12-08 14:16             ` Guoqing Jiang
     [not found]               ` <CAM23VxrYRbWEUsCsez2QOQM9oWKxSv432rc9oZCj5zEPmtND0A@mail.gmail.com>
2021-12-09  0:47                 ` Guoqing Jiang
2021-12-09 12:54   ` Donald Buczek
2021-12-09 12:57   ` Donald Buczek
2021-12-10  1:06     ` Guoqing Jiang
2021-12-10 14:16 ` Donald Buczek
2021-12-14  2:34   ` Guoqing Jiang
2021-12-14  9:31     ` Donald Buczek
2021-12-14 10:03       ` Guoqing Jiang
2021-12-14 12:21         ` Donald Buczek
2022-01-11 12:25 ` Mikko Rantalainen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=a3a1fed7-b886-8603-aa20-20d667a837a7@molgen.mpg.de \
    --to=pmenzel@molgen.mpg.de \
    --cc=agk@redhat.com \
    --cc=buczek@molgen.mpg.de \
    --cc=dm-devel@redhat.com \
    --cc=guoqing.jiang@cloud.ionos.com \
    --cc=it+raid@molgen.mpg.de \
    --cc=linux-raid@vger.kernel.org \
    --cc=snitzer@redhat.com \
    --cc=song@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).