linux-kernel.vger.kernel.org archive mirror
From: Dragan Stancevic <dragan@stancevic.com>
To: Yu Kuai <yukuai1@huaweicloud.com>, song@kernel.org
Cc: buczek@molgen.mpg.de, guoqing.jiang@linux.dev,
	it+raid@molgen.mpg.de, linux-kernel@vger.kernel.org,
	linux-raid@vger.kernel.org, msmith626@gmail.com,
	"yangerkun@huawei.com" <yangerkun@huawei.com>,
	"yukuai (C)" <yukuai3@huawei.com>
Subject: Re: md_raid: mdX_raid6 looping after sync_action "check" to "idle" transition
Date: Mon, 28 Aug 2023 15:32:35 -0500
Message-ID: <cf765117-7270-1b98-7e82-82a1ca1daa2a@stancevic.com>
In-Reply-To: <07d5c7c2-c444-8747-ed6d-ca24231decd8@huaweicloud.com>

Hi Kuai,


On 8/23/23 20:18, Yu Kuai wrote:
> Hi,
> 
> On 2023/08/23 23:33, Dragan Stancevic wrote:
>> Hi Kuai-
>>
>> On 8/22/23 20:22, Yu Kuai wrote:
>>> Hi,
>>>
>>> On 2023/08/23 5:16, Dragan Stancevic wrote:
>>>> On Tue, 3/28/23 17:01 Song Liu wrote:
>>>>> On Thu, Mar 16, 2023 at 8:25 AM Marc Smith <msmith626@gmail.com> wrote:
>>>>>   >
>>>>>   > On Tue, Mar 14, 2023 at 10:45 AM Marc Smith <msmith626@gmail.com> wrote:
>>>>>   > >
>>>>>   > > On Tue, Mar 14, 2023 at 9:55 AM Guoqing Jiang <guoqing.jiang@linux.dev> wrote:
>>>>>   > > >
>>>>>   > > >
>>>>>   > > >
>>>>>   > > > On 3/14/23 21:25, Marc Smith wrote:
>>>>>   > > > > On Mon, Feb 8, 2021 at 7:49 PM Guoqing Jiang
>>>>>   > > > > <guoqing.jiang@cloud.ionos.com> wrote:
>>>>>   > > > >> Hi Donald,
>>>>>   > > > >>
>>>>>   > > > >> On 2/8/21 19:41, Donald Buczek wrote:
>>>>>   > > > >>> Dear Guoqing,
>>>>>   > > > >>>
>>>>>   > > > >>> On 08.02.21 15:53, Guoqing Jiang wrote:
>>>>>   > > > >>>>
>>>>>   > > > >>>> On 2/8/21 12:38, Donald Buczek wrote:
>>>>>   > > > >>>>>> 5. maybe don't hold reconfig_mutex when try to unregister
>>>>>   > > > >>>>>> sync_thread, like this.
>>>>>   > > > >>>>>>
>>>>>   > > > >>>>>>           /* resync has finished, collect result */
>>>>>   > > > >>>>>>           mddev_unlock(mddev);
>>>>>   > > > >>>>>>           md_unregister_thread(&mddev->sync_thread);
>>>>>   > > > >>>>>>           mddev_lock(mddev);
>>>>>   > > > >>>>> As above: While we wait for the sync thread to terminate, wouldn't it
>>>>>   > > > >>>>> be a problem, if another user space operation takes the mutex?
>>>>>   > > > >>>> I don't think other places can be blocked while hold mutex, otherwise
>>>>>   > > > >>>> these places can cause potential deadlock. Please try above two lines
>>>>>   > > > >>>> change. And perhaps others have better idea.
>>>>>   > > > >>> Yes, this works. No deadlock after >11000 seconds,
>>>>>   > > > >>>
>>>>>   > > > >>> (Time till deadlock from previous runs/seconds: 1723, 37, 434, 1265,
>>>>>   > > > >>> 3500, 1136, 109, 1892, 1060, 664, 84, 315, 12, 820 )
>>>>>   > > > >> Great. I will send a formal patch with your reported-by and tested-by.
>>>>>   > > > >>
>>>>>   > > > >> Thanks,
>>>>>   > > > >> Guoqing
>>>>>   > > > > I'm still hitting this issue with Linux 5.4.229 -- it looks like 1/2
>>>>>   > > > > of the patches that supposedly resolve this were applied to the stable
>>>>>   > > > > kernels, however, one was omitted due to a regression:
>>>>>   > > > > md: don't unregister sync_thread with reconfig_mutex held (upstream
>>>>>   > > > > commit 8b48ec23cc51a4e7c8dbaef5f34ebe67e1a80934)
>>>>>   > > > >
>>>>>   > > > > I don't see any follow-up on the thread from June 8th 2022 asking for
>>>>>   > > > > this patch to be dropped from all stable kernels since it caused a
>>>>>   > > > > regression.
>>>>>   > > > >
>>>>>   > > > > The patch doesn't appear to be present in the current mainline kernel
>>>>>   > > > > (6.3-rc2) either. So I assume this issue is still present there, or it
>>>>>   > > > > was resolved differently and I just can't find the commit/patch.
>>>>>   > > >
>>>>>   > > > It should be fixed by commit 9dfbdafda3b3 "md: unlock mddev before reap
>>>>>   > > > sync_thread in action_store".
>>>>>   > >
>>>>>   > > Okay, let me try applying that patch... it does not appear to be
>>>>>   > > present in my 5.4.229 kernel source. Thanks.
>>>>>   >
>>>>>   > Yes, applying this '9dfbdafda3b3 "md: unlock mddev before reap
>>>>>   > sync_thread in action_store"' patch on top of vanilla 5.4.229 source
>>>>>   > appears to fix the problem for me -- I can't reproduce the issue with
>>>>>   > the script, and it's been running for >24 hours now. (Previously I was
>>>>>   > able to induce the issue within a matter of minutes.)
>>>>>
>>>>> Hi Marc,
>>>>>
>>>>> Could you please run your reproducer on the md-tmp branch?
>>>>>
>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/song/md.git/log/?h=md-tmp
>>>>>
>>>>> This contains a different version of the fix by Yu Kuai.
>>>>>
>>>>> Thanks,
>>>>> Song
>>>>>
>>>>
>>>> Hi Song, I can easily reproduce this issue on 5.10.133 and 5.10.53.
>>>> The change 9dfbdafda3b3 "md: unlock mddev before reap sync_thread in
>>>> action_store" does not fix the issue for me.
>>>>
>>>> But I did pull the changes from the md-tmp branch you are referring to:
>>>> https://git.kernel.org/pub/scm/linux/kernel/git/song/md.git/log/?h=md-tmp
>>>>
>>>> I was not totally clear on which change exactly to pull, but I pulled
>>>> the following changes (date, subject, author, files changed, lines -/+):
>>>>   2023-03-28 md: enhance checking in md_check_recovery() (Yu Kuai, 1, -7/+15)
>>>>   2023-03-28 md: wake up 'resync_wait' at last in md_reap_sync_thread() (Yu Kuai, 1, -1/+1)
>>>>   2023-03-28 md: refactor idle/frozen_sync_thread() (Yu Kuai, 2, -4/+22)
>>>>   2023-03-28 md: add a mutex to synchronize idle and frozen in action_store() (Yu Kuai, 2, -0/+8)
>>>>   2023-03-28 md: refactor action_store() for 'idle' and 'frozen' (Yu Kuai, 1, -16/+45)
>>>>
>>>> I used to be able to reproduce the lockup within minutes, but with those
>>>> changes the test system has been running for more than 120 hours.
>>>>
>>>> When you said a "different fix", can you confirm that I grabbed the right
>>>> changes and that I need all 5 of them?
>>>
>>> Yes, you grabbed the right changes, and these patches is merged to
>>> linux-next as well.
>>>>
>>>> And second question was, has this fix been submitted upstream yet?
>>>> If so which kernel version?
>>>
>>> This fix is currently in linux-next, and will be applied to v6.6-rc1
>>> soon.
>>
>> Thank you, that is great news. I'd like to see this change backported 
>> to 5.10 and 6.1, do you have any plans of backporting to any of the 
>> previous kernels?
>>
>> If not, I would like to try to get your changes into 5.10 and 6.1 if 
>> Greg will accept them.
>>
> 
> I don't have plans yet, so feel free to do this, I guess these patches
> won't be picked automatically due to the conflict. Feel free to ask if
> you meet any problems.
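
For reference, the unlock/unregister/relock pattern quoted above from the 2021 discussion (mddev_unlock(), md_unregister_thread(), mddev_lock()) can be illustrated with a deliberately simplified user-space model. This is only a sketch under stated assumptions: struct toy_mddev, toy_sync_thread(), toy_reap_sync_thread() and toy_run() are hypothetical names, pthreads stand in for the kernel's mutex and kthread APIs, and the toy sync thread is simply assumed to need reconfig_mutex once before it can exit. It shows why waiting for such a thread while holding the lock it needs cannot complete, and why dropping the lock first lets it finish.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-ins, NOT the kernel's struct mddev or kthread API. */
struct toy_mddev {
    pthread_mutex_t reconfig_mutex;  /* models mddev->reconfig_mutex */
    pthread_t sync_thread;           /* models mddev->sync_thread */
    bool sync_done;                  /* written by the sync thread under the lock */
};

/* Assumption for the model: the sync thread must take reconfig_mutex
 * once before it can exit. */
static void *toy_sync_thread(void *arg)
{
    struct toy_mddev *m = arg;

    pthread_mutex_lock(&m->reconfig_mutex);
    m->sync_done = true;
    pthread_mutex_unlock(&m->reconfig_mutex);
    return NULL;
}

/* Caller holds reconfig_mutex on entry and on return. */
static int toy_reap_sync_thread(struct toy_mddev *m)
{
    int err;

    pthread_mutex_unlock(&m->reconfig_mutex);      /* like mddev_unlock(mddev) */
    err = pthread_join(m->sync_thread, NULL);      /* like md_unregister_thread() */
    pthread_mutex_lock(&m->reconfig_mutex);        /* like mddev_lock(mddev) */
    return err;
}

/* Returns true if the sync thread was reaped and had finished its work. */
static bool toy_run(void)
{
    struct toy_mddev m = { .sync_done = false };
    bool done;
    int err;

    pthread_mutex_init(&m.reconfig_mutex, NULL);
    pthread_mutex_lock(&m.reconfig_mutex);         /* e.g. a sysfs store handler holds the lock */

    if (pthread_create(&m.sync_thread, NULL, toy_sync_thread, &m) != 0) {
        pthread_mutex_unlock(&m.reconfig_mutex);
        pthread_mutex_destroy(&m.reconfig_mutex);
        return false;
    }

    /* While we hold the lock, the sync thread cannot finish. */
    assert(!m.sync_done);

    /* Calling pthread_join() here with the lock still held would never return. */
    err = toy_reap_sync_thread(&m);

    done = m.sync_done;                            /* read under the re-taken lock */
    pthread_mutex_unlock(&m.reconfig_mutex);
    pthread_mutex_destroy(&m.reconfig_mutex);
    return err == 0 && done;
}
```

Build with -pthread. The model deliberately leaves out the concern Donald raised about another user-space operation taking the mutex while it is dropped; as far as the patch titles indicate, the md-tmp series addresses that window with an additional mutex in action_store(), which this sketch does not attempt to reproduce.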

Just a follow-up on 6.1 testing: I tried to reproduce this problem for 5
days on a 6.1.42 kernel without your patches, and I was not able to
reproduce it.

It seems that 6.1 has some other change that prevents this from happening.

On 5.10 without your patches, I can reproduce it within minutes to an hour.
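
The reproducer scripts mentioned in this thread are not included here. Purely as an illustration, the loop below sketches the "check" to "idle" sync_action transition named in the subject line. It assumes the md sysfs attribute /sys/block/<dev>/md/sync_action, and write_sync_action() and toggle_check_idle() are hypothetical helpers, not the actual test script used in this thread.

```c
#include <stdio.h>

/* Write one action keyword (e.g. "check" or "idle") to a sync_action
 * attribute. Returns 0 on success, -1 on any open/write/close error. */
static int write_sync_action(const char *path, const char *action)
{
    FILE *f = fopen(path, "w");
    int ok;

    if (f == NULL)
        return -1;
    ok = fputs(action, f) >= 0;
    /* stdio buffers the string, so the actual write(2), and any error
     * returned by the kernel, surfaces when fclose() flushes it. */
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* Toggle "check" -> "idle" `iterations` times; stop at the first failure. */
static int toggle_check_idle(const char *path, int iterations)
{
    for (int i = 0; i < iterations; i++) {
        if (write_sync_action(path, "check") != 0)
            return -1;
        /* A real reproducer might let the check run for a while here. */
        if (write_sync_action(path, "idle") != 0)
            return -1;
    }
    return 0;
}
```

Running it against a real array, e.g. toggle_check_idle("/sys/block/md0/md/sync_action", 1000), assumes root and an array named md0. Detecting a hang (for example with a watchdog timeout) is left out of this sketch.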



> 
> Thanks,
> Kuai
> 
>>
>> Four out of five of your changes were a straight cherry-pick into 
>> 5.10, one needed a minor conflict resolution. But I can definitely 
>> confirm that your changes fix the lockup issue on 5.10
>>
>> I am now switching to 6.1 and will test the changes there too.
>>
>>
>> Thanks
>>
>>
>> -- 
>> Peace can only come as a natural consequence
>> of universal enlightenment -Dr. Nikola Tesla
>>
>>
>> .
>>
> 

-- 
Peace can only come as a natural consequence
of universal enlightenment -Dr. Nikola Tesla



Thread overview: 49+ messages
2020-11-28 12:25 md_raid: mdX_raid6 looping after sync_action "check" to "idle" transition Donald Buczek
2020-11-30  2:06 ` Guoqing Jiang
2020-12-01  9:29   ` Donald Buczek
2020-12-02 17:28     ` Donald Buczek
2020-12-03  1:55       ` Guoqing Jiang
2020-12-03 11:42         ` Donald Buczek
2020-12-21 12:33           ` Donald Buczek
2021-01-19 11:30             ` Donald Buczek
2021-01-20 16:33               ` Guoqing Jiang
2021-01-23 13:04                 ` Donald Buczek
2021-01-25  8:54                   ` Donald Buczek
2021-01-25 21:32                     ` Donald Buczek
2021-01-26  0:44                       ` Guoqing Jiang
2021-01-26  9:50                         ` Donald Buczek
2021-01-26 11:14                           ` Guoqing Jiang
2021-01-26 12:58                             ` Donald Buczek
2021-01-26 14:06                               ` Guoqing Jiang
2021-01-26 16:05                                 ` Donald Buczek
2021-02-02 15:42                                   ` Guoqing Jiang
2021-02-08 11:38                                     ` Donald Buczek
2021-02-08 14:53                                       ` Guoqing Jiang
2021-02-08 18:41                                         ` Donald Buczek
2021-02-09  0:46                                           ` Guoqing Jiang
2021-02-09  9:24                                             ` Donald Buczek
2023-03-14 13:25                                             ` Marc Smith
2023-03-14 13:55                                               ` Guoqing Jiang
2023-03-14 14:45                                                 ` Marc Smith
2023-03-16 15:25                                                   ` Marc Smith
2023-03-29  0:01                                                     ` Song Liu
2023-08-22 21:16                                                       ` Dragan Stancevic
2023-08-23  1:22                                                         ` Yu Kuai
2023-08-23 15:33                                                           ` Dragan Stancevic
2023-08-24  1:18                                                             ` Yu Kuai
2023-08-28 20:32                                                               ` Dragan Stancevic [this message]
2023-08-30  1:36                                                                 ` Yu Kuai
2023-09-05  3:50                                                                   ` Yu Kuai
2023-09-05 13:54                                                                     ` Dragan Stancevic
2023-09-13  9:08                                                                       ` Donald Buczek
2023-09-13 14:16                                                                         ` Dragan Stancevic
2023-09-14  6:03                                                                           ` Donald Buczek
2023-09-17  8:55                                                                             ` Donald Buczek
2023-09-24 14:35                                                                               ` Donald Buczek
2023-09-25  1:11                                                                                 ` Yu Kuai
2023-09-25  9:11                                                                                   ` Donald Buczek
2023-09-25  9:32                                                                                     ` Yu Kuai
2023-03-15  3:02                                                 ` Yu Kuai
2023-03-15  9:30                                                   ` Guoqing Jiang
2023-03-15  9:53                                                     ` Yu Kuai
2023-03-15  7:52                                               ` Donald Buczek
