All of lore.kernel.org
 help / color / mirror / Atom feed
From: Dragan Stancevic <dragan@stancevic.com>
To: Yu Kuai <yukuai1@huaweicloud.com>, song@kernel.org
Cc: buczek@molgen.mpg.de, guoqing.jiang@linux.dev,
	it+raid@molgen.mpg.de, linux-kernel@vger.kernel.org,
	linux-raid@vger.kernel.org, msmith626@gmail.com,
	"yangerkun@huawei.com" <yangerkun@huawei.com>,
	"yukuai (C)" <yukuai3@huawei.com>
Subject: Re: md_raid: mdX_raid6 looping after sync_action "check" to "idle" transition
Date: Mon, 28 Aug 2023 15:32:35 -0500	[thread overview]
Message-ID: <cf765117-7270-1b98-7e82-82a1ca1daa2a@stancevic.com> (raw)
In-Reply-To: <07d5c7c2-c444-8747-ed6d-ca24231decd8@huaweicloud.com>

Hi Kuai,


On 8/23/23 20:18, Yu Kuai wrote:
> Hi,
> 
> 在 2023/08/23 23:33, Dragan Stancevic 写道:
>> Hi Kuai-
>>
>> On 8/22/23 20:22, Yu Kuai wrote:
>>> Hi,
>>>
>>> 在 2023/08/23 5:16, Dragan Stancevic 写道:
>>>> On Tue, 3/28/23 17:01 Song Liu wrote:
>>>>> On Thu, Mar 16, 2023 at 8:25=E2=80=AFAM Marc Smith 
>>>>> <msmith626@gmail.com>
>>>>> wr=
>>>>> ote:
>>>>>   >
>>>>>   > On Tue, Mar 14, 2023 at 10:45=E2=80=AFAM Marc Smith
>>>>> <msmith626@gmail.com>=
>>>>>    wrote:
>>>>>   > >
>>>>>   > > On Tue, Mar 14, 2023 at 9:55=E2=80=AFAM Guoqing Jiang
>>>>> <guoqing.jiang@li=
>>>>> nux.dev> wrote:
>>>>>   > > >
>>>>>   > > >
>>>>>   > > >
>>>>>   > > > On 3/14/23 21:25, Marc Smith wrote:
>>>>>   > > > > On Mon, Feb 8, 2021 at 7:49=E2=80=AFPM Guoqing Jiang
>>>>>   > > > > <guoqing.jiang@cloud.ionos.com> wrote:
>>>>>   > > > >> Hi Donald,
>>>>>   > > > >>
>>>>>   > > > >> On 2/8/21 19:41, Donald Buczek wrote:
>>>>>   > > > >>> Dear Guoqing,
>>>>>   > > > >>>
>>>>>   > > > >>> On 08.02.21 15:53, Guoqing Jiang wrote:
>>>>>   > > > >>>>
>>>>>   > > > >>>> On 2/8/21 12:38, Donald Buczek wrote:
>>>>>   > > > >>>>>> 5. maybe don't hold reconfig_mutex when try to 
>>>>> unregister
>>>>>   > > > >>>>>> sync_thread, like this.
>>>>>   > > > >>>>>>
>>>>>   > > > >>>>>>           /* resync has finished, collect result */
>>>>>   > > > >>>>>>           mddev_unlock(mddev);
>>>>>   > > > >>>>>>           md_unregister_thread(&mddev->sync_thread);
>>>>>   > > > >>>>>>           mddev_lock(mddev);
>>>>>   > > > >>>>> As above: While we wait for the sync thread to 
>>>>> terminate,
>>>>> would=
>>>>> n't it
>>>>>   > > > >>>>> be a problem, if another user space operation takes 
>>>>> the mutex?
>>>>>   > > > >>>> I don't think other places can be blocked while hold 
>>>>> mutex,
>>>>> othe=
>>>>> rwise
>>>>>   > > > >>>> these places can cause potential deadlock. Please try 
>>>>> above
>>>>> two =
>>>>> lines
>>>>>   > > > >>>> change. And perhaps others have better idea.
>>>>>   > > > >>> Yes, this works. No deadlock after >11000 seconds,
>>>>>   > > > >>>
>>>>>   > > > >>> (Time till deadlock from previous runs/seconds: 1723, 37,
>>>>> 434, 12=
>>>>> 65,
>>>>>   > > > >>> 3500, 1136, 109, 1892, 1060, 664, 84, 315, 12, 820 )
>>>>>   > > > >> Great. I will send a formal patch with your reported-by and
>>>>> tested=
>>>>> -by.
>>>>>   > > > >>
>>>>>   > > > >> Thanks,
>>>>>   > > > >> Guoqing
>>>>>   > > > > I'm still hitting this issue with Linux 5.4.229 -- it looks
>>>>> like 1/=
>>>>> 2
>>>>>   > > > > of the patches that supposedly resolve this were applied 
>>>>> to the
>>>>> sta=
>>>>> ble
>>>>>   > > > > kernels, however, one was omitted due to a regression:
>>>>>   > > > > md: don't unregister sync_thread with reconfig_mutex held
>>>>> (upstream
>>>>>   > > > > commit 8b48ec23cc51a4e7c8dbaef5f34ebe67e1a80934)
>>>>>   > > > >
>>>>>   > > > > I don't see any follow-up on the thread from June 8th 2022
>>>>> asking f=
>>>>> or
>>>>>   > > > > this patch to be dropped from all stable kernels since it 
>>>>> caused a
>>>>>   > > > > regression.
>>>>>   > > > >
>>>>>   > > > > The patch doesn't appear to be present in the current 
>>>>> mainline
>>>>> kern=
>>>>> el
>>>>>   > > > > (6.3-rc2) either. So I assume this issue is still present
>>>>> there, or=
>>>>>    it
>>>>>   > > > > was resolved differently and I just can't find the 
>>>>> commit/patch.
>>>>>   > > >
>>>>>   > > > It should be fixed by commit 9dfbdafda3b3"md: unlock mddev 
>>>>> before
>>>>> rea=
>>>>> p
>>>>>   > > > sync_thread in action_store".
>>>>>   > >
>>>>>   > > Okay, let me try applying that patch... it does not appear to be
>>>>>   > > present in my 5.4.229 kernel source. Thanks.
>>>>>   >
>>>>>   > Yes, applying this '9dfbdafda3b3 "md: unlock mddev before reap
>>>>>   > sync_thread in action_store"' patch on top of vanilla 5.4.229 
>>>>> source
>>>>>   > appears to fix the problem for me -- I can't reproduce the 
>>>>> issue with
>>>>>   > the script, and it's been running for >24 hours now. 
>>>>> (Previously I was
>>>>>   > able to induce the issue within a matter of minutes.)
>>>>>
>>>>> Hi Marc,
>>>>>
>>>>> Could you please run your reproducer on the md-tmp branch?
>>>>>
>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/song/md.git/log/?h=3Dmd-tmp
>>>>>
>>>>> This contains a different version of the fix by Yu Kuai.
>>>>>
>>>>> Thanks,
>>>>> Song
>>>>>
>>>>
>>>> Hi Song, I can easily reproduce this issue on 5.10.133 and 5.10.53. 
>>>> The change
>>>> "9dfbdafda3b3 "md: unlock mddev before reap" does not fix the issue 
>>>> for me.
>>>>
>>>> But I did pull the changes from the md-tmp branch you are refering:
>>>> https://git.kernel.org/pub/scm/linux/kernel/git/song/md.git/log/?h=3Dmd-tmp
>>>>
>>>> I was not totally clear on which change exactly to pull, but I pulled
>>>> the following changes:
>>>> 2023-03-28 md: enhance checking in md_check_recovery()md-tmp    Yu 
>>>> Kuai    1 -7/+15
>>>> 2023-03-28 md: wake up 'resync_wait' at last in 
>>>> md_reap_sync_thread()    Yu Kuai    1 -1/+1
>>>> 2023-03-28 md: refactor idle/frozen_sync_thread()    Yu Kuai    2 
>>>> -4/+22
>>>> 2023-03-28 md: add a mutex to synchronize idle and frozen in 
>>>> action_store()    Yu Kuai    2 -0/+8
>>>> 2023-03-28 md: refactor action_store() for 'idle' and 'frozen'    Yu 
>>>> Kuai    1 -16/+45
>>>>
>>>> I used to be able to reproduce the lockup within minutes, but with 
>>>> those
>>>> changes the test system has been running for more than 120 hours.
>>>>
>>>> When you said a "different fix", can you confirm that I grabbed the 
>>>> right
>>>> changes and that I need all 5 of them.
>>>
>>> Yes, you grabbed the right changes, and these patches is merged to
>>> linux-next as well.
>>>>
>>>> And second question was, has this fix been submitted upstream yet?
>>>> If so which kernel version?
>>>
>>> This fix is currently in linux-next, and will be applied to v6.6-rc1
>>> soon.
>>
>> Thank you, that is great news. I'd like to see this change backported 
>> to 5.10 and 6.1, do you have any plans of backporting to any of the 
>> previous kernels?
>>
>> If not, I would like to try to get your changes into 5.10 and 6.1 if 
>> Greg will accept them.
>>
> 
> I don't have plans yet, so feel free to do this, I guess these patches
> won't be picked automatically due to the conflict. Feel free to ask if
> you meet any problems.

Just a followup on 6.1 testing. I tried reproducing this problem for 5 
days with 6.1.42 kernel without your patches and I was not able to 
reproduce it.

It seems that 6.1 has some other code that prevents this from happening.

On 5.10 I can reproduce it within minutes to an hour.



> 
> Thanks,
> Kuai
> 
>>
>> Four out of five of your changes were a straight cherry-pick into 
>> 5.10, one needed a minor conflict resolution. But I can definitely 
>> confirm that your changes fix the lockup issue on 5.10
>>
>> I am now switching to 6.1 and will test the changes there too.
>>
>>
>> Thanks
>>
>>
>> -- 
>> Peace can only come as a natural consequence
>> of universal enlightenment -Dr. Nikola Tesla
>>
>>
>> .
>>
> 

-- 
Peace can only come as a natural consequence
of universal enlightenment -Dr. Nikola Tesla


  reply	other threads:[~2023-08-28 20:43 UTC|newest]

Thread overview: 49+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-11-28 12:25 md_raid: mdX_raid6 looping after sync_action "check" to "idle" transition Donald Buczek
2020-11-30  2:06 ` Guoqing Jiang
2020-12-01  9:29   ` Donald Buczek
2020-12-02 17:28     ` Donald Buczek
2020-12-03  1:55       ` Guoqing Jiang
2020-12-03 11:42         ` Donald Buczek
2020-12-21 12:33           ` Donald Buczek
2021-01-19 11:30             ` Donald Buczek
2021-01-20 16:33               ` Guoqing Jiang
2021-01-23 13:04                 ` Donald Buczek
2021-01-25  8:54                   ` Donald Buczek
2021-01-25 21:32                     ` Donald Buczek
2021-01-26  0:44                       ` Guoqing Jiang
2021-01-26  9:50                         ` Donald Buczek
2021-01-26 11:14                           ` Guoqing Jiang
2021-01-26 12:58                             ` Donald Buczek
2021-01-26 14:06                               ` Guoqing Jiang
2021-01-26 16:05                                 ` Donald Buczek
2021-02-02 15:42                                   ` Guoqing Jiang
2021-02-08 11:38                                     ` Donald Buczek
2021-02-08 14:53                                       ` Guoqing Jiang
2021-02-08 18:41                                         ` Donald Buczek
2021-02-09  0:46                                           ` Guoqing Jiang
2021-02-09  9:24                                             ` Donald Buczek
2023-03-14 13:25                                             ` Marc Smith
2023-03-14 13:55                                               ` Guoqing Jiang
2023-03-14 14:45                                                 ` Marc Smith
2023-03-16 15:25                                                   ` Marc Smith
2023-03-29  0:01                                                     ` Song Liu
2023-08-22 21:16                                                       ` Dragan Stancevic
2023-08-23  1:22                                                         ` Yu Kuai
2023-08-23 15:33                                                           ` Dragan Stancevic
2023-08-24  1:18                                                             ` Yu Kuai
2023-08-28 20:32                                                               ` Dragan Stancevic [this message]
2023-08-30  1:36                                                                 ` Yu Kuai
2023-09-05  3:50                                                                   ` Yu Kuai
2023-09-05 13:54                                                                     ` Dragan Stancevic
2023-09-13  9:08                                                                       ` Donald Buczek
2023-09-13 14:16                                                                         ` Dragan Stancevic
2023-09-14  6:03                                                                           ` Donald Buczek
2023-09-17  8:55                                                                             ` Donald Buczek
2023-09-24 14:35                                                                               ` Donald Buczek
2023-09-25  1:11                                                                                 ` Yu Kuai
2023-09-25  9:11                                                                                   ` Donald Buczek
2023-09-25  9:32                                                                                     ` Yu Kuai
2023-03-15  3:02                                                 ` Yu Kuai
2023-03-15  9:30                                                   ` Guoqing Jiang
2023-03-15  9:53                                                     ` Yu Kuai
2023-03-15  7:52                                               ` Donald Buczek

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=cf765117-7270-1b98-7e82-82a1ca1daa2a@stancevic.com \
    --to=dragan@stancevic.com \
    --cc=buczek@molgen.mpg.de \
    --cc=guoqing.jiang@linux.dev \
    --cc=it+raid@molgen.mpg.de \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-raid@vger.kernel.org \
    --cc=msmith626@gmail.com \
    --cc=song@kernel.org \
    --cc=yangerkun@huawei.com \
    --cc=yukuai1@huaweicloud.com \
    --cc=yukuai3@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.