linux-raid.vger.kernel.org archive mirror
From: Dragan Stancevic <dragan@stancevic.com>
To: Donald Buczek <buczek@molgen.mpg.de>,
	Yu Kuai <yukuai1@huaweicloud.com>,
	song@kernel.org
Cc: guoqing.jiang@linux.dev, it+raid@molgen.mpg.de,
	linux-kernel@vger.kernel.org, linux-raid@vger.kernel.org,
	msmith626@gmail.com,
	"yangerkun@huawei.com" <yangerkun@huawei.com>
Subject: Re: md_raid: mdX_raid6 looping after sync_action "check" to "idle" transition
Date: Wed, 13 Sep 2023 09:16:51 -0500
Message-ID: <de7f6fba-c6e0-7549-199e-36548b68a862@stancevic.com>
In-Reply-To: <63c63d93-30fc-0175-0033-846b93fe9eff@molgen.mpg.de>

Hi Donald-

On 9/13/23 04:08, Donald Buczek wrote:
> On 9/5/23 3:54 PM, Dragan Stancevic wrote:
>> On 9/4/23 22:50, Yu Kuai wrote:
>>> Hi,
>>>
>>> On 2023/08/30 9:36, Yu Kuai wrote:
>>>> Hi,
>>>>
>>>> On 2023/08/29 4:32, Dragan Stancevic wrote:
>>>>
>>>>> Just a follow-up on 6.1 testing: I tried reproducing this problem for 5 days with the 6.1.42 kernel without your patches, and I was not able to reproduce it.
>>>
>>> Oops, I forgot that you need to backport this patch first to reproduce
>>> this problem:
>>>
>>> https://lore.kernel.org/all/20230529132037.2124527-2-yukuai1@huaweicloud.com/
>>>
>>> The patch fixes the deadlock as well, but it introduces some regressions.
> 
> We've just had an unplanned lockup on a "check" to "idle" transition with 6.1.52 after a few hours on a backup server. For the last 2 1/2 years we used the patch I originally proposed [1] with multiple kernel versions. But it no longer seems to be valid, or maybe it's even destructive in combination with the other changes.
> 
> But I have totally lost track of further development. As I understand it, there are patches queued up in mainline that should fix the problem and might go into 6.1, too, but have not landed there yet?
> 
> Can anyone give me exact references to the patches I'd need to apply to 6.1.52, so that I could hopefully fix my problem and also test the patches for you on production systems with a load that tends to run into this problem easily?

Here is a list of changes for 6.1:

e5e9b9cb71a0 md: factor out a helper to wake up md_thread directly
f71209b1f21c md: enhance checking in md_check_recovery()
753260ed0b46 md: wake up 'resync_wait' at last in md_reap_sync_thread()
130443d60b1b md: refactor idle/frozen_sync_thread() to fix deadlock
6f56f0c4f124 md: add a mutex to synchronize idle and frozen in action_store()
64e5e09afc14 md: refactor action_store() for 'idle' and 'frozen'
a865b96c513b Revert "md: unlock mddev before reap sync_thread in action_store"

You can get them from the following tree:
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
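
The commits above can be turned into a backport plan for a 6.1.52 tree. The Python sketch below only prints git commands and does not run them. It assumes (this is not stated in the thread) that the list is in `git log --oneline` order, newest first, so it reverses the list to cherry-pick the oldest commit first. The remote name `linux-next` and the helper `backport_commands` are hypothetical; conflicts against 6.1.52 may still need manual resolution.

```python
# Hypothetical helper: print a backport plan for the md commits listed above.
# Assumption: the list is in `git log --oneline` order (newest first),
# so it is reversed to apply the oldest commit first.

TREE_URL = "https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git"

COMMITS = [
    ("e5e9b9cb71a0", "md: factor out a helper to wake up md_thread directly"),
    ("f71209b1f21c", "md: enhance checking in md_check_recovery()"),
    ("753260ed0b46", "md: wake up 'resync_wait' at last in md_reap_sync_thread()"),
    ("130443d60b1b", "md: refactor idle/frozen_sync_thread() to fix deadlock"),
    ("6f56f0c4f124", "md: add a mutex to synchronize idle and frozen in action_store()"),
    ("64e5e09afc14", "md: refactor action_store() for 'idle' and 'frozen'"),
    ("a865b96c513b", 'Revert "md: unlock mddev before reap sync_thread in action_store"'),
]


def backport_commands(commits, remote="linux-next", url=TREE_URL):
    """Return git commands that fetch `url` and cherry-pick `commits` oldest first."""
    # `git remote add` fails if the remote already exists; skip it in that case.
    cmds = [f"git remote add {remote} {url}", f"git fetch {remote}"]
    for sha, subject in reversed(commits):
        # -x appends "(cherry picked from commit <sha>)" to the commit message.
        cmds.append(f"git cherry-pick -x {sha}  # {subject}")
    return cmds


if __name__ == "__main__":
    for cmd in backport_commands(COMMITS):
        print(cmd)
```

Run the printed commands inside a 6.1.52 checkout; keeping `-x` makes the upstream origin of every backported change traceable in the stable branch history.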


> 
> Thanks
> 
>    Donald
> 
> [1]: https://lore.kernel.org/linux-raid/bc342de0-98d2-1733-39cd-cc1999777ff3@molgen.mpg.de/
> 
>> Ha, jinx :) I was about to email you that, with testing over the weekend, I isolated the change that made it more difficult to reproduce in 6.1, and that the original change must be reverted :)
>>
>>
>>
>>>
>>> Thanks,
>>> Kuai
>>>
>>>>>
>>>>> It seems that 6.1 has some other code that prevents this from happening.
>>>>>
>>>>
>>>> I see that there are lots of raid456 patches between 5.10 and 6.1;
>>>> however, I remember reproducing the deadlock after 6.1, and it's
>>>> true that it is not easy to reproduce, see below:
>>>>
>>>> https://lore.kernel.org/linux-raid/e9067438-d713-f5f3-0d3d-9e6b0e9efa0e@huaweicloud.com/
>>>>
>>>> My guess is that the problem is harder to reproduce on 6.1 than on
>>>> 5.10 due to some changes inside raid456.
>>>>
>>>> By the way, raid10 had a similar deadlock, which can be fixed the same
>>>> way, so it makes sense to backport these patches.
>>>>
>>>> https://lore.kernel.org/r/20230529132037.2124527-5-yukuai1@huaweicloud.com
>>>>
>>>> Thanks,
>>>> Kuai
>>>>
>>>>
>>>>> On 5.10 I can reproduce it within minutes to an hour.
>>>>>
>>>>
>>>>
>>>
>>
> 
> 
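
The reproduction reports quoted above all revolve around the "check" to "idle" transition of the md `sync_action` attribute. As an illustration only (this is not the reproducer anyone in this thread used), a stress loop could toggle that attribute repeatedly while the array is under write load. The function name, the array name `md0`, the cycle count, and the settle delay are hypothetical, and writing the real sysfs file requires root. If the hang discussed here triggers, a write of "idle" may never return.

```python
import time
from pathlib import Path


def toggle_sync_action(sync_action: Path, cycles: int, settle: float = 1.0) -> list[str]:
    """Write "check" and then "idle" to an md sync_action file, `cycles` times.

    Returns the values written, in order, so a caller can log progress.
    """
    written = []
    for _ in range(cycles):
        for action in ("check", "idle"):
            # On a real array each write is handled by md's action_store();
            # "idle" asks the running check to stop.
            sync_action.write_text(action)
            written.append(action)
            time.sleep(settle)  # let the check start (or wind down) before the next write
    return written
```

Hypothetical usage, as root: `toggle_sync_action(Path("/sys/block/md0/md/sync_action"), cycles=1000)`. Passing the path in, rather than hard-coding it, keeps the loop testable against an ordinary file.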

-- 
Peace can only come as a natural consequence
of universal enlightenment -Dr. Nikola Tesla



Thread overview: 49+ messages
2020-11-28 12:25 md_raid: mdX_raid6 looping after sync_action "check" to "idle" transition Donald Buczek
2020-11-30  2:06 ` Guoqing Jiang
2020-12-01  9:29   ` Donald Buczek
2020-12-02 17:28     ` Donald Buczek
2020-12-03  1:55       ` Guoqing Jiang
2020-12-03 11:42         ` Donald Buczek
2020-12-21 12:33           ` Donald Buczek
2021-01-19 11:30             ` Donald Buczek
2021-01-20 16:33               ` Guoqing Jiang
2021-01-23 13:04                 ` Donald Buczek
2021-01-25  8:54                   ` Donald Buczek
2021-01-25 21:32                     ` Donald Buczek
2021-01-26  0:44                       ` Guoqing Jiang
2021-01-26  9:50                         ` Donald Buczek
2021-01-26 11:14                           ` Guoqing Jiang
2021-01-26 12:58                             ` Donald Buczek
2021-01-26 14:06                               ` Guoqing Jiang
2021-01-26 16:05                                 ` Donald Buczek
2021-02-02 15:42                                   ` Guoqing Jiang
2021-02-08 11:38                                     ` Donald Buczek
2021-02-08 14:53                                       ` Guoqing Jiang
2021-02-08 18:41                                         ` Donald Buczek
2021-02-09  0:46                                           ` Guoqing Jiang
2021-02-09  9:24                                             ` Donald Buczek
2023-03-14 13:25                                             ` Marc Smith
2023-03-14 13:55                                               ` Guoqing Jiang
2023-03-14 14:45                                                 ` Marc Smith
2023-03-16 15:25                                                   ` Marc Smith
2023-03-29  0:01                                                     ` Song Liu
2023-08-22 21:16                                                       ` Dragan Stancevic
2023-08-23  1:22                                                         ` Yu Kuai
2023-08-23 15:33                                                           ` Dragan Stancevic
2023-08-24  1:18                                                             ` Yu Kuai
2023-08-28 20:32                                                               ` Dragan Stancevic
2023-08-30  1:36                                                                 ` Yu Kuai
2023-09-05  3:50                                                                   ` Yu Kuai
2023-09-05 13:54                                                                     ` Dragan Stancevic
2023-09-13  9:08                                                                       ` Donald Buczek
2023-09-13 14:16                                                                         ` Dragan Stancevic [this message]
2023-09-14  6:03                                                                           ` Donald Buczek
2023-09-17  8:55                                                                             ` Donald Buczek
2023-09-24 14:35                                                                               ` Donald Buczek
2023-09-25  1:11                                                                                 ` Yu Kuai
2023-09-25  9:11                                                                                   ` Donald Buczek
2023-09-25  9:32                                                                                     ` Yu Kuai
2023-03-15  3:02                                                 ` Yu Kuai
2023-03-15  9:30                                                   ` Guoqing Jiang
2023-03-15  9:53                                                     ` Yu Kuai
2023-03-15  7:52                                               ` Donald Buczek
