From: Song Liu <song@kernel.org>
To: Dan Moulding <dan@danm.net>
Cc: regressions@lists.linux.dev, linux-raid@vger.kernel.org,
	 linux-kernel@vger.kernel.org, stable@vger.kernel.org,
	 Junxiao Bi <junxiao.bi@oracle.com>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	 Yu Kuai <yukuai1@huaweicloud.com>
Subject: Re: [REGRESSION] 6.7.1: md: raid5 hang and unresponsive system; successfully bisected
Date: Mon, 22 Jan 2024 17:08:06 -0800
Message-ID: <CAPhsuW4WSfcJWjYt56eCamgc1nqyQ1gxFc=6-2DV-NJs9wroeg@mail.gmail.com>
In-Reply-To: <20240123005700.9302-1-dan@danm.net>

On Mon, Jan 22, 2024 at 4:57 PM Dan Moulding <dan@danm.net> wrote:
>
> After upgrading from 6.7.0 to 6.7.1 a couple of my systems with md
> RAID-5 arrays started experiencing hangs. It starts with some
> processes which write to the array getting stuck. The whole system
> eventually becomes unresponsive and unclean shutdown must be performed
> (poweroff and reboot don't work).
>
> While trying to diagnose the issue, I noticed that the md0_raid5
> kernel thread consumes 100% CPU after the issue occurs. No relevant
> warnings or errors were found in dmesg.
>
> On 6.7.1, I can reproduce the issue somewhat reliably by copying a
> large amount of data to the array. I am unable to reproduce the issue
> at all on 6.7.0. The bisection was a bit difficult since I don't have
> a 100% reliable method to reproduce the problem, but with some
> perseverance I eventually managed to whittle it down to commit
> 0de40f76d567 ("Revert "md/raid5: Wait for MD_SB_CHANGE_PENDING in
> raid5d""). After reverting that commit (i.e. reapplying the reverted
> commit) on top of 6.7.1 I can no longer reproduce the problem at all.
>
> Some details that might be relevant:
> - Both systems are running MD RAID-5 with a journal device.
> - mdadm in monitor mode is always running on both systems.
> - Both systems were previously running 6.7.0 and earlier just fine.
> - The older of the two systems has been running a raid5 array without
>   incident for many years (kernel going back to at least 5.1) -- this
>   is the first raid5 issue it has encountered.
>
> Please let me know if there is any other helpful information that I
> might be able to provide.

Thanks for the report, and sorry for the problem.

We are looking into some regressions that are probably related to this.
We will fix the issue ASAP.

Song

Thread overview: 56+ messages
2024-01-23  0:56 [REGRESSION] 6.7.1: md: raid5 hang and unresponsive system; successfully bisected Dan Moulding
2024-01-23  1:08 ` Song Liu [this message]
2024-01-23  1:35 ` Dan Moulding
2024-01-23  6:35   ` Song Liu
2024-01-23 21:53     ` Dan Moulding
2024-01-23 22:21       ` Song Liu
2024-01-23 23:58         ` Dan Moulding
2024-01-25  0:01           ` Song Liu
2024-01-25 16:44             ` junxiao.bi
2024-01-25 19:40               ` Song Liu
2024-01-25 20:31               ` Dan Moulding
2024-01-26  3:30                 ` Carlos Carvalho
2024-01-26 15:46                   ` Dan Moulding
2024-01-30 16:26                     ` Blazej Kucman
2024-01-30 20:21                       ` Song Liu
2024-01-31  1:26                       ` Song Liu
2024-01-31  2:13                         ` Yu Kuai
2024-01-31  2:41                       ` Yu Kuai
2024-01-31  4:55                         ` Song Liu
2024-01-31 13:36                           ` Blazej Kucman
2024-02-01  1:39                             ` Yu Kuai
2024-01-26 16:21                   ` Roman Mamedov
2024-01-31 17:37                 ` junxiao.bi
2024-02-06  8:07                 ` Song Liu
2024-02-06 20:56                   ` Dan Moulding
2024-02-06 21:34                     ` Song Liu
2024-02-20 23:06 ` Dan Moulding
2024-02-20 23:15   ` junxiao.bi
2024-02-21 14:50     ` Mateusz Kusiak
2024-02-21 19:15       ` junxiao.bi
2024-02-23 17:44     ` Dan Moulding
2024-02-23 19:18       ` junxiao.bi
2024-02-23 20:22         ` Dan Moulding
2024-02-23  8:07   ` Linux regression tracking (Thorsten Leemhuis)
2024-02-24  2:13     ` Song Liu
2024-02-25 17:46       ` Thomas B. Clark
2024-02-26  1:17         ` Thomas B. Clark
2024-02-26 17:35           ` Song Liu
2024-03-01 20:26       ` junxiao.bi
2024-03-01 23:12         ` Dan Moulding
2024-03-02  0:05           ` Song Liu
2024-03-06  8:38             ` Linux regression tracking (Thorsten Leemhuis)
2024-03-06 17:13               ` Song Liu
2024-03-02 16:55         ` Dan Moulding
2024-03-07  3:34         ` Yu Kuai
2024-03-08 23:49         ` junxiao.bi
2024-03-10  5:13           ` Dan Moulding
2024-03-11  1:50           ` Yu Kuai
2024-03-12 22:56             ` junxiao.bi
2024-03-13  1:20               ` Yu Kuai
2024-03-14 18:20                 ` junxiao.bi
2024-03-14 22:36                   ` Song Liu
2024-03-15  1:30                   ` Yu Kuai
2024-03-14 16:12             ` Dan Moulding
2024-03-15  1:17               ` Yu Kuai
2024-03-19 14:16                 ` Dan Moulding
