From: Song Liu <song@kernel.org>
To: Guoqing Jiang <guoqing.jiang@linux.dev>
Cc: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>,
	Bagas Sanjaya <bagasdotme@gmail.com>,
	 Christoph Hellwig <hch@lst.de>, AceLan Kao <acelan@gmail.com>,
	 Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	 Linux Regressions <regressions@lists.linux.dev>,
	Linux RAID <linux-raid@vger.kernel.org>
Subject: Re: Infiniate systemd loop when power off the machine with multiple MD RAIDs
Date: Mon, 21 Aug 2023 23:17:54 -0700
Message-ID: <CAPhsuW6cSLqwRVO_EpFyimvc7hgi1rb3T8-NA+stHdwrqrScBA@mail.gmail.com>
In-Reply-To: <b0488ff7-10c8-4b4e-28b8-01809133c297@linux.dev>

On Mon, Aug 21, 2023 at 8:51 PM Guoqing Jiang <guoqing.jiang@linux.dev> wrote:
>
>
>
> On 8/18/23 16:16, Mariusz Tkaczyk wrote:
> > On Wed, 16 Aug 2023 16:37:26 +0700
> > Bagas Sanjaya<bagasdotme@gmail.com>  wrote:
> >
> >> Hi,
> >>
> >> I notice a regression report on Bugzilla [1]. Quoting from it:
> >>
> >>> You need to build at least 2 different RAIDs (e.g. RAID0 and RAID10, RAID5
> >>> and RAID10) and then you will see the error below repeatedly (you need to
> >>> use a serial console to see it)
> >>>
> >>> [ 205.360738] systemd-shutdown[1]: Stopping MD devices.
> >>> [ 205.366384] systemd-shutdown[1]: sd-device-enumerator: Scan all dirs
> >>> [ 205.373327] systemd-shutdown[1]: sd-device-enumerator: Scanning /sys/bus
> >>> [ 205.380427] systemd-shutdown[1]: sd-device-enumerator: Scanning /sys/class
> >>> [ 205.388257] systemd-shutdown[1]: Stopping MD /dev/md127 (9:127).
> >>> [ 205.394880] systemd-shutdown[1]: Failed to sync MD block device /dev/md127, ignoring: Input/output error
> >>> [ 205.404975] md: md127 stopped.
> >>> [ 205.470491] systemd-shutdown[1]: Stopping MD /dev/md126 (9:126).
> >>> [ 205.770179] md: md126: resync interrupted.
> >>> [ 205.776258] md126: detected capacity change from 1900396544 to 0
> >>> [ 205.783349] md: md126 stopped.
> >>> [ 205.862258] systemd-shutdown[1]: Stopping MD /dev/md125 (9:125).
> >>> [ 205.862435] md: md126 stopped.
> >>> [ 205.868376] systemd-shutdown[1]: Failed to sync MD block device /dev/md125, ignoring: Input/output error
> >>> [ 205.872845] block device autoloading is deprecated and will be removed.
> >>> [ 205.880955] md: md125 stopped.
> >>> [ 205.934349] systemd-shutdown[1]: Stopping MD /dev/md124p2 (259:7).
> >>> [ 205.947707] systemd-shutdown[1]: Could not stop MD /dev/md124p2: Device or resource busy
> >>> [ 205.957004] systemd-shutdown[1]: Stopping MD /dev/md124p1 (259:6).
> >>> [ 205.964177] systemd-shutdown[1]: Could not stop MD /dev/md124p1: Device or resource busy
> >>> [ 205.973155] systemd-shutdown[1]: Stopping MD /dev/md124 (9:124).
> >>> [ 205.979789] systemd-shutdown[1]: Could not stop MD /dev/md124: Device or resource busy
> >>> [ 205.988475] systemd-shutdown[1]: Not all MD devices stopped, 4 left.
> >> See Bugzilla for the full thread and attached full journalctl log.
> >>
> >> Anyway, I'm adding this regression to be tracked by regzbot:
> >>
> >> #regzbot introduced: 12a6caf273240a https://bugzilla.kernel.org/show_bug.cgi?id=217798
> >> #regzbot title: systemd shutdown hang on machine with different RAID levels
> >>
> >> Thanks.
> >>
> >> [1]:https://bugzilla.kernel.org/show_bug.cgi?id=217798
> >>
> > Hello,
> > The issue is reproducible with IMSM metadata too; around 20% of reboots hang.
> > I will try to raise the priority in the bug because it is a valid,
> > high-priority issue: the base functionality of the system is affected.
>
> Since it is reproducible on your side, is it possible to turn the
> reproduction steps into a test case, given the importance?
>
> I guess if all arrays are set with the MD_DELETED flag, then reboot might
> hang. I am not sure whether the change below helps or not (maybe we need to
> flush the wq as well before list_del); just FYI.
>
> @@ -9566,8 +9566,10 @@ static int md_notify_reboot(struct notifier_block *this,
>
>          spin_lock(&all_mddevs_lock);
>          list_for_each_entry_safe(mddev, n, &all_mddevs, all_mddevs) {
> -               if (!mddev_get(mddev))
> +               if (!mddev_get(mddev)) {
> +                       list_del(&mddev->all_mddevs);
>                          continue;
> +               }

I am still not able to reproduce this, probably due to differences in the
timing. Maybe we only need something like:

diff --git i/drivers/md/md.c w/drivers/md/md.c
index 5c3c19b8d509..ebb529b0faf8 100644
--- i/drivers/md/md.c
+++ w/drivers/md/md.c
@@ -9619,8 +9619,10 @@ static int md_notify_reboot(struct notifier_block *this,

        spin_lock(&all_mddevs_lock);
        list_for_each_entry_safe(mddev, n, &all_mddevs, all_mddevs) {
-               if (!mddev_get(mddev))
+               if (!mddev_get(mddev)) {
+                       need_delay = 1;
                        continue;
+               }
                spin_unlock(&all_mddevs_lock);
                if (mddev_trylock(mddev)) {
                        if (mddev->pers)
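
To make the intent of the hunk above easier to follow, here is a minimal
userspace sketch of the loop, not kernel code. Everything prefixed with
fake_ is hypothetical and only models the idea: an array for which
mddev_get() fails (for example because it is flagged MD_DELETED) is skipped,
and with the proposed change it still sets need_delay. The sketch assumes
that need_delay is later used to decide whether to wait before reboot
continues, which the quoted hunk does not show.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct mddev; only what the sketch needs. */
struct fake_mddev {
	const char *name;
	bool deleted;	/* models the MD_DELETED flag */
	bool has_pers;	/* models mddev->pers != NULL */
	int active;	/* models the reference count */
};

/* Models mddev_get(): refuse a reference on an array marked deleted. */
static bool fake_mddev_get(struct fake_mddev *m)
{
	if (m->deleted)
		return false;
	m->active++;
	return true;
}

/* Models mddev_put(): drop the reference taken above. */
static void fake_mddev_put(struct fake_mddev *m)
{
	m->active--;
}

/*
 * Models the loop in md_notify_reboot(). delay_on_deleted selects between
 * the old behaviour (false: skipped arrays do not request a delay) and the
 * proposed one (true: skipped arrays set need_delay as well).
 * Returns the resulting need_delay value.
 */
static int fake_notify_reboot(struct fake_mddev *devs, size_t n,
			      bool delay_on_deleted)
{
	int need_delay = 0;

	for (size_t i = 0; i < n; i++) {
		struct fake_mddev *m = &devs[i];

		if (!fake_mddev_get(m)) {
			if (delay_on_deleted)
				need_delay = 1;
			continue;
		}
		if (m->has_pers)
			m->has_pers = false;	/* stand-in for stopping writes */
		need_delay = 1;
		fake_mddev_put(m);
	}
	return need_delay;
}
```

When every array in the list is already marked deleted, the old behaviour
yields need_delay == 0 while the proposed one yields 1; with at least one
live array both variants request the delay.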


Thanks,
Song
