From: AceLan Kao <acelan@gmail.com>
To: Guoqing Jiang <guoqing.jiang@linux.dev>
Cc: Song Liu <song@kernel.org>,
	Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>,
	 Bagas Sanjaya <bagasdotme@gmail.com>,
	Christoph Hellwig <hch@lst.de>,
	 Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	 Linux Regressions <regressions@lists.linux.dev>,
	Linux RAID <linux-raid@vger.kernel.org>
Subject: Re: Infinite systemd loop when powering off the machine with multiple MD RAIDs
Date: Wed, 23 Aug 2023 16:02:46 +0800	[thread overview]
Message-ID: <CAMz9Wg_zKSJ2vL=r2zAtLBOv4GSMT63+ZQGXfYTjVJsE+DLQGA@mail.gmail.com> (raw)
In-Reply-To: <35130b3f-c0fd-e2d6-e849-a5ceb6a2895f@linux.dev>

Hi Guoqing,

Guoqing Jiang <guoqing.jiang@linux.dev> wrote on Tue, Aug 22, 2023 at 8:41 PM:
>
> Hi Acelan,
>
> On 8/22/23 16:13, AceLan Kao wrote:
> >>>>> Hello,
> >>>>> The issue is reproducible with IMSM metadata too, around 20% of reboot
> >>>>> hangs. I will try to raise the priority in the bug because it is valid
> >>>>> high- the base functionality of the system is affected.
> >>>> Since it is reproducible from your side, is it possible to turn the
> >>>> reproduction steps into a test case, given the importance?
> >> I haven't tried to reproduce it locally yet because the customer was able
> >> to bisect the regression and it pointed them to the same patch, so I
> >> connected the two and asked the author to take a look first. At first
> >> glance, I wanted to get the community's view on whether it is something obvious.
> >>
> >> As far as I know, the customer creates 3 IMSM RAID arrays, one of which is
> >> the system volume, then reboots, and it sporadically fails (around 20% of the time). That is all.
> >>
> >>>> I guess if all arrays are set with the MD_DELETED flag, then reboot
> >>>> might hang. Not sure whether the change below (maybe we need to flush
> >>>> the wq as well before list_del) helps or not, just FYI.
> >>>>
> >>>> @@ -9566,8 +9566,10 @@ static int md_notify_reboot(struct notifier_block
> >>>> *this,
> >>>>
> >>>>           spin_lock(&all_mddevs_lock);
> >>>>           list_for_each_entry_safe(mddev, n, &all_mddevs, all_mddevs) {
> >>>> -               if (!mddev_get(mddev))
> >>>> +               if (!mddev_get(mddev)) {
> >>>> +                       list_del(&mddev->all_mddevs);
> >>>>                           continue;
> >>>> +               }
>
> My suggestion is to delete the list node in this scenario; did you try the
> above?
Still no luck; the patch doesn't work, and the symptom is the same.

>
> >>> I am still not able to reproduce this, probably due to differences in the
> >>> timing. Maybe we only need something like:
> >>>
> >>> diff --git i/drivers/md/md.c w/drivers/md/md.c
> >>> index 5c3c19b8d509..ebb529b0faf8 100644
> >>> --- i/drivers/md/md.c
> >>> +++ w/drivers/md/md.c
> >>> @@ -9619,8 +9619,10 @@ static int md_notify_reboot(struct notifier_block
> >>> *this,
> >>>
> >>>          spin_lock(&all_mddevs_lock);
> >>>          list_for_each_entry_safe(mddev, n, &all_mddevs, all_mddevs) {
> >>> -               if (!mddev_get(mddev))
> >>> +               if (!mddev_get(mddev)) {
> >>> +                       need_delay = 1;
> >>>                          continue;
> >>> +               }
> >>>                  spin_unlock(&all_mddevs_lock);
> >>>                  if (mddev_trylock(mddev)) {
> >>>                          if (mddev->pers)
> >>>
> >>>
> >>> Thanks,
> >>> Song
> >> I will try to reproduce issue at Intel lab to check this.
> >>
> >> Thanks,
> >> Mariusz
> > Hi Guoqing,
> >
> > Here is the command I use to trigger the issue; I have to run it around 10
> > times to make sure the issue is reproducible:
> >
> > echo "repair" | sudo tee /sys/class/block/md12?/md/sync_action && sudo
> > grub-reboot "Advanced options for Ubuntu>Ubuntu, with Linux 6.5.0-rc77
> > 06a74159504-dirty" && head -c 1G < /dev/urandom > myfile1 && sleep 180
> > && head -c 1G < /dev/urandom > myfile2 && sleep 1 && cat /proc/mdstat
> > && sleep 1 && rm myfile1 &&
> > sudo reboot
>
> Is the issue still reproducible if you remove the below from the command?
>
> echo "repair" | sudo tee /sys/class/block/md12?/md/sync_action
>
> Just want to know whether the resync thread is related to the issue or not.
Probably not; we can reproduce the issue without resync. It just feels easier
to reproduce with it, so I kept that command in my script.

>
> > And the patch to add need_delay doesn't work.
>
> My assumption is that mddev_get() always returns NULL, so setting need_delay
> wouldn't help.
>
> Thanks,
> Guoqing



-- 
Chia-Lin Kao(AceLan)
http://blog.acelan.idv.tw/
E-Mail: acelan.kaoATcanonical.com (s/AT/@/)

Thread overview: 34+ messages
2023-08-16  9:37 Fwd: Infinite systemd loop when powering off the machine with multiple MD RAIDs Bagas Sanjaya
2023-08-18  8:16 ` Mariusz Tkaczyk
2023-08-18  9:21   ` Hannes Reinecke
2023-08-21  3:23     ` AceLan Kao
2023-08-22  3:51   ` Guoqing Jiang
2023-08-22  6:17     ` Song Liu
2023-08-22  6:39       ` Mariusz Tkaczyk
2023-08-22  8:13         ` AceLan Kao
2023-08-22 12:41           ` Guoqing Jiang
2023-08-23  8:02             ` AceLan Kao [this message]
2023-08-23 13:25               ` Song Liu
2023-08-26  4:31                 ` AceLan Kao
2023-08-28  5:20                   ` Song Liu
2023-08-28 10:48                     ` AceLan Kao
2023-08-29  3:12                       ` AceLan Kao
2023-08-28 13:50                     ` Yu Kuai
2023-08-31  2:28                       ` Yu Kuai
2023-08-31  6:50                         ` Mariusz Tkaczyk
2023-09-06  6:26                           ` AceLan Kao
2023-09-06 10:27                             ` Mariusz Tkaczyk
2023-09-07  2:04                               ` Yu Kuai
2023-09-07 10:18                                 ` Mariusz Tkaczyk
2023-09-07 11:26                                   ` Yu Kuai
2023-09-07 12:14                                     ` Yu Kuai
2023-09-07 12:41                                       ` Mariusz Tkaczyk
2023-09-07 12:53                                         ` Yu Kuai
2023-09-07 15:09                                           ` Mariusz Tkaczyk
2023-09-08 20:25                                             ` Song Liu
2023-08-21 13:18 ` Fwd: " Yu Kuai
2023-08-22  1:39   ` AceLan Kao
2023-08-22 18:56 ` Song Liu
2023-08-22 19:13   ` Carlos Carvalho
2023-08-23  1:28     ` Yu Kuai
2023-08-23  6:04       ` Hannes Reinecke
