From: Yu Kuai <yukuai1@huaweicloud.com>
To: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>,
	AceLan Kao <acelan@gmail.com>
Cc: Yu Kuai <yukuai1@huaweicloud.com>, Song Liu <song@kernel.org>,
	Guoqing Jiang <guoqing.jiang@linux.dev>,
	Bagas Sanjaya <bagasdotme@gmail.com>,
	Christoph Hellwig <hch@lst.de>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Linux Regressions <regressions@lists.linux.dev>,
	Linux RAID <linux-raid@vger.kernel.org>,
	"yukuai (C)" <yukuai3@huawei.com>,
	"yangerkun@huawei.com" <yangerkun@huawei.com>
Subject: Re: Infinite systemd loop when power off the machine with multiple MD RAIDs
Date: Thu, 7 Sep 2023 10:04:11 +0800
Message-ID: <43b0b2f4-17c0-61d2-9c41-0595fb6f2efc@huaweicloud.com>
In-Reply-To: <20230906122751.00001e5b@linux.intel.com>

Hi,

On 2023/09/06 18:27, Mariusz Tkaczyk wrote:
> On Wed, 6 Sep 2023 14:26:30 +0800
> AceLan Kao <acelan@gmail.com> wrote:
> 
>> From previous testing, I don't think this is an issue in systemd, so I
>> did a simple test and found that the issue disappears.
>> You only need to add a small delay in md_release(), and then the issue
>> can no longer be reproduced.
>>
>> diff --git a/drivers/md/md.c b/drivers/md/md.c
>> index 78be7811a89f..ef47e34c1af5 100644
>> --- a/drivers/md/md.c
>> +++ b/drivers/md/md.c
>> @@ -7805,6 +7805,7 @@ static void md_release(struct gendisk *disk)
>> {
>>         struct mddev *mddev = disk->private_data;
>>
>> +       msleep(10);
>>         BUG_ON(!mddev);
>>         atomic_dec(&mddev->openers);
>>         mddev_put(mddev);
> 
> I have a repro and I tested it on my setup. It is not working for me.
> My setup may be more "advanced", to maximize the chance of reproduction:
> 
> # cat /proc/mdstat
> Personalities : [raid1] [raid6] [raid5] [raid4] [raid10] [raid0]
> md121 : active raid0 nvme2n1[1] nvme5n1[0]
>        7126394880 blocks super external:/md127/0 128k chunks
> 
> md122 : active raid10 nvme6n1[3] nvme4n1[2] nvme1n1[1] nvme7n1[0]
>        104857600 blocks super external:/md126/0 64K chunks 2 near-copies [4/4]
> [UUUU]
> 
> md123 : active raid5 nvme6n1[3] nvme4n1[2] nvme1n1[1] nvme7n1[0]
>        2655765504 blocks super external:/md126/1 level 5, 32k chunk, algorithm 0
> [4/4] [UUUU]
> 
> md124 : active raid1 nvme0n1[1] nvme3n1[0]
>        99614720 blocks super external:/md125/0 [2/2] [UU]
> 
> md125 : inactive nvme3n1[1](S) nvme0n1[0](S)
>        10402 blocks super external:imsm
> 
> md126 : inactive nvme7n1[3](S) nvme1n1[2](S) nvme6n1[1](S) nvme4n1[0](S)
>        20043 blocks super external:imsm
> 
> md127 : inactive nvme2n1[1](S) nvme5n1[0](S)
>        10402 blocks super external:imsm
> 
> I have an almost 99% repro ratio; slowly moving forward...
> 
> It is an endless loop because systemd-shutdown sends the "stop_array" ioctl,
> which returns success, but the array is not actually stopped. For that reason
> it sets "changed = true".

How does systemd-shutdown judge whether the array is stopped? By reading
/proc/mdstat, listing /dev/md*, or some other way?
> 
> systemd-shutdown sees the change and retries, to check whether there is
> something else that can be stopped now, and again, and again...
> 
> I will check what is returned first; it could be 0 or a positive errno
> (nit?), because systemd only cares about "if (r < 0)".
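
Just to check that I follow the loop you describe, below is a minimal
user-space sketch of that pattern. It is illustrative only, not the actual
systemd-shutdown source: stop_md_array() and detach_all_md() are made-up
names, and I am assuming the loop treats any successful STOP_ARRAY as
"changed". STOP_ARRAY itself is the md ioctl from <linux/raid/md_u.h>.

#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/raid/md_u.h>	/* STOP_ARRAY */

/* Hypothetical helper: try to stop one MD array by device path. */
static int stop_md_array(const char *path)
{
	int fd = open(path, O_RDONLY | O_CLOEXEC | O_EXCL);
	if (fd < 0)
		return -errno;

	int r = ioctl(fd, STOP_ARRAY, NULL);
	int saved_errno = errno;
	close(fd);
	return r < 0 ? -saved_errno : 0;
}

/* Hypothetical retry loop: rescan while anything "changed". If
 * STOP_ARRAY keeps returning 0 while the array stays visible,
 * changed stays true and this never terminates. */
static void detach_all_md(const char *paths[], int n)
{
	bool changed;

	do {
		changed = false;
		for (int i = 0; i < n; i++)
			if (stop_md_array(paths[i]) >= 0)
				changed = true;
	} while (changed);
}

If that matches what systemd-shutdown does, then STOP_ARRAY returning 0
without the array really going away is enough to explain the endless loop.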

I did notice that there are lots of log messages saying md123 stopped:

[ 1371.834034] md122:systemd-shutdow bd_prepare_to_claim return -16
[ 1371.840294] md122:systemd-shutdow blkdev_get_by_dev return -16
[ 1371.846845] md: md123 stopped.
[ 1371.850155] md122:systemd-shutdow bd_prepare_to_claim return -16
[ 1371.856411] md122:systemd-shutdow blkdev_get_by_dev return -16
[ 1371.862941] md: md123 stopped.

And md_ioctl() -> do_md_stop() doesn't have an error path after printing this
log, hence 0 will be returned to userspace.
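
For reference, a condensed sketch of the end of do_md_stop() as I read it
(paraphrased, elided code marked with /* ... */; not a verbatim copy of
drivers/md/md.c):

	if (mode == 0) {	/* STOP_ARRAY: fully stop the array */
		pr_info("md: %s stopped.\n", mdname(mddev));
		/* ... detach sysfs state, export_array(mddev),
		 * md_clean(mddev) ... */
	}
	/* ... */
	return 0;	/* no error path after the pr_info() above */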

The normal case is:

open md123
ioctl STOP_ARRAY -> all rdev should be removed from array
close md123 -> mddev will finally be freed by:
	md_release
	 mddev_put
	  set_bit(MD_DELETED, &mddev->flags) -> user should not see this mddev
	  queue_work(md_misc_wq, &mddev->del_work)

	mddev_delayed_delete
	 kobject_put(&mddev->kobj)

	md_kobj_release
	 del_gendisk
	  md_free_disk
	   mddev_free
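
To make the MD_DELETED step above concrete, here is a condensed paraphrase
of mddev_put() (simplified from my reading of md.c, not verbatim):

static void mddev_put(struct mddev *mddev)
{
	if (!atomic_dec_and_lock(&mddev->active, &all_mddevs_lock))
		return;

	if (!mddev->raid_disks && list_empty(&mddev->disks) &&
	    mddev->ctime == 0 && !mddev->hold_active) {
		/* Not configured and not held active: mark the mddev
		 * deleted so new lookups skip it, then tear it down
		 * asynchronously on md_misc_wq. */
		set_bit(MD_DELETED, &mddev->flags);
		INIT_WORK(&mddev->del_work, mddev_delayed_delete);
		queue_work(md_misc_wq, &mddev->del_work);
	}
	spin_unlock(&all_mddevs_lock);
}

So after the last close, the mddev is only marked deleted and queued; the
actual del_gendisk() happens later in the workqueue.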

Now that you can reproduce this problem 99% of the time, can you dig deeper
and find out what is wrong?

Thanks,
Kuai

> 
> Thanks,
> Mariusz

