From: Yu Kuai <yukuai3@huawei.com>
To: <agk@redhat.com>, <snitzer@kernel.org>, <mpatocka@redhat.com>,
	<dm-devel@lists.linux.dev>, <xni@redhat.com>, <song@kernel.org>,
	<yukuai3@huawei.com>, <jbrassow@f14.redhat.com>, <neilb@suse.de>,
	<heinzm@redhat.com>, <shli@fb.com>, <akpm@osdl.org>
Cc: <linux-kernel@vger.kernel.org>, <linux-raid@vger.kernel.org>,
	<yukuai1@huaweicloud.com>, <yi.zhang@huawei.com>,
	<yangerkun@huawei.com>
Subject: [PATCH v2 01/11] md: don't ignore suspended array in md_check_recovery()
Date: Wed, 24 Jan 2024 17:14:11 +0800
Message-ID: <20240124091421.1261579-2-yukuai3@huawei.com>
In-Reply-To: <20240124091421.1261579-1-yukuai3@huawei.com>

mddev_suspend() never stops sync_thread, hence it doesn't make sense to
ignore a suspended array in md_check_recovery(). Doing so can leave
sync_thread unable to be unregistered.

After commit f52f5c71f3d4 ("md: fix stopping sync thread"), the following
hang can be triggered by the test shell/integrity-caching.sh:

1) suspend the array:
raid_postsuspend
 mddev_suspend

2) stop the array:
raid_dtr
 md_stop
  __md_stop_writes
   stop_sync_thread
    set_bit(MD_RECOVERY_INTR, &mddev->recovery);
    md_wakeup_thread_directly(mddev->sync_thread);
    wait_event(..., !test_bit(MD_RECOVERY_RUNNING, &mddev->recovery))

3) sync thread done:
md_do_sync
 set_bit(MD_RECOVERY_DONE, &mddev->recovery);
 md_wakeup_thread(mddev->thread);

4) daemon thread can't unregister sync thread:
md_check_recovery
 if (mddev->suspended)
   return; -> returns early
 md_reap_sync_thread
 clear_bit(MD_RECOVERY_RUNNING, &mddev->recovery);
 -> MD_RECOVERY_RUNNING is never cleared, hence step 2 hangs
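
The four steps above can be modelled in user space. What follows is only a
minimal single-threaded sketch, not kernel code: struct model_mddev, the
MODEL_RECOVERY_* bits and every model_*() function are hypothetical
stand-ins, and the real wait_event() is reduced to checking whether
MD_RECOVERY_RUNNING would have been cleared.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the MD_RECOVERY_* bits; values are illustrative. */
enum {
	MODEL_RECOVERY_RUNNING = 1 << 0,
	MODEL_RECOVERY_INTR    = 1 << 1,
	MODEL_RECOVERY_DONE    = 1 << 2,
};

/* Hypothetical, heavily reduced model of struct mddev. */
struct model_mddev {
	bool suspended;
	unsigned int recovery;
};

/* Step 2: the stopper sets INTR, then waits for RUNNING to be cleared. */
static void model_stop_sync_thread(struct model_mddev *m)
{
	m->recovery |= MODEL_RECOVERY_INTR;
}

/* Step 3: the sync thread sees INTR and reports that it is done. */
static void model_sync_thread_done(struct model_mddev *m)
{
	assert(m->recovery & MODEL_RECOVERY_INTR);
	m->recovery |= MODEL_RECOVERY_DONE;
}

/*
 * Step 4: the daemon thread reaps a finished sync thread.
 * skip_suspended == true models the early return this patch removes.
 */
static void model_check_recovery(struct model_mddev *m, bool skip_suspended)
{
	if (skip_suspended && m->suspended)
		return;

	if (m->recovery & MODEL_RECOVERY_DONE)
		m->recovery &= ~(MODEL_RECOVERY_RUNNING |
				 MODEL_RECOVERY_INTR |
				 MODEL_RECOVERY_DONE);
}

/* Returns true if the waiter from step 2 would be released. */
static bool model_stop_suspended_array(bool skip_suspended)
{
	struct model_mddev m = {
		.suspended = true,                   /* step 1 */
		.recovery  = MODEL_RECOVERY_RUNNING, /* sync thread is active */
	};

	model_stop_sync_thread(&m);                /* step 2 */
	model_sync_thread_done(&m);                /* step 3 */
	model_check_recovery(&m, skip_suspended);  /* step 4 */

	return !(m.recovery & MODEL_RECOVERY_RUNNING);
}
```

In this model, model_stop_suspended_array(true) returns false, which
corresponds to the hang, while model_stop_suspended_array(false) returns
true, which corresponds to the behaviour with the early return removed.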

This problem is not specific to dm-raid. Fix it by no longer ignoring a
suspended array in md_check_recovery(). Follow-up patches will improve
dm-raid to properly freeze the sync thread during suspend.

Reported-by: Mikulas Patocka <mpatocka@redhat.com>
Closes: https://lore.kernel.org/all/8fb335e-6d2c-dbb5-d7-ded8db5145a@redhat.com/
Fixes: 68866e425be2 ("MD: no sync IO while suspended")
Fixes: f52f5c71f3d4 ("md: fix stopping sync thread")
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 drivers/md/md.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 2266358d8074..07b80278eaa5 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -9469,9 +9469,6 @@ static void md_start_sync(struct work_struct *ws)
  */
 void md_check_recovery(struct mddev *mddev)
 {
-	if (READ_ONCE(mddev->suspended))
-		return;
-
 	if (mddev->bitmap)
 		md_bitmap_daemon_work(mddev);
 
-- 
2.39.2

