linux-kernel.vger.kernel.org archive mirror
* [PATCH -next v2 0/6] md: fix that MD_RECOVERY_RUNNING can be cleared while sync_thread is still running
@ 2023-05-29 13:20 Yu Kuai
  2023-05-29 13:20 ` [PATCH -next v2 1/6] Revert "md: unlock mddev before reap sync_thread in action_store" Yu Kuai
                   ` (6 more replies)
  0 siblings, 7 replies; 37+ messages in thread
From: Yu Kuai @ 2023-05-29 13:20 UTC (permalink / raw)
  To: guoqing.jiang, agk, snitzer, dm-devel, song
  Cc: linux-kernel, linux-raid, yukuai3, yukuai1, yi.zhang, yangerkun

From: Yu Kuai <yukuai3@huawei.com>

Changes in v2:
 - rebase for the latest md-next

Patch 1 reverts the commit because it causes MD_RECOVERY_RUNNING to be
cleared while sync_thread is still running. The deadlock that commit tried
to fix will be fixed by patches 2-5.

Patch 6 enhances checking to prevent MD_RECOVERY_RUNNING from being cleared
while sync_thread is still running.

Yu Kuai (6):
  Revert "md: unlock mddev before reap sync_thread in action_store"
  md: refactor action_store() for 'idle' and 'frozen'
  md: add a mutex to synchronize idle and frozen in action_store()
  md: refactor idle/frozen_sync_thread() to fix deadlock
  md: wake up 'resync_wait' at last in md_reap_sync_thread()
  md: enhance checking in md_check_recovery()

 drivers/md/dm-raid.c |   1 -
 drivers/md/md.c      | 124 +++++++++++++++++++++++++++++--------------
 drivers/md/md.h      |   5 ++
 3 files changed, 88 insertions(+), 42 deletions(-)

-- 
2.39.2


^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH -next v2 1/6] Revert "md: unlock mddev before reap sync_thread in action_store"
  2023-05-29 13:20 [PATCH -next v2 0/6] md: fix that MD_RECOVERY_RUNNING can be cleared while sync_thread is still running Yu Kuai
@ 2023-05-29 13:20 ` Yu Kuai
  2023-06-13  6:25   ` Xiao Ni
  2023-05-29 13:20 ` [PATCH -next v2 2/6] md: refactor action_store() for 'idle' and 'frozen' Yu Kuai
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 37+ messages in thread
From: Yu Kuai @ 2023-05-29 13:20 UTC (permalink / raw)
  To: guoqing.jiang, agk, snitzer, dm-devel, song
  Cc: linux-kernel, linux-raid, yukuai3, yukuai1, yi.zhang, yangerkun

From: Yu Kuai <yukuai3@huawei.com>

This reverts commit 9dfbdafda3b34e262e43e786077bab8e476a89d1.

Because it introduces a defect where sync_thread can be running while
MD_RECOVERY_RUNNING is cleared, which will cause unexpected problems,
for example:

list_add corruption. prev->next should be next (ffff0001ac1daba0), but was ffff0000ce1a02a0. (prev=ffff0000ce1a02a0).
Call trace:
 __list_add_valid+0xfc/0x140
 insert_work+0x78/0x1a0
 __queue_work+0x500/0xcf4
 queue_work_on+0xe8/0x12c
 md_check_recovery+0xa34/0xf30
 raid10d+0xb8/0x900 [raid10]
 md_thread+0x16c/0x2cc
 kthread+0x1a4/0x1ec
 ret_from_fork+0x10/0x18

This is because the work is requeued while it is still inside the workqueue:

t1:			t2:
action_store
 mddev_lock
  if (mddev->sync_thread)
   mddev_unlock
   md_unregister_thread
   // first sync_thread is done
			md_check_recovery
			 mddev_try_lock
			 /*
			  * once MD_RECOVERY_DONE is set, new sync_thread
			  * can start.
			  */
			 set_bit(MD_RECOVERY_RUNNING, &mddev->recovery)
			 INIT_WORK(&mddev->del_work, md_start_sync)
			 queue_work(md_misc_wq, &mddev->del_work)
			  test_and_set_bit(WORK_STRUCT_PENDING_BIT, ...)
			  // set pending bit
			  insert_work
			   list_add_tail
			 mddev_unlock
   mddev_lock_nointr
   md_reap_sync_thread
   // MD_RECOVERY_RUNNING is cleared
 mddev_unlock

t3:

// before queued work started from t2
md_check_recovery
 // MD_RECOVERY_RUNNING is not set, a new sync_thread can be started
 INIT_WORK(&mddev->del_work, md_start_sync)
  work->data = 0
  // work pending bit is cleared
 queue_work(md_misc_wq, &mddev->del_work)
  insert_work
   list_add_tail
   // list is corrupted

The above commit is reverted to fix the problem; the deadlock that commit
tried to fix will be fixed in the following patches.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 drivers/md/dm-raid.c |  1 -
 drivers/md/md.c      | 19 ++-----------------
 2 files changed, 2 insertions(+), 18 deletions(-)

diff --git a/drivers/md/dm-raid.c b/drivers/md/dm-raid.c
index 8846bf510a35..1f22bef27841 100644
--- a/drivers/md/dm-raid.c
+++ b/drivers/md/dm-raid.c
@@ -3725,7 +3725,6 @@ static int raid_message(struct dm_target *ti, unsigned int argc, char **argv,
 	if (!strcasecmp(argv[0], "idle") || !strcasecmp(argv[0], "frozen")) {
 		if (mddev->sync_thread) {
 			set_bit(MD_RECOVERY_INTR, &mddev->recovery);
-			md_unregister_thread(&mddev->sync_thread);
 			md_reap_sync_thread(mddev);
 		}
 	} else if (decipher_sync_action(mddev, mddev->recovery) != st_idle)
diff --git a/drivers/md/md.c b/drivers/md/md.c
index a5a7af2f4e59..9b97731e1fe4 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -4772,19 +4772,6 @@ action_store(struct mddev *mddev, const char *page, size_t len)
 			if (work_pending(&mddev->del_work))
 				flush_workqueue(md_misc_wq);
 			if (mddev->sync_thread) {
-				sector_t save_rp = mddev->reshape_position;
-
-				mddev_unlock(mddev);
-				set_bit(MD_RECOVERY_INTR, &mddev->recovery);
-				md_unregister_thread(&mddev->sync_thread);
-				mddev_lock_nointr(mddev);
-				/*
-				 * set RECOVERY_INTR again and restore reshape
-				 * position in case others changed them after
-				 * got lock, eg, reshape_position_store and
-				 * md_check_recovery.
-				 */
-				mddev->reshape_position = save_rp;
 				set_bit(MD_RECOVERY_INTR, &mddev->recovery);
 				md_reap_sync_thread(mddev);
 			}
@@ -6184,7 +6171,6 @@ static void __md_stop_writes(struct mddev *mddev)
 		flush_workqueue(md_misc_wq);
 	if (mddev->sync_thread) {
 		set_bit(MD_RECOVERY_INTR, &mddev->recovery);
-		md_unregister_thread(&mddev->sync_thread);
 		md_reap_sync_thread(mddev);
 	}
 
@@ -9336,7 +9322,6 @@ void md_check_recovery(struct mddev *mddev)
 			 * ->spare_active and clear saved_raid_disk
 			 */
 			set_bit(MD_RECOVERY_INTR, &mddev->recovery);
-			md_unregister_thread(&mddev->sync_thread);
 			md_reap_sync_thread(mddev);
 			clear_bit(MD_RECOVERY_RECOVER, &mddev->recovery);
 			clear_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
@@ -9372,7 +9357,6 @@ void md_check_recovery(struct mddev *mddev)
 			goto unlock;
 		}
 		if (mddev->sync_thread) {
-			md_unregister_thread(&mddev->sync_thread);
 			md_reap_sync_thread(mddev);
 			goto unlock;
 		}
@@ -9452,7 +9436,8 @@ void md_reap_sync_thread(struct mddev *mddev)
 	sector_t old_dev_sectors = mddev->dev_sectors;
 	bool is_reshaped = false;
 
-	/* sync_thread should be unregistered, collect result */
+	/* resync has finished, collect result */
+	md_unregister_thread(&mddev->sync_thread);
 	if (!test_bit(MD_RECOVERY_INTR, &mddev->recovery) &&
 	    !test_bit(MD_RECOVERY_REQUESTED, &mddev->recovery) &&
 	    mddev->degraded != mddev->raid_disks) {
-- 
2.39.2



* [PATCH -next v2 2/6] md: refactor action_store() for 'idle' and 'frozen'
  2023-05-29 13:20 [PATCH -next v2 0/6] md: fix that MD_RECOVERY_RUNNING can be cleared while sync_thread is still running Yu Kuai
  2023-05-29 13:20 ` [PATCH -next v2 1/6] Revert "md: unlock mddev before reap sync_thread in action_store" Yu Kuai
@ 2023-05-29 13:20 ` Yu Kuai
  2023-06-13  8:02   ` [dm-devel] " Xiao Ni
  2023-05-29 13:20 ` [PATCH -next v2 3/6] md: add a mutex to synchronize idle and frozen in action_store() Yu Kuai
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 37+ messages in thread
From: Yu Kuai @ 2023-05-29 13:20 UTC (permalink / raw)
  To: guoqing.jiang, agk, snitzer, dm-devel, song
  Cc: linux-kernel, linux-raid, yukuai3, yukuai1, yi.zhang, yangerkun

From: Yu Kuai <yukuai3@huawei.com>

Prepare to handle 'idle' and 'frozen' differently to fix a deadlock. There
are no functional changes, except that MD_RECOVERY_RUNNING is checked
again after 'reconfig_mutex' is held.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 drivers/md/md.c | 61 ++++++++++++++++++++++++++++++++++++-------------
 1 file changed, 45 insertions(+), 16 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 9b97731e1fe4..23e8e7eae062 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -4755,6 +4755,46 @@ action_show(struct mddev *mddev, char *page)
 	return sprintf(page, "%s\n", type);
 }
 
+static void stop_sync_thread(struct mddev *mddev)
+{
+	if (!test_bit(MD_RECOVERY_RUNNING, &mddev->recovery))
+		return;
+
+	if (mddev_lock(mddev))
+		return;
+
+	/*
+	 * Check again in case MD_RECOVERY_RUNNING is cleared before lock is
+	 * held.
+	 */
+	if (!test_bit(MD_RECOVERY_RUNNING, &mddev->recovery)) {
+		mddev_unlock(mddev);
+		return;
+	}
+
+	if (work_pending(&mddev->del_work))
+		flush_workqueue(md_misc_wq);
+
+	if (mddev->sync_thread) {
+		set_bit(MD_RECOVERY_INTR, &mddev->recovery);
+		md_reap_sync_thread(mddev);
+	}
+
+	mddev_unlock(mddev);
+}
+
+static void idle_sync_thread(struct mddev *mddev)
+{
+	clear_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
+	stop_sync_thread(mddev);
+}
+
+static void frozen_sync_thread(struct mddev *mddev)
+{
+	set_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
+	stop_sync_thread(mddev);
+}
+
 static ssize_t
 action_store(struct mddev *mddev, const char *page, size_t len)
 {
@@ -4762,22 +4802,11 @@ action_store(struct mddev *mddev, const char *page, size_t len)
 		return -EINVAL;
 
 
-	if (cmd_match(page, "idle") || cmd_match(page, "frozen")) {
-		if (cmd_match(page, "frozen"))
-			set_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
-		else
-			clear_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
-		if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery) &&
-		    mddev_lock(mddev) == 0) {
-			if (work_pending(&mddev->del_work))
-				flush_workqueue(md_misc_wq);
-			if (mddev->sync_thread) {
-				set_bit(MD_RECOVERY_INTR, &mddev->recovery);
-				md_reap_sync_thread(mddev);
-			}
-			mddev_unlock(mddev);
-		}
-	} else if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery))
+	if (cmd_match(page, "idle"))
+		idle_sync_thread(mddev);
+	else if (cmd_match(page, "frozen"))
+		frozen_sync_thread(mddev);
+	else if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery))
 		return -EBUSY;
 	else if (cmd_match(page, "resync"))
 		clear_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
-- 
2.39.2



* [PATCH -next v2 3/6] md: add a mutex to synchronize idle and frozen in action_store()
  2023-05-29 13:20 [PATCH -next v2 0/6] md: fix that MD_RECOVERY_RUNNING can be cleared while sync_thread is still running Yu Kuai
  2023-05-29 13:20 ` [PATCH -next v2 1/6] Revert "md: unlock mddev before reap sync_thread in action_store" Yu Kuai
  2023-05-29 13:20 ` [PATCH -next v2 2/6] md: refactor action_store() for 'idle' and 'frozen' Yu Kuai
@ 2023-05-29 13:20 ` Yu Kuai
  2023-06-13 14:43   ` [dm-devel] " Xiao Ni
  2023-05-29 13:20 ` [PATCH -next v2 4/6] md: refactor idle/frozen_sync_thread() to fix deadlock Yu Kuai
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 37+ messages in thread
From: Yu Kuai @ 2023-05-29 13:20 UTC (permalink / raw)
  To: guoqing.jiang, agk, snitzer, dm-devel, song
  Cc: linux-kernel, linux-raid, yukuai3, yukuai1, yi.zhang, yangerkun

From: Yu Kuai <yukuai3@huawei.com>

Currently, for 'idle' and 'frozen', action_store() will hold
'reconfig_mutex' and call md_reap_sync_thread() to stop the sync thread;
however, this will cause a deadlock (explained in the next patch). In
order to fix the problem, a following patch will release 'reconfig_mutex'
and wait on 'resync_wait', like md_set_readonly() and do_md_stop() do.

Consider that action_store() sets/clears 'MD_RECOVERY_FROZEN'
unconditionally, which might cause unexpected problems: for example,
'frozen' has just set 'MD_RECOVERY_FROZEN' and is still in progress,
while 'idle' clears 'MD_RECOVERY_FROZEN' and a new sync thread is
started, which might starve the in-progress 'frozen'. A mutex is added
to synchronize 'idle' and 'frozen' from action_store().

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 drivers/md/md.c | 5 +++++
 drivers/md/md.h | 3 +++
 2 files changed, 8 insertions(+)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 23e8e7eae062..63a993b52cd7 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -644,6 +644,7 @@ void mddev_init(struct mddev *mddev)
 	mutex_init(&mddev->open_mutex);
 	mutex_init(&mddev->reconfig_mutex);
 	mutex_init(&mddev->delete_mutex);
+	mutex_init(&mddev->sync_mutex);
 	mutex_init(&mddev->bitmap_info.mutex);
 	INIT_LIST_HEAD(&mddev->disks);
 	INIT_LIST_HEAD(&mddev->all_mddevs);
@@ -4785,14 +4786,18 @@ static void stop_sync_thread(struct mddev *mddev)
 
 static void idle_sync_thread(struct mddev *mddev)
 {
+	mutex_lock(&mddev->sync_mutex);
 	clear_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
 	stop_sync_thread(mddev);
+	mutex_unlock(&mddev->sync_mutex);
 }
 
 static void frozen_sync_thread(struct mddev *mddev)
 {
+	mutex_lock(&mddev->sync_mutex);
 	set_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
 	stop_sync_thread(mddev);
+	mutex_unlock(&mddev->sync_mutex);
 }
 
 static ssize_t
diff --git a/drivers/md/md.h b/drivers/md/md.h
index bfd2306bc750..2fa903de5bd0 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -537,6 +537,9 @@ struct mddev {
 	/* Protect the deleting list */
 	struct mutex			delete_mutex;
 
+	/* Used to synchronize idle and frozen for action_store() */
+	struct mutex			sync_mutex;
+
 	bool	has_superblocks:1;
 	bool	fail_last_dev:1;
 	bool	serialize_policy:1;
-- 
2.39.2



* [PATCH -next v2 4/6] md: refactor idle/frozen_sync_thread() to fix deadlock
  2023-05-29 13:20 [PATCH -next v2 0/6] md: fix that MD_RECOVERY_RUNNING can be cleared while sync_thread is still running Yu Kuai
                   ` (2 preceding siblings ...)
  2023-05-29 13:20 ` [PATCH -next v2 3/6] md: add a mutex to synchronize idle and frozen in action_store() Yu Kuai
@ 2023-05-29 13:20 ` Yu Kuai
  2023-06-13 14:50   ` [dm-devel] " Xiao Ni
  2023-05-29 13:20 ` [PATCH -next v2 5/6] md: wake up 'resync_wait' at last in md_reap_sync_thread() Yu Kuai
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 37+ messages in thread
From: Yu Kuai @ 2023-05-29 13:20 UTC (permalink / raw)
  To: guoqing.jiang, agk, snitzer, dm-devel, song
  Cc: linux-kernel, linux-raid, yukuai3, yukuai1, yi.zhang, yangerkun

From: Yu Kuai <yukuai3@huawei.com>

Our test found the following deadlock in raid10:

1) Issue a normal write, and the write fails:

  raid10_end_write_request
   set_bit(R10BIO_WriteError, &r10_bio->state)
   one_write_done
    reschedule_retry

  // later from md thread
  raid10d
   handle_write_completed
    list_add(&r10_bio->retry_list, &conf->bio_end_io_list)

  // later from md thread
  raid10d
   if (!test_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags))
    list_move(conf->bio_end_io_list.prev, &tmp)
    r10_bio = list_first_entry(&tmp, struct r10bio, retry_list)
    raid_end_bio_io(r10_bio)

Dependency chain 1: normal IO is waiting for the superblock update

2) Trigger a recovery:

  raid10_sync_request
   raise_barrier

Dependency chain 2: the sync thread is waiting for normal IO

3) echo idle/frozen to sync_action:

  action_store
   mddev_lock
    md_unregister_thread
     kthread_stop

Dependency chain 3: the 'reconfig_mutex' holder is waiting for the sync thread

4) md thread can't update superblock:

  raid10d
   md_check_recovery
    if (mddev_trylock(mddev))
     md_update_sb

Dependency chain 4: the superblock update is waiting for 'reconfig_mutex'

Hence a cyclic dependency exists; in order to fix the problem, we must
break one of the dependencies. Dependencies 1 and 2 can't be broken
because they are fundamental to the design. Dependency 4 might be
breakable if it could be guaranteed that no IO is in flight; however,
that requires a new mechanism which seems complex. Dependency 3 is a
good choice, because 'idle'/'frozen' only require the sync thread to
finish, which can be done asynchronously (already implemented), and
'reconfig_mutex' is then not needed anymore.

This patch switches 'idle' and 'frozen' to wait for the sync thread to
be done asynchronously, and also adds a sequence counter to record how
many times the sync thread has finished, so that 'idle' won't keep
waiting on a newly started sync thread.

Note that raid456 has a similar deadlock ([1]), and it's verified ([2])
that this deadlock can be fixed by this patch as well.

[1] https://lore.kernel.org/linux-raid/5ed54ffc-ce82-bf66-4eff-390cb23bc1ac@molgen.mpg.de/T/#t
[2] https://lore.kernel.org/linux-raid/e9067438-d713-f5f3-0d3d-9e6b0e9efa0e@huaweicloud.com/
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 drivers/md/md.c | 23 +++++++++++++++++++----
 drivers/md/md.h |  2 ++
 2 files changed, 21 insertions(+), 4 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 63a993b52cd7..7912de0e4d12 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -652,6 +652,7 @@ void mddev_init(struct mddev *mddev)
 	timer_setup(&mddev->safemode_timer, md_safemode_timeout, 0);
 	atomic_set(&mddev->active, 1);
 	atomic_set(&mddev->openers, 0);
+	atomic_set(&mddev->sync_seq, 0);
 	spin_lock_init(&mddev->lock);
 	atomic_set(&mddev->flush_pending, 0);
 	init_waitqueue_head(&mddev->sb_wait);
@@ -4776,19 +4777,27 @@ static void stop_sync_thread(struct mddev *mddev)
 	if (work_pending(&mddev->del_work))
 		flush_workqueue(md_misc_wq);
 
-	if (mddev->sync_thread) {
-		set_bit(MD_RECOVERY_INTR, &mddev->recovery);
-		md_reap_sync_thread(mddev);
-	}
+	set_bit(MD_RECOVERY_INTR, &mddev->recovery);
+	/*
+	 * Thread might be blocked waiting for metadata update which will now
+	 * never happen
+	 */
+	md_wakeup_thread_directly(mddev->sync_thread);
 
 	mddev_unlock(mddev);
 }
 
 static void idle_sync_thread(struct mddev *mddev)
 {
+	int sync_seq = atomic_read(&mddev->sync_seq);
+
 	mutex_lock(&mddev->sync_mutex);
 	clear_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
 	stop_sync_thread(mddev);
+
+	wait_event(resync_wait, sync_seq != atomic_read(&mddev->sync_seq) ||
+			!test_bit(MD_RECOVERY_RUNNING, &mddev->recovery));
+
 	mutex_unlock(&mddev->sync_mutex);
 }
 
@@ -4797,6 +4806,10 @@ static void frozen_sync_thread(struct mddev *mddev)
 	mutex_lock(&mddev->sync_mutex);
 	set_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
 	stop_sync_thread(mddev);
+
+	wait_event(resync_wait, mddev->sync_thread == NULL &&
+			!test_bit(MD_RECOVERY_RUNNING, &mddev->recovery));
+
 	mutex_unlock(&mddev->sync_mutex);
 }
 
@@ -9472,6 +9485,8 @@ void md_reap_sync_thread(struct mddev *mddev)
 
 	/* resync has finished, collect result */
 	md_unregister_thread(&mddev->sync_thread);
+	atomic_inc(&mddev->sync_seq);
+
 	if (!test_bit(MD_RECOVERY_INTR, &mddev->recovery) &&
 	    !test_bit(MD_RECOVERY_REQUESTED, &mddev->recovery) &&
 	    mddev->degraded != mddev->raid_disks) {
diff --git a/drivers/md/md.h b/drivers/md/md.h
index 2fa903de5bd0..7cab9c7c45b8 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -539,6 +539,8 @@ struct mddev {
 
 	/* Used to synchronize idle and frozen for action_store() */
 	struct mutex			sync_mutex;
+	/* The sequence number for sync thread */
+	atomic_t sync_seq;
 
 	bool	has_superblocks:1;
 	bool	fail_last_dev:1;
-- 
2.39.2



* [PATCH -next v2 5/6] md: wake up 'resync_wait' at last in md_reap_sync_thread()
  2023-05-29 13:20 [PATCH -next v2 0/6] md: fix that MD_RECOVERY_RUNNING can be cleared while sync_thread is still running Yu Kuai
                   ` (3 preceding siblings ...)
  2023-05-29 13:20 ` [PATCH -next v2 4/6] md: refactor idle/frozen_sync_thread() to fix deadlock Yu Kuai
@ 2023-05-29 13:20 ` Yu Kuai
  2023-06-14  7:20   ` Xiao Ni
  2023-05-29 13:20 ` [PATCH -next v2 6/6] md: enhance checking in md_check_recovery() Yu Kuai
  2023-06-08  2:41 ` [PATCH -next v2 0/6] md: fix that MD_RECOVERY_RUNNING can be cleared while sync_thread is still running Yu Kuai
  6 siblings, 1 reply; 37+ messages in thread
From: Yu Kuai @ 2023-05-29 13:20 UTC (permalink / raw)
  To: guoqing.jiang, agk, snitzer, dm-devel, song
  Cc: linux-kernel, linux-raid, yukuai3, yukuai1, yi.zhang, yangerkun

From: Yu Kuai <yukuai3@huawei.com>

md_reap_sync_thread() was just replaced with wait_event(resync_wait, ...)
in action_store(); wake up 'resync_wait' last so that action_store() will
still wait for everything in md_reap_sync_thread() to be done.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 drivers/md/md.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 7912de0e4d12..f90226e6ddf8 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -9531,7 +9531,6 @@ void md_reap_sync_thread(struct mddev *mddev)
 	if (mddev_is_clustered(mddev) && is_reshaped
 				      && !test_bit(MD_CLOSING, &mddev->flags))
 		md_cluster_ops->update_size(mddev, old_dev_sectors);
-	wake_up(&resync_wait);
 	/* flag recovery needed just to double check */
 	set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
 	sysfs_notify_dirent_safe(mddev->sysfs_completed);
@@ -9539,6 +9538,7 @@ void md_reap_sync_thread(struct mddev *mddev)
 	md_new_event();
 	if (mddev->event_work.func)
 		queue_work(md_misc_wq, &mddev->event_work);
+	wake_up(&resync_wait);
 }
 EXPORT_SYMBOL(md_reap_sync_thread);
 
-- 
2.39.2



* [PATCH -next v2 6/6] md: enhance checking in md_check_recovery()
  2023-05-29 13:20 [PATCH -next v2 0/6] md: fix that MD_RECOVERY_RUNNING can be cleared while sync_thread is still running Yu Kuai
                   ` (4 preceding siblings ...)
  2023-05-29 13:20 ` [PATCH -next v2 5/6] md: wake up 'resync_wait' at last in md_reap_sync_thread() Yu Kuai
@ 2023-05-29 13:20 ` Yu Kuai
  2023-06-14  7:24   ` [dm-devel] " Xiao Ni
  2023-06-08  2:41 ` [PATCH -next v2 0/6] md: fix that MD_RECOVERY_RUNNING can be cleared while sync_thread is still running Yu Kuai
  6 siblings, 1 reply; 37+ messages in thread
From: Yu Kuai @ 2023-05-29 13:20 UTC (permalink / raw)
  To: guoqing.jiang, agk, snitzer, dm-devel, song
  Cc: linux-kernel, linux-raid, yukuai3, yukuai1, yi.zhang, yangerkun

From: Yu Kuai <yukuai3@huawei.com>

For md_check_recovery():

1) if 'MD_RECOVERY_RUNNING' is not set, register a new sync_thread.
2) if 'MD_RECOVERY_RUNNING' is set:
 a) if 'MD_RECOVERY_DONE' is not set, don't do anything and wait for
   md_do_sync() to be done.
 b) if 'MD_RECOVERY_DONE' is set, unregister sync_thread. The current code
   expects that sync_thread is not NULL; otherwise a new sync_thread will
   be registered, which will corrupt the array.

Make sure md_check_recovery() won't register a new sync_thread if
'MD_RECOVERY_RUNNING' is still set, and add a new WARN_ON_ONCE() for
the above corruption.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 drivers/md/md.c | 22 +++++++++++++++-------
 1 file changed, 15 insertions(+), 7 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index f90226e6ddf8..9da0fc906bbd 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -9397,16 +9397,24 @@ void md_check_recovery(struct mddev *mddev)
 		if (mddev->sb_flags)
 			md_update_sb(mddev, 0);
 
-		if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery) &&
-		    !test_bit(MD_RECOVERY_DONE, &mddev->recovery)) {
-			/* resync/recovery still happening */
-			clear_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
-			goto unlock;
-		}
-		if (mddev->sync_thread) {
+		/*
+		 * Never start a new sync thread if MD_RECOVERY_RUNNING is
+		 * still set.
+		 */
+		if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery)) {
+			if (!test_bit(MD_RECOVERY_DONE, &mddev->recovery)) {
+				/* resync/recovery still happening */
+				clear_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
+				goto unlock;
+			}
+
+			if (WARN_ON_ONCE(!mddev->sync_thread))
+				goto unlock;
+
 			md_reap_sync_thread(mddev);
 			goto unlock;
 		}
+
 		/* Set RUNNING before clearing NEEDED to avoid
 		 * any transients in the value of "sync_action".
 		 */
-- 
2.39.2



* Re: [PATCH -next v2 0/6] md: fix that MD_RECOVERY_RUNNING can be cleared while sync_thread is still running
  2023-05-29 13:20 [PATCH -next v2 0/6] md: fix that MD_RECOVERY_RUNNING can be cleared while sync_thread is still running Yu Kuai
                   ` (5 preceding siblings ...)
  2023-05-29 13:20 ` [PATCH -next v2 6/6] md: enhance checking in md_check_recovery() Yu Kuai
@ 2023-06-08  2:41 ` Yu Kuai
  2023-06-09  4:44   ` Song Liu
  6 siblings, 1 reply; 37+ messages in thread
From: Yu Kuai @ 2023-06-08  2:41 UTC (permalink / raw)
  To: Yu Kuai, guoqing.jiang, agk, snitzer, dm-devel, song
  Cc: linux-kernel, linux-raid, yi.zhang, yangerkun, yukuai (C)

Hi,

On 2023/05/29 21:20, Yu Kuai wrote:
> From: Yu Kuai <yukuai3@huawei.com>
> 
> Changes in v2:
>   - rebase for the latest md-next
> 
> Patch 1 reverts the commit because it causes MD_RECOVERY_RUNNING to be
> cleared while sync_thread is still running. The deadlock that commit tried
> to fix will be fixed by patches 2-5.
> 
> Patch 6 enhances checking to prevent MD_RECOVERY_RUNNING from being cleared
> while sync_thread is still running.

Any suggestions on this patchset? I have already sent regression tests
for the deadlock problem for both raid10 and raid456.

Thanks,
Kuai
> 
> Yu Kuai (6):
>    Revert "md: unlock mddev before reap sync_thread in action_store"
>    md: refactor action_store() for 'idle' and 'frozen'
>    md: add a mutex to synchronize idle and frozen in action_store()
>    md: refactor idle/frozen_sync_thread() to fix deadlock
>    md: wake up 'resync_wait' at last in md_reap_sync_thread()
>    md: enhance checking in md_check_recovery()
> 
>   drivers/md/dm-raid.c |   1 -
>   drivers/md/md.c      | 124 +++++++++++++++++++++++++++++--------------
>   drivers/md/md.h      |   5 ++
>   3 files changed, 88 insertions(+), 42 deletions(-)
> 



* Re: [PATCH -next v2 0/6] md: fix that MD_RECOVERY_RUNNING can be cleared while sync_thread is still running
  2023-06-08  2:41 ` [PATCH -next v2 0/6] md: fix that MD_RECOVERY_RUNNING can be cleared while sync_thread is still running Yu Kuai
@ 2023-06-09  4:44   ` Song Liu
  0 siblings, 0 replies; 37+ messages in thread
From: Song Liu @ 2023-06-09  4:44 UTC (permalink / raw)
  To: Yu Kuai
  Cc: guoqing.jiang, agk, snitzer, dm-devel, linux-kernel, linux-raid,
	yi.zhang, yangerkun, yukuai (C)

On Wed, Jun 7, 2023 at 7:41 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>
> Hi,
>
> On 2023/05/29 21:20, Yu Kuai wrote:
> > From: Yu Kuai <yukuai3@huawei.com>
> >
> > Changes in v2:
> >   - rebase for the latest md-next
> >
> > Patch 1 reverts the commit because it causes MD_RECOVERY_RUNNING to be
> > cleared while sync_thread is still running. The deadlock that commit tried
> > to fix will be fixed by patches 2-5.
> >
> > Patch 6 enhances checking to prevent MD_RECOVERY_RUNNING from being cleared
> > while sync_thread is still running.
>
> Any suggestions on this patchset? I have already sent regression tests
> for the deadlock problem for both raid10 and raid456.

Sorry for the delay. I will look into this soon.

Thanks,
Song


* Re: [PATCH -next v2 1/6] Revert "md: unlock mddev before reap sync_thread in action_store"
  2023-05-29 13:20 ` [PATCH -next v2 1/6] Revert "md: unlock mddev before reap sync_thread in action_store" Yu Kuai
@ 2023-06-13  6:25   ` Xiao Ni
  2023-06-13 11:58     ` Yu Kuai
  0 siblings, 1 reply; 37+ messages in thread
From: Xiao Ni @ 2023-06-13  6:25 UTC (permalink / raw)
  To: Yu Kuai, guoqing.jiang, agk, snitzer, dm-devel, song
  Cc: linux-kernel, linux-raid, yukuai3, yi.zhang, yangerkun


On 2023/5/29 9:20 PM, Yu Kuai wrote:
> From: Yu Kuai <yukuai3@huawei.com>
>
> This reverts commit 9dfbdafda3b34e262e43e786077bab8e476a89d1.
>
> Because it introduces a defect where sync_thread can be running while
> MD_RECOVERY_RUNNING is cleared, which will cause unexpected problems,
> for example:
>
> list_add corruption. prev->next should be next (ffff0001ac1daba0), but was ffff0000ce1a02a0. (prev=ffff0000ce1a02a0).
> Call trace:
>   __list_add_valid+0xfc/0x140
>   insert_work+0x78/0x1a0
>   __queue_work+0x500/0xcf4
>   queue_work_on+0xe8/0x12c
>   md_check_recovery+0xa34/0xf30
>   raid10d+0xb8/0x900 [raid10]
>   md_thread+0x16c/0x2cc
>   kthread+0x1a4/0x1ec
>   ret_from_fork+0x10/0x18
>
> This is because the work is requeued while it is still inside the workqueue:
>
> t1:			t2:
> action_store
>   mddev_lock
>    if (mddev->sync_thread)
>     mddev_unlock
>     md_unregister_thread
>     // first sync_thread is done
> 			md_check_recovery
> 			 mddev_try_lock
> 			 /*
> 			  * once MD_RECOVERY_DONE is set, new sync_thread
> 			  * can start.
> 			  */
> 			 set_bit(MD_RECOVERY_RUNNING, &mddev->recovery)
> 			 INIT_WORK(&mddev->del_work, md_start_sync)
> 			 queue_work(md_misc_wq, &mddev->del_work)
> 			  test_and_set_bit(WORK_STRUCT_PENDING_BIT, ...)
> 			  // set pending bit
> 			  insert_work
> 			   list_add_tail
> 			 mddev_unlock
>     mddev_lock_nointr
>     md_reap_sync_thread
>     // MD_RECOVERY_RUNNING is cleared
>   mddev_unlock
>
> t3:
>
> // before queued work started from t2
> md_check_recovery
>   // MD_RECOVERY_RUNNING is not set, a new sync_thread can be started
>   INIT_WORK(&mddev->del_work, md_start_sync)
>    work->data = 0
>    // work pending bit is cleared
>   queue_work(md_misc_wq, &mddev->del_work)
>    insert_work
>     list_add_tail
>     // list is corrupted
>
> The above commit is reverted to fix the problem; the deadlock that commit
> tried to fix will be fixed in the following patches.
>
> Signed-off-by: Yu Kuai <yukuai3@huawei.com>
> ---
>   drivers/md/dm-raid.c |  1 -
>   drivers/md/md.c      | 19 ++-----------------
>   2 files changed, 2 insertions(+), 18 deletions(-)
>
> diff --git a/drivers/md/dm-raid.c b/drivers/md/dm-raid.c
> index 8846bf510a35..1f22bef27841 100644
> --- a/drivers/md/dm-raid.c
> +++ b/drivers/md/dm-raid.c
> @@ -3725,7 +3725,6 @@ static int raid_message(struct dm_target *ti, unsigned int argc, char **argv,
>   	if (!strcasecmp(argv[0], "idle") || !strcasecmp(argv[0], "frozen")) {
>   		if (mddev->sync_thread) {
>   			set_bit(MD_RECOVERY_INTR, &mddev->recovery);
> -			md_unregister_thread(&mddev->sync_thread);
>   			md_reap_sync_thread(mddev);
>   		}
>   	} else if (decipher_sync_action(mddev, mddev->recovery) != st_idle)
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index a5a7af2f4e59..9b97731e1fe4 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -4772,19 +4772,6 @@ action_store(struct mddev *mddev, const char *page, size_t len)
>   			if (work_pending(&mddev->del_work))
>   				flush_workqueue(md_misc_wq);
>   			if (mddev->sync_thread) {
> -				sector_t save_rp = mddev->reshape_position;
> -
> -				mddev_unlock(mddev);
> -				set_bit(MD_RECOVERY_INTR, &mddev->recovery);
> -				md_unregister_thread(&mddev->sync_thread);
> -				mddev_lock_nointr(mddev);
> -				/*
> -				 * set RECOVERY_INTR again and restore reshape
> -				 * position in case others changed them after
> -				 * got lock, eg, reshape_position_store and
> -				 * md_check_recovery.
> -				 */
> -				mddev->reshape_position = save_rp;
>   				set_bit(MD_RECOVERY_INTR, &mddev->recovery);
>   				md_reap_sync_thread(mddev);
>   			}
> @@ -6184,7 +6171,6 @@ static void __md_stop_writes(struct mddev *mddev)
>   		flush_workqueue(md_misc_wq);
>   	if (mddev->sync_thread) {
>   		set_bit(MD_RECOVERY_INTR, &mddev->recovery);
> -		md_unregister_thread(&mddev->sync_thread);
>   		md_reap_sync_thread(mddev);
>   	}
>   
> @@ -9336,7 +9322,6 @@ void md_check_recovery(struct mddev *mddev)
>   			 * ->spare_active and clear saved_raid_disk
>   			 */
>   			set_bit(MD_RECOVERY_INTR, &mddev->recovery);
> -			md_unregister_thread(&mddev->sync_thread);
>   			md_reap_sync_thread(mddev);
>   			clear_bit(MD_RECOVERY_RECOVER, &mddev->recovery);
>   			clear_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
> @@ -9372,7 +9357,6 @@ void md_check_recovery(struct mddev *mddev)
>   			goto unlock;
>   		}
>   		if (mddev->sync_thread) {
> -			md_unregister_thread(&mddev->sync_thread);
>   			md_reap_sync_thread(mddev);
>   			goto unlock;
>   		}
> @@ -9452,7 +9436,8 @@ void md_reap_sync_thread(struct mddev *mddev)
>   	sector_t old_dev_sectors = mddev->dev_sectors;
>   	bool is_reshaped = false;
>   
> -	/* sync_thread should be unregistered, collect result */
> +	/* resync has finished, collect result */
> +	md_unregister_thread(&mddev->sync_thread);
>   	if (!test_bit(MD_RECOVERY_INTR, &mddev->recovery) &&
>   	    !test_bit(MD_RECOVERY_REQUESTED, &mddev->recovery) &&
>   	    mddev->degraded != mddev->raid_disks) {


Hi Kuai

Thanks for the patch and the explanation in v1. In v1 I spent quite some
time trying to understand the problem. Maybe we can use the problem
itself as the subject, something like "Don't allow two sync processes
running at the same time"? And could you add the test steps discussed in
v1 to the patch description? That would help a lot in understanding the
problem.

Best Regards

Xiao


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [dm-devel] [PATCH -next v2 2/6] md: refactor action_store() for 'idle' and 'frozen'
  2023-05-29 13:20 ` [PATCH -next v2 2/6] md: refactor action_store() for 'idle' and 'frozen' Yu Kuai
@ 2023-06-13  8:02   ` Xiao Ni
  2023-06-13 12:00     ` Yu Kuai
  0 siblings, 1 reply; 37+ messages in thread
From: Xiao Ni @ 2023-06-13  8:02 UTC (permalink / raw)
  To: Yu Kuai, guoqing.jiang, agk, snitzer, dm-devel, song
  Cc: yi.zhang, yangerkun, linux-kernel, linux-raid, yukuai3


On 2023/5/29 9:20 PM, Yu Kuai wrote:
> From: Yu Kuai <yukuai3@huawei.com>
>
> Prepare to handle 'idle' and 'frozen' differently to fix a deadlock, there
> are no functional changes except that MD_RECOVERY_RUNNING is checked
> again after 'reconfig_mutex' is held.


Can you explain more about why it needs to check MD_RECOVERY_RUNNING 
again here?

>
> Signed-off-by: Yu Kuai <yukuai3@huawei.com>
> ---
>   drivers/md/md.c | 61 ++++++++++++++++++++++++++++++++++++-------------
>   1 file changed, 45 insertions(+), 16 deletions(-)
>
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index 9b97731e1fe4..23e8e7eae062 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -4755,6 +4755,46 @@ action_show(struct mddev *mddev, char *page)
>   	return sprintf(page, "%s\n", type);
>   }
>   
> +static void stop_sync_thread(struct mddev *mddev)
> +{
> +	if (!test_bit(MD_RECOVERY_RUNNING, &mddev->recovery))
> +		return;
> +
> +	if (mddev_lock(mddev))
> +		return;
> +
> +	/*
> +	 * Check again in case MD_RECOVERY_RUNNING is cleared before lock is
> +	 * held.
> +	 */
> +	if (!test_bit(MD_RECOVERY_RUNNING, &mddev->recovery)) {
> +		mddev_unlock(mddev);
> +		return;
> +	}
> +
> +	if (work_pending(&mddev->del_work))
> +		flush_workqueue(md_misc_wq);
> +
> +	if (mddev->sync_thread) {
> +		set_bit(MD_RECOVERY_INTR, &mddev->recovery);
> +		md_reap_sync_thread(mddev);
> +	}
> +
> +	mddev_unlock(mddev);
> +}
> +
> +static void idle_sync_thread(struct mddev *mddev)
> +{
> +	clear_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
> +	stop_sync_thread(mddev);
> +}
> +
> +static void frozen_sync_thread(struct mddev *mddev)
> +{
> +	set_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
> +	stop_sync_thread(mddev);
> +}
> +
>   static ssize_t
>   action_store(struct mddev *mddev, const char *page, size_t len)
>   {
> @@ -4762,22 +4802,11 @@ action_store(struct mddev *mddev, const char *page, size_t len)
>   		return -EINVAL;
>   
>   
> -	if (cmd_match(page, "idle") || cmd_match(page, "frozen")) {
> -		if (cmd_match(page, "frozen"))
> -			set_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
> -		else
> -			clear_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
> -		if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery) &&
> -		    mddev_lock(mddev) == 0) {
> -			if (work_pending(&mddev->del_work))
> -				flush_workqueue(md_misc_wq);
> -			if (mddev->sync_thread) {
> -				set_bit(MD_RECOVERY_INTR, &mddev->recovery);
> -				md_reap_sync_thread(mddev);
> -			}
> -			mddev_unlock(mddev);
> -		}
> -	} else if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery))
> +	if (cmd_match(page, "idle"))
> +		idle_sync_thread(mddev);
> +	else if (cmd_match(page, "frozen"))
> +		frozen_sync_thread(mddev);
> +	else if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery))
>   		return -EBUSY;
>   	else if (cmd_match(page, "resync"))
>   		clear_bit(MD_RECOVERY_FROZEN, &mddev->recovery);


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH -next v2 1/6] Revert "md: unlock mddev before reap sync_thread in action_store"
  2023-06-13  6:25   ` Xiao Ni
@ 2023-06-13 11:58     ` Yu Kuai
  0 siblings, 0 replies; 37+ messages in thread
From: Yu Kuai @ 2023-06-13 11:58 UTC (permalink / raw)
  To: Xiao Ni, Yu Kuai, guoqing.jiang, agk, snitzer, dm-devel, song
  Cc: linux-kernel, linux-raid, yi.zhang, yangerkun, yukuai (C)

Hi,

On 2023/06/13 14:25, Xiao Ni wrote:
> Thanks for the patch and the explanation in v1. In v1 I spent quite some
> time trying to understand the problem. Maybe we can use the problem
> itself as the subject, something like "Don't allow two sync processes
> running at the same time"? And could you add the test steps discussed in
> v1 to the patch description? That would help a lot in understanding the
> problem.

Ok, thanks for the suggestion, I can do that in the next version.

Kuai


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [dm-devel] [PATCH -next v2 2/6] md: refactor action_store() for 'idle' and 'frozen'
  2023-06-13  8:02   ` [dm-devel] " Xiao Ni
@ 2023-06-13 12:00     ` Yu Kuai
  2023-06-13 12:25       ` Xiao Ni
  0 siblings, 1 reply; 37+ messages in thread
From: Yu Kuai @ 2023-06-13 12:00 UTC (permalink / raw)
  To: Xiao Ni, Yu Kuai, guoqing.jiang, agk, snitzer, dm-devel, song
  Cc: yi.zhang, yangerkun, linux-kernel, linux-raid, yukuai (C)

Hi,

On 2023/06/13 16:02, Xiao Ni wrote:
> 
> On 2023/5/29 9:20 PM, Yu Kuai wrote:
>> From: Yu Kuai <yukuai3@huawei.com>
>>
>> Prepare to handle 'idle' and 'frozen' differently to fix a deadlock, 
>> there
>> are no functional changes except that MD_RECOVERY_RUNNING is checked
>> again after 'reconfig_mutex' is held.
> 
> 
> Can you explain more about why it needs to check MD_RECOVERY_RUNNING 
> again here?

As I explain in the following comment:
>> +    /*
>> +     * Check again in case MD_RECOVERY_RUNNING is cleared before lock is
>> +     * held.
>> +     */
>> +    if (!test_bit(MD_RECOVERY_RUNNING, &mddev->recovery)) {
>> +        mddev_unlock(mddev);
>> +        return;
>> +    }

Thanks,
Kuai


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [dm-devel] [PATCH -next v2 2/6] md: refactor action_store() for 'idle' and 'frozen'
  2023-06-13 12:00     ` Yu Kuai
@ 2023-06-13 12:25       ` Xiao Ni
  2023-06-13 12:44         ` Yu Kuai
  0 siblings, 1 reply; 37+ messages in thread
From: Xiao Ni @ 2023-06-13 12:25 UTC (permalink / raw)
  To: Yu Kuai
  Cc: guoqing.jiang, agk, snitzer, dm-devel, song, yi.zhang, yangerkun,
	linux-kernel, linux-raid, yukuai (C)

On Tue, Jun 13, 2023 at 8:00 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>
> Hi,
>
> On 2023/06/13 16:02, Xiao Ni wrote:
> >
> > On 2023/5/29 9:20 PM, Yu Kuai wrote:
> >> From: Yu Kuai <yukuai3@huawei.com>
> >>
> >> Prepare to handle 'idle' and 'frozen' differently to fix a deadlock,
> >> there
> >> are no functional changes except that MD_RECOVERY_RUNNING is checked
> >> again after 'reconfig_mutex' is held.
> >
> >
> > Can you explain more about why it needs to check MD_RECOVERY_RUNNING
> > again here?
>
> As I explain in the following comment:

Hi

Who can clear the flag before the lock is held?

Regards
Xiao
> >> +    /*
> >> +     * Check again in case MD_RECOVERY_RUNNING is cleared before lock is
> >> +     * held.
> >> +     */
> >> +    if (!test_bit(MD_RECOVERY_RUNNING, &mddev->recovery)) {
> >> +        mddev_unlock(mddev);
> >> +        return;
> >> +    }
>
> Thanks,
> Kuai
>


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [dm-devel] [PATCH -next v2 2/6] md: refactor action_store() for 'idle' and 'frozen'
  2023-06-13 12:25       ` Xiao Ni
@ 2023-06-13 12:44         ` Yu Kuai
  2023-06-13 14:14           ` Xiao Ni
  0 siblings, 1 reply; 37+ messages in thread
From: Yu Kuai @ 2023-06-13 12:44 UTC (permalink / raw)
  To: Xiao Ni, Yu Kuai
  Cc: guoqing.jiang, agk, snitzer, dm-devel, song, yi.zhang, yangerkun,
	linux-kernel, linux-raid, yukuai (C)

Hi,

On 2023/06/13 20:25, Xiao Ni wrote:
> On Tue, Jun 13, 2023 at 8:00 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>>
>> Hi,
>>
>> On 2023/06/13 16:02, Xiao Ni wrote:
>>>
>>> On 2023/5/29 9:20 PM, Yu Kuai wrote:
>>>> From: Yu Kuai <yukuai3@huawei.com>
>>>>
>>>> Prepare to handle 'idle' and 'frozen' differently to fix a deadlock,
>>>> there
>>>> are no functional changes except that MD_RECOVERY_RUNNING is checked
>>>> again after 'reconfig_mutex' is held.
>>>
>>>
>>> Can you explain more about why it needs to check MD_RECOVERY_RUNNING
>>> again here?
>>
>> As I explain in the following comment:
> 
> Hi
> 
> Who can clear the flag before the lock is held?

Basically everywhere that can clear the flag...

// This context 	// Other context
			mutex_lock
			...
test_bit -> pass
			clear_bit
			mutex_unlock
mutex_lock
test_bit -> check again

Thanks,
Kuai
> 
> Regards
> Xiao
>>>> +    /*
>>>> +     * Check again in case MD_RECOVERY_RUNNING is cleared before lock is
>>>> +     * held.
>>>> +     */
>>>> +    if (!test_bit(MD_RECOVERY_RUNNING, &mddev->recovery)) {
>>>> +        mddev_unlock(mddev);
>>>> +        return;
>>>> +    }
>>
>> Thanks,
>> Kuai
>>
> 
> .
> 


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [dm-devel] [PATCH -next v2 2/6] md: refactor action_store() for 'idle' and 'frozen'
  2023-06-13 12:44         ` Yu Kuai
@ 2023-06-13 14:14           ` Xiao Ni
  0 siblings, 0 replies; 37+ messages in thread
From: Xiao Ni @ 2023-06-13 14:14 UTC (permalink / raw)
  To: Yu Kuai
  Cc: guoqing.jiang, agk, snitzer, dm-devel, song, yi.zhang, yangerkun,
	linux-kernel, linux-raid, yukuai (C)


On 2023/6/13 8:44 PM, Yu Kuai wrote:
> Hi,
>
> On 2023/06/13 20:25, Xiao Ni wrote:
>> On Tue, Jun 13, 2023 at 8:00 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>>>
>>> Hi,
>>>
>>> On 2023/06/13 16:02, Xiao Ni wrote:
>>>>
>>>> On 2023/5/29 9:20 PM, Yu Kuai wrote:
>>>>> From: Yu Kuai <yukuai3@huawei.com>
>>>>>
>>>>> Prepare to handle 'idle' and 'frozen' differently to fix a deadlock,
>>>>> there
>>>>> are no functional changes except that MD_RECOVERY_RUNNING is checked
>>>>> again after 'reconfig_mutex' is held.
>>>>
>>>>
>>>> Can you explain more about why it needs to check MD_RECOVERY_RUNNING
>>>> again here?
>>>
>>> As I explain in the following comment:
>>
>> Hi
>>
>> Who can clear the flag before the lock is held?
>
> Basically everywhere that can clear the flag...
>
> // This context     // Other context
>             mutex_lock
>             ...
> test_bit -> pass
>             clear_bit
>             mutex_unlock
> mutex_lock
> test_bit -> check again
>
> Thanks,
> Kuai

At first I wanted to figure out a specific case, and now I have the
answer: two people may want to stop the sync action at the same time.
So this is the case the code checks for.

Regards

Xiao

>>
>> Regards
>> Xiao
>>>>> +    /*
>>>>> +     * Check again in case MD_RECOVERY_RUNNING is cleared before 
>>>>> lock is
>>>>> +     * held.
>>>>> +     */
>>>>> +    if (!test_bit(MD_RECOVERY_RUNNING, &mddev->recovery)) {
>>>>> +        mddev_unlock(mddev);
>>>>> +        return;
>>>>> +    }
>>>
>>> Thanks,
>>> Kuai
>>>
>>
>> .
>>
>


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [dm-devel] [PATCH -next v2 3/6] md: add a mutex to synchronize idle and frozen in action_store()
  2023-05-29 13:20 ` [PATCH -next v2 3/6] md: add a mutex to synchronize idle and frozen in action_store() Yu Kuai
@ 2023-06-13 14:43   ` Xiao Ni
  2023-06-14  1:15     ` Yu Kuai
  0 siblings, 1 reply; 37+ messages in thread
From: Xiao Ni @ 2023-06-13 14:43 UTC (permalink / raw)
  To: Yu Kuai, guoqing.jiang, agk, snitzer, dm-devel, song
  Cc: yi.zhang, yangerkun, linux-kernel, linux-raid, yukuai3


On 2023/5/29 9:20 PM, Yu Kuai wrote:
> From: Yu Kuai <yukuai3@huawei.com>
>
> Currently, for idle and frozen, action_store will hold 'reconfig_mutex'
> and call md_reap_sync_thread() to stop sync thread, however, this will
> cause deadlock (explained in the next patch). In order to fix the
> problem, following patch will release 'reconfig_mutex' and wait on
> 'resync_wait', like md_set_readonly() and do_md_stop() does.
>
> Consider that action_store() will set/clear 'MD_RECOVERY_FROZEN'
> unconditionally, which might cause unexpected problems, for example,
> frozen just set 'MD_RECOVERY_FROZEN' and is still in progress, while
> 'idle' clear 'MD_RECOVERY_FROZEN' and new sync thread is started, which
> might starve in progress frozen. A mutex is added to synchronize idle
> and frozen from action_store().
>
> Signed-off-by: Yu Kuai <yukuai3@huawei.com>
> ---
>   drivers/md/md.c | 5 +++++
>   drivers/md/md.h | 3 +++
>   2 files changed, 8 insertions(+)
>
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index 23e8e7eae062..63a993b52cd7 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -644,6 +644,7 @@ void mddev_init(struct mddev *mddev)
>   	mutex_init(&mddev->open_mutex);
>   	mutex_init(&mddev->reconfig_mutex);
>   	mutex_init(&mddev->delete_mutex);
> +	mutex_init(&mddev->sync_mutex);
>   	mutex_init(&mddev->bitmap_info.mutex);
>   	INIT_LIST_HEAD(&mddev->disks);
>   	INIT_LIST_HEAD(&mddev->all_mddevs);
> @@ -4785,14 +4786,18 @@ static void stop_sync_thread(struct mddev *mddev)
>   
>   static void idle_sync_thread(struct mddev *mddev)
>   {
> +	mutex_lock(&mddev->sync_mutex);
>   	clear_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
>   	stop_sync_thread(mddev);
> +	mutex_unlock(&mddev->sync_mutex);
>   }
>   
>   static void frozen_sync_thread(struct mddev *mddev)
>   {
> +	mutex_init(&mddev->delete_mutex);


A typo? It should be mutex_lock(&mddev->sync_mutex);

Regards

Xiao

>   	set_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
>   	stop_sync_thread(mddev);
> +	mutex_unlock(&mddev->sync_mutex);
>   }
>   
>   static ssize_t
> diff --git a/drivers/md/md.h b/drivers/md/md.h
> index bfd2306bc750..2fa903de5bd0 100644
> --- a/drivers/md/md.h
> +++ b/drivers/md/md.h
> @@ -537,6 +537,9 @@ struct mddev {
>   	/* Protect the deleting list */
>   	struct mutex			delete_mutex;
>   
> +	/* Used to synchronize idle and frozen for action_store() */
> +	struct mutex			sync_mutex;
> +
>   	bool	has_superblocks:1;
>   	bool	fail_last_dev:1;
>   	bool	serialize_policy:1;


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [dm-devel] [PATCH -next v2 4/6] md: refactor idle/frozen_sync_thread() to fix deadlock
  2023-05-29 13:20 ` [PATCH -next v2 4/6] md: refactor idle/frozen_sync_thread() to fix deadlock Yu Kuai
@ 2023-06-13 14:50   ` Xiao Ni
  2023-06-14  1:48     ` Yu Kuai
  0 siblings, 1 reply; 37+ messages in thread
From: Xiao Ni @ 2023-06-13 14:50 UTC (permalink / raw)
  To: Yu Kuai, guoqing.jiang, agk, snitzer, dm-devel, song
  Cc: yi.zhang, yangerkun, linux-kernel, linux-raid, yukuai3


On 2023/5/29 9:20 PM, Yu Kuai wrote:
> From: Yu Kuai <yukuai3@huawei.com>
>
> Our test found the following deadlock in raid10:
>
> 1) Issue a normal write, and such write failed:
>
>    raid10_end_write_request
>     set_bit(R10BIO_WriteError, &r10_bio->state)
>     one_write_done
>      reschedule_retry
>
>    // later from md thread
>    raid10d
>     handle_write_completed
>      list_add(&r10_bio->retry_list, &conf->bio_end_io_list)
>
>    // later from md thread
>    raid10d
>     if (!test_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags))
>      list_move(conf->bio_end_io_list.prev, &tmp)
>      r10_bio = list_first_entry(&tmp, struct r10bio, retry_list)
>      raid_end_bio_io(r10_bio)
>
> Dependency chain 1: normal io is waiting for updating superblock

Hi Kuai

It looks like the above situation is more complex than it needs to be.
It only takes a normal write, and md_write_start() has to wait until the
metadata is written to the member disks, right? If so, there is no need
to introduce a raid10 write failure here. I guess that is just your test
case. It would be nice if you could put your test steps in the patch,
but for the analysis of the deadlock here, it's better to keep it
simple.

>
> 2) Trigger a recovery:
>
>    raid10_sync_request
>     raise_barrier
>
> Dependency chain 2: sync thread is waiting for normal io
>
> 3) echo idle/frozen to sync_action:
>
>    action_store
>     mddev_lock
>      md_unregister_thread
>       kthread_stop
>
> Dependency chain 3: drop 'reconfig_mutex' is waiting for sync thread
>
> 4) md thread can't update superblock:
>
>    raid10d
>     md_check_recovery
>      if (mddev_trylock(mddev))
>       md_update_sb
>
> Dependency chain 4: update superblock is waiting for 'reconfig_mutex'
>
> Hence a cyclic dependency exists; in order to fix the problem, we must
> break one of them. Dependency 1 and 2 can't be broken because they are
> foundation design. Dependency 4 may be possible if it can be guaranteed
> that no io can be inflight, however, this requires a new mechanism which
> seems complex. Dependency 3 is a good choice, because idle/frozen only
> requires sync thread to finish, which can be done asynchronously that is
> already implemented, and 'reconfig_mutex' is not needed anymore.
>
> This patch switches 'idle' and 'frozen' to wait for the sync thread to
> be done asynchronously, and also adds a sequence counter to record how
> many times the sync thread is done, so that 'idle' won't keep waiting
> on a newly started sync thread.

In the patch, sync_seq is incremented in md_reap_sync_thread(). In
idle_sync_thread(), if the recorded sync_seq isn't equal to
mddev->sync_seq, it should mean someone has already stopped the sync
thread, right? Why do you say 'newly started sync thread' here?

Regards

Xiao


>
> Note that raid456 has a similar deadlock([1]), and it's verified[2]
> that this deadlock can be fixed by this patch as well.
>
> [1] https://lore.kernel.org/linux-raid/5ed54ffc-ce82-bf66-4eff-390cb23bc1ac@molgen.mpg.de/T/#t
> [2] https://lore.kernel.org/linux-raid/e9067438-d713-f5f3-0d3d-9e6b0e9efa0e@huaweicloud.com/
> Signed-off-by: Yu Kuai <yukuai3@huawei.com>
> ---
>   drivers/md/md.c | 23 +++++++++++++++++++----
>   drivers/md/md.h |  2 ++
>   2 files changed, 21 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index 63a993b52cd7..7912de0e4d12 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -652,6 +652,7 @@ void mddev_init(struct mddev *mddev)
>   	timer_setup(&mddev->safemode_timer, md_safemode_timeout, 0);
>   	atomic_set(&mddev->active, 1);
>   	atomic_set(&mddev->openers, 0);
> +	atomic_set(&mddev->sync_seq, 0);
>   	spin_lock_init(&mddev->lock);
>   	atomic_set(&mddev->flush_pending, 0);
>   	init_waitqueue_head(&mddev->sb_wait);
> @@ -4776,19 +4777,27 @@ static void stop_sync_thread(struct mddev *mddev)
>   	if (work_pending(&mddev->del_work))
>   		flush_workqueue(md_misc_wq);
>   
> -	if (mddev->sync_thread) {
> -		set_bit(MD_RECOVERY_INTR, &mddev->recovery);
> -		md_reap_sync_thread(mddev);
> -	}
> +	set_bit(MD_RECOVERY_INTR, &mddev->recovery);
> +	/*
> +	 * Thread might be blocked waiting for metadata update which will now
> +	 * never happen
> +	 */
> +	md_wakeup_thread_directly(mddev->sync_thread);
>   
>   	mddev_unlock(mddev);
>   }
>   
>   static void idle_sync_thread(struct mddev *mddev)
>   {
> +	int sync_seq = atomic_read(&mddev->sync_seq);
> +
>   	mutex_lock(&mddev->sync_mutex);
>   	clear_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
>   	stop_sync_thread(mddev);
> +
> +	wait_event(resync_wait, sync_seq != atomic_read(&mddev->sync_seq) ||
> +			!test_bit(MD_RECOVERY_RUNNING, &mddev->recovery));
> +
>   	mutex_unlock(&mddev->sync_mutex);
>   }
>   
> @@ -4797,6 +4806,10 @@ static void frozen_sync_thread(struct mddev *mddev)
>   	mutex_init(&mddev->delete_mutex);
>   	set_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
>   	stop_sync_thread(mddev);
> +
> +	wait_event(resync_wait, mddev->sync_thread == NULL &&
> +			!test_bit(MD_RECOVERY_RUNNING, &mddev->recovery));
> +
>   	mutex_unlock(&mddev->sync_mutex);
>   }
>   
> @@ -9472,6 +9485,8 @@ void md_reap_sync_thread(struct mddev *mddev)
>   
>   	/* resync has finished, collect result */
>   	md_unregister_thread(&mddev->sync_thread);
> +	atomic_inc(&mddev->sync_seq);
> +
>   	if (!test_bit(MD_RECOVERY_INTR, &mddev->recovery) &&
>   	    !test_bit(MD_RECOVERY_REQUESTED, &mddev->recovery) &&
>   	    mddev->degraded != mddev->raid_disks) {
> diff --git a/drivers/md/md.h b/drivers/md/md.h
> index 2fa903de5bd0..7cab9c7c45b8 100644
> --- a/drivers/md/md.h
> +++ b/drivers/md/md.h
> @@ -539,6 +539,8 @@ struct mddev {
>   
>   	/* Used to synchronize idle and frozen for action_store() */
>   	struct mutex			sync_mutex;
> +	/* The sequence number for sync thread */
> +	atomic_t sync_seq;
>   
>   	bool	has_superblocks:1;
>   	bool	fail_last_dev:1;


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [dm-devel] [PATCH -next v2 3/6] md: add a mutex to synchronize idle and frozen in action_store()
  2023-06-13 14:43   ` [dm-devel] " Xiao Ni
@ 2023-06-14  1:15     ` Yu Kuai
  2023-06-16  6:41       ` Song Liu
  0 siblings, 1 reply; 37+ messages in thread
From: Yu Kuai @ 2023-06-14  1:15 UTC (permalink / raw)
  To: Xiao Ni, Yu Kuai, guoqing.jiang, agk, snitzer, dm-devel, song
  Cc: yi.zhang, yangerkun, linux-kernel, linux-raid, yukuai (C)

Hi,

On 2023/06/13 22:43, Xiao Ni wrote:
> 
> On 2023/5/29 9:20 PM, Yu Kuai wrote:
>> From: Yu Kuai <yukuai3@huawei.com>
>>
>> Currently, for idle and frozen, action_store will hold 'reconfig_mutex'
>> and call md_reap_sync_thread() to stop sync thread, however, this will
>> cause deadlock (explained in the next patch). In order to fix the
>> problem, following patch will release 'reconfig_mutex' and wait on
>> 'resync_wait', like md_set_readonly() and do_md_stop() does.
>>
>> Consider that action_store() will set/clear 'MD_RECOVERY_FROZEN'
>> unconditionally, which might cause unexpected problems, for example,
>> frozen just set 'MD_RECOVERY_FROZEN' and is still in progress, while
>> 'idle' clear 'MD_RECOVERY_FROZEN' and new sync thread is started, which
>> might starve in progress frozen. A mutex is added to synchronize idle
>> and frozen from action_store().
>>
>> Signed-off-by: Yu Kuai <yukuai3@huawei.com>
>> ---
>>   drivers/md/md.c | 5 +++++
>>   drivers/md/md.h | 3 +++
>>   2 files changed, 8 insertions(+)
>>
>> diff --git a/drivers/md/md.c b/drivers/md/md.c
>> index 23e8e7eae062..63a993b52cd7 100644
>> --- a/drivers/md/md.c
>> +++ b/drivers/md/md.c
>> @@ -644,6 +644,7 @@ void mddev_init(struct mddev *mddev)
>>       mutex_init(&mddev->open_mutex);
>>       mutex_init(&mddev->reconfig_mutex);
>>       mutex_init(&mddev->delete_mutex);
>> +    mutex_init(&mddev->sync_mutex);
>>       mutex_init(&mddev->bitmap_info.mutex);
>>       INIT_LIST_HEAD(&mddev->disks);
>>       INIT_LIST_HEAD(&mddev->all_mddevs);
>> @@ -4785,14 +4786,18 @@ static void stop_sync_thread(struct mddev *mddev)
>>   static void idle_sync_thread(struct mddev *mddev)
>>   {
>> +    mutex_lock(&mddev->sync_mutex);
>>       clear_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
>>       stop_sync_thread(mddev);
>> +    mutex_unlock(&mddev->sync_mutex);
>>   }
>>   static void frozen_sync_thread(struct mddev *mddev)
>>   {
>> +    mutex_init(&mddev->delete_mutex);
> 
> 
> A typo? It should be mutex_lock(&mddev->sync_mutex);
> 

Yes, thanks for spotting this; it looks like I introduced it while
rebasing.

Thanks,
Kuai
> Regards
> 
> Xiao
> 
>>       set_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
>>       stop_sync_thread(mddev);
>> +    mutex_unlock(&mddev->sync_mutex);
>>   }
>>   static ssize_t
>> diff --git a/drivers/md/md.h b/drivers/md/md.h
>> index bfd2306bc750..2fa903de5bd0 100644
>> --- a/drivers/md/md.h
>> +++ b/drivers/md/md.h
>> @@ -537,6 +537,9 @@ struct mddev {
>>       /* Protect the deleting list */
>>       struct mutex            delete_mutex;
>> +    /* Used to synchronize idle and frozen for action_store() */
>> +    struct mutex            sync_mutex;
>> +
>>       bool    has_superblocks:1;
>>       bool    fail_last_dev:1;
>>       bool    serialize_policy:1;
> 
> .
> 


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [dm-devel] [PATCH -next v2 4/6] md: refactor idle/frozen_sync_thread() to fix deadlock
  2023-06-13 14:50   ` [dm-devel] " Xiao Ni
@ 2023-06-14  1:48     ` Yu Kuai
  2023-06-14  2:04       ` Yu Kuai
  2023-06-14  3:47       ` Xiao Ni
  0 siblings, 2 replies; 37+ messages in thread
From: Yu Kuai @ 2023-06-14  1:48 UTC (permalink / raw)
  To: Xiao Ni, Yu Kuai, guoqing.jiang, agk, snitzer, dm-devel, song
  Cc: linux-raid, yangerkun, linux-kernel, yi.zhang, yukuai (C)

Hi,

On 2023/06/13 22:50, Xiao Ni wrote:
> 
> On 2023/5/29 9:20 PM, Yu Kuai wrote:
>> From: Yu Kuai <yukuai3@huawei.com>
>>
>> Our test found the following deadlock in raid10:
>>
>> 1) Issue a normal write, and such write failed:
>>
>>    raid10_end_write_request
>>     set_bit(R10BIO_WriteError, &r10_bio->state)
>>     one_write_done
>>      reschedule_retry
>>
>>    // later from md thread
>>    raid10d
>>     handle_write_completed
>>      list_add(&r10_bio->retry_list, &conf->bio_end_io_list)
>>
>>    // later from md thread
>>    raid10d
>>     if (!test_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags))
>>      list_move(conf->bio_end_io_list.prev, &tmp)
>>      r10_bio = list_first_entry(&tmp, struct r10bio, retry_list)
>>      raid_end_bio_io(r10_bio)
>>
>> Dependency chain 1: normal io is waiting for updating superblock
> 
> Hi Kuai
> 
> It looks like the above situation is more complex than it needs to be.
> It only takes a normal write, and md_write_start() has to wait until the
> metadata is written to the member disks, right? If so, there is no need
> to introduce a raid10 write failure here. I guess that is just your test
> case. It would be nice if you could put your test steps in the patch,
> but for the analysis of the deadlock here, it's better to keep it
> simple.

The test script can be found here; it's pretty easy to trigger:

https://patchwork.kernel.org/project/linux-raid/patch/20230529132826.2125392-4-yukuai1@huaweicloud.com/

While reviewing the related code, I found that an io can only be added
to the bio_end_io_list from handle_write_completed() if it failed, so I
think an io failure is needed to trigger the deadlock from the daemon
thread.

I think the key point is how MD_SB_CHANGE_PENDING is set:

1) raid10_error() and rdev_set_badblocks(), triggered by io failure;
2) raid10_write_request(), related to reshape;
3) md_write_start() and md_allow_write(), when mddev->in_sync is set;
however, I think this is not a common case.

1) is used here because it's quite easy to trigger, and it is what we
met in real tests. 3) is possible, but I'd say let's keep 1); I don't
think it's necessary to reproduce this deadlock through another path.

Thanks,
Kuai
> 
>>
>> 2) Trigger a recovery:
>>
>>    raid10_sync_request
>>     raise_barrier
>>
>> Dependency chain 2: sync thread is waiting for normal io
>>
>> 3) echo idle/frozen to sync_action:
>>
>>    action_store
>>     mddev_lock
>>      md_unregister_thread
>>       kthread_stop
>>
>> Dependency chain 3: drop 'reconfig_mutex' is waiting for sync thread
>>
>> 4) md thread can't update superblock:
>>
>>    raid10d
>>     md_check_recovery
>>      if (mddev_trylock(mddev))
>>       md_update_sb
>>
>> Dependency chain 4: update superblock is waiting for 'reconfig_mutex'
>>
>> Hence a cyclic dependency exists; in order to fix the problem, we must
>> break one of them. Dependency 1 and 2 can't be broken because they are
>> foundation design. Dependency 4 may be possible if it can be guaranteed
>> that no io can be inflight, however, this requires a new mechanism which
>> seems complex. Dependency 3 is a good choice, because idle/frozen only
>> requires sync thread to finish, which can be done asynchronously that is
>> already implemented, and 'reconfig_mutex' is not needed anymore.
>>
>> This patch switches 'idle' and 'frozen' to wait for the sync thread to
>> be done asynchronously, and also adds a sequence counter to record how
>> many times the sync thread is done, so that 'idle' won't keep waiting
>> on a newly started sync thread.
> 
> In the patch, sync_seq is incremented in md_reap_sync_thread(). In
> idle_sync_thread(), if the recorded sync_seq isn't equal to
> mddev->sync_seq, it should mean someone has already stopped the sync
> thread, right? Why do you say 'newly started sync thread' here?
> 
> Regards
> 
> Xiao
> 
> 
>>
>> Note that raid456 has a similar deadlock([1]), and it's verified[2]
>> that this deadlock can be fixed by this patch as well.
>>
>> [1] 
>> https://lore.kernel.org/linux-raid/5ed54ffc-ce82-bf66-4eff-390cb23bc1ac@molgen.mpg.de/T/#t 
>>
>> [2] 
>> https://lore.kernel.org/linux-raid/e9067438-d713-f5f3-0d3d-9e6b0e9efa0e@huaweicloud.com/ 
>>
>> Signed-off-by: Yu Kuai <yukuai3@huawei.com>
>> ---
>>   drivers/md/md.c | 23 +++++++++++++++++++----
>>   drivers/md/md.h |  2 ++
>>   2 files changed, 21 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/md/md.c b/drivers/md/md.c
>> index 63a993b52cd7..7912de0e4d12 100644
>> --- a/drivers/md/md.c
>> +++ b/drivers/md/md.c
>> @@ -652,6 +652,7 @@ void mddev_init(struct mddev *mddev)
>>       timer_setup(&mddev->safemode_timer, md_safemode_timeout, 0);
>>       atomic_set(&mddev->active, 1);
>>       atomic_set(&mddev->openers, 0);
>> +    atomic_set(&mddev->sync_seq, 0);
>>       spin_lock_init(&mddev->lock);
>>       atomic_set(&mddev->flush_pending, 0);
>>       init_waitqueue_head(&mddev->sb_wait);
>> @@ -4776,19 +4777,27 @@ static void stop_sync_thread(struct mddev *mddev)
>>       if (work_pending(&mddev->del_work))
>>           flush_workqueue(md_misc_wq);
>> -    if (mddev->sync_thread) {
>> -        set_bit(MD_RECOVERY_INTR, &mddev->recovery);
>> -        md_reap_sync_thread(mddev);
>> -    }
>> +    set_bit(MD_RECOVERY_INTR, &mddev->recovery);
>> +    /*
>> +     * Thread might be blocked waiting for metadata update which will 
>> now
>> +     * never happen
>> +     */
>> +    md_wakeup_thread_directly(mddev->sync_thread);
>>       mddev_unlock(mddev);
>>   }
>>   static void idle_sync_thread(struct mddev *mddev)
>>   {
>> +    int sync_seq = atomic_read(&mddev->sync_seq);
>> +
>>       mutex_lock(&mddev->sync_mutex);
>>       clear_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
>>       stop_sync_thread(mddev);
>> +
>> +    wait_event(resync_wait, sync_seq != atomic_read(&mddev->sync_seq) ||
>> +            !test_bit(MD_RECOVERY_RUNNING, &mddev->recovery));
>> +
>>       mutex_unlock(&mddev->sync_mutex);
>>   }
>> @@ -4797,6 +4806,10 @@ static void frozen_sync_thread(struct mddev 
>> *mddev)
>>       mutex_init(&mddev->delete_mutex);
>>       set_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
>>       stop_sync_thread(mddev);
>> +
>> +    wait_event(resync_wait, mddev->sync_thread == NULL &&
>> +            !test_bit(MD_RECOVERY_RUNNING, &mddev->recovery));
>> +
>>       mutex_unlock(&mddev->sync_mutex);
>>   }
>> @@ -9472,6 +9485,8 @@ void md_reap_sync_thread(struct mddev *mddev)
>>       /* resync has finished, collect result */
>>       md_unregister_thread(&mddev->sync_thread);
>> +    atomic_inc(&mddev->sync_seq);
>> +
>>       if (!test_bit(MD_RECOVERY_INTR, &mddev->recovery) &&
>>           !test_bit(MD_RECOVERY_REQUESTED, &mddev->recovery) &&
>>           mddev->degraded != mddev->raid_disks) {
>> diff --git a/drivers/md/md.h b/drivers/md/md.h
>> index 2fa903de5bd0..7cab9c7c45b8 100644
>> --- a/drivers/md/md.h
>> +++ b/drivers/md/md.h
>> @@ -539,6 +539,8 @@ struct mddev {
>>       /* Used to synchronize idle and frozen for action_store() */
>>       struct mutex            sync_mutex;
>> +    /* The sequence number for sync thread */
>> +    atomic_t sync_seq;
>>       bool    has_superblocks:1;
>>       bool    fail_last_dev:1;
> 
> -- 
> dm-devel mailing list
> dm-devel@redhat.com
> https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [dm-devel] [PATCH -next v2 4/6] md: refactor idle/frozen_sync_thread() to fix deadlock
  2023-06-14  1:48     ` Yu Kuai
@ 2023-06-14  2:04       ` Yu Kuai
  2023-06-14  7:12         ` Xiao Ni
  2023-06-14  3:47       ` Xiao Ni
  1 sibling, 1 reply; 37+ messages in thread
From: Yu Kuai @ 2023-06-14  2:04 UTC (permalink / raw)
  To: Yu Kuai, Xiao Ni, guoqing.jiang, agk, snitzer, dm-devel, song
  Cc: linux-raid, yangerkun, linux-kernel, yi.zhang, yukuai (C)

Hi,

在 2023/06/14 9:48, Yu Kuai 写道:


>>
>> In the patch, sync_seq is added in md_reap_sync_thread. In 
>> idle_sync_thread, if sync_seq isn't equal
>>
>> mddev->sync_seq, it should mean there is someone that stops the sync 
>> thread already, right? Why do
>>
>> you say 'new started sync thread' here?

If someone stops the sync thread and a new sync thread is not started,
then this sync_seq won't make a difference: the above wait_event() will
not wait, because !test_bit(MD_RECOVERY_RUNNING, &mddev->recovery) will
pass. So 'sync_seq' only matters when the old sync thread stops and a
new sync thread starts; adding 'sync_seq' means 'idle' won't keep
waiting in this case.

Thanks,
Kuai


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [dm-devel] [PATCH -next v2 4/6] md: refactor idle/frozen_sync_thread() to fix deadlock
  2023-06-14  1:48     ` Yu Kuai
  2023-06-14  2:04       ` Yu Kuai
@ 2023-06-14  3:47       ` Xiao Ni
  2023-06-14  6:04         ` Yu Kuai
  1 sibling, 1 reply; 37+ messages in thread
From: Xiao Ni @ 2023-06-14  3:47 UTC (permalink / raw)
  To: Yu Kuai
  Cc: guoqing.jiang, agk, snitzer, dm-devel, song, linux-raid,
	yangerkun, linux-kernel, yi.zhang, yukuai (C)

On Wed, Jun 14, 2023 at 9:48 AM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>
> Hi,
>
> 在 2023/06/13 22:50, Xiao Ni 写道:
> >
> > 在 2023/5/29 下午9:20, Yu Kuai 写道:
> >> From: Yu Kuai <yukuai3@huawei.com>
> >>
> >> Our test found a following deadlock in raid10:
> >>
> >> 1) Issue a normal write, and such write failed:
> >>
> >>    raid10_end_write_request
> >>     set_bit(R10BIO_WriteError, &r10_bio->state)
> >>     one_write_done
> >>      reschedule_retry
> >>
> >>    // later from md thread
> >>    raid10d
> >>     handle_write_completed
> >>      list_add(&r10_bio->retry_list, &conf->bio_end_io_list)
> >>
> >>    // later from md thread
> >>    raid10d
> >>     if (!test_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags))
> >>      list_move(conf->bio_end_io_list.prev, &tmp)
> >>      r10_bio = list_first_entry(&tmp, struct r10bio, retry_list)
> >>      raid_end_bio_io(r10_bio)
> >>
> >> Dependency chain 1: normal io is waiting for updating superblock
> >
> > Hi Kuai
> >
> > It looks like the above situation is more complex. It only needs a
> > normal write and md_write_start needs to
> >
> > wait until the metadata is written to member disks, right? If so, it
> > doesn't need to introduce raid10 write failure
> >
> > here. I guess, it should be your test case. It's nice, if you can put
> > your test steps in the patch. But for the analysis
> >
> > of the deadlock here, it's better to be simple.
>
> Test script can be found here, it's pretty easy to trigger:
>
> https://patchwork.kernel.org/project/linux-raid/patch/20230529132826.2125392-4-yukuai1@huaweicloud.com/

Thanks for this.
>
> While reviewing the related code, I found that io can only be added to
> list bio_end_io_list from handle_write_completed() if such io failed, so
> I think io failure is needed to trigger deadlock from daemon thread.
>
> I think the key point is how MD_SB_CHANGE_PENDING is set:
>
> 1) raid10_error() and rdev_set_badblocks(), trigger by io failure;
> 2) raid10_write_request() related to reshape;
> 3) md_write_start() and md_allow_write(), and mddev->in_sync is set,
> however, I was thinking this is not a common case;
>
> 1) is used here because it's quite easy to trigger and this is what
> we meet in real test. 3) is possible but I will say let's keep 1), I
> don't think it's necessary to reporduce this deadlock through another
> path again.

It makes sense. Let's go back to the first path mentioned in the patch.

> 1) Issue a normal write, and such write failed:
>
>    raid10_end_write_request
>     set_bit(R10BIO_WriteError, &r10_bio->state)
>     one_write_done
>      reschedule_retry

This is good.
>
>    // later from md thread
>    raid10d
>     handle_write_completed
>      list_add(&r10_bio->retry_list, &conf->bio_end_io_list)

I have a question here. handle_write_completed() should run
narrow_write_error(). In the test case, will narrow_write_error() run
successfully? Or will it fail and call rdev_set_badblocks() and
md_error(), so that MD_SB_CHANGE_PENDING is set?

>
>    // later from md thread
>    raid10d
>     if (!test_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags))
>      list_move(conf->bio_end_io_list.prev, &tmp)
>      r10_bio = list_first_entry(&tmp, struct r10bio, retry_list)
>      raid_end_bio_io(r10_bio)
>
> Dependency chain 1: normal io is waiting for updating superblock

It's a little hard to understand, because it doesn't show how normal
io waits for a superblock update. Based on your last email, I guess
you mean that rdev_set_badblocks() sets MD_SB_CHANGE_PENDING, but the
flag can't be cleared, so the bios can't be added to bio_end_io_list,
and the io requests can't be finished.

Regards
Xiao
>
> Thanks,
> Kuai
> >
> >>
> >> 2) Trigger a recovery:
> >>
> >>    raid10_sync_request
> >>     raise_barrier
> >>
> >> Dependency chain 2: sync thread is waiting for normal io
> >>
> >> 3) echo idle/frozen to sync_action:
> >>
> >>    action_store
> >>     mddev_lock
> >>      md_unregister_thread
> >>       kthread_stop
> >>
> >> Dependency chain 3: drop 'reconfig_mutex' is waiting for sync thread
> >>
> >> 4) md thread can't update superblock:
> >>
> >>    raid10d
> >>     md_check_recovery
> >>      if (mddev_trylock(mddev))
> >>       md_update_sb
> >>
> >> Dependency chain 4: update superblock is waiting for 'reconfig_mutex'
> >>
> >> Hence cyclic dependency exist, in order to fix the problem, we must
> >> break one of them. Dependency 1 and 2 can't be broken because they are
> >> foundation design. Dependency 4 may be possible if it can be guaranteed
> >> that no io can be inflight, however, this requires a new mechanism which
> >> seems complex. Dependency 3 is a good choice, because idle/frozen only
> >> requires sync thread to finish, which can be done asynchronously that is
> >> already implemented, and 'reconfig_mutex' is not needed anymore.
> >>
> >> This patch switch 'idle' and 'frozen' to wait sync thread to be done
> >> asynchronously, and this patch also add a sequence counter to record how
> >> many times sync thread is done, so that 'idle' won't keep waiting on new
> >> started sync thread.
> >
> > In the patch, sync_seq is added in md_reap_sync_thread. In
> > idle_sync_thread, if sync_seq isn't equal
> >
> > mddev->sync_seq, it should mean there is someone that stops the sync
> > thread already, right? Why do
> >
> > you say 'new started sync thread' here?
> >
> > Regards
> >
> > Xiao
> >
> >
> >>
> >> Noted that raid456 has similiar deadlock([1]), and it's verified[2] this
> >> deadlock can be fixed by this patch as well.
> >>
> >> [1]
> >> https://lore.kernel.org/linux-raid/5ed54ffc-ce82-bf66-4eff-390cb23bc1ac@molgen.mpg.de/T/#t
> >>
> >> [2]
> >> https://lore.kernel.org/linux-raid/e9067438-d713-f5f3-0d3d-9e6b0e9efa0e@huaweicloud.com/
> >>
> >> Signed-off-by: Yu Kuai <yukuai3@huawei.com>
> >> ---
> >>   drivers/md/md.c | 23 +++++++++++++++++++----
> >>   drivers/md/md.h |  2 ++
> >>   2 files changed, 21 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/drivers/md/md.c b/drivers/md/md.c
> >> index 63a993b52cd7..7912de0e4d12 100644
> >> --- a/drivers/md/md.c
> >> +++ b/drivers/md/md.c
> >> @@ -652,6 +652,7 @@ void mddev_init(struct mddev *mddev)
> >>       timer_setup(&mddev->safemode_timer, md_safemode_timeout, 0);
> >>       atomic_set(&mddev->active, 1);
> >>       atomic_set(&mddev->openers, 0);
> >> +    atomic_set(&mddev->sync_seq, 0);
> >>       spin_lock_init(&mddev->lock);
> >>       atomic_set(&mddev->flush_pending, 0);
> >>       init_waitqueue_head(&mddev->sb_wait);
> >> @@ -4776,19 +4777,27 @@ static void stop_sync_thread(struct mddev *mddev)
> >>       if (work_pending(&mddev->del_work))
> >>           flush_workqueue(md_misc_wq);
> >> -    if (mddev->sync_thread) {
> >> -        set_bit(MD_RECOVERY_INTR, &mddev->recovery);
> >> -        md_reap_sync_thread(mddev);
> >> -    }
> >> +    set_bit(MD_RECOVERY_INTR, &mddev->recovery);
> >> +    /*
> >> +     * Thread might be blocked waiting for metadata update which will
> >> now
> >> +     * never happen
> >> +     */
> >> +    md_wakeup_thread_directly(mddev->sync_thread);
> >>       mddev_unlock(mddev);
> >>   }
> >>   static void idle_sync_thread(struct mddev *mddev)
> >>   {
> >> +    int sync_seq = atomic_read(&mddev->sync_seq);
> >> +
> >>       mutex_lock(&mddev->sync_mutex);
> >>       clear_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
> >>       stop_sync_thread(mddev);
> >> +
> >> +    wait_event(resync_wait, sync_seq != atomic_read(&mddev->sync_seq) ||
> >> +            !test_bit(MD_RECOVERY_RUNNING, &mddev->recovery));
> >> +
> >>       mutex_unlock(&mddev->sync_mutex);
> >>   }
> >> @@ -4797,6 +4806,10 @@ static void frozen_sync_thread(struct mddev
> >> *mddev)
> >>       mutex_init(&mddev->delete_mutex);
> >>       set_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
> >>       stop_sync_thread(mddev);
> >> +
> >> +    wait_event(resync_wait, mddev->sync_thread == NULL &&
> >> +            !test_bit(MD_RECOVERY_RUNNING, &mddev->recovery));
> >> +
> >>       mutex_unlock(&mddev->sync_mutex);
> >>   }
> >> @@ -9472,6 +9485,8 @@ void md_reap_sync_thread(struct mddev *mddev)
> >>       /* resync has finished, collect result */
> >>       md_unregister_thread(&mddev->sync_thread);
> >> +    atomic_inc(&mddev->sync_seq);
> >> +
> >>       if (!test_bit(MD_RECOVERY_INTR, &mddev->recovery) &&
> >>           !test_bit(MD_RECOVERY_REQUESTED, &mddev->recovery) &&
> >>           mddev->degraded != mddev->raid_disks) {
> >> diff --git a/drivers/md/md.h b/drivers/md/md.h
> >> index 2fa903de5bd0..7cab9c7c45b8 100644
> >> --- a/drivers/md/md.h
> >> +++ b/drivers/md/md.h
> >> @@ -539,6 +539,8 @@ struct mddev {
> >>       /* Used to synchronize idle and frozen for action_store() */
> >>       struct mutex            sync_mutex;
> >> +    /* The sequence number for sync thread */
> >> +    atomic_t sync_seq;
> >>       bool    has_superblocks:1;
> >>       bool    fail_last_dev:1;
> >
>


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [dm-devel] [PATCH -next v2 4/6] md: refactor idle/frozen_sync_thread() to fix deadlock
  2023-06-14  3:47       ` Xiao Ni
@ 2023-06-14  6:04         ` Yu Kuai
  2023-06-14  6:37           ` Xiao Ni
  0 siblings, 1 reply; 37+ messages in thread
From: Yu Kuai @ 2023-06-14  6:04 UTC (permalink / raw)
  To: Xiao Ni, Yu Kuai
  Cc: guoqing.jiang, agk, snitzer, dm-devel, song, linux-raid,
	yangerkun, linux-kernel, yi.zhang, yukuai (C)

Hi,

在 2023/06/14 11:47, Xiao Ni 写道:
> On Wed, Jun 14, 2023 at 9:48 AM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>>
>> Hi,
>>
>> 在 2023/06/13 22:50, Xiao Ni 写道:
>>>
>>> 在 2023/5/29 下午9:20, Yu Kuai 写道:
>>>> From: Yu Kuai <yukuai3@huawei.com>
>>>>
>>>> Our test found a following deadlock in raid10:
>>>>
>>>> 1) Issue a normal write, and such write failed:
>>>>
>>>>     raid10_end_write_request
>>>>      set_bit(R10BIO_WriteError, &r10_bio->state)
>>>>      one_write_done
>>>>       reschedule_retry
>>>>
>>>>     // later from md thread
>>>>     raid10d
>>>>      handle_write_completed
>>>>       list_add(&r10_bio->retry_list, &conf->bio_end_io_list)
>>>>
>>>>     // later from md thread
>>>>     raid10d
>>>>      if (!test_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags))
>>>>       list_move(conf->bio_end_io_list.prev, &tmp)
>>>>       r10_bio = list_first_entry(&tmp, struct r10bio, retry_list)
>>>>       raid_end_bio_io(r10_bio)
>>>>
>>>> Dependency chain 1: normal io is waiting for updating superblock
>>>
>>> Hi Kuai
>>>
>>> It looks like the above situation is more complex. It only needs a
>>> normal write and md_write_start needs to
>>>
>>> wait until the metadata is written to member disks, right? If so, it
>>> doesn't need to introduce raid10 write failure
>>>
>>> here. I guess, it should be your test case. It's nice, if you can put
>>> your test steps in the patch. But for the analysis
>>>
>>> of the deadlock here, it's better to be simple.
>>
>> Test script can be found here, it's pretty easy to trigger:
>>
>> https://patchwork.kernel.org/project/linux-raid/patch/20230529132826.2125392-4-yukuai1@huaweicloud.com/
> 
> Thanks for this.
>>
>> While reviewing the related code, I found that io can only be added to
>> list bio_end_io_list from handle_write_completed() if such io failed, so
>> I think io failure is needed to trigger deadlock from daemon thread.
>>
>> I think the key point is how MD_SB_CHANGE_PENDING is set:
>>
>> 1) raid10_error() and rdev_set_badblocks(), trigger by io failure;
>> 2) raid10_write_request() related to reshape;
>> 3) md_write_start() and md_allow_write(), and mddev->in_sync is set,
>> however, I was thinking this is not a common case;
>>
>> 1) is used here because it's quite easy to trigger and this is what
>> we meet in real test. 3) is possible but I will say let's keep 1), I
>> don't think it's necessary to reporduce this deadlock through another
>> path again.
> 
> It makes sense. Let's go back to the first path mentioned in the patch.
> 
>> 1) Issue a normal write, and such write failed:
>>
>>     raid10_end_write_request
>>      set_bit(R10BIO_WriteError, &r10_bio->state)
>>      one_write_done
>>       reschedule_retry
> 
> This is good.
>>
>>     // later from md thread
>>     raid10d
>>      handle_write_completed
>>       list_add(&r10_bio->retry_list, &conf->bio_end_io_list)
> 
> I have a question here. It should run narrow_write_error in
> handle_write_completed. In the test case, will narrow_write_error run
> successfully? Or it fails and will call rdev_set_badblocks and
> md_error. So MD_RECOVERY_PENDING will be set?

r10_bio will always be added to bio_end_io_list, no matter whether
narrow_write_error() succeeds or not. Dependency chain 1 here just
indicates that handling this r10_bio will wait for the superblock
update; it's not where MD_SB_CHANGE_PENDING is set...

And MD_SB_CHANGE_PENDING can be set from narrow_write_error() and
other places where rdev_set_badblocks() is called.
> 
>>
>>     // later from md thread
>>     raid10d
>>      if (!test_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags))
	-> It's here, if the flag is set, bio won't be handled.
>>       list_move(conf->bio_end_io_list.prev, &tmp)
>>       r10_bio = list_first_entry(&tmp, struct r10bio, retry_list)
>>       raid_end_bio_io(r10_bio)
>>
>> Dependency chain 1: normal io is waiting for updating superblock
> 
> It's a little hard to understand. Because it doesn't show how normal
> io waits for a superblock update. And based on your last email, I
> guess you want to say rdev_set_badblock sets MD_RECOVERY_PENDING, but
> the flag can't be cleared, so the bios can't be added to
> bio_end_io_list, so the io rquests can't be finished.

It's not that the bio can't be added to bio_end_io_list; it's that
bios in this list can't be handled while the flag is set in sb_flags.

Thanks,
Kuai
> [...]


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [dm-devel] [PATCH -next v2 4/6] md: refactor idle/frozen_sync_thread() to fix deadlock
  2023-06-14  6:04         ` Yu Kuai
@ 2023-06-14  6:37           ` Xiao Ni
  0 siblings, 0 replies; 37+ messages in thread
From: Xiao Ni @ 2023-06-14  6:37 UTC (permalink / raw)
  To: Yu Kuai
  Cc: guoqing.jiang, agk, snitzer, dm-devel, song, linux-raid,
	yangerkun, linux-kernel, yi.zhang, yukuai (C)

On Wed, Jun 14, 2023 at 2:05 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>
> Hi,
>
> 在 2023/06/14 11:47, Xiao Ni 写道:
> > On Wed, Jun 14, 2023 at 9:48 AM Yu Kuai <yukuai1@huaweicloud.com> wrote:
> >>
> >> Hi,
> >>
> >> 在 2023/06/13 22:50, Xiao Ni 写道:
> >>>
> >>> 在 2023/5/29 下午9:20, Yu Kuai 写道:
> >>>> From: Yu Kuai <yukuai3@huawei.com>
> >>>>
> >>>> Our test found a following deadlock in raid10:
> >>>>
> >>>> 1) Issue a normal write, and such write failed:
> >>>>
> >>>>     raid10_end_write_request
> >>>>      set_bit(R10BIO_WriteError, &r10_bio->state)
> >>>>      one_write_done
> >>>>       reschedule_retry
> >>>>
> >>>>     // later from md thread
> >>>>     raid10d
> >>>>      handle_write_completed
> >>>>       list_add(&r10_bio->retry_list, &conf->bio_end_io_list)
> >>>>
> >>>>     // later from md thread
> >>>>     raid10d
> >>>>      if (!test_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags))
> >>>>       list_move(conf->bio_end_io_list.prev, &tmp)
> >>>>       r10_bio = list_first_entry(&tmp, struct r10bio, retry_list)
> >>>>       raid_end_bio_io(r10_bio)
> >>>>
> >>>> Dependency chain 1: normal io is waiting for updating superblock
> >>>
> >>> Hi Kuai
> >>>
> >>> It looks like the above situation is more complex. It only needs a
> >>> normal write and md_write_start needs to
> >>>
> >>> wait until the metadata is written to member disks, right? If so, it
> >>> doesn't need to introduce raid10 write failure
> >>>
> >>> here. I guess, it should be your test case. It's nice, if you can put
> >>> your test steps in the patch. But for the analysis
> >>>
> >>> of the deadlock here, it's better to be simple.
> >>
> >> Test script can be found here, it's pretty easy to trigger:
> >>
> >> https://patchwork.kernel.org/project/linux-raid/patch/20230529132826.2125392-4-yukuai1@huaweicloud.com/
> >
> > Thanks for this.
> >>
> >> While reviewing the related code, I found that io can only be added to
> >> list bio_end_io_list from handle_write_completed() if such io failed, so
> >> I think io failure is needed to trigger deadlock from daemon thread.
> >>
> >> I think the key point is how MD_SB_CHANGE_PENDING is set:
> >>
> >> 1) raid10_error() and rdev_set_badblocks(), trigger by io failure;
> >> 2) raid10_write_request() related to reshape;
> >> 3) md_write_start() and md_allow_write(), and mddev->in_sync is set,
> >> however, I was thinking this is not a common case;
> >>
> >> 1) is used here because it's quite easy to trigger and this is what
> >> we meet in real test. 3) is possible but I will say let's keep 1), I
> >> don't think it's necessary to reporduce this deadlock through another
> >> path again.
> >
> > It makes sense. Let's go back to the first path mentioned in the patch.
> >
> >> 1) Issue a normal write, and such write failed:
> >>
> >>     raid10_end_write_request
> >>      set_bit(R10BIO_WriteError, &r10_bio->state)
> >>      one_write_done
> >>       reschedule_retry
> >
> > This is good.
> >>
> >>     // later from md thread
> >>     raid10d
> >>      handle_write_completed
> >>       list_add(&r10_bio->retry_list, &conf->bio_end_io_list)
> >
> > I have a question here. It should run narrow_write_error in
> > handle_write_completed. In the test case, will narrow_write_error run
> > successfully? Or it fails and will call rdev_set_badblocks and
> > md_error. So MD_RECOVERY_PENDING will be set?
>
> r10_bio will always be added to bio_end_io_list, no matter
> narrow_write_error() succeed or not. The dependecy chain 1 here is just
> indicate handle this r10_bio will wait for updating super block, it's
> not where MD_RECOVERY_PENDING is set...
>
> And MD_RECOVERY_PENDING can be set from narrow_write_error() and other
> places where rdev_set_badblocks() is called.

Because the patch doesn't show which step sets MD_SB_CHANGE_PENDING,
I had to guess. It's a normal write, so md_write_start() can also set
the flag, and in that case it can cause the same deadlock. So it's
better to state which step sets the flag.
> >
> >>
> >>     // later from md thread
> >>     raid10d
> >>      if (!test_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags))
>         -> It's here, if the flag is set, bio won't be handled.

Yes.

> >>       list_move(conf->bio_end_io_list.prev, &tmp)
> >>       r10_bio = list_first_entry(&tmp, struct r10bio, retry_list)
> >>       raid_end_bio_io(r10_bio)
> >>
> >> Dependency chain 1: normal io is waiting for updating superblock
> >
> > It's a little hard to understand. Because it doesn't show how normal
> > io waits for a superblock update. And based on your last email, I
> > guess you want to say rdev_set_badblock sets MD_RECOVERY_PENDING, but
> > the flag can't be cleared, so the bios can't be added to
> > bio_end_io_list, so the io rquests can't be finished.
>
> It's not that bio can't be added to bio_end_io_list, it's that bio in
> this list can't be handled if sb_flags is set.

Sorry for that. I wanted to say the same thing. I understand the case fully now.

Regards
Xiao
> [...]


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [dm-devel] [PATCH -next v2 4/6] md: refactor idle/frozen_sync_thread() to fix deadlock
  2023-06-14  2:04       ` Yu Kuai
@ 2023-06-14  7:12         ` Xiao Ni
  2023-06-14  7:38           ` Yu Kuai
  0 siblings, 1 reply; 37+ messages in thread
From: Xiao Ni @ 2023-06-14  7:12 UTC (permalink / raw)
  To: Yu Kuai
  Cc: guoqing.jiang, agk, snitzer, dm-devel, song, linux-raid,
	yangerkun, linux-kernel, yi.zhang, yukuai (C)

On Wed, Jun 14, 2023 at 10:04 AM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>
> Hi,
>
> On 2023/06/14 9:48, Yu Kuai wrote:
>
>
> >>
> >> In the patch, sync_seq is added in md_reap_sync_thread. In
> >> idle_sync_thread, if sync_seq isn't equal
> >>
> >> mddev->sync_seq, it should mean there is someone that stops the sync
> >> thread already, right? Why do
> >>
> >> you say 'new started sync thread' here?
>
> If someone stops the sync thread and a new sync thread is not started,
> then this sync_seq won't make a difference: the above wait_event() will
> not wait because !test_bit(MD_RECOVERY_RUNNING, &mddev->recovery) will
> pass. So 'sync_seq' is only used when the old sync thread stops and a
> new sync thread starts; adding 'sync_seq' will bypass this case.

Hi

If a new sync thread starts, why can sync_seq be different? sync_seq
is only added in md_reap_sync_thread. And when a new sync request
starts, it can't stop the sync request again?

At first, the sync_seq is 0

admin1
echo idle > sync_action
idle_sync_thread(sync_seq is 1)
echo resync > sync_action (new sync)

Then admin2 echoes idle > sync_action, sync_seq is still 1

Regards
Xiao

>
> Thanks,
> Kuai
>



* Re: [PATCH -next v2 5/6] md: wake up 'resync_wait' at last in md_reap_sync_thread()
  2023-05-29 13:20 ` [PATCH -next v2 5/6] md: wake up 'resync_wait' at last in md_reap_sync_thread() Yu Kuai
@ 2023-06-14  7:20   ` Xiao Ni
  0 siblings, 0 replies; 37+ messages in thread
From: Xiao Ni @ 2023-06-14  7:20 UTC (permalink / raw)
  To: Yu Kuai, guoqing.jiang, agk, snitzer, dm-devel, song
  Cc: linux-kernel, linux-raid, yukuai3, yi.zhang, yangerkun


On 2023/5/29 9:20 PM, Yu Kuai wrote:
> From: Yu Kuai <yukuai3@huawei.com>
>
> md_reap_sync_thread() was just replaced with wait_event(resync_wait, ...)
> in action_store(); make sure action_store() will still wait for
> everything to be done in md_reap_sync_thread().
>
> Signed-off-by: Yu Kuai <yukuai3@huawei.com>
> ---
>   drivers/md/md.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index 7912de0e4d12..f90226e6ddf8 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -9531,7 +9531,6 @@ void md_reap_sync_thread(struct mddev *mddev)
>   	if (mddev_is_clustered(mddev) && is_reshaped
>   				      && !test_bit(MD_CLOSING, &mddev->flags))
>   		md_cluster_ops->update_size(mddev, old_dev_sectors);
> -	wake_up(&resync_wait);
>   	/* flag recovery needed just to double check */
>   	set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
>   	sysfs_notify_dirent_safe(mddev->sysfs_completed);
> @@ -9539,6 +9538,7 @@ void md_reap_sync_thread(struct mddev *mddev)
>   	md_new_event();
>   	if (mddev->event_work.func)
>   		queue_work(md_misc_wq, &mddev->event_work);
> +	wake_up(&resync_wait);
>   }
>   EXPORT_SYMBOL(md_reap_sync_thread);
>   


Reviewed-by: Xiao Ni <xni@redhat.com>



* Re: [dm-devel] [PATCH -next v2 6/6] md: enhance checking in md_check_recovery()
  2023-05-29 13:20 ` [PATCH -next v2 6/6] md: enhance checking in md_check_recovery() Yu Kuai
@ 2023-06-14  7:24   ` Xiao Ni
  0 siblings, 0 replies; 37+ messages in thread
From: Xiao Ni @ 2023-06-14  7:24 UTC (permalink / raw)
  To: Yu Kuai, guoqing.jiang, agk, snitzer, dm-devel, song
  Cc: yi.zhang, yangerkun, linux-kernel, linux-raid, yukuai3


On 2023/5/29 9:20 PM, Yu Kuai wrote:
> From: Yu Kuai <yukuai3@huawei.com>
>
> For md_check_recovery():
>
> 1) if 'MD_RECOVERY_RUNNING' is not set, register a new sync_thread.
> 2) if 'MD_RECOVERY_RUNNING' is set:
>   a) if 'MD_RECOVERY_DONE' is not set, don't do anything, wait for
>     md_do_sync() to be done.
>   b) if 'MD_RECOVERY_DONE' is set, unregister sync_thread. Current code
>     expects that sync_thread is not NULL, otherwise a new sync_thread will
>     be registered, which will corrupt the array.
>
> Make sure md_check_recovery() won't register a new sync_thread if
> 'MD_RECOVERY_RUNNING' is still set, and a new WARN_ON_ONCE() is added for
> the above corruption.
>
> Signed-off-by: Yu Kuai <yukuai3@huawei.com>
> ---
>   drivers/md/md.c | 22 +++++++++++++++-------
>   1 file changed, 15 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index f90226e6ddf8..9da0fc906bbd 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -9397,16 +9397,24 @@ void md_check_recovery(struct mddev *mddev)
>   		if (mddev->sb_flags)
>   			md_update_sb(mddev, 0);
>   
> -		if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery) &&
> -		    !test_bit(MD_RECOVERY_DONE, &mddev->recovery)) {
> -			/* resync/recovery still happening */
> -			clear_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
> -			goto unlock;
> -		}
> -		if (mddev->sync_thread) {
> +		/*
> +		 * Never start a new sync thread if MD_RECOVERY_RUNNING is
> +		 * still set.
> +		 */
> +		if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery)) {
> +			if (!test_bit(MD_RECOVERY_DONE, &mddev->recovery)) {
> +				/* resync/recovery still happening */
> +				clear_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
> +				goto unlock;
> +			}
> +
> +			if (WARN_ON_ONCE(!mddev->sync_thread))
> +				goto unlock;
> +
>   			md_reap_sync_thread(mddev);
>   			goto unlock;
>   		}
> +
>   		/* Set RUNNING before clearing NEEDED to avoid
>   		 * any transients in the value of "sync_action".
>   		 */

It makes the logic clearer.

Reviewed-by: Xiao Ni <xni@redhat.com>



* Re: [dm-devel] [PATCH -next v2 4/6] md: refactor idle/frozen_sync_thread() to fix deadlock
  2023-06-14  7:12         ` Xiao Ni
@ 2023-06-14  7:38           ` Yu Kuai
  2023-06-14  7:57             ` Xiao Ni
  0 siblings, 1 reply; 37+ messages in thread
From: Yu Kuai @ 2023-06-14  7:38 UTC (permalink / raw)
  To: Xiao Ni, Yu Kuai
  Cc: guoqing.jiang, agk, snitzer, dm-devel, song, linux-raid,
	yangerkun, linux-kernel, yi.zhang, yukuai (C)

Hi,

On 2023/06/14 15:12, Xiao Ni wrote:
> On Wed, Jun 14, 2023 at 10:04 AM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>>
>> Hi,
>>
>> On 2023/06/14 9:48, Yu Kuai wrote:
>>
>>
>>>>
>>>> In the patch, sync_seq is added in md_reap_sync_thread. In
>>>> idle_sync_thread, if sync_seq isn't equal
>>>>
>>>> mddev->sync_seq, it should mean there is someone that stops the sync
>>>> thread already, right? Why do
>>>>
>>>> you say 'new started sync thread' here?
>>
>> If someone stops the sync thread, and new sync thread is not started,
>> then this sync_seq won't make a difference, above wait_event() will not
>> wait because !test_bit(MD_RECOVERY_RUNNING, &mddev->recovery) will pass.
>> So 'sync_seq' is only used when the old sync thread stops and new sync
>> thread starts, add 'sync_seq' will bypass this case.
> 
> Hi
> 
> If a new sync thread starts, why can sync_seq be different? sync_seq
> is only added in md_reap_sync_thread. And when a new sync request
> starts, it can't stop the sync request again?
> 
> At first, the sync_seq is 0
> 
> admin1
> echo idle > sync_action
> idle_sync_thread(sync_seq is 1)

Wait, I'm confused here, how can sync_seq be 1 here? I suppose you
mean that a sync_thread just finished?

Then the problem is that idle_sync_thread() reads sync_seq after the old
sync_thread is done, and a new sync_thread starts before wait_event() is
called; should we wait for this new sync_thread?

My answer here is that we should, but I'm also ok with not waiting for
this new sync_thread; I don't think this behaviour matters. The key
point here is that once wait_event() is called from idle_sync_thread(),
this wait_event() should not wait for a new sync_thread...

> echo resync > sync_action (new sync)

If this is behind "echo idle > sync_action", idle_sync_thread should not
see that MD_RECOVERY_RUNNING is set and wait_event() won't wait at all.

Thanks,
Kuai
> 
> Then admin2 echos idle > sync_action, sync_seq is still 1
> 
> Regards
> Xiao
> 
>>
>> Thanks,
>> Kuai
>>
> 
> .
> 



* Re: [dm-devel] [PATCH -next v2 4/6] md: refactor idle/frozen_sync_thread() to fix deadlock
  2023-06-14  7:38           ` Yu Kuai
@ 2023-06-14  7:57             ` Xiao Ni
  2023-06-14  8:28               ` Yu Kuai
  0 siblings, 1 reply; 37+ messages in thread
From: Xiao Ni @ 2023-06-14  7:57 UTC (permalink / raw)
  To: Yu Kuai
  Cc: guoqing.jiang, agk, snitzer, dm-devel, song, linux-raid,
	yangerkun, linux-kernel, yi.zhang, yukuai (C)

On Wed, Jun 14, 2023 at 3:38 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>
> Hi,
>
> On 2023/06/14 15:12, Xiao Ni wrote:
> > On Wed, Jun 14, 2023 at 10:04 AM Yu Kuai <yukuai1@huaweicloud.com> wrote:
> >>
> >> Hi,
> >>
> >> On 2023/06/14 9:48, Yu Kuai wrote:
> >>
> >>
> >>>>
> >>>> In the patch, sync_seq is added in md_reap_sync_thread. In
> >>>> idle_sync_thread, if sync_seq isn't equal
> >>>>
> >>>> mddev->sync_seq, it should mean there is someone that stops the sync
> >>>> thread already, right? Why do
> >>>>
> >>>> you say 'new started sync thread' here?
> >>
> >> If someone stops the sync thread, and new sync thread is not started,
> >> then this sync_seq won't make a difference, above wait_event() will not
> >> wait because !test_bit(MD_RECOVERY_RUNNING, &mddev->recovery) will pass.
> >> So 'sync_seq' is only used when the old sync thread stops and new sync
> >> thread starts, add 'sync_seq' will bypass this case.
> >
> > Hi
> >
> > If a new sync thread starts, why can sync_seq be different? sync_seq
> > is only added in md_reap_sync_thread. And when a new sync request
> > starts, it can't stop the sync request again?
> >
> > At first, the sync_seq is 0
> >
> > admin1
> > echo idle > sync_action
> > idle_sync_thread(sync_seq is 1)
>
> Wait, I'm confused here, how can sync_seq to be 1 here? I suppose you
> mean that there is a sync_thread just finished?

Hi Kuai

Yes. Because idle_sync_thread needs to wait until md_reap_sync_thread
finishes. And md_reap_sync_thread adds sync_seq. Do I understand your
patch right?

>
> Then the problem is that idle_sync_thread() read sync_seq after the old
> sync_thread is done, and new sync_thread start before wait_event() is
> called, should we wait for this new sync_thread?
>
> My answer here is that we should, but I'm also ok to not wait this new
> sync_thread, I don't think this behaviour matters. The key point here
> is that once wait_event() is called from idle_sync_thread(), this
> wait_event() should not wait for new sync_thread...

I think we should wait. If we don't wait for it, there is a problem.
One person echoes idle to sync_action and it doesn't work sometimes.
It's a strange thing.

>
> > echo resync > sync_action (new sync)
>
> If this is behind "echo idle > sync_action", idle_sync_thread should not
> see that MD_RECOVERY_RUNNING is set and wait_event() won't wait at all.

`echo resync > sync_action` can't change the sync_seq. So 'echo idle >
sync_action' still waits until MD_RECOVERY_RUNNING is cleared?

Regards
Xiao

>
> Thanks,
> Kuai
> >
> > Then admin2 echos idle > sync_action, sync_seq is still 1
> >
> > Regards
> > Xiao
> >
> >>
> >> Thanks,
> >> Kuai
> >>
> >
> > .
> >
>



* Re: [dm-devel] [PATCH -next v2 4/6] md: refactor idle/frozen_sync_thread() to fix deadlock
  2023-06-14  7:57             ` Xiao Ni
@ 2023-06-14  8:28               ` Yu Kuai
  2023-06-14  9:08                 ` Xiao Ni
  0 siblings, 1 reply; 37+ messages in thread
From: Yu Kuai @ 2023-06-14  8:28 UTC (permalink / raw)
  To: Xiao Ni, Yu Kuai
  Cc: guoqing.jiang, agk, snitzer, dm-devel, song, linux-raid,
	yangerkun, linux-kernel, yi.zhang, yukuai (C)

Hi,

On 2023/06/14 15:57, Xiao Ni wrote:
> On Wed, Jun 14, 2023 at 3:38 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>>
>> Hi,
>>
>> On 2023/06/14 15:12, Xiao Ni wrote:
>>> On Wed, Jun 14, 2023 at 10:04 AM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>>>>
>>>> Hi,
>>>>
>>>> On 2023/06/14 9:48, Yu Kuai wrote:
>>>>
>>>>
>>>>>>
>>>>>> In the patch, sync_seq is added in md_reap_sync_thread. In
>>>>>> idle_sync_thread, if sync_seq isn't equal
>>>>>>
>>>>>> mddev->sync_seq, it should mean there is someone that stops the sync
>>>>>> thread already, right? Why do
>>>>>>
>>>>>> you say 'new started sync thread' here?
>>>>
>>>> If someone stops the sync thread, and new sync thread is not started,
>>>> then this sync_seq won't make a difference, above wait_event() will not
>>>> wait because !test_bit(MD_RECOVERY_RUNNING, &mddev->recovery) will pass.
>>>> So 'sync_seq' is only used when the old sync thread stops and new sync
>>>> thread starts, add 'sync_seq' will bypass this case.
>>>
>>> Hi
>>>
>>> If a new sync thread starts, why can sync_seq be different? sync_seq
>>> is only added in md_reap_sync_thread. And when a new sync request
>>> starts, it can't stop the sync request again?
>>>
>>> At first, the sync_seq is 0
>>>
>>> admin1
>>> echo idle > sync_action
>>> idle_sync_thread(sync_seq is 1)
>>
>> Wait, I'm confused here, how can sync_seq to be 1 here? I suppose you
>> mean that there is a sync_thread just finished?
> 
> Hi Kuai
> 
> Yes. Because idle_sync_thread needs to wait until md_reap_sync_thread
> finishes. And md_reap_sync_thread adds sync_seq. Do I understand your
> patch right?

Yes, note that idle_sync_thread() will only wait if MD_RECOVERY_RUNNING
is set.

> 
>>
>> Then the problem is that idle_sync_thread() read sync_seq after the old
>> sync_thread is done, and new sync_thread start before wait_event() is
>> called, should we wait for this new sync_thread?
>>
>> My answer here is that we should, but I'm also ok to not wait this new
>> sync_thread, I don't think this behaviour matters. The key point here
>> is that once wait_event() is called from idle_sync_thread(), this
>> wait_event() should not wait for new sync_thread...
> 
> I think we should wait. If we don't wait for it, there is a problem.
> One person echos idle to sync_action and it doesn't work sometimes.
> It's a strange thing.
> 

Ok. I'll add a new comment to emphasize that idle_sync_thread() won't
wait for a new sync_thread that is started after wait_event().

>>
>>> echo resync > sync_action (new sync)
>>
>> If this is behind "echo idle > sync_action", idle_sync_thread should not
>> see that MD_RECOVERY_RUNNING is set and wait_event() won't wait at all.
> 
> `echo resync > sync_action` can't change the sync_seq. So 'echo idle >
> sync_action' still waits until MD_RECOVERY_RUNNING is cleared?

This is not accurate: if `echo resync > sync_action` triggers a new
sync_thread, then sync_seq is updated when this sync_thread is done;
during this period, MD_RECOVERY_RUNNING is still set, so `echo idle >
sync_action` will wait for the sync_thread to be done.

Thanks,
Kuai
> 
> Regards
> Xiao
> 
>>
>> Thanks,
>> Kuai
>>>
>>> Then admin2 echos idle > sync_action, sync_seq is still 1
>>>
>>> Regards
>>> Xiao
>>>
>>>>
>>>> Thanks,
>>>> Kuai
>>>>
>>>
>>> .
>>>
>>
> 
> .
> 



* Re: [dm-devel] [PATCH -next v2 4/6] md: refactor idle/frozen_sync_thread() to fix deadlock
  2023-06-14  8:28               ` Yu Kuai
@ 2023-06-14  9:08                 ` Xiao Ni
  2023-06-15  1:28                   ` Yu Kuai
  0 siblings, 1 reply; 37+ messages in thread
From: Xiao Ni @ 2023-06-14  9:08 UTC (permalink / raw)
  To: Yu Kuai
  Cc: guoqing.jiang, agk, snitzer, dm-devel, song, linux-raid,
	yangerkun, linux-kernel, yi.zhang, yukuai (C)

On Wed, Jun 14, 2023 at 4:29 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>
> Hi,
>
> On 2023/06/14 15:57, Xiao Ni wrote:
> > On Wed, Jun 14, 2023 at 3:38 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
> >>
> >> Hi,
> >>
> >> On 2023/06/14 15:12, Xiao Ni wrote:
> >>> On Wed, Jun 14, 2023 at 10:04 AM Yu Kuai <yukuai1@huaweicloud.com> wrote:
> >>>>
> >>>> Hi,
> >>>>
> >>>> On 2023/06/14 9:48, Yu Kuai wrote:
> >>>>
> >>>>
> >>>>>>
> >>>>>> In the patch, sync_seq is added in md_reap_sync_thread. In
> >>>>>> idle_sync_thread, if sync_seq isn't equal
> >>>>>>
> >>>>>> mddev->sync_seq, it should mean there is someone that stops the sync
> >>>>>> thread already, right? Why do
> >>>>>>
> >>>>>> you say 'new started sync thread' here?
> >>>>
> >>>> If someone stops the sync thread, and new sync thread is not started,
> >>>> then this sync_seq won't make a difference, above wait_event() will not
> >>>> wait because !test_bit(MD_RECOVERY_RUNNING, &mddev->recovery) will pass.
> >>>> So 'sync_seq' is only used when the old sync thread stops and new sync
> >>>> thread starts, add 'sync_seq' will bypass this case.
> >>>
> >>> Hi
> >>>
> >>> If a new sync thread starts, why can sync_seq be different? sync_seq
> >>> is only added in md_reap_sync_thread. And when a new sync request
> >>> starts, it can't stop the sync request again?
> >>>
> >>> At first, the sync_seq is 0
> >>>
> >>> admin1
> >>> echo idle > sync_action
> >>> idle_sync_thread(sync_seq is 1)
> >>
> >> Wait, I'm confused here, how can sync_seq to be 1 here? I suppose you
> >> mean that there is a sync_thread just finished?
> >
> > Hi Kuai
> >
> > Yes. Because idle_sync_thread needs to wait until md_reap_sync_thread
> > finishes. And md_reap_sync_thread adds sync_seq. Do I understand your
> > patch right?
>
> Yes, noted that idle_sync_thread() will only wait if MD_RECOVERY_RUNNING
> is set.
>
> >
> >>
> >> Then the problem is that idle_sync_thread() read sync_seq after the old
> >> sync_thread is done, and new sync_thread start before wait_event() is
> >> called, should we wait for this new sync_thread?
> >>
> >> My answer here is that we should, but I'm also ok to not wait this new
> >> sync_thread, I don't think this behaviour matters. The key point here
> >> is that once wait_event() is called from idle_sync_thread(), this
> >> wait_event() should not wait for new sync_thread...
> >
> > I think we should wait. If we don't wait for it, there is a problem.
> > One person echos idle to sync_action and it doesn't work sometimes.
> > It's a strange thing.
> >
>
> Ok. I'll add new comment to emphasize that idle_sync_thread() won't wait
> for new sync_thread that is started after wait_event().

I suggest removing this function. Without this change, it's simpler
and it can work well without problems. The people that echo idle
to sync_action need to wait until the sync action finishes. The code
semantics are clear and simple.
>
> >>
> >>> echo resync > sync_action (new sync)
> >>
> >> If this is behind "echo idle > sync_action", idle_sync_thread should not
> >> see that MD_RECOVERY_RUNNING is set and wait_event() won't wait at all.
> >
> > `echo resync > sync_action` can't change the sync_seq. So 'echo idle >
> > sync_action' still waits until MD_RECOVERY_RUNNING is cleared?
>
> This is not accurate, if `echo resync > sync_action` triggers a new
> sync_thread, then sync_seq is updated when this sync_thread is done,
> during this period, MD_RECOVERY_RUNNING is still set, so `echo idle
>  >sync_action` will wait for sync_thread to be done.

I can understand your comment, but sorry, I still can't get how
sync_seq works. Could you give a specific case that explains how it
works?

Regards
Xiao
>
> Thanks,
> Kuai
> >
> > Regards
> > Xiao
> >
> >>
> >> Thanks,
> >> Kuai
> >>>
> >>> Then admin2 echos idle > sync_action, sync_seq is still 1
> >>>
> >>> Regards
> >>> Xiao
> >>>
> >>>>
> >>>> Thanks,
> >>>> Kuai
> >>>>
> >>>
> >>> .
> >>>
> >>
> >
> > .
> >
>



* Re: [dm-devel] [PATCH -next v2 4/6] md: refactor idle/frozen_sync_thread() to fix deadlock
  2023-06-14  9:08                 ` Xiao Ni
@ 2023-06-15  1:28                   ` Yu Kuai
  2023-06-15  8:01                     ` Xiao Ni
  0 siblings, 1 reply; 37+ messages in thread
From: Yu Kuai @ 2023-06-15  1:28 UTC (permalink / raw)
  To: Xiao Ni, Yu Kuai
  Cc: guoqing.jiang, agk, snitzer, dm-devel, song, linux-raid,
	yangerkun, linux-kernel, yi.zhang, yukuai (C)

Hi,

On 2023/06/14 17:08, Xiao Ni wrote:
> On Wed, Jun 14, 2023 at 4:29 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>>
>> Hi,
>>
>> On 2023/06/14 15:57, Xiao Ni wrote:
>>> On Wed, Jun 14, 2023 at 3:38 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>>>>
>>>> Hi,
>>>>
>>>> On 2023/06/14 15:12, Xiao Ni wrote:
>>>>> On Wed, Jun 14, 2023 at 10:04 AM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> On 2023/06/14 9:48, Yu Kuai wrote:
>>>>>>
>>>>>>
>>>>>>>>
>>>>>>>> In the patch, sync_seq is added in md_reap_sync_thread. In
>>>>>>>> idle_sync_thread, if sync_seq isn't equal
>>>>>>>>
>>>>>>>> mddev->sync_seq, it should mean there is someone that stops the sync
>>>>>>>> thread already, right? Why do
>>>>>>>>
>>>>>>>> you say 'new started sync thread' here?
>>>>>>
>>>>>> If someone stops the sync thread, and new sync thread is not started,
>>>>>> then this sync_seq won't make a difference, above wait_event() will not
>>>>>> wait because !test_bit(MD_RECOVERY_RUNNING, &mddev->recovery) will pass.
>>>>>> So 'sync_seq' is only used when the old sync thread stops and new sync
>>>>>> thread starts, add 'sync_seq' will bypass this case.
>>>>>
>>>>> Hi
>>>>>
>>>>> If a new sync thread starts, why can sync_seq be different? sync_seq
>>>>> is only added in md_reap_sync_thread. And when a new sync request
>>>>> starts, it can't stop the sync request again?
>>>>>
>>>>> At first, the sync_seq is 0
>>>>>
>>>>> admin1
>>>>> echo idle > sync_action
>>>>> idle_sync_thread(sync_seq is 1)
>>>>
>>>> Wait, I'm confused here, how can sync_seq to be 1 here? I suppose you
>>>> mean that there is a sync_thread just finished?
>>>
>>> Hi Kuai
>>>
>>> Yes. Because idle_sync_thread needs to wait until md_reap_sync_thread
>>> finishes. And md_reap_sync_thread adds sync_seq. Do I understand your
>>> patch right?
>>
>> Yes, noted that idle_sync_thread() will only wait if MD_RECOVERY_RUNNING
>> is set.
>>
>>>
>>>>
>>>> Then the problem is that idle_sync_thread() read sync_seq after the old
>>>> sync_thread is done, and new sync_thread start before wait_event() is
>>>> called, should we wait for this new sync_thread?
>>>>
>>>> My answer here is that we should, but I'm also ok to not wait this new
>>>> sync_thread, I don't think this behaviour matters. The key point here
>>>> is that once wait_event() is called from idle_sync_thread(), this
>>>> wait_event() should not wait for new sync_thread...
>>>
>>> I think we should wait. If we don't wait for it, there is a problem.
>>> One person echos idle to sync_action and it doesn't work sometimes.
>>> It's a strange thing.
>>>
>>
>> Ok. I'll add new comment to emphasize that idle_sync_thread() won't wait
>> for new sync_thread that is started after wait_event().
> 
> I suggest removing this function. Without this change, it's more
> simple and it can work well without problem. The people that echo idle
> to sync_action needs to wait until the sync action finishes. The code
> semantic is clear and simple.
>>
>>>>
>>>>> echo resync > sync_action (new sync)
>>>>
>>>> If this is behind "echo idle > sync_action", idle_sync_thread should not
>>>> see that MD_RECOVERY_RUNNING is set and wait_event() won't wait at all.
>>>
>>> `echo resync > sync_action` can't change the sync_seq. So 'echo idle >
>>> sync_action' still waits until MD_RECOVERY_RUNNING is cleared?
>>
>> This is not accurate, if `echo resync > sync_action` triggers a new
>> sync_thread, then sync_seq is updated when this sync_thread is done,
>> during this period, MD_RECOVERY_RUNNING is still set, so `echo idle
>>   >sync_action` will wait for sync_thread to be done.
> 
> I can understand your comment, but sorry, I still can't get how
> sync_seq works. Could you give a specific case that explains how it
> works?

Ok, the problem is that echo idle is supposed to interrupt sync_thread
and stop sync_thread quickly. Now that we don't hold the mutex here, we
can't prevent a new sync_thread from starting. For example:

1) a sync_thread A is running, MD_RECOVERY_RUNNING is set;

2) echo idle, A will be interrupted, the mutex is not held and
idle_sync_thread() is waiting for MD_RECOVERY_RUNNING to be cleared.

3) A is interrupted, it'll clear MD_RECOVERY_RUNNING and try to wake up
idle_sync_thread(); however, before idle_sync_thread() is woken, A can
finish and a new sync_thread B can be started, and MD_RECOVERY_RUNNING
will be set again.

4) idle_sync_thread() finally wakes up; however, MD_RECOVERY_RUNNING is
set and it will still wait. And this time B won't be interrupted.

Thanks,
Kuai



* Re: [dm-devel] [PATCH -next v2 4/6] md: refactor idle/frozen_sync_thread() to fix deadlock
  2023-06-15  1:28                   ` Yu Kuai
@ 2023-06-15  8:01                     ` Xiao Ni
  2023-06-15  8:17                       ` Xiao Ni
  0 siblings, 1 reply; 37+ messages in thread
From: Xiao Ni @ 2023-06-15  8:01 UTC (permalink / raw)
  To: Yu Kuai
  Cc: yi.zhang, yangerkun, snitzer, linux-kernel, linux-raid, song,
	dm-devel, guoqing.jiang, yukuai (C),
	agk

On Thu, Jun 15, 2023 at 9:29 AM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>
> Hi,
>
> On 2023/06/14 17:08, Xiao Ni wrote:
> > On Wed, Jun 14, 2023 at 4:29 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
> >>
> >> Hi,
> >>
> >> On 2023/06/14 15:57, Xiao Ni wrote:
> >>> On Wed, Jun 14, 2023 at 3:38 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
> >>>>
> >>>> Hi,
> >>>>
> >>>> On 2023/06/14 15:12, Xiao Ni wrote:
> >>>>> On Wed, Jun 14, 2023 at 10:04 AM Yu Kuai <yukuai1@huaweicloud.com> wrote:
> >>>>>>
> >>>>>> Hi,
> >>>>>>
> >>>>>> On 2023/06/14 9:48, Yu Kuai wrote:
> >>>>>>
> >>>>>>
> >>>>>>>>
> >>>>>>>> In the patch, sync_seq is added in md_reap_sync_thread. In
> >>>>>>>> idle_sync_thread, if sync_seq isn't equal
> >>>>>>>>
> >>>>>>>> mddev->sync_seq, it should mean there is someone that stops the sync
> >>>>>>>> thread already, right? Why do
> >>>>>>>>
> >>>>>>>> you say 'new started sync thread' here?
> >>>>>>
> >>>>>> If someone stops the sync thread, and new sync thread is not started,
> >>>>>> then this sync_seq won't make a difference, above wait_event() will not
> >>>>>> wait because !test_bit(MD_RECOVERY_RUNNING, &mddev->recovery) will pass.
> >>>>>> So 'sync_seq' is only used when the old sync thread stops and new sync
> >>>>>> thread starts, add 'sync_seq' will bypass this case.
> >>>>>
> >>>>> Hi
> >>>>>
> >>>>> If a new sync thread starts, why can sync_seq be different? sync_seq
> >>>>> is only added in md_reap_sync_thread. And when a new sync request
> >>>>> starts, it can't stop the sync request again?
> >>>>>
> >>>>> At first, the sync_seq is 0
> >>>>>
> >>>>> admin1
> >>>>> echo idle > sync_action
> >>>>> idle_sync_thread(sync_seq is 1)
> >>>>
> >>>> Wait, I'm confused here, how can sync_seq to be 1 here? I suppose you
> >>>> mean that there is a sync_thread just finished?
> >>>
> >>> Hi Kuai
> >>>
> >>> Yes. Because idle_sync_thread needs to wait until md_reap_sync_thread
> >>> finishes. And md_reap_sync_thread adds sync_seq. Do I understand your
> >>> patch right?
> >>
> >> Yes, noted that idle_sync_thread() will only wait if MD_RECOVERY_RUNNING
> >> is set.
> >>
> >>>
> >>>>
> >>>> Then the problem is that idle_sync_thread() read sync_seq after the old
> >>>> sync_thread is done, and new sync_thread start before wait_event() is
> >>>> called, should we wait for this new sync_thread?
> >>>>
> >>>> My answer here is that we should, but I'm also ok to not wait this new
> >>>> sync_thread, I don't think this behaviour matters. The key point here
> >>>> is that once wait_event() is called from idle_sync_thread(), this
> >>>> wait_event() should not wait for new sync_thread...
> >>>
> >>> I think we should wait. If we don't wait for it, there is a problem.
> >>> One person echos idle to sync_action and it doesn't work sometimes.
> >>> It's a strange thing.
> >>>
> >>
> >> Ok. I'll add new comment to emphasize that idle_sync_thread() won't wait
> >> for new sync_thread that is started after wait_event().
> >
> > I suggest removing this function. Without this change, it's more
> > simple and it can work well without problem. The people that echo idle
> > to sync_action needs to wait until the sync action finishes. The code
> > semantic is clear and simple.
> >>
> >>>>
> >>>>> echo resync > sync_action (new sync)
> >>>>
> >>>> If this is behind "echo idle > sync_action", idle_sync_thread should not
> >>>> see that MD_RECOVERY_RUNNING is set and wait_event() won't wait at all.
> >>>
> >>> `echo resync > sync_action` can't change the sync_seq. So 'echo idle >
> >>> sync_action' still waits until MD_RECOVERY_RUNNING is cleared?
> >>
> >> This is not accurate, if `echo resync > sync_action` triggers a new
> >> sync_thread, then sync_seq is updated when this sync_thread is done,
> >> during this period, MD_RECOVERY_RUNNING is still set, so `echo idle
> >>   >sync_action` will wait for sync_thread to be done.
> >
> > I can understand your comment, but sorry, I still can't get how
> > sync_seq works. Could you give a specific case that explains how it
> > works?
>
> Ok, the problem is that echo idle is supposed to interrupt sync_thread
> and stop sync_thread quickly. Now that we don't hold the mutex here, we
> can't prevent a new sync_thread from starting. For example:
>
> 1) a sync_thread A is running, MD_RECOVERY_RUNNING is set;
>
> 2) echo idle, A will be interrupted, the mutex is not held and
> idle_sync_thread() is waiting for MD_RECOVERY_RUNNING to be cleared.
>
> 3) A is interrupted, it'll clear MD_RECOVERY_RUNNING and try to wake up
> idle_sync_thread(); however, before idle_sync_thread() is woken, A can
> finish and a new sync_thread B can be started, and MD_RECOVERY_RUNNING
> will be set again.
>
> 4) idle_sync_thread() finally wakes up; however, MD_RECOVERY_RUNNING is
> set and it will still wait. And this time B won't be interrupted.

Thanks for the example. I can understand the usage now. It's a
side effect of removing the mutex protection from idle_sync_thread.

There is a problem. A new sync thread is started in md_check_recovery.
After your patch, md_reap_sync_thread is called in md_check_recovery
too. So it looks like they can't happen at the same time?

Regards
Xiao

>
> Thanks,
> Kuai
>



* Re: [dm-devel] [PATCH -next v2 4/6] md: refactor idle/frozen_sync_thread() to fix deadlock
  2023-06-15  8:01                     ` Xiao Ni
@ 2023-06-15  8:17                       ` Xiao Ni
  2023-06-15  9:05                         ` Yu Kuai
  0 siblings, 1 reply; 37+ messages in thread
From: Xiao Ni @ 2023-06-15  8:17 UTC (permalink / raw)
  To: Yu Kuai
  Cc: yi.zhang, yangerkun, snitzer, linux-kernel, linux-raid, song,
	dm-devel, guoqing.jiang, yukuai (C),
	agk

On Thu, Jun 15, 2023 at 4:01 PM Xiao Ni <xni@redhat.com> wrote:
>
> On Thu, Jun 15, 2023 at 9:29 AM Yu Kuai <yukuai1@huaweicloud.com> wrote:
> >
> > Hi,
> >
> > On 2023/06/14 17:08, Xiao Ni wrote:
> > > On Wed, Jun 14, 2023 at 4:29 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
> > >>
> > >> Hi,
> > >>
> > >> On 2023/06/14 15:57, Xiao Ni wrote:
> > >>> On Wed, Jun 14, 2023 at 3:38 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
> > >>>>
> > >>>> Hi,
> > >>>>
> > >>>>> On 2023/06/14 15:12, Xiao Ni wrote:
> > >>>>> On Wed, Jun 14, 2023 at 10:04 AM Yu Kuai <yukuai1@huaweicloud.com> wrote:
> > >>>>>>
> > >>>>>> Hi,
> > >>>>>>
> > >>>>>> On 2023/06/14 9:48, Yu Kuai wrote:
> > >>>>>>
> > >>>>>>
> > >>>>>>>>
> > >>>>>>>> In the patch, sync_seq is incremented in md_reap_sync_thread. In
> > >>>>>>>> idle_sync_thread, if sync_seq isn't equal to mddev->sync_seq, it
> > >>>>>>>> should mean that someone has stopped the sync thread already,
> > >>>>>>>> right? Why do you say 'newly started sync thread' here?
> > >>>>>>
> > >>>>>> If someone stops the sync thread and a new sync thread is not started,
> > >>>>>> then sync_seq won't make a difference: the above wait_event() will not
> > >>>>>> wait because !test_bit(MD_RECOVERY_RUNNING, &mddev->recovery) will pass.
> > >>>>>> So 'sync_seq' only matters when the old sync thread stops and a new
> > >>>>>> sync thread starts; adding 'sync_seq' lets the waiter bypass this case.
> > >>>>>
> > >>>>> Hi
> > >>>>>
> > >>>>> If a new sync thread starts, why would sync_seq be different? sync_seq
> > >>>>> is only incremented in md_reap_sync_thread. And when a new sync
> > >>>>> request starts, it can't have been stopped again yet?
> > >>>>>
> > >>>>> Af first, the sync_seq is 0
> > >>>>>
> > >>>>> admin1
> > >>>>> echo idle > sync_action
> > >>>>> idle_sync_thread(sync_seq is 1)
> > >>>>
> > >>>> Wait, I'm confused here, how can sync_seq be 1 here? I suppose you
> > >>>> mean that a sync_thread has just finished?
> > >>>
> > >>> Hi Kuai
> > >>>
> > >>> Yes. Because idle_sync_thread needs to wait until md_reap_sync_thread
> > >>> finishes. And md_reap_sync_thread increments sync_seq. Do I understand
> > >>> your patch correctly?
> > >>
> > >> Yes, note that idle_sync_thread() will only wait if MD_RECOVERY_RUNNING
> > >> is set.
> > >>
> > >>>
> > >>>>
> > >> Then the problem is that idle_sync_thread() reads sync_seq after the old
> > >> sync_thread is done, and a new sync_thread starts before wait_event() is
> > >> called; should we wait for this new sync_thread?
> > >>
> > >> My answer here is that we should, but I'm also OK with not waiting for
> > >> this new sync_thread; I don't think this behaviour matters. The key
> > >> point is that once wait_event() is called from idle_sync_thread(), this
> > >> wait_event() should not wait for a new sync_thread...
> > >>>
> > > I think we should wait. If we don't wait for it, there is a problem:
> > > someone echoes idle to sync_action and sometimes it doesn't take
> > > effect. That's strange behaviour.
> > >>>
> > >>
> > >> Ok. I'll add a new comment to emphasize that idle_sync_thread() won't
> > >> wait for a new sync_thread that is started after wait_event().
> > >
> > > I suggest removing this function. Without this change, it's simpler
> > > and it can work well without problems. People that echo idle to
> > > sync_action need to wait until the sync action finishes. The code
> > > semantics are clear and simple.
> > >>
> > >>>>
> > >>>>> echo resync > sync_action (new sync)
> > >>>>
> > >>>> If this comes after "echo idle > sync_action", idle_sync_thread should
> > >>>> not see that MD_RECOVERY_RUNNING is set and wait_event() won't wait at all.
> > >>>
> > >>> `echo resync > sync_action` can't change sync_seq. So does 'echo idle >
> > >>> sync_action' still wait until MD_RECOVERY_RUNNING is cleared?
> > >>
> > >> This is not accurate: if `echo resync > sync_action` triggers a new
> > >> sync_thread, then sync_seq is updated when this sync_thread is done;
> > >> during this period, MD_RECOVERY_RUNNING is still set, so `echo idle >
> > >> sync_action` will wait for the sync_thread to be done.
> > >
> > > I can understand your comment, but sorry, I still can't get how
> > > sync_seq works. Could you give a specific case that explains how it
> > > works?
> >
> > Ok, the problem is that echo idle is supposed to interrupt the
> > sync_thread and stop it quickly. Now that we don't hold the mutex here,
> > we can't prevent a new sync_thread from starting. For example:
> >
> > 1) a sync_thread A is running, MD_RECOVERY_RUNNING is set;
> >
> > 2) echo idle, A will be interrupted, the mutex is not held and
> > idle_sync_thread() is waiting for MD_RECOVERY_RUNNING to be cleared.
> >
> > 3) A is interrupted, it'll clear MD_RECOVERY_RUNNING and try to wake up
> > idle_sync_thread(). However, before idle_sync_thread() is woken, A can
> > be done and a new sync_thread B can be started, and MD_RECOVERY_RUNNING
> > will be set again.
> >
> > 4) idle_sync_thread() finally wakes up. However, MD_RECOVERY_RUNNING is
> > set again and it will keep waiting. And this time B won't be interrupted.
>
> Thanks for the example. Now I can understand the purpose of it. It's a
> side effect of removing the mutex protection from idle_sync_thread.
>
> There is a problem: a new sync thread is started in md_check_recovery.
> After your patch, md_reap_sync_thread is called in md_check_recovery
> too. So it looks like they can't happen at the same time?

After thinking about it a while, there is still a possible race.

md_reap_sync_thread is called in the pers daemon (e.g. raid10d ->
md_check_recovery) and md_check_recovery returns. Before
idle_sync_thread is woken, a new sync thread can be started in
md_check_recovery again.

But it's really strange behaviour when someone echoes idle to
sync_action. It's better to add a message notifying users that they
need to echo idle to sync_action again and retry. Is there a way for
md_reap_sync_thread to wait for idle_sync_thread?

Regards
Xiao
>
> Regards
> Xiao
>
> >
> > Thanks,
> > Kuai
> >
> > --
> > dm-devel mailing list
> > dm-devel@redhat.com
> > https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [dm-devel] [PATCH -next v2 4/6] md: refactor idle/frozen_sync_thread() to fix deadlock
  2023-06-15  8:17                       ` Xiao Ni
@ 2023-06-15  9:05                         ` Yu Kuai
  2023-06-15  9:14                           ` Xiao Ni
  0 siblings, 1 reply; 37+ messages in thread
From: Yu Kuai @ 2023-06-15  9:05 UTC (permalink / raw)
  To: Xiao Ni, Yu Kuai
  Cc: yi.zhang, yangerkun, snitzer, linux-kernel, linux-raid, song,
	dm-devel, guoqing.jiang, agk, yukuai (C)

Hi,

On 2023/06/15 16:17, Xiao Ni wrote:
>> Thanks for the example. Now I can understand the purpose of it. It's a
>> side effect of removing the mutex protection from idle_sync_thread.
>>
>> There is a problem: a new sync thread is started in md_check_recovery.
>> After your patch, md_reap_sync_thread is called in md_check_recovery
>> too. So it looks like they can't happen at the same time?

Of course they can't. md_check_recovery() can only do one thing at a
time.

> 
> After thinking about it a while, there is still a possible race.
> 
> md_reap_sync_thread is called in the pers daemon (e.g. raid10d ->
> md_check_recovery) and md_check_recovery returns. Before
> idle_sync_thread is woken, a new sync thread can be started in
> md_check_recovery again.
> 
> But it's really strange behaviour when someone echoes idle to
> sync_action. It's better to add a message notifying users that they
> need to echo idle to sync_action again and retry. Is there a way for
> md_reap_sync_thread to wait for idle_sync_thread?

I don't think this is a problem. echo idle only makes sure to interrupt
the current sync_thread; there is no guarantee that a sync_thread is not
running after "echo idle" is done, with or without this patchset. Before
this patchset, a new sync thread could still start after the mutex was
released.

Users should "echo frozen" instead of "echo idle" if they really want to
prevent a new sync_thread from starting.

Thanks,
Kuai
> 
> Regards
> Xiao
>>
>> Regards
>> Xiao
>>
>>>
>>> Thanks,
>>> Kuai
>>>
>>> --
>>> dm-devel mailing list
>>> dm-devel@redhat.com
>>> https://listman.redhat.com/mailman/listinfo/dm-devel
> 
> .
> 


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [dm-devel] [PATCH -next v2 4/6] md: refactor idle/frozen_sync_thread() to fix deadlock
  2023-06-15  9:05                         ` Yu Kuai
@ 2023-06-15  9:14                           ` Xiao Ni
  0 siblings, 0 replies; 37+ messages in thread
From: Xiao Ni @ 2023-06-15  9:14 UTC (permalink / raw)
  To: Yu Kuai
  Cc: yi.zhang, yangerkun, snitzer, linux-kernel, linux-raid, song,
	dm-devel, guoqing.jiang, agk, yukuai (C)

On Thu, Jun 15, 2023 at 5:05 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>
> Hi,
>
> On 2023/06/15 16:17, Xiao Ni wrote:
> >> Thanks for the example. Now I can understand the purpose of it. It's a
> >> side effect of removing the mutex protection from idle_sync_thread.
> >>
> >> There is a problem: a new sync thread is started in md_check_recovery.
> >> After your patch, md_reap_sync_thread is called in md_check_recovery
> >> too. So it looks like they can't happen at the same time?
>
> Of course they can't. md_check_recovery() can only do one thing at a
> time.
>
> >
> > After thinking about it a while, there is still a possible race.
> >
> > md_reap_sync_thread is called in the pers daemon (e.g. raid10d ->
> > md_check_recovery) and md_check_recovery returns. Before
> > idle_sync_thread is woken, a new sync thread can be started in
> > md_check_recovery again.
> >
> > But it's really strange behaviour when someone echoes idle to
> > sync_action. It's better to add a message notifying users that they
> > need to echo idle to sync_action again and retry. Is there a way for
> > md_reap_sync_thread to wait for idle_sync_thread?
>
> I don't think this is a problem. echo idle only makes sure to interrupt
> the current sync_thread; there is no guarantee that a sync_thread is not
> running after "echo idle" is done, with or without this patchset. Before
> this patchset, a new sync thread could still start after the mutex was
> released.
>
> Users should "echo frozen" instead of "echo idle" if they really want to
> prevent a new sync_thread from starting.

Thanks for all the explanations and patience.

Regards
Xiao
>
> Thanks,
> Kuai
> >
> > Regards
> > Xiao
> >>
> >> Regards
> >> Xiao
> >>
> >>>
> >>> Thanks,
> >>> Kuai
> >>>
> >>> --
> >>> dm-devel mailing list
> >>> dm-devel@redhat.com
> >>> https://listman.redhat.com/mailman/listinfo/dm-devel
> >
> > .
> >
>


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [dm-devel] [PATCH -next v2 3/6] md: add a mutex to synchronize idle and frozen in action_store()
  2023-06-14  1:15     ` Yu Kuai
@ 2023-06-16  6:41       ` Song Liu
  0 siblings, 0 replies; 37+ messages in thread
From: Song Liu @ 2023-06-16  6:41 UTC (permalink / raw)
  To: Yu Kuai
  Cc: Xiao Ni, guoqing.jiang, agk, snitzer, dm-devel, yi.zhang,
	yangerkun, linux-kernel, linux-raid, yukuai (C)

On Tue, Jun 13, 2023 at 6:15 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>
> Hi,
>
> On 2023/06/13 22:43, Xiao Ni wrote:
> >
> > On 2023/5/29 9:20 PM, Yu Kuai wrote:
> >> From: Yu Kuai <yukuai3@huawei.com>
> >>
> >> Currently, for idle and frozen, action_store will hold 'reconfig_mutex'
> >> and call md_reap_sync_thread() to stop the sync thread; however, this
> >> will cause a deadlock (explained in the next patch). In order to fix
> >> the problem, a following patch will release 'reconfig_mutex' and wait
> >> on 'resync_wait', like md_set_readonly() and do_md_stop() do.
> >>
> >> Consider that action_store() will set/clear 'MD_RECOVERY_FROZEN'
> >> unconditionally, which might cause unexpected problems: for example,
> >> 'frozen' has just set 'MD_RECOVERY_FROZEN' and is still in progress,
> >> while 'idle' clears 'MD_RECOVERY_FROZEN' and a new sync thread is
> >> started, which might starve the in-progress 'frozen'. A mutex is added
> >> to synchronize idle and frozen from action_store().
> >>
> >> Signed-off-by: Yu Kuai <yukuai3@huawei.com>
> >> ---
> >>   drivers/md/md.c | 5 +++++
> >>   drivers/md/md.h | 3 +++
> >>   2 files changed, 8 insertions(+)
> >>
> >> diff --git a/drivers/md/md.c b/drivers/md/md.c
> >> index 23e8e7eae062..63a993b52cd7 100644
> >> --- a/drivers/md/md.c
> >> +++ b/drivers/md/md.c
> >> @@ -644,6 +644,7 @@ void mddev_init(struct mddev *mddev)
> >>       mutex_init(&mddev->open_mutex);
> >>       mutex_init(&mddev->reconfig_mutex);
> >>       mutex_init(&mddev->delete_mutex);
> >> +    mutex_init(&mddev->sync_mutex);
> >>       mutex_init(&mddev->bitmap_info.mutex);
> >>       INIT_LIST_HEAD(&mddev->disks);
> >>       INIT_LIST_HEAD(&mddev->all_mddevs);
> >> @@ -4785,14 +4786,18 @@ static void stop_sync_thread(struct mddev *mddev)
> >>   static void idle_sync_thread(struct mddev *mddev)
> >>   {
> >> +    mutex_lock(&mddev->sync_mutex);
> >>       clear_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
> >>       stop_sync_thread(mddev);
> >> +    mutex_unlock(&mddev->sync_mutex);
> >>   }
> >>   static void frozen_sync_thread(struct mddev *mddev)
> >>   {
> >> +    mutex_init(&mddev->delete_mutex);
> >
> >
> > A typo? It should be mutex_lock(&mddev->sync_mutex);
> >
>
> Yes, and thanks for spotting this; it looks like I introduced it while
> rebasing.

I fixed this one and applied the set to md-next.

Thanks,
Song

^ permalink raw reply	[flat|nested] 37+ messages in thread

end of thread, other threads:[~2023-06-16  6:42 UTC | newest]

Thread overview: 37+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-05-29 13:20 [PATCH -next v2 0/6] md: fix that MD_RECOVERY_RUNNING can be cleared while sync_thread is still running Yu Kuai
2023-05-29 13:20 ` [PATCH -next v2 1/6] Revert "md: unlock mddev before reap sync_thread in action_store" Yu Kuai
2023-06-13  6:25   ` Xiao Ni
2023-06-13 11:58     ` Yu Kuai
2023-05-29 13:20 ` [PATCH -next v2 2/6] md: refactor action_store() for 'idle' and 'frozen' Yu Kuai
2023-06-13  8:02   ` [dm-devel] " Xiao Ni
2023-06-13 12:00     ` Yu Kuai
2023-06-13 12:25       ` Xiao Ni
2023-06-13 12:44         ` Yu Kuai
2023-06-13 14:14           ` Xiao Ni
2023-05-29 13:20 ` [PATCH -next v2 3/6] md: add a mutex to synchronize idle and frozen in action_store() Yu Kuai
2023-06-13 14:43   ` [dm-devel] " Xiao Ni
2023-06-14  1:15     ` Yu Kuai
2023-06-16  6:41       ` Song Liu
2023-05-29 13:20 ` [PATCH -next v2 4/6] md: refactor idle/frozen_sync_thread() to fix deadlock Yu Kuai
2023-06-13 14:50   ` [dm-devel] " Xiao Ni
2023-06-14  1:48     ` Yu Kuai
2023-06-14  2:04       ` Yu Kuai
2023-06-14  7:12         ` Xiao Ni
2023-06-14  7:38           ` Yu Kuai
2023-06-14  7:57             ` Xiao Ni
2023-06-14  8:28               ` Yu Kuai
2023-06-14  9:08                 ` Xiao Ni
2023-06-15  1:28                   ` Yu Kuai
2023-06-15  8:01                     ` Xiao Ni
2023-06-15  8:17                       ` Xiao Ni
2023-06-15  9:05                         ` Yu Kuai
2023-06-15  9:14                           ` Xiao Ni
2023-06-14  3:47       ` Xiao Ni
2023-06-14  6:04         ` Yu Kuai
2023-06-14  6:37           ` Xiao Ni
2023-05-29 13:20 ` [PATCH -next v2 5/6] md: wake up 'resync_wait' at last in md_reap_sync_thread() Yu Kuai
2023-06-14  7:20   ` Xiao Ni
2023-05-29 13:20 ` [PATCH -next v2 6/6] md: enhance checking in md_check_recovery() Yu Kuai
2023-06-14  7:24   ` [dm-devel] " Xiao Ni
2023-06-08  2:41 ` [PATCH -next v2 0/6] md: fix that MD_RECOVERY_RUNNING can be cleared while sync_thread is still running Yu Kuai
2023-06-09  4:44   ` Song Liu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).