[PATCH -next v2 0/7] md: make rdev addition and removal independent from daemon thread

linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH -next v2 0/7] md: make rdev addition and removal independent from daemon thread
@ 2023-08-15  3:09 Yu Kuai
  2023-08-15  3:09 ` [PATCH -next v2 1/7] md: use separate work_struct for md_start_sync() Yu Kuai
                   ` (6 more replies)
  0 siblings, 7 replies; 20+ messages in thread
From: Yu Kuai @ 2023-08-15  3:09 UTC (permalink / raw)
  To: xni, song; +Cc: linux-raid, linux-kernel, yukuai3, yukuai1, yi.zhang, yangerkun

From: Yu Kuai <yukuai3@huawei.com>

Changes in v2:
 - remove patch 1 from v1 and some related patches, those patches will
 be sent later when rcu protection for rdev is removed.
 - add patch 2.

This is the third patchset to do some preparatory work to synchronize
io with array reconfiguration.

1) The first patchset refactor 'active_io', make sure that mddev_suspend()
will wait for io to be done. [1]

2) The second patchset remove 'quiesce' callback from mddev_suspend(), so
that mddev_suspend() doesn't rely on 'quiesce' callback is registered,
and can be used for all personalites; [2]

3) This patchset make array reconfiguration independent from daemon thread,
and synchronize it with io will be much easier because io may rely on
daemon thread to be done.

More patchset on the way!

Yu Kuai (7):
  md: use separate work_struct for md_start_sync()
  md: factor out a helper to choose sync direction from
    md_check_recovery()
  md: delay choosing sync direction to md_start_sync()
  md: factor out a helper rdev_removeable() from remove_and_add_spares()
  md: factor out a helper rdev_is_spare() from remove_and_add_spares()
  md: factor out a helper rdev_addable() from remove_and_add_spares()
  md: delay remove_and_add_spares() for read only array to
    md_start_sync()

 drivers/md/md.c | 257 +++++++++++++++++++++++++++++-------------------
 drivers/md/md.h |   5 +-
 2 files changed, 160 insertions(+), 102 deletions(-)

-- 
2.39.2


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH -next v2 1/7] md: use separate work_struct for md_start_sync()
  2023-08-15  3:09 [PATCH -next v2 0/7] md: make rdev addition and removal independent from daemon thread Yu Kuai
@ 2023-08-15  3:09 ` Yu Kuai
  2023-08-15  3:09 ` [PATCH -next v2 2/7] md: factor out a helper to choose sync direction from md_check_recovery() Yu Kuai
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 20+ messages in thread
From: Yu Kuai @ 2023-08-15  3:09 UTC (permalink / raw)
  To: xni, song; +Cc: linux-raid, linux-kernel, yukuai3, yukuai1, yi.zhang, yangerkun

From: Yu Kuai <yukuai3@huawei.com>

It's a little weird to borrow 'del_work' for md_start_sync(), declare
a new work_struct 'sync_work' for md_start_sync().

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 drivers/md/md.c | 10 ++++++----
 drivers/md/md.h |  5 ++++-
 2 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 5c3c19b8d509..90815be1e80f 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -631,13 +631,13 @@ void mddev_put(struct mddev *mddev)
 		 * flush_workqueue() after mddev_find will succeed in waiting
 		 * for the work to be done.
 		 */
-		INIT_WORK(&mddev->del_work, mddev_delayed_delete);
 		queue_work(md_misc_wq, &mddev->del_work);
 	}
 	spin_unlock(&all_mddevs_lock);
 }
 
 static void md_safemode_timeout(struct timer_list *t);
+static void md_start_sync(struct work_struct *ws);
 
 void mddev_init(struct mddev *mddev)
 {
@@ -662,6 +662,9 @@ void mddev_init(struct mddev *mddev)
 	mddev->resync_min = 0;
 	mddev->resync_max = MaxSector;
 	mddev->level = LEVEL_NONE;
+
+	INIT_WORK(&mddev->sync_work, md_start_sync);
+	INIT_WORK(&mddev->del_work, mddev_delayed_delete);
 }
 EXPORT_SYMBOL_GPL(mddev_init);
 
@@ -9245,7 +9248,7 @@ static int remove_and_add_spares(struct mddev *mddev,
 
 static void md_start_sync(struct work_struct *ws)
 {
-	struct mddev *mddev = container_of(ws, struct mddev, del_work);
+	struct mddev *mddev = container_of(ws, struct mddev, sync_work);
 
 	rcu_assign_pointer(mddev->sync_thread,
 			   md_register_thread(md_do_sync, mddev, "resync"));
@@ -9458,8 +9461,7 @@ void md_check_recovery(struct mddev *mddev)
 				 */
 				md_bitmap_write_all(mddev->bitmap);
 			}
-			INIT_WORK(&mddev->del_work, md_start_sync);
-			queue_work(md_misc_wq, &mddev->del_work);
+			queue_work(md_misc_wq, &mddev->sync_work);
 			goto unlock;
 		}
 	not_running:
diff --git a/drivers/md/md.h b/drivers/md/md.h
index 9bcb77bca963..64d05cb65287 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -450,7 +450,10 @@ struct mddev {
 	struct kernfs_node		*sysfs_degraded;	/*handle for 'degraded' */
 	struct kernfs_node		*sysfs_level;		/*handle for 'level' */
 
-	struct work_struct del_work;	/* used for delayed sysfs removal */
+	/* used for delayed sysfs removal */
+	struct work_struct del_work;
+	/* used for register new sync thread */
+	struct work_struct sync_work;
 
 	/* "lock" protects:
 	 *   flush_bio transition from NULL to !NULL
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH -next v2 2/7] md: factor out a helper to choose sync direction from md_check_recovery()
  2023-08-15  3:09 [PATCH -next v2 0/7] md: make rdev addition and removal independent from daemon thread Yu Kuai
  2023-08-15  3:09 ` [PATCH -next v2 1/7] md: use separate work_struct for md_start_sync() Yu Kuai
@ 2023-08-15  3:09 ` Yu Kuai
  2023-08-17  7:58   ` Mariusz Tkaczyk
  2023-08-15  3:09 ` [PATCH -next v2 3/7] md: delay choosing sync direction to md_start_sync() Yu Kuai
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 20+ messages in thread
From: Yu Kuai @ 2023-08-15  3:09 UTC (permalink / raw)
  To: xni, song; +Cc: linux-raid, linux-kernel, yukuai3, yukuai1, yi.zhang, yangerkun

From: Yu Kuai <yukuai3@huawei.com>

There are no functional changes, on the one hand make the code cleaner,
on the other hand prevent following checkpatch error in the next patch to
delay choosing sync direction to md_start_sync().

ERROR: do not use assignment in if condition
+       } else if ((spares = remove_and_add_spares(mddev, NULL))) {

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 drivers/md/md.c | 68 +++++++++++++++++++++++++++++++------------------
 1 file changed, 43 insertions(+), 25 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 90815be1e80f..4846ff6d25b0 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -9246,6 +9246,48 @@ static int remove_and_add_spares(struct mddev *mddev,
 	return spares;
 }
 
+static bool md_choose_sync_direction(struct mddev *mddev, int *spares)
+{
+	/* check reshape first */
+	if (mddev->reshape_position != MaxSector) {
+		if (mddev->pers->check_reshape == NULL ||
+		    mddev->pers->check_reshape(mddev) != 0)
+			return false;
+
+		set_bit(MD_RECOVERY_RESHAPE, &mddev->recovery);
+		clear_bit(MD_RECOVERY_RECOVER, &mddev->recovery);
+		return true;
+	}
+
+	/*
+	 * remove any failed drives, then add spares if possible. Spares are
+	 * also removed and re-added, to allow the personality to fail the
+	 * re-add.
+	 */
+	*spares = remove_and_add_spares(mddev, NULL);
+	if (*spares) {
+		clear_bit(MD_RECOVERY_SYNC, &mddev->recovery);
+		clear_bit(MD_RECOVERY_CHECK, &mddev->recovery);
+		clear_bit(MD_RECOVERY_REQUESTED, &mddev->recovery);
+		set_bit(MD_RECOVERY_RECOVER, &mddev->recovery);
+		return true;
+	}
+
+	/* check recovery */
+	if (mddev->recovery_cp < MaxSector) {
+		set_bit(MD_RECOVERY_SYNC, &mddev->recovery);
+		clear_bit(MD_RECOVERY_RECOVER, &mddev->recovery);
+		return true;
+	}
+
+	/* check resync */
+	if (test_bit(MD_RECOVERY_SYNC, &mddev->recovery))
+		return true;
+
+	/* nothing to be done */
+	return false;
+}
+
 static void md_start_sync(struct work_struct *ws)
 {
 	struct mddev *mddev = container_of(ws, struct mddev, sync_work);
@@ -9427,32 +9469,8 @@ void md_check_recovery(struct mddev *mddev)
 		if (!test_and_clear_bit(MD_RECOVERY_NEEDED, &mddev->recovery) ||
 		    test_bit(MD_RECOVERY_FROZEN, &mddev->recovery))
 			goto not_running;
-		/* no recovery is running.
-		 * remove any failed drives, then
-		 * add spares if possible.
-		 * Spares are also removed and re-added, to allow
-		 * the personality to fail the re-add.
-		 */
-
-		if (mddev->reshape_position != MaxSector) {
-			if (mddev->pers->check_reshape == NULL ||
-			    mddev->pers->check_reshape(mddev) != 0)
-				/* Cannot proceed */
-				goto not_running;
-			set_bit(MD_RECOVERY_RESHAPE, &mddev->recovery);
-			clear_bit(MD_RECOVERY_RECOVER, &mddev->recovery);
-		} else if ((spares = remove_and_add_spares(mddev, NULL))) {
-			clear_bit(MD_RECOVERY_SYNC, &mddev->recovery);
-			clear_bit(MD_RECOVERY_CHECK, &mddev->recovery);
-			clear_bit(MD_RECOVERY_REQUESTED, &mddev->recovery);
-			set_bit(MD_RECOVERY_RECOVER, &mddev->recovery);
-		} else if (mddev->recovery_cp < MaxSector) {
-			set_bit(MD_RECOVERY_SYNC, &mddev->recovery);
-			clear_bit(MD_RECOVERY_RECOVER, &mddev->recovery);
-		} else if (!test_bit(MD_RECOVERY_SYNC, &mddev->recovery))
-			/* nothing to be done ... */
+		if (!md_choose_sync_direction(mddev, &spares))
 			goto not_running;
-
 		if (mddev->pers->sync_request) {
 			if (spares) {
 				/* We are adding a device or devices to an array
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH -next v2 3/7] md: delay choosing sync direction to md_start_sync()
  2023-08-15  3:09 [PATCH -next v2 0/7] md: make rdev addition and removal independent from daemon thread Yu Kuai
  2023-08-15  3:09 ` [PATCH -next v2 1/7] md: use separate work_struct for md_start_sync() Yu Kuai
  2023-08-15  3:09 ` [PATCH -next v2 2/7] md: factor out a helper to choose sync direction from md_check_recovery() Yu Kuai
@ 2023-08-15  3:09 ` Yu Kuai
  2023-08-15  6:00   ` Yu Kuai
  2023-08-16  6:38   ` Xiao Ni
  2023-08-15  3:09 ` [PATCH -next v2 4/7] md: factor out a helper rdev_removeable() from remove_and_add_spares() Yu Kuai
                   ` (3 subsequent siblings)
  6 siblings, 2 replies; 20+ messages in thread
From: Yu Kuai @ 2023-08-15  3:09 UTC (permalink / raw)
  To: xni, song; +Cc: linux-raid, linux-kernel, yukuai3, yukuai1, yi.zhang, yangerkun

From: Yu Kuai <yukuai3@huawei.com>

Before this patch, for read-write array:

1) md_check_recover() found that something need to be done, and it'll
   try to grab 'reconfig_mutex'. The case that md_check_recover() need
   to do something:
   - array is not suspend;
   - super_block need to be updated;
   - 'MD_RECOVERY_NEEDED' or ''MD_RECOVERY_DONE' is set;
   - unusual case related to safemode;

2) if 'MD_RECOVERY_RUNNING' is not set, and 'MD_RECOVERY_NEEDED' is set,
   md_check_recover() will try to choose a sync direction, and then
   queue a work md_start_sync().

3) md_start_sync() register sync_thread;

After this patch,

1) is the same;
2) if 'MD_RECOVERY_RUNNING' is not set, and 'MD_RECOVERY_NEEDED' is set,
   queue a work md_start_sync() directly;
3) md_start_sync() will try to choose a sync direction, and then
   register sync_thread();

Because 'MD_RECOVERY_RUNNING' is cleared when sync_thread is done, 2)
and 3) is always ran in serial and they can never concurrent, this
change should not introduce any behavior change for now.

Also fix a problem that md_start_sync() can clear 'MD_RECOVERY_RUNNING'
without protection in error path, which might affect the logical in
md_check_recovery().

The advantage to change this is that array reconfiguration is
independent from daemon now, and it'll be much easier to synchronize it
with io, consider that io may rely on daemon thread to be done.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 drivers/md/md.c | 70 ++++++++++++++++++++++++++-----------------------
 1 file changed, 37 insertions(+), 33 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 4846ff6d25b0..03615b0e9fe1 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -9291,6 +9291,22 @@ static bool md_choose_sync_direction(struct mddev *mddev, int *spares)
 static void md_start_sync(struct work_struct *ws)
 {
 	struct mddev *mddev = container_of(ws, struct mddev, sync_work);
+	int spares = 0;
+
+	mddev_lock_nointr(mddev);
+
+	if (!md_choose_sync_direction(mddev, &spares))
+		goto not_running;
+
+	if (!mddev->pers->sync_request)
+		goto not_running;
+
+	/*
+	 * We are adding a device or devices to an array which has the bitmap
+	 * stored on all devices. So make sure all bitmap pages get written.
+	 */
+	if (spares)
+		md_bitmap_write_all(mddev->bitmap);
 
 	rcu_assign_pointer(mddev->sync_thread,
 			   md_register_thread(md_do_sync, mddev, "resync"));
@@ -9298,20 +9314,27 @@ static void md_start_sync(struct work_struct *ws)
 		pr_warn("%s: could not start resync thread...\n",
 			mdname(mddev));
 		/* leave the spares where they are, it shouldn't hurt */
-		clear_bit(MD_RECOVERY_SYNC, &mddev->recovery);
-		clear_bit(MD_RECOVERY_RESHAPE, &mddev->recovery);
-		clear_bit(MD_RECOVERY_REQUESTED, &mddev->recovery);
-		clear_bit(MD_RECOVERY_CHECK, &mddev->recovery);
-		clear_bit(MD_RECOVERY_RUNNING, &mddev->recovery);
-		wake_up(&resync_wait);
-		if (test_and_clear_bit(MD_RECOVERY_RECOVER,
-				       &mddev->recovery))
-			if (mddev->sysfs_action)
-				sysfs_notify_dirent_safe(mddev->sysfs_action);
-	} else
-		md_wakeup_thread(mddev->sync_thread);
+		goto not_running;
+	}
+
+	mddev_unlock(mddev);
+	md_wakeup_thread(mddev->sync_thread);
 	sysfs_notify_dirent_safe(mddev->sysfs_action);
 	md_new_event();
+	return;
+
+not_running:
+	clear_bit(MD_RECOVERY_SYNC, &mddev->recovery);
+	clear_bit(MD_RECOVERY_RESHAPE, &mddev->recovery);
+	clear_bit(MD_RECOVERY_REQUESTED, &mddev->recovery);
+	clear_bit(MD_RECOVERY_CHECK, &mddev->recovery);
+	clear_bit(MD_RECOVERY_RUNNING, &mddev->recovery);
+	mddev_unlock(mddev);
+
+	wake_up(&resync_wait);
+	if (test_and_clear_bit(MD_RECOVERY_RECOVER, &mddev->recovery) &&
+	    mddev->sysfs_action)
+		sysfs_notify_dirent_safe(mddev->sysfs_action);
 }
 
 /*
@@ -9379,7 +9402,6 @@ void md_check_recovery(struct mddev *mddev)
 		return;
 
 	if (mddev_trylock(mddev)) {
-		int spares = 0;
 		bool try_set_sync = mddev->safemode != 0;
 
 		if (!mddev->external && mddev->safemode == 1)
@@ -9467,29 +9489,11 @@ void md_check_recovery(struct mddev *mddev)
 		clear_bit(MD_RECOVERY_DONE, &mddev->recovery);
 
 		if (!test_and_clear_bit(MD_RECOVERY_NEEDED, &mddev->recovery) ||
-		    test_bit(MD_RECOVERY_FROZEN, &mddev->recovery))
-			goto not_running;
-		if (!md_choose_sync_direction(mddev, &spares))
-			goto not_running;
-		if (mddev->pers->sync_request) {
-			if (spares) {
-				/* We are adding a device or devices to an array
-				 * which has the bitmap stored on all devices.
-				 * So make sure all bitmap pages get written
-				 */
-				md_bitmap_write_all(mddev->bitmap);
-			}
+		    test_bit(MD_RECOVERY_FROZEN, &mddev->recovery)) {
 			queue_work(md_misc_wq, &mddev->sync_work);
-			goto unlock;
-		}
-	not_running:
-		if (!mddev->sync_thread) {
+		} else {
 			clear_bit(MD_RECOVERY_RUNNING, &mddev->recovery);
 			wake_up(&resync_wait);
-			if (test_and_clear_bit(MD_RECOVERY_RECOVER,
-					       &mddev->recovery))
-				if (mddev->sysfs_action)
-					sysfs_notify_dirent_safe(mddev->sysfs_action);
 		}
 	unlock:
 		wake_up(&mddev->sb_wait);
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH -next v2 4/7] md: factor out a helper rdev_removeable() from remove_and_add_spares()
  2023-08-15  3:09 [PATCH -next v2 0/7] md: make rdev addition and removal independent from daemon thread Yu Kuai
                   ` (2 preceding siblings ...)
  2023-08-15  3:09 ` [PATCH -next v2 3/7] md: delay choosing sync direction to md_start_sync() Yu Kuai
@ 2023-08-15  3:09 ` Yu Kuai
  2023-08-15  3:09 ` [PATCH -next v2 5/7] md: factor out a helper rdev_is_spare() " Yu Kuai
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 20+ messages in thread
From: Yu Kuai @ 2023-08-15  3:09 UTC (permalink / raw)
  To: xni, song; +Cc: linux-raid, linux-kernel, yukuai3, yukuai1, yi.zhang, yangerkun

From: Yu Kuai <yukuai3@huawei.com>

There are no functional changes, just to make the code simpler and
prepare to delay remove_and_add_spares() to md_start_sync().

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 drivers/md/md.c | 33 +++++++++++++++++++--------------
 1 file changed, 19 insertions(+), 14 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 03615b0e9fe1..ea091eef23d1 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -9153,6 +9153,22 @@ void md_do_sync(struct md_thread *thread)
 }
 EXPORT_SYMBOL_GPL(md_do_sync);
 
+static bool rdev_removeable(struct md_rdev *rdev)
+{
+	if (rdev->raid_disk < 0 || test_bit(Blocked, &rdev->flags) ||
+	    atomic_read(&rdev->nr_pending))
+		return false;
+
+	if (test_bit(RemoveSynchronized, &rdev->flags))
+		return true;
+
+	if (test_bit(In_sync, &rdev->flags) ||
+	    test_bit(Journal, &rdev->flags))
+		return false;
+
+	return true;
+}
+
 static int remove_and_add_spares(struct mddev *mddev,
 				 struct md_rdev *this)
 {
@@ -9166,11 +9182,7 @@ static int remove_and_add_spares(struct mddev *mddev,
 		return 0;
 
 	rdev_for_each(rdev, mddev) {
-		if ((this == NULL || rdev == this) &&
-		    rdev->raid_disk >= 0 &&
-		    !test_bit(Blocked, &rdev->flags) &&
-		    test_bit(Faulty, &rdev->flags) &&
-		    atomic_read(&rdev->nr_pending)==0) {
+		if ((this == NULL || rdev == this) && rdev_removeable(rdev)) {
 			/* Faulty non-Blocked devices with nr_pending == 0
 			 * never get nr_pending incremented,
 			 * never get Faulty cleared, and never get Blocked set.
@@ -9185,19 +9197,12 @@ static int remove_and_add_spares(struct mddev *mddev,
 		synchronize_rcu();
 	rdev_for_each(rdev, mddev) {
 		if ((this == NULL || rdev == this) &&
-		    rdev->raid_disk >= 0 &&
-		    !test_bit(Blocked, &rdev->flags) &&
-		    ((test_bit(RemoveSynchronized, &rdev->flags) ||
-		     (!test_bit(In_sync, &rdev->flags) &&
-		      !test_bit(Journal, &rdev->flags))) &&
-		    atomic_read(&rdev->nr_pending)==0)) {
-			if (mddev->pers->hot_remove_disk(
-				    mddev, rdev) == 0) {
+		    rdev_removeable(rdev) &&
+		    mddev->pers->hot_remove_disk(mddev, rdev) == 0) {
 				sysfs_unlink_rdev(mddev, rdev);
 				rdev->saved_raid_disk = rdev->raid_disk;
 				rdev->raid_disk = -1;
 				removed++;
-			}
 		}
 		if (remove_some && test_bit(RemoveSynchronized, &rdev->flags))
 			clear_bit(RemoveSynchronized, &rdev->flags);
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH -next v2 5/7] md: factor out a helper rdev_is_spare() from remove_and_add_spares()
  2023-08-15  3:09 [PATCH -next v2 0/7] md: make rdev addition and removal independent from daemon thread Yu Kuai
                   ` (3 preceding siblings ...)
  2023-08-15  3:09 ` [PATCH -next v2 4/7] md: factor out a helper rdev_removeable() from remove_and_add_spares() Yu Kuai
@ 2023-08-15  3:09 ` Yu Kuai
  2023-08-15  3:09 ` [PATCH -next v2 6/7] md: factor out a helper rdev_addable() " Yu Kuai
  2023-08-15  3:09 ` [PATCH -next v2 7/7] md: delay remove_and_add_spares() for read only array to md_start_sync() Yu Kuai
  6 siblings, 0 replies; 20+ messages in thread
From: Yu Kuai @ 2023-08-15  3:09 UTC (permalink / raw)
  To: xni, song; +Cc: linux-raid, linux-kernel, yukuai3, yukuai1, yi.zhang, yangerkun

From: Yu Kuai <yukuai3@huawei.com>

There are no functional changes, just to make the code simpler and
prepare to delay remove_and_add_spares() to md_start_sync().

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 drivers/md/md.c | 15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index ea091eef23d1..6baaa4d314b3 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -9169,6 +9169,14 @@ static bool rdev_removeable(struct md_rdev *rdev)
 	return true;
 }
 
+static bool rdev_is_spare(struct md_rdev *rdev)
+{
+	return !test_bit(Candidate, &rdev->flags) && rdev->raid_disk >= 0 &&
+	       !test_bit(In_sync, &rdev->flags) &&
+	       !test_bit(Journal, &rdev->flags) &&
+	       !test_bit(Faulty, &rdev->flags);
+}
+
 static int remove_and_add_spares(struct mddev *mddev,
 				 struct md_rdev *this)
 {
@@ -9217,13 +9225,10 @@ static int remove_and_add_spares(struct mddev *mddev,
 	rdev_for_each(rdev, mddev) {
 		if (this && this != rdev)
 			continue;
+		if (rdev_is_spare(rdev))
+			spares++;
 		if (test_bit(Candidate, &rdev->flags))
 			continue;
-		if (rdev->raid_disk >= 0 &&
-		    !test_bit(In_sync, &rdev->flags) &&
-		    !test_bit(Journal, &rdev->flags) &&
-		    !test_bit(Faulty, &rdev->flags))
-			spares++;
 		if (rdev->raid_disk >= 0)
 			continue;
 		if (test_bit(Faulty, &rdev->flags))
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH -next v2 6/7] md: factor out a helper rdev_addable() from remove_and_add_spares()
  2023-08-15  3:09 [PATCH -next v2 0/7] md: make rdev addition and removal independent from daemon thread Yu Kuai
                   ` (4 preceding siblings ...)
  2023-08-15  3:09 ` [PATCH -next v2 5/7] md: factor out a helper rdev_is_spare() " Yu Kuai
@ 2023-08-15  3:09 ` Yu Kuai
  2023-08-15  3:09 ` [PATCH -next v2 7/7] md: delay remove_and_add_spares() for read only array to md_start_sync() Yu Kuai
  6 siblings, 0 replies; 20+ messages in thread
From: Yu Kuai @ 2023-08-15  3:09 UTC (permalink / raw)
  To: xni, song; +Cc: linux-raid, linux-kernel, yukuai3, yukuai1, yi.zhang, yangerkun

From: Yu Kuai <yukuai3@huawei.com>

There are no functional changes, just to make the code simpler and
prepare to delay remove_and_add_spares() to md_start_sync().

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 drivers/md/md.c | 28 ++++++++++++++++------------
 1 file changed, 16 insertions(+), 12 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 6baaa4d314b3..d26d2c35f9af 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -9177,6 +9177,20 @@ static bool rdev_is_spare(struct md_rdev *rdev)
 	       !test_bit(Faulty, &rdev->flags);
 }
 
+static bool rdev_addable(struct md_rdev *rdev)
+{
+	if (test_bit(Candidate, &rdev->flags) || rdev->raid_disk >= 0 ||
+	    test_bit(Faulty, &rdev->flags))
+		return false;
+
+	if (!test_bit(Journal, &rdev->flags) && !md_is_rdwr(rdev->mddev) &&
+	    !(rdev->saved_raid_disk >= 0 &&
+	      !test_bit(Bitmap_sync, &rdev->flags)))
+		return false;
+
+	return true;
+}
+
 static int remove_and_add_spares(struct mddev *mddev,
 				 struct md_rdev *this)
 {
@@ -9227,20 +9241,10 @@ static int remove_and_add_spares(struct mddev *mddev,
 			continue;
 		if (rdev_is_spare(rdev))
 			spares++;
-		if (test_bit(Candidate, &rdev->flags))
+		if (!rdev_addable(rdev))
 			continue;
-		if (rdev->raid_disk >= 0)
-			continue;
-		if (test_bit(Faulty, &rdev->flags))
-			continue;
-		if (!test_bit(Journal, &rdev->flags)) {
-			if (!md_is_rdwr(mddev) &&
-			    !(rdev->saved_raid_disk >= 0 &&
-			      !test_bit(Bitmap_sync, &rdev->flags)))
-				continue;
-
+		if (!test_bit(Journal, &rdev->flags))
 			rdev->recovery_offset = 0;
-		}
 		if (mddev->pers->hot_add_disk(mddev, rdev) == 0) {
 			/* failure here is OK */
 			sysfs_link_rdev(mddev, rdev);
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH -next v2 7/7] md: delay remove_and_add_spares() for read only array to md_start_sync()
  2023-08-15  3:09 [PATCH -next v2 0/7] md: make rdev addition and removal independent from daemon thread Yu Kuai
                   ` (5 preceding siblings ...)
  2023-08-15  3:09 ` [PATCH -next v2 6/7] md: factor out a helper rdev_addable() " Yu Kuai
@ 2023-08-15  3:09 ` Yu Kuai
  2023-08-16  7:18   ` Xiao Ni
  6 siblings, 1 reply; 20+ messages in thread
From: Yu Kuai @ 2023-08-15  3:09 UTC (permalink / raw)
  To: xni, song; +Cc: linux-raid, linux-kernel, yukuai3, yukuai1, yi.zhang, yangerkun

From: Yu Kuai <yukuai3@huawei.com>

Before this patch, for read-only array:

md_check_recovery() check that 'MD_RECOVERY_NEEDED' is set, then it will
call remove_and_add_spares() directly to try to remove and add rdevs
from array.

After this patch:

1) md_check_recovery() check that 'MD_RECOVERY_NEEDED' is set, and the
   worker 'sync_work' is not pending, and there are rdevs can be added
   or removed, then it will queue new work md_start_sync();
2) md_start_sync() will call remove_and_add_spares() and exist;

This change make sure that array reconfiguration is independent from
daemon, and it'll be much easier to synchronize it with io, consier
that io may rely on daemon thread to be done.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 drivers/md/md.c | 37 +++++++++++++++++++++++++++----------
 1 file changed, 27 insertions(+), 10 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index d26d2c35f9af..74d529479fcf 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -9191,6 +9191,16 @@ static bool rdev_addable(struct md_rdev *rdev)
 	return true;
 }
 
+static bool md_spares_need_change(struct mddev *mddev)
+{
+	struct md_rdev *rdev;
+
+	rdev_for_each(rdev, mddev)
+		if (rdev_removeable(rdev) || rdev_addable(rdev))
+			return true;
+	return false;
+}
+
 static int remove_and_add_spares(struct mddev *mddev,
 				 struct md_rdev *this)
 {
@@ -9309,6 +9319,12 @@ static void md_start_sync(struct work_struct *ws)
 
 	mddev_lock_nointr(mddev);
 
+	if (!md_is_rdwr(mddev)) {
+		remove_and_add_spares(mddev, NULL);
+		mddev_unlock(mddev);
+		return;
+	}
+
 	if (!md_choose_sync_direction(mddev, &spares))
 		goto not_running;
 
@@ -9403,7 +9419,8 @@ void md_check_recovery(struct mddev *mddev)
 	}
 
 	if (!md_is_rdwr(mddev) &&
-	    !test_bit(MD_RECOVERY_NEEDED, &mddev->recovery))
+	    (!test_bit(MD_RECOVERY_NEEDED, &mddev->recovery) ||
+	     work_pending(&mddev->sync_work)))
 		return;
 	if ( ! (
 		(mddev->sb_flags & ~ (1<<MD_SB_CHANGE_PENDING)) ||
@@ -9431,15 +9448,8 @@ void md_check_recovery(struct mddev *mddev)
 				 */
 				rdev_for_each(rdev, mddev)
 					clear_bit(Blocked, &rdev->flags);
-			/* On a read-only array we can:
-			 * - remove failed devices
-			 * - add already-in_sync devices if the array itself
-			 *   is in-sync.
-			 * As we only add devices that are already in-sync,
-			 * we can activate the spares immediately.
-			 */
-			remove_and_add_spares(mddev, NULL);
-			/* There is no thread, but we need to call
+			/*
+			 * There is no thread, but we need to call
 			 * ->spare_active and clear saved_raid_disk
 			 */
 			set_bit(MD_RECOVERY_INTR, &mddev->recovery);
@@ -9447,6 +9457,13 @@ void md_check_recovery(struct mddev *mddev)
 			clear_bit(MD_RECOVERY_RECOVER, &mddev->recovery);
 			clear_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
 			clear_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags);
+
+			/*
+			 * Let md_start_sync() to remove and add rdevs to the
+			 * array.
+			 */
+			if (md_spares_need_change(mddev))
+				queue_work(md_misc_wq, &mddev->sync_work);
 			goto unlock;
 		}
 
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH -next v2 3/7] md: delay choosing sync direction to md_start_sync()
  2023-08-15  3:09 ` [PATCH -next v2 3/7] md: delay choosing sync direction to md_start_sync() Yu Kuai
@ 2023-08-15  6:00   ` Yu Kuai
  2023-08-15 15:54     ` Song Liu
  2023-08-16  6:38   ` Xiao Ni
  1 sibling, 1 reply; 20+ messages in thread
From: Yu Kuai @ 2023-08-15  6:00 UTC (permalink / raw)
  To: Yu Kuai, xni, song
  Cc: linux-raid, linux-kernel, yi.zhang, yangerkun, yukuai (C)

Hi,

在 2023/08/15 11:09, Yu Kuai 写道:
> From: Yu Kuai <yukuai3@huawei.com>
> 
> Before this patch, for read-write array:
> 
> 1) md_check_recover() found that something need to be done, and it'll
>     try to grab 'reconfig_mutex'. The case that md_check_recover() need
>     to do something:
>     - array is not suspend;
>     - super_block need to be updated;
>     - 'MD_RECOVERY_NEEDED' or ''MD_RECOVERY_DONE' is set;
>     - unusual case related to safemode;
> 
> 2) if 'MD_RECOVERY_RUNNING' is not set, and 'MD_RECOVERY_NEEDED' is set,
>     md_check_recover() will try to choose a sync direction, and then
>     queue a work md_start_sync().
> 
> 3) md_start_sync() register sync_thread;
> 
> After this patch,
> 
> 1) is the same;
> 2) if 'MD_RECOVERY_RUNNING' is not set, and 'MD_RECOVERY_NEEDED' is set,
>     queue a work md_start_sync() directly;
> 3) md_start_sync() will try to choose a sync direction, and then
>     register sync_thread();
> 
> Because 'MD_RECOVERY_RUNNING' is cleared when sync_thread is done, 2)
> and 3) is always ran in serial and they can never concurrent, this
> change should not introduce any behavior change for now.
> 
> Also fix a problem that md_start_sync() can clear 'MD_RECOVERY_RUNNING'
> without protection in error path, which might affect the logical in
> md_check_recovery().
> 
> The advantage to change this is that array reconfiguration is
> independent from daemon now, and it'll be much easier to synchronize it
> with io, consider that io may rely on daemon thread to be done.
> 
> Signed-off-by: Yu Kuai <yukuai3@huawei.com>
> ---
>   drivers/md/md.c | 70 ++++++++++++++++++++++++++-----------------------
>   1 file changed, 37 insertions(+), 33 deletions(-)
> 
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index 4846ff6d25b0..03615b0e9fe1 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -9291,6 +9291,22 @@ static bool md_choose_sync_direction(struct mddev *mddev, int *spares)
>   static void md_start_sync(struct work_struct *ws)
>   {
>   	struct mddev *mddev = container_of(ws, struct mddev, sync_work);
> +	int spares = 0;
> +
> +	mddev_lock_nointr(mddev);
> +
> +	if (!md_choose_sync_direction(mddev, &spares))
> +		goto not_running;
> +
> +	if (!mddev->pers->sync_request)
> +		goto not_running;
> +
> +	/*
> +	 * We are adding a device or devices to an array which has the bitmap
> +	 * stored on all devices. So make sure all bitmap pages get written.
> +	 */
> +	if (spares)
> +		md_bitmap_write_all(mddev->bitmap);
>   
>   	rcu_assign_pointer(mddev->sync_thread,
>   			   md_register_thread(md_do_sync, mddev, "resync"));
> @@ -9298,20 +9314,27 @@ static void md_start_sync(struct work_struct *ws)
>   		pr_warn("%s: could not start resync thread...\n",
>   			mdname(mddev));
>   		/* leave the spares where they are, it shouldn't hurt */
> -		clear_bit(MD_RECOVERY_SYNC, &mddev->recovery);
> -		clear_bit(MD_RECOVERY_RESHAPE, &mddev->recovery);
> -		clear_bit(MD_RECOVERY_REQUESTED, &mddev->recovery);
> -		clear_bit(MD_RECOVERY_CHECK, &mddev->recovery);
> -		clear_bit(MD_RECOVERY_RUNNING, &mddev->recovery);
> -		wake_up(&resync_wait);
> -		if (test_and_clear_bit(MD_RECOVERY_RECOVER,
> -				       &mddev->recovery))
> -			if (mddev->sysfs_action)
> -				sysfs_notify_dirent_safe(mddev->sysfs_action);
> -	} else
> -		md_wakeup_thread(mddev->sync_thread);
> +		goto not_running;
> +	}
> +
> +	mddev_unlock(mddev);
> +	md_wakeup_thread(mddev->sync_thread);
>   	sysfs_notify_dirent_safe(mddev->sysfs_action);
>   	md_new_event();
> +	return;
> +
> +not_running:
> +	clear_bit(MD_RECOVERY_SYNC, &mddev->recovery);
> +	clear_bit(MD_RECOVERY_RESHAPE, &mddev->recovery);
> +	clear_bit(MD_RECOVERY_REQUESTED, &mddev->recovery);
> +	clear_bit(MD_RECOVERY_CHECK, &mddev->recovery);
> +	clear_bit(MD_RECOVERY_RUNNING, &mddev->recovery);
> +	mddev_unlock(mddev);
> +
> +	wake_up(&resync_wait);
> +	if (test_and_clear_bit(MD_RECOVERY_RECOVER, &mddev->recovery) &&
> +	    mddev->sysfs_action)
> +		sysfs_notify_dirent_safe(mddev->sysfs_action);
>   }
>   
>   /*
> @@ -9379,7 +9402,6 @@ void md_check_recovery(struct mddev *mddev)
>   		return;
>   
>   	if (mddev_trylock(mddev)) {
> -		int spares = 0;
>   		bool try_set_sync = mddev->safemode != 0;
>   
>   		if (!mddev->external && mddev->safemode == 1)
> @@ -9467,29 +9489,11 @@ void md_check_recovery(struct mddev *mddev)
>   		clear_bit(MD_RECOVERY_DONE, &mddev->recovery);
>   
>   		if (!test_and_clear_bit(MD_RECOVERY_NEEDED, &mddev->recovery) ||
> -		    test_bit(MD_RECOVERY_FROZEN, &mddev->recovery))
> -			goto not_running;
> -		if (!md_choose_sync_direction(mddev, &spares))
> -			goto not_running;
> -		if (mddev->pers->sync_request) {
> -			if (spares) {
> -				/* We are adding a device or devices to an array
> -				 * which has the bitmap stored on all devices.
> -				 * So make sure all bitmap pages get written
> -				 */
> -				md_bitmap_write_all(mddev->bitmap);
> -			}
> +		    test_bit(MD_RECOVERY_FROZEN, &mddev->recovery)) {

Sorry that I made a mistake here while rebasing v2, here should be

!test_bit(MD_RECOVERY_FROZEN, &mddev->recovery)

With this fixed, there are no new regression for mdadm tests using loop
devicein my VM.

Thanks,
Kuai
>   			queue_work(md_misc_wq, &mddev->sync_work);
> -			goto unlock;
> -		}
> -	not_running:
> -		if (!mddev->sync_thread) {
> +		} else {
>   			clear_bit(MD_RECOVERY_RUNNING, &mddev->recovery);
>   			wake_up(&resync_wait);
> -			if (test_and_clear_bit(MD_RECOVERY_RECOVER,
> -					       &mddev->recovery))
> -				if (mddev->sysfs_action)
> -					sysfs_notify_dirent_safe(mddev->sysfs_action);
>   		}
>   	unlock:
>   		wake_up(&mddev->sb_wait);
> 


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH -next v2 3/7] md: delay choosing sync direction to md_start_sync()
  2023-08-15  6:00   ` Yu Kuai
@ 2023-08-15 15:54     ` Song Liu
  2023-08-16  1:07       ` Yu Kuai
  0 siblings, 1 reply; 20+ messages in thread
From: Song Liu @ 2023-08-15 15:54 UTC (permalink / raw)
  To: Yu Kuai; +Cc: xni, linux-raid, linux-kernel, yi.zhang, yangerkun, yukuai (C)

On Tue, Aug 15, 2023 at 2:00 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
[...]
> > +
> > +not_running:
> > +     clear_bit(MD_RECOVERY_SYNC, &mddev->recovery);
> > +     clear_bit(MD_RECOVERY_RESHAPE, &mddev->recovery);
> > +     clear_bit(MD_RECOVERY_REQUESTED, &mddev->recovery);
> > +     clear_bit(MD_RECOVERY_CHECK, &mddev->recovery);
> > +     clear_bit(MD_RECOVERY_RUNNING, &mddev->recovery);
> > +     mddev_unlock(mddev);
> > +
> > +     wake_up(&resync_wait);
> > +     if (test_and_clear_bit(MD_RECOVERY_RECOVER, &mddev->recovery) &&
> > +         mddev->sysfs_action)
> > +             sysfs_notify_dirent_safe(mddev->sysfs_action);
> >   }
> >
> >   /*
> > @@ -9379,7 +9402,6 @@ void md_check_recovery(struct mddev *mddev)
> >               return;
> >
> >       if (mddev_trylock(mddev)) {
> > -             int spares = 0;
> >               bool try_set_sync = mddev->safemode != 0;
> >
> >               if (!mddev->external && mddev->safemode == 1)
> > @@ -9467,29 +9489,11 @@ void md_check_recovery(struct mddev *mddev)
> >               clear_bit(MD_RECOVERY_DONE, &mddev->recovery);
> >
> >               if (!test_and_clear_bit(MD_RECOVERY_NEEDED, &mddev->recovery) ||
> > -                 test_bit(MD_RECOVERY_FROZEN, &mddev->recovery))
> > -                     goto not_running;
> > -             if (!md_choose_sync_direction(mddev, &spares))
> > -                     goto not_running;
> > -             if (mddev->pers->sync_request) {
> > -                     if (spares) {
> > -                             /* We are adding a device or devices to an array
> > -                              * which has the bitmap stored on all devices.
> > -                              * So make sure all bitmap pages get written
> > -                              */
> > -                             md_bitmap_write_all(mddev->bitmap);
> > -                     }
> > +                 test_bit(MD_RECOVERY_FROZEN, &mddev->recovery)) {
>
> Sorry that I made a mistake here while rebasing v2, here should be
>
> !test_bit(MD_RECOVERY_FROZEN, &mddev->recovery)
>
> With this fixed, there are no new regression for mdadm tests using loop
> devicein my VM.

                if (!test_and_clear_bit(MD_RECOVERY_NEEDED, &mddev->recovery) ||
                    !test_bit(MD_RECOVERY_FROZEN, &mddev->recovery)) {
                        queue_work(md_misc_wq, &mddev->sync_work);
                } else {

This doesn't look right. Should we do

                if (test_and_clear_bit(MD_RECOVERY_NEEDED, &mddev->recovery) &&
                    !test_bit(MD_RECOVERY_FROZEN, &mddev->recovery)) {
                        queue_work(md_misc_wq, &mddev->sync_work);
                } else {

instead?

Thanks,
Song

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH -next v2 3/7] md: delay choosing sync direction to md_start_sync()
  2023-08-15 15:54     ` Song Liu
@ 2023-08-16  1:07       ` Yu Kuai
  2023-08-17 21:53         ` Song Liu
  0 siblings, 1 reply; 20+ messages in thread
From: Yu Kuai @ 2023-08-16  1:07 UTC (permalink / raw)
  To: Song Liu, Yu Kuai
  Cc: xni, linux-raid, linux-kernel, yi.zhang, yangerkun, yukuai (C)

Hi,

在 2023/08/15 23:54, Song Liu 写道:
> On Tue, Aug 15, 2023 at 2:00 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
> [...]
>>> +
>>> +not_running:
>>> +     clear_bit(MD_RECOVERY_SYNC, &mddev->recovery);
>>> +     clear_bit(MD_RECOVERY_RESHAPE, &mddev->recovery);
>>> +     clear_bit(MD_RECOVERY_REQUESTED, &mddev->recovery);
>>> +     clear_bit(MD_RECOVERY_CHECK, &mddev->recovery);
>>> +     clear_bit(MD_RECOVERY_RUNNING, &mddev->recovery);
>>> +     mddev_unlock(mddev);
>>> +
>>> +     wake_up(&resync_wait);
>>> +     if (test_and_clear_bit(MD_RECOVERY_RECOVER, &mddev->recovery) &&
>>> +         mddev->sysfs_action)
>>> +             sysfs_notify_dirent_safe(mddev->sysfs_action);
>>>    }
>>>
>>>    /*
>>> @@ -9379,7 +9402,6 @@ void md_check_recovery(struct mddev *mddev)
>>>                return;
>>>
>>>        if (mddev_trylock(mddev)) {
>>> -             int spares = 0;
>>>                bool try_set_sync = mddev->safemode != 0;
>>>
>>>                if (!mddev->external && mddev->safemode == 1)
>>> @@ -9467,29 +9489,11 @@ void md_check_recovery(struct mddev *mddev)
>>>                clear_bit(MD_RECOVERY_DONE, &mddev->recovery);
>>>
>>>                if (!test_and_clear_bit(MD_RECOVERY_NEEDED, &mddev->recovery) ||
>>> -                 test_bit(MD_RECOVERY_FROZEN, &mddev->recovery))
>>> -                     goto not_running;
>>> -             if (!md_choose_sync_direction(mddev, &spares))
>>> -                     goto not_running;
>>> -             if (mddev->pers->sync_request) {
>>> -                     if (spares) {
>>> -                             /* We are adding a device or devices to an array
>>> -                              * which has the bitmap stored on all devices.
>>> -                              * So make sure all bitmap pages get written
>>> -                              */
>>> -                             md_bitmap_write_all(mddev->bitmap);
>>> -                     }
>>> +                 test_bit(MD_RECOVERY_FROZEN, &mddev->recovery)) {
>>
>> Sorry that I made a mistake here while rebasing v2, here should be
>>
>> !test_bit(MD_RECOVERY_FROZEN, &mddev->recovery)
>>
>> With this fixed, there are no new regression for mdadm tests using loop
>> devicein my VM.
> 
>                  if (!test_and_clear_bit(MD_RECOVERY_NEEDED, &mddev->recovery) ||
>                      !test_bit(MD_RECOVERY_FROZEN, &mddev->recovery)) {
>                          queue_work(md_misc_wq, &mddev->sync_work);
>                  } else {
> 
> This doesn't look right. Should we do
> 
>                  if (test_and_clear_bit(MD_RECOVERY_NEEDED, &mddev->recovery) &&
>                      !test_bit(MD_RECOVERY_FROZEN, &mddev->recovery)) {
>                          queue_work(md_misc_wq, &mddev->sync_work);
>                  } else {
> 
> instead?
> 

Yes you're right, this is exactly what I did in v1, sorry that I keep
making mistake while rebasing.

Thanks,
Kuai

> Thanks,
> Song
> .
> 


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH -next v2 3/7] md: delay choosing sync direction to md_start_sync()
  2023-08-15  3:09 ` [PATCH -next v2 3/7] md: delay choosing sync direction to md_start_sync() Yu Kuai
  2023-08-15  6:00   ` Yu Kuai
@ 2023-08-16  6:38   ` Xiao Ni
  2023-08-20  2:04     ` Yu Kuai
  1 sibling, 1 reply; 20+ messages in thread
From: Xiao Ni @ 2023-08-16  6:38 UTC (permalink / raw)
  To: Yu Kuai; +Cc: song, linux-raid, linux-kernel, yukuai3, yi.zhang, yangerkun

On Tue, Aug 15, 2023 at 11:13 AM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>
> From: Yu Kuai <yukuai3@huawei.com>
>
> Before this patch, for read-write array:
>
> 1) md_check_recover() found that something need to be done, and it'll
>    try to grab 'reconfig_mutex'. The case that md_check_recover() need
>    to do something:
>    - array is not suspend;
>    - super_block need to be updated;
>    - 'MD_RECOVERY_NEEDED' or ''MD_RECOVERY_DONE' is set;
>    - unusual case related to safemode;
>
> 2) if 'MD_RECOVERY_RUNNING' is not set, and 'MD_RECOVERY_NEEDED' is set,
>    md_check_recover() will try to choose a sync direction, and then
>    queue a work md_start_sync().
>
> 3) md_start_sync() register sync_thread;
>
> After this patch,
>
> 1) is the same;
> 2) if 'MD_RECOVERY_RUNNING' is not set, and 'MD_RECOVERY_NEEDED' is set,
>    queue a work md_start_sync() directly;
> 3) md_start_sync() will try to choose a sync direction, and then
>    register sync_thread();
>
> Because 'MD_RECOVERY_RUNNING' is cleared when sync_thread is done, 2)
> and 3) is always ran in serial and they can never concurrent, this
> change should not introduce any behavior change for now.
>
> Also fix a problem that md_start_sync() can clear 'MD_RECOVERY_RUNNING'
> without protection in error path, which might affect the logical in
> md_check_recovery().
>
> The advantage to change this is that array reconfiguration is
> independent from daemon now, and it'll be much easier to synchronize it
> with io, consider that io may rely on daemon thread to be done.

Hi Kuai

What's the meaning of "array reconfiguration" here? "mdadm -f/-r/-a"
something like this, right?. Because before and after this patch, only
one sync thread can be running, so If we don't do this change, are
there bugs or performance problems?

If it's only a patch that wants to make md_check_recovery more clearer
and easier, I'm good with this idea too.

Best Regards
Xiao
>
> Signed-off-by: Yu Kuai <yukuai3@huawei.com>
> ---
>  drivers/md/md.c | 70 ++++++++++++++++++++++++++-----------------------
>  1 file changed, 37 insertions(+), 33 deletions(-)
>
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index 4846ff6d25b0..03615b0e9fe1 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -9291,6 +9291,22 @@ static bool md_choose_sync_direction(struct mddev *mddev, int *spares)
>  static void md_start_sync(struct work_struct *ws)
>  {
>         struct mddev *mddev = container_of(ws, struct mddev, sync_work);
> +       int spares = 0;
> +
> +       mddev_lock_nointr(mddev);
> +
> +       if (!md_choose_sync_direction(mddev, &spares))
> +               goto not_running;
> +
> +       if (!mddev->pers->sync_request)
> +               goto not_running;
> +
> +       /*
> +        * We are adding a device or devices to an array which has the bitmap
> +        * stored on all devices. So make sure all bitmap pages get written.
> +        */
> +       if (spares)
> +               md_bitmap_write_all(mddev->bitmap);
>
>         rcu_assign_pointer(mddev->sync_thread,
>                            md_register_thread(md_do_sync, mddev, "resync"));
> @@ -9298,20 +9314,27 @@ static void md_start_sync(struct work_struct *ws)
>                 pr_warn("%s: could not start resync thread...\n",
>                         mdname(mddev));
>                 /* leave the spares where they are, it shouldn't hurt */
> -               clear_bit(MD_RECOVERY_SYNC, &mddev->recovery);
> -               clear_bit(MD_RECOVERY_RESHAPE, &mddev->recovery);
> -               clear_bit(MD_RECOVERY_REQUESTED, &mddev->recovery);
> -               clear_bit(MD_RECOVERY_CHECK, &mddev->recovery);
> -               clear_bit(MD_RECOVERY_RUNNING, &mddev->recovery);
> -               wake_up(&resync_wait);
> -               if (test_and_clear_bit(MD_RECOVERY_RECOVER,
> -                                      &mddev->recovery))
> -                       if (mddev->sysfs_action)
> -                               sysfs_notify_dirent_safe(mddev->sysfs_action);
> -       } else
> -               md_wakeup_thread(mddev->sync_thread);
> +               goto not_running;
> +       }
> +
> +       mddev_unlock(mddev);
> +       md_wakeup_thread(mddev->sync_thread);
>         sysfs_notify_dirent_safe(mddev->sysfs_action);
>         md_new_event();
> +       return;
> +
> +not_running:
> +       clear_bit(MD_RECOVERY_SYNC, &mddev->recovery);
> +       clear_bit(MD_RECOVERY_RESHAPE, &mddev->recovery);
> +       clear_bit(MD_RECOVERY_REQUESTED, &mddev->recovery);
> +       clear_bit(MD_RECOVERY_CHECK, &mddev->recovery);
> +       clear_bit(MD_RECOVERY_RUNNING, &mddev->recovery);
> +       mddev_unlock(mddev);
> +
> +       wake_up(&resync_wait);
> +       if (test_and_clear_bit(MD_RECOVERY_RECOVER, &mddev->recovery) &&
> +           mddev->sysfs_action)
> +               sysfs_notify_dirent_safe(mddev->sysfs_action);
>  }
>
>  /*
> @@ -9379,7 +9402,6 @@ void md_check_recovery(struct mddev *mddev)
>                 return;
>
>         if (mddev_trylock(mddev)) {
> -               int spares = 0;
>                 bool try_set_sync = mddev->safemode != 0;
>
>                 if (!mddev->external && mddev->safemode == 1)
> @@ -9467,29 +9489,11 @@ void md_check_recovery(struct mddev *mddev)
>                 clear_bit(MD_RECOVERY_DONE, &mddev->recovery);
>
>                 if (!test_and_clear_bit(MD_RECOVERY_NEEDED, &mddev->recovery) ||
> -                   test_bit(MD_RECOVERY_FROZEN, &mddev->recovery))
> -                       goto not_running;
> -               if (!md_choose_sync_direction(mddev, &spares))
> -                       goto not_running;
> -               if (mddev->pers->sync_request) {
> -                       if (spares) {
> -                               /* We are adding a device or devices to an array
> -                                * which has the bitmap stored on all devices.
> -                                * So make sure all bitmap pages get written
> -                                */
> -                               md_bitmap_write_all(mddev->bitmap);
> -                       }
> +                   test_bit(MD_RECOVERY_FROZEN, &mddev->recovery)) {
>                         queue_work(md_misc_wq, &mddev->sync_work);
> -                       goto unlock;
> -               }
> -       not_running:
> -               if (!mddev->sync_thread) {
> +               } else {
>                         clear_bit(MD_RECOVERY_RUNNING, &mddev->recovery);
>                         wake_up(&resync_wait);
> -                       if (test_and_clear_bit(MD_RECOVERY_RECOVER,
> -                                              &mddev->recovery))
> -                               if (mddev->sysfs_action)
> -                                       sysfs_notify_dirent_safe(mddev->sysfs_action);
>                 }
>         unlock:
>                 wake_up(&mddev->sb_wait);
> --
> 2.39.2
>


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH -next v2 7/7] md: delay remove_and_add_spares() for read only array to md_start_sync()
  2023-08-15  3:09 ` [PATCH -next v2 7/7] md: delay remove_and_add_spares() for read only array to md_start_sync() Yu Kuai
@ 2023-08-16  7:18   ` Xiao Ni
  2023-08-20  2:19     ` Yu Kuai
  0 siblings, 1 reply; 20+ messages in thread
From: Xiao Ni @ 2023-08-16  7:18 UTC (permalink / raw)
  To: Yu Kuai; +Cc: song, linux-raid, linux-kernel, yukuai3, yi.zhang, yangerkun

On Tue, Aug 15, 2023 at 11:13 AM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>
> From: Yu Kuai <yukuai3@huawei.com>
>
> Before this patch, for read-only array:
>
> md_check_recovery() check that 'MD_RECOVERY_NEEDED' is set, then it will
> call remove_and_add_spares() directly to try to remove and add rdevs
> from array.
>
> After this patch:
>
> 1) md_check_recovery() check that 'MD_RECOVERY_NEEDED' is set, and the
>    worker 'sync_work' is not pending, and there are rdevs can be added
>    or removed, then it will queue new work md_start_sync();
> 2) md_start_sync() will call remove_and_add_spares() and exist;

Hi Kuai

If md_check_recovery returns and md_starts_sync doesn't start, during
the window the raid changes to read-write, the sync thread can be run
but MD_RECOVERY_RUNNING isn't set. Is there such a problem?

Regards
Xiao

>
> This change make sure that array reconfiguration is independent from
> daemon, and it'll be much easier to synchronize it with io, consier
> that io may rely on daemon thread to be done.
>
> Signed-off-by: Yu Kuai <yukuai3@huawei.com>
> ---
>  drivers/md/md.c | 37 +++++++++++++++++++++++++++----------
>  1 file changed, 27 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index d26d2c35f9af..74d529479fcf 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -9191,6 +9191,16 @@ static bool rdev_addable(struct md_rdev *rdev)
>         return true;
>  }
>
> +static bool md_spares_need_change(struct mddev *mddev)
> +{
> +       struct md_rdev *rdev;
> +
> +       rdev_for_each(rdev, mddev)
> +               if (rdev_removeable(rdev) || rdev_addable(rdev))
> +                       return true;
> +       return false;
> +}
> +
>  static int remove_and_add_spares(struct mddev *mddev,
>                                  struct md_rdev *this)
>  {
> @@ -9309,6 +9319,12 @@ static void md_start_sync(struct work_struct *ws)
>
>         mddev_lock_nointr(mddev);
>
> +       if (!md_is_rdwr(mddev)) {
> +               remove_and_add_spares(mddev, NULL);
> +               mddev_unlock(mddev);
> +               return;
> +       }
> +
>         if (!md_choose_sync_direction(mddev, &spares))
>                 goto not_running;
>
> @@ -9403,7 +9419,8 @@ void md_check_recovery(struct mddev *mddev)
>         }
>
>         if (!md_is_rdwr(mddev) &&
> -           !test_bit(MD_RECOVERY_NEEDED, &mddev->recovery))
> +           (!test_bit(MD_RECOVERY_NEEDED, &mddev->recovery) ||
> +            work_pending(&mddev->sync_work)))
>                 return;
>         if ( ! (
>                 (mddev->sb_flags & ~ (1<<MD_SB_CHANGE_PENDING)) ||
> @@ -9431,15 +9448,8 @@ void md_check_recovery(struct mddev *mddev)
>                                  */
>                                 rdev_for_each(rdev, mddev)
>                                         clear_bit(Blocked, &rdev->flags);
> -                       /* On a read-only array we can:
> -                        * - remove failed devices
> -                        * - add already-in_sync devices if the array itself
> -                        *   is in-sync.
> -                        * As we only add devices that are already in-sync,
> -                        * we can activate the spares immediately.
> -                        */
> -                       remove_and_add_spares(mddev, NULL);
> -                       /* There is no thread, but we need to call
> +                       /*
> +                        * There is no thread, but we need to call
>                          * ->spare_active and clear saved_raid_disk
>                          */
>                         set_bit(MD_RECOVERY_INTR, &mddev->recovery);
> @@ -9447,6 +9457,13 @@ void md_check_recovery(struct mddev *mddev)
>                         clear_bit(MD_RECOVERY_RECOVER, &mddev->recovery);
>                         clear_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
>                         clear_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags);
> +
> +                       /*
> +                        * Let md_start_sync() to remove and add rdevs to the
> +                        * array.
> +                        */
> +                       if (md_spares_need_change(mddev))
> +                               queue_work(md_misc_wq, &mddev->sync_work);
>                         goto unlock;
>                 }
>
> --
> 2.39.2
>


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH -next v2 2/7] md: factor out a helper to choose sync direction from md_check_recovery()
  2023-08-15  3:09 ` [PATCH -next v2 2/7] md: factor out a helper to choose sync direction from md_check_recovery() Yu Kuai
@ 2023-08-17  7:58   ` Mariusz Tkaczyk
  2023-08-17 21:49     ` Song Liu
  0 siblings, 1 reply; 20+ messages in thread
From: Mariusz Tkaczyk @ 2023-08-17  7:58 UTC (permalink / raw)
  To: Yu Kuai; +Cc: xni, song, linux-raid, linux-kernel, yukuai3, yi.zhang, yangerkun

On Tue, 15 Aug 2023 11:09:52 +0800
Yu Kuai <yukuai1@huaweicloud.com> wrote:

> From: Yu Kuai <yukuai3@huawei.com>
> 
> There are no functional changes, on the one hand make the code cleaner,
> on the other hand prevent following checkpatch error in the next patch to
> delay choosing sync direction to md_start_sync().
> 
> ERROR: do not use assignment in if condition
> +       } else if ((spares = remove_and_add_spares(mddev, NULL))) {
> 
> Signed-off-by: Yu Kuai <yukuai3@huawei.com>
> ---
>  drivers/md/md.c | 68 +++++++++++++++++++++++++++++++------------------
>  1 file changed, 43 insertions(+), 25 deletions(-)
> 
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index 90815be1e80f..4846ff6d25b0 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -9246,6 +9246,48 @@ static int remove_and_add_spares(struct mddev *mddev,
>  	return spares;
>  }
>  
> +static bool md_choose_sync_direction(struct mddev *mddev, int *spares)

The naming is little confusing because as a direction I would expect forward or
backward - from end to start or or from start to end. In this case you are
determining the type of the background operation needed. Assuming that reshape
is a kind of "sync" operation I would say "md_choose_sync_action".

Anyway, it looks good to me.

Thanks,
Mariusz

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH -next v2 2/7] md: factor out a helper to choose sync direction from md_check_recovery()
  2023-08-17  7:58   ` Mariusz Tkaczyk
@ 2023-08-17 21:49     ` Song Liu
  2023-08-20  1:44       ` Yu Kuai
  0 siblings, 1 reply; 20+ messages in thread
From: Song Liu @ 2023-08-17 21:49 UTC (permalink / raw)
  To: Mariusz Tkaczyk
  Cc: Yu Kuai, xni, linux-raid, linux-kernel, yukuai3, yi.zhang, yangerkun

On Thu, Aug 17, 2023 at 12:58 AM Mariusz Tkaczyk
<mariusz.tkaczyk@linux.intel.com> wrote:
>
> On Tue, 15 Aug 2023 11:09:52 +0800
> Yu Kuai <yukuai1@huaweicloud.com> wrote:
>
> > From: Yu Kuai <yukuai3@huawei.com>
> >
> > There are no functional changes, on the one hand make the code cleaner,
> > on the other hand prevent following checkpatch error in the next patch to
> > delay choosing sync direction to md_start_sync().
> >
> > ERROR: do not use assignment in if condition
> > +       } else if ((spares = remove_and_add_spares(mddev, NULL))) {
> >
> > Signed-off-by: Yu Kuai <yukuai3@huawei.com>
> > ---
> >  drivers/md/md.c | 68 +++++++++++++++++++++++++++++++------------------
> >  1 file changed, 43 insertions(+), 25 deletions(-)
> >
> > diff --git a/drivers/md/md.c b/drivers/md/md.c
> > index 90815be1e80f..4846ff6d25b0 100644
> > --- a/drivers/md/md.c
> > +++ b/drivers/md/md.c
> > @@ -9246,6 +9246,48 @@ static int remove_and_add_spares(struct mddev *mddev,
> >       return spares;
> >  }
> >
> > +static bool md_choose_sync_direction(struct mddev *mddev, int *spares)
>
> The naming is little confusing because as a direction I would expect forward or
> backward - from end to start or or from start to end. In this case you are
> determining the type of the background operation needed. Assuming that reshape
> is a kind of "sync" operation I would say "md_choose_sync_action".

Yeah, md_choose_sync_direction is indeed confusing.

Thanks,
Song

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH -next v2 3/7] md: delay choosing sync direction to md_start_sync()
  2023-08-16  1:07       ` Yu Kuai
@ 2023-08-17 21:53         ` Song Liu
  2023-08-20  1:45           ` Yu Kuai
  0 siblings, 1 reply; 20+ messages in thread
From: Song Liu @ 2023-08-17 21:53 UTC (permalink / raw)
  To: Yu Kuai; +Cc: xni, linux-raid, linux-kernel, yi.zhang, yangerkun, yukuai (C)

On Tue, Aug 15, 2023 at 6:07 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>
> Hi,
>
> 在 2023/08/15 23:54, Song Liu 写道:
> > On Tue, Aug 15, 2023 at 2:00 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
> > [...]
> >>> +
> >>> +not_running:
> >>> +     clear_bit(MD_RECOVERY_SYNC, &mddev->recovery);
> >>> +     clear_bit(MD_RECOVERY_RESHAPE, &mddev->recovery);
> >>> +     clear_bit(MD_RECOVERY_REQUESTED, &mddev->recovery);
> >>> +     clear_bit(MD_RECOVERY_CHECK, &mddev->recovery);
> >>> +     clear_bit(MD_RECOVERY_RUNNING, &mddev->recovery);
> >>> +     mddev_unlock(mddev);
> >>> +
> >>> +     wake_up(&resync_wait);
> >>> +     if (test_and_clear_bit(MD_RECOVERY_RECOVER, &mddev->recovery) &&
> >>> +         mddev->sysfs_action)
> >>> +             sysfs_notify_dirent_safe(mddev->sysfs_action);
> >>>    }
> >>>
> >>>    /*
> >>> @@ -9379,7 +9402,6 @@ void md_check_recovery(struct mddev *mddev)
> >>>                return;
> >>>
> >>>        if (mddev_trylock(mddev)) {
> >>> -             int spares = 0;
> >>>                bool try_set_sync = mddev->safemode != 0;
> >>>
> >>>                if (!mddev->external && mddev->safemode == 1)
> >>> @@ -9467,29 +9489,11 @@ void md_check_recovery(struct mddev *mddev)
> >>>                clear_bit(MD_RECOVERY_DONE, &mddev->recovery);
> >>>
> >>>                if (!test_and_clear_bit(MD_RECOVERY_NEEDED, &mddev->recovery) ||
> >>> -                 test_bit(MD_RECOVERY_FROZEN, &mddev->recovery))
> >>> -                     goto not_running;
> >>> -             if (!md_choose_sync_direction(mddev, &spares))
> >>> -                     goto not_running;
> >>> -             if (mddev->pers->sync_request) {
> >>> -                     if (spares) {
> >>> -                             /* We are adding a device or devices to an array
> >>> -                              * which has the bitmap stored on all devices.
> >>> -                              * So make sure all bitmap pages get written
> >>> -                              */
> >>> -                             md_bitmap_write_all(mddev->bitmap);
> >>> -                     }
> >>> +                 test_bit(MD_RECOVERY_FROZEN, &mddev->recovery)) {
> >>
> >> Sorry that I made a mistake here while rebasing v2, here should be
> >>
> >> !test_bit(MD_RECOVERY_FROZEN, &mddev->recovery)
> >>
> >> With this fixed, there are no new regression for mdadm tests using loop
> >> devicein my VM.
> >
> >                  if (!test_and_clear_bit(MD_RECOVERY_NEEDED, &mddev->recovery) ||
> >                      !test_bit(MD_RECOVERY_FROZEN, &mddev->recovery)) {
> >                          queue_work(md_misc_wq, &mddev->sync_work);
> >                  } else {
> >
> > This doesn't look right. Should we do
> >
> >                  if (test_and_clear_bit(MD_RECOVERY_NEEDED, &mddev->recovery) &&
> >                      !test_bit(MD_RECOVERY_FROZEN, &mddev->recovery)) {
> >                          queue_work(md_misc_wq, &mddev->sync_work);
> >                  } else {
> >
> > instead?
> >
>
> Yes you're right, this is exactly what I did in v1, sorry that I keep
> making mistake while rebasing.

Please fix this, address comments from other reviews, and resend the
patches. Also, there are some typos in the commit logs, please also fix them.

Unfortunately, we won't ship this (and the two other big sets) in 6.6.

Thanks,
Song

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH -next v2 2/7] md: factor out a helper to choose sync direction from md_check_recovery()
  2023-08-17 21:49     ` Song Liu
@ 2023-08-20  1:44       ` Yu Kuai
  0 siblings, 0 replies; 20+ messages in thread
From: Yu Kuai @ 2023-08-20  1:44 UTC (permalink / raw)
  To: Song Liu, Mariusz Tkaczyk
  Cc: Yu Kuai, xni, linux-raid, linux-kernel, yi.zhang, yangerkun, yukuai (C)

Hi,

在 2023/08/18 5:49, Song Liu 写道:
> On Thu, Aug 17, 2023 at 12:58 AM Mariusz Tkaczyk
> <mariusz.tkaczyk@linux.intel.com> wrote:
>>
>> On Tue, 15 Aug 2023 11:09:52 +0800
>> Yu Kuai <yukuai1@huaweicloud.com> wrote:
>>
>>> From: Yu Kuai <yukuai3@huawei.com>
>>>
>>> There are no functional changes, on the one hand make the code cleaner,
>>> on the other hand prevent following checkpatch error in the next patch to
>>> delay choosing sync direction to md_start_sync().
>>>
>>> ERROR: do not use assignment in if condition
>>> +       } else if ((spares = remove_and_add_spares(mddev, NULL))) {
>>>
>>> Signed-off-by: Yu Kuai <yukuai3@huawei.com>
>>> ---
>>>   drivers/md/md.c | 68 +++++++++++++++++++++++++++++++------------------
>>>   1 file changed, 43 insertions(+), 25 deletions(-)
>>>
>>> diff --git a/drivers/md/md.c b/drivers/md/md.c
>>> index 90815be1e80f..4846ff6d25b0 100644
>>> --- a/drivers/md/md.c
>>> +++ b/drivers/md/md.c
>>> @@ -9246,6 +9246,48 @@ static int remove_and_add_spares(struct mddev *mddev,
>>>        return spares;
>>>   }
>>>
>>> +static bool md_choose_sync_direction(struct mddev *mddev, int *spares)
>>
>> The naming is little confusing because as a direction I would expect forward or
>> backward - from end to start or or from start to end. In this case you are
>> determining the type of the background operation needed. Assuming that reshape
>> is a kind of "sync" operation I would say "md_choose_sync_action".
> 
> Yeah, md_choose_sync_direction is indeed confusing.
> 

Thanks for the suggestion, I'll update this in the new version.

Kuai,

> Thanks,
> Song
> .
> 


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH -next v2 3/7] md: delay choosing sync direction to md_start_sync()
  2023-08-17 21:53         ` Song Liu
@ 2023-08-20  1:45           ` Yu Kuai
  0 siblings, 0 replies; 20+ messages in thread
From: Yu Kuai @ 2023-08-20  1:45 UTC (permalink / raw)
  To: Song Liu, Yu Kuai
  Cc: xni, linux-raid, linux-kernel, yi.zhang, yangerkun, yukuai (C)

Hi,

在 2023/08/18 5:53, Song Liu 写道:
> On Tue, Aug 15, 2023 at 6:07 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>>
>> Hi,
>>
>> 在 2023/08/15 23:54, Song Liu 写道:
>>> On Tue, Aug 15, 2023 at 2:00 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>>> [...]
>>>>> +
>>>>> +not_running:
>>>>> +     clear_bit(MD_RECOVERY_SYNC, &mddev->recovery);
>>>>> +     clear_bit(MD_RECOVERY_RESHAPE, &mddev->recovery);
>>>>> +     clear_bit(MD_RECOVERY_REQUESTED, &mddev->recovery);
>>>>> +     clear_bit(MD_RECOVERY_CHECK, &mddev->recovery);
>>>>> +     clear_bit(MD_RECOVERY_RUNNING, &mddev->recovery);
>>>>> +     mddev_unlock(mddev);
>>>>> +
>>>>> +     wake_up(&resync_wait);
>>>>> +     if (test_and_clear_bit(MD_RECOVERY_RECOVER, &mddev->recovery) &&
>>>>> +         mddev->sysfs_action)
>>>>> +             sysfs_notify_dirent_safe(mddev->sysfs_action);
>>>>>     }
>>>>>
>>>>>     /*
>>>>> @@ -9379,7 +9402,6 @@ void md_check_recovery(struct mddev *mddev)
>>>>>                 return;
>>>>>
>>>>>         if (mddev_trylock(mddev)) {
>>>>> -             int spares = 0;
>>>>>                 bool try_set_sync = mddev->safemode != 0;
>>>>>
>>>>>                 if (!mddev->external && mddev->safemode == 1)
>>>>> @@ -9467,29 +9489,11 @@ void md_check_recovery(struct mddev *mddev)
>>>>>                 clear_bit(MD_RECOVERY_DONE, &mddev->recovery);
>>>>>
>>>>>                 if (!test_and_clear_bit(MD_RECOVERY_NEEDED, &mddev->recovery) ||
>>>>> -                 test_bit(MD_RECOVERY_FROZEN, &mddev->recovery))
>>>>> -                     goto not_running;
>>>>> -             if (!md_choose_sync_direction(mddev, &spares))
>>>>> -                     goto not_running;
>>>>> -             if (mddev->pers->sync_request) {
>>>>> -                     if (spares) {
>>>>> -                             /* We are adding a device or devices to an array
>>>>> -                              * which has the bitmap stored on all devices.
>>>>> -                              * So make sure all bitmap pages get written
>>>>> -                              */
>>>>> -                             md_bitmap_write_all(mddev->bitmap);
>>>>> -                     }
>>>>> +                 test_bit(MD_RECOVERY_FROZEN, &mddev->recovery)) {
>>>>
>>>> Sorry that I made a mistake here while rebasing v2, here should be
>>>>
>>>> !test_bit(MD_RECOVERY_FROZEN, &mddev->recovery)
>>>>
>>>> With this fixed, there are no new regression for mdadm tests using loop
>>>> devicein my VM.
>>>
>>>                   if (!test_and_clear_bit(MD_RECOVERY_NEEDED, &mddev->recovery) ||
>>>                       !test_bit(MD_RECOVERY_FROZEN, &mddev->recovery)) {
>>>                           queue_work(md_misc_wq, &mddev->sync_work);
>>>                   } else {
>>>
>>> This doesn't look right. Should we do
>>>
>>>                   if (test_and_clear_bit(MD_RECOVERY_NEEDED, &mddev->recovery) &&
>>>                       !test_bit(MD_RECOVERY_FROZEN, &mddev->recovery)) {
>>>                           queue_work(md_misc_wq, &mddev->sync_work);
>>>                   } else {
>>>
>>> instead?
>>>
>>
>> Yes you're right, this is exactly what I did in v1, sorry that I keep
>> making mistake while rebasing.
> 
> Please fix this, address comments from other reviews, and resend the
> patches. Also, there are some typos in the commit logs, please also fix them.
> 

Of course, and sorry for the dealy, I was ill and rested at home for a
few days.

Thanks,
Kuai

> Unfortunately, we won't ship this (and the two other big sets) in 6.6.
> 
> Thanks,
> Song
> .
> 


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH -next v2 3/7] md: delay choosing sync direction to md_start_sync()
  2023-08-16  6:38   ` Xiao Ni
@ 2023-08-20  2:04     ` Yu Kuai
  0 siblings, 0 replies; 20+ messages in thread
From: Yu Kuai @ 2023-08-20  2:04 UTC (permalink / raw)
  To: Xiao Ni, Yu Kuai
  Cc: song, linux-raid, linux-kernel, yi.zhang, yangerkun, yukuai (C)

Hi,

在 2023/08/16 14:38, Xiao Ni 写道:
> On Tue, Aug 15, 2023 at 11:13 AM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>>
>> From: Yu Kuai <yukuai3@huawei.com>
>>
>> Before this patch, for read-write array:
>>
>> 1) md_check_recover() found that something need to be done, and it'll
>>     try to grab 'reconfig_mutex'. The case that md_check_recover() need
>>     to do something:
>>     - array is not suspend;
>>     - super_block need to be updated;
>>     - 'MD_RECOVERY_NEEDED' or ''MD_RECOVERY_DONE' is set;
>>     - unusual case related to safemode;
>>
>> 2) if 'MD_RECOVERY_RUNNING' is not set, and 'MD_RECOVERY_NEEDED' is set,
>>     md_check_recover() will try to choose a sync direction, and then
>>     queue a work md_start_sync().
>>
>> 3) md_start_sync() register sync_thread;
>>
>> After this patch,
>>
>> 1) is the same;
>> 2) if 'MD_RECOVERY_RUNNING' is not set, and 'MD_RECOVERY_NEEDED' is set,
>>     queue a work md_start_sync() directly;
>> 3) md_start_sync() will try to choose a sync direction, and then
>>     register sync_thread();
>>
>> Because 'MD_RECOVERY_RUNNING' is cleared when sync_thread is done, 2)
>> and 3) is always ran in serial and they can never concurrent, this
>> change should not introduce any behavior change for now.
>>
>> Also fix a problem that md_start_sync() can clear 'MD_RECOVERY_RUNNING'
>> without protection in error path, which might affect the logical in
>> md_check_recovery().
>>
>> The advantage to change this is that array reconfiguration is
>> independent from daemon now, and it'll be much easier to synchronize it
>> with io, consider that io may rely on daemon thread to be done.
> 
> Hi Kuai
> 
> What's the meaning of "array reconfiguration" here? "mdadm -f/-r/-a"
> something like this, right?. Because before and after this patch, only
> one sync thread can be running, so If we don't do this change, are
> there bugs or performance problems?

As we discussed([1]), and explained in patch 0, the purpose of this
change is to prepare to synchronize io with array reconfiguration(add
or remove rdev from array, for example, modify
conf->mirrors[].rdev/replacement for raid10).

Without this change, normal io can rely on daemon thread, while daemone
thread can change array configuration. raid1/raid10 record such io as
'io_queued', and can use freeze_array() to do synchronization in daemon
thread. However, other personalities have to implement such logical as
well, and I found it quite complicated, at least for raid456.

[1] 
https://lore.kernel.org/all/cb390b39-1b1e-04e1-55ad-2ff8afc47e1b@huawei.com/

Thanks,
Kuai

> 
> If it's only a patch that wants to make md_check_recovery more clearer
> and easier, I'm good with this idea too.
> 


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH -next v2 7/7] md: delay remove_and_add_spares() for read only array to md_start_sync()
  2023-08-16  7:18   ` Xiao Ni
@ 2023-08-20  2:19     ` Yu Kuai
  0 siblings, 0 replies; 20+ messages in thread
From: Yu Kuai @ 2023-08-20  2:19 UTC (permalink / raw)
  To: Xiao Ni, Yu Kuai
  Cc: song, linux-raid, linux-kernel, yi.zhang, yangerkun, yukuai (C)

Hi,

在 2023/08/16 15:18, Xiao Ni 写道:
> On Tue, Aug 15, 2023 at 11:13 AM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>>
>> From: Yu Kuai <yukuai3@huawei.com>
>>
>> Before this patch, for read-only array:
>>
>> md_check_recovery() check that 'MD_RECOVERY_NEEDED' is set, then it will
>> call remove_and_add_spares() directly to try to remove and add rdevs
>> from array.
>>
>> After this patch:
>>
>> 1) md_check_recovery() check that 'MD_RECOVERY_NEEDED' is set, and the
>>     worker 'sync_work' is not pending, and there are rdevs can be added
>>     or removed, then it will queue new work md_start_sync();
>> 2) md_start_sync() will call remove_and_add_spares() and exist;
> 
> Hi Kuai
> 
> If md_check_recovery returns and md_starts_sync doesn't start, during
> the window the raid changes to read-write, the sync thread can be run
> but MD_RECOVERY_RUNNING isn't set. Is there such a problem?

That's a good question, looks like this is possible. Read-only array
doesn't test or set MD_RECOVERY_RUNNING at all in md_check_recovery().
I'll use MD_RECOVERY_RUNNING to prevent such concurrent scenario in
the new version.

Thanks,
Kuai

> 
> Regards
> Xiao
> 
>>
>> This change make sure that array reconfiguration is independent from
>> daemon, and it'll be much easier to synchronize it with io, consier
>> that io may rely on daemon thread to be done.
>>
>> Signed-off-by: Yu Kuai <yukuai3@huawei.com>
>> ---
>>   drivers/md/md.c | 37 +++++++++++++++++++++++++++----------
>>   1 file changed, 27 insertions(+), 10 deletions(-)
>>
>> diff --git a/drivers/md/md.c b/drivers/md/md.c
>> index d26d2c35f9af..74d529479fcf 100644
>> --- a/drivers/md/md.c
>> +++ b/drivers/md/md.c
>> @@ -9191,6 +9191,16 @@ static bool rdev_addable(struct md_rdev *rdev)
>>          return true;
>>   }
>>
>> +static bool md_spares_need_change(struct mddev *mddev)
>> +{
>> +       struct md_rdev *rdev;
>> +
>> +       rdev_for_each(rdev, mddev)
>> +               if (rdev_removeable(rdev) || rdev_addable(rdev))
>> +                       return true;
>> +       return false;
>> +}
>> +
>>   static int remove_and_add_spares(struct mddev *mddev,
>>                                   struct md_rdev *this)
>>   {
>> @@ -9309,6 +9319,12 @@ static void md_start_sync(struct work_struct *ws)
>>
>>          mddev_lock_nointr(mddev);
>>
>> +       if (!md_is_rdwr(mddev)) {
>> +               remove_and_add_spares(mddev, NULL);
>> +               mddev_unlock(mddev);
>> +               return;
>> +       }
>> +
>>          if (!md_choose_sync_direction(mddev, &spares))
>>                  goto not_running;
>>
>> @@ -9403,7 +9419,8 @@ void md_check_recovery(struct mddev *mddev)
>>          }
>>
>>          if (!md_is_rdwr(mddev) &&
>> -           !test_bit(MD_RECOVERY_NEEDED, &mddev->recovery))
>> +           (!test_bit(MD_RECOVERY_NEEDED, &mddev->recovery) ||
>> +            work_pending(&mddev->sync_work)))
>>                  return;
>>          if ( ! (
>>                  (mddev->sb_flags & ~ (1<<MD_SB_CHANGE_PENDING)) ||
>> @@ -9431,15 +9448,8 @@ void md_check_recovery(struct mddev *mddev)
>>                                   */
>>                                  rdev_for_each(rdev, mddev)
>>                                          clear_bit(Blocked, &rdev->flags);
>> -                       /* On a read-only array we can:
>> -                        * - remove failed devices
>> -                        * - add already-in_sync devices if the array itself
>> -                        *   is in-sync.
>> -                        * As we only add devices that are already in-sync,
>> -                        * we can activate the spares immediately.
>> -                        */
>> -                       remove_and_add_spares(mddev, NULL);
>> -                       /* There is no thread, but we need to call
>> +                       /*
>> +                        * There is no thread, but we need to call
>>                           * ->spare_active and clear saved_raid_disk
>>                           */
>>                          set_bit(MD_RECOVERY_INTR, &mddev->recovery);
>> @@ -9447,6 +9457,13 @@ void md_check_recovery(struct mddev *mddev)
>>                          clear_bit(MD_RECOVERY_RECOVER, &mddev->recovery);
>>                          clear_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
>>                          clear_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags);
>> +
>> +                       /*
>> +                        * Let md_start_sync() to remove and add rdevs to the
>> +                        * array.
>> +                        */
>> +                       if (md_spares_need_change(mddev))
>> +                               queue_work(md_misc_wq, &mddev->sync_work);
>>                          goto unlock;
>>                  }
>>
>> --
>> 2.39.2
>>
> 
> .
> 


^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2023-08-20  2:47 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-08-15  3:09 [PATCH -next v2 0/7] md: make rdev addition and removal independent from daemon thread Yu Kuai
2023-08-15  3:09 ` [PATCH -next v2 1/7] md: use separate work_struct for md_start_sync() Yu Kuai
2023-08-15  3:09 ` [PATCH -next v2 2/7] md: factor out a helper to choose sync direction from md_check_recovery() Yu Kuai
2023-08-17  7:58   ` Mariusz Tkaczyk
2023-08-17 21:49     ` Song Liu
2023-08-20  1:44       ` Yu Kuai
2023-08-15  3:09 ` [PATCH -next v2 3/7] md: delay choosing sync direction to md_start_sync() Yu Kuai
2023-08-15  6:00   ` Yu Kuai
2023-08-15 15:54     ` Song Liu
2023-08-16  1:07       ` Yu Kuai
2023-08-17 21:53         ` Song Liu
2023-08-20  1:45           ` Yu Kuai
2023-08-16  6:38   ` Xiao Ni
2023-08-20  2:04     ` Yu Kuai
2023-08-15  3:09 ` [PATCH -next v2 4/7] md: factor out a helper rdev_removeable() from remove_and_add_spares() Yu Kuai
2023-08-15  3:09 ` [PATCH -next v2 5/7] md: factor out a helper rdev_is_spare() " Yu Kuai
2023-08-15  3:09 ` [PATCH -next v2 6/7] md: factor out a helper rdev_addable() " Yu Kuai
2023-08-15  3:09 ` [PATCH -next v2 7/7] md: delay remove_and_add_spares() for read only array to md_start_sync() Yu Kuai
2023-08-16  7:18   ` Xiao Ni
2023-08-20  2:19     ` Yu Kuai

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).