All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH -next v3 00/25] md: synchronize io with array reconfiguration
@ 2023-09-28  6:15 ` Yu Kuai
  0 siblings, 0 replies; 70+ messages in thread
From: Yu Kuai @ 2023-09-28  6:15 UTC (permalink / raw)
  To: xni, agk, snitzer, dm-devel, song
  Cc: linux-kernel, linux-raid, yukuai3, yukuai1, yi.zhang, yangerkun

From: Yu Kuai <yukuai3@huawei.com>

Changes in v3:
 - rebase with latest md-next;
 - remove patch 2 from v2, and replace it with a new patch;
 - fix a null-ptr-derefrence in rdev_attr_store() that mddev is used
 before checking;
 - merge patch 20-22 from v1 into one patch;
 - mddev_lock() used to be called first and can be interruptted, allow new
 api, which is called before mddev_lock() now, to be interruptted as well;
 - improve some comments and coding;

Changes in v2:
 - rebase with latest md-next;
 - remove some follow up cleanup patches, these patches will be sent
 later after this patchset.

After previous four patchset of preparatory work, this patchset impelement
a new version of mddev_suspend(), the new apis:
 - reconfig_mutex is not required;
 - the weird logical that suspend array hold 'reconfig_mutex' for
   mddev_check_recovery() to update superblock is not needed;
 - the special handling, 'pers->prepare_suspend', for raid456 is not
   needed;
 - It's safe to be called at any time once mddev is allocated, and it's
   designed to be used from slow path where array configuration is changed;

And use the new api to replace:

mddev_lock
mddev_suspend or not
// array reconfiguration
mddev_resume or not
mddev_unlock

With:

mddev_suspend
mddev_lock
// array reconfiguration
mddev_unlock
mddev_resume

However, the above change is not possible for raid5 and raid-cluster in
some corner cases, and mddev_suspend/resume() is replaced with quiesce()
callback, which will suspend the array as well.

This patchset is tested in my VM with mdadm testsuite with loop device
except for 10ddf tests(they always fail before this patchset).

A lot of cleanups will be started after this patchset.

Yu Kuai (25):
  md: use READ_ONCE/WRITE_ONCE for 'suspend_lo' and 'suspend_hi'
  md: replace is_md_suspended() with 'mddev->suspended' in
    md_check_recovery()
  md: add new helpers to suspend/resume array
  md: add new helpers to suspend/resume and lock/unlock array
  md: use new apis to suspend array for suspend_lo/hi_store()
  md: use new apis to suspend array for level_store()
  md: use new apis to suspend array for serialize_policy_store()
  md/dm-raid: use new apis to suspend array
  md/md-bitmap: use new apis to suspend array for location_store()
  md/raid5-cache: use READ_ONCE/WRITE_ONCE for 'conf->log'
  md/raid5-cache: use new apis to suspend array for
    r5c_disable_writeback_async()
  md/raid5-cache: use new apis to suspend array for
    r5c_journal_mode_store()
  md/raid5: use new apis to suspend array for raid5_store_stripe_size()
  md/raid5: use new apis to suspend array for raid5_store_skip_copy()
  md/raid5: use new apis to suspend array for
    raid5_store_group_thread_cnt()
  md/raid5: use new apis to suspend array for
    raid5_change_consistency_policy()
  md/raid5: replace suspend with quiesce() callback
  md: use new apis to suspend array for ioctls involed array
    reconfiguration
  md: use new apis to suspend array for adding/removing rdev from
    state_store()
  md: use new apis to suspend array before
    mddev_create/destroy_serial_pool
  md: cleanup mddev_create/destroy_serial_pool()
  md/md-linear: cleanup linear_add()
  md: suspend array in md_start_sync() if array need reconfiguration
  md: remove old apis to suspend the array
  md: rename __mddev_suspend/resume() back to mddev_suspend/resume()

 drivers/md/dm-raid.c       |  10 +-
 drivers/md/md-autodetect.c |   4 +-
 drivers/md/md-bitmap.c     |  18 ++-
 drivers/md/md-linear.c     |   2 -
 drivers/md/md.c            | 233 ++++++++++++++++++++-----------------
 drivers/md/md.h            |  43 +++++--
 drivers/md/raid5-cache.c   |  64 +++++-----
 drivers/md/raid5.c         |  56 ++++-----
 8 files changed, 226 insertions(+), 204 deletions(-)

-- 
2.39.2


^ permalink raw reply	[flat|nested] 70+ messages in thread

* [dm-devel] [PATCH -next v3 00/25] md: synchronize io with array reconfiguration
@ 2023-09-28  6:15 ` Yu Kuai
  0 siblings, 0 replies; 70+ messages in thread
From: Yu Kuai @ 2023-09-28  6:15 UTC (permalink / raw)
  To: xni, agk, snitzer, dm-devel, song
  Cc: yi.zhang, yangerkun, linux-kernel, linux-raid, yukuai1, yukuai3

From: Yu Kuai <yukuai3@huawei.com>

Changes in v3:
 - rebase with latest md-next;
 - remove patch 2 from v2, and replace it with a new patch;
 - fix a null-ptr-derefrence in rdev_attr_store() that mddev is used
 before checking;
 - merge patch 20-22 from v1 into one patch;
 - mddev_lock() used to be called first and can be interruptted, allow new
 api, which is called before mddev_lock() now, to be interruptted as well;
 - improve some comments and coding;

Changes in v2:
 - rebase with latest md-next;
 - remove some follow up cleanup patches, these patches will be sent
 later after this patchset.

After previous four patchset of preparatory work, this patchset impelement
a new version of mddev_suspend(), the new apis:
 - reconfig_mutex is not required;
 - the weird logical that suspend array hold 'reconfig_mutex' for
   mddev_check_recovery() to update superblock is not needed;
 - the special handling, 'pers->prepare_suspend', for raid456 is not
   needed;
 - It's safe to be called at any time once mddev is allocated, and it's
   designed to be used from slow path where array configuration is changed;

And use the new api to replace:

mddev_lock
mddev_suspend or not
// array reconfiguration
mddev_resume or not
mddev_unlock

With:

mddev_suspend
mddev_lock
// array reconfiguration
mddev_unlock
mddev_resume

However, the above change is not possible for raid5 and raid-cluster in
some corner cases, and mddev_suspend/resume() is replaced with quiesce()
callback, which will suspend the array as well.

This patchset is tested in my VM with mdadm testsuite with loop device
except for 10ddf tests(they always fail before this patchset).

A lot of cleanups will be started after this patchset.

Yu Kuai (25):
  md: use READ_ONCE/WRITE_ONCE for 'suspend_lo' and 'suspend_hi'
  md: replace is_md_suspended() with 'mddev->suspended' in
    md_check_recovery()
  md: add new helpers to suspend/resume array
  md: add new helpers to suspend/resume and lock/unlock array
  md: use new apis to suspend array for suspend_lo/hi_store()
  md: use new apis to suspend array for level_store()
  md: use new apis to suspend array for serialize_policy_store()
  md/dm-raid: use new apis to suspend array
  md/md-bitmap: use new apis to suspend array for location_store()
  md/raid5-cache: use READ_ONCE/WRITE_ONCE for 'conf->log'
  md/raid5-cache: use new apis to suspend array for
    r5c_disable_writeback_async()
  md/raid5-cache: use new apis to suspend array for
    r5c_journal_mode_store()
  md/raid5: use new apis to suspend array for raid5_store_stripe_size()
  md/raid5: use new apis to suspend array for raid5_store_skip_copy()
  md/raid5: use new apis to suspend array for
    raid5_store_group_thread_cnt()
  md/raid5: use new apis to suspend array for
    raid5_change_consistency_policy()
  md/raid5: replace suspend with quiesce() callback
  md: use new apis to suspend array for ioctls involed array
    reconfiguration
  md: use new apis to suspend array for adding/removing rdev from
    state_store()
  md: use new apis to suspend array before
    mddev_create/destroy_serial_pool
  md: cleanup mddev_create/destroy_serial_pool()
  md/md-linear: cleanup linear_add()
  md: suspend array in md_start_sync() if array need reconfiguration
  md: remove old apis to suspend the array
  md: rename __mddev_suspend/resume() back to mddev_suspend/resume()

 drivers/md/dm-raid.c       |  10 +-
 drivers/md/md-autodetect.c |   4 +-
 drivers/md/md-bitmap.c     |  18 ++-
 drivers/md/md-linear.c     |   2 -
 drivers/md/md.c            | 233 ++++++++++++++++++++-----------------
 drivers/md/md.h            |  43 +++++--
 drivers/md/raid5-cache.c   |  64 +++++-----
 drivers/md/raid5.c         |  56 ++++-----
 8 files changed, 226 insertions(+), 204 deletions(-)

-- 
2.39.2

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply	[flat|nested] 70+ messages in thread

* [PATCH -next v3 01/25] md: use READ_ONCE/WRITE_ONCE for 'suspend_lo' and 'suspend_hi'
  2023-09-28  6:15 ` [dm-devel] " Yu Kuai
@ 2023-09-28  6:15   ` Yu Kuai
  -1 siblings, 0 replies; 70+ messages in thread
From: Yu Kuai @ 2023-09-28  6:15 UTC (permalink / raw)
  To: xni, agk, snitzer, dm-devel, song
  Cc: linux-kernel, linux-raid, yukuai3, yukuai1, yi.zhang, yangerkun

From: Yu Kuai <yukuai3@huawei.com>

Because reading 'suspend_lo' and 'suspend_hi' from md_handle_request()
is not protected, use READ_ONCE/WRITE_ONCE to prevent reading abnormal
value.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 drivers/md/md.c | 16 +++++++++-------
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 0b83a406e797..18a0f12dbf5f 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -359,11 +359,11 @@ static bool is_suspended(struct mddev *mddev, struct bio *bio)
 		return true;
 	if (bio_data_dir(bio) != WRITE)
 		return false;
-	if (mddev->suspend_lo >= mddev->suspend_hi)
+	if (READ_ONCE(mddev->suspend_lo) >= READ_ONCE(mddev->suspend_hi))
 		return false;
-	if (bio->bi_iter.bi_sector >= mddev->suspend_hi)
+	if (bio->bi_iter.bi_sector >= READ_ONCE(mddev->suspend_hi))
 		return false;
-	if (bio_end_sector(bio) < mddev->suspend_lo)
+	if (bio_end_sector(bio) < READ_ONCE(mddev->suspend_lo))
 		return false;
 	return true;
 }
@@ -5176,7 +5176,8 @@ __ATTR(sync_max, S_IRUGO|S_IWUSR, max_sync_show, max_sync_store);
 static ssize_t
 suspend_lo_show(struct mddev *mddev, char *page)
 {
-	return sprintf(page, "%llu\n", (unsigned long long)mddev->suspend_lo);
+	return sprintf(page, "%llu\n",
+		       (unsigned long long)READ_ONCE(mddev->suspend_lo));
 }
 
 static ssize_t
@@ -5196,7 +5197,7 @@ suspend_lo_store(struct mddev *mddev, const char *buf, size_t len)
 		return err;
 
 	mddev_suspend(mddev);
-	mddev->suspend_lo = new;
+	WRITE_ONCE(mddev->suspend_lo, new);
 	mddev_resume(mddev);
 
 	mddev_unlock(mddev);
@@ -5208,7 +5209,8 @@ __ATTR(suspend_lo, S_IRUGO|S_IWUSR, suspend_lo_show, suspend_lo_store);
 static ssize_t
 suspend_hi_show(struct mddev *mddev, char *page)
 {
-	return sprintf(page, "%llu\n", (unsigned long long)mddev->suspend_hi);
+	return sprintf(page, "%llu\n",
+		       (unsigned long long)READ_ONCE(mddev->suspend_hi));
 }
 
 static ssize_t
@@ -5228,7 +5230,7 @@ suspend_hi_store(struct mddev *mddev, const char *buf, size_t len)
 		return err;
 
 	mddev_suspend(mddev);
-	mddev->suspend_hi = new;
+	WRITE_ONCE(mddev->suspend_hi, new);
 	mddev_resume(mddev);
 
 	mddev_unlock(mddev);
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [dm-devel] [PATCH -next v3 01/25] md: use READ_ONCE/WRITE_ONCE for 'suspend_lo' and 'suspend_hi'
@ 2023-09-28  6:15   ` Yu Kuai
  0 siblings, 0 replies; 70+ messages in thread
From: Yu Kuai @ 2023-09-28  6:15 UTC (permalink / raw)
  To: xni, agk, snitzer, dm-devel, song
  Cc: yi.zhang, yangerkun, linux-kernel, linux-raid, yukuai1, yukuai3

From: Yu Kuai <yukuai3@huawei.com>

Because reading 'suspend_lo' and 'suspend_hi' from md_handle_request()
is not protected, use READ_ONCE/WRITE_ONCE to prevent reading abnormal
value.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 drivers/md/md.c | 16 +++++++++-------
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 0b83a406e797..18a0f12dbf5f 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -359,11 +359,11 @@ static bool is_suspended(struct mddev *mddev, struct bio *bio)
 		return true;
 	if (bio_data_dir(bio) != WRITE)
 		return false;
-	if (mddev->suspend_lo >= mddev->suspend_hi)
+	if (READ_ONCE(mddev->suspend_lo) >= READ_ONCE(mddev->suspend_hi))
 		return false;
-	if (bio->bi_iter.bi_sector >= mddev->suspend_hi)
+	if (bio->bi_iter.bi_sector >= READ_ONCE(mddev->suspend_hi))
 		return false;
-	if (bio_end_sector(bio) < mddev->suspend_lo)
+	if (bio_end_sector(bio) < READ_ONCE(mddev->suspend_lo))
 		return false;
 	return true;
 }
@@ -5176,7 +5176,8 @@ __ATTR(sync_max, S_IRUGO|S_IWUSR, max_sync_show, max_sync_store);
 static ssize_t
 suspend_lo_show(struct mddev *mddev, char *page)
 {
-	return sprintf(page, "%llu\n", (unsigned long long)mddev->suspend_lo);
+	return sprintf(page, "%llu\n",
+		       (unsigned long long)READ_ONCE(mddev->suspend_lo));
 }
 
 static ssize_t
@@ -5196,7 +5197,7 @@ suspend_lo_store(struct mddev *mddev, const char *buf, size_t len)
 		return err;
 
 	mddev_suspend(mddev);
-	mddev->suspend_lo = new;
+	WRITE_ONCE(mddev->suspend_lo, new);
 	mddev_resume(mddev);
 
 	mddev_unlock(mddev);
@@ -5208,7 +5209,8 @@ __ATTR(suspend_lo, S_IRUGO|S_IWUSR, suspend_lo_show, suspend_lo_store);
 static ssize_t
 suspend_hi_show(struct mddev *mddev, char *page)
 {
-	return sprintf(page, "%llu\n", (unsigned long long)mddev->suspend_hi);
+	return sprintf(page, "%llu\n",
+		       (unsigned long long)READ_ONCE(mddev->suspend_hi));
 }
 
 static ssize_t
@@ -5228,7 +5230,7 @@ suspend_hi_store(struct mddev *mddev, const char *buf, size_t len)
 		return err;
 
 	mddev_suspend(mddev);
-	mddev->suspend_hi = new;
+	WRITE_ONCE(mddev->suspend_hi, new);
 	mddev_resume(mddev);
 
 	mddev_unlock(mddev);
-- 
2.39.2

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [PATCH -next v3 02/25] md: replace is_md_suspended() with 'mddev->suspended' in md_check_recovery()
  2023-09-28  6:15 ` [dm-devel] " Yu Kuai
@ 2023-09-28  6:15   ` Yu Kuai
  -1 siblings, 0 replies; 70+ messages in thread
From: Yu Kuai @ 2023-09-28  6:15 UTC (permalink / raw)
  To: xni, agk, snitzer, dm-devel, song
  Cc: linux-kernel, linux-raid, yukuai3, yukuai1, yi.zhang, yangerkun

From: Yu Kuai <yukuai3@huawei.com>

Prepare to cleanup pers->prepare_suspend(), which is used to fix a
deadlock in raid456 by returning error for io that is waiting for
reshape to make progress in mddev_suspend().

This change will allow reshape to make progress while waiting for io to
be done in mddev_suspend() in following patches.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 drivers/md/md.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 18a0f12dbf5f..e460b380143d 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -9415,7 +9415,7 @@ void md_check_recovery(struct mddev *mddev)
 		wake_up(&mddev->sb_wait);
 	}
 
-	if (is_md_suspended(mddev))
+	if (READ_ONCE(mddev->suspended))
 		return;
 
 	if (mddev->bitmap)
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [dm-devel] [PATCH -next v3 02/25] md: replace is_md_suspended() with 'mddev->suspended' in md_check_recovery()
@ 2023-09-28  6:15   ` Yu Kuai
  0 siblings, 0 replies; 70+ messages in thread
From: Yu Kuai @ 2023-09-28  6:15 UTC (permalink / raw)
  To: xni, agk, snitzer, dm-devel, song
  Cc: yi.zhang, yangerkun, linux-kernel, linux-raid, yukuai1, yukuai3

From: Yu Kuai <yukuai3@huawei.com>

Prepare to cleanup pers->prepare_suspend(), which is used to fix a
deadlock in raid456 by returning error for io that is waiting for
reshape to make progress in mddev_suspend().

This change will allow reshape to make progress while waiting for io to
be done in mddev_suspend() in following patches.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 drivers/md/md.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 18a0f12dbf5f..e460b380143d 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -9415,7 +9415,7 @@ void md_check_recovery(struct mddev *mddev)
 		wake_up(&mddev->sb_wait);
 	}
 
-	if (is_md_suspended(mddev))
+	if (READ_ONCE(mddev->suspended))
 		return;
 
 	if (mddev->bitmap)
-- 
2.39.2

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [PATCH -next v3 03/25] md: add new helpers to suspend/resume array
  2023-09-28  6:15 ` [dm-devel] " Yu Kuai
@ 2023-09-28  6:15   ` Yu Kuai
  -1 siblings, 0 replies; 70+ messages in thread
From: Yu Kuai @ 2023-09-28  6:15 UTC (permalink / raw)
  To: xni, agk, snitzer, dm-devel, song
  Cc: linux-kernel, linux-raid, yukuai3, yukuai1, yi.zhang, yangerkun

From: Yu Kuai <yukuai3@huawei.com>

Advantages for new apis:
 - reconfig_mutex is not required;
 - the weird logical that suspend array hold 'reconfig_mutex' for
   mddev_check_recovery() to update superblock is not needed;
 - the specail handling, 'pers->prepare_suspend', for raid456 is not
   needed;
 - It's safe to be called at any time once mddev is allocated, and it's
   designed to be used from slow path where array configuration is changed;
 - the new helpers is designed to be called before mddev_lock(), hence
   it support to be interrupted by user as well.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 drivers/md/md.c | 102 +++++++++++++++++++++++++++++++++++++++++++++++-
 drivers/md/md.h |   3 ++
 2 files changed, 103 insertions(+), 2 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index e460b380143d..a075d03d03d3 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -443,12 +443,22 @@ void mddev_suspend(struct mddev *mddev)
 			lockdep_is_held(&mddev->reconfig_mutex));
 
 	WARN_ON_ONCE(thread && current == thread->tsk);
-	if (mddev->suspended++)
+
+	/* can't concurrent with __mddev_suspend() and __mddev_resume() */
+	mutex_lock(&mddev->suspend_mutex);
+	if (mddev->suspended++) {
+		mutex_unlock(&mddev->suspend_mutex);
 		return;
+	}
+
 	wake_up(&mddev->sb_wait);
 	set_bit(MD_ALLOW_SB_UPDATE, &mddev->flags);
 	percpu_ref_kill(&mddev->active_io);
 
+	/*
+	 * TODO: cleanup 'pers->prepare_suspend after all callers are replaced
+	 * by __mddev_suspend().
+	 */
 	if (mddev->pers && mddev->pers->prepare_suspend)
 		mddev->pers->prepare_suspend(mddev);
 
@@ -459,14 +469,21 @@ void mddev_suspend(struct mddev *mddev)
 	del_timer_sync(&mddev->safemode_timer);
 	/* restrict memory reclaim I/O during raid array is suspend */
 	mddev->noio_flag = memalloc_noio_save();
+
+	mutex_unlock(&mddev->suspend_mutex);
 }
 EXPORT_SYMBOL_GPL(mddev_suspend);
 
 void mddev_resume(struct mddev *mddev)
 {
 	lockdep_assert_held(&mddev->reconfig_mutex);
-	if (--mddev->suspended)
+
+	/* can't concurrent with __mddev_suspend() and __mddev_resume() */
+	mutex_lock(&mddev->suspend_mutex);
+	if (--mddev->suspended) {
+		mutex_unlock(&mddev->suspend_mutex);
 		return;
+	}
 
 	/* entred the memalloc scope from mddev_suspend() */
 	memalloc_noio_restore(mddev->noio_flag);
@@ -477,9 +494,89 @@ void mddev_resume(struct mddev *mddev)
 	set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
 	md_wakeup_thread(mddev->thread);
 	md_wakeup_thread(mddev->sync_thread); /* possibly kick off a reshape */
+
+	mutex_unlock(&mddev->suspend_mutex);
 }
 EXPORT_SYMBOL_GPL(mddev_resume);
 
+int __mddev_suspend(struct mddev *mddev, bool interruptible)
+{
+	int err = 0;
+
+	/*
+	 * hold reconfig_mutex to wait for normal io will deadlock, because
+	 * other context can't update super_block, and normal io can rely on
+	 * updating super_block.
+	 */
+	lockdep_assert_not_held(&mddev->reconfig_mutex);
+
+	if (interruptible)
+		err = mutex_lock_interruptible(&mddev->suspend_mutex);
+	else
+		mutex_lock(&mddev->suspend_mutex);
+	if (err)
+		return err;
+
+	if (mddev->suspended) {
+		WRITE_ONCE(mddev->suspended, mddev->suspended + 1);
+		mutex_unlock(&mddev->suspend_mutex);
+		return 0;
+	}
+
+	percpu_ref_kill(&mddev->active_io);
+	if (interruptible)
+		err = wait_event_interruptible(mddev->sb_wait,
+				percpu_ref_is_zero(&mddev->active_io));
+	else
+		wait_event(mddev->sb_wait,
+				percpu_ref_is_zero(&mddev->active_io));
+	if (err) {
+		percpu_ref_resurrect(&mddev->active_io);
+		mutex_unlock(&mddev->suspend_mutex);
+		return err;
+	}
+
+	/*
+	 * For raid456, io might be waiting for reshape to make progress,
+	 * allow new reshape to start while waiting for io to be done to
+	 * prevent deadlock.
+	 */
+	WRITE_ONCE(mddev->suspended, mddev->suspended + 1);
+
+	del_timer_sync(&mddev->safemode_timer);
+	/* restrict memory reclaim I/O during raid array is suspend */
+	mddev->noio_flag = memalloc_noio_save();
+
+	mutex_unlock(&mddev->suspend_mutex);
+	return 0;
+}
+EXPORT_SYMBOL_GPL(__mddev_suspend);
+
+void __mddev_resume(struct mddev *mddev)
+{
+	lockdep_assert_not_held(&mddev->reconfig_mutex);
+
+	mutex_lock(&mddev->suspend_mutex);
+	WRITE_ONCE(mddev->suspended, mddev->suspended - 1);
+	if (mddev->suspended) {
+		mutex_unlock(&mddev->suspend_mutex);
+		return;
+	}
+
+	/* entred the memalloc scope from __mddev_suspend() */
+	memalloc_noio_restore(mddev->noio_flag);
+
+	percpu_ref_resurrect(&mddev->active_io);
+	wake_up(&mddev->sb_wait);
+
+	set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
+	md_wakeup_thread(mddev->thread);
+	md_wakeup_thread(mddev->sync_thread); /* possibly kick off a reshape */
+
+	mutex_unlock(&mddev->suspend_mutex);
+}
+EXPORT_SYMBOL_GPL(__mddev_resume);
+
 /*
  * Generic flush handling for md
  */
@@ -672,6 +769,7 @@ int mddev_init(struct mddev *mddev)
 	mutex_init(&mddev->open_mutex);
 	mutex_init(&mddev->reconfig_mutex);
 	mutex_init(&mddev->sync_mutex);
+	mutex_init(&mddev->suspend_mutex);
 	mutex_init(&mddev->bitmap_info.mutex);
 	INIT_LIST_HEAD(&mddev->disks);
 	INIT_LIST_HEAD(&mddev->all_mddevs);
diff --git a/drivers/md/md.h b/drivers/md/md.h
index b628c292506e..b5894dc64615 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -316,6 +316,7 @@ struct mddev {
 	unsigned long			sb_flags;
 
 	int				suspended;
+	struct mutex			suspend_mutex;
 	struct percpu_ref		active_io;
 	int				ro;
 	int				sysfs_active; /* set when sysfs deletes
@@ -811,6 +812,8 @@ extern void md_rdev_clear(struct md_rdev *rdev);
 extern void md_handle_request(struct mddev *mddev, struct bio *bio);
 extern void mddev_suspend(struct mddev *mddev);
 extern void mddev_resume(struct mddev *mddev);
+extern int __mddev_suspend(struct mddev *mddev, bool interruptible);
+extern void __mddev_resume(struct mddev *mddev);
 
 extern void md_reload_sb(struct mddev *mddev, int raid_disk);
 extern void md_update_sb(struct mddev *mddev, int force);
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [dm-devel] [PATCH -next v3 03/25] md: add new helpers to suspend/resume array
@ 2023-09-28  6:15   ` Yu Kuai
  0 siblings, 0 replies; 70+ messages in thread
From: Yu Kuai @ 2023-09-28  6:15 UTC (permalink / raw)
  To: xni, agk, snitzer, dm-devel, song
  Cc: yi.zhang, yangerkun, linux-kernel, linux-raid, yukuai1, yukuai3

From: Yu Kuai <yukuai3@huawei.com>

Advantages for new apis:
 - reconfig_mutex is not required;
 - the weird logical that suspend array hold 'reconfig_mutex' for
   mddev_check_recovery() to update superblock is not needed;
 - the specail handling, 'pers->prepare_suspend', for raid456 is not
   needed;
 - It's safe to be called at any time once mddev is allocated, and it's
   designed to be used from slow path where array configuration is changed;
 - the new helpers is designed to be called before mddev_lock(), hence
   it support to be interrupted by user as well.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 drivers/md/md.c | 102 +++++++++++++++++++++++++++++++++++++++++++++++-
 drivers/md/md.h |   3 ++
 2 files changed, 103 insertions(+), 2 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index e460b380143d..a075d03d03d3 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -443,12 +443,22 @@ void mddev_suspend(struct mddev *mddev)
 			lockdep_is_held(&mddev->reconfig_mutex));
 
 	WARN_ON_ONCE(thread && current == thread->tsk);
-	if (mddev->suspended++)
+
+	/* can't concurrent with __mddev_suspend() and __mddev_resume() */
+	mutex_lock(&mddev->suspend_mutex);
+	if (mddev->suspended++) {
+		mutex_unlock(&mddev->suspend_mutex);
 		return;
+	}
+
 	wake_up(&mddev->sb_wait);
 	set_bit(MD_ALLOW_SB_UPDATE, &mddev->flags);
 	percpu_ref_kill(&mddev->active_io);
 
+	/*
+	 * TODO: cleanup 'pers->prepare_suspend after all callers are replaced
+	 * by __mddev_suspend().
+	 */
 	if (mddev->pers && mddev->pers->prepare_suspend)
 		mddev->pers->prepare_suspend(mddev);
 
@@ -459,14 +469,21 @@ void mddev_suspend(struct mddev *mddev)
 	del_timer_sync(&mddev->safemode_timer);
 	/* restrict memory reclaim I/O during raid array is suspend */
 	mddev->noio_flag = memalloc_noio_save();
+
+	mutex_unlock(&mddev->suspend_mutex);
 }
 EXPORT_SYMBOL_GPL(mddev_suspend);
 
 void mddev_resume(struct mddev *mddev)
 {
 	lockdep_assert_held(&mddev->reconfig_mutex);
-	if (--mddev->suspended)
+
+	/* can't concurrent with __mddev_suspend() and __mddev_resume() */
+	mutex_lock(&mddev->suspend_mutex);
+	if (--mddev->suspended) {
+		mutex_unlock(&mddev->suspend_mutex);
 		return;
+	}
 
 	/* entred the memalloc scope from mddev_suspend() */
 	memalloc_noio_restore(mddev->noio_flag);
@@ -477,9 +494,89 @@ void mddev_resume(struct mddev *mddev)
 	set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
 	md_wakeup_thread(mddev->thread);
 	md_wakeup_thread(mddev->sync_thread); /* possibly kick off a reshape */
+
+	mutex_unlock(&mddev->suspend_mutex);
 }
 EXPORT_SYMBOL_GPL(mddev_resume);
 
+int __mddev_suspend(struct mddev *mddev, bool interruptible)
+{
+	int err = 0;
+
+	/*
+	 * hold reconfig_mutex to wait for normal io will deadlock, because
+	 * other context can't update super_block, and normal io can rely on
+	 * updating super_block.
+	 */
+	lockdep_assert_not_held(&mddev->reconfig_mutex);
+
+	if (interruptible)
+		err = mutex_lock_interruptible(&mddev->suspend_mutex);
+	else
+		mutex_lock(&mddev->suspend_mutex);
+	if (err)
+		return err;
+
+	if (mddev->suspended) {
+		WRITE_ONCE(mddev->suspended, mddev->suspended + 1);
+		mutex_unlock(&mddev->suspend_mutex);
+		return 0;
+	}
+
+	percpu_ref_kill(&mddev->active_io);
+	if (interruptible)
+		err = wait_event_interruptible(mddev->sb_wait,
+				percpu_ref_is_zero(&mddev->active_io));
+	else
+		wait_event(mddev->sb_wait,
+				percpu_ref_is_zero(&mddev->active_io));
+	if (err) {
+		percpu_ref_resurrect(&mddev->active_io);
+		mutex_unlock(&mddev->suspend_mutex);
+		return err;
+	}
+
+	/*
+	 * For raid456, io might be waiting for reshape to make progress,
+	 * allow new reshape to start while waiting for io to be done to
+	 * prevent deadlock.
+	 */
+	WRITE_ONCE(mddev->suspended, mddev->suspended + 1);
+
+	del_timer_sync(&mddev->safemode_timer);
+	/* restrict memory reclaim I/O during raid array is suspend */
+	mddev->noio_flag = memalloc_noio_save();
+
+	mutex_unlock(&mddev->suspend_mutex);
+	return 0;
+}
+EXPORT_SYMBOL_GPL(__mddev_suspend);
+
+void __mddev_resume(struct mddev *mddev)
+{
+	lockdep_assert_not_held(&mddev->reconfig_mutex);
+
+	mutex_lock(&mddev->suspend_mutex);
+	WRITE_ONCE(mddev->suspended, mddev->suspended - 1);
+	if (mddev->suspended) {
+		mutex_unlock(&mddev->suspend_mutex);
+		return;
+	}
+
+	/* entred the memalloc scope from __mddev_suspend() */
+	memalloc_noio_restore(mddev->noio_flag);
+
+	percpu_ref_resurrect(&mddev->active_io);
+	wake_up(&mddev->sb_wait);
+
+	set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
+	md_wakeup_thread(mddev->thread);
+	md_wakeup_thread(mddev->sync_thread); /* possibly kick off a reshape */
+
+	mutex_unlock(&mddev->suspend_mutex);
+}
+EXPORT_SYMBOL_GPL(__mddev_resume);
+
 /*
  * Generic flush handling for md
  */
@@ -672,6 +769,7 @@ int mddev_init(struct mddev *mddev)
 	mutex_init(&mddev->open_mutex);
 	mutex_init(&mddev->reconfig_mutex);
 	mutex_init(&mddev->sync_mutex);
+	mutex_init(&mddev->suspend_mutex);
 	mutex_init(&mddev->bitmap_info.mutex);
 	INIT_LIST_HEAD(&mddev->disks);
 	INIT_LIST_HEAD(&mddev->all_mddevs);
diff --git a/drivers/md/md.h b/drivers/md/md.h
index b628c292506e..b5894dc64615 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -316,6 +316,7 @@ struct mddev {
 	unsigned long			sb_flags;
 
 	int				suspended;
+	struct mutex			suspend_mutex;
 	struct percpu_ref		active_io;
 	int				ro;
 	int				sysfs_active; /* set when sysfs deletes
@@ -811,6 +812,8 @@ extern void md_rdev_clear(struct md_rdev *rdev);
 extern void md_handle_request(struct mddev *mddev, struct bio *bio);
 extern void mddev_suspend(struct mddev *mddev);
 extern void mddev_resume(struct mddev *mddev);
+extern int __mddev_suspend(struct mddev *mddev, bool interruptible);
+extern void __mddev_resume(struct mddev *mddev);
 
 extern void md_reload_sb(struct mddev *mddev, int raid_disk);
 extern void md_update_sb(struct mddev *mddev, int force);
-- 
2.39.2

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [PATCH -next v3 04/25] md: add new helpers to suspend/resume and lock/unlock array
  2023-09-28  6:15 ` [dm-devel] " Yu Kuai
@ 2023-09-28  6:15   ` Yu Kuai
  -1 siblings, 0 replies; 70+ messages in thread
From: Yu Kuai @ 2023-09-28  6:15 UTC (permalink / raw)
  To: xni, agk, snitzer, dm-devel, song
  Cc: linux-kernel, linux-raid, yukuai3, yukuai1, yi.zhang, yangerkun

From: Yu Kuai <yukuai3@huawei.com>

The new helpers suspend the array first and then lock the array,

Prepare to refactor from:

mddev_lock/lock_nointr
mddev_suspend
...
mddev_resuem
mddev_lock

With:

mddev_suspend_and_lock/lock_nointr
...
mddev_unlock_and_resume

After all the use cases is refactored, mddev_suspend/resume() will be
removed.

And mddev_suspend_and_lock() will also replace mddev_lock() for the case
that the array will be reconfigured, in order to synchronize with io to
prevent problems in many corner cases.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 drivers/md/md.h | 27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

diff --git a/drivers/md/md.h b/drivers/md/md.h
index b5894dc64615..5c8f3f045e78 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -858,6 +858,33 @@ static inline void mddev_check_write_zeroes(struct mddev *mddev, struct bio *bio
 		mddev->queue->limits.max_write_zeroes_sectors = 0;
 }
 
+static inline int mddev_suspend_and_lock(struct mddev *mddev)
+{
+	int ret;
+
+	ret = __mddev_suspend(mddev, true);
+	if (ret)
+		return ret;
+
+	ret = mddev_lock(mddev);
+	if (ret)
+		__mddev_resume(mddev);
+
+	return ret;
+}
+
+static inline void mddev_suspend_and_lock_nointr(struct mddev *mddev)
+{
+	__mddev_suspend(mddev, false);
+	mutex_lock(&mddev->reconfig_mutex);
+}
+
+static inline void mddev_unlock_and_resume(struct mddev *mddev)
+{
+	mddev_unlock(mddev);
+	__mddev_resume(mddev);
+}
+
 struct mdu_array_info_s;
 struct mdu_disk_info_s;
 
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [dm-devel] [PATCH -next v3 04/25] md: add new helpers to suspend/resume and lock/unlock array
@ 2023-09-28  6:15   ` Yu Kuai
  0 siblings, 0 replies; 70+ messages in thread
From: Yu Kuai @ 2023-09-28  6:15 UTC (permalink / raw)
  To: xni, agk, snitzer, dm-devel, song
  Cc: yi.zhang, yangerkun, linux-kernel, linux-raid, yukuai1, yukuai3

From: Yu Kuai <yukuai3@huawei.com>

The new helpers suspend the array first and then lock the array,

Prepare to refactor from:

mddev_lock/lock_nointr
mddev_suspend
...
mddev_resuem
mddev_lock

With:

mddev_suspend_and_lock/lock_nointr
...
mddev_unlock_and_resume

After all the use cases is refactored, mddev_suspend/resume() will be
removed.

And mddev_suspend_and_lock() will also replace mddev_lock() for the case
that the array will be reconfigured, in order to synchronize with io to
prevent problems in many corner cases.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 drivers/md/md.h | 27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

diff --git a/drivers/md/md.h b/drivers/md/md.h
index b5894dc64615..5c8f3f045e78 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -858,6 +858,33 @@ static inline void mddev_check_write_zeroes(struct mddev *mddev, struct bio *bio
 		mddev->queue->limits.max_write_zeroes_sectors = 0;
 }
 
+static inline int mddev_suspend_and_lock(struct mddev *mddev)
+{
+	int ret;
+
+	ret = __mddev_suspend(mddev, true);
+	if (ret)
+		return ret;
+
+	ret = mddev_lock(mddev);
+	if (ret)
+		__mddev_resume(mddev);
+
+	return ret;
+}
+
+static inline void mddev_suspend_and_lock_nointr(struct mddev *mddev)
+{
+	__mddev_suspend(mddev, false);
+	mutex_lock(&mddev->reconfig_mutex);
+}
+
+static inline void mddev_unlock_and_resume(struct mddev *mddev)
+{
+	mddev_unlock(mddev);
+	__mddev_resume(mddev);
+}
+
 struct mdu_array_info_s;
 struct mdu_disk_info_s;
 
-- 
2.39.2

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [PATCH -next v3 05/25] md: use new apis to suspend array for suspend_lo/hi_store()
  2023-09-28  6:15 ` [dm-devel] " Yu Kuai
@ 2023-09-28  6:15   ` Yu Kuai
  -1 siblings, 0 replies; 70+ messages in thread
From: Yu Kuai @ 2023-09-28  6:15 UTC (permalink / raw)
  To: xni, agk, snitzer, dm-devel, song
  Cc: linux-kernel, linux-raid, yukuai3, yukuai1, yi.zhang, yangerkun

From: Yu Kuai <yukuai3@huawei.com>

Convert to use new apis, the old apis will be removed eventually.

These are not hot path, so performance is not concerned.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 drivers/md/md.c | 12 ++++--------
 1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index a075d03d03d3..d1ec4805aa4e 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -5290,15 +5290,13 @@ suspend_lo_store(struct mddev *mddev, const char *buf, size_t len)
 	if (new != (sector_t)new)
 		return -EINVAL;
 
-	err = mddev_lock(mddev);
+	err = __mddev_suspend(mddev, true);
 	if (err)
 		return err;
 
-	mddev_suspend(mddev);
 	WRITE_ONCE(mddev->suspend_lo, new);
-	mddev_resume(mddev);
+	__mddev_resume(mddev);
 
-	mddev_unlock(mddev);
 	return len;
 }
 static struct md_sysfs_entry md_suspend_lo =
@@ -5323,15 +5321,13 @@ suspend_hi_store(struct mddev *mddev, const char *buf, size_t len)
 	if (new != (sector_t)new)
 		return -EINVAL;
 
-	err = mddev_lock(mddev);
+	err = __mddev_suspend(mddev, true);
 	if (err)
 		return err;
 
-	mddev_suspend(mddev);
 	WRITE_ONCE(mddev->suspend_hi, new);
-	mddev_resume(mddev);
+	__mddev_resume(mddev);
 
-	mddev_unlock(mddev);
 	return len;
 }
 static struct md_sysfs_entry md_suspend_hi =
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [dm-devel] [PATCH -next v3 05/25] md: use new apis to suspend array for suspend_lo/hi_store()
@ 2023-09-28  6:15   ` Yu Kuai
  0 siblings, 0 replies; 70+ messages in thread
From: Yu Kuai @ 2023-09-28  6:15 UTC (permalink / raw)
  To: xni, agk, snitzer, dm-devel, song
  Cc: yi.zhang, yangerkun, linux-kernel, linux-raid, yukuai1, yukuai3

From: Yu Kuai <yukuai3@huawei.com>

Convert to use new apis, the old apis will be removed eventually.

These are not hot path, so performance is not concerned.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 drivers/md/md.c | 12 ++++--------
 1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index a075d03d03d3..d1ec4805aa4e 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -5290,15 +5290,13 @@ suspend_lo_store(struct mddev *mddev, const char *buf, size_t len)
 	if (new != (sector_t)new)
 		return -EINVAL;
 
-	err = mddev_lock(mddev);
+	err = __mddev_suspend(mddev, true);
 	if (err)
 		return err;
 
-	mddev_suspend(mddev);
 	WRITE_ONCE(mddev->suspend_lo, new);
-	mddev_resume(mddev);
+	__mddev_resume(mddev);
 
-	mddev_unlock(mddev);
 	return len;
 }
 static struct md_sysfs_entry md_suspend_lo =
@@ -5323,15 +5321,13 @@ suspend_hi_store(struct mddev *mddev, const char *buf, size_t len)
 	if (new != (sector_t)new)
 		return -EINVAL;
 
-	err = mddev_lock(mddev);
+	err = __mddev_suspend(mddev, true);
 	if (err)
 		return err;
 
-	mddev_suspend(mddev);
 	WRITE_ONCE(mddev->suspend_hi, new);
-	mddev_resume(mddev);
+	__mddev_resume(mddev);
 
-	mddev_unlock(mddev);
 	return len;
 }
 static struct md_sysfs_entry md_suspend_hi =
-- 
2.39.2

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [PATCH -next v3 06/25] md: use new apis to suspend array for level_store()
  2023-09-28  6:15 ` [dm-devel] " Yu Kuai
@ 2023-09-28  6:15   ` Yu Kuai
  -1 siblings, 0 replies; 70+ messages in thread
From: Yu Kuai @ 2023-09-28  6:15 UTC (permalink / raw)
  To: xni, agk, snitzer, dm-devel, song
  Cc: linux-kernel, linux-raid, yukuai3, yukuai1, yi.zhang, yangerkun

From: Yu Kuai <yukuai3@huawei.com>

Convert to use new apis, the old apis will be removed eventually.

This is not hot path, so performance is not concerned.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 drivers/md/md.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index d1ec4805aa4e..740c477a6149 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -4019,7 +4019,7 @@ level_store(struct mddev *mddev, const char *buf, size_t len)
 	if (slen == 0 || slen >= sizeof(clevel))
 		return -EINVAL;
 
-	rv = mddev_lock(mddev);
+	rv = mddev_suspend_and_lock(mddev);
 	if (rv)
 		return rv;
 
@@ -4112,7 +4112,6 @@ level_store(struct mddev *mddev, const char *buf, size_t len)
 	}
 
 	/* Looks like we have a winner */
-	mddev_suspend(mddev);
 	mddev_detach(mddev);
 
 	spin_lock(&mddev->lock);
@@ -4198,14 +4197,13 @@ level_store(struct mddev *mddev, const char *buf, size_t len)
 	blk_set_stacking_limits(&mddev->queue->limits);
 	pers->run(mddev);
 	set_bit(MD_SB_CHANGE_DEVS, &mddev->sb_flags);
-	mddev_resume(mddev);
 	if (!mddev->thread)
 		md_update_sb(mddev, 1);
 	sysfs_notify_dirent_safe(mddev->sysfs_level);
 	md_new_event();
 	rv = len;
 out_unlock:
-	mddev_unlock(mddev);
+	mddev_unlock_and_resume(mddev);
 	return rv;
 }
 
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [dm-devel] [PATCH -next v3 06/25] md: use new apis to suspend array for level_store()
@ 2023-09-28  6:15   ` Yu Kuai
  0 siblings, 0 replies; 70+ messages in thread
From: Yu Kuai @ 2023-09-28  6:15 UTC (permalink / raw)
  To: xni, agk, snitzer, dm-devel, song
  Cc: yi.zhang, yangerkun, linux-kernel, linux-raid, yukuai1, yukuai3

From: Yu Kuai <yukuai3@huawei.com>

Convert to use new apis, the old apis will be removed eventually.

This is not hot path, so performance is not concerned.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 drivers/md/md.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index d1ec4805aa4e..740c477a6149 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -4019,7 +4019,7 @@ level_store(struct mddev *mddev, const char *buf, size_t len)
 	if (slen == 0 || slen >= sizeof(clevel))
 		return -EINVAL;
 
-	rv = mddev_lock(mddev);
+	rv = mddev_suspend_and_lock(mddev);
 	if (rv)
 		return rv;
 
@@ -4112,7 +4112,6 @@ level_store(struct mddev *mddev, const char *buf, size_t len)
 	}
 
 	/* Looks like we have a winner */
-	mddev_suspend(mddev);
 	mddev_detach(mddev);
 
 	spin_lock(&mddev->lock);
@@ -4198,14 +4197,13 @@ level_store(struct mddev *mddev, const char *buf, size_t len)
 	blk_set_stacking_limits(&mddev->queue->limits);
 	pers->run(mddev);
 	set_bit(MD_SB_CHANGE_DEVS, &mddev->sb_flags);
-	mddev_resume(mddev);
 	if (!mddev->thread)
 		md_update_sb(mddev, 1);
 	sysfs_notify_dirent_safe(mddev->sysfs_level);
 	md_new_event();
 	rv = len;
 out_unlock:
-	mddev_unlock(mddev);
+	mddev_unlock_and_resume(mddev);
 	return rv;
 }
 
-- 
2.39.2

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [PATCH -next v3 07/25] md: use new apis to suspend array for serialize_policy_store()
  2023-09-28  6:15 ` [dm-devel] " Yu Kuai
@ 2023-09-28  6:15   ` Yu Kuai
  -1 siblings, 0 replies; 70+ messages in thread
From: Yu Kuai @ 2023-09-28  6:15 UTC (permalink / raw)
  To: xni, agk, snitzer, dm-devel, song
  Cc: linux-kernel, linux-raid, yukuai3, yukuai1, yi.zhang, yangerkun

From: Yu Kuai <yukuai3@huawei.com>

Convert to use new apis, the old apis will be removed eventually.

This is not hot path, so performance is not concerned.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 drivers/md/md.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 740c477a6149..0c5a6169453c 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -5573,7 +5573,7 @@ serialize_policy_store(struct mddev *mddev, const char *buf, size_t len)
 	if (value == mddev->serialize_policy)
 		return len;
 
-	err = mddev_lock(mddev);
+	err = mddev_suspend_and_lock(mddev);
 	if (err)
 		return err;
 	if (mddev->pers == NULL || (mddev->pers->level != 1)) {
@@ -5582,15 +5582,13 @@ serialize_policy_store(struct mddev *mddev, const char *buf, size_t len)
 		goto unlock;
 	}
 
-	mddev_suspend(mddev);
 	if (value)
 		mddev_create_serial_pool(mddev, NULL, true);
 	else
 		mddev_destroy_serial_pool(mddev, NULL, true);
 	mddev->serialize_policy = value;
-	mddev_resume(mddev);
 unlock:
-	mddev_unlock(mddev);
+	mddev_unlock_and_resume(mddev);
 	return err ?: len;
 }
 
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [dm-devel] [PATCH -next v3 07/25] md: use new apis to suspend array for serialize_policy_store()
@ 2023-09-28  6:15   ` Yu Kuai
  0 siblings, 0 replies; 70+ messages in thread
From: Yu Kuai @ 2023-09-28  6:15 UTC (permalink / raw)
  To: xni, agk, snitzer, dm-devel, song
  Cc: yi.zhang, yangerkun, linux-kernel, linux-raid, yukuai1, yukuai3

From: Yu Kuai <yukuai3@huawei.com>

Convert to use new apis, the old apis will be removed eventually.

This is not hot path, so performance is not concerned.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 drivers/md/md.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 740c477a6149..0c5a6169453c 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -5573,7 +5573,7 @@ serialize_policy_store(struct mddev *mddev, const char *buf, size_t len)
 	if (value == mddev->serialize_policy)
 		return len;
 
-	err = mddev_lock(mddev);
+	err = mddev_suspend_and_lock(mddev);
 	if (err)
 		return err;
 	if (mddev->pers == NULL || (mddev->pers->level != 1)) {
@@ -5582,15 +5582,13 @@ serialize_policy_store(struct mddev *mddev, const char *buf, size_t len)
 		goto unlock;
 	}
 
-	mddev_suspend(mddev);
 	if (value)
 		mddev_create_serial_pool(mddev, NULL, true);
 	else
 		mddev_destroy_serial_pool(mddev, NULL, true);
 	mddev->serialize_policy = value;
-	mddev_resume(mddev);
 unlock:
-	mddev_unlock(mddev);
+	mddev_unlock_and_resume(mddev);
 	return err ?: len;
 }
 
-- 
2.39.2

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [PATCH -next v3 08/25] md/dm-raid: use new apis to suspend array
  2023-09-28  6:15 ` [dm-devel] " Yu Kuai
@ 2023-09-28  6:15   ` Yu Kuai
  -1 siblings, 0 replies; 70+ messages in thread
From: Yu Kuai @ 2023-09-28  6:15 UTC (permalink / raw)
  To: xni, agk, snitzer, dm-devel, song
  Cc: linux-kernel, linux-raid, yukuai3, yukuai1, yi.zhang, yangerkun

From: Yu Kuai <yukuai3@huawei.com>

Convert to use new apis, the old apis will be removed eventually.

These are not hot path, so performance is not concerned.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 drivers/md/dm-raid.c | 12 ++++--------
 1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/drivers/md/dm-raid.c b/drivers/md/dm-raid.c
index 69805d37e113..05dd6ccf6f48 100644
--- a/drivers/md/dm-raid.c
+++ b/drivers/md/dm-raid.c
@@ -3244,7 +3244,7 @@ static int raid_ctr(struct dm_target *ti, unsigned int argc, char **argv)
 	set_bit(MD_RECOVERY_FROZEN, &rs->md.recovery);
 
 	/* Has to be held on running the array */
-	mddev_lock_nointr(&rs->md);
+	mddev_suspend_and_lock_nointr(&rs->md);
 	r = md_run(&rs->md);
 	rs->md.in_sync = 0; /* Assume already marked dirty */
 	if (r) {
@@ -3268,7 +3268,6 @@ static int raid_ctr(struct dm_target *ti, unsigned int argc, char **argv)
 		}
 	}
 
-	mddev_suspend(&rs->md);
 	set_bit(RT_FLAG_RS_SUSPENDED, &rs->runtime_flags);
 
 	/* Try to adjust the raid4/5/6 stripe cache size to the stripe size */
@@ -3798,9 +3797,7 @@ static void raid_postsuspend(struct dm_target *ti)
 		if (!test_bit(MD_RECOVERY_FROZEN, &rs->md.recovery))
 			md_stop_writes(&rs->md);
 
-		mddev_lock_nointr(&rs->md);
-		mddev_suspend(&rs->md);
-		mddev_unlock(&rs->md);
+		__mddev_suspend(&rs->md, false);
 	}
 }
 
@@ -4012,7 +4009,7 @@ static int raid_preresume(struct dm_target *ti)
 	}
 
 	/* Check for any resize/reshape on @rs and adjust/initiate */
-	/* Be prepared for mddev_resume() in raid_resume() */
+	/* Be prepared for __mddev_resume() in raid_resume() */
 	set_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
 	if (mddev->recovery_cp && mddev->recovery_cp < MaxSector) {
 		set_bit(MD_RECOVERY_REQUESTED, &mddev->recovery);
@@ -4059,8 +4056,7 @@ static void raid_resume(struct dm_target *ti)
 		clear_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
 		mddev->ro = 0;
 		mddev->in_sync = 0;
-		mddev_resume(mddev);
-		mddev_unlock(mddev);
+		mddev_unlock_and_resume(mddev);
 	}
 }
 
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [dm-devel] [PATCH -next v3 08/25] md/dm-raid: use new apis to suspend array
@ 2023-09-28  6:15   ` Yu Kuai
  0 siblings, 0 replies; 70+ messages in thread
From: Yu Kuai @ 2023-09-28  6:15 UTC (permalink / raw)
  To: xni, agk, snitzer, dm-devel, song
  Cc: yi.zhang, yangerkun, linux-kernel, linux-raid, yukuai1, yukuai3

From: Yu Kuai <yukuai3@huawei.com>

Convert to use new apis, the old apis will be removed eventually.

These are not hot path, so performance is not concerned.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 drivers/md/dm-raid.c | 12 ++++--------
 1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/drivers/md/dm-raid.c b/drivers/md/dm-raid.c
index 69805d37e113..05dd6ccf6f48 100644
--- a/drivers/md/dm-raid.c
+++ b/drivers/md/dm-raid.c
@@ -3244,7 +3244,7 @@ static int raid_ctr(struct dm_target *ti, unsigned int argc, char **argv)
 	set_bit(MD_RECOVERY_FROZEN, &rs->md.recovery);
 
 	/* Has to be held on running the array */
-	mddev_lock_nointr(&rs->md);
+	mddev_suspend_and_lock_nointr(&rs->md);
 	r = md_run(&rs->md);
 	rs->md.in_sync = 0; /* Assume already marked dirty */
 	if (r) {
@@ -3268,7 +3268,6 @@ static int raid_ctr(struct dm_target *ti, unsigned int argc, char **argv)
 		}
 	}
 
-	mddev_suspend(&rs->md);
 	set_bit(RT_FLAG_RS_SUSPENDED, &rs->runtime_flags);
 
 	/* Try to adjust the raid4/5/6 stripe cache size to the stripe size */
@@ -3798,9 +3797,7 @@ static void raid_postsuspend(struct dm_target *ti)
 		if (!test_bit(MD_RECOVERY_FROZEN, &rs->md.recovery))
 			md_stop_writes(&rs->md);
 
-		mddev_lock_nointr(&rs->md);
-		mddev_suspend(&rs->md);
-		mddev_unlock(&rs->md);
+		__mddev_suspend(&rs->md, false);
 	}
 }
 
@@ -4012,7 +4009,7 @@ static int raid_preresume(struct dm_target *ti)
 	}
 
 	/* Check for any resize/reshape on @rs and adjust/initiate */
-	/* Be prepared for mddev_resume() in raid_resume() */
+	/* Be prepared for __mddev_resume() in raid_resume() */
 	set_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
 	if (mddev->recovery_cp && mddev->recovery_cp < MaxSector) {
 		set_bit(MD_RECOVERY_REQUESTED, &mddev->recovery);
@@ -4059,8 +4056,7 @@ static void raid_resume(struct dm_target *ti)
 		clear_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
 		mddev->ro = 0;
 		mddev->in_sync = 0;
-		mddev_resume(mddev);
-		mddev_unlock(mddev);
+		mddev_unlock_and_resume(mddev);
 	}
 }
 
-- 
2.39.2

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [PATCH -next v3 09/25] md/md-bitmap: use new apis to suspend array for location_store()
  2023-09-28  6:15 ` [dm-devel] " Yu Kuai
@ 2023-09-28  6:15   ` Yu Kuai
  -1 siblings, 0 replies; 70+ messages in thread
From: Yu Kuai @ 2023-09-28  6:15 UTC (permalink / raw)
  To: xni, agk, snitzer, dm-devel, song
  Cc: linux-kernel, linux-raid, yukuai3, yukuai1, yi.zhang, yangerkun

From: Yu Kuai <yukuai3@huawei.com>

Convert to use new apis, the old apis will be removed eventually.

This is not hot path, so performance is not concerned.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 drivers/md/md-bitmap.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/md/md-bitmap.c b/drivers/md/md-bitmap.c
index 0c661e5036bb..7d21e2a5b06e 100644
--- a/drivers/md/md-bitmap.c
+++ b/drivers/md/md-bitmap.c
@@ -2348,11 +2348,10 @@ location_store(struct mddev *mddev, const char *buf, size_t len)
 {
 	int rv;
 
-	rv = mddev_lock(mddev);
+	rv = mddev_suspend_and_lock(mddev);
 	if (rv)
 		return rv;
 
-	mddev_suspend(mddev);
 	if (mddev->pers) {
 		if (mddev->recovery || mddev->sync_thread) {
 			rv = -EBUSY;
@@ -2429,8 +2428,7 @@ location_store(struct mddev *mddev, const char *buf, size_t len)
 	}
 	rv = 0;
 out:
-	mddev_resume(mddev);
-	mddev_unlock(mddev);
+	mddev_unlock_and_resume(mddev);
 	if (rv)
 		return rv;
 	return len;
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [dm-devel] [PATCH -next v3 09/25] md/md-bitmap: use new apis to suspend array for location_store()
@ 2023-09-28  6:15   ` Yu Kuai
  0 siblings, 0 replies; 70+ messages in thread
From: Yu Kuai @ 2023-09-28  6:15 UTC (permalink / raw)
  To: xni, agk, snitzer, dm-devel, song
  Cc: yi.zhang, yangerkun, linux-kernel, linux-raid, yukuai1, yukuai3

From: Yu Kuai <yukuai3@huawei.com>

Convert to use new apis, the old apis will be removed eventually.

This is not hot path, so performance is not concerned.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 drivers/md/md-bitmap.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/md/md-bitmap.c b/drivers/md/md-bitmap.c
index 0c661e5036bb..7d21e2a5b06e 100644
--- a/drivers/md/md-bitmap.c
+++ b/drivers/md/md-bitmap.c
@@ -2348,11 +2348,10 @@ location_store(struct mddev *mddev, const char *buf, size_t len)
 {
 	int rv;
 
-	rv = mddev_lock(mddev);
+	rv = mddev_suspend_and_lock(mddev);
 	if (rv)
 		return rv;
 
-	mddev_suspend(mddev);
 	if (mddev->pers) {
 		if (mddev->recovery || mddev->sync_thread) {
 			rv = -EBUSY;
@@ -2429,8 +2428,7 @@ location_store(struct mddev *mddev, const char *buf, size_t len)
 	}
 	rv = 0;
 out:
-	mddev_resume(mddev);
-	mddev_unlock(mddev);
+	mddev_unlock_and_resume(mddev);
 	if (rv)
 		return rv;
 	return len;
-- 
2.39.2

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [PATCH -next v3 10/25] md/raid5-cache: use READ_ONCE/WRITE_ONCE for 'conf->log'
  2023-09-28  6:15 ` [dm-devel] " Yu Kuai
@ 2023-09-28  6:15   ` Yu Kuai
  -1 siblings, 0 replies; 70+ messages in thread
From: Yu Kuai @ 2023-09-28  6:15 UTC (permalink / raw)
  To: xni, agk, snitzer, dm-devel, song
  Cc: linux-kernel, linux-raid, yukuai3, yukuai1, yi.zhang, yangerkun

From: Yu Kuai <yukuai3@huawei.com>

'conf->log' is set with 'reconfig_mutex' grabbed, however, readers are
not procted, hence use READ_ONCE/WRITE_ONCE to prevent reading abnormal
value.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 drivers/md/raid5-cache.c | 47 +++++++++++++++++++++-------------------
 1 file changed, 25 insertions(+), 22 deletions(-)

diff --git a/drivers/md/raid5-cache.c b/drivers/md/raid5-cache.c
index 518b7cfa78b9..889bba60d6ff 100644
--- a/drivers/md/raid5-cache.c
+++ b/drivers/md/raid5-cache.c
@@ -327,8 +327,9 @@ void r5l_wake_reclaim(struct r5l_log *log, sector_t space);
 void r5c_check_stripe_cache_usage(struct r5conf *conf)
 {
 	int total_cached;
+	struct r5l_log *log = READ_ONCE(conf->log);
 
-	if (!r5c_is_writeback(conf->log))
+	if (!r5c_is_writeback(log))
 		return;
 
 	total_cached = atomic_read(&conf->r5c_cached_partial_stripes) +
@@ -344,7 +345,7 @@ void r5c_check_stripe_cache_usage(struct r5conf *conf)
 	 */
 	if (total_cached > conf->min_nr_stripes * 1 / 2 ||
 	    atomic_read(&conf->empty_inactive_list_nr) > 0)
-		r5l_wake_reclaim(conf->log, 0);
+		r5l_wake_reclaim(log, 0);
 }
 
 /*
@@ -353,7 +354,9 @@ void r5c_check_stripe_cache_usage(struct r5conf *conf)
  */
 void r5c_check_cached_full_stripe(struct r5conf *conf)
 {
-	if (!r5c_is_writeback(conf->log))
+	struct r5l_log *log = READ_ONCE(conf->log);
+
+	if (!r5c_is_writeback(log))
 		return;
 
 	/*
@@ -363,7 +366,7 @@ void r5c_check_cached_full_stripe(struct r5conf *conf)
 	if (atomic_read(&conf->r5c_cached_full_stripes) >=
 	    min(R5C_FULL_STRIPE_FLUSH_BATCH(conf),
 		conf->chunk_sectors >> RAID5_STRIPE_SHIFT(conf)))
-		r5l_wake_reclaim(conf->log, 0);
+		r5l_wake_reclaim(log, 0);
 }
 
 /*
@@ -396,7 +399,7 @@ void r5c_check_cached_full_stripe(struct r5conf *conf)
  */
 static sector_t r5c_log_required_to_flush_cache(struct r5conf *conf)
 {
-	struct r5l_log *log = conf->log;
+	struct r5l_log *log = READ_ONCE(conf->log);
 
 	if (!r5c_is_writeback(log))
 		return 0;
@@ -449,7 +452,7 @@ static inline void r5c_update_log_state(struct r5l_log *log)
 void r5c_make_stripe_write_out(struct stripe_head *sh)
 {
 	struct r5conf *conf = sh->raid_conf;
-	struct r5l_log *log = conf->log;
+	struct r5l_log *log = READ_ONCE(conf->log);
 
 	BUG_ON(!r5c_is_writeback(log));
 
@@ -491,7 +494,7 @@ static void r5c_handle_parity_cached(struct stripe_head *sh)
  */
 static void r5c_finish_cache_stripe(struct stripe_head *sh)
 {
-	struct r5l_log *log = sh->raid_conf->log;
+	struct r5l_log *log = READ_ONCE(sh->raid_conf->log);
 
 	if (log->r5c_journal_mode == R5C_JOURNAL_MODE_WRITE_THROUGH) {
 		BUG_ON(test_bit(STRIPE_R5C_CACHING, &sh->state));
@@ -692,7 +695,7 @@ static void r5c_disable_writeback_async(struct work_struct *work)
 
 	/* wait superblock change before suspend */
 	wait_event(mddev->sb_wait,
-		   conf->log == NULL ||
+		   !READ_ONCE(conf->log) ||
 		   (!test_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags) &&
 		    (locked = mddev_trylock(mddev))));
 	if (locked) {
@@ -1151,7 +1154,7 @@ static void r5l_run_no_space_stripes(struct r5l_log *log)
 static sector_t r5c_calculate_new_cp(struct r5conf *conf)
 {
 	struct stripe_head *sh;
-	struct r5l_log *log = conf->log;
+	struct r5l_log *log = READ_ONCE(conf->log);
 	sector_t new_cp;
 	unsigned long flags;
 
@@ -1159,12 +1162,12 @@ static sector_t r5c_calculate_new_cp(struct r5conf *conf)
 		return log->next_checkpoint;
 
 	spin_lock_irqsave(&log->stripe_in_journal_lock, flags);
-	if (list_empty(&conf->log->stripe_in_journal_list)) {
+	if (list_empty(&log->stripe_in_journal_list)) {
 		/* all stripes flushed */
 		spin_unlock_irqrestore(&log->stripe_in_journal_lock, flags);
 		return log->next_checkpoint;
 	}
-	sh = list_first_entry(&conf->log->stripe_in_journal_list,
+	sh = list_first_entry(&log->stripe_in_journal_list,
 			      struct stripe_head, r5c);
 	new_cp = sh->log_start;
 	spin_unlock_irqrestore(&log->stripe_in_journal_lock, flags);
@@ -1399,7 +1402,7 @@ void r5c_flush_cache(struct r5conf *conf, int num)
 	struct stripe_head *sh, *next;
 
 	lockdep_assert_held(&conf->device_lock);
-	if (!conf->log)
+	if (!READ_ONCE(conf->log))
 		return;
 
 	count = 0;
@@ -1420,7 +1423,7 @@ void r5c_flush_cache(struct r5conf *conf, int num)
 
 static void r5c_do_reclaim(struct r5conf *conf)
 {
-	struct r5l_log *log = conf->log;
+	struct r5l_log *log = READ_ONCE(conf->log);
 	struct stripe_head *sh;
 	int count = 0;
 	unsigned long flags;
@@ -1549,7 +1552,7 @@ static void r5l_reclaim_thread(struct md_thread *thread)
 {
 	struct mddev *mddev = thread->mddev;
 	struct r5conf *conf = mddev->private;
-	struct r5l_log *log = conf->log;
+	struct r5l_log *log = READ_ONCE(conf->log);
 
 	if (!log)
 		return;
@@ -1591,7 +1594,7 @@ void r5l_quiesce(struct r5l_log *log, int quiesce)
 
 bool r5l_log_disk_error(struct r5conf *conf)
 {
-	struct r5l_log *log = conf->log;
+	struct r5l_log *log = READ_ONCE(conf->log);
 
 	/* don't allow write if journal disk is missing */
 	if (!log)
@@ -2635,7 +2638,7 @@ int r5c_try_caching_write(struct r5conf *conf,
 			  struct stripe_head_state *s,
 			  int disks)
 {
-	struct r5l_log *log = conf->log;
+	struct r5l_log *log = READ_ONCE(conf->log);
 	int i;
 	struct r5dev *dev;
 	int to_cache = 0;
@@ -2802,7 +2805,7 @@ void r5c_finish_stripe_write_out(struct r5conf *conf,
 				 struct stripe_head *sh,
 				 struct stripe_head_state *s)
 {
-	struct r5l_log *log = conf->log;
+	struct r5l_log *log = READ_ONCE(conf->log);
 	int i;
 	int do_wakeup = 0;
 	sector_t tree_index;
@@ -2941,7 +2944,7 @@ int r5c_cache_data(struct r5l_log *log, struct stripe_head *sh)
 /* check whether this big stripe is in write back cache. */
 bool r5c_big_stripe_cached(struct r5conf *conf, sector_t sect)
 {
-	struct r5l_log *log = conf->log;
+	struct r5l_log *log = READ_ONCE(conf->log);
 	sector_t tree_index;
 	void *slot;
 
@@ -3049,14 +3052,14 @@ int r5l_start(struct r5l_log *log)
 void r5c_update_on_rdev_error(struct mddev *mddev, struct md_rdev *rdev)
 {
 	struct r5conf *conf = mddev->private;
-	struct r5l_log *log = conf->log;
+	struct r5l_log *log = READ_ONCE(conf->log);
 
 	if (!log)
 		return;
 
 	if ((raid5_calc_degraded(conf) > 0 ||
 	     test_bit(Journal, &rdev->flags)) &&
-	    conf->log->r5c_journal_mode == R5C_JOURNAL_MODE_WRITE_BACK)
+	    log->r5c_journal_mode == R5C_JOURNAL_MODE_WRITE_BACK)
 		schedule_work(&log->disable_writeback_work);
 }
 
@@ -3145,7 +3148,7 @@ int r5l_init_log(struct r5conf *conf, struct md_rdev *rdev)
 	spin_lock_init(&log->stripe_in_journal_lock);
 	atomic_set(&log->stripe_in_journal_count, 0);
 
-	conf->log = log;
+	WRITE_ONCE(conf->log, log);
 
 	set_bit(MD_HAS_JOURNAL, &conf->mddev->flags);
 	return 0;
@@ -3173,7 +3176,7 @@ void r5l_exit_log(struct r5conf *conf)
 	 * 'reconfig_mutex' is held by caller, set 'confg->log' to NULL to
 	 * ensure disable_writeback_work wakes up and exits.
 	 */
-	conf->log = NULL;
+	WRITE_ONCE(conf->log, NULL);
 	wake_up(&conf->mddev->sb_wait);
 	flush_work(&log->disable_writeback_work);
 
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [dm-devel] [PATCH -next v3 10/25] md/raid5-cache: use READ_ONCE/WRITE_ONCE for 'conf->log'
@ 2023-09-28  6:15   ` Yu Kuai
  0 siblings, 0 replies; 70+ messages in thread
From: Yu Kuai @ 2023-09-28  6:15 UTC (permalink / raw)
  To: xni, agk, snitzer, dm-devel, song
  Cc: yi.zhang, yangerkun, linux-kernel, linux-raid, yukuai1, yukuai3

From: Yu Kuai <yukuai3@huawei.com>

'conf->log' is set with 'reconfig_mutex' grabbed, however, readers are
not procted, hence use READ_ONCE/WRITE_ONCE to prevent reading abnormal
value.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 drivers/md/raid5-cache.c | 47 +++++++++++++++++++++-------------------
 1 file changed, 25 insertions(+), 22 deletions(-)

diff --git a/drivers/md/raid5-cache.c b/drivers/md/raid5-cache.c
index 518b7cfa78b9..889bba60d6ff 100644
--- a/drivers/md/raid5-cache.c
+++ b/drivers/md/raid5-cache.c
@@ -327,8 +327,9 @@ void r5l_wake_reclaim(struct r5l_log *log, sector_t space);
 void r5c_check_stripe_cache_usage(struct r5conf *conf)
 {
 	int total_cached;
+	struct r5l_log *log = READ_ONCE(conf->log);
 
-	if (!r5c_is_writeback(conf->log))
+	if (!r5c_is_writeback(log))
 		return;
 
 	total_cached = atomic_read(&conf->r5c_cached_partial_stripes) +
@@ -344,7 +345,7 @@ void r5c_check_stripe_cache_usage(struct r5conf *conf)
 	 */
 	if (total_cached > conf->min_nr_stripes * 1 / 2 ||
 	    atomic_read(&conf->empty_inactive_list_nr) > 0)
-		r5l_wake_reclaim(conf->log, 0);
+		r5l_wake_reclaim(log, 0);
 }
 
 /*
@@ -353,7 +354,9 @@ void r5c_check_stripe_cache_usage(struct r5conf *conf)
  */
 void r5c_check_cached_full_stripe(struct r5conf *conf)
 {
-	if (!r5c_is_writeback(conf->log))
+	struct r5l_log *log = READ_ONCE(conf->log);
+
+	if (!r5c_is_writeback(log))
 		return;
 
 	/*
@@ -363,7 +366,7 @@ void r5c_check_cached_full_stripe(struct r5conf *conf)
 	if (atomic_read(&conf->r5c_cached_full_stripes) >=
 	    min(R5C_FULL_STRIPE_FLUSH_BATCH(conf),
 		conf->chunk_sectors >> RAID5_STRIPE_SHIFT(conf)))
-		r5l_wake_reclaim(conf->log, 0);
+		r5l_wake_reclaim(log, 0);
 }
 
 /*
@@ -396,7 +399,7 @@ void r5c_check_cached_full_stripe(struct r5conf *conf)
  */
 static sector_t r5c_log_required_to_flush_cache(struct r5conf *conf)
 {
-	struct r5l_log *log = conf->log;
+	struct r5l_log *log = READ_ONCE(conf->log);
 
 	if (!r5c_is_writeback(log))
 		return 0;
@@ -449,7 +452,7 @@ static inline void r5c_update_log_state(struct r5l_log *log)
 void r5c_make_stripe_write_out(struct stripe_head *sh)
 {
 	struct r5conf *conf = sh->raid_conf;
-	struct r5l_log *log = conf->log;
+	struct r5l_log *log = READ_ONCE(conf->log);
 
 	BUG_ON(!r5c_is_writeback(log));
 
@@ -491,7 +494,7 @@ static void r5c_handle_parity_cached(struct stripe_head *sh)
  */
 static void r5c_finish_cache_stripe(struct stripe_head *sh)
 {
-	struct r5l_log *log = sh->raid_conf->log;
+	struct r5l_log *log = READ_ONCE(sh->raid_conf->log);
 
 	if (log->r5c_journal_mode == R5C_JOURNAL_MODE_WRITE_THROUGH) {
 		BUG_ON(test_bit(STRIPE_R5C_CACHING, &sh->state));
@@ -692,7 +695,7 @@ static void r5c_disable_writeback_async(struct work_struct *work)
 
 	/* wait superblock change before suspend */
 	wait_event(mddev->sb_wait,
-		   conf->log == NULL ||
+		   !READ_ONCE(conf->log) ||
 		   (!test_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags) &&
 		    (locked = mddev_trylock(mddev))));
 	if (locked) {
@@ -1151,7 +1154,7 @@ static void r5l_run_no_space_stripes(struct r5l_log *log)
 static sector_t r5c_calculate_new_cp(struct r5conf *conf)
 {
 	struct stripe_head *sh;
-	struct r5l_log *log = conf->log;
+	struct r5l_log *log = READ_ONCE(conf->log);
 	sector_t new_cp;
 	unsigned long flags;
 
@@ -1159,12 +1162,12 @@ static sector_t r5c_calculate_new_cp(struct r5conf *conf)
 		return log->next_checkpoint;
 
 	spin_lock_irqsave(&log->stripe_in_journal_lock, flags);
-	if (list_empty(&conf->log->stripe_in_journal_list)) {
+	if (list_empty(&log->stripe_in_journal_list)) {
 		/* all stripes flushed */
 		spin_unlock_irqrestore(&log->stripe_in_journal_lock, flags);
 		return log->next_checkpoint;
 	}
-	sh = list_first_entry(&conf->log->stripe_in_journal_list,
+	sh = list_first_entry(&log->stripe_in_journal_list,
 			      struct stripe_head, r5c);
 	new_cp = sh->log_start;
 	spin_unlock_irqrestore(&log->stripe_in_journal_lock, flags);
@@ -1399,7 +1402,7 @@ void r5c_flush_cache(struct r5conf *conf, int num)
 	struct stripe_head *sh, *next;
 
 	lockdep_assert_held(&conf->device_lock);
-	if (!conf->log)
+	if (!READ_ONCE(conf->log))
 		return;
 
 	count = 0;
@@ -1420,7 +1423,7 @@ void r5c_flush_cache(struct r5conf *conf, int num)
 
 static void r5c_do_reclaim(struct r5conf *conf)
 {
-	struct r5l_log *log = conf->log;
+	struct r5l_log *log = READ_ONCE(conf->log);
 	struct stripe_head *sh;
 	int count = 0;
 	unsigned long flags;
@@ -1549,7 +1552,7 @@ static void r5l_reclaim_thread(struct md_thread *thread)
 {
 	struct mddev *mddev = thread->mddev;
 	struct r5conf *conf = mddev->private;
-	struct r5l_log *log = conf->log;
+	struct r5l_log *log = READ_ONCE(conf->log);
 
 	if (!log)
 		return;
@@ -1591,7 +1594,7 @@ void r5l_quiesce(struct r5l_log *log, int quiesce)
 
 bool r5l_log_disk_error(struct r5conf *conf)
 {
-	struct r5l_log *log = conf->log;
+	struct r5l_log *log = READ_ONCE(conf->log);
 
 	/* don't allow write if journal disk is missing */
 	if (!log)
@@ -2635,7 +2638,7 @@ int r5c_try_caching_write(struct r5conf *conf,
 			  struct stripe_head_state *s,
 			  int disks)
 {
-	struct r5l_log *log = conf->log;
+	struct r5l_log *log = READ_ONCE(conf->log);
 	int i;
 	struct r5dev *dev;
 	int to_cache = 0;
@@ -2802,7 +2805,7 @@ void r5c_finish_stripe_write_out(struct r5conf *conf,
 				 struct stripe_head *sh,
 				 struct stripe_head_state *s)
 {
-	struct r5l_log *log = conf->log;
+	struct r5l_log *log = READ_ONCE(conf->log);
 	int i;
 	int do_wakeup = 0;
 	sector_t tree_index;
@@ -2941,7 +2944,7 @@ int r5c_cache_data(struct r5l_log *log, struct stripe_head *sh)
 /* check whether this big stripe is in write back cache. */
 bool r5c_big_stripe_cached(struct r5conf *conf, sector_t sect)
 {
-	struct r5l_log *log = conf->log;
+	struct r5l_log *log = READ_ONCE(conf->log);
 	sector_t tree_index;
 	void *slot;
 
@@ -3049,14 +3052,14 @@ int r5l_start(struct r5l_log *log)
 void r5c_update_on_rdev_error(struct mddev *mddev, struct md_rdev *rdev)
 {
 	struct r5conf *conf = mddev->private;
-	struct r5l_log *log = conf->log;
+	struct r5l_log *log = READ_ONCE(conf->log);
 
 	if (!log)
 		return;
 
 	if ((raid5_calc_degraded(conf) > 0 ||
 	     test_bit(Journal, &rdev->flags)) &&
-	    conf->log->r5c_journal_mode == R5C_JOURNAL_MODE_WRITE_BACK)
+	    log->r5c_journal_mode == R5C_JOURNAL_MODE_WRITE_BACK)
 		schedule_work(&log->disable_writeback_work);
 }
 
@@ -3145,7 +3148,7 @@ int r5l_init_log(struct r5conf *conf, struct md_rdev *rdev)
 	spin_lock_init(&log->stripe_in_journal_lock);
 	atomic_set(&log->stripe_in_journal_count, 0);
 
-	conf->log = log;
+	WRITE_ONCE(conf->log, log);
 
 	set_bit(MD_HAS_JOURNAL, &conf->mddev->flags);
 	return 0;
@@ -3173,7 +3176,7 @@ void r5l_exit_log(struct r5conf *conf)
 	 * 'reconfig_mutex' is held by caller, set 'confg->log' to NULL to
 	 * ensure disable_writeback_work wakes up and exits.
 	 */
-	conf->log = NULL;
+	WRITE_ONCE(conf->log, NULL);
 	wake_up(&conf->mddev->sb_wait);
 	flush_work(&log->disable_writeback_work);
 
-- 
2.39.2

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [PATCH -next v3 11/25] md/raid5-cache: use new apis to suspend array for r5c_disable_writeback_async()
  2023-09-28  6:15 ` [dm-devel] " Yu Kuai
@ 2023-09-28  6:15   ` Yu Kuai
  -1 siblings, 0 replies; 70+ messages in thread
From: Yu Kuai @ 2023-09-28  6:15 UTC (permalink / raw)
  To: xni, agk, snitzer, dm-devel, song
  Cc: linux-kernel, linux-raid, yukuai3, yukuai1, yi.zhang, yangerkun

From: Yu Kuai <yukuai3@huawei.com>

Convert to use new apis, the old apis will be removed eventually.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 drivers/md/raid5-cache.c | 13 ++++++-------
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/drivers/md/raid5-cache.c b/drivers/md/raid5-cache.c
index 889bba60d6ff..01d33e5c19c1 100644
--- a/drivers/md/raid5-cache.c
+++ b/drivers/md/raid5-cache.c
@@ -686,7 +686,6 @@ static void r5c_disable_writeback_async(struct work_struct *work)
 					   disable_writeback_work);
 	struct mddev *mddev = log->rdev->mddev;
 	struct r5conf *conf = mddev->private;
-	int locked = 0;
 
 	if (log->r5c_journal_mode == R5C_JOURNAL_MODE_WRITE_THROUGH)
 		return;
@@ -696,13 +695,13 @@ static void r5c_disable_writeback_async(struct work_struct *work)
 	/* wait superblock change before suspend */
 	wait_event(mddev->sb_wait,
 		   !READ_ONCE(conf->log) ||
-		   (!test_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags) &&
-		    (locked = mddev_trylock(mddev))));
-	if (locked) {
-		mddev_suspend(mddev);
+		   !test_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags));
+
+	log = READ_ONCE(conf->log);
+	if (log) {
+		__mddev_suspend(mddev, false);
 		log->r5c_journal_mode = R5C_JOURNAL_MODE_WRITE_THROUGH;
-		mddev_resume(mddev);
-		mddev_unlock(mddev);
+		__mddev_resume(mddev);
 	}
 }
 
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [dm-devel] [PATCH -next v3 11/25] md/raid5-cache: use new apis to suspend array for r5c_disable_writeback_async()
@ 2023-09-28  6:15   ` Yu Kuai
  0 siblings, 0 replies; 70+ messages in thread
From: Yu Kuai @ 2023-09-28  6:15 UTC (permalink / raw)
  To: xni, agk, snitzer, dm-devel, song
  Cc: yi.zhang, yangerkun, linux-kernel, linux-raid, yukuai1, yukuai3

From: Yu Kuai <yukuai3@huawei.com>

Convert to use new apis, the old apis will be removed eventually.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 drivers/md/raid5-cache.c | 13 ++++++-------
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/drivers/md/raid5-cache.c b/drivers/md/raid5-cache.c
index 889bba60d6ff..01d33e5c19c1 100644
--- a/drivers/md/raid5-cache.c
+++ b/drivers/md/raid5-cache.c
@@ -686,7 +686,6 @@ static void r5c_disable_writeback_async(struct work_struct *work)
 					   disable_writeback_work);
 	struct mddev *mddev = log->rdev->mddev;
 	struct r5conf *conf = mddev->private;
-	int locked = 0;
 
 	if (log->r5c_journal_mode == R5C_JOURNAL_MODE_WRITE_THROUGH)
 		return;
@@ -696,13 +695,13 @@ static void r5c_disable_writeback_async(struct work_struct *work)
 	/* wait superblock change before suspend */
 	wait_event(mddev->sb_wait,
 		   !READ_ONCE(conf->log) ||
-		   (!test_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags) &&
-		    (locked = mddev_trylock(mddev))));
-	if (locked) {
-		mddev_suspend(mddev);
+		   !test_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags));
+
+	log = READ_ONCE(conf->log);
+	if (log) {
+		__mddev_suspend(mddev, false);
 		log->r5c_journal_mode = R5C_JOURNAL_MODE_WRITE_THROUGH;
-		mddev_resume(mddev);
-		mddev_unlock(mddev);
+		__mddev_resume(mddev);
 	}
 }
 
-- 
2.39.2

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [PATCH -next v3 12/25] md/raid5-cache: use new apis to suspend array for r5c_journal_mode_store()
  2023-09-28  6:15 ` [dm-devel] " Yu Kuai
@ 2023-09-28  6:15   ` Yu Kuai
  -1 siblings, 0 replies; 70+ messages in thread
From: Yu Kuai @ 2023-09-28  6:15 UTC (permalink / raw)
  To: xni, agk, snitzer, dm-devel, song
  Cc: linux-kernel, linux-raid, yukuai3, yukuai1, yi.zhang, yangerkun

From: Yu Kuai <yukuai3@huawei.com>

r5c_journal_mode_set() will suspend array and it has only 2 caller, the
other caller raid_ctl() already suspend the array with new apis.

This is not hot path, so performance is not concerned.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 drivers/md/raid5-cache.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/md/raid5-cache.c b/drivers/md/raid5-cache.c
index 01d33e5c19c1..9909110262ee 100644
--- a/drivers/md/raid5-cache.c
+++ b/drivers/md/raid5-cache.c
@@ -2585,9 +2585,7 @@ int r5c_journal_mode_set(struct mddev *mddev, int mode)
 	    mode == R5C_JOURNAL_MODE_WRITE_BACK)
 		return -EINVAL;
 
-	mddev_suspend(mddev);
 	conf->log->r5c_journal_mode = mode;
-	mddev_resume(mddev);
 
 	pr_debug("md/raid:%s: setting r5c cache mode to %d: %s\n",
 		 mdname(mddev), mode, r5c_journal_mode_str[mode]);
@@ -2612,11 +2610,11 @@ static ssize_t r5c_journal_mode_store(struct mddev *mddev,
 		if (strlen(r5c_journal_mode_str[mode]) == len &&
 		    !strncmp(page, r5c_journal_mode_str[mode], len))
 			break;
-	ret = mddev_lock(mddev);
+	ret = mddev_suspend_and_lock(mddev);
 	if (ret)
 		return ret;
 	ret = r5c_journal_mode_set(mddev, mode);
-	mddev_unlock(mddev);
+	mddev_unlock_and_resume(mddev);
 	return ret ?: length;
 }
 
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [dm-devel] [PATCH -next v3 12/25] md/raid5-cache: use new apis to suspend array for r5c_journal_mode_store()
@ 2023-09-28  6:15   ` Yu Kuai
  0 siblings, 0 replies; 70+ messages in thread
From: Yu Kuai @ 2023-09-28  6:15 UTC (permalink / raw)
  To: xni, agk, snitzer, dm-devel, song
  Cc: yi.zhang, yangerkun, linux-kernel, linux-raid, yukuai1, yukuai3

From: Yu Kuai <yukuai3@huawei.com>

r5c_journal_mode_set() will suspend array and it has only 2 caller, the
other caller raid_ctl() already suspend the array with new apis.

This is not hot path, so performance is not concerned.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 drivers/md/raid5-cache.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/md/raid5-cache.c b/drivers/md/raid5-cache.c
index 01d33e5c19c1..9909110262ee 100644
--- a/drivers/md/raid5-cache.c
+++ b/drivers/md/raid5-cache.c
@@ -2585,9 +2585,7 @@ int r5c_journal_mode_set(struct mddev *mddev, int mode)
 	    mode == R5C_JOURNAL_MODE_WRITE_BACK)
 		return -EINVAL;
 
-	mddev_suspend(mddev);
 	conf->log->r5c_journal_mode = mode;
-	mddev_resume(mddev);
 
 	pr_debug("md/raid:%s: setting r5c cache mode to %d: %s\n",
 		 mdname(mddev), mode, r5c_journal_mode_str[mode]);
@@ -2612,11 +2610,11 @@ static ssize_t r5c_journal_mode_store(struct mddev *mddev,
 		if (strlen(r5c_journal_mode_str[mode]) == len &&
 		    !strncmp(page, r5c_journal_mode_str[mode], len))
 			break;
-	ret = mddev_lock(mddev);
+	ret = mddev_suspend_and_lock(mddev);
 	if (ret)
 		return ret;
 	ret = r5c_journal_mode_set(mddev, mode);
-	mddev_unlock(mddev);
+	mddev_unlock_and_resume(mddev);
 	return ret ?: length;
 }
 
-- 
2.39.2

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [PATCH -next v3 13/25] md/raid5: use new apis to suspend array for raid5_store_stripe_size()
  2023-09-28  6:15 ` [dm-devel] " Yu Kuai
@ 2023-09-28  6:15   ` Yu Kuai
  -1 siblings, 0 replies; 70+ messages in thread
From: Yu Kuai @ 2023-09-28  6:15 UTC (permalink / raw)
  To: xni, agk, snitzer, dm-devel, song
  Cc: linux-kernel, linux-raid, yukuai3, yukuai1, yi.zhang, yangerkun

From: Yu Kuai <yukuai3@huawei.com>

Convert to use new apis, the old apis will be removed eventually.

This is not hot path, so performance is not concerned.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 drivers/md/raid5.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 6383723468e5..f1c32b4d190f 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -7025,7 +7025,7 @@ raid5_store_stripe_size(struct mddev  *mddev, const char *page, size_t len)
 			new != roundup_pow_of_two(new))
 		return -EINVAL;
 
-	err = mddev_lock(mddev);
+	err = mddev_suspend_and_lock(mddev);
 	if (err)
 		return err;
 
@@ -7049,7 +7049,6 @@ raid5_store_stripe_size(struct mddev  *mddev, const char *page, size_t len)
 		goto out_unlock;
 	}
 
-	mddev_suspend(mddev);
 	mutex_lock(&conf->cache_size_mutex);
 	size = conf->max_nr_stripes;
 
@@ -7064,10 +7063,9 @@ raid5_store_stripe_size(struct mddev  *mddev, const char *page, size_t len)
 		err = -ENOMEM;
 	}
 	mutex_unlock(&conf->cache_size_mutex);
-	mddev_resume(mddev);
 
 out_unlock:
-	mddev_unlock(mddev);
+	mddev_unlock_and_resume(mddev);
 	return err ?: len;
 }
 
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [dm-devel] [PATCH -next v3 13/25] md/raid5: use new apis to suspend array for raid5_store_stripe_size()
@ 2023-09-28  6:15   ` Yu Kuai
  0 siblings, 0 replies; 70+ messages in thread
From: Yu Kuai @ 2023-09-28  6:15 UTC (permalink / raw)
  To: xni, agk, snitzer, dm-devel, song
  Cc: yi.zhang, yangerkun, linux-kernel, linux-raid, yukuai1, yukuai3

From: Yu Kuai <yukuai3@huawei.com>

Convert to use new apis, the old apis will be removed eventually.

This is not hot path, so performance is not concerned.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 drivers/md/raid5.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 6383723468e5..f1c32b4d190f 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -7025,7 +7025,7 @@ raid5_store_stripe_size(struct mddev  *mddev, const char *page, size_t len)
 			new != roundup_pow_of_two(new))
 		return -EINVAL;
 
-	err = mddev_lock(mddev);
+	err = mddev_suspend_and_lock(mddev);
 	if (err)
 		return err;
 
@@ -7049,7 +7049,6 @@ raid5_store_stripe_size(struct mddev  *mddev, const char *page, size_t len)
 		goto out_unlock;
 	}
 
-	mddev_suspend(mddev);
 	mutex_lock(&conf->cache_size_mutex);
 	size = conf->max_nr_stripes;
 
@@ -7064,10 +7063,9 @@ raid5_store_stripe_size(struct mddev  *mddev, const char *page, size_t len)
 		err = -ENOMEM;
 	}
 	mutex_unlock(&conf->cache_size_mutex);
-	mddev_resume(mddev);
 
 out_unlock:
-	mddev_unlock(mddev);
+	mddev_unlock_and_resume(mddev);
 	return err ?: len;
 }
 
-- 
2.39.2

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [PATCH -next v3 14/25] md/raid5: use new apis to suspend array for raid5_store_skip_copy()
  2023-09-28  6:15 ` [dm-devel] " Yu Kuai
@ 2023-09-28  6:15   ` Yu Kuai
  -1 siblings, 0 replies; 70+ messages in thread
From: Yu Kuai @ 2023-09-28  6:15 UTC (permalink / raw)
  To: xni, agk, snitzer, dm-devel, song
  Cc: linux-kernel, linux-raid, yukuai3, yukuai1, yi.zhang, yangerkun

From: Yu Kuai <yukuai3@huawei.com>

Convert to use new apis, the old apis will be removed eventually.

This is not hot path, so performance is not concerned.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 drivers/md/raid5.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index f1c32b4d190f..c937716fed01 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -7151,7 +7151,7 @@ raid5_store_skip_copy(struct mddev *mddev, const char *page, size_t len)
 		return -EINVAL;
 	new = !!new;
 
-	err = mddev_lock(mddev);
+	err = mddev_suspend_and_lock(mddev);
 	if (err)
 		return err;
 	conf = mddev->private;
@@ -7160,15 +7160,13 @@ raid5_store_skip_copy(struct mddev *mddev, const char *page, size_t len)
 	else if (new != conf->skip_copy) {
 		struct request_queue *q = mddev->queue;
 
-		mddev_suspend(mddev);
 		conf->skip_copy = new;
 		if (new)
 			blk_queue_flag_set(QUEUE_FLAG_STABLE_WRITES, q);
 		else
 			blk_queue_flag_clear(QUEUE_FLAG_STABLE_WRITES, q);
-		mddev_resume(mddev);
 	}
-	mddev_unlock(mddev);
+	mddev_unlock_and_resume(mddev);
 	return err ?: len;
 }
 
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [dm-devel] [PATCH -next v3 14/25] md/raid5: use new apis to suspend array for raid5_store_skip_copy()
@ 2023-09-28  6:15   ` Yu Kuai
  0 siblings, 0 replies; 70+ messages in thread
From: Yu Kuai @ 2023-09-28  6:15 UTC (permalink / raw)
  To: xni, agk, snitzer, dm-devel, song
  Cc: yi.zhang, yangerkun, linux-kernel, linux-raid, yukuai1, yukuai3

From: Yu Kuai <yukuai3@huawei.com>

Convert to use new apis, the old apis will be removed eventually.

This is not hot path, so performance is not concerned.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 drivers/md/raid5.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index f1c32b4d190f..c937716fed01 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -7151,7 +7151,7 @@ raid5_store_skip_copy(struct mddev *mddev, const char *page, size_t len)
 		return -EINVAL;
 	new = !!new;
 
-	err = mddev_lock(mddev);
+	err = mddev_suspend_and_lock(mddev);
 	if (err)
 		return err;
 	conf = mddev->private;
@@ -7160,15 +7160,13 @@ raid5_store_skip_copy(struct mddev *mddev, const char *page, size_t len)
 	else if (new != conf->skip_copy) {
 		struct request_queue *q = mddev->queue;
 
-		mddev_suspend(mddev);
 		conf->skip_copy = new;
 		if (new)
 			blk_queue_flag_set(QUEUE_FLAG_STABLE_WRITES, q);
 		else
 			blk_queue_flag_clear(QUEUE_FLAG_STABLE_WRITES, q);
-		mddev_resume(mddev);
 	}
-	mddev_unlock(mddev);
+	mddev_unlock_and_resume(mddev);
 	return err ?: len;
 }
 
-- 
2.39.2

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [PATCH -next v3 15/25] md/raid5: use new apis to suspend array for raid5_store_group_thread_cnt()
  2023-09-28  6:15 ` [dm-devel] " Yu Kuai
@ 2023-09-28  6:15   ` Yu Kuai
  -1 siblings, 0 replies; 70+ messages in thread
From: Yu Kuai @ 2023-09-28  6:15 UTC (permalink / raw)
  To: xni, agk, snitzer, dm-devel, song
  Cc: linux-kernel, linux-raid, yukuai3, yukuai1, yi.zhang, yangerkun

From: Yu Kuai <yukuai3@huawei.com>

Convert to use new apis, the old apis will be removed eventually.

This is not hot path, so performance is not concerned.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 drivers/md/raid5.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index c937716fed01..8060d29e99d2 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -7221,15 +7221,13 @@ raid5_store_group_thread_cnt(struct mddev *mddev, const char *page, size_t len)
 	if (new > 8192)
 		return -EINVAL;
 
-	err = mddev_lock(mddev);
+	err = mddev_suspend_and_lock(mddev);
 	if (err)
 		return err;
 	conf = mddev->private;
 	if (!conf)
 		err = -ENODEV;
 	else if (new != conf->worker_cnt_per_group) {
-		mddev_suspend(mddev);
-
 		old_groups = conf->worker_groups;
 		if (old_groups)
 			flush_workqueue(raid5_wq);
@@ -7246,9 +7244,8 @@ raid5_store_group_thread_cnt(struct mddev *mddev, const char *page, size_t len)
 				kfree(old_groups[0].workers);
 			kfree(old_groups);
 		}
-		mddev_resume(mddev);
 	}
-	mddev_unlock(mddev);
+	mddev_unlock_and_resume(mddev);
 
 	return err ?: len;
 }
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [dm-devel] [PATCH -next v3 15/25] md/raid5: use new apis to suspend array for raid5_store_group_thread_cnt()
@ 2023-09-28  6:15   ` Yu Kuai
  0 siblings, 0 replies; 70+ messages in thread
From: Yu Kuai @ 2023-09-28  6:15 UTC (permalink / raw)
  To: xni, agk, snitzer, dm-devel, song
  Cc: yi.zhang, yangerkun, linux-kernel, linux-raid, yukuai1, yukuai3

From: Yu Kuai <yukuai3@huawei.com>

Convert to use new apis, the old apis will be removed eventually.

This is not hot path, so performance is not concerned.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 drivers/md/raid5.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index c937716fed01..8060d29e99d2 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -7221,15 +7221,13 @@ raid5_store_group_thread_cnt(struct mddev *mddev, const char *page, size_t len)
 	if (new > 8192)
 		return -EINVAL;
 
-	err = mddev_lock(mddev);
+	err = mddev_suspend_and_lock(mddev);
 	if (err)
 		return err;
 	conf = mddev->private;
 	if (!conf)
 		err = -ENODEV;
 	else if (new != conf->worker_cnt_per_group) {
-		mddev_suspend(mddev);
-
 		old_groups = conf->worker_groups;
 		if (old_groups)
 			flush_workqueue(raid5_wq);
@@ -7246,9 +7244,8 @@ raid5_store_group_thread_cnt(struct mddev *mddev, const char *page, size_t len)
 				kfree(old_groups[0].workers);
 			kfree(old_groups);
 		}
-		mddev_resume(mddev);
 	}
-	mddev_unlock(mddev);
+	mddev_unlock_and_resume(mddev);
 
 	return err ?: len;
 }
-- 
2.39.2

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [PATCH -next v3 16/25] md/raid5: use new apis to suspend array for raid5_change_consistency_policy()
  2023-09-28  6:15 ` [dm-devel] " Yu Kuai
@ 2023-09-28  6:15   ` Yu Kuai
  -1 siblings, 0 replies; 70+ messages in thread
From: Yu Kuai @ 2023-09-28  6:15 UTC (permalink / raw)
  To: xni, agk, snitzer, dm-devel, song
  Cc: linux-kernel, linux-raid, yukuai3, yukuai1, yi.zhang, yangerkun

From: Yu Kuai <yukuai3@huawei.com>

Convert to use new apis, the old apis will be removed eventually.

This is not hot path, so performance is not concerned.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 drivers/md/raid5.c | 19 ++++++-------------
 1 file changed, 6 insertions(+), 13 deletions(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 8060d29e99d2..e6b8c0145648 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -8967,12 +8967,12 @@ static int raid5_change_consistency_policy(struct mddev *mddev, const char *buf)
 	struct r5conf *conf;
 	int err;
 
-	err = mddev_lock(mddev);
+	err = mddev_suspend_and_lock(mddev);
 	if (err)
 		return err;
 	conf = mddev->private;
 	if (!conf) {
-		mddev_unlock(mddev);
+		mddev_unlock_and_resume(mddev);
 		return -ENODEV;
 	}
 
@@ -8982,19 +8982,14 @@ static int raid5_change_consistency_policy(struct mddev *mddev, const char *buf)
 			err = log_init(conf, NULL, true);
 			if (!err) {
 				err = resize_stripes(conf, conf->pool_size);
-				if (err) {
-					mddev_suspend(mddev);
+				if (err)
 					log_exit(conf);
-					mddev_resume(mddev);
-				}
 			}
 		} else
 			err = -EINVAL;
 	} else if (strncmp(buf, "resync", 6) == 0) {
 		if (raid5_has_ppl(conf)) {
-			mddev_suspend(mddev);
 			log_exit(conf);
-			mddev_resume(mddev);
 			err = resize_stripes(conf, conf->pool_size);
 		} else if (test_bit(MD_HAS_JOURNAL, &conf->mddev->flags) &&
 			   r5l_log_disk_error(conf)) {
@@ -9007,11 +9002,9 @@ static int raid5_change_consistency_policy(struct mddev *mddev, const char *buf)
 					break;
 				}
 
-			if (!journal_dev_exists) {
-				mddev_suspend(mddev);
+			if (!journal_dev_exists)
 				clear_bit(MD_HAS_JOURNAL, &mddev->flags);
-				mddev_resume(mddev);
-			} else  /* need remove journal device first */
+			else  /* need remove journal device first */
 				err = -EBUSY;
 		} else
 			err = -EINVAL;
@@ -9022,7 +9015,7 @@ static int raid5_change_consistency_policy(struct mddev *mddev, const char *buf)
 	if (!err)
 		md_update_sb(mddev, 1);
 
-	mddev_unlock(mddev);
+	mddev_unlock_and_resume(mddev);
 
 	return err;
 }
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [dm-devel] [PATCH -next v3 16/25] md/raid5: use new apis to suspend array for raid5_change_consistency_policy()
@ 2023-09-28  6:15   ` Yu Kuai
  0 siblings, 0 replies; 70+ messages in thread
From: Yu Kuai @ 2023-09-28  6:15 UTC (permalink / raw)
  To: xni, agk, snitzer, dm-devel, song
  Cc: yi.zhang, yangerkun, linux-kernel, linux-raid, yukuai1, yukuai3

From: Yu Kuai <yukuai3@huawei.com>

Convert to use new apis, the old apis will be removed eventually.

This is not hot path, so performance is not concerned.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 drivers/md/raid5.c | 19 ++++++-------------
 1 file changed, 6 insertions(+), 13 deletions(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 8060d29e99d2..e6b8c0145648 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -8967,12 +8967,12 @@ static int raid5_change_consistency_policy(struct mddev *mddev, const char *buf)
 	struct r5conf *conf;
 	int err;
 
-	err = mddev_lock(mddev);
+	err = mddev_suspend_and_lock(mddev);
 	if (err)
 		return err;
 	conf = mddev->private;
 	if (!conf) {
-		mddev_unlock(mddev);
+		mddev_unlock_and_resume(mddev);
 		return -ENODEV;
 	}
 
@@ -8982,19 +8982,14 @@ static int raid5_change_consistency_policy(struct mddev *mddev, const char *buf)
 			err = log_init(conf, NULL, true);
 			if (!err) {
 				err = resize_stripes(conf, conf->pool_size);
-				if (err) {
-					mddev_suspend(mddev);
+				if (err)
 					log_exit(conf);
-					mddev_resume(mddev);
-				}
 			}
 		} else
 			err = -EINVAL;
 	} else if (strncmp(buf, "resync", 6) == 0) {
 		if (raid5_has_ppl(conf)) {
-			mddev_suspend(mddev);
 			log_exit(conf);
-			mddev_resume(mddev);
 			err = resize_stripes(conf, conf->pool_size);
 		} else if (test_bit(MD_HAS_JOURNAL, &conf->mddev->flags) &&
 			   r5l_log_disk_error(conf)) {
@@ -9007,11 +9002,9 @@ static int raid5_change_consistency_policy(struct mddev *mddev, const char *buf)
 					break;
 				}
 
-			if (!journal_dev_exists) {
-				mddev_suspend(mddev);
+			if (!journal_dev_exists)
 				clear_bit(MD_HAS_JOURNAL, &mddev->flags);
-				mddev_resume(mddev);
-			} else  /* need remove journal device first */
+			else  /* need remove journal device first */
 				err = -EBUSY;
 		} else
 			err = -EINVAL;
@@ -9022,7 +9015,7 @@ static int raid5_change_consistency_policy(struct mddev *mddev, const char *buf)
 	if (!err)
 		md_update_sb(mddev, 1);
 
-	mddev_unlock(mddev);
+	mddev_unlock_and_resume(mddev);
 
 	return err;
 }
-- 
2.39.2

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [PATCH -next v3 17/25] md/raid5: replace suspend with quiesce() callback
  2023-09-28  6:15 ` [dm-devel] " Yu Kuai
@ 2023-09-28  6:15   ` Yu Kuai
  -1 siblings, 0 replies; 70+ messages in thread
From: Yu Kuai @ 2023-09-28  6:15 UTC (permalink / raw)
  To: xni, agk, snitzer, dm-devel, song
  Cc: linux-kernel, linux-raid, yukuai3, yukuai1, yi.zhang, yangerkun

From: Yu Kuai <yukuai3@huawei.com>

raid5 is the only personality to suspend array in check_reshape() and
start_reshape() callback, suspend and quiesce() callback can both wait
for all normal io to be done, and prevent new io to be dispatched, the
difference is that suspend is implemented in common layer, and quiesce()
callback is implemented in raid5.

In order to cleanup all the usage of mddev_suspend(), the new apis
__mddev_suspend() need to be called before 'reconfig_mutex' is held,
and it's not good to affect all the personalities in common layer just
for raid5. Hence replace suspend with quiesce() callaback, prepare to
reomove all the users of mddev_suspend().

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 drivers/md/raid5.c | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index e6b8c0145648..d6de084a85e5 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -70,6 +70,8 @@ MODULE_PARM_DESC(devices_handle_discard_safely,
 		 "Set to Y if all devices in each array reliably return zeroes on reads from discarded regions");
 static struct workqueue_struct *raid5_wq;
 
+static void raid5_quiesce(struct mddev *mddev, int quiesce);
+
 static inline struct hlist_head *stripe_hash(struct r5conf *conf, sector_t sect)
 {
 	int hash = (sect >> RAID5_STRIPE_SHIFT(conf)) & HASH_MASK;
@@ -2492,15 +2494,12 @@ static int resize_chunks(struct r5conf *conf, int new_disks, int new_sectors)
 	unsigned long cpu;
 	int err = 0;
 
-	/*
-	 * Never shrink. And mddev_suspend() could deadlock if this is called
-	 * from raid5d. In that case, scribble_disks and scribble_sectors
-	 * should equal to new_disks and new_sectors
-	 */
+	/* Never shrink. */
 	if (conf->scribble_disks >= new_disks &&
 	    conf->scribble_sectors >= new_sectors)
 		return 0;
-	mddev_suspend(conf->mddev);
+
+	raid5_quiesce(conf->mddev, true);
 	cpus_read_lock();
 
 	for_each_present_cpu(cpu) {
@@ -2514,7 +2513,8 @@ static int resize_chunks(struct r5conf *conf, int new_disks, int new_sectors)
 	}
 
 	cpus_read_unlock();
-	mddev_resume(conf->mddev);
+	raid5_quiesce(conf->mddev, false);
+
 	if (!err) {
 		conf->scribble_disks = new_disks;
 		conf->scribble_sectors = new_sectors;
@@ -8551,8 +8551,8 @@ static int raid5_start_reshape(struct mddev *mddev)
 	 * the reshape wasn't running - like Discard or Read - have
 	 * completed.
 	 */
-	mddev_suspend(mddev);
-	mddev_resume(mddev);
+	raid5_quiesce(mddev, true);
+	raid5_quiesce(mddev, false);
 
 	/* Add some new drives, as many as will fit.
 	 * We know there are enough to make the newly sized array work.
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [dm-devel] [PATCH -next v3 17/25] md/raid5: replace suspend with quiesce() callback
@ 2023-09-28  6:15   ` Yu Kuai
  0 siblings, 0 replies; 70+ messages in thread
From: Yu Kuai @ 2023-09-28  6:15 UTC (permalink / raw)
  To: xni, agk, snitzer, dm-devel, song
  Cc: yi.zhang, yangerkun, linux-kernel, linux-raid, yukuai1, yukuai3

From: Yu Kuai <yukuai3@huawei.com>

raid5 is the only personality to suspend array in check_reshape() and
start_reshape() callback, suspend and quiesce() callback can both wait
for all normal io to be done, and prevent new io to be dispatched, the
difference is that suspend is implemented in common layer, and quiesce()
callback is implemented in raid5.

In order to cleanup all the usage of mddev_suspend(), the new apis
__mddev_suspend() need to be called before 'reconfig_mutex' is held,
and it's not good to affect all the personalities in common layer just
for raid5. Hence replace suspend with quiesce() callaback, prepare to
reomove all the users of mddev_suspend().

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 drivers/md/raid5.c | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index e6b8c0145648..d6de084a85e5 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -70,6 +70,8 @@ MODULE_PARM_DESC(devices_handle_discard_safely,
 		 "Set to Y if all devices in each array reliably return zeroes on reads from discarded regions");
 static struct workqueue_struct *raid5_wq;
 
+static void raid5_quiesce(struct mddev *mddev, int quiesce);
+
 static inline struct hlist_head *stripe_hash(struct r5conf *conf, sector_t sect)
 {
 	int hash = (sect >> RAID5_STRIPE_SHIFT(conf)) & HASH_MASK;
@@ -2492,15 +2494,12 @@ static int resize_chunks(struct r5conf *conf, int new_disks, int new_sectors)
 	unsigned long cpu;
 	int err = 0;
 
-	/*
-	 * Never shrink. And mddev_suspend() could deadlock if this is called
-	 * from raid5d. In that case, scribble_disks and scribble_sectors
-	 * should equal to new_disks and new_sectors
-	 */
+	/* Never shrink. */
 	if (conf->scribble_disks >= new_disks &&
 	    conf->scribble_sectors >= new_sectors)
 		return 0;
-	mddev_suspend(conf->mddev);
+
+	raid5_quiesce(conf->mddev, true);
 	cpus_read_lock();
 
 	for_each_present_cpu(cpu) {
@@ -2514,7 +2513,8 @@ static int resize_chunks(struct r5conf *conf, int new_disks, int new_sectors)
 	}
 
 	cpus_read_unlock();
-	mddev_resume(conf->mddev);
+	raid5_quiesce(conf->mddev, false);
+
 	if (!err) {
 		conf->scribble_disks = new_disks;
 		conf->scribble_sectors = new_sectors;
@@ -8551,8 +8551,8 @@ static int raid5_start_reshape(struct mddev *mddev)
 	 * the reshape wasn't running - like Discard or Read - have
 	 * completed.
 	 */
-	mddev_suspend(mddev);
-	mddev_resume(mddev);
+	raid5_quiesce(mddev, true);
+	raid5_quiesce(mddev, false);
 
 	/* Add some new drives, as many as will fit.
 	 * We know there are enough to make the newly sized array work.
-- 
2.39.2

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [PATCH -next v3 18/25] md: use new apis to suspend array for ioctls involed array reconfiguration
  2023-09-28  6:15 ` [dm-devel] " Yu Kuai
@ 2023-09-28  6:15   ` Yu Kuai
  -1 siblings, 0 replies; 70+ messages in thread
From: Yu Kuai @ 2023-09-28  6:15 UTC (permalink / raw)
  To: xni, agk, snitzer, dm-devel, song
  Cc: linux-kernel, linux-raid, yukuai3, yukuai1, yi.zhang, yangerkun

From: Yu Kuai <yukuai3@huawei.com>

'reconfig_mutex' will be grabbed before these ioctls, suspend array
before holding the lock, so that io won't concurrent with array
reconfiguration through ioctls.

This is not hot path, so performance is not concerned.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 drivers/md/md.c | 30 ++++++++++++++++++++----------
 1 file changed, 20 insertions(+), 10 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 0c5a6169453c..957813b7d7e5 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -7209,7 +7209,6 @@ static int set_bitmap_file(struct mddev *mddev, int fd)
 			struct bitmap *bitmap;
 
 			bitmap = md_bitmap_create(mddev, -1);
-			mddev_suspend(mddev);
 			if (!IS_ERR(bitmap)) {
 				mddev->bitmap = bitmap;
 				err = md_bitmap_load(mddev);
@@ -7219,11 +7218,8 @@ static int set_bitmap_file(struct mddev *mddev, int fd)
 				md_bitmap_destroy(mddev);
 				fd = -1;
 			}
-			mddev_resume(mddev);
 		} else if (fd < 0) {
-			mddev_suspend(mddev);
 			md_bitmap_destroy(mddev);
-			mddev_resume(mddev);
 		}
 	}
 	if (fd < 0) {
@@ -7512,7 +7508,6 @@ static int update_array_info(struct mddev *mddev, mdu_array_info_t *info)
 			mddev->bitmap_info.space =
 				mddev->bitmap_info.default_space;
 			bitmap = md_bitmap_create(mddev, -1);
-			mddev_suspend(mddev);
 			if (!IS_ERR(bitmap)) {
 				mddev->bitmap = bitmap;
 				rv = md_bitmap_load(mddev);
@@ -7520,7 +7515,6 @@ static int update_array_info(struct mddev *mddev, mdu_array_info_t *info)
 				rv = PTR_ERR(bitmap);
 			if (rv)
 				md_bitmap_destroy(mddev);
-			mddev_resume(mddev);
 		} else {
 			/* remove the bitmap */
 			if (!mddev->bitmap) {
@@ -7545,9 +7539,7 @@ static int update_array_info(struct mddev *mddev, mdu_array_info_t *info)
 				module_put(md_cluster_mod);
 				mddev->safemode_delay = DEFAULT_SAFEMODE_DELAY;
 			}
-			mddev_suspend(mddev);
 			md_bitmap_destroy(mddev);
-			mddev_resume(mddev);
 			mddev->bitmap_info.offset = 0;
 		}
 	}
@@ -7618,6 +7610,20 @@ static inline bool md_ioctl_valid(unsigned int cmd)
 	}
 }
 
+static bool md_ioctl_need_suspend(unsigned int cmd)
+{
+	switch (cmd) {
+	case ADD_NEW_DISK:
+	case HOT_ADD_DISK:
+	case HOT_REMOVE_DISK:
+	case SET_BITMAP_FILE:
+	case SET_ARRAY_INFO:
+		return true;
+	default:
+		return false;
+	}
+}
+
 static int __md_set_array_info(struct mddev *mddev, void __user *argp)
 {
 	mdu_array_info_t info;
@@ -7750,7 +7756,8 @@ static int md_ioctl(struct block_device *bdev, blk_mode_t mode,
 	if (!md_is_rdwr(mddev))
 		flush_work(&mddev->sync_work);
 
-	err = mddev_lock(mddev);
+	err = md_ioctl_need_suspend(cmd) ? mddev_suspend_and_lock(mddev) :
+					   mddev_lock(mddev);
 	if (err) {
 		pr_debug("md: ioctl lock interrupted, reason %d, cmd %d\n",
 			 err, cmd);
@@ -7878,7 +7885,10 @@ static int md_ioctl(struct block_device *bdev, blk_mode_t mode,
 	if (mddev->hold_active == UNTIL_IOCTL &&
 	    err != -EINVAL)
 		mddev->hold_active = 0;
-	mddev_unlock(mddev);
+
+	md_ioctl_need_suspend(cmd) ? mddev_unlock_and_resume(mddev) :
+				     mddev_unlock(mddev);
+
 out:
 	if(did_set_md_closing)
 		clear_bit(MD_CLOSING, &mddev->flags);
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [dm-devel] [PATCH -next v3 18/25] md: use new apis to suspend array for ioctls involed array reconfiguration
@ 2023-09-28  6:15   ` Yu Kuai
  0 siblings, 0 replies; 70+ messages in thread
From: Yu Kuai @ 2023-09-28  6:15 UTC (permalink / raw)
  To: xni, agk, snitzer, dm-devel, song
  Cc: yi.zhang, yangerkun, linux-kernel, linux-raid, yukuai1, yukuai3

From: Yu Kuai <yukuai3@huawei.com>

'reconfig_mutex' will be grabbed before these ioctls, suspend array
before holding the lock, so that io won't concurrent with array
reconfiguration through ioctls.

This is not hot path, so performance is not concerned.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 drivers/md/md.c | 30 ++++++++++++++++++++----------
 1 file changed, 20 insertions(+), 10 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 0c5a6169453c..957813b7d7e5 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -7209,7 +7209,6 @@ static int set_bitmap_file(struct mddev *mddev, int fd)
 			struct bitmap *bitmap;
 
 			bitmap = md_bitmap_create(mddev, -1);
-			mddev_suspend(mddev);
 			if (!IS_ERR(bitmap)) {
 				mddev->bitmap = bitmap;
 				err = md_bitmap_load(mddev);
@@ -7219,11 +7218,8 @@ static int set_bitmap_file(struct mddev *mddev, int fd)
 				md_bitmap_destroy(mddev);
 				fd = -1;
 			}
-			mddev_resume(mddev);
 		} else if (fd < 0) {
-			mddev_suspend(mddev);
 			md_bitmap_destroy(mddev);
-			mddev_resume(mddev);
 		}
 	}
 	if (fd < 0) {
@@ -7512,7 +7508,6 @@ static int update_array_info(struct mddev *mddev, mdu_array_info_t *info)
 			mddev->bitmap_info.space =
 				mddev->bitmap_info.default_space;
 			bitmap = md_bitmap_create(mddev, -1);
-			mddev_suspend(mddev);
 			if (!IS_ERR(bitmap)) {
 				mddev->bitmap = bitmap;
 				rv = md_bitmap_load(mddev);
@@ -7520,7 +7515,6 @@ static int update_array_info(struct mddev *mddev, mdu_array_info_t *info)
 				rv = PTR_ERR(bitmap);
 			if (rv)
 				md_bitmap_destroy(mddev);
-			mddev_resume(mddev);
 		} else {
 			/* remove the bitmap */
 			if (!mddev->bitmap) {
@@ -7545,9 +7539,7 @@ static int update_array_info(struct mddev *mddev, mdu_array_info_t *info)
 				module_put(md_cluster_mod);
 				mddev->safemode_delay = DEFAULT_SAFEMODE_DELAY;
 			}
-			mddev_suspend(mddev);
 			md_bitmap_destroy(mddev);
-			mddev_resume(mddev);
 			mddev->bitmap_info.offset = 0;
 		}
 	}
@@ -7618,6 +7610,20 @@ static inline bool md_ioctl_valid(unsigned int cmd)
 	}
 }
 
+static bool md_ioctl_need_suspend(unsigned int cmd)
+{
+	switch (cmd) {
+	case ADD_NEW_DISK:
+	case HOT_ADD_DISK:
+	case HOT_REMOVE_DISK:
+	case SET_BITMAP_FILE:
+	case SET_ARRAY_INFO:
+		return true;
+	default:
+		return false;
+	}
+}
+
 static int __md_set_array_info(struct mddev *mddev, void __user *argp)
 {
 	mdu_array_info_t info;
@@ -7750,7 +7756,8 @@ static int md_ioctl(struct block_device *bdev, blk_mode_t mode,
 	if (!md_is_rdwr(mddev))
 		flush_work(&mddev->sync_work);
 
-	err = mddev_lock(mddev);
+	err = md_ioctl_need_suspend(cmd) ? mddev_suspend_and_lock(mddev) :
+					   mddev_lock(mddev);
 	if (err) {
 		pr_debug("md: ioctl lock interrupted, reason %d, cmd %d\n",
 			 err, cmd);
@@ -7878,7 +7885,10 @@ static int md_ioctl(struct block_device *bdev, blk_mode_t mode,
 	if (mddev->hold_active == UNTIL_IOCTL &&
 	    err != -EINVAL)
 		mddev->hold_active = 0;
-	mddev_unlock(mddev);
+
+	md_ioctl_need_suspend(cmd) ? mddev_unlock_and_resume(mddev) :
+				     mddev_unlock(mddev);
+
 out:
 	if(did_set_md_closing)
 		clear_bit(MD_CLOSING, &mddev->flags);
-- 
2.39.2

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [PATCH -next v3 19/25] md: use new apis to suspend array for adding/removing rdev from state_store()
  2023-09-28  6:15 ` [dm-devel] " Yu Kuai
@ 2023-09-28  6:15   ` Yu Kuai
  -1 siblings, 0 replies; 70+ messages in thread
From: Yu Kuai @ 2023-09-28  6:15 UTC (permalink / raw)
  To: xni, agk, snitzer, dm-devel, song
  Cc: linux-kernel, linux-raid, yukuai3, yukuai1, yi.zhang, yangerkun

From: Yu Kuai <yukuai3@huawei.com>

User can write 'remove' and 're-add' to trigger array reconfiguration
through sysfs, suspend array in this case so that io won't concurrent
with array reconfiguration.

And now that all the caller of add_bound_rdev() alread suspend the
array, remove mddev_suspend/resume() from add_bound_rdev() as well.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 drivers/md/md.c | 19 +++++++++++--------
 1 file changed, 11 insertions(+), 8 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 957813b7d7e5..c5fb75b066b5 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -2940,11 +2940,7 @@ static int add_bound_rdev(struct md_rdev *rdev)
 		 */
 		super_types[mddev->major_version].
 			validate_super(mddev, rdev);
-		if (add_journal)
-			mddev_suspend(mddev);
 		err = mddev->pers->hot_add_disk(mddev, rdev);
-		if (add_journal)
-			mddev_resume(mddev);
 		if (err) {
 			md_kick_rdev_from_array(rdev);
 			return err;
@@ -3697,6 +3693,7 @@ rdev_attr_store(struct kobject *kobj, struct attribute *attr,
 	struct rdev_sysfs_entry *entry = container_of(attr, struct rdev_sysfs_entry, attr);
 	struct md_rdev *rdev = container_of(kobj, struct md_rdev, kobj);
 	struct kernfs_node *kn = NULL;
+	bool suspend = false;
 	ssize_t rv;
 	struct mddev *mddev = rdev->mddev;
 
@@ -3704,17 +3701,23 @@ rdev_attr_store(struct kobject *kobj, struct attribute *attr,
 		return -EIO;
 	if (!capable(CAP_SYS_ADMIN))
 		return -EACCES;
+	if (!mddev)
+		return -ENODEV;
 
-	if (entry->store == state_store && cmd_match(page, "remove"))
-		kn = sysfs_break_active_protection(kobj, attr);
+	if (entry->store == state_store) {
+		if (cmd_match(page, "remove"))
+			kn = sysfs_break_active_protection(kobj, attr);
+		if (cmd_match(page, "remove") || cmd_match(page, "re-add"))
+			suspend = true;
+	}
 
-	rv = mddev ? mddev_lock(mddev) : -ENODEV;
+	rv = suspend ? mddev_suspend_and_lock(mddev) : mddev_lock(mddev);
 	if (!rv) {
 		if (rdev->mddev == NULL)
 			rv = -ENODEV;
 		else
 			rv = entry->store(rdev, page, length);
-		mddev_unlock(mddev);
+		suspend ? mddev_unlock_and_resume(mddev) : mddev_unlock(mddev);
 	}
 
 	if (kn)
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [dm-devel] [PATCH -next v3 19/25] md: use new apis to suspend array for adding/removing rdev from state_store()
@ 2023-09-28  6:15   ` Yu Kuai
  0 siblings, 0 replies; 70+ messages in thread
From: Yu Kuai @ 2023-09-28  6:15 UTC (permalink / raw)
  To: xni, agk, snitzer, dm-devel, song
  Cc: yi.zhang, yangerkun, linux-kernel, linux-raid, yukuai1, yukuai3

From: Yu Kuai <yukuai3@huawei.com>

User can write 'remove' and 're-add' to trigger array reconfiguration
through sysfs, suspend array in this case so that io won't concurrent
with array reconfiguration.

And now that all the caller of add_bound_rdev() alread suspend the
array, remove mddev_suspend/resume() from add_bound_rdev() as well.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 drivers/md/md.c | 19 +++++++++++--------
 1 file changed, 11 insertions(+), 8 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 957813b7d7e5..c5fb75b066b5 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -2940,11 +2940,7 @@ static int add_bound_rdev(struct md_rdev *rdev)
 		 */
 		super_types[mddev->major_version].
 			validate_super(mddev, rdev);
-		if (add_journal)
-			mddev_suspend(mddev);
 		err = mddev->pers->hot_add_disk(mddev, rdev);
-		if (add_journal)
-			mddev_resume(mddev);
 		if (err) {
 			md_kick_rdev_from_array(rdev);
 			return err;
@@ -3697,6 +3693,7 @@ rdev_attr_store(struct kobject *kobj, struct attribute *attr,
 	struct rdev_sysfs_entry *entry = container_of(attr, struct rdev_sysfs_entry, attr);
 	struct md_rdev *rdev = container_of(kobj, struct md_rdev, kobj);
 	struct kernfs_node *kn = NULL;
+	bool suspend = false;
 	ssize_t rv;
 	struct mddev *mddev = rdev->mddev;
 
@@ -3704,17 +3701,23 @@ rdev_attr_store(struct kobject *kobj, struct attribute *attr,
 		return -EIO;
 	if (!capable(CAP_SYS_ADMIN))
 		return -EACCES;
+	if (!mddev)
+		return -ENODEV;
 
-	if (entry->store == state_store && cmd_match(page, "remove"))
-		kn = sysfs_break_active_protection(kobj, attr);
+	if (entry->store == state_store) {
+		if (cmd_match(page, "remove"))
+			kn = sysfs_break_active_protection(kobj, attr);
+		if (cmd_match(page, "remove") || cmd_match(page, "re-add"))
+			suspend = true;
+	}
 
-	rv = mddev ? mddev_lock(mddev) : -ENODEV;
+	rv = suspend ? mddev_suspend_and_lock(mddev) : mddev_lock(mddev);
 	if (!rv) {
 		if (rdev->mddev == NULL)
 			rv = -ENODEV;
 		else
 			rv = entry->store(rdev, page, length);
-		mddev_unlock(mddev);
+		suspend ? mddev_unlock_and_resume(mddev) : mddev_unlock(mddev);
 	}
 
 	if (kn)
-- 
2.39.2

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [PATCH -next v3 20/25] md: use new apis to suspend array before mddev_create/destroy_serial_pool
  2023-09-28  6:15 ` [dm-devel] " Yu Kuai
@ 2023-09-28  6:15   ` Yu Kuai
  -1 siblings, 0 replies; 70+ messages in thread
From: Yu Kuai @ 2023-09-28  6:15 UTC (permalink / raw)
  To: xni, agk, snitzer, dm-devel, song
  Cc: linux-kernel, linux-raid, yukuai3, yukuai1, yi.zhang, yangerkun

From: Yu Kuai <yukuai3@huawei.com>

mddev_create/destroy_serial_pool() will be called from several places
where mddev_suspend() will be called later.

Prepare to remove the mddev_suspend() from
mddev_create/destroy_serial_pool().

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 drivers/md/md-autodetect.c |  4 ++--
 drivers/md/md-bitmap.c     |  8 ++++----
 drivers/md/md.c            | 22 ++++++++++++----------
 3 files changed, 18 insertions(+), 16 deletions(-)

diff --git a/drivers/md/md-autodetect.c b/drivers/md/md-autodetect.c
index 6eaa0eab40f9..4b80165afd23 100644
--- a/drivers/md/md-autodetect.c
+++ b/drivers/md/md-autodetect.c
@@ -175,7 +175,7 @@ static void __init md_setup_drive(struct md_setup_args *args)
 		return;
 	}
 
-	err = mddev_lock(mddev);
+	err = mddev_suspend_and_lock(mddev);
 	if (err) {
 		pr_err("md: failed to lock array %s\n", name);
 		goto out_mddev_put;
@@ -221,7 +221,7 @@ static void __init md_setup_drive(struct md_setup_args *args)
 	if (err)
 		pr_warn("md: starting %s failed\n", name);
 out_unlock:
-	mddev_unlock(mddev);
+	mddev_unlock_and_resume(mddev);
 out_mddev_put:
 	mddev_put(mddev);
 }
diff --git a/drivers/md/md-bitmap.c b/drivers/md/md-bitmap.c
index 7d21e2a5b06e..b3d701c5c461 100644
--- a/drivers/md/md-bitmap.c
+++ b/drivers/md/md-bitmap.c
@@ -2537,7 +2537,7 @@ backlog_store(struct mddev *mddev, const char *buf, size_t len)
 	if (backlog > COUNTER_MAX)
 		return -EINVAL;
 
-	rv = mddev_lock(mddev);
+	rv = mddev_suspend_and_lock(mddev);
 	if (rv)
 		return rv;
 
@@ -2562,16 +2562,16 @@ backlog_store(struct mddev *mddev, const char *buf, size_t len)
 	if (!backlog && mddev->serial_info_pool) {
 		/* serial_info_pool is not needed if backlog is zero */
 		if (!mddev->serialize_policy)
-			mddev_destroy_serial_pool(mddev, NULL, false);
+			mddev_destroy_serial_pool(mddev, NULL, true);
 	} else if (backlog && !mddev->serial_info_pool) {
 		/* serial_info_pool is needed since backlog is not zero */
 		rdev_for_each(rdev, mddev)
-			mddev_create_serial_pool(mddev, rdev, false);
+			mddev_create_serial_pool(mddev, rdev, true);
 	}
 	if (old_mwb != backlog)
 		md_bitmap_update_sb(mddev->bitmap);
 
-	mddev_unlock(mddev);
+	mddev_unlock_and_resume(mddev);
 	return len;
 }
 
diff --git a/drivers/md/md.c b/drivers/md/md.c
index c5fb75b066b5..f8d92d745105 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -2557,7 +2557,7 @@ static int bind_rdev_to_array(struct md_rdev *rdev, struct mddev *mddev)
 	pr_debug("md: bind<%s>\n", b);
 
 	if (mddev->raid_disks)
-		mddev_create_serial_pool(mddev, rdev, false);
+		mddev_create_serial_pool(mddev, rdev, true);
 
 	if ((err = kobject_add(&rdev->kobj, &mddev->kobj, "dev-%s", b)))
 		goto fail;
@@ -3077,11 +3077,11 @@ state_store(struct md_rdev *rdev, const char *buf, size_t len)
 		}
 	} else if (cmd_match(buf, "writemostly")) {
 		set_bit(WriteMostly, &rdev->flags);
-		mddev_create_serial_pool(rdev->mddev, rdev, false);
+		mddev_create_serial_pool(rdev->mddev, rdev, true);
 		need_update_sb = true;
 		err = 0;
 	} else if (cmd_match(buf, "-writemostly")) {
-		mddev_destroy_serial_pool(rdev->mddev, rdev, false);
+		mddev_destroy_serial_pool(rdev->mddev, rdev, true);
 		clear_bit(WriteMostly, &rdev->flags);
 		need_update_sb = true;
 		err = 0;
@@ -3707,7 +3707,9 @@ rdev_attr_store(struct kobject *kobj, struct attribute *attr,
 	if (entry->store == state_store) {
 		if (cmd_match(page, "remove"))
 			kn = sysfs_break_active_protection(kobj, attr);
-		if (cmd_match(page, "remove") || cmd_match(page, "re-add"))
+		if (cmd_match(page, "remove") || cmd_match(page, "re-add") ||
+		    cmd_match(page, "writemostly") ||
+		    cmd_match(page, "-writemostly"))
 			suspend = true;
 	}
 
@@ -4681,7 +4683,7 @@ new_dev_store(struct mddev *mddev, const char *buf, size_t len)
 	    minor != MINOR(dev))
 		return -EOVERFLOW;
 
-	err = mddev_lock(mddev);
+	err = mddev_suspend_and_lock(mddev);
 	if (err)
 		return err;
 	if (mddev->persistent) {
@@ -4702,14 +4704,14 @@ new_dev_store(struct mddev *mddev, const char *buf, size_t len)
 		rdev = md_import_device(dev, -1, -1);
 
 	if (IS_ERR(rdev)) {
-		mddev_unlock(mddev);
+		mddev_unlock_and_resume(mddev);
 		return PTR_ERR(rdev);
 	}
 	err = bind_rdev_to_array(rdev, mddev);
  out:
 	if (err)
 		export_rdev(rdev, mddev);
-	mddev_unlock(mddev);
+	mddev_unlock_and_resume(mddev);
 	if (!err)
 		md_new_event();
 	return err ? err : len;
@@ -6646,13 +6648,13 @@ static void autorun_devices(int part)
 		if (IS_ERR(mddev))
 			break;
 
-		if (mddev_lock(mddev))
+		if (mddev_suspend_and_lock(mddev))
 			pr_warn("md: %s locked, cannot run\n", mdname(mddev));
 		else if (mddev->raid_disks || mddev->major_version
 			 || !list_empty(&mddev->disks)) {
 			pr_warn("md: %s already running, cannot run %pg\n",
 				mdname(mddev), rdev0->bdev);
-			mddev_unlock(mddev);
+			mddev_unlock_and_resume(mddev);
 		} else {
 			pr_debug("md: created %s\n", mdname(mddev));
 			mddev->persistent = 1;
@@ -6662,7 +6664,7 @@ static void autorun_devices(int part)
 					export_rdev(rdev, mddev);
 			}
 			autorun_array(mddev);
-			mddev_unlock(mddev);
+			mddev_unlock_and_resume(mddev);
 		}
 		/* on success, candidates will be empty, on error
 		 * it won't...
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [dm-devel] [PATCH -next v3 20/25] md: use new apis to suspend array before mddev_create/destroy_serial_pool
@ 2023-09-28  6:15   ` Yu Kuai
  0 siblings, 0 replies; 70+ messages in thread
From: Yu Kuai @ 2023-09-28  6:15 UTC (permalink / raw)
  To: xni, agk, snitzer, dm-devel, song
  Cc: yi.zhang, yangerkun, linux-kernel, linux-raid, yukuai1, yukuai3

From: Yu Kuai <yukuai3@huawei.com>

mddev_create/destroy_serial_pool() will be called from several places
where mddev_suspend() will be called later.

Prepare to remove the mddev_suspend() from
mddev_create/destroy_serial_pool().

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 drivers/md/md-autodetect.c |  4 ++--
 drivers/md/md-bitmap.c     |  8 ++++----
 drivers/md/md.c            | 22 ++++++++++++----------
 3 files changed, 18 insertions(+), 16 deletions(-)

diff --git a/drivers/md/md-autodetect.c b/drivers/md/md-autodetect.c
index 6eaa0eab40f9..4b80165afd23 100644
--- a/drivers/md/md-autodetect.c
+++ b/drivers/md/md-autodetect.c
@@ -175,7 +175,7 @@ static void __init md_setup_drive(struct md_setup_args *args)
 		return;
 	}
 
-	err = mddev_lock(mddev);
+	err = mddev_suspend_and_lock(mddev);
 	if (err) {
 		pr_err("md: failed to lock array %s\n", name);
 		goto out_mddev_put;
@@ -221,7 +221,7 @@ static void __init md_setup_drive(struct md_setup_args *args)
 	if (err)
 		pr_warn("md: starting %s failed\n", name);
 out_unlock:
-	mddev_unlock(mddev);
+	mddev_unlock_and_resume(mddev);
 out_mddev_put:
 	mddev_put(mddev);
 }
diff --git a/drivers/md/md-bitmap.c b/drivers/md/md-bitmap.c
index 7d21e2a5b06e..b3d701c5c461 100644
--- a/drivers/md/md-bitmap.c
+++ b/drivers/md/md-bitmap.c
@@ -2537,7 +2537,7 @@ backlog_store(struct mddev *mddev, const char *buf, size_t len)
 	if (backlog > COUNTER_MAX)
 		return -EINVAL;
 
-	rv = mddev_lock(mddev);
+	rv = mddev_suspend_and_lock(mddev);
 	if (rv)
 		return rv;
 
@@ -2562,16 +2562,16 @@ backlog_store(struct mddev *mddev, const char *buf, size_t len)
 	if (!backlog && mddev->serial_info_pool) {
 		/* serial_info_pool is not needed if backlog is zero */
 		if (!mddev->serialize_policy)
-			mddev_destroy_serial_pool(mddev, NULL, false);
+			mddev_destroy_serial_pool(mddev, NULL, true);
 	} else if (backlog && !mddev->serial_info_pool) {
 		/* serial_info_pool is needed since backlog is not zero */
 		rdev_for_each(rdev, mddev)
-			mddev_create_serial_pool(mddev, rdev, false);
+			mddev_create_serial_pool(mddev, rdev, true);
 	}
 	if (old_mwb != backlog)
 		md_bitmap_update_sb(mddev->bitmap);
 
-	mddev_unlock(mddev);
+	mddev_unlock_and_resume(mddev);
 	return len;
 }
 
diff --git a/drivers/md/md.c b/drivers/md/md.c
index c5fb75b066b5..f8d92d745105 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -2557,7 +2557,7 @@ static int bind_rdev_to_array(struct md_rdev *rdev, struct mddev *mddev)
 	pr_debug("md: bind<%s>\n", b);
 
 	if (mddev->raid_disks)
-		mddev_create_serial_pool(mddev, rdev, false);
+		mddev_create_serial_pool(mddev, rdev, true);
 
 	if ((err = kobject_add(&rdev->kobj, &mddev->kobj, "dev-%s", b)))
 		goto fail;
@@ -3077,11 +3077,11 @@ state_store(struct md_rdev *rdev, const char *buf, size_t len)
 		}
 	} else if (cmd_match(buf, "writemostly")) {
 		set_bit(WriteMostly, &rdev->flags);
-		mddev_create_serial_pool(rdev->mddev, rdev, false);
+		mddev_create_serial_pool(rdev->mddev, rdev, true);
 		need_update_sb = true;
 		err = 0;
 	} else if (cmd_match(buf, "-writemostly")) {
-		mddev_destroy_serial_pool(rdev->mddev, rdev, false);
+		mddev_destroy_serial_pool(rdev->mddev, rdev, true);
 		clear_bit(WriteMostly, &rdev->flags);
 		need_update_sb = true;
 		err = 0;
@@ -3707,7 +3707,9 @@ rdev_attr_store(struct kobject *kobj, struct attribute *attr,
 	if (entry->store == state_store) {
 		if (cmd_match(page, "remove"))
 			kn = sysfs_break_active_protection(kobj, attr);
-		if (cmd_match(page, "remove") || cmd_match(page, "re-add"))
+		if (cmd_match(page, "remove") || cmd_match(page, "re-add") ||
+		    cmd_match(page, "writemostly") ||
+		    cmd_match(page, "-writemostly"))
 			suspend = true;
 	}
 
@@ -4681,7 +4683,7 @@ new_dev_store(struct mddev *mddev, const char *buf, size_t len)
 	    minor != MINOR(dev))
 		return -EOVERFLOW;
 
-	err = mddev_lock(mddev);
+	err = mddev_suspend_and_lock(mddev);
 	if (err)
 		return err;
 	if (mddev->persistent) {
@@ -4702,14 +4704,14 @@ new_dev_store(struct mddev *mddev, const char *buf, size_t len)
 		rdev = md_import_device(dev, -1, -1);
 
 	if (IS_ERR(rdev)) {
-		mddev_unlock(mddev);
+		mddev_unlock_and_resume(mddev);
 		return PTR_ERR(rdev);
 	}
 	err = bind_rdev_to_array(rdev, mddev);
  out:
 	if (err)
 		export_rdev(rdev, mddev);
-	mddev_unlock(mddev);
+	mddev_unlock_and_resume(mddev);
 	if (!err)
 		md_new_event();
 	return err ? err : len;
@@ -6646,13 +6648,13 @@ static void autorun_devices(int part)
 		if (IS_ERR(mddev))
 			break;
 
-		if (mddev_lock(mddev))
+		if (mddev_suspend_and_lock(mddev))
 			pr_warn("md: %s locked, cannot run\n", mdname(mddev));
 		else if (mddev->raid_disks || mddev->major_version
 			 || !list_empty(&mddev->disks)) {
 			pr_warn("md: %s already running, cannot run %pg\n",
 				mdname(mddev), rdev0->bdev);
-			mddev_unlock(mddev);
+			mddev_unlock_and_resume(mddev);
 		} else {
 			pr_debug("md: created %s\n", mdname(mddev));
 			mddev->persistent = 1;
@@ -6662,7 +6664,7 @@ static void autorun_devices(int part)
 					export_rdev(rdev, mddev);
 			}
 			autorun_array(mddev);
-			mddev_unlock(mddev);
+			mddev_unlock_and_resume(mddev);
 		}
 		/* on success, candidates will be empty, on error
 		 * it won't...
-- 
2.39.2

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [PATCH -next v3 21/25] md: cleanup mddev_create/destroy_serial_pool()
  2023-09-28  6:15 ` [dm-devel] " Yu Kuai
@ 2023-09-28  6:15   ` Yu Kuai
  -1 siblings, 0 replies; 70+ messages in thread
From: Yu Kuai @ 2023-09-28  6:15 UTC (permalink / raw)
  To: xni, agk, snitzer, dm-devel, song
  Cc: linux-kernel, linux-raid, yukuai3, yukuai1, yi.zhang, yangerkun

From: Yu Kuai <yukuai3@huawei.com>

Now that except for stopping the array, all the callers already suspend
the array, there is no need to suspend anymore, hence remove the second
parameter.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 drivers/md/md-bitmap.c |  8 ++++----
 drivers/md/md.c        | 33 ++++++++++-----------------------
 drivers/md/md.h        |  7 +++----
 3 files changed, 17 insertions(+), 31 deletions(-)

diff --git a/drivers/md/md-bitmap.c b/drivers/md/md-bitmap.c
index b3d701c5c461..9672f75c3050 100644
--- a/drivers/md/md-bitmap.c
+++ b/drivers/md/md-bitmap.c
@@ -1861,7 +1861,7 @@ void md_bitmap_destroy(struct mddev *mddev)
 
 	md_bitmap_wait_behind_writes(mddev);
 	if (!mddev->serialize_policy)
-		mddev_destroy_serial_pool(mddev, NULL, true);
+		mddev_destroy_serial_pool(mddev, NULL);
 
 	mutex_lock(&mddev->bitmap_info.mutex);
 	spin_lock(&mddev->lock);
@@ -1977,7 +1977,7 @@ int md_bitmap_load(struct mddev *mddev)
 		goto out;
 
 	rdev_for_each(rdev, mddev)
-		mddev_create_serial_pool(mddev, rdev, true);
+		mddev_create_serial_pool(mddev, rdev);
 
 	if (mddev_is_clustered(mddev))
 		md_cluster_ops->load_bitmaps(mddev, mddev->bitmap_info.nodes);
@@ -2562,11 +2562,11 @@ backlog_store(struct mddev *mddev, const char *buf, size_t len)
 	if (!backlog && mddev->serial_info_pool) {
 		/* serial_info_pool is not needed if backlog is zero */
 		if (!mddev->serialize_policy)
-			mddev_destroy_serial_pool(mddev, NULL, true);
+			mddev_destroy_serial_pool(mddev, NULL);
 	} else if (backlog && !mddev->serial_info_pool) {
 		/* serial_info_pool is needed since backlog is not zero */
 		rdev_for_each(rdev, mddev)
-			mddev_create_serial_pool(mddev, rdev, true);
+			mddev_create_serial_pool(mddev, rdev);
 	}
 	if (old_mwb != backlog)
 		md_bitmap_update_sb(mddev->bitmap);
diff --git a/drivers/md/md.c b/drivers/md/md.c
index f8d92d745105..c3a51d309063 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -206,8 +206,7 @@ static int rdev_need_serial(struct md_rdev *rdev)
  * 1. rdev is the first device which return true from rdev_enable_serial.
  * 2. rdev is NULL, means we want to enable serialization for all rdevs.
  */
-void mddev_create_serial_pool(struct mddev *mddev, struct md_rdev *rdev,
-			      bool is_suspend)
+void mddev_create_serial_pool(struct mddev *mddev, struct md_rdev *rdev)
 {
 	int ret = 0;
 
@@ -215,15 +214,12 @@ void mddev_create_serial_pool(struct mddev *mddev, struct md_rdev *rdev,
 	    !test_bit(CollisionCheck, &rdev->flags))
 		return;
 
-	if (!is_suspend)
-		mddev_suspend(mddev);
-
 	if (!rdev)
 		ret = rdevs_init_serial(mddev);
 	else
 		ret = rdev_init_serial(rdev);
 	if (ret)
-		goto abort;
+		return;
 
 	if (mddev->serial_info_pool == NULL) {
 		/*
@@ -238,10 +234,6 @@ void mddev_create_serial_pool(struct mddev *mddev, struct md_rdev *rdev,
 			pr_err("can't alloc memory pool for serialization\n");
 		}
 	}
-
-abort:
-	if (!is_suspend)
-		mddev_resume(mddev);
 }
 
 /*
@@ -250,8 +242,7 @@ void mddev_create_serial_pool(struct mddev *mddev, struct md_rdev *rdev,
  * 2. when bitmap is destroyed while policy is not enabled.
  * 3. for disable policy, the pool is destroyed only when no rdev needs it.
  */
-void mddev_destroy_serial_pool(struct mddev *mddev, struct md_rdev *rdev,
-			       bool is_suspend)
+void mddev_destroy_serial_pool(struct mddev *mddev, struct md_rdev *rdev)
 {
 	if (rdev && !test_bit(CollisionCheck, &rdev->flags))
 		return;
@@ -260,8 +251,6 @@ void mddev_destroy_serial_pool(struct mddev *mddev, struct md_rdev *rdev,
 		struct md_rdev *temp;
 		int num = 0; /* used to track if other rdevs need the pool */
 
-		if (!is_suspend)
-			mddev_suspend(mddev);
 		rdev_for_each(temp, mddev) {
 			if (!rdev) {
 				if (!mddev->serialize_policy ||
@@ -283,8 +272,6 @@ void mddev_destroy_serial_pool(struct mddev *mddev, struct md_rdev *rdev,
 			mempool_destroy(mddev->serial_info_pool);
 			mddev->serial_info_pool = NULL;
 		}
-		if (!is_suspend)
-			mddev_resume(mddev);
 	}
 }
 
@@ -2557,7 +2544,7 @@ static int bind_rdev_to_array(struct md_rdev *rdev, struct mddev *mddev)
 	pr_debug("md: bind<%s>\n", b);
 
 	if (mddev->raid_disks)
-		mddev_create_serial_pool(mddev, rdev, true);
+		mddev_create_serial_pool(mddev, rdev);
 
 	if ((err = kobject_add(&rdev->kobj, &mddev->kobj, "dev-%s", b)))
 		goto fail;
@@ -2610,7 +2597,7 @@ static void md_kick_rdev_from_array(struct md_rdev *rdev)
 	bd_unlink_disk_holder(rdev->bdev, rdev->mddev->gendisk);
 	list_del_rcu(&rdev->same_set);
 	pr_debug("md: unbind<%pg>\n", rdev->bdev);
-	mddev_destroy_serial_pool(rdev->mddev, rdev, false);
+	mddev_destroy_serial_pool(rdev->mddev, rdev);
 	rdev->mddev = NULL;
 	sysfs_remove_link(&rdev->kobj, "block");
 	sysfs_put(rdev->sysfs_state);
@@ -3077,11 +3064,11 @@ state_store(struct md_rdev *rdev, const char *buf, size_t len)
 		}
 	} else if (cmd_match(buf, "writemostly")) {
 		set_bit(WriteMostly, &rdev->flags);
-		mddev_create_serial_pool(rdev->mddev, rdev, true);
+		mddev_create_serial_pool(rdev->mddev, rdev);
 		need_update_sb = true;
 		err = 0;
 	} else if (cmd_match(buf, "-writemostly")) {
-		mddev_destroy_serial_pool(rdev->mddev, rdev, true);
+		mddev_destroy_serial_pool(rdev->mddev, rdev);
 		clear_bit(WriteMostly, &rdev->flags);
 		need_update_sb = true;
 		err = 0;
@@ -5588,9 +5575,9 @@ serialize_policy_store(struct mddev *mddev, const char *buf, size_t len)
 	}
 
 	if (value)
-		mddev_create_serial_pool(mddev, NULL, true);
+		mddev_create_serial_pool(mddev, NULL);
 	else
-		mddev_destroy_serial_pool(mddev, NULL, true);
+		mddev_destroy_serial_pool(mddev, NULL);
 	mddev->serialize_policy = value;
 unlock:
 	mddev_unlock_and_resume(mddev);
@@ -6356,7 +6343,7 @@ static void __md_stop_writes(struct mddev *mddev)
 	}
 	/* disable policy to guarantee rdevs free resources for serialization */
 	mddev->serialize_policy = 0;
-	mddev_destroy_serial_pool(mddev, NULL, true);
+	mddev_destroy_serial_pool(mddev, NULL);
 }
 
 void md_stop_writes(struct mddev *mddev)
diff --git a/drivers/md/md.h b/drivers/md/md.h
index 5c8f3f045e78..63b4c393b1ee 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -817,10 +817,9 @@ extern void __mddev_resume(struct mddev *mddev);
 
 extern void md_reload_sb(struct mddev *mddev, int raid_disk);
 extern void md_update_sb(struct mddev *mddev, int force);
-extern void mddev_create_serial_pool(struct mddev *mddev, struct md_rdev *rdev,
-				     bool is_suspend);
-extern void mddev_destroy_serial_pool(struct mddev *mddev, struct md_rdev *rdev,
-				      bool is_suspend);
+extern void mddev_create_serial_pool(struct mddev *mddev, struct md_rdev *rdev);
+extern void mddev_destroy_serial_pool(struct mddev *mddev,
+				      struct md_rdev *rdev);
 struct md_rdev *md_find_rdev_nr_rcu(struct mddev *mddev, int nr);
 struct md_rdev *md_find_rdev_rcu(struct mddev *mddev, dev_t dev);
 
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [dm-devel] [PATCH -next v3 21/25] md: cleanup mddev_create/destroy_serial_pool()
@ 2023-09-28  6:15   ` Yu Kuai
  0 siblings, 0 replies; 70+ messages in thread
From: Yu Kuai @ 2023-09-28  6:15 UTC (permalink / raw)
  To: xni, agk, snitzer, dm-devel, song
  Cc: yi.zhang, yangerkun, linux-kernel, linux-raid, yukuai1, yukuai3

From: Yu Kuai <yukuai3@huawei.com>

Now that except for stopping the array, all the callers already suspend
the array, there is no need to suspend anymore, hence remove the second
parameter.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 drivers/md/md-bitmap.c |  8 ++++----
 drivers/md/md.c        | 33 ++++++++++-----------------------
 drivers/md/md.h        |  7 +++----
 3 files changed, 17 insertions(+), 31 deletions(-)

diff --git a/drivers/md/md-bitmap.c b/drivers/md/md-bitmap.c
index b3d701c5c461..9672f75c3050 100644
--- a/drivers/md/md-bitmap.c
+++ b/drivers/md/md-bitmap.c
@@ -1861,7 +1861,7 @@ void md_bitmap_destroy(struct mddev *mddev)
 
 	md_bitmap_wait_behind_writes(mddev);
 	if (!mddev->serialize_policy)
-		mddev_destroy_serial_pool(mddev, NULL, true);
+		mddev_destroy_serial_pool(mddev, NULL);
 
 	mutex_lock(&mddev->bitmap_info.mutex);
 	spin_lock(&mddev->lock);
@@ -1977,7 +1977,7 @@ int md_bitmap_load(struct mddev *mddev)
 		goto out;
 
 	rdev_for_each(rdev, mddev)
-		mddev_create_serial_pool(mddev, rdev, true);
+		mddev_create_serial_pool(mddev, rdev);
 
 	if (mddev_is_clustered(mddev))
 		md_cluster_ops->load_bitmaps(mddev, mddev->bitmap_info.nodes);
@@ -2562,11 +2562,11 @@ backlog_store(struct mddev *mddev, const char *buf, size_t len)
 	if (!backlog && mddev->serial_info_pool) {
 		/* serial_info_pool is not needed if backlog is zero */
 		if (!mddev->serialize_policy)
-			mddev_destroy_serial_pool(mddev, NULL, true);
+			mddev_destroy_serial_pool(mddev, NULL);
 	} else if (backlog && !mddev->serial_info_pool) {
 		/* serial_info_pool is needed since backlog is not zero */
 		rdev_for_each(rdev, mddev)
-			mddev_create_serial_pool(mddev, rdev, true);
+			mddev_create_serial_pool(mddev, rdev);
 	}
 	if (old_mwb != backlog)
 		md_bitmap_update_sb(mddev->bitmap);
diff --git a/drivers/md/md.c b/drivers/md/md.c
index f8d92d745105..c3a51d309063 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -206,8 +206,7 @@ static int rdev_need_serial(struct md_rdev *rdev)
  * 1. rdev is the first device which return true from rdev_enable_serial.
  * 2. rdev is NULL, means we want to enable serialization for all rdevs.
  */
-void mddev_create_serial_pool(struct mddev *mddev, struct md_rdev *rdev,
-			      bool is_suspend)
+void mddev_create_serial_pool(struct mddev *mddev, struct md_rdev *rdev)
 {
 	int ret = 0;
 
@@ -215,15 +214,12 @@ void mddev_create_serial_pool(struct mddev *mddev, struct md_rdev *rdev,
 	    !test_bit(CollisionCheck, &rdev->flags))
 		return;
 
-	if (!is_suspend)
-		mddev_suspend(mddev);
-
 	if (!rdev)
 		ret = rdevs_init_serial(mddev);
 	else
 		ret = rdev_init_serial(rdev);
 	if (ret)
-		goto abort;
+		return;
 
 	if (mddev->serial_info_pool == NULL) {
 		/*
@@ -238,10 +234,6 @@ void mddev_create_serial_pool(struct mddev *mddev, struct md_rdev *rdev,
 			pr_err("can't alloc memory pool for serialization\n");
 		}
 	}
-
-abort:
-	if (!is_suspend)
-		mddev_resume(mddev);
 }
 
 /*
@@ -250,8 +242,7 @@ void mddev_create_serial_pool(struct mddev *mddev, struct md_rdev *rdev,
  * 2. when bitmap is destroyed while policy is not enabled.
  * 3. for disable policy, the pool is destroyed only when no rdev needs it.
  */
-void mddev_destroy_serial_pool(struct mddev *mddev, struct md_rdev *rdev,
-			       bool is_suspend)
+void mddev_destroy_serial_pool(struct mddev *mddev, struct md_rdev *rdev)
 {
 	if (rdev && !test_bit(CollisionCheck, &rdev->flags))
 		return;
@@ -260,8 +251,6 @@ void mddev_destroy_serial_pool(struct mddev *mddev, struct md_rdev *rdev,
 		struct md_rdev *temp;
 		int num = 0; /* used to track if other rdevs need the pool */
 
-		if (!is_suspend)
-			mddev_suspend(mddev);
 		rdev_for_each(temp, mddev) {
 			if (!rdev) {
 				if (!mddev->serialize_policy ||
@@ -283,8 +272,6 @@ void mddev_destroy_serial_pool(struct mddev *mddev, struct md_rdev *rdev,
 			mempool_destroy(mddev->serial_info_pool);
 			mddev->serial_info_pool = NULL;
 		}
-		if (!is_suspend)
-			mddev_resume(mddev);
 	}
 }
 
@@ -2557,7 +2544,7 @@ static int bind_rdev_to_array(struct md_rdev *rdev, struct mddev *mddev)
 	pr_debug("md: bind<%s>\n", b);
 
 	if (mddev->raid_disks)
-		mddev_create_serial_pool(mddev, rdev, true);
+		mddev_create_serial_pool(mddev, rdev);
 
 	if ((err = kobject_add(&rdev->kobj, &mddev->kobj, "dev-%s", b)))
 		goto fail;
@@ -2610,7 +2597,7 @@ static void md_kick_rdev_from_array(struct md_rdev *rdev)
 	bd_unlink_disk_holder(rdev->bdev, rdev->mddev->gendisk);
 	list_del_rcu(&rdev->same_set);
 	pr_debug("md: unbind<%pg>\n", rdev->bdev);
-	mddev_destroy_serial_pool(rdev->mddev, rdev, false);
+	mddev_destroy_serial_pool(rdev->mddev, rdev);
 	rdev->mddev = NULL;
 	sysfs_remove_link(&rdev->kobj, "block");
 	sysfs_put(rdev->sysfs_state);
@@ -3077,11 +3064,11 @@ state_store(struct md_rdev *rdev, const char *buf, size_t len)
 		}
 	} else if (cmd_match(buf, "writemostly")) {
 		set_bit(WriteMostly, &rdev->flags);
-		mddev_create_serial_pool(rdev->mddev, rdev, true);
+		mddev_create_serial_pool(rdev->mddev, rdev);
 		need_update_sb = true;
 		err = 0;
 	} else if (cmd_match(buf, "-writemostly")) {
-		mddev_destroy_serial_pool(rdev->mddev, rdev, true);
+		mddev_destroy_serial_pool(rdev->mddev, rdev);
 		clear_bit(WriteMostly, &rdev->flags);
 		need_update_sb = true;
 		err = 0;
@@ -5588,9 +5575,9 @@ serialize_policy_store(struct mddev *mddev, const char *buf, size_t len)
 	}
 
 	if (value)
-		mddev_create_serial_pool(mddev, NULL, true);
+		mddev_create_serial_pool(mddev, NULL);
 	else
-		mddev_destroy_serial_pool(mddev, NULL, true);
+		mddev_destroy_serial_pool(mddev, NULL);
 	mddev->serialize_policy = value;
 unlock:
 	mddev_unlock_and_resume(mddev);
@@ -6356,7 +6343,7 @@ static void __md_stop_writes(struct mddev *mddev)
 	}
 	/* disable policy to guarantee rdevs free resources for serialization */
 	mddev->serialize_policy = 0;
-	mddev_destroy_serial_pool(mddev, NULL, true);
+	mddev_destroy_serial_pool(mddev, NULL);
 }
 
 void md_stop_writes(struct mddev *mddev)
diff --git a/drivers/md/md.h b/drivers/md/md.h
index 5c8f3f045e78..63b4c393b1ee 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -817,10 +817,9 @@ extern void __mddev_resume(struct mddev *mddev);
 
 extern void md_reload_sb(struct mddev *mddev, int raid_disk);
 extern void md_update_sb(struct mddev *mddev, int force);
-extern void mddev_create_serial_pool(struct mddev *mddev, struct md_rdev *rdev,
-				     bool is_suspend);
-extern void mddev_destroy_serial_pool(struct mddev *mddev, struct md_rdev *rdev,
-				      bool is_suspend);
+extern void mddev_create_serial_pool(struct mddev *mddev, struct md_rdev *rdev);
+extern void mddev_destroy_serial_pool(struct mddev *mddev,
+				      struct md_rdev *rdev);
 struct md_rdev *md_find_rdev_nr_rcu(struct mddev *mddev, int nr);
 struct md_rdev *md_find_rdev_rcu(struct mddev *mddev, dev_t dev);
 
-- 
2.39.2

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [PATCH -next v3 22/25] md/md-linear: cleanup linear_add()
  2023-09-28  6:15 ` [dm-devel] " Yu Kuai
@ 2023-09-28  6:15   ` Yu Kuai
  -1 siblings, 0 replies; 70+ messages in thread
From: Yu Kuai @ 2023-09-28  6:15 UTC (permalink / raw)
  To: xni, agk, snitzer, dm-devel, song
  Cc: linux-kernel, linux-raid, yukuai3, yukuai1, yi.zhang, yangerkun

From: Yu Kuai <yukuai3@huawei.com>

Now that caller already suspend the array, there is no need to suspend
array in liner_add().

Note that mddev_suspend/resume() is not used anymore.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 drivers/md/md-linear.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/md/md-linear.c b/drivers/md/md-linear.c
index ae2826e9645b..8eca7693b793 100644
--- a/drivers/md/md-linear.c
+++ b/drivers/md/md-linear.c
@@ -183,7 +183,6 @@ static int linear_add(struct mddev *mddev, struct md_rdev *rdev)
 	 * in linear_congested(), therefore kfree_rcu() is used to free
 	 * oldconf until no one uses it anymore.
 	 */
-	mddev_suspend(mddev);
 	oldconf = rcu_dereference_protected(mddev->private,
 			lockdep_is_held(&mddev->reconfig_mutex));
 	mddev->raid_disks++;
@@ -192,7 +191,6 @@ static int linear_add(struct mddev *mddev, struct md_rdev *rdev)
 	rcu_assign_pointer(mddev->private, newconf);
 	md_set_array_sectors(mddev, linear_size(mddev, 0, 0));
 	set_capacity_and_notify(mddev->gendisk, mddev->array_sectors);
-	mddev_resume(mddev);
 	kfree_rcu(oldconf, rcu);
 	return 0;
 }
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [dm-devel] [PATCH -next v3 22/25] md/md-linear: cleanup linear_add()
@ 2023-09-28  6:15   ` Yu Kuai
  0 siblings, 0 replies; 70+ messages in thread
From: Yu Kuai @ 2023-09-28  6:15 UTC (permalink / raw)
  To: xni, agk, snitzer, dm-devel, song
  Cc: yi.zhang, yangerkun, linux-kernel, linux-raid, yukuai1, yukuai3

From: Yu Kuai <yukuai3@huawei.com>

Now that caller already suspend the array, there is no need to suspend
array in liner_add().

Note that mddev_suspend/resume() is not used anymore.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 drivers/md/md-linear.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/md/md-linear.c b/drivers/md/md-linear.c
index ae2826e9645b..8eca7693b793 100644
--- a/drivers/md/md-linear.c
+++ b/drivers/md/md-linear.c
@@ -183,7 +183,6 @@ static int linear_add(struct mddev *mddev, struct md_rdev *rdev)
 	 * in linear_congested(), therefore kfree_rcu() is used to free
 	 * oldconf until no one uses it anymore.
 	 */
-	mddev_suspend(mddev);
 	oldconf = rcu_dereference_protected(mddev->private,
 			lockdep_is_held(&mddev->reconfig_mutex));
 	mddev->raid_disks++;
@@ -192,7 +191,6 @@ static int linear_add(struct mddev *mddev, struct md_rdev *rdev)
 	rcu_assign_pointer(mddev->private, newconf);
 	md_set_array_sectors(mddev, linear_size(mddev, 0, 0));
 	set_capacity_and_notify(mddev->gendisk, mddev->array_sectors);
-	mddev_resume(mddev);
 	kfree_rcu(oldconf, rcu);
 	return 0;
 }
-- 
2.39.2

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [PATCH -next v3 23/25] md: suspend array in md_start_sync() if array need reconfiguration
  2023-09-28  6:15 ` [dm-devel] " Yu Kuai
@ 2023-09-28  6:15   ` Yu Kuai
  -1 siblings, 0 replies; 70+ messages in thread
From: Yu Kuai @ 2023-09-28  6:15 UTC (permalink / raw)
  To: xni, agk, snitzer, dm-devel, song
  Cc: linux-kernel, linux-raid, yukuai3, yukuai1, yi.zhang, yangerkun

From: Yu Kuai <yukuai3@huawei.com>

So that io won't concurrent with array reconfiguration, and it's safe to
suspend the array directly because normal io won't rely on
md_start_sync().

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 drivers/md/md.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index c3a51d309063..a3b62c6c5332 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -9414,8 +9414,13 @@ static void md_start_sync(struct work_struct *ws)
 {
 	struct mddev *mddev = container_of(ws, struct mddev, sync_work);
 	int spares = 0;
+	bool suspend = false;
 
-	mddev_lock_nointr(mddev);
+	if (md_spares_need_change(mddev))
+		suspend = true;
+
+	suspend ? mddev_suspend_and_lock_nointr(mddev) :
+		  mddev_lock_nointr(mddev);
 
 	if (!md_is_rdwr(mddev)) {
 		/*
@@ -9451,7 +9456,7 @@ static void md_start_sync(struct work_struct *ws)
 		goto not_running;
 	}
 
-	mddev_unlock(mddev);
+	suspend ? mddev_unlock_and_resume(mddev) : mddev_unlock(mddev);
 	md_wakeup_thread(mddev->sync_thread);
 	sysfs_notify_dirent_safe(mddev->sysfs_action);
 	md_new_event();
@@ -9463,7 +9468,7 @@ static void md_start_sync(struct work_struct *ws)
 	clear_bit(MD_RECOVERY_REQUESTED, &mddev->recovery);
 	clear_bit(MD_RECOVERY_CHECK, &mddev->recovery);
 	clear_bit(MD_RECOVERY_RUNNING, &mddev->recovery);
-	mddev_unlock(mddev);
+	suspend ? mddev_unlock_and_resume(mddev) : mddev_unlock(mddev);
 
 	wake_up(&resync_wait);
 	if (test_and_clear_bit(MD_RECOVERY_RECOVER, &mddev->recovery) &&
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [dm-devel] [PATCH -next v3 23/25] md: suspend array in md_start_sync() if array need reconfiguration
@ 2023-09-28  6:15   ` Yu Kuai
  0 siblings, 0 replies; 70+ messages in thread
From: Yu Kuai @ 2023-09-28  6:15 UTC (permalink / raw)
  To: xni, agk, snitzer, dm-devel, song
  Cc: yi.zhang, yangerkun, linux-kernel, linux-raid, yukuai1, yukuai3

From: Yu Kuai <yukuai3@huawei.com>

So that io won't concurrent with array reconfiguration, and it's safe to
suspend the array directly because normal io won't rely on
md_start_sync().

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 drivers/md/md.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index c3a51d309063..a3b62c6c5332 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -9414,8 +9414,13 @@ static void md_start_sync(struct work_struct *ws)
 {
 	struct mddev *mddev = container_of(ws, struct mddev, sync_work);
 	int spares = 0;
+	bool suspend = false;
 
-	mddev_lock_nointr(mddev);
+	if (md_spares_need_change(mddev))
+		suspend = true;
+
+	suspend ? mddev_suspend_and_lock_nointr(mddev) :
+		  mddev_lock_nointr(mddev);
 
 	if (!md_is_rdwr(mddev)) {
 		/*
@@ -9451,7 +9456,7 @@ static void md_start_sync(struct work_struct *ws)
 		goto not_running;
 	}
 
-	mddev_unlock(mddev);
+	suspend ? mddev_unlock_and_resume(mddev) : mddev_unlock(mddev);
 	md_wakeup_thread(mddev->sync_thread);
 	sysfs_notify_dirent_safe(mddev->sysfs_action);
 	md_new_event();
@@ -9463,7 +9468,7 @@ static void md_start_sync(struct work_struct *ws)
 	clear_bit(MD_RECOVERY_REQUESTED, &mddev->recovery);
 	clear_bit(MD_RECOVERY_CHECK, &mddev->recovery);
 	clear_bit(MD_RECOVERY_RUNNING, &mddev->recovery);
-	mddev_unlock(mddev);
+	suspend ? mddev_unlock_and_resume(mddev) : mddev_unlock(mddev);
 
 	wake_up(&resync_wait);
 	if (test_and_clear_bit(MD_RECOVERY_RECOVER, &mddev->recovery) &&
-- 
2.39.2

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [PATCH -next v3 24/25] md: remove old apis to suspend the array
  2023-09-28  6:15 ` [dm-devel] " Yu Kuai
@ 2023-09-28  6:15   ` Yu Kuai
  -1 siblings, 0 replies; 70+ messages in thread
From: Yu Kuai @ 2023-09-28  6:15 UTC (permalink / raw)
  To: xni, agk, snitzer, dm-devel, song
  Cc: linux-kernel, linux-raid, yukuai3, yukuai1, yi.zhang, yangerkun

From: Yu Kuai <yukuai3@huawei.com>

Now that mddev_suspend() and mddev_resume() is not used anywhere, remove
them, and remove 'MD_ALLOW_SB_UPDATE' and 'MD_UPDATING_SB' as well.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 drivers/md/md.c | 82 ++-----------------------------------------------
 drivers/md/md.h |  8 -----
 2 files changed, 3 insertions(+), 87 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index a3b62c6c5332..271d3f336026 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -418,74 +418,10 @@ static void md_submit_bio(struct bio *bio)
 	md_handle_request(mddev, bio);
 }
 
-/* mddev_suspend makes sure no new requests are submitted
- * to the device, and that any requests that have been submitted
- * are completely handled.
- * Once mddev_detach() is called and completes, the module will be
- * completely unused.
+/*
+ * Make sure no new requests are submitted to the device, and any requests that
+ * have been submitted are completely handled.
  */
-void mddev_suspend(struct mddev *mddev)
-{
-	struct md_thread *thread = rcu_dereference_protected(mddev->thread,
-			lockdep_is_held(&mddev->reconfig_mutex));
-
-	WARN_ON_ONCE(thread && current == thread->tsk);
-
-	/* can't concurrent with __mddev_suspend() and __mddev_resume() */
-	mutex_lock(&mddev->suspend_mutex);
-	if (mddev->suspended++) {
-		mutex_unlock(&mddev->suspend_mutex);
-		return;
-	}
-
-	wake_up(&mddev->sb_wait);
-	set_bit(MD_ALLOW_SB_UPDATE, &mddev->flags);
-	percpu_ref_kill(&mddev->active_io);
-
-	/*
-	 * TODO: cleanup 'pers->prepare_suspend after all callers are replaced
-	 * by __mddev_suspend().
-	 */
-	if (mddev->pers && mddev->pers->prepare_suspend)
-		mddev->pers->prepare_suspend(mddev);
-
-	wait_event(mddev->sb_wait, percpu_ref_is_zero(&mddev->active_io));
-	clear_bit_unlock(MD_ALLOW_SB_UPDATE, &mddev->flags);
-	wait_event(mddev->sb_wait, !test_bit(MD_UPDATING_SB, &mddev->flags));
-
-	del_timer_sync(&mddev->safemode_timer);
-	/* restrict memory reclaim I/O during raid array is suspend */
-	mddev->noio_flag = memalloc_noio_save();
-
-	mutex_unlock(&mddev->suspend_mutex);
-}
-EXPORT_SYMBOL_GPL(mddev_suspend);
-
-void mddev_resume(struct mddev *mddev)
-{
-	lockdep_assert_held(&mddev->reconfig_mutex);
-
-	/* can't concurrent with __mddev_suspend() and __mddev_resume() */
-	mutex_lock(&mddev->suspend_mutex);
-	if (--mddev->suspended) {
-		mutex_unlock(&mddev->suspend_mutex);
-		return;
-	}
-
-	/* entred the memalloc scope from mddev_suspend() */
-	memalloc_noio_restore(mddev->noio_flag);
-
-	percpu_ref_resurrect(&mddev->active_io);
-	wake_up(&mddev->sb_wait);
-
-	set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
-	md_wakeup_thread(mddev->thread);
-	md_wakeup_thread(mddev->sync_thread); /* possibly kick off a reshape */
-
-	mutex_unlock(&mddev->suspend_mutex);
-}
-EXPORT_SYMBOL_GPL(mddev_resume);
-
 int __mddev_suspend(struct mddev *mddev, bool interruptible)
 {
 	int err = 0;
@@ -9500,18 +9436,6 @@ static void md_start_sync(struct work_struct *ws)
  */
 void md_check_recovery(struct mddev *mddev)
 {
-	if (test_bit(MD_ALLOW_SB_UPDATE, &mddev->flags) && mddev->sb_flags) {
-		/* Write superblock - thread that called mddev_suspend()
-		 * holds reconfig_mutex for us.
-		 */
-		set_bit(MD_UPDATING_SB, &mddev->flags);
-		smp_mb__after_atomic();
-		if (test_bit(MD_ALLOW_SB_UPDATE, &mddev->flags))
-			md_update_sb(mddev, 0);
-		clear_bit_unlock(MD_UPDATING_SB, &mddev->flags);
-		wake_up(&mddev->sb_wait);
-	}
-
 	if (READ_ONCE(mddev->suspended))
 		return;
 
diff --git a/drivers/md/md.h b/drivers/md/md.h
index 63b4c393b1ee..4c5f3f032656 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -248,10 +248,6 @@ struct md_cluster_info;
  *			    become failed.
  * @MD_HAS_PPL:  The raid array has PPL feature set.
  * @MD_HAS_MULTIPLE_PPLS: The raid array has multiple PPLs feature set.
- * @MD_ALLOW_SB_UPDATE: md_check_recovery is allowed to update the metadata
- *			 without taking reconfig_mutex.
- * @MD_UPDATING_SB: md_check_recovery is updating the metadata without
- *		     explicitly holding reconfig_mutex.
  * @MD_NOT_READY: do_md_run() is active, so 'array_state', ust not report that
  *		   array is ready yet.
  * @MD_BROKEN: This is used to stop writes and mark array as failed.
@@ -268,8 +264,6 @@ enum mddev_flags {
 	MD_FAILFAST_SUPPORTED,
 	MD_HAS_PPL,
 	MD_HAS_MULTIPLE_PPLS,
-	MD_ALLOW_SB_UPDATE,
-	MD_UPDATING_SB,
 	MD_NOT_READY,
 	MD_BROKEN,
 	MD_DELETED,
@@ -810,8 +804,6 @@ extern int md_rdev_init(struct md_rdev *rdev);
 extern void md_rdev_clear(struct md_rdev *rdev);
 
 extern void md_handle_request(struct mddev *mddev, struct bio *bio);
-extern void mddev_suspend(struct mddev *mddev);
-extern void mddev_resume(struct mddev *mddev);
 extern int __mddev_suspend(struct mddev *mddev, bool interruptible);
 extern void __mddev_resume(struct mddev *mddev);
 
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [dm-devel] [PATCH -next v3 24/25] md: remove old apis to suspend the array
@ 2023-09-28  6:15   ` Yu Kuai
  0 siblings, 0 replies; 70+ messages in thread
From: Yu Kuai @ 2023-09-28  6:15 UTC (permalink / raw)
  To: xni, agk, snitzer, dm-devel, song
  Cc: yi.zhang, yangerkun, linux-kernel, linux-raid, yukuai1, yukuai3

From: Yu Kuai <yukuai3@huawei.com>

Now that mddev_suspend() and mddev_resume() is not used anywhere, remove
them, and remove 'MD_ALLOW_SB_UPDATE' and 'MD_UPDATING_SB' as well.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 drivers/md/md.c | 82 ++-----------------------------------------------
 drivers/md/md.h |  8 -----
 2 files changed, 3 insertions(+), 87 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index a3b62c6c5332..271d3f336026 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -418,74 +418,10 @@ static void md_submit_bio(struct bio *bio)
 	md_handle_request(mddev, bio);
 }
 
-/* mddev_suspend makes sure no new requests are submitted
- * to the device, and that any requests that have been submitted
- * are completely handled.
- * Once mddev_detach() is called and completes, the module will be
- * completely unused.
+/*
+ * Make sure no new requests are submitted to the device, and any requests that
+ * have been submitted are completely handled.
  */
-void mddev_suspend(struct mddev *mddev)
-{
-	struct md_thread *thread = rcu_dereference_protected(mddev->thread,
-			lockdep_is_held(&mddev->reconfig_mutex));
-
-	WARN_ON_ONCE(thread && current == thread->tsk);
-
-	/* can't concurrent with __mddev_suspend() and __mddev_resume() */
-	mutex_lock(&mddev->suspend_mutex);
-	if (mddev->suspended++) {
-		mutex_unlock(&mddev->suspend_mutex);
-		return;
-	}
-
-	wake_up(&mddev->sb_wait);
-	set_bit(MD_ALLOW_SB_UPDATE, &mddev->flags);
-	percpu_ref_kill(&mddev->active_io);
-
-	/*
-	 * TODO: cleanup 'pers->prepare_suspend after all callers are replaced
-	 * by __mddev_suspend().
-	 */
-	if (mddev->pers && mddev->pers->prepare_suspend)
-		mddev->pers->prepare_suspend(mddev);
-
-	wait_event(mddev->sb_wait, percpu_ref_is_zero(&mddev->active_io));
-	clear_bit_unlock(MD_ALLOW_SB_UPDATE, &mddev->flags);
-	wait_event(mddev->sb_wait, !test_bit(MD_UPDATING_SB, &mddev->flags));
-
-	del_timer_sync(&mddev->safemode_timer);
-	/* restrict memory reclaim I/O during raid array is suspend */
-	mddev->noio_flag = memalloc_noio_save();
-
-	mutex_unlock(&mddev->suspend_mutex);
-}
-EXPORT_SYMBOL_GPL(mddev_suspend);
-
-void mddev_resume(struct mddev *mddev)
-{
-	lockdep_assert_held(&mddev->reconfig_mutex);
-
-	/* can't concurrent with __mddev_suspend() and __mddev_resume() */
-	mutex_lock(&mddev->suspend_mutex);
-	if (--mddev->suspended) {
-		mutex_unlock(&mddev->suspend_mutex);
-		return;
-	}
-
-	/* entred the memalloc scope from mddev_suspend() */
-	memalloc_noio_restore(mddev->noio_flag);
-
-	percpu_ref_resurrect(&mddev->active_io);
-	wake_up(&mddev->sb_wait);
-
-	set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
-	md_wakeup_thread(mddev->thread);
-	md_wakeup_thread(mddev->sync_thread); /* possibly kick off a reshape */
-
-	mutex_unlock(&mddev->suspend_mutex);
-}
-EXPORT_SYMBOL_GPL(mddev_resume);
-
 int __mddev_suspend(struct mddev *mddev, bool interruptible)
 {
 	int err = 0;
@@ -9500,18 +9436,6 @@ static void md_start_sync(struct work_struct *ws)
  */
 void md_check_recovery(struct mddev *mddev)
 {
-	if (test_bit(MD_ALLOW_SB_UPDATE, &mddev->flags) && mddev->sb_flags) {
-		/* Write superblock - thread that called mddev_suspend()
-		 * holds reconfig_mutex for us.
-		 */
-		set_bit(MD_UPDATING_SB, &mddev->flags);
-		smp_mb__after_atomic();
-		if (test_bit(MD_ALLOW_SB_UPDATE, &mddev->flags))
-			md_update_sb(mddev, 0);
-		clear_bit_unlock(MD_UPDATING_SB, &mddev->flags);
-		wake_up(&mddev->sb_wait);
-	}
-
 	if (READ_ONCE(mddev->suspended))
 		return;
 
diff --git a/drivers/md/md.h b/drivers/md/md.h
index 63b4c393b1ee..4c5f3f032656 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -248,10 +248,6 @@ struct md_cluster_info;
  *			    become failed.
  * @MD_HAS_PPL:  The raid array has PPL feature set.
  * @MD_HAS_MULTIPLE_PPLS: The raid array has multiple PPLs feature set.
- * @MD_ALLOW_SB_UPDATE: md_check_recovery is allowed to update the metadata
- *			 without taking reconfig_mutex.
- * @MD_UPDATING_SB: md_check_recovery is updating the metadata without
- *		     explicitly holding reconfig_mutex.
  * @MD_NOT_READY: do_md_run() is active, so 'array_state', ust not report that
  *		   array is ready yet.
  * @MD_BROKEN: This is used to stop writes and mark array as failed.
@@ -268,8 +264,6 @@ enum mddev_flags {
 	MD_FAILFAST_SUPPORTED,
 	MD_HAS_PPL,
 	MD_HAS_MULTIPLE_PPLS,
-	MD_ALLOW_SB_UPDATE,
-	MD_UPDATING_SB,
 	MD_NOT_READY,
 	MD_BROKEN,
 	MD_DELETED,
@@ -810,8 +804,6 @@ extern int md_rdev_init(struct md_rdev *rdev);
 extern void md_rdev_clear(struct md_rdev *rdev);
 
 extern void md_handle_request(struct mddev *mddev, struct bio *bio);
-extern void mddev_suspend(struct mddev *mddev);
-extern void mddev_resume(struct mddev *mddev);
 extern int __mddev_suspend(struct mddev *mddev, bool interruptible);
 extern void __mddev_resume(struct mddev *mddev);
 
-- 
2.39.2

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [PATCH -next v3 25/25] md: rename __mddev_suspend/resume() back to mddev_suspend/resume()
  2023-09-28  6:15 ` [dm-devel] " Yu Kuai
@ 2023-09-28  6:15   ` Yu Kuai
  -1 siblings, 0 replies; 70+ messages in thread
From: Yu Kuai @ 2023-09-28  6:15 UTC (permalink / raw)
  To: xni, agk, snitzer, dm-devel, song
  Cc: linux-kernel, linux-raid, yukuai3, yukuai1, yi.zhang, yangerkun

From: Yu Kuai <yukuai3@huawei.com>

Now that the old apis are removed, __mddev_suspend/resume() can be
renamed to their original names.

This is done by:

sed -i "s/__mddev_suspend/mddev_suspend/g" *.[ch]
sed -i "s/__mddev_resume/mddev_resume/g" *.[ch]

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 drivers/md/dm-raid.c     |  4 ++--
 drivers/md/md.c          | 18 +++++++++---------
 drivers/md/md.h          | 12 ++++++------
 drivers/md/raid5-cache.c |  4 ++--
 4 files changed, 19 insertions(+), 19 deletions(-)

diff --git a/drivers/md/dm-raid.c b/drivers/md/dm-raid.c
index 05dd6ccf6f48..a4692f8f98ee 100644
--- a/drivers/md/dm-raid.c
+++ b/drivers/md/dm-raid.c
@@ -3797,7 +3797,7 @@ static void raid_postsuspend(struct dm_target *ti)
 		if (!test_bit(MD_RECOVERY_FROZEN, &rs->md.recovery))
 			md_stop_writes(&rs->md);
 
-		__mddev_suspend(&rs->md, false);
+		mddev_suspend(&rs->md, false);
 	}
 }
 
@@ -4009,7 +4009,7 @@ static int raid_preresume(struct dm_target *ti)
 	}
 
 	/* Check for any resize/reshape on @rs and adjust/initiate */
-	/* Be prepared for __mddev_resume() in raid_resume() */
+	/* Be prepared for mddev_resume() in raid_resume() */
 	set_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
 	if (mddev->recovery_cp && mddev->recovery_cp < MaxSector) {
 		set_bit(MD_RECOVERY_REQUESTED, &mddev->recovery);
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 271d3f336026..b711eaf53e41 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -422,7 +422,7 @@ static void md_submit_bio(struct bio *bio)
  * Make sure no new requests are submitted to the device, and any requests that
  * have been submitted are completely handled.
  */
-int __mddev_suspend(struct mddev *mddev, bool interruptible)
+int mddev_suspend(struct mddev *mddev, bool interruptible)
 {
 	int err = 0;
 
@@ -473,9 +473,9 @@ int __mddev_suspend(struct mddev *mddev, bool interruptible)
 	mutex_unlock(&mddev->suspend_mutex);
 	return 0;
 }
-EXPORT_SYMBOL_GPL(__mddev_suspend);
+EXPORT_SYMBOL_GPL(mddev_suspend);
 
-void __mddev_resume(struct mddev *mddev)
+void mddev_resume(struct mddev *mddev)
 {
 	lockdep_assert_not_held(&mddev->reconfig_mutex);
 
@@ -486,7 +486,7 @@ void __mddev_resume(struct mddev *mddev)
 		return;
 	}
 
-	/* entred the memalloc scope from __mddev_suspend() */
+	/* entred the memalloc scope from mddev_suspend() */
 	memalloc_noio_restore(mddev->noio_flag);
 
 	percpu_ref_resurrect(&mddev->active_io);
@@ -498,7 +498,7 @@ void __mddev_resume(struct mddev *mddev)
 
 	mutex_unlock(&mddev->suspend_mutex);
 }
-EXPORT_SYMBOL_GPL(__mddev_resume);
+EXPORT_SYMBOL_GPL(mddev_resume);
 
 /*
  * Generic flush handling for md
@@ -5216,12 +5216,12 @@ suspend_lo_store(struct mddev *mddev, const char *buf, size_t len)
 	if (new != (sector_t)new)
 		return -EINVAL;
 
-	err = __mddev_suspend(mddev, true);
+	err = mddev_suspend(mddev, true);
 	if (err)
 		return err;
 
 	WRITE_ONCE(mddev->suspend_lo, new);
-	__mddev_resume(mddev);
+	mddev_resume(mddev);
 
 	return len;
 }
@@ -5247,12 +5247,12 @@ suspend_hi_store(struct mddev *mddev, const char *buf, size_t len)
 	if (new != (sector_t)new)
 		return -EINVAL;
 
-	err = __mddev_suspend(mddev, true);
+	err = mddev_suspend(mddev, true);
 	if (err)
 		return err;
 
 	WRITE_ONCE(mddev->suspend_hi, new);
-	__mddev_resume(mddev);
+	mddev_resume(mddev);
 
 	return len;
 }
diff --git a/drivers/md/md.h b/drivers/md/md.h
index 4c5f3f032656..55d01d431418 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -804,8 +804,8 @@ extern int md_rdev_init(struct md_rdev *rdev);
 extern void md_rdev_clear(struct md_rdev *rdev);
 
 extern void md_handle_request(struct mddev *mddev, struct bio *bio);
-extern int __mddev_suspend(struct mddev *mddev, bool interruptible);
-extern void __mddev_resume(struct mddev *mddev);
+extern int mddev_suspend(struct mddev *mddev, bool interruptible);
+extern void mddev_resume(struct mddev *mddev);
 
 extern void md_reload_sb(struct mddev *mddev, int raid_disk);
 extern void md_update_sb(struct mddev *mddev, int force);
@@ -853,27 +853,27 @@ static inline int mddev_suspend_and_lock(struct mddev *mddev)
 {
 	int ret;
 
-	ret = __mddev_suspend(mddev, true);
+	ret = mddev_suspend(mddev, true);
 	if (ret)
 		return ret;
 
 	ret = mddev_lock(mddev);
 	if (ret)
-		__mddev_resume(mddev);
+		mddev_resume(mddev);
 
 	return ret;
 }
 
 static inline void mddev_suspend_and_lock_nointr(struct mddev *mddev)
 {
-	__mddev_suspend(mddev, false);
+	mddev_suspend(mddev, false);
 	mutex_lock(&mddev->reconfig_mutex);
 }
 
 static inline void mddev_unlock_and_resume(struct mddev *mddev)
 {
 	mddev_unlock(mddev);
-	__mddev_resume(mddev);
+	mddev_resume(mddev);
 }
 
 struct mdu_array_info_s;
diff --git a/drivers/md/raid5-cache.c b/drivers/md/raid5-cache.c
index 9909110262ee..6157f5beb9fe 100644
--- a/drivers/md/raid5-cache.c
+++ b/drivers/md/raid5-cache.c
@@ -699,9 +699,9 @@ static void r5c_disable_writeback_async(struct work_struct *work)
 
 	log = READ_ONCE(conf->log);
 	if (log) {
-		__mddev_suspend(mddev, false);
+		mddev_suspend(mddev, false);
 		log->r5c_journal_mode = R5C_JOURNAL_MODE_WRITE_THROUGH;
-		__mddev_resume(mddev);
+		mddev_resume(mddev);
 	}
 }
 
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [dm-devel] [PATCH -next v3 25/25] md: rename __mddev_suspend/resume() back to mddev_suspend/resume()
@ 2023-09-28  6:15   ` Yu Kuai
  0 siblings, 0 replies; 70+ messages in thread
From: Yu Kuai @ 2023-09-28  6:15 UTC (permalink / raw)
  To: xni, agk, snitzer, dm-devel, song
  Cc: yi.zhang, yangerkun, linux-kernel, linux-raid, yukuai1, yukuai3

From: Yu Kuai <yukuai3@huawei.com>

Now that the old apis are removed, __mddev_suspend/resume() can be
renamed to their original names.

This is done by:

sed -i "s/__mddev_suspend/mddev_suspend/g" *.[ch]
sed -i "s/__mddev_resume/mddev_resume/g" *.[ch]

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 drivers/md/dm-raid.c     |  4 ++--
 drivers/md/md.c          | 18 +++++++++---------
 drivers/md/md.h          | 12 ++++++------
 drivers/md/raid5-cache.c |  4 ++--
 4 files changed, 19 insertions(+), 19 deletions(-)

diff --git a/drivers/md/dm-raid.c b/drivers/md/dm-raid.c
index 05dd6ccf6f48..a4692f8f98ee 100644
--- a/drivers/md/dm-raid.c
+++ b/drivers/md/dm-raid.c
@@ -3797,7 +3797,7 @@ static void raid_postsuspend(struct dm_target *ti)
 		if (!test_bit(MD_RECOVERY_FROZEN, &rs->md.recovery))
 			md_stop_writes(&rs->md);
 
-		__mddev_suspend(&rs->md, false);
+		mddev_suspend(&rs->md, false);
 	}
 }
 
@@ -4009,7 +4009,7 @@ static int raid_preresume(struct dm_target *ti)
 	}
 
 	/* Check for any resize/reshape on @rs and adjust/initiate */
-	/* Be prepared for __mddev_resume() in raid_resume() */
+	/* Be prepared for mddev_resume() in raid_resume() */
 	set_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
 	if (mddev->recovery_cp && mddev->recovery_cp < MaxSector) {
 		set_bit(MD_RECOVERY_REQUESTED, &mddev->recovery);
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 271d3f336026..b711eaf53e41 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -422,7 +422,7 @@ static void md_submit_bio(struct bio *bio)
  * Make sure no new requests are submitted to the device, and any requests that
  * have been submitted are completely handled.
  */
-int __mddev_suspend(struct mddev *mddev, bool interruptible)
+int mddev_suspend(struct mddev *mddev, bool interruptible)
 {
 	int err = 0;
 
@@ -473,9 +473,9 @@ int __mddev_suspend(struct mddev *mddev, bool interruptible)
 	mutex_unlock(&mddev->suspend_mutex);
 	return 0;
 }
-EXPORT_SYMBOL_GPL(__mddev_suspend);
+EXPORT_SYMBOL_GPL(mddev_suspend);
 
-void __mddev_resume(struct mddev *mddev)
+void mddev_resume(struct mddev *mddev)
 {
 	lockdep_assert_not_held(&mddev->reconfig_mutex);
 
@@ -486,7 +486,7 @@ void __mddev_resume(struct mddev *mddev)
 		return;
 	}
 
-	/* entred the memalloc scope from __mddev_suspend() */
+	/* entred the memalloc scope from mddev_suspend() */
 	memalloc_noio_restore(mddev->noio_flag);
 
 	percpu_ref_resurrect(&mddev->active_io);
@@ -498,7 +498,7 @@ void __mddev_resume(struct mddev *mddev)
 
 	mutex_unlock(&mddev->suspend_mutex);
 }
-EXPORT_SYMBOL_GPL(__mddev_resume);
+EXPORT_SYMBOL_GPL(mddev_resume);
 
 /*
  * Generic flush handling for md
@@ -5216,12 +5216,12 @@ suspend_lo_store(struct mddev *mddev, const char *buf, size_t len)
 	if (new != (sector_t)new)
 		return -EINVAL;
 
-	err = __mddev_suspend(mddev, true);
+	err = mddev_suspend(mddev, true);
 	if (err)
 		return err;
 
 	WRITE_ONCE(mddev->suspend_lo, new);
-	__mddev_resume(mddev);
+	mddev_resume(mddev);
 
 	return len;
 }
@@ -5247,12 +5247,12 @@ suspend_hi_store(struct mddev *mddev, const char *buf, size_t len)
 	if (new != (sector_t)new)
 		return -EINVAL;
 
-	err = __mddev_suspend(mddev, true);
+	err = mddev_suspend(mddev, true);
 	if (err)
 		return err;
 
 	WRITE_ONCE(mddev->suspend_hi, new);
-	__mddev_resume(mddev);
+	mddev_resume(mddev);
 
 	return len;
 }
diff --git a/drivers/md/md.h b/drivers/md/md.h
index 4c5f3f032656..55d01d431418 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -804,8 +804,8 @@ extern int md_rdev_init(struct md_rdev *rdev);
 extern void md_rdev_clear(struct md_rdev *rdev);
 
 extern void md_handle_request(struct mddev *mddev, struct bio *bio);
-extern int __mddev_suspend(struct mddev *mddev, bool interruptible);
-extern void __mddev_resume(struct mddev *mddev);
+extern int mddev_suspend(struct mddev *mddev, bool interruptible);
+extern void mddev_resume(struct mddev *mddev);
 
 extern void md_reload_sb(struct mddev *mddev, int raid_disk);
 extern void md_update_sb(struct mddev *mddev, int force);
@@ -853,27 +853,27 @@ static inline int mddev_suspend_and_lock(struct mddev *mddev)
 {
 	int ret;
 
-	ret = __mddev_suspend(mddev, true);
+	ret = mddev_suspend(mddev, true);
 	if (ret)
 		return ret;
 
 	ret = mddev_lock(mddev);
 	if (ret)
-		__mddev_resume(mddev);
+		mddev_resume(mddev);
 
 	return ret;
 }
 
 static inline void mddev_suspend_and_lock_nointr(struct mddev *mddev)
 {
-	__mddev_suspend(mddev, false);
+	mddev_suspend(mddev, false);
 	mutex_lock(&mddev->reconfig_mutex);
 }
 
 static inline void mddev_unlock_and_resume(struct mddev *mddev)
 {
 	mddev_unlock(mddev);
-	__mddev_resume(mddev);
+	mddev_resume(mddev);
 }
 
 struct mdu_array_info_s;
diff --git a/drivers/md/raid5-cache.c b/drivers/md/raid5-cache.c
index 9909110262ee..6157f5beb9fe 100644
--- a/drivers/md/raid5-cache.c
+++ b/drivers/md/raid5-cache.c
@@ -699,9 +699,9 @@ static void r5c_disable_writeback_async(struct work_struct *work)
 
 	log = READ_ONCE(conf->log);
 	if (log) {
-		__mddev_suspend(mddev, false);
+		mddev_suspend(mddev, false);
 		log->r5c_journal_mode = R5C_JOURNAL_MODE_WRITE_THROUGH;
-		__mddev_resume(mddev);
+		mddev_resume(mddev);
 	}
 }
 
-- 
2.39.2

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply related	[flat|nested] 70+ messages in thread

* Re: [PATCH -next v3 01/25] md: use READ_ONCE/WRITE_ONCE for 'suspend_lo' and 'suspend_hi'
  2023-09-28  6:15   ` [dm-devel] " Yu Kuai
@ 2023-09-28 15:54     ` Song Liu
  -1 siblings, 0 replies; 70+ messages in thread
From: Song Liu @ 2023-09-28 15:54 UTC (permalink / raw)
  To: Yu Kuai
  Cc: xni, agk, snitzer, dm-devel, linux-kernel, linux-raid, yukuai3,
	yi.zhang, yangerkun

On Wed, Sep 27, 2023 at 11:22 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>
> From: Yu Kuai <yukuai3@huawei.com>
>
> Because reading 'suspend_lo' and 'suspend_hi' from md_handle_request()
> is not protected, use READ_ONCE/WRITE_ONCE to prevent reading abnormal
> value.

I rephrase this a bit:

Protect 'suspend_lo' and 'suspend_hi' with READ_ONCE/WRITE_ONCE to prevent
reading abnormal values.

Thanks,
Song

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [dm-devel] [PATCH -next v3 01/25] md: use READ_ONCE/WRITE_ONCE for 'suspend_lo' and 'suspend_hi'
@ 2023-09-28 15:54     ` Song Liu
  0 siblings, 0 replies; 70+ messages in thread
From: Song Liu @ 2023-09-28 15:54 UTC (permalink / raw)
  To: Yu Kuai
  Cc: yi.zhang, yangerkun, snitzer, dm-devel, linux-kernel, linux-raid,
	xni, yukuai3, agk

On Wed, Sep 27, 2023 at 11:22 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>
> From: Yu Kuai <yukuai3@huawei.com>
>
> Because reading 'suspend_lo' and 'suspend_hi' from md_handle_request()
> is not protected, use READ_ONCE/WRITE_ONCE to prevent reading abnormal
> value.

I rephrase this a bit:

Protect 'suspend_lo' and 'suspend_hi' with READ_ONCE/WRITE_ONCE to prevent
reading abnormal values.

Thanks,
Song

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH -next v3 03/25] md: add new helpers to suspend/resume array
  2023-09-28  6:15   ` [dm-devel] " Yu Kuai
@ 2023-09-28 18:45     ` Song Liu
  -1 siblings, 0 replies; 70+ messages in thread
From: Song Liu @ 2023-09-28 18:45 UTC (permalink / raw)
  To: Yu Kuai
  Cc: xni, agk, snitzer, dm-devel, linux-kernel, linux-raid, yukuai3,
	yi.zhang, yangerkun

On Wed, Sep 27, 2023 at 11:22 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>
> From: Yu Kuai <yukuai3@huawei.com>
>
> Advantages for new apis:
>  - reconfig_mutex is not required;
>  - the weird logical that suspend array hold 'reconfig_mutex' for
>    mddev_check_recovery() to update superblock is not needed;
>  - the specail handling, 'pers->prepare_suspend', for raid456 is not
>    needed;
>  - It's safe to be called at any time once mddev is allocated, and it's
>    designed to be used from slow path where array configuration is changed;
>  - the new helpers is designed to be called before mddev_lock(), hence
>    it support to be interrupted by user as well.
>
> Signed-off-by: Yu Kuai <yukuai3@huawei.com>
> ---
>  drivers/md/md.c | 102 +++++++++++++++++++++++++++++++++++++++++++++++-
>  drivers/md/md.h |   3 ++
>  2 files changed, 103 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index e460b380143d..a075d03d03d3 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -443,12 +443,22 @@ void mddev_suspend(struct mddev *mddev)
>                         lockdep_is_held(&mddev->reconfig_mutex));
>
>         WARN_ON_ONCE(thread && current == thread->tsk);
> -       if (mddev->suspended++)
> +
> +       /* can't concurrent with __mddev_suspend() and __mddev_resume() */
> +       mutex_lock(&mddev->suspend_mutex);
> +       if (mddev->suspended++) {
> +               mutex_unlock(&mddev->suspend_mutex);
>                 return;

Can we make mddev->suspended atomic_t, and use atomic_inc_return()
here?

Thanks,
Song

[...]

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [dm-devel] [PATCH -next v3 03/25] md: add new helpers to suspend/resume array
@ 2023-09-28 18:45     ` Song Liu
  0 siblings, 0 replies; 70+ messages in thread
From: Song Liu @ 2023-09-28 18:45 UTC (permalink / raw)
  To: Yu Kuai
  Cc: yi.zhang, yangerkun, snitzer, dm-devel, linux-kernel, linux-raid,
	xni, yukuai3, agk

On Wed, Sep 27, 2023 at 11:22 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>
> From: Yu Kuai <yukuai3@huawei.com>
>
> Advantages for new apis:
>  - reconfig_mutex is not required;
>  - the weird logical that suspend array hold 'reconfig_mutex' for
>    mddev_check_recovery() to update superblock is not needed;
>  - the specail handling, 'pers->prepare_suspend', for raid456 is not
>    needed;
>  - It's safe to be called at any time once mddev is allocated, and it's
>    designed to be used from slow path where array configuration is changed;
>  - the new helpers is designed to be called before mddev_lock(), hence
>    it support to be interrupted by user as well.
>
> Signed-off-by: Yu Kuai <yukuai3@huawei.com>
> ---
>  drivers/md/md.c | 102 +++++++++++++++++++++++++++++++++++++++++++++++-
>  drivers/md/md.h |   3 ++
>  2 files changed, 103 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index e460b380143d..a075d03d03d3 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -443,12 +443,22 @@ void mddev_suspend(struct mddev *mddev)
>                         lockdep_is_held(&mddev->reconfig_mutex));
>
>         WARN_ON_ONCE(thread && current == thread->tsk);
> -       if (mddev->suspended++)
> +
> +       /* can't concurrent with __mddev_suspend() and __mddev_resume() */
> +       mutex_lock(&mddev->suspend_mutex);
> +       if (mddev->suspended++) {
> +               mutex_unlock(&mddev->suspend_mutex);
>                 return;

Can we make mddev->suspended atomic_t, and use atomic_inc_return()
here?

Thanks,
Song

[...]

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH -next v3 00/25] md: synchronize io with array reconfiguration
  2023-09-28  6:15 ` [dm-devel] " Yu Kuai
@ 2023-09-28 19:15   ` Song Liu
  -1 siblings, 0 replies; 70+ messages in thread
From: Song Liu @ 2023-09-28 19:15 UTC (permalink / raw)
  To: Yu Kuai
  Cc: xni, agk, snitzer, dm-devel, linux-kernel, linux-raid, yukuai3,
	yi.zhang, yangerkun

Hi Kuai,

Thanks for the patchset!

A few high level questions/suggestions:

1. This is a big change that needs a lot of explanation. While you managed to
keep each patch relatively small (great job btw), it is not very clear why we
need these changes. Specifically, we are adding a new mutex, it is worth
mentioning why we cannot achieve the same goal without it. Please add
more information in the cover letter. We will put part of the cover letter in
the merge commit.

2. In the cover letter, please also highlight that we are removing
 MD_ALLOW_SB_UPDATE and MD_UPDATING_SB. This is a big improvement.

3. Please rearrange the patch set so that the two "READ_ONCE/WRITE_ONCE"
patches are at the beginning.

4. Please consider merging some patches. Current "add-api => use-api =>
remove-old-api" makes it tricky to follow what is being changed. For this set,
I found the diff of the whole set easier to follow than some of the big patches.

Thanks again for your hard work into this!
Song

On Wed, Sep 27, 2023 at 11:22 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>
> From: Yu Kuai <yukuai3@huawei.com>
[...]

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [dm-devel] [PATCH -next v3 00/25] md: synchronize io with array reconfiguration
@ 2023-09-28 19:15   ` Song Liu
  0 siblings, 0 replies; 70+ messages in thread
From: Song Liu @ 2023-09-28 19:15 UTC (permalink / raw)
  To: Yu Kuai
  Cc: yi.zhang, yangerkun, snitzer, dm-devel, linux-kernel, linux-raid,
	xni, yukuai3, agk

Hi Kuai,

Thanks for the patchset!

A few high level questions/suggestions:

1. This is a big change that needs a lot of explanation. While you managed to
keep each patch relatively small (great job btw), it is not very clear why we
need these changes. Specifically, we are adding a new mutex, it is worth
mentioning why we cannot achieve the same goal without it. Please add
more information in the cover letter. We will put part of the cover letter in
the merge commit.

2. In the cover letter, please also highlight that we are removing
 MD_ALLOW_SB_UPDATE and MD_UPDATING_SB. This is a big improvement.

3. Please rearrange the patch set so that the two "READ_ONCE/WRITE_ONCE"
patches are at the beginning.

4. Please consider merging some patches. Current "add-api => use-api =>
remove-old-api" makes it tricky to follow what is being changed. For this set,
I found the diff of the whole set easier to follow than some of the big patches.

Thanks again for your hard work into this!
Song

On Wed, Sep 27, 2023 at 11:22 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>
> From: Yu Kuai <yukuai3@huawei.com>
[...]

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [dm-devel] [PATCH -next v3 03/25] md: add new helpers to suspend/resume array
  2023-09-28 18:45     ` [dm-devel] " Song Liu
  (?)
@ 2023-10-05  2:58     ` Yu Kuai
  2023-10-05  4:00         ` Song Liu
  -1 siblings, 1 reply; 70+ messages in thread
From: Yu Kuai @ 2023-10-05  2:58 UTC (permalink / raw)
  To: Song Liu, Yu Kuai
  Cc: yi.zhang, yangerkun, snitzer, dm-devel, linux-kernel, linux-raid,
	xni, yukuai (C),
	agk

Hi,

在 2023/09/29 2:45, Song Liu 写道:
> On Wed, Sep 27, 2023 at 11:22 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>>
>> From: Yu Kuai <yukuai3@huawei.com>
>>
>> Advantages for new apis:
>>   - reconfig_mutex is not required;
>>   - the weird logical that suspend array hold 'reconfig_mutex' for
>>     mddev_check_recovery() to update superblock is not needed;
>>   - the specail handling, 'pers->prepare_suspend', for raid456 is not
>>     needed;
>>   - It's safe to be called at any time once mddev is allocated, and it's
>>     designed to be used from slow path where array configuration is changed;
>>   - the new helpers is designed to be called before mddev_lock(), hence
>>     it support to be interrupted by user as well.
>>
>> Signed-off-by: Yu Kuai <yukuai3@huawei.com>
>> ---
>>   drivers/md/md.c | 102 +++++++++++++++++++++++++++++++++++++++++++++++-
>>   drivers/md/md.h |   3 ++
>>   2 files changed, 103 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/md/md.c b/drivers/md/md.c
>> index e460b380143d..a075d03d03d3 100644
>> --- a/drivers/md/md.c
>> +++ b/drivers/md/md.c
>> @@ -443,12 +443,22 @@ void mddev_suspend(struct mddev *mddev)
>>                          lockdep_is_held(&mddev->reconfig_mutex));
>>
>>          WARN_ON_ONCE(thread && current == thread->tsk);
>> -       if (mddev->suspended++)
>> +
>> +       /* can't concurrent with __mddev_suspend() and __mddev_resume() */
>> +       mutex_lock(&mddev->suspend_mutex);
>> +       if (mddev->suspended++) {
>> +               mutex_unlock(&mddev->suspend_mutex);
>>                  return;
> 
> Can we make mddev->suspended atomic_t, and use atomic_inc_return()
> here?

'suspend_mutex' is needed, because concurrent caller of
mddev_suspend() shound be ordered, they need to wait for the first
mddev_suspend() to be done.

Updating suspended is protected by 'suspend_mutex' in the new api, so I
think it's not necessary to use atomic, WRITE/READ_ONCE() should be
enough.

Thanks,
Kuai

> 
> Thanks,
> Song
> 
> [...]
> .
> 

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [dm-devel] [PATCH -next v3 00/25] md: synchronize io with array reconfiguration
  2023-09-28 19:15   ` [dm-devel] " Song Liu
  (?)
@ 2023-10-05  3:42   ` Yu Kuai
  2023-10-05  3:55       ` Song Liu
  -1 siblings, 1 reply; 70+ messages in thread
From: Yu Kuai @ 2023-10-05  3:42 UTC (permalink / raw)
  To: Song Liu, Yu Kuai
  Cc: yi.zhang, yangerkun, snitzer, dm-devel, linux-kernel, linux-raid,
	xni, yukuai (C),
	agk

Hi,

在 2023/09/29 3:15, Song Liu 写道:
> Hi Kuai,
> 
> Thanks for the patchset!
> 
> A few high level questions/suggestions:

Thanks a lot for these!
> 
> 1. This is a big change that needs a lot of explanation. While you managed to
> keep each patch relatively small (great job btw), it is not very clear why we
> need these changes. Specifically, we are adding a new mutex, it is worth
> mentioning why we cannot achieve the same goal without it. Please add
> more information in the cover letter. We will put part of the cover letter in
> the merge commit.

Yeah, I realize that I explain too little. I will add background and
design.
> 
> 2. In the cover letter, please also highlight that we are removing
>   MD_ALLOW_SB_UPDATE and MD_UPDATING_SB. This is a big improvement.
> 

Okay.
> 3. Please rearrange the patch set so that the two "READ_ONCE/WRITE_ONCE"
> patches are at the beginning.

Okay.
> 
> 4. Please consider merging some patches. Current "add-api => use-api =>
> remove-old-api" makes it tricky to follow what is being changed. For this set,
> I found the diff of the whole set easier to follow than some of the big patches.
I refer to some other big patchset to replace an old api, for example:

https://lore.kernel.org/all/20230818123232.2269-1-jack@suse.cz/

Currently I prefer to use one patch for each function point. And I do
merged some patches in this version, and for remaining patches, do you
prefer to use one patch for one file instead of one function point?(For
example, merge patch 10-12 for md/raid5-cache, and 13-16 for md/raid5).

Thanks,
Kuai

> 
> Thanks again for your hard work into this!
> Song
> 
> On Wed, Sep 27, 2023 at 11:22 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>>
>> From: Yu Kuai <yukuai3@huawei.com>
> [...]
> .
> 

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [dm-devel] [PATCH -next v3 00/25] md: synchronize io with array reconfiguration
  2023-10-05  3:42   ` Yu Kuai
@ 2023-10-05  3:55       ` Song Liu
  0 siblings, 0 replies; 70+ messages in thread
From: Song Liu @ 2023-10-05  3:55 UTC (permalink / raw)
  To: Yu Kuai
  Cc: yi.zhang, yangerkun, snitzer, dm-devel, linux-kernel, linux-raid,
	xni, yukuai (C),
	agk

On Wed, Oct 4, 2023 at 8:42 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>
> Hi,
>
> 在 2023/09/29 3:15, Song Liu 写道:
> > Hi Kuai,
> >
> > Thanks for the patchset!
> >
> > A few high level questions/suggestions:
>
> Thanks a lot for these!
> >
> > 1. This is a big change that needs a lot of explanation. While you managed to
> > keep each patch relatively small (great job btw), it is not very clear why we
> > need these changes. Specifically, we are adding a new mutex, it is worth
> > mentioning why we cannot achieve the same goal without it. Please add
> > more information in the cover letter. We will put part of the cover letter in
> > the merge commit.
>
> Yeah, I realize that I explain too little. I will add background and
> design.
> >
> > 2. In the cover letter, please also highlight that we are removing
> >   MD_ALLOW_SB_UPDATE and MD_UPDATING_SB. This is a big improvement.
> >
>
> Okay.
> > 3. Please rearrange the patch set so that the two "READ_ONCE/WRITE_ONCE"
> > patches are at the beginning.
>
> Okay.
> >
> > 4. Please consider merging some patches. Current "add-api => use-api =>
> > remove-old-api" makes it tricky to follow what is being changed. For this set,
> > I found the diff of the whole set easier to follow than some of the big patches.
> I refer to some other big patchset to replace an old api, for example:
>
> https://lore.kernel.org/all/20230818123232.2269-1-jack@suse.cz/

Yes, this is a safe way to replace old APIs. Since the scale of this
patchset is
smaller, I was thinking it might not be necessary to go that path. But
I will let
you make the decision.

> Currently I prefer to use one patch for each function point. And I do
> merged some patches in this version, and for remaining patches, do you
> prefer to use one patch for one file instead of one function point?(For
> example, merge patch 10-12 for md/raid5-cache, and 13-16 for md/raid5).

I think 10 should be a separate patch, and we can merge 11 and 12. We can
merge 13-16, and maybe also 5-7 and 18-20.

Thanks,
Song

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH -next v3 00/25] md: synchronize io with array reconfiguration
@ 2023-10-05  3:55       ` Song Liu
  0 siblings, 0 replies; 70+ messages in thread
From: Song Liu @ 2023-10-05  3:55 UTC (permalink / raw)
  To: Yu Kuai
  Cc: xni, agk, snitzer, dm-devel, linux-kernel, linux-raid, yi.zhang,
	yangerkun, yukuai (C)

On Wed, Oct 4, 2023 at 8:42 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>
> Hi,
>
> 在 2023/09/29 3:15, Song Liu 写道:
> > Hi Kuai,
> >
> > Thanks for the patchset!
> >
> > A few high level questions/suggestions:
>
> Thanks a lot for these!
> >
> > 1. This is a big change that needs a lot of explanation. While you managed to
> > keep each patch relatively small (great job btw), it is not very clear why we
> > need these changes. Specifically, we are adding a new mutex, it is worth
> > mentioning why we cannot achieve the same goal without it. Please add
> > more information in the cover letter. We will put part of the cover letter in
> > the merge commit.
>
> Yeah, I realize that I explain too little. I will add background and
> design.
> >
> > 2. In the cover letter, please also highlight that we are removing
> >   MD_ALLOW_SB_UPDATE and MD_UPDATING_SB. This is a big improvement.
> >
>
> Okay.
> > 3. Please rearrange the patch set so that the two "READ_ONCE/WRITE_ONCE"
> > patches are at the beginning.
>
> Okay.
> >
> > 4. Please consider merging some patches. Current "add-api => use-api =>
> > remove-old-api" makes it tricky to follow what is being changed. For this set,
> > I found the diff of the whole set easier to follow than some of the big patches.
> I refer to some other big patchset to replace an old api, for example:
>
> https://lore.kernel.org/all/20230818123232.2269-1-jack@suse.cz/

Yes, this is a safe way to replace old APIs. Since the scale of this
patchset is
smaller, I was thinking it might not be necessary to go that path. But
I will let
you make the decision.

> Currently I prefer to use one patch for each function point. And I do
> merged some patches in this version, and for remaining patches, do you
> prefer to use one patch for one file instead of one function point?(For
> example, merge patch 10-12 for md/raid5-cache, and 13-16 for md/raid5).

I think 10 should be a separate patch, and we can merge 11 and 12. We can
merge 13-16, and maybe also 5-7 and 18-20.

Thanks,
Song

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [dm-devel] [PATCH -next v3 03/25] md: add new helpers to suspend/resume array
  2023-10-05  2:58     ` Yu Kuai
@ 2023-10-05  4:00         ` Song Liu
  0 siblings, 0 replies; 70+ messages in thread
From: Song Liu @ 2023-10-05  4:00 UTC (permalink / raw)
  To: Yu Kuai
  Cc: yi.zhang, yangerkun, snitzer, dm-devel, linux-kernel, linux-raid,
	xni, yukuai (C),
	agk

On Wed, Oct 4, 2023 at 7:59 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>
> Hi,
>
> 在 2023/09/29 2:45, Song Liu 写道:
> > On Wed, Sep 27, 2023 at 11:22 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
> >>
> >> From: Yu Kuai <yukuai3@huawei.com>
> >>
> >> Advantages for new apis:
> >>   - reconfig_mutex is not required;
> >>   - the weird logical that suspend array hold 'reconfig_mutex' for
> >>     mddev_check_recovery() to update superblock is not needed;
> >>   - the specail handling, 'pers->prepare_suspend', for raid456 is not
> >>     needed;
> >>   - It's safe to be called at any time once mddev is allocated, and it's
> >>     designed to be used from slow path where array configuration is changed;
> >>   - the new helpers is designed to be called before mddev_lock(), hence
> >>     it support to be interrupted by user as well.
> >>
> >> Signed-off-by: Yu Kuai <yukuai3@huawei.com>
> >> ---
> >>   drivers/md/md.c | 102 +++++++++++++++++++++++++++++++++++++++++++++++-
> >>   drivers/md/md.h |   3 ++
> >>   2 files changed, 103 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/drivers/md/md.c b/drivers/md/md.c
> >> index e460b380143d..a075d03d03d3 100644
> >> --- a/drivers/md/md.c
> >> +++ b/drivers/md/md.c
> >> @@ -443,12 +443,22 @@ void mddev_suspend(struct mddev *mddev)
> >>                          lockdep_is_held(&mddev->reconfig_mutex));
> >>
> >>          WARN_ON_ONCE(thread && current == thread->tsk);
> >> -       if (mddev->suspended++)
> >> +
> >> +       /* can't concurrent with __mddev_suspend() and __mddev_resume() */
> >> +       mutex_lock(&mddev->suspend_mutex);
> >> +       if (mddev->suspended++) {
> >> +               mutex_unlock(&mddev->suspend_mutex);
> >>                  return;
> >
> > Can we make mddev->suspended atomic_t, and use atomic_inc_return()
> > here?
>
> 'suspend_mutex' is needed, because concurrent caller of
> mddev_suspend() shound be ordered, they need to wait for the first
> mddev_suspend() to be done.
>
> Updating suspended is protected by 'suspend_mutex' in the new api, so I
> think it's not necessary to use atomic, WRITE/READ_ONCE() should be
> enough.

Thanks for the explanation.

Song

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH -next v3 03/25] md: add new helpers to suspend/resume array
@ 2023-10-05  4:00         ` Song Liu
  0 siblings, 0 replies; 70+ messages in thread
From: Song Liu @ 2023-10-05  4:00 UTC (permalink / raw)
  To: Yu Kuai
  Cc: xni, agk, snitzer, dm-devel, linux-kernel, linux-raid, yi.zhang,
	yangerkun, yukuai (C)

On Wed, Oct 4, 2023 at 7:59 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>
> Hi,
>
> 在 2023/09/29 2:45, Song Liu 写道:
> > On Wed, Sep 27, 2023 at 11:22 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
> >>
> >> From: Yu Kuai <yukuai3@huawei.com>
> >>
> >> Advantages for new apis:
> >>   - reconfig_mutex is not required;
> >>   - the weird logical that suspend array hold 'reconfig_mutex' for
> >>     mddev_check_recovery() to update superblock is not needed;
> >>   - the specail handling, 'pers->prepare_suspend', for raid456 is not
> >>     needed;
> >>   - It's safe to be called at any time once mddev is allocated, and it's
> >>     designed to be used from slow path where array configuration is changed;
> >>   - the new helpers is designed to be called before mddev_lock(), hence
> >>     it support to be interrupted by user as well.
> >>
> >> Signed-off-by: Yu Kuai <yukuai3@huawei.com>
> >> ---
> >>   drivers/md/md.c | 102 +++++++++++++++++++++++++++++++++++++++++++++++-
> >>   drivers/md/md.h |   3 ++
> >>   2 files changed, 103 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/drivers/md/md.c b/drivers/md/md.c
> >> index e460b380143d..a075d03d03d3 100644
> >> --- a/drivers/md/md.c
> >> +++ b/drivers/md/md.c
> >> @@ -443,12 +443,22 @@ void mddev_suspend(struct mddev *mddev)
> >>                          lockdep_is_held(&mddev->reconfig_mutex));
> >>
> >>          WARN_ON_ONCE(thread && current == thread->tsk);
> >> -       if (mddev->suspended++)
> >> +
> >> +       /* can't concurrent with __mddev_suspend() and __mddev_resume() */
> >> +       mutex_lock(&mddev->suspend_mutex);
> >> +       if (mddev->suspended++) {
> >> +               mutex_unlock(&mddev->suspend_mutex);
> >>                  return;
> >
> > Can we make mddev->suspended atomic_t, and use atomic_inc_return()
> > here?
>
> 'suspend_mutex' is needed, because concurrent caller of
> mddev_suspend() shound be ordered, they need to wait for the first
> mddev_suspend() to be done.
>
> Updating suspended is protected by 'suspend_mutex' in the new api, so I
> think it's not necessary to use atomic, WRITE/READ_ONCE() should be
> enough.

Thanks for the explanation.

Song

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH -next v3 00/25] md: synchronize io with array reconfiguration
  2023-10-05  3:55       ` Song Liu
@ 2023-10-07  2:32         ` Yu Kuai
  -1 siblings, 0 replies; 70+ messages in thread
From: Yu Kuai @ 2023-10-07  2:32 UTC (permalink / raw)
  To: Song Liu, Yu Kuai
  Cc: xni, agk, snitzer, dm-devel, linux-kernel, linux-raid, yi.zhang,
	yangerkun, yukuai (C)

Hi,

在 2023/10/05 11:55, Song Liu 写道:
> On Wed, Oct 4, 2023 at 8:42 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>>
>> Hi,
>>
>> 在 2023/09/29 3:15, Song Liu 写道:
>>> Hi Kuai,
>>>
>>> Thanks for the patchset!
>>>
>>> A few high level questions/suggestions:
>>
>> Thanks a lot for these!
>>>
>>> 1. This is a big change that needs a lot of explanation. While you managed to
>>> keep each patch relatively small (great job btw), it is not very clear why we
>>> need these changes. Specifically, we are adding a new mutex, it is worth
>>> mentioning why we cannot achieve the same goal without it. Please add
>>> more information in the cover letter. We will put part of the cover letter in
>>> the merge commit.
>>
>> Yeah, I realize that I explain too little. I will add background and
>> design.
>>>
Can you take a look about this new cover letter?

##### Backgroud

Our testers started to test raid10 last year, and we found that there
are lots of problem in the following test scenario:

- add or remove disks to the array
- issue io to the array

At first, we fixed each problem independently respect that io can
concurrent with array reconfiguration.  However, on the one hand new
issues are continuously reported, on the other hand other personalities
might have the same problems. I'm thinking about how to fix these
problems thoroughly.

Refer to how block layer protect io with queue reconfiguration(for
example, change elevator):

```
blk_mq_freeze_queue
-> wait for all io to be done, and prevent new io to be dispatched
// reconfiguration
blk_mq_unfreeze_queue
```

Then it comes to my mind that I can do something similar to synchronize
io with array reconfiguration.

##### rcu introduction

see details in https://www.kernel.org/doc/html/next/RCU/whatisRCU.html

- writer should replace old data with new data first, and free old data
after grace period;
- reader should handle both cases that old data and new data is read,
and the data that is read should not be dereferenced after critical
section;

##### Current synchronization

Add or remove disks to the array can be triggered by ioctl/sysfs/daemon
thread:

1. hold 'reconfig_mutex';

2. check that rdev can be added/removed, one condition is that there is
no IO, for example:

    ```
    raid10_remove_disk
     if (atomic_read(&rdev->nr_pending))
      err = -EBUSY;
    ```

3. do the actual operations to add/remove a rdev, one procedure is
set/clear a pointer to rdev, for example:

    ```
    raid10_remove_disk
     p = conf->mirrors[xx]
     rdevp = &p->rdev/replacement
     *rdevp = NULL
    ```

4. check if there is still no io on this rdev, if not, revert the
pointer to rdev and return failure, for example

    ```
    raid10_remove_disk
     synchronize_rcu()
     if (atomic_read(&rdev->nr_pending))
      err = -EBUSY
      *rdevp = rdev
    ```

IO path is using rcu_read_lock/unlock() to access rdev, for example:

```
raid10_write_request
  rcu_read_lock
  rdev = rcu_dereference(mirror->rdev/replacement)
  rcu_read_unlock

raid10_end_write_request
  rdev = conf->mirrors[dev].rdev/replacement
  -> rdev/rrdev is still used after rcu_read_unlock()
```

##### Current problems

- rcu is used wrongly;
- There are lots of places involved that old value is read, however,
many places doesn't handle this correctly;
- Between step 3 and 4, if new io is dispatched, NULL will be read for
the rdev, and data will be lost.

##### New synchronization

Similar to how blk_mq_freeze_queue() works

Add or remove disks:

1. suspend the array, this should guarantee no new io is dispatched and
wait for dispatched io to be done;
2. add or remove rdevs from array;
3. resume the array;

IO path doesn't need to change for now, and all rcu implementation can
be removed.

There are already apis to suspend/resume the array, unfortunately, they
can't be used here because:

- old apis only wait for io to be dispatched, not to be done;
- old apis is only supported for the personality that implement quiesce
callback;
- old apis must be called after the array start running;
- old apis must hold 'reconfig_mutex', and will wait for io to be done,
this behavior is risky because 'reconfig_mutex' is used for daemon
thread to update super_block and handle io. In order to prevent
potential problems, there is a weird logical that suspend array hold
'reconfig_mutex' for mddev_check_recovery() to update super_block;

Then main work is divided into 3 steps, at first make sure new apis to
suspend the array is general:

- make sure suspend array will wait for io to be done(Done by []);
- make sure suspend array can be called for all personalities(Done by
[]);
- make sure suspend array can be called at any time(Done by []);
- make sure suspend array doesn't rely on 'reconfig_mutex';

The second step is to replace old apis with new apis:

```
From:
lock reconfig_mutex
suspend array
resume array
unlock reconfig_mutex

To:
suspend array
lock reconfig_mutex
unlock reconfig_mutex
resume array
```

Finally, for the remain path that involved reconfiguration, suspend the
array first:

```
From:
// reconfiguration

To:
suspend array
// reconfiguration
resume array
```

>>> 2. In the cover letter, please also highlight that we are removing
>>>    MD_ALLOW_SB_UPDATE and MD_UPDATING_SB. This is a big improvement.
>>>
>>
>> Okay.
>>> 3. Please rearrange the patch set so that the two "READ_ONCE/WRITE_ONCE"
>>> patches are at the beginning.
>>
>> Okay.
>>>
>>> 4. Please consider merging some patches. Current "add-api => use-api =>
>>> remove-old-api" makes it tricky to follow what is being changed. For this set,
>>> I found the diff of the whole set easier to follow than some of the big patches.
>> I refer to some other big patchset to replace an old api, for example:
>>
>> https://lore.kernel.org/all/20230818123232.2269-1-jack@suse.cz/
> 
> Yes, this is a safe way to replace old APIs. Since the scale of this
> patchset is
> smaller, I was thinking it might not be necessary to go that path. But
> I will let
> you make the decision.
> 
>> Currently I prefer to use one patch for each function point. And I do
>> merged some patches in this version, and for remaining patches, do you
>> prefer to use one patch for one file instead of one function point?(For
>> example, merge patch 10-12 for md/raid5-cache, and 13-16 for md/raid5).
> 
> I think 10 should be a separate patch, and we can merge 11 and 12. We can
> merge 13-16, and maybe also 5-7 and 18-20.
> 
> Thanks,
> Song
> .
> 


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [dm-devel] [PATCH -next v3 00/25] md: synchronize io with array reconfiguration
@ 2023-10-07  2:32         ` Yu Kuai
  0 siblings, 0 replies; 70+ messages in thread
From: Yu Kuai @ 2023-10-07  2:32 UTC (permalink / raw)
  To: Song Liu, Yu Kuai
  Cc: yi.zhang, yangerkun, snitzer, dm-devel, linux-kernel, linux-raid,
	xni, yukuai (C),
	agk

Hi,

在 2023/10/05 11:55, Song Liu 写道:
> On Wed, Oct 4, 2023 at 8:42 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>>
>> Hi,
>>
>> 在 2023/09/29 3:15, Song Liu 写道:
>>> Hi Kuai,
>>>
>>> Thanks for the patchset!
>>>
>>> A few high level questions/suggestions:
>>
>> Thanks a lot for these!
>>>
>>> 1. This is a big change that needs a lot of explanation. While you managed to
>>> keep each patch relatively small (great job btw), it is not very clear why we
>>> need these changes. Specifically, we are adding a new mutex, it is worth
>>> mentioning why we cannot achieve the same goal without it. Please add
>>> more information in the cover letter. We will put part of the cover letter in
>>> the merge commit.
>>
>> Yeah, I realize that I explain too little. I will add background and
>> design.
>>>
Can you take a look about this new cover letter?

##### Backgroud

Our testers started to test raid10 last year, and we found that there
are lots of problem in the following test scenario:

- add or remove disks to the array
- issue io to the array

At first, we fixed each problem independently respect that io can
concurrent with array reconfiguration.  However, on the one hand new
issues are continuously reported, on the other hand other personalities
might have the same problems. I'm thinking about how to fix these
problems thoroughly.

Refer to how block layer protect io with queue reconfiguration(for
example, change elevator):

```
blk_mq_freeze_queue
-> wait for all io to be done, and prevent new io to be dispatched
// reconfiguration
blk_mq_unfreeze_queue
```

Then it comes to my mind that I can do something similar to synchronize
io with array reconfiguration.

##### rcu introduction

see details in https://www.kernel.org/doc/html/next/RCU/whatisRCU.html

- writer should replace old data with new data first, and free old data
after grace period;
- reader should handle both cases that old data and new data is read,
and the data that is read should not be dereferenced after critical
section;

##### Current synchronization

Add or remove disks to the array can be triggered by ioctl/sysfs/daemon
thread:

1. hold 'reconfig_mutex';

2. check that rdev can be added/removed, one condition is that there is
no IO, for example:

    ```
    raid10_remove_disk
     if (atomic_read(&rdev->nr_pending))
      err = -EBUSY;
    ```

3. do the actual operations to add/remove a rdev, one procedure is
set/clear a pointer to rdev, for example:

    ```
    raid10_remove_disk
     p = conf->mirrors[xx]
     rdevp = &p->rdev/replacement
     *rdevp = NULL
    ```

4. check if there is still no io on this rdev, if not, revert the
pointer to rdev and return failure, for example

    ```
    raid10_remove_disk
     synchronize_rcu()
     if (atomic_read(&rdev->nr_pending))
      err = -EBUSY
      *rdevp = rdev
    ```

IO path is using rcu_read_lock/unlock() to access rdev, for example:

```
raid10_write_request
  rcu_read_lock
  rdev = rcu_dereference(mirror->rdev/replacement)
  rcu_read_unlock

raid10_end_write_request
  rdev = conf->mirrors[dev].rdev/replacement
  -> rdev/rrdev is still used after rcu_read_unlock()
```

##### Current problems

- rcu is used wrongly;
- There are lots of places involved that old value is read, however,
many places doesn't handle this correctly;
- Between step 3 and 4, if new io is dispatched, NULL will be read for
the rdev, and data will be lost.

##### New synchronization

Similar to how blk_mq_freeze_queue() works

Add or remove disks:

1. suspend the array, this should guarantee no new io is dispatched and
wait for dispatched io to be done;
2. add or remove rdevs from array;
3. resume the array;

IO path doesn't need to change for now, and all rcu implementation can
be removed.

There are already apis to suspend/resume the array, unfortunately, they
can't be used here because:

- old apis only wait for io to be dispatched, not to be done;
- old apis is only supported for the personality that implement quiesce
callback;
- old apis must be called after the array start running;
- old apis must hold 'reconfig_mutex', and will wait for io to be done,
this behavior is risky because 'reconfig_mutex' is used for daemon
thread to update super_block and handle io. In order to prevent
potential problems, there is a weird logical that suspend array hold
'reconfig_mutex' for mddev_check_recovery() to update super_block;

Then main work is divided into 3 steps, at first make sure new apis to
suspend the array is general:

- make sure suspend array will wait for io to be done(Done by []);
- make sure suspend array can be called for all personalities(Done by
[]);
- make sure suspend array can be called at any time(Done by []);
- make sure suspend array doesn't rely on 'reconfig_mutex';

The second step is to replace old apis with new apis:

```
From:
lock reconfig_mutex
suspend array
resume array
unlock reconfig_mutex

To:
suspend array
lock reconfig_mutex
unlock reconfig_mutex
resume array
```

Finally, for the remain path that involved reconfiguration, suspend the
array first:

```
From:
// reconfiguration

To:
suspend array
// reconfiguration
resume array
```

>>> 2. In the cover letter, please also highlight that we are removing
>>>    MD_ALLOW_SB_UPDATE and MD_UPDATING_SB. This is a big improvement.
>>>
>>
>> Okay.
>>> 3. Please rearrange the patch set so that the two "READ_ONCE/WRITE_ONCE"
>>> patches are at the beginning.
>>
>> Okay.
>>>
>>> 4. Please consider merging some patches. Current "add-api => use-api =>
>>> remove-old-api" makes it tricky to follow what is being changed. For this set,
>>> I found the diff of the whole set easier to follow than some of the big patches.
>> I refer to some other big patchset to replace an old api, for example:
>>
>> https://lore.kernel.org/all/20230818123232.2269-1-jack@suse.cz/
> 
> Yes, this is a safe way to replace old APIs. Since the scale of this
> patchset is
> smaller, I was thinking it might not be necessary to go that path. But
> I will let
> you make the decision.
> 
>> Currently I prefer to use one patch for each function point. And I do
>> merged some patches in this version, and for remaining patches, do you
>> prefer to use one patch for one file instead of one function point?(For
>> example, merge patch 10-12 for md/raid5-cache, and 13-16 for md/raid5).
> 
> I think 10 should be a separate patch, and we can merge 11 and 12. We can
> merge 13-16, and maybe also 5-7 and 18-20.
> 
> Thanks,
> Song
> .
> 

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH -next v3 00/25] md: synchronize io with array reconfiguration
  2023-10-07  2:32         ` [dm-devel] " Yu Kuai
@ 2023-10-07  2:40           ` Song Liu
  -1 siblings, 0 replies; 70+ messages in thread
From: Song Liu @ 2023-10-07  2:40 UTC (permalink / raw)
  To: Yu Kuai
  Cc: xni, agk, snitzer, dm-devel, linux-kernel, linux-raid, yi.zhang,
	yangerkun, yukuai (C)

On Fri, Oct 6, 2023 at 7:32 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>
> Hi,
>
> 在 2023/10/05 11:55, Song Liu 写道:
> > On Wed, Oct 4, 2023 at 8:42 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
> >>
> >> Hi,
> >>
> >> 在 2023/09/29 3:15, Song Liu 写道:
> >>> Hi Kuai,
> >>>
> >>> Thanks for the patchset!
> >>>
> >>> A few high level questions/suggestions:
> >>
> >> Thanks a lot for these!
> >>>
> >>> 1. This is a big change that needs a lot of explanation. While you managed to
> >>> keep each patch relatively small (great job btw), it is not very clear why we
> >>> need these changes. Specifically, we are adding a new mutex, it is worth
> >>> mentioning why we cannot achieve the same goal without it. Please add
> >>> more information in the cover letter. We will put part of the cover letter in
> >>> the merge commit.
> >>
> >> Yeah, I realize that I explain too little. I will add background and
> >> design.
> >>>
> Can you take a look about this new cover letter?

I don't have time right now to look into all the details, but it looks
great at first glance. We can still edit it a little bit when applying the
patchset, but that may not be necessary.

Thanks,
Song

>
> ##### Backgroud
>
> Our testers started to test raid10 last year, and we found that there
> are lots of problem in the following test scenario:
>
> - add or remove disks to the array
> - issue io to the array

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [dm-devel] [PATCH -next v3 00/25] md: synchronize io with array reconfiguration
@ 2023-10-07  2:40           ` Song Liu
  0 siblings, 0 replies; 70+ messages in thread
From: Song Liu @ 2023-10-07  2:40 UTC (permalink / raw)
  To: Yu Kuai
  Cc: yi.zhang, yangerkun, snitzer, dm-devel, linux-kernel, linux-raid,
	xni, yukuai (C),
	agk

On Fri, Oct 6, 2023 at 7:32 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>
> Hi,
>
> 在 2023/10/05 11:55, Song Liu 写道:
> > On Wed, Oct 4, 2023 at 8:42 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
> >>
> >> Hi,
> >>
> >> 在 2023/09/29 3:15, Song Liu 写道:
> >>> Hi Kuai,
> >>>
> >>> Thanks for the patchset!
> >>>
> >>> A few high level questions/suggestions:
> >>
> >> Thanks a lot for these!
> >>>
> >>> 1. This is a big change that needs a lot of explanation. While you managed to
> >>> keep each patch relatively small (great job btw), it is not very clear why we
> >>> need these changes. Specifically, we are adding a new mutex, it is worth
> >>> mentioning why we cannot achieve the same goal without it. Please add
> >>> more information in the cover letter. We will put part of the cover letter in
> >>> the merge commit.
> >>
> >> Yeah, I realize that I explain too little. I will add background and
> >> design.
> >>>
> Can you take a look about this new cover letter?

I don't have time right now to look into all the details, but it looks
great at first glance. We can still edit it a little bit when applying the
patchset, but that may not be necessary.

Thanks,
Song

>
> ##### Backgroud
>
> Our testers started to test raid10 last year, and we found that there
> are lots of problem in the following test scenario:
>
> - add or remove disks to the array
> - issue io to the array

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH -next v3 00/25] md: synchronize io with array reconfiguration
  2023-10-07  2:40           ` [dm-devel] " Song Liu
@ 2023-10-07  2:49             ` Yu Kuai
  -1 siblings, 0 replies; 70+ messages in thread
From: Yu Kuai @ 2023-10-07  2:49 UTC (permalink / raw)
  To: Song Liu, Yu Kuai
  Cc: xni, agk, snitzer, dm-devel, linux-kernel, linux-raid, yi.zhang,
	yangerkun, yukuai (C)

Hi,

在 2023/10/07 10:40, Song Liu 写道:
>> Can you take a look about this new cover letter?
> 
> I don't have time right now to look into all the details, but it looks
> great at first glance. We can still edit it a little bit when applying the
> patchset, but that may not be necessary.

Yeah, it's not urgent so you can take it slow, I just want to make sure
that you're good with it. I'll edit this cover letter a bit and send v4
soon.

Thanks,
Kuai


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [dm-devel] [PATCH -next v3 00/25] md: synchronize io with array reconfiguration
@ 2023-10-07  2:49             ` Yu Kuai
  0 siblings, 0 replies; 70+ messages in thread
From: Yu Kuai @ 2023-10-07  2:49 UTC (permalink / raw)
  To: Song Liu, Yu Kuai
  Cc: yi.zhang, yangerkun, snitzer, dm-devel, linux-kernel, linux-raid,
	xni, yukuai (C),
	agk

Hi,

在 2023/10/07 10:40, Song Liu 写道:
>> Can you take a look about this new cover letter?
> 
> I don't have time right now to look into all the details, but it looks
> great at first glance. We can still edit it a little bit when applying the
> patchset, but that may not be necessary.

Yeah, it's not urgent so you can take it slow, I just want to make sure
that you're good with it. I'll edit this cover letter a bit and send v4
soon.

Thanks,
Kuai

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 70+ messages in thread

end of thread, other threads:[~2023-10-07  2:50 UTC | newest]

Thread overview: 70+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-09-28  6:15 [PATCH -next v3 00/25] md: synchronize io with array reconfiguration Yu Kuai
2023-09-28  6:15 ` [dm-devel] " Yu Kuai
2023-09-28  6:15 ` [PATCH -next v3 01/25] md: use READ_ONCE/WRITE_ONCE for 'suspend_lo' and 'suspend_hi' Yu Kuai
2023-09-28  6:15   ` [dm-devel] " Yu Kuai
2023-09-28 15:54   ` Song Liu
2023-09-28 15:54     ` [dm-devel] " Song Liu
2023-09-28  6:15 ` [PATCH -next v3 02/25] md: replace is_md_suspended() with 'mddev->suspended' in md_check_recovery() Yu Kuai
2023-09-28  6:15   ` [dm-devel] " Yu Kuai
2023-09-28  6:15 ` [PATCH -next v3 03/25] md: add new helpers to suspend/resume array Yu Kuai
2023-09-28  6:15   ` [dm-devel] " Yu Kuai
2023-09-28 18:45   ` Song Liu
2023-09-28 18:45     ` [dm-devel] " Song Liu
2023-10-05  2:58     ` Yu Kuai
2023-10-05  4:00       ` Song Liu
2023-10-05  4:00         ` Song Liu
2023-09-28  6:15 ` [PATCH -next v3 04/25] md: add new helpers to suspend/resume and lock/unlock array Yu Kuai
2023-09-28  6:15   ` [dm-devel] " Yu Kuai
2023-09-28  6:15 ` [PATCH -next v3 05/25] md: use new apis to suspend array for suspend_lo/hi_store() Yu Kuai
2023-09-28  6:15   ` [dm-devel] " Yu Kuai
2023-09-28  6:15 ` [PATCH -next v3 06/25] md: use new apis to suspend array for level_store() Yu Kuai
2023-09-28  6:15   ` [dm-devel] " Yu Kuai
2023-09-28  6:15 ` [PATCH -next v3 07/25] md: use new apis to suspend array for serialize_policy_store() Yu Kuai
2023-09-28  6:15   ` [dm-devel] " Yu Kuai
2023-09-28  6:15 ` [PATCH -next v3 08/25] md/dm-raid: use new apis to suspend array Yu Kuai
2023-09-28  6:15   ` [dm-devel] " Yu Kuai
2023-09-28  6:15 ` [PATCH -next v3 09/25] md/md-bitmap: use new apis to suspend array for location_store() Yu Kuai
2023-09-28  6:15   ` [dm-devel] " Yu Kuai
2023-09-28  6:15 ` [PATCH -next v3 10/25] md/raid5-cache: use READ_ONCE/WRITE_ONCE for 'conf->log' Yu Kuai
2023-09-28  6:15   ` [dm-devel] " Yu Kuai
2023-09-28  6:15 ` [PATCH -next v3 11/25] md/raid5-cache: use new apis to suspend array for r5c_disable_writeback_async() Yu Kuai
2023-09-28  6:15   ` [dm-devel] " Yu Kuai
2023-09-28  6:15 ` [PATCH -next v3 12/25] md/raid5-cache: use new apis to suspend array for r5c_journal_mode_store() Yu Kuai
2023-09-28  6:15   ` [dm-devel] " Yu Kuai
2023-09-28  6:15 ` [PATCH -next v3 13/25] md/raid5: use new apis to suspend array for raid5_store_stripe_size() Yu Kuai
2023-09-28  6:15   ` [dm-devel] " Yu Kuai
2023-09-28  6:15 ` [PATCH -next v3 14/25] md/raid5: use new apis to suspend array for raid5_store_skip_copy() Yu Kuai
2023-09-28  6:15   ` [dm-devel] " Yu Kuai
2023-09-28  6:15 ` [PATCH -next v3 15/25] md/raid5: use new apis to suspend array for raid5_store_group_thread_cnt() Yu Kuai
2023-09-28  6:15   ` [dm-devel] " Yu Kuai
2023-09-28  6:15 ` [PATCH -next v3 16/25] md/raid5: use new apis to suspend array for raid5_change_consistency_policy() Yu Kuai
2023-09-28  6:15   ` [dm-devel] " Yu Kuai
2023-09-28  6:15 ` [PATCH -next v3 17/25] md/raid5: replace suspend with quiesce() callback Yu Kuai
2023-09-28  6:15   ` [dm-devel] " Yu Kuai
2023-09-28  6:15 ` [PATCH -next v3 18/25] md: use new apis to suspend array for ioctls involed array reconfiguration Yu Kuai
2023-09-28  6:15   ` [dm-devel] " Yu Kuai
2023-09-28  6:15 ` [PATCH -next v3 19/25] md: use new apis to suspend array for adding/removing rdev from state_store() Yu Kuai
2023-09-28  6:15   ` [dm-devel] " Yu Kuai
2023-09-28  6:15 ` [PATCH -next v3 20/25] md: use new apis to suspend array before mddev_create/destroy_serial_pool Yu Kuai
2023-09-28  6:15   ` [dm-devel] " Yu Kuai
2023-09-28  6:15 ` [PATCH -next v3 21/25] md: cleanup mddev_create/destroy_serial_pool() Yu Kuai
2023-09-28  6:15   ` [dm-devel] " Yu Kuai
2023-09-28  6:15 ` [PATCH -next v3 22/25] md/md-linear: cleanup linear_add() Yu Kuai
2023-09-28  6:15   ` [dm-devel] " Yu Kuai
2023-09-28  6:15 ` [PATCH -next v3 23/25] md: suspend array in md_start_sync() if array need reconfiguration Yu Kuai
2023-09-28  6:15   ` [dm-devel] " Yu Kuai
2023-09-28  6:15 ` [PATCH -next v3 24/25] md: remove old apis to suspend the array Yu Kuai
2023-09-28  6:15   ` [dm-devel] " Yu Kuai
2023-09-28  6:15 ` [PATCH -next v3 25/25] md: rename __mddev_suspend/resume() back to mddev_suspend/resume() Yu Kuai
2023-09-28  6:15   ` [dm-devel] " Yu Kuai
2023-09-28 19:15 ` [PATCH -next v3 00/25] md: synchronize io with array reconfiguration Song Liu
2023-09-28 19:15   ` [dm-devel] " Song Liu
2023-10-05  3:42   ` Yu Kuai
2023-10-05  3:55     ` Song Liu
2023-10-05  3:55       ` Song Liu
2023-10-07  2:32       ` Yu Kuai
2023-10-07  2:32         ` [dm-devel] " Yu Kuai
2023-10-07  2:40         ` Song Liu
2023-10-07  2:40           ` [dm-devel] " Song Liu
2023-10-07  2:49           ` Yu Kuai
2023-10-07  2:49             ` [dm-devel] " Yu Kuai

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.