linux-raid.vger.kernel.org archive mirror
* [PATCH -next v3 0/7] limit the number of plugged bio
@ 2023-05-29 13:10 Yu Kuai
  2023-05-29 13:11 ` [PATCH -next v3 1/7] md/raid10: prevent soft lockup while flush writes Yu Kuai
                   ` (7 more replies)
  0 siblings, 8 replies; 23+ messages in thread
From: Yu Kuai @ 2023-05-29 13:10 UTC (permalink / raw)
  To: song, neilb, akpm
  Cc: xni, linux-raid, linux-kernel, yukuai3, yukuai1, yi.zhang, yangerkun

From: Yu Kuai <yukuai3@huawei.com>

Changes in v3:
 - prefix functions with 'raid1_' instead of 'md_'
 - use a global workqueue instead of a per-bitmap workqueue in patch 5
Changes in v2:
 - remove the patch to rename raid1-10.c

Yu Kuai (7):
  md/raid10: prevent soft lockup while flush writes
  md/raid1-10: factor out a helper to add bio to plug
  md/raid1-10: factor out a helper to submit normal write
  md/raid1-10: submit write io directly if bitmap is not enabled
  md/md-bitmap: add a new helper to unplug bitmap asynchronously
  md/raid1-10: don't handle pluged bio by daemon thread
  md/raid1-10: limit the number of plugged bio

 drivers/md/md-bitmap.c | 33 ++++++++++++++++++++--
 drivers/md/md-bitmap.h |  8 ++++++
 drivers/md/md.c        |  9 ++++++
 drivers/md/md.h        |  1 +
 drivers/md/raid1-10.c  | 63 ++++++++++++++++++++++++++++++++++++++++++
 drivers/md/raid1.c     | 29 ++++---------------
 drivers/md/raid10.c    | 47 +++++++------------------------
 7 files changed, 126 insertions(+), 64 deletions(-)

-- 
2.39.2


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH -next v3 1/7] md/raid10: prevent soft lockup while flush writes
  2023-05-29 13:10 [PATCH -next v3 0/7] limit the number of plugged bio Yu Kuai
@ 2023-05-29 13:11 ` Yu Kuai
  2023-05-29 13:11 ` [PATCH -next v3 2/7] md/raid1-10: factor out a helper to add bio to plug Yu Kuai
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 23+ messages in thread
From: Yu Kuai @ 2023-05-29 13:11 UTC (permalink / raw)
  To: song, neilb, akpm
  Cc: xni, linux-raid, linux-kernel, yukuai3, yukuai1, yi.zhang, yangerkun

From: Yu Kuai <yukuai3@huawei.com>

Currently, there is no limit on the number of plugged bio for raid1/raid10.
While flushing writes, raid1 calls cond_resched() but raid10 doesn't, so
too many writes can cause a soft lockup.

The following soft lockup can be triggered easily with a writeback test
for raid10 on ramdisks:

watchdog: BUG: soft lockup - CPU#10 stuck for 27s! [md0_raid10:1293]
Call Trace:
 <TASK>
 call_rcu+0x16/0x20
 put_object+0x41/0x80
 __delete_object+0x50/0x90
 delete_object_full+0x2b/0x40
 kmemleak_free+0x46/0xa0
 slab_free_freelist_hook.constprop.0+0xed/0x1a0
 kmem_cache_free+0xfd/0x300
 mempool_free_slab+0x1f/0x30
 mempool_free+0x3a/0x100
 bio_free+0x59/0x80
 bio_put+0xcf/0x2c0
 free_r10bio+0xbf/0xf0
 raid_end_bio_io+0x78/0xb0
 one_write_done+0x8a/0xa0
 raid10_end_write_request+0x1b4/0x430
 bio_endio+0x175/0x320
 brd_submit_bio+0x3b9/0x9b7 [brd]
 __submit_bio+0x69/0xe0
 submit_bio_noacct_nocheck+0x1e6/0x5a0
 submit_bio_noacct+0x38c/0x7e0
 flush_pending_writes+0xf0/0x240
 raid10d+0xac/0x1ed0

Fix the problem by adding cond_resched() to raid10, as raid1 does.

Note that unlimited plugged bio still needs to be optimized: for example,
when lots of dirty pages are written back, the plugged bio can take lots of
memory and spend a long time in the plug, so io latency is bad.
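
As a rough illustration of the pattern (a generic sketch of the loop shape,
not a copy of the raid10 code), yielding once per iteration is enough to
keep the soft-lockup watchdog quiet even if the list is very long:

	/* sketch: drain a long bio list in process context */
	while (bio) {
		struct bio *next = bio->bi_next;

		bio->bi_next = NULL;
		submit_bio_noacct(bio);
		bio = next;
		cond_resched();	/* give the scheduler a chance every step */
	}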

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 drivers/md/raid10.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 32fb4ff0acdb..6b31f848a6d9 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -921,6 +921,7 @@ static void flush_pending_writes(struct r10conf *conf)
 			else
 				submit_bio_noacct(bio);
 			bio = next;
+			cond_resched();
 		}
 		blk_finish_plug(&plug);
 	} else
@@ -1145,6 +1146,7 @@ static void raid10_unplug(struct blk_plug_cb *cb, bool from_schedule)
 		else
 			submit_bio_noacct(bio);
 		bio = next;
+		cond_resched();
 	}
 	kfree(plug);
 }
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH -next v3 2/7] md/raid1-10: factor out a helper to add bio to plug
  2023-05-29 13:10 [PATCH -next v3 0/7] limit the number of plugged bio Yu Kuai
  2023-05-29 13:11 ` [PATCH -next v3 1/7] md/raid10: prevent soft lockup while flush writes Yu Kuai
@ 2023-05-29 13:11 ` Yu Kuai
  2023-05-29 13:11 ` [PATCH -next v3 3/7] md/raid1-10: factor out a helper to submit normal write Yu Kuai
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 23+ messages in thread
From: Yu Kuai @ 2023-05-29 13:11 UTC (permalink / raw)
  To: song, neilb, akpm
  Cc: xni, linux-raid, linux-kernel, yukuai3, yukuai1, yi.zhang, yangerkun

From: Yu Kuai <yukuai3@huawei.com>

The code in raid1 and raid10 is identical; factor it out into a helper in
preparation for limiting the number of plugged bios.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 drivers/md/raid1-10.c | 16 ++++++++++++++++
 drivers/md/raid1.c    | 12 +-----------
 drivers/md/raid10.c   | 11 +----------
 3 files changed, 18 insertions(+), 21 deletions(-)

diff --git a/drivers/md/raid1-10.c b/drivers/md/raid1-10.c
index e61f6cad4e08..9bf19a3409ce 100644
--- a/drivers/md/raid1-10.c
+++ b/drivers/md/raid1-10.c
@@ -109,3 +109,19 @@ static void md_bio_reset_resync_pages(struct bio *bio, struct resync_pages *rp,
 		size -= len;
 	} while (idx++ < RESYNC_PAGES && size > 0);
 }
+
+static inline bool raid1_add_bio_to_plug(struct mddev *mddev, struct bio *bio,
+				      blk_plug_cb_fn unplug)
+{
+	struct raid1_plug_cb *plug = NULL;
+	struct blk_plug_cb *cb = blk_check_plugged(unplug, mddev,
+						   sizeof(*plug));
+
+	if (!cb)
+		return false;
+
+	plug = container_of(cb, struct raid1_plug_cb, cb);
+	bio_list_add(&plug->pending, bio);
+
+	return true;
+}
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 2f1011ffdf09..e86c5e71c604 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -1343,8 +1343,6 @@ static void raid1_write_request(struct mddev *mddev, struct bio *bio,
 	struct bitmap *bitmap = mddev->bitmap;
 	unsigned long flags;
 	struct md_rdev *blocked_rdev;
-	struct blk_plug_cb *cb;
-	struct raid1_plug_cb *plug = NULL;
 	int first_clone;
 	int max_sectors;
 	bool write_behind = false;
@@ -1573,15 +1571,7 @@ static void raid1_write_request(struct mddev *mddev, struct bio *bio,
 					      r1_bio->sector);
 		/* flush_pending_writes() needs access to the rdev so...*/
 		mbio->bi_bdev = (void *)rdev;
-
-		cb = blk_check_plugged(raid1_unplug, mddev, sizeof(*plug));
-		if (cb)
-			plug = container_of(cb, struct raid1_plug_cb, cb);
-		else
-			plug = NULL;
-		if (plug) {
-			bio_list_add(&plug->pending, mbio);
-		} else {
+		if (!raid1_add_bio_to_plug(mddev, mbio, raid1_unplug)) {
 			spin_lock_irqsave(&conf->device_lock, flags);
 			bio_list_add(&conf->pending_bio_list, mbio);
 			spin_unlock_irqrestore(&conf->device_lock, flags);
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 6b31f848a6d9..18702051ebd1 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -1287,8 +1287,6 @@ static void raid10_write_one_disk(struct mddev *mddev, struct r10bio *r10_bio,
 	const blk_opf_t do_sync = bio->bi_opf & REQ_SYNC;
 	const blk_opf_t do_fua = bio->bi_opf & REQ_FUA;
 	unsigned long flags;
-	struct blk_plug_cb *cb;
-	struct raid1_plug_cb *plug = NULL;
 	struct r10conf *conf = mddev->private;
 	struct md_rdev *rdev;
 	int devnum = r10_bio->devs[n_copy].devnum;
@@ -1328,14 +1326,7 @@ static void raid10_write_one_disk(struct mddev *mddev, struct r10bio *r10_bio,
 
 	atomic_inc(&r10_bio->remaining);
 
-	cb = blk_check_plugged(raid10_unplug, mddev, sizeof(*plug));
-	if (cb)
-		plug = container_of(cb, struct raid1_plug_cb, cb);
-	else
-		plug = NULL;
-	if (plug) {
-		bio_list_add(&plug->pending, mbio);
-	} else {
+	if (!raid1_add_bio_to_plug(mddev, mbio, raid10_unplug)) {
 		spin_lock_irqsave(&conf->device_lock, flags);
 		bio_list_add(&conf->pending_bio_list, mbio);
 		spin_unlock_irqrestore(&conf->device_lock, flags);
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH -next v3 3/7] md/raid1-10: factor out a helper to submit normal write
  2023-05-29 13:10 [PATCH -next v3 0/7] limit the number of plugged bio Yu Kuai
  2023-05-29 13:11 ` [PATCH -next v3 1/7] md/raid10: prevent soft lockup while flush writes Yu Kuai
  2023-05-29 13:11 ` [PATCH -next v3 2/7] md/raid1-10: factor out a helper to add bio to plug Yu Kuai
@ 2023-05-29 13:11 ` Yu Kuai
  2023-05-31  7:20   ` Xiao Ni
  2023-05-29 13:11 ` [PATCH -next v3 4/7] md/raid1-10: submit write io directly if bitmap is not enabled Yu Kuai
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 23+ messages in thread
From: Yu Kuai @ 2023-05-29 13:11 UTC (permalink / raw)
  To: song, neilb, akpm
  Cc: xni, linux-raid, linux-kernel, yukuai3, yukuai1, yi.zhang, yangerkun

From: Yu Kuai <yukuai3@huawei.com>

There are multiple places that do the same thing; factor out a helper to
remove the redundant code. The helper will also be used in a following
patch.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 drivers/md/raid1-10.c | 17 +++++++++++++++++
 drivers/md/raid1.c    | 13 ++-----------
 drivers/md/raid10.c   | 26 ++++----------------------
 3 files changed, 23 insertions(+), 33 deletions(-)

diff --git a/drivers/md/raid1-10.c b/drivers/md/raid1-10.c
index 9bf19a3409ce..506299bd55cb 100644
--- a/drivers/md/raid1-10.c
+++ b/drivers/md/raid1-10.c
@@ -110,6 +110,23 @@ static void md_bio_reset_resync_pages(struct bio *bio, struct resync_pages *rp,
 	} while (idx++ < RESYNC_PAGES && size > 0);
 }
 
+
+static inline void raid1_submit_write(struct bio *bio)
+{
+	struct md_rdev *rdev = (struct md_rdev *)bio->bi_bdev;
+
+	bio->bi_next = NULL;
+	bio_set_dev(bio, rdev->bdev);
+	if (test_bit(Faulty, &rdev->flags))
+		bio_io_error(bio);
+	else if (unlikely(bio_op(bio) ==  REQ_OP_DISCARD &&
+			  !bdev_max_discard_sectors(bio->bi_bdev)))
+		/* Just ignore it */
+		bio_endio(bio);
+	else
+		submit_bio_noacct(bio);
+}
+
 static inline bool raid1_add_bio_to_plug(struct mddev *mddev, struct bio *bio,
 				      blk_plug_cb_fn unplug)
 {
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index e86c5e71c604..0778e398584c 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -799,17 +799,8 @@ static void flush_bio_list(struct r1conf *conf, struct bio *bio)
 
 	while (bio) { /* submit pending writes */
 		struct bio *next = bio->bi_next;
-		struct md_rdev *rdev = (void *)bio->bi_bdev;
-		bio->bi_next = NULL;
-		bio_set_dev(bio, rdev->bdev);
-		if (test_bit(Faulty, &rdev->flags)) {
-			bio_io_error(bio);
-		} else if (unlikely((bio_op(bio) == REQ_OP_DISCARD) &&
-				    !bdev_max_discard_sectors(bio->bi_bdev)))
-			/* Just ignore it */
-			bio_endio(bio);
-		else
-			submit_bio_noacct(bio);
+
+		raid1_submit_write(bio);
 		bio = next;
 		cond_resched();
 	}
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 18702051ebd1..6640507ecb0d 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -909,17 +909,8 @@ static void flush_pending_writes(struct r10conf *conf)
 
 		while (bio) { /* submit pending writes */
 			struct bio *next = bio->bi_next;
-			struct md_rdev *rdev = (void*)bio->bi_bdev;
-			bio->bi_next = NULL;
-			bio_set_dev(bio, rdev->bdev);
-			if (test_bit(Faulty, &rdev->flags)) {
-				bio_io_error(bio);
-			} else if (unlikely((bio_op(bio) ==  REQ_OP_DISCARD) &&
-					    !bdev_max_discard_sectors(bio->bi_bdev)))
-				/* Just ignore it */
-				bio_endio(bio);
-			else
-				submit_bio_noacct(bio);
+
+			raid1_submit_write(bio);
 			bio = next;
 			cond_resched();
 		}
@@ -1134,17 +1125,8 @@ static void raid10_unplug(struct blk_plug_cb *cb, bool from_schedule)
 
 	while (bio) { /* submit pending writes */
 		struct bio *next = bio->bi_next;
-		struct md_rdev *rdev = (void*)bio->bi_bdev;
-		bio->bi_next = NULL;
-		bio_set_dev(bio, rdev->bdev);
-		if (test_bit(Faulty, &rdev->flags)) {
-			bio_io_error(bio);
-		} else if (unlikely((bio_op(bio) ==  REQ_OP_DISCARD) &&
-				    !bdev_max_discard_sectors(bio->bi_bdev)))
-			/* Just ignore it */
-			bio_endio(bio);
-		else
-			submit_bio_noacct(bio);
+
+		raid1_submit_write(bio);
 		bio = next;
 		cond_resched();
 	}
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH -next v3 4/7] md/raid1-10: submit write io directly if bitmap is not enabled
  2023-05-29 13:10 [PATCH -next v3 0/7] limit the number of plugged bio Yu Kuai
                   ` (2 preceding siblings ...)
  2023-05-29 13:11 ` [PATCH -next v3 3/7] md/raid1-10: factor out a helper to submit normal write Yu Kuai
@ 2023-05-29 13:11 ` Yu Kuai
  2023-05-31  7:26   ` Xiao Ni
  2023-05-29 13:11 ` [PATCH -next v3 5/7] md/md-bitmap: add a new helper to unplug bitmap asynchronously Yu Kuai
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 23+ messages in thread
From: Yu Kuai @ 2023-05-29 13:11 UTC (permalink / raw)
  To: song, neilb, akpm
  Cc: xni, linux-raid, linux-kernel, yukuai3, yukuai1, yi.zhang, yangerkun

From: Yu Kuai <yukuai3@huawei.com>

Commit 6cce3b23f6f8 ("[PATCH] md: write intent bitmap support for raid10")
added bitmap support, and changed write io to be submitted through the
daemon thread because the bitmap needs to be updated before the write io.
Later, plug was added to fix the resulting performance regression, because
routing all write io through the daemon thread means io can't be issued
concurrently.

However, if bitmap is not enabled, write io should not go to the daemon
thread in the first place, and plug is not needed either.

Fixes: 6cce3b23f6f8 ("[PATCH] md: write intent bitmap support for raid10")
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 drivers/md/md-bitmap.c |  4 +---
 drivers/md/md-bitmap.h |  7 +++++++
 drivers/md/raid1-10.c  | 13 +++++++++++--
 3 files changed, 19 insertions(+), 5 deletions(-)

diff --git a/drivers/md/md-bitmap.c b/drivers/md/md-bitmap.c
index ad5a3456cd8a..3ee590cf12a7 100644
--- a/drivers/md/md-bitmap.c
+++ b/drivers/md/md-bitmap.c
@@ -1016,7 +1016,6 @@ static int md_bitmap_file_test_bit(struct bitmap *bitmap, sector_t block)
 	return set;
 }
 
-
 /* this gets called when the md device is ready to unplug its underlying
  * (slave) device queues -- before we let any writes go down, we need to
  * sync the dirty pages of the bitmap file to disk */
@@ -1026,8 +1025,7 @@ void md_bitmap_unplug(struct bitmap *bitmap)
 	int dirty, need_write;
 	int writing = 0;
 
-	if (!bitmap || !bitmap->storage.filemap ||
-	    test_bit(BITMAP_STALE, &bitmap->flags))
+	if (!md_bitmap_enabled(bitmap))
 		return;
 
 	/* look at each page to see if there are any set bits that need to be
diff --git a/drivers/md/md-bitmap.h b/drivers/md/md-bitmap.h
index cfd7395de8fd..3a4750952b3a 100644
--- a/drivers/md/md-bitmap.h
+++ b/drivers/md/md-bitmap.h
@@ -273,6 +273,13 @@ int md_bitmap_copy_from_slot(struct mddev *mddev, int slot,
 			     sector_t *lo, sector_t *hi, bool clear_bits);
 void md_bitmap_free(struct bitmap *bitmap);
 void md_bitmap_wait_behind_writes(struct mddev *mddev);
+
+static inline bool md_bitmap_enabled(struct bitmap *bitmap)
+{
+	return bitmap && bitmap->storage.filemap &&
+	       !test_bit(BITMAP_STALE, &bitmap->flags);
+}
+
 #endif
 
 #endif
diff --git a/drivers/md/raid1-10.c b/drivers/md/raid1-10.c
index 506299bd55cb..73cc3cb9154d 100644
--- a/drivers/md/raid1-10.c
+++ b/drivers/md/raid1-10.c
@@ -131,9 +131,18 @@ static inline bool raid1_add_bio_to_plug(struct mddev *mddev, struct bio *bio,
 				      blk_plug_cb_fn unplug)
 {
 	struct raid1_plug_cb *plug = NULL;
-	struct blk_plug_cb *cb = blk_check_plugged(unplug, mddev,
-						   sizeof(*plug));
+	struct blk_plug_cb *cb;
+
+	/*
+	 * If bitmap is not enabled, it's safe to submit the io directly, and
+	 * this can get optimal performance.
+	 */
+	if (!md_bitmap_enabled(mddev->bitmap)) {
+		raid1_submit_write(bio);
+		return true;
+	}
 
+	cb = blk_check_plugged(unplug, mddev, sizeof(*plug));
 	if (!cb)
 		return false;
 
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH -next v3 5/7] md/md-bitmap: add a new helper to unplug bitmap asynchronously
  2023-05-29 13:10 [PATCH -next v3 0/7] limit the number of plugged bio Yu Kuai
                   ` (3 preceding siblings ...)
  2023-05-29 13:11 ` [PATCH -next v3 4/7] md/raid1-10: submit write io directly if bitmap is not enabled Yu Kuai
@ 2023-05-29 13:11 ` Yu Kuai
  2023-06-06 17:34   ` Song Liu
  2023-05-29 13:11 ` [PATCH -next v3 6/7] md/raid1-10: don't handle pluged bio by daemon thread Yu Kuai
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 23+ messages in thread
From: Yu Kuai @ 2023-05-29 13:11 UTC (permalink / raw)
  To: song, neilb, akpm
  Cc: xni, linux-raid, linux-kernel, yukuai3, yukuai1, yi.zhang, yangerkun

From: Yu Kuai <yukuai3@huawei.com>

If bitmap is enabled, the bitmap must be updated before submitting write
io; this is why the unplug callback must move these io to
'conf->pending_bio_list' if 'current->bio_list' is not empty, which causes
performance degradation.

A new helper, md_bitmap_unplug_async(), is introduced to submit bitmap io
from a kworker, so that submitting bitmap io in raid10_unplug() no longer
requires 'current->bio_list' to be empty.

This patch prepares to limit the number of plugged bio.
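
For reference, the pattern the helper relies on (queue a work item and wait
for it with an on-stack completion, so the heavy lifting runs in kworker
context) can be shown as a stand-alone sketch. Everything below is a
hypothetical demo module, not md code, and it uses the generic system_wq
rather than the dedicated md_bitmap_wq added by this patch:

	#include <linux/module.h>
	#include <linux/workqueue.h>
	#include <linux/completion.h>

	struct demo_work {
		struct work_struct work;
		struct completion *done;
	};

	static void demo_fn(struct work_struct *work)
	{
		struct demo_work *dw = container_of(work, struct demo_work, work);

		pr_info("running in kworker context\n");
		complete(dw->done);
	}

	static int __init demo_init(void)
	{
		DECLARE_COMPLETION_ONSTACK(done);
		struct demo_work dw;

		INIT_WORK_ONSTACK(&dw.work, demo_fn);
		dw.done = &done;
		queue_work(system_wq, &dw.work);
		/* waiting here is what makes the on-stack objects safe */
		wait_for_completion(&done);
		destroy_work_on_stack(&dw.work);
		return 0;
	}

	static void __exit demo_exit(void)
	{
	}

	module_init(demo_init);
	module_exit(demo_exit);
	MODULE_LICENSE("GPL");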

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 drivers/md/md-bitmap.c | 29 +++++++++++++++++++++++++++++
 drivers/md/md-bitmap.h |  1 +
 drivers/md/md.c        |  9 +++++++++
 drivers/md/md.h        |  1 +
 4 files changed, 40 insertions(+)

diff --git a/drivers/md/md-bitmap.c b/drivers/md/md-bitmap.c
index 3ee590cf12a7..25cd72b317f1 100644
--- a/drivers/md/md-bitmap.c
+++ b/drivers/md/md-bitmap.c
@@ -1054,6 +1054,35 @@ void md_bitmap_unplug(struct bitmap *bitmap)
 }
 EXPORT_SYMBOL(md_bitmap_unplug);
 
+struct bitmap_unplug_work {
+	struct work_struct work;
+	struct bitmap *bitmap;
+	struct completion *done;
+};
+
+static void md_bitmap_unplug_fn(struct work_struct *work)
+{
+	struct bitmap_unplug_work *unplug_work =
+		container_of(work, struct bitmap_unplug_work, work);
+
+	md_bitmap_unplug(unplug_work->bitmap);
+	complete(unplug_work->done);
+}
+
+void md_bitmap_unplug_async(struct bitmap *bitmap)
+{
+	DECLARE_COMPLETION_ONSTACK(done);
+	struct bitmap_unplug_work unplug_work;
+
+	INIT_WORK(&unplug_work.work, md_bitmap_unplug_fn);
+	unplug_work.bitmap = bitmap;
+	unplug_work.done = &done;
+
+	queue_work(md_bitmap_wq, &unplug_work.work);
+	wait_for_completion(&done);
+}
+EXPORT_SYMBOL(md_bitmap_unplug_async);
+
 static void md_bitmap_set_memory_bits(struct bitmap *bitmap, sector_t offset, int needed);
 /* * bitmap_init_from_disk -- called at bitmap_create time to initialize
  * the in-memory bitmap from the on-disk bitmap -- also, sets up the
diff --git a/drivers/md/md-bitmap.h b/drivers/md/md-bitmap.h
index 3a4750952b3a..8a3788c9bfef 100644
--- a/drivers/md/md-bitmap.h
+++ b/drivers/md/md-bitmap.h
@@ -264,6 +264,7 @@ void md_bitmap_sync_with_cluster(struct mddev *mddev,
 				 sector_t new_lo, sector_t new_hi);
 
 void md_bitmap_unplug(struct bitmap *bitmap);
+void md_bitmap_unplug_async(struct bitmap *bitmap);
 void md_bitmap_daemon_work(struct mddev *mddev);
 
 int md_bitmap_resize(struct bitmap *bitmap, sector_t blocks,
diff --git a/drivers/md/md.c b/drivers/md/md.c
index e592f37a1071..a5a7af2f4e59 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -83,6 +83,7 @@ static struct module *md_cluster_mod;
 static DECLARE_WAIT_QUEUE_HEAD(resync_wait);
 static struct workqueue_struct *md_wq;
 static struct workqueue_struct *md_misc_wq;
+struct workqueue_struct *md_bitmap_wq;
 
 static int remove_and_add_spares(struct mddev *mddev,
 				 struct md_rdev *this);
@@ -9636,6 +9637,11 @@ static int __init md_init(void)
 	if (!md_misc_wq)
 		goto err_misc_wq;
 
+	md_bitmap_wq = alloc_workqueue("md_bitmap", WQ_MEM_RECLAIM | WQ_UNBOUND,
+				       0);
+	if (!md_bitmap_wq)
+		goto err_bitmap_wq;
+
 	ret = __register_blkdev(MD_MAJOR, "md", md_probe);
 	if (ret < 0)
 		goto err_md;
@@ -9654,6 +9660,8 @@ static int __init md_init(void)
 err_mdp:
 	unregister_blkdev(MD_MAJOR, "md");
 err_md:
+	destroy_workqueue(md_bitmap_wq);
+err_bitmap_wq:
 	destroy_workqueue(md_misc_wq);
 err_misc_wq:
 	destroy_workqueue(md_wq);
@@ -9950,6 +9958,7 @@ static __exit void md_exit(void)
 	spin_unlock(&all_mddevs_lock);
 
 	destroy_workqueue(md_misc_wq);
+	destroy_workqueue(md_bitmap_wq);
 	destroy_workqueue(md_wq);
 }
 
diff --git a/drivers/md/md.h b/drivers/md/md.h
index a50122165fa1..bfd2306bc750 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -852,6 +852,7 @@ struct mdu_array_info_s;
 struct mdu_disk_info_s;
 
 extern int mdp_major;
+extern struct workqueue_struct *md_bitmap_wq;
 void md_autostart_arrays(int part);
 int md_set_array_info(struct mddev *mddev, struct mdu_array_info_s *info);
 int md_add_new_disk(struct mddev *mddev, struct mdu_disk_info_s *info);
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH -next v3 6/7] md/raid1-10: don't handle pluged bio by daemon thread
  2023-05-29 13:10 [PATCH -next v3 0/7] limit the number of plugged bio Yu Kuai
                   ` (4 preceding siblings ...)
  2023-05-29 13:11 ` [PATCH -next v3 5/7] md/md-bitmap: add a new helper to unplug bitmap asynchronously Yu Kuai
@ 2023-05-29 13:11 ` Yu Kuai
  2023-05-31  7:50   ` Xiao Ni
  2023-05-31  7:57   ` Paul Menzel
  2023-05-29 13:11 ` [PATCH -next v3 7/7] md/raid1-10: limit the number of plugged bio Yu Kuai
  2023-06-06 21:54 ` [PATCH -next v3 0/7] " Song Liu
  7 siblings, 2 replies; 23+ messages in thread
From: Yu Kuai @ 2023-05-29 13:11 UTC (permalink / raw)
  To: song, neilb, akpm
  Cc: xni, linux-raid, linux-kernel, yukuai3, yukuai1, yi.zhang, yangerkun

From: Yu Kuai <yukuai3@huawei.com>

current->bio_list will be set under submit_bio() context, in this case
bitmap io will be added to the list and wait for current io submission to
finish, while current io submission must wait for bitmap io to be done.
commit 874807a83139 ("md/raid1{,0}: fix deadlock in bitmap_unplug.") fix
the deadlock by handling plugged bio by daemon thread.

On the one hand, the deadlock won't exist after commit a214b949d8e3
("blk-mq: only flush requests from the plug in blk_mq_submit_bio"). On
the other hand, current solution makes it impossible to flush plugged bio
in raid1/10_make_request(), because this will cause that all the writes
will goto daemon thread.

In order to limit the number of plugged bio, commit 874807a83139
("md/raid1{,0}: fix deadlock in bitmap_unplug.") is reverted, and the
deadlock is fixed by handling bitmap io asynchronously.
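
To spell the decision out (a commented restatement of the helper added by
this patch; the control flow is from the diff below, the comments are
explanatory):

	if (current->bio_list) {
		/*
		 * We are under submit_bio(): bios submitted from this
		 * context only get queued on current->bio_list and are not
		 * dispatched until the outer submission returns, so waiting
		 * for bitmap io synchronously here could deadlock. Hand the
		 * bitmap io to a kworker and wait for that instead.
		 */
		md_bitmap_unplug_async(bitmap);
	} else {
		/* not under submit_bio(), safe to unplug synchronously */
		md_bitmap_unplug(bitmap);
	}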

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 drivers/md/raid1-10.c | 14 ++++++++++++++
 drivers/md/raid1.c    |  4 ++--
 drivers/md/raid10.c   |  8 +++-----
 3 files changed, 19 insertions(+), 7 deletions(-)

diff --git a/drivers/md/raid1-10.c b/drivers/md/raid1-10.c
index 73cc3cb9154d..17e55c1fd5a1 100644
--- a/drivers/md/raid1-10.c
+++ b/drivers/md/raid1-10.c
@@ -151,3 +151,17 @@ static inline bool raid1_add_bio_to_plug(struct mddev *mddev, struct bio *bio,
 
 	return true;
 }
+
+/*
+ * current->bio_list will be set under submit_bio() context, in this case bitmap
+ * io will be added to the list and wait for current io submission to finish,
+ * while current io submission must wait for bitmap io to be done. In order to
+ * avoid such deadlock, submit bitmap io asynchronously.
+ */
+static inline void raid1_prepare_flush_writes(struct bitmap *bitmap)
+{
+	if (current->bio_list)
+		md_bitmap_unplug_async(bitmap);
+	else
+		md_bitmap_unplug(bitmap);
+}
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 0778e398584c..006620fed595 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -794,7 +794,7 @@ static int read_balance(struct r1conf *conf, struct r1bio *r1_bio, int *max_sect
 static void flush_bio_list(struct r1conf *conf, struct bio *bio)
 {
 	/* flush any pending bitmap writes to disk before proceeding w/ I/O */
-	md_bitmap_unplug(conf->mddev->bitmap);
+	raid1_prepare_flush_writes(conf->mddev->bitmap);
 	wake_up(&conf->wait_barrier);
 
 	while (bio) { /* submit pending writes */
@@ -1166,7 +1166,7 @@ static void raid1_unplug(struct blk_plug_cb *cb, bool from_schedule)
 	struct r1conf *conf = mddev->private;
 	struct bio *bio;
 
-	if (from_schedule || current->bio_list) {
+	if (from_schedule) {
 		spin_lock_irq(&conf->device_lock);
 		bio_list_merge(&conf->pending_bio_list, &plug->pending);
 		spin_unlock_irq(&conf->device_lock);
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 6640507ecb0d..fb22cfe94d32 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -902,9 +902,7 @@ static void flush_pending_writes(struct r10conf *conf)
 		__set_current_state(TASK_RUNNING);
 
 		blk_start_plug(&plug);
-		/* flush any pending bitmap writes to disk
-		 * before proceeding w/ I/O */
-		md_bitmap_unplug(conf->mddev->bitmap);
+		raid1_prepare_flush_writes(conf->mddev->bitmap);
 		wake_up(&conf->wait_barrier);
 
 		while (bio) { /* submit pending writes */
@@ -1108,7 +1106,7 @@ static void raid10_unplug(struct blk_plug_cb *cb, bool from_schedule)
 	struct r10conf *conf = mddev->private;
 	struct bio *bio;
 
-	if (from_schedule || current->bio_list) {
+	if (from_schedule) {
 		spin_lock_irq(&conf->device_lock);
 		bio_list_merge(&conf->pending_bio_list, &plug->pending);
 		spin_unlock_irq(&conf->device_lock);
@@ -1120,7 +1118,7 @@ static void raid10_unplug(struct blk_plug_cb *cb, bool from_schedule)
 
 	/* we aren't scheduling, so we can do the write-out directly. */
 	bio = bio_list_get(&plug->pending);
-	md_bitmap_unplug(mddev->bitmap);
+	raid1_prepare_flush_writes(mddev->bitmap);
 	wake_up(&conf->wait_barrier);
 
 	while (bio) { /* submit pending writes */
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH -next v3 7/7] md/raid1-10: limit the number of plugged bio
  2023-05-29 13:10 [PATCH -next v3 0/7] limit the number of plugged bio Yu Kuai
                   ` (5 preceding siblings ...)
  2023-05-29 13:11 ` [PATCH -next v3 6/7] md/raid1-10: don't handle pluged bio by daemon thread Yu Kuai
@ 2023-05-29 13:11 ` Yu Kuai
  2023-05-31 15:42   ` Xiao Ni
  2023-06-06 21:54 ` [PATCH -next v3 0/7] " Song Liu
  7 siblings, 1 reply; 23+ messages in thread
From: Yu Kuai @ 2023-05-29 13:11 UTC (permalink / raw)
  To: song, neilb, akpm
  Cc: xni, linux-raid, linux-kernel, yukuai3, yukuai1, yi.zhang, yangerkun

From: Yu Kuai <yukuai3@huawei.com>

bio can be added to the plug without limit, and the following writeback
test can trigger a huge amount of plugged bio:

Test script:
modprobe brd rd_nr=4 rd_size=10485760
mdadm -CR /dev/md0 -l10 -n4 /dev/ram[0123] --assume-clean --bitmap=internal
echo 0 > /proc/sys/vm/dirty_background_ratio
fio -filename=/dev/md0 -ioengine=libaio -rw=write -bs=4k -numjobs=1 -iodepth=128 -name=test

Test result:
Monitoring /sys/block/md0/inflight shows that inflight keeps increasing
until fio finishes writing; after running for about 2 minutes:

[root@fedora ~]# cat /sys/block/md0/inflight
       0  4474191

Fix the problem by limiting the number of plugged bio based on the number
of copies for the original bio.
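
With MAX_PLUG_BIO set to 32 by this patch, the flush condition
'++plug->count / MAX_PLUG_BIO >= copies' fires once about
copies * MAX_PLUG_BIO bios are queued, e.g. 64 bios for a two-copy raid10
layout. A throwaway userspace check of that arithmetic (hypothetical demo
code, not part of the patch):

	#include <stdio.h>

	#define MAX_PLUG_BIO 32

	int main(void)
	{
		int copies = 2;	/* e.g. a two-copy raid10 layout */
		int count;

		for (count = 1; count <= 1000; count++)
			if (count / MAX_PLUG_BIO >= copies)
				break;

		/* prints "flush at count = 64" */
		printf("flush at count = %d\n", count);
		return 0;
	}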

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 drivers/md/raid1-10.c | 9 ++++++++-
 drivers/md/raid1.c    | 2 +-
 drivers/md/raid10.c   | 2 +-
 3 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/drivers/md/raid1-10.c b/drivers/md/raid1-10.c
index 17e55c1fd5a1..bb1e23b66c45 100644
--- a/drivers/md/raid1-10.c
+++ b/drivers/md/raid1-10.c
@@ -21,6 +21,7 @@
 #define IO_MADE_GOOD ((struct bio *)2)
 
 #define BIO_SPECIAL(bio) ((unsigned long)bio <= 2)
+#define MAX_PLUG_BIO 32
 
 /* for managing resync I/O pages */
 struct resync_pages {
@@ -31,6 +32,7 @@ struct resync_pages {
 struct raid1_plug_cb {
 	struct blk_plug_cb	cb;
 	struct bio_list		pending;
+	unsigned int		count;
 };
 
 static void rbio_pool_free(void *rbio, void *data)
@@ -128,7 +130,7 @@ static inline void raid1_submit_write(struct bio *bio)
 }
 
 static inline bool raid1_add_bio_to_plug(struct mddev *mddev, struct bio *bio,
-				      blk_plug_cb_fn unplug)
+				      blk_plug_cb_fn unplug, int copies)
 {
 	struct raid1_plug_cb *plug = NULL;
 	struct blk_plug_cb *cb;
@@ -148,6 +150,11 @@ static inline bool raid1_add_bio_to_plug(struct mddev *mddev, struct bio *bio,
 
 	plug = container_of(cb, struct raid1_plug_cb, cb);
 	bio_list_add(&plug->pending, bio);
+	if (++plug->count / MAX_PLUG_BIO >= copies) {
+		list_del(&cb->list);
+		cb->callback(cb, false);
+	}
+
 
 	return true;
 }
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 006620fed595..dc89a1c4b1f1 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -1562,7 +1562,7 @@ static void raid1_write_request(struct mddev *mddev, struct bio *bio,
 					      r1_bio->sector);
 		/* flush_pending_writes() needs access to the rdev so...*/
 		mbio->bi_bdev = (void *)rdev;
-		if (!raid1_add_bio_to_plug(mddev, mbio, raid1_unplug)) {
+		if (!raid1_add_bio_to_plug(mddev, mbio, raid1_unplug, disks)) {
 			spin_lock_irqsave(&conf->device_lock, flags);
 			bio_list_add(&conf->pending_bio_list, mbio);
 			spin_unlock_irqrestore(&conf->device_lock, flags);
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index fb22cfe94d32..9237dbeb07ba 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -1306,7 +1306,7 @@ static void raid10_write_one_disk(struct mddev *mddev, struct r10bio *r10_bio,
 
 	atomic_inc(&r10_bio->remaining);
 
-	if (!raid1_add_bio_to_plug(mddev, mbio, raid10_unplug)) {
+	if (!raid1_add_bio_to_plug(mddev, mbio, raid10_unplug, conf->copies)) {
 		spin_lock_irqsave(&conf->device_lock, flags);
 		bio_list_add(&conf->pending_bio_list, mbio);
 		spin_unlock_irqrestore(&conf->device_lock, flags);
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH -next v3 3/7] md/raid1-10: factor out a helper to submit normal write
  2023-05-29 13:11 ` [PATCH -next v3 3/7] md/raid1-10: factor out a helper to submit normal write Yu Kuai
@ 2023-05-31  7:20   ` Xiao Ni
  2023-05-31  7:56     ` Yu Kuai
  0 siblings, 1 reply; 23+ messages in thread
From: Xiao Ni @ 2023-05-31  7:20 UTC (permalink / raw)
  To: Yu Kuai
  Cc: song, neilb, akpm, linux-raid, linux-kernel, yukuai3, yi.zhang,
	yangerkun

On Mon, May 29, 2023 at 9:14 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>
> From: Yu Kuai <yukuai3@huawei.com>
>
> There are multiple places to do the same thing, factor out a helper to
> prevent redundant code, and the helper will be used in following patch
> as well.
>
> Signed-off-by: Yu Kuai <yukuai3@huawei.com>
> ---
>  drivers/md/raid1-10.c | 17 +++++++++++++++++
>  drivers/md/raid1.c    | 13 ++-----------
>  drivers/md/raid10.c   | 26 ++++----------------------
>  3 files changed, 23 insertions(+), 33 deletions(-)
>
> diff --git a/drivers/md/raid1-10.c b/drivers/md/raid1-10.c
> index 9bf19a3409ce..506299bd55cb 100644
> --- a/drivers/md/raid1-10.c
> +++ b/drivers/md/raid1-10.c
> @@ -110,6 +110,23 @@ static void md_bio_reset_resync_pages(struct bio *bio, struct resync_pages *rp,
>         } while (idx++ < RESYNC_PAGES && size > 0);
>  }
>
> +
> +static inline void raid1_submit_write(struct bio *bio)

Hi Kuai

Is it better to change the name to rdev_submit_write? It's just a
suggestion. The patch looks good to me.

Regards
Xiao

> +{
> +       struct md_rdev *rdev = (struct md_rdev *)bio->bi_bdev;
> +
> +       bio->bi_next = NULL;
> +       bio_set_dev(bio, rdev->bdev);
> +       if (test_bit(Faulty, &rdev->flags))
> +               bio_io_error(bio);
> +       else if (unlikely(bio_op(bio) ==  REQ_OP_DISCARD &&
> +                         !bdev_max_discard_sectors(bio->bi_bdev)))
> +               /* Just ignore it */
> +               bio_endio(bio);
> +       else
> +               submit_bio_noacct(bio);
> +}
> +
>  static inline bool raid1_add_bio_to_plug(struct mddev *mddev, struct bio *bio,
>                                       blk_plug_cb_fn unplug)
>  {
> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> index e86c5e71c604..0778e398584c 100644
> --- a/drivers/md/raid1.c
> +++ b/drivers/md/raid1.c
> @@ -799,17 +799,8 @@ static void flush_bio_list(struct r1conf *conf, struct bio *bio)
>
>         while (bio) { /* submit pending writes */
>                 struct bio *next = bio->bi_next;
> -               struct md_rdev *rdev = (void *)bio->bi_bdev;
> -               bio->bi_next = NULL;
> -               bio_set_dev(bio, rdev->bdev);
> -               if (test_bit(Faulty, &rdev->flags)) {
> -                       bio_io_error(bio);
> -               } else if (unlikely((bio_op(bio) == REQ_OP_DISCARD) &&
> -                                   !bdev_max_discard_sectors(bio->bi_bdev)))
> -                       /* Just ignore it */
> -                       bio_endio(bio);
> -               else
> -                       submit_bio_noacct(bio);
> +
> +               raid1_submit_write(bio);
>                 bio = next;
>                 cond_resched();
>         }
> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
> index 18702051ebd1..6640507ecb0d 100644
> --- a/drivers/md/raid10.c
> +++ b/drivers/md/raid10.c
> @@ -909,17 +909,8 @@ static void flush_pending_writes(struct r10conf *conf)
>
>                 while (bio) { /* submit pending writes */
>                         struct bio *next = bio->bi_next;
> -                       struct md_rdev *rdev = (void*)bio->bi_bdev;
> -                       bio->bi_next = NULL;
> -                       bio_set_dev(bio, rdev->bdev);
> -                       if (test_bit(Faulty, &rdev->flags)) {
> -                               bio_io_error(bio);
> -                       } else if (unlikely((bio_op(bio) ==  REQ_OP_DISCARD) &&
> -                                           !bdev_max_discard_sectors(bio->bi_bdev)))
> -                               /* Just ignore it */
> -                               bio_endio(bio);
> -                       else
> -                               submit_bio_noacct(bio);
> +
> +                       raid1_submit_write(bio);
>                         bio = next;
>                         cond_resched();
>                 }
> @@ -1134,17 +1125,8 @@ static void raid10_unplug(struct blk_plug_cb *cb, bool from_schedule)
>
>         while (bio) { /* submit pending writes */
>                 struct bio *next = bio->bi_next;
> -               struct md_rdev *rdev = (void*)bio->bi_bdev;
> -               bio->bi_next = NULL;
> -               bio_set_dev(bio, rdev->bdev);
> -               if (test_bit(Faulty, &rdev->flags)) {
> -                       bio_io_error(bio);
> -               } else if (unlikely((bio_op(bio) ==  REQ_OP_DISCARD) &&
> -                                   !bdev_max_discard_sectors(bio->bi_bdev)))
> -                       /* Just ignore it */
> -                       bio_endio(bio);
> -               else
> -                       submit_bio_noacct(bio);
> +
> +               raid1_submit_write(bio);
>                 bio = next;
>                 cond_resched();
>         }
> --
> 2.39.2
>


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH -next v3 4/7] md/raid1-10: submit write io directly if bitmap is not enabled
  2023-05-29 13:11 ` [PATCH -next v3 4/7] md/raid1-10: submit write io directly if bitmap is not enabled Yu Kuai
@ 2023-05-31  7:26   ` Xiao Ni
  2023-05-31  8:25     ` Yu Kuai
  0 siblings, 1 reply; 23+ messages in thread
From: Xiao Ni @ 2023-05-31  7:26 UTC (permalink / raw)
  To: Yu Kuai
  Cc: song, neilb, akpm, linux-raid, linux-kernel, yukuai3, yi.zhang,
	yangerkun

On Mon, May 29, 2023 at 9:14 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>
> From: Yu Kuai <yukuai3@huawei.com>
>
> Commit 6cce3b23f6f8 ("[PATCH] md: write intent bitmap support for raid10")
> add bitmap support, and it changed that write io is submitted through
> daemon thread because bitmap need to be updated before write io. And
> later, plug is used to fix performance regression because all the write io
> will go to demon thread, which means io can't be issued concurrently.
>
> However, if bitmap is not enabled, the write io should not go to daemon
> thread in the first place, and plug is not needed as well.
>
> Fixes: 6cce3b23f6f8 ("[PATCH] md: write intent bitmap support for raid10")
> Signed-off-by: Yu Kuai <yukuai3@huawei.com>
> ---
>  drivers/md/md-bitmap.c |  4 +---
>  drivers/md/md-bitmap.h |  7 +++++++
>  drivers/md/raid1-10.c  | 13 +++++++++++--
>  3 files changed, 19 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/md/md-bitmap.c b/drivers/md/md-bitmap.c
> index ad5a3456cd8a..3ee590cf12a7 100644
> --- a/drivers/md/md-bitmap.c
> +++ b/drivers/md/md-bitmap.c
> @@ -1016,7 +1016,6 @@ static int md_bitmap_file_test_bit(struct bitmap *bitmap, sector_t block)
>         return set;
>  }
>
> -
>  /* this gets called when the md device is ready to unplug its underlying
>   * (slave) device queues -- before we let any writes go down, we need to
>   * sync the dirty pages of the bitmap file to disk */
> @@ -1026,8 +1025,7 @@ void md_bitmap_unplug(struct bitmap *bitmap)
>         int dirty, need_write;
>         int writing = 0;
>
> -       if (!bitmap || !bitmap->storage.filemap ||
> -           test_bit(BITMAP_STALE, &bitmap->flags))
> +       if (!md_bitmap_enabled(bitmap))
>                 return;
>
>         /* look at each page to see if there are any set bits that need to be
> diff --git a/drivers/md/md-bitmap.h b/drivers/md/md-bitmap.h
> index cfd7395de8fd..3a4750952b3a 100644
> --- a/drivers/md/md-bitmap.h
> +++ b/drivers/md/md-bitmap.h
> @@ -273,6 +273,13 @@ int md_bitmap_copy_from_slot(struct mddev *mddev, int slot,
>                              sector_t *lo, sector_t *hi, bool clear_bits);
>  void md_bitmap_free(struct bitmap *bitmap);
>  void md_bitmap_wait_behind_writes(struct mddev *mddev);
> +
> +static inline bool md_bitmap_enabled(struct bitmap *bitmap)
> +{
> +       return bitmap && bitmap->storage.filemap &&
> +              !test_bit(BITMAP_STALE, &bitmap->flags);
> +}
> +
>  #endif
>
>  #endif
> diff --git a/drivers/md/raid1-10.c b/drivers/md/raid1-10.c
> index 506299bd55cb..73cc3cb9154d 100644
> --- a/drivers/md/raid1-10.c
> +++ b/drivers/md/raid1-10.c
> @@ -131,9 +131,18 @@ static inline bool raid1_add_bio_to_plug(struct mddev *mddev, struct bio *bio,
>                                       blk_plug_cb_fn unplug)
>  {
>         struct raid1_plug_cb *plug = NULL;
> -       struct blk_plug_cb *cb = blk_check_plugged(unplug, mddev,
> -                                                  sizeof(*plug));
> +       struct blk_plug_cb *cb;
> +
> +       /*
> +        * If bitmap is not enabled, it's safe to submit the io directly, and
> +        * this can get optimal performance.
> +        */
> +       if (!md_bitmap_enabled(mddev->bitmap)) {
> +               raid1_submit_write(bio);
> +               return true;
> +       }

Can we check this out of raid1_add_bio_to_plug and call
raid1_submit_write directly in make_request function?

Regards
Xiao
>
> +       cb = blk_check_plugged(unplug, mddev, sizeof(*plug));
>         if (!cb)
>                 return false;
>
> --
> 2.39.2
>


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH -next v3 6/7] md/raid1-10: don't handle pluged bio by daemon thread
  2023-05-29 13:11 ` [PATCH -next v3 6/7] md/raid1-10: don't handle pluged bio by daemon thread Yu Kuai
@ 2023-05-31  7:50   ` Xiao Ni
  2023-05-31  7:55     ` Yu Kuai
  2023-05-31  7:57   ` Paul Menzel
  1 sibling, 1 reply; 23+ messages in thread
From: Xiao Ni @ 2023-05-31  7:50 UTC (permalink / raw)
  To: Yu Kuai
  Cc: song, neilb, akpm, linux-raid, linux-kernel, yukuai3, yi.zhang,
	yangerkun

On Mon, May 29, 2023 at 9:14 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>
> From: Yu Kuai <yukuai3@huawei.com>
>
> current->bio_list will be set under submit_bio() context, in this case
> bitmap io will be added to the list and wait for current io submission to
> finish, while current io submission must wait for bitmap io to be done.
> commit 874807a83139 ("md/raid1{,0}: fix deadlock in bitmap_unplug.") fix
> the deadlock by handling plugged bio by daemon thread.

Thanks for the historic introduction. I did a test and printed the
logs in raid10_unplug. The tools I used are dd and mkfs. from_schedule
is always true during I/O and it's 0 when io finishes. So I have a
question here, how can I trigger the condition that from_schedule is 0
and current->list is not NULL? In other words, is there really a
deadlock here? Before your patch it looks like all bios are merged
into conf->pending_bio_list and are handled by raid10d. It can't
submit bio directly in the originating process which mentioned in
57c67df48866

>
> On the one hand, the deadlock won't exist after commit a214b949d8e3
> ("blk-mq: only flush requests from the plug in blk_mq_submit_bio"). On
> the other hand, current solution makes it impossible to flush plugged bio
> in raid1/10_make_request(), because this will cause that all the writes
> will goto daemon thread.
>
> In order to limit the number of plugged bio, commit 874807a83139
> ("md/raid1{,0}: fix deadlock in bitmap_unplug.") is reverted, and the
> deadlock is fixed by handling bitmap io asynchronously.
>
> Signed-off-by: Yu Kuai <yukuai3@huawei.com>
> ---
>  drivers/md/raid1-10.c | 14 ++++++++++++++
>  drivers/md/raid1.c    |  4 ++--
>  drivers/md/raid10.c   |  8 +++-----
>  3 files changed, 19 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/md/raid1-10.c b/drivers/md/raid1-10.c
> index 73cc3cb9154d..17e55c1fd5a1 100644
> --- a/drivers/md/raid1-10.c
> +++ b/drivers/md/raid1-10.c
> @@ -151,3 +151,17 @@ static inline bool raid1_add_bio_to_plug(struct mddev *mddev, struct bio *bio,
>
>         return true;
>  }
> +
> +/*
> + * current->bio_list will be set under submit_bio() context, in this case bitmap
> + * io will be added to the list and wait for current io submission to finish,
> + * while current io submission must wait for bitmap io to be done. In order to
> + * avoid such deadlock, submit bitmap io asynchronously.
> + */
> +static inline void raid1_prepare_flush_writes(struct bitmap *bitmap)
> +{
> +       if (current->bio_list)
> +               md_bitmap_unplug_async(bitmap);
> +       else
> +               md_bitmap_unplug(bitmap);
> +}
> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> index 0778e398584c..006620fed595 100644
> --- a/drivers/md/raid1.c
> +++ b/drivers/md/raid1.c
> @@ -794,7 +794,7 @@ static int read_balance(struct r1conf *conf, struct r1bio *r1_bio, int *max_sect
>  static void flush_bio_list(struct r1conf *conf, struct bio *bio)
>  {
>         /* flush any pending bitmap writes to disk before proceeding w/ I/O */
> -       md_bitmap_unplug(conf->mddev->bitmap);
> +       raid1_prepare_flush_writes(conf->mddev->bitmap);

If we unplug bitmap asynchronously, can we make sure the bitmap are
flushed before the corresponding data?

Regards
Xiao

>         wake_up(&conf->wait_barrier);
>
>         while (bio) { /* submit pending writes */
> @@ -1166,7 +1166,7 @@ static void raid1_unplug(struct blk_plug_cb *cb, bool from_schedule)
>         struct r1conf *conf = mddev->private;
>         struct bio *bio;
>
> -       if (from_schedule || current->bio_list) {
> +       if (from_schedule) {
>                 spin_lock_irq(&conf->device_lock);
>                 bio_list_merge(&conf->pending_bio_list, &plug->pending);
>                 spin_unlock_irq(&conf->device_lock);
> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
> index 6640507ecb0d..fb22cfe94d32 100644
> --- a/drivers/md/raid10.c
> +++ b/drivers/md/raid10.c
> @@ -902,9 +902,7 @@ static void flush_pending_writes(struct r10conf *conf)
>                 __set_current_state(TASK_RUNNING);
>
>                 blk_start_plug(&plug);
> -               /* flush any pending bitmap writes to disk
> -                * before proceeding w/ I/O */
> -               md_bitmap_unplug(conf->mddev->bitmap);
> +               raid1_prepare_flush_writes(conf->mddev->bitmap);
>                 wake_up(&conf->wait_barrier);
>
>                 while (bio) { /* submit pending writes */
> @@ -1108,7 +1106,7 @@ static void raid10_unplug(struct blk_plug_cb *cb, bool from_schedule)
>         struct r10conf *conf = mddev->private;
>         struct bio *bio;
>
> -       if (from_schedule || current->bio_list) {
> +       if (from_schedule) {
>                 spin_lock_irq(&conf->device_lock);
>                 bio_list_merge(&conf->pending_bio_list, &plug->pending);
>                 spin_unlock_irq(&conf->device_lock);
> @@ -1120,7 +1118,7 @@ static void raid10_unplug(struct blk_plug_cb *cb, bool from_schedule)
>
>         /* we aren't scheduling, so we can do the write-out directly. */
>         bio = bio_list_get(&plug->pending);
> -       md_bitmap_unplug(mddev->bitmap);
> +       raid1_prepare_flush_writes(mddev->bitmap);
>         wake_up(&conf->wait_barrier);
>
>         while (bio) { /* submit pending writes */
> --
> 2.39.2
>


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH -next v3 6/7] md/raid1-10: don't handle pluged bio by daemon thread
  2023-05-31  7:50   ` Xiao Ni
@ 2023-05-31  7:55     ` Yu Kuai
  2023-05-31  8:00       ` Xiao Ni
  0 siblings, 1 reply; 23+ messages in thread
From: Yu Kuai @ 2023-05-31  7:55 UTC (permalink / raw)
  To: Xiao Ni, Yu Kuai
  Cc: song, neilb, akpm, linux-raid, linux-kernel, yi.zhang, yangerkun,
	yukuai (C)

Hi,

On 2023/05/31 15:50, Xiao Ni wrote:
> On Mon, May 29, 2023 at 9:14 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>>
>> From: Yu Kuai <yukuai3@huawei.com>
>>
>> current->bio_list will be set under submit_bio() context, in this case
>> bitmap io will be added to the list and wait for current io submission to
>> finish, while current io submission must wait for bitmap io to be done.
>> commit 874807a83139 ("md/raid1{,0}: fix deadlock in bitmap_unplug.") fix
>> the deadlock by handling plugged bio by daemon thread.
> 
> Thanks for the historic introduction. I did a test and printed the
> logs in raid10_unplug. The tools I used are dd and mkfs. from_schedule
> is always true during I/O and it's 0 when io finishes. So I have a
> question here, how can I trigger the condition that from_schedule is 0
> and current->list is not NULL? In other words, is there really a
> deadlock here? Before your patch it looks like all bios are merged
> into conf->pending_bio_list and are handled by raid10d. It can't
> submit bio directly in the originating process which mentioned in
> 57c67df48866
> 
As I mentioned below, after commit a214b949d8e3, this deadlock doesn't
exist anymore, and without this patch, patch 7 will introduce this
scenario again.

Thanks,
Kuai
>>
>> On the one hand, the deadlock won't exist after commit a214b949d8e3
>> ("blk-mq: only flush requests from the plug in blk_mq_submit_bio"). On
>> the other hand, current solution makes it impossible to flush plugged bio
>> in raid1/10_make_request(), because this will cause that all the writes
>> will goto daemon thread.
>>
>> In order to limit the number of plugged bio, commit 874807a83139
>> ("md/raid1{,0}: fix deadlock in bitmap_unplug.") is reverted, and the
>> deadlock is fixed by handling bitmap io asynchronously.
>>
>> Signed-off-by: Yu Kuai <yukuai3@huawei.com>
>> ---
>>   drivers/md/raid1-10.c | 14 ++++++++++++++
>>   drivers/md/raid1.c    |  4 ++--
>>   drivers/md/raid10.c   |  8 +++-----
>>   3 files changed, 19 insertions(+), 7 deletions(-)
>>
>> diff --git a/drivers/md/raid1-10.c b/drivers/md/raid1-10.c
>> index 73cc3cb9154d..17e55c1fd5a1 100644
>> --- a/drivers/md/raid1-10.c
>> +++ b/drivers/md/raid1-10.c
>> @@ -151,3 +151,17 @@ static inline bool raid1_add_bio_to_plug(struct mddev *mddev, struct bio *bio,
>>
>>          return true;
>>   }
>> +
>> +/*
>> + * current->bio_list will be set under submit_bio() context, in this case bitmap
>> + * io will be added to the list and wait for current io submission to finish,
>> + * while current io submission must wait for bitmap io to be done. In order to
>> + * avoid such deadlock, submit bitmap io asynchronously.
>> + */
>> +static inline void raid1_prepare_flush_writes(struct bitmap *bitmap)
>> +{
>> +       if (current->bio_list)
>> +               md_bitmap_unplug_async(bitmap);
>> +       else
>> +               md_bitmap_unplug(bitmap);
>> +}
>> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
>> index 0778e398584c..006620fed595 100644
>> --- a/drivers/md/raid1.c
>> +++ b/drivers/md/raid1.c
>> @@ -794,7 +794,7 @@ static int read_balance(struct r1conf *conf, struct r1bio *r1_bio, int *max_sect
>>   static void flush_bio_list(struct r1conf *conf, struct bio *bio)
>>   {
>>          /* flush any pending bitmap writes to disk before proceeding w/ I/O */
>> -       md_bitmap_unplug(conf->mddev->bitmap);
>> +       raid1_prepare_flush_writes(conf->mddev->bitmap);
> 
> If we unplug bitmap asynchronously, can we make sure the bitmap are
> flushed before the corresponding data?
> 
> Regards
> Xiao
> 
>>          wake_up(&conf->wait_barrier);
>>
>>          while (bio) { /* submit pending writes */
>> @@ -1166,7 +1166,7 @@ static void raid1_unplug(struct blk_plug_cb *cb, bool from_schedule)
>>          struct r1conf *conf = mddev->private;
>>          struct bio *bio;
>>
>> -       if (from_schedule || current->bio_list) {
>> +       if (from_schedule) {
>>                  spin_lock_irq(&conf->device_lock);
>>                  bio_list_merge(&conf->pending_bio_list, &plug->pending);
>>                  spin_unlock_irq(&conf->device_lock);
>> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
>> index 6640507ecb0d..fb22cfe94d32 100644
>> --- a/drivers/md/raid10.c
>> +++ b/drivers/md/raid10.c
>> @@ -902,9 +902,7 @@ static void flush_pending_writes(struct r10conf *conf)
>>                  __set_current_state(TASK_RUNNING);
>>
>>                  blk_start_plug(&plug);
>> -               /* flush any pending bitmap writes to disk
>> -                * before proceeding w/ I/O */
>> -               md_bitmap_unplug(conf->mddev->bitmap);
>> +               raid1_prepare_flush_writes(conf->mddev->bitmap);
>>                  wake_up(&conf->wait_barrier);
>>
>>                  while (bio) { /* submit pending writes */
>> @@ -1108,7 +1106,7 @@ static void raid10_unplug(struct blk_plug_cb *cb, bool from_schedule)
>>          struct r10conf *conf = mddev->private;
>>          struct bio *bio;
>>
>> -       if (from_schedule || current->bio_list) {
>> +       if (from_schedule) {
>>                  spin_lock_irq(&conf->device_lock);
>>                  bio_list_merge(&conf->pending_bio_list, &plug->pending);
>>                  spin_unlock_irq(&conf->device_lock);
>> @@ -1120,7 +1118,7 @@ static void raid10_unplug(struct blk_plug_cb *cb, bool from_schedule)
>>
>>          /* we aren't scheduling, so we can do the write-out directly. */
>>          bio = bio_list_get(&plug->pending);
>> -       md_bitmap_unplug(mddev->bitmap);
>> +       raid1_prepare_flush_writes(mddev->bitmap);
>>          wake_up(&conf->wait_barrier);
>>
>>          while (bio) { /* submit pending writes */
>> --
>> 2.39.2
>>
> 
> .
> 


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH -next v3 3/7] md/raid1-10: factor out a helper to submit normal write
  2023-05-31  7:20   ` Xiao Ni
@ 2023-05-31  7:56     ` Yu Kuai
  0 siblings, 0 replies; 23+ messages in thread
From: Yu Kuai @ 2023-05-31  7:56 UTC (permalink / raw)
  To: Xiao Ni, Yu Kuai
  Cc: song, neilb, akpm, linux-raid, linux-kernel, yi.zhang, yangerkun,
	yukuai (C)

Hi,

On 2023/05/31 15:20, Xiao Ni wrote:
>> @@ -110,6 +110,23 @@ static void md_bio_reset_resync_pages(struct bio *bio, struct resync_pages *rp,
>>          } while (idx++ < RESYNC_PAGES && size > 0);
>>   }
>>
>> +
>> +static inline void raid1_submit_write(struct bio *bio)
> 
> Hi Kuai
> 
> Is it better to change the name to rdev_submit_write? It's just a
> suggestion. The patch looks good to me.

Yes, this sounds like a better name.

Thanks,
Kuai
> 
> Regards
> Xiao


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH -next v3 6/7] md/raid1-10: don't handle pluged bio by daemon thread
  2023-05-29 13:11 ` [PATCH -next v3 6/7] md/raid1-10: don't handle pluged bio by daemon thread Yu Kuai
  2023-05-31  7:50   ` Xiao Ni
@ 2023-05-31  7:57   ` Paul Menzel
  1 sibling, 0 replies; 23+ messages in thread
From: Paul Menzel @ 2023-05-31  7:57 UTC (permalink / raw)
  To: Yu Kuai
  Cc: song, neilb, akpm, xni, linux-raid, linux-kernel, yukuai3,
	yi.zhang, yangerkun

Dear Yu,


Thank you for your patch. Some minor nits in case you should resend this.

In the summary/title it should be plug*g*ed.

Am 29.05.23 um 15:11 schrieb Yu Kuai:
> From: Yu Kuai <yukuai3@huawei.com>
> 
> current->bio_list will be set under submit_bio() context, in this case
> bitmap io will be added to the list and wait for current io submission to

1.  I’d use present tense: s/will be set/is set/; s/will be added/is added/
2.  wait*s*

> finish, while current io submission must wait for bitmap io to be done.
> commit 874807a83139 ("md/raid1{,0}: fix deadlock in bitmap_unplug.") fix
> the deadlock by handling plugged bio by daemon thread.
> 
> On the one hand, the deadlock won't exist after commit a214b949d8e3
> ("blk-mq: only flush requests from the plug in blk_mq_submit_bio"). On
> the other hand, current solution makes it impossible to flush plugged bio
> in raid1/10_make_request(), because this will cause that all the writes
> will goto daemon thread.

s/goto/go to/


Kind regards,

Paul


> In order to limit the number of plugged bio, commit 874807a83139
> ("md/raid1{,0}: fix deadlock in bitmap_unplug.") is reverted, and the
> deadlock is fixed by handling bitmap io asynchronously.
> 
> Signed-off-by: Yu Kuai <yukuai3@huawei.com>
> ---
>   drivers/md/raid1-10.c | 14 ++++++++++++++
>   drivers/md/raid1.c    |  4 ++--
>   drivers/md/raid10.c   |  8 +++-----
>   3 files changed, 19 insertions(+), 7 deletions(-)

[…]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH -next v3 6/7] md/raid1-10: don't handle pluged bio by daemon thread
  2023-05-31  7:55     ` Yu Kuai
@ 2023-05-31  8:00       ` Xiao Ni
  2023-05-31  8:06         ` Yu Kuai
  0 siblings, 1 reply; 23+ messages in thread
From: Xiao Ni @ 2023-05-31  8:00 UTC (permalink / raw)
  To: Yu Kuai
  Cc: song, neilb, akpm, linux-raid, linux-kernel, yi.zhang, yangerkun,
	yukuai (C)

On Wed, May 31, 2023 at 3:55 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>
> Hi,
>
> On 2023/05/31 15:50, Xiao Ni wrote:
> > On Mon, May 29, 2023 at 9:14 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
> >>
> >> From: Yu Kuai <yukuai3@huawei.com>
> >>
> >> current->bio_list will be set under submit_bio() context, in this case
> >> bitmap io will be added to the list and wait for current io submission to
> >> finish, while current io submission must wait for bitmap io to be done.
> >> commit 874807a83139 ("md/raid1{,0}: fix deadlock in bitmap_unplug.") fix
> >> the deadlock by handling plugged bio by daemon thread.
> >
> > Thanks for the historic introduction. I did a test and printed the
> > logs in raid10_unplug. The tools I used are dd and mkfs. from_schedule
> > is always true during I/O and it's 0 when io finishes. So I have a
> > question here, how can I trigger the condition that from_schedule is 0
> > and current->list is not NULL? In other words, is there really a
> > deadlock here? Before your patch it looks like all bios are merged
> > into conf->pending_bio_list and are handled by raid10d. It can't
> > submit bio directly in the originating process which mentioned in
> > 57c67df48866
> >
> As I mentioned below, after commit a214b949d8e3, this deadlock doesn't
> exist anymore, and without this patch, patch 7 will introduce this
> scenario again.
>
> Thanks,
> Kuai
> >>
> >> On the one hand, the deadlock won't exist after commit a214b949d8e3
> >> ("blk-mq: only flush requests from the plug in blk_mq_submit_bio"). On
> >> the other hand, current solution makes it impossible to flush plugged bio
> >> in raid1/10_make_request(), because this will cause that all the writes
> >> will goto daemon thread.
> >>
> >> In order to limit the number of plugged bio, commit 874807a83139
> >> ("md/raid1{,0}: fix deadlock in bitmap_unplug.") is reverted, and the
> >> deadlock is fixed by handling bitmap io asynchronously.
> >>
> >> Signed-off-by: Yu Kuai <yukuai3@huawei.com>
> >> ---
> >>   drivers/md/raid1-10.c | 14 ++++++++++++++
> >>   drivers/md/raid1.c    |  4 ++--
> >>   drivers/md/raid10.c   |  8 +++-----
> >>   3 files changed, 19 insertions(+), 7 deletions(-)
> >>
> >> diff --git a/drivers/md/raid1-10.c b/drivers/md/raid1-10.c
> >> index 73cc3cb9154d..17e55c1fd5a1 100644
> >> --- a/drivers/md/raid1-10.c
> >> +++ b/drivers/md/raid1-10.c
> >> @@ -151,3 +151,17 @@ static inline bool raid1_add_bio_to_plug(struct mddev *mddev, struct bio *bio,
> >>
> >>          return true;
> >>   }
> >> +
> >> +/*
> >> + * current->bio_list will be set under submit_bio() context, in this case bitmap
> >> + * io will be added to the list and wait for current io submission to finish,
> >> + * while current io submission must wait for bitmap io to be done. In order to
> >> + * avoid such deadlock, submit bitmap io asynchronously.
> >> + */
> >> +static inline void raid1_prepare_flush_writes(struct bitmap *bitmap)
> >> +{
> >> +       if (current->bio_list)
> >> +               md_bitmap_unplug_async(bitmap);
> >> +       else
> >> +               md_bitmap_unplug(bitmap);
> >> +}
> >> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> >> index 0778e398584c..006620fed595 100644
> >> --- a/drivers/md/raid1.c
> >> +++ b/drivers/md/raid1.c
> >> @@ -794,7 +794,7 @@ static int read_balance(struct r1conf *conf, struct r1bio *r1_bio, int *max_sect
> >>   static void flush_bio_list(struct r1conf *conf, struct bio *bio)
> >>   {
> >>          /* flush any pending bitmap writes to disk before proceeding w/ I/O */
> >> -       md_bitmap_unplug(conf->mddev->bitmap);
> >> +       raid1_prepare_flush_writes(conf->mddev->bitmap);
> >
> > If we unplug bitmap asynchronously, can we make sure the bitmap are
> > flushed before the corresponding data?

Could you explain this question?

Regards
Xiao


> >
> > Regards
> > Xiao
> >
> >>          wake_up(&conf->wait_barrier);
> >>
> >>          while (bio) { /* submit pending writes */
> >> @@ -1166,7 +1166,7 @@ static void raid1_unplug(struct blk_plug_cb *cb, bool from_schedule)
> >>          struct r1conf *conf = mddev->private;
> >>          struct bio *bio;
> >>
> >> -       if (from_schedule || current->bio_list) {
> >> +       if (from_schedule) {
> >>                  spin_lock_irq(&conf->device_lock);
> >>                  bio_list_merge(&conf->pending_bio_list, &plug->pending);
> >>                  spin_unlock_irq(&conf->device_lock);
> >> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
> >> index 6640507ecb0d..fb22cfe94d32 100644
> >> --- a/drivers/md/raid10.c
> >> +++ b/drivers/md/raid10.c
> >> @@ -902,9 +902,7 @@ static void flush_pending_writes(struct r10conf *conf)
> >>                  __set_current_state(TASK_RUNNING);
> >>
> >>                  blk_start_plug(&plug);
> >> -               /* flush any pending bitmap writes to disk
> >> -                * before proceeding w/ I/O */
> >> -               md_bitmap_unplug(conf->mddev->bitmap);
> >> +               raid1_prepare_flush_writes(conf->mddev->bitmap);
> >>                  wake_up(&conf->wait_barrier);
> >>
> >>                  while (bio) { /* submit pending writes */
> >> @@ -1108,7 +1106,7 @@ static void raid10_unplug(struct blk_plug_cb *cb, bool from_schedule)
> >>          struct r10conf *conf = mddev->private;
> >>          struct bio *bio;
> >>
> >> -       if (from_schedule || current->bio_list) {
> >> +       if (from_schedule) {
> >>                  spin_lock_irq(&conf->device_lock);
> >>                  bio_list_merge(&conf->pending_bio_list, &plug->pending);
> >>                  spin_unlock_irq(&conf->device_lock);
> >> @@ -1120,7 +1118,7 @@ static void raid10_unplug(struct blk_plug_cb *cb, bool from_schedule)
> >>
> >>          /* we aren't scheduling, so we can do the write-out directly. */
> >>          bio = bio_list_get(&plug->pending);
> >> -       md_bitmap_unplug(mddev->bitmap);
> >> +       raid1_prepare_flush_writes(mddev->bitmap);
> >>          wake_up(&conf->wait_barrier);
> >>
> >>          while (bio) { /* submit pending writes */
> >> --
> >> 2.39.2
> >>
> >
> > .
> >
>


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH -next v3 6/7] md/raid1-10: don't handle pluged bio by daemon thread
  2023-05-31  8:00       ` Xiao Ni
@ 2023-05-31  8:06         ` Yu Kuai
  2023-05-31 15:23           ` Xiao Ni
  0 siblings, 1 reply; 23+ messages in thread
From: Yu Kuai @ 2023-05-31  8:06 UTC (permalink / raw)
  To: Xiao Ni, Yu Kuai
  Cc: song, neilb, akpm, linux-raid, linux-kernel, yi.zhang, yangerkun,
	yukuai (C)

Hi,

On 2023/05/31 16:00, Xiao Ni wrote:
> On Wed, May 31, 2023 at 3:55 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>>
>> Hi,
>>
> >> On 2023/05/31 15:50, Xiao Ni wrote:
>>> On Mon, May 29, 2023 at 9:14 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>>>>
>>>> From: Yu Kuai <yukuai3@huawei.com>
>>>>
>>>> current->bio_list will be set under submit_bio() context, in this case
>>>> bitmap io will be added to the list and wait for current io submission to
>>>> finish, while current io submission must wait for bitmap io to be done.
>>>> commit 874807a83139 ("md/raid1{,0}: fix deadlock in bitmap_unplug.") fix
>>>> the deadlock by handling plugged bio by daemon thread.
>>>
>>> Thanks for the historic introduction. I did a test and printed the
>>> logs in raid10_unplug. The tools I used are dd and mkfs. from_schedule
>>> is always true during I/O and it's 0 when io finishes. So I have a
>>> question here, how can I trigger the condition that from_schedule is 0
>>> and current->list is not NULL? In other words, is there really a
>>> deadlock here? Before your patch it looks like all bios are merged
>>> into conf->pending_bio_list and are handled by raid10d. It can't
>>> submit bio directly in the originating process which mentioned in
>>> 57c67df48866
>>>
>> As I mentioned below, after commit a214b949d8e3, this deadlock doesn't
>> exist anymore, and without this patch, patch 7 will introduce this
>> scenario again.
>>
>> Thanks,
>> Kuai
>>>>
>>>> On the one hand, the deadlock won't exist after commit a214b949d8e3
>>>> ("blk-mq: only flush requests from the plug in blk_mq_submit_bio"). On
>>>> the other hand, current solution makes it impossible to flush plugged bio
>>>> in raid1/10_make_request(), because this will cause that all the writes
>>>> will goto daemon thread.
>>>>
>>>> In order to limit the number of plugged bio, commit 874807a83139
>>>> ("md/raid1{,0}: fix deadlock in bitmap_unplug.") is reverted, and the
>>>> deadlock is fixed by handling bitmap io asynchronously.
>>>>
>>>> Signed-off-by: Yu Kuai <yukuai3@huawei.com>
>>>> ---
>>>>    drivers/md/raid1-10.c | 14 ++++++++++++++
>>>>    drivers/md/raid1.c    |  4 ++--
>>>>    drivers/md/raid10.c   |  8 +++-----
>>>>    3 files changed, 19 insertions(+), 7 deletions(-)
>>>>
>>>> diff --git a/drivers/md/raid1-10.c b/drivers/md/raid1-10.c
>>>> index 73cc3cb9154d..17e55c1fd5a1 100644
>>>> --- a/drivers/md/raid1-10.c
>>>> +++ b/drivers/md/raid1-10.c
>>>> @@ -151,3 +151,17 @@ static inline bool raid1_add_bio_to_plug(struct mddev *mddev, struct bio *bio,
>>>>
>>>>           return true;
>>>>    }
>>>> +
>>>> +/*
>>>> + * current->bio_list will be set under submit_bio() context, in this case bitmap
>>>> + * io will be added to the list and wait for current io submission to finish,
>>>> + * while current io submission must wait for bitmap io to be done. In order to
>>>> + * avoid such deadlock, submit bitmap io asynchronously.
>>>> + */
>>>> +static inline void raid1_prepare_flush_writes(struct bitmap *bitmap)
>>>> +{
>>>> +       if (current->bio_list)
>>>> +               md_bitmap_unplug_async(bitmap);
>>>> +       else
>>>> +               md_bitmap_unplug(bitmap);
>>>> +}
>>>> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
>>>> index 0778e398584c..006620fed595 100644
>>>> --- a/drivers/md/raid1.c
>>>> +++ b/drivers/md/raid1.c
>>>> @@ -794,7 +794,7 @@ static int read_balance(struct r1conf *conf, struct r1bio *r1_bio, int *max_sect
>>>>    static void flush_bio_list(struct r1conf *conf, struct bio *bio)
>>>>    {
>>>>           /* flush any pending bitmap writes to disk before proceeding w/ I/O */
>>>> -       md_bitmap_unplug(conf->mddev->bitmap);
>>>> +       raid1_prepare_flush_writes(conf->mddev->bitmap);
>>>
>>> If we unplug bitmap asynchronously, can we make sure the bitmap are
>>> flushed before the corresponding data?
> 
> Could you explain this question?

Sorry that I missed this... See the new helper in patch 5,
md_bitmap_unplug_async() will still wait for bitmap io to finish.

md_bitmap_unplug_async
  DECLARE_COMPLETION_ONSTACK(done)
  ...
  wait_for_completion(&done)
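
The point is that md_bitmap_unplug() then runs from a kworker whose
current->bio_list is NULL, so the bitmap pages are written out and
waited for before any of the data writes are issued, instead of being
parked on the submitter's bio_list. Roughly (simplified sketch, see
patch 5 for the real helper):

  /* submit_bio() context: current->bio_list != NULL */
  md_bitmap_unplug_async(bitmap)
    queue_work(md_bitmap_wq, &unplug_work.work)
    wait_for_completion(&done)    /* data writes wait here */

  /* kworker context: current->bio_list == NULL */
  md_bitmap_unplug_fn(work)
    md_bitmap_unplug(bitmap)      /* bitmap io really goes to disk */
    complete(&done)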

Thanks,
Kuai
> 
> Regards
> Xiao
> 
> 
>>>
>>> Regards
>>> Xiao
>>>
>>>>           wake_up(&conf->wait_barrier);
>>>>
>>>>           while (bio) { /* submit pending writes */
>>>> @@ -1166,7 +1166,7 @@ static void raid1_unplug(struct blk_plug_cb *cb, bool from_schedule)
>>>>           struct r1conf *conf = mddev->private;
>>>>           struct bio *bio;
>>>>
>>>> -       if (from_schedule || current->bio_list) {
>>>> +       if (from_schedule) {
>>>>                   spin_lock_irq(&conf->device_lock);
>>>>                   bio_list_merge(&conf->pending_bio_list, &plug->pending);
>>>>                   spin_unlock_irq(&conf->device_lock);
>>>> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
>>>> index 6640507ecb0d..fb22cfe94d32 100644
>>>> --- a/drivers/md/raid10.c
>>>> +++ b/drivers/md/raid10.c
>>>> @@ -902,9 +902,7 @@ static void flush_pending_writes(struct r10conf *conf)
>>>>                   __set_current_state(TASK_RUNNING);
>>>>
>>>>                   blk_start_plug(&plug);
>>>> -               /* flush any pending bitmap writes to disk
>>>> -                * before proceeding w/ I/O */
>>>> -               md_bitmap_unplug(conf->mddev->bitmap);
>>>> +               raid1_prepare_flush_writes(conf->mddev->bitmap);
>>>>                   wake_up(&conf->wait_barrier);
>>>>
>>>>                   while (bio) { /* submit pending writes */
>>>> @@ -1108,7 +1106,7 @@ static void raid10_unplug(struct blk_plug_cb *cb, bool from_schedule)
>>>>           struct r10conf *conf = mddev->private;
>>>>           struct bio *bio;
>>>>
>>>> -       if (from_schedule || current->bio_list) {
>>>> +       if (from_schedule) {
>>>>                   spin_lock_irq(&conf->device_lock);
>>>>                   bio_list_merge(&conf->pending_bio_list, &plug->pending);
>>>>                   spin_unlock_irq(&conf->device_lock);
>>>> @@ -1120,7 +1118,7 @@ static void raid10_unplug(struct blk_plug_cb *cb, bool from_schedule)
>>>>
>>>>           /* we aren't scheduling, so we can do the write-out directly. */
>>>>           bio = bio_list_get(&plug->pending);
>>>> -       md_bitmap_unplug(mddev->bitmap);
>>>> +       raid1_prepare_flush_writes(mddev->bitmap);
>>>>           wake_up(&conf->wait_barrier);
>>>>
>>>>           while (bio) { /* submit pending writes */
>>>> --
>>>> 2.39.2
>>>>
>>>
>>> .
>>>
>>
> 
> .
> 


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH -next v3 4/7] md/raid1-10: submit write io directly if bitmap is not enabled
  2023-05-31  7:26   ` Xiao Ni
@ 2023-05-31  8:25     ` Yu Kuai
  2023-05-31 15:19       ` Xiao Ni
  0 siblings, 1 reply; 23+ messages in thread
From: Yu Kuai @ 2023-05-31  8:25 UTC (permalink / raw)
  To: Xiao Ni, Yu Kuai
  Cc: song, neilb, akpm, linux-raid, linux-kernel, yi.zhang, yangerkun,
	yukuai (C)

Hi,

On 2023/05/31 15:26, Xiao Ni wrote:
> On Mon, May 29, 2023 at 9:14 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>>
>> From: Yu Kuai <yukuai3@huawei.com>
>>
>> Commit 6cce3b23f6f8 ("[PATCH] md: write intent bitmap support for raid10")
>> add bitmap support, and it changed that write io is submitted through
>> daemon thread because bitmap need to be updated before write io. And
>> later, plug is used to fix performance regression because all the write io
>> will go to demon thread, which means io can't be issued concurrently.
>>
>> However, if bitmap is not enabled, the write io should not go to daemon
>> thread in the first place, and plug is not needed as well.
>>
>> Fixes: 6cce3b23f6f8 ("[PATCH] md: write intent bitmap support for raid10")
>> Signed-off-by: Yu Kuai <yukuai3@huawei.com>
>> ---
>>   drivers/md/md-bitmap.c |  4 +---
>>   drivers/md/md-bitmap.h |  7 +++++++
>>   drivers/md/raid1-10.c  | 13 +++++++++++--
>>   3 files changed, 19 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/md/md-bitmap.c b/drivers/md/md-bitmap.c
>> index ad5a3456cd8a..3ee590cf12a7 100644
>> --- a/drivers/md/md-bitmap.c
>> +++ b/drivers/md/md-bitmap.c
>> @@ -1016,7 +1016,6 @@ static int md_bitmap_file_test_bit(struct bitmap *bitmap, sector_t block)
>>          return set;
>>   }
>>
>> -
>>   /* this gets called when the md device is ready to unplug its underlying
>>    * (slave) device queues -- before we let any writes go down, we need to
>>    * sync the dirty pages of the bitmap file to disk */
>> @@ -1026,8 +1025,7 @@ void md_bitmap_unplug(struct bitmap *bitmap)
>>          int dirty, need_write;
>>          int writing = 0;
>>
>> -       if (!bitmap || !bitmap->storage.filemap ||
>> -           test_bit(BITMAP_STALE, &bitmap->flags))
>> +       if (!md_bitmap_enabled(bitmap))
>>                  return;
>>
>>          /* look at each page to see if there are any set bits that need to be
>> diff --git a/drivers/md/md-bitmap.h b/drivers/md/md-bitmap.h
>> index cfd7395de8fd..3a4750952b3a 100644
>> --- a/drivers/md/md-bitmap.h
>> +++ b/drivers/md/md-bitmap.h
>> @@ -273,6 +273,13 @@ int md_bitmap_copy_from_slot(struct mddev *mddev, int slot,
>>                               sector_t *lo, sector_t *hi, bool clear_bits);
>>   void md_bitmap_free(struct bitmap *bitmap);
>>   void md_bitmap_wait_behind_writes(struct mddev *mddev);
>> +
>> +static inline bool md_bitmap_enabled(struct bitmap *bitmap)
>> +{
>> +       return bitmap && bitmap->storage.filemap &&
>> +              !test_bit(BITMAP_STALE, &bitmap->flags);
>> +}
>> +
>>   #endif
>>
>>   #endif
>> diff --git a/drivers/md/raid1-10.c b/drivers/md/raid1-10.c
>> index 506299bd55cb..73cc3cb9154d 100644
>> --- a/drivers/md/raid1-10.c
>> +++ b/drivers/md/raid1-10.c
>> @@ -131,9 +131,18 @@ static inline bool raid1_add_bio_to_plug(struct mddev *mddev, struct bio *bio,
>>                                        blk_plug_cb_fn unplug)
>>   {
>>          struct raid1_plug_cb *plug = NULL;
>> -       struct blk_plug_cb *cb = blk_check_plugged(unplug, mddev,
>> -                                                  sizeof(*plug));
>> +       struct blk_plug_cb *cb;
>> +
>> +       /*
>> +        * If bitmap is not enabled, it's safe to submit the io directly, and
>> +        * this can get optimal performance.
>> +        */
>> +       if (!md_bitmap_enabled(mddev->bitmap)) {
>> +               raid1_submit_write(bio);
>> +               return true;
>> +       }
> 
> Can we check this out of raid1_add_bio_to_plug and call
> raid1_submit_write directly in make_request function?

Of course we can, I'm trying to avoid redundant code here...

Thanks,
Kuai
> 
> Regards
> Xiao
>>
>> +       cb = blk_check_plugged(unplug, mddev, sizeof(*plug));
>>          if (!cb)
>>                  return false;
>>
>> --
>> 2.39.2
>>
> 
> .
> 


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH -next v3 4/7] md/raid1-10: submit write io directly if bitmap is not enabled
  2023-05-31  8:25     ` Yu Kuai
@ 2023-05-31 15:19       ` Xiao Ni
  0 siblings, 0 replies; 23+ messages in thread
From: Xiao Ni @ 2023-05-31 15:19 UTC (permalink / raw)
  To: Yu Kuai
  Cc: song, neilb, akpm, linux-raid, linux-kernel, yi.zhang, yangerkun,
	yukuai (C)

On Wed, May 31, 2023 at 4:25 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>
> Hi,
>
> > On 2023/05/31 15:26, Xiao Ni wrote:
> > On Mon, May 29, 2023 at 9:14 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
> >>
> >> From: Yu Kuai <yukuai3@huawei.com>
> >>
> >> Commit 6cce3b23f6f8 ("[PATCH] md: write intent bitmap support for raid10")
> >> add bitmap support, and it changed that write io is submitted through
> >> daemon thread because bitmap need to be updated before write io. And
> >> later, plug is used to fix performance regression because all the write io
> >> will go to demon thread, which means io can't be issued concurrently.
> >>
> >> However, if bitmap is not enabled, the write io should not go to daemon
> >> thread in the first place, and plug is not needed as well.
> >>
> >> Fixes: 6cce3b23f6f8 ("[PATCH] md: write intent bitmap support for raid10")
> >> Signed-off-by: Yu Kuai <yukuai3@huawei.com>
> >> ---
> >>   drivers/md/md-bitmap.c |  4 +---
> >>   drivers/md/md-bitmap.h |  7 +++++++
> >>   drivers/md/raid1-10.c  | 13 +++++++++++--
> >>   3 files changed, 19 insertions(+), 5 deletions(-)
> >>
> >> diff --git a/drivers/md/md-bitmap.c b/drivers/md/md-bitmap.c
> >> index ad5a3456cd8a..3ee590cf12a7 100644
> >> --- a/drivers/md/md-bitmap.c
> >> +++ b/drivers/md/md-bitmap.c
> >> @@ -1016,7 +1016,6 @@ static int md_bitmap_file_test_bit(struct bitmap *bitmap, sector_t block)
> >>          return set;
> >>   }
> >>
> >> -
> >>   /* this gets called when the md device is ready to unplug its underlying
> >>    * (slave) device queues -- before we let any writes go down, we need to
> >>    * sync the dirty pages of the bitmap file to disk */
> >> @@ -1026,8 +1025,7 @@ void md_bitmap_unplug(struct bitmap *bitmap)
> >>          int dirty, need_write;
> >>          int writing = 0;
> >>
> >> -       if (!bitmap || !bitmap->storage.filemap ||
> >> -           test_bit(BITMAP_STALE, &bitmap->flags))
> >> +       if (!md_bitmap_enabled(bitmap))
> >>                  return;
> >>
> >>          /* look at each page to see if there are any set bits that need to be
> >> diff --git a/drivers/md/md-bitmap.h b/drivers/md/md-bitmap.h
> >> index cfd7395de8fd..3a4750952b3a 100644
> >> --- a/drivers/md/md-bitmap.h
> >> +++ b/drivers/md/md-bitmap.h
> >> @@ -273,6 +273,13 @@ int md_bitmap_copy_from_slot(struct mddev *mddev, int slot,
> >>                               sector_t *lo, sector_t *hi, bool clear_bits);
> >>   void md_bitmap_free(struct bitmap *bitmap);
> >>   void md_bitmap_wait_behind_writes(struct mddev *mddev);
> >> +
> >> +static inline bool md_bitmap_enabled(struct bitmap *bitmap)
> >> +{
> >> +       return bitmap && bitmap->storage.filemap &&
> >> +              !test_bit(BITMAP_STALE, &bitmap->flags);
> >> +}
> >> +
> >>   #endif
> >>
> >>   #endif
> >> diff --git a/drivers/md/raid1-10.c b/drivers/md/raid1-10.c
> >> index 506299bd55cb..73cc3cb9154d 100644
> >> --- a/drivers/md/raid1-10.c
> >> +++ b/drivers/md/raid1-10.c
> >> @@ -131,9 +131,18 @@ static inline bool raid1_add_bio_to_plug(struct mddev *mddev, struct bio *bio,
> >>                                        blk_plug_cb_fn unplug)
> >>   {
> >>          struct raid1_plug_cb *plug = NULL;
> >> -       struct blk_plug_cb *cb = blk_check_plugged(unplug, mddev,
> >> -                                                  sizeof(*plug));
> >> +       struct blk_plug_cb *cb;
> >> +
> >> +       /*
> >> +        * If bitmap is not enabled, it's safe to submit the io directly, and
> >> +        * this can get optimal performance.
> >> +        */
> >> +       if (!md_bitmap_enabled(mddev->bitmap)) {
> >> +               raid1_submit_write(bio);
> >> +               return true;
> >> +       }
> >
> > Can we check this out of raid1_add_bio_to_plug and call
> > raid1_submit_write directly in make_request function?
>
> Of course we can, I'm trying to avoid redundant code here...

I know. But it doesn't fit the name of the function.
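
For illustration, moving the check into the caller would look roughly
like this (untested sketch against patch 4, raid1 side only; raid10
would need the same in raid10_write_one_disk()):

        if (!md_bitmap_enabled(mddev->bitmap)) {
                raid1_submit_write(mbio);
        } else if (!raid1_add_bio_to_plug(mddev, mbio, raid1_unplug)) {
                /* fall back to pending_bio_list, handled by raid1d */
                spin_lock_irqsave(&conf->device_lock, flags);
                bio_list_add(&conf->pending_bio_list, mbio);
                spin_unlock_irqrestore(&conf->device_lock, flags);
                md_wakeup_thread(mddev->thread);
        }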

Regards
Xiao
>
> Thanks,
> Kuai
> >
> > Regards
> > Xiao
> >>
> >> +       cb = blk_check_plugged(unplug, mddev, sizeof(*plug));
> >>          if (!cb)
> >>                  return false;
> >>
> >> --
> >> 2.39.2
> >>
> >
> > .
> >
>


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH -next v3 6/7] md/raid1-10: don't handle pluged bio by daemon thread
  2023-05-31  8:06         ` Yu Kuai
@ 2023-05-31 15:23           ` Xiao Ni
  0 siblings, 0 replies; 23+ messages in thread
From: Xiao Ni @ 2023-05-31 15:23 UTC (permalink / raw)
  To: Yu Kuai
  Cc: song, neilb, akpm, linux-raid, linux-kernel, yi.zhang, yangerkun,
	yukuai (C)

On Wed, May 31, 2023 at 4:06 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>
> Hi,
>
> > On 2023/05/31 16:00, Xiao Ni wrote:
> > On Wed, May 31, 2023 at 3:55 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
> >>
> >> Hi,
> >>
> >> On 2023/05/31 15:50, Xiao Ni wrote:
> >>> On Mon, May 29, 2023 at 9:14 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
> >>>>
> >>>> From: Yu Kuai <yukuai3@huawei.com>
> >>>>
> >>>> current->bio_list will be set under submit_bio() context, in this case
> >>>> bitmap io will be added to the list and wait for current io submission to
> >>>> finish, while current io submission must wait for bitmap io to be done.
> >>>> commit 874807a83139 ("md/raid1{,0}: fix deadlock in bitmap_unplug.") fix
> >>>> the deadlock by handling plugged bio by daemon thread.
> >>>
> >>> Thanks for the historic introduction. I did a test and printed the
> >>> logs in raid10_unplug. The tools I used are dd and mkfs. from_schedule
> >>> is always true during I/O and it's 0 when io finishes. So I have a
> >>> question here, how can I trigger the condition that from_schedule is 0
> >>> and current->list is not NULL? In other words, is there really a
> >>> deadlock here? Before your patch it looks like all bios are merged
> >>> into conf->pending_bio_list and are handled by raid10d. It can't
> >>> submit bio directly in the originating process which mentioned in
> >>> 57c67df48866
> >>>
> >> As I mentioned below, after commit a214b949d8e3, this deadlock doesn't
> >> exist anymore, and without this patch, patch 7 will introduce this
> >> scenario again.
> >>
> >> Thanks,
> >> Kuai
> >>>>
> >>>> On the one hand, the deadlock won't exist after commit a214b949d8e3
> >>>> ("blk-mq: only flush requests from the plug in blk_mq_submit_bio"). On
> >>>> the other hand, current solution makes it impossible to flush plugged bio
> >>>> in raid1/10_make_request(), because this will cause that all the writes
> >>>> will goto daemon thread.
> >>>>
> >>>> In order to limit the number of plugged bio, commit 874807a83139
> >>>> ("md/raid1{,0}: fix deadlock in bitmap_unplug.") is reverted, and the
> >>>> deadlock is fixed by handling bitmap io asynchronously.
> >>>>
> >>>> Signed-off-by: Yu Kuai <yukuai3@huawei.com>
> >>>> ---
> >>>>    drivers/md/raid1-10.c | 14 ++++++++++++++
> >>>>    drivers/md/raid1.c    |  4 ++--
> >>>>    drivers/md/raid10.c   |  8 +++-----
> >>>>    3 files changed, 19 insertions(+), 7 deletions(-)
> >>>>
> >>>> diff --git a/drivers/md/raid1-10.c b/drivers/md/raid1-10.c
> >>>> index 73cc3cb9154d..17e55c1fd5a1 100644
> >>>> --- a/drivers/md/raid1-10.c
> >>>> +++ b/drivers/md/raid1-10.c
> >>>> @@ -151,3 +151,17 @@ static inline bool raid1_add_bio_to_plug(struct mddev *mddev, struct bio *bio,
> >>>>
> >>>>           return true;
> >>>>    }
> >>>> +
> >>>> +/*
> >>>> + * current->bio_list will be set under submit_bio() context, in this case bitmap
> >>>> + * io will be added to the list and wait for current io submission to finish,
> >>>> + * while current io submission must wait for bitmap io to be done. In order to
> >>>> + * avoid such deadlock, submit bitmap io asynchronously.
> >>>> + */
> >>>> +static inline void raid1_prepare_flush_writes(struct bitmap *bitmap)
> >>>> +{
> >>>> +       if (current->bio_list)
> >>>> +               md_bitmap_unplug_async(bitmap);
> >>>> +       else
> >>>> +               md_bitmap_unplug(bitmap);
> >>>> +}
> >>>> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> >>>> index 0778e398584c..006620fed595 100644
> >>>> --- a/drivers/md/raid1.c
> >>>> +++ b/drivers/md/raid1.c
> >>>> @@ -794,7 +794,7 @@ static int read_balance(struct r1conf *conf, struct r1bio *r1_bio, int *max_sect
> >>>>    static void flush_bio_list(struct r1conf *conf, struct bio *bio)
> >>>>    {
> >>>>           /* flush any pending bitmap writes to disk before proceeding w/ I/O */
> >>>> -       md_bitmap_unplug(conf->mddev->bitmap);
> >>>> +       raid1_prepare_flush_writes(conf->mddev->bitmap);
> >>>
> >>> If we unplug bitmap asynchronously, can we make sure the bitmap are
> >>> flushed before the corresponding data?
> >
> > Could you explain this question?
>
> Sorry that I missed this... See the new helper in patch 5,
> md_bitmap_unplug_async() will still wait for bitmap io to finish.
>
> md_bitmap_unplug_async
>   DECLARE_COMPLETION_ONSTACK(done)
>   ...
>   wait_for_completion(&done)

Ah, I see. You use this approach to avoid putting the bitmap io on
current->bio_list. Thanks for the explanation :)

Regards
Xiao
>
> Thanks,
> Kuai
> >
> > Regards
> > Xiao
> >
> >
> >>>
> >>> Regards
> >>> Xiao
> >>>
> >>>>           wake_up(&conf->wait_barrier);
> >>>>
> >>>>           while (bio) { /* submit pending writes */
> >>>> @@ -1166,7 +1166,7 @@ static void raid1_unplug(struct blk_plug_cb *cb, bool from_schedule)
> >>>>           struct r1conf *conf = mddev->private;
> >>>>           struct bio *bio;
> >>>>
> >>>> -       if (from_schedule || current->bio_list) {
> >>>> +       if (from_schedule) {
> >>>>                   spin_lock_irq(&conf->device_lock);
> >>>>                   bio_list_merge(&conf->pending_bio_list, &plug->pending);
> >>>>                   spin_unlock_irq(&conf->device_lock);
> >>>> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
> >>>> index 6640507ecb0d..fb22cfe94d32 100644
> >>>> --- a/drivers/md/raid10.c
> >>>> +++ b/drivers/md/raid10.c
> >>>> @@ -902,9 +902,7 @@ static void flush_pending_writes(struct r10conf *conf)
> >>>>                   __set_current_state(TASK_RUNNING);
> >>>>
> >>>>                   blk_start_plug(&plug);
> >>>> -               /* flush any pending bitmap writes to disk
> >>>> -                * before proceeding w/ I/O */
> >>>> -               md_bitmap_unplug(conf->mddev->bitmap);
> >>>> +               raid1_prepare_flush_writes(conf->mddev->bitmap);
> >>>>                   wake_up(&conf->wait_barrier);
> >>>>
> >>>>                   while (bio) { /* submit pending writes */
> >>>> @@ -1108,7 +1106,7 @@ static void raid10_unplug(struct blk_plug_cb *cb, bool from_schedule)
> >>>>           struct r10conf *conf = mddev->private;
> >>>>           struct bio *bio;
> >>>>
> >>>> -       if (from_schedule || current->bio_list) {
> >>>> +       if (from_schedule) {
> >>>>                   spin_lock_irq(&conf->device_lock);
> >>>>                   bio_list_merge(&conf->pending_bio_list, &plug->pending);
> >>>>                   spin_unlock_irq(&conf->device_lock);
> >>>> @@ -1120,7 +1118,7 @@ static void raid10_unplug(struct blk_plug_cb *cb, bool from_schedule)
> >>>>
> >>>>           /* we aren't scheduling, so we can do the write-out directly. */
> >>>>           bio = bio_list_get(&plug->pending);
> >>>> -       md_bitmap_unplug(mddev->bitmap);
> >>>> +       raid1_prepare_flush_writes(mddev->bitmap);
> >>>>           wake_up(&conf->wait_barrier);
> >>>>
> >>>>           while (bio) { /* submit pending writes */
> >>>> --
> >>>> 2.39.2
> >>>>
> >>>
> >>> .
> >>>
> >>
> >
> > .
> >
>


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH -next v3 7/7] md/raid1-10: limit the number of plugged bio
  2023-05-29 13:11 ` [PATCH -next v3 7/7] md/raid1-10: limit the number of plugged bio Yu Kuai
@ 2023-05-31 15:42   ` Xiao Ni
  2023-06-01  1:41     ` Yu Kuai
  0 siblings, 1 reply; 23+ messages in thread
From: Xiao Ni @ 2023-05-31 15:42 UTC (permalink / raw)
  To: Yu Kuai
  Cc: song, neilb, akpm, linux-raid, linux-kernel, yukuai3, yi.zhang,
	yangerkun

On Mon, May 29, 2023 at 9:14 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>
> From: Yu Kuai <yukuai3@huawei.com>
>
> bio can be added to plug infinitely, and following writeback test can
> trigger huge amount of plugged bio:
>
> Test script:
> modprobe brd rd_nr=4 rd_size=10485760
> mdadm -CR /dev/md0 -l10 -n4 /dev/ram[0123] --assume-clean --bitmap=internal
> echo 0 > /proc/sys/vm/dirty_background_ratio
> fio -filename=/dev/md0 -ioengine=libaio -rw=write -bs=4k -numjobs=1 -iodepth=128 -name=test
>
> Test result:
> Monitor /sys/block/md0/inflight will found that inflight keep increasing
> until fio finish writing, after running for about 2 minutes:
>
> [root@fedora ~]# cat /sys/block/md0/inflight
>        0  4474191
>
> Fix the problem by limiting the number of plugged bio based on the number
> of copies for original bio.
>
> Signed-off-by: Yu Kuai <yukuai3@huawei.com>
> ---
>  drivers/md/raid1-10.c | 9 ++++++++-
>  drivers/md/raid1.c    | 2 +-
>  drivers/md/raid10.c   | 2 +-
>  3 files changed, 10 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/md/raid1-10.c b/drivers/md/raid1-10.c
> index 17e55c1fd5a1..bb1e23b66c45 100644
> --- a/drivers/md/raid1-10.c
> +++ b/drivers/md/raid1-10.c
> @@ -21,6 +21,7 @@
>  #define IO_MADE_GOOD ((struct bio *)2)
>
>  #define BIO_SPECIAL(bio) ((unsigned long)bio <= 2)
> +#define MAX_PLUG_BIO 32
>
>  /* for managing resync I/O pages */
>  struct resync_pages {
> @@ -31,6 +32,7 @@ struct resync_pages {
>  struct raid1_plug_cb {
>         struct blk_plug_cb      cb;
>         struct bio_list         pending;
> +       unsigned int            count;
>  };
>
>  static void rbio_pool_free(void *rbio, void *data)
> @@ -128,7 +130,7 @@ static inline void raid1_submit_write(struct bio *bio)
>  }
>
>  static inline bool raid1_add_bio_to_plug(struct mddev *mddev, struct bio *bio,
> -                                     blk_plug_cb_fn unplug)
> +                                     blk_plug_cb_fn unplug, int copies)
>  {
>         struct raid1_plug_cb *plug = NULL;
>         struct blk_plug_cb *cb;
> @@ -148,6 +150,11 @@ static inline bool raid1_add_bio_to_plug(struct mddev *mddev, struct bio *bio,
>
>         plug = container_of(cb, struct raid1_plug_cb, cb);
>         bio_list_add(&plug->pending, bio);
> +       if (++plug->count / MAX_PLUG_BIO >= copies) {
> +               list_del(&cb->list);
> +               cb->callback(cb, false);
> +       }
> +
It doesn't need this line here.

Have you done some performance tests with this patch set?

Regards
Xiao
>
>         return true;
>  }
> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> index 006620fed595..dc89a1c4b1f1 100644
> --- a/drivers/md/raid1.c
> +++ b/drivers/md/raid1.c
> @@ -1562,7 +1562,7 @@ static void raid1_write_request(struct mddev *mddev, struct bio *bio,
>                                               r1_bio->sector);
>                 /* flush_pending_writes() needs access to the rdev so...*/
>                 mbio->bi_bdev = (void *)rdev;
> -               if (!raid1_add_bio_to_plug(mddev, mbio, raid1_unplug)) {
> +               if (!raid1_add_bio_to_plug(mddev, mbio, raid1_unplug, disks)) {
>                         spin_lock_irqsave(&conf->device_lock, flags);
>                         bio_list_add(&conf->pending_bio_list, mbio);
>                         spin_unlock_irqrestore(&conf->device_lock, flags);
> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
> index fb22cfe94d32..9237dbeb07ba 100644
> --- a/drivers/md/raid10.c
> +++ b/drivers/md/raid10.c
> @@ -1306,7 +1306,7 @@ static void raid10_write_one_disk(struct mddev *mddev, struct r10bio *r10_bio,
>
>         atomic_inc(&r10_bio->remaining);
>
> -       if (!raid1_add_bio_to_plug(mddev, mbio, raid10_unplug)) {
> +       if (!raid1_add_bio_to_plug(mddev, mbio, raid10_unplug, conf->copies)) {
>                 spin_lock_irqsave(&conf->device_lock, flags);
>                 bio_list_add(&conf->pending_bio_list, mbio);
>                 spin_unlock_irqrestore(&conf->device_lock, flags);
> --
> 2.39.2
>


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH -next v3 7/7] md/raid1-10: limit the number of plugged bio
  2023-05-31 15:42   ` Xiao Ni
@ 2023-06-01  1:41     ` Yu Kuai
  0 siblings, 0 replies; 23+ messages in thread
From: Yu Kuai @ 2023-06-01  1:41 UTC (permalink / raw)
  To: Xiao Ni, Yu Kuai
  Cc: song, neilb, akpm, linux-raid, linux-kernel, yi.zhang, yangerkun,
	yukuai (C)

Hi,

On 2023/05/31 23:42, Xiao Ni wrote:
> On Mon, May 29, 2023 at 9:14 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>>
>> From: Yu Kuai <yukuai3@huawei.com>
>>
>> bio can be added to plug infinitely, and following writeback test can
>> trigger huge amount of plugged bio:
>>
>> Test script:
>> modprobe brd rd_nr=4 rd_size=10485760
>> mdadm -CR /dev/md0 -l10 -n4 /dev/ram[0123] --assume-clean --bitmap=internal
>> echo 0 > /proc/sys/vm/dirty_background_ratio
>> fio -filename=/dev/md0 -ioengine=libaio -rw=write -bs=4k -numjobs=1 -iodepth=128 -name=test
>>
>> Test result:
>> Monitor /sys/block/md0/inflight will found that inflight keep increasing
>> until fio finish writing, after running for about 2 minutes:
>>
>> [root@fedora ~]# cat /sys/block/md0/inflight
>>         0  4474191
>>
>> Fix the problem by limiting the number of plugged bio based on the number
>> of copies for original bio.
>>
>> Signed-off-by: Yu Kuai <yukuai3@huawei.com>
>> ---
>>   drivers/md/raid1-10.c | 9 ++++++++-
>>   drivers/md/raid1.c    | 2 +-
>>   drivers/md/raid10.c   | 2 +-
>>   3 files changed, 10 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/md/raid1-10.c b/drivers/md/raid1-10.c
>> index 17e55c1fd5a1..bb1e23b66c45 100644
>> --- a/drivers/md/raid1-10.c
>> +++ b/drivers/md/raid1-10.c
>> @@ -21,6 +21,7 @@
>>   #define IO_MADE_GOOD ((struct bio *)2)
>>
>>   #define BIO_SPECIAL(bio) ((unsigned long)bio <= 2)
>> +#define MAX_PLUG_BIO 32
>>
>>   /* for managing resync I/O pages */
>>   struct resync_pages {
>> @@ -31,6 +32,7 @@ struct resync_pages {
>>   struct raid1_plug_cb {
>>          struct blk_plug_cb      cb;
>>          struct bio_list         pending;
>> +       unsigned int            count;
>>   };
>>
>>   static void rbio_pool_free(void *rbio, void *data)
>> @@ -128,7 +130,7 @@ static inline void raid1_submit_write(struct bio *bio)
>>   }
>>
>>   static inline bool raid1_add_bio_to_plug(struct mddev *mddev, struct bio *bio,
>> -                                     blk_plug_cb_fn unplug)
>> +                                     blk_plug_cb_fn unplug, int copies)
>>   {
>>          struct raid1_plug_cb *plug = NULL;
>>          struct blk_plug_cb *cb;
>> @@ -148,6 +150,11 @@ static inline bool raid1_add_bio_to_plug(struct mddev *mddev, struct bio *bio,
>>
>>          plug = container_of(cb, struct raid1_plug_cb, cb);
>>          bio_list_add(&plug->pending, bio);
>> +       if (++plug->count / MAX_PLUG_BIO >= copies) {
>> +               list_del(&cb->list);
>> +               cb->callback(cb, false);
>> +       }
>> +
> It doesn't need this line here.
> 
> Have you done some performance tests with this patch set?

Just a simple fio script doing 4k writes with 16 threads against 4
ramdisk/loop devices in my VM, and I didn't notice any regression.
However, I haven't run benchmarks yet; I don't have a physical
environment to test performance in for now...

I'll definitely run some performance tests in a physical environment later.

Thanks,
Kuai
> 
> Regards
> Xiao
>>
>>          return true;
>>   }
>> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
>> index 006620fed595..dc89a1c4b1f1 100644
>> --- a/drivers/md/raid1.c
>> +++ b/drivers/md/raid1.c
>> @@ -1562,7 +1562,7 @@ static void raid1_write_request(struct mddev *mddev, struct bio *bio,
>>                                                r1_bio->sector);
>>                  /* flush_pending_writes() needs access to the rdev so...*/
>>                  mbio->bi_bdev = (void *)rdev;
>> -               if (!raid1_add_bio_to_plug(mddev, mbio, raid1_unplug)) {
>> +               if (!raid1_add_bio_to_plug(mddev, mbio, raid1_unplug, disks)) {
>>                          spin_lock_irqsave(&conf->device_lock, flags);
>>                          bio_list_add(&conf->pending_bio_list, mbio);
>>                          spin_unlock_irqrestore(&conf->device_lock, flags);
>> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
>> index fb22cfe94d32..9237dbeb07ba 100644
>> --- a/drivers/md/raid10.c
>> +++ b/drivers/md/raid10.c
>> @@ -1306,7 +1306,7 @@ static void raid10_write_one_disk(struct mddev *mddev, struct r10bio *r10_bio,
>>
>>          atomic_inc(&r10_bio->remaining);
>>
>> -       if (!raid1_add_bio_to_plug(mddev, mbio, raid10_unplug)) {
>> +       if (!raid1_add_bio_to_plug(mddev, mbio, raid10_unplug, conf->copies)) {
>>                  spin_lock_irqsave(&conf->device_lock, flags);
>>                  bio_list_add(&conf->pending_bio_list, mbio);
>>                  spin_unlock_irqrestore(&conf->device_lock, flags);
>> --
>> 2.39.2
>>
> 
> .
> 


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH -next v3 5/7] md/md-bitmap: add a new helper to unplug bitmap asynchrously
  2023-05-29 13:11 ` [PATCH -next v3 5/7] md/md-bitmap: add a new helper to unplug bitmap asynchrously Yu Kuai
@ 2023-06-06 17:34   ` Song Liu
  0 siblings, 0 replies; 23+ messages in thread
From: Song Liu @ 2023-06-06 17:34 UTC (permalink / raw)
  To: Yu Kuai
  Cc: neilb, akpm, xni, linux-raid, linux-kernel, yukuai3, yi.zhang, yangerkun

On Mon, May 29, 2023 at 6:14 AM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>
> From: Yu Kuai <yukuai3@huawei.com>
>
> If bitmap is enabled, bitmap must update before submitting write io, this
> is why unplug callback must move these io to 'conf->pending_io_list' if
> 'current->bio_list' is not empty, which will suffer performance
> degradation.
>
> A new helper md_bitmap_unplug_async() is introduced to submit bitmap io
> in a kworker, so that submit bitmap io in raid10_unplug() doesn't require
> that 'current->bio_list' is empty.
>
> This patch prepare to limit the number of plugged bio.
>
> Signed-off-by: Yu Kuai <yukuai3@huawei.com>
> ---
>  drivers/md/md-bitmap.c | 29 +++++++++++++++++++++++++++++
>  drivers/md/md-bitmap.h |  1 +
>  drivers/md/md.c        |  9 +++++++++
>  drivers/md/md.h        |  1 +
>  4 files changed, 40 insertions(+)
>
> diff --git a/drivers/md/md-bitmap.c b/drivers/md/md-bitmap.c
> index 3ee590cf12a7..25cd72b317f1 100644
> --- a/drivers/md/md-bitmap.c
> +++ b/drivers/md/md-bitmap.c
> @@ -1054,6 +1054,35 @@ void md_bitmap_unplug(struct bitmap *bitmap)
>  }
>  EXPORT_SYMBOL(md_bitmap_unplug);
>
> +struct bitmap_unplug_work {
> +       struct work_struct work;
> +       struct bitmap *bitmap;
> +       struct completion *done;
> +};
> +
> +static void md_bitmap_unplug_fn(struct work_struct *work)
> +{
> +       struct bitmap_unplug_work *unplug_work =
> +               container_of(work, struct bitmap_unplug_work, work);
> +
> +       md_bitmap_unplug(unplug_work->bitmap);
> +       complete(unplug_work->done);
> +}
> +
> +void md_bitmap_unplug_async(struct bitmap *bitmap)
> +{
> +       DECLARE_COMPLETION_ONSTACK(done);
> +       struct bitmap_unplug_work unplug_work;
> +
> +       INIT_WORK(&unplug_work.work, md_bitmap_unplug_fn);
> +       unplug_work.bitmap = bitmap;
> +       unplug_work.done = &done;
> +
> +       queue_work(md_bitmap_wq, &unplug_work.work);
> +       wait_for_completion(&done);
> +}
> +EXPORT_SYMBOL(md_bitmap_unplug_async);

I updated the patch with:

diff --git a/drivers/md/md-bitmap.c b/drivers/md/md-bitmap.c
index 25cd72b317f1..1ff712889a3b 100644
--- a/drivers/md/md-bitmap.c
+++ b/drivers/md/md-bitmap.c
@@ -1074,7 +1074,7 @@ void md_bitmap_unplug_async(struct bitmap *bitmap)
        DECLARE_COMPLETION_ONSTACK(done);
        struct bitmap_unplug_work unplug_work;

-       INIT_WORK(&unplug_work.work, md_bitmap_unplug_fn);
+       INIT_WORK_ONSTACK(&unplug_work.work, md_bitmap_unplug_fn);
        unplug_work.bitmap = bitmap;
        unplug_work.done = &done;

Otherwise, we will get the following warning with mdadm test
05r1-add-internalbitmap.

Thanks,
Song



[  135.072374] ODEBUG: object ffffc90005adf500 is on stack
ffffc90005ad8000, but NOT annotated.
[  135.074994] ------------[ cut here ]------------
[  135.075770] WARNING: CPU: 1 PID: 1115 at lib/debugobjects.c:548
lookup_object_or_alloc+0x294/0x560
[  135.077085] Modules linked in:
[  135.077571] CPU: 1 PID: 1115 Comm: mkfs.ext3 Not tainted 6.4.0-rc2+ #737
[  135.078542] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS rel-1.15.0-0-g2dd4b9b3f840-prebuilt.qemu.org 04/01/2014
[  135.080398] RIP: 0010:lookup_object_or_alloc+0x294/0x560
[  135.081165] Code: 2d b1 24 15 09 65 48 8b 2c 25 80 9e 1f 00 48 8d
7d 20 e8 7f 19 a9 ff 48 8b 55 20 48 89 de 48 c7 c7 00 f2 05 83 e8 5c
97 70 ff <0f> 0b 48 83 c4 58 4c 89 f8 5b 5d 41 5c 41 5d 41 5e 41 5f c3
cc cc
[  135.083830] RSP: 0018:ffffc90005adf398 EFLAGS: 00010082
[  135.084568] RAX: 0000000000000050 RBX: ffffc90005adf500 RCX: 0000000000000000
[  135.085582] RDX: 0000000000000000 RSI: dffffc0000000000 RDI: ffffffff8a7dea40
[  135.086687] RBP: 0000000000000001 R08: fffff52000b5be3d R09: fffff52000b5be3d
[  135.087684] R10: ffffc90005adf1e7 R11: fffff52000b5be3c R12: ffff888103ee0040
[  135.088725] R13: ffff888103ee0060 R14: ffff888df71f7c40 R15: ffff88815eb90060
[  135.089780] FS:  00007fa1f1c85780(0000) GS:ffff888df7000000(0000)
knlGS:0000000000000000
[  135.090920] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  135.091751] CR2: 000055add4915f44 CR3: 000000011227a002 CR4: 0000000000370ee0
[  135.092775] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  135.093812] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  135.094838] Call Trace:
[  135.095178]  <TASK>
[  135.095528]  ? _raw_spin_lock_irqsave+0x6d/0x90
[  135.096197]  __debug_object_init+0x63/0x130
[  135.096853]  md_bitmap_unplug_async+0x9c/0x180
[  135.097512]  ? __pfx_md_bitmap_unplug_async+0x10/0x10
[  135.098267]  ? bio_associate_blkg_from_css+0x446/0xb10
[  135.099043]  ? update_io_ticks+0xad/0x180
[  135.099636]  ? __pfx_update_io_ticks+0x10/0x10
[  135.100355]  flush_bio_list+0x60/0x1c0
[  135.100922]  raid1_unplug+0x64/0x120
[  135.101460]  raid1_make_request+0xfd7/0x1e10
[  135.102125]  ? __pfx_raid1_make_request+0x10/0x10
[  135.102806]  ? lock_release+0x27a/0x690
[  135.103368]  ? __pfx_lock_release+0x10/0x10
[  135.103963]  ? md_handle_request+0x2b3/0x7c0
[  135.104570]  ? __pfx_rcu_read_lock_held+0x10/0x10
[  135.105232]  ? md_handle_request+0x2b3/0x7c0
[  135.105886]  md_handle_request+0x365/0x7c0
[  135.106492]  ? __pfx_md_handle_request+0x10/0x10
[  135.107177]  __submit_bio+0x1dc/0x5d0
[  135.107770]  submit_bio_noacct_nocheck+0x2c2/0x5b0
[  135.108457]  ? lock_is_held_type+0xd8/0x130
[  135.109076]  ? __pfx_submit_bio_noacct_nocheck+0x10/0x10
[  135.109872]  ? submit_bio_noacct+0x283/0x750
[  135.110533]  __block_write_full_page+0x45f/0xbb0
[  135.111200]  ? __pfx_blkdev_get_block+0x10/0x10
[  135.111901]  ? __pfx_end_buffer_async_write+0x10/0x10
[  135.112741]  writepage_cb+0x3e/0xc0
[  135.113285]  write_cache_pages+0x2d8/0x730
[  135.113906]  ? __pfx_writepage_cb+0x10/0x10
[  135.114531]  ? __pfx_write_cache_pages+0x10/0x10
[  135.115196]  ? lock_release+0x27a/0x690
[  135.115819]  do_writepages+0x190/0x310
[  135.116370]  ? __pfx_do_writepages+0x10/0x10
[  135.116995]  ? preempt_count_sub+0x14/0xc0
[  135.117596]  ? _raw_spin_unlock+0x29/0x50
[  135.118171]  ? wbc_attach_and_unlock_inode+0x216/0x410
[  135.118934]  filemap_fdatawrite_wbc+0x93/0xc0
[  135.119574]  __filemap_fdatawrite_range+0x99/0xd0
[  135.120246]  ? __pfx___filemap_fdatawrite_range+0x10/0x10
[  135.121055]  ? ktime_get_coarse_real_ts64+0xec/0x100
[  135.121789]  file_write_and_wait_range+0x67/0xd0
[  135.122578]  blkdev_fsync+0x35/0x60
[  135.123095]  do_fsync+0x38/0x70
[  135.123587]  ? syscall_trace_enter.isra.15+0x15c/0x1f0
[  135.124318]  __x64_sys_fsync+0x1d/0x30
[  135.124924]  do_syscall_64+0x3a/0x90
[  135.125481]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
[  135.126197] RIP: 0033:0x7fa1efef4478
[  135.126765] Code: 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00
00 90 f3 0f 1e fa 48 8d 05 25 01 2d 00 8b 00 85 c0 75 17 b8 4a 00 00
00 0f 05 <48> 3d 00 f0 ff ff 77 40 c3 0f 1f 80 00 00 00 00 53 89 fb 48
83 ec
[  135.129424] RSP: 002b:00007fffb2176318 EFLAGS: 00000246 ORIG_RAX:
000000000000004a
[  135.130498] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fa1efef4478
[  135.131493] RDX: 0000000000000400 RSI: 0000559e026969e0 RDI: 0000000000000003
[  135.132491] RBP: 0000559e026950d0 R08: 0000559e026969e0 R09: 00007fa1eff80620
[  135.133496] R10: 0000000000800800 R11: 0000000000000246 R12: 00007fffb2176390
[  135.134491] R13: 00007fffb2176398 R14: 0000559e02698e70 R15: 0000559e026944f0
[  135.135515]  </TASK>
[  135.135843] irq event stamp: 141152
[  135.136349] hardirqs last  enabled at (141151):
[<ffffffff827778e0>] _raw_spin_unlock_irqrestore+0x30/0x60
[  135.137721] hardirqs last disabled at (141152):
[<ffffffff827775cd>] _raw_spin_lock_irqsave+0x6d/0x90
[  135.139031] softirqs last  enabled at (139890):
[<ffffffff827793fb>] __do_softirq+0x3eb/0x531
[  135.140249] softirqs last disabled at (139885):
[<ffffffff810d8f45>] irq_exit_rcu+0x115/0x160
[  135.141503] ---[ end trace 0000000000000000 ]---




> +
>  static void md_bitmap_set_memory_bits(struct bitmap *bitmap, sector_t offset, int needed);
>  /* * bitmap_init_from_disk -- called at bitmap_create time to initialize
>   * the in-memory bitmap from the on-disk bitmap -- also, sets up the
> diff --git a/drivers/md/md-bitmap.h b/drivers/md/md-bitmap.h
> index 3a4750952b3a..8a3788c9bfef 100644
> --- a/drivers/md/md-bitmap.h
> +++ b/drivers/md/md-bitmap.h
> @@ -264,6 +264,7 @@ void md_bitmap_sync_with_cluster(struct mddev *mddev,
>                                  sector_t new_lo, sector_t new_hi);
>
>  void md_bitmap_unplug(struct bitmap *bitmap);
> +void md_bitmap_unplug_async(struct bitmap *bitmap);
>  void md_bitmap_daemon_work(struct mddev *mddev);
>
>  int md_bitmap_resize(struct bitmap *bitmap, sector_t blocks,
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index e592f37a1071..a5a7af2f4e59 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -83,6 +83,7 @@ static struct module *md_cluster_mod;
>  static DECLARE_WAIT_QUEUE_HEAD(resync_wait);
>  static struct workqueue_struct *md_wq;
>  static struct workqueue_struct *md_misc_wq;
> +struct workqueue_struct *md_bitmap_wq;
>
>  static int remove_and_add_spares(struct mddev *mddev,
>                                  struct md_rdev *this);
> @@ -9636,6 +9637,11 @@ static int __init md_init(void)
>         if (!md_misc_wq)
>                 goto err_misc_wq;
>
> +       md_bitmap_wq = alloc_workqueue("md_bitmap", WQ_MEM_RECLAIM | WQ_UNBOUND,
> +                                      0);
> +       if (!md_bitmap_wq)
> +               goto err_bitmap_wq;
> +
>         ret = __register_blkdev(MD_MAJOR, "md", md_probe);
>         if (ret < 0)
>                 goto err_md;
> @@ -9654,6 +9660,8 @@ static int __init md_init(void)
>  err_mdp:
>         unregister_blkdev(MD_MAJOR, "md");
>  err_md:
> +       destroy_workqueue(md_bitmap_wq);
> +err_bitmap_wq:
>         destroy_workqueue(md_misc_wq);
>  err_misc_wq:
>         destroy_workqueue(md_wq);
> @@ -9950,6 +9958,7 @@ static __exit void md_exit(void)
>         spin_unlock(&all_mddevs_lock);
>
>         destroy_workqueue(md_misc_wq);
> +       destroy_workqueue(md_bitmap_wq);
>         destroy_workqueue(md_wq);
>  }
>
> diff --git a/drivers/md/md.h b/drivers/md/md.h
> index a50122165fa1..bfd2306bc750 100644
> --- a/drivers/md/md.h
> +++ b/drivers/md/md.h
> @@ -852,6 +852,7 @@ struct mdu_array_info_s;
>  struct mdu_disk_info_s;
>
>  extern int mdp_major;
> +extern struct workqueue_struct *md_bitmap_wq;
>  void md_autostart_arrays(int part);
>  int md_set_array_info(struct mddev *mddev, struct mdu_array_info_s *info);
>  int md_add_new_disk(struct mddev *mddev, struct mdu_disk_info_s *info);
> --
> 2.39.2
>

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH -next v3 0/7] limit the number of plugged bio
  2023-05-29 13:10 [PATCH -next v3 0/7] limit the number of plugged bio Yu Kuai
                   ` (6 preceding siblings ...)
  2023-05-29 13:11 ` [PATCH -next v3 7/7] md/raid1-10: limit the number of plugged bio Yu Kuai
@ 2023-06-06 21:54 ` Song Liu
  7 siblings, 0 replies; 23+ messages in thread
From: Song Liu @ 2023-06-06 21:54 UTC (permalink / raw)
  To: Yu Kuai
  Cc: neilb, akpm, xni, linux-raid, linux-kernel, yukuai3, yi.zhang, yangerkun

On Mon, May 29, 2023 at 6:14 AM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>
> From: Yu Kuai <yukuai3@huawei.com>
>
> Changes in v3:
>  - prefix function with 'raid1_' instead of 'md_'
>  - use a globle workequeue instead of per bitmap in patch 5
> Changes in v2:
>  - remove the patch to rename raid1-10.c
>
> Yu Kuai (7):
>   md/raid10: prevent soft lockup while flush writes
>   md/raid1-10: factor out a helper to add bio to plug
>   md/raid1-10: factor out a helper to submit normal write
>   md/raid1-10: submit write io directly if bitmap is not enabled
>   md/md-bitmap: add a new helper to unplug bitmap asynchrously
>   md/raid1-10: don't handle pluged bio by daemon thread
>   md/raid1-10: limit the number of plugged bio

With the fix on 5/7, applied to md-next.

Thanks,
Song


>
>  drivers/md/md-bitmap.c | 33 ++++++++++++++++++++--
>  drivers/md/md-bitmap.h |  8 ++++++
>  drivers/md/md.c        |  9 ++++++
>  drivers/md/md.h        |  1 +
>  drivers/md/raid1-10.c  | 63 ++++++++++++++++++++++++++++++++++++++++++
>  drivers/md/raid1.c     | 29 ++++---------------
>  drivers/md/raid10.c    | 47 +++++++------------------------
>  7 files changed, 126 insertions(+), 64 deletions(-)
>
> --
> 2.39.2
>

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2023-06-06 21:54 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-05-29 13:10 [PATCH -next v3 0/7] limit the number of plugged bio Yu Kuai
2023-05-29 13:11 ` [PATCH -next v3 1/7] md/raid10: prevent soft lockup while flush writes Yu Kuai
2023-05-29 13:11 ` [PATCH -next v3 2/7] md/raid1-10: factor out a helper to add bio to plug Yu Kuai
2023-05-29 13:11 ` [PATCH -next v3 3/7] md/raid1-10: factor out a helper to submit normal write Yu Kuai
2023-05-31  7:20   ` Xiao Ni
2023-05-31  7:56     ` Yu Kuai
2023-05-29 13:11 ` [PATCH -next v3 4/7] md/raid1-10: submit write io directly if bitmap is not enabled Yu Kuai
2023-05-31  7:26   ` Xiao Ni
2023-05-31  8:25     ` Yu Kuai
2023-05-31 15:19       ` Xiao Ni
2023-05-29 13:11 ` [PATCH -next v3 5/7] md/md-bitmap: add a new helper to unplug bitmap asynchrously Yu Kuai
2023-06-06 17:34   ` Song Liu
2023-05-29 13:11 ` [PATCH -next v3 6/7] md/raid1-10: don't handle pluged bio by daemon thread Yu Kuai
2023-05-31  7:50   ` Xiao Ni
2023-05-31  7:55     ` Yu Kuai
2023-05-31  8:00       ` Xiao Ni
2023-05-31  8:06         ` Yu Kuai
2023-05-31 15:23           ` Xiao Ni
2023-05-31  7:57   ` Paul Menzel
2023-05-29 13:11 ` [PATCH -next v3 7/7] md/raid1-10: limit the number of plugged bio Yu Kuai
2023-05-31 15:42   ` Xiao Ni
2023-06-01  1:41     ` Yu Kuai
2023-06-06 21:54 ` [PATCH -next v3 0/7] " Song Liu
