linux-block.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/2] Fixes for zone write plugging
@ 2024-04-20  7:16 Damien Le Moal
  2024-04-20  7:16 ` [PATCH 1/2] block: prevent freeing a zone write plug too early Damien Le Moal
  2024-04-20  7:16 ` [PATCH 2/2] block: use a per disk workqueue for zone write plugging Damien Le Moal
  0 siblings, 2 replies; 3+ messages in thread
From: Damien Le Moal @ 2024-04-20  7:16 UTC (permalink / raw)
  To: Jens Axboe, linux-block

Jens,

A couple of fix patches for zone write plugging. The second fix
addresses something that is in fact likely to happen in production with
large servers with lots of SMR drives.

Damien Le Moal (2):
  block: prevent freeing a zone write plug too early
  block: use a per disk workqueue for zone write plugging

 block/blk-zoned.c      | 34 ++++++++++++++++++++++++++--------
 include/linux/blkdev.h |  1 +
 2 files changed, 27 insertions(+), 8 deletions(-)

-- 
2.44.0


^ permalink raw reply	[flat|nested] 3+ messages in thread

* [PATCH 1/2] block: prevent freeing a zone write plug too early
  2024-04-20  7:16 [PATCH 0/2] Fixes for zone write plugging Damien Le Moal
@ 2024-04-20  7:16 ` Damien Le Moal
  2024-04-20  7:16 ` [PATCH 2/2] block: use a per disk workqueue for zone write plugging Damien Le Moal
  1 sibling, 0 replies; 3+ messages in thread
From: Damien Le Moal @ 2024-04-20  7:16 UTC (permalink / raw)
  To: Jens Axboe, linux-block

The submission of plugged BIOs is done using a work struct executing the
function blk_zone_wplug_bio_work(). This function gets and submits a
plugged zone write BIO and is guaranteed to operate on a valid zone
write plug (with a reference count higher than 0) on entry as plugged
BIOs hold a reference on their zone write plugs. However, once a BIO is
submitted with submit_bio_noacct_nocheck(), the BIO may complete before
blk_zone_wplug_bio_work(), with the BIO completion trigering a release
and freeing of the zone write plug if the BIO is the last write to a
zone (making the zone FULL). This potentially can result in the zone
write plug being freed while the work is still active.

Avoid this by calling flush_work() from disk_free_zone_wplug_rcu().

Fixes: dd291d77cc90 ("block: Introduce zone write plugging")
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
---
 block/blk-zoned.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/block/blk-zoned.c b/block/blk-zoned.c
index 3befebe6b319..685f0b9159fd 100644
--- a/block/blk-zoned.c
+++ b/block/blk-zoned.c
@@ -526,6 +526,8 @@ static void disk_free_zone_wplug_rcu(struct rcu_head *rcu_head)
 	struct blk_zone_wplug *zwplug =
 		container_of(rcu_head, struct blk_zone_wplug, rcu_head);
 
+	flush_work(&zwplug->bio_work);
+
 	mempool_free(zwplug, zwplug->disk->zone_wplugs_pool);
 }
 
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 3+ messages in thread

* [PATCH 2/2] block: use a per disk workqueue for zone write plugging
  2024-04-20  7:16 [PATCH 0/2] Fixes for zone write plugging Damien Le Moal
  2024-04-20  7:16 ` [PATCH 1/2] block: prevent freeing a zone write plug too early Damien Le Moal
@ 2024-04-20  7:16 ` Damien Le Moal
  1 sibling, 0 replies; 3+ messages in thread
From: Damien Le Moal @ 2024-04-20  7:16 UTC (permalink / raw)
  To: Jens Axboe, linux-block

A zone write plug BIO work function blk_zone_wplug_bio_work() calls
submit_bio_noacct_nocheck() to execute the next unplugged BIO. This
function may block. So executing zone plugs BIO works using the block
layer global kblockd workqueue can potentially lead to preformance or
latency issues as the number of concurrent work for a workqueue is
limited to WQ_DFL_ACTIVE (256).
1) For a system with a large number of zoned disks, issuing write
   requests to otherwise unused zones may be delayed wiating for a work
   thread to become available.
2) Requeue operations which use kblockd but are independent of zone
   write plugging may alsoi end up being delayed.

To avoid these potential performance issues, create a workqueue per
zoned device to execute zone plugs BIO work. The workqueue max active
parameter is set to the maximum number of zone write plugs allocated
with the zone write plug mempool. This limit is equal to the maximum
number of open zones of the disk and defaults to 128 for disks that do
not have a limit on the number of open zones.

Fixes: dd291d77cc90 ("block: Introduce zone write plugging")
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
---
 block/blk-zoned.c      | 32 ++++++++++++++++++++++++--------
 include/linux/blkdev.h |  1 +
 2 files changed, 25 insertions(+), 8 deletions(-)

diff --git a/block/blk-zoned.c b/block/blk-zoned.c
index 685f0b9159fd..9bed43472d43 100644
--- a/block/blk-zoned.c
+++ b/block/blk-zoned.c
@@ -1131,7 +1131,7 @@ static void disk_zone_wplug_unplug_bio(struct gendisk *disk,
 	/* Schedule submission of the next plugged BIO if we have one. */
 	if (!bio_list_empty(&zwplug->bio_list)) {
 		spin_unlock_irqrestore(&zwplug->lock, flags);
-		kblockd_schedule_work(&zwplug->bio_work);
+		queue_work(disk->zone_wplugs_wq, &zwplug->bio_work);
 		return;
 	}
 
@@ -1334,7 +1334,7 @@ static void disk_zone_wplug_handle_error(struct gendisk *disk,
 	/* Restart BIO submission if we still have any BIO left. */
 	if (!bio_list_empty(&zwplug->bio_list)) {
 		WARN_ON_ONCE(!(zwplug->flags & BLK_ZONE_WPLUG_PLUGGED));
-		kblockd_schedule_work(&zwplug->bio_work);
+		queue_work(disk->zone_wplugs_wq, &zwplug->bio_work);
 		goto unlock;
 	}
 
@@ -1411,14 +1411,25 @@ static int disk_alloc_zone_resources(struct gendisk *disk,
 
 	disk->zone_wplugs_pool = mempool_create_kmalloc_pool(pool_size,
 						sizeof(struct blk_zone_wplug));
-	if (!disk->zone_wplugs_pool) {
-		kfree(disk->zone_wplugs_hash);
-		disk->zone_wplugs_hash = NULL;
-		disk->zone_wplugs_hash_bits = 0;
-		return -ENOMEM;
-	}
+	if (!disk->zone_wplugs_pool)
+		goto free_hash;
+
+	disk->zone_wplugs_wq =
+		alloc_workqueue("%s_zwplugs", WQ_MEM_RECLAIM | WQ_HIGHPRI,
+				0, disk->disk_name);
+	if (!disk->zone_wplugs_wq)
+		goto destroy_pool;
 
 	return 0;
+
+destroy_pool:
+	mempool_destroy(disk->zone_wplugs_pool);
+	disk->zone_wplugs_pool = NULL;
+free_hash:
+	kfree(disk->zone_wplugs_hash);
+	disk->zone_wplugs_hash = NULL;
+	disk->zone_wplugs_hash_bits = 0;
+	return -ENOMEM;
 }
 
 static void disk_destroy_zone_wplugs_hash_table(struct gendisk *disk)
@@ -1449,6 +1460,11 @@ void disk_free_zone_resources(struct gendisk *disk)
 {
 	cancel_work_sync(&disk->zone_wplugs_work);
 
+	if (disk->zone_wplugs_wq) {
+		destroy_workqueue(disk->zone_wplugs_wq);
+		disk->zone_wplugs_wq = NULL;
+	}
+
 	disk_destroy_zone_wplugs_hash_table(disk);
 
 	/*
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index c62536c78a46..a2847f8e5f82 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -191,6 +191,7 @@ struct gendisk {
 	struct hlist_head       *zone_wplugs_hash;
 	struct list_head        zone_wplugs_err_list;
 	struct work_struct	zone_wplugs_work;
+	struct workqueue_struct *zone_wplugs_wq;
 #endif /* CONFIG_BLK_DEV_ZONED */
 
 #if IS_ENABLED(CONFIG_CDROM)
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2024-04-20  7:16 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-04-20  7:16 [PATCH 0/2] Fixes for zone write plugging Damien Le Moal
2024-04-20  7:16 ` [PATCH 1/2] block: prevent freeing a zone write plug too early Damien Le Moal
2024-04-20  7:16 ` [PATCH 2/2] block: use a per disk workqueue for zone write plugging Damien Le Moal

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).