All of lore.kernel.org
 help / color / mirror / Atom feed
* [dm-devel] [dm-5.19 PATCH 00/21] dm: changes staged for 5.19
@ 2022-04-18  2:27 Mike Snitzer
  2022-04-18  2:27   ` [dm-devel] " Mike Snitzer
                   ` (22 more replies)
  0 siblings, 23 replies; 31+ messages in thread
From: Mike Snitzer @ 2022-04-18  2:27 UTC (permalink / raw)
  To: dm-devel; +Cc: axboe, hch, ming.lei

Hi,

This patchset reflects what I've staged in linux-dm.git's "dm-5.19"
branch (also staged in "for-next" for linux-next):
https://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git/log/?h=dm-5.19

It build's on jens/for-5.19/block branch (which is based on v5.18-rc3)

I can still make changes or add Reviewed-by:s, etc. So please feel
free to review.

Jens, I'd appreciate it if you could pickup the first patch:
"block: change exported IO accounting interface from gendisk to bdev"
(still not in love with that subject but...)

Thanks,
Mike

Mike Snitzer (13):
  dm: conditionally enable BIOSET_PERCPU_CACHE for dm_io bioset
  dm: factor out dm_io_set_error and __dm_io_dec_pending
  dm: simplify dm_io access in dm_split_and_process_bio
  dm: simplify dm_start_io_acct
  dm: mark various branches unlikely
  dm: add local variables to clone_endio and __map_bio
  dm: move hot dm_io members to same cacheline as dm_target_io
  dm: introduce dm_{get,put}_live_table_bio called from dm_submit_bio
  dm: conditionally enable branching for less used features
  dm: simplify basic targets
  dm: use bio_sectors in dm_aceept_partial_bio
  dm: simplify bio-based IO accounting further
  dm: improve abnormal bio processing

Ming Lei (8):
  block: change exported IO accounting interface from gendisk to bdev
  dm: don't pass bio to __dm_start_io_acct and dm_end_io_acct
  dm: pass dm_io instance to dm_io_acct directly
  dm: switch to bdev based IO accounting interfaces
  dm: improve bio splitting and associated IO accounting
  dm: don't grab target io reference in dm_zone_map_bio
  dm: improve dm_io reference counting
  dm: put all polled dm_io instances into a single list

 block/blk-core.c              |  52 ++---
 drivers/block/zram/zram_drv.c |   5 +-
 drivers/md/dm-core.h          |  38 ++--
 drivers/md/dm-delay.c         |   3 +-
 drivers/md/dm-flakey.c        |   4 +-
 drivers/md/dm-linear.c        |  11 +-
 drivers/md/dm-stats.c         |   3 +
 drivers/md/dm-table.c         |  16 +-
 drivers/md/dm-zone.c          |  10 -
 drivers/md/dm.c               | 515 ++++++++++++++++++++++++------------------
 drivers/md/dm.h               |   4 +-
 include/linux/blkdev.h        |   7 +-
 12 files changed, 367 insertions(+), 301 deletions(-)

-- 
2.15.0

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply	[flat|nested] 31+ messages in thread

* [dm-5.19 PATCH 01/21] block: change exported IO accounting interface from gendisk to bdev
  2022-04-18  2:27 [dm-devel] [dm-5.19 PATCH 00/21] dm: changes staged for 5.19 Mike Snitzer
@ 2022-04-18  2:27   ` Mike Snitzer
  2022-04-18  2:27 ` [dm-devel] [dm-5.19 PATCH 02/21] dm: conditionally enable BIOSET_PERCPU_CACHE for dm_io bioset Mike Snitzer
                     ` (21 subsequent siblings)
  22 siblings, 0 replies; 31+ messages in thread
From: Mike Snitzer @ 2022-04-18  2:27 UTC (permalink / raw)
  To: dm-devel; +Cc: axboe, hch, ming.lei, linux-block

From: Ming Lei <ming.lei@redhat.com>

Export IO accounting interfaces in terms of block_device now that
gendisk has become more internal to block core.

Rename __part_{start,end}_io_acct's first argument from part to bdev.
Rename __part_{start,end}_io_acct to bdev_{start,end}_io_acct and
export them.  Remove disk_{start,end}_io_acct and update caller (zram)
to use bdev_{start,end}_io_acct.

DM can now be updated to use bdev_{start,end}_io_acct.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
 block/blk-core.c              | 52 +++++++++++++++++--------------------------
 drivers/block/zram/zram_drv.c |  5 +++--
 include/linux/blkdev.h        |  7 +++---
 3 files changed, 27 insertions(+), 37 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 937bb6b86331..99deb4603be3 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1022,21 +1022,22 @@ void update_io_ticks(struct block_device *part, unsigned long now, bool end)
 	}
 }
 
-static unsigned long __part_start_io_acct(struct block_device *part,
-					  unsigned int sectors, unsigned int op,
-					  unsigned long start_time)
+unsigned long bdev_start_io_acct(struct block_device *bdev,
+				 unsigned int sectors, unsigned int op,
+				 unsigned long start_time)
 {
 	const int sgrp = op_stat_group(op);
 
 	part_stat_lock();
-	update_io_ticks(part, start_time, false);
-	part_stat_inc(part, ios[sgrp]);
-	part_stat_add(part, sectors[sgrp], sectors);
-	part_stat_local_inc(part, in_flight[op_is_write(op)]);
+	update_io_ticks(bdev, start_time, false);
+	part_stat_inc(bdev, ios[sgrp]);
+	part_stat_add(bdev, sectors[sgrp], sectors);
+	part_stat_local_inc(bdev, in_flight[op_is_write(op)]);
 	part_stat_unlock();
 
 	return start_time;
 }
+EXPORT_SYMBOL(bdev_start_io_acct);
 
 /**
  * bio_start_io_acct_time - start I/O accounting for bio based drivers
@@ -1045,8 +1046,8 @@ static unsigned long __part_start_io_acct(struct block_device *part,
  */
 void bio_start_io_acct_time(struct bio *bio, unsigned long start_time)
 {
-	__part_start_io_acct(bio->bi_bdev, bio_sectors(bio),
-			     bio_op(bio), start_time);
+	bdev_start_io_acct(bio->bi_bdev, bio_sectors(bio),
+			   bio_op(bio), start_time);
 }
 EXPORT_SYMBOL_GPL(bio_start_io_acct_time);
 
@@ -1058,46 +1059,33 @@ EXPORT_SYMBOL_GPL(bio_start_io_acct_time);
  */
 unsigned long bio_start_io_acct(struct bio *bio)
 {
-	return __part_start_io_acct(bio->bi_bdev, bio_sectors(bio),
-				    bio_op(bio), jiffies);
+	return bdev_start_io_acct(bio->bi_bdev, bio_sectors(bio),
+				  bio_op(bio), jiffies);
 }
 EXPORT_SYMBOL_GPL(bio_start_io_acct);
 
-unsigned long disk_start_io_acct(struct gendisk *disk, unsigned int sectors,
-				 unsigned int op)
-{
-	return __part_start_io_acct(disk->part0, sectors, op, jiffies);
-}
-EXPORT_SYMBOL(disk_start_io_acct);
-
-static void __part_end_io_acct(struct block_device *part, unsigned int op,
-			       unsigned long start_time)
+void bdev_end_io_acct(struct block_device *bdev, unsigned int op,
+		      unsigned long start_time)
 {
 	const int sgrp = op_stat_group(op);
 	unsigned long now = READ_ONCE(jiffies);
 	unsigned long duration = now - start_time;
 
 	part_stat_lock();
-	update_io_ticks(part, now, true);
-	part_stat_add(part, nsecs[sgrp], jiffies_to_nsecs(duration));
-	part_stat_local_dec(part, in_flight[op_is_write(op)]);
+	update_io_ticks(bdev, now, true);
+	part_stat_add(bdev, nsecs[sgrp], jiffies_to_nsecs(duration));
+	part_stat_local_dec(bdev, in_flight[op_is_write(op)]);
 	part_stat_unlock();
 }
+EXPORT_SYMBOL(bdev_end_io_acct);
 
 void bio_end_io_acct_remapped(struct bio *bio, unsigned long start_time,
-		struct block_device *orig_bdev)
+			      struct block_device *orig_bdev)
 {
-	__part_end_io_acct(orig_bdev, bio_op(bio), start_time);
+	bdev_end_io_acct(orig_bdev, bio_op(bio), start_time);
 }
 EXPORT_SYMBOL_GPL(bio_end_io_acct_remapped);
 
-void disk_end_io_acct(struct gendisk *disk, unsigned int op,
-		      unsigned long start_time)
-{
-	__part_end_io_acct(disk->part0, op, start_time);
-}
-EXPORT_SYMBOL(disk_end_io_acct);
-
 /**
  * blk_lld_busy - Check if underlying low-level drivers of a device are busy
  * @q : the queue of the device being checked
diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index e9474b02012d..adb5209a556a 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -1675,9 +1675,10 @@ static int zram_rw_page(struct block_device *bdev, sector_t sector,
 	bv.bv_len = PAGE_SIZE;
 	bv.bv_offset = 0;
 
-	start_time = disk_start_io_acct(bdev->bd_disk, SECTORS_PER_PAGE, op);
+	start_time = bdev_start_io_acct(bdev->bd_disk->part0,
+			SECTORS_PER_PAGE, op, jiffies);
 	ret = zram_bvec_rw(zram, &bv, index, offset, op, NULL);
-	disk_end_io_acct(bdev->bd_disk, op, start_time);
+	bdev_end_io_acct(bdev->bd_disk->part0, op, start_time);
 out:
 	/*
 	 * If I/O fails, just return error(ie, non-zero) without
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 60d016138997..f680ba6f0ab2 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1491,9 +1491,10 @@ static inline void blk_wake_io_task(struct task_struct *waiter)
 		wake_up_process(waiter);
 }
 
-unsigned long disk_start_io_acct(struct gendisk *disk, unsigned int sectors,
-		unsigned int op);
-void disk_end_io_acct(struct gendisk *disk, unsigned int op,
+unsigned long bdev_start_io_acct(struct block_device *bdev,
+				 unsigned int sectors, unsigned int op,
+				 unsigned long start_time);
+void bdev_end_io_acct(struct block_device *bdev, unsigned int op,
 		unsigned long start_time);
 
 void bio_start_io_acct_time(struct bio *bio, unsigned long start_time);
-- 
2.15.0


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [dm-devel] [dm-5.19 PATCH 01/21] block: change exported IO accounting interface from gendisk to bdev
@ 2022-04-18  2:27   ` Mike Snitzer
  0 siblings, 0 replies; 31+ messages in thread
From: Mike Snitzer @ 2022-04-18  2:27 UTC (permalink / raw)
  To: dm-devel; +Cc: axboe, linux-block, hch, ming.lei

[-- Attachment #1: Type: application/octet-stream, Size: 5719 bytes --]

From: Ming Lei <ming.lei@redhat.com>

Export IO accounting interfaces in terms of block_device now that
gendisk has become more internal to block core.

Rename __part_{start,end}_io_acct's first argument from part to bdev.
Rename __part_{start,end}_io_acct to bdev_{start,end}_io_acct and
export them.  Remove disk_{start,end}_io_acct and update caller (zram)
to use bdev_{start,end}_io_acct.

DM can now be updated to use bdev_{start,end}_io_acct.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
 block/blk-core.c              | 52 +++++++++++++++++--------------------------
 drivers/block/zram/zram_drv.c |  5 +++--
 include/linux/blkdev.h        |  7 +++---
 3 files changed, 27 insertions(+), 37 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 937bb6b86331..99deb4603be3 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1022,21 +1022,22 @@ void update_io_ticks(struct block_device *part, unsigned long now, bool end)
 	}
 }
 
-static unsigned long __part_start_io_acct(struct block_device *part,
-					  unsigned int sectors, unsigned int op,
-					  unsigned long start_time)
+unsigned long bdev_start_io_acct(struct block_device *bdev,
+				 unsigned int sectors, unsigned int op,
+				 unsigned long start_time)
 {
 	const int sgrp = op_stat_group(op);
 
 	part_stat_lock();
-	update_io_ticks(part, start_time, false);
-	part_stat_inc(part, ios[sgrp]);
-	part_stat_add(part, sectors[sgrp], sectors);
-	part_stat_local_inc(part, in_flight[op_is_write(op)]);
+	update_io_ticks(bdev, start_time, false);
+	part_stat_inc(bdev, ios[sgrp]);
+	part_stat_add(bdev, sectors[sgrp], sectors);
+	part_stat_local_inc(bdev, in_flight[op_is_write(op)]);
 	part_stat_unlock();
 
 	return start_time;
 }
+EXPORT_SYMBOL(bdev_start_io_acct);
 
 /**
  * bio_start_io_acct_time - start I/O accounting for bio based drivers
@@ -1045,8 +1046,8 @@ static unsigned long __part_start_io_acct(struct block_device *part,
  */
 void bio_start_io_acct_time(struct bio *bio, unsigned long start_time)
 {
-	__part_start_io_acct(bio->bi_bdev, bio_sectors(bio),
-			     bio_op(bio), start_time);
+	bdev_start_io_acct(bio->bi_bdev, bio_sectors(bio),
+			   bio_op(bio), start_time);
 }
 EXPORT_SYMBOL_GPL(bio_start_io_acct_time);
 
@@ -1058,46 +1059,33 @@ EXPORT_SYMBOL_GPL(bio_start_io_acct_time);
  */
 unsigned long bio_start_io_acct(struct bio *bio)
 {
-	return __part_start_io_acct(bio->bi_bdev, bio_sectors(bio),
-				    bio_op(bio), jiffies);
+	return bdev_start_io_acct(bio->bi_bdev, bio_sectors(bio),
+				  bio_op(bio), jiffies);
 }
 EXPORT_SYMBOL_GPL(bio_start_io_acct);
 
-unsigned long disk_start_io_acct(struct gendisk *disk, unsigned int sectors,
-				 unsigned int op)
-{
-	return __part_start_io_acct(disk->part0, sectors, op, jiffies);
-}
-EXPORT_SYMBOL(disk_start_io_acct);
-
-static void __part_end_io_acct(struct block_device *part, unsigned int op,
-			       unsigned long start_time)
+void bdev_end_io_acct(struct block_device *bdev, unsigned int op,
+		      unsigned long start_time)
 {
 	const int sgrp = op_stat_group(op);
 	unsigned long now = READ_ONCE(jiffies);
 	unsigned long duration = now - start_time;
 
 	part_stat_lock();
-	update_io_ticks(part, now, true);
-	part_stat_add(part, nsecs[sgrp], jiffies_to_nsecs(duration));
-	part_stat_local_dec(part, in_flight[op_is_write(op)]);
+	update_io_ticks(bdev, now, true);
+	part_stat_add(bdev, nsecs[sgrp], jiffies_to_nsecs(duration));
+	part_stat_local_dec(bdev, in_flight[op_is_write(op)]);
 	part_stat_unlock();
 }
+EXPORT_SYMBOL(bdev_end_io_acct);
 
 void bio_end_io_acct_remapped(struct bio *bio, unsigned long start_time,
-		struct block_device *orig_bdev)
+			      struct block_device *orig_bdev)
 {
-	__part_end_io_acct(orig_bdev, bio_op(bio), start_time);
+	bdev_end_io_acct(orig_bdev, bio_op(bio), start_time);
 }
 EXPORT_SYMBOL_GPL(bio_end_io_acct_remapped);
 
-void disk_end_io_acct(struct gendisk *disk, unsigned int op,
-		      unsigned long start_time)
-{
-	__part_end_io_acct(disk->part0, op, start_time);
-}
-EXPORT_SYMBOL(disk_end_io_acct);
-
 /**
  * blk_lld_busy - Check if underlying low-level drivers of a device are busy
  * @q : the queue of the device being checked
diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index e9474b02012d..adb5209a556a 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -1675,9 +1675,10 @@ static int zram_rw_page(struct block_device *bdev, sector_t sector,
 	bv.bv_len = PAGE_SIZE;
 	bv.bv_offset = 0;
 
-	start_time = disk_start_io_acct(bdev->bd_disk, SECTORS_PER_PAGE, op);
+	start_time = bdev_start_io_acct(bdev->bd_disk->part0,
+			SECTORS_PER_PAGE, op, jiffies);
 	ret = zram_bvec_rw(zram, &bv, index, offset, op, NULL);
-	disk_end_io_acct(bdev->bd_disk, op, start_time);
+	bdev_end_io_acct(bdev->bd_disk->part0, op, start_time);
 out:
 	/*
 	 * If I/O fails, just return error(ie, non-zero) without
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 60d016138997..f680ba6f0ab2 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1491,9 +1491,10 @@ static inline void blk_wake_io_task(struct task_struct *waiter)
 		wake_up_process(waiter);
 }
 
-unsigned long disk_start_io_acct(struct gendisk *disk, unsigned int sectors,
-		unsigned int op);
-void disk_end_io_acct(struct gendisk *disk, unsigned int op,
+unsigned long bdev_start_io_acct(struct block_device *bdev,
+				 unsigned int sectors, unsigned int op,
+				 unsigned long start_time);
+void bdev_end_io_acct(struct block_device *bdev, unsigned int op,
 		unsigned long start_time);
 
 void bio_start_io_acct_time(struct bio *bio, unsigned long start_time);
-- 
2.15.0


[-- Attachment #2: Type: text/plain, Size: 98 bytes --]

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [dm-devel] [dm-5.19 PATCH 02/21] dm: conditionally enable BIOSET_PERCPU_CACHE for dm_io bioset
  2022-04-18  2:27 [dm-devel] [dm-5.19 PATCH 00/21] dm: changes staged for 5.19 Mike Snitzer
  2022-04-18  2:27   ` [dm-devel] " Mike Snitzer
@ 2022-04-18  2:27 ` Mike Snitzer
  2022-04-18  2:27 ` [dm-devel] [dm-5.19 PATCH 03/21] dm: factor out dm_io_set_error and __dm_io_dec_pending Mike Snitzer
                   ` (20 subsequent siblings)
  22 siblings, 0 replies; 31+ messages in thread
From: Mike Snitzer @ 2022-04-18  2:27 UTC (permalink / raw)
  To: dm-devel; +Cc: axboe, hch, ming.lei

A bioset's per-cpu alloc cache may have broader utility in the future
but for now constrain it to being tightly coupled to QUEUE_FLAG_POLL.

Also change dm_io_complete() to use bio_clear_polled() so that it
properly clears all associated bio state on requeue.

This commit improves DM's hipri bio polling (REQ_POLLED) perf by
7 - 20% depending on the system.

Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
 drivers/md/dm-table.c | 11 ++++++++---
 drivers/md/dm.c       |  8 ++++----
 drivers/md/dm.h       |  4 ++--
 3 files changed, 14 insertions(+), 9 deletions(-)

diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
index 03541cfc2317..ffd97cb6d1c1 100644
--- a/drivers/md/dm-table.c
+++ b/drivers/md/dm-table.c
@@ -1002,6 +1002,8 @@ bool dm_table_request_based(struct dm_table *t)
 	return __table_type_request_based(dm_table_get_type(t));
 }
 
+static int dm_table_supports_poll(struct dm_table *t);
+
 static int dm_table_alloc_md_mempools(struct dm_table *t, struct mapped_device *md)
 {
 	enum dm_queue_mode type = dm_table_get_type(t);
@@ -1009,21 +1011,24 @@ static int dm_table_alloc_md_mempools(struct dm_table *t, struct mapped_device *
 	unsigned min_pool_size = 0;
 	struct dm_target *ti;
 	unsigned i;
+	bool poll_supported = false;
 
 	if (unlikely(type == DM_TYPE_NONE)) {
 		DMWARN("no table type is set, can't allocate mempools");
 		return -EINVAL;
 	}
 
-	if (__table_type_bio_based(type))
+	if (__table_type_bio_based(type)) {
 		for (i = 0; i < t->num_targets; i++) {
 			ti = t->targets + i;
 			per_io_data_size = max(per_io_data_size, ti->per_io_data_size);
 			min_pool_size = max(min_pool_size, ti->num_flush_bios);
 		}
+		poll_supported = !!dm_table_supports_poll(t);
+	}
 
-	t->mempools = dm_alloc_md_mempools(md, type, t->integrity_supported,
-					   per_io_data_size, min_pool_size);
+	t->mempools = dm_alloc_md_mempools(md, type, per_io_data_size, min_pool_size,
+					   t->integrity_supported, poll_supported);
 	if (!t->mempools)
 		return -ENOMEM;
 
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 82957bd460e8..61a52f585966 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -899,7 +899,7 @@ static void dm_io_complete(struct dm_io *io)
 			 * may only reflect a subset of the pre-split original)
 			 * so clear REQ_POLLED in case of requeue.
 			 */
-			bio->bi_opf &= ~REQ_POLLED;
+			bio_clear_polled(bio);
 			if (io_error == BLK_STS_AGAIN) {
 				/* io_uring doesn't handle BLK_STS_AGAIN (yet) */
 				queue_io(md, bio);
@@ -2902,8 +2902,8 @@ int dm_noflush_suspending(struct dm_target *ti)
 EXPORT_SYMBOL_GPL(dm_noflush_suspending);
 
 struct dm_md_mempools *dm_alloc_md_mempools(struct mapped_device *md, enum dm_queue_mode type,
-					    unsigned integrity, unsigned per_io_data_size,
-					    unsigned min_pool_size)
+					    unsigned per_io_data_size, unsigned min_pool_size,
+					    bool integrity, bool poll)
 {
 	struct dm_md_mempools *pools = kzalloc_node(sizeof(*pools), GFP_KERNEL, md->numa_node_id);
 	unsigned int pool_size = 0;
@@ -2919,7 +2919,7 @@ struct dm_md_mempools *dm_alloc_md_mempools(struct mapped_device *md, enum dm_qu
 		pool_size = max(dm_get_reserved_bio_based_ios(), min_pool_size);
 		front_pad = roundup(per_io_data_size, __alignof__(struct dm_target_io)) + DM_TARGET_IO_BIO_OFFSET;
 		io_front_pad = roundup(per_io_data_size,  __alignof__(struct dm_io)) + DM_IO_BIO_OFFSET;
-		ret = bioset_init(&pools->io_bs, pool_size, io_front_pad, 0);
+		ret = bioset_init(&pools->io_bs, pool_size, io_front_pad, poll ? BIOSET_PERCPU_CACHE : 0);
 		if (ret)
 			goto out;
 		if (integrity && bioset_integrity_create(&pools->io_bs, pool_size))
diff --git a/drivers/md/dm.h b/drivers/md/dm.h
index 9013dc1a7b00..3f89664fea01 100644
--- a/drivers/md/dm.h
+++ b/drivers/md/dm.h
@@ -221,8 +221,8 @@ void dm_kcopyd_exit(void);
  * Mempool operations
  */
 struct dm_md_mempools *dm_alloc_md_mempools(struct mapped_device *md, enum dm_queue_mode type,
-					    unsigned integrity, unsigned per_bio_data_size,
-					    unsigned min_pool_size);
+					    unsigned per_io_data_size, unsigned min_pool_size,
+					    bool integrity, bool poll);
 void dm_free_md_mempools(struct dm_md_mempools *pools);
 
 /*
-- 
2.15.0

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [dm-devel] [dm-5.19 PATCH 03/21] dm: factor out dm_io_set_error and __dm_io_dec_pending
  2022-04-18  2:27 [dm-devel] [dm-5.19 PATCH 00/21] dm: changes staged for 5.19 Mike Snitzer
  2022-04-18  2:27   ` [dm-devel] " Mike Snitzer
  2022-04-18  2:27 ` [dm-devel] [dm-5.19 PATCH 02/21] dm: conditionally enable BIOSET_PERCPU_CACHE for dm_io bioset Mike Snitzer
@ 2022-04-18  2:27 ` Mike Snitzer
  2022-04-18  2:27 ` [dm-devel] [dm-5.19 PATCH 04/21] dm: simplify dm_io access in dm_split_and_process_bio Mike Snitzer
                   ` (19 subsequent siblings)
  22 siblings, 0 replies; 31+ messages in thread
From: Mike Snitzer @ 2022-04-18  2:27 UTC (permalink / raw)
  To: dm-devel; +Cc: axboe, hch, ming.lei

Also eliminate need to use errno_to_blk_status().

Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
 drivers/md/dm.c | 64 ++++++++++++++++++++++++++++++++-------------------------
 1 file changed, 36 insertions(+), 28 deletions(-)

diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 61a52f585966..88ff5c3311aa 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -578,7 +578,7 @@ static struct dm_io *alloc_io(struct mapped_device *md, struct bio *bio)
 
 	io = container_of(tio, struct dm_io, tio);
 	io->magic = DM_IO_MAGIC;
-	io->status = 0;
+	io->status = BLK_STS_OK;
 	atomic_set(&io->io_count, 1);
 	this_cpu_inc(*md->pending_io);
 	io->orig_bio = NULL;
@@ -933,20 +933,31 @@ static inline bool dm_tio_is_normal(struct dm_target_io *tio)
  * Decrements the number of outstanding ios that a bio has been
  * cloned into, completing the original io if necc.
  */
-void dm_io_dec_pending(struct dm_io *io, blk_status_t error)
+static inline void __dm_io_dec_pending(struct dm_io *io)
+{
+	if (atomic_dec_and_test(&io->io_count))
+		dm_io_complete(io);
+}
+
+static void dm_io_set_error(struct dm_io *io, blk_status_t error)
 {
+	unsigned long flags;
+
 	/* Push-back supersedes any I/O errors */
-	if (unlikely(error)) {
-		unsigned long flags;
-		spin_lock_irqsave(&io->lock, flags);
-		if (!(io->status == BLK_STS_DM_REQUEUE &&
-		      __noflush_suspending(io->md)))
-			io->status = error;
-		spin_unlock_irqrestore(&io->lock, flags);
+	spin_lock_irqsave(&io->lock, flags);
+	if (!(io->status == BLK_STS_DM_REQUEUE &&
+	      __noflush_suspending(io->md))) {
+		io->status = error;
 	}
+	spin_unlock_irqrestore(&io->lock, flags);
+}
 
-	if (atomic_dec_and_test(&io->io_count))
-		dm_io_complete(io);
+void dm_io_dec_pending(struct dm_io *io, blk_status_t error)
+{
+	if (unlikely(error))
+		dm_io_set_error(io, error);
+
+	__dm_io_dec_pending(io);
 }
 
 void disable_discard(struct mapped_device *md)
@@ -1429,7 +1440,7 @@ static bool is_abnormal_io(struct bio *bio)
 }
 
 static bool __process_abnormal_io(struct clone_info *ci, struct dm_target *ti,
-				  int *result)
+				  blk_status_t *status)
 {
 	unsigned num_bios = 0;
 
@@ -1453,11 +1464,11 @@ static bool __process_abnormal_io(struct clone_info *ci, struct dm_target *ti,
 	 * reconfiguration might also have changed that since the
 	 * check was performed.
 	 */
-	if (!num_bios)
-		*result = -EOPNOTSUPP;
+	if (unlikely(!num_bios))
+		*status = BLK_STS_NOTSUPP;
 	else {
 		__send_changing_extent_only(ci, ti, num_bios);
-		*result = 0;
+		*status = BLK_STS_OK;
 	}
 	return true;
 }
@@ -1506,19 +1517,16 @@ static void dm_queue_poll_io(struct bio *bio, struct dm_io *io)
 /*
  * Select the correct strategy for processing a non-flush bio.
  */
-static int __split_and_process_bio(struct clone_info *ci)
+static blk_status_t __split_and_process_bio(struct clone_info *ci)
 {
 	struct bio *clone;
 	struct dm_target *ti;
 	unsigned len;
-	int r;
+	blk_status_t error = BLK_STS_IOERR;
 
 	ti = dm_table_find_target(ci->map, ci->sector);
-	if (!ti)
-		return -EIO;
-
-	if (__process_abnormal_io(ci, ti, &r))
-		return r;
+	if (unlikely(!ti || __process_abnormal_io(ci, ti, &error)))
+		return error;
 
 	/*
 	 * Only support bio polling for normal IO, and the target io is
@@ -1533,7 +1541,7 @@ static int __split_and_process_bio(struct clone_info *ci)
 	ci->sector += len;
 	ci->sector_count -= len;
 
-	return 0;
+	return BLK_STS_OK;
 }
 
 static void init_clone_info(struct clone_info *ci, struct mapped_device *md,
@@ -1559,7 +1567,7 @@ static void dm_split_and_process_bio(struct mapped_device *md,
 {
 	struct clone_info ci;
 	struct bio *orig_bio = NULL;
-	int error = 0;
+	blk_status_t error = BLK_STS_OK;
 
 	init_clone_info(&ci, md, map, bio);
 
@@ -1601,7 +1609,7 @@ static void dm_split_and_process_bio(struct mapped_device *md,
 	 * bio->bi_private, so that dm_poll_bio can poll them all.
 	 */
 	if (error || !ci.submit_as_polled)
-		dm_io_dec_pending(ci.io, errno_to_blk_status(error));
+		dm_io_dec_pending(ci.io, error);
 	else
 		dm_queue_poll_io(bio, ci.io);
 }
@@ -1682,10 +1690,10 @@ static int dm_poll_bio(struct bio *bio, struct io_comp_batch *iob,
 		if (dm_poll_dm_io(io, iob, flags)) {
 			hlist_del_init(&io->node);
 			/*
-			 * clone_endio() has already occurred, so passing
-			 * error as 0 here doesn't override io->status
+			 * clone_endio() has already occurred, so no
+			 * error handling is needed here.
 			 */
-			dm_io_dec_pending(io, 0);
+			__dm_io_dec_pending(io);
 		}
 	}
 
-- 
2.15.0

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [dm-devel] [dm-5.19 PATCH 04/21] dm: simplify dm_io access in dm_split_and_process_bio
  2022-04-18  2:27 [dm-devel] [dm-5.19 PATCH 00/21] dm: changes staged for 5.19 Mike Snitzer
                   ` (2 preceding siblings ...)
  2022-04-18  2:27 ` [dm-devel] [dm-5.19 PATCH 03/21] dm: factor out dm_io_set_error and __dm_io_dec_pending Mike Snitzer
@ 2022-04-18  2:27 ` Mike Snitzer
  2022-04-18  2:27 ` [dm-devel] [dm-5.19 PATCH 05/21] dm: simplify dm_start_io_acct Mike Snitzer
                   ` (18 subsequent siblings)
  22 siblings, 0 replies; 31+ messages in thread
From: Mike Snitzer @ 2022-04-18  2:27 UTC (permalink / raw)
  To: dm-devel; +Cc: axboe, hch, ming.lei

Use local variable instead of redudant access using ci.io

Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
 drivers/md/dm.c | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 88ff5c3311aa..48f93c55c992 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1566,10 +1566,12 @@ static void dm_split_and_process_bio(struct mapped_device *md,
 				     struct dm_table *map, struct bio *bio)
 {
 	struct clone_info ci;
+	struct dm_io *io;
 	struct bio *orig_bio = NULL;
 	blk_status_t error = BLK_STS_OK;
 
 	init_clone_info(&ci, md, map, bio);
+	io = ci.io;
 
 	if (bio->bi_opf & REQ_PREFLUSH) {
 		__send_empty_flush(&ci);
@@ -1578,14 +1580,14 @@ static void dm_split_and_process_bio(struct mapped_device *md,
 	}
 
 	error = __split_and_process_bio(&ci);
-	ci.io->map_task = NULL;
+	io->map_task = NULL;
 	if (error || !ci.sector_count)
 		goto out;
 
 	/*
 	 * Remainder must be passed to submit_bio_noacct() so it gets handled
 	 * *after* bios already submitted have been completely processed.
-	 * We take a clone of the original to store in ci.io->orig_bio to be
+	 * We take a clone of the original to store in io->orig_bio to be
 	 * used by dm_end_io_acct() and for dm_io_complete() to use for
 	 * completion handling.
 	 */
@@ -1597,9 +1599,9 @@ static void dm_split_and_process_bio(struct mapped_device *md,
 out:
 	if (!orig_bio)
 		orig_bio = bio;
-	smp_store_release(&ci.io->orig_bio, orig_bio);
-	if (dm_io_flagged(ci.io, DM_IO_START_ACCT))
-		dm_start_io_acct(ci.io, NULL);
+	smp_store_release(&io->orig_bio, orig_bio);
+	if (dm_io_flagged(io, DM_IO_START_ACCT))
+		dm_start_io_acct(io, NULL);
 
 	/*
 	 * Drop the extra reference count for non-POLLED bio, and hold one
@@ -1611,7 +1613,7 @@ static void dm_split_and_process_bio(struct mapped_device *md,
 	if (error || !ci.submit_as_polled)
 		dm_io_dec_pending(ci.io, error);
 	else
-		dm_queue_poll_io(bio, ci.io);
+		dm_queue_poll_io(bio, io);
 }
 
 static void dm_submit_bio(struct bio *bio)
-- 
2.15.0

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [dm-devel] [dm-5.19 PATCH 05/21] dm: simplify dm_start_io_acct
  2022-04-18  2:27 [dm-devel] [dm-5.19 PATCH 00/21] dm: changes staged for 5.19 Mike Snitzer
                   ` (3 preceding siblings ...)
  2022-04-18  2:27 ` [dm-devel] [dm-5.19 PATCH 04/21] dm: simplify dm_io access in dm_split_and_process_bio Mike Snitzer
@ 2022-04-18  2:27 ` Mike Snitzer
  2022-04-18  2:27 ` [dm-devel] [dm-5.19 PATCH 06/21] dm: mark various branches unlikely Mike Snitzer
                   ` (17 subsequent siblings)
  22 siblings, 0 replies; 31+ messages in thread
From: Mike Snitzer @ 2022-04-18  2:27 UTC (permalink / raw)
  To: dm-devel; +Cc: axboe, hch, ming.lei

Pull common DM_IO_ACCOUNTED check out to beginning of dm_start_io_acct.
Also, use dm_tio_is_normal (and move it to dm-core.h).

Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
 drivers/md/dm-core.h |  6 ++++++
 drivers/md/dm.c      | 18 +++++-------------
 2 files changed, 11 insertions(+), 13 deletions(-)

diff --git a/drivers/md/dm-core.h b/drivers/md/dm-core.h
index 4277853c7535..db069fa9cee5 100644
--- a/drivers/md/dm-core.h
+++ b/drivers/md/dm-core.h
@@ -237,6 +237,12 @@ static inline void dm_tio_set_flag(struct dm_target_io *tio, unsigned int bit)
 	tio->flags |= (1U << bit);
 }
 
+static inline bool dm_tio_is_normal(struct dm_target_io *tio)
+{
+	return (dm_tio_flagged(tio, DM_TIO_INSIDE_DM_IO) &&
+		!dm_tio_flagged(tio, DM_TIO_IS_DUPLICATE_BIO));
+}
+
 /*
  * One of these is allocated per original bio.
  * It contains the first clone used for that original.
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 48f93c55c992..0974498c68e7 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -538,17 +538,15 @@ static void dm_start_io_acct(struct dm_io *io, struct bio *clone)
 
 	/*
 	 * Ensure IO accounting is only ever started once.
-	 * Expect no possibility for race unless DM_TIO_IS_DUPLICATE_BIO.
 	 */
-	if (!clone ||
-	    likely(!dm_tio_flagged(clone_to_tio(clone), DM_TIO_IS_DUPLICATE_BIO))) {
-		if (WARN_ON_ONCE(dm_io_flagged(io, DM_IO_ACCOUNTED)))
-			return;
+	if (dm_io_flagged(io, DM_IO_ACCOUNTED))
+		return;
+
+	/* Expect no possibility for race unless DM_TIO_IS_DUPLICATE_BIO. */
+	if (!clone || likely(dm_tio_is_normal(clone_to_tio(clone)))) {
 		dm_io_set_flag(io, DM_IO_ACCOUNTED);
 	} else {
 		unsigned long flags;
-		if (dm_io_flagged(io, DM_IO_ACCOUNTED))
-			return;
 		/* Can afford locking given DM_TIO_IS_DUPLICATE_BIO */
 		spin_lock_irqsave(&io->lock, flags);
 		dm_io_set_flag(io, DM_IO_ACCOUNTED);
@@ -923,12 +921,6 @@ static void dm_io_complete(struct dm_io *io)
 	}
 }
 
-static inline bool dm_tio_is_normal(struct dm_target_io *tio)
-{
-	return (dm_tio_flagged(tio, DM_TIO_INSIDE_DM_IO) &&
-		!dm_tio_flagged(tio, DM_TIO_IS_DUPLICATE_BIO));
-}
-
 /*
  * Decrements the number of outstanding ios that a bio has been
  * cloned into, completing the original io if necc.
-- 
2.15.0

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [dm-devel] [dm-5.19 PATCH 06/21] dm: mark various branches unlikely
  2022-04-18  2:27 [dm-devel] [dm-5.19 PATCH 00/21] dm: changes staged for 5.19 Mike Snitzer
                   ` (4 preceding siblings ...)
  2022-04-18  2:27 ` [dm-devel] [dm-5.19 PATCH 05/21] dm: simplify dm_start_io_acct Mike Snitzer
@ 2022-04-18  2:27 ` Mike Snitzer
  2022-04-18  2:27 ` [dm-devel] [dm-5.19 PATCH 07/21] dm: add local variables to clone_endio and __map_bio Mike Snitzer
                   ` (16 subsequent siblings)
  22 siblings, 0 replies; 31+ messages in thread
From: Mike Snitzer @ 2022-04-18  2:27 UTC (permalink / raw)
  To: dm-devel; +Cc: axboe, hch, ming.lei

Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
 drivers/md/dm.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 0974498c68e7..630c1880023a 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -992,7 +992,7 @@ static void clone_endio(struct bio *bio)
 			disable_write_zeroes(md);
 	}
 
-	if (blk_queue_is_zoned(q))
+	if (unlikely(blk_queue_is_zoned(q)))
 		dm_zone_endio(io, bio);
 
 	if (endio) {
@@ -1289,7 +1289,7 @@ static void __map_bio(struct bio *clone)
 	 * on zoned target. In this case, dm_zone_map_bio() calls the target
 	 * map operation.
 	 */
-	if (dm_emulate_zone_append(io->md))
+	if (unlikely(dm_emulate_zone_append(io->md)))
 		r = dm_zone_map_bio(tio);
 	else
 		r = ti->type->map(ti, clone);
@@ -1632,7 +1632,7 @@ static void dm_submit_bio(struct bio *bio)
 	 * Use blk_queue_split() for abnormal IO (e.g. discard, writesame, etc)
 	 * otherwise associated queue_limits won't be imposed.
 	 */
-	if (is_abnormal_io(bio))
+	if (unlikely(is_abnormal_io(bio)))
 		blk_queue_split(&bio);
 
 	dm_split_and_process_bio(md, map, bio);
-- 
2.15.0

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [dm-devel] [dm-5.19 PATCH 07/21] dm: add local variables to clone_endio and __map_bio
  2022-04-18  2:27 [dm-devel] [dm-5.19 PATCH 00/21] dm: changes staged for 5.19 Mike Snitzer
                   ` (5 preceding siblings ...)
  2022-04-18  2:27 ` [dm-devel] [dm-5.19 PATCH 06/21] dm: mark various branches unlikely Mike Snitzer
@ 2022-04-18  2:27 ` Mike Snitzer
  2022-04-18  2:27 ` [dm-devel] [dm-5.19 PATCH 08/21] dm: move hot dm_io members to same cacheline as dm_target_io Mike Snitzer
                   ` (15 subsequent siblings)
  22 siblings, 0 replies; 31+ messages in thread
From: Mike Snitzer @ 2022-04-18  2:27 UTC (permalink / raw)
  To: dm-devel; +Cc: axboe, hch, ming.lei

Avoid redundant dereferences in both functions.

Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
 drivers/md/dm.c | 25 ++++++++++++-------------
 1 file changed, 12 insertions(+), 13 deletions(-)

diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 630c1880023a..4793225722dd 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -977,11 +977,12 @@ static bool swap_bios_limit(struct dm_target *ti, struct bio *bio)
 static void clone_endio(struct bio *bio)
 {
 	blk_status_t error = bio->bi_status;
+	struct request_queue *q = bio->bi_bdev->bd_disk->queue;
 	struct dm_target_io *tio = clone_to_tio(bio);
+	struct dm_target *ti = tio->ti;
+	dm_endio_fn endio = ti->type->end_io;
 	struct dm_io *io = tio->io;
-	struct mapped_device *md = tio->io->md;
-	dm_endio_fn endio = tio->ti->type->end_io;
-	struct request_queue *q = bio->bi_bdev->bd_disk->queue;
+	struct mapped_device *md = io->md;
 
 	if (unlikely(error == BLK_STS_TARGET)) {
 		if (bio_op(bio) == REQ_OP_DISCARD &&
@@ -996,7 +997,7 @@ static void clone_endio(struct bio *bio)
 		dm_zone_endio(io, bio);
 
 	if (endio) {
-		int r = endio(tio->ti, bio, &error);
+		int r = endio(ti, bio, &error);
 		switch (r) {
 		case DM_ENDIO_REQUEUE:
 			/*
@@ -1020,10 +1021,8 @@ static void clone_endio(struct bio *bio)
 		}
 	}
 
-	if (unlikely(swap_bios_limit(tio->ti, bio))) {
-		struct mapped_device *md = io->md;
+	if (unlikely(swap_bios_limit(ti, bio)))
 		up(&md->swap_bios_semaphore);
-	}
 
 	free_tio(bio);
 	dm_io_dec_pending(io, error);
@@ -1264,9 +1263,10 @@ static noinline void __set_swap_bios_limit(struct mapped_device *md, int latch)
 static void __map_bio(struct bio *clone)
 {
 	struct dm_target_io *tio = clone_to_tio(clone);
-	int r;
-	struct dm_io *io = tio->io;
 	struct dm_target *ti = tio->ti;
+	struct dm_io *io = tio->io;
+	struct mapped_device *md = io->md;
+	int r;
 
 	clone->bi_end_io = clone_endio;
 
@@ -1277,7 +1277,6 @@ static void __map_bio(struct bio *clone)
 	tio->old_sector = clone->bi_iter.bi_sector;
 
 	if (unlikely(swap_bios_limit(ti, clone))) {
-		struct mapped_device *md = io->md;
 		int latch = get_swap_bios();
 		if (unlikely(latch != md->swap_bios))
 			__set_swap_bios_limit(md, latch);
@@ -1289,7 +1288,7 @@ static void __map_bio(struct bio *clone)
 	 * on zoned target. In this case, dm_zone_map_bio() calls the target
 	 * map operation.
 	 */
-	if (unlikely(dm_emulate_zone_append(io->md)))
+	if (unlikely(dm_emulate_zone_append(md)))
 		r = dm_zone_map_bio(tio);
 	else
 		r = ti->type->map(ti, clone);
@@ -1305,14 +1304,14 @@ static void __map_bio(struct bio *clone)
 		 * the bio has been remapped so dispatch it, but defer
 		 * dm_start_io_acct() until after possible bio_split().
 		 */
-		__dm_submit_bio_remap(clone, disk_devt(io->md->disk),
+		__dm_submit_bio_remap(clone, disk_devt(md->disk),
 				      tio->old_sector);
 		dm_io_set_flag(io, DM_IO_START_ACCT);
 		break;
 	case DM_MAPIO_KILL:
 	case DM_MAPIO_REQUEUE:
 		if (unlikely(swap_bios_limit(ti, clone)))
-			up(&io->md->swap_bios_semaphore);
+			up(&md->swap_bios_semaphore);
 		free_tio(clone);
 		if (r == DM_MAPIO_KILL)
 			dm_io_dec_pending(io, BLK_STS_IOERR);
-- 
2.15.0

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [dm-devel] [dm-5.19 PATCH 08/21] dm: move hot dm_io members to same cacheline as dm_target_io
  2022-04-18  2:27 [dm-devel] [dm-5.19 PATCH 00/21] dm: changes staged for 5.19 Mike Snitzer
                   ` (6 preceding siblings ...)
  2022-04-18  2:27 ` [dm-devel] [dm-5.19 PATCH 07/21] dm: add local variables to clone_endio and __map_bio Mike Snitzer
@ 2022-04-18  2:27 ` Mike Snitzer
  2022-04-18  2:27 ` [dm-devel] [dm-5.19 PATCH 09/21] dm: introduce dm_{get, put}_live_table_bio called from dm_submit_bio Mike Snitzer
                   ` (14 subsequent siblings)
  22 siblings, 0 replies; 31+ messages in thread
From: Mike Snitzer @ 2022-04-18  2:27 UTC (permalink / raw)
  To: dm-devel; +Cc: axboe, hch, ming.lei

Just saves some cacheline bouncing for members accessed during cloned
bio submission and completion.

Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
 drivers/md/dm-core.h | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/drivers/md/dm-core.h b/drivers/md/dm-core.h
index db069fa9cee5..41d6511dc7cf 100644
--- a/drivers/md/dm-core.h
+++ b/drivers/md/dm-core.h
@@ -250,17 +250,19 @@ static inline bool dm_tio_is_normal(struct dm_target_io *tio)
 #define DM_IO_MAGIC 19577
 struct dm_io {
 	unsigned short magic;
-	blk_short_t flags;
-	atomic_t io_count;
-	struct mapped_device *md;
-	struct bio *orig_bio;
-	blk_status_t status;
+
 	spinlock_t lock;
 	unsigned long start_time;
 	void *data;
 	struct hlist_node node;
 	struct task_struct *map_task;
 	struct dm_stats_aux stats_aux;
+
+	blk_short_t flags;
+	blk_status_t status;
+	atomic_t io_count;
+	struct mapped_device *md;
+	struct bio *orig_bio;
 	/* last member of dm_target_io is 'struct bio' */
 	struct dm_target_io tio;
 };
-- 
2.15.0

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [dm-devel] [dm-5.19 PATCH 09/21] dm: introduce dm_{get, put}_live_table_bio called from dm_submit_bio
  2022-04-18  2:27 [dm-devel] [dm-5.19 PATCH 00/21] dm: changes staged for 5.19 Mike Snitzer
                   ` (7 preceding siblings ...)
  2022-04-18  2:27 ` [dm-devel] [dm-5.19 PATCH 08/21] dm: move hot dm_io members to same cacheline as dm_target_io Mike Snitzer
@ 2022-04-18  2:27 ` Mike Snitzer
  2022-04-18  2:27 ` [dm-devel] [dm-5.19 PATCH 10/21] dm: conditionally enable branching for less used features Mike Snitzer
                   ` (13 subsequent siblings)
  22 siblings, 0 replies; 31+ messages in thread
From: Mike Snitzer @ 2022-04-18  2:27 UTC (permalink / raw)
  To: dm-devel; +Cc: axboe, hch, ming.lei

If a bio is marked REQ_NOWAIT optimize dm_submit_bio()'s dm_table RCU
usage to dm_{get,put}_live_table_fast.

DM core offers protection against blocking (via suspend) if REQ_NOWAIT.

Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
 drivers/md/dm.c | 28 ++++++++++++++++++++++++----
 1 file changed, 24 insertions(+), 4 deletions(-)

diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 4793225722dd..073b41ce7a30 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -661,14 +661,16 @@ static void queue_io(struct mapped_device *md, struct bio *bio)
  * function to access the md->map field, and make sure they call
  * dm_put_live_table() when finished.
  */
-struct dm_table *dm_get_live_table(struct mapped_device *md, int *srcu_idx) __acquires(md->io_barrier)
+struct dm_table *dm_get_live_table(struct mapped_device *md,
+				   int *srcu_idx) __acquires(md->io_barrier)
 {
 	*srcu_idx = srcu_read_lock(&md->io_barrier);
 
 	return srcu_dereference(md->map, &md->io_barrier);
 }
 
-void dm_put_live_table(struct mapped_device *md, int srcu_idx) __releases(md->io_barrier)
+void dm_put_live_table(struct mapped_device *md,
+		       int srcu_idx) __releases(md->io_barrier)
 {
 	srcu_read_unlock(&md->io_barrier, srcu_idx);
 }
@@ -694,6 +696,24 @@ static void dm_put_live_table_fast(struct mapped_device *md) __releases(RCU)
 	rcu_read_unlock();
 }
 
+static inline struct dm_table *dm_get_live_table_bio(struct mapped_device *md,
+						     int *srcu_idx, struct bio *bio)
+{
+	if (bio->bi_opf & REQ_NOWAIT)
+		return dm_get_live_table_fast(md);
+	else
+		return dm_get_live_table(md, srcu_idx);
+}
+
+static inline void dm_put_live_table_bio(struct mapped_device *md, int srcu_idx,
+					 struct bio *bio)
+{
+	if (bio->bi_opf & REQ_NOWAIT)
+		dm_put_live_table_fast(md);
+	else
+		dm_put_live_table(md, srcu_idx);
+}
+
 static char *_dm_claim_ptr = "I belong to device-mapper";
 
 /*
@@ -1613,7 +1633,7 @@ static void dm_submit_bio(struct bio *bio)
 	int srcu_idx;
 	struct dm_table *map;
 
-	map = dm_get_live_table(md, &srcu_idx);
+	map = dm_get_live_table_bio(md, &srcu_idx, bio);
 
 	/* If suspended, or map not yet available, queue this IO for later */
 	if (unlikely(test_bit(DMF_BLOCK_IO_FOR_SUSPEND, &md->flags)) ||
@@ -1636,7 +1656,7 @@ static void dm_submit_bio(struct bio *bio)
 
 	dm_split_and_process_bio(md, map, bio);
 out:
-	dm_put_live_table(md, srcu_idx);
+	dm_put_live_table_bio(md, srcu_idx, bio);
 }
 
 static bool dm_poll_dm_io(struct dm_io *io, struct io_comp_batch *iob,
-- 
2.15.0

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [dm-devel] [dm-5.19 PATCH 10/21] dm: conditionally enable branching for less used features
  2022-04-18  2:27 [dm-devel] [dm-5.19 PATCH 00/21] dm: changes staged for 5.19 Mike Snitzer
                   ` (8 preceding siblings ...)
  2022-04-18  2:27 ` [dm-devel] [dm-5.19 PATCH 09/21] dm: introduce dm_{get, put}_live_table_bio called from dm_submit_bio Mike Snitzer
@ 2022-04-18  2:27 ` Mike Snitzer
  2022-04-18  2:27 ` [dm-devel] [dm-5.19 PATCH 11/21] dm: simplify basic targets Mike Snitzer
                   ` (12 subsequent siblings)
  22 siblings, 0 replies; 31+ messages in thread
From: Mike Snitzer @ 2022-04-18  2:27 UTC (permalink / raw)
  To: dm-devel; +Cc: axboe, hch, ming.lei

Use jump_labels to further reduce cost of unlikely branches for zoned
block devices, dm-stats and swap_bios throttling.

Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
 drivers/md/dm-core.h  |  5 ++++
 drivers/md/dm-stats.c |  3 +++
 drivers/md/dm-table.c |  5 ++++
 drivers/md/dm.c       | 63 ++++++++++++++++++++++++++++++++-------------------
 4 files changed, 53 insertions(+), 23 deletions(-)

diff --git a/drivers/md/dm-core.h b/drivers/md/dm-core.h
index 41d6511dc7cf..8ba99eaa0872 100644
--- a/drivers/md/dm-core.h
+++ b/drivers/md/dm-core.h
@@ -13,6 +13,7 @@
 #include <linux/ktime.h>
 #include <linux/blk-mq.h>
 #include <linux/blk-crypto-profile.h>
+#include <linux/jump_label.h>
 
 #include <trace/events/block.h>
 
@@ -154,6 +155,10 @@ static inline struct dm_stats *dm_get_stats(struct mapped_device *md)
 	return &md->stats;
 }
 
+DECLARE_STATIC_KEY_FALSE(stats_enabled);
+DECLARE_STATIC_KEY_FALSE(swap_bios_enabled);
+DECLARE_STATIC_KEY_FALSE(zoned_enabled);
+
 static inline bool dm_emulate_zone_append(struct mapped_device *md)
 {
 	if (blk_queue_is_zoned(md->queue))
diff --git a/drivers/md/dm-stats.c b/drivers/md/dm-stats.c
index 0e039a8c0bf2..86e0697330e8 100644
--- a/drivers/md/dm-stats.c
+++ b/drivers/md/dm-stats.c
@@ -396,6 +396,9 @@ static int dm_stats_create(struct dm_stats *stats, sector_t start, sector_t end,
 
 	dm_stats_recalc_precise_timestamps(stats);
 
+	if (!static_key_enabled(&stats_enabled.key))
+		static_branch_enable(&stats_enabled);
+
 	mutex_unlock(&stats->mutex);
 
 	resume_callback(md);
diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
index ffd97cb6d1c1..a9d2fc5d9ad9 100644
--- a/drivers/md/dm-table.c
+++ b/drivers/md/dm-table.c
@@ -719,6 +719,9 @@ int dm_table_add_target(struct dm_table *t, const char *type,
 		DMWARN("%s: %s: ignoring discards_supported because num_discard_bios is zero.",
 		       dm_device_name(t->md), type);
 
+	if (tgt->limit_swap_bios && !static_key_enabled(&swap_bios_enabled.key))
+		static_branch_enable(&swap_bios_enabled);
+
 	return 0;
 
  bad:
@@ -2051,6 +2054,8 @@ int dm_table_set_restrictions(struct dm_table *t, struct request_queue *q,
 		r = dm_set_zones_restrictions(t, q);
 		if (r)
 			return r;
+		if (!static_key_enabled(&zoned_enabled.key))
+			static_branch_enable(&zoned_enabled);
 	}
 
 	dm_update_crypto_profile(q, t);
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 073b41ce7a30..3359b55daa99 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -71,6 +71,10 @@ void dm_issue_global_event(void)
 	wake_up(&dm_global_eventq);
 }
 
+DEFINE_STATIC_KEY_FALSE(stats_enabled);
+DEFINE_STATIC_KEY_FALSE(swap_bios_enabled);
+DEFINE_STATIC_KEY_FALSE(zoned_enabled);
+
 /*
  * One of these is allocated (on-stack) per original bio.
  */
@@ -516,7 +520,8 @@ static void dm_io_acct(bool end, struct mapped_device *md, struct bio *bio,
 	else
 		bio_end_io_acct(bio, start_time);
 
-	if (unlikely(dm_stats_used(&md->stats)))
+	if (static_branch_unlikely(&stats_enabled) &&
+	    unlikely(dm_stats_used(&md->stats)))
 		dm_stats_account_io(&md->stats, bio_data_dir(bio),
 				    bio->bi_iter.bi_sector, bio_sectors(bio),
 				    end, start_time, stats_aux);
@@ -586,7 +591,8 @@ static struct dm_io *alloc_io(struct mapped_device *md, struct bio *bio)
 	io->start_time = jiffies;
 	io->flags = 0;
 
-	dm_stats_record_start(&md->stats, &io->stats_aux);
+	if (static_branch_unlikely(&stats_enabled))
+		dm_stats_record_start(&md->stats, &io->stats_aux);
 
 	return io;
 }
@@ -1013,21 +1019,25 @@ static void clone_endio(struct bio *bio)
 			disable_write_zeroes(md);
 	}
 
-	if (unlikely(blk_queue_is_zoned(q)))
+	if (static_branch_unlikely(&zoned_enabled) &&
+	    unlikely(blk_queue_is_zoned(q)))
 		dm_zone_endio(io, bio);
 
 	if (endio) {
 		int r = endio(ti, bio, &error);
 		switch (r) {
 		case DM_ENDIO_REQUEUE:
-			/*
-			 * Requeuing writes to a sequential zone of a zoned
-			 * target will break the sequential write pattern:
-			 * fail such IO.
-			 */
-			if (WARN_ON_ONCE(dm_is_zone_write(md, bio)))
-				error = BLK_STS_IOERR;
-			else
+			if (static_branch_unlikely(&zoned_enabled)) {
+				/*
+				 * Requeuing writes to a sequential zone of a zoned
+				 * target will break the sequential write pattern:
+				 * fail such IO.
+				 */
+				if (WARN_ON_ONCE(dm_is_zone_write(md, bio)))
+					error = BLK_STS_IOERR;
+				else
+					error = BLK_STS_DM_REQUEUE;
+			} else
 				error = BLK_STS_DM_REQUEUE;
 			fallthrough;
 		case DM_ENDIO_DONE:
@@ -1041,7 +1051,8 @@ static void clone_endio(struct bio *bio)
 		}
 	}
 
-	if (unlikely(swap_bios_limit(ti, bio)))
+	if (static_branch_unlikely(&swap_bios_enabled) &&
+	    unlikely(swap_bios_limit(ti, bio)))
 		up(&md->swap_bios_semaphore);
 
 	free_tio(bio);
@@ -1296,21 +1307,25 @@ static void __map_bio(struct bio *clone)
 	dm_io_inc_pending(io);
 	tio->old_sector = clone->bi_iter.bi_sector;
 
-	if (unlikely(swap_bios_limit(ti, clone))) {
+	if (static_branch_unlikely(&swap_bios_enabled) &&
+	    unlikely(swap_bios_limit(ti, clone))) {
 		int latch = get_swap_bios();
 		if (unlikely(latch != md->swap_bios))
 			__set_swap_bios_limit(md, latch);
 		down(&md->swap_bios_semaphore);
 	}
 
-	/*
-	 * Check if the IO needs a special mapping due to zone append emulation
-	 * on zoned target. In this case, dm_zone_map_bio() calls the target
-	 * map operation.
-	 */
-	if (unlikely(dm_emulate_zone_append(md)))
-		r = dm_zone_map_bio(tio);
-	else
+	if (static_branch_unlikely(&zoned_enabled)) {
+		/*
+		 * Check if the IO needs a special mapping due to zone append
+		 * emulation on zoned target. In this case, dm_zone_map_bio()
+		 * calls the target map operation.
+		 */
+		if (unlikely(dm_emulate_zone_append(md)))
+			r = dm_zone_map_bio(tio);
+		else
+			r = ti->type->map(ti, clone);
+	} else
 		r = ti->type->map(ti, clone);
 
 	switch (r) {
@@ -1330,7 +1345,8 @@ static void __map_bio(struct bio *clone)
 		break;
 	case DM_MAPIO_KILL:
 	case DM_MAPIO_REQUEUE:
-		if (unlikely(swap_bios_limit(ti, clone)))
+		if (static_branch_unlikely(&swap_bios_enabled) &&
+		    unlikely(swap_bios_limit(ti, clone)))
 			up(&md->swap_bios_semaphore);
 		free_tio(clone);
 		if (r == DM_MAPIO_KILL)
@@ -1566,7 +1582,8 @@ static void init_clone_info(struct clone_info *ci, struct mapped_device *md,
 	ci->sector_count = bio_sectors(bio);
 
 	/* Shouldn't happen but sector_count was being set to 0 so... */
-	if (WARN_ON_ONCE(op_is_zone_mgmt(bio_op(bio)) && ci->sector_count))
+	if (static_branch_unlikely(&zoned_enabled) &&
+	    WARN_ON_ONCE(op_is_zone_mgmt(bio_op(bio)) && ci->sector_count))
 		ci->sector_count = 0;
 }
 
-- 
2.15.0

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [dm-devel] [dm-5.19 PATCH 11/21] dm: simplify basic targets
  2022-04-18  2:27 [dm-devel] [dm-5.19 PATCH 00/21] dm: changes staged for 5.19 Mike Snitzer
                   ` (9 preceding siblings ...)
  2022-04-18  2:27 ` [dm-devel] [dm-5.19 PATCH 10/21] dm: conditionally enable branching for less used features Mike Snitzer
@ 2022-04-18  2:27 ` Mike Snitzer
  2022-04-18  2:27 ` [dm-devel] [dm-5.19 PATCH 12/21] dm: use bio_sectors in dm_aceept_partial_bio Mike Snitzer
                   ` (11 subsequent siblings)
  22 siblings, 0 replies; 31+ messages in thread
From: Mike Snitzer @ 2022-04-18  2:27 UTC (permalink / raw)
  To: dm-devel; +Cc: axboe, hch, ming.lei

Remove needless factoring and remap bi_sector regardless of
bio_sectors() being non-zero.

Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
 drivers/md/dm-delay.c  |  3 +--
 drivers/md/dm-flakey.c |  4 +---
 drivers/md/dm-linear.c | 11 ++---------
 3 files changed, 4 insertions(+), 14 deletions(-)

diff --git a/drivers/md/dm-delay.c b/drivers/md/dm-delay.c
index 9a51bf51a859..869afef5654a 100644
--- a/drivers/md/dm-delay.c
+++ b/drivers/md/dm-delay.c
@@ -296,8 +296,7 @@ static int delay_map(struct dm_target *ti, struct bio *bio)
 	}
 	delayed->class = c;
 	bio_set_dev(bio, c->dev->bdev);
-	if (bio_sectors(bio))
-		bio->bi_iter.bi_sector = c->start + dm_target_offset(ti, bio->bi_iter.bi_sector);
+	bio->bi_iter.bi_sector = c->start + dm_target_offset(ti, bio->bi_iter.bi_sector);
 
 	return delay_bio(dc, c, bio);
 }
diff --git a/drivers/md/dm-flakey.c b/drivers/md/dm-flakey.c
index 345229d7e59c..f2305eb758a2 100644
--- a/drivers/md/dm-flakey.c
+++ b/drivers/md/dm-flakey.c
@@ -280,9 +280,7 @@ static void flakey_map_bio(struct dm_target *ti, struct bio *bio)
 	struct flakey_c *fc = ti->private;
 
 	bio_set_dev(bio, fc->dev->bdev);
-	if (bio_sectors(bio) || op_is_zone_mgmt(bio_op(bio)))
-		bio->bi_iter.bi_sector =
-			flakey_map_sector(ti, bio->bi_iter.bi_sector);
+	bio->bi_iter.bi_sector = flakey_map_sector(ti, bio->bi_iter.bi_sector);
 }
 
 static void corrupt_bio_data(struct bio *bio, struct flakey_c *fc)
diff --git a/drivers/md/dm-linear.c b/drivers/md/dm-linear.c
index 76b486e4d2be..0a6abbbe3745 100644
--- a/drivers/md/dm-linear.c
+++ b/drivers/md/dm-linear.c
@@ -84,19 +84,12 @@ static sector_t linear_map_sector(struct dm_target *ti, sector_t bi_sector)
 	return lc->start + dm_target_offset(ti, bi_sector);
 }
 
-static void linear_map_bio(struct dm_target *ti, struct bio *bio)
+static int linear_map(struct dm_target *ti, struct bio *bio)
 {
 	struct linear_c *lc = ti->private;
 
 	bio_set_dev(bio, lc->dev->bdev);
-	if (bio_sectors(bio) || op_is_zone_mgmt(bio_op(bio)))
-		bio->bi_iter.bi_sector =
-			linear_map_sector(ti, bio->bi_iter.bi_sector);
-}
-
-static int linear_map(struct dm_target *ti, struct bio *bio)
-{
-	linear_map_bio(ti, bio);
+	bio->bi_iter.bi_sector = linear_map_sector(ti, bio->bi_iter.bi_sector);
 
 	return DM_MAPIO_REMAPPED;
 }
-- 
2.15.0

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [dm-devel] [dm-5.19 PATCH 12/21] dm: use bio_sectors in dm_aceept_partial_bio
  2022-04-18  2:27 [dm-devel] [dm-5.19 PATCH 00/21] dm: changes staged for 5.19 Mike Snitzer
                   ` (10 preceding siblings ...)
  2022-04-18  2:27 ` [dm-devel] [dm-5.19 PATCH 11/21] dm: simplify basic targets Mike Snitzer
@ 2022-04-18  2:27 ` Mike Snitzer
  2022-04-18  2:27 ` [dm-devel] [dm-5.19 PATCH 13/21] dm: don't pass bio to __dm_start_io_acct and dm_end_io_acct Mike Snitzer
                   ` (10 subsequent siblings)
  22 siblings, 0 replies; 31+ messages in thread
From: Mike Snitzer @ 2022-04-18  2:27 UTC (permalink / raw)
  To: dm-devel; +Cc: axboe, hch, ming.lei

Rename 'bi_size' to 'bio_sectors' given bi_size is being stored in
sectors.  Also, use bio_sectors() rather than open-coding it.

Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
 drivers/md/dm.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 3359b55daa99..c8933d7e6a78 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1195,7 +1195,7 @@ static int dm_dax_zero_page_range(struct dax_device *dax_dev, pgoff_t pgoff,
  * +--------------------+---------------+-------+
  *
  * <-------------- *tio->len_ptr --------------->
- *                      <------- bi_size ------->
+ *                      <----- bio_sectors ----->
  *                      <-- n_sectors -->
  *
  * Region 1 was already iterated over with bio_advance or similar function.
@@ -1212,15 +1212,15 @@ static int dm_dax_zero_page_range(struct dax_device *dax_dev, pgoff_t pgoff,
 void dm_accept_partial_bio(struct bio *bio, unsigned n_sectors)
 {
 	struct dm_target_io *tio = clone_to_tio(bio);
-	unsigned bi_size = bio->bi_iter.bi_size >> SECTOR_SHIFT;
+	unsigned bio_sectors = bio_sectors(bio);
 
 	BUG_ON(dm_tio_flagged(tio, DM_TIO_IS_DUPLICATE_BIO));
 	BUG_ON(op_is_zone_mgmt(bio_op(bio)));
 	BUG_ON(bio_op(bio) == REQ_OP_ZONE_APPEND);
-	BUG_ON(bi_size > *tio->len_ptr);
-	BUG_ON(n_sectors > bi_size);
+	BUG_ON(bio_sectors > *tio->len_ptr);
+	BUG_ON(n_sectors > bio_sectors);
 
-	*tio->len_ptr -= bi_size - n_sectors;
+	*tio->len_ptr -= bio_sectors - n_sectors;
 	bio->bi_iter.bi_size = n_sectors << SECTOR_SHIFT;
 }
 EXPORT_SYMBOL_GPL(dm_accept_partial_bio);
-- 
2.15.0

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [dm-devel] [dm-5.19 PATCH 13/21] dm: don't pass bio to __dm_start_io_acct and dm_end_io_acct
  2022-04-18  2:27 [dm-devel] [dm-5.19 PATCH 00/21] dm: changes staged for 5.19 Mike Snitzer
                   ` (11 preceding siblings ...)
  2022-04-18  2:27 ` [dm-devel] [dm-5.19 PATCH 12/21] dm: use bio_sectors in dm_aceept_partial_bio Mike Snitzer
@ 2022-04-18  2:27 ` Mike Snitzer
  2022-04-18  2:27 ` [dm-devel] [dm-5.19 PATCH 14/21] dm: pass dm_io instance to dm_io_acct directly Mike Snitzer
                   ` (9 subsequent siblings)
  22 siblings, 0 replies; 31+ messages in thread
From: Mike Snitzer @ 2022-04-18  2:27 UTC (permalink / raw)
  To: dm-devel; +Cc: axboe, hch, ming.lei

[-- Attachment #1: Type: application/octet-stream, Size: 2158 bytes --]

From: Ming Lei <ming.lei@redhat.com>

dm->orig_bio is always passed to __dm_start_io_acct and dm_end_io_acct,
so it isn't necessary to take one bio parameter for the two helpers.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
 drivers/md/dm.c | 19 ++++++++-----------
 1 file changed, 8 insertions(+), 11 deletions(-)

diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index c8933d7e6a78..9012cd1eff1f 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -531,16 +531,13 @@ static void dm_io_acct(bool end, struct mapped_device *md, struct bio *bio,
 		bio->bi_iter.bi_size = bi_size;
 }
 
-static void __dm_start_io_acct(struct dm_io *io, struct bio *bio)
+static void __dm_start_io_acct(struct dm_io *io)
 {
-	dm_io_acct(false, io->md, bio, io->start_time, &io->stats_aux);
+	dm_io_acct(false, io->md, io->orig_bio, io->start_time, &io->stats_aux);
 }
 
 static void dm_start_io_acct(struct dm_io *io, struct bio *clone)
 {
-	/* Must account IO to DM device in terms of orig_bio */
-	struct bio *bio = io->orig_bio;
-
 	/*
 	 * Ensure IO accounting is only ever started once.
 	 */
@@ -558,12 +555,12 @@ static void dm_start_io_acct(struct dm_io *io, struct bio *clone)
 		spin_unlock_irqrestore(&io->lock, flags);
 	}
 
-	__dm_start_io_acct(io, bio);
+	__dm_start_io_acct(io);
 }
 
-static void dm_end_io_acct(struct dm_io *io, struct bio *bio)
+static void dm_end_io_acct(struct dm_io *io)
 {
-	dm_io_acct(true, io->md, bio, io->start_time, &io->stats_aux);
+	dm_io_acct(true, io->md, io->orig_bio, io->start_time, &io->stats_aux);
 }
 
 static struct dm_io *alloc_io(struct mapped_device *md, struct bio *bio)
@@ -899,14 +896,14 @@ static void dm_io_complete(struct dm_io *io)
 
 	io_error = io->status;
 	if (dm_io_flagged(io, DM_IO_ACCOUNTED))
-		dm_end_io_acct(io, bio);
+		dm_end_io_acct(io);
 	else if (!io_error) {
 		/*
 		 * Must handle target that DM_MAPIO_SUBMITTED only to
 		 * then bio_endio() rather than dm_submit_bio_remap()
 		 */
-		__dm_start_io_acct(io, bio);
-		dm_end_io_acct(io, bio);
+		__dm_start_io_acct(io);
+		dm_end_io_acct(io);
 	}
 	free_io(io);
 	smp_wmb();
-- 
2.15.0


[-- Attachment #2: Type: text/plain, Size: 98 bytes --]

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [dm-devel] [dm-5.19 PATCH 14/21] dm: pass dm_io instance to dm_io_acct directly
  2022-04-18  2:27 [dm-devel] [dm-5.19 PATCH 00/21] dm: changes staged for 5.19 Mike Snitzer
                   ` (12 preceding siblings ...)
  2022-04-18  2:27 ` [dm-devel] [dm-5.19 PATCH 13/21] dm: don't pass bio to __dm_start_io_acct and dm_end_io_acct Mike Snitzer
@ 2022-04-18  2:27 ` Mike Snitzer
  2022-04-18  2:27 ` [dm-devel] [dm-5.19 PATCH 15/21] dm: switch to bdev based IO accounting interfaces Mike Snitzer
                   ` (8 subsequent siblings)
  22 siblings, 0 replies; 31+ messages in thread
From: Mike Snitzer @ 2022-04-18  2:27 UTC (permalink / raw)
  To: dm-devel; +Cc: axboe, hch, ming.lei

[-- Attachment #1: Type: application/octet-stream, Size: 1689 bytes --]

From: Ming Lei <ming.lei@redhat.com>

All the other 4 parameters are retrieved from the 'dm_io' instance, so
it's not necessary to pass all four to dm_io_acct().

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
 drivers/md/dm.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 9012cd1eff1f..f100ced29e0d 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -502,9 +502,12 @@ static bool bio_is_flush_with_data(struct bio *bio)
 	return ((bio->bi_opf & REQ_PREFLUSH) && bio->bi_iter.bi_size);
 }
 
-static void dm_io_acct(bool end, struct mapped_device *md, struct bio *bio,
-		       unsigned long start_time, struct dm_stats_aux *stats_aux)
+static void dm_io_acct(struct dm_io *io, bool end)
 {
+	struct dm_stats_aux *stats_aux = &io->stats_aux;
+	unsigned long start_time = io->start_time;
+	struct mapped_device *md = io->md;
+	struct bio *bio = io->orig_bio;
 	bool is_flush_with_data;
 	unsigned int bi_size;
 
@@ -533,7 +536,7 @@ static void dm_io_acct(bool end, struct mapped_device *md, struct bio *bio,
 
 static void __dm_start_io_acct(struct dm_io *io)
 {
-	dm_io_acct(false, io->md, io->orig_bio, io->start_time, &io->stats_aux);
+	dm_io_acct(io, false);
 }
 
 static void dm_start_io_acct(struct dm_io *io, struct bio *clone)
@@ -560,7 +563,7 @@ static void dm_start_io_acct(struct dm_io *io, struct bio *clone)
 
 static void dm_end_io_acct(struct dm_io *io)
 {
-	dm_io_acct(true, io->md, io->orig_bio, io->start_time, &io->stats_aux);
+	dm_io_acct(io, true);
 }
 
 static struct dm_io *alloc_io(struct mapped_device *md, struct bio *bio)
-- 
2.15.0


[-- Attachment #2: Type: text/plain, Size: 98 bytes --]

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [dm-devel] [dm-5.19 PATCH 15/21] dm: switch to bdev based IO accounting interfaces
  2022-04-18  2:27 [dm-devel] [dm-5.19 PATCH 00/21] dm: changes staged for 5.19 Mike Snitzer
                   ` (13 preceding siblings ...)
  2022-04-18  2:27 ` [dm-devel] [dm-5.19 PATCH 14/21] dm: pass dm_io instance to dm_io_acct directly Mike Snitzer
@ 2022-04-18  2:27 ` Mike Snitzer
  2022-04-18  2:27 ` [dm-devel] [dm-5.19 PATCH 16/21] dm: improve bio splitting and associated IO accounting Mike Snitzer
                   ` (7 subsequent siblings)
  22 siblings, 0 replies; 31+ messages in thread
From: Mike Snitzer @ 2022-04-18  2:27 UTC (permalink / raw)
  To: dm-devel; +Cc: axboe, hch, ming.lei

[-- Attachment #1: Type: application/octet-stream, Size: 2099 bytes --]

From: Ming Lei <ming.lei@redhat.com>

DM splits flush with data into empty flush followed by bio with data
payload, switch dm_io_acct() to use bdev_{start,end}_io_acct() to do
this accoiunting more naturally (rather than temporarily changing the
bio's bi_size).

This will allow DM to more easily account bios that are split (in
following commit).

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
 drivers/md/dm.c | 28 +++++++++++++---------------
 1 file changed, 13 insertions(+), 15 deletions(-)

diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index f100ced29e0d..cb41384cd814 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -508,30 +508,28 @@ static void dm_io_acct(struct dm_io *io, bool end)
 	unsigned long start_time = io->start_time;
 	struct mapped_device *md = io->md;
 	struct bio *bio = io->orig_bio;
-	bool is_flush_with_data;
-	unsigned int bi_size;
+	unsigned int sectors;
 
-	/* If REQ_PREFLUSH set save any payload but do not account it */
-	is_flush_with_data = bio_is_flush_with_data(bio);
-	if (is_flush_with_data) {
-		bi_size = bio->bi_iter.bi_size;
-		bio->bi_iter.bi_size = 0;
-	}
+	/*
+	 * If REQ_PREFLUSH set, don't account payload, it will be
+	 * submitted (and accounted) after this flush completes.
+	 */
+	if (bio_is_flush_with_data(bio))
+		sectors = 0;
+	else
+		sectors = bio_sectors(bio);
 
 	if (!end)
-		bio_start_io_acct_time(bio, start_time);
+		bdev_start_io_acct(bio->bi_bdev, sectors, bio_op(bio),
+				   start_time);
 	else
-		bio_end_io_acct(bio, start_time);
+		bdev_end_io_acct(bio->bi_bdev, bio_op(bio), start_time);
 
 	if (static_branch_unlikely(&stats_enabled) &&
 	    unlikely(dm_stats_used(&md->stats)))
 		dm_stats_account_io(&md->stats, bio_data_dir(bio),
-				    bio->bi_iter.bi_sector, bio_sectors(bio),
+				    bio->bi_iter.bi_sector, sectors,
 				    end, start_time, stats_aux);
-
-	/* Restore bio's payload so it does get accounted upon requeue */
-	if (is_flush_with_data)
-		bio->bi_iter.bi_size = bi_size;
 }
 
 static void __dm_start_io_acct(struct dm_io *io)
-- 
2.15.0


[-- Attachment #2: Type: text/plain, Size: 98 bytes --]

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [dm-devel] [dm-5.19 PATCH 16/21] dm: improve bio splitting and associated IO accounting
  2022-04-18  2:27 [dm-devel] [dm-5.19 PATCH 00/21] dm: changes staged for 5.19 Mike Snitzer
                   ` (14 preceding siblings ...)
  2022-04-18  2:27 ` [dm-devel] [dm-5.19 PATCH 15/21] dm: switch to bdev based IO accounting interfaces Mike Snitzer
@ 2022-04-18  2:27 ` Mike Snitzer
  2022-04-18  2:27 ` [dm-devel] [dm-5.19 PATCH 17/21] dm: don't grab target io reference in dm_zone_map_bio Mike Snitzer
                   ` (6 subsequent siblings)
  22 siblings, 0 replies; 31+ messages in thread
From: Mike Snitzer @ 2022-04-18  2:27 UTC (permalink / raw)
  To: dm-devel; +Cc: axboe, hch, ming.lei

[-- Attachment #1: Type: application/octet-stream, Size: 7086 bytes --]

From: Ming Lei <ming.lei@redhat.com>

The current DM code (ab)uses late assignment of dm_io->orig_bio (after
__map_bio() returns and any bio splitting is complete) to indicate the
FS bio has been processed and can be accounted. This results in
awkward waiting until ->orig_bio is set in dm_submit_bio_remap().

Also the bio splitting was implemented using bio_split()+bio_chain()
-- a well-worn pattern but it requires bio cloning purely for the
benefit of more natural IO accounting.  The bio_split() result was
stored in ->orig_bio to represent the mapped part of the original FS
bio.

DM has switched to the bdev based IO accounting interface.  DM's IO
accounting can be implemented in terms of the original FS bio (now
stored early in ->orig_bio) via access to its sectors/bio_op. And
if/when splitting is needed, set a new DM_IO_WAS_SPLIT flag and use
new dm_io fields of .sector_offset & .sectors to allow IO accounting
for split bios _without_ needing to clone a new bio to store in
->orig_bio.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Co-developed-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
 drivers/md/dm-core.h |  8 +++++-
 drivers/md/dm.c      | 75 ++++++++++++++++++++++++++++++++++++----------------
 2 files changed, 59 insertions(+), 24 deletions(-)

diff --git a/drivers/md/dm-core.h b/drivers/md/dm-core.h
index 8ba99eaa0872..37ddedf61249 100644
--- a/drivers/md/dm-core.h
+++ b/drivers/md/dm-core.h
@@ -267,7 +267,12 @@ struct dm_io {
 	blk_status_t status;
 	atomic_t io_count;
 	struct mapped_device *md;
+
+	/* The three fields represent mapped part of original bio */
 	struct bio *orig_bio;
+	unsigned int sector_offset; /* offset to end of orig_bio */
+	unsigned int sectors;
+
 	/* last member of dm_target_io is 'struct bio' */
 	struct dm_target_io tio;
 };
@@ -277,7 +282,8 @@ struct dm_io {
  */
 enum {
 	DM_IO_START_ACCT,
-	DM_IO_ACCOUNTED
+	DM_IO_ACCOUNTED,
+	DM_IO_WAS_SPLIT
 };
 
 static inline bool dm_io_flagged(struct dm_io *io, unsigned int bit)
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index cb41384cd814..860d2aaffd2a 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -516,8 +516,10 @@ static void dm_io_acct(struct dm_io *io, bool end)
 	 */
 	if (bio_is_flush_with_data(bio))
 		sectors = 0;
-	else
+	else if (likely(!(dm_io_flagged(io, DM_IO_WAS_SPLIT))))
 		sectors = bio_sectors(bio);
+	else
+		sectors = io->sectors;
 
 	if (!end)
 		bdev_start_io_acct(bio->bi_bdev, sectors, bio_op(bio),
@@ -526,10 +528,18 @@ static void dm_io_acct(struct dm_io *io, bool end)
 		bdev_end_io_acct(bio->bi_bdev, bio_op(bio), start_time);
 
 	if (static_branch_unlikely(&stats_enabled) &&
-	    unlikely(dm_stats_used(&md->stats)))
+	    unlikely(dm_stats_used(&md->stats))) {
+		sector_t sector;
+
+		if (likely(!dm_io_flagged(io, DM_IO_WAS_SPLIT)))
+			sector = bio->bi_iter.bi_sector;
+		else
+			sector = bio_end_sector(bio) - io->sector_offset;
+
 		dm_stats_account_io(&md->stats, bio_data_dir(bio),
-				    bio->bi_iter.bi_sector, sectors,
+				    sector, sectors,
 				    end, start_time, stats_aux);
+	}
 }
 
 static void __dm_start_io_acct(struct dm_io *io)
@@ -582,7 +592,7 @@ static struct dm_io *alloc_io(struct mapped_device *md, struct bio *bio)
 	io->status = BLK_STS_OK;
 	atomic_set(&io->io_count, 1);
 	this_cpu_inc(*md->pending_io);
-	io->orig_bio = NULL;
+	io->orig_bio = bio;
 	io->md = md;
 	io->map_task = current;
 	spin_lock_init(&io->lock);
@@ -1220,6 +1230,13 @@ void dm_accept_partial_bio(struct bio *bio, unsigned n_sectors)
 
 	*tio->len_ptr -= bio_sectors - n_sectors;
 	bio->bi_iter.bi_size = n_sectors << SECTOR_SHIFT;
+
+	/*
+	 * __split_and_process_bio() may have already saved mapped part
+	 * for accounting but it is being reduced so update accordingly.
+	 */
+	dm_io_set_flag(tio->io, DM_IO_WAS_SPLIT);
+	tio->io->sectors = n_sectors;
 }
 EXPORT_SYMBOL_GPL(dm_accept_partial_bio);
 
@@ -1258,13 +1275,6 @@ void dm_submit_bio_remap(struct bio *clone, struct bio *tgt_clone)
 		/* Still in target's map function */
 		dm_io_set_flag(io, DM_IO_START_ACCT);
 	} else {
-		/*
-		 * Called by another thread, managed by DM target,
-		 * wait for dm_split_and_process_bio() to store
-		 * io->orig_bio
-		 */
-		while (unlikely(!smp_load_acquire(&io->orig_bio)))
-			msleep(1);
 		dm_start_io_acct(io, clone);
 	}
 
@@ -1358,6 +1368,31 @@ static void __map_bio(struct bio *clone)
 	}
 }
 
+static void setup_split_accounting(struct clone_info *ci, unsigned len)
+{
+	struct dm_io *io = ci->io;
+
+	if (ci->sector_count > len) {
+		/*
+		 * Split needed, save the mapped part for accounting.
+		 * NOTE: dm_accept_partial_bio() will update accordingly.
+		 */
+		dm_io_set_flag(io, DM_IO_WAS_SPLIT);
+		io->sectors = len;
+	}
+
+	if (static_branch_unlikely(&stats_enabled) &&
+	    unlikely(dm_stats_used(&io->md->stats))) {
+		/*
+		 * Save bi_sector in terms of its offset from end of
+		 * original bio, only needed for DM-stats' benefit.
+		 * - saved regardless of whether split needed so that
+		 *   dm_accept_partial_bio() doesn't need to.
+		 */
+		io->sector_offset = bio_end_sector(ci->bio) - ci->sector;
+	}
+}
+
 static void alloc_multiple_bios(struct bio_list *blist, struct clone_info *ci,
 				struct dm_target *ti, unsigned num_bios)
 {
@@ -1397,6 +1432,8 @@ static void __send_duplicate_bios(struct clone_info *ci, struct dm_target *ti,
 	case 0:
 		break;
 	case 1:
+		if (len)
+			setup_split_accounting(ci, *len);
 		clone = alloc_tio(ci, ti, 0, len, GFP_NOIO);
 		__map_bio(clone);
 		break;
@@ -1560,6 +1597,7 @@ static blk_status_t __split_and_process_bio(struct clone_info *ci)
 	ci->submit_as_polled = ci->bio->bi_opf & REQ_POLLED;
 
 	len = min_t(sector_t, max_io_len(ti, ci->sector), ci->sector_count);
+	setup_split_accounting(ci, len);
 	clone = alloc_tio(ci, ti, 0, &len, GFP_NOIO);
 	__map_bio(clone);
 
@@ -1593,7 +1631,6 @@ static void dm_split_and_process_bio(struct mapped_device *md,
 {
 	struct clone_info ci;
 	struct dm_io *io;
-	struct bio *orig_bio = NULL;
 	blk_status_t error = BLK_STS_OK;
 
 	init_clone_info(&ci, md, map, bio);
@@ -1609,23 +1646,15 @@ static void dm_split_and_process_bio(struct mapped_device *md,
 	io->map_task = NULL;
 	if (error || !ci.sector_count)
 		goto out;
-
 	/*
 	 * Remainder must be passed to submit_bio_noacct() so it gets handled
 	 * *after* bios already submitted have been completely processed.
-	 * We take a clone of the original to store in io->orig_bio to be
-	 * used by dm_end_io_acct() and for dm_io_complete() to use for
-	 * completion handling.
 	 */
-	orig_bio = bio_split(bio, bio_sectors(bio) - ci.sector_count,
-			     GFP_NOIO, &md->queue->bio_split);
-	bio_chain(orig_bio, bio);
-	trace_block_split(orig_bio, bio->bi_iter.bi_sector);
+	bio_trim(bio, io->sectors, ci.sector_count);
+	trace_block_split(bio, bio->bi_iter.bi_sector);
+	bio_inc_remaining(bio);
 	submit_bio_noacct(bio);
 out:
-	if (!orig_bio)
-		orig_bio = bio;
-	smp_store_release(&io->orig_bio, orig_bio);
 	if (dm_io_flagged(io, DM_IO_START_ACCT))
 		dm_start_io_acct(io, NULL);
 
-- 
2.15.0


[-- Attachment #2: Type: text/plain, Size: 98 bytes --]

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [dm-devel] [dm-5.19 PATCH 17/21] dm: don't grab target io reference in dm_zone_map_bio
  2022-04-18  2:27 [dm-devel] [dm-5.19 PATCH 00/21] dm: changes staged for 5.19 Mike Snitzer
                   ` (15 preceding siblings ...)
  2022-04-18  2:27 ` [dm-devel] [dm-5.19 PATCH 16/21] dm: improve bio splitting and associated IO accounting Mike Snitzer
@ 2022-04-18  2:27 ` Mike Snitzer
  2022-04-18  2:27 ` [dm-devel] [dm-5.19 PATCH 18/21] dm: improve dm_io reference counting Mike Snitzer
                   ` (5 subsequent siblings)
  22 siblings, 0 replies; 31+ messages in thread
From: Mike Snitzer @ 2022-04-18  2:27 UTC (permalink / raw)
  To: dm-devel; +Cc: axboe, hch, ming.lei

[-- Attachment #1: Type: application/octet-stream, Size: 2737 bytes --]

From: Ming Lei <ming.lei@redhat.com>

dm_zone_map_bio() is only called from __map_bio in which the io's
reference is grabbed already, and the reference won't be released
until the bio is submitted, so not necessary to do it dm_zone_map_bio
any more.

Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
 drivers/md/dm-core.h |  7 -------
 drivers/md/dm-zone.c | 10 ----------
 drivers/md/dm.c      |  7 ++++++-
 3 files changed, 6 insertions(+), 18 deletions(-)

diff --git a/drivers/md/dm-core.h b/drivers/md/dm-core.h
index 37ddedf61249..d2d188c9b632 100644
--- a/drivers/md/dm-core.h
+++ b/drivers/md/dm-core.h
@@ -296,13 +296,6 @@ static inline void dm_io_set_flag(struct dm_io *io, unsigned int bit)
 	io->flags |= (1U << bit);
 }
 
-static inline void dm_io_inc_pending(struct dm_io *io)
-{
-	atomic_inc(&io->io_count);
-}
-
-void dm_io_dec_pending(struct dm_io *io, blk_status_t error);
-
 static inline struct completion *dm_get_completion_from_kobject(struct kobject *kobj)
 {
 	return &container_of(kobj, struct dm_kobject_holder, kobj)->completion;
diff --git a/drivers/md/dm-zone.c b/drivers/md/dm-zone.c
index 57daa86c19cf..3e7b1fe1580b 100644
--- a/drivers/md/dm-zone.c
+++ b/drivers/md/dm-zone.c
@@ -550,13 +550,6 @@ int dm_zone_map_bio(struct dm_target_io *tio)
 		return DM_MAPIO_KILL;
 	}
 
-	/*
-	 * The target map function may issue and complete the IO quickly.
-	 * Take an extra reference on the IO to make sure it does disappear
-	 * until we run dm_zone_map_bio_end().
-	 */
-	dm_io_inc_pending(io);
-
 	/* Let the target do its work */
 	r = ti->type->map(ti, clone);
 	switch (r) {
@@ -587,9 +580,6 @@ int dm_zone_map_bio(struct dm_target_io *tio)
 		break;
 	}
 
-	/* Drop the extra reference on the IO */
-	dm_io_dec_pending(io, sts);
-
 	if (sts != BLK_STS_OK)
 		return DM_MAPIO_KILL;
 
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 860d2aaffd2a..e4ccd0cce8f3 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -955,6 +955,11 @@ static void dm_io_complete(struct dm_io *io)
 	}
 }
 
+static void dm_io_inc_pending(struct dm_io *io)
+{
+	atomic_inc(&io->io_count);
+}
+
 /*
  * Decrements the number of outstanding ios that a bio has been
  * cloned into, completing the original io if necc.
@@ -978,7 +983,7 @@ static void dm_io_set_error(struct dm_io *io, blk_status_t error)
 	spin_unlock_irqrestore(&io->lock, flags);
 }
 
-void dm_io_dec_pending(struct dm_io *io, blk_status_t error)
+static void dm_io_dec_pending(struct dm_io *io, blk_status_t error)
 {
 	if (unlikely(error))
 		dm_io_set_error(io, error);
-- 
2.15.0


[-- Attachment #2: Type: text/plain, Size: 98 bytes --]

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [dm-devel] [dm-5.19 PATCH 18/21] dm: improve dm_io reference counting
  2022-04-18  2:27 [dm-devel] [dm-5.19 PATCH 00/21] dm: changes staged for 5.19 Mike Snitzer
                   ` (16 preceding siblings ...)
  2022-04-18  2:27 ` [dm-devel] [dm-5.19 PATCH 17/21] dm: don't grab target io reference in dm_zone_map_bio Mike Snitzer
@ 2022-04-18  2:27 ` Mike Snitzer
  2022-04-18  2:27 ` [dm-devel] [dm-5.19 PATCH 19/21] dm: put all polled dm_io instances into a single list Mike Snitzer
                   ` (4 subsequent siblings)
  22 siblings, 0 replies; 31+ messages in thread
From: Mike Snitzer @ 2022-04-18  2:27 UTC (permalink / raw)
  To: dm-devel; +Cc: axboe, hch, ming.lei

[-- Attachment #1: Type: application/octet-stream, Size: 4732 bytes --]

From: Ming Lei <ming.lei@redhat.com>

Currently each dm_io's reference counter is grabbed before calling
__map_bio(), this way isn't efficient since we can move this grabbing
to initialization time inside alloc_io().

Meantime it becomes typical async io reference counter model: one is
for submission side, the other is for completion side, and the io won't
be completed until both sides are done.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
 drivers/md/dm.c | 53 +++++++++++++++++++++++++++++++++++++++--------------
 1 file changed, 39 insertions(+), 14 deletions(-)

diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index e4ccd0cce8f3..9b5cdd4b734d 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -590,7 +590,9 @@ static struct dm_io *alloc_io(struct mapped_device *md, struct bio *bio)
 	io = container_of(tio, struct dm_io, tio);
 	io->magic = DM_IO_MAGIC;
 	io->status = BLK_STS_OK;
-	atomic_set(&io->io_count, 1);
+
+	/* one ref is for submission, the other is for completion */
+	atomic_set(&io->io_count, 2);
 	this_cpu_inc(*md->pending_io);
 	io->orig_bio = bio;
 	io->md = md;
@@ -955,11 +957,6 @@ static void dm_io_complete(struct dm_io *io)
 	}
 }
 
-static void dm_io_inc_pending(struct dm_io *io)
-{
-	atomic_inc(&io->io_count);
-}
-
 /*
  * Decrements the number of outstanding ios that a bio has been
  * cloned into, completing the original io if necc.
@@ -1317,7 +1314,6 @@ static void __map_bio(struct bio *clone)
 	/*
 	 * Map the clone.
 	 */
-	dm_io_inc_pending(io);
 	tio->old_sector = clone->bi_iter.bi_sector;
 
 	if (static_branch_unlikely(&swap_bios_enabled) &&
@@ -1427,11 +1423,12 @@ static void alloc_multiple_bios(struct bio_list *blist, struct clone_info *ci,
 	}
 }
 
-static void __send_duplicate_bios(struct clone_info *ci, struct dm_target *ti,
+static int __send_duplicate_bios(struct clone_info *ci, struct dm_target *ti,
 				  unsigned num_bios, unsigned *len)
 {
 	struct bio_list blist = BIO_EMPTY_LIST;
 	struct bio *clone;
+	int ret = 0;
 
 	switch (num_bios) {
 	case 0:
@@ -1441,6 +1438,7 @@ static void __send_duplicate_bios(struct clone_info *ci, struct dm_target *ti,
 			setup_split_accounting(ci, *len);
 		clone = alloc_tio(ci, ti, 0, len, GFP_NOIO);
 		__map_bio(clone);
+		ret = 1;
 		break;
 	default:
 		/* dm_accept_partial_bio() is not supported with shared tio->len_ptr */
@@ -1448,9 +1446,12 @@ static void __send_duplicate_bios(struct clone_info *ci, struct dm_target *ti,
 		while ((clone = bio_list_pop(&blist))) {
 			dm_tio_set_flag(clone_to_tio(clone), DM_TIO_IS_DUPLICATE_BIO);
 			__map_bio(clone);
+			ret += 1;
 		}
 		break;
 	}
+
+	return ret;
 }
 
 static void __send_empty_flush(struct clone_info *ci)
@@ -1471,8 +1472,19 @@ static void __send_empty_flush(struct clone_info *ci)
 	ci->sector_count = 0;
 	ci->io->tio.clone.bi_iter.bi_size = 0;
 
-	while ((ti = dm_table_get_target(ci->map, target_nr++)))
-		__send_duplicate_bios(ci, ti, ti->num_flush_bios, NULL);
+	while ((ti = dm_table_get_target(ci->map, target_nr++))) {
+		int bios;
+
+		atomic_add(ti->num_flush_bios, &ci->io->io_count);
+		bios = __send_duplicate_bios(ci, ti, ti->num_flush_bios, NULL);
+		atomic_sub(ti->num_flush_bios - bios, &ci->io->io_count);
+	}
+
+	/*
+	 * alloc_io() takes one extra reference for submission, so the
+	 * reference won't reach 0 without the following subtraction
+	 */
+	atomic_sub(1, &ci->io->io_count);
 
 	bio_uninit(ci->bio);
 }
@@ -1481,11 +1493,18 @@ static void __send_changing_extent_only(struct clone_info *ci, struct dm_target
 					unsigned num_bios)
 {
 	unsigned len;
+	int bios;
 
 	len = min_t(sector_t, ci->sector_count,
 		    max_io_len_target_boundary(ti, dm_target_offset(ti, ci->sector)));
 
-	__send_duplicate_bios(ci, ti, num_bios, &len);
+	atomic_add(num_bios, &ci->io->io_count);
+	bios = __send_duplicate_bios(ci, ti, num_bios, &len);
+	/*
+	 * alloc_io() takes one extra reference for submission, so the
+	 * reference won't reach 0 without the following (+1) subtraction
+	 */
+	atomic_sub(num_bios - bios + 1, &ci->io->io_count);
 
 	ci->sector += len;
 	ci->sector_count -= len;
@@ -1670,9 +1689,15 @@ static void dm_split_and_process_bio(struct mapped_device *md,
 	 * Add every dm_io instance into the hlist_head which is stored in
 	 * bio->bi_private, so that dm_poll_bio can poll them all.
 	 */
-	if (error || !ci.submit_as_polled)
-		dm_io_dec_pending(ci.io, error);
-	else
+	if (error || !ci.submit_as_polled) {
+		/*
+		 * In case of submission failure, the extra reference for
+		 * submitting io isn't consumed yet
+		 */
+		if (error)
+			atomic_dec(&io->io_count);
+		dm_io_dec_pending(io, error);
+	} else
 		dm_queue_poll_io(bio, io);
 }
 
-- 
2.15.0


[-- Attachment #2: Type: text/plain, Size: 98 bytes --]

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [dm-devel] [dm-5.19 PATCH 19/21] dm: put all polled dm_io instances into a single list
  2022-04-18  2:27 [dm-devel] [dm-5.19 PATCH 00/21] dm: changes staged for 5.19 Mike Snitzer
                   ` (17 preceding siblings ...)
  2022-04-18  2:27 ` [dm-devel] [dm-5.19 PATCH 18/21] dm: improve dm_io reference counting Mike Snitzer
@ 2022-04-18  2:27 ` Mike Snitzer
  2022-04-18  2:27 ` [dm-devel] [dm-5.19 PATCH 20/21] dm: simplify bio-based IO accounting further Mike Snitzer
                   ` (3 subsequent siblings)
  22 siblings, 0 replies; 31+ messages in thread
From: Mike Snitzer @ 2022-04-18  2:27 UTC (permalink / raw)
  To: dm-devel; +Cc: axboe, hch, ming.lei

[-- Attachment #1: Type: application/octet-stream, Size: 5068 bytes --]

From: Ming Lei <ming.lei@redhat.com>

Now that bio_split() isn't used by DM's bio splitting, it is a bit
overkill to link dm_io into an hlist given there is only single dm_io
in the list.

Convert to using a single list for holding all dm_io instances
associated with this bio.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
 drivers/md/dm-core.h |  2 +-
 drivers/md/dm.c      | 52 +++++++++++++++++++++++++++-------------------------
 2 files changed, 28 insertions(+), 26 deletions(-)

diff --git a/drivers/md/dm-core.h b/drivers/md/dm-core.h
index d2d188c9b632..f3cfc7affd12 100644
--- a/drivers/md/dm-core.h
+++ b/drivers/md/dm-core.h
@@ -259,7 +259,7 @@ struct dm_io {
 	spinlock_t lock;
 	unsigned long start_time;
 	void *data;
-	struct hlist_node node;
+	struct dm_io *next;
 	struct task_struct *map_task;
 	struct dm_stats_aux stats_aux;
 
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 9b5cdd4b734d..c5a79712de1d 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1560,7 +1560,7 @@ static bool __process_abnormal_io(struct clone_info *ci, struct dm_target *ti,
 }
 
 /*
- * Reuse ->bi_private as hlist head for storing all dm_io instances
+ * Reuse ->bi_private as dm_io list head for storing all dm_io instances
  * associated with this bio, and this bio's bi_private needs to be
  * stored in dm_io->data before the reuse.
  *
@@ -1568,36 +1568,37 @@ static bool __process_abnormal_io(struct clone_info *ci, struct dm_target *ti,
  * touch it after splitting. Meantime it won't be changed by anyone after
  * bio is submitted. So this reuse is safe.
  */
-static inline struct hlist_head *dm_get_bio_hlist_head(struct bio *bio)
+static inline struct dm_io **dm_poll_list_head(struct bio *bio)
 {
-	return (struct hlist_head *)&bio->bi_private;
+	return (struct dm_io **)&bio->bi_private;
 }
 
 static void dm_queue_poll_io(struct bio *bio, struct dm_io *io)
 {
-	struct hlist_head *head = dm_get_bio_hlist_head(bio);
+	struct dm_io **head = dm_poll_list_head(bio);
 
 	if (!(bio->bi_opf & REQ_DM_POLL_LIST)) {
 		bio->bi_opf |= REQ_DM_POLL_LIST;
 		/*
 		 * Save .bi_private into dm_io, so that we can reuse
-		 * .bi_private as hlist head for storing dm_io list
+		 * .bi_private as dm_io list head for storing dm_io list
 		 */
 		io->data = bio->bi_private;
 
-		INIT_HLIST_HEAD(head);
-
 		/* tell block layer to poll for completion */
 		bio->bi_cookie = ~BLK_QC_T_NONE;
+
+		io->next = NULL;
 	} else {
 		/*
 		 * bio recursed due to split, reuse original poll list,
 		 * and save bio->bi_private too.
 		 */
-		io->data = hlist_entry(head->first, struct dm_io, node)->data;
+		io->data = (*head)->data;
+		io->next = *head;
 	}
 
-	hlist_add_head(&io->node, head);
+	*head = io;
 }
 
 /*
@@ -1686,8 +1687,8 @@ static void dm_split_and_process_bio(struct mapped_device *md,
 	 * Drop the extra reference count for non-POLLED bio, and hold one
 	 * reference for POLLED bio, which will be released in dm_poll_bio
 	 *
-	 * Add every dm_io instance into the hlist_head which is stored in
-	 * bio->bi_private, so that dm_poll_bio can poll them all.
+	 * Add every dm_io instance into the dm_io list head which is stored
+	 * in bio->bi_private, so that dm_poll_bio can poll them all.
 	 */
 	if (error || !ci.submit_as_polled) {
 		/*
@@ -1749,18 +1750,16 @@ static bool dm_poll_dm_io(struct dm_io *io, struct io_comp_batch *iob,
 static int dm_poll_bio(struct bio *bio, struct io_comp_batch *iob,
 		       unsigned int flags)
 {
-	struct hlist_head *head = dm_get_bio_hlist_head(bio);
-	struct hlist_head tmp = HLIST_HEAD_INIT;
-	struct hlist_node *next;
-	struct dm_io *io;
+	struct dm_io **head = dm_poll_list_head(bio);
+	struct dm_io *list = *head;
+	struct dm_io *tmp = NULL;
+	struct dm_io *curr, *next;
 
 	/* Only poll normal bio which was marked as REQ_DM_POLL_LIST */
 	if (!(bio->bi_opf & REQ_DM_POLL_LIST))
 		return 0;
 
-	WARN_ON_ONCE(hlist_empty(head));
-
-	hlist_move_list(head, &tmp);
+	WARN_ON_ONCE(!list);
 
 	/*
 	 * Restore .bi_private before possibly completing dm_io.
@@ -1771,24 +1770,27 @@ static int dm_poll_bio(struct bio *bio, struct io_comp_batch *iob,
 	 * clearing REQ_DM_POLL_LIST here.
 	 */
 	bio->bi_opf &= ~REQ_DM_POLL_LIST;
-	bio->bi_private = hlist_entry(tmp.first, struct dm_io, node)->data;
+	bio->bi_private = list->data;
 
-	hlist_for_each_entry_safe(io, next, &tmp, node) {
-		if (dm_poll_dm_io(io, iob, flags)) {
-			hlist_del_init(&io->node);
+	for (curr = list, next = curr->next; curr; curr = next, next =
+			curr ? curr->next : NULL) {
+		if (dm_poll_dm_io(curr, iob, flags)) {
 			/*
 			 * clone_endio() has already occurred, so no
 			 * error handling is needed here.
 			 */
-			__dm_io_dec_pending(io);
+			__dm_io_dec_pending(curr);
+		} else {
+			curr->next = tmp;
+			tmp = curr;
 		}
 	}
 
 	/* Not done? */
-	if (!hlist_empty(&tmp)) {
+	if (tmp) {
 		bio->bi_opf |= REQ_DM_POLL_LIST;
 		/* Reset bio->bi_private to dm_io list head */
-		hlist_move_list(&tmp, head);
+		*head = tmp;
 		return 0;
 	}
 	return 1;
-- 
2.15.0


[-- Attachment #2: Type: text/plain, Size: 98 bytes --]

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [dm-devel] [dm-5.19 PATCH 20/21] dm: simplify bio-based IO accounting further
  2022-04-18  2:27 [dm-devel] [dm-5.19 PATCH 00/21] dm: changes staged for 5.19 Mike Snitzer
                   ` (18 preceding siblings ...)
  2022-04-18  2:27 ` [dm-devel] [dm-5.19 PATCH 19/21] dm: put all polled dm_io instances into a single list Mike Snitzer
@ 2022-04-18  2:27 ` Mike Snitzer
  2022-04-18  2:27 ` [dm-devel] [dm-5.19 PATCH 21/21] dm: improve abnormal bio processing Mike Snitzer
                   ` (2 subsequent siblings)
  22 siblings, 0 replies; 31+ messages in thread
From: Mike Snitzer @ 2022-04-18  2:27 UTC (permalink / raw)
  To: dm-devel; +Cc: axboe, hch, ming.lei

Now that io splitting is recorded prior to, or during, ->map IO
accounting can happen immediately rather than defer until after
bio splitting in dm_split_and_process_bio().

Remove the DM_IO_START_ACCT flag and also remove dm_io's map_task
member because there is no longer any need to wait for splitting to
occur before accounting.

Also move dm_io struct's 'flags' member to consolidate struct holes.

Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
 drivers/md/dm-core.h |  6 +-----
 drivers/md/dm.c      | 34 +++++-----------------------------
 2 files changed, 6 insertions(+), 34 deletions(-)

diff --git a/drivers/md/dm-core.h b/drivers/md/dm-core.h
index f3cfc7affd12..d21648a923ea 100644
--- a/drivers/md/dm-core.h
+++ b/drivers/md/dm-core.h
@@ -255,15 +255,12 @@ static inline bool dm_tio_is_normal(struct dm_target_io *tio)
 #define DM_IO_MAGIC 19577
 struct dm_io {
 	unsigned short magic;
-
+	blk_short_t flags;
 	spinlock_t lock;
 	unsigned long start_time;
 	void *data;
 	struct dm_io *next;
-	struct task_struct *map_task;
 	struct dm_stats_aux stats_aux;
-
-	blk_short_t flags;
 	blk_status_t status;
 	atomic_t io_count;
 	struct mapped_device *md;
@@ -281,7 +278,6 @@ struct dm_io {
  * dm_io flags
  */
 enum {
-	DM_IO_START_ACCT,
 	DM_IO_ACCOUNTED,
 	DM_IO_WAS_SPLIT
 };
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index c5a79712de1d..3b87d151ef88 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -596,7 +596,6 @@ static struct dm_io *alloc_io(struct mapped_device *md, struct bio *bio)
 	this_cpu_inc(*md->pending_io);
 	io->orig_bio = bio;
 	io->md = md;
-	io->map_task = current;
 	spin_lock_init(&io->lock);
 	io->start_time = jiffies;
 	io->flags = 0;
@@ -1242,13 +1241,6 @@ void dm_accept_partial_bio(struct bio *bio, unsigned n_sectors)
 }
 EXPORT_SYMBOL_GPL(dm_accept_partial_bio);
 
-static inline void __dm_submit_bio_remap(struct bio *clone,
-					 dev_t dev, sector_t old_sector)
-{
-	trace_block_bio_remap(clone, dev, old_sector);
-	submit_bio_noacct(clone);
-}
-
 /*
  * @clone: clone bio that DM core passed to target's .map function
  * @tgt_clone: clone of @clone bio that target needs submitted
@@ -1263,8 +1255,6 @@ void dm_submit_bio_remap(struct bio *clone, struct bio *tgt_clone)
 	struct dm_target_io *tio = clone_to_tio(clone);
 	struct dm_io *io = tio->io;
 
-	WARN_ON_ONCE(!tio->ti->accounts_remapped_io);
-
 	/* establish bio that will get submitted */
 	if (!tgt_clone)
 		tgt_clone = clone;
@@ -1273,15 +1263,11 @@ void dm_submit_bio_remap(struct bio *clone, struct bio *tgt_clone)
 	 * Account io->origin_bio to DM dev on behalf of target
 	 * that took ownership of IO with DM_MAPIO_SUBMITTED.
 	 */
-	if (io->map_task == current) {
-		/* Still in target's map function */
-		dm_io_set_flag(io, DM_IO_START_ACCT);
-	} else {
-		dm_start_io_acct(io, clone);
-	}
+	dm_start_io_acct(io, clone);
 
-	__dm_submit_bio_remap(tgt_clone, disk_devt(io->md->disk),
+	trace_block_bio_remap(tgt_clone, disk_devt(io->md->disk),
 			      tio->old_sector);
+	submit_bio_noacct(tgt_clone);
 }
 EXPORT_SYMBOL_GPL(dm_submit_bio_remap);
 
@@ -1341,16 +1327,10 @@ static void __map_bio(struct bio *clone)
 	case DM_MAPIO_SUBMITTED:
 		/* target has assumed ownership of this io */
 		if (!ti->accounts_remapped_io)
-			dm_io_set_flag(io, DM_IO_START_ACCT);
+			dm_start_io_acct(io, clone);
 		break;
 	case DM_MAPIO_REMAPPED:
-		/*
-		 * the bio has been remapped so dispatch it, but defer
-		 * dm_start_io_acct() until after possible bio_split().
-		 */
-		__dm_submit_bio_remap(clone, disk_devt(md->disk),
-				      tio->old_sector);
-		dm_io_set_flag(io, DM_IO_START_ACCT);
+		dm_submit_bio_remap(clone, NULL);
 		break;
 	case DM_MAPIO_KILL:
 	case DM_MAPIO_REQUEUE:
@@ -1668,7 +1648,6 @@ static void dm_split_and_process_bio(struct mapped_device *md,
 	}
 
 	error = __split_and_process_bio(&ci);
-	io->map_task = NULL;
 	if (error || !ci.sector_count)
 		goto out;
 	/*
@@ -1680,9 +1659,6 @@ static void dm_split_and_process_bio(struct mapped_device *md,
 	bio_inc_remaining(bio);
 	submit_bio_noacct(bio);
 out:
-	if (dm_io_flagged(io, DM_IO_START_ACCT))
-		dm_start_io_acct(io, NULL);
-
 	/*
 	 * Drop the extra reference count for non-POLLED bio, and hold one
 	 * reference for POLLED bio, which will be released in dm_poll_bio
-- 
2.15.0

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [dm-devel] [dm-5.19 PATCH 21/21] dm: improve abnormal bio processing
  2022-04-18  2:27 [dm-devel] [dm-5.19 PATCH 00/21] dm: changes staged for 5.19 Mike Snitzer
                   ` (19 preceding siblings ...)
  2022-04-18  2:27 ` [dm-devel] [dm-5.19 PATCH 20/21] dm: simplify bio-based IO accounting further Mike Snitzer
@ 2022-04-18  2:27 ` Mike Snitzer
  2022-04-21  4:06   ` Shinichiro Kawasaki
  2022-04-18  3:00 ` [dm-devel] [dm-5.19 PATCH 00/21] dm: changes staged for 5.19 Damien Le Moal
  2022-04-18 12:51 ` [dm-devel] (subset) " Jens Axboe
  22 siblings, 1 reply; 31+ messages in thread
From: Mike Snitzer @ 2022-04-18  2:27 UTC (permalink / raw)
  To: dm-devel; +Cc: axboe, hch, ming.lei

Read/write/flush are the most common operations, optimize switch in
is_abnormal_io() for those cases. Follows same pattern established in
block perf-wip commit ("block: optimise blk_may_split for normal rw")

Also, push is_abnormal_io() check and blk_queue_split() down from
dm_submit_bio() to dm_split_and_process_bio() and set new
'is_abnormal_io' flag in clone_info. Optimize __split_and_process_bio
and __process_abnormal_io by leveraging ci.is_abnormal_io flag.

Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
 drivers/md/dm.c | 60 +++++++++++++++++++++++++++++----------------------------
 1 file changed, 31 insertions(+), 29 deletions(-)

diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 3b87d151ef88..b9c30dfe0f2a 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -84,7 +84,8 @@ struct clone_info {
 	struct dm_io *io;
 	sector_t sector;
 	unsigned sector_count;
-	bool submit_as_polled;
+	bool is_abnormal_io:1;
+	bool submit_as_polled:1;
 };
 
 #define DM_TARGET_IO_BIO_OFFSET (offsetof(struct dm_target_io, clone))
@@ -1492,21 +1493,24 @@ static void __send_changing_extent_only(struct clone_info *ci, struct dm_target
 
 static bool is_abnormal_io(struct bio *bio)
 {
-	bool r = false;
+	unsigned int op = bio_op(bio);
 
-	switch (bio_op(bio)) {
-	case REQ_OP_DISCARD:
-	case REQ_OP_SECURE_ERASE:
-	case REQ_OP_WRITE_ZEROES:
-		r = true;
-		break;
+	if (op != REQ_OP_READ && op != REQ_OP_WRITE && op != REQ_OP_FLUSH) {
+		switch (op) {
+		case REQ_OP_DISCARD:
+		case REQ_OP_SECURE_ERASE:
+		case REQ_OP_WRITE_ZEROES:
+			return true;
+		default:
+			break;
+		}
 	}
 
-	return r;
+	return false;
 }
 
-static bool __process_abnormal_io(struct clone_info *ci, struct dm_target *ti,
-				  blk_status_t *status)
+static blk_status_t __process_abnormal_io(struct clone_info *ci,
+					  struct dm_target *ti)
 {
 	unsigned num_bios = 0;
 
@@ -1520,8 +1524,6 @@ static bool __process_abnormal_io(struct clone_info *ci, struct dm_target *ti,
 	case REQ_OP_WRITE_ZEROES:
 		num_bios = ti->num_write_zeroes_bios;
 		break;
-	default:
-		return false;
 	}
 
 	/*
@@ -1531,12 +1533,10 @@ static bool __process_abnormal_io(struct clone_info *ci, struct dm_target *ti,
 	 * check was performed.
 	 */
 	if (unlikely(!num_bios))
-		*status = BLK_STS_NOTSUPP;
-	else {
-		__send_changing_extent_only(ci, ti, num_bios);
-		*status = BLK_STS_OK;
-	}
-	return true;
+		return BLK_STS_NOTSUPP;
+
+	__send_changing_extent_only(ci, ti, num_bios);
+	return BLK_STS_OK;
 }
 
 /*
@@ -1589,11 +1589,12 @@ static blk_status_t __split_and_process_bio(struct clone_info *ci)
 	struct bio *clone;
 	struct dm_target *ti;
 	unsigned len;
-	blk_status_t error = BLK_STS_IOERR;
 
 	ti = dm_table_find_target(ci->map, ci->sector);
-	if (unlikely(!ti || __process_abnormal_io(ci, ti, &error)))
-		return error;
+	if (unlikely(!ti))
+		return BLK_STS_IOERR;
+	else if (unlikely(ci->is_abnormal_io))
+		return __process_abnormal_io(ci, ti);
 
 	/*
 	 * Only support bio polling for normal IO, and the target io is
@@ -1618,6 +1619,7 @@ static void init_clone_info(struct clone_info *ci, struct mapped_device *md,
 	ci->map = map;
 	ci->io = alloc_io(md, bio);
 	ci->bio = bio;
+	ci->is_abnormal_io = false;
 	ci->submit_as_polled = false;
 	ci->sector = bio->bi_iter.bi_sector;
 	ci->sector_count = bio_sectors(bio);
@@ -1645,6 +1647,13 @@ static void dm_split_and_process_bio(struct mapped_device *md,
 		__send_empty_flush(&ci);
 		/* dm_io_complete submits any data associated with flush */
 		goto out;
+	} else if (unlikely(is_abnormal_io(bio))) {
+		/*
+		 * Use blk_queue_split() for abnormal IO (e.g. discard, etc)
+		 * otherwise associated queue_limits won't be imposed.
+		 */
+		blk_queue_split(&bio);
+		ci.is_abnormal_io = true;
 	}
 
 	error = __split_and_process_bio(&ci);
@@ -1698,13 +1707,6 @@ static void dm_submit_bio(struct bio *bio)
 		goto out;
 	}
 
-	/*
-	 * Use blk_queue_split() for abnormal IO (e.g. discard, writesame, etc)
-	 * otherwise associated queue_limits won't be imposed.
-	 */
-	if (unlikely(is_abnormal_io(bio)))
-		blk_queue_split(&bio);
-
 	dm_split_and_process_bio(md, map, bio);
 out:
 	dm_put_live_table_bio(md, srcu_idx, bio);
-- 
2.15.0

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [dm-devel] [dm-5.19 PATCH 00/21] dm: changes staged for 5.19
  2022-04-18  2:27 [dm-devel] [dm-5.19 PATCH 00/21] dm: changes staged for 5.19 Mike Snitzer
                   ` (20 preceding siblings ...)
  2022-04-18  2:27 ` [dm-devel] [dm-5.19 PATCH 21/21] dm: improve abnormal bio processing Mike Snitzer
@ 2022-04-18  3:00 ` Damien Le Moal
  2022-04-18  3:16   ` Mike Snitzer
  2022-04-18 12:51 ` [dm-devel] (subset) " Jens Axboe
  22 siblings, 1 reply; 31+ messages in thread
From: Damien Le Moal @ 2022-04-18  3:00 UTC (permalink / raw)
  To: Mike Snitzer, dm-devel; +Cc: axboe, hch, ming.lei

On 4/18/22 11:27, Mike Snitzer wrote:
> Hi,
> 
> This patchset reflects what I've staged in linux-dm.git's "dm-5.19"
> branch (also staged in "for-next" for linux-next):
> https://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git/log/?h=dm-5.19
> 
> It build's on jens/for-5.19/block branch (which is based on v5.18-rc3)

Mike, thanks for posting. We will run this on zoned stuff to check.
Note that patches 13 to 20 are empty...

-- 
Damien Le Moal
Western Digital Research

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [dm-devel] [dm-5.19 PATCH 00/21] dm: changes staged for 5.19
  2022-04-18  3:00 ` [dm-devel] [dm-5.19 PATCH 00/21] dm: changes staged for 5.19 Damien Le Moal
@ 2022-04-18  3:16   ` Mike Snitzer
  2022-04-18 12:49     ` Jens Axboe
  0 siblings, 1 reply; 31+ messages in thread
From: Mike Snitzer @ 2022-04-18  3:16 UTC (permalink / raw)
  To: Damien Le Moal; +Cc: axboe, dm-devel, hch, ming.lei

On Sun, Apr 17 2022 at 11:00P -0400,
Damien Le Moal <damien.lemoal@opensource.wdc.com> wrote:

> On 4/18/22 11:27, Mike Snitzer wrote:
> > Hi,
> > 
> > This patchset reflects what I've staged in linux-dm.git's "dm-5.19"
> > branch (also staged in "for-next" for linux-next):
> > https://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git/log/?h=dm-5.19
> > 
> > It build's on jens/for-5.19/block branch (which is based on v5.18-rc3)
> 
> Mike, thanks for posting. We will run this on zoned stuff to check.

OK, I appreciate it..

> Note that patches 13 to 20 are empty...

Not sure what's going on there... basically any patch that wasn't from
me (so 1, 13-19) isn't showing up in patchwork or the dm-devel
archive.

This is my first submission of patches using my snitzer@kernel.org
alias with redhat's smtp server that is really anal about not sending
email that isn't from me.  Yet it sent the email but seized on the
following to have the body be empty:
"From: Ming Lei <ming.lei@redhat.com>"

Best part is my email account shows all the emails sent like all was
perfect.  Sorting this out will have to wait until Tuesday.

Sorry, but please just pull the dm-5.19 branch via git ;)

Thanks,
Mike

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [dm-devel] [dm-5.19 PATCH 00/21] dm: changes staged for 5.19
  2022-04-18  3:16   ` Mike Snitzer
@ 2022-04-18 12:49     ` Jens Axboe
  0 siblings, 0 replies; 31+ messages in thread
From: Jens Axboe @ 2022-04-18 12:49 UTC (permalink / raw)
  To: Mike Snitzer, Damien Le Moal; +Cc: dm-devel, hch, ming.lei

On 4/17/22 9:16 PM, Mike Snitzer wrote:
> On Sun, Apr 17 2022 at 11:00P -0400,
> Damien Le Moal <damien.lemoal@opensource.wdc.com> wrote:
> 
>> On 4/18/22 11:27, Mike Snitzer wrote:
>>> Hi,
>>>
>>> This patchset reflects what I've staged in linux-dm.git's "dm-5.19"
>>> branch (also staged in "for-next" for linux-next):
>>> https://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git/log/?h=dm-5.19
>>>
>>> It build's on jens/for-5.19/block branch (which is based on v5.18-rc3)
>>
>> Mike, thanks for posting. We will run this on zoned stuff to check.
> 
> OK, I appreciate it..
> 
>> Note that patches 13 to 20 are empty...
> 
> Not sure what's going on there... basically any patch that wasn't from
> me (so 1, 13-19) isn't showing up in patchwork or the dm-devel
> archive.

The patches I received look normal, fwiw.

-- 
Jens Axboe

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [dm-devel] (subset) [dm-5.19 PATCH 00/21] dm: changes staged for 5.19
  2022-04-18  2:27 [dm-devel] [dm-5.19 PATCH 00/21] dm: changes staged for 5.19 Mike Snitzer
                   ` (21 preceding siblings ...)
  2022-04-18  3:00 ` [dm-devel] [dm-5.19 PATCH 00/21] dm: changes staged for 5.19 Damien Le Moal
@ 2022-04-18 12:51 ` Jens Axboe
  22 siblings, 0 replies; 31+ messages in thread
From: Jens Axboe @ 2022-04-18 12:51 UTC (permalink / raw)
  To: snitzer, dm-devel; +Cc: Christoph Hellwig, ming.lei

On Sun, 17 Apr 2022 22:27:12 -0400, Mike Snitzer wrote:
> This patchset reflects what I've staged in linux-dm.git's "dm-5.19"
> branch (also staged in "for-next" for linux-next):
> https://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git/log/?h=dm-5.19
> 
> It build's on jens/for-5.19/block branch (which is based on v5.18-rc3)
> 
> I can still make changes or add Reviewed-by:s, etc. So please feel
> free to review.
> 
> [...]

Applied, thanks!

[01/21] block: change exported IO accounting interface from gendisk to bdev
        commit: 5f0614a55ecebdf55f1a17db0b5f6b787ed009f1

Best regards,
-- 
Jens Axboe


--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [dm-devel] [dm-5.19 PATCH 21/21] dm: improve abnormal bio processing
  2022-04-18  2:27 ` [dm-devel] [dm-5.19 PATCH 21/21] dm: improve abnormal bio processing Mike Snitzer
@ 2022-04-21  4:06   ` Shinichiro Kawasaki
  2022-04-22 13:13     ` Mike Snitzer
  0 siblings, 1 reply; 31+ messages in thread
From: Shinichiro Kawasaki @ 2022-04-21  4:06 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: axboe, Damien Le Moal, dm-devel, hch, ming.lei

On Apr 17, 2022 / 22:27, Mike Snitzer wrote:
> Read/write/flush are the most common operations, optimize switch in
> is_abnormal_io() for those cases. Follows same pattern established in
> block perf-wip commit ("block: optimise blk_may_split for normal rw")
> 
> Also, push is_abnormal_io() check and blk_queue_split() down from
> dm_submit_bio() to dm_split_and_process_bio() and set new
> 'is_abnormal_io' flag in clone_info. Optimize __split_and_process_bio
> and __process_abnormal_io by leveraging ci.is_abnormal_io flag.
> 
> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
> ---
>  drivers/md/dm.c | 60 +++++++++++++++++++++++++++++----------------------------
>  1 file changed, 31 insertions(+), 29 deletions(-)
> 
> diff --git a/drivers/md/dm.c b/drivers/md/dm.c
> index 3b87d151ef88..b9c30dfe0f2a 100644
> --- a/drivers/md/dm.c
> +++ b/drivers/md/dm.c
> @@ -84,7 +84,8 @@ struct clone_info {
>  	struct dm_io *io;
>  	sector_t sector;

(snip)

>  	ci->sector = bio->bi_iter.bi_sector;
>  	ci->sector_count = bio_sectors(bio);
> @@ -1645,6 +1647,13 @@ static void dm_split_and_process_bio(struct mapped_device *md,
>  		__send_empty_flush(&ci);
>  		/* dm_io_complete submits any data associated with flush */
>  		goto out;
> +	} else if (unlikely(is_abnormal_io(bio))) {
> +		/*
> +		 * Use blk_queue_split() for abnormal IO (e.g. discard, etc)
> +		 * otherwise associated queue_limits won't be imposed.
> +		 */
> +		blk_queue_split(&bio);
> +		ci.is_abnormal_io = true;
>  	}
>  
>  	error = __split_and_process_bio(&ci);

Hi Mike,

The hunk above triggered a WARNING [1] which is observed at mkfs.xfs for dm-
zoned devices. With the hunk, the blk_queue_split(&bio) for abnormal I/O is
called after init_clone_info(). I think it made the cloned bio different from
the split bio. I suggest to move the "if (unlikely(is_abnormal_io(bio)))" block
at the beginning of dm_split_and_process_bio(), so that blk_queue_split() is
called before init_clone_info(). I created a patch for such change [2], and
confirmed the WARNING goes away.

> @@ -1698,13 +1707,6 @@ static void dm_submit_bio(struct bio *bio)
>  		goto out;
>  	}
>  
> -	/*
> -	 * Use blk_queue_split() for abnormal IO (e.g. discard, writesame, etc)
> -	 * otherwise associated queue_limits won't be imposed.
> -	 */
> -	if (unlikely(is_abnormal_io(bio)))
> -		blk_queue_split(&bio);
> -
>  	dm_split_and_process_bio(md, map, bio);
>  out:
>  	dm_put_live_table_bio(md, srcu_idx, bio);
> -- 
> 2.15.0
> 
> --
> dm-devel mailing list
> dm-devel@redhat.com
> https://listman.redhat.com/mailman/listinfo/dm-devel
> 

[1]

WARNING: CPU: 3 PID: 79434 at block/bio.c:1629 bio_trim+0x10a/0x150
Modules linked in: dm_zoned null_blk f2fs crc32_generic dm_flakey iscsi_target_mod tcm_loop target_core_pscsi target_core_file target_core_iblock xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_nat_tftp nf_conntrack_tftp bridge stp llc nft_objref nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_tables ebtable_nat ebtable_broute ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle iptable_raw rfkill iptable_security ip_set target_core_user nfnetlink target_core_mod ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter qrtr sunrpc intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel at24 iTCO_wdt intel_pmc_bxt iTCO_vendor_support btrfs kvm irqbypass rapl raid6_pq intel_cstate zstd_compress joydev
 intel_uncore i2c_i801 pcspkr i2c_smbus intel_pch_thermal lpc_ich ses enclosure ie31200_edac video zram ip_tables crct10dif_pclmul crc32_pclmul crc32c_intel ast ghash_clmulni_intel drm_vram_helper drm_kms_helper drm_ttm_helper ttm igb drm mpt3sas e1000e dca i2c_algo_bit raid_class scsi_transport_sas fuse [last unloaded: zonefs]
CPU: 3 PID: 79434 Comm: mkfs.xfs Kdump: loaded Not tainted 5.18.0-rc3+ #2
Hardware name: Supermicro X10SLL-F/X10SLL-F, BIOS 3.0 04/24/2015
RIP: 0010:bio_trim+0x10a/0x150
Code: 8d 7c 24 68 48 89 fa 48 c1 ea 03 80 3c 02 00 75 58 49 83 7c 24 68 00 74 13 48 83 c4 10 4c 89 e7 5d 41 5c 41 5d e9 f6 aa 0f 00 <0f> 0b 48 83 c4 10 5d 41 5c 41 5d c3 4c 89 ef 48 89 54 24 08 48 89
RSP: 0018:ffff8881977d7928 EFLAGS: 00010206
RAX: 0000000008000000 RBX: ffff88852b000000 RCX: 0000000000040000
RDX: 00000000003c0000 RSI: 0000000000040000 RDI: ffff8883215a3d00
RBP: 0000000000400000 R08: 0000000000000001 R09: ffff8881124b9057
R10: ffffed102249720a R11: 0000000000000001 R12: ffff8883215a3d00
R13: ffff8883215a3d28 R14: ffff8881124b9098 R15: ffff8881124b9054
FS:  00007ff3a993dbc0(0000) GS:ffff8886ecf80000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000055a1bf241900 CR3: 0000000177074002 CR4: 00000000001706e0
Call Trace:
 <TASK>
 dm_submit_bio+0x5fa/0x13b0
 ? dm_dax_direct_access+0x1c0/0x1c0
 ? lock_release+0x3a7/0x6c0
 ? lock_downgrade+0x6a0/0x6a0
 __submit_bio+0x1c2/0x2c0
 ? __bio_queue_enter+0x570/0x570
 submit_bio_noacct_nocheck+0x2f7/0x810
 ? should_fail_request+0x70/0x70
 ? rcu_read_lock_sched_held+0x3f/0x70
 ? __lock_acquire+0x1591/0x5030
 submit_bio+0x10a/0x270
 ? submit_bio_noacct+0x15c0/0x15c0
 submit_bio_wait+0xf2/0x1d0
 ? submit_bio_wait_endio+0x40/0x40
 ? lock_release+0x6c0/0x6c0
 blkdev_issue_discard+0xb5/0x110
 ? __blkdev_issue_discard+0x590/0x590
 ? truncate_bdev_range+0x15d/0x240
 blkdev_common_ioctl+0x853/0x1670
 ? blkdev_bszset+0x160/0x160
 ? __ia32_sys_readlinkat+0x87/0xf0
 ? __ia32_compat_sys_newlstat+0x70/0x70
 ? count_memcg_events.constprop.0+0x44/0x50
 blkdev_ioctl+0x23b/0x690
 ? blkdev_common_ioctl+0x1670/0x1670
 ? security_file_ioctl+0x50/0x90
 __x64_sys_ioctl+0x127/0x190
 do_syscall_64+0x3b/0x90
 entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7ff3a970731b
Code: ff ff ff 85 c0 79 9b 49 c7 c4 ff ff ff ff 5b 5d 4c 89 e0 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d dd 2a 0f 00 f7 d8 64 89 01 48
RSP: 002b:00007ffd79c8db68 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 00000000e8000000 RCX: 00007ff3a970731b
RDX: 00007ffd79c8db80 RSI: 0000000000001277 RDI: 0000000000000004
RBP: 0000000000000004 R08: 0000000000000000 R09: 00007ffd79c8d997
R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffd79c8db80
R13: 0000000000000001 R14: 0000000000000000 R15: 0000000080000000
 </TASK>
irq event stamp: 10213
hardirqs last  enabled at (10223): [<ffffffffaa36e02e>] __up_console_sem+0x5e/0x70
hardirqs last disabled at (10232): [<ffffffffaa36e013>] __up_console_sem+0x43/0x70
softirqs last  enabled at (10190): [<ffffffffaa1faf06>] __irq_exit_rcu+0x1c6/0x260
softirqs last disabled at (10185): [<ffffffffaa1faf06>] __irq_exit_rcu+0x1c6/0x260
---[ end trace 0000000000000000 ]---

[2]

diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 7e3b5bdcf520..50382c98d7b3 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1639,6 +1639,15 @@ static void dm_split_and_process_bio(struct mapped_device *md,
 	struct dm_io *io;
 	blk_status_t error = BLK_STS_OK;
 
+	if (unlikely(is_abnormal_io(bio))) {
+		/*
+		 * Use blk_queue_split() for abnormal IO (e.g. discard, etc)
+		 * otherwise associated queue_limits won't be imposed.
+		 */
+		blk_queue_split(&bio);
+		ci.is_abnormal_io = true;
+	}
+
 	init_clone_info(&ci, md, map, bio);
 	io = ci.io;
 
@@ -1646,13 +1655,6 @@ static void dm_split_and_process_bio(struct mapped_device *md,
 		__send_empty_flush(&ci);
 		/* dm_io_complete submits any data associated with flush */
 		goto out;
-	} else if (unlikely(is_abnormal_io(bio))) {
-		/*
-		 * Use blk_queue_split() for abnormal IO (e.g. discard, etc)
-		 * otherwise associated queue_limits won't be imposed.
-		 */
-		blk_queue_split(&bio);
-		ci.is_abnormal_io = true;
 	}
 
 	error = __split_and_process_bio(&ci);

-- 
Best Regards,
Shin'ichiro Kawasaki

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [dm-devel] [dm-5.19 PATCH 21/21] dm: improve abnormal bio processing
  2022-04-21  4:06   ` Shinichiro Kawasaki
@ 2022-04-22 13:13     ` Mike Snitzer
  2022-04-22 13:26       ` Mike Snitzer
  0 siblings, 1 reply; 31+ messages in thread
From: Mike Snitzer @ 2022-04-22 13:13 UTC (permalink / raw)
  To: Shinichiro Kawasaki
  Cc: axboe, Damien Le Moal, Mike Snitzer, ming.lei, dm-devel, hch

On Thu, Apr 21 2022 at 12:06P -0400,
Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com> wrote:

> On Apr 17, 2022 / 22:27, Mike Snitzer wrote:
> > Read/write/flush are the most common operations, optimize switch in
> > is_abnormal_io() for those cases. Follows same pattern established in
> > block perf-wip commit ("block: optimise blk_may_split for normal rw")
> > 
> > Also, push is_abnormal_io() check and blk_queue_split() down from
> > dm_submit_bio() to dm_split_and_process_bio() and set new
> > 'is_abnormal_io' flag in clone_info. Optimize __split_and_process_bio
> > and __process_abnormal_io by leveraging ci.is_abnormal_io flag.
> > 
> > Signed-off-by: Mike Snitzer <snitzer@kernel.org>
> > ---
> >  drivers/md/dm.c | 60 +++++++++++++++++++++++++++++----------------------------
> >  1 file changed, 31 insertions(+), 29 deletions(-)
> > 
> > diff --git a/drivers/md/dm.c b/drivers/md/dm.c
> > index 3b87d151ef88..b9c30dfe0f2a 100644
> > --- a/drivers/md/dm.c
> > +++ b/drivers/md/dm.c
> > @@ -84,7 +84,8 @@ struct clone_info {
> >  	struct dm_io *io;
> >  	sector_t sector;
> 
> (snip)
> 
> >  	ci->sector = bio->bi_iter.bi_sector;
> >  	ci->sector_count = bio_sectors(bio);
> > @@ -1645,6 +1647,13 @@ static void dm_split_and_process_bio(struct mapped_device *md,
> >  		__send_empty_flush(&ci);
> >  		/* dm_io_complete submits any data associated with flush */
> >  		goto out;
> > +	} else if (unlikely(is_abnormal_io(bio))) {
> > +		/*
> > +		 * Use blk_queue_split() for abnormal IO (e.g. discard, etc)
> > +		 * otherwise associated queue_limits won't be imposed.
> > +		 */
> > +		blk_queue_split(&bio);
> > +		ci.is_abnormal_io = true;
> >  	}
> >  
> >  	error = __split_and_process_bio(&ci);
> 
> Hi Mike,
> 
> The hunk above triggered a WARNING [1] which is observed at mkfs.xfs for dm-
> zoned devices. With the hunk, the blk_queue_split(&bio) for abnormal I/O is
> called after init_clone_info(). I think it made the cloned bio different from
> the split bio. I suggest to move the "if (unlikely(is_abnormal_io(bio)))" block
> at the beginning of dm_split_and_process_bio(), so that blk_queue_split() is
> called before init_clone_info(). I created a patch for such change [2], and
> confirmed the WARNING goes away.
> 
> > @@ -1698,13 +1707,6 @@ static void dm_submit_bio(struct bio *bio)
> >  		goto out;
> >  	}
> >  
> > -	/*
> > -	 * Use blk_queue_split() for abnormal IO (e.g. discard, writesame, etc)
> > -	 * otherwise associated queue_limits won't be imposed.
> > -	 */
> > -	if (unlikely(is_abnormal_io(bio)))
> > -		blk_queue_split(&bio);
> > -
> >  	dm_split_and_process_bio(md, map, bio);
> >  out:
> >  	dm_put_live_table_bio(md, srcu_idx, bio);
> > -- 
> > 2.15.0
> > 
> > --
> > dm-devel mailing list
> > dm-devel@redhat.com
> > https://listman.redhat.com/mailman/listinfo/dm-devel
> > 
> 
> [1]
> 
> WARNING: CPU: 3 PID: 79434 at block/bio.c:1629 bio_trim+0x10a/0x150
> Modules linked in: dm_zoned null_blk f2fs crc32_generic dm_flakey iscsi_target_mod tcm_loop target_core_pscsi target_core_file target_core_iblock xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_nat_tftp nf_conntrack_tftp bridge stp llc nft_objref nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_tables ebtable_nat ebtable_broute ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle iptable_raw rfkill iptable_security ip_set target_core_user nfnetlink target_core_mod ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter qrtr sunrpc intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel at24 iTCO_wdt intel_pmc_bxt iTCO_vendor_support btrfs kvm irqbypass rapl raid6_pq intel_cstate zstd_compress joydev
>  intel_uncore i2c_i801 pcspkr i2c_smbus intel_pch_thermal lpc_ich ses enclosure ie31200_edac video zram ip_tables crct10dif_pclmul crc32_pclmul crc32c_intel ast ghash_clmulni_intel drm_vram_helper drm_kms_helper drm_ttm_helper ttm igb drm mpt3sas e1000e dca i2c_algo_bit raid_class scsi_transport_sas fuse [last unloaded: zonefs]
> CPU: 3 PID: 79434 Comm: mkfs.xfs Kdump: loaded Not tainted 5.18.0-rc3+ #2
> Hardware name: Supermicro X10SLL-F/X10SLL-F, BIOS 3.0 04/24/2015
> RIP: 0010:bio_trim+0x10a/0x150
> Code: 8d 7c 24 68 48 89 fa 48 c1 ea 03 80 3c 02 00 75 58 49 83 7c 24 68 00 74 13 48 83 c4 10 4c 89 e7 5d 41 5c 41 5d e9 f6 aa 0f 00 <0f> 0b 48 83 c4 10 5d 41 5c 41 5d c3 4c 89 ef 48 89 54 24 08 48 89
> RSP: 0018:ffff8881977d7928 EFLAGS: 00010206
> RAX: 0000000008000000 RBX: ffff88852b000000 RCX: 0000000000040000
> RDX: 00000000003c0000 RSI: 0000000000040000 RDI: ffff8883215a3d00
> RBP: 0000000000400000 R08: 0000000000000001 R09: ffff8881124b9057
> R10: ffffed102249720a R11: 0000000000000001 R12: ffff8883215a3d00
> R13: ffff8883215a3d28 R14: ffff8881124b9098 R15: ffff8881124b9054
> FS:  00007ff3a993dbc0(0000) GS:ffff8886ecf80000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 000055a1bf241900 CR3: 0000000177074002 CR4: 00000000001706e0
> Call Trace:
>  <TASK>
>  dm_submit_bio+0x5fa/0x13b0
>  ? dm_dax_direct_access+0x1c0/0x1c0
>  ? lock_release+0x3a7/0x6c0
>  ? lock_downgrade+0x6a0/0x6a0
>  __submit_bio+0x1c2/0x2c0
>  ? __bio_queue_enter+0x570/0x570
>  submit_bio_noacct_nocheck+0x2f7/0x810
>  ? should_fail_request+0x70/0x70
>  ? rcu_read_lock_sched_held+0x3f/0x70
>  ? __lock_acquire+0x1591/0x5030
>  submit_bio+0x10a/0x270
>  ? submit_bio_noacct+0x15c0/0x15c0
>  submit_bio_wait+0xf2/0x1d0
>  ? submit_bio_wait_endio+0x40/0x40
>  ? lock_release+0x6c0/0x6c0
>  blkdev_issue_discard+0xb5/0x110
>  ? __blkdev_issue_discard+0x590/0x590
>  ? truncate_bdev_range+0x15d/0x240
>  blkdev_common_ioctl+0x853/0x1670
>  ? blkdev_bszset+0x160/0x160
>  ? __ia32_sys_readlinkat+0x87/0xf0
>  ? __ia32_compat_sys_newlstat+0x70/0x70
>  ? count_memcg_events.constprop.0+0x44/0x50
>  blkdev_ioctl+0x23b/0x690
>  ? blkdev_common_ioctl+0x1670/0x1670
>  ? security_file_ioctl+0x50/0x90
>  __x64_sys_ioctl+0x127/0x190
>  do_syscall_64+0x3b/0x90
>  entry_SYSCALL_64_after_hwframe+0x44/0xae
> RIP: 0033:0x7ff3a970731b
> Code: ff ff ff 85 c0 79 9b 49 c7 c4 ff ff ff ff 5b 5d 4c 89 e0 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d dd 2a 0f 00 f7 d8 64 89 01 48
> RSP: 002b:00007ffd79c8db68 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
> RAX: ffffffffffffffda RBX: 00000000e8000000 RCX: 00007ff3a970731b
> RDX: 00007ffd79c8db80 RSI: 0000000000001277 RDI: 0000000000000004
> RBP: 0000000000000004 R08: 0000000000000000 R09: 00007ffd79c8d997
> R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffd79c8db80
> R13: 0000000000000001 R14: 0000000000000000 R15: 0000000080000000
>  </TASK>
> irq event stamp: 10213
> hardirqs last  enabled at (10223): [<ffffffffaa36e02e>] __up_console_sem+0x5e/0x70
> hardirqs last disabled at (10232): [<ffffffffaa36e013>] __up_console_sem+0x43/0x70
> softirqs last  enabled at (10190): [<ffffffffaa1faf06>] __irq_exit_rcu+0x1c6/0x260
> softirqs last disabled at (10185): [<ffffffffaa1faf06>] __irq_exit_rcu+0x1c6/0x260
> ---[ end trace 0000000000000000 ]---
> 
> [2]
> 
> diff --git a/drivers/md/dm.c b/drivers/md/dm.c
> index 7e3b5bdcf520..50382c98d7b3 100644
> --- a/drivers/md/dm.c
> +++ b/drivers/md/dm.c
> @@ -1639,6 +1639,15 @@ static void dm_split_and_process_bio(struct mapped_device *md,
>  	struct dm_io *io;
>  	blk_status_t error = BLK_STS_OK;
>  
> +	if (unlikely(is_abnormal_io(bio))) {
> +		/*
> +		 * Use blk_queue_split() for abnormal IO (e.g. discard, etc)
> +		 * otherwise associated queue_limits won't be imposed.
> +		 */
> +		blk_queue_split(&bio);
> +		ci.is_abnormal_io = true;
> +	}
> +
>  	init_clone_info(&ci, md, map, bio);
>  	io = ci.io;
>  
> @@ -1646,13 +1655,6 @@ static void dm_split_and_process_bio(struct mapped_device *md,
>  		__send_empty_flush(&ci);
>  		/* dm_io_complete submits any data associated with flush */
>  		goto out;
> -	} else if (unlikely(is_abnormal_io(bio))) {
> -		/*
> -		 * Use blk_queue_split() for abnormal IO (e.g. discard, etc)
> -		 * otherwise associated queue_limits won't be imposed.
> -		 */
> -		blk_queue_split(&bio);
> -		ci.is_abnormal_io = true;
>  	}
>  
>  	error = __split_and_process_bio(&ci);
> 
> -- 
> Best Regards,
> Shin'ichiro Kawasaki
> 

Thanks for your report, I'll sort it out but your patch doesn't seem right.

init_clone_info() will reset ci.is_abnormal_io so the bio won't be
processed properly.

Mike

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [dm-devel] [dm-5.19 PATCH 21/21] dm: improve abnormal bio processing
  2022-04-22 13:13     ` Mike Snitzer
@ 2022-04-22 13:26       ` Mike Snitzer
  2022-04-25 23:57         ` Shinichiro Kawasaki
  0 siblings, 1 reply; 31+ messages in thread
From: Mike Snitzer @ 2022-04-22 13:26 UTC (permalink / raw)
  To: Shinichiro Kawasaki
  Cc: axboe, Damien Le Moal, Mike Snitzer, ming.lei, dm-devel, hch

On Fri, Apr 22 2022 at  9:13P -0400,
Mike Snitzer <snitzer@redhat.com> wrote:

> On Thu, Apr 21 2022 at 12:06P -0400,
> Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com> wrote:
> 
> > On Apr 17, 2022 / 22:27, Mike Snitzer wrote:
> > > Read/write/flush are the most common operations, optimize switch in
> > > is_abnormal_io() for those cases. Follows same pattern established in
> > > block perf-wip commit ("block: optimise blk_may_split for normal rw")
> > > 
> > > Also, push is_abnormal_io() check and blk_queue_split() down from
> > > dm_submit_bio() to dm_split_and_process_bio() and set new
> > > 'is_abnormal_io' flag in clone_info. Optimize __split_and_process_bio
> > > and __process_abnormal_io by leveraging ci.is_abnormal_io flag.
> > > 
> > > Signed-off-by: Mike Snitzer <snitzer@kernel.org>
> > > ---
> > >  drivers/md/dm.c | 60 +++++++++++++++++++++++++++++----------------------------
> > >  1 file changed, 31 insertions(+), 29 deletions(-)
> > > 
> > > diff --git a/drivers/md/dm.c b/drivers/md/dm.c
> > > index 3b87d151ef88..b9c30dfe0f2a 100644
> > > --- a/drivers/md/dm.c
> > > +++ b/drivers/md/dm.c
> > > @@ -84,7 +84,8 @@ struct clone_info {
> > >  	struct dm_io *io;
> > >  	sector_t sector;
> > 
> > (snip)
> > 
> > >  	ci->sector = bio->bi_iter.bi_sector;
> > >  	ci->sector_count = bio_sectors(bio);
> > > @@ -1645,6 +1647,13 @@ static void dm_split_and_process_bio(struct mapped_device *md,
> > >  		__send_empty_flush(&ci);
> > >  		/* dm_io_complete submits any data associated with flush */
> > >  		goto out;
> > > +	} else if (unlikely(is_abnormal_io(bio))) {
> > > +		/*
> > > +		 * Use blk_queue_split() for abnormal IO (e.g. discard, etc)
> > > +		 * otherwise associated queue_limits won't be imposed.
> > > +		 */
> > > +		blk_queue_split(&bio);
> > > +		ci.is_abnormal_io = true;
> > >  	}
> > >  
> > >  	error = __split_and_process_bio(&ci);
> > 
> > Hi Mike,
> > 
> > The hunk above triggered a WARNING [1] which is observed at mkfs.xfs for dm-
> > zoned devices. With the hunk, the blk_queue_split(&bio) for abnormal I/O is
> > called after init_clone_info(). I think it made the cloned bio different from
> > the split bio. I suggest to move the "if (unlikely(is_abnormal_io(bio)))" block
> > at the beginning of dm_split_and_process_bio(), so that blk_queue_split() is
> > called before init_clone_info(). I created a patch for such change [2], and
> > confirmed the WARNING goes away.
> > 
> > > @@ -1698,13 +1707,6 @@ static void dm_submit_bio(struct bio *bio)
> > >  		goto out;
> > >  	}
> > >  
> > > -	/*
> > > -	 * Use blk_queue_split() for abnormal IO (e.g. discard, writesame, etc)
> > > -	 * otherwise associated queue_limits won't be imposed.
> > > -	 */
> > > -	if (unlikely(is_abnormal_io(bio)))
> > > -		blk_queue_split(&bio);
> > > -
> > >  	dm_split_and_process_bio(md, map, bio);
> > >  out:
> > >  	dm_put_live_table_bio(md, srcu_idx, bio);
> > > -- 
> > > 2.15.0
> > > 
> > > --
> > > dm-devel mailing list
> > > dm-devel@redhat.com
> > > https://listman.redhat.com/mailman/listinfo/dm-devel
> > > 
> > 
> > [1]
> > 
> > WARNING: CPU: 3 PID: 79434 at block/bio.c:1629 bio_trim+0x10a/0x150
> > Modules linked in: dm_zoned null_blk f2fs crc32_generic dm_flakey iscsi_target_mod tcm_loop target_core_pscsi target_core_file target_core_iblock xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_nat_tftp nf_conntrack_tftp bridge stp llc nft_objref nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_tables ebtable_nat ebtable_broute ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle iptable_raw rfkill iptable_security ip_set target_core_user nfnetlink target_core_mod ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter qrtr sunrpc intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel at24 iTCO_wdt intel_pmc_bxt iTCO_vendor_support btrfs kvm irqbypass rapl raid6_pq intel_cstate zstd_compress joydev
> >  intel_uncore i2c_i801 pcspkr i2c_smbus intel_pch_thermal lpc_ich ses enclosure ie31200_edac video zram ip_tables crct10dif_pclmul crc32_pclmul crc32c_intel ast ghash_clmulni_intel drm_vram_helper drm_kms_helper drm_ttm_helper ttm igb drm mpt3sas e1000e dca i2c_algo_bit raid_class scsi_transport_sas fuse [last unloaded: zonefs]
> > CPU: 3 PID: 79434 Comm: mkfs.xfs Kdump: loaded Not tainted 5.18.0-rc3+ #2
> > Hardware name: Supermicro X10SLL-F/X10SLL-F, BIOS 3.0 04/24/2015
> > RIP: 0010:bio_trim+0x10a/0x150
> > Code: 8d 7c 24 68 48 89 fa 48 c1 ea 03 80 3c 02 00 75 58 49 83 7c 24 68 00 74 13 48 83 c4 10 4c 89 e7 5d 41 5c 41 5d e9 f6 aa 0f 00 <0f> 0b 48 83 c4 10 5d 41 5c 41 5d c3 4c 89 ef 48 89 54 24 08 48 89
> > RSP: 0018:ffff8881977d7928 EFLAGS: 00010206
> > RAX: 0000000008000000 RBX: ffff88852b000000 RCX: 0000000000040000
> > RDX: 00000000003c0000 RSI: 0000000000040000 RDI: ffff8883215a3d00
> > RBP: 0000000000400000 R08: 0000000000000001 R09: ffff8881124b9057
> > R10: ffffed102249720a R11: 0000000000000001 R12: ffff8883215a3d00
> > R13: ffff8883215a3d28 R14: ffff8881124b9098 R15: ffff8881124b9054
> > FS:  00007ff3a993dbc0(0000) GS:ffff8886ecf80000(0000) knlGS:0000000000000000
> > CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > CR2: 000055a1bf241900 CR3: 0000000177074002 CR4: 00000000001706e0
> > Call Trace:
> >  <TASK>
> >  dm_submit_bio+0x5fa/0x13b0
> >  ? dm_dax_direct_access+0x1c0/0x1c0
> >  ? lock_release+0x3a7/0x6c0
> >  ? lock_downgrade+0x6a0/0x6a0
> >  __submit_bio+0x1c2/0x2c0
> >  ? __bio_queue_enter+0x570/0x570
> >  submit_bio_noacct_nocheck+0x2f7/0x810
> >  ? should_fail_request+0x70/0x70
> >  ? rcu_read_lock_sched_held+0x3f/0x70
> >  ? __lock_acquire+0x1591/0x5030
> >  submit_bio+0x10a/0x270
> >  ? submit_bio_noacct+0x15c0/0x15c0
> >  submit_bio_wait+0xf2/0x1d0
> >  ? submit_bio_wait_endio+0x40/0x40
> >  ? lock_release+0x6c0/0x6c0
> >  blkdev_issue_discard+0xb5/0x110
> >  ? __blkdev_issue_discard+0x590/0x590
> >  ? truncate_bdev_range+0x15d/0x240
> >  blkdev_common_ioctl+0x853/0x1670
> >  ? blkdev_bszset+0x160/0x160
> >  ? __ia32_sys_readlinkat+0x87/0xf0
> >  ? __ia32_compat_sys_newlstat+0x70/0x70
> >  ? count_memcg_events.constprop.0+0x44/0x50
> >  blkdev_ioctl+0x23b/0x690
> >  ? blkdev_common_ioctl+0x1670/0x1670
> >  ? security_file_ioctl+0x50/0x90
> >  __x64_sys_ioctl+0x127/0x190
> >  do_syscall_64+0x3b/0x90
> >  entry_SYSCALL_64_after_hwframe+0x44/0xae
> > RIP: 0033:0x7ff3a970731b
> > Code: ff ff ff 85 c0 79 9b 49 c7 c4 ff ff ff ff 5b 5d 4c 89 e0 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d dd 2a 0f 00 f7 d8 64 89 01 48
> > RSP: 002b:00007ffd79c8db68 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
> > RAX: ffffffffffffffda RBX: 00000000e8000000 RCX: 00007ff3a970731b
> > RDX: 00007ffd79c8db80 RSI: 0000000000001277 RDI: 0000000000000004
> > RBP: 0000000000000004 R08: 0000000000000000 R09: 00007ffd79c8d997
> > R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffd79c8db80
> > R13: 0000000000000001 R14: 0000000000000000 R15: 0000000080000000
> >  </TASK>
> > irq event stamp: 10213
> > hardirqs last  enabled at (10223): [<ffffffffaa36e02e>] __up_console_sem+0x5e/0x70
> > hardirqs last disabled at (10232): [<ffffffffaa36e013>] __up_console_sem+0x43/0x70
> > softirqs last  enabled at (10190): [<ffffffffaa1faf06>] __irq_exit_rcu+0x1c6/0x260
> > softirqs last disabled at (10185): [<ffffffffaa1faf06>] __irq_exit_rcu+0x1c6/0x260
> > ---[ end trace 0000000000000000 ]---
> > 
> > [2]
> > 
> > diff --git a/drivers/md/dm.c b/drivers/md/dm.c
> > index 7e3b5bdcf520..50382c98d7b3 100644
> > --- a/drivers/md/dm.c
> > +++ b/drivers/md/dm.c
> > @@ -1639,6 +1639,15 @@ static void dm_split_and_process_bio(struct mapped_device *md,
> >  	struct dm_io *io;
> >  	blk_status_t error = BLK_STS_OK;
> >  
> > +	if (unlikely(is_abnormal_io(bio))) {
> > +		/*
> > +		 * Use blk_queue_split() for abnormal IO (e.g. discard, etc)
> > +		 * otherwise associated queue_limits won't be imposed.
> > +		 */
> > +		blk_queue_split(&bio);
> > +		ci.is_abnormal_io = true;
> > +	}
> > +
> >  	init_clone_info(&ci, md, map, bio);
> >  	io = ci.io;
> >  
> > @@ -1646,13 +1655,6 @@ static void dm_split_and_process_bio(struct mapped_device *md,
> >  		__send_empty_flush(&ci);
> >  		/* dm_io_complete submits any data associated with flush */
> >  		goto out;
> > -	} else if (unlikely(is_abnormal_io(bio))) {
> > -		/*
> > -		 * Use blk_queue_split() for abnormal IO (e.g. discard, etc)
> > -		 * otherwise associated queue_limits won't be imposed.
> > -		 */
> > -		blk_queue_split(&bio);
> > -		ci.is_abnormal_io = true;
> >  	}
> >  
> >  	error = __split_and_process_bio(&ci);
> > 
> > -- 
> > Best Regards,
> > Shin'ichiro Kawasaki
> > 
> 
> Thanks for your report, I'll sort it out but your patch doesn't seem right.
> 
> init_clone_info() will reset ci.is_abnormal_io so the bio won't be
> processed properly.

FYI, I will be pushing a rebased commit with the following patch folded in:

diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 7e3b5bdcf520..9650ba2075b8 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1613,12 +1613,12 @@ static blk_status_t __split_and_process_bio(struct clone_info *ci)
 }

 static void init_clone_info(struct clone_info *ci, struct mapped_device *md,
-			    struct dm_table *map, struct bio *bio)
+			    struct dm_table *map, struct bio *bio, bool is_abnormal)
 {
	ci->map = map;
	ci->io = alloc_io(md, bio);
	ci->bio = bio;
-	ci->is_abnormal_io = false;
+	ci->is_abnormal_io = is_abnormal;
	ci->submit_as_polled = false;
	ci->sector = bio->bi_iter.bi_sector;
	ci->sector_count = bio_sectors(bio);
@@ -1638,21 +1638,24 @@ static void dm_split_and_process_bio(struct mapped_device *md,
	struct clone_info ci;
	struct dm_io *io;
	blk_status_t error = BLK_STS_OK;
+	bool is_abnormal;

-	init_clone_info(&ci, md, map, bio);
+	is_abnormal = is_abnormal_io(bio);
+	if (unlikely(is_abnormal)) {
+		/*
+		 * Use blk_queue_split() for abnormal IO (e.g. discard, etc)
+		 * otherwise associated queue_limits won't be imposed.
+		 */
+		blk_queue_split(&bio);
+	}
+
+	init_clone_info(&ci, md, map, bio, is_abnormal);
	io = ci.io;

	if (bio->bi_opf & REQ_PREFLUSH) {
		__send_empty_flush(&ci);
		/* dm_io_complete submits any data associated with flush */
		goto out;
-	} else if (unlikely(is_abnormal_io(bio))) {
-		/*
-		 * Use blk_queue_split() for abnormal IO (e.g. discard, etc)
-		 * otherwise associated queue_limits won't be imposed.
-		 */
-		blk_queue_split(&bio);
-		ci.is_abnormal_io = true;
	}

	error = __split_and_process_bio(&ci);

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [dm-devel] [dm-5.19 PATCH 21/21] dm: improve abnormal bio processing
  2022-04-22 13:26       ` Mike Snitzer
@ 2022-04-25 23:57         ` Shinichiro Kawasaki
  0 siblings, 0 replies; 31+ messages in thread
From: Shinichiro Kawasaki @ 2022-04-25 23:57 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: axboe, Damien Le Moal, Mike Snitzer, ming.lei, dm-devel, hch

On Apr 22, 2022 / 09:26, Mike Snitzer wrote:
> On Fri, Apr 22 2022 at  9:13P -0400,
> Mike Snitzer <snitzer@redhat.com> wrote:

[...]

> > Thanks for your report, I'll sort it out but your patch doesn't seem right.
> > 
> > init_clone_info() will reset ci.is_abnormal_io so the bio won't be
> > processed properly.

Right, I missed that point.

> FYI, I will be pushing a rebased commit with the following patch folded in:
> 
> diff --git a/drivers/md/dm.c b/drivers/md/dm.c
> index 7e3b5bdcf520..9650ba2075b8 100644
> --- a/drivers/md/dm.c
> +++ b/drivers/md/dm.c

[...]

I tested the dm-5.19 branch at git hash e7ea9556985f including the fix above,
and observed it passed all of my test sets. Thanks!

-- 
Best Regards,
Shin'ichiro Kawasaki

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply	[flat|nested] 31+ messages in thread

end of thread, other threads:[~2022-04-27 12:17 UTC | newest]

Thread overview: 31+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-04-18  2:27 [dm-devel] [dm-5.19 PATCH 00/21] dm: changes staged for 5.19 Mike Snitzer
2022-04-18  2:27 ` [dm-5.19 PATCH 01/21] block: change exported IO accounting interface from gendisk to bdev Mike Snitzer
2022-04-18  2:27   ` [dm-devel] " Mike Snitzer
2022-04-18  2:27 ` [dm-devel] [dm-5.19 PATCH 02/21] dm: conditionally enable BIOSET_PERCPU_CACHE for dm_io bioset Mike Snitzer
2022-04-18  2:27 ` [dm-devel] [dm-5.19 PATCH 03/21] dm: factor out dm_io_set_error and __dm_io_dec_pending Mike Snitzer
2022-04-18  2:27 ` [dm-devel] [dm-5.19 PATCH 04/21] dm: simplify dm_io access in dm_split_and_process_bio Mike Snitzer
2022-04-18  2:27 ` [dm-devel] [dm-5.19 PATCH 05/21] dm: simplify dm_start_io_acct Mike Snitzer
2022-04-18  2:27 ` [dm-devel] [dm-5.19 PATCH 06/21] dm: mark various branches unlikely Mike Snitzer
2022-04-18  2:27 ` [dm-devel] [dm-5.19 PATCH 07/21] dm: add local variables to clone_endio and __map_bio Mike Snitzer
2022-04-18  2:27 ` [dm-devel] [dm-5.19 PATCH 08/21] dm: move hot dm_io members to same cacheline as dm_target_io Mike Snitzer
2022-04-18  2:27 ` [dm-devel] [dm-5.19 PATCH 09/21] dm: introduce dm_{get, put}_live_table_bio called from dm_submit_bio Mike Snitzer
2022-04-18  2:27 ` [dm-devel] [dm-5.19 PATCH 10/21] dm: conditionally enable branching for less used features Mike Snitzer
2022-04-18  2:27 ` [dm-devel] [dm-5.19 PATCH 11/21] dm: simplify basic targets Mike Snitzer
2022-04-18  2:27 ` [dm-devel] [dm-5.19 PATCH 12/21] dm: use bio_sectors in dm_aceept_partial_bio Mike Snitzer
2022-04-18  2:27 ` [dm-devel] [dm-5.19 PATCH 13/21] dm: don't pass bio to __dm_start_io_acct and dm_end_io_acct Mike Snitzer
2022-04-18  2:27 ` [dm-devel] [dm-5.19 PATCH 14/21] dm: pass dm_io instance to dm_io_acct directly Mike Snitzer
2022-04-18  2:27 ` [dm-devel] [dm-5.19 PATCH 15/21] dm: switch to bdev based IO accounting interfaces Mike Snitzer
2022-04-18  2:27 ` [dm-devel] [dm-5.19 PATCH 16/21] dm: improve bio splitting and associated IO accounting Mike Snitzer
2022-04-18  2:27 ` [dm-devel] [dm-5.19 PATCH 17/21] dm: don't grab target io reference in dm_zone_map_bio Mike Snitzer
2022-04-18  2:27 ` [dm-devel] [dm-5.19 PATCH 18/21] dm: improve dm_io reference counting Mike Snitzer
2022-04-18  2:27 ` [dm-devel] [dm-5.19 PATCH 19/21] dm: put all polled dm_io instances into a single list Mike Snitzer
2022-04-18  2:27 ` [dm-devel] [dm-5.19 PATCH 20/21] dm: simplify bio-based IO accounting further Mike Snitzer
2022-04-18  2:27 ` [dm-devel] [dm-5.19 PATCH 21/21] dm: improve abnormal bio processing Mike Snitzer
2022-04-21  4:06   ` Shinichiro Kawasaki
2022-04-22 13:13     ` Mike Snitzer
2022-04-22 13:26       ` Mike Snitzer
2022-04-25 23:57         ` Shinichiro Kawasaki
2022-04-18  3:00 ` [dm-devel] [dm-5.19 PATCH 00/21] dm: changes staged for 5.19 Damien Le Moal
2022-04-18  3:16   ` Mike Snitzer
2022-04-18 12:49     ` Jens Axboe
2022-04-18 12:51 ` [dm-devel] (subset) " Jens Axboe

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.