* [PATCH v2 0/4] block: a couple chunk_sectors fixes/improvements
@ 2020-09-15 17:23 ` Mike Snitzer
  0 siblings, 0 replies; 15+ messages in thread
From: Mike Snitzer @ 2020-09-15 17:23 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Ming Lei, Vijayendra Suman, dm-devel, linux-block

Hi,

This v2 drops a patch from v1 and fixes the chunk_sectors check added to
blk_stack_limits() to convert chunk_sectors to bytes before comparing it
with physical_block_size.

Jens, please feel free to pick up patches 1 and 2.

DM patches 3 and 4 are provided just to give context for how DM will be
updated to use chunk_sectors.

Mike Snitzer (4):
  block: use lcm_not_zero() when stacking chunk_sectors
  block: allow 'chunk_sectors' to be non-power-of-2
  dm table: stack 'chunk_sectors' limit to account for target-specific splitting
  dm: unconditionally call blk_queue_split() in dm_process_bio()

 block/blk-settings.c   | 22 ++++++++++++----------
 drivers/md/dm-table.c  |  5 +++++
 drivers/md/dm.c        | 45 +--------------------------------------------
 include/linux/blkdev.h | 12 +++++++++---
 4 files changed, 27 insertions(+), 57 deletions(-)

-- 
2.15.0


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH v2 1/4] block: use lcm_not_zero() when stacking chunk_sectors
  2020-09-15 17:23 ` Mike Snitzer
@ 2020-09-15 17:23   ` Mike Snitzer
  0 siblings, 0 replies; 15+ messages in thread
From: Mike Snitzer @ 2020-09-15 17:23 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Ming Lei, Vijayendra Suman, dm-devel, linux-block

Like 'io_opt', blk_stack_limits() should stack 'chunk_sectors' using
lcm_not_zero() rather than min_not_zero() -- otherwise the final
'chunk_sectors' could result in sub-optimal alignment of IO to
component devices in the IO stack.

Also, if 'chunk_sectors' isn't a multiple of 'physical_block_size'
then it is a bug in the driver and the device should be flagged as
'misaligned'.
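
For illustration, a stand-alone user-space sketch of the stacking
arithmetic described above (hypothetical chunk_sectors values of 8 and
128 sectors, 4096-byte physical_block_size; gcd()/lcm_not_zero() are
re-implemented only to keep the sketch self-contained):

#include <stdio.h>

static unsigned int gcd(unsigned int a, unsigned int b)
{
	while (b) {
		unsigned int t = a % b;
		a = b;
		b = t;
	}
	return a;
}

/* Same fallback as the kernel helper: a zero argument yields the other. */
static unsigned int lcm_not_zero(unsigned int a, unsigned int b)
{
	if (!a)
		return b;
	if (!b)
		return a;
	return (a / gcd(a, b)) * b;
}

int main(void)
{
	unsigned int top = 8, bottom = 128;		/* hypothetical chunk_sectors */
	unsigned int physical_block_size = 4096;	/* bytes */
	unsigned int stacked = lcm_not_zero(top, bottom);

	/* min_not_zero() would pick 8 and lose the 128-sector boundary;
	 * lcm_not_zero() keeps a boundary both devices respect.
	 */
	printf("stacked chunk_sectors = %u\n", stacked);	/* 128 */

	/* Same multiple-of-physical_block_size check as above. */
	if ((stacked << 9) & (physical_block_size - 1))
		printf("misaligned: chunk_sectors cleared\n");

	return 0;
}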

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-settings.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/block/blk-settings.c b/block/blk-settings.c
index 76a7e03bcd6c..b2e1a929a6db 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -534,6 +534,7 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
 
 	t->io_min = max(t->io_min, b->io_min);
 	t->io_opt = lcm_not_zero(t->io_opt, b->io_opt);
+	t->chunk_sectors = lcm_not_zero(t->chunk_sectors, b->chunk_sectors);
 
 	/* Physical block size a multiple of the logical block size? */
 	if (t->physical_block_size & (t->logical_block_size - 1)) {
@@ -556,6 +557,13 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
 		ret = -1;
 	}
 
+	/* chunk_sectors a multiple of the physical block size? */
+	if ((t->chunk_sectors << 9) & (t->physical_block_size - 1)) {
+		t->chunk_sectors = 0;
+		t->misaligned = 1;
+		ret = -1;
+	}
+
 	t->raid_partial_stripes_expensive =
 		max(t->raid_partial_stripes_expensive,
 		    b->raid_partial_stripes_expensive);
@@ -594,10 +602,6 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
 			t->discard_granularity;
 	}
 
-	if (b->chunk_sectors)
-		t->chunk_sectors = min_not_zero(t->chunk_sectors,
-						b->chunk_sectors);
-
 	t->zoned = max(t->zoned, b->zoned);
 	return ret;
 }
-- 
2.15.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v2 2/4] block: allow 'chunk_sectors' to be non-power-of-2
  2020-09-15 17:23 ` Mike Snitzer
@ 2020-09-15 17:23   ` Mike Snitzer
  0 siblings, 0 replies; 15+ messages in thread
From: Mike Snitzer @ 2020-09-15 17:23 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Ming Lei, Vijayendra Suman, dm-devel, linux-block

It is possible for a block device to use a non power-of-2 chunk size,
which results in a full-stripe size that is also a non power-of-2.

Update blk_queue_chunk_sectors() and blk_max_size_offset() to
accommodate drivers that need a non power-of-2 chunk_sectors.
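
As a rough, user-space illustration of the updated blk_max_size_offset()
arithmetic (not the kernel helper itself; the 192-sector value stands in
for a hypothetical 3-data-disk stripe with 64-sector chunks):

#include <stdio.h>

/* Sectors that may be issued at 'offset' before crossing a chunk
 * boundary, capped by max_sectors.
 */
static unsigned int max_size_at(unsigned long long offset,
				unsigned int chunk_sectors,
				unsigned int max_sectors)
{
	if (!chunk_sectors)
		return max_sectors;

	if ((chunk_sectors & (chunk_sectors - 1)) == 0)
		/* power-of-2: cheap mask, as before */
		chunk_sectors -= offset & (chunk_sectors - 1);
	else
		/* non-power-of-2: remainder, what sector_div() computes */
		chunk_sectors -= offset % chunk_sectors;

	return chunk_sectors < max_sectors ? chunk_sectors : max_sectors;
}

int main(void)
{
	printf("%u\n", max_size_at(100, 192, 1024));	/* 92, to the 192-sector boundary */
	printf("%u\n", max_size_at(100, 256, 1024));	/* 156, power-of-2 path */
	return 0;
}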

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-settings.c   | 10 ++++------
 include/linux/blkdev.h | 12 +++++++++---
 2 files changed, 13 insertions(+), 9 deletions(-)

diff --git a/block/blk-settings.c b/block/blk-settings.c
index b2e1a929a6db..5ea3de48afba 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -172,15 +172,13 @@ EXPORT_SYMBOL(blk_queue_max_hw_sectors);
  *
  * Description:
  *    If a driver doesn't want IOs to cross a given chunk size, it can set
- *    this limit and prevent merging across chunks. Note that the chunk size
- *    must currently be a power-of-2 in sectors. Also note that the block
- *    layer must accept a page worth of data at any offset. So if the
- *    crossing of chunks is a hard limitation in the driver, it must still be
- *    prepared to split single page bios.
+ *    this limit and prevent merging across chunks. Note that the block layer
+ *    must accept a page worth of data at any offset. So if the crossing of
+ *    chunks is a hard limitation in the driver, it must still be prepared
+ *    to split single page bios.
  **/
 void blk_queue_chunk_sectors(struct request_queue *q, unsigned int chunk_sectors)
 {
-	BUG_ON(!is_power_of_2(chunk_sectors));
 	q->limits.chunk_sectors = chunk_sectors;
 }
 EXPORT_SYMBOL(blk_queue_chunk_sectors);
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index bb5636cc17b9..bbfbda33e993 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1059,11 +1059,17 @@ static inline unsigned int blk_queue_get_max_sectors(struct request_queue *q,
 static inline unsigned int blk_max_size_offset(struct request_queue *q,
 					       sector_t offset)
 {
-	if (!q->limits.chunk_sectors)
+	unsigned int chunk_sectors = q->limits.chunk_sectors;
+
+	if (!chunk_sectors)
 		return q->limits.max_sectors;
 
-	return min(q->limits.max_sectors, (unsigned int)(q->limits.chunk_sectors -
-			(offset & (q->limits.chunk_sectors - 1))));
+	if (is_power_of_2(chunk_sectors))
+		chunk_sectors -= offset & (chunk_sectors - 1);
+	else
+		chunk_sectors -= sector_div(offset, chunk_sectors);
+
+	return min(q->limits.max_sectors, chunk_sectors);
 }
 
 static inline unsigned int blk_rq_get_max_sectors(struct request *rq,
-- 
2.15.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v2 3/4] dm table: stack 'chunk_sectors' limit to account for target-specific splitting
  2020-09-15 17:23 ` Mike Snitzer
@ 2020-09-15 17:23   ` Mike Snitzer
  0 siblings, 0 replies; 15+ messages in thread
From: Mike Snitzer @ 2020-09-15 17:23 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Ming Lei, Vijayendra Suman, dm-devel, linux-block

If a target sets ti->max_io_len it must be used when stacking the
DM device's queue_limits to establish a 'chunk_sectors' that is
compatible with the IO stack.

By using lcm_not_zero() care is taken to avoid blindly overriding the
chunk_sectors limit stacked up by blk_stack_limits().
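
As a hypothetical example (user-space sketch, made-up numbers): a target
with ti->max_io_len = 24 sectors on top of a device whose stacked
chunk_sectors is 16 -- taking the minimum would keep 16 and miss the
24-sector target boundaries, whereas the lcm honours both:

#include <stdio.h>

int main(void)
{
	unsigned int max_io_len = 24;		/* hypothetical target boundary, sectors */
	unsigned int chunk_sectors = 16;	/* hypothetical stacked limit, sectors */
	unsigned int a = max_io_len, b = chunk_sectors;

	/* gcd(24, 16) = 8; both inputs are non-zero, so no _not_zero fallback */
	while (b) {
		unsigned int t = a % b;
		a = b;
		b = t;
	}

	printf("stacked chunk_sectors = %u\n", (max_io_len / a) * chunk_sectors);	/* 48 */
	return 0;
}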

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
---
 drivers/md/dm-table.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
index 5edc3079e7c1..248c5a1074a7 100644
--- a/drivers/md/dm-table.c
+++ b/drivers/md/dm-table.c
@@ -18,6 +18,7 @@
 #include <linux/mutex.h>
 #include <linux/delay.h>
 #include <linux/atomic.h>
+#include <linux/lcm.h>
 #include <linux/blk-mq.h>
 #include <linux/mount.h>
 #include <linux/dax.h>
@@ -1502,6 +1503,10 @@ int dm_calculate_queue_limits(struct dm_table *table,
 			zone_sectors = ti_limits.chunk_sectors;
 		}
 
+		/* Stack chunk_sectors if target-specific splitting is required */
+		if (ti->max_io_len)
+			ti_limits.chunk_sectors = lcm_not_zero(ti->max_io_len,
+							       ti_limits.chunk_sectors);
 		/* Set I/O hints portion of queue limits */
 		if (ti->type->io_hints)
 			ti->type->io_hints(ti, &ti_limits);
-- 
2.15.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v2 4/4] dm: unconditionally call blk_queue_split() in dm_process_bio()
  2020-09-15 17:23 ` Mike Snitzer
@ 2020-09-15 17:23   ` Mike Snitzer
  0 siblings, 0 replies; 15+ messages in thread
From: Mike Snitzer @ 2020-09-15 17:23 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Ming Lei, Vijayendra Suman, dm-devel, linux-block

blk_queue_split() has become compulsory from .submit_bio -- regardless
of whether it is recursing.  Update DM core to always call
blk_queue_split().

dm_queue_split() is removed because __split_and_process_bio() handles
splitting as needed.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
---
 drivers/md/dm.c | 45 +--------------------------------------------
 1 file changed, 1 insertion(+), 44 deletions(-)

diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index fb0255d25e4b..0bae9f26dc8e 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1530,22 +1530,6 @@ static int __send_write_zeroes(struct clone_info *ci, struct dm_target *ti)
 	return __send_changing_extent_only(ci, ti, get_num_write_zeroes_bios(ti));
 }
 
-static bool is_abnormal_io(struct bio *bio)
-{
-	bool r = false;
-
-	switch (bio_op(bio)) {
-	case REQ_OP_DISCARD:
-	case REQ_OP_SECURE_ERASE:
-	case REQ_OP_WRITE_SAME:
-	case REQ_OP_WRITE_ZEROES:
-		r = true;
-		break;
-	}
-
-	return r;
-}
-
 static bool __process_abnormal_io(struct clone_info *ci, struct dm_target *ti,
 				  int *result)
 {
@@ -1723,23 +1707,6 @@ static blk_qc_t __process_bio(struct mapped_device *md, struct dm_table *map,
 	return ret;
 }
 
-static void dm_queue_split(struct mapped_device *md, struct dm_target *ti, struct bio **bio)
-{
-	unsigned len, sector_count;
-
-	sector_count = bio_sectors(*bio);
-	len = min_t(sector_t, max_io_len((*bio)->bi_iter.bi_sector, ti), sector_count);
-
-	if (sector_count > len) {
-		struct bio *split = bio_split(*bio, len, GFP_NOIO, &md->queue->bio_split);
-
-		bio_chain(split, *bio);
-		trace_block_split(md->queue, split, (*bio)->bi_iter.bi_sector);
-		submit_bio_noacct(*bio);
-		*bio = split;
-	}
-}
-
 static blk_qc_t dm_process_bio(struct mapped_device *md,
 			       struct dm_table *map, struct bio *bio)
 {
@@ -1759,17 +1726,7 @@ static blk_qc_t dm_process_bio(struct mapped_device *md,
 		}
 	}
 
-	/*
-	 * If in ->queue_bio we need to use blk_queue_split(), otherwise
-	 * queue_limits for abnormal requests (e.g. discard, writesame, etc)
-	 * won't be imposed.
-	 */
-	if (current->bio_list) {
-		if (is_abnormal_io(bio))
-			blk_queue_split(&bio);
-		else
-			dm_queue_split(md, ti, &bio);
-	}
+	blk_queue_split(&bio);
 
 	if (dm_get_md_type(md) == DM_TYPE_NVME_BIO_BASED)
 		return __process_bio(md, map, bio, ti);
-- 
2.15.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 4/4] dm: unconditionally call blk_queue_split() in dm_process_bio()
  2020-09-15 17:23   ` Mike Snitzer
@ 2020-09-16  1:08   ` Ming Lei
  2020-09-16  1:28     ` Mike Snitzer
  0 siblings, 1 reply; 15+ messages in thread
From: Ming Lei @ 2020-09-16  1:08 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: Jens Axboe, Vijayendra Suman, dm-devel, linux-block

On Tue, Sep 15, 2020 at 01:23:57PM -0400, Mike Snitzer wrote:
> blk_queue_split() has become compulsory from .submit_bio -- regardless
> of whether it is recursing.  Update DM core to always call
> blk_queue_split().
> 
> dm_queue_split() is removed because __split_and_process_bio() handles
> splitting as needed.
> 
> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
> ---
>  drivers/md/dm.c | 45 +--------------------------------------------
>  1 file changed, 1 insertion(+), 44 deletions(-)
> 
> diff --git a/drivers/md/dm.c b/drivers/md/dm.c
> index fb0255d25e4b..0bae9f26dc8e 100644
> --- a/drivers/md/dm.c
> +++ b/drivers/md/dm.c
> @@ -1530,22 +1530,6 @@ static int __send_write_zeroes(struct clone_info *ci, struct dm_target *ti)
>  	return __send_changing_extent_only(ci, ti, get_num_write_zeroes_bios(ti));
>  }
>  
> -static bool is_abnormal_io(struct bio *bio)
> -{
> -	bool r = false;
> -
> -	switch (bio_op(bio)) {
> -	case REQ_OP_DISCARD:
> -	case REQ_OP_SECURE_ERASE:
> -	case REQ_OP_WRITE_SAME:
> -	case REQ_OP_WRITE_ZEROES:
> -		r = true;
> -		break;
> -	}
> -
> -	return r;
> -}
> -
>  static bool __process_abnormal_io(struct clone_info *ci, struct dm_target *ti,
>  				  int *result)
>  {
> @@ -1723,23 +1707,6 @@ static blk_qc_t __process_bio(struct mapped_device *md, struct dm_table *map,
>  	return ret;
>  }
>  
> -static void dm_queue_split(struct mapped_device *md, struct dm_target *ti, struct bio **bio)
> -{
> -	unsigned len, sector_count;
> -
> -	sector_count = bio_sectors(*bio);
> -	len = min_t(sector_t, max_io_len((*bio)->bi_iter.bi_sector, ti), sector_count);
> -
> -	if (sector_count > len) {
> -		struct bio *split = bio_split(*bio, len, GFP_NOIO, &md->queue->bio_split);
> -
> -		bio_chain(split, *bio);
> -		trace_block_split(md->queue, split, (*bio)->bi_iter.bi_sector);
> -		submit_bio_noacct(*bio);
> -		*bio = split;
> -	}
> -}
> -
>  static blk_qc_t dm_process_bio(struct mapped_device *md,
>  			       struct dm_table *map, struct bio *bio)
>  {
> @@ -1759,17 +1726,7 @@ static blk_qc_t dm_process_bio(struct mapped_device *md,
>  		}
>  	}
>  
> -	/*
> -	 * If in ->queue_bio we need to use blk_queue_split(), otherwise
> -	 * queue_limits for abnormal requests (e.g. discard, writesame, etc)
> -	 * won't be imposed.
> -	 */
> -	if (current->bio_list) {
> -		if (is_abnormal_io(bio))
> -			blk_queue_split(&bio);
> -		else
> -			dm_queue_split(md, ti, &bio);
> -	}
> +	blk_queue_split(&bio);

In max_io_len(), target boundary is taken into account when figuring out
the max io len. However, this info won't be used any more after
switching to blk_queue_split(). Is that one potential problem?


thanks,
Ming


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 4/4] dm: unconditionally call blk_queue_split() in dm_process_bio()
  2020-09-16  1:08   ` Ming Lei
@ 2020-09-16  1:28     ` Mike Snitzer
  2020-09-16  1:48       ` Ming Lei
  0 siblings, 1 reply; 15+ messages in thread
From: Mike Snitzer @ 2020-09-16  1:28 UTC (permalink / raw)
  To: Ming Lei; +Cc: Jens Axboe, Vijayendra Suman, dm-devel, linux-block

On Tue, Sep 15 2020 at  9:08pm -0400,
Ming Lei <ming.lei@redhat.com> wrote:

> On Tue, Sep 15, 2020 at 01:23:57PM -0400, Mike Snitzer wrote:
> > blk_queue_split() has become compulsory from .submit_bio -- regardless
> > of whether it is recursing.  Update DM core to always call
> > blk_queue_split().
> > 
> > dm_queue_split() is removed because __split_and_process_bio() handles
> > splitting as needed.
> > 
> > Signed-off-by: Mike Snitzer <snitzer@redhat.com>
> > ---
> >  drivers/md/dm.c | 45 +--------------------------------------------
> >  1 file changed, 1 insertion(+), 44 deletions(-)
> > 
> > diff --git a/drivers/md/dm.c b/drivers/md/dm.c
> > index fb0255d25e4b..0bae9f26dc8e 100644
> > --- a/drivers/md/dm.c
> > +++ b/drivers/md/dm.c
> > @@ -1530,22 +1530,6 @@ static int __send_write_zeroes(struct clone_info *ci, struct dm_target *ti)
> >  	return __send_changing_extent_only(ci, ti, get_num_write_zeroes_bios(ti));
> >  }
> >  
> > -static bool is_abnormal_io(struct bio *bio)
> > -{
> > -	bool r = false;
> > -
> > -	switch (bio_op(bio)) {
> > -	case REQ_OP_DISCARD:
> > -	case REQ_OP_SECURE_ERASE:
> > -	case REQ_OP_WRITE_SAME:
> > -	case REQ_OP_WRITE_ZEROES:
> > -		r = true;
> > -		break;
> > -	}
> > -
> > -	return r;
> > -}
> > -
> >  static bool __process_abnormal_io(struct clone_info *ci, struct dm_target *ti,
> >  				  int *result)
> >  {
> > @@ -1723,23 +1707,6 @@ static blk_qc_t __process_bio(struct mapped_device *md, struct dm_table *map,
> >  	return ret;
> >  }
> >  
> > -static void dm_queue_split(struct mapped_device *md, struct dm_target *ti, struct bio **bio)
> > -{
> > -	unsigned len, sector_count;
> > -
> > -	sector_count = bio_sectors(*bio);
> > -	len = min_t(sector_t, max_io_len((*bio)->bi_iter.bi_sector, ti), sector_count);
> > -
> > -	if (sector_count > len) {
> > -		struct bio *split = bio_split(*bio, len, GFP_NOIO, &md->queue->bio_split);
> > -
> > -		bio_chain(split, *bio);
> > -		trace_block_split(md->queue, split, (*bio)->bi_iter.bi_sector);
> > -		submit_bio_noacct(*bio);
> > -		*bio = split;
> > -	}
> > -}
> > -
> >  static blk_qc_t dm_process_bio(struct mapped_device *md,
> >  			       struct dm_table *map, struct bio *bio)
> >  {
> > @@ -1759,17 +1726,7 @@ static blk_qc_t dm_process_bio(struct mapped_device *md,
> >  		}
> >  	}
> >  
> > -	/*
> > -	 * If in ->queue_bio we need to use blk_queue_split(), otherwise
> > -	 * queue_limits for abnormal requests (e.g. discard, writesame, etc)
> > -	 * won't be imposed.
> > -	 */
> > -	if (current->bio_list) {
> > -		if (is_abnormal_io(bio))
> > -			blk_queue_split(&bio);
> > -		else
> > -			dm_queue_split(md, ti, &bio);
> > -	}
> > +	blk_queue_split(&bio);
> 
> In max_io_len(), target boundary is taken into account when figuring out
> the max io len. However, this info won't be used any more after
> switching to blk_queue_split(). Is that one potential problem?

Thanks for your review.  But no, as the patch header says:
"dm_queue_split() is removed because __split_and_process_bio() handles
splitting as needed."

(__split_and_process_non_flush calls max_io_len, as does
__process_abnormal_io by calling __send_changing_extent_only)

SO the blk_queue_split() bio will be further split if needed (due to
DM target boundary, etc).

Mike


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 4/4] dm: unconditionally call blk_queue_split() in dm_process_bio()
  2020-09-16  1:28     ` Mike Snitzer
@ 2020-09-16  1:48       ` Ming Lei
  2020-09-16  3:39         ` Mike Snitzer
  0 siblings, 1 reply; 15+ messages in thread
From: Ming Lei @ 2020-09-16  1:48 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: Jens Axboe, Vijayendra Suman, dm-devel, linux-block

On Tue, Sep 15, 2020 at 09:28:14PM -0400, Mike Snitzer wrote:
> On Tue, Sep 15 2020 at  9:08pm -0400,
> Ming Lei <ming.lei@redhat.com> wrote:
> 
> > On Tue, Sep 15, 2020 at 01:23:57PM -0400, Mike Snitzer wrote:
> > > blk_queue_split() has become compulsory from .submit_bio -- regardless
> > > of whether it is recursing.  Update DM core to always call
> > > blk_queue_split().
> > > 
> > > dm_queue_split() is removed because __split_and_process_bio() handles
> > > splitting as needed.
> > > 
> > > Signed-off-by: Mike Snitzer <snitzer@redhat.com>
> > > ---
> > >  drivers/md/dm.c | 45 +--------------------------------------------
> > >  1 file changed, 1 insertion(+), 44 deletions(-)
> > > 
> > > diff --git a/drivers/md/dm.c b/drivers/md/dm.c
> > > index fb0255d25e4b..0bae9f26dc8e 100644
> > > --- a/drivers/md/dm.c
> > > +++ b/drivers/md/dm.c
> > > @@ -1530,22 +1530,6 @@ static int __send_write_zeroes(struct clone_info *ci, struct dm_target *ti)
> > >  	return __send_changing_extent_only(ci, ti, get_num_write_zeroes_bios(ti));
> > >  }
> > >  
> > > -static bool is_abnormal_io(struct bio *bio)
> > > -{
> > > -	bool r = false;
> > > -
> > > -	switch (bio_op(bio)) {
> > > -	case REQ_OP_DISCARD:
> > > -	case REQ_OP_SECURE_ERASE:
> > > -	case REQ_OP_WRITE_SAME:
> > > -	case REQ_OP_WRITE_ZEROES:
> > > -		r = true;
> > > -		break;
> > > -	}
> > > -
> > > -	return r;
> > > -}
> > > -
> > >  static bool __process_abnormal_io(struct clone_info *ci, struct dm_target *ti,
> > >  				  int *result)
> > >  {
> > > @@ -1723,23 +1707,6 @@ static blk_qc_t __process_bio(struct mapped_device *md, struct dm_table *map,
> > >  	return ret;
> > >  }
> > >  
> > > -static void dm_queue_split(struct mapped_device *md, struct dm_target *ti, struct bio **bio)
> > > -{
> > > -	unsigned len, sector_count;
> > > -
> > > -	sector_count = bio_sectors(*bio);
> > > -	len = min_t(sector_t, max_io_len((*bio)->bi_iter.bi_sector, ti), sector_count);
> > > -
> > > -	if (sector_count > len) {
> > > -		struct bio *split = bio_split(*bio, len, GFP_NOIO, &md->queue->bio_split);
> > > -
> > > -		bio_chain(split, *bio);
> > > -		trace_block_split(md->queue, split, (*bio)->bi_iter.bi_sector);
> > > -		submit_bio_noacct(*bio);
> > > -		*bio = split;
> > > -	}
> > > -}
> > > -
> > >  static blk_qc_t dm_process_bio(struct mapped_device *md,
> > >  			       struct dm_table *map, struct bio *bio)
> > >  {
> > > @@ -1759,17 +1726,7 @@ static blk_qc_t dm_process_bio(struct mapped_device *md,
> > >  		}
> > >  	}
> > >  
> > > -	/*
> > > -	 * If in ->queue_bio we need to use blk_queue_split(), otherwise
> > > -	 * queue_limits for abnormal requests (e.g. discard, writesame, etc)
> > > -	 * won't be imposed.
> > > -	 */
> > > -	if (current->bio_list) {
> > > -		if (is_abnormal_io(bio))
> > > -			blk_queue_split(&bio);
> > > -		else
> > > -			dm_queue_split(md, ti, &bio);
> > > -	}
> > > +	blk_queue_split(&bio);
> > 
> > In max_io_len(), target boundary is taken into account when figuring out
> > the max io len. However, this info won't be used any more after
> > switching to blk_queue_split(). Is that one potential problem?
> 
> Thanks for your review.  But no, as the patch header says:
> "dm_queue_split() is removed because __split_and_process_bio() handles
> splitting as needed."
> 
> (__split_and_process_non_flush calls max_io_len, as does
> __process_abnormal_io by calling __send_changing_extent_only)
> 
> SO the blk_queue_split() bio will be further split if needed (due to
> DM target boundary, etc).

Thanks for your explanation.

Then looks there is double split issue since both blk_queue_split()
and __split_and_process_non_flush() may split bio from same bioset(md->queue->bio_split),
and this way may cause deadlock, see comment of bio_alloc_bioset(), especially
the paragraph of 'callers must never allocate more than 1 bio at a time
from this pool.'


Thanks,
Ming


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 4/4] dm: unconditionally call blk_queue_split() in dm_process_bio()
  2020-09-16  1:48       ` Ming Lei
@ 2020-09-16  3:39         ` Mike Snitzer
  2020-09-16  7:51           ` Ming Lei
  0 siblings, 1 reply; 15+ messages in thread
From: Mike Snitzer @ 2020-09-16  3:39 UTC (permalink / raw)
  To: Ming Lei; +Cc: Jens Axboe, Vijayendra Suman, dm-devel, linux-block

On Tue, Sep 15 2020 at  9:48pm -0400,
Ming Lei <ming.lei@redhat.com> wrote:

> On Tue, Sep 15, 2020 at 09:28:14PM -0400, Mike Snitzer wrote:
> > On Tue, Sep 15 2020 at  9:08pm -0400,
> > Ming Lei <ming.lei@redhat.com> wrote:
> > 
> > > On Tue, Sep 15, 2020 at 01:23:57PM -0400, Mike Snitzer wrote:
> > > > blk_queue_split() has become compulsory from .submit_bio -- regardless
> > > > of whether it is recursing.  Update DM core to always call
> > > > blk_queue_split().
> > > > 
> > > > dm_queue_split() is removed because __split_and_process_bio() handles
> > > > splitting as needed.
> > > > 
> > > > Signed-off-by: Mike Snitzer <snitzer@redhat.com>
> > > > ---
> > > >  drivers/md/dm.c | 45 +--------------------------------------------
> > > >  1 file changed, 1 insertion(+), 44 deletions(-)
> > > > 
> > > > diff --git a/drivers/md/dm.c b/drivers/md/dm.c
> > > > index fb0255d25e4b..0bae9f26dc8e 100644
> > > > --- a/drivers/md/dm.c
> > > > +++ b/drivers/md/dm.c
> > > > @@ -1530,22 +1530,6 @@ static int __send_write_zeroes(struct clone_info *ci, struct dm_target *ti)
> > > >  	return __send_changing_extent_only(ci, ti, get_num_write_zeroes_bios(ti));
> > > >  }
> > > >  
> > > > -static bool is_abnormal_io(struct bio *bio)
> > > > -{
> > > > -	bool r = false;
> > > > -
> > > > -	switch (bio_op(bio)) {
> > > > -	case REQ_OP_DISCARD:
> > > > -	case REQ_OP_SECURE_ERASE:
> > > > -	case REQ_OP_WRITE_SAME:
> > > > -	case REQ_OP_WRITE_ZEROES:
> > > > -		r = true;
> > > > -		break;
> > > > -	}
> > > > -
> > > > -	return r;
> > > > -}
> > > > -
> > > >  static bool __process_abnormal_io(struct clone_info *ci, struct dm_target *ti,
> > > >  				  int *result)
> > > >  {
> > > > @@ -1723,23 +1707,6 @@ static blk_qc_t __process_bio(struct mapped_device *md, struct dm_table *map,
> > > >  	return ret;
> > > >  }
> > > >  
> > > > -static void dm_queue_split(struct mapped_device *md, struct dm_target *ti, struct bio **bio)
> > > > -{
> > > > -	unsigned len, sector_count;
> > > > -
> > > > -	sector_count = bio_sectors(*bio);
> > > > -	len = min_t(sector_t, max_io_len((*bio)->bi_iter.bi_sector, ti), sector_count);
> > > > -
> > > > -	if (sector_count > len) {
> > > > -		struct bio *split = bio_split(*bio, len, GFP_NOIO, &md->queue->bio_split);
> > > > -
> > > > -		bio_chain(split, *bio);
> > > > -		trace_block_split(md->queue, split, (*bio)->bi_iter.bi_sector);
> > > > -		submit_bio_noacct(*bio);
> > > > -		*bio = split;
> > > > -	}
> > > > -}
> > > > -
> > > >  static blk_qc_t dm_process_bio(struct mapped_device *md,
> > > >  			       struct dm_table *map, struct bio *bio)
> > > >  {
> > > > @@ -1759,17 +1726,7 @@ static blk_qc_t dm_process_bio(struct mapped_device *md,
> > > >  		}
> > > >  	}
> > > >  
> > > > -	/*
> > > > -	 * If in ->queue_bio we need to use blk_queue_split(), otherwise
> > > > -	 * queue_limits for abnormal requests (e.g. discard, writesame, etc)
> > > > -	 * won't be imposed.
> > > > -	 */
> > > > -	if (current->bio_list) {
> > > > -		if (is_abnormal_io(bio))
> > > > -			blk_queue_split(&bio);
> > > > -		else
> > > > -			dm_queue_split(md, ti, &bio);
> > > > -	}
> > > > +	blk_queue_split(&bio);
> > > 
> > > In max_io_len(), target boundary is taken into account when figuring out
> > > the max io len. However, this info won't be used any more after
> > > switching to blk_queue_split(). Is that one potential problem?
> > 
> > Thanks for your review.  But no, as the patch header says:
> > "dm_queue_split() is removed because __split_and_process_bio() handles
> > splitting as needed."
> > 
> > (__split_and_process_non_flush calls max_io_len, as does
> > __process_abnormal_io by calling __send_changing_extent_only)
> > 
> > SO the blk_queue_split() bio will be further split if needed (due to
> > DM target boundary, etc).
> 
> Thanks for your explanation.
> 
> Then looks there is double split issue since both blk_queue_split()
> and __split_and_process_non_flush() may split bio from same bioset(md->queue->bio_split),
> and this way may cause deadlock, see comment of bio_alloc_bioset(), especially
> the paragraph of 'callers must never allocate more than 1 bio at a time
> from this pool.'

Next sentence is:
"Callers that need to allocate more than 1 bio must always submit the
previously allocated bio for IO before attempting to allocate a new
one."

__split_and_process_non_flush -> __map_bio -> submit_bio_noacct
bio_split
submit_bio_noacct

With commit 18a25da84354c, NeilBrown wrote the __split_and_process_bio()
with an eye toward depth-first submission to avoid this deadlock you're
concerned about.  That commit header speaks to it directly.

I did go on to change Neil's code a bit with commit f21c601a2bb31 -- but
I _think_ the current code is still OK relative to bio_split mempool
use.

Mike


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 4/4] dm: unconditionally call blk_queue_split() in dm_process_bio()
  2020-09-16  3:39         ` Mike Snitzer
@ 2020-09-16  7:51           ` Ming Lei
  0 siblings, 0 replies; 15+ messages in thread
From: Ming Lei @ 2020-09-16  7:51 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: Jens Axboe, Vijayendra Suman, dm-devel, linux-block

On Tue, Sep 15, 2020 at 11:39:46PM -0400, Mike Snitzer wrote:
> On Tue, Sep 15 2020 at  9:48pm -0400,
> Ming Lei <ming.lei@redhat.com> wrote:
> 
> > On Tue, Sep 15, 2020 at 09:28:14PM -0400, Mike Snitzer wrote:
> > > On Tue, Sep 15 2020 at  9:08pm -0400,
> > > Ming Lei <ming.lei@redhat.com> wrote:
> > > 
> > > > On Tue, Sep 15, 2020 at 01:23:57PM -0400, Mike Snitzer wrote:
> > > > > blk_queue_split() has become compulsory from .submit_bio -- regardless
> > > > > of whether it is recursing.  Update DM core to always call
> > > > > blk_queue_split().
> > > > > 
> > > > > dm_queue_split() is removed because __split_and_process_bio() handles
> > > > > splitting as needed.
> > > > > 
> > > > > Signed-off-by: Mike Snitzer <snitzer@redhat.com>
> > > > > ---
> > > > >  drivers/md/dm.c | 45 +--------------------------------------------
> > > > >  1 file changed, 1 insertion(+), 44 deletions(-)
> > > > > 
> > > > > diff --git a/drivers/md/dm.c b/drivers/md/dm.c
> > > > > index fb0255d25e4b..0bae9f26dc8e 100644
> > > > > --- a/drivers/md/dm.c
> > > > > +++ b/drivers/md/dm.c
> > > > > @@ -1530,22 +1530,6 @@ static int __send_write_zeroes(struct clone_info *ci, struct dm_target *ti)
> > > > >  	return __send_changing_extent_only(ci, ti, get_num_write_zeroes_bios(ti));
> > > > >  }
> > > > >  
> > > > > -static bool is_abnormal_io(struct bio *bio)
> > > > > -{
> > > > > -	bool r = false;
> > > > > -
> > > > > -	switch (bio_op(bio)) {
> > > > > -	case REQ_OP_DISCARD:
> > > > > -	case REQ_OP_SECURE_ERASE:
> > > > > -	case REQ_OP_WRITE_SAME:
> > > > > -	case REQ_OP_WRITE_ZEROES:
> > > > > -		r = true;
> > > > > -		break;
> > > > > -	}
> > > > > -
> > > > > -	return r;
> > > > > -}
> > > > > -
> > > > >  static bool __process_abnormal_io(struct clone_info *ci, struct dm_target *ti,
> > > > >  				  int *result)
> > > > >  {
> > > > > @@ -1723,23 +1707,6 @@ static blk_qc_t __process_bio(struct mapped_device *md, struct dm_table *map,
> > > > >  	return ret;
> > > > >  }
> > > > >  
> > > > > -static void dm_queue_split(struct mapped_device *md, struct dm_target *ti, struct bio **bio)
> > > > > -{
> > > > > -	unsigned len, sector_count;
> > > > > -
> > > > > -	sector_count = bio_sectors(*bio);
> > > > > -	len = min_t(sector_t, max_io_len((*bio)->bi_iter.bi_sector, ti), sector_count);
> > > > > -
> > > > > -	if (sector_count > len) {
> > > > > -		struct bio *split = bio_split(*bio, len, GFP_NOIO, &md->queue->bio_split);
> > > > > -
> > > > > -		bio_chain(split, *bio);
> > > > > -		trace_block_split(md->queue, split, (*bio)->bi_iter.bi_sector);
> > > > > -		submit_bio_noacct(*bio);
> > > > > -		*bio = split;
> > > > > -	}
> > > > > -}
> > > > > -
> > > > >  static blk_qc_t dm_process_bio(struct mapped_device *md,
> > > > >  			       struct dm_table *map, struct bio *bio)
> > > > >  {
> > > > > @@ -1759,17 +1726,7 @@ static blk_qc_t dm_process_bio(struct mapped_device *md,
> > > > >  		}
> > > > >  	}
> > > > >  
> > > > > -	/*
> > > > > -	 * If in ->queue_bio we need to use blk_queue_split(), otherwise
> > > > > -	 * queue_limits for abnormal requests (e.g. discard, writesame, etc)
> > > > > -	 * won't be imposed.
> > > > > -	 */
> > > > > -	if (current->bio_list) {
> > > > > -		if (is_abnormal_io(bio))
> > > > > -			blk_queue_split(&bio);
> > > > > -		else
> > > > > -			dm_queue_split(md, ti, &bio);
> > > > > -	}
> > > > > +	blk_queue_split(&bio);
> > > > 
> > > > In max_io_len(), target boundary is taken into account when figuring out
> > > > the max io len. However, this info won't be used any more after
> > > > switching to blk_queue_split(). Is that one potential problem?
> > > 
> > > Thanks for your review.  But no, as the patch header says:
> > > "dm_queue_split() is removed because __split_and_process_bio() handles
> > > splitting as needed."
> > > 
> > > (__split_and_process_non_flush calls max_io_len, as does
> > > __process_abnormal_io by calling __send_changing_extent_only)
> > > 
> > > SO the blk_queue_split() bio will be further split if needed (due to
> > > DM target boundary, etc).
> > 
> > Thanks for your explanation.
> > 
> > Then looks there is double split issue since both blk_queue_split()
> > and __split_and_process_non_flush() may split bio from same bioset(md->queue->bio_split),
> > and this way may cause deadlock, see comment of bio_alloc_bioset(), especially
> > the paragraph of 'callers must never allocate more than 1 bio at a time
> > from this pool.'
> 
> Next sentence is:
> "Callers that need to allocate more than 1 bio must always submit the
> previously allocated bio for IO before attempting to allocate a new
> one."

Yeah, I know that. This sentence actually means that the previous
submission should make forward progress, then the bio may be completed &
freed, so that new allocation can move on.

However, in this situation, __split_and_process_non_flush() doesn't
provide such forward progress, see below.

> 
> __split_and_process_non_flush -> __map_bio -> submit_bio_noacct
> bio_split
> submit_bio_noacct

Yeah, the above submission is done on the clone bio & the underlying queue.
What matters is whether the submission can make forward progress. After
__split_and_process_non_flush() returns, the split 'bio' (the original bio)
can't be completed by the previous submission because this bio won't be freed
until dec_pending() from __split_and_process_bio() returns.

So when ci.sector_count hasn't reached zero and bio_split() is called again
from the same bio_set to allocate a new bio, that allocation may never succeed,
because the original bio allocated from the same bio_set can't be freed while
bio_split() waits.
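
(To make the forward-progress rule concrete, a toy user-space analogy --
not the real bioset/mempool code -- for the worst case where the pool's
single reserved object is all that is left under memory pressure:)

#include <stdio.h>

/* Toy pool with one reserved object, standing in for md->queue->bio_split. */
struct toy_pool {
	int in_use;
};

static void toy_alloc(struct toy_pool *p, const char *who)
{
	if (p->in_use) {
		printf("%s: must wait for the first bio to be freed, but that only\n"
		       "happens after dec_pending() -- which runs after this allocation\n",
		       who);
		return;
	}
	p->in_use = 1;
	printf("%s: got the reserve\n", who);
}

int main(void)
{
	struct toy_pool bio_split_pool = { 0 };

	/* first split; the original bio stays held until dec_pending() */
	toy_alloc(&bio_split_pool, "blk_queue_split()");
	/* second split from the same pool before the first bio is freed */
	toy_alloc(&bio_split_pool, "bio_split() via __split_and_process_bio()");
	return 0;
}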

Thanks, 
Ming


^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2020-09-16  7:51 UTC | newest]

Thread overview: 15+ messages
2020-09-15 17:23 [PATCH v2 0/4] block: a couple chunk_sectors fixes/improvements Mike Snitzer
2020-09-15 17:23 ` [PATCH v2 1/4] block: use lcm_not_zero() when stacking chunk_sectors Mike Snitzer
2020-09-15 17:23 ` [PATCH v2 2/4] block: allow 'chunk_sectors' to be non-power-of-2 Mike Snitzer
2020-09-15 17:23 ` [PATCH v2 3/4] dm table: stack 'chunk_sectors' limit to account for target-specific splitting Mike Snitzer
2020-09-15 17:23 ` [PATCH v2 4/4] dm: unconditionally call blk_queue_split() in dm_process_bio() Mike Snitzer
2020-09-16  1:08   ` Ming Lei
2020-09-16  1:28     ` Mike Snitzer
2020-09-16  1:48       ` Ming Lei
2020-09-16  3:39         ` Mike Snitzer
2020-09-16  7:51           ` Ming Lei
