* [PATCH v2 0/4] block: a couple chunk_sectors fixes/improvements

From: Mike Snitzer @ 2020-09-15 17:23 UTC
To: Jens Axboe; +Cc: linux-block, dm-devel, Vijayendra Suman, Ming Lei

Hi,

This v2 drops a patch from v1 and fixes the chunk_sectors check added
to blk_stack_limits() to convert chunk_sectors to bytes before
comparing with physical_block_size.

Jens, please feel free to pick up patches 1 and 2.  DM patches 3 and 4
are provided just to give context for how DM will be updated to use
chunk_sectors.

Mike Snitzer (4):
  block: use lcm_not_zero() when stacking chunk_sectors
  block: allow 'chunk_sectors' to be non-power-of-2
  dm table: stack 'chunk_sectors' limit to account for target-specific
    splitting
  dm: unconditionally call blk_queue_split() in dm_process_bio()

 block/blk-settings.c   | 22 ++++++++++++----------
 drivers/md/dm-table.c  |  5 +++++
 drivers/md/dm.c        | 45 +--------------------------------------------
 include/linux/blkdev.h | 12 +++++++++---
 4 files changed, 27 insertions(+), 57 deletions(-)

-- 
2.15.0
* [PATCH v2 1/4] block: use lcm_not_zero() when stacking chunk_sectors

From: Mike Snitzer @ 2020-09-15 17:23 UTC
To: Jens Axboe; +Cc: linux-block, dm-devel, Vijayendra Suman, Ming Lei

Like 'io_opt', blk_stack_limits() should stack 'chunk_sectors' using
lcm_not_zero() rather than min_not_zero() -- otherwise the final
'chunk_sectors' could result in sub-optimal alignment of IO to
component devices in the IO stack.

Also, if 'chunk_sectors' isn't a multiple of 'physical_block_size'
then it is a bug in the driver and the device should be flagged as
'misaligned'.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-settings.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/block/blk-settings.c b/block/blk-settings.c
index 76a7e03bcd6c..b2e1a929a6db 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -534,6 +534,7 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
 
 	t->io_min = max(t->io_min, b->io_min);
 	t->io_opt = lcm_not_zero(t->io_opt, b->io_opt);
+	t->chunk_sectors = lcm_not_zero(t->chunk_sectors, b->chunk_sectors);
 
 	/* Physical block size a multiple of the logical block size? */
 	if (t->physical_block_size & (t->logical_block_size - 1)) {
@@ -556,6 +557,13 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
 		ret = -1;
 	}
 
+	/* chunk_sectors a multiple of the physical block size? */
+	if ((t->chunk_sectors << 9) & (t->physical_block_size - 1)) {
+		t->chunk_sectors = 0;
+		t->misaligned = 1;
+		ret = -1;
+	}
+
 	t->raid_partial_stripes_expensive =
 		max(t->raid_partial_stripes_expensive,
 		    b->raid_partial_stripes_expensive);
@@ -594,10 +602,6 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
 			t->discard_granularity;
 	}
 
-	if (b->chunk_sectors)
-		t->chunk_sectors = min_not_zero(t->chunk_sectors,
-						b->chunk_sectors);
-
 	t->zoned = max(t->zoned, b->zoned);
 	return ret;
 }
-- 
2.15.0
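[Editorial illustration] To see why lcm_not_zero() is the right stacking
function here, consider a worked example.  The sketch below is userland C
written for this write-up, not kernel code: the kernel's gcd()/lcm()/
lcm_not_zero() helpers are reimplemented locally, and the 192/256 sector
chunk sizes are purely illustrative.

#include <stdio.h>

/* Local reimplementations mirroring the kernel helpers. */
static unsigned int gcd(unsigned int a, unsigned int b)
{
	while (b) {
		unsigned int t = a % b;
		a = b;
		b = t;
	}
	return a;
}

static unsigned int lcm(unsigned int a, unsigned int b)
{
	return (a && b) ? (a / gcd(a, b)) * b : 0;
}

static unsigned int lcm_not_zero(unsigned int a, unsigned int b)
{
	unsigned int l = lcm(a, b);

	return l ? l : (a ? a : b);
}

int main(void)
{
	/* e.g. top layer striped every 192 sectors, bottom every 256 */
	unsigned int top = 192, bottom = 256;

	/*
	 * 768: an IO aligned to the stacked boundary never crosses a
	 * chunk on either layer.
	 */
	printf("lcm_not_zero: %u\n", lcm_not_zero(top, bottom));

	/*
	 * 192: a 192-sector boundary can land mid-chunk on the
	 * 256-sector layer, forcing sub-optimal splits there.
	 */
	printf("min_not_zero: %u\n", top < bottom ? top : bottom);
	return 0;
}

With lcm_not_zero() the stacked limit becomes 768, so IO aligned to the
stacked chunk boundary is aligned on both component devices, whereas the
old min_not_zero() result (192) is not a multiple of the bottom layer's
chunk size.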
* [PATCH v2 2/4] block: allow 'chunk_sectors' to be non-power-of-2

From: Mike Snitzer @ 2020-09-15 17:23 UTC
To: Jens Axboe; +Cc: linux-block, dm-devel, Vijayendra Suman, Ming Lei

It is possible for a block device to use a non-power-of-2 chunk size,
which results in a full-stripe size that is also a non-power-of-2.

Update blk_queue_chunk_sectors() and blk_max_size_offset() to
accommodate drivers that need a non-power-of-2 chunk_sectors.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-settings.c   | 10 ++++------
 include/linux/blkdev.h | 12 +++++++++---
 2 files changed, 13 insertions(+), 9 deletions(-)

diff --git a/block/blk-settings.c b/block/blk-settings.c
index b2e1a929a6db..5ea3de48afba 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -172,15 +172,13 @@ EXPORT_SYMBOL(blk_queue_max_hw_sectors);
  *
  * Description:
  *    If a driver doesn't want IOs to cross a given chunk size, it can set
- *    this limit and prevent merging across chunks. Note that the chunk size
- *    must currently be a power-of-2 in sectors. Also note that the block
- *    layer must accept a page worth of data at any offset. So if the
- *    crossing of chunks is a hard limitation in the driver, it must still be
- *    prepared to split single page bios.
+ *    this limit and prevent merging across chunks. Note that the block layer
+ *    must accept a page worth of data at any offset. So if the crossing of
+ *    chunks is a hard limitation in the driver, it must still be prepared
+ *    to split single page bios.
  **/
 void blk_queue_chunk_sectors(struct request_queue *q, unsigned int chunk_sectors)
 {
-	BUG_ON(!is_power_of_2(chunk_sectors));
 	q->limits.chunk_sectors = chunk_sectors;
 }
 EXPORT_SYMBOL(blk_queue_chunk_sectors);
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index bb5636cc17b9..bbfbda33e993 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1059,11 +1059,17 @@ static inline unsigned int blk_queue_get_max_sectors(struct request_queue *q,
 static inline unsigned int blk_max_size_offset(struct request_queue *q,
 					       sector_t offset)
 {
-	if (!q->limits.chunk_sectors)
+	unsigned int chunk_sectors = q->limits.chunk_sectors;
+
+	if (!chunk_sectors)
 		return q->limits.max_sectors;
 
-	return min(q->limits.max_sectors, (unsigned int)(q->limits.chunk_sectors -
-			(offset & (q->limits.chunk_sectors - 1))));
+	if (is_power_of_2(chunk_sectors))
+		chunk_sectors -= offset & (chunk_sectors - 1);
+	else
+		chunk_sectors -= sector_div(offset, chunk_sectors);
+
+	return min(q->limits.max_sectors, chunk_sectors);
 }
 
 static inline unsigned int blk_rq_get_max_sectors(struct request *rq,
-- 
2.15.0
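[Editorial illustration] For a power-of-2 chunk size, offset & (chunk_sectors - 1)
is simply a fast way to compute offset % chunk_sectors, which is what
sector_div() produces in the general case.  The following userland sketch
(an assumed illustration, not kernel code; sector_div() is approximated
here with the C modulo operator) shows both paths returning the sectors
remaining until the next chunk boundary:

#include <stdio.h>
#include <stdint.h>

static int is_power_of_2(unsigned int n)
{
	return n && !(n & (n - 1));
}

/* Sectors from 'offset' to the end of the chunk containing it. */
static unsigned int sectors_to_chunk_end(uint64_t offset,
					 unsigned int chunk_sectors)
{
	if (is_power_of_2(chunk_sectors))
		/* mask path: cheap remainder for power-of-2 chunks */
		return chunk_sectors - (offset & (chunk_sectors - 1));

	/* general path: the kernel's sector_div() yields offset % chunk */
	return chunk_sectors - (unsigned int)(offset % chunk_sectors);
}

int main(void)
{
	/* power-of-2 chunk: 1000 = 3*256 + 232, so 24 sectors remain */
	printf("%u\n", sectors_to_chunk_end(1000, 256));

	/* non-power-of-2 chunk (e.g. a 3-disk full stripe of 3 * 64):
	 * 1000 = 5*192 + 40, so 152 sectors remain */
	printf("%u\n", sectors_to_chunk_end(1000, 192));
	return 0;
}

Both branches compute the same quantity; the is_power_of_2() test just
preserves the cheaper mask for the common case.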
* [PATCH v2 3/4] dm table: stack 'chunk_sectors' limit to account for target-specific splitting

From: Mike Snitzer @ 2020-09-15 17:23 UTC
To: Jens Axboe; +Cc: linux-block, dm-devel, Vijayendra Suman, Ming Lei

If a target sets ti->max_io_len, it must be used when stacking the DM
device's queue_limits to establish a 'chunk_sectors' that is compatible
with the IO stack.  By using lcm_not_zero(), care is taken to avoid
blindly overriding the chunk_sectors limit stacked up by
blk_stack_limits().

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
---
 drivers/md/dm-table.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
index 5edc3079e7c1..248c5a1074a7 100644
--- a/drivers/md/dm-table.c
+++ b/drivers/md/dm-table.c
@@ -18,6 +18,7 @@
 #include <linux/mutex.h>
 #include <linux/delay.h>
 #include <linux/atomic.h>
+#include <linux/lcm.h>
 #include <linux/blk-mq.h>
 #include <linux/mount.h>
 #include <linux/dax.h>
@@ -1502,6 +1503,10 @@ int dm_calculate_queue_limits(struct dm_table *table,
 			zone_sectors = ti_limits.chunk_sectors;
 		}
 
+		/* Stack chunk_sectors if target-specific splitting is required */
+		if (ti->max_io_len)
+			ti_limits.chunk_sectors = lcm_not_zero(ti->max_io_len,
+							       ti_limits.chunk_sectors);
 		/* Set I/O hints portion of queue limits */
 		if (ti->type->io_hints)
 			ti->type->io_hints(ti, &ti_limits);
-- 
2.15.0
* [PATCH v2 4/4] dm: unconditionally call blk_queue_split() in dm_process_bio()

From: Mike Snitzer @ 2020-09-15 17:23 UTC
To: Jens Axboe; +Cc: linux-block, dm-devel, Vijayendra Suman, Ming Lei

blk_queue_split() has become compulsory from .submit_bio -- regardless
of whether it is recursing.  Update DM core to always call
blk_queue_split().

dm_queue_split() is removed because __split_and_process_bio() handles
splitting as needed.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
---
 drivers/md/dm.c | 45 +--------------------------------------------
 1 file changed, 1 insertion(+), 44 deletions(-)

diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index fb0255d25e4b..0bae9f26dc8e 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1530,22 +1530,6 @@ static int __send_write_zeroes(struct clone_info *ci, struct dm_target *ti)
 	return __send_changing_extent_only(ci, ti, get_num_write_zeroes_bios(ti));
 }
 
-static bool is_abnormal_io(struct bio *bio)
-{
-	bool r = false;
-
-	switch (bio_op(bio)) {
-	case REQ_OP_DISCARD:
-	case REQ_OP_SECURE_ERASE:
-	case REQ_OP_WRITE_SAME:
-	case REQ_OP_WRITE_ZEROES:
-		r = true;
-		break;
-	}
-
-	return r;
-}
-
 static bool __process_abnormal_io(struct clone_info *ci, struct dm_target *ti,
 				  int *result)
 {
@@ -1723,23 +1707,6 @@ static blk_qc_t __process_bio(struct mapped_device *md, struct dm_table *map,
 	return ret;
 }
 
-static void dm_queue_split(struct mapped_device *md, struct dm_target *ti, struct bio **bio)
-{
-	unsigned len, sector_count;
-
-	sector_count = bio_sectors(*bio);
-	len = min_t(sector_t, max_io_len((*bio)->bi_iter.bi_sector, ti), sector_count);
-
-	if (sector_count > len) {
-		struct bio *split = bio_split(*bio, len, GFP_NOIO, &md->queue->bio_split);
-
-		bio_chain(split, *bio);
-		trace_block_split(md->queue, split, (*bio)->bi_iter.bi_sector);
-		submit_bio_noacct(*bio);
-		*bio = split;
-	}
-}
-
 static blk_qc_t dm_process_bio(struct mapped_device *md,
 			       struct dm_table *map, struct bio *bio)
 {
@@ -1759,17 +1726,7 @@ static blk_qc_t dm_process_bio(struct mapped_device *md,
 		}
 	}
 
-	/*
-	 * If in ->queue_bio we need to use blk_queue_split(), otherwise
-	 * queue_limits for abnormal requests (e.g. discard, writesame, etc)
-	 * won't be imposed.
-	 */
-	if (current->bio_list) {
-		if (is_abnormal_io(bio))
-			blk_queue_split(&bio);
-		else
-			dm_queue_split(md, ti, &bio);
-	}
+	blk_queue_split(&bio);
 
 	if (dm_get_md_type(md) == DM_TYPE_NVME_BIO_BASED)
 		return __process_bio(md, map, bio, ti);
-- 
2.15.0
* Re: [PATCH v2 4/4] dm: unconditionally call blk_queue_split() in dm_process_bio()

From: Ming Lei @ 2020-09-16  1:08 UTC
To: Mike Snitzer; +Cc: Jens Axboe, Vijayendra Suman, dm-devel, linux-block

On Tue, Sep 15, 2020 at 01:23:57PM -0400, Mike Snitzer wrote:
> blk_queue_split() has become compulsory from .submit_bio -- regardless
> of whether it is recursing.  Update DM core to always call
> blk_queue_split().
>
> dm_queue_split() is removed because __split_and_process_bio() handles
> splitting as needed.
>
> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
> ---
[...]
> -static void dm_queue_split(struct mapped_device *md, struct dm_target *ti, struct bio **bio)
> -{
> -	unsigned len, sector_count;
> -
> -	sector_count = bio_sectors(*bio);
> -	len = min_t(sector_t, max_io_len((*bio)->bi_iter.bi_sector, ti), sector_count);
> -
> -	if (sector_count > len) {
> -		struct bio *split = bio_split(*bio, len, GFP_NOIO, &md->queue->bio_split);
> -
> -		bio_chain(split, *bio);
> -		trace_block_split(md->queue, split, (*bio)->bi_iter.bi_sector);
> -		submit_bio_noacct(*bio);
> -		*bio = split;
> -	}
> -}
[...]
> -	if (current->bio_list) {
> -		if (is_abnormal_io(bio))
> -			blk_queue_split(&bio);
> -		else
> -			dm_queue_split(md, ti, &bio);
> -	}
> +	blk_queue_split(&bio);

In max_io_len(), the target boundary is taken into account when figuring
out the max io len.  However, this info won't be used any more after
switching to blk_queue_split().  Is that a potential problem?

thanks,
Ming
* Re: [PATCH v2 4/4] dm: unconditionally call blk_queue_split() in dm_process_bio()

From: Mike Snitzer @ 2020-09-16  1:28 UTC
To: Ming Lei; +Cc: Jens Axboe, Vijayendra Suman, dm-devel, linux-block

On Tue, Sep 15 2020 at  9:08pm -0400,
Ming Lei <ming.lei@redhat.com> wrote:

> On Tue, Sep 15, 2020 at 01:23:57PM -0400, Mike Snitzer wrote:
> > blk_queue_split() has become compulsory from .submit_bio -- regardless
> > of whether it is recursing.  Update DM core to always call
> > blk_queue_split().
> >
> > dm_queue_split() is removed because __split_and_process_bio() handles
> > splitting as needed.
[...]
> In max_io_len(), the target boundary is taken into account when figuring
> out the max io len.  However, this info won't be used any more after
> switching to blk_queue_split().  Is that a potential problem?

Thanks for your review.  But no, as the patch header says:
"dm_queue_split() is removed because __split_and_process_bio() handles
splitting as needed."

(__split_and_process_non_flush calls max_io_len, as does
__process_abnormal_io by calling __send_changing_extent_only)

So the blk_queue_split() bio will be further split if needed (due to
DM target boundary, etc).

Mike
* Re: [PATCH v2 4/4] dm: unconditionally call blk_queue_split() in dm_process_bio()

From: Ming Lei @ 2020-09-16  1:48 UTC
To: Mike Snitzer; +Cc: Jens Axboe, Vijayendra Suman, dm-devel, linux-block

On Tue, Sep 15, 2020 at 09:28:14PM -0400, Mike Snitzer wrote:
> On Tue, Sep 15 2020 at  9:08pm -0400,
> Ming Lei <ming.lei@redhat.com> wrote:
>
[...]
> > In max_io_len(), the target boundary is taken into account when
> > figuring out the max io len.  However, this info won't be used any
> > more after switching to blk_queue_split().  Is that a potential
> > problem?
>
> Thanks for your review.  But no, as the patch header says:
> "dm_queue_split() is removed because __split_and_process_bio() handles
> splitting as needed."
>
> (__split_and_process_non_flush calls max_io_len, as does
> __process_abnormal_io by calling __send_changing_extent_only)
>
> So the blk_queue_split() bio will be further split if needed (due to
> DM target boundary, etc).

Thanks for your explanation.

Then it looks like there is a double split issue, since both
blk_queue_split() and __split_and_process_non_flush() may split a bio
from the same bioset (md->queue->bio_split), and this way may cause a
deadlock -- see the comment of bio_alloc_bioset(), especially the
paragraph 'callers must never allocate more than 1 bio at a time from
this pool.'

Thanks,
Ming
* Re: [PATCH v2 4/4] dm: unconditionally call blk_queue_split() in dm_process_bio()

From: Mike Snitzer @ 2020-09-16  3:39 UTC
To: Ming Lei; +Cc: Jens Axboe, Vijayendra Suman, dm-devel, linux-block

On Tue, Sep 15 2020 at  9:48pm -0400,
Ming Lei <ming.lei@redhat.com> wrote:

> On Tue, Sep 15, 2020 at 09:28:14PM -0400, Mike Snitzer wrote:
>
[...]
> > (__split_and_process_non_flush calls max_io_len, as does
> > __process_abnormal_io by calling __send_changing_extent_only)
> >
> > So the blk_queue_split() bio will be further split if needed (due to
> > DM target boundary, etc).
>
> Thanks for your explanation.
>
> Then it looks like there is a double split issue, since both
> blk_queue_split() and __split_and_process_non_flush() may split a bio
> from the same bioset (md->queue->bio_split), and this way may cause a
> deadlock -- see the comment of bio_alloc_bioset(), especially the
> paragraph 'callers must never allocate more than 1 bio at a time from
> this pool.'

Next sentence is:
"Callers that need to allocate more than 1 bio must always submit the
previously allocated bio for IO before attempting to allocate a new
one."

  __split_and_process_non_flush -> __map_bio -> submit_bio_noacct
  bio_split
  submit_bio_noacct

With commit 18a25da84354c, NeilBrown wrote __split_and_process_bio()
with an eye toward depth-first submission to avoid this deadlock you're
concerned about.  That commit header speaks to it directly.

I did go on to change Neil's code a bit with commit f21c601a2bb31 --
but I _think_ the current code is still OK relative to bio_split
mempool use.

Mike
* Re: [PATCH v2 4/4] dm: unconditionally call blk_queue_split() in dm_process_bio()

From: Ming Lei @ 2020-09-16  7:51 UTC
To: Mike Snitzer; +Cc: Jens Axboe, Vijayendra Suman, dm-devel, linux-block

On Tue, Sep 15, 2020 at 11:39:46PM -0400, Mike Snitzer wrote:
> On Tue, Sep 15 2020 at  9:48pm -0400,
> Ming Lei <ming.lei@redhat.com> wrote:
>
[...]
> > Then it looks like there is a double split issue, since both
> > blk_queue_split() and __split_and_process_non_flush() may split a bio
> > from the same bioset (md->queue->bio_split), and this way may cause a
> > deadlock -- see the comment of bio_alloc_bioset(), especially the
> > paragraph 'callers must never allocate more than 1 bio at a time from
> > this pool.'
>
> Next sentence is:
> "Callers that need to allocate more than 1 bio must always submit the
> previously allocated bio for IO before attempting to allocate a new
> one."

Yeah, I know that.  That sentence actually means the previous submission
has to make forward progress, so the bio can be completed & freed and
the new allocation can move on.  However, in this situation
__split_and_process_non_flush() doesn't provide such forward progress,
see below.

>   __split_and_process_non_flush -> __map_bio -> submit_bio_noacct
>   bio_split
>   submit_bio_noacct

Yeah, the above submission is done on the clone bio & the underlying
queue.  What matters is whether that submission can make forward
progress.

After __split_and_process_non_flush() returns, the split 'bio' (the
original bio) can't be completed by the previous submission, because
this bio won't be freed until dec_pending() from
__split_and_process_bio() returns.

So when ci.sector_count doesn't reach zero and bio_split() is called
again from the same bio_set to allocate a new bio, the allocation may
never be made, because the original bio -- allocated from the same
bio_set -- can't be freed during bio_split().

Thanks,
Ming
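[Editorial illustration] The circular wait Ming describes can be sketched
with a loose userland analogy (an assumed illustration, not kernel code:
bio_alloc_bioset()'s forward-progress guarantee comes from a mempool
reserve, modeled here as a pool holding a single object, and the
non-blocking try-alloc stands in for a mempool_alloc() that would sleep):

#include <stdio.h>
#include <stdlib.h>

/* A one-object "mempool" reserve. */
struct pool {
	int available;
};

static void *pool_try_alloc(struct pool *p)
{
	if (!p->available)
		return NULL;	/* a real mempool_alloc() would sleep here */
	p->available = 0;
	return malloc(1);
}

static void pool_free(struct pool *p, void *obj)
{
	free(obj);
	p->available = 1;
}

int main(void)
{
	struct pool bio_split_pool = { .available = 1 };

	/* 1. first split: a bio is allocated from the pool and held */
	void *orig = pool_try_alloc(&bio_split_pool);

	/*
	 * 2. second split from the same pool while 'orig' is still held:
	 *    a blocking allocation could sleep forever, since freeing
	 *    'orig' depends on this very allocation succeeding first.
	 */
	void *second = pool_try_alloc(&bio_split_pool);
	if (!second)
		printf("second allocation would block: potential deadlock\n");

	pool_free(&bio_split_pool, orig);
	return 0;
}

The analogy's point: holding one object from the pool while waiting for a
second inverts the dependency the mempool contract relies on, which is
exactly why the bio_alloc_bioset() comment forbids it.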