* fix nr_phys_segments vs iterators accounting v3 @ 2019-05-21 7:01 Christoph Hellwig 2019-05-21 7:01 ` [PATCH 1/4] block: don't decrement nr_phys_segments for physically contigous segments Christoph Hellwig ` (4 more replies) 0 siblings, 5 replies; 16+ messages in thread From: Christoph Hellwig @ 2019-05-21 7:01 UTC (permalink / raw) To: axboe; +Cc: ming.lei, linux-block Hi all, we have had a problem for a while where the number of segments that the bvec iterators will iterate over don't match the value in req->nr_phys_segments, causing problems for anyone looking at nr_phys_segments and iterating over bvec directly instead of using blk_rq_map_sg. The first patch in this series fixes this by making sure nr_phys_segments matches the actual number of segments. Drivers using blk_rq_map_sg will still get the lower number returned from function eventually, but the fact that we don't reduce the value earlier will not allow some merges that we might otherwise allow. With that in place I also noticed that we do not properly account segements sizes on devices with a virt_boundary, but it turns out that segment sizes fundamentally don't make sense for such devices, as their "segment" is a fixed size "device page", and not a variable sized scatter/gather elements as in the block layer, so we make that fact formal. Once all that is sorted out it is pretty clear that there is no good reason to have the front/back segement accounting to start with. Changes since v2: - fix some fixes tags Changes since v1: - update a commit log - add fixes tags - drop the follow on patches not suitable for 5.2 ^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH 1/4] block: don't decrement nr_phys_segments for physically contigous segments 2019-05-21 7:01 fix nr_phys_segments vs iterators accounting v3 Christoph Hellwig @ 2019-05-21 7:01 ` Christoph Hellwig 2019-05-21 8:05 ` Ming Lei 2019-05-21 7:01 ` [PATCH 2/4] block: force an unlimited segment size on queues with a virt boundary Christoph Hellwig ` (3 subsequent siblings) 4 siblings, 1 reply; 16+ messages in thread From: Christoph Hellwig @ 2019-05-21 7:01 UTC (permalink / raw) To: axboe; +Cc: ming.lei, linux-block, Hannes Reinecke Currently ll_merge_requests_fn, unlike all other merge functions, reduces nr_phys_segments by one if the last segment of the previous, and the first segment of the next segement are contigous. While this seems like a nice solution to avoid building smaller than possible requests it causes a mismatch between the segments actually present in the request and those iterated over by the bvec iterators, including __rq_for_each_bio. This can for example mistrigger the single segment optimization in the nvme-pci driver, and might lead to mismatching nr_phys_segments number when recalculating the number of request when inserting a cloned request. We could possibly work around this by making the bvec iterators take the front and back segment size into account, but that would require moving them from the bio to the bio_iter and spreading this mess over all users of bvecs. Or we could simply remove this optimization under the assumption that most users already build good enough bvecs, and that the bio merge patch never cared about this optimization either. The latter is what this patch does. dff824b2aadb ("nvme-pci: optimize mapping of small single segment requests"). Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.com> --- block/blk-merge.c | 23 +---------------------- 1 file changed, 1 insertion(+), 22 deletions(-) diff --git a/block/blk-merge.c b/block/blk-merge.c index 21e87a714a73..80a5a0facb87 100644 --- a/block/blk-merge.c +++ b/block/blk-merge.c @@ -358,7 +358,6 @@ static unsigned int __blk_recalc_rq_segments(struct request_queue *q, unsigned front_seg_size; struct bio *fbio, *bbio; struct bvec_iter iter; - bool new_bio = false; if (!bio) return 0; @@ -379,31 +378,12 @@ static unsigned int __blk_recalc_rq_segments(struct request_queue *q, nr_phys_segs = 0; for_each_bio(bio) { bio_for_each_bvec(bv, bio, iter) { - if (new_bio) { - if (seg_size + bv.bv_len - > queue_max_segment_size(q)) - goto new_segment; - if (!biovec_phys_mergeable(q, &bvprv, &bv)) - goto new_segment; - - seg_size += bv.bv_len; - - if (nr_phys_segs == 1 && seg_size > - front_seg_size) - front_seg_size = seg_size; - - continue; - } -new_segment: bvec_split_segs(q, &bv, &nr_phys_segs, &seg_size, &front_seg_size, NULL, UINT_MAX); - new_bio = false; } bbio = bio; - if (likely(bio->bi_iter.bi_size)) { + if (likely(bio->bi_iter.bi_size)) bvprv = bv; - new_bio = true; - } } fbio->bi_seg_front_size = front_seg_size; @@ -725,7 +705,6 @@ static int ll_merge_requests_fn(struct request_queue *q, struct request *req, req->bio->bi_seg_front_size = seg_size; if (next->nr_phys_segments == 1) next->biotail->bi_seg_back_size = seg_size; - total_phys_segments--; } if (total_phys_segments > queue_max_segments(q)) -- 2.20.1 ^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH 1/4] block: don't decrement nr_phys_segments for physically contigous segments 2019-05-21 7:01 ` [PATCH 1/4] block: don't decrement nr_phys_segments for physically contigous segments Christoph Hellwig @ 2019-05-21 8:05 ` Ming Lei 0 siblings, 0 replies; 16+ messages in thread From: Ming Lei @ 2019-05-21 8:05 UTC (permalink / raw) To: Christoph Hellwig; +Cc: axboe, linux-block, Hannes Reinecke On Tue, May 21, 2019 at 09:01:40AM +0200, Christoph Hellwig wrote: > Currently ll_merge_requests_fn, unlike all other merge functions, > reduces nr_phys_segments by one if the last segment of the previous, > and the first segment of the next segement are contigous. While this > seems like a nice solution to avoid building smaller than possible > requests it causes a mismatch between the segments actually present > in the request and those iterated over by the bvec iterators, including > __rq_for_each_bio. This can for example mistrigger the single segment > optimization in the nvme-pci driver, and might lead to mismatching > nr_phys_segments number when recalculating the number of request > when inserting a cloned request. > > We could possibly work around this by making the bvec iterators take > the front and back segment size into account, but that would require > moving them from the bio to the bio_iter and spreading this mess > over all users of bvecs. Or we could simply remove this optimization > under the assumption that most users already build good enough bvecs, > and that the bio merge patch never cared about this optimization > either. The latter is what this patch does. > > dff824b2aadb ("nvme-pci: optimize mapping of small single segment requests"). > Signed-off-by: Christoph Hellwig <hch@lst.de> > Reviewed-by: Hannes Reinecke <hare@suse.com> > --- > block/blk-merge.c | 23 +---------------------- > 1 file changed, 1 insertion(+), 22 deletions(-) > > diff --git a/block/blk-merge.c b/block/blk-merge.c > index 21e87a714a73..80a5a0facb87 100644 > --- a/block/blk-merge.c > +++ b/block/blk-merge.c > @@ -358,7 +358,6 @@ static unsigned int __blk_recalc_rq_segments(struct request_queue *q, > unsigned front_seg_size; > struct bio *fbio, *bbio; > struct bvec_iter iter; > - bool new_bio = false; > > if (!bio) > return 0; > @@ -379,31 +378,12 @@ static unsigned int __blk_recalc_rq_segments(struct request_queue *q, > nr_phys_segs = 0; > for_each_bio(bio) { > bio_for_each_bvec(bv, bio, iter) { > - if (new_bio) { > - if (seg_size + bv.bv_len > - > queue_max_segment_size(q)) > - goto new_segment; > - if (!biovec_phys_mergeable(q, &bvprv, &bv)) > - goto new_segment; > - > - seg_size += bv.bv_len; > - > - if (nr_phys_segs == 1 && seg_size > > - front_seg_size) > - front_seg_size = seg_size; > - > - continue; > - } > -new_segment: > bvec_split_segs(q, &bv, &nr_phys_segs, &seg_size, > &front_seg_size, NULL, UINT_MAX); > - new_bio = false; > } > bbio = bio; > - if (likely(bio->bi_iter.bi_size)) { > + if (likely(bio->bi_iter.bi_size)) > bvprv = bv; > - new_bio = true; > - } > } > > fbio->bi_seg_front_size = front_seg_size; > @@ -725,7 +705,6 @@ static int ll_merge_requests_fn(struct request_queue *q, struct request *req, > req->bio->bi_seg_front_size = seg_size; > if (next->nr_phys_segments == 1) > next->biotail->bi_seg_back_size = seg_size; > - total_phys_segments--; > } > > if (total_phys_segments > queue_max_segments(q)) > -- > 2.20.1 > Reviewed-by: Ming Lei <ming.lei@redhat.com> Thanks, Ming ^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH 2/4] block: force an unlimited segment size on queues with a virt boundary 2019-05-21 7:01 fix nr_phys_segments vs iterators accounting v3 Christoph Hellwig 2019-05-21 7:01 ` [PATCH 1/4] block: don't decrement nr_phys_segments for physically contigous segments Christoph Hellwig @ 2019-05-21 7:01 ` Christoph Hellwig 2019-07-23 15:48 ` James Bottomley 2019-05-21 7:01 ` [PATCH 3/4] block: remove the segment size check in bio_will_gap Christoph Hellwig ` (2 subsequent siblings) 4 siblings, 1 reply; 16+ messages in thread From: Christoph Hellwig @ 2019-05-21 7:01 UTC (permalink / raw) To: axboe; +Cc: ming.lei, linux-block, Hannes Reinecke We currently fail to update the front/back segment size in the bio when deciding to allow an otherwise gappy segement to a device with a virt boundary. The reason why this did not cause problems is that devices with a virt boundary fundamentally don't use segments as we know it and thus don't care. Make that assumption formal by forcing an unlimited segement size in this case. Fixes: f6970f83ef79 ("block: don't check if adjacent bvecs in one bio can be mergeable") Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Hannes Reinecke <hare@suse.com> --- block/blk-settings.c | 11 +++++++++++ 1 file changed, 11 insertions(+) diff --git a/block/blk-settings.c b/block/blk-settings.c index 3facc41476be..2ae348c101a0 100644 --- a/block/blk-settings.c +++ b/block/blk-settings.c @@ -310,6 +310,9 @@ void blk_queue_max_segment_size(struct request_queue *q, unsigned int max_size) __func__, max_size); } + /* see blk_queue_virt_boundary() for the explanation */ + WARN_ON_ONCE(q->limits.virt_boundary_mask); + q->limits.max_segment_size = max_size; } EXPORT_SYMBOL(blk_queue_max_segment_size); @@ -742,6 +745,14 @@ EXPORT_SYMBOL(blk_queue_segment_boundary); void blk_queue_virt_boundary(struct request_queue *q, unsigned long mask) { q->limits.virt_boundary_mask = mask; + + /* + * Devices that require a virtual boundary do not support scatter/gather + * I/O natively, but instead require a descriptor list entry for each + * page (which might not be idential to the Linux PAGE_SIZE). Because + * of that they are not limited by our notion of "segment size". + */ + q->limits.max_segment_size = UINT_MAX; } EXPORT_SYMBOL(blk_queue_virt_boundary); -- 2.20.1 ^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH 2/4] block: force an unlimited segment size on queues with a virt boundary 2019-05-21 7:01 ` [PATCH 2/4] block: force an unlimited segment size on queues with a virt boundary Christoph Hellwig @ 2019-07-23 15:48 ` James Bottomley 2019-07-23 16:11 ` Christoph Hellwig 0 siblings, 1 reply; 16+ messages in thread From: James Bottomley @ 2019-07-23 15:48 UTC (permalink / raw) To: Christoph Hellwig, axboe; +Cc: ming.lei, linux-block, Hannes Reinecke On Tue, 2019-05-21 at 09:01 +0200, Christoph Hellwig wrote: > We currently fail to update the front/back segment size in the bio > when > deciding to allow an otherwise gappy segement to a device with a > virt boundary. The reason why this did not cause problems is that > devices with a virt boundary fundamentally don't use segments as we > know it and thus don't care. Make that assumption formal by forcing > an unlimited segement size in this case. > > Fixes: f6970f83ef79 ("block: don't check if adjacent bvecs in one bio > can be mergeable") > Signed-off-by: Christoph Hellwig <hch@lst.de> > Reviewed-by: Ming Lei <ming.lei@redhat.com> > Reviewed-by: Hannes Reinecke <hare@suse.com> > --- > block/blk-settings.c | 11 +++++++++++ > 1 file changed, 11 insertions(+) > > diff --git a/block/blk-settings.c b/block/blk-settings.c > index 3facc41476be..2ae348c101a0 100644 > --- a/block/blk-settings.c > +++ b/block/blk-settings.c > @@ -310,6 +310,9 @@ void blk_queue_max_segment_size(struct > request_queue *q, unsigned int max_size) > __func__, max_size); > } > > + /* see blk_queue_virt_boundary() for the explanation */ > + WARN_ON_ONCE(q->limits.virt_boundary_mask); > + > q->limits.max_segment_size = max_size; > } > EXPORT_SYMBOL(blk_queue_max_segment_size); > @@ -742,6 +745,14 @@ EXPORT_SYMBOL(blk_queue_segment_boundary); > void blk_queue_virt_boundary(struct request_queue *q, unsigned long > mask) > { > q->limits.virt_boundary_mask = mask; > + > + /* > + * Devices that require a virtual boundary do not support > scatter/gather > + * I/O natively, but instead require a descriptor list entry > for each > + * page (which might not be idential to the Linux > PAGE_SIZE). Because > + * of that they are not limited by our notion of "segment > size". > + */ > + q->limits.max_segment_size = UINT_MAX; This needs to be conditional. If you did set the max segment size, the block layer should still respect it and not do an override here. This seems to be what's causing a regression in SCSI because we set the max_segment_size first then call the blk_queue_virt_boundary. Additionally, if you're not setting the virt boundary (mask == 0) this should not be set. I suggest the below. James --- diff --git a/block/blk-settings.c b/block/blk-settings.c index 2ae348c101a0..46a95536f3bd 100644 --- a/block/blk-settings.c +++ b/block/blk-settings.c @@ -752,7 +752,8 @@ void blk_queue_virt_boundary(struct request_queue *q, unsigned long mask) * page (which might not be idential to the Linux PAGE_SIZE). Because * of that they are not limited by our notion of "segment size". */ - q->limits.max_segment_size = UINT_MAX; + if (mask != 0 && q->limits.max_segment_size == BLK_MAX_SEGMENT_SIZE) + q->limits.max_segment_size = UINT_MAX; } EXPORT_SYMBOL(blk_queue_virt_boundary); ^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH 2/4] block: force an unlimited segment size on queues with a virt boundary 2019-07-23 15:48 ` James Bottomley @ 2019-07-23 16:11 ` Christoph Hellwig 0 siblings, 0 replies; 16+ messages in thread From: Christoph Hellwig @ 2019-07-23 16:11 UTC (permalink / raw) To: James Bottomley Cc: Christoph Hellwig, axboe, ming.lei, linux-block, Hannes Reinecke On Tue, Jul 23, 2019 at 08:48:52AM -0700, James Bottomley wrote: > diff --git a/block/blk-settings.c b/block/blk-settings.c > index 2ae348c101a0..46a95536f3bd 100644 > --- a/block/blk-settings.c > +++ b/block/blk-settings.c > @@ -752,7 +752,8 @@ void blk_queue_virt_boundary(struct request_queue *q, unsigned long mask) > * page (which might not be idential to the Linux PAGE_SIZE). Because > * of that they are not limited by our notion of "segment size". > */ > - q->limits.max_segment_size = UINT_MAX; > + if (mask != 0 && q->limits.max_segment_size == BLK_MAX_SEGMENT_SIZE) > + q->limits.max_segment_size = UINT_MAX; The first check makes sense, defintively safer than leaving it to the caller. The second one is wrong - we need to force an unlimited segment size because we can't account for it for the virt_boundary merges. And the comment just above explains why that is safe. ^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH 3/4] block: remove the segment size check in bio_will_gap 2019-05-21 7:01 fix nr_phys_segments vs iterators accounting v3 Christoph Hellwig 2019-05-21 7:01 ` [PATCH 1/4] block: don't decrement nr_phys_segments for physically contigous segments Christoph Hellwig 2019-05-21 7:01 ` [PATCH 2/4] block: force an unlimited segment size on queues with a virt boundary Christoph Hellwig @ 2019-05-21 7:01 ` Christoph Hellwig 2019-05-21 7:01 ` [PATCH 4/4] block: remove the bi_seg_{front,back}_size fields in struct bio Christoph Hellwig 2019-05-21 12:47 ` fix nr_phys_segments vs iterators accounting v3 Jens Axboe 4 siblings, 0 replies; 16+ messages in thread From: Christoph Hellwig @ 2019-05-21 7:01 UTC (permalink / raw) To: axboe; +Cc: ming.lei, linux-block, Hannes Reinecke We fundamentally do not have a maximum segement size for devices with a virt boundary. So don't bother checking it, especially given that the existing checks didn't properly work to start with as we never fully update the front/back segment size and miss the bi_seg_front_size that wuld have been required for some cases. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Hannes Reinecke <hare@suse.com> --- block/blk-merge.c | 19 +------------------ 1 file changed, 1 insertion(+), 18 deletions(-) diff --git a/block/blk-merge.c b/block/blk-merge.c index 80a5a0facb87..eee2c02c50ce 100644 --- a/block/blk-merge.c +++ b/block/blk-merge.c @@ -12,23 +12,6 @@ #include "blk.h" -/* - * Check if the two bvecs from two bios can be merged to one segment. If yes, - * no need to check gap between the two bios since the 1st bio and the 1st bvec - * in the 2nd bio can be handled in one segment. - */ -static inline bool bios_segs_mergeable(struct request_queue *q, - struct bio *prev, struct bio_vec *prev_last_bv, - struct bio_vec *next_first_bv) -{ - if (!biovec_phys_mergeable(q, prev_last_bv, next_first_bv)) - return false; - if (prev->bi_seg_back_size + next_first_bv->bv_len > - queue_max_segment_size(q)) - return false; - return true; -} - static inline bool bio_will_gap(struct request_queue *q, struct request *prev_rq, struct bio *prev, struct bio *next) { @@ -60,7 +43,7 @@ static inline bool bio_will_gap(struct request_queue *q, */ bio_get_last_bvec(prev, &pb); bio_get_first_bvec(next, &nb); - if (bios_segs_mergeable(q, prev, &pb, &nb)) + if (biovec_phys_mergeable(q, &pb, &nb)) return false; return __bvec_gap_to_prev(q, &pb, nb.bv_offset); } -- 2.20.1 ^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH 4/4] block: remove the bi_seg_{front,back}_size fields in struct bio 2019-05-21 7:01 fix nr_phys_segments vs iterators accounting v3 Christoph Hellwig ` (2 preceding siblings ...) 2019-05-21 7:01 ` [PATCH 3/4] block: remove the segment size check in bio_will_gap Christoph Hellwig @ 2019-05-21 7:01 ` Christoph Hellwig 2019-05-21 8:06 ` Ming Lei 2019-05-21 12:47 ` fix nr_phys_segments vs iterators accounting v3 Jens Axboe 4 siblings, 1 reply; 16+ messages in thread From: Christoph Hellwig @ 2019-05-21 7:01 UTC (permalink / raw) To: axboe; +Cc: ming.lei, linux-block, Hannes Reinecke At this point these fields aren't used for anything, so we can remove them. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.com> --- block/blk-merge.c | 94 +++++---------------------------------- include/linux/blk_types.h | 7 --- 2 files changed, 12 insertions(+), 89 deletions(-) diff --git a/block/blk-merge.c b/block/blk-merge.c index eee2c02c50ce..17713d7d98d5 100644 --- a/block/blk-merge.c +++ b/block/blk-merge.c @@ -162,8 +162,7 @@ static unsigned get_max_segment_size(struct request_queue *q, * variables. */ static bool bvec_split_segs(struct request_queue *q, struct bio_vec *bv, - unsigned *nsegs, unsigned *last_seg_size, - unsigned *front_seg_size, unsigned *sectors, unsigned max_segs) + unsigned *nsegs, unsigned *sectors, unsigned max_segs) { unsigned len = bv->bv_len; unsigned total_len = 0; @@ -185,28 +184,12 @@ static bool bvec_split_segs(struct request_queue *q, struct bio_vec *bv, break; } - if (!new_nsegs) - return !!len; - - /* update front segment size */ - if (!*nsegs) { - unsigned first_seg_size; - - if (new_nsegs == 1) - first_seg_size = get_max_segment_size(q, bv->bv_offset); - else - first_seg_size = queue_max_segment_size(q); - - if (*front_seg_size < first_seg_size) - *front_seg_size = first_seg_size; + if (new_nsegs) { + *nsegs += new_nsegs; + if (sectors) + *sectors += total_len >> 9; } - /* update other varibles */ - *last_seg_size = seg_size; - *nsegs += new_nsegs; - if (sectors) - *sectors += total_len >> 9; - /* split in the middle of the bvec if len != 0 */ return !!len; } @@ -218,8 +201,7 @@ static struct bio *blk_bio_segment_split(struct request_queue *q, { struct bio_vec bv, bvprv, *bvprvp = NULL; struct bvec_iter iter; - unsigned seg_size = 0, nsegs = 0, sectors = 0; - unsigned front_seg_size = bio->bi_seg_front_size; + unsigned nsegs = 0, sectors = 0; bool do_split = true; struct bio *new = NULL; const unsigned max_sectors = get_max_io_size(q, bio); @@ -243,8 +225,6 @@ static struct bio *blk_bio_segment_split(struct request_queue *q, /* split in the middle of bvec */ bv.bv_len = (max_sectors - sectors) << 9; bvec_split_segs(q, &bv, &nsegs, - &seg_size, - &front_seg_size, §ors, max_segs); } goto split; @@ -258,12 +238,9 @@ static struct bio *blk_bio_segment_split(struct request_queue *q, if (bv.bv_offset + bv.bv_len <= PAGE_SIZE) { nsegs++; - seg_size = bv.bv_len; sectors += bv.bv_len >> 9; - if (nsegs == 1 && seg_size > front_seg_size) - front_seg_size = seg_size; - } else if (bvec_split_segs(q, &bv, &nsegs, &seg_size, - &front_seg_size, §ors, max_segs)) { + } else if (bvec_split_segs(q, &bv, &nsegs, §ors, + max_segs)) { goto split; } } @@ -278,10 +255,6 @@ static struct bio *blk_bio_segment_split(struct request_queue *q, bio = new; } - bio->bi_seg_front_size = front_seg_size; - if (seg_size > bio->bi_seg_back_size) - bio->bi_seg_back_size = seg_size; - return do_split ? new : NULL; } @@ -336,17 +309,13 @@ EXPORT_SYMBOL(blk_queue_split); static unsigned int __blk_recalc_rq_segments(struct request_queue *q, struct bio *bio) { - struct bio_vec uninitialized_var(bv), bvprv = { NULL }; - unsigned int seg_size, nr_phys_segs; - unsigned front_seg_size; - struct bio *fbio, *bbio; + unsigned int nr_phys_segs = 0; struct bvec_iter iter; + struct bio_vec bv; if (!bio) return 0; - front_seg_size = bio->bi_seg_front_size; - switch (bio_op(bio)) { case REQ_OP_DISCARD: case REQ_OP_SECURE_ERASE: @@ -356,23 +325,11 @@ static unsigned int __blk_recalc_rq_segments(struct request_queue *q, return 1; } - fbio = bio; - seg_size = 0; - nr_phys_segs = 0; for_each_bio(bio) { - bio_for_each_bvec(bv, bio, iter) { - bvec_split_segs(q, &bv, &nr_phys_segs, &seg_size, - &front_seg_size, NULL, UINT_MAX); - } - bbio = bio; - if (likely(bio->bi_iter.bi_size)) - bvprv = bv; + bio_for_each_bvec(bv, bio, iter) + bvec_split_segs(q, &bv, &nr_phys_segs, NULL, UINT_MAX); } - fbio->bi_seg_front_size = front_seg_size; - if (seg_size > bbio->bi_seg_back_size) - bbio->bi_seg_back_size = seg_size; - return nr_phys_segs; } @@ -392,24 +349,6 @@ void blk_recount_segments(struct request_queue *q, struct bio *bio) bio_set_flag(bio, BIO_SEG_VALID); } -static int blk_phys_contig_segment(struct request_queue *q, struct bio *bio, - struct bio *nxt) -{ - struct bio_vec end_bv = { NULL }, nxt_bv; - - if (bio->bi_seg_back_size + nxt->bi_seg_front_size > - queue_max_segment_size(q)) - return 0; - - if (!bio_has_data(bio)) - return 1; - - bio_get_last_bvec(bio, &end_bv); - bio_get_first_bvec(nxt, &nxt_bv); - - return biovec_phys_mergeable(q, &end_bv, &nxt_bv); -} - static inline struct scatterlist *blk_next_sg(struct scatterlist **sg, struct scatterlist *sglist) { @@ -669,8 +608,6 @@ static int ll_merge_requests_fn(struct request_queue *q, struct request *req, struct request *next) { int total_phys_segments; - unsigned int seg_size = - req->biotail->bi_seg_back_size + next->bio->bi_seg_front_size; if (req_gap_back_merge(req, next->bio)) return 0; @@ -683,13 +620,6 @@ static int ll_merge_requests_fn(struct request_queue *q, struct request *req, return 0; total_phys_segments = req->nr_phys_segments + next->nr_phys_segments; - if (blk_phys_contig_segment(q, req->biotail, next->bio)) { - if (req->nr_phys_segments == 1) - req->bio->bi_seg_front_size = seg_size; - if (next->nr_phys_segments == 1) - next->biotail->bi_seg_back_size = seg_size; - } - if (total_phys_segments > queue_max_segments(q)) return 0; diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h index be418275763c..95202f80676c 100644 --- a/include/linux/blk_types.h +++ b/include/linux/blk_types.h @@ -159,13 +159,6 @@ struct bio { */ unsigned int bi_phys_segments; - /* - * To keep track of the max segment size, we account for the - * sizes of the first and last mergeable segments in this bio. - */ - unsigned int bi_seg_front_size; - unsigned int bi_seg_back_size; - struct bvec_iter bi_iter; atomic_t __bi_remaining; -- 2.20.1 ^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH 4/4] block: remove the bi_seg_{front,back}_size fields in struct bio 2019-05-21 7:01 ` [PATCH 4/4] block: remove the bi_seg_{front,back}_size fields in struct bio Christoph Hellwig @ 2019-05-21 8:06 ` Ming Lei 0 siblings, 0 replies; 16+ messages in thread From: Ming Lei @ 2019-05-21 8:06 UTC (permalink / raw) To: Christoph Hellwig; +Cc: axboe, linux-block, Hannes Reinecke On Tue, May 21, 2019 at 09:01:43AM +0200, Christoph Hellwig wrote: > At this point these fields aren't used for anything, so we can remove > them. > > Signed-off-by: Christoph Hellwig <hch@lst.de> > Reviewed-by: Hannes Reinecke <hare@suse.com> > --- > block/blk-merge.c | 94 +++++---------------------------------- > include/linux/blk_types.h | 7 --- > 2 files changed, 12 insertions(+), 89 deletions(-) > > diff --git a/block/blk-merge.c b/block/blk-merge.c > index eee2c02c50ce..17713d7d98d5 100644 > --- a/block/blk-merge.c > +++ b/block/blk-merge.c > @@ -162,8 +162,7 @@ static unsigned get_max_segment_size(struct request_queue *q, > * variables. > */ > static bool bvec_split_segs(struct request_queue *q, struct bio_vec *bv, > - unsigned *nsegs, unsigned *last_seg_size, > - unsigned *front_seg_size, unsigned *sectors, unsigned max_segs) > + unsigned *nsegs, unsigned *sectors, unsigned max_segs) > { > unsigned len = bv->bv_len; > unsigned total_len = 0; > @@ -185,28 +184,12 @@ static bool bvec_split_segs(struct request_queue *q, struct bio_vec *bv, > break; > } > > - if (!new_nsegs) > - return !!len; > - > - /* update front segment size */ > - if (!*nsegs) { > - unsigned first_seg_size; > - > - if (new_nsegs == 1) > - first_seg_size = get_max_segment_size(q, bv->bv_offset); > - else > - first_seg_size = queue_max_segment_size(q); > - > - if (*front_seg_size < first_seg_size) > - *front_seg_size = first_seg_size; > + if (new_nsegs) { > + *nsegs += new_nsegs; > + if (sectors) > + *sectors += total_len >> 9; > } > > - /* update other varibles */ > - *last_seg_size = seg_size; > - *nsegs += new_nsegs; > - if (sectors) > - *sectors += total_len >> 9; > - > /* split in the middle of the bvec if len != 0 */ > return !!len; > } > @@ -218,8 +201,7 @@ static struct bio *blk_bio_segment_split(struct request_queue *q, > { > struct bio_vec bv, bvprv, *bvprvp = NULL; > struct bvec_iter iter; > - unsigned seg_size = 0, nsegs = 0, sectors = 0; > - unsigned front_seg_size = bio->bi_seg_front_size; > + unsigned nsegs = 0, sectors = 0; > bool do_split = true; > struct bio *new = NULL; > const unsigned max_sectors = get_max_io_size(q, bio); > @@ -243,8 +225,6 @@ static struct bio *blk_bio_segment_split(struct request_queue *q, > /* split in the middle of bvec */ > bv.bv_len = (max_sectors - sectors) << 9; > bvec_split_segs(q, &bv, &nsegs, > - &seg_size, > - &front_seg_size, > §ors, max_segs); > } > goto split; > @@ -258,12 +238,9 @@ static struct bio *blk_bio_segment_split(struct request_queue *q, > > if (bv.bv_offset + bv.bv_len <= PAGE_SIZE) { > nsegs++; > - seg_size = bv.bv_len; > sectors += bv.bv_len >> 9; > - if (nsegs == 1 && seg_size > front_seg_size) > - front_seg_size = seg_size; > - } else if (bvec_split_segs(q, &bv, &nsegs, &seg_size, > - &front_seg_size, §ors, max_segs)) { > + } else if (bvec_split_segs(q, &bv, &nsegs, §ors, > + max_segs)) { > goto split; > } > } > @@ -278,10 +255,6 @@ static struct bio *blk_bio_segment_split(struct request_queue *q, > bio = new; > } > > - bio->bi_seg_front_size = front_seg_size; > - if (seg_size > bio->bi_seg_back_size) > - bio->bi_seg_back_size = seg_size; > - > return do_split ? new : NULL; > } > > @@ -336,17 +309,13 @@ EXPORT_SYMBOL(blk_queue_split); > static unsigned int __blk_recalc_rq_segments(struct request_queue *q, > struct bio *bio) > { > - struct bio_vec uninitialized_var(bv), bvprv = { NULL }; > - unsigned int seg_size, nr_phys_segs; > - unsigned front_seg_size; > - struct bio *fbio, *bbio; > + unsigned int nr_phys_segs = 0; > struct bvec_iter iter; > + struct bio_vec bv; > > if (!bio) > return 0; > > - front_seg_size = bio->bi_seg_front_size; > - > switch (bio_op(bio)) { > case REQ_OP_DISCARD: > case REQ_OP_SECURE_ERASE: > @@ -356,23 +325,11 @@ static unsigned int __blk_recalc_rq_segments(struct request_queue *q, > return 1; > } > > - fbio = bio; > - seg_size = 0; > - nr_phys_segs = 0; > for_each_bio(bio) { > - bio_for_each_bvec(bv, bio, iter) { > - bvec_split_segs(q, &bv, &nr_phys_segs, &seg_size, > - &front_seg_size, NULL, UINT_MAX); > - } > - bbio = bio; > - if (likely(bio->bi_iter.bi_size)) > - bvprv = bv; > + bio_for_each_bvec(bv, bio, iter) > + bvec_split_segs(q, &bv, &nr_phys_segs, NULL, UINT_MAX); > } > > - fbio->bi_seg_front_size = front_seg_size; > - if (seg_size > bbio->bi_seg_back_size) > - bbio->bi_seg_back_size = seg_size; > - > return nr_phys_segs; > } > > @@ -392,24 +349,6 @@ void blk_recount_segments(struct request_queue *q, struct bio *bio) > bio_set_flag(bio, BIO_SEG_VALID); > } > > -static int blk_phys_contig_segment(struct request_queue *q, struct bio *bio, > - struct bio *nxt) > -{ > - struct bio_vec end_bv = { NULL }, nxt_bv; > - > - if (bio->bi_seg_back_size + nxt->bi_seg_front_size > > - queue_max_segment_size(q)) > - return 0; > - > - if (!bio_has_data(bio)) > - return 1; > - > - bio_get_last_bvec(bio, &end_bv); > - bio_get_first_bvec(nxt, &nxt_bv); > - > - return biovec_phys_mergeable(q, &end_bv, &nxt_bv); > -} > - > static inline struct scatterlist *blk_next_sg(struct scatterlist **sg, > struct scatterlist *sglist) > { > @@ -669,8 +608,6 @@ static int ll_merge_requests_fn(struct request_queue *q, struct request *req, > struct request *next) > { > int total_phys_segments; > - unsigned int seg_size = > - req->biotail->bi_seg_back_size + next->bio->bi_seg_front_size; > > if (req_gap_back_merge(req, next->bio)) > return 0; > @@ -683,13 +620,6 @@ static int ll_merge_requests_fn(struct request_queue *q, struct request *req, > return 0; > > total_phys_segments = req->nr_phys_segments + next->nr_phys_segments; > - if (blk_phys_contig_segment(q, req->biotail, next->bio)) { > - if (req->nr_phys_segments == 1) > - req->bio->bi_seg_front_size = seg_size; > - if (next->nr_phys_segments == 1) > - next->biotail->bi_seg_back_size = seg_size; > - } > - > if (total_phys_segments > queue_max_segments(q)) > return 0; > > diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h > index be418275763c..95202f80676c 100644 > --- a/include/linux/blk_types.h > +++ b/include/linux/blk_types.h > @@ -159,13 +159,6 @@ struct bio { > */ > unsigned int bi_phys_segments; > > - /* > - * To keep track of the max segment size, we account for the > - * sizes of the first and last mergeable segments in this bio. > - */ > - unsigned int bi_seg_front_size; > - unsigned int bi_seg_back_size; > - > struct bvec_iter bi_iter; > > atomic_t __bi_remaining; > -- > 2.20.1 > Reviewed-by: Ming Lei <ming.lei@redhat.com> Thanks, Ming ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: fix nr_phys_segments vs iterators accounting v3 2019-05-21 7:01 fix nr_phys_segments vs iterators accounting v3 Christoph Hellwig ` (3 preceding siblings ...) 2019-05-21 7:01 ` [PATCH 4/4] block: remove the bi_seg_{front,back}_size fields in struct bio Christoph Hellwig @ 2019-05-21 12:47 ` Jens Axboe 4 siblings, 0 replies; 16+ messages in thread From: Jens Axboe @ 2019-05-21 12:47 UTC (permalink / raw) To: Christoph Hellwig; +Cc: ming.lei, linux-block On 5/21/19 1:01 AM, Christoph Hellwig wrote: > Hi all, > > we have had a problem for a while where the number of segments that > the bvec iterators will iterate over don't match the value in > req->nr_phys_segments, causing problems for anyone looking at > nr_phys_segments and iterating over bvec directly instead of using > blk_rq_map_sg. The first patch in this series fixes this by > making sure nr_phys_segments matches the actual number of segments. > Drivers using blk_rq_map_sg will still get the lower number returned > from function eventually, but the fact that we don't reduce the > value earlier will not allow some merges that we might otherwise > allow. > > With that in place I also noticed that we do not properly account > segements sizes on devices with a virt_boundary, but it turns out that > segment sizes fundamentally don't make sense for such devices, as their > "segment" is a fixed size "device page", and not a variable sized > scatter/gather elements as in the block layer, so we make that fact > formal. > > Once all that is sorted out it is pretty clear that there is no > good reason to have the front/back segement accounting to start > with. Applied, thanks. -- Jens Axboe ^ permalink raw reply [flat|nested] 16+ messages in thread
* fix nr_phys_segments vs iterators accounting v2 @ 2019-05-16 8:40 Christoph Hellwig 2019-05-16 8:40 ` [PATCH 1/4] block: don't decrement nr_phys_segments for physically contigous segments Christoph Hellwig 0 siblings, 1 reply; 16+ messages in thread From: Christoph Hellwig @ 2019-05-16 8:40 UTC (permalink / raw) To: axboe; +Cc: ming.lei, linux-block Hi all, we have had a problem for a while where the number of segments that the bvec iterators will iterate over don't match the value in req->nr_phys_segments, causing problems for anyone looking at nr_phys_segments and iterating over bvec directly instead of using blk_rq_map_sg. The first patch in this series fixes this by making sure nr_phys_segments matches the actual number of segments. Drivers using blk_rq_map_sg will still get the lower number returned from function eventually, but the fact that we don't reduce the value earlier will not allow some merges that we might otherwise allow. With that in place I also noticed that we do not properly account segements sizes on devices with a virt_boundary, but it turns out that segment sizes fundamentally don't make sense for such devices, as their "segment" is a fixed size "device page", and not a variable sized scatter/gather elements as in the block layer, so we make that fact formal. Once all that is sorted out it is pretty clear that there is no good reason to have the front/back segement accounting to start with. Changes since v1: - update a commit log - add fixes tags - drop the follow on patches not suitable for 5.2 ^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH 1/4] block: don't decrement nr_phys_segments for physically contigous segments 2019-05-16 8:40 fix nr_phys_segments vs iterators accounting v2 Christoph Hellwig @ 2019-05-16 8:40 ` Christoph Hellwig 2019-05-16 8:48 ` Hannes Reinecke 2019-05-16 13:17 ` Ming Lei 0 siblings, 2 replies; 16+ messages in thread From: Christoph Hellwig @ 2019-05-16 8:40 UTC (permalink / raw) To: axboe; +Cc: ming.lei, linux-block Currently ll_merge_requests_fn, unlike all other merge functions, reduces nr_phys_segments by one if the last segment of the previous, and the first segment of the next segement are contigous. While this seems like a nice solution to avoid building smaller than possible requests it causes a mismatch between the segments actually present in the request and those iterated over by the bvec iterators, including __rq_for_each_bio. This could cause overwrites of too small kmalloc allocations in any driver using ranged discard, or also mistrigger the single segment optimization in the nvme-pci driver. We could possibly work around this by making the bvec iterators take the front and back segment size into account, but that would require moving them from the bio to the bio_iter and spreading this mess over all users of bvecs. Or we could simply remove this optimization under the assumption that most users already build good enough bvecs, and that the bio merge patch never cared about this optimization either. The latter is what this patch does. Fixes: b35ba01ea697 ("nvme: support ranged discard requests") Fixes: 1f23816b8eb8 ("virtio_blk: add discard and write zeroes support") Fixes: 297910571f08 ("nvme-pci: optimize mapping single segment requests using SGLs") Signed-off-by: Christoph Hellwig <hch@lst.de> --- block/blk-merge.c | 23 +---------------------- 1 file changed, 1 insertion(+), 22 deletions(-) diff --git a/block/blk-merge.c b/block/blk-merge.c index 21e87a714a73..80a5a0facb87 100644 --- a/block/blk-merge.c +++ b/block/blk-merge.c @@ -358,7 +358,6 @@ static unsigned int __blk_recalc_rq_segments(struct request_queue *q, unsigned front_seg_size; struct bio *fbio, *bbio; struct bvec_iter iter; - bool new_bio = false; if (!bio) return 0; @@ -379,31 +378,12 @@ static unsigned int __blk_recalc_rq_segments(struct request_queue *q, nr_phys_segs = 0; for_each_bio(bio) { bio_for_each_bvec(bv, bio, iter) { - if (new_bio) { - if (seg_size + bv.bv_len - > queue_max_segment_size(q)) - goto new_segment; - if (!biovec_phys_mergeable(q, &bvprv, &bv)) - goto new_segment; - - seg_size += bv.bv_len; - - if (nr_phys_segs == 1 && seg_size > - front_seg_size) - front_seg_size = seg_size; - - continue; - } -new_segment: bvec_split_segs(q, &bv, &nr_phys_segs, &seg_size, &front_seg_size, NULL, UINT_MAX); - new_bio = false; } bbio = bio; - if (likely(bio->bi_iter.bi_size)) { + if (likely(bio->bi_iter.bi_size)) bvprv = bv; - new_bio = true; - } } fbio->bi_seg_front_size = front_seg_size; @@ -725,7 +705,6 @@ static int ll_merge_requests_fn(struct request_queue *q, struct request *req, req->bio->bi_seg_front_size = seg_size; if (next->nr_phys_segments == 1) next->biotail->bi_seg_back_size = seg_size; - total_phys_segments--; } if (total_phys_segments > queue_max_segments(q)) -- 2.20.1 ^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH 1/4] block: don't decrement nr_phys_segments for physically contigous segments 2019-05-16 8:40 ` [PATCH 1/4] block: don't decrement nr_phys_segments for physically contigous segments Christoph Hellwig @ 2019-05-16 8:48 ` Hannes Reinecke 2019-05-16 13:17 ` Ming Lei 1 sibling, 0 replies; 16+ messages in thread From: Hannes Reinecke @ 2019-05-16 8:48 UTC (permalink / raw) To: Christoph Hellwig, axboe; +Cc: ming.lei, linux-block On 5/16/19 10:40 AM, Christoph Hellwig wrote: > Currently ll_merge_requests_fn, unlike all other merge functions, > reduces nr_phys_segments by one if the last segment of the previous, > and the first segment of the next segement are contigous. While this > seems like a nice solution to avoid building smaller than possible > requests it causes a mismatch between the segments actually present > in the request and those iterated over by the bvec iterators, including > __rq_for_each_bio. This could cause overwrites of too small kmalloc > allocations in any driver using ranged discard, or also mistrigger > the single segment optimization in the nvme-pci driver. > > We could possibly work around this by making the bvec iterators take > the front and back segment size into account, but that would require > moving them from the bio to the bio_iter and spreading this mess > over all users of bvecs. Or we could simply remove this optimization > under the assumption that most users already build good enough bvecs, > and that the bio merge patch never cared about this optimization > either. The latter is what this patch does. > > Fixes: b35ba01ea697 ("nvme: support ranged discard requests") > Fixes: 1f23816b8eb8 ("virtio_blk: add discard and write zeroes support") > Fixes: 297910571f08 ("nvme-pci: optimize mapping single segment requests using SGLs") > Signed-off-by: Christoph Hellwig <hch@lst.de> > --- > block/blk-merge.c | 23 +---------------------- > 1 file changed, 1 insertion(+), 22 deletions(-) > Reviewed-by: Hannes Reinecke <hare@suse.com> Cheers, Hannes -- Dr. Hannes Reinecke Teamlead Storage & Networking hare@suse.de +49 911 74053 688 SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: Felix Imendörffer, Mary Higgins, Sri Rasiah HRB 21284 (AG Nürnberg) ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 1/4] block: don't decrement nr_phys_segments for physically contigous segments 2019-05-16 8:40 ` [PATCH 1/4] block: don't decrement nr_phys_segments for physically contigous segments Christoph Hellwig 2019-05-16 8:48 ` Hannes Reinecke @ 2019-05-16 13:17 ` Ming Lei 2019-05-17 23:02 ` Ming Lei 2019-05-20 11:11 ` Christoph Hellwig 1 sibling, 2 replies; 16+ messages in thread From: Ming Lei @ 2019-05-16 13:17 UTC (permalink / raw) To: Christoph Hellwig; +Cc: axboe, linux-block Hi Christoph, On Thu, May 16, 2019 at 10:40:55AM +0200, Christoph Hellwig wrote: > Currently ll_merge_requests_fn, unlike all other merge functions, > reduces nr_phys_segments by one if the last segment of the previous, > and the first segment of the next segement are contigous. While this > seems like a nice solution to avoid building smaller than possible > requests it causes a mismatch between the segments actually present > in the request and those iterated over by the bvec iterators, including > __rq_for_each_bio. This could cause overwrites of too small kmalloc > allocations in any driver using ranged discard, or also mistrigger > the single segment optimization in the nvme-pci driver. > > We could possibly work around this by making the bvec iterators take > the front and back segment size into account, but that would require > moving them from the bio to the bio_iter and spreading this mess > over all users of bvecs. Or we could simply remove this optimization > under the assumption that most users already build good enough bvecs, > and that the bio merge patch never cared about this optimization > either. The latter is what this patch does. > > Fixes: b35ba01ea697 ("nvme: support ranged discard requests") > Fixes: 1f23816b8eb8 ("virtio_blk: add discard and write zeroes support") ll_merge_requests_fn() is only called from attempt_merge() in case that ELEVATOR_BACK_MERGE is returned from blk_try_req_merge(). However, for discard merge of both virtio_blk and nvme, ELEVATOR_DISCARD_MERGE is always returned from blk_try_req_merge() in attempt_merge(), so looks ll_merge_requests_fn() shouldn't be called for virtio_blk/nvme's discard request. Just wondering if you may explain a bit how the change on ll_merge_requests_fn() in this patch makes a difference on the above two commits? > Fixes: 297910571f08 ("nvme-pci: optimize mapping single segment requests using SGLs") I guess it should be dff824b2aadb ("nvme-pci: optimize mapping of small single segment requests"). Yes, this patch helps for this case, cause blk_rq_nr_phys_segments() may be 1 but there are two bios which share same segment. Thanks, Ming ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 1/4] block: don't decrement nr_phys_segments for physically contigous segments 2019-05-16 13:17 ` Ming Lei @ 2019-05-17 23:02 ` Ming Lei 2019-05-20 11:11 ` Christoph Hellwig 1 sibling, 0 replies; 16+ messages in thread From: Ming Lei @ 2019-05-17 23:02 UTC (permalink / raw) To: Christoph Hellwig; +Cc: axboe, linux-block On Thu, May 16, 2019 at 09:17:04PM +0800, Ming Lei wrote: > Hi Christoph, > > On Thu, May 16, 2019 at 10:40:55AM +0200, Christoph Hellwig wrote: > > Currently ll_merge_requests_fn, unlike all other merge functions, > > reduces nr_phys_segments by one if the last segment of the previous, > > and the first segment of the next segement are contigous. While this > > seems like a nice solution to avoid building smaller than possible > > requests it causes a mismatch between the segments actually present > > in the request and those iterated over by the bvec iterators, including > > __rq_for_each_bio. This could cause overwrites of too small kmalloc > > allocations in any driver using ranged discard, or also mistrigger > > the single segment optimization in the nvme-pci driver. > > > > We could possibly work around this by making the bvec iterators take > > the front and back segment size into account, but that would require > > moving them from the bio to the bio_iter and spreading this mess > > over all users of bvecs. Or we could simply remove this optimization > > under the assumption that most users already build good enough bvecs, > > and that the bio merge patch never cared about this optimization > > either. The latter is what this patch does. > > > > Fixes: b35ba01ea697 ("nvme: support ranged discard requests") > > Fixes: 1f23816b8eb8 ("virtio_blk: add discard and write zeroes support") > > ll_merge_requests_fn() is only called from attempt_merge() in case > that ELEVATOR_BACK_MERGE is returned from blk_try_req_merge(). However, > for discard merge of both virtio_blk and nvme, ELEVATOR_DISCARD_MERGE is > always returned from blk_try_req_merge() in attempt_merge(), so looks > ll_merge_requests_fn() shouldn't be called for virtio_blk/nvme's discard > request. Just wondering if you may explain a bit how the change on > ll_merge_requests_fn() in this patch makes a difference on the above > two commits? > > > Fixes: 297910571f08 ("nvme-pci: optimize mapping single segment requests using SGLs") > > I guess it should be dff824b2aadb ("nvme-pci: optimize mapping of small > single segment requests"). > > Yes, this patch helps for this case, cause blk_rq_nr_phys_segments() may be 1 > but there are two bios which share same segment. BTW, I just sent a single-line nvme-pci fix on this issue, which may be more suitable to serve as v5.2 fix: http://lists.infradead.org/pipermail/linux-nvme/2019-May/024283.html Thanks, Ming ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 1/4] block: don't decrement nr_phys_segments for physically contigous segments 2019-05-16 13:17 ` Ming Lei 2019-05-17 23:02 ` Ming Lei @ 2019-05-20 11:11 ` Christoph Hellwig 2019-05-21 1:04 ` Ming Lei 1 sibling, 1 reply; 16+ messages in thread From: Christoph Hellwig @ 2019-05-20 11:11 UTC (permalink / raw) To: Ming Lei; +Cc: Christoph Hellwig, axboe, linux-block On Thu, May 16, 2019 at 09:17:04PM +0800, Ming Lei wrote: > ll_merge_requests_fn() is only called from attempt_merge() in case > that ELEVATOR_BACK_MERGE is returned from blk_try_req_merge(). However, > for discard merge of both virtio_blk and nvme, ELEVATOR_DISCARD_MERGE is > always returned from blk_try_req_merge() in attempt_merge(), so looks > ll_merge_requests_fn() shouldn't be called for virtio_blk/nvme's discard > request. Just wondering if you may explain a bit how the change on > ll_merge_requests_fn() in this patch makes a difference on the above > two commits? Good question. I've seen virtio overwriting its range, but I think this might have been been with a series to actually decrement nr_phys_segments for all cases where we can merge the tail and front bvecs. So mainline probably doesn't see it unless someone calls blk_recalc_rq_segments due to a partial completion or when using dm-multipath. Thinking of it at least the latter seems like a real possibily, although a rather unlikely use case. > > Fixes: 297910571f08 ("nvme-pci: optimize mapping single segment requests using SGLs") > > I guess it should be dff824b2aadb ("nvme-pci: optimize mapping of small > single segment requests"). Indeed. ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 1/4] block: don't decrement nr_phys_segments for physically contigous segments 2019-05-20 11:11 ` Christoph Hellwig @ 2019-05-21 1:04 ` Ming Lei 0 siblings, 0 replies; 16+ messages in thread From: Ming Lei @ 2019-05-21 1:04 UTC (permalink / raw) To: Christoph Hellwig; +Cc: axboe, linux-block On Mon, May 20, 2019 at 01:11:41PM +0200, Christoph Hellwig wrote: > On Thu, May 16, 2019 at 09:17:04PM +0800, Ming Lei wrote: > > ll_merge_requests_fn() is only called from attempt_merge() in case > > that ELEVATOR_BACK_MERGE is returned from blk_try_req_merge(). However, > > for discard merge of both virtio_blk and nvme, ELEVATOR_DISCARD_MERGE is > > always returned from blk_try_req_merge() in attempt_merge(), so looks > > ll_merge_requests_fn() shouldn't be called for virtio_blk/nvme's discard > > request. Just wondering if you may explain a bit how the change on > > ll_merge_requests_fn() in this patch makes a difference on the above > > two commits? > > Good question. I've seen virtio overwriting its range, but I think > this might have been been with a series to actually decrement > nr_phys_segments for all cases where we can merge the tail and front > bvecs. So mainline probably doesn't see it unless someone calls > blk_recalc_rq_segments due to a partial completion or when using > dm-multipath. Thinking of it at least the latter seems like a real > possibily, although a rather unlikely use case. This patch shouldn't effect discard IO in case of partial completion too cause blk_recalc_rq_segments() always return 0 for discard IO w/wo this patch. However looks this way is wrong, the following patch may help for this case: diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index 1aafeb923e7b..302667887ea1 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -1109,7 +1109,12 @@ static inline unsigned short blk_rq_nr_phys_segments(struct request *rq) */ static inline unsigned short blk_rq_nr_discard_segments(struct request *rq) { - return max_t(unsigned short, rq->nr_phys_segments, 1); + struct bio *bio; + unsigned shart segs = 0; + + __rq_for_each_bio(bio, rq) + segs++; + return segs; } extern int blk_rq_map_sg(struct request_queue *, struct request *, struct scatterlist *); Or re-calculate the segment number in this way for multi-range discard IO in __blk_recalc_rq_segments(). Thanks, Ming ^ permalink raw reply related [flat|nested] 16+ messages in thread
end of thread, other threads:[~2019-07-23 16:11 UTC | newest] Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2019-05-21 7:01 fix nr_phys_segments vs iterators accounting v3 Christoph Hellwig 2019-05-21 7:01 ` [PATCH 1/4] block: don't decrement nr_phys_segments for physically contigous segments Christoph Hellwig 2019-05-21 8:05 ` Ming Lei 2019-05-21 7:01 ` [PATCH 2/4] block: force an unlimited segment size on queues with a virt boundary Christoph Hellwig 2019-07-23 15:48 ` James Bottomley 2019-07-23 16:11 ` Christoph Hellwig 2019-05-21 7:01 ` [PATCH 3/4] block: remove the segment size check in bio_will_gap Christoph Hellwig 2019-05-21 7:01 ` [PATCH 4/4] block: remove the bi_seg_{front,back}_size fields in struct bio Christoph Hellwig 2019-05-21 8:06 ` Ming Lei 2019-05-21 12:47 ` fix nr_phys_segments vs iterators accounting v3 Jens Axboe -- strict thread matches above, loose matches on Subject: below -- 2019-05-16 8:40 fix nr_phys_segments vs iterators accounting v2 Christoph Hellwig 2019-05-16 8:40 ` [PATCH 1/4] block: don't decrement nr_phys_segments for physically contigous segments Christoph Hellwig 2019-05-16 8:48 ` Hannes Reinecke 2019-05-16 13:17 ` Ming Lei 2019-05-17 23:02 ` Ming Lei 2019-05-20 11:11 ` Christoph Hellwig 2019-05-21 1:04 ` Ming Lei
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).