linux-block.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* fix nr_phys_segments vs iterators accounting v2
@ 2019-05-16  8:40 Christoph Hellwig
  2019-05-16  8:40 ` [PATCH 1/4] block: don't decrement nr_phys_segments for physically contigous segments Christoph Hellwig
                   ` (4 more replies)
  0 siblings, 5 replies; 21+ messages in thread
From: Christoph Hellwig @ 2019-05-16  8:40 UTC (permalink / raw)
  To: axboe; +Cc: ming.lei, linux-block

Hi all,

we have had a problem for a while where the number of segments that
the bvec iterators will iterate over don't match the value in
req->nr_phys_segments, causing problems for anyone looking at
nr_phys_segments and iterating over bvec directly instead of using
blk_rq_map_sg.  The first patch in this series fixes this by
making sure nr_phys_segments matches the actual number of segments.
Drivers using blk_rq_map_sg will still get the lower number returned
from function eventually, but the fact that we don't reduce the
value earlier will not allow some merges that we might otherwise
allow.

With that in place I also noticed that we do not properly account
segements sizes on devices with a virt_boundary, but it turns out that
segment sizes fundamentally don't make sense for such devices, as their
"segment" is a fixed size "device page", and not a variable sized
scatter/gather elements as in the block layer, so we make that fact
formal.

Once all that is sorted out it is pretty clear that there is no
good reason to have the front/back segement accounting to start
with.

Changes since v1:
  - update a commit log
  - add fixes tags
  - drop the follow on patches not suitable for 5.2

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH 1/4] block: don't decrement nr_phys_segments for physically contigous segments
  2019-05-16  8:40 fix nr_phys_segments vs iterators accounting v2 Christoph Hellwig
@ 2019-05-16  8:40 ` Christoph Hellwig
  2019-05-16  8:48   ` Hannes Reinecke
  2019-05-16 13:17   ` Ming Lei
  2019-05-16  8:40 ` [PATCH 2/4] block: force an unlimited segment size on queues with a virt boundary Christoph Hellwig
                   ` (3 subsequent siblings)
  4 siblings, 2 replies; 21+ messages in thread
From: Christoph Hellwig @ 2019-05-16  8:40 UTC (permalink / raw)
  To: axboe; +Cc: ming.lei, linux-block

Currently ll_merge_requests_fn, unlike all other merge functions,
reduces nr_phys_segments by one if the last segment of the previous,
and the first segment of the next segement are contigous.  While this
seems like a nice solution to avoid building smaller than possible
requests it causes a mismatch between the segments actually present
in the request and those iterated over by the bvec iterators, including
__rq_for_each_bio.  This could cause overwrites of too small kmalloc
allocations in any driver using ranged discard, or also mistrigger
the single segment optimization in the nvme-pci driver.

We could possibly work around this by making the bvec iterators take
the front and back segment size into account, but that would require
moving them from the bio to the bio_iter and spreading this mess
over all users of bvecs.  Or we could simply remove this optimization
under the assumption that most users already build good enough bvecs,
and that the bio merge patch never cared about this optimization
either.  The latter is what this patch does.

Fixes: b35ba01ea697 ("nvme: support ranged discard requests")
Fixes: 1f23816b8eb8 ("virtio_blk: add discard and write zeroes support")
Fixes: 297910571f08 ("nvme-pci: optimize mapping single segment requests using SGLs")
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 block/blk-merge.c | 23 +----------------------
 1 file changed, 1 insertion(+), 22 deletions(-)

diff --git a/block/blk-merge.c b/block/blk-merge.c
index 21e87a714a73..80a5a0facb87 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -358,7 +358,6 @@ static unsigned int __blk_recalc_rq_segments(struct request_queue *q,
 	unsigned front_seg_size;
 	struct bio *fbio, *bbio;
 	struct bvec_iter iter;
-	bool new_bio = false;
 
 	if (!bio)
 		return 0;
@@ -379,31 +378,12 @@ static unsigned int __blk_recalc_rq_segments(struct request_queue *q,
 	nr_phys_segs = 0;
 	for_each_bio(bio) {
 		bio_for_each_bvec(bv, bio, iter) {
-			if (new_bio) {
-				if (seg_size + bv.bv_len
-				    > queue_max_segment_size(q))
-					goto new_segment;
-				if (!biovec_phys_mergeable(q, &bvprv, &bv))
-					goto new_segment;
-
-				seg_size += bv.bv_len;
-
-				if (nr_phys_segs == 1 && seg_size >
-						front_seg_size)
-					front_seg_size = seg_size;
-
-				continue;
-			}
-new_segment:
 			bvec_split_segs(q, &bv, &nr_phys_segs, &seg_size,
 					&front_seg_size, NULL, UINT_MAX);
-			new_bio = false;
 		}
 		bbio = bio;
-		if (likely(bio->bi_iter.bi_size)) {
+		if (likely(bio->bi_iter.bi_size))
 			bvprv = bv;
-			new_bio = true;
-		}
 	}
 
 	fbio->bi_seg_front_size = front_seg_size;
@@ -725,7 +705,6 @@ static int ll_merge_requests_fn(struct request_queue *q, struct request *req,
 			req->bio->bi_seg_front_size = seg_size;
 		if (next->nr_phys_segments == 1)
 			next->biotail->bi_seg_back_size = seg_size;
-		total_phys_segments--;
 	}
 
 	if (total_phys_segments > queue_max_segments(q))
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 2/4] block: force an unlimited segment size on queues with a virt boundary
  2019-05-16  8:40 fix nr_phys_segments vs iterators accounting v2 Christoph Hellwig
  2019-05-16  8:40 ` [PATCH 1/4] block: don't decrement nr_phys_segments for physically contigous segments Christoph Hellwig
@ 2019-05-16  8:40 ` Christoph Hellwig
  2019-05-16  8:49   ` Hannes Reinecke
  2019-05-16  8:40 ` [PATCH 3/4] block: remove the segment size check in bio_will_gap Christoph Hellwig
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 21+ messages in thread
From: Christoph Hellwig @ 2019-05-16  8:40 UTC (permalink / raw)
  To: axboe; +Cc: ming.lei, linux-block

We currently fail to update the front/back segment size in the bio when
deciding to allow an otherwise gappy segement to a device with a
virt boundary.  The reason why this did not cause problems is that
devices with a virt boundary fundamentally don't use segments as we
know it and thus don't care.  Make that assumption formal by forcing
an unlimited segement size in this case.

Fixes: f6970f83ef79 ("block: don't check if adjacent bvecs in one bio can be mergeable")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-settings.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/block/blk-settings.c b/block/blk-settings.c
index 3facc41476be..2ae348c101a0 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -310,6 +310,9 @@ void blk_queue_max_segment_size(struct request_queue *q, unsigned int max_size)
 		       __func__, max_size);
 	}
 
+	/* see blk_queue_virt_boundary() for the explanation */
+	WARN_ON_ONCE(q->limits.virt_boundary_mask);
+
 	q->limits.max_segment_size = max_size;
 }
 EXPORT_SYMBOL(blk_queue_max_segment_size);
@@ -742,6 +745,14 @@ EXPORT_SYMBOL(blk_queue_segment_boundary);
 void blk_queue_virt_boundary(struct request_queue *q, unsigned long mask)
 {
 	q->limits.virt_boundary_mask = mask;
+
+	/*
+	 * Devices that require a virtual boundary do not support scatter/gather
+	 * I/O natively, but instead require a descriptor list entry for each
+	 * page (which might not be idential to the Linux PAGE_SIZE).  Because
+	 * of that they are not limited by our notion of "segment size".
+	 */
+	q->limits.max_segment_size = UINT_MAX;
 }
 EXPORT_SYMBOL(blk_queue_virt_boundary);
 
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 3/4] block: remove the segment size check in bio_will_gap
  2019-05-16  8:40 fix nr_phys_segments vs iterators accounting v2 Christoph Hellwig
  2019-05-16  8:40 ` [PATCH 1/4] block: don't decrement nr_phys_segments for physically contigous segments Christoph Hellwig
  2019-05-16  8:40 ` [PATCH 2/4] block: force an unlimited segment size on queues with a virt boundary Christoph Hellwig
@ 2019-05-16  8:40 ` Christoph Hellwig
  2019-05-16  8:49   ` Hannes Reinecke
  2019-05-16  8:40 ` [PATCH 4/4] block: remove the bi_seg_{front,back}_size fields in struct bio Christoph Hellwig
  2019-05-20 11:17 ` fix nr_phys_segments vs iterators accounting v2 Christoph Hellwig
  4 siblings, 1 reply; 21+ messages in thread
From: Christoph Hellwig @ 2019-05-16  8:40 UTC (permalink / raw)
  To: axboe; +Cc: ming.lei, linux-block

We fundamentally do not have a maximum segement size for devices with a
virt boundary.  So don't bother checking it, especially given that the
existing checks didn't properly work to start with as we never fully
update the front/back segment size and miss the bi_seg_front_size that
wuld have been required for some cases.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-merge.c | 19 +------------------
 1 file changed, 1 insertion(+), 18 deletions(-)

diff --git a/block/blk-merge.c b/block/blk-merge.c
index 80a5a0facb87..eee2c02c50ce 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -12,23 +12,6 @@
 
 #include "blk.h"
 
-/*
- * Check if the two bvecs from two bios can be merged to one segment.  If yes,
- * no need to check gap between the two bios since the 1st bio and the 1st bvec
- * in the 2nd bio can be handled in one segment.
- */
-static inline bool bios_segs_mergeable(struct request_queue *q,
-		struct bio *prev, struct bio_vec *prev_last_bv,
-		struct bio_vec *next_first_bv)
-{
-	if (!biovec_phys_mergeable(q, prev_last_bv, next_first_bv))
-		return false;
-	if (prev->bi_seg_back_size + next_first_bv->bv_len >
-			queue_max_segment_size(q))
-		return false;
-	return true;
-}
-
 static inline bool bio_will_gap(struct request_queue *q,
 		struct request *prev_rq, struct bio *prev, struct bio *next)
 {
@@ -60,7 +43,7 @@ static inline bool bio_will_gap(struct request_queue *q,
 	 */
 	bio_get_last_bvec(prev, &pb);
 	bio_get_first_bvec(next, &nb);
-	if (bios_segs_mergeable(q, prev, &pb, &nb))
+	if (biovec_phys_mergeable(q, &pb, &nb))
 		return false;
 	return __bvec_gap_to_prev(q, &pb, nb.bv_offset);
 }
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 4/4] block: remove the bi_seg_{front,back}_size fields in struct bio
  2019-05-16  8:40 fix nr_phys_segments vs iterators accounting v2 Christoph Hellwig
                   ` (2 preceding siblings ...)
  2019-05-16  8:40 ` [PATCH 3/4] block: remove the segment size check in bio_will_gap Christoph Hellwig
@ 2019-05-16  8:40 ` Christoph Hellwig
  2019-05-16  8:50   ` Hannes Reinecke
  2019-05-20 11:17 ` fix nr_phys_segments vs iterators accounting v2 Christoph Hellwig
  4 siblings, 1 reply; 21+ messages in thread
From: Christoph Hellwig @ 2019-05-16  8:40 UTC (permalink / raw)
  To: axboe; +Cc: ming.lei, linux-block

At this point these fields aren't used for anything, so we can remove
them.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 block/blk-merge.c         | 94 +++++----------------------------------
 include/linux/blk_types.h |  7 ---
 2 files changed, 12 insertions(+), 89 deletions(-)

diff --git a/block/blk-merge.c b/block/blk-merge.c
index eee2c02c50ce..17713d7d98d5 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -162,8 +162,7 @@ static unsigned get_max_segment_size(struct request_queue *q,
  * variables.
  */
 static bool bvec_split_segs(struct request_queue *q, struct bio_vec *bv,
-		unsigned *nsegs, unsigned *last_seg_size,
-		unsigned *front_seg_size, unsigned *sectors, unsigned max_segs)
+		unsigned *nsegs, unsigned *sectors, unsigned max_segs)
 {
 	unsigned len = bv->bv_len;
 	unsigned total_len = 0;
@@ -185,28 +184,12 @@ static bool bvec_split_segs(struct request_queue *q, struct bio_vec *bv,
 			break;
 	}
 
-	if (!new_nsegs)
-		return !!len;
-
-	/* update front segment size */
-	if (!*nsegs) {
-		unsigned first_seg_size;
-
-		if (new_nsegs == 1)
-			first_seg_size = get_max_segment_size(q, bv->bv_offset);
-		else
-			first_seg_size = queue_max_segment_size(q);
-
-		if (*front_seg_size < first_seg_size)
-			*front_seg_size = first_seg_size;
+	if (new_nsegs) {
+		*nsegs += new_nsegs;
+		if (sectors)
+			*sectors += total_len >> 9;
 	}
 
-	/* update other varibles */
-	*last_seg_size = seg_size;
-	*nsegs += new_nsegs;
-	if (sectors)
-		*sectors += total_len >> 9;
-
 	/* split in the middle of the bvec if len != 0 */
 	return !!len;
 }
@@ -218,8 +201,7 @@ static struct bio *blk_bio_segment_split(struct request_queue *q,
 {
 	struct bio_vec bv, bvprv, *bvprvp = NULL;
 	struct bvec_iter iter;
-	unsigned seg_size = 0, nsegs = 0, sectors = 0;
-	unsigned front_seg_size = bio->bi_seg_front_size;
+	unsigned nsegs = 0, sectors = 0;
 	bool do_split = true;
 	struct bio *new = NULL;
 	const unsigned max_sectors = get_max_io_size(q, bio);
@@ -243,8 +225,6 @@ static struct bio *blk_bio_segment_split(struct request_queue *q,
 				/* split in the middle of bvec */
 				bv.bv_len = (max_sectors - sectors) << 9;
 				bvec_split_segs(q, &bv, &nsegs,
-						&seg_size,
-						&front_seg_size,
 						&sectors, max_segs);
 			}
 			goto split;
@@ -258,12 +238,9 @@ static struct bio *blk_bio_segment_split(struct request_queue *q,
 
 		if (bv.bv_offset + bv.bv_len <= PAGE_SIZE) {
 			nsegs++;
-			seg_size = bv.bv_len;
 			sectors += bv.bv_len >> 9;
-			if (nsegs == 1 && seg_size > front_seg_size)
-				front_seg_size = seg_size;
-		} else if (bvec_split_segs(q, &bv, &nsegs, &seg_size,
-				    &front_seg_size, &sectors, max_segs)) {
+		} else if (bvec_split_segs(q, &bv, &nsegs, &sectors,
+				max_segs)) {
 			goto split;
 		}
 	}
@@ -278,10 +255,6 @@ static struct bio *blk_bio_segment_split(struct request_queue *q,
 			bio = new;
 	}
 
-	bio->bi_seg_front_size = front_seg_size;
-	if (seg_size > bio->bi_seg_back_size)
-		bio->bi_seg_back_size = seg_size;
-
 	return do_split ? new : NULL;
 }
 
@@ -336,17 +309,13 @@ EXPORT_SYMBOL(blk_queue_split);
 static unsigned int __blk_recalc_rq_segments(struct request_queue *q,
 					     struct bio *bio)
 {
-	struct bio_vec uninitialized_var(bv), bvprv = { NULL };
-	unsigned int seg_size, nr_phys_segs;
-	unsigned front_seg_size;
-	struct bio *fbio, *bbio;
+	unsigned int nr_phys_segs = 0;
 	struct bvec_iter iter;
+	struct bio_vec bv;
 
 	if (!bio)
 		return 0;
 
-	front_seg_size = bio->bi_seg_front_size;
-
 	switch (bio_op(bio)) {
 	case REQ_OP_DISCARD:
 	case REQ_OP_SECURE_ERASE:
@@ -356,23 +325,11 @@ static unsigned int __blk_recalc_rq_segments(struct request_queue *q,
 		return 1;
 	}
 
-	fbio = bio;
-	seg_size = 0;
-	nr_phys_segs = 0;
 	for_each_bio(bio) {
-		bio_for_each_bvec(bv, bio, iter) {
-			bvec_split_segs(q, &bv, &nr_phys_segs, &seg_size,
-					&front_seg_size, NULL, UINT_MAX);
-		}
-		bbio = bio;
-		if (likely(bio->bi_iter.bi_size))
-			bvprv = bv;
+		bio_for_each_bvec(bv, bio, iter)
+			bvec_split_segs(q, &bv, &nr_phys_segs, NULL, UINT_MAX);
 	}
 
-	fbio->bi_seg_front_size = front_seg_size;
-	if (seg_size > bbio->bi_seg_back_size)
-		bbio->bi_seg_back_size = seg_size;
-
 	return nr_phys_segs;
 }
 
@@ -392,24 +349,6 @@ void blk_recount_segments(struct request_queue *q, struct bio *bio)
 	bio_set_flag(bio, BIO_SEG_VALID);
 }
 
-static int blk_phys_contig_segment(struct request_queue *q, struct bio *bio,
-				   struct bio *nxt)
-{
-	struct bio_vec end_bv = { NULL }, nxt_bv;
-
-	if (bio->bi_seg_back_size + nxt->bi_seg_front_size >
-	    queue_max_segment_size(q))
-		return 0;
-
-	if (!bio_has_data(bio))
-		return 1;
-
-	bio_get_last_bvec(bio, &end_bv);
-	bio_get_first_bvec(nxt, &nxt_bv);
-
-	return biovec_phys_mergeable(q, &end_bv, &nxt_bv);
-}
-
 static inline struct scatterlist *blk_next_sg(struct scatterlist **sg,
 		struct scatterlist *sglist)
 {
@@ -669,8 +608,6 @@ static int ll_merge_requests_fn(struct request_queue *q, struct request *req,
 				struct request *next)
 {
 	int total_phys_segments;
-	unsigned int seg_size =
-		req->biotail->bi_seg_back_size + next->bio->bi_seg_front_size;
 
 	if (req_gap_back_merge(req, next->bio))
 		return 0;
@@ -683,13 +620,6 @@ static int ll_merge_requests_fn(struct request_queue *q, struct request *req,
 		return 0;
 
 	total_phys_segments = req->nr_phys_segments + next->nr_phys_segments;
-	if (blk_phys_contig_segment(q, req->biotail, next->bio)) {
-		if (req->nr_phys_segments == 1)
-			req->bio->bi_seg_front_size = seg_size;
-		if (next->nr_phys_segments == 1)
-			next->biotail->bi_seg_back_size = seg_size;
-	}
-
 	if (total_phys_segments > queue_max_segments(q))
 		return 0;
 
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index be418275763c..95202f80676c 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -159,13 +159,6 @@ struct bio {
 	 */
 	unsigned int		bi_phys_segments;
 
-	/*
-	 * To keep track of the max segment size, we account for the
-	 * sizes of the first and last mergeable segments in this bio.
-	 */
-	unsigned int		bi_seg_front_size;
-	unsigned int		bi_seg_back_size;
-
 	struct bvec_iter	bi_iter;
 
 	atomic_t		__bi_remaining;
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH 1/4] block: don't decrement nr_phys_segments for physically contigous segments
  2019-05-16  8:40 ` [PATCH 1/4] block: don't decrement nr_phys_segments for physically contigous segments Christoph Hellwig
@ 2019-05-16  8:48   ` Hannes Reinecke
  2019-05-16 13:17   ` Ming Lei
  1 sibling, 0 replies; 21+ messages in thread
From: Hannes Reinecke @ 2019-05-16  8:48 UTC (permalink / raw)
  To: Christoph Hellwig, axboe; +Cc: ming.lei, linux-block

On 5/16/19 10:40 AM, Christoph Hellwig wrote:
> Currently ll_merge_requests_fn, unlike all other merge functions,
> reduces nr_phys_segments by one if the last segment of the previous,
> and the first segment of the next segement are contigous.  While this
> seems like a nice solution to avoid building smaller than possible
> requests it causes a mismatch between the segments actually present
> in the request and those iterated over by the bvec iterators, including
> __rq_for_each_bio.  This could cause overwrites of too small kmalloc
> allocations in any driver using ranged discard, or also mistrigger
> the single segment optimization in the nvme-pci driver.
> 
> We could possibly work around this by making the bvec iterators take
> the front and back segment size into account, but that would require
> moving them from the bio to the bio_iter and spreading this mess
> over all users of bvecs.  Or we could simply remove this optimization
> under the assumption that most users already build good enough bvecs,
> and that the bio merge patch never cared about this optimization
> either.  The latter is what this patch does.
> 
> Fixes: b35ba01ea697 ("nvme: support ranged discard requests")
> Fixes: 1f23816b8eb8 ("virtio_blk: add discard and write zeroes support")
> Fixes: 297910571f08 ("nvme-pci: optimize mapping single segment requests using SGLs")
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>   block/blk-merge.c | 23 +----------------------
>   1 file changed, 1 insertion(+), 22 deletions(-)
> 
Reviewed-by: Hannes Reinecke <hare@suse.com>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Mary Higgins, Sri Rasiah
HRB 21284 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 2/4] block: force an unlimited segment size on queues with a virt boundary
  2019-05-16  8:40 ` [PATCH 2/4] block: force an unlimited segment size on queues with a virt boundary Christoph Hellwig
@ 2019-05-16  8:49   ` Hannes Reinecke
  0 siblings, 0 replies; 21+ messages in thread
From: Hannes Reinecke @ 2019-05-16  8:49 UTC (permalink / raw)
  To: Christoph Hellwig, axboe; +Cc: ming.lei, linux-block

On 5/16/19 10:40 AM, Christoph Hellwig wrote:
> We currently fail to update the front/back segment size in the bio when
> deciding to allow an otherwise gappy segement to a device with a
> virt boundary.  The reason why this did not cause problems is that
> devices with a virt boundary fundamentally don't use segments as we
> know it and thus don't care.  Make that assumption formal by forcing
> an unlimited segement size in this case.
> 
> Fixes: f6970f83ef79 ("block: don't check if adjacent bvecs in one bio can be mergeable")
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Ming Lei <ming.lei@redhat.com>
> ---
>   block/blk-settings.c | 11 +++++++++++
>   1 file changed, 11 insertions(+)
> 
> diff --git a/block/blk-settings.c b/block/blk-settings.c
> index 3facc41476be..2ae348c101a0 100644
> --- a/block/blk-settings.c
> +++ b/block/blk-settings.c
> @@ -310,6 +310,9 @@ void blk_queue_max_segment_size(struct request_queue *q, unsigned int max_size)
>   		       __func__, max_size);
>   	}
>   
> +	/* see blk_queue_virt_boundary() for the explanation */
> +	WARN_ON_ONCE(q->limits.virt_boundary_mask);
> +
>   	q->limits.max_segment_size = max_size;
>   }
>   EXPORT_SYMBOL(blk_queue_max_segment_size);
> @@ -742,6 +745,14 @@ EXPORT_SYMBOL(blk_queue_segment_boundary);
>   void blk_queue_virt_boundary(struct request_queue *q, unsigned long mask)
>   {
>   	q->limits.virt_boundary_mask = mask;
> +
> +	/*
> +	 * Devices that require a virtual boundary do not support scatter/gather
> +	 * I/O natively, but instead require a descriptor list entry for each
> +	 * page (which might not be idential to the Linux PAGE_SIZE).  Because
> +	 * of that they are not limited by our notion of "segment size".
> +	 */
> +	q->limits.max_segment_size = UINT_MAX;
>   }
>   EXPORT_SYMBOL(blk_queue_virt_boundary);
>   
> 
Reviewed-by: Hannes Reinecke <hare@suse.com>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Mary Higgins, Sri Rasiah
HRB 21284 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 3/4] block: remove the segment size check in bio_will_gap
  2019-05-16  8:40 ` [PATCH 3/4] block: remove the segment size check in bio_will_gap Christoph Hellwig
@ 2019-05-16  8:49   ` Hannes Reinecke
  0 siblings, 0 replies; 21+ messages in thread
From: Hannes Reinecke @ 2019-05-16  8:49 UTC (permalink / raw)
  To: Christoph Hellwig, axboe; +Cc: ming.lei, linux-block

On 5/16/19 10:40 AM, Christoph Hellwig wrote:
> We fundamentally do not have a maximum segement size for devices with a
> virt boundary.  So don't bother checking it, especially given that the
> existing checks didn't properly work to start with as we never fully
> update the front/back segment size and miss the bi_seg_front_size that
> wuld have been required for some cases.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Ming Lei <ming.lei@redhat.com>
> ---
>   block/blk-merge.c | 19 +------------------
>   1 file changed, 1 insertion(+), 18 deletions(-)
> 
Reviewed-by: Hannes Reinecke <hare@suse.com>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Mary Higgins, Sri Rasiah
HRB 21284 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 4/4] block: remove the bi_seg_{front,back}_size fields in struct bio
  2019-05-16  8:40 ` [PATCH 4/4] block: remove the bi_seg_{front,back}_size fields in struct bio Christoph Hellwig
@ 2019-05-16  8:50   ` Hannes Reinecke
  0 siblings, 0 replies; 21+ messages in thread
From: Hannes Reinecke @ 2019-05-16  8:50 UTC (permalink / raw)
  To: Christoph Hellwig, axboe; +Cc: ming.lei, linux-block

On 5/16/19 10:40 AM, Christoph Hellwig wrote:
> At this point these fields aren't used for anything, so we can remove
> them.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>   block/blk-merge.c         | 94 +++++----------------------------------
>   include/linux/blk_types.h |  7 ---
>   2 files changed, 12 insertions(+), 89 deletions(-)
> 
Reviewed-by: Hannes Reinecke <hare@suse.com>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Mary Higgins, Sri Rasiah
HRB 21284 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 1/4] block: don't decrement nr_phys_segments for physically contigous segments
  2019-05-16  8:40 ` [PATCH 1/4] block: don't decrement nr_phys_segments for physically contigous segments Christoph Hellwig
  2019-05-16  8:48   ` Hannes Reinecke
@ 2019-05-16 13:17   ` Ming Lei
  2019-05-17 23:02     ` Ming Lei
  2019-05-20 11:11     ` Christoph Hellwig
  1 sibling, 2 replies; 21+ messages in thread
From: Ming Lei @ 2019-05-16 13:17 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: axboe, linux-block

Hi Christoph,

On Thu, May 16, 2019 at 10:40:55AM +0200, Christoph Hellwig wrote:
> Currently ll_merge_requests_fn, unlike all other merge functions,
> reduces nr_phys_segments by one if the last segment of the previous,
> and the first segment of the next segement are contigous.  While this
> seems like a nice solution to avoid building smaller than possible
> requests it causes a mismatch between the segments actually present
> in the request and those iterated over by the bvec iterators, including
> __rq_for_each_bio.  This could cause overwrites of too small kmalloc
> allocations in any driver using ranged discard, or also mistrigger
> the single segment optimization in the nvme-pci driver.
> 
> We could possibly work around this by making the bvec iterators take
> the front and back segment size into account, but that would require
> moving them from the bio to the bio_iter and spreading this mess
> over all users of bvecs.  Or we could simply remove this optimization
> under the assumption that most users already build good enough bvecs,
> and that the bio merge patch never cared about this optimization
> either.  The latter is what this patch does.
> 
> Fixes: b35ba01ea697 ("nvme: support ranged discard requests")
> Fixes: 1f23816b8eb8 ("virtio_blk: add discard and write zeroes support")

ll_merge_requests_fn() is only called from attempt_merge() in case
that ELEVATOR_BACK_MERGE is returned from blk_try_req_merge(). However,
for discard merge of both virtio_blk and nvme, ELEVATOR_DISCARD_MERGE is
always returned from blk_try_req_merge() in attempt_merge(), so looks
ll_merge_requests_fn() shouldn't be called for virtio_blk/nvme's discard
request. Just wondering if you may explain a bit how the change on
ll_merge_requests_fn() in this patch makes a difference on the above
two commits?

> Fixes: 297910571f08 ("nvme-pci: optimize mapping single segment requests using SGLs")

I guess it should be dff824b2aadb ("nvme-pci: optimize mapping of small
single segment requests").

Yes, this patch helps for this case, cause blk_rq_nr_phys_segments() may be 1
but there are two bios which share same segment.

Thanks,
Ming

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 1/4] block: don't decrement nr_phys_segments for physically contigous segments
  2019-05-16 13:17   ` Ming Lei
@ 2019-05-17 23:02     ` Ming Lei
  2019-05-20 11:11     ` Christoph Hellwig
  1 sibling, 0 replies; 21+ messages in thread
From: Ming Lei @ 2019-05-17 23:02 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: axboe, linux-block

On Thu, May 16, 2019 at 09:17:04PM +0800, Ming Lei wrote:
> Hi Christoph,
> 
> On Thu, May 16, 2019 at 10:40:55AM +0200, Christoph Hellwig wrote:
> > Currently ll_merge_requests_fn, unlike all other merge functions,
> > reduces nr_phys_segments by one if the last segment of the previous,
> > and the first segment of the next segement are contigous.  While this
> > seems like a nice solution to avoid building smaller than possible
> > requests it causes a mismatch between the segments actually present
> > in the request and those iterated over by the bvec iterators, including
> > __rq_for_each_bio.  This could cause overwrites of too small kmalloc
> > allocations in any driver using ranged discard, or also mistrigger
> > the single segment optimization in the nvme-pci driver.
> > 
> > We could possibly work around this by making the bvec iterators take
> > the front and back segment size into account, but that would require
> > moving them from the bio to the bio_iter and spreading this mess
> > over all users of bvecs.  Or we could simply remove this optimization
> > under the assumption that most users already build good enough bvecs,
> > and that the bio merge patch never cared about this optimization
> > either.  The latter is what this patch does.
> > 
> > Fixes: b35ba01ea697 ("nvme: support ranged discard requests")
> > Fixes: 1f23816b8eb8 ("virtio_blk: add discard and write zeroes support")
> 
> ll_merge_requests_fn() is only called from attempt_merge() in case
> that ELEVATOR_BACK_MERGE is returned from blk_try_req_merge(). However,
> for discard merge of both virtio_blk and nvme, ELEVATOR_DISCARD_MERGE is
> always returned from blk_try_req_merge() in attempt_merge(), so looks
> ll_merge_requests_fn() shouldn't be called for virtio_blk/nvme's discard
> request. Just wondering if you may explain a bit how the change on
> ll_merge_requests_fn() in this patch makes a difference on the above
> two commits?
> 
> > Fixes: 297910571f08 ("nvme-pci: optimize mapping single segment requests using SGLs")
> 
> I guess it should be dff824b2aadb ("nvme-pci: optimize mapping of small
> single segment requests").
> 
> Yes, this patch helps for this case, cause blk_rq_nr_phys_segments() may be 1
> but there are two bios which share same segment.

BTW, I just sent a single-line nvme-pci fix on this issue, which may be more
suitable to serve as v5.2 fix:

http://lists.infradead.org/pipermail/linux-nvme/2019-May/024283.html

Thanks,
Ming

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 1/4] block: don't decrement nr_phys_segments for physically contigous segments
  2019-05-16 13:17   ` Ming Lei
  2019-05-17 23:02     ` Ming Lei
@ 2019-05-20 11:11     ` Christoph Hellwig
  2019-05-21  1:04       ` Ming Lei
  1 sibling, 1 reply; 21+ messages in thread
From: Christoph Hellwig @ 2019-05-20 11:11 UTC (permalink / raw)
  To: Ming Lei; +Cc: Christoph Hellwig, axboe, linux-block

On Thu, May 16, 2019 at 09:17:04PM +0800, Ming Lei wrote:
> ll_merge_requests_fn() is only called from attempt_merge() in case
> that ELEVATOR_BACK_MERGE is returned from blk_try_req_merge(). However,
> for discard merge of both virtio_blk and nvme, ELEVATOR_DISCARD_MERGE is
> always returned from blk_try_req_merge() in attempt_merge(), so looks
> ll_merge_requests_fn() shouldn't be called for virtio_blk/nvme's discard
> request. Just wondering if you may explain a bit how the change on
> ll_merge_requests_fn() in this patch makes a difference on the above
> two commits?

Good question.  I've seen virtio overwriting its range, but I think
this might have been been with a series to actually decrement
nr_phys_segments for all cases where we can merge the tail and front
bvecs.  So mainline probably doesn't see it unless someone calls
blk_recalc_rq_segments due to a partial completion or when using
dm-multipath.  Thinking of it at least the latter seems like a real
possibily, although a rather unlikely use case.

> > Fixes: 297910571f08 ("nvme-pci: optimize mapping single segment requests using SGLs")
> 
> I guess it should be dff824b2aadb ("nvme-pci: optimize mapping of small
> single segment requests").

Indeed.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: fix nr_phys_segments vs iterators accounting v2
  2019-05-16  8:40 fix nr_phys_segments vs iterators accounting v2 Christoph Hellwig
                   ` (3 preceding siblings ...)
  2019-05-16  8:40 ` [PATCH 4/4] block: remove the bi_seg_{front,back}_size fields in struct bio Christoph Hellwig
@ 2019-05-20 11:17 ` Christoph Hellwig
  2019-05-21  1:09   ` Jens Axboe
  4 siblings, 1 reply; 21+ messages in thread
From: Christoph Hellwig @ 2019-05-20 11:17 UTC (permalink / raw)
  To: axboe; +Cc: ming.lei, linux-block

Jens,

is this ok for 5.2?  If not we need to hack around the sector
accounting in nvme, and possibly virtio.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 1/4] block: don't decrement nr_phys_segments for physically contigous segments
  2019-05-20 11:11     ` Christoph Hellwig
@ 2019-05-21  1:04       ` Ming Lei
  0 siblings, 0 replies; 21+ messages in thread
From: Ming Lei @ 2019-05-21  1:04 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: axboe, linux-block

On Mon, May 20, 2019 at 01:11:41PM +0200, Christoph Hellwig wrote:
> On Thu, May 16, 2019 at 09:17:04PM +0800, Ming Lei wrote:
> > ll_merge_requests_fn() is only called from attempt_merge() in case
> > that ELEVATOR_BACK_MERGE is returned from blk_try_req_merge(). However,
> > for discard merge of both virtio_blk and nvme, ELEVATOR_DISCARD_MERGE is
> > always returned from blk_try_req_merge() in attempt_merge(), so looks
> > ll_merge_requests_fn() shouldn't be called for virtio_blk/nvme's discard
> > request. Just wondering if you may explain a bit how the change on
> > ll_merge_requests_fn() in this patch makes a difference on the above
> > two commits?
> 
> Good question.  I've seen virtio overwriting its range, but I think
> this might have been been with a series to actually decrement
> nr_phys_segments for all cases where we can merge the tail and front
> bvecs.  So mainline probably doesn't see it unless someone calls
> blk_recalc_rq_segments due to a partial completion or when using
> dm-multipath.  Thinking of it at least the latter seems like a real
> possibily, although a rather unlikely use case.

This patch shouldn't effect discard IO in case of partial completion too
cause blk_recalc_rq_segments() always return 0 for discard IO w/wo this
patch.

However looks this way is wrong, the following patch may help for this
case:

diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 1aafeb923e7b..302667887ea1 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1109,7 +1109,12 @@ static inline unsigned short blk_rq_nr_phys_segments(struct request *rq)
  */
 static inline unsigned short blk_rq_nr_discard_segments(struct request *rq)
 {
-	return max_t(unsigned short, rq->nr_phys_segments, 1);
+	struct bio *bio;
+	unsigned shart segs = 0;
+
+	__rq_for_each_bio(bio, rq)
+		segs++;
+	return segs;
 }
 
 extern int blk_rq_map_sg(struct request_queue *, struct request *, struct scatterlist *);

Or re-calculate the segment number in this way for multi-range discard IO in
__blk_recalc_rq_segments().

Thanks,
Ming

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: fix nr_phys_segments vs iterators accounting v2
  2019-05-20 11:17 ` fix nr_phys_segments vs iterators accounting v2 Christoph Hellwig
@ 2019-05-21  1:09   ` Jens Axboe
  2019-05-21  1:17     ` Ming Lei
  0 siblings, 1 reply; 21+ messages in thread
From: Jens Axboe @ 2019-05-21  1:09 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: ming.lei, linux-block

On 5/20/19 5:17 AM, Christoph Hellwig wrote:
> Jens,
> 
> is this ok for 5.2?  If not we need to hack around the sector
> accounting in nvme, and possibly virtio.

I'd rather do it right in 5.2, especially if we can get something
acceptable cobbled together this week.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: fix nr_phys_segments vs iterators accounting v2
  2019-05-21  1:09   ` Jens Axboe
@ 2019-05-21  1:17     ` Ming Lei
  2019-05-21  1:20       ` Jens Axboe
  0 siblings, 1 reply; 21+ messages in thread
From: Ming Lei @ 2019-05-21  1:17 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Christoph Hellwig, linux-block

On Tue, May 21, 2019 at 01:09:39AM +0000, Jens Axboe wrote:
> On 5/20/19 5:17 AM, Christoph Hellwig wrote:
> > Jens,
> > 
> > is this ok for 5.2?  If not we need to hack around the sector
> > accounting in nvme, and possibly virtio.
> 
> I'd rather do it right in 5.2, especially if we can get something
> acceptable cobbled together this week.

If this patchset will be merged to 5.2, please remove the following
two lines from patch 1:

Fixes: b35ba01ea697 ("nvme: support ranged discard requests")
Fixes: 1f23816b8eb8 ("virtio_blk: add discard and write zeroes support")


Because the patch 1 doesn't fix them actually.

Thanks,
Ming

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: fix nr_phys_segments vs iterators accounting v2
  2019-05-21  1:17     ` Ming Lei
@ 2019-05-21  1:20       ` Jens Axboe
  2019-05-21  1:29         ` Ming Lei
  0 siblings, 1 reply; 21+ messages in thread
From: Jens Axboe @ 2019-05-21  1:20 UTC (permalink / raw)
  To: Ming Lei; +Cc: Christoph Hellwig, linux-block

On 5/20/19 7:17 PM, Ming Lei wrote:
> On Tue, May 21, 2019 at 01:09:39AM +0000, Jens Axboe wrote:
>> On 5/20/19 5:17 AM, Christoph Hellwig wrote:
>>> Jens,
>>>
>>> is this ok for 5.2?  If not we need to hack around the sector
>>> accounting in nvme, and possibly virtio.
>>
>> I'd rather do it right in 5.2, especially if we can get something
>> acceptable cobbled together this week.
> 
> If this patchset will be merged to 5.2, please remove the following
> two lines from patch 1:
> 
> Fixes: b35ba01ea697 ("nvme: support ranged discard requests")
> Fixes: 1f23816b8eb8 ("virtio_blk: add discard and write zeroes support")
> 
> 
> Because the patch 1 doesn't fix them actually.

I don't want to merge something unless you are happy with it as well.
Are you fine with going this route?

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: fix nr_phys_segments vs iterators accounting v2
  2019-05-21  1:20       ` Jens Axboe
@ 2019-05-21  1:29         ` Ming Lei
  2019-05-21  5:11           ` Christoph Hellwig
  0 siblings, 1 reply; 21+ messages in thread
From: Ming Lei @ 2019-05-21  1:29 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Christoph Hellwig, linux-block

On Mon, May 20, 2019 at 07:20:56PM -0600, Jens Axboe wrote:
> On 5/20/19 7:17 PM, Ming Lei wrote:
> > On Tue, May 21, 2019 at 01:09:39AM +0000, Jens Axboe wrote:
> >> On 5/20/19 5:17 AM, Christoph Hellwig wrote:
> >>> Jens,
> >>>
> >>> is this ok for 5.2?  If not we need to hack around the sector
> >>> accounting in nvme, and possibly virtio.
> >>
> >> I'd rather do it right in 5.2, especially if we can get something
> >> acceptable cobbled together this week.
> > 
> > If this patchset will be merged to 5.2, please remove the following
> > two lines from patch 1:
> > 
> > Fixes: b35ba01ea697 ("nvme: support ranged discard requests")
> > Fixes: 1f23816b8eb8 ("virtio_blk: add discard and write zeroes support")
> > 
> > 
> > Because the patch 1 doesn't fix them actually.
> 
> I don't want to merge something unless you are happy with it as well.
> Are you fine with going this route?

I am fine with this route, and just try to make the commit log not
misleading.


Thanks,
Ming

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: fix nr_phys_segments vs iterators accounting v2
  2019-05-21  1:29         ` Ming Lei
@ 2019-05-21  5:11           ` Christoph Hellwig
  0 siblings, 0 replies; 21+ messages in thread
From: Christoph Hellwig @ 2019-05-21  5:11 UTC (permalink / raw)
  To: Ming Lei; +Cc: Jens Axboe, Christoph Hellwig, linux-block

On Tue, May 21, 2019 at 09:29:36AM +0800, Ming Lei wrote:
> I am fine with this route, and just try to make the commit log not
> misleading.

I'll resend with the updated tags.  And I'll also look into the whole
recalc segments vs discard issue later, as we might have more dragons
hidden there.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 1/4] block: don't decrement nr_phys_segments for physically contigous segments
  2019-05-21  7:01 ` [PATCH 1/4] block: don't decrement nr_phys_segments for physically contigous segments Christoph Hellwig
@ 2019-05-21  8:05   ` Ming Lei
  0 siblings, 0 replies; 21+ messages in thread
From: Ming Lei @ 2019-05-21  8:05 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: axboe, linux-block, Hannes Reinecke

On Tue, May 21, 2019 at 09:01:40AM +0200, Christoph Hellwig wrote:
> Currently ll_merge_requests_fn, unlike all other merge functions,
> reduces nr_phys_segments by one if the last segment of the previous,
> and the first segment of the next segement are contigous.  While this
> seems like a nice solution to avoid building smaller than possible
> requests it causes a mismatch between the segments actually present
> in the request and those iterated over by the bvec iterators, including
> __rq_for_each_bio.  This can for example mistrigger the single segment
> optimization in the nvme-pci driver, and might lead to mismatching
> nr_phys_segments number when recalculating the number of request
> when inserting a cloned request.
> 
> We could possibly work around this by making the bvec iterators take
> the front and back segment size into account, but that would require
> moving them from the bio to the bio_iter and spreading this mess
> over all users of bvecs.  Or we could simply remove this optimization
> under the assumption that most users already build good enough bvecs,
> and that the bio merge patch never cared about this optimization
> either.  The latter is what this patch does.
> 
> dff824b2aadb ("nvme-pci: optimize mapping of small single segment requests").
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Hannes Reinecke <hare@suse.com>
> ---
>  block/blk-merge.c | 23 +----------------------
>  1 file changed, 1 insertion(+), 22 deletions(-)
> 
> diff --git a/block/blk-merge.c b/block/blk-merge.c
> index 21e87a714a73..80a5a0facb87 100644
> --- a/block/blk-merge.c
> +++ b/block/blk-merge.c
> @@ -358,7 +358,6 @@ static unsigned int __blk_recalc_rq_segments(struct request_queue *q,
>  	unsigned front_seg_size;
>  	struct bio *fbio, *bbio;
>  	struct bvec_iter iter;
> -	bool new_bio = false;
>  
>  	if (!bio)
>  		return 0;
> @@ -379,31 +378,12 @@ static unsigned int __blk_recalc_rq_segments(struct request_queue *q,
>  	nr_phys_segs = 0;
>  	for_each_bio(bio) {
>  		bio_for_each_bvec(bv, bio, iter) {
> -			if (new_bio) {
> -				if (seg_size + bv.bv_len
> -				    > queue_max_segment_size(q))
> -					goto new_segment;
> -				if (!biovec_phys_mergeable(q, &bvprv, &bv))
> -					goto new_segment;
> -
> -				seg_size += bv.bv_len;
> -
> -				if (nr_phys_segs == 1 && seg_size >
> -						front_seg_size)
> -					front_seg_size = seg_size;
> -
> -				continue;
> -			}
> -new_segment:
>  			bvec_split_segs(q, &bv, &nr_phys_segs, &seg_size,
>  					&front_seg_size, NULL, UINT_MAX);
> -			new_bio = false;
>  		}
>  		bbio = bio;
> -		if (likely(bio->bi_iter.bi_size)) {
> +		if (likely(bio->bi_iter.bi_size))
>  			bvprv = bv;
> -			new_bio = true;
> -		}
>  	}
>  
>  	fbio->bi_seg_front_size = front_seg_size;
> @@ -725,7 +705,6 @@ static int ll_merge_requests_fn(struct request_queue *q, struct request *req,
>  			req->bio->bi_seg_front_size = seg_size;
>  		if (next->nr_phys_segments == 1)
>  			next->biotail->bi_seg_back_size = seg_size;
> -		total_phys_segments--;
>  	}
>  
>  	if (total_phys_segments > queue_max_segments(q))
> -- 
> 2.20.1
> 

Reviewed-by: Ming Lei <ming.lei@redhat.com>

Thanks,
Ming

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH 1/4] block: don't decrement nr_phys_segments for physically contigous segments
  2019-05-21  7:01 fix nr_phys_segments vs iterators accounting v3 Christoph Hellwig
@ 2019-05-21  7:01 ` Christoph Hellwig
  2019-05-21  8:05   ` Ming Lei
  0 siblings, 1 reply; 21+ messages in thread
From: Christoph Hellwig @ 2019-05-21  7:01 UTC (permalink / raw)
  To: axboe; +Cc: ming.lei, linux-block, Hannes Reinecke

Currently ll_merge_requests_fn, unlike all other merge functions,
reduces nr_phys_segments by one if the last segment of the previous,
and the first segment of the next segement are contigous.  While this
seems like a nice solution to avoid building smaller than possible
requests it causes a mismatch between the segments actually present
in the request and those iterated over by the bvec iterators, including
__rq_for_each_bio.  This can for example mistrigger the single segment
optimization in the nvme-pci driver, and might lead to mismatching
nr_phys_segments number when recalculating the number of request
when inserting a cloned request.

We could possibly work around this by making the bvec iterators take
the front and back segment size into account, but that would require
moving them from the bio to the bio_iter and spreading this mess
over all users of bvecs.  Or we could simply remove this optimization
under the assumption that most users already build good enough bvecs,
and that the bio merge patch never cared about this optimization
either.  The latter is what this patch does.

dff824b2aadb ("nvme-pci: optimize mapping of small single segment requests").
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
---
 block/blk-merge.c | 23 +----------------------
 1 file changed, 1 insertion(+), 22 deletions(-)

diff --git a/block/blk-merge.c b/block/blk-merge.c
index 21e87a714a73..80a5a0facb87 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -358,7 +358,6 @@ static unsigned int __blk_recalc_rq_segments(struct request_queue *q,
 	unsigned front_seg_size;
 	struct bio *fbio, *bbio;
 	struct bvec_iter iter;
-	bool new_bio = false;
 
 	if (!bio)
 		return 0;
@@ -379,31 +378,12 @@ static unsigned int __blk_recalc_rq_segments(struct request_queue *q,
 	nr_phys_segs = 0;
 	for_each_bio(bio) {
 		bio_for_each_bvec(bv, bio, iter) {
-			if (new_bio) {
-				if (seg_size + bv.bv_len
-				    > queue_max_segment_size(q))
-					goto new_segment;
-				if (!biovec_phys_mergeable(q, &bvprv, &bv))
-					goto new_segment;
-
-				seg_size += bv.bv_len;
-
-				if (nr_phys_segs == 1 && seg_size >
-						front_seg_size)
-					front_seg_size = seg_size;
-
-				continue;
-			}
-new_segment:
 			bvec_split_segs(q, &bv, &nr_phys_segs, &seg_size,
 					&front_seg_size, NULL, UINT_MAX);
-			new_bio = false;
 		}
 		bbio = bio;
-		if (likely(bio->bi_iter.bi_size)) {
+		if (likely(bio->bi_iter.bi_size))
 			bvprv = bv;
-			new_bio = true;
-		}
 	}
 
 	fbio->bi_seg_front_size = front_seg_size;
@@ -725,7 +705,6 @@ static int ll_merge_requests_fn(struct request_queue *q, struct request *req,
 			req->bio->bi_seg_front_size = seg_size;
 		if (next->nr_phys_segments == 1)
 			next->biotail->bi_seg_back_size = seg_size;
-		total_phys_segments--;
 	}
 
 	if (total_phys_segments > queue_max_segments(q))
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2019-05-21  8:05 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-05-16  8:40 fix nr_phys_segments vs iterators accounting v2 Christoph Hellwig
2019-05-16  8:40 ` [PATCH 1/4] block: don't decrement nr_phys_segments for physically contigous segments Christoph Hellwig
2019-05-16  8:48   ` Hannes Reinecke
2019-05-16 13:17   ` Ming Lei
2019-05-17 23:02     ` Ming Lei
2019-05-20 11:11     ` Christoph Hellwig
2019-05-21  1:04       ` Ming Lei
2019-05-16  8:40 ` [PATCH 2/4] block: force an unlimited segment size on queues with a virt boundary Christoph Hellwig
2019-05-16  8:49   ` Hannes Reinecke
2019-05-16  8:40 ` [PATCH 3/4] block: remove the segment size check in bio_will_gap Christoph Hellwig
2019-05-16  8:49   ` Hannes Reinecke
2019-05-16  8:40 ` [PATCH 4/4] block: remove the bi_seg_{front,back}_size fields in struct bio Christoph Hellwig
2019-05-16  8:50   ` Hannes Reinecke
2019-05-20 11:17 ` fix nr_phys_segments vs iterators accounting v2 Christoph Hellwig
2019-05-21  1:09   ` Jens Axboe
2019-05-21  1:17     ` Ming Lei
2019-05-21  1:20       ` Jens Axboe
2019-05-21  1:29         ` Ming Lei
2019-05-21  5:11           ` Christoph Hellwig
2019-05-21  7:01 fix nr_phys_segments vs iterators accounting v3 Christoph Hellwig
2019-05-21  7:01 ` [PATCH 1/4] block: don't decrement nr_phys_segments for physically contigous segments Christoph Hellwig
2019-05-21  8:05   ` Ming Lei

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).