All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/6] block: enable multi-page bvec for passthrough IO
@ 2019-03-09  1:37 ` Ming Lei
  0 siblings, 0 replies; 27+ messages in thread
From: Ming Lei @ 2019-03-09  1:37 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Ming Lei, ris Ostrovsky, Juergen Gross, xen-devel,
	Omar Sandoval, Christoph Hellwig

Hi,

Now the whole IO stack is capable of handling multi-page bvec, and it has
been enabled in the normal FS IO path. However, it isn't done for
passthrough IO.

Without enabling multi-bvec for passthough IO, we won't go ahead for
optimizing related IO paths, such as bvec merging, bio_add_pc_page
simplification.

This patch enables multi-page bvec for passthrough IO. Turns out
bio_add_pc_page() is simpliefied a lot, especially the physical segment
number of passthrough bio is always same with bio.bi_vcnt. Also the
bvec merging inside bio is killed.

blktests(block/029) is added for covering passthough IO path, and this
patchset does pass the new block/029 test.

	https://marc.info/?l=linux-block&m=155175063417139&w=2


Ming Lei (6):
  block: pass page to xen_biovec_phys_mergeable
  block: don't merge adjacent bvecs to one segment in bio
    blk_queue_split
  block: check if page is mergeable in one helper
  block: put the same page when adding it to bio
  block: enable multi-page bvec for passthrough IO
  block: don't check if adjacent bvecs in one bio can be mergeable

 block/bio.c            | 121 +++++++++++++++++++++++++++----------------------
 block/blk-merge.c      |  98 ++++++++++++++++++++-------------------
 block/blk.h            |   2 +-
 drivers/xen/biomerge.c |   5 +-
 include/linux/bio.h    |  12 ++++-
 include/xen/xen.h      |   2 +-
 6 files changed, 134 insertions(+), 106 deletions(-)

Cc: ris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: xen-devel@lists.xenproject.org
Cc: Omar Sandoval <osandov@fb.com>
Cc: Christoph Hellwig <hch@lst.de>


-- 
2.9.5


^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH 0/6] block: enable multi-page bvec for passthrough IO
@ 2019-03-09  1:37 ` Ming Lei
  0 siblings, 0 replies; 27+ messages in thread
From: Ming Lei @ 2019-03-09  1:37 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Juergen Gross, Ming Lei, linux-block, xen-devel, ris Ostrovsky,
	Christoph Hellwig, Omar Sandoval

Hi,

Now the whole IO stack is capable of handling multi-page bvec, and it has
been enabled in the normal FS IO path. However, it isn't done for
passthrough IO.

Without enabling multi-bvec for passthough IO, we won't go ahead for
optimizing related IO paths, such as bvec merging, bio_add_pc_page
simplification.

This patch enables multi-page bvec for passthrough IO. Turns out
bio_add_pc_page() is simpliefied a lot, especially the physical segment
number of passthrough bio is always same with bio.bi_vcnt. Also the
bvec merging inside bio is killed.

blktests(block/029) is added for covering passthough IO path, and this
patchset does pass the new block/029 test.

	https://marc.info/?l=linux-block&m=155175063417139&w=2


Ming Lei (6):
  block: pass page to xen_biovec_phys_mergeable
  block: don't merge adjacent bvecs to one segment in bio
    blk_queue_split
  block: check if page is mergeable in one helper
  block: put the same page when adding it to bio
  block: enable multi-page bvec for passthrough IO
  block: don't check if adjacent bvecs in one bio can be mergeable

 block/bio.c            | 121 +++++++++++++++++++++++++++----------------------
 block/blk-merge.c      |  98 ++++++++++++++++++++-------------------
 block/blk.h            |   2 +-
 drivers/xen/biomerge.c |   5 +-
 include/linux/bio.h    |  12 ++++-
 include/xen/xen.h      |   2 +-
 6 files changed, 134 insertions(+), 106 deletions(-)

Cc: ris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: xen-devel@lists.xenproject.org
Cc: Omar Sandoval <osandov@fb.com>
Cc: Christoph Hellwig <hch@lst.de>


-- 
2.9.5


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH 1/6] block: pass page to xen_biovec_phys_mergeable
  2019-03-09  1:37 ` Ming Lei
@ 2019-03-09  1:37   ` Ming Lei
  -1 siblings, 0 replies; 27+ messages in thread
From: Ming Lei @ 2019-03-09  1:37 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Ming Lei, ris Ostrovsky, Juergen Gross, xen-devel,
	Omar Sandoval, Christoph Hellwig

xen_biovec_phys_mergeable() only needs .bv_page of the 2nd bio bvec
for checking if the two bvecs can be merged, so pass page to
xen_biovec_phys_mergeable() directly.

No function change.

Cc: ris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: xen-devel@lists.xenproject.org
Cc: Omar Sandoval <osandov@fb.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk.h            | 2 +-
 drivers/xen/biomerge.c | 5 +++--
 include/xen/xen.h      | 2 +-
 3 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/block/blk.h b/block/blk.h
index 5d636ee41663..e27fd1512e4b 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -75,7 +75,7 @@ static inline bool biovec_phys_mergeable(struct request_queue *q,
 
 	if (addr1 + vec1->bv_len != addr2)
 		return false;
-	if (xen_domain() && !xen_biovec_phys_mergeable(vec1, vec2))
+	if (xen_domain() && !xen_biovec_phys_mergeable(vec1, vec2->bv_page))
 		return false;
 	if ((addr1 | mask) != ((addr2 + vec2->bv_len - 1) | mask))
 		return false;
diff --git a/drivers/xen/biomerge.c b/drivers/xen/biomerge.c
index f3fbb700f569..05a286d24f14 100644
--- a/drivers/xen/biomerge.c
+++ b/drivers/xen/biomerge.c
@@ -4,12 +4,13 @@
 #include <xen/xen.h>
 #include <xen/page.h>
 
+/* check if @page can be merged with 'vec1' */
 bool xen_biovec_phys_mergeable(const struct bio_vec *vec1,
-			       const struct bio_vec *vec2)
+			       const struct page *page)
 {
 #if XEN_PAGE_SIZE == PAGE_SIZE
 	unsigned long bfn1 = pfn_to_bfn(page_to_pfn(vec1->bv_page));
-	unsigned long bfn2 = pfn_to_bfn(page_to_pfn(vec2->bv_page));
+	unsigned long bfn2 = pfn_to_bfn(page_to_pfn(page));
 
 	return bfn1 + PFN_DOWN(vec1->bv_offset + vec1->bv_len) == bfn2;
 #else
diff --git a/include/xen/xen.h b/include/xen/xen.h
index 0e2156786ad2..39a97f2eff75 100644
--- a/include/xen/xen.h
+++ b/include/xen/xen.h
@@ -44,6 +44,6 @@ extern struct hvm_start_info pvh_start_info;
 
 struct bio_vec;
 bool xen_biovec_phys_mergeable(const struct bio_vec *vec1,
-		const struct bio_vec *vec2);
+		const struct page *page);
 
 #endif	/* _XEN_XEN_H */
-- 
2.9.5


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 1/6] block: pass page to xen_biovec_phys_mergeable
@ 2019-03-09  1:37   ` Ming Lei
  0 siblings, 0 replies; 27+ messages in thread
From: Ming Lei @ 2019-03-09  1:37 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Juergen Gross, Ming Lei, linux-block, xen-devel, ris Ostrovsky,
	Christoph Hellwig, Omar Sandoval

xen_biovec_phys_mergeable() only needs .bv_page of the 2nd bio bvec
for checking if the two bvecs can be merged, so pass page to
xen_biovec_phys_mergeable() directly.

No function change.

Cc: ris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: xen-devel@lists.xenproject.org
Cc: Omar Sandoval <osandov@fb.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk.h            | 2 +-
 drivers/xen/biomerge.c | 5 +++--
 include/xen/xen.h      | 2 +-
 3 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/block/blk.h b/block/blk.h
index 5d636ee41663..e27fd1512e4b 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -75,7 +75,7 @@ static inline bool biovec_phys_mergeable(struct request_queue *q,
 
 	if (addr1 + vec1->bv_len != addr2)
 		return false;
-	if (xen_domain() && !xen_biovec_phys_mergeable(vec1, vec2))
+	if (xen_domain() && !xen_biovec_phys_mergeable(vec1, vec2->bv_page))
 		return false;
 	if ((addr1 | mask) != ((addr2 + vec2->bv_len - 1) | mask))
 		return false;
diff --git a/drivers/xen/biomerge.c b/drivers/xen/biomerge.c
index f3fbb700f569..05a286d24f14 100644
--- a/drivers/xen/biomerge.c
+++ b/drivers/xen/biomerge.c
@@ -4,12 +4,13 @@
 #include <xen/xen.h>
 #include <xen/page.h>
 
+/* check if @page can be merged with 'vec1' */
 bool xen_biovec_phys_mergeable(const struct bio_vec *vec1,
-			       const struct bio_vec *vec2)
+			       const struct page *page)
 {
 #if XEN_PAGE_SIZE == PAGE_SIZE
 	unsigned long bfn1 = pfn_to_bfn(page_to_pfn(vec1->bv_page));
-	unsigned long bfn2 = pfn_to_bfn(page_to_pfn(vec2->bv_page));
+	unsigned long bfn2 = pfn_to_bfn(page_to_pfn(page));
 
 	return bfn1 + PFN_DOWN(vec1->bv_offset + vec1->bv_len) == bfn2;
 #else
diff --git a/include/xen/xen.h b/include/xen/xen.h
index 0e2156786ad2..39a97f2eff75 100644
--- a/include/xen/xen.h
+++ b/include/xen/xen.h
@@ -44,6 +44,6 @@ extern struct hvm_start_info pvh_start_info;
 
 struct bio_vec;
 bool xen_biovec_phys_mergeable(const struct bio_vec *vec1,
-		const struct bio_vec *vec2);
+		const struct page *page);
 
 #endif	/* _XEN_XEN_H */
-- 
2.9.5


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 2/6] block: don't merge adjacent bvecs to one segment in bio blk_queue_split
  2019-03-09  1:37 ` Ming Lei
@ 2019-03-09  1:37   ` Ming Lei
  -1 siblings, 0 replies; 27+ messages in thread
From: Ming Lei @ 2019-03-09  1:37 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Ming Lei, ris Ostrovsky, Juergen Gross, xen-devel,
	Omar Sandoval, Christoph Hellwig

For normal filesystem IO, each page is added via blk_add_page(),
in which bvec(page) merge has been handled already, and basically
not possible to merge two adjacent bvecs in one bio.

So not try to merge two adjacent bvecs in blk_queue_split(), also add
check if one page is mergeable to current bvec in bio_add_page() for
avoiding to break XEN.

Cc: ris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: xen-devel@lists.xenproject.org
Cc: Omar Sandoval <osandov@fb.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/bio.c       |  2 ++
 block/blk-merge.c | 17 -----------------
 2 files changed, 2 insertions(+), 17 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index 71a78d9fb8b7..d8f48188937c 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -776,6 +776,8 @@ bool __bio_try_merge_page(struct bio *bio, struct page *page,
 
 		if (vec_end_addr + 1 != page_addr + off)
 			return false;
+		if (xen_domain() && !xen_biovec_phys_mergeable(bv, page))
+			return false;
 		if (same_page && (vec_end_addr & PAGE_MASK) != page_addr)
 			return false;
 
diff --git a/block/blk-merge.c b/block/blk-merge.c
index 1c9d4f0f96ea..aa9164eb7187 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -267,23 +267,6 @@ static struct bio *blk_bio_segment_split(struct request_queue *q,
 			goto split;
 		}
 
-		if (bvprvp) {
-			if (seg_size + bv.bv_len > queue_max_segment_size(q))
-				goto new_segment;
-			if (!biovec_phys_mergeable(q, bvprvp, &bv))
-				goto new_segment;
-
-			seg_size += bv.bv_len;
-			bvprv = bv;
-			bvprvp = &bvprv;
-			sectors += bv.bv_len >> 9;
-
-			if (nsegs == 1 && seg_size > front_seg_size)
-				front_seg_size = seg_size;
-
-			continue;
-		}
-new_segment:
 		if (nsegs == max_segs)
 			goto split;
 
-- 
2.9.5


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 2/6] block: don't merge adjacent bvecs to one segment in bio blk_queue_split
@ 2019-03-09  1:37   ` Ming Lei
  0 siblings, 0 replies; 27+ messages in thread
From: Ming Lei @ 2019-03-09  1:37 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Juergen Gross, Ming Lei, linux-block, xen-devel, ris Ostrovsky,
	Christoph Hellwig, Omar Sandoval

For normal filesystem IO, each page is added via blk_add_page(),
in which bvec(page) merge has been handled already, and basically
not possible to merge two adjacent bvecs in one bio.

So not try to merge two adjacent bvecs in blk_queue_split(), also add
check if one page is mergeable to current bvec in bio_add_page() for
avoiding to break XEN.

Cc: ris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: xen-devel@lists.xenproject.org
Cc: Omar Sandoval <osandov@fb.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/bio.c       |  2 ++
 block/blk-merge.c | 17 -----------------
 2 files changed, 2 insertions(+), 17 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index 71a78d9fb8b7..d8f48188937c 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -776,6 +776,8 @@ bool __bio_try_merge_page(struct bio *bio, struct page *page,
 
 		if (vec_end_addr + 1 != page_addr + off)
 			return false;
+		if (xen_domain() && !xen_biovec_phys_mergeable(bv, page))
+			return false;
 		if (same_page && (vec_end_addr & PAGE_MASK) != page_addr)
 			return false;
 
diff --git a/block/blk-merge.c b/block/blk-merge.c
index 1c9d4f0f96ea..aa9164eb7187 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -267,23 +267,6 @@ static struct bio *blk_bio_segment_split(struct request_queue *q,
 			goto split;
 		}
 
-		if (bvprvp) {
-			if (seg_size + bv.bv_len > queue_max_segment_size(q))
-				goto new_segment;
-			if (!biovec_phys_mergeable(q, bvprvp, &bv))
-				goto new_segment;
-
-			seg_size += bv.bv_len;
-			bvprv = bv;
-			bvprvp = &bvprv;
-			sectors += bv.bv_len >> 9;
-
-			if (nsegs == 1 && seg_size > front_seg_size)
-				front_seg_size = seg_size;
-
-			continue;
-		}
-new_segment:
 		if (nsegs == max_segs)
 			goto split;
 
-- 
2.9.5


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 3/6] block: check if page is mergeable in one helper
  2019-03-09  1:37 ` Ming Lei
                   ` (2 preceding siblings ...)
  (?)
@ 2019-03-09  1:37 ` Ming Lei
  2019-03-11 14:23   ` Christoph Hellwig
  -1 siblings, 1 reply; 27+ messages in thread
From: Ming Lei @ 2019-03-09  1:37 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, Ming Lei, Omar Sandoval, Christoph Hellwig

Now the check for deciding if one page is mergeable to current bvec
becomes a bit complicated, and we need to reuse the code before
adding pc page.

So move the check in one dedicated helper.

No function change.

Cc: Omar Sandoval <osandov@fb.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/bio.c | 38 ++++++++++++++++++++++++--------------
 1 file changed, 24 insertions(+), 14 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index d8f48188937c..62411877224c 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -647,6 +647,24 @@ struct bio *bio_clone_fast(struct bio *bio, gfp_t gfp_mask, struct bio_set *bs)
 }
 EXPORT_SYMBOL(bio_clone_fast);
 
+static inline bool
+page_is_mergeable(const struct bio_vec *bv, struct page *page,
+		unsigned int len, unsigned int off, bool same_page)
+{
+	phys_addr_t vec_end_addr = page_to_phys(bv->bv_page) +
+		bv->bv_offset + bv->bv_len - 1;
+	phys_addr_t page_addr = page_to_phys(page);
+
+	if (vec_end_addr + 1 != page_addr + off)
+		return false;
+	if (xen_domain() && !xen_biovec_phys_mergeable(bv, page))
+		return false;
+	if (same_page && (vec_end_addr & PAGE_MASK) != page_addr)
+		return false;
+
+	return true;
+}
+
 /**
  *	bio_add_pc_page	-	attempt to add page to bio
  *	@q: the target queue
@@ -770,20 +788,12 @@ bool __bio_try_merge_page(struct bio *bio, struct page *page,
 
 	if (bio->bi_vcnt > 0) {
 		struct bio_vec *bv = &bio->bi_io_vec[bio->bi_vcnt - 1];
-		phys_addr_t vec_end_addr = page_to_phys(bv->bv_page) +
-			bv->bv_offset + bv->bv_len - 1;
-		phys_addr_t page_addr = page_to_phys(page);
-
-		if (vec_end_addr + 1 != page_addr + off)
-			return false;
-		if (xen_domain() && !xen_biovec_phys_mergeable(bv, page))
-			return false;
-		if (same_page && (vec_end_addr & PAGE_MASK) != page_addr)
-			return false;
-
-		bv->bv_len += len;
-		bio->bi_iter.bi_size += len;
-		return true;
+
+		if (page_is_mergeable(bv, page, len, off, same_page)) {
+			bv->bv_len += len;
+			bio->bi_iter.bi_size += len;
+			return true;
+		}
 	}
 	return false;
 }
-- 
2.9.5


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 4/6] block: put the same page when adding it to bio
  2019-03-09  1:37 ` Ming Lei
                   ` (3 preceding siblings ...)
  (?)
@ 2019-03-09  1:37 ` Ming Lei
  2019-03-11 14:28   ` Christoph Hellwig
  -1 siblings, 1 reply; 27+ messages in thread
From: Ming Lei @ 2019-03-09  1:37 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, Ming Lei, Omar Sandoval, Christoph Hellwig

When the added page is merged to last same page in bio_add_pc_page(),
the user may need to put this page for avoiding page leak.

bio_map_user_iov() needs this kind of handling, and now it deals with
it by itself in hack style.

Moves the handling of put page into __bio_add_pc_page(), so
bio_map_user_iov() may be simplified a bit, and maybe more users
can benefit from this change.

Cc: Omar Sandoval <osandov@fb.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/bio.c         | 23 ++++++++++-------------
 include/linux/bio.h | 12 ++++++++++--
 2 files changed, 20 insertions(+), 15 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index 62411877224c..95ec5e893265 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -666,12 +666,13 @@ page_is_mergeable(const struct bio_vec *bv, struct page *page,
 }
 
 /**
- *	bio_add_pc_page	-	attempt to add page to bio
+ *	__bio_add_pc_page	-	attempt to add page to bio
  *	@q: the target queue
  *	@bio: destination bio
  *	@page: page to add
  *	@len: vec entry length
  *	@offset: vec entry offset
+ *	@put_same_page: put the page if it is same with last added page
  *
  *	Attempt to add a page to the bio_vec maplist. This can fail for a
  *	number of reasons, such as the bio being full or target block device
@@ -680,8 +681,9 @@ page_is_mergeable(const struct bio_vec *bv, struct page *page,
  *
  *	This should only be used by REQ_PC bios.
  */
-int bio_add_pc_page(struct request_queue *q, struct bio *bio, struct page
-		    *page, unsigned int len, unsigned int offset)
+int __bio_add_pc_page(struct request_queue *q, struct bio *bio, struct page
+		      *page, unsigned int len, unsigned int offset,
+		      bool put_same_page)
 {
 	int retried_segments = 0;
 	struct bio_vec *bvec;
@@ -705,6 +707,8 @@ int bio_add_pc_page(struct request_queue *q, struct bio *bio, struct page
 
 		if (page == prev->bv_page &&
 		    offset == prev->bv_offset + prev->bv_len) {
+			if (put_same_page)
+				put_page(page);
 			prev->bv_len += len;
 			bio->bi_iter.bi_size += len;
 			goto done;
@@ -763,7 +767,7 @@ int bio_add_pc_page(struct request_queue *q, struct bio *bio, struct page
 	blk_recount_segments(q, bio);
 	return 0;
 }
-EXPORT_SYMBOL(bio_add_pc_page);
+EXPORT_SYMBOL(__bio_add_pc_page);
 
 /**
  * __bio_try_merge_page - try appending data to an existing bvec.
@@ -1394,21 +1398,14 @@ struct bio *bio_map_user_iov(struct request_queue *q,
 			for (j = 0; j < npages; j++) {
 				struct page *page = pages[j];
 				unsigned int n = PAGE_SIZE - offs;
-				unsigned short prev_bi_vcnt = bio->bi_vcnt;
 
 				if (n > bytes)
 					n = bytes;
 
-				if (!bio_add_pc_page(q, bio, page, n, offs))
+				if (!__bio_add_pc_page(q, bio, page, n, offs,
+							true))
 					break;
 
-				/*
-				 * check if vector was merged with previous
-				 * drop page reference if needed
-				 */
-				if (bio->bi_vcnt == prev_bi_vcnt)
-					put_page(page);
-
 				added += n;
 				bytes -= n;
 				offs = 0;
diff --git a/include/linux/bio.h b/include/linux/bio.h
index bb6090aa165d..28b8c46de11c 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -430,8 +430,16 @@ extern void bio_reset(struct bio *);
 void bio_chain(struct bio *, struct bio *);
 
 extern int bio_add_page(struct bio *, struct page *, unsigned int,unsigned int);
-extern int bio_add_pc_page(struct request_queue *, struct bio *, struct page *,
-			   unsigned int, unsigned int);
+extern int __bio_add_pc_page(struct request_queue *, struct bio *, struct page *,
+			   unsigned int, unsigned int, bool);
+
+static inline int bio_add_pc_page(struct request_queue *q, struct bio *bio, struct page
+		    *page, unsigned int len, unsigned int offset)
+{
+	return __bio_add_pc_page(q, bio, page, len, offset, false);
+}
+
+
 bool __bio_try_merge_page(struct bio *bio, struct page *page,
 		unsigned int len, unsigned int off, bool same_page);
 void __bio_add_page(struct bio *bio, struct page *page,
-- 
2.9.5


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 5/6] block: enable multi-page bvec for passthrough IO
  2019-03-09  1:37 ` Ming Lei
                   ` (4 preceding siblings ...)
  (?)
@ 2019-03-09  1:37 ` Ming Lei
  2019-03-11 14:35   ` Christoph Hellwig
  -1 siblings, 1 reply; 27+ messages in thread
From: Ming Lei @ 2019-03-09  1:37 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, Ming Lei, Omar Sandoval, Christoph Hellwig

Now block IO stack is basically ready for supporting multi-page bvec,
however it isn't enabled on passthrough IO.

One reason is that passthrough IO is dispatched to LLD directly and bio
split is bypassed, so the bio has to be built correctly for dispatch to
LLD from the beginning.

Implement multi-page support for passthrough IO by limitting each bvec
as block device's segment and applying all kinds of queue limit in
blk_add_pc_page(). Then we don't need to calculate segments any more for
passthrough IO any more, turns out code is simplified much.

Cc: Omar Sandoval <osandov@fb.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/bio.c | 64 +++++++++++++++++++++++++++++++++----------------------------
 1 file changed, 35 insertions(+), 29 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index 95ec5e893265..ce15246f9a1f 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -665,6 +665,27 @@ page_is_mergeable(const struct bio_vec *bv, struct page *page,
 	return true;
 }
 
+/*
+ * Check if the @page can be added to the current segment(@bv), and make
+ * sure to call it only if page_is_mergeable(@bv, @page) is true
+ */
+static bool can_add_page_to_seg(struct request_queue *q,
+		const struct bio_vec *bv, const struct page *page,
+		unsigned len, unsigned offset)
+{
+	unsigned long mask = queue_segment_boundary(q);
+	phys_addr_t addr1 = page_to_phys(bv->bv_page) + bv->bv_offset;
+	phys_addr_t addr2 = page_to_phys(page) + offset + len - 1;
+
+	if ((addr1 | mask) != (addr2 | mask))
+		return false;
+
+	if (bv->bv_len + len > queue_max_segment_size(q))
+		return false;
+
+	return true;
+}
+
 /**
  *	__bio_add_pc_page	-	attempt to add page to bio
  *	@q: the target queue
@@ -680,12 +701,13 @@ page_is_mergeable(const struct bio_vec *bv, struct page *page,
  *	so it is always possible to add a single page to an empty bio.
  *
  *	This should only be used by REQ_PC bios.
+ *
+ *	For REQ_PC bios, bvec is exactly same with segment of block device.
  */
 int __bio_add_pc_page(struct request_queue *q, struct bio *bio, struct page
 		      *page, unsigned int len, unsigned int offset,
 		      bool put_same_page)
 {
-	int retried_segments = 0;
 	struct bio_vec *bvec;
 
 	/*
@@ -705,10 +727,12 @@ int __bio_add_pc_page(struct request_queue *q, struct bio *bio, struct page
 	if (bio->bi_vcnt > 0) {
 		struct bio_vec *prev = &bio->bi_io_vec[bio->bi_vcnt - 1];
 
+		/* segment size is always >= PAGE_SIZE */
 		if (page == prev->bv_page &&
 		    offset == prev->bv_offset + prev->bv_len) {
 			if (put_same_page)
 				put_page(page);
+ bvec_merge:
 			prev->bv_len += len;
 			bio->bi_iter.bi_size += len;
 			goto done;
@@ -720,11 +744,18 @@ int __bio_add_pc_page(struct request_queue *q, struct bio *bio, struct page
 		 */
 		if (bvec_gap_to_prev(q, prev, offset))
 			return 0;
+
+		if (page_is_mergeable(prev, page, len, offset, false) &&
+				can_add_page_to_seg(q, prev, page, len, offset))
+			goto bvec_merge;
 	}
 
 	if (bio_full(bio))
 		return 0;
 
+	if (bio->bi_phys_segments >= queue_max_segments(q))
+		return 0;
+
 	/*
 	 * setup the new entry, we might clear it again later if we
 	 * cannot add the page
@@ -734,38 +765,13 @@ int __bio_add_pc_page(struct request_queue *q, struct bio *bio, struct page
 	bvec->bv_len = len;
 	bvec->bv_offset = offset;
 	bio->bi_vcnt++;
-	bio->bi_phys_segments++;
 	bio->bi_iter.bi_size += len;
 
-	/*
-	 * Perform a recount if the number of segments is greater
-	 * than queue_max_segments(q).
-	 */
-
-	while (bio->bi_phys_segments > queue_max_segments(q)) {
-
-		if (retried_segments)
-			goto failed;
-
-		retried_segments = 1;
-		blk_recount_segments(q, bio);
-	}
-
-	/* If we may be able to merge these biovecs, force a recount */
-	if (bio->bi_vcnt > 1 && biovec_phys_mergeable(q, bvec - 1, bvec))
-		bio_clear_flag(bio, BIO_SEG_VALID);
-
  done:
+	bio->bi_phys_segments = bio->bi_vcnt;
+	if (!bio_flagged(bio, BIO_SEG_VALID))
+		bio_set_flag(bio, BIO_SEG_VALID);
 	return len;
-
- failed:
-	bvec->bv_page = NULL;
-	bvec->bv_len = 0;
-	bvec->bv_offset = 0;
-	bio->bi_vcnt--;
-	bio->bi_iter.bi_size -= len;
-	blk_recount_segments(q, bio);
-	return 0;
 }
 EXPORT_SYMBOL(__bio_add_pc_page);
 
-- 
2.9.5


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 6/6] block: don't check if adjacent bvecs in one bio can be mergeable
  2019-03-09  1:37 ` Ming Lei
                   ` (5 preceding siblings ...)
  (?)
@ 2019-03-09  1:37 ` Ming Lei
  2019-03-11 14:40   ` Christoph Hellwig
  -1 siblings, 1 reply; 27+ messages in thread
From: Ming Lei @ 2019-03-09  1:37 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, Ming Lei, Omar Sandoval, Christoph Hellwig

Now both passthrough and FS IO have supported multi-page bvec, and
bvec merging has been handled actually when adding page to bio, then
adjacent bvecs won't be mergeable any more if they belong to same bio.

So only try to merge bvecs if they are from different bios.

Cc: Omar Sandoval <osandov@fb.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-merge.c | 81 ++++++++++++++++++++++++++++++++++---------------------
 1 file changed, 51 insertions(+), 30 deletions(-)

diff --git a/block/blk-merge.c b/block/blk-merge.c
index aa9164eb7187..5c52141eee7d 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -354,11 +354,11 @@ static unsigned int __blk_recalc_rq_segments(struct request_queue *q,
 					     struct bio *bio)
 {
 	struct bio_vec bv, bvprv = { NULL };
-	int prev = 0;
 	unsigned int seg_size, nr_phys_segs;
 	unsigned front_seg_size;
 	struct bio *fbio, *bbio;
 	struct bvec_iter iter;
+	bool new_bio;
 
 	if (!bio)
 		return 0;
@@ -377,9 +377,10 @@ static unsigned int __blk_recalc_rq_segments(struct request_queue *q,
 	fbio = bio;
 	seg_size = 0;
 	nr_phys_segs = 0;
+	new_bio = false;
 	for_each_bio(bio) {
 		bio_for_each_bvec(bv, bio, iter) {
-			if (prev) {
+			if (new_bio) {
 				if (seg_size + bv.bv_len
 				    > queue_max_segment_size(q))
 					goto new_segment;
@@ -387,7 +388,6 @@ static unsigned int __blk_recalc_rq_segments(struct request_queue *q,
 					goto new_segment;
 
 				seg_size += bv.bv_len;
-				bvprv = bv;
 
 				if (nr_phys_segs == 1 && seg_size >
 						front_seg_size)
@@ -396,12 +396,13 @@ static unsigned int __blk_recalc_rq_segments(struct request_queue *q,
 				continue;
 			}
 new_segment:
-			bvprv = bv;
-			prev = 1;
 			bvec_split_segs(q, &bv, &nr_phys_segs, &seg_size,
 					&front_seg_size, NULL, UINT_MAX);
+			new_bio = false;
 		}
 		bbio = bio;
+		bvprv = bv;
+		new_bio = true;
 	}
 
 	fbio->bi_seg_front_size = front_seg_size;
@@ -493,31 +494,38 @@ static unsigned blk_bvec_map_sg(struct request_queue *q,
 	return nsegs;
 }
 
-static inline void
-__blk_segment_map_sg(struct request_queue *q, struct bio_vec *bvec,
-		     struct scatterlist *sglist, struct bio_vec *bvprv,
-		     struct scatterlist **sg, int *nsegs)
+/* only try to merge bvecs into one sg if they are from two bios */
+static inline bool
+__blk_segment_map_sg_merge(struct request_queue *q, struct bio_vec *bvec,
+			   struct bio_vec *bvprv, struct scatterlist **sg)
 {
-
 	int nbytes = bvec->bv_len;
 
-	if (*sg) {
-		if ((*sg)->length + nbytes > queue_max_segment_size(q))
-			goto new_segment;
-		if (!biovec_phys_mergeable(q, bvprv, bvec))
-			goto new_segment;
+	if (!*sg)
+		return false;
 
-		(*sg)->length += nbytes;
-	} else {
-new_segment:
-		if (bvec->bv_offset + bvec->bv_len <= PAGE_SIZE) {
-			*sg = blk_next_sg(sg, sglist);
-			sg_set_page(*sg, bvec->bv_page, nbytes, bvec->bv_offset);
-			(*nsegs) += 1;
-		} else
-			(*nsegs) += blk_bvec_map_sg(q, bvec, sglist, sg);
-	}
-	*bvprv = *bvec;
+	if ((*sg)->length + nbytes > queue_max_segment_size(q))
+		return false;
+
+	if (!biovec_phys_mergeable(q, bvprv, bvec))
+		return false;
+
+	(*sg)->length += nbytes;
+
+	return true;
+}
+
+static inline void
+__blk_segment_map_sg(struct request_queue *q, struct bio_vec *bvec,
+		     struct scatterlist *sglist, struct scatterlist **sg,
+		     int *nsegs)
+{
+	if (bvec->bv_offset + bvec->bv_len <= PAGE_SIZE) {
+		*sg = blk_next_sg(sg, sglist);
+		sg_set_page(*sg, bvec->bv_page, bvec->bv_len, bvec->bv_offset);
+		(*nsegs) += 1;
+	} else
+		(*nsegs) += blk_bvec_map_sg(q, bvec, sglist, sg);
 }
 
 static inline int __blk_bvec_map_sg(struct request_queue *q, struct bio_vec bv,
@@ -535,11 +543,24 @@ static int __blk_bios_map_sg(struct request_queue *q, struct bio *bio,
 	struct bio_vec bvec, bvprv = { NULL };
 	struct bvec_iter iter;
 	int nsegs = 0;
+	bool new_bio = false;
 
-	for_each_bio(bio)
-		bio_for_each_bvec(bvec, bio, iter)
-			__blk_segment_map_sg(q, &bvec, sglist, &bvprv, sg,
-					     &nsegs);
+	for_each_bio(bio) {
+		bio_for_each_bvec(bvec, bio, iter) {
+			/*
+			 * Only try to merge bvecs from two bios given we
+			 * have done bio internal merge when adding pages
+			 * to bio
+			 */
+			if (!new_bio || !__blk_segment_map_sg_merge(q,
+						&bvec, &bvprv, sg))
+				__blk_segment_map_sg(q, &bvec, sglist, sg,
+						     &nsegs);
+			new_bio = false;
+		}
+		bvprv = bvec;
+		new_bio = true;
+	}
 
 	return nsegs;
 }
-- 
2.9.5


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/6] block: pass page to xen_biovec_phys_mergeable
  2019-03-09  1:37   ` Ming Lei
@ 2019-03-11 14:16     ` Christoph Hellwig
  -1 siblings, 0 replies; 27+ messages in thread
From: Christoph Hellwig @ 2019-03-11 14:16 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, linux-block, ris Ostrovsky, Juergen Gross, xen-devel,
	Omar Sandoval, Christoph Hellwig

On Sat, Mar 09, 2019 at 09:37:32AM +0800, Ming Lei wrote:
> xen_biovec_phys_mergeable() only needs .bv_page of the 2nd bio bvec
> for checking if the two bvecs can be merged, so pass page to
> xen_biovec_phys_mergeable() directly.

Looks fine:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/6] block: pass page to xen_biovec_phys_mergeable
@ 2019-03-11 14:16     ` Christoph Hellwig
  0 siblings, 0 replies; 27+ messages in thread
From: Christoph Hellwig @ 2019-03-11 14:16 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, Juergen Gross, Christoph Hellwig, linux-block,
	xen-devel, ris Ostrovsky, Omar Sandoval

On Sat, Mar 09, 2019 at 09:37:32AM +0800, Ming Lei wrote:
> xen_biovec_phys_mergeable() only needs .bv_page of the 2nd bio bvec
> for checking if the two bvecs can be merged, so pass page to
> xen_biovec_phys_mergeable() directly.

Looks fine:

Reviewed-by: Christoph Hellwig <hch@lst.de>

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/6] block: don't merge adjacent bvecs to one segment in bio blk_queue_split
  2019-03-09  1:37   ` Ming Lei
@ 2019-03-11 14:21     ` Christoph Hellwig
  -1 siblings, 0 replies; 27+ messages in thread
From: Christoph Hellwig @ 2019-03-11 14:21 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, linux-block, ris Ostrovsky, Juergen Gross, xen-devel,
	Omar Sandoval, Christoph Hellwig

On Sat, Mar 09, 2019 at 09:37:33AM +0800, Ming Lei wrote:
> For normal filesystem IO, each page is added via blk_add_page(),
> in which bvec(page) merge has been handled already, and basically
> not possible to merge two adjacent bvecs in one bio.
> 
> So not try to merge two adjacent bvecs in blk_queue_split(), also add
> check if one page is mergeable to current bvec in bio_add_page() for
> avoiding to break XEN.

Isn't this two entirely different things?  Both look good to me,
but I don't understand why this is one patch vs two.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/6] block: don't merge adjacent bvecs to one segment in bio blk_queue_split
@ 2019-03-11 14:21     ` Christoph Hellwig
  0 siblings, 0 replies; 27+ messages in thread
From: Christoph Hellwig @ 2019-03-11 14:21 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, Juergen Gross, Christoph Hellwig, linux-block,
	xen-devel, ris Ostrovsky, Omar Sandoval

On Sat, Mar 09, 2019 at 09:37:33AM +0800, Ming Lei wrote:
> For normal filesystem IO, each page is added via blk_add_page(),
> in which bvec(page) merge has been handled already, and basically
> not possible to merge two adjacent bvecs in one bio.
> 
> So not try to merge two adjacent bvecs in blk_queue_split(), also add
> check if one page is mergeable to current bvec in bio_add_page() for
> avoiding to break XEN.

Isn't this two entirely different things?  Both look good to me,
but I don't understand why this is one patch vs two.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 3/6] block: check if page is mergeable in one helper
  2019-03-09  1:37 ` [PATCH 3/6] block: check if page is mergeable in one helper Ming Lei
@ 2019-03-11 14:23   ` Christoph Hellwig
  2019-03-17  8:05     ` Ming Lei
  0 siblings, 1 reply; 27+ messages in thread
From: Christoph Hellwig @ 2019-03-11 14:23 UTC (permalink / raw)
  To: Ming Lei; +Cc: Jens Axboe, linux-block, Omar Sandoval, Christoph Hellwig

> +static inline bool
> +page_is_mergeable(const struct bio_vec *bv, struct page *page,
> +		unsigned int len, unsigned int off, bool same_page)

Please follow the other function declarations in this file:

static inline bool page_is_mergeable(const struct bio_vec *bv,
		struct page *page, unsigned int len, unsigned int off,
		bool same_page)

> +	if (same_page && (vec_end_addr & PAGE_MASK) != page_addr)
> +		return false;

No need for the inner braces here.

Otherwis this looks good to me.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 4/6] block: put the same page when adding it to bio
  2019-03-09  1:37 ` [PATCH 4/6] block: put the same page when adding it to bio Ming Lei
@ 2019-03-11 14:28   ` Christoph Hellwig
  0 siblings, 0 replies; 27+ messages in thread
From: Christoph Hellwig @ 2019-03-11 14:28 UTC (permalink / raw)
  To: Ming Lei; +Cc: Jens Axboe, linux-block, Omar Sandoval, Christoph Hellwig

> +int __bio_add_pc_page(struct request_queue *q, struct bio *bio, struct page
> +		      *page, unsigned int len, unsigned int offset,
> +		      bool put_same_page)

Very odd indentation, we try to never have a linebreak between the type
and its parameter name.

> +static inline int bio_add_pc_page(struct request_queue *q, struct bio *bio, struct page
> +		    *page, unsigned int len, unsigned int offset)
> +{
> +	return __bio_add_pc_page(q, bio, page, len, offset, false);
> +}

Too long line.  Also please keep bio_add_pc_page in bio.c and export
it instead of the low-level helper.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 5/6] block: enable multi-page bvec for passthrough IO
  2019-03-09  1:37 ` [PATCH 5/6] block: enable multi-page bvec for passthrough IO Ming Lei
@ 2019-03-11 14:35   ` Christoph Hellwig
  2019-03-12  1:06     ` Ming Lei
  0 siblings, 1 reply; 27+ messages in thread
From: Christoph Hellwig @ 2019-03-11 14:35 UTC (permalink / raw)
  To: Ming Lei; +Cc: Jens Axboe, linux-block, Omar Sandoval, Christoph Hellwig

>  		struct bio_vec *prev = &bio->bi_io_vec[bio->bi_vcnt - 1];
>  
> +		/* segment size is always >= PAGE_SIZE */

I don't think that actually is true.  We have various drivers with 4k
segment size, for which this would not be true with a 64k page size
system.

>  		if (page == prev->bv_page &&
>  		    offset == prev->bv_offset + prev->bv_len) {
>  			if (put_same_page)
>  				put_page(page);
> + bvec_merge:
>  			prev->bv_len += len;
>  			bio->bi_iter.bi_size += len;

Please throw in a cleanup patch that removes prev and and always uses
bvec, then we can just move the done label up by two lines and handle
the increment of bv_len and bi_size in a common branch.

>   done:
> +	bio->bi_phys_segments = bio->bi_vcnt;
> +	if (!bio_flagged(bio, BIO_SEG_VALID))
> +		bio_set_flag(bio, BIO_SEG_VALID);

How would BIO_SEG_VALID get set previously?  And even if it did
we wouldn't optimize much here, I'd just do it unconditionally.

Btw, there are various comments in this function that either already
were or have become out of data, like talking about REQ_PC or
file system I/O - I think they could use some love.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 6/6] block: don't check if adjacent bvecs in one bio can be mergeable
  2019-03-09  1:37 ` [PATCH 6/6] block: don't check if adjacent bvecs in one bio can be mergeable Ming Lei
@ 2019-03-11 14:40   ` Christoph Hellwig
  2019-03-12  1:19     ` Ming Lei
  0 siblings, 1 reply; 27+ messages in thread
From: Christoph Hellwig @ 2019-03-11 14:40 UTC (permalink / raw)
  To: Ming Lei; +Cc: Jens Axboe, linux-block, Omar Sandoval, Christoph Hellwig

> +	bool new_bio;
>  
>  	if (!bio)
>  		return 0;
> @@ -377,9 +377,10 @@ static unsigned int __blk_recalc_rq_segments(struct request_queue *q,
>  	fbio = bio;
>  	seg_size = 0;
>  	nr_phys_segs = 0;
> +	new_bio = false;

I'd just initialize it to false in the declaration line.

> +/* only try to merge bvecs into one sg if they are from two bios */
> +static inline bool
> +__blk_segment_map_sg_merge(struct request_queue *q, struct bio_vec *bvec,
> +			   struct bio_vec *bvprv, struct scatterlist **sg)
>  {
>  	int nbytes = bvec->bv_len;
>  
> +	if (!*sg)
> +		return false;
>  
> +	if ((*sg)->length + nbytes > queue_max_segment_size(q))
> +		return false;
> +
> +	if (!biovec_phys_mergeable(q, bvprv, bvec))
> +		return false;
> +
> +	(*sg)->length += nbytes;
> +
> +	return true;
> +}
> +
> +static inline void
> +__blk_segment_map_sg(struct request_queue *q, struct bio_vec *bvec,
> +		     struct scatterlist *sglist, struct scatterlist **sg,
> +		     int *nsegs)
> +{
> +	if (bvec->bv_offset + bvec->bv_len <= PAGE_SIZE) {
> +		*sg = blk_next_sg(sg, sglist);
> +		sg_set_page(*sg, bvec->bv_page, bvec->bv_len, bvec->bv_offset);
> +		(*nsegs) += 1;

This branch is basically the same as __blk_bvec_map_sg, I wonder if
we can reuse it.

> +	} else
> +		(*nsegs) += blk_bvec_map_sg(q, bvec, sglist, sg);
>  }

And then maybe just kill off __blk_segment_map_sg entirely by moving
the if else into the caller.

> @@ -535,11 +543,24 @@ static int __blk_bios_map_sg(struct request_queue *q, struct bio *bio,
>  	struct bio_vec bvec, bvprv = { NULL };
>  	struct bvec_iter iter;
>  	int nsegs = 0;
> +	bool new_bio = false;
>  
> +	for_each_bio(bio) {
> +		bio_for_each_bvec(bvec, bio, iter) {
> +			/*
> +			 * Only try to merge bvecs from two bios given we
> +			 * have done bio internal merge when adding pages
> +			 * to bio
> +			 */
> +			if (!new_bio || !__blk_segment_map_sg_merge(q,
> +						&bvec, &bvprv, sg))

Somewhat hard to read.  Why not:

			if (new_bio ||
			    !__blk_segment_map_sg_merge(q, &bvec, &bvprv, sg))

That being said I'd really like to see some stats on workloads that
actually trigger cross-bio segment merges.  If we get a lot of those
we are doing something wrong.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/6] block: pass page to xen_biovec_phys_mergeable
  2019-03-09  1:37   ` Ming Lei
@ 2019-03-11 19:57     ` Boris Ostrovsky
  -1 siblings, 0 replies; 27+ messages in thread
From: Boris Ostrovsky @ 2019-03-11 19:57 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, linux-block, Juergen Gross, xen-devel, Omar Sandoval,
	Christoph Hellwig

On Sat, Mar 09, 2019 at 09:37:32AM +0800, Ming Lei wrote:
> xen_biovec_phys_mergeable() only needs .bv_page of the 2nd bio bvec
> for checking if the two bvecs can be merged, so pass page to
> xen_biovec_phys_mergeable() directly.
> 
> No function change.
> 
> Cc: ris Ostrovsky <boris.ostrovsky@oracle.com>
> Cc: Juergen Gross <jgross@suse.com>
> Cc: xen-devel@lists.xenproject.org
> Cc: Omar Sandoval <osandov@fb.com>
> Cc: Christoph Hellwig <hch@lst.de>
> Signed-off-by: Ming Lei <ming.lei@redhat.com>


Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
             ^^

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/6] block: pass page to xen_biovec_phys_mergeable
@ 2019-03-11 19:57     ` Boris Ostrovsky
  0 siblings, 0 replies; 27+ messages in thread
From: Boris Ostrovsky @ 2019-03-11 19:57 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, Juergen Gross, Christoph Hellwig, linux-block,
	xen-devel, Omar Sandoval

On Sat, Mar 09, 2019 at 09:37:32AM +0800, Ming Lei wrote:
> xen_biovec_phys_mergeable() only needs .bv_page of the 2nd bio bvec
> for checking if the two bvecs can be merged, so pass page to
> xen_biovec_phys_mergeable() directly.
> 
> No function change.
> 
> Cc: ris Ostrovsky <boris.ostrovsky@oracle.com>
> Cc: Juergen Gross <jgross@suse.com>
> Cc: xen-devel@lists.xenproject.org
> Cc: Omar Sandoval <osandov@fb.com>
> Cc: Christoph Hellwig <hch@lst.de>
> Signed-off-by: Ming Lei <ming.lei@redhat.com>


Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
             ^^

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/6] block: don't merge adjacent bvecs to one segment in bio blk_queue_split
  2019-03-09  1:37   ` Ming Lei
@ 2019-03-11 19:58     ` Boris Ostrovsky
  -1 siblings, 0 replies; 27+ messages in thread
From: Boris Ostrovsky @ 2019-03-11 19:58 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, linux-block, Juergen Gross, xen-devel, Omar Sandoval,
	Christoph Hellwig

On Sat, Mar 09, 2019 at 09:37:33AM +0800, Ming Lei wrote:
> For normal filesystem IO, each page is added via blk_add_page(),
> in which bvec(page) merge has been handled already, and basically
> not possible to merge two adjacent bvecs in one bio.
> 
> So not try to merge two adjacent bvecs in blk_queue_split(), also add
> check if one page is mergeable to current bvec in bio_add_page() for
> avoiding to break XEN.
> 
> Cc: ris Ostrovsky <boris.ostrovsky@oracle.com>
> Cc: Juergen Gross <jgross@suse.com>
> Cc: xen-devel@lists.xenproject.org
> Cc: Omar Sandoval <osandov@fb.com>
> Cc: Christoph Hellwig <hch@lst.de>
> Signed-off-by: Ming Lei <ming.lei@redhat.com>
> ---
>  block/bio.c       |  2 ++
>  block/blk-merge.c | 17 -----------------
>  2 files changed, 2 insertions(+), 17 deletions(-)
> 
> diff --git a/block/bio.c b/block/bio.c
> index 71a78d9fb8b7..d8f48188937c 100644
> --- a/block/bio.c
> +++ b/block/bio.c
> @@ -776,6 +776,8 @@ bool __bio_try_merge_page(struct bio *bio, struct page *page,
>  
>  		if (vec_end_addr + 1 != page_addr + off)
>  			return false;
> +		if (xen_domain() && !xen_biovec_phys_mergeable(bv, page))
> +			return false;
>  		if (same_page && (vec_end_addr & PAGE_MASK) != page_addr)
>  			return false;
>  

Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/6] block: don't merge adjacent bvecs to one segment in bio blk_queue_split
@ 2019-03-11 19:58     ` Boris Ostrovsky
  0 siblings, 0 replies; 27+ messages in thread
From: Boris Ostrovsky @ 2019-03-11 19:58 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, Juergen Gross, Christoph Hellwig, linux-block,
	xen-devel, Omar Sandoval

On Sat, Mar 09, 2019 at 09:37:33AM +0800, Ming Lei wrote:
> For normal filesystem IO, each page is added via blk_add_page(),
> in which bvec(page) merge has been handled already, and basically
> not possible to merge two adjacent bvecs in one bio.
> 
> So not try to merge two adjacent bvecs in blk_queue_split(), also add
> check if one page is mergeable to current bvec in bio_add_page() for
> avoiding to break XEN.
> 
> Cc: ris Ostrovsky <boris.ostrovsky@oracle.com>
> Cc: Juergen Gross <jgross@suse.com>
> Cc: xen-devel@lists.xenproject.org
> Cc: Omar Sandoval <osandov@fb.com>
> Cc: Christoph Hellwig <hch@lst.de>
> Signed-off-by: Ming Lei <ming.lei@redhat.com>
> ---
>  block/bio.c       |  2 ++
>  block/blk-merge.c | 17 -----------------
>  2 files changed, 2 insertions(+), 17 deletions(-)
> 
> diff --git a/block/bio.c b/block/bio.c
> index 71a78d9fb8b7..d8f48188937c 100644
> --- a/block/bio.c
> +++ b/block/bio.c
> @@ -776,6 +776,8 @@ bool __bio_try_merge_page(struct bio *bio, struct page *page,
>  
>  		if (vec_end_addr + 1 != page_addr + off)
>  			return false;
> +		if (xen_domain() && !xen_biovec_phys_mergeable(bv, page))
> +			return false;
>  		if (same_page && (vec_end_addr & PAGE_MASK) != page_addr)
>  			return false;
>  

Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 5/6] block: enable multi-page bvec for passthrough IO
  2019-03-11 14:35   ` Christoph Hellwig
@ 2019-03-12  1:06     ` Ming Lei
  0 siblings, 0 replies; 27+ messages in thread
From: Ming Lei @ 2019-03-12  1:06 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Jens Axboe, linux-block, Omar Sandoval

On Mon, Mar 11, 2019 at 03:35:54PM +0100, Christoph Hellwig wrote:
> >  		struct bio_vec *prev = &bio->bi_io_vec[bio->bi_vcnt - 1];
> >  
> > +		/* segment size is always >= PAGE_SIZE */
> 
> I don't think that actually is true.  We have various drivers with 4k
> segment size, for which this would not be true with a 64k page size
> system.

Please look at blk_queue_max_segment_size():

void blk_queue_max_segment_size(struct request_queue *q, unsigned int max_size)
{
        if (max_size < PAGE_SIZE) {
                max_size = PAGE_SIZE;
                printk(KERN_INFO "%s: set to minimum %d\n",
                       __func__, max_size);
        }

        q->limits.max_segment_size = max_size;
}

But your concern is correct wrt. 4k segment size vs. 64k page size,
however, seems not see any such report.

> 
> >  		if (page == prev->bv_page &&
> >  		    offset == prev->bv_offset + prev->bv_len) {
> >  			if (put_same_page)
> >  				put_page(page);
> > + bvec_merge:
> >  			prev->bv_len += len;
> >  			bio->bi_iter.bi_size += len;
> 
> Please throw in a cleanup patch that removes prev and and always uses
> bvec, then we can just move the done label up by two lines and handle
> the increment of bv_len and bi_size in a common branch.

Seems only one line of 'bio->bi_iter.bi_size += len;' can be shared.

> 
> >   done:
> > +	bio->bi_phys_segments = bio->bi_vcnt;
> > +	if (!bio_flagged(bio, BIO_SEG_VALID))
> > +		bio_set_flag(bio, BIO_SEG_VALID);
> 
> How would BIO_SEG_VALID get set previously?  And even if it did

bio_add_pc_page() is run over each page.

> we wouldn't optimize much here, I'd just do it unconditionally.

OK.

> 
> Btw, there are various comments in this function that either already
> were or have become out of data, like talking about REQ_PC or
> file system I/O - I think they could use some love.

OK.

Thanks,
Ming

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 6/6] block: don't check if adjacent bvecs in one bio can be mergeable
  2019-03-11 14:40   ` Christoph Hellwig
@ 2019-03-12  1:19     ` Ming Lei
  0 siblings, 0 replies; 27+ messages in thread
From: Ming Lei @ 2019-03-12  1:19 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Jens Axboe, linux-block, Omar Sandoval

On Mon, Mar 11, 2019 at 03:40:24PM +0100, Christoph Hellwig wrote:
> > +	bool new_bio;
> >  
> >  	if (!bio)
> >  		return 0;
> > @@ -377,9 +377,10 @@ static unsigned int __blk_recalc_rq_segments(struct request_queue *q,
> >  	fbio = bio;
> >  	seg_size = 0;
> >  	nr_phys_segs = 0;
> > +	new_bio = false;
> 
> I'd just initialize it to false in the declaration line.

OK.

> 
> > +/* only try to merge bvecs into one sg if they are from two bios */
> > +static inline bool
> > +__blk_segment_map_sg_merge(struct request_queue *q, struct bio_vec *bvec,
> > +			   struct bio_vec *bvprv, struct scatterlist **sg)
> >  {
> >  	int nbytes = bvec->bv_len;
> >  
> > +	if (!*sg)
> > +		return false;
> >  
> > +	if ((*sg)->length + nbytes > queue_max_segment_size(q))
> > +		return false;
> > +
> > +	if (!biovec_phys_mergeable(q, bvprv, bvec))
> > +		return false;
> > +
> > +	(*sg)->length += nbytes;
> > +
> > +	return true;
> > +}
> > +
> > +static inline void
> > +__blk_segment_map_sg(struct request_queue *q, struct bio_vec *bvec,
> > +		     struct scatterlist *sglist, struct scatterlist **sg,
> > +		     int *nsegs)
> > +{
> > +	if (bvec->bv_offset + bvec->bv_len <= PAGE_SIZE) {
> > +		*sg = blk_next_sg(sg, sglist);
> > +		sg_set_page(*sg, bvec->bv_page, bvec->bv_len, bvec->bv_offset);
> > +		(*nsegs) += 1;
> 
> This branch is basically the same as __blk_bvec_map_sg, I wonder if
> we can reuse it.

Good point!

> 
> > +	} else
> > +		(*nsegs) += blk_bvec_map_sg(q, bvec, sglist, sg);
> >  }
> 
> And then maybe just kill off __blk_segment_map_sg entirely by moving
> the if else into the caller.
> 
> > @@ -535,11 +543,24 @@ static int __blk_bios_map_sg(struct request_queue *q, struct bio *bio,
> >  	struct bio_vec bvec, bvprv = { NULL };
> >  	struct bvec_iter iter;
> >  	int nsegs = 0;
> > +	bool new_bio = false;
> >  
> > +	for_each_bio(bio) {
> > +		bio_for_each_bvec(bvec, bio, iter) {
> > +			/*
> > +			 * Only try to merge bvecs from two bios given we
> > +			 * have done bio internal merge when adding pages
> > +			 * to bio
> > +			 */
> > +			if (!new_bio || !__blk_segment_map_sg_merge(q,
> > +						&bvec, &bvprv, sg))
> 
> Somewhat hard to read.  Why not:
> 
> 			if (new_bio ||
> 			    !__blk_segment_map_sg_merge(q, &bvec, &bvprv, sg))

OK.

> 
> That being said I'd really like to see some stats on workloads that
> actually trigger cross-bio segment merges.  If we get a lot of those
> we are doing something wrong.

Please see 729204ef49ec00b ("block: relax check on sg gap"):

    If the last bvec of the 1st bio and the 1st bvec of the next
    bio are physically contigious, and the latter can be merged
    to last segment of the 1st bio, we should think they don't
    violate sg gap(or virt boundary) limit.

    Both Vitaly and Dexuan reported lots of unmergeable small bios
    are observed when running mkfs on Hyper-V virtual storage, and
    performance becomes quite low. This patch fixes that performance
    issue.

    The same issue should exist on NVMe, since it sets virt boundary too.

However, if the related fs code can be fixed to not generate too many
small bios, we might kill the cross-bio segment merge. Also not sure if
mkfs is the only such case.
    
Thanks,
Ming

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/6] block: don't merge adjacent bvecs to one segment in bio blk_queue_split
  2019-03-11 14:21     ` Christoph Hellwig
@ 2019-03-12  1:22       ` Ming Lei
  -1 siblings, 0 replies; 27+ messages in thread
From: Ming Lei @ 2019-03-12  1:22 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, linux-block, ris Ostrovsky, Juergen Gross, xen-devel,
	Omar Sandoval

On Mon, Mar 11, 2019 at 03:21:06PM +0100, Christoph Hellwig wrote:
> On Sat, Mar 09, 2019 at 09:37:33AM +0800, Ming Lei wrote:
> > For normal filesystem IO, each page is added via blk_add_page(),
> > in which bvec(page) merge has been handled already, and basically
> > not possible to merge two adjacent bvecs in one bio.
> > 
> > So not try to merge two adjacent bvecs in blk_queue_split(), also add
> > check if one page is mergeable to current bvec in bio_add_page() for
> > avoiding to break XEN.
> 
> Isn't this two entirely different things?  Both look good to me,
> but I don't understand why this is one patch vs two.

You are right, it should have been two things logically.

Thanks,
Ming

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/6] block: don't merge adjacent bvecs to one segment in bio blk_queue_split
@ 2019-03-12  1:22       ` Ming Lei
  0 siblings, 0 replies; 27+ messages in thread
From: Ming Lei @ 2019-03-12  1:22 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, Juergen Gross, linux-block, xen-devel, ris Ostrovsky,
	Omar Sandoval

On Mon, Mar 11, 2019 at 03:21:06PM +0100, Christoph Hellwig wrote:
> On Sat, Mar 09, 2019 at 09:37:33AM +0800, Ming Lei wrote:
> > For normal filesystem IO, each page is added via blk_add_page(),
> > in which bvec(page) merge has been handled already, and basically
> > not possible to merge two adjacent bvecs in one bio.
> > 
> > So not try to merge two adjacent bvecs in blk_queue_split(), also add
> > check if one page is mergeable to current bvec in bio_add_page() for
> > avoiding to break XEN.
> 
> Isn't this two entirely different things?  Both look good to me,
> but I don't understand why this is one patch vs two.

You are right, it should have been two things logically.

Thanks,
Ming

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 3/6] block: check if page is mergeable in one helper
  2019-03-11 14:23   ` Christoph Hellwig
@ 2019-03-17  8:05     ` Ming Lei
  0 siblings, 0 replies; 27+ messages in thread
From: Ming Lei @ 2019-03-17  8:05 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Jens Axboe, linux-block, Omar Sandoval

On Mon, Mar 11, 2019 at 03:23:56PM +0100, Christoph Hellwig wrote:
> > +static inline bool
> > +page_is_mergeable(const struct bio_vec *bv, struct page *page,
> > +		unsigned int len, unsigned int off, bool same_page)
> 
> Please follow the other function declarations in this file:
> 
> static inline bool page_is_mergeable(const struct bio_vec *bv,
> 		struct page *page, unsigned int len, unsigned int off,
> 		bool same_page)

OK.

> 
> > +	if (same_page && (vec_end_addr & PAGE_MASK) != page_addr)
> > +		return false;
> 
> No need for the inner braces here.

I think it is needed, given "!=" has higher precedence thank "&", see

	https://en.cppreference.com/w/c/language/operator_precedence

Also the line is more readable with inner braces.

Thanks,
Ming

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2019-03-17  8:05 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-03-09  1:37 [PATCH 0/6] block: enable multi-page bvec for passthrough IO Ming Lei
2019-03-09  1:37 ` Ming Lei
2019-03-09  1:37 ` [PATCH 1/6] block: pass page to xen_biovec_phys_mergeable Ming Lei
2019-03-09  1:37   ` Ming Lei
2019-03-11 14:16   ` Christoph Hellwig
2019-03-11 14:16     ` Christoph Hellwig
2019-03-11 19:57   ` Boris Ostrovsky
2019-03-11 19:57     ` Boris Ostrovsky
2019-03-09  1:37 ` [PATCH 2/6] block: don't merge adjacent bvecs to one segment in bio blk_queue_split Ming Lei
2019-03-09  1:37   ` Ming Lei
2019-03-11 14:21   ` Christoph Hellwig
2019-03-11 14:21     ` Christoph Hellwig
2019-03-12  1:22     ` Ming Lei
2019-03-12  1:22       ` Ming Lei
2019-03-11 19:58   ` Boris Ostrovsky
2019-03-11 19:58     ` Boris Ostrovsky
2019-03-09  1:37 ` [PATCH 3/6] block: check if page is mergeable in one helper Ming Lei
2019-03-11 14:23   ` Christoph Hellwig
2019-03-17  8:05     ` Ming Lei
2019-03-09  1:37 ` [PATCH 4/6] block: put the same page when adding it to bio Ming Lei
2019-03-11 14:28   ` Christoph Hellwig
2019-03-09  1:37 ` [PATCH 5/6] block: enable multi-page bvec for passthrough IO Ming Lei
2019-03-11 14:35   ` Christoph Hellwig
2019-03-12  1:06     ` Ming Lei
2019-03-09  1:37 ` [PATCH 6/6] block: don't check if adjacent bvecs in one bio can be mergeable Ming Lei
2019-03-11 14:40   ` Christoph Hellwig
2019-03-12  1:19     ` Ming Lei

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.