linux-block.vger.kernel.org archive mirror
* alternative take on the same page merging leak fix
@ 2019-06-11 15:10 Christoph Hellwig
  2019-06-11 15:10 ` [PATCH 1/5] block: fix gap checking in __bio_add_pc_page Christoph Hellwig
                   ` (5 more replies)
  0 siblings, 6 replies; 14+ messages in thread
From: Christoph Hellwig @ 2019-06-11 15:10 UTC (permalink / raw)
  To: Jens Axboe, Ming Lei
  Cc: David Gibson, Darrick J. Wong, linux-block, linux-xfs

Hi Jens, hi Ming,

this is the tested and split version of what I think is the better
fix for the get_user_pages page leak, as it leaves the intelligence
in the callers instead of in __bio_try_merge_page.
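
For reference, here is a minimal sketch of the caller-side pattern the
series moves towards (using the __bio_try_merge_page signature after
patch 3; the error handling shown is illustrative):

	bool same_page = false;

	if (__bio_try_merge_page(bio, page, len, offset, &same_page)) {
		/* the caller, not the merge code, decides what a
		 * same-page merge means for page references */
		if (same_page)
			put_page(page);
	} else {
		if (WARN_ON_ONCE(bio_full(bio)))
			return -EINVAL;
		__bio_add_page(bio, page, len, offset);
	}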

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 1/5] block: fix gap checking in __bio_add_pc_page
  2019-06-11 15:10 alternative take on the same page merging leak fix Christoph Hellwig
@ 2019-06-11 15:10 ` Christoph Hellwig
  2019-06-11 15:10 ` [PATCH 2/5] block: factor out a bio_try_merge_pc_page helper Christoph Hellwig
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 14+ messages in thread
From: Christoph Hellwig @ 2019-06-11 15:10 UTC (permalink / raw)
  To: Jens Axboe, Ming Lei
  Cc: David Gibson, Darrick J. Wong, linux-block, linux-xfs

If we can add more data into an existing segment we do not create a gap
by definition, so move the check for a gap after the attempt to merge
into the segment.
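
For context, the gap check in question boils down to a virt_boundary
test roughly like the following simplified sketch (the helper name is
illustrative, not the exact kernel code):

	/* illustrative sketch of the virt_boundary gap test */
	static inline bool bvec_creates_gap(struct request_queue *q,
			struct bio_vec *bprv, unsigned int offset)
	{
		unsigned long boundary = queue_virt_boundary(q);

		return (offset & boundary) ||
			((bprv->bv_offset + bprv->bv_len) & boundary);
	}

Data merged into the tail of the existing bvec only extends bv_len
contiguously, so it can never introduce such a gap, which is why the
merge attempt can safely be tried first.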

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 block/bio.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index 683cbb40f051..6db39699aab9 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -722,18 +722,18 @@ static int __bio_add_pc_page(struct request_queue *q, struct bio *bio,
 			goto done;
 		}
 
+		if (page_is_mergeable(bvec, page, len, offset, false) &&
+		    can_add_page_to_seg(q, bvec, page, len, offset)) {
+			bvec->bv_len += len;
+			goto done;
+		}
+
 		/*
 		 * If the queue doesn't support SG gaps and adding this
 		 * offset would create a gap, disallow it.
 		 */
 		if (bvec_gap_to_prev(q, bvec, offset))
 			return 0;
-
-		if (page_is_mergeable(bvec, page, len, offset, false) &&
-		    can_add_page_to_seg(q, bvec, page, len, offset)) {
-			bvec->bv_len += len;
-			goto done;
-		}
 	}
 
 	if (bio_full(bio))
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 2/5] block: factor out a bio_try_merge_pc_page helper
  2019-06-11 15:10 alternative take on the same page merging leak fix Christoph Hellwig
  2019-06-11 15:10 ` [PATCH 1/5] block: fix gap checking in __bio_add_pc_page Christoph Hellwig
@ 2019-06-11 15:10 ` Christoph Hellwig
  2019-06-11 15:10 ` [PATCH 3/5] block: return from __bio_try_merge_page if merging occurred in the same page Christoph Hellwig
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 14+ messages in thread
From: Christoph Hellwig @ 2019-06-11 15:10 UTC (permalink / raw)
  To: Jens Axboe, Ming Lei
  Cc: David Gibson, Darrick J. Wong, linux-block, linux-xfs

Factor the code that tries to add to an existing segment in
__bio_add_pc_page into a new helper that mirrors the regular bio case.
Subsume the existing can_add_page_to_seg helper into this new one.
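
With the helper in place, the call site in __bio_add_pc_page reduces to
the following shape (taken from the hunk below):

	if (bio_try_merge_pc_page(q, bio, page, len, offset, &same_page)) {
		if (put_same_page && same_page)
			put_page(page);
		goto done;
	}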

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 block/bio.c | 44 ++++++++++++++++++++------------------------
 1 file changed, 20 insertions(+), 24 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index 6db39699aab9..85e243ea6a0e 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -659,24 +659,27 @@ static inline bool page_is_mergeable(const struct bio_vec *bv,
 	return true;
 }
 
-/*
- * Check if the @page can be added to the current segment(@bv), and make
- * sure to call it only if page_is_mergeable(@bv, @page) is true
- */
-static bool can_add_page_to_seg(struct request_queue *q,
-		struct bio_vec *bv, struct page *page, unsigned len,
-		unsigned offset)
+static bool bio_try_merge_pc_page(struct request_queue *q, struct bio *bio,
+		struct page *page, unsigned len, unsigned off, bool *same_page)
 {
+	struct bio_vec *bv = &bio->bi_io_vec[bio->bi_vcnt - 1];
 	unsigned long mask = queue_segment_boundary(q);
 	phys_addr_t addr1 = page_to_phys(bv->bv_page) + bv->bv_offset;
-	phys_addr_t addr2 = page_to_phys(page) + offset + len - 1;
+	phys_addr_t addr2 = page_to_phys(page) + off + len - 1;
+
+	if (page == bv->bv_page && off == bv->bv_offset + bv->bv_len) {
+		*same_page = true;
+		goto done;
+	}
 
 	if ((addr1 | mask) != (addr2 | mask))
 		return false;
-
 	if (bv->bv_len + len > queue_max_segment_size(q))
 		return false;
-
+	if (!page_is_mergeable(bv, page, len, off, false))
+		return false;
+done:
+	bv->bv_len += len;
 	return true;
 }
 
@@ -701,6 +704,7 @@ static int __bio_add_pc_page(struct request_queue *q, struct bio *bio,
 		bool put_same_page)
 {
 	struct bio_vec *bvec;
+	bool same_page = false;
 
 	/*
 	 * cloned bio must not modify vec list
@@ -712,26 +716,18 @@ static int __bio_add_pc_page(struct request_queue *q, struct bio *bio,
 		return 0;
 
 	if (bio->bi_vcnt > 0) {
-		bvec = &bio->bi_io_vec[bio->bi_vcnt - 1];
-
-		if (page == bvec->bv_page &&
-		    offset == bvec->bv_offset + bvec->bv_len) {
-			if (put_same_page)
+		if (bio_try_merge_pc_page(q, bio, page, len, offset,
+				&same_page)) {
+			if (put_same_page && same_page)
 				put_page(page);
-			bvec->bv_len += len;
-			goto done;
-		}
-
-		if (page_is_mergeable(bvec, page, len, offset, false) &&
-		    can_add_page_to_seg(q, bvec, page, len, offset)) {
-			bvec->bv_len += len;
 			goto done;
 		}
 
 		/*
-		 * If the queue doesn't support SG gaps and adding this
-		 * offset would create a gap, disallow it.
+		 * If the queue doesn't support SG gaps and adding this offset
+		 * would create a gap, disallow it.
 		 */
+		bvec = &bio->bi_io_vec[bio->bi_vcnt - 1];
 		if (bvec_gap_to_prev(q, bvec, offset))
 			return 0;
 	}
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 3/5] block: return from __bio_try_merge_page if merging occurred in the same page
  2019-06-11 15:10 alternative take on the same page merging leak fix Christoph Hellwig
  2019-06-11 15:10 ` [PATCH 1/5] block: fix gap checking in __bio_add_pc_page Christoph Hellwig
  2019-06-11 15:10 ` [PATCH 2/5] block: factor out a bio_try_merge_pc_page helper Christoph Hellwig
@ 2019-06-11 15:10 ` Christoph Hellwig
  2019-06-12 10:14   ` Ming Lei
  2019-06-11 15:10 ` [PATCH 4/5] block: fix page leak when merging to " Christoph Hellwig
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 14+ messages in thread
From: Christoph Hellwig @ 2019-06-11 15:10 UTC (permalink / raw)
  To: Jens Axboe, Ming Lei
  Cc: David Gibson, Darrick J. Wong, linux-block, linux-xfs

We currently have an input same_page parameter to __bio_try_merge_page
to prohibit merging in the same page.  The rationale for that is that
some callers need to account for every page added to a bio.  Instead of
letting these callers call twice into the merge code to account for the
new vs existing page cases, just turn the parameter into an output one that
reports whether a merge in the same page occurred and let them act accordingly.
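
A minimal sketch of the resulting calling convention, distilled from the
iomap hunk below (the iop read_count accounting is specific to that
caller):

	bool same_page = false;

	if (__bio_try_merge_page(ctx->bio, page, plen, poff, &same_page)) {
		/* only count pages that are new to the bio */
		if (!same_page && iop)
			atomic_inc(&iop->read_count);
		goto done;
	}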

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 block/bio.c         | 23 +++++++++--------------
 fs/iomap.c          | 12 ++++++++----
 fs/xfs/xfs_aops.c   | 11 ++++++++---
 include/linux/bio.h |  2 +-
 4 files changed, 26 insertions(+), 22 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index 85e243ea6a0e..c34327aa9216 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -636,7 +636,7 @@ EXPORT_SYMBOL(bio_clone_fast);
 
 static inline bool page_is_mergeable(const struct bio_vec *bv,
 		struct page *page, unsigned int len, unsigned int off,
-		bool same_page)
+		bool *same_page)
 {
 	phys_addr_t vec_end_addr = page_to_phys(bv->bv_page) +
 		bv->bv_offset + bv->bv_len - 1;
@@ -647,15 +647,9 @@ static inline bool page_is_mergeable(const struct bio_vec *bv,
 	if (xen_domain() && !xen_biovec_phys_mergeable(bv, page))
 		return false;
 
-	if ((vec_end_addr & PAGE_MASK) != page_addr) {
-		if (same_page)
-			return false;
-		if (pfn_to_page(PFN_DOWN(vec_end_addr)) + 1 != page)
-			return false;
-	}
-
-	WARN_ON_ONCE(same_page && (len + off) > PAGE_SIZE);
-
+	*same_page = ((vec_end_addr & PAGE_MASK) == page_addr);
+	if (!*same_page && pfn_to_page(PFN_DOWN(vec_end_addr)) + 1 != page)
+		return false;
 	return true;
 }
 
@@ -763,8 +757,7 @@ EXPORT_SYMBOL(bio_add_pc_page);
  * @page: start page to add
  * @len: length of the data to add
  * @off: offset of the data relative to @page
- * @same_page: if %true only merge if the new data is in the same physical
- *		page as the last segment of the bio.
+ * @same_page: return if the segment has been merged inside the same page
  *
  * Try to add the data at @page + @off to the last bvec of @bio.  This is a
  * a useful optimisation for file systems with a block size smaller than the
@@ -775,7 +768,7 @@ EXPORT_SYMBOL(bio_add_pc_page);
  * Return %true on success or %false on failure.
  */
 bool __bio_try_merge_page(struct bio *bio, struct page *page,
-		unsigned int len, unsigned int off, bool same_page)
+		unsigned int len, unsigned int off, bool *same_page)
 {
 	if (WARN_ON_ONCE(bio_flagged(bio, BIO_CLONED)))
 		return false;
@@ -833,7 +826,9 @@ EXPORT_SYMBOL_GPL(__bio_add_page);
 int bio_add_page(struct bio *bio, struct page *page,
 		 unsigned int len, unsigned int offset)
 {
-	if (!__bio_try_merge_page(bio, page, len, offset, false)) {
+	bool same_page = false;
+
+	if (!__bio_try_merge_page(bio, page, len, offset, &same_page)) {
 		if (bio_full(bio))
 			return 0;
 		__bio_add_page(bio, page, len, offset);
diff --git a/fs/iomap.c b/fs/iomap.c
index 23ef63fd1669..12654c2e78f8 100644
--- a/fs/iomap.c
+++ b/fs/iomap.c
@@ -287,7 +287,7 @@ iomap_readpage_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 	struct iomap_readpage_ctx *ctx = data;
 	struct page *page = ctx->cur_page;
 	struct iomap_page *iop = iomap_page_create(inode, page);
-	bool is_contig = false;
+	bool same_page = false, is_contig = false;
 	loff_t orig_pos = pos;
 	unsigned poff, plen;
 	sector_t sector;
@@ -315,10 +315,14 @@ iomap_readpage_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 	 * Try to merge into a previous segment if we can.
 	 */
 	sector = iomap_sector(iomap, pos);
-	if (ctx->bio && bio_end_sector(ctx->bio) == sector) {
-		if (__bio_try_merge_page(ctx->bio, page, plen, poff, true))
-			goto done;
+	if (ctx->bio && bio_end_sector(ctx->bio) == sector)
 		is_contig = true;
+
+	if (is_contig &&
+	    __bio_try_merge_page(ctx->bio, page, plen, poff, &same_page)) {
+		if (!same_page && iop)
+			atomic_inc(&iop->read_count);
+		goto done;
 	}
 
 	/*
diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index a6f0f4761a37..8da5e6637771 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -758,6 +758,7 @@ xfs_add_to_ioend(
 	struct block_device	*bdev = xfs_find_bdev_for_inode(inode);
 	unsigned		len = i_blocksize(inode);
 	unsigned		poff = offset & (PAGE_SIZE - 1);
+	bool			merged, same_page = false;
 	sector_t		sector;
 
 	sector = xfs_fsb_to_db(ip, wpc->imap.br_startblock) +
@@ -774,9 +775,13 @@ xfs_add_to_ioend(
 				wpc->imap.br_state, offset, bdev, sector);
 	}
 
-	if (!__bio_try_merge_page(wpc->ioend->io_bio, page, len, poff, true)) {
-		if (iop)
-			atomic_inc(&iop->write_count);
+	merged = __bio_try_merge_page(wpc->ioend->io_bio, page, len, poff,
+			&same_page);
+
+	if (iop && !same_page)
+		atomic_inc(&iop->write_count);
+
+	if (!merged) {
 		if (bio_full(wpc->ioend->io_bio))
 			xfs_chain_bio(wpc->ioend, wbc, bdev, sector);
 		bio_add_page(wpc->ioend->io_bio, page, len, poff);
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 0f23b5682640..f87abaa898f0 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -423,7 +423,7 @@ extern int bio_add_page(struct bio *, struct page *, unsigned int,unsigned int);
 extern int bio_add_pc_page(struct request_queue *, struct bio *, struct page *,
 			   unsigned int, unsigned int);
 bool __bio_try_merge_page(struct bio *bio, struct page *page,
-		unsigned int len, unsigned int off, bool same_page);
+		unsigned int len, unsigned int off, bool *same_page);
 void __bio_add_page(struct bio *bio, struct page *page,
 		unsigned int len, unsigned int off);
 int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter);
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 4/5] block: fix page leak when merging to same page
  2019-06-11 15:10 alternative take on the same page merging leak fix Christoph Hellwig
                   ` (2 preceding siblings ...)
  2019-06-11 15:10 ` [PATCH 3/5] block: return from __bio_try_merge_page if merging occurred in the same page Christoph Hellwig
@ 2019-06-11 15:10 ` Christoph Hellwig
  2019-06-12 10:18   ` Ming Lei
  2019-06-11 15:10 ` [PATCH 5/5] block: use __bio_try_merge_page in bio_try_merge_pc_page Christoph Hellwig
  2019-06-12  1:09 ` alternative take on the same page merging leak fix Ming Lei
  5 siblings, 1 reply; 14+ messages in thread
From: Christoph Hellwig @ 2019-06-11 15:10 UTC (permalink / raw)
  To: Jens Axboe, Ming Lei
  Cc: David Gibson, Darrick J. Wong, linux-block, linux-xfs

When multiple iovecs reference the same page, each get_user_pages call
will add a reference to the page.  But once we've created the bio, that
information gets lost and only a single reference will be dropped after
I/O completion.  Use the same_page information returned from
__bio_try_merge_page to drop additional references to pages that were
already present in the bio.
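
To illustrate the scenario, here is a hedged userspace sketch of a
request that takes this path: an O_DIRECT preadv() whose two iovecs land
in the same page.  The device path, buffer size and alignment below are
illustrative assumptions, not taken from the original report:

	#define _GNU_SOURCE
	#include <fcntl.h>
	#include <stdlib.h>
	#include <sys/uio.h>
	#include <unistd.h>

	int main(void)
	{
		void *mem;
		char *buf;
		int fd = open("/dev/sda", O_RDONLY | O_DIRECT); /* assumed device */

		if (fd < 0 || posix_memalign(&mem, 4096, 4096))
			return 1;
		buf = mem;

		/* two iovecs covering the two halves of the same page */
		struct iovec iov[2] = {
			{ .iov_base = buf,        .iov_len = 2048 },
			{ .iov_base = buf + 2048, .iov_len = 2048 },
		};

		/*
		 * Each iovec pins the page once, but after the second range
		 * is merged into the existing bvec only one reference is
		 * dropped at bio completion, leaking a page reference.
		 */
		preadv(fd, iov, 2, 0);
		close(fd);
		return 0;
	}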

Based on a patch from Ming Lei.

Link: https://lkml.org/lkml/2019/4/23/64
Fixes: 576ed913 ("block: use bio_add_page in bio_iov_iter_get_pages")
Reported-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 block/bio.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index c34327aa9216..0d841ba4373a 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -891,6 +891,7 @@ static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
 	unsigned short entries_left = bio->bi_max_vecs - bio->bi_vcnt;
 	struct bio_vec *bv = bio->bi_io_vec + bio->bi_vcnt;
 	struct page **pages = (struct page **)bv;
+	bool same_page = false;
 	ssize_t size, left;
 	unsigned len, i;
 	size_t offset;
@@ -911,8 +912,15 @@ static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
 		struct page *page = pages[i];
 
 		len = min_t(size_t, PAGE_SIZE - offset, left);
-		if (WARN_ON_ONCE(bio_add_page(bio, page, len, offset) != len))
-			return -EINVAL;
+
+		if (__bio_try_merge_page(bio, page, len, offset, &same_page)) {
+			if (same_page)
+				put_page(page);
+		} else {
+			if (WARN_ON_ONCE(bio_full(bio)))
+			return -EINVAL;
+			__bio_add_page(bio, page, len, offset);
+		}
 		offset = 0;
 	}
 
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 5/5] block: use __bio_try_merge_page in bio_try_merge_pc_page
  2019-06-11 15:10 alternative take on the same page merging leak fix Christoph Hellwig
                   ` (3 preceding siblings ...)
  2019-06-11 15:10 ` [PATCH 4/5] block: fix page leak when merging to " Christoph Hellwig
@ 2019-06-11 15:10 ` Christoph Hellwig
  2019-06-12  1:09 ` alternative take on the same page merging leak fix Ming Lei
  5 siblings, 0 replies; 14+ messages in thread
From: Christoph Hellwig @ 2019-06-11 15:10 UTC (permalink / raw)
  To: Jens Axboe, Ming Lei
  Cc: David Gibson, Darrick J. Wong, linux-block, linux-xfs

Passthrough bio handling should be the same as normal bio handling,
except that we need to take hardware limitations into account.  Thus
use the common try_merge implementation after checking the hardware
limits.
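
For orientation, the hardware checks that remain specific to the
passthrough helper before it falls through to the common code are (from
the hunk below):

	if ((addr1 | mask) != (addr2 | mask))
		return false;	/* would cross a segment boundary */
	if (bv->bv_len + len > queue_max_segment_size(q))
		return false;	/* would exceed the max segment size */
	return __bio_try_merge_page(bio, page, len, off, same_page);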

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 block/bio.c | 13 ++-----------
 1 file changed, 2 insertions(+), 11 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index 0d841ba4373a..7db7186eab1c 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -661,20 +661,11 @@ static bool bio_try_merge_pc_page(struct request_queue *q, struct bio *bio,
 	phys_addr_t addr1 = page_to_phys(bv->bv_page) + bv->bv_offset;
 	phys_addr_t addr2 = page_to_phys(page) + off + len - 1;
 
-	if (page == bv->bv_page && off == bv->bv_offset + bv->bv_len) {
-		*same_page = true;
-		goto done;
-	}
-
 	if ((addr1 | mask) != (addr2 | mask))
 		return false;
 	if (bv->bv_len + len > queue_max_segment_size(q))
 		return false;
-	if (!page_is_mergeable(bv, page, len, off, false))
-		return false;
-done:
-	bv->bv_len += len;
-	return true;
+	return __bio_try_merge_page(bio, page, len, off, same_page);
 }
 
 /**
@@ -737,8 +728,8 @@ static int __bio_add_pc_page(struct request_queue *q, struct bio *bio,
 	bvec->bv_len = len;
 	bvec->bv_offset = offset;
 	bio->bi_vcnt++;
- done:
 	bio->bi_iter.bi_size += len;
+ done:
 	bio->bi_phys_segments = bio->bi_vcnt;
 	bio_set_flag(bio, BIO_SEG_VALID);
 	return len;
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: alternative take on the same page merging leak fix
  2019-06-11 15:10 alternative take on the same page merging leak fix Christoph Hellwig
                   ` (4 preceding siblings ...)
  2019-06-11 15:10 ` [PATCH 5/5] block: use __bio_try_merge_page in bio_try_merge_pc_page Christoph Hellwig
@ 2019-06-12  1:09 ` Ming Lei
  2019-06-12  7:45   ` Christoph Hellwig
  5 siblings, 1 reply; 14+ messages in thread
From: Ming Lei @ 2019-06-12  1:09 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, David Gibson, Darrick J. Wong, linux-block, linux-xfs

Hi Christoph,

On Tue, Jun 11, 2019 at 05:10:02PM +0200, Christoph Hellwig wrote:
> Hi Jens, hi Ming,
> 
> this is the tested and split version of what I think is the better
> fix for the get_user_pages page leak, as it leaves the intelligence
> in the callers instead of in __bio_try_merge_page.

I am fine with either one from an upstream development point of view,
so let Jens decide.

However:

We have to backport the fixes to the -stable tree, and downstream needs
to ship the fix too.

The issue is quite serious because the leak is in the I/O path, and on
some workloads the whole system's RAM can easily be used up.  So I think
the fix should go into 5.2; however, pulling the cleanup & refactoring
in now might increase the regression risk.

I would really appreciate it if you could cook a fix-only patch for this
issue, especially since the change in the add pc page code isn't
necessary for fixing it.



Thanks,
Ming

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: alternative take on the same page merging leak fix
  2019-06-12  1:09 ` alternative take on the same page merging leak fix Ming Lei
@ 2019-06-12  7:45   ` Christoph Hellwig
  2019-06-12 10:11     ` Ming Lei
  0 siblings, 1 reply; 14+ messages in thread
From: Christoph Hellwig @ 2019-06-12  7:45 UTC (permalink / raw)
  To: Ming Lei
  Cc: Christoph Hellwig, Jens Axboe, David Gibson, Darrick J. Wong,
	linux-block, linux-xfs

On Wed, Jun 12, 2019 at 09:09:23AM +0800, Ming Lei wrote:
> We have to backport the fixes to the -stable tree, and downstream needs
> to ship the fix too.
> 
> The issue is quite serious because the leak is in the I/O path, and on
> some workloads the whole system's RAM can easily be used up.  So I think
> the fix should go into 5.2; however, pulling the cleanup & refactoring
> in now might increase the regression risk.
> 
> I would really appreciate it if you could cook a fix-only patch for this
> issue, especially since the change in the add pc page code isn't
> necessary for fixing it.

Patches 3 and 4 have no dependencies on 1 and 2, and should have
arguably been ordered first in the series.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: alternative take on the same page merging leak fix
  2019-06-12  7:45   ` Christoph Hellwig
@ 2019-06-12 10:11     ` Ming Lei
  2019-06-13  9:02       ` Jens Axboe
  0 siblings, 1 reply; 14+ messages in thread
From: Ming Lei @ 2019-06-12 10:11 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, David Gibson, Darrick J. Wong, linux-block, linux-xfs

On Wed, Jun 12, 2019 at 09:45:27AM +0200, Christoph Hellwig wrote:
> On Wed, Jun 12, 2019 at 09:09:23AM +0800, Ming Lei wrote:
> > We have to backport the fixes to the -stable tree, and downstream needs
> > to ship the fix too.
> > 
> > The issue is quite serious because the leak is in the I/O path, and on
> > some workloads the whole system's RAM can easily be used up.  So I think
> > the fix should go into 5.2; however, pulling the cleanup & refactoring
> > in now might increase the regression risk.
> > 
> > I would really appreciate it if you could cook a fix-only patch for this
> > issue, especially since the change in the add pc page code isn't
> > necessary for fixing it.
> 
> Patches 3 and 4 have no dependencies on 1 and 2, and should have
> arguably been ordered first in the series.

OK, it is fine to get patches 3 & 4 into 5.2.  I will give them a
review soon.

Thanks,
Ming

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 3/5] block: return from __bio_try_merge_page if merging occurred in the same page
  2019-06-11 15:10 ` [PATCH 3/5] block: return from __bio_try_merge_page if merging occurred in the same page Christoph Hellwig
@ 2019-06-12 10:14   ` Ming Lei
  0 siblings, 0 replies; 14+ messages in thread
From: Ming Lei @ 2019-06-12 10:14 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, David Gibson, Darrick J. Wong, linux-block, linux-xfs

On Tue, Jun 11, 2019 at 05:10:05PM +0200, Christoph Hellwig wrote:
> We currently have an input same_page parameter to __bio_try_merge_page
> to prohibit merging in the same page.  The rationale for that is that
> some callers need to account for every page added to a bio.  Instead of
> letting these callers call twice into the merge code to account for the
> new vs existing page cases, just turn the parameter into an output one that
> reports whether a merge in the same page occurred and let them act accordingly.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  block/bio.c         | 23 +++++++++--------------
>  fs/iomap.c          | 12 ++++++++----
>  fs/xfs/xfs_aops.c   | 11 ++++++++---
>  include/linux/bio.h |  2 +-
>  4 files changed, 26 insertions(+), 22 deletions(-)
> 
> diff --git a/block/bio.c b/block/bio.c
> index 85e243ea6a0e..c34327aa9216 100644
> --- a/block/bio.c
> +++ b/block/bio.c
> @@ -636,7 +636,7 @@ EXPORT_SYMBOL(bio_clone_fast);
>  
>  static inline bool page_is_mergeable(const struct bio_vec *bv,
>  		struct page *page, unsigned int len, unsigned int off,
> -		bool same_page)
> +		bool *same_page)
>  {
>  	phys_addr_t vec_end_addr = page_to_phys(bv->bv_page) +
>  		bv->bv_offset + bv->bv_len - 1;
> @@ -647,15 +647,9 @@ static inline bool page_is_mergeable(const struct bio_vec *bv,
>  	if (xen_domain() && !xen_biovec_phys_mergeable(bv, page))
>  		return false;
>  
> -	if ((vec_end_addr & PAGE_MASK) != page_addr) {
> -		if (same_page)
> -			return false;
> -		if (pfn_to_page(PFN_DOWN(vec_end_addr)) + 1 != page)
> -			return false;
> -	}
> -
> -	WARN_ON_ONCE(same_page && (len + off) > PAGE_SIZE);
> -
> +	*same_page = ((vec_end_addr & PAGE_MASK) == page_addr);
> +	if (!*same_page && pfn_to_page(PFN_DOWN(vec_end_addr)) + 1 != page)
> +		return false;
>  	return true;
>  }
>  
> @@ -763,8 +757,7 @@ EXPORT_SYMBOL(bio_add_pc_page);
>   * @page: start page to add
>   * @len: length of the data to add
>   * @off: offset of the data relative to @page
> - * @same_page: if %true only merge if the new data is in the same physical
> - *		page as the last segment of the bio.
> + * @same_page: return if the segment has been merged inside the same page
>   *
>   * Try to add the data at @page + @off to the last bvec of @bio.  This is a
>   * a useful optimisation for file systems with a block size smaller than the
> @@ -775,7 +768,7 @@ EXPORT_SYMBOL(bio_add_pc_page);
>   * Return %true on success or %false on failure.
>   */
>  bool __bio_try_merge_page(struct bio *bio, struct page *page,
> -		unsigned int len, unsigned int off, bool same_page)
> +		unsigned int len, unsigned int off, bool *same_page)
>  {
>  	if (WARN_ON_ONCE(bio_flagged(bio, BIO_CLONED)))
>  		return false;
> @@ -833,7 +826,9 @@ EXPORT_SYMBOL_GPL(__bio_add_page);
>  int bio_add_page(struct bio *bio, struct page *page,
>  		 unsigned int len, unsigned int offset)
>  {
> -	if (!__bio_try_merge_page(bio, page, len, offset, false)) {
> +	bool same_page = false;
> +
> +	if (!__bio_try_merge_page(bio, page, len, offset, &same_page)) {
>  		if (bio_full(bio))
>  			return 0;
>  		__bio_add_page(bio, page, len, offset);
> diff --git a/fs/iomap.c b/fs/iomap.c
> index 23ef63fd1669..12654c2e78f8 100644
> --- a/fs/iomap.c
> +++ b/fs/iomap.c
> @@ -287,7 +287,7 @@ iomap_readpage_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
>  	struct iomap_readpage_ctx *ctx = data;
>  	struct page *page = ctx->cur_page;
>  	struct iomap_page *iop = iomap_page_create(inode, page);
> -	bool is_contig = false;
> +	bool same_page = false, is_contig = false;
>  	loff_t orig_pos = pos;
>  	unsigned poff, plen;
>  	sector_t sector;
> @@ -315,10 +315,14 @@ iomap_readpage_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
>  	 * Try to merge into a previous segment if we can.
>  	 */
>  	sector = iomap_sector(iomap, pos);
> -	if (ctx->bio && bio_end_sector(ctx->bio) == sector) {
> -		if (__bio_try_merge_page(ctx->bio, page, plen, poff, true))
> -			goto done;
> +	if (ctx->bio && bio_end_sector(ctx->bio) == sector)
>  		is_contig = true;
> +
> +	if (is_contig &&
> +	    __bio_try_merge_page(ctx->bio, page, plen, poff, &same_page)) {
> +		if (!same_page && iop)
> +			atomic_inc(&iop->read_count);
> +		goto done;
>  	}
>  
>  	/*
> diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> index a6f0f4761a37..8da5e6637771 100644
> --- a/fs/xfs/xfs_aops.c
> +++ b/fs/xfs/xfs_aops.c
> @@ -758,6 +758,7 @@ xfs_add_to_ioend(
>  	struct block_device	*bdev = xfs_find_bdev_for_inode(inode);
>  	unsigned		len = i_blocksize(inode);
>  	unsigned		poff = offset & (PAGE_SIZE - 1);
> +	bool			merged, same_page = false;
>  	sector_t		sector;
>  
>  	sector = xfs_fsb_to_db(ip, wpc->imap.br_startblock) +
> @@ -774,9 +775,13 @@ xfs_add_to_ioend(
>  				wpc->imap.br_state, offset, bdev, sector);
>  	}
>  
> -	if (!__bio_try_merge_page(wpc->ioend->io_bio, page, len, poff, true)) {
> -		if (iop)
> -			atomic_inc(&iop->write_count);
> +	merged = __bio_try_merge_page(wpc->ioend->io_bio, page, len, poff,
> +			&same_page);
> +
> +	if (iop && !same_page)
> +		atomic_inc(&iop->write_count);
> +
> +	if (!merged) {
>  		if (bio_full(wpc->ioend->io_bio))
>  			xfs_chain_bio(wpc->ioend, wbc, bdev, sector);
>  		bio_add_page(wpc->ioend->io_bio, page, len, poff);
> diff --git a/include/linux/bio.h b/include/linux/bio.h
> index 0f23b5682640..f87abaa898f0 100644
> --- a/include/linux/bio.h
> +++ b/include/linux/bio.h
> @@ -423,7 +423,7 @@ extern int bio_add_page(struct bio *, struct page *, unsigned int,unsigned int);
>  extern int bio_add_pc_page(struct request_queue *, struct bio *, struct page *,
>  			   unsigned int, unsigned int);
>  bool __bio_try_merge_page(struct bio *bio, struct page *page,
> -		unsigned int len, unsigned int off, bool same_page);
> +		unsigned int len, unsigned int off, bool *same_page);
>  void __bio_add_page(struct bio *bio, struct page *page,
>  		unsigned int len, unsigned int off);
>  int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter);
> -- 
> 2.20.1
> 

Looks fine for v5.2:

Reviewed-by: Ming Lei <ming.lei@redhat.com>


Thanks,
Ming

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 4/5] block: fix page leak when merging to same page
  2019-06-11 15:10 ` [PATCH 4/5] block: fix page leak when merging to " Christoph Hellwig
@ 2019-06-12 10:18   ` Ming Lei
  0 siblings, 0 replies; 14+ messages in thread
From: Ming Lei @ 2019-06-12 10:18 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, David Gibson, Darrick J. Wong, linux-block, linux-xfs

On Tue, Jun 11, 2019 at 05:10:06PM +0200, Christoph Hellwig wrote:
> When multiple iovecs reference the same page, each get_user_pages call
> will add a reference to the page.  But once we've created the bio, that
> information gets lost and only a single reference will be dropped after
> I/O completion.  Use the same_page information returned from
> __bio_try_merge_page to drop additional references to pages that were
> already present in the bio.
> 
> Based on a patch from Ming Lei.
> 
> Link: https://lkml.org/lkml/2019/4/23/64
> Fixes: 576ed913 ("block: use bio_add_page in bio_iov_iter_get_pages")
> Reported-by: David Gibson <david@gibson.dropbear.id.au>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  block/bio.c | 12 ++++++++++--
>  1 file changed, 10 insertions(+), 2 deletions(-)
> 
> diff --git a/block/bio.c b/block/bio.c
> index c34327aa9216..0d841ba4373a 100644
> --- a/block/bio.c
> +++ b/block/bio.c
> @@ -891,6 +891,7 @@ static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
>  	unsigned short entries_left = bio->bi_max_vecs - bio->bi_vcnt;
>  	struct bio_vec *bv = bio->bi_io_vec + bio->bi_vcnt;
>  	struct page **pages = (struct page **)bv;
> +	bool same_page = false;
>  	ssize_t size, left;
>  	unsigned len, i;
>  	size_t offset;
> @@ -911,8 +912,15 @@ static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
>  		struct page *page = pages[i];
>  
>  		len = min_t(size_t, PAGE_SIZE - offset, left);
> -		if (WARN_ON_ONCE(bio_add_page(bio, page, len, offset) != len))
> -			return -EINVAL;
> +
> +		if (__bio_try_merge_page(bio, page, len, offset, &same_page)) {
> +			if (same_page)
> +				put_page(page);
> +		} else {
> +			if (WARN_ON_ONCE(bio_full(bio)))
> +			return -EINVAL;
> +			__bio_add_page(bio, page, len, offset);
> +		}
>  		offset = 0;
>  	}

Looks fine for v5.2:

Reviewed-by: Ming Lei <ming.lei@redhat.com>


Thanks,
Ming

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: alternative take on the same page merging leak fix
  2019-06-12 10:11     ` Ming Lei
@ 2019-06-13  9:02       ` Jens Axboe
  2019-06-13  9:02         ` Christoph Hellwig
  0 siblings, 1 reply; 14+ messages in thread
From: Jens Axboe @ 2019-06-13  9:02 UTC (permalink / raw)
  To: Ming Lei, Christoph Hellwig
  Cc: David Gibson, Darrick J. Wong, linux-block, linux-xfs

On 6/12/19 4:11 AM, Ming Lei wrote:
> On Wed, Jun 12, 2019 at 09:45:27AM +0200, Christoph Hellwig wrote:
>> On Wed, Jun 12, 2019 at 09:09:23AM +0800, Ming Lei wrote:
>>> We have to backport the fixes to the -stable tree, and downstream needs
>>> to ship the fix too.
>>>
>>> The issue is quite serious because the leak is in the I/O path, and on
>>> some workloads the whole system's RAM can easily be used up.  So I think
>>> the fix should go into 5.2; however, pulling the cleanup & refactoring
>>> in now might increase the regression risk.
>>>
>>> I would really appreciate it if you could cook a fix-only patch for this
>>> issue, especially since the change in the add pc page code isn't
>>> necessary for fixing it.
>>
>> Patches 3 and 4 have no dependencies on 1 and 2, and should have
>> arguably been ordered first in the series.
> 
> OK, it is fine to get patches 3 & 4 into 5.2.  I will give them a
> review soon.

I'll echo Ming's sentiments here, for the series.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: alternative take on the same page merging leak fix
  2019-06-13  9:02       ` Jens Axboe
@ 2019-06-13  9:02         ` Christoph Hellwig
  2019-06-13  9:11           ` Jens Axboe
  0 siblings, 1 reply; 14+ messages in thread
From: Christoph Hellwig @ 2019-06-13  9:02 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Ming Lei, Christoph Hellwig, David Gibson, Darrick J. Wong,
	linux-block, linux-xfs

On Thu, Jun 13, 2019 at 03:02:10AM -0600, Jens Axboe wrote:
> >> Patches 3 and 4 have no dependencies on 1 and 2, and should have
> >> arguably been ordered first in the series.
> > 
> > OK, it is fine to get patches 3 & 4 into 5.2.  I will give them a
> > review soon.
> 
> I'll echo Ming's sentiments here, for the series.

And what does that actually mean?  Do you want me to just resend
3 and 4, or can you just pick them up?

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: alternative take on the same page merging leak fix
  2019-06-13  9:02         ` Christoph Hellwig
@ 2019-06-13  9:11           ` Jens Axboe
  0 siblings, 0 replies; 14+ messages in thread
From: Jens Axboe @ 2019-06-13  9:11 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Ming Lei, David Gibson, Darrick J. Wong, linux-block, linux-xfs

On 6/13/19 3:02 AM, Christoph Hellwig wrote:
> On Thu, Jun 13, 2019 at 03:02:10AM -0600, Jens Axboe wrote:
>>>> Patches 3 and 4 have no dependencies on 1 and 2, and should have
>>>> arguably been ordered first in the series.
>>>
>>> OK, it is fine to get patches 3 & 4 into 5.2.  I will give them a
>>> review soon.
>>
>> I'll echo Ming's sentiments here, for the series.
> 
> And what does that actually mean?  Do you want me to just resend
> 3 and 4, or can you just pick them up?

Please resend them with the acks/reviews, I'll pick them up asap.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 14+ messages in thread
