* [stable-4.14 0/2] block layer fixes for silent data corruption
@ 2019-06-25 14:17 Jack Wang
  2019-06-25 14:17 ` [stable-4.14 1/2] block: add a lower-level bio_add_page interface Jack Wang
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Jack Wang @ 2019-06-25 14:17 UTC
  To: gregkh, sashal, stable

A silent data corruption was introduced in v4.10-rc1 with commit
72ecad22d9f198aafee64218512e02ffa7818671 and was fixed in v4.18-rc7
with commit 17d51b10d7773e4618bcac64648f30f12d4078fb. It affects
users of O_DIRECT, in our case a KVM virtual machine with drives
which use qemu's "cache=none" option.

The other 2 commits have been accepted in 4.14, but these 2 are missing,
ref: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1796542

Please consider including them in the next release.

Thanks,

Jack Wang @ 1 & 1 IONOS Cloud GmbH

Christoph Hellwig (1):
  block: add a lower-level bio_add_page interface

Martin Wilck (1):
  block: bio_iov_iter_get_pages: pin more pages for multi-segment IOs

 block/bio.c         | 131 ++++++++++++++++++++++++++++++++------------
 include/linux/bio.h |   9 +++
 2 files changed, 104 insertions(+), 36 deletions(-)

-- 
2.17.1



* [stable-4.14 1/2] block: add a lower-level bio_add_page interface
  2019-06-25 14:17 [stable-4.14 0/2] block layer fixes for silent data corruption Jack Wang
@ 2019-06-25 14:17 ` Jack Wang
  2019-06-25 14:24   ` Christoph Hellwig
  2019-06-25 14:17 ` [stable-4.14 2/2] block: bio_iov_iter_get_pages: pin more pages for multi-segment IOs Jack Wang
  2019-06-25 19:50 ` [stable-4.14 0/2] block layer fixes for silent data corruption Sasha Levin
  2 siblings, 1 reply; 9+ messages in thread
From: Jack Wang @ 2019-06-25 14:17 UTC
  To: gregkh, sashal, stable; +Cc: Christoph Hellwig, Darrick J . Wong, Jack Wang

From: Christoph Hellwig <hch@lst.de>

commit 0aa69fd32a5f766e997ca8ab4723c5a1146efa8b upstream

For the upcoming removal of buffer heads in XFS we need to keep track of
the number of outstanding writeback requests per page.  For this we need
to know if bio_add_page merged a region with the previous bvec or not.
Instead of adding additional arguments this refactors bio_add_page to
be implemented using three lower level helpers which users like XFS can
use directly if they care about the merge decisions.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
[jwang: cherry pick to 4.14, required for next patch to build]
Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com>
---
 block/bio.c         | 96 +++++++++++++++++++++++++++++----------------
 include/linux/bio.h |  9 +++++
 2 files changed, 72 insertions(+), 33 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index d01ab919b313..c1386ce2c014 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -773,7 +773,7 @@ int bio_add_pc_page(struct request_queue *q, struct bio *bio, struct page
 			return 0;
 	}
 
-	if (bio->bi_vcnt >= bio->bi_max_vecs)
+	if (bio_full(bio))
 		return 0;
 
 	/*
@@ -821,52 +821,82 @@ int bio_add_pc_page(struct request_queue *q, struct bio *bio, struct page
 EXPORT_SYMBOL(bio_add_pc_page);
 
 /**
- *	bio_add_page	-	attempt to add page to bio
- *	@bio: destination bio
- *	@page: page to add
- *	@len: vec entry length
- *	@offset: vec entry offset
+ * __bio_try_merge_page - try appending data to an existing bvec.
+ * @bio: destination bio
+ * @page: page to add
+ * @len: length of the data to add
+ * @off: offset of the data in @page
  *
- *	Attempt to add a page to the bio_vec maplist. This will only fail
- *	if either bio->bi_vcnt == bio->bi_max_vecs or it's a cloned bio.
+ * Try to add the data at @page + @off to the last bvec of @bio.  This is a
+ * a useful optimisation for file systems with a block size smaller than the
+ * page size.
+ *
+ * Return %true on success or %false on failure.
  */
-int bio_add_page(struct bio *bio, struct page *page,
-		 unsigned int len, unsigned int offset)
+bool __bio_try_merge_page(struct bio *bio, struct page *page,
+		unsigned int len, unsigned int off)
 {
-	struct bio_vec *bv;
-
-	/*
-	 * cloned bio must not modify vec list
-	 */
 	if (WARN_ON_ONCE(bio_flagged(bio, BIO_CLONED)))
-		return 0;
+		return false;
 
-	/*
-	 * For filesystems with a blocksize smaller than the pagesize
-	 * we will often be called with the same page as last time and
-	 * a consecutive offset.  Optimize this special case.
-	 */
 	if (bio->bi_vcnt > 0) {
-		bv = &bio->bi_io_vec[bio->bi_vcnt - 1];
+		struct bio_vec *bv = &bio->bi_io_vec[bio->bi_vcnt - 1];
 
-		if (page == bv->bv_page &&
-		    offset == bv->bv_offset + bv->bv_len) {
+		if (page == bv->bv_page && off == bv->bv_offset + bv->bv_len) {
 			bv->bv_len += len;
-			goto done;
+			bio->bi_iter.bi_size += len;
+			return true;
 		}
 	}
+	return false;
+}
+EXPORT_SYMBOL_GPL(__bio_try_merge_page);
 
-	if (bio->bi_vcnt >= bio->bi_max_vecs)
-		return 0;
+/**
+ * __bio_add_page - add page to a bio in a new segment
+ * @bio: destination bio
+ * @page: page to add
+ * @len: length of the data to add
+ * @off: offset of the data in @page
+ *
+ * Add the data at @page + @off to @bio as a new bvec.  The caller must ensure
+ * that @bio has space for another bvec.
+ */
+void __bio_add_page(struct bio *bio, struct page *page,
+		unsigned int len, unsigned int off)
+{
+	struct bio_vec *bv = &bio->bi_io_vec[bio->bi_vcnt];
 
-	bv		= &bio->bi_io_vec[bio->bi_vcnt];
-	bv->bv_page	= page;
-	bv->bv_len	= len;
-	bv->bv_offset	= offset;
+	WARN_ON_ONCE(bio_flagged(bio, BIO_CLONED));
+	WARN_ON_ONCE(bio_full(bio));
+
+	bv->bv_page = page;
+	bv->bv_offset = off;
+	bv->bv_len = len;
 
-	bio->bi_vcnt++;
-done:
 	bio->bi_iter.bi_size += len;
+	bio->bi_vcnt++;
+}
+EXPORT_SYMBOL_GPL(__bio_add_page);
+
+/**
+ *	bio_add_page	-	attempt to add page to bio
+ *	@bio: destination bio
+ *	@page: page to add
+ *	@len: vec entry length
+ *	@offset: vec entry offset
+ *
+ *	Attempt to add a page to the bio_vec maplist. This will only fail
+ *	if either bio->bi_vcnt == bio->bi_max_vecs or it's a cloned bio.
+ */
+int bio_add_page(struct bio *bio, struct page *page,
+		 unsigned int len, unsigned int offset)
+{
+	if (!__bio_try_merge_page(bio, page, len, offset)) {
+		if (bio_full(bio))
+			return 0;
+		__bio_add_page(bio, page, len, offset);
+	}
 	return len;
 }
 EXPORT_SYMBOL(bio_add_page);
diff --git a/include/linux/bio.h b/include/linux/bio.h
index d4b39caf081d..e260f000b9ac 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -123,6 +123,11 @@ static inline void *bio_data(struct bio *bio)
 	return NULL;
 }
 
+static inline bool bio_full(struct bio *bio)
+{
+	return bio->bi_vcnt >= bio->bi_max_vecs;
+}
+
 /*
  * will die
  */
@@ -459,6 +464,10 @@ void bio_chain(struct bio *, struct bio *);
 extern int bio_add_page(struct bio *, struct page *, unsigned int,unsigned int);
 extern int bio_add_pc_page(struct request_queue *, struct bio *, struct page *,
 			   unsigned int, unsigned int);
+bool __bio_try_merge_page(struct bio *bio, struct page *page,
+		unsigned int len, unsigned int off);
+void __bio_add_page(struct bio *bio, struct page *page,
+		unsigned int len, unsigned int off);
 int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter);
 struct rq_map_data;
 extern struct bio *bio_map_user_iov(struct request_queue *,
-- 
2.17.1
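
The split that the patch above describes (try to merge into the last bvec, otherwise append a new one if there is room) can be illustrated outside the kernel. The following is only a userspace sketch: `struct model_bio`, `struct model_bvec`, `MODEL_MAX_VECS` and all `model_*` functions are hypothetical stand-ins invented for this illustration, not kernel APIs, and the kernel's WARN_ON_ONCE() checks are approximated with assert().

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_MAX_VECS 4	/* hypothetical fixed capacity for this sketch */

/* Userspace stand-ins; these are not the kernel's bio_vec and bio. */
struct model_bvec {
	const void *page;
	unsigned int len;
	unsigned int off;
};

struct model_bio {
	struct model_bvec vecs[MODEL_MAX_VECS];
	unsigned short vcnt;		/* segments in use */
	unsigned short max_vecs;	/* usable segments, <= MODEL_MAX_VECS */
	unsigned int size;		/* total bytes queued */
};

/* Mirrors the idea of bio_full(): no free segment slot is left. */
static bool model_bio_full(const struct model_bio *bio)
{
	return bio->vcnt >= bio->max_vecs;
}

/* Extend the last segment if the new data directly follows it. */
static bool model_try_merge_page(struct model_bio *bio, const void *page,
				 unsigned int len, unsigned int off)
{
	if (bio->vcnt > 0) {
		struct model_bvec *bv = &bio->vecs[bio->vcnt - 1];

		if (page == bv->page && off == bv->off + bv->len) {
			bv->len += len;
			bio->size += len;
			return true;
		}
	}
	return false;
}

/* Append a new segment; the caller must have checked for free space. */
static void model_add_page_new(struct model_bio *bio, const void *page,
			       unsigned int len, unsigned int off)
{
	struct model_bvec *bv;

	assert(!model_bio_full(bio));
	bv = &bio->vecs[bio->vcnt];
	bv->page = page;
	bv->off = off;
	bv->len = len;
	bio->size += len;
	bio->vcnt++;
}

/* Wrapper with the old contract: returns len on success, 0 when full. */
static unsigned int model_add_page(struct model_bio *bio, const void *page,
				   unsigned int len, unsigned int off)
{
	if (!model_try_merge_page(bio, page, len, off)) {
		if (model_bio_full(bio))
			return 0;
		model_add_page_new(bio, page, len, off);
	}
	return len;
}
```

In this model, a caller that needs to know whether a merge happened would call model_try_merge_page() and model_add_page_new() itself instead of the wrapper, which is the use case the commit message gives for XFS.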



* [stable-4.14 2/2] block: bio_iov_iter_get_pages: pin more pages for multi-segment IOs
  2019-06-25 14:17 [stable-4.14 0/2] block layer fixes for silent data corruption Jack Wang
  2019-06-25 14:17 ` [stable-4.14 1/2] block: add a lower-level bio_add_page interface Jack Wang
@ 2019-06-25 14:17 ` Jack Wang
  2019-06-25 19:50 ` [stable-4.14 0/2] block layer fixes for silent data corruption Sasha Levin
  2 siblings, 0 replies; 9+ messages in thread
From: Jack Wang @ 2019-06-25 14:17 UTC
  To: gregkh, sashal, stable; +Cc: Martin Wilck, Jens Axboe, Jack Wang

From: Martin Wilck <mwilck@suse.com>

commit 17d51b10d7773e4618bcac64648f30f12d4078fb upstream

bio_iov_iter_get_pages() currently only adds pages for the next non-zero
segment from the iov_iter to the bio. That's suboptimal for callers,
which typically try to pin as many pages as fit into the bio. This patch
converts the current bio_iov_iter_get_pages() into a static helper, and
introduces a new helper that allocates as many pages as

 1) fit into the bio,
 2) are present in the iov_iter,
 3) and can be pinned by MM.

Error is returned only if zero pages could be pinned. Because of 3), a
zero return value doesn't necessarily mean all pages have been pinned.
Callers that have to pin every page in the iov_iter must still call this
function in a loop (this is currently the case).

This change matters most for __blkdev_direct_IO_simple(), which calls
bio_iov_iter_get_pages() only once. If it obtains less pages than
requested, it returns a "short write" or "short read", and
__generic_file_write_iter() falls back to buffered writes, which may
lead to data corruption.

Fixes: 72ecad22d9f1 ("block: support a full bio worth of IO for simplified bdev direct-io")
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin Wilck <mwilck@suse.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
[jwang: cherry-picked to 4.14]
Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com>
---
 block/bio.c | 35 ++++++++++++++++++++++++++++++++---
 1 file changed, 32 insertions(+), 3 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index c1386ce2c014..1384f9790882 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -902,14 +902,16 @@ int bio_add_page(struct bio *bio, struct page *page,
 EXPORT_SYMBOL(bio_add_page);
 
 /**
- * bio_iov_iter_get_pages - pin user or kernel pages and add them to a bio
+ * __bio_iov_iter_get_pages - pin user or kernel pages and add them to a bio
  * @bio: bio to add pages to
  * @iter: iov iterator describing the region to be mapped
  *
- * Pins as many pages from *iter and appends them to @bio's bvec array. The
+ * Pins pages from *iter and appends them to @bio's bvec array. The
  * pages will have to be released using put_page() when done.
+ * For multi-segment *iter, this function only adds pages from the
+ * the next non-empty segment of the iov iterator.
  */
-int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
+static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
 {
 	unsigned short nr_pages = bio->bi_max_vecs - bio->bi_vcnt, idx;
 	struct bio_vec *bv = bio->bi_io_vec + bio->bi_vcnt;
@@ -946,6 +948,33 @@ int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
 	iov_iter_advance(iter, size);
 	return 0;
 }
+
+/**
+ * bio_iov_iter_get_pages - pin user or kernel pages and add them to a bio
+ * @bio: bio to add pages to
+ * @iter: iov iterator describing the region to be mapped
+ *
+ * Pins pages from *iter and appends them to @bio's bvec array. The
+ * pages will have to be released using put_page() when done.
+ * The function tries, but does not guarantee, to pin as many pages as
+ * fit into the bio, or are requested in *iter, whatever is smaller.
+ * If MM encounters an error pinning the requested pages, it stops.
+ * Error is returned only if 0 pages could be pinned.
+ */
+int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
+{
+	unsigned short orig_vcnt = bio->bi_vcnt;
+
+	do {
+		int ret = __bio_iov_iter_get_pages(bio, iter);
+
+		if (unlikely(ret))
+			return bio->bi_vcnt > orig_vcnt ? 0 : ret;
+
+	} while (iov_iter_count(iter) && !bio_full(bio));
+
+	return 0;
+}
 EXPORT_SYMBOL_GPL(bio_iov_iter_get_pages);
 
 struct submit_bio_ret {
-- 
2.17.1
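
The return-value rule in the patch above (keep pinning segment by segment until the iterator is empty or the bio is full, and report an error only if nothing at all was pinned) can be modelled in userspace. This is a sketch under stated assumptions: `struct model_seg`, `struct model_iter`, `struct model_bio`, the `faulty` flag and every `model_*` function are hypothetical and exist only for this illustration; pages are counted rather than actually pinned.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical segment: a page count plus a flag simulating a pin failure. */
struct model_seg {
	unsigned int pages;
	int faulty;
};

struct model_iter {
	struct model_seg *segs;
	size_t nr_segs;
	size_t cur;		/* index of the current segment */
};

struct model_bio {
	unsigned short vcnt;
	unsigned short max_vecs;
};

static int model_bio_full(const struct model_bio *bio)
{
	return bio->vcnt >= bio->max_vecs;
}

/* Pages still requested by the iterator. */
static size_t model_iter_count(const struct model_iter *it)
{
	size_t total = 0;

	for (size_t i = it->cur; i < it->nr_segs; i++)
		total += it->segs[i].pages;
	return total;
}

/* Models the static helper: handles only the next non-empty segment. */
static int model_get_pages_one_segment(struct model_bio *bio,
				       struct model_iter *it)
{
	struct model_seg *seg;
	unsigned int space, n;

	while (it->cur < it->nr_segs && it->segs[it->cur].pages == 0)
		it->cur++;
	if (it->cur == it->nr_segs)
		return -EFAULT;

	seg = &it->segs[it->cur];
	if (seg->faulty)
		return -EFAULT;

	space = bio->max_vecs - bio->vcnt;
	n = seg->pages < space ? seg->pages : space;
	if (n == 0)
		return -EFAULT;

	bio->vcnt += n;
	seg->pages -= n;
	return 0;
}

/* Models the new wrapper: an error is returned only if 0 pages were added. */
static int model_get_pages(struct model_bio *bio, struct model_iter *it)
{
	unsigned short orig_vcnt = bio->vcnt;

	do {
		int ret = model_get_pages_one_segment(bio, it);

		if (ret)
			return bio->vcnt > orig_vcnt ? 0 : ret;
	} while (model_iter_count(it) && !model_bio_full(bio));

	return 0;
}
```

With segments of 1, 0 and 2 pages, a single call to model_get_pages() consumes all three pages, whereas one call to the per-segment helper would stop after the first page. A zero return with model_iter_count() still non-zero corresponds to the partial case from the commit message, where a caller that needs every page has to call again.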



* Re: [stable-4.14 1/2] block: add a lower-level bio_add_page interface
  2019-06-25 14:17 ` [stable-4.14 1/2] block: add a lower-level bio_add_page interface Jack Wang
@ 2019-06-25 14:24   ` Christoph Hellwig
  2019-06-25 14:27     ` Jinpu Wang
  0 siblings, 1 reply; 9+ messages in thread
From: Christoph Hellwig @ 2019-06-25 14:24 UTC
  To: Jack Wang
  Cc: gregkh, sashal, stable, Christoph Hellwig, Darrick J . Wong, Jack Wang

Why does this patch warrant a stable backport?


* Re: [stable-4.14 1/2] block: add a lower-level bio_add_page interface
  2019-06-25 14:24   ` Christoph Hellwig
@ 2019-06-25 14:27     ` Jinpu Wang
  2019-06-25 14:33       ` Christoph Hellwig
  0 siblings, 1 reply; 9+ messages in thread
From: Jinpu Wang @ 2019-06-25 14:27 UTC
  To: Christoph Hellwig
  Cc: Jack Wang, Greg Kroah-Hartman, sashal, stable, Darrick J . Wong

On Tue, Jun 25, 2019 at 4:25 PM Christoph Hellwig <hch@lst.de> wrote:
>
> Why does this patch warrant a stable backport?
[jwang: cherry pick to 4.14, required for next patch to build] :)


* Re: [stable-4.14 1/2] block: add a lower-level bio_add_page interface
  2019-06-25 14:27     ` Jinpu Wang
@ 2019-06-25 14:33       ` Christoph Hellwig
  2019-06-25 14:39         ` Jinpu Wang
  0 siblings, 1 reply; 9+ messages in thread
From: Christoph Hellwig @ 2019-06-25 14:33 UTC
  To: Jinpu Wang
  Cc: Christoph Hellwig, Jack Wang, Greg Kroah-Hartman, sashal, stable,
	Darrick J . Wong

On Tue, Jun 25, 2019 at 04:27:44PM +0200, Jinpu Wang wrote:
> On Tue, Jun 25, 2019 at 4:25 PM Christoph Hellwig <hch@lst.de> wrote:
> >
> > Why does this patch warrant a stable backport?
> [jwang: cherry pick to 4.14, required for next patch to build] :)

There was no next patch in my inbox..


* Re: [stable-4.14 1/2] block: add a lower-level bio_add_page interface
  2019-06-25 14:33       ` Christoph Hellwig
@ 2019-06-25 14:39         ` Jinpu Wang
  0 siblings, 0 replies; 9+ messages in thread
From: Jinpu Wang @ 2019-06-25 14:39 UTC
  To: Christoph Hellwig
  Cc: Jack Wang, Greg Kroah-Hartman, sashal, stable, Darrick J . Wong

On Tue, Jun 25, 2019 at 4:33 PM Christoph Hellwig <hch@lst.de> wrote:
>
> On Tue, Jun 25, 2019 at 04:27:44PM +0200, Jinpu Wang wrote:
> > On Tue, Jun 25, 2019 at 4:25 PM Christoph Hellwig <hch@lst.de> wrote:
> > >
> > > Why does this patch warrant a stable backport?
> > [jwang: cherry pick to 4.14, required for next patch to build] :)
>
> There was no next patch in my inbox..
Sorry, it's 17d51b10d777 ("block: bio_iov_iter_get_pages: pin more
pages for multi-segment IOs")
It has your Reviewed-by tag, so I thought git would also send it to you,
but I checked and it did not.

link: https://git.kernel.org/pub/scm/public-inbox/vger.kernel.org/stable/0.git/commit/?id=938c071db43ad047f95a0fde25545a170ad20bf0


* Re: [stable-4.14 0/2] block layer fixes for silent data corruption
  2019-06-25 14:17 [stable-4.14 0/2] block layer fixes for silent data corruption Jack Wang
  2019-06-25 14:17 ` [stable-4.14 1/2] block: add a lower-level bio_add_page interface Jack Wang
  2019-06-25 14:17 ` [stable-4.14 2/2] block: bio_iov_iter_get_pages: pin more pages for multi-segment IOs Jack Wang
@ 2019-06-25 19:50 ` Sasha Levin
  2019-06-26 10:01   ` Jinpu Wang
  2 siblings, 1 reply; 9+ messages in thread
From: Sasha Levin @ 2019-06-25 19:50 UTC
  To: Jack Wang; +Cc: gregkh, stable

On Tue, Jun 25, 2019 at 04:17:23PM +0200, Jack Wang wrote:
>A silent data corruption was introduced in v4.10-rc1 with commit
>72ecad22d9f198aafee64218512e02ffa7818671 and was fixed in v4.18-rc7
>with commit 17d51b10d7773e4618bcac64648f30f12d4078fb. It affects
>users of O_DIRECT, in our case a KVM virtual machine with drives
>which use qemu's "cache=none" option.
>
>The other 2 commits have been accepted in 4.14, but these 2 are missing,
>ref: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1796542
>
>Please consider including them in the next release.

I've ended up cherry picking these two into the 4.14 tree.

--
Thanks,
Sasha


* Re: [stable-4.14 0/2] block layer fixes for silent data corruption
  2019-06-25 19:50 ` [stable-4.14 0/2] block layer fixes for silent data corruption Sasha Levin
@ 2019-06-26 10:01   ` Jinpu Wang
  0 siblings, 0 replies; 9+ messages in thread
From: Jinpu Wang @ 2019-06-26 10:01 UTC
  To: Sasha Levin; +Cc: Greg Kroah-Hartman, stable

On Tue, Jun 25, 2019 at 9:50 PM Sasha Levin <sashal@kernel.org> wrote:
>
> On Tue, Jun 25, 2019 at 04:17:23PM +0200, Jack Wang wrote:
> >A silent data corruption was introduced in v4.10-rc1 with commit
> >72ecad22d9f198aafee64218512e02ffa7818671 and was fixed in v4.18-rc7
> >with commit 17d51b10d7773e4618bcac64648f30f12d4078fb. It affects
> >users of O_DIRECT, in our case a KVM virtual machine with drives
> >which use qemu's "cache=none" option.
> >
> >The other 2 commits have been accepted in 4.14, but these 2 are missing,
> >ref: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1796542
> >
> >Please consider including them in the next release.
>
> I've ended up cherry picking these two into the 4.14 tree.

Thanks Sasha!
>
> --
> Thanks,
> Sasha
Jack


Thread overview: 9+ messages
2019-06-25 14:17 [stable-4.14 0/2] block layer fixes for silent data corruption Jack Wang
2019-06-25 14:17 ` [stable-4.14 1/2] block: add a lower-level bio_add_page interface Jack Wang
2019-06-25 14:24   ` Christoph Hellwig
2019-06-25 14:27     ` Jinpu Wang
2019-06-25 14:33       ` Christoph Hellwig
2019-06-25 14:39         ` Jinpu Wang
2019-06-25 14:17 ` [stable-4.14 2/2] block: bio_iov_iter_get_pages: pin more pages for multi-segment IOs Jack Wang
2019-06-25 19:50 ` [stable-4.14 0/2] block layer fixes for silent data corruption Sasha Levin
2019-06-26 10:01   ` Jinpu Wang
