* [PATCH v4 0/4] Fix silent data corruption in blkdev_direct_IO()
@ 2018-07-20 13:05 Martin Wilck
  2018-07-20 13:05 ` [PATCH v4 1/4] block: bio_iov_iter_get_pages: fix size of last iovec Martin Wilck
                   ` (3 more replies)
  0 siblings, 4 replies; 11+ messages in thread
From: Martin Wilck @ 2018-07-20 13:05 UTC (permalink / raw)
  To: Jens Axboe, Jan Kara, Ming Lei, Christoph Hellwig
  Cc: Hannes Reinecke, Johannes Thumshirn, Kent Overstreet,
	linux-block, Martin Wilck

Hello Jens, Ming, Jan, and all others,

the following patches have been verified by a customer to fix a silent data
corruption that they have been seeing since commit 72ecad2 ("block: support a
full bio worth of IO for simplified bdev direct-io").

The patches are based on our observation that the corruption occurs only
when the __blkdev_direct_IO_simple() code path is executed. In that case,
"short writes" occur in this code path, which cause a fallback to buffered
IO while the application continues submitting direct IO requests.

In v4, I've separated out the "get as many pages as the bio can hold"
functionality into a new helper. This way other callers can migrate to the new
helper if deemed appropriate. Changing the semantics of
bio_iov_iter_get_pages() for all callers, as Ming originally suggested, seems
too intrusive to me at this time.

Regards,
Martin

Changes wrt v1:
 - 1/3: minor formatting change (Christoph)
 - 2/3: split off the leak fix (Ming)
 - 3/3: give up if bio_iov_iter_get_pages() returns an error (Jan)
 - 3/3: warn if space in bio exhausted (Jan)
 - 3/3: add comments

Changes wrt v3:
 - split previous 3/3 into two patches (3/4, 4/4).
 - 3/4: add a new helper to retrieve as many pages as possible (Ming)
 - 3/4: put pages in case of error (Ming)

Martin Wilck (4):
  block: bio_iov_iter_get_pages: fix size of last iovec
  blkdev: __blkdev_direct_IO_simple: fix leak in error case
  block: add bio_iov_iter_get_all_pages() helper
  blkdev: __blkdev_direct_IO_simple: make sure to fill up the bio

 block/bio.c         | 61 +++++++++++++++++++++++++++++++++++++--------
 fs/block_dev.c      | 18 +++++++++----
 include/linux/bio.h |  1 +
 3 files changed, 64 insertions(+), 16 deletions(-)

-- 
2.17.1

* [PATCH v4 1/4] block: bio_iov_iter_get_pages: fix size of last iovec
  2018-07-20 13:05 [PATCH v4 0/4] Fix silent data corruption in blkdev_direct_IO() Martin Wilck
@ 2018-07-20 13:05 ` Martin Wilck
  2018-07-20 13:05 ` [PATCH v4 2/4] blkdev: __blkdev_direct_IO_simple: fix leak in error case Martin Wilck
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 11+ messages in thread
From: Martin Wilck @ 2018-07-20 13:05 UTC (permalink / raw)
  To: Jens Axboe, Jan Kara, Ming Lei, Christoph Hellwig
  Cc: Hannes Reinecke, Johannes Thumshirn, Kent Overstreet,
	linux-block, Martin Wilck

If the last page of the bio is not "full", the length of the last
vector slot needs to be corrected. This slot has the index
(bio->bi_vcnt - 1), but only in bio->bi_io_vec. In the "bv" helper
array, which is shifted by the value of bio->bi_vcnt at function
invocation, the correct index is (nr_pages - 1).
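
To make the index arithmetic concrete, here is a small user-space sketch
(not kernel code). PAGE_SZ, struct vec, and fill_vecs() are hypothetical
stand-ins for PAGE_SIZE, struct bio_vec, and the fill loop of the patched
function; a 4096-byte page size is assumed.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SZ 4096u	/* assumed page size, stand-in for PAGE_SIZE */

/* Hypothetical stand-in for struct bio_vec (page pointer omitted). */
struct vec {
	unsigned int len;
	unsigned int offset;
};

/*
 * Fill the slots for `size` bytes (size > 0) that start `offset` bytes
 * into the first page. `bv` points at the first *free* slot, i.e. it is
 * already shifted by the number of slots in use, just like the "bv"
 * helper array in the patch. Returns the number of slots filled.
 */
static unsigned int fill_vecs(struct vec *bv, size_t offset, size_t size)
{
	unsigned int nr_pages = (size + offset + PAGE_SZ - 1) / PAGE_SZ;
	unsigned int idx = nr_pages;

	while (idx--) {
		bv[idx].len = PAGE_SZ;
		bv[idx].offset = 0;
	}

	bv[0].offset += offset;
	bv[0].len -= offset;
	/* Trim the last *new* slot: index nr_pages - 1 relative to bv. */
	bv[nr_pages - 1].len -= nr_pages * PAGE_SZ - offset - size;

	return nr_pages;
}
```

With two slots already in use, 5000 bytes at offset 512 occupy two new
slots of 3584 and 1416 bytes. Indexing the shifted array with the updated
total count (4 - 1 = 3) would instead touch a slot beyond the ones just
filled.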

v2: improved readability following suggestions from Ming Lei.
v3: followed a formatting suggestion from Christoph Hellwig.

Fixes: 2cefe4dbaadf ("block: add bio_iov_iter_get_pages()")
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin Wilck <mwilck@suse.com>
---
 block/bio.c | 18 ++++++++----------
 1 file changed, 8 insertions(+), 10 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index 67eff5e..489a430 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -912,16 +912,16 @@ EXPORT_SYMBOL(bio_add_page);
  */
 int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
 {
-	unsigned short nr_pages = bio->bi_max_vecs - bio->bi_vcnt;
+	unsigned short nr_pages = bio->bi_max_vecs - bio->bi_vcnt, idx;
 	struct bio_vec *bv = bio->bi_io_vec + bio->bi_vcnt;
 	struct page **pages = (struct page **)bv;
-	size_t offset, diff;
+	size_t offset;
 	ssize_t size;
 
 	size = iov_iter_get_pages(iter, pages, LONG_MAX, nr_pages, &offset);
 	if (unlikely(size <= 0))
 		return size ? size : -EFAULT;
-	nr_pages = (size + offset + PAGE_SIZE - 1) / PAGE_SIZE;
+	idx = nr_pages = (size + offset + PAGE_SIZE - 1) / PAGE_SIZE;
 
 	/*
 	 * Deep magic below:  We need to walk the pinned pages backwards
@@ -934,17 +934,15 @@ int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
 	bio->bi_iter.bi_size += size;
 	bio->bi_vcnt += nr_pages;
 
-	diff = (nr_pages * PAGE_SIZE - offset) - size;
-	while (nr_pages--) {
-		bv[nr_pages].bv_page = pages[nr_pages];
-		bv[nr_pages].bv_len = PAGE_SIZE;
-		bv[nr_pages].bv_offset = 0;
+	while (idx--) {
+		bv[idx].bv_page = pages[idx];
+		bv[idx].bv_len = PAGE_SIZE;
+		bv[idx].bv_offset = 0;
 	}
 
 	bv[0].bv_offset += offset;
 	bv[0].bv_len -= offset;
-	if (diff)
-		bv[bio->bi_vcnt - 1].bv_len -= diff;
+	bv[nr_pages - 1].bv_len -= nr_pages * PAGE_SIZE - offset - size;
 
 	iov_iter_advance(iter, size);
 	return 0;
-- 
2.17.1

* [PATCH v4 2/4] blkdev: __blkdev_direct_IO_simple: fix leak in error case
  2018-07-20 13:05 [PATCH v4 0/4] Fix silent data corruption in blkdev_direct_IO() Martin Wilck
  2018-07-20 13:05 ` [PATCH v4 1/4] block: bio_iov_iter_get_pages: fix size of last iovec Martin Wilck
@ 2018-07-20 13:05 ` Martin Wilck
  2018-07-20 15:09   ` Christoph Hellwig
  2018-07-20 13:05 ` [PATCH v4 3/4] block: add bio_iov_iter_get_all_pages() helper Martin Wilck
  2018-07-20 13:05 ` [PATCH v4 4/4] blkdev: __blkdev_direct_IO_simple: make sure to fill up the bio Martin Wilck
  3 siblings, 1 reply; 11+ messages in thread
From: Martin Wilck @ 2018-07-20 13:05 UTC (permalink / raw)
  To: Jens Axboe, Jan Kara, Ming Lei, Christoph Hellwig
  Cc: Hannes Reinecke, Johannes Thumshirn, Kent Overstreet,
	linux-block, Martin Wilck

Fixes: 72ecad22d9f1 ("block: support a full bio worth of IO for
 simplified bdev direct-io")
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Martin Wilck <mwilck@suse.com>
---
 fs/block_dev.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/fs/block_dev.c b/fs/block_dev.c
index 0dd87aa..aba2541 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -221,7 +221,7 @@ __blkdev_direct_IO_simple(struct kiocb *iocb, struct iov_iter *iter,
 
 	ret = bio_iov_iter_get_pages(&bio, iter);
 	if (unlikely(ret))
-		return ret;
+		goto out;
 	ret = bio.bi_iter.bi_size;
 
 	if (iov_iter_rw(iter) == READ) {
@@ -250,12 +250,13 @@ __blkdev_direct_IO_simple(struct kiocb *iocb, struct iov_iter *iter,
 		put_page(bvec->bv_page);
 	}
 
-	if (vecs != inline_vecs)
-		kfree(vecs);
-
 	if (unlikely(bio.bi_status))
 		ret = blk_status_to_errno(bio.bi_status);
 
+out:
+	if (vecs != inline_vecs)
+		kfree(vecs);
+
 	bio_uninit(&bio);
 
 	return ret;
-- 
2.17.1

* [PATCH v4 3/4] block: add bio_iov_iter_get_all_pages() helper
  2018-07-20 13:05 [PATCH v4 0/4] Fix silent data corruption in blkdev_direct_IO() Martin Wilck
  2018-07-20 13:05 ` [PATCH v4 1/4] block: bio_iov_iter_get_pages: fix size of last iovec Martin Wilck
  2018-07-20 13:05 ` [PATCH v4 2/4] blkdev: __blkdev_direct_IO_simple: fix leak in error case Martin Wilck
@ 2018-07-20 13:05 ` Martin Wilck
  2018-07-20 15:11   ` Christoph Hellwig
  2018-07-20 16:16   ` Ming Lei
  2018-07-20 13:05 ` [PATCH v4 4/4] blkdev: __blkdev_direct_IO_simple: make sure to fill up the bio Martin Wilck
  3 siblings, 2 replies; 11+ messages in thread
From: Martin Wilck @ 2018-07-20 13:05 UTC (permalink / raw)
  To: Jens Axboe, Jan Kara, Ming Lei, Christoph Hellwig
  Cc: Hannes Reinecke, Johannes Thumshirn, Kent Overstreet,
	linux-block, Martin Wilck

bio_iov_iter_get_pages() only adds pages for the next non-zero
segment from the iov_iter to the bio. Some callers prefer to
obtain as many pages as would fit into the bio, with proper
rollback in case of failure. Add bio_iov_iter_get_all_pages()
for this purpose.
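
The difference between the two behaviours can be sketched in user space
as follows. This is a toy model only: toy_bio, toy_iter, and the toy_*()
functions are hypothetical names, a segment is reduced to a page count,
and a negative count marks a segment whose pages cannot be pinned. The
rollback uses a goto and skips the zeroing, which is one possible shape
and not necessarily what a later revision of this patch will look like.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MAX_VECS 8	/* toy capacity, stand-in for bio->bi_max_vecs */

/* Toy stand-ins; none of these are kernel types. */
struct toy_bio {
	int pinned[MAX_VECS];	/* 1 if the slot holds a pinned page */
	unsigned short vcnt;	/* slots in use */
};

struct toy_iter {
	int *seg_pages;		/* pages left per segment; < 0 means "faults" */
	size_t nr_segs;
	size_t cur;
};

/* Non-zero if any segment still has data (or a pending fault). */
static int toy_iter_left(const struct toy_iter *it)
{
	size_t i;

	for (i = it->cur; i < it->nr_segs; i++)
		if (it->seg_pages[i] != 0)
			return 1;
	return 0;
}

/* Models the single-segment behaviour: consume only the next segment. */
static int toy_get_pages(struct toy_bio *bio, struct toy_iter *it)
{
	int room = MAX_VECS - bio->vcnt;
	int n, i;

	while (it->cur < it->nr_segs && it->seg_pages[it->cur] == 0)
		it->cur++;	/* skip empty segments */
	if (it->cur == it->nr_segs)
		return -EFAULT;

	n = it->seg_pages[it->cur];
	if (n < 0)
		return -EFAULT;
	if (n > room)
		n = room;

	for (i = 0; i < n; i++)
		bio->pinned[bio->vcnt++] = 1;
	it->seg_pages[it->cur] -= n;
	if (it->seg_pages[it->cur] == 0)
		it->cur++;
	return 0;
}

/* Models the new helper: loop until the iter is drained or the bio is full. */
static int toy_get_all_pages(struct toy_bio *bio, struct toy_iter *it)
{
	unsigned short orig_vcnt = bio->vcnt;
	int ret;

	do {
		ret = toy_get_pages(bio, it);
		if (ret)
			goto rollback;
	} while (toy_iter_left(it) && bio->vcnt < MAX_VECS);

	return 0;

rollback:
	/* Unpin only what this call added; earlier slots stay untouched. */
	while (bio->vcnt > orig_vcnt)
		bio->pinned[--bio->vcnt] = 0;	/* stands in for put_page() */
	return ret;
}
```

For segments of 2, 0, and 3 pages, a single toy_get_pages() call stops
after 2 pages, while toy_get_all_pages() collects all 5. If a later
segment faults, the bio is left exactly as it was on entry.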

Signed-off-by: Martin Wilck <mwilck@suse.com>
---
 block/bio.c         | 43 ++++++++++++++++++++++++++++++++++++++++++-
 include/linux/bio.h |  1 +
 2 files changed, 43 insertions(+), 1 deletion(-)

diff --git a/block/bio.c b/block/bio.c
index 489a430..693eb3b 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -907,8 +907,10 @@ EXPORT_SYMBOL(bio_add_page);
  * @bio: bio to add pages to
  * @iter: iov iterator describing the region to be mapped
  *
- * Pins as many pages from *iter and appends them to @bio's bvec array. The
+ * Pins pages from *iter and appends them to @bio's bvec array. The
  * pages will have to be released using put_page() when done.
+ * For multi-segment *iter, this function only adds pages from the
+ * next non-empty segment of the iov iterator.
  */
 int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
 {
@@ -949,6 +951,45 @@ int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
 }
 EXPORT_SYMBOL_GPL(bio_iov_iter_get_pages);
 
+/**
+ * bio_iov_iter_get_all_pages - pin user or kernel pages and add them to a bio
+ * @bio: bio to add pages to
+ * @iter: iov iterator describing the region to be mapped
+ *
+ * Pins pages from *iter and appends them to @bio's bvec array. The
+ * pages will have to be released using put_page() when done.
+ * This function adds as many pages as possible to a bio.
+ * If this function encounters an error, it unpins the pages it has
+ * pinned before, leaving previously pinned pages untouched.
+ */
+int bio_iov_iter_get_all_pages(struct bio *bio, struct iov_iter *iter)
+{
+	unsigned short orig_vcnt = bio->bi_vcnt;
+
+	do {
+		int ret = bio_iov_iter_get_pages(bio, iter);
+
+		if (unlikely(ret)) {
+			struct bio_vec *bvec;
+			unsigned short i;
+
+			bio_for_each_segment_all(bvec, bio, i) {
+				if (i >= orig_vcnt) {
+					put_page(bvec->bv_page);
+					bvec->bv_page = NULL;
+					bvec->bv_len = 0;
+					bvec->bv_offset = 0;
+				}
+			}
+			bio->bi_vcnt = orig_vcnt;
+			return ret;
+		}
+	} while (iov_iter_count(iter) && !bio_full(bio));
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(bio_iov_iter_get_all_pages);
+
 static void submit_bio_wait_endio(struct bio *bio)
 {
 	complete(bio->bi_private);
diff --git a/include/linux/bio.h b/include/linux/bio.h
index f08f5fe..824cb74 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -485,6 +485,7 @@ bool __bio_try_merge_page(struct bio *bio, struct page *page,
 void __bio_add_page(struct bio *bio, struct page *page,
 		unsigned int len, unsigned int off);
 int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter);
+int bio_iov_iter_get_all_pages(struct bio *bio, struct iov_iter *iter);
 struct rq_map_data;
 extern struct bio *bio_map_user_iov(struct request_queue *,
 				    struct iov_iter *, gfp_t);
-- 
2.17.1

* [PATCH v4 4/4] blkdev: __blkdev_direct_IO_simple: make sure to fill up the bio
  2018-07-20 13:05 [PATCH v4 0/4] Fix silent data corruption in blkdev_direct_IO() Martin Wilck
                   ` (2 preceding siblings ...)
  2018-07-20 13:05 ` [PATCH v4 3/4] block: add bio_iov_iter_get_all_pages() helper Martin Wilck
@ 2018-07-20 13:05 ` Martin Wilck
  3 siblings, 0 replies; 11+ messages in thread
From: Martin Wilck @ 2018-07-20 13:05 UTC (permalink / raw)
  To: Jens Axboe, Jan Kara, Ming Lei, Christoph Hellwig
  Cc: Hannes Reinecke, Johannes Thumshirn, Kent Overstreet,
	linux-block, Martin Wilck

bio_iov_iter_get_pages() returns only pages for a single non-empty
segment of the input iov_iter's iovec. This may be less than the
number of pages __blkdev_direct_IO_simple() is supposed to process.
Call the new bio_iov_iter_get_all_pages() helper instead to avoid
short reads or writes. Otherwise, __generic_file_write_iter() falls
back to buffered writes, which has been observed to cause data
corruption in certain workloads.

Fixes: 72ecad22d9f1 ("block: support a full bio worth of IO for
 simplified bdev direct-io")
Signed-off-by: Martin Wilck <mwilck@suse.com>
---
 fs/block_dev.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/fs/block_dev.c b/fs/block_dev.c
index aba2541..010708a 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -219,9 +219,16 @@ __blkdev_direct_IO_simple(struct kiocb *iocb, struct iov_iter *iter,
 	bio.bi_end_io = blkdev_bio_end_io_simple;
 	bio.bi_ioprio = iocb->ki_ioprio;
 
-	ret = bio_iov_iter_get_pages(&bio, iter);
+	ret = bio_iov_iter_get_all_pages(&bio, iter);
 	if (unlikely(ret))
 		goto out;
+
+	/*
+	 * Our bi_io_vec should be big enough to hold all data from the
+	 * iov_iter, as this has been checked before calling this function.
+	 */
+	WARN_ON_ONCE(iov_iter_count(iter));
+
 	ret = bio.bi_iter.bi_size;
 
 	if (iov_iter_rw(iter) == READ) {
-- 
2.17.1

* Re: [PATCH v4 2/4] blkdev: __blkdev_direct_IO_simple: fix leak in error case
  2018-07-20 13:05 ` [PATCH v4 2/4] blkdev: __blkdev_direct_IO_simple: fix leak in error case Martin Wilck
@ 2018-07-20 15:09   ` Christoph Hellwig
  0 siblings, 0 replies; 11+ messages in thread
From: Christoph Hellwig @ 2018-07-20 15:09 UTC (permalink / raw)
  To: Martin Wilck
  Cc: Jens Axboe, Jan Kara, Ming Lei, Christoph Hellwig,
	Hannes Reinecke, Johannes Thumshirn, Kent Overstreet,
	linux-block

On Fri, Jul 20, 2018 at 03:05:50PM +0200, Martin Wilck wrote:
> Fixes: 72ecad22d9f1 ("block: support a full bio worth of IO for
>  simplified bdev direct-io")
> Reviewed-by: Ming Lei <ming.lei@redhat.com>
> Signed-off-by: Martin Wilck <mwilck@suse.com>

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>

* Re: [PATCH v4 3/4] block: add bio_iov_iter_get_all_pages() helper
  2018-07-20 13:05 ` [PATCH v4 3/4] block: add bio_iov_iter_get_all_pages() helper Martin Wilck
@ 2018-07-20 15:11   ` Christoph Hellwig
  2018-07-20 15:29     ` Martin Wilck
  2018-07-20 16:16   ` Ming Lei
  1 sibling, 1 reply; 11+ messages in thread
From: Christoph Hellwig @ 2018-07-20 15:11 UTC (permalink / raw)
  To: Martin Wilck
  Cc: Jens Axboe, Jan Kara, Ming Lei, Christoph Hellwig,
	Hannes Reinecke, Johannes Thumshirn, Kent Overstreet,
	linux-block

On Fri, Jul 20, 2018 at 03:05:51PM +0200, Martin Wilck wrote:
> bio_iov_iter_get_pages() only adds pages for the next non-zero
> segment from the iov_iter to the bio. Some callers prefer to
> obtain as many pages as would fit into the bio, with proper
> rollback in case of failure. Add bio_iov_iter_get_all_pages()
> for this purpose.

I'd much rather have you fix bio_iov_iter_get_pages.  It only has
three callers, all being slight variations of the same direct I/O
pattern.  There is no point in diverging in implementation details
for them.

> +	do {
> +		int ret = bio_iov_iter_get_pages(bio, iter);
> +
> +		if (unlikely(ret)) {
> +			struct bio_vec *bvec;
> +			unsigned short i;
> +
> +			bio_for_each_segment_all(bvec, bio, i) {
> +				if (i >= orig_vcnt) {
> +					put_page(bvec->bv_page);
> +					bvec->bv_page = NULL;
> +					bvec->bv_len = 0;
> +					bvec->bv_offset = 0;
> +				}
> +			}

I don't think we need any of the zeroing here.   Also for code flow
purposes I'd rather see a goto for the error handling here.

* Re: [PATCH v4 3/4] block: add bio_iov_iter_get_all_pages() helper
  2018-07-20 15:11   ` Christoph Hellwig
@ 2018-07-20 15:29     ` Martin Wilck
  0 siblings, 0 replies; 11+ messages in thread
From: Martin Wilck @ 2018-07-20 15:29 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe
  Cc: Jens Axboe, Jan Kara, Ming Lei, Hannes Reinecke,
	Johannes Thumshirn, Kent Overstreet, linux-block

On Fri, 2018-07-20 at 17:11 +0200, Christoph Hellwig wrote:
> On Fri, Jul 20, 2018 at 03:05:51PM +0200, Martin Wilck wrote:
> > bio_iov_iter_get_pages() only adds pages for the next non-zero
> > segment from the iov_iter to the bio. Some callers prefer to
> > obtain as many pages as would fit into the bio, with proper
> > rollback in case of failure. Add bio_iov_iter_get_all_pages()
> > for this purpose.
> 
> I'd much rather have you fix bio_iov_iter_get_pages.  It only has
> three callers, all being slight variations of the same direct I/O
> pattern.  There is no point in diverging in implementation details
> for them.

If that's consensus, I'll do it happily. So far I felt there was
opposition against it.


> > +	do {
> > +		int ret = bio_iov_iter_get_pages(bio, iter);
> > +
> > +		if (unlikely(ret)) {
> > +			struct bio_vec *bvec;
> > +			unsigned short i;
> > +
> > +			bio_for_each_segment_all(bvec, bio, i) {
> > +				if (i >= orig_vcnt) {
> > +					put_page(bvec->bv_page);
> > +					bvec->bv_page = NULL;
> > +					bvec->bv_len = 0;
> > +					bvec->bv_offset = 0;
> > +				}
> > +			}
> 
> I don't think we need any of the zeroing here.   Also for code flow
> purposes I'd rather see a goto for the error handling here.

OK, will do.

I'll wait for some more feedback on the "fix bio_iov_iter_get_pages"
part.

Martin

-- 
Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)

* Re: [PATCH v4 3/4] block: add bio_iov_iter_get_all_pages() helper
  2018-07-20 13:05 ` [PATCH v4 3/4] block: add bio_iov_iter_get_all_pages() helper Martin Wilck
  2018-07-20 15:11   ` Christoph Hellwig
@ 2018-07-20 16:16   ` Ming Lei
  2018-07-20 16:54     ` Martin Wilck
  1 sibling, 1 reply; 11+ messages in thread
From: Ming Lei @ 2018-07-20 16:16 UTC (permalink / raw)
  To: Martin Wilck
  Cc: Jens Axboe, Jan Kara, Christoph Hellwig, Hannes Reinecke,
	Johannes Thumshirn, Kent Overstreet, linux-block

On Fri, Jul 20, 2018 at 03:05:51PM +0200, Martin Wilck wrote:
> bio_iov_iter_get_pages() only adds pages for the next non-zero
> segment from the iov_iter to the bio. Some callers prefer to
> obtain as many pages as would fit into the bio, with proper
> rollback in case of failure. Add bio_iov_iter_get_all_pages()
> for this purpose.
> 
> Signed-off-by: Martin Wilck <mwilck@suse.com>
> ---
>  block/bio.c         | 43 ++++++++++++++++++++++++++++++++++++++++++-
>  include/linux/bio.h |  1 +
>  2 files changed, 43 insertions(+), 1 deletion(-)
> 
> diff --git a/block/bio.c b/block/bio.c
> index 489a430..693eb3b 100644
> --- a/block/bio.c
> +++ b/block/bio.c
> @@ -907,8 +907,10 @@ EXPORT_SYMBOL(bio_add_page);
>   * @bio: bio to add pages to
>   * @iter: iov iterator describing the region to be mapped
>   *
> - * Pins as many pages from *iter and appends them to @bio's bvec array. The
> + * Pins pages from *iter and appends them to @bio's bvec array. The
>   * pages will have to be released using put_page() when done.
> + * For multi-segment *iter, this function only adds pages from the
> + * the next non-empty segment of the iov iterator.
>   */
>  int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
>  {
> @@ -949,6 +951,45 @@ int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
>  }
>  EXPORT_SYMBOL_GPL(bio_iov_iter_get_pages);
>  
> +/**
> + * bio_iov_iter_get_all_pages - pin user or kernel pages and add them to a bio
> + * @bio: bio to add pages to
> + * @iter: iov iterator describing the region to be mapped
> + *
> + * Pins pages from *iter and appends them to @bio's bvec array. The
> + * pages will have to be released using put_page() when done.
> + * This function adds as many pages as possible to a bio.
> + * If this function encounters an error, it unpins the pages it has
> + * pinned before, leaving previously pinned pages untouched.
> + */
> +int bio_iov_iter_get_all_pages(struct bio *bio, struct iov_iter *iter)
> +{
> +	unsigned short orig_vcnt = bio->bi_vcnt;
> +
> +	do {
> +		int ret = bio_iov_iter_get_pages(bio, iter);
> +
> +		if (unlikely(ret)) {
> +			struct bio_vec *bvec;
> +			unsigned short i;
> +
> +			bio_for_each_segment_all(bvec, bio, i) {
> +				if (i >= orig_vcnt) {
> +					put_page(bvec->bv_page);
> +					bvec->bv_page = NULL;
> +					bvec->bv_len = 0;
> +					bvec->bv_offset = 0;
> +				}
> +			}
> +			bio->bi_vcnt = orig_vcnt;
> +			return ret;
> +		}
> +	} while (iov_iter_count(iter) && !bio_full(bio));

The failure handling part (releasing pages) could be moved out of this
helper, so that its usage is aligned with bio_iov_iter_get_pages().

BTW, I agree with Christoph, we may just fix/improve bio_iov_iter_get_pages()
for all users.

Thanks,
Ming

* Re: [PATCH v4 3/4] block: add bio_iov_iter_get_all_pages() helper
  2018-07-20 16:16   ` Ming Lei
@ 2018-07-20 16:54     ` Martin Wilck
  2018-07-20 23:48       ` Ming Lei
  0 siblings, 1 reply; 11+ messages in thread
From: Martin Wilck @ 2018-07-20 16:54 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, Jan Kara, Christoph Hellwig, Hannes Reinecke,
	Johannes Thumshirn, Kent Overstreet, linux-block

On Sat, 2018-07-21 at 00:16 +0800, Ming Lei wrote:
> On Fri, Jul 20, 2018 at 03:05:51PM +0200, Martin Wilck wrote:
> > bio_iov_iter_get_pages() only adds pages for the next non-zero
> > segment from the iov_iter to the bio. Some callers prefer to
> > obtain as many pages as would fit into the bio, with proper
> > rollback in case of failure. Add bio_iov_iter_get_all_pages()
> > for this purpose.
> > 
> > Signed-off-by: Martin Wilck <mwilck@suse.com>
> > ---
> >  block/bio.c         | 43
> > ++++++++++++++++++++++++++++++++++++++++++-
> >  include/linux/bio.h |  1 +
> >  2 files changed, 43 insertions(+), 1 deletion(-)
> > 
> > diff --git a/block/bio.c b/block/bio.c
> > index 489a430..693eb3b 100644
> > --- a/block/bio.c
> > +++ b/block/bio.c
> > @@ -907,8 +907,10 @@ EXPORT_SYMBOL(bio_add_page);
> >   * @bio: bio to add pages to
> >   * @iter: iov iterator describing the region to be mapped
> >   *
> > - * Pins as many pages from *iter and appends them to @bio's bvec
> > array. The
> > + * Pins pages from *iter and appends them to @bio's bvec array.
> > The
> >   * pages will have to be released using put_page() when done.
> > + * For multi-segment *iter, this function only adds pages from the
> > + * the next non-empty segment of the iov iterator.
> >   */
> >  int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
> >  {
> > @@ -949,6 +951,45 @@ int bio_iov_iter_get_pages(struct bio *bio,
> > struct iov_iter *iter)
> >  }
> >  EXPORT_SYMBOL_GPL(bio_iov_iter_get_pages);
> >  
> > +/**
> > + * bio_iov_iter_get_all_pages - pin user or kernel pages and add
> > them to a bio
> > + * @bio: bio to add pages to
> > + * @iter: iov iterator describing the region to be mapped
> > + *
> > + * Pins pages from *iter and appends them to @bio's bvec array.
> > The
> > + * pages will have to be released using put_page() when done.
> > + * This function adds as many pages as possible to a bio.
> > + * If this function encounters an error, it unpins the pages it
> > has
> > + * pinned before, leaving previously pinned pages untouched.
> > + */
> > +int bio_iov_iter_get_all_pages(struct bio *bio, struct iov_iter
> > *iter)
> > +{
> > +	unsigned short orig_vcnt = bio->bi_vcnt;
> > +
> > +	do {
> > +		int ret = bio_iov_iter_get_pages(bio, iter);
> > +
> > +		if (unlikely(ret)) {
> > +			struct bio_vec *bvec;
> > +			unsigned short i;
> > +
> > +			bio_for_each_segment_all(bvec, bio, i) {
> > +				if (i >= orig_vcnt) {
> > +					put_page(bvec->bv_page);
> > +					bvec->bv_page = NULL;
> > +					bvec->bv_len = 0;
> > +					bvec->bv_offset = 0;
> > +				}
> > +			}
> > +			bio->bi_vcnt = orig_vcnt;
> > +			return ret;
> > +		}
> > +	} while (iov_iter_count(iter) && !bio_full(bio));
> 
> The failure handling part(release pages) may be moved out of this
> helper, so usage of this helper can be aligned with
> bio_iov_iter_get_pages().

I wrote the failure handling precisely for being compatible with
bio_iov_iter_get_pages(), which requires no rollback if it returns an
error. If we don't do this, we have to add extra error handling code to
every caller. It was the issue you raised with my v3 submission...
apparently I misunderstood you.

What should happen if the new handler encounters an error from
bio_iov_iter_get_pages() in the 2nd or later iteration?

 1 return success, so that the caller doesn't realize there was a
problem,
 2 return error and roll back bio changes, as I implemented it here,
 3 return error and keep the already allocated pages.

You seem to support option 3. But that leaves it to the caller to
differentiate this from a failure with zero allocated pages, and clean
up appropriately. I'm not sure if that's wise, and it's for sure
different from bio_iov_iter_get_pages()'s behavior. Or what am I
missing?

> BTW, I agree with Christoph, we may just fix/improve
> bio_iov_iter_get_pages()
> for all users.

Ok, thanks for confirming.

Martin

-- 
Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)

* Re: [PATCH v4 3/4] block: add bio_iov_iter_get_all_pages() helper
  2018-07-20 16:54     ` Martin Wilck
@ 2018-07-20 23:48       ` Ming Lei
  0 siblings, 0 replies; 11+ messages in thread
From: Ming Lei @ 2018-07-20 23:48 UTC (permalink / raw)
  To: Martin Wilck
  Cc: Jens Axboe, Jan Kara, Christoph Hellwig, Hannes Reinecke,
	Johannes Thumshirn, Kent Overstreet, linux-block

On Fri, Jul 20, 2018 at 06:54:48PM +0200, Martin Wilck wrote:
> On Sat, 2018-07-21 at 00:16 +0800, Ming Lei wrote:
> > On Fri, Jul 20, 2018 at 03:05:51PM +0200, Martin Wilck wrote:
> > > bio_iov_iter_get_pages() only adds pages for the next non-zero
> > > segment from the iov_iter to the bio. Some callers prefer to
> > > obtain as many pages as would fit into the bio, with proper
> > > rollback in case of failure. Add bio_iov_iter_get_all_pages()
> > > for this purpose.
> > > 
> > > Signed-off-by: Martin Wilck <mwilck@suse.com>
> > > ---
> > >  block/bio.c         | 43
> > > ++++++++++++++++++++++++++++++++++++++++++-
> > >  include/linux/bio.h |  1 +
> > >  2 files changed, 43 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/block/bio.c b/block/bio.c
> > > index 489a430..693eb3b 100644
> > > --- a/block/bio.c
> > > +++ b/block/bio.c
> > > @@ -907,8 +907,10 @@ EXPORT_SYMBOL(bio_add_page);
> > >   * @bio: bio to add pages to
> > >   * @iter: iov iterator describing the region to be mapped
> > >   *
> > > - * Pins as many pages from *iter and appends them to @bio's bvec
> > > array. The
> > > + * Pins pages from *iter and appends them to @bio's bvec array.
> > > The
> > >   * pages will have to be released using put_page() when done.
> > > + * For multi-segment *iter, this function only adds pages from the
> > > + * the next non-empty segment of the iov iterator.
> > >   */
> > >  int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
> > >  {
> > > @@ -949,6 +951,45 @@ int bio_iov_iter_get_pages(struct bio *bio,
> > > struct iov_iter *iter)
> > >  }
> > >  EXPORT_SYMBOL_GPL(bio_iov_iter_get_pages);
> > >  
> > > +/**
> > > + * bio_iov_iter_get_all_pages - pin user or kernel pages and add
> > > them to a bio
> > > + * @bio: bio to add pages to
> > > + * @iter: iov iterator describing the region to be mapped
> > > + *
> > > + * Pins pages from *iter and appends them to @bio's bvec array.
> > > The
> > > + * pages will have to be released using put_page() when done.
> > > + * This function adds as many pages as possible to a bio.
> > > + * If this function encounters an error, it unpins the pages it
> > > has
> > > + * pinned before, leaving previously pinned pages untouched.
> > > + */
> > > +int bio_iov_iter_get_all_pages(struct bio *bio, struct iov_iter
> > > *iter)
> > > +{
> > > +	unsigned short orig_vcnt = bio->bi_vcnt;
> > > +
> > > +	do {
> > > +		int ret = bio_iov_iter_get_pages(bio, iter);
> > > +
> > > +		if (unlikely(ret)) {
> > > +			struct bio_vec *bvec;
> > > +			unsigned short i;
> > > +
> > > +			bio_for_each_segment_all(bvec, bio, i) {
> > > +				if (i >= orig_vcnt) {
> > > +					put_page(bvec->bv_page);
> > > +					bvec->bv_page = NULL;
> > > +					bvec->bv_len = 0;
> > > +					bvec->bv_offset = 0;
> > > +				}
> > > +			}
> > > +			bio->bi_vcnt = orig_vcnt;
> > > +			return ret;
> > > +		}
> > > +	} while (iov_iter_count(iter) && !bio_full(bio));
> > 
> > The failure handling part(release pages) may be moved out of this
> > helper, so usage of this helper can be aligned with
> > bio_iov_iter_get_pages().
> 
> I wrote the failure handling precisely for being compatible with
> bio_iov_iter_get_pages(), which requires no rollback if it returns an
> error. If we don't do this, we have to add extra error handling code to
> every caller. It was the issue you raised with my v3 submission...
> apparently I misunderstood you.
> 
> What should happen if the new handler encounters an error from
> bio_iov_iter_get_pages() in the 2nd or later iteration?
> 
>  1 return success, so that the caller doesn't realize there was a
> problem,
>  2 return error and roll back bio changes, as I implemented it here,
>  3 return error and keep the already allocated pages.
> 
> You seem to support option 3. But that leaves it to the caller to
> differentiate this from a failure with zero allocated pages, and clean
> up appropriately. I'm not sure if that's wise, and it's for sure
> different from bio_iov_iter_get_pages()'s behavior. Or what am I
> missing?
> > BTW, I agree with Christoph, we may just fix/improve
> > bio_iov_iter_get_pages()
> > for all users.
> 
> Ok, thanks for confirming.

OK, if you follow this suggestion, the pinned pages can be released
in bio_iov_iter_get_pages() as this patch does, but the bio_endio(bio)
call in the failure handler of __blkdev_direct_IO() has to be handled in
the following way to avoid a double release:

1) if this bio is allocated from &blkdev_dio_pool, the bio_endio() needs
to be removed

2) otherwise, the bio_endio() needs to be replaced with bio_put().

Frankly speaking, after your patch is in, it seems fine to allocate a
single bio for doing the dio in __blkdev_direct_IO(), given the passed
'max_pages' is <= BIO_MAX_PAGES. Then __blkdev_direct_IO() can be
simplified a lot.

But both the current __blkdev_direct_IO() and iomap_dio_actor() support
short dio. Do you think these two have the same issue as yours? Or
do we need to support it in the new bio_iov_iter_get_pages()?


Thanks,
Ming
