All of lore.kernel.org
 help / color / mirror / Atom feed
From: Pavel Begunkov <asml.silence@gmail.com>
To: linux-block@vger.kernel.org
Cc: Jens Axboe <axboe@kernel.dk>,
	Christoph Hellwig <hch@infradead.org>,
	Matthew Wilcox <willy@infradead.org>,
	Ming Lei <ming.lei@redhat.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	"Darrick J . Wong" <darrick.wong@oracle.com>,
	"Martin K . Petersen" <martin.petersen@oracle.com>,
	Jonathan Corbet <corbet@lwn.net>,
	linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	io-uring@vger.kernel.org, linux-kernel@vger.kernel.org,
	target-devel@vger.kernel.org, linux-scsi@vger.kernel.org,
	linux-doc@vger.kernel.org, Christoph Hellwig <hch@lst.de>
Subject: [PATCH v3 7/7] bio: don't copy bvec for direct IO
Date: Sat,  9 Jan 2021 16:03:03 +0000	[thread overview]
Message-ID: <69fef253b37fc44dd28c43398715e27cee5e0fe0.1610170479.git.asml.silence@gmail.com> (raw)
In-Reply-To: <cover.1610170479.git.asml.silence@gmail.com>

The block layer spends quite a while in blkdev_direct_IO() to copy and
initialise bio's bvec. However, if we've already got a bvec in the input
iterator it might be reused in some cases, i.e. when new
ITER_BVEC_FLAG_FIXED flag is set. Simple tests show considerable
performance boost, and it also reduces memory footprint.

Suggested-by: Matthew Wilcox <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 Documentation/filesystems/porting.rst |  9 ++++
 block/bio.c                           | 67 ++++++++++++---------------
 include/linux/bio.h                   |  5 +-
 3 files changed, 42 insertions(+), 39 deletions(-)

diff --git a/Documentation/filesystems/porting.rst b/Documentation/filesystems/porting.rst
index c722d94f29ea..1f8cf8e10b34 100644
--- a/Documentation/filesystems/porting.rst
+++ b/Documentation/filesystems/porting.rst
@@ -872,3 +872,12 @@ its result is kern_unmount() or kern_unmount_array().
 
 zero-length bvec segments are disallowed, they must be filtered out before
 passed on to an iterator.
+
+---
+
+**mandatory**
+
+For bvec based itererators bio_iov_iter_get_pages() now doesn't copy bvecs but
+uses the one provided. Anyone issuing kiocb-I/O should ensure that the bvec and
+page references stay until I/O has completed, i.e. until ->ki_complete() has
+been called or returned with non -EIOCBQUEUED code.
diff --git a/block/bio.c b/block/bio.c
index 9f26984af643..6f031a04b59a 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -960,21 +960,17 @@ void bio_release_pages(struct bio *bio, bool mark_dirty)
 }
 EXPORT_SYMBOL_GPL(bio_release_pages);
 
-static int __bio_iov_bvec_add_pages(struct bio *bio, struct iov_iter *iter)
+static int bio_iov_bvec_set(struct bio *bio, struct iov_iter *iter)
 {
-	const struct bio_vec *bv = iter->bvec;
-	unsigned int len;
-	size_t size;
-
-	if (WARN_ON_ONCE(iter->iov_offset > bv->bv_len))
-		return -EINVAL;
-
-	len = min_t(size_t, bv->bv_len - iter->iov_offset, iter->count);
-	size = bio_add_page(bio, bv->bv_page, len,
-				bv->bv_offset + iter->iov_offset);
-	if (unlikely(size != len))
-		return -EINVAL;
-	iov_iter_advance(iter, size);
+	WARN_ON_ONCE(BVEC_POOL_IDX(bio) != 0);
+
+	bio->bi_vcnt = iter->nr_segs;
+	bio->bi_max_vecs = iter->nr_segs;
+	bio->bi_io_vec = (struct bio_vec *)iter->bvec;
+	bio->bi_iter.bi_bvec_done = iter->iov_offset;
+	bio->bi_iter.bi_size = iter->count;
+
+	iov_iter_advance(iter, iter->count);
 	return 0;
 }
 
@@ -1088,12 +1084,12 @@ static int __bio_iov_append_get_pages(struct bio *bio, struct iov_iter *iter)
  * This takes either an iterator pointing to user memory, or one pointing to
  * kernel pages (BVEC iterator). If we're adding user pages, we pin them and
  * map them into the kernel. On IO completion, the caller should put those
- * pages. If we're adding kernel pages, and the caller told us it's safe to
- * do so, we just have to add the pages to the bio directly. We don't grab an
- * extra reference to those pages (the user should already have that), and we
- * don't put the page on IO completion. The caller needs to check if the bio is
- * flagged BIO_NO_PAGE_REF on IO completion. If it isn't, then pages should be
- * released.
+ * pages. For bvec based iterators bio_iov_iter_get_pages() uses the provided
+ * bvecs rather than copying them. Hence anyone issuing kiocb based IO needs
+ * to ensure the bvecs and pages stay referenced until the submitted I/O is
+ * completed by a call to ->ki_complete() or returns with an error other than
+ * -EIOCBQUEUED. The caller needs to check if the bio is flagged BIO_NO_PAGE_REF
+ * on IO completion. If it isn't, then pages should be released.
  *
  * The function tries, but does not guarantee, to pin as many pages as
  * fit into the bio, or are requested in @iter, whatever is smaller. If
@@ -1105,27 +1101,22 @@ static int __bio_iov_append_get_pages(struct bio *bio, struct iov_iter *iter)
  */
 int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
 {
-	const bool is_bvec = iov_iter_is_bvec(iter);
-	int ret;
-
-	if (WARN_ON_ONCE(bio->bi_vcnt))
-		return -EINVAL;
+	int ret = 0;
 
-	do {
-		if (bio_op(bio) == REQ_OP_ZONE_APPEND) {
-			if (WARN_ON_ONCE(is_bvec))
-				return -EINVAL;
-			ret = __bio_iov_append_get_pages(bio, iter);
-		} else {
-			if (is_bvec)
-				ret = __bio_iov_bvec_add_pages(bio, iter);
+	if (iov_iter_is_bvec(iter)) {
+		if (WARN_ON_ONCE(bio_op(bio) == REQ_OP_ZONE_APPEND))
+			return -EINVAL;
+		bio_iov_bvec_set(bio, iter);
+		bio_set_flag(bio, BIO_NO_PAGE_REF);
+		return 0;
+	} else {
+		do {
+			if (bio_op(bio) == REQ_OP_ZONE_APPEND)
+				ret = __bio_iov_append_get_pages(bio, iter);
 			else
 				ret = __bio_iov_iter_get_pages(bio, iter);
-		}
-	} while (!ret && iov_iter_count(iter) && !bio_full(bio, 0));
-
-	if (is_bvec)
-		bio_set_flag(bio, BIO_NO_PAGE_REF);
+		} while (!ret && iov_iter_count(iter) && !bio_full(bio, 0));
+	}
 
 	/* don't account direct I/O as memory stall */
 	bio_clear_flag(bio, BIO_WORKINGSET);
diff --git a/include/linux/bio.h b/include/linux/bio.h
index d8f9077c43ef..1d30572a8c53 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -444,10 +444,13 @@ static inline void bio_wouldblock_error(struct bio *bio)
 
 /*
  * Calculate number of bvec segments that should be allocated to fit data
- * pointed by @iter.
+ * pointed by @iter. If @iter is backed by bvec it's going to be reused
+ * instead of allocating a new one.
  */
 static inline int bio_iov_vecs_to_alloc(struct iov_iter *iter, int max_segs)
 {
+	if (iov_iter_is_bvec(iter))
+		return 0;
 	return iov_iter_npages(iter, max_segs);
 }
 
-- 
2.24.0


  parent reply	other threads:[~2021-01-09 16:08 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-01-09 16:02 [PATCH v3 0/7] no-copy bvec Pavel Begunkov
2021-01-09 16:02 ` [PATCH v3 1/7] splice: don't generate zero-len segement bvecs Pavel Begunkov
2021-01-11  2:48   ` Ming Lei
2021-01-09 16:02 ` [PATCH v3 2/7] bvec/iter: disallow zero-length segment bvecs Pavel Begunkov
2021-01-11  2:49   ` Ming Lei
2021-01-09 16:02 ` [PATCH v3 3/7] block/psi: remove PSI annotations from direct IO Pavel Begunkov
2021-01-11  2:52   ` Ming Lei
2021-01-09 16:03 ` [PATCH v3 4/7] target/file: allocate the bvec array as part of struct target_core_file_cmd Pavel Begunkov
2021-01-11  2:53   ` Ming Lei
2021-01-09 16:03 ` [PATCH v3 5/7] iov_iter: optimise bvec iov_iter_advance() Pavel Begunkov
2021-01-11  2:55   ` Ming Lei
2021-01-09 16:03 ` [PATCH v3 6/7] bio: add a helper calculating nr segments to alloc Pavel Begunkov
2021-01-11  2:57   ` Ming Lei
2021-01-09 16:03 ` Pavel Begunkov [this message]
2021-01-11  3:00   ` [PATCH v3 7/7] bio: don't copy bvec for direct IO Ming Lei
2021-01-25 15:58 ` [PATCH v3 0/7] no-copy bvec Jens Axboe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=69fef253b37fc44dd28c43398715e27cee5e0fe0.1610170479.git.asml.silence@gmail.com \
    --to=asml.silence@gmail.com \
    --cc=axboe@kernel.dk \
    --cc=corbet@lwn.net \
    --cc=darrick.wong@oracle.com \
    --cc=hannes@cmpxchg.org \
    --cc=hch@infradead.org \
    --cc=hch@lst.de \
    --cc=io-uring@vger.kernel.org \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-scsi@vger.kernel.org \
    --cc=linux-xfs@vger.kernel.org \
    --cc=martin.petersen@oracle.com \
    --cc=ming.lei@redhat.com \
    --cc=target-devel@vger.kernel.org \
    --cc=viro@zeniv.linux.org.uk \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.