All of lore.kernel.org
 help / color / mirror / Atom feed
From: Pavel Begunkov <asml.silence@gmail.com>
To: linux-block@vger.kernel.org
Cc: Jens Axboe <axboe@kernel.dk>,
	Christoph Hellwig <hch@infradead.org>,
	Matthew Wilcox <willy@infradead.org>,
	Ming Lei <ming.lei@redhat.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	"Darrick J . Wong" <darrick.wong@oracle.com>,
	"Martin K . Petersen" <martin.petersen@oracle.com>,
	Jonathan Corbet <corbet@lwn.net>,
	linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	io-uring@vger.kernel.org, linux-kernel@vger.kernel.org,
	target-devel@vger.kernel.org, linux-scsi@vger.kernel.org,
	linux-doc@vger.kernel.org
Subject: [PATCH v1 6/6] block/iomap: don't copy bvec for direct IO
Date: Tue, 15 Dec 2020 00:20:25 +0000	[thread overview]
Message-ID: <498b34d746627e874740d8315b2924880c46dbc3.1607976425.git.asml.silence@gmail.com> (raw)
In-Reply-To: <cover.1607976425.git.asml.silence@gmail.com>

The block layer spends quite a while in blkdev_direct_IO() to copy and
initialise bio's bvec. However, if we've already got a bvec in the input
iterator it might be reused in some cases, i.e. when new
ITER_BVEC_FLAG_FIXED flag is set. Simple tests show considerable
performance boost, and it also reduces memory footprint.

Suggested-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 Documentation/filesystems/porting.rst |  9 ++++
 block/bio.c                           | 64 +++++++++++----------------
 include/linux/bio.h                   |  3 ++
 3 files changed, 38 insertions(+), 38 deletions(-)

diff --git a/Documentation/filesystems/porting.rst b/Documentation/filesystems/porting.rst
index 867036aa90b8..47a622879952 100644
--- a/Documentation/filesystems/porting.rst
+++ b/Documentation/filesystems/porting.rst
@@ -865,3 +865,12 @@ no matter what.  Everything is handled by the caller.
 
 clone_private_mount() returns a longterm mount now, so the proper destructor of
 its result is kern_unmount() or kern_unmount_array().
+
+---
+
+**mandatory**
+
+For bvec based itererators bio_iov_iter_get_pages() now doesn't copy bvecs but
+uses the one provided. Anyone issuing kiocb-I/O should ensure that the bvec and
+page references stay until I/O has completed, i.e. until ->ki_complete() has
+been called or returned with non -EIOCBQUEUED code.
diff --git a/block/bio.c b/block/bio.c
index 3192358c411f..f8229be24562 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -960,25 +960,16 @@ void bio_release_pages(struct bio *bio, bool mark_dirty)
 }
 EXPORT_SYMBOL_GPL(bio_release_pages);
 
-static int __bio_iov_bvec_add_pages(struct bio *bio, struct iov_iter *iter)
+static int bio_iov_bvec_set(struct bio *bio, struct iov_iter *iter)
 {
-	const struct bio_vec *bv = iter->bvec;
-	struct page *page = bv->bv_page;
-	bool same_page = false;
-	unsigned int off, len;
-
-	if (WARN_ON_ONCE(iter->iov_offset > bv->bv_len))
-		return -EINVAL;
-
-	len = min_t(size_t, bv->bv_len - iter->iov_offset, iter->count);
-	off = bv->bv_offset + iter->iov_offset;
-
-	if (!__bio_try_merge_page(bio, page, len, off, &same_page)) {
-		if (bio_full(bio, len))
-			return -EINVAL;
-		bio_add_page_noaccount(bio, page, len, off);
-	}
-	iov_iter_advance(iter, len);
+	WARN_ON_ONCE(BVEC_POOL_IDX(bio) != 0);
+	bio->bi_vcnt = iter->nr_segs;
+	bio->bi_max_vecs = iter->nr_segs;
+	bio->bi_io_vec = (struct bio_vec *)iter->bvec;
+	bio->bi_iter.bi_bvec_done = iter->iov_offset;
+	bio->bi_iter.bi_size = iter->count;
+
+	iov_iter_advance(iter, iter->count);
 	return 0;
 }
 
@@ -1092,12 +1083,13 @@ static int __bio_iov_append_get_pages(struct bio *bio, struct iov_iter *iter)
  * This takes either an iterator pointing to user memory, or one pointing to
  * kernel pages (BVEC iterator). If we're adding user pages, we pin them and
  * map them into the kernel. On IO completion, the caller should put those
- * pages. If we're adding kernel pages, and the caller told us it's safe to
- * do so, we just have to add the pages to the bio directly. We don't grab an
- * extra reference to those pages (the user should already have that), and we
- * don't put the page on IO completion. The caller needs to check if the bio is
- * flagged BIO_NO_PAGE_REF on IO completion. If it isn't, then pages should be
- * released.
+ * pages. If we're adding kernel pages, it doesn't take extra page references
+ * and reuses the provided bvec, so the caller must ensure that the bvec isn't
+ * freed and page references remain to be taken until I/O has completed. If
+ * the I/O is completed asynchronously, the bvec must not be freed before
+ * ->ki_complete() has been called. The caller needs to check if the bio is
+ * flagged BIO_NO_PAGE_REF on IO completion. If it isn't, then pages should
+ * be released.
  *
  * The function tries, but does not guarantee, to pin as many pages as
  * fit into the bio, or are requested in @iter, whatever is smaller. If
@@ -1109,27 +1101,23 @@ static int __bio_iov_append_get_pages(struct bio *bio, struct iov_iter *iter)
  */
 int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
 {
-	const bool is_bvec = iov_iter_is_bvec(iter);
 	int ret;
 
-	if (WARN_ON_ONCE(bio->bi_vcnt))
-		return -EINVAL;
+	if (iov_iter_is_bvec(iter)) {
+		if (WARN_ON_ONCE(bio_op(bio) == REQ_OP_ZONE_APPEND))
+			return -EINVAL;
+		bio_iov_bvec_set(bio, iter);
+		bio_set_flag(bio, BIO_NO_PAGE_REF);
+		return 0;
+	}
 
 	do {
-		if (bio_op(bio) == REQ_OP_ZONE_APPEND) {
-			if (WARN_ON_ONCE(is_bvec))
-				return -EINVAL;
+		if (bio_op(bio) == REQ_OP_ZONE_APPEND)
 			ret = __bio_iov_append_get_pages(bio, iter);
-		} else {
-			if (is_bvec)
-				ret = __bio_iov_bvec_add_pages(bio, iter);
-			else
-				ret = __bio_iov_iter_get_pages(bio, iter);
-		}
+		else
+			ret = __bio_iov_iter_get_pages(bio, iter);
 	} while (!ret && iov_iter_count(iter) && !bio_full(bio, 0));
 
-	if (is_bvec)
-		bio_set_flag(bio, BIO_NO_PAGE_REF);
 	return bio->bi_vcnt ? 0 : ret;
 }
 EXPORT_SYMBOL_GPL(bio_iov_iter_get_pages);
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 2a9f3f0bbe0a..337f4280b639 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -444,6 +444,9 @@ static inline void bio_wouldblock_error(struct bio *bio)
 
 static inline int bio_iov_vecs_to_alloc(struct iov_iter *iter, int max_segs)
 {
+	/* reuse iter->bvec */
+	if (iov_iter_is_bvec(iter))
+		return 0;
 	return iov_iter_npages(iter, max_segs);
 }
 
-- 
2.24.0


  parent reply	other threads:[~2020-12-15  0:27 UTC|newest]

Thread overview: 38+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-12-15  0:20 [PATCH v1 0/6] no-copy bvec Pavel Begunkov
2020-12-15  0:20 ` [PATCH v1 1/6] target/file: allocate the bvec array as part of struct target_core_file_cmd Pavel Begunkov
2020-12-15  0:20 ` [PATCH v1 2/6] iov_iter: optimise bvec iov_iter_advance() Pavel Begunkov
2020-12-15  9:37   ` David Laight
2020-12-15 11:23     ` Pavel Begunkov
2020-12-15 13:54       ` David Laight
2020-12-15 13:56         ` Pavel Begunkov
2020-12-22 14:03   ` Christoph Hellwig
2020-12-15  0:20 ` [PATCH v1 3/6] bio: deduplicate adding a page into bio Pavel Begunkov
2020-12-22 14:04   ` Christoph Hellwig
2020-12-15  0:20 ` [PATCH v1 4/6] block/psi: remove PSI annotations from direct IO Pavel Begunkov
2020-12-15  0:56   ` Dave Chinner
2020-12-15  1:03     ` Pavel Begunkov
2020-12-15  1:33       ` Dave Chinner
2020-12-15 11:41         ` Pavel Begunkov
2020-12-22 14:07   ` Christoph Hellwig
2020-12-15  0:20 ` [PATCH v1 5/6] bio: add a helper calculating nr segments to alloc Pavel Begunkov
2020-12-15  1:00   ` Dave Chinner
2020-12-15  1:07     ` Pavel Begunkov
2020-12-15  1:09     ` Dave Chinner
2020-12-22 14:07   ` Christoph Hellwig
2020-12-15  0:20 ` Pavel Begunkov [this message]
2020-12-15  1:09   ` [PATCH v1 6/6] block/iomap: don't copy bvec for direct IO Dave Chinner
2020-12-15  1:15     ` Pavel Begunkov
2020-12-22 14:15   ` Christoph Hellwig
2020-12-15  1:41 ` [PATCH v1 0/6] no-copy bvec Ming Lei
2020-12-15 11:14   ` Pavel Begunkov
2020-12-15 12:03     ` Ming Lei
2020-12-15 14:05       ` Pavel Begunkov
2020-12-22 14:11         ` Christoph Hellwig
2020-12-23 12:52           ` Pavel Begunkov
2020-12-23 15:51             ` Christoph Hellwig
2020-12-23 16:04               ` James Bottomley
2020-12-23 20:23                 ` Douglas Gilbert
2020-12-23 20:32                   ` Pavel Begunkov
2020-12-24  6:41                     ` Christoph Hellwig
2020-12-24 16:45                       ` Douglas Gilbert
2020-12-24 17:30                   ` James Bottomley

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=498b34d746627e874740d8315b2924880c46dbc3.1607976425.git.asml.silence@gmail.com \
    --to=asml.silence@gmail.com \
    --cc=axboe@kernel.dk \
    --cc=corbet@lwn.net \
    --cc=darrick.wong@oracle.com \
    --cc=hannes@cmpxchg.org \
    --cc=hch@infradead.org \
    --cc=io-uring@vger.kernel.org \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-scsi@vger.kernel.org \
    --cc=linux-xfs@vger.kernel.org \
    --cc=martin.petersen@oracle.com \
    --cc=ming.lei@redhat.com \
    --cc=target-devel@vger.kernel.org \
    --cc=viro@zeniv.linux.org.uk \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.