* [PATCH 1/3] iov_iter: add ITER_BVEC_FLAG_NO_REF flag
2019-03-15 14:59 [PATCHSET 0/3] io_uring improvements Jens Axboe
@ 2019-03-15 14:59 ` Jens Axboe
2019-03-15 14:59 ` [PATCH 2/3] block: add BIO_NO_PAGE_REF flag Jens Axboe
2019-03-15 14:59 ` [PATCH 3/3] io_uring: add io_uring_event cache hit information Jens Axboe
2 siblings, 0 replies; 7+ messages in thread
From: Jens Axboe @ 2019-03-15 14:59 UTC (permalink / raw)
To: linux-fsdevel, linux-block; +Cc: viro, Jens Axboe
For ITER_BVEC, if we're holding on to kernel pages, the caller
doesn't need to grab a reference to the bvec pages, and drop that
same reference on IO completion. This is essentially safe for any
ITER_BVEC, but some use cases end up reusing pages and uncondtionally
dropping a page reference on completion. And example of that is
sendfile(2), that ends up being a splice_in + splice_out on the
pipe pages.
Add a flag that tells us it's fine to not grab a page reference
to the bvec pages, since that caller knows not to drop a reference
when it's done with the pages.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
fs/io_uring.c | 3 +++
include/linux/uio.h | 24 +++++++++++++++++++-----
2 files changed, 22 insertions(+), 5 deletions(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 5be6e4f99a9e..f66a4a5daf35 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -862,6 +862,9 @@ static int io_import_fixed(struct io_ring_ctx *ctx, int rw,
iov_iter_bvec(iter, rw, imu->bvec, imu->nr_bvecs, offset + len);
if (offset)
iov_iter_advance(iter, offset);
+
+ /* don't drop a reference to these pages */
+ iter->type |= ITER_BVEC_FLAG_NO_REF;
return 0;
}
diff --git a/include/linux/uio.h b/include/linux/uio.h
index ecf584f6b82d..4e926641fa80 100644
--- a/include/linux/uio.h
+++ b/include/linux/uio.h
@@ -23,14 +23,23 @@ struct kvec {
};
enum iter_type {
- ITER_IOVEC = 0,
- ITER_KVEC = 2,
- ITER_BVEC = 4,
- ITER_PIPE = 8,
- ITER_DISCARD = 16,
+ /* set if ITER_BVEC doesn't hold a bv_page ref */
+ ITER_BVEC_FLAG_NO_REF = 2,
+
+ /* iter types */
+ ITER_IOVEC = 4,
+ ITER_KVEC = 8,
+ ITER_BVEC = 16,
+ ITER_PIPE = 32,
+ ITER_DISCARD = 64,
};
struct iov_iter {
+ /*
+ * Bit 0 is the read/write bit, set if we're writing.
+ * Bit 1 is the BVEC_FLAG_NO_REF bit, set if type is a bvec and
+ * the caller isn't expecting to drop a page reference when done.
+ */
unsigned int type;
size_t iov_offset;
size_t count;
@@ -84,6 +93,11 @@ static inline unsigned char iov_iter_rw(const struct iov_iter *i)
return i->type & (READ | WRITE);
}
+static inline bool iov_iter_bvec_no_ref(const struct iov_iter *i)
+{
+ return (i->type & ITER_BVEC_FLAG_NO_REF) != 0;
+}
+
/*
* Total number of bytes covered by an iovec.
*
--
2.17.1
^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PATCH 2/3] block: add BIO_NO_PAGE_REF flag
2019-03-15 14:59 [PATCHSET 0/3] io_uring improvements Jens Axboe
2019-03-15 14:59 ` [PATCH 1/3] iov_iter: add ITER_BVEC_FLAG_NO_REF flag Jens Axboe
@ 2019-03-15 14:59 ` Jens Axboe
2019-03-15 14:59 ` [PATCH 3/3] io_uring: add io_uring_event cache hit information Jens Axboe
2 siblings, 0 replies; 7+ messages in thread
From: Jens Axboe @ 2019-03-15 14:59 UTC (permalink / raw)
To: linux-fsdevel, linux-block; +Cc: viro, Jens Axboe
If bio_iov_iter_get_pages() is called on an iov_iter that is flagged
with NO_REF, then we don't need to add a page reference for the pages
that we add.
Add BIO_NO_PAGE_REF to track this in the bio, so IO completion knows
not to drop a reference to these pages.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
block/bio.c | 43 ++++++++++++++++++++++-----------------
fs/block_dev.c | 12 ++++++-----
fs/iomap.c | 12 ++++++-----
include/linux/blk_types.h | 1 +
4 files changed, 39 insertions(+), 29 deletions(-)
diff --git a/block/bio.c b/block/bio.c
index 71a78d9fb8b7..b64cedc7f87c 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -849,20 +849,14 @@ static int __bio_iov_bvec_add_pages(struct bio *bio, struct iov_iter *iter)
size = bio_add_page(bio, bv->bv_page, len,
bv->bv_offset + iter->iov_offset);
if (size == len) {
- struct page *page;
- int i;
+ if (!bio_flagged(bio, BIO_NO_PAGE_REF)) {
+ struct page *page;
+ int i;
+
+ mp_bvec_for_each_page(page, bv, i)
+ get_page(page);
+ }
- /*
- * For the normal O_DIRECT case, we could skip grabbing this
- * reference and then not have to put them again when IO
- * completes. But this breaks some in-kernel users, like
- * splicing to/from a loop device, where we release the pipe
- * pages unconditionally. If we can fix that case, we can
- * get rid of the get here and the need to call
- * bio_release_pages() at IO completion time.
- */
- mp_bvec_for_each_page(page, bv, i)
- get_page(page);
iov_iter_advance(iter, size);
return 0;
}
@@ -925,10 +919,12 @@ static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
* This takes either an iterator pointing to user memory, or one pointing to
* kernel pages (BVEC iterator). If we're adding user pages, we pin them and
* map them into the kernel. On IO completion, the caller should put those
- * pages. For now, when adding kernel pages, we still grab a reference to the
- * page. This isn't strictly needed for the common case, but some call paths
- * end up releasing pages from eg a pipe and we can't easily control these.
- * See comment in __bio_iov_bvec_add_pages().
+ * pages. If we're adding kernel pages, and the caller told us it's safe to
+ * do so, we just have to add the pages to the bio directly. We don't grab an
+ * extra reference to those pages (the user should already have that), and we
+ * don't put the page on IO completion. The caller needs to check if the bio is
+ * flagged BIO_NO_PAGE_REF on IO completion. If it isn't, then pages should be
+ * released.
*
* The function tries, but does not guarantee, to pin as many pages as
* fit into the bio, or are requested in *iter, whatever is smaller. If
@@ -940,6 +936,13 @@ int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
const bool is_bvec = iov_iter_is_bvec(iter);
unsigned short orig_vcnt = bio->bi_vcnt;
+ /*
+ * If this is a BVEC iter, then the pages are kernel pages. Don't
+ * release them on IO completion, if the caller asked us to.
+ */
+ if (is_bvec && iov_iter_bvec_no_ref(iter))
+ bio_set_flag(bio, BIO_NO_PAGE_REF);
+
do {
int ret;
@@ -1696,7 +1699,8 @@ static void bio_dirty_fn(struct work_struct *work)
next = bio->bi_private;
bio_set_pages_dirty(bio);
- bio_release_pages(bio);
+ if (!bio_flagged(bio, BIO_NO_PAGE_REF))
+ bio_release_pages(bio);
bio_put(bio);
}
}
@@ -1713,7 +1717,8 @@ void bio_check_pages_dirty(struct bio *bio)
goto defer;
}
- bio_release_pages(bio);
+ if (!bio_flagged(bio, BIO_NO_PAGE_REF))
+ bio_release_pages(bio);
bio_put(bio);
return;
defer:
diff --git a/fs/block_dev.c b/fs/block_dev.c
index e9faa52bb489..78d3257435c0 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -336,12 +336,14 @@ static void blkdev_bio_end_io(struct bio *bio)
if (should_dirty) {
bio_check_pages_dirty(bio);
} else {
- struct bio_vec *bvec;
- int i;
- struct bvec_iter_all iter_all;
+ if (!bio_flagged(bio, BIO_NO_PAGE_REF)) {
+ struct bvec_iter_all iter_all;
+ struct bio_vec *bvec;
+ int i;
- bio_for_each_segment_all(bvec, bio, i, iter_all)
- put_page(bvec->bv_page);
+ bio_for_each_segment_all(bvec, bio, i, iter_all)
+ put_page(bvec->bv_page);
+ }
bio_put(bio);
}
}
diff --git a/fs/iomap.c b/fs/iomap.c
index 97cb9d486a7d..abdd18e404f8 100644
--- a/fs/iomap.c
+++ b/fs/iomap.c
@@ -1589,12 +1589,14 @@ static void iomap_dio_bio_end_io(struct bio *bio)
if (should_dirty) {
bio_check_pages_dirty(bio);
} else {
- struct bio_vec *bvec;
- int i;
- struct bvec_iter_all iter_all;
+ if (!bio_flagged(bio, BIO_NO_PAGE_REF)) {
+ struct bvec_iter_all iter_all;
+ struct bio_vec *bvec;
+ int i;
- bio_for_each_segment_all(bvec, bio, i, iter_all)
- put_page(bvec->bv_page);
+ bio_for_each_segment_all(bvec, bio, i, iter_all)
+ put_page(bvec->bv_page);
+ }
bio_put(bio);
}
}
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index d66bf5f32610..791fee35df88 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -215,6 +215,7 @@ struct bio {
/*
* bio flags
*/
+#define BIO_NO_PAGE_REF 0 /* don't put release vec pages */
#define BIO_SEG_VALID 1 /* bi_phys_segments valid */
#define BIO_CLONED 2 /* doesn't own data */
#define BIO_BOUNCED 3 /* bio is a bounce bio */
--
2.17.1
^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PATCH 3/3] io_uring: add io_uring_event cache hit information
2019-03-15 14:59 [PATCHSET 0/3] io_uring improvements Jens Axboe
2019-03-15 14:59 ` [PATCH 1/3] iov_iter: add ITER_BVEC_FLAG_NO_REF flag Jens Axboe
2019-03-15 14:59 ` [PATCH 2/3] block: add BIO_NO_PAGE_REF flag Jens Axboe
@ 2019-03-15 14:59 ` Jens Axboe
2019-03-15 16:27 ` Jens Axboe
2 siblings, 1 reply; 7+ messages in thread
From: Jens Axboe @ 2019-03-15 14:59 UTC (permalink / raw)
To: linux-fsdevel, linux-block; +Cc: viro, Jens Axboe
Add hint on whether a read was served out of the page cache, or if it
hit media. This is useful for buffered async IO, O_DIRECT reads would
never have this set (for obvious reasons).
If the read hit page cache, cqe->flags will have IOCQE_FLAG_CACHEHIT
set.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
fs/io_uring.c | 6 +++++-
include/uapi/linux/io_uring.h | 5 +++++
2 files changed, 10 insertions(+), 1 deletion(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index f66a4a5daf35..513df722702e 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -635,10 +635,14 @@ static void kiocb_end_write(struct kiocb *kiocb)
static void io_complete_rw(struct kiocb *kiocb, long res, long res2)
{
struct io_kiocb *req = container_of(kiocb, struct io_kiocb, rw);
+ unsigned ev_flags = 0;
kiocb_end_write(kiocb);
- io_cqring_add_event(req->ctx, req->user_data, res, 0);
+ if (res > 0 && (req->flags & REQ_F_FORCE_NONBLOCK))
+ ev_flags = IOCQE_FLAG_CACHEHIT;
+
+ io_cqring_add_event(req->ctx, req->user_data, res, ev_flags);
io_put_req(req);
}
diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index e23408692118..24906e99fdc7 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -69,6 +69,11 @@ struct io_uring_cqe {
__u32 flags;
};
+/*
+ * io_uring_event->flags
+ */
+#define IOCQE_FLAG_CACHEHIT (1U << 0) /* IO did not hit media */
+
/*
* Magic offsets for the application to mmap the data it needs
*/
--
2.17.1
^ permalink raw reply related [flat|nested] 7+ messages in thread