* [PATCH RFC 00/17] playing around req alloc
@ 2021-02-10  0:03 Pavel Begunkov
  2021-02-10  0:03 ` [PATCH 01/17] io_uring: replace force_nonblock with flags Pavel Begunkov
                   ` (17 more replies)
  0 siblings, 18 replies; 27+ messages in thread
From: Pavel Begunkov @ 2021-02-10  0:03 UTC
  To: Jens Axboe, io-uring

Unfolding previous ideas on persistent req caches. Patches 4-7
inclusive slashed ~20% of overhead for the nops benchmark. I haven't
benchmarked the whole series myself yet, but according to perf it
should be ~30-40% in total. That's for IOPOLL + inline completion
cases, obviously w/o async/IRQ completions.

Jens,
1. 11/17 removes deallocations at the end of submit_sqes. Looks like
you forgot that or just didn't do it.

2. Lists are slow and not great cache-wise, that's why I want at least
the combined approach from 12/17.

3. Instead of lists in "use persistent request cache" I had in mind a
slightly different way: grow the req alloc cache to 32-128 entries (or
take a hint from userspace), batch-alloc by 8 as before, and recycle
_all_ reqs right back into it. If it overflows, do kfree().
It should give a probabilistically high hit rate, amortising out most
allocations. Pros: it doesn't grow ~infinitely as lists can. Cons:
there are always counterexamples. But as I don't have numbers to back
it up, I took your implementation. Maybe we'll reconsider later.
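
Roughly what I mean, as a standalone userspace sketch (made-up names
and sizes; malloc/free stand in for the kmem cache):

#include <stdlib.h>

#define CACHE_MAX	128	/* grown cache size, or a userspace hint */
#define ALLOC_BATCH	8	/* batch-alloc by 8 as before */

struct req { void *data; };	/* stand-in for struct io_kiocb */

struct req_cache {
	struct req	*reqs[CACHE_MAX];	/* array, not a list */
	unsigned int	nr;
};

static struct req *cache_get(struct req_cache *c)
{
	if (!c->nr) {
		while (c->nr < ALLOC_BATCH) {
			struct req *r = malloc(sizeof(*r));

			if (!r)
				break;
			c->reqs[c->nr++] = r;
		}
		if (!c->nr)
			return NULL;
	}
	return c->reqs[--c->nr];
}

static void cache_put(struct req_cache *c, struct req *r)
{
	if (c->nr < CACHE_MAX)		/* recycle _all_ reqs into it */
		c->reqs[c->nr++] = r;
	else				/* only the overflow is freed */
		free(r);
}

The bounded array also keeps hot reqs adjacent in memory, which is the
same cache-friendliness concern as in point 2 above.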

I'll revise tomorrow on a fresh head + do some performance testing,
and am leaving it RFC until then.

Jens Axboe (3):
  io_uring: use persistent request cache
  io_uring: provide FIFO ordering for task_work
  io_uring: enable req cache for task_work items

Pavel Begunkov (14):
  io_uring: replace force_nonblock with flags
  io_uring: make op handlers always take issue flags
  io_uring: don't propagate io_comp_state
  io_uring: don't keep submit_state on stack
  io_uring: remove ctx from comp_state
  io_uring: don't reinit submit state every time
  io_uring: replace list with array for compl batch
  io_uring: submit-completion free batching
  io_uring: remove fallback_req
  io_uring: count ctx refs separately from reqs
  io_uring: persistent req cache
  io_uring: feed reqs back into alloc cache
  io_uring: take comp_state from ctx
  io_uring: defer flushing cached reqs

 fs/io-wq.h               |   9 -
 fs/io_uring.c            | 716 ++++++++++++++++++++++-----------------
 include/linux/io_uring.h |  14 +
 3 files changed, 425 insertions(+), 314 deletions(-)

-- 
2.24.0



* [PATCH 01/17] io_uring: replace force_nonblock with flags
  2021-02-10  0:03 [PATCH RFC 00/17] playing around req alloc Pavel Begunkov
@ 2021-02-10  0:03 ` Pavel Begunkov
  2021-02-10  0:03 ` [PATCH 02/17] io_uring: make op handlers always take issue flags Pavel Begunkov
                   ` (16 subsequent siblings)
  17 siblings, 0 replies; 27+ messages in thread
From: Pavel Begunkov @ 2021-02-10  0:03 UTC
  To: Jens Axboe, io-uring

Replace the bool force_nonblock argument with a flags bitmask. The
long-standing goal is to differentiate the context from which we
execute. Currently we have some places where invariants, like holding
uring_lock, are only subtly inferred.
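
The shape of the change, reduced to a standalone sketch (illustrative
names, not the kernel interfaces):

#include <errno.h>

struct req { int opcode; };	/* stand-in for struct io_kiocb */

/*
 * A bool can only ever answer "nonblock or not"; a bitmask can carry
 * further execution-context invariants (e.g. "uring_lock is held")
 * without churning every handler signature again.
 */
enum cmd_flags {
	F_NONBLOCK	= 1,	/* mirrors IO_URING_F_NONBLOCK */
};

static int op_handler(struct req *req, unsigned int issue_flags)
{
	if (issue_flags & F_NONBLOCK)
		return -EAGAIN;	/* can't block here, punt to async */
	/* ... blocking work ... */
	return 0;
}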

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 fs/io_uring.c | 178 +++++++++++++++++++++++++++-----------------------
 1 file changed, 96 insertions(+), 82 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 9c77fbc0c395..862121c48cee 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -187,6 +187,10 @@ struct io_rings {
 	struct io_uring_cqe	cqes[] ____cacheline_aligned_in_smp;
 };
 
+enum io_uring_cmd_flags {
+	IO_URING_F_NONBLOCK		= 1,
+};
+
 struct io_mapped_ubuf {
 	u64		ubuf;
 	size_t		len;
@@ -3477,7 +3481,7 @@ static int io_iter_do_read(struct io_kiocb *req, struct iov_iter *iter)
 		return -EINVAL;
 }
 
-static int io_read(struct io_kiocb *req, bool force_nonblock,
+static int io_read(struct io_kiocb *req, unsigned int issue_flags,
 		   struct io_comp_state *cs)
 {
 	struct iovec inline_vecs[UIO_FASTIOV], *iovec = inline_vecs;
@@ -3485,6 +3489,7 @@ static int io_read(struct io_kiocb *req, bool force_nonblock,
 	struct iov_iter __iter, *iter = &__iter;
 	struct io_async_rw *rw = req->async_data;
 	ssize_t io_size, ret, ret2;
+	bool force_nonblock = issue_flags & IO_URING_F_NONBLOCK;
 
 	if (rw) {
 		iter = &rw->iter;
@@ -3588,7 +3593,7 @@ static int io_write_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 	return io_rw_prep_async(req, WRITE);
 }
 
-static int io_write(struct io_kiocb *req, bool force_nonblock,
+static int io_write(struct io_kiocb *req, unsigned int issue_flags,
 		    struct io_comp_state *cs)
 {
 	struct iovec inline_vecs[UIO_FASTIOV], *iovec = inline_vecs;
@@ -3596,6 +3601,7 @@ static int io_write(struct io_kiocb *req, bool force_nonblock,
 	struct iov_iter __iter, *iter = &__iter;
 	struct io_async_rw *rw = req->async_data;
 	ssize_t ret, ret2, io_size;
+	bool force_nonblock = issue_flags & IO_URING_F_NONBLOCK;
 
 	if (rw) {
 		iter = &rw->iter;
@@ -3706,12 +3712,12 @@ static int io_renameat_prep(struct io_kiocb *req,
 	return 0;
 }
 
-static int io_renameat(struct io_kiocb *req, bool force_nonblock)
+static int io_renameat(struct io_kiocb *req, unsigned int issue_flags)
 {
 	struct io_rename *ren = &req->rename;
 	int ret;
 
-	if (force_nonblock)
+	if (issue_flags & IO_URING_F_NONBLOCK)
 		return -EAGAIN;
 
 	ret = do_renameat2(ren->old_dfd, ren->oldpath, ren->new_dfd,
@@ -3748,12 +3754,12 @@ static int io_unlinkat_prep(struct io_kiocb *req,
 	return 0;
 }
 
-static int io_unlinkat(struct io_kiocb *req, bool force_nonblock)
+static int io_unlinkat(struct io_kiocb *req, unsigned int issue_flags)
 {
 	struct io_unlink *un = &req->unlink;
 	int ret;
 
-	if (force_nonblock)
+	if (issue_flags & IO_URING_F_NONBLOCK)
 		return -EAGAIN;
 
 	if (un->flags & AT_REMOVEDIR)
@@ -3785,13 +3791,13 @@ static int io_shutdown_prep(struct io_kiocb *req,
 #endif
 }
 
-static int io_shutdown(struct io_kiocb *req, bool force_nonblock)
+static int io_shutdown(struct io_kiocb *req, unsigned int issue_flags)
 {
 #if defined(CONFIG_NET)
 	struct socket *sock;
 	int ret;
 
-	if (force_nonblock)
+	if (issue_flags & IO_URING_F_NONBLOCK)
 		return -EAGAIN;
 
 	sock = sock_from_file(req->file);
@@ -3850,7 +3856,7 @@ static int io_tee_prep(struct io_kiocb *req,
 	return __io_splice_prep(req, sqe);
 }
 
-static int io_tee(struct io_kiocb *req, bool force_nonblock)
+static int io_tee(struct io_kiocb *req, unsigned int issue_flags)
 {
 	struct io_splice *sp = &req->splice;
 	struct file *in = sp->file_in;
@@ -3858,7 +3864,7 @@ static int io_tee(struct io_kiocb *req, bool force_nonblock)
 	unsigned int flags = sp->flags & ~SPLICE_F_FD_IN_FIXED;
 	long ret = 0;
 
-	if (force_nonblock)
+	if (issue_flags & IO_URING_F_NONBLOCK)
 		return -EAGAIN;
 	if (sp->len)
 		ret = do_tee(in, out, sp->len, flags);
@@ -3881,7 +3887,7 @@ static int io_splice_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 	return __io_splice_prep(req, sqe);
 }
 
-static int io_splice(struct io_kiocb *req, bool force_nonblock)
+static int io_splice(struct io_kiocb *req, unsigned int issue_flags)
 {
 	struct io_splice *sp = &req->splice;
 	struct file *in = sp->file_in;
@@ -3890,7 +3896,7 @@ static int io_splice(struct io_kiocb *req, bool force_nonblock)
 	loff_t *poff_in, *poff_out;
 	long ret = 0;
 
-	if (force_nonblock)
+	if (issue_flags & IO_URING_F_NONBLOCK)
 		return -EAGAIN;
 
 	poff_in = (sp->off_in == -1) ? NULL : &sp->off_in;
@@ -3943,13 +3949,13 @@ static int io_prep_fsync(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 	return 0;
 }
 
-static int io_fsync(struct io_kiocb *req, bool force_nonblock)
+static int io_fsync(struct io_kiocb *req, unsigned int issue_flags)
 {
 	loff_t end = req->sync.off + req->sync.len;
 	int ret;
 
 	/* fsync always requires a blocking context */
-	if (force_nonblock)
+	if (issue_flags & IO_URING_F_NONBLOCK)
 		return -EAGAIN;
 
 	ret = vfs_fsync_range(req->file, req->sync.off,
@@ -3975,12 +3981,12 @@ static int io_fallocate_prep(struct io_kiocb *req,
 	return 0;
 }
 
-static int io_fallocate(struct io_kiocb *req, bool force_nonblock)
+static int io_fallocate(struct io_kiocb *req, unsigned int issue_flags)
 {
 	int ret;
 
 	/* fallocate always requiring blocking context */
-	if (force_nonblock)
+	if (issue_flags & IO_URING_F_NONBLOCK)
 		return -EAGAIN;
 	ret = vfs_fallocate(req->file, req->sync.mode, req->sync.off,
 				req->sync.len);
@@ -4050,7 +4056,7 @@ static int io_openat2_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 	return __io_openat_prep(req, sqe);
 }
 
-static int io_openat2(struct io_kiocb *req, bool force_nonblock)
+static int io_openat2(struct io_kiocb *req, unsigned int issue_flags)
 {
 	struct open_flags op;
 	struct file *file;
@@ -4063,7 +4069,7 @@ static int io_openat2(struct io_kiocb *req, bool force_nonblock)
 		goto err;
 	nonblock_set = op.open_flag & O_NONBLOCK;
 	resolve_nonblock = req->open.how.resolve & RESOLVE_CACHED;
-	if (force_nonblock) {
+	if (issue_flags & IO_URING_F_NONBLOCK) {
 		/*
 		 * Don't bother trying for O_TRUNC, O_CREAT, or O_TMPFILE open,
 		 * it'll always -EAGAIN
@@ -4080,7 +4086,8 @@ static int io_openat2(struct io_kiocb *req, bool force_nonblock)
 
 	file = do_filp_open(req->open.dfd, req->open.filename, &op);
 	/* only retry if RESOLVE_CACHED wasn't already set by application */
-	if ((!resolve_nonblock && force_nonblock) && file == ERR_PTR(-EAGAIN)) {
+	if ((!resolve_nonblock && (issue_flags & IO_URING_F_NONBLOCK)) &&
+	    file == ERR_PTR(-EAGAIN)) {
 		/*
 		 * We could hang on to this 'fd', but seems like marginal
 		 * gain for something that is now known to be a slower path.
@@ -4094,7 +4101,7 @@ static int io_openat2(struct io_kiocb *req, bool force_nonblock)
 		put_unused_fd(ret);
 		ret = PTR_ERR(file);
 	} else {
-		if (force_nonblock && !nonblock_set)
+		if ((issue_flags & IO_URING_F_NONBLOCK) && !nonblock_set)
 			file->f_flags &= ~O_NONBLOCK;
 		fsnotify_open(file);
 		fd_install(ret, file);
@@ -4108,9 +4115,9 @@ static int io_openat2(struct io_kiocb *req, bool force_nonblock)
 	return 0;
 }
 
-static int io_openat(struct io_kiocb *req, bool force_nonblock)
+static int io_openat(struct io_kiocb *req, unsigned int issue_flags)
 {
-	return io_openat2(req, force_nonblock);
+	return io_openat2(req, issue_flags & IO_URING_F_NONBLOCK);
 }
 
 static int io_remove_buffers_prep(struct io_kiocb *req,
@@ -4158,13 +4165,14 @@ static int __io_remove_buffers(struct io_ring_ctx *ctx, struct io_buffer *buf,
 	return i;
 }
 
-static int io_remove_buffers(struct io_kiocb *req, bool force_nonblock,
+static int io_remove_buffers(struct io_kiocb *req, unsigned int issue_flags,
 			     struct io_comp_state *cs)
 {
 	struct io_provide_buf *p = &req->pbuf;
 	struct io_ring_ctx *ctx = req->ctx;
 	struct io_buffer *head;
 	int ret = 0;
+	bool force_nonblock = issue_flags & IO_URING_F_NONBLOCK;
 
 	io_ring_submit_lock(ctx, !force_nonblock);
 
@@ -4242,13 +4250,14 @@ static int io_add_buffers(struct io_provide_buf *pbuf, struct io_buffer **head)
 	return i ? i : -ENOMEM;
 }
 
-static int io_provide_buffers(struct io_kiocb *req, bool force_nonblock,
+static int io_provide_buffers(struct io_kiocb *req, unsigned int issue_flags,
 			      struct io_comp_state *cs)
 {
 	struct io_provide_buf *p = &req->pbuf;
 	struct io_ring_ctx *ctx = req->ctx;
 	struct io_buffer *head, *list;
 	int ret = 0;
+	bool force_nonblock = issue_flags & IO_URING_F_NONBLOCK;
 
 	io_ring_submit_lock(ctx, !force_nonblock);
 
@@ -4310,12 +4319,13 @@ static int io_epoll_ctl_prep(struct io_kiocb *req,
 #endif
 }
 
-static int io_epoll_ctl(struct io_kiocb *req, bool force_nonblock,
+static int io_epoll_ctl(struct io_kiocb *req, unsigned int issue_flags,
 			struct io_comp_state *cs)
 {
 #if defined(CONFIG_EPOLL)
 	struct io_epoll *ie = &req->epoll;
 	int ret;
+	bool force_nonblock = issue_flags & IO_URING_F_NONBLOCK;
 
 	ret = do_epoll_ctl(ie->epfd, ie->op, ie->fd, &ie->event, force_nonblock);
 	if (force_nonblock && ret == -EAGAIN)
@@ -4347,13 +4357,13 @@ static int io_madvise_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 #endif
 }
 
-static int io_madvise(struct io_kiocb *req, bool force_nonblock)
+static int io_madvise(struct io_kiocb *req, unsigned int issue_flags)
 {
 #if defined(CONFIG_ADVISE_SYSCALLS) && defined(CONFIG_MMU)
 	struct io_madvise *ma = &req->madvise;
 	int ret;
 
-	if (force_nonblock)
+	if (issue_flags & IO_URING_F_NONBLOCK)
 		return -EAGAIN;
 
 	ret = do_madvise(current->mm, ma->addr, ma->len, ma->advice);
@@ -4379,12 +4389,12 @@ static int io_fadvise_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 	return 0;
 }
 
-static int io_fadvise(struct io_kiocb *req, bool force_nonblock)
+static int io_fadvise(struct io_kiocb *req, unsigned int issue_flags)
 {
 	struct io_fadvise *fa = &req->fadvise;
 	int ret;
 
-	if (force_nonblock) {
+	if (issue_flags & IO_URING_F_NONBLOCK) {
 		switch (fa->advice) {
 		case POSIX_FADV_NORMAL:
 		case POSIX_FADV_RANDOM:
@@ -4420,12 +4430,12 @@ static int io_statx_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 	return 0;
 }
 
-static int io_statx(struct io_kiocb *req, bool force_nonblock)
+static int io_statx(struct io_kiocb *req, unsigned int issue_flags)
 {
 	struct io_statx *ctx = &req->statx;
 	int ret;
 
-	if (force_nonblock) {
+	if (issue_flags & IO_URING_F_NONBLOCK) {
 		/* only need file table for an actual valid fd */
 		if (ctx->dfd == -1 || ctx->dfd == AT_FDCWD)
 			req->flags |= REQ_F_NO_FILE_TABLE;
@@ -4455,7 +4465,7 @@ static int io_close_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 	return 0;
 }
 
-static int io_close(struct io_kiocb *req, bool force_nonblock,
+static int io_close(struct io_kiocb *req, unsigned int issue_flags,
 		    struct io_comp_state *cs)
 {
 	struct files_struct *files = current->files;
@@ -4485,7 +4495,7 @@ static int io_close(struct io_kiocb *req, bool force_nonblock,
 	}
 
 	/* if the file has a flush method, be safe and punt to async */
-	if (file->f_op->flush && force_nonblock) {
+	if (file->f_op->flush && (issue_flags & IO_URING_F_NONBLOCK)) {
 		spin_unlock(&files->file_lock);
 		return -EAGAIN;
 	}
@@ -4527,12 +4537,12 @@ static int io_prep_sfr(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 	return 0;
 }
 
-static int io_sync_file_range(struct io_kiocb *req, bool force_nonblock)
+static int io_sync_file_range(struct io_kiocb *req, unsigned int issue_flags)
 {
 	int ret;
 
 	/* sync_file_range always requires a blocking context */
-	if (force_nonblock)
+	if (issue_flags & IO_URING_F_NONBLOCK)
 		return -EAGAIN;
 
 	ret = sync_file_range(req->file, req->sync.off, req->sync.len,
@@ -4601,7 +4611,7 @@ static int io_sendmsg_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 	return ret;
 }
 
-static int io_sendmsg(struct io_kiocb *req, bool force_nonblock,
+static int io_sendmsg(struct io_kiocb *req, unsigned int issue_flags,
 		      struct io_comp_state *cs)
 {
 	struct io_async_msghdr iomsg, *kmsg;
@@ -4624,11 +4634,11 @@ static int io_sendmsg(struct io_kiocb *req, bool force_nonblock,
 	flags = req->sr_msg.msg_flags;
 	if (flags & MSG_DONTWAIT)
 		req->flags |= REQ_F_NOWAIT;
-	else if (force_nonblock)
+	else if (issue_flags & IO_URING_F_NONBLOCK)
 		flags |= MSG_DONTWAIT;
 
 	ret = __sys_sendmsg_sock(sock, &kmsg->msg, flags);
-	if (force_nonblock && ret == -EAGAIN)
+	if ((issue_flags & IO_URING_F_NONBLOCK) && ret == -EAGAIN)
 		return io_setup_async_msg(req, kmsg);
 	if (ret == -ERESTARTSYS)
 		ret = -EINTR;
@@ -4643,7 +4653,7 @@ static int io_sendmsg(struct io_kiocb *req, bool force_nonblock,
 	return 0;
 }
 
-static int io_send(struct io_kiocb *req, bool force_nonblock,
+static int io_send(struct io_kiocb *req, unsigned int issue_flags,
 		   struct io_comp_state *cs)
 {
 	struct io_sr_msg *sr = &req->sr_msg;
@@ -4669,12 +4679,12 @@ static int io_send(struct io_kiocb *req, bool force_nonblock,
 	flags = req->sr_msg.msg_flags;
 	if (flags & MSG_DONTWAIT)
 		req->flags |= REQ_F_NOWAIT;
-	else if (force_nonblock)
+	else if (issue_flags & IO_URING_F_NONBLOCK)
 		flags |= MSG_DONTWAIT;
 
 	msg.msg_flags = flags;
 	ret = sock_sendmsg(sock, &msg);
-	if (force_nonblock && ret == -EAGAIN)
+	if ((issue_flags & IO_URING_F_NONBLOCK) && ret == -EAGAIN)
 		return -EAGAIN;
 	if (ret == -ERESTARTSYS)
 		ret = -EINTR;
@@ -4822,7 +4832,7 @@ static int io_recvmsg_prep(struct io_kiocb *req,
 	return ret;
 }
 
-static int io_recvmsg(struct io_kiocb *req, bool force_nonblock,
+static int io_recvmsg(struct io_kiocb *req, unsigned int issue_flags,
 		      struct io_comp_state *cs)
 {
 	struct io_async_msghdr iomsg, *kmsg;
@@ -4830,6 +4840,7 @@ static int io_recvmsg(struct io_kiocb *req, bool force_nonblock,
 	struct io_buffer *kbuf;
 	unsigned flags;
 	int ret, cflags = 0;
+	bool force_nonblock = issue_flags & IO_URING_F_NONBLOCK;
 
 	sock = sock_from_file(req->file);
 	if (unlikely(!sock))
@@ -4878,7 +4889,7 @@ static int io_recvmsg(struct io_kiocb *req, bool force_nonblock,
 	return 0;
 }
 
-static int io_recv(struct io_kiocb *req, bool force_nonblock,
+static int io_recv(struct io_kiocb *req, unsigned int issue_flags,
 		   struct io_comp_state *cs)
 {
 	struct io_buffer *kbuf;
@@ -4889,6 +4900,7 @@ static int io_recv(struct io_kiocb *req, bool force_nonblock,
 	struct iovec iov;
 	unsigned flags;
 	int ret, cflags = 0;
+	bool force_nonblock = issue_flags & IO_URING_F_NONBLOCK;
 
 	sock = sock_from_file(req->file);
 	if (unlikely(!sock))
@@ -4948,10 +4960,11 @@ static int io_accept_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 	return 0;
 }
 
-static int io_accept(struct io_kiocb *req, bool force_nonblock,
+static int io_accept(struct io_kiocb *req, unsigned int issue_flags,
 		     struct io_comp_state *cs)
 {
 	struct io_accept *accept = &req->accept;
+	bool force_nonblock = issue_flags & IO_URING_F_NONBLOCK;
 	unsigned int file_flags = force_nonblock ? O_NONBLOCK : 0;
 	int ret;
 
@@ -4992,12 +5005,13 @@ static int io_connect_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 					&io->address);
 }
 
-static int io_connect(struct io_kiocb *req, bool force_nonblock,
+static int io_connect(struct io_kiocb *req, unsigned int issue_flags,
 		      struct io_comp_state *cs)
 {
 	struct io_async_connect __io, *io;
 	unsigned file_flags;
 	int ret;
+	bool force_nonblock = issue_flags & IO_URING_F_NONBLOCK;
 
 	if (req->async_data) {
 		io = req->async_data;
@@ -5039,13 +5053,13 @@ static int io_sendmsg_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 	return -EOPNOTSUPP;
 }
 
-static int io_sendmsg(struct io_kiocb *req, bool force_nonblock,
+static int io_sendmsg(struct io_kiocb *req, unsigned int issue_flags,
 		      struct io_comp_state *cs)
 {
 	return -EOPNOTSUPP;
 }
 
-static int io_send(struct io_kiocb *req, bool force_nonblock,
+static int io_send(struct io_kiocb *req, unsigned int issue_flags,
 		   struct io_comp_state *cs)
 {
 	return -EOPNOTSUPP;
@@ -5057,13 +5071,13 @@ static int io_recvmsg_prep(struct io_kiocb *req,
 	return -EOPNOTSUPP;
 }
 
-static int io_recvmsg(struct io_kiocb *req, bool force_nonblock,
+static int io_recvmsg(struct io_kiocb *req, unsigned int issue_flags,
 		      struct io_comp_state *cs)
 {
 	return -EOPNOTSUPP;
 }
 
-static int io_recv(struct io_kiocb *req, bool force_nonblock,
+static int io_recv(struct io_kiocb *req, unsigned int issue_flags,
 		   struct io_comp_state *cs)
 {
 	return -EOPNOTSUPP;
@@ -5074,7 +5088,7 @@ static int io_accept_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 	return -EOPNOTSUPP;
 }
 
-static int io_accept(struct io_kiocb *req, bool force_nonblock,
+static int io_accept(struct io_kiocb *req, unsigned int issue_flags,
 		     struct io_comp_state *cs)
 {
 	return -EOPNOTSUPP;
@@ -5085,7 +5099,7 @@ static int io_connect_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 	return -EOPNOTSUPP;
 }
 
-static int io_connect(struct io_kiocb *req, bool force_nonblock,
+static int io_connect(struct io_kiocb *req, unsigned int issue_flags,
 		      struct io_comp_state *cs)
 {
 	return -EOPNOTSUPP;
@@ -5963,14 +5977,14 @@ static int io_rsrc_update_prep(struct io_kiocb *req,
 	return 0;
 }
 
-static int io_files_update(struct io_kiocb *req, bool force_nonblock,
+static int io_files_update(struct io_kiocb *req, unsigned int issue_flags,
 			   struct io_comp_state *cs)
 {
 	struct io_ring_ctx *ctx = req->ctx;
 	struct io_uring_rsrc_update up;
 	int ret;
 
-	if (force_nonblock)
+	if (issue_flags & IO_URING_F_NONBLOCK)
 		return -EAGAIN;
 
 	up.offset = req->rsrc_update.offset;
@@ -6189,7 +6203,7 @@ static void __io_clean_op(struct io_kiocb *req)
 	}
 }
 
-static int io_issue_sqe(struct io_kiocb *req, bool force_nonblock,
+static int io_issue_sqe(struct io_kiocb *req, unsigned int issue_flags,
 			struct io_comp_state *cs)
 {
 	struct io_ring_ctx *ctx = req->ctx;
@@ -6202,15 +6216,15 @@ static int io_issue_sqe(struct io_kiocb *req, bool force_nonblock,
 	case IORING_OP_READV:
 	case IORING_OP_READ_FIXED:
 	case IORING_OP_READ:
-		ret = io_read(req, force_nonblock, cs);
+		ret = io_read(req, issue_flags, cs);
 		break;
 	case IORING_OP_WRITEV:
 	case IORING_OP_WRITE_FIXED:
 	case IORING_OP_WRITE:
-		ret = io_write(req, force_nonblock, cs);
+		ret = io_write(req, issue_flags, cs);
 		break;
 	case IORING_OP_FSYNC:
-		ret = io_fsync(req, force_nonblock);
+		ret = io_fsync(req, issue_flags);
 		break;
 	case IORING_OP_POLL_ADD:
 		ret = io_poll_add(req);
@@ -6219,19 +6233,19 @@ static int io_issue_sqe(struct io_kiocb *req, bool force_nonblock,
 		ret = io_poll_remove(req);
 		break;
 	case IORING_OP_SYNC_FILE_RANGE:
-		ret = io_sync_file_range(req, force_nonblock);
+		ret = io_sync_file_range(req, issue_flags);
 		break;
 	case IORING_OP_SENDMSG:
-		ret = io_sendmsg(req, force_nonblock, cs);
+		ret = io_sendmsg(req, issue_flags, cs);
 		break;
 	case IORING_OP_SEND:
-		ret = io_send(req, force_nonblock, cs);
+		ret = io_send(req, issue_flags, cs);
 		break;
 	case IORING_OP_RECVMSG:
-		ret = io_recvmsg(req, force_nonblock, cs);
+		ret = io_recvmsg(req, issue_flags, cs);
 		break;
 	case IORING_OP_RECV:
-		ret = io_recv(req, force_nonblock, cs);
+		ret = io_recv(req, issue_flags, cs);
 		break;
 	case IORING_OP_TIMEOUT:
 		ret = io_timeout(req);
@@ -6240,61 +6254,61 @@ static int io_issue_sqe(struct io_kiocb *req, bool force_nonblock,
 		ret = io_timeout_remove(req);
 		break;
 	case IORING_OP_ACCEPT:
-		ret = io_accept(req, force_nonblock, cs);
+		ret = io_accept(req, issue_flags, cs);
 		break;
 	case IORING_OP_CONNECT:
-		ret = io_connect(req, force_nonblock, cs);
+		ret = io_connect(req, issue_flags, cs);
 		break;
 	case IORING_OP_ASYNC_CANCEL:
 		ret = io_async_cancel(req);
 		break;
 	case IORING_OP_FALLOCATE:
-		ret = io_fallocate(req, force_nonblock);
+		ret = io_fallocate(req, issue_flags);
 		break;
 	case IORING_OP_OPENAT:
-		ret = io_openat(req, force_nonblock);
+		ret = io_openat(req, issue_flags);
 		break;
 	case IORING_OP_CLOSE:
-		ret = io_close(req, force_nonblock, cs);
+		ret = io_close(req, issue_flags, cs);
 		break;
 	case IORING_OP_FILES_UPDATE:
-		ret = io_files_update(req, force_nonblock, cs);
+		ret = io_files_update(req, issue_flags, cs);
 		break;
 	case IORING_OP_STATX:
-		ret = io_statx(req, force_nonblock);
+		ret = io_statx(req, issue_flags);
 		break;
 	case IORING_OP_FADVISE:
-		ret = io_fadvise(req, force_nonblock);
+		ret = io_fadvise(req, issue_flags);
 		break;
 	case IORING_OP_MADVISE:
-		ret = io_madvise(req, force_nonblock);
+		ret = io_madvise(req, issue_flags);
 		break;
 	case IORING_OP_OPENAT2:
-		ret = io_openat2(req, force_nonblock);
+		ret = io_openat2(req, issue_flags);
 		break;
 	case IORING_OP_EPOLL_CTL:
-		ret = io_epoll_ctl(req, force_nonblock, cs);
+		ret = io_epoll_ctl(req, issue_flags, cs);
 		break;
 	case IORING_OP_SPLICE:
-		ret = io_splice(req, force_nonblock);
+		ret = io_splice(req, issue_flags);
 		break;
 	case IORING_OP_PROVIDE_BUFFERS:
-		ret = io_provide_buffers(req, force_nonblock, cs);
+		ret = io_provide_buffers(req, issue_flags, cs);
 		break;
 	case IORING_OP_REMOVE_BUFFERS:
-		ret = io_remove_buffers(req, force_nonblock, cs);
+		ret = io_remove_buffers(req, issue_flags, cs);
 		break;
 	case IORING_OP_TEE:
-		ret = io_tee(req, force_nonblock);
+		ret = io_tee(req, issue_flags);
 		break;
 	case IORING_OP_SHUTDOWN:
-		ret = io_shutdown(req, force_nonblock);
+		ret = io_shutdown(req, issue_flags);
 		break;
 	case IORING_OP_RENAMEAT:
-		ret = io_renameat(req, force_nonblock);
+		ret = io_renameat(req, issue_flags);
 		break;
 	case IORING_OP_UNLINKAT:
-		ret = io_unlinkat(req, force_nonblock);
+		ret = io_unlinkat(req, issue_flags);
 		break;
 	default:
 		ret = -EINVAL;
@@ -6336,7 +6350,7 @@ static void io_wq_submit_work(struct io_wq_work *work)
 
 	if (!ret) {
 		do {
-			ret = io_issue_sqe(req, false, NULL);
+			ret = io_issue_sqe(req, 0, NULL);
 			/*
 			 * We can get EAGAIN for polled IO even though we're
 			 * forcing a sync submission from here, since we can't
@@ -6499,7 +6513,7 @@ static void __io_queue_sqe(struct io_kiocb *req, struct io_comp_state *cs)
 			old_creds = override_creds(req->work.identity->creds);
 	}
 
-	ret = io_issue_sqe(req, true, cs);
+	ret = io_issue_sqe(req, IO_URING_F_NONBLOCK, cs);
 
 	/*
 	 * We async punt it if the file wasn't marked NOWAIT, or if the file
-- 
2.24.0



* [PATCH 02/17] io_uring: make op handlers always take issue flags
  2021-02-10  0:03 [PATCH RFC 00/17] playing around req alloc Pavel Begunkov
  2021-02-10  0:03 ` [PATCH 01/17] io_uring: replace force_nonblock with flags Pavel Begunkov
@ 2021-02-10  0:03 ` Pavel Begunkov
  2021-02-10  0:03 ` [PATCH 03/17] io_uring: don't propagate io_comp_state Pavel Begunkov
                   ` (15 subsequent siblings)
  17 siblings, 0 replies; 27+ messages in thread
From: Pavel Begunkov @ 2021-02-10  0:03 UTC
  To: Jens Axboe, io-uring

Make the opcode handler interfaces a bit more consistent by always
passing in issue flags. A bulky but pretty easy and mechanical change.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 fs/io_uring.c | 25 +++++++++++++------------
 1 file changed, 13 insertions(+), 12 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 862121c48cee..ac233d04ee71 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -3917,7 +3917,8 @@ static int io_splice(struct io_kiocb *req, unsigned int issue_flags)
 /*
  * IORING_OP_NOP just posts a completion event, nothing else.
  */
-static int io_nop(struct io_kiocb *req, struct io_comp_state *cs)
+static int io_nop(struct io_kiocb *req, unsigned int issue_flags,
+		  struct io_comp_state *cs)
 {
 	struct io_ring_ctx *ctx = req->ctx;
 
@@ -5581,7 +5582,7 @@ static int io_poll_remove_prep(struct io_kiocb *req,
  * Find a running poll command that matches one specified in sqe->addr,
  * and remove it if found.
  */
-static int io_poll_remove(struct io_kiocb *req)
+static int io_poll_remove(struct io_kiocb *req, unsigned int issue_flags)
 {
 	struct io_ring_ctx *ctx = req->ctx;
 	int ret;
@@ -5632,7 +5633,7 @@ static int io_poll_add_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe
 	return 0;
 }
 
-static int io_poll_add(struct io_kiocb *req)
+static int io_poll_add(struct io_kiocb *req, unsigned int issue_flags)
 {
 	struct io_poll_iocb *poll = &req->poll;
 	struct io_ring_ctx *ctx = req->ctx;
@@ -5772,7 +5773,7 @@ static inline enum hrtimer_mode io_translate_timeout_mode(unsigned int flags)
 /*
  * Remove or update an existing timeout command
  */
-static int io_timeout_remove(struct io_kiocb *req)
+static int io_timeout_remove(struct io_kiocb *req, unsigned int issue_flags)
 {
 	struct io_timeout_rem *tr = &req->timeout_rem;
 	struct io_ring_ctx *ctx = req->ctx;
@@ -5828,7 +5829,7 @@ static int io_timeout_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe,
 	return 0;
 }
 
-static int io_timeout(struct io_kiocb *req)
+static int io_timeout(struct io_kiocb *req, unsigned int issue_flags)
 {
 	struct io_ring_ctx *ctx = req->ctx;
 	struct io_timeout_data *data = req->async_data;
@@ -5951,7 +5952,7 @@ static int io_async_cancel_prep(struct io_kiocb *req,
 	return 0;
 }
 
-static int io_async_cancel(struct io_kiocb *req)
+static int io_async_cancel(struct io_kiocb *req, unsigned int issue_flags)
 {
 	struct io_ring_ctx *ctx = req->ctx;
 
@@ -6211,7 +6212,7 @@ static int io_issue_sqe(struct io_kiocb *req, unsigned int issue_flags,
 
 	switch (req->opcode) {
 	case IORING_OP_NOP:
-		ret = io_nop(req, cs);
+		ret = io_nop(req, issue_flags, cs);
 		break;
 	case IORING_OP_READV:
 	case IORING_OP_READ_FIXED:
@@ -6227,10 +6228,10 @@ static int io_issue_sqe(struct io_kiocb *req, unsigned int issue_flags,
 		ret = io_fsync(req, issue_flags);
 		break;
 	case IORING_OP_POLL_ADD:
-		ret = io_poll_add(req);
+		ret = io_poll_add(req, issue_flags);
 		break;
 	case IORING_OP_POLL_REMOVE:
-		ret = io_poll_remove(req);
+		ret = io_poll_remove(req, issue_flags);
 		break;
 	case IORING_OP_SYNC_FILE_RANGE:
 		ret = io_sync_file_range(req, issue_flags);
@@ -6248,10 +6249,10 @@ static int io_issue_sqe(struct io_kiocb *req, unsigned int issue_flags,
 		ret = io_recv(req, issue_flags, cs);
 		break;
 	case IORING_OP_TIMEOUT:
-		ret = io_timeout(req);
+		ret = io_timeout(req, issue_flags);
 		break;
 	case IORING_OP_TIMEOUT_REMOVE:
-		ret = io_timeout_remove(req);
+		ret = io_timeout_remove(req, issue_flags);
 		break;
 	case IORING_OP_ACCEPT:
 		ret = io_accept(req, issue_flags, cs);
@@ -6260,7 +6261,7 @@ static int io_issue_sqe(struct io_kiocb *req, unsigned int issue_flags,
 		ret = io_connect(req, issue_flags, cs);
 		break;
 	case IORING_OP_ASYNC_CANCEL:
-		ret = io_async_cancel(req);
+		ret = io_async_cancel(req, issue_flags);
 		break;
 	case IORING_OP_FALLOCATE:
 		ret = io_fallocate(req, issue_flags);
-- 
2.24.0



* [PATCH 03/17] io_uring: don't propagate io_comp_state
  2021-02-10  0:03 [PATCH RFC 00/17] playing around req alloc Pavel Begunkov
  2021-02-10  0:03 ` [PATCH 01/17] io_uring: replace force_nonblock with flags Pavel Begunkov
  2021-02-10  0:03 ` [PATCH 02/17] io_uring: make op handlers always take issue flags Pavel Begunkov
@ 2021-02-10  0:03 ` Pavel Begunkov
  2021-02-10 14:00   ` Pavel Begunkov
  2021-02-10  0:03 ` [PATCH 04/17] io_uring: don't keep submit_state on stack Pavel Begunkov
                   ` (14 subsequent siblings)
  17 siblings, 1 reply; 27+ messages in thread
From: Pavel Begunkov @ 2021-02-10  0:03 UTC
  To: Jens Axboe, io-uring

There is no reason to drag io_comp_state into the opcode handlers; we
just need a flag, and the actual work will be done in __io_queue_sqe().
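
Compressed into a standalone sketch (simplified, illustrative names):

enum cmd_flags {
	F_NONBLOCK	 = 1,
	F_COMPLETE_DEFER = 2,	/* mirrors IO_URING_F_COMPLETE_DEFER */
};

struct req { long res; int completed_inline; };

/* handlers see a flag bit instead of a threaded-through comp_state */
static void req_complete(struct req *req, unsigned int issue_flags, long res)
{
	req->res = res;
	if (issue_flags & F_COMPLETE_DEFER)
		req->completed_inline = 1;	/* caller batches the CQE */
	else
		req->completed_inline = 0;	/* post the CQE right away */
}

/* only the top-level submission path knows if batching is available */
static void queue_sqe(struct req *req, int have_comp_state)
{
	unsigned int issue_flags = F_NONBLOCK;

	if (have_comp_state)
		issue_flags |= F_COMPLETE_DEFER;
	req_complete(req, issue_flags, 0);
	if (req->completed_inline) {
		/* flush the completion batch, as __io_queue_sqe() does */
	}
}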

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 fs/io_uring.c | 146 +++++++++++++++++++++++---------------------------
 1 file changed, 67 insertions(+), 79 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index ac233d04ee71..5b95a3f2b978 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -189,6 +189,7 @@ struct io_rings {
 
 enum io_uring_cmd_flags {
 	IO_URING_F_NONBLOCK		= 1,
+	IO_URING_F_COMPLETE_DEFER	= 2,
 };
 
 struct io_mapped_ubuf {
@@ -1018,7 +1019,7 @@ static void init_fixed_file_ref_node(struct io_ring_ctx *ctx,
 				     struct fixed_rsrc_ref_node *ref_node);
 
 static void __io_complete_rw(struct io_kiocb *req, long res, long res2,
-			     struct io_comp_state *cs);
+			     unsigned int issue_flags);
 static void io_cqring_fill_event(struct io_kiocb *req, long res);
 static void io_put_req(struct io_kiocb *req);
 static void io_put_req_deferred(struct io_kiocb *req, int nr);
@@ -1957,7 +1958,7 @@ static void io_submit_flush_completions(struct io_comp_state *cs)
 }
 
 static void io_req_complete_state(struct io_kiocb *req, long res,
-				  unsigned int cflags, struct io_comp_state *cs)
+				  unsigned int cflags)
 {
 	io_clean_op(req);
 	req->result = res;
@@ -1965,18 +1966,18 @@ static void io_req_complete_state(struct io_kiocb *req, long res,
 	req->flags |= REQ_F_COMPLETE_INLINE;
 }
 
-static inline void __io_req_complete(struct io_kiocb *req, long res,
-				     unsigned cflags, struct io_comp_state *cs)
+static inline void __io_req_complete(struct io_kiocb *req, unsigned issue_flags,
+				     long res, unsigned cflags)
 {
-	if (!cs)
-		io_req_complete_nostate(req, res, cflags);
+	if (issue_flags & IO_URING_F_COMPLETE_DEFER)
+		io_req_complete_state(req, res, cflags);
 	else
-		io_req_complete_state(req, res, cflags, cs);
+		io_req_complete_nostate(req, res, cflags);
 }
 
 static inline void io_req_complete(struct io_kiocb *req, long res)
 {
-	__io_req_complete(req, res, 0, NULL);
+	__io_req_complete(req, 0, res, 0);
 }
 
 static inline bool io_is_fallback_req(struct io_kiocb *req)
@@ -2449,7 +2450,7 @@ static void io_iopoll_queue(struct list_head *again)
 	do {
 		req = list_first_entry(again, struct io_kiocb, inflight_entry);
 		list_del(&req->inflight_entry);
-		__io_complete_rw(req, -EAGAIN, 0, NULL);
+		__io_complete_rw(req, -EAGAIN, 0, 0);
 	} while (!list_empty(again));
 }
 
@@ -2662,7 +2663,7 @@ static void kiocb_end_write(struct io_kiocb *req)
 }
 
 static void io_complete_rw_common(struct kiocb *kiocb, long res,
-				  struct io_comp_state *cs)
+				  unsigned int issue_flags)
 {
 	struct io_kiocb *req = container_of(kiocb, struct io_kiocb, rw.kiocb);
 	int cflags = 0;
@@ -2674,7 +2675,7 @@ static void io_complete_rw_common(struct kiocb *kiocb, long res,
 		req_set_fail_links(req);
 	if (req->flags & REQ_F_BUFFER_SELECTED)
 		cflags = io_put_rw_kbuf(req);
-	__io_req_complete(req, res, cflags, cs);
+	__io_req_complete(req, issue_flags, res, cflags);
 }
 
 #ifdef CONFIG_BLOCK
@@ -2741,17 +2742,17 @@ static bool io_rw_reissue(struct io_kiocb *req, long res)
 }
 
 static void __io_complete_rw(struct io_kiocb *req, long res, long res2,
-			     struct io_comp_state *cs)
+			     unsigned int issue_flags)
 {
 	if (!io_rw_reissue(req, res))
-		io_complete_rw_common(&req->rw.kiocb, res, cs);
+		io_complete_rw_common(&req->rw.kiocb, res, issue_flags);
 }
 
 static void io_complete_rw(struct kiocb *kiocb, long res, long res2)
 {
 	struct io_kiocb *req = container_of(kiocb, struct io_kiocb, rw.kiocb);
 
-	__io_complete_rw(req, res, res2, NULL);
+	__io_complete_rw(req, res, res2, 0);
 }
 
 static void io_complete_rw_iopoll(struct kiocb *kiocb, long res, long res2)
@@ -2970,7 +2971,7 @@ static inline void io_rw_done(struct kiocb *kiocb, ssize_t ret)
 }
 
 static void kiocb_done(struct kiocb *kiocb, ssize_t ret,
-		       struct io_comp_state *cs)
+		       unsigned int issue_flags)
 {
 	struct io_kiocb *req = container_of(kiocb, struct io_kiocb, rw.kiocb);
 	struct io_async_rw *io = req->async_data;
@@ -2986,7 +2987,7 @@ static void kiocb_done(struct kiocb *kiocb, ssize_t ret,
 	if (req->flags & REQ_F_CUR_POS)
 		req->file->f_pos = kiocb->ki_pos;
 	if (ret >= 0 && kiocb->ki_complete == io_complete_rw)
-		__io_complete_rw(req, ret, 0, cs);
+		__io_complete_rw(req, ret, 0, issue_flags);
 	else
 		io_rw_done(kiocb, ret);
 }
@@ -3481,8 +3482,7 @@ static int io_iter_do_read(struct io_kiocb *req, struct iov_iter *iter)
 		return -EINVAL;
 }
 
-static int io_read(struct io_kiocb *req, unsigned int issue_flags,
-		   struct io_comp_state *cs)
+static int io_read(struct io_kiocb *req, unsigned int issue_flags)
 {
 	struct iovec inline_vecs[UIO_FASTIOV], *iovec = inline_vecs;
 	struct kiocb *kiocb = &req->rw.kiocb;
@@ -3572,7 +3572,7 @@ static int io_read(struct io_kiocb *req, unsigned int issue_flags,
 		/* we got some bytes, but not all. retry. */
 	} while (ret > 0 && ret < io_size);
 done:
-	kiocb_done(kiocb, ret, cs);
+	kiocb_done(kiocb, ret, issue_flags);
 	return 0;
 }
 
@@ -3593,8 +3593,7 @@ static int io_write_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 	return io_rw_prep_async(req, WRITE);
 }
 
-static int io_write(struct io_kiocb *req, unsigned int issue_flags,
-		    struct io_comp_state *cs)
+static int io_write(struct io_kiocb *req, unsigned int issue_flags)
 {
 	struct iovec inline_vecs[UIO_FASTIOV], *iovec = inline_vecs;
 	struct kiocb *kiocb = &req->rw.kiocb;
@@ -3668,7 +3667,7 @@ static int io_write(struct io_kiocb *req, unsigned int issue_flags,
 		if ((req->ctx->flags & IORING_SETUP_IOPOLL) && ret2 == -EAGAIN)
 			goto copy_iov;
 done:
-		kiocb_done(kiocb, ret2, cs);
+		kiocb_done(kiocb, ret2, issue_flags);
 	} else {
 copy_iov:
 		/* some cases will consume bytes even on error returns */
@@ -3917,15 +3916,14 @@ static int io_splice(struct io_kiocb *req, unsigned int issue_flags)
 /*
  * IORING_OP_NOP just posts a completion event, nothing else.
  */
-static int io_nop(struct io_kiocb *req, unsigned int issue_flags,
-		  struct io_comp_state *cs)
+static int io_nop(struct io_kiocb *req, unsigned int issue_flags)
 {
 	struct io_ring_ctx *ctx = req->ctx;
 
 	if (unlikely(ctx->flags & IORING_SETUP_IOPOLL))
 		return -EINVAL;
 
-	__io_req_complete(req, 0, 0, cs);
+	__io_req_complete(req, issue_flags, 0, 0);
 	return 0;
 }
 
@@ -4166,8 +4164,7 @@ static int __io_remove_buffers(struct io_ring_ctx *ctx, struct io_buffer *buf,
 	return i;
 }
 
-static int io_remove_buffers(struct io_kiocb *req, unsigned int issue_flags,
-			     struct io_comp_state *cs)
+static int io_remove_buffers(struct io_kiocb *req, unsigned int issue_flags)
 {
 	struct io_provide_buf *p = &req->pbuf;
 	struct io_ring_ctx *ctx = req->ctx;
@@ -4188,11 +4185,11 @@ static int io_remove_buffers(struct io_kiocb *req, unsigned int issue_flags,
 
 	/* need to hold the lock to complete IOPOLL requests */
 	if (ctx->flags & IORING_SETUP_IOPOLL) {
-		__io_req_complete(req, ret, 0, cs);
+		__io_req_complete(req, issue_flags, ret, 0);
 		io_ring_submit_unlock(ctx, !force_nonblock);
 	} else {
 		io_ring_submit_unlock(ctx, !force_nonblock);
-		__io_req_complete(req, ret, 0, cs);
+		__io_req_complete(req, issue_flags, ret, 0);
 	}
 	return 0;
 }
@@ -4251,8 +4248,7 @@ static int io_add_buffers(struct io_provide_buf *pbuf, struct io_buffer **head)
 	return i ? i : -ENOMEM;
 }
 
-static int io_provide_buffers(struct io_kiocb *req, unsigned int issue_flags,
-			      struct io_comp_state *cs)
+static int io_provide_buffers(struct io_kiocb *req, unsigned int issue_flags)
 {
 	struct io_provide_buf *p = &req->pbuf;
 	struct io_ring_ctx *ctx = req->ctx;
@@ -4284,11 +4280,11 @@ static int io_provide_buffers(struct io_kiocb *req, unsigned int issue_flags,
 
 	/* need to hold the lock to complete IOPOLL requests */
 	if (ctx->flags & IORING_SETUP_IOPOLL) {
-		__io_req_complete(req, ret, 0, cs);
+		__io_req_complete(req, issue_flags, ret, 0);
 		io_ring_submit_unlock(ctx, !force_nonblock);
 	} else {
 		io_ring_submit_unlock(ctx, !force_nonblock);
-		__io_req_complete(req, ret, 0, cs);
+		__io_req_complete(req, issue_flags, ret, 0);
 	}
 	return 0;
 }
@@ -4320,8 +4316,7 @@ static int io_epoll_ctl_prep(struct io_kiocb *req,
 #endif
 }
 
-static int io_epoll_ctl(struct io_kiocb *req, unsigned int issue_flags,
-			struct io_comp_state *cs)
+static int io_epoll_ctl(struct io_kiocb *req, unsigned int issue_flags)
 {
 #if defined(CONFIG_EPOLL)
 	struct io_epoll *ie = &req->epoll;
@@ -4334,7 +4329,7 @@ static int io_epoll_ctl(struct io_kiocb *req, unsigned int issue_flags,
 
 	if (ret < 0)
 		req_set_fail_links(req);
-	__io_req_complete(req, ret, 0, cs);
+	__io_req_complete(req, issue_flags, ret, 0);
 	return 0;
 #else
 	return -EOPNOTSUPP;
@@ -4466,8 +4461,7 @@ static int io_close_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 	return 0;
 }
 
-static int io_close(struct io_kiocb *req, unsigned int issue_flags,
-		    struct io_comp_state *cs)
+static int io_close(struct io_kiocb *req, unsigned int issue_flags)
 {
 	struct files_struct *files = current->files;
 	struct io_close *close = &req->close;
@@ -4516,7 +4510,7 @@ static int io_close(struct io_kiocb *req, unsigned int issue_flags,
 		req_set_fail_links(req);
 	if (file)
 		fput(file);
-	__io_req_complete(req, ret, 0, cs);
+	__io_req_complete(req, issue_flags, ret, 0);
 	return 0;
 }
 
@@ -4612,8 +4606,7 @@ static int io_sendmsg_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 	return ret;
 }
 
-static int io_sendmsg(struct io_kiocb *req, unsigned int issue_flags,
-		      struct io_comp_state *cs)
+static int io_sendmsg(struct io_kiocb *req, unsigned int issue_flags)
 {
 	struct io_async_msghdr iomsg, *kmsg;
 	struct socket *sock;
@@ -4650,12 +4643,11 @@ static int io_sendmsg(struct io_kiocb *req, unsigned int issue_flags,
 	req->flags &= ~REQ_F_NEED_CLEANUP;
 	if (ret < 0)
 		req_set_fail_links(req);
-	__io_req_complete(req, ret, 0, cs);
+	__io_req_complete(req, issue_flags, ret, 0);
 	return 0;
 }
 
-static int io_send(struct io_kiocb *req, unsigned int issue_flags,
-		   struct io_comp_state *cs)
+static int io_send(struct io_kiocb *req, unsigned int issue_flags)
 {
 	struct io_sr_msg *sr = &req->sr_msg;
 	struct msghdr msg;
@@ -4692,7 +4684,7 @@ static int io_send(struct io_kiocb *req, unsigned int issue_flags,
 
 	if (ret < 0)
 		req_set_fail_links(req);
-	__io_req_complete(req, ret, 0, cs);
+	__io_req_complete(req, issue_flags, ret, 0);
 	return 0;
 }
 
@@ -4833,8 +4825,7 @@ static int io_recvmsg_prep(struct io_kiocb *req,
 	return ret;
 }
 
-static int io_recvmsg(struct io_kiocb *req, unsigned int issue_flags,
-		      struct io_comp_state *cs)
+static int io_recvmsg(struct io_kiocb *req, unsigned int issue_flags)
 {
 	struct io_async_msghdr iomsg, *kmsg;
 	struct socket *sock;
@@ -4886,12 +4877,11 @@ static int io_recvmsg(struct io_kiocb *req, unsigned int issue_flags,
 	req->flags &= ~REQ_F_NEED_CLEANUP;
 	if (ret < 0)
 		req_set_fail_links(req);
-	__io_req_complete(req, ret, cflags, cs);
+	__io_req_complete(req, issue_flags, ret, cflags);
 	return 0;
 }
 
-static int io_recv(struct io_kiocb *req, unsigned int issue_flags,
-		   struct io_comp_state *cs)
+static int io_recv(struct io_kiocb *req, unsigned int issue_flags)
 {
 	struct io_buffer *kbuf;
 	struct io_sr_msg *sr = &req->sr_msg;
@@ -4941,7 +4931,7 @@ static int io_recv(struct io_kiocb *req, unsigned int issue_flags,
 		cflags = io_put_recv_kbuf(req);
 	if (ret < 0)
 		req_set_fail_links(req);
-	__io_req_complete(req, ret, cflags, cs);
+	__io_req_complete(req, issue_flags, ret, cflags);
 	return 0;
 }
 
@@ -4961,8 +4951,7 @@ static int io_accept_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 	return 0;
 }
 
-static int io_accept(struct io_kiocb *req, unsigned int issue_flags,
-		     struct io_comp_state *cs)
+static int io_accept(struct io_kiocb *req, unsigned int issue_flags)
 {
 	struct io_accept *accept = &req->accept;
 	bool force_nonblock = issue_flags & IO_URING_F_NONBLOCK;
@@ -4982,7 +4971,7 @@ static int io_accept(struct io_kiocb *req, unsigned int issue_flags,
 			ret = -EINTR;
 		req_set_fail_links(req);
 	}
-	__io_req_complete(req, ret, 0, cs);
+	__io_req_complete(req, issue_flags, ret, 0);
 	return 0;
 }
 
@@ -5006,8 +4995,7 @@ static int io_connect_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 					&io->address);
 }
 
-static int io_connect(struct io_kiocb *req, unsigned int issue_flags,
-		      struct io_comp_state *cs)
+static int io_connect(struct io_kiocb *req, unsigned int issue_flags)
 {
 	struct io_async_connect __io, *io;
 	unsigned file_flags;
@@ -5045,7 +5033,7 @@ static int io_connect(struct io_kiocb *req, unsigned int issue_flags,
 out:
 	if (ret < 0)
 		req_set_fail_links(req);
-	__io_req_complete(req, ret, 0, cs);
+	__io_req_complete(req, issue_flags, ret, 0);
 	return 0;
 }
 #else /* !CONFIG_NET */
@@ -5978,8 +5966,7 @@ static int io_rsrc_update_prep(struct io_kiocb *req,
 	return 0;
 }
 
-static int io_files_update(struct io_kiocb *req, unsigned int issue_flags,
-			   struct io_comp_state *cs)
+static int io_files_update(struct io_kiocb *req, unsigned int issue_flags)
 {
 	struct io_ring_ctx *ctx = req->ctx;
 	struct io_uring_rsrc_update up;
@@ -5997,7 +5984,7 @@ static int io_files_update(struct io_kiocb *req, unsigned int issue_flags,
 
 	if (ret < 0)
 		req_set_fail_links(req);
-	__io_req_complete(req, ret, 0, cs);
+	__io_req_complete(req, issue_flags, ret, 0);
 	return 0;
 }
 
@@ -6204,25 +6191,24 @@ static void __io_clean_op(struct io_kiocb *req)
 	}
 }
 
-static int io_issue_sqe(struct io_kiocb *req, unsigned int issue_flags,
-			struct io_comp_state *cs)
+static int io_issue_sqe(struct io_kiocb *req, unsigned int issue_flags)
 {
 	struct io_ring_ctx *ctx = req->ctx;
 	int ret;
 
 	switch (req->opcode) {
 	case IORING_OP_NOP:
-		ret = io_nop(req, issue_flags, cs);
+		ret = io_nop(req, issue_flags);
 		break;
 	case IORING_OP_READV:
 	case IORING_OP_READ_FIXED:
 	case IORING_OP_READ:
-		ret = io_read(req, issue_flags, cs);
+		ret = io_read(req, issue_flags);
 		break;
 	case IORING_OP_WRITEV:
 	case IORING_OP_WRITE_FIXED:
 	case IORING_OP_WRITE:
-		ret = io_write(req, issue_flags, cs);
+		ret = io_write(req, issue_flags);
 		break;
 	case IORING_OP_FSYNC:
 		ret = io_fsync(req, issue_flags);
@@ -6237,16 +6223,16 @@ static int io_issue_sqe(struct io_kiocb *req, unsigned int issue_flags,
 		ret = io_sync_file_range(req, issue_flags);
 		break;
 	case IORING_OP_SENDMSG:
-		ret = io_sendmsg(req, issue_flags, cs);
+		ret = io_sendmsg(req, issue_flags);
 		break;
 	case IORING_OP_SEND:
-		ret = io_send(req, issue_flags, cs);
+		ret = io_send(req, issue_flags);
 		break;
 	case IORING_OP_RECVMSG:
-		ret = io_recvmsg(req, issue_flags, cs);
+		ret = io_recvmsg(req, issue_flags);
 		break;
 	case IORING_OP_RECV:
-		ret = io_recv(req, issue_flags, cs);
+		ret = io_recv(req, issue_flags);
 		break;
 	case IORING_OP_TIMEOUT:
 		ret = io_timeout(req, issue_flags);
@@ -6255,10 +6241,10 @@ static int io_issue_sqe(struct io_kiocb *req, unsigned int issue_flags,
 		ret = io_timeout_remove(req, issue_flags);
 		break;
 	case IORING_OP_ACCEPT:
-		ret = io_accept(req, issue_flags, cs);
+		ret = io_accept(req, issue_flags);
 		break;
 	case IORING_OP_CONNECT:
-		ret = io_connect(req, issue_flags, cs);
+		ret = io_connect(req, issue_flags);
 		break;
 	case IORING_OP_ASYNC_CANCEL:
 		ret = io_async_cancel(req, issue_flags);
@@ -6270,10 +6256,10 @@ static int io_issue_sqe(struct io_kiocb *req, unsigned int issue_flags,
 		ret = io_openat(req, issue_flags);
 		break;
 	case IORING_OP_CLOSE:
-		ret = io_close(req, issue_flags, cs);
+		ret = io_close(req, issue_flags);
 		break;
 	case IORING_OP_FILES_UPDATE:
-		ret = io_files_update(req, issue_flags, cs);
+		ret = io_files_update(req, issue_flags);
 		break;
 	case IORING_OP_STATX:
 		ret = io_statx(req, issue_flags);
@@ -6288,16 +6274,16 @@ static int io_issue_sqe(struct io_kiocb *req, unsigned int issue_flags,
 		ret = io_openat2(req, issue_flags);
 		break;
 	case IORING_OP_EPOLL_CTL:
-		ret = io_epoll_ctl(req, issue_flags, cs);
+		ret = io_epoll_ctl(req, issue_flags);
 		break;
 	case IORING_OP_SPLICE:
 		ret = io_splice(req, issue_flags);
 		break;
 	case IORING_OP_PROVIDE_BUFFERS:
-		ret = io_provide_buffers(req, issue_flags, cs);
+		ret = io_provide_buffers(req, issue_flags);
 		break;
 	case IORING_OP_REMOVE_BUFFERS:
-		ret = io_remove_buffers(req, issue_flags, cs);
+		ret = io_remove_buffers(req, issue_flags);
 		break;
 	case IORING_OP_TEE:
 		ret = io_tee(req, issue_flags);
@@ -6351,7 +6337,7 @@ static void io_wq_submit_work(struct io_wq_work *work)
 
 	if (!ret) {
 		do {
-			ret = io_issue_sqe(req, 0, NULL);
+			ret = io_issue_sqe(req, 0);
 			/*
 			 * We can get EAGAIN for polled IO even though we're
 			 * forcing a sync submission from here, since we can't
@@ -6498,8 +6484,10 @@ static void __io_queue_sqe(struct io_kiocb *req, struct io_comp_state *cs)
 {
 	struct io_kiocb *linked_timeout;
 	const struct cred *old_creds = NULL;
-	int ret;
+	int ret, issue_flags = IO_URING_F_NONBLOCK;
 
+	if (cs)
+		issue_flags |= IO_URING_F_COMPLETE_DEFER;
 again:
 	linked_timeout = io_prep_linked_timeout(req);
 
@@ -6514,7 +6502,7 @@ static void __io_queue_sqe(struct io_kiocb *req, struct io_comp_state *cs)
 			old_creds = override_creds(req->work.identity->creds);
 	}
 
-	ret = io_issue_sqe(req, IO_URING_F_NONBLOCK, cs);
+	ret = io_issue_sqe(req, issue_flags);
 
 	/*
 	 * We async punt it if the file wasn't marked NOWAIT, or if the file
-- 
2.24.0



* [PATCH 04/17] io_uring: don't keep submit_state on stack
  2021-02-10  0:03 [PATCH RFC 00/17] playing around req alloc Pavel Begunkov
                   ` (2 preceding siblings ...)
  2021-02-10  0:03 ` [PATCH 03/17] io_uring: don't propagate io_comp_state Pavel Begunkov
@ 2021-02-10  0:03 ` Pavel Begunkov
  2021-02-10  0:03 ` [PATCH 05/17] io_uring: remove ctx from comp_state Pavel Begunkov
                   ` (13 subsequent siblings)
  17 siblings, 0 replies; 27+ messages in thread
From: Pavel Begunkov @ 2021-02-10  0:03 UTC
  To: Jens Axboe, io-uring

struct io_submit_state is quite big (168 bytes) and going to grow.
It's better not to keep it on the stack as is done now. Move it into
the context; it's always protected by uring_lock, so it's fine to have
only one instance of it.
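
The layout change in sketch form (a pthread mutex stands in for
uring_lock; names are illustrative):

#include <pthread.h>

struct submit_state {	/* ~168 bytes and growing in the real struct */
	void		*reqs[8];
	unsigned int	free_reqs;
};

struct ring_ctx {
	pthread_mutex_t		uring_lock;
	struct submit_state	submit_state;	/* embedded, off the stack */
};

static void submit_sqes(struct ring_ctx *ctx)
{
	struct submit_state *state = &ctx->submit_state;

	/* uring_lock serialises submitters, so one instance is enough */
	pthread_mutex_lock(&ctx->uring_lock);
	/* ... the submission loop works on *state ... */
	(void)state;
	pthread_mutex_unlock(&ctx->uring_lock);
}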

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 fs/io_uring.c | 90 ++++++++++++++++++++++++++-------------------------
 1 file changed, 46 insertions(+), 44 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 5b95a3f2b978..0606fa5f9eb0 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -264,6 +264,39 @@ struct io_sq_data {
 	unsigned		sq_thread_idle;
 };
 
+#define IO_IOPOLL_BATCH			8
+
+struct io_comp_state {
+	unsigned int		nr;
+	struct list_head	list;
+	struct io_ring_ctx	*ctx;
+};
+
+struct io_submit_state {
+	struct blk_plug		plug;
+
+	/*
+	 * io_kiocb alloc cache
+	 */
+	void			*reqs[IO_IOPOLL_BATCH];
+	unsigned int		free_reqs;
+
+	bool			plug_started;
+
+	/*
+	 * Batch completion logic
+	 */
+	struct io_comp_state	comp;
+
+	/*
+	 * File reference cache
+	 */
+	struct file		*file;
+	unsigned int		fd;
+	unsigned int		file_refs;
+	unsigned int		ios_left;
+};
+
 struct io_ring_ctx {
 	struct {
 		struct percpu_ref	refs;
@@ -406,6 +439,7 @@ struct io_ring_ctx {
 
 	struct work_struct		exit_work;
 	struct io_restriction		restrictions;
+	struct io_submit_state		submit_state;
 };
 
 /*
@@ -758,39 +792,6 @@ struct io_defer_entry {
 	u32			seq;
 };
 
-#define IO_IOPOLL_BATCH			8
-
-struct io_comp_state {
-	unsigned int		nr;
-	struct list_head	list;
-	struct io_ring_ctx	*ctx;
-};
-
-struct io_submit_state {
-	struct blk_plug		plug;
-
-	/*
-	 * io_kiocb alloc cache
-	 */
-	void			*reqs[IO_IOPOLL_BATCH];
-	unsigned int		free_reqs;
-
-	bool			plug_started;
-
-	/*
-	 * Batch completion logic
-	 */
-	struct io_comp_state	comp;
-
-	/*
-	 * File reference cache
-	 */
-	struct file		*file;
-	unsigned int		fd;
-	unsigned int		file_refs;
-	unsigned int		ios_left;
-};
-
 struct io_op_def {
 	/* needs req->file assigned */
 	unsigned		needs_file : 1;
@@ -1997,9 +1998,10 @@ static struct io_kiocb *io_get_fallback_req(struct io_ring_ctx *ctx)
 	return NULL;
 }
 
-static struct io_kiocb *io_alloc_req(struct io_ring_ctx *ctx,
-				     struct io_submit_state *state)
+static struct io_kiocb *io_alloc_req(struct io_ring_ctx *ctx)
 {
+	struct io_submit_state *state = &ctx->submit_state;
+
 	if (!state->free_reqs) {
 		gfp_t gfp = GFP_KERNEL | __GFP_NOWARN;
 		size_t sz;
@@ -6764,9 +6766,9 @@ static inline bool io_check_restriction(struct io_ring_ctx *ctx,
 				IOSQE_BUFFER_SELECT)
 
 static int io_init_req(struct io_ring_ctx *ctx, struct io_kiocb *req,
-		       const struct io_uring_sqe *sqe,
-		       struct io_submit_state *state)
+		       const struct io_uring_sqe *sqe)
 {
+	struct io_submit_state *state;
 	unsigned int sqe_flags;
 	int id, ret;
 
@@ -6818,6 +6820,7 @@ static int io_init_req(struct io_ring_ctx *ctx, struct io_kiocb *req,
 
 	/* same numerical values with corresponding REQ_F_*, safe to copy */
 	req->flags |= sqe_flags;
+	state = &ctx->submit_state;
 
 	/*
 	 * Plug now if we have more than 1 IO left after this, and the target
@@ -6844,7 +6847,6 @@ static int io_init_req(struct io_ring_ctx *ctx, struct io_kiocb *req,
 
 static int io_submit_sqes(struct io_ring_ctx *ctx, unsigned int nr)
 {
-	struct io_submit_state state;
 	struct io_submit_link link;
 	int i, submitted = 0;
 
@@ -6863,7 +6865,7 @@ static int io_submit_sqes(struct io_ring_ctx *ctx, unsigned int nr)
 	percpu_counter_add(&current->io_uring->inflight, nr);
 	refcount_add(nr, &current->usage);
 
-	io_submit_state_start(&state, ctx, nr);
+	io_submit_state_start(&ctx->submit_state, ctx, nr);
 	link.head = NULL;
 
 	for (i = 0; i < nr; i++) {
@@ -6876,7 +6878,7 @@ static int io_submit_sqes(struct io_ring_ctx *ctx, unsigned int nr)
 			io_consume_sqe(ctx);
 			break;
 		}
-		req = io_alloc_req(ctx, &state);
+		req = io_alloc_req(ctx);
 		if (unlikely(!req)) {
 			if (!submitted)
 				submitted = -EAGAIN;
@@ -6886,7 +6888,7 @@ static int io_submit_sqes(struct io_ring_ctx *ctx, unsigned int nr)
 		/* will complete beyond this point, count as submitted */
 		submitted++;
 
-		err = io_init_req(ctx, req, sqe, &state);
+		err = io_init_req(ctx, req, sqe);
 		if (unlikely(err)) {
 fail_req:
 			io_put_req(req);
@@ -6896,7 +6898,7 @@ static int io_submit_sqes(struct io_ring_ctx *ctx, unsigned int nr)
 
 		trace_io_uring_submit_sqe(ctx, req->opcode, req->user_data,
 					true, ctx->flags & IORING_SETUP_SQPOLL);
-		err = io_submit_sqe(req, sqe, &link, &state.comp);
+		err = io_submit_sqe(req, sqe, &link, &ctx->submit_state.comp);
 		if (err)
 			goto fail_req;
 	}
@@ -6911,8 +6913,8 @@ static int io_submit_sqes(struct io_ring_ctx *ctx, unsigned int nr)
 		put_task_struct_many(current, unused);
 	}
 	if (link.head)
-		io_queue_link_head(link.head, &state.comp);
-	io_submit_state_end(&state);
+		io_queue_link_head(link.head, &ctx->submit_state.comp);
+	io_submit_state_end(&ctx->submit_state);
 
 	 /* Commit SQ ring head once we've consumed and submitted all SQEs */
 	io_commit_sqring(ctx);
-- 
2.24.0



* [PATCH 05/17] io_uring: remove ctx from comp_state
  2021-02-10  0:03 [PATCH RFC 00/17] playing around req alloc Pavel Begunkov
                   ` (3 preceding siblings ...)
  2021-02-10  0:03 ` [PATCH 04/17] io_uring: don't keep submit_state on stack Pavel Begunkov
@ 2021-02-10  0:03 ` Pavel Begunkov
  2021-02-10  0:03 ` [PATCH 06/17] io_uring: don't reinit submit state every time Pavel Begunkov
                   ` (12 subsequent siblings)
  17 siblings, 0 replies; 27+ messages in thread
From: Pavel Begunkov @ 2021-02-10  0:03 UTC
  To: Jens Axboe, io-uring

Completion state is closely bound to the ctx; we don't need to store a
ctx pointer inside it, as we always have one around to pass to flush.
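
In sketch form (illustrative types), the back-pointer becomes an
argument:

struct comp_state { unsigned int nr; };
struct ring_ctx	 { struct comp_state comp; };

/* callers always have the ctx, so pass it rather than storing it */
static void flush_completions(struct comp_state *cs, struct ring_ctx *ctx)
{
	/* take ctx->completion_lock, drain cs, post CQEs ... */
	(void)ctx;
	cs->nr = 0;
}

One less pointer to keep initialised and in sync with its container.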

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 fs/io_uring.c | 20 +++++++++-----------
 1 file changed, 9 insertions(+), 11 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 0606fa5f9eb0..f0cc5ccd6fe4 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -269,7 +269,6 @@ struct io_sq_data {
 struct io_comp_state {
 	unsigned int		nr;
 	struct list_head	list;
-	struct io_ring_ctx	*ctx;
 };
 
 struct io_submit_state {
@@ -1924,10 +1923,9 @@ static inline void io_req_complete_nostate(struct io_kiocb *req, long res,
 	io_put_req(req);
 }
 
-static void io_submit_flush_completions(struct io_comp_state *cs)
+static void io_submit_flush_completions(struct io_comp_state *cs,
+					struct io_ring_ctx *ctx)
 {
-	struct io_ring_ctx *ctx = cs->ctx;
-
 	spin_lock_irq(&ctx->completion_lock);
 	while (!list_empty(&cs->list)) {
 		struct io_kiocb *req;
@@ -6526,7 +6524,7 @@ static void __io_queue_sqe(struct io_kiocb *req, struct io_comp_state *cs)
 		if (req->flags & REQ_F_COMPLETE_INLINE) {
 			list_add_tail(&req->compl.list, &cs->list);
 			if (++cs->nr >= 32)
-				io_submit_flush_completions(cs);
+				io_submit_flush_completions(cs, req->ctx);
 			req = NULL;
 		} else {
 			req = io_put_req_find_next(req);
@@ -6661,10 +6659,11 @@ static int io_submit_sqe(struct io_kiocb *req, const struct io_uring_sqe *sqe,
 /*
  * Batched submission is done, ensure local IO is flushed out.
  */
-static void io_submit_state_end(struct io_submit_state *state)
+static void io_submit_state_end(struct io_submit_state *state,
+				struct io_ring_ctx *ctx)
 {
 	if (!list_empty(&state->comp.list))
-		io_submit_flush_completions(&state->comp);
+		io_submit_flush_completions(&state->comp, ctx);
 	if (state->plug_started)
 		blk_finish_plug(&state->plug);
 	io_state_file_put(state);
@@ -6676,12 +6675,11 @@ static void io_submit_state_end(struct io_submit_state *state)
  * Start submission side cache.
  */
 static void io_submit_state_start(struct io_submit_state *state,
-				  struct io_ring_ctx *ctx, unsigned int max_ios)
+				  unsigned int max_ios)
 {
 	state->plug_started = false;
 	state->comp.nr = 0;
 	INIT_LIST_HEAD(&state->comp.list);
-	state->comp.ctx = ctx;
 	state->free_reqs = 0;
 	state->file_refs = 0;
 	state->ios_left = max_ios;
@@ -6865,7 +6863,7 @@ static int io_submit_sqes(struct io_ring_ctx *ctx, unsigned int nr)
 	percpu_counter_add(&current->io_uring->inflight, nr);
 	refcount_add(nr, &current->usage);
 
-	io_submit_state_start(&ctx->submit_state, ctx, nr);
+	io_submit_state_start(&ctx->submit_state, nr);
 	link.head = NULL;
 
 	for (i = 0; i < nr; i++) {
@@ -6914,7 +6912,7 @@ static int io_submit_sqes(struct io_ring_ctx *ctx, unsigned int nr)
 	}
 	if (link.head)
 		io_queue_link_head(link.head, &ctx->submit_state.comp);
-	io_submit_state_end(&ctx->submit_state);
+	io_submit_state_end(&ctx->submit_state, ctx);
 
 	 /* Commit SQ ring head once we've consumed and submitted all SQEs */
 	io_commit_sqring(ctx);
-- 
2.24.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 06/17] io_uring: don't reinit submit state every time
  2021-02-10  0:03 [PATCH RFC 00/17] playing around req alloc Pavel Begunkov
                   ` (4 preceding siblings ...)
  2021-02-10  0:03 ` [PATCH 05/17] io_uring: remove ctx from comp_state Pavel Begunkov
@ 2021-02-10  0:03 ` Pavel Begunkov
  2021-02-10  0:03 ` [PATCH 07/17] io_uring: replace list with array for compl batch Pavel Begunkov
                   ` (11 subsequent siblings)
  17 siblings, 0 replies; 27+ messages in thread
From: Pavel Begunkov @ 2021-02-10  0:03 UTC (permalink / raw)
  To: Jens Axboe, io-uring

Now that submit_state is retained across syscalls, we can save ourselves
from initialising it from the ground up for each io_submit_sqes(). Set
some fields during ctx allocation, and just keep them always consistent.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 fs/io_uring.c | 15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index f0cc5ccd6fe4..7076564aa944 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -1294,6 +1294,7 @@ static inline bool io_is_timeout_noseq(struct io_kiocb *req)
 
 static struct io_ring_ctx *io_ring_ctx_alloc(struct io_uring_params *p)
 {
+	struct io_submit_state *submit_state;
 	struct io_ring_ctx *ctx;
 	int hash_bits;
 
@@ -1345,6 +1346,12 @@ static struct io_ring_ctx *io_ring_ctx_alloc(struct io_uring_params *p)
 	INIT_LIST_HEAD(&ctx->rsrc_ref_list);
 	INIT_DELAYED_WORK(&ctx->rsrc_put_work, io_rsrc_put_work);
 	init_llist_head(&ctx->rsrc_put_llist);
+
+	submit_state = &ctx->submit_state;
+	INIT_LIST_HEAD(&submit_state->comp.list);
+	submit_state->comp.nr = 0;
+	submit_state->file_refs = 0;
+	submit_state->free_reqs = 0;
 	return ctx;
 err:
 	if (ctx->fallback_req)
@@ -6667,8 +6674,10 @@ static void io_submit_state_end(struct io_submit_state *state,
 	if (state->plug_started)
 		blk_finish_plug(&state->plug);
 	io_state_file_put(state);
-	if (state->free_reqs)
+	if (state->free_reqs) {
 		kmem_cache_free_bulk(req_cachep, state->free_reqs, state->reqs);
+		state->free_reqs = 0;
+	}
 }
 
 /*
@@ -6678,10 +6687,6 @@ static void io_submit_state_start(struct io_submit_state *state,
 				  unsigned int max_ios)
 {
 	state->plug_started = false;
-	state->comp.nr = 0;
-	INIT_LIST_HEAD(&state->comp.list);
-	state->free_reqs = 0;
-	state->file_refs = 0;
 	state->ios_left = max_ios;
 }
 
-- 
2.24.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 07/17] io_uring: replace list with array for compl batch
  2021-02-10  0:03 [PATCH RFC 00/17] playing around req alloc Pavel Begunkov
                   ` (5 preceding siblings ...)
  2021-02-10  0:03 ` [PATCH 06/17] io_uring: don't reinit submit state every time Pavel Begunkov
@ 2021-02-10  0:03 ` Pavel Begunkov
  2021-02-10  0:03 ` [PATCH 08/17] io_uring: submit-completion free batching Pavel Begunkov
                   ` (10 subsequent siblings)
  17 siblings, 0 replies; 27+ messages in thread
From: Pavel Begunkov @ 2021-02-10  0:03 UTC (permalink / raw)
  To: Jens Axboe, io-uring

Reincarnation of an old patch that replaces the list in struct
io_comp_state with an array. It's needed to avoid hooking requests via
their compl.list, because that field won't always be available in the
future.

It also allows splitting io_submit_flush_completions() so that requests
are freed outside of the lock, removing the unlock/lock pair together
with the long comment describing when it can be done.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 fs/io_uring.c | 35 +++++++++++------------------------
 1 file changed, 11 insertions(+), 24 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 7076564aa944..8c5fd348cac5 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -265,10 +265,11 @@ struct io_sq_data {
 };
 
 #define IO_IOPOLL_BATCH			8
+#define IO_COMPL_BATCH			32
 
 struct io_comp_state {
 	unsigned int		nr;
-	struct list_head	list;
+	struct io_kiocb		*reqs[IO_COMPL_BATCH];
 };
 
 struct io_submit_state {
@@ -1348,7 +1349,6 @@ static struct io_ring_ctx *io_ring_ctx_alloc(struct io_uring_params *p)
 	init_llist_head(&ctx->rsrc_put_llist);
 
 	submit_state = &ctx->submit_state;
-	INIT_LIST_HEAD(&submit_state->comp.list);
 	submit_state->comp.nr = 0;
 	submit_state->file_refs = 0;
 	submit_state->free_reqs = 0;
@@ -1933,33 +1933,20 @@ static inline void io_req_complete_nostate(struct io_kiocb *req, long res,
 static void io_submit_flush_completions(struct io_comp_state *cs,
 					struct io_ring_ctx *ctx)
 {
+	int i, nr = cs->nr;
+
 	spin_lock_irq(&ctx->completion_lock);
-	while (!list_empty(&cs->list)) {
-		struct io_kiocb *req;
+	for (i = 0; i < nr; i++) {
+		struct io_kiocb *req = cs->reqs[i];
 
-		req = list_first_entry(&cs->list, struct io_kiocb, compl.list);
-		list_del(&req->compl.list);
 		__io_cqring_fill_event(req, req->result, req->compl.cflags);
-
-		/*
-		 * io_free_req() doesn't care about completion_lock unless one
-		 * of these flags is set. REQ_F_WORK_INITIALIZED is in the list
-		 * because of a potential deadlock with req->work.fs->lock
-		 * We defer both, completion and submission refs.
-		 */
-		if (req->flags & (REQ_F_FAIL_LINK|REQ_F_LINK_TIMEOUT
-				 |REQ_F_WORK_INITIALIZED)) {
-			spin_unlock_irq(&ctx->completion_lock);
-			io_double_put_req(req);
-			spin_lock_irq(&ctx->completion_lock);
-		} else {
-			io_double_put_req(req);
-		}
 	}
 	io_commit_cqring(ctx);
 	spin_unlock_irq(&ctx->completion_lock);
 
 	io_cqring_ev_posted(ctx);
+	for (i = 0; i < nr; i++)
+		io_double_put_req(cs->reqs[i]);
 	cs->nr = 0;
 }
 
@@ -6529,8 +6516,8 @@ static void __io_queue_sqe(struct io_kiocb *req, struct io_comp_state *cs)
 	} else if (likely(!ret)) {
 		/* drop submission reference */
 		if (req->flags & REQ_F_COMPLETE_INLINE) {
-			list_add_tail(&req->compl.list, &cs->list);
-			if (++cs->nr >= 32)
+			cs->reqs[cs->nr++] = req;
+			if (cs->nr == IO_COMPL_BATCH)
 				io_submit_flush_completions(cs, req->ctx);
 			req = NULL;
 		} else {
@@ -6669,7 +6656,7 @@ static int io_submit_sqe(struct io_kiocb *req, const struct io_uring_sqe *sqe,
 static void io_submit_state_end(struct io_submit_state *state,
 				struct io_ring_ctx *ctx)
 {
-	if (!list_empty(&state->comp.list))
+	if (state->comp.nr)
 		io_submit_flush_completions(&state->comp, ctx);
 	if (state->plug_started)
 		blk_finish_plug(&state->plug);
-- 
2.24.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 08/17] io_uring: submit-completion free batching
  2021-02-10  0:03 [PATCH RFC 00/17] playing around req alloc Pavel Begunkov
                   ` (6 preceding siblings ...)
  2021-02-10  0:03 ` [PATCH 07/17] io_uring: replace list with array for compl batch Pavel Begunkov
@ 2021-02-10  0:03 ` Pavel Begunkov
  2021-02-10  0:03 ` [PATCH 09/17] io_uring: remove fallback_req Pavel Begunkov
                   ` (9 subsequent siblings)
  17 siblings, 0 replies; 27+ messages in thread
From: Pavel Begunkov @ 2021-02-10  0:03 UTC (permalink / raw)
  To: Jens Axboe, io-uring

io_submit_flush_completions() does completion batching, but may also use
free batching as iopoll does. The main beneficiaries should be buffered
reads/writes and send/recv.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 fs/io_uring.c | 49 +++++++++++++++++++++++++++++--------------------
 1 file changed, 29 insertions(+), 20 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 8c5fd348cac5..ed4c92f64d96 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -1930,26 +1930,6 @@ static inline void io_req_complete_nostate(struct io_kiocb *req, long res,
 	io_put_req(req);
 }
 
-static void io_submit_flush_completions(struct io_comp_state *cs,
-					struct io_ring_ctx *ctx)
-{
-	int i, nr = cs->nr;
-
-	spin_lock_irq(&ctx->completion_lock);
-	for (i = 0; i < nr; i++) {
-		struct io_kiocb *req = cs->reqs[i];
-
-		__io_cqring_fill_event(req, req->result, req->compl.cflags);
-	}
-	io_commit_cqring(ctx);
-	spin_unlock_irq(&ctx->completion_lock);
-
-	io_cqring_ev_posted(ctx);
-	for (i = 0; i < nr; i++)
-		io_double_put_req(cs->reqs[i]);
-	cs->nr = 0;
-}
-
 static void io_req_complete_state(struct io_kiocb *req, long res,
 				  unsigned int cflags)
 {
@@ -2335,6 +2315,35 @@ static void io_req_free_batch(struct req_batch *rb, struct io_kiocb *req)
 		__io_req_free_batch_flush(req->ctx, rb);
 }
 
+static void io_submit_flush_completions(struct io_comp_state *cs,
+					struct io_ring_ctx *ctx)
+{
+	int i, nr = cs->nr;
+	struct io_kiocb *req;
+	struct req_batch rb;
+
+	io_init_req_batch(&rb);
+	spin_lock_irq(&ctx->completion_lock);
+	for (i = 0; i < nr; i++) {
+		req = cs->reqs[i];
+		__io_cqring_fill_event(req, req->result, req->compl.cflags);
+	}
+	io_commit_cqring(ctx);
+	spin_unlock_irq(&ctx->completion_lock);
+
+	io_cqring_ev_posted(ctx);
+	for (i = 0; i < nr; i++) {
+		req = cs->reqs[i];
+
+		/* submission and completion refs */
+		if (refcount_sub_and_test(2, &req->refs))
+			io_req_free_batch(&rb, req);
+	}
+
+	io_req_free_batch_finish(ctx, &rb);
+	cs->nr = 0;
+}
+
 /*
  * Drop reference to request, return next in chain (if there is one) if this
  * was the last reference to this request.
-- 
2.24.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 09/17] io_uring: remove fallback_req
  2021-02-10  0:03 [PATCH RFC 00/17] playing around req alloc Pavel Begunkov
                   ` (7 preceding siblings ...)
  2021-02-10  0:03 ` [PATCH 08/17] io_uring: submit-completion free batching Pavel Begunkov
@ 2021-02-10  0:03 ` Pavel Begunkov
  2021-02-10  0:03 ` [PATCH 10/17] io_uring: count ctx refs separately from reqs Pavel Begunkov
                   ` (8 subsequent siblings)
  17 siblings, 0 replies; 27+ messages in thread
From: Pavel Begunkov @ 2021-02-10  0:03 UTC (permalink / raw)
  To: Jens Axboe, io-uring

Remove fallback_req for now; it gets in the way of other changes.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 fs/io_uring.c | 38 ++------------------------------------
 1 file changed, 2 insertions(+), 36 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index ed4c92f64d96..be940db96fb8 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -386,9 +386,6 @@ struct io_ring_ctx {
 	struct completion	ref_comp;
 	struct completion	sq_thread_comp;
 
-	/* if all else fails... */
-	struct io_kiocb		*fallback_req;
-
 #if defined(CONFIG_UNIX)
 	struct socket		*ring_sock;
 #endif
@@ -1303,10 +1300,6 @@ static struct io_ring_ctx *io_ring_ctx_alloc(struct io_uring_params *p)
 	if (!ctx)
 		return NULL;
 
-	ctx->fallback_req = kmem_cache_alloc(req_cachep, GFP_KERNEL);
-	if (!ctx->fallback_req)
-		goto err;
-
 	/*
 	 * Use 5 bits less than the max cq entries, that should give us around
 	 * 32 entries per hash list if totally full and uniformly spread.
@@ -1354,8 +1347,6 @@ static struct io_ring_ctx *io_ring_ctx_alloc(struct io_uring_params *p)
 	submit_state->free_reqs = 0;
 	return ctx;
 err:
-	if (ctx->fallback_req)
-		kmem_cache_free(req_cachep, ctx->fallback_req);
 	kfree(ctx->cancel_hash);
 	kfree(ctx);
 	return NULL;
@@ -1953,23 +1944,6 @@ static inline void io_req_complete(struct io_kiocb *req, long res)
 	__io_req_complete(req, 0, res, 0);
 }
 
-static inline bool io_is_fallback_req(struct io_kiocb *req)
-{
-	return req == (struct io_kiocb *)
-			((unsigned long) req->ctx->fallback_req & ~1UL);
-}
-
-static struct io_kiocb *io_get_fallback_req(struct io_ring_ctx *ctx)
-{
-	struct io_kiocb *req;
-
-	req = ctx->fallback_req;
-	if (!test_and_set_bit_lock(0, (unsigned long *) &ctx->fallback_req))
-		return req;
-
-	return NULL;
-}
-
 static struct io_kiocb *io_alloc_req(struct io_ring_ctx *ctx)
 {
 	struct io_submit_state *state = &ctx->submit_state;
@@ -1989,7 +1963,7 @@ static struct io_kiocb *io_alloc_req(struct io_ring_ctx *ctx)
 		if (unlikely(ret <= 0)) {
 			state->reqs[0] = kmem_cache_alloc(req_cachep, gfp);
 			if (!state->reqs[0])
-				return io_get_fallback_req(ctx);
+				return NULL;
 			ret = 1;
 		}
 		state->free_reqs = ret;
@@ -2036,10 +2010,7 @@ static void __io_free_req(struct io_kiocb *req)
 	io_dismantle_req(req);
 	io_put_task(req->task, 1);
 
-	if (likely(!io_is_fallback_req(req)))
-		kmem_cache_free(req_cachep, req);
-	else
-		clear_bit_unlock(0, (unsigned long *) &ctx->fallback_req);
+	kmem_cache_free(req_cachep, req);
 	percpu_ref_put(&ctx->refs);
 }
 
@@ -2295,10 +2266,6 @@ static void io_req_free_batch_finish(struct io_ring_ctx *ctx,
 
 static void io_req_free_batch(struct req_batch *rb, struct io_kiocb *req)
 {
-	if (unlikely(io_is_fallback_req(req))) {
-		io_free_req(req);
-		return;
-	}
 	io_queue_next(req);
 
 	if (req->task != rb->task) {
@@ -8707,7 +8674,6 @@ static void io_ring_ctx_free(struct io_ring_ctx *ctx)
 	free_uid(ctx->user);
 	put_cred(ctx->creds);
 	kfree(ctx->cancel_hash);
-	kmem_cache_free(req_cachep, ctx->fallback_req);
 	kfree(ctx);
 }
 
-- 
2.24.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 10/17] io_uring: count ctx refs separately from reqs
  2021-02-10  0:03 [PATCH RFC 00/17] playing around req alloc Pavel Begunkov
                   ` (8 preceding siblings ...)
  2021-02-10  0:03 ` [PATCH 09/17] io_uring: remove fallback_req Pavel Begunkov
@ 2021-02-10  0:03 ` Pavel Begunkov
  2021-02-10  0:03 ` [PATCH 11/17] io_uring: persistent req cache Pavel Begunkov
                   ` (7 subsequent siblings)
  17 siblings, 0 replies; 27+ messages in thread
From: Pavel Begunkov @ 2021-02-10  0:03 UTC (permalink / raw)
  To: Jens Axboe, io-uring

Currently, batch free handles request memory freeing and ctx ref putting
together. Separate them and use different counters, which will be needed
for reusing req memory.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 fs/io_uring.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index be940db96fb8..c1f7dd17a62f 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -2233,6 +2233,7 @@ static void io_free_req(struct io_kiocb *req)
 struct req_batch {
 	void *reqs[IO_IOPOLL_BATCH];
 	int to_free;
+	int ctx_refs;
 
 	struct task_struct	*task;
 	int			task_refs;
@@ -2242,6 +2243,7 @@ static inline void io_init_req_batch(struct req_batch *rb)
 {
 	rb->to_free = 0;
 	rb->task_refs = 0;
+	rb->ctx_refs = 0;
 	rb->task = NULL;
 }
 
@@ -2249,7 +2251,6 @@ static void __io_req_free_batch_flush(struct io_ring_ctx *ctx,
 				      struct req_batch *rb)
 {
 	kmem_cache_free_bulk(req_cachep, rb->to_free, rb->reqs);
-	percpu_ref_put_many(&ctx->refs, rb->to_free);
 	rb->to_free = 0;
 }
 
@@ -2262,6 +2263,8 @@ static void io_req_free_batch_finish(struct io_ring_ctx *ctx,
 		io_put_task(rb->task, rb->task_refs);
 		rb->task = NULL;
 	}
+	if (rb->ctx_refs)
+		percpu_ref_put_many(&ctx->refs, rb->ctx_refs);
 }
 
 static void io_req_free_batch(struct req_batch *rb, struct io_kiocb *req)
@@ -2275,6 +2278,7 @@ static void io_req_free_batch(struct req_batch *rb, struct io_kiocb *req)
 		rb->task_refs = 0;
 	}
 	rb->task_refs++;
+	rb->ctx_refs++;
 
 	io_dismantle_req(req);
 	rb->reqs[rb->to_free++] = req;
-- 
2.24.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 11/17] io_uring: persistent req cache
  2021-02-10  0:03 [PATCH RFC 00/17] playing around req alloc Pavel Begunkov
                   ` (9 preceding siblings ...)
  2021-02-10  0:03 ` [PATCH 10/17] io_uring: count ctx refs separately from reqs Pavel Begunkov
@ 2021-02-10  0:03 ` Pavel Begunkov
  2021-02-10  0:03 ` [PATCH 12/17] io_uring: feed reqs back into alloc cache Pavel Begunkov
                   ` (6 subsequent siblings)
  17 siblings, 0 replies; 27+ messages in thread
From: Pavel Begunkov @ 2021-02-10  0:03 UTC (permalink / raw)
  To: Jens Axboe, io-uring

Don't free batch-allocated requests across syscalls.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 fs/io_uring.c | 21 +++++++++++++--------
 1 file changed, 13 insertions(+), 8 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index c1f7dd17a62f..3711ae2633cb 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -266,6 +266,8 @@ struct io_sq_data {
 
 #define IO_IOPOLL_BATCH			8
 #define IO_COMPL_BATCH			32
+#define IO_REQ_CACHE_SIZE		8
+#define IO_REQ_ALLOC_BATCH		8
 
 struct io_comp_state {
 	unsigned int		nr;
@@ -278,7 +280,7 @@ struct io_submit_state {
 	/*
 	 * io_kiocb alloc cache
 	 */
-	void			*reqs[IO_IOPOLL_BATCH];
+	void			*reqs[IO_REQ_CACHE_SIZE];
 	unsigned int		free_reqs;
 
 	bool			plug_started;
@@ -1948,13 +1950,14 @@ static struct io_kiocb *io_alloc_req(struct io_ring_ctx *ctx)
 {
 	struct io_submit_state *state = &ctx->submit_state;
 
+	BUILD_BUG_ON(IO_REQ_ALLOC_BATCH > ARRAY_SIZE(state->reqs));
+
 	if (!state->free_reqs) {
 		gfp_t gfp = GFP_KERNEL | __GFP_NOWARN;
-		size_t sz;
 		int ret;
 
-		sz = min_t(size_t, state->ios_left, ARRAY_SIZE(state->reqs));
-		ret = kmem_cache_alloc_bulk(req_cachep, gfp, sz, state->reqs);
+		ret = kmem_cache_alloc_bulk(req_cachep, gfp, IO_REQ_ALLOC_BATCH,
+					    state->reqs);
 
 		/*
 		 * Bulk alloc is all-or-nothing. If we fail to get a batch,
@@ -6641,10 +6644,6 @@ static void io_submit_state_end(struct io_submit_state *state,
 	if (state->plug_started)
 		blk_finish_plug(&state->plug);
 	io_state_file_put(state);
-	if (state->free_reqs) {
-		kmem_cache_free_bulk(req_cachep, state->free_reqs, state->reqs);
-		state->free_reqs = 0;
-	}
 }
 
 /*
@@ -8644,6 +8643,8 @@ static void io_destroy_buffers(struct io_ring_ctx *ctx)
 
 static void io_ring_ctx_free(struct io_ring_ctx *ctx)
 {
+	struct io_submit_state *submit_state = &ctx->submit_state;
+
 	io_finish_async(ctx);
 	io_sqe_buffers_unregister(ctx);
 
@@ -8654,6 +8655,10 @@ static void io_ring_ctx_free(struct io_ring_ctx *ctx)
 		ctx->mm_account = NULL;
 	}
 
+	if (submit_state->free_reqs)
+		kmem_cache_free_bulk(req_cachep, submit_state->free_reqs,
+				     submit_state->reqs);
+
 #ifdef CONFIG_BLK_CGROUP
 	if (ctx->sqo_blkcg_css)
 		css_put(ctx->sqo_blkcg_css);
-- 
2.24.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 12/17] io_uring: feed reqs back into alloc cache
  2021-02-10  0:03 [PATCH RFC 00/17] playing around req alloc Pavel Begunkov
                   ` (10 preceding siblings ...)
  2021-02-10  0:03 ` [PATCH 11/17] io_uring: persistent req cache Pavel Begunkov
@ 2021-02-10  0:03 ` Pavel Begunkov
  2021-02-10  0:03 ` [PATCH 13/17] io_uring: use persistent request cache Pavel Begunkov
                   ` (5 subsequent siblings)
  17 siblings, 0 replies; 27+ messages in thread
From: Pavel Begunkov @ 2021-02-10  0:03 UTC (permalink / raw)
  To: Jens Axboe, io-uring

Make io_req_free_batch(), which is used for inline-executed requests and
IOPOLL, return requests back into the allocation cache, avoiding most of
the kmalloc()/kfree() for those cases.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 fs/io_uring.c | 19 ++++++++++++-------
 1 file changed, 12 insertions(+), 7 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 3711ae2633cb..1918b410b6f2 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -266,7 +266,7 @@ struct io_sq_data {
 
 #define IO_IOPOLL_BATCH			8
 #define IO_COMPL_BATCH			32
-#define IO_REQ_CACHE_SIZE		8
+#define IO_REQ_CACHE_SIZE		32
 #define IO_REQ_ALLOC_BATCH		8
 
 struct io_comp_state {
@@ -2270,7 +2270,8 @@ static void io_req_free_batch_finish(struct io_ring_ctx *ctx,
 		percpu_ref_put_many(&ctx->refs, rb->ctx_refs);
 }
 
-static void io_req_free_batch(struct req_batch *rb, struct io_kiocb *req)
+static void io_req_free_batch(struct req_batch *rb, struct io_kiocb *req,
+			      struct io_submit_state *state)
 {
 	io_queue_next(req);
 
@@ -2284,9 +2285,13 @@ static void io_req_free_batch(struct req_batch *rb, struct io_kiocb *req)
 	rb->ctx_refs++;
 
 	io_dismantle_req(req);
-	rb->reqs[rb->to_free++] = req;
-	if (unlikely(rb->to_free == ARRAY_SIZE(rb->reqs)))
-		__io_req_free_batch_flush(req->ctx, rb);
+	if (state->free_reqs != ARRAY_SIZE(state->reqs)) {
+		state->reqs[state->free_reqs++] = req;
+	} else {
+		rb->reqs[rb->to_free++] = req;
+		if (unlikely(rb->to_free == ARRAY_SIZE(rb->reqs)))
+			__io_req_free_batch_flush(req->ctx, rb);
+	}
 }
 
 static void io_submit_flush_completions(struct io_comp_state *cs,
@@ -2311,7 +2316,7 @@ static void io_submit_flush_completions(struct io_comp_state *cs,
 
 		/* submission and completion refs */
 		if (refcount_sub_and_test(2, &req->refs))
-			io_req_free_batch(&rb, req);
+			io_req_free_batch(&rb, req, &ctx->submit_state);
 	}
 
 	io_req_free_batch_finish(ctx, &rb);
@@ -2464,7 +2469,7 @@ static void io_iopoll_complete(struct io_ring_ctx *ctx, unsigned int *nr_events,
 		(*nr_events)++;
 
 		if (refcount_dec_and_test(&req->refs))
-			io_req_free_batch(&rb, req);
+			io_req_free_batch(&rb, req, &ctx->submit_state);
 	}
 
 	io_commit_cqring(ctx);
-- 
2.24.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 13/17] io_uring: use persistent request cache
  2021-02-10  0:03 [PATCH RFC 00/17] playing around req alloc Pavel Begunkov
                   ` (11 preceding siblings ...)
  2021-02-10  0:03 ` [PATCH 12/17] io_uring: feed reqs back into alloc cache Pavel Begunkov
@ 2021-02-10  0:03 ` Pavel Begunkov
  2021-02-10  2:14   ` Jens Axboe
  2021-02-10  0:03 ` [PATCH 14/17] io_uring: provide FIFO ordering for task_work Pavel Begunkov
                   ` (4 subsequent siblings)
  17 siblings, 1 reply; 27+ messages in thread
From: Pavel Begunkov @ 2021-02-10  0:03 UTC (permalink / raw)
  To: Jens Axboe, io-uring

From: Jens Axboe <axboe@kernel.dk>

Now that we have the submit_state in the ring itself, we can have io_kiocb
allocations that are persistent across invocations. This reduces the time
spent doing slab allocations and frees.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
[sil: rebased]
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 fs/io_uring.c | 53 +++++++++++++++++++++++++++++----------------------
 1 file changed, 30 insertions(+), 23 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 1918b410b6f2..58f150680c05 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -270,8 +270,9 @@ struct io_sq_data {
 #define IO_REQ_ALLOC_BATCH		8
 
 struct io_comp_state {
-	unsigned int		nr;
 	struct io_kiocb		*reqs[IO_COMPL_BATCH];
+	unsigned int		nr;
+	struct list_head	free_list;
 };
 
 struct io_submit_state {
@@ -1294,7 +1295,6 @@ static inline bool io_is_timeout_noseq(struct io_kiocb *req)
 
 static struct io_ring_ctx *io_ring_ctx_alloc(struct io_uring_params *p)
 {
-	struct io_submit_state *submit_state;
 	struct io_ring_ctx *ctx;
 	int hash_bits;
 
@@ -1343,10 +1343,7 @@ static struct io_ring_ctx *io_ring_ctx_alloc(struct io_uring_params *p)
 	INIT_DELAYED_WORK(&ctx->rsrc_put_work, io_rsrc_put_work);
 	init_llist_head(&ctx->rsrc_put_llist);
 
-	submit_state = &ctx->submit_state;
-	submit_state->comp.nr = 0;
-	submit_state->file_refs = 0;
-	submit_state->free_reqs = 0;
+	INIT_LIST_HEAD(&ctx->submit_state.comp.free_list);
 	return ctx;
 err:
 	kfree(ctx->cancel_hash);
@@ -1952,6 +1949,15 @@ static struct io_kiocb *io_alloc_req(struct io_ring_ctx *ctx)
 
 	BUILD_BUG_ON(IO_REQ_ALLOC_BATCH > ARRAY_SIZE(state->reqs));
 
+	if (!list_empty(&state->comp.free_list)) {
+		struct io_kiocb *req;
+
+		req = list_first_entry(&state->comp.free_list, struct io_kiocb,
+					compl.list);
+		list_del(&req->compl.list);
+		return req;
+	}
+
 	if (!state->free_reqs) {
 		gfp_t gfp = GFP_KERNEL | __GFP_NOWARN;
 		int ret;
@@ -2234,34 +2240,21 @@ static void io_free_req(struct io_kiocb *req)
 }
 
 struct req_batch {
-	void *reqs[IO_IOPOLL_BATCH];
-	int to_free;
-	int ctx_refs;
-
 	struct task_struct	*task;
 	int			task_refs;
+	int			ctx_refs;
 };
 
 static inline void io_init_req_batch(struct req_batch *rb)
 {
-	rb->to_free = 0;
 	rb->task_refs = 0;
 	rb->ctx_refs = 0;
 	rb->task = NULL;
 }
 
-static void __io_req_free_batch_flush(struct io_ring_ctx *ctx,
-				      struct req_batch *rb)
-{
-	kmem_cache_free_bulk(req_cachep, rb->to_free, rb->reqs);
-	rb->to_free = 0;
-}
-
 static void io_req_free_batch_finish(struct io_ring_ctx *ctx,
 				     struct req_batch *rb)
 {
-	if (rb->to_free)
-		__io_req_free_batch_flush(ctx, rb);
 	if (rb->task) {
 		io_put_task(rb->task, rb->task_refs);
 		rb->task = NULL;
@@ -2288,9 +2281,9 @@ static void io_req_free_batch(struct req_batch *rb, struct io_kiocb *req,
 	if (state->free_reqs != ARRAY_SIZE(state->reqs)) {
 		state->reqs[state->free_reqs++] = req;
 	} else {
-		rb->reqs[rb->to_free++] = req;
-		if (unlikely(rb->to_free == ARRAY_SIZE(rb->reqs)))
-			__io_req_free_batch_flush(req->ctx, rb);
+		struct io_comp_state *cs = &req->ctx->submit_state.comp;
+
+		list_add(&req->compl.list, &cs->free_list);
 	}
 }
 
@@ -8646,6 +8639,19 @@ static void io_destroy_buffers(struct io_ring_ctx *ctx)
 	idr_destroy(&ctx->io_buffer_idr);
 }
 
+static void io_req_cache_free(struct io_ring_ctx *ctx)
+{
+	struct io_comp_state *cs = &ctx->submit_state.comp;
+
+	while (!list_empty(&cs->free_list)) {
+		struct io_kiocb *req;
+
+		req = list_first_entry(&cs->free_list, struct io_kiocb, compl.list);
+		list_del(&req->compl.list);
+		kmem_cache_free(req_cachep, req);
+	}
+}
+
 static void io_ring_ctx_free(struct io_ring_ctx *ctx)
 {
 	struct io_submit_state *submit_state = &ctx->submit_state;
@@ -8688,6 +8694,7 @@ static void io_ring_ctx_free(struct io_ring_ctx *ctx)
 	free_uid(ctx->user);
 	put_cred(ctx->creds);
 	kfree(ctx->cancel_hash);
+	io_req_cache_free(ctx);
 	kfree(ctx);
 }
 
-- 
2.24.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 14/17] io_uring: provide FIFO ordering for task_work
  2021-02-10  0:03 [PATCH RFC 00/17] playing around req alloc Pavel Begunkov
                   ` (12 preceding siblings ...)
  2021-02-10  0:03 ` [PATCH 13/17] io_uring: use persistent request cache Pavel Begunkov
@ 2021-02-10  0:03 ` Pavel Begunkov
  2021-02-10  0:03 ` [PATCH 15/17] io_uring: enable req cache for task_work items Pavel Begunkov
                   ` (3 subsequent siblings)
  17 siblings, 0 replies; 27+ messages in thread
From: Pavel Begunkov @ 2021-02-10  0:03 UTC (permalink / raw)
  To: Jens Axboe, io-uring

From: Jens Axboe <axboe@kernel.dk>

task_work is a LIFO list, due to how it's implemented as a lockless
list. For long chains of task_work, this can be problematic as the
first entry added is the last one processed. Alternatively, we'd waste
a lot of CPU cycles reversing this list.

Wrap the task_work so we have a single task_work entry per task per
ctx, and use that to run it in the right order.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 fs/io-wq.h               |   9 ----
 fs/io_uring.c            | 101 ++++++++++++++++++++++++++++++++++++---
 include/linux/io_uring.h |  14 ++++++
 3 files changed, 108 insertions(+), 16 deletions(-)

diff --git a/fs/io-wq.h b/fs/io-wq.h
index e37a0f217cc8..096f1021018e 100644
--- a/fs/io-wq.h
+++ b/fs/io-wq.h
@@ -27,15 +27,6 @@ enum io_wq_cancel {
 	IO_WQ_CANCEL_NOTFOUND,	/* work not found */
 };
 
-struct io_wq_work_node {
-	struct io_wq_work_node *next;
-};
-
-struct io_wq_work_list {
-	struct io_wq_work_node *first;
-	struct io_wq_work_node *last;
-};
-
 static inline void wq_list_add_after(struct io_wq_work_node *node,
 				     struct io_wq_work_node *pos,
 				     struct io_wq_work_list *list)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 58f150680c05..1d55ff827242 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -721,6 +721,11 @@ struct async_poll {
 	struct io_poll_iocb	*double_poll;
 };
 
+struct io_task_work {
+	struct io_wq_work_node	node;
+	task_work_func_t	func;
+};
+
 /*
  * NOTE! Each of the iocb union members has the file pointer
  * as the first entry in their struct definition. So you can
@@ -779,7 +784,10 @@ struct io_kiocb {
 	 * 2. to track reqs with ->files (see io_op_def::file_table)
 	 */
 	struct list_head		inflight_entry;
-	struct callback_head		task_work;
+	union {
+		struct io_task_work	io_task_work;
+		struct callback_head	task_work;
+	};
 	/* for polled requests, i.e. IORING_OP_POLL_ADD and async armed poll */
 	struct hlist_node		hash_node;
 	struct async_poll		*apoll;
@@ -2130,6 +2138,81 @@ static inline struct io_kiocb *io_req_find_next(struct io_kiocb *req)
 	return __io_req_find_next(req);
 }
 
+static bool __tctx_task_work(struct io_uring_task *tctx)
+{
+	struct io_wq_work_list list;
+	struct io_wq_work_node *node;
+
+	if (wq_list_empty(&tctx->task_list))
+		return false;
+
+	spin_lock(&tctx->task_lock);
+	list = tctx->task_list;
+	INIT_WQ_LIST(&tctx->task_list);
+	spin_unlock(&tctx->task_lock);
+
+	node = list.first;
+	while (node) {
+		struct io_wq_work_node *next = node->next;
+		struct io_kiocb *req;
+
+		req = container_of(node, struct io_kiocb, io_task_work.node);
+		req->task_work.func(&req->task_work);
+		node = next;
+	}
+
+	return list.first != NULL;
+}
+
+static void tctx_task_work(struct callback_head *cb)
+{
+	struct io_uring_task *tctx = container_of(cb, struct io_uring_task, task_work);
+
+	while (__tctx_task_work(tctx))
+		cond_resched();
+
+	clear_bit(0, &tctx->task_state);
+}
+
+static int io_task_work_add(struct task_struct *tsk, struct io_kiocb *req,
+			    enum task_work_notify_mode notify)
+{
+	struct io_uring_task *tctx = tsk->io_uring;
+	struct io_wq_work_node *node, *prev;
+	int ret;
+
+	WARN_ON_ONCE(!tctx);
+
+	spin_lock(&tctx->task_lock);
+	wq_list_add_tail(&req->io_task_work.node, &tctx->task_list);
+	spin_unlock(&tctx->task_lock);
+
+	/* task_work already pending, we're done */
+	if (test_bit(0, &tctx->task_state) ||
+	    test_and_set_bit(0, &tctx->task_state))
+		return 0;
+
+	if (!task_work_add(tsk, &tctx->task_work, notify))
+		return 0;
+
+	/*
+	 * Slow path - we failed, find and delete work. if the work is not
+	 * in the list, it got run and we're fine.
+	 */
+	ret = 0;
+	spin_lock(&tctx->task_lock);
+	wq_list_for_each(node, prev, &tctx->task_list) {
+		if (&req->io_task_work.node == node) {
+			wq_list_del(&tctx->task_list, node, prev);
+			ret = 1;
+			break;
+		}
+	}
+	spin_unlock(&tctx->task_lock);
+	clear_bit(0, &tctx->task_state);
+	return ret;
+}
+
 static int io_req_task_work_add(struct io_kiocb *req)
 {
 	struct task_struct *tsk = req->task;
@@ -2150,7 +2233,7 @@ static int io_req_task_work_add(struct io_kiocb *req)
 	if (!(ctx->flags & IORING_SETUP_SQPOLL))
 		notify = TWA_SIGNAL;
 
-	ret = task_work_add(tsk, &req->task_work, notify);
+	ret = io_task_work_add(tsk, req, notify);
 	if (!ret)
 		wake_up_process(tsk);
 
@@ -2158,7 +2241,7 @@ static int io_req_task_work_add(struct io_kiocb *req)
 }
 
 static void io_req_task_work_add_fallback(struct io_kiocb *req,
-					  void (*cb)(struct callback_head *))
+					  task_work_func_t cb)
 {
 	struct task_struct *tsk = io_wq_get_task(req->ctx->io_wq);
 
@@ -2217,7 +2300,7 @@ static void io_req_task_queue(struct io_kiocb *req)
 {
 	int ret;
 
-	init_task_work(&req->task_work, io_req_task_submit);
+	req->task_work.func = io_req_task_submit;
 	percpu_ref_get(&req->ctx->refs);
 
 	ret = io_req_task_work_add(req);
@@ -2348,7 +2431,7 @@ static void io_free_req_deferred(struct io_kiocb *req)
 {
 	int ret;
 
-	init_task_work(&req->task_work, io_put_req_deferred_cb);
+	req->task_work.func = io_put_req_deferred_cb;
 	ret = io_req_task_work_add(req);
 	if (unlikely(ret))
 		io_req_task_work_add_fallback(req, io_put_req_deferred_cb);
@@ -3393,7 +3476,7 @@ static int io_async_buf_func(struct wait_queue_entry *wait, unsigned mode,
 	req->rw.kiocb.ki_flags &= ~IOCB_WAITQ;
 	list_del_init(&wait->entry);
 
-	init_task_work(&req->task_work, io_req_task_submit);
+	req->task_work.func = io_req_task_submit;
 	percpu_ref_get(&req->ctx->refs);
 
 	/* submit ref gets dropped, acquire a new one */
@@ -5090,7 +5173,7 @@ static int __io_async_wake(struct io_kiocb *req, struct io_poll_iocb *poll,
 	list_del_init(&poll->wait.entry);
 
 	req->result = mask;
-	init_task_work(&req->task_work, func);
+	req->task_work.func = func;
 	percpu_ref_get(&req->ctx->refs);
 
 	/*
@@ -8093,6 +8176,10 @@ static int io_uring_alloc_task_context(struct task_struct *task)
 	io_init_identity(&tctx->__identity);
 	tctx->identity = &tctx->__identity;
 	task->io_uring = tctx;
+	spin_lock_init(&tctx->task_lock);
+	INIT_WQ_LIST(&tctx->task_list);
+	tctx->task_state = 0;
+	init_task_work(&tctx->task_work, tctx_task_work);
 	return 0;
 }
 
diff --git a/include/linux/io_uring.h b/include/linux/io_uring.h
index 35b2d845704d..2eb6d19de336 100644
--- a/include/linux/io_uring.h
+++ b/include/linux/io_uring.h
@@ -22,6 +22,15 @@ struct io_identity {
 	refcount_t			count;
 };
 
+struct io_wq_work_node {
+	struct io_wq_work_node *next;
+};
+
+struct io_wq_work_list {
+	struct io_wq_work_node *first;
+	struct io_wq_work_node *last;
+};
+
 struct io_uring_task {
 	/* submission side */
 	struct xarray		xa;
@@ -32,6 +41,11 @@ struct io_uring_task {
 	struct io_identity	*identity;
 	atomic_t		in_idle;
 	bool			sqpoll;
+
+	spinlock_t		task_lock;
+	struct io_wq_work_list	task_list;
+	unsigned long		task_state;
+	struct callback_head	task_work;
 };
 
 #if defined(CONFIG_IO_URING)
-- 
2.24.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 15/17] io_uring: enable req cache for task_work items
  2021-02-10  0:03 [PATCH RFC 00/17] playing around req alloc Pavel Begunkov
                   ` (13 preceding siblings ...)
  2021-02-10  0:03 ` [PATCH 14/17] io_uring: provide FIFO ordering for task_work Pavel Begunkov
@ 2021-02-10  0:03 ` Pavel Begunkov
  2021-02-10  0:03 ` [PATCH 16/17] io_uring: take comp_state from ctx Pavel Begunkov
                   ` (2 subsequent siblings)
  17 siblings, 0 replies; 27+ messages in thread
From: Pavel Begunkov @ 2021-02-10  0:03 UTC (permalink / raw)
  To: Jens Axboe, io-uring

From: Jens Axboe <axboe@kernel.dk>

task_work is run without utilizing the req alloc cache, so any deferred
items don't get to take advantage of either the alloc or free side of it.
With task_work now being wrapped by io_uring, we can use the ctx
completion state to take advantage of both the req cache and the
completion flush batching.

With this, the only request type that cannot take advantage of the req
cache is IRQ-driven IO for regular files / block devices. Anything else,
including IOPOLL polled IO to those same types, will take advantage of it.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 fs/io_uring.c | 22 +++++++++++++++++++++-
 1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 1d55ff827242..f58a5459d6e3 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -1051,6 +1051,8 @@ static int io_setup_async_rw(struct io_kiocb *req, const struct iovec *iovec,
 			     const struct iovec *fast_iov,
 			     struct iov_iter *iter, bool force);
 static void io_req_task_queue(struct io_kiocb *req);
+static void io_submit_flush_completions(struct io_comp_state *cs,
+					struct io_ring_ctx *ctx);
 
 static struct kmem_cache *req_cachep;
 
@@ -2140,6 +2142,7 @@ static inline struct io_kiocb *io_req_find_next(struct io_kiocb *req)
 
 static bool __tctx_task_work(struct io_uring_task *tctx)
 {
+	struct io_ring_ctx *ctx = NULL;
 	struct io_wq_work_list list;
 	struct io_wq_work_node *node;
 
@@ -2154,11 +2157,28 @@ static bool __tctx_task_work(struct io_uring_task *tctx)
 	node = list.first;
 	while (node) {
 		struct io_wq_work_node *next = node->next;
+		struct io_ring_ctx *this_ctx;
 		struct io_kiocb *req;
 
 		req = container_of(node, struct io_kiocb, io_task_work.node);
+		this_ctx = req->ctx;
 		req->task_work.func(&req->task_work);
 		node = next;
+
+		if (!ctx) {
+			ctx = this_ctx;
+		} else if (ctx != this_ctx) {
+			mutex_lock(&ctx->uring_lock);
+			io_submit_flush_completions(&ctx->submit_state.comp, ctx);
+			mutex_unlock(&ctx->uring_lock);
+			ctx = this_ctx;
+		}
+	}
+
+	if (ctx && ctx->submit_state.comp.nr) {
+		mutex_lock(&ctx->uring_lock);
+		io_submit_flush_completions(&ctx->submit_state.comp, ctx);
+		mutex_unlock(&ctx->uring_lock);
 	}
 
 	return list.first != NULL;
@@ -2281,7 +2301,7 @@ static void __io_req_task_submit(struct io_kiocb *req)
 	if (!ctx->sqo_dead &&
 	    !__io_sq_thread_acquire_mm(ctx) &&
 	    !__io_sq_thread_acquire_files(ctx))
-		__io_queue_sqe(req, NULL);
+		__io_queue_sqe(req, &ctx->submit_state.comp);
 	else
 		__io_req_task_cancel(req, -EFAULT);
 	mutex_unlock(&ctx->uring_lock);
-- 
2.24.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 16/17] io_uring: take comp_state from ctx
  2021-02-10  0:03 [PATCH RFC 00/17] playing around req alloc Pavel Begunkov
                   ` (14 preceding siblings ...)
  2021-02-10  0:03 ` [PATCH 15/17] io_uring: enable req cache for task_work items Pavel Begunkov
@ 2021-02-10  0:03 ` Pavel Begunkov
  2021-02-10  0:03 ` [PATCH 17/17] io_uring: defer flushing cached reqs Pavel Begunkov
  2021-02-10  2:08 ` [PATCH RFC 00/17] playing around req alloc Jens Axboe
  17 siblings, 0 replies; 27+ messages in thread
From: Pavel Begunkov @ 2021-02-10  0:03 UTC (permalink / raw)
  To: Jens Axboe, io-uring

__io_queue_sqe() is always called with a non-NULL comp_state, which is
taken directly from the context. Don't pass it around; infer it from ctx.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 fs/io_uring.c | 37 ++++++++++++++++++-------------------
 1 file changed, 18 insertions(+), 19 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index f58a5459d6e3..64d3f3e2e93d 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -1042,7 +1042,7 @@ static int __io_sqe_files_update(struct io_ring_ctx *ctx,
 static void __io_clean_op(struct io_kiocb *req);
 static struct file *io_file_get(struct io_submit_state *state,
 				struct io_kiocb *req, int fd, bool fixed);
-static void __io_queue_sqe(struct io_kiocb *req, struct io_comp_state *cs);
+static void __io_queue_sqe(struct io_kiocb *req);
 static void io_rsrc_put_work(struct work_struct *work);
 
 static int io_import_iovec(int rw, struct io_kiocb *req, struct iovec **iovec,
@@ -2301,7 +2301,7 @@ static void __io_req_task_submit(struct io_kiocb *req)
 	if (!ctx->sqo_dead &&
 	    !__io_sq_thread_acquire_mm(ctx) &&
 	    !__io_sq_thread_acquire_files(ctx))
-		__io_queue_sqe(req, &ctx->submit_state.comp);
+		__io_queue_sqe(req);
 	else
 		__io_req_task_cancel(req, -EFAULT);
 	mutex_unlock(&ctx->uring_lock);
@@ -6558,14 +6558,12 @@ static struct io_kiocb *io_prep_linked_timeout(struct io_kiocb *req)
 	return nxt;
 }
 
-static void __io_queue_sqe(struct io_kiocb *req, struct io_comp_state *cs)
+static void __io_queue_sqe(struct io_kiocb *req)
 {
 	struct io_kiocb *linked_timeout;
 	const struct cred *old_creds = NULL;
-	int ret, issue_flags = IO_URING_F_NONBLOCK;
+	int ret;
 
-	if (cs)
-		issue_flags |= IO_URING_F_COMPLETE_DEFER;
 again:
 	linked_timeout = io_prep_linked_timeout(req);
 
@@ -6580,7 +6578,7 @@ static void __io_queue_sqe(struct io_kiocb *req, struct io_comp_state *cs)
 			old_creds = override_creds(req->work.identity->creds);
 	}
 
-	ret = io_issue_sqe(req, issue_flags);
+	ret = io_issue_sqe(req, IO_URING_F_NONBLOCK|IO_URING_F_COMPLETE_DEFER);
 
 	/*
 	 * We async punt it if the file wasn't marked NOWAIT, or if the file
@@ -6600,9 +6598,12 @@ static void __io_queue_sqe(struct io_kiocb *req, struct io_comp_state *cs)
 	} else if (likely(!ret)) {
 		/* drop submission reference */
 		if (req->flags & REQ_F_COMPLETE_INLINE) {
+			struct io_ring_ctx *ctx = req->ctx;
+			struct io_comp_state *cs = &ctx->submit_state.comp;
+
 			cs->reqs[cs->nr++] = req;
 			if (cs->nr == IO_COMPL_BATCH)
-				io_submit_flush_completions(cs, req->ctx);
+				io_submit_flush_completions(cs, ctx);
 			req = NULL;
 		} else {
 			req = io_put_req_find_next(req);
@@ -6628,8 +6629,7 @@ static void __io_queue_sqe(struct io_kiocb *req, struct io_comp_state *cs)
 		revert_creds(old_creds);
 }
 
-static void io_queue_sqe(struct io_kiocb *req, const struct io_uring_sqe *sqe,
-			 struct io_comp_state *cs)
+static void io_queue_sqe(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 {
 	int ret;
 
@@ -6654,18 +6654,17 @@ static void io_queue_sqe(struct io_kiocb *req, const struct io_uring_sqe *sqe,
 			if (unlikely(ret))
 				goto fail_req;
 		}
-		__io_queue_sqe(req, cs);
+		__io_queue_sqe(req);
 	}
 }
 
-static inline void io_queue_link_head(struct io_kiocb *req,
-				      struct io_comp_state *cs)
+static inline void io_queue_link_head(struct io_kiocb *req)
 {
 	if (unlikely(req->flags & REQ_F_FAIL_LINK)) {
 		io_put_req(req);
 		io_req_complete(req, -ECANCELED);
 	} else
-		io_queue_sqe(req, NULL, cs);
+		io_queue_sqe(req, NULL);
 }
 
 struct io_submit_link {
@@ -6674,7 +6673,7 @@ struct io_submit_link {
 };
 
 static int io_submit_sqe(struct io_kiocb *req, const struct io_uring_sqe *sqe,
-			 struct io_submit_link *link, struct io_comp_state *cs)
+			 struct io_submit_link *link)
 {
 	struct io_ring_ctx *ctx = req->ctx;
 	int ret;
@@ -6712,7 +6711,7 @@ static int io_submit_sqe(struct io_kiocb *req, const struct io_uring_sqe *sqe,
 
 		/* last request of a link, enqueue the link */
 		if (!(req->flags & (REQ_F_LINK | REQ_F_HARDLINK))) {
-			io_queue_link_head(head, cs);
+			io_queue_link_head(head);
 			link->head = NULL;
 		}
 	} else {
@@ -6727,7 +6726,7 @@ static int io_submit_sqe(struct io_kiocb *req, const struct io_uring_sqe *sqe,
 			link->head = req;
 			link->last = req;
 		} else {
-			io_queue_sqe(req, sqe, cs);
+			io_queue_sqe(req, sqe);
 		}
 	}
 
@@ -6968,7 +6967,7 @@ static int io_submit_sqes(struct io_ring_ctx *ctx, unsigned int nr)
 
 		trace_io_uring_submit_sqe(ctx, req->opcode, req->user_data,
 					true, ctx->flags & IORING_SETUP_SQPOLL);
-		err = io_submit_sqe(req, sqe, &link, &ctx->submit_state.comp);
+		err = io_submit_sqe(req, sqe, &link);
 		if (err)
 			goto fail_req;
 	}
@@ -6983,7 +6982,7 @@ static int io_submit_sqes(struct io_ring_ctx *ctx, unsigned int nr)
 		put_task_struct_many(current, unused);
 	}
 	if (link.head)
-		io_queue_link_head(link.head, &ctx->submit_state.comp);
+		io_queue_link_head(link.head);
 	io_submit_state_end(&ctx->submit_state, ctx);
 
 	 /* Commit SQ ring head once we've consumed and submitted all SQEs */
-- 
2.24.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 17/17] io_uring: defer flushing cached reqs
  2021-02-10  0:03 [PATCH RFC 00/17] playing around req alloc Pavel Begunkov
                   ` (15 preceding siblings ...)
  2021-02-10  0:03 ` [PATCH 16/17] io_uring: take comp_state from ctx Pavel Begunkov
@ 2021-02-10  0:03 ` Pavel Begunkov
  2021-02-10  2:10   ` Jens Axboe
  2021-02-10  2:08 ` [PATCH RFC 00/17] playing around req alloc Jens Axboe
  17 siblings, 1 reply; 27+ messages in thread
From: Pavel Begunkov @ 2021-02-10  0:03 UTC (permalink / raw)
  To: Jens Axboe, io-uring

While there are requests in the allocation cache, use them; only once
those have run out go for the memory stashed in comp.free_list. As list
manipulations are generally heavy and not cache friendly, flush the
whole list, or as much of it as possible, in one go.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 fs/io_uring.c | 29 +++++++++++++++++++----------
 1 file changed, 19 insertions(+), 10 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 64d3f3e2e93d..17194e0d62ff 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -1953,25 +1953,34 @@ static inline void io_req_complete(struct io_kiocb *req, long res)
 	__io_req_complete(req, 0, res, 0);
 }
 
+static void io_flush_cached_reqs(struct io_submit_state *state)
+{
+	do {
+		struct io_kiocb *req = list_first_entry(&state->comp.free_list,
+						struct io_kiocb, compl.list);
+
+		list_del(&req->compl.list);
+		state->reqs[state->free_reqs++] = req;
+		if (state->free_reqs == ARRAY_SIZE(state->reqs))
+			break;
+	} while (!list_empty(&state->comp.free_list));
+}
+
 static struct io_kiocb *io_alloc_req(struct io_ring_ctx *ctx)
 {
 	struct io_submit_state *state = &ctx->submit_state;
 
 	BUILD_BUG_ON(IO_REQ_ALLOC_BATCH > ARRAY_SIZE(state->reqs));
 
-	if (!list_empty(&state->comp.free_list)) {
-		struct io_kiocb *req;
-
-		req = list_first_entry(&state->comp.free_list, struct io_kiocb,
-					compl.list);
-		list_del(&req->compl.list);
-		return req;
-	}
-
 	if (!state->free_reqs) {
 		gfp_t gfp = GFP_KERNEL | __GFP_NOWARN;
 		int ret;
 
+		if (!list_empty(&state->comp.free_list)) {
+			io_flush_cached_reqs(state);
+			goto out;
+		}
+
 		ret = kmem_cache_alloc_bulk(req_cachep, gfp, IO_REQ_ALLOC_BATCH,
 					    state->reqs);
 
@@ -1987,7 +1996,7 @@ static struct io_kiocb *io_alloc_req(struct io_ring_ctx *ctx)
 		}
 		state->free_reqs = ret;
 	}
-
+out:
 	state->free_reqs--;
 	return state->reqs[state->free_reqs];
 }
-- 
2.24.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH RFC 00/17] playing around req alloc
  2021-02-10  0:03 [PATCH RFC 00/17] playing around req alloc Pavel Begunkov
                   ` (16 preceding siblings ...)
  2021-02-10  0:03 ` [PATCH 17/17] io_uring: defer flushing cached reqs Pavel Begunkov
@ 2021-02-10  2:08 ` Jens Axboe
  2021-02-10  3:14   ` Pavel Begunkov
  17 siblings, 1 reply; 27+ messages in thread
From: Jens Axboe @ 2021-02-10  2:08 UTC (permalink / raw)
  To: Pavel Begunkov, io-uring

On 2/9/21 5:03 PM, Pavel Begunkov wrote:
> Unfolding previous ideas on persistent req caches. 4-7 including
> slashed ~20% of overhead for nops benchmark, haven't done benchmarking
> personally for this yet, but according to perf should be ~30-40% in
> total. That's for IOPOLL + inline completion cases, obviously w/o
> async/IRQ completions.

And task_work, which is sort-of inline.

> Jens,
> 1. 11/17 removes deallocations on end of submit_sqes. Looks you
> forgot or just didn't do that.
> 
> 2. lists are slow and not great cache-wise, that why at I want at least
> a combined approach from 12/17.

This is only true if you're browsing a full list. If you do add-to-front
for a cache, and remove-from-front, then the cache footprint of lists is
good.
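
Roughly this pattern, as a sketch (helper names made up, but the
compl.list hooking mirrors what the free_list in this series does):

	/* add-to-front on free: req was just used, its lines are hot */
	static void req_cache_put(struct list_head *free_list,
				  struct io_kiocb *req)
	{
		list_add(&req->compl.list, free_list);
	}

	/* remove-from-front on alloc: reuse the most recently freed req */
	static struct io_kiocb *req_cache_get(struct list_head *free_list)
	{
		struct io_kiocb *req;

		if (list_empty(free_list))
			return NULL;
		req = list_first_entry(free_list, struct io_kiocb,
					compl.list);
		list_del(&req->compl.list);
		return req;
	}

Only the entries at the head are ever touched, no matter how long the
list grows.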

> 3. Instead of lists in "use persistent request cache" I had in mind a
> slightly different way: to grow the req alloc cache to 32-128 (or hint
> from the userspace), batch-alloc by 8 as before, and recycle _all_ reqs
> right into it. If  overflows, do kfree().
> It should give probabilistically high hit rate, amortising out most of
> allocations. Pros: it doesn't grow ~infinitely as lists can. Cons: there
> are always counter examples. But as I don't have numbers to back it, I
> took your implementation. Maybe, we'll reconsider later.

It shouldn't grow bigger than what was used, but the downside is that
it will grow as big as the biggest usage ever. We could prune, if need
be, of course.

As far as I'm concerned, the hint from user space is the submit count.

> I'll revise tomorrow on a fresh head + do some performance testing,
> and is leaving it RFC until then.

I'll look too and test this, thanks!

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 17/17] io_uring: defer flushing cached reqs
  2021-02-10  0:03 ` [PATCH 17/17] io_uring: defer flushing cached reqs Pavel Begunkov
@ 2021-02-10  2:10   ` Jens Axboe
  0 siblings, 0 replies; 27+ messages in thread
From: Jens Axboe @ 2021-02-10  2:10 UTC (permalink / raw)
  To: Pavel Begunkov, io-uring

On 2/9/21 5:03 PM, Pavel Begunkov wrote:
> While there are requests in the allocation cache, use them; only once
> those have run out go for the memory stashed in comp.free_list. As list
> manipulations are generally heavy and not cache friendly, flush the
> whole list, or as much of it as possible, in one go.
> 
> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
> ---
>  fs/io_uring.c | 29 +++++++++++++++++++----------
>  1 file changed, 19 insertions(+), 10 deletions(-)
> 
> diff --git a/fs/io_uring.c b/fs/io_uring.c
> index 64d3f3e2e93d..17194e0d62ff 100644
> --- a/fs/io_uring.c
> +++ b/fs/io_uring.c
> @@ -1953,25 +1953,34 @@ static inline void io_req_complete(struct io_kiocb *req, long res)
>  	__io_req_complete(req, 0, res, 0);
>  }
>  
> +static void io_flush_cached_reqs(struct io_submit_state *state)
> +{
> +	do {
> +		struct io_kiocb *req = list_first_entry(&state->comp.free_list,
> +						struct io_kiocb, compl.list);
> +
> +		list_del(&req->compl.list);
> +		state->reqs[state->free_reqs++] = req;
> +		if (state->free_reqs == ARRAY_SIZE(state->reqs))
> +			break;
> +	} while (!list_empty(&state->comp.free_list));
> +}
> +
>  static struct io_kiocb *io_alloc_req(struct io_ring_ctx *ctx)
>  {
>  	struct io_submit_state *state = &ctx->submit_state;
>  
>  	BUILD_BUG_ON(IO_REQ_ALLOC_BATCH > ARRAY_SIZE(state->reqs));
>  
> -	if (!list_empty(&state->comp.free_list)) {
> -		struct io_kiocb *req;
> -
> -		req = list_first_entry(&state->comp.free_list, struct io_kiocb,
> -					compl.list);
> -		list_del(&req->compl.list);
> -		return req;
> -	}
> -
>  	if (!state->free_reqs) {
>  		gfp_t gfp = GFP_KERNEL | __GFP_NOWARN;
>  		int ret;
>  
> +		if (!list_empty(&state->comp.free_list)) {
> +			io_flush_cached_reqs(state);
> +			goto out;
> +		}

I think that'd be cleaner as:

	if (io_flush_cached_reqs(state))
		goto got_req;

and have io_flush_cached_reqs() return true/false depending on what it did.
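
Something like the below, totally untested -- same loop as your version,
just with a return value for the caller to branch on:

	static bool io_flush_cached_reqs(struct io_submit_state *state)
	{
		struct io_kiocb *req = NULL;

		while (!list_empty(&state->comp.free_list)) {
			req = list_first_entry(&state->comp.free_list,
						struct io_kiocb, compl.list);
			list_del(&req->compl.list);
			state->reqs[state->free_reqs++] = req;
			if (state->free_reqs == ARRAY_SIZE(state->reqs))
				break;
		}
		/* true if we moved at least one req into state->reqs */
		return req != NULL;
	}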

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 13/17] io_uring: use persistent request cache
  2021-02-10  0:03 ` [PATCH 13/17] io_uring: use persistent request cache Pavel Begunkov
@ 2021-02-10  2:14   ` Jens Axboe
  0 siblings, 0 replies; 27+ messages in thread
From: Jens Axboe @ 2021-02-10  2:14 UTC (permalink / raw)
  To: Pavel Begunkov, io-uring

On 2/9/21 5:03 PM, Pavel Begunkov wrote:
> @@ -1343,10 +1343,7 @@ static struct io_ring_ctx *io_ring_ctx_alloc(struct io_uring_params *p)
>  	INIT_DELAYED_WORK(&ctx->rsrc_put_work, io_rsrc_put_work);
>  	init_llist_head(&ctx->rsrc_put_llist);
>  
> -	submit_state = &ctx->submit_state;
> -	submit_state->comp.nr = 0;
> -	submit_state->file_refs = 0;
> -	submit_state->free_reqs = 0;
> +	INIT_LIST_HEAD(&ctx->submit_state.comp.free_list);
>  	return ctx;
>  err:
>  	kfree(ctx->cancel_hash);

This just needs to be folded into "io_uring: don't reinit submit state every time"
as 'ctx' is zeroed on alloc. Except the list init, of course.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH RFC 00/17] playing around req alloc
  2021-02-10  2:08 ` [PATCH RFC 00/17] playing around req alloc Jens Axboe
@ 2021-02-10  3:14   ` Pavel Begunkov
  2021-02-10  3:23     ` Jens Axboe
  0 siblings, 1 reply; 27+ messages in thread
From: Pavel Begunkov @ 2021-02-10  3:14 UTC (permalink / raw)
  To: Jens Axboe, io-uring

On 10/02/2021 02:08, Jens Axboe wrote:
> On 2/9/21 5:03 PM, Pavel Begunkov wrote:
>> Unfolding previous ideas on persistent req caches. 4-7 including
>> slashed ~20% of overhead for nops benchmark, haven't done benchmarking
>> personally for this yet, but according to perf should be ~30-40% in
>> total. That's for IOPOLL + inline completion cases, obviously w/o
>> async/IRQ completions.
> 
> And task_work, which is sort-of inline.
> 
>> Jens,
>> 1. 11/17 removes deallocations on end of submit_sqes. Looks you
>> forgot or just didn't do that.

And without the patches I added, it wasn't even necessary, so
nevermind

>>
>> 2. lists are slow and not great cache-wise, that why at I want at least
>> a combined approach from 12/17.
> 
> This is only true if you're browsing a full list. If you do add-to-front
> for a cache, and remove-from-front, then the cache footprint of lists is
> good.

Ok, good point, but I still don't think it's great. E.g. 7/17 did improve
performance a bit for me, as I mentioned in the related RFC. And that
was for inline-completed nops, where we go over the list/array and
always touch all reqs.

> 
>> 3. Instead of lists in "use persistent request cache" I had in mind a
>> slightly different way: to grow the req alloc cache to 32-128 (or hint
>> from the userspace), batch-alloc by 8 as before, and recycle _all_ reqs
>> right into it. If  overflows, do kfree().
>> It should give probabilistically high hit rate, amortising out most of
>> allocations. Pros: it doesn't grow ~infinitely as lists can. Cons: there
>> are always counter examples. But as I don't have numbers to back it, I
>> took your implementation. Maybe, we'll reconsider later.
> 
> It shouldn't grow bigger than what was used, but the downside is that
> it will grow as big as the biggest usage ever. We could prune, if need
> be, of course.

Yeah, that was the point. But not a deal-breaker in either case.

> 
> As far as I'm concerned, the hint from user space is the submit count.

I mean a hint at setup time, like max QD, so we can allocate the req
cache accordingly. Not like it matters

> 
>> I'll revise tomorrow on a fresh head + do some performance testing,
>> and is leaving it RFC until then.
> 
> I'll look too and test this, thanks!
> 

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH RFC 00/17] playing around req alloc
  2021-02-10  3:14   ` Pavel Begunkov
@ 2021-02-10  3:23     ` Jens Axboe
  2021-02-10 11:53       ` Pavel Begunkov
  0 siblings, 1 reply; 27+ messages in thread
From: Jens Axboe @ 2021-02-10  3:23 UTC (permalink / raw)
  To: Pavel Begunkov, io-uring

On 2/9/21 8:14 PM, Pavel Begunkov wrote:
> On 10/02/2021 02:08, Jens Axboe wrote:
>> On 2/9/21 5:03 PM, Pavel Begunkov wrote:
>>> Unfolding previous ideas on persistent req caches. 4-7 including
>>> slashed ~20% of overhead for nops benchmark, haven't done benchmarking
>>> personally for this yet, but according to perf should be ~30-40% in
>>> total. That's for IOPOLL + inline completion cases, obviously w/o
>>> async/IRQ completions.
>>
>> And task_work, which is sort-of inline.
>>
>>> Jens,
>>> 1. 11/17 removes deallocations on end of submit_sqes. Looks you
>>> forgot or just didn't do that.
> 
> And without the patches I added, it wasn't even necessary, so
> nevermind

OK good, I was a bit confused about that one...

>>> 2. lists are slow and not great cache-wise, that why at I want at least
>>> a combined approach from 12/17.
>>
>> This is only true if you're browsing a full list. If you do add-to-front
>> for a cache, and remove-from-front, then the cache footprint of lists is
>> good.
> 
> Ok, good point, but I still don't think it's great. E.g. 7/17 did improve
> performance a bit for me, as I mentioned in the related RFC. And that
> was for inline-completed nops, where we go over the list/array and
> always touch all reqs.

Agree, array is always a bit better. Just saying that it's not a huge
deal unless you're traversing the list, in which case lists are indeed
horrible. But for popping off the first entry (or adding one), it's not
bad at all.

>>> 3. Instead of lists in "use persistent request cache" I had in mind a
>>> slightly different way: to grow the req alloc cache to 32-128 (or hint
>>> from the userspace), batch-alloc by 8 as before, and recycle _all_ reqs
>>> right into it. If  overflows, do kfree().
>>> It should give probabilistically high hit rate, amortising out most of
>>> allocations. Pros: it doesn't grow ~infinitely as lists can. Cons: there
>>> are always counter examples. But as I don't have numbers to back it, I
>>> took your implementation. Maybe, we'll reconsider later.
>>
>> It shouldn't grow bigger than what was used, but the downside is that
>> it will grow as big as the biggest usage ever. We could prune, if need
>> be, of course.
> 
> Yeah, that was the point. But not a deal-breaker in either case.

Agree

>> As far as I'm concerned, the hint from user space is the submit count.
> 
>> I mean a hint at setup time, like max QD, so we can allocate the req
>> cache accordingly. Not like it matters

I'd rather grow it dynamically; only the first few iterations will hit
the alloc. That's fine, and better than pre-populating. Assuming I
understood you correctly here...
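
Roughly what I have in mind (sketch only: the io_submit_state fields and
the batch size are illustrative, though kmem_cache_alloc_bulk() is the
real bulk allocation API):

#define IO_REQ_ALLOC_BATCH	8

static struct io_kiocb *io_alloc_req(struct io_submit_state *state)
{
	/* miss: bulk-allocate a small batch; afterwards the cache only
	 * grows through requests being recycled back into it */
	if (!state->free_reqs) {
		int ret = kmem_cache_alloc_bulk(req_cachep, GFP_KERNEL,
						IO_REQ_ALLOC_BATCH,
						state->reqs);
		if (unlikely(ret <= 0))
			return NULL;
		state->free_reqs = ret;
	}
	return state->reqs[--state->free_reqs];
}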

>>
>>> I'll revise tomorrow on a fresh head + do some performance testing,
>>> and is leaving it RFC until then.
>>
>> I'll look too and test this, thanks!

Tests out well for me with the suggested edits I made. nops are
massively improved, as suspected, but realistic workloads benefit
nicely too.

I'll send out a few patches I have on top tomorrow. Not fixes, but just
further improvements/features/accounting.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH RFC 00/17] playing around req alloc
  2021-02-10  3:23     ` Jens Axboe
@ 2021-02-10 11:53       ` Pavel Begunkov
  2021-02-10 14:27         ` Jens Axboe
  0 siblings, 1 reply; 27+ messages in thread
From: Pavel Begunkov @ 2021-02-10 11:53 UTC (permalink / raw)
  To: Jens Axboe, io-uring

On 10/02/2021 03:23, Jens Axboe wrote:
> On 2/9/21 8:14 PM, Pavel Begunkov wrote:
>> On 10/02/2021 02:08, Jens Axboe wrote:
>>> On 2/9/21 5:03 PM, Pavel Begunkov wrote:
>>>> Unfolding previous ideas on persistent req caches. 4-7 including
>>>> slashed ~20% of overhead for nops benchmark, haven't done benchmarking
>>>> personally for this yet, but according to perf should be ~30-40% in
>>>> total. That's for IOPOLL + inline completion cases, obviously w/o
>>>> async/IRQ completions.
>>>
>>> And task_work, which is sort-of inline.
>>>
>>>> Jens,
>>>> 1. 11/17 removes deallocations on end of submit_sqes. Looks you
>>>> forgot or just didn't do that.
>>
>> And without the patches I added, it wasn't even necessary, so
>> nevermind
> 
> OK good, I was a bit confused about that one...
> 
>>>> 2. lists are slow and not great cache-wise, that why at I want at least
>>>> a combined approach from 12/17.
>>>
>>> This is only true if you're browsing a full list. If you do add-to-front
>>> for a cache, and remove-from-front, then cache footprint of lists are
>>> good.
>>
>> Ok, good point, but still don't think it's great. E.g. 7/17 did improve
>> performance a bit for me, as I mentioned in the related RFC. And that
>> was for inline-completed nops, and going over the list/array and
>> always touching all reqs.
> 
> Agree, array is always a bit better. Just saying that it's not a huge
> deal unless you're traversing the list, in which case lists are indeed
> horrible. But for popping off the first entry (or adding one), it's not
> bad at all.

btw, looks like it can be replaced with a singly-linked list (stack).
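
E.g. (sketch, names invented):

struct req_stack_node {
	struct req_stack_node *next;
};

/* push/pop are each a single pointer update, no prev link to maintain */
static void req_stack_push(struct req_stack_node **top,
			   struct req_stack_node *req)
{
	req->next = *top;
	*top = req;
}

static struct req_stack_node *req_stack_pop(struct req_stack_node **top)
{
	struct req_stack_node *req = *top;

	if (req)
		*top = req->next;
	return req;
}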

> 
>>>> 3. Instead of lists in "use persistent request cache" I had in mind a
>>>> slightly different way: to grow the req alloc cache to 32-128 (or hint
>>>> from the userspace), batch-alloc by 8 as before, and recycle _all_ reqs
>>>> right into it. If  overflows, do kfree().
>>>> It should give probabilistically high hit rate, amortising out most of
>>>> allocations. Pros: it doesn't grow ~infinitely as lists can. Cons: there
>>>> are always counter examples. But as I don't have numbers to back it, I
>>>> took your implementation. Maybe, we'll reconsider later.
>>>
>>> It shouldn't grow bigger than what was used, but the downside is that
>>> it will grow as big as the biggest usage ever. We could prune, if need
>>> be, of course.
>>
>> Yeah, that was the point. But not a deal-breaker in either case.
> 
> Agree
> 
>>> As far as I'm concerned, the hint from user space is the submit count.
>>
>> I mean hint on setup, like max QD, then we can allocate req cache
>> accordingly. Not like it matters
> 
> I'd rather grow it dynamically, only the first few iterations will hit
> the alloc. Which is fine, and better than pre-populating. Assuming I
> understood you correctly here...

I guess not; it's not about the number of requests per se, but the
space in the alloc cache. Like this:

struct io_submit_state {
	...
	/* illustrative: array sized from a setup-time hint */
	void			*reqs[userspace_hint];
};
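
Or, with the array allocated dynamically rather than embedded, sized
once at setup, e.g. (sketch; the field and helper names are hypothetical):

#define IO_REQ_CACHE_MAX	128U

static int io_submit_state_alloc(struct io_submit_state *state,
				 unsigned int hint)
{
	/* clamp the userspace hint; 0 means "no hint", pick a default */
	state->reqs_size = clamp(hint ?: 32U, 8U, IO_REQ_CACHE_MAX);
	state->reqs = kcalloc(state->reqs_size, sizeof(void *), GFP_KERNEL);
	return state->reqs ? 0 : -ENOMEM;
}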

> 
>>>
>>>> I'll revise tomorrow on a fresh head + do some performance testing,
>>>> and is leaving it RFC until then.
>>>
>>> I'll look too and test this, thanks!
> 
> Tests out good for me with the suggested edits I made. nops are
> massively improved, as suspected. But also realistic workloads benefit
> nicely.
> 
> I'll send out a few patches I have on top tomorrow. Not fixes, but just
> further improvements/features/accounting.

Sounds great!
btw, "lock_free_list" which is not a "lock-free" list can be
confusing, I'd suggest to rename it to free_list_lock.

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 03/17] io_uring: don't propagate io_comp_state
  2021-02-10  0:03 ` [PATCH 03/17] io_uring: don't propagate io_comp_state Pavel Begunkov
@ 2021-02-10 14:00   ` Pavel Begunkov
  2021-02-10 14:27     ` Jens Axboe
  0 siblings, 1 reply; 27+ messages in thread
From: Pavel Begunkov @ 2021-02-10 14:00 UTC (permalink / raw)
  To: Jens Axboe, io-uring

On 10/02/2021 00:03, Pavel Begunkov wrote:
> There is no reason to drag io_comp_state into opcode handlers, we just
> need a flag and the actual work will be done in __io_queue_sqe().
> 
> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>

Jens, can you fold this in? It currently doesn't build for !CONFIG_NET


diff --git a/fs/io_uring.c b/fs/io_uring.c
index e5f9f04e2e2d..8fb1c845fa5b 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -5145,14 +5145,12 @@ static int io_sendmsg_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 	return -EOPNOTSUPP;
 }
 
-static int io_sendmsg(struct io_kiocb *req, unsigned int issue_flags,
-		      struct io_comp_state *cs)
+static int io_sendmsg(struct io_kiocb *req, unsigned int issue_flags)
 {
 	return -EOPNOTSUPP;
 }
 
-static int io_send(struct io_kiocb *req, unsigned int issue_flags,
-		   struct io_comp_state *cs)
+static int io_send(struct io_kiocb *req, unsigned int issue_flags)
 {
 	return -EOPNOTSUPP;
 }
@@ -5163,14 +5161,12 @@ static int io_recvmsg_prep(struct io_kiocb *req,
 	return -EOPNOTSUPP;
 }
 
-static int io_recvmsg(struct io_kiocb *req, unsigned int issue_flags,
-		      struct io_comp_state *cs)
+static int io_recvmsg(struct io_kiocb *req, unsigned int issue_flags)
 {
 	return -EOPNOTSUPP;
 }
 
-static int io_recv(struct io_kiocb *req, unsigned int issue_flags,
-		   struct io_comp_state *cs)
+static int io_recv(struct io_kiocb *req, unsigned int issue_flags)
 {
 	return -EOPNOTSUPP;
 }
@@ -5180,8 +5176,7 @@ static int io_accept_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 	return -EOPNOTSUPP;
 }
 
-static int io_accept(struct io_kiocb *req, unsigned int issue_flags,
-		     struct io_comp_state *cs)
+static int io_accept(struct io_kiocb *req, unsigned int issue_flags)
 {
 	return -EOPNOTSUPP;
 }
@@ -5191,8 +5186,7 @@ static int io_connect_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 	return -EOPNOTSUPP;
 }
 
-static int io_connect(struct io_kiocb *req, unsigned int issue_flags,
-		      struct io_comp_state *cs)
+static int io_connect(struct io_kiocb *req, unsigned int issue_flags)
 {
 	return -EOPNOTSUPP;
 }
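
For reference, the shape this takes on the CONFIG_NET side is roughly
the following (approximate sketch of the series, not the exact code):

static void __io_req_complete(struct io_kiocb *req, unsigned int issue_flags,
			      long res, unsigned int cflags)
{
	if (issue_flags & IO_URING_F_COMPLETE_DEFER) {
		/* only mark the req; __io_queue_sqe() batches the CQE */
		io_req_complete_state(req, res, cflags);
	} else {
		io_req_complete_post(req, res, cflags);
	}
}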

-- 
Pavel Begunkov

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH RFC 00/17] playing around req alloc
  2021-02-10 11:53       ` Pavel Begunkov
@ 2021-02-10 14:27         ` Jens Axboe
  0 siblings, 0 replies; 27+ messages in thread
From: Jens Axboe @ 2021-02-10 14:27 UTC (permalink / raw)
  To: Pavel Begunkov, io-uring

On 2/10/21 4:53 AM, Pavel Begunkov wrote:
> On 10/02/2021 03:23, Jens Axboe wrote:
>> On 2/9/21 8:14 PM, Pavel Begunkov wrote:
>>> On 10/02/2021 02:08, Jens Axboe wrote:
>>>> On 2/9/21 5:03 PM, Pavel Begunkov wrote:
>>>>> Unfolding previous ideas on persistent req caches. 4-7 including
>>>>> slashed ~20% of overhead for nops benchmark, haven't done benchmarking
>>>>> personally for this yet, but according to perf should be ~30-40% in
>>>>> total. That's for IOPOLL + inline completion cases, obviously w/o
>>>>> async/IRQ completions.
>>>>
>>>> And task_work, which is sort-of inline.
>>>>
>>>>> Jens,
>>>>> 1. 11/17 removes deallocations on end of submit_sqes. Looks you
>>>>> forgot or just didn't do that.
>>>
>>> And without the patches I added, it wasn't even necessary, so
>>> nevermind
>>
>> OK good, I was a bit confused about that one...
>>
>>>>> 2. lists are slow and not great cache-wise, that why at I want at least
>>>>> a combined approach from 12/17.
>>>>
>>>> This is only true if you're browsing a full list. If you do add-to-front
>>>> for a cache, and remove-from-front, then cache footprint of lists are
>>>> good.
>>>
>>> Ok, good point, but still don't think it's great. E.g. 7/17 did improve
>>> performance a bit for me, as I mentioned in the related RFC. And that
>>> was for inline-completed nops, and going over the list/array and
>>> always touching all reqs.
>>
>> Agree, array is always a bit better. Just saying that it's not a huge
>> deal unless you're traversing the list, in which case lists are indeed
>> horrible. But for popping off the first entry (or adding one), it's not
>> bad at all.
> 
> btw, looks can be replaced with a singly-linked list (stack).

It could, yes.

>>>>> 3. Instead of lists in "use persistent request cache" I had in mind a
>>>>> slightly different way: to grow the req alloc cache to 32-128 (or hint
>>>>> from the userspace), batch-alloc by 8 as before, and recycle _all_ reqs
>>>>> right into it. If  overflows, do kfree().
>>>>> It should give probabilistically high hit rate, amortising out most of
>>>>> allocations. Pros: it doesn't grow ~infinitely as lists can. Cons: there
>>>>> are always counter examples. But as I don't have numbers to back it, I
>>>>> took your implementation. Maybe, we'll reconsider later.
>>>>
>>>> It shouldn't grow bigger than what was used, but the downside is that
>>>> it will grow as big as the biggest usage ever. We could prune, if need
>>>> be, of course.
>>>
>>> Yeah, that was the point. But not a deal-breaker in either case.
>>
>> Agree
>>
>>>> As far as I'm concerned, the hint from user space is the submit count.
>>>
>>> I mean hint on setup, like max QD, then we can allocate req cache
>>> accordingly. Not like it matters
>>
>> I'd rather grow it dynamically, only the first few iterations will hit
>> the alloc. Which is fine, and better than pre-populating. Assuming I
>> understood you correctly here...
> 
> I guess not, it's not about number of requests perse, but space in
> alloc cache. Like that
> 
> struct io_submit_state {
> 	...
> 	void			*reqs[userspace_hint];
> };
> 
>>
>>>>
>>>>> I'll revise tomorrow on a fresh head + do some performance testing,
>>>>> and is leaving it RFC until then.
>>>>
>>>> I'll look too and test this, thanks!
>>
>> Tests out good for me with the suggested edits I made. nops are
>> massively improved, as suspected. But also realistic workloads benefit
>> nicely.
>>
>> I'll send out a few patches I have on top tomorrow. Not fixes, but just
>> further improvements/features/accounting.
> 
> Sounds great!
> btw, "lock_free_list" which is not a "lock-free" list can be
> confusing, I'd suggest to rename it to free_list_lock.

Or maybe just locked_free_list would make it more understandable. The
current name is misleading.
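
Something like this at the definition would make the relationship
explicit (illustrative excerpt only, not the real struct layout):

#include <linux/list.h>
#include <linux/spinlock.h>

struct ctx_free_cache {
	spinlock_t		completion_lock;
	/* "locked": only touched under completion_lock, unlike the old
	 * "lock_free_list" name, which reads as "lock-free" */
	struct list_head	locked_free_list;
	unsigned int		locked_free_nr;
};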

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 03/17] io_uring: don't propagate io_comp_state
  2021-02-10 14:00   ` Pavel Begunkov
@ 2021-02-10 14:27     ` Jens Axboe
  0 siblings, 0 replies; 27+ messages in thread
From: Jens Axboe @ 2021-02-10 14:27 UTC (permalink / raw)
  To: Pavel Begunkov, io-uring

On 2/10/21 7:00 AM, Pavel Begunkov wrote:
> On 10/02/2021 00:03, Pavel Begunkov wrote:
>> There is no reason to drag io_comp_state into opcode handlers, we just
>> need a flag and the actual work will be done in __io_queue_sqe().
>>
>> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
> 
> Jens, can you fold it in? Doesn't build currently for !CONFIG_NET

Will do.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2021-02-10 14:28 UTC | newest]

Thread overview: 27+ messages
-- links below jump to the message on this page --
2021-02-10  0:03 [PATCH RFC 00/17] playing around req alloc Pavel Begunkov
2021-02-10  0:03 ` [PATCH 01/17] io_uring: replace force_nonblock with flags Pavel Begunkov
2021-02-10  0:03 ` [PATCH 02/17] io_uring: make op handlers always take issue flags Pavel Begunkov
2021-02-10  0:03 ` [PATCH 03/17] io_uring: don't propagate io_comp_state Pavel Begunkov
2021-02-10 14:00   ` Pavel Begunkov
2021-02-10 14:27     ` Jens Axboe
2021-02-10  0:03 ` [PATCH 04/17] io_uring: don't keep submit_state on stack Pavel Begunkov
2021-02-10  0:03 ` [PATCH 05/17] io_uring: remove ctx from comp_state Pavel Begunkov
2021-02-10  0:03 ` [PATCH 06/17] io_uring: don't reinit submit state every time Pavel Begunkov
2021-02-10  0:03 ` [PATCH 07/17] io_uring: replace list with array for compl batch Pavel Begunkov
2021-02-10  0:03 ` [PATCH 08/17] io_uring: submit-completion free batching Pavel Begunkov
2021-02-10  0:03 ` [PATCH 09/17] io_uring: remove fallback_req Pavel Begunkov
2021-02-10  0:03 ` [PATCH 10/17] io_uring: count ctx refs separately from reqs Pavel Begunkov
2021-02-10  0:03 ` [PATCH 11/17] io_uring: persistent req cache Pavel Begunkov
2021-02-10  0:03 ` [PATCH 12/17] io_uring: feed reqs back into alloc cache Pavel Begunkov
2021-02-10  0:03 ` [PATCH 13/17] io_uring: use persistent request cache Pavel Begunkov
2021-02-10  2:14   ` Jens Axboe
2021-02-10  0:03 ` [PATCH 14/17] io_uring: provide FIFO ordering for task_work Pavel Begunkov
2021-02-10  0:03 ` [PATCH 15/17] io_uring: enable req cache for task_work items Pavel Begunkov
2021-02-10  0:03 ` [PATCH 16/17] io_uring: take comp_state from ctx Pavel Begunkov
2021-02-10  0:03 ` [PATCH 17/17] io_uring: defer flushing cached reqs Pavel Begunkov
2021-02-10  2:10   ` Jens Axboe
2021-02-10  2:08 ` [PATCH RFC 00/17] playing around req alloc Jens Axboe
2021-02-10  3:14   ` Pavel Begunkov
2021-02-10  3:23     ` Jens Axboe
2021-02-10 11:53       ` Pavel Begunkov
2021-02-10 14:27         ` Jens Axboe
