* [PATCH for-next v2 00/12] CQ locking optimisation
@ 2022-12-07  3:53 Pavel Begunkov
  2022-12-07  3:53 ` [PATCH for-next v2 01/12] io_uring: dont remove file from msg_ring reqs Pavel Begunkov
                   ` (11 more replies)
  0 siblings, 12 replies; 20+ messages in thread
From: Pavel Begunkov @ 2022-12-07  3:53 UTC (permalink / raw)
  To: io-uring; +Cc: Jens Axboe, asml.silence

Optimise CQ locking for event posting depending on a number of ring setup
flags. A QD1 nop benchmark showed a 12.067 -> 12.565 MIOPS increase, which is
more than 8.5% of the io_uring kernel overhead (taking into account that the
syscall overhead is over 50%), or 4.12% of the total performance. Naturally,
it's not only about QD1: applications can submit a bunch of requests, but
their completions may arrive at random times, hurting batching and hence
performance (or latency).

The downside is that we have to punt all io-wq completions to the
original task. The performance win should diminish with better
completion batching, but it should still be worth it as it also helps tw,
which in reality often doesn't complete too many requests.

The feature depends on DEFER_TASKRUN, but the requirement can later be
relaxed to SINGLE_ISSUER.
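
For reference, here is a minimal userspace sketch (assuming liburing and a
kernel supporting both flags; it's an illustration, not the exact benchmark
used) of the kind of ring setup and QD1 nop loop the numbers above refer to:

#include <liburing.h>
#include <stdio.h>

int main(void)
{
    struct io_uring_params p = { };
    struct io_uring ring;
    int i, ret;

    /* single submitter + deferred task_work is the setup that enables
     * the ->task_complete optimisation introduced by this series */
    p.flags = IORING_SETUP_SINGLE_ISSUER | IORING_SETUP_DEFER_TASKRUN;
    ret = io_uring_queue_init_params(8, &ring, &p);
    if (ret < 0) {
        fprintf(stderr, "queue_init: %d\n", ret);
        return 1;
    }

    for (i = 0; i < 1000000; i++) {
        struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);
        struct io_uring_cqe *cqe;

        io_uring_prep_nop(sqe);
        /* QD1: a single inflight request, wait for its completion */
        io_uring_submit_and_wait(&ring, 1);
        if (!io_uring_peek_cqe(&ring, &cqe))
            io_uring_cqe_seen(&ring, cqe);
    }
    io_uring_queue_exit(&ring);
    return 0;
}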

v2: some general msg_ring fixes (patches 1, 2)
    fix exiting ring potentially modifying CQ in parallel (8/12)
    use task_work instead of overflowing msg_ring CQEs, which could've
      messed with CQE ordering (9-11)

Pavel Begunkov (12):
  io_uring: dont remove file from msg_ring reqs
  io_uring: improve io_double_lock_ctx fail handling
  io_uring: skip overflow CQE posting for dying ring
  io_uring: don't check overflow flush failures
  io_uring: complete all requests in task context
  io_uring: force multishot CQEs into task context
  io_uring: use tw for putting rsrc
  io_uring: never run tw and fallback in parallel
  io_uring: get rid of double locking
  io_uring: extract a io_msg_install_complete helper
  io_uring: do msg_ring in target task via tw
  io_uring: skip spinlocking for ->task_complete

 include/linux/io_uring.h       |   2 +
 include/linux/io_uring_types.h |   3 +
 io_uring/io_uring.c            | 165 ++++++++++++++++++++++-----------
 io_uring/io_uring.h            |  12 ++-
 io_uring/msg_ring.c            | 163 ++++++++++++++++++++++----------
 io_uring/msg_ring.h            |   1 +
 io_uring/net.c                 |  21 +++++
 io_uring/opdef.c               |   8 ++
 io_uring/opdef.h               |   2 +
 io_uring/rsrc.c                |  19 +++-
 io_uring/rsrc.h                |   1 +
 11 files changed, 291 insertions(+), 106 deletions(-)

-- 
2.38.1



* [PATCH for-next v2 01/12] io_uring: dont remove file from msg_ring reqs
  2022-12-07  3:53 [PATCH for-next v2 00/12] CQ locking optimisation Pavel Begunkov
@ 2022-12-07  3:53 ` Pavel Begunkov
  2022-12-07 13:52   ` Jens Axboe
  2022-12-07  3:53 ` [PATCH for-next v2 02/12] io_uring: improve io_double_lock_ctx fail handling Pavel Begunkov
                   ` (10 subsequent siblings)
  11 siblings, 1 reply; 20+ messages in thread
From: Pavel Begunkov @ 2022-12-07  3:53 UTC (permalink / raw)
  To: io-uring; +Cc: Jens Axboe, asml.silence

We should not be messing with req->file outside of core paths. Clearing
it makes msg_ring non-reentrant, i.e. luckily io_msg_send_fd() fails the
request on a failed io_double_lock_ctx(), but it clearly was originally
intended to do retries instead.

Cc: stable@vger.kernel.org
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 io_uring/io_uring.c | 2 +-
 io_uring/msg_ring.c | 4 ----
 io_uring/opdef.c    | 7 +++++++
 io_uring/opdef.h    | 2 ++
 4 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 436b1ac8f6d0..62372a641add 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -1810,7 +1810,7 @@ static int io_issue_sqe(struct io_kiocb *req, unsigned int issue_flags)
 		return ret;
 
 	/* If the op doesn't have a file, we're not polling for it */
-	if ((req->ctx->flags & IORING_SETUP_IOPOLL) && req->file)
+	if ((req->ctx->flags & IORING_SETUP_IOPOLL) && def->iopoll_queue)
 		io_iopoll_req_issued(req, issue_flags);
 
 	return 0;
diff --git a/io_uring/msg_ring.c b/io_uring/msg_ring.c
index afb543aab9f6..615c85e164ab 100644
--- a/io_uring/msg_ring.c
+++ b/io_uring/msg_ring.c
@@ -167,9 +167,5 @@ int io_msg_ring(struct io_kiocb *req, unsigned int issue_flags)
 	if (ret < 0)
 		req_set_fail(req);
 	io_req_set_res(req, ret, 0);
-	/* put file to avoid an attempt to IOPOLL the req */
-	if (!(req->flags & REQ_F_FIXED_FILE))
-		io_put_file(req->file);
-	req->file = NULL;
 	return IOU_OK;
 }
diff --git a/io_uring/opdef.c b/io_uring/opdef.c
index 83dc0f9ad3b2..04dd2c983fce 100644
--- a/io_uring/opdef.c
+++ b/io_uring/opdef.c
@@ -63,6 +63,7 @@ const struct io_op_def io_op_defs[] = {
 		.audit_skip		= 1,
 		.ioprio			= 1,
 		.iopoll			= 1,
+		.iopoll_queue		= 1,
 		.async_size		= sizeof(struct io_async_rw),
 		.name			= "READV",
 		.prep			= io_prep_rw,
@@ -80,6 +81,7 @@ const struct io_op_def io_op_defs[] = {
 		.audit_skip		= 1,
 		.ioprio			= 1,
 		.iopoll			= 1,
+		.iopoll_queue		= 1,
 		.async_size		= sizeof(struct io_async_rw),
 		.name			= "WRITEV",
 		.prep			= io_prep_rw,
@@ -103,6 +105,7 @@ const struct io_op_def io_op_defs[] = {
 		.audit_skip		= 1,
 		.ioprio			= 1,
 		.iopoll			= 1,
+		.iopoll_queue		= 1,
 		.async_size		= sizeof(struct io_async_rw),
 		.name			= "READ_FIXED",
 		.prep			= io_prep_rw,
@@ -118,6 +121,7 @@ const struct io_op_def io_op_defs[] = {
 		.audit_skip		= 1,
 		.ioprio			= 1,
 		.iopoll			= 1,
+		.iopoll_queue		= 1,
 		.async_size		= sizeof(struct io_async_rw),
 		.name			= "WRITE_FIXED",
 		.prep			= io_prep_rw,
@@ -277,6 +281,7 @@ const struct io_op_def io_op_defs[] = {
 		.audit_skip		= 1,
 		.ioprio			= 1,
 		.iopoll			= 1,
+		.iopoll_queue		= 1,
 		.async_size		= sizeof(struct io_async_rw),
 		.name			= "READ",
 		.prep			= io_prep_rw,
@@ -292,6 +297,7 @@ const struct io_op_def io_op_defs[] = {
 		.audit_skip		= 1,
 		.ioprio			= 1,
 		.iopoll			= 1,
+		.iopoll_queue		= 1,
 		.async_size		= sizeof(struct io_async_rw),
 		.name			= "WRITE",
 		.prep			= io_prep_rw,
@@ -481,6 +487,7 @@ const struct io_op_def io_op_defs[] = {
 		.plug			= 1,
 		.name			= "URING_CMD",
 		.iopoll			= 1,
+		.iopoll_queue		= 1,
 		.async_size		= uring_cmd_pdu_size(1),
 		.prep			= io_uring_cmd_prep,
 		.issue			= io_uring_cmd,
diff --git a/io_uring/opdef.h b/io_uring/opdef.h
index 3efe06d25473..df7e13d9bfba 100644
--- a/io_uring/opdef.h
+++ b/io_uring/opdef.h
@@ -25,6 +25,8 @@ struct io_op_def {
 	unsigned		ioprio : 1;
 	/* supports iopoll */
 	unsigned		iopoll : 1;
+	/* have to be put into the iopoll list */
+	unsigned		iopoll_queue : 1;
 	/* opcode specific path will handle ->async_data allocation if needed */
 	unsigned		manual_alloc : 1;
 	/* size of async data needed, if any */
-- 
2.38.1



* [PATCH for-next v2 02/12] io_uring: improve io_double_lock_ctx fail handling
  2022-12-07  3:53 [PATCH for-next v2 00/12] CQ locking optimisation Pavel Begunkov
  2022-12-07  3:53 ` [PATCH for-next v2 01/12] io_uring: dont remove file from msg_ring reqs Pavel Begunkov
@ 2022-12-07  3:53 ` Pavel Begunkov
  2022-12-07  3:53 ` [PATCH for-next v2 03/12] io_uring: skip overflow CQE posting for dying ring Pavel Begunkov
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 20+ messages in thread
From: Pavel Begunkov @ 2022-12-07  3:53 UTC (permalink / raw)
  To: io-uring; +Cc: Jens Axboe, asml.silence

msg_ring currently fails the request if it can't lock the rings; instead,
punt it to io-wq as was originally intended.

Cc: stable@vger.kernel.org
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 io_uring/msg_ring.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/io_uring/msg_ring.c b/io_uring/msg_ring.c
index 615c85e164ab..c7d6586164ca 100644
--- a/io_uring/msg_ring.c
+++ b/io_uring/msg_ring.c
@@ -164,6 +164,8 @@ int io_msg_ring(struct io_kiocb *req, unsigned int issue_flags)
 	}
 
 done:
+	if (ret == -EAGAIN)
+		return -EAGAIN;
 	if (ret < 0)
 		req_set_fail(req);
 	io_req_set_res(req, ret, 0);
-- 
2.38.1



* [PATCH for-next v2 03/12] io_uring: skip overflow CQE posting for dying ring
  2022-12-07  3:53 [PATCH for-next v2 00/12] CQ locking optimisation Pavel Begunkov
  2022-12-07  3:53 ` [PATCH for-next v2 01/12] io_uring: dont remove file from msg_ring reqs Pavel Begunkov
  2022-12-07  3:53 ` [PATCH for-next v2 02/12] io_uring: improve io_double_lock_ctx fail handling Pavel Begunkov
@ 2022-12-07  3:53 ` Pavel Begunkov
  2022-12-07  3:53 ` [PATCH for-next v2 04/12] io_uring: don't check overflow flush failures Pavel Begunkov
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 20+ messages in thread
From: Pavel Begunkov @ 2022-12-07  3:53 UTC (permalink / raw)
  To: io-uring; +Cc: Jens Axboe, asml.silence

After io_ring_ctx_wait_and_kill() is called there should be no users
poking into the rings, and so there is no need to post CQEs. So, instead
of trying to post overflowed CQEs into the CQ, drop them. Also, do it in
a loop in io_ring_exit_work() to reduce the number of contexts it can be
executed from, and so that even when it struggles to quiesce the ring we
won't be leaving memory allocated for longer than needed.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 io_uring/io_uring.c | 45 +++++++++++++++++++++++++++++++--------------
 1 file changed, 31 insertions(+), 14 deletions(-)

diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 62372a641add..5c0b3ba6059e 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -611,12 +611,30 @@ void io_cq_unlock_post(struct io_ring_ctx *ctx)
 }
 
 /* Returns true if there are no backlogged entries after the flush */
-static bool __io_cqring_overflow_flush(struct io_ring_ctx *ctx, bool force)
+static void io_cqring_overflow_kill(struct io_ring_ctx *ctx)
+{
+	struct io_overflow_cqe *ocqe;
+	LIST_HEAD(list);
+
+	io_cq_lock(ctx);
+	list_splice_init(&ctx->cq_overflow_list, &list);
+	clear_bit(IO_CHECK_CQ_OVERFLOW_BIT, &ctx->check_cq);
+	io_cq_unlock(ctx);
+
+	while (!list_empty(&list)) {
+		ocqe = list_first_entry(&list, struct io_overflow_cqe, list);
+		list_del(&ocqe->list);
+		kfree(ocqe);
+	}
+}
+
+/* Returns true if there are no backlogged entries after the flush */
+static bool __io_cqring_overflow_flush(struct io_ring_ctx *ctx)
 {
 	bool all_flushed;
 	size_t cqe_size = sizeof(struct io_uring_cqe);
 
-	if (!force && __io_cqring_events(ctx) == ctx->cq_entries)
+	if (__io_cqring_events(ctx) == ctx->cq_entries)
 		return false;
 
 	if (ctx->flags & IORING_SETUP_CQE32)
@@ -627,15 +645,11 @@ static bool __io_cqring_overflow_flush(struct io_ring_ctx *ctx, bool force)
 		struct io_uring_cqe *cqe = io_get_cqe_overflow(ctx, true);
 		struct io_overflow_cqe *ocqe;
 
-		if (!cqe && !force)
+		if (!cqe)
 			break;
 		ocqe = list_first_entry(&ctx->cq_overflow_list,
 					struct io_overflow_cqe, list);
-		if (cqe)
-			memcpy(cqe, &ocqe->cqe, cqe_size);
-		else
-			io_account_cq_overflow(ctx);
-
+		memcpy(cqe, &ocqe->cqe, cqe_size);
 		list_del(&ocqe->list);
 		kfree(ocqe);
 	}
@@ -658,7 +672,7 @@ static bool io_cqring_overflow_flush(struct io_ring_ctx *ctx)
 		/* iopoll syncs against uring_lock, not completion_lock */
 		if (ctx->flags & IORING_SETUP_IOPOLL)
 			mutex_lock(&ctx->uring_lock);
-		ret = __io_cqring_overflow_flush(ctx, false);
+		ret = __io_cqring_overflow_flush(ctx);
 		if (ctx->flags & IORING_SETUP_IOPOLL)
 			mutex_unlock(&ctx->uring_lock);
 	}
@@ -1478,7 +1492,7 @@ static int io_iopoll_check(struct io_ring_ctx *ctx, long min)
 	check_cq = READ_ONCE(ctx->check_cq);
 	if (unlikely(check_cq)) {
 		if (check_cq & BIT(IO_CHECK_CQ_OVERFLOW_BIT))
-			__io_cqring_overflow_flush(ctx, false);
+			__io_cqring_overflow_flush(ctx);
 		/*
 		 * Similarly do not spin if we have not informed the user of any
 		 * dropped CQE.
@@ -2646,8 +2660,7 @@ static __cold void io_ring_ctx_free(struct io_ring_ctx *ctx)
 		__io_sqe_buffers_unregister(ctx);
 	if (ctx->file_data)
 		__io_sqe_files_unregister(ctx);
-	if (ctx->rings)
-		__io_cqring_overflow_flush(ctx, true);
+	io_cqring_overflow_kill(ctx);
 	io_eventfd_unregister(ctx);
 	io_alloc_cache_free(&ctx->apoll_cache, io_apoll_cache_free);
 	io_alloc_cache_free(&ctx->netmsg_cache, io_netmsg_cache_free);
@@ -2788,6 +2801,12 @@ static __cold void io_ring_exit_work(struct work_struct *work)
 	 * as nobody else will be looking for them.
 	 */
 	do {
+		if (test_bit(IO_CHECK_CQ_OVERFLOW_BIT, &ctx->check_cq)) {
+			mutex_lock(&ctx->uring_lock);
+			io_cqring_overflow_kill(ctx);
+			mutex_unlock(&ctx->uring_lock);
+		}
+
 		if (ctx->flags & IORING_SETUP_DEFER_TASKRUN)
 			io_move_task_work_from_local(ctx);
 
@@ -2853,8 +2872,6 @@ static __cold void io_ring_ctx_wait_and_kill(struct io_ring_ctx *ctx)
 
 	mutex_lock(&ctx->uring_lock);
 	percpu_ref_kill(&ctx->refs);
-	if (ctx->rings)
-		__io_cqring_overflow_flush(ctx, true);
 	xa_for_each(&ctx->personalities, index, creds)
 		io_unregister_personality(ctx, index);
 	if (ctx->rings)
-- 
2.38.1



* [PATCH for-next v2 04/12] io_uring: don't check overflow flush failures
  2022-12-07  3:53 [PATCH for-next v2 00/12] CQ locking optimisation Pavel Begunkov
                   ` (2 preceding siblings ...)
  2022-12-07  3:53 ` [PATCH for-next v2 03/12] io_uring: skip overflow CQE posting for dying ring Pavel Begunkov
@ 2022-12-07  3:53 ` Pavel Begunkov
  2022-12-07  3:53 ` [PATCH for-next v2 05/12] io_uring: complete all requests in task context Pavel Begunkov
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 20+ messages in thread
From: Pavel Begunkov @ 2022-12-07  3:53 UTC (permalink / raw)
  To: io-uring; +Cc: Jens Axboe, asml.silence

The only way the flush of overflowed CQEs can fail is for the CQ to be
fully packed. There is one place checking for flush failures, i.e.
io_cqring_wait(), but we limit the number of entries to wait for by the
CQ size, so getting a failure automatically means that we're done with
waiting.

Don't check for failures; such checks can, if rarely, spuriously fail CQ
waiting with -EBUSY.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 io_uring/io_uring.c | 24 ++++++------------------
 1 file changed, 6 insertions(+), 18 deletions(-)

diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 5c0b3ba6059e..7bebca5ed950 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -629,13 +629,12 @@ static void io_cqring_overflow_kill(struct io_ring_ctx *ctx)
 }
 
 /* Returns true if there are no backlogged entries after the flush */
-static bool __io_cqring_overflow_flush(struct io_ring_ctx *ctx)
+static void __io_cqring_overflow_flush(struct io_ring_ctx *ctx)
 {
-	bool all_flushed;
 	size_t cqe_size = sizeof(struct io_uring_cqe);
 
 	if (__io_cqring_events(ctx) == ctx->cq_entries)
-		return false;
+		return;
 
 	if (ctx->flags & IORING_SETUP_CQE32)
 		cqe_size <<= 1;
@@ -654,30 +653,23 @@ static bool __io_cqring_overflow_flush(struct io_ring_ctx *ctx)
 		kfree(ocqe);
 	}
 
-	all_flushed = list_empty(&ctx->cq_overflow_list);
-	if (all_flushed) {
+	if (list_empty(&ctx->cq_overflow_list)) {
 		clear_bit(IO_CHECK_CQ_OVERFLOW_BIT, &ctx->check_cq);
 		atomic_andnot(IORING_SQ_CQ_OVERFLOW, &ctx->rings->sq_flags);
 	}
-
 	io_cq_unlock_post(ctx);
-	return all_flushed;
 }
 
-static bool io_cqring_overflow_flush(struct io_ring_ctx *ctx)
+static void io_cqring_overflow_flush(struct io_ring_ctx *ctx)
 {
-	bool ret = true;
-
 	if (test_bit(IO_CHECK_CQ_OVERFLOW_BIT, &ctx->check_cq)) {
 		/* iopoll syncs against uring_lock, not completion_lock */
 		if (ctx->flags & IORING_SETUP_IOPOLL)
 			mutex_lock(&ctx->uring_lock);
-		ret = __io_cqring_overflow_flush(ctx);
+		__io_cqring_overflow_flush(ctx);
 		if (ctx->flags & IORING_SETUP_IOPOLL)
 			mutex_unlock(&ctx->uring_lock);
 	}
-
-	return ret;
 }
 
 void __io_put_task(struct task_struct *task, int nr)
@@ -2505,11 +2497,7 @@ static int io_cqring_wait(struct io_ring_ctx *ctx, int min_events,
 
 	trace_io_uring_cqring_wait(ctx, min_events);
 	do {
-		/* if we can't even flush overflow, don't wait for more */
-		if (!io_cqring_overflow_flush(ctx)) {
-			ret = -EBUSY;
-			break;
-		}
+		io_cqring_overflow_flush(ctx);
 		prepare_to_wait_exclusive(&ctx->cq_wait, &iowq.wq,
 						TASK_INTERRUPTIBLE);
 		ret = io_cqring_wait_schedule(ctx, &iowq, timeout);
-- 
2.38.1



* [PATCH for-next v2 05/12] io_uring: complete all requests in task context
  2022-12-07  3:53 [PATCH for-next v2 00/12] CQ locking optimisation Pavel Begunkov
                   ` (3 preceding siblings ...)
  2022-12-07  3:53 ` [PATCH for-next v2 04/12] io_uring: don't check overflow flush failures Pavel Begunkov
@ 2022-12-07  3:53 ` Pavel Begunkov
  2022-12-07  3:53 ` [PATCH for-next v2 06/12] io_uring: force multishot CQEs into " Pavel Begunkov
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 20+ messages in thread
From: Pavel Begunkov @ 2022-12-07  3:53 UTC (permalink / raw)
  To: io-uring; +Cc: Jens Axboe, asml.silence

This patch adds a ctx->task_complete flag. If set, we'll complete all
requests in the context of the original task. Note, this extends only to
completion CQE posting and not to io_kiocb cleanup / free, e.g. io-wq
may still free the requests in the free callback. This flag will be used
later for optimisation purposes.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 include/linux/io_uring.h       |  2 ++
 include/linux/io_uring_types.h |  2 ++
 io_uring/io_uring.c            | 14 +++++++++++---
 3 files changed, 15 insertions(+), 3 deletions(-)

diff --git a/include/linux/io_uring.h b/include/linux/io_uring.h
index 29e519752da4..934e5dd4ccc0 100644
--- a/include/linux/io_uring.h
+++ b/include/linux/io_uring.h
@@ -11,6 +11,8 @@ enum io_uring_cmd_flags {
 	IO_URING_F_UNLOCKED		= 2,
 	/* the request is executed from poll, it should not be freed */
 	IO_URING_F_MULTISHOT		= 4,
+	/* executed by io-wq */
+	IO_URING_F_IOWQ			= 8,
 	/* int's last bit, sign checks are usually faster than a bit test */
 	IO_URING_F_NONBLOCK		= INT_MIN,
 
diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
index accdfecee953..6be1e1359c89 100644
--- a/include/linux/io_uring_types.h
+++ b/include/linux/io_uring_types.h
@@ -208,6 +208,8 @@ struct io_ring_ctx {
 		unsigned int		drain_disabled: 1;
 		unsigned int		has_evfd: 1;
 		unsigned int		syscall_iopoll: 1;
+		/* all CQEs should be posted only by the submitter task */
+		unsigned int		task_complete: 1;
 	} ____cacheline_aligned_in_smp;
 
 	/* submission data */
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 7bebca5ed950..52ea83f241c6 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -932,8 +932,11 @@ static void __io_req_complete_post(struct io_kiocb *req)
 
 void io_req_complete_post(struct io_kiocb *req, unsigned issue_flags)
 {
-	if (!(issue_flags & IO_URING_F_UNLOCKED) ||
-	    !(req->ctx->flags & IORING_SETUP_IOPOLL)) {
+	if (req->ctx->task_complete && (issue_flags & IO_URING_F_IOWQ)) {
+		req->io_task_work.func = io_req_task_complete;
+		io_req_task_work_add(req);
+	} else if (!(issue_flags & IO_URING_F_UNLOCKED) ||
+		   !(req->ctx->flags & IORING_SETUP_IOPOLL)) {
 		__io_req_complete_post(req);
 	} else {
 		struct io_ring_ctx *ctx = req->ctx;
@@ -1841,7 +1844,7 @@ void io_wq_submit_work(struct io_wq_work *work)
 {
 	struct io_kiocb *req = container_of(work, struct io_kiocb, work);
 	const struct io_op_def *def = &io_op_defs[req->opcode];
-	unsigned int issue_flags = IO_URING_F_UNLOCKED;
+	unsigned int issue_flags = IO_URING_F_UNLOCKED | IO_URING_F_IOWQ;
 	bool needs_poll = false;
 	int ret = 0, err = -ECANCELED;
 
@@ -3501,6 +3504,11 @@ static __cold int io_uring_create(unsigned entries, struct io_uring_params *p,
 	if (!ctx)
 		return -ENOMEM;
 
+	if ((ctx->flags & IORING_SETUP_DEFER_TASKRUN) &&
+	    !(ctx->flags & IORING_SETUP_IOPOLL) &&
+	    !(ctx->flags & IORING_SETUP_SQPOLL))
+		ctx->task_complete = true;
+
 	/*
 	 * When SETUP_IOPOLL and SETUP_SQPOLL are both enabled, user
 	 * space applications don't need to do io completion events
-- 
2.38.1



* [PATCH for-next v2 06/12] io_uring: force multishot CQEs into task context
  2022-12-07  3:53 [PATCH for-next v2 00/12] CQ locking optimisation Pavel Begunkov
                   ` (4 preceding siblings ...)
  2022-12-07  3:53 ` [PATCH for-next v2 05/12] io_uring: complete all requests in task context Pavel Begunkov
@ 2022-12-07  3:53 ` Pavel Begunkov
  2022-12-07  3:53 ` [PATCH for-next v2 07/12] io_uring: use tw for putting rsrc Pavel Begunkov
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 20+ messages in thread
From: Pavel Begunkov @ 2022-12-07  3:53 UTC (permalink / raw)
  To: io-uring; +Cc: Jens Axboe, asml.silence

Multishot requests post CQEs outside of the normal request completion
path, which is usually done from within a task work handler. However,
that might not be the case when a request has yet to be polled but has
been punted to io-wq. Make it abide by ->task_complete and push it to
the polling path when executed by io-wq.
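
For illustration, a rough userspace sketch of the multishot usage this affects
(liburing assumed; accept_loop() is a made-up name and listen_fd is assumed to
be a bound, listening socket). With ->task_complete set, every accept CQE
below is posted from the submitter task's context, even if the request had
been punted to io-wq:

#include <liburing.h>
#include <unistd.h>

/* arm a multishot accept and consume the stream of CQEs it produces */
static void accept_loop(struct io_uring *ring, int listen_fd)
{
    struct io_uring_sqe *sqe = io_uring_get_sqe(ring);

    io_uring_prep_multishot_accept(sqe, listen_fd, NULL, NULL, 0);
    io_uring_submit(ring);

    for (;;) {
        struct io_uring_cqe *cqe;
        int conn_fd;
        unsigned more;

        if (io_uring_wait_cqe(ring, &cqe))
            break;
        conn_fd = cqe->res;
        more = cqe->flags & IORING_CQE_F_MORE;
        io_uring_cqe_seen(ring, cqe);

        if (conn_fd >= 0)
            close(conn_fd);    /* placeholder for real handling */
        if (!more)
            break;             /* multishot terminated, would need re-arming */
    }
}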

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 io_uring/net.c | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/io_uring/net.c b/io_uring/net.c
index 90342dcb6b1d..f276f6dd5b09 100644
--- a/io_uring/net.c
+++ b/io_uring/net.c
@@ -67,6 +67,19 @@ struct io_sr_msg {
 	struct io_kiocb 		*notif;
 };
 
+static inline bool io_check_multishot(struct io_kiocb *req,
+				      unsigned int issue_flags)
+{
+	/*
+	 * When ->locked_cq is set we only allow to post CQEs from the original
+	 * task context. Usual request completions will be handled in other
+	 * generic paths but multipoll may decide to post extra cqes.
+	 */
+	return !(issue_flags & IO_URING_F_IOWQ) ||
+		!(issue_flags & IO_URING_F_MULTISHOT) ||
+		!req->ctx->task_complete;
+}
+
 int io_shutdown_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 {
 	struct io_shutdown *shutdown = io_kiocb_to_cmd(req, struct io_shutdown);
@@ -730,6 +743,9 @@ int io_recvmsg(struct io_kiocb *req, unsigned int issue_flags)
 	    (sr->flags & IORING_RECVSEND_POLL_FIRST))
 		return io_setup_async_msg(req, kmsg, issue_flags);
 
+	if (!io_check_multishot(req, issue_flags))
+		return io_setup_async_msg(req, kmsg, issue_flags);
+
 retry_multishot:
 	if (io_do_buffer_select(req)) {
 		void __user *buf;
@@ -829,6 +845,9 @@ int io_recv(struct io_kiocb *req, unsigned int issue_flags)
 	    (sr->flags & IORING_RECVSEND_POLL_FIRST))
 		return -EAGAIN;
 
+	if (!io_check_multishot(req, issue_flags))
+		return -EAGAIN;
+
 	sock = sock_from_file(req->file);
 	if (unlikely(!sock))
 		return -ENOTSOCK;
@@ -1280,6 +1299,8 @@ int io_accept(struct io_kiocb *req, unsigned int issue_flags)
 	struct file *file;
 	int ret, fd;
 
+	if (!io_check_multishot(req, issue_flags))
+		return -EAGAIN;
 retry:
 	if (!fixed) {
 		fd = __get_unused_fd_flags(accept->flags, accept->nofile);
-- 
2.38.1



* [PATCH for-next v2 07/12] io_uring: use tw for putting rsrc
  2022-12-07  3:53 [PATCH for-next v2 00/12] CQ locking optimisation Pavel Begunkov
                   ` (5 preceding siblings ...)
  2022-12-07  3:53 ` [PATCH for-next v2 06/12] io_uring: force multishot CQEs into " Pavel Begunkov
@ 2022-12-07  3:53 ` Pavel Begunkov
  2022-12-07  3:53 ` [PATCH for-next v2 08/12] io_uring: never run tw and fallback in parallel Pavel Begunkov
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 20+ messages in thread
From: Pavel Begunkov @ 2022-12-07  3:53 UTC (permalink / raw)
  To: io-uring; +Cc: Jens Axboe, asml.silence

Use task_work for completing rsrc removals; it'll be needed later for
spinlock optimisations.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 include/linux/io_uring_types.h |  1 +
 io_uring/io_uring.c            |  1 +
 io_uring/rsrc.c                | 19 +++++++++++++++++--
 io_uring/rsrc.h                |  1 +
 4 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
index 6be1e1359c89..dcd8a563ab52 100644
--- a/include/linux/io_uring_types.h
+++ b/include/linux/io_uring_types.h
@@ -328,6 +328,7 @@ struct io_ring_ctx {
 	struct io_rsrc_data		*buf_data;
 
 	struct delayed_work		rsrc_put_work;
+	struct callback_head		rsrc_put_tw;
 	struct llist_head		rsrc_put_llist;
 	struct list_head		rsrc_ref_list;
 	spinlock_t			rsrc_ref_lock;
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 52ea83f241c6..3a422a7b7132 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -326,6 +326,7 @@ static __cold struct io_ring_ctx *io_ring_ctx_alloc(struct io_uring_params *p)
 	spin_lock_init(&ctx->rsrc_ref_lock);
 	INIT_LIST_HEAD(&ctx->rsrc_ref_list);
 	INIT_DELAYED_WORK(&ctx->rsrc_put_work, io_rsrc_put_work);
+	init_task_work(&ctx->rsrc_put_tw, io_rsrc_put_tw);
 	init_llist_head(&ctx->rsrc_put_llist);
 	init_llist_head(&ctx->work_llist);
 	INIT_LIST_HEAD(&ctx->tctx_list);
diff --git a/io_uring/rsrc.c b/io_uring/rsrc.c
index d25309400a45..18de10c68a15 100644
--- a/io_uring/rsrc.c
+++ b/io_uring/rsrc.c
@@ -204,6 +204,14 @@ void io_rsrc_put_work(struct work_struct *work)
 	}
 }
 
+void io_rsrc_put_tw(struct callback_head *cb)
+{
+	struct io_ring_ctx *ctx = container_of(cb, struct io_ring_ctx,
+					       rsrc_put_tw);
+
+	io_rsrc_put_work(&ctx->rsrc_put_work.work);
+}
+
 void io_wait_rsrc_data(struct io_rsrc_data *data)
 {
 	if (data && !atomic_dec_and_test(&data->refs))
@@ -242,8 +250,15 @@ static __cold void io_rsrc_node_ref_zero(struct percpu_ref *ref)
 	}
 	spin_unlock_irqrestore(&ctx->rsrc_ref_lock, flags);
 
-	if (first_add)
-		mod_delayed_work(system_wq, &ctx->rsrc_put_work, delay);
+	if (!first_add)
+		return;
+
+	if (ctx->submitter_task) {
+		if (!task_work_add(ctx->submitter_task, &ctx->rsrc_put_tw,
+				   ctx->notify_method))
+			return;
+	}
+	mod_delayed_work(system_wq, &ctx->rsrc_put_work, delay);
 }
 
 static struct io_rsrc_node *io_rsrc_node_alloc(void)
diff --git a/io_uring/rsrc.h b/io_uring/rsrc.h
index 81445a477622..2b8743645efc 100644
--- a/io_uring/rsrc.h
+++ b/io_uring/rsrc.h
@@ -53,6 +53,7 @@ struct io_mapped_ubuf {
 	struct bio_vec	bvec[];
 };
 
+void io_rsrc_put_tw(struct callback_head *cb);
 void io_rsrc_put_work(struct work_struct *work);
 void io_rsrc_refs_refill(struct io_ring_ctx *ctx);
 void io_wait_rsrc_data(struct io_rsrc_data *data);
-- 
2.38.1



* [PATCH for-next v2 08/12] io_uring: never run tw and fallback in parallel
  2022-12-07  3:53 [PATCH for-next v2 00/12] CQ locking optimisation Pavel Begunkov
                   ` (6 preceding siblings ...)
  2022-12-07  3:53 ` [PATCH for-next v2 07/12] io_uring: use tw for putting rsrc Pavel Begunkov
@ 2022-12-07  3:53 ` Pavel Begunkov
  2022-12-07  3:53 ` [PATCH for-next v2 09/12] io_uring: get rid of double locking Pavel Begunkov
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 20+ messages in thread
From: Pavel Begunkov @ 2022-12-07  3:53 UTC (permalink / raw)
  To: io-uring; +Cc: Jens Axboe, asml.silence

Once we fall back a tw, we want all requests of that task to be given
to the fallback wq so we don't run it in parallel with the last, i.e.
post-PF_EXITING, tw run of the task.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 io_uring/io_uring.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 3a422a7b7132..0e424d8721ab 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -149,6 +149,7 @@ static void io_clean_op(struct io_kiocb *req);
 static void io_queue_sqe(struct io_kiocb *req);
 static void io_move_task_work_from_local(struct io_ring_ctx *ctx);
 static void __io_submit_flush_completions(struct io_ring_ctx *ctx);
+static __cold void io_fallback_tw(struct io_uring_task *tctx);
 
 static struct kmem_cache *req_cachep;
 
@@ -1160,10 +1161,17 @@ void tctx_task_work(struct callback_head *cb)
 	struct io_uring_task *tctx = container_of(cb, struct io_uring_task,
 						  task_work);
 	struct llist_node fake = {};
-	struct llist_node *node = io_llist_xchg(&tctx->task_list, &fake);
+	struct llist_node *node;
 	unsigned int loops = 1;
-	unsigned int count = handle_tw_list(node, &ctx, &uring_locked, NULL);
+	unsigned int count;
+
+	if (unlikely(current->flags & PF_EXITING)) {
+		io_fallback_tw(tctx);
+		return;
+	}
 
+	node = io_llist_xchg(&tctx->task_list, &fake);
+	count = handle_tw_list(node, &ctx, &uring_locked, NULL);
 	node = io_llist_cmpxchg(&tctx->task_list, &fake, NULL);
 	while (node != &fake) {
 		loops++;
-- 
2.38.1



* [PATCH for-next v2 09/12] io_uring: get rid of double locking
  2022-12-07  3:53 [PATCH for-next v2 00/12] CQ locking optimisation Pavel Begunkov
                   ` (7 preceding siblings ...)
  2022-12-07  3:53 ` [PATCH for-next v2 08/12] io_uring: never run tw and fallback in parallel Pavel Begunkov
@ 2022-12-07  3:53 ` Pavel Begunkov
  2022-12-07  3:53 ` [PATCH for-next v2 10/12] io_uring: extract a io_msg_install_complete helper Pavel Begunkov
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 20+ messages in thread
From: Pavel Begunkov @ 2022-12-07  3:53 UTC (permalink / raw)
  To: io-uring; +Cc: Jens Axboe, asml.silence

We don't need to take both uring_locks at once: msg_ring can be split
into two parts, first getting a file from the filetable of the first
ring and then installing it into the second one.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 io_uring/msg_ring.c | 85 ++++++++++++++++++++++++++-------------------
 io_uring/msg_ring.h |  1 +
 io_uring/opdef.c    |  1 +
 3 files changed, 51 insertions(+), 36 deletions(-)

diff --git a/io_uring/msg_ring.c b/io_uring/msg_ring.c
index c7d6586164ca..387c45a58654 100644
--- a/io_uring/msg_ring.c
+++ b/io_uring/msg_ring.c
@@ -15,6 +15,7 @@
 
 struct io_msg {
 	struct file			*file;
+	struct file			*src_file;
 	u64 user_data;
 	u32 len;
 	u32 cmd;
@@ -23,6 +24,17 @@ struct io_msg {
 	u32 flags;
 };
 
+void io_msg_ring_cleanup(struct io_kiocb *req)
+{
+	struct io_msg *msg = io_kiocb_to_cmd(req, struct io_msg);
+
+	if (WARN_ON_ONCE(!msg->src_file))
+		return;
+
+	fput(msg->src_file);
+	msg->src_file = NULL;
+}
+
 static int io_msg_ring_data(struct io_kiocb *req)
 {
 	struct io_ring_ctx *target_ctx = req->file->private_data;
@@ -37,17 +49,13 @@ static int io_msg_ring_data(struct io_kiocb *req)
 	return -EOVERFLOW;
 }
 
-static void io_double_unlock_ctx(struct io_ring_ctx *ctx,
-				 struct io_ring_ctx *octx,
+static void io_double_unlock_ctx(struct io_ring_ctx *octx,
 				 unsigned int issue_flags)
 {
-	if (issue_flags & IO_URING_F_UNLOCKED)
-		mutex_unlock(&ctx->uring_lock);
 	mutex_unlock(&octx->uring_lock);
 }
 
-static int io_double_lock_ctx(struct io_ring_ctx *ctx,
-			      struct io_ring_ctx *octx,
+static int io_double_lock_ctx(struct io_ring_ctx *octx,
 			      unsigned int issue_flags)
 {
 	/*
@@ -60,17 +68,28 @@ static int io_double_lock_ctx(struct io_ring_ctx *ctx,
 			return -EAGAIN;
 		return 0;
 	}
+	mutex_lock(&octx->uring_lock);
+	return 0;
+}
 
-	/* Always grab smallest value ctx first. We know ctx != octx. */
-	if (ctx < octx) {
-		mutex_lock(&ctx->uring_lock);
-		mutex_lock(&octx->uring_lock);
-	} else {
-		mutex_lock(&octx->uring_lock);
-		mutex_lock(&ctx->uring_lock);
+static struct file *io_msg_grab_file(struct io_kiocb *req, unsigned int issue_flags)
+{
+	struct io_msg *msg = io_kiocb_to_cmd(req, struct io_msg);
+	struct io_ring_ctx *ctx = req->ctx;
+	struct file *file = NULL;
+	unsigned long file_ptr;
+	int idx = msg->src_fd;
+
+	io_ring_submit_lock(ctx, issue_flags);
+	if (likely(idx < ctx->nr_user_files)) {
+		idx = array_index_nospec(idx, ctx->nr_user_files);
+		file_ptr = io_fixed_file_slot(&ctx->file_table, idx)->file_ptr;
+		file = (struct file *) (file_ptr & FFS_MASK);
+		if (file)
+			get_file(file);
 	}
-
-	return 0;
+	io_ring_submit_unlock(ctx, issue_flags);
+	return file;
 }
 
 static int io_msg_send_fd(struct io_kiocb *req, unsigned int issue_flags)
@@ -78,34 +97,27 @@ static int io_msg_send_fd(struct io_kiocb *req, unsigned int issue_flags)
 	struct io_ring_ctx *target_ctx = req->file->private_data;
 	struct io_msg *msg = io_kiocb_to_cmd(req, struct io_msg);
 	struct io_ring_ctx *ctx = req->ctx;
-	unsigned long file_ptr;
-	struct file *src_file;
+	struct file *src_file = msg->src_file;
 	int ret;
 
 	if (target_ctx == ctx)
 		return -EINVAL;
+	if (!src_file) {
+		src_file = io_msg_grab_file(req, issue_flags);
+		if (!src_file)
+			return -EBADF;
+		msg->src_file = src_file;
+		req->flags |= REQ_F_NEED_CLEANUP;
+	}
 
-	ret = io_double_lock_ctx(ctx, target_ctx, issue_flags);
-	if (unlikely(ret))
-		return ret;
-
-	ret = -EBADF;
-	if (unlikely(msg->src_fd >= ctx->nr_user_files))
-		goto out_unlock;
-
-	msg->src_fd = array_index_nospec(msg->src_fd, ctx->nr_user_files);
-	file_ptr = io_fixed_file_slot(&ctx->file_table, msg->src_fd)->file_ptr;
-	if (!file_ptr)
-		goto out_unlock;
-
-	src_file = (struct file *) (file_ptr & FFS_MASK);
-	get_file(src_file);
+	if (unlikely(io_double_lock_ctx(target_ctx, issue_flags)))
+		return -EAGAIN;
 
 	ret = __io_fixed_fd_install(target_ctx, src_file, msg->dst_fd);
-	if (ret < 0) {
-		fput(src_file);
+	if (ret < 0)
 		goto out_unlock;
-	}
+	msg->src_file = NULL;
+	req->flags &= ~REQ_F_NEED_CLEANUP;
 
 	if (msg->flags & IORING_MSG_RING_CQE_SKIP)
 		goto out_unlock;
@@ -119,7 +131,7 @@ static int io_msg_send_fd(struct io_kiocb *req, unsigned int issue_flags)
 	if (!io_post_aux_cqe(target_ctx, msg->user_data, msg->len, 0))
 		ret = -EOVERFLOW;
 out_unlock:
-	io_double_unlock_ctx(ctx, target_ctx, issue_flags);
+	io_double_unlock_ctx(target_ctx, issue_flags);
 	return ret;
 }
 
@@ -130,6 +142,7 @@ int io_msg_ring_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 	if (unlikely(sqe->buf_index || sqe->personality))
 		return -EINVAL;
 
+	msg->src_file = NULL;
 	msg->user_data = READ_ONCE(sqe->off);
 	msg->len = READ_ONCE(sqe->len);
 	msg->cmd = READ_ONCE(sqe->addr);
diff --git a/io_uring/msg_ring.h b/io_uring/msg_ring.h
index fb9601f202d0..3987ee6c0e5f 100644
--- a/io_uring/msg_ring.h
+++ b/io_uring/msg_ring.h
@@ -2,3 +2,4 @@
 
 int io_msg_ring_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe);
 int io_msg_ring(struct io_kiocb *req, unsigned int issue_flags);
+void io_msg_ring_cleanup(struct io_kiocb *req);
diff --git a/io_uring/opdef.c b/io_uring/opdef.c
index 04dd2c983fce..3aa0d65c50e3 100644
--- a/io_uring/opdef.c
+++ b/io_uring/opdef.c
@@ -445,6 +445,7 @@ const struct io_op_def io_op_defs[] = {
 		.name			= "MSG_RING",
 		.prep			= io_msg_ring_prep,
 		.issue			= io_msg_ring,
+		.cleanup		= io_msg_ring_cleanup,
 	},
 	[IORING_OP_FSETXATTR] = {
 		.needs_file = 1,
-- 
2.38.1



* [PATCH for-next v2 10/12] io_uring: extract a io_msg_install_complete helper
  2022-12-07  3:53 [PATCH for-next v2 00/12] CQ locking optimisation Pavel Begunkov
                   ` (8 preceding siblings ...)
  2022-12-07  3:53 ` [PATCH for-next v2 09/12] io_uring: get rid of double locking Pavel Begunkov
@ 2022-12-07  3:53 ` Pavel Begunkov
  2022-12-07  3:53 ` [PATCH for-next v2 11/12] io_uring: do msg_ring in target task via tw Pavel Begunkov
  2022-12-07  3:53 ` [PATCH for-next v2 12/12] io_uring: skip spinlocking for ->task_complete Pavel Begunkov
  11 siblings, 0 replies; 20+ messages in thread
From: Pavel Begunkov @ 2022-12-07  3:53 UTC (permalink / raw)
  To: io-uring; +Cc: Jens Axboe, asml.silence

Extract a helper called io_msg_install_complete() from io_msg_send_fd();
it will be used later.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 io_uring/msg_ring.c | 34 +++++++++++++++++++++-------------
 1 file changed, 21 insertions(+), 13 deletions(-)

diff --git a/io_uring/msg_ring.c b/io_uring/msg_ring.c
index 387c45a58654..525063ac3dab 100644
--- a/io_uring/msg_ring.c
+++ b/io_uring/msg_ring.c
@@ -92,36 +92,25 @@ static struct file *io_msg_grab_file(struct io_kiocb *req, unsigned int issue_fl
 	return file;
 }
 
-static int io_msg_send_fd(struct io_kiocb *req, unsigned int issue_flags)
+static int io_msg_install_complete(struct io_kiocb *req, unsigned int issue_flags)
 {
 	struct io_ring_ctx *target_ctx = req->file->private_data;
 	struct io_msg *msg = io_kiocb_to_cmd(req, struct io_msg);
-	struct io_ring_ctx *ctx = req->ctx;
 	struct file *src_file = msg->src_file;
 	int ret;
 
-	if (target_ctx == ctx)
-		return -EINVAL;
-	if (!src_file) {
-		src_file = io_msg_grab_file(req, issue_flags);
-		if (!src_file)
-			return -EBADF;
-		msg->src_file = src_file;
-		req->flags |= REQ_F_NEED_CLEANUP;
-	}
-
 	if (unlikely(io_double_lock_ctx(target_ctx, issue_flags)))
 		return -EAGAIN;
 
 	ret = __io_fixed_fd_install(target_ctx, src_file, msg->dst_fd);
 	if (ret < 0)
 		goto out_unlock;
+
 	msg->src_file = NULL;
 	req->flags &= ~REQ_F_NEED_CLEANUP;
 
 	if (msg->flags & IORING_MSG_RING_CQE_SKIP)
 		goto out_unlock;
-
 	/*
 	 * If this fails, the target still received the file descriptor but
 	 * wasn't notified of the fact. This means that if this request
@@ -135,6 +124,25 @@ static int io_msg_send_fd(struct io_kiocb *req, unsigned int issue_flags)
 	return ret;
 }
 
+static int io_msg_send_fd(struct io_kiocb *req, unsigned int issue_flags)
+{
+	struct io_ring_ctx *target_ctx = req->file->private_data;
+	struct io_msg *msg = io_kiocb_to_cmd(req, struct io_msg);
+	struct io_ring_ctx *ctx = req->ctx;
+	struct file *src_file = msg->src_file;
+
+	if (target_ctx == ctx)
+		return -EINVAL;
+	if (!src_file) {
+		src_file = io_msg_grab_file(req, issue_flags);
+		if (!src_file)
+			return -EBADF;
+		msg->src_file = src_file;
+		req->flags |= REQ_F_NEED_CLEANUP;
+	}
+	return io_msg_install_complete(req, issue_flags);
+}
+
 int io_msg_ring_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 {
 	struct io_msg *msg = io_kiocb_to_cmd(req, struct io_msg);
-- 
2.38.1



* [PATCH for-next v2 11/12] io_uring: do msg_ring in target task via tw
  2022-12-07  3:53 [PATCH for-next v2 00/12] CQ locking optimisation Pavel Begunkov
                   ` (9 preceding siblings ...)
  2022-12-07  3:53 ` [PATCH for-next v2 10/12] io_uring: extract a io_msg_install_complete helper Pavel Begunkov
@ 2022-12-07  3:53 ` Pavel Begunkov
  2022-12-07 15:31   ` Jens Axboe
  2022-12-07  3:53 ` [PATCH for-next v2 12/12] io_uring: skip spinlocking for ->task_complete Pavel Begunkov
  11 siblings, 1 reply; 20+ messages in thread
From: Pavel Begunkov @ 2022-12-07  3:53 UTC (permalink / raw)
  To: io-uring; +Cc: Jens Axboe, asml.silence

While executing in the context of one io_uring instance, msg_ring
manipulates another ring. As we're trying to keep CQE posting contained
in the context of the ring-owner task, use task_work to send the request
to the target ring's task when we're modifying its CQ or trying to
install a file. Note, we can't safely use the io_uring task_work infra
and have to use task_work directly.
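
For context, a rough userspace sketch of the plain msg_ring data path affected
here (liburing assumed; the helper name is made up and target_ring_fd is
assumed to be the target io_uring's fd as available in this process). If the
target ring was set up with IORING_SETUP_DEFER_TASKRUN, the CQE is now posted
by the target ring's owner task via task_work rather than directly by the
sender:

#include <liburing.h>

/* signal another ring by posting a CQE into its CQ */
static int msg_other_ring(struct io_uring *src_ring, int target_ring_fd)
{
    struct io_uring_sqe *sqe = io_uring_get_sqe(src_ring);

    if (!sqe)
        return -1;
    /* the target receives a CQE with user_data 0xcafe and res 42 */
    io_uring_prep_msg_ring(sqe, target_ring_fd, 42, 0xcafe, 0);
    return io_uring_submit(src_ring);
}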

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 io_uring/msg_ring.c | 56 ++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 53 insertions(+), 3 deletions(-)

diff --git a/io_uring/msg_ring.c b/io_uring/msg_ring.c
index 525063ac3dab..24e6cc477515 100644
--- a/io_uring/msg_ring.c
+++ b/io_uring/msg_ring.c
@@ -16,6 +16,7 @@
 struct io_msg {
 	struct file			*file;
 	struct file			*src_file;
+	struct callback_head		tw;
 	u64 user_data;
 	u32 len;
 	u32 cmd;
@@ -35,6 +36,23 @@ void io_msg_ring_cleanup(struct io_kiocb *req)
 	msg->src_file = NULL;
 }
 
+static void io_msg_tw_complete(struct callback_head *head)
+{
+	struct io_msg *msg = container_of(head, struct io_msg, tw);
+	struct io_kiocb *req = cmd_to_io_kiocb(msg);
+	struct io_ring_ctx *target_ctx = req->file->private_data;
+	int ret = 0;
+
+	if (current->flags & PF_EXITING)
+		ret = -EOWNERDEAD;
+	else if (!io_post_aux_cqe(target_ctx, msg->user_data, msg->len, 0))
+		ret = -EOVERFLOW;
+
+	if (ret < 0)
+		req_set_fail(req);
+	io_req_queue_tw_complete(req, ret);
+}
+
 static int io_msg_ring_data(struct io_kiocb *req)
 {
 	struct io_ring_ctx *target_ctx = req->file->private_data;
@@ -43,6 +61,15 @@ static int io_msg_ring_data(struct io_kiocb *req)
 	if (msg->src_fd || msg->dst_fd || msg->flags)
 		return -EINVAL;
 
+	if (target_ctx->task_complete && current != target_ctx->submitter_task) {
+		init_task_work(&msg->tw, io_msg_tw_complete);
+		if (task_work_add(target_ctx->submitter_task, &msg->tw,
+				  TWA_SIGNAL))
+			return -EOWNERDEAD;
+
+		return IOU_ISSUE_SKIP_COMPLETE;
+	}
+
 	if (io_post_aux_cqe(target_ctx, msg->user_data, msg->len, 0))
 		return 0;
 
@@ -124,6 +151,19 @@ static int io_msg_install_complete(struct io_kiocb *req, unsigned int issue_flag
 	return ret;
 }
 
+static void io_msg_tw_fd_complete(struct callback_head *head)
+{
+	struct io_msg *msg = container_of(head, struct io_msg, tw);
+	struct io_kiocb *req = cmd_to_io_kiocb(msg);
+	int ret = -EOWNERDEAD;
+
+	if (!(current->flags & PF_EXITING))
+		ret = io_msg_install_complete(req, IO_URING_F_UNLOCKED);
+	if (ret < 0)
+		req_set_fail(req);
+	io_req_queue_tw_complete(req, ret);
+}
+
 static int io_msg_send_fd(struct io_kiocb *req, unsigned int issue_flags)
 {
 	struct io_ring_ctx *target_ctx = req->file->private_data;
@@ -140,6 +180,15 @@ static int io_msg_send_fd(struct io_kiocb *req, unsigned int issue_flags)
 		msg->src_file = src_file;
 		req->flags |= REQ_F_NEED_CLEANUP;
 	}
+
+	if (target_ctx->task_complete && current != target_ctx->submitter_task) {
+		init_task_work(&msg->tw, io_msg_tw_fd_complete);
+		if (task_work_add(target_ctx->submitter_task, &msg->tw,
+				  TWA_SIGNAL))
+			return -EOWNERDEAD;
+
+		return IOU_ISSUE_SKIP_COMPLETE;
+	}
 	return io_msg_install_complete(req, issue_flags);
 }
 
@@ -185,10 +234,11 @@ int io_msg_ring(struct io_kiocb *req, unsigned int issue_flags)
 	}
 
 done:
-	if (ret == -EAGAIN)
-		return -EAGAIN;
-	if (ret < 0)
+	if (ret < 0) {
+		if (ret == -EAGAIN || ret == IOU_ISSUE_SKIP_COMPLETE)
+			return ret;
 		req_set_fail(req);
+	}
 	io_req_set_res(req, ret, 0);
 	return IOU_OK;
 }
-- 
2.38.1



* [PATCH for-next v2 12/12] io_uring: skip spinlocking for ->task_complete
  2022-12-07  3:53 [PATCH for-next v2 00/12] CQ locking optimisation Pavel Begunkov
                   ` (10 preceding siblings ...)
  2022-12-07  3:53 ` [PATCH for-next v2 11/12] io_uring: do msg_ring in target task via tw Pavel Begunkov
@ 2022-12-07  3:53 ` Pavel Begunkov
  11 siblings, 0 replies; 20+ messages in thread
From: Pavel Begunkov @ 2022-12-07  3:53 UTC (permalink / raw)
  To: io-uring; +Cc: Jens Axboe, asml.silence

->task_complete was added to serialise CQE posting by doing it from the
task context only (or from the fallback wq when the task is dead), and
now we can use that to avoid taking ->completion_lock while filling CQ
entries. The patch skips spinlocking only in two spots,
__io_submit_flush_completions() and flushing in io_aux_cqe(); that's
safer and covers all the cases we care about. Extra care is taken to
force taking the lock while queueing overflow entries.

It fundamentally relies on SINGLE_ISSUER to have only one task posting
events. It also needs to take into account overflowed CQEs, the flushing
of which happens in the CQ wait path, and so this implementation also
needs DEFER_TASKRUN to limit waiters. For the same reason we disable it
for SQPOLL, and for IOPOLL as it won't benefit from it in any case. The
DEFER_TASKRUN, SQPOLL and IOPOLL requirements may be relaxed in the
future.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 io_uring/io_uring.c | 71 +++++++++++++++++++++++++++++++++------------
 io_uring/io_uring.h | 12 ++++++--
 2 files changed, 62 insertions(+), 21 deletions(-)

diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 0e424d8721ab..529ea5942dea 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -595,13 +595,25 @@ static inline void io_cq_unlock(struct io_ring_ctx *ctx)
 	spin_unlock(&ctx->completion_lock);
 }
 
+static inline void __io_cq_lock(struct io_ring_ctx *ctx)
+	__acquires(ctx->completion_lock)
+{
+	if (!ctx->task_complete)
+		spin_lock(&ctx->completion_lock);
+}
+
+static inline void __io_cq_unlock(struct io_ring_ctx *ctx)
+{
+	if (!ctx->task_complete)
+		spin_unlock(&ctx->completion_lock);
+}
+
 /* keep it inlined for io_submit_flush_completions() */
-static inline void io_cq_unlock_post_inline(struct io_ring_ctx *ctx)
+static inline void __io_cq_unlock_post(struct io_ring_ctx *ctx)
 	__releases(ctx->completion_lock)
 {
 	io_commit_cqring(ctx);
-	spin_unlock(&ctx->completion_lock);
-
+	__io_cq_unlock(ctx);
 	io_commit_cqring_flush(ctx);
 	io_cqring_wake(ctx);
 }
@@ -609,7 +621,10 @@ static inline void io_cq_unlock_post_inline(struct io_ring_ctx *ctx)
 void io_cq_unlock_post(struct io_ring_ctx *ctx)
 	__releases(ctx->completion_lock)
 {
-	io_cq_unlock_post_inline(ctx);
+	io_commit_cqring(ctx);
+	spin_unlock(&ctx->completion_lock);
+	io_commit_cqring_flush(ctx);
+	io_cqring_wake(ctx);
 }
 
 /* Returns true if there are no backlogged entries after the flush */
@@ -796,12 +811,13 @@ struct io_uring_cqe *__io_get_cqe(struct io_ring_ctx *ctx, bool overflow)
 	return &rings->cqes[off];
 }
 
-static bool io_fill_cqe_aux(struct io_ring_ctx *ctx, u64 user_data, s32 res, u32 cflags,
-			    bool allow_overflow)
+static bool io_fill_cqe_aux(struct io_ring_ctx *ctx, u64 user_data, s32 res,
+			      u32 cflags)
 {
 	struct io_uring_cqe *cqe;
 
-	lockdep_assert_held(&ctx->completion_lock);
+	if (!ctx->task_complete)
+		lockdep_assert_held(&ctx->completion_lock);
 
 	ctx->cq_extra++;
 
@@ -824,10 +840,6 @@ static bool io_fill_cqe_aux(struct io_ring_ctx *ctx, u64 user_data, s32 res, u32
 		}
 		return true;
 	}
-
-	if (allow_overflow)
-		return io_cqring_event_overflow(ctx, user_data, res, cflags, 0, 0);
-
 	return false;
 }
 
@@ -841,7 +853,17 @@ static void __io_flush_post_cqes(struct io_ring_ctx *ctx)
 	for (i = 0; i < state->cqes_count; i++) {
 		struct io_uring_cqe *cqe = &state->cqes[i];
 
-		io_fill_cqe_aux(ctx, cqe->user_data, cqe->res, cqe->flags, true);
+		if (!io_fill_cqe_aux(ctx, cqe->user_data, cqe->res, cqe->flags)) {
+			if (ctx->task_complete) {
+				spin_lock(&ctx->completion_lock);
+				io_cqring_event_overflow(ctx, cqe->user_data,
+							cqe->res, cqe->flags, 0, 0);
+				spin_unlock(&ctx->completion_lock);
+			} else {
+				io_cqring_event_overflow(ctx, cqe->user_data,
+							cqe->res, cqe->flags, 0, 0);
+			}
+		}
 	}
 	state->cqes_count = 0;
 }
@@ -852,7 +874,10 @@ static bool __io_post_aux_cqe(struct io_ring_ctx *ctx, u64 user_data, s32 res, u
 	bool filled;
 
 	io_cq_lock(ctx);
-	filled = io_fill_cqe_aux(ctx, user_data, res, cflags, allow_overflow);
+	filled = io_fill_cqe_aux(ctx, user_data, res, cflags);
+	if (!filled && allow_overflow)
+		filled = io_cqring_event_overflow(ctx, user_data, res, cflags, 0, 0);
+
 	io_cq_unlock_post(ctx);
 	return filled;
 }
@@ -876,10 +901,10 @@ bool io_aux_cqe(struct io_ring_ctx *ctx, bool defer, u64 user_data, s32 res, u32
 	lockdep_assert_held(&ctx->uring_lock);
 
 	if (ctx->submit_state.cqes_count == length) {
-		io_cq_lock(ctx);
+		__io_cq_lock(ctx);
 		__io_flush_post_cqes(ctx);
 		/* no need to flush - flush is deferred */
-		io_cq_unlock(ctx);
+		__io_cq_unlock_post(ctx);
 	}
 
 	/* For defered completions this is not as strict as it is otherwise,
@@ -1414,7 +1439,7 @@ static void __io_submit_flush_completions(struct io_ring_ctx *ctx)
 	struct io_wq_work_node *node, *prev;
 	struct io_submit_state *state = &ctx->submit_state;
 
-	io_cq_lock(ctx);
+	__io_cq_lock(ctx);
 	/* must come first to preserve CQE ordering in failure cases */
 	if (state->cqes_count)
 		__io_flush_post_cqes(ctx);
@@ -1422,10 +1447,18 @@ static void __io_submit_flush_completions(struct io_ring_ctx *ctx)
 		struct io_kiocb *req = container_of(node, struct io_kiocb,
 					    comp_list);
 
-		if (!(req->flags & REQ_F_CQE_SKIP))
-			io_fill_cqe_req(ctx, req);
+		if (!(req->flags & REQ_F_CQE_SKIP) &&
+		    unlikely(!__io_fill_cqe_req(ctx, req))) {
+			if (ctx->task_complete) {
+				spin_lock(&ctx->completion_lock);
+				io_req_cqe_overflow(req);
+				spin_unlock(&ctx->completion_lock);
+			} else {
+				io_req_cqe_overflow(req);
+			}
+		}
 	}
-	io_cq_unlock_post_inline(ctx);
+	__io_cq_unlock_post(ctx);
 
 	if (!wq_list_empty(&ctx->submit_state.compl_reqs)) {
 		io_free_batch_list(ctx, state->compl_reqs.first);
diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h
index 62227ec3260c..c117e029c8dc 100644
--- a/io_uring/io_uring.h
+++ b/io_uring/io_uring.h
@@ -110,7 +110,7 @@ static inline struct io_uring_cqe *io_get_cqe(struct io_ring_ctx *ctx)
 	return io_get_cqe_overflow(ctx, false);
 }
 
-static inline bool io_fill_cqe_req(struct io_ring_ctx *ctx,
+static inline bool __io_fill_cqe_req(struct io_ring_ctx *ctx,
 				     struct io_kiocb *req)
 {
 	struct io_uring_cqe *cqe;
@@ -122,7 +122,7 @@ static inline bool io_fill_cqe_req(struct io_ring_ctx *ctx,
 	 */
 	cqe = io_get_cqe(ctx);
 	if (unlikely(!cqe))
-		return io_req_cqe_overflow(req);
+		return false;
 
 	trace_io_uring_complete(req->ctx, req, req->cqe.user_data,
 				req->cqe.res, req->cqe.flags,
@@ -145,6 +145,14 @@ static inline bool io_fill_cqe_req(struct io_ring_ctx *ctx,
 	return true;
 }
 
+static inline bool io_fill_cqe_req(struct io_ring_ctx *ctx,
+				   struct io_kiocb *req)
+{
+	if (likely(__io_fill_cqe_req(ctx, req)))
+		return true;
+	return io_req_cqe_overflow(req);
+}
+
 static inline void req_set_fail(struct io_kiocb *req)
 {
 	req->flags |= REQ_F_FAIL;
-- 
2.38.1



* Re: [PATCH for-next v2 01/12] io_uring: dont remove file from msg_ring reqs
  2022-12-07  3:53 ` [PATCH for-next v2 01/12] io_uring: dont remove file from msg_ring reqs Pavel Begunkov
@ 2022-12-07 13:52   ` Jens Axboe
  2022-12-07 21:12     ` Pavel Begunkov
  0 siblings, 1 reply; 20+ messages in thread
From: Jens Axboe @ 2022-12-07 13:52 UTC (permalink / raw)
  To: Pavel Begunkov, io-uring

On 12/6/22 8:53 PM, Pavel Begunkov wrote:
> We should not be messing with req->file outside of core paths. Clearing
> it makes msg_ring non reentrant, i.e. luckily io_msg_send_fd() fails the
> request on failed io_double_lock_ctx() but clearly was originally
> intended to do retries instead.

That's basically what I had in my patch, except I just went for the
negated one instead to cut down on churn. Why not just do that?

-- 
Jens Axboe




* Re: [PATCH for-next v2 11/12] io_uring: do msg_ring in target task via tw
  2022-12-07  3:53 ` [PATCH for-next v2 11/12] io_uring: do msg_ring in target task via tw Pavel Begunkov
@ 2022-12-07 15:31   ` Jens Axboe
  2022-12-07 15:51     ` Jens Axboe
  0 siblings, 1 reply; 20+ messages in thread
From: Jens Axboe @ 2022-12-07 15:31 UTC (permalink / raw)
  To: Pavel Begunkov, io-uring

On 12/6/22 8:53 PM, Pavel Begunkov wrote:
> @@ -43,6 +61,15 @@ static int io_msg_ring_data(struct io_kiocb *req)
>  	if (msg->src_fd || msg->dst_fd || msg->flags)
>  		return -EINVAL;
>  
> +	if (target_ctx->task_complete && current != target_ctx->submitter_task) {
> +		init_task_work(&msg->tw, io_msg_tw_complete);
> +		if (task_work_add(target_ctx->submitter_task, &msg->tw,
> +				  TWA_SIGNAL))
> +			return -EOWNERDEAD;
> +
> +		return IOU_ISSUE_SKIP_COMPLETE;
> +	}
> +

We should probably be able to get by with TWA_SIGNAL_NO_IPI here, no?

-- 
Jens Axboe


* Re: [PATCH for-next v2 11/12] io_uring: do msg_ring in target task via tw
  2022-12-07 15:31   ` Jens Axboe
@ 2022-12-07 15:51     ` Jens Axboe
  2022-12-07 21:18       ` Pavel Begunkov
  0 siblings, 1 reply; 20+ messages in thread
From: Jens Axboe @ 2022-12-07 15:51 UTC (permalink / raw)
  To: Pavel Begunkov, io-uring

On 12/7/22 8:31 AM, Jens Axboe wrote:
> On 12/6/22 8:53 PM, Pavel Begunkov wrote:
>> @@ -43,6 +61,15 @@ static int io_msg_ring_data(struct io_kiocb *req)
>>  	if (msg->src_fd || msg->dst_fd || msg->flags)
>>  		return -EINVAL;
>>  
>> +	if (target_ctx->task_complete && current != target_ctx->submitter_task) {
>> +		init_task_work(&msg->tw, io_msg_tw_complete);
>> +		if (task_work_add(target_ctx->submitter_task, &msg->tw,
>> +				  TWA_SIGNAL))
>> +			return -EOWNERDEAD;
>> +
>> +		return IOU_ISSUE_SKIP_COMPLETE;
>> +	}
>> +
> 
> We should probably be able to get by with TWA_SIGNAL_NO_IPI here, no?

Considering we didn't even wake before, I'd say that's a solid yes.

-- 
Jens Axboe




* Re: [PATCH for-next v2 01/12] io_uring: dont remove file from msg_ring reqs
  2022-12-07 13:52   ` Jens Axboe
@ 2022-12-07 21:12     ` Pavel Begunkov
  2022-12-07 21:23       ` Jens Axboe
  0 siblings, 1 reply; 20+ messages in thread
From: Pavel Begunkov @ 2022-12-07 21:12 UTC (permalink / raw)
  To: Jens Axboe, io-uring

On 12/7/22 13:52, Jens Axboe wrote:
> On 12/6/22 8:53 PM, Pavel Begunkov wrote:
>> We should not be messing with req->file outside of core paths. Clearing
>> it makes msg_ring non reentrant, i.e. luckily io_msg_send_fd() fails the
>> request on failed io_double_lock_ctx() but clearly was originally
>> intended to do retries instead.
> 
> That's basically what I had in my patch, except I just went for the
> negated one instead to cut down on churn. Why not just do that?

I just already had this patch, so I left it as is, but if I have to
find a reason it would be: 1) considering that the req->file check
is already an exception to the rule, the negative would be an
exception to the exception, and 2) it removes that extra req->file
check.

-- 
Pavel Begunkov


* Re: [PATCH for-next v2 11/12] io_uring: do msg_ring in target task via tw
  2022-12-07 15:51     ` Jens Axboe
@ 2022-12-07 21:18       ` Pavel Begunkov
  2022-12-07 21:22         ` Jens Axboe
  0 siblings, 1 reply; 20+ messages in thread
From: Pavel Begunkov @ 2022-12-07 21:18 UTC (permalink / raw)
  To: Jens Axboe, io-uring

On 12/7/22 15:51, Jens Axboe wrote:
> On 12/7/22 8:31 AM, Jens Axboe wrote:
>> On 12/6/22 8:53 PM, Pavel Begunkov wrote:
>>> @@ -43,6 +61,15 @@ static int io_msg_ring_data(struct io_kiocb *req)
>>>   	if (msg->src_fd || msg->dst_fd || msg->flags)
>>>   		return -EINVAL;
>>>   
>>> +	if (target_ctx->task_complete && current != target_ctx->submitter_task) {
>>> +		init_task_work(&msg->tw, io_msg_tw_complete);
>>> +		if (task_work_add(target_ctx->submitter_task, &msg->tw,
>>> +				  TWA_SIGNAL))
>>> +			return -EOWNERDEAD;
>>> +
>>> +		return IOU_ISSUE_SKIP_COMPLETE;
>>> +	}
>>> +
>>
>> We should probably be able to get by with TWA_SIGNAL_NO_IPI here, no?
> 
> Considering we didn't even wake before, I'd say that's a solid yes.

I'm not so sure. It'll work, but a naive approach would also lack
IORING_SETUP_TASKRUN_FLAG and so mess with latencies when it's not
desirable.

option 1)

method = TWA_SIGNAL;
if (flags & IORING_SETUP_TASKRUN_FLAG)
	method = NO_IPI;

option 2)

task_work_add(NO_IPI);
atomic_or(IORING_SQ_TASKRUN, &ctx->rings->sq_flags);


Might be better to have one of those.

-- 
Pavel Begunkov


* Re: [PATCH for-next v2 11/12] io_uring: do msg_ring in target task via tw
  2022-12-07 21:18       ` Pavel Begunkov
@ 2022-12-07 21:22         ` Jens Axboe
  0 siblings, 0 replies; 20+ messages in thread
From: Jens Axboe @ 2022-12-07 21:22 UTC (permalink / raw)
  To: Pavel Begunkov, io-uring

On 12/7/22 2:18 PM, Pavel Begunkov wrote:
> On 12/7/22 15:51, Jens Axboe wrote:
>> On 12/7/22 8:31 AM, Jens Axboe wrote:
>>> On 12/6/22 8:53 PM, Pavel Begunkov wrote:
>>>> @@ -43,6 +61,15 @@ static int io_msg_ring_data(struct io_kiocb *req)
>>>>       if (msg->src_fd || msg->dst_fd || msg->flags)
>>>>           return -EINVAL;
>>>>   +    if (target_ctx->task_complete && current != target_ctx->submitter_task) {
>>>> +        init_task_work(&msg->tw, io_msg_tw_complete);
>>>> +        if (task_work_add(target_ctx->submitter_task, &msg->tw,
>>>> +                  TWA_SIGNAL))
>>>> +            return -EOWNERDEAD;
>>>> +
>>>> +        return IOU_ISSUE_SKIP_COMPLETE;
>>>> +    }
>>>> +
>>>
>>> We should probably be able to get by with TWA_SIGNAL_NO_IPI here, no?
>>
>> Considering we didn't even wake before, I'd say that's a solid yes.
> 
> I'm not so sure. It'll work, but a naive approach would also lack
> IORING_SETUP_TASKRUN_FLAG and so mess with latencies when it's not
> desirable.
> 
> option 1)
> 
> method = TWA_SIGNAL;
> if (flags & IORING_SETUP_TASKRUN_FLAG)
>     method = NO_IPI;
> 
> option 2)
> 
> task_work_add(NO_IPI);
> atomic_or(IORING_SQ_TASKRUN, &ctx->rings->sq_flags);
> 
> 
> Might be better to have one of those.

I like option 2, which should be fine.

-- 
Jens Axboe




* Re: [PATCH for-next v2 01/12] io_uring: dont remove file from msg_ring reqs
  2022-12-07 21:12     ` Pavel Begunkov
@ 2022-12-07 21:23       ` Jens Axboe
  0 siblings, 0 replies; 20+ messages in thread
From: Jens Axboe @ 2022-12-07 21:23 UTC (permalink / raw)
  To: Pavel Begunkov, io-uring

On 12/7/22 2:12 PM, Pavel Begunkov wrote:
> On 12/7/22 13:52, Jens Axboe wrote:
>> On 12/6/22 8:53 PM, Pavel Begunkov wrote:
>>> We should not be messing with req->file outside of core paths. Clearing
>>> it makes msg_ring non reentrant, i.e. luckily io_msg_send_fd() fails the
>>> request on failed io_double_lock_ctx() but clearly was originally
>>> intended to do retries instead.
>>
>> That's basically what I had in my patch, except I just went for the
>> negated one instead to cut down on churn. Why not just do that?
> I just already had this patch so left it as is, but if I have to
> find a reason it would be: 1) considering that the req->file check
> is already an exception to the rule, the negative would be an
> exception to the exception, and 2) it removes that extra req->file
> check.

Let's just leave it as-is, was only mostly concerned with potential
backport pain.

-- 
Jens Axboe




