All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCHSET REBASE 00/10] cq wait refactoring rebase
@ 2023-01-05 11:22 Pavel Begunkov
  2023-01-05 11:22 ` [PATCHSET REBASE 01/10] io_uring: rearrange defer list checks Pavel Begunkov
                   ` (10 more replies)
  0 siblings, 11 replies; 12+ messages in thread
From: Pavel Begunkov @ 2023-01-05 11:22 UTC (permalink / raw)
  To: io-uring; +Cc: Jens Axboe, asml.silence

Rebase of 6.3, i.e. recent CQ waiting refactoring series on top of
just sent 6.2 patch ("io_uring: fix CQ waiting timeout handling").

Apart from that there are 2 more patches on top, 9/10 squeezes
an extra 1% of perf for one of my tests.

Pavel Begunkov (10):
  io_uring: rearrange defer list checks
  io_uring: don't iterate cq wait fast path
  io_uring: kill io_run_task_work_ctx
  io_uring: move defer tw task checks
  io_uring: parse check_cq out of wq waiting
  io_uring: mimimise io_cqring_wait_schedule
  io_uring: simplify io_has_work
  io_uring: set TASK_RUNNING right after schedule
  io_uring: optimise non-timeout waiting
  io_uring: keep timeout in io_wait_queue

 io_uring/io_uring.c | 137 +++++++++++++++++++++++---------------------
 io_uring/io_uring.h |  25 ++------
 2 files changed, 77 insertions(+), 85 deletions(-)

-- 
2.38.1


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCHSET REBASE 01/10] io_uring: rearrange defer list checks
  2023-01-05 11:22 [PATCHSET REBASE 00/10] cq wait refactoring rebase Pavel Begunkov
@ 2023-01-05 11:22 ` Pavel Begunkov
  2023-01-05 11:22 ` [PATCHSET REBASE 02/10] io_uring: don't iterate cq wait fast path Pavel Begunkov
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Pavel Begunkov @ 2023-01-05 11:22 UTC (permalink / raw)
  To: io-uring; +Cc: Jens Axboe, asml.silence

There should be nothing in the ->work_llist for non DEFER_TASKRUN rings,
so we can skip flag checks and test the list emptiness directly. Also
move it out of io_run_local_work() for inlining.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 io_uring/io_uring.c | 3 ---
 io_uring/io_uring.h | 2 +-
 2 files changed, 1 insertion(+), 4 deletions(-)

diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index fefffb02e6b0..cddfcfddcb52 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -1338,9 +1338,6 @@ int io_run_local_work(struct io_ring_ctx *ctx)
 	bool locked;
 	int ret;
 
-	if (llist_empty(&ctx->work_llist))
-		return 0;
-
 	__set_current_state(TASK_RUNNING);
 	locked = mutex_trylock(&ctx->uring_lock);
 	ret = __io_run_local_work(ctx, &locked);
diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h
index e9f0d41ebb99..46c0f765a77a 100644
--- a/io_uring/io_uring.h
+++ b/io_uring/io_uring.h
@@ -274,7 +274,7 @@ static inline int io_run_task_work_ctx(struct io_ring_ctx *ctx)
 	int ret = 0;
 	int ret2;
 
-	if (ctx->flags & IORING_SETUP_DEFER_TASKRUN)
+	if (!llist_empty(&ctx->work_llist))
 		ret = io_run_local_work(ctx);
 
 	/* want to run this after in case more is added */
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCHSET REBASE 02/10] io_uring: don't iterate cq wait fast path
  2023-01-05 11:22 [PATCHSET REBASE 00/10] cq wait refactoring rebase Pavel Begunkov
  2023-01-05 11:22 ` [PATCHSET REBASE 01/10] io_uring: rearrange defer list checks Pavel Begunkov
@ 2023-01-05 11:22 ` Pavel Begunkov
  2023-01-05 11:22 ` [PATCHSET REBASE 03/10] io_uring: kill io_run_task_work_ctx Pavel Begunkov
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Pavel Begunkov @ 2023-01-05 11:22 UTC (permalink / raw)
  To: io-uring; +Cc: Jens Axboe, asml.silence

Task work runners keep running until all queues tw items are exhausted.
It's also rare for defer tw to queue normal tw and vise versa. Taking it
into account, there is only a dim chance that further iterating the
io_cqring_wait() fast path will get us anything and so we can remove
the loop there.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 io_uring/io_uring.c | 18 ++++++++----------
 1 file changed, 8 insertions(+), 10 deletions(-)

diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index cddfcfddcb52..ec800a8bed28 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -2507,18 +2507,16 @@ static int io_cqring_wait(struct io_ring_ctx *ctx, int min_events,
 
 	if (!io_allowed_run_tw(ctx))
 		return -EEXIST;
-
-	do {
-		/* always run at least 1 task work to process local work */
-		ret = io_run_task_work_ctx(ctx);
+	if (!llist_empty(&ctx->work_llist)) {
+		ret = io_run_local_work(ctx);
 		if (ret < 0)
 			return ret;
-		io_cqring_overflow_flush(ctx);
-
-		/* if user messes with these they will just get an early return */
-		if (__io_cqring_events_user(ctx) >= min_events)
-			return 0;
-	} while (ret > 0);
+	}
+	io_run_task_work();
+	io_cqring_overflow_flush(ctx);
+	/* if user messes with these they will just get an early return */
+	if (__io_cqring_events_user(ctx) >= min_events)
+		return 0;
 
 	if (sig) {
 #ifdef CONFIG_COMPAT
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCHSET REBASE 03/10] io_uring: kill io_run_task_work_ctx
  2023-01-05 11:22 [PATCHSET REBASE 00/10] cq wait refactoring rebase Pavel Begunkov
  2023-01-05 11:22 ` [PATCHSET REBASE 01/10] io_uring: rearrange defer list checks Pavel Begunkov
  2023-01-05 11:22 ` [PATCHSET REBASE 02/10] io_uring: don't iterate cq wait fast path Pavel Begunkov
@ 2023-01-05 11:22 ` Pavel Begunkov
  2023-01-05 11:22 ` [PATCHSET REBASE 04/10] io_uring: move defer tw task checks Pavel Begunkov
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Pavel Begunkov @ 2023-01-05 11:22 UTC (permalink / raw)
  To: io-uring; +Cc: Jens Axboe, asml.silence

There is only one user of io_run_task_work_ctx(), inline it.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 io_uring/io_uring.c |  6 +++++-
 io_uring/io_uring.h | 20 --------------------
 2 files changed, 5 insertions(+), 21 deletions(-)

diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index ec800a8bed28..bf6f9777d165 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -2452,7 +2452,11 @@ static int io_wake_function(struct wait_queue_entry *curr, unsigned int mode,
 
 int io_run_task_work_sig(struct io_ring_ctx *ctx)
 {
-	if (io_run_task_work_ctx(ctx) > 0)
+	if (!llist_empty(&ctx->work_llist)) {
+		if (io_run_local_work(ctx) > 0)
+			return 1;
+	}
+	if (io_run_task_work() > 0)
 		return 1;
 	if (task_sigpending(current))
 		return -EINTR;
diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h
index 46c0f765a77a..8a5c3affd724 100644
--- a/io_uring/io_uring.h
+++ b/io_uring/io_uring.h
@@ -269,26 +269,6 @@ static inline bool io_task_work_pending(struct io_ring_ctx *ctx)
 	return task_work_pending(current) || !wq_list_empty(&ctx->work_llist);
 }
 
-static inline int io_run_task_work_ctx(struct io_ring_ctx *ctx)
-{
-	int ret = 0;
-	int ret2;
-
-	if (!llist_empty(&ctx->work_llist))
-		ret = io_run_local_work(ctx);
-
-	/* want to run this after in case more is added */
-	ret2 = io_run_task_work();
-
-	/* Try propagate error in favour of if tasks were run,
-	 * but still make sure to run them if requested
-	 */
-	if (ret >= 0)
-		ret += ret2;
-
-	return ret;
-}
-
 static inline int io_run_local_work_locked(struct io_ring_ctx *ctx)
 {
 	bool locked;
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCHSET REBASE 04/10] io_uring: move defer tw task checks
  2023-01-05 11:22 [PATCHSET REBASE 00/10] cq wait refactoring rebase Pavel Begunkov
                   ` (2 preceding siblings ...)
  2023-01-05 11:22 ` [PATCHSET REBASE 03/10] io_uring: kill io_run_task_work_ctx Pavel Begunkov
@ 2023-01-05 11:22 ` Pavel Begunkov
  2023-01-05 11:22 ` [PATCHSET REBASE 05/10] io_uring: parse check_cq out of wq waiting Pavel Begunkov
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Pavel Begunkov @ 2023-01-05 11:22 UTC (permalink / raw)
  To: io-uring; +Cc: Jens Axboe, asml.silence

Most places that want to run local tw explicitly and in advance check if
they are allowed to do so. Don't rely on a similar check in
__io_run_local_work(), leave it as a just-in-case warning and make sure
callers checks capabilities themselves.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 io_uring/io_uring.c | 15 ++++++---------
 io_uring/io_uring.h |  5 +++++
 2 files changed, 11 insertions(+), 9 deletions(-)

diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index bf6f9777d165..3bb3e9889717 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -1296,14 +1296,13 @@ int __io_run_local_work(struct io_ring_ctx *ctx, bool *locked)
 	struct llist_node *node;
 	struct llist_node fake;
 	struct llist_node *current_final = NULL;
-	int ret;
+	int ret = 0;
 	unsigned int loops = 1;
 
-	if (unlikely(ctx->submitter_task != current))
+	if (WARN_ON_ONCE(ctx->submitter_task != current))
 		return -EEXIST;
 
 	node = io_llist_xchg(&ctx->work_llist, &fake);
-	ret = 0;
 again:
 	while (node != current_final) {
 		struct llist_node *next = node->next;
@@ -2511,11 +2510,8 @@ static int io_cqring_wait(struct io_ring_ctx *ctx, int min_events,
 
 	if (!io_allowed_run_tw(ctx))
 		return -EEXIST;
-	if (!llist_empty(&ctx->work_llist)) {
-		ret = io_run_local_work(ctx);
-		if (ret < 0)
-			return ret;
-	}
+	if (!llist_empty(&ctx->work_llist))
+		io_run_local_work(ctx);
 	io_run_task_work();
 	io_cqring_overflow_flush(ctx);
 	/* if user messes with these they will just get an early return */
@@ -3052,7 +3048,8 @@ static __cold bool io_uring_try_cancel_requests(struct io_ring_ctx *ctx,
 		}
 	}
 
-	if (ctx->flags & IORING_SETUP_DEFER_TASKRUN)
+	if ((ctx->flags & IORING_SETUP_DEFER_TASKRUN) &&
+	    io_allowed_defer_tw_run(ctx))
 		ret |= io_run_local_work(ctx) > 0;
 	ret |= io_cancel_defer_files(ctx, task, cancel_all);
 	mutex_lock(&ctx->uring_lock);
diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h
index 8a5c3affd724..9b7baeff5a1c 100644
--- a/io_uring/io_uring.h
+++ b/io_uring/io_uring.h
@@ -352,6 +352,11 @@ static inline struct io_kiocb *io_alloc_req(struct io_ring_ctx *ctx)
 	return container_of(node, struct io_kiocb, comp_list);
 }
 
+static inline bool io_allowed_defer_tw_run(struct io_ring_ctx *ctx)
+{
+	return likely(ctx->submitter_task == current);
+}
+
 static inline bool io_allowed_run_tw(struct io_ring_ctx *ctx)
 {
 	return likely(!(ctx->flags & IORING_SETUP_DEFER_TASKRUN) ||
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCHSET REBASE 05/10] io_uring: parse check_cq out of wq waiting
  2023-01-05 11:22 [PATCHSET REBASE 00/10] cq wait refactoring rebase Pavel Begunkov
                   ` (3 preceding siblings ...)
  2023-01-05 11:22 ` [PATCHSET REBASE 04/10] io_uring: move defer tw task checks Pavel Begunkov
@ 2023-01-05 11:22 ` Pavel Begunkov
  2023-01-05 11:22 ` [PATCHSET REBASE 06/10] io_uring: mimimise io_cqring_wait_schedule Pavel Begunkov
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Pavel Begunkov @ 2023-01-05 11:22 UTC (permalink / raw)
  To: io-uring; +Cc: Jens Axboe, asml.silence

We already avoid flushing overflows in io_cqring_wait_schedule() but
only return an error for the outer loop to handle it. Minimise it even
further by moving all ->check_cq parsing there.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 io_uring/io_uring.c | 32 ++++++++++++++++++--------------
 1 file changed, 18 insertions(+), 14 deletions(-)

diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 3bb3e9889717..067e3577ac9b 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -2468,21 +2468,13 @@ static inline int io_cqring_wait_schedule(struct io_ring_ctx *ctx,
 					  ktime_t *timeout)
 {
 	int ret;
-	unsigned long check_cq;
 
+	if (unlikely(READ_ONCE(ctx->check_cq)))
+		return 1;
 	/* make sure we run task_work before checking for signals */
 	ret = io_run_task_work_sig(ctx);
 	if (ret || io_should_wake(iowq))
 		return ret;
-
-	check_cq = READ_ONCE(ctx->check_cq);
-	if (unlikely(check_cq)) {
-		/* let the caller flush overflows, retry */
-		if (check_cq & BIT(IO_CHECK_CQ_OVERFLOW_BIT))
-			return 1;
-		if (check_cq & BIT(IO_CHECK_CQ_DROPPED_BIT))
-			return -EBADR;
-	}
 	if (!schedule_hrtimeout(timeout, HRTIMER_MODE_ABS))
 		return -ETIME;
 
@@ -2548,13 +2540,25 @@ static int io_cqring_wait(struct io_ring_ctx *ctx, int min_events,
 
 	trace_io_uring_cqring_wait(ctx, min_events);
 	do {
-		if (test_bit(IO_CHECK_CQ_OVERFLOW_BIT, &ctx->check_cq)) {
-			finish_wait(&ctx->cq_wait, &iowq.wq);
-			io_cqring_do_overflow_flush(ctx);
-		}
+		unsigned long check_cq;
+
 		prepare_to_wait_exclusive(&ctx->cq_wait, &iowq.wq,
 						TASK_INTERRUPTIBLE);
 		ret = io_cqring_wait_schedule(ctx, &iowq, &timeout);
+
+		check_cq = READ_ONCE(ctx->check_cq);
+		if (unlikely(check_cq)) {
+			/* let the caller flush overflows, retry */
+			if (check_cq & BIT(IO_CHECK_CQ_OVERFLOW_BIT)) {
+				finish_wait(&ctx->cq_wait, &iowq.wq);
+				io_cqring_do_overflow_flush(ctx);
+			}
+			if (check_cq & BIT(IO_CHECK_CQ_DROPPED_BIT)) {
+				ret = -EBADR;
+				break;
+			}
+		}
+
 		if (__io_cqring_events_user(ctx) >= min_events)
 			break;
 		cond_resched();
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCHSET REBASE 06/10] io_uring: mimimise io_cqring_wait_schedule
  2023-01-05 11:22 [PATCHSET REBASE 00/10] cq wait refactoring rebase Pavel Begunkov
                   ` (4 preceding siblings ...)
  2023-01-05 11:22 ` [PATCHSET REBASE 05/10] io_uring: parse check_cq out of wq waiting Pavel Begunkov
@ 2023-01-05 11:22 ` Pavel Begunkov
  2023-01-05 11:22 ` [PATCHSET REBASE 07/10] io_uring: simplify io_has_work Pavel Begunkov
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Pavel Begunkov @ 2023-01-05 11:22 UTC (permalink / raw)
  To: io-uring; +Cc: Jens Axboe, asml.silence

io_cqring_wait_schedule() is called after we started waiting on the cq
wq and set the state to TASK_INTERRUPTIBLE, for that reason we have to
constantly worry whether we has returned the state back to running or
not. Leave only quick checks in io_cqring_wait_schedule() and move the
rest including running task work to the callers. Note, we run tw in the
loop after the sched checks because of the fast path in the beginning of
the function.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 io_uring/io_uring.c | 39 +++++++++++++++++++++++----------------
 1 file changed, 23 insertions(+), 16 deletions(-)

diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 067e3577ac9b..b4ca238cbd63 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -2467,24 +2467,19 @@ static inline int io_cqring_wait_schedule(struct io_ring_ctx *ctx,
 					  struct io_wait_queue *iowq,
 					  ktime_t *timeout)
 {
-	int ret;
-
 	if (unlikely(READ_ONCE(ctx->check_cq)))
 		return 1;
-	/* make sure we run task_work before checking for signals */
-	ret = io_run_task_work_sig(ctx);
-	if (ret || io_should_wake(iowq))
-		return ret;
+	if (unlikely(!llist_empty(&ctx->work_llist)))
+		return 1;
+	if (unlikely(test_thread_flag(TIF_NOTIFY_SIGNAL)))
+		return 1;
+	if (unlikely(task_sigpending(current)))
+		return -EINTR;
+	if (unlikely(io_should_wake(iowq)))
+		return 0;
 	if (!schedule_hrtimeout(timeout, HRTIMER_MODE_ABS))
 		return -ETIME;
-
-	/*
-	 * Run task_work after scheduling. If we got woken because of
-	 * task_work being processed, run it now rather than let the caller
-	 * do another wait loop.
-	 */
-	ret = io_run_task_work_sig(ctx);
-	return ret < 0 ? ret : 1;
+	return 0;
 }
 
 /*
@@ -2545,6 +2540,16 @@ static int io_cqring_wait(struct io_ring_ctx *ctx, int min_events,
 		prepare_to_wait_exclusive(&ctx->cq_wait, &iowq.wq,
 						TASK_INTERRUPTIBLE);
 		ret = io_cqring_wait_schedule(ctx, &iowq, &timeout);
+		if (ret < 0)
+			break;
+		/*
+		 * Run task_work after scheduling and before io_should_wake().
+		 * If we got woken because of task_work being processed, run it
+		 * now rather than let the caller do another wait loop.
+		 */
+		io_run_task_work();
+		if (!llist_empty(&ctx->work_llist))
+			io_run_local_work(ctx);
 
 		check_cq = READ_ONCE(ctx->check_cq);
 		if (unlikely(check_cq)) {
@@ -2559,10 +2564,12 @@ static int io_cqring_wait(struct io_ring_ctx *ctx, int min_events,
 			}
 		}
 
-		if (__io_cqring_events_user(ctx) >= min_events)
+		if (io_should_wake(&iowq)) {
+			ret = 0;
 			break;
+		}
 		cond_resched();
-	} while (ret > 0);
+	} while (1);
 
 	finish_wait(&ctx->cq_wait, &iowq.wq);
 	restore_saved_sigmask_unless(ret == -EINTR);
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCHSET REBASE 07/10] io_uring: simplify io_has_work
  2023-01-05 11:22 [PATCHSET REBASE 00/10] cq wait refactoring rebase Pavel Begunkov
                   ` (5 preceding siblings ...)
  2023-01-05 11:22 ` [PATCHSET REBASE 06/10] io_uring: mimimise io_cqring_wait_schedule Pavel Begunkov
@ 2023-01-05 11:22 ` Pavel Begunkov
  2023-01-05 11:22 ` [PATCHSET REBASE 08/10] io_uring: set TASK_RUNNING right after schedule Pavel Begunkov
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Pavel Begunkov @ 2023-01-05 11:22 UTC (permalink / raw)
  To: io-uring; +Cc: Jens Axboe, asml.silence

->work_llist should never be non-empty for a non DEFER_TASKRUN ring, so
we can safely skip checking the flag.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 io_uring/io_uring.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index b4ca238cbd63..2376adce9570 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -2416,8 +2416,7 @@ struct io_wait_queue {
 static inline bool io_has_work(struct io_ring_ctx *ctx)
 {
 	return test_bit(IO_CHECK_CQ_OVERFLOW_BIT, &ctx->check_cq) ||
-	       ((ctx->flags & IORING_SETUP_DEFER_TASKRUN) &&
-		!llist_empty(&ctx->work_llist));
+	       !llist_empty(&ctx->work_llist);
 }
 
 static inline bool io_should_wake(struct io_wait_queue *iowq)
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCHSET REBASE 08/10] io_uring: set TASK_RUNNING right after schedule
  2023-01-05 11:22 [PATCHSET REBASE 00/10] cq wait refactoring rebase Pavel Begunkov
                   ` (6 preceding siblings ...)
  2023-01-05 11:22 ` [PATCHSET REBASE 07/10] io_uring: simplify io_has_work Pavel Begunkov
@ 2023-01-05 11:22 ` Pavel Begunkov
  2023-01-05 11:22 ` [PATCHSET REBASE 09/10] io_uring: optimise non-timeout waiting Pavel Begunkov
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Pavel Begunkov @ 2023-01-05 11:22 UTC (permalink / raw)
  To: io-uring; +Cc: Jens Axboe, asml.silence

Instead of constantly watching that the state of the task is running
before executing tw or taking locks in io_cqring_wait(), switch it back
to TASK_RUNNING immediately.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 io_uring/io_uring.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 2376adce9570..54ec0106ab83 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -2541,6 +2541,7 @@ static int io_cqring_wait(struct io_ring_ctx *ctx, int min_events,
 		ret = io_cqring_wait_schedule(ctx, &iowq, &timeout);
 		if (ret < 0)
 			break;
+		__set_current_state(TASK_RUNNING);
 		/*
 		 * Run task_work after scheduling and before io_should_wake().
 		 * If we got woken because of task_work being processed, run it
@@ -2553,10 +2554,8 @@ static int io_cqring_wait(struct io_ring_ctx *ctx, int min_events,
 		check_cq = READ_ONCE(ctx->check_cq);
 		if (unlikely(check_cq)) {
 			/* let the caller flush overflows, retry */
-			if (check_cq & BIT(IO_CHECK_CQ_OVERFLOW_BIT)) {
-				finish_wait(&ctx->cq_wait, &iowq.wq);
+			if (check_cq & BIT(IO_CHECK_CQ_OVERFLOW_BIT))
 				io_cqring_do_overflow_flush(ctx);
-			}
 			if (check_cq & BIT(IO_CHECK_CQ_DROPPED_BIT)) {
 				ret = -EBADR;
 				break;
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCHSET REBASE 09/10] io_uring: optimise non-timeout waiting
  2023-01-05 11:22 [PATCHSET REBASE 00/10] cq wait refactoring rebase Pavel Begunkov
                   ` (7 preceding siblings ...)
  2023-01-05 11:22 ` [PATCHSET REBASE 08/10] io_uring: set TASK_RUNNING right after schedule Pavel Begunkov
@ 2023-01-05 11:22 ` Pavel Begunkov
  2023-01-05 11:22 ` [PATCHSET REBASE 10/10] io_uring: keep timeout in io_wait_queue Pavel Begunkov
  2023-01-05 15:28 ` [PATCHSET REBASE 00/10] cq wait refactoring rebase Jens Axboe
  10 siblings, 0 replies; 12+ messages in thread
From: Pavel Begunkov @ 2023-01-05 11:22 UTC (permalink / raw)
  To: io-uring; +Cc: Jens Axboe, asml.silence

Unlike the jiffy scheduling version, schedule_hrtimeout() jumps a few
functions before getting into schedule() even if there is no actual
timeout needed. Some tests showed that it takes up to 1% of CPU cycles.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 io_uring/io_uring.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 54ec0106ab83..420b022f6c31 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -2476,7 +2476,9 @@ static inline int io_cqring_wait_schedule(struct io_ring_ctx *ctx,
 		return -EINTR;
 	if (unlikely(io_should_wake(iowq)))
 		return 0;
-	if (!schedule_hrtimeout(timeout, HRTIMER_MODE_ABS))
+	if (*timeout == KTIME_MAX)
+		schedule();
+	else if (!schedule_hrtimeout(timeout, HRTIMER_MODE_ABS))
 		return -ETIME;
 	return 0;
 }
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCHSET REBASE 10/10] io_uring: keep timeout in io_wait_queue
  2023-01-05 11:22 [PATCHSET REBASE 00/10] cq wait refactoring rebase Pavel Begunkov
                   ` (8 preceding siblings ...)
  2023-01-05 11:22 ` [PATCHSET REBASE 09/10] io_uring: optimise non-timeout waiting Pavel Begunkov
@ 2023-01-05 11:22 ` Pavel Begunkov
  2023-01-05 15:28 ` [PATCHSET REBASE 00/10] cq wait refactoring rebase Jens Axboe
  10 siblings, 0 replies; 12+ messages in thread
From: Pavel Begunkov @ 2023-01-05 11:22 UTC (permalink / raw)
  To: io-uring; +Cc: Jens Axboe, asml.silence

Move waiting timeout into io_wait_queue

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 io_uring/io_uring.c | 28 ++++++++++++++--------------
 1 file changed, 14 insertions(+), 14 deletions(-)

diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 420b022f6c31..a44b3b5813df 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -2411,6 +2411,7 @@ struct io_wait_queue {
 	struct io_ring_ctx *ctx;
 	unsigned cq_tail;
 	unsigned nr_timeouts;
+	ktime_t timeout;
 };
 
 static inline bool io_has_work(struct io_ring_ctx *ctx)
@@ -2463,8 +2464,7 @@ int io_run_task_work_sig(struct io_ring_ctx *ctx)
 
 /* when returns >0, the caller should retry */
 static inline int io_cqring_wait_schedule(struct io_ring_ctx *ctx,
-					  struct io_wait_queue *iowq,
-					  ktime_t *timeout)
+					  struct io_wait_queue *iowq)
 {
 	if (unlikely(READ_ONCE(ctx->check_cq)))
 		return 1;
@@ -2476,9 +2476,9 @@ static inline int io_cqring_wait_schedule(struct io_ring_ctx *ctx,
 		return -EINTR;
 	if (unlikely(io_should_wake(iowq)))
 		return 0;
-	if (*timeout == KTIME_MAX)
+	if (iowq->timeout == KTIME_MAX)
 		schedule();
-	else if (!schedule_hrtimeout(timeout, HRTIMER_MODE_ABS))
+	else if (!schedule_hrtimeout(&iowq->timeout, HRTIMER_MODE_ABS))
 		return -ETIME;
 	return 0;
 }
@@ -2493,7 +2493,6 @@ static int io_cqring_wait(struct io_ring_ctx *ctx, int min_events,
 {
 	struct io_wait_queue iowq;
 	struct io_rings *rings = ctx->rings;
-	ktime_t timeout = KTIME_MAX;
 	int ret;
 
 	if (!io_allowed_run_tw(ctx))
@@ -2519,20 +2518,21 @@ static int io_cqring_wait(struct io_ring_ctx *ctx, int min_events,
 			return ret;
 	}
 
-	if (uts) {
-		struct timespec64 ts;
-
-		if (get_timespec64(&ts, uts))
-			return -EFAULT;
-		timeout = ktime_add_ns(timespec64_to_ktime(ts), ktime_get_ns());
-	}
-
 	init_waitqueue_func_entry(&iowq.wq, io_wake_function);
 	iowq.wq.private = current;
 	INIT_LIST_HEAD(&iowq.wq.entry);
 	iowq.ctx = ctx;
 	iowq.nr_timeouts = atomic_read(&ctx->cq_timeouts);
 	iowq.cq_tail = READ_ONCE(ctx->rings->cq.head) + min_events;
+	iowq.timeout = KTIME_MAX;
+
+	if (uts) {
+		struct timespec64 ts;
+
+		if (get_timespec64(&ts, uts))
+			return -EFAULT;
+		iowq.timeout = ktime_add_ns(timespec64_to_ktime(ts), ktime_get_ns());
+	}
 
 	trace_io_uring_cqring_wait(ctx, min_events);
 	do {
@@ -2540,7 +2540,7 @@ static int io_cqring_wait(struct io_ring_ctx *ctx, int min_events,
 
 		prepare_to_wait_exclusive(&ctx->cq_wait, &iowq.wq,
 						TASK_INTERRUPTIBLE);
-		ret = io_cqring_wait_schedule(ctx, &iowq, &timeout);
+		ret = io_cqring_wait_schedule(ctx, &iowq);
 		if (ret < 0)
 			break;
 		__set_current_state(TASK_RUNNING);
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCHSET REBASE 00/10] cq wait refactoring rebase
  2023-01-05 11:22 [PATCHSET REBASE 00/10] cq wait refactoring rebase Pavel Begunkov
                   ` (9 preceding siblings ...)
  2023-01-05 11:22 ` [PATCHSET REBASE 10/10] io_uring: keep timeout in io_wait_queue Pavel Begunkov
@ 2023-01-05 15:28 ` Jens Axboe
  10 siblings, 0 replies; 12+ messages in thread
From: Jens Axboe @ 2023-01-05 15:28 UTC (permalink / raw)
  To: io-uring, Pavel Begunkov


On Thu, 05 Jan 2023 11:22:19 +0000, Pavel Begunkov wrote:
> Rebase of 6.3, i.e. recent CQ waiting refactoring series on top of
> just sent 6.2 patch ("io_uring: fix CQ waiting timeout handling").
> 
> Apart from that there are 2 more patches on top, 9/10 squeezes
> an extra 1% of perf for one of my tests.
> 
> Pavel Begunkov (10):
>   io_uring: rearrange defer list checks
>   io_uring: don't iterate cq wait fast path
>   io_uring: kill io_run_task_work_ctx
>   io_uring: move defer tw task checks
>   io_uring: parse check_cq out of wq waiting
>   io_uring: mimimise io_cqring_wait_schedule
>   io_uring: simplify io_has_work
>   io_uring: set TASK_RUNNING right after schedule
>   io_uring: optimise non-timeout waiting
>   io_uring: keep timeout in io_wait_queue
> 
> [...]

Applied, thanks!

[01/10] io_uring: rearrange defer list checks
        commit: d6d505fb8cafd1d78ca438fcd2d9a460c9695a1a
[02/10] io_uring: don't iterate cq wait fast path
        commit: ba5636ba85eb7b01bcbb03b8855ec444f78779ed
[03/10] io_uring: kill io_run_task_work_ctx
        commit: a95f8ea66f4804c9faeb51bf496008d09d43bf41
[04/10] io_uring: move defer tw task checks
        commit: 07ef9bf92577095d2e6612b010267f6bd5139936
[05/10] io_uring: parse check_cq out of wq waiting
        commit: 4a42b391367e7df55f8fd5f7a4e42cdf796ef36a
[06/10] io_uring: mimimise io_cqring_wait_schedule
        commit: 4fbb75ef0d9ed027d098394227596d91c59e246a
[07/10] io_uring: simplify io_has_work
        commit: 8d1f973f5025d62f88a9b3e4406c74c857ef49de
[08/10] io_uring: set TASK_RUNNING right after schedule
        commit: 2fc7bfa59e195d2a1fed638a35dde7ab794f0178
[09/10] io_uring: optimise non-timeout waiting
        commit: dc53cdbd5a63ef78c1acc34b2fda535bbcffd5c5
[10/10] io_uring: keep timeout in io_wait_queue
        commit: a367ffbe03a6da8d97f5f8057114bada90ccaea8

Best regards,
-- 
Jens Axboe



^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2023-01-05 15:28 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-01-05 11:22 [PATCHSET REBASE 00/10] cq wait refactoring rebase Pavel Begunkov
2023-01-05 11:22 ` [PATCHSET REBASE 01/10] io_uring: rearrange defer list checks Pavel Begunkov
2023-01-05 11:22 ` [PATCHSET REBASE 02/10] io_uring: don't iterate cq wait fast path Pavel Begunkov
2023-01-05 11:22 ` [PATCHSET REBASE 03/10] io_uring: kill io_run_task_work_ctx Pavel Begunkov
2023-01-05 11:22 ` [PATCHSET REBASE 04/10] io_uring: move defer tw task checks Pavel Begunkov
2023-01-05 11:22 ` [PATCHSET REBASE 05/10] io_uring: parse check_cq out of wq waiting Pavel Begunkov
2023-01-05 11:22 ` [PATCHSET REBASE 06/10] io_uring: mimimise io_cqring_wait_schedule Pavel Begunkov
2023-01-05 11:22 ` [PATCHSET REBASE 07/10] io_uring: simplify io_has_work Pavel Begunkov
2023-01-05 11:22 ` [PATCHSET REBASE 08/10] io_uring: set TASK_RUNNING right after schedule Pavel Begunkov
2023-01-05 11:22 ` [PATCHSET REBASE 09/10] io_uring: optimise non-timeout waiting Pavel Begunkov
2023-01-05 11:22 ` [PATCHSET REBASE 10/10] io_uring: keep timeout in io_wait_queue Pavel Begunkov
2023-01-05 15:28 ` [PATCHSET REBASE 00/10] cq wait refactoring rebase Jens Axboe

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.