* [PATCH for-next 0/7] cqe posting cleanups
@ 2022-06-19 11:26 Pavel Begunkov
  2022-06-19 11:26 ` [PATCH for-next 1/7] io_uring: remove extra io_commit_cqring() Pavel Begunkov
                   ` (8 more replies)
  0 siblings, 9 replies; 24+ messages in thread
From: Pavel Begunkov @ 2022-06-19 11:26 UTC (permalink / raw)
  To: io-uring; +Cc: Jens Axboe, asml.silence

Apart from these patches removing some implicit assumptions, which we
had problems with before, and making the code cleaner, they (especially
6-7) are also needed to push for synchronisation optimisations later,
like [1] or removing spinlocking with SINGLE_ISSUER.

The downside is that we add an additional lock/unlock into the eventfd
path, but I don't think we care about it.

The series also exposes a minor issue with cancellations: for some reason
io_kill_timeouts() and io_poll_remove_all() are called too many times on
task exit. That makes poll-cancel time out on SIGALRM, though it is
usually fine if given 3-5 seconds instead of 1. We'll investigate it later.

[1] https://github.com/isilence/linux/commit/6224f58bf7b542e6aed1eed44ee6bd5b5f706437
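
As a rough illustration of the SINGLE_ISSUER direction mentioned above
(not part of this series; the IORING_SETUP_SINGLE_ISSUER flag name and
the exact checks are assumptions, see also the discussion under 6/7),
the CQ locking helpers could later elide the spinlock along these lines:

static inline void io_cq_lock(struct io_ring_ctx *ctx)
{
        /* illustrative only: a single issuer means no concurrent CQE posters */
        if (!(ctx->flags & IORING_SETUP_SINGLE_ISSUER))
                spin_lock(&ctx->completion_lock);
}

static inline void io_cq_unlock_post(struct io_ring_ctx *ctx)
{
        io_commit_cqring(ctx);
        if (!(ctx->flags & IORING_SETUP_SINGLE_ISSUER))
                spin_unlock(&ctx->completion_lock);
        io_cqring_ev_posted(ctx);
}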

Pavel Begunkov (7):
  io_uring: remove extra io_commit_cqring()
  io_uring: reshuffle io_uring/io_uring.h
  io_uring: move io_eventfd_signal()
  io_uring: hide eventfd assumptions in evenfd paths
  io_uring: remove ->flush_cqes optimisation
  io_uring: introduce locking helpers for CQE posting
  io_uring: add io_commit_cqring_flush()

 include/linux/io_uring_types.h |   2 +
 io_uring/io_uring.c            | 144 ++++++++++++++++-----------------
 io_uring/io_uring.h            | 108 ++++++++++++++-----------
 io_uring/rw.c                  |   5 +-
 io_uring/timeout.c             |   7 +-
 5 files changed, 133 insertions(+), 133 deletions(-)

-- 
2.36.1



* [PATCH for-next 1/7] io_uring: remove extra io_commit_cqring()
  2022-06-19 11:26 [PATCH for-next 0/7] cqe posting cleanups Pavel Begunkov
@ 2022-06-19 11:26 ` Pavel Begunkov
  2022-06-19 11:26 ` [PATCH for-next 2/7] io_uring: reshuffle io_uring/io_uring.h Pavel Begunkov
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 24+ messages in thread
From: Pavel Begunkov @ 2022-06-19 11:26 UTC (permalink / raw)
  To: io-uring; +Cc: Jens Axboe, asml.silence

We don't post events in __io_commit_cqring_flush() anymore but send all
requests to task work instead, so there is no need to do
io_commit_cqring() there.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 io_uring/io_uring.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 37084fe3cc07..9e02c4a950ef 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -480,7 +480,6 @@ void __io_commit_cqring_flush(struct io_ring_ctx *ctx)
 			io_flush_timeouts(ctx);
 		if (ctx->drain_active)
 			io_queue_deferred(ctx);
-		io_commit_cqring(ctx);
 		spin_unlock(&ctx->completion_lock);
 	}
 	if (ctx->has_evfd)
-- 
2.36.1



* [PATCH for-next 2/7] io_uring: reshuffle io_uring/io_uring.h
  2022-06-19 11:26 [PATCH for-next 0/7] cqe posting cleanups Pavel Begunkov
  2022-06-19 11:26 ` [PATCH for-next 1/7] io_uring: remove extra io_commit_cqring() Pavel Begunkov
@ 2022-06-19 11:26 ` Pavel Begunkov
  2022-06-19 11:26 ` [PATCH for-next 3/7] io_uring: move io_eventfd_signal() Pavel Begunkov
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 24+ messages in thread
From: Pavel Begunkov @ 2022-06-19 11:26 UTC (permalink / raw)
  To: io-uring; +Cc: Jens Axboe, asml.silence

It's a good idea to first do forward declarations and then inline
helpers; otherwise we will keep stumbling on dependencies between them.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 io_uring/io_uring.h | 95 ++++++++++++++++++++++-----------------------
 1 file changed, 47 insertions(+), 48 deletions(-)

diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h
index 388391516a62..906749fa3415 100644
--- a/io_uring/io_uring.h
+++ b/io_uring/io_uring.h
@@ -18,6 +18,53 @@ enum {
 
 struct io_uring_cqe *__io_get_cqe(struct io_ring_ctx *ctx);
 bool io_req_cqe_overflow(struct io_kiocb *req);
+int io_run_task_work_sig(void);
+void io_req_complete_failed(struct io_kiocb *req, s32 res);
+void __io_req_complete(struct io_kiocb *req, unsigned issue_flags);
+void io_req_complete_post(struct io_kiocb *req);
+void __io_req_complete_post(struct io_kiocb *req);
+bool io_post_aux_cqe(struct io_ring_ctx *ctx, u64 user_data, s32 res, u32 cflags);
+void io_cqring_ev_posted(struct io_ring_ctx *ctx);
+void __io_commit_cqring_flush(struct io_ring_ctx *ctx);
+
+struct page **io_pin_pages(unsigned long ubuf, unsigned long len, int *npages);
+
+struct file *io_file_get_normal(struct io_kiocb *req, int fd);
+struct file *io_file_get_fixed(struct io_kiocb *req, int fd,
+			       unsigned issue_flags);
+
+bool io_is_uring_fops(struct file *file);
+bool io_alloc_async_data(struct io_kiocb *req);
+void io_req_task_work_add(struct io_kiocb *req);
+void io_req_task_prio_work_add(struct io_kiocb *req);
+void io_req_tw_post_queue(struct io_kiocb *req, s32 res, u32 cflags);
+void io_req_task_queue(struct io_kiocb *req);
+void io_queue_iowq(struct io_kiocb *req, bool *dont_use);
+void io_req_task_complete(struct io_kiocb *req, bool *locked);
+void io_req_task_queue_fail(struct io_kiocb *req, int ret);
+void io_req_task_submit(struct io_kiocb *req, bool *locked);
+void tctx_task_work(struct callback_head *cb);
+__cold void io_uring_cancel_generic(bool cancel_all, struct io_sq_data *sqd);
+int io_uring_alloc_task_context(struct task_struct *task,
+				struct io_ring_ctx *ctx);
+
+int io_poll_issue(struct io_kiocb *req, bool *locked);
+int io_submit_sqes(struct io_ring_ctx *ctx, unsigned int nr);
+int io_do_iopoll(struct io_ring_ctx *ctx, bool force_nonspin);
+void io_free_batch_list(struct io_ring_ctx *ctx, struct io_wq_work_node *node);
+int io_req_prep_async(struct io_kiocb *req);
+
+struct io_wq_work *io_wq_free_work(struct io_wq_work *work);
+void io_wq_submit_work(struct io_wq_work *work);
+
+void io_free_req(struct io_kiocb *req);
+void io_queue_next(struct io_kiocb *req);
+
+bool io_match_task_safe(struct io_kiocb *head, struct task_struct *task,
+			bool cancel_all);
+
+#define io_for_each_link(pos, head) \
+	for (pos = (head); pos; pos = pos->link)
 
 static inline struct io_uring_cqe *io_get_cqe(struct io_ring_ctx *ctx)
 {
@@ -190,52 +237,4 @@ static inline void io_req_complete_defer(struct io_kiocb *req)
 	wq_list_add_tail(&req->comp_list, &state->compl_reqs);
 }
 
-int io_run_task_work_sig(void);
-void io_req_complete_failed(struct io_kiocb *req, s32 res);
-void __io_req_complete(struct io_kiocb *req, unsigned issue_flags);
-void io_req_complete_post(struct io_kiocb *req);
-void __io_req_complete_post(struct io_kiocb *req);
-bool io_post_aux_cqe(struct io_ring_ctx *ctx, u64 user_data, s32 res, u32 cflags);
-void io_cqring_ev_posted(struct io_ring_ctx *ctx);
-void __io_commit_cqring_flush(struct io_ring_ctx *ctx);
-
-struct page **io_pin_pages(unsigned long ubuf, unsigned long len, int *npages);
-
-struct file *io_file_get_normal(struct io_kiocb *req, int fd);
-struct file *io_file_get_fixed(struct io_kiocb *req, int fd,
-			       unsigned issue_flags);
-
-bool io_is_uring_fops(struct file *file);
-bool io_alloc_async_data(struct io_kiocb *req);
-void io_req_task_work_add(struct io_kiocb *req);
-void io_req_task_prio_work_add(struct io_kiocb *req);
-void io_req_tw_post_queue(struct io_kiocb *req, s32 res, u32 cflags);
-void io_req_task_queue(struct io_kiocb *req);
-void io_queue_iowq(struct io_kiocb *req, bool *dont_use);
-void io_req_task_complete(struct io_kiocb *req, bool *locked);
-void io_req_task_queue_fail(struct io_kiocb *req, int ret);
-void io_req_task_submit(struct io_kiocb *req, bool *locked);
-void tctx_task_work(struct callback_head *cb);
-__cold void io_uring_cancel_generic(bool cancel_all, struct io_sq_data *sqd);
-int io_uring_alloc_task_context(struct task_struct *task,
-				struct io_ring_ctx *ctx);
-
-int io_poll_issue(struct io_kiocb *req, bool *locked);
-int io_submit_sqes(struct io_ring_ctx *ctx, unsigned int nr);
-int io_do_iopoll(struct io_ring_ctx *ctx, bool force_nonspin);
-void io_free_batch_list(struct io_ring_ctx *ctx, struct io_wq_work_node *node);
-int io_req_prep_async(struct io_kiocb *req);
-
-struct io_wq_work *io_wq_free_work(struct io_wq_work *work);
-void io_wq_submit_work(struct io_wq_work *work);
-
-void io_free_req(struct io_kiocb *req);
-void io_queue_next(struct io_kiocb *req);
-
-bool io_match_task_safe(struct io_kiocb *head, struct task_struct *task,
-			bool cancel_all);
-
-#define io_for_each_link(pos, head) \
-	for (pos = (head); pos; pos = pos->link)
-
 #endif
-- 
2.36.1



* [PATCH for-next 3/7] io_uring: move io_eventfd_signal()
  2022-06-19 11:26 [PATCH for-next 0/7] cqe posting cleanups Pavel Begunkov
  2022-06-19 11:26 ` [PATCH for-next 1/7] io_uring: remove extra io_commit_cqring() Pavel Begunkov
  2022-06-19 11:26 ` [PATCH for-next 2/7] io_uring: reshuffle io_uring/io_uring.h Pavel Begunkov
@ 2022-06-19 11:26 ` Pavel Begunkov
  2022-06-19 11:26 ` [PATCH for-next 4/7] io_uring: hide eventfd assumptions in evenfd paths Pavel Begunkov
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 24+ messages in thread
From: Pavel Begunkov @ 2022-06-19 11:26 UTC (permalink / raw)
  To: io-uring; +Cc: Jens Axboe, asml.silence

Move io_eventfd_signal() ahead of __io_commit_cqring_flush() in the
source file, without any changes, and kill its forward declaration.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 io_uring/io_uring.c | 30 ++++++++++++++----------------
 1 file changed, 14 insertions(+), 16 deletions(-)

diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 9e02c4a950ef..31beb9ccbf12 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -142,8 +142,6 @@ static void io_queue_sqe(struct io_kiocb *req);
 
 static void __io_submit_flush_completions(struct io_ring_ctx *ctx);
 
-static void io_eventfd_signal(struct io_ring_ctx *ctx);
-
 static struct kmem_cache *req_cachep;
 
 struct sock *io_uring_get_socket(struct file *file)
@@ -472,20 +470,6 @@ static __cold void io_queue_deferred(struct io_ring_ctx *ctx)
 	}
 }
 
-void __io_commit_cqring_flush(struct io_ring_ctx *ctx)
-{
-	if (ctx->off_timeout_used || ctx->drain_active) {
-		spin_lock(&ctx->completion_lock);
-		if (ctx->off_timeout_used)
-			io_flush_timeouts(ctx);
-		if (ctx->drain_active)
-			io_queue_deferred(ctx);
-		spin_unlock(&ctx->completion_lock);
-	}
-	if (ctx->has_evfd)
-		io_eventfd_signal(ctx);
-}
-
 static void io_eventfd_signal(struct io_ring_ctx *ctx)
 {
 	struct io_ev_fd *ev_fd;
@@ -513,6 +497,20 @@ static void io_eventfd_signal(struct io_ring_ctx *ctx)
 	rcu_read_unlock();
 }
 
+void __io_commit_cqring_flush(struct io_ring_ctx *ctx)
+{
+	if (ctx->off_timeout_used || ctx->drain_active) {
+		spin_lock(&ctx->completion_lock);
+		if (ctx->off_timeout_used)
+			io_flush_timeouts(ctx);
+		if (ctx->drain_active)
+			io_queue_deferred(ctx);
+		spin_unlock(&ctx->completion_lock);
+	}
+	if (ctx->has_evfd)
+		io_eventfd_signal(ctx);
+}
+
 /*
  * This should only get called when at least one event has been posted.
  * Some applications rely on the eventfd notification count only changing
-- 
2.36.1



* [PATCH for-next 4/7] io_uring: hide eventfd assumptions in evenfd paths
  2022-06-19 11:26 [PATCH for-next 0/7] cqe posting cleanups Pavel Begunkov
                   ` (2 preceding siblings ...)
  2022-06-19 11:26 ` [PATCH for-next 3/7] io_uring: move io_eventfd_signal() Pavel Begunkov
@ 2022-06-19 11:26 ` Pavel Begunkov
  2022-06-19 18:18   ` Jens Axboe
  2022-06-19 11:26 ` [PATCH for-next 5/7] io_uring: remove ->flush_cqes optimisation Pavel Begunkov
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 24+ messages in thread
From: Pavel Begunkov @ 2022-06-19 11:26 UTC (permalink / raw)
  To: io-uring; +Cc: Jens Axboe, asml.silence

Some io_uring-eventfd users assume that there won't be spurious wakeups.
That assumption has to be honoured by all io_cqring_ev_posted() callers,
which is inconvenient and from time to time leads to problems, but it has
to be maintained so as not to break userspace.

Instead of making the callers track whether a CQE was posted or not, hide
it inside io_eventfd_signal(). It saves the ->cached_cq_tail value it saw
last time and triggers the eventfd only when ->cached_cq_tail has changed
since then.
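
For reference, here is a minimal userspace sketch (liburing-based and
illustrative only; handle_completion() is a placeholder for application
logic) of the pattern that relies on this guarantee, where the eventfd
count is treated as "new CQEs are available":

#include <liburing.h>
#include <sys/eventfd.h>

void handle_completion(struct io_uring_cqe *cqe);	/* app-specific */

static void wait_loop(struct io_uring *ring, int efd)
{
        struct io_uring_cqe *cqe;
        eventfd_t n;

        for (;;) {
                /* block until the kernel bumps the eventfd count */
                eventfd_read(efd, &n);
                /* the app assumes the count only changes when new CQEs exist */
                while (io_uring_peek_cqe(ring, &cqe) == 0) {
                        handle_completion(cqe);
                        io_uring_cqe_seen(ring, cqe);
                }
        }
}

The eventfd is assumed to have been registered at setup time with
io_uring_register_eventfd().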

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 include/linux/io_uring_types.h |  2 ++
 io_uring/io_uring.c            | 44 ++++++++++++++++++++--------------
 io_uring/timeout.c             |  3 +--
 3 files changed, 29 insertions(+), 20 deletions(-)

diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
index 779c72da5b8f..327bc7f0808d 100644
--- a/include/linux/io_uring_types.h
+++ b/include/linux/io_uring_types.h
@@ -315,6 +315,8 @@ struct io_ring_ctx {
 
 	struct list_head		defer_list;
 	unsigned			sq_thread_idle;
+	/* protected by ->completion_lock */
+	unsigned			evfd_last_cq_tail;
 };
 
 enum {
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 31beb9ccbf12..0875cc649e23 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -473,6 +473,22 @@ static __cold void io_queue_deferred(struct io_ring_ctx *ctx)
 static void io_eventfd_signal(struct io_ring_ctx *ctx)
 {
 	struct io_ev_fd *ev_fd;
+	bool skip;
+
+	spin_lock(&ctx->completion_lock);
+	/*
+	 * Eventfd should only get triggered when at least one event has been
+	 * posted. Some applications rely on the eventfd notification count only
+	 * changing IFF a new CQE has been added to the CQ ring. There's no
+	 * depedency on 1:1 relationship between how many times this function is
+	 * called (and hence the eventfd count) and number of CQEs posted to the
+	 * CQ ring.
+	 */
+	skip = ctx->cached_cq_tail == ctx->evfd_last_cq_tail;
+	ctx->evfd_last_cq_tail = ctx->cached_cq_tail;
+	spin_unlock(&ctx->completion_lock);
+	if (skip)
+		return;
 
 	rcu_read_lock();
 	/*
@@ -511,13 +527,6 @@ void __io_commit_cqring_flush(struct io_ring_ctx *ctx)
 		io_eventfd_signal(ctx);
 }
 
-/*
- * This should only get called when at least one event has been posted.
- * Some applications rely on the eventfd notification count only changing
- * IFF a new CQE has been added to the CQ ring. There's no depedency on
- * 1:1 relationship between how many times this function is called (and
- * hence the eventfd count) and number of CQEs posted to the CQ ring.
- */
 void io_cqring_ev_posted(struct io_ring_ctx *ctx)
 {
 	if (unlikely(ctx->off_timeout_used || ctx->drain_active ||
@@ -530,7 +539,7 @@ void io_cqring_ev_posted(struct io_ring_ctx *ctx)
 /* Returns true if there are no backlogged entries after the flush */
 static bool __io_cqring_overflow_flush(struct io_ring_ctx *ctx, bool force)
 {
-	bool all_flushed, posted;
+	bool all_flushed;
 	size_t cqe_size = sizeof(struct io_uring_cqe);
 
 	if (!force && __io_cqring_events(ctx) == ctx->cq_entries)
@@ -539,7 +548,6 @@ static bool __io_cqring_overflow_flush(struct io_ring_ctx *ctx, bool force)
 	if (ctx->flags & IORING_SETUP_CQE32)
 		cqe_size <<= 1;
 
-	posted = false;
 	spin_lock(&ctx->completion_lock);
 	while (!list_empty(&ctx->cq_overflow_list)) {
 		struct io_uring_cqe *cqe = io_get_cqe(ctx);
@@ -554,7 +562,6 @@ static bool __io_cqring_overflow_flush(struct io_ring_ctx *ctx, bool force)
 		else
 			io_account_cq_overflow(ctx);
 
-		posted = true;
 		list_del(&ocqe->list);
 		kfree(ocqe);
 	}
@@ -567,8 +574,7 @@ static bool __io_cqring_overflow_flush(struct io_ring_ctx *ctx, bool force)
 
 	io_commit_cqring(ctx);
 	spin_unlock(&ctx->completion_lock);
-	if (posted)
-		io_cqring_ev_posted(ctx);
+	io_cqring_ev_posted(ctx);
 	return all_flushed;
 }
 
@@ -758,8 +764,7 @@ bool io_post_aux_cqe(struct io_ring_ctx *ctx,
 	filled = io_fill_cqe_aux(ctx, user_data, res, cflags);
 	io_commit_cqring(ctx);
 	spin_unlock(&ctx->completion_lock);
-	if (filled)
-		io_cqring_ev_posted(ctx);
+	io_cqring_ev_posted(ctx);
 	return filled;
 }
 
@@ -940,14 +945,12 @@ __cold void io_free_req(struct io_kiocb *req)
 static void __io_req_find_next_prep(struct io_kiocb *req)
 {
 	struct io_ring_ctx *ctx = req->ctx;
-	bool posted;
 
 	spin_lock(&ctx->completion_lock);
-	posted = io_disarm_next(req);
+	io_disarm_next(req);
 	io_commit_cqring(ctx);
 	spin_unlock(&ctx->completion_lock);
-	if (posted)
-		io_cqring_ev_posted(ctx);
+	io_cqring_ev_posted(ctx);
 }
 
 static inline struct io_kiocb *io_req_find_next(struct io_kiocb *req)
@@ -2431,6 +2434,11 @@ static int io_eventfd_register(struct io_ring_ctx *ctx, void __user *arg,
 		kfree(ev_fd);
 		return ret;
 	}
+
+	spin_lock(&ctx->completion_lock);
+	ctx->evfd_last_cq_tail = ctx->cached_cq_tail;
+	spin_unlock(&ctx->completion_lock);
+
 	ev_fd->eventfd_async = eventfd_async;
 	ctx->has_evfd = true;
 	rcu_assign_pointer(ctx->io_ev_fd, ev_fd);
diff --git a/io_uring/timeout.c b/io_uring/timeout.c
index 557c637af158..4938c1cdcbcd 100644
--- a/io_uring/timeout.c
+++ b/io_uring/timeout.c
@@ -628,7 +628,6 @@ __cold bool io_kill_timeouts(struct io_ring_ctx *ctx, struct task_struct *tsk,
 	spin_unlock_irq(&ctx->timeout_lock);
 	io_commit_cqring(ctx);
 	spin_unlock(&ctx->completion_lock);
-	if (canceled != 0)
-		io_cqring_ev_posted(ctx);
+	io_cqring_ev_posted(ctx);
 	return canceled != 0;
 }
-- 
2.36.1



* [PATCH for-next 5/7] io_uring: remove ->flush_cqes optimisation
  2022-06-19 11:26 [PATCH for-next 0/7] cqe posting cleanups Pavel Begunkov
                   ` (3 preceding siblings ...)
  2022-06-19 11:26 ` [PATCH for-next 4/7] io_uring: hide eventfd assumptions in evenfd paths Pavel Begunkov
@ 2022-06-19 11:26 ` Pavel Begunkov
  2022-06-19 13:31   ` Jens Axboe
  2022-06-19 11:26 ` [PATCH for-next 6/7] io_uring: introduce locking helpers for CQE posting Pavel Begunkov
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 24+ messages in thread
From: Pavel Begunkov @ 2022-06-19 11:26 UTC (permalink / raw)
  To: io-uring; +Cc: Jens Axboe, asml.silence

It's not clear how widely used IOSQE_CQE_SKIP_SUCCESS is, and how often
the ->flush_cqes flag prevents the completion flush. Sometimes it's a
high level of concurrency that enables it at least for one CQE, but
sometimes it doesn't save much because nobody is waiting on the CQ.

Remove the ->flush_cqes flag and the optimisation; it should benefit the
normal use case. Note that there is no spurious eventfd problem with
that, as the checks for spuriousness were incorporated into
io_eventfd_signal().
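
For context, a minimal liburing-style sketch (illustrative only; it
assumes an already initialised ring and an open fd) of how an application
typically sets IOSQE_CQE_SKIP_SUCCESS, e.g. on a close whose success it
doesn't need to observe:

struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);

io_uring_prep_close(sqe, fd);
/* suppress the CQE unless the close fails */
sqe->flags |= IOSQE_CQE_SKIP_SUCCESS;
io_uring_submit(&ring);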

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 io_uring/io_uring.c | 23 ++++++++++-------------
 io_uring/io_uring.h |  2 --
 2 files changed, 10 insertions(+), 15 deletions(-)

diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 0875cc649e23..57aef092ef38 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -1253,22 +1253,19 @@ static void __io_submit_flush_completions(struct io_ring_ctx *ctx)
 	struct io_wq_work_node *node, *prev;
 	struct io_submit_state *state = &ctx->submit_state;
 
-	if (state->flush_cqes) {
-		spin_lock(&ctx->completion_lock);
-		wq_list_for_each(node, prev, &state->compl_reqs) {
-			struct io_kiocb *req = container_of(node, struct io_kiocb,
-						    comp_list);
-
-			if (!(req->flags & REQ_F_CQE_SKIP))
-				__io_fill_cqe_req(ctx, req);
-		}
+	spin_lock(&ctx->completion_lock);
+	wq_list_for_each(node, prev, &state->compl_reqs) {
+		struct io_kiocb *req = container_of(node, struct io_kiocb,
+					    comp_list);
 
-		io_commit_cqring(ctx);
-		spin_unlock(&ctx->completion_lock);
-		io_cqring_ev_posted(ctx);
-		state->flush_cqes = false;
+		if (!(req->flags & REQ_F_CQE_SKIP))
+			__io_fill_cqe_req(ctx, req);
 	}
 
+	io_commit_cqring(ctx);
+	spin_unlock(&ctx->completion_lock);
+	io_cqring_ev_posted(ctx);
+
 	io_free_batch_list(ctx, state->compl_reqs.first);
 	INIT_WQ_LIST(&state->compl_reqs);
 }
diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h
index 906749fa3415..7feef8c36db7 100644
--- a/io_uring/io_uring.h
+++ b/io_uring/io_uring.h
@@ -232,8 +232,6 @@ static inline void io_req_complete_defer(struct io_kiocb *req)
 
 	lockdep_assert_held(&req->ctx->uring_lock);
 
-	if (!(req->flags & REQ_F_CQE_SKIP))
-		state->flush_cqes = true;
 	wq_list_add_tail(&req->comp_list, &state->compl_reqs);
 }
 
-- 
2.36.1



* [PATCH for-next 6/7] io_uring: introduce locking helpers for CQE posting
  2022-06-19 11:26 [PATCH for-next 0/7] cqe posting cleanups Pavel Begunkov
                   ` (4 preceding siblings ...)
  2022-06-19 11:26 ` [PATCH for-next 5/7] io_uring: remove ->flush_cqes optimisation Pavel Begunkov
@ 2022-06-19 11:26 ` Pavel Begunkov
  2022-06-19 13:30   ` Jens Axboe
  2022-06-19 11:26 ` [PATCH for-next 7/7] io_uring: add io_commit_cqring_flush() Pavel Begunkov
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 24+ messages in thread
From: Pavel Begunkov @ 2022-06-19 11:26 UTC (permalink / raw)
  To: io-uring; +Cc: Jens Axboe, asml.silence

spin_lock(&ctx->completion_lock);
/* post CQEs */
io_commit_cqring(ctx);
spin_unlock(&ctx->completion_lock);
io_cqring_ev_posted(ctx);

We have many places repeating this sequence, and the three-function
unlock section is not ideal from the maintenance perspective and also
makes it harder to add new locking/sync tricks.

Introduce two helpers: io_cq_lock(), which is simple and only grabs
->completion_lock, and io_cq_unlock_post(), which encapsulates the
three-call section.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 io_uring/io_uring.c | 57 +++++++++++++++++++++------------------------
 io_uring/io_uring.h |  9 ++++++-
 io_uring/timeout.c  |  6 ++---
 3 files changed, 36 insertions(+), 36 deletions(-)

diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 57aef092ef38..cff046b0734b 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -527,7 +527,7 @@ void __io_commit_cqring_flush(struct io_ring_ctx *ctx)
 		io_eventfd_signal(ctx);
 }
 
-void io_cqring_ev_posted(struct io_ring_ctx *ctx)
+static inline void io_cqring_ev_posted(struct io_ring_ctx *ctx)
 {
 	if (unlikely(ctx->off_timeout_used || ctx->drain_active ||
 		     ctx->has_evfd))
@@ -536,6 +536,19 @@ void io_cqring_ev_posted(struct io_ring_ctx *ctx)
 	io_cqring_wake(ctx);
 }
 
+static inline void __io_cq_unlock_post(struct io_ring_ctx *ctx)
+	__releases(ctx->completion_lock)
+{
+	io_commit_cqring(ctx);
+	spin_unlock(&ctx->completion_lock);
+	io_cqring_ev_posted(ctx);
+}
+
+void io_cq_unlock_post(struct io_ring_ctx *ctx)
+{
+	__io_cq_unlock_post(ctx);
+}
+
 /* Returns true if there are no backlogged entries after the flush */
 static bool __io_cqring_overflow_flush(struct io_ring_ctx *ctx, bool force)
 {
@@ -548,7 +561,7 @@ static bool __io_cqring_overflow_flush(struct io_ring_ctx *ctx, bool force)
 	if (ctx->flags & IORING_SETUP_CQE32)
 		cqe_size <<= 1;
 
-	spin_lock(&ctx->completion_lock);
+	io_cq_lock(ctx);
 	while (!list_empty(&ctx->cq_overflow_list)) {
 		struct io_uring_cqe *cqe = io_get_cqe(ctx);
 		struct io_overflow_cqe *ocqe;
@@ -572,9 +585,7 @@ static bool __io_cqring_overflow_flush(struct io_ring_ctx *ctx, bool force)
 		atomic_andnot(IORING_SQ_CQ_OVERFLOW, &ctx->rings->sq_flags);
 	}
 
-	io_commit_cqring(ctx);
-	spin_unlock(&ctx->completion_lock);
-	io_cqring_ev_posted(ctx);
+	io_cq_unlock_post(ctx);
 	return all_flushed;
 }
 
@@ -760,11 +771,9 @@ bool io_post_aux_cqe(struct io_ring_ctx *ctx,
 {
 	bool filled;
 
-	spin_lock(&ctx->completion_lock);
+	io_cq_lock(ctx);
 	filled = io_fill_cqe_aux(ctx, user_data, res, cflags);
-	io_commit_cqring(ctx);
-	spin_unlock(&ctx->completion_lock);
-	io_cqring_ev_posted(ctx);
+	io_cq_unlock_post(ctx);
 	return filled;
 }
 
@@ -810,11 +819,9 @@ void io_req_complete_post(struct io_kiocb *req)
 {
 	struct io_ring_ctx *ctx = req->ctx;
 
-	spin_lock(&ctx->completion_lock);
+	io_cq_lock(ctx);
 	__io_req_complete_post(req);
-	io_commit_cqring(ctx);
-	spin_unlock(&ctx->completion_lock);
-	io_cqring_ev_posted(ctx);
+	io_cq_unlock_post(ctx);
 }
 
 inline void __io_req_complete(struct io_kiocb *req, unsigned issue_flags)
@@ -946,11 +953,9 @@ static void __io_req_find_next_prep(struct io_kiocb *req)
 {
 	struct io_ring_ctx *ctx = req->ctx;
 
-	spin_lock(&ctx->completion_lock);
+	io_cq_lock(ctx);
 	io_disarm_next(req);
-	io_commit_cqring(ctx);
-	spin_unlock(&ctx->completion_lock);
-	io_cqring_ev_posted(ctx);
+	io_cq_unlock_post(ctx);
 }
 
 static inline struct io_kiocb *io_req_find_next(struct io_kiocb *req)
@@ -984,13 +989,6 @@ static void ctx_flush_and_put(struct io_ring_ctx *ctx, bool *locked)
 	percpu_ref_put(&ctx->refs);
 }
 
-static inline void ctx_commit_and_unlock(struct io_ring_ctx *ctx)
-{
-	io_commit_cqring(ctx);
-	spin_unlock(&ctx->completion_lock);
-	io_cqring_ev_posted(ctx);
-}
-
 static void handle_prev_tw_list(struct io_wq_work_node *node,
 				struct io_ring_ctx **ctx, bool *uring_locked)
 {
@@ -1006,7 +1004,7 @@ static void handle_prev_tw_list(struct io_wq_work_node *node,
 
 		if (req->ctx != *ctx) {
 			if (unlikely(!*uring_locked && *ctx))
-				ctx_commit_and_unlock(*ctx);
+				io_cq_unlock_post(*ctx);
 
 			ctx_flush_and_put(*ctx, uring_locked);
 			*ctx = req->ctx;
@@ -1014,7 +1012,7 @@ static void handle_prev_tw_list(struct io_wq_work_node *node,
 			*uring_locked = mutex_trylock(&(*ctx)->uring_lock);
 			percpu_ref_get(&(*ctx)->refs);
 			if (unlikely(!*uring_locked))
-				spin_lock(&(*ctx)->completion_lock);
+				io_cq_lock(*ctx);
 		}
 		if (likely(*uring_locked)) {
 			req->io_task_work.func(req, uring_locked);
@@ -1026,7 +1024,7 @@ static void handle_prev_tw_list(struct io_wq_work_node *node,
 	} while (node);
 
 	if (unlikely(!*uring_locked))
-		ctx_commit_and_unlock(*ctx);
+		io_cq_unlock_post(*ctx);
 }
 
 static void handle_tw_list(struct io_wq_work_node *node,
@@ -1261,10 +1259,7 @@ static void __io_submit_flush_completions(struct io_ring_ctx *ctx)
 		if (!(req->flags & REQ_F_CQE_SKIP))
 			__io_fill_cqe_req(ctx, req);
 	}
-
-	io_commit_cqring(ctx);
-	spin_unlock(&ctx->completion_lock);
-	io_cqring_ev_posted(ctx);
+	__io_cq_unlock_post(ctx);
 
 	io_free_batch_list(ctx, state->compl_reqs.first);
 	INIT_WQ_LIST(&state->compl_reqs);
diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h
index 7feef8c36db7..bb8367908472 100644
--- a/io_uring/io_uring.h
+++ b/io_uring/io_uring.h
@@ -24,7 +24,6 @@ void __io_req_complete(struct io_kiocb *req, unsigned issue_flags);
 void io_req_complete_post(struct io_kiocb *req);
 void __io_req_complete_post(struct io_kiocb *req);
 bool io_post_aux_cqe(struct io_ring_ctx *ctx, u64 user_data, s32 res, u32 cflags);
-void io_cqring_ev_posted(struct io_ring_ctx *ctx);
 void __io_commit_cqring_flush(struct io_ring_ctx *ctx);
 
 struct page **io_pin_pages(unsigned long ubuf, unsigned long len, int *npages);
@@ -66,6 +65,14 @@ bool io_match_task_safe(struct io_kiocb *head, struct task_struct *task,
 #define io_for_each_link(pos, head) \
 	for (pos = (head); pos; pos = pos->link)
 
+static inline void io_cq_lock(struct io_ring_ctx *ctx)
+	__acquires(ctx->completion_lock)
+{
+	spin_lock(&ctx->completion_lock);
+}
+
+void io_cq_unlock_post(struct io_ring_ctx *ctx);
+
 static inline struct io_uring_cqe *io_get_cqe(struct io_ring_ctx *ctx)
 {
 	if (likely(ctx->cqe_cached < ctx->cqe_sentinel)) {
diff --git a/io_uring/timeout.c b/io_uring/timeout.c
index 4938c1cdcbcd..3c331b723332 100644
--- a/io_uring/timeout.c
+++ b/io_uring/timeout.c
@@ -615,7 +615,7 @@ __cold bool io_kill_timeouts(struct io_ring_ctx *ctx, struct task_struct *tsk,
 	struct io_timeout *timeout, *tmp;
 	int canceled = 0;
 
-	spin_lock(&ctx->completion_lock);
+	io_cq_lock(ctx);
 	spin_lock_irq(&ctx->timeout_lock);
 	list_for_each_entry_safe(timeout, tmp, &ctx->timeout_list, list) {
 		struct io_kiocb *req = cmd_to_io_kiocb(timeout);
@@ -626,8 +626,6 @@ __cold bool io_kill_timeouts(struct io_ring_ctx *ctx, struct task_struct *tsk,
 		}
 	}
 	spin_unlock_irq(&ctx->timeout_lock);
-	io_commit_cqring(ctx);
-	spin_unlock(&ctx->completion_lock);
-	io_cqring_ev_posted(ctx);
+	io_cq_unlock_post(ctx);
 	return canceled != 0;
 }
-- 
2.36.1



* [PATCH for-next 7/7] io_uring: add io_commit_cqring_flush()
  2022-06-19 11:26 [PATCH for-next 0/7] cqe posting cleanups Pavel Begunkov
                   ` (5 preceding siblings ...)
  2022-06-19 11:26 ` [PATCH for-next 6/7] io_uring: introduce locking helpers for CQE posting Pavel Begunkov
@ 2022-06-19 11:26 ` Pavel Begunkov
  2022-06-19 12:36 ` [PATCH for-next 0/7] cqe posting cleanups Pavel Begunkov
  2022-06-19 16:01 ` Jens Axboe
  8 siblings, 0 replies; 24+ messages in thread
From: Pavel Begunkov @ 2022-06-19 11:26 UTC (permalink / raw)
  To: io-uring; +Cc: Jens Axboe, asml.silence

Since the users of __io_commit_cqring_flush() have moved to different
files, introduce an io_commit_cqring_flush() helper and encapsulate all
the flag-testing details inside it.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 io_uring/io_uring.c | 5 +----
 io_uring/io_uring.h | 6 ++++++
 io_uring/rw.c       | 5 +----
 3 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index cff046b0734b..c24c285dfac9 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -529,10 +529,7 @@ void __io_commit_cqring_flush(struct io_ring_ctx *ctx)
 
 static inline void io_cqring_ev_posted(struct io_ring_ctx *ctx)
 {
-	if (unlikely(ctx->off_timeout_used || ctx->drain_active ||
-		     ctx->has_evfd))
-		__io_commit_cqring_flush(ctx);
-
+	io_commit_cqring_flush(ctx);
 	io_cqring_wake(ctx);
 }
 
diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h
index bb8367908472..76cfb88af812 100644
--- a/io_uring/io_uring.h
+++ b/io_uring/io_uring.h
@@ -242,4 +242,10 @@ static inline void io_req_complete_defer(struct io_kiocb *req)
 	wq_list_add_tail(&req->comp_list, &state->compl_reqs);
 }
 
+static inline void io_commit_cqring_flush(struct io_ring_ctx *ctx)
+{
+	if (unlikely(ctx->off_timeout_used || ctx->drain_active || ctx->has_evfd))
+		__io_commit_cqring_flush(ctx);
+}
+
 #endif
diff --git a/io_uring/rw.c b/io_uring/rw.c
index 5f24db65a81d..17707e78ab01 100644
--- a/io_uring/rw.c
+++ b/io_uring/rw.c
@@ -948,10 +948,7 @@ int io_write(struct io_kiocb *req, unsigned int issue_flags)
 
 static void io_cqring_ev_posted_iopoll(struct io_ring_ctx *ctx)
 {
-	if (unlikely(ctx->off_timeout_used || ctx->drain_active ||
-		     ctx->has_evfd))
-		__io_commit_cqring_flush(ctx);
-
+	io_commit_cqring_flush(ctx);
 	if (ctx->flags & IORING_SETUP_SQPOLL)
 		io_cqring_wake(ctx);
 }
-- 
2.36.1



* Re: [PATCH for-next 0/7] cqe posting cleanups
  2022-06-19 11:26 [PATCH for-next 0/7] cqe posting cleanups Pavel Begunkov
                   ` (6 preceding siblings ...)
  2022-06-19 11:26 ` [PATCH for-next 7/7] io_uring: add io_commit_cqring_flush() Pavel Begunkov
@ 2022-06-19 12:36 ` Pavel Begunkov
  2022-06-19 16:01 ` Jens Axboe
  8 siblings, 0 replies; 24+ messages in thread
From: Pavel Begunkov @ 2022-06-19 12:36 UTC (permalink / raw)
  To: io-uring; +Cc: Jens Axboe

On 6/19/22 12:26, Pavel Begunkov wrote:
> Apart from this patches removing some implicit assumptions, which we
> had problems with before, and making code cleaner, they and especially
> 6-7 are also needed to push for synchronisation optimisations later, lile
> [1] or removing spinlocking with SINGLE_ISSUER.
> 
> The downside is that we add additional lock/unlock into eventfd path,
> but I don't think we care about it.

Just in case there are conflicts, it's based on top of

"[PATCH for-next 0/4] simple cleanups" from yesterday


> 
> The series also exposes a minor issue with cancellations, which for some
> reason calls io_kill_timeouts() and io_poll_remove_all() too many times on
> task exit. That makes poll-cancel to timeout on sigalarm, though usually
> is fine if given 3-5 sec instead of 1. We'll investigate it later.
> 
> [1] https://github.com/isilence/linux/commit/6224f58bf7b542e6aed1eed44ee6bd5b5f706437
> 
> Pavel Begunkov (7):
>    io_uring: remove extra io_commit_cqring()
>    io_uring: reshuffle io_uring/io_uring.h
>    io_uring: move io_eventfd_signal()
>    io_uring: hide eventfd assumptions in evenfd paths
>    io_uring: remove ->flush_cqes optimisation
>    io_uring: introduce locking helpers for CQE posting
>    io_uring: add io_commit_cqring_flush()
> 
>   include/linux/io_uring_types.h |   2 +
>   io_uring/io_uring.c            | 144 ++++++++++++++++-----------------
>   io_uring/io_uring.h            | 108 ++++++++++++++-----------
>   io_uring/rw.c                  |   5 +-
>   io_uring/timeout.c             |   7 +-
>   5 files changed, 133 insertions(+), 133 deletions(-)
> 

-- 
Pavel Begunkov


* Re: [PATCH for-next 6/7] io_uring: introduce locking helpers for CQE posting
  2022-06-19 11:26 ` [PATCH for-next 6/7] io_uring: introduce locking helpers for CQE posting Pavel Begunkov
@ 2022-06-19 13:30   ` Jens Axboe
  2022-06-19 14:20     ` Pavel Begunkov
  0 siblings, 1 reply; 24+ messages in thread
From: Jens Axboe @ 2022-06-19 13:30 UTC (permalink / raw)
  To: Pavel Begunkov, io-uring

On 6/19/22 5:26 AM, Pavel Begunkov wrote:
> spin_lock(&ctx->completion_lock);
> /* post CQEs */
> io_commit_cqring(ctx);
> spin_unlock(&ctx->completion_lock);
> io_cqring_ev_posted(ctx);
> 
> We have many places repeating this sequence, and the three function
> unlock section is not perfect from the maintainance perspective and also
> makes harder to add new locking/sync trick.
> 
> Introduce to helpers. io_cq_lock(), which is simple and only grabs
> ->completion_lock, and io_cq_unlock_post() encapsulating the three call
> section.

I'm a bit split on this one, since I generally hate helpers that are
just wrapping something trivial:

static inline void io_cq_lock(struct io_ring_ctx *ctx)
	__acquires(ctx->completion_lock)
{
	spin_lock(&ctx->completion_lock);
}

The problem imho is that when I see spin_lock(ctx->lock) in the code I
know exactly what it does; if I see io_cq_lock(ctx) I have a good guess,
but I don't know for a fact until I become familiar with that new
helper.

I can see why you're doing it as it gives us symmetry with the unlock
helper, which does indeed make more sense. But I do wonder if we
shouldn't just keep the spin_lock() part the same, and just have the
unlock helper?

-- 
Jens Axboe



* Re: [PATCH for-next 5/7] io_uring: remove ->flush_cqes optimisation
  2022-06-19 11:26 ` [PATCH for-next 5/7] io_uring: remove ->flush_cqes optimisation Pavel Begunkov
@ 2022-06-19 13:31   ` Jens Axboe
  2022-06-19 14:52     ` Pavel Begunkov
  0 siblings, 1 reply; 24+ messages in thread
From: Jens Axboe @ 2022-06-19 13:31 UTC (permalink / raw)
  To: Pavel Begunkov, io-uring

On 6/19/22 5:26 AM, Pavel Begunkov wrote:
> It's not clear how widely used IOSQE_CQE_SKIP_SUCCESS is, and how often
> ->flush_cqes flag prevents from completion being flushed. Sometimes it's
> high level of concurrency that enables it at least for one CQE, but
> sometimes it doesn't save much because nobody waiting on the CQ.
> 
> Remove ->flush_cqes flag and the optimisation, it should benefit the
> normal use case. Note, that there is no spurious eventfd problem with
> that as checks for spuriousness were incorporated into
> io_eventfd_signal().

Would be nice to quantify, which should be pretty easy. E.g. run a nop
workload, then run the same but with CQE_SKIP_SUCCESS set. That'd take
it to the extreme, and I do think it'd be nice to have an understanding
of how big the gap could potentially be.

With luck, it doesn't really matter. Always nice to kill stuff like
this, if it isn't that impactful.

-- 
Jens Axboe



* Re: [PATCH for-next 6/7] io_uring: introduce locking helpers for CQE posting
  2022-06-19 13:30   ` Jens Axboe
@ 2022-06-19 14:20     ` Pavel Begunkov
  2022-06-19 15:50       ` Jens Axboe
  0 siblings, 1 reply; 24+ messages in thread
From: Pavel Begunkov @ 2022-06-19 14:20 UTC (permalink / raw)
  To: Jens Axboe, io-uring

On 6/19/22 14:30, Jens Axboe wrote:
> On 6/19/22 5:26 AM, Pavel Begunkov wrote:
>> spin_lock(&ctx->completion_lock);
>> /* post CQEs */
>> io_commit_cqring(ctx);
>> spin_unlock(&ctx->completion_lock);
>> io_cqring_ev_posted(ctx);
>>
>> We have many places repeating this sequence, and the three function
>> unlock section is not perfect from the maintainance perspective and also
>> makes harder to add new locking/sync trick.
>>
>> Introduce to helpers. io_cq_lock(), which is simple and only grabs
>> ->completion_lock, and io_cq_unlock_post() encapsulating the three call
>> section.
> 
> I'm a bit split on this one, since I generally hate helpers that are
> just wrapping something trivial:
> 
> static inline void io_cq_lock(struct io_ring_ctx *ctx)
> 	__acquires(ctx->completion_lock)
> {
> 	spin_lock(&ctx->completion_lock);
> }
> 
> The problem imho is that when I see spin_lock(ctx->lock) in the code I
> know exactly what it does, if I see io_cq_lock(ctx) I have a good guess,
> but I don't know for a fact until I become familiar with that new
> helper.
> 
> I can see why you're doing it as it gives us symmetry with the unlock
> helper, which does indeed make more sense. But I do wonder if we
> shouldn't just keep the spin_lock() part the same, and just have the
> unlock helper?

That's what I was doing first, but it's too ugly; that's the main
reason. And if we find that removing locking with SINGLE_ISSUER
is worth it, it'd need modifications on the locking side:

cq_lock() {
	if (!(ctx->flags & SINGLE_ISSUER))
		lock(compl_lock);
}

cq_unlock() {
	...
	if (!(ctx->flags & SINGLE_ISSUER))
		unlock(compl_lock);
}


-- 
Pavel Begunkov


* Re: [PATCH for-next 5/7] io_uring: remove ->flush_cqes optimisation
  2022-06-19 13:31   ` Jens Axboe
@ 2022-06-19 14:52     ` Pavel Begunkov
  2022-06-19 15:52       ` Jens Axboe
  0 siblings, 1 reply; 24+ messages in thread
From: Pavel Begunkov @ 2022-06-19 14:52 UTC (permalink / raw)
  To: Jens Axboe, io-uring

On 6/19/22 14:31, Jens Axboe wrote:
> On 6/19/22 5:26 AM, Pavel Begunkov wrote:
>> It's not clear how widely used IOSQE_CQE_SKIP_SUCCESS is, and how often
>> ->flush_cqes flag prevents from completion being flushed. Sometimes it's
>> high level of concurrency that enables it at least for one CQE, but
>> sometimes it doesn't save much because nobody waiting on the CQ.
>>
>> Remove ->flush_cqes flag and the optimisation, it should benefit the
>> normal use case. Note, that there is no spurious eventfd problem with
>> that as checks for spuriousness were incorporated into
>> io_eventfd_signal().
> 
> Would be note to quantify, which should be pretty easy. Eg run a nop
> workload, then run the same but with CQE_SKIP_SUCCESS set. That'd take
> it to the extreme, and I do think it'd be nice to have an understanding
> of how big the gap could potentially be.
> 
> With luck, it doesn't really matter. Always nice to kill stuff like
> this, if it isn't that impactful.

Tried nops32 without this patch (submit 32 nops, complete all, repeat):

1) all CQE_SKIP:
	~51 Mreqs/s
2) all CQE_SKIP but last, so it triggers locking + *ev_posted()
	~49 Mreq/s
3) same as 2) but another task waits on CQ (so we call wake_up_all)
	~36 Mreq/s

And that's more or less expected. What is more interesting to me
is how often, for those using CQE_SKIP, it actually helps to avoid this
ev_posted()/etc. They obviously can't just mark all requests
with it, and it most probably helps only some quite niche cases.
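
For reference, a rough sketch of such a nop loop with liburing
(illustrative only, not the actual test harness):

/* assumes an initialised struct io_uring ring */
for (;;) {
        struct io_uring_cqe *cqe;

        for (int i = 0; i < 32; i++) {
                struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);

                io_uring_prep_nop(sqe);
                if (i != 31)    /* variant 2): skip all CQEs but the last */
                        sqe->flags |= IOSQE_CQE_SKIP_SUCCESS;
        }
        io_uring_submit(&ring);
        io_uring_wait_cqe(&ring, &cqe);
        io_uring_cqe_seen(&ring, cqe);
}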

-- 
Pavel Begunkov


* Re: [PATCH for-next 6/7] io_uring: introduce locking helpers for CQE posting
  2022-06-19 14:20     ` Pavel Begunkov
@ 2022-06-19 15:50       ` Jens Axboe
  0 siblings, 0 replies; 24+ messages in thread
From: Jens Axboe @ 2022-06-19 15:50 UTC (permalink / raw)
  To: Pavel Begunkov, io-uring

On 6/19/22 8:20 AM, Pavel Begunkov wrote:
> On 6/19/22 14:30, Jens Axboe wrote:
>> On 6/19/22 5:26 AM, Pavel Begunkov wrote:
>>> spin_lock(&ctx->completion_lock);
>>> /* post CQEs */
>>> io_commit_cqring(ctx);
>>> spin_unlock(&ctx->completion_lock);
>>> io_cqring_ev_posted(ctx);
>>>
>>> We have many places repeating this sequence, and the three function
>>> unlock section is not perfect from the maintainance perspective and also
>>> makes harder to add new locking/sync trick.
>>>
>>> Introduce to helpers. io_cq_lock(), which is simple and only grabs
>>> ->completion_lock, and io_cq_unlock_post() encapsulating the three call
>>> section.
>>
>> I'm a bit split on this one, since I generally hate helpers that are
>> just wrapping something trivial:
>>
>> static inline void io_cq_lock(struct io_ring_ctx *ctx)
>>     __acquires(ctx->completion_lock)
>> {
>>     spin_lock(&ctx->completion_lock);
>> }
>>
>> The problem imho is that when I see spin_lock(ctx->lock) in the code I
>> know exactly what it does, if I see io_cq_lock(ctx) I have a good guess,
>> but I don't know for a fact until I become familiar with that new
>> helper.
>>
>> I can see why you're doing it as it gives us symmetry with the unlock
>> helper, which does indeed make more sense. But I do wonder if we
>> shouldn't just keep the spin_lock() part the same, and just have the
>> unlock helper?
> 
> That what I was doing first, but it's too ugly, that's the main
> reason. And if we find that removing locking with SINGLE_ISSUER
> is worth it, it'd need modification on the locking side:
> 
> cq_lock() {
>     if (!(ctx->flags & SINGLE_ISSUER))
>         lock(compl_lock);
> }
> 
> cq_unlock() {
>     ...
>     if (!(ctx->flags & SINGLE_ISSUER))
>         unlock(compl_lock);
> }

OK, that makes sense if the helper will grow further changes.

-- 
Jens Axboe



* Re: [PATCH for-next 5/7] io_uring: remove ->flush_cqes optimisation
  2022-06-19 14:52     ` Pavel Begunkov
@ 2022-06-19 15:52       ` Jens Axboe
  2022-06-19 16:15         ` Pavel Begunkov
  0 siblings, 1 reply; 24+ messages in thread
From: Jens Axboe @ 2022-06-19 15:52 UTC (permalink / raw)
  To: Pavel Begunkov, io-uring

On 6/19/22 8:52 AM, Pavel Begunkov wrote:
> On 6/19/22 14:31, Jens Axboe wrote:
>> On 6/19/22 5:26 AM, Pavel Begunkov wrote:
>>> It's not clear how widely used IOSQE_CQE_SKIP_SUCCESS is, and how often
>>> ->flush_cqes flag prevents from completion being flushed. Sometimes it's
>>> high level of concurrency that enables it at least for one CQE, but
>>> sometimes it doesn't save much because nobody waiting on the CQ.
>>>
>>> Remove ->flush_cqes flag and the optimisation, it should benefit the
>>> normal use case. Note, that there is no spurious eventfd problem with
>>> that as checks for spuriousness were incorporated into
>>> io_eventfd_signal().
>>
>> Would be note to quantify, which should be pretty easy. Eg run a nop
>> workload, then run the same but with CQE_SKIP_SUCCESS set. That'd take
>> it to the extreme, and I do think it'd be nice to have an understanding
>> of how big the gap could potentially be.
>>
>> With luck, it doesn't really matter. Always nice to kill stuff like
>> this, if it isn't that impactful.
> 
> Trying without this patch nops32 (submit 32 nops, complete all, repeat).
> 
> 1) all CQE_SKIP:
>     ~51 Mreqs/s
> 2) all CQE_SKIP but last, so it triggers locking + *ev_posted()
>     ~49 Mreq/s
> 3) same as 2) but another task waits on CQ (so we call wake_up_all)
>     ~36 Mreq/s
> 
> And that's more or less expected. What is more interesting for me
> is how often for those using CQE_SKIP it helps to avoid this
> ev_posted()/etc. They obviously can't just mark all requests
> with it, and most probably helping only some quite niche cases.

That's not too bad. But I think we disagree on CQE_SKIP being niche;
there are several standard cases where it makes sense. Provide buffers
is one, though we have a better solution for that one now. But also e.g.
OP_CLOSE is something that I'd personally always use CQE_SKIP with.

Hence I don't think it's fair or reasonable to call it "quite niche" in
terms of general usability.

But if this helps in terms of SINGLE_ISSUER, then I think it's worth it
as we'll likely see more broad appeal from that.

-- 
Jens Axboe



* Re: [PATCH for-next 0/7] cqe posting cleanups
  2022-06-19 11:26 [PATCH for-next 0/7] cqe posting cleanups Pavel Begunkov
                   ` (7 preceding siblings ...)
  2022-06-19 12:36 ` [PATCH for-next 0/7] cqe posting cleanups Pavel Begunkov
@ 2022-06-19 16:01 ` Jens Axboe
  8 siblings, 0 replies; 24+ messages in thread
From: Jens Axboe @ 2022-06-19 16:01 UTC (permalink / raw)
  To: asml.silence, io-uring

On Sun, 19 Jun 2022 12:26:03 +0100, Pavel Begunkov wrote:
> Apart from this patches removing some implicit assumptions, which we
> had problems with before, and making code cleaner, they and especially
> 6-7 are also needed to push for synchronisation optimisations later, lile
> [1] or removing spinlocking with SINGLE_ISSUER.
> 
> The downside is that we add additional lock/unlock into eventfd path,
> but I don't think we care about it.
> 
> [...]

Applied, thanks!

[1/7] io_uring: remove extra io_commit_cqring()
      commit: 7b303f5b95b660088f9cdb28f4ee601ccb68865b
[2/7] io_uring: reshuffle io_uring/io_uring.h
      commit: a41959b3fc45a02cb198567c1999b1840082e25a
[3/7] io_uring: move io_eventfd_signal()
      commit: 7687834cfe602a7cd71c702e91865745194e9111
[4/7] io_uring: hide eventfd assumptions in evenfd paths
      commit: 86baeb81befdfe85ee024ba57a376af5fb956678
[5/7] io_uring: remove ->flush_cqes optimisation
      commit: 812c7f7f73fdb2d5cb870e3bcf042a1c0ad03273
[6/7] io_uring: introduce locking helpers for CQE posting
      commit: d74ebb4263668909c697688604096c468b04cacd
[7/7] io_uring: add io_commit_cqring_flush()
      commit: 07ab94ca3e3123fa39e498794ff59cb07fecafad

Best regards,
-- 
Jens Axboe




* Re: [PATCH for-next 5/7] io_uring: remove ->flush_cqes optimisation
  2022-06-19 15:52       ` Jens Axboe
@ 2022-06-19 16:15         ` Pavel Begunkov
  2022-06-19 16:17           ` Jens Axboe
  0 siblings, 1 reply; 24+ messages in thread
From: Pavel Begunkov @ 2022-06-19 16:15 UTC (permalink / raw)
  To: Jens Axboe, io-uring

On 6/19/22 16:52, Jens Axboe wrote:
> On 6/19/22 8:52 AM, Pavel Begunkov wrote:
>> On 6/19/22 14:31, Jens Axboe wrote:
>>> On 6/19/22 5:26 AM, Pavel Begunkov wrote:
>>>> It's not clear how widely used IOSQE_CQE_SKIP_SUCCESS is, and how often
>>>> ->flush_cqes flag prevents from completion being flushed. Sometimes it's
>>>> high level of concurrency that enables it at least for one CQE, but
>>>> sometimes it doesn't save much because nobody waiting on the CQ.
>>>>
>>>> Remove ->flush_cqes flag and the optimisation, it should benefit the
>>>> normal use case. Note, that there is no spurious eventfd problem with
>>>> that as checks for spuriousness were incorporated into
>>>> io_eventfd_signal().
>>>
>>> Would be note to quantify, which should be pretty easy. Eg run a nop
>>> workload, then run the same but with CQE_SKIP_SUCCESS set. That'd take
>>> it to the extreme, and I do think it'd be nice to have an understanding
>>> of how big the gap could potentially be.
>>>
>>> With luck, it doesn't really matter. Always nice to kill stuff like
>>> this, if it isn't that impactful.
>>
>> Trying without this patch nops32 (submit 32 nops, complete all, repeat).
>>
>> 1) all CQE_SKIP:
>>      ~51 Mreqs/s
>> 2) all CQE_SKIP but last, so it triggers locking + *ev_posted()
>>      ~49 Mreq/s
>> 3) same as 2) but another task waits on CQ (so we call wake_up_all)
>>      ~36 Mreq/s
>>
>> And that's more or less expected. What is more interesting for me
>> is how often for those using CQE_SKIP it helps to avoid this
>> ev_posted()/etc. They obviously can't just mark all requests
>> with it, and most probably helping only some quite niche cases.
> 
> That's not too bad. But I think we disagree on CQE_SKIP being niche,

I wasn't talking about CQE_SKIP but rather about cases where
->flush_cqes actually does anything. Consider that when at least
one of the requests queued for inline completion is not CQE_SKIP,
->flush_cqes is effectively disabled.

> there are several standard cases where it makes sense. Provide buffers
> is one, though that one we have a better solution for now. But also eg
> OP_CLOSE is something that I'd personally use CQE_SKIP with always.
> 
> Hence I don't think it's fair or reasonable to call it "quite niche" in
> terms of general usability.
> 
> But if this helps in terms of SINGLE_ISSUER, then I think it's worth it
> as we'll likely see more broad appeal from that.

It neither conflicts with the SINGLE_ISSUER locking optimisations
nor with the mentioned mb() optimisation. So, if there is a good
reason to leave ->flush_cqes alone, we can drop the patch.

-- 
Pavel Begunkov


* Re: [PATCH for-next 5/7] io_uring: remove ->flush_cqes optimisation
  2022-06-19 16:15         ` Pavel Begunkov
@ 2022-06-19 16:17           ` Jens Axboe
  2022-06-19 16:19             ` Pavel Begunkov
  0 siblings, 1 reply; 24+ messages in thread
From: Jens Axboe @ 2022-06-19 16:17 UTC (permalink / raw)
  To: Pavel Begunkov, io-uring

On 6/19/22 10:15 AM, Pavel Begunkov wrote:
> On 6/19/22 16:52, Jens Axboe wrote:
>> On 6/19/22 8:52 AM, Pavel Begunkov wrote:
>>> On 6/19/22 14:31, Jens Axboe wrote:
>>>> On 6/19/22 5:26 AM, Pavel Begunkov wrote:
>>>>> It's not clear how widely used IOSQE_CQE_SKIP_SUCCESS is, and how often
>>>>> ->flush_cqes flag prevents from completion being flushed. Sometimes it's
>>>>> high level of concurrency that enables it at least for one CQE, but
>>>>> sometimes it doesn't save much because nobody waiting on the CQ.
>>>>>
>>>>> Remove ->flush_cqes flag and the optimisation, it should benefit the
>>>>> normal use case. Note, that there is no spurious eventfd problem with
>>>>> that as checks for spuriousness were incorporated into
>>>>> io_eventfd_signal().
>>>>
>>>> Would be note to quantify, which should be pretty easy. Eg run a nop
>>>> workload, then run the same but with CQE_SKIP_SUCCESS set. That'd take
>>>> it to the extreme, and I do think it'd be nice to have an understanding
>>>> of how big the gap could potentially be.
>>>>
>>>> With luck, it doesn't really matter. Always nice to kill stuff like
>>>> this, if it isn't that impactful.
>>>
>>> Trying without this patch nops32 (submit 32 nops, complete all, repeat).
>>>
>>> 1) all CQE_SKIP:
>>>      ~51 Mreqs/s
>>> 2) all CQE_SKIP but last, so it triggers locking + *ev_posted()
>>>      ~49 Mreq/s
>>> 3) same as 2) but another task waits on CQ (so we call wake_up_all)
>>>      ~36 Mreq/s
>>>
>>> And that's more or less expected. What is more interesting for me
>>> is how often for those using CQE_SKIP it helps to avoid this
>>> ev_posted()/etc. They obviously can't just mark all requests
>>> with it, and most probably helping only some quite niche cases.
>>
>> That's not too bad. But I think we disagree on CQE_SKIP being niche,
> 
> I wasn't talking about CQE_SKIP but rather cases where that
> ->flush_cqes actually does anything. Consider that when at least
> one of the requests queued for inline completion is not CQE_SKIP
> ->flush_cqes is effectively disabled.
> 
>> there are several standard cases where it makes sense. Provide buffers
>> is one, though that one we have a better solution for now. But also eg
>> OP_CLOSE is something that I'd personally use CQE_SKIP with always.
>>
>> Hence I don't think it's fair or reasonable to call it "quite niche" in
>> terms of general usability.
>>
>> But if this helps in terms of SINGLE_ISSUER, then I think it's worth it
>> as we'll likely see more broad appeal from that.
> 
> It neither conflicts with the SINGLE_ISSUER locking optimisations
> nor with the meantioned mb() optimisation. So, if there is a good
> reason to leave ->flush_cqes alone we can drop the patch.

Let me flip that around - is there a good reason NOT to leave the
optimization in there then?

-- 
Jens Axboe



* Re: [PATCH for-next 5/7] io_uring: remove ->flush_cqes optimisation
  2022-06-19 16:17           ` Jens Axboe
@ 2022-06-19 16:19             ` Pavel Begunkov
  2022-06-19 16:38               ` Jens Axboe
  2022-06-19 16:38               ` Jens Axboe
  0 siblings, 2 replies; 24+ messages in thread
From: Pavel Begunkov @ 2022-06-19 16:19 UTC (permalink / raw)
  To: Jens Axboe, io-uring

On 6/19/22 17:17, Jens Axboe wrote:
> On 6/19/22 10:15 AM, Pavel Begunkov wrote:
>> On 6/19/22 16:52, Jens Axboe wrote:
>>> On 6/19/22 8:52 AM, Pavel Begunkov wrote:
>>>> On 6/19/22 14:31, Jens Axboe wrote:
>>>>> On 6/19/22 5:26 AM, Pavel Begunkov wrote:
>>>>>> It's not clear how widely used IOSQE_CQE_SKIP_SUCCESS is, and how often
>>>>>> ->flush_cqes flag prevents from completion being flushed. Sometimes it's
>>>>>> high level of concurrency that enables it at least for one CQE, but
>>>>>> sometimes it doesn't save much because nobody waiting on the CQ.
>>>>>>
>>>>>> Remove ->flush_cqes flag and the optimisation, it should benefit the
>>>>>> normal use case. Note, that there is no spurious eventfd problem with
>>>>>> that as checks for spuriousness were incorporated into
>>>>>> io_eventfd_signal().
>>>>>
>>>>> Would be note to quantify, which should be pretty easy. Eg run a nop
>>>>> workload, then run the same but with CQE_SKIP_SUCCESS set. That'd take
>>>>> it to the extreme, and I do think it'd be nice to have an understanding
>>>>> of how big the gap could potentially be.
>>>>>
>>>>> With luck, it doesn't really matter. Always nice to kill stuff like
>>>>> this, if it isn't that impactful.
>>>>
>>>> Trying without this patch nops32 (submit 32 nops, complete all, repeat).
>>>>
>>>> 1) all CQE_SKIP:
>>>>       ~51 Mreqs/s
>>>> 2) all CQE_SKIP but last, so it triggers locking + *ev_posted()
>>>>       ~49 Mreq/s
>>>> 3) same as 2) but another task waits on CQ (so we call wake_up_all)
>>>>       ~36 Mreq/s
>>>>
>>>> And that's more or less expected. What is more interesting for me
>>>> is how often for those using CQE_SKIP it helps to avoid this
>>>> ev_posted()/etc. They obviously can't just mark all requests
>>>> with it, and most probably helping only some quite niche cases.
>>>
>>> That's not too bad. But I think we disagree on CQE_SKIP being niche,
>>
>> I wasn't talking about CQE_SKIP but rather cases where that
>> ->flush_cqes actually does anything. Consider that when at least
>> one of the requests queued for inline completion is not CQE_SKIP
>> ->flush_cqes is effectively disabled.
>>
>>> there are several standard cases where it makes sense. Provide buffers
>>> is one, though that one we have a better solution for now. But also eg
>>> OP_CLOSE is something that I'd personally use CQE_SKIP with always.
>>>
>>> Hence I don't think it's fair or reasonable to call it "quite niche" in
>>> terms of general usability.
>>>
>>> But if this helps in terms of SINGLE_ISSUER, then I think it's worth it
>>> as we'll likely see more broad appeal from that.
>>
>> It neither conflicts with the SINGLE_ISSUER locking optimisations
>> nor with the meantioned mb() optimisation. So, if there is a good
>> reason to leave ->flush_cqes alone we can drop the patch.
> 
> Let me flip that around - is there a good reason NOT to leave the
> optimization in there then?

Apart from having ifs in the hot path with no understanding of whether
they help anything, no.

-- 
Pavel Begunkov


* Re: [PATCH for-next 5/7] io_uring: remove ->flush_cqes optimisation
  2022-06-19 16:19             ` Pavel Begunkov
@ 2022-06-19 16:38               ` Jens Axboe
  2022-06-19 16:38               ` Jens Axboe
  1 sibling, 0 replies; 24+ messages in thread
From: Jens Axboe @ 2022-06-19 16:38 UTC (permalink / raw)
  To: Pavel Begunkov, io-uring

On 6/19/22 10:19 AM, Pavel Begunkov wrote:
> On 6/19/22 17:17, Jens Axboe wrote:
>> On 6/19/22 10:15 AM, Pavel Begunkov wrote:
>>> On 6/19/22 16:52, Jens Axboe wrote:
>>>> On 6/19/22 8:52 AM, Pavel Begunkov wrote:
>>>>> On 6/19/22 14:31, Jens Axboe wrote:
>>>>>> On 6/19/22 5:26 AM, Pavel Begunkov wrote:
>>>>>>> It's not clear how widely used IOSQE_CQE_SKIP_SUCCESS is, and how often
>>>>>>> the ->flush_cqes flag prevents a completion flush. Sometimes a high
>>>>>>> level of concurrency enables it for at least one CQE, but sometimes it
>>>>>>> doesn't save much because nobody is waiting on the CQ.
>>>>>>>
>>>>>>> Remove the ->flush_cqes flag and the optimisation; it should benefit the
>>>>>>> normal use case. Note that there is no spurious eventfd problem with
>>>>>>> that, as checks for spuriousness were incorporated into
>>>>>>> io_eventfd_signal().
>>>>>>
>>>>>> Would be nice to quantify, which should be pretty easy. Eg run a nop
>>>>>> workload, then run the same but with CQE_SKIP_SUCCESS set. That'd take
>>>>>> it to the extreme, and I do think it'd be nice to have an understanding
>>>>>> of how big the gap could potentially be.
>>>>>>
>>>>>> With luck, it doesn't really matter. Always nice to kill stuff like
>>>>>> this, if it isn't that impactful.
>>>>>
>>>>> Trying nops32 without this patch (submit 32 nops, complete all, repeat):
>>>>>
>>>>> 1) all CQE_SKIP:
>>>>>       ~51 Mreqs/s
>>>>> 2) all CQE_SKIP but last, so it triggers locking + *ev_posted()
>>>>>       ~49 Mreq/s
>>>>> 3) same as 2) but another task waits on CQ (so we call wake_up_all)
>>>>>       ~36 Mreq/s
>>>>>
>>>>> And that's more or less expected. What is more interesting to me
>>>>> is how often, for those using CQE_SKIP, it helps to avoid this
>>>>> ev_posted()/etc. They obviously can't just mark all requests
>>>>> with it, and it most probably helps only in some quite niche cases.
>>>>
>>>> That's not too bad. But I think we disagree on CQE_SKIP being niche,
>>>
>>> I wasn't talking about CQE_SKIP but rather about cases where
>>> ->flush_cqes actually does anything. Consider that when at least
>>> one of the requests queued for inline completion is not CQE_SKIP,
>>> the ->flush_cqes optimisation is effectively disabled.
>>>
>>>> there are several standard cases where it makes sense. Provide buffers
>>>> is one, though we now have a better solution for that one. But also eg
>>>> OP_CLOSE is something that I'd personally always use CQE_SKIP with.
>>>>
>>>> Hence I don't think it's fair or reasonable to call it "quite niche" in
>>>> terms of general usability.
>>>>
>>>> But if this helps in terms of SINGLE_ISSUER, then I think it's worth it
>>>> as we'll likely see more broad appeal from that.
>>>
>>> It neither conflicts with the SINGLE_ISSUER locking optimisations
>>> nor with the mentioned mb() optimisation. So, if there is a good
>>> reason to leave ->flush_cqes alone, we can drop the patch.
>>
>> Let me flip that around - is there a good reason NOT to leave the
>> optimization in there then?
> 
> Apart from having ifs in the hot path with no understanding of
> whether they help anything, no.

Let's just keep the patch. Ratio of skip to non-skip should still be
very tiny.
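
For anyone following along, the userspace side of CQE_SKIP is just a
per-SQE flag; with liburing it looks roughly like this (illustrative
sketch, assuming an already initialised 'ring' and an open 'fd'):

/* 'ring' and 'fd' are assumed to be set up elsewhere */
struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);

/* close the fd, but don't post a CQE if the close succeeds */
io_uring_prep_close(sqe, fd);
io_uring_sqe_set_flags(sqe, IOSQE_CQE_SKIP_SUCCESS);
io_uring_submit(&ring);
/* only a failed close will generate a CQE for this request */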

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH for-next 4/7] io_uring: hide eventfd assumptions in evenfd paths
  2022-06-19 11:26 ` [PATCH for-next 4/7] io_uring: hide eventfd assumptions in evenfd paths Pavel Begunkov
@ 2022-06-19 18:18   ` Jens Axboe
  2022-06-19 18:49     ` Pavel Begunkov
  0 siblings, 1 reply; 24+ messages in thread
From: Jens Axboe @ 2022-06-19 18:18 UTC (permalink / raw)
  To: Pavel Begunkov; +Cc: io-uring, Pavel Begunkov (Silence)

On Sun, Jun 19, 2022 at 5:26 AM Pavel Begunkov <asml.silence@gmail.com> wrote:
>
> Some io_uring-eventfd users assume that there won't be spurious wakeups.
> That assumption has to be honoured by all io_cqring_ev_posted() callers,
> which is inconvenient and from time to time leads to problems, but it
> should be maintained so as not to break userspace.
>
> Instead of making the callers track whether a CQE was posted or not,
> hide it inside io_eventfd_signal(). It saves the ->cached_cq_tail it saw
> last time and triggers the eventfd only when ->cached_cq_tail has
> changed since then.

This one is causing frequent errors with poll-cancel.t:

axboe@m1pro-kvm ~/g/liburing (master)> test/poll-cancel.t
axboe@m1pro-kvm ~/g/liburing (master)> test/poll-cancel.t
Timed out!
axboe@m1pro-kvm ~/g/liburing (master) [1]> test/poll-cancel.t
Timed out!
axboe@m1pro-kvm ~/g/liburing (master) [1]> test/poll-cancel.t
Timed out!

I've dropped this one, and 6-7/7 as they then also throw a bunch of
rejects.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH for-next 4/7] io_uring: hide eventfd assumptions in evenfd paths
  2022-06-19 18:18   ` Jens Axboe
@ 2022-06-19 18:49     ` Pavel Begunkov
  2022-06-19 18:58       ` Jens Axboe
  0 siblings, 1 reply; 24+ messages in thread
From: Pavel Begunkov @ 2022-06-19 18:49 UTC (permalink / raw)
  To: Jens Axboe; +Cc: io-uring

On 6/19/22 19:18, Jens Axboe wrote:
> On Sun, Jun 19, 2022 at 5:26 AM Pavel Begunkov <asml.silence@gmail.com> wrote:
>>
>> Some io_uring-eventfd users assume that there won't be spurious wakeups.
>> That assumption has to be honoured by all io_cqring_ev_posted() callers,
>> which is inconvenient and from time to time leads to problems, but it
>> should be maintained so as not to break userspace.
>>
>> Instead of making the callers track whether a CQE was posted or not,
>> hide it inside io_eventfd_signal(). It saves the ->cached_cq_tail it saw
>> last time and triggers the eventfd only when ->cached_cq_tail has
>> changed since then.
> 
> This one is causing frequent errors with poll-cancel.t:
> 
> axboe@m1pro-kvm ~/g/liburing (master)> test/poll-cancel.t
> axboe@m1pro-kvm ~/g/liburing (master)> test/poll-cancel.t
> Timed out!
> axboe@m1pro-kvm ~/g/liburing (master) [1]> test/poll-cancel.t
> Timed out!
> axboe@m1pro-kvm ~/g/liburing (master) [1]> test/poll-cancel.t
> Timed out!
> 
> I've dropped this one, and 6-7/7 as they then also throw a bunch of
> rejects.

I mentioned it in the cover letter: the extra wakeups slow task-exit
cancellations down, which makes the test hit its timeout (increasing
the alarm time helps). The problem is in the cancellation path, in
particular because for some reason it spins on the cancellation
(including the timeout and poll remove-all passes).

I'll look into it, but it's more "discovered by accident" than
caused by the patch.
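
For context, the eventfd part of the patch boils down to a check of
roughly this shape (simplified sketch; the cached-tail field name is
illustrative):

/* sketch: inside io_eventfd_signal(), before ringing the eventfd;
 * evfd_last_cq_tail is an illustrative name for the cached tail
 */
if (ctx->evfd_last_cq_tail == ctx->cached_cq_tail)
        return;         /* no new CQEs since the last signal */
ctx->evfd_last_cq_tail = ctx->cached_cq_tail;
eventfd_signal(ev_fd->cq_ev_fd, 1);

i.e. the eventfd is only signalled when the CQ tail has actually moved
since the last signal, which is what lets the callers stop tracking
whether a CQE was posted.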

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH for-next 4/7] io_uring: hide eventfd assumptions in evenfd paths
  2022-06-19 18:49     ` Pavel Begunkov
@ 2022-06-19 18:58       ` Jens Axboe
  0 siblings, 0 replies; 24+ messages in thread
From: Jens Axboe @ 2022-06-19 18:58 UTC (permalink / raw)
  To: Pavel Begunkov; +Cc: io-uring

On 6/19/22 12:49 PM, Pavel Begunkov wrote:
> On 6/19/22 19:18, Jens Axboe wrote:
>> On Sun, Jun 19, 2022 at 5:26 AM Pavel Begunkov <asml.silence@gmail.com> wrote:
>>>
>>> Some io_uring-eventfd users assume that there won't be spurious wakeups.
>>> That assumption has to be honoured by all io_cqring_ev_posted() callers,
>>> which is inconvenient and from time to time leads to problems, but it
>>> should be maintained so as not to break userspace.
>>>
>>> Instead of making the callers track whether a CQE was posted or not,
>>> hide it inside io_eventfd_signal(). It saves the ->cached_cq_tail it saw
>>> last time and triggers the eventfd only when ->cached_cq_tail has
>>> changed since then.
>>
>> This one is causing frequent errors with poll-cancel.t:
>>
>> axboe@m1pro-kvm ~/g/liburing (master)> test/poll-cancel.t
>> axboe@m1pro-kvm ~/g/liburing (master)> test/poll-cancel.t
>> Timed out!
>> axboe@m1pro-kvm ~/g/liburing (master) [1]> test/poll-cancel.t
>> Timed out!
>> axboe@m1pro-kvm ~/g/liburing (master) [1]> test/poll-cancel.t
>> Timed out!
>>
>> I've dropped this one, and 6-7/7 as they then also throw a bunch of
>> rejects.
> 
> I mentioned it in the cover letter: the extra wakeups slow task-exit
> cancellations down, which makes the test hit its timeout (increasing
> the alarm time helps). The problem is in the cancellation path, in
> particular because for some reason it spins on the cancellation
> (including the timeout and poll remove-all passes).

But that's not really usable at all. Before the patch, the test case
takes 1-2ms, predictably. After the patch, I see lots of 10.0s runs.
That's clearly not going to work.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2022-06-19 18:59 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-06-19 11:26 [PATCH for-next 0/7] cqe posting cleanups Pavel Begunkov
2022-06-19 11:26 ` [PATCH for-next 1/7] io_uring: remove extra io_commit_cqring() Pavel Begunkov
2022-06-19 11:26 ` [PATCH for-next 2/7] io_uring: reshuffle io_uring/io_uring.h Pavel Begunkov
2022-06-19 11:26 ` [PATCH for-next 3/7] io_uring: move io_eventfd_signal() Pavel Begunkov
2022-06-19 11:26 ` [PATCH for-next 4/7] io_uring: hide eventfd assumptions in evenfd paths Pavel Begunkov
2022-06-19 18:18   ` Jens Axboe
2022-06-19 18:49     ` Pavel Begunkov
2022-06-19 18:58       ` Jens Axboe
2022-06-19 11:26 ` [PATCH for-next 5/7] io_uring: remove ->flush_cqes optimisation Pavel Begunkov
2022-06-19 13:31   ` Jens Axboe
2022-06-19 14:52     ` Pavel Begunkov
2022-06-19 15:52       ` Jens Axboe
2022-06-19 16:15         ` Pavel Begunkov
2022-06-19 16:17           ` Jens Axboe
2022-06-19 16:19             ` Pavel Begunkov
2022-06-19 16:38               ` Jens Axboe
2022-06-19 11:26 ` [PATCH for-next 6/7] io_uring: introduce locking helpers for CQE posting Pavel Begunkov
2022-06-19 13:30   ` Jens Axboe
2022-06-19 14:20     ` Pavel Begunkov
2022-06-19 15:50       ` Jens Axboe
2022-06-19 11:26 ` [PATCH for-next 7/7] io_uring: add io_commit_cqring_flush() Pavel Begunkov
2022-06-19 12:36 ` [PATCH for-next 0/7] cqe posting cleanups Pavel Begunkov
2022-06-19 16:01 ` Jens Axboe
