* [PATCH for-next v2 00/25] 5.20 cleanups and poll optimisations
@ 2022-06-14 14:36 Pavel Begunkov
  2022-06-14 14:36 ` [PATCH for-next v2 01/25] io_uring: make reg buf init consistent Pavel Begunkov
                   ` (24 more replies)
  0 siblings, 25 replies; 44+ messages in thread
From: Pavel Begunkov @ 2022-06-14 14:36 UTC (permalink / raw)
  To: io-uring; +Cc: Jens Axboe, asml.silence

Patches 1-13 are cleanups after splitting io_uring into files.

Patch 14 from Hao should remove some overhead from poll requests.

Patch 15 from Hao adds per-bucket spinlocks, and 16-19 do a little
bit of cleanup. The downside of per-bucket spinlocks is that they add
an extra spin lock/unlock pair on the poll request completion side,
which shouldn't matter much with 20/25.

Patch 20 uses the inline completion infra for poll requests, which
nicely improves perf when there is good tw batching.

Patch 21 implements the userspace-visible side of
IORING_SETUP_SINGLE_ISSUER; it'll be used for poll requests and
later for spinlock optimisations.
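
For reference, a minimal sketch of how an application could opt in,
assuming liburing (the ring size and error handling here are
illustrative, not part of the series):

  #include <liburing.h>

  static int setup_single_issuer_ring(struct io_uring *ring)
  {
          struct io_uring_params p = { };

          /* promise that only this task will ever submit requests */
          p.flags = IORING_SETUP_SINGLE_ISSUER;
          return io_uring_queue_init_params(256, ring, &p);
  }

Kernels that don't know the flag reject the setup, so falling back to
a plain ring when this fails is a reasonable pattern.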

22-25 introduce ->uring_lock protected cancellation hashing. It
requires us to grab ->uring_lock on the completion side but saves
two spin lock/unlock pairs. We apply it automatically in cases where
the mutex is already likely to be held (see the 25/25 description), so
there is no additional mutex overhead or potential latency problem.
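
The gist of the idea, as a hypothetical sketch rather than the actual
patch (REQ_F_HASH_LOCKED and bucket_for() are made-up names for
illustration): requests hashed while ->uring_lock is held go into a
mutex-protected table, so removing them on completion reuses the mutex
instead of a bucket spinlock.

  static void poll_hash_del(struct io_kiocb *req)
  {
          struct io_ring_ctx *ctx = req->ctx;

          if (req->flags & REQ_F_HASH_LOCKED) {
                  /* hashed under ->uring_lock, which tw often holds already */
                  mutex_lock(&ctx->uring_lock);
                  hash_del(&req->hash_node);
                  mutex_unlock(&ctx->uring_lock);
          } else {
                  struct io_hash_bucket *hb = bucket_for(ctx, req);

                  spin_lock(&hb->lock);
                  hash_del(&req->hash_node);
                  spin_unlock(&hb->lock);
          }
  }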


Numbers:

The poll benchmark used here queues a batch of 32 POLLIN poll requests
per iteration and triggers all of them with a read (+write).
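
Roughly, one iteration of such a benchmark has the following shape
(a liburing-based approximation, not the exact tool; QD and the
pipe-style fd pairs are assumptions):

  #include <liburing.h>
  #include <poll.h>
  #include <unistd.h>

  #define QD 32

  /* fds[i][0] is polled for POLLIN, writing to fds[i][1] triggers it */
  static void bench_iteration(struct io_uring *ring, int fds[QD][2])
  {
          struct io_uring_cqe *cqe;
          int i;

          for (i = 0; i < QD; i++) {
                  struct io_uring_sqe *sqe = io_uring_get_sqe(ring);

                  io_uring_prep_poll_add(sqe, fds[i][0], POLLIN);
                  sqe->user_data = i;
          }
          io_uring_submit(ring);

          /* make every polled fd readable so all 32 CQEs fire */
          for (i = 0; i < QD; i++)
                  write(fds[i][1], "x", 1);

          for (i = 0; i < QD; i++) {
                  io_uring_wait_cqe(ring, &cqe);
                  io_uring_cqe_seen(ring, cqe);
          }
  }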

baseline (patches 1-18):
    11720 K req/s
base + 19 (+ inline completion infra):
    12419 K req/s, ~+6%
base + 19-25 (+ uring_lock hashing):
    12804 K req/s, +9.2% from the baseline, or +3.2% relative to patch 19.

Note that patch 19 only helps performance of poll-add requests, whereas
25/25 also improves apoll.

v2:
  don't move ->cancel_seq out of iowq work struct
  fix up single-issuer

Hao Xu (2):
  io_uring: poll: remove unnecessary req->ref set
  io_uring: switch cancel_hash to use per entry spinlock

Pavel Begunkov (23):
  io_uring: make reg buf init consistent
  io_uring: move defer_list to slow data
  io_uring: better caching for ctx timeout fields
  io_uring: refactor ctx slow data placement
  io_uring: move small helpers to headers
  io_uring: explain io_wq_work::cancel_seq placement
  io_uring: inline ->registered_rings
  io_uring: don't set REQ_F_COMPLETE_INLINE in tw
  io_uring: never defer-complete multi-apoll
  io_uring: kill REQ_F_COMPLETE_INLINE
  io_uring: refactor io_req_task_complete()
  io_uring: don't inline io_put_kbuf
  io_uring: remove check_cq checking from hot paths
  io_uring: pass poll_find lock back
  io_uring: clean up io_try_cancel
  io_uring: limit number hash buckets
  io_uring: clean up io_ring_ctx_alloc
  io_uring: use state completion infra for poll reqs
  io_uring: add IORING_SETUP_SINGLE_ISSUER
  io_uring: pass hash table into poll_find
  io_uring: introduce a struct for hash table
  io_uring: propagate locking state to poll cancel
  io_uring: mutex locked poll hashing

 include/uapi/linux/io_uring.h |   5 +-
 io_uring/cancel.c             |  23 +++-
 io_uring/cancel.h             |   4 +-
 io_uring/fdinfo.c             |  11 +-
 io_uring/io-wq.h              |   1 +
 io_uring/io_uring.c           | 145 +++++++++++-----------
 io_uring/io_uring.h           |  17 +++
 io_uring/io_uring_types.h     | 108 +++++++++--------
 io_uring/kbuf.c               |  33 +++++
 io_uring/kbuf.h               |  38 +-----
 io_uring/poll.c               | 219 +++++++++++++++++++++++++---------
 io_uring/poll.h               |   3 +-
 io_uring/rsrc.c               |   9 +-
 io_uring/tctx.c               |  36 ++++--
 io_uring/tctx.h               |   7 +-
 io_uring/timeout.c            |   3 +-
 16 files changed, 416 insertions(+), 246 deletions(-)

-- 
2.36.1



* [PATCH for-next v2 01/25] io_uring: make reg buf init consistent
  2022-06-14 14:36 [PATCH for-next v2 00/25] 5.20 cleanups and poll optimisations Pavel Begunkov
@ 2022-06-14 14:36 ` Pavel Begunkov
  2022-06-14 14:36 ` [PATCH for-next v2 02/25] io_uring: move defer_list to slow data Pavel Begunkov
                   ` (23 subsequent siblings)
  24 siblings, 0 replies; 44+ messages in thread
From: Pavel Begunkov @ 2022-06-14 14:36 UTC (permalink / raw)
  To: io-uring; +Cc: Jens Axboe, asml.silence

The default (i.e. empty) state of a registered buffer is dummy_ubuf,
so set it to dummy_ubuf on init instead of NULL.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 io_uring/rsrc.c | 9 +++------
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/io_uring/rsrc.c b/io_uring/rsrc.c
index fef46972c327..fd1323482030 100644
--- a/io_uring/rsrc.c
+++ b/io_uring/rsrc.c
@@ -567,7 +567,7 @@ static int __io_sqe_buffers_update(struct io_ring_ctx *ctx,
 				io_buffer_unmap(ctx, &imu);
 				break;
 			}
-			ctx->user_bufs[i] = NULL;
+			ctx->user_bufs[i] = ctx->dummy_ubuf;
 			needs_switch = true;
 		}
 
@@ -1200,14 +1200,11 @@ static int io_sqe_buffer_register(struct io_ring_ctx *ctx, struct iovec *iov,
 	size_t size;
 	int ret, nr_pages, i;
 
-	if (!iov->iov_base) {
-		*pimu = ctx->dummy_ubuf;
+	*pimu = ctx->dummy_ubuf;
+	if (!iov->iov_base)
 		return 0;
-	}
 
-	*pimu = NULL;
 	ret = -ENOMEM;
-
 	pages = io_pin_pages((unsigned long) iov->iov_base, iov->iov_len,
 				&nr_pages);
 	if (IS_ERR(pages)) {
-- 
2.36.1



* [PATCH for-next v2 02/25] io_uring: move defer_list to slow data
  2022-06-14 14:36 [PATCH for-next v2 00/25] 5.20 cleanups and poll optimisations Pavel Begunkov
  2022-06-14 14:36 ` [PATCH for-next v2 01/25] io_uring: make reg buf init consistent Pavel Begunkov
@ 2022-06-14 14:36 ` Pavel Begunkov
  2022-06-14 14:36 ` [PATCH for-next v2 03/25] io_uring: better caching for ctx timeout fields Pavel Begunkov
                   ` (22 subsequent siblings)
  24 siblings, 0 replies; 44+ messages in thread
From: Pavel Begunkov @ 2022-06-14 14:36 UTC (permalink / raw)
  To: io-uring; +Cc: Jens Axboe, asml.silence

Draining is a slow path; move defer_list to the end of the context,
where the slow data lives.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 io_uring/io_uring_types.h | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/io_uring/io_uring_types.h b/io_uring/io_uring_types.h
index 7c22cf35a7e2..52e91c3df8d5 100644
--- a/io_uring/io_uring_types.h
+++ b/io_uring/io_uring_types.h
@@ -160,7 +160,6 @@ struct io_ring_ctx {
 		struct io_uring_sqe	*sq_sqes;
 		unsigned		cached_sq_head;
 		unsigned		sq_entries;
-		struct list_head	defer_list;
 
 		/*
 		 * Fixed resources fast path, should be accessed only under
@@ -272,8 +271,12 @@ struct io_ring_ctx {
 		struct work_struct		exit_work;
 		struct list_head		tctx_list;
 		struct completion		ref_comp;
+
+		/* io-wq management, e.g. thread count */
 		u32				iowq_limits[2];
 		bool				iowq_limits_set;
+
+		struct list_head		defer_list;
 	};
 };
 
-- 
2.36.1



* [PATCH for-next v2 03/25] io_uring: better caching for ctx timeout fields
  2022-06-14 14:36 [PATCH for-next v2 00/25] 5.20 cleanups and poll optimisations Pavel Begunkov
  2022-06-14 14:36 ` [PATCH for-next v2 01/25] io_uring: make reg buf init consistent Pavel Begunkov
  2022-06-14 14:36 ` [PATCH for-next v2 02/25] io_uring: move defer_list to slow data Pavel Begunkov
@ 2022-06-14 14:36 ` Pavel Begunkov
  2022-06-14 14:36 ` [PATCH for-next v2 04/25] io_uring: refactor ctx slow data placement Pavel Begunkov
                   ` (21 subsequent siblings)
  24 siblings, 0 replies; 44+ messages in thread
From: Pavel Begunkov @ 2022-06-14 14:36 UTC (permalink / raw)
  To: io-uring; +Cc: Jens Axboe, asml.silence

Following the timeout fields' access patterns, move all of them into a
separate cache line inside ctx, so they don't interfere with normal
completion caching, especially since timeout removals and completions
are separated and the latter is done via tw.

It also sheds some bytes from io_ring_ctx, 1216B -> 1152B.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 io_uring/io_uring_types.h | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/io_uring/io_uring_types.h b/io_uring/io_uring_types.h
index 52e91c3df8d5..4f52dcbbda56 100644
--- a/io_uring/io_uring_types.h
+++ b/io_uring/io_uring_types.h
@@ -179,8 +179,6 @@ struct io_ring_ctx {
 		struct xarray		io_bl_xa;
 		struct list_head	io_buffers_cache;
 
-		struct list_head	timeout_list;
-		struct list_head	ltimeout_list;
 		struct list_head	cq_overflow_list;
 		struct list_head	apoll_cache;
 		struct xarray		personalities;
@@ -213,15 +211,11 @@ struct io_ring_ctx {
 		struct io_ev_fd	__rcu	*io_ev_fd;
 		struct wait_queue_head	cq_wait;
 		unsigned		cq_extra;
-		atomic_t		cq_timeouts;
-		unsigned		cq_last_tm_flush;
 	} ____cacheline_aligned_in_smp;
 
 	struct {
 		spinlock_t		completion_lock;
 
-		spinlock_t		timeout_lock;
-
 		/*
 		 * ->iopoll_list is protected by the ctx->uring_lock for
 		 * io_uring instances that don't use IORING_SETUP_SQPOLL.
@@ -253,6 +247,15 @@ struct io_ring_ctx {
 		struct list_head	io_buffers_pages;
 	};
 
+	/* timeouts */
+	struct {
+		spinlock_t		timeout_lock;
+		atomic_t		cq_timeouts;
+		struct list_head	timeout_list;
+		struct list_head	ltimeout_list;
+		unsigned		cq_last_tm_flush;
+	} ____cacheline_aligned_in_smp;
+
 	/* Keep this last, we don't need it for the fast path */
 	struct {
 		#if defined(CONFIG_UNIX)
-- 
2.36.1



* [PATCH for-next v2 04/25] io_uring: refactor ctx slow data placement
  2022-06-14 14:36 [PATCH for-next v2 00/25] 5.20 cleanups and poll optimisations Pavel Begunkov
                   ` (2 preceding siblings ...)
  2022-06-14 14:36 ` [PATCH for-next v2 03/25] io_uring: better caching for ctx timeout fields Pavel Begunkov
@ 2022-06-14 14:36 ` Pavel Begunkov
  2022-06-15  7:58   ` Hao Xu
  2022-06-14 14:36 ` [PATCH for-next v2 05/25] io_uring: move small helpers to headers Pavel Begunkov
                   ` (20 subsequent siblings)
  24 siblings, 1 reply; 44+ messages in thread
From: Pavel Begunkov @ 2022-06-14 14:36 UTC (permalink / raw)
  To: io-uring; +Cc: Jens Axboe, asml.silence

Shove all slow path data to the end of ctx and get rid of the extra
indentation.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 io_uring/io_uring_types.h | 81 +++++++++++++++++++--------------------
 1 file changed, 39 insertions(+), 42 deletions(-)

diff --git a/io_uring/io_uring_types.h b/io_uring/io_uring_types.h
index 4f52dcbbda56..ca8e25992ece 100644
--- a/io_uring/io_uring_types.h
+++ b/io_uring/io_uring_types.h
@@ -183,7 +183,6 @@ struct io_ring_ctx {
 		struct list_head	apoll_cache;
 		struct xarray		personalities;
 		u32			pers_next;
-		unsigned		sq_thread_idle;
 	} ____cacheline_aligned_in_smp;
 
 	/* IRQ completion list, under ->completion_lock */
@@ -230,23 +229,6 @@ struct io_ring_ctx {
 		struct list_head	io_buffers_comp;
 	} ____cacheline_aligned_in_smp;
 
-	struct io_restriction		restrictions;
-
-	/* slow path rsrc auxilary data, used by update/register */
-	struct {
-		struct io_rsrc_node		*rsrc_backup_node;
-		struct io_mapped_ubuf		*dummy_ubuf;
-		struct io_rsrc_data		*file_data;
-		struct io_rsrc_data		*buf_data;
-
-		struct delayed_work		rsrc_put_work;
-		struct llist_head		rsrc_put_llist;
-		struct list_head		rsrc_ref_list;
-		spinlock_t			rsrc_ref_lock;
-
-		struct list_head	io_buffers_pages;
-	};
-
 	/* timeouts */
 	struct {
 		spinlock_t		timeout_lock;
@@ -257,30 +239,45 @@ struct io_ring_ctx {
 	} ____cacheline_aligned_in_smp;
 
 	/* Keep this last, we don't need it for the fast path */
-	struct {
-		#if defined(CONFIG_UNIX)
-			struct socket		*ring_sock;
-		#endif
-		/* hashed buffered write serialization */
-		struct io_wq_hash		*hash_map;
-
-		/* Only used for accounting purposes */
-		struct user_struct		*user;
-		struct mm_struct		*mm_account;
-
-		/* ctx exit and cancelation */
-		struct llist_head		fallback_llist;
-		struct delayed_work		fallback_work;
-		struct work_struct		exit_work;
-		struct list_head		tctx_list;
-		struct completion		ref_comp;
-
-		/* io-wq management, e.g. thread count */
-		u32				iowq_limits[2];
-		bool				iowq_limits_set;
-
-		struct list_head		defer_list;
-	};
+
+	struct io_restriction		restrictions;
+
+	/* slow path rsrc auxilary data, used by update/register */
+	struct io_rsrc_node		*rsrc_backup_node;
+	struct io_mapped_ubuf		*dummy_ubuf;
+	struct io_rsrc_data		*file_data;
+	struct io_rsrc_data		*buf_data;
+
+	struct delayed_work		rsrc_put_work;
+	struct llist_head		rsrc_put_llist;
+	struct list_head		rsrc_ref_list;
+	spinlock_t			rsrc_ref_lock;
+
+	struct list_head		io_buffers_pages;
+
+	#if defined(CONFIG_UNIX)
+		struct socket		*ring_sock;
+	#endif
+	/* hashed buffered write serialization */
+	struct io_wq_hash		*hash_map;
+
+	/* Only used for accounting purposes */
+	struct user_struct		*user;
+	struct mm_struct		*mm_account;
+
+	/* ctx exit and cancelation */
+	struct llist_head		fallback_llist;
+	struct delayed_work		fallback_work;
+	struct work_struct		exit_work;
+	struct list_head		tctx_list;
+	struct completion		ref_comp;
+
+	/* io-wq management, e.g. thread count */
+	u32				iowq_limits[2];
+	bool				iowq_limits_set;
+
+	struct list_head		defer_list;
+	unsigned			sq_thread_idle;
 };
 
 enum {
-- 
2.36.1



* [PATCH for-next v2 05/25] io_uring: move small helpers to headers
  2022-06-14 14:36 [PATCH for-next v2 00/25] 5.20 cleanups and poll optimisations Pavel Begunkov
                   ` (3 preceding siblings ...)
  2022-06-14 14:36 ` [PATCH for-next v2 04/25] io_uring: refactor ctx slow data placement Pavel Begunkov
@ 2022-06-14 14:36 ` Pavel Begunkov
  2022-06-14 14:36 ` [PATCH for-next v2 06/25] io_uring: explain io_wq_work::cancel_seq placement Pavel Begunkov
                   ` (19 subsequent siblings)
  24 siblings, 0 replies; 44+ messages in thread
From: Pavel Begunkov @ 2022-06-14 14:36 UTC (permalink / raw)
  To: io-uring; +Cc: Jens Axboe, asml.silence

There is a bunch of inline helpers that will be useful not only to the
core of io_uring; move them to headers.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 io_uring/io_uring.c | 22 ----------------------
 io_uring/io_uring.h | 22 ++++++++++++++++++++++
 2 files changed, 22 insertions(+), 22 deletions(-)

diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 6a94d1682aaf..3fdb368820c9 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -616,14 +616,6 @@ struct sock *io_uring_get_socket(struct file *file)
 }
 EXPORT_SYMBOL(io_uring_get_socket);
 
-static inline void io_tw_lock(struct io_ring_ctx *ctx, bool *locked)
-{
-	if (!*locked) {
-		mutex_lock(&ctx->uring_lock);
-		*locked = true;
-	}
-}
-
 static inline void io_submit_flush_completions(struct io_ring_ctx *ctx)
 {
 	if (!wq_list_empty(&ctx->submit_state.compl_reqs))
@@ -879,15 +871,6 @@ static void io_prep_async_link(struct io_kiocb *req)
 	}
 }
 
-static inline void io_req_add_compl_list(struct io_kiocb *req)
-{
-	struct io_submit_state *state = &req->ctx->submit_state;
-
-	if (!(req->flags & REQ_F_CQE_SKIP))
-		state->flush_cqes = true;
-	wq_list_add_tail(&req->comp_list, &state->compl_reqs);
-}
-
 void io_queue_iowq(struct io_kiocb *req, bool *dont_use)
 {
 	struct io_kiocb *link = io_prep_linked_timeout(req);
@@ -1293,11 +1276,6 @@ static void io_req_complete_post32(struct io_kiocb *req, u64 extra1, u64 extra2)
 	io_cqring_ev_posted(ctx);
 }
 
-static inline void io_req_complete_state(struct io_kiocb *req)
-{
-	req->flags |= REQ_F_COMPLETE_INLINE;
-}
-
 inline void __io_req_complete(struct io_kiocb *req, unsigned issue_flags)
 {
 	if (issue_flags & IO_URING_F_COMPLETE_DEFER)
diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h
index 3660df80e589..26b669746d61 100644
--- a/io_uring/io_uring.h
+++ b/io_uring/io_uring.h
@@ -193,6 +193,28 @@ static inline bool io_run_task_work(void)
 	return false;
 }
 
+static inline void io_req_complete_state(struct io_kiocb *req)
+{
+	req->flags |= REQ_F_COMPLETE_INLINE;
+}
+
+static inline void io_tw_lock(struct io_ring_ctx *ctx, bool *locked)
+{
+	if (!*locked) {
+		mutex_lock(&ctx->uring_lock);
+		*locked = true;
+	}
+}
+
+static inline void io_req_add_compl_list(struct io_kiocb *req)
+{
+	struct io_submit_state *state = &req->ctx->submit_state;
+
+	if (!(req->flags & REQ_F_CQE_SKIP))
+		state->flush_cqes = true;
+	wq_list_add_tail(&req->comp_list, &state->compl_reqs);
+}
+
 int io_run_task_work_sig(void);
 void io_req_complete_failed(struct io_kiocb *req, s32 res);
 void __io_req_complete32(struct io_kiocb *req, unsigned int issue_flags,
-- 
2.36.1



* [PATCH for-next v2 06/25] io_uring: explain io_wq_work::cancel_seq placement
  2022-06-14 14:36 [PATCH for-next v2 00/25] 5.20 cleanups and poll optimisations Pavel Begunkov
                   ` (4 preceding siblings ...)
  2022-06-14 14:36 ` [PATCH for-next v2 05/25] io_uring: move small helpers to headers Pavel Begunkov
@ 2022-06-14 14:36 ` Pavel Begunkov
  2022-06-14 14:36 ` [PATCH for-next v2 07/25] io_uring: inline ->registered_rings Pavel Begunkov
                   ` (18 subsequent siblings)
  24 siblings, 0 replies; 44+ messages in thread
From: Pavel Begunkov @ 2022-06-14 14:36 UTC (permalink / raw)
  To: io-uring; +Cc: Jens Axboe, asml.silence

Add a comment on why we keep ->cancel_seq in struct io_wq_work instead
of struct io_kiocb, despite it being needed only by io_uring and not
by io-wq.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 io_uring/io-wq.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/io_uring/io-wq.h b/io_uring/io-wq.h
index ba6eee76d028..3f54ee2a8eeb 100644
--- a/io_uring/io-wq.h
+++ b/io_uring/io-wq.h
@@ -155,6 +155,7 @@ struct io_wq_work_node *wq_stack_extract(struct io_wq_work_node *stack)
 struct io_wq_work {
 	struct io_wq_work_node list;
 	unsigned flags;
+	/* place it here instead of io_kiocb as it fills padding and saves 4B */
 	int cancel_seq;
 };
 
-- 
2.36.1



* [PATCH for-next v2 07/25] io_uring: inline ->registered_rings
  2022-06-14 14:36 [PATCH for-next v2 00/25] 5.20 cleanups and poll optimisations Pavel Begunkov
                   ` (5 preceding siblings ...)
  2022-06-14 14:36 ` [PATCH for-next v2 06/25] io_uring: explain io_wq_work::cancel_seq placement Pavel Begunkov
@ 2022-06-14 14:36 ` Pavel Begunkov
  2022-06-14 14:36 ` [PATCH for-next v2 08/25] io_uring: don't set REQ_F_COMPLETE_INLINE in tw Pavel Begunkov
                   ` (17 subsequent siblings)
  24 siblings, 0 replies; 44+ messages in thread
From: Pavel Begunkov @ 2022-06-14 14:36 UTC (permalink / raw)
  To: io-uring; +Cc: Jens Axboe, asml.silence

There can be only 16 registered rings, so there is no need to allocate
an array for them separately; store it inline in tctx instead.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 io_uring/io_uring.c | 1 -
 io_uring/tctx.c     | 9 ---------
 io_uring/tctx.h     | 3 ++-
 3 files changed, 2 insertions(+), 11 deletions(-)

diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 3fdb368820c9..dbf0dbc87758 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -2840,7 +2840,6 @@ void __io_uring_free(struct task_struct *tsk)
 	WARN_ON_ONCE(tctx->io_wq);
 	WARN_ON_ONCE(tctx->cached_refs);
 
-	kfree(tctx->registered_rings);
 	percpu_counter_destroy(&tctx->inflight);
 	kfree(tctx);
 	tsk->io_uring = NULL;
diff --git a/io_uring/tctx.c b/io_uring/tctx.c
index f3262eef55d4..6adf659687f8 100644
--- a/io_uring/tctx.c
+++ b/io_uring/tctx.c
@@ -55,16 +55,8 @@ __cold int io_uring_alloc_task_context(struct task_struct *task,
 	if (unlikely(!tctx))
 		return -ENOMEM;
 
-	tctx->registered_rings = kcalloc(IO_RINGFD_REG_MAX,
-					 sizeof(struct file *), GFP_KERNEL);
-	if (unlikely(!tctx->registered_rings)) {
-		kfree(tctx);
-		return -ENOMEM;
-	}
-
 	ret = percpu_counter_init(&tctx->inflight, 0, GFP_KERNEL);
 	if (unlikely(ret)) {
-		kfree(tctx->registered_rings);
 		kfree(tctx);
 		return ret;
 	}
@@ -73,7 +65,6 @@ __cold int io_uring_alloc_task_context(struct task_struct *task,
 	if (IS_ERR(tctx->io_wq)) {
 		ret = PTR_ERR(tctx->io_wq);
 		percpu_counter_destroy(&tctx->inflight);
-		kfree(tctx->registered_rings);
 		kfree(tctx);
 		return ret;
 	}
diff --git a/io_uring/tctx.h b/io_uring/tctx.h
index f4964e40d07e..7684713e950f 100644
--- a/io_uring/tctx.h
+++ b/io_uring/tctx.h
@@ -20,8 +20,9 @@ struct io_uring_task {
 	struct io_wq_work_list	task_list;
 	struct io_wq_work_list	prio_task_list;
 	struct callback_head	task_work;
-	struct file		**registered_rings;
 	bool			task_running;
+
+	struct file		*registered_rings[IO_RINGFD_REG_MAX];
 };
 
 struct io_tctx_node {
-- 
2.36.1



* [PATCH for-next v2 08/25] io_uring: don't set REQ_F_COMPLETE_INLINE in tw
  2022-06-14 14:36 [PATCH for-next v2 00/25] 5.20 cleanups and poll optimisations Pavel Begunkov
                   ` (6 preceding siblings ...)
  2022-06-14 14:36 ` [PATCH for-next v2 07/25] io_uring: inline ->registered_rings Pavel Begunkov
@ 2022-06-14 14:36 ` Pavel Begunkov
  2022-06-14 14:36 ` [PATCH for-next v2 09/25] io_uring: never defer-complete multi-apoll Pavel Begunkov
                   ` (16 subsequent siblings)
  24 siblings, 0 replies; 44+ messages in thread
From: Pavel Begunkov @ 2022-06-14 14:36 UTC (permalink / raw)
  To: io-uring; +Cc: Jens Axboe, asml.silence

io_req_task_complete() enqueues requests for state completion itself,
so there is no need for REQ_F_COMPLETE_INLINE, which only serves the
purpose of not bloating the kernel.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 io_uring/io_uring.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index dbf0dbc87758..d895f70977b0 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -1864,7 +1864,6 @@ inline void io_req_task_complete(struct io_kiocb *req, bool *locked)
 {
 	if (*locked) {
 		req->cqe.flags |= io_put_kbuf(req, 0);
-		io_req_complete_state(req);
 		io_req_add_compl_list(req);
 	} else {
 		req->cqe.flags |= io_put_kbuf(req, IO_URING_F_UNLOCKED);
-- 
2.36.1



* [PATCH for-next v2 09/25] io_uring: never defer-complete multi-apoll
  2022-06-14 14:36 [PATCH for-next v2 00/25] 5.20 cleanups and poll optimisations Pavel Begunkov
                   ` (7 preceding siblings ...)
  2022-06-14 14:36 ` [PATCH for-next v2 08/25] io_uring: don't set REQ_F_COMPLETE_INLINE in tw Pavel Begunkov
@ 2022-06-14 14:36 ` Pavel Begunkov
  2022-06-15  8:05   ` Hao Xu
  2022-06-14 14:37 ` [PATCH for-next v2 10/25] io_uring: kill REQ_F_COMPLETE_INLINE Pavel Begunkov
                   ` (15 subsequent siblings)
  24 siblings, 1 reply; 44+ messages in thread
From: Pavel Begunkov @ 2022-06-14 14:36 UTC (permalink / raw)
  To: io-uring; +Cc: Jens Axboe, asml.silence

Luckily, nobody completes multi-apoll requests outside the polling
functions, but don't set IO_URING_F_COMPLETE_DEFER in any case, as
nobody is catching REQ_F_COMPLETE_INLINE, and so it would leak
requests if used.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 io_uring/io_uring.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index d895f70977b0..1fb93fdcfbab 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -2149,7 +2149,7 @@ int io_poll_issue(struct io_kiocb *req, bool *locked)
 	io_tw_lock(req->ctx, locked);
 	if (unlikely(req->task->flags & PF_EXITING))
 		return -EFAULT;
-	return io_issue_sqe(req, IO_URING_F_NONBLOCK|IO_URING_F_COMPLETE_DEFER);
+	return io_issue_sqe(req, IO_URING_F_NONBLOCK);
 }
 
 struct io_wq_work *io_wq_free_work(struct io_wq_work *work)
-- 
2.36.1



* [PATCH for-next v2 10/25] io_uring: kill REQ_F_COMPLETE_INLINE
  2022-06-14 14:36 [PATCH for-next v2 00/25] 5.20 cleanups and poll optimisations Pavel Begunkov
                   ` (8 preceding siblings ...)
  2022-06-14 14:36 ` [PATCH for-next v2 09/25] io_uring: never defer-complete multi-apoll Pavel Begunkov
@ 2022-06-14 14:37 ` Pavel Begunkov
  2022-06-15  8:20   ` Hao Xu
  2022-06-14 14:37 ` [PATCH for-next v2 11/25] io_uring: refactor io_req_task_complete() Pavel Begunkov
                   ` (14 subsequent siblings)
  24 siblings, 1 reply; 44+ messages in thread
From: Pavel Begunkov @ 2022-06-14 14:37 UTC (permalink / raw)
  To: io-uring; +Cc: Jens Axboe, asml.silence

REQ_F_COMPLETE_INLINE is only needed to delay queueing into the
completion list until io_queue_sqe(), as __io_req_complete() is inlined
and we don't want to bloat the kernel.

Now that we complete in a more centralised fashion in io_issue_sqe(),
we can get rid of the flag and queue to the list directly.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 io_uring/io_uring.c       | 20 ++++++++------------
 io_uring/io_uring.h       |  5 -----
 io_uring/io_uring_types.h |  3 ---
 3 files changed, 8 insertions(+), 20 deletions(-)

diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 1fb93fdcfbab..fcee58c6c35e 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -1278,17 +1278,14 @@ static void io_req_complete_post32(struct io_kiocb *req, u64 extra1, u64 extra2)
 
 inline void __io_req_complete(struct io_kiocb *req, unsigned issue_flags)
 {
-	if (issue_flags & IO_URING_F_COMPLETE_DEFER)
-		io_req_complete_state(req);
-	else
-		io_req_complete_post(req);
+	io_req_complete_post(req);
 }
 
 void __io_req_complete32(struct io_kiocb *req, unsigned int issue_flags,
 			 u64 extra1, u64 extra2)
 {
 	if (issue_flags & IO_URING_F_COMPLETE_DEFER) {
-		io_req_complete_state(req);
+		io_req_add_compl_list(req);
 		req->extra1 = extra1;
 		req->extra2 = extra2;
 	} else {
@@ -2132,9 +2129,12 @@ static int io_issue_sqe(struct io_kiocb *req, unsigned int issue_flags)
 	if (creds)
 		revert_creds(creds);
 
-	if (ret == IOU_OK)
-		__io_req_complete(req, issue_flags);
-	else if (ret != IOU_ISSUE_SKIP_COMPLETE)
+	if (ret == IOU_OK) {
+		if (issue_flags & IO_URING_F_COMPLETE_DEFER)
+			io_req_add_compl_list(req);
+		else
+			io_req_complete_post(req);
+	} else if (ret != IOU_ISSUE_SKIP_COMPLETE)
 		return ret;
 
 	/* If the op doesn't have a file, we're not polling for it */
@@ -2299,10 +2299,6 @@ static inline void io_queue_sqe(struct io_kiocb *req)
 
 	ret = io_issue_sqe(req, IO_URING_F_NONBLOCK|IO_URING_F_COMPLETE_DEFER);
 
-	if (req->flags & REQ_F_COMPLETE_INLINE) {
-		io_req_add_compl_list(req);
-		return;
-	}
 	/*
 	 * We async punt it if the file wasn't marked NOWAIT, or if the file
 	 * doesn't support non-blocking read/write attempts
diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h
index 26b669746d61..2141519e995a 100644
--- a/io_uring/io_uring.h
+++ b/io_uring/io_uring.h
@@ -193,11 +193,6 @@ static inline bool io_run_task_work(void)
 	return false;
 }
 
-static inline void io_req_complete_state(struct io_kiocb *req)
-{
-	req->flags |= REQ_F_COMPLETE_INLINE;
-}
-
 static inline void io_tw_lock(struct io_ring_ctx *ctx, bool *locked)
 {
 	if (!*locked) {
diff --git a/io_uring/io_uring_types.h b/io_uring/io_uring_types.h
index ca8e25992ece..3228872c199b 100644
--- a/io_uring/io_uring_types.h
+++ b/io_uring/io_uring_types.h
@@ -299,7 +299,6 @@ enum {
 	REQ_F_POLLED_BIT,
 	REQ_F_BUFFER_SELECTED_BIT,
 	REQ_F_BUFFER_RING_BIT,
-	REQ_F_COMPLETE_INLINE_BIT,
 	REQ_F_REISSUE_BIT,
 	REQ_F_CREDS_BIT,
 	REQ_F_REFCOUNT_BIT,
@@ -353,8 +352,6 @@ enum {
 	REQ_F_BUFFER_SELECTED	= BIT(REQ_F_BUFFER_SELECTED_BIT),
 	/* buffer selected from ring, needs commit */
 	REQ_F_BUFFER_RING	= BIT(REQ_F_BUFFER_RING_BIT),
-	/* completion is deferred through io_comp_state */
-	REQ_F_COMPLETE_INLINE	= BIT(REQ_F_COMPLETE_INLINE_BIT),
 	/* caller should reissue async */
 	REQ_F_REISSUE		= BIT(REQ_F_REISSUE_BIT),
 	/* supports async reads/writes */
-- 
2.36.1



* [PATCH for-next v2 11/25] io_uring: refactor io_req_task_complete()
  2022-06-14 14:36 [PATCH for-next v2 00/25] 5.20 cleanups and poll optimisations Pavel Begunkov
                   ` (9 preceding siblings ...)
  2022-06-14 14:37 ` [PATCH for-next v2 10/25] io_uring: kill REQ_F_COMPLETE_INLINE Pavel Begunkov
@ 2022-06-14 14:37 ` Pavel Begunkov
  2022-06-14 17:45   ` Hao Xu
  2022-06-14 14:37 ` [PATCH for-next v2 12/25] io_uring: don't inline io_put_kbuf Pavel Begunkov
                   ` (13 subsequent siblings)
  24 siblings, 1 reply; 44+ messages in thread
From: Pavel Begunkov @ 2022-06-14 14:37 UTC (permalink / raw)
  To: io-uring; +Cc: Jens Axboe, asml.silence

Clean up io_req_task_complete() and deduplicate io_put_kbuf() calls.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 io_uring/io_uring.c | 16 ++++++++++------
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index fcee58c6c35e..0f6edf82f262 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -1857,15 +1857,19 @@ static int io_iopoll_check(struct io_ring_ctx *ctx, long min)
 
 	return ret;
 }
-inline void io_req_task_complete(struct io_kiocb *req, bool *locked)
+
+void io_req_task_complete(struct io_kiocb *req, bool *locked)
 {
-	if (*locked) {
-		req->cqe.flags |= io_put_kbuf(req, 0);
+	if (req->flags & (REQ_F_BUFFER_SELECTED|REQ_F_BUFFER_RING)) {
+		unsigned issue_flags = *locked ? IO_URING_F_UNLOCKED : 0;
+
+		req->cqe.flags |= io_put_kbuf(req, issue_flags);
+	}
+
+	if (*locked)
 		io_req_add_compl_list(req);
-	} else {
-		req->cqe.flags |= io_put_kbuf(req, IO_URING_F_UNLOCKED);
+	else
 		io_req_complete_post(req);
-	}
 }
 
 /*
-- 
2.36.1



* [PATCH for-next v2 12/25] io_uring: don't inline io_put_kbuf
  2022-06-14 14:36 [PATCH for-next v2 00/25] 5.20 cleanups and poll optimisations Pavel Begunkov
                   ` (10 preceding siblings ...)
  2022-06-14 14:37 ` [PATCH for-next v2 11/25] io_uring: refactor io_req_task_complete() Pavel Begunkov
@ 2022-06-14 14:37 ` Pavel Begunkov
  2022-06-14 14:37 ` [PATCH for-next v2 13/25] io_uring: remove check_cq checking from hot paths Pavel Begunkov
                   ` (12 subsequent siblings)
  24 siblings, 0 replies; 44+ messages in thread
From: Pavel Begunkov @ 2022-06-14 14:37 UTC (permalink / raw)
  To: io-uring; +Cc: Jens Axboe, asml.silence

io_put_kbuf() is huge; don't bloat the kernel by inlining it.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 io_uring/kbuf.c | 33 +++++++++++++++++++++++++++++++++
 io_uring/kbuf.h | 38 ++++++--------------------------------
 2 files changed, 39 insertions(+), 32 deletions(-)

diff --git a/io_uring/kbuf.c b/io_uring/kbuf.c
index 9cdbc018fd64..6f2adb481a0d 100644
--- a/io_uring/kbuf.c
+++ b/io_uring/kbuf.c
@@ -81,6 +81,39 @@ static int io_buffer_add_list(struct io_ring_ctx *ctx,
 	return xa_err(xa_store(&ctx->io_bl_xa, bgid, bl, GFP_KERNEL));
 }
 
+unsigned int __io_put_kbuf(struct io_kiocb *req, unsigned issue_flags)
+{
+	unsigned int cflags;
+
+	/*
+	 * We can add this buffer back to two lists:
+	 *
+	 * 1) The io_buffers_cache list. This one is protected by the
+	 *    ctx->uring_lock. If we already hold this lock, add back to this
+	 *    list as we can grab it from issue as well.
+	 * 2) The io_buffers_comp list. This one is protected by the
+	 *    ctx->completion_lock.
+	 *
+	 * We migrate buffers from the comp_list to the issue cache list
+	 * when we need one.
+	 */
+	if (req->flags & REQ_F_BUFFER_RING) {
+		/* no buffers to recycle for this case */
+		cflags = __io_put_kbuf_list(req, NULL);
+	} else if (issue_flags & IO_URING_F_UNLOCKED) {
+		struct io_ring_ctx *ctx = req->ctx;
+
+		spin_lock(&ctx->completion_lock);
+		cflags = __io_put_kbuf_list(req, &ctx->io_buffers_comp);
+		spin_unlock(&ctx->completion_lock);
+	} else {
+		lockdep_assert_held(&req->ctx->uring_lock);
+
+		cflags = __io_put_kbuf_list(req, &req->ctx->io_buffers_cache);
+	}
+	return cflags;
+}
+
 static void __user *io_provided_buffer_select(struct io_kiocb *req, size_t *len,
 					      struct io_buffer_list *bl)
 {
diff --git a/io_uring/kbuf.h b/io_uring/kbuf.h
index 80b6df2c7535..5da3d4039aed 100644
--- a/io_uring/kbuf.h
+++ b/io_uring/kbuf.h
@@ -47,6 +47,8 @@ int io_provide_buffers(struct io_kiocb *req, unsigned int issue_flags);
 int io_register_pbuf_ring(struct io_ring_ctx *ctx, void __user *arg);
 int io_unregister_pbuf_ring(struct io_ring_ctx *ctx, void __user *arg);
 
+unsigned int __io_put_kbuf(struct io_kiocb *req, unsigned issue_flags);
+
 static inline bool io_do_buffer_select(struct io_kiocb *req)
 {
 	if (!(req->flags & REQ_F_BUFFER_SELECT))
@@ -70,7 +72,8 @@ static inline void io_kbuf_recycle(struct io_kiocb *req, unsigned issue_flags)
 	__io_kbuf_recycle(req, issue_flags);
 }
 
-static unsigned int __io_put_kbuf(struct io_kiocb *req, struct list_head *list)
+static inline unsigned int __io_put_kbuf_list(struct io_kiocb *req,
+					      struct list_head *list)
 {
 	if (req->flags & REQ_F_BUFFER_RING) {
 		if (req->buf_list)
@@ -90,44 +93,15 @@ static inline unsigned int io_put_kbuf_comp(struct io_kiocb *req)
 
 	if (!(req->flags & (REQ_F_BUFFER_SELECTED|REQ_F_BUFFER_RING)))
 		return 0;
-	return __io_put_kbuf(req, &req->ctx->io_buffers_comp);
+	return __io_put_kbuf_list(req, &req->ctx->io_buffers_comp);
 }
 
 static inline unsigned int io_put_kbuf(struct io_kiocb *req,
 				       unsigned issue_flags)
 {
-	unsigned int cflags;
 
 	if (!(req->flags & (REQ_F_BUFFER_SELECTED|REQ_F_BUFFER_RING)))
 		return 0;
-
-	/*
-	 * We can add this buffer back to two lists:
-	 *
-	 * 1) The io_buffers_cache list. This one is protected by the
-	 *    ctx->uring_lock. If we already hold this lock, add back to this
-	 *    list as we can grab it from issue as well.
-	 * 2) The io_buffers_comp list. This one is protected by the
-	 *    ctx->completion_lock.
-	 *
-	 * We migrate buffers from the comp_list to the issue cache list
-	 * when we need one.
-	 */
-	if (req->flags & REQ_F_BUFFER_RING) {
-		/* no buffers to recycle for this case */
-		cflags = __io_put_kbuf(req, NULL);
-	} else if (issue_flags & IO_URING_F_UNLOCKED) {
-		struct io_ring_ctx *ctx = req->ctx;
-
-		spin_lock(&ctx->completion_lock);
-		cflags = __io_put_kbuf(req, &ctx->io_buffers_comp);
-		spin_unlock(&ctx->completion_lock);
-	} else {
-		lockdep_assert_held(&req->ctx->uring_lock);
-
-		cflags = __io_put_kbuf(req, &req->ctx->io_buffers_cache);
-	}
-
-	return cflags;
+	return __io_put_kbuf(req, issue_flags);
 }
 #endif
-- 
2.36.1



* [PATCH for-next v2 13/25] io_uring: remove check_cq checking from hot paths
  2022-06-14 14:36 [PATCH for-next v2 00/25] 5.20 cleanups and poll optimisations Pavel Begunkov
                   ` (11 preceding siblings ...)
  2022-06-14 14:37 ` [PATCH for-next v2 12/25] io_uring: don't inline io_put_kbuf Pavel Begunkov
@ 2022-06-14 14:37 ` Pavel Begunkov
  2022-06-14 14:37 ` [PATCH for-next v2 14/25] io_uring: poll: remove unnecessary req->ref set Pavel Begunkov
                   ` (11 subsequent siblings)
  24 siblings, 0 replies; 44+ messages in thread
From: Pavel Begunkov @ 2022-06-14 14:37 UTC (permalink / raw)
  To: io-uring; +Cc: Jens Axboe, asml.silence

All ctx->check_cq events are slow path; don't test every single flag
one by one in the hot path, but add a common guarding if.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 io_uring/io_uring.c | 34 +++++++++++++++++++---------------
 1 file changed, 19 insertions(+), 15 deletions(-)

diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 0f6edf82f262..e43eccf173ff 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -1807,24 +1807,25 @@ static int io_iopoll_check(struct io_ring_ctx *ctx, long min)
 	int ret = 0;
 	unsigned long check_cq;
 
+	check_cq = READ_ONCE(ctx->check_cq);
+	if (unlikely(check_cq)) {
+		if (check_cq & BIT(IO_CHECK_CQ_OVERFLOW_BIT))
+			__io_cqring_overflow_flush(ctx, false);
+		/*
+		 * Similarly do not spin if we have not informed the user of any
+		 * dropped CQE.
+		 */
+		if (check_cq & BIT(IO_CHECK_CQ_DROPPED_BIT))
+			return -EBADR;
+	}
 	/*
 	 * Don't enter poll loop if we already have events pending.
 	 * If we do, we can potentially be spinning for commands that
 	 * already triggered a CQE (eg in error).
 	 */
-	check_cq = READ_ONCE(ctx->check_cq);
-	if (check_cq & BIT(IO_CHECK_CQ_OVERFLOW_BIT))
-		__io_cqring_overflow_flush(ctx, false);
 	if (io_cqring_events(ctx))
 		return 0;
 
-	/*
-	 * Similarly do not spin if we have not informed the user of any
-	 * dropped CQE.
-	 */
-	if (unlikely(check_cq & BIT(IO_CHECK_CQ_DROPPED_BIT)))
-		return -EBADR;
-
 	do {
 		/*
 		 * If a submit got punted to a workqueue, we can have the
@@ -2752,12 +2753,15 @@ static inline int io_cqring_wait_schedule(struct io_ring_ctx *ctx,
 	ret = io_run_task_work_sig();
 	if (ret || io_should_wake(iowq))
 		return ret;
+
 	check_cq = READ_ONCE(ctx->check_cq);
-	/* let the caller flush overflows, retry */
-	if (check_cq & BIT(IO_CHECK_CQ_OVERFLOW_BIT))
-		return 1;
-	if (unlikely(check_cq & BIT(IO_CHECK_CQ_DROPPED_BIT)))
-		return -EBADR;
+	if (unlikely(check_cq)) {
+		/* let the caller flush overflows, retry */
+		if (check_cq & BIT(IO_CHECK_CQ_OVERFLOW_BIT))
+			return 1;
+		if (check_cq & BIT(IO_CHECK_CQ_DROPPED_BIT))
+			return -EBADR;
+	}
 	if (!schedule_hrtimeout(&timeout, HRTIMER_MODE_ABS))
 		return -ETIME;
 	return 1;
-- 
2.36.1



* [PATCH for-next v2 14/25] io_uring: poll: remove unnecessary req->ref set
  2022-06-14 14:36 [PATCH for-next v2 00/25] 5.20 cleanups and poll optimisations Pavel Begunkov
                   ` (12 preceding siblings ...)
  2022-06-14 14:37 ` [PATCH for-next v2 13/25] io_uring: remove check_cq checking from hot paths Pavel Begunkov
@ 2022-06-14 14:37 ` Pavel Begunkov
  2022-06-14 14:37 ` [PATCH for-next v2 15/25] io_uring: switch cancel_hash to use per entry spinlock Pavel Begunkov
                   ` (10 subsequent siblings)
  24 siblings, 0 replies; 44+ messages in thread
From: Pavel Begunkov @ 2022-06-14 14:37 UTC (permalink / raw)
  To: io-uring; +Cc: Jens Axboe, asml.silence, Hao Xu

From: Hao Xu <howeyxu@tencent.com>

We now don't need to set req->refcount for poll requests since the
reworked poll code ensures no request release race.

Signed-off-by: Hao Xu <howeyxu@tencent.com>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 io_uring/poll.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/io_uring/poll.c b/io_uring/poll.c
index 0df5eca93b16..73584c4e3e9b 100644
--- a/io_uring/poll.c
+++ b/io_uring/poll.c
@@ -683,7 +683,6 @@ int io_poll_add_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 	if ((flags & IORING_POLL_ADD_MULTI) && (req->flags & REQ_F_CQE_SKIP))
 		return -EINVAL;
 
-	io_req_set_refcount(req);
 	req->apoll_events = poll->events = io_poll_parse_events(sqe, flags);
 	return 0;
 }
-- 
2.36.1



* [PATCH for-next v2 15/25] io_uring: switch cancel_hash to use per entry spinlock
  2022-06-14 14:36 [PATCH for-next v2 00/25] 5.20 cleanups and poll optimisations Pavel Begunkov
                   ` (13 preceding siblings ...)
  2022-06-14 14:37 ` [PATCH for-next v2 14/25] io_uring: poll: remove unnecessary req->ref set Pavel Begunkov
@ 2022-06-14 14:37 ` Pavel Begunkov
  2022-06-14 14:37 ` [PATCH for-next v2 16/25] io_uring: pass poll_find lock back Pavel Begunkov
                   ` (9 subsequent siblings)
  24 siblings, 0 replies; 44+ messages in thread
From: Pavel Begunkov @ 2022-06-14 14:37 UTC (permalink / raw)
  To: io-uring; +Cc: Jens Axboe, asml.silence, Hao Xu

From: Hao Xu <howeyxu@tencent.com>

Add a new io_hash_bucket structure so that each bucket in cancel_hash
has its own spinlock. Using a per-entry lock for cancel_hash removes
some completion_lock invocations and avoids contention between
different cancel_hash entries.

Signed-off-by: Hao Xu <howeyxu@tencent.com>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 io_uring/cancel.c         | 14 ++++++-
 io_uring/cancel.h         |  6 +++
 io_uring/fdinfo.c         |  9 +++--
 io_uring/io_uring.c       |  8 ++--
 io_uring/io_uring_types.h |  2 +-
 io_uring/poll.c           | 80 ++++++++++++++++++++++++---------------
 6 files changed, 79 insertions(+), 40 deletions(-)

diff --git a/io_uring/cancel.c b/io_uring/cancel.c
index 83cceb52d82d..6f2888388a40 100644
--- a/io_uring/cancel.c
+++ b/io_uring/cancel.c
@@ -93,14 +93,14 @@ int io_try_cancel(struct io_kiocb *req, struct io_cancel_data *cd)
 	if (!ret)
 		return 0;
 
-	spin_lock(&ctx->completion_lock);
 	ret = io_poll_cancel(ctx, cd);
 	if (ret != -ENOENT)
 		goto out;
+	spin_lock(&ctx->completion_lock);
 	if (!(cd->flags & IORING_ASYNC_CANCEL_FD))
 		ret = io_timeout_cancel(ctx, cd);
-out:
 	spin_unlock(&ctx->completion_lock);
+out:
 	return ret;
 }
 
@@ -192,3 +192,13 @@ int io_async_cancel(struct io_kiocb *req, unsigned int issue_flags)
 	io_req_set_res(req, ret, 0);
 	return IOU_OK;
 }
+
+void init_hash_table(struct io_hash_bucket *hash_table, unsigned size)
+{
+	unsigned int i;
+
+	for (i = 0; i < size; i++) {
+		spin_lock_init(&hash_table[i].lock);
+		INIT_HLIST_HEAD(&hash_table[i].list);
+	}
+}
diff --git a/io_uring/cancel.h b/io_uring/cancel.h
index 4f35d8696325..556a7dcf160e 100644
--- a/io_uring/cancel.h
+++ b/io_uring/cancel.h
@@ -4,3 +4,9 @@ int io_async_cancel_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe);
 int io_async_cancel(struct io_kiocb *req, unsigned int issue_flags);
 
 int io_try_cancel(struct io_kiocb *req, struct io_cancel_data *cd);
+void init_hash_table(struct io_hash_bucket *hash_table, unsigned size);
+
+struct io_hash_bucket {
+	spinlock_t		lock;
+	struct hlist_head	list;
+} ____cacheline_aligned_in_smp;
diff --git a/io_uring/fdinfo.c b/io_uring/fdinfo.c
index fcedde4b4b1e..f941c73f5502 100644
--- a/io_uring/fdinfo.c
+++ b/io_uring/fdinfo.c
@@ -13,6 +13,7 @@
 #include "io_uring.h"
 #include "sqpoll.h"
 #include "fdinfo.h"
+#include "cancel.h"
 
 #ifdef CONFIG_PROC_FS
 static __cold int io_uring_show_cred(struct seq_file *m, unsigned int id,
@@ -157,17 +158,19 @@ static __cold void __io_uring_show_fdinfo(struct io_ring_ctx *ctx,
 		mutex_unlock(&ctx->uring_lock);
 
 	seq_puts(m, "PollList:\n");
-	spin_lock(&ctx->completion_lock);
 	for (i = 0; i < (1U << ctx->cancel_hash_bits); i++) {
-		struct hlist_head *list = &ctx->cancel_hash[i];
+		struct io_hash_bucket *hb = &ctx->cancel_hash[i];
 		struct io_kiocb *req;
 
-		hlist_for_each_entry(req, list, hash_node)
+		spin_lock(&hb->lock);
+		hlist_for_each_entry(req, &hb->list, hash_node)
 			seq_printf(m, "  op=%d, task_works=%d\n", req->opcode,
 					task_work_pending(req->task));
+		spin_unlock(&hb->lock);
 	}
 
 	seq_puts(m, "CqOverflowList:\n");
+	spin_lock(&ctx->completion_lock);
 	list_for_each_entry(ocqe, &ctx->cq_overflow_list, list) {
 		struct io_uring_cqe *cqe = &ocqe->cqe;
 
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index e43eccf173ff..4d94757e6e28 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -717,11 +717,13 @@ static __cold struct io_ring_ctx *io_ring_ctx_alloc(struct io_uring_params *p)
 	if (hash_bits <= 0)
 		hash_bits = 1;
 	ctx->cancel_hash_bits = hash_bits;
-	ctx->cancel_hash = kmalloc((1U << hash_bits) * sizeof(struct hlist_head),
-					GFP_KERNEL);
+	ctx->cancel_hash =
+		kmalloc((1U << hash_bits) * sizeof(struct io_hash_bucket),
+			GFP_KERNEL);
 	if (!ctx->cancel_hash)
 		goto err;
-	__hash_init(ctx->cancel_hash, 1U << hash_bits);
+
+	init_hash_table(ctx->cancel_hash, 1U << hash_bits);
 
 	ctx->dummy_ubuf = kzalloc(sizeof(*ctx->dummy_ubuf), GFP_KERNEL);
 	if (!ctx->dummy_ubuf)
diff --git a/io_uring/io_uring_types.h b/io_uring/io_uring_types.h
index 3228872c199b..aba0f8cd6f49 100644
--- a/io_uring/io_uring_types.h
+++ b/io_uring/io_uring_types.h
@@ -222,7 +222,7 @@ struct io_ring_ctx {
 		 * manipulate the list, hence no extra locking is needed there.
 		 */
 		struct io_wq_work_list	iopoll_list;
-		struct hlist_head	*cancel_hash;
+		struct io_hash_bucket	*cancel_hash;
 		unsigned		cancel_hash_bits;
 		bool			poll_multi_queue;
 
diff --git a/io_uring/poll.c b/io_uring/poll.c
index 73584c4e3e9b..7f6b16f687b0 100644
--- a/io_uring/poll.c
+++ b/io_uring/poll.c
@@ -19,6 +19,7 @@
 #include "opdef.h"
 #include "kbuf.h"
 #include "poll.h"
+#include "cancel.h"
 
 struct io_poll_update {
 	struct file			*file;
@@ -73,10 +74,22 @@ static struct io_poll *io_poll_get_single(struct io_kiocb *req)
 static void io_poll_req_insert(struct io_kiocb *req)
 {
 	struct io_ring_ctx *ctx = req->ctx;
-	struct hlist_head *list;
+	u32 index = hash_long(req->cqe.user_data, ctx->cancel_hash_bits);
+	struct io_hash_bucket *hb = &ctx->cancel_hash[index];
 
-	list = &ctx->cancel_hash[hash_long(req->cqe.user_data, ctx->cancel_hash_bits)];
-	hlist_add_head(&req->hash_node, list);
+	spin_lock(&hb->lock);
+	hlist_add_head(&req->hash_node, &hb->list);
+	spin_unlock(&hb->lock);
+}
+
+static void io_poll_req_delete(struct io_kiocb *req, struct io_ring_ctx *ctx)
+{
+	u32 index = hash_long(req->cqe.user_data, ctx->cancel_hash_bits);
+	spinlock_t *lock = &ctx->cancel_hash[index].lock;
+
+	spin_lock(lock);
+	hash_del(&req->hash_node);
+	spin_unlock(lock);
 }
 
 static void io_init_poll_iocb(struct io_poll *poll, __poll_t events,
@@ -220,8 +233,8 @@ static void io_poll_task_func(struct io_kiocb *req, bool *locked)
 	}
 
 	io_poll_remove_entries(req);
+	io_poll_req_delete(req, ctx);
 	spin_lock(&ctx->completion_lock);
-	hash_del(&req->hash_node);
 	req->cqe.flags = 0;
 	__io_req_complete_post(req);
 	io_commit_cqring(ctx);
@@ -231,7 +244,6 @@ static void io_poll_task_func(struct io_kiocb *req, bool *locked)
 
 static void io_apoll_task_func(struct io_kiocb *req, bool *locked)
 {
-	struct io_ring_ctx *ctx = req->ctx;
 	int ret;
 
 	ret = io_poll_check_events(req, locked);
@@ -239,9 +251,7 @@ static void io_apoll_task_func(struct io_kiocb *req, bool *locked)
 		return;
 
 	io_poll_remove_entries(req);
-	spin_lock(&ctx->completion_lock);
-	hash_del(&req->hash_node);
-	spin_unlock(&ctx->completion_lock);
+	io_poll_req_delete(req, req->ctx);
 
 	if (!ret)
 		io_req_task_submit(req, locked);
@@ -435,9 +445,7 @@ static int __io_arm_poll_handler(struct io_kiocb *req,
 		return 0;
 	}
 
-	spin_lock(&ctx->completion_lock);
 	io_poll_req_insert(req);
-	spin_unlock(&ctx->completion_lock);
 
 	if (mask && (poll->events & EPOLLET)) {
 		/* can't multishot if failed, just queue the event we've got */
@@ -534,32 +542,31 @@ __cold bool io_poll_remove_all(struct io_ring_ctx *ctx, struct task_struct *tsk,
 	bool found = false;
 	int i;
 
-	spin_lock(&ctx->completion_lock);
 	for (i = 0; i < (1U << ctx->cancel_hash_bits); i++) {
-		struct hlist_head *list;
+		struct io_hash_bucket *hb = &ctx->cancel_hash[i];
 
-		list = &ctx->cancel_hash[i];
-		hlist_for_each_entry_safe(req, tmp, list, hash_node) {
+		spin_lock(&hb->lock);
+		hlist_for_each_entry_safe(req, tmp, &hb->list, hash_node) {
 			if (io_match_task_safe(req, tsk, cancel_all)) {
 				hlist_del_init(&req->hash_node);
 				io_poll_cancel_req(req);
 				found = true;
 			}
 		}
+		spin_unlock(&hb->lock);
 	}
-	spin_unlock(&ctx->completion_lock);
 	return found;
 }
 
 static struct io_kiocb *io_poll_find(struct io_ring_ctx *ctx, bool poll_only,
 				     struct io_cancel_data *cd)
-	__must_hold(&ctx->completion_lock)
 {
-	struct hlist_head *list;
 	struct io_kiocb *req;
+	u32 index = hash_long(cd->data, ctx->cancel_hash_bits);
+	struct io_hash_bucket *hb = &ctx->cancel_hash[index];
 
-	list = &ctx->cancel_hash[hash_long(cd->data, ctx->cancel_hash_bits)];
-	hlist_for_each_entry(req, list, hash_node) {
+	spin_lock(&hb->lock);
+	hlist_for_each_entry(req, &hb->list, hash_node) {
 		if (cd->data != req->cqe.user_data)
 			continue;
 		if (poll_only && req->opcode != IORING_OP_POLL_ADD)
@@ -571,21 +578,21 @@ static struct io_kiocb *io_poll_find(struct io_ring_ctx *ctx, bool poll_only,
 		}
 		return req;
 	}
+	spin_unlock(&hb->lock);
 	return NULL;
 }
 
 static struct io_kiocb *io_poll_file_find(struct io_ring_ctx *ctx,
 					  struct io_cancel_data *cd)
-	__must_hold(&ctx->completion_lock)
 {
 	struct io_kiocb *req;
 	int i;
 
 	for (i = 0; i < (1U << ctx->cancel_hash_bits); i++) {
-		struct hlist_head *list;
+		struct io_hash_bucket *hb = &ctx->cancel_hash[i];
 
-		list = &ctx->cancel_hash[i];
-		hlist_for_each_entry(req, list, hash_node) {
+		spin_lock(&hb->lock);
+		hlist_for_each_entry(req, &hb->list, hash_node) {
 			if (!(cd->flags & IORING_ASYNC_CANCEL_ANY) &&
 			    req->file != cd->file)
 				continue;
@@ -594,12 +601,12 @@ static struct io_kiocb *io_poll_file_find(struct io_ring_ctx *ctx,
 			req->work.cancel_seq = cd->seq;
 			return req;
 		}
+		spin_unlock(&hb->lock);
 	}
 	return NULL;
 }
 
 static bool io_poll_disarm(struct io_kiocb *req)
-	__must_hold(&ctx->completion_lock)
 {
 	if (!io_poll_get_ownership(req))
 		return false;
@@ -609,17 +616,23 @@ static bool io_poll_disarm(struct io_kiocb *req)
 }
 
 int io_poll_cancel(struct io_ring_ctx *ctx, struct io_cancel_data *cd)
-	__must_hold(&ctx->completion_lock)
 {
 	struct io_kiocb *req;
+	u32 index;
+	spinlock_t *lock;
 
 	if (cd->flags & (IORING_ASYNC_CANCEL_FD|IORING_ASYNC_CANCEL_ANY))
 		req = io_poll_file_find(ctx, cd);
 	else
 		req = io_poll_find(ctx, false, cd);
-	if (!req)
+	if (!req) {
 		return -ENOENT;
+	} else {
+		index = hash_long(req->cqe.user_data, ctx->cancel_hash_bits);
+		lock = &ctx->cancel_hash[index].lock;
+	}
 	io_poll_cancel_req(req);
+	spin_unlock(lock);
 	return 0;
 }
 
@@ -713,18 +726,23 @@ int io_poll_remove(struct io_kiocb *req, unsigned int issue_flags)
 	struct io_poll_update *poll_update = io_kiocb_to_cmd(req);
 	struct io_cancel_data cd = { .data = poll_update->old_user_data, };
 	struct io_ring_ctx *ctx = req->ctx;
+	u32 index = hash_long(cd.data, ctx->cancel_hash_bits);
+	spinlock_t *lock = &ctx->cancel_hash[index].lock;
 	struct io_kiocb *preq;
 	int ret2, ret = 0;
 	bool locked;
 
-	spin_lock(&ctx->completion_lock);
 	preq = io_poll_find(ctx, true, &cd);
-	if (!preq || !io_poll_disarm(preq)) {
-		spin_unlock(&ctx->completion_lock);
-		ret = preq ? -EALREADY : -ENOENT;
+	if (!preq) {
+		ret = -ENOENT;
+		goto out;
+	}
+	ret2 = io_poll_disarm(preq);
+	spin_unlock(lock);
+	if (!ret2) {
+		ret = -EALREADY;
 		goto out;
 	}
-	spin_unlock(&ctx->completion_lock);
 
 	if (poll_update->update_events || poll_update->update_user_data) {
 		/* only mask one event flags, keep behavior flags */
-- 
2.36.1



* [PATCH for-next v2 16/25] io_uring: pass poll_find lock back
  2022-06-14 14:36 [PATCH for-next v2 00/25] 5.20 cleanups and poll optimisations Pavel Begunkov
                   ` (14 preceding siblings ...)
  2022-06-14 14:37 ` [PATCH for-next v2 15/25] io_uring: switch cancel_hash to use per entry spinlock Pavel Begunkov
@ 2022-06-14 14:37 ` Pavel Begunkov
  2022-06-14 14:37 ` [PATCH for-next v2 17/25] io_uring: clean up io_try_cancel Pavel Begunkov
                   ` (8 subsequent siblings)
  24 siblings, 0 replies; 44+ messages in thread
From: Pavel Begunkov @ 2022-06-14 14:37 UTC (permalink / raw)
  To: io-uring; +Cc: Jens Axboe, asml.silence

Instead of relying on implicit knowledge of what is locked or not after
io_poll_find() and co return, pass back a pointer to the locked bucket,
if any. If set, the caller must unlock the spinlock.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 io_uring/poll.c | 46 ++++++++++++++++++++++++++--------------------
 1 file changed, 26 insertions(+), 20 deletions(-)

diff --git a/io_uring/poll.c b/io_uring/poll.c
index 7f6b16f687b0..7fc4aafcca95 100644
--- a/io_uring/poll.c
+++ b/io_uring/poll.c
@@ -559,12 +559,15 @@ __cold bool io_poll_remove_all(struct io_ring_ctx *ctx, struct task_struct *tsk,
 }
 
 static struct io_kiocb *io_poll_find(struct io_ring_ctx *ctx, bool poll_only,
-				     struct io_cancel_data *cd)
+				     struct io_cancel_data *cd,
+				     struct io_hash_bucket **out_bucket)
 {
 	struct io_kiocb *req;
 	u32 index = hash_long(cd->data, ctx->cancel_hash_bits);
 	struct io_hash_bucket *hb = &ctx->cancel_hash[index];
 
+	*out_bucket = NULL;
+
 	spin_lock(&hb->lock);
 	hlist_for_each_entry(req, &hb->list, hash_node) {
 		if (cd->data != req->cqe.user_data)
@@ -576,6 +579,7 @@ static struct io_kiocb *io_poll_find(struct io_ring_ctx *ctx, bool poll_only,
 				continue;
 			req->work.cancel_seq = cd->seq;
 		}
+		*out_bucket = hb;
 		return req;
 	}
 	spin_unlock(&hb->lock);
@@ -583,11 +587,14 @@ static struct io_kiocb *io_poll_find(struct io_ring_ctx *ctx, bool poll_only,
 }
 
 static struct io_kiocb *io_poll_file_find(struct io_ring_ctx *ctx,
-					  struct io_cancel_data *cd)
+					  struct io_cancel_data *cd,
+					  struct io_hash_bucket **out_bucket)
 {
 	struct io_kiocb *req;
 	int i;
 
+	*out_bucket = NULL;
+
 	for (i = 0; i < (1U << ctx->cancel_hash_bits); i++) {
 		struct io_hash_bucket *hb = &ctx->cancel_hash[i];
 
@@ -599,6 +606,7 @@ static struct io_kiocb *io_poll_file_find(struct io_ring_ctx *ctx,
 			if (cd->seq == req->work.cancel_seq)
 				continue;
 			req->work.cancel_seq = cd->seq;
+			*out_bucket = hb;
 			return req;
 		}
 		spin_unlock(&hb->lock);
@@ -617,23 +625,19 @@ static bool io_poll_disarm(struct io_kiocb *req)
 
 int io_poll_cancel(struct io_ring_ctx *ctx, struct io_cancel_data *cd)
 {
+	struct io_hash_bucket *bucket;
 	struct io_kiocb *req;
-	u32 index;
-	spinlock_t *lock;
 
 	if (cd->flags & (IORING_ASYNC_CANCEL_FD|IORING_ASYNC_CANCEL_ANY))
-		req = io_poll_file_find(ctx, cd);
+		req = io_poll_file_find(ctx, cd, &bucket);
 	else
-		req = io_poll_find(ctx, false, cd);
-	if (!req) {
-		return -ENOENT;
-	} else {
-		index = hash_long(req->cqe.user_data, ctx->cancel_hash_bits);
-		lock = &ctx->cancel_hash[index].lock;
-	}
-	io_poll_cancel_req(req);
-	spin_unlock(lock);
-	return 0;
+		req = io_poll_find(ctx, false, cd, &bucket);
+
+	if (req)
+		io_poll_cancel_req(req);
+	if (bucket)
+		spin_unlock(&bucket->lock);
+	return req ? 0 : -ENOENT;
 }
 
 static __poll_t io_poll_parse_events(const struct io_uring_sqe *sqe,
@@ -726,19 +730,21 @@ int io_poll_remove(struct io_kiocb *req, unsigned int issue_flags)
 	struct io_poll_update *poll_update = io_kiocb_to_cmd(req);
 	struct io_cancel_data cd = { .data = poll_update->old_user_data, };
 	struct io_ring_ctx *ctx = req->ctx;
-	u32 index = hash_long(cd.data, ctx->cancel_hash_bits);
-	spinlock_t *lock = &ctx->cancel_hash[index].lock;
+	struct io_hash_bucket *bucket;
 	struct io_kiocb *preq;
 	int ret2, ret = 0;
 	bool locked;
 
-	preq = io_poll_find(ctx, true, &cd);
+	preq = io_poll_find(ctx, true, &cd, &bucket);
+	if (preq)
+		ret2 = io_poll_disarm(preq);
+	if (bucket)
+		spin_unlock(&bucket->lock);
+
 	if (!preq) {
 		ret = -ENOENT;
 		goto out;
 	}
-	ret2 = io_poll_disarm(preq);
-	spin_unlock(lock);
 	if (!ret2) {
 		ret = -EALREADY;
 		goto out;
-- 
2.36.1



* [PATCH for-next v2 17/25] io_uring: clean up io_try_cancel
  2022-06-14 14:36 [PATCH for-next v2 00/25] 5.20 cleanups and poll optimisations Pavel Begunkov
                   ` (15 preceding siblings ...)
  2022-06-14 14:37 ` [PATCH for-next v2 16/25] io_uring: pass poll_find lock back Pavel Begunkov
@ 2022-06-14 14:37 ` Pavel Begunkov
  2022-06-14 14:37 ` [PATCH for-next v2 18/25] io_uring: limit number hash buckets Pavel Begunkov
                   ` (7 subsequent siblings)
  24 siblings, 0 replies; 44+ messages in thread
From: Pavel Begunkov @ 2022-06-14 14:37 UTC (permalink / raw)
  To: io-uring; +Cc: Jens Axboe, asml.silence

Get rid of an unnecessary extra goto in io_try_cancel() and simplify the
function.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 io_uring/cancel.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/io_uring/cancel.c b/io_uring/cancel.c
index 6f2888388a40..a253e2ad22eb 100644
--- a/io_uring/cancel.c
+++ b/io_uring/cancel.c
@@ -95,12 +95,12 @@ int io_try_cancel(struct io_kiocb *req, struct io_cancel_data *cd)
 
 	ret = io_poll_cancel(ctx, cd);
 	if (ret != -ENOENT)
-		goto out;
+		return ret;
+
 	spin_lock(&ctx->completion_lock);
 	if (!(cd->flags & IORING_ASYNC_CANCEL_FD))
 		ret = io_timeout_cancel(ctx, cd);
 	spin_unlock(&ctx->completion_lock);
-out:
 	return ret;
 }
 
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH for-next v2 18/25] io_uring: limit number hash buckets
  2022-06-14 14:36 [PATCH for-next v2 00/25] 5.20 cleanups and poll optimisations Pavel Begunkov
                   ` (16 preceding siblings ...)
  2022-06-14 14:37 ` [PATCH for-next v2 17/25] io_uring: clean up io_try_cancel Pavel Begunkov
@ 2022-06-14 14:37 ` Pavel Begunkov
  2022-06-14 14:37 ` [PATCH for-next v2 19/25] io_uring: clean up io_ring_ctx_alloc Pavel Begunkov
                   ` (6 subsequent siblings)
  24 siblings, 0 replies; 44+ messages in thread
From: Pavel Begunkov @ 2022-06-14 14:37 UTC (permalink / raw)
  To: io-uring; +Cc: Jens Axboe, asml.silence

Don't allocate too many hash/cancellation buckets; clamp it to 8 bits,
or 256 * 64B = 16KB. We don't usually have that many requests, and 256
buckets should be enough, especially since we do the hash search only in
the cancellation path.
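
For a rough feel for the numbers, here is a tiny standalone userspace
sketch of the sizing math (illustration only, not kernel code; it takes
the 64B-per-bucket figure above as an assumption, i.e. one cacheline per
struct io_hash_bucket, and uses a gcc/clang builtin for ilog2):

    /* standalone illustration of the bucket sizing; compile with gcc/clang */
    #include <stdio.h>

    int main(void)
    {
        unsigned cq_entries[] = { 128, 4096, 65536 };

        for (unsigned i = 0; i < 3; i++) {
            int bits = 31 - __builtin_clz(cq_entries[i]) - 5;  /* ilog2() - 5 */

            if (bits < 1)
                bits = 1;
            if (bits > 8)
                bits = 8;
            printf("cq_entries=%-6u -> %3u buckets, ~%u bytes\n",
                   cq_entries[i], 1u << bits, (1u << bits) * 64);
        }
        return 0;
    }

Without the cap, a 65536-entry CQ for example would have got 2048
buckets (~128KB) under the old formula.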

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 io_uring/io_uring.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 4d94757e6e28..2a7a5db12a0e 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -710,12 +710,12 @@ static __cold struct io_ring_ctx *io_ring_ctx_alloc(struct io_uring_params *p)
 
 	/*
 	 * Use 5 bits less than the max cq entries, that should give us around
-	 * 32 entries per hash list if totally full and uniformly spread.
+	 * 32 entries per hash list if totally full and uniformly spread, but
+	 * don't keep too many buckets to not overconsume memory.
 	 */
-	hash_bits = ilog2(p->cq_entries);
-	hash_bits -= 5;
-	if (hash_bits <= 0)
-		hash_bits = 1;
+	hash_bits = ilog2(p->cq_entries) - 5;
+	hash_bits = clamp(hash_bits, 1, 8);
+
 	ctx->cancel_hash_bits = hash_bits;
 	ctx->cancel_hash =
 		kmalloc((1U << hash_bits) * sizeof(struct io_hash_bucket),
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH for-next v2 19/25] io_uring: clean up io_ring_ctx_alloc
  2022-06-14 14:36 [PATCH for-next v2 00/25] 5.20 cleanups and poll optimisations Pavel Begunkov
                   ` (17 preceding siblings ...)
  2022-06-14 14:37 ` [PATCH for-next v2 18/25] io_uring: limit number hash buckets Pavel Begunkov
@ 2022-06-14 14:37 ` Pavel Begunkov
  2022-06-15  8:46   ` Hao Xu
  2022-06-14 14:37 ` [PATCH for-next v2 20/25] io_uring: use state completion infra for poll reqs Pavel Begunkov
                   ` (5 subsequent siblings)
  24 siblings, 1 reply; 44+ messages in thread
From: Pavel Begunkov @ 2022-06-14 14:37 UTC (permalink / raw)
  To: io-uring; +Cc: Jens Axboe, asml.silence

Add a variable for the number of hash buckets in io_ring_ctx_alloc();
it makes the code more readable.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 io_uring/io_uring.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 2a7a5db12a0e..15d209f334eb 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -700,6 +700,8 @@ static __cold void io_fallback_req_func(struct work_struct *work)
 static __cold struct io_ring_ctx *io_ring_ctx_alloc(struct io_uring_params *p)
 {
 	struct io_ring_ctx *ctx;
+	unsigned hash_buckets;
+	size_t hash_size;
 	int hash_bits;
 
 	ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);
@@ -715,15 +717,15 @@ static __cold struct io_ring_ctx *io_ring_ctx_alloc(struct io_uring_params *p)
 	 */
 	hash_bits = ilog2(p->cq_entries) - 5;
 	hash_bits = clamp(hash_bits, 1, 8);
+	hash_buckets = 1U << hash_bits;
+	hash_size = hash_buckets * sizeof(struct io_hash_bucket);
 
 	ctx->cancel_hash_bits = hash_bits;
-	ctx->cancel_hash =
-		kmalloc((1U << hash_bits) * sizeof(struct io_hash_bucket),
-			GFP_KERNEL);
+	ctx->cancel_hash = kmalloc(hash_size, GFP_KERNEL);
 	if (!ctx->cancel_hash)
 		goto err;
 
-	init_hash_table(ctx->cancel_hash, 1U << hash_bits);
+	init_hash_table(ctx->cancel_hash, hash_buckets);
 
 	ctx->dummy_ubuf = kzalloc(sizeof(*ctx->dummy_ubuf), GFP_KERNEL);
 	if (!ctx->dummy_ubuf)
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH for-next v2 20/25] io_uring: use state completion infra for poll reqs
  2022-06-14 14:36 [PATCH for-next v2 00/25] 5.20 cleanups and poll optimisations Pavel Begunkov
                   ` (18 preceding siblings ...)
  2022-06-14 14:37 ` [PATCH for-next v2 19/25] io_uring: clean up io_ring_ctx_alloc Pavel Begunkov
@ 2022-06-14 14:37 ` Pavel Begunkov
  2022-06-14 14:37 ` [PATCH for-next v2 21/25] io_uring: add IORING_SETUP_SINGLE_ISSUER Pavel Begunkov
                   ` (4 subsequent siblings)
  24 siblings, 0 replies; 44+ messages in thread
From: Pavel Begunkov @ 2022-06-14 14:37 UTC (permalink / raw)
  To: io-uring; +Cc: Jens Axboe, asml.silence

Use io_req_task_complete() for poll request completions, so they can
utilise state completions and avoid lots of unnecessary locking.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 io_uring/poll.c | 8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/io_uring/poll.c b/io_uring/poll.c
index 7fc4aafcca95..c4ce98504986 100644
--- a/io_uring/poll.c
+++ b/io_uring/poll.c
@@ -234,12 +234,8 @@ static void io_poll_task_func(struct io_kiocb *req, bool *locked)
 
 	io_poll_remove_entries(req);
 	io_poll_req_delete(req, ctx);
-	spin_lock(&ctx->completion_lock);
-	req->cqe.flags = 0;
-	__io_req_complete_post(req);
-	io_commit_cqring(ctx);
-	spin_unlock(&ctx->completion_lock);
-	io_cqring_ev_posted(ctx);
+	io_req_set_res(req, req->cqe.res, 0);
+	io_req_task_complete(req, locked);
 }
 
 static void io_apoll_task_func(struct io_kiocb *req, bool *locked)
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH for-next v2 21/25] io_uring: add IORING_SETUP_SINGLE_ISSUER
  2022-06-14 14:36 [PATCH for-next v2 00/25] 5.20 cleanups and poll optimisations Pavel Begunkov
                   ` (19 preceding siblings ...)
  2022-06-14 14:37 ` [PATCH for-next v2 20/25] io_uring: use state completion infra for poll reqs Pavel Begunkov
@ 2022-06-14 14:37 ` Pavel Begunkov
  2022-06-15  9:34   ` Hao Xu
  2022-06-15  9:41   ` Hao Xu
  2022-06-14 14:37 ` [PATCH for-next v2 22/25] io_uring: pass hash table into poll_find Pavel Begunkov
                   ` (3 subsequent siblings)
  24 siblings, 2 replies; 44+ messages in thread
From: Pavel Begunkov @ 2022-06-14 14:37 UTC (permalink / raw)
  To: io-uring; +Cc: Jens Axboe, asml.silence

Add a new IORING_SETUP_SINGLE_ISSUER flag and the userspace-visible part
of it, i.e. put limitations on submitters. Also, don't allow it together
with IOPOLL as we're not going to put it to good use.
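
As a userspace illustration (not part of the patch), a minimal
liburing-based sketch of requesting the flag might look as follows; the
flag value is taken from the uapi hunk below, and on kernels without
this patch io_uring_setup() simply rejects it with -EINVAL:

    /* minimal sketch, assuming liburing is available: gcc single.c -luring */
    #include <liburing.h>
    #include <stdio.h>

    #ifndef IORING_SETUP_SINGLE_ISSUER
    #define IORING_SETUP_SINGLE_ISSUER (1U << 12) /* from the uapi hunk below */
    #endif

    int main(void)
    {
        struct io_uring ring;
        int ret = io_uring_queue_init(64, &ring, IORING_SETUP_SINGLE_ISSUER);

        if (ret < 0) {
            /* e.g. -EINVAL on kernels that don't know the flag */
            fprintf(stderr, "queue_init: %d\n", ret);
            return 1;
        }
        /*
         * With the patch, the first task that submits becomes the registered
         * submitter (io_register_submitter()); any other task trying to
         * submit gets -EEXIST.
         */
        io_uring_queue_exit(&ring);
        return 0;
    }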

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 include/uapi/linux/io_uring.h |  5 ++++-
 io_uring/io_uring.c           |  7 +++++--
 io_uring/io_uring_types.h     |  1 +
 io_uring/tctx.c               | 27 ++++++++++++++++++++++++---
 io_uring/tctx.h               |  4 ++--
 5 files changed, 36 insertions(+), 8 deletions(-)

diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index a41ddb8c5e1f..a3a691340d3e 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -138,9 +138,12 @@ enum {
  * IORING_SQ_TASKRUN in the sq ring flags. Not valid with COOP_TASKRUN.
  */
 #define IORING_SETUP_TASKRUN_FLAG	(1U << 9)
-
 #define IORING_SETUP_SQE128		(1U << 10) /* SQEs are 128 byte */
 #define IORING_SETUP_CQE32		(1U << 11) /* CQEs are 32 byte */
+/*
+ * Only one task is allowed to submit requests
+ */
+#define IORING_SETUP_SINGLE_ISSUER	(1U << 12)
 
 enum io_uring_op {
 	IORING_OP_NOP,
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 15d209f334eb..4b90439808e3 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -3020,6 +3020,8 @@ static __cold void io_ring_ctx_free(struct io_ring_ctx *ctx)
 	io_destroy_buffers(ctx);
 	if (ctx->sq_creds)
 		put_cred(ctx->sq_creds);
+	if (ctx->submitter_task)
+		put_task_struct(ctx->submitter_task);
 
 	/* there are no registered resources left, nobody uses it */
 	if (ctx->rsrc_node)
@@ -3752,7 +3754,7 @@ static int io_uring_install_fd(struct io_ring_ctx *ctx, struct file *file)
 	if (fd < 0)
 		return fd;
 
-	ret = io_uring_add_tctx_node(ctx);
+	ret = __io_uring_add_tctx_node(ctx, false);
 	if (ret) {
 		put_unused_fd(fd);
 		return ret;
@@ -3972,7 +3974,8 @@ static long io_uring_setup(u32 entries, struct io_uring_params __user *params)
 			IORING_SETUP_CLAMP | IORING_SETUP_ATTACH_WQ |
 			IORING_SETUP_R_DISABLED | IORING_SETUP_SUBMIT_ALL |
 			IORING_SETUP_COOP_TASKRUN | IORING_SETUP_TASKRUN_FLAG |
-			IORING_SETUP_SQE128 | IORING_SETUP_CQE32))
+			IORING_SETUP_SQE128 | IORING_SETUP_CQE32 |
+			IORING_SETUP_SINGLE_ISSUER))
 		return -EINVAL;
 
 	return io_uring_create(entries, &p, params);
diff --git a/io_uring/io_uring_types.h b/io_uring/io_uring_types.h
index aba0f8cd6f49..f6d0ad25f377 100644
--- a/io_uring/io_uring_types.h
+++ b/io_uring/io_uring_types.h
@@ -241,6 +241,7 @@ struct io_ring_ctx {
 	/* Keep this last, we don't need it for the fast path */
 
 	struct io_restriction		restrictions;
+	struct task_struct		*submitter_task;
 
 	/* slow path rsrc auxilary data, used by update/register */
 	struct io_rsrc_node		*rsrc_backup_node;
diff --git a/io_uring/tctx.c b/io_uring/tctx.c
index 6adf659687f8..012be261dc50 100644
--- a/io_uring/tctx.c
+++ b/io_uring/tctx.c
@@ -81,12 +81,32 @@ __cold int io_uring_alloc_task_context(struct task_struct *task,
 	return 0;
 }
 
-int __io_uring_add_tctx_node(struct io_ring_ctx *ctx)
+static int io_register_submitter(struct io_ring_ctx *ctx)
+{
+	int ret = 0;
+
+	mutex_lock(&ctx->uring_lock);
+	if (!ctx->submitter_task)
+		ctx->submitter_task = get_task_struct(current);
+	else if (ctx->submitter_task != current)
+		ret = -EEXIST;
+	mutex_unlock(&ctx->uring_lock);
+
+	return ret;
+}
+
+int __io_uring_add_tctx_node(struct io_ring_ctx *ctx, bool submitter)
 {
 	struct io_uring_task *tctx = current->io_uring;
 	struct io_tctx_node *node;
 	int ret;
 
+	if ((ctx->flags & IORING_SETUP_SINGLE_ISSUER) && submitter) {
+		ret = io_register_submitter(ctx);
+		if (ret)
+			return ret;
+	}
+
 	if (unlikely(!tctx)) {
 		ret = io_uring_alloc_task_context(current, ctx);
 		if (unlikely(ret))
@@ -120,7 +140,8 @@ int __io_uring_add_tctx_node(struct io_ring_ctx *ctx)
 		list_add(&node->ctx_node, &ctx->tctx_list);
 		mutex_unlock(&ctx->uring_lock);
 	}
-	tctx->last = ctx;
+	if (submitter)
+		tctx->last = ctx;
 	return 0;
 }
 
@@ -228,7 +249,7 @@ int io_ringfd_register(struct io_ring_ctx *ctx, void __user *__arg,
 		return -EINVAL;
 
 	mutex_unlock(&ctx->uring_lock);
-	ret = io_uring_add_tctx_node(ctx);
+	ret = __io_uring_add_tctx_node(ctx, false);
 	mutex_lock(&ctx->uring_lock);
 	if (ret)
 		return ret;
diff --git a/io_uring/tctx.h b/io_uring/tctx.h
index 7684713e950f..dde82ce4d8e2 100644
--- a/io_uring/tctx.h
+++ b/io_uring/tctx.h
@@ -34,7 +34,7 @@ struct io_tctx_node {
 int io_uring_alloc_task_context(struct task_struct *task,
 				struct io_ring_ctx *ctx);
 void io_uring_del_tctx_node(unsigned long index);
-int __io_uring_add_tctx_node(struct io_ring_ctx *ctx);
+int __io_uring_add_tctx_node(struct io_ring_ctx *ctx, bool submitter);
 void io_uring_clean_tctx(struct io_uring_task *tctx);
 
 void io_uring_unreg_ringfd(void);
@@ -52,5 +52,5 @@ static inline int io_uring_add_tctx_node(struct io_ring_ctx *ctx)
 
 	if (likely(tctx && tctx->last == ctx))
 		return 0;
-	return __io_uring_add_tctx_node(ctx);
+	return __io_uring_add_tctx_node(ctx, true);
 }
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH for-next v2 22/25] io_uring: pass hash table into poll_find
  2022-06-14 14:36 [PATCH for-next v2 00/25] 5.20 cleanups and poll optimisations Pavel Begunkov
                   ` (20 preceding siblings ...)
  2022-06-14 14:37 ` [PATCH for-next v2 21/25] io_uring: add IORING_SETUP_SINGLE_ISSUER Pavel Begunkov
@ 2022-06-14 14:37 ` Pavel Begunkov
  2022-06-14 14:37 ` [PATCH for-next v2 23/25] io_uring: introduce a struct for hash table Pavel Begunkov
                   ` (2 subsequent siblings)
  24 siblings, 0 replies; 44+ messages in thread
From: Pavel Begunkov @ 2022-06-14 14:37 UTC (permalink / raw)
  To: io-uring; +Cc: Jens Axboe, asml.silence

In preparation for having multiple cancellation hash tables, pass a
table pointer into io_poll_find() and other poll cancel functions.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 io_uring/poll.c | 20 ++++++++++++++------
 1 file changed, 14 insertions(+), 6 deletions(-)

diff --git a/io_uring/poll.c b/io_uring/poll.c
index c4ce98504986..5cc03be365e3 100644
--- a/io_uring/poll.c
+++ b/io_uring/poll.c
@@ -556,11 +556,12 @@ __cold bool io_poll_remove_all(struct io_ring_ctx *ctx, struct task_struct *tsk,
 
 static struct io_kiocb *io_poll_find(struct io_ring_ctx *ctx, bool poll_only,
 				     struct io_cancel_data *cd,
+				     struct io_hash_bucket hash_table[],
 				     struct io_hash_bucket **out_bucket)
 {
 	struct io_kiocb *req;
 	u32 index = hash_long(cd->data, ctx->cancel_hash_bits);
-	struct io_hash_bucket *hb = &ctx->cancel_hash[index];
+	struct io_hash_bucket *hb = &hash_table[index];
 
 	*out_bucket = NULL;
 
@@ -584,6 +585,7 @@ static struct io_kiocb *io_poll_find(struct io_ring_ctx *ctx, bool poll_only,
 
 static struct io_kiocb *io_poll_file_find(struct io_ring_ctx *ctx,
 					  struct io_cancel_data *cd,
+					  struct io_hash_bucket hash_table[],
 					  struct io_hash_bucket **out_bucket)
 {
 	struct io_kiocb *req;
@@ -592,7 +594,7 @@ static struct io_kiocb *io_poll_file_find(struct io_ring_ctx *ctx,
 	*out_bucket = NULL;
 
 	for (i = 0; i < (1U << ctx->cancel_hash_bits); i++) {
-		struct io_hash_bucket *hb = &ctx->cancel_hash[i];
+		struct io_hash_bucket *hb = &hash_table[i];
 
 		spin_lock(&hb->lock);
 		hlist_for_each_entry(req, &hb->list, hash_node) {
@@ -619,15 +621,16 @@ static bool io_poll_disarm(struct io_kiocb *req)
 	return true;
 }
 
-int io_poll_cancel(struct io_ring_ctx *ctx, struct io_cancel_data *cd)
+static int __io_poll_cancel(struct io_ring_ctx *ctx, struct io_cancel_data *cd,
+			    struct io_hash_bucket hash_table[])
 {
 	struct io_hash_bucket *bucket;
 	struct io_kiocb *req;
 
 	if (cd->flags & (IORING_ASYNC_CANCEL_FD|IORING_ASYNC_CANCEL_ANY))
-		req = io_poll_file_find(ctx, cd, &bucket);
+		req = io_poll_file_find(ctx, cd, ctx->cancel_hash, &bucket);
 	else
-		req = io_poll_find(ctx, false, cd, &bucket);
+		req = io_poll_find(ctx, false, cd, ctx->cancel_hash, &bucket);
 
 	if (req)
 		io_poll_cancel_req(req);
@@ -636,6 +639,11 @@ int io_poll_cancel(struct io_ring_ctx *ctx, struct io_cancel_data *cd)
 	return req ? 0 : -ENOENT;
 }
 
+int io_poll_cancel(struct io_ring_ctx *ctx, struct io_cancel_data *cd)
+{
+	return __io_poll_cancel(ctx, cd, ctx->cancel_hash);
+}
+
 static __poll_t io_poll_parse_events(const struct io_uring_sqe *sqe,
 				     unsigned int flags)
 {
@@ -731,7 +739,7 @@ int io_poll_remove(struct io_kiocb *req, unsigned int issue_flags)
 	int ret2, ret = 0;
 	bool locked;
 
-	preq = io_poll_find(ctx, true, &cd, &bucket);
+	preq = io_poll_find(ctx, true, &cd, ctx->cancel_hash, &bucket);
 	if (preq)
 		ret2 = io_poll_disarm(preq);
 	if (bucket)
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH for-next v2 23/25] io_uring: introduce a struct for hash table
  2022-06-14 14:36 [PATCH for-next v2 00/25] 5.20 cleanups and poll optimisations Pavel Begunkov
                   ` (21 preceding siblings ...)
  2022-06-14 14:37 ` [PATCH for-next v2 22/25] io_uring: pass hash table into poll_find Pavel Begunkov
@ 2022-06-14 14:37 ` Pavel Begunkov
  2022-06-14 14:37 ` [PATCH for-next v2 24/25] io_uring: propagate locking state to poll cancel Pavel Begunkov
  2022-06-14 14:37 ` [PATCH for-next v2 25/25] io_uring: mutex locked poll hashing Pavel Begunkov
  24 siblings, 0 replies; 44+ messages in thread
From: Pavel Begunkov @ 2022-06-14 14:37 UTC (permalink / raw)
  To: io-uring; +Cc: Jens Axboe, asml.silence

Instead of passing around a pointer to hash buckets, add a bit of type
safety and wrap it into a structure.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 io_uring/cancel.c         |  6 +++---
 io_uring/cancel.h         |  7 +------
 io_uring/fdinfo.c         |  4 ++--
 io_uring/io_uring.c       | 29 ++++++++++++++++------------
 io_uring/io_uring_types.h | 13 +++++++++++--
 io_uring/poll.c           | 40 +++++++++++++++++++++------------------
 6 files changed, 56 insertions(+), 43 deletions(-)

diff --git a/io_uring/cancel.c b/io_uring/cancel.c
index a253e2ad22eb..f28f0a7d1272 100644
--- a/io_uring/cancel.c
+++ b/io_uring/cancel.c
@@ -193,12 +193,12 @@ int io_async_cancel(struct io_kiocb *req, unsigned int issue_flags)
 	return IOU_OK;
 }
 
-void init_hash_table(struct io_hash_bucket *hash_table, unsigned size)
+void init_hash_table(struct io_hash_table *table, unsigned size)
 {
 	unsigned int i;
 
 	for (i = 0; i < size; i++) {
-		spin_lock_init(&hash_table[i].lock);
-		INIT_HLIST_HEAD(&hash_table[i].list);
+		spin_lock_init(&table->hbs[i].lock);
+		INIT_HLIST_HEAD(&table->hbs[i].list);
 	}
 }
diff --git a/io_uring/cancel.h b/io_uring/cancel.h
index 556a7dcf160e..fd4cb1a2595d 100644
--- a/io_uring/cancel.h
+++ b/io_uring/cancel.h
@@ -4,9 +4,4 @@ int io_async_cancel_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe);
 int io_async_cancel(struct io_kiocb *req, unsigned int issue_flags);
 
 int io_try_cancel(struct io_kiocb *req, struct io_cancel_data *cd);
-void init_hash_table(struct io_hash_bucket *hash_table, unsigned size);
-
-struct io_hash_bucket {
-	spinlock_t		lock;
-	struct hlist_head	list;
-} ____cacheline_aligned_in_smp;
+void init_hash_table(struct io_hash_table *table, unsigned size);
diff --git a/io_uring/fdinfo.c b/io_uring/fdinfo.c
index f941c73f5502..344e7d90d557 100644
--- a/io_uring/fdinfo.c
+++ b/io_uring/fdinfo.c
@@ -158,8 +158,8 @@ static __cold void __io_uring_show_fdinfo(struct io_ring_ctx *ctx,
 		mutex_unlock(&ctx->uring_lock);
 
 	seq_puts(m, "PollList:\n");
-	for (i = 0; i < (1U << ctx->cancel_hash_bits); i++) {
-		struct io_hash_bucket *hb = &ctx->cancel_hash[i];
+	for (i = 0; i < (1U << ctx->cancel_table.hash_bits); i++) {
+		struct io_hash_bucket *hb = &ctx->cancel_table.hbs[i];
 		struct io_kiocb *req;
 
 		spin_lock(&hb->lock);
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 4b90439808e3..4bead16e57f7 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -697,11 +697,23 @@ static __cold void io_fallback_req_func(struct work_struct *work)
 	percpu_ref_put(&ctx->refs);
 }
 
+static int io_alloc_hash_table(struct io_hash_table *table, unsigned bits)
+{
+	unsigned hash_buckets = 1U << bits;
+	size_t hash_size = hash_buckets * sizeof(table->hbs[0]);
+
+	table->hbs = kmalloc(hash_size, GFP_KERNEL);
+	if (!table->hbs)
+		return -ENOMEM;
+
+	table->hash_bits = bits;
+	init_hash_table(table, hash_buckets);
+	return 0;
+}
+
 static __cold struct io_ring_ctx *io_ring_ctx_alloc(struct io_uring_params *p)
 {
 	struct io_ring_ctx *ctx;
-	unsigned hash_buckets;
-	size_t hash_size;
 	int hash_bits;
 
 	ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);
@@ -717,16 +729,9 @@ static __cold struct io_ring_ctx *io_ring_ctx_alloc(struct io_uring_params *p)
 	 */
 	hash_bits = ilog2(p->cq_entries) - 5;
 	hash_bits = clamp(hash_bits, 1, 8);
-	hash_buckets = 1U << hash_bits;
-	hash_size = hash_buckets * sizeof(struct io_hash_bucket);
-
-	ctx->cancel_hash_bits = hash_bits;
-	ctx->cancel_hash = kmalloc(hash_size, GFP_KERNEL);
-	if (!ctx->cancel_hash)
+	if (io_alloc_hash_table(&ctx->cancel_table, hash_bits))
 		goto err;
 
-	init_hash_table(ctx->cancel_hash, hash_buckets);
-
 	ctx->dummy_ubuf = kzalloc(sizeof(*ctx->dummy_ubuf), GFP_KERNEL);
 	if (!ctx->dummy_ubuf)
 		goto err;
@@ -767,7 +772,7 @@ static __cold struct io_ring_ctx *io_ring_ctx_alloc(struct io_uring_params *p)
 	return ctx;
 err:
 	kfree(ctx->dummy_ubuf);
-	kfree(ctx->cancel_hash);
+	kfree(ctx->cancel_table.hbs);
 	kfree(ctx->io_bl);
 	xa_destroy(&ctx->io_bl_xa);
 	kfree(ctx);
@@ -3050,7 +3055,7 @@ static __cold void io_ring_ctx_free(struct io_ring_ctx *ctx)
 	io_req_caches_free(ctx);
 	if (ctx->hash_map)
 		io_wq_put_hash(ctx->hash_map);
-	kfree(ctx->cancel_hash);
+	kfree(ctx->cancel_table.hbs);
 	kfree(ctx->dummy_ubuf);
 	kfree(ctx->io_bl);
 	xa_destroy(&ctx->io_bl_xa);
diff --git a/io_uring/io_uring_types.h b/io_uring/io_uring_types.h
index f6d0ad25f377..ce2fbe6749bb 100644
--- a/io_uring/io_uring_types.h
+++ b/io_uring/io_uring_types.h
@@ -7,6 +7,16 @@
 #include "io-wq.h"
 #include "filetable.h"
 
+struct io_hash_bucket {
+	spinlock_t		lock;
+	struct hlist_head	list;
+} ____cacheline_aligned_in_smp;
+
+struct io_hash_table {
+	struct io_hash_bucket	*hbs;
+	unsigned		hash_bits;
+};
+
 struct io_uring {
 	u32 head ____cacheline_aligned_in_smp;
 	u32 tail ____cacheline_aligned_in_smp;
@@ -222,8 +232,7 @@ struct io_ring_ctx {
 		 * manipulate the list, hence no extra locking is needed there.
 		 */
 		struct io_wq_work_list	iopoll_list;
-		struct io_hash_bucket	*cancel_hash;
-		unsigned		cancel_hash_bits;
+		struct io_hash_table	cancel_table;
 		bool			poll_multi_queue;
 
 		struct list_head	io_buffers_comp;
diff --git a/io_uring/poll.c b/io_uring/poll.c
index 5cc03be365e3..9c7793f5e93b 100644
--- a/io_uring/poll.c
+++ b/io_uring/poll.c
@@ -73,9 +73,9 @@ static struct io_poll *io_poll_get_single(struct io_kiocb *req)
 
 static void io_poll_req_insert(struct io_kiocb *req)
 {
-	struct io_ring_ctx *ctx = req->ctx;
-	u32 index = hash_long(req->cqe.user_data, ctx->cancel_hash_bits);
-	struct io_hash_bucket *hb = &ctx->cancel_hash[index];
+	struct io_hash_table *table = &req->ctx->cancel_table;
+	u32 index = hash_long(req->cqe.user_data, table->hash_bits);
+	struct io_hash_bucket *hb = &table->hbs[index];
 
 	spin_lock(&hb->lock);
 	hlist_add_head(&req->hash_node, &hb->list);
@@ -84,8 +84,9 @@ static void io_poll_req_insert(struct io_kiocb *req)
 
 static void io_poll_req_delete(struct io_kiocb *req, struct io_ring_ctx *ctx)
 {
-	u32 index = hash_long(req->cqe.user_data, ctx->cancel_hash_bits);
-	spinlock_t *lock = &ctx->cancel_hash[index].lock;
+	struct io_hash_table *table = &req->ctx->cancel_table;
+	u32 index = hash_long(req->cqe.user_data, table->hash_bits);
+	spinlock_t *lock = &table->hbs[index].lock;
 
 	spin_lock(lock);
 	hash_del(&req->hash_node);
@@ -533,13 +534,15 @@ int io_arm_poll_handler(struct io_kiocb *req, unsigned issue_flags)
 __cold bool io_poll_remove_all(struct io_ring_ctx *ctx, struct task_struct *tsk,
 			       bool cancel_all)
 {
+	struct io_hash_table *table = &ctx->cancel_table;
+	unsigned nr_buckets = 1U << table->hash_bits;
 	struct hlist_node *tmp;
 	struct io_kiocb *req;
 	bool found = false;
 	int i;
 
-	for (i = 0; i < (1U << ctx->cancel_hash_bits); i++) {
-		struct io_hash_bucket *hb = &ctx->cancel_hash[i];
+	for (i = 0; i < nr_buckets; i++) {
+		struct io_hash_bucket *hb = &table->hbs[i];
 
 		spin_lock(&hb->lock);
 		hlist_for_each_entry_safe(req, tmp, &hb->list, hash_node) {
@@ -556,12 +559,12 @@ __cold bool io_poll_remove_all(struct io_ring_ctx *ctx, struct task_struct *tsk,
 
 static struct io_kiocb *io_poll_find(struct io_ring_ctx *ctx, bool poll_only,
 				     struct io_cancel_data *cd,
-				     struct io_hash_bucket hash_table[],
+				     struct io_hash_table *table,
 				     struct io_hash_bucket **out_bucket)
 {
 	struct io_kiocb *req;
-	u32 index = hash_long(cd->data, ctx->cancel_hash_bits);
-	struct io_hash_bucket *hb = &hash_table[index];
+	u32 index = hash_long(cd->data, table->hash_bits);
+	struct io_hash_bucket *hb = &table->hbs[index];
 
 	*out_bucket = NULL;
 
@@ -585,16 +588,17 @@ static struct io_kiocb *io_poll_find(struct io_ring_ctx *ctx, bool poll_only,
 
 static struct io_kiocb *io_poll_file_find(struct io_ring_ctx *ctx,
 					  struct io_cancel_data *cd,
-					  struct io_hash_bucket hash_table[],
+					  struct io_hash_table *table,
 					  struct io_hash_bucket **out_bucket)
 {
+	unsigned nr_buckets = 1U << table->hash_bits;
 	struct io_kiocb *req;
 	int i;
 
 	*out_bucket = NULL;
 
-	for (i = 0; i < (1U << ctx->cancel_hash_bits); i++) {
-		struct io_hash_bucket *hb = &hash_table[i];
+	for (i = 0; i < nr_buckets; i++) {
+		struct io_hash_bucket *hb = &table->hbs[i];
 
 		spin_lock(&hb->lock);
 		hlist_for_each_entry(req, &hb->list, hash_node) {
@@ -622,15 +626,15 @@ static bool io_poll_disarm(struct io_kiocb *req)
 }
 
 static int __io_poll_cancel(struct io_ring_ctx *ctx, struct io_cancel_data *cd,
-			    struct io_hash_bucket hash_table[])
+			    struct io_hash_table *table)
 {
 	struct io_hash_bucket *bucket;
 	struct io_kiocb *req;
 
 	if (cd->flags & (IORING_ASYNC_CANCEL_FD|IORING_ASYNC_CANCEL_ANY))
-		req = io_poll_file_find(ctx, cd, ctx->cancel_hash, &bucket);
+		req = io_poll_file_find(ctx, cd, table, &bucket);
 	else
-		req = io_poll_find(ctx, false, cd, ctx->cancel_hash, &bucket);
+		req = io_poll_find(ctx, false, cd, table, &bucket);
 
 	if (req)
 		io_poll_cancel_req(req);
@@ -641,7 +645,7 @@ static int __io_poll_cancel(struct io_ring_ctx *ctx, struct io_cancel_data *cd,
 
 int io_poll_cancel(struct io_ring_ctx *ctx, struct io_cancel_data *cd)
 {
-	return __io_poll_cancel(ctx, cd, ctx->cancel_hash);
+	return __io_poll_cancel(ctx, cd, &ctx->cancel_table);
 }
 
 static __poll_t io_poll_parse_events(const struct io_uring_sqe *sqe,
@@ -739,7 +743,7 @@ int io_poll_remove(struct io_kiocb *req, unsigned int issue_flags)
 	int ret2, ret = 0;
 	bool locked;
 
-	preq = io_poll_find(ctx, true, &cd, ctx->cancel_hash, &bucket);
+	preq = io_poll_find(ctx, true, &cd, &ctx->cancel_table, &bucket);
 	if (preq)
 		ret2 = io_poll_disarm(preq);
 	if (bucket)
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH for-next v2 24/25] io_uring: propagate locking state to poll cancel
  2022-06-14 14:36 [PATCH for-next v2 00/25] 5.20 cleanups and poll optimisations Pavel Begunkov
                   ` (22 preceding siblings ...)
  2022-06-14 14:37 ` [PATCH for-next v2 23/25] io_uring: introduce a struct for hash table Pavel Begunkov
@ 2022-06-14 14:37 ` Pavel Begunkov
  2022-06-14 14:37 ` [PATCH for-next v2 25/25] io_uring: mutex locked poll hashing Pavel Begunkov
  24 siblings, 0 replies; 44+ messages in thread
From: Pavel Begunkov @ 2022-06-14 14:37 UTC (permalink / raw)
  To: io-uring; +Cc: Jens Axboe, asml.silence

Poll cancellation will soon need to grab ->uring_lock internally, so pass
the locking state, i.e. issue_flags, into the cancellation functions.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 io_uring/cancel.c  | 7 ++++---
 io_uring/cancel.h  | 3 ++-
 io_uring/poll.c    | 3 ++-
 io_uring/poll.h    | 3 ++-
 io_uring/timeout.c | 3 ++-
 5 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/io_uring/cancel.c b/io_uring/cancel.c
index f28f0a7d1272..f07bfd27c98a 100644
--- a/io_uring/cancel.c
+++ b/io_uring/cancel.c
@@ -78,7 +78,8 @@ static int io_async_cancel_one(struct io_uring_task *tctx,
 	return ret;
 }
 
-int io_try_cancel(struct io_kiocb *req, struct io_cancel_data *cd)
+int io_try_cancel(struct io_kiocb *req, struct io_cancel_data *cd,
+		  unsigned issue_flags)
 {
 	struct io_ring_ctx *ctx = req->ctx;
 	int ret;
@@ -93,7 +94,7 @@ int io_try_cancel(struct io_kiocb *req, struct io_cancel_data *cd)
 	if (!ret)
 		return 0;
 
-	ret = io_poll_cancel(ctx, cd);
+	ret = io_poll_cancel(ctx, cd, issue_flags);
 	if (ret != -ENOENT)
 		return ret;
 
@@ -136,7 +137,7 @@ static int __io_async_cancel(struct io_cancel_data *cd, struct io_kiocb *req,
 	int ret, nr = 0;
 
 	do {
-		ret = io_try_cancel(req, cd);
+		ret = io_try_cancel(req, cd, issue_flags);
 		if (ret == -ENOENT)
 			break;
 		if (!all)
diff --git a/io_uring/cancel.h b/io_uring/cancel.h
index fd4cb1a2595d..8dd259dc383e 100644
--- a/io_uring/cancel.h
+++ b/io_uring/cancel.h
@@ -3,5 +3,6 @@
 int io_async_cancel_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe);
 int io_async_cancel(struct io_kiocb *req, unsigned int issue_flags);
 
-int io_try_cancel(struct io_kiocb *req, struct io_cancel_data *cd);
+int io_try_cancel(struct io_kiocb *req, struct io_cancel_data *cd,
+		  unsigned int issue_flags);
 void init_hash_table(struct io_hash_table *table, unsigned size);
diff --git a/io_uring/poll.c b/io_uring/poll.c
index 9c7793f5e93b..07157da1c2cb 100644
--- a/io_uring/poll.c
+++ b/io_uring/poll.c
@@ -643,7 +643,8 @@ static int __io_poll_cancel(struct io_ring_ctx *ctx, struct io_cancel_data *cd,
 	return req ? 0 : -ENOENT;
 }
 
-int io_poll_cancel(struct io_ring_ctx *ctx, struct io_cancel_data *cd)
+int io_poll_cancel(struct io_ring_ctx *ctx, struct io_cancel_data *cd,
+		   unsigned issue_flags)
 {
 	return __io_poll_cancel(ctx, cd, &ctx->cancel_table);
 }
diff --git a/io_uring/poll.h b/io_uring/poll.h
index cc75c1567a84..fa3e19790281 100644
--- a/io_uring/poll.h
+++ b/io_uring/poll.h
@@ -24,7 +24,8 @@ int io_poll_add(struct io_kiocb *req, unsigned int issue_flags);
 int io_poll_remove_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe);
 int io_poll_remove(struct io_kiocb *req, unsigned int issue_flags);
 
-int io_poll_cancel(struct io_ring_ctx *ctx, struct io_cancel_data *cd);
+int io_poll_cancel(struct io_ring_ctx *ctx, struct io_cancel_data *cd,
+		   unsigned issue_flags);
 int io_arm_poll_handler(struct io_kiocb *req, unsigned issue_flags);
 bool io_poll_remove_all(struct io_ring_ctx *ctx, struct task_struct *tsk,
 			bool cancel_all);
diff --git a/io_uring/timeout.c b/io_uring/timeout.c
index 69cca42d6835..526fc8b2e3b6 100644
--- a/io_uring/timeout.c
+++ b/io_uring/timeout.c
@@ -262,6 +262,7 @@ int io_timeout_cancel(struct io_ring_ctx *ctx, struct io_cancel_data *cd)
 
 static void io_req_task_link_timeout(struct io_kiocb *req, bool *locked)
 {
+	unsigned issue_flags = *locked ? 0 : IO_URING_F_UNLOCKED;
 	struct io_timeout *timeout = io_kiocb_to_cmd(req);
 	struct io_kiocb *prev = timeout->prev;
 	int ret = -ENOENT;
@@ -273,7 +274,7 @@ static void io_req_task_link_timeout(struct io_kiocb *req, bool *locked)
 				.data		= prev->cqe.user_data,
 			};
 
-			ret = io_try_cancel(req, &cd);
+			ret = io_try_cancel(req, &cd, issue_flags);
 		}
 		io_req_set_res(req, ret ?: -ETIME, 0);
 		io_req_complete_post(req);
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH for-next v2 25/25] io_uring: mutex locked poll hashing
  2022-06-14 14:36 [PATCH for-next v2 00/25] 5.20 cleanups and poll optimisations Pavel Begunkov
                   ` (23 preceding siblings ...)
  2022-06-14 14:37 ` [PATCH for-next v2 24/25] io_uring: propagate locking state to poll cancel Pavel Begunkov
@ 2022-06-14 14:37 ` Pavel Begunkov
  2022-06-15 12:53   ` Hao Xu
  24 siblings, 1 reply; 44+ messages in thread
From: Pavel Begunkov @ 2022-06-14 14:37 UTC (permalink / raw)
  To: io-uring; +Cc: Jens Axboe, asml.silence

Currently we do two extra spin lock/unlock pairs to add a poll/apoll
request to the cancellation hash table and remove it from there.

On the submission side we often already hold ->uring_lock, and tw
completion is likely to hold it as well. Add a second cancellation hash
table protected by ->uring_lock. Out of concern for the latency of
having to take the mutex on the completion side, use the new table only
in the following cases (a condensed sketch of the selection logic
follows the list):

1) IORING_SETUP_SINGLE_ISSUER: only one task grabs uring_lock, so there
   is no contention and the main tw handler will always end up
   grabbing it before calling into callbacks.

2) IORING_SETUP_SQPOLL: same as with single issuer, only one task is
   using ->uring_lock.

3) apoll: we normally grab the lock on the completion side anyway to
   execute the request, so it's free.
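
A condensed restatement of the selection above (illustrative only: the
helper name is made up, it reuses kernel identifiers from the diff below
and isn't meant to compile standalone):

    /* which cancellation table does a poll/apoll request hash into? */
    static bool io_poll_use_locked_hash(struct io_kiocb *req,
                                        unsigned issue_flags, bool is_apoll)
    {
            /* submission doesn't hold ->uring_lock, can't rely on it later */
            if (issue_flags & IO_URING_F_UNLOCKED)
                    return false;
            /* apoll: the tw handler grabs the mutex to execute the req anyway */
            if (is_apoll)
                    return true;
            /* plain poll: only when a single task can take ->uring_lock */
            return req->ctx->flags &
                   (IORING_SETUP_SQPOLL | IORING_SETUP_SINGLE_ISSUER);
    }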

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 io_uring/io_uring.c       |   9 +++-
 io_uring/io_uring_types.h |   4 ++
 io_uring/poll.c           | 111 ++++++++++++++++++++++++++++++--------
 3 files changed, 102 insertions(+), 22 deletions(-)

diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 4bead16e57f7..1395176bc2ea 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -731,6 +731,8 @@ static __cold struct io_ring_ctx *io_ring_ctx_alloc(struct io_uring_params *p)
 	hash_bits = clamp(hash_bits, 1, 8);
 	if (io_alloc_hash_table(&ctx->cancel_table, hash_bits))
 		goto err;
+	if (io_alloc_hash_table(&ctx->cancel_table_locked, hash_bits))
+		goto err;
 
 	ctx->dummy_ubuf = kzalloc(sizeof(*ctx->dummy_ubuf), GFP_KERNEL);
 	if (!ctx->dummy_ubuf)
@@ -773,6 +775,7 @@ static __cold struct io_ring_ctx *io_ring_ctx_alloc(struct io_uring_params *p)
 err:
 	kfree(ctx->dummy_ubuf);
 	kfree(ctx->cancel_table.hbs);
+	kfree(ctx->cancel_table_locked.hbs);
 	kfree(ctx->io_bl);
 	xa_destroy(&ctx->io_bl_xa);
 	kfree(ctx);
@@ -3056,6 +3059,7 @@ static __cold void io_ring_ctx_free(struct io_ring_ctx *ctx)
 	if (ctx->hash_map)
 		io_wq_put_hash(ctx->hash_map);
 	kfree(ctx->cancel_table.hbs);
+	kfree(ctx->cancel_table_locked.hbs);
 	kfree(ctx->dummy_ubuf);
 	kfree(ctx->io_bl);
 	xa_destroy(&ctx->io_bl_xa);
@@ -3217,12 +3221,13 @@ static __cold void io_ring_ctx_wait_and_kill(struct io_ring_ctx *ctx)
 		__io_cqring_overflow_flush(ctx, true);
 	xa_for_each(&ctx->personalities, index, creds)
 		io_unregister_personality(ctx, index);
+	if (ctx->rings)
+		io_poll_remove_all(ctx, NULL, true);
 	mutex_unlock(&ctx->uring_lock);
 
 	/* failed during ring init, it couldn't have issued any requests */
 	if (ctx->rings) {
 		io_kill_timeouts(ctx, NULL, true);
-		io_poll_remove_all(ctx, NULL, true);
 		/* if we failed setting up the ctx, we might not have any rings */
 		io_iopoll_try_reap_events(ctx);
 	}
@@ -3347,7 +3352,9 @@ static __cold void io_uring_try_cancel_requests(struct io_ring_ctx *ctx,
 		}
 
 		ret |= io_cancel_defer_files(ctx, task, cancel_all);
+		mutex_lock(&ctx->uring_lock);
 		ret |= io_poll_remove_all(ctx, task, cancel_all);
+		mutex_unlock(&ctx->uring_lock);
 		ret |= io_kill_timeouts(ctx, task, cancel_all);
 		if (task)
 			ret |= io_run_task_work();
diff --git a/io_uring/io_uring_types.h b/io_uring/io_uring_types.h
index ce2fbe6749bb..557b8e7719c9 100644
--- a/io_uring/io_uring_types.h
+++ b/io_uring/io_uring_types.h
@@ -189,6 +189,7 @@ struct io_ring_ctx {
 		struct xarray		io_bl_xa;
 		struct list_head	io_buffers_cache;
 
+		struct io_hash_table	cancel_table_locked;
 		struct list_head	cq_overflow_list;
 		struct list_head	apoll_cache;
 		struct xarray		personalities;
@@ -323,6 +324,7 @@ enum {
 	/* keep async read/write and isreg together and in order */
 	REQ_F_SUPPORT_NOWAIT_BIT,
 	REQ_F_ISREG_BIT,
+	REQ_F_HASH_LOCKED_BIT,
 
 	/* not a real bit, just to check we're not overflowing the space */
 	__REQ_F_LAST_BIT,
@@ -388,6 +390,8 @@ enum {
 	REQ_F_APOLL_MULTISHOT	= BIT(REQ_F_APOLL_MULTISHOT_BIT),
 	/* recvmsg special flag, clear EPOLLIN */
 	REQ_F_CLEAR_POLLIN	= BIT(REQ_F_CLEAR_POLLIN_BIT),
+	/* hashed into ->cancel_hash_locked */
+	REQ_F_HASH_LOCKED	= BIT(REQ_F_HASH_LOCKED_BIT),
 };
 
 typedef void (*io_req_tw_func_t)(struct io_kiocb *req, bool *locked);
diff --git a/io_uring/poll.c b/io_uring/poll.c
index 07157da1c2cb..d20484c1cbb7 100644
--- a/io_uring/poll.c
+++ b/io_uring/poll.c
@@ -93,6 +93,26 @@ static void io_poll_req_delete(struct io_kiocb *req, struct io_ring_ctx *ctx)
 	spin_unlock(lock);
 }
 
+static void io_poll_req_insert_locked(struct io_kiocb *req)
+{
+	struct io_hash_table *table = &req->ctx->cancel_table_locked;
+	u32 index = hash_long(req->cqe.user_data, table->hash_bits);
+
+	hlist_add_head(&req->hash_node, &table->hbs[index].list);
+}
+
+static void io_poll_tw_hash_eject(struct io_kiocb *req, bool *locked)
+{
+	struct io_ring_ctx *ctx = req->ctx;
+
+	if (req->flags & REQ_F_HASH_LOCKED) {
+		io_tw_lock(ctx, locked);
+		hash_del(&req->hash_node);
+	} else {
+		io_poll_req_delete(req, ctx);
+	}
+}
+
 static void io_init_poll_iocb(struct io_poll *poll, __poll_t events,
 			      wait_queue_func_t wake_func)
 {
@@ -217,7 +237,6 @@ static int io_poll_check_events(struct io_kiocb *req, bool *locked)
 
 static void io_poll_task_func(struct io_kiocb *req, bool *locked)
 {
-	struct io_ring_ctx *ctx = req->ctx;
 	int ret;
 
 	ret = io_poll_check_events(req, locked);
@@ -234,7 +253,8 @@ static void io_poll_task_func(struct io_kiocb *req, bool *locked)
 	}
 
 	io_poll_remove_entries(req);
-	io_poll_req_delete(req, ctx);
+	io_poll_tw_hash_eject(req, locked);
+
 	io_req_set_res(req, req->cqe.res, 0);
 	io_req_task_complete(req, locked);
 }
@@ -248,7 +268,7 @@ static void io_apoll_task_func(struct io_kiocb *req, bool *locked)
 		return;
 
 	io_poll_remove_entries(req);
-	io_poll_req_delete(req, req->ctx);
+	io_poll_tw_hash_eject(req, locked);
 
 	if (!ret)
 		io_req_task_submit(req, locked);
@@ -442,7 +462,10 @@ static int __io_arm_poll_handler(struct io_kiocb *req,
 		return 0;
 	}
 
-	io_poll_req_insert(req);
+	if (req->flags & REQ_F_HASH_LOCKED)
+		io_poll_req_insert_locked(req);
+	else
+		io_poll_req_insert(req);
 
 	if (mask && (poll->events & EPOLLET)) {
 		/* can't multishot if failed, just queue the event we've got */
@@ -480,6 +503,15 @@ int io_arm_poll_handler(struct io_kiocb *req, unsigned issue_flags)
 	__poll_t mask = POLLPRI | POLLERR | EPOLLET;
 	int ret;
 
+	/*
+	 * apoll requests already grab the mutex to complete in the tw handler,
+	 * so removal from the mutex-backed hash is free, use it by default.
+	 */
+	if (issue_flags & IO_URING_F_UNLOCKED)
+		req->flags &= ~REQ_F_HASH_LOCKED;
+	else
+		req->flags |= REQ_F_HASH_LOCKED;
+
 	if (!def->pollin && !def->pollout)
 		return IO_APOLL_ABORTED;
 	if (!file_can_poll(req->file))
@@ -528,13 +560,10 @@ int io_arm_poll_handler(struct io_kiocb *req, unsigned issue_flags)
 	return IO_APOLL_OK;
 }
 
-/*
- * Returns true if we found and killed one or more poll requests
- */
-__cold bool io_poll_remove_all(struct io_ring_ctx *ctx, struct task_struct *tsk,
-			       bool cancel_all)
+static __cold bool io_poll_remove_all_table(struct task_struct *tsk,
+					    struct io_hash_table *table,
+					    bool cancel_all)
 {
-	struct io_hash_table *table = &ctx->cancel_table;
 	unsigned nr_buckets = 1U << table->hash_bits;
 	struct hlist_node *tmp;
 	struct io_kiocb *req;
@@ -557,6 +586,17 @@ __cold bool io_poll_remove_all(struct io_ring_ctx *ctx, struct task_struct *tsk,
 	return found;
 }
 
+/*
+ * Returns true if we found and killed one or more poll requests
+ */
+__cold bool io_poll_remove_all(struct io_ring_ctx *ctx, struct task_struct *tsk,
+			       bool cancel_all)
+	__must_hold(&ctx->uring_lock)
+{
+	return io_poll_remove_all_table(tsk, &ctx->cancel_table, cancel_all) |
+	       io_poll_remove_all_table(tsk, &ctx->cancel_table_locked, cancel_all);
+}
+
 static struct io_kiocb *io_poll_find(struct io_ring_ctx *ctx, bool poll_only,
 				     struct io_cancel_data *cd,
 				     struct io_hash_table *table,
@@ -616,13 +656,15 @@ static struct io_kiocb *io_poll_file_find(struct io_ring_ctx *ctx,
 	return NULL;
 }
 
-static bool io_poll_disarm(struct io_kiocb *req)
+static int io_poll_disarm(struct io_kiocb *req)
 {
+	if (!req)
+		return -ENOENT;
 	if (!io_poll_get_ownership(req))
-		return false;
+		return -EALREADY;
 	io_poll_remove_entries(req);
 	hash_del(&req->hash_node);
-	return true;
+	return 0;
 }
 
 static int __io_poll_cancel(struct io_ring_ctx *ctx, struct io_cancel_data *cd,
@@ -646,7 +688,16 @@ static int __io_poll_cancel(struct io_ring_ctx *ctx, struct io_cancel_data *cd,
 int io_poll_cancel(struct io_ring_ctx *ctx, struct io_cancel_data *cd,
 		   unsigned issue_flags)
 {
-	return __io_poll_cancel(ctx, cd, &ctx->cancel_table);
+	int ret;
+
+	ret = __io_poll_cancel(ctx, cd, &ctx->cancel_table);
+	if (ret != -ENOENT)
+		return ret;
+
+	io_ring_submit_lock(ctx, issue_flags);
+	ret = __io_poll_cancel(ctx, cd, &ctx->cancel_table_locked);
+	io_ring_submit_unlock(ctx, issue_flags);
+	return ret;
 }
 
 static __poll_t io_poll_parse_events(const struct io_uring_sqe *sqe,
@@ -721,6 +772,16 @@ int io_poll_add(struct io_kiocb *req, unsigned int issue_flags)
 
 	ipt.pt._qproc = io_poll_queue_proc;
 
+	/*
+	 * If sqpoll or single issuer, there is no contention for ->uring_lock
+	 * and we'll end up holding it in tw handlers anyway.
+	 */
+	if (!(issue_flags & IO_URING_F_UNLOCKED) &&
+	    (req->ctx->flags & (IORING_SETUP_SQPOLL | IORING_SETUP_SINGLE_ISSUER)))
+		req->flags |= REQ_F_HASH_LOCKED;
+	else
+		req->flags &= ~REQ_F_HASH_LOCKED;
+
 	ret = __io_arm_poll_handler(req, poll, &ipt, poll->events);
 	if (ipt.error) {
 		return ipt.error;
@@ -745,20 +806,28 @@ int io_poll_remove(struct io_kiocb *req, unsigned int issue_flags)
 	bool locked;
 
 	preq = io_poll_find(ctx, true, &cd, &ctx->cancel_table, &bucket);
-	if (preq)
-		ret2 = io_poll_disarm(preq);
+	ret2 = io_poll_disarm(preq);
 	if (bucket)
 		spin_unlock(&bucket->lock);
-
-	if (!preq) {
-		ret = -ENOENT;
+	if (!ret2)
+		goto found;
+	if (ret2 != -ENOENT) {
+		ret = ret2;
 		goto out;
 	}
-	if (!ret2) {
-		ret = -EALREADY;
+
+	io_ring_submit_lock(ctx, issue_flags);
+	preq = io_poll_find(ctx, true, &cd, &ctx->cancel_table_locked, &bucket);
+	ret2 = io_poll_disarm(preq);
+	if (bucket)
+		spin_unlock(&bucket->lock);
+	io_ring_submit_unlock(ctx, issue_flags);
+	if (ret2) {
+		ret = ret2;
 		goto out;
 	}
 
+found:
 	if (poll_update->update_events || poll_update->update_user_data) {
 		/* only mask one event flags, keep behavior flags */
 		if (poll_update->update_events) {
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* Re: [PATCH for-next v2 11/25] io_uring: refactor io_req_task_complete()
  2022-06-14 14:37 ` [PATCH for-next v2 11/25] io_uring: refactor io_req_task_complete() Pavel Begunkov
@ 2022-06-14 17:45   ` Hao Xu
  2022-06-14 17:52     ` Pavel Begunkov
  0 siblings, 1 reply; 44+ messages in thread
From: Hao Xu @ 2022-06-14 17:45 UTC (permalink / raw)
  To: Pavel Begunkov, io-uring; +Cc: Jens Axboe

On 6/14/22 22:37, Pavel Begunkov wrote:
> Clean up io_req_task_complete() and deduplicate io_put_kbuf() calls.
> 
> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
> ---
>   io_uring/io_uring.c | 16 ++++++++++------
>   1 file changed, 10 insertions(+), 6 deletions(-)
> 
> diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
> index fcee58c6c35e..0f6edf82f262 100644
> --- a/io_uring/io_uring.c
> +++ b/io_uring/io_uring.c
> @@ -1857,15 +1857,19 @@ static int io_iopoll_check(struct io_ring_ctx *ctx, long min)
>   
>   	return ret;
>   }
> -inline void io_req_task_complete(struct io_kiocb *req, bool *locked)
> +
> +void io_req_task_complete(struct io_kiocb *req, bool *locked)
>   {
> -	if (*locked) {
> -		req->cqe.flags |= io_put_kbuf(req, 0);
> +	if (req->flags & (REQ_F_BUFFER_SELECTED|REQ_F_BUFFER_RING)) {
> +		unsigned issue_flags = *locked ? IO_URING_F_UNLOCKED : 0;

should be *locked ? 0 : IO_URING_F_UNLOCKED, I think? I haven't looked
into the whole series carefully, will do that tomorrow.



> +
> +		req->cqe.flags |= io_put_kbuf(req, issue_flags);
> +	}
> +
> +	if (*locked)
>   		io_req_add_compl_list(req);
> -	} else {
> -		req->cqe.flags |= io_put_kbuf(req, IO_URING_F_UNLOCKED);
> +	else
>   		io_req_complete_post(req);
> -	}
>   }
>   
>   /*


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH for-next v2 11/25] io_uring: refactor io_req_task_complete()
  2022-06-14 17:45   ` Hao Xu
@ 2022-06-14 17:52     ` Pavel Begunkov
  0 siblings, 0 replies; 44+ messages in thread
From: Pavel Begunkov @ 2022-06-14 17:52 UTC (permalink / raw)
  To: Hao Xu, io-uring; +Cc: Jens Axboe

On 6/14/22 18:45, Hao Xu wrote:
> On 6/14/22 22:37, Pavel Begunkov wrote:
>> Clean up io_req_task_complete() and deduplicate io_put_kbuf() calls.
>>
>> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
>> ---
>>   io_uring/io_uring.c | 16 ++++++++++------
>>   1 file changed, 10 insertions(+), 6 deletions(-)
>>
>> diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
>> index fcee58c6c35e..0f6edf82f262 100644
>> --- a/io_uring/io_uring.c
>> +++ b/io_uring/io_uring.c
>> @@ -1857,15 +1857,19 @@ static int io_iopoll_check(struct io_ring_ctx *ctx, long min)
>>       return ret;
>>   }
>> -inline void io_req_task_complete(struct io_kiocb *req, bool *locked)
>> +
>> +void io_req_task_complete(struct io_kiocb *req, bool *locked)
>>   {
>> -    if (*locked) {
>> -        req->cqe.flags |= io_put_kbuf(req, 0);
>> +    if (req->flags & (REQ_F_BUFFER_SELECTED|REQ_F_BUFFER_RING)) {
>> +        unsigned issue_flags = *locked ? IO_URING_F_UNLOCKED : 0;
> 
> should be *locked ? 0 : IO_URING_F_UNLOCKED, I think? I haven't looked
> into the whole series carefully, will do that tomorrow.

Yeah, it should... Thanks


>> +
>> +        req->cqe.flags |= io_put_kbuf(req, issue_flags);
>> +    }
>> +
>> +    if (*locked)
>>           io_req_add_compl_list(req);
>> -    } else {
>> -        req->cqe.flags |= io_put_kbuf(req, IO_URING_F_UNLOCKED);
>> +    else
>>           io_req_complete_post(req);
>> -    }
>>   }
>>   /*
> 

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH for-next v2 04/25] io_uring: refactor ctx slow data placement
  2022-06-14 14:36 ` [PATCH for-next v2 04/25] io_uring: refactor ctx slow data placement Pavel Begunkov
@ 2022-06-15  7:58   ` Hao Xu
  2022-06-15 10:11     ` Pavel Begunkov
  0 siblings, 1 reply; 44+ messages in thread
From: Hao Xu @ 2022-06-15  7:58 UTC (permalink / raw)
  To: Pavel Begunkov, io-uring; +Cc: Jens Axboe

On 6/14/22 22:36, Pavel Begunkov wrote:
> Shove all slow path data at the end of ctx and get rid of extra
> indention.
> 
> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
> ---
>   io_uring/io_uring_types.h | 81 +++++++++++++++++++--------------------
>   1 file changed, 39 insertions(+), 42 deletions(-)
> 
> diff --git a/io_uring/io_uring_types.h b/io_uring/io_uring_types.h
> index 4f52dcbbda56..ca8e25992ece 100644
> --- a/io_uring/io_uring_types.h
> +++ b/io_uring/io_uring_types.h
> @@ -183,7 +183,6 @@ struct io_ring_ctx {
>   		struct list_head	apoll_cache;
>   		struct xarray		personalities;
>   		u32			pers_next;
> -		unsigned		sq_thread_idle;

SQPOLL is seen as a slow path?

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH for-next v2 09/25] io_uring: never defer-complete multi-apoll
  2022-06-14 14:36 ` [PATCH for-next v2 09/25] io_uring: never defer-complete multi-apoll Pavel Begunkov
@ 2022-06-15  8:05   ` Hao Xu
  0 siblings, 0 replies; 44+ messages in thread
From: Hao Xu @ 2022-06-15  8:05 UTC (permalink / raw)
  To: Pavel Begunkov, io-uring; +Cc: Jens Axboe

On 6/14/22 22:36, Pavel Begunkov wrote:
> Luckily, nobody completes multi-apoll requests outside the polling
> functions, but don't set IO_URING_F_COMPLETE_DEFER in any case as
> there is nobody who is catching REQ_F_COMPLETE_INLINE, and so will leak
> requests if used.
> 
> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
> ---
>   io_uring/io_uring.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
> index d895f70977b0..1fb93fdcfbab 100644
> --- a/io_uring/io_uring.c
> +++ b/io_uring/io_uring.c
> @@ -2149,7 +2149,7 @@ int io_poll_issue(struct io_kiocb *req, bool *locked)
>   	io_tw_lock(req->ctx, locked);
>   	if (unlikely(req->task->flags & PF_EXITING))
>   		return -EFAULT;
> -	return io_issue_sqe(req, IO_URING_F_NONBLOCK|IO_URING_F_COMPLETE_DEFER);
> +	return io_issue_sqe(req, IO_URING_F_NONBLOCK);
>   }
>   
>   struct io_wq_work *io_wq_free_work(struct io_wq_work *work)

Good catch! Thanks.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH for-next v2 10/25] io_uring: kill REQ_F_COMPLETE_INLINE
  2022-06-14 14:37 ` [PATCH for-next v2 10/25] io_uring: kill REQ_F_COMPLETE_INLINE Pavel Begunkov
@ 2022-06-15  8:20   ` Hao Xu
  2022-06-15 10:18     ` Pavel Begunkov
  0 siblings, 1 reply; 44+ messages in thread
From: Hao Xu @ 2022-06-15  8:20 UTC (permalink / raw)
  To: Pavel Begunkov, io-uring; +Cc: Jens Axboe

On 6/14/22 22:37, Pavel Begunkov wrote:
> REQ_F_COMPLETE_INLINE is only needed to delay queueing into the
> completion list to io_queue_sqe() as __io_req_complete() is inlined and
> we don't want to bloat the kernel.
> 
> As now we complete in a more centralised fashion in io_issue_sqe() we
> can get rid of the flag and queue to the list directly.
> 
> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
> ---
>   io_uring/io_uring.c       | 20 ++++++++------------
>   io_uring/io_uring.h       |  5 -----
>   io_uring/io_uring_types.h |  3 ---
>   3 files changed, 8 insertions(+), 20 deletions(-)
> 
> diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
> index 1fb93fdcfbab..fcee58c6c35e 100644
> --- a/io_uring/io_uring.c
> +++ b/io_uring/io_uring.c
> @@ -1278,17 +1278,14 @@ static void io_req_complete_post32(struct io_kiocb *req, u64 extra1, u64 extra2)
>   
>   inline void __io_req_complete(struct io_kiocb *req, unsigned issue_flags)
>   {
> -	if (issue_flags & IO_URING_F_COMPLETE_DEFER)
> -		io_req_complete_state(req);
> -	else
> -		io_req_complete_post(req);
> +	io_req_complete_post(req);
>   }
>   

io_read/write and provide_buffers/remove_buffers are still using
io_req_complete() in their own functions. By removing the
IO_URING_F_COMPLETE_DEFER branch they will end up in the complete_post
path 100% of the time, which we shouldn't do.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH for-next v2 19/25] io_uring: clean up io_ring_ctx_alloc
  2022-06-14 14:37 ` [PATCH for-next v2 19/25] io_uring: clean up io_ring_ctx_alloc Pavel Begunkov
@ 2022-06-15  8:46   ` Hao Xu
  0 siblings, 0 replies; 44+ messages in thread
From: Hao Xu @ 2022-06-15  8:46 UTC (permalink / raw)
  To: Pavel Begunkov, io-uring; +Cc: Jens Axboe

On 6/14/22 22:37, Pavel Begunkov wrote:
> Add a variable for the number of hash buckets in io_ring_ctx_alloc(),
> makes it more readable.
> 
> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
> ---
>   io_uring/io_uring.c | 10 ++++++----
>   1 file changed, 6 insertions(+), 4 deletions(-)
> 
> diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
> index 2a7a5db12a0e..15d209f334eb 100644
> --- a/io_uring/io_uring.c
> +++ b/io_uring/io_uring.c
> @@ -700,6 +700,8 @@ static __cold void io_fallback_req_func(struct work_struct *work)
>   static __cold struct io_ring_ctx *io_ring_ctx_alloc(struct io_uring_params *p)
>   {
>   	struct io_ring_ctx *ctx;
> +	unsigned hash_buckets;

Personally I'd prefer nr_something, like nr_buckets or nr_hash_buckets.


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH for-next v2 21/25] io_uring: add IORING_SETUP_SINGLE_ISSUER
  2022-06-14 14:37 ` [PATCH for-next v2 21/25] io_uring: add IORING_SETUP_SINGLE_ISSUER Pavel Begunkov
@ 2022-06-15  9:34   ` Hao Xu
  2022-06-15 10:20     ` Pavel Begunkov
  2022-06-15  9:41   ` Hao Xu
  1 sibling, 1 reply; 44+ messages in thread
From: Hao Xu @ 2022-06-15  9:34 UTC (permalink / raw)
  To: Pavel Begunkov, io-uring; +Cc: Jens Axboe

On 6/14/22 22:37, Pavel Begunkov wrote:
> @@ -228,7 +249,7 @@ int io_ringfd_register(struct io_ring_ctx *ctx, void __user *__arg,
>   		return -EINVAL;
>   
>   	mutex_unlock(&ctx->uring_lock);
> -	ret = io_uring_add_tctx_node(ctx);
> +	ret = __io_uring_add_tctx_node(ctx, false);

A question unrelated to this patch: why do we need this, since we will
do it anyway in a later io_uring_enter() when this task really submits reqs.

>   	mutex_lock(&ctx->uring_lock);
>   	if (ret)
>   		return ret;

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH for-next v2 21/25] io_uring: add IORING_SETUP_SINGLE_ISSUER
  2022-06-14 14:37 ` [PATCH for-next v2 21/25] io_uring: add IORING_SETUP_SINGLE_ISSUER Pavel Begunkov
  2022-06-15  9:34   ` Hao Xu
@ 2022-06-15  9:41   ` Hao Xu
  2022-06-15 10:26     ` Pavel Begunkov
  1 sibling, 1 reply; 44+ messages in thread
From: Hao Xu @ 2022-06-15  9:41 UTC (permalink / raw)
  To: Pavel Begunkov, io-uring; +Cc: Jens Axboe

On 6/14/22 22:37, Pavel Begunkov wrote:
> Add a new IORING_SETUP_SINGLE_ISSUER flag and the userspace visible part
> of it, i.e. put limitations of submitters. Also, don't allow it together
> with IOPOLL as we're not going to put it to good use.
> 
> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
> ---
>   include/uapi/linux/io_uring.h |  5 ++++-
>   io_uring/io_uring.c           |  7 +++++--
>   io_uring/io_uring_types.h     |  1 +
>   io_uring/tctx.c               | 27 ++++++++++++++++++++++++---
>   io_uring/tctx.h               |  4 ++--
>   5 files changed, 36 insertions(+), 8 deletions(-)
> 
> diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
> index a41ddb8c5e1f..a3a691340d3e 100644
> --- a/include/uapi/linux/io_uring.h
> +++ b/include/uapi/linux/io_uring.h
> @@ -138,9 +138,12 @@ enum {
>    * IORING_SQ_TASKRUN in the sq ring flags. Not valid with COOP_TASKRUN.
>    */
>   #define IORING_SETUP_TASKRUN_FLAG	(1U << 9)
> -
>   #define IORING_SETUP_SQE128		(1U << 10) /* SQEs are 128 byte */
>   #define IORING_SETUP_CQE32		(1U << 11) /* CQEs are 32 byte */
> +/*
> + * Only one task is allowed to submit requests
> + */
> +#define IORING_SETUP_SINGLE_ISSUER	(1U << 12)
>   
>   enum io_uring_op {
>   	IORING_OP_NOP,
> diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
> index 15d209f334eb..4b90439808e3 100644
> --- a/io_uring/io_uring.c
> +++ b/io_uring/io_uring.c
> @@ -3020,6 +3020,8 @@ static __cold void io_ring_ctx_free(struct io_ring_ctx *ctx)
>   	io_destroy_buffers(ctx);
>   	if (ctx->sq_creds)
>   		put_cred(ctx->sq_creds);
> +	if (ctx->submitter_task)
> +		put_task_struct(ctx->submitter_task);
>   
>   	/* there are no registered resources left, nobody uses it */
>   	if (ctx->rsrc_node)
> @@ -3752,7 +3754,7 @@ static int io_uring_install_fd(struct io_ring_ctx *ctx, struct file *file)
>   	if (fd < 0)
>   		return fd;
>   
> -	ret = io_uring_add_tctx_node(ctx);
> +	ret = __io_uring_add_tctx_node(ctx, false);
>   	if (ret) {
>   		put_unused_fd(fd);
>   		return ret;
> @@ -3972,7 +3974,8 @@ static long io_uring_setup(u32 entries, struct io_uring_params __user *params)
>   			IORING_SETUP_CLAMP | IORING_SETUP_ATTACH_WQ |
>   			IORING_SETUP_R_DISABLED | IORING_SETUP_SUBMIT_ALL |
>   			IORING_SETUP_COOP_TASKRUN | IORING_SETUP_TASKRUN_FLAG |
> -			IORING_SETUP_SQE128 | IORING_SETUP_CQE32))
> +			IORING_SETUP_SQE128 | IORING_SETUP_CQE32 |
> +			IORING_SETUP_SINGLE_ISSUER))
>   		return -EINVAL;
>   
>   	return io_uring_create(entries, &p, params);
> diff --git a/io_uring/io_uring_types.h b/io_uring/io_uring_types.h
> index aba0f8cd6f49..f6d0ad25f377 100644
> --- a/io_uring/io_uring_types.h
> +++ b/io_uring/io_uring_types.h
> @@ -241,6 +241,7 @@ struct io_ring_ctx {
>   	/* Keep this last, we don't need it for the fast path */
>   
>   	struct io_restriction		restrictions;
> +	struct task_struct		*submitter_task;
>   
>   	/* slow path rsrc auxilary data, used by update/register */
>   	struct io_rsrc_node		*rsrc_backup_node;
> diff --git a/io_uring/tctx.c b/io_uring/tctx.c
> index 6adf659687f8..012be261dc50 100644
> --- a/io_uring/tctx.c
> +++ b/io_uring/tctx.c
> @@ -81,12 +81,32 @@ __cold int io_uring_alloc_task_context(struct task_struct *task,
>   	return 0;
>   }
>   
> -int __io_uring_add_tctx_node(struct io_ring_ctx *ctx)
> +static int io_register_submitter(struct io_ring_ctx *ctx)
> +{
> +	int ret = 0;
> +
> +	mutex_lock(&ctx->uring_lock);
> +	if (!ctx->submitter_task)
> +		ctx->submitter_task = get_task_struct(current);
> +	else if (ctx->submitter_task != current)
> +		ret = -EEXIST;
> +	mutex_unlock(&ctx->uring_lock);
> +
> +	return ret;
> +}

Seems we don't need this uring_lock:
When we create a ring, we set up ctx->submitter_task before the uring fd
is installed, so at that time nobody else can enter this code.
When we enter this code later in io_uring_enter(), we just read it.


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH for-next v2 04/25] io_uring: refactor ctx slow data placement
  2022-06-15  7:58   ` Hao Xu
@ 2022-06-15 10:11     ` Pavel Begunkov
  2022-06-15 10:59       ` Hao Xu
  0 siblings, 1 reply; 44+ messages in thread
From: Pavel Begunkov @ 2022-06-15 10:11 UTC (permalink / raw)
  To: Hao Xu, io-uring; +Cc: Jens Axboe

On 6/15/22 08:58, Hao Xu wrote:
> On 6/14/22 22:36, Pavel Begunkov wrote:
>> Shove all slow path data at the end of ctx and get rid of extra
>> indention.
>>
>> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
>> ---
>>   io_uring/io_uring_types.h | 81 +++++++++++++++++++--------------------
>>   1 file changed, 39 insertions(+), 42 deletions(-)
>>
>> diff --git a/io_uring/io_uring_types.h b/io_uring/io_uring_types.h
>> index 4f52dcbbda56..ca8e25992ece 100644
>> --- a/io_uring/io_uring_types.h
>> +++ b/io_uring/io_uring_types.h
>> @@ -183,7 +183,6 @@ struct io_ring_ctx {
>>           struct list_head    apoll_cache;
>>           struct xarray        personalities;
>>           u32            pers_next;
>> -        unsigned        sq_thread_idle;
> 
> SQPOLL is seen as a slow path?

SQPOLL is not a slow path, but struct io_ring_ctx::sq_thread_idle
definitely is. Perhaps you mixed it up with
struct io_sq_data::sq_thread_idle.
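
For reference, a rough sketch of the two fields in question (only the
relevant members shown, everything else elided):

	struct io_ring_ctx {
		/* ctx-wide setting, only consulted on slow/setup paths */
		unsigned		sq_thread_idle;
	};

	struct io_sq_data {
		/* checked by the SQPOLL thread when deciding to idle */
		unsigned		sq_thread_idle;
	};

The hunk above only touches the io_ring_ctx copy.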

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH for-next v2 10/25] io_uring: kill REQ_F_COMPLETE_INLINE
  2022-06-15  8:20   ` Hao Xu
@ 2022-06-15 10:18     ` Pavel Begunkov
  2022-06-15 10:19       ` Pavel Begunkov
  0 siblings, 1 reply; 44+ messages in thread
From: Pavel Begunkov @ 2022-06-15 10:18 UTC (permalink / raw)
  To: Hao Xu, io-uring; +Cc: Jens Axboe

On 6/15/22 09:20, Hao Xu wrote:
> On 6/14/22 22:37, Pavel Begunkov wrote:
>> REQ_F_COMPLETE_INLINE is only needed to delay queueing into the
>> completion list to io_queue_sqe() as __io_req_complete() is inlined and
>> we don't want to bloat the kernel.
>>
>> As now we complete in a more centralised fashion in io_issue_sqe() we
>> can get rid of the flag and queue to the list directly.
>>
>> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
>> ---
>>   io_uring/io_uring.c       | 20 ++++++++------------
>>   io_uring/io_uring.h       |  5 -----
>>   io_uring/io_uring_types.h |  3 ---
>>   3 files changed, 8 insertions(+), 20 deletions(-)
>>
>> diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
>> index 1fb93fdcfbab..fcee58c6c35e 100644
>> --- a/io_uring/io_uring.c
>> +++ b/io_uring/io_uring.c
>> @@ -1278,17 +1278,14 @@ static void io_req_complete_post32(struct io_kiocb *req, u64 extra1, u64 extra2)
>>   inline void __io_req_complete(struct io_kiocb *req, unsigned issue_flags)
>>   {
>> -    if (issue_flags & IO_URING_F_COMPLETE_DEFER)
>> -        io_req_complete_state(req);
>> -    else
>> -        io_req_complete_post(req);
>> +    io_req_complete_post(req);
>>   }
> 
> io_read/write and provide_buffers/remove_buffers are still using
> io_req_complete() in their own function. By removing the
> IO_URING_F_COMPLETE_DEFER branch they will end in complete_post path
> 100% which we shouldn't.

Old provided buffers are such a useful feature that Jens is adding
a new ring-based version of them, so I couldn't care less about
those two.

In any case, let's leave it to follow-ups. That locking is a
weird construct and shouldn't be done in this ad-hoc way; it's
a potential bug nest.

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH for-next v2 10/25] io_uring: kill REQ_F_COMPLETE_INLINE
  2022-06-15 10:18     ` Pavel Begunkov
@ 2022-06-15 10:19       ` Pavel Begunkov
  0 siblings, 0 replies; 44+ messages in thread
From: Pavel Begunkov @ 2022-06-15 10:19 UTC (permalink / raw)
  To: Hao Xu, io-uring; +Cc: Jens Axboe

On 6/15/22 11:18, Pavel Begunkov wrote:
> On 6/15/22 09:20, Hao Xu wrote:
>> On 6/14/22 22:37, Pavel Begunkov wrote:
>>> REQ_F_COMPLETE_INLINE is only needed to delay queueing into the
>>> completion list to io_queue_sqe() as __io_req_complete() is inlined and
>>> we don't want to bloat the kernel.
>>>
>>> As now we complete in a more centralised fashion in io_issue_sqe() we
>>> can get rid of the flag and queue to the list directly.
>>>
>>> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
>>> ---
>>>   io_uring/io_uring.c       | 20 ++++++++------------
>>>   io_uring/io_uring.h       |  5 -----
>>>   io_uring/io_uring_types.h |  3 ---
>>>   3 files changed, 8 insertions(+), 20 deletions(-)
>>>
>>> diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
>>> index 1fb93fdcfbab..fcee58c6c35e 100644
>>> --- a/io_uring/io_uring.c
>>> +++ b/io_uring/io_uring.c
>>> @@ -1278,17 +1278,14 @@ static void io_req_complete_post32(struct io_kiocb *req, u64 extra1, u64 extra2)
>>>   inline void __io_req_complete(struct io_kiocb *req, unsigned issue_flags)
>>>   {
>>> -    if (issue_flags & IO_URING_F_COMPLETE_DEFER)
>>> -        io_req_complete_state(req);
>>> -    else
>>> -        io_req_complete_post(req);
>>> +    io_req_complete_post(req);
>>>   }
>>
>> io_read/write and provide_buffers/remove_buffers are still using
>> io_req_complete() in their own function. By removing the
>> IO_URING_F_COMPLETE_DEFER branch they will end in complete_post path
>> 100% which we shouldn't.
> 
> Old provided buffers are such a useful feature that Jens is adding
> a new ring-based version of them, so I couldn't care less about
> those two.
> 
> In any case, let's leave it to follow-ups. That locking is a
> weird construct and shouldn't be done in this ad-hoc way; it's
> a potential bug nest.

Ok, I missed the io_read/write part, that's a problem, agreed.
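
For reference, the behaviour those call sites rely on is just the
branch removed in the hunk above, reproduced here with comments added,
nothing new:

	inline void __io_req_complete(struct io_kiocb *req, unsigned issue_flags)
	{
		if (issue_flags & IO_URING_F_COMPLETE_DEFER)
			/* the issuer batches completions, only mark state */
			io_req_complete_state(req);
		else
			/* no batching context, post a CQE right away */
			io_req_complete_post(req);
	}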

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH for-next v2 21/25] io_uring: add IORING_SETUP_SINGLE_ISSUER
  2022-06-15  9:34   ` Hao Xu
@ 2022-06-15 10:20     ` Pavel Begunkov
  0 siblings, 0 replies; 44+ messages in thread
From: Pavel Begunkov @ 2022-06-15 10:20 UTC (permalink / raw)
  To: Hao Xu, io-uring; +Cc: Jens Axboe

On 6/15/22 10:34, Hao Xu wrote:
> On 6/14/22 22:37, Pavel Begunkov wrote:
>> @@ -228,7 +249,7 @@ int io_ringfd_register(struct io_ring_ctx *ctx, void __user *__arg,
>>           return -EINVAL;
>>       mutex_unlock(&ctx->uring_lock);
>> -    ret = io_uring_add_tctx_node(ctx);
>> +    ret = __io_uring_add_tctx_node(ctx, false);
> 
> A question unrelated to this patch: why do we need this, since we will
> do it later anyway in the io_uring_enter() where this task really submits reqs.

At least we need to allocate a tctx, as the registered ring files are
stored in there.
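
Roughly (a sketch, only the relevant member of struct io_uring_task is
shown; the array form is from the "inline ->registered_rings" patch
earlier in the series):

	struct io_uring_task {
		/* ring fds registered via io_ringfd_register() live here,
		 * so a tctx has to exist before we can register one */
		struct file	*registered_rings[IO_RINGFD_REG_MAX];
	};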


>>       mutex_lock(&ctx->uring_lock);
>>       if (ret)
>>           return ret;

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH for-next v2 21/25] io_uring: add IORING_SETUP_SINGLE_ISSUER
  2022-06-15  9:41   ` Hao Xu
@ 2022-06-15 10:26     ` Pavel Begunkov
  2022-06-15 11:08       ` Hao Xu
  0 siblings, 1 reply; 44+ messages in thread
From: Pavel Begunkov @ 2022-06-15 10:26 UTC (permalink / raw)
  To: Hao Xu, io-uring; +Cc: Jens Axboe

On 6/15/22 10:41, Hao Xu wrote:
> On 6/14/22 22:37, Pavel Begunkov wrote:
>> Add a new IORING_SETUP_SINGLE_ISSUER flag and the userspace visible part
>> of it, i.e. put limitations of submitters. Also, don't allow it together
>> with IOPOLL as we're not going to put it to good use.
>>
>> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
>> ---
>>   include/uapi/linux/io_uring.h |  5 ++++-
>>   io_uring/io_uring.c           |  7 +++++--
>>   io_uring/io_uring_types.h     |  1 +
>>   io_uring/tctx.c               | 27 ++++++++++++++++++++++++---
>>   io_uring/tctx.h               |  4 ++--
>>   5 files changed, 36 insertions(+), 8 deletions(-)
>>
[...]
>> diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
>> index 15d209f334eb..4b90439808e3 100644
>> --- a/io_uring/io_uring.c
>> +++ b/io_uring/io_uring.c
>> @@ -3020,6 +3020,8 @@ static __cold void io_ring_ctx_free(struct io_ring_ctx *ctx)
>>       io_destroy_buffers(ctx);
>>       if (ctx->sq_creds)
>>           put_cred(ctx->sq_creds);
>> +    if (ctx->submitter_task)
>> +        put_task_struct(ctx->submitter_task);
>>       /* there are no registered resources left, nobody uses it */
>>       if (ctx->rsrc_node)
>> @@ -3752,7 +3754,7 @@ static int io_uring_install_fd(struct io_ring_ctx *ctx, struct file *file)
>>       if (fd < 0)
>>           return fd;
>> -    ret = io_uring_add_tctx_node(ctx);
>> +    ret = __io_uring_add_tctx_node(ctx, false);

                                             ^^^^^^

Note this one


>>       if (ret) {
>>           put_unused_fd(fd);
>>           return ret;
>> @@ -3972,7 +3974,8 @@ static long io_uring_setup(u32 entries, struct io_uring_params __user *params)
>>               IORING_SETUP_CLAMP | IORING_SETUP_ATTACH_WQ |
>>               IORING_SETUP_R_DISABLED | IORING_SETUP_SUBMIT_ALL |
>>               IORING_SETUP_COOP_TASKRUN | IORING_SETUP_TASKRUN_FLAG |
>> -            IORING_SETUP_SQE128 | IORING_SETUP_CQE32))
>> +            IORING_SETUP_SQE128 | IORING_SETUP_CQE32 |
>> +            IORING_SETUP_SINGLE_ISSUER))
>>           return -EINVAL;
>>       return io_uring_create(entries, &p, params);
>> diff --git a/io_uring/io_uring_types.h b/io_uring/io_uring_types.h
>> index aba0f8cd6f49..f6d0ad25f377 100644
>> --- a/io_uring/io_uring_types.h
>> +++ b/io_uring/io_uring_types.h
>> @@ -241,6 +241,7 @@ struct io_ring_ctx {
>>       /* Keep this last, we don't need it for the fast path */
>>       struct io_restriction        restrictions;
>> +    struct task_struct        *submitter_task;
>>       /* slow path rsrc auxilary data, used by update/register */
>>       struct io_rsrc_node        *rsrc_backup_node;
>> diff --git a/io_uring/tctx.c b/io_uring/tctx.c
>> index 6adf659687f8..012be261dc50 100644
>> --- a/io_uring/tctx.c
>> +++ b/io_uring/tctx.c
>> @@ -81,12 +81,32 @@ __cold int io_uring_alloc_task_context(struct task_struct *task,
>>       return 0;
>>   }
>> -int __io_uring_add_tctx_node(struct io_ring_ctx *ctx)
>> +static int io_register_submitter(struct io_ring_ctx *ctx)
>> +{
>> +    int ret = 0;
>> +
>> +    mutex_lock(&ctx->uring_lock);
>> +    if (!ctx->submitter_task)
>> +        ctx->submitter_task = get_task_struct(current);
>> +    else if (ctx->submitter_task != current)
>> +        ret = -EEXIST;
>> +    mutex_unlock(&ctx->uring_lock);
>> +
>> +    return ret;
>> +}
> 
> Seems we don't need this uring_lock:
> When we create a ring, we set up ctx->submitter_task before the uring fd
> is installed, so at that time nobody else can enter this code.
> When we enter this code later from io_uring_enter(), we just read it.

Not really: we specifically set it not to the ring's creator but to
the first submitter. That's needed to be able to create a ring in one
task and pass it over to another.
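
Purely as an illustration of the point, not what the patch does: since
any task may turn out to be the first submitter, the publication has to
be race free one way or another. A lock-free variant could look roughly
like this, trading ->uring_lock for a cmpxchg:

	static int io_register_submitter(struct io_ring_ctx *ctx)
	{
		struct task_struct *old;

		/* take the reference before publishing the pointer */
		get_task_struct(current);
		old = cmpxchg(&ctx->submitter_task, NULL, current);
		if (!old)
			return 0;
		put_task_struct(current);
		return old == current ? 0 : -EEXIST;
	}

Whether that buys anything over an uncontended mutex in a setup path is
questionable, so the mutex version is arguably the simpler choice.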

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH for-next v2 04/25] io_uring: refactor ctx slow data placement
  2022-06-15 10:11     ` Pavel Begunkov
@ 2022-06-15 10:59       ` Hao Xu
  0 siblings, 0 replies; 44+ messages in thread
From: Hao Xu @ 2022-06-15 10:59 UTC (permalink / raw)
  To: Pavel Begunkov, io-uring; +Cc: Jens Axboe

On 6/15/22 18:11, Pavel Begunkov wrote:
> On 6/15/22 08:58, Hao Xu wrote:
>> On 6/14/22 22:36, Pavel Begunkov wrote:
>>> Shove all slow path data at the end of ctx and get rid of extra
>>> indention.
>>>
>>> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
>>> ---
>>>   io_uring/io_uring_types.h | 81 +++++++++++++++++++--------------------
>>>   1 file changed, 39 insertions(+), 42 deletions(-)
>>>
>>> diff --git a/io_uring/io_uring_types.h b/io_uring/io_uring_types.h
>>> index 4f52dcbbda56..ca8e25992ece 100644
>>> --- a/io_uring/io_uring_types.h
>>> +++ b/io_uring/io_uring_types.h
>>> @@ -183,7 +183,6 @@ struct io_ring_ctx {
>>>           struct list_head    apoll_cache;
>>>           struct xarray        personalities;
>>>           u32            pers_next;
>>> -        unsigned        sq_thread_idle;
>>
>> SQPOLL is seen as a slow path?
> 
> SQPOLL is not a slow path, but struct io_ring_ctx::sq_thread_idle
> definitely is. Perhaps you mixed it up with
> struct io_sq_data::sq_thread_idle.
> 
Indeed, thanks.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH for-next v2 21/25] io_uring: add IORING_SETUP_SINGLE_ISSUER
  2022-06-15 10:26     ` Pavel Begunkov
@ 2022-06-15 11:08       ` Hao Xu
  0 siblings, 0 replies; 44+ messages in thread
From: Hao Xu @ 2022-06-15 11:08 UTC (permalink / raw)
  To: Pavel Begunkov, io-uring; +Cc: Jens Axboe

On 6/15/22 18:26, Pavel Begunkov wrote:
> On 6/15/22 10:41, Hao Xu wrote:
>> On 6/14/22 22:37, Pavel Begunkov wrote:
>>> Add a new IORING_SETUP_SINGLE_ISSUER flag and the userspace visible part
>>> of it, i.e. put limitations of submitters. Also, don't allow it together
>>> with IOPOLL as we're not going to put it to good use.
>>>
>>> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
>>> ---
>>>   include/uapi/linux/io_uring.h |  5 ++++-
>>>   io_uring/io_uring.c           |  7 +++++--
>>>   io_uring/io_uring_types.h     |  1 +
>>>   io_uring/tctx.c               | 27 ++++++++++++++++++++++++---
>>>   io_uring/tctx.h               |  4 ++--
>>>   5 files changed, 36 insertions(+), 8 deletions(-)
>>>
> [...]
>>> diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
>>> index 15d209f334eb..4b90439808e3 100644
>>> --- a/io_uring/io_uring.c
>>> +++ b/io_uring/io_uring.c
>>> @@ -3020,6 +3020,8 @@ static __cold void io_ring_ctx_free(struct 
>>> io_ring_ctx *ctx)
>>>       io_destroy_buffers(ctx);
>>>       if (ctx->sq_creds)
>>>           put_cred(ctx->sq_creds);
>>> +    if (ctx->submitter_task)
>>> +        put_task_struct(ctx->submitter_task);
>>>       /* there are no registered resources left, nobody uses it */
>>>       if (ctx->rsrc_node)
>>> @@ -3752,7 +3754,7 @@ static int io_uring_install_fd(struct 
>>> io_ring_ctx *ctx, struct file *file)
>>>       if (fd < 0)
>>>           return fd;
>>> -    ret = io_uring_add_tctx_node(ctx);
>>> +    ret = __io_uring_add_tctx_node(ctx, false);
> 
>                                              ^^^^^^
> 
> Note this one

My bad, I read it wrong...

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH for-next v2 25/25] io_uring: mutex locked poll hashing
  2022-06-14 14:37 ` [PATCH for-next v2 25/25] io_uring: mutex locked poll hashing Pavel Begunkov
@ 2022-06-15 12:53   ` Hao Xu
  2022-06-15 13:55     ` Pavel Begunkov
  0 siblings, 1 reply; 44+ messages in thread
From: Hao Xu @ 2022-06-15 12:53 UTC (permalink / raw)
  To: Pavel Begunkov, io-uring; +Cc: Jens Axboe

On 6/14/22 22:37, Pavel Begunkov wrote:
> Currently we do two extra spin lock/unlock pairs to add a poll/apoll
> request to the cancellation hash table and remove it from there.
> 
> On the submission side we often already hold ->uring_lock and tw
> completion is likely to hold it as well. Add a second cancellation hash
> table protected by ->uring_lock. In concerns for latency because of a
> need to have the mutex locked on the completion side, use the new table
> only in following cases:
> 
> 1) IORING_SETUP_SINGLE_ISSUER: only one task grabs uring_lock, so there
>     is no contention and so the main tw hander will always end up
>     is no contention and so the main tw handler will always end up

This statement doesn't seem quite true: the io-worker may grab the
uring lock, and that's why the place I marked as [1] below is needed,
right? Or am I missing something?

> 
> 2) IORING_SETUP_SQPOLL: same as with single issuer, only one task is
>     using ->uring_lock.

same as above.

> 
> 3) apoll: we normally grab the lock on the completion side anyway to
>     execute the request, so it's free.
> 
> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
> ---
>   io_uring/io_uring.c       |   9 +++-
>   io_uring/io_uring_types.h |   4 ++
>   io_uring/poll.c           | 111 ++++++++++++++++++++++++++++++--------
>   3 files changed, 102 insertions(+), 22 deletions(-)
> 
> diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
> index 4bead16e57f7..1395176bc2ea 100644
> --- a/io_uring/io_uring.c
> +++ b/io_uring/io_uring.c
> @@ -731,6 +731,8 @@ static __cold struct io_ring_ctx *io_ring_ctx_alloc(struct io_uring_params *p)
>   	hash_bits = clamp(hash_bits, 1, 8);
>   	if (io_alloc_hash_table(&ctx->cancel_table, hash_bits))
>   		goto err;
> +	if (io_alloc_hash_table(&ctx->cancel_table_locked, hash_bits))
> +		goto err;
>   
>   	ctx->dummy_ubuf = kzalloc(sizeof(*ctx->dummy_ubuf), GFP_KERNEL);
>   	if (!ctx->dummy_ubuf)
> @@ -773,6 +775,7 @@ static __cold struct io_ring_ctx *io_ring_ctx_alloc(struct io_uring_params *p)
>   err:
>   	kfree(ctx->dummy_ubuf);
>   	kfree(ctx->cancel_table.hbs);
> +	kfree(ctx->cancel_table_locked.hbs);
>   	kfree(ctx->io_bl);
>   	xa_destroy(&ctx->io_bl_xa);
>   	kfree(ctx);
> @@ -3056,6 +3059,7 @@ static __cold void io_ring_ctx_free(struct io_ring_ctx *ctx)
>   	if (ctx->hash_map)
>   		io_wq_put_hash(ctx->hash_map);
>   	kfree(ctx->cancel_table.hbs);
> +	kfree(ctx->cancel_table_locked.hbs);
>   	kfree(ctx->dummy_ubuf);
>   	kfree(ctx->io_bl);
>   	xa_destroy(&ctx->io_bl_xa);
> @@ -3217,12 +3221,13 @@ static __cold void io_ring_ctx_wait_and_kill(struct io_ring_ctx *ctx)
>   		__io_cqring_overflow_flush(ctx, true);
>   	xa_for_each(&ctx->personalities, index, creds)
>   		io_unregister_personality(ctx, index);
> +	if (ctx->rings)
> +		io_poll_remove_all(ctx, NULL, true);
>   	mutex_unlock(&ctx->uring_lock);
>   
>   	/* failed during ring init, it couldn't have issued any requests */
>   	if (ctx->rings) {
>   		io_kill_timeouts(ctx, NULL, true);
> -		io_poll_remove_all(ctx, NULL, true);
>   		/* if we failed setting up the ctx, we might not have any rings */
>   		io_iopoll_try_reap_events(ctx);
>   	}
> @@ -3347,7 +3352,9 @@ static __cold void io_uring_try_cancel_requests(struct io_ring_ctx *ctx,
>   		}
>   
>   		ret |= io_cancel_defer_files(ctx, task, cancel_all);
> +		mutex_lock(&ctx->uring_lock);
>   		ret |= io_poll_remove_all(ctx, task, cancel_all);
> +		mutex_unlock(&ctx->uring_lock);
>   		ret |= io_kill_timeouts(ctx, task, cancel_all);
>   		if (task)
>   			ret |= io_run_task_work();
> diff --git a/io_uring/io_uring_types.h b/io_uring/io_uring_types.h
> index ce2fbe6749bb..557b8e7719c9 100644
> --- a/io_uring/io_uring_types.h
> +++ b/io_uring/io_uring_types.h
> @@ -189,6 +189,7 @@ struct io_ring_ctx {
>   		struct xarray		io_bl_xa;
>   		struct list_head	io_buffers_cache;
>   
> +		struct io_hash_table	cancel_table_locked;
>   		struct list_head	cq_overflow_list;
>   		struct list_head	apoll_cache;
>   		struct xarray		personalities;
> @@ -323,6 +324,7 @@ enum {
>   	/* keep async read/write and isreg together and in order */
>   	REQ_F_SUPPORT_NOWAIT_BIT,
>   	REQ_F_ISREG_BIT,
> +	REQ_F_HASH_LOCKED_BIT,
>   
>   	/* not a real bit, just to check we're not overflowing the space */
>   	__REQ_F_LAST_BIT,
> @@ -388,6 +390,8 @@ enum {
>   	REQ_F_APOLL_MULTISHOT	= BIT(REQ_F_APOLL_MULTISHOT_BIT),
>   	/* recvmsg special flag, clear EPOLLIN */
>   	REQ_F_CLEAR_POLLIN	= BIT(REQ_F_CLEAR_POLLIN_BIT),
> +	/* hashed into ->cancel_hash_locked */
> +	REQ_F_HASH_LOCKED	= BIT(REQ_F_HASH_LOCKED_BIT),
>   };
>   
>   typedef void (*io_req_tw_func_t)(struct io_kiocb *req, bool *locked);
> diff --git a/io_uring/poll.c b/io_uring/poll.c
> index 07157da1c2cb..d20484c1cbb7 100644
> --- a/io_uring/poll.c
> +++ b/io_uring/poll.c
> @@ -93,6 +93,26 @@ static void io_poll_req_delete(struct io_kiocb *req, struct io_ring_ctx *ctx)
>   	spin_unlock(lock);
>   }
>   
> +static void io_poll_req_insert_locked(struct io_kiocb *req)
> +{
> +	struct io_hash_table *table = &req->ctx->cancel_table_locked;
> +	u32 index = hash_long(req->cqe.user_data, table->hash_bits);
> +
> +	hlist_add_head(&req->hash_node, &table->hbs[index].list);
> +}
> +
> +static void io_poll_tw_hash_eject(struct io_kiocb *req, bool *locked)
> +{
> +	struct io_ring_ctx *ctx = req->ctx;
> +
> +	if (req->flags & REQ_F_HASH_LOCKED) {
> +		io_tw_lock(ctx, locked);

                           [1]

> +		hash_del(&req->hash_node);
> +	} else {
> +		io_poll_req_delete(req, ctx);
> +	}
> +}
> +

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH for-next v2 25/25] io_uring: mutex locked poll hashing
  2022-06-15 12:53   ` Hao Xu
@ 2022-06-15 13:55     ` Pavel Begunkov
  2022-06-15 15:19       ` Hao Xu
  0 siblings, 1 reply; 44+ messages in thread
From: Pavel Begunkov @ 2022-06-15 13:55 UTC (permalink / raw)
  To: Hao Xu, io-uring; +Cc: Jens Axboe

On 6/15/22 13:53, Hao Xu wrote:
> On 6/14/22 22:37, Pavel Begunkov wrote:
>> Currently we do two extra spin lock/unlock pairs to add a poll/apoll
>> request to the cancellation hash table and remove it from there.
>>
>> On the submission side we often already hold ->uring_lock and tw
>> completion is likely to hold it as well. Add a second cancellation hash
>> table protected by ->uring_lock. In concerns for latency because of a
>> need to have the mutex locked on the completion side, use the new table
>> only in following cases:
>>
>> 1) IORING_SETUP_SINGLE_ISSUER: only one task grabs uring_lock, so there
>>     is no contention and so the main tw handler will always end up
>>     grabbing it before calling into callbacks.
> 
> This statement doesn't seem quite true: the io-worker may grab the
> uring lock, and that's why the place I marked as [1] below is needed,
> right? Or am I missing something?

Ok, "almost always ends up ...". The thing is io-wq is discouraged
taking the lock and if it does can do only briefly and without any
blocking/waiting. So yeah, it might be not taken at [1] but it's
rather unlikely.
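
For context, io_tw_lock() is roughly the following, i.e. it only takes
->uring_lock if the task-work runner doesn't hold it already:

	static void io_tw_lock(struct io_ring_ctx *ctx, bool *locked)
	{
		if (!*locked) {
			mutex_lock(&ctx->uring_lock);
			*locked = true;
		}
	}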


>> 2) IORING_SETUP_SQPOLL: same as with single issuer, only one task is
>>     using ->uring_lock.
> 
> same as above.
> 
>>
>> 3) apoll: we normally grab the lock on the completion side anyway to
>>     execute the request, so it's free.
>>
>> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
>> ---
>>   io_uring/io_uring.c       |   9 +++-
>>   io_uring/io_uring_types.h |   4 ++
>>   io_uring/poll.c           | 111 ++++++++++++++++++++++++++++++--------
>>   3 files changed, 102 insertions(+), 22 deletions(-)
>>
>> diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
>> index 4bead16e57f7..1395176bc2ea 100644
>> --- a/io_uring/io_uring.c
>> +++ b/io_uring/io_uring.c
>> @@ -731,6 +731,8 @@ static __cold struct io_ring_ctx *io_ring_ctx_alloc(struct io_uring_params *p)
>>       hash_bits = clamp(hash_bits, 1, 8);
>>       if (io_alloc_hash_table(&ctx->cancel_table, hash_bits))
>>           goto err;
>> +    if (io_alloc_hash_table(&ctx->cancel_table_locked, hash_bits))
>> +        goto err;
>>       ctx->dummy_ubuf = kzalloc(sizeof(*ctx->dummy_ubuf), GFP_KERNEL);
>>       if (!ctx->dummy_ubuf)
>> @@ -773,6 +775,7 @@ static __cold struct io_ring_ctx *io_ring_ctx_alloc(struct io_uring_params *p)
>>   err:
>>       kfree(ctx->dummy_ubuf);
>>       kfree(ctx->cancel_table.hbs);
>> +    kfree(ctx->cancel_table_locked.hbs);
>>       kfree(ctx->io_bl);
>>       xa_destroy(&ctx->io_bl_xa);
>>       kfree(ctx);
>> @@ -3056,6 +3059,7 @@ static __cold void io_ring_ctx_free(struct io_ring_ctx *ctx)
>>       if (ctx->hash_map)
>>           io_wq_put_hash(ctx->hash_map);
>>       kfree(ctx->cancel_table.hbs);
>> +    kfree(ctx->cancel_table_locked.hbs);
>>       kfree(ctx->dummy_ubuf);
>>       kfree(ctx->io_bl);
>>       xa_destroy(&ctx->io_bl_xa);
>> @@ -3217,12 +3221,13 @@ static __cold void io_ring_ctx_wait_and_kill(struct io_ring_ctx *ctx)
>>           __io_cqring_overflow_flush(ctx, true);
>>       xa_for_each(&ctx->personalities, index, creds)
>>           io_unregister_personality(ctx, index);
>> +    if (ctx->rings)
>> +        io_poll_remove_all(ctx, NULL, true);
>>       mutex_unlock(&ctx->uring_lock);
>>       /* failed during ring init, it couldn't have issued any requests */
>>       if (ctx->rings) {
>>           io_kill_timeouts(ctx, NULL, true);
>> -        io_poll_remove_all(ctx, NULL, true);
>>           /* if we failed setting up the ctx, we might not have any rings */
>>           io_iopoll_try_reap_events(ctx);
>>       }
>> @@ -3347,7 +3352,9 @@ static __cold void io_uring_try_cancel_requests(struct io_ring_ctx *ctx,
>>           }
>>           ret |= io_cancel_defer_files(ctx, task, cancel_all);
>> +        mutex_lock(&ctx->uring_lock);
>>           ret |= io_poll_remove_all(ctx, task, cancel_all);
>> +        mutex_unlock(&ctx->uring_lock);
>>           ret |= io_kill_timeouts(ctx, task, cancel_all);
>>           if (task)
>>               ret |= io_run_task_work();
>> diff --git a/io_uring/io_uring_types.h b/io_uring/io_uring_types.h
>> index ce2fbe6749bb..557b8e7719c9 100644
>> --- a/io_uring/io_uring_types.h
>> +++ b/io_uring/io_uring_types.h
>> @@ -189,6 +189,7 @@ struct io_ring_ctx {
>>           struct xarray        io_bl_xa;
>>           struct list_head    io_buffers_cache;
>> +        struct io_hash_table    cancel_table_locked;
>>           struct list_head    cq_overflow_list;
>>           struct list_head    apoll_cache;
>>           struct xarray        personalities;
>> @@ -323,6 +324,7 @@ enum {
>>       /* keep async read/write and isreg together and in order */
>>       REQ_F_SUPPORT_NOWAIT_BIT,
>>       REQ_F_ISREG_BIT,
>> +    REQ_F_HASH_LOCKED_BIT,
>>       /* not a real bit, just to check we're not overflowing the space */
>>       __REQ_F_LAST_BIT,
>> @@ -388,6 +390,8 @@ enum {
>>       REQ_F_APOLL_MULTISHOT    = BIT(REQ_F_APOLL_MULTISHOT_BIT),
>>       /* recvmsg special flag, clear EPOLLIN */
>>       REQ_F_CLEAR_POLLIN    = BIT(REQ_F_CLEAR_POLLIN_BIT),
>> +    /* hashed into ->cancel_hash_locked */
>> +    REQ_F_HASH_LOCKED    = BIT(REQ_F_HASH_LOCKED_BIT),
>>   };
>>   typedef void (*io_req_tw_func_t)(struct io_kiocb *req, bool *locked);
>> diff --git a/io_uring/poll.c b/io_uring/poll.c
>> index 07157da1c2cb..d20484c1cbb7 100644
>> --- a/io_uring/poll.c
>> +++ b/io_uring/poll.c
>> @@ -93,6 +93,26 @@ static void io_poll_req_delete(struct io_kiocb *req, struct io_ring_ctx *ctx)
>>       spin_unlock(lock);
>>   }
>> +static void io_poll_req_insert_locked(struct io_kiocb *req)
>> +{
>> +    struct io_hash_table *table = &req->ctx->cancel_table_locked;
>> +    u32 index = hash_long(req->cqe.user_data, table->hash_bits);
>> +
>> +    hlist_add_head(&req->hash_node, &table->hbs[index].list);
>> +}
>> +
>> +static void io_poll_tw_hash_eject(struct io_kiocb *req, bool *locked)
>> +{
>> +    struct io_ring_ctx *ctx = req->ctx;
>> +
>> +    if (req->flags & REQ_F_HASH_LOCKED) {
>> +        io_tw_lock(ctx, locked);
> 
>                            [1]
> 
>> +        hash_del(&req->hash_node);
>> +    } else {
>> +        io_poll_req_delete(req, ctx);
>> +    }
>> +}
>> +

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH for-next v2 25/25] io_uring: mutex locked poll hashing
  2022-06-15 13:55     ` Pavel Begunkov
@ 2022-06-15 15:19       ` Hao Xu
  0 siblings, 0 replies; 44+ messages in thread
From: Hao Xu @ 2022-06-15 15:19 UTC (permalink / raw)
  To: Pavel Begunkov, io-uring; +Cc: Jens Axboe

On 6/15/22 21:55, Pavel Begunkov wrote:
> On 6/15/22 13:53, Hao Xu wrote:
>> On 6/14/22 22:37, Pavel Begunkov wrote:
>>> Currently we do two extra spin lock/unlock pairs to add a poll/apoll
>>> request to the cancellation hash table and remove it from there.
>>>
>>> On the submission side we often already hold ->uring_lock and tw
>>> completion is likely to hold it as well. Add a second cancellation hash
>>> table protected by ->uring_lock. In concerns for latency because of a
>>> need to have the mutex locked on the completion side, use the new table
>>> only in following cases:
>>>
>>> 1) IORING_SETUP_SINGLE_ISSUER: only one task grabs uring_lock, so there
>>>     is no contention and so the main tw handler will always end up
>>>     grabbing it before calling into callbacks.
>>
>> This statement doesn't seem quite true: the io-worker may grab the
>> uring lock, and that's why the place I marked as [1] below is needed,
>> right? Or am I missing something?
> 
> Ok, "almost always ends up ...". The thing is io-wq is discouraged
> taking the lock and if it does can do only briefly and without any
> blocking/waiting. So yeah, it might be not taken at [1] but it's
> rather unlikely.
> 

Agreed. What I meant to say is that the code at [1] deserves a comment,
since I have a feeling that new developers coming to io_uring may
struggle to figure it out with the commit message worded like this.
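
For example, something along these lines could do (the function body is
taken from the quoted diff, only the comment is a suggestion):

	static void io_poll_tw_hash_eject(struct io_kiocb *req, bool *locked)
	{
		struct io_ring_ctx *ctx = req->ctx;

		if (req->flags & REQ_F_HASH_LOCKED) {
			/*
			 * ->cancel_table_locked is protected by ->uring_lock,
			 * which the tw handler usually holds already (single
			 * issuer, SQPOLL, apoll). io-wq may take the mutex as
			 * well, but only briefly, so this rarely has to wait.
			 */
			io_tw_lock(ctx, locked);
			hash_del(&req->hash_node);
		} else {
			io_poll_req_delete(req, ctx);
		}
	}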

^ permalink raw reply	[flat|nested] 44+ messages in thread

end of thread, other threads:[~2022-06-15 15:20 UTC | newest]

Thread overview: 44+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-06-14 14:36 [PATCH for-next v2 00/25] 5.20 cleanups and poll optimisations Pavel Begunkov
2022-06-14 14:36 ` [PATCH for-next v2 01/25] io_uring: make reg buf init consistent Pavel Begunkov
2022-06-14 14:36 ` [PATCH for-next v2 02/25] io_uring: move defer_list to slow data Pavel Begunkov
2022-06-14 14:36 ` [PATCH for-next v2 03/25] io_uring: better caching for ctx timeout fields Pavel Begunkov
2022-06-14 14:36 ` [PATCH for-next v2 04/25] io_uring: refactor ctx slow data placement Pavel Begunkov
2022-06-15  7:58   ` Hao Xu
2022-06-15 10:11     ` Pavel Begunkov
2022-06-15 10:59       ` Hao Xu
2022-06-14 14:36 ` [PATCH for-next v2 05/25] io_uring: move small helpers to headers Pavel Begunkov
2022-06-14 14:36 ` [PATCH for-next v2 06/25] io_uring: explain io_wq_work::cancel_seq placement Pavel Begunkov
2022-06-14 14:36 ` [PATCH for-next v2 07/25] io_uring: inline ->registered_rings Pavel Begunkov
2022-06-14 14:36 ` [PATCH for-next v2 08/25] io_uring: don't set REQ_F_COMPLETE_INLINE in tw Pavel Begunkov
2022-06-14 14:36 ` [PATCH for-next v2 09/25] io_uring: never defer-complete multi-apoll Pavel Begunkov
2022-06-15  8:05   ` Hao Xu
2022-06-14 14:37 ` [PATCH for-next v2 10/25] io_uring: kill REQ_F_COMPLETE_INLINE Pavel Begunkov
2022-06-15  8:20   ` Hao Xu
2022-06-15 10:18     ` Pavel Begunkov
2022-06-15 10:19       ` Pavel Begunkov
2022-06-14 14:37 ` [PATCH for-next v2 11/25] io_uring: refactor io_req_task_complete() Pavel Begunkov
2022-06-14 17:45   ` Hao Xu
2022-06-14 17:52     ` Pavel Begunkov
2022-06-14 14:37 ` [PATCH for-next v2 12/25] io_uring: don't inline io_put_kbuf Pavel Begunkov
2022-06-14 14:37 ` [PATCH for-next v2 13/25] io_uring: remove check_cq checking from hot paths Pavel Begunkov
2022-06-14 14:37 ` [PATCH for-next v2 14/25] io_uring: poll: remove unnecessary req->ref set Pavel Begunkov
2022-06-14 14:37 ` [PATCH for-next v2 15/25] io_uring: switch cancel_hash to use per entry spinlock Pavel Begunkov
2022-06-14 14:37 ` [PATCH for-next v2 16/25] io_uring: pass poll_find lock back Pavel Begunkov
2022-06-14 14:37 ` [PATCH for-next v2 17/25] io_uring: clean up io_try_cancel Pavel Begunkov
2022-06-14 14:37 ` [PATCH for-next v2 18/25] io_uring: limit number hash buckets Pavel Begunkov
2022-06-14 14:37 ` [PATCH for-next v2 19/25] io_uring: clean up io_ring_ctx_alloc Pavel Begunkov
2022-06-15  8:46   ` Hao Xu
2022-06-14 14:37 ` [PATCH for-next v2 20/25] io_uring: use state completion infra for poll reqs Pavel Begunkov
2022-06-14 14:37 ` [PATCH for-next v2 21/25] io_uring: add IORING_SETUP_SINGLE_ISSUER Pavel Begunkov
2022-06-15  9:34   ` Hao Xu
2022-06-15 10:20     ` Pavel Begunkov
2022-06-15  9:41   ` Hao Xu
2022-06-15 10:26     ` Pavel Begunkov
2022-06-15 11:08       ` Hao Xu
2022-06-14 14:37 ` [PATCH for-next v2 22/25] io_uring: pass hash table into poll_find Pavel Begunkov
2022-06-14 14:37 ` [PATCH for-next v2 23/25] io_uring: introduce a struct for hash table Pavel Begunkov
2022-06-14 14:37 ` [PATCH for-next v2 24/25] io_uring: propagate locking state to poll cancel Pavel Begunkov
2022-06-14 14:37 ` [PATCH for-next v2 25/25] io_uring: mutex locked poll hashing Pavel Begunkov
2022-06-15 12:53   ` Hao Xu
2022-06-15 13:55     ` Pavel Begunkov
2022-06-15 15:19       ` Hao Xu
