* [PATCH 0/2] cancel_hash per entry lock
@ 2022-05-29 16:19 Hao Xu
  2022-05-29 16:19 ` [PATCH 1/2] io_uring: add an argument for io_poll_disarm() Hao Xu
  2022-05-29 16:20 ` [PATCH 2/2] io_uring: switch cancel_hash to use per list spinlock Hao Xu
  0 siblings, 2 replies; 15+ messages in thread
From: Hao Xu @ 2022-05-29 16:19 UTC (permalink / raw)
  To: io-uring; +Cc: Jens Axboe, Pavel Begunkov

From: Hao Xu <howeyxu@tencent.com>

Add a per-entry lock for the cancel_hash array. This reduces usage of
completion_lock and contention between cancel_hash entries.
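
As a quick illustration of the pattern (a sketch that simply mirrors the
insertion hunk in patch 2, not additional code in this series): each hash
operation takes only the spinlock of its own bucket instead of
ctx->completion_lock.

static void io_poll_req_insert(struct io_kiocb *req)
{
        struct io_ring_ctx *ctx = req->ctx;
        u32 index = hash_long(req->cqe.user_data, ctx->cancel_hash_bits);

        /* per-bucket lock instead of ctx->completion_lock */
        spin_lock(&ctx->cancel_hash_locks[index]);
        hlist_add_head(&req->hash_node, &ctx->cancel_hash[index]);
        spin_unlock(&ctx->cancel_hash_locks[index]);
}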


Hao Xu (2):
  io_uring: add an argument for io_poll_disarm()
  io_uring: switch cancel_hash to use per list spinlock

 io_uring/cancel.c         | 12 ++++++++++--
 io_uring/cancel.h         |  1 +
 io_uring/io_uring.c       |  9 +++++++++
 io_uring/io_uring_types.h |  1 +
 io_uring/poll.c           | 38 +++++++++++++++++++++-----------------
 5 files changed, 42 insertions(+), 19 deletions(-)


base-commit: cbbf9a4865526c55a6a984ef69578def9944f319
-- 
2.25.1



* [PATCH 1/2] io_uring: add an argument for io_poll_disarm()
  2022-05-29 16:19 [PATCH 0/2] cancel_hash per entry lock Hao Xu
@ 2022-05-29 16:19 ` Hao Xu
  2022-05-29 16:20 ` [PATCH 2/2] io_uring: switch cancel_hash to use per list spinlock Hao Xu
  1 sibling, 0 replies; 15+ messages in thread
From: Hao Xu @ 2022-05-29 16:19 UTC (permalink / raw)
  To: io-uring; +Cc: Jens Axboe, Pavel Begunkov

From: Hao Xu <howeyxu@tencent.com>

Add an argument for io_poll_disarm() for later use.

Signed-off-by: Hao Xu <howeyxu@tencent.com>
---
 io_uring/poll.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/io_uring/poll.c b/io_uring/poll.c
index 728f6e7b47c5..c8982c5ef0fa 100644
--- a/io_uring/poll.c
+++ b/io_uring/poll.c
@@ -561,8 +561,9 @@ static struct io_kiocb *io_poll_find(struct io_ring_ctx *ctx, bool poll_only,
 {
 	struct hlist_head *list;
 	struct io_kiocb *req;
+	u32 index = hash_long(cd->data, ctx->cancel_hash_bits);
 
-	list = &ctx->cancel_hash[hash_long(cd->data, ctx->cancel_hash_bits)];
+	list = &ctx->cancel_hash[index];
 	hlist_for_each_entry(req, list, hash_node) {
 		if (cd->data != req->cqe.user_data)
 			continue;
@@ -573,6 +574,7 @@ static struct io_kiocb *io_poll_find(struct io_ring_ctx *ctx, bool poll_only,
 				continue;
 			req->work.cancel_seq = cd->seq;
 		}
+		cd->flags = index;
 		return req;
 	}
 	return NULL;
@@ -602,7 +604,7 @@ static struct io_kiocb *io_poll_file_find(struct io_ring_ctx *ctx,
 	return NULL;
 }
 
-static bool io_poll_disarm(struct io_kiocb *req)
+static bool io_poll_disarm(struct io_kiocb *req, u32 index)
 	__must_hold(&ctx->completion_lock)
 {
 	if (!io_poll_get_ownership(req))
@@ -724,7 +726,7 @@ int io_poll_remove(struct io_kiocb *req, unsigned int issue_flags)
 
 	spin_lock(&ctx->completion_lock);
 	preq = io_poll_find(ctx, true, &cd);
-	if (!preq || !io_poll_disarm(preq)) {
+	if (!preq || !io_poll_disarm(preq, cd.flags)) {
 		spin_unlock(&ctx->completion_lock);
 		ret = preq ? -EALREADY : -ENOENT;
 		goto out;
-- 
2.25.1



* [PATCH 2/2] io_uring: switch cancel_hash to use per list spinlock
  2022-05-29 16:19 [PATCH 0/2] cancel_hash per entry lock Hao Xu
  2022-05-29 16:19 ` [PATCH 1/2] io_uring: add an argument for io_poll_disarm() Hao Xu
@ 2022-05-29 16:20 ` Hao Xu
  2022-05-29 16:25   ` Jens Axboe
  1 sibling, 1 reply; 15+ messages in thread
From: Hao Xu @ 2022-05-29 16:20 UTC (permalink / raw)
  To: io-uring; +Cc: Jens Axboe, Pavel Begunkov

From: Hao Xu <howeyxu@tencent.com>

Use a per-list lock for cancel_hash. This removes some completion_lock
invocations and removes contention between different cancel_hash entries.

Signed-off-by: Hao Xu <howeyxu@tencent.com>
---
 io_uring/cancel.c         | 12 ++++++++++--
 io_uring/cancel.h         |  1 +
 io_uring/io_uring.c       |  9 +++++++++
 io_uring/io_uring_types.h |  1 +
 io_uring/poll.c           | 30 ++++++++++++++++--------------
 5 files changed, 37 insertions(+), 16 deletions(-)

diff --git a/io_uring/cancel.c b/io_uring/cancel.c
index 83cceb52d82d..0b1aa3ab7664 100644
--- a/io_uring/cancel.c
+++ b/io_uring/cancel.c
@@ -93,14 +93,14 @@ int io_try_cancel(struct io_kiocb *req, struct io_cancel_data *cd)
 	if (!ret)
 		return 0;
 
-	spin_lock(&ctx->completion_lock);
 	ret = io_poll_cancel(ctx, cd);
 	if (ret != -ENOENT)
 		goto out;
+	spin_lock(&ctx->completion_lock);
 	if (!(cd->flags & IORING_ASYNC_CANCEL_FD))
 		ret = io_timeout_cancel(ctx, cd);
-out:
 	spin_unlock(&ctx->completion_lock);
+out:
 	return ret;
 }
 
@@ -192,3 +192,11 @@ int io_async_cancel(struct io_kiocb *req, unsigned int issue_flags)
 	io_req_set_res(req, ret, 0);
 	return IOU_OK;
 }
+
+inline void init_cancel_hash_locks(spinlock_t *cancel_hash_locks, unsigned size)
+{
+	int i;
+
+	for (i = 0; i < size; i++)
+		spin_lock_init(&cancel_hash_locks[i]);
+}
diff --git a/io_uring/cancel.h b/io_uring/cancel.h
index 4f35d8696325..fdec2595797e 100644
--- a/io_uring/cancel.h
+++ b/io_uring/cancel.h
@@ -4,3 +4,4 @@ int io_async_cancel_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe);
 int io_async_cancel(struct io_kiocb *req, unsigned int issue_flags);
 
 int io_try_cancel(struct io_kiocb *req, struct io_cancel_data *cd);
+inline void init_cancel_hash_locks(spinlock_t *cancel_hash_locks, unsigned size);
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index f31d3446dcbf..6eaa27aea197 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -706,7 +706,14 @@ static __cold struct io_ring_ctx *io_ring_ctx_alloc(struct io_uring_params *p)
 					GFP_KERNEL);
 	if (!ctx->cancel_hash)
 		goto err;
+	ctx->cancel_hash_locks =
+		kmalloc((1U << hash_bits) * sizeof(spinlock_t),
+			GFP_KERNEL);
+	if (!ctx->cancel_hash_locks)
+		goto err;
+
 	__hash_init(ctx->cancel_hash, 1U << hash_bits);
+	init_cancel_hash_locks(ctx->cancel_hash_locks, 1U << hash_bits);
 
 	ctx->dummy_ubuf = kzalloc(sizeof(*ctx->dummy_ubuf), GFP_KERNEL);
 	if (!ctx->dummy_ubuf)
@@ -749,6 +756,7 @@ static __cold struct io_ring_ctx *io_ring_ctx_alloc(struct io_uring_params *p)
 err:
 	kfree(ctx->dummy_ubuf);
 	kfree(ctx->cancel_hash);
+	kfree(ctx->cancel_hash_locks);
 	kfree(ctx->io_bl);
 	xa_destroy(&ctx->io_bl_xa);
 	kfree(ctx);
@@ -3045,6 +3053,7 @@ static __cold void io_ring_ctx_free(struct io_ring_ctx *ctx)
 	if (ctx->hash_map)
 		io_wq_put_hash(ctx->hash_map);
 	kfree(ctx->cancel_hash);
+	kfree(ctx->cancel_hash_locks);
 	kfree(ctx->dummy_ubuf);
 	kfree(ctx->io_bl);
 	xa_destroy(&ctx->io_bl_xa);
diff --git a/io_uring/io_uring_types.h b/io_uring/io_uring_types.h
index 7c22cf35a7e2..4619a46f7ecd 100644
--- a/io_uring/io_uring_types.h
+++ b/io_uring/io_uring_types.h
@@ -231,6 +231,7 @@ struct io_ring_ctx {
 		 */
 		struct io_wq_work_list	iopoll_list;
 		struct hlist_head	*cancel_hash;
+		spinlock_t		*cancel_hash_locks;
 		unsigned		cancel_hash_bits;
 		bool			poll_multi_queue;
 
diff --git a/io_uring/poll.c b/io_uring/poll.c
index c8982c5ef0fa..e1b6dd282860 100644
--- a/io_uring/poll.c
+++ b/io_uring/poll.c
@@ -73,10 +73,11 @@ static struct io_poll *io_poll_get_single(struct io_kiocb *req)
 static void io_poll_req_insert(struct io_kiocb *req)
 {
 	struct io_ring_ctx *ctx = req->ctx;
-	struct hlist_head *list;
+	u32 index = hash_long(req->cqe.user_data, ctx->cancel_hash_bits);
 
-	list = &ctx->cancel_hash[hash_long(req->cqe.user_data, ctx->cancel_hash_bits)];
-	hlist_add_head(&req->hash_node, list);
+	spin_lock(&ctx->cancel_hash_locks[index]);
+	hlist_add_head(&req->hash_node, &ctx->cancel_hash[index]);
+	spin_unlock(&ctx->cancel_hash_locks[index]);
 }
 
 static void io_init_poll_iocb(struct io_poll *poll, __poll_t events,
@@ -439,9 +440,7 @@ static int __io_arm_poll_handler(struct io_kiocb *req,
 		return 0;
 	}
 
-	spin_lock(&ctx->completion_lock);
 	io_poll_req_insert(req);
-	spin_unlock(&ctx->completion_lock);
 
 	if (mask && (poll->events & EPOLLET)) {
 		/* can't multishot if failed, just queue the event we've got */
@@ -538,10 +537,10 @@ __cold bool io_poll_remove_all(struct io_ring_ctx *ctx, struct task_struct *tsk,
 	bool found = false;
 	int i;
 
-	spin_lock(&ctx->completion_lock);
 	for (i = 0; i < (1U << ctx->cancel_hash_bits); i++) {
 		struct hlist_head *list;
 
+		spin_lock(&ctx->cancel_hash_locks[i]);
 		list = &ctx->cancel_hash[i];
 		hlist_for_each_entry_safe(req, tmp, list, hash_node) {
 			if (io_match_task_safe(req, tsk, cancel_all)) {
@@ -550,19 +549,19 @@ __cold bool io_poll_remove_all(struct io_ring_ctx *ctx, struct task_struct *tsk,
 				found = true;
 			}
 		}
+		spin_unlock(&ctx->cancel_hash_locks[i]);
 	}
-	spin_unlock(&ctx->completion_lock);
 	return found;
 }
 
 static struct io_kiocb *io_poll_find(struct io_ring_ctx *ctx, bool poll_only,
 				     struct io_cancel_data *cd)
-	__must_hold(&ctx->completion_lock)
 {
 	struct hlist_head *list;
 	struct io_kiocb *req;
 	u32 index = hash_long(cd->data, ctx->cancel_hash_bits);
 
+	spin_lock(&ctx->cancel_hash_locks[index]);
 	list = &ctx->cancel_hash[index];
 	hlist_for_each_entry(req, list, hash_node) {
 		if (cd->data != req->cqe.user_data)
@@ -574,15 +573,16 @@ static struct io_kiocb *io_poll_find(struct io_ring_ctx *ctx, bool poll_only,
 				continue;
 			req->work.cancel_seq = cd->seq;
 		}
+		spin_unlock(&ctx->cancel_hash_locks[index]);
 		cd->flags = index;
 		return req;
 	}
+	spin_unlock(&ctx->cancel_hash_locks[index]);
 	return NULL;
 }
 
 static struct io_kiocb *io_poll_file_find(struct io_ring_ctx *ctx,
 					  struct io_cancel_data *cd)
-	__must_hold(&ctx->completion_lock)
 {
 	struct io_kiocb *req;
 	int i;
@@ -590,6 +590,7 @@ static struct io_kiocb *io_poll_file_find(struct io_ring_ctx *ctx,
 	for (i = 0; i < (1U << ctx->cancel_hash_bits); i++) {
 		struct hlist_head *list;
 
+		spin_lock(&ctx->cancel_hash_locks[i]);
 		list = &ctx->cancel_hash[i];
 		hlist_for_each_entry(req, list, hash_node) {
 			if (!(cd->flags & IORING_ASYNC_CANCEL_ANY) &&
@@ -598,24 +599,28 @@ static struct io_kiocb *io_poll_file_find(struct io_ring_ctx *ctx,
 			if (cd->seq == req->work.cancel_seq)
 				continue;
 			req->work.cancel_seq = cd->seq;
+			spin_unlock(&ctx->cancel_hash_locks[i]);
 			return req;
 		}
+		spin_unlock(&ctx->cancel_hash_locks[i]);
 	}
 	return NULL;
 }
 
 static bool io_poll_disarm(struct io_kiocb *req, u32 index)
-	__must_hold(&ctx->completion_lock)
 {
+	struct io_ring_ctx *ctx = req->ctx;
+
 	if (!io_poll_get_ownership(req))
 		return false;
 	io_poll_remove_entries(req);
+	spin_lock(&ctx->cancel_hash_locks[index]);
 	hash_del(&req->hash_node);
+	spin_unlock(&ctx->cancel_hash_locks[index]);
 	return true;
 }
 
 int io_poll_cancel(struct io_ring_ctx *ctx, struct io_cancel_data *cd)
-	__must_hold(&ctx->completion_lock)
 {
 	struct io_kiocb *req;
 
@@ -724,14 +729,11 @@ int io_poll_remove(struct io_kiocb *req, unsigned int issue_flags)
 	int ret2, ret = 0;
 	bool locked;
 
-	spin_lock(&ctx->completion_lock);
 	preq = io_poll_find(ctx, true, &cd);
 	if (!preq || !io_poll_disarm(preq, cd.flags)) {
-		spin_unlock(&ctx->completion_lock);
 		ret = preq ? -EALREADY : -ENOENT;
 		goto out;
 	}
-	spin_unlock(&ctx->completion_lock);
 
 	if (poll_update->update_events || poll_update->update_user_data) {
 		/* only mask one event flags, keep behavior flags */
-- 
2.25.1



* Re: [PATCH 2/2] io_uring: switch cancel_hash to use per list spinlock
  2022-05-29 16:20 ` [PATCH 2/2] io_uring: switch cancel_hash to use per list spinlock Hao Xu
@ 2022-05-29 16:25   ` Jens Axboe
  2022-05-29 18:07     ` Hao Xu
  2022-05-30 13:33     ` Hao Xu
  0 siblings, 2 replies; 15+ messages in thread
From: Jens Axboe @ 2022-05-29 16:25 UTC (permalink / raw)
  To: Hao Xu, io-uring; +Cc: Pavel Begunkov

On 5/29/22 10:20 AM, Hao Xu wrote:
> From: Hao Xu <howeyxu@tencent.com>
> 
> From: Hao Xu <howeyxu@tencent.com>
> 
> Use per list lock for cancel_hash, this removes some completion lock
> invocation and remove contension between different cancel_hash entries

Interesting, do you have any numbers on this?

Also, I'd make a hash bucket struct:

struct io_hash_bucket {
	spinlock_t lock;
	struct hlist_head list;
};

rather than two separate structs, that'll have nicer memory locality too
and should further improve it. Could be done as a prep patch with the
old locking in place, making the end patch doing the per-bucket lock
simpler as well.

Hmm?

-- 
Jens Axboe
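
For illustration, here is roughly what the insertion helper could look
like with the combined bucket struct suggested above; this is a sketch
under that assumption, not code from a posted patch:

struct io_hash_bucket {
        spinlock_t              lock;
        struct hlist_head       list;
};

static void io_poll_req_insert(struct io_kiocb *req)
{
        struct io_ring_ctx *ctx = req->ctx;
        u32 index = hash_long(req->cqe.user_data, ctx->cancel_hash_bits);
        /* assumes ctx->cancel_hash becomes an array of io_hash_bucket */
        struct io_hash_bucket *hb = &ctx->cancel_hash[index];

        spin_lock(&hb->lock);
        hlist_add_head(&req->hash_node, &hb->list);
        spin_unlock(&hb->lock);
}

Each bucket's lock and list head then share a cache line, which is the
locality benefit referred to above.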



* Re: [PATCH 2/2] io_uring: switch cancel_hash to use per list spinlock
  2022-05-29 16:25   ` Jens Axboe
@ 2022-05-29 18:07     ` Hao Xu
  2022-05-29 18:40       ` Jens Axboe
  2022-05-30 13:33     ` Hao Xu
  1 sibling, 1 reply; 15+ messages in thread
From: Hao Xu @ 2022-05-29 18:07 UTC (permalink / raw)
  To: Jens Axboe, io-uring; +Cc: Pavel Begunkov

On 5/30/22 00:25, Jens Axboe wrote:
> On 5/29/22 10:20 AM, Hao Xu wrote:
>> From: Hao Xu <howeyxu@tencent.com>
>>
>> From: Hao Xu <howeyxu@tencent.com>
>>
>> Use per list lock for cancel_hash, this removes some completion lock
>> invocation and remove contension between different cancel_hash entries
> 
> Interesting, do you have any numbers on this?

Just theoretical for now, I'll do some tests tomorrow. This is
actually an RFC, I forgot to change the subject.

> 
> Also, I'd make a hash bucket struct:
> 
> struct io_hash_bucket {
> 	spinlock_t lock;
> 	struct hlist_head list;
> };
> 
> rather than two separate structs, that'll have nicer memory locality too
> and should further improve it. Could be done as a prep patch with the
> old locking in place, making the end patch doing the per-bucket lock
> simpler as well.

Sure, if the test numbers make sense, I'll send v2. I'll test the
hlist_bl list as well (its comment says it is much slower than a
normal spin_lock, but maybe we don't care much about the efficiency
of poll cancellation?).

> 
> Hmm?
> 



* Re: [PATCH 2/2] io_uring: switch cancel_hash to use per list spinlock
  2022-05-29 18:07     ` Hao Xu
@ 2022-05-29 18:40       ` Jens Axboe
  2022-05-29 22:50         ` Pavel Begunkov
  0 siblings, 1 reply; 15+ messages in thread
From: Jens Axboe @ 2022-05-29 18:40 UTC (permalink / raw)
  To: Hao Xu, io-uring; +Cc: Pavel Begunkov

On 5/29/22 12:07 PM, Hao Xu wrote:
> On 5/30/22 00:25, Jens Axboe wrote:
>> On 5/29/22 10:20 AM, Hao Xu wrote:
>>> From: Hao Xu <howeyxu@tencent.com>
>>>
>>> From: Hao Xu <howeyxu@tencent.com>
>>>
>>> Use per list lock for cancel_hash, this removes some completion lock
>>> invocation and remove contension between different cancel_hash entries
>>
>> Interesting, do you have any numbers on this?
> 
> Just Theoretically for now, I'll do some tests tomorrow. This is
> actually RFC, forgot to change the subject.
> 
>>
>> Also, I'd make a hash bucket struct:
>>
>> struct io_hash_bucket {
>>     spinlock_t lock;
>>     struct hlist_head list;
>> };
>>
>> rather than two separate structs, that'll have nicer memory locality too
>> and should further improve it. Could be done as a prep patch with the
>> old locking in place, making the end patch doing the per-bucket lock
>> simpler as well.
> 
> Sure, if the test number make sense, I'll send v2. I'll test the
> hlist_bl list as well(the comment of it says it is much slower than
> normal spin_lock, but we may not care the efficiency of poll
> cancellation very much?).

I don't think the bit spinlocks are going to be useful, we should
stick with a spinlock for this. They are indeed slower and generally not
used for that reason. For a use case where you need a ton of locks and
saving the 4 bytes for a spinlock would make sense (or maybe not
changing some struct?), maybe they have a purpose. But not for this.

-- 
Jens Axboe
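
For context, the hlist_bl alternative mentioned above keeps the lock in
bit 0 of the list head pointer instead of using a separate spinlock_t. A
minimal sketch of what that would have looked like (purely illustrative;
the cancel_hash_bl and hash_node_bl fields are assumptions):

#include <linux/list_bl.h>

static void io_poll_req_insert_bl(struct io_kiocb *req)
{
        struct io_ring_ctx *ctx = req->ctx;
        struct hlist_bl_head *head =
                &ctx->cancel_hash_bl[hash_long(req->cqe.user_data,
                                               ctx->cancel_hash_bits)];

        hlist_bl_lock(head);    /* bit spinlock on bit 0 of the pointer */
        hlist_bl_add_head(&req->hash_node_bl, head);
        hlist_bl_unlock(head);
}

It saves the 4 bytes of a spinlock per bucket but is slower, which is why
it is rejected here.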



* Re: [PATCH 2/2] io_uring: switch cancel_hash to use per list spinlock
  2022-05-29 18:40       ` Jens Axboe
@ 2022-05-29 22:50         ` Pavel Begunkov
  2022-05-29 23:34           ` Jens Axboe
  2022-05-30  6:38           ` Hao Xu
  0 siblings, 2 replies; 15+ messages in thread
From: Pavel Begunkov @ 2022-05-29 22:50 UTC (permalink / raw)
  To: Jens Axboe, Hao Xu, io-uring

On 5/29/22 19:40, Jens Axboe wrote:
> On 5/29/22 12:07 PM, Hao Xu wrote:
>> On 5/30/22 00:25, Jens Axboe wrote:
>>> On 5/29/22 10:20 AM, Hao Xu wrote:
>>>> From: Hao Xu <howeyxu@tencent.com>
>>>>
>>>> From: Hao Xu <howeyxu@tencent.com>
>>>>
>>>> Use per list lock for cancel_hash, this removes some completion lock
>>>> invocation and remove contension between different cancel_hash entries
>>>
>>> Interesting, do you have any numbers on this?
>>
>> Just Theoretically for now, I'll do some tests tomorrow. This is
>> actually RFC, forgot to change the subject.
>>
>>>
>>> Also, I'd make a hash bucket struct:
>>>
>>> struct io_hash_bucket {
>>>      spinlock_t lock;
>>>      struct hlist_head list;
>>> };
>>>
>>> rather than two separate structs, that'll have nicer memory locality too
>>> and should further improve it. Could be done as a prep patch with the
>>> old locking in place, making the end patch doing the per-bucket lock
>>> simpler as well.
>>
>> Sure, if the test number make sense, I'll send v2. I'll test the
>> hlist_bl list as well(the comment of it says it is much slower than
>> normal spin_lock, but we may not care the efficiency of poll
>> cancellation very much?).
> 
> I don't think the bit spinlocks are going to be useful, we should
> stick with a spinlock for this. They are indeed slower and generally not
> used for that reason. For a use case where you need a ton of locks and
> saving the 4 bytes for a spinlock would make sense (or maybe not
> changing some struct?), maybe they have a purpose. But not for this.

We can put the cancel hashes under uring_lock and completely kill
the hash spinlocking (2 lock/unlock pairs per single-shot). The code
below won't even compile and is missing the cancellation bits; I'll
pick it up in a week.

Even better would be to have two hash tables, and auto-magically apply
the feature to SINGLE_SUBMITTER, SQPOLL (both will have uring_lock held)
and apoll (need uring_lock after anyway).


diff --git a/fs/io_uring.c b/fs/io_uring.c
index 6be21967959d..191fa7f31610 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -7120,12 +7120,20 @@ static void io_poll_task_func(struct io_kiocb *req, bool *locked)
  	}
  
  	io_poll_remove_entries(req);
-	spin_lock(&ctx->completion_lock);
-	hash_del(&req->hash_node);
-	__io_req_complete_post(req, req->cqe.res, 0);
-	io_commit_cqring(ctx);
-	spin_unlock(&ctx->completion_lock);
-	io_cqring_ev_posted(ctx);
+
+	if (ctx->flags & IORING_MUTEX_HASH) {
+		io_tw_lock(ctx, locked);
+		hash_del(&req->hash_node);
+		io_req_complete_state(req, req->cqe.res, 0);
+		io_req_add_compl_list(req);
+	} else {
+		spin_lock(&ctx->completion_lock);
+		hash_del(&req->hash_node);
+		__io_req_complete_post(req, req->cqe.res, 0);
+		io_commit_cqring(ctx);
+		spin_unlock(&ctx->completion_lock);
+		io_cqring_ev_posted(ctx);
+	}
  }
  
  static void io_apoll_task_func(struct io_kiocb *req, bool *locked)
@@ -7138,9 +7146,14 @@ static void io_apoll_task_func(struct io_kiocb *req, bool *locked)
  		return;
  
  	io_poll_remove_entries(req);
-	spin_lock(&ctx->completion_lock);
-	hash_del(&req->hash_node);
-	spin_unlock(&ctx->completion_lock);
+	if (ctx->flags & IORING_MUTEX_HASH) {
+		io_tw_lock(ctx, locked);
+		hash_del(&req->hash_node);
+	} else {
+		spin_lock(&ctx->completion_lock);
+		hash_del(&req->hash_node);
+		spin_unlock(&ctx->completion_lock);
+	}
  
  	if (!ret)
  		io_req_task_submit(req, locked);
@@ -7332,9 +7345,13 @@ static int __io_arm_poll_handler(struct io_kiocb *req,
  		return 0;
  	}
  
-	spin_lock(&ctx->completion_lock);
-	io_poll_req_insert(req);
-	spin_unlock(&ctx->completion_lock);
+	if (ctx->flags & IORING_MUTEX_HASH) {
+		io_poll_req_insert(req);
+	} else {
+		spin_lock(&ctx->completion_lock);
+		io_poll_req_insert(req);
+		spin_unlock(&ctx->completion_lock);
+	}
  
  	if (mask) {
  		/* can't multishot if failed, just queue the event we've got */


* Re: [PATCH 2/2] io_uring: switch cancel_hash to use per list spinlock
  2022-05-29 22:50         ` Pavel Begunkov
@ 2022-05-29 23:34           ` Jens Axboe
  2022-05-30  0:18             ` Pavel Begunkov
  2022-05-30  6:38           ` Hao Xu
  1 sibling, 1 reply; 15+ messages in thread
From: Jens Axboe @ 2022-05-29 23:34 UTC (permalink / raw)
  To: Pavel Begunkov, Hao Xu, io-uring

On 5/29/22 4:50 PM, Pavel Begunkov wrote:
> On 5/29/22 19:40, Jens Axboe wrote:
>> On 5/29/22 12:07 PM, Hao Xu wrote:
>>> On 5/30/22 00:25, Jens Axboe wrote:
>>>> On 5/29/22 10:20 AM, Hao Xu wrote:
>>>>> From: Hao Xu <howeyxu@tencent.com>
>>>>>
>>>>> From: Hao Xu <howeyxu@tencent.com>
>>>>>
>>>>> Use per list lock for cancel_hash, this removes some completion lock
>>>>> invocation and remove contension between different cancel_hash entries
>>>>
>>>> Interesting, do you have any numbers on this?
>>>
>>> Just Theoretically for now, I'll do some tests tomorrow. This is
>>> actually RFC, forgot to change the subject.
>>>
>>>>
>>>> Also, I'd make a hash bucket struct:
>>>>
>>>> struct io_hash_bucket {
>>>>      spinlock_t lock;
>>>>      struct hlist_head list;
>>>> };
>>>>
>>>> rather than two separate structs, that'll have nicer memory locality too
>>>> and should further improve it. Could be done as a prep patch with the
>>>> old locking in place, making the end patch doing the per-bucket lock
>>>> simpler as well.
>>>
>>> Sure, if the test number make sense, I'll send v2. I'll test the
>>> hlist_bl list as well(the comment of it says it is much slower than
>>> normal spin_lock, but we may not care the efficiency of poll
>>> cancellation very much?).
>>
>> I don't think the bit spinlocks are going to be useful, we should
>> stick with a spinlock for this. They are indeed slower and generally not
>> used for that reason. For a use case where you need a ton of locks and
>> saving the 4 bytes for a spinlock would make sense (or maybe not
>> changing some struct?), maybe they have a purpose. But not for this.
> 
> We can put the cancel hashes under uring_lock and completely kill
> the hash spinlocking (2 lock/unlock pairs per single-shot). The code
> below won't even compile and missing cancellation bits, I'll pick it
> up in a week.
> 
> Even better would be to have two hash tables, and auto-magically apply
> the feature to SINGLE_SUBMITTER, SQPOLL (both will have uring_lock held)
> and apoll (need uring_lock after anyway).

My hope was that it'd take us closer to being able to use more granular
locking for hashing in general. I don't care too much about the
cancelation, but the normal hash locking would be useful to do.

However, for cancelations, under uring_lock would indeed be preferable
to doing per-bucket locks there. Guess I'll wait and see what your final
patch looks like, not sure why it'd be a ctx conditional?

What about io_poll_remove_all()?

-- 
Jens Axboe



* Re: [PATCH 2/2] io_uring: switch cancel_hash to use per list spinlock
  2022-05-29 23:34           ` Jens Axboe
@ 2022-05-30  0:18             ` Pavel Begunkov
  2022-05-30  6:52               ` Hao Xu
  0 siblings, 1 reply; 15+ messages in thread
From: Pavel Begunkov @ 2022-05-30  0:18 UTC (permalink / raw)
  To: Jens Axboe, Hao Xu, io-uring

On 5/30/22 00:34, Jens Axboe wrote:
> On 5/29/22 4:50 PM, Pavel Begunkov wrote:
>> On 5/29/22 19:40, Jens Axboe wrote:
>>> On 5/29/22 12:07 PM, Hao Xu wrote:
>>>> On 5/30/22 00:25, Jens Axboe wrote:
>>>>> On 5/29/22 10:20 AM, Hao Xu wrote:
>>>>>> From: Hao Xu <howeyxu@tencent.com>
>>>>>>
>>>>>> From: Hao Xu <howeyxu@tencent.com>
>>>>>>
>>>>>> Use per list lock for cancel_hash, this removes some completion lock
>>>>>> invocation and remove contension between different cancel_hash entries
>>>>>
>>>>> Interesting, do you have any numbers on this?
>>>>
>>>> Just Theoretically for now, I'll do some tests tomorrow. This is
>>>> actually RFC, forgot to change the subject.
>>>>
>>>>>
>>>>> Also, I'd make a hash bucket struct:
>>>>>
>>>>> struct io_hash_bucket {
>>>>>       spinlock_t lock;
>>>>>       struct hlist_head list;
>>>>> };
>>>>>
>>>>> rather than two separate structs, that'll have nicer memory locality too
>>>>> and should further improve it. Could be done as a prep patch with the
>>>>> old locking in place, making the end patch doing the per-bucket lock
>>>>> simpler as well.
>>>>
>>>> Sure, if the test number make sense, I'll send v2. I'll test the
>>>> hlist_bl list as well(the comment of it says it is much slower than
>>>> normal spin_lock, but we may not care the efficiency of poll
>>>> cancellation very much?).
>>>
>>> I don't think the bit spinlocks are going to be useful, we should
>>> stick with a spinlock for this. They are indeed slower and generally not
>>> used for that reason. For a use case where you need a ton of locks and
>>> saving the 4 bytes for a spinlock would make sense (or maybe not
>>> changing some struct?), maybe they have a purpose. But not for this.
>>
>> We can put the cancel hashes under uring_lock and completely kill
>> the hash spinlocking (2 lock/unlock pairs per single-shot). The code
>> below won't even compile and missing cancellation bits, I'll pick it
>> up in a week.
>>
>> Even better would be to have two hash tables, and auto-magically apply
>> the feature to SINGLE_SUBMITTER, SQPOLL (both will have uring_lock held)
>> and apoll (need uring_lock after anyway).
> 
> My hope was that it'd take us closer to being able to use more granular
> locking for hashing in general. I don't care too much about the
> cancelation, but the normal hash locking would be useful to do.
> 
> However, for cancelations, under uring_lock would indeed be preferable
> to doing per-bucket locks there. Guess I'll wait and see what your final
> patch looks like, not sure why it'd be a ctx conditional?

It replaces 2 spin lock/unlock pairs with one io_tw_lock() in the
completion path, which is done once per tw batch and is grabbed anyway
if there is no contention (see handle_tw_list()).

It could be unconditional, but I'd say for those 3 cases there is
practically no chance of regressing perf/latency, whereas I can think
of some cases where it might hurt latencies, all of them sharing an
io_uring between threads.

It should benefit the cancellation path as well, but I don't care much
about that either.

> What about io_poll_remove_all()?

As mentioned, it's not handled in the diff, but easily doable,
it should just traverse both hash tables.

-- 
Pavel Begunkov


* Re: [PATCH 2/2] io_uring: switch cancel_hash to use per list spinlock
  2022-05-29 22:50         ` Pavel Begunkov
  2022-05-29 23:34           ` Jens Axboe
@ 2022-05-30  6:38           ` Hao Xu
  2022-05-30  6:59             ` Hao Xu
  2022-05-30  9:39             ` Pavel Begunkov
  1 sibling, 2 replies; 15+ messages in thread
From: Hao Xu @ 2022-05-30  6:38 UTC (permalink / raw)
  To: Pavel Begunkov, Jens Axboe, io-uring

On 5/30/22 06:50, Pavel Begunkov wrote:
> On 5/29/22 19:40, Jens Axboe wrote:
>> On 5/29/22 12:07 PM, Hao Xu wrote:
>>> On 5/30/22 00:25, Jens Axboe wrote:
>>>> On 5/29/22 10:20 AM, Hao Xu wrote:
>>>>> From: Hao Xu <howeyxu@tencent.com>
>>>>>
>>>>> From: Hao Xu <howeyxu@tencent.com>
>>>>>
>>>>> Use per list lock for cancel_hash, this removes some completion lock
>>>>> invocation and remove contension between different cancel_hash entries
>>>>
>>>> Interesting, do you have any numbers on this?
>>>
>>> Just Theoretically for now, I'll do some tests tomorrow. This is
>>> actually RFC, forgot to change the subject.
>>>
>>>>
>>>> Also, I'd make a hash bucket struct:
>>>>
>>>> struct io_hash_bucket {
>>>>      spinlock_t lock;
>>>>      struct hlist_head list;
>>>> };
>>>>
>>>> rather than two separate structs, that'll have nicer memory locality 
>>>> too
>>>> and should further improve it. Could be done as a prep patch with the
>>>> old locking in place, making the end patch doing the per-bucket lock
>>>> simpler as well.
>>>
>>> Sure, if the test number make sense, I'll send v2. I'll test the
>>> hlist_bl list as well(the comment of it says it is much slower than
>>> normal spin_lock, but we may not care the efficiency of poll
>>> cancellation very much?).
>>
>> I don't think the bit spinlocks are going to be useful, we should
>> stick with a spinlock for this. They are indeed slower and generally not
>> used for that reason. For a use case where you need a ton of locks and
>> saving the 4 bytes for a spinlock would make sense (or maybe not
>> changing some struct?), maybe they have a purpose. But not for this.
> 
> We can put the cancel hashes under uring_lock and completely kill
> the hash spinlocking (2 lock/unlock pairs per single-shot). The code
> below won't even compile and missing cancellation bits, I'll pick it
> up in a week.
> 
> Even better would be to have two hash tables, and auto-magically apply
> the feature to SINGLE_SUBMITTER, SQPOLL (both will have uring_lock held)
> and apoll (need uring_lock after anyway).
> 
> 
> diff --git a/fs/io_uring.c b/fs/io_uring.c
> index 6be21967959d..191fa7f31610 100644
> --- a/fs/io_uring.c
> +++ b/fs/io_uring.c
> @@ -7120,12 +7120,20 @@ static void io_poll_task_func(struct io_kiocb 
> *req, bool *locked)
>       }
> 
>       io_poll_remove_entries(req);
> -    spin_lock(&ctx->completion_lock);
> -    hash_del(&req->hash_node);
> -    __io_req_complete_post(req, req->cqe.res, 0);
> -    io_commit_cqring(ctx);
> -    spin_unlock(&ctx->completion_lock);
> -    io_cqring_ev_posted(ctx);
> +
> +    if (ctx->flags & IORING_MUTEX_HASH) {
> +        io_tw_lock(ctx, locked);
> +        hash_del(&req->hash_node);
> +        io_req_complete_state(req, req->cqe.res, 0);
> +        io_req_add_compl_list(req);
> +    } else {
> +        spin_lock(&ctx->completion_lock);
> +        hash_del(&req->hash_node);
> +        __io_req_complete_post(req, req->cqe.res, 0);
> +        io_commit_cqring(ctx);
> +        spin_unlock(&ctx->completion_lock);
> +        io_cqring_ev_posted(ctx);
> +    }
>   }
> 
>   static void io_apoll_task_func(struct io_kiocb *req, bool *locked)
> @@ -7138,9 +7146,14 @@ static void io_apoll_task_func(struct io_kiocb 
> *req, bool *locked)
>           return;
> 
>       io_poll_remove_entries(req);
> -    spin_lock(&ctx->completion_lock);
> -    hash_del(&req->hash_node);
> -    spin_unlock(&ctx->completion_lock);
> +    if (ctx->flags & IORING_MUTEX_HASH) {
> +        io_tw_lock(ctx, locked);
> +        hash_del(&req->hash_node);
> +    } else {
> +        spin_lock(&ctx->completion_lock);
> +        hash_del(&req->hash_node);
> +        spin_unlock(&ctx->completion_lock);
> +    }
> 
>       if (!ret)
>           io_req_task_submit(req, locked);
> @@ -7332,9 +7345,13 @@ static int __io_arm_poll_handler(struct io_kiocb 
> *req,
>           return 0;
>       }
> 
> -    spin_lock(&ctx->completion_lock);
> -    io_poll_req_insert(req);
> -    spin_unlock(&ctx->completion_lock);
> +    if (ctx->flags & IORING_MUTEX_HASH) {

Does IORING_MUTEX_HASH exclude IOSQE_ASYNC as well? IOSQE_ASYNC is
uncommon, but users may still use it.

> +        io_poll_req_insert(req);
> +    } else {
> +        spin_lock(&ctx->completion_lock);
> +        io_poll_req_insert(req);
> +        spin_unlock(&ctx->completion_lock);
> +    }
> 
>       if (mask) {
>           /* can't multishot if failed, just queue the event we've got */



* Re: [PATCH 2/2] io_uring: switch cancel_hash to use per list spinlock
  2022-05-30  0:18             ` Pavel Begunkov
@ 2022-05-30  6:52               ` Hao Xu
  2022-05-30  9:35                 ` Pavel Begunkov
  0 siblings, 1 reply; 15+ messages in thread
From: Hao Xu @ 2022-05-30  6:52 UTC (permalink / raw)
  To: Pavel Begunkov, Jens Axboe, io-uring

On 5/30/22 08:18, Pavel Begunkov wrote:
> On 5/30/22 00:34, Jens Axboe wrote:
>> On 5/29/22 4:50 PM, Pavel Begunkov wrote:
>>> On 5/29/22 19:40, Jens Axboe wrote:
>>>> On 5/29/22 12:07 PM, Hao Xu wrote:
>>>>> On 5/30/22 00:25, Jens Axboe wrote:
>>>>>> On 5/29/22 10:20 AM, Hao Xu wrote:
>>>>>>> From: Hao Xu <howeyxu@tencent.com>
>>>>>>>
>>>>>>> From: Hao Xu <howeyxu@tencent.com>
>>>>>>>
>>>>>>> Use per list lock for cancel_hash, this removes some completion lock
>>>>>>> invocation and remove contension between different cancel_hash 
>>>>>>> entries
>>>>>>
>>>>>> Interesting, do you have any numbers on this?
>>>>>
>>>>> Just Theoretically for now, I'll do some tests tomorrow. This is
>>>>> actually RFC, forgot to change the subject.
>>>>>
>>>>>>
>>>>>> Also, I'd make a hash bucket struct:
>>>>>>
>>>>>> struct io_hash_bucket {
>>>>>>       spinlock_t lock;
>>>>>>       struct hlist_head list;
>>>>>> };
>>>>>>
>>>>>> rather than two separate structs, that'll have nicer memory 
>>>>>> locality too
>>>>>> and should further improve it. Could be done as a prep patch with the
>>>>>> old locking in place, making the end patch doing the per-bucket lock
>>>>>> simpler as well.
>>>>>
>>>>> Sure, if the test number make sense, I'll send v2. I'll test the
>>>>> hlist_bl list as well(the comment of it says it is much slower than
>>>>> normal spin_lock, but we may not care the efficiency of poll
>>>>> cancellation very much?).
>>>>
>>>> I don't think the bit spinlocks are going to be useful, we should
>>>> stick with a spinlock for this. They are indeed slower and generally 
>>>> not
>>>> used for that reason. For a use case where you need a ton of locks and
>>>> saving the 4 bytes for a spinlock would make sense (or maybe not
>>>> changing some struct?), maybe they have a purpose. But not for this.
>>>
>>> We can put the cancel hashes under uring_lock and completely kill
>>> the hash spinlocking (2 lock/unlock pairs per single-shot). The code
>>> below won't even compile and missing cancellation bits, I'll pick it
>>> up in a week.
>>>
>>> Even better would be to have two hash tables, and auto-magically apply
>>> the feature to SINGLE_SUBMITTER, SQPOLL (both will have uring_lock held)
>>> and apoll (need uring_lock after anyway).
>>
>> My hope was that it'd take us closer to being able to use more granular
>> locking for hashing in general. I don't care too much about the
>> cancelation, but the normal hash locking would be useful to do.
>>
>> However, for cancelations, under uring_lock would indeed be preferable
>> to doing per-bucket locks there. Guess I'll wait and see what your final
>> patch looks like, not sure why it'd be a ctx conditional?
> 
> It replaces 2 spin lock/unlock with one io_tw_lock() in the completion
> path, which is done once per tw batch and grabbed anyway if
> there is no contention (see handle_tw_list()).
> 
> It could be unconditional, but I'd say for those 3 cases we have
> non-existing chance to regress perf/latency, but I can think of
> some cases where it might screw latencies, all share io_uring
> b/w threads.
> 
> Should benefit the cancellation path as well, but I don't care
> about it as well.
> 
>> What about io_poll_remove_all()?
> 
> As mentioned, it's not handled in the diff, but easily doable,
> it should just traverse both hash tables.
> 

Two hash tables look good to me. If I understand you correctly: one
table under uring_lock, the other one for normal handling (like
per-bucket locking)?



* Re: [PATCH 2/2] io_uring: switch cancel_hash to use per list spinlock
  2022-05-30  6:38           ` Hao Xu
@ 2022-05-30  6:59             ` Hao Xu
  2022-05-30  9:39             ` Pavel Begunkov
  1 sibling, 0 replies; 15+ messages in thread
From: Hao Xu @ 2022-05-30  6:59 UTC (permalink / raw)
  To: Pavel Begunkov, Jens Axboe, io-uring

On 5/30/22 14:38, Hao Xu wrote:
> On 5/30/22 06:50, Pavel Begunkov wrote:
>> On 5/29/22 19:40, Jens Axboe wrote:
>>> On 5/29/22 12:07 PM, Hao Xu wrote:
>>>> On 5/30/22 00:25, Jens Axboe wrote:
>>>>> On 5/29/22 10:20 AM, Hao Xu wrote:
>>>>>> From: Hao Xu <howeyxu@tencent.com>
>>>>>>
>>>>>> From: Hao Xu <howeyxu@tencent.com>
>>>>>>
>>>>>> Use per list lock for cancel_hash, this removes some completion lock
>>>>>> invocation and remove contension between different cancel_hash 
>>>>>> entries
>>>>>
>>>>> Interesting, do you have any numbers on this?
>>>>
>>>> Just Theoretically for now, I'll do some tests tomorrow. This is
>>>> actually RFC, forgot to change the subject.
>>>>
>>>>>
>>>>> Also, I'd make a hash bucket struct:
>>>>>
>>>>> struct io_hash_bucket {
>>>>>      spinlock_t lock;
>>>>>      struct hlist_head list;
>>>>> };
>>>>>
>>>>> rather than two separate structs, that'll have nicer memory 
>>>>> locality too
>>>>> and should further improve it. Could be done as a prep patch with the
>>>>> old locking in place, making the end patch doing the per-bucket lock
>>>>> simpler as well.
>>>>
>>>> Sure, if the test number make sense, I'll send v2. I'll test the
>>>> hlist_bl list as well(the comment of it says it is much slower than
>>>> normal spin_lock, but we may not care the efficiency of poll
>>>> cancellation very much?).
>>>
>>> I don't think the bit spinlocks are going to be useful, we should
>>> stick with a spinlock for this. They are indeed slower and generally not
>>> used for that reason. For a use case where you need a ton of locks and
>>> saving the 4 bytes for a spinlock would make sense (or maybe not
>>> changing some struct?), maybe they have a purpose. But not for this.
>>
>> We can put the cancel hashes under uring_lock and completely kill
>> the hash spinlocking (2 lock/unlock pairs per single-shot). The code
>> below won't even compile and missing cancellation bits, I'll pick it
>> up in a week.
>>
>> Even better would be to have two hash tables, and auto-magically apply
>> the feature to SINGLE_SUBMITTER, SQPOLL (both will have uring_lock held)
>> and apoll (need uring_lock after anyway).
>>
>>
>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>> index 6be21967959d..191fa7f31610 100644
>> --- a/fs/io_uring.c
>> +++ b/fs/io_uring.c
>> @@ -7120,12 +7120,20 @@ static void io_poll_task_func(struct io_kiocb 
>> *req, bool *locked)
>>       }
>>
>>       io_poll_remove_entries(req);
>> -    spin_lock(&ctx->completion_lock);
>> -    hash_del(&req->hash_node);
>> -    __io_req_complete_post(req, req->cqe.res, 0);
>> -    io_commit_cqring(ctx);
>> -    spin_unlock(&ctx->completion_lock);
>> -    io_cqring_ev_posted(ctx);
>> +
>> +    if (ctx->flags & IORING_MUTEX_HASH) {
>> +        io_tw_lock(ctx, locked);
>> +        hash_del(&req->hash_node);
>> +        io_req_complete_state(req, req->cqe.res, 0);
>> +        io_req_add_compl_list(req);
>> +    } else {
>> +        spin_lock(&ctx->completion_lock);
>> +        hash_del(&req->hash_node);
>> +        __io_req_complete_post(req, req->cqe.res, 0);
>> +        io_commit_cqring(ctx);
>> +        spin_unlock(&ctx->completion_lock);
>> +        io_cqring_ev_posted(ctx);
>> +    }
>>   }
>>
>>   static void io_apoll_task_func(struct io_kiocb *req, bool *locked)
>> @@ -7138,9 +7146,14 @@ static void io_apoll_task_func(struct io_kiocb 
>> *req, bool *locked)
>>           return;
>>
>>       io_poll_remove_entries(req);
>> -    spin_lock(&ctx->completion_lock);
>> -    hash_del(&req->hash_node);
>> -    spin_unlock(&ctx->completion_lock);
>> +    if (ctx->flags & IORING_MUTEX_HASH) {
>> +        io_tw_lock(ctx, locked);
>> +        hash_del(&req->hash_node);
>> +    } else {
>> +        spin_lock(&ctx->completion_lock);
>> +        hash_del(&req->hash_node);
>> +        spin_unlock(&ctx->completion_lock);
>> +    }
>>
>>       if (!ret)
>>           io_req_task_submit(req, locked);
>> @@ -7332,9 +7345,13 @@ static int __io_arm_poll_handler(struct 
>> io_kiocb *req,
>>           return 0;
>>       }
>>
>> -    spin_lock(&ctx->completion_lock);
>> -    io_poll_req_insert(req);
>> -    spin_unlock(&ctx->completion_lock);
>> +    if (ctx->flags & IORING_MUTEX_HASH) {
> 
> Does IORING_MUTEX_HASH exclude IOSQE_ASYNC as well? though IOSQE_ASYNC
> is uncommon but users may do that.

I see, cases like IOSQE_ASYNC definitely go into the else branch since
they run in io-wq context.

> 
>> +        io_poll_req_insert(req);
>> +    } else {
>> +        spin_lock(&ctx->completion_lock);
>> +        io_poll_req_insert(req);
>> +        spin_unlock(&ctx->completion_lock);
>> +    }
>>
>>       if (mask) {
>>           /* can't multishot if failed, just queue the event we've got */
> 



* Re: [PATCH 2/2] io_uring: switch cancel_hash to use per list spinlock
  2022-05-30  6:52               ` Hao Xu
@ 2022-05-30  9:35                 ` Pavel Begunkov
  0 siblings, 0 replies; 15+ messages in thread
From: Pavel Begunkov @ 2022-05-30  9:35 UTC (permalink / raw)
  To: Hao Xu, Jens Axboe, io-uring

On 5/30/22 07:52, Hao Xu wrote:
> On 5/30/22 08:18, Pavel Begunkov wrote:
>> On 5/30/22 00:34, Jens Axboe wrote:
>>> On 5/29/22 4:50 PM, Pavel Begunkov wrote:
>>>> On 5/29/22 19:40, Jens Axboe wrote:
>>>>> On 5/29/22 12:07 PM, Hao Xu wrote:
>>>>>> On 5/30/22 00:25, Jens Axboe wrote:
>>>>>>> On 5/29/22 10:20 AM, Hao Xu wrote:
>>>>>>>> From: Hao Xu <howeyxu@tencent.com>
>>>>>>>>
>>>>>>>> From: Hao Xu <howeyxu@tencent.com>
>>>>>>>>
>>>>>>>> Use per list lock for cancel_hash, this removes some completion lock
>>>>>>>> invocation and remove contension between different cancel_hash entries
>>>>>>>
>>>>>>> Interesting, do you have any numbers on this?
>>>>>>
>>>>>> Just Theoretically for now, I'll do some tests tomorrow. This is
>>>>>> actually RFC, forgot to change the subject.
>>>>>>
>>>>>>>
>>>>>>> Also, I'd make a hash bucket struct:
>>>>>>>
>>>>>>> struct io_hash_bucket {
>>>>>>>       spinlock_t lock;
>>>>>>>       struct hlist_head list;
>>>>>>> };
>>>>>>>
>>>>>>> rather than two separate structs, that'll have nicer memory locality too
>>>>>>> and should further improve it. Could be done as a prep patch with the
>>>>>>> old locking in place, making the end patch doing the per-bucket lock
>>>>>>> simpler as well.
>>>>>>
>>>>>> Sure, if the test number make sense, I'll send v2. I'll test the
>>>>>> hlist_bl list as well(the comment of it says it is much slower than
>>>>>> normal spin_lock, but we may not care the efficiency of poll
>>>>>> cancellation very much?).
>>>>>
>>>>> I don't think the bit spinlocks are going to be useful, we should
>>>>> stick with a spinlock for this. They are indeed slower and generally not
>>>>> used for that reason. For a use case where you need a ton of locks and
>>>>> saving the 4 bytes for a spinlock would make sense (or maybe not
>>>>> changing some struct?), maybe they have a purpose. But not for this.
>>>>
>>>> We can put the cancel hashes under uring_lock and completely kill
>>>> the hash spinlocking (2 lock/unlock pairs per single-shot). The code
>>>> below won't even compile and missing cancellation bits, I'll pick it
>>>> up in a week.
>>>>
>>>> Even better would be to have two hash tables, and auto-magically apply
>>>> the feature to SINGLE_SUBMITTER, SQPOLL (both will have uring_lock held)
>>>> and apoll (need uring_lock after anyway).
>>>
>>> My hope was that it'd take us closer to being able to use more granular
>>> locking for hashing in general. I don't care too much about the
>>> cancelation, but the normal hash locking would be useful to do.
>>>
>>> However, for cancelations, under uring_lock would indeed be preferable
>>> to doing per-bucket locks there. Guess I'll wait and see what your final
>>> patch looks like, not sure why it'd be a ctx conditional?
>>
>> It replaces 2 spin lock/unlock with one io_tw_lock() in the completion
>> path, which is done once per tw batch and grabbed anyway if
>> there is no contention (see handle_tw_list()).
>>
>> It could be unconditional, but I'd say for those 3 cases we have
>> non-existing chance to regress perf/latency, but I can think of
>> some cases where it might screw latencies, all share io_uring
>> b/w threads.
>>
>> Should benefit the cancellation path as well, but I don't care
>> about it as well.
>>
>>> What about io_poll_remove_all()?
>>
>> As mentioned, it's not handled in the diff, but easily doable,
>> it should just traverse both hash tables.
>>
> 
> Two hash tables looks good to me. If I don't get you wrong, one table
> under uring_lock, the other one for normal handling(like per bucket
> locking)?

Right, that's the plan

-- 
Pavel Begunkov
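
As a rough illustration of the two-table plan agreed on above (all names
here are assumptions for the sketch, not from a posted patch), the ring
context would carry one table protected purely by uring_lock and one with
per-bucket spinlocks:

/* struct io_hash_bucket is the lock+list pair suggested earlier */
struct io_ring_ctx {
        /* ... */
        /* per-bucket spinlocks, for requests not serialised by uring_lock */
        struct io_hash_bucket   *cancel_hash;
        /* touched only with uring_lock held (SINGLE_SUBMITTER, SQPOLL, apoll) */
        struct hlist_head       *cancel_hash_locked;
        unsigned                cancel_hash_bits;
        /* ... */
};

Cancellation paths such as io_poll_remove_all() would then traverse both
tables, as noted above.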


* Re: [PATCH 2/2] io_uring: switch cancel_hash to use per list spinlock
  2022-05-30  6:38           ` Hao Xu
  2022-05-30  6:59             ` Hao Xu
@ 2022-05-30  9:39             ` Pavel Begunkov
  1 sibling, 0 replies; 15+ messages in thread
From: Pavel Begunkov @ 2022-05-30  9:39 UTC (permalink / raw)
  To: Hao Xu, Jens Axboe, io-uring

On 5/30/22 07:38, Hao Xu wrote:
> On 5/30/22 06:50, Pavel Begunkov wrote:
>> On 5/29/22 19:40, Jens Axboe wrote:
>>> On 5/29/22 12:07 PM, Hao Xu wrote:
>>>> On 5/30/22 00:25, Jens Axboe wrote:
>>>>> On 5/29/22 10:20 AM, Hao Xu wrote:
>>>>>> From: Hao Xu <howeyxu@tencent.com>
>>>>>>
>>>>>> From: Hao Xu <howeyxu@tencent.com>
>>>>>>
>>>>>> Use per list lock for cancel_hash, this removes some completion lock
>>>>>> invocation and remove contension between different cancel_hash entries
>>>>>
>>>>> Interesting, do you have any numbers on this?
>>>>
>>>> Just Theoretically for now, I'll do some tests tomorrow. This is
>>>> actually RFC, forgot to change the subject.
>>>>
>>>>>
>>>>> Also, I'd make a hash bucket struct:
>>>>>
>>>>> struct io_hash_bucket {
>>>>>      spinlock_t lock;
>>>>>      struct hlist_head list;
>>>>> };
>>>>>
>>>>> rather than two separate structs, that'll have nicer memory locality too
>>>>> and should further improve it. Could be done as a prep patch with the
>>>>> old locking in place, making the end patch doing the per-bucket lock
>>>>> simpler as well.
>>>>
>>>> Sure, if the test number make sense, I'll send v2. I'll test the
>>>> hlist_bl list as well(the comment of it says it is much slower than
>>>> normal spin_lock, but we may not care the efficiency of poll
>>>> cancellation very much?).
>>>
>>> I don't think the bit spinlocks are going to be useful, we should
>>> stick with a spinlock for this. They are indeed slower and generally not
>>> used for that reason. For a use case where you need a ton of locks and
>>> saving the 4 bytes for a spinlock would make sense (or maybe not
>>> changing some struct?), maybe they have a purpose. But not for this.
>>
>> We can put the cancel hashes under uring_lock and completely kill
>> the hash spinlocking (2 lock/unlock pairs per single-shot). The code
>> below won't even compile and missing cancellation bits, I'll pick it
>> up in a week.
>>
>> Even better would be to have two hash tables, and auto-magically apply
>> the feature to SINGLE_SUBMITTER, SQPOLL (both will have uring_lock held)
>> and apoll (need uring_lock after anyway).
>>
>> @@ -7332,9 +7345,13 @@ static int __io_arm_poll_handler(struct io_kiocb *req,
>>           return 0;
>>       }
>>
>> -    spin_lock(&ctx->completion_lock);
>> -    io_poll_req_insert(req);
>> -    spin_unlock(&ctx->completion_lock);
>> +    if (ctx->flags & IORING_MUTEX_HASH) {
> 
> Does IORING_MUTEX_HASH exclude IOSQE_ASYNC as well? though IOSQE_ASYNC
> is uncommon but users may do that.

It should; io-wq -> arm_poll() doesn't hold uring_lock, good point.


-- 
Pavel Begunkov


* Re: [PATCH 2/2] io_uring: switch cancel_hash to use per list spinlock
  2022-05-29 16:25   ` Jens Axboe
  2022-05-29 18:07     ` Hao Xu
@ 2022-05-30 13:33     ` Hao Xu
  1 sibling, 0 replies; 15+ messages in thread
From: Hao Xu @ 2022-05-30 13:33 UTC (permalink / raw)
  To: Jens Axboe, io-uring; +Cc: Pavel Begunkov

On 5/30/22 00:25, Jens Axboe wrote:
> On 5/29/22 10:20 AM, Hao Xu wrote:
>> From: Hao Xu <howeyxu@tencent.com>
>>
>> From: Hao Xu <howeyxu@tencent.com>
>>
>> Use per list lock for cancel_hash, this removes some completion lock
>> invocation and remove contension between different cancel_hash entries
> 
> Interesting, do you have any numbers on this?
> 
> Also, I'd make a hash bucket struct:
> 
> struct io_hash_bucket {
> 	spinlock_t lock;
> 	struct hlist_head list;
> };
> 
> rather than two separate structs, that'll have nicer memory locality too
> and should further improve it. Could be done as a prep patch with the
> old locking in place, making the end patch doing the per-bucket lock
> simpler as well.
> 
> Hmm?
> 

I've done a v2 here, along with a test which issues async polls densely
to generate high-frequency cancel_hash[] accesses. But I won't have a
real box with a large number of CPUs suitable for testing until
tomorrow, so I'll test it then.

https://github.com/HowHsu/linux/commits/for-5.20/io_uring_hash_lock

https://github.com/HowHsu/liburing/commit/b9fb4d20a5dfe7c7bd62fe36c37aea3b261d4499
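
For reference, a stress test of that shape could look roughly like the
sketch below. It is not the test from the linked branch, just an
illustration built on standard liburing calls, assuming a liburing recent
enough that io_uring_prep_poll_remove() takes the target user_data as a
64-bit value:

#include <liburing.h>
#include <poll.h>
#include <unistd.h>

#define NR_POLLS        4096

int main(void)
{
        struct io_uring ring;
        struct io_uring_cqe *cqe;
        int fds[2], i;

        if (pipe(fds) < 0 || io_uring_queue_init(2 * NR_POLLS, &ring, 0) < 0)
                return 1;

        /* arm many poll requests to densely populate cancel_hash[] */
        for (i = 0; i < NR_POLLS; i++) {
                struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);

                io_uring_prep_poll_add(sqe, fds[0], POLLIN);
                sqe->user_data = i + 1;
        }
        io_uring_submit(&ring);

        /* cancel them again, hammering the hash lookup/removal paths */
        for (i = 0; i < NR_POLLS; i++) {
                struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);

                io_uring_prep_poll_remove(sqe, i + 1);
                sqe->user_data = NR_POLLS + i + 1;
        }
        io_uring_submit(&ring);

        /* one CQE per poll_remove plus one per cancelled poll_add */
        for (i = 0; i < 2 * NR_POLLS; i++) {
                if (io_uring_wait_cqe(&ring, &cqe))
                        break;
                io_uring_cqe_seen(&ring, cqe);
        }
        io_uring_queue_exit(&ring);
        return 0;
}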

