* [PATCH v3 0/3] improvements for multi-shot poll requests
@ 2021-10-25  5:38 Xiaoguang Wang
  2021-10-25  5:38 ` [PATCH v3 1/3] io_uring: refactor event check out of __io_async_wake() Xiaoguang Wang
                   ` (4 more replies)
  0 siblings, 5 replies; 19+ messages in thread
From: Xiaoguang Wang @ 2021-10-25  5:38 UTC (permalink / raw)
  To: io-uring; +Cc: axboe, asml.silence

Echo_server codes can be cloned from:
https://codeup.openanolis.cn/codeup/storage/io_uring-echo-server.git
The branch is xiaoguangwang/io_uring_multishot. There is a simple HOWTO
in this repository.

Usage:
In server: port 10016, 1000 connections, packet size 16 bytes, and
enable fixed files.
  taskset -c 10 io_uring_echo_server_multi_shot  -f -p 10016 -n 1000 -l 16

In client:
  taskset -c 13,14,15,16 ./echo -addr 11.238.147.21:10016 -n 1000 -size 16
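
For context, a multi-shot poll request is armed from userspace roughly as in
the minimal sketch below. It assumes liburing provides the
io_uring_prep_poll_multishot() helper (check your liburing version) and is
not taken from the echo server above:

  #include <liburing.h>
  #include <poll.h>

  /* One SQE keeps producing CQEs (with IORING_CQE_F_MORE set) until the
   * request errors out or is cancelled. */
  static int arm_multishot_poll(struct io_uring *ring, int fd)
  {
          struct io_uring_sqe *sqe = io_uring_get_sqe(ring);

          if (!sqe)
                  return -1;
          io_uring_prep_poll_multishot(sqe, fd, POLLIN);
          sqe->user_data = (unsigned long) fd;
          return io_uring_submit(ring);
  }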

Before this patchset, the tps is like below:
1:15:53 req: 1430425, req/s: 286084.693
11:15:58 req: 1426021, req/s: 285204.079
11:16:03 req: 1416761, req/s: 283352.146
11:16:08 req: 1417969, req/s: 283165.637
11:16:13 req: 1424591, req/s: 285349.915
11:16:18 req: 1418706, req/s: 283738.725
11:16:23 req: 1411988, req/s: 282399.052
11:16:28 req: 1419097, req/s: 283820.477
11:16:33 req: 1417816, req/s: 283563.262
11:16:38 req: 1422461, req/s: 284491.702
11:16:43 req: 1418176, req/s: 283635.327
11:16:48 req: 1414525, req/s: 282905.276
11:16:53 req: 1415624, req/s: 283124.140
11:16:58 req: 1426435, req/s: 284970.486

with this patchset:
2021/09/24 11:10:01 start to do client
11:10:06 req: 1444979, req/s: 288995.300
11:10:11 req: 1442559, req/s: 288511.689
11:10:16 req: 1427253, req/s: 285450.390
11:10:21 req: 1445236, req/s: 288349.853
11:10:26 req: 1423949, req/s: 285480.941
11:10:31 req: 1445304, req/s: 289060.815
11:10:36 req: 1441036, req/s: 288207.119
11:10:41 req: 1441117, req/s: 288220.695
11:10:46 req: 1441451, req/s: 288292.731
11:10:51 req: 1438801, req/s: 287759.157
11:10:56 req: 1433227, req/s: 286646.338
11:11:01 req: 1438307, req/s: 287661.577

about a 1.3% tps improvement.

Changes in v3:
  Rebase to for-5.16/io_uring.

Changes in v2:
  I dropped the poll request completion batching patch from V1, since
it shows performance fluctuations; it's hard to say whether it's useful.

Xiaoguang Wang (3):
  io_uring: refactor event check out of __io_async_wake()
  io_uring: reduce frequent add_wait_queue() overhead for multi-shot
    poll request
  io_uring: don't get completion_lock in io_poll_rewait()

 fs/io_uring.c | 131 +++++++++++++++++++++++++++++++---------------------------
 1 file changed, 70 insertions(+), 61 deletions(-)

-- 
2.14.4.44.g2045bb6



* [PATCH v3 1/3] io_uring: refactor event check out of __io_async_wake()
  2021-10-25  5:38 [PATCH v3 0/3] improvements for multi-shot poll requests Xiaoguang Wang
@ 2021-10-25  5:38 ` Xiaoguang Wang
  2021-10-25  9:35   ` Praveen Kumar
  2021-10-25  5:38 ` [PATCH v3 2/3] io_uring: reduce frequent add_wait_queue() overhead for multi-shot poll request Xiaoguang Wang
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 19+ messages in thread
From: Xiaoguang Wang @ 2021-10-25  5:38 UTC (permalink / raw)
  To: io-uring; +Cc: axboe, asml.silence

This is a preparation for the following patch. Also inline
__io_async_wake(), which is simple, to save a function call.

Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
---
 fs/io_uring.c | 20 +++++++++++++-------
 1 file changed, 13 insertions(+), 7 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 736d456e7913..18af9bb9a4bc 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -5228,13 +5228,9 @@ struct io_poll_table {
 	int error;
 };
 
-static int __io_async_wake(struct io_kiocb *req, struct io_poll_iocb *poll,
+static inline int __io_async_wake(struct io_kiocb *req, struct io_poll_iocb *poll,
 			   __poll_t mask, io_req_tw_func_t func)
 {
-	/* for instances that support it check for an event match first: */
-	if (mask && !(mask & poll->events))
-		return 0;
-
 	trace_io_uring_task_add(req->ctx, req->opcode, req->user_data, mask);
 
 	list_del_init(&poll->wait.entry);
@@ -5508,11 +5504,16 @@ static int io_async_wake(struct wait_queue_entry *wait, unsigned mode, int sync,
 {
 	struct io_kiocb *req = wait->private;
 	struct io_poll_iocb *poll = &req->apoll->poll;
+	__poll_t mask = key_to_poll(key);
 
 	trace_io_uring_poll_wake(req->ctx, req->opcode, req->user_data,
 					key_to_poll(key));
 
-	return __io_async_wake(req, poll, key_to_poll(key), io_async_task_func);
+	/* for instances that support it check for an event match first: */
+	if (mask && !(mask & poll->events))
+		return 0;
+
+	return __io_async_wake(req, poll, mask, io_async_task_func);
 }
 
 static void io_poll_req_insert(struct io_kiocb *req)
@@ -5772,8 +5773,13 @@ static int io_poll_wake(struct wait_queue_entry *wait, unsigned mode, int sync,
 {
 	struct io_kiocb *req = wait->private;
 	struct io_poll_iocb *poll = &req->poll;
+	__poll_t mask = key_to_poll(key);
+
+	/* for instances that support it check for an event match first: */
+	if (mask && !(mask & poll->events))
+		return 0;
 
-	return __io_async_wake(req, poll, key_to_poll(key), io_poll_task_func);
+	return __io_async_wake(req, poll, mask, io_poll_task_func);
 }
 
 static void io_poll_queue_proc(struct file *file, struct wait_queue_head *head,
-- 
2.14.4.44.g2045bb6



* [PATCH v3 2/3] io_uring: reduce frequent add_wait_queue() overhead for multi-shot poll request
  2021-10-25  5:38 [PATCH v3 0/3] improvements for multi-shot poll requests Xiaoguang Wang
  2021-10-25  5:38 ` [PATCH v3 1/3] io_uring: refactor event check out of __io_async_wake() Xiaoguang Wang
@ 2021-10-25  5:38 ` Xiaoguang Wang
  2021-10-28 19:21   ` Pavel Begunkov
  2021-10-25  5:38 ` [PATCH v3 3/3] io_uring: don't get completion_lock in io_poll_rewait() Xiaoguang Wang
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 19+ messages in thread
From: Xiaoguang Wang @ 2021-10-25  5:38 UTC (permalink / raw)
  To: io-uring; +Cc: axboe, asml.silence

Running echo_server to evaluate io_uring's multi-shot poll performance, perf
shows that add_wait_queue() has obvious overhead. Introduce a new state
'active' in io_poll_iocb to indicate whether io_poll_wake() should queue
a task_work. This new state is set to true initially, set to false when
starting to queue a task work, and set to true again once a poll cqe has
been committed. One concern is that this method may lose wake-up events,
but it seems to be ok.

  io_poll_wake                io_poll_task_func
t1                       |
t2                       |    WRITE_ONCE(req->poll.active, true);
t3                       |
t4                       |    io_commit_cqring(ctx);
t5                       |
t6                       |

If a wake-up event happens before or at t4, it's ok: the user app will always
see a cqe. If a wake-up event happens after t4 then, IIUC, io_poll_wake()
will see the new req->poll.active value by using READ_ONCE().
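
Condensed from the diff below (fragments only, full context is in the patch),
the protocol roughly looks like:

  /* wakeup side: only a waker that flips 'active' queues the task_work */
  if (!READ_ONCE(poll->active))
          return 0;
  WRITE_ONCE(poll->active, false);
  return __io_async_wake(req, poll, mask, io_poll_task_func);

  /* task_work side (multi-shot case): re-enable wakeups instead of
   * re-adding the wait queue entry, then commit the cqe */
  WRITE_ONCE(req->poll.active, true);
  io_commit_cqring(ctx);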

Echo_server codes can be cloned from:
https://codeup.openanolis.cn/codeup/storage/io_uring-echo-server.git,
branch is xiaoguangwang/io_uring_multishot.

Without this patch, the tps in our test environment is 284116; with
this patch, the tps is 287832, about a 1.3% improvement, which is
indeed in accord with the saved add_wait_queue() cost.

Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
---
 fs/io_uring.c | 57 +++++++++++++++++++++++++++++++++------------------------
 1 file changed, 33 insertions(+), 24 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 18af9bb9a4bc..e4c779dac953 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -481,6 +481,7 @@ struct io_poll_iocb {
 	__poll_t			events;
 	bool				done;
 	bool				canceled;
+	bool				active;
 	struct wait_queue_entry		wait;
 };
 
@@ -5233,8 +5234,6 @@ static inline int __io_async_wake(struct io_kiocb *req, struct io_poll_iocb *pol
 {
 	trace_io_uring_task_add(req->ctx, req->opcode, req->user_data, mask);
 
-	list_del_init(&poll->wait.entry);
-
 	req->result = mask;
 	req->io_task_work.func = func;
 
@@ -5265,7 +5264,10 @@ static bool io_poll_rewait(struct io_kiocb *req, struct io_poll_iocb *poll)
 
 	spin_lock(&ctx->completion_lock);
 	if (!req->result && !READ_ONCE(poll->canceled)) {
-		add_wait_queue(poll->head, &poll->wait);
+		if (req->opcode == IORING_OP_POLL_ADD)
+			WRITE_ONCE(poll->active, true);
+		else
+			add_wait_queue(poll->head, &poll->wait);
 		return true;
 	}
 
@@ -5331,6 +5333,26 @@ static bool __io_poll_complete(struct io_kiocb *req, __poll_t mask)
 	return !(flags & IORING_CQE_F_MORE);
 }
 
+static bool __io_poll_remove_one(struct io_kiocb *req,
+				 struct io_poll_iocb *poll, bool do_cancel)
+	__must_hold(&req->ctx->completion_lock)
+{
+	bool do_complete = false;
+
+	if (!poll->head)
+		return false;
+	spin_lock_irq(&poll->head->lock);
+	if (do_cancel)
+		WRITE_ONCE(poll->canceled, true);
+	if (!list_empty(&poll->wait.entry)) {
+		list_del_init(&poll->wait.entry);
+		do_complete = true;
+	}
+	spin_unlock_irq(&poll->head->lock);
+	hash_del(&req->hash_node);
+	return do_complete;
+}
+
 static void io_poll_task_func(struct io_kiocb *req, bool *locked)
 {
 	struct io_ring_ctx *ctx = req->ctx;
@@ -5348,11 +5370,12 @@ static void io_poll_task_func(struct io_kiocb *req, bool *locked)
 		done = __io_poll_complete(req, req->result);
 		if (done) {
 			io_poll_remove_double(req);
+			__io_poll_remove_one(req, io_poll_get_single(req), true);
 			hash_del(&req->hash_node);
 			req->poll.done = true;
 		} else {
 			req->result = 0;
-			add_wait_queue(req->poll.head, &req->poll.wait);
+			WRITE_ONCE(req->poll.active, true);
 		}
 		io_commit_cqring(ctx);
 		spin_unlock(&ctx->completion_lock);
@@ -5407,6 +5430,7 @@ static void io_init_poll_iocb(struct io_poll_iocb *poll, __poll_t events,
 	poll->head = NULL;
 	poll->done = false;
 	poll->canceled = false;
+	poll->active = true;
 #define IO_POLL_UNMASK	(EPOLLERR|EPOLLHUP|EPOLLNVAL|EPOLLRDHUP)
 	/* mask in events that we always want/need */
 	poll->events = events | IO_POLL_UNMASK;
@@ -5513,6 +5537,7 @@ static int io_async_wake(struct wait_queue_entry *wait, unsigned mode, int sync,
 	if (mask && !(mask & poll->events))
 		return 0;
 
+	list_del_init(&poll->wait.entry);
 	return __io_async_wake(req, poll, mask, io_async_task_func);
 }
 
@@ -5623,26 +5648,6 @@ static int io_arm_poll_handler(struct io_kiocb *req)
 	return IO_APOLL_OK;
 }
 
-static bool __io_poll_remove_one(struct io_kiocb *req,
-				 struct io_poll_iocb *poll, bool do_cancel)
-	__must_hold(&req->ctx->completion_lock)
-{
-	bool do_complete = false;
-
-	if (!poll->head)
-		return false;
-	spin_lock_irq(&poll->head->lock);
-	if (do_cancel)
-		WRITE_ONCE(poll->canceled, true);
-	if (!list_empty(&poll->wait.entry)) {
-		list_del_init(&poll->wait.entry);
-		do_complete = true;
-	}
-	spin_unlock_irq(&poll->head->lock);
-	hash_del(&req->hash_node);
-	return do_complete;
-}
-
 static bool io_poll_remove_one(struct io_kiocb *req)
 	__must_hold(&req->ctx->completion_lock)
 {
@@ -5779,6 +5784,10 @@ static int io_poll_wake(struct wait_queue_entry *wait, unsigned mode, int sync,
 	if (mask && !(mask & poll->events))
 		return 0;
 
+	if (!READ_ONCE(poll->active))
+		return 0;
+	WRITE_ONCE(poll->active, false);
+
 	return __io_async_wake(req, poll, mask, io_poll_task_func);
 }
 
-- 
2.14.4.44.g2045bb6



* [PATCH v3 3/3] io_uring: don't get completion_lock in io_poll_rewait()
  2021-10-25  5:38 [PATCH v3 0/3] improvements for multi-shot poll requests Xiaoguang Wang
  2021-10-25  5:38 ` [PATCH v3 1/3] io_uring: refactor event check out of __io_async_wake() Xiaoguang Wang
  2021-10-25  5:38 ` [PATCH v3 2/3] io_uring: reduce frequent add_wait_queue() overhead for multi-shot poll request Xiaoguang Wang
@ 2021-10-25  5:38 ` Xiaoguang Wang
  2021-10-28 19:26   ` Pavel Begunkov
  2021-10-28 18:19 ` [PATCH v3 0/3] improvements for multi-shot poll requests Jens Axboe
  2021-10-28 18:19 ` Jens Axboe
  4 siblings, 1 reply; 19+ messages in thread
From: Xiaoguang Wang @ 2021-10-25  5:38 UTC (permalink / raw)
  To: io-uring; +Cc: axboe, asml.silence

In the current implementation, if there are no available events,
io_poll_rewait() just takes completion_lock, which is then unlocked in
io_poll_task_func() or io_async_task_func(); this isn't necessary.

Change this logic to let io_poll_task_func() or io_async_task_func()
take completion_lock themselves.
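
Condensed from the diff below, the locking now sits on the caller side
(fragments only, full context is in the patch):

  /* e.g. in io_poll_task_func() */
  if (io_poll_rewait(req, &req->poll))
          return;

  spin_lock(&ctx->completion_lock);
  ...
  io_commit_cqring(ctx);
  spin_unlock(&ctx->completion_lock);
  io_cqring_ev_posted(ctx);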

Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
---
 fs/io_uring.c | 58 ++++++++++++++++++++++++++--------------------------------
 1 file changed, 26 insertions(+), 32 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index e4c779dac953..41ff8fdafe55 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -5248,10 +5248,7 @@ static inline int __io_async_wake(struct io_kiocb *req, struct io_poll_iocb *pol
 }
 
 static bool io_poll_rewait(struct io_kiocb *req, struct io_poll_iocb *poll)
-	__acquires(&req->ctx->completion_lock)
 {
-	struct io_ring_ctx *ctx = req->ctx;
-
 	/* req->task == current here, checking PF_EXITING is safe */
 	if (unlikely(req->task->flags & PF_EXITING))
 		WRITE_ONCE(poll->canceled, true);
@@ -5262,7 +5259,6 @@ static bool io_poll_rewait(struct io_kiocb *req, struct io_poll_iocb *poll)
 		req->result = vfs_poll(req->file, &pt) & poll->events;
 	}
 
-	spin_lock(&ctx->completion_lock);
 	if (!req->result && !READ_ONCE(poll->canceled)) {
 		if (req->opcode == IORING_OP_POLL_ADD)
 			WRITE_ONCE(poll->active, true);
@@ -5357,35 +5353,34 @@ static void io_poll_task_func(struct io_kiocb *req, bool *locked)
 {
 	struct io_ring_ctx *ctx = req->ctx;
 	struct io_kiocb *nxt;
+	bool done;
 
-	if (io_poll_rewait(req, &req->poll)) {
-		spin_unlock(&ctx->completion_lock);
-	} else {
-		bool done;
+	if (io_poll_rewait(req, &req->poll))
+		return;
 
-		if (req->poll.done) {
-			spin_unlock(&ctx->completion_lock);
-			return;
-		}
-		done = __io_poll_complete(req, req->result);
-		if (done) {
-			io_poll_remove_double(req);
-			__io_poll_remove_one(req, io_poll_get_single(req), true);
-			hash_del(&req->hash_node);
-			req->poll.done = true;
-		} else {
-			req->result = 0;
-			WRITE_ONCE(req->poll.active, true);
-		}
-		io_commit_cqring(ctx);
+	spin_lock(&ctx->completion_lock);
+	if (req->poll.done) {
 		spin_unlock(&ctx->completion_lock);
-		io_cqring_ev_posted(ctx);
+		return;
+	}
+	done = __io_poll_complete(req, req->result);
+	if (done) {
+		io_poll_remove_double(req);
+		__io_poll_remove_one(req, io_poll_get_single(req), true);
+		hash_del(&req->hash_node);
+		req->poll.done = true;
+	} else {
+		req->result = 0;
+		WRITE_ONCE(req->poll.active, true);
+	}
+	io_commit_cqring(ctx);
+	spin_unlock(&ctx->completion_lock);
+	io_cqring_ev_posted(ctx);
 
-		if (done) {
-			nxt = io_put_req_find_next(req);
-			if (nxt)
-				io_req_task_submit(nxt, locked);
-		}
+	if (done) {
+		nxt = io_put_req_find_next(req);
+		if (nxt)
+			io_req_task_submit(nxt, locked);
 	}
 }
 
@@ -5507,11 +5502,10 @@ static void io_async_task_func(struct io_kiocb *req, bool *locked)
 
 	trace_io_uring_task_run(req->ctx, req, req->opcode, req->user_data);
 
-	if (io_poll_rewait(req, &apoll->poll)) {
-		spin_unlock(&ctx->completion_lock);
+	if (io_poll_rewait(req, &apoll->poll))
 		return;
-	}
 
+	spin_lock(&ctx->completion_lock);
 	hash_del(&req->hash_node);
 	io_poll_remove_double(req);
 	apoll->poll.done = true;
-- 
2.14.4.44.g2045bb6



* Re: [PATCH v3 1/3] io_uring: refactor event check out of __io_async_wake()
  2021-10-25  5:38 ` [PATCH v3 1/3] io_uring: refactor event check out of __io_async_wake() Xiaoguang Wang
@ 2021-10-25  9:35   ` Praveen Kumar
  0 siblings, 0 replies; 19+ messages in thread
From: Praveen Kumar @ 2021-10-25  9:35 UTC (permalink / raw)
  To: Xiaoguang Wang, io-uring; +Cc: axboe, asml.silence

On 25-10-2021 11:08, Xiaoguang Wang wrote:
> Which is a preparation for following patch, and here try to inline
> __io_async_wake(), which is simple and can save a function call.
> 
> Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
> ---
>  fs/io_uring.c | 20 +++++++++++++-------
>  1 file changed, 13 insertions(+), 7 deletions(-)
> 
> diff --git a/fs/io_uring.c b/fs/io_uring.c
> index 736d456e7913..18af9bb9a4bc 100644
> --- a/fs/io_uring.c
> +++ b/fs/io_uring.c
> @@ -5228,13 +5228,9 @@ struct io_poll_table {
>  	int error;
>  };
>  
> -static int __io_async_wake(struct io_kiocb *req, struct io_poll_iocb *poll,
> +static inline int __io_async_wake(struct io_kiocb *req, struct io_poll_iocb *poll,
>  			   __poll_t mask, io_req_tw_func_t func)
>  {
> -	/* for instances that support it check for an event match first: */
> -	if (mask && !(mask & poll->events))
> -		return 0;
> -

Is it possible to keep this check as it is, and make the __io_async_wake() function inline ONLY?
As I can see, the callers are doing the same check in different places.
Also, there is a possibility that this check may get missed in new caller APIs introduced in the future.
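
Something like the sketch below, i.e. keeping the event-match check inside
the shared helper and only adding the inline hint (hypothetical, just to
illustrate the suggestion):

  static inline int __io_async_wake(struct io_kiocb *req, struct io_poll_iocb *poll,
                                    __poll_t mask, io_req_tw_func_t func)
  {
          /* for instances that support it check for an event match first: */
          if (mask && !(mask & poll->events))
                  return 0;

          trace_io_uring_task_add(req->ctx, req->opcode, req->user_data, mask);
          ...
  }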

>  	trace_io_uring_task_add(req->ctx, req->opcode, req->user_data, mask);
>  
>  	list_del_init(&poll->wait.entry);
> @@ -5508,11 +5504,16 @@ static int io_async_wake(struct wait_queue_entry *wait, unsigned mode, int sync,
>  {
>  	struct io_kiocb *req = wait->private;
>  	struct io_poll_iocb *poll = &req->apoll->poll;
> +	__poll_t mask = key_to_poll(key);
>  
>  	trace_io_uring_poll_wake(req->ctx, req->opcode, req->user_data,
>  					key_to_poll(key));
>  
> -	return __io_async_wake(req, poll, key_to_poll(key), io_async_task_func);
> +	/* for instances that support it check for an event match first: */
> +	if (mask && !(mask & poll->events))
> +		return 0;
> +
> +	return __io_async_wake(req, poll, mask, io_async_task_func);
>  }
>  
>  static void io_poll_req_insert(struct io_kiocb *req)
> @@ -5772,8 +5773,13 @@ static int io_poll_wake(struct wait_queue_entry *wait, unsigned mode, int sync,
>  {
>  	struct io_kiocb *req = wait->private;
>  	struct io_poll_iocb *poll = &req->poll;
> +	__poll_t mask = key_to_poll(key);
> +
> +	/* for instances that support it check for an event match first: */
> +	if (mask && !(mask & poll->events))
> +		return 0;
>  
> -	return __io_async_wake(req, poll, key_to_poll(key), io_poll_task_func);
> +	return __io_async_wake(req, poll, mask, io_poll_task_func);
>  }
>  
>  static void io_poll_queue_proc(struct file *file, struct wait_queue_head *head,
> 

Regards,

~Praveen.


* Re: [PATCH v3 0/3] improvements for multi-shot poll requests
  2021-10-25  5:38 [PATCH v3 0/3] improvements for multi-shot poll requests Xiaoguang Wang
                   ` (2 preceding siblings ...)
  2021-10-25  5:38 ` [PATCH v3 3/3] io_uring: don't get completion_lock in io_poll_rewait() Xiaoguang Wang
@ 2021-10-28 18:19 ` Jens Axboe
  2021-10-29 18:29   ` Jens Axboe
  2021-10-28 18:19 ` Jens Axboe
  4 siblings, 1 reply; 19+ messages in thread
From: Jens Axboe @ 2021-10-28 18:19 UTC (permalink / raw)
  To: Xiaoguang Wang, io-uring; +Cc: asml.silence

On 10/24/21 11:38 PM, Xiaoguang Wang wrote:
> Echo_server codes can be clone from:
> https://codeup.openanolis.cn/codeup/storage/io_uring-echo-server.git
> branch is xiaoguangwang/io_uring_multishot. There is a simple HOWTO
> in this repository.
> 
> Usage:
> In server: port 10016, 1000 connections, packet size 16 bytes, and
> enable fixed files.
>   taskset -c 10 io_uring_echo_server_multi_shot  -f -p 10016 -n 1000 -l 16
> 
> In client:
>   taskset -c 13,14,15,16 ./echo -addr 11.238.147.21:10016 -n 1000 -size 16
> 
> Before this patchset, the tps is like below:
> 1:15:53 req: 1430425, req/s: 286084.693
> 11:15:58 req: 1426021, req/s: 285204.079
> 11:16:03 req: 1416761, req/s: 283352.146
> 11:16:08 req: 1417969, req/s: 283165.637
> 11:16:13 req: 1424591, req/s: 285349.915
> 11:16:18 req: 1418706, req/s: 283738.725
> 11:16:23 req: 1411988, req/s: 282399.052
> 11:16:28 req: 1419097, req/s: 283820.477
> 11:16:33 req: 1417816, req/s: 283563.262
> 11:16:38 req: 1422461, req/s: 284491.702
> 11:16:43 req: 1418176, req/s: 283635.327
> 11:16:48 req: 1414525, req/s: 282905.276
> 11:16:53 req: 1415624, req/s: 283124.140
> 11:16:58 req: 1426435, req/s: 284970.486
> 
> with this patchset:
> 2021/09/24 11:10:01 start to do client
> 11:10:06 req: 1444979, req/s: 288995.300
> 11:10:11 req: 1442559, req/s: 288511.689
> 11:10:16 req: 1427253, req/s: 285450.390
> 11:10:21 req: 1445236, req/s: 288349.853
> 11:10:26 req: 1423949, req/s: 285480.941
> 11:10:31 req: 1445304, req/s: 289060.815
> 11:10:36 req: 1441036, req/s: 288207.119
> 11:10:41 req: 1441117, req/s: 288220.695
> 11:10:46 req: 1441451, req/s: 288292.731
> 11:10:51 req: 1438801, req/s: 287759.157
> 11:10:56 req: 1433227, req/s: 286646.338
> 11:11:01 req: 1438307, req/s: 287661.577
> 
> about 1.3% tps improvements.

In the spirit of moving this one along, I've applied this series. Still a few
things we can do on top, but I don't think that should hold it back. If you
planned on sending an update to inline that check again just do it on top
of the current tree.

-- 
Jens Axboe



* Re: [PATCH v3 0/3] improvements for multi-shot poll requests
  2021-10-25  5:38 [PATCH v3 0/3] improvements for multi-shot poll requests Xiaoguang Wang
                   ` (3 preceding siblings ...)
  2021-10-28 18:19 ` [PATCH v3 0/3] improvements for multi-shot poll requests Jens Axboe
@ 2021-10-28 18:19 ` Jens Axboe
  2021-10-28 19:01   ` Pavel Begunkov
  4 siblings, 1 reply; 19+ messages in thread
From: Jens Axboe @ 2021-10-28 18:19 UTC (permalink / raw)
  To: Xiaoguang Wang, io-uring; +Cc: asml.silence

On Mon, 25 Oct 2021 13:38:46 +0800, Xiaoguang Wang wrote:
> Echo_server codes can be clone from:
> https://codeup.openanolis.cn/codeup/storage/io_uring-echo-server.git
> branch is xiaoguangwang/io_uring_multishot. There is a simple HOWTO
> in this repository.
> 
> Usage:
> In server: port 10016, 1000 connections, packet size 16 bytes, and
> enable fixed files.
>   taskset -c 10 io_uring_echo_server_multi_shot  -f -p 10016 -n 1000 -l 16
> 
> [...]

Applied, thanks!

[1/3] io_uring: refactor event check out of __io_async_wake()
      commit: db3191671f970164d0074039d262d3f402a417eb
[2/3] io_uring: reduce frequent add_wait_queue() overhead for multi-shot poll request
      commit: 34ced75ca1f63fac6148497971212583aa0f7a87
[3/3] io_uring: don't get completion_lock in io_poll_rewait()
      commit: 57d9cc0f0dfe7453327c4c71ea22074419e2e800

Best regards,
-- 
Jens Axboe




* Re: [PATCH v3 0/3] improvements for multi-shot poll requests
  2021-10-28 18:19 ` Jens Axboe
@ 2021-10-28 19:01   ` Pavel Begunkov
  2021-10-28 19:04     ` Pavel Begunkov
  0 siblings, 1 reply; 19+ messages in thread
From: Pavel Begunkov @ 2021-10-28 19:01 UTC (permalink / raw)
  To: Jens Axboe, Xiaoguang Wang, io-uring

On 10/28/21 19:19, Jens Axboe wrote:
> On Mon, 25 Oct 2021 13:38:46 +0800, Xiaoguang Wang wrote:
>> Echo_server codes can be clone from:
>> https://codeup.openanolis.cn/codeup/storage/io_uring-echo-server.git
>> branch is xiaoguangwang/io_uring_multishot. There is a simple HOWTO
>> in this repository.
>>
>> Usage:
>> In server: port 10016, 1000 connections, packet size 16 bytes, and
>> enable fixed files.
>>    taskset -c 10 io_uring_echo_server_multi_shot  -f -p 10016 -n 1000 -l 16
>>
>> [...]
> 
> Applied, thanks!
> 
> [1/3] io_uring: refactor event check out of __io_async_wake()
>        commit: db3191671f970164d0074039d262d3f402a417eb
> [2/3] io_uring: reduce frequent add_wait_queue() overhead for multi-shot poll request
>        commit: 34ced75ca1f63fac6148497971212583aa0f7a87
> [3/3] io_uring: don't get completion_lock in io_poll_rewait()
>        commit: 57d9cc0f0dfe7453327c4c71ea22074419e2e800

Jens, give me time to take a look first. There might be enough bugs
just because of the sheer amount of new code. I also don't like the amount
of overhead it adds to generic paths, which is hopefully unnecessary.

-- 
Pavel Begunkov


* Re: [PATCH v3 0/3] improvements for multi-shot poll requests
  2021-10-28 19:01   ` Pavel Begunkov
@ 2021-10-28 19:04     ` Pavel Begunkov
  0 siblings, 0 replies; 19+ messages in thread
From: Pavel Begunkov @ 2021-10-28 19:04 UTC (permalink / raw)
  To: Jens Axboe, Xiaoguang Wang, io-uring

On 10/28/21 20:01, Pavel Begunkov wrote:
> On 10/28/21 19:19, Jens Axboe wrote:
>> On Mon, 25 Oct 2021 13:38:46 +0800, Xiaoguang Wang wrote:
>>> Echo_server codes can be clone from:
>>> https://codeup.openanolis.cn/codeup/storage/io_uring-echo-server.git
>>> branch is xiaoguangwang/io_uring_multishot. There is a simple HOWTO
>>> in this repository.
>>>
>>> Usage:
>>> In server: port 10016, 1000 connections, packet size 16 bytes, and
>>> enable fixed files.
>>>    taskset -c 10 io_uring_echo_server_multi_shot  -f -p 10016 -n 1000 -l 16
>>>
>>> [...]
>>
>> Applied, thanks!
>>
>> [1/3] io_uring: refactor event check out of __io_async_wake()
>>        commit: db3191671f970164d0074039d262d3f402a417eb
>> [2/3] io_uring: reduce frequent add_wait_queue() overhead for multi-shot poll request
>>        commit: 34ced75ca1f63fac6148497971212583aa0f7a87
>> [3/3] io_uring: don't get completion_lock in io_poll_rewait()
>>        commit: 57d9cc0f0dfe7453327c4c71ea22074419e2e800
> 
> Jens, give me time to take a look first. There might be enough of bugs
> just because of sheer amount of new code. I also don't like the amount
> of overhead it adds to generic paths, hopefully unnecessary

Hmm, I mixed them up. This one is simpler; I was looking at the other one.

-- 
Pavel Begunkov


* Re: [PATCH v3 2/3] io_uring: reduce frequent add_wait_queue() overhead for multi-shot poll request
  2021-10-25  5:38 ` [PATCH v3 2/3] io_uring: reduce frequent add_wait_queue() overhead for multi-shot poll request Xiaoguang Wang
@ 2021-10-28 19:21   ` Pavel Begunkov
  2021-10-29  2:57     ` Xiaoguang Wang
  0 siblings, 1 reply; 19+ messages in thread
From: Pavel Begunkov @ 2021-10-28 19:21 UTC (permalink / raw)
  To: Xiaoguang Wang, io-uring; +Cc: axboe

On 10/25/21 06:38, Xiaoguang Wang wrote:
> Run echo_server to evaluate io_uring's multi-shot poll performance, perf
> shows that add_wait_queue() has obvious overhead. Intruduce a new state
> 'active' in io_poll_iocb to indicate whether io_poll_wake() should queue
> a task_work. This new state will be set to true initially, be set to false
> when starting to queue a task work, and be set to true again when a poll
> cqe has been committed. One concern is that this method may lost waken-up
> event, but seems it's ok.
> 
>    io_poll_wake                io_poll_task_func
> t1                       |
> t2                       |    WRITE_ONCE(req->poll.active, true);
> t3                       |
> t4                       |    io_commit_cqring(ctx);
> t5                       |
> t6                       |
> 
> If waken-up events happens before or at t4, it's ok, user app will always
> see a cqe. If waken-up events happens after t4 and IIUC, io_poll_wake()
> will see the new req->poll.active value by using READ_ONCE().
> 
> Echo_server codes can be cloned from:
> https://codeup.openanolis.cn/codeup/storage/io_uring-echo-server.git,
> branch is xiaoguangwang/io_uring_multishot.
> 
> Without this patch, the tps in our test environment is 284116, with
> this patch, the tps is 287832, about 1.3% reqs improvement, which
> is indeed in accord with the saved add_wait_queue() cost.
> 
> Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
> ---
>   fs/io_uring.c | 57 +++++++++++++++++++++++++++++++++------------------------
>   1 file changed, 33 insertions(+), 24 deletions(-)
> 
> diff --git a/fs/io_uring.c b/fs/io_uring.c
> index 18af9bb9a4bc..e4c779dac953 100644
> --- a/fs/io_uring.c
> +++ b/fs/io_uring.c
> @@ -481,6 +481,7 @@ struct io_poll_iocb {
>   	__poll_t			events;
>   	bool				done;
>   	bool				canceled;
> +	bool				active;
>   	struct wait_queue_entry		wait;
>   };
>   
> @@ -5233,8 +5234,6 @@ static inline int __io_async_wake(struct io_kiocb *req, struct io_poll_iocb *pol
>   {
>   	trace_io_uring_task_add(req->ctx, req->opcode, req->user_data, mask);
>   
> -	list_del_init(&poll->wait.entry);
> -

As I mentioned to Hao some time ago, we can't allow this function, or in
particular io_req_task_work_add(), to happen twice before the first
task work gets executed; they use the same field in io_kiocb and that
will corrupt the tw list.

Looks like that's what can happen here.

>   	req->result = mask;
>   	req->io_task_work.func = func;
>   
> @@ -5265,7 +5264,10 @@ static bool io_poll_rewait(struct io_kiocb *req, struct io_poll_iocb *poll)
>   
>   	spin_lock(&ctx->completion_lock);
>   	if (!req->result && !READ_ONCE(poll->canceled)) {
> -		add_wait_queue(poll->head, &poll->wait);
> +		if (req->opcode == IORING_OP_POLL_ADD)
> +			WRITE_ONCE(poll->active, true);
> +		else
> +			add_wait_queue(poll->head, &poll->wait);
>   		return true;
>   	}
>   
> @@ -5331,6 +5333,26 @@ static bool __io_poll_complete(struct io_kiocb *req, __poll_t mask)
>   	return !(flags & IORING_CQE_F_MORE);
>   }
>   
> +static bool __io_poll_remove_one(struct io_kiocb *req,
> +				 struct io_poll_iocb *poll, bool do_cancel)
> +	__must_hold(&req->ctx->completion_lock)
> +{
> +	bool do_complete = false;
> +
> +	if (!poll->head)
> +		return false;
> +	spin_lock_irq(&poll->head->lock);
> +	if (do_cancel)
> +		WRITE_ONCE(poll->canceled, true);
> +	if (!list_empty(&poll->wait.entry)) {
> +		list_del_init(&poll->wait.entry);
> +		do_complete = true;
> +	}
> +	spin_unlock_irq(&poll->head->lock);
> +	hash_del(&req->hash_node);
> +	return do_complete;
> +}
> +
>   static void io_poll_task_func(struct io_kiocb *req, bool *locked)
>   {
>   	struct io_ring_ctx *ctx = req->ctx;
> @@ -5348,11 +5370,12 @@ static void io_poll_task_func(struct io_kiocb *req, bool *locked)
>   		done = __io_poll_complete(req, req->result);
>   		if (done) {
>   			io_poll_remove_double(req);
> +			__io_poll_remove_one(req, io_poll_get_single(req), true);
>   			hash_del(&req->hash_node);
>   			req->poll.done = true;
>   		} else {
>   			req->result = 0;
> -			add_wait_queue(req->poll.head, &req->poll.wait);
> +			WRITE_ONCE(req->poll.active, true);
>   		}
>   		io_commit_cqring(ctx);
>   		spin_unlock(&ctx->completion_lock);
> @@ -5407,6 +5430,7 @@ static void io_init_poll_iocb(struct io_poll_iocb *poll, __poll_t events,
>   	poll->head = NULL;
>   	poll->done = false;
>   	poll->canceled = false;
> +	poll->active = true;
>   #define IO_POLL_UNMASK	(EPOLLERR|EPOLLHUP|EPOLLNVAL|EPOLLRDHUP)
>   	/* mask in events that we always want/need */
>   	poll->events = events | IO_POLL_UNMASK;
> @@ -5513,6 +5537,7 @@ static int io_async_wake(struct wait_queue_entry *wait, unsigned mode, int sync,
>   	if (mask && !(mask & poll->events))
>   		return 0;
>   
> +	list_del_init(&poll->wait.entry);
>   	return __io_async_wake(req, poll, mask, io_async_task_func);
>   }
>   
> @@ -5623,26 +5648,6 @@ static int io_arm_poll_handler(struct io_kiocb *req)
>   	return IO_APOLL_OK;
>   }
>   
> -static bool __io_poll_remove_one(struct io_kiocb *req,
> -				 struct io_poll_iocb *poll, bool do_cancel)
> -	__must_hold(&req->ctx->completion_lock)
> -{
> -	bool do_complete = false;
> -
> -	if (!poll->head)
> -		return false;
> -	spin_lock_irq(&poll->head->lock);
> -	if (do_cancel)
> -		WRITE_ONCE(poll->canceled, true);
> -	if (!list_empty(&poll->wait.entry)) {
> -		list_del_init(&poll->wait.entry);
> -		do_complete = true;
> -	}
> -	spin_unlock_irq(&poll->head->lock);
> -	hash_del(&req->hash_node);
> -	return do_complete;
> -}
> -
>   static bool io_poll_remove_one(struct io_kiocb *req)
>   	__must_hold(&req->ctx->completion_lock)
>   {
> @@ -5779,6 +5784,10 @@ static int io_poll_wake(struct wait_queue_entry *wait, unsigned mode, int sync,
>   	if (mask && !(mask & poll->events))
>   		return 0;
>   
> +	if (!READ_ONCE(poll->active))
> +		return 0;
> +	WRITE_ONCE(poll->active, false);
> +
>   	return __io_async_wake(req, poll, mask, io_poll_task_func);
>   }
>   
> 

-- 
Pavel Begunkov


* Re: [PATCH v3 3/3] io_uring: don't get completion_lock in io_poll_rewait()
  2021-10-25  5:38 ` [PATCH v3 3/3] io_uring: don't get completion_lock in io_poll_rewait() Xiaoguang Wang
@ 2021-10-28 19:26   ` Pavel Begunkov
  2021-10-29  5:59     ` Xiaoguang Wang
  0 siblings, 1 reply; 19+ messages in thread
From: Pavel Begunkov @ 2021-10-28 19:26 UTC (permalink / raw)
  To: Xiaoguang Wang, io-uring; +Cc: axboe

On 10/25/21 06:38, Xiaoguang Wang wrote:
> In current implementation, if there are not available events,
> io_poll_rewait() just gets completion_lock, and unlocks it in
> io_poll_task_func() or io_async_task_func(), which isn't necessary.
> 
> Change this logic to let io_poll_task_func() or io_async_task_func()
> get the completion_lock lock.
> 
> Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
> ---
>   fs/io_uring.c | 58 ++++++++++++++++++++++++++--------------------------------
>   1 file changed, 26 insertions(+), 32 deletions(-)
> 
> diff --git a/fs/io_uring.c b/fs/io_uring.c
> index e4c779dac953..41ff8fdafe55 100644
> --- a/fs/io_uring.c
> +++ b/fs/io_uring.c
> @@ -5248,10 +5248,7 @@ static inline int __io_async_wake(struct io_kiocb *req, struct io_poll_iocb *pol
>   }
>   
>   static bool io_poll_rewait(struct io_kiocb *req, struct io_poll_iocb *poll)
> -	__acquires(&req->ctx->completion_lock)
>   {
> -	struct io_ring_ctx *ctx = req->ctx;
> -
>   	/* req->task == current here, checking PF_EXITING is safe */
>   	if (unlikely(req->task->flags & PF_EXITING))
>   		WRITE_ONCE(poll->canceled, true);
> @@ -5262,7 +5259,6 @@ static bool io_poll_rewait(struct io_kiocb *req, struct io_poll_iocb *poll)
>   		req->result = vfs_poll(req->file, &pt) & poll->events;
>   	}
>   
> -	spin_lock(&ctx->completion_lock);

Don't remember poll sync too well but this was synchronising with the
final section of __io_arm_poll_handler(), and I'm afraid it may go
completely loose with races.


>   	if (!req->result && !READ_ONCE(poll->canceled)) {
>   		if (req->opcode == IORING_OP_POLL_ADD)
>   			WRITE_ONCE(poll->active, true);
-- 
Pavel Begunkov


* Re: [PATCH v3 2/3] io_uring: reduce frequent add_wait_queue() overhead for multi-shot poll request
  2021-10-28 19:21   ` Pavel Begunkov
@ 2021-10-29  2:57     ` Xiaoguang Wang
  2021-10-29 10:02       ` Pavel Begunkov
  0 siblings, 1 reply; 19+ messages in thread
From: Xiaoguang Wang @ 2021-10-29  2:57 UTC (permalink / raw)
  To: Pavel Begunkov, io-uring; +Cc: axboe

hi,

> On 10/25/21 06:38, Xiaoguang Wang wrote:
>> Run echo_server to evaluate io_uring's multi-shot poll performance, perf
>> shows that add_wait_queue() has obvious overhead. Intruduce a new state
>> 'active' in io_poll_iocb to indicate whether io_poll_wake() should queue
>> a task_work. This new state will be set to true initially, be set to 
>> false
>> when starting to queue a task work, and be set to true again when a poll
>> cqe has been committed. One concern is that this method may lost 
>> waken-up
>> event, but seems it's ok.
>>
>>    io_poll_wake                io_poll_task_func
>> t1                       |
>> t2                       |    WRITE_ONCE(req->poll.active, true);
>> t3                       |
>> t4                       |    io_commit_cqring(ctx);
>> t5                       |
>> t6                       |
>>
>> If waken-up events happens before or at t4, it's ok, user app will 
>> always
>> see a cqe. If waken-up events happens after t4 and IIUC, io_poll_wake()
>> will see the new req->poll.active value by using READ_ONCE().
>>
>> Echo_server codes can be cloned from:
>> https://codeup.openanolis.cn/codeup/storage/io_uring-echo-server.git,
>> branch is xiaoguangwang/io_uring_multishot.
>>
>> Without this patch, the tps in our test environment is 284116, with
>> this patch, the tps is 287832, about 1.3% reqs improvement, which
>> is indeed in accord with the saved add_wait_queue() cost.
>>
>> Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
>> ---
>>   fs/io_uring.c | 57 
>> +++++++++++++++++++++++++++++++++------------------------
>>   1 file changed, 33 insertions(+), 24 deletions(-)
>>
>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>> index 18af9bb9a4bc..e4c779dac953 100644
>> --- a/fs/io_uring.c
>> +++ b/fs/io_uring.c
>> @@ -481,6 +481,7 @@ struct io_poll_iocb {
>>       __poll_t            events;
>>       bool                done;
>>       bool                canceled;
>> +    bool                active;
>>       struct wait_queue_entry        wait;
>>   };
>>   @@ -5233,8 +5234,6 @@ static inline int __io_async_wake(struct 
>> io_kiocb *req, struct io_poll_iocb *pol
>>   {
>>       trace_io_uring_task_add(req->ctx, req->opcode, req->user_data, 
>> mask);
>>   -    list_del_init(&poll->wait.entry);
>> -
>
> As I mentioned to Hao some time ago, we can't allow this function or in
> particular io_req_task_work_add() to happen twice before the first
> task work got executed, they use the same field in io_kiocb and those
> will corrupt the tw list.
>
> Looks that's what can happen here.
If I have understood the scenario you described correctly, I think it won't
happen :)
With this patch, once the first io_req_task_work_add() is issued, poll.active
will be set to false, so no new io_req_task_work_add() will be issued.
Only after the task_work installed by the first io_req_task_work_add() has
completed will poll.active be set to true again.


Regards,
Xiaoguang Wang
>
>>       req->result = mask;
>>       req->io_task_work.func = func;
>>   @@ -5265,7 +5264,10 @@ static bool io_poll_rewait(struct io_kiocb 
>> *req, struct io_poll_iocb *poll)
>>         spin_lock(&ctx->completion_lock);
>>       if (!req->result && !READ_ONCE(poll->canceled)) {
>> -        add_wait_queue(poll->head, &poll->wait);
>> +        if (req->opcode == IORING_OP_POLL_ADD)
>> +            WRITE_ONCE(poll->active, true);
>> +        else
>> +            add_wait_queue(poll->head, &poll->wait);
>>           return true;
>>       }
>>   @@ -5331,6 +5333,26 @@ static bool __io_poll_complete(struct 
>> io_kiocb *req, __poll_t mask)
>>       return !(flags & IORING_CQE_F_MORE);
>>   }
>>   +static bool __io_poll_remove_one(struct io_kiocb *req,
>> +                 struct io_poll_iocb *poll, bool do_cancel)
>> +    __must_hold(&req->ctx->completion_lock)
>> +{
>> +    bool do_complete = false;
>> +
>> +    if (!poll->head)
>> +        return false;
>> +    spin_lock_irq(&poll->head->lock);
>> +    if (do_cancel)
>> +        WRITE_ONCE(poll->canceled, true);
>> +    if (!list_empty(&poll->wait.entry)) {
>> +        list_del_init(&poll->wait.entry);
>> +        do_complete = true;
>> +    }
>> +    spin_unlock_irq(&poll->head->lock);
>> +    hash_del(&req->hash_node);
>> +    return do_complete;
>> +}
>> +
>>   static void io_poll_task_func(struct io_kiocb *req, bool *locked)
>>   {
>>       struct io_ring_ctx *ctx = req->ctx;
>> @@ -5348,11 +5370,12 @@ static void io_poll_task_func(struct io_kiocb 
>> *req, bool *locked)
>>           done = __io_poll_complete(req, req->result);
>>           if (done) {
>>               io_poll_remove_double(req);
>> +            __io_poll_remove_one(req, io_poll_get_single(req), true);
>>               hash_del(&req->hash_node);
>>               req->poll.done = true;
>>           } else {
>>               req->result = 0;
>> -            add_wait_queue(req->poll.head, &req->poll.wait);
>> +            WRITE_ONCE(req->poll.active, true);
>>           }
>>           io_commit_cqring(ctx);
>>           spin_unlock(&ctx->completion_lock);
>> @@ -5407,6 +5430,7 @@ static void io_init_poll_iocb(struct 
>> io_poll_iocb *poll, __poll_t events,
>>       poll->head = NULL;
>>       poll->done = false;
>>       poll->canceled = false;
>> +    poll->active = true;
>>   #define IO_POLL_UNMASK (EPOLLERR|EPOLLHUP|EPOLLNVAL|EPOLLRDHUP)
>>       /* mask in events that we always want/need */
>>       poll->events = events | IO_POLL_UNMASK;
>> @@ -5513,6 +5537,7 @@ static int io_async_wake(struct 
>> wait_queue_entry *wait, unsigned mode, int sync,
>>       if (mask && !(mask & poll->events))
>>           return 0;
>>   +    list_del_init(&poll->wait.entry);
>>       return __io_async_wake(req, poll, mask, io_async_task_func);
>>   }
>>   @@ -5623,26 +5648,6 @@ static int io_arm_poll_handler(struct 
>> io_kiocb *req)
>>       return IO_APOLL_OK;
>>   }
>>   -static bool __io_poll_remove_one(struct io_kiocb *req,
>> -                 struct io_poll_iocb *poll, bool do_cancel)
>> -    __must_hold(&req->ctx->completion_lock)
>> -{
>> -    bool do_complete = false;
>> -
>> -    if (!poll->head)
>> -        return false;
>> -    spin_lock_irq(&poll->head->lock);
>> -    if (do_cancel)
>> -        WRITE_ONCE(poll->canceled, true);
>> -    if (!list_empty(&poll->wait.entry)) {
>> -        list_del_init(&poll->wait.entry);
>> -        do_complete = true;
>> -    }
>> -    spin_unlock_irq(&poll->head->lock);
>> -    hash_del(&req->hash_node);
>> -    return do_complete;
>> -}
>> -
>>   static bool io_poll_remove_one(struct io_kiocb *req)
>>       __must_hold(&req->ctx->completion_lock)
>>   {
>> @@ -5779,6 +5784,10 @@ static int io_poll_wake(struct 
>> wait_queue_entry *wait, unsigned mode, int sync,
>>       if (mask && !(mask & poll->events))
>>           return 0;
>>   +    if (!READ_ONCE(poll->active))
>> +        return 0;
>> +    WRITE_ONCE(poll->active, false);
>> +
>>       return __io_async_wake(req, poll, mask, io_poll_task_func);
>>   }
>>
>



* Re: [PATCH v3 3/3] io_uring: don't get completion_lock in io_poll_rewait()
  2021-10-28 19:26   ` Pavel Begunkov
@ 2021-10-29  5:59     ` Xiaoguang Wang
  0 siblings, 0 replies; 19+ messages in thread
From: Xiaoguang Wang @ 2021-10-29  5:59 UTC (permalink / raw)
  To: Pavel Begunkov, io-uring; +Cc: axboe

hi,

> On 10/25/21 06:38, Xiaoguang Wang wrote:
>> In current implementation, if there are not available events,
>> io_poll_rewait() just gets completion_lock, and unlocks it in
>> io_poll_task_func() or io_async_task_func(), which isn't necessary.
>>
>> Change this logic to let io_poll_task_func() or io_async_task_func()
>> get the completion_lock lock.
>>
>> Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
>> ---
>>   fs/io_uring.c | 58 
>> ++++++++++++++++++++++++++--------------------------------
>>   1 file changed, 26 insertions(+), 32 deletions(-)
>>
>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>> index e4c779dac953..41ff8fdafe55 100644
>> --- a/fs/io_uring.c
>> +++ b/fs/io_uring.c
>> @@ -5248,10 +5248,7 @@ static inline int __io_async_wake(struct 
>> io_kiocb *req, struct io_poll_iocb *pol
>>   }
>>     static bool io_poll_rewait(struct io_kiocb *req, struct 
>> io_poll_iocb *poll)
>> -    __acquires(&req->ctx->completion_lock)
>>   {
>> -    struct io_ring_ctx *ctx = req->ctx;
>> -
>>       /* req->task == current here, checking PF_EXITING is safe */
>>       if (unlikely(req->task->flags & PF_EXITING))
>>           WRITE_ONCE(poll->canceled, true);
>> @@ -5262,7 +5259,6 @@ static bool io_poll_rewait(struct io_kiocb 
>> *req, struct io_poll_iocb *poll)
>>           req->result = vfs_poll(req->file, &pt) & poll->events;
>>       }
>>   -    spin_lock(&ctx->completion_lock);
>
> Don't remember poll sync too well but this was synchronising with the
> final section of __io_arm_poll_handler(), and I'm afraid it may go
> completely loose with races.
Yeah, I understand your concerns, and the final section of
__io_arm_poll_handler() seems very complicated; it may need a better cleanup.

After checking my patch and __io_arm_poll_handler() again, I think the race
which may be introduced by my patch is:
   1) __io_arm_poll_handler() calls list_del_init(&poll->wait.entry)
      under completion_lock.
   2) io_poll_rewait() calls add_wait_queue() without completion_lock.

But __io_arm_poll_handler() only calls list_del_init(&poll->wait.entry)
when mask isn't zero, and io_poll_rewait() only calls add_wait_queue() when
no real event happens (mask is zero).
So 1) and 2) should not happen at the same time; it seems that there's no race.

Regards,
Xiaoguang Wang

>
>
>>       if (!req->result && !READ_ONCE(poll->canceled)) {
>>           if (req->opcode == IORING_OP_POLL_ADD)
>>               WRITE_ONCE(poll->active, true);



* Re: [PATCH v3 2/3] io_uring: reduce frequent add_wait_queue() overhead for multi-shot poll request
  2021-10-29  2:57     ` Xiaoguang Wang
@ 2021-10-29 10:02       ` Pavel Begunkov
  2021-10-29 13:37         ` Xiaoguang Wang
  0 siblings, 1 reply; 19+ messages in thread
From: Pavel Begunkov @ 2021-10-29 10:02 UTC (permalink / raw)
  To: Xiaoguang Wang, io-uring; +Cc: axboe

On 10/29/21 03:57, Xiaoguang Wang wrote:
> hi,
> 
>> On 10/25/21 06:38, Xiaoguang Wang wrote:
>>> Run echo_server to evaluate io_uring's multi-shot poll performance, perf
>>> shows that add_wait_queue() has obvious overhead. Intruduce a new state
>>> 'active' in io_poll_iocb to indicate whether io_poll_wake() should queue
>>> a task_work. This new state will be set to true initially, be set to false
>>> when starting to queue a task work, and be set to true again when a poll
>>> cqe has been committed. One concern is that this method may lost waken-up
>>> event, but seems it's ok.
>>>
>>>    io_poll_wake                io_poll_task_func
>>> t1                       |
>>> t2                       |    WRITE_ONCE(req->poll.active, true);
>>> t3                       |
>>> t4                       |    io_commit_cqring(ctx);
>>> t5                       |
>>> t6                       |
>>>
>>> If waken-up events happens before or at t4, it's ok, user app will always
>>> see a cqe. If waken-up events happens after t4 and IIUC, io_poll_wake()
>>> will see the new req->poll.active value by using READ_ONCE().
>>>
>>> Echo_server codes can be cloned from:
>>> https://codeup.openanolis.cn/codeup/storage/io_uring-echo-server.git,
>>> branch is xiaoguangwang/io_uring_multishot.
>>>
>>> Without this patch, the tps in our test environment is 284116, with
>>> this patch, the tps is 287832, about 1.3% reqs improvement, which
>>> is indeed in accord with the saved add_wait_queue() cost.
>>>
>>> Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
>>> ---
>>>   fs/io_uring.c | 57 +++++++++++++++++++++++++++++++++------------------------
>>>   1 file changed, 33 insertions(+), 24 deletions(-)
>>>
>>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>>> index 18af9bb9a4bc..e4c779dac953 100644
>>> --- a/fs/io_uring.c
>>> +++ b/fs/io_uring.c
>>> @@ -481,6 +481,7 @@ struct io_poll_iocb {
>>>       __poll_t            events;
>>>       bool                done;
>>>       bool                canceled;
>>> +    bool                active;
>>>       struct wait_queue_entry        wait;
>>>   };
>>>   @@ -5233,8 +5234,6 @@ static inline int __io_async_wake(struct io_kiocb *req, struct io_poll_iocb *pol
>>>   {
>>>       trace_io_uring_task_add(req->ctx, req->opcode, req->user_data, mask);
>>>   -    list_del_init(&poll->wait.entry);
>>> -
>>
>> As I mentioned to Hao some time ago, we can't allow this function or in
>> particular io_req_task_work_add() to happen twice before the first
>> task work got executed, they use the same field in io_kiocb and those
>> will corrupt the tw list.
>>
>> Looks that's what can happen here.
> If I have understood scenario your described correctly, I think it won't happen :)
> With this patch, if the first io_req_task_work_add() is issued, poll.active
> will be set to false, then no new io_req_task_work_add() will be issued.
> Only the first task_work installed by the first io_req_task_work_add() has
> completed, poll.active will be set to true again.

Ah, I see now, the active dance is in io_poll_wake(). That won't work
with READ_ONCE/WRITE_ONCE though; you would need real atomics.

The easiest race to explain is:

CPU1                        | CPU2
io_poll_wake                | io_poll_wake
if (p->active) return 0;    | if (p->active) return 0;
        // p->active is false in both cases, continue
p->active = false;          | p->active = false;
task_work_add()             | task_work_add()


But there are more subtle races.
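
For illustration only (the names here are made up, this is not from the
patchset), one way to make that claim atomic would be a test-and-clear bit
instead of the plain bool, so only one waker can win:

  /* sketch: poll->flags and IO_POLL_ACTIVE_BIT are hypothetical */
  if (!test_and_clear_bit(IO_POLL_ACTIVE_BIT, &poll->flags))
          return 0;   /* another waker already claimed the task_work slot */
  return __io_async_wake(req, poll, mask, io_poll_task_func);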

-- 
Pavel Begunkov


* Re: [PATCH v3 2/3] io_uring: reduce frequent add_wait_queue() overhead for multi-shot poll request
  2021-10-29 10:02       ` Pavel Begunkov
@ 2021-10-29 13:37         ` Xiaoguang Wang
  2021-10-29 13:47           ` Pavel Begunkov
  2021-10-29 14:12           ` Pavel Begunkov
  0 siblings, 2 replies; 19+ messages in thread
From: Xiaoguang Wang @ 2021-10-29 13:37 UTC (permalink / raw)
  To: Pavel Begunkov, io-uring; +Cc: axboe

hi,

> On 10/29/21 03:57, Xiaoguang Wang wrote:
>> hi,
>>
>>> On 10/25/21 06:38, Xiaoguang Wang wrote:
>>>> Run echo_server to evaluate io_uring's multi-shot poll performance, 
>>>> perf
>>>> shows that add_wait_queue() has obvious overhead. Intruduce a new 
>>>> state
>>>> 'active' in io_poll_iocb to indicate whether io_poll_wake() should 
>>>> queue
>>>> a task_work. This new state will be set to true initially, be set 
>>>> to false
>>>> when starting to queue a task work, and be set to true again when a 
>>>> poll
>>>> cqe has been committed. One concern is that this method may lost 
>>>> waken-up
>>>> event, but seems it's ok.
>>>>
>>>>    io_poll_wake                io_poll_task_func
>>>> t1                       |
>>>> t2                       | WRITE_ONCE(req->poll.active, true);
>>>> t3                       |
>>>> t4                       |    io_commit_cqring(ctx);
>>>> t5                       |
>>>> t6                       |
>>>>
>>>> If waken-up events happens before or at t4, it's ok, user app will 
>>>> always
>>>> see a cqe. If waken-up events happens after t4 and IIUC, 
>>>> io_poll_wake()
>>>> will see the new req->poll.active value by using READ_ONCE().
>>>>
>>>> Echo_server codes can be cloned from:
>>>> https://codeup.openanolis.cn/codeup/storage/io_uring-echo-server.git,
>>>> branch is xiaoguangwang/io_uring_multishot.
>>>>
>>>> Without this patch, the tps in our test environment is 284116, with
>>>> this patch, the tps is 287832, about 1.3% reqs improvement, which
>>>> is indeed in accord with the saved add_wait_queue() cost.
>>>>
>>>> Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
>>>> ---
>>>>   fs/io_uring.c | 57 
>>>> +++++++++++++++++++++++++++++++++------------------------
>>>>   1 file changed, 33 insertions(+), 24 deletions(-)
>>>>
>>>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>>>> index 18af9bb9a4bc..e4c779dac953 100644
>>>> --- a/fs/io_uring.c
>>>> +++ b/fs/io_uring.c
>>>> @@ -481,6 +481,7 @@ struct io_poll_iocb {
>>>>       __poll_t            events;
>>>>       bool                done;
>>>>       bool                canceled;
>>>> +    bool                active;
>>>>       struct wait_queue_entry        wait;
>>>>   };
>>>>   @@ -5233,8 +5234,6 @@ static inline int __io_async_wake(struct 
>>>> io_kiocb *req, struct io_poll_iocb *pol
>>>>   {
>>>>       trace_io_uring_task_add(req->ctx, req->opcode, 
>>>> req->user_data, mask);
>>>>   -    list_del_init(&poll->wait.entry);
>>>> -
>>>
>>> As I mentioned to Hao some time ago, we can't allow this function or in
>>> particular io_req_task_work_add() to happen twice before the first
>>> task work got executed, they use the same field in io_kiocb and those
>>> will corrupt the tw list.
>>>
>>> Looks that's what can happen here.
>> If I have understood scenario your described correctly, I think it 
>> won't happen :)
>> With this patch, if the first io_req_task_work_add() is issued, 
>> poll.active
>> will be set to false, then no new io_req_task_work_add() will be issued.
>> Only the first task_work installed by the first 
>> io_req_task_work_add() has
>> completed, poll.active will be set to true again.
>
> Ah, I see now, the active dance is in io_poll_wake(). That won't work
> with READ_ONCE/WRITE_ONCE though, you would need real atomics
>
> The easiest race to explain is:
>
> CPU1                        | CPU2
> io_poll_wake                | io_poll_wake
> if (p->active) return 0;    | if (p->active) return 0;
it's "if (!p->active) return 0;" in my patch :)
> // p->active is false in both cases, continue
> p->active = false;          | p->active = false;
> task_work_add()             | task_work_add()
io_poll_wake() is called with poll->head->lock held, so there will be no
concurrent io_poll_wake() calls.

Regards,
Xiaoguang Wang
>
>
> But there are more subtle races.
>



* Re: [PATCH v3 2/3] io_uring: reduce frequent add_wait_queue() overhead for multi-shot poll request
  2021-10-29 13:37         ` Xiaoguang Wang
@ 2021-10-29 13:47           ` Pavel Begunkov
  2021-10-29 14:12           ` Pavel Begunkov
  1 sibling, 0 replies; 19+ messages in thread
From: Pavel Begunkov @ 2021-10-29 13:47 UTC (permalink / raw)
  To: Xiaoguang Wang, io-uring; +Cc: axboe

On 10/29/21 14:37, Xiaoguang Wang wrote:
> hi,
> 
>> On 10/29/21 03:57, Xiaoguang Wang wrote:
>>> hi,
>>>
>>>> On 10/25/21 06:38, Xiaoguang Wang wrote:
>>>>> Run echo_server to evaluate io_uring's multi-shot poll performance, perf
>>>>> shows that add_wait_queue() has obvious overhead. Intruduce a new state
>>>>> 'active' in io_poll_iocb to indicate whether io_poll_wake() should queue
>>>>> a task_work. This new state will be set to true initially, be set to false
>>>>> when starting to queue a task work, and be set to true again when a poll
>>>>> cqe has been committed. One concern is that this method may lost waken-up
>>>>> event, but seems it's ok.
>>>>>
>>>>>    io_poll_wake                io_poll_task_func
>>>>> t1                       |
>>>>> t2                       | WRITE_ONCE(req->poll.active, true);
>>>>> t3                       |
>>>>> t4                       |    io_commit_cqring(ctx);
>>>>> t5                       |
>>>>> t6                       |
>>>>>
>>>>> If waken-up events happens before or at t4, it's ok, user app will always
>>>>> see a cqe. If waken-up events happens after t4 and IIUC, io_poll_wake()
>>>>> will see the new req->poll.active value by using READ_ONCE().
>>>>>
>>>>> Echo_server codes can be cloned from:
>>>>> https://codeup.openanolis.cn/codeup/storage/io_uring-echo-server.git,
>>>>> branch is xiaoguangwang/io_uring_multishot.
>>>>>
>>>>> Without this patch, the tps in our test environment is 284116, with
>>>>> this patch, the tps is 287832, about 1.3% reqs improvement, which
>>>>> is indeed in accord with the saved add_wait_queue() cost.
>>>>>
>>>>> Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
>>>>> ---
>>>>>   fs/io_uring.c | 57 +++++++++++++++++++++++++++++++++------------------------
>>>>>   1 file changed, 33 insertions(+), 24 deletions(-)
>>>>>
>>>>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>>>>> index 18af9bb9a4bc..e4c779dac953 100644
>>>>> --- a/fs/io_uring.c
>>>>> +++ b/fs/io_uring.c
>>>>> @@ -481,6 +481,7 @@ struct io_poll_iocb {
>>>>>       __poll_t            events;
>>>>>       bool                done;
>>>>>       bool                canceled;
>>>>> +    bool                active;
>>>>>       struct wait_queue_entry        wait;
>>>>>   };
>>>>>   @@ -5233,8 +5234,6 @@ static inline int __io_async_wake(struct io_kiocb *req, struct io_poll_iocb *pol
>>>>>   {
>>>>>       trace_io_uring_task_add(req->ctx, req->opcode, req->user_data, mask);
>>>>>   -    list_del_init(&poll->wait.entry);
>>>>> -
>>>>
>>>> As I mentioned to Hao some time ago, we can't allow this function or in
>>>> particular io_req_task_work_add() to happen twice before the first
>>>> task work got executed, they use the same field in io_kiocb and those
>>>> will corrupt the tw list.
>>>>
>>>> Looks that's what can happen here.
>>> If I have understood scenario your described correctly, I think it won't happen :)
>>> With this patch, if the first io_req_task_work_add() is issued, poll.active
>>> will be set to false, then no new io_req_task_work_add() will be issued.
>>> Only the first task_work installed by the first io_req_task_work_add() has
>>> completed, poll.active will be set to true again.
>>
>> Ah, I see now, the active dance is in io_poll_wake(). That won't work
>> with READ_ONCE/WRITE_ONCE though, you would need real atomics
>>
>> The easiest race to explain is:
>>
>> CPU1                        | CPU2
>> io_poll_wake                | io_poll_wake
>> if (p->active) return 0;    | if (p->active) return 0;
> it's "if (!p->active) return 0;" in my patch :)
>> // p->active is false in both cases, continue
>> p->active = false;          | p->active = false;
>> task_work_add()             | task_work_add()
> io_poll_wake() is called with poll->head->lock, so there will be no concurrent
> io_poll_wake() calls.

Double poll, and also there are no guarantees that CPUs don't
see inconsistent values.
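
For concreteness, a minimal sketch of the "real atomics" variant being
suggested here. This is not the posted patch: the unsigned long field
'active_flags', the IO_POLL_ACTIVE_BIT constant and the *_sketch function
names are made up for the example, and the task_work call is schematic.
The point is only that an atomic RMW lets exactly one waker claim the
right to queue the task_work:

#define IO_POLL_ACTIVE_BIT	0	/* hypothetical; the patch uses a plain bool */

/*
 * Waker side: only the caller that actually clears the bit may queue the
 * task_work, so two racing wakers cannot both queue one for the same request.
 */
static int io_poll_wake_sketch(struct wait_queue_entry *wait, unsigned mode,
			       int sync, void *key)
{
	struct io_kiocb *req = wait->private;
	struct io_poll_iocb *poll = &req->poll;
	__poll_t mask = key_to_poll(key);

	if (mask && !(mask & poll->events))
		return 0;
	if (!test_and_clear_bit(IO_POLL_ACTIVE_BIT, &poll->active_flags))
		return 0;		/* a task_work is already pending */
	io_req_task_work_add(req);	/* schematic: run io_poll_task_func later */
	return 1;
}

/*
 * Completion side: re-arm only after the CQE is committed. A wake-up that
 * arrives before the commit is covered by the CQE being posted; one that
 * arrives after the re-arm claims the bit again and queues a new task_work.
 */
static void io_poll_rearm_sketch(struct io_kiocb *req)
{
	io_commit_cqring(req->ctx);
	set_bit(IO_POLL_ACTIVE_BIT, &req->poll.active_flags);
}

With a plain bool and READ_ONCE/WRITE_ONCE, both wakers can observe
active == true before either store becomes visible, which is exactly the
window the race table above describes.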

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 2/3] io_uring: reduce frequent add_wait_queue() overhead for multi-shot poll request
  2021-10-29 13:37         ` Xiaoguang Wang
  2021-10-29 13:47           ` Pavel Begunkov
@ 2021-10-29 14:12           ` Pavel Begunkov
  2021-10-29 14:34             ` Xiaoguang Wang
  1 sibling, 1 reply; 19+ messages in thread
From: Pavel Begunkov @ 2021-10-29 14:12 UTC (permalink / raw)
  To: Xiaoguang Wang, io-uring; +Cc: axboe

On 10/29/21 14:37, Xiaoguang Wang wrote:
> hi,
> 
>> On 10/29/21 03:57, Xiaoguang Wang wrote:
>>> hi,
>>>
>>>> On 10/25/21 06:38, Xiaoguang Wang wrote:
>>>>> Run echo_server to evaluate io_uring's multi-shot poll performance, perf
>>>>> shows that add_wait_queue() has obvious overhead. Introduce a new state
>>>>> 'active' in io_poll_iocb to indicate whether io_poll_wake() should queue
>>>>> a task_work. This new state will be set to true initially, be set to false
>>>>> when starting to queue a task work, and be set to true again when a poll
>>>>> cqe has been committed. One concern is that this method may lose wake-up
>>>>> events, but it seems to be ok.
>>>>>
>>>>>    io_poll_wake                io_poll_task_func
>>>>> t1                       |
>>>>> t2                       | WRITE_ONCE(req->poll.active, true);
>>>>> t3                       |
>>>>> t4                       |    io_commit_cqring(ctx);
>>>>> t5                       |
>>>>> t6                       |
>>>>>
>>>>> If wake-up events happen before or at t4, it's ok, the user app will always
>>>>> see a cqe. If a wake-up event happens after t4 and IIUC, io_poll_wake()
>>>>> will see the new req->poll.active value by using READ_ONCE().
>>>>>
>>>>> Echo_server codes can be cloned from:
>>>>> https://codeup.openanolis.cn/codeup/storage/io_uring-echo-server.git,
>>>>> branch is xiaoguangwang/io_uring_multishot.
>>>>>
>>>>> Without this patch, the tps in our test environment is 284116, with
>>>>> this patch, the tps is 287832, about 1.3% reqs improvement, which
>>>>> is indeed in accord with the saved add_wait_queue() cost.
>>>>>
>>>>> Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
>>>>> ---
>>>>>   fs/io_uring.c | 57 +++++++++++++++++++++++++++++++++------------------------
>>>>>   1 file changed, 33 insertions(+), 24 deletions(-)
>>>>>
>>>>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>>>>> index 18af9bb9a4bc..e4c779dac953 100644
>>>>> --- a/fs/io_uring.c
>>>>> +++ b/fs/io_uring.c
>>>>> @@ -481,6 +481,7 @@ struct io_poll_iocb {
>>>>>       __poll_t            events;
>>>>>       bool                done;
>>>>>       bool                canceled;
>>>>> +    bool                active;
>>>>>       struct wait_queue_entry        wait;
>>>>>   };
>>>>>   @@ -5233,8 +5234,6 @@ static inline int __io_async_wake(struct io_kiocb *req, struct io_poll_iocb *pol
>>>>>   {
>>>>>       trace_io_uring_task_add(req->ctx, req->opcode, req->user_data, mask);
>>>>>   -    list_del_init(&poll->wait.entry);
>>>>> -
>>>>
>>>> As I mentioned to Hao some time ago, we can't allow this function, or in
>>>> particular io_req_task_work_add(), to happen twice before the first
>>>> task work gets executed; they use the same field in io_kiocb, and that
>>>> will corrupt the tw list.
>>>>
>>>> Looks that's what can happen here.
>>> If I have understood the scenario you described correctly, I think it won't happen :)
>>> With this patch, if the first io_req_task_work_add() is issued, poll.active
>>> will be set to false, then no new io_req_task_work_add() will be issued.
>>> Only after the first task_work installed by the first io_req_task_work_add() has
>>> completed will poll.active be set to true again.
>>
>> Ah, I see now, the active dance is in io_poll_wake(). That won't work
>> with READ_ONCE/WRITE_ONCE though, you would need real atomics
>>
>> The easiest race to explain is:
>>
>> CPU1                        | CPU2
>> io_poll_wake                | io_poll_wake
>> if (p->active) return 0;    | if (p->active) return 0;
> it's "if (!p->active) return 0;" in my patch :)

hah, yeah, email draft-coding

>> // p->active is false in both cases, continue
>> p->active = false;          | p->active = false;
>> task_work_add()             | task_work_add()
> io_poll_wake() is called with poll->head->lock, so there will be no concurrent
> io_poll_wake() calls.

The problem is that the poll implementation is too wobbly; I can't count
how many corner cases there are... We can get to the point where your
patches are "proven" to work, but I'll be more on the cautious side as
the current state is hell over complicated.
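
To make the double-poll point concrete, a rough illustration (my reading
of the 5.15-era code; 'sock_head', 'second_head' and 'second_poll' are
made-up names, this is not a quote from fs/io_uring.c):

/* primary poll entry, e.g. on a socket's wait queue */
add_wait_queue(sock_head, &req->poll.wait);

/*
 * double poll: vfs_poll() asked for a second wait queue head, so another
 * wait entry for the same request lands on a different queue
 */
add_wait_queue(second_head, &second_poll->wait);

/*
 * CPU1: wake_up(sock_head)   -> takes sock_head->lock   -> io_poll_wake()
 * CPU2: wake_up(second_head) -> takes second_head->lock -> io_poll_wake()
 *
 * The two wakers hold different spinlocks, so nothing serializes their
 * accesses to req->poll.active; READ_ONCE/WRITE_ONCE alone cannot ensure
 * that only one of them queues a task_work.
 */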

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 2/3] io_uring: reduce frequent add_wait_queue() overhead for multi-shot poll request
  2021-10-29 14:12           ` Pavel Begunkov
@ 2021-10-29 14:34             ` Xiaoguang Wang
  0 siblings, 0 replies; 19+ messages in thread
From: Xiaoguang Wang @ 2021-10-29 14:34 UTC (permalink / raw)
  To: Pavel Begunkov, io-uring; +Cc: axboe


> On 10/29/21 14:37, Xiaoguang Wang wrote:
>> hi,
>>
>>> On 10/29/21 03:57, Xiaoguang Wang wrote:
>>>> hi,
>>>>
>>>>> On 10/25/21 06:38, Xiaoguang Wang wrote:
>>>>>> Run echo_server to evaluate io_uring's multi-shot poll 
>>>>>> performance, perf
>>>>>> shows that add_wait_queue() has obvious overhead. Introduce a new 
>>>>>> state
>>>>>> 'active' in io_poll_iocb to indicate whether io_poll_wake() 
>>>>>> should queue
>>>>>> a task_work. This new state will be set to true initially, be set 
>>>>>> to false
>>>>>> when starting to queue a task work, and be set to true again when 
>>>>>> a poll
>>>>>> cqe has been committed. One concern is that this method may lose 
>>>>>> wake-up
>>>>>> events, but it seems to be ok.
>>>>>>
>>>>>>    io_poll_wake                io_poll_task_func
>>>>>> t1                       |
>>>>>> t2                       | WRITE_ONCE(req->poll.active, true);
>>>>>> t3                       |
>>>>>> t4                       |    io_commit_cqring(ctx);
>>>>>> t5                       |
>>>>>> t6                       |
>>>>>>
>>>>>> If wake-up events happen before or at t4, it's ok, the user app 
>>>>>> will always
>>>>>> see a cqe. If a wake-up event happens after t4 and IIUC, 
>>>>>> io_poll_wake()
>>>>>> will see the new req->poll.active value by using READ_ONCE().
>>>>>>
>>>>>> Echo_server codes can be cloned from:
>>>>>> https://codeup.openanolis.cn/codeup/storage/io_uring-echo-server.git, 
>>>>>>
>>>>>> branch is xiaoguangwang/io_uring_multishot.
>>>>>>
>>>>>> Without this patch, the tps in our test environment is 284116, with
>>>>>> this patch, the tps is 287832, about 1.3% reqs improvement, which
>>>>>> is indeed in accord with the saved add_wait_queue() cost.
>>>>>>
>>>>>> Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
>>>>>> ---
>>>>>>   fs/io_uring.c | 57 
>>>>>> +++++++++++++++++++++++++++++++++------------------------
>>>>>>   1 file changed, 33 insertions(+), 24 deletions(-)
>>>>>>
>>>>>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>>>>>> index 18af9bb9a4bc..e4c779dac953 100644
>>>>>> --- a/fs/io_uring.c
>>>>>> +++ b/fs/io_uring.c
>>>>>> @@ -481,6 +481,7 @@ struct io_poll_iocb {
>>>>>>       __poll_t            events;
>>>>>>       bool                done;
>>>>>>       bool                canceled;
>>>>>> +    bool                active;
>>>>>>       struct wait_queue_entry        wait;
>>>>>>   };
>>>>>>   @@ -5233,8 +5234,6 @@ static inline int __io_async_wake(struct 
>>>>>> io_kiocb *req, struct io_poll_iocb *pol
>>>>>>   {
>>>>>>       trace_io_uring_task_add(req->ctx, req->opcode, 
>>>>>> req->user_data, mask);
>>>>>>   -    list_del_init(&poll->wait.entry);
>>>>>> -
>>>>>
>>>>> As I mentioned to Hao some time ago, we can't allow this function, 
>>>>> or in
>>>>> particular io_req_task_work_add(), to happen twice before the first
>>>>> task work gets executed; they use the same field in io_kiocb, and that
>>>>> will corrupt the tw list.
>>>>>
>>>>> Looks that's what can happen here.
>>>> If I have understood the scenario you described correctly, I think it 
>>>> won't happen :)
>>>> With this patch, if the first io_req_task_work_add() is issued, 
>>>> poll.active
>>>> will be set to false, then no new io_req_task_work_add() will be 
>>>> issued.
>>>> Only after the first task_work installed by the first 
>>>> io_req_task_work_add() has
>>>> completed will poll.active be set to true again.
>>>
>>> Ah, I see now, the active dance is in io_poll_wake(). That won't work
>>> with READ_ONCE/WRITE_ONCE though, you would need real atomics
>>>
>>> The easiest race to explain is:
>>>
>>> CPU1                        | CPU2
>>> io_poll_wake                | io_poll_wake
>>> if (p->active) return 0;    | if (p->active) return 0;
>> it's "if (!p->active) return 0;" in my patch :)
>
> hah, yeah, email draft-coding
>
>>> // p->active is false in both cases, continue
>>> p->active = false;          | p->active = false;
>>> task_work_add()             | task_work_add()
>> io_poll_wake() is called with poll->head->lock, so there will be no 
>> concurrent
>> io_poll_wake() calls.
>
> The problem is that the poll implementation is too wobbly, can't count
> how many corner cases there are... 
Absolutely agree with you. In the process of developing the fixed poll patch,
I have realized the poll implementation is not easy...

> We can get to the point that your
> patches are "proven" to work, but I'll be more on the cautious side as
> the current state is hell over complicated.
OK, I understand your concerns. I'll check the double poll code again; I'm not
quite familiar with it yet.

Regards,
Xiaoguang Wang



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 0/3] improvements for multi-shot poll requests
  2021-10-28 18:19 ` [PATCH v3 0/3] improvements for multi-shot poll requests Jens Axboe
@ 2021-10-29 18:29   ` Jens Axboe
  0 siblings, 0 replies; 19+ messages in thread
From: Jens Axboe @ 2021-10-29 18:29 UTC (permalink / raw)
  To: Xiaoguang Wang, io-uring; +Cc: asml.silence

On 10/28/21 12:19 PM, Jens Axboe wrote:
> On 10/24/21 11:38 PM, Xiaoguang Wang wrote:
>> Echo_server code can be cloned from:
>> https://codeup.openanolis.cn/codeup/storage/io_uring-echo-server.git
>> branch is xiaoguangwang/io_uring_multishot. There is a simple HOWTO
>> in this repository.
>>
>> Usage:
>> In server: port 10016, 1000 connections, packet size 16 bytes, and
>> enable fixed files.
>>   taskset -c 10 io_uring_echo_server_multi_shot  -f -p 10016 -n 1000 -l 16
>>
>> In client:
>>   taskset -c 13,14,15,16 ./echo -addr 11.238.147.21:10016 -n 1000 -size 16
>>
>> Before this patchset, the tps is like below:
>> 11:15:53 req: 1430425, req/s: 286084.693
>> 11:15:58 req: 1426021, req/s: 285204.079
>> 11:16:03 req: 1416761, req/s: 283352.146
>> 11:16:08 req: 1417969, req/s: 283165.637
>> 11:16:13 req: 1424591, req/s: 285349.915
>> 11:16:18 req: 1418706, req/s: 283738.725
>> 11:16:23 req: 1411988, req/s: 282399.052
>> 11:16:28 req: 1419097, req/s: 283820.477
>> 11:16:33 req: 1417816, req/s: 283563.262
>> 11:16:38 req: 1422461, req/s: 284491.702
>> 11:16:43 req: 1418176, req/s: 283635.327
>> 11:16:48 req: 1414525, req/s: 282905.276
>> 11:16:53 req: 1415624, req/s: 283124.140
>> 11:16:58 req: 1426435, req/s: 284970.486
>>
>> with this patchset:
>> 2021/09/24 11:10:01 start to do client
>> 11:10:06 req: 1444979, req/s: 288995.300
>> 11:10:11 req: 1442559, req/s: 288511.689
>> 11:10:16 req: 1427253, req/s: 285450.390
>> 11:10:21 req: 1445236, req/s: 288349.853
>> 11:10:26 req: 1423949, req/s: 285480.941
>> 11:10:31 req: 1445304, req/s: 289060.815
>> 11:10:36 req: 1441036, req/s: 288207.119
>> 11:10:41 req: 1441117, req/s: 288220.695
>> 11:10:46 req: 1441451, req/s: 288292.731
>> 11:10:51 req: 1438801, req/s: 287759.157
>> 11:10:56 req: 1433227, req/s: 286646.338
>> 11:11:01 req: 1438307, req/s: 287661.577
> 
>> about 1.3% tps improvements.
> 
> In the spirit of moving this one along, I've applied this series. Still a few
> things we can do on top, but I don't think that should hold it back. If you
> planned on sending an update to inline that check again just do it on top
> of the current tree.

Due to the discussion about it, and the potential syzbot issue reported,
I've dropped this one from the 5.16 merge window.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2021-10-29 18:29 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-10-25  5:38 [PATCH v3 0/3] improvements for multi-shot poll requests Xiaoguang Wang
2021-10-25  5:38 ` [PATCH v3 1/3] io_uring: refactor event check out of __io_async_wake() Xiaoguang Wang
2021-10-25  9:35   ` Praveen Kumar
2021-10-25  5:38 ` [PATCH v3 2/3] io_uring: reduce frequent add_wait_queue() overhead for multi-shot poll request Xiaoguang Wang
2021-10-28 19:21   ` Pavel Begunkov
2021-10-29  2:57     ` Xiaoguang Wang
2021-10-29 10:02       ` Pavel Begunkov
2021-10-29 13:37         ` Xiaoguang Wang
2021-10-29 13:47           ` Pavel Begunkov
2021-10-29 14:12           ` Pavel Begunkov
2021-10-29 14:34             ` Xiaoguang Wang
2021-10-25  5:38 ` [PATCH v3 3/3] io_uring: don't get completion_lock in io_poll_rewait() Xiaoguang Wang
2021-10-28 19:26   ` Pavel Begunkov
2021-10-29  5:59     ` Xiaoguang Wang
2021-10-28 18:19 ` [PATCH v3 0/3] improvements for multi-shot poll requests Jens Axboe
2021-10-29 18:29   ` Jens Axboe
2021-10-28 18:19 ` Jens Axboe
2021-10-28 19:01   ` Pavel Begunkov
2021-10-28 19:04     ` Pavel Begunkov
