io-uring.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH for-5.16 0/2] async hybrid, a new way for pollable requests
@ 2021-10-08 12:36 Hao Xu
  2021-10-08 12:36 ` [PATCH 1/2] io_uring: add IOSQE_ASYNC_HYBRID flag " Hao Xu
                   ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Hao Xu @ 2021-10-08 12:36 UTC (permalink / raw)
  To: Jens Axboe; +Cc: io-uring, Pavel Begunkov, Joseph Qi

this is a new feature for pollable requests, see detail in commit
message.


Hao Xu (2):
  io_uring: add IOSQE_ASYNC_HYBRID flag for pollable requests
  io_uring: implementation of IOSQE_ASYNC_HYBRID logic

 fs/io_uring.c                 | 48 +++++++++++++++++++++++++++++++----
 include/uapi/linux/io_uring.h |  4 ++-
 2 files changed, 46 insertions(+), 6 deletions(-)

-- 
2.24.4


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 1/2] io_uring: add IOSQE_ASYNC_HYBRID flag for pollable requests
  2021-10-08 12:36 [PATCH for-5.16 0/2] async hybrid, a new way for pollable requests Hao Xu
@ 2021-10-08 12:36 ` Hao Xu
  2021-10-08 12:36 ` [PATCH 2/2] io_uring: implementation of IOSQE_ASYNC_HYBRID logic Hao Xu
  2021-10-09 12:51 ` [PATCH for-5.16 0/2] async hybrid, a new way for pollable requests Pavel Begunkov
  2 siblings, 0 replies; 13+ messages in thread
From: Hao Xu @ 2021-10-08 12:36 UTC (permalink / raw)
  To: Jens Axboe; +Cc: io-uring, Pavel Begunkov, Joseph Qi

The current logic of requests with IOSQE_FORCE_ASYNC is first queueing
it to io-worker, then execute it in a synchronous way. For unbound works
like pollable requests(e.g. read/write a socketfd), the io-worker may
stuck there waiting for events for a long time. And thus other works
wait in the list for a long time too.
Let's introduce a new sqe flag IOSQE_ASYNC_HIBRID for unbound works
(currently pollable requests), with this a request will first be queued
to io-worker, then executed in a nonblock try rather than a synchronous
way. Failure of that leads it to arm poll stuff and then the worker can
begin to handle other works.

Suggested-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Hao Xu <haoxu@linux.alibaba.com>
---
 fs/io_uring.c                 | 6 +++++-
 include/uapi/linux/io_uring.h | 4 +++-
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 73135c5c6168..a99f7f46e6d4 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -104,7 +104,8 @@
 #define IORING_MAX_REG_BUFFERS	(1U << 14)
 
 #define SQE_COMMON_FLAGS (IOSQE_FIXED_FILE | IOSQE_IO_LINK | \
-			  IOSQE_IO_HARDLINK | IOSQE_ASYNC)
+			  IOSQE_IO_HARDLINK | IOSQE_ASYNC | \
+			  IOSQE_ASYNC_HYBRID)
 
 #define SQE_VALID_FLAGS	(SQE_COMMON_FLAGS|IOSQE_BUFFER_SELECT|IOSQE_IO_DRAIN)
 
@@ -709,6 +710,7 @@ enum {
 	REQ_F_HARDLINK_BIT	= IOSQE_IO_HARDLINK_BIT,
 	REQ_F_FORCE_ASYNC_BIT	= IOSQE_ASYNC_BIT,
 	REQ_F_BUFFER_SELECT_BIT	= IOSQE_BUFFER_SELECT_BIT,
+	REQ_F_ASYNC_HYBRID_BIT	= IOSQE_ASYNC_HYBRID_BIT,
 
 	/* first byte is taken by user flags, shift it to not overlap */
 	REQ_F_FAIL_BIT		= 8,
@@ -747,6 +749,8 @@ enum {
 	REQ_F_FORCE_ASYNC	= BIT(REQ_F_FORCE_ASYNC_BIT),
 	/* IOSQE_BUFFER_SELECT */
 	REQ_F_BUFFER_SELECT	= BIT(REQ_F_BUFFER_SELECT_BIT),
+	/* IOSQE_ASYNC_HYBRID */
+	REQ_F_ASYNC_HYBRID	= BIT(REQ_F_ASYNC_HYBRID_BIT),
 
 	/* fail rest of links */
 	REQ_F_FAIL		= BIT(REQ_F_FAIL_BIT),
diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index c45b5e9a9387..3e49a7dbe636 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -70,6 +70,7 @@ enum {
 	IOSQE_IO_HARDLINK_BIT,
 	IOSQE_ASYNC_BIT,
 	IOSQE_BUFFER_SELECT_BIT,
+	IOSQE_ASYNC_HYBRID_BIT,
 };
 
 /*
@@ -87,7 +88,8 @@ enum {
 #define IOSQE_ASYNC		(1U << IOSQE_ASYNC_BIT)
 /* select buffer from sqe->buf_group */
 #define IOSQE_BUFFER_SELECT	(1U << IOSQE_BUFFER_SELECT_BIT)
-
+/* first force async then arm poll */
+#define IOSQE_ASYNC_HYBRID	(1U << IOSQE_ASYNC_HYBRID_BIT)
 /*
  * io_uring_setup() flags
  */
-- 
2.24.4


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 2/2] io_uring: implementation of IOSQE_ASYNC_HYBRID logic
  2021-10-08 12:36 [PATCH for-5.16 0/2] async hybrid, a new way for pollable requests Hao Xu
  2021-10-08 12:36 ` [PATCH 1/2] io_uring: add IOSQE_ASYNC_HYBRID flag " Hao Xu
@ 2021-10-08 12:36 ` Hao Xu
  2021-10-09 12:46   ` Pavel Begunkov
  2021-10-09 12:51 ` [PATCH for-5.16 0/2] async hybrid, a new way for pollable requests Pavel Begunkov
  2 siblings, 1 reply; 13+ messages in thread
From: Hao Xu @ 2021-10-08 12:36 UTC (permalink / raw)
  To: Jens Axboe; +Cc: io-uring, Pavel Begunkov, Joseph Qi

The process of this kind of requests is:

step1: original context:
           queue it to io-worker
step2: io-worker context:
           nonblock try(the old logic is a synchronous try here)
               |
               |--fail--> arm poll
                            |
                            |--(fail/ready)-->synchronous issue
                            |
                            |--(succeed)-->worker finish it's job, tw
                                           take over the req

This works much better than IOSQE_ASYNC in cases where cpu resources
are scarce or unbound max_worker is small. In these cases, number of
io-worker eazily increments to max_worker, new worker cannot be created
and running workers stuck there handling old works in IOSQE_ASYNC mode.

In my machine, set unbound max_worker to 20, run echo-server, turns out:
(arguments: register_file, connetion number is 1000, message size is 12
Byte)
IOSQE_ASYNC: 76664.151 tps
IOSQE_ASYNC_HYBRID: 166934.985 tps

Suggested-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Hao Xu <haoxu@linux.alibaba.com>
---
 fs/io_uring.c | 42 ++++++++++++++++++++++++++++++++++++++----
 1 file changed, 38 insertions(+), 4 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index a99f7f46e6d4..024cef09bc12 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -1409,7 +1409,7 @@ static void io_prep_async_work(struct io_kiocb *req)
 
 	req->work.list.next = NULL;
 	req->work.flags = 0;
-	if (req->flags & REQ_F_FORCE_ASYNC)
+	if (req->flags & (REQ_F_FORCE_ASYNC | REQ_F_ASYNC_HYBRID))
 		req->work.flags |= IO_WQ_WORK_CONCURRENT;
 
 	if (req->flags & REQ_F_ISREG) {
@@ -5575,7 +5575,13 @@ static int io_arm_poll_handler(struct io_kiocb *req)
 	req->apoll = apoll;
 	req->flags |= REQ_F_POLLED;
 	ipt.pt._qproc = io_async_queue_proc;
-	io_req_set_refcount(req);
+	/*
+	 * REQ_F_REFCOUNT set indicate we are in io-worker context, where we
+	 * already explicitly set the submittion and completion ref. So no
+	 * need to set refcount here if that is the case.
+	 */
+	if (!(req->flags & REQ_F_REFCOUNT))
+		io_req_set_refcount(req);
 
 	ret = __io_arm_poll_handler(req, &apoll->poll, &ipt, mask,
 					io_async_wake);
@@ -6704,8 +6710,11 @@ static void io_wq_submit_work(struct io_wq_work *work)
 		ret = -ECANCELED;
 
 	if (!ret) {
+		bool need_poll = req->flags & REQ_F_ASYNC_HYBRID;
+
 		do {
-			ret = io_issue_sqe(req, 0);
+issue_sqe:
+			ret = io_issue_sqe(req, need_poll ? IO_URING_F_NONBLOCK : 0);
 			/*
 			 * We can get EAGAIN for polled IO even though we're
 			 * forcing a sync submission from here, since we can't
@@ -6713,6 +6722,30 @@ static void io_wq_submit_work(struct io_wq_work *work)
 			 */
 			if (ret != -EAGAIN)
 				break;
+			if (need_poll) {
+				bool armed = false;
+
+				ret = 0;
+				need_poll = false;
+
+				switch (io_arm_poll_handler(req)) {
+				case IO_APOLL_READY:
+					goto issue_sqe;
+				case IO_APOLL_ABORTED:
+					/*
+					 * somehow we failed to arm the poll infra,
+					 * fallback it to a normal async worker try.
+					 */
+					break;
+				case IO_APOLL_OK:
+					armed = true;
+					break;
+				}
+
+				if (armed)
+					break;
+
+			}
 			cond_resched();
 		} while (1);
 	}
@@ -6928,7 +6961,8 @@ static void io_queue_sqe_fallback(struct io_kiocb *req)
 static inline void io_queue_sqe(struct io_kiocb *req)
 	__must_hold(&req->ctx->uring_lock)
 {
-	if (likely(!(req->flags & (REQ_F_FORCE_ASYNC | REQ_F_FAIL))))
+	if (likely(!(req->flags & (REQ_F_FORCE_ASYNC | REQ_F_ASYNC_HYBRID |
+				   REQ_F_FAIL))))
 		__io_queue_sqe(req);
 	else
 		io_queue_sqe_fallback(req);
-- 
2.24.4


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/2] io_uring: implementation of IOSQE_ASYNC_HYBRID logic
  2021-10-08 12:36 ` [PATCH 2/2] io_uring: implementation of IOSQE_ASYNC_HYBRID logic Hao Xu
@ 2021-10-09 12:46   ` Pavel Begunkov
  2021-10-11  8:55     ` Hao Xu
  0 siblings, 1 reply; 13+ messages in thread
From: Pavel Begunkov @ 2021-10-09 12:46 UTC (permalink / raw)
  To: Hao Xu, Jens Axboe; +Cc: io-uring, Joseph Qi

On 10/8/21 13:36, Hao Xu wrote:
> The process of this kind of requests is:
> 
> step1: original context:
>             queue it to io-worker
> step2: io-worker context:
>             nonblock try(the old logic is a synchronous try here)
>                 |
>                 |--fail--> arm poll
>                              |
>                              |--(fail/ready)-->synchronous issue
>                              |
>                              |--(succeed)-->worker finish it's job, tw
>                                             take over the req
> 
> This works much better than IOSQE_ASYNC in cases where cpu resources
> are scarce or unbound max_worker is small. In these cases, number of
> io-worker eazily increments to max_worker, new worker cannot be created
> and running workers stuck there handling old works in IOSQE_ASYNC mode.
> 
> In my machine, set unbound max_worker to 20, run echo-server, turns out:
> (arguments: register_file, connetion number is 1000, message size is 12
> Byte)
> IOSQE_ASYNC: 76664.151 tps
> IOSQE_ASYNC_HYBRID: 166934.985 tps
> 
> Suggested-by: Jens Axboe <axboe@kernel.dk>
> Signed-off-by: Hao Xu <haoxu@linux.alibaba.com>
> ---
>   fs/io_uring.c | 42 ++++++++++++++++++++++++++++++++++++++----
>   1 file changed, 38 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/io_uring.c b/fs/io_uring.c
> index a99f7f46e6d4..024cef09bc12 100644
> --- a/fs/io_uring.c
> +++ b/fs/io_uring.c
> @@ -1409,7 +1409,7 @@ static void io_prep_async_work(struct io_kiocb *req)
>   
>   	req->work.list.next = NULL;
>   	req->work.flags = 0;
> -	if (req->flags & REQ_F_FORCE_ASYNC)
> +	if (req->flags & (REQ_F_FORCE_ASYNC | REQ_F_ASYNC_HYBRID))
>   		req->work.flags |= IO_WQ_WORK_CONCURRENT;
>   
>   	if (req->flags & REQ_F_ISREG) {
> @@ -5575,7 +5575,13 @@ static int io_arm_poll_handler(struct io_kiocb *req)
>   	req->apoll = apoll;
>   	req->flags |= REQ_F_POLLED;
>   	ipt.pt._qproc = io_async_queue_proc;
> -	io_req_set_refcount(req);
> +	/*
> +	 * REQ_F_REFCOUNT set indicate we are in io-worker context, where we

Nope, it indicates that needs more complex refcounting. It includes linked
timeouts but also poll because of req_ref_get for double poll. fwiw, with
some work it can be removed for polls, harder (and IMHO not necessary) to do
for timeouts.

> +	 * already explicitly set the submittion and completion ref. So no

I'd say there is no notion of submission vs completion refs anymore.

> +	 * need to set refcount here if that is the case.
> +	 */
> +	if (!(req->flags & REQ_F_REFCOUNT))

Compare it with io_req_set_refcount(), that "if" is a a no-op

> +		io_req_set_refcount(req);
>   
>   	ret = __io_arm_poll_handler(req, &apoll->poll, &ipt, mask,
>   					io_async_wake);
> @@ -6704,8 +6710,11 @@ static void io_wq_submit_work(struct io_wq_work *work)
>   		ret = -ECANCELED;
>   
>   	if (!ret) {
> +		bool need_poll = req->flags & REQ_F_ASYNC_HYBRID;
> +
>   		do {
> -			ret = io_issue_sqe(req, 0);
> +issue_sqe:
> +			ret = io_issue_sqe(req, need_poll ? IO_URING_F_NONBLOCK : 0);

It's buggy, you will get all kinds of kernel crashes and leaks.
Currently IO_URING_F_NONBLOCK has dual meaning: obvious nonblock but
also whether we hold uring_lock or not. You'd need to split the flag
into two, i.e. IO_URING_F_LOCKED

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH for-5.16 0/2] async hybrid, a new way for pollable requests
  2021-10-08 12:36 [PATCH for-5.16 0/2] async hybrid, a new way for pollable requests Hao Xu
  2021-10-08 12:36 ` [PATCH 1/2] io_uring: add IOSQE_ASYNC_HYBRID flag " Hao Xu
  2021-10-08 12:36 ` [PATCH 2/2] io_uring: implementation of IOSQE_ASYNC_HYBRID logic Hao Xu
@ 2021-10-09 12:51 ` Pavel Begunkov
  2021-10-11  3:08   ` Hao Xu
  2 siblings, 1 reply; 13+ messages in thread
From: Pavel Begunkov @ 2021-10-09 12:51 UTC (permalink / raw)
  To: Hao Xu, Jens Axboe; +Cc: io-uring, Joseph Qi

On 10/8/21 13:36, Hao Xu wrote:
> this is a new feature for pollable requests, see detail in commit
> message.

It really sounds we should do it as a part of IOSQE_ASYNC, so
what are the cons and compromises?

> Hao Xu (2):
>    io_uring: add IOSQE_ASYNC_HYBRID flag for pollable requests

btw, it doesn't make sense to split it into two patches

>    io_uring: implementation of IOSQE_ASYNC_HYBRID logic
> 
>   fs/io_uring.c                 | 48 +++++++++++++++++++++++++++++++----
>   include/uapi/linux/io_uring.h |  4 ++-
>   2 files changed, 46 insertions(+), 6 deletions(-)
> 

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH for-5.16 0/2] async hybrid, a new way for pollable requests
  2021-10-09 12:51 ` [PATCH for-5.16 0/2] async hybrid, a new way for pollable requests Pavel Begunkov
@ 2021-10-11  3:08   ` Hao Xu
  2021-10-12 11:39     ` Pavel Begunkov
  0 siblings, 1 reply; 13+ messages in thread
From: Hao Xu @ 2021-10-11  3:08 UTC (permalink / raw)
  To: Pavel Begunkov, Jens Axboe; +Cc: io-uring, Joseph Qi

在 2021/10/9 下午8:51, Pavel Begunkov 写道:
> On 10/8/21 13:36, Hao Xu wrote:
>> this is a new feature for pollable requests, see detail in commit
>> message.
> 
> It really sounds we should do it as a part of IOSQE_ASYNC, so
> what are the cons and compromises?
I wrote the pros and cons here:
https://github.com/axboe/liburing/issues/426#issuecomment-939221300

> 
>> Hao Xu (2):
>>    io_uring: add IOSQE_ASYNC_HYBRID flag for pollable requests
> 
> btw, it doesn't make sense to split it into two patches
Hmm, I thought we should make adding a new flag as a separate patch.
Could you give me more hints about the considerration here?
> 
>>    io_uring: implementation of IOSQE_ASYNC_HYBRID logic
>>
>>   fs/io_uring.c                 | 48 +++++++++++++++++++++++++++++++----
>>   include/uapi/linux/io_uring.h |  4 ++-
>>   2 files changed, 46 insertions(+), 6 deletions(-)
>>
> 


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/2] io_uring: implementation of IOSQE_ASYNC_HYBRID logic
  2021-10-09 12:46   ` Pavel Begunkov
@ 2021-10-11  8:55     ` Hao Xu
  2021-10-11  8:58       ` Hao Xu
  0 siblings, 1 reply; 13+ messages in thread
From: Hao Xu @ 2021-10-11  8:55 UTC (permalink / raw)
  To: Pavel Begunkov, Jens Axboe; +Cc: io-uring, Joseph Qi

在 2021/10/9 下午8:46, Pavel Begunkov 写道:
> On 10/8/21 13:36, Hao Xu wrote:
>> The process of this kind of requests is:
>>
>> step1: original context:
>>             queue it to io-worker
>> step2: io-worker context:
>>             nonblock try(the old logic is a synchronous try here)
>>                 |
>>                 |--fail--> arm poll
>>                              |
>>                              |--(fail/ready)-->synchronous issue
>>                              |
>>                              |--(succeed)-->worker finish it's job, tw
>>                                             take over the req
>>
>> This works much better than IOSQE_ASYNC in cases where cpu resources
>> are scarce or unbound max_worker is small. In these cases, number of
>> io-worker eazily increments to max_worker, new worker cannot be created
>> and running workers stuck there handling old works in IOSQE_ASYNC mode.
>>
>> In my machine, set unbound max_worker to 20, run echo-server, turns out:
>> (arguments: register_file, connetion number is 1000, message size is 12
>> Byte)
>> IOSQE_ASYNC: 76664.151 tps
>> IOSQE_ASYNC_HYBRID: 166934.985 tps
>>
>> Suggested-by: Jens Axboe <axboe@kernel.dk>
>> Signed-off-by: Hao Xu <haoxu@linux.alibaba.com>
>> ---
>>   fs/io_uring.c | 42 ++++++++++++++++++++++++++++++++++++++----
>>   1 file changed, 38 insertions(+), 4 deletions(-)
>>
>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>> index a99f7f46e6d4..024cef09bc12 100644
>> --- a/fs/io_uring.c
>> +++ b/fs/io_uring.c
>> @@ -1409,7 +1409,7 @@ static void io_prep_async_work(struct io_kiocb 
>> *req)
>>       req->work.list.next = NULL;
>>       req->work.flags = 0;
>> -    if (req->flags & REQ_F_FORCE_ASYNC)
>> +    if (req->flags & (REQ_F_FORCE_ASYNC | REQ_F_ASYNC_HYBRID))
>>           req->work.flags |= IO_WQ_WORK_CONCURRENT;
>>       if (req->flags & REQ_F_ISREG) {
>> @@ -5575,7 +5575,13 @@ static int io_arm_poll_handler(struct io_kiocb 
>> *req)
>>       req->apoll = apoll;
>>       req->flags |= REQ_F_POLLED;
>>       ipt.pt._qproc = io_async_queue_proc;
>> -    io_req_set_refcount(req);
>> +    /*
>> +     * REQ_F_REFCOUNT set indicate we are in io-worker context, where we
> 
> Nope, it indicates that needs more complex refcounting. It includes linked
> timeouts but also poll because of req_ref_get for double poll. fwiw, with
> some work it can be removed for polls, harder (and IMHO not necessary) 
> to do
> for timeouts.Agree, I now realize that the explanation I put here is not good at all,
I actually want to say that the io-worker already set refs = 2 (also
possible that prep_link_out set 1, and io-worker adds the other 1,
previously I miss this situation). One will be put at completion time,
the other one will be put in io_wq_free_work(). So no need to set the
refcount here again. I looked into io_req_set_refcount(), since it does
nothing if refcount is already not zero, I should be ok to keep this one
as it was.
> 
>> +     * already explicitly set the submittion and completion ref. So no
> 
> I'd say there is no notion of submission vs completion refs anymore.
> 
>> +     * need to set refcount here if that is the case.
>> +     */
>> +    if (!(req->flags & REQ_F_REFCOUNT))
> 
> Compare it with io_req_set_refcount(), that "if" is a a no-op
> 
>> +        io_req_set_refcount(req);
>>       ret = __io_arm_poll_handler(req, &apoll->poll, &ipt, mask,
>>                       io_async_wake);
>> @@ -6704,8 +6710,11 @@ static void io_wq_submit_work(struct io_wq_work 
>> *work)
>>           ret = -ECANCELED;
>>       if (!ret) {
>> +        bool need_poll = req->flags & REQ_F_ASYNC_HYBRID;
>> +
>>           do {
>> -            ret = io_issue_sqe(req, 0);
>> +issue_sqe:
>> +            ret = io_issue_sqe(req, need_poll ? IO_URING_F_NONBLOCK : 
>> 0);
> 
> It's buggy, you will get all kinds of kernel crashes and leaks.
> Currently IO_URING_F_NONBLOCK has dual meaning: obvious nonblock but
> also whether we hold uring_lock or not. You'd need to split the flag
> into two, i.e. IO_URING_F_LOCKED
I'll look into it. I was thinking about to do the first nowait try in
the original context, but then I thought it doesn't make sense to bring
up a worker just for poll infra arming since thread creating and
scheduling has its overhead.
> 


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/2] io_uring: implementation of IOSQE_ASYNC_HYBRID logic
  2021-10-11  8:55     ` Hao Xu
@ 2021-10-11  8:58       ` Hao Xu
  0 siblings, 0 replies; 13+ messages in thread
From: Hao Xu @ 2021-10-11  8:58 UTC (permalink / raw)
  To: Pavel Begunkov, Jens Axboe; +Cc: io-uring, Joseph Qi

在 2021/10/11 下午4:55, Hao Xu 写道:
> 在 2021/10/9 下午8:46, Pavel Begunkov 写道:
>> On 10/8/21 13:36, Hao Xu wrote:
>>> The process of this kind of requests is:
>>>
>>> step1: original context:
>>>             queue it to io-worker
>>> step2: io-worker context:
>>>             nonblock try(the old logic is a synchronous try here)
>>>                 |
>>>                 |--fail--> arm poll
>>>                              |
>>>                              |--(fail/ready)-->synchronous issue
>>>                              |
>>>                              |--(succeed)-->worker finish it's job, tw
>>>                                             take over the req
>>>
>>> This works much better than IOSQE_ASYNC in cases where cpu resources
>>> are scarce or unbound max_worker is small. In these cases, number of
>>> io-worker eazily increments to max_worker, new worker cannot be created
>>> and running workers stuck there handling old works in IOSQE_ASYNC mode.
>>>
>>> In my machine, set unbound max_worker to 20, run echo-server, turns out:
>>> (arguments: register_file, connetion number is 1000, message size is 12
>>> Byte)
>>> IOSQE_ASYNC: 76664.151 tps
>>> IOSQE_ASYNC_HYBRID: 166934.985 tps
>>>
>>> Suggested-by: Jens Axboe <axboe@kernel.dk>
>>> Signed-off-by: Hao Xu <haoxu@linux.alibaba.com>
>>> ---
>>>   fs/io_uring.c | 42 ++++++++++++++++++++++++++++++++++++++----
>>>   1 file changed, 38 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>>> index a99f7f46e6d4..024cef09bc12 100644
>>> --- a/fs/io_uring.c
>>> +++ b/fs/io_uring.c
>>> @@ -1409,7 +1409,7 @@ static void io_prep_async_work(struct io_kiocb 
>>> *req)
>>>       req->work.list.next = NULL;
>>>       req->work.flags = 0;
>>> -    if (req->flags & REQ_F_FORCE_ASYNC)
>>> +    if (req->flags & (REQ_F_FORCE_ASYNC | REQ_F_ASYNC_HYBRID))
>>>           req->work.flags |= IO_WQ_WORK_CONCURRENT;
>>>       if (req->flags & REQ_F_ISREG) {
>>> @@ -5575,7 +5575,13 @@ static int io_arm_poll_handler(struct io_kiocb 
>>> *req)
>>>       req->apoll = apoll;
>>>       req->flags |= REQ_F_POLLED;
>>>       ipt.pt._qproc = io_async_queue_proc;
>>> -    io_req_set_refcount(req);
>>> +    /*
>>> +     * REQ_F_REFCOUNT set indicate we are in io-worker context, 
>>> where we
>>
>> Nope, it indicates that needs more complex refcounting. It includes 
>> linked
>> timeouts but also poll because of req_ref_get for double poll. fwiw, with
>> some work it can be removed for polls, harder (and IMHO not necessary) 
>> to do
>> for timeouts.Agree, I now realize that the explanation I put here is 
                  ^ it is messed up here..
>> not good at all,
> I actually want to say that the io-worker already set refs = 2 (also
> possible that prep_link_out set 1, and io-worker adds the other 1,
> previously I miss this situation). One will be put at completion time,
> the other one will be put in io_wq_free_work(). So no need to set the
> refcount here again. I looked into io_req_set_refcount(), since it does
> nothing if refcount is already not zero, I should be ok to keep this one
> as it was.
>>
>>> +     * already explicitly set the submittion and completion ref. So no
>>
>> I'd say there is no notion of submission vs completion refs anymore.
>>
>>> +     * need to set refcount here if that is the case.
>>> +     */
>>> +    if (!(req->flags & REQ_F_REFCOUNT))
>>
>> Compare it with io_req_set_refcount(), that "if" is a a no-op
>>
>>> +        io_req_set_refcount(req);
>>>       ret = __io_arm_poll_handler(req, &apoll->poll, &ipt, mask,
>>>                       io_async_wake);
>>> @@ -6704,8 +6710,11 @@ static void io_wq_submit_work(struct 
>>> io_wq_work *work)
>>>           ret = -ECANCELED;
>>>       if (!ret) {
>>> +        bool need_poll = req->flags & REQ_F_ASYNC_HYBRID;
>>> +
>>>           do {
>>> -            ret = io_issue_sqe(req, 0);
>>> +issue_sqe:
>>> +            ret = io_issue_sqe(req, need_poll ? IO_URING_F_NONBLOCK 
>>> : 0);
>>
>> It's buggy, you will get all kinds of kernel crashes and leaks.
>> Currently IO_URING_F_NONBLOCK has dual meaning: obvious nonblock but
>> also whether we hold uring_lock or not. You'd need to split the flag
>> into two, i.e. IO_URING_F_LOCKED
> I'll look into it. I was thinking about to do the first nowait try in
> the original context, but then I thought it doesn't make sense to bring
> up a worker just for poll infra arming since thread creating and
> scheduling has its overhead.
>>


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH for-5.16 0/2] async hybrid, a new way for pollable requests
  2021-10-11  3:08   ` Hao Xu
@ 2021-10-12 11:39     ` Pavel Begunkov
  2021-10-14  8:53       ` Hao Xu
  0 siblings, 1 reply; 13+ messages in thread
From: Pavel Begunkov @ 2021-10-12 11:39 UTC (permalink / raw)
  To: Hao Xu, Jens Axboe; +Cc: io-uring, Joseph Qi

On 10/11/21 04:08, Hao Xu wrote:
> 在 2021/10/9 下午8:51, Pavel Begunkov 写道:
>> On 10/8/21 13:36, Hao Xu wrote:
>>> this is a new feature for pollable requests, see detail in commit
>>> message.
>>
>> It really sounds we should do it as a part of IOSQE_ASYNC, so
>> what are the cons and compromises?
> I wrote the pros and cons here:
> https://github.com/axboe/liburing/issues/426#issuecomment-939221300

I see. The problem is as always, adding extra knobs, which users
should tune and it's not exactly clear where to use what. Not specific
to the new flag, there is enough confusion around IOSQE_ASYNC, but it
only makes it worse. It would be nice to have it applied
"automatically".

Say, with IOSQE_ASYNC the copy is always (almost) done by io-wq but
there is that polling optimisation on top. Do we care enough about
copying specifically in task context to have a different flag?

a quick question, what is "tps" in "IOSQE_ASYNC: 76664.151 tps"?

>>> Hao Xu (2):
>>>    io_uring: add IOSQE_ASYNC_HYBRID flag for pollable requests
>>
>> btw, it doesn't make sense to split it into two patches
> Hmm, I thought we should make adding a new flag as a separate patch.
> Could you give me more hints about the considerration here?

You can easily ignore it, just looked weird to me. Let's try to
phrase it:

1) 1/2 doesn't do anything useful w/o 2/2, iow it doesn't feel like
an atomic change. And it would be breaking the userspace, if it's
not just a hint flag.

2) it's harder to read, you search the git history, find the
implementation (and the flag is already there), you think what's
happening here, where the flag was used and so to find out that
it was added separately a commit ago.

3) sometimes it's done similarly because the API change is not
simple, but it's not the case here.
By similarly I mean the other way around, first implement it
internally, but not exposing any mean to use it, and adding
the userspace API in next commits.

>>>    io_uring: implementation of IOSQE_ASYNC_HYBRID logic
>>>
>>>   fs/io_uring.c                 | 48 +++++++++++++++++++++++++++++++----
>>>   include/uapi/linux/io_uring.h |  4 ++-
>>>   2 files changed, 46 insertions(+), 6 deletions(-)
>>>
>>
> 

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH for-5.16 0/2] async hybrid, a new way for pollable requests
  2021-10-12 11:39     ` Pavel Begunkov
@ 2021-10-14  8:53       ` Hao Xu
  2021-10-14  9:20         ` Hao Xu
                           ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Hao Xu @ 2021-10-14  8:53 UTC (permalink / raw)
  To: Pavel Begunkov, Jens Axboe; +Cc: io-uring, Joseph Qi

在 2021/10/12 下午7:39, Pavel Begunkov 写道:
> On 10/11/21 04:08, Hao Xu wrote:
>> 在 2021/10/9 下午8:51, Pavel Begunkov 写道:
>>> On 10/8/21 13:36, Hao Xu wrote:
>>>> this is a new feature for pollable requests, see detail in commit
>>>> message.
>>>
>>> It really sounds we should do it as a part of IOSQE_ASYNC, so
>>> what are the cons and compromises?
>> I wrote the pros and cons here:
>> https://github.com/axboe/liburing/issues/426#issuecomment-939221300
> 
> I see. The problem is as always, adding extra knobs, which users
> should tune and it's not exactly clear where to use what. Not specific
> to the new flag, there is enough confusion around IOSQE_ASYNC, but it
> only makes it worse. It would be nice to have it applied
> "automatically".
> 
> Say, with IOSQE_ASYNC the copy is always (almost) done by io-wq but
> there is that polling optimisation on top. Do we care enough about
> copying specifically in task context to have a different flag?
> 
I did more tests in a 64 cores machine.
test command is: nice -n -15 taskset -c 10-20 ./io_uring_echo_server -p 
8084 -f -n con_nr -l 1024
where -n means the number of connections, -l means size of load.
the results of tps and cpu usage under different IO pressure is:
(didn't find the way to make it look better, you may need a markdown
renderer :) )
tps

| feature | 1 | 2 | 1000 | 2000 | 3000 | 5800 |
| -------- | -------- | -------- | -------- | -------- | -------- | 
-------- |
| ASYNC     |  123.000    |  295.203    |  67390.432   | 132686.361   | 
186084.114   | 319550.540    |
| ASYNC_HYBRID     |   122.000   |  299.401    |  168321.092   | 
188870.283  | 226427.166   |  371580.062   |


cpu

| feature | 1 | 2 | 1000 | 2000 | 3000 | 5800 |
| -------- | -------- | -------- | -------- | -------- | -------- | 
-------- |
| ASYNC     |  0.3%    |   1.0%   |   62.5%  |  111.3%  |  198.3%  | 
420.9%   |
| ASYNC_HYBRID     |    0.3%  |   1.0%   |  360%   |  435.5%  |  516.6% 
  |   1100%  |

when con_nr is 1000 or more, we leveraged all the 10 cpus. hybrid is
surely better than async. when con_nr is 1 or 2, in theory async should
be better since it use more cpu resource, but it didn't, it is because
the copying in tw is not a bottleneck. So I tried bigger workload, no
difference. So I think it should be ok to just apply this feature on top
of IOSQE_ASYNC, for all pollable requests in all condition.

Regards,
Hao
> a quick question, what is "tps" in "IOSQE_ASYNC: 76664.151 tps"?
> 
>>>> Hao Xu (2):
>>>>    io_uring: add IOSQE_ASYNC_HYBRID flag for pollable requests
>>>
>>> btw, it doesn't make sense to split it into two patches
>> Hmm, I thought we should make adding a new flag as a separate patch.
>> Could you give me more hints about the considerration here?
> 
> You can easily ignore it, just looked weird to me. Let's try to
> phrase it:
> 
> 1) 1/2 doesn't do anything useful w/o 2/2, iow it doesn't feel like
> an atomic change. And it would be breaking the userspace, if it's
> not just a hint flag.
> 
> 2) it's harder to read, you search the git history, find the
> implementation (and the flag is already there), you think what's
> happening here, where the flag was used and so to find out that
> it was added separately a commit ago.
> 
> 3) sometimes it's done similarly because the API change is not
> simple, but it's not the case here.
> By similarly I mean the other way around, first implement it
> internally, but not exposing any mean to use it, and adding
> the userspace API in next commits.
> 
>>>>    io_uring: implementation of IOSQE_ASYNC_HYBRID logic
>>>>
>>>>   fs/io_uring.c                 | 48 
>>>> +++++++++++++++++++++++++++++++----
>>>>   include/uapi/linux/io_uring.h |  4 ++-
>>>>   2 files changed, 46 insertions(+), 6 deletions(-)
>>>>
>>>
>>
> 


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH for-5.16 0/2] async hybrid, a new way for pollable requests
  2021-10-14  8:53       ` Hao Xu
@ 2021-10-14  9:20         ` Hao Xu
  2021-10-14 13:53         ` Hao Xu
  2021-10-14 14:17         ` Pavel Begunkov
  2 siblings, 0 replies; 13+ messages in thread
From: Hao Xu @ 2021-10-14  9:20 UTC (permalink / raw)
  To: Pavel Begunkov, Jens Axboe; +Cc: io-uring, Joseph Qi

在 2021/10/14 下午4:53, Hao Xu 写道:
> 在 2021/10/12 下午7:39, Pavel Begunkov 写道:
>> On 10/11/21 04:08, Hao Xu wrote:
>>> 在 2021/10/9 下午8:51, Pavel Begunkov 写道:
>>>> On 10/8/21 13:36, Hao Xu wrote:
>>>>> this is a new feature for pollable requests, see detail in commit
>>>>> message.
>>>>
>>>> It really sounds we should do it as a part of IOSQE_ASYNC, so
>>>> what are the cons and compromises?
>>> I wrote the pros and cons here:
>>> https://github.com/axboe/liburing/issues/426#issuecomment-939221300
>>
>> I see. The problem is as always, adding extra knobs, which users
>> should tune and it's not exactly clear where to use what. Not specific
>> to the new flag, there is enough confusion around IOSQE_ASYNC, but it
>> only makes it worse. It would be nice to have it applied
>> "automatically".
>>
>> Say, with IOSQE_ASYNC the copy is always (almost) done by io-wq but
>> there is that polling optimisation on top. Do we care enough about
>> copying specifically in task context to have a different flag?
>>
> I did more tests in a 64 cores machine.
> test command is: nice -n -15 taskset -c 10-20 ./io_uring_echo_server -p 
> 8084 -f -n con_nr -l 1024
> where -n means the number of connections, -l means size of load.
> the results of tps and cpu usage under different IO pressure is:
> (didn't find the way to make it look better, you may need a markdown
> renderer :) )
> tps
> 
> | feature | 1 | 2 | 1000 | 2000 | 3000 | 5800 |
> | -------- | -------- | -------- | -------- | -------- | -------- | 
> -------- |
> | ASYNC     |  123.000    |  295.203    |  67390.432   | 132686.361   | 
> 186084.114   | 319550.540    |
> | ASYNC_HYBRID     |   122.000   |  299.401    |  168321.092   | 
> 188870.283  | 226427.166   |  371580.062   |
> 
> 
> cpu
> 
> | feature | 1 | 2 | 1000 | 2000 | 3000 | 5800 |
> | -------- | -------- | -------- | -------- | -------- | -------- | 
> -------- |
> | ASYNC     |  0.3%    |   1.0%   |   62.5%  |  111.3%  |  198.3%  | 
> 420.9%   |
> | ASYNC_HYBRID     |    0.3%  |   1.0%   |  360%   |  435.5%  |  516.6% 
>   |   1100%  |
the data changes from time to time, so maybe not precise, but the
comparison between ASYNC and ASYNC_HYBRID shows something.
> 
> when con_nr is 1000 or more, we leveraged all the 10 cpus. hybrid is
> surely better than async. when con_nr is 1 or 2, in theory async should
> be better since it use more cpu resource, but it didn't, it is because
> the copying in tw is not a bottleneck. So I tried bigger workload, no
> difference. So I think it should be ok to just apply this feature on top
> of IOSQE_ASYNC, for all pollable requests in all condition.
> 
> Regards,
> Hao
>> a quick question, what is "tps" in "IOSQE_ASYNC: 76664.151 tps"?
>>
>>>>> Hao Xu (2):
>>>>>    io_uring: add IOSQE_ASYNC_HYBRID flag for pollable requests
>>>>
>>>> btw, it doesn't make sense to split it into two patches
>>> Hmm, I thought we should make adding a new flag as a separate patch.
>>> Could you give me more hints about the considerration here?
>>
>> You can easily ignore it, just looked weird to me. Let's try to
>> phrase it:
>>
>> 1) 1/2 doesn't do anything useful w/o 2/2, iow it doesn't feel like
>> an atomic change. And it would be breaking the userspace, if it's
>> not just a hint flag.
>>
>> 2) it's harder to read, you search the git history, find the
>> implementation (and the flag is already there), you think what's
>> happening here, where the flag was used and so to find out that
>> it was added separately a commit ago.
>>
>> 3) sometimes it's done similarly because the API change is not
>> simple, but it's not the case here.
>> By similarly I mean the other way around, first implement it
>> internally, but not exposing any mean to use it, and adding
>> the userspace API in next commits.
>>
>>>>>    io_uring: implementation of IOSQE_ASYNC_HYBRID logic
>>>>>
>>>>>   fs/io_uring.c                 | 48 
>>>>> +++++++++++++++++++++++++++++++----
>>>>>   include/uapi/linux/io_uring.h |  4 ++-
>>>>>   2 files changed, 46 insertions(+), 6 deletions(-)
>>>>>
>>>>
>>>
>>


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH for-5.16 0/2] async hybrid, a new way for pollable requests
  2021-10-14  8:53       ` Hao Xu
  2021-10-14  9:20         ` Hao Xu
@ 2021-10-14 13:53         ` Hao Xu
  2021-10-14 14:17         ` Pavel Begunkov
  2 siblings, 0 replies; 13+ messages in thread
From: Hao Xu @ 2021-10-14 13:53 UTC (permalink / raw)
  To: Pavel Begunkov, Jens Axboe; +Cc: io-uring, Joseph Qi

在 2021/10/14 下午4:53, Hao Xu 写道:
> 在 2021/10/12 下午7:39, Pavel Begunkov 写道:
>> On 10/11/21 04:08, Hao Xu wrote:
>>> 在 2021/10/9 下午8:51, Pavel Begunkov 写道:
>>>> On 10/8/21 13:36, Hao Xu wrote:
>>>>> this is a new feature for pollable requests, see detail in commit
>>>>> message.
>>>>
>>>> It really sounds we should do it as a part of IOSQE_ASYNC, so
>>>> what are the cons and compromises?
>>> I wrote the pros and cons here:
>>> https://github.com/axboe/liburing/issues/426#issuecomment-939221300
>>
>> I see. The problem is as always, adding extra knobs, which users
>> should tune and it's not exactly clear where to use what. Not specific
>> to the new flag, there is enough confusion around IOSQE_ASYNC, but it
>> only makes it worse. It would be nice to have it applied
>> "automatically".
>>
>> Say, with IOSQE_ASYNC the copy is always (almost) done by io-wq but
>> there is that polling optimisation on top. Do we care enough about
>> copying specifically in task context to have a different flag?
>>
> I did more tests in a 64 cores machine.
> test command is: nice -n -15 taskset -c 10-20 ./io_uring_echo_server -p 
> 8084 -f -n con_nr -l 1024
> where -n means the number of connections, -l means size of load.
> the results of tps and cpu usage under different IO pressure is:
> (didn't find the way to make it look better, you may need a markdown
> renderer :) )
> tps
> 
> | feature | 1 | 2 | 1000 | 2000 | 3000 | 5800 |
> | -------- | -------- | -------- | -------- | -------- | -------- | 
> -------- |
> | ASYNC     |  123.000    |  295.203    |  67390.432   | 132686.361   | 
> 186084.114   | 319550.540    |
> | ASYNC_HYBRID     |   122.000   |  299.401    |  168321.092   | 
> 188870.283  | 226427.166   |  371580.062   |
> 
> 
> cpu
> 
> | feature | 1 | 2 | 1000 | 2000 | 3000 | 5800 |
> | -------- | -------- | -------- | -------- | -------- | -------- | 
> -------- |
> | ASYNC     |  0.3%    |   1.0%   |   62.5%  |  111.3%  |  198.3%  | 
> 420.9%   |
> | ASYNC_HYBRID     |    0.3%  |   1.0%   |  360%   |  435.5%  |  516.6% 
>   |   1100%  |
> 
> when con_nr is 1000 or more, we leveraged all the 10 cpus. hybrid is
> surely better than async. when con_nr is 1 or 2, in theory async should
> be better since it use more cpu resource, but it didn't, it is because
> the copying in tw is not a bottleneck. So I tried bigger workload, no
> difference. So I think it should be ok to just apply this feature on top
> of IOSQE_ASYNC, for all pollable requests in all condition.
> 
> Regards,
> Hao
>> a quick question, what is "tps" in "IOSQE_ASYNC: 76664.151 tps"?
Forgot this one.. tps = transaction per second, here a transaction means
the client send the workload to the server and receive it from the
server.
>>
>>>>> Hao Xu (2):
>>>>>    io_uring: add IOSQE_ASYNC_HYBRID flag for pollable requests
>>>>
>>>> btw, it doesn't make sense to split it into two patches
>>> Hmm, I thought we should make adding a new flag as a separate patch.
>>> Could you give me more hints about the considerration here?
>>
>> You can easily ignore it, just looked weird to me. Let's try to
>> phrase it:
>>
>> 1) 1/2 doesn't do anything useful w/o 2/2, iow it doesn't feel like
>> an atomic change. And it would be breaking the userspace, if it's
>> not just a hint flag.
>>
>> 2) it's harder to read, you search the git history, find the
>> implementation (and the flag is already there), you think what's
>> happening here, where the flag was used and so to find out that
>> it was added separately a commit ago.
>>
>> 3) sometimes it's done similarly because the API change is not
>> simple, but it's not the case here.
>> By similarly I mean the other way around, first implement it
>> internally, but not exposing any mean to use it, and adding
>> the userspace API in next commits.
>>
>>>>>    io_uring: implementation of IOSQE_ASYNC_HYBRID logic
>>>>>
>>>>>   fs/io_uring.c                 | 48 
>>>>> +++++++++++++++++++++++++++++++----
>>>>>   include/uapi/linux/io_uring.h |  4 ++-
>>>>>   2 files changed, 46 insertions(+), 6 deletions(-)
>>>>>
>>>>
>>>
>>


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH for-5.16 0/2] async hybrid, a new way for pollable requests
  2021-10-14  8:53       ` Hao Xu
  2021-10-14  9:20         ` Hao Xu
  2021-10-14 13:53         ` Hao Xu
@ 2021-10-14 14:17         ` Pavel Begunkov
  2 siblings, 0 replies; 13+ messages in thread
From: Pavel Begunkov @ 2021-10-14 14:17 UTC (permalink / raw)
  To: Hao Xu, Jens Axboe; +Cc: io-uring, Joseph Qi

On 10/14/21 09:53, Hao Xu wrote:
> 在 2021/10/12 下午7:39, Pavel Begunkov 写道:
>> On 10/11/21 04:08, Hao Xu wrote:
>>> 在 2021/10/9 下午8:51, Pavel Begunkov 写道:
>>>> On 10/8/21 13:36, Hao Xu wrote:
>>>>> this is a new feature for pollable requests, see detail in commit
>>>>> message.
>>>>
>>>> It really sounds we should do it as a part of IOSQE_ASYNC, so
>>>> what are the cons and compromises?
>>> I wrote the pros and cons here:
>>> https://github.com/axboe/liburing/issues/426#issuecomment-939221300
>>
>> I see. The problem is as always, adding extra knobs, which users
>> should tune and it's not exactly clear where to use what. Not specific
>> to the new flag, there is enough confusion around IOSQE_ASYNC, but it
>> only makes it worse. It would be nice to have it applied
>> "automatically".
>>
>> Say, with IOSQE_ASYNC the copy is always (almost) done by io-wq but
>> there is that polling optimisation on top. Do we care enough about
>> copying specifically in task context to have a different flag?
>>
> I did more tests in a 64 cores machine.
> test command is: nice -n -15 taskset -c 10-20 ./io_uring_echo_server -p 8084 -f -n con_nr -l 1024
> where -n means the number of connections, -l means size of load.
> the results of tps and cpu usage under different IO pressure is:
> (didn't find the way to make it look better, you may need a markdown
> renderer :) )
> tps
> 
> | feature | 1 | 2 | 1000 | 2000 | 3000 | 5800 |
> | -------- | -------- | -------- | -------- | -------- | -------- | -------- |
> | ASYNC     |  123.000    |  295.203    |  67390.432   | 132686.361   | 186084.114   | 319550.540    |
> | ASYNC_HYBRID     |   122.000   |  299.401    |  168321.092   | 188870.283  | 226427.166   |  371580.062   |
> 
> 
> cpu
> 
> | feature | 1 | 2 | 1000 | 2000 | 3000 | 5800 |
> | -------- | -------- | -------- | -------- | -------- | -------- | -------- |
> | ASYNC     |  0.3%    |   1.0%   |   62.5%  |  111.3%  |  198.3%  | 420.9%   |
> | ASYNC_HYBRID     |    0.3%  |   1.0%   |  360%   |  435.5%  |  516.6%  |   1100%  |
> 
> when con_nr is 1000 or more, we leveraged all the 10 cpus. hybrid is
> surely better than async. when con_nr is 1 or 2, in theory async should
> be better since it use more cpu resource, but it didn't, it is because
> the copying in tw is not a bottleneck. So I tried bigger workload, no
> difference. So I think it should be ok to just apply this feature on top
> of IOSQE_ASYNC, for all pollable requests in all condition.

Sounds great. And IOSQE_ASYNC is a hint flag, so if things change
we can return it back the behaviour of IOSQE_ASYNC and add that new
flag (or do something else).


-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2021-10-14 14:18 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-10-08 12:36 [PATCH for-5.16 0/2] async hybrid, a new way for pollable requests Hao Xu
2021-10-08 12:36 ` [PATCH 1/2] io_uring: add IOSQE_ASYNC_HYBRID flag " Hao Xu
2021-10-08 12:36 ` [PATCH 2/2] io_uring: implementation of IOSQE_ASYNC_HYBRID logic Hao Xu
2021-10-09 12:46   ` Pavel Begunkov
2021-10-11  8:55     ` Hao Xu
2021-10-11  8:58       ` Hao Xu
2021-10-09 12:51 ` [PATCH for-5.16 0/2] async hybrid, a new way for pollable requests Pavel Begunkov
2021-10-11  3:08   ` Hao Xu
2021-10-12 11:39     ` Pavel Begunkov
2021-10-14  8:53       ` Hao Xu
2021-10-14  9:20         ` Hao Xu
2021-10-14 13:53         ` Hao Xu
2021-10-14 14:17         ` Pavel Begunkov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).