* IORING_OP_POLL_ADD slower than linux-aio IOCB_CMD_POLL
From: Avi Kivity @ 2022-04-19 11:07 UTC
  To: io-uring

A simple webserver shows about 5% loss compared to linux-aio.


I expect the loss is due to an optimization that io_uring lacks - inline 
completion vs workqueue completion:


static int aio_poll_wake(struct wait_queue_entry *wait, unsigned mode,
			 int sync, void *key)
{
	struct poll_iocb *req = container_of(wait, struct poll_iocb, wait);
	struct aio_kiocb *iocb = container_of(req, struct aio_kiocb, poll);
	__poll_t mask = key_to_poll(key);
	unsigned long flags;

	/* for instances that support it check for an event match first: */
	if (mask && !(mask & req->events))
		return 0;

	/*
	 * Complete the request inline if possible.  This requires that three
	 * conditions be met:
	 *   1. An event mask must have been passed.  If a plain wakeup was done
	 *      instead, then mask == 0 and we have to call vfs_poll() to get
	 *      the events, so inline completion isn't possible.
	 *   2. The completion work must not have already been scheduled.
	 *   3. ctx_lock must not be busy.  We have to use trylock because we
	 *      already hold the waitqueue lock, so this inverts the normal
	 *      locking order.  Use irqsave/irqrestore because not all
	 *      filesystems (e.g. fuse) call this function with IRQs disabled,
	 *      yet IRQs have to be disabled before ctx_lock is obtained.
	 */
	if (mask && !req->work_scheduled &&
	    spin_trylock_irqsave(&iocb->ki_ctx->ctx_lock, flags)) {
		struct kioctx *ctx = iocb->ki_ctx;

		list_del_init(&req->wait.entry);
		list_del(&iocb->ki_list);
		iocb->ki_res.res = mangle_poll(mask);
		if (iocb->ki_eventfd && !eventfd_signal_allowed()) {
			iocb = NULL;
			INIT_WORK(&req->work, aio_poll_put_work);
			schedule_work(&req->work);
		}
		spin_unlock_irqrestore(&ctx->ctx_lock, flags);
		if (iocb)
			iocb_put(iocb);
	} else {


* Re: IORING_OP_POLL_ADD slower than linux-aio IOCB_CMD_POLL
From: Jens Axboe @ 2022-04-19 11:38 UTC
  To: Avi Kivity, io-uring

On 4/19/22 5:07 AM, Avi Kivity wrote:
> A simple webserver shows about 5% loss compared to linux-aio.
> 
> 
> I expect the loss is due to an optimization that io_uring lacks -
> inline completion vs workqueue completion:

I don't think that's it, io_uring never punts to a workqueue for
completions. The aio inline completion is more of a hack: aio needs it
because always using a workqueue would lead to bad performance and
higher overhead.

So if there's a difference in performance, it's something else and we
need to look at that. But your report is pretty lacking! What kernel are
you running?

Do you have a test case of sorts?

For a performance oriented network setup, I'd normally not consider data
readiness poll replacements to be that interesting, my recommendation
would be to use async send/recv for that instead. That's how io_uring is
supposed to be used, in a completion based model.
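
(For illustration, a minimal liburing sketch of that completion-based
model - the ring is assumed to be initialized, and fd/buf/conn are
placeholder names; error handling omitted:)

	/* #include <liburing.h> */
	/* Submit an async recv directly; no readiness poll needed. */
	struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);
	io_uring_prep_recv(sqe, fd, buf, sizeof(buf), 0);
	io_uring_sqe_set_data(sqe, conn);	/* per-connection state */
	io_uring_submit(&ring);

	/* Reap the completion; cqe->res is the recv() return value. */
	struct io_uring_cqe *cqe;
	io_uring_wait_cqe(&ring, &cqe);
	void *state = io_uring_cqe_get_data(cqe);
	io_uring_cqe_seen(&ring, cqe);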

-- 
Jens Axboe


* Re: IORING_OP_POLL_ADD slower than linux-aio IOCB_CMD_POLL
From: Avi Kivity @ 2022-04-19 11:57 UTC
  To: Jens Axboe, io-uring


On 19/04/2022 14.38, Jens Axboe wrote:
> On 4/19/22 5:07 AM, Avi Kivity wrote:
>> A simple webserver shows about 5% loss compared to linux-aio.
>>
>>
>> I expect the loss is due to an optimization that io_uring lacks -
>> inline completion vs workqueue completion:
> I don't think that's it, io_uring never punts to a workqueue for
> completions.


I measured this:



  Performance counter stats for 'system wide':

          1,273,756 io_uring:io_uring_task_add

       12.288597765 seconds time elapsed

Which exactly matches the number of requests sent. If that's the wrong
counter to measure, I'm happy to try again with the correct counter.
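
(For reference, a count like the above can be collected with something
along these lines - the exact invocation is an approximation:)

	perf stat -a -e io_uring:io_uring_task_add -- sleep 12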


> The aio inline completion is more of a hack: aio needs it because
> always using a workqueue would lead to bad performance and higher
> overhead.
>
> So if there's a difference in performance, it's something else and we
> need to look at that. But your report is pretty lacking! What kernel are
> you running?


5.17.2-300.fc36.x86_64


> Do you have a test case of sorts?


Seastar's httpd, running on a single core, against wrk -c 1000 -t 4 
http://localhost:10000/.


Instructions:

   git clone --recursive -b io_uring https://github.com/avikivity/seastar

   cd seastar

   sudo ./install-dependencies.sh  # after carefully verifying it, of course

   ./configure.py --mode release

   ninja -C build/release apps/httpd/httpd

   ./build/release/apps/httpd/httpd --smp 1 [--reactor-backing io_uring|linux-aio|epoll]


and run wrk against it.


> For a performance oriented network setup, I'd normally not consider data
> readiness poll replacements to be that interesting, my recommendation
> would be to use async send/recv for that instead. That's how io_uring is
> supposed to be used, in a completion based model.
>

That's true. Still, an existing system that evolved around poll will
take some time and effort to migrate, and having a slower
IORING_OP_POLL means it cannot benefit from io_uring's many other
advantages if it fears a regression from that difference.


Note that it's not just a matter of converting poll+recvmsg to 
IORING_OP_RECVMSG. If you support many connections, one must migrate to 
internal buffer selection, otherwise the memory load with a large number 
of idle connections is high. The end result is wonderful but the road 
there is long.
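
(For the record, the buffer-selection flow looks roughly like this with
liburing - the group id, sizes, and error handling are illustrative
only:)

	/* Hand the kernel a pool of buffers it may pick from (group 0). */
	struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);
	io_uring_prep_provide_buffers(sqe, pool, BUF_SIZE, NR_BUFS, 0, 0);
	io_uring_submit(&ring);

	/* recv with no buffer attached; one is selected only when data
	 * arrives, so idle connections don't pin per-connection memory. */
	sqe = io_uring_get_sqe(&ring);
	io_uring_prep_recv(sqe, fd, NULL, BUF_SIZE, 0);
	sqe->flags |= IOSQE_BUFFER_SELECT;
	sqe->buf_group = 0;
	io_uring_submit(&ring);

	/* On completion, the chosen buffer id is carried in cqe->flags. */
	if (cqe->flags & IORING_CQE_F_BUFFER)
		bid = cqe->flags >> IORING_CQE_BUFFER_SHIFT;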



* Re: IORING_OP_POLL_ADD slower than linux-aio IOCB_CMD_POLL
From: Jens Axboe @ 2022-04-19 12:04 UTC
  To: Avi Kivity, io-uring

On 4/19/22 5:57 AM, Avi Kivity wrote:
> 
> On 19/04/2022 14.38, Jens Axboe wrote:
>> On 4/19/22 5:07 AM, Avi Kivity wrote:
>>> A simple webserver shows about 5% loss compared to linux-aio.
>>>
>>>
>>> I expect the loss is due to an optimization that io_uring lacks -
>>> inline completion vs workqueue completion:
>> I don't think that's it, io_uring never punts to a workqueue for
>> completions.
> 
> 
> I measured this:
> 
> 
> 
>  Performance counter stats for 'system wide':
> 
>          1,273,756 io_uring:io_uring_task_add
> 
>       12.288597765 seconds time elapsed
> 
> Which exactly matches the number of requests sent. If that's the
> wrong counter to measure, I'm happy to try again with the correct
> counter.

io_uring_task_add() isn't a workqueue, it's task_work. So that is
expected.

>> The aio inline completion is more of a hack: aio needs it because
>> always using a workqueue would lead to bad performance and higher
>> overhead.
>>
>> So if there's a difference in performance, it's something else and we
>> need to look at that. But your report is pretty lacking! What kernel are
>> you running?
> 
> 
> 5.17.2-300.fc36.x86_64

OK, that sounds fine.

>> Do you have a test case of sorts?
> 
> 
> Seastar's httpd, running on a single core, against wrk -c 1000 -t 4 http://localhost:10000/.
> 
> 
> Instructions:
> 
>   git clone --recursive -b io_uring https://github.com/avikivity/seastar
> 
>   cd seastar
> 
>   sudo ./install-dependencies.sh  # after carefully verifying it, of course
> 
>   ./configure.py --mode release
> 
>   ninja -C build/release apps/httpd/httpd
> 
>   ./build/release/apps/httpd/httpd --smp 1 [--reactor-backing io_uring|linux-aio|epoll]
> 
> 
> and run wrk against it.

Thanks, I'll give that a spin!

>> For a performance oriented network setup, I'd normally not consider data
>> readiness poll replacements to be that interesting, my recommendation
>> would be to use async send/recv for that instead. That's how io_uring is
>> supposed to be used, in a completion based model.
>>
> 
> That's true. Still, an existing system that evolved around poll will
> take some time and effort to migrate, and having a slower
> IORING_OP_POLL means it cannot benefit from io_uring's many other
> advantages if it fears a regression from that difference.

I'd like to separate the two - should the OP_POLL work as well, most
certainly. Do I think it's largely a useless way to run it, also yes :-)

> Note that it's not just a matter of converting poll+recvmsg to
> IORING_OP_RECVMSG. If you support many connections, one must migrate
> to internal buffer selection, otherwise the memory load with a large
> number of idle connections is high. The end result is wonderful but
> the road there is long.

Totally agree. My point is just that to take full advantage of it, you
need to be using that kind of model and quick conversions aren't really
expected to yield much of a performance win. They are also not supposed
to run slower, so that does need some attention if that's the case here.

-- 
Jens Axboe


* Re: IORING_OP_POLL_ADD slower than linux-aio IOCB_CMD_POLL
From: Avi Kivity @ 2022-04-19 12:21 UTC
  To: Jens Axboe, io-uring

On 19/04/2022 15.04, Jens Axboe wrote:
> [...]
> io_uring_task_add() isn't a workqueue, it's task_work. So that is
> expected.


Ah, and it should be fine. I'll try 'perf diff' again (I ran it but
didn't reach any conclusive results, and assumed non-systemwide runs
weren't measuring workqueues, while systemwide runs generated too much
noise on my workstation).


>>> Do you have a test case of sorts?
>>
>> Seastar's httpd, running on a single core, against wrk -c 1000 -t 4 http://localhost:10000/.
>> [...]
> Thanks, I'll give that a spin!


Thanks. You may need ./configure --c++-dialect=c++17 if your C++ 
compiler is too old.


>
>>> For a performance oriented network setup, I'd normally not consider data
>>> readiness poll replacements to be that interesting, my recommendation
>>> would be to use async send/recv for that instead. That's how io_uring is
>>> supposed to be used, in a completion based model.
>>>
>> That's true. Still, an existing system that evolved around poll will
>> take some time and effort to migrate, and having a slower
>> IORING_OP_POLL means it cannot benefit from io_uring's many other
>> advantages if it fears a regression from that difference.
> I'd like to separate the two - should the OP_POLL work as well, most
> certainly. Do I think it's largely a useless way to run it, also yes :-)


Agree.


>
>> Note that it's not just a matter of converting poll+recvmsg to
>> IORING_OP_RECVMSG. If you support many connections, one must migrate
>> to internal buffer selection, otherwise the memory load with a large
>> number of idle connections is high. The end result is wonderful but
>> the road there is long.
> Totally agree. My point is just that to take full advantage of it, you
> need to be using that kind of model and quick conversions aren't really
> expected to yield much of a performance win. They are also not supposed
> to run slower, so that does need some attention if that's the case here.
>

We're in agreement, but I'd like to clarify that the quick conversion
is intended to win from other aspects of io_uring, with the deeper
change coming later.



* Re: IORING_OP_POLL_ADD slower than linux-aio IOCB_CMD_POLL
From: Jens Axboe @ 2022-04-19 12:31 UTC
  To: Avi Kivity, io-uring

On 4/19/22 6:21 AM, Avi Kivity wrote:
> [...]
> 
> 
> Ah, and it should be fine. I'll try 'perf diff' again (I ran it but
> didn't reach any conclusive results, and assumed non-systemwide runs
> weren't measuring workqueues, while systemwide runs generated too much
> noise on my workstation).

Might help to run the client from another box? But I did a quick run
just to see if it worked after setting it up, and it seems mostly
dominated by seastar itself. Nothing conclusive though - it was more of
a "let's see if it works". I'll take an actual look in a bit.

One thing that did stick out is excessive fget/fput, which is usually
solved by using direct descriptors with io_uring. You can use
accept_direct, for example, and then IOSQE_FIXED_FILE with the
send/recv. But it looks like you are just using io_uring for polling,
which also eliminates being able to do that.

You can retain the readiness based model and still use io_uring for the
actual data transfer, at least that would enable you to use some
features that would help reduce overhead.
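
(Roughly, with a recent liburing - the slot management and names here
are just a sketch:)

	/* Reserve a fixed-file table, then accept directly into a slot. */
	io_uring_register_files_sparse(&ring, 1024);

	struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);
	io_uring_prep_accept_direct(sqe, listen_fd, NULL, NULL, 0, slot);
	io_uring_submit(&ring);

	/* Later ops pass the slot index instead of a real fd, avoiding
	 * the per-op fget/fput. */
	sqe = io_uring_get_sqe(&ring);
	io_uring_prep_recv(sqe, slot, buf, sizeof(buf), 0);
	sqe->flags |= IOSQE_FIXED_FILE;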

Note that this is separate from the actual poll being slower. I might
try and write a basic benchmark that is just poll and see what comes up.
Would help do more targeted development and profiling.

>>> [...]
>>>    ./build/release/apps/httpd/httpd --smp 1 [--reactor-backing io_uring|linux-aio|epoll]

s/backing/backend

> [...]
>
> 
> We're in agreement, but I'd like to clarify that the quick conversion
> is intended to win from other aspects of io_uring, with the deeper
> change coming later.

Ideally, yes.

-- 
Jens Axboe


* Re: IORING_OP_POLL_ADD slower than linux-aio IOCB_CMD_POLL
From: Jens Axboe @ 2022-04-19 15:21 UTC
  To: Avi Kivity, io-uring

On 4/19/22 6:31 AM, Jens Axboe wrote:
> [...]
>>> io_uring_task_add() isn't a workqueue, it's task_work. So that is
>>> expected.

Might actually be implicated. Not because it's an async worker, but
because I think we might be losing some affinity in this case. Looking
at traces, we're definitely bouncing between the poll completion side
and then executing the completion.

Can you try this hack? It's against -git + for-5.19/io_uring. If you let
me know what base you prefer, I can do a version against that. I see
about a 3% win with io_uring with this, and was slower before against
linux-aio as you saw as well.


diff --git a/fs/io_uring.c b/fs/io_uring.c
index caa5b673f8f5..f3da6c9a9635 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -6303,6 +6303,25 @@ static void io_apoll_task_func(struct io_kiocb *req, bool *locked)
 		io_req_complete_failed(req, ret);
 }
 
+static bool __io_poll_execute_direct(struct io_kiocb *req, int mask, int events)
+{
+	struct io_ring_ctx *ctx = req->ctx;
+
+	if (ctx->has_evfd || req->flags & REQ_F_INFLIGHT ||
+	    req->opcode != IORING_OP_POLL_ADD)
+		return false;
+	if (!spin_trylock(&ctx->completion_lock))
+		return false;
+
+	req->cqe.res = mangle_poll(mask & events);
+	hash_del(&req->hash_node);
+	__io_req_complete_post(req, req->cqe.res, 0);
+	io_commit_cqring(ctx);
+	spin_unlock(&ctx->completion_lock);
+	io_cqring_ev_posted(ctx);
+	return true;
+}
+
 static void __io_poll_execute(struct io_kiocb *req, int mask, int events)
 {
 	req->cqe.res = mask;
@@ -6384,7 +6403,8 @@ static int io_poll_wake(struct wait_queue_entry *wait, unsigned mode, int sync,
 			else
 				req->flags &= ~REQ_F_SINGLE_POLL;
 		}
-		__io_poll_execute(req, mask, poll->events);
+		if (!__io_poll_execute_direct(req, mask, poll->events))
+			__io_poll_execute(req, mask, poll->events);
 	}
 	return 1;
 }

-- 
Jens Axboe


* Re: IORING_OP_POLL_ADD slower than linux-aio IOCB_CMD_POLL
From: Avi Kivity @ 2022-04-19 15:51 UTC
  To: Jens Axboe, io-uring


On 19/04/2022 18.21, Jens Axboe wrote:
> [...]
> Might actually be implicated. Not because it's an async worker, but
> because I think we might be losing some affinity in this case. Looking
> at traces, we're definitely bouncing between the poll completion side
> and then executing the completion.


What affinity are we losing?


Maybe it's TWA_SIGNAL, which causes the poll notification (which could 
happen while httpd is running) to interrupt httpd? Although it should 
happen during tcp processing in softirq, which should be co-located with 
httpd and therefore httpd wasn't running?


>
> Can you try this hack? It's against -git + for-5.19/io_uring. If you let
> me know what base you prefer, I can do a version against that. I see
> about a 3% win with io_uring with this, and was slower before against
> linux-aio as you saw as well.
>

Sure, I'll try it against for-5.19/io_uring, though I don't doubt you 
fixed it.


> diff --git a/fs/io_uring.c b/fs/io_uring.c
> index caa5b673f8f5..f3da6c9a9635 100644
> --- a/fs/io_uring.c
> +++ b/fs/io_uring.c
> @@ -6303,6 +6303,25 @@ static void io_apoll_task_func(struct io_kiocb *req, bool *locked)
>   		io_req_complete_failed(req, ret);
>   }
>   
> +static bool __io_poll_execute_direct(struct io_kiocb *req, int mask, int events)
> +{
> +	struct io_ring_ctx *ctx = req->ctx;
> +
> +	if (ctx->has_evfd || req->flags & REQ_F_INFLIGHT ||
> +	    req->opcode != IORING_OP_POLL_ADD)
> +		return false;
> +	if (!spin_trylock(&ctx->completion_lock))
> +		return false;


This looks as if it's losing some affinity: before, all completions were
co-located with httpd, and now some are not. So maybe it's the
TWA_SIGNAL thing.


> +
> +	req->cqe.res = mangle_poll(mask & events);
> +	hash_del(&req->hash_node);
> +	__io_req_complete_post(req, req->cqe.res, 0);
> +	io_commit_cqring(ctx);
> +	spin_unlock(&ctx->completion_lock);
> +	io_cqring_ev_posted(ctx);
> +	return true;
> +}
> +
>   static void __io_poll_execute(struct io_kiocb *req, int mask, int events)
>   {
>   	req->cqe.res = mask;
> @@ -6384,7 +6403,8 @@ static int io_poll_wake(struct wait_queue_entry *wait, unsigned mode, int sync,
>   			else
>   				req->flags &= ~REQ_F_SINGLE_POLL;
>   		}
> -		__io_poll_execute(req, mask, poll->events);
> +		if (!__io_poll_execute_direct(req, mask, poll->events))
> +			__io_poll_execute(req, mask, poll->events);
>   	}
>   	return 1;
>   }
>

* Re: IORING_OP_POLL_ADD slower than linux-aio IOCB_CMD_POLL
From: Jens Axboe @ 2022-04-19 17:14 UTC
  To: Avi Kivity, io-uring

On 4/19/22 9:21 AM, Jens Axboe wrote:
> [...]
> 
> Might actually be implicated. Not because it's an async worker, but
> because I think we might be losing some affinity in this case. Looking
> at traces, we're definitely bouncing between the poll completion side
> and then executing the completion.
> 
> Can you try this hack? It's against -git + for-5.19/io_uring. If you let
> me know what base you prefer, I can do a version against that. I see
> about a 3% win with io_uring with this, and was slower before against
> linux-aio as you saw as well.

Another thing to try - get rid of the IPI for TWA_SIGNAL, which I
believe may be the underlying cause of it.


diff --git a/fs/io-wq.c b/fs/io-wq.c
index 32aeb2c581c5..59987dd212d8 100644
--- a/fs/io-wq.c
+++ b/fs/io-wq.c
@@ -871,7 +871,7 @@ static bool io_wq_for_each_worker(struct io_wqe *wqe,
 
 static bool io_wq_worker_wake(struct io_worker *worker, void *data)
 {
-	set_notify_signal(worker->task);
+	set_notify_signal(worker->task, true);
 	wake_up_process(worker->task);
 	return false;
 }
@@ -991,7 +991,7 @@ static bool __io_wq_worker_cancel(struct io_worker *worker,
 {
 	if (work && match->fn(work, match->data)) {
 		work->flags |= IO_WQ_WORK_CANCEL;
-		set_notify_signal(worker->task);
+		set_notify_signal(worker->task, true);
 		return true;
 	}
 
diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h
index 3c8b34876744..ac1f14973e09 100644
--- a/include/linux/sched/signal.h
+++ b/include/linux/sched/signal.h
@@ -359,10 +359,10 @@ static inline void clear_notify_signal(void)
  * Called to break out of interruptible wait loops, and enter the
  * exit_to_user_mode_loop().
  */
-static inline void set_notify_signal(struct task_struct *task)
+static inline void set_notify_signal(struct task_struct *task, bool need_ipi)
 {
 	if (!test_and_set_tsk_thread_flag(task, TIF_NOTIFY_SIGNAL) &&
-	    !wake_up_state(task, TASK_INTERRUPTIBLE))
+	    !wake_up_state(task, TASK_INTERRUPTIBLE) && need_ipi)
 		kick_process(task);
 }
 
diff --git a/kernel/livepatch/transition.c b/kernel/livepatch/transition.c
index 5d03a2ad1066..bff53f539933 100644
--- a/kernel/livepatch/transition.c
+++ b/kernel/livepatch/transition.c
@@ -367,7 +367,7 @@ static void klp_send_signals(void)
 			 * Send fake signal to all non-kthread tasks which are
 			 * still not migrated.
 			 */
-			set_notify_signal(task);
+			set_notify_signal(task, true);
 		}
 	}
 	read_unlock(&tasklist_lock);
diff --git a/kernel/task_work.c b/kernel/task_work.c
index c59e1a49bc40..47d7024dc499 100644
--- a/kernel/task_work.c
+++ b/kernel/task_work.c
@@ -51,7 +51,7 @@ int task_work_add(struct task_struct *task, struct callback_head *work,
 		set_notify_resume(task);
 		break;
 	case TWA_SIGNAL:
-		set_notify_signal(task);
+		set_notify_signal(task, false);
 		break;
 	default:
 		WARN_ON_ONCE(1);

-- 
Jens Axboe


* Re: IORING_OP_POLL_ADD slower than linux-aio IOCB_CMD_POLL
From: Avi Kivity @ 2022-04-19 19:41 UTC
  To: Jens Axboe, io-uring


On 19/04/2022 20.14, Jens Axboe wrote:
> [...]
> Another thing to try - get rid of the IPI for TWA_SIGNAL, which I
> believe may be the underlying cause of it.
>

Won't it delay notification until the next io_uring_enter? Or does
io_uring only guarantee completions when you call it (and earlier
completions are best-effort)?


For Seastar it's not a problem; I'm asking about the general io_uring
completion philosophy.


I'll try it tomorrow (also the other patch).




* Re: IORING_OP_POLL_ADD slower than linux-aio IOCB_CMD_POLL
From: Jens Axboe @ 2022-04-19 19:58 UTC
  To: Avi Kivity, io-uring

On 4/19/22 1:41 PM, Avi Kivity wrote:
> [...]
>> Another thing to try - get rid of the IPI for TWA_SIGNAL, which I
>> believe may be the underlying cause of it.
>>
> 
> Won't it delay notification until the next io_uring_enter? Or does
> io_uring only guarantee completions when you call it (and earlier
> completions are best-effort)?

Only if it needs to reschedule - it'll still enter the kernel if not. Or
if it's waiting in the kernel, it'll still run the task work, as
TIF_NOTIFY_SIGNAL will get that job done.

So actually I'm not sure if we ever need the IPI - it doesn't seem like
we do.

> I'll try it tomorrow (also the other patch).

Thanks!

-- 
Jens Axboe


* Re: IORING_OP_POLL_ADD slower than linux-aio IOCB_CMD_POLL
From: Avi Kivity @ 2022-04-20 11:55 UTC
  To: Jens Axboe, io-uring


On 19/04/2022 22.58, Jens Axboe wrote:
>
>> I'll try it tomorrow (also the other patch).
> Thanks!
>

With the new kernel, I get


io_uring_setup(200, {flags=0, sq_thread_cpu=0, sq_thread_idle=0, sq_entries=256, cq_entries=512, features=IORING_FEAT_SINGLE_MMAP|IORING_FEAT_NODROP|IORING_FEAT_SUBMIT_STABLE|IORING_FEAT_RW_CUR_POS|IORING_FEAT_CUR_PERSONALITY|IORING_FEAT_FAST_POLL|IORING_FEAT_POLL_32BITS|IORING_FEAT_SQPOLL_NONFIXED|IORING_FEAT_EXT_ARG|IORING_FEAT_NATIVE_WORKERS|IORING_FEAT_RSRC_TAGS|0x1800, sq_off={head=0, tail=64, ring_mask=256, ring_entries=264, flags=276, dropped=272, array=8512}, cq_off={head=128, tail=192, ring_mask=260, ring_entries=268, overflow=284, cqes=320, flags=280}}) = 7
mmap(NULL, 9536, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_POPULATE, 7, 0) = -1 EACCES (Permission denied)



* Re: IORING_OP_POLL_ADD slower than linux-aio IOCB_CMD_POLL
From: Jens Axboe @ 2022-04-20 12:09 UTC
  To: Avi Kivity, io-uring

On 4/20/22 5:55 AM, Avi Kivity wrote:
> 
> On 19/04/2022 22.58, Jens Axboe wrote:
>>
>>> I'll try it tomorrow (also the other patch).
>> Thanks!
>>
> 
> With the new kernel, I get
> 
> 
> io_uring_setup(200, {flags=0, sq_thread_cpu=0, sq_thread_idle=0, sq_entries=256, cq_entries=512, features=IORING_FEAT_SINGLE_MMAP|IORING_FEAT_NODROP|IORING_FEAT_SUBMIT_STABLE|IORING_FEAT_RW_CUR_POS|IORING_FEAT_CUR_PERSONALITY|IORING_FEAT_FAST_POLL|IORING_FEAT_POLL_32BITS|IORING_FEAT_SQPOLL_NONFIXED|IORING_FEAT_EXT_ARG|IORING_FEAT_NATIVE_WORKERS|IORING_FEAT_RSRC_TAGS|0x1800, sq_off={head=0, tail=64, ring_mask=256, ring_entries=264, flags=276, dropped=272, array=8512}, cq_off={head=128, tail=192, ring_mask=260, ring_entries=268, overflow=284, cqes=320, flags=280}}) = 7
> mmap(NULL, 9536, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_POPULATE, 7, 0) = -1 EACCES (Permission denied)

That looks odd, and not really related at all?

-- 
Jens Axboe


* Re: IORING_OP_POLL_ADD slower than linux-aio IOCB_CMD_POLL
From: Avi Kivity @ 2022-04-21  9:05 UTC
  To: Jens Axboe, io-uring


On 20/04/2022 15.09, Jens Axboe wrote:
> On 4/20/22 5:55 AM, Avi Kivity wrote:
>> On 19/04/2022 22.58, Jens Axboe wrote:
>>>> I'll try it tomorrow (also the other patch).
>>> Thanks!
>>>
>> With the new kernel, I get
>>
>>
>> io_uring_setup(200, {flags=0, sq_thread_cpu=0, sq_thread_idle=0, sq_entries=256, cq_entries=512, features=IORING_FEAT_SINGLE_MMAP|IORING_FEAT_NODROP|IORING_FEAT_SUBMIT_STABLE|IORING_FEAT_RW_CUR_POS|IORING_FEAT_CUR_PERSONALITY|IORING_FEAT_FAST_POLL|IORING_FEAT_POLL_32BITS|IORING_FEAT_SQPOLL_NONFIXED|IORING_FEAT_EXT_ARG|IORING_FEAT_NATIVE_WORKERS|IORING_FEAT_RSRC_TAGS|0x1800, sq_off={head=0, tail=64, ring_mask=256, ring_entries=264, flags=276, dropped=272, array=8512}, cq_off={head=128, tail=192, ring_mask=260, ring_entries=268, overflow=284, cqes=320, flags=280}}) = 7
>> mmap(NULL, 9536, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_POPULATE, 7, 0) = -1 EACCES (Permission denied)
> That looks odd, and not really related at all?
>

It was SELinux. So perhaps there is a regression somewhere that breaks
SELinux+io_uring.


The reason I encountered SELinux was that I ran it in a virtual machine
(nvidia + custom kernel = ...). Now the problem is that the results in
the VM are very inconsistent, perhaps due to incorrect topology exposed
to the guest. So I can't provide meaningful results from the patches, at
least until I find a more workable setup.




* Re: IORING_OP_POLL_ADD slower than linux-aio IOCB_CMD_POLL
From: Avi Kivity @ 2022-06-15 10:12 UTC
  To: Jens Axboe, io-uring


On 19/04/2022 20.14, Jens Axboe wrote:
> [...]
> Another thing to try - get rid of the IPI for TWA_SIGNAL, which I
> believe may be the underlying cause of it.
>

Resurrecting an old thread. I have a question about timeliness of 
completions. Let's assume a request has completed. From the patch, it 
appears that io_uring will only guarantee that a completion appears on 
the completion ring if the thread has entered kernel mode since the 
completion happened. So user-space polling of the completion ring can 
cause unbounded delays.


If this is correct (it's not unreasonable, but should be documented),
then there should also be a simple way to force a kernel entry. But how
to do this using liburing? IIUC, if the following all apply:


  1. I have no pending sqes

  2. There are pending completions

  3. There is a completed event for which a completion has not been 
appended to the completion queue ring


Then io_uring_wait_cqe() will elide io_uring_enter() and the 
completed-but-not-reported event will be delayed.
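
(The pattern I'm worried about, sketched with liburing - handle() and
do_background_work() are stand-ins:)

	struct io_uring_cqe *cqe;
	for (;;) {
		if (io_uring_peek_cqe(&ring, &cqe) == 0) {
			handle(cqe);			/* stand-in */
			io_uring_cqe_seen(&ring, cqe);
			/* no syscall was made here, so completions that
			 * need task_work may never reach the CQ ring */
			continue;
		}
		do_background_work();			/* stand-in */
	}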


> [...]

* Re: IORING_OP_POLL_ADD slower than linux-aio IOCB_CMD_POLL
From: Pavel Begunkov @ 2022-06-15 10:48 UTC
  To: Avi Kivity, Jens Axboe, io-uring

On 6/15/22 11:12, Avi Kivity wrote:
> 
> [...]
> 
> Resurrecting an old thread. I have a question about timeliness of completions. Let's assume a request has completed. From the patch, it appears that io_uring will only guarantee that a completion appears on the completion ring if the thread has entered kernel mode since the completion happened. So user-space polling of the completion ring can cause unbounded delays.

Right, but polling the CQ is a bad pattern, io_uring_{wait,peek}_cqe/etc.
will do the polling vs syscalling dance for you.

For the larger audience, I'll remind that it's an opt-in feature.


> If this is correct (it's not unreasonable, but should be documented), then there should also be a simple way to force a kernel entry. But how to do this using liburing? IIUC, if the following all apply:
> 
> 
>   1. I have no pending sqes
> 
>   2. There are pending completions
> 
>   3. There is a completed event for which a completion has not been appended to the completion queue ring
> 
> 
> Then io_uring_wait_cqe() will elide io_uring_enter() and the completed-but-not-reported event will be delayed.

One way is to process all CQEs and then it'll try to enter the
kernel and do the job.

Another way is to also set IORING_SETUP_TASKRUN_FLAG; then, when there
is work that requires entering the kernel, io_uring will set
IORING_SQ_TASKRUN in sq_flags.
Actually, if I'm not mistaken, io_uring has some automagic handling
of it internally:

https://github.com/axboe/liburing/blob/master/src/queue.c#L36
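
(A sketch of how that would be used - this assumes
IORING_SETUP_COOP_TASKRUN + IORING_SETUP_TASKRUN_FLAG from 5.19 and a
liburing new enough to have io_uring_get_events():)

	io_uring_queue_init(256, &ring, IORING_SETUP_COOP_TASKRUN |
					IORING_SETUP_TASKRUN_FLAG);
	...
	/* in the userspace polling loop: */
	if (IO_URING_READ_ONCE(*ring.sq.kflags) & IORING_SQ_TASKRUN)
		io_uring_get_events(&ring);	/* enter the kernel just to
						 * run pending task_work */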



-- 
Pavel Begunkov

* Re: IORING_OP_POLL_ADD slower than linux-aio IOCB_CMD_POLL
From: Avi Kivity @ 2022-06-15 11:04 UTC
  To: Pavel Begunkov, Jens Axboe, io-uring


On 15/06/2022 13.48, Pavel Begunkov wrote:
> On 6/15/22 11:12, Avi Kivity wrote:
>> [...]
>>
>> Resurrecting an old thread. I have a question about timeliness of 
>> completions. Let's assume a request has completed. From the patch, it 
>> appears that io_uring will only guarantee that a completion appears 
>> on the completion ring if the thread has entered kernel mode since 
>> the completion happened. So user-space polling of the completion ring 
>> can cause unbounded delays.
>
> Right, but polling the CQ is a bad pattern, io_uring_{wait,peek}_cqe/etc.
> will do the polling vs syscalling dance for you.


Can you be more explicit?


I don't think peek is enough. If there is a cqe pending, it will return
it, but will not cause completed-but-unqueued events to generate
completions.


And wait won't enter the kernel if a cqe is pending, IIUC.


>
> For the larger audience, I'll remind that it's an opt-in feature.
>

I don't understand - what is an opt-in feature?


> [...]

* Re: IORING_OP_POLL_ADD slower than linux-aio IOCB_CMD_POLL
From: Avi Kivity @ 2022-06-15 11:07 UTC
  To: Pavel Begunkov, Jens Axboe, io-uring

Sorry, hit send too quickly.


>>> [...]
>>
>> One way is to process all CQEs and then it'll try to enter the
>> kernel and do the job.


But I don't want it to wait. I want it to generate pending completions, 
and return immediately even if no completions were generated. I have 
some background computations I'm happy to perform if no events are 
pending, but I would like those events to be generated promptly.


>>
>> Another way is to also set IORING_SETUP_TASKRUN_FLAG; then, when there
>> is work that requires entering the kernel, io_uring will set
>> IORING_SQ_TASKRUN in sq_flags.
>> Actually, if I'm not mistaken, io_uring has some automagic handling
>> of it internally:
>>
>> https://github.com/axboe/liburing/blob/master/src/queue.c#L36
>>

Is there documentation about this flag?



* Re: IORING_OP_POLL_ADD slower than linux-aio IOCB_CMD_POLL
From: Pavel Begunkov @ 2022-06-15 11:30 UTC
  To: Avi Kivity, Jens Axboe, io-uring

On 6/15/22 12:04, Avi Kivity wrote:
> 
> On 15/06/2022 13.48, Pavel Begunkov wrote:
>> [...]
>>> Resurrecting an old thread. I have a question about timeliness of completions. Let's assume a request has completed. From the patch, it appears that io_uring will only guarantee that a completion appears on the completion ring if the thread has entered kernel mode since the completion happened. So user-space polling of the completion ring can cause unbounded delays.
>>
>> Right, but polling the CQ is a bad pattern; io_uring_{wait,peek}_cqe/etc.
>> will do the polling vs syscalling dance for you.
> 
> 
> Can you be more explicit?
> 
> 
> I don't think peek is enough. If there is a cqe pending, it will return it, but it will not cause completed-but-unqueued events to generate completions.
> 
> 
> And wait won't enter the kernel if a cqe is pending, IIUC.

Right, usually it won't, but it works if you eventually end up
waiting, e.g. by waiting for all expected cqes.
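
Roughly this sketch: drain the ring first, and only then wait
(handle_cqe() is a stand-in for your completion handler):

        struct io_uring_cqe *cqe;
        unsigned head, seen = 0;
        int ret;

        /* consume whatever is already in the CQ ring, no syscall */
        io_uring_for_each_cqe(&ring, head, cqe) {
                handle_cqe(cqe);
                seen++;
        }
        io_uring_cq_advance(&ring, seen);

        /* the ring is now empty, so wait_cqe enters the kernel, which
         * also runs deferred task_work and posts any pending CQEs */
        ret = io_uring_wait_cqe(&ring, &cqe);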


>> For the larger audience, I'll remind everyone that it's an opt-in feature
>>
> 
> I don't understand - what is an opt-in feature?

The behaviour you worry about, where CQEs are not posted until
you do a syscall, only happens if you set IORING_SETUP_COOP_TASKRUN.
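
E.g. a minimal setup sketch, assuming 5.19 uapi headers (both flags
are 5.19 additions):

        struct io_uring_params p = { };
        struct io_uring ring;
        int ret;

        /* opt in: defer task_work until we enter the kernel, and have
         * the kernel set IORING_SQ_TASKRUN in the shared SQ ring flags
         * when deferred work is pending */
        p.flags = IORING_SETUP_COOP_TASKRUN | IORING_SETUP_TASKRUN_FLAG;
        ret = io_uring_queue_init_params(256, &ring, &p);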


>>> If this is correct (it's not unreasonable, but it should be documented), then there should also be a simple way to force a kernel entry. But how do I do this using liburing? IIUC, if the following apply:
>>>
>>>
>>>   1. I have no pending sqes
>>>
>>>   2. There are pending completions
>>>
>>>   3. There is a completed event for which a completion has not been appended to the completion queue ring
>>>
>>>
>>> Then io_uring_wait_cqe() will elide io_uring_enter() and the completed-but-not-reported event will be delayed.
>>
>> One way is to process all CQEs and then it'll try to enter the
>> kernel and do the job.
>>
>> Another way is to also set IORING_SETUP_TASKRUN_FLAG; then, when
>> there is work that requires entering the kernel, io_uring will
>> set IORING_SQ_TASKRUN in sq_flags.
>> Actually, if I'm not mistaken, io_uring has some automagic
>> handling of it internally:
>>
>> https://github.com/axboe/liburing/blob/master/src/queue.c#L36
>>
>>
>>

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: IORING_OP_POLL_ADD slower than linux-aio IOCB_CMD_POLL
  2022-06-15 11:30                     ` Pavel Begunkov
@ 2022-06-15 11:36                       ` Avi Kivity
  0 siblings, 0 replies; 23+ messages in thread
From: Avi Kivity @ 2022-06-15 11:36 UTC (permalink / raw)
  To: Pavel Begunkov, Jens Axboe, io-uring


On 15/06/2022 14.30, Pavel Begunkov wrote:
> On 6/15/22 12:04, Avi Kivity wrote:
>>
>> On 15/06/2022 13.48, Pavel Begunkov wrote:
>>> On 6/15/22 11:12, Avi Kivity wrote:
>>>>
>>>> On 19/04/2022 20.14, Jens Axboe wrote:
>>>>> On 4/19/22 9:21 AM, Jens Axboe wrote:
>>>>>> On 4/19/22 6:31 AM, Jens Axboe wrote:
>>>>>>> On 4/19/22 6:21 AM, Avi Kivity wrote:
>>>>>>>> On 19/04/2022 15.04, Jens Axboe wrote:
>>>>>>>>> On 4/19/22 5:57 AM, Avi Kivity wrote:
>>>>>>>>>> On 19/04/2022 14.38, Jens Axboe wrote:
>>>>>>>>>>> On 4/19/22 5:07 AM, Avi Kivity wrote:
>>>>>>>>>>>> A simple webserver shows about 5% loss compared to linux-aio.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> I expect the loss is due to an optimization that io_uring
>>>>>>>>>>>> lacks - inline completion vs workqueue completion:
>>>>>>>>>>> I don't think that's it, io_uring never punts to a workqueue
>>>>>>>>>>> for completions.
>>>>>>>>>> I measured this:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>    Performance counter stats for 'system wide':
>>>>>>>>>>
>>>>>>>>>>            1,273,756 io_uring:io_uring_task_add
>>>>>>>>>>
>>>>>>>>>>         12.288597765 seconds time elapsed
>>>>>>>>>>
>>>>>>>>>> Which exactly matches the number of requests sent. If that's
>>>>>>>>>> the wrong counter to measure, I'm happy to try again with the
>>>>>>>>>> correct counter.
>>>>>>>>> io_uring_task_add() isn't a workqueue, it's task_work. So that is
>>>>>>>>> expected.
>>>>>> Might actually be implicated. Not because it's an async worker, but
>>>>>> because I think we might be losing some affinity in this case.
>>>>>> Looking at traces, we're definitely bouncing between the poll
>>>>>> completion side and then executing the completion.
>>>>>>
>>>>>> Can you try this hack? It's against -git + for-5.19/io_uring. If
>>>>>> you let me know what base you prefer, I can do a version against
>>>>>> that. I see about a 3% win with io_uring with this, and it was
>>>>>> slower before against linux-aio, as you saw as well.
>>>>> Another thing to try - get rid of the IPI for TWA_SIGNAL, which I
>>>>> believe may be the underlying cause of it.
>>>>>
>>>>
>>>> Resurrecting an old thread. I have a question about timeliness of 
>>>> completions. Let's assume a request has completed. From the patch, 
>>>> it appears that io_uring will only guarantee that a completion 
>>>> appears on the completion ring if the thread has entered kernel 
>>>> mode since the completion happened. So user-space polling of the 
>>>> completion ring can cause unbounded delays.
>>>
>>> Right, but polling the CQ is a bad pattern;
>>> io_uring_{wait,peek}_cqe/etc. will do the polling vs syscalling
>>> dance for you.
>>
>>
>> Can you be more explicit?
>>
>>
>> I don't think peek is enough. If there is a cqe pending, it will
>> return it, but it will not cause completed-but-unqueued events to
>> generate completions.
>>
>>
>> And wait won't enter the kernel if a cqe is pending, IIUC.
>
> Right, usually it won't, but it works if you eventually end up
> waiting, e.g. by waiting for all expected cqes.
>
>
>>> For the larger audience, I'll remind everyone that it's an opt-in feature
>>>
>>
>> I don't understand - what is an opt-in feature?
>
> The behaviour you worry about, where CQEs are not posted until
> you do a syscall, only happens if you set IORING_SETUP_COOP_TASKRUN.
>

Ah! I wasn't aware of this new flag. This is exactly what I want - 
either ask for timely completions, or optimize for throughput.


Of course, it puts me in a dilemma because I want both, but that's my 
problem.


Thanks!



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: IORING_OP_POLL_ADD slower than linux-aio IOCB_CMD_POLL
  2022-06-15 11:07                     ` Avi Kivity
@ 2022-06-15 11:38                       ` Pavel Begunkov
  2022-06-15 12:21                         ` Jens Axboe
  0 siblings, 1 reply; 23+ messages in thread
From: Pavel Begunkov @ 2022-06-15 11:38 UTC (permalink / raw)
  To: Avi Kivity, Jens Axboe, io-uring

On 6/15/22 12:07, Avi Kivity wrote:
> Sorry, hit send too quickly.
>>>> If this is correct (it's not unreasonable, but it should be documented), then there should also be a simple way to force a kernel entry. But how do I do this using liburing? IIUC, if the following apply:
>>>>
>>>>
>>>>   1. I have no pending sqes
>>>>
>>>>   2. There are pending completions
>>>>
>>>>   3. There is a completed event for which a completion has not been appended to the completion queue ring
>>>>
>>>>
>>>> Then io_uring_wait_cqe() will elide io_uring_enter() and the completed-but-not-reported event will be delayed.
>>>
>>> One way is to process all CQEs and then it'll try to enter the
>>> kernel and do the job.
> 
> 
> But I don't want it to wait. I want it to generate pending completions, and return immediately even if no completions were generated. I have some background computations I'm happy to perform if no events are pending, but I would like those events to be generated promptly.

Ok, then you may want IORING_SETUP_TASKRUN_FLAG, described below.
Btw, IORING_SETUP_TASKRUN_FLAG has some additional overhead.
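
The loop could then look something like this sketch. It peeks at
liburing's public sq.kflags field and does a raw io_uring_enter(2)
syscall, since I'd have to check which helper liburing exposes for
this; handle_cqe() and do_background_work() are stand-ins:

        for (;;) {
                struct io_uring_cqe *cqe;

                /* IORING_SQ_TASKRUN set by the kernel means there is
                 * deferred task_work; entering with to_submit == 0 and
                 * min_complete == 0 just runs that work, posts the
                 * pending CQEs and returns */
                if (IO_URING_READ_ONCE(*ring.sq.kflags) & IORING_SQ_TASKRUN)
                        syscall(__NR_io_uring_enter, ring.ring_fd, 0, 0,
                                IORING_ENTER_GETEVENTS, NULL);

                while (io_uring_peek_cqe(&ring, &cqe) == 0) {
                        handle_cqe(cqe);
                        io_uring_cqe_seen(&ring, cqe);
                }
                do_background_work();
        }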


>>> Another way is to also set IORING_SETUP_TASKRUN_FLAG; then, when
>>> there is work that requires entering the kernel, io_uring will
>>> set IORING_SQ_TASKRUN in sq_flags.
>>> Actually, if I'm not mistaken, io_uring has some automagic
>>> handling of it internally:
>>>
>>> https://github.com/axboe/liburing/blob/master/src/queue.c#L36
>>>
> 
> Is there documentation about this flag?

Unfortunately, I don't see any.
Here is a link to the kernel commit if it helps:

https://git.kernel.dk/cgit/linux-block/commit/?h=ef060ea9e4fd3b763e7060a3af0a258d2d5d7c0d

What's more interesting is what support liburing has,
but I would need to look it up in the code. Maybe Jens
remembers and can tell.

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: IORING_OP_POLL_ADD slower than linux-aio IOCB_CMD_POLL
  2022-06-15 11:38                       ` Pavel Begunkov
@ 2022-06-15 12:21                         ` Jens Axboe
  2022-06-15 13:43                           ` Avi Kivity
  0 siblings, 1 reply; 23+ messages in thread
From: Jens Axboe @ 2022-06-15 12:21 UTC (permalink / raw)
  To: Pavel Begunkov, Avi Kivity, io-uring

On 6/15/22 5:38 AM, Pavel Begunkov wrote:
>>>> Another way is to also set IORING_SETUP_TASKRUN_FLAG; then, when
>>>> there is work that requires entering the kernel, io_uring will
>>>> set IORING_SQ_TASKRUN in sq_flags.
>>>> Actually, if I'm not mistaken, io_uring has some automagic
>>>> handling of it internally:
>>>>
>>>> https://github.com/axboe/liburing/blob/master/src/queue.c#L36
>>>>
>>
>> Is there documentation about this flag?
> 
> Unfortunately, I don't see any.
> Here is a link to the kernel commit if it helps:
> 
> https://git.kernel.dk/cgit/linux-block/commit/?h=ef060ea9e4fd3b763e7060a3af0a258d2d5d7c0d
> 
> What's more interesting is what support liburing has,
> but I would need to look it up in the code. Maybe Jens
> remembers and can tell.

It is documented, in the io_uring_setup.2 man page in liburing:

https://git.kernel.dk/cgit/liburing/tree/man/io_uring_setup.2#n200

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: IORING_OP_POLL_ADD slower than linux-aio IOCB_CMD_POLL
  2022-06-15 12:21                         ` Jens Axboe
@ 2022-06-15 13:43                           ` Avi Kivity
  0 siblings, 0 replies; 23+ messages in thread
From: Avi Kivity @ 2022-06-15 13:43 UTC (permalink / raw)
  To: Jens Axboe, Pavel Begunkov, io-uring


On 15/06/2022 15.21, Jens Axboe wrote:
> On 6/15/22 5:38 AM, Pavel Begunkov wrote:
>>>>> Another way is to also set IORING_SETUP_TASKRUN_FLAG; then, when
>>>>> there is work that requires entering the kernel, io_uring will
>>>>> set IORING_SQ_TASKRUN in sq_flags.
>>>>> Actually, if I'm not mistaken, io_uring has some automagic
>>>>> handling of it internally:
>>>>>
>>>>> https://github.com/axboe/liburing/blob/master/src/queue.c#L36
>>>>>
>>> Is there documentation about this flag?
>> Unfortunately, I don't see any.
>> Here is a link to the kernel commit if it helps:
>>
>> https://git.kernel.dk/cgit/linux-block/commit/?h=ef060ea9e4fd3b763e7060a3af0a258d2d5d7c0d
>>
>> What's more interesting is what support liburing has,
>> but I would need to look it up in the code. Maybe Jens
>> remembers and can tell.
> It is documented, in the io_uring_setup.2 man page in liburing:
>
> https://git.kernel.dk/cgit/liburing/tree/man/io_uring_setup.2#n200
>

Perfect. Thanks!



^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2022-06-15 13:43 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-04-19 11:07 IORING_OP_POLL_ADD slower than linux-aio IOCB_CMD_POLL Avi Kivity
2022-04-19 11:38 ` Jens Axboe
2022-04-19 11:57   ` Avi Kivity
2022-04-19 12:04     ` Jens Axboe
2022-04-19 12:21       ` Avi Kivity
2022-04-19 12:31         ` Jens Axboe
2022-04-19 15:21           ` Jens Axboe
2022-04-19 15:51             ` Avi Kivity
2022-04-19 17:14             ` Jens Axboe
2022-04-19 19:41               ` Avi Kivity
2022-04-19 19:58                 ` Jens Axboe
2022-04-20 11:55                   ` Avi Kivity
2022-04-20 12:09                     ` Jens Axboe
2022-04-21  9:05                       ` Avi Kivity
2022-06-15 10:12               ` Avi Kivity
2022-06-15 10:48                 ` Pavel Begunkov
2022-06-15 11:04                   ` Avi Kivity
2022-06-15 11:07                     ` Avi Kivity
2022-06-15 11:38                       ` Pavel Begunkov
2022-06-15 12:21                         ` Jens Axboe
2022-06-15 13:43                           ` Avi Kivity
2022-06-15 11:30                     ` Pavel Begunkov
2022-06-15 11:36                       ` Avi Kivity

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.