io-uring.vger.kernel.org archive mirror
* Should io_sq_thread belongs to specific cpu, not io_uring instance
From: Xiaoguang Wang @ 2020-04-14 13:08 UTC
  To: io-uring, axboe, joseph qi

hi,

Currently we can create multiple io_uring instances that all have SQPOLL
enabled and make them run on the same cpu core by setting the sq_thread_cpu
argument, but I think this behaviour may not be efficient. Say we create two
io_uring instances, both with sq_thread_cpu set to 1 and sq_thread_idle
set to 1000 milliseconds. The following scenario can then occur:
   In the 0-1s time interval, io_uring instance0 has neither sqes
nor cqes, so it just busy-waits for new sqes during that interval, but
io_uring instance1 has work to do, submitting sqes or polling issued requests,
so io_uring instance0 will impact io_uring instance1. Of course io_uring
instance1 may impact io_uring instance0 as well, which is not efficient. I think
letting multiple io_uring instances compete for the same cpu core without any
coordination is not good.
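
To make the scenario above concrete, here is a rough toy model, not kernel code. It assumes the scheduler simply alternates equal time slices between the two io_sq_threads on the shared core, which is a simplification. The function name and the slice count are made up for illustration.

```python
def per_instance_threads(slices):
    """Toy model: one core, two SQ poll threads, strictly alternating slices.

    Thread 0 (instance0) has no sqes or cqes and only busy-waits.
    Thread 1 (instance1) always has work to do.
    Returns the number of slices in which useful work was done.
    """
    useful = 0
    for t in range(slices):
        running = t % 2          # assumed: scheduler alternates between the threads
        if running == 1:
            useful += 1          # instance1 submits sqes or polls requests
        # running == 0: instance0 spins waiting for sqes, the slice is wasted
    return useful


total_slices = 1000
useful = per_instance_threads(total_slices)
wasted_fraction = 1 - useful / total_slices
print(useful, wasted_fraction)
```

Under this assumption the busy instance only gets about half of the core while the idle instance is still inside its sq_thread_idle window.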

How about we create one io_sq_thread per user-specified cpu, shared by all the
io_uring instances that want to use this cpu core? That means the io_sq_thread
does not belong to a specific io_uring instance; it belongs to a specific cpu and
will handle requests from multiple io_uring instances. A simple running flow:
   1. For cpu 1, no io_uring instances are bound to it yet, so no io_sq_thread is created.
   2. io_uring instance1 is created and bound to cpu 1, so cpu 1's io_sq_thread is created.
   3. The io_sq_thread handles io_uring instance1's requests.
   4. io_uring instance2 is created and bound to cpu 1. Since there is already an
      io_sq_thread for cpu 1, no new io_sq_thread is created.
   5. The io_sq_thread on cpu 1 now handles both io_uring instances' requests.
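
The five steps above could be sketched roughly as follows. This is only a user-space illustration of the bookkeeping, assuming a simple cpu-to-thread map; SqThread, SqThreadRegistry and attach() are hypothetical names, not existing kernel symbols, and teardown is not covered.

```python
class SqThread:
    """Stand-in for a per-cpu io_sq_thread (hypothetical, user-space model)."""

    def __init__(self, cpu):
        self.cpu = cpu
        self.rings = []                  # io_uring instances served by this thread


class SqThreadRegistry:
    def __init__(self):
        self.by_cpu = {}                 # step 1: no thread until a ring binds to the cpu

    def attach(self, ring_name, cpu):
        thread = self.by_cpu.get(cpu)
        if thread is None:               # step 2: first ring on this cpu creates the thread
            thread = SqThread(cpu)
            self.by_cpu[cpu] = thread
        thread.rings.append(ring_name)   # steps 3-5: the thread serves every attached ring
        return thread


registry = SqThreadRegistry()
t1 = registry.attach("instance1", cpu=1)
t2 = registry.attach("instance2", cpu=1)   # step 4: reuses cpu 1's existing thread
print(t1 is t2, t1.rings)
```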

What do you think about it? Thanks.

Regards,
Xiaoguang Wang


* Re: Should io_sq_thread belongs to specific cpu, not io_uring instance
From: Yu Jian Wu @ 2020-04-14 18:55 UTC
  To: Xiaoguang Wang, io-uring, axboe, joseph qi, asaf.cidon, stutsman


Hi Xiaoguang,

We (a group of researchers at Utah and Columbia) are currently trying exactly that.

We have an initial prototype going, and we are assessing the performance impact now to see whether there are gains. Basically, we keep an RCU list of io_uring_ctx, then traverse the list and do the work in a shared io_sq_thread. We are starting experiments on a machine with fast SSDs, where we hope to see some performance benefits.
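
As a rough illustration of the traversal idea only (this is not the prototype), here is a user-space sketch. It assumes a plain Python list in place of the RCU list, a per-context batch limit so one busy ring cannot starve the others, and a count of empty passes standing in for the idle timeout; Ctx, batch and max_idle_passes are all hypothetical.

```python
from collections import deque


class Ctx:
    """Stand-in for io_uring_ctx: just a queue of pending sqes (hypothetical)."""

    def __init__(self, name, sqes=()):
        self.name = name
        self.sq = deque(sqes)
        self.completed = []


def shared_sq_thread_pass(ctx_list, batch=8):
    """One traversal of the context list; returns the number of sqes handled."""
    handled = 0
    for ctx in ctx_list:
        # Take at most `batch` sqes per context per pass (assumed fairness rule).
        for _ in range(min(batch, len(ctx.sq))):
            ctx.completed.append(ctx.sq.popleft())
            handled += 1
    return handled


def shared_sq_thread(ctx_list, max_idle_passes=3):
    """Keep traversing until every context stayed empty for max_idle_passes passes."""
    idle = 0
    total = 0
    while idle < max_idle_passes:
        n = shared_sq_thread_pass(ctx_list)
        total += n
        idle = 0 if n else idle + 1      # any work on any ring resets the idle count
    return total


instance0 = Ctx("instance0")                  # nothing to submit
instance1 = Ctx("instance1", range(20))       # busy ring
total = shared_sq_thread([instance0, instance1])
print(total, len(instance1.completed))
```

In this model the idle ring costs only an empty check per pass, rather than a full time slice of busy-waiting.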

We will send the list of patches soon, once we are sure the approach works and we finish cleaning it up. (There is a subtlety of what to do with the timeouts and resched() when not pinning.)

We'll keep you in the loop on any updates. Feel free to contact any of us.

Thanks,

Yu Jian Wu



* Re: Should io_sq_thread belongs to specific cpu, not io_uring instance
From: Xiaoguang Wang @ 2020-04-15 12:18 UTC
  To: Yu Jian Wu, io-uring, axboe, joseph qi, asaf.cidon, stutsman

hi,

> Hi Xiaoguang,
> 
> We (a group of researchers at Utah and Columbia) are currently trying exactly that.
Cool, thanks. Let me explain more about why we need this feature :)
CPU is a very precious resource. Say a physical machine has 96 cores:
if we run many io_uring instances that all have sqpoll enabled, in practice we
can only allocate a small number of cpus to io_sq_threads, so sharing a cpu to
poll is valuable.

> 
> We have an initial prototype going, and we are assessing the performance impact now to see whether there are gains. Basically, we keep an RCU list of io_uring_ctx, then traverse the list and do the work in a shared io_sq_thread. We are starting experiments on a machine with fast SSDs, where we hope to see some performance benefits.
You can try this test case to assess the performance :)
   1. Create two io_uring instances, both with sqpoll enabled, set
sq_thread_idle to 1000ms and bind them to the same cpu core.
   2. One io_uring instance sends just one io request every 500ms, which
keeps this instance's io_sq_thread always contending for the cpu.
   3. The other io_uring instance issues io requests continually, so this
instance's io_sq_thread will also contend for the cpu.
In the current io_uring implementation, I think the second io_uring instance will
be impacted by the first io_uring instance.
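
To show why step 2 keeps the first instance's thread on the cpu, here is a small back-of-the-envelope model. It assumes the thread spins for up to sq_thread_idle after each request and then sleeps until the next request wakes it; the function name and the 10s duration are just for illustration.

```python
def spinning_ms(request_interval_ms, idle_ms, duration_ms):
    """Milliseconds an SQ poll thread spends awake (spinning) in this model.

    Assumed model: a request arrives every request_interval_ms starting at
    t=0; after each request the thread spins for up to idle_ms, then sleeps
    until the next request arrives.
    """
    awake = 0
    t = 0
    while t < duration_ms:
        next_req = t + request_interval_ms
        awake_until = min(t + idle_ms, next_req, duration_ms)
        awake += awake_until - t
        t = next_req
    return awake


# Test case from above: one request per 500ms, sq_thread_idle = 1000ms.
always_awake = spinning_ms(500, 1000, 10_000)
# For comparison: requests arrive less often than the idle timeout.
half_awake = spinning_ms(2000, 1000, 10_000)
print(always_awake, half_awake)
```

Because the 500ms request interval is shorter than the 1000ms idle timeout, the thread never gets to sleep in this model, so it competes for the core the whole time even though it does almost no work.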

> 
> We will send the list of patches soon, once we are sure the approach works and we finish cleaning it up. (There is a subtlety of what to do with the timeouts and resched() when not pinning.)
> 
> We'll keep you in the loop on any updates. Feel free to contact any of us.
OK, thanks.

Regards,
Xiaoguang Wang
