* [PATCH for-5.15] io_uring: fix lacking of protection for compl_nr
@ 2021-08-20 18:40 Hao Xu
  2021-08-20 18:59 ` Pavel Begunkov
  0 siblings, 1 reply; 13+ messages in thread
From: Hao Xu @ 2021-08-20 18:40 UTC (permalink / raw)
  To: Jens Axboe; +Cc: io-uring, Pavel Begunkov, Joseph Qi

compl_nr in ctx_flush_and_put() is not protected by uring_lock; this
may cause problems when it is accessed concurrently.

Fixes: d10299e14aae ("io_uring: inline struct io_comp_state")
Signed-off-by: Hao Xu <haoxu@linux.alibaba.com>
---
 fs/io_uring.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index c755efdac71f..420f8dfa5327 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -2003,11 +2003,10 @@ static void ctx_flush_and_put(struct io_ring_ctx *ctx)
 {
 	if (!ctx)
 		return;
-	if (ctx->submit_state.compl_nr) {
-		mutex_lock(&ctx->uring_lock);
+	mutex_lock(&ctx->uring_lock);
+	if (ctx->submit_state.compl_nr)
 		io_submit_flush_completions(ctx);
-		mutex_unlock(&ctx->uring_lock);
-	}
+	mutex_unlock(&ctx->uring_lock);
 	percpu_ref_put(&ctx->refs);
 }
 
-- 
2.24.4



* Re: [PATCH for-5.15] io_uring: fix lacking of protection for compl_nr
  2021-08-20 18:40 [PATCH for-5.15] io_uring: fix lacking of protection for compl_nr Hao Xu
@ 2021-08-20 18:59 ` Pavel Begunkov
  2021-08-20 20:39   ` Hao Xu
  0 siblings, 1 reply; 13+ messages in thread
From: Pavel Begunkov @ 2021-08-20 18:59 UTC (permalink / raw)
  To: Hao Xu, Jens Axboe; +Cc: io-uring, Joseph Qi

On 8/20/21 7:40 PM, Hao Xu wrote:
> coml_nr in ctx_flush_and_put() is not protected by uring_lock, this
> may cause problems when accessing it parallelly.

Did you hit any problem? It sounds like it should be fine as is:

The trick is that it's only responsible for flushing requests added
during execution of the current call to tctx_task_work(), and those
are naturally synchronised with the current task. All other potentially
enqueued requests will be someone else's responsibility.

So, if nobody flushed requests, we're fine and in sync. If we see
0 there, but actually enqueued a request, it means someone
actually flushed it after the request had been added.

Probably this needs a more formal explanation with happens-before
and so on.

> 
> Fixes: d10299e14aae ("io_uring: inline struct io_comp_state")

FWIW, it came much earlier than this commit, IIRC

commit 2c32395d8111037ae2cb8cab883e80bcdbb70713
Author: Pavel Begunkov <asml.silence@gmail.com>
Date:   Sun Feb 28 22:04:53 2021 +0000

    io_uring: fix __tctx_task_work() ctx race


> Signed-off-by: Hao Xu <haoxu@linux.alibaba.com>
> ---
>  fs/io_uring.c | 7 +++----
>  1 file changed, 3 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/io_uring.c b/fs/io_uring.c
> index c755efdac71f..420f8dfa5327 100644
> --- a/fs/io_uring.c
> +++ b/fs/io_uring.c
> @@ -2003,11 +2003,10 @@ static void ctx_flush_and_put(struct io_ring_ctx *ctx)
>  {
>  	if (!ctx)
>  		return;
> -	if (ctx->submit_state.compl_nr) {
> -		mutex_lock(&ctx->uring_lock);
> +	mutex_lock(&ctx->uring_lock);
> +	if (ctx->submit_state.compl_nr)
>  		io_submit_flush_completions(ctx);
> -		mutex_unlock(&ctx->uring_lock);
> -	}
> +	mutex_unlock(&ctx->uring_lock);
>  	percpu_ref_put(&ctx->refs);
>  }
>  
> 

-- 
Pavel Begunkov


* Re: [PATCH for-5.15] io_uring: fix lacking of protection for compl_nr
  2021-08-20 18:59 ` Pavel Begunkov
@ 2021-08-20 20:39   ` Hao Xu
  2021-08-20 21:32     ` Pavel Begunkov
  0 siblings, 1 reply; 13+ messages in thread
From: Hao Xu @ 2021-08-20 20:39 UTC (permalink / raw)
  To: Pavel Begunkov, Jens Axboe; +Cc: io-uring, Joseph Qi

On 2021/8/21 2:59 AM, Pavel Begunkov wrote:
> On 8/20/21 7:40 PM, Hao Xu wrote:
>> coml_nr in ctx_flush_and_put() is not protected by uring_lock, this
>> may cause problems when accessing it parallelly.
> 
> Did you hit any problem? It sounds like it should be fine as is:
> 
> The trick is that it's only responsible to flush requests added
> during execution of current call to tctx_task_work(), and those
> naturally synchronised with the current task. All other potentially
> enqueued requests will be of someone else's responsibility.
> 
> So, if nobody flushed requests, we're finely in-sync. If we see
> 0 there, but actually enqueued a request, it means someone
> actually flushed it after the request had been added.
> 
> Probably, needs a more formal explanation with happens-before
> and so.
I should put more detail in the commit message, the thing is:
say coml_nr > 0

   ctx_flush_and put                  other context
    if (compl_nr)                      get mutex
                                       coml_nr > 0
                                       do flush
                                           coml_nr = 0
                                       release mutex
         get mutex
            do flush (*)
         release mutex

at (*), we do a bunch of unnecessary work; moreover, we
call io_cqring_ev_posted(), which I think we shouldn't.
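
For illustration, a minimal userspace model of this interleaving (just a
sketch: a pthread mutex and a plain int stand in for uring_lock and
submit_state.compl_nr, and whether the bad ordering actually occurs in a
given run depends on scheduling):

#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t uring_lock = PTHREAD_MUTEX_INITIALIZER;
static int compl_nr = 8;	/* pretend 8 completions are batched */

static void flush_completions(const char *who)
{
	/* stands in for io_submit_flush_completions() + io_cqring_ev_posted() */
	printf("%s: flushing %d completions, ev_posted fires\n", who, compl_nr);
	compl_nr = 0;
}

/* unlocked check first, then an unconditional flush under the lock,
 * mirroring the current ctx_flush_and_put() */
static void *ctx_flush_and_put(void *arg)
{
	if (compl_nr) {
		pthread_mutex_lock(&uring_lock);
		flush_completions("ctx_flush_and_put");	/* (*) may flush nothing */
		pthread_mutex_unlock(&uring_lock);
	}
	return NULL;
}

static void *other_context(void *arg)
{
	pthread_mutex_lock(&uring_lock);
	if (compl_nr)
		flush_completions("other context");
	pthread_mutex_unlock(&uring_lock);
	return NULL;
}

int main(void)
{
	pthread_t t1, t2;

	pthread_create(&t1, NULL, other_context, NULL);
	pthread_create(&t2, NULL, ctx_flush_and_put, NULL);
	pthread_join(t1, NULL);
	pthread_join(t2, NULL);
	return 0;
}
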
> 
>>
>> Fixes: d10299e14aae ("io_uring: inline struct io_comp_state")
> 
> FWIW, it came much earlier than this commit, IIRC
> 
> commit 2c32395d8111037ae2cb8cab883e80bcdbb70713
> Author: Pavel Begunkov <asml.silence@gmail.com>
> Date:   Sun Feb 28 22:04:53 2021 +0000
> 
>      io_uring: fix __tctx_task_work() ctx race
> 
> 
>> Signed-off-by: Hao Xu <haoxu@linux.alibaba.com>
>> ---
>>   fs/io_uring.c | 7 +++----
>>   1 file changed, 3 insertions(+), 4 deletions(-)
>>
>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>> index c755efdac71f..420f8dfa5327 100644
>> --- a/fs/io_uring.c
>> +++ b/fs/io_uring.c
>> @@ -2003,11 +2003,10 @@ static void ctx_flush_and_put(struct io_ring_ctx *ctx)
>>   {
>>   	if (!ctx)
>>   		return;
>> -	if (ctx->submit_state.compl_nr) {
>> -		mutex_lock(&ctx->uring_lock);
>> +	mutex_lock(&ctx->uring_lock);
>> +	if (ctx->submit_state.compl_nr)
>>   		io_submit_flush_completions(ctx);
>> -		mutex_unlock(&ctx->uring_lock);
>> -	}
>> +	mutex_unlock(&ctx->uring_lock);
>>   	percpu_ref_put(&ctx->refs);
>>   }
>>   
>>
> 



* Re: [PATCH for-5.15] io_uring: fix lacking of protection for compl_nr
  2021-08-20 20:39   ` Hao Xu
@ 2021-08-20 21:32     ` Pavel Begunkov
  2021-08-20 22:07       ` Hao Xu
  2021-08-20 22:09       ` Jens Axboe
  0 siblings, 2 replies; 13+ messages in thread
From: Pavel Begunkov @ 2021-08-20 21:32 UTC (permalink / raw)
  To: Hao Xu, Jens Axboe; +Cc: io-uring, Joseph Qi

On 8/20/21 9:39 PM, Hao Xu wrote:
> On 2021/8/21 2:59 AM, Pavel Begunkov wrote:
>> On 8/20/21 7:40 PM, Hao Xu wrote:
>>> coml_nr in ctx_flush_and_put() is not protected by uring_lock, this
>>> may cause problems when accessing it parallelly.
>>
>> Did you hit any problem? It sounds like it should be fine as is:
>>
>> The trick is that it's only responsible to flush requests added
>> during execution of current call to tctx_task_work(), and those
>> naturally synchronised with the current task. All other potentially
>> enqueued requests will be of someone else's responsibility.
>>
>> So, if nobody flushed requests, we're finely in-sync. If we see
>> 0 there, but actually enqueued a request, it means someone
>> actually flushed it after the request had been added.
>>
>> Probably, needs a more formal explanation with happens-before
>> and so.
> I should put more detail in the commit message, the thing is:
> say coml_nr > 0
> 
>   ctx_flush_and put                  other context
>    if (compl_nr)                      get mutex
>                                       coml_nr > 0
>                                       do flush
>                                           coml_nr = 0
>                                       release mutex
>         get mutex
>            do flush (*)
>         release mutex
> 
> in (*) place, we do a bunch of unnecessary works, moreover, we

I wouldn't care about overhead, that shouldn't be much

> call io_cqring_ev_posted() which I think we shouldn't.

IMHO, users should expect spurious io_cqring_ev_posted(),
though there were some eventfd users complaining before, so
for them we can do

if (state->nr) {
    lock();
    if (state->nr) flush();
    unlock();
}
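
Applied to ctx_flush_and_put(), that shape would look roughly like the
following (a sketch of the suggested idiom, not a tested patch):

static void ctx_flush_and_put(struct io_ring_ctx *ctx)
{
	if (!ctx)
		return;
	/* unlocked fast path: skip the mutex when nothing is batched */
	if (ctx->submit_state.compl_nr) {
		mutex_lock(&ctx->uring_lock);
		/* re-check under uring_lock; someone may already have flushed */
		if (ctx->submit_state.compl_nr)
			io_submit_flush_completions(ctx);
		mutex_unlock(&ctx->uring_lock);
	}
	percpu_ref_put(&ctx->refs);
}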

>>> Fixes: d10299e14aae ("io_uring: inline struct io_comp_state")
>>
>> FWIW, it came much earlier than this commit, IIRC
>>
>> commit 2c32395d8111037ae2cb8cab883e80bcdbb70713
>> Author: Pavel Begunkov <asml.silence@gmail.com>
>> Date:   Sun Feb 28 22:04:53 2021 +0000
>>
>>      io_uring: fix __tctx_task_work() ctx race
>>
>>
>>> Signed-off-by: Hao Xu <haoxu@linux.alibaba.com>
>>> ---
>>>   fs/io_uring.c | 7 +++----
>>>   1 file changed, 3 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>>> index c755efdac71f..420f8dfa5327 100644
>>> --- a/fs/io_uring.c
>>> +++ b/fs/io_uring.c
>>> @@ -2003,11 +2003,10 @@ static void ctx_flush_and_put(struct io_ring_ctx *ctx)
>>>   {
>>>       if (!ctx)
>>>           return;
>>> -    if (ctx->submit_state.compl_nr) {
>>> -        mutex_lock(&ctx->uring_lock);
>>> +    mutex_lock(&ctx->uring_lock);
>>> +    if (ctx->submit_state.compl_nr)
>>>           io_submit_flush_completions(ctx);
>>> -        mutex_unlock(&ctx->uring_lock);
>>> -    }
>>> +    mutex_unlock(&ctx->uring_lock);
>>>       percpu_ref_put(&ctx->refs);
>>>   }
>>>  
>>
> 

-- 
Pavel Begunkov


* Re: [PATCH for-5.15] io_uring: fix lacking of protection for compl_nr
  2021-08-20 21:32     ` Pavel Begunkov
@ 2021-08-20 22:07       ` Hao Xu
  2021-08-20 22:09       ` Jens Axboe
  1 sibling, 0 replies; 13+ messages in thread
From: Hao Xu @ 2021-08-20 22:07 UTC (permalink / raw)
  To: Pavel Begunkov, Jens Axboe; +Cc: io-uring, Joseph Qi

On 2021/8/21 5:32 AM, Pavel Begunkov wrote:
> On 8/20/21 9:39 PM, Hao Xu wrote:
>> On 2021/8/21 2:59 AM, Pavel Begunkov wrote:
>>> On 8/20/21 7:40 PM, Hao Xu wrote:
>>>> coml_nr in ctx_flush_and_put() is not protected by uring_lock, this
>>>> may cause problems when accessing it parallelly.
>>>
>>> Did you hit any problem? It sounds like it should be fine as is:
>>>
>>> The trick is that it's only responsible to flush requests added
>>> during execution of current call to tctx_task_work(), and those
>>> naturally synchronised with the current task. All other potentially
>>> enqueued requests will be of someone else's responsibility.
>>>
>>> So, if nobody flushed requests, we're finely in-sync. If we see
>>> 0 there, but actually enqueued a request, it means someone
>>> actually flushed it after the request had been added.
>>>
>>> Probably, needs a more formal explanation with happens-before
>>> and so.
>> I should put more detail in the commit message, the thing is:
>> say coml_nr > 0
>>
>>    ctx_flush_and put                  other context
>>     if (compl_nr)                      get mutex
>>                                        coml_nr > 0
>>                                        do flush
>>                                            coml_nr = 0
>>                                        release mutex
>>          get mutex
>>             do flush (*)
>>          release mutex
>>
>> in (*) place, we do a bunch of unnecessary works, moreover, we
> 
> I wouldn't care about overhead, that shouldn't be much
> 
>> call io_cqring_ev_posted() which I think we shouldn't.
> 
> IMHO, users should expect spurious io_cqring_ev_posted(),
> though there were some eventfd users complaining before, so
> for them we can do
> 
> if (state->nr) {
>      lock();
>      if (state->nr) flush();
>      unlock();
> }
Agree.
> 
>>>> Fixes: d10299e14aae ("io_uring: inline struct io_comp_state")
>>>
>>> FWIW, it came much earlier than this commit, IIRC
>>>
>>> commit 2c32395d8111037ae2cb8cab883e80bcdbb70713
>>> Author: Pavel Begunkov <asml.silence@gmail.com>
>>> Date:   Sun Feb 28 22:04:53 2021 +0000
>>>
>>>       io_uring: fix __tctx_task_work() ctx race
>>>
>>>
>>>> Signed-off-by: Hao Xu <haoxu@linux.alibaba.com>
>>>> ---
>>>>    fs/io_uring.c | 7 +++----
>>>>    1 file changed, 3 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>>>> index c755efdac71f..420f8dfa5327 100644
>>>> --- a/fs/io_uring.c
>>>> +++ b/fs/io_uring.c
>>>> @@ -2003,11 +2003,10 @@ static void ctx_flush_and_put(struct io_ring_ctx *ctx)
>>>>    {
>>>>        if (!ctx)
>>>>            return;
>>>> -    if (ctx->submit_state.compl_nr) {
>>>> -        mutex_lock(&ctx->uring_lock);
>>>> +    mutex_lock(&ctx->uring_lock);
>>>> +    if (ctx->submit_state.compl_nr)
>>>>            io_submit_flush_completions(ctx);
>>>> -        mutex_unlock(&ctx->uring_lock);
>>>> -    }
>>>> +    mutex_unlock(&ctx->uring_lock);
>>>>        percpu_ref_put(&ctx->refs);
>>>>    }
>>>>   
>>>
>>
> 



* Re: [PATCH for-5.15] io_uring: fix lacking of protection for compl_nr
  2021-08-20 21:32     ` Pavel Begunkov
  2021-08-20 22:07       ` Hao Xu
@ 2021-08-20 22:09       ` Jens Axboe
  2021-08-20 22:21         ` Hao Xu
  2021-08-20 22:28         ` Pavel Begunkov
  1 sibling, 2 replies; 13+ messages in thread
From: Jens Axboe @ 2021-08-20 22:09 UTC (permalink / raw)
  To: Pavel Begunkov, Hao Xu; +Cc: io-uring, Joseph Qi

On 8/20/21 3:32 PM, Pavel Begunkov wrote:
> On 8/20/21 9:39 PM, Hao Xu wrote:
>> On 2021/8/21 2:59 AM, Pavel Begunkov wrote:
>>> On 8/20/21 7:40 PM, Hao Xu wrote:
>>>> coml_nr in ctx_flush_and_put() is not protected by uring_lock, this
>>>> may cause problems when accessing it parallelly.
>>>
>>> Did you hit any problem? It sounds like it should be fine as is:
>>>
>>> The trick is that it's only responsible to flush requests added
>>> during execution of current call to tctx_task_work(), and those
>>> naturally synchronised with the current task. All other potentially
>>> enqueued requests will be of someone else's responsibility.
>>>
>>> So, if nobody flushed requests, we're finely in-sync. If we see
>>> 0 there, but actually enqueued a request, it means someone
>>> actually flushed it after the request had been added.
>>>
>>> Probably, needs a more formal explanation with happens-before
>>> and so.
>> I should put more detail in the commit message, the thing is:
>> say coml_nr > 0
>>
>>   ctx_flush_and put                  other context
>>    if (compl_nr)                      get mutex
>>                                       coml_nr > 0
>>                                       do flush
>>                                           coml_nr = 0
>>                                       release mutex
>>         get mutex
>>            do flush (*)
>>         release mutex
>>
>> in (*) place, we do a bunch of unnecessary works, moreover, we
> 
> I wouldn't care about overhead, that shouldn't be much
> 
>> call io_cqring_ev_posted() which I think we shouldn't.
> 
> IMHO, users should expect spurious io_cqring_ev_posted(),
> though there were some eventfd users complaining before, so
> for them we can do

It does sometimes cause issues, see:

commit b18032bb0a883cd7edd22a7fe6c57e1059b81ed0
Author: Jens Axboe <axboe@kernel.dk>
Date:   Sun Jan 24 16:58:56 2021 -0700

    io_uring: only call io_cqring_ev_posted() if events were posted

I would tend to agree with Hao here, and the usual optimization idiom
looks like:

if (struct->nr) {
	mutex_lock(&struct->lock);
	if (struct->nr)
		do_something();
	mutex_unlock(&struct->lock);
}

like you posted, which would be fine and avoid this whole discussion :-)

Hao, care to spin a patch that does that?

-- 
Jens Axboe



* Re: [PATCH for-5.15] io_uring: fix lacking of protection for compl_nr
  2021-08-20 22:09       ` Jens Axboe
@ 2021-08-20 22:21         ` Hao Xu
  2021-08-20 22:28         ` Pavel Begunkov
  1 sibling, 0 replies; 13+ messages in thread
From: Hao Xu @ 2021-08-20 22:21 UTC (permalink / raw)
  To: Jens Axboe, Pavel Begunkov; +Cc: io-uring, Joseph Qi

On 2021/8/21 6:09 AM, Jens Axboe wrote:
> On 8/20/21 3:32 PM, Pavel Begunkov wrote:
>> On 8/20/21 9:39 PM, Hao Xu wrote:
>>> On 2021/8/21 2:59 AM, Pavel Begunkov wrote:
>>>> On 8/20/21 7:40 PM, Hao Xu wrote:
>>>>> coml_nr in ctx_flush_and_put() is not protected by uring_lock, this
>>>>> may cause problems when accessing it parallelly.
>>>>
>>>> Did you hit any problem? It sounds like it should be fine as is:
>>>>
>>>> The trick is that it's only responsible to flush requests added
>>>> during execution of current call to tctx_task_work(), and those
>>>> naturally synchronised with the current task. All other potentially
>>>> enqueued requests will be of someone else's responsibility.
>>>>
>>>> So, if nobody flushed requests, we're finely in-sync. If we see
>>>> 0 there, but actually enqueued a request, it means someone
>>>> actually flushed it after the request had been added.
>>>>
>>>> Probably, needs a more formal explanation with happens-before
>>>> and so.
>>> I should put more detail in the commit message, the thing is:
>>> say coml_nr > 0
>>>
>>>    ctx_flush_and put                  other context
>>>     if (compl_nr)                      get mutex
>>>                                        coml_nr > 0
>>>                                        do flush
>>>                                            coml_nr = 0
>>>                                        release mutex
>>>          get mutex
>>>             do flush (*)
>>>          release mutex
>>>
>>> in (*) place, we do a bunch of unnecessary works, moreover, we
>>
>> I wouldn't care about overhead, that shouldn't be much
>>
>>> call io_cqring_ev_posted() which I think we shouldn't.
>>
>> IMHO, users should expect spurious io_cqring_ev_posted(),
>> though there were some eventfd users complaining before, so
>> for them we can do
> 
> It does sometimes cause issues, see:
> 
> commit b18032bb0a883cd7edd22a7fe6c57e1059b81ed0
> Author: Jens Axboe <axboe@kernel.dk>
> Date:   Sun Jan 24 16:58:56 2021 -0700
> 
>      io_uring: only call io_cqring_ev_posted() if events were posted
> 
> I would tend to agree with Hao here, and the usual optimization idiom
> looks like:
> 
> if (struct->nr) {
> 	mutex_lock(&struct->lock);
> 	if (struct->nr)
> 		do_something();
> 	mutex_unlock(&struct->lock);
> }
> 
> like you posted, which would be fine and avoid this whole discussion :-)
> 
> Hao, care to spin a patch that does that?
no problem.
> 



* Re: [PATCH for-5.15] io_uring: fix lacking of protection for compl_nr
  2021-08-20 22:09       ` Jens Axboe
  2021-08-20 22:21         ` Hao Xu
@ 2021-08-20 22:28         ` Pavel Begunkov
  2021-08-20 22:30           ` Jens Axboe
  1 sibling, 1 reply; 13+ messages in thread
From: Pavel Begunkov @ 2021-08-20 22:28 UTC (permalink / raw)
  To: Jens Axboe, Hao Xu; +Cc: io-uring, Joseph Qi

On 8/20/21 11:09 PM, Jens Axboe wrote:
> On 8/20/21 3:32 PM, Pavel Begunkov wrote:
>> On 8/20/21 9:39 PM, Hao Xu wrote:
>>> On 2021/8/21 2:59 AM, Pavel Begunkov wrote:
>>>> On 8/20/21 7:40 PM, Hao Xu wrote:
>>>>> coml_nr in ctx_flush_and_put() is not protected by uring_lock, this
>>>>> may cause problems when accessing it parallelly.
>>>>
>>>> Did you hit any problem? It sounds like it should be fine as is:
>>>>
>>>> The trick is that it's only responsible to flush requests added
>>>> during execution of current call to tctx_task_work(), and those
>>>> naturally synchronised with the current task. All other potentially
>>>> enqueued requests will be of someone else's responsibility.
>>>>
>>>> So, if nobody flushed requests, we're finely in-sync. If we see
>>>> 0 there, but actually enqueued a request, it means someone
>>>> actually flushed it after the request had been added.
>>>>
>>>> Probably, needs a more formal explanation with happens-before
>>>> and so.
>>> I should put more detail in the commit message, the thing is:
>>> say coml_nr > 0
>>>
>>>   ctx_flush_and put                  other context
>>>    if (compl_nr)                      get mutex
>>>                                       coml_nr > 0
>>>                                       do flush
>>>                                           coml_nr = 0
>>>                                       release mutex
>>>         get mutex
>>>            do flush (*)
>>>         release mutex
>>>
>>> in (*) place, we do a bunch of unnecessary works, moreover, we
>>
>> I wouldn't care about overhead, that shouldn't be much
>>
>>> call io_cqring_ev_posted() which I think we shouldn't.
>>
>> IMHO, users should expect spurious io_cqring_ev_posted(),
>> though there were some eventfd users complaining before, so
>> for them we can do
> 
> It does sometimes cause issues, see:

I'm used to locking ending up in spurious wakeups. It may be
different for eventfd, but considering that we do batch
completions and so might be calling it only once per multiple
CQEs, it shouldn't be.


> commit b18032bb0a883cd7edd22a7fe6c57e1059b81ed0
> Author: Jens Axboe <axboe@kernel.dk>
> Date:   Sun Jan 24 16:58:56 2021 -0700
> 
>     io_uring: only call io_cqring_ev_posted() if events were posted
> 
> I would tend to agree with Hao here, and the usual optimization idiom
> looks like:
> 
> if (struct->nr) {
> 	mutex_lock(&struct->lock);
> 	if (struct->nr)
> 		do_something();
> 	mutex_unlock(&struct->lock);
> }
> 
> like you posted, which would be fine and avoid this whole discussion :-)

Well, until Hao's message explaining the concerns, I was thinking
it was about potential hangs because of not flushing requests. I'd rather
say the discussion was fruitful and naturally came to a conclusion.

-- 
Pavel Begunkov


* Re: [PATCH for-5.15] io_uring: fix lacking of protection for compl_nr
  2021-08-20 22:28         ` Pavel Begunkov
@ 2021-08-20 22:30           ` Jens Axboe
  2021-08-20 22:41             ` Pavel Begunkov
  0 siblings, 1 reply; 13+ messages in thread
From: Jens Axboe @ 2021-08-20 22:30 UTC (permalink / raw)
  To: Pavel Begunkov, Hao Xu; +Cc: io-uring, Joseph Qi

On 8/20/21 4:28 PM, Pavel Begunkov wrote:
> On 8/20/21 11:09 PM, Jens Axboe wrote:
>> On 8/20/21 3:32 PM, Pavel Begunkov wrote:
>>> On 8/20/21 9:39 PM, Hao Xu wrote:
>>>> On 2021/8/21 2:59 AM, Pavel Begunkov wrote:
>>>>> On 8/20/21 7:40 PM, Hao Xu wrote:
>>>>>> coml_nr in ctx_flush_and_put() is not protected by uring_lock, this
>>>>>> may cause problems when accessing it parallelly.
>>>>>
>>>>> Did you hit any problem? It sounds like it should be fine as is:
>>>>>
>>>>> The trick is that it's only responsible to flush requests added
>>>>> during execution of current call to tctx_task_work(), and those
>>>>> naturally synchronised with the current task. All other potentially
>>>>> enqueued requests will be of someone else's responsibility.
>>>>>
>>>>> So, if nobody flushed requests, we're finely in-sync. If we see
>>>>> 0 there, but actually enqueued a request, it means someone
>>>>> actually flushed it after the request had been added.
>>>>>
>>>>> Probably, needs a more formal explanation with happens-before
>>>>> and so.
>>>> I should put more detail in the commit message, the thing is:
>>>> say coml_nr > 0
>>>>
>>>>   ctx_flush_and put                  other context
>>>>    if (compl_nr)                      get mutex
>>>>                                       coml_nr > 0
>>>>                                       do flush
>>>>                                           coml_nr = 0
>>>>                                       release mutex
>>>>         get mutex
>>>>            do flush (*)
>>>>         release mutex
>>>>
>>>> in (*) place, we do a bunch of unnecessary works, moreover, we
>>>
>>> I wouldn't care about overhead, that shouldn't be much
>>>
>>>> call io_cqring_ev_posted() which I think we shouldn't.
>>>
>>> IMHO, users should expect spurious io_cqring_ev_posted(),
>>> though there were some eventfd users complaining before, so
>>> for them we can do
>>
>> It does sometimes cause issues, see:
> 
> I'm used that locking may end up in spurious wakeups. May be
> different for eventfd, but considering that we do batch
> completions and so might be calling it only once per multiple
> CQEs, it shouldn't be.

The wakeups are fine, it's the ev increment that's causing some issues.

>> commit b18032bb0a883cd7edd22a7fe6c57e1059b81ed0
>> Author: Jens Axboe <axboe@kernel.dk>
>> Date:   Sun Jan 24 16:58:56 2021 -0700
>>
>>     io_uring: only call io_cqring_ev_posted() if events were posted
>>
>> I would tend to agree with Hao here, and the usual optimization idiom
>> looks like:
>>
>> if (struct->nr) {
>> 	mutex_lock(&struct->lock);
>> 	if (struct->nr)
>> 		do_something();
>> 	mutex_unlock(&struct->lock);
>> }
>>
>> like you posted, which would be fine and avoid this whole discussion :-)
> 
> Well, until the Hao's message explaining the concerns, I was thinking
> it's about potential hangs because of not flushing requests. I'd rather
> say the discussion was fruitful and naturally came to the conclusion.

Oh for sure, didn't mean to imply it was useless. At least it's in the
archives :)

-- 
Jens Axboe



* Re: [PATCH for-5.15] io_uring: fix lacking of protection for compl_nr
  2021-08-20 22:30           ` Jens Axboe
@ 2021-08-20 22:41             ` Pavel Begunkov
  2021-08-20 22:46               ` Jens Axboe
  0 siblings, 1 reply; 13+ messages in thread
From: Pavel Begunkov @ 2021-08-20 22:41 UTC (permalink / raw)
  To: Jens Axboe, Hao Xu; +Cc: io-uring, Joseph Qi

On 8/20/21 11:30 PM, Jens Axboe wrote:
> On 8/20/21 4:28 PM, Pavel Begunkov wrote:
>> On 8/20/21 11:09 PM, Jens Axboe wrote:
>>> On 8/20/21 3:32 PM, Pavel Begunkov wrote:
>>>> On 8/20/21 9:39 PM, Hao Xu wrote:
>>>>> On 2021/8/21 2:59 AM, Pavel Begunkov wrote:
>>>>>> On 8/20/21 7:40 PM, Hao Xu wrote:
>>>>>>> coml_nr in ctx_flush_and_put() is not protected by uring_lock, this
>>>>>>> may cause problems when accessing it parallelly.
>>>>>>
>>>>>> Did you hit any problem? It sounds like it should be fine as is:
>>>>>>
>>>>>> The trick is that it's only responsible to flush requests added
>>>>>> during execution of current call to tctx_task_work(), and those
>>>>>> naturally synchronised with the current task. All other potentially
>>>>>> enqueued requests will be of someone else's responsibility.
>>>>>>
>>>>>> So, if nobody flushed requests, we're finely in-sync. If we see
>>>>>> 0 there, but actually enqueued a request, it means someone
>>>>>> actually flushed it after the request had been added.
>>>>>>
>>>>>> Probably, needs a more formal explanation with happens-before
>>>>>> and so.
>>>>> I should put more detail in the commit message, the thing is:
>>>>> say coml_nr > 0
>>>>>
>>>>>   ctx_flush_and put                  other context
>>>>>    if (compl_nr)                      get mutex
>>>>>                                       coml_nr > 0
>>>>>                                       do flush
>>>>>                                           coml_nr = 0
>>>>>                                       release mutex
>>>>>         get mutex
>>>>>            do flush (*)
>>>>>         release mutex
>>>>>
>>>>> in (*) place, we do a bunch of unnecessary works, moreover, we
>>>>
>>>> I wouldn't care about overhead, that shouldn't be much
>>>>
>>>>> call io_cqring_ev_posted() which I think we shouldn't.
>>>>
>>>> IMHO, users should expect spurious io_cqring_ev_posted(),
>>>> though there were some eventfd users complaining before, so
>>>> for them we can do
>>>
>>> It does sometimes cause issues, see:
>>
>> I'm used that locking may end up in spurious wakeups. May be
>> different for eventfd, but considering that we do batch
>> completions and so might be calling it only once per multiple
>> CQEs, it shouldn't be.
> 
> The wakeups are fine, it's the ev increment that's causing some issues.

If userspace doesn't expect that eventfd may diverge from the
number of posted CQEs, we need something like below. The weird part
is that it looks like nobody complained about this one, even though it
should be happening pretty often.


diff --git a/fs/io_uring.c b/fs/io_uring.c
index 761f4d99a1a9..7a0fc024d857 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -1463,34 +1463,39 @@ static inline bool io_should_trigger_evfd(struct io_ring_ctx *ctx)
 	return !ctx->eventfd_async || io_wq_current_is_worker();
 }
 
-static void io_cqring_ev_posted(struct io_ring_ctx *ctx)
+static void __io_cqring_ev_posted(struct io_ring_ctx *ctx, unsigned events)
 {
 	/* see waitqueue_active() comment */
 	smp_mb();
 
+	if (io_should_trigger_evfd(ctx))
+		eventfd_signal(ctx->cq_ev_fd, events);
 	if (waitqueue_active(&ctx->wait))
 		wake_up(&ctx->wait);
 	if (ctx->sq_data && waitqueue_active(&ctx->sq_data->wait))
 		wake_up(&ctx->sq_data->wait);
-	if (io_should_trigger_evfd(ctx))
-		eventfd_signal(ctx->cq_ev_fd, 1);
 	if (waitqueue_active(&ctx->cq_wait)) {
 		wake_up_interruptible(&ctx->cq_wait);
 		kill_fasync(&ctx->cq_fasync, SIGIO, POLL_IN);
 	}
 }
 
-static void io_cqring_ev_posted_iopoll(struct io_ring_ctx *ctx)
+static inline void io_cqring_ev_posted(struct io_ring_ctx *ctx)
+{
+	__io_cqring_ev_posted(ctx, 1);
+}
+
+static void io_cqring_ev_posted_iopoll(struct io_ring_ctx *ctx, unsigned events)
 {
 	/* see waitqueue_active() comment */
 	smp_mb();
 
+	if (io_should_trigger_evfd(ctx))
+		eventfd_signal(ctx->cq_ev_fd, events);
 	if (ctx->flags & IORING_SETUP_SQPOLL) {
 		if (waitqueue_active(&ctx->wait))
 			wake_up(&ctx->wait);
 	}
-	if (io_should_trigger_evfd(ctx))
-		eventfd_signal(ctx->cq_ev_fd, 1);
 	if (waitqueue_active(&ctx->cq_wait)) {
 		wake_up_interruptible(&ctx->cq_wait);
 		kill_fasync(&ctx->cq_fasync, SIGIO, POLL_IN);
@@ -2223,7 +2228,7 @@ static void io_submit_flush_completions(struct io_comp_state *cs,
 	io_commit_cqring(ctx);
 	spin_unlock_irq(&ctx->completion_lock);
 
-	io_cqring_ev_posted(ctx);
+	__io_cqring_ev_posted(ctx, nr);
 	for (i = 0; i < nr; i++) {
 		req = cs->reqs[i];
 
@@ -2336,6 +2341,7 @@ static void io_iopoll_complete(struct io_ring_ctx *ctx, unsigned int *nr_events,
 {
 	struct req_batch rb;
 	struct io_kiocb *req;
+	unsigned int events = 0;
 
 	/* order with ->result store in io_complete_rw_iopoll() */
 	smp_rmb();
@@ -2360,15 +2366,16 @@ static void io_iopoll_complete(struct io_ring_ctx *ctx, unsigned int *nr_events,
 
 		__io_cqring_fill_event(ctx, req->user_data, req->result, cflags,
 					req->cq_idx);
-		(*nr_events)++;
+		events++;
 
 		if (req_ref_put_and_test(req))
 			io_req_free_batch(&rb, req, &ctx->submit_state);
 	}
 
 	io_commit_cqring(ctx);
-	io_cqring_ev_posted_iopoll(ctx);
+	io_cqring_ev_posted_iopoll(ctx, events);
 	io_req_free_batch_finish(ctx, &rb);
+	*nr_events += events;
 }
 
 static int io_do_iopoll(struct io_ring_ctx *ctx, unsigned int *nr_events,
@@ -5404,7 +5411,7 @@ static bool io_poll_remove_all(struct io_ring_ctx *ctx, struct task_struct *tsk,
 	spin_unlock_irq(&ctx->completion_lock);
 
 	if (posted)
-		io_cqring_ev_posted(ctx);
+		__io_cqring_ev_posted(ctx, posted);
 
 	return posted != 0;
 }
@@ -9010,7 +9017,7 @@ static bool io_kill_timeouts(struct io_ring_ctx *ctx, struct task_struct *tsk,
 		io_commit_cqring(ctx);
 	spin_unlock_irq(&ctx->completion_lock);
 	if (canceled != 0)
-		io_cqring_ev_posted(ctx);
+		__io_cqring_ev_posted(ctx, canceled);
 	return canceled != 0;
 }






* Re: [PATCH for-5.15] io_uring: fix lacking of protection for compl_nr
  2021-08-20 22:41             ` Pavel Begunkov
@ 2021-08-20 22:46               ` Jens Axboe
  2021-08-20 22:59                 ` Pavel Begunkov
  0 siblings, 1 reply; 13+ messages in thread
From: Jens Axboe @ 2021-08-20 22:46 UTC (permalink / raw)
  To: Pavel Begunkov, Hao Xu; +Cc: io-uring, Joseph Qi

On 8/20/21 4:41 PM, Pavel Begunkov wrote:
> On 8/20/21 11:30 PM, Jens Axboe wrote:
>> On 8/20/21 4:28 PM, Pavel Begunkov wrote:
>>> On 8/20/21 11:09 PM, Jens Axboe wrote:
>>>> On 8/20/21 3:32 PM, Pavel Begunkov wrote:
>>>>> On 8/20/21 9:39 PM, Hao Xu wrote:
>>>>>> On 2021/8/21 2:59 AM, Pavel Begunkov wrote:
>>>>>>> On 8/20/21 7:40 PM, Hao Xu wrote:
>>>>>>>> coml_nr in ctx_flush_and_put() is not protected by uring_lock, this
>>>>>>>> may cause problems when accessing it parallelly.
>>>>>>>
>>>>>>> Did you hit any problem? It sounds like it should be fine as is:
>>>>>>>
>>>>>>> The trick is that it's only responsible to flush requests added
>>>>>>> during execution of current call to tctx_task_work(), and those
>>>>>>> naturally synchronised with the current task. All other potentially
>>>>>>> enqueued requests will be of someone else's responsibility.
>>>>>>>
>>>>>>> So, if nobody flushed requests, we're finely in-sync. If we see
>>>>>>> 0 there, but actually enqueued a request, it means someone
>>>>>>> actually flushed it after the request had been added.
>>>>>>>
>>>>>>> Probably, needs a more formal explanation with happens-before
>>>>>>> and so.
>>>>>> I should put more detail in the commit message, the thing is:
>>>>>> say coml_nr > 0
>>>>>>
>>>>>>   ctx_flush_and put                  other context
>>>>>>    if (compl_nr)                      get mutex
>>>>>>                                       coml_nr > 0
>>>>>>                                       do flush
>>>>>>                                           coml_nr = 0
>>>>>>                                       release mutex
>>>>>>         get mutex
>>>>>>            do flush (*)
>>>>>>         release mutex
>>>>>>
>>>>>> in (*) place, we do a bunch of unnecessary works, moreover, we
>>>>>
>>>>> I wouldn't care about overhead, that shouldn't be much
>>>>>
>>>>>> call io_cqring_ev_posted() which I think we shouldn't.
>>>>>
>>>>> IMHO, users should expect spurious io_cqring_ev_posted(),
>>>>> though there were some eventfd users complaining before, so
>>>>> for them we can do
>>>>
>>>> It does sometimes cause issues, see:
>>>
>>> I'm used that locking may end up in spurious wakeups. May be
>>> different for eventfd, but considering that we do batch
>>> completions and so might be calling it only once per multiple
>>> CQEs, it shouldn't be.
>>
>> The wakeups are fine, it's the ev increment that's causing some issues.
> 
> If userspace doesn't expect that eventfd may get diverged from the
> number of posted CQEs, we need something like below. The weird part
> is that it looks nobody complained about this one, even though it
> should be happening pretty often. 

That wasn't the issue we ran into; it was more the fact that eventfd
would indicate that something had been posted when nothing had.
We don't need eventfd notifications to be == the number of posted events,
just that if the eventfd notification is incremented, there should be new
events there.
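
As a rough userspace illustration of that expectation (a sketch only: it
assumes liburing with io_uring_register_eventfd() and a plain eventfd,
with error handling omitted):

#include <liburing.h>
#include <sys/eventfd.h>
#include <stdint.h>
#include <unistd.h>

/* Reap CQEs whenever the eventfd fires. If the eventfd were incremented
 * without any new CQE being posted, a caller that treats "eventfd fired"
 * as "a completion is ready" would spin or mis-handle the empty ring. */
static void reap_on_eventfd(struct io_uring *ring, int efd)
{
	uint64_t cnt;
	struct io_uring_cqe *cqe;

	while (read(efd, &cnt, sizeof(cnt)) == sizeof(cnt)) {
		/* expect at least one CQE per notification */
		while (io_uring_peek_cqe(ring, &cqe) == 0) {
			/* consume cqe->user_data / cqe->res here */
			io_uring_cqe_seen(ring, cqe);
		}
	}
}

/* setup sketch:
 *   int efd = eventfd(0, 0);
 *   io_uring_register_eventfd(&ring, efd);
 */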

-- 
Jens Axboe



* Re: [PATCH for-5.15] io_uring: fix lacking of protection for compl_nr
  2021-08-20 22:46               ` Jens Axboe
@ 2021-08-20 22:59                 ` Pavel Begunkov
  2021-08-21  3:10                   ` Jens Axboe
  0 siblings, 1 reply; 13+ messages in thread
From: Pavel Begunkov @ 2021-08-20 22:59 UTC (permalink / raw)
  To: Jens Axboe, Hao Xu; +Cc: io-uring, Joseph Qi

On 8/20/21 11:46 PM, Jens Axboe wrote:
> On 8/20/21 4:41 PM, Pavel Begunkov wrote:
>> On 8/20/21 11:30 PM, Jens Axboe wrote:
>>> On 8/20/21 4:28 PM, Pavel Begunkov wrote:
>>>> On 8/20/21 11:09 PM, Jens Axboe wrote:
>>>>> On 8/20/21 3:32 PM, Pavel Begunkov wrote:
>>>>>> On 8/20/21 9:39 PM, Hao Xu wrote:
>>>>>>> On 2021/8/21 2:59 AM, Pavel Begunkov wrote:
>>>>>>>> On 8/20/21 7:40 PM, Hao Xu wrote:
>>>>>>>>> coml_nr in ctx_flush_and_put() is not protected by uring_lock, this
>>>>>>>>> may cause problems when accessing it parallelly.
>>>>>>>>
>>>>>>>> Did you hit any problem? It sounds like it should be fine as is:
>>>>>>>>
>>>>>>>> The trick is that it's only responsible to flush requests added
>>>>>>>> during execution of current call to tctx_task_work(), and those
>>>>>>>> naturally synchronised with the current task. All other potentially
>>>>>>>> enqueued requests will be of someone else's responsibility.
>>>>>>>>
>>>>>>>> So, if nobody flushed requests, we're finely in-sync. If we see
>>>>>>>> 0 there, but actually enqueued a request, it means someone
>>>>>>>> actually flushed it after the request had been added.
>>>>>>>>
>>>>>>>> Probably, needs a more formal explanation with happens-before
>>>>>>>> and so.
>>>>>>> I should put more detail in the commit message, the thing is:
>>>>>>> say coml_nr > 0
>>>>>>>
>>>>>>>   ctx_flush_and put                  other context
>>>>>>>    if (compl_nr)                      get mutex
>>>>>>>                                       coml_nr > 0
>>>>>>>                                       do flush
>>>>>>>                                           coml_nr = 0
>>>>>>>                                       release mutex
>>>>>>>         get mutex
>>>>>>>            do flush (*)
>>>>>>>         release mutex
>>>>>>>
>>>>>>> in (*) place, we do a bunch of unnecessary works, moreover, we
>>>>>>
>>>>>> I wouldn't care about overhead, that shouldn't be much
>>>>>>
>>>>>>> call io_cqring_ev_posted() which I think we shouldn't.
>>>>>>
>>>>>> IMHO, users should expect spurious io_cqring_ev_posted(),
>>>>>> though there were some eventfd users complaining before, so
>>>>>> for them we can do
>>>>>
>>>>> It does sometimes cause issues, see:
>>>>
>>>> I'm used that locking may end up in spurious wakeups. May be
>>>> different for eventfd, but considering that we do batch
>>>> completions and so might be calling it only once per multiple
>>>> CQEs, it shouldn't be.
>>>
>>> The wakeups are fine, it's the ev increment that's causing some issues.
>>
>> If userspace doesn't expect that eventfd may get diverged from the
>> number of posted CQEs, we need something like below. The weird part
>> is that it looks nobody complained about this one, even though it
>> should be happening pretty often. 
> 
> That wasn't the issue we ran into, it was more the fact that eventfd
> would indicate that something had been posted, when nothing had.
> We don't need eventfd notifications to be == number of posted events,
> just if the eventfd notification is inremented, there should be new
> events there.

It's just so commonly mentioned, that for me expecting spurious
events/wakeups is a default. Do we have it documented anywhere?

-- 
Pavel Begunkov


* Re: [PATCH for-5.15] io_uring: fix lacking of protection for compl_nr
  2021-08-20 22:59                 ` Pavel Begunkov
@ 2021-08-21  3:10                   ` Jens Axboe
  0 siblings, 0 replies; 13+ messages in thread
From: Jens Axboe @ 2021-08-21  3:10 UTC (permalink / raw)
  To: Pavel Begunkov, Hao Xu; +Cc: io-uring, Joseph Qi

On 8/20/21 4:59 PM, Pavel Begunkov wrote:
> On 8/20/21 11:46 PM, Jens Axboe wrote:
>> On 8/20/21 4:41 PM, Pavel Begunkov wrote:
>>> On 8/20/21 11:30 PM, Jens Axboe wrote:
>>>> On 8/20/21 4:28 PM, Pavel Begunkov wrote:
>>>>> On 8/20/21 11:09 PM, Jens Axboe wrote:
>>>>>> On 8/20/21 3:32 PM, Pavel Begunkov wrote:
>>>>>>> On 8/20/21 9:39 PM, Hao Xu wrote:
>>>>>>>> On 2021/8/21 2:59 AM, Pavel Begunkov wrote:
>>>>>>>>> On 8/20/21 7:40 PM, Hao Xu wrote:
>>>>>>>>>> coml_nr in ctx_flush_and_put() is not protected by uring_lock, this
>>>>>>>>>> may cause problems when accessing it parallelly.
>>>>>>>>>
>>>>>>>>> Did you hit any problem? It sounds like it should be fine as is:
>>>>>>>>>
>>>>>>>>> The trick is that it's only responsible to flush requests added
>>>>>>>>> during execution of current call to tctx_task_work(), and those
>>>>>>>>> naturally synchronised with the current task. All other potentially
>>>>>>>>> enqueued requests will be of someone else's responsibility.
>>>>>>>>>
>>>>>>>>> So, if nobody flushed requests, we're finely in-sync. If we see
>>>>>>>>> 0 there, but actually enqueued a request, it means someone
>>>>>>>>> actually flushed it after the request had been added.
>>>>>>>>>
>>>>>>>>> Probably, needs a more formal explanation with happens-before
>>>>>>>>> and so.
>>>>>>>> I should put more detail in the commit message, the thing is:
>>>>>>>> say coml_nr > 0
>>>>>>>>
>>>>>>>>   ctx_flush_and put                  other context
>>>>>>>>    if (compl_nr)                      get mutex
>>>>>>>>                                       coml_nr > 0
>>>>>>>>                                       do flush
>>>>>>>>                                           coml_nr = 0
>>>>>>>>                                       release mutex
>>>>>>>>         get mutex
>>>>>>>>            do flush (*)
>>>>>>>>         release mutex
>>>>>>>>
>>>>>>>> in (*) place, we do a bunch of unnecessary works, moreover, we
>>>>>>>
>>>>>>> I wouldn't care about overhead, that shouldn't be much
>>>>>>>
>>>>>>>> call io_cqring_ev_posted() which I think we shouldn't.
>>>>>>>
>>>>>>> IMHO, users should expect spurious io_cqring_ev_posted(),
>>>>>>> though there were some eventfd users complaining before, so
>>>>>>> for them we can do
>>>>>>
>>>>>> It does sometimes cause issues, see:
>>>>>
>>>>> I'm used that locking may end up in spurious wakeups. May be
>>>>> different for eventfd, but considering that we do batch
>>>>> completions and so might be calling it only once per multiple
>>>>> CQEs, it shouldn't be.
>>>>
>>>> The wakeups are fine, it's the ev increment that's causing some issues.
>>>
>>> If userspace doesn't expect that eventfd may get diverged from the
>>> number of posted CQEs, we need something like below. The weird part
>>> is that it looks nobody complained about this one, even though it
>>> should be happening pretty often. 
>>
>> That wasn't the issue we ran into, it was more the fact that eventfd
>> would indicate that something had been posted, when nothing had.
>> We don't need eventfd notifications to be == number of posted events,
>> just if the eventfd notification is inremented, there should be new
>> events there.
> 
> It's just so commonly mentioned, that for me expecting spurious
> events/wakeups is a default. Do we have it documented anywhere?

Not documented to my knowledge, and I wasn't really aware of this being
a problem until it was reported and the above-referenced commit was
done to fix it. Might be worthwhile to put a comment in ev_posted() to
detail this; I'll do that for 5.15.

-- 
Jens Axboe


