linux-block.vger.kernel.org archive mirror
* [RFC PATCH] io_uring: add a memory barrier before atomic_read
@ 2019-07-18 12:44 Zhengyuan Liu
  2019-07-18 15:41 ` Jens Axboe
  0 siblings, 1 reply; 7+ messages in thread
From: Zhengyuan Liu @ 2019-07-18 12:44 UTC (permalink / raw)
  To: axboe; +Cc: linux-block, liuyun01

There is a hang issue when using fio to do some basic tests. The issue can
be easily reproduced using the script below:

        while true
        do
                fio  --ioengine=io_uring  -rw=write -bs=4k -numjobs=1 \
                     -size=1G -iodepth=64 -name=uring   --filename=/dev/zero
        done

After several minutes, maybe more, fio blocks in
io_uring_enter->io_cqring_wait waiting for previously committed sqes to be
completed and can't return to userspace until we send a SIGTERM to fio. After
getting the SIGTERM, fio instead hangs in io_ring_ctx_wait_and_kill with a
backtrace like this:

        [54133.243816] Call Trace:
        [54133.243842]  __schedule+0x3a0/0x790
        [54133.243868]  schedule+0x38/0xa0
        [54133.243880]  schedule_timeout+0x218/0x3b0
        [54133.243891]  ? sched_clock+0x9/0x10
        [54133.243903]  ? wait_for_completion+0xa3/0x130
        [54133.243916]  ? _raw_spin_unlock_irq+0x2c/0x40
        [54133.243930]  ? trace_hardirqs_on+0x3f/0xe0
        [54133.243951]  wait_for_completion+0xab/0x130
        [54133.243962]  ? wake_up_q+0x70/0x70
        [54133.243984]  io_ring_ctx_wait_and_kill+0xa0/0x1d0
        [54133.243998]  io_uring_release+0x20/0x30
        [54133.244008]  __fput+0xcf/0x270
        [54133.244029]  ____fput+0xe/0x10
        [54133.244040]  task_work_run+0x7f/0xa0
        [54133.244056]  do_exit+0x305/0xc40
        [54133.244067]  ? get_signal+0x13b/0xbd0
        [54133.244088]  do_group_exit+0x50/0xd0
        [54133.244103]  get_signal+0x18d/0xbd0
        [54133.244112]  ? _raw_spin_unlock_irqrestore+0x36/0x60
        [54133.244142]  do_signal+0x34/0x720
        [54133.244171]  ? exit_to_usermode_loop+0x7e/0x130
        [54133.244190]  exit_to_usermode_loop+0xc0/0x130
        [54133.244209]  do_syscall_64+0x16b/0x1d0
        [54133.244221]  entry_SYSCALL_64_after_hwframe+0x49/0xbe

The reason is that we added a req to ctx->pending_async at the very end, but
it never got a chance to be processed. How could this happen?

        fio#cpu0                                        wq#cpu1

        io_add_to_prev_work                    io_sq_wq_submit_work

          atomic_read() <<< 1

                                                  atomic_dec_return() << 1->0
                                                  list_empty();    <<< true;

          list_add_tail()
          atomic_read() << 0 or 1?

As atomic_ops.rst says, atomic_read() does not guarantee that the runtime
initialization by any other thread is visible yet, so we must take care of
that with a proper implicit or explicit memory barrier.
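
With the barrier in place, the tail of io_add_to_prev_work() becomes
(a rough sketch based on the hunk below; the comment is only illustration):

	spin_lock(&list->lock);
	list_add_tail(&req->list, &list->list);
	/*
	 * A store followed by a load of a different variable needs a full
	 * barrier.  This pairs with the full ordering implied by
	 * atomic_dec_return() in io_sq_wq_submit_work(): either that side
	 * sees our req on the list, or we see its decrement of list->cnt
	 * and back out so the caller queues new work instead.
	 */
	smp_mb();
	if (!atomic_read(&list->cnt)) {
		list_del_init(&req->list);
		ret = false;
	}
	spin_unlock(&list->lock);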

This issue was detected with the help of Jackie Liu <liuyun01@kylinos.cn>.

Signed-off-by: Zhengyuan Liu <liuzhengyuan@kylinos.cn>
---
 fs/io_uring.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 56fe6e1..26e7223 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -1766,6 +1766,7 @@ static bool io_add_to_prev_work(struct async_list *list, struct io_kiocb *req)
 	ret = true;
 	spin_lock(&list->lock);
 	list_add_tail(&req->list, &list->list);
+	smp_mb();
 	if (!atomic_read(&list->cnt)) {
 		list_del_init(&req->list);
 		ret = false;
-- 
2.7.4





* Re: [RFC PATCH] io_uring: add a memory barrier before atomic_read
  2019-07-18 12:44 [RFC PATCH] io_uring: add a memory barrier before atomic_read Zhengyuan Liu
@ 2019-07-18 15:41 ` Jens Axboe
  2019-07-18 16:43   ` Jens Axboe
  0 siblings, 1 reply; 7+ messages in thread
From: Jens Axboe @ 2019-07-18 15:41 UTC (permalink / raw)
  To: Zhengyuan Liu; +Cc: linux-block, liuyun01

On 7/18/19 6:44 AM, Zhengyuan Liu wrote:
> There is a hang issue while using fio to do some basic test. The issue can
> been easily reproduced using bellow scripts:
> 
>          while true
>          do
>                  fio  --ioengine=io_uring  -rw=write -bs=4k -numjobs=1 \
>                       -size=1G -iodepth=64 -name=uring   --filename=/dev/zero
>          done
> 
> After serveral minutes, maybe more, fio would block at
> io_uring_enter->io_cqring_wait in order to waiting for previously committed
> sqes to be completed and cann't return to user anymore until we send a SIGTERM
> to fio. After got SIGTERM, fio turns to hang at io_ring_ctx_wait_and_kill with
> a backtrace like this:
> 
>          [54133.243816] Call Trace:
>          [54133.243842]  __schedule+0x3a0/0x790
>          [54133.243868]  schedule+0x38/0xa0
>          [54133.243880]  schedule_timeout+0x218/0x3b0
>          [54133.243891]  ? sched_clock+0x9/0x10
>          [54133.243903]  ? wait_for_completion+0xa3/0x130
>          [54133.243916]  ? _raw_spin_unlock_irq+0x2c/0x40
>          [54133.243930]  ? trace_hardirqs_on+0x3f/0xe0
>          [54133.243951]  wait_for_completion+0xab/0x130
>          [54133.243962]  ? wake_up_q+0x70/0x70
>          [54133.243984]  io_ring_ctx_wait_and_kill+0xa0/0x1d0
>          [54133.243998]  io_uring_release+0x20/0x30
>          [54133.244008]  __fput+0xcf/0x270
>          [54133.244029]  ____fput+0xe/0x10
>          [54133.244040]  task_work_run+0x7f/0xa0
>          [54133.244056]  do_exit+0x305/0xc40
>          [54133.244067]  ? get_signal+0x13b/0xbd0
>          [54133.244088]  do_group_exit+0x50/0xd0
>          [54133.244103]  get_signal+0x18d/0xbd0
>          [54133.244112]  ? _raw_spin_unlock_irqrestore+0x36/0x60
>          [54133.244142]  do_signal+0x34/0x720
>          [54133.244171]  ? exit_to_usermode_loop+0x7e/0x130
>          [54133.244190]  exit_to_usermode_loop+0xc0/0x130
>          [54133.244209]  do_syscall_64+0x16b/0x1d0
>          [54133.244221]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> 
> The reason is that we had added a req to ctx->pending_async at the very end, but
> it got no chance to be processed anymore. How could this be happened?
> 
>          fio#cpu0                                        wq#cpu1
> 
>          io_add_to_prev_work                    io_sq_wq_submit_work
> 
>            atomic_read() <<< 1
> 
>                                                    atomic_dec_return() << 1->0
>                                                    list_empty();    <<< true;
> 
>            list_add_tail()
>            atomic_read() << 0 or 1?
> 
> As was said in atomic_ops.rst, atomic_read does not guarantee that the runtime
> initialization by any other thread is visible yet, so we must take care of that
> with a proper implicit or explicit memory barrier;

Thanks for looking at this and finding this issue, it does look like a problem.
But I'm not sure about the fix. Don't we just need an smp_mb__after_atomic()
on the atomic_dec_return() side of things? Like the below.


diff --git a/fs/io_uring.c b/fs/io_uring.c
index 5ec06e5ba0be..3c2a6f88a6b0 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -1881,6 +1881,7 @@ static void io_sq_wq_submit_work(struct work_struct *work)
 	 */
 	if (async_list) {
 		ret = atomic_dec_return(&async_list->cnt);
+		smp_mb__after_atomic();
 		while (!ret && !list_empty(&async_list->list)) {
 			spin_lock(&async_list->lock);
 			atomic_inc(&async_list->cnt);
@@ -1894,6 +1895,7 @@ static void io_sq_wq_submit_work(struct work_struct *work)
 				goto restart;
 			}
 			ret = atomic_dec_return(&async_list->cnt);
+			smp_mb__after_atomic();
 		}
 	}
 
-- 
Jens Axboe



* Re: [RFC PATCH] io_uring: add a memory barrier before atomic_read
  2019-07-18 15:41 ` Jens Axboe
@ 2019-07-18 16:43   ` Jens Axboe
  2019-07-19  0:44     ` Jackie Liu
       [not found]     ` <5d3114d7.1c69fb81.fc097.122eSMTPIN_ADDED_BROKEN@mx.google.com>
  0 siblings, 2 replies; 7+ messages in thread
From: Jens Axboe @ 2019-07-18 16:43 UTC (permalink / raw)
  To: Zhengyuan Liu; +Cc: linux-block, liuyun01

On 7/18/19 9:41 AM, Jens Axboe wrote:
> On 7/18/19 6:44 AM, Zhengyuan Liu wrote:
>> There is a hang issue while using fio to do some basic test. The issue can
>> been easily reproduced using bellow scripts:
>>
>>           while true
>>           do
>>                   fio  --ioengine=io_uring  -rw=write -bs=4k -numjobs=1 \
>>                        -size=1G -iodepth=64 -name=uring   --filename=/dev/zero
>>           done
>>
>> After serveral minutes, maybe more, fio would block at
>> io_uring_enter->io_cqring_wait in order to waiting for previously committed
>> sqes to be completed and cann't return to user anymore until we send a SIGTERM
>> to fio. After got SIGTERM, fio turns to hang at io_ring_ctx_wait_and_kill with
>> a backtrace like this:
>>
>>           [54133.243816] Call Trace:
>>           [54133.243842]  __schedule+0x3a0/0x790
>>           [54133.243868]  schedule+0x38/0xa0
>>           [54133.243880]  schedule_timeout+0x218/0x3b0
>>           [54133.243891]  ? sched_clock+0x9/0x10
>>           [54133.243903]  ? wait_for_completion+0xa3/0x130
>>           [54133.243916]  ? _raw_spin_unlock_irq+0x2c/0x40
>>           [54133.243930]  ? trace_hardirqs_on+0x3f/0xe0
>>           [54133.243951]  wait_for_completion+0xab/0x130
>>           [54133.243962]  ? wake_up_q+0x70/0x70
>>           [54133.243984]  io_ring_ctx_wait_and_kill+0xa0/0x1d0
>>           [54133.243998]  io_uring_release+0x20/0x30
>>           [54133.244008]  __fput+0xcf/0x270
>>           [54133.244029]  ____fput+0xe/0x10
>>           [54133.244040]  task_work_run+0x7f/0xa0
>>           [54133.244056]  do_exit+0x305/0xc40
>>           [54133.244067]  ? get_signal+0x13b/0xbd0
>>           [54133.244088]  do_group_exit+0x50/0xd0
>>           [54133.244103]  get_signal+0x18d/0xbd0
>>           [54133.244112]  ? _raw_spin_unlock_irqrestore+0x36/0x60
>>           [54133.244142]  do_signal+0x34/0x720
>>           [54133.244171]  ? exit_to_usermode_loop+0x7e/0x130
>>           [54133.244190]  exit_to_usermode_loop+0xc0/0x130
>>           [54133.244209]  do_syscall_64+0x16b/0x1d0
>>           [54133.244221]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
>>
>> The reason is that we had added a req to ctx->pending_async at the very end, but
>> it got no chance to be processed anymore. How could this be happened?
>>
>>           fio#cpu0                                        wq#cpu1
>>
>>           io_add_to_prev_work                    io_sq_wq_submit_work
>>
>>             atomic_read() <<< 1
>>
>>                                                     atomic_dec_return() << 1->0
>>                                                     list_empty();    <<< true;
>>
>>             list_add_tail()
>>             atomic_read() << 0 or 1?
>>
>> As was said in atomic_ops.rst, atomic_read does not guarantee that the runtime
>> initialization by any other thread is visible yet, so we must take care of that
>> with a proper implicit or explicit memory barrier;
> 
> Thanks for looking at this and finding this issue, it does looks like a problem.
> But I'm not sure about the fix. Shouldn't we just need an smp_mb__after_atomic()
> on the atomic_dec_return() side of things? Like the below.
> 
> 
> diff --git a/fs/io_uring.c b/fs/io_uring.c
> index 5ec06e5ba0be..3c2a6f88a6b0 100644
> --- a/fs/io_uring.c
> +++ b/fs/io_uring.c
> @@ -1881,6 +1881,7 @@ static void io_sq_wq_submit_work(struct work_struct *work)
>   	 */
>   	if (async_list) {
>   		ret = atomic_dec_return(&async_list->cnt);
> +		smp_mb__after_atomic();
>   		while (!ret && !list_empty(&async_list->list)) {
>   			spin_lock(&async_list->lock);
>   			atomic_inc(&async_list->cnt);
> @@ -1894,6 +1895,7 @@ static void io_sq_wq_submit_work(struct work_struct *work)
>   				goto restart;
>   			}
>   			ret = atomic_dec_return(&async_list->cnt);
> +			smp_mb__after_atomic();
>   		}
>   	}
>   
> 

I don't think this is enough; I actually think your fix is the most
appropriate. I will apply it, thank you!

-- 
Jens Axboe



* Re: [RFC PATCH] io_uring: add a memory barrier before atomic_read
  2019-07-18 16:43   ` Jens Axboe
@ 2019-07-19  0:44     ` Jackie Liu
       [not found]     ` <5d3114d7.1c69fb81.fc097.122eSMTPIN_ADDED_BROKEN@mx.google.com>
  1 sibling, 0 replies; 7+ messages in thread
From: Jackie Liu @ 2019-07-19  0:44 UTC (permalink / raw)
  To: Jens Axboe; +Cc: 刘正元, linux-block



> 在 2019年7月19日,00:43,Jens Axboe <axboe@kernel.dk> 写道:
> 
> On 7/18/19 9:41 AM, Jens Axboe wrote:
>> On 7/18/19 6:44 AM, Zhengyuan Liu wrote:
>>> There is a hang issue while using fio to do some basic test. The issue can
>>> been easily reproduced using bellow scripts:
>>> 
>>>          while true
>>>          do
>>>                  fio  --ioengine=io_uring  -rw=write -bs=4k -numjobs=1 \
>>>                       -size=1G -iodepth=64 -name=uring   --filename=/dev/zero
>>>          done
>>> 
>>> After serveral minutes, maybe more, fio would block at
>>> io_uring_enter->io_cqring_wait in order to waiting for previously committed
>>> sqes to be completed and cann't return to user anymore until we send a SIGTERM
>>> to fio. After got SIGTERM, fio turns to hang at io_ring_ctx_wait_and_kill with
>>> a backtrace like this:
>>> 
>>>          [54133.243816] Call Trace:
>>>          [54133.243842]  __schedule+0x3a0/0x790
>>>          [54133.243868]  schedule+0x38/0xa0
>>>          [54133.243880]  schedule_timeout+0x218/0x3b0
>>>          [54133.243891]  ? sched_clock+0x9/0x10
>>>          [54133.243903]  ? wait_for_completion+0xa3/0x130
>>>          [54133.243916]  ? _raw_spin_unlock_irq+0x2c/0x40
>>>          [54133.243930]  ? trace_hardirqs_on+0x3f/0xe0
>>>          [54133.243951]  wait_for_completion+0xab/0x130
>>>          [54133.243962]  ? wake_up_q+0x70/0x70
>>>          [54133.243984]  io_ring_ctx_wait_and_kill+0xa0/0x1d0
>>>          [54133.243998]  io_uring_release+0x20/0x30
>>>          [54133.244008]  __fput+0xcf/0x270
>>>          [54133.244029]  ____fput+0xe/0x10
>>>          [54133.244040]  task_work_run+0x7f/0xa0
>>>          [54133.244056]  do_exit+0x305/0xc40
>>>          [54133.244067]  ? get_signal+0x13b/0xbd0
>>>          [54133.244088]  do_group_exit+0x50/0xd0
>>>          [54133.244103]  get_signal+0x18d/0xbd0
>>>          [54133.244112]  ? _raw_spin_unlock_irqrestore+0x36/0x60
>>>          [54133.244142]  do_signal+0x34/0x720
>>>          [54133.244171]  ? exit_to_usermode_loop+0x7e/0x130
>>>          [54133.244190]  exit_to_usermode_loop+0xc0/0x130
>>>          [54133.244209]  do_syscall_64+0x16b/0x1d0
>>>          [54133.244221]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
>>> 
>>> The reason is that we had added a req to ctx->pending_async at the very end, but
>>> it got no chance to be processed anymore. How could this be happened?
>>> 
>>>          fio#cpu0                                        wq#cpu1
>>> 
>>>          io_add_to_prev_work                    io_sq_wq_submit_work
>>> 
>>>            atomic_read() <<< 1
>>> 
>>>                                                    atomic_dec_return() << 1->0
>>>                                                    list_empty();    <<< true;
>>> 
>>>            list_add_tail()
>>>            atomic_read() << 0 or 1?
>>> 
>>> As was said in atomic_ops.rst, atomic_read does not guarantee that the runtime
>>> initialization by any other thread is visible yet, so we must take care of that
>>> with a proper implicit or explicit memory barrier;
>> 
>> Thanks for looking at this and finding this issue, it does looks like a problem.
>> But I'm not sure about the fix. Shouldn't we just need an smp_mb__after_atomic()
>> on the atomic_dec_return() side of things? Like the below.
>> 
>> 
>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>> index 5ec06e5ba0be..3c2a6f88a6b0 100644
>> --- a/fs/io_uring.c
>> +++ b/fs/io_uring.c
>> @@ -1881,6 +1881,7 @@ static void io_sq_wq_submit_work(struct work_struct *work)
>>  	 */
>>  	if (async_list) {
>>  		ret = atomic_dec_return(&async_list->cnt);
>> +		smp_mb__after_atomic();
>>  		while (!ret && !list_empty(&async_list->list)) {
>>  			spin_lock(&async_list->lock);
>>  			atomic_inc(&async_list->cnt);
>> @@ -1894,6 +1895,7 @@ static void io_sq_wq_submit_work(struct work_struct *work)
>>  				goto restart;
>>  			}
>>  			ret = atomic_dec_return(&async_list->cnt);
>> +			smp_mb__after_atomic();
>>  		}
>>  	}
>> 
>> 
> 
> I don't think this is enough, I actually think your fix is the most
> appropriate. I will apply it, thank you!
> 

Actually, although we can pass the test using smp_mb(), in the end we still do
not understand where the race condition is; could you explain it? If it is as
Zhengyuan said, because of atomic_read, I would think smp_rmb alone should be
enough, but it is not: smp_rmb can't help us pass the test. At the same time,
we tried smp_wmb, which failed too. It seems that only smp_mb works correctly.
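
Reduced to the accesses involved, what we are trying to order is (sketch):

	/* submit side: io_add_to_prev_work() */
	list_add_tail(&req->list, &list->list);		/* store to the list   */
	/* barrier under test went here: smp_rmb()/smp_wmb()/smp_mb() */
	if (!atomic_read(&list->cnt))			/* load of the counter */
		...

	/* workqueue side: io_sq_wq_submit_work() */
	ret = atomic_dec_return(&list->cnt);		/* fully ordered RMW   */
	while (!ret && !list_empty(&list->list))	/* load of the list    */
		...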

Is it because list_add_tail requires smp_wmb and atomic_read requires smp_rmb? 

--
Jackie Liu






* Re: [RFC PATCH] io_uring: add a memory barrier before atomic_read
       [not found]     ` <5d3114d7.1c69fb81.fc097.122eSMTPIN_ADDED_BROKEN@mx.google.com>
@ 2019-07-19 14:51       ` Jens Axboe
  2019-07-20 15:14         ` Zhengyuan Liu
  0 siblings, 1 reply; 7+ messages in thread
From: Jens Axboe @ 2019-07-19 14:51 UTC (permalink / raw)
  To: Zhengyuan Liu, source; +Cc: linux-block, liuyun01

On 7/18/19 6:54 PM, Zhengyuan Liu wrote:
> 
> On 7/19/19 12:43 AM, Jens Axboe wrote:
>> On 7/18/19 9:41 AM, Jens Axboe wrote:
>>> On 7/18/19 6:44 AM, Zhengyuan Liu wrote:
>>>> There is a hang issue while using fio to do some basic test. The issue can
>>>> been easily reproduced using bellow scripts:
>>>>
>>>>             while true
>>>>             do
>>>>                     fio  --ioengine=io_uring  -rw=write -bs=4k -numjobs=1 \
>>>>                          -size=1G -iodepth=64 -name=uring   --filename=/dev/zero
>>>>             done
>>>>
>>>> After serveral minutes, maybe more, fio would block at
>>>> io_uring_enter->io_cqring_wait in order to waiting for previously committed
>>>> sqes to be completed and cann't return to user anymore until we send a SIGTERM
>>>> to fio. After got SIGTERM, fio turns to hang at io_ring_ctx_wait_and_kill with
>>>> a backtrace like this:
>>>>
>>>>             [54133.243816] Call Trace:
>>>>             [54133.243842]  __schedule+0x3a0/0x790
>>>>             [54133.243868]  schedule+0x38/0xa0
>>>>             [54133.243880]  schedule_timeout+0x218/0x3b0
>>>>             [54133.243891]  ? sched_clock+0x9/0x10
>>>>             [54133.243903]  ? wait_for_completion+0xa3/0x130
>>>>             [54133.243916]  ? _raw_spin_unlock_irq+0x2c/0x40
>>>>             [54133.243930]  ? trace_hardirqs_on+0x3f/0xe0
>>>>             [54133.243951]  wait_for_completion+0xab/0x130
>>>>             [54133.243962]  ? wake_up_q+0x70/0x70
>>>>             [54133.243984]  io_ring_ctx_wait_and_kill+0xa0/0x1d0
>>>>             [54133.243998]  io_uring_release+0x20/0x30
>>>>             [54133.244008]  __fput+0xcf/0x270
>>>>             [54133.244029]  ____fput+0xe/0x10
>>>>             [54133.244040]  task_work_run+0x7f/0xa0
>>>>             [54133.244056]  do_exit+0x305/0xc40
>>>>             [54133.244067]  ? get_signal+0x13b/0xbd0
>>>>             [54133.244088]  do_group_exit+0x50/0xd0
>>>>             [54133.244103]  get_signal+0x18d/0xbd0
>>>>             [54133.244112]  ? _raw_spin_unlock_irqrestore+0x36/0x60
>>>>             [54133.244142]  do_signal+0x34/0x720
>>>>             [54133.244171]  ? exit_to_usermode_loop+0x7e/0x130
>>>>             [54133.244190]  exit_to_usermode_loop+0xc0/0x130
>>>>             [54133.244209]  do_syscall_64+0x16b/0x1d0
>>>>             [54133.244221]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
>>>>
>>>> The reason is that we had added a req to ctx->pending_async at the very end, but
>>>> it got no chance to be processed anymore. How could this be happened?
>>>>
>>>>             fio#cpu0                                        wq#cpu1
>>>>
>>>>             io_add_to_prev_work                    io_sq_wq_submit_work
>>>>
>>>>               atomic_read() <<< 1
>>>>
>>>>                                                       atomic_dec_return() << 1->0
>>>>                                                       list_empty();    <<< true;
>>>>
>>>>               list_add_tail()
>>>>               atomic_read() << 0 or 1?
>>>>
>>>> As was said in atomic_ops.rst, atomic_read does not guarantee that the runtime
>>>> initialization by any other thread is visible yet, so we must take care of that
>>>> with a proper implicit or explicit memory barrier;
>>> Thanks for looking at this and finding this issue, it does looks like a problem.
>>> But I'm not sure about the fix. Shouldn't we just need an smp_mb__after_atomic()
>>> on the atomic_dec_return() side of things? Like the below.
>>>
>>>
>>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>>> index 5ec06e5ba0be..3c2a6f88a6b0 100644
>>> --- a/fs/io_uring.c
>>> +++ b/fs/io_uring.c
>>> @@ -1881,6 +1881,7 @@ static void io_sq_wq_submit_work(struct work_struct *work)
>>>     	 */
>>>     	if (async_list) {
>>>     		ret = atomic_dec_return(&async_list->cnt);
>>> +		smp_mb__after_atomic();
>>>     		while (!ret && !list_empty(&async_list->list)) {
>>>     			spin_lock(&async_list->lock);
>>>     			atomic_inc(&async_list->cnt);
>>> @@ -1894,6 +1895,7 @@ static void io_sq_wq_submit_work(struct work_struct *work)
>>>     				goto restart;
>>>     			}
>>>     			ret = atomic_dec_return(&async_list->cnt);
>>> +			smp_mb__after_atomic();
>>>     		}
>>>     	}
>>>     
>>>
>> I don't think this is enough, I actually think your fix is the most
>> appropriate. I will apply it, thank you!
>>
> 
> Hi, Jens.
> I have tested your fix and the issue still exists. Actually, the
> implementation of atomic_dec_return is already implicitly surrounded by
> mb(), and as far as I know smp_mb__after/before_atomic are only needed for
> atomic_t operations which do not return a value.

We aren't guaranteed to see the atomic_dec_return() update if it happens
at the same time. So we can either force ordering with the smp_mb(), or
we can do something along these lines:

	if (!atomic_sub_return(0, &list->cnt)) {
		...

in io_add_to_prev_work() to achieve the same sort of effect. That should
work as well.
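
In io_add_to_prev_work() that would look something like the below (a rough
sketch against the context of the original hunk):

	spin_lock(&list->lock);
	list_add_tail(&req->list, &list->list);
	/*
	 * atomic_sub_return() is a value-returning atomic RMW and is
	 * therefore fully ordered, so subtracting 0 gives an ordered read
	 * of list->cnt in place of the smp_mb() + atomic_read() pair.
	 */
	if (!atomic_sub_return(0, &list->cnt)) {
		list_del_init(&req->list);
		ret = false;
	}
	spin_unlock(&list->lock);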

-- 
Jens Axboe



* Re: [RFC PATCH] io_uring: add a memory barrier before atomic_read
  2019-07-19 14:51       ` Jens Axboe
@ 2019-07-20 15:14         ` Zhengyuan Liu
  2019-07-20 15:19           ` Jens Axboe
  0 siblings, 1 reply; 7+ messages in thread
From: Zhengyuan Liu @ 2019-07-20 15:14 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Zhengyuan Liu, source, linux-block, 刘云

On Sat, Jul 20, 2019 at 2:27 AM Jens Axboe <axboe@kernel.dk> wrote:
>
> On 7/18/19 6:54 PM, Zhengyuan Liu wrote:
> >
> > On 7/19/19 12:43 AM, Jens Axboe wrote:
> >> On 7/18/19 9:41 AM, Jens Axboe wrote:
> >>> On 7/18/19 6:44 AM, Zhengyuan Liu wrote:
> >>>> There is a hang issue while using fio to do some basic test. The issue can
> >>>> been easily reproduced using bellow scripts:
> >>>>
> >>>>             while true
> >>>>             do
> >>>>                     fio  --ioengine=io_uring  -rw=write -bs=4k -numjobs=1 \
> >>>>                          -size=1G -iodepth=64 -name=uring   --filename=/dev/zero
> >>>>             done
> >>>>
> >>>> After serveral minutes, maybe more, fio would block at
> >>>> io_uring_enter->io_cqring_wait in order to waiting for previously committed
> >>>> sqes to be completed and cann't return to user anymore until we send a SIGTERM
> >>>> to fio. After got SIGTERM, fio turns to hang at io_ring_ctx_wait_and_kill with
> >>>> a backtrace like this:
> >>>>
> >>>>             [54133.243816] Call Trace:
> >>>>             [54133.243842]  __schedule+0x3a0/0x790
> >>>>             [54133.243868]  schedule+0x38/0xa0
> >>>>             [54133.243880]  schedule_timeout+0x218/0x3b0
> >>>>             [54133.243891]  ? sched_clock+0x9/0x10
> >>>>             [54133.243903]  ? wait_for_completion+0xa3/0x130
> >>>>             [54133.243916]  ? _raw_spin_unlock_irq+0x2c/0x40
> >>>>             [54133.243930]  ? trace_hardirqs_on+0x3f/0xe0
> >>>>             [54133.243951]  wait_for_completion+0xab/0x130
> >>>>             [54133.243962]  ? wake_up_q+0x70/0x70
> >>>>             [54133.243984]  io_ring_ctx_wait_and_kill+0xa0/0x1d0
> >>>>             [54133.243998]  io_uring_release+0x20/0x30
> >>>>             [54133.244008]  __fput+0xcf/0x270
> >>>>             [54133.244029]  ____fput+0xe/0x10
> >>>>             [54133.244040]  task_work_run+0x7f/0xa0
> >>>>             [54133.244056]  do_exit+0x305/0xc40
> >>>>             [54133.244067]  ? get_signal+0x13b/0xbd0
> >>>>             [54133.244088]  do_group_exit+0x50/0xd0
> >>>>             [54133.244103]  get_signal+0x18d/0xbd0
> >>>>             [54133.244112]  ? _raw_spin_unlock_irqrestore+0x36/0x60
> >>>>             [54133.244142]  do_signal+0x34/0x720
> >>>>             [54133.244171]  ? exit_to_usermode_loop+0x7e/0x130
> >>>>             [54133.244190]  exit_to_usermode_loop+0xc0/0x130
> >>>>             [54133.244209]  do_syscall_64+0x16b/0x1d0
> >>>>             [54133.244221]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> >>>>
> >>>> The reason is that we had added a req to ctx->pending_async at the very end, but
> >>>> it got no chance to be processed anymore. How could this be happened?
> >>>>
> >>>>             fio#cpu0                                        wq#cpu1
> >>>>
> >>>>             io_add_to_prev_work                    io_sq_wq_submit_work
> >>>>
> >>>>               atomic_read() <<< 1
> >>>>
> >>>>                                                       atomic_dec_return() << 1->0
> >>>>                                                       list_empty();    <<< true;
> >>>>
> >>>>               list_add_tail()
> >>>>               atomic_read() << 0 or 1?
> >>>>
> >>>> As was said in atomic_ops.rst, atomic_read does not guarantee that the runtime
> >>>> initialization by any other thread is visible yet, so we must take care of that
> >>>> with a proper implicit or explicit memory barrier;
> >>> Thanks for looking at this and finding this issue, it does looks like a problem.
> >>> But I'm not sure about the fix. Shouldn't we just need an smp_mb__after_atomic()
> >>> on the atomic_dec_return() side of things? Like the below.
> >>>
> >>>
> >>> diff --git a/fs/io_uring.c b/fs/io_uring.c
> >>> index 5ec06e5ba0be..3c2a6f88a6b0 100644
> >>> --- a/fs/io_uring.c
> >>> +++ b/fs/io_uring.c
> >>> @@ -1881,6 +1881,7 @@ static void io_sq_wq_submit_work(struct work_struct *work)
> >>>              */
> >>>             if (async_list) {
> >>>                     ret = atomic_dec_return(&async_list->cnt);
> >>> +           smp_mb__after_atomic();
> >>>                     while (!ret && !list_empty(&async_list->list)) {
> >>>                             spin_lock(&async_list->lock);
> >>>                             atomic_inc(&async_list->cnt);
> >>> @@ -1894,6 +1895,7 @@ static void io_sq_wq_submit_work(struct work_struct *work)
> >>>                                     goto restart;
> >>>                             }
> >>>                             ret = atomic_dec_return(&async_list->cnt);
> >>> +                   smp_mb__after_atomic();
> >>>                     }
> >>>             }
> >>>
> >>>
> >> I don't think this is enough, I actually think your fix is the most
> >> appropriate. I will apply it, thank you!
> >>
> >
> > Hi, Jens.
> > I have tested you fix and the issue still existed. Actually the
> > implementation of atomic_dec_return has been implicitly surrounded
> > already by mb()  and as I know, smp_mb__after/before_atomic are not
> > suitable for atomic_t operation which does not return a value.
>
> We aren't guaranteed to see the atomic_dec_return() update if it happens
> at the same time. So we can either force ordering with the smp_mb(), or
> we can do something ala:
>
>         if (!atomic_sub_return(0, &list->cnt)) {
>                 ...
>
> io_add_to_prev_work() to achieve the same sort of effect. That should
> work as well.

Yeah, but I'd prefer smp_mb(), since atomic_sub_return(0, &list->cnt) isn't
as clear.

Thanks.
>
> --
> Jens Axboe
>


* Re: [RFC PATCH] io_uring: add a memory barrier before atomic_read
  2019-07-20 15:14         ` Zhengyuan Liu
@ 2019-07-20 15:19           ` Jens Axboe
  0 siblings, 0 replies; 7+ messages in thread
From: Jens Axboe @ 2019-07-20 15:19 UTC (permalink / raw)
  To: Zhengyuan Liu; +Cc: Zhengyuan Liu, source, linux-block, 刘云

On 7/20/19 9:14 AM, Zhengyuan Liu wrote:
> On Sat, Jul 20, 2019 at 2:27 AM Jens Axboe <axboe@kernel.dk> wrote:
>>
>> On 7/18/19 6:54 PM, Zhengyuan Liu wrote:
>>>
>>> On 7/19/19 12:43 AM, Jens Axboe wrote:
>>>> On 7/18/19 9:41 AM, Jens Axboe wrote:
>>>>> On 7/18/19 6:44 AM, Zhengyuan Liu wrote:
>>>>>> There is a hang issue while using fio to do some basic test. The issue can
>>>>>> been easily reproduced using bellow scripts:
>>>>>>
>>>>>>              while true
>>>>>>              do
>>>>>>                      fio  --ioengine=io_uring  -rw=write -bs=4k -numjobs=1 \
>>>>>>                           -size=1G -iodepth=64 -name=uring   --filename=/dev/zero
>>>>>>              done
>>>>>>
>>>>>> After serveral minutes, maybe more, fio would block at
>>>>>> io_uring_enter->io_cqring_wait in order to waiting for previously committed
>>>>>> sqes to be completed and cann't return to user anymore until we send a SIGTERM
>>>>>> to fio. After got SIGTERM, fio turns to hang at io_ring_ctx_wait_and_kill with
>>>>>> a backtrace like this:
>>>>>>
>>>>>>              [54133.243816] Call Trace:
>>>>>>              [54133.243842]  __schedule+0x3a0/0x790
>>>>>>              [54133.243868]  schedule+0x38/0xa0
>>>>>>              [54133.243880]  schedule_timeout+0x218/0x3b0
>>>>>>              [54133.243891]  ? sched_clock+0x9/0x10
>>>>>>              [54133.243903]  ? wait_for_completion+0xa3/0x130
>>>>>>              [54133.243916]  ? _raw_spin_unlock_irq+0x2c/0x40
>>>>>>              [54133.243930]  ? trace_hardirqs_on+0x3f/0xe0
>>>>>>              [54133.243951]  wait_for_completion+0xab/0x130
>>>>>>              [54133.243962]  ? wake_up_q+0x70/0x70
>>>>>>              [54133.243984]  io_ring_ctx_wait_and_kill+0xa0/0x1d0
>>>>>>              [54133.243998]  io_uring_release+0x20/0x30
>>>>>>              [54133.244008]  __fput+0xcf/0x270
>>>>>>              [54133.244029]  ____fput+0xe/0x10
>>>>>>              [54133.244040]  task_work_run+0x7f/0xa0
>>>>>>              [54133.244056]  do_exit+0x305/0xc40
>>>>>>              [54133.244067]  ? get_signal+0x13b/0xbd0
>>>>>>              [54133.244088]  do_group_exit+0x50/0xd0
>>>>>>              [54133.244103]  get_signal+0x18d/0xbd0
>>>>>>              [54133.244112]  ? _raw_spin_unlock_irqrestore+0x36/0x60
>>>>>>              [54133.244142]  do_signal+0x34/0x720
>>>>>>              [54133.244171]  ? exit_to_usermode_loop+0x7e/0x130
>>>>>>              [54133.244190]  exit_to_usermode_loop+0xc0/0x130
>>>>>>              [54133.244209]  do_syscall_64+0x16b/0x1d0
>>>>>>              [54133.244221]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
>>>>>>
>>>>>> The reason is that we had added a req to ctx->pending_async at the very end, but
>>>>>> it got no chance to be processed anymore. How could this be happened?
>>>>>>
>>>>>>              fio#cpu0                                        wq#cpu1
>>>>>>
>>>>>>              io_add_to_prev_work                    io_sq_wq_submit_work
>>>>>>
>>>>>>                atomic_read() <<< 1
>>>>>>
>>>>>>                                                        atomic_dec_return() << 1->0
>>>>>>                                                        list_empty();    <<< true;
>>>>>>
>>>>>>                list_add_tail()
>>>>>>                atomic_read() << 0 or 1?
>>>>>>
>>>>>> As was said in atomic_ops.rst, atomic_read does not guarantee that the runtime
>>>>>> initialization by any other thread is visible yet, so we must take care of that
>>>>>> with a proper implicit or explicit memory barrier;
>>>>> Thanks for looking at this and finding this issue, it does looks like a problem.
>>>>> But I'm not sure about the fix. Shouldn't we just need an smp_mb__after_atomic()
>>>>> on the atomic_dec_return() side of things? Like the below.
>>>>>
>>>>>
>>>>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>>>>> index 5ec06e5ba0be..3c2a6f88a6b0 100644
>>>>> --- a/fs/io_uring.c
>>>>> +++ b/fs/io_uring.c
>>>>> @@ -1881,6 +1881,7 @@ static void io_sq_wq_submit_work(struct work_struct *work)
>>>>>               */
>>>>>              if (async_list) {
>>>>>                      ret = atomic_dec_return(&async_list->cnt);
>>>>> +           smp_mb__after_atomic();
>>>>>                      while (!ret && !list_empty(&async_list->list)) {
>>>>>                              spin_lock(&async_list->lock);
>>>>>                              atomic_inc(&async_list->cnt);
>>>>> @@ -1894,6 +1895,7 @@ static void io_sq_wq_submit_work(struct work_struct *work)
>>>>>                                      goto restart;
>>>>>                              }
>>>>>                              ret = atomic_dec_return(&async_list->cnt);
>>>>> +                   smp_mb__after_atomic();
>>>>>                      }
>>>>>              }
>>>>>
>>>>>
>>>> I don't think this is enough, I actually think your fix is the most
>>>> appropriate. I will apply it, thank you!
>>>>
>>>
>>> Hi, Jens.
>>> I have tested you fix and the issue still existed. Actually the
>>> implementation of atomic_dec_return has been implicitly surrounded
>>> already by mb()  and as I know, smp_mb__after/before_atomic are not
>>> suitable for atomic_t operation which does not return a value.
>>
>> We aren't guaranteed to see the atomic_dec_return() update if it happens
>> at the same time. So we can either force ordering with the smp_mb(), or
>> we can do something ala:
>>
>>          if (!atomic_sub_return(0, &list->cnt)) {
>>                  ...
>>
>> io_add_to_prev_work() to achieve the same sort of effect. That should
>> work as well.
> 
> Yeah,  but I'd prefer smp_mb(), since atomic_sub_return(0, &list->cnt) isn't
> such clear.

In some ways I actually think it's more clear, as it makes it explicit
what we're synchronizing with, and it's then up to the atomic primitives
to use the right barrier. But I'm fine with the smp_mb() and that's what
I already queued up, so let's just stick with that.

-- 
Jens Axboe



Thread overview: 7+ messages
2019-07-18 12:44 [RFC PATCH] io_uring: add a memory barrier before atomic_read Zhengyuan Liu
2019-07-18 15:41 ` Jens Axboe
2019-07-18 16:43   ` Jens Axboe
2019-07-19  0:44     ` Jackie Liu
     [not found]     ` <5d3114d7.1c69fb81.fc097.122eSMTPIN_ADDED_BROKEN@mx.google.com>
2019-07-19 14:51       ` Jens Axboe
2019-07-20 15:14         ` Zhengyuan Liu
2019-07-20 15:19           ` Jens Axboe
