linux-kernel.vger.kernel.org archive mirror
* [PATCH] io_uring: add a schedule condition in io_submit_sqes
@ 2022-05-21 14:33 Guo Xuenan
  2022-05-22  2:42 ` Jens Axboe
  0 siblings, 1 reply; 5+ messages in thread
From: Guo Xuenan @ 2022-05-21 14:33 UTC (permalink / raw)
  To: axboe, asml.silence, io-uring
  Cc: houtao1, yi.zhang, chengzhihao1, guoxuenan, linux-kernel

When the SQ ring is set up with IORING_MAX_ENTRIES entries, io_submit_sqes()
may loop ~32768 times, which can trigger soft lockups. Add a need_resched()
check to avoid this.

Setting the SQ ring size to 32768 and running a stress test through
io_sq_thread produces the following:
watchdog: BUG: soft lockup - CPU#2 stuck for 26s! [iou-sqp-600:601]
Kernel panic - not syncing: softlockup: hung tasks
CPU: 2 PID: 601 Comm: iou-sqp-600 Tainted: G L 5.18.0-rc7+ #3
Hardware name: linux,dummy-virt (DT)
Call trace:
 dump_backtrace+0x218/0x228
 show_stack+0x20/0x68
 dump_stack_lvl+0x68/0x84
 dump_stack+0x1c/0x38
 panic+0x1ec/0x3ec
 watchdog_timer_fn+0x28c/0x300
 __hrtimer_run_queues+0x1d8/0x498
 hrtimer_interrupt+0x238/0x558
 arch_timer_handler_virt+0x48/0x60
 handle_percpu_devid_irq+0xdc/0x270
 generic_handle_domain_irq+0x50/0x70
 gic_handle_irq+0x8c/0x4bc
 call_on_irq_stack+0x2c/0x38
 do_interrupt_handler+0xc4/0xc8
 el1_interrupt+0x48/0xb0
 el1h_64_irq_handler+0x18/0x28
 el1h_64_irq+0x74/0x78
 console_unlock+0x5d0/0x908
 vprintk_emit+0x21c/0x470
 vprintk_default+0x40/0x50
 vprintk+0xd0/0x128
 _printk+0xb4/0xe8
 io_issue_sqe+0x1784/0x2908
 io_submit_sqes+0x538/0x2880
 io_sq_thread+0x328/0x7b0
 ret_from_fork+0x10/0x20
SMP: stopping secondary CPUs
Kernel Offset: 0x40f1e8600000 from 0xffff800008000000
PHYS_OFFSET: 0xfffffa8c80000000
CPU features: 0x110,0000cf09,00001006
Memory Limit: none
---[ end Kernel panic - not syncing: softlockup: hung tasks ]---

Signed-off-by: Guo Xuenan <guoxuenan@huawei.com>
---
 fs/io_uring.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 92ac50f139cd..d897c6798f00 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -7864,7 +7864,7 @@ static int io_submit_sqes(struct io_ring_ctx *ctx, unsigned int nr)
 			if (!(ctx->flags & IORING_SETUP_SUBMIT_ALL))
 				break;
 		}
-	} while (submitted < nr);
+	} while (submitted < nr && !need_resched());
 
 	if (unlikely(submitted != nr)) {
 		int ref_used = (submitted == -EAGAIN) ? 0 : submitted;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH] io_uring: add a schedule condition in io_submit_sqes
  2022-05-21 14:33 [PATCH] io_uring: add a schedule condition in io_submit_sqes Guo Xuenan
@ 2022-05-22  2:42 ` Jens Axboe
  2022-05-23 14:45   ` Guo Xuenan
  0 siblings, 1 reply; 5+ messages in thread
From: Jens Axboe @ 2022-05-22  2:42 UTC (permalink / raw)
  To: Guo Xuenan, asml.silence, io-uring
  Cc: houtao1, yi.zhang, chengzhihao1, linux-kernel

On 5/21/22 8:33 AM, Guo Xuenan wrote:
> when set up sq ring size with IORING_MAX_ENTRIES, io_submit_sqes may
> looping ~32768 times which may trigger soft lockups. add need_resched
> condition to avoid this bad situation.
> 
> set sq ring size 32768 and using io_sq_thread to perform stress test
> as follows:
> watchdog: BUG: soft lockup - CPU#2 stuck for 26s! [iou-sqp-600:601]
> Kernel panic - not syncing: softlockup: hung tasks
> CPU: 2 PID: 601 Comm: iou-sqp-600 Tainted: G L 5.18.0-rc7+ #3
> Hardware name: linux,dummy-virt (DT)
> Call trace:
>  dump_backtrace+0x218/0x228
>  show_stack+0x20/0x68
>  dump_stack_lvl+0x68/0x84
>  dump_stack+0x1c/0x38
>  panic+0x1ec/0x3ec
>  watchdog_timer_fn+0x28c/0x300
>  __hrtimer_run_queues+0x1d8/0x498
>  hrtimer_interrupt+0x238/0x558
>  arch_timer_handler_virt+0x48/0x60
>  handle_percpu_devid_irq+0xdc/0x270
>  generic_handle_domain_irq+0x50/0x70
>  gic_handle_irq+0x8c/0x4bc
>  call_on_irq_stack+0x2c/0x38
>  do_interrupt_handler+0xc4/0xc8
>  el1_interrupt+0x48/0xb0
>  el1h_64_irq_handler+0x18/0x28
>  el1h_64_irq+0x74/0x78
>  console_unlock+0x5d0/0x908
>  vprintk_emit+0x21c/0x470
>  vprintk_default+0x40/0x50
>  vprintk+0xd0/0x128
>  _printk+0xb4/0xe8
>  io_issue_sqe+0x1784/0x2908
>  io_submit_sqes+0x538/0x2880
>  io_sq_thread+0x328/0x7b0
>  ret_from_fork+0x10/0x20
> SMP: stopping secondary CPUs
> Kernel Offset: 0x40f1e8600000 from 0xffff800008000000
> PHYS_OFFSET: 0xfffffa8c80000000
> CPU features: 0x110,0000cf09,00001006
> Memory Limit: none
> ---[ end Kernel panic - not syncing: softlockup: hung tasks ]---
> 
> Signed-off-by: Guo Xuenan <guoxuenan@huawei.com>
> ---
>  fs/io_uring.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/io_uring.c b/fs/io_uring.c
> index 92ac50f139cd..d897c6798f00 100644
> --- a/fs/io_uring.c
> +++ b/fs/io_uring.c
> @@ -7864,7 +7864,7 @@ static int io_submit_sqes(struct io_ring_ctx *ctx, unsigned int nr)
>  			if (!(ctx->flags & IORING_SETUP_SUBMIT_ALL))
>  				break;
>  		}
> -	} while (submitted < nr);
> +	} while (submitted < nr && !need_resched());
>  
>  	if (unlikely(submitted != nr)) {
>  		int ref_used = (submitted == -EAGAIN) ? 0 : submitted;

This is wrong, you'll potentially end up doing random short submits for
non-sqpoll as well.

sqpoll already supports capping how many it submits in one go, it just
doesn't do it if it's only running one ring. As simple as the below,
with 1024 pulled out of thin air. Would be great if you could experiment
and submit a v2 based on this principle instead. Might still need a
cond_resched() carefully placed in io_sq_thread().

diff --git a/fs/io_uring.c b/fs/io_uring.c
index e0823f58f795..3830d7b493b9 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -7916,7 +7916,8 @@ static int __io_sq_thread(struct io_ring_ctx *ctx, bool cap_entries)
 	unsigned int to_submit;
 	int ret = 0;
 
-	to_submit = io_sqring_entries(ctx);
+	/* cap at 1024 to avoid doing too much in one submit round */
+	to_submit = min(io_sqring_entries(ctx), 1024U);
 	/* if we're handling multiple rings, cap submit size for fairness */
 	if (cap_entries && to_submit > IORING_SQPOLL_CAP_ENTRIES_VALUE)
 		to_submit = IORING_SQPOLL_CAP_ENTRIES_VALUE;
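
For illustration, a deliberately stripped-down sketch of where such a
cond_resched() between submit rounds could sit in the sqpoll loop; the loop
shape and stop condition here are simplified assumptions, not the actual
io_sq_thread() body:

```c
/*
 * Illustrative sketch only: a minimal sqpoll-style loop.  The real
 * io_sq_thread() also handles task work, idle timeouts and wakeups.
 */
while (!test_bit(IO_SQ_THREAD_SHOULD_STOP, &sqd->state)) {
	struct io_ring_ctx *ctx;
	bool cap_entries = !list_is_singular(&sqd->ctx_list);

	list_for_each_entry(ctx, &sqd->ctx_list, sqd_list)
		__io_sq_thread(ctx, cap_entries);

	/* yield between rounds so one huge SQ ring cannot keep this
	 * kthread on the CPU long enough to trip the softlockup watchdog */
	cond_resched();
}
```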

-- 
Jens Axboe


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH] io_uring: add a schedule condition in io_submit_sqes
  2022-05-22  2:42 ` Jens Axboe
@ 2022-05-23 14:45   ` Guo Xuenan
  2022-05-23 16:27     ` Jens Axboe
  0 siblings, 1 reply; 5+ messages in thread
From: Guo Xuenan @ 2022-05-23 14:45 UTC (permalink / raw)
  To: Jens Axboe, asml.silence, io-uring
  Cc: houtao1, yi.zhang, chengzhihao1, linux-kernel

Hi Jens

On 2022/5/22 10:42, Jens Axboe wrote:
> On 5/21/22 8:33 AM, Guo Xuenan wrote:
>> when set up sq ring size with IORING_MAX_ENTRIES, io_submit_sqes may
>> looping ~32768 times which may trigger soft lockups. add need_resched
>> condition to avoid this bad situation.
>>
>> set sq ring size 32768 and using io_sq_thread to perform stress test
>> as follows:
>> watchdog: BUG: soft lockup - CPU#2 stuck for 26s! [iou-sqp-600:601]
>> Kernel panic - not syncing: softlockup: hung tasks
>> CPU: 2 PID: 601 Comm: iou-sqp-600 Tainted: G L 5.18.0-rc7+ #3
>> Hardware name: linux,dummy-virt (DT)
>> Call trace:
>>   dump_backtrace+0x218/0x228
>>   show_stack+0x20/0x68
>>   dump_stack_lvl+0x68/0x84
>>   dump_stack+0x1c/0x38
>>   panic+0x1ec/0x3ec
>>   watchdog_timer_fn+0x28c/0x300
>>   __hrtimer_run_queues+0x1d8/0x498
>>   hrtimer_interrupt+0x238/0x558
>>   arch_timer_handler_virt+0x48/0x60
>>   handle_percpu_devid_irq+0xdc/0x270
>>   generic_handle_domain_irq+0x50/0x70
>>   gic_handle_irq+0x8c/0x4bc
>>   call_on_irq_stack+0x2c/0x38
>>   do_interrupt_handler+0xc4/0xc8
>>   el1_interrupt+0x48/0xb0
>>   el1h_64_irq_handler+0x18/0x28
>>   el1h_64_irq+0x74/0x78
>>   console_unlock+0x5d0/0x908
>>   vprintk_emit+0x21c/0x470
>>   vprintk_default+0x40/0x50
>>   vprintk+0xd0/0x128
>>   _printk+0xb4/0xe8
>>   io_issue_sqe+0x1784/0x2908
>>   io_submit_sqes+0x538/0x2880
>>   io_sq_thread+0x328/0x7b0
>>   ret_from_fork+0x10/0x20
>> SMP: stopping secondary CPUs
>> Kernel Offset: 0x40f1e8600000 from 0xffff800008000000
>> PHYS_OFFSET: 0xfffffa8c80000000
>> CPU features: 0x110,0000cf09,00001006
>> Memory Limit: none
>> ---[ end Kernel panic - not syncing: softlockup: hung tasks ]---
>>
>> Signed-off-by: Guo Xuenan <guoxuenan@huawei.com>
>> ---
>>   fs/io_uring.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>> index 92ac50f139cd..d897c6798f00 100644
>> --- a/fs/io_uring.c
>> +++ b/fs/io_uring.c
>> @@ -7864,7 +7864,7 @@ static int io_submit_sqes(struct io_ring_ctx *ctx, unsigned int nr)
>>   			if (!(ctx->flags & IORING_SETUP_SUBMIT_ALL))
>>   				break;
>>   		}
>> -	} while (submitted < nr);
>> +	} while (submitted < nr && !need_resched());
>>   
>>   	if (unlikely(submitted != nr)) {
>>   		int ref_used = (submitted == -EAGAIN) ? 0 : submitted;
> This is wrong, you'll potentially end up doing random short submits for
> non-sqpoll as well.
Sorry, indeed this is not a good solution. io_submit_sqes() is not only
called by io_sq_thread(); it is also called from the io_uring_enter()
syscall, and submitting large amounts of requests there will also trigger
the soft lockup.
> sqpoll already supports capping how many it submits in one go, it just
> doesn't do it if it's only running one ring. As simple as the below,
> with 1024 pulled out of thin air. Would be great if you could experiment
> and submit a v2 based on this principle instead. Might still need a
Yes Jens, your patch solves the sq-poll-thread problem, but the problem may
not be completely solved when the io_uring_enter() syscall is used to submit
large amounts of requests. So in my opinion, how about 1) adding
cond_resched() in the while loop of io_submit_sqes()? Or 2) making the
IORING_MAX_ENTRIES macro smaller? (I'm curious about that value: why is it
set to 32768?)

> cond_resched() carefully placed in io_sq_thread().
>
> diff --git a/fs/io_uring.c b/fs/io_uring.c
> index e0823f58f795..3830d7b493b9 100644
> --- a/fs/io_uring.c
> +++ b/fs/io_uring.c
> @@ -7916,7 +7916,8 @@ static int __io_sq_thread(struct io_ring_ctx *ctx, bool cap_entries)
>   	unsigned int to_submit;
>   	int ret = 0;
>   
> -	to_submit = io_sqring_entries(ctx);
> +	/* cap at 1024 to avoid doing too much in one submit round */
> +	to_submit = min(io_sqring_entries(ctx), 1024U);
Yes, it works. ;)
>   	/* if we're handling multiple rings, cap submit size for fairness */
>   	if (cap_entries && to_submit > IORING_SQPOLL_CAP_ENTRIES_VALUE)
>   		to_submit = IORING_SQPOLL_CAP_ENTRIES_VALUE;
>

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] io_uring: add a schedule condition in io_submit_sqes
  2022-05-23 14:45   ` Guo Xuenan
@ 2022-05-23 16:27     ` Jens Axboe
  2022-05-24  6:58       ` Guo Xuenan
  0 siblings, 1 reply; 5+ messages in thread
From: Jens Axboe @ 2022-05-23 16:27 UTC (permalink / raw)
  To: Guo Xuenan, asml.silence, io-uring
  Cc: houtao1, yi.zhang, chengzhihao1, linux-kernel

On 5/23/22 8:45 AM, Guo Xuenan wrote:
> Hi Jens
> 
> On 2022/5/22 10:42, Jens Axboe wrote:
>> On 5/21/22 8:33 AM, Guo Xuenan wrote:
>>> when set up sq ring size with IORING_MAX_ENTRIES, io_submit_sqes may
>>> looping ~32768 times which may trigger soft lockups. add need_resched
>>> condition to avoid this bad situation.
>>>
>>> set sq ring size 32768 and using io_sq_thread to perform stress test
>>> as follows:
>>> watchdog: BUG: soft lockup - CPU#2 stuck for 26s! [iou-sqp-600:601]
>>> Kernel panic - not syncing: softlockup: hung tasks
>>> CPU: 2 PID: 601 Comm: iou-sqp-600 Tainted: G L 5.18.0-rc7+ #3
>>> Hardware name: linux,dummy-virt (DT)
>>> Call trace:
>>>   dump_backtrace+0x218/0x228
>>>   show_stack+0x20/0x68
>>>   dump_stack_lvl+0x68/0x84
>>>   dump_stack+0x1c/0x38
>>>   panic+0x1ec/0x3ec
>>>   watchdog_timer_fn+0x28c/0x300
>>>   __hrtimer_run_queues+0x1d8/0x498
>>>   hrtimer_interrupt+0x238/0x558
>>>   arch_timer_handler_virt+0x48/0x60
>>>   handle_percpu_devid_irq+0xdc/0x270
>>>   generic_handle_domain_irq+0x50/0x70
>>>   gic_handle_irq+0x8c/0x4bc
>>>   call_on_irq_stack+0x2c/0x38
>>>   do_interrupt_handler+0xc4/0xc8
>>>   el1_interrupt+0x48/0xb0
>>>   el1h_64_irq_handler+0x18/0x28
>>>   el1h_64_irq+0x74/0x78
>>>   console_unlock+0x5d0/0x908
>>>   vprintk_emit+0x21c/0x470
>>>   vprintk_default+0x40/0x50
>>>   vprintk+0xd0/0x128
>>>   _printk+0xb4/0xe8
>>>   io_issue_sqe+0x1784/0x2908
>>>   io_submit_sqes+0x538/0x2880
>>>   io_sq_thread+0x328/0x7b0
>>>   ret_from_fork+0x10/0x20
>>> SMP: stopping secondary CPUs
>>> Kernel Offset: 0x40f1e8600000 from 0xffff800008000000
>>> PHYS_OFFSET: 0xfffffa8c80000000
>>> CPU features: 0x110,0000cf09,00001006
>>> Memory Limit: none
>>> ---[ end Kernel panic - not syncing: softlockup: hung tasks ]---
>>>
>>> Signed-off-by: Guo Xuenan <guoxuenan@huawei.com>
>>> ---
>>>   fs/io_uring.c | 2 +-
>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>>> index 92ac50f139cd..d897c6798f00 100644
>>> --- a/fs/io_uring.c
>>> +++ b/fs/io_uring.c
>>> @@ -7864,7 +7864,7 @@ static int io_submit_sqes(struct io_ring_ctx *ctx, unsigned int nr)
>>>               if (!(ctx->flags & IORING_SETUP_SUBMIT_ALL))
>>>                   break;
>>>           }
>>> -    } while (submitted < nr);
>>> +    } while (submitted < nr && !need_resched());
>>>         if (unlikely(submitted != nr)) {
>>>           int ref_used = (submitted == -EAGAIN) ? 0 : submitted;
>> This is wrong, you'll potentially end up doing random short submits for
>> non-sqpoll as well.
> Sorry, Indeed, this is not a good solution. Since, the function
> io_submit_sqes not only called by io_sq_thread, it also called by
> syscall io_uring_enter sending large amounts of requests, will also
> trigger soft lockup.

Exactly.

>> sqpoll already supports capping how many it submits in one go, it just
>> doesn't do it if it's only running one ring. As simple as the below,
>> with 1024 pulled out of thin air. Would be great if you could experiment
>> and submit a v2 based on this principle instead. Might still need a
>
> yes, Jens, your patch sloved sq-poll-thread problem, but the problem
> may not completely solved; when using syscall io_uring_enter to
> subimit large amounts of requests.So in my opinion How about 1) add
> cond_resched() in the while cycle part of io_submit_sqes ?. OR 2) set
> macro IORING_MAX_ENTRIES smaller? (i'm curious about the value,why we
> set it with 32768)

I did suspect this isn't specific to SQPOLL at all.

Might make sense to cap batches of non-sqpoll as well, and for each
batch, have a cond_resched() just in case. If you change
IORING_MAX_ENTRIES to something smaller, you risk breaking applications
that currently (for whatever reason) may have set up an SQ ring of that
size. So that is not a viable solution, and honestly wouldn't be a good
option even if that weren't the case.
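
Purely as an illustration of that idea for the non-sqpoll path (cap the
batch, cond_resched() between batches), a rough sketch; the helper name and
batch size below are made up for the example and this is not a proposed
patch:

```c
/*
 * Illustrative sketch only: split one large submission into bounded
 * batches and offer to reschedule between them, so a 32k-entry submit
 * arriving via io_uring_enter() cannot monopolise the CPU.
 */
#define IO_SUBMIT_BATCH	1024U	/* arbitrary cap, mirrors the sqpoll value */

static int io_submit_sqes_batched(struct io_ring_ctx *ctx, unsigned int nr)
{
	unsigned int done = 0;

	while (done < nr) {
		unsigned int batch = min(nr - done, IO_SUBMIT_BATCH);
		int ret = io_submit_sqes(ctx, batch);

		if (ret <= 0)
			return done ? done : ret;
		done += ret;
		/* give other runnable tasks a chance between batches */
		cond_resched();
	}
	return done;
}
```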

So the simple solution is just to do it in io_submit_sqes() itself, but
would need to be carefully benchmarked to make sure that it doesn't
regress anything. It's the very fast path, and for real use cases you'd
never run into this problem. Even for a synthetic use case, it sounds
highly suspicious that nothing in the call path ends up doing a
conditional reschedule. What kind of requests are being submitted when
you hit this?

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] io_uring: add a schedule condition in io_submit_sqes
  2022-05-23 16:27     ` Jens Axboe
@ 2022-05-24  6:58       ` Guo Xuenan
  0 siblings, 0 replies; 5+ messages in thread
From: Guo Xuenan @ 2022-05-24  6:58 UTC (permalink / raw)
  To: Jens Axboe, asml.silence, io-uring
  Cc: houtao1, yi.zhang, chengzhihao1, linux-kernel

Hi Jens,

The piece of code below reproduces the problem; it reliably triggers the
bug on my arm64 qemu virtual machine.
```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "liburing.h"

int main(void)
{
     int ret, i = 0;
     struct io_uring uring;
     struct io_uring_params params;
     struct io_uring_sqe *sqe;
     struct io_uring_cqe *cqe;
     struct __kernel_timespec ts = { .tv_sec = 30, .tv_nsec = 0, };

     memset(&params, 0, sizeof(params));
     ret = io_uring_queue_init_params(32768, &uring, &params);
     if (ret < 0) {
         printf("init err %d\n", errno);
         return 0;
     }

     io_uring_register_personality(&uring);
     while (i++ < 32768) {
         sqe = io_uring_get_sqe(&uring);
         io_uring_prep_timeout(sqe, &ts, rand(), 0);
     }
     io_uring_submit(&uring);
     printf("to_submit %d\n", i - 1);
     io_uring_wait_cqe_nr(&uring, &cqe, i - 1);
     return 0;
}
```
------------------------------------------------------------------------------------------
Considering the performance issue you mentioned with adding cond_resched()
in "the very fast path", may we add some restriction to avoid the extreme
situations while also reducing the impact on performance?
I tested the following patch; it does work, there is no soft lockup here.

diff --git a/fs/io_uring.c b/fs/io_uring.c
index e0823f58f795..3c2e019fd124 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -7873,6 +7873,9 @@ static int io_submit_sqes(struct io_ring_ctx *ctx, unsigned int nr)
                         if (!(ctx->flags & IORING_SETUP_SUBMIT_ALL))
                                 break;
                 }
+               /* to avoid doing too much in one submit round */
+               if (submitted > IORING_MAX_ENTRIES / 2)
+                       cond_resched();
         } while (submitted < nr);

         if (unlikely(submitted != nr)) {

Best regards
Xuenan
On 2022/5/24 0:27, Jens Axboe wrote:
> On 5/23/22 8:45 AM, Guo Xuenan wrote:
>> Hi Jens
>>
>> On 2022/5/22 10:42, Jens Axboe wrote:
>>> On 5/21/22 8:33 AM, Guo Xuenan wrote:
>>>> when set up sq ring size with IORING_MAX_ENTRIES, io_submit_sqes may
>>>> looping ~32768 times which may trigger soft lockups. add need_resched
>>>> condition to avoid this bad situation.
>>>>
>>>> set sq ring size 32768 and using io_sq_thread to perform stress test
>>>> as follows:
>>>> watchdog: BUG: soft lockup - CPU#2 stuck for 26s! [iou-sqp-600:601]
>>>> Kernel panic - not syncing: softlockup: hung tasks
>>>> CPU: 2 PID: 601 Comm: iou-sqp-600 Tainted: G L 5.18.0-rc7+ #3
>>>> Hardware name: linux,dummy-virt (DT)
>>>> Call trace:
>>>>    dump_backtrace+0x218/0x228
>>>>    show_stack+0x20/0x68
>>>>    dump_stack_lvl+0x68/0x84
>>>>    dump_stack+0x1c/0x38
>>>>    panic+0x1ec/0x3ec
>>>>    watchdog_timer_fn+0x28c/0x300
>>>>    __hrtimer_run_queues+0x1d8/0x498
>>>>    hrtimer_interrupt+0x238/0x558
>>>>    arch_timer_handler_virt+0x48/0x60
>>>>    handle_percpu_devid_irq+0xdc/0x270
>>>>    generic_handle_domain_irq+0x50/0x70
>>>>    gic_handle_irq+0x8c/0x4bc
>>>>    call_on_irq_stack+0x2c/0x38
>>>>    do_interrupt_handler+0xc4/0xc8
>>>>    el1_interrupt+0x48/0xb0
>>>>    el1h_64_irq_handler+0x18/0x28
>>>>    el1h_64_irq+0x74/0x78
>>>>    console_unlock+0x5d0/0x908
>>>>    vprintk_emit+0x21c/0x470
>>>>    vprintk_default+0x40/0x50
>>>>    vprintk+0xd0/0x128
>>>>    _printk+0xb4/0xe8
>>>>    io_issue_sqe+0x1784/0x2908
>>>>    io_submit_sqes+0x538/0x2880
>>>>    io_sq_thread+0x328/0x7b0
>>>>    ret_from_fork+0x10/0x20
>>>> SMP: stopping secondary CPUs
>>>> Kernel Offset: 0x40f1e8600000 from 0xffff800008000000
>>>> PHYS_OFFSET: 0xfffffa8c80000000
>>>> CPU features: 0x110,0000cf09,00001006
>>>> Memory Limit: none
>>>> ---[ end Kernel panic - not syncing: softlockup: hung tasks ]---
>>>>
>>>> Signed-off-by: Guo Xuenan <guoxuenan@huawei.com>
>>>> ---
>>>>    fs/io_uring.c | 2 +-
>>>>    1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>>>> index 92ac50f139cd..d897c6798f00 100644
>>>> --- a/fs/io_uring.c
>>>> +++ b/fs/io_uring.c
>>>> @@ -7864,7 +7864,7 @@ static int io_submit_sqes(struct io_ring_ctx *ctx, unsigned int nr)
>>>>                if (!(ctx->flags & IORING_SETUP_SUBMIT_ALL))
>>>>                    break;
>>>>            }
>>>> -    } while (submitted < nr);
>>>> +    } while (submitted < nr && !need_resched());
>>>>          if (unlikely(submitted != nr)) {
>>>>            int ref_used = (submitted == -EAGAIN) ? 0 : submitted;
>>> This is wrong, you'll potentially end up doing random short submits for
>>> non-sqpoll as well.
>> Sorry, Indeed, this is not a good solution. Since, the function
>> io_submit_sqes not only called by io_sq_thread, it also called by
>> syscall io_uring_enter sending large amounts of requests, will also
>> trigger soft lockup.
> Exactly.
>
>>> sqpoll already supports capping how many it submits in one go, it just
>>> doesn't do it if it's only running one ring. As simple as the below,
>>> with 1024 pulled out of thin air. Would be great if you could experiment
>>> and submit a v2 based on this principle instead. Might still need a
>> yes, Jens, your patch sloved sq-poll-thread problem, but the problem
>> may not completely solved; when using syscall io_uring_enter to
>> subimit large amounts of requests.So in my opinion How about 1) add
>> cond_resched() in the while cycle part of io_submit_sqes ?. OR 2) set
>> macro IORING_MAX_ENTRIES smaller? (i'm curious about the value,why we
>> set it with 32768)
> I did suspect this isn't specific to SQPOLL at all.
>
> Might make sense to cap batches of non-sqpoll as well, and for each
> batch, have a cond_resched() just in case. If you change
> IORING_MAX_ENTRIES to something smaller, you risk breaking applications
> that currently (for whatever reason) may have set up an SQ ring of that
> side. So that is not a viable solution, and honestly wouldn't be a good
> option even if that weren't the case.
>
> So the simple solution is just to do it in io_submit_sqes() itself, but
> would need to be carefully benchmarked to make sure that it doesn't
> regress anything. It's the very fast path, and for real use cases you'd
> never run into this problem. Even for a synthetic use case, it sounds
> highly suspicious that nothing in the call path ends up doing a
> conditional reschedule. What kind of requests are being submitted when
> you hit this?
>

^ permalink raw reply related	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2022-05-24  6:59 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-05-21 14:33 [PATCH] io_uring: add a schedule condition in io_submit_sqes Guo Xuenan
2022-05-22  2:42 ` Jens Axboe
2022-05-23 14:45   ` Guo Xuenan
2022-05-23 16:27     ` Jens Axboe
2022-05-24  6:58       ` Guo Xuenan
