* [PATCH] timer_list: avoid other cpu soft lockup when printing timer list
@ 2020-02-20  3:42 Yang Yingliang
  2020-02-21  1:41 ` Stephen Boyd
  0 siblings, 1 reply; 3+ messages in thread
From: Yang Yingliang @ 2020-02-20  3:42 UTC (permalink / raw)
  To: linux-kernel; +Cc: tglx, john.stultz, sboyd

If a system has many CPUs (e.g. 128), printing the timer list to the
console takes a long time when running echo q > /proc/sysrq-trigger.

When /proc/sys/kernel/numa_balancing is enabled and the migration threads
are woken up, the migration thread on the printing CPU cannot run until
the print finishes, so a migration thread on another CPU may trigger a
soft lockup.

PID: 619    TASK: ffffa02fdd8bec80  CPU: 121  COMMAND: "migration/121"
  #0 [ffff00000a103b10] __crash_kexec at ffff0000081bf200
  #1 [ffff00000a103ca0] panic at ffff0000080ec93c
  #2 [ffff00000a103d80] watchdog_timer_fn at ffff0000081f8a14
  #3 [ffff00000a103e00] __run_hrtimer at ffff00000819701c
  #4 [ffff00000a103e40] __hrtimer_run_queues at ffff000008197420
  #5 [ffff00000a103ea0] hrtimer_interrupt at ffff00000819831c
  #6 [ffff00000a103f10] arch_timer_dying_cpu at ffff000008b53144
  #7 [ffff00000a103f30] handle_percpu_devid_irq at ffff000008174e34
  #8 [ffff00000a103f70] generic_handle_irq at ffff00000816c5e8
  #9 [ffff00000a103f90] __handle_domain_irq at ffff00000816d1f4
 #10 [ffff00000a103fd0] gic_handle_irq at ffff000008081860
 --- <IRQ stack> ---
 #11 [ffff00000d6e3d50] el1_irq at ffff0000080834c8
 #12 [ffff00000d6e3d60] multi_cpu_stop at ffff0000081d9964
 #13 [ffff00000d6e3db0] cpu_stopper_thread at ffff0000081d9cfc
 #14 [ffff00000d6e3e10] smpboot_thread_fn at ffff00000811e0a8
 #15 [ffff00000d6e3e70] kthread at ffff000008118988

To avoid this soft lockup, call touch_all_softlockup_watchdogs() before
printing each CPU in sysrq_timer_list_show().

Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
---
 kernel/time/timer_list.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/kernel/time/timer_list.c b/kernel/time/timer_list.c
index acb326f..4cb0e6f 100644
--- a/kernel/time/timer_list.c
+++ b/kernel/time/timer_list.c
@@ -289,13 +289,17 @@ void sysrq_timer_list_show(void)
 
 	timer_list_header(NULL, now);
 
-	for_each_online_cpu(cpu)
+	for_each_online_cpu(cpu) {
+		touch_all_softlockup_watchdogs();
 		print_cpu(NULL, cpu, now);
+	}
 
 #ifdef CONFIG_GENERIC_CLOCKEVENTS
 	timer_list_show_tickdevices_header(NULL);
-	for_each_online_cpu(cpu)
+	for_each_online_cpu(cpu) {
+		touch_all_softlockup_watchdogs();
 		print_tickdevice(NULL, tick_get_device(cpu), cpu);
+	}
 #endif
 	return;
 }
-- 
1.8.3



* Re: [PATCH] timer_list: avoid other cpu soft lockup when printing timer list
  2020-02-20  3:42 [PATCH] timer_list: avoid other cpu soft lockup when printing timer list Yang Yingliang
@ 2020-02-21  1:41 ` Stephen Boyd
  2020-03-09  8:20   ` Yang Yingliang
  0 siblings, 1 reply; 3+ messages in thread
From: Stephen Boyd @ 2020-02-21  1:41 UTC (permalink / raw)
  To: Yang Yingliang, linux-kernel; +Cc: tglx, john.stultz

Quoting Yang Yingliang (2020-02-19 19:42:32)
> If a system has many CPUs (e.g. 128), printing the timer list to the
> console takes a long time when running echo q > /proc/sysrq-trigger.
> 
> When /proc/sys/kernel/numa_balancing is enabled and the migration threads
> are woken up, the migration thread on the printing CPU cannot run until
> the print finishes, so a migration thread on another CPU may trigger a
> soft lockup.
> 
> PID: 619    TASK: ffffa02fdd8bec80  CPU: 121  COMMAND: "migration/121"
>   #0 [ffff00000a103b10] __crash_kexec at ffff0000081bf200
>   #1 [ffff00000a103ca0] panic at ffff0000080ec93c
>   #2 [ffff00000a103d80] watchdog_timer_fn at ffff0000081f8a14
>   #3 [ffff00000a103e00] __run_hrtimer at ffff00000819701c
>   #4 [ffff00000a103e40] __hrtimer_run_queues at ffff000008197420
>   #5 [ffff00000a103ea0] hrtimer_interrupt at ffff00000819831c
>   #6 [ffff00000a103f10] arch_timer_dying_cpu at ffff000008b53144
>   #7 [ffff00000a103f30] handle_percpu_devid_irq at ffff000008174e34
>   #8 [ffff00000a103f70] generic_handle_irq at ffff00000816c5e8
>   #9 [ffff00000a103f90] __handle_domain_irq at ffff00000816d1f4
>  #10 [ffff00000a103fd0] gic_handle_irq at ffff000008081860
>  --- <IRQ stack> ---
>  #11 [ffff00000d6e3d50] el1_irq at ffff0000080834c8
>  #12 [ffff00000d6e3d60] multi_cpu_stop at ffff0000081d9964
>  #13 [ffff00000d6e3db0] cpu_stopper_thread at ffff0000081d9cfc
>  #14 [ffff00000d6e3e10] smpboot_thread_fn at ffff00000811e0a8
>  #15 [ffff00000d6e3e70] kthread at ffff000008118988
> 
> To avoid this soft lockup, add touch_all_softlockup_watchdogs()
> in sysrq_timer_list_show()
> 
> Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
> ---
>  kernel/time/timer_list.c | 8 ++++++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/time/timer_list.c b/kernel/time/timer_list.c
> index acb326f..4cb0e6f 100644
> --- a/kernel/time/timer_list.c
> +++ b/kernel/time/timer_list.c
> @@ -289,13 +289,17 @@ void sysrq_timer_list_show(void)
>  
>         timer_list_header(NULL, now);
>  
> -       for_each_online_cpu(cpu)
> +       for_each_online_cpu(cpu) {
> +               touch_all_softlockup_watchdogs();

Usage of touch_all_softlockup_watchdogs() deserves a comment. Otherwise
the reader is left to git archaeology to understand why watchdogs are
being touched. Of course, we failed at that with commit 010704276865
("sysrq: Reset the watchdog timers while displaying high-resolution
timers") which looks awfully similar to this.

>                 print_cpu(NULL, cpu, now);
> +       }
>  
>  #ifdef CONFIG_GENERIC_CLOCKEVENTS
>         timer_list_show_tickdevices_header(NULL);
> -       for_each_online_cpu(cpu)
> +       for_each_online_cpu(cpu) {
> +               touch_all_softlockup_watchdogs();
>                 print_tickdevice(NULL, tick_get_device(cpu), cpu);

print_tickdevice() already has touch_nmi_watchdog() which eventually
touches the softlockup watchdog. Is the problem that it isn't enough to
do that when the migration thread is also running?

> +       }
>  #endif
>         return;


* Re: [PATCH] timer_list: avoid other cpu soft lockup when printing timer list
  2020-02-21  1:41 ` Stephen Boyd
@ 2020-03-09  8:20   ` Yang Yingliang
  0 siblings, 0 replies; 3+ messages in thread
From: Yang Yingliang @ 2020-03-09  8:20 UTC (permalink / raw)
  To: Stephen Boyd, linux-kernel; +Cc: tglx, john.stultz

Hi,

sorry for the late reply.

On 2020/2/21 9:41, Stephen Boyd wrote:
> Quoting Yang Yingliang (2020-02-19 19:42:32)
>> If a system has many CPUs (e.g. 128), printing the timer list to the
>> console takes a long time when running echo q > /proc/sysrq-trigger.
>>
>> When /proc/sys/kernel/numa_balancing is enabled and the migration threads
>> are woken up, the migration thread on the printing CPU cannot run until
>> the print finishes, so a migration thread on another CPU may trigger a
>> soft lockup.
>>
>> PID: 619    TASK: ffffa02fdd8bec80  CPU: 121  COMMAND: "migration/121"
>>    #0 [ffff00000a103b10] __crash_kexec at ffff0000081bf200
>>    #1 [ffff00000a103ca0] panic at ffff0000080ec93c
>>    #2 [ffff00000a103d80] watchdog_timer_fn at ffff0000081f8a14
>>    #3 [ffff00000a103e00] __run_hrtimer at ffff00000819701c
>>    #4 [ffff00000a103e40] __hrtimer_run_queues at ffff000008197420
>>    #5 [ffff00000a103ea0] hrtimer_interrupt at ffff00000819831c
>>    #6 [ffff00000a103f10] arch_timer_dying_cpu at ffff000008b53144
>>    #7 [ffff00000a103f30] handle_percpu_devid_irq at ffff000008174e34
>>    #8 [ffff00000a103f70] generic_handle_irq at ffff00000816c5e8
>>    #9 [ffff00000a103f90] __handle_domain_irq at ffff00000816d1f4
>>   #10 [ffff00000a103fd0] gic_handle_irq at ffff000008081860
>>   --- <IRQ stack> ---
>>   #11 [ffff00000d6e3d50] el1_irq at ffff0000080834c8
>>   #12 [ffff00000d6e3d60] multi_cpu_stop at ffff0000081d9964
>>   #13 [ffff00000d6e3db0] cpu_stopper_thread at ffff0000081d9cfc
>>   #14 [ffff00000d6e3e10] smpboot_thread_fn at ffff00000811e0a8
>>   #15 [ffff00000d6e3e70] kthread at ffff000008118988
>>
>> To avoid this soft lockup, add touch_all_softlockup_watchdogs()
>> in sysrq_timer_list_show()
>>
>> Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
>> ---
>>   kernel/time/timer_list.c | 8 ++++++--
>>   1 file changed, 6 insertions(+), 2 deletions(-)
>>
>> diff --git a/kernel/time/timer_list.c b/kernel/time/timer_list.c
>> index acb326f..4cb0e6f 100644
>> --- a/kernel/time/timer_list.c
>> +++ b/kernel/time/timer_list.c
>> @@ -289,13 +289,17 @@ void sysrq_timer_list_show(void)
>>   
>>          timer_list_header(NULL, now);
>>   
>> -       for_each_online_cpu(cpu)
>> +       for_each_online_cpu(cpu) {
>> +               touch_all_softlockup_watchdogs();
> Usage of touch_all_softlockup_watchdogs() deserves a comment. Otherwise
> the reader is left to git archaeology to understand why watchdogs are
> being touched. Of course, we failed at that with commit 010704276865
> ("sysrq: Reset the watchdog timers while displaying high-resolution
> timers") which looks awfully similar to this.
OK, I will add a comment later.
>
>>                  print_cpu(NULL, cpu, now);
>> +       }
>>   
>>   #ifdef CONFIG_GENERIC_CLOCKEVENTS
>>          timer_list_show_tickdevices_header(NULL);
>> -       for_each_online_cpu(cpu)
>> +       for_each_online_cpu(cpu) {
>> +               touch_all_softlockup_watchdogs();
>>                  print_tickdevice(NULL, tick_get_device(cpu), cpu);
> print_tickdevice() already has touch_nmi_watchdog() which eventually
> touches the softlockup watchdog. Is the problem that it isn't enough to
> do that when the migration thread is also running?
No, it's not enough.
touch_nmi_watchdog() only touches the watchdog of the CPU that is
printing. The soft lockup occurs on the other CPUs, so their softlockup
watchdogs need to be touched as well.

>
>> +       }
>>   #endif
>>          return;
> .
>
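To make the difference between touching only the local watchdog and touching
all of them concrete, here is a small userspace model. It is only a sketch
under stated assumptions, not kernel code: struct wd_model, wd_touch_local(),
wd_touch_all() and simulate_dump() are hypothetical names, time is counted in
abstract ticks, and CPU 0 is assumed to do all the printing while every other
CPU waits for it and cannot refresh its own watchdog.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_CPUS 256

/* Hypothetical model, not kernel code: one "last touched" tick per CPU. */
struct wd_model {
	int nr_cpus;
	long thresh;               /* ticks without a touch before a lockup fires */
	long last_touch[MAX_CPUS];
	bool locked_up[MAX_CPUS];
};

static void wd_init(struct wd_model *m, int nr_cpus, long thresh)
{
	assert(nr_cpus > 0 && nr_cpus <= MAX_CPUS);
	m->nr_cpus = nr_cpus;
	m->thresh = thresh;
	for (int c = 0; c < nr_cpus; c++) {
		m->last_touch[c] = 0;
		m->locked_up[c] = false;
	}
}

/* Stand-in for touch_nmi_watchdog(): refreshes only the calling CPU. */
static void wd_touch_local(struct wd_model *m, int cpu, long now)
{
	m->last_touch[cpu] = now;
}

/* Stand-in for touch_all_softlockup_watchdogs(): refreshes every CPU. */
static void wd_touch_all(struct wd_model *m, long now)
{
	for (int c = 0; c < m->nr_cpus; c++)
		m->last_touch[c] = now;
}

/*
 * CPU 0 prints one entry per CPU, each taking ticks_per_entry ticks, while
 * all other CPUs wait for it and cannot refresh their own watchdog.
 * Returns how many CPUs exceeded the threshold at some point.
 */
int simulate_dump(int nr_cpus, long thresh, long ticks_per_entry, bool touch_all)
{
	struct wd_model m;
	long now = 0;
	int lockups = 0;

	wd_init(&m, nr_cpus, thresh);
	for (int entry = 0; entry < nr_cpus; entry++) {
		if (touch_all)
			wd_touch_all(&m, now);
		else
			wd_touch_local(&m, 0, now);
		now += ticks_per_entry;  /* time spent printing this entry */
		for (int c = 0; c < nr_cpus; c++)
			if (now - m.last_touch[c] > m.thresh)
				m.locked_up[c] = true;
	}
	for (int c = 0; c < nr_cpus; c++)
		lockups += m.locked_up[c];
	return lockups;
}
```

In this model, with 128 CPUs, a threshold of 20 ticks and one tick per entry,
touching only the printing CPU leaves every other CPU over the threshold,
while touching all watchdogs before each entry keeps all of them below it.
With only a handful of CPUs the dump finishes before the threshold is
reached, which matches the observation that the problem shows up on large
systems.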


