* [PATCH] sched/debug: Reset watchdog on all CPUs while processing sysrq-t
From: Wei Li @ 2019-12-26  8:52 UTC
  To: mingo, peterz, juri.lelli, vincent.guittot, dietmar.eggemann,
	rostedt, bsegall, mgorman
  Cc: huawei.libin, linux-kernel

Lengthy output of sysrq-t may take a lot of time on a slow serial console
with lots of processes and CPUs.

So we need to reset the NMI watchdog to avoid spurious lockup messages, and
we also reset the softlockup watchdogs on all other CPUs, since another CPU
might be blocked waiting for us to process an IPI or stop_machine.

Do this in sysrq_sched_debug_show(), as we already do in show_state_filter().

Signed-off-by: Wei Li <liwei391@huawei.com>
---
 kernel/sched/debug.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
index f7e4579e746c..879d3ccf3806 100644
--- a/kernel/sched/debug.c
+++ b/kernel/sched/debug.c
@@ -751,9 +751,16 @@ void sysrq_sched_debug_show(void)
 	int cpu;
 
 	sched_debug_header(NULL);
-	for_each_online_cpu(cpu)
+	for_each_online_cpu(cpu) {
+		/*
+		 * Need to reset softlockup watchdogs on all CPUs, because
+		 * another CPU might be blocked waiting for us to process
+		 * an IPI or stop_machine.
+		 */
+		touch_nmi_watchdog();
+		touch_all_softlockup_watchdogs();
 		print_cpu(NULL, cpu);
-
+	}
 }
 
 /*
-- 
2.17.1



* Re: [PATCH] sched/debug: Reset watchdog on all CPUs while processing sysrq-t
From: Steven Rostedt @ 2020-01-02 19:45 UTC
  To: Wei Li
  Cc: mingo, peterz, juri.lelli, vincent.guittot, dietmar.eggemann,
	bsegall, mgorman, huawei.libin, linux-kernel

On Thu, 26 Dec 2019 16:52:24 +0800
Wei Li <liwei391@huawei.com> wrote:

> Lengthy output of sysrq-t may take a lot of time on a slow serial console
> with lots of processes and CPUs.
> 
> So we need to reset the NMI watchdog to avoid spurious lockup messages, and
> we also reset the softlockup watchdogs on all other CPUs, since another CPU
> might be blocked waiting for us to process an IPI or stop_machine.

Have you had this triggered?

> 
> Do this in sysrq_sched_debug_show(), as we already do in show_state_filter().
> 
> Signed-off-by: Wei Li <liwei391@huawei.com>
> ---
>  kernel/sched/debug.c | 11 +++++++++--
>  1 file changed, 9 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
> index f7e4579e746c..879d3ccf3806 100644
> --- a/kernel/sched/debug.c
> +++ b/kernel/sched/debug.c
> @@ -751,9 +751,16 @@ void sysrq_sched_debug_show(void)
>  	int cpu;
>  
>  	sched_debug_header(NULL);
> -	for_each_online_cpu(cpu)
> +	for_each_online_cpu(cpu) {
> +		/*
> +		 * Need to reset softlockup watchdogs on all CPUs, because
> +		 * another CPU might be blocked waiting for us to process
> +		 * an IPI or stop_machine.
> +		 */
> +		touch_nmi_watchdog();
> +		touch_all_softlockup_watchdogs();

This doesn't seem to hurt to add, thus:

Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>

-- Steve

>  		print_cpu(NULL, cpu);
> -
> +	}
>  }
>  
>  /*



* Re: [PATCH] sched/debug: Reset watchdog on all CPUs while processing sysrq-t
From: liwei (GF) @ 2020-01-03  1:54 UTC
  To: Steven Rostedt
  Cc: mingo, peterz, juri.lelli, vincent.guittot, dietmar.eggemann,
	bsegall, mgorman, huawei.libin, linux-kernel

Hi Steven,
Yes, it can be triggered on the Hi1620 system (128 cores) as follows:
stress-ng -c 50 &
stress-ng -m 50 &
stress-ng -i 20 &
echo 7 > /proc/sys/kernel/printk
echo t > /proc/sysrq-trigger

Then a soft lockup is reported for the migration thread:
[39636.303531] watchdog: BUG: soft lockup - CPU#67 stuck for 23s! [migration/67:348]
The thread is waiting for the CPU that is handling sysrq-t to process stop_two_cpus.

Thanks,
Wei




* Re: [PATCH] sched/debug: Reset watchdog on all CPUs while processing sysrq-t
From: Peter Zijlstra @ 2020-01-07  9:32 UTC
  To: Steven Rostedt
  Cc: Wei Li, mingo, juri.lelli, vincent.guittot, dietmar.eggemann,
	bsegall, mgorman, huawei.libin, linux-kernel

On Thu, Jan 02, 2020 at 02:45:14PM -0500, Steven Rostedt wrote:
> On Thu, 26 Dec 2019 16:52:24 +0800
> Wei Li <liwei391@huawei.com> wrote:
> 
> > Lengthy output of sysrq-t may take a lot of time on a slow serial console
> > with lots of processes and CPUs.
> > 
> > So we need to reset the NMI watchdog to avoid spurious lockup messages, and
> > we also reset the softlockup watchdogs on all other CPUs, since another CPU
> > might be blocked waiting for us to process an IPI or stop_machine.
> 
> Have you had this triggered?
> 
> > 
> > Do this in sysrq_sched_debug_show(), as we already do in show_state_filter().
> > 
> > Signed-off-by: Wei Li <liwei391@huawei.com>
> > ---
> >  kernel/sched/debug.c | 11 +++++++++--
> >  1 file changed, 9 insertions(+), 2 deletions(-)
> > 
> > diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
> > index f7e4579e746c..879d3ccf3806 100644
> > --- a/kernel/sched/debug.c
> > +++ b/kernel/sched/debug.c
> > @@ -751,9 +751,16 @@ void sysrq_sched_debug_show(void)
> >  	int cpu;
> >  
> >  	sched_debug_header(NULL);
> > -	for_each_online_cpu(cpu)
> > +	for_each_online_cpu(cpu) {
> > +		/*
> > +		 * Need to reset softlockup watchdogs on all CPUs, because
> > +		 * another CPU might be blocked waiting for us to process
> > +		 * an IPI or stop_machine.
> > +		 */
> > +		touch_nmi_watchdog();
> > +		touch_all_softlockup_watchdogs();
> 
> This doesn't seem to hurt to add, thus:
> 
> Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>

Thanks!


* [tip: sched/core] sched/debug: Reset watchdog on all CPUs while processing sysrq-t
From: tip-bot2 for Wei Li @ 2020-01-17 10:08 UTC
  To: linux-tip-commits
  Cc: Wei Li, Peter Zijlstra (Intel), Steven Rostedt (VMware), x86, LKML

The following commit has been merged into the sched/core branch of tip:

Commit-ID:     02d4ac5885a18d326b500b94808f0956dcce2832
Gitweb:        https://git.kernel.org/tip/02d4ac5885a18d326b500b94808f0956dcce2832
Author:        Wei Li <liwei391@huawei.com>
AuthorDate:    Thu, 26 Dec 2019 16:52:24 +08:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Fri, 17 Jan 2020 10:19:20 +01:00

sched/debug: Reset watchdog on all CPUs while processing sysrq-t

Lengthy output of sysrq-t may take a lot of time on a slow serial console
with lots of processes and CPUs.

So we need to reset the NMI watchdog to avoid spurious lockup messages, and
we also reset the softlockup watchdogs on all other CPUs, since another CPU
might be blocked waiting for us to process an IPI or stop_machine.

Do this in sysrq_sched_debug_show(), as we already do in show_state_filter().

Signed-off-by: Wei Li <liwei391@huawei.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Link: https://lkml.kernel.org/r/20191226085224.48942-1-liwei391@huawei.com
---
 kernel/sched/debug.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
index f7e4579..879d3cc 100644
--- a/kernel/sched/debug.c
+++ b/kernel/sched/debug.c
@@ -751,9 +751,16 @@ void sysrq_sched_debug_show(void)
 	int cpu;
 
 	sched_debug_header(NULL);
-	for_each_online_cpu(cpu)
+	for_each_online_cpu(cpu) {
+		/*
+		 * Need to reset softlockup watchdogs on all CPUs, because
+		 * another CPU might be blocked waiting for us to process
+		 * an IPI or stop_machine.
+		 */
+		touch_nmi_watchdog();
+		touch_all_softlockup_watchdogs();
 		print_cpu(NULL, cpu);
-
+	}
 }
 
 /*
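
To illustrate why touching the watchdogs inside the per-CPU loop matters,
here is a minimal userspace sketch of the idea. It is not kernel code: the
sketch_* names, the 4-CPU array, and the 20-second threshold are assumptions
made for illustration, and time is simulated with a plain counter. A "waiter"
CPU is modelled as unable to feed its own watchdog while the dump runs, much
like migration/67 in the report above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_NR_CPUS 4          /* hypothetical CPU count */
#define SKETCH_THRESH_SECS 20UL   /* assumed soft lockup threshold */

/* Simulated time at which each CPU's watchdog was last fed. */
static unsigned long sketch_touch_ts[SKETCH_NR_CPUS];

/* Rough analogue of touch_all_softlockup_watchdogs(): feed every CPU. */
void sketch_touch_all(unsigned long now)
{
	for (size_t cpu = 0; cpu < SKETCH_NR_CPUS; cpu++)
		sketch_touch_ts[cpu] = now;
}

/* A CPU counts as locked up once it has gone unfed past the threshold. */
bool sketch_is_softlockup(size_t cpu, unsigned long now)
{
	assert(cpu < SKETCH_NR_CPUS);
	return now - sketch_touch_ts[cpu] > SKETCH_THRESH_SECS;
}

/*
 * Simulate a dump in which printing one CPU's state takes secs_per_cpu
 * seconds on a slow console. The waiter CPU cannot feed its own watchdog
 * during the dump. Returns how many times the waiter would have been
 * flagged as locked up.
 */
unsigned int sketch_dump(unsigned long secs_per_cpu, bool touch_in_loop,
			 size_t waiter)
{
	unsigned long now = 0;
	unsigned int reports = 0;

	sketch_touch_all(now);
	for (size_t cpu = 0; cpu < SKETCH_NR_CPUS; cpu++) {
		if (touch_in_loop)
			sketch_touch_all(now);	/* what the patch adds */
		now += secs_per_cpu;		/* slow print_cpu() output */
		if (sketch_is_softlockup(waiter, now))
			reports++;
	}
	return reports;
}
```

With 8 seconds of output per CPU, sketch_dump(8, false, 1) flags the waiter
twice, while sketch_dump(8, true, 1) flags it not at all. The sketch also
shows the limit of the approach: if a single CPU's output alone exceeds the
threshold, touching once per iteration is not enough.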
