linux-kernel.vger.kernel.org archive mirror
* [PATCH] kernel/watchdog: flush all printk nmi buffers when hardlockup detected
@ 2020-02-10  9:48 Konstantin Khlebnikov
  2020-02-10 22:51 ` Andrew Morton
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Konstantin Khlebnikov @ 2020-02-10  9:48 UTC (permalink / raw)
  To: Petr Mladek, Peter Zijlstra, linux-kernel, Steven Rostedt,
	Sergey Senozhatsky, Andrew Morton
  Cc: Dmitry Monakhov

In NMI context, printk() may save messages into per-cpu buffers and
schedule a flush via irq_work once IRQs are unblocked. This means that
a message about a hardlockup appears in the kernel log only when/if
the lockup is gone.

A comment in irq_work_queue_on() states that remote IPIs aren't
NMI-safe, thus printk() cannot schedule the flush work on another CPU.

This patch adds a simple atomic counter of detected hardlockups and
flushes all per-cpu printk buffers in the context of the softlockup
watchdog on any other CPU when it sees a change of this counter.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
---
 include/linux/nmi.h   |    1 +
 kernel/watchdog.c     |   22 ++++++++++++++++++++++
 kernel/watchdog_hld.c |    1 +
 3 files changed, 24 insertions(+)

diff --git a/include/linux/nmi.h b/include/linux/nmi.h
index 9003e29cde46..8406df72ae5a 100644
--- a/include/linux/nmi.h
+++ b/include/linux/nmi.h
@@ -84,6 +84,7 @@ static inline void reset_hung_task_detector(void) { }
 #if defined(CONFIG_HARDLOCKUP_DETECTOR)
 extern void hardlockup_detector_disable(void);
 extern unsigned int hardlockup_panic;
+extern atomic_t hardlockup_detected;
 #else
 static inline void hardlockup_detector_disable(void) {}
 #endif
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index b6b1f54a7837..9f5c68fababe 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -92,6 +92,26 @@ static int __init hardlockup_all_cpu_backtrace_setup(char *str)
 }
 __setup("hardlockup_all_cpu_backtrace=", hardlockup_all_cpu_backtrace_setup);
 # endif /* CONFIG_SMP */
+
+atomic_t hardlockup_detected = ATOMIC_INIT(0);
+
+static inline void flush_hardlockup_messages(void)
+{
+	static atomic_t flushed = ATOMIC_INIT(0);
+
+	/* flush messages from hard lockup detector */
+	if (atomic_read(&hardlockup_detected) != atomic_read(&flushed)) {
+		atomic_set(&flushed, atomic_read(&hardlockup_detected));
+		printk_safe_flush();
+	}
+}
+
+#else /* CONFIG_HARDLOCKUP_DETECTOR */
+
+static inline void flush_hardlockup_messages(void)
+{
+}
+
 #endif /* CONFIG_HARDLOCKUP_DETECTOR */
 
 /*
@@ -370,6 +390,8 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
 	/* kick the hardlockup detector */
 	watchdog_interrupt_count();
 
+	flush_hardlockup_messages();
+
 	/* kick the softlockup detector */
 	if (completion_done(this_cpu_ptr(&softlockup_completion))) {
 		reinit_completion(this_cpu_ptr(&softlockup_completion));
diff --git a/kernel/watchdog_hld.c b/kernel/watchdog_hld.c
index 247bf0b1582c..a546bc54f6ff 100644
--- a/kernel/watchdog_hld.c
+++ b/kernel/watchdog_hld.c
@@ -154,6 +154,7 @@ static void watchdog_overflow_callback(struct perf_event *event,
 
 		if (hardlockup_panic)
 			nmi_panic(regs, "Hard LOCKUP");
+		atomic_inc(&hardlockup_detected);
 
 		__this_cpu_write(hard_watchdog_warn, true);
 		return;


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] kernel/watchdog: flush all printk nmi buffers when hardlockup detected
  2020-02-10  9:48 [PATCH] kernel/watchdog: flush all printk nmi buffers when hardlockup detected Konstantin Khlebnikov
@ 2020-02-10 22:51 ` Andrew Morton
  2020-02-11 11:01   ` Konstantin Khlebnikov
  2020-02-11  8:14 ` Kirill Tkhai
  2020-02-12  1:15 ` Sergey Senozhatsky
  2 siblings, 1 reply; 12+ messages in thread
From: Andrew Morton @ 2020-02-10 22:51 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: Petr Mladek, Peter Zijlstra, linux-kernel, Steven Rostedt,
	Sergey Senozhatsky, Dmitry Monakhov

On Mon, 10 Feb 2020 12:48:57 +0300 Konstantin Khlebnikov <khlebnikov@yandex-team.ru> wrote:

> In NMI context printk() could save messages into per-cpu buffers and
> schedule flush by irq_work when IRQ are unblocked. This means message
> about hardlockup appears in kernel log only when/if lockup is gone.

I think I understand what this means.  The hard lockup detector runs at
NMI time but if it detects a lockup within IRQ context it cannot call
printk, because it's within NMI context, where synchronous printk
doesn't work.  Yes?

> Comment in irq_work_queue_on() states that remote IPI aren't NMI safe
> thus printk() cannot schedule flush work to another cpu.
> 
> This patch adds simple atomic counter of detected hardlockups and
> flushes all per-cpu printk buffers in context softlockup watchdog
> at any other cpu when it sees changes of this counter.

And I think this works because the softlockup detector runs within irq
context?

>
> ...
>
> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -92,6 +92,26 @@ static int __init hardlockup_all_cpu_backtrace_setup(char *str)
>  }
>  __setup("hardlockup_all_cpu_backtrace=", hardlockup_all_cpu_backtrace_setup);
>  # endif /* CONFIG_SMP */
> +
> +atomic_t hardlockup_detected = ATOMIC_INIT(0);
> +
> +static inline void flush_hardlockup_messages(void)

I don't think this needs to be inlined?

> +{
> +	static atomic_t flushed = ATOMIC_INIT(0);
> +
> +	/* flush messages from hard lockup detector */
> +	if (atomic_read(&hardlockup_detected) != atomic_read(&flushed)) {
> +		atomic_set(&flushed, atomic_read(&hardlockup_detected));
> +		printk_safe_flush();
> +	}
> +}

Could we add some explanatory comments here?  Explain to the reader why
this code exists, what purpose it serves?  Basically a micro version of
the above changelog.

>
> ...
>


* Re: [PATCH] kernel/watchdog: flush all printk nmi buffers when hardlockup detected
  2020-02-10  9:48 [PATCH] kernel/watchdog: flush all printk nmi buffers when hardlockup detected Konstantin Khlebnikov
  2020-02-10 22:51 ` Andrew Morton
@ 2020-02-11  8:14 ` Kirill Tkhai
  2020-02-11 12:36   ` Konstantin Khlebnikov
  2020-02-12  1:15 ` Sergey Senozhatsky
  2 siblings, 1 reply; 12+ messages in thread
From: Kirill Tkhai @ 2020-02-11  8:14 UTC (permalink / raw)
  To: Konstantin Khlebnikov, Petr Mladek, Peter Zijlstra, linux-kernel,
	Steven Rostedt, Sergey Senozhatsky, Andrew Morton
  Cc: Dmitry Monakhov

Hi, Konstantin,

On 10.02.2020 12:48, Konstantin Khlebnikov wrote:
> In NMI context printk() could save messages into per-cpu buffers and
> schedule flush by irq_work when IRQ are unblocked. This means message
> about hardlockup appears in kernel log only when/if lockup is gone.
> 
> Comment in irq_work_queue_on() states that remote IPI aren't NMI safe
> thus printk() cannot schedule flush work to another cpu.
> 
> This patch adds simple atomic counter of detected hardlockups and
> flushes all per-cpu printk buffers in context softlockup watchdog
> at any other cpu when it sees changes of this counter.
> 
> Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
> ---
>  include/linux/nmi.h   |    1 +
>  kernel/watchdog.c     |   22 ++++++++++++++++++++++
>  kernel/watchdog_hld.c |    1 +
>  3 files changed, 24 insertions(+)
> 
> diff --git a/include/linux/nmi.h b/include/linux/nmi.h
> index 9003e29cde46..8406df72ae5a 100644
> --- a/include/linux/nmi.h
> +++ b/include/linux/nmi.h
> @@ -84,6 +84,7 @@ static inline void reset_hung_task_detector(void) { }
>  #if defined(CONFIG_HARDLOCKUP_DETECTOR)
>  extern void hardlockup_detector_disable(void);
>  extern unsigned int hardlockup_panic;
> +extern atomic_t hardlockup_detected;
>  #else
>  static inline void hardlockup_detector_disable(void) {}
>  #endif
> diff --git a/kernel/watchdog.c b/kernel/watchdog.c
> index b6b1f54a7837..9f5c68fababe 100644
> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -92,6 +92,26 @@ static int __init hardlockup_all_cpu_backtrace_setup(char *str)
>  }
>  __setup("hardlockup_all_cpu_backtrace=", hardlockup_all_cpu_backtrace_setup);
>  # endif /* CONFIG_SMP */
> +
> +atomic_t hardlockup_detected = ATOMIC_INIT(0);
> +
> +static inline void flush_hardlockup_messages(void)
> +{
> +	static atomic_t flushed = ATOMIC_INIT(0);
> +
> +	/* flush messages from hard lockup detector */
> +	if (atomic_read(&hardlockup_detected) != atomic_read(&flushed)) {
> +		atomic_set(&flushed, atomic_read(&hardlockup_detected));
> +		printk_safe_flush();
> +	}
> +}

Do we really need two variables here? They may come into two different
cache lines, and there will be double cache pollution just because of
this simple check. Why not the below?

	if (atomic_read(&hardlockup_detected)) {
		atomic_set(&hardlockup_detected, 0);
		printk_safe_flush();
	}

Or even the below, since atomics are not needed here, as they do not give any ordering guarantees:

static inline void flush_hardlockup_messages(void)
{
	if (READ_ONCE(hardlockup_detected)) {
		WRITE_ONCE(hardlockup_detected, 0);
		printk_safe_flush();
	}
}

watchdog_timer_fn()
{
	...
	WRITE_ONCE(hardlockup_detected, 1);
	...
}

Kirill


* Re: [PATCH] kernel/watchdog: flush all printk nmi buffers when hardlockup detected
  2020-02-10 22:51 ` Andrew Morton
@ 2020-02-11 11:01   ` Konstantin Khlebnikov
  0 siblings, 0 replies; 12+ messages in thread
From: Konstantin Khlebnikov @ 2020-02-11 11:01 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Petr Mladek, Peter Zijlstra, linux-kernel, Steven Rostedt,
	Sergey Senozhatsky, Dmitry Monakhov, Kirill Tkhai

On 11/02/2020 01.51, Andrew Morton wrote:
> On Mon, 10 Feb 2020 12:48:57 +0300 Konstantin Khlebnikov <khlebnikov@yandex-team.ru> wrote:
> 
>> In NMI context printk() could save messages into per-cpu buffers and
>> schedule flush by irq_work when IRQ are unblocked. This means message
>> about hardlockup appears in kernel log only when/if lockup is gone.
> 
> I think I understand what this means.  The hard lockup detector runs at
> NMI time but if it detects a lockup within IRQ context it cannot call
> printk, because it's within NMI context, where synchronous printk
> doesn't work.  Yes?

Yes. Printing from the hardlockup watchdog is the only special case.
Without it, irq_work will flush the per-cpu buffer right after IRQs are enabled.

> 
>> Comment in irq_work_queue_on() states that remote IPI aren't NMI safe
>> thus printk() cannot schedule flush work to another cpu.
>>
>> This patch adds simple atomic counter of detected hardlockups and
>> flushes all per-cpu printk buffers in context softlockup watchdog
>> at any other cpu when it sees changes of this counter.
> 
> And I think this works because the softlockup detector runs within irq
> context?

Yes. The softlockup watchdog is a timer. It can use normal printk() and
flush the per-cpu buffers. Any periodically executed code could do that,
but the softlockup watchdog is the most logical place for it.

There is a forward signal from the softlockup to the hardlockup watchdog
in the per-cpu counter hrtimer_interrupts (an increment means the CPU
isn't in a hardlockup). This patch adds a backward signal from the
hardlockup to the softlockup detector that some CPUs are in a hardlockup.

> 
>>
>> ...
>>
>> --- a/kernel/watchdog.c
>> +++ b/kernel/watchdog.c
>> @@ -92,6 +92,26 @@ static int __init hardlockup_all_cpu_backtrace_setup(char *str)
>>   }
>>   __setup("hardlockup_all_cpu_backtrace=", hardlockup_all_cpu_backtrace_setup);
>>   # endif /* CONFIG_SMP */
>> +
>> +atomic_t hardlockup_detected = ATOMIC_INIT(0);
>> +
>> +static inline void flush_hardlockup_messages(void)
> 
> I don't think this needs to be inlined?
> 
>> +{
>> +	static atomic_t flushed = ATOMIC_INIT(0);
>> +
>> +	/* flush messages from hard lockup detector */
>> +	if (atomic_read(&hardlockup_detected) != atomic_read(&flushed)) {
>> +		atomic_set(&flushed, atomic_read(&hardlockup_detected));
>> +		printk_safe_flush();
>> +	}
>> +}
> 
> Could we add some explanatory comments here?  Explain to the reader why
> this code exists, what purpose it serves?  Basically a micro version of
> the above changelog.

Hmm. This seems obvious from the names of the variables and the called
function. Both watchdogs use the same pattern: a monotonic counter and a
side variable with a snapshot to detect changes or stalls.

> 
>>
>> ...
>>


* Re: [PATCH] kernel/watchdog: flush all printk nmi buffers when hardlockup detected
  2020-02-11  8:14 ` Kirill Tkhai
@ 2020-02-11 12:36   ` Konstantin Khlebnikov
  2020-02-12 14:54     ` Petr Mladek
  0 siblings, 1 reply; 12+ messages in thread
From: Konstantin Khlebnikov @ 2020-02-11 12:36 UTC (permalink / raw)
  To: Kirill Tkhai, Petr Mladek, Peter Zijlstra, linux-kernel,
	Steven Rostedt, Sergey Senozhatsky, Andrew Morton
  Cc: Dmitry Monakhov

On 11/02/2020 11.14, Kirill Tkhai wrote:
> Hi, Konstantin,
> 
> On 10.02.2020 12:48, Konstantin Khlebnikov wrote:
>> In NMI context printk() could save messages into per-cpu buffers and
>> schedule flush by irq_work when IRQ are unblocked. This means message
>> about hardlockup appears in kernel log only when/if lockup is gone.
>>
>> Comment in irq_work_queue_on() states that remote IPI aren't NMI safe
>> thus printk() cannot schedule flush work to another cpu.
>>
>> This patch adds simple atomic counter of detected hardlockups and
>> flushes all per-cpu printk buffers in context softlockup watchdog
>> at any other cpu when it sees changes of this counter.
>>
>> Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
>> ---
>>   include/linux/nmi.h   |    1 +
>>   kernel/watchdog.c     |   22 ++++++++++++++++++++++
>>   kernel/watchdog_hld.c |    1 +
>>   3 files changed, 24 insertions(+)
>>
>> diff --git a/include/linux/nmi.h b/include/linux/nmi.h
>> index 9003e29cde46..8406df72ae5a 100644
>> --- a/include/linux/nmi.h
>> +++ b/include/linux/nmi.h
>> @@ -84,6 +84,7 @@ static inline void reset_hung_task_detector(void) { }
>>   #if defined(CONFIG_HARDLOCKUP_DETECTOR)
>>   extern void hardlockup_detector_disable(void);
>>   extern unsigned int hardlockup_panic;
>> +extern atomic_t hardlockup_detected;
>>   #else
>>   static inline void hardlockup_detector_disable(void) {}
>>   #endif
>> diff --git a/kernel/watchdog.c b/kernel/watchdog.c
>> index b6b1f54a7837..9f5c68fababe 100644
>> --- a/kernel/watchdog.c
>> +++ b/kernel/watchdog.c
>> @@ -92,6 +92,26 @@ static int __init hardlockup_all_cpu_backtrace_setup(char *str)
>>   }
>>   __setup("hardlockup_all_cpu_backtrace=", hardlockup_all_cpu_backtrace_setup);
>>   # endif /* CONFIG_SMP */
>> +
>> +atomic_t hardlockup_detected = ATOMIC_INIT(0);
>> +
>> +static inline void flush_hardlockup_messages(void)
>> +{
>> +	static atomic_t flushed = ATOMIC_INIT(0);
>> +
>> +	/* flush messages from hard lockup detector */
>> +	if (atomic_read(&hardlockup_detected) != atomic_read(&flushed)) {
>> +		atomic_set(&flushed, atomic_read(&hardlockup_detected));
>> +		printk_safe_flush();
>> +	}
>> +}
> 
> Do we really need two variables here? They may come into two different
> cache lines, and there will be double cache pollution just because of
> this simple check. Why not the below?

I don't think anybody could notice the read-only access to the second
variable. This executes once every several seconds.

The watchdogs already use the same pattern (monotonic counter + snapshot)
in a couple of places, so the code looks cleaner this way.

> 
> 	if (atomic_read(&hardlockup_detected)) {
> 		atomic_set(&hardlockup_detected, 0);
> 		printk_safe_flush();
> 	}
> 
> Or even, since atomic is not needed here, as it does not give any ordering guarantees.
> static inline void flush_hardlockup_messages(void)
> {
> 	if (READ_ONCE(&hardlockup_detected)) {
> 		WRITE_ONCE(&hardlockup_detected, 0);
> 		printk_safe_flush();
> 	}
> }
> 
> watchdog_timer_fn()
> {
> 	...
> 	WRITE_ONCE(&hardlockup_detected, 1);
> 	...
> }
> 
> Kirill
> 


* Re: [PATCH] kernel/watchdog: flush all printk nmi buffers when hardlockup detected
  2020-02-10  9:48 [PATCH] kernel/watchdog: flush all printk nmi buffers when hardlockup detected Konstantin Khlebnikov
  2020-02-10 22:51 ` Andrew Morton
  2020-02-11  8:14 ` Kirill Tkhai
@ 2020-02-12  1:15 ` Sergey Senozhatsky
  2020-02-12  2:49   ` Steven Rostedt
  2020-02-12 14:18   ` Petr Mladek
  2 siblings, 2 replies; 12+ messages in thread
From: Sergey Senozhatsky @ 2020-02-12  1:15 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Peter Zijlstra, linux-kernel, Steven Rostedt, Sergey Senozhatsky,
	Andrew Morton, Dmitry Monakhov, Konstantin Khlebnikov

On (20/02/10 12:48), Konstantin Khlebnikov wrote:
> 
> In NMI context printk() could save messages into per-cpu buffers and
> schedule flush by irq_work when IRQ are unblocked. This means message
> about hardlockup appears in kernel log only when/if lockup is gone.
> 
> Comment in irq_work_queue_on() states that remote IPI aren't NMI safe
> thus printk() cannot schedule flush work to another cpu.
> 
> This patch adds simple atomic counter of detected hardlockups and
> flushes all per-cpu printk buffers in context softlockup watchdog
> at any other cpu when it sees changes of this counter.

Petr, could you remind me, why do we do PRINTK_NMI_DIRECT_CONTEXT_MASK
only from ftrace?

	-ss


* Re: [PATCH] kernel/watchdog: flush all printk nmi buffers when hardlockup detected
  2020-02-12  1:15 ` Sergey Senozhatsky
@ 2020-02-12  2:49   ` Steven Rostedt
  2020-02-12  3:04     ` Sergey Senozhatsky
  2020-02-12 14:18   ` Petr Mladek
  1 sibling, 1 reply; 12+ messages in thread
From: Steven Rostedt @ 2020-02-12  2:49 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Petr Mladek, Peter Zijlstra, linux-kernel, Sergey Senozhatsky,
	Andrew Morton, Dmitry Monakhov, Konstantin Khlebnikov

On Wed, 12 Feb 2020 10:15:51 +0900
Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> wrote:

> On (20/02/10 12:48), Konstantin Khlebnikov wrote:
> > 
> > In NMI context printk() could save messages into per-cpu buffers and
> > schedule flush by irq_work when IRQ are unblocked. This means message
> > about hardlockup appears in kernel log only when/if lockup is gone.
> > 
> > Comment in irq_work_queue_on() states that remote IPI aren't NMI safe
> > thus printk() cannot schedule flush work to another cpu.
> > 
> > This patch adds simple atomic counter of detected hardlockups and
> > flushes all per-cpu printk buffers in context softlockup watchdog
> > at any other cpu when it sees changes of this counter.  
> 
> Petr, could you remind me, why do we do PRINTK_NMI_DIRECT_CONTEXT_MASK
> only from ftrace?

Could it be because its from ftrace_dump() which can spit out millions
of lines from NMI context?

-- Steve


* Re: [PATCH] kernel/watchdog: flush all printk nmi buffers when hardlockup detected
  2020-02-12  2:49   ` Steven Rostedt
@ 2020-02-12  3:04     ` Sergey Senozhatsky
  2020-02-12  3:27       ` Steven Rostedt
  0 siblings, 1 reply; 12+ messages in thread
From: Sergey Senozhatsky @ 2020-02-12  3:04 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Sergey Senozhatsky, Petr Mladek, Peter Zijlstra, linux-kernel,
	Sergey Senozhatsky, Andrew Morton, Dmitry Monakhov,
	Konstantin Khlebnikov

On (20/02/11 21:49), Steven Rostedt wrote:
> On Wed, 12 Feb 2020 10:15:51 +0900
> Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> wrote:
> 
> > On (20/02/10 12:48), Konstantin Khlebnikov wrote:
> > > 
> > > In NMI context printk() could save messages into per-cpu buffers and
> > > schedule flush by irq_work when IRQ are unblocked. This means message
> > > about hardlockup appears in kernel log only when/if lockup is gone.
> > > 
> > > Comment in irq_work_queue_on() states that remote IPI aren't NMI safe
> > > thus printk() cannot schedule flush work to another cpu.
> > > 
> > > This patch adds simple atomic counter of detected hardlockups and
> > > flushes all per-cpu printk buffers in context softlockup watchdog
> > > at any other cpu when it sees changes of this counter.
> > 
> > Petr, could you remind me, why do we do PRINTK_NMI_DIRECT_CONTEXT_MASK
> > only from ftrace?
> 
> Could it be because its from ftrace_dump() which can spit out millions
> of lines from NMI context?

Oh, yes, ftrace printks a lot. But I sort of forgot why we don't do
the same for "regular" NMIs. So NMIs use per-cpu buffers, except for
NMIs which involve an ftrace dump. I'm missing something here.

	-ss


* Re: [PATCH] kernel/watchdog: flush all printk nmi buffers when hardlockup detected
  2020-02-12  3:04     ` Sergey Senozhatsky
@ 2020-02-12  3:27       ` Steven Rostedt
  0 siblings, 0 replies; 12+ messages in thread
From: Steven Rostedt @ 2020-02-12  3:27 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Petr Mladek, Peter Zijlstra, linux-kernel, Sergey Senozhatsky,
	Andrew Morton, Dmitry Monakhov, Konstantin Khlebnikov

On Wed, 12 Feb 2020 12:04:03 +0900
Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> wrote:

> Oh, yes, ftrace printks a lot. But I sort of forgot why don't we do
> the same for "regular" NMIs. So NMIs use per-cpu buffers, expect for
> NMIs which involve ftrace dump. I'm missing something here.

Well, ftrace_dump() is called from NMI context when the system is about
to crash (ftrace_dump_on_oops). And at that moment, we care more about
the trace than about other output (it's most likely to contain the
information that caused the bug).

But for things like sysrq-l, which does printk in NMI context during
normal operation, we don't want strange races to occur and affect
other printk operations. Having them in buffers controls the output a
bit better.

But with the new printk ring buffer work, perhaps that's no longer
necessary.

-- Steve


* Re: [PATCH] kernel/watchdog: flush all printk nmi buffers when hardlockup detected
  2020-02-12  1:15 ` Sergey Senozhatsky
  2020-02-12  2:49   ` Steven Rostedt
@ 2020-02-12 14:18   ` Petr Mladek
  1 sibling, 0 replies; 12+ messages in thread
From: Petr Mladek @ 2020-02-12 14:18 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Peter Zijlstra, linux-kernel, Steven Rostedt, Sergey Senozhatsky,
	Andrew Morton, Dmitry Monakhov, Konstantin Khlebnikov

On Wed 2020-02-12 10:15:51, Sergey Senozhatsky wrote:
> On (20/02/10 12:48), Konstantin Khlebnikov wrote:
> > 
> > In NMI context printk() could save messages into per-cpu buffers and
> > schedule flush by irq_work when IRQ are unblocked. This means message
> > about hardlockup appears in kernel log only when/if lockup is gone.
> > 
> > Comment in irq_work_queue_on() states that remote IPI aren't NMI safe
> > thus printk() cannot schedule flush work to another cpu.
> > 
> > This patch adds simple atomic counter of detected hardlockups and
> > flushes all per-cpu printk buffers in context softlockup watchdog
> > at any other cpu when it sees changes of this counter.
> 
> Petr, could you remind me, why do we do PRINTK_NMI_DIRECT_CONTEXT_MASK
> only from ftrace?

There was a possible deadlock when printing backtraces from all CPUs.
The CPUs were serialized via a lock in nmi_cpu_backtrace(). One of
them might have been interrupted under logbuf_lock.

Direct printing was needed for ftrace because it printed too many
messages. And it was safe because the ftrace log was read from a
single CPU without any lock.

Best Regards,
Petr


* Re: [PATCH] kernel/watchdog: flush all printk nmi buffers when hardlockup detected
  2020-02-11 12:36   ` Konstantin Khlebnikov
@ 2020-02-12 14:54     ` Petr Mladek
  2020-02-13 13:05       ` Konstantin Khlebnikov
  0 siblings, 1 reply; 12+ messages in thread
From: Petr Mladek @ 2020-02-12 14:54 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: Kirill Tkhai, Peter Zijlstra, linux-kernel, Steven Rostedt,
	Sergey Senozhatsky, Andrew Morton, Dmitry Monakhov

On Tue 2020-02-11 15:36:02, Konstantin Khlebnikov wrote:
> On 11/02/2020 11.14, Kirill Tkhai wrote:
> > Hi, Konstantin,
> > 
> > On 10.02.2020 12:48, Konstantin Khlebnikov wrote:
> > > In NMI context printk() could save messages into per-cpu buffers and
> > > schedule flush by irq_work when IRQ are unblocked. This means message
> > > about hardlockup appears in kernel log only when/if lockup is gone.
> > > 
> > > Comment in irq_work_queue_on() states that remote IPI aren't NMI safe
> > > thus printk() cannot schedule flush work to another cpu.
> > > 
> > > This patch adds simple atomic counter of detected hardlockups and
> > > flushes all per-cpu printk buffers in context softlockup watchdog
> > > at any other cpu when it sees changes of this counter.
> > > 
> > > Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
> > > ---
> > >   include/linux/nmi.h   |    1 +
> > >   kernel/watchdog.c     |   22 ++++++++++++++++++++++
> > >   kernel/watchdog_hld.c |    1 +
> > >   3 files changed, 24 insertions(+)
> > > 
> > > diff --git a/include/linux/nmi.h b/include/linux/nmi.h
> > > index 9003e29cde46..8406df72ae5a 100644
> > > --- a/include/linux/nmi.h
> > > +++ b/include/linux/nmi.h
> > > @@ -84,6 +84,7 @@ static inline void reset_hung_task_detector(void) { }
> > >   #if defined(CONFIG_HARDLOCKUP_DETECTOR)
> > >   extern void hardlockup_detector_disable(void);
> > >   extern unsigned int hardlockup_panic;
> > > +extern atomic_t hardlockup_detected;
> > >   #else
> > >   static inline void hardlockup_detector_disable(void) {}
> > >   #endif
> > > diff --git a/kernel/watchdog.c b/kernel/watchdog.c
> > > index b6b1f54a7837..9f5c68fababe 100644
> > > --- a/kernel/watchdog.c
> > > +++ b/kernel/watchdog.c
> > > @@ -92,6 +92,26 @@ static int __init hardlockup_all_cpu_backtrace_setup(char *str)
> > >   }
> > >   __setup("hardlockup_all_cpu_backtrace=", hardlockup_all_cpu_backtrace_setup);
> > >   # endif /* CONFIG_SMP */
> > > +
> > > +atomic_t hardlockup_detected = ATOMIC_INIT(0);
> > > +
> > > +static inline void flush_hardlockup_messages(void)
> > > +{
> > > +	static atomic_t flushed = ATOMIC_INIT(0);
> > > +
> > > +	/* flush messages from hard lockup detector */
> > > +	if (atomic_read(&hardlockup_detected) != atomic_read(&flushed)) {
> > > +		atomic_set(&flushed, atomic_read(&hardlockup_detected));
> > > +		printk_safe_flush();
> > > +	}
> > > +}
> > 
> > Do we really need two variables here? They may come into two different
> > cache lines, and there will be double cache pollution just because of
> > this simple check. Why not the below?
> 
> I don't think anybody could notice read-only access to second variable.
> This executes once in several seconds.
> 
> Watchdogs already use same pattern (monotonic counter + snapshot) in
> couple places. So code looks more clean in this way.

It is not only about speed. It is also about code complexity
and correctness. Using two variables is more complex.
Calling atomic_read(&hardlockup_detected) twice does
not look like a correct pattern.

The single-variable pattern is used for similar things there
as well; for example, see hard_watchdog_warn and
hardlockup_allcpu_dumped.

I would call the variable "hardlockup_dump_flush" and
use the same logic as for hardlockup_allcpu_dumped.

Note that simple READ_ONCE(), WRITE_ONCE() are not enough
because they do not provide smp barriers.

Best Regards,
Petr


* Re: [PATCH] kernel/watchdog: flush all printk nmi buffers when hardlockup detected
  2020-02-12 14:54     ` Petr Mladek
@ 2020-02-13 13:05       ` Konstantin Khlebnikov
  0 siblings, 0 replies; 12+ messages in thread
From: Konstantin Khlebnikov @ 2020-02-13 13:05 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Kirill Tkhai, Peter Zijlstra, linux-kernel, Steven Rostedt,
	Sergey Senozhatsky, Andrew Morton, Dmitry Monakhov



On 12/02/2020 17.54, Petr Mladek wrote:
> On Tue 2020-02-11 15:36:02, Konstantin Khlebnikov wrote:
>> On 11/02/2020 11.14, Kirill Tkhai wrote:
>>> Hi, Konstantin,
>>>
>>> On 10.02.2020 12:48, Konstantin Khlebnikov wrote:
>>>> In NMI context printk() could save messages into per-cpu buffers and
>>>> schedule flush by irq_work when IRQ are unblocked. This means message
>>>> about hardlockup appears in kernel log only when/if lockup is gone.
>>>>
>>>> Comment in irq_work_queue_on() states that remote IPI aren't NMI safe
>>>> thus printk() cannot schedule flush work to another cpu.
>>>>
>>>> This patch adds simple atomic counter of detected hardlockups and
>>>> flushes all per-cpu printk buffers in context softlockup watchdog
>>>> at any other cpu when it sees changes of this counter.
>>>>
>>>> Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
>>>> ---
>>>>    include/linux/nmi.h   |    1 +
>>>>    kernel/watchdog.c     |   22 ++++++++++++++++++++++
>>>>    kernel/watchdog_hld.c |    1 +
>>>>    3 files changed, 24 insertions(+)
>>>>
>>>> diff --git a/include/linux/nmi.h b/include/linux/nmi.h
>>>> index 9003e29cde46..8406df72ae5a 100644
>>>> --- a/include/linux/nmi.h
>>>> +++ b/include/linux/nmi.h
>>>> @@ -84,6 +84,7 @@ static inline void reset_hung_task_detector(void) { }
>>>>    #if defined(CONFIG_HARDLOCKUP_DETECTOR)
>>>>    extern void hardlockup_detector_disable(void);
>>>>    extern unsigned int hardlockup_panic;
>>>> +extern atomic_t hardlockup_detected;
>>>>    #else
>>>>    static inline void hardlockup_detector_disable(void) {}
>>>>    #endif
>>>> diff --git a/kernel/watchdog.c b/kernel/watchdog.c
>>>> index b6b1f54a7837..9f5c68fababe 100644
>>>> --- a/kernel/watchdog.c
>>>> +++ b/kernel/watchdog.c
>>>> @@ -92,6 +92,26 @@ static int __init hardlockup_all_cpu_backtrace_setup(char *str)
>>>>    }
>>>>    __setup("hardlockup_all_cpu_backtrace=", hardlockup_all_cpu_backtrace_setup);
>>>>    # endif /* CONFIG_SMP */
>>>> +
>>>> +atomic_t hardlockup_detected = ATOMIC_INIT(0);
>>>> +
>>>> +static inline void flush_hardlockup_messages(void)
>>>> +{
>>>> +	static atomic_t flushed = ATOMIC_INIT(0);
>>>> +
>>>> +	/* flush messages from hard lockup detector */
>>>> +	if (atomic_read(&hardlockup_detected) != atomic_read(&flushed)) {
>>>> +		atomic_set(&flushed, atomic_read(&hardlockup_detected));
>>>> +		printk_safe_flush();
>>>> +	}
>>>> +}
>>>
>>> Do we really need two variables here? They may come into two different
>>> cache lines, and there will be double cache pollution just because of
>>> this simple check. Why not the below?
>>
>> I don't think anybody could notice read-only access to second variable.
>> This executes once in several seconds.
>>
>> Watchdogs already use same pattern (monotonic counter + snapshot) in
>> couple places. So code looks more clean in this way.
> 
> It is not only about speed. It is also about code complexity
> and correctness. Using two variables is more complex.
> Calling atomic_read(&hardlockup_detected) twice does
> not look like a correct pattern.

It guarantees "at least once", which is enough for this case.

> 
> The single variable patter is used for similar things there
> as well, for example, see hard_watchdog_warn,

> hardlockup_allcpu_dumped.

Ouch. This works only once and there is no way to rearm it.
Now I see why this thing never worked for me in recent years =)

Maybe it's better to reset sysctl_hardlockup_all_cpu_backtrace
to let the user set the sysctl back to 1.
Or rearm it in the softlockup watchdog after a timeout.

> 
> I would call the variable "hardlockup_dump_flush" and
> use the same logic as for hardlockup_allcpu_dumped.
> 
> Note that simple READ_ONCE(), WRITE_ONCE() are not enough
> because they do not provide smp barriers.

It's hard to imagine an arch which actually needs barriers here.
The softlockup timers will eventually see the change.

> 
> Best Regards,
> Petr
> 

