linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Removal of printk safe buffers delays NMI context printk
@ 2021-11-04 15:54 Nicholas Piggin
  2021-11-04 16:18 ` John Ogness
  0 siblings, 1 reply; 13+ messages in thread
From: Nicholas Piggin @ 2021-11-04 15:54 UTC (permalink / raw)
  To: John Ogness; +Cc: Petr Mladek, linux-kernel

Hi John,

It seems printk from NMI context is now delayed indefinitely and
there is no printk_safe_flush equivalent (or I can't see one) to
allow a NMI buffer to be flushed by a different CPU.

This causes hard lockup watchdog messages to not get shown on the
console. I can call printk from a different CPU and that seems to
flush the stuck CPU's NMI buffer immediately.

What's the best way to expose this? Can we have something like tihs?

void printk_flush(void)
{
	preempt_disable();
	if (console_trylock_spinning())
		console_unlock();
	preempt_enable();
        wake_up_klogd();
}

Thanks,
Nick


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Removal of printk safe buffers delays NMI context printk
  2021-11-04 15:54 Removal of printk safe buffers delays NMI context printk Nicholas Piggin
@ 2021-11-04 16:18 ` John Ogness
  2021-11-05  1:26   ` Nicholas Piggin
  0 siblings, 1 reply; 13+ messages in thread
From: John Ogness @ 2021-11-04 16:18 UTC (permalink / raw)
  To: Nicholas Piggin; +Cc: Petr Mladek, linux-kernel

Hi Nick,

On 2021-11-05, Nicholas Piggin <npiggin@gmail.com> wrote:
> It seems printk from NMI context is now delayed indefinitely and
> there is no printk_safe_flush equivalent (or I can't see one) to
> allow a NMI buffer to be flushed by a different CPU.

NMI flushing is triggered using irq work (for the same CPU). This should
not have changed recently. Are you reporting a new issue?

> This causes hard lockup watchdog messages to not get shown on the
> console. I can call printk from a different CPU and that seems to
> flush the stuck CPU's NMI buffer immediately.

Perhaps we should be triggering the irq work on multiple CPUs if from
NMI context?

> What's the best way to expose this? Can we have something like tihs?
>
> void printk_flush(void)
> {
> 	preempt_disable();
> 	if (console_trylock_spinning())
> 		console_unlock();
> 	preempt_enable();
>         wake_up_klogd();
> }

We are planning on implementing a pr_flush() that will do something
similar. But I am wondering how you are planning on triggering a CPU to
call that function.

John Ogness

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Removal of printk safe buffers delays NMI context printk
  2021-11-04 16:18 ` John Ogness
@ 2021-11-05  1:26   ` Nicholas Piggin
  2021-11-05  9:55     ` John Ogness
  2021-11-05 16:01     ` Petr Mladek
  0 siblings, 2 replies; 13+ messages in thread
From: Nicholas Piggin @ 2021-11-05  1:26 UTC (permalink / raw)
  To: John Ogness; +Cc: linux-kernel, Petr Mladek, Laurent Dufour

Excerpts from John Ogness's message of November 5, 2021 2:18 am:
> Hi Nick,
> 
> On 2021-11-05, Nicholas Piggin <npiggin@gmail.com> wrote:
>> It seems printk from NMI context is now delayed indefinitely and
>> there is no printk_safe_flush equivalent (or I can't see one) to
>> allow a NMI buffer to be flushed by a different CPU.
> 
> NMI flushing is triggered using irq work (for the same CPU). This should
> not have changed recently. Are you reporting a new issue?

Previously prink_safe_flush from another CPU would flush those out.

>> This causes hard lockup watchdog messages to not get shown on the
>> console. I can call printk from a different CPU and that seems to
>> flush the stuck CPU's NMI buffer immediately.
> 
> Perhaps we should be triggering the irq work on multiple CPUs if from
> NMI context?

Possibly. I don't know if it's really needed because the irq work should 
usually be fine, the problem are lockup detector messages where the
printing CPU does not enable irqs after it returns from NMI.

> 
>> What's the best way to expose this? Can we have something like tihs?
>>
>> void printk_flush(void)
>> {
>> 	preempt_disable();
>> 	if (console_trylock_spinning())
>> 		console_unlock();
>> 	preempt_enable();
>>         wake_up_klogd();
>> }
> 
> We are planning on implementing a pr_flush() that will do something
> similar. But I am wondering how you are planning on triggering a CPU to
> call that function.

Similar way as was removed by commit 93d102f094b ("printk: remove safe 
buffers") fro nmi_backtrace and powerpc watchdog. The following patch
is what I'm using which seems to work for me.

What would pr_flush do differently and where were you planning to use 
it?

Thanks,
Nick
--
    printk: restore flushing of NMI buffers on remote CPUs after NMI backtraces
    
    printk from NMI context relies on irq work being raised on the local CPU
    to print to console. This can be a problem if the NMI was raised by a
    lockup detector to print lockup stack and regs, because the CPU may not
    enable irqs (because it is locked up).
    
    Introduce printk_flush() that can be called from non-NMI context on
    another CPU to try to get those messages to the console.
    
    Fixes: 93d102f094be ("printk: remove safe buffers")
    Signed-off-by: Nicholas Piggin <npiggin@gmail.com>

diff --git a/arch/powerpc/kernel/watchdog.c b/arch/powerpc/kernel/watchdog.c
index 5f69ba4de1f3..59ddb24797ae 100644
--- a/arch/powerpc/kernel/watchdog.c
+++ b/arch/powerpc/kernel/watchdog.c
@@ -227,6 +227,12 @@ static void watchdog_smp_panic(int cpu)
 		cpumask_clear(&wd_smp_cpus_ipi);
 	}
 
+	/*
+	 * Force flush any remote buffers that might be stuck in IRQ context
+	 * and therefore could not run their irq_work.
+	 */
+	printk_flush();
+
 	if (hardlockup_panic)
 		nmi_panic(NULL, "Hard LOCKUP");
 
diff --git a/include/linux/printk.h b/include/linux/printk.h
index 85b656f82d75..50424ad87fb5 100644
--- a/include/linux/printk.h
+++ b/include/linux/printk.h
@@ -198,6 +198,7 @@ void dump_stack_print_info(const char *log_lvl);
 void show_regs_print_info(const char *log_lvl);
 extern asmlinkage void dump_stack_lvl(const char *log_lvl) __cold;
 extern asmlinkage void dump_stack(void) __cold;
+void printk_flush(void);
 #else
 static inline __printf(1, 0)
 int vprintk(const char *s, va_list args)
@@ -274,6 +275,9 @@ static inline void dump_stack_lvl(const char *log_lvl)
 static inline void dump_stack(void)
 {
 }
+static inline void printk_flush(void)
+{
+}
 #endif
 
 #ifdef CONFIG_SMP
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index a8d0a58deebc..dd9471bb58c2 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -2206,6 +2206,26 @@ int vprintk_store(int facility, int level,
 	return ret;
 }
 
+void printk_flush(void)
+{
+	/*
+	 * Disable preemption to avoid being preempted while holding
+	 * console_sem which would prevent anyone from printing to
+	 * console
+	 */
+	preempt_disable();
+	/*
+	 * Try to acquire and then immediately release the console
+	 * semaphore.  The release will print out buffers and wake up
+	 * /dev/kmsg and syslog() users.
+	 */
+	if (console_trylock_spinning())
+		console_unlock();
+	preempt_enable();
+
+	wake_up_klogd();
+}
+
 asmlinkage int vprintk_emit(int facility, int level,
 			    const struct dev_printk_info *dev_info,
 			    const char *fmt, va_list args)
@@ -2228,24 +2248,11 @@ asmlinkage int vprintk_emit(int facility, int level,
 	printed_len = vprintk_store(facility, level, dev_info, fmt, args);
 
 	/* If called from the scheduler, we can not call up(). */
-	if (!in_sched) {
-		/*
-		 * Disable preemption to avoid being preempted while holding
-		 * console_sem which would prevent anyone from printing to
-		 * console
-		 */
-		preempt_disable();
-		/*
-		 * Try to acquire and then immediately release the console
-		 * semaphore.  The release will print out buffers and wake up
-		 * /dev/kmsg and syslog() users.
-		 */
-		if (console_trylock_spinning())
-			console_unlock();
-		preempt_enable();
-	}
+	if (!in_sched)
+		printk_flush();
+	else
+		wake_up_klogd();
 
-	wake_up_klogd();
 	return printed_len;
 }
 EXPORT_SYMBOL(vprintk_emit);
diff --git a/lib/nmi_backtrace.c b/lib/nmi_backtrace.c
index f9e89001b52e..b735b2e24902 100644
--- a/lib/nmi_backtrace.c
+++ b/lib/nmi_backtrace.c
@@ -75,6 +75,12 @@ void nmi_trigger_cpumask_backtrace(const cpumask_t *mask,
 		touch_softlockup_watchdog();
 	}
 
+	/*
+	 * Force flush any remote buffers that might be stuck in IRQ context
+	 * and therefore could not run their irq_work.
+	 */
+	printk_flush();
+
 	clear_bit_unlock(0, &backtrace_flag);
 	put_cpu();
 }


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: Removal of printk safe buffers delays NMI context printk
  2021-11-05  1:26   ` Nicholas Piggin
@ 2021-11-05  9:55     ` John Ogness
  2021-11-05 11:43       ` Nicholas Piggin
  2021-11-05 16:01     ` Petr Mladek
  1 sibling, 1 reply; 13+ messages in thread
From: John Ogness @ 2021-11-05  9:55 UTC (permalink / raw)
  To: Nicholas Piggin; +Cc: linux-kernel, Petr Mladek, Laurent Dufour

On 2021-11-05, Nicholas Piggin <npiggin@gmail.com> wrote:
>> We are planning on implementing a pr_flush() that will do something
>> similar. But I am wondering how you are planning on triggering a CPU
>> to call that function.
>
> Similar way as was removed by commit 93d102f094b ("printk: remove safe
> buffers") fro nmi_backtrace and powerpc watchdog. The following patch
> is what I'm using which seems to work for me.

What was removed from 93d102f094b was irq_work triggering on all
CPUs. Your patch is attempting to directly print from a non-NMI
context. This is obviously a better approach if a caller is known to be
non-NMI.

Also, before 93d102f094b the safe buffer flushing was performed before
triggering the NMI backtraces (in some cases). So clearly it worked
before by accident rather than by design. (Not that that really
matters.)

> What would pr_flush do differently and where were you planning to use
> it?

Currently, and particularly when we move to threaded printers, printk
callers do not know when their messages are actually visible. However,
in some cases the printk caller (or some related task) wants to be sure
the message is visible.

pr_flush() will allow a task to block until all pending messages (at the
time of the call) are output on all consoles.

Your patch is doing something similar for non-NMI contexts, except that
it will exit early if it fails to get the console lock. For your case
this is probably acceptable because you are only concerned that some
context is printing the messages, even if the latest messages may not
have been printed yet.

>     printk: restore flushing of NMI buffers on remote CPUs after NMI backtraces
>     
>     printk from NMI context relies on irq work being raised on the local CPU
>     to print to console. This can be a problem if the NMI was raised by a
>     lockup detector to print lockup stack and regs, because the CPU may not
>     enable irqs (because it is locked up).
>     
>     Introduce printk_flush() that can be called from non-NMI context on
>     another CPU to try to get those messages to the console.
>     
>     Fixes: 93d102f094be ("printk: remove safe buffers")
>     Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
>
> diff --git a/arch/powerpc/kernel/watchdog.c b/arch/powerpc/kernel/watchdog.c
> index 5f69ba4de1f3..59ddb24797ae 100644
> --- a/arch/powerpc/kernel/watchdog.c
> +++ b/arch/powerpc/kernel/watchdog.c
> @@ -227,6 +227,12 @@ static void watchdog_smp_panic(int cpu)
>  		cpumask_clear(&wd_smp_cpus_ipi);
>  	}
>  
> +	/*
> +	 * Force flush any remote buffers that might be stuck in IRQ context
> +	 * and therefore could not run their irq_work.
> +	 */
> +	printk_flush();
> +

nmi_trigger_cpumask_backtrace() tracks and waits (up to 10 seconds) for
the CPUs to print their backtrace before flushing. Here there is no
waiting. I suppose in practice the NMI backtrace will be fast enough,
but I am just wondering if there should be some sort of tracking/waiting
like in nmi_trigger_cpumask_backtrace().

>  	if (hardlockup_panic)
>  		nmi_panic(NULL, "Hard LOCKUP");
>  
> diff --git a/include/linux/printk.h b/include/linux/printk.h
> index 85b656f82d75..50424ad87fb5 100644
> --- a/include/linux/printk.h
> +++ b/include/linux/printk.h
> @@ -198,6 +198,7 @@ void dump_stack_print_info(const char *log_lvl);
>  void show_regs_print_info(const char *log_lvl);
>  extern asmlinkage void dump_stack_lvl(const char *log_lvl) __cold;
>  extern asmlinkage void dump_stack(void) __cold;
> +void printk_flush(void);
>  #else
>  static inline __printf(1, 0)
>  int vprintk(const char *s, va_list args)
> @@ -274,6 +275,9 @@ static inline void dump_stack_lvl(const char *log_lvl)
>  static inline void dump_stack(void)
>  {
>  }
> +static inline void printk_flush(void)
> +{
> +}
>  #endif
>  
>  #ifdef CONFIG_SMP
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index a8d0a58deebc..dd9471bb58c2 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -2206,6 +2206,26 @@ int vprintk_store(int facility, int level,
>  	return ret;
>  }
>  
> +void printk_flush(void)
> +{
> +	/*
> +	 * Disable preemption to avoid being preempted while holding
> +	 * console_sem which would prevent anyone from printing to
> +	 * console
> +	 */
> +	preempt_disable();
> +	/*
> +	 * Try to acquire and then immediately release the console
> +	 * semaphore.  The release will print out buffers and wake up
> +	 * /dev/kmsg and syslog() users.
> +	 */
> +	if (console_trylock_spinning())
> +		console_unlock();
> +	preempt_enable();
> +
> +	wake_up_klogd();
> +}

I have mixed feelings about this printk_flush() because it is exactly
this code that is about to undergo massive reworking. If this function
is accepted, then I wonder if printk_trigger_flush() might be a better
name since it does not necessarily wait until the messages are
printed. (There could be another task printing and already a queued
handover printer. Or this task could handover printing before
finishing.)

John Ogness

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Removal of printk safe buffers delays NMI context printk
  2021-11-05  9:55     ` John Ogness
@ 2021-11-05 11:43       ` Nicholas Piggin
  2021-11-05 13:57         ` John Ogness
  0 siblings, 1 reply; 13+ messages in thread
From: Nicholas Piggin @ 2021-11-05 11:43 UTC (permalink / raw)
  To: John Ogness; +Cc: Laurent
 Dufour, linux-kernel, Petr Mladek

Excerpts from John Ogness's message of November 5, 2021 7:55 pm:
> On 2021-11-05, Nicholas Piggin <npiggin@gmail.com> wrote:
>>> We are planning on implementing a pr_flush() that will do something
>>> similar. But I am wondering how you are planning on triggering a CPU
>>> to call that function.
>>
>> Similar way as was removed by commit 93d102f094b ("printk: remove safe
>> buffers") fro nmi_backtrace and powerpc watchdog. The following patch
>> is what I'm using which seems to work for me.
> 
> What was removed from 93d102f094b was irq_work triggering on all
> CPUs.

No, it was the caller executing the flush for all remote CPUs itself.
irq work was not involved (and irq work can't be raised in a remote
CPU from NMI context).

> Your patch is attempting to directly print from a non-NMI
> context. This is obviously a better approach if a caller is known to be
> non-NMI.
> 
> Also, before 93d102f094b the safe buffer flushing was performed before
> triggering the NMI backtraces (in some cases). So clearly it worked
> before by accident rather than by design. (Not that that really
> matters.)

That's not the case, in the core nmi_backtrace.c, the flush happens
afterward.

In the powerpc code it's not actually the case either but it wasn't
obvious why at least should have been commented. powerpc's
trigger_allbutself_cpu_backtrace() is not a NMI IPI, just a normal
one. So if a CPU takes it then it will flush its own buffer via
irq_work. You only have to do the printk flush for the case of the
true NMI IPI which comes before.

That code does need a bit more work for other reasons, but we do
need that printk flush capability back there and for nmi_backtrace.

>> What would pr_flush do differently and where were you planning to use
>> it?
> 
> Currently, and particularly when we move to threaded printers, printk
> callers do not know when their messages are actually visible. However,
> in some cases the printk caller (or some related task) wants to be sure
> the message is visible.
> 
> pr_flush() will allow a task to block until all pending messages (at the
> time of the call) are output on all consoles.
> 
> Your patch is doing something similar for non-NMI contexts, except that
> it will exit early if it fails to get the console lock. For your case
> this is probably acceptable because you are only concerned that some
> context is printing the messages, even if the latest messages may not
> have been printed yet.

Yep, if something else has the lock hopefully it should release it and
do the flush when it does. We're already 10 seconds delayed here so a
bit longer doesn't matter, just needs to get out at some point.

>>     printk: restore flushing of NMI buffers on remote CPUs after NMI backtraces
>>     
>>     printk from NMI context relies on irq work being raised on the local CPU
>>     to print to console. This can be a problem if the NMI was raised by a
>>     lockup detector to print lockup stack and regs, because the CPU may not
>>     enable irqs (because it is locked up).
>>     
>>     Introduce printk_flush() that can be called from non-NMI context on
>>     another CPU to try to get those messages to the console.
>>     
>>     Fixes: 93d102f094be ("printk: remove safe buffers")
>>     Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
>>
>> diff --git a/arch/powerpc/kernel/watchdog.c b/arch/powerpc/kernel/watchdog.c
>> index 5f69ba4de1f3..59ddb24797ae 100644
>> --- a/arch/powerpc/kernel/watchdog.c
>> +++ b/arch/powerpc/kernel/watchdog.c
>> @@ -227,6 +227,12 @@ static void watchdog_smp_panic(int cpu)
>>  		cpumask_clear(&wd_smp_cpus_ipi);
>>  	}
>>  
>> +	/*
>> +	 * Force flush any remote buffers that might be stuck in IRQ context
>> +	 * and therefore could not run their irq_work.
>> +	 */
>> +	printk_flush();
>> +
> 
> nmi_trigger_cpumask_backtrace() tracks and waits (up to 10 seconds) for
> the CPUs to print their backtrace before flushing. Here there is no
> waiting. I suppose in practice the NMI backtrace will be fast enough,
> but I am just wondering if there should be some sort of tracking/waiting
> like in nmi_trigger_cpumask_backtrace().

Yeah it has tended to work, but I have something to fix that up.

> 
>>  	if (hardlockup_panic)
>>  		nmi_panic(NULL, "Hard LOCKUP");
>>  
>> diff --git a/include/linux/printk.h b/include/linux/printk.h
>> index 85b656f82d75..50424ad87fb5 100644
>> --- a/include/linux/printk.h
>> +++ b/include/linux/printk.h
>> @@ -198,6 +198,7 @@ void dump_stack_print_info(const char *log_lvl);
>>  void show_regs_print_info(const char *log_lvl);
>>  extern asmlinkage void dump_stack_lvl(const char *log_lvl) __cold;
>>  extern asmlinkage void dump_stack(void) __cold;
>> +void printk_flush(void);
>>  #else
>>  static inline __printf(1, 0)
>>  int vprintk(const char *s, va_list args)
>> @@ -274,6 +275,9 @@ static inline void dump_stack_lvl(const char *log_lvl)
>>  static inline void dump_stack(void)
>>  {
>>  }
>> +static inline void printk_flush(void)
>> +{
>> +}
>>  #endif
>>  
>>  #ifdef CONFIG_SMP
>> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
>> index a8d0a58deebc..dd9471bb58c2 100644
>> --- a/kernel/printk/printk.c
>> +++ b/kernel/printk/printk.c
>> @@ -2206,6 +2206,26 @@ int vprintk_store(int facility, int level,
>>  	return ret;
>>  }
>>  
>> +void printk_flush(void)
>> +{
>> +	/*
>> +	 * Disable preemption to avoid being preempted while holding
>> +	 * console_sem which would prevent anyone from printing to
>> +	 * console
>> +	 */
>> +	preempt_disable();
>> +	/*
>> +	 * Try to acquire and then immediately release the console
>> +	 * semaphore.  The release will print out buffers and wake up
>> +	 * /dev/kmsg and syslog() users.
>> +	 */
>> +	if (console_trylock_spinning())
>> +		console_unlock();
>> +	preempt_enable();
>> +
>> +	wake_up_klogd();
>> +}
> 
> I have mixed feelings about this printk_flush() because it is exactly
> this code that is about to undergo massive reworking.

Better to fix up the regression before said massive reworking I think.

> If this function
> is accepted, then I wonder if printk_trigger_flush() might be a better
> name since it does not necessarily wait until the messages are
> printed. (There could be another task printing and already a queued
> handover printer. Or this task could handover printing before
> finishing.)

Sure I'm not wedded to the name. I can resend with the name change if
there's no other issues (the powerpc fixups will go as separate series).

Thanks,
Nick

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Removal of printk safe buffers delays NMI context printk
  2021-11-05 11:43       ` Nicholas Piggin
@ 2021-11-05 13:57         ` John Ogness
  2021-11-05 16:23           ` Petr Mladek
  2021-11-05 23:48           ` Nicholas Piggin
  0 siblings, 2 replies; 13+ messages in thread
From: John Ogness @ 2021-11-05 13:57 UTC (permalink / raw)
  To: Nicholas Piggin; +Cc: Laurent Dufour, linux-kernel, Petr Mladek

On 2021-11-05, Nicholas Piggin <npiggin@gmail.com> wrote:
>> What was removed from 93d102f094b was irq_work triggering on all
>> CPUs.
>
> No, it was the caller executing the flush for all remote CPUs itself.
> irq work was not involved (and irq work can't be raised in a remote
> CPU from NMI context).

Maybe I am missing something. In 93d102f094b~1 I see:

watchdog_smp_panic
  printk_safe_flush
    __printk_safe_flush
      printk_safe_flush_buffer
        printk_safe_flush_line
          printk_deferred
            vprintk_deferred
              vprintk_emit (but no direct printing)
              defer_console_output
                irq_work_queue

AFAICT, using defer_console_output() instead of your new printk_flush()
should cause the exact behavior as before.

> but we do need that printk flush capability back there and for
> nmi_backtrace.

Agreed. I had not considered this necessary side-effect when I removed
the NMI safe buffers.

I am just wondering if we should fix the regression by going back to
using irq_work (such as defer_console_output()) or if we want to
introduce something new that introduces direct printing.

John Ogness

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Removal of printk safe buffers delays NMI context printk
  2021-11-05  1:26   ` Nicholas Piggin
  2021-11-05  9:55     ` John Ogness
@ 2021-11-05 16:01     ` Petr Mladek
  1 sibling, 0 replies; 13+ messages in thread
From: Petr Mladek @ 2021-11-05 16:01 UTC (permalink / raw)
  To: Nicholas Piggin; +Cc: John Ogness, linux-kernel, Laurent Dufour

On Fri 2021-11-05 11:26:35, Nicholas Piggin wrote:
>     printk: restore flushing of NMI buffers on remote CPUs after NMI backtraces
>     
>     printk from NMI context relies on irq work being raised on the local CPU
>     to print to console. This can be a problem if the NMI was raised by a
>     lockup detector to print lockup stack and regs, because the CPU may not
>     enable irqs (because it is locked up).
>     
>     Introduce printk_flush() that can be called from non-NMI context on
>     another CPU to try to get those messages to the console.
>     
>     Fixes: 93d102f094be ("printk: remove safe buffers")
>     Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> 
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -2206,6 +2206,26 @@ int vprintk_store(int facility, int level,
>  	return ret;
>  }
>  
> +void printk_flush(void)
> +{
> +	/*
> +	 * Disable preemption to avoid being preempted while holding
> +	 * console_sem which would prevent anyone from printing to
> +	 * console
> +	 */
> +	preempt_disable();
> +	/*
> +	 * Try to acquire and then immediately release the console
> +	 * semaphore.  The release will print out buffers and wake up
> +	 * /dev/kmsg and syslog() users.
> +	 */
> +	if (console_trylock_spinning())
> +		console_unlock();
> +	preempt_enable();
> +
> +	wake_up_klogd();
> +}
> +
>  asmlinkage int vprintk_emit(int facility, int level,
>  			    const struct dev_printk_info *dev_info,
>  			    const char *fmt, va_list args)
> --- a/lib/nmi_backtrace.c
> +++ b/lib/nmi_backtrace.c
> @@ -75,6 +75,12 @@ void nmi_trigger_cpumask_backtrace(const cpumask_t *mask,
>  		touch_softlockup_watchdog();
>  	}
>  
> +	/*
> +	 * Force flush any remote buffers that might be stuck in IRQ context
> +	 * and therefore could not run their irq_work.
> +	 */
> +	printk_flush();

IMHO, nmi_trigger_cpumask_backtrace() might be called also from NMI
context.

printk_flush() is not safe in NMI context because console drivers have
their own locks. Even conosle_trylock() takes a lock, see
raw_spin_lock_irqsave(&sem->lock, flags) in down_trylock().

Best Regards,
Petr

> +
>  	clear_bit_unlock(0, &backtrace_flag);
>  	put_cpu();
>  }

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Removal of printk safe buffers delays NMI context printk
  2021-11-05 13:57         ` John Ogness
@ 2021-11-05 16:23           ` Petr Mladek
  2021-11-05 16:44             ` John Ogness
  2021-11-05 23:57             ` Nicholas Piggin
  2021-11-05 23:48           ` Nicholas Piggin
  1 sibling, 2 replies; 13+ messages in thread
From: Petr Mladek @ 2021-11-05 16:23 UTC (permalink / raw)
  To: John Ogness; +Cc: Nicholas Piggin, Laurent Dufour, linux-kernel

On Fri 2021-11-05 15:03:27, John Ogness wrote:
> On 2021-11-05, Nicholas Piggin <npiggin@gmail.com> wrote:
> >> What was removed from 93d102f094b was irq_work triggering on all
> >> CPUs.
> >
> > No, it was the caller executing the flush for all remote CPUs itself.
> > irq work was not involved (and irq work can't be raised in a remote
> > CPU from NMI context).
> 
> Maybe I am missing something. In 93d102f094b~1 I see:
> 
> watchdog_smp_panic
>   printk_safe_flush
>     __printk_safe_flush
>       printk_safe_flush_buffer
>         printk_safe_flush_line
>           printk_deferred
>             vprintk_deferred
>               vprintk_emit (but no direct printing)
>               defer_console_output
>                 irq_work_queue
> 
> AFAICT, using defer_console_output() instead of your new printk_flush()
> should cause the exact behavior as before.

I agree. printk_safe_flush() used printk_deferred(). It only queued
the irq_work and never called consoles directly.

> > but we do need that printk flush capability back there and for
> > nmi_backtrace.
> 
> Agreed. I had not considered this necessary side-effect when I removed
> the NMI safe buffers.

Honestly, I do not understand why it stopped working or how
it worked before.

printk() calls vprintk(). Current vprintk() does:

asmlinkage int vprintk(const char *fmt, va_list args)
{
[...]
	/*
	 * Use the main logbuf even in NMI. But avoid calling console
	 * drivers that might have their own locks.
	 */
	if (this_cpu_read(printk_context) || in_nmi()) {
		int len;

		len = vprintk_store(0, LOGLEVEL_DEFAULT, NULL, fmt, args);
		defer_console_output();
		return len;
	}

	/* No obstacles. */
	return vprintk_default(fmt, args);
}

By other words, current vprintk():

   + queues irq_work() in NMI context
   + tries to flush consoles immeditely otherwise


> I am just wondering if we should fix the regression by going back to
> using irq_work (such as defer_console_output()) or if we want to
> introduce something new that introduces direct printing.

Yup, defer_console_output() should do the same as printk_safe_flush()
before. We do not longer need to copy the messages because they are
already in the main lockless log buffer.

Well, I am curious about the original code. The commit 93d102f094be9beab28e5
("printk: remove safe buffers") did the following:

diff --git a/arch/powerpc/kernel/watchdog.c b/arch/powerpc/kernel/watchdog.c
index c9a8f4781a10..dc17d8903d4f 100644
--- a/arch/powerpc/kernel/watchdog.c
+++ b/arch/powerpc/kernel/watchdog.c
@@ -183,11 +183,6 @@ static void watchdog_smp_panic(int cpu, u64 tb)
 
        wd_smp_unlock(&flags);
 
-       printk_safe_flush();
-       /*
-        * printk_safe_flush() seems to require another print
-        * before anything actually goes out to console.
-        */
        if (sysctl_hardlockup_all_cpu_backtrace)
                trigger_allbutself_cpu_backtrace();

And I am curious because:

   + Why was printk_safe_flush() called before triggering backtraces
     on other CPUs?

   + The comment says that another print is needed before the messages
     goes to the console. It makes sense because printk_safe_flush()
     only set irq_work. But the patch did not remove any printk().
     So, nobody called any printk() even before.

Best Regards,
Petr

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: Removal of printk safe buffers delays NMI context printk
  2021-11-05 16:23           ` Petr Mladek
@ 2021-11-05 16:44             ` John Ogness
  2021-11-06  0:26               ` Nicholas Piggin
  2021-11-05 23:57             ` Nicholas Piggin
  1 sibling, 1 reply; 13+ messages in thread
From: John Ogness @ 2021-11-05 16:44 UTC (permalink / raw)
  To: Petr Mladek; +Cc: Nicholas Piggin, Laurent Dufour, linux-kernel

On 2021-11-05, Petr Mladek <pmladek@suse.com> wrote:
> On Fri 2021-11-05 15:03:27, John Ogness wrote:
>> On 2021-11-05, Nicholas Piggin <npiggin@gmail.com> wrote:
>>> but we do need that printk flush capability back there and for
>>> nmi_backtrace.
>> 
>> Agreed. I had not considered this necessary side-effect when I
>> removed the NMI safe buffers.
>
> Honestly, I do not understand why it stopped working or how
> it worked before.

IIUC, Nick is presenting a problem where a lockup on the other CPUs is
detected. Those CPUs will dump their backtraces per NMI context. But in
their lockup state the irq_work for those CPUs is not functional. So
even though the messages are in the buffer, there is no one printing the
buffer.

printk_safe_flush() would dump the NMI safe buffers for all the CPUs
into the printk buffer, then trigger an irq_work on itself (the
non-locked-up CPU).

That irq_work trigger was critical, because the other CPUs (which also
triggered irq_works for themselves) aren't able to process irq_works. I
did not consider this case. Which is why we still need to trigger
irq_work here. (Or, as the removed comment hinted at, add some printk()
call to either directly print or trigger the irq_work.)

John Ogness

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Removal of printk safe buffers delays NMI context printk
  2021-11-05 13:57         ` John Ogness
  2021-11-05 16:23           ` Petr Mladek
@ 2021-11-05 23:48           ` Nicholas Piggin
  1 sibling, 0 replies; 13+ messages in thread
From: Nicholas Piggin @ 2021-11-05 23:48 UTC (permalink / raw)
  To: John Ogness; +Cc: Laurent
 Dufour, linux-kernel, Petr Mladek

Excerpts from John Ogness's message of November 5, 2021 11:57 pm:
> On 2021-11-05, Nicholas Piggin <npiggin@gmail.com> wrote:
>>> What was removed from 93d102f094b was irq_work triggering on all
>>> CPUs.
>>
>> No, it was the caller executing the flush for all remote CPUs itself.
>> irq work was not involved (and irq work can't be raised in a remote
>> CPU from NMI context).
> 
> Maybe I am missing something. In 93d102f094b~1 I see:
> 
> watchdog_smp_panic
>   printk_safe_flush
>     __printk_safe_flush
>       printk_safe_flush_buffer
>         printk_safe_flush_line
>           printk_deferred
>             vprintk_deferred
>               vprintk_emit (but no direct printing)
>               defer_console_output
>                 irq_work_queue

Oh I thought you meant irq_work triggering on all CPUs (i.e., including
remote CPUs) was the key.

> AFAICT, using defer_console_output() instead of your new printk_flush()
> should cause the exact behavior as before.

It does.

>> but we do need that printk flush capability back there and for
>> nmi_backtrace.
> 
> Agreed. I had not considered this necessary side-effect when I removed
> the NMI safe buffers.
> 
> I am just wondering if we should fix the regression by going back to
> using irq_work (such as defer_console_output()) or if we want to
> introduce something new that introduces direct printing.

irq_work works for this situation so for a minimal fix I think it's 
fine. When you do the big rework it would be okay to do it directly
if you have such a facility for other reaons.

Thanks,
Nick

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Removal of printk safe buffers delays NMI context printk
  2021-11-05 16:23           ` Petr Mladek
  2021-11-05 16:44             ` John Ogness
@ 2021-11-05 23:57             ` Nicholas Piggin
  1 sibling, 0 replies; 13+ messages in thread
From: Nicholas Piggin @ 2021-11-05 23:57 UTC (permalink / raw)
  To: John Ogness, Petr Mladek; +Cc: Laurent Dufour, linux-kernel

Excerpts from Petr Mladek's message of November 6, 2021 2:23 am:
> On Fri 2021-11-05 15:03:27, John Ogness wrote:
>> On 2021-11-05, Nicholas Piggin <npiggin@gmail.com> wrote:
>> >> What was removed from 93d102f094b was irq_work triggering on all
>> >> CPUs.
>> >
>> > No, it was the caller executing the flush for all remote CPUs itself.
>> > irq work was not involved (and irq work can't be raised in a remote
>> > CPU from NMI context).
>> 
>> Maybe I am missing something. In 93d102f094b~1 I see:
>> 
>> watchdog_smp_panic
>>   printk_safe_flush
>>     __printk_safe_flush
>>       printk_safe_flush_buffer
>>         printk_safe_flush_line
>>           printk_deferred
>>             vprintk_deferred
>>               vprintk_emit (but no direct printing)
>>               defer_console_output
>>                 irq_work_queue
>> 
>> AFAICT, using defer_console_output() instead of your new printk_flush()
>> should cause the exact behavior as before.
> 
> I agree. printk_safe_flush() used printk_deferred(). It only queued
> the irq_work and never called consoles directly.
> 
>> > but we do need that printk flush capability back there and for
>> > nmi_backtrace.
>> 
>> Agreed. I had not considered this necessary side-effect when I removed
>> the NMI safe buffers.
> 
> Honestly, I do not understand why it stopped working or how
> it worked before.
> 
> printk() calls vprintk(). Current vprintk() does:
> 
> asmlinkage int vprintk(const char *fmt, va_list args)
> {
> [...]
> 	/*
> 	 * Use the main logbuf even in NMI. But avoid calling console
> 	 * drivers that might have their own locks.
> 	 */
> 	if (this_cpu_read(printk_context) || in_nmi()) {
> 		int len;
> 
> 		len = vprintk_store(0, LOGLEVEL_DEFAULT, NULL, fmt, args);
> 		defer_console_output();
> 		return len;
> 	}
> 
> 	/* No obstacles. */
> 	return vprintk_default(fmt, args);
> }
> 
> By other words, current vprintk():
> 
>    + queues irq_work() in NMI context
>    + tries to flush consoles immeditely otherwise
> 
> 
>> I am just wondering if we should fix the regression by going back to
>> using irq_work (such as defer_console_output()) or if we want to
>> introduce something new that introduces direct printing.
> 
> Yup, defer_console_output() should do the same as printk_safe_flush()
> before. We do not longer need to copy the messages because they are
> already in the main lockless log buffer.
> 
> Well, I am curious about the original code. The commit 93d102f094be9beab28e5
> ("printk: remove safe buffers") did the following:
> 
> diff --git a/arch/powerpc/kernel/watchdog.c b/arch/powerpc/kernel/watchdog.c
> index c9a8f4781a10..dc17d8903d4f 100644
> --- a/arch/powerpc/kernel/watchdog.c
> +++ b/arch/powerpc/kernel/watchdog.c
> @@ -183,11 +183,6 @@ static void watchdog_smp_panic(int cpu, u64 tb)
>  
>         wd_smp_unlock(&flags);
>  
> -       printk_safe_flush();
> -       /*
> -        * printk_safe_flush() seems to require another print
> -        * before anything actually goes out to console.
> -        */
>         if (sysctl_hardlockup_all_cpu_backtrace)
>                 trigger_allbutself_cpu_backtrace();
> 
> And I am curious because:
> 
>    + Why was printk_safe_flush() called before triggering backtraces
>      on other CPUs?

I commented about that earlier in the thread. The flush is for a 
previous non-maskable IPI. trigger_allbutself_cpu_backtrace is not NMI 
on powerpc it's just regular irq context so if they did respond to it 
they would also be able to run the irq work.

>    + The comment says that another print is needed before the messages
>      goes to the console. It makes sense because printk_safe_flush()
>      only set irq_work. But the patch did not remove any printk().
>      So, nobody called any printk() even before.

I think that was my misunderstanding as to the problem. The NMI IPI is
actually asynchronous so it was not guaranteed that targets had
completed their backtraces at this point. powerpc watchdog needs a
separate fix (after which it seems to work fine with printk_safe_flush)

Thanks,
Nick

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Removal of printk safe buffers delays NMI context printk
  2021-11-05 16:44             ` John Ogness
@ 2021-11-06  0:26               ` Nicholas Piggin
  2021-11-06 20:05                 ` John Ogness
  0 siblings, 1 reply; 13+ messages in thread
From: Nicholas Piggin @ 2021-11-06  0:26 UTC (permalink / raw)
  To: John Ogness, Petr Mladek; +Cc: Laurent Dufour, linux-kernel

Excerpts from John Ogness's message of November 6, 2021 2:44 am:
> On 2021-11-05, Petr Mladek <pmladek@suse.com> wrote:
>> On Fri 2021-11-05 15:03:27, John Ogness wrote:
>>> On 2021-11-05, Nicholas Piggin <npiggin@gmail.com> wrote:
>>>> but we do need that printk flush capability back there and for
>>>> nmi_backtrace.
>>> 
>>> Agreed. I had not considered this necessary side-effect when I
>>> removed the NMI safe buffers.
>>
>> Honestly, I do not understand why it stopped working or how
>> it worked before.
> 
> IIUC, Nick is presenting a problem where a lockup on the other CPUs is
> detected. Those CPUs will dump their backtraces per NMI context. But in
> their lockup state the irq_work for those CPUs is not functional. So
> even though the messages are in the buffer, there is no one printing the
> buffer.

Yep.

> printk_safe_flush() would dump the NMI safe buffers for all the CPUs
> into the printk buffer, then trigger an irq_work on itself (the
> non-locked-up CPU).
> 
> That irq_work trigger was critical, because the other CPUs (which also
> triggered irq_works for themselves) aren't able to process irq_works. I
> did not consider this case. Which is why we still need to trigger
> irq_work here. (Or, as the removed comment hinted at, add some printk()
> call to either directly print or trigger the irq_work.)

This patch seems to work, I can submit it if you'd like?

Thanks,
Nick

---
 arch/powerpc/kernel/watchdog.c | 6 ++++++
 include/linux/printk.h         | 4 ++++
 kernel/printk/printk.c         | 5 +++++
 lib/nmi_backtrace.c            | 6 ++++++
 4 files changed, 21 insertions(+)

diff --git a/arch/powerpc/kernel/watchdog.c b/arch/powerpc/kernel/watchdog.c
index 5f69ba4de1f3..c8017bc23b00 100644
--- a/arch/powerpc/kernel/watchdog.c
+++ b/arch/powerpc/kernel/watchdog.c
@@ -227,6 +227,12 @@ static void watchdog_smp_panic(int cpu)
 		cpumask_clear(&wd_smp_cpus_ipi);
 	}
 
+	/*
+	 * Force flush any remote buffers that might be stuck in IRQ context
+	 * and therefore could not run their irq_work.
+	 */
+	printk_trigger_flush();
+
 	if (hardlockup_panic)
 		nmi_panic(NULL, "Hard LOCKUP");
 
diff --git a/include/linux/printk.h b/include/linux/printk.h
index 85b656f82d75..9497f6b98339 100644
--- a/include/linux/printk.h
+++ b/include/linux/printk.h
@@ -198,6 +198,7 @@ void dump_stack_print_info(const char *log_lvl);
 void show_regs_print_info(const char *log_lvl);
 extern asmlinkage void dump_stack_lvl(const char *log_lvl) __cold;
 extern asmlinkage void dump_stack(void) __cold;
+void printk_trigger_flush(void);
 #else
 static inline __printf(1, 0)
 int vprintk(const char *s, va_list args)
@@ -274,6 +275,9 @@ static inline void dump_stack_lvl(const char *log_lvl)
 static inline void dump_stack(void)
 {
 }
+static inline void printk_trigger_flush(void)
+{
+}
 #endif
 
 #ifdef CONFIG_SMP
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index a8d0a58deebc..99221b016c68 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -3252,6 +3252,11 @@ void defer_console_output(void)
 	preempt_enable();
 }
 
+void printk_trigger_flush(void)
+{
+	defer_console_output();
+}
+
 int vprintk_deferred(const char *fmt, va_list args)
 {
 	int r;
diff --git a/lib/nmi_backtrace.c b/lib/nmi_backtrace.c
index f9e89001b52e..199ab201d501 100644
--- a/lib/nmi_backtrace.c
+++ b/lib/nmi_backtrace.c
@@ -75,6 +75,12 @@ void nmi_trigger_cpumask_backtrace(const cpumask_t *mask,
 		touch_softlockup_watchdog();
 	}
 
+	/*
+	 * Force flush any remote buffers that might be stuck in IRQ context
+	 * and therefore could not run their irq_work.
+	 */
+	printk_trigger_flush();
+
 	clear_bit_unlock(0, &backtrace_flag);
 	put_cpu();
 }
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: Removal of printk safe buffers delays NMI context printk
  2021-11-06  0:26               ` Nicholas Piggin
@ 2021-11-06 20:05                 ` John Ogness
  0 siblings, 0 replies; 13+ messages in thread
From: John Ogness @ 2021-11-06 20:05 UTC (permalink / raw)
  To: Nicholas Piggin, Petr Mladek; +Cc: Laurent Dufour, linux-kernel

On 2021-11-06, Nicholas Piggin <npiggin@gmail.com> wrote:
> This patch seems to work, I can submit it if you'd like?

It looks good to me. Thanks for identifying and fixing this regression.

John Ogness

> ---
>  arch/powerpc/kernel/watchdog.c | 6 ++++++
>  include/linux/printk.h         | 4 ++++
>  kernel/printk/printk.c         | 5 +++++
>  lib/nmi_backtrace.c            | 6 ++++++
>  4 files changed, 21 insertions(+)
>
> diff --git a/arch/powerpc/kernel/watchdog.c b/arch/powerpc/kernel/watchdog.c
> index 5f69ba4de1f3..c8017bc23b00 100644
> --- a/arch/powerpc/kernel/watchdog.c
> +++ b/arch/powerpc/kernel/watchdog.c
> @@ -227,6 +227,12 @@ static void watchdog_smp_panic(int cpu)
>  		cpumask_clear(&wd_smp_cpus_ipi);
>  	}
>  
> +	/*
> +	 * Force flush any remote buffers that might be stuck in IRQ context
> +	 * and therefore could not run their irq_work.
> +	 */
> +	printk_trigger_flush();
> +
>  	if (hardlockup_panic)
>  		nmi_panic(NULL, "Hard LOCKUP");
>  
> diff --git a/include/linux/printk.h b/include/linux/printk.h
> index 85b656f82d75..9497f6b98339 100644
> --- a/include/linux/printk.h
> +++ b/include/linux/printk.h
> @@ -198,6 +198,7 @@ void dump_stack_print_info(const char *log_lvl);
>  void show_regs_print_info(const char *log_lvl);
>  extern asmlinkage void dump_stack_lvl(const char *log_lvl) __cold;
>  extern asmlinkage void dump_stack(void) __cold;
> +void printk_trigger_flush(void);
>  #else
>  static inline __printf(1, 0)
>  int vprintk(const char *s, va_list args)
> @@ -274,6 +275,9 @@ static inline void dump_stack_lvl(const char *log_lvl)
>  static inline void dump_stack(void)
>  {
>  }
> +static inline void printk_trigger_flush(void)
> +{
> +}
>  #endif
>  
>  #ifdef CONFIG_SMP
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index a8d0a58deebc..99221b016c68 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -3252,6 +3252,11 @@ void defer_console_output(void)
>  	preempt_enable();
>  }
>  
> +void printk_trigger_flush(void)
> +{
> +	defer_console_output();
> +}
> +
>  int vprintk_deferred(const char *fmt, va_list args)
>  {
>  	int r;
> diff --git a/lib/nmi_backtrace.c b/lib/nmi_backtrace.c
> index f9e89001b52e..199ab201d501 100644
> --- a/lib/nmi_backtrace.c
> +++ b/lib/nmi_backtrace.c
> @@ -75,6 +75,12 @@ void nmi_trigger_cpumask_backtrace(const cpumask_t *mask,
>  		touch_softlockup_watchdog();
>  	}
>  
> +	/*
> +	 * Force flush any remote buffers that might be stuck in IRQ context
> +	 * and therefore could not run their irq_work.
> +	 */
> +	printk_trigger_flush();
> +
>  	clear_bit_unlock(0, &backtrace_flag);
>  	put_cpu();
>  }
> -- 
> 2.23.0

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2021-11-06 20:05 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-11-04 15:54 Removal of printk safe buffers delays NMI context printk Nicholas Piggin
2021-11-04 16:18 ` John Ogness
2021-11-05  1:26   ` Nicholas Piggin
2021-11-05  9:55     ` John Ogness
2021-11-05 11:43       ` Nicholas Piggin
2021-11-05 13:57         ` John Ogness
2021-11-05 16:23           ` Petr Mladek
2021-11-05 16:44             ` John Ogness
2021-11-06  0:26               ` Nicholas Piggin
2021-11-06 20:05                 ` John Ogness
2021-11-05 23:57             ` Nicholas Piggin
2021-11-05 23:48           ` Nicholas Piggin
2021-11-05 16:01     ` Petr Mladek

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).