linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] lib/nmi_backtrace: Serialize even messages about idle CPUs
@ 2021-07-27  8:09 Petr Mladek
  2021-07-27 15:47 ` John Ogness
  0 siblings, 1 reply; 4+ messages in thread
From: Petr Mladek @ 2021-07-27  8:09 UTC (permalink / raw)
  To: John Ogness
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Petr Mladek

The commit 55d6af1d66885059ffc2a ("lib/nmi_backtrace: explicitly serialize
banner and regs") serialized backtraces from more CPUs using the re-entrant
printk_printk_cpu lock. It was a preparation step for removing the obsolete
nmi_safe buffers.

The single-line messages about idle CPUs were not serialized against other
CPUs and might appear in the middle of backtrace from another CPU,
for example:

[56394.590068] NMI backtrace for cpu 2
[56394.590069] CPU: 2 PID: 444 Comm: systemd-journal Not tainted 5.14.0-rc1-default+ #268
[56394.590071] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.0-59-gc9ba527-rebuilt.opensuse.org 04/01/2014
[56394.590072] RIP: 0010:lock_is_held_type+0x0/0x120
[56394.590071] NMI backtrace for cpu 0 skipped: idling at native_safe_halt+0xb/0x10
[56394.590076] Code: a2 38 ff 0f 0b 8b 44 24 04 eb bd 48 8d ...
[56394.590077] RSP: 0018:ffffab02c07c7e68 EFLAGS: 00000246
[56394.590079] RAX: 0000000000000000 RBX: ffff9a7bc0ec8a40 RCX: ffffffffaab8eb40

It might cause confusion what CPU the following lines belongs to and
whether the backtraces are really serialized.

Prevent the confusion and serialize also the single line message against
other CPUs.

Fixes: 55d6af1d66885059ffc2a ("lib/nmi_backtrace: explicitly serialize banner and regs")
Signed-off-by: Petr Mladek <pmladek@suse.com>
---
I have got a bit confused by the bactraces from all CPUs when I pushed
the patchset removing printk_safe buffers and double checked the behavior.

I propose this patch to avoid the confusion. I send it as a separate patch
to avoid another re-spin/delay of the non-trivial patchset.

The patch is against printk/linux.git, branch rework/printk_safe-removal.

 lib/nmi_backtrace.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/lib/nmi_backtrace.c b/lib/nmi_backtrace.c
index 9813a983d024..f9e89001b52e 100644
--- a/lib/nmi_backtrace.c
+++ b/lib/nmi_backtrace.c
@@ -89,22 +89,22 @@ bool nmi_cpu_backtrace(struct pt_regs *regs)
 	unsigned long flags;
 
 	if (cpumask_test_cpu(cpu, to_cpumask(backtrace_mask))) {
+		/*
+		 * Allow nested NMI backtraces while serializing
+		 * against other CPUs.
+		 */
+		printk_cpu_lock_irqsave(flags);
 		if (!READ_ONCE(backtrace_idle) && regs && cpu_in_idle(instruction_pointer(regs))) {
 			pr_warn("NMI backtrace for cpu %d skipped: idling at %pS\n",
 				cpu, (void *)instruction_pointer(regs));
 		} else {
-			/*
-			 * Allow nested NMI backtraces while serializing
-			 * against other CPUs.
-			 */
-			printk_cpu_lock_irqsave(flags);
 			pr_warn("NMI backtrace for cpu %d\n", cpu);
 			if (regs)
 				show_regs(regs);
 			else
 				dump_stack();
-			printk_cpu_unlock_irqrestore(flags);
 		}
+		printk_cpu_unlock_irqrestore(flags);
 		cpumask_clear_cpu(cpu, to_cpumask(backtrace_mask));
 		return true;
 	}
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH] lib/nmi_backtrace: Serialize even messages about idle CPUs
  2021-07-27  8:09 [PATCH] lib/nmi_backtrace: Serialize even messages about idle CPUs Petr Mladek
@ 2021-07-27 15:47 ` John Ogness
  2021-07-28  1:21   ` Sergey Senozhatsky
  2021-07-30 11:08   ` Petr Mladek
  0 siblings, 2 replies; 4+ messages in thread
From: John Ogness @ 2021-07-27 15:47 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Petr Mladek

On 2021-07-27, Petr Mladek <pmladek@suse.com> wrote:
> The commit 55d6af1d66885059ffc2a ("lib/nmi_backtrace: explicitly serialize
> banner and regs") serialized backtraces from more CPUs using the re-entrant
> printk_printk_cpu lock. It was a preparation step for removing the obsolete
> nmi_safe buffers.
>
> The single-line messages about idle CPUs were not serialized against other
> CPUs and might appear in the middle of backtrace from another CPU,
> for example:
>
> [56394.590068] NMI backtrace for cpu 2
> [56394.590069] CPU: 2 PID: 444 Comm: systemd-journal Not tainted 5.14.0-rc1-default+ #268
> [56394.590071] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.0-59-gc9ba527-rebuilt.opensuse.org 04/01/2014
> [56394.590072] RIP: 0010:lock_is_held_type+0x0/0x120
> [56394.590071] NMI backtrace for cpu 0 skipped: idling at native_safe_halt+0xb/0x10
> [56394.590076] Code: a2 38 ff 0f 0b 8b 44 24 04 eb bd 48 8d ...
> [56394.590077] RSP: 0018:ffffab02c07c7e68 EFLAGS: 00000246
> [56394.590079] RAX: 0000000000000000 RBX: ffff9a7bc0ec8a40 RCX: ffffffffaab8eb40
>
> It might cause confusion what CPU the following lines belongs to and
> whether the backtraces are really serialized.

I originally implemented this, but later decided against it because it
causes idle CPUs to begin busy-waiting in NMI context in order to log a
single line saying they are idle. If the user is aware that there is
only 1 line for the idle message, then the user knows that it isn't
causing a problem for reading the stack trace.

When triggering many such dumps on systems with many CPUs where this
patch is applied, it seemed like I was making the whole system work
awfully hard for something that should be trivial.

Considering that dump_stack() and show_regs() should be fast and we are
only dumping to the lockless buffer, it is probably OK to be doing all
the busy-waiting. Once atomic consoles are introduced, it will have
quite an impact here, but atomic consoles are mostly only active on
system crash, so I think that would be OK as well.

Feel free to add:

Reviewed-by: John Ogness <john.ogness@linutronix.de>

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] lib/nmi_backtrace: Serialize even messages about idle CPUs
  2021-07-27 15:47 ` John Ogness
@ 2021-07-28  1:21   ` Sergey Senozhatsky
  2021-07-30 11:08   ` Petr Mladek
  1 sibling, 0 replies; 4+ messages in thread
From: Sergey Senozhatsky @ 2021-07-28  1:21 UTC (permalink / raw)
  To: John Ogness
  Cc: Petr Mladek, Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel

On (21/07/27 17:53), John Ogness wrote:
> On 2021-07-27, Petr Mladek <pmladek@suse.com> wrote:
> > The commit 55d6af1d66885059ffc2a ("lib/nmi_backtrace: explicitly serialize
> > banner and regs") serialized backtraces from more CPUs using the re-entrant
> > printk_printk_cpu lock. It was a preparation step for removing the obsolete
> > nmi_safe buffers.
> >
> > The single-line messages about idle CPUs were not serialized against other
> > CPUs and might appear in the middle of backtrace from another CPU,
> > for example:
> >
> > [56394.590068] NMI backtrace for cpu 2
> > [56394.590069] CPU: 2 PID: 444 Comm: systemd-journal Not tainted 5.14.0-rc1-default+ #268
> > [56394.590071] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.0-59-gc9ba527-rebuilt.opensuse.org 04/01/2014
> > [56394.590072] RIP: 0010:lock_is_held_type+0x0/0x120
> > [56394.590071] NMI backtrace for cpu 0 skipped: idling at native_safe_halt+0xb/0x10
> > [56394.590076] Code: a2 38 ff 0f 0b 8b 44 24 04 eb bd 48 8d ...
> > [56394.590077] RSP: 0018:ffffab02c07c7e68 EFLAGS: 00000246
> > [56394.590079] RAX: 0000000000000000 RBX: ffff9a7bc0ec8a40 RCX: ffffffffaab8eb40
> >
> > It might cause confusion what CPU the following lines belongs to and
> > whether the backtraces are really serialized.
> 
> I originally implemented this, but later decided against it because it
> causes idle CPUs to begin busy-waiting in NMI context in order to log a
> single line saying they are idle. If the user is aware that there is
> only 1 line for the idle message, then the user knows that it isn't
> causing a problem for reading the stack trace.

I agree, but don't have any strong opinion against the patch.

> Feel free to add:
>
> Reviewed-by: John Ogness <john.ogness@linutronix.de>

Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] lib/nmi_backtrace: Serialize even messages about idle CPUs
  2021-07-27 15:47 ` John Ogness
  2021-07-28  1:21   ` Sergey Senozhatsky
@ 2021-07-30 11:08   ` Petr Mladek
  1 sibling, 0 replies; 4+ messages in thread
From: Petr Mladek @ 2021-07-30 11:08 UTC (permalink / raw)
  To: John Ogness
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner, linux-kernel

On Tue 2021-07-27 17:53:04, John Ogness wrote:
> On 2021-07-27, Petr Mladek <pmladek@suse.com> wrote:
> > The commit 55d6af1d66885059ffc2a ("lib/nmi_backtrace: explicitly serialize
> > banner and regs") serialized backtraces from more CPUs using the re-entrant
> > printk_printk_cpu lock. It was a preparation step for removing the obsolete
> > nmi_safe buffers.
> >
> > The single-line messages about idle CPUs were not serialized against other
> > CPUs and might appear in the middle of backtrace from another CPU,
> > for example:
> >
> > [56394.590068] NMI backtrace for cpu 2
> > [56394.590069] CPU: 2 PID: 444 Comm: systemd-journal Not tainted 5.14.0-rc1-default+ #268
> > [56394.590071] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.0-59-gc9ba527-rebuilt.opensuse.org 04/01/2014
> > [56394.590072] RIP: 0010:lock_is_held_type+0x0/0x120
> > [56394.590071] NMI backtrace for cpu 0 skipped: idling at native_safe_halt+0xb/0x10
> > [56394.590076] Code: a2 38 ff 0f 0b 8b 44 24 04 eb bd 48 8d ...
> > [56394.590077] RSP: 0018:ffffab02c07c7e68 EFLAGS: 00000246
> > [56394.590079] RAX: 0000000000000000 RBX: ffff9a7bc0ec8a40 RCX: ffffffffaab8eb40
> >
> > It might cause confusion what CPU the following lines belongs to and
> > whether the backtraces are really serialized.
> 
> I originally implemented this, but later decided against it because it
> causes idle CPUs to begin busy-waiting in NMI context in order to log a
> single line saying they are idle. If the user is aware that there is
> only 1 line for the idle message, then the user knows that it isn't
> causing a problem for reading the stack trace.
> 
> When triggering many such dumps on systems with many CPUs where this
> patch is applied, it seemed like I was making the whole system work
> awfully hard for something that should be trivial.

I understand the concern. I am not super happy with it as well.

> Considering that dump_stack() and show_regs() should be fast and we are
> only dumping to the lockless buffer, it is probably OK to be doing all
> the busy-waiting. Once atomic consoles are introduced, it will have
> quite an impact here, but atomic consoles are mostly only active on
> system crash, so I think that would be OK as well.

Also the single line is printed only on idle CPUs. They have nothing
else to do anyway ;-)

I have pushed the patch into printk/linux.git, branch
rework/printk_safe-removal.

We could always revert it when there are some problems with it.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2021-07-30 11:08 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-07-27  8:09 [PATCH] lib/nmi_backtrace: Serialize even messages about idle CPUs Petr Mladek
2021-07-27 15:47 ` John Ogness
2021-07-28  1:21   ` Sergey Senozhatsky
2021-07-30 11:08   ` Petr Mladek

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).