All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v5 0/4] Cleaning printk stuff in NMI context
@ 2016-04-21 11:48 ` Petr Mladek
  0 siblings, 0 replies; 97+ messages in thread
From: Petr Mladek @ 2016-04-21 11:48 UTC (permalink / raw)
  To: linux-arm-kernel

This patch set generalizes the already existing solution for
printing NMI messages. The main idea comes from Peter Zijlstra.

v5 adds changes suggested by Sergey Senozhatsky. It should not longer
conflict with his async printk patchset.

There are some conflicts with the nmi_backtrace improvements
from Chris Metcalf, see
https://lkml.kernel.org/g/<1459877208-15119-1-git-send-email-cmetcalf@mellanox.com>
Feel free to ask me to resolve them.


Changes against v4:

  + merged 2nd patch into the 1st one to do not break bisection

  + set vprintk_nmi() early in nmi_enter() and set vprintk_default()
    later in nmi_exit() to make sure that all NMI messages are printed
    to the temporary ring buffer

  + used printk_deferred() when flushing the temporary buffers
    on panic in NMI; we do not longer need to touch vprintk_emit()
    and define the ugly macro deferred_console_in_nmi().


Changes against v3:

  + merged all small changes from -mm tree, including commit
    descriptions

  + disabled interrupts when taking the read_lock in __printk_nmi_flush();
    printk_nmi_flush() might be called from any context; reported
    by lockdep

  + never introduce NEED_PRINTK_NMI; in -mm tree was introduced
    in 1st patch and removed in the 4th one

  + flush NMI buffers when the system goes down (new 5th patch);
    addresses Daniel's concerns


Changes against v2:

  + fixed compilation problems reported by 0-day build robot

  + MN10300 and Xtensa architectures will get handled separately

  + dropped the patch that printed NMI messages directly when Oops
    in progress; it made the solution less reliable

  + made the size of the buffer configurable; use real numbers
    instead of PAGE_SIZE


Changes against v1:

  + rebased on top of 4.4-rc2; there the old implementation was
    moved to lib/nmi_backtrace.c and used also on arm; I hope that
    I got the arm side correctly; I was not able to test on arm :-(

  + defined HAVE_NMI on arm for !CPU_V7M instead of !CPU_V7;
    handle_fiq_as_nmi() is called from entry-armv.S that
    is compiled when !CPU_V7M

  + defined HAVE_NMI also on mips; it calls nmi_enter() and
    seems to have real NMIs (or am I wrong?)

  + serialized backtraces when printing directly
    (oops_in_progress)


Petr Mladek (4):
  printk/nmi: generic solution for safe printk in NMI
  printk/nmi: warn when some message has been lost in NMI context
  printk/nmi: increase the size of NMI buffer and make it configurable
  printk/nmi: flush NMI messages on the system panic

 arch/Kconfig                  |   4 +
 arch/arm/Kconfig              |   1 +
 arch/arm/kernel/smp.c         |   2 +
 arch/avr32/Kconfig            |   1 +
 arch/blackfin/Kconfig         |   1 +
 arch/cris/Kconfig             |   1 +
 arch/mips/Kconfig             |   1 +
 arch/powerpc/Kconfig          |   1 +
 arch/s390/Kconfig             |   1 +
 arch/sh/Kconfig               |   1 +
 arch/sparc/Kconfig            |   1 +
 arch/tile/Kconfig             |   1 +
 arch/x86/Kconfig              |   1 +
 arch/x86/kernel/apic/hw_nmi.c |   1 -
 include/linux/hardirq.h       |   2 +
 include/linux/percpu.h        |   3 -
 include/linux/printk.h        |  14 ++-
 init/Kconfig                  |  27 +++++
 init/main.c                   |   1 +
 kernel/kexec_core.c           |   1 +
 kernel/panic.c                |   6 +-
 kernel/printk/Makefile        |   1 +
 kernel/printk/internal.h      |  57 +++++++++
 kernel/printk/nmi.c           | 260 ++++++++++++++++++++++++++++++++++++++++++
 kernel/printk/printk.c        |  31 ++---
 lib/nmi_backtrace.c           |  89 +--------------
 26 files changed, 401 insertions(+), 109 deletions(-)
 create mode 100644 kernel/printk/internal.h
 create mode 100644 kernel/printk/nmi.c

-- 
1.8.5.6


^ permalink raw reply	[flat|nested] 97+ messages in thread

* [PATCH v5 0/4] Cleaning printk stuff in NMI context
@ 2016-04-21 11:48 ` Petr Mladek
  0 siblings, 0 replies; 97+ messages in thread
From: Petr Mladek @ 2016-04-21 11:48 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Peter Zijlstra, Steven Rostedt, Russell King, Daniel Thompson,
	Jiri Kosina, Ingo Molnar, Thomas Gleixner, Sergey Senozhatsky,
	Chris Metcalf, linux-kernel, x86, linux-arm-kernel,
	adi-buildroot-devel, linux-cris-kernel, linux-mips, linuxppc-dev,
	linux-s390, linux-sh, sparclinux, Petr Mladek

This patch set generalizes the already existing solution for
printing NMI messages. The main idea comes from Peter Zijlstra.

v5 adds changes suggested by Sergey Senozhatsky. It should not longer
conflict with his async printk patchset.

There are some conflicts with the nmi_backtrace improvements
from Chris Metcalf, see
https://lkml.kernel.org/g/<1459877208-15119-1-git-send-email-cmetcalf@mellanox.com>
Feel free to ask me to resolve them.


Changes against v4:

  + merged 2nd patch into the 1st one to do not break bisection

  + set vprintk_nmi() early in nmi_enter() and set vprintk_default()
    later in nmi_exit() to make sure that all NMI messages are printed
    to the temporary ring buffer

  + used printk_deferred() when flushing the temporary buffers
    on panic in NMI; we do not longer need to touch vprintk_emit()
    and define the ugly macro deferred_console_in_nmi().


Changes against v3:

  + merged all small changes from -mm tree, including commit
    descriptions

  + disabled interrupts when taking the read_lock in __printk_nmi_flush();
    printk_nmi_flush() might be called from any context; reported
    by lockdep

  + never introduce NEED_PRINTK_NMI; in -mm tree was introduced
    in 1st patch and removed in the 4th one

  + flush NMI buffers when the system goes down (new 5th patch);
    addresses Daniel's concerns


Changes against v2:

  + fixed compilation problems reported by 0-day build robot

  + MN10300 and Xtensa architectures will get handled separately

  + dropped the patch that printed NMI messages directly when Oops
    in progress; it made the solution less reliable

  + made the size of the buffer configurable; use real numbers
    instead of PAGE_SIZE


Changes against v1:

  + rebased on top of 4.4-rc2; there the old implementation was
    moved to lib/nmi_backtrace.c and used also on arm; I hope that
    I got the arm side correctly; I was not able to test on arm :-(

  + defined HAVE_NMI on arm for !CPU_V7M instead of !CPU_V7;
    handle_fiq_as_nmi() is called from entry-armv.S that
    is compiled when !CPU_V7M

  + defined HAVE_NMI also on mips; it calls nmi_enter() and
    seems to have real NMIs (or am I wrong?)

  + serialized backtraces when printing directly
    (oops_in_progress)


Petr Mladek (4):
  printk/nmi: generic solution for safe printk in NMI
  printk/nmi: warn when some message has been lost in NMI context
  printk/nmi: increase the size of NMI buffer and make it configurable
  printk/nmi: flush NMI messages on the system panic

 arch/Kconfig                  |   4 +
 arch/arm/Kconfig              |   1 +
 arch/arm/kernel/smp.c         |   2 +
 arch/avr32/Kconfig            |   1 +
 arch/blackfin/Kconfig         |   1 +
 arch/cris/Kconfig             |   1 +
 arch/mips/Kconfig             |   1 +
 arch/powerpc/Kconfig          |   1 +
 arch/s390/Kconfig             |   1 +
 arch/sh/Kconfig               |   1 +
 arch/sparc/Kconfig            |   1 +
 arch/tile/Kconfig             |   1 +
 arch/x86/Kconfig              |   1 +
 arch/x86/kernel/apic/hw_nmi.c |   1 -
 include/linux/hardirq.h       |   2 +
 include/linux/percpu.h        |   3 -
 include/linux/printk.h        |  14 ++-
 init/Kconfig                  |  27 +++++
 init/main.c                   |   1 +
 kernel/kexec_core.c           |   1 +
 kernel/panic.c                |   6 +-
 kernel/printk/Makefile        |   1 +
 kernel/printk/internal.h      |  57 +++++++++
 kernel/printk/nmi.c           | 260 ++++++++++++++++++++++++++++++++++++++++++
 kernel/printk/printk.c        |  31 ++---
 lib/nmi_backtrace.c           |  89 +--------------
 26 files changed, 401 insertions(+), 109 deletions(-)
 create mode 100644 kernel/printk/internal.h
 create mode 100644 kernel/printk/nmi.c

-- 
1.8.5.6

^ permalink raw reply	[flat|nested] 97+ messages in thread

* [PATCH v5 0/4] Cleaning printk stuff in NMI context
@ 2016-04-21 11:48 ` Petr Mladek
  0 siblings, 0 replies; 97+ messages in thread
From: Petr Mladek @ 2016-04-21 11:48 UTC (permalink / raw)
  To: linux-arm-kernel

This patch set generalizes the already existing solution for
printing NMI messages. The main idea comes from Peter Zijlstra.

v5 adds changes suggested by Sergey Senozhatsky. It should not longer
conflict with his async printk patchset.

There are some conflicts with the nmi_backtrace improvements
from Chris Metcalf, see
https://lkml.kernel.org/g/<1459877208-15119-1-git-send-email-cmetcalf@mellanox.com>
Feel free to ask me to resolve them.


Changes against v4:

  + merged 2nd patch into the 1st one to do not break bisection

  + set vprintk_nmi() early in nmi_enter() and set vprintk_default()
    later in nmi_exit() to make sure that all NMI messages are printed
    to the temporary ring buffer

  + used printk_deferred() when flushing the temporary buffers
    on panic in NMI; we do not longer need to touch vprintk_emit()
    and define the ugly macro deferred_console_in_nmi().


Changes against v3:

  + merged all small changes from -mm tree, including commit
    descriptions

  + disabled interrupts when taking the read_lock in __printk_nmi_flush();
    printk_nmi_flush() might be called from any context; reported
    by lockdep

  + never introduce NEED_PRINTK_NMI; in -mm tree was introduced
    in 1st patch and removed in the 4th one

  + flush NMI buffers when the system goes down (new 5th patch);
    addresses Daniel's concerns


Changes against v2:

  + fixed compilation problems reported by 0-day build robot

  + MN10300 and Xtensa architectures will get handled separately

  + dropped the patch that printed NMI messages directly when Oops
    in progress; it made the solution less reliable

  + made the size of the buffer configurable; use real numbers
    instead of PAGE_SIZE


Changes against v1:

  + rebased on top of 4.4-rc2; there the old implementation was
    moved to lib/nmi_backtrace.c and used also on arm; I hope that
    I got the arm side correctly; I was not able to test on arm :-(

  + defined HAVE_NMI on arm for !CPU_V7M instead of !CPU_V7;
    handle_fiq_as_nmi() is called from entry-armv.S that
    is compiled when !CPU_V7M

  + defined HAVE_NMI also on mips; it calls nmi_enter() and
    seems to have real NMIs (or am I wrong?)

  + serialized backtraces when printing directly
    (oops_in_progress)


Petr Mladek (4):
  printk/nmi: generic solution for safe printk in NMI
  printk/nmi: warn when some message has been lost in NMI context
  printk/nmi: increase the size of NMI buffer and make it configurable
  printk/nmi: flush NMI messages on the system panic

 arch/Kconfig                  |   4 +
 arch/arm/Kconfig              |   1 +
 arch/arm/kernel/smp.c         |   2 +
 arch/avr32/Kconfig            |   1 +
 arch/blackfin/Kconfig         |   1 +
 arch/cris/Kconfig             |   1 +
 arch/mips/Kconfig             |   1 +
 arch/powerpc/Kconfig          |   1 +
 arch/s390/Kconfig             |   1 +
 arch/sh/Kconfig               |   1 +
 arch/sparc/Kconfig            |   1 +
 arch/tile/Kconfig             |   1 +
 arch/x86/Kconfig              |   1 +
 arch/x86/kernel/apic/hw_nmi.c |   1 -
 include/linux/hardirq.h       |   2 +
 include/linux/percpu.h        |   3 -
 include/linux/printk.h        |  14 ++-
 init/Kconfig                  |  27 +++++
 init/main.c                   |   1 +
 kernel/kexec_core.c           |   1 +
 kernel/panic.c                |   6 +-
 kernel/printk/Makefile        |   1 +
 kernel/printk/internal.h      |  57 +++++++++
 kernel/printk/nmi.c           | 260 ++++++++++++++++++++++++++++++++++++++++++
 kernel/printk/printk.c        |  31 ++---
 lib/nmi_backtrace.c           |  89 +--------------
 26 files changed, 401 insertions(+), 109 deletions(-)
 create mode 100644 kernel/printk/internal.h
 create mode 100644 kernel/printk/nmi.c

-- 
1.8.5.6

^ permalink raw reply	[flat|nested] 97+ messages in thread

* [PATCH v5 1/4] printk/nmi: generic solution for safe printk in NMI
  2016-04-21 11:48 ` Petr Mladek
  (?)
@ 2016-04-21 11:48   ` Petr Mladek
  -1 siblings, 0 replies; 97+ messages in thread
From: Petr Mladek @ 2016-04-21 11:48 UTC (permalink / raw)
  To: linux-arm-kernel

printk() takes some locks and could not be used a safe way in NMI context.

The chance of a deadlock is real especially when printing stacks from all
CPUs.  This particular problem has been addressed on x86 by the commit
a9edc8809328 ("x86/nmi: Perform a safe NMI stack trace on all CPUs").

The patchset brings two big advantages.  First, it makes the NMI
backtraces safe on all architectures for free.  Second, it makes all NMI
messages almost safe on all architectures (the temporary buffer is
limited.  We still should keep the number of messages in NMI context at
minimum).

Note that there already are several messages printed in NMI context:
WARN_ON(in_nmi()), BUG_ON(in_nmi()), anything being printed out from MCE
handlers.  These are not easy to avoid.

This patch reuses most of the code and makes it generic.  It is useful for
all messages and architectures that support NMI.

The alternative printk_func is set when entering and is reseted when
leaving NMI context.  It queues IRQ work to copy the messages into the
main ring buffer in a safe context.

__printk_nmi_flush() copies all available messages and reset the buffer.
Then we could use a simple cmpxchg operations to get synchronized with
writers.  There is also used a spinlock to get synchronized with other
flushers.

We do not longer use seq_buf because it depends on external lock.  It
would be hard to make all supported operations safe for a lockless use.
It would be confusing and error prone to make only some operations safe.

The code is put into separate printk/nmi.c as suggested by Steven Rostedt.
It needs a per-CPU buffer and is compiled only on architectures that call
nmi_enter().  This is achieved by the new HAVE_NMI Kconfig flag.

The are MN10300 and Xtensa architectures.  We need to clean up NMI
handling there first.  Let's do it separately.

The patch is heavily based on the draft from Peter Zijlstra, see
https://lkml.org/lkml/2015/6/10/327

[arnd@arndb.de: printk-nmi: use %zu format string for size_t]
[akpm@linux-foundation.org: min_t->min - all types are size_t here]
Signed-off-by: Petr Mladek <pmladek@suse.com>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: Daniel Thompson <daniel.thompson@linaro.org>
Cc: Jiri Kosina <jkosina@suse.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: David Miller <davem@davemloft.net>
Cc: Daniel Thompson <daniel.thompson@linaro.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
 arch/Kconfig                  |   4 +
 arch/arm/Kconfig              |   1 +
 arch/arm/kernel/smp.c         |   2 +
 arch/avr32/Kconfig            |   1 +
 arch/blackfin/Kconfig         |   1 +
 arch/cris/Kconfig             |   1 +
 arch/mips/Kconfig             |   1 +
 arch/powerpc/Kconfig          |   1 +
 arch/s390/Kconfig             |   1 +
 arch/sh/Kconfig               |   1 +
 arch/sparc/Kconfig            |   1 +
 arch/tile/Kconfig             |   1 +
 arch/x86/Kconfig              |   1 +
 arch/x86/kernel/apic/hw_nmi.c |   1 -
 include/linux/hardirq.h       |   2 +
 include/linux/percpu.h        |   3 -
 include/linux/printk.h        |  12 ++-
 init/Kconfig                  |   5 +
 init/main.c                   |   1 +
 kernel/printk/Makefile        |   1 +
 kernel/printk/internal.h      |  44 +++++++++
 kernel/printk/nmi.c           | 219 ++++++++++++++++++++++++++++++++++++++++++
 kernel/printk/printk.c        |  19 +---
 lib/nmi_backtrace.c           |  89 +----------------
 24 files changed, 306 insertions(+), 107 deletions(-)
 create mode 100644 kernel/printk/internal.h
 create mode 100644 kernel/printk/nmi.c

diff --git a/arch/Kconfig b/arch/Kconfig
index 81869a5e7e17..049f243a7af8 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -187,7 +187,11 @@ config HAVE_OPTPROBES
 config HAVE_KPROBES_ON_FTRACE
 	bool
 
+config HAVE_NMI
+	bool
+
 config HAVE_NMI_WATCHDOG
+	depends on HAVE_NMI
 	bool
 #
 # An arch should select this if it provides all these things:
diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index cdfa6c2b7626..259543ec6dc9 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -66,6 +66,7 @@ config ARM
 	select HAVE_KRETPROBES if (HAVE_KPROBES)
 	select HAVE_MEMBLOCK
 	select HAVE_MOD_ARCH_SPECIFIC
+	select HAVE_NMI
 	select HAVE_OPROFILE if (HAVE_PERF_EVENTS)
 	select HAVE_OPTPROBES if !THUMB2_KERNEL
 	select HAVE_PERF_EVENTS
diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c
index baee70267f29..df90bc59bfce 100644
--- a/arch/arm/kernel/smp.c
+++ b/arch/arm/kernel/smp.c
@@ -644,9 +644,11 @@ void handle_IPI(int ipinr, struct pt_regs *regs)
 		break;
 
 	case IPI_CPU_BACKTRACE:
+		printk_nmi_enter();
 		irq_enter();
 		nmi_cpu_backtrace(regs);
 		irq_exit();
+		printk_nmi_exit();
 		break;
 
 	default:
diff --git a/arch/avr32/Kconfig b/arch/avr32/Kconfig
index b6878eb64884..9dc3e2b1180b 100644
--- a/arch/avr32/Kconfig
+++ b/arch/avr32/Kconfig
@@ -17,6 +17,7 @@ config AVR32
 	select GENERIC_CLOCKEVENTS
 	select HAVE_MOD_ARCH_SPECIFIC
 	select MODULES_USE_ELF_RELA
+	select HAVE_NMI
 	help
 	  AVR32 is a high-performance 32-bit RISC microprocessor core,
 	  designed for cost-sensitive embedded applications, with particular
diff --git a/arch/blackfin/Kconfig b/arch/blackfin/Kconfig
index a63c12259e77..28c63fea786d 100644
--- a/arch/blackfin/Kconfig
+++ b/arch/blackfin/Kconfig
@@ -40,6 +40,7 @@ config BLACKFIN
 	select HAVE_MOD_ARCH_SPECIFIC
 	select MODULES_USE_ELF_RELA
 	select HAVE_DEBUG_STACKOVERFLOW
+	select HAVE_NMI
 
 config GENERIC_CSUM
 	def_bool y
diff --git a/arch/cris/Kconfig b/arch/cris/Kconfig
index e086f9e93728..62148eaf0189 100644
--- a/arch/cris/Kconfig
+++ b/arch/cris/Kconfig
@@ -69,6 +69,7 @@ config CRIS
 	select GENERIC_CLOCKEVENTS if ETRAX_ARCH_V32
 	select GENERIC_SCHED_CLOCK if ETRAX_ARCH_V32
 	select HAVE_DEBUG_BUGVERBOSE if ETRAX_ARCH_V32
+	select HAVE_NMI
 
 config HZ
 	int
diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig
index 2018c2b0e078..7de958bb19f1 100644
--- a/arch/mips/Kconfig
+++ b/arch/mips/Kconfig
@@ -62,6 +62,7 @@ config MIPS
 	select HAVE_IRQ_TIME_ACCOUNTING
 	select GENERIC_TIME_VSYSCALL
 	select ARCH_CLOCKSOURCE_DATA
+	select HAVE_NMI
 
 menu "Machine selection"
 
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 7cd32c038286..f99b19d2d572 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -153,6 +153,7 @@ config PPC
 	select NO_BOOTMEM
 	select HAVE_GENERIC_RCU_GUP
 	select HAVE_PERF_EVENTS_NMI if PPC64
+	select HAVE_NMI if PERF_EVENTS
 	select EDAC_SUPPORT
 	select EDAC_ATOMIC_SCRUB
 	select ARCH_HAS_DMA_SET_COHERENT_MASK
diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
index bf24ab188921..42d315fb8a00 100644
--- a/arch/s390/Kconfig
+++ b/arch/s390/Kconfig
@@ -164,6 +164,7 @@ config S390
 	select TTY
 	select VIRT_CPU_ACCOUNTING
 	select VIRT_TO_BUS
+	select HAVE_NMI
 
 
 config SCHED_OMIT_FRAME_POINTER
diff --git a/arch/sh/Kconfig b/arch/sh/Kconfig
index 7ed20fc3fc81..1725e365d751 100644
--- a/arch/sh/Kconfig
+++ b/arch/sh/Kconfig
@@ -44,6 +44,7 @@ config SUPERH
 	select OLD_SIGSUSPEND
 	select OLD_SIGACTION
 	select HAVE_ARCH_AUDITSYSCALL
+	select HAVE_NMI
 	help
 	  The SuperH is a RISC processor targeted for use in embedded systems
 	  and consumer electronics; it was also used in the Sega Dreamcast
diff --git a/arch/sparc/Kconfig b/arch/sparc/Kconfig
index 57ffaf285c2f..a900e7f35f6a 100644
--- a/arch/sparc/Kconfig
+++ b/arch/sparc/Kconfig
@@ -79,6 +79,7 @@ config SPARC64
 	select NO_BOOTMEM
 	select HAVE_ARCH_AUDITSYSCALL
 	select ARCH_SUPPORTS_ATOMIC_RMW
+	select HAVE_NMI
 
 config ARCH_DEFCONFIG
 	string
diff --git a/arch/tile/Kconfig b/arch/tile/Kconfig
index 81719302b056..cedcbbefcd1c 100644
--- a/arch/tile/Kconfig
+++ b/arch/tile/Kconfig
@@ -29,6 +29,7 @@ config TILE
 	select HAVE_DEBUG_STACKOVERFLOW
 	select ARCH_WANT_FRAME_POINTERS
 	select HAVE_CONTEXT_TRACKING
+	select HAVE_NMI if USE_PMC
 	select EDAC_SUPPORT
 	select GENERIC_STRNCPY_FROM_USER
 	select GENERIC_STRNLEN_USER
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 2dc18605831f..30ea1f834a06 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -130,6 +130,7 @@ config X86
 	select HAVE_MEMBLOCK
 	select HAVE_MEMBLOCK_NODE_MAP
 	select HAVE_MIXED_BREAKPOINTS_REGS
+	select HAVE_NMI
 	select HAVE_OPROFILE
 	select HAVE_OPTPROBES
 	select HAVE_PCSPKR_PLATFORM
diff --git a/arch/x86/kernel/apic/hw_nmi.c b/arch/x86/kernel/apic/hw_nmi.c
index 045e424fb368..7788ce643bf4 100644
--- a/arch/x86/kernel/apic/hw_nmi.c
+++ b/arch/x86/kernel/apic/hw_nmi.c
@@ -18,7 +18,6 @@
 #include <linux/nmi.h>
 #include <linux/module.h>
 #include <linux/delay.h>
-#include <linux/seq_buf.h>
 
 #ifdef CONFIG_HARDLOCKUP_DETECTOR
 u64 hw_nmi_get_sample_period(int watchdog_thresh)
diff --git a/include/linux/hardirq.h b/include/linux/hardirq.h
index dfd59d6bc6f0..c683996110b1 100644
--- a/include/linux/hardirq.h
+++ b/include/linux/hardirq.h
@@ -61,6 +61,7 @@ extern void irq_exit(void);
 
 #define nmi_enter()						\
 	do {							\
+		printk_nmi_enter();				\
 		lockdep_off();					\
 		ftrace_nmi_enter();				\
 		BUG_ON(in_nmi());				\
@@ -77,6 +78,7 @@ extern void irq_exit(void);
 		preempt_count_sub(NMI_OFFSET + HARDIRQ_OFFSET);	\
 		ftrace_nmi_exit();				\
 		lockdep_on();					\
+		printk_nmi_exit();				\
 	} while (0)
 
 #endif /* LINUX_HARDIRQ_H */
diff --git a/include/linux/percpu.h b/include/linux/percpu.h
index 4bc6dafb703e..56939d3f6e53 100644
--- a/include/linux/percpu.h
+++ b/include/linux/percpu.h
@@ -129,7 +129,4 @@ extern phys_addr_t per_cpu_ptr_to_phys(void *addr);
 	(typeof(type) __percpu *)__alloc_percpu(sizeof(type),		\
 						__alignof__(type))
 
-/* To avoid include hell, as printk can not declare this, we declare it here */
-DECLARE_PER_CPU(printk_func_t, printk_func);
-
 #endif /* __LINUX_PERCPU_H */
diff --git a/include/linux/printk.h b/include/linux/printk.h
index 9ccbdf2c1453..51dd6b824fe2 100644
--- a/include/linux/printk.h
+++ b/include/linux/printk.h
@@ -122,7 +122,17 @@ static inline __printf(1, 2) __cold
 void early_printk(const char *s, ...) { }
 #endif
 
-typedef __printf(1, 0) int (*printk_func_t)(const char *fmt, va_list args);
+#ifdef CONFIG_PRINTK_NMI
+extern void printk_nmi_init(void);
+extern void printk_nmi_enter(void);
+extern void printk_nmi_exit(void);
+extern void printk_nmi_flush(void);
+#else
+static inline void printk_nmi_init(void) { }
+static inline void printk_nmi_enter(void) { }
+static inline void printk_nmi_exit(void) { }
+static inline void printk_nmi_flush(void) { }
+#endif /* PRINTK_NMI */
 
 #ifdef CONFIG_PRINTK
 asmlinkage __printf(5, 0)
diff --git a/init/Kconfig b/init/Kconfig
index 0dfd09d54c65..85c7a2bf1ea4 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1454,6 +1454,11 @@ config PRINTK
 	  very difficult to diagnose system problems, saying N here is
 	  strongly discouraged.
 
+config PRINTK_NMI
+	def_bool y
+	depends on PRINTK
+	depends on HAVE_NMI
+
 config BUG
 	bool "BUG() support" if EXPERT
 	default y
diff --git a/init/main.c b/init/main.c
index b3c6e363ae18..c0d3ff91c8e0 100644
--- a/init/main.c
+++ b/init/main.c
@@ -569,6 +569,7 @@ asmlinkage __visible void __init start_kernel(void)
 	timekeeping_init();
 	time_init();
 	sched_clock_postinit();
+	printk_nmi_init();
 	perf_event_init();
 	profile_init();
 	call_function_init();
diff --git a/kernel/printk/Makefile b/kernel/printk/Makefile
index 85405bdcf2b3..abb0042a427b 100644
--- a/kernel/printk/Makefile
+++ b/kernel/printk/Makefile
@@ -1,2 +1,3 @@
 obj-y	= printk.o
+obj-$(CONFIG_PRINTK_NMI)		+= nmi.o
 obj-$(CONFIG_A11Y_BRAILLE_CONSOLE)	+= braille.o
diff --git a/kernel/printk/internal.h b/kernel/printk/internal.h
new file mode 100644
index 000000000000..2de99faedfc1
--- /dev/null
+++ b/kernel/printk/internal.h
@@ -0,0 +1,44 @@
+/*
+ * internal.h - printk internal definitions
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+#include <linux/percpu.h>
+
+typedef __printf(1, 0) int (*printk_func_t)(const char *fmt, va_list args);
+
+int __printf(1, 0) vprintk_default(const char *fmt, va_list args);
+
+#ifdef CONFIG_PRINTK_NMI
+
+/*
+ * printk() could not take logbuf_lock in NMI context. Instead,
+ * it temporary stores the strings into a per-CPU buffer.
+ * The alternative implementation is chosen transparently
+ * via per-CPU variable.
+ */
+DECLARE_PER_CPU(printk_func_t, printk_func);
+static inline __printf(1, 0) int vprintk_func(const char *fmt, va_list args)
+{
+	return this_cpu_read(printk_func)(fmt, args);
+}
+
+#else /* CONFIG_PRINTK_NMI */
+
+static inline __printf(1, 0) int vprintk_func(const char *fmt, va_list args)
+{
+	return vprintk_default(fmt, args);
+}
+
+#endif /* CONFIG_PRINTK_NMI */
diff --git a/kernel/printk/nmi.c b/kernel/printk/nmi.c
new file mode 100644
index 000000000000..303cf0d15e57
--- /dev/null
+++ b/kernel/printk/nmi.c
@@ -0,0 +1,219 @@
+/*
+ * nmi.c - Safe printk in NMI context
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/preempt.h>
+#include <linux/spinlock.h>
+#include <linux/smp.h>
+#include <linux/cpumask.h>
+#include <linux/irq_work.h>
+#include <linux/printk.h>
+
+#include "internal.h"
+
+/*
+ * printk() could not take logbuf_lock in NMI context. Instead,
+ * it uses an alternative implementation that temporary stores
+ * the strings into a per-CPU buffer. The content of the buffer
+ * is later flushed into the main ring buffer via IRQ work.
+ *
+ * The alternative implementation is chosen transparently
+ * via @printk_func per-CPU variable.
+ *
+ * The implementation allows to flush the strings also from another CPU.
+ * There are situations when we want to make sure that all buffers
+ * were handled or when IRQs are blocked.
+ */
+DEFINE_PER_CPU(printk_func_t, printk_func) = vprintk_default;
+static int printk_nmi_irq_ready;
+
+#define NMI_LOG_BUF_LEN (4096 - sizeof(atomic_t) - sizeof(struct irq_work))
+
+struct nmi_seq_buf {
+	atomic_t		len;	/* length of written data */
+	struct irq_work		work;	/* IRQ work that flushes the buffer */
+	unsigned char		buffer[NMI_LOG_BUF_LEN];
+};
+static DEFINE_PER_CPU(struct nmi_seq_buf, nmi_print_seq);
+
+/*
+ * Safe printk() for NMI context. It uses a per-CPU buffer to
+ * store the message. NMIs are not nested, so there is always only
+ * one writer running. But the buffer might get flushed from another
+ * CPU, so we need to be careful.
+ */
+static int vprintk_nmi(const char *fmt, va_list args)
+{
+	struct nmi_seq_buf *s = this_cpu_ptr(&nmi_print_seq);
+	int add = 0;
+	size_t len;
+
+again:
+	len = atomic_read(&s->len);
+
+	if (len >= sizeof(s->buffer))
+		return 0;
+
+	/*
+	 * Make sure that all old data have been read before the buffer was
+	 * reseted. This is not needed when we just append data.
+	 */
+	if (!len)
+		smp_rmb();
+
+	add = vsnprintf(s->buffer + len, sizeof(s->buffer) - len, fmt, args);
+
+	/*
+	 * Do it once again if the buffer has been flushed in the meantime.
+	 * Note that atomic_cmpxchg() is an implicit memory barrier that
+	 * makes sure that the data were written before updating s->len.
+	 */
+	if (atomic_cmpxchg(&s->len, len, len + add) != len)
+		goto again;
+
+	/* Get flushed in a more safe context. */
+	if (add && printk_nmi_irq_ready) {
+		/* Make sure that IRQ work is really initialized. */
+		smp_rmb();
+		irq_work_queue(&s->work);
+	}
+
+	return add;
+}
+
+/*
+ * printk one line from the temporary buffer from @start index until
+ * and including the @end index.
+ */
+static void print_nmi_seq_line(struct nmi_seq_buf *s, int start, int end)
+{
+	const char *buf = s->buffer + start;
+
+	printk("%.*s", (end - start) + 1, buf);
+}
+
+/*
+ * Flush data from the associated per_CPU buffer. The function
+ * can be called either via IRQ work or independently.
+ */
+static void __printk_nmi_flush(struct irq_work *work)
+{
+	static raw_spinlock_t read_lock +		__RAW_SPIN_LOCK_INITIALIZER(read_lock);
+	struct nmi_seq_buf *s = container_of(work, struct nmi_seq_buf, work);
+	unsigned long flags;
+	size_t len, size;
+	int i, last_i;
+
+	/*
+	 * The lock has two functions. First, one reader has to flush all
+	 * available message to make the lockless synchronization with
+	 * writers easier. Second, we do not want to mix messages from
+	 * different CPUs. This is especially important when printing
+	 * a backtrace.
+	 */
+	raw_spin_lock_irqsave(&read_lock, flags);
+
+	i = 0;
+more:
+	len = atomic_read(&s->len);
+
+	/*
+	 * This is just a paranoid check that nobody has manipulated
+	 * the buffer an unexpected way. If we printed something then
+	 * @len must only increase.
+	 */
+	if (i && i >= len)
+		pr_err("printk_nmi_flush: internal error: i=%d >= len=%zu\n",
+		       i, len);
+
+	if (!len)
+		goto out; /* Someone else has already flushed the buffer. */
+
+	/* Make sure that data has been written up to the @len */
+	smp_rmb();
+
+	size = min(len, sizeof(s->buffer));
+	last_i = i;
+
+	/* Print line by line. */
+	for (; i < size; i++) {
+		if (s->buffer[i] = '\n') {
+			print_nmi_seq_line(s, last_i, i);
+			last_i = i + 1;
+		}
+	}
+	/* Check if there was a partial line. */
+	if (last_i < size) {
+		print_nmi_seq_line(s, last_i, size - 1);
+		pr_cont("\n");
+	}
+
+	/*
+	 * Check that nothing has got added in the meantime and truncate
+	 * the buffer. Note that atomic_cmpxchg() is an implicit memory
+	 * barrier that makes sure that the data were copied before
+	 * updating s->len.
+	 */
+	if (atomic_cmpxchg(&s->len, len, 0) != len)
+		goto more;
+
+out:
+	raw_spin_unlock_irqrestore(&read_lock, flags);
+}
+
+/**
+ * printk_nmi_flush - flush all per-cpu nmi buffers.
+ *
+ * The buffers are flushed automatically via IRQ work. This function
+ * is useful only when someone wants to be sure that all buffers have
+ * been flushed at some point.
+ */
+void printk_nmi_flush(void)
+{
+	int cpu;
+
+	for_each_possible_cpu(cpu)
+		__printk_nmi_flush(&per_cpu(nmi_print_seq, cpu).work);
+}
+
+void __init printk_nmi_init(void)
+{
+	int cpu;
+
+	for_each_possible_cpu(cpu) {
+		struct nmi_seq_buf *s = &per_cpu(nmi_print_seq, cpu);
+
+		init_irq_work(&s->work, __printk_nmi_flush);
+	}
+
+	/* Make sure that IRQ works are initialized before enabling. */
+	smp_wmb();
+	printk_nmi_irq_ready = 1;
+
+	/* Flush pending messages that did not have scheduled IRQ works. */
+	printk_nmi_flush();
+}
+
+void printk_nmi_enter(void)
+{
+	this_cpu_write(printk_func, vprintk_nmi);
+}
+
+void printk_nmi_exit(void)
+{
+	this_cpu_write(printk_func, vprintk_default);
+}
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index bfbf284e4218..71eba0607034 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -55,6 +55,7 @@
 
 #include "console_cmdline.h"
 #include "braille.h"
+#include "internal.h"
 
 int console_printk[4] = {
 	CONSOLE_LOGLEVEL_DEFAULT,	/* console_loglevel */
@@ -1807,14 +1808,6 @@ int vprintk_default(const char *fmt, va_list args)
 }
 EXPORT_SYMBOL_GPL(vprintk_default);
 
-/*
- * This allows printk to be diverted to another function per cpu.
- * This is useful for calling printk functions from within NMI
- * without worrying about race conditions that can lock up the
- * box.
- */
-DEFINE_PER_CPU(printk_func_t, printk_func) = vprintk_default;
-
 /**
  * printk - print a kernel message
  * @fmt: format string
@@ -1838,21 +1831,11 @@ DEFINE_PER_CPU(printk_func_t, printk_func) = vprintk_default;
  */
 asmlinkage __visible int printk(const char *fmt, ...)
 {
-	printk_func_t vprintk_func;
 	va_list args;
 	int r;
 
 	va_start(args, fmt);
-
-	/*
-	 * If a caller overrides the per_cpu printk_func, then it needs
-	 * to disable preemption when calling printk(). Otherwise
-	 * the printk_func should be set to the default. No need to
-	 * disable preemption here.
-	 */
-	vprintk_func = this_cpu_read(printk_func);
 	r = vprintk_func(fmt, args);
-
 	va_end(args);
 
 	return r;
diff --git a/lib/nmi_backtrace.c b/lib/nmi_backtrace.c
index 6019c53c669e..26caf51cc238 100644
--- a/lib/nmi_backtrace.c
+++ b/lib/nmi_backtrace.c
@@ -16,33 +16,14 @@
 #include <linux/delay.h>
 #include <linux/kprobes.h>
 #include <linux/nmi.h>
-#include <linux/seq_buf.h>
 
 #ifdef arch_trigger_all_cpu_backtrace
 /* For reliability, we're prepared to waste bits here. */
 static DECLARE_BITMAP(backtrace_mask, NR_CPUS) __read_mostly;
-static cpumask_t printtrace_mask;
-
-#define NMI_BUF_SIZE		4096
-
-struct nmi_seq_buf {
-	unsigned char		buffer[NMI_BUF_SIZE];
-	struct seq_buf		seq;
-};
-
-/* Safe printing in NMI context */
-static DEFINE_PER_CPU(struct nmi_seq_buf, nmi_print_seq);
 
 /* "in progress" flag of arch_trigger_all_cpu_backtrace */
 static unsigned long backtrace_flag;
 
-static void print_seq_line(struct nmi_seq_buf *s, int start, int end)
-{
-	const char *buf = s->buffer + start;
-
-	printk("%.*s", (end - start) + 1, buf);
-}
-
 /*
  * When raise() is called it will be is passed a pointer to the
  * backtrace_mask. Architectures that call nmi_cpu_backtrace()
@@ -52,8 +33,7 @@ static void print_seq_line(struct nmi_seq_buf *s, int start, int end)
 void nmi_trigger_all_cpu_backtrace(bool include_self,
 				   void (*raise)(cpumask_t *mask))
 {
-	struct nmi_seq_buf *s;
-	int i, cpu, this_cpu = get_cpu();
+	int i, this_cpu = get_cpu();
 
 	if (test_and_set_bit(0, &backtrace_flag)) {
 		/*
@@ -68,17 +48,6 @@ void nmi_trigger_all_cpu_backtrace(bool include_self,
 	if (!include_self)
 		cpumask_clear_cpu(this_cpu, to_cpumask(backtrace_mask));
 
-	cpumask_copy(&printtrace_mask, to_cpumask(backtrace_mask));
-
-	/*
-	 * Set up per_cpu seq_buf buffers that the NMIs running on the other
-	 * CPUs will write to.
-	 */
-	for_each_cpu(cpu, to_cpumask(backtrace_mask)) {
-		s = &per_cpu(nmi_print_seq, cpu);
-		seq_buf_init(&s->seq, s->buffer, NMI_BUF_SIZE);
-	}
-
 	if (!cpumask_empty(to_cpumask(backtrace_mask))) {
 		pr_info("Sending NMI to %s CPUs:\n",
 			(include_self ? "all" : "other"));
@@ -94,73 +63,25 @@ void nmi_trigger_all_cpu_backtrace(bool include_self,
 	}
 
 	/*
-	 * Now that all the NMIs have triggered, we can dump out their
-	 * back traces safely to the console.
+	 * Force flush any remote buffers that might be stuck in IRQ context
+	 * and therefore could not run their irq_work.
 	 */
-	for_each_cpu(cpu, &printtrace_mask) {
-		int len, last_i = 0;
+	printk_nmi_flush();
 
-		s = &per_cpu(nmi_print_seq, cpu);
-		len = seq_buf_used(&s->seq);
-		if (!len)
-			continue;
-
-		/* Print line by line. */
-		for (i = 0; i < len; i++) {
-			if (s->buffer[i] = '\n') {
-				print_seq_line(s, last_i, i);
-				last_i = i + 1;
-			}
-		}
-		/* Check if there was a partial line. */
-		if (last_i < len) {
-			print_seq_line(s, last_i, len - 1);
-			pr_cont("\n");
-		}
-	}
-
-	clear_bit(0, &backtrace_flag);
-	smp_mb__after_atomic();
+	clear_bit_unlock(0, &backtrace_flag);
 	put_cpu();
 }
 
-/*
- * It is not safe to call printk() directly from NMI handlers.
- * It may be fine if the NMI detected a lock up and we have no choice
- * but to do so, but doing a NMI on all other CPUs to get a back trace
- * can be done with a sysrq-l. We don't want that to lock up, which
- * can happen if the NMI interrupts a printk in progress.
- *
- * Instead, we redirect the vprintk() to this nmi_vprintk() that writes
- * the content into a per cpu seq_buf buffer. Then when the NMIs are
- * all done, we can safely dump the contents of the seq_buf to a printk()
- * from a non NMI context.
- */
-static int nmi_vprintk(const char *fmt, va_list args)
-{
-	struct nmi_seq_buf *s = this_cpu_ptr(&nmi_print_seq);
-	unsigned int len = seq_buf_used(&s->seq);
-
-	seq_buf_vprintf(&s->seq, fmt, args);
-	return seq_buf_used(&s->seq) - len;
-}
-
 bool nmi_cpu_backtrace(struct pt_regs *regs)
 {
 	int cpu = smp_processor_id();
 
 	if (cpumask_test_cpu(cpu, to_cpumask(backtrace_mask))) {
-		printk_func_t printk_func_save = this_cpu_read(printk_func);
-
-		/* Replace printk to write into the NMI seq */
-		this_cpu_write(printk_func, nmi_vprintk);
 		pr_warn("NMI backtrace for cpu %d\n", cpu);
 		if (regs)
 			show_regs(regs);
 		else
 			dump_stack();
-		this_cpu_write(printk_func, printk_func_save);
-
 		cpumask_clear_cpu(cpu, to_cpumask(backtrace_mask));
 		return true;
 	}
-- 
1.8.5.6


^ permalink raw reply	[flat|nested] 97+ messages in thread

* [PATCH v5 1/4] printk/nmi: generic solution for safe printk in NMI
@ 2016-04-21 11:48   ` Petr Mladek
  0 siblings, 0 replies; 97+ messages in thread
From: Petr Mladek @ 2016-04-21 11:48 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Peter Zijlstra, Steven Rostedt, Russell King, Daniel Thompson,
	Jiri Kosina, Ingo Molnar, Thomas Gleixner, Sergey Senozhatsky,
	Chris Metcalf, linux-kernel, x86, linux-arm-kernel,
	adi-buildroot-devel, linux-cris-kernel, linux-mips, linuxppc-dev,
	linux-s390, linux-sh, sparclinux, Petr Mladek, Jan Kara,
	Ralf Baechle, Benjamin Herrenschmidt, Martin Schwidefsky,
	David Miller

printk() takes some locks and could not be used a safe way in NMI context.

The chance of a deadlock is real especially when printing stacks from all
CPUs.  This particular problem has been addressed on x86 by the commit
a9edc8809328 ("x86/nmi: Perform a safe NMI stack trace on all CPUs").

The patchset brings two big advantages.  First, it makes the NMI
backtraces safe on all architectures for free.  Second, it makes all NMI
messages almost safe on all architectures (the temporary buffer is
limited.  We still should keep the number of messages in NMI context at
minimum).

Note that there already are several messages printed in NMI context:
WARN_ON(in_nmi()), BUG_ON(in_nmi()), anything being printed out from MCE
handlers.  These are not easy to avoid.

This patch reuses most of the code and makes it generic.  It is useful for
all messages and architectures that support NMI.

The alternative printk_func is set when entering and is reseted when
leaving NMI context.  It queues IRQ work to copy the messages into the
main ring buffer in a safe context.

__printk_nmi_flush() copies all available messages and reset the buffer.
Then we could use a simple cmpxchg operations to get synchronized with
writers.  There is also used a spinlock to get synchronized with other
flushers.

We do not longer use seq_buf because it depends on external lock.  It
would be hard to make all supported operations safe for a lockless use.
It would be confusing and error prone to make only some operations safe.

The code is put into separate printk/nmi.c as suggested by Steven Rostedt.
It needs a per-CPU buffer and is compiled only on architectures that call
nmi_enter().  This is achieved by the new HAVE_NMI Kconfig flag.

The are MN10300 and Xtensa architectures.  We need to clean up NMI
handling there first.  Let's do it separately.

The patch is heavily based on the draft from Peter Zijlstra, see
https://lkml.org/lkml/2015/6/10/327

[arnd@arndb.de: printk-nmi: use %zu format string for size_t]
[akpm@linux-foundation.org: min_t->min - all types are size_t here]
Signed-off-by: Petr Mladek <pmladek@suse.com>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: Daniel Thompson <daniel.thompson@linaro.org>
Cc: Jiri Kosina <jkosina@suse.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: David Miller <davem@davemloft.net>
Cc: Daniel Thompson <daniel.thompson@linaro.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
 arch/Kconfig                  |   4 +
 arch/arm/Kconfig              |   1 +
 arch/arm/kernel/smp.c         |   2 +
 arch/avr32/Kconfig            |   1 +
 arch/blackfin/Kconfig         |   1 +
 arch/cris/Kconfig             |   1 +
 arch/mips/Kconfig             |   1 +
 arch/powerpc/Kconfig          |   1 +
 arch/s390/Kconfig             |   1 +
 arch/sh/Kconfig               |   1 +
 arch/sparc/Kconfig            |   1 +
 arch/tile/Kconfig             |   1 +
 arch/x86/Kconfig              |   1 +
 arch/x86/kernel/apic/hw_nmi.c |   1 -
 include/linux/hardirq.h       |   2 +
 include/linux/percpu.h        |   3 -
 include/linux/printk.h        |  12 ++-
 init/Kconfig                  |   5 +
 init/main.c                   |   1 +
 kernel/printk/Makefile        |   1 +
 kernel/printk/internal.h      |  44 +++++++++
 kernel/printk/nmi.c           | 219 ++++++++++++++++++++++++++++++++++++++++++
 kernel/printk/printk.c        |  19 +---
 lib/nmi_backtrace.c           |  89 +----------------
 24 files changed, 306 insertions(+), 107 deletions(-)
 create mode 100644 kernel/printk/internal.h
 create mode 100644 kernel/printk/nmi.c

diff --git a/arch/Kconfig b/arch/Kconfig
index 81869a5e7e17..049f243a7af8 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -187,7 +187,11 @@ config HAVE_OPTPROBES
 config HAVE_KPROBES_ON_FTRACE
 	bool
 
+config HAVE_NMI
+	bool
+
 config HAVE_NMI_WATCHDOG
+	depends on HAVE_NMI
 	bool
 #
 # An arch should select this if it provides all these things:
diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index cdfa6c2b7626..259543ec6dc9 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -66,6 +66,7 @@ config ARM
 	select HAVE_KRETPROBES if (HAVE_KPROBES)
 	select HAVE_MEMBLOCK
 	select HAVE_MOD_ARCH_SPECIFIC
+	select HAVE_NMI
 	select HAVE_OPROFILE if (HAVE_PERF_EVENTS)
 	select HAVE_OPTPROBES if !THUMB2_KERNEL
 	select HAVE_PERF_EVENTS
diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c
index baee70267f29..df90bc59bfce 100644
--- a/arch/arm/kernel/smp.c
+++ b/arch/arm/kernel/smp.c
@@ -644,9 +644,11 @@ void handle_IPI(int ipinr, struct pt_regs *regs)
 		break;
 
 	case IPI_CPU_BACKTRACE:
+		printk_nmi_enter();
 		irq_enter();
 		nmi_cpu_backtrace(regs);
 		irq_exit();
+		printk_nmi_exit();
 		break;
 
 	default:
diff --git a/arch/avr32/Kconfig b/arch/avr32/Kconfig
index b6878eb64884..9dc3e2b1180b 100644
--- a/arch/avr32/Kconfig
+++ b/arch/avr32/Kconfig
@@ -17,6 +17,7 @@ config AVR32
 	select GENERIC_CLOCKEVENTS
 	select HAVE_MOD_ARCH_SPECIFIC
 	select MODULES_USE_ELF_RELA
+	select HAVE_NMI
 	help
 	  AVR32 is a high-performance 32-bit RISC microprocessor core,
 	  designed for cost-sensitive embedded applications, with particular
diff --git a/arch/blackfin/Kconfig b/arch/blackfin/Kconfig
index a63c12259e77..28c63fea786d 100644
--- a/arch/blackfin/Kconfig
+++ b/arch/blackfin/Kconfig
@@ -40,6 +40,7 @@ config BLACKFIN
 	select HAVE_MOD_ARCH_SPECIFIC
 	select MODULES_USE_ELF_RELA
 	select HAVE_DEBUG_STACKOVERFLOW
+	select HAVE_NMI
 
 config GENERIC_CSUM
 	def_bool y
diff --git a/arch/cris/Kconfig b/arch/cris/Kconfig
index e086f9e93728..62148eaf0189 100644
--- a/arch/cris/Kconfig
+++ b/arch/cris/Kconfig
@@ -69,6 +69,7 @@ config CRIS
 	select GENERIC_CLOCKEVENTS if ETRAX_ARCH_V32
 	select GENERIC_SCHED_CLOCK if ETRAX_ARCH_V32
 	select HAVE_DEBUG_BUGVERBOSE if ETRAX_ARCH_V32
+	select HAVE_NMI
 
 config HZ
 	int
diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig
index 2018c2b0e078..7de958bb19f1 100644
--- a/arch/mips/Kconfig
+++ b/arch/mips/Kconfig
@@ -62,6 +62,7 @@ config MIPS
 	select HAVE_IRQ_TIME_ACCOUNTING
 	select GENERIC_TIME_VSYSCALL
 	select ARCH_CLOCKSOURCE_DATA
+	select HAVE_NMI
 
 menu "Machine selection"
 
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 7cd32c038286..f99b19d2d572 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -153,6 +153,7 @@ config PPC
 	select NO_BOOTMEM
 	select HAVE_GENERIC_RCU_GUP
 	select HAVE_PERF_EVENTS_NMI if PPC64
+	select HAVE_NMI if PERF_EVENTS
 	select EDAC_SUPPORT
 	select EDAC_ATOMIC_SCRUB
 	select ARCH_HAS_DMA_SET_COHERENT_MASK
diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
index bf24ab188921..42d315fb8a00 100644
--- a/arch/s390/Kconfig
+++ b/arch/s390/Kconfig
@@ -164,6 +164,7 @@ config S390
 	select TTY
 	select VIRT_CPU_ACCOUNTING
 	select VIRT_TO_BUS
+	select HAVE_NMI
 
 
 config SCHED_OMIT_FRAME_POINTER
diff --git a/arch/sh/Kconfig b/arch/sh/Kconfig
index 7ed20fc3fc81..1725e365d751 100644
--- a/arch/sh/Kconfig
+++ b/arch/sh/Kconfig
@@ -44,6 +44,7 @@ config SUPERH
 	select OLD_SIGSUSPEND
 	select OLD_SIGACTION
 	select HAVE_ARCH_AUDITSYSCALL
+	select HAVE_NMI
 	help
 	  The SuperH is a RISC processor targeted for use in embedded systems
 	  and consumer electronics; it was also used in the Sega Dreamcast
diff --git a/arch/sparc/Kconfig b/arch/sparc/Kconfig
index 57ffaf285c2f..a900e7f35f6a 100644
--- a/arch/sparc/Kconfig
+++ b/arch/sparc/Kconfig
@@ -79,6 +79,7 @@ config SPARC64
 	select NO_BOOTMEM
 	select HAVE_ARCH_AUDITSYSCALL
 	select ARCH_SUPPORTS_ATOMIC_RMW
+	select HAVE_NMI
 
 config ARCH_DEFCONFIG
 	string
diff --git a/arch/tile/Kconfig b/arch/tile/Kconfig
index 81719302b056..cedcbbefcd1c 100644
--- a/arch/tile/Kconfig
+++ b/arch/tile/Kconfig
@@ -29,6 +29,7 @@ config TILE
 	select HAVE_DEBUG_STACKOVERFLOW
 	select ARCH_WANT_FRAME_POINTERS
 	select HAVE_CONTEXT_TRACKING
+	select HAVE_NMI if USE_PMC
 	select EDAC_SUPPORT
 	select GENERIC_STRNCPY_FROM_USER
 	select GENERIC_STRNLEN_USER
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 2dc18605831f..30ea1f834a06 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -130,6 +130,7 @@ config X86
 	select HAVE_MEMBLOCK
 	select HAVE_MEMBLOCK_NODE_MAP
 	select HAVE_MIXED_BREAKPOINTS_REGS
+	select HAVE_NMI
 	select HAVE_OPROFILE
 	select HAVE_OPTPROBES
 	select HAVE_PCSPKR_PLATFORM
diff --git a/arch/x86/kernel/apic/hw_nmi.c b/arch/x86/kernel/apic/hw_nmi.c
index 045e424fb368..7788ce643bf4 100644
--- a/arch/x86/kernel/apic/hw_nmi.c
+++ b/arch/x86/kernel/apic/hw_nmi.c
@@ -18,7 +18,6 @@
 #include <linux/nmi.h>
 #include <linux/module.h>
 #include <linux/delay.h>
-#include <linux/seq_buf.h>
 
 #ifdef CONFIG_HARDLOCKUP_DETECTOR
 u64 hw_nmi_get_sample_period(int watchdog_thresh)
diff --git a/include/linux/hardirq.h b/include/linux/hardirq.h
index dfd59d6bc6f0..c683996110b1 100644
--- a/include/linux/hardirq.h
+++ b/include/linux/hardirq.h
@@ -61,6 +61,7 @@ extern void irq_exit(void);
 
 #define nmi_enter()						\
 	do {							\
+		printk_nmi_enter();				\
 		lockdep_off();					\
 		ftrace_nmi_enter();				\
 		BUG_ON(in_nmi());				\
@@ -77,6 +78,7 @@ extern void irq_exit(void);
 		preempt_count_sub(NMI_OFFSET + HARDIRQ_OFFSET);	\
 		ftrace_nmi_exit();				\
 		lockdep_on();					\
+		printk_nmi_exit();				\
 	} while (0)
 
 #endif /* LINUX_HARDIRQ_H */
diff --git a/include/linux/percpu.h b/include/linux/percpu.h
index 4bc6dafb703e..56939d3f6e53 100644
--- a/include/linux/percpu.h
+++ b/include/linux/percpu.h
@@ -129,7 +129,4 @@ extern phys_addr_t per_cpu_ptr_to_phys(void *addr);
 	(typeof(type) __percpu *)__alloc_percpu(sizeof(type),		\
 						__alignof__(type))
 
-/* To avoid include hell, as printk can not declare this, we declare it here */
-DECLARE_PER_CPU(printk_func_t, printk_func);
-
 #endif /* __LINUX_PERCPU_H */
diff --git a/include/linux/printk.h b/include/linux/printk.h
index 9ccbdf2c1453..51dd6b824fe2 100644
--- a/include/linux/printk.h
+++ b/include/linux/printk.h
@@ -122,7 +122,17 @@ static inline __printf(1, 2) __cold
 void early_printk(const char *s, ...) { }
 #endif
 
-typedef __printf(1, 0) int (*printk_func_t)(const char *fmt, va_list args);
+#ifdef CONFIG_PRINTK_NMI
+extern void printk_nmi_init(void);
+extern void printk_nmi_enter(void);
+extern void printk_nmi_exit(void);
+extern void printk_nmi_flush(void);
+#else
+static inline void printk_nmi_init(void) { }
+static inline void printk_nmi_enter(void) { }
+static inline void printk_nmi_exit(void) { }
+static inline void printk_nmi_flush(void) { }
+#endif /* PRINTK_NMI */
 
 #ifdef CONFIG_PRINTK
 asmlinkage __printf(5, 0)
diff --git a/init/Kconfig b/init/Kconfig
index 0dfd09d54c65..85c7a2bf1ea4 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1454,6 +1454,11 @@ config PRINTK
 	  very difficult to diagnose system problems, saying N here is
 	  strongly discouraged.
 
+config PRINTK_NMI
+	def_bool y
+	depends on PRINTK
+	depends on HAVE_NMI
+
 config BUG
 	bool "BUG() support" if EXPERT
 	default y
diff --git a/init/main.c b/init/main.c
index b3c6e363ae18..c0d3ff91c8e0 100644
--- a/init/main.c
+++ b/init/main.c
@@ -569,6 +569,7 @@ asmlinkage __visible void __init start_kernel(void)
 	timekeeping_init();
 	time_init();
 	sched_clock_postinit();
+	printk_nmi_init();
 	perf_event_init();
 	profile_init();
 	call_function_init();
diff --git a/kernel/printk/Makefile b/kernel/printk/Makefile
index 85405bdcf2b3..abb0042a427b 100644
--- a/kernel/printk/Makefile
+++ b/kernel/printk/Makefile
@@ -1,2 +1,3 @@
 obj-y	= printk.o
+obj-$(CONFIG_PRINTK_NMI)		+= nmi.o
 obj-$(CONFIG_A11Y_BRAILLE_CONSOLE)	+= braille.o
diff --git a/kernel/printk/internal.h b/kernel/printk/internal.h
new file mode 100644
index 000000000000..2de99faedfc1
--- /dev/null
+++ b/kernel/printk/internal.h
@@ -0,0 +1,44 @@
+/*
+ * internal.h - printk internal definitions
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+#include <linux/percpu.h>
+
+typedef __printf(1, 0) int (*printk_func_t)(const char *fmt, va_list args);
+
+int __printf(1, 0) vprintk_default(const char *fmt, va_list args);
+
+#ifdef CONFIG_PRINTK_NMI
+
+/*
+ * printk() could not take logbuf_lock in NMI context. Instead,
+ * it temporary stores the strings into a per-CPU buffer.
+ * The alternative implementation is chosen transparently
+ * via per-CPU variable.
+ */
+DECLARE_PER_CPU(printk_func_t, printk_func);
+static inline __printf(1, 0) int vprintk_func(const char *fmt, va_list args)
+{
+	return this_cpu_read(printk_func)(fmt, args);
+}
+
+#else /* CONFIG_PRINTK_NMI */
+
+static inline __printf(1, 0) int vprintk_func(const char *fmt, va_list args)
+{
+	return vprintk_default(fmt, args);
+}
+
+#endif /* CONFIG_PRINTK_NMI */
diff --git a/kernel/printk/nmi.c b/kernel/printk/nmi.c
new file mode 100644
index 000000000000..303cf0d15e57
--- /dev/null
+++ b/kernel/printk/nmi.c
@@ -0,0 +1,219 @@
+/*
+ * nmi.c - Safe printk in NMI context
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/preempt.h>
+#include <linux/spinlock.h>
+#include <linux/smp.h>
+#include <linux/cpumask.h>
+#include <linux/irq_work.h>
+#include <linux/printk.h>
+
+#include "internal.h"
+
+/*
+ * printk() could not take logbuf_lock in NMI context. Instead,
+ * it uses an alternative implementation that temporary stores
+ * the strings into a per-CPU buffer. The content of the buffer
+ * is later flushed into the main ring buffer via IRQ work.
+ *
+ * The alternative implementation is chosen transparently
+ * via @printk_func per-CPU variable.
+ *
+ * The implementation allows to flush the strings also from another CPU.
+ * There are situations when we want to make sure that all buffers
+ * were handled or when IRQs are blocked.
+ */
+DEFINE_PER_CPU(printk_func_t, printk_func) = vprintk_default;
+static int printk_nmi_irq_ready;
+
+#define NMI_LOG_BUF_LEN (4096 - sizeof(atomic_t) - sizeof(struct irq_work))
+
+struct nmi_seq_buf {
+	atomic_t		len;	/* length of written data */
+	struct irq_work		work;	/* IRQ work that flushes the buffer */
+	unsigned char		buffer[NMI_LOG_BUF_LEN];
+};
+static DEFINE_PER_CPU(struct nmi_seq_buf, nmi_print_seq);
+
+/*
+ * Safe printk() for NMI context. It uses a per-CPU buffer to
+ * store the message. NMIs are not nested, so there is always only
+ * one writer running. But the buffer might get flushed from another
+ * CPU, so we need to be careful.
+ */
+static int vprintk_nmi(const char *fmt, va_list args)
+{
+	struct nmi_seq_buf *s = this_cpu_ptr(&nmi_print_seq);
+	int add = 0;
+	size_t len;
+
+again:
+	len = atomic_read(&s->len);
+
+	if (len >= sizeof(s->buffer))
+		return 0;
+
+	/*
+	 * Make sure that all old data have been read before the buffer was
+	 * reseted. This is not needed when we just append data.
+	 */
+	if (!len)
+		smp_rmb();
+
+	add = vsnprintf(s->buffer + len, sizeof(s->buffer) - len, fmt, args);
+
+	/*
+	 * Do it once again if the buffer has been flushed in the meantime.
+	 * Note that atomic_cmpxchg() is an implicit memory barrier that
+	 * makes sure that the data were written before updating s->len.
+	 */
+	if (atomic_cmpxchg(&s->len, len, len + add) != len)
+		goto again;
+
+	/* Get flushed in a more safe context. */
+	if (add && printk_nmi_irq_ready) {
+		/* Make sure that IRQ work is really initialized. */
+		smp_rmb();
+		irq_work_queue(&s->work);
+	}
+
+	return add;
+}
+
+/*
+ * printk one line from the temporary buffer from @start index until
+ * and including the @end index.
+ */
+static void print_nmi_seq_line(struct nmi_seq_buf *s, int start, int end)
+{
+	const char *buf = s->buffer + start;
+
+	printk("%.*s", (end - start) + 1, buf);
+}
+
+/*
+ * Flush data from the associated per_CPU buffer. The function
+ * can be called either via IRQ work or independently.
+ */
+static void __printk_nmi_flush(struct irq_work *work)
+{
+	static raw_spinlock_t read_lock =
+		__RAW_SPIN_LOCK_INITIALIZER(read_lock);
+	struct nmi_seq_buf *s = container_of(work, struct nmi_seq_buf, work);
+	unsigned long flags;
+	size_t len, size;
+	int i, last_i;
+
+	/*
+	 * The lock has two functions. First, one reader has to flush all
+	 * available message to make the lockless synchronization with
+	 * writers easier. Second, we do not want to mix messages from
+	 * different CPUs. This is especially important when printing
+	 * a backtrace.
+	 */
+	raw_spin_lock_irqsave(&read_lock, flags);
+
+	i = 0;
+more:
+	len = atomic_read(&s->len);
+
+	/*
+	 * This is just a paranoid check that nobody has manipulated
+	 * the buffer an unexpected way. If we printed something then
+	 * @len must only increase.
+	 */
+	if (i && i >= len)
+		pr_err("printk_nmi_flush: internal error: i=%d >= len=%zu\n",
+		       i, len);
+
+	if (!len)
+		goto out; /* Someone else has already flushed the buffer. */
+
+	/* Make sure that data has been written up to the @len */
+	smp_rmb();
+
+	size = min(len, sizeof(s->buffer));
+	last_i = i;
+
+	/* Print line by line. */
+	for (; i < size; i++) {
+		if (s->buffer[i] == '\n') {
+			print_nmi_seq_line(s, last_i, i);
+			last_i = i + 1;
+		}
+	}
+	/* Check if there was a partial line. */
+	if (last_i < size) {
+		print_nmi_seq_line(s, last_i, size - 1);
+		pr_cont("\n");
+	}
+
+	/*
+	 * Check that nothing has got added in the meantime and truncate
+	 * the buffer. Note that atomic_cmpxchg() is an implicit memory
+	 * barrier that makes sure that the data were copied before
+	 * updating s->len.
+	 */
+	if (atomic_cmpxchg(&s->len, len, 0) != len)
+		goto more;
+
+out:
+	raw_spin_unlock_irqrestore(&read_lock, flags);
+}
+
+/**
+ * printk_nmi_flush - flush all per-cpu nmi buffers.
+ *
+ * The buffers are flushed automatically via IRQ work. This function
+ * is useful only when someone wants to be sure that all buffers have
+ * been flushed at some point.
+ */
+void printk_nmi_flush(void)
+{
+	int cpu;
+
+	for_each_possible_cpu(cpu)
+		__printk_nmi_flush(&per_cpu(nmi_print_seq, cpu).work);
+}
+
+void __init printk_nmi_init(void)
+{
+	int cpu;
+
+	for_each_possible_cpu(cpu) {
+		struct nmi_seq_buf *s = &per_cpu(nmi_print_seq, cpu);
+
+		init_irq_work(&s->work, __printk_nmi_flush);
+	}
+
+	/* Make sure that IRQ works are initialized before enabling. */
+	smp_wmb();
+	printk_nmi_irq_ready = 1;
+
+	/* Flush pending messages that did not have scheduled IRQ works. */
+	printk_nmi_flush();
+}
+
+void printk_nmi_enter(void)
+{
+	this_cpu_write(printk_func, vprintk_nmi);
+}
+
+void printk_nmi_exit(void)
+{
+	this_cpu_write(printk_func, vprintk_default);
+}
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index bfbf284e4218..71eba0607034 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -55,6 +55,7 @@
 
 #include "console_cmdline.h"
 #include "braille.h"
+#include "internal.h"
 
 int console_printk[4] = {
 	CONSOLE_LOGLEVEL_DEFAULT,	/* console_loglevel */
@@ -1807,14 +1808,6 @@ int vprintk_default(const char *fmt, va_list args)
 }
 EXPORT_SYMBOL_GPL(vprintk_default);
 
-/*
- * This allows printk to be diverted to another function per cpu.
- * This is useful for calling printk functions from within NMI
- * without worrying about race conditions that can lock up the
- * box.
- */
-DEFINE_PER_CPU(printk_func_t, printk_func) = vprintk_default;
-
 /**
  * printk - print a kernel message
  * @fmt: format string
@@ -1838,21 +1831,11 @@ DEFINE_PER_CPU(printk_func_t, printk_func) = vprintk_default;
  */
 asmlinkage __visible int printk(const char *fmt, ...)
 {
-	printk_func_t vprintk_func;
 	va_list args;
 	int r;
 
 	va_start(args, fmt);
-
-	/*
-	 * If a caller overrides the per_cpu printk_func, then it needs
-	 * to disable preemption when calling printk(). Otherwise
-	 * the printk_func should be set to the default. No need to
-	 * disable preemption here.
-	 */
-	vprintk_func = this_cpu_read(printk_func);
 	r = vprintk_func(fmt, args);
-
 	va_end(args);
 
 	return r;
diff --git a/lib/nmi_backtrace.c b/lib/nmi_backtrace.c
index 6019c53c669e..26caf51cc238 100644
--- a/lib/nmi_backtrace.c
+++ b/lib/nmi_backtrace.c
@@ -16,33 +16,14 @@
 #include <linux/delay.h>
 #include <linux/kprobes.h>
 #include <linux/nmi.h>
-#include <linux/seq_buf.h>
 
 #ifdef arch_trigger_all_cpu_backtrace
 /* For reliability, we're prepared to waste bits here. */
 static DECLARE_BITMAP(backtrace_mask, NR_CPUS) __read_mostly;
-static cpumask_t printtrace_mask;
-
-#define NMI_BUF_SIZE		4096
-
-struct nmi_seq_buf {
-	unsigned char		buffer[NMI_BUF_SIZE];
-	struct seq_buf		seq;
-};
-
-/* Safe printing in NMI context */
-static DEFINE_PER_CPU(struct nmi_seq_buf, nmi_print_seq);
 
 /* "in progress" flag of arch_trigger_all_cpu_backtrace */
 static unsigned long backtrace_flag;
 
-static void print_seq_line(struct nmi_seq_buf *s, int start, int end)
-{
-	const char *buf = s->buffer + start;
-
-	printk("%.*s", (end - start) + 1, buf);
-}
-
 /*
  * When raise() is called it will be is passed a pointer to the
  * backtrace_mask. Architectures that call nmi_cpu_backtrace()
@@ -52,8 +33,7 @@ static void print_seq_line(struct nmi_seq_buf *s, int start, int end)
 void nmi_trigger_all_cpu_backtrace(bool include_self,
 				   void (*raise)(cpumask_t *mask))
 {
-	struct nmi_seq_buf *s;
-	int i, cpu, this_cpu = get_cpu();
+	int i, this_cpu = get_cpu();
 
 	if (test_and_set_bit(0, &backtrace_flag)) {
 		/*
@@ -68,17 +48,6 @@ void nmi_trigger_all_cpu_backtrace(bool include_self,
 	if (!include_self)
 		cpumask_clear_cpu(this_cpu, to_cpumask(backtrace_mask));
 
-	cpumask_copy(&printtrace_mask, to_cpumask(backtrace_mask));
-
-	/*
-	 * Set up per_cpu seq_buf buffers that the NMIs running on the other
-	 * CPUs will write to.
-	 */
-	for_each_cpu(cpu, to_cpumask(backtrace_mask)) {
-		s = &per_cpu(nmi_print_seq, cpu);
-		seq_buf_init(&s->seq, s->buffer, NMI_BUF_SIZE);
-	}
-
 	if (!cpumask_empty(to_cpumask(backtrace_mask))) {
 		pr_info("Sending NMI to %s CPUs:\n",
 			(include_self ? "all" : "other"));
@@ -94,73 +63,25 @@ void nmi_trigger_all_cpu_backtrace(bool include_self,
 	}
 
 	/*
-	 * Now that all the NMIs have triggered, we can dump out their
-	 * back traces safely to the console.
+	 * Force flush any remote buffers that might be stuck in IRQ context
+	 * and therefore could not run their irq_work.
 	 */
-	for_each_cpu(cpu, &printtrace_mask) {
-		int len, last_i = 0;
+	printk_nmi_flush();
 
-		s = &per_cpu(nmi_print_seq, cpu);
-		len = seq_buf_used(&s->seq);
-		if (!len)
-			continue;
-
-		/* Print line by line. */
-		for (i = 0; i < len; i++) {
-			if (s->buffer[i] == '\n') {
-				print_seq_line(s, last_i, i);
-				last_i = i + 1;
-			}
-		}
-		/* Check if there was a partial line. */
-		if (last_i < len) {
-			print_seq_line(s, last_i, len - 1);
-			pr_cont("\n");
-		}
-	}
-
-	clear_bit(0, &backtrace_flag);
-	smp_mb__after_atomic();
+	clear_bit_unlock(0, &backtrace_flag);
 	put_cpu();
 }
 
-/*
- * It is not safe to call printk() directly from NMI handlers.
- * It may be fine if the NMI detected a lock up and we have no choice
- * but to do so, but doing a NMI on all other CPUs to get a back trace
- * can be done with a sysrq-l. We don't want that to lock up, which
- * can happen if the NMI interrupts a printk in progress.
- *
- * Instead, we redirect the vprintk() to this nmi_vprintk() that writes
- * the content into a per cpu seq_buf buffer. Then when the NMIs are
- * all done, we can safely dump the contents of the seq_buf to a printk()
- * from a non NMI context.
- */
-static int nmi_vprintk(const char *fmt, va_list args)
-{
-	struct nmi_seq_buf *s = this_cpu_ptr(&nmi_print_seq);
-	unsigned int len = seq_buf_used(&s->seq);
-
-	seq_buf_vprintf(&s->seq, fmt, args);
-	return seq_buf_used(&s->seq) - len;
-}
-
 bool nmi_cpu_backtrace(struct pt_regs *regs)
 {
 	int cpu = smp_processor_id();
 
 	if (cpumask_test_cpu(cpu, to_cpumask(backtrace_mask))) {
-		printk_func_t printk_func_save = this_cpu_read(printk_func);
-
-		/* Replace printk to write into the NMI seq */
-		this_cpu_write(printk_func, nmi_vprintk);
 		pr_warn("NMI backtrace for cpu %d\n", cpu);
 		if (regs)
 			show_regs(regs);
 		else
 			dump_stack();
-		this_cpu_write(printk_func, printk_func_save);
-
 		cpumask_clear_cpu(cpu, to_cpumask(backtrace_mask));
 		return true;
 	}
-- 
1.8.5.6

^ permalink raw reply	[flat|nested] 97+ messages in thread

* [PATCH v5 1/4] printk/nmi: generic solution for safe printk in NMI
@ 2016-04-21 11:48   ` Petr Mladek
  0 siblings, 0 replies; 97+ messages in thread
From: Petr Mladek @ 2016-04-21 11:48 UTC (permalink / raw)
  To: linux-arm-kernel

printk() takes some locks and could not be used a safe way in NMI context.

The chance of a deadlock is real especially when printing stacks from all
CPUs.  This particular problem has been addressed on x86 by the commit
a9edc8809328 ("x86/nmi: Perform a safe NMI stack trace on all CPUs").

The patchset brings two big advantages.  First, it makes the NMI
backtraces safe on all architectures for free.  Second, it makes all NMI
messages almost safe on all architectures (the temporary buffer is
limited.  We still should keep the number of messages in NMI context at
minimum).

Note that there already are several messages printed in NMI context:
WARN_ON(in_nmi()), BUG_ON(in_nmi()), anything being printed out from MCE
handlers.  These are not easy to avoid.

This patch reuses most of the code and makes it generic.  It is useful for
all messages and architectures that support NMI.

The alternative printk_func is set when entering and is reseted when
leaving NMI context.  It queues IRQ work to copy the messages into the
main ring buffer in a safe context.

__printk_nmi_flush() copies all available messages and reset the buffer.
Then we could use a simple cmpxchg operations to get synchronized with
writers.  There is also used a spinlock to get synchronized with other
flushers.

We do not longer use seq_buf because it depends on external lock.  It
would be hard to make all supported operations safe for a lockless use.
It would be confusing and error prone to make only some operations safe.

The code is put into separate printk/nmi.c as suggested by Steven Rostedt.
It needs a per-CPU buffer and is compiled only on architectures that call
nmi_enter().  This is achieved by the new HAVE_NMI Kconfig flag.

The are MN10300 and Xtensa architectures.  We need to clean up NMI
handling there first.  Let's do it separately.

The patch is heavily based on the draft from Peter Zijlstra, see
https://lkml.org/lkml/2015/6/10/327

[arnd at arndb.de: printk-nmi: use %zu format string for size_t]
[akpm at linux-foundation.org: min_t->min - all types are size_t here]
Signed-off-by: Petr Mladek <pmladek@suse.com>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: Daniel Thompson <daniel.thompson@linaro.org>
Cc: Jiri Kosina <jkosina@suse.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: David Miller <davem@davemloft.net>
Cc: Daniel Thompson <daniel.thompson@linaro.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
 arch/Kconfig                  |   4 +
 arch/arm/Kconfig              |   1 +
 arch/arm/kernel/smp.c         |   2 +
 arch/avr32/Kconfig            |   1 +
 arch/blackfin/Kconfig         |   1 +
 arch/cris/Kconfig             |   1 +
 arch/mips/Kconfig             |   1 +
 arch/powerpc/Kconfig          |   1 +
 arch/s390/Kconfig             |   1 +
 arch/sh/Kconfig               |   1 +
 arch/sparc/Kconfig            |   1 +
 arch/tile/Kconfig             |   1 +
 arch/x86/Kconfig              |   1 +
 arch/x86/kernel/apic/hw_nmi.c |   1 -
 include/linux/hardirq.h       |   2 +
 include/linux/percpu.h        |   3 -
 include/linux/printk.h        |  12 ++-
 init/Kconfig                  |   5 +
 init/main.c                   |   1 +
 kernel/printk/Makefile        |   1 +
 kernel/printk/internal.h      |  44 +++++++++
 kernel/printk/nmi.c           | 219 ++++++++++++++++++++++++++++++++++++++++++
 kernel/printk/printk.c        |  19 +---
 lib/nmi_backtrace.c           |  89 +----------------
 24 files changed, 306 insertions(+), 107 deletions(-)
 create mode 100644 kernel/printk/internal.h
 create mode 100644 kernel/printk/nmi.c

diff --git a/arch/Kconfig b/arch/Kconfig
index 81869a5e7e17..049f243a7af8 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -187,7 +187,11 @@ config HAVE_OPTPROBES
 config HAVE_KPROBES_ON_FTRACE
 	bool
 
+config HAVE_NMI
+	bool
+
 config HAVE_NMI_WATCHDOG
+	depends on HAVE_NMI
 	bool
 #
 # An arch should select this if it provides all these things:
diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index cdfa6c2b7626..259543ec6dc9 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -66,6 +66,7 @@ config ARM
 	select HAVE_KRETPROBES if (HAVE_KPROBES)
 	select HAVE_MEMBLOCK
 	select HAVE_MOD_ARCH_SPECIFIC
+	select HAVE_NMI
 	select HAVE_OPROFILE if (HAVE_PERF_EVENTS)
 	select HAVE_OPTPROBES if !THUMB2_KERNEL
 	select HAVE_PERF_EVENTS
diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c
index baee70267f29..df90bc59bfce 100644
--- a/arch/arm/kernel/smp.c
+++ b/arch/arm/kernel/smp.c
@@ -644,9 +644,11 @@ void handle_IPI(int ipinr, struct pt_regs *regs)
 		break;
 
 	case IPI_CPU_BACKTRACE:
+		printk_nmi_enter();
 		irq_enter();
 		nmi_cpu_backtrace(regs);
 		irq_exit();
+		printk_nmi_exit();
 		break;
 
 	default:
diff --git a/arch/avr32/Kconfig b/arch/avr32/Kconfig
index b6878eb64884..9dc3e2b1180b 100644
--- a/arch/avr32/Kconfig
+++ b/arch/avr32/Kconfig
@@ -17,6 +17,7 @@ config AVR32
 	select GENERIC_CLOCKEVENTS
 	select HAVE_MOD_ARCH_SPECIFIC
 	select MODULES_USE_ELF_RELA
+	select HAVE_NMI
 	help
 	  AVR32 is a high-performance 32-bit RISC microprocessor core,
 	  designed for cost-sensitive embedded applications, with particular
diff --git a/arch/blackfin/Kconfig b/arch/blackfin/Kconfig
index a63c12259e77..28c63fea786d 100644
--- a/arch/blackfin/Kconfig
+++ b/arch/blackfin/Kconfig
@@ -40,6 +40,7 @@ config BLACKFIN
 	select HAVE_MOD_ARCH_SPECIFIC
 	select MODULES_USE_ELF_RELA
 	select HAVE_DEBUG_STACKOVERFLOW
+	select HAVE_NMI
 
 config GENERIC_CSUM
 	def_bool y
diff --git a/arch/cris/Kconfig b/arch/cris/Kconfig
index e086f9e93728..62148eaf0189 100644
--- a/arch/cris/Kconfig
+++ b/arch/cris/Kconfig
@@ -69,6 +69,7 @@ config CRIS
 	select GENERIC_CLOCKEVENTS if ETRAX_ARCH_V32
 	select GENERIC_SCHED_CLOCK if ETRAX_ARCH_V32
 	select HAVE_DEBUG_BUGVERBOSE if ETRAX_ARCH_V32
+	select HAVE_NMI
 
 config HZ
 	int
diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig
index 2018c2b0e078..7de958bb19f1 100644
--- a/arch/mips/Kconfig
+++ b/arch/mips/Kconfig
@@ -62,6 +62,7 @@ config MIPS
 	select HAVE_IRQ_TIME_ACCOUNTING
 	select GENERIC_TIME_VSYSCALL
 	select ARCH_CLOCKSOURCE_DATA
+	select HAVE_NMI
 
 menu "Machine selection"
 
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 7cd32c038286..f99b19d2d572 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -153,6 +153,7 @@ config PPC
 	select NO_BOOTMEM
 	select HAVE_GENERIC_RCU_GUP
 	select HAVE_PERF_EVENTS_NMI if PPC64
+	select HAVE_NMI if PERF_EVENTS
 	select EDAC_SUPPORT
 	select EDAC_ATOMIC_SCRUB
 	select ARCH_HAS_DMA_SET_COHERENT_MASK
diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
index bf24ab188921..42d315fb8a00 100644
--- a/arch/s390/Kconfig
+++ b/arch/s390/Kconfig
@@ -164,6 +164,7 @@ config S390
 	select TTY
 	select VIRT_CPU_ACCOUNTING
 	select VIRT_TO_BUS
+	select HAVE_NMI
 
 
 config SCHED_OMIT_FRAME_POINTER
diff --git a/arch/sh/Kconfig b/arch/sh/Kconfig
index 7ed20fc3fc81..1725e365d751 100644
--- a/arch/sh/Kconfig
+++ b/arch/sh/Kconfig
@@ -44,6 +44,7 @@ config SUPERH
 	select OLD_SIGSUSPEND
 	select OLD_SIGACTION
 	select HAVE_ARCH_AUDITSYSCALL
+	select HAVE_NMI
 	help
 	  The SuperH is a RISC processor targeted for use in embedded systems
 	  and consumer electronics; it was also used in the Sega Dreamcast
diff --git a/arch/sparc/Kconfig b/arch/sparc/Kconfig
index 57ffaf285c2f..a900e7f35f6a 100644
--- a/arch/sparc/Kconfig
+++ b/arch/sparc/Kconfig
@@ -79,6 +79,7 @@ config SPARC64
 	select NO_BOOTMEM
 	select HAVE_ARCH_AUDITSYSCALL
 	select ARCH_SUPPORTS_ATOMIC_RMW
+	select HAVE_NMI
 
 config ARCH_DEFCONFIG
 	string
diff --git a/arch/tile/Kconfig b/arch/tile/Kconfig
index 81719302b056..cedcbbefcd1c 100644
--- a/arch/tile/Kconfig
+++ b/arch/tile/Kconfig
@@ -29,6 +29,7 @@ config TILE
 	select HAVE_DEBUG_STACKOVERFLOW
 	select ARCH_WANT_FRAME_POINTERS
 	select HAVE_CONTEXT_TRACKING
+	select HAVE_NMI if USE_PMC
 	select EDAC_SUPPORT
 	select GENERIC_STRNCPY_FROM_USER
 	select GENERIC_STRNLEN_USER
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 2dc18605831f..30ea1f834a06 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -130,6 +130,7 @@ config X86
 	select HAVE_MEMBLOCK
 	select HAVE_MEMBLOCK_NODE_MAP
 	select HAVE_MIXED_BREAKPOINTS_REGS
+	select HAVE_NMI
 	select HAVE_OPROFILE
 	select HAVE_OPTPROBES
 	select HAVE_PCSPKR_PLATFORM
diff --git a/arch/x86/kernel/apic/hw_nmi.c b/arch/x86/kernel/apic/hw_nmi.c
index 045e424fb368..7788ce643bf4 100644
--- a/arch/x86/kernel/apic/hw_nmi.c
+++ b/arch/x86/kernel/apic/hw_nmi.c
@@ -18,7 +18,6 @@
 #include <linux/nmi.h>
 #include <linux/module.h>
 #include <linux/delay.h>
-#include <linux/seq_buf.h>
 
 #ifdef CONFIG_HARDLOCKUP_DETECTOR
 u64 hw_nmi_get_sample_period(int watchdog_thresh)
diff --git a/include/linux/hardirq.h b/include/linux/hardirq.h
index dfd59d6bc6f0..c683996110b1 100644
--- a/include/linux/hardirq.h
+++ b/include/linux/hardirq.h
@@ -61,6 +61,7 @@ extern void irq_exit(void);
 
 #define nmi_enter()						\
 	do {							\
+		printk_nmi_enter();				\
 		lockdep_off();					\
 		ftrace_nmi_enter();				\
 		BUG_ON(in_nmi());				\
@@ -77,6 +78,7 @@ extern void irq_exit(void);
 		preempt_count_sub(NMI_OFFSET + HARDIRQ_OFFSET);	\
 		ftrace_nmi_exit();				\
 		lockdep_on();					\
+		printk_nmi_exit();				\
 	} while (0)
 
 #endif /* LINUX_HARDIRQ_H */
diff --git a/include/linux/percpu.h b/include/linux/percpu.h
index 4bc6dafb703e..56939d3f6e53 100644
--- a/include/linux/percpu.h
+++ b/include/linux/percpu.h
@@ -129,7 +129,4 @@ extern phys_addr_t per_cpu_ptr_to_phys(void *addr);
 	(typeof(type) __percpu *)__alloc_percpu(sizeof(type),		\
 						__alignof__(type))
 
-/* To avoid include hell, as printk can not declare this, we declare it here */
-DECLARE_PER_CPU(printk_func_t, printk_func);
-
 #endif /* __LINUX_PERCPU_H */
diff --git a/include/linux/printk.h b/include/linux/printk.h
index 9ccbdf2c1453..51dd6b824fe2 100644
--- a/include/linux/printk.h
+++ b/include/linux/printk.h
@@ -122,7 +122,17 @@ static inline __printf(1, 2) __cold
 void early_printk(const char *s, ...) { }
 #endif
 
-typedef __printf(1, 0) int (*printk_func_t)(const char *fmt, va_list args);
+#ifdef CONFIG_PRINTK_NMI
+extern void printk_nmi_init(void);
+extern void printk_nmi_enter(void);
+extern void printk_nmi_exit(void);
+extern void printk_nmi_flush(void);
+#else
+static inline void printk_nmi_init(void) { }
+static inline void printk_nmi_enter(void) { }
+static inline void printk_nmi_exit(void) { }
+static inline void printk_nmi_flush(void) { }
+#endif /* PRINTK_NMI */
 
 #ifdef CONFIG_PRINTK
 asmlinkage __printf(5, 0)
diff --git a/init/Kconfig b/init/Kconfig
index 0dfd09d54c65..85c7a2bf1ea4 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1454,6 +1454,11 @@ config PRINTK
 	  very difficult to diagnose system problems, saying N here is
 	  strongly discouraged.
 
+config PRINTK_NMI
+	def_bool y
+	depends on PRINTK
+	depends on HAVE_NMI
+
 config BUG
 	bool "BUG() support" if EXPERT
 	default y
diff --git a/init/main.c b/init/main.c
index b3c6e363ae18..c0d3ff91c8e0 100644
--- a/init/main.c
+++ b/init/main.c
@@ -569,6 +569,7 @@ asmlinkage __visible void __init start_kernel(void)
 	timekeeping_init();
 	time_init();
 	sched_clock_postinit();
+	printk_nmi_init();
 	perf_event_init();
 	profile_init();
 	call_function_init();
diff --git a/kernel/printk/Makefile b/kernel/printk/Makefile
index 85405bdcf2b3..abb0042a427b 100644
--- a/kernel/printk/Makefile
+++ b/kernel/printk/Makefile
@@ -1,2 +1,3 @@
 obj-y	= printk.o
+obj-$(CONFIG_PRINTK_NMI)		+= nmi.o
 obj-$(CONFIG_A11Y_BRAILLE_CONSOLE)	+= braille.o
diff --git a/kernel/printk/internal.h b/kernel/printk/internal.h
new file mode 100644
index 000000000000..2de99faedfc1
--- /dev/null
+++ b/kernel/printk/internal.h
@@ -0,0 +1,44 @@
+/*
+ * internal.h - printk internal definitions
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+#include <linux/percpu.h>
+
+typedef __printf(1, 0) int (*printk_func_t)(const char *fmt, va_list args);
+
+int __printf(1, 0) vprintk_default(const char *fmt, va_list args);
+
+#ifdef CONFIG_PRINTK_NMI
+
+/*
+ * printk() could not take logbuf_lock in NMI context. Instead,
+ * it temporary stores the strings into a per-CPU buffer.
+ * The alternative implementation is chosen transparently
+ * via per-CPU variable.
+ */
+DECLARE_PER_CPU(printk_func_t, printk_func);
+static inline __printf(1, 0) int vprintk_func(const char *fmt, va_list args)
+{
+	return this_cpu_read(printk_func)(fmt, args);
+}
+
+#else /* CONFIG_PRINTK_NMI */
+
+static inline __printf(1, 0) int vprintk_func(const char *fmt, va_list args)
+{
+	return vprintk_default(fmt, args);
+}
+
+#endif /* CONFIG_PRINTK_NMI */
diff --git a/kernel/printk/nmi.c b/kernel/printk/nmi.c
new file mode 100644
index 000000000000..303cf0d15e57
--- /dev/null
+++ b/kernel/printk/nmi.c
@@ -0,0 +1,219 @@
+/*
+ * nmi.c - Safe printk in NMI context
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/preempt.h>
+#include <linux/spinlock.h>
+#include <linux/smp.h>
+#include <linux/cpumask.h>
+#include <linux/irq_work.h>
+#include <linux/printk.h>
+
+#include "internal.h"
+
+/*
+ * printk() could not take logbuf_lock in NMI context. Instead,
+ * it uses an alternative implementation that temporary stores
+ * the strings into a per-CPU buffer. The content of the buffer
+ * is later flushed into the main ring buffer via IRQ work.
+ *
+ * The alternative implementation is chosen transparently
+ * via @printk_func per-CPU variable.
+ *
+ * The implementation allows to flush the strings also from another CPU.
+ * There are situations when we want to make sure that all buffers
+ * were handled or when IRQs are blocked.
+ */
+DEFINE_PER_CPU(printk_func_t, printk_func) = vprintk_default;
+static int printk_nmi_irq_ready;
+
+#define NMI_LOG_BUF_LEN (4096 - sizeof(atomic_t) - sizeof(struct irq_work))
+
+struct nmi_seq_buf {
+	atomic_t		len;	/* length of written data */
+	struct irq_work		work;	/* IRQ work that flushes the buffer */
+	unsigned char		buffer[NMI_LOG_BUF_LEN];
+};
+static DEFINE_PER_CPU(struct nmi_seq_buf, nmi_print_seq);
+
+/*
+ * Safe printk() for NMI context. It uses a per-CPU buffer to
+ * store the message. NMIs are not nested, so there is always only
+ * one writer running. But the buffer might get flushed from another
+ * CPU, so we need to be careful.
+ */
+static int vprintk_nmi(const char *fmt, va_list args)
+{
+	struct nmi_seq_buf *s = this_cpu_ptr(&nmi_print_seq);
+	int add = 0;
+	size_t len;
+
+again:
+	len = atomic_read(&s->len);
+
+	if (len >= sizeof(s->buffer))
+		return 0;
+
+	/*
+	 * Make sure that all old data have been read before the buffer was
+	 * reseted. This is not needed when we just append data.
+	 */
+	if (!len)
+		smp_rmb();
+
+	add = vsnprintf(s->buffer + len, sizeof(s->buffer) - len, fmt, args);
+
+	/*
+	 * Do it once again if the buffer has been flushed in the meantime.
+	 * Note that atomic_cmpxchg() is an implicit memory barrier that
+	 * makes sure that the data were written before updating s->len.
+	 */
+	if (atomic_cmpxchg(&s->len, len, len + add) != len)
+		goto again;
+
+	/* Get flushed in a more safe context. */
+	if (add && printk_nmi_irq_ready) {
+		/* Make sure that IRQ work is really initialized. */
+		smp_rmb();
+		irq_work_queue(&s->work);
+	}
+
+	return add;
+}
+
+/*
+ * printk one line from the temporary buffer from @start index until
+ * and including the @end index.
+ */
+static void print_nmi_seq_line(struct nmi_seq_buf *s, int start, int end)
+{
+	const char *buf = s->buffer + start;
+
+	printk("%.*s", (end - start) + 1, buf);
+}
+
+/*
+ * Flush data from the associated per_CPU buffer. The function
+ * can be called either via IRQ work or independently.
+ */
+static void __printk_nmi_flush(struct irq_work *work)
+{
+	static raw_spinlock_t read_lock =
+		__RAW_SPIN_LOCK_INITIALIZER(read_lock);
+	struct nmi_seq_buf *s = container_of(work, struct nmi_seq_buf, work);
+	unsigned long flags;
+	size_t len, size;
+	int i, last_i;
+
+	/*
+	 * The lock has two functions. First, one reader has to flush all
+	 * available message to make the lockless synchronization with
+	 * writers easier. Second, we do not want to mix messages from
+	 * different CPUs. This is especially important when printing
+	 * a backtrace.
+	 */
+	raw_spin_lock_irqsave(&read_lock, flags);
+
+	i = 0;
+more:
+	len = atomic_read(&s->len);
+
+	/*
+	 * This is just a paranoid check that nobody has manipulated
+	 * the buffer an unexpected way. If we printed something then
+	 * @len must only increase.
+	 */
+	if (i && i >= len)
+		pr_err("printk_nmi_flush: internal error: i=%d >= len=%zu\n",
+		       i, len);
+
+	if (!len)
+		goto out; /* Someone else has already flushed the buffer. */
+
+	/* Make sure that data has been written up to the @len */
+	smp_rmb();
+
+	size = min(len, sizeof(s->buffer));
+	last_i = i;
+
+	/* Print line by line. */
+	for (; i < size; i++) {
+		if (s->buffer[i] == '\n') {
+			print_nmi_seq_line(s, last_i, i);
+			last_i = i + 1;
+		}
+	}
+	/* Check if there was a partial line. */
+	if (last_i < size) {
+		print_nmi_seq_line(s, last_i, size - 1);
+		pr_cont("\n");
+	}
+
+	/*
+	 * Check that nothing has got added in the meantime and truncate
+	 * the buffer. Note that atomic_cmpxchg() is an implicit memory
+	 * barrier that makes sure that the data were copied before
+	 * updating s->len.
+	 */
+	if (atomic_cmpxchg(&s->len, len, 0) != len)
+		goto more;
+
+out:
+	raw_spin_unlock_irqrestore(&read_lock, flags);
+}
+
+/**
+ * printk_nmi_flush - flush all per-cpu nmi buffers.
+ *
+ * The buffers are flushed automatically via IRQ work. This function
+ * is useful only when someone wants to be sure that all buffers have
+ * been flushed at some point.
+ */
+void printk_nmi_flush(void)
+{
+	int cpu;
+
+	for_each_possible_cpu(cpu)
+		__printk_nmi_flush(&per_cpu(nmi_print_seq, cpu).work);
+}
+
+void __init printk_nmi_init(void)
+{
+	int cpu;
+
+	for_each_possible_cpu(cpu) {
+		struct nmi_seq_buf *s = &per_cpu(nmi_print_seq, cpu);
+
+		init_irq_work(&s->work, __printk_nmi_flush);
+	}
+
+	/* Make sure that IRQ works are initialized before enabling. */
+	smp_wmb();
+	printk_nmi_irq_ready = 1;
+
+	/* Flush pending messages that did not have scheduled IRQ works. */
+	printk_nmi_flush();
+}
+
+void printk_nmi_enter(void)
+{
+	this_cpu_write(printk_func, vprintk_nmi);
+}
+
+void printk_nmi_exit(void)
+{
+	this_cpu_write(printk_func, vprintk_default);
+}
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index bfbf284e4218..71eba0607034 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -55,6 +55,7 @@
 
 #include "console_cmdline.h"
 #include "braille.h"
+#include "internal.h"
 
 int console_printk[4] = {
 	CONSOLE_LOGLEVEL_DEFAULT,	/* console_loglevel */
@@ -1807,14 +1808,6 @@ int vprintk_default(const char *fmt, va_list args)
 }
 EXPORT_SYMBOL_GPL(vprintk_default);
 
-/*
- * This allows printk to be diverted to another function per cpu.
- * This is useful for calling printk functions from within NMI
- * without worrying about race conditions that can lock up the
- * box.
- */
-DEFINE_PER_CPU(printk_func_t, printk_func) = vprintk_default;
-
 /**
  * printk - print a kernel message
  * @fmt: format string
@@ -1838,21 +1831,11 @@ DEFINE_PER_CPU(printk_func_t, printk_func) = vprintk_default;
  */
 asmlinkage __visible int printk(const char *fmt, ...)
 {
-	printk_func_t vprintk_func;
 	va_list args;
 	int r;
 
 	va_start(args, fmt);
-
-	/*
-	 * If a caller overrides the per_cpu printk_func, then it needs
-	 * to disable preemption when calling printk(). Otherwise
-	 * the printk_func should be set to the default. No need to
-	 * disable preemption here.
-	 */
-	vprintk_func = this_cpu_read(printk_func);
 	r = vprintk_func(fmt, args);
-
 	va_end(args);
 
 	return r;
diff --git a/lib/nmi_backtrace.c b/lib/nmi_backtrace.c
index 6019c53c669e..26caf51cc238 100644
--- a/lib/nmi_backtrace.c
+++ b/lib/nmi_backtrace.c
@@ -16,33 +16,14 @@
 #include <linux/delay.h>
 #include <linux/kprobes.h>
 #include <linux/nmi.h>
-#include <linux/seq_buf.h>
 
 #ifdef arch_trigger_all_cpu_backtrace
 /* For reliability, we're prepared to waste bits here. */
 static DECLARE_BITMAP(backtrace_mask, NR_CPUS) __read_mostly;
-static cpumask_t printtrace_mask;
-
-#define NMI_BUF_SIZE		4096
-
-struct nmi_seq_buf {
-	unsigned char		buffer[NMI_BUF_SIZE];
-	struct seq_buf		seq;
-};
-
-/* Safe printing in NMI context */
-static DEFINE_PER_CPU(struct nmi_seq_buf, nmi_print_seq);
 
 /* "in progress" flag of arch_trigger_all_cpu_backtrace */
 static unsigned long backtrace_flag;
 
-static void print_seq_line(struct nmi_seq_buf *s, int start, int end)
-{
-	const char *buf = s->buffer + start;
-
-	printk("%.*s", (end - start) + 1, buf);
-}
-
 /*
  * When raise() is called it will be is passed a pointer to the
  * backtrace_mask. Architectures that call nmi_cpu_backtrace()
@@ -52,8 +33,7 @@ static void print_seq_line(struct nmi_seq_buf *s, int start, int end)
 void nmi_trigger_all_cpu_backtrace(bool include_self,
 				   void (*raise)(cpumask_t *mask))
 {
-	struct nmi_seq_buf *s;
-	int i, cpu, this_cpu = get_cpu();
+	int i, this_cpu = get_cpu();
 
 	if (test_and_set_bit(0, &backtrace_flag)) {
 		/*
@@ -68,17 +48,6 @@ void nmi_trigger_all_cpu_backtrace(bool include_self,
 	if (!include_self)
 		cpumask_clear_cpu(this_cpu, to_cpumask(backtrace_mask));
 
-	cpumask_copy(&printtrace_mask, to_cpumask(backtrace_mask));
-
-	/*
-	 * Set up per_cpu seq_buf buffers that the NMIs running on the other
-	 * CPUs will write to.
-	 */
-	for_each_cpu(cpu, to_cpumask(backtrace_mask)) {
-		s = &per_cpu(nmi_print_seq, cpu);
-		seq_buf_init(&s->seq, s->buffer, NMI_BUF_SIZE);
-	}
-
 	if (!cpumask_empty(to_cpumask(backtrace_mask))) {
 		pr_info("Sending NMI to %s CPUs:\n",
 			(include_self ? "all" : "other"));
@@ -94,73 +63,25 @@ void nmi_trigger_all_cpu_backtrace(bool include_self,
 	}
 
 	/*
-	 * Now that all the NMIs have triggered, we can dump out their
-	 * back traces safely to the console.
+	 * Force flush any remote buffers that might be stuck in IRQ context
+	 * and therefore could not run their irq_work.
 	 */
-	for_each_cpu(cpu, &printtrace_mask) {
-		int len, last_i = 0;
+	printk_nmi_flush();
 
-		s = &per_cpu(nmi_print_seq, cpu);
-		len = seq_buf_used(&s->seq);
-		if (!len)
-			continue;
-
-		/* Print line by line. */
-		for (i = 0; i < len; i++) {
-			if (s->buffer[i] == '\n') {
-				print_seq_line(s, last_i, i);
-				last_i = i + 1;
-			}
-		}
-		/* Check if there was a partial line. */
-		if (last_i < len) {
-			print_seq_line(s, last_i, len - 1);
-			pr_cont("\n");
-		}
-	}
-
-	clear_bit(0, &backtrace_flag);
-	smp_mb__after_atomic();
+	clear_bit_unlock(0, &backtrace_flag);
 	put_cpu();
 }
 
-/*
- * It is not safe to call printk() directly from NMI handlers.
- * It may be fine if the NMI detected a lock up and we have no choice
- * but to do so, but doing a NMI on all other CPUs to get a back trace
- * can be done with a sysrq-l. We don't want that to lock up, which
- * can happen if the NMI interrupts a printk in progress.
- *
- * Instead, we redirect the vprintk() to this nmi_vprintk() that writes
- * the content into a per cpu seq_buf buffer. Then when the NMIs are
- * all done, we can safely dump the contents of the seq_buf to a printk()
- * from a non NMI context.
- */
-static int nmi_vprintk(const char *fmt, va_list args)
-{
-	struct nmi_seq_buf *s = this_cpu_ptr(&nmi_print_seq);
-	unsigned int len = seq_buf_used(&s->seq);
-
-	seq_buf_vprintf(&s->seq, fmt, args);
-	return seq_buf_used(&s->seq) - len;
-}
-
 bool nmi_cpu_backtrace(struct pt_regs *regs)
 {
 	int cpu = smp_processor_id();
 
 	if (cpumask_test_cpu(cpu, to_cpumask(backtrace_mask))) {
-		printk_func_t printk_func_save = this_cpu_read(printk_func);
-
-		/* Replace printk to write into the NMI seq */
-		this_cpu_write(printk_func, nmi_vprintk);
 		pr_warn("NMI backtrace for cpu %d\n", cpu);
 		if (regs)
 			show_regs(regs);
 		else
 			dump_stack();
-		this_cpu_write(printk_func, printk_func_save);
-
 		cpumask_clear_cpu(cpu, to_cpumask(backtrace_mask));
 		return true;
 	}
-- 
1.8.5.6

^ permalink raw reply	[flat|nested] 97+ messages in thread

* [PATCH v5 2/4] printk/nmi: warn when some message has been lost in NMI context
  2016-04-21 11:48 ` Petr Mladek
  (?)
@ 2016-04-21 11:48   ` Petr Mladek
  -1 siblings, 0 replies; 97+ messages in thread
From: Petr Mladek @ 2016-04-21 11:48 UTC (permalink / raw)
  To: linux-arm-kernel

We could not resize the temporary buffer in NMI context.  Let's warn if a
message is lost.

This is rather theoretical.  printk() should not be used in NMI.  The only
sensible use is when we want to print backtrace from all CPUs.  The
current buffer should be enough for this purpose.

[akpm@linux-foundation.org: whitespace fixlet]
Signed-off-by: Petr Mladek <pmladek@suse.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: Daniel Thompson <daniel.thompson@linaro.org>
Cc: Jiri Kosina <jkosina@suse.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: David Miller <davem@davemloft.net>
Cc: Daniel Thompson <daniel.thompson@linaro.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
 kernel/printk/internal.h | 11 +++++++++++
 kernel/printk/nmi.c      |  5 ++++-
 kernel/printk/printk.c   | 10 ++++++++++
 3 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/kernel/printk/internal.h b/kernel/printk/internal.h
index 2de99faedfc1..341bedccc065 100644
--- a/kernel/printk/internal.h
+++ b/kernel/printk/internal.h
@@ -34,6 +34,12 @@ static inline __printf(1, 0) int vprintk_func(const char *fmt, va_list args)
 	return this_cpu_read(printk_func)(fmt, args);
 }
 
+extern atomic_t nmi_message_lost;
+static inline int get_nmi_message_lost(void)
+{
+	return atomic_xchg(&nmi_message_lost, 0);
+}
+
 #else /* CONFIG_PRINTK_NMI */
 
 static inline __printf(1, 0) int vprintk_func(const char *fmt, va_list args)
@@ -41,4 +47,9 @@ static inline __printf(1, 0) int vprintk_func(const char *fmt, va_list args)
 	return vprintk_default(fmt, args);
 }
 
+static inline int get_nmi_message_lost(void)
+{
+	return 0;
+}
+
 #endif /* CONFIG_PRINTK_NMI */
diff --git a/kernel/printk/nmi.c b/kernel/printk/nmi.c
index 303cf0d15e57..572f94922230 100644
--- a/kernel/printk/nmi.c
+++ b/kernel/printk/nmi.c
@@ -39,6 +39,7 @@
  */
 DEFINE_PER_CPU(printk_func_t, printk_func) = vprintk_default;
 static int printk_nmi_irq_ready;
+atomic_t nmi_message_lost;
 
 #define NMI_LOG_BUF_LEN (4096 - sizeof(atomic_t) - sizeof(struct irq_work))
 
@@ -64,8 +65,10 @@ static int vprintk_nmi(const char *fmt, va_list args)
 again:
 	len = atomic_read(&s->len);
 
-	if (len >= sizeof(s->buffer))
+	if (len >= sizeof(s->buffer)) {
+		atomic_inc(&nmi_message_lost);
 		return 0;
+	}
 
 	/*
 	 * Make sure that all old data have been read before the buffer was
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 71eba0607034..e38579d730f4 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -1617,6 +1617,7 @@ asmlinkage int vprintk_emit(int facility, int level,
 	unsigned long flags;
 	int this_cpu;
 	int printed_len = 0;
+	int nmi_message_lost;
 	bool in_sched = false;
 	/* cpu currently holding logbuf_lock in this function */
 	static unsigned int logbuf_cpu = UINT_MAX;
@@ -1667,6 +1668,15 @@ asmlinkage int vprintk_emit(int facility, int level,
 					 strlen(recursion_msg));
 	}
 
+	nmi_message_lost = get_nmi_message_lost();
+	if (unlikely(nmi_message_lost)) {
+		text_len = scnprintf(textbuf, sizeof(textbuf),
+				     "BAD LUCK: lost %d message(s) from NMI context!",
+				     nmi_message_lost);
+		printed_len += log_store(0, 2, LOG_PREFIX|LOG_NEWLINE, 0,
+					 NULL, 0, textbuf, text_len);
+	}
+
 	/*
 	 * The printf needs to come first; we need the syslog
 	 * prefix which might be passed-in as a parameter.
-- 
1.8.5.6


^ permalink raw reply	[flat|nested] 97+ messages in thread

* [PATCH v5 2/4] printk/nmi: warn when some message has been lost in NMI context
@ 2016-04-21 11:48   ` Petr Mladek
  0 siblings, 0 replies; 97+ messages in thread
From: Petr Mladek @ 2016-04-21 11:48 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Peter Zijlstra, Steven Rostedt, Russell King, Daniel Thompson,
	Jiri Kosina, Ingo Molnar, Thomas Gleixner, Sergey Senozhatsky,
	Chris Metcalf, linux-kernel, x86, linux-arm-kernel,
	adi-buildroot-devel, linux-cris-kernel, linux-mips, linuxppc-dev,
	linux-s390, linux-sh, sparclinux, Petr Mladek, Jan Kara,
	Ralf Baechle, Benjamin Herrenschmidt, Martin Schwidefsky,
	David Miller

We could not resize the temporary buffer in NMI context.  Let's warn if a
message is lost.

This is rather theoretical.  printk() should not be used in NMI.  The only
sensible use is when we want to print backtrace from all CPUs.  The
current buffer should be enough for this purpose.

[akpm@linux-foundation.org: whitespace fixlet]
Signed-off-by: Petr Mladek <pmladek@suse.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: Daniel Thompson <daniel.thompson@linaro.org>
Cc: Jiri Kosina <jkosina@suse.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: David Miller <davem@davemloft.net>
Cc: Daniel Thompson <daniel.thompson@linaro.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
 kernel/printk/internal.h | 11 +++++++++++
 kernel/printk/nmi.c      |  5 ++++-
 kernel/printk/printk.c   | 10 ++++++++++
 3 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/kernel/printk/internal.h b/kernel/printk/internal.h
index 2de99faedfc1..341bedccc065 100644
--- a/kernel/printk/internal.h
+++ b/kernel/printk/internal.h
@@ -34,6 +34,12 @@ static inline __printf(1, 0) int vprintk_func(const char *fmt, va_list args)
 	return this_cpu_read(printk_func)(fmt, args);
 }
 
+extern atomic_t nmi_message_lost;
+static inline int get_nmi_message_lost(void)
+{
+	return atomic_xchg(&nmi_message_lost, 0);
+}
+
 #else /* CONFIG_PRINTK_NMI */
 
 static inline __printf(1, 0) int vprintk_func(const char *fmt, va_list args)
@@ -41,4 +47,9 @@ static inline __printf(1, 0) int vprintk_func(const char *fmt, va_list args)
 	return vprintk_default(fmt, args);
 }
 
+static inline int get_nmi_message_lost(void)
+{
+	return 0;
+}
+
 #endif /* CONFIG_PRINTK_NMI */
diff --git a/kernel/printk/nmi.c b/kernel/printk/nmi.c
index 303cf0d15e57..572f94922230 100644
--- a/kernel/printk/nmi.c
+++ b/kernel/printk/nmi.c
@@ -39,6 +39,7 @@
  */
 DEFINE_PER_CPU(printk_func_t, printk_func) = vprintk_default;
 static int printk_nmi_irq_ready;
+atomic_t nmi_message_lost;
 
 #define NMI_LOG_BUF_LEN (4096 - sizeof(atomic_t) - sizeof(struct irq_work))
 
@@ -64,8 +65,10 @@ static int vprintk_nmi(const char *fmt, va_list args)
 again:
 	len = atomic_read(&s->len);
 
-	if (len >= sizeof(s->buffer))
+	if (len >= sizeof(s->buffer)) {
+		atomic_inc(&nmi_message_lost);
 		return 0;
+	}
 
 	/*
 	 * Make sure that all old data have been read before the buffer was
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 71eba0607034..e38579d730f4 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -1617,6 +1617,7 @@ asmlinkage int vprintk_emit(int facility, int level,
 	unsigned long flags;
 	int this_cpu;
 	int printed_len = 0;
+	int nmi_message_lost;
 	bool in_sched = false;
 	/* cpu currently holding logbuf_lock in this function */
 	static unsigned int logbuf_cpu = UINT_MAX;
@@ -1667,6 +1668,15 @@ asmlinkage int vprintk_emit(int facility, int level,
 					 strlen(recursion_msg));
 	}
 
+	nmi_message_lost = get_nmi_message_lost();
+	if (unlikely(nmi_message_lost)) {
+		text_len = scnprintf(textbuf, sizeof(textbuf),
+				     "BAD LUCK: lost %d message(s) from NMI context!",
+				     nmi_message_lost);
+		printed_len += log_store(0, 2, LOG_PREFIX|LOG_NEWLINE, 0,
+					 NULL, 0, textbuf, text_len);
+	}
+
 	/*
 	 * The printf needs to come first; we need the syslog
 	 * prefix which might be passed-in as a parameter.
-- 
1.8.5.6

^ permalink raw reply	[flat|nested] 97+ messages in thread

* [PATCH v5 2/4] printk/nmi: warn when some message has been lost in NMI context
@ 2016-04-21 11:48   ` Petr Mladek
  0 siblings, 0 replies; 97+ messages in thread
From: Petr Mladek @ 2016-04-21 11:48 UTC (permalink / raw)
  To: linux-arm-kernel

We could not resize the temporary buffer in NMI context.  Let's warn if a
message is lost.

This is rather theoretical.  printk() should not be used in NMI.  The only
sensible use is when we want to print backtrace from all CPUs.  The
current buffer should be enough for this purpose.

[akpm at linux-foundation.org: whitespace fixlet]
Signed-off-by: Petr Mladek <pmladek@suse.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: Daniel Thompson <daniel.thompson@linaro.org>
Cc: Jiri Kosina <jkosina@suse.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: David Miller <davem@davemloft.net>
Cc: Daniel Thompson <daniel.thompson@linaro.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
 kernel/printk/internal.h | 11 +++++++++++
 kernel/printk/nmi.c      |  5 ++++-
 kernel/printk/printk.c   | 10 ++++++++++
 3 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/kernel/printk/internal.h b/kernel/printk/internal.h
index 2de99faedfc1..341bedccc065 100644
--- a/kernel/printk/internal.h
+++ b/kernel/printk/internal.h
@@ -34,6 +34,12 @@ static inline __printf(1, 0) int vprintk_func(const char *fmt, va_list args)
 	return this_cpu_read(printk_func)(fmt, args);
 }
 
+extern atomic_t nmi_message_lost;
+static inline int get_nmi_message_lost(void)
+{
+	return atomic_xchg(&nmi_message_lost, 0);
+}
+
 #else /* CONFIG_PRINTK_NMI */
 
 static inline __printf(1, 0) int vprintk_func(const char *fmt, va_list args)
@@ -41,4 +47,9 @@ static inline __printf(1, 0) int vprintk_func(const char *fmt, va_list args)
 	return vprintk_default(fmt, args);
 }
 
+static inline int get_nmi_message_lost(void)
+{
+	return 0;
+}
+
 #endif /* CONFIG_PRINTK_NMI */
diff --git a/kernel/printk/nmi.c b/kernel/printk/nmi.c
index 303cf0d15e57..572f94922230 100644
--- a/kernel/printk/nmi.c
+++ b/kernel/printk/nmi.c
@@ -39,6 +39,7 @@
  */
 DEFINE_PER_CPU(printk_func_t, printk_func) = vprintk_default;
 static int printk_nmi_irq_ready;
+atomic_t nmi_message_lost;
 
 #define NMI_LOG_BUF_LEN (4096 - sizeof(atomic_t) - sizeof(struct irq_work))
 
@@ -64,8 +65,10 @@ static int vprintk_nmi(const char *fmt, va_list args)
 again:
 	len = atomic_read(&s->len);
 
-	if (len >= sizeof(s->buffer))
+	if (len >= sizeof(s->buffer)) {
+		atomic_inc(&nmi_message_lost);
 		return 0;
+	}
 
 	/*
 	 * Make sure that all old data have been read before the buffer was
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 71eba0607034..e38579d730f4 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -1617,6 +1617,7 @@ asmlinkage int vprintk_emit(int facility, int level,
 	unsigned long flags;
 	int this_cpu;
 	int printed_len = 0;
+	int nmi_message_lost;
 	bool in_sched = false;
 	/* cpu currently holding logbuf_lock in this function */
 	static unsigned int logbuf_cpu = UINT_MAX;
@@ -1667,6 +1668,15 @@ asmlinkage int vprintk_emit(int facility, int level,
 					 strlen(recursion_msg));
 	}
 
+	nmi_message_lost = get_nmi_message_lost();
+	if (unlikely(nmi_message_lost)) {
+		text_len = scnprintf(textbuf, sizeof(textbuf),
+				     "BAD LUCK: lost %d message(s) from NMI context!",
+				     nmi_message_lost);
+		printed_len += log_store(0, 2, LOG_PREFIX|LOG_NEWLINE, 0,
+					 NULL, 0, textbuf, text_len);
+	}
+
 	/*
 	 * The printf needs to come first; we need the syslog
 	 * prefix which might be passed-in as a parameter.
-- 
1.8.5.6

^ permalink raw reply	[flat|nested] 97+ messages in thread

* [PATCH v5 3/4] printk/nmi: increase the size of NMI buffer and make it configurable
  2016-04-21 11:48 ` Petr Mladek
  (?)
@ 2016-04-21 11:48   ` Petr Mladek
  -1 siblings, 0 replies; 97+ messages in thread
From: Petr Mladek @ 2016-04-21 11:48 UTC (permalink / raw)
  To: linux-arm-kernel

Testing has shown that the backtrace sometimes does not fit into the 4kB
temporary buffer that is used in NMI context.  The warnings are gone when
I double the temporary buffer size.

This patch doubles the buffer size and makes it configurable.

Note that this problem existed even in the x86-specific implementation
that was added by the commit a9edc8809328 ("x86/nmi: Perform a safe NMI
stack trace on all CPUs").  Nobody noticed it because it did not print any
warnings.

Signed-off-by: Petr Mladek <pmladek@suse.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: Daniel Thompson <daniel.thompson@linaro.org>
Cc: Jiri Kosina <jkosina@suse.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: David Miller <davem@davemloft.net>
Cc: Daniel Thompson <daniel.thompson@linaro.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
 init/Kconfig        | 22 ++++++++++++++++++++++
 kernel/printk/nmi.c |  3 ++-
 2 files changed, 24 insertions(+), 1 deletion(-)

diff --git a/init/Kconfig b/init/Kconfig
index 85c7a2bf1ea4..9dfd27394d43 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -862,6 +862,28 @@ config LOG_CPU_MAX_BUF_SHIFT
 		     13 =>   8 KB for each CPU
 		     12 =>   4 KB for each CPU
 
+config NMI_LOG_BUF_SHIFT
+	int "Temporary per-CPU NMI log buffer size (12 => 4KB, 13 => 8KB)"
+	range 10 21
+	default 13
+	depends on PRINTK_NMI
+	help
+	  Select the size of a per-CPU buffer where NMI messages are temporary
+	  stored. They are copied to the main log buffer in a safe context
+	  to avoid a deadlock. The value defines the size as a power of 2.
+
+	  NMI messages are rare and limited. The largest one is when
+	  a backtrace is printed. It usually fits into 4KB. Select
+	  8KB if you want to be on the safe side.
+
+	  Examples:
+		     17 => 128 KB for each CPU
+		     16 =>  64 KB for each CPU
+		     15 =>  32 KB for each CPU
+		     14 =>  16 KB for each CPU
+		     13 =>   8 KB for each CPU
+		     12 =>   4 KB for each CPU
+
 #
 # Architectures with an unreliable sched_clock() should select this:
 #
diff --git a/kernel/printk/nmi.c b/kernel/printk/nmi.c
index 572f94922230..bf08557d7e3d 100644
--- a/kernel/printk/nmi.c
+++ b/kernel/printk/nmi.c
@@ -41,7 +41,8 @@ DEFINE_PER_CPU(printk_func_t, printk_func) = vprintk_default;
 static int printk_nmi_irq_ready;
 atomic_t nmi_message_lost;
 
-#define NMI_LOG_BUF_LEN (4096 - sizeof(atomic_t) - sizeof(struct irq_work))
+#define NMI_LOG_BUF_LEN ((1 << CONFIG_NMI_LOG_BUF_SHIFT) -		\
+			 sizeof(atomic_t) - sizeof(struct irq_work))
 
 struct nmi_seq_buf {
 	atomic_t		len;	/* length of written data */
-- 
1.8.5.6


^ permalink raw reply	[flat|nested] 97+ messages in thread

* [PATCH v5 3/4] printk/nmi: increase the size of NMI buffer and make it configurable
@ 2016-04-21 11:48   ` Petr Mladek
  0 siblings, 0 replies; 97+ messages in thread
From: Petr Mladek @ 2016-04-21 11:48 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Peter Zijlstra, Steven Rostedt, Russell King, Daniel Thompson,
	Jiri Kosina, Ingo Molnar, Thomas Gleixner, Sergey Senozhatsky,
	Chris Metcalf, linux-kernel, x86, linux-arm-kernel,
	adi-buildroot-devel, linux-cris-kernel, linux-mips, linuxppc-dev,
	linux-s390, linux-sh, sparclinux, Petr Mladek, Jan Kara,
	Ralf Baechle, Benjamin Herrenschmidt, Martin Schwidefsky,
	David Miller

Testing has shown that the backtrace sometimes does not fit into the 4kB
temporary buffer that is used in NMI context.  The warnings are gone when
I double the temporary buffer size.

This patch doubles the buffer size and makes it configurable.

Note that this problem existed even in the x86-specific implementation
that was added by the commit a9edc8809328 ("x86/nmi: Perform a safe NMI
stack trace on all CPUs").  Nobody noticed it because it did not print any
warnings.

Signed-off-by: Petr Mladek <pmladek@suse.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: Daniel Thompson <daniel.thompson@linaro.org>
Cc: Jiri Kosina <jkosina@suse.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: David Miller <davem@davemloft.net>
Cc: Daniel Thompson <daniel.thompson@linaro.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
 init/Kconfig        | 22 ++++++++++++++++++++++
 kernel/printk/nmi.c |  3 ++-
 2 files changed, 24 insertions(+), 1 deletion(-)

diff --git a/init/Kconfig b/init/Kconfig
index 85c7a2bf1ea4..9dfd27394d43 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -862,6 +862,28 @@ config LOG_CPU_MAX_BUF_SHIFT
 		     13 =>   8 KB for each CPU
 		     12 =>   4 KB for each CPU
 
+config NMI_LOG_BUF_SHIFT
+	int "Temporary per-CPU NMI log buffer size (12 => 4KB, 13 => 8KB)"
+	range 10 21
+	default 13
+	depends on PRINTK_NMI
+	help
+	  Select the size of a per-CPU buffer where NMI messages are temporary
+	  stored. They are copied to the main log buffer in a safe context
+	  to avoid a deadlock. The value defines the size as a power of 2.
+
+	  NMI messages are rare and limited. The largest one is when
+	  a backtrace is printed. It usually fits into 4KB. Select
+	  8KB if you want to be on the safe side.
+
+	  Examples:
+		     17 => 128 KB for each CPU
+		     16 =>  64 KB for each CPU
+		     15 =>  32 KB for each CPU
+		     14 =>  16 KB for each CPU
+		     13 =>   8 KB for each CPU
+		     12 =>   4 KB for each CPU
+
 #
 # Architectures with an unreliable sched_clock() should select this:
 #
diff --git a/kernel/printk/nmi.c b/kernel/printk/nmi.c
index 572f94922230..bf08557d7e3d 100644
--- a/kernel/printk/nmi.c
+++ b/kernel/printk/nmi.c
@@ -41,7 +41,8 @@ DEFINE_PER_CPU(printk_func_t, printk_func) = vprintk_default;
 static int printk_nmi_irq_ready;
 atomic_t nmi_message_lost;
 
-#define NMI_LOG_BUF_LEN (4096 - sizeof(atomic_t) - sizeof(struct irq_work))
+#define NMI_LOG_BUF_LEN ((1 << CONFIG_NMI_LOG_BUF_SHIFT) -		\
+			 sizeof(atomic_t) - sizeof(struct irq_work))
 
 struct nmi_seq_buf {
 	atomic_t		len;	/* length of written data */
-- 
1.8.5.6

^ permalink raw reply	[flat|nested] 97+ messages in thread

* [PATCH v5 3/4] printk/nmi: increase the size of NMI buffer and make it configurable
@ 2016-04-21 11:48   ` Petr Mladek
  0 siblings, 0 replies; 97+ messages in thread
From: Petr Mladek @ 2016-04-21 11:48 UTC (permalink / raw)
  To: linux-arm-kernel

Testing has shown that the backtrace sometimes does not fit into the 4kB
temporary buffer that is used in NMI context.  The warnings are gone when
I double the temporary buffer size.

This patch doubles the buffer size and makes it configurable.

Note that this problem existed even in the x86-specific implementation
that was added by the commit a9edc8809328 ("x86/nmi: Perform a safe NMI
stack trace on all CPUs").  Nobody noticed it because it did not print any
warnings.

Signed-off-by: Petr Mladek <pmladek@suse.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: Daniel Thompson <daniel.thompson@linaro.org>
Cc: Jiri Kosina <jkosina@suse.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: David Miller <davem@davemloft.net>
Cc: Daniel Thompson <daniel.thompson@linaro.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
 init/Kconfig        | 22 ++++++++++++++++++++++
 kernel/printk/nmi.c |  3 ++-
 2 files changed, 24 insertions(+), 1 deletion(-)

diff --git a/init/Kconfig b/init/Kconfig
index 85c7a2bf1ea4..9dfd27394d43 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -862,6 +862,28 @@ config LOG_CPU_MAX_BUF_SHIFT
 		     13 =>   8 KB for each CPU
 		     12 =>   4 KB for each CPU
 
+config NMI_LOG_BUF_SHIFT
+	int "Temporary per-CPU NMI log buffer size (12 => 4KB, 13 => 8KB)"
+	range 10 21
+	default 13
+	depends on PRINTK_NMI
+	help
+	  Select the size of a per-CPU buffer where NMI messages are temporary
+	  stored. They are copied to the main log buffer in a safe context
+	  to avoid a deadlock. The value defines the size as a power of 2.
+
+	  NMI messages are rare and limited. The largest one is when
+	  a backtrace is printed. It usually fits into 4KB. Select
+	  8KB if you want to be on the safe side.
+
+	  Examples:
+		     17 => 128 KB for each CPU
+		     16 =>  64 KB for each CPU
+		     15 =>  32 KB for each CPU
+		     14 =>  16 KB for each CPU
+		     13 =>   8 KB for each CPU
+		     12 =>   4 KB for each CPU
+
 #
 # Architectures with an unreliable sched_clock() should select this:
 #
diff --git a/kernel/printk/nmi.c b/kernel/printk/nmi.c
index 572f94922230..bf08557d7e3d 100644
--- a/kernel/printk/nmi.c
+++ b/kernel/printk/nmi.c
@@ -41,7 +41,8 @@ DEFINE_PER_CPU(printk_func_t, printk_func) = vprintk_default;
 static int printk_nmi_irq_ready;
 atomic_t nmi_message_lost;
 
-#define NMI_LOG_BUF_LEN (4096 - sizeof(atomic_t) - sizeof(struct irq_work))
+#define NMI_LOG_BUF_LEN ((1 << CONFIG_NMI_LOG_BUF_SHIFT) -		\
+			 sizeof(atomic_t) - sizeof(struct irq_work))
 
 struct nmi_seq_buf {
 	atomic_t		len;	/* length of written data */
-- 
1.8.5.6

^ permalink raw reply	[flat|nested] 97+ messages in thread

* [PATCH v5 4/4] printk/nmi: flush NMI messages on the system panic
  2016-04-21 11:48 ` Petr Mladek
  (?)
@ 2016-04-21 11:48   ` Petr Mladek
  -1 siblings, 0 replies; 97+ messages in thread
From: Petr Mladek @ 2016-04-21 11:48 UTC (permalink / raw)
  To: linux-arm-kernel

In NMI context, printk() messages are stored into per-CPU buffers to avoid
a possible deadlock.  They are normally flushed to the main ring buffer via
an IRQ work.  But the work is never called when the system calls panic() in
the very same NMI handler.

This patch tries to flush NMI buffers before the crash dump is generated.
In this case it does not risk a double release and bails out when the
logbuf_lock is already taken.  The aim is to get the messages into the main
ring buffer when possible.  It makes them better accessible in the vmcore.

Then the patch tries to flush the buffers second time when other CPUs are
down.  It might be more aggressive and reset logbuf_lock. The aim is to
get the messages available for the consequent kmsg_dump() and
console_flush_on_panic() calls.

The patch causes vprintk_emit() to be called even in NMI context again. But
it is done via printk_deferred() so that the console handling is skipped.
Consoles use internal locks and we could not prevent a deadlock easily.
They are explicitly called later when the crash dump is not generated, see
console_flush_on_panic().

Signed-off-by: Petr Mladek <pmladek@suse.com>
---
 include/linux/printk.h   |  2 ++
 kernel/kexec_core.c      |  1 +
 kernel/panic.c           |  6 +++++-
 kernel/printk/internal.h |  2 ++
 kernel/printk/nmi.c      | 39 ++++++++++++++++++++++++++++++++++++++-
 kernel/printk/printk.c   |  2 +-
 6 files changed, 49 insertions(+), 3 deletions(-)

diff --git a/include/linux/printk.h b/include/linux/printk.h
index 51dd6b824fe2..f4da695fd615 100644
--- a/include/linux/printk.h
+++ b/include/linux/printk.h
@@ -127,11 +127,13 @@ extern void printk_nmi_init(void);
 extern void printk_nmi_enter(void);
 extern void printk_nmi_exit(void);
 extern void printk_nmi_flush(void);
+extern void printk_nmi_flush_on_panic(void);
 #else
 static inline void printk_nmi_init(void) { }
 static inline void printk_nmi_enter(void) { }
 static inline void printk_nmi_exit(void) { }
 static inline void printk_nmi_flush(void) { }
+static inline void printk_nmi_flush_on_panic(void) { }
 #endif /* PRINTK_NMI */
 
 #ifdef CONFIG_PRINTK
diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
index 8d34308ea449..1dc3fe8495e0 100644
--- a/kernel/kexec_core.c
+++ b/kernel/kexec_core.c
@@ -893,6 +893,7 @@ void crash_kexec(struct pt_regs *regs)
 	old_cpu = atomic_cmpxchg(&panic_cpu, PANIC_CPU_INVALID, this_cpu);
 	if (old_cpu = PANIC_CPU_INVALID) {
 		/* This is the 1st CPU which comes here, so go ahead. */
+		printk_nmi_flush_on_panic();
 		__crash_kexec(regs);
 
 		/*
diff --git a/kernel/panic.c b/kernel/panic.c
index 535c96510a44..8aa74497cc5a 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -160,8 +160,10 @@ void panic(const char *fmt, ...)
 	 *
 	 * Bypass the panic_cpu check and call __crash_kexec directly.
 	 */
-	if (!crash_kexec_post_notifiers)
+	if (!crash_kexec_post_notifiers) {
+		printk_nmi_flush_on_panic();
 		__crash_kexec(NULL);
+	}
 
 	/*
 	 * Note smp_send_stop is the usual smp shutdown function, which
@@ -176,6 +178,8 @@ void panic(const char *fmt, ...)
 	 */
 	atomic_notifier_call_chain(&panic_notifier_list, 0, buf);
 
+	/* Call flush even twice. It tries harder with a single online CPU */
+	printk_nmi_flush_on_panic();
 	kmsg_dump(KMSG_DUMP_PANIC);
 
 	/*
diff --git a/kernel/printk/internal.h b/kernel/printk/internal.h
index 341bedccc065..7fd2838fa417 100644
--- a/kernel/printk/internal.h
+++ b/kernel/printk/internal.h
@@ -22,6 +22,8 @@ int __printf(1, 0) vprintk_default(const char *fmt, va_list args);
 
 #ifdef CONFIG_PRINTK_NMI
 
+extern raw_spinlock_t logbuf_lock;
+
 /*
  * printk() could not take logbuf_lock in NMI context. Instead,
  * it temporary stores the strings into a per-CPU buffer.
diff --git a/kernel/printk/nmi.c b/kernel/printk/nmi.c
index bf08557d7e3d..b69eb8a2876f 100644
--- a/kernel/printk/nmi.c
+++ b/kernel/printk/nmi.c
@@ -17,6 +17,7 @@
 
 #include <linux/preempt.h>
 #include <linux/spinlock.h>
+#include <linux/debug_locks.h>
 #include <linux/smp.h>
 #include <linux/cpumask.h>
 #include <linux/irq_work.h>
@@ -106,7 +107,16 @@ static void print_nmi_seq_line(struct nmi_seq_buf *s, int start, int end)
 {
 	const char *buf = s->buffer + start;
 
-	printk("%.*s", (end - start) + 1, buf);
+	/*
+	 * The buffers are flushed in NMI only on panic.  The messages must
+	 * go only into the ring buffer at this stage.  Consoles will get
+	 * explicitly called later when a crashdump is not generated.
+	 */
+	if (in_nmi())
+		printk_deferred("%.*s", (end - start) + 1, buf);
+	else
+		printk("%.*s", (end - start) + 1, buf);
+
 }
 
 /*
@@ -194,6 +204,33 @@ void printk_nmi_flush(void)
 		__printk_nmi_flush(&per_cpu(nmi_print_seq, cpu).work);
 }
 
+/**
+ * printk_nmi_flush_on_panic - flush all per-cpu nmi buffers when the system
+ *	goes down.
+ *
+ * Similar to printk_nmi_flush() but it can be called even in NMI context when
+ * the system goes down. It does the best effort to get NMI messages into
+ * the main ring buffer.
+ *
+ * Note that it could try harder when there is only one CPU online.
+ */
+void printk_nmi_flush_on_panic(void)
+{
+	/*
+	 * Make sure that we could access the main ring buffer.
+	 * Do not risk a double release when more CPUs are up.
+	 */
+	if (in_nmi() && raw_spin_is_locked(&logbuf_lock)) {
+		if (num_online_cpus() > 1)
+			return;
+
+		debug_locks_off();
+		raw_spin_lock_init(&logbuf_lock);
+	}
+
+	printk_nmi_flush();
+}
+
 void __init printk_nmi_init(void)
 {
 	int cpu;
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index e38579d730f4..60cdf6386763 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -245,7 +245,7 @@ __packed __aligned(4)
  * within the scheduler's rq lock. It must be released before calling
  * console_unlock() or anything else that might wake up a process.
  */
-static DEFINE_RAW_SPINLOCK(logbuf_lock);
+DEFINE_RAW_SPINLOCK(logbuf_lock);
 
 #ifdef CONFIG_PRINTK
 DECLARE_WAIT_QUEUE_HEAD(log_wait);
-- 
1.8.5.6


^ permalink raw reply	[flat|nested] 97+ messages in thread

* [PATCH v5 4/4] printk/nmi: flush NMI messages on the system panic
@ 2016-04-21 11:48   ` Petr Mladek
  0 siblings, 0 replies; 97+ messages in thread
From: Petr Mladek @ 2016-04-21 11:48 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Peter Zijlstra, Steven Rostedt, Russell King, Daniel Thompson,
	Jiri Kosina, Ingo Molnar, Thomas Gleixner, Sergey Senozhatsky,
	Chris Metcalf, linux-kernel, x86, linux-arm-kernel,
	adi-buildroot-devel, linux-cris-kernel, linux-mips, linuxppc-dev,
	linux-s390, linux-sh, sparclinux, Petr Mladek

In NMI context, printk() messages are stored into per-CPU buffers to avoid
a possible deadlock.  They are normally flushed to the main ring buffer via
an IRQ work.  But the work is never called when the system calls panic() in
the very same NMI handler.

This patch tries to flush NMI buffers before the crash dump is generated.
In this case it does not risk a double release and bails out when the
logbuf_lock is already taken.  The aim is to get the messages into the main
ring buffer when possible.  It makes them better accessible in the vmcore.

Then the patch tries to flush the buffers second time when other CPUs are
down.  It might be more aggressive and reset logbuf_lock. The aim is to
get the messages available for the consequent kmsg_dump() and
console_flush_on_panic() calls.

The patch causes vprintk_emit() to be called even in NMI context again. But
it is done via printk_deferred() so that the console handling is skipped.
Consoles use internal locks and we could not prevent a deadlock easily.
They are explicitly called later when the crash dump is not generated, see
console_flush_on_panic().

Signed-off-by: Petr Mladek <pmladek@suse.com>
---
 include/linux/printk.h   |  2 ++
 kernel/kexec_core.c      |  1 +
 kernel/panic.c           |  6 +++++-
 kernel/printk/internal.h |  2 ++
 kernel/printk/nmi.c      | 39 ++++++++++++++++++++++++++++++++++++++-
 kernel/printk/printk.c   |  2 +-
 6 files changed, 49 insertions(+), 3 deletions(-)

diff --git a/include/linux/printk.h b/include/linux/printk.h
index 51dd6b824fe2..f4da695fd615 100644
--- a/include/linux/printk.h
+++ b/include/linux/printk.h
@@ -127,11 +127,13 @@ extern void printk_nmi_init(void);
 extern void printk_nmi_enter(void);
 extern void printk_nmi_exit(void);
 extern void printk_nmi_flush(void);
+extern void printk_nmi_flush_on_panic(void);
 #else
 static inline void printk_nmi_init(void) { }
 static inline void printk_nmi_enter(void) { }
 static inline void printk_nmi_exit(void) { }
 static inline void printk_nmi_flush(void) { }
+static inline void printk_nmi_flush_on_panic(void) { }
 #endif /* PRINTK_NMI */
 
 #ifdef CONFIG_PRINTK
diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
index 8d34308ea449..1dc3fe8495e0 100644
--- a/kernel/kexec_core.c
+++ b/kernel/kexec_core.c
@@ -893,6 +893,7 @@ void crash_kexec(struct pt_regs *regs)
 	old_cpu = atomic_cmpxchg(&panic_cpu, PANIC_CPU_INVALID, this_cpu);
 	if (old_cpu == PANIC_CPU_INVALID) {
 		/* This is the 1st CPU which comes here, so go ahead. */
+		printk_nmi_flush_on_panic();
 		__crash_kexec(regs);
 
 		/*
diff --git a/kernel/panic.c b/kernel/panic.c
index 535c96510a44..8aa74497cc5a 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -160,8 +160,10 @@ void panic(const char *fmt, ...)
 	 *
 	 * Bypass the panic_cpu check and call __crash_kexec directly.
 	 */
-	if (!crash_kexec_post_notifiers)
+	if (!crash_kexec_post_notifiers) {
+		printk_nmi_flush_on_panic();
 		__crash_kexec(NULL);
+	}
 
 	/*
 	 * Note smp_send_stop is the usual smp shutdown function, which
@@ -176,6 +178,8 @@ void panic(const char *fmt, ...)
 	 */
 	atomic_notifier_call_chain(&panic_notifier_list, 0, buf);
 
+	/* Call flush even twice. It tries harder with a single online CPU */
+	printk_nmi_flush_on_panic();
 	kmsg_dump(KMSG_DUMP_PANIC);
 
 	/*
diff --git a/kernel/printk/internal.h b/kernel/printk/internal.h
index 341bedccc065..7fd2838fa417 100644
--- a/kernel/printk/internal.h
+++ b/kernel/printk/internal.h
@@ -22,6 +22,8 @@ int __printf(1, 0) vprintk_default(const char *fmt, va_list args);
 
 #ifdef CONFIG_PRINTK_NMI
 
+extern raw_spinlock_t logbuf_lock;
+
 /*
  * printk() could not take logbuf_lock in NMI context. Instead,
  * it temporary stores the strings into a per-CPU buffer.
diff --git a/kernel/printk/nmi.c b/kernel/printk/nmi.c
index bf08557d7e3d..b69eb8a2876f 100644
--- a/kernel/printk/nmi.c
+++ b/kernel/printk/nmi.c
@@ -17,6 +17,7 @@
 
 #include <linux/preempt.h>
 #include <linux/spinlock.h>
+#include <linux/debug_locks.h>
 #include <linux/smp.h>
 #include <linux/cpumask.h>
 #include <linux/irq_work.h>
@@ -106,7 +107,16 @@ static void print_nmi_seq_line(struct nmi_seq_buf *s, int start, int end)
 {
 	const char *buf = s->buffer + start;
 
-	printk("%.*s", (end - start) + 1, buf);
+	/*
+	 * The buffers are flushed in NMI only on panic.  The messages must
+	 * go only into the ring buffer at this stage.  Consoles will get
+	 * explicitly called later when a crashdump is not generated.
+	 */
+	if (in_nmi())
+		printk_deferred("%.*s", (end - start) + 1, buf);
+	else
+		printk("%.*s", (end - start) + 1, buf);
+
 }
 
 /*
@@ -194,6 +204,33 @@ void printk_nmi_flush(void)
 		__printk_nmi_flush(&per_cpu(nmi_print_seq, cpu).work);
 }
 
+/**
+ * printk_nmi_flush_on_panic - flush all per-cpu nmi buffers when the system
+ *	goes down.
+ *
+ * Similar to printk_nmi_flush() but it can be called even in NMI context when
+ * the system goes down. It does the best effort to get NMI messages into
+ * the main ring buffer.
+ *
+ * Note that it could try harder when there is only one CPU online.
+ */
+void printk_nmi_flush_on_panic(void)
+{
+	/*
+	 * Make sure that we could access the main ring buffer.
+	 * Do not risk a double release when more CPUs are up.
+	 */
+	if (in_nmi() && raw_spin_is_locked(&logbuf_lock)) {
+		if (num_online_cpus() > 1)
+			return;
+
+		debug_locks_off();
+		raw_spin_lock_init(&logbuf_lock);
+	}
+
+	printk_nmi_flush();
+}
+
 void __init printk_nmi_init(void)
 {
 	int cpu;
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index e38579d730f4..60cdf6386763 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -245,7 +245,7 @@ __packed __aligned(4)
  * within the scheduler's rq lock. It must be released before calling
  * console_unlock() or anything else that might wake up a process.
  */
-static DEFINE_RAW_SPINLOCK(logbuf_lock);
+DEFINE_RAW_SPINLOCK(logbuf_lock);
 
 #ifdef CONFIG_PRINTK
 DECLARE_WAIT_QUEUE_HEAD(log_wait);
-- 
1.8.5.6

^ permalink raw reply	[flat|nested] 97+ messages in thread

* [PATCH v5 4/4] printk/nmi: flush NMI messages on the system panic
@ 2016-04-21 11:48   ` Petr Mladek
  0 siblings, 0 replies; 97+ messages in thread
From: Petr Mladek @ 2016-04-21 11:48 UTC (permalink / raw)
  To: linux-arm-kernel

In NMI context, printk() messages are stored into per-CPU buffers to avoid
a possible deadlock.  They are normally flushed to the main ring buffer via
an IRQ work.  But the work is never called when the system calls panic() in
the very same NMI handler.

This patch tries to flush NMI buffers before the crash dump is generated.
In this case it does not risk a double release and bails out when the
logbuf_lock is already taken.  The aim is to get the messages into the main
ring buffer when possible.  It makes them better accessible in the vmcore.

Then the patch tries to flush the buffers second time when other CPUs are
down.  It might be more aggressive and reset logbuf_lock. The aim is to
get the messages available for the consequent kmsg_dump() and
console_flush_on_panic() calls.

The patch causes vprintk_emit() to be called even in NMI context again. But
it is done via printk_deferred() so that the console handling is skipped.
Consoles use internal locks and we could not prevent a deadlock easily.
They are explicitly called later when the crash dump is not generated, see
console_flush_on_panic().

Signed-off-by: Petr Mladek <pmladek@suse.com>
---
 include/linux/printk.h   |  2 ++
 kernel/kexec_core.c      |  1 +
 kernel/panic.c           |  6 +++++-
 kernel/printk/internal.h |  2 ++
 kernel/printk/nmi.c      | 39 ++++++++++++++++++++++++++++++++++++++-
 kernel/printk/printk.c   |  2 +-
 6 files changed, 49 insertions(+), 3 deletions(-)

diff --git a/include/linux/printk.h b/include/linux/printk.h
index 51dd6b824fe2..f4da695fd615 100644
--- a/include/linux/printk.h
+++ b/include/linux/printk.h
@@ -127,11 +127,13 @@ extern void printk_nmi_init(void);
 extern void printk_nmi_enter(void);
 extern void printk_nmi_exit(void);
 extern void printk_nmi_flush(void);
+extern void printk_nmi_flush_on_panic(void);
 #else
 static inline void printk_nmi_init(void) { }
 static inline void printk_nmi_enter(void) { }
 static inline void printk_nmi_exit(void) { }
 static inline void printk_nmi_flush(void) { }
+static inline void printk_nmi_flush_on_panic(void) { }
 #endif /* PRINTK_NMI */
 
 #ifdef CONFIG_PRINTK
diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
index 8d34308ea449..1dc3fe8495e0 100644
--- a/kernel/kexec_core.c
+++ b/kernel/kexec_core.c
@@ -893,6 +893,7 @@ void crash_kexec(struct pt_regs *regs)
 	old_cpu = atomic_cmpxchg(&panic_cpu, PANIC_CPU_INVALID, this_cpu);
 	if (old_cpu == PANIC_CPU_INVALID) {
 		/* This is the 1st CPU which comes here, so go ahead. */
+		printk_nmi_flush_on_panic();
 		__crash_kexec(regs);
 
 		/*
diff --git a/kernel/panic.c b/kernel/panic.c
index 535c96510a44..8aa74497cc5a 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -160,8 +160,10 @@ void panic(const char *fmt, ...)
 	 *
 	 * Bypass the panic_cpu check and call __crash_kexec directly.
 	 */
-	if (!crash_kexec_post_notifiers)
+	if (!crash_kexec_post_notifiers) {
+		printk_nmi_flush_on_panic();
 		__crash_kexec(NULL);
+	}
 
 	/*
 	 * Note smp_send_stop is the usual smp shutdown function, which
@@ -176,6 +178,8 @@ void panic(const char *fmt, ...)
 	 */
 	atomic_notifier_call_chain(&panic_notifier_list, 0, buf);
 
+	/* Call flush even twice. It tries harder with a single online CPU */
+	printk_nmi_flush_on_panic();
 	kmsg_dump(KMSG_DUMP_PANIC);
 
 	/*
diff --git a/kernel/printk/internal.h b/kernel/printk/internal.h
index 341bedccc065..7fd2838fa417 100644
--- a/kernel/printk/internal.h
+++ b/kernel/printk/internal.h
@@ -22,6 +22,8 @@ int __printf(1, 0) vprintk_default(const char *fmt, va_list args);
 
 #ifdef CONFIG_PRINTK_NMI
 
+extern raw_spinlock_t logbuf_lock;
+
 /*
  * printk() could not take logbuf_lock in NMI context. Instead,
  * it temporary stores the strings into a per-CPU buffer.
diff --git a/kernel/printk/nmi.c b/kernel/printk/nmi.c
index bf08557d7e3d..b69eb8a2876f 100644
--- a/kernel/printk/nmi.c
+++ b/kernel/printk/nmi.c
@@ -17,6 +17,7 @@
 
 #include <linux/preempt.h>
 #include <linux/spinlock.h>
+#include <linux/debug_locks.h>
 #include <linux/smp.h>
 #include <linux/cpumask.h>
 #include <linux/irq_work.h>
@@ -106,7 +107,16 @@ static void print_nmi_seq_line(struct nmi_seq_buf *s, int start, int end)
 {
 	const char *buf = s->buffer + start;
 
-	printk("%.*s", (end - start) + 1, buf);
+	/*
+	 * The buffers are flushed in NMI only on panic.  The messages must
+	 * go only into the ring buffer at this stage.  Consoles will get
+	 * explicitly called later when a crashdump is not generated.
+	 */
+	if (in_nmi())
+		printk_deferred("%.*s", (end - start) + 1, buf);
+	else
+		printk("%.*s", (end - start) + 1, buf);
+
 }
 
 /*
@@ -194,6 +204,33 @@ void printk_nmi_flush(void)
 		__printk_nmi_flush(&per_cpu(nmi_print_seq, cpu).work);
 }
 
+/**
+ * printk_nmi_flush_on_panic - flush all per-cpu nmi buffers when the system
+ *	goes down.
+ *
+ * Similar to printk_nmi_flush() but it can be called even in NMI context when
+ * the system goes down. It does the best effort to get NMI messages into
+ * the main ring buffer.
+ *
+ * Note that it could try harder when there is only one CPU online.
+ */
+void printk_nmi_flush_on_panic(void)
+{
+	/*
+	 * Make sure that we could access the main ring buffer.
+	 * Do not risk a double release when more CPUs are up.
+	 */
+	if (in_nmi() && raw_spin_is_locked(&logbuf_lock)) {
+		if (num_online_cpus() > 1)
+			return;
+
+		debug_locks_off();
+		raw_spin_lock_init(&logbuf_lock);
+	}
+
+	printk_nmi_flush();
+}
+
 void __init printk_nmi_init(void)
 {
 	int cpu;
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index e38579d730f4..60cdf6386763 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -245,7 +245,7 @@ __packed __aligned(4)
  * within the scheduler's rq lock. It must be released before calling
  * console_unlock() or anything else that might wake up a process.
  */
-static DEFINE_RAW_SPINLOCK(logbuf_lock);
+DEFINE_RAW_SPINLOCK(logbuf_lock);
 
 #ifdef CONFIG_PRINTK
 DECLARE_WAIT_QUEUE_HEAD(log_wait);
-- 
1.8.5.6

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH v5 4/4] printk/nmi: flush NMI messages on the system panic
  2016-04-21 11:48   ` Petr Mladek
  (?)
@ 2016-04-23  3:49     ` Sergey Senozhatsky
  -1 siblings, 0 replies; 97+ messages in thread
From: Sergey Senozhatsky @ 2016-04-23  3:49 UTC (permalink / raw)
  To: linux-arm-kernel

Hello Petr,

On (04/21/16 13:48), Petr Mladek wrote:
>  extern void printk_nmi_flush(void);
> +extern void printk_nmi_flush_on_panic(void);
>  #else
>  static inline void printk_nmi_flush(void) { }
> +static inline void printk_nmi_flush_on_panic(void) { }
[..]
> +void printk_nmi_flush_on_panic(void)
> +{
> +	/*
> +	 * Make sure that we could access the main ring buffer.
> +	 * Do not risk a double release when more CPUs are up.
> +	 */
> +	if (in_nmi() && raw_spin_is_locked(&logbuf_lock)) {
> +		if (num_online_cpus() > 1)
> +			return;
> +
> +		debug_locks_off();
> +		raw_spin_lock_init(&logbuf_lock);
> +	}
> +
> +	printk_nmi_flush();
> +}
[..]
> -static DEFINE_RAW_SPINLOCK(logbuf_lock);
> +DEFINE_RAW_SPINLOCK(logbuf_lock);

just an idea,

how about doing it a bit differently?


move printk_nmi_flush_on_panic() to printk.c, and place it next to
printk_flush_on_panic() (so we will have two printk "flush-on-panic"
functions sitting together). /* printk_nmi_flush() is in printk.h,
so it's visible to printk anyway */

it also will let us keep logbuf_lock static, it's a bit too internal
to printk to expose it, I think.


IOW, something like this?

---

 kernel/printk/internal.h |  2 --
 kernel/printk/nmi.c      | 27 ---------------------------
 kernel/printk/printk.c   | 29 ++++++++++++++++++++++++++++-
 3 files changed, 28 insertions(+), 30 deletions(-)

diff --git a/kernel/printk/internal.h b/kernel/printk/internal.h
index 7fd2838..341bedc 100644
--- a/kernel/printk/internal.h
+++ b/kernel/printk/internal.h
@@ -22,8 +22,6 @@ int __printf(1, 0) vprintk_default(const char *fmt, va_list args);
 
 #ifdef CONFIG_PRINTK_NMI
 
-extern raw_spinlock_t logbuf_lock;
-
 /*
  * printk() could not take logbuf_lock in NMI context. Instead,
  * it temporary stores the strings into a per-CPU buffer.
diff --git a/kernel/printk/nmi.c b/kernel/printk/nmi.c
index b69eb8a..b68a9864 100644
--- a/kernel/printk/nmi.c
+++ b/kernel/printk/nmi.c
@@ -204,33 +204,6 @@ void printk_nmi_flush(void)
 		__printk_nmi_flush(&per_cpu(nmi_print_seq, cpu).work);
 }
 
-/**
- * printk_nmi_flush_on_panic - flush all per-cpu nmi buffers when the system
- *	goes down.
- *
- * Similar to printk_nmi_flush() but it can be called even in NMI context when
- * the system goes down. It does the best effort to get NMI messages into
- * the main ring buffer.
- *
- * Note that it could try harder when there is only one CPU online.
- */
-void printk_nmi_flush_on_panic(void)
-{
-	/*
-	 * Make sure that we could access the main ring buffer.
-	 * Do not risk a double release when more CPUs are up.
-	 */
-	if (in_nmi() && raw_spin_is_locked(&logbuf_lock)) {
-		if (num_online_cpus() > 1)
-			return;
-
-		debug_locks_off();
-		raw_spin_lock_init(&logbuf_lock);
-	}
-
-	printk_nmi_flush();
-}
-
 void __init printk_nmi_init(void)
 {
 	int cpu;
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 0a0e789..1509baa 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -245,7 +245,7 @@ __packed __aligned(4)
  * within the scheduler's rq lock. It must be released before calling
  * console_unlock() or anything else that might wake up a process.
  */
-DEFINE_RAW_SPINLOCK(logbuf_lock);
+static DEFINE_RAW_SPINLOCK(logbuf_lock);
 
 #ifdef CONFIG_PRINTK
 DECLARE_WAIT_QUEUE_HEAD(log_wait);
@@ -2447,6 +2447,33 @@ void console_unblank(void)
 }
 
 /**
+ * printk_nmi_flush_on_panic - flush all per-cpu nmi buffers when the system
+ *	goes down.
+ *
+ * Similar to printk_nmi_flush() but it can be called even in NMI context when
+ * the system goes down. It does the best effort to get NMI messages into
+ * the main ring buffer.
+ *
+ * Note that it could try harder when there is only one CPU online.
+ */
+void printk_nmi_flush_on_panic(void)
+{
+	/*
+	 * Make sure that we could access the main ring buffer.
+	 * Do not risk a double release when more CPUs are up.
+	 */
+	if (in_nmi() && raw_spin_is_locked(&logbuf_lock)) {
+		if (num_online_cpus() > 1)
+			return;
+
+		debug_locks_off();
+		raw_spin_lock_init(&logbuf_lock);
+	}
+
+	printk_nmi_flush();
+}
+
+/**
  * console_flush_on_panic - flush console content on panic
  *
  * Immediately output all pending messages no matter what.


^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH v5 4/4] printk/nmi: flush NMI messages on the system panic
@ 2016-04-23  3:49     ` Sergey Senozhatsky
  0 siblings, 0 replies; 97+ messages in thread
From: Sergey Senozhatsky @ 2016-04-23  3:49 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Andrew Morton, Peter Zijlstra, Steven Rostedt, Russell King,
	Daniel Thompson, Jiri Kosina, Ingo Molnar, Thomas Gleixner,
	Sergey Senozhatsky, Chris Metcalf, linux-kernel, x86,
	linux-arm-kernel, adi-buildroot-devel, linux-cris-kernel,
	linux-mips, linuxppc-dev, linux-s390, linux-sh, sparclinux

Hello Petr,

On (04/21/16 13:48), Petr Mladek wrote:
>  extern void printk_nmi_flush(void);
> +extern void printk_nmi_flush_on_panic(void);
>  #else
>  static inline void printk_nmi_flush(void) { }
> +static inline void printk_nmi_flush_on_panic(void) { }
[..]
> +void printk_nmi_flush_on_panic(void)
> +{
> +	/*
> +	 * Make sure that we could access the main ring buffer.
> +	 * Do not risk a double release when more CPUs are up.
> +	 */
> +	if (in_nmi() && raw_spin_is_locked(&logbuf_lock)) {
> +		if (num_online_cpus() > 1)
> +			return;
> +
> +		debug_locks_off();
> +		raw_spin_lock_init(&logbuf_lock);
> +	}
> +
> +	printk_nmi_flush();
> +}
[..]
> -static DEFINE_RAW_SPINLOCK(logbuf_lock);
> +DEFINE_RAW_SPINLOCK(logbuf_lock);

just an idea,

how about doing it a bit differently?


move printk_nmi_flush_on_panic() to printk.c, and place it next to
printk_flush_on_panic() (so we will have two printk "flush-on-panic"
functions sitting together). /* printk_nmi_flush() is in printk.h,
so it's visible to printk anyway */

it also will let us keep logbuf_lock static, it's a bit too internal
to printk to expose it, I think.


IOW, something like this?

---

 kernel/printk/internal.h |  2 --
 kernel/printk/nmi.c      | 27 ---------------------------
 kernel/printk/printk.c   | 29 ++++++++++++++++++++++++++++-
 3 files changed, 28 insertions(+), 30 deletions(-)

diff --git a/kernel/printk/internal.h b/kernel/printk/internal.h
index 7fd2838..341bedc 100644
--- a/kernel/printk/internal.h
+++ b/kernel/printk/internal.h
@@ -22,8 +22,6 @@ int __printf(1, 0) vprintk_default(const char *fmt, va_list args);
 
 #ifdef CONFIG_PRINTK_NMI
 
-extern raw_spinlock_t logbuf_lock;
-
 /*
  * printk() could not take logbuf_lock in NMI context. Instead,
  * it temporary stores the strings into a per-CPU buffer.
diff --git a/kernel/printk/nmi.c b/kernel/printk/nmi.c
index b69eb8a..b68a9864 100644
--- a/kernel/printk/nmi.c
+++ b/kernel/printk/nmi.c
@@ -204,33 +204,6 @@ void printk_nmi_flush(void)
 		__printk_nmi_flush(&per_cpu(nmi_print_seq, cpu).work);
 }
 
-/**
- * printk_nmi_flush_on_panic - flush all per-cpu nmi buffers when the system
- *	goes down.
- *
- * Similar to printk_nmi_flush() but it can be called even in NMI context when
- * the system goes down. It does the best effort to get NMI messages into
- * the main ring buffer.
- *
- * Note that it could try harder when there is only one CPU online.
- */
-void printk_nmi_flush_on_panic(void)
-{
-	/*
-	 * Make sure that we could access the main ring buffer.
-	 * Do not risk a double release when more CPUs are up.
-	 */
-	if (in_nmi() && raw_spin_is_locked(&logbuf_lock)) {
-		if (num_online_cpus() > 1)
-			return;
-
-		debug_locks_off();
-		raw_spin_lock_init(&logbuf_lock);
-	}
-
-	printk_nmi_flush();
-}
-
 void __init printk_nmi_init(void)
 {
 	int cpu;
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 0a0e789..1509baa 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -245,7 +245,7 @@ __packed __aligned(4)
  * within the scheduler's rq lock. It must be released before calling
  * console_unlock() or anything else that might wake up a process.
  */
-DEFINE_RAW_SPINLOCK(logbuf_lock);
+static DEFINE_RAW_SPINLOCK(logbuf_lock);
 
 #ifdef CONFIG_PRINTK
 DECLARE_WAIT_QUEUE_HEAD(log_wait);
@@ -2447,6 +2447,33 @@ void console_unblank(void)
 }
 
 /**
+ * printk_nmi_flush_on_panic - flush all per-cpu nmi buffers when the system
+ *	goes down.
+ *
+ * Similar to printk_nmi_flush() but it can be called even in NMI context when
+ * the system goes down. It does the best effort to get NMI messages into
+ * the main ring buffer.
+ *
+ * Note that it could try harder when there is only one CPU online.
+ */
+void printk_nmi_flush_on_panic(void)
+{
+	/*
+	 * Make sure that we could access the main ring buffer.
+	 * Do not risk a double release when more CPUs are up.
+	 */
+	if (in_nmi() && raw_spin_is_locked(&logbuf_lock)) {
+		if (num_online_cpus() > 1)
+			return;
+
+		debug_locks_off();
+		raw_spin_lock_init(&logbuf_lock);
+	}
+
+	printk_nmi_flush();
+}
+
+/**
  * console_flush_on_panic - flush console content on panic
  *
  * Immediately output all pending messages no matter what.

^ permalink raw reply	[flat|nested] 97+ messages in thread

* [PATCH v5 4/4] printk/nmi: flush NMI messages on the system panic
@ 2016-04-23  3:49     ` Sergey Senozhatsky
  0 siblings, 0 replies; 97+ messages in thread
From: Sergey Senozhatsky @ 2016-04-23  3:49 UTC (permalink / raw)
  To: linux-arm-kernel

Hello Petr,

On (04/21/16 13:48), Petr Mladek wrote:
>  extern void printk_nmi_flush(void);
> +extern void printk_nmi_flush_on_panic(void);
>  #else
>  static inline void printk_nmi_flush(void) { }
> +static inline void printk_nmi_flush_on_panic(void) { }
[..]
> +void printk_nmi_flush_on_panic(void)
> +{
> +	/*
> +	 * Make sure that we could access the main ring buffer.
> +	 * Do not risk a double release when more CPUs are up.
> +	 */
> +	if (in_nmi() && raw_spin_is_locked(&logbuf_lock)) {
> +		if (num_online_cpus() > 1)
> +			return;
> +
> +		debug_locks_off();
> +		raw_spin_lock_init(&logbuf_lock);
> +	}
> +
> +	printk_nmi_flush();
> +}
[..]
> -static DEFINE_RAW_SPINLOCK(logbuf_lock);
> +DEFINE_RAW_SPINLOCK(logbuf_lock);

just an idea,

how about doing it a bit differently?


move printk_nmi_flush_on_panic() to printk.c, and place it next to
printk_flush_on_panic() (so we will have two printk "flush-on-panic"
functions sitting together). /* printk_nmi_flush() is in printk.h,
so it's visible to printk anyway */

it also will let us keep logbuf_lock static, it's a bit too internal
to printk to expose it, I think.


IOW, something like this?

---

 kernel/printk/internal.h |  2 --
 kernel/printk/nmi.c      | 27 ---------------------------
 kernel/printk/printk.c   | 29 ++++++++++++++++++++++++++++-
 3 files changed, 28 insertions(+), 30 deletions(-)

diff --git a/kernel/printk/internal.h b/kernel/printk/internal.h
index 7fd2838..341bedc 100644
--- a/kernel/printk/internal.h
+++ b/kernel/printk/internal.h
@@ -22,8 +22,6 @@ int __printf(1, 0) vprintk_default(const char *fmt, va_list args);
 
 #ifdef CONFIG_PRINTK_NMI
 
-extern raw_spinlock_t logbuf_lock;
-
 /*
  * printk() could not take logbuf_lock in NMI context. Instead,
  * it temporary stores the strings into a per-CPU buffer.
diff --git a/kernel/printk/nmi.c b/kernel/printk/nmi.c
index b69eb8a..b68a9864 100644
--- a/kernel/printk/nmi.c
+++ b/kernel/printk/nmi.c
@@ -204,33 +204,6 @@ void printk_nmi_flush(void)
 		__printk_nmi_flush(&per_cpu(nmi_print_seq, cpu).work);
 }
 
-/**
- * printk_nmi_flush_on_panic - flush all per-cpu nmi buffers when the system
- *	goes down.
- *
- * Similar to printk_nmi_flush() but it can be called even in NMI context when
- * the system goes down. It does the best effort to get NMI messages into
- * the main ring buffer.
- *
- * Note that it could try harder when there is only one CPU online.
- */
-void printk_nmi_flush_on_panic(void)
-{
-	/*
-	 * Make sure that we could access the main ring buffer.
-	 * Do not risk a double release when more CPUs are up.
-	 */
-	if (in_nmi() && raw_spin_is_locked(&logbuf_lock)) {
-		if (num_online_cpus() > 1)
-			return;
-
-		debug_locks_off();
-		raw_spin_lock_init(&logbuf_lock);
-	}
-
-	printk_nmi_flush();
-}
-
 void __init printk_nmi_init(void)
 {
 	int cpu;
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 0a0e789..1509baa 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -245,7 +245,7 @@ __packed __aligned(4)
  * within the scheduler's rq lock. It must be released before calling
  * console_unlock() or anything else that might wake up a process.
  */
-DEFINE_RAW_SPINLOCK(logbuf_lock);
+static DEFINE_RAW_SPINLOCK(logbuf_lock);
 
 #ifdef CONFIG_PRINTK
 DECLARE_WAIT_QUEUE_HEAD(log_wait);
@@ -2447,6 +2447,33 @@ void console_unblank(void)
 }
 
 /**
+ * printk_nmi_flush_on_panic - flush all per-cpu nmi buffers when the system
+ *	goes down.
+ *
+ * Similar to printk_nmi_flush() but it can be called even in NMI context when
+ * the system goes down. It does the best effort to get NMI messages into
+ * the main ring buffer.
+ *
+ * Note that it could try harder when there is only one CPU online.
+ */
+void printk_nmi_flush_on_panic(void)
+{
+	/*
+	 * Make sure that we could access the main ring buffer.
+	 * Do not risk a double release when more CPUs are up.
+	 */
+	if (in_nmi() && raw_spin_is_locked(&logbuf_lock)) {
+		if (num_online_cpus() > 1)
+			return;
+
+		debug_locks_off();
+		raw_spin_lock_init(&logbuf_lock);
+	}
+
+	printk_nmi_flush();
+}
+
+/**
  * console_flush_on_panic - flush console content on panic
  *
  * Immediately output all pending messages no matter what.

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH v5 4/4] printk/nmi: flush NMI messages on the system panic
  2016-04-23  3:49     ` Sergey Senozhatsky
  (?)
@ 2016-04-26 14:21       ` Petr Mladek
  -1 siblings, 0 replies; 97+ messages in thread
From: Petr Mladek @ 2016-04-26 14:21 UTC (permalink / raw)
  To: linux-arm-kernel

On Sat 2016-04-23 12:49:24, Sergey Senozhatsky wrote:
> Hello Petr,
> 
> On (04/21/16 13:48), Petr Mladek wrote:
> >  extern void printk_nmi_flush(void);
> > +extern void printk_nmi_flush_on_panic(void);
> >  #else
> >  static inline void printk_nmi_flush(void) { }
> > +static inline void printk_nmi_flush_on_panic(void) { }
> [..]
> > +void printk_nmi_flush_on_panic(void)
> > +{
> > +	/*
> > +	 * Make sure that we could access the main ring buffer.
> > +	 * Do not risk a double release when more CPUs are up.
> > +	 */
> > +	if (in_nmi() && raw_spin_is_locked(&logbuf_lock)) {
> > +		if (num_online_cpus() > 1)
> > +			return;
> > +
> > +		debug_locks_off();
> > +		raw_spin_lock_init(&logbuf_lock);
> > +	}
> > +
> > +	printk_nmi_flush();
> > +}
> [..]
> > -static DEFINE_RAW_SPINLOCK(logbuf_lock);
> > +DEFINE_RAW_SPINLOCK(logbuf_lock);
> 
> just an idea,
> 
> how about doing it a bit differently?
> 
> 
> move printk_nmi_flush_on_panic() to printk.c, and place it next to
> printk_flush_on_panic() (so we will have two printk "flush-on-panic"
> functions sitting together). /* printk_nmi_flush() is in printk.h,
> so it's visible to printk anyway */
> 
> it also will let us keep logbuf_lock static, it's a bit too internal
> to printk to expose it, I think.
> 
> IOW, something like this?

It is rather cosmetic change. I 

> ---
> 
>  kernel/printk/internal.h |  2 --
>  kernel/printk/nmi.c      | 27 ---------------------------
>  kernel/printk/printk.c   | 29 ++++++++++++++++++++++++++++-
>  3 files changed, 28 insertions(+), 30 deletions(-)
> 
> diff --git a/kernel/printk/internal.h b/kernel/printk/internal.h
> index 7fd2838..341bedc 100644
> --- a/kernel/printk/internal.h
> +++ b/kernel/printk/internal.h
> @@ -22,8 +22,6 @@ int __printf(1, 0) vprintk_default(const char *fmt, va_list args);
>  
>  #ifdef CONFIG_PRINTK_NMI
>  
> -extern raw_spinlock_t logbuf_lock;

Well, it was exposed only in the internal.h header file. I consider
this rather a cosmetic change and do not have strong opinion about it. :-)

Anyway, thanks a lot for review.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH v5 4/4] printk/nmi: flush NMI messages on the system panic
@ 2016-04-26 14:21       ` Petr Mladek
  0 siblings, 0 replies; 97+ messages in thread
From: Petr Mladek @ 2016-04-26 14:21 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Andrew Morton, Peter Zijlstra, Steven Rostedt, Russell King,
	Daniel Thompson, Jiri Kosina, Ingo Molnar, Thomas Gleixner,
	Chris Metcalf, linux-kernel, x86, linux-arm-kernel,
	adi-buildroot-devel, linux-cris-kernel, linux-mips, linuxppc-dev,
	linux-s390, linux-sh, sparclinux

On Sat 2016-04-23 12:49:24, Sergey Senozhatsky wrote:
> Hello Petr,
> 
> On (04/21/16 13:48), Petr Mladek wrote:
> >  extern void printk_nmi_flush(void);
> > +extern void printk_nmi_flush_on_panic(void);
> >  #else
> >  static inline void printk_nmi_flush(void) { }
> > +static inline void printk_nmi_flush_on_panic(void) { }
> [..]
> > +void printk_nmi_flush_on_panic(void)
> > +{
> > +	/*
> > +	 * Make sure that we could access the main ring buffer.
> > +	 * Do not risk a double release when more CPUs are up.
> > +	 */
> > +	if (in_nmi() && raw_spin_is_locked(&logbuf_lock)) {
> > +		if (num_online_cpus() > 1)
> > +			return;
> > +
> > +		debug_locks_off();
> > +		raw_spin_lock_init(&logbuf_lock);
> > +	}
> > +
> > +	printk_nmi_flush();
> > +}
> [..]
> > -static DEFINE_RAW_SPINLOCK(logbuf_lock);
> > +DEFINE_RAW_SPINLOCK(logbuf_lock);
> 
> just an idea,
> 
> how about doing it a bit differently?
> 
> 
> move printk_nmi_flush_on_panic() to printk.c, and place it next to
> printk_flush_on_panic() (so we will have two printk "flush-on-panic"
> functions sitting together). /* printk_nmi_flush() is in printk.h,
> so it's visible to printk anyway */
> 
> it also will let us keep logbuf_lock static, it's a bit too internal
> to printk to expose it, I think.
> 
> IOW, something like this?

It is rather cosmetic change. I 

> ---
> 
>  kernel/printk/internal.h |  2 --
>  kernel/printk/nmi.c      | 27 ---------------------------
>  kernel/printk/printk.c   | 29 ++++++++++++++++++++++++++++-
>  3 files changed, 28 insertions(+), 30 deletions(-)
> 
> diff --git a/kernel/printk/internal.h b/kernel/printk/internal.h
> index 7fd2838..341bedc 100644
> --- a/kernel/printk/internal.h
> +++ b/kernel/printk/internal.h
> @@ -22,8 +22,6 @@ int __printf(1, 0) vprintk_default(const char *fmt, va_list args);
>  
>  #ifdef CONFIG_PRINTK_NMI
>  
> -extern raw_spinlock_t logbuf_lock;

Well, it was exposed only in the internal.h header file. I consider
this rather a cosmetic change and do not have strong opinion about it. :-)

Anyway, thanks a lot for review.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 97+ messages in thread

* [PATCH v5 4/4] printk/nmi: flush NMI messages on the system panic
@ 2016-04-26 14:21       ` Petr Mladek
  0 siblings, 0 replies; 97+ messages in thread
From: Petr Mladek @ 2016-04-26 14:21 UTC (permalink / raw)
  To: linux-arm-kernel

On Sat 2016-04-23 12:49:24, Sergey Senozhatsky wrote:
> Hello Petr,
> 
> On (04/21/16 13:48), Petr Mladek wrote:
> >  extern void printk_nmi_flush(void);
> > +extern void printk_nmi_flush_on_panic(void);
> >  #else
> >  static inline void printk_nmi_flush(void) { }
> > +static inline void printk_nmi_flush_on_panic(void) { }
> [..]
> > +void printk_nmi_flush_on_panic(void)
> > +{
> > +	/*
> > +	 * Make sure that we could access the main ring buffer.
> > +	 * Do not risk a double release when more CPUs are up.
> > +	 */
> > +	if (in_nmi() && raw_spin_is_locked(&logbuf_lock)) {
> > +		if (num_online_cpus() > 1)
> > +			return;
> > +
> > +		debug_locks_off();
> > +		raw_spin_lock_init(&logbuf_lock);
> > +	}
> > +
> > +	printk_nmi_flush();
> > +}
> [..]
> > -static DEFINE_RAW_SPINLOCK(logbuf_lock);
> > +DEFINE_RAW_SPINLOCK(logbuf_lock);
> 
> just an idea,
> 
> how about doing it a bit differently?
> 
> 
> move printk_nmi_flush_on_panic() to printk.c, and place it next to
> printk_flush_on_panic() (so we will have two printk "flush-on-panic"
> functions sitting together). /* printk_nmi_flush() is in printk.h,
> so it's visible to printk anyway */
> 
> it also will let us keep logbuf_lock static, it's a bit too internal
> to printk to expose it, I think.
> 
> IOW, something like this?

It is rather cosmetic change. I 

> ---
> 
>  kernel/printk/internal.h |  2 --
>  kernel/printk/nmi.c      | 27 ---------------------------
>  kernel/printk/printk.c   | 29 ++++++++++++++++++++++++++++-
>  3 files changed, 28 insertions(+), 30 deletions(-)
> 
> diff --git a/kernel/printk/internal.h b/kernel/printk/internal.h
> index 7fd2838..341bedc 100644
> --- a/kernel/printk/internal.h
> +++ b/kernel/printk/internal.h
> @@ -22,8 +22,6 @@ int __printf(1, 0) vprintk_default(const char *fmt, va_list args);
>  
>  #ifdef CONFIG_PRINTK_NMI
>  
> -extern raw_spinlock_t logbuf_lock;

Well, it was exposed only in the internal.h header file. I consider
this rather a cosmetic change and do not have strong opinion about it. :-)

Anyway, thanks a lot for review.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH v5 4/4] printk/nmi: flush NMI messages on the system panic
  2016-04-26 14:21       ` Petr Mladek
  (?)
@ 2016-04-27  0:34         ` Sergey Senozhatsky
  -1 siblings, 0 replies; 97+ messages in thread
From: Sergey Senozhatsky @ 2016-04-27  0:34 UTC (permalink / raw)
  To: linux-arm-kernel

On (04/26/16 16:21), Petr Mladek wrote:
[..]
> > move printk_nmi_flush_on_panic() to printk.c, and place it next to
> > printk_flush_on_panic() (so we will have two printk "flush-on-panic"
> > functions sitting together). /* printk_nmi_flush() is in printk.h,
> > so it's visible to printk anyway */
> > 
> > it also will let us keep logbuf_lock static, it's a bit too internal
> > to printk to expose it, I think.
> > 
> > IOW, something like this?
> 
> It is rather cosmetic change. I 

oh yes, it is. just to keep similar functionality (flush, zap locks)
in one place (printk).

	-ss

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH v5 4/4] printk/nmi: flush NMI messages on the system panic
@ 2016-04-27  0:34         ` Sergey Senozhatsky
  0 siblings, 0 replies; 97+ messages in thread
From: Sergey Senozhatsky @ 2016-04-27  0:34 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Andrew Morton, Peter Zijlstra,
	Steven Rostedt, Russell King, Daniel Thompson, Jiri Kosina,
	Ingo Molnar, Thomas Gleixner, Chris Metcalf, linux-kernel, x86,
	linux-arm-kernel, adi-buildroot-devel, linux-cris-kernel,
	linux-mips, linuxppc-dev, linux-s390, linux-sh, sparclinux

On (04/26/16 16:21), Petr Mladek wrote:
[..]
> > move printk_nmi_flush_on_panic() to printk.c, and place it next to
> > printk_flush_on_panic() (so we will have two printk "flush-on-panic"
> > functions sitting together). /* printk_nmi_flush() is in printk.h,
> > so it's visible to printk anyway */
> > 
> > it also will let us keep logbuf_lock static, it's a bit too internal
> > to printk to expose it, I think.
> > 
> > IOW, something like this?
> 
> It is rather cosmetic change. I 

oh yes, it is. just to keep similar functionality (flush, zap locks)
in one place (printk).

	-ss

^ permalink raw reply	[flat|nested] 97+ messages in thread

* [PATCH v5 4/4] printk/nmi: flush NMI messages on the system panic
@ 2016-04-27  0:34         ` Sergey Senozhatsky
  0 siblings, 0 replies; 97+ messages in thread
From: Sergey Senozhatsky @ 2016-04-27  0:34 UTC (permalink / raw)
  To: linux-arm-kernel

On (04/26/16 16:21), Petr Mladek wrote:
[..]
> > move printk_nmi_flush_on_panic() to printk.c, and place it next to
> > printk_flush_on_panic() (so we will have two printk "flush-on-panic"
> > functions sitting together). /* printk_nmi_flush() is in printk.h,
> > so it's visible to printk anyway */
> > 
> > it also will let us keep logbuf_lock static, it's a bit too internal
> > to printk to expose it, I think.
> > 
> > IOW, something like this?
> 
> It is rather cosmetic change. I 

oh yes, it is. just to keep similar functionality (flush, zap locks)
in one place (printk).

	-ss

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH v5 0/4] Cleaning printk stuff in NMI context
  2016-04-21 11:48 ` Petr Mladek
  (?)
@ 2016-04-27  0:36   ` Sergey Senozhatsky
  -1 siblings, 0 replies; 97+ messages in thread
From: Sergey Senozhatsky @ 2016-04-27  0:36 UTC (permalink / raw)
  To: linux-arm-kernel

On (04/21/16 13:48), Petr Mladek wrote:
> v5 adds changes suggested by Sergey Senozhatsky. It should not longer
> conflict with his async printk patchset.

passes my tests. (apart from that cosmetic-nano-nit) the series
looks good to me

Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>

	-ss

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH v5 0/4] Cleaning printk stuff in NMI context
@ 2016-04-27  0:36   ` Sergey Senozhatsky
  0 siblings, 0 replies; 97+ messages in thread
From: Sergey Senozhatsky @ 2016-04-27  0:36 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Andrew Morton, Peter Zijlstra, Steven Rostedt, Russell King,
	Daniel Thompson, Jiri Kosina, Ingo Molnar, Thomas Gleixner,
	Sergey Senozhatsky, Chris Metcalf, linux-kernel, x86,
	linux-arm-kernel, adi-buildroot-devel, linux-cris-kernel,
	linux-mips, linuxppc-dev, linux-s390, linux-sh, sparclinux

On (04/21/16 13:48), Petr Mladek wrote:
> v5 adds changes suggested by Sergey Senozhatsky. It should not longer
> conflict with his async printk patchset.

passes my tests. (apart from that cosmetic-nano-nit) the series
looks good to me

Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>

	-ss

^ permalink raw reply	[flat|nested] 97+ messages in thread

* [PATCH v5 0/4] Cleaning printk stuff in NMI context
@ 2016-04-27  0:36   ` Sergey Senozhatsky
  0 siblings, 0 replies; 97+ messages in thread
From: Sergey Senozhatsky @ 2016-04-27  0:36 UTC (permalink / raw)
  To: linux-arm-kernel

On (04/21/16 13:48), Petr Mladek wrote:
> v5 adds changes suggested by Sergey Senozhatsky. It should not longer
> conflict with his async printk patchset.

passes my tests. (apart from that cosmetic-nano-nit) the series
looks good to me

Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>

	-ss

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH v5 1/4] printk/nmi: generic solution for safe printk in NMI
  2016-04-21 11:48   ` Petr Mladek
  (?)
@ 2016-04-27  9:31     ` Russell King - ARM Linux
  -1 siblings, 0 replies; 97+ messages in thread
From: Russell King - ARM Linux @ 2016-04-27  9:31 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Apr 21, 2016 at 01:48:42PM +0200, Petr Mladek wrote:
> diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
> index cdfa6c2b7626..259543ec6dc9 100644
> --- a/arch/arm/Kconfig
> +++ b/arch/arm/Kconfig
> @@ -66,6 +66,7 @@ config ARM
>  	select HAVE_KRETPROBES if (HAVE_KPROBES)
>  	select HAVE_MEMBLOCK
>  	select HAVE_MOD_ARCH_SPECIFIC
> +	select HAVE_NMI
>  	select HAVE_OPROFILE if (HAVE_PERF_EVENTS)
>  	select HAVE_OPTPROBES if !THUMB2_KERNEL
>  	select HAVE_PERF_EVENTS
> diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c
> index baee70267f29..df90bc59bfce 100644
> --- a/arch/arm/kernel/smp.c
> +++ b/arch/arm/kernel/smp.c
> @@ -644,9 +644,11 @@ void handle_IPI(int ipinr, struct pt_regs *regs)
>  		break;
>  
>  	case IPI_CPU_BACKTRACE:
> +		printk_nmi_enter();
>  		irq_enter();
>  		nmi_cpu_backtrace(regs);
>  		irq_exit();
> +		printk_nmi_exit();
>  		break;
>  
>  	default:

For the above,

Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>

-- 
RMK's Patch system: http://www.arm.linux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH v5 1/4] printk/nmi: generic solution for safe printk in NMI
@ 2016-04-27  9:31     ` Russell King - ARM Linux
  0 siblings, 0 replies; 97+ messages in thread
From: Russell King - ARM Linux @ 2016-04-27  9:31 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Andrew Morton, Peter Zijlstra, Steven Rostedt, Daniel Thompson,
	Jiri Kosina, Ingo Molnar, Thomas Gleixner, Sergey Senozhatsky,
	Chris Metcalf, linux-kernel, x86, linux-arm-kernel,
	adi-buildroot-devel, linux-cris-kernel, linux-mips, linuxppc-dev,
	linux-s390, linux-sh, sparclinux, Jan Kara, Ralf Baechle,
	Benjamin Herrenschmidt, Martin Schwidefsky, David Miller

On Thu, Apr 21, 2016 at 01:48:42PM +0200, Petr Mladek wrote:
> diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
> index cdfa6c2b7626..259543ec6dc9 100644
> --- a/arch/arm/Kconfig
> +++ b/arch/arm/Kconfig
> @@ -66,6 +66,7 @@ config ARM
>  	select HAVE_KRETPROBES if (HAVE_KPROBES)
>  	select HAVE_MEMBLOCK
>  	select HAVE_MOD_ARCH_SPECIFIC
> +	select HAVE_NMI
>  	select HAVE_OPROFILE if (HAVE_PERF_EVENTS)
>  	select HAVE_OPTPROBES if !THUMB2_KERNEL
>  	select HAVE_PERF_EVENTS
> diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c
> index baee70267f29..df90bc59bfce 100644
> --- a/arch/arm/kernel/smp.c
> +++ b/arch/arm/kernel/smp.c
> @@ -644,9 +644,11 @@ void handle_IPI(int ipinr, struct pt_regs *regs)
>  		break;
>  
>  	case IPI_CPU_BACKTRACE:
> +		printk_nmi_enter();
>  		irq_enter();
>  		nmi_cpu_backtrace(regs);
>  		irq_exit();
> +		printk_nmi_exit();
>  		break;
>  
>  	default:

For the above,

Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>

-- 
RMK's Patch system: http://www.arm.linux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.

^ permalink raw reply	[flat|nested] 97+ messages in thread

* [PATCH v5 1/4] printk/nmi: generic solution for safe printk in NMI
@ 2016-04-27  9:31     ` Russell King - ARM Linux
  0 siblings, 0 replies; 97+ messages in thread
From: Russell King - ARM Linux @ 2016-04-27  9:31 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Apr 21, 2016 at 01:48:42PM +0200, Petr Mladek wrote:
> diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
> index cdfa6c2b7626..259543ec6dc9 100644
> --- a/arch/arm/Kconfig
> +++ b/arch/arm/Kconfig
> @@ -66,6 +66,7 @@ config ARM
>  	select HAVE_KRETPROBES if (HAVE_KPROBES)
>  	select HAVE_MEMBLOCK
>  	select HAVE_MOD_ARCH_SPECIFIC
> +	select HAVE_NMI
>  	select HAVE_OPROFILE if (HAVE_PERF_EVENTS)
>  	select HAVE_OPTPROBES if !THUMB2_KERNEL
>  	select HAVE_PERF_EVENTS
> diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c
> index baee70267f29..df90bc59bfce 100644
> --- a/arch/arm/kernel/smp.c
> +++ b/arch/arm/kernel/smp.c
> @@ -644,9 +644,11 @@ void handle_IPI(int ipinr, struct pt_regs *regs)
>  		break;
>  
>  	case IPI_CPU_BACKTRACE:
> +		printk_nmi_enter();
>  		irq_enter();
>  		nmi_cpu_backtrace(regs);
>  		irq_exit();
> +		printk_nmi_exit();
>  		break;
>  
>  	default:

For the above,

Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>

-- 
RMK's Patch system: http://www.arm.linux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH v5 2/4] printk/nmi: warn when some message has been lost in NMI context
  2016-04-21 11:48   ` Petr Mladek
  (?)
@ 2016-04-27  9:34     ` Russell King - ARM Linux
  -1 siblings, 0 replies; 97+ messages in thread
From: Russell King - ARM Linux @ 2016-04-27  9:34 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Apr 21, 2016 at 01:48:43PM +0200, Petr Mladek wrote:
> @@ -64,8 +65,10 @@ static int vprintk_nmi(const char *fmt, va_list args)
>  again:
>  	len = atomic_read(&s->len);
>  
> -	if (len >= sizeof(s->buffer))
> +	if (len >= sizeof(s->buffer)) {
> +		atomic_inc(&nmi_message_lost);

This should be fine - I think its reasonable to expect that no one is
using a spinlock implementation for their atomic ops for this...

-- 
RMK's Patch system: http://www.arm.linux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH v5 2/4] printk/nmi: warn when some message has been lost in NMI context
@ 2016-04-27  9:34     ` Russell King - ARM Linux
  0 siblings, 0 replies; 97+ messages in thread
From: Russell King - ARM Linux @ 2016-04-27  9:34 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Andrew Morton, Peter Zijlstra, Steven Rostedt, Daniel Thompson,
	Jiri Kosina, Ingo Molnar, Thomas Gleixner, Sergey Senozhatsky,
	Chris Metcalf, linux-kernel, x86, linux-arm-kernel,
	adi-buildroot-devel, linux-cris-kernel, linux-mips, linuxppc-dev,
	linux-s390, linux-sh, sparclinux, Jan Kara, Ralf Baechle,
	Benjamin Herrenschmidt, Martin Schwidefsky, David Miller

On Thu, Apr 21, 2016 at 01:48:43PM +0200, Petr Mladek wrote:
> @@ -64,8 +65,10 @@ static int vprintk_nmi(const char *fmt, va_list args)
>  again:
>  	len = atomic_read(&s->len);
>  
> -	if (len >= sizeof(s->buffer))
> +	if (len >= sizeof(s->buffer)) {
> +		atomic_inc(&nmi_message_lost);

This should be fine - I think its reasonable to expect that no one is
using a spinlock implementation for their atomic ops for this...

-- 
RMK's Patch system: http://www.arm.linux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.

^ permalink raw reply	[flat|nested] 97+ messages in thread

* [PATCH v5 2/4] printk/nmi: warn when some message has been lost in NMI context
@ 2016-04-27  9:34     ` Russell King - ARM Linux
  0 siblings, 0 replies; 97+ messages in thread
From: Russell King - ARM Linux @ 2016-04-27  9:34 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Apr 21, 2016 at 01:48:43PM +0200, Petr Mladek wrote:
> @@ -64,8 +65,10 @@ static int vprintk_nmi(const char *fmt, va_list args)
>  again:
>  	len = atomic_read(&s->len);
>  
> -	if (len >= sizeof(s->buffer))
> +	if (len >= sizeof(s->buffer)) {
> +		atomic_inc(&nmi_message_lost);

This should be fine - I think its reasonable to expect that no one is
using a spinlock implementation for their atomic ops for this...

-- 
RMK's Patch system: http://www.arm.linux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH v5 1/4] printk/nmi: generic solution for safe printk in NMI
  2016-04-21 11:48   ` Petr Mladek
  (?)
@ 2017-04-19 17:13     ` Steven Rostedt
  -1 siblings, 0 replies; 97+ messages in thread
From: Steven Rostedt @ 2017-04-19 17:13 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, 21 Apr 2016 13:48:42 +0200
Petr Mladek <pmladek@suse.com> wrote:

> printk() takes some locks and could not be used a safe way in NMI context.

I just found a problem with this solution. It kills ftrace dumps from
NMI context :-(

[ 1295.168495]    <...>-67423  10dNh1 382171111us : do_raw_spin_lock <-_raw_spin_lock
[ 1295.168495]    <...>-67423  10dNh1 382171111us : sched_stat_runtime: commÌ1 pidg423 runtime–858 [ns] vruntime\x11924198270 [ns]
[ 1295.168496]    <...>-67423  10dNh1 382171111us : lock_acquire: ffffffff81c5c940 read rcu_read_lock
[ 1295.168497]
[ 1295.168498] Lost 4890096 message(s)!
[ 1296.805063] ---[ end Kernel panic - not syncing: Hard LOCKUP
[ 1296.811553] unchecked MSR access error: WRMSR to 0x83f (tried to write 0x00000000000000f6) at rIP: 0xffffffff81046fc7 (native_apic_msr_write+0x27/0x40)
[ 1296.811553] Call Trace:
[ 1296.811553]  <NMI>

I was hoping to see a cause of a hard lockup by enabling
ftrace_dump_on_oops. But as NMIs now have a very small buffer that
gets flushed, we need to find a new way to print out the full ftrace
buffer over serial.

Thoughts?


-- Steve


> 
> The chance of a deadlock is real especially when printing stacks from all
> CPUs.  This particular problem has been addressed on x86 by the commit
> a9edc8809328 ("x86/nmi: Perform a safe NMI stack trace on all CPUs").
> 
> The patchset brings two big advantages.  First, it makes the NMI
> backtraces safe on all architectures for free.  Second, it makes all NMI
> messages almost safe on all architectures (the temporary buffer is
> limited.  We still should keep the number of messages in NMI context at
> minimum).
> 
> Note that there already are several messages printed in NMI context:
> WARN_ON(in_nmi()), BUG_ON(in_nmi()), anything being printed out from MCE
> handlers.  These are not easy to avoid.
> 
> This patch reuses most of the code and makes it generic.  It is useful for
> all messages and architectures that support NMI.
> 
> The alternative printk_func is set when entering and is reseted when
> leaving NMI context.  It queues IRQ work to copy the messages into the
> main ring buffer in a safe context.
> 
> __printk_nmi_flush() copies all available messages and reset the buffer.
> Then we could use a simple cmpxchg operations to get synchronized with
> writers.  There is also used a spinlock to get synchronized with other
> flushers.
> 
> We do not longer use seq_buf because it depends on external lock.  It
> would be hard to make all supported operations safe for a lockless use.
> It would be confusing and error prone to make only some operations safe.
> 
> The code is put into separate printk/nmi.c as suggested by Steven Rostedt.
> It needs a per-CPU buffer and is compiled only on architectures that call
> nmi_enter().  This is achieved by the new HAVE_NMI Kconfig flag.
> 
> The are MN10300 and Xtensa architectures.  We need to clean up NMI
> handling there first.  Let's do it separately.
> 
> The patch is heavily based on the draft from Peter Zijlstra, see
> https://lkml.org/lkml/2015/6/10/327
> 
> [arnd@arndb.de: printk-nmi: use %zu format string for size_t]
> [akpm@linux-foundation.org: min_t->min - all types are size_t here]
> Signed-off-by: Petr Mladek <pmladek@suse.com>
> Suggested-by: Peter Zijlstra <peterz@infradead.org>
> Suggested-by: Steven Rostedt <rostedt@goodmis.org>
> Cc: Jan Kara <jack@suse.cz>
> Cc: Russell King <rmk+kernel@arm.linux.org.uk>
> Cc: Daniel Thompson <daniel.thompson@linaro.org>
> Cc: Jiri Kosina <jkosina@suse.com>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ralf Baechle <ralf@linux-mips.org>
> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
> Cc: David Miller <davem@davemloft.net>
> Cc: Daniel Thompson <daniel.thompson@linaro.org>
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH v5 1/4] printk/nmi: generic solution for safe printk in NMI
@ 2017-04-19 17:13     ` Steven Rostedt
  0 siblings, 0 replies; 97+ messages in thread
From: Steven Rostedt @ 2017-04-19 17:13 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Andrew Morton, Peter Zijlstra, Russell King, Daniel Thompson,
	Jiri Kosina, Ingo Molnar, Thomas Gleixner, Sergey Senozhatsky,
	Chris Metcalf, linux-kernel, x86, linux-arm-kernel,
	adi-buildroot-devel, linux-cris-kernel, linux-mips, linuxppc-dev,
	linux-s390, linux-sh, sparclinux, Jan Kara, Ralf Baechle,
	Benjamin Herrenschmidt, Martin Schwidefsky, David Miller

On Thu, 21 Apr 2016 13:48:42 +0200
Petr Mladek <pmladek@suse.com> wrote:

> printk() takes some locks and could not be used a safe way in NMI context.

I just found a problem with this solution. It kills ftrace dumps from
NMI context :-(

[ 1295.168495]    <...>-67423  10dNh1 382171111us : do_raw_spin_lock <-_raw_spin_lock
[ 1295.168495]    <...>-67423  10dNh1 382171111us : sched_stat_runtime: comm=cc1 pid=67423 runtime=96858 [ns] vruntime=11924198270 [ns]
[ 1295.168496]    <...>-67423  10dNh1 382171111us : lock_acquire: ffffffff81c5c940 read rcu_read_lock
[ 1295.168497]
[ 1295.168498] Lost 4890096 message(s)!
[ 1296.805063] ---[ end Kernel panic - not syncing: Hard LOCKUP
[ 1296.811553] unchecked MSR access error: WRMSR to 0x83f (tried to write 0x00000000000000f6) at rIP: 0xffffffff81046fc7 (native_apic_msr_write+0x27/0x40)
[ 1296.811553] Call Trace:
[ 1296.811553]  <NMI>

I was hoping to see a cause of a hard lockup by enabling
ftrace_dump_on_oops. But as NMIs now have a very small buffer that
gets flushed, we need to find a new way to print out the full ftrace
buffer over serial.

Thoughts?


-- Steve


> 
> The chance of a deadlock is real especially when printing stacks from all
> CPUs.  This particular problem has been addressed on x86 by the commit
> a9edc8809328 ("x86/nmi: Perform a safe NMI stack trace on all CPUs").
> 
> The patchset brings two big advantages.  First, it makes the NMI
> backtraces safe on all architectures for free.  Second, it makes all NMI
> messages almost safe on all architectures (the temporary buffer is
> limited.  We still should keep the number of messages in NMI context at
> minimum).
> 
> Note that there already are several messages printed in NMI context:
> WARN_ON(in_nmi()), BUG_ON(in_nmi()), anything being printed out from MCE
> handlers.  These are not easy to avoid.
> 
> This patch reuses most of the code and makes it generic.  It is useful for
> all messages and architectures that support NMI.
> 
> The alternative printk_func is set when entering and is reseted when
> leaving NMI context.  It queues IRQ work to copy the messages into the
> main ring buffer in a safe context.
> 
> __printk_nmi_flush() copies all available messages and reset the buffer.
> Then we could use a simple cmpxchg operations to get synchronized with
> writers.  There is also used a spinlock to get synchronized with other
> flushers.
> 
> We do not longer use seq_buf because it depends on external lock.  It
> would be hard to make all supported operations safe for a lockless use.
> It would be confusing and error prone to make only some operations safe.
> 
> The code is put into separate printk/nmi.c as suggested by Steven Rostedt.
> It needs a per-CPU buffer and is compiled only on architectures that call
> nmi_enter().  This is achieved by the new HAVE_NMI Kconfig flag.
> 
> The are MN10300 and Xtensa architectures.  We need to clean up NMI
> handling there first.  Let's do it separately.
> 
> The patch is heavily based on the draft from Peter Zijlstra, see
> https://lkml.org/lkml/2015/6/10/327
> 
> [arnd@arndb.de: printk-nmi: use %zu format string for size_t]
> [akpm@linux-foundation.org: min_t->min - all types are size_t here]
> Signed-off-by: Petr Mladek <pmladek@suse.com>
> Suggested-by: Peter Zijlstra <peterz@infradead.org>
> Suggested-by: Steven Rostedt <rostedt@goodmis.org>
> Cc: Jan Kara <jack@suse.cz>
> Cc: Russell King <rmk+kernel@arm.linux.org.uk>
> Cc: Daniel Thompson <daniel.thompson@linaro.org>
> Cc: Jiri Kosina <jkosina@suse.com>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ralf Baechle <ralf@linux-mips.org>
> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
> Cc: David Miller <davem@davemloft.net>
> Cc: Daniel Thompson <daniel.thompson@linaro.org>
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

^ permalink raw reply	[flat|nested] 97+ messages in thread

* [PATCH v5 1/4] printk/nmi: generic solution for safe printk in NMI
@ 2017-04-19 17:13     ` Steven Rostedt
  0 siblings, 0 replies; 97+ messages in thread
From: Steven Rostedt @ 2017-04-19 17:13 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, 21 Apr 2016 13:48:42 +0200
Petr Mladek <pmladek@suse.com> wrote:

> printk() takes some locks and could not be used a safe way in NMI context.

I just found a problem with this solution. It kills ftrace dumps from
NMI context :-(

[ 1295.168495]    <...>-67423  10dNh1 382171111us : do_raw_spin_lock <-_raw_spin_lock
[ 1295.168495]    <...>-67423  10dNh1 382171111us : sched_stat_runtime: comm=cc1 pid=67423 runtime=96858 [ns] vruntime=11924198270 [ns]
[ 1295.168496]    <...>-67423  10dNh1 382171111us : lock_acquire: ffffffff81c5c940 read rcu_read_lock
[ 1295.168497]
[ 1295.168498] Lost 4890096 message(s)!
[ 1296.805063] ---[ end Kernel panic - not syncing: Hard LOCKUP
[ 1296.811553] unchecked MSR access error: WRMSR to 0x83f (tried to write 0x00000000000000f6) at rIP: 0xffffffff81046fc7 (native_apic_msr_write+0x27/0x40)
[ 1296.811553] Call Trace:
[ 1296.811553]  <NMI>

I was hoping to see a cause of a hard lockup by enabling
ftrace_dump_on_oops. But as NMIs now have a very small buffer that
gets flushed, we need to find a new way to print out the full ftrace
buffer over serial.

Thoughts?


-- Steve


> 
> The chance of a deadlock is real especially when printing stacks from all
> CPUs.  This particular problem has been addressed on x86 by the commit
> a9edc8809328 ("x86/nmi: Perform a safe NMI stack trace on all CPUs").
> 
> The patchset brings two big advantages.  First, it makes the NMI
> backtraces safe on all architectures for free.  Second, it makes all NMI
> messages almost safe on all architectures (the temporary buffer is
> limited.  We still should keep the number of messages in NMI context at
> minimum).
> 
> Note that there already are several messages printed in NMI context:
> WARN_ON(in_nmi()), BUG_ON(in_nmi()), anything being printed out from MCE
> handlers.  These are not easy to avoid.
> 
> This patch reuses most of the code and makes it generic.  It is useful for
> all messages and architectures that support NMI.
> 
> The alternative printk_func is set when entering and is reseted when
> leaving NMI context.  It queues IRQ work to copy the messages into the
> main ring buffer in a safe context.
> 
> __printk_nmi_flush() copies all available messages and reset the buffer.
> Then we could use a simple cmpxchg operations to get synchronized with
> writers.  There is also used a spinlock to get synchronized with other
> flushers.
> 
> We do not longer use seq_buf because it depends on external lock.  It
> would be hard to make all supported operations safe for a lockless use.
> It would be confusing and error prone to make only some operations safe.
> 
> The code is put into separate printk/nmi.c as suggested by Steven Rostedt.
> It needs a per-CPU buffer and is compiled only on architectures that call
> nmi_enter().  This is achieved by the new HAVE_NMI Kconfig flag.
> 
> The are MN10300 and Xtensa architectures.  We need to clean up NMI
> handling there first.  Let's do it separately.
> 
> The patch is heavily based on the draft from Peter Zijlstra, see
> https://lkml.org/lkml/2015/6/10/327
> 
> [arnd at arndb.de: printk-nmi: use %zu format string for size_t]
> [akpm at linux-foundation.org: min_t->min - all types are size_t here]
> Signed-off-by: Petr Mladek <pmladek@suse.com>
> Suggested-by: Peter Zijlstra <peterz@infradead.org>
> Suggested-by: Steven Rostedt <rostedt@goodmis.org>
> Cc: Jan Kara <jack@suse.cz>
> Cc: Russell King <rmk+kernel@arm.linux.org.uk>
> Cc: Daniel Thompson <daniel.thompson@linaro.org>
> Cc: Jiri Kosina <jkosina@suse.com>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ralf Baechle <ralf@linux-mips.org>
> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
> Cc: David Miller <davem@davemloft.net>
> Cc: Daniel Thompson <daniel.thompson@linaro.org>
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH v5 1/4] printk/nmi: generic solution for safe printk in NMI
  2017-04-19 17:13     ` Steven Rostedt
  (?)
@ 2017-04-19 17:21       ` Peter Zijlstra
  -1 siblings, 0 replies; 97+ messages in thread
From: Peter Zijlstra @ 2017-04-19 17:21 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Apr 19, 2017 at 01:13:41PM -0400, Steven Rostedt wrote:
> On Thu, 21 Apr 2016 13:48:42 +0200
> Petr Mladek <pmladek@suse.com> wrote:
> 
> > printk() takes some locks and could not be used a safe way in NMI context.
> 
> I just found a problem with this solution. It kills ftrace dumps from
> NMI context :-(
> 
> [ 1295.168495]    <...>-67423  10dNh1 382171111us : do_raw_spin_lock <-_raw_spin_lock
> [ 1295.168495]    <...>-67423  10dNh1 382171111us : sched_stat_runtime: commÌ1 pidg423 runtime–858 [ns] vruntime\x11924198270 [ns]
> [ 1295.168496]    <...>-67423  10dNh1 382171111us : lock_acquire: ffffffff81c5c940 read rcu_read_lock
> [ 1295.168497]
> [ 1295.168498] Lost 4890096 message(s)!
> [ 1296.805063] ---[ end Kernel panic - not syncing: Hard LOCKUP
> [ 1296.811553] unchecked MSR access error: WRMSR to 0x83f (tried to write 0x00000000000000f6) at rIP: 0xffffffff81046fc7 (native_apic_msr_write+0x27/0x40)
> [ 1296.811553] Call Trace:
> [ 1296.811553]  <NMI>
> 
> I was hoping to see a cause of a hard lockup by enabling
> ftrace_dump_on_oops. But as NMIs now have a very small buffer that
> gets flushed, we need to find a new way to print out the full ftrace
> buffer over serial.
> 
> Thoughts?

early_printk ;-)

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH v5 1/4] printk/nmi: generic solution for safe printk in NMI
@ 2017-04-19 17:21       ` Peter Zijlstra
  0 siblings, 0 replies; 97+ messages in thread
From: Peter Zijlstra @ 2017-04-19 17:21 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Petr Mladek, Andrew Morton, Russell King, Daniel Thompson,
	Jiri Kosina, Ingo Molnar, Thomas Gleixner, Sergey Senozhatsky,
	Chris Metcalf, linux-kernel, x86, linux-arm-kernel,
	adi-buildroot-devel, linux-cris-kernel, linux-mips, linuxppc-dev,
	linux-s390, linux-sh, sparclinux, Jan Kara, Ralf Baechle,
	Benjamin Herrenschmidt, Martin Schwidefsky, David Miller

On Wed, Apr 19, 2017 at 01:13:41PM -0400, Steven Rostedt wrote:
> On Thu, 21 Apr 2016 13:48:42 +0200
> Petr Mladek <pmladek@suse.com> wrote:
> 
> > printk() takes some locks and could not be used a safe way in NMI context.
> 
> I just found a problem with this solution. It kills ftrace dumps from
> NMI context :-(
> 
> [ 1295.168495]    <...>-67423  10dNh1 382171111us : do_raw_spin_lock <-_raw_spin_lock
> [ 1295.168495]    <...>-67423  10dNh1 382171111us : sched_stat_runtime: comm=cc1 pid=67423 runtime=96858 [ns] vruntime=11924198270 [ns]
> [ 1295.168496]    <...>-67423  10dNh1 382171111us : lock_acquire: ffffffff81c5c940 read rcu_read_lock
> [ 1295.168497]
> [ 1295.168498] Lost 4890096 message(s)!
> [ 1296.805063] ---[ end Kernel panic - not syncing: Hard LOCKUP
> [ 1296.811553] unchecked MSR access error: WRMSR to 0x83f (tried to write 0x00000000000000f6) at rIP: 0xffffffff81046fc7 (native_apic_msr_write+0x27/0x40)
> [ 1296.811553] Call Trace:
> [ 1296.811553]  <NMI>
> 
> I was hoping to see a cause of a hard lockup by enabling
> ftrace_dump_on_oops. But as NMIs now have a very small buffer that
> gets flushed, we need to find a new way to print out the full ftrace
> buffer over serial.
> 
> Thoughts?

early_printk ;-)

^ permalink raw reply	[flat|nested] 97+ messages in thread

* [PATCH v5 1/4] printk/nmi: generic solution for safe printk in NMI
@ 2017-04-19 17:21       ` Peter Zijlstra
  0 siblings, 0 replies; 97+ messages in thread
From: Peter Zijlstra @ 2017-04-19 17:21 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Apr 19, 2017 at 01:13:41PM -0400, Steven Rostedt wrote:
> On Thu, 21 Apr 2016 13:48:42 +0200
> Petr Mladek <pmladek@suse.com> wrote:
> 
> > printk() takes some locks and could not be used a safe way in NMI context.
> 
> I just found a problem with this solution. It kills ftrace dumps from
> NMI context :-(
> 
> [ 1295.168495]    <...>-67423  10dNh1 382171111us : do_raw_spin_lock <-_raw_spin_lock
> [ 1295.168495]    <...>-67423  10dNh1 382171111us : sched_stat_runtime: comm=cc1 pid=67423 runtime=96858 [ns] vruntime=11924198270 [ns]
> [ 1295.168496]    <...>-67423  10dNh1 382171111us : lock_acquire: ffffffff81c5c940 read rcu_read_lock
> [ 1295.168497]
> [ 1295.168498] Lost 4890096 message(s)!
> [ 1296.805063] ---[ end Kernel panic - not syncing: Hard LOCKUP
> [ 1296.811553] unchecked MSR access error: WRMSR to 0x83f (tried to write 0x00000000000000f6) at rIP: 0xffffffff81046fc7 (native_apic_msr_write+0x27/0x40)
> [ 1296.811553] Call Trace:
> [ 1296.811553]  <NMI>
> 
> I was hoping to see a cause of a hard lockup by enabling
> ftrace_dump_on_oops. But as NMIs now have a very small buffer that
> gets flushed, we need to find a new way to print out the full ftrace
> buffer over serial.
> 
> Thoughts?

early_printk ;-)

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH v5 1/4] printk/nmi: generic solution for safe printk in NMI
  2017-04-19 17:13     ` Steven Rostedt
  (?)
@ 2017-04-20  3:31       ` Sergey Senozhatsky
  -1 siblings, 0 replies; 97+ messages in thread
From: Sergey Senozhatsky @ 2017-04-20  3:31 UTC (permalink / raw)
  To: linux-arm-kernel

Hello Steven,

On (04/19/17 13:13), Steven Rostedt wrote:
> > printk() takes some locks and could not be used a safe way in NMI context.
> 
> I just found a problem with this solution. It kills ftrace dumps from
> NMI context :-(
> 
> [ 1295.168495]    <...>-67423  10dNh1 382171111us : do_raw_spin_lock <-_raw_spin_lock
> [ 1295.168495]    <...>-67423  10dNh1 382171111us : sched_stat_runtime: commÌ1 pidg423 runtime–858 [ns] vruntime\x11924198270 [ns]
> [ 1295.168496]    <...>-67423  10dNh1 382171111us : lock_acquire: ffffffff81c5c940 read rcu_read_lock
> [ 1295.168497]
> [ 1295.168498] Lost 4890096 message(s)!
> [ 1296.805063] ---[ end Kernel panic - not syncing: Hard LOCKUP
> [ 1296.811553] unchecked MSR access error: WRMSR to 0x83f (tried to write 0x00000000000000f6) at rIP: 0xffffffff81046fc7 (native_apic_msr_write+0x27/0x40)
> [ 1296.811553] Call Trace:
> [ 1296.811553]  <NMI>
> 
> I was hoping to see a cause of a hard lockup by enabling
> ftrace_dump_on_oops. But as NMIs now have a very small buffer that
> gets flushed, we need to find a new way to print out the full ftrace
> buffer over serial.
> 
> Thoughts?

hmmm... a really tough one.

well, someone has to say this:
 the simplest thing is to have a bigger PRINTK_SAFE_LOG_BUF_SHIFT value :)


just thinking (well, sort of) out loud. the problem is that we can't tell if
we already hold any printk related locks ("printk related locks" is not even
well defined term). so printk from NMI can deadlock or it can be OK, we
never know. but looking and vprintk_emit() and console_unlock() it seems that
we have some sort of a hint now, which is this_cpu_read(printk_context) - if
we are not in printk_safe context then we can say that _probably_ (and that's
a Russian roulette) doing "normal" printk() will work. that is a *very-very*
risky (and admittedly dumb) thing to assume, so we will move in a slightly
different direction. checking this_cpu_read(printk_context) only assures us
that we don't hold `logbuf_lock' on this CPU. and that is sort of something,
at least we can be sure that doing printk_deferred() from this CPU is safe.
printk_deferred() means that your NMI messages will end up in the logbuf,
which is a) bigger in size than per-CPU buffer and b) some other CPU can
immediately print those messages (hopefully).

we also switch to printk_safe mode for call_console_drivers() in
console_unlock(). but we can't make any solid assumptions there - serial
console lock can already be acquired, we don't have any markers for that.
it may be reasonable to assume that if we are not in printk_safe mode on
this CPU then serial console is not locked from this CPU, but there is
nothing that can assure us.

so at the moment what I can think of is something like

  -- check this_cpu_read(printk_context) in NMI prink

	-- if we are NOT in printk_safe on this CPU, then do printk_deferred()
	   and bypass `nmi_print_seq' buffer

	-- if we are in printk_safe
	  -- well... bad luck... have a bigger buffer.

		.... or there are some other options here, but I'd prefer
		not to list them, because people will hate me :)


so this shifts the purpose of `nmi_print_seq' buffer: use it only when
you can't use logbuf. otherwise, do printk_deferred().


need to think more.

	-ss

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH v5 1/4] printk/nmi: generic solution for safe printk in NMI
@ 2017-04-20  3:31       ` Sergey Senozhatsky
  0 siblings, 0 replies; 97+ messages in thread
From: Sergey Senozhatsky @ 2017-04-20  3:31 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Petr Mladek, Andrew Morton, Peter Zijlstra, Russell King,
	Daniel Thompson, Jiri Kosina, Ingo Molnar, Thomas Gleixner,
	Sergey Senozhatsky, Chris Metcalf, linux-kernel, x86,
	linux-arm-kernel, adi-buildroot-devel, linux-cris-kernel,
	linux-mips, linuxppc-dev, linux-s390, linux-sh, sparclinux,
	Jan Kara, Ralf Baechle, Benjamin Herrenschmidt,
	Martin Schwidefsky, David Miller

Hello Steven,

On (04/19/17 13:13), Steven Rostedt wrote:
> > printk() takes some locks and could not be used a safe way in NMI context.
> 
> I just found a problem with this solution. It kills ftrace dumps from
> NMI context :-(
> 
> [ 1295.168495]    <...>-67423  10dNh1 382171111us : do_raw_spin_lock <-_raw_spin_lock
> [ 1295.168495]    <...>-67423  10dNh1 382171111us : sched_stat_runtime: comm=cc1 pid=67423 runtime=96858 [ns] vruntime=11924198270 [ns]
> [ 1295.168496]    <...>-67423  10dNh1 382171111us : lock_acquire: ffffffff81c5c940 read rcu_read_lock
> [ 1295.168497]
> [ 1295.168498] Lost 4890096 message(s)!
> [ 1296.805063] ---[ end Kernel panic - not syncing: Hard LOCKUP
> [ 1296.811553] unchecked MSR access error: WRMSR to 0x83f (tried to write 0x00000000000000f6) at rIP: 0xffffffff81046fc7 (native_apic_msr_write+0x27/0x40)
> [ 1296.811553] Call Trace:
> [ 1296.811553]  <NMI>
> 
> I was hoping to see a cause of a hard lockup by enabling
> ftrace_dump_on_oops. But as NMIs now have a very small buffer that
> gets flushed, we need to find a new way to print out the full ftrace
> buffer over serial.
> 
> Thoughts?

hmmm... a really tough one.

well, someone has to say this:
 the simplest thing is to have a bigger PRINTK_SAFE_LOG_BUF_SHIFT value :)


just thinking (well, sort of) out loud. the problem is that we can't tell if
we already hold any printk related locks ("printk related locks" is not even
well defined term). so printk from NMI can deadlock or it can be OK, we
never know. but looking and vprintk_emit() and console_unlock() it seems that
we have some sort of a hint now, which is this_cpu_read(printk_context) - if
we are not in printk_safe context then we can say that _probably_ (and that's
a Russian roulette) doing "normal" printk() will work. that is a *very-very*
risky (and admittedly dumb) thing to assume, so we will move in a slightly
different direction. checking this_cpu_read(printk_context) only assures us
that we don't hold `logbuf_lock' on this CPU. and that is sort of something,
at least we can be sure that doing printk_deferred() from this CPU is safe.
printk_deferred() means that your NMI messages will end up in the logbuf,
which is a) bigger in size than per-CPU buffer and b) some other CPU can
immediately print those messages (hopefully).

we also switch to printk_safe mode for call_console_drivers() in
console_unlock(). but we can't make any solid assumptions there - serial
console lock can already be acquired, we don't have any markers for that.
it may be reasonable to assume that if we are not in printk_safe mode on
this CPU then serial console is not locked from this CPU, but there is
nothing that can assure us.

so at the moment what I can think of is something like

  -- check this_cpu_read(printk_context) in NMI prink

	-- if we are NOT in printk_safe on this CPU, then do printk_deferred()
	   and bypass `nmi_print_seq' buffer

	-- if we are in printk_safe
	  -- well... bad luck... have a bigger buffer.

		.... or there are some other options here, but I'd prefer
		not to list them, because people will hate me :)


so this shifts the purpose of `nmi_print_seq' buffer: use it only when
you can't use logbuf. otherwise, do printk_deferred().


need to think more.

	-ss

^ permalink raw reply	[flat|nested] 97+ messages in thread

* [PATCH v5 1/4] printk/nmi: generic solution for safe printk in NMI
@ 2017-04-20  3:31       ` Sergey Senozhatsky
  0 siblings, 0 replies; 97+ messages in thread
From: Sergey Senozhatsky @ 2017-04-20  3:31 UTC (permalink / raw)
  To: linux-arm-kernel

Hello Steven,

On (04/19/17 13:13), Steven Rostedt wrote:
> > printk() takes some locks and could not be used a safe way in NMI context.
> 
> I just found a problem with this solution. It kills ftrace dumps from
> NMI context :-(
> 
> [ 1295.168495]    <...>-67423  10dNh1 382171111us : do_raw_spin_lock <-_raw_spin_lock
> [ 1295.168495]    <...>-67423  10dNh1 382171111us : sched_stat_runtime: comm=cc1 pid=67423 runtime=96858 [ns] vruntime=11924198270 [ns]
> [ 1295.168496]    <...>-67423  10dNh1 382171111us : lock_acquire: ffffffff81c5c940 read rcu_read_lock
> [ 1295.168497]
> [ 1295.168498] Lost 4890096 message(s)!
> [ 1296.805063] ---[ end Kernel panic - not syncing: Hard LOCKUP
> [ 1296.811553] unchecked MSR access error: WRMSR to 0x83f (tried to write 0x00000000000000f6) at rIP: 0xffffffff81046fc7 (native_apic_msr_write+0x27/0x40)
> [ 1296.811553] Call Trace:
> [ 1296.811553]  <NMI>
> 
> I was hoping to see a cause of a hard lockup by enabling
> ftrace_dump_on_oops. But as NMIs now have a very small buffer that
> gets flushed, we need to find a new way to print out the full ftrace
> buffer over serial.
> 
> Thoughts?

hmmm... a really tough one.

well, someone has to say this:
 the simplest thing is to have a bigger PRINTK_SAFE_LOG_BUF_SHIFT value :)


just thinking (well, sort of) out loud. the problem is that we can't tell if
we already hold any printk related locks ("printk related locks" is not even
well defined term). so printk from NMI can deadlock or it can be OK, we
never know. but looking and vprintk_emit() and console_unlock() it seems that
we have some sort of a hint now, which is this_cpu_read(printk_context) - if
we are not in printk_safe context then we can say that _probably_ (and that's
a Russian roulette) doing "normal" printk() will work. that is a *very-very*
risky (and admittedly dumb) thing to assume, so we will move in a slightly
different direction. checking this_cpu_read(printk_context) only assures us
that we don't hold `logbuf_lock' on this CPU. and that is sort of something,
at least we can be sure that doing printk_deferred() from this CPU is safe.
printk_deferred() means that your NMI messages will end up in the logbuf,
which is a) bigger in size than per-CPU buffer and b) some other CPU can
immediately print those messages (hopefully).

we also switch to printk_safe mode for call_console_drivers() in
console_unlock(). but we can't make any solid assumptions there - serial
console lock can already be acquired, we don't have any markers for that.
it may be reasonable to assume that if we are not in printk_safe mode on
this CPU then serial console is not locked from this CPU, but there is
nothing that can assure us.

so at the moment what I can think of is something like

  -- check this_cpu_read(printk_context) in NMI prink

	-- if we are NOT in printk_safe on this CPU, then do printk_deferred()
	   and bypass `nmi_print_seq' buffer

	-- if we are in printk_safe
	  -- well... bad luck... have a bigger buffer.

		.... or there are some other options here, but I'd prefer
		not to list them, because people will hate me :)


so this shifts the purpose of `nmi_print_seq' buffer: use it only when
you can't use logbuf. otherwise, do printk_deferred().


need to think more.

	-ss

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH v5 1/4] printk/nmi: generic solution for safe printk in NMI
  2017-04-20  3:31       ` Sergey Senozhatsky
  (?)
  (?)
@ 2017-04-20 13:11         ` Petr Mladek
  -1 siblings, 0 replies; 97+ messages in thread
From: Petr Mladek @ 2017-04-20 13:11 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu 2017-04-20 12:31:12, Sergey Senozhatsky wrote:
> Hello Steven,
> 
> On (04/19/17 13:13), Steven Rostedt wrote:
> > > printk() takes some locks and could not be used a safe way in NMI context.
> > 
> > I just found a problem with this solution. It kills ftrace dumps from
> > NMI context :-(
> > 
> > [ 1295.168495]    <...>-67423  10dNh1 382171111us : do_raw_spin_lock <-_raw_spin_lock
> > [ 1295.168495]    <...>-67423  10dNh1 382171111us : sched_stat_runtime: commÌ1 pidg423 runtime–858 [ns] vruntime\x11924198270 [ns]
> > [ 1295.168496]    <...>-67423  10dNh1 382171111us : lock_acquire: ffffffff81c5c940 read rcu_read_lock
> > [ 1295.168497]
> > [ 1295.168498] Lost 4890096 message(s)!
> > [ 1296.805063] ---[ end Kernel panic - not syncing: Hard LOCKUP
> > [ 1296.811553] unchecked MSR access error: WRMSR to 0x83f (tried to write 0x00000000000000f6) at rIP: 0xffffffff81046fc7 (native_apic_msr_write+0x27/0x40)
> > [ 1296.811553] Call Trace:
> > [ 1296.811553]  <NMI>
> > 
> > I was hoping to see a cause of a hard lockup by enabling
> > ftrace_dump_on_oops. But as NMIs now have a very small buffer that
> > gets flushed, we need to find a new way to print out the full ftrace
> > buffer over serial.
> > 
> > Thoughts?
> 
> hmmm... a really tough one.
> 
> well, someone has to say this:
>  the simplest thing is to have a bigger PRINTK_SAFE_LOG_BUF_SHIFT value :)
> 
> 
> just thinking (well, sort of) out loud. the problem is that we can't tell if
> we already hold any printk related locks ("printk related locks" is not even
> well defined term). so printk from NMI can deadlock or it can be OK, we
> never know. but looking and vprintk_emit() and console_unlock() it seems that
> we have some sort of a hint now, which is this_cpu_read(printk_context) - if
> we are not in printk_safe context then we can say that _probably_ (and that's
> a Russian roulette) doing "normal" printk() will work. that is a *very-very*
> risky (and admittedly dumb) thing to assume, so we will move in a slightly
> different direction. checking this_cpu_read(printk_context) only assures us
> that we don't hold `logbuf_lock' on this CPU. and that is sort of something,
> at least we can be sure that doing printk_deferred() from this CPU is safe.
> printk_deferred() means that your NMI messages will end up in the logbuf,
> which is a) bigger in size than per-CPU buffer and b) some other CPU can
> immediately print those messages (hopefully).
> 
> we also switch to printk_safe mode for call_console_drivers() in
> console_unlock(). but we can't make any solid assumptions there - serial
> console lock can already be acquired, we don't have any markers for that.
> it may be reasonable to assume that if we are not in printk_safe mode on
> this CPU then serial console is not locked from this CPU, but there is
> nothing that can assure us.

Good analyze. I would summarize it that we need to be careful of:

  + logbug_lock
  + PRINTK_SAFE_CONTEXT
  + locks used by console drivers

The first two things are easy to check. Except that a check for logbuf_lock
might produce false negatives. The last check is very hard.

> so at the moment what I can think of is something like
> 
>   -- check this_cpu_read(printk_context) in NMI prink
> 
> 	-- if we are NOT in printk_safe on this CPU, then do printk_deferred()
> 	   and bypass `nmi_print_seq' buffer

I would add also a check for logbuf_lock.

> 	-- if we are in printk_safe
> 	  -- well... bad luck... have a bigger buffer.

Yup, we do the best effort while still trying to stay on the safe
side.

I have cooked up a patch based on this. It uses printk_deferred()
in NMI when it is safe. Note that console_flush_on_panic() will
try to get them on the console when a kdump is not generated.
I believe that it will help Steven.


From c530d9dee91c74db5e6a198479e2e63b24cb84a2 Mon Sep 17 00:00:00 2001
From: Petr Mladek <pmladek@suse.com>
Date: Thu, 20 Apr 2017 10:52:31 +0200
Subject: [PATCH] printk: Use the main logbuf in NMI when logbuf_lock is
 available

The commit 42a0bb3f71383b457a7d ("printk/nmi: generic solution for safe
printk in NMI") caused that printk stores messages into a temporary
buffer in NMI context.

The buffer is per-CPU and therefore the size is rather limited.
It works quite well for NMI backtraces. But there are longer logs
that might get printed in NMI context, for example, lockdep
warnings, ftrace_dump_on_oops.

The temporary buffer is used to avoid deadlocks caused by
logbuf_lock. Also it is needed to avoid races with the other
temporary buffer that is used when PRINTK_SAFE_CONTEXT is entered.
But the main buffer can be used in NMI if the lock is available
and we did not interrupt PRINTK_SAFE_CONTEXT.

The lock is checked using raw_spin_is_locked(). It might cause
false negatives when the lock is taken on another CPU outside NMI.
For this reason, we do the check in printk_nmi_enter(). It makes
the handling consistent for the entire NMI handler and avoids
reshuffling of the messages.

The patch also defines special printk context that allows
to use printk_deferred() in NMI. Note that we could not flush
the messages to the consoles because console drivers might use
many other internal locks.

The newly created vprintk_deferred() disables the preemption
only around the irq work handling. It is needed there to keep
the consistency between the two per-CPU variables. But there
is no reason to disable preemption around vprintk_emit().

Finally, the patch patch puts back explicit serialization
of the NMI backtraces from different CPUs. It was removed
by the commit a9edc88093287183ac934b ("x86/nmi: Perform
a safe NMI stack trace on all CPUs"). It was not needed
because the flushing of the temporary per-CPU buffers
was serialized.

Suggested-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
---
 kernel/printk/internal.h    |  6 ++++--
 kernel/printk/printk.c      | 19 ++++++++++++++-----
 kernel/printk/printk_safe.c | 25 +++++++++++++++++++++++--
 lib/nmi_backtrace.c         |  3 +++
 4 files changed, 44 insertions(+), 9 deletions(-)

diff --git a/kernel/printk/internal.h b/kernel/printk/internal.h
index 1db044f808b7..2a7d04049af4 100644
--- a/kernel/printk/internal.h
+++ b/kernel/printk/internal.h
@@ -18,12 +18,14 @@
 
 #ifdef CONFIG_PRINTK
 
-#define PRINTK_SAFE_CONTEXT_MASK	0x7fffffff
-#define PRINTK_NMI_CONTEXT_MASK	0x80000000
+#define PRINTK_SAFE_CONTEXT_MASK	 0x3fffffff
+#define PRINTK_NMI_DEFERRED_CONTEXT_MASK 0x40000000
+#define PRINTK_NMI_CONTEXT_MASK		 0x80000000
 
 extern raw_spinlock_t logbuf_lock;
 
 __printf(1, 0) int vprintk_default(const char *fmt, va_list args);
+__printf(1, 0) int vprintk_deferred(const char *fmt, va_list args);
 __printf(1, 0) int vprintk_func(const char *fmt, va_list args);
 void __printk_safe_enter(void);
 void __printk_safe_exit(void);
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 2984fb0f0257..16b519927d35 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -2691,16 +2691,13 @@ void wake_up_klogd(void)
 	preempt_enable();
 }
 
-int printk_deferred(const char *fmt, ...)
+int vprintk_deferred(const char *fmt, va_list args)
 {
-	va_list args;
 	int r;
 
-	preempt_disable();
-	va_start(args, fmt);
 	r = vprintk_emit(0, LOGLEVEL_SCHED, NULL, 0, fmt, args);
-	va_end(args);
 
+	preempt_disable();
 	__this_cpu_or(printk_pending, PRINTK_PENDING_OUTPUT);
 	irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
 	preempt_enable();
@@ -2708,6 +2705,18 @@ int printk_deferred(const char *fmt, ...)
 	return r;
 }
 
+int printk_deferred(const char *fmt, ...)
+{
+	va_list args;
+	int r;
+
+	va_start(args, fmt);
+	r = vprintk_deferred(fmt, args);
+	va_end(args);
+
+	return r;
+}
+
 /*
  * printk rate limiting, lifted from the networking subsystem.
  *
diff --git a/kernel/printk/printk_safe.c b/kernel/printk/printk_safe.c
index 033e50a7d706..c3d165bcde42 100644
--- a/kernel/printk/printk_safe.c
+++ b/kernel/printk/printk_safe.c
@@ -308,12 +308,23 @@ static int vprintk_nmi(const char *fmt, va_list args)
 
 void printk_nmi_enter(void)
 {
-	this_cpu_or(printk_context, PRINTK_NMI_CONTEXT_MASK);
+	/*
+	 * The size of the extra per-CPU buffer is limited. Use it
+	 * only when really needed.
+	 */
+	if (this_cpu_read(printk_context) & PRINTK_SAFE_CONTEXT_MASK ||
+	    raw_spin_is_locked(&logbuf_lock)) {
+		this_cpu_or(printk_context, PRINTK_NMI_CONTEXT_MASK);
+	} else {
+		this_cpu_or(printk_context, PRINTK_NMI_DEFERRED_CONTEXT_MASK);
+	}
 }
 
 void printk_nmi_exit(void)
 {
-	this_cpu_and(printk_context, ~PRINTK_NMI_CONTEXT_MASK);
+	this_cpu_and(printk_context,
+		     ~(PRINTK_NMI_CONTEXT_MASK ||
+		       PRINTK_NMI_DEFERRED_CONTEXT_MASK));
 }
 
 #else
@@ -351,12 +362,22 @@ void __printk_safe_exit(void)
 
 __printf(1, 0) int vprintk_func(const char *fmt, va_list args)
 {
+	/* Use extra buffer in NMI when logbuf_lock is taken or in safe mode. */
 	if (this_cpu_read(printk_context) & PRINTK_NMI_CONTEXT_MASK)
 		return vprintk_nmi(fmt, args);
 
+	/* Use extra buffer to prevent a recursion deadlock in safe mode. */
 	if (this_cpu_read(printk_context) & PRINTK_SAFE_CONTEXT_MASK)
 		return vprintk_safe(fmt, args);
 
+	/*
+	 * Use the main logbuf when logbuf_lock is available in NMI.
+	 * But avoid calling console drivers that might have their own locks.
+	 */
+	if (this_cpu_read(printk_context) & PRINTK_NMI_DEFERRED_CONTEXT_MASK)
+		return vprintk_deferred(fmt, args);
+
+	/* No obstacles. */
 	return vprintk_default(fmt, args);
 }
 
diff --git a/lib/nmi_backtrace.c b/lib/nmi_backtrace.c
index 4e8a30d1c22f..0bc0a3535a8a 100644
--- a/lib/nmi_backtrace.c
+++ b/lib/nmi_backtrace.c
@@ -86,9 +86,11 @@ void nmi_trigger_cpumask_backtrace(const cpumask_t *mask,
 
 bool nmi_cpu_backtrace(struct pt_regs *regs)
 {
+	static arch_spinlock_t lock = __ARCH_SPIN_LOCK_UNLOCKED;
 	int cpu = smp_processor_id();
 
 	if (cpumask_test_cpu(cpu, to_cpumask(backtrace_mask))) {
+		arch_spin_lock(&lock);
 		if (regs && cpu_in_idle(instruction_pointer(regs))) {
 			pr_warn("NMI backtrace for cpu %d skipped: idling at pc %#lx\n",
 				cpu, instruction_pointer(regs));
@@ -99,6 +101,7 @@ bool nmi_cpu_backtrace(struct pt_regs *regs)
 			else
 				dump_stack();
 		}
+		arch_spin_unlock(&lock);
 		cpumask_clear_cpu(cpu, to_cpumask(backtrace_mask));
 		return true;
 	}
-- 
1.8.5.6


^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH v5 1/4] printk/nmi: generic solution for safe printk in NMI
@ 2017-04-20 13:11         ` Petr Mladek
  0 siblings, 0 replies; 97+ messages in thread
From: Petr Mladek @ 2017-04-20 13:11 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Steven Rostedt, Andrew Morton, Peter Zijlstra, Russell King,
	Daniel Thompson, Jiri Kosina, Ingo Molnar, Thomas Gleixner,
	Chris Metcalf, linux-kernel, x86, linux-arm-kernel,
	adi-buildroot-devel, linux-cris-kernel, linux-mips, linuxppc-dev,
	linux-s390, linux-sh, sparclinux, Jan Kara, Ralf Baechle,
	Benjamin Herrenschmidt, Martin Schwidefsky, David Miller

On Thu 2017-04-20 12:31:12, Sergey Senozhatsky wrote:
> Hello Steven,
> 
> On (04/19/17 13:13), Steven Rostedt wrote:
> > > printk() takes some locks and could not be used a safe way in NMI context.
> > 
> > I just found a problem with this solution. It kills ftrace dumps from
> > NMI context :-(
> > 
> > [ 1295.168495]    <...>-67423  10dNh1 382171111us : do_raw_spin_lock <-_raw_spin_lock
> > [ 1295.168495]    <...>-67423  10dNh1 382171111us : sched_stat_runtime: comm=cc1 pid=67423 runtime=96858 [ns] vruntime=11924198270 [ns]
> > [ 1295.168496]    <...>-67423  10dNh1 382171111us : lock_acquire: ffffffff81c5c940 read rcu_read_lock
> > [ 1295.168497]
> > [ 1295.168498] Lost 4890096 message(s)!
> > [ 1296.805063] ---[ end Kernel panic - not syncing: Hard LOCKUP
> > [ 1296.811553] unchecked MSR access error: WRMSR to 0x83f (tried to write 0x00000000000000f6) at rIP: 0xffffffff81046fc7 (native_apic_msr_write+0x27/0x40)
> > [ 1296.811553] Call Trace:
> > [ 1296.811553]  <NMI>
> > 
> > I was hoping to see a cause of a hard lockup by enabling
> > ftrace_dump_on_oops. But as NMIs now have a very small buffer that
> > gets flushed, we need to find a new way to print out the full ftrace
> > buffer over serial.
> > 
> > Thoughts?
> 
> hmmm... a really tough one.
> 
> well, someone has to say this:
>  the simplest thing is to have a bigger PRINTK_SAFE_LOG_BUF_SHIFT value :)
> 
> 
> just thinking (well, sort of) out loud. the problem is that we can't tell if
> we already hold any printk related locks ("printk related locks" is not even
> well defined term). so printk from NMI can deadlock or it can be OK, we
> never know. but looking and vprintk_emit() and console_unlock() it seems that
> we have some sort of a hint now, which is this_cpu_read(printk_context) - if
> we are not in printk_safe context then we can say that _probably_ (and that's
> a Russian roulette) doing "normal" printk() will work. that is a *very-very*
> risky (and admittedly dumb) thing to assume, so we will move in a slightly
> different direction. checking this_cpu_read(printk_context) only assures us
> that we don't hold `logbuf_lock' on this CPU. and that is sort of something,
> at least we can be sure that doing printk_deferred() from this CPU is safe.
> printk_deferred() means that your NMI messages will end up in the logbuf,
> which is a) bigger in size than per-CPU buffer and b) some other CPU can
> immediately print those messages (hopefully).
> 
> we also switch to printk_safe mode for call_console_drivers() in
> console_unlock(). but we can't make any solid assumptions there - serial
> console lock can already be acquired, we don't have any markers for that.
> it may be reasonable to assume that if we are not in printk_safe mode on
> this CPU then serial console is not locked from this CPU, but there is
> nothing that can assure us.

Good analyze. I would summarize it that we need to be careful of:

  + logbug_lock
  + PRINTK_SAFE_CONTEXT
  + locks used by console drivers

The first two things are easy to check. Except that a check for logbuf_lock
might produce false negatives. The last check is very hard.

> so at the moment what I can think of is something like
> 
>   -- check this_cpu_read(printk_context) in NMI prink
> 
> 	-- if we are NOT in printk_safe on this CPU, then do printk_deferred()
> 	   and bypass `nmi_print_seq' buffer

I would add also a check for logbuf_lock.

> 	-- if we are in printk_safe
> 	  -- well... bad luck... have a bigger buffer.

Yup, we do the best effort while still trying to stay on the safe
side.

I have cooked up a patch based on this. It uses printk_deferred()
in NMI when it is safe. Note that console_flush_on_panic() will
try to get them on the console when a kdump is not generated.
I believe that it will help Steven.


>From c530d9dee91c74db5e6a198479e2e63b24cb84a2 Mon Sep 17 00:00:00 2001
From: Petr Mladek <pmladek@suse.com>
Date: Thu, 20 Apr 2017 10:52:31 +0200
Subject: [PATCH] printk: Use the main logbuf in NMI when logbuf_lock is
 available

The commit 42a0bb3f71383b457a7d ("printk/nmi: generic solution for safe
printk in NMI") caused that printk stores messages into a temporary
buffer in NMI context.

The buffer is per-CPU and therefore the size is rather limited.
It works quite well for NMI backtraces. But there are longer logs
that might get printed in NMI context, for example, lockdep
warnings, ftrace_dump_on_oops.

The temporary buffer is used to avoid deadlocks caused by
logbuf_lock. Also it is needed to avoid races with the other
temporary buffer that is used when PRINTK_SAFE_CONTEXT is entered.
But the main buffer can be used in NMI if the lock is available
and we did not interrupt PRINTK_SAFE_CONTEXT.

The lock is checked using raw_spin_is_locked(). It might cause
false negatives when the lock is taken on another CPU outside NMI.
For this reason, we do the check in printk_nmi_enter(). It makes
the handling consistent for the entire NMI handler and avoids
reshuffling of the messages.

The patch also defines special printk context that allows
to use printk_deferred() in NMI. Note that we could not flush
the messages to the consoles because console drivers might use
many other internal locks.

The newly created vprintk_deferred() disables the preemption
only around the irq work handling. It is needed there to keep
the consistency between the two per-CPU variables. But there
is no reason to disable preemption around vprintk_emit().

Finally, the patch patch puts back explicit serialization
of the NMI backtraces from different CPUs. It was removed
by the commit a9edc88093287183ac934b ("x86/nmi: Perform
a safe NMI stack trace on all CPUs"). It was not needed
because the flushing of the temporary per-CPU buffers
was serialized.

Suggested-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
---
 kernel/printk/internal.h    |  6 ++++--
 kernel/printk/printk.c      | 19 ++++++++++++++-----
 kernel/printk/printk_safe.c | 25 +++++++++++++++++++++++--
 lib/nmi_backtrace.c         |  3 +++
 4 files changed, 44 insertions(+), 9 deletions(-)

diff --git a/kernel/printk/internal.h b/kernel/printk/internal.h
index 1db044f808b7..2a7d04049af4 100644
--- a/kernel/printk/internal.h
+++ b/kernel/printk/internal.h
@@ -18,12 +18,14 @@
 
 #ifdef CONFIG_PRINTK
 
-#define PRINTK_SAFE_CONTEXT_MASK	0x7fffffff
-#define PRINTK_NMI_CONTEXT_MASK	0x80000000
+#define PRINTK_SAFE_CONTEXT_MASK	 0x3fffffff
+#define PRINTK_NMI_DEFERRED_CONTEXT_MASK 0x40000000
+#define PRINTK_NMI_CONTEXT_MASK		 0x80000000
 
 extern raw_spinlock_t logbuf_lock;
 
 __printf(1, 0) int vprintk_default(const char *fmt, va_list args);
+__printf(1, 0) int vprintk_deferred(const char *fmt, va_list args);
 __printf(1, 0) int vprintk_func(const char *fmt, va_list args);
 void __printk_safe_enter(void);
 void __printk_safe_exit(void);
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 2984fb0f0257..16b519927d35 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -2691,16 +2691,13 @@ void wake_up_klogd(void)
 	preempt_enable();
 }
 
-int printk_deferred(const char *fmt, ...)
+int vprintk_deferred(const char *fmt, va_list args)
 {
-	va_list args;
 	int r;
 
-	preempt_disable();
-	va_start(args, fmt);
 	r = vprintk_emit(0, LOGLEVEL_SCHED, NULL, 0, fmt, args);
-	va_end(args);
 
+	preempt_disable();
 	__this_cpu_or(printk_pending, PRINTK_PENDING_OUTPUT);
 	irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
 	preempt_enable();
@@ -2708,6 +2705,18 @@ int printk_deferred(const char *fmt, ...)
 	return r;
 }
 
+int printk_deferred(const char *fmt, ...)
+{
+	va_list args;
+	int r;
+
+	va_start(args, fmt);
+	r = vprintk_deferred(fmt, args);
+	va_end(args);
+
+	return r;
+}
+
 /*
  * printk rate limiting, lifted from the networking subsystem.
  *
diff --git a/kernel/printk/printk_safe.c b/kernel/printk/printk_safe.c
index 033e50a7d706..c3d165bcde42 100644
--- a/kernel/printk/printk_safe.c
+++ b/kernel/printk/printk_safe.c
@@ -308,12 +308,23 @@ static int vprintk_nmi(const char *fmt, va_list args)
 
 void printk_nmi_enter(void)
 {
-	this_cpu_or(printk_context, PRINTK_NMI_CONTEXT_MASK);
+	/*
+	 * The size of the extra per-CPU buffer is limited. Use it
+	 * only when really needed.
+	 */
+	if (this_cpu_read(printk_context) & PRINTK_SAFE_CONTEXT_MASK ||
+	    raw_spin_is_locked(&logbuf_lock)) {
+		this_cpu_or(printk_context, PRINTK_NMI_CONTEXT_MASK);
+	} else {
+		this_cpu_or(printk_context, PRINTK_NMI_DEFERRED_CONTEXT_MASK);
+	}
 }
 
 void printk_nmi_exit(void)
 {
-	this_cpu_and(printk_context, ~PRINTK_NMI_CONTEXT_MASK);
+	this_cpu_and(printk_context,
+		     ~(PRINTK_NMI_CONTEXT_MASK ||
+		       PRINTK_NMI_DEFERRED_CONTEXT_MASK));
 }
 
 #else
@@ -351,12 +362,22 @@ void __printk_safe_exit(void)
 
 __printf(1, 0) int vprintk_func(const char *fmt, va_list args)
 {
+	/* Use extra buffer in NMI when logbuf_lock is taken or in safe mode. */
 	if (this_cpu_read(printk_context) & PRINTK_NMI_CONTEXT_MASK)
 		return vprintk_nmi(fmt, args);
 
+	/* Use extra buffer to prevent a recursion deadlock in safe mode. */
 	if (this_cpu_read(printk_context) & PRINTK_SAFE_CONTEXT_MASK)
 		return vprintk_safe(fmt, args);
 
+	/*
+	 * Use the main logbuf when logbuf_lock is available in NMI.
+	 * But avoid calling console drivers that might have their own locks.
+	 */
+	if (this_cpu_read(printk_context) & PRINTK_NMI_DEFERRED_CONTEXT_MASK)
+		return vprintk_deferred(fmt, args);
+
+	/* No obstacles. */
 	return vprintk_default(fmt, args);
 }
 
diff --git a/lib/nmi_backtrace.c b/lib/nmi_backtrace.c
index 4e8a30d1c22f..0bc0a3535a8a 100644
--- a/lib/nmi_backtrace.c
+++ b/lib/nmi_backtrace.c
@@ -86,9 +86,11 @@ void nmi_trigger_cpumask_backtrace(const cpumask_t *mask,
 
 bool nmi_cpu_backtrace(struct pt_regs *regs)
 {
+	static arch_spinlock_t lock = __ARCH_SPIN_LOCK_UNLOCKED;
 	int cpu = smp_processor_id();
 
 	if (cpumask_test_cpu(cpu, to_cpumask(backtrace_mask))) {
+		arch_spin_lock(&lock);
 		if (regs && cpu_in_idle(instruction_pointer(regs))) {
 			pr_warn("NMI backtrace for cpu %d skipped: idling at pc %#lx\n",
 				cpu, instruction_pointer(regs));
@@ -99,6 +101,7 @@ bool nmi_cpu_backtrace(struct pt_regs *regs)
 			else
 				dump_stack();
 		}
+		arch_spin_unlock(&lock);
 		cpumask_clear_cpu(cpu, to_cpumask(backtrace_mask));
 		return true;
 	}
-- 
1.8.5.6

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH v5 1/4] printk/nmi: generic solution for safe printk in NMI
@ 2017-04-20 13:11         ` Petr Mladek
  0 siblings, 0 replies; 97+ messages in thread
From: Petr Mladek @ 2017-04-20 13:11 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Steven Rostedt, Andrew Morton, Peter Zijlstra, Russell King,
	Daniel Thompson, Jiri Kosina, Ingo Molnar, Thomas Gleixner,
	Chris Metcalf, linux-kernel, x86, linux-arm-kernel,
	adi-buildroot-devel, linux-cris-kernel, linux-mips, linuxppc-dev,
	linux-s390, linux-sh, sparclinux, Jan Kara, Ralf Baechle,
	Benjamin Herrenschmidt, Martin Schwidefsky, David Miller

On Thu 2017-04-20 12:31:12, Sergey Senozhatsky wrote:
> Hello Steven,
> 
> On (04/19/17 13:13), Steven Rostedt wrote:
> > > printk() takes some locks and could not be used a safe way in NMI context.
> > 
> > I just found a problem with this solution. It kills ftrace dumps from
> > NMI context :-(
> > 
> > [ 1295.168495]    <...>-67423  10dNh1 382171111us : do_raw_spin_lock <-_raw_spin_lock
> > [ 1295.168495]    <...>-67423  10dNh1 382171111us : sched_stat_runtime: comm=cc1 pid=67423 runtime=96858 [ns] vruntime=11924198270 [ns]
> > [ 1295.168496]    <...>-67423  10dNh1 382171111us : lock_acquire: ffffffff81c5c940 read rcu_read_lock
> > [ 1295.168497]
> > [ 1295.168498] Lost 4890096 message(s)!
> > [ 1296.805063] ---[ end Kernel panic - not syncing: Hard LOCKUP
> > [ 1296.811553] unchecked MSR access error: WRMSR to 0x83f (tried to write 0x00000000000000f6) at rIP: 0xffffffff81046fc7 (native_apic_msr_write+0x27/0x40)
> > [ 1296.811553] Call Trace:
> > [ 1296.811553]  <NMI>
> > 
> > I was hoping to see a cause of a hard lockup by enabling
> > ftrace_dump_on_oops. But as NMIs now have a very small buffer that
> > gets flushed, we need to find a new way to print out the full ftrace
> > buffer over serial.
> > 
> > Thoughts?
> 
> hmmm... a really tough one.
> 
> well, someone has to say this:
>  the simplest thing is to have a bigger PRINTK_SAFE_LOG_BUF_SHIFT value :)
> 
> 
> just thinking (well, sort of) out loud. the problem is that we can't tell if
> we already hold any printk related locks ("printk related locks" is not even
> well defined term). so printk from NMI can deadlock or it can be OK, we
> never know. but looking and vprintk_emit() and console_unlock() it seems that
> we have some sort of a hint now, which is this_cpu_read(printk_context) - if
> we are not in printk_safe context then we can say that _probably_ (and that's
> a Russian roulette) doing "normal" printk() will work. that is a *very-very*
> risky (and admittedly dumb) thing to assume, so we will move in a slightly
> different direction. checking this_cpu_read(printk_context) only assures us
> that we don't hold `logbuf_lock' on this CPU. and that is sort of something,
> at least we can be sure that doing printk_deferred() from this CPU is safe.
> printk_deferred() means that your NMI messages will end up in the logbuf,
> which is a) bigger in size than per-CPU buffer and b) some other CPU can
> immediately print those messages (hopefully).
> 
> we also switch to printk_safe mode for call_console_drivers() in
> console_unlock(). but we can't make any solid assumptions there - serial
> console lock can already be acquired, we don't have any markers for that.
> it may be reasonable to assume that if we are not in printk_safe mode on
> this CPU then serial console is not locked from this CPU, but there is
> nothing that can assure us.

Good analyze. I would summarize it that we need to be careful of:

  + logbug_lock
  + PRINTK_SAFE_CONTEXT
  + locks used by console drivers

The first two things are easy to check. Except that a check for logbuf_lock
might produce false negatives. The last check is very hard.

> so at the moment what I can think of is something like
> 
>   -- check this_cpu_read(printk_context) in NMI prink
> 
> 	-- if we are NOT in printk_safe on this CPU, then do printk_deferred()
> 	   and bypass `nmi_print_seq' buffer

I would add also a check for logbuf_lock.

> 	-- if we are in printk_safe
> 	  -- well... bad luck... have a bigger buffer.

Yup, we do the best effort while still trying to stay on the safe
side.

I have cooked up a patch based on this. It uses printk_deferred()
in NMI when it is safe. Note that console_flush_on_panic() will
try to get them on the console when a kdump is not generated.
I believe that it will help Steven.

^ permalink raw reply	[flat|nested] 97+ messages in thread

* [PATCH v5 1/4] printk/nmi: generic solution for safe printk in NMI
@ 2017-04-20 13:11         ` Petr Mladek
  0 siblings, 0 replies; 97+ messages in thread
From: Petr Mladek @ 2017-04-20 13:11 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu 2017-04-20 12:31:12, Sergey Senozhatsky wrote:
> Hello Steven,
> 
> On (04/19/17 13:13), Steven Rostedt wrote:
> > > printk() takes some locks and could not be used a safe way in NMI context.
> > 
> > I just found a problem with this solution. It kills ftrace dumps from
> > NMI context :-(
> > 
> > [ 1295.168495]    <...>-67423  10dNh1 382171111us : do_raw_spin_lock <-_raw_spin_lock
> > [ 1295.168495]    <...>-67423  10dNh1 382171111us : sched_stat_runtime: comm=cc1 pid=67423 runtime=96858 [ns] vruntime=11924198270 [ns]
> > [ 1295.168496]    <...>-67423  10dNh1 382171111us : lock_acquire: ffffffff81c5c940 read rcu_read_lock
> > [ 1295.168497]
> > [ 1295.168498] Lost 4890096 message(s)!
> > [ 1296.805063] ---[ end Kernel panic - not syncing: Hard LOCKUP
> > [ 1296.811553] unchecked MSR access error: WRMSR to 0x83f (tried to write 0x00000000000000f6) at rIP: 0xffffffff81046fc7 (native_apic_msr_write+0x27/0x40)
> > [ 1296.811553] Call Trace:
> > [ 1296.811553]  <NMI>
> > 
> > I was hoping to see a cause of a hard lockup by enabling
> > ftrace_dump_on_oops. But as NMIs now have a very small buffer that
> > gets flushed, we need to find a new way to print out the full ftrace
> > buffer over serial.
> > 
> > Thoughts?
> 
> hmmm... a really tough one.
> 
> well, someone has to say this:
>  the simplest thing is to have a bigger PRINTK_SAFE_LOG_BUF_SHIFT value :)
> 
> 
> just thinking (well, sort of) out loud. the problem is that we can't tell if
> we already hold any printk related locks ("printk related locks" is not even
> well defined term). so printk from NMI can deadlock or it can be OK, we
> never know. but looking and vprintk_emit() and console_unlock() it seems that
> we have some sort of a hint now, which is this_cpu_read(printk_context) - if
> we are not in printk_safe context then we can say that _probably_ (and that's
> a Russian roulette) doing "normal" printk() will work. that is a *very-very*
> risky (and admittedly dumb) thing to assume, so we will move in a slightly
> different direction. checking this_cpu_read(printk_context) only assures us
> that we don't hold `logbuf_lock' on this CPU. and that is sort of something,
> at least we can be sure that doing printk_deferred() from this CPU is safe.
> printk_deferred() means that your NMI messages will end up in the logbuf,
> which is a) bigger in size than per-CPU buffer and b) some other CPU can
> immediately print those messages (hopefully).
> 
> we also switch to printk_safe mode for call_console_drivers() in
> console_unlock(). but we can't make any solid assumptions there - serial
> console lock can already be acquired, we don't have any markers for that.
> it may be reasonable to assume that if we are not in printk_safe mode on
> this CPU then serial console is not locked from this CPU, but there is
> nothing that can assure us.

Good analyze. I would summarize it that we need to be careful of:

  + logbug_lock
  + PRINTK_SAFE_CONTEXT
  + locks used by console drivers

The first two things are easy to check. Except that a check for logbuf_lock
might produce false negatives. The last check is very hard.

> so at the moment what I can think of is something like
> 
>   -- check this_cpu_read(printk_context) in NMI prink
> 
> 	-- if we are NOT in printk_safe on this CPU, then do printk_deferred()
> 	   and bypass `nmi_print_seq' buffer

I would add also a check for logbuf_lock.

> 	-- if we are in printk_safe
> 	  -- well... bad luck... have a bigger buffer.

Yup, we do the best effort while still trying to stay on the safe
side.

I have cooked up a patch based on this. It uses printk_deferred()
in NMI when it is safe. Note that console_flush_on_panic() will
try to get them on the console when a kdump is not generated.
I believe that it will help Steven.


>From c530d9dee91c74db5e6a198479e2e63b24cb84a2 Mon Sep 17 00:00:00 2001
From: Petr Mladek <pmladek@suse.com>
Date: Thu, 20 Apr 2017 10:52:31 +0200
Subject: [PATCH] printk: Use the main logbuf in NMI when logbuf_lock is
 available

The commit 42a0bb3f71383b457a7d ("printk/nmi: generic solution for safe
printk in NMI") caused that printk stores messages into a temporary
buffer in NMI context.

The buffer is per-CPU and therefore the size is rather limited.
It works quite well for NMI backtraces. But there are longer logs
that might get printed in NMI context, for example, lockdep
warnings, ftrace_dump_on_oops.

The temporary buffer is used to avoid deadlocks caused by
logbuf_lock. Also it is needed to avoid races with the other
temporary buffer that is used when PRINTK_SAFE_CONTEXT is entered.
But the main buffer can be used in NMI if the lock is available
and we did not interrupt PRINTK_SAFE_CONTEXT.

The lock is checked using raw_spin_is_locked(). It might cause
false negatives when the lock is taken on another CPU outside NMI.
For this reason, we do the check in printk_nmi_enter(). It makes
the handling consistent for the entire NMI handler and avoids
reshuffling of the messages.

The patch also defines special printk context that allows
to use printk_deferred() in NMI. Note that we could not flush
the messages to the consoles because console drivers might use
many other internal locks.

The newly created vprintk_deferred() disables the preemption
only around the irq work handling. It is needed there to keep
the consistency between the two per-CPU variables. But there
is no reason to disable preemption around vprintk_emit().

Finally, the patch patch puts back explicit serialization
of the NMI backtraces from different CPUs. It was removed
by the commit a9edc88093287183ac934b ("x86/nmi: Perform
a safe NMI stack trace on all CPUs"). It was not needed
because the flushing of the temporary per-CPU buffers
was serialized.

Suggested-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
---
 kernel/printk/internal.h    |  6 ++++--
 kernel/printk/printk.c      | 19 ++++++++++++++-----
 kernel/printk/printk_safe.c | 25 +++++++++++++++++++++++--
 lib/nmi_backtrace.c         |  3 +++
 4 files changed, 44 insertions(+), 9 deletions(-)

diff --git a/kernel/printk/internal.h b/kernel/printk/internal.h
index 1db044f808b7..2a7d04049af4 100644
--- a/kernel/printk/internal.h
+++ b/kernel/printk/internal.h
@@ -18,12 +18,14 @@
 
 #ifdef CONFIG_PRINTK
 
-#define PRINTK_SAFE_CONTEXT_MASK	0x7fffffff
-#define PRINTK_NMI_CONTEXT_MASK	0x80000000
+#define PRINTK_SAFE_CONTEXT_MASK	 0x3fffffff
+#define PRINTK_NMI_DEFERRED_CONTEXT_MASK 0x40000000
+#define PRINTK_NMI_CONTEXT_MASK		 0x80000000
 
 extern raw_spinlock_t logbuf_lock;
 
 __printf(1, 0) int vprintk_default(const char *fmt, va_list args);
+__printf(1, 0) int vprintk_deferred(const char *fmt, va_list args);
 __printf(1, 0) int vprintk_func(const char *fmt, va_list args);
 void __printk_safe_enter(void);
 void __printk_safe_exit(void);
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 2984fb0f0257..16b519927d35 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -2691,16 +2691,13 @@ void wake_up_klogd(void)
 	preempt_enable();
 }
 
-int printk_deferred(const char *fmt, ...)
+int vprintk_deferred(const char *fmt, va_list args)
 {
-	va_list args;
 	int r;
 
-	preempt_disable();
-	va_start(args, fmt);
 	r = vprintk_emit(0, LOGLEVEL_SCHED, NULL, 0, fmt, args);
-	va_end(args);
 
+	preempt_disable();
 	__this_cpu_or(printk_pending, PRINTK_PENDING_OUTPUT);
 	irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
 	preempt_enable();
@@ -2708,6 +2705,18 @@ int printk_deferred(const char *fmt, ...)
 	return r;
 }
 
+int printk_deferred(const char *fmt, ...)
+{
+	va_list args;
+	int r;
+
+	va_start(args, fmt);
+	r = vprintk_deferred(fmt, args);
+	va_end(args);
+
+	return r;
+}
+
 /*
  * printk rate limiting, lifted from the networking subsystem.
  *
diff --git a/kernel/printk/printk_safe.c b/kernel/printk/printk_safe.c
index 033e50a7d706..c3d165bcde42 100644
--- a/kernel/printk/printk_safe.c
+++ b/kernel/printk/printk_safe.c
@@ -308,12 +308,23 @@ static int vprintk_nmi(const char *fmt, va_list args)
 
 void printk_nmi_enter(void)
 {
-	this_cpu_or(printk_context, PRINTK_NMI_CONTEXT_MASK);
+	/*
+	 * The size of the extra per-CPU buffer is limited. Use it
+	 * only when really needed.
+	 */
+	if (this_cpu_read(printk_context) & PRINTK_SAFE_CONTEXT_MASK ||
+	    raw_spin_is_locked(&logbuf_lock)) {
+		this_cpu_or(printk_context, PRINTK_NMI_CONTEXT_MASK);
+	} else {
+		this_cpu_or(printk_context, PRINTK_NMI_DEFERRED_CONTEXT_MASK);
+	}
 }
 
 void printk_nmi_exit(void)
 {
-	this_cpu_and(printk_context, ~PRINTK_NMI_CONTEXT_MASK);
+	this_cpu_and(printk_context,
+		     ~(PRINTK_NMI_CONTEXT_MASK ||
+		       PRINTK_NMI_DEFERRED_CONTEXT_MASK));
 }
 
 #else
@@ -351,12 +362,22 @@ void __printk_safe_exit(void)
 
 __printf(1, 0) int vprintk_func(const char *fmt, va_list args)
 {
+	/* Use extra buffer in NMI when logbuf_lock is taken or in safe mode. */
 	if (this_cpu_read(printk_context) & PRINTK_NMI_CONTEXT_MASK)
 		return vprintk_nmi(fmt, args);
 
+	/* Use extra buffer to prevent a recursion deadlock in safe mode. */
 	if (this_cpu_read(printk_context) & PRINTK_SAFE_CONTEXT_MASK)
 		return vprintk_safe(fmt, args);
 
+	/*
+	 * Use the main logbuf when logbuf_lock is available in NMI.
+	 * But avoid calling console drivers that might have their own locks.
+	 */
+	if (this_cpu_read(printk_context) & PRINTK_NMI_DEFERRED_CONTEXT_MASK)
+		return vprintk_deferred(fmt, args);
+
+	/* No obstacles. */
 	return vprintk_default(fmt, args);
 }
 
diff --git a/lib/nmi_backtrace.c b/lib/nmi_backtrace.c
index 4e8a30d1c22f..0bc0a3535a8a 100644
--- a/lib/nmi_backtrace.c
+++ b/lib/nmi_backtrace.c
@@ -86,9 +86,11 @@ void nmi_trigger_cpumask_backtrace(const cpumask_t *mask,
 
 bool nmi_cpu_backtrace(struct pt_regs *regs)
 {
+	static arch_spinlock_t lock = __ARCH_SPIN_LOCK_UNLOCKED;
 	int cpu = smp_processor_id();
 
 	if (cpumask_test_cpu(cpu, to_cpumask(backtrace_mask))) {
+		arch_spin_lock(&lock);
 		if (regs && cpu_in_idle(instruction_pointer(regs))) {
 			pr_warn("NMI backtrace for cpu %d skipped: idling at pc %#lx\n",
 				cpu, instruction_pointer(regs));
@@ -99,6 +101,7 @@ bool nmi_cpu_backtrace(struct pt_regs *regs)
 			else
 				dump_stack();
 		}
+		arch_spin_unlock(&lock);
 		cpumask_clear_cpu(cpu, to_cpumask(backtrace_mask));
 		return true;
 	}
-- 
1.8.5.6

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH v5 1/4] printk/nmi: generic solution for safe printk in NMI
  2017-04-20 13:11         ` Petr Mladek
  (?)
@ 2017-04-21  1:57           ` Sergey Senozhatsky
  -1 siblings, 0 replies; 97+ messages in thread
From: Sergey Senozhatsky @ 2017-04-21  1:57 UTC (permalink / raw)
  To: linux-arm-kernel

Hello,

On (04/20/17 15:11), Petr Mladek wrote:
[..]
> Good analyze. I would summarize it that we need to be careful of:
> 
>   + logbug_lock
>   + PRINTK_SAFE_CONTEXT
>   + locks used by console drivers
> 
> The first two things are easy to check. Except that a check for logbuf_lock
> might produce false negatives. The last check is very hard.
> 
> > so at the moment what I can think of is something like
> > 
> >   -- check this_cpu_read(printk_context) in NMI prink
> > 
> > 	-- if we are NOT in printk_safe on this CPU, then do printk_deferred()
> > 	   and bypass `nmi_print_seq' buffer
> 
> I would add also a check for logbuf_lock.
> > 	-- if we are in printk_safe
> > 	  -- well... bad luck... have a bigger buffer.
> 
> Yup, we do the best effort while still trying to stay on the safe
> side.
> 
> I have cooked up a patch based on this. It uses printk_deferred()
> in NMI when it is safe. Note that console_flush_on_panic() will
> try to get them on the console when a kdump is not generated.
> I believe that it will help Steven.


OK. I need to look more at the patch. It does more than I'd expected/imagined.


[..]
>  void printk_nmi_enter(void)
>  {
> -	this_cpu_or(printk_context, PRINTK_NMI_CONTEXT_MASK);
> +	/*
> +	 * The size of the extra per-CPU buffer is limited. Use it
> +	 * only when really needed.
> +	 */
> +	if (this_cpu_read(printk_context) & PRINTK_SAFE_CONTEXT_MASK ||
> +	    raw_spin_is_locked(&logbuf_lock)) {
> +		this_cpu_or(printk_context, PRINTK_NMI_CONTEXT_MASK);
> +	} else {
> +		this_cpu_or(printk_context, PRINTK_NMI_DEFERRED_CONTEXT_MASK);
> +	}
>  }

well... the logbuf_lock can temporarily be locked from another CPU. I'd say
that spin_is_locked() has better chances for false positive than
this_cpu_read(printk_context). because this_cpu_read(printk_context) depends
only on this CPU state, while spin_is_locked() depends on all CPUs. and the
idea with this_cpu_read(printk_context) was that we check if the logbuf_lock
was locked from this particular CPU.

I agree that this_cpu_read(printk_context) covers slightly more than
logbuf_lock scope, so we may get positive this_cpu_read(printk_context)
with unlocked logbuf_lock, but I don't tend to think that it's a big
problem.


wouldn't something as simple as below do the trick?
// absolutely and completely untested //


diff --git a/kernel/printk/printk_safe.c b/kernel/printk/printk_safe.c
index 033e50a7d706..c7477654c5b1 100644
--- a/kernel/printk/printk_safe.c
+++ b/kernel/printk/printk_safe.c
@@ -303,7 +303,10 @@ static int vprintk_nmi(const char *fmt, va_list args)
 {
        struct printk_safe_seq_buf *s = this_cpu_ptr(&nmi_print_seq);
 
-       return printk_safe_log_store(s, fmt, args);
+       if (this_cpu_read(printk_context) & PRINTK_SAFE_CONTEXT_MASK)
+               return printk_safe_log_store(s, fmt, args);
+
+       return vprintk_emit(0, LOGLEVEL_SCHED, NULL, 0, fmt, args);
 }

	-ss

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH v5 1/4] printk/nmi: generic solution for safe printk in NMI
@ 2017-04-21  1:57           ` Sergey Senozhatsky
  0 siblings, 0 replies; 97+ messages in thread
From: Sergey Senozhatsky @ 2017-04-21  1:57 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Andrew Morton,
	Peter Zijlstra, Russell King, Daniel Thompson, Jiri Kosina,
	Ingo Molnar, Thomas Gleixner, Chris Metcalf, linux-kernel, x86,
	linux-arm-kernel, adi-buildroot-devel, linux-cris-kernel,
	linux-mips, linuxppc-dev, linux-s390, linux-sh, sparclinux,
	Jan Kara, Ralf Baechle, Benjamin Herrenschmidt,
	Martin Schwidefsky, David Miller

Hello,

On (04/20/17 15:11), Petr Mladek wrote:
[..]
> Good analyze. I would summarize it that we need to be careful of:
> 
>   + logbug_lock
>   + PRINTK_SAFE_CONTEXT
>   + locks used by console drivers
> 
> The first two things are easy to check. Except that a check for logbuf_lock
> might produce false negatives. The last check is very hard.
> 
> > so at the moment what I can think of is something like
> > 
> >   -- check this_cpu_read(printk_context) in NMI prink
> > 
> > 	-- if we are NOT in printk_safe on this CPU, then do printk_deferred()
> > 	   and bypass `nmi_print_seq' buffer
> 
> I would add also a check for logbuf_lock.
> > 	-- if we are in printk_safe
> > 	  -- well... bad luck... have a bigger buffer.
> 
> Yup, we do the best effort while still trying to stay on the safe
> side.
> 
> I have cooked up a patch based on this. It uses printk_deferred()
> in NMI when it is safe. Note that console_flush_on_panic() will
> try to get them on the console when a kdump is not generated.
> I believe that it will help Steven.


OK. I need to look more at the patch. It does more than I'd expected/imagined.


[..]
>  void printk_nmi_enter(void)
>  {
> -	this_cpu_or(printk_context, PRINTK_NMI_CONTEXT_MASK);
> +	/*
> +	 * The size of the extra per-CPU buffer is limited. Use it
> +	 * only when really needed.
> +	 */
> +	if (this_cpu_read(printk_context) & PRINTK_SAFE_CONTEXT_MASK ||
> +	    raw_spin_is_locked(&logbuf_lock)) {
> +		this_cpu_or(printk_context, PRINTK_NMI_CONTEXT_MASK);
> +	} else {
> +		this_cpu_or(printk_context, PRINTK_NMI_DEFERRED_CONTEXT_MASK);
> +	}
>  }

well... the logbuf_lock can temporarily be locked from another CPU. I'd say
that spin_is_locked() has better chances for false positive than
this_cpu_read(printk_context). because this_cpu_read(printk_context) depends
only on this CPU state, while spin_is_locked() depends on all CPUs. and the
idea with this_cpu_read(printk_context) was that we check if the logbuf_lock
was locked from this particular CPU.

I agree that this_cpu_read(printk_context) covers slightly more than
logbuf_lock scope, so we may get positive this_cpu_read(printk_context)
with unlocked logbuf_lock, but I don't tend to think that it's a big
problem.


wouldn't something as simple as below do the trick?
// absolutely and completely untested //


diff --git a/kernel/printk/printk_safe.c b/kernel/printk/printk_safe.c
index 033e50a7d706..c7477654c5b1 100644
--- a/kernel/printk/printk_safe.c
+++ b/kernel/printk/printk_safe.c
@@ -303,7 +303,10 @@ static int vprintk_nmi(const char *fmt, va_list args)
 {
        struct printk_safe_seq_buf *s = this_cpu_ptr(&nmi_print_seq);
 
-       return printk_safe_log_store(s, fmt, args);
+       if (this_cpu_read(printk_context) & PRINTK_SAFE_CONTEXT_MASK)
+               return printk_safe_log_store(s, fmt, args);
+
+       return vprintk_emit(0, LOGLEVEL_SCHED, NULL, 0, fmt, args);
 }

	-ss

^ permalink raw reply	[flat|nested] 97+ messages in thread

* [PATCH v5 1/4] printk/nmi: generic solution for safe printk in NMI
@ 2017-04-21  1:57           ` Sergey Senozhatsky
  0 siblings, 0 replies; 97+ messages in thread
From: Sergey Senozhatsky @ 2017-04-21  1:57 UTC (permalink / raw)
  To: linux-arm-kernel

Hello,

On (04/20/17 15:11), Petr Mladek wrote:
[..]
> Good analyze. I would summarize it that we need to be careful of:
> 
>   + logbug_lock
>   + PRINTK_SAFE_CONTEXT
>   + locks used by console drivers
> 
> The first two things are easy to check. Except that a check for logbuf_lock
> might produce false negatives. The last check is very hard.
> 
> > so at the moment what I can think of is something like
> > 
> >   -- check this_cpu_read(printk_context) in NMI prink
> > 
> > 	-- if we are NOT in printk_safe on this CPU, then do printk_deferred()
> > 	   and bypass `nmi_print_seq' buffer
> 
> I would add also a check for logbuf_lock.
> > 	-- if we are in printk_safe
> > 	  -- well... bad luck... have a bigger buffer.
> 
> Yup, we do the best effort while still trying to stay on the safe
> side.
> 
> I have cooked up a patch based on this. It uses printk_deferred()
> in NMI when it is safe. Note that console_flush_on_panic() will
> try to get them on the console when a kdump is not generated.
> I believe that it will help Steven.


OK. I need to look more at the patch. It does more than I'd expected/imagined.


[..]
>  void printk_nmi_enter(void)
>  {
> -	this_cpu_or(printk_context, PRINTK_NMI_CONTEXT_MASK);
> +	/*
> +	 * The size of the extra per-CPU buffer is limited. Use it
> +	 * only when really needed.
> +	 */
> +	if (this_cpu_read(printk_context) & PRINTK_SAFE_CONTEXT_MASK ||
> +	    raw_spin_is_locked(&logbuf_lock)) {
> +		this_cpu_or(printk_context, PRINTK_NMI_CONTEXT_MASK);
> +	} else {
> +		this_cpu_or(printk_context, PRINTK_NMI_DEFERRED_CONTEXT_MASK);
> +	}
>  }

well... the logbuf_lock can temporarily be locked from another CPU. I'd say
that spin_is_locked() has better chances for false positive than
this_cpu_read(printk_context). because this_cpu_read(printk_context) depends
only on this CPU state, while spin_is_locked() depends on all CPUs. and the
idea with this_cpu_read(printk_context) was that we check if the logbuf_lock
was locked from this particular CPU.

I agree that this_cpu_read(printk_context) covers slightly more than
logbuf_lock scope, so we may get positive this_cpu_read(printk_context)
with unlocked logbuf_lock, but I don't tend to think that it's a big
problem.


wouldn't something as simple as below do the trick?
// absolutely and completely untested //


diff --git a/kernel/printk/printk_safe.c b/kernel/printk/printk_safe.c
index 033e50a7d706..c7477654c5b1 100644
--- a/kernel/printk/printk_safe.c
+++ b/kernel/printk/printk_safe.c
@@ -303,7 +303,10 @@ static int vprintk_nmi(const char *fmt, va_list args)
 {
        struct printk_safe_seq_buf *s = this_cpu_ptr(&nmi_print_seq);
 
-       return printk_safe_log_store(s, fmt, args);
+       if (this_cpu_read(printk_context) & PRINTK_SAFE_CONTEXT_MASK)
+               return printk_safe_log_store(s, fmt, args);
+
+       return vprintk_emit(0, LOGLEVEL_SCHED, NULL, 0, fmt, args);
 }

	-ss

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH v5 1/4] printk/nmi: generic solution for safe printk in NMI
  2017-04-21  1:57           ` Sergey Senozhatsky
  (?)
@ 2017-04-21 12:06             ` Petr Mladek
  -1 siblings, 0 replies; 97+ messages in thread
From: Petr Mladek @ 2017-04-21 12:06 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri 2017-04-21 10:57:25, Sergey Senozhatsky wrote:
> On (04/20/17 15:11), Petr Mladek wrote:
> [..]
> >  void printk_nmi_enter(void)
> >  {
> > -	this_cpu_or(printk_context, PRINTK_NMI_CONTEXT_MASK);
> > +	/*
> > +	 * The size of the extra per-CPU buffer is limited. Use it
> > +	 * only when really needed.
> > +	 */
> > +	if (this_cpu_read(printk_context) & PRINTK_SAFE_CONTEXT_MASK ||
> > +	    raw_spin_is_locked(&logbuf_lock)) {
> > +		this_cpu_or(printk_context, PRINTK_NMI_CONTEXT_MASK);
> > +	} else {
> > +		this_cpu_or(printk_context, PRINTK_NMI_DEFERRED_CONTEXT_MASK);
> > +	}
> >  }
> 
> well... the logbuf_lock can temporarily be locked from another CPU. I'd say
> that spin_is_locked() has better chances for false positive than
> this_cpu_read(printk_context). because this_cpu_read(printk_context) depends
> only on this CPU state, while spin_is_locked() depends on all CPUs. and the
> idea with this_cpu_read(printk_context) was that we check if the logbuf_lock
> was locked from this particular CPU.

I finally see the point. I confess that I did not think much about
this way yesterday because it looked too tricky (indirect) and possibly
error prone.

> I agree that this_cpu_read(printk_context) covers slightly more than
> logbuf_lock scope, so we may get positive this_cpu_read(printk_context)
> with unlocked logbuf_lock, but I don't tend to think that it's a big
> problem.

PRINTK_SAFE_CONTEXT is set also in call_console_drivers().
It might take rather long and logbuf_lock is availe. So, it is
noticeable source of false positives.

Hmm, my version actually checks both the lock and the context.
It is very deffensive to be always on the safe side.

We could get the best restults with both checks and by using "&&":

void printk_nmi_enter(void)
{
	/*
	 * The size of the extra per-CPU buffer is limited. Use it
	 * only when the lock for the main log buffer is not
	 * available.
	 *
	 * logbuf_lock might be taken on another CPU. But it must be
	 * in PRINTK_SAFE context. Reduce false positives by a check
	 * of the context.
	 */
	if (this_cpu_read(printk_context) & PRINTK_SAFE_CONTEXT_MASK ||
	    raw_spin_is_locked(&logbuf_lock)) {
		this_cpu_or(printk_context, PRINTK_NMI_CONTEXT_MASK);
	} else {
		this_cpu_or(printk_context, PRINTK_NMI_DEFERRED_CONTEXT_MASK);
	}
}


But after all, I would prefer to keep it simple, strightforward,
and check only the logbuf_lock:

	if (raw_spin_is_locked(&logbuf_lock))
		this_cpu_or(printk_context, PRINTK_NMI_CONTEXT_MASK);
	else
		this_cpu_or(printk_context, PRINTK_NMI_DEFERRED_CONTEXT_MASK);


> wouldn't something as simple as below do the trick?
> // absolutely and completely untested //
> 
> 
> diff --git a/kernel/printk/printk_safe.c b/kernel/printk/printk_safe.c
> index 033e50a7d706..c7477654c5b1 100644
> --- a/kernel/printk/printk_safe.c
> +++ b/kernel/printk/printk_safe.c
> @@ -303,7 +303,10 @@ static int vprintk_nmi(const char *fmt, va_list args)
>  {
>         struct printk_safe_seq_buf *s = this_cpu_ptr(&nmi_print_seq);
>  
> -       return printk_safe_log_store(s, fmt, args);
> +       if (this_cpu_read(printk_context) & PRINTK_SAFE_CONTEXT_MASK)
> +               return printk_safe_log_store(s, fmt, args);
> +
> +       return vprintk_emit(0, LOGLEVEL_SCHED, NULL, 0, fmt, args);
>  }

It looks simple but some things are missing. It will be used also
outside panic/oops, so it should queue the irq_work to flush the console.
Also the serialization of nmi_cpu_backtrace() backtrace calls is
needed.

All in all, we could get rid only of the extra
PRINTK_NMI_DEFERRED_CONTEXT_MASK with this approach. Other than
that it looks more tricky to me.

Sigh, I hate problems without the single best solution.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH v5 1/4] printk/nmi: generic solution for safe printk in NMI
@ 2017-04-21 12:06             ` Petr Mladek
  0 siblings, 0 replies; 97+ messages in thread
From: Petr Mladek @ 2017-04-21 12:06 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Steven Rostedt, Andrew Morton, Peter Zijlstra, Russell King,
	Daniel Thompson, Jiri Kosina, Ingo Molnar, Thomas Gleixner,
	Chris Metcalf, linux-kernel, x86, linux-arm-kernel,
	adi-buildroot-devel, linux-cris-kernel, linux-mips, linuxppc-dev,
	linux-s390, linux-sh, sparclinux, Jan Kara, Ralf Baechle,
	Benjamin Herrenschmidt, Martin Schwidefsky, David Miller

On Fri 2017-04-21 10:57:25, Sergey Senozhatsky wrote:
> On (04/20/17 15:11), Petr Mladek wrote:
> [..]
> >  void printk_nmi_enter(void)
> >  {
> > -	this_cpu_or(printk_context, PRINTK_NMI_CONTEXT_MASK);
> > +	/*
> > +	 * The size of the extra per-CPU buffer is limited. Use it
> > +	 * only when really needed.
> > +	 */
> > +	if (this_cpu_read(printk_context) & PRINTK_SAFE_CONTEXT_MASK ||
> > +	    raw_spin_is_locked(&logbuf_lock)) {
> > +		this_cpu_or(printk_context, PRINTK_NMI_CONTEXT_MASK);
> > +	} else {
> > +		this_cpu_or(printk_context, PRINTK_NMI_DEFERRED_CONTEXT_MASK);
> > +	}
> >  }
> 
> well... the logbuf_lock can temporarily be locked from another CPU. I'd say
> that spin_is_locked() has better chances for false positive than
> this_cpu_read(printk_context). because this_cpu_read(printk_context) depends
> only on this CPU state, while spin_is_locked() depends on all CPUs. and the
> idea with this_cpu_read(printk_context) was that we check if the logbuf_lock
> was locked from this particular CPU.

I finally see the point. I confess that I did not think much about
this way yesterday because it looked too tricky (indirect) and possibly
error prone.

> I agree that this_cpu_read(printk_context) covers slightly more than
> logbuf_lock scope, so we may get positive this_cpu_read(printk_context)
> with unlocked logbuf_lock, but I don't tend to think that it's a big
> problem.

PRINTK_SAFE_CONTEXT is set also in call_console_drivers().
It might take rather long and logbuf_lock is availe. So, it is
noticeable source of false positives.

Hmm, my version actually checks both the lock and the context.
It is very deffensive to be always on the safe side.

We could get the best restults with both checks and by using "&&":

void printk_nmi_enter(void)
{
	/*
	 * The size of the extra per-CPU buffer is limited. Use it
	 * only when the lock for the main log buffer is not
	 * available.
	 *
	 * logbuf_lock might be taken on another CPU. But it must be
	 * in PRINTK_SAFE context. Reduce false positives by a check
	 * of the context.
	 */
	if (this_cpu_read(printk_context) & PRINTK_SAFE_CONTEXT_MASK ||
	    raw_spin_is_locked(&logbuf_lock)) {
		this_cpu_or(printk_context, PRINTK_NMI_CONTEXT_MASK);
	} else {
		this_cpu_or(printk_context, PRINTK_NMI_DEFERRED_CONTEXT_MASK);
	}
}


But after all, I would prefer to keep it simple, strightforward,
and check only the logbuf_lock:

	if (raw_spin_is_locked(&logbuf_lock))
		this_cpu_or(printk_context, PRINTK_NMI_CONTEXT_MASK);
	else
		this_cpu_or(printk_context, PRINTK_NMI_DEFERRED_CONTEXT_MASK);


> wouldn't something as simple as below do the trick?
> // absolutely and completely untested //
> 
> 
> diff --git a/kernel/printk/printk_safe.c b/kernel/printk/printk_safe.c
> index 033e50a7d706..c7477654c5b1 100644
> --- a/kernel/printk/printk_safe.c
> +++ b/kernel/printk/printk_safe.c
> @@ -303,7 +303,10 @@ static int vprintk_nmi(const char *fmt, va_list args)
>  {
>         struct printk_safe_seq_buf *s = this_cpu_ptr(&nmi_print_seq);
>  
> -       return printk_safe_log_store(s, fmt, args);
> +       if (this_cpu_read(printk_context) & PRINTK_SAFE_CONTEXT_MASK)
> +               return printk_safe_log_store(s, fmt, args);
> +
> +       return vprintk_emit(0, LOGLEVEL_SCHED, NULL, 0, fmt, args);
>  }

It looks simple but some things are missing. It will be used also
outside panic/oops, so it should queue the irq_work to flush the console.
Also the serialization of nmi_cpu_backtrace() backtrace calls is
needed.

All in all, we could get rid only of the extra
PRINTK_NMI_DEFERRED_CONTEXT_MASK with this approach. Other than
that it looks more tricky to me.

Sigh, I hate problems without the single best solution.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 97+ messages in thread

* [PATCH v5 1/4] printk/nmi: generic solution for safe printk in NMI
@ 2017-04-21 12:06             ` Petr Mladek
  0 siblings, 0 replies; 97+ messages in thread
From: Petr Mladek @ 2017-04-21 12:06 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri 2017-04-21 10:57:25, Sergey Senozhatsky wrote:
> On (04/20/17 15:11), Petr Mladek wrote:
> [..]
> >  void printk_nmi_enter(void)
> >  {
> > -	this_cpu_or(printk_context, PRINTK_NMI_CONTEXT_MASK);
> > +	/*
> > +	 * The size of the extra per-CPU buffer is limited. Use it
> > +	 * only when really needed.
> > +	 */
> > +	if (this_cpu_read(printk_context) & PRINTK_SAFE_CONTEXT_MASK ||
> > +	    raw_spin_is_locked(&logbuf_lock)) {
> > +		this_cpu_or(printk_context, PRINTK_NMI_CONTEXT_MASK);
> > +	} else {
> > +		this_cpu_or(printk_context, PRINTK_NMI_DEFERRED_CONTEXT_MASK);
> > +	}
> >  }
> 
> well... the logbuf_lock can temporarily be locked from another CPU. I'd say
> that spin_is_locked() has better chances for false positive than
> this_cpu_read(printk_context). because this_cpu_read(printk_context) depends
> only on this CPU state, while spin_is_locked() depends on all CPUs. and the
> idea with this_cpu_read(printk_context) was that we check if the logbuf_lock
> was locked from this particular CPU.

I finally see the point. I confess that I did not think much about
this way yesterday because it looked too tricky (indirect) and possibly
error prone.

> I agree that this_cpu_read(printk_context) covers slightly more than
> logbuf_lock scope, so we may get positive this_cpu_read(printk_context)
> with unlocked logbuf_lock, but I don't tend to think that it's a big
> problem.

PRINTK_SAFE_CONTEXT is set also in call_console_drivers().
It might take rather long and logbuf_lock is availe. So, it is
noticeable source of false positives.

Hmm, my version actually checks both the lock and the context.
It is very deffensive to be always on the safe side.

We could get the best restults with both checks and by using "&&":

void printk_nmi_enter(void)
{
	/*
	 * The size of the extra per-CPU buffer is limited. Use it
	 * only when the lock for the main log buffer is not
	 * available.
	 *
	 * logbuf_lock might be taken on another CPU. But it must be
	 * in PRINTK_SAFE context. Reduce false positives by a check
	 * of the context.
	 */
	if (this_cpu_read(printk_context) & PRINTK_SAFE_CONTEXT_MASK ||
	    raw_spin_is_locked(&logbuf_lock)) {
		this_cpu_or(printk_context, PRINTK_NMI_CONTEXT_MASK);
	} else {
		this_cpu_or(printk_context, PRINTK_NMI_DEFERRED_CONTEXT_MASK);
	}
}


But after all, I would prefer to keep it simple, strightforward,
and check only the logbuf_lock:

	if (raw_spin_is_locked(&logbuf_lock))
		this_cpu_or(printk_context, PRINTK_NMI_CONTEXT_MASK);
	else
		this_cpu_or(printk_context, PRINTK_NMI_DEFERRED_CONTEXT_MASK);


> wouldn't something as simple as below do the trick?
> // absolutely and completely untested //
> 
> 
> diff --git a/kernel/printk/printk_safe.c b/kernel/printk/printk_safe.c
> index 033e50a7d706..c7477654c5b1 100644
> --- a/kernel/printk/printk_safe.c
> +++ b/kernel/printk/printk_safe.c
> @@ -303,7 +303,10 @@ static int vprintk_nmi(const char *fmt, va_list args)
>  {
>         struct printk_safe_seq_buf *s = this_cpu_ptr(&nmi_print_seq);
>  
> -       return printk_safe_log_store(s, fmt, args);
> +       if (this_cpu_read(printk_context) & PRINTK_SAFE_CONTEXT_MASK)
> +               return printk_safe_log_store(s, fmt, args);
> +
> +       return vprintk_emit(0, LOGLEVEL_SCHED, NULL, 0, fmt, args);
>  }

It looks simple but some things are missing. It will be used also
outside panic/oops, so it should queue the irq_work to flush the console.
Also the serialization of nmi_cpu_backtrace() backtrace calls is
needed.

All in all, we could get rid only of the extra
PRINTK_NMI_DEFERRED_CONTEXT_MASK with this approach. Other than
that it looks more tricky to me.

Sigh, I hate problems without the single best solution.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH v5 1/4] printk/nmi: generic solution for safe printk in NMI
  2017-04-21 12:06             ` Petr Mladek
  (?)
@ 2017-04-24  2:17               ` Sergey Senozhatsky
  -1 siblings, 0 replies; 97+ messages in thread
From: Sergey Senozhatsky @ 2017-04-24  2:17 UTC (permalink / raw)
  To: linux-arm-kernel

On (04/21/17 14:06), Petr Mladek wrote:
[..]
> > I agree that this_cpu_read(printk_context) covers slightly more than
> > logbuf_lock scope, so we may get positive this_cpu_read(printk_context)
> > with unlocked logbuf_lock, but I don't tend to think that it's a big
> > problem.
> 
> PRINTK_SAFE_CONTEXT is set also in call_console_drivers().
> It might take rather long and logbuf_lock is availe. So, it is
> noticeable source of false positives.

yes, agree.

probably we need additional printk_safe annotations for
		"logbuf_lock is locked from _this_ CPU"

false positives there can be very painful.

[..]
> 	if (raw_spin_is_locked(&logbuf_lock))
> 		this_cpu_or(printk_context, PRINTK_NMI_CONTEXT_MASK);
> 	else
> 		this_cpu_or(printk_context, PRINTK_NMI_DEFERRED_CONTEXT_MASK);

well, if everyone is fine with logbuf_lock access from every CPU from every
NMI then I won't object either. but may be it makes sense to reduce the
possibility of false positives. Steven is loosing critically important logs,
after all.


by the way,
does this `nmi_print_seq' bypass even fix anything for Steven? it sort of
can, in theory, but just in theory. so may be we need direct message flush
from NMI handler (printk->console_unlock), which will be a really big problem.

logbuf might not be big enough for 4890096 messages (Steven's report
mentions "Lost 4890096 message(s)!"). we are counting on the fact that
in case of `nmi_print_seq' bypass some other CPU will call console_unlock()
and print pending logbuf messages, but this is not guaranteed and the
messages can be dropped even from logbuf.

I don't know,
should we try to queue printk_deferred irq_work for all online CPUs from
vprintk_nmi() when it bypasses printk_safe_log_store()? in order to minimize
possibilities of logbuf overflow. printk_deferred() will queue work on
vprintk_nmi() CPU, sure, but we don't know how many messages we are going
to add to logbuf from NMI.


> > @@ -303,7 +303,10 @@ static int vprintk_nmi(const char *fmt, va_list args)
> >  {
> >         struct printk_safe_seq_buf *s = this_cpu_ptr(&nmi_print_seq);
> >  
> > -       return printk_safe_log_store(s, fmt, args);
> > +       if (this_cpu_read(printk_context) & PRINTK_SAFE_CONTEXT_MASK)
> > +               return printk_safe_log_store(s, fmt, args);
> > +
> > +       return vprintk_emit(0, LOGLEVEL_SCHED, NULL, 0, fmt, args);
> >  }
> 
> It looks simple but some things are missing. It will be used also
> outside panic/oops, so it should queue the irq_work to flush the console.

you are right. I thought about moving irq_work to vprintk_emit(), but
completely forgot about it. without that missing bit the proposed two-liner
is not complete.

	-ss

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH v5 1/4] printk/nmi: generic solution for safe printk in NMI
@ 2017-04-24  2:17               ` Sergey Senozhatsky
  0 siblings, 0 replies; 97+ messages in thread
From: Sergey Senozhatsky @ 2017-04-24  2:17 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Andrew Morton,
	Peter Zijlstra, Russell King, Daniel Thompson, Jiri Kosina,
	Ingo Molnar, Thomas Gleixner, Chris Metcalf, linux-kernel, x86,
	linux-arm-kernel, adi-buildroot-devel, linux-cris-kernel,
	linux-mips, linuxppc-dev, linux-s390, linux-sh, sparclinux,
	Jan Kara, Ralf Baechle, Benjamin Herrenschmidt,
	Martin Schwidefsky, David Miller

On (04/21/17 14:06), Petr Mladek wrote:
[..]
> > I agree that this_cpu_read(printk_context) covers slightly more than
> > logbuf_lock scope, so we may get positive this_cpu_read(printk_context)
> > with unlocked logbuf_lock, but I don't tend to think that it's a big
> > problem.
> 
> PRINTK_SAFE_CONTEXT is set also in call_console_drivers().
> It might take rather long and logbuf_lock is availe. So, it is
> noticeable source of false positives.

yes, agree.

probably we need additional printk_safe annotations for
		"logbuf_lock is locked from _this_ CPU"

false positives there can be very painful.

[..]
> 	if (raw_spin_is_locked(&logbuf_lock))
> 		this_cpu_or(printk_context, PRINTK_NMI_CONTEXT_MASK);
> 	else
> 		this_cpu_or(printk_context, PRINTK_NMI_DEFERRED_CONTEXT_MASK);

well, if everyone is fine with logbuf_lock access from every CPU from every
NMI then I won't object either. but may be it makes sense to reduce the
possibility of false positives. Steven is loosing critically important logs,
after all.


by the way,
does this `nmi_print_seq' bypass even fix anything for Steven? it sort of
can, in theory, but just in theory. so may be we need direct message flush
from NMI handler (printk->console_unlock), which will be a really big problem.

logbuf might not be big enough for 4890096 messages (Steven's report
mentions "Lost 4890096 message(s)!"). we are counting on the fact that
in case of `nmi_print_seq' bypass some other CPU will call console_unlock()
and print pending logbuf messages, but this is not guaranteed and the
messages can be dropped even from logbuf.

I don't know,
should we try to queue printk_deferred irq_work for all online CPUs from
vprintk_nmi() when it bypasses printk_safe_log_store()? in order to minimize
possibilities of logbuf overflow. printk_deferred() will queue work on
vprintk_nmi() CPU, sure, but we don't know how many messages we are going
to add to logbuf from NMI.


> > @@ -303,7 +303,10 @@ static int vprintk_nmi(const char *fmt, va_list args)
> >  {
> >         struct printk_safe_seq_buf *s = this_cpu_ptr(&nmi_print_seq);
> >  
> > -       return printk_safe_log_store(s, fmt, args);
> > +       if (this_cpu_read(printk_context) & PRINTK_SAFE_CONTEXT_MASK)
> > +               return printk_safe_log_store(s, fmt, args);
> > +
> > +       return vprintk_emit(0, LOGLEVEL_SCHED, NULL, 0, fmt, args);
> >  }
> 
> It looks simple but some things are missing. It will be used also
> outside panic/oops, so it should queue the irq_work to flush the console.

you are right. I thought about moving irq_work to vprintk_emit(), but
completely forgot about it. without that missing bit the proposed two-liner
is not complete.

	-ss

^ permalink raw reply	[flat|nested] 97+ messages in thread

* [PATCH v5 1/4] printk/nmi: generic solution for safe printk in NMI
@ 2017-04-24  2:17               ` Sergey Senozhatsky
  0 siblings, 0 replies; 97+ messages in thread
From: Sergey Senozhatsky @ 2017-04-24  2:17 UTC (permalink / raw)
  To: linux-arm-kernel

On (04/21/17 14:06), Petr Mladek wrote:
[..]
> > I agree that this_cpu_read(printk_context) covers slightly more than
> > logbuf_lock scope, so we may get positive this_cpu_read(printk_context)
> > with unlocked logbuf_lock, but I don't tend to think that it's a big
> > problem.
> 
> PRINTK_SAFE_CONTEXT is set also in call_console_drivers().
> It might take rather long and logbuf_lock is availe. So, it is
> noticeable source of false positives.

yes, agree.

probably we need additional printk_safe annotations for
		"logbuf_lock is locked from _this_ CPU"

false positives there can be very painful.

[..]
> 	if (raw_spin_is_locked(&logbuf_lock))
> 		this_cpu_or(printk_context, PRINTK_NMI_CONTEXT_MASK);
> 	else
> 		this_cpu_or(printk_context, PRINTK_NMI_DEFERRED_CONTEXT_MASK);

well, if everyone is fine with logbuf_lock access from every CPU from every
NMI then I won't object either. but may be it makes sense to reduce the
possibility of false positives. Steven is loosing critically important logs,
after all.


by the way,
does this `nmi_print_seq' bypass even fix anything for Steven? it sort of
can, in theory, but just in theory. so may be we need direct message flush
from NMI handler (printk->console_unlock), which will be a really big problem.

logbuf might not be big enough for 4890096 messages (Steven's report
mentions "Lost 4890096 message(s)!"). we are counting on the fact that
in case of `nmi_print_seq' bypass some other CPU will call console_unlock()
and print pending logbuf messages, but this is not guaranteed and the
messages can be dropped even from logbuf.

I don't know,
should we try to queue printk_deferred irq_work for all online CPUs from
vprintk_nmi() when it bypasses printk_safe_log_store()? in order to minimize
possibilities of logbuf overflow. printk_deferred() will queue work on
vprintk_nmi() CPU, sure, but we don't know how many messages we are going
to add to logbuf from NMI.


> > @@ -303,7 +303,10 @@ static int vprintk_nmi(const char *fmt, va_list args)
> >  {
> >         struct printk_safe_seq_buf *s = this_cpu_ptr(&nmi_print_seq);
> >  
> > -       return printk_safe_log_store(s, fmt, args);
> > +       if (this_cpu_read(printk_context) & PRINTK_SAFE_CONTEXT_MASK)
> > +               return printk_safe_log_store(s, fmt, args);
> > +
> > +       return vprintk_emit(0, LOGLEVEL_SCHED, NULL, 0, fmt, args);
> >  }
> 
> It looks simple but some things are missing. It will be used also
> outside panic/oops, so it should queue the irq_work to flush the console.

you are right. I thought about moving irq_work to vprintk_emit(), but
completely forgot about it. without that missing bit the proposed two-liner
is not complete.

	-ss

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH v5 1/4] printk/nmi: generic solution for safe printk in NMI
  2017-04-24  2:17               ` Sergey Senozhatsky
  (?)
@ 2017-04-27 13:38                 ` Petr Mladek
  -1 siblings, 0 replies; 97+ messages in thread
From: Petr Mladek @ 2017-04-27 13:38 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon 2017-04-24 11:17:47, Sergey Senozhatsky wrote:
> On (04/21/17 14:06), Petr Mladek wrote:
> [..]
> > > I agree that this_cpu_read(printk_context) covers slightly more than
> > > logbuf_lock scope, so we may get positive this_cpu_read(printk_context)
> > > with unlocked logbuf_lock, but I don't tend to think that it's a big
> > > problem.
> > 
> > PRINTK_SAFE_CONTEXT is set also in call_console_drivers().
> > It might take rather long and logbuf_lock is availe. So, it is
> > noticeable source of false positives.
> 
> yes, agree.
> 
> probably we need additional printk_safe annotations for
> 		"logbuf_lock is locked from _this_ CPU"
> 
> false positives there can be very painful.
> 
> [..]
> > 	if (raw_spin_is_locked(&logbuf_lock))
> > 		this_cpu_or(printk_context, PRINTK_NMI_CONTEXT_MASK);
> > 	else
> > 		this_cpu_or(printk_context, PRINTK_NMI_DEFERRED_CONTEXT_MASK);
> 
> well, if everyone is fine with logbuf_lock access from every CPU from every
> NMI then I won't object either. but may be it makes sense to reduce the
> possibility of false positives. Steven is loosing critically important logs,
> after all.
> 
> 
> by the way,
> does this `nmi_print_seq' bypass even fix anything for Steven?

I think that this is the most important question.

Steven, does the patch from
https://lkml.kernel.org/r/20170420131154.GL3452@pathway.suse.cz
help you to see the debug messages, please?


> it sort of
> can, in theory, but just in theory. so may be we need direct message flush
> from NMI handler (printk->console_unlock), which will be a really big problem.

I thought about it a lot and got scared where this might go.
We need to balance the usefulness and the complexity of the solution.

It took one year to discover this regression. Before it was
suggested to avoid calling printk() in NMI context at all.
Now, we are trying to fix printk() to handle MBs of messages
in NMI context.

If my proposed patch solves the problem for Steven, I would still
like to get similar solution in. It is not that complex and helps
to bypass the limited per-CPU buffer in most cases. I always thought
that 8kB might be not enough in some cases.

Note that my patch is very defensive. It uses the main log buffer
only when it is really safe. It has higher potential for unneeded
fallback but if it works for Steven (really existing usecase), ...

On the other hand, I would prefer to avoid any much more complex
solution until we have a real reports that they are needed.

Also we need to look for alternatives. There is a chance
to create crashdump and get the ftrace messages from it.
Also this might be scenario when we might need to suggest
the early_printk() patchset from Peter Zijlstra.


> logbuf might not be big enough for 4890096 messages (Steven's report
> mentions "Lost 4890096 message(s)!"). we are counting on the fact that
> in case of `nmi_print_seq' bypass some other CPU will call console_unlock()
> and print pending logbuf messages, but this is not guaranteed and the
> messages can be dropped even from logbuf.

Yup. I tested the patch here and I needed to increase the main log buffer
size to see all ftrace messages. Fortunately, it was possible to use a really
huge global buffer. But it is not realistic to use huge per-CPU ones.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH v5 1/4] printk/nmi: generic solution for safe printk in NMI
@ 2017-04-27 13:38                 ` Petr Mladek
  0 siblings, 0 replies; 97+ messages in thread
From: Petr Mladek @ 2017-04-27 13:38 UTC (permalink / raw)
  To: Sergey Senozhatsky, Steven Rostedt
  Cc: Andrew Morton, Peter Zijlstra, Russell King, Daniel Thompson,
	Jiri Kosina, Ingo Molnar, Thomas Gleixner, Chris Metcalf,
	linux-kernel, x86, linux-arm-kernel, adi-buildroot-devel,
	linux-cris-kernel, linux-mips, linuxppc-dev, linux-s390,
	linux-sh, sparclinux, Jan Kara, Ralf Baechle,
	Benjamin Herrenschmidt, Martin Schwidefsky, David Miller

On Mon 2017-04-24 11:17:47, Sergey Senozhatsky wrote:
> On (04/21/17 14:06), Petr Mladek wrote:
> [..]
> > > I agree that this_cpu_read(printk_context) covers slightly more than
> > > logbuf_lock scope, so we may get positive this_cpu_read(printk_context)
> > > with unlocked logbuf_lock, but I don't tend to think that it's a big
> > > problem.
> > 
> > PRINTK_SAFE_CONTEXT is set also in call_console_drivers().
> > It might take rather long and logbuf_lock is availe. So, it is
> > noticeable source of false positives.
> 
> yes, agree.
> 
> probably we need additional printk_safe annotations for
> 		"logbuf_lock is locked from _this_ CPU"
> 
> false positives there can be very painful.
> 
> [..]
> > 	if (raw_spin_is_locked(&logbuf_lock))
> > 		this_cpu_or(printk_context, PRINTK_NMI_CONTEXT_MASK);
> > 	else
> > 		this_cpu_or(printk_context, PRINTK_NMI_DEFERRED_CONTEXT_MASK);
> 
> well, if everyone is fine with logbuf_lock access from every CPU from every
> NMI then I won't object either. but may be it makes sense to reduce the
> possibility of false positives. Steven is loosing critically important logs,
> after all.
> 
> 
> by the way,
> does this `nmi_print_seq' bypass even fix anything for Steven?

I think that this is the most important question.

Steven, does the patch from
https://lkml.kernel.org/r/20170420131154.GL3452@pathway.suse.cz
help you to see the debug messages, please?


> it sort of
> can, in theory, but just in theory. so may be we need direct message flush
> from NMI handler (printk->console_unlock), which will be a really big problem.

I thought about it a lot and got scared where this might go.
We need to balance the usefulness and the complexity of the solution.

It took one year to discover this regression. Before it was
suggested to avoid calling printk() in NMI context at all.
Now, we are trying to fix printk() to handle MBs of messages
in NMI context.

If my proposed patch solves the problem for Steven, I would still
like to get similar solution in. It is not that complex and helps
to bypass the limited per-CPU buffer in most cases. I always thought
that 8kB might be not enough in some cases.

Note that my patch is very defensive. It uses the main log buffer
only when it is really safe. It has higher potential for unneeded
fallback but if it works for Steven (really existing usecase), ...

On the other hand, I would prefer to avoid any much more complex
solution until we have a real reports that they are needed.

Also we need to look for alternatives. There is a chance
to create crashdump and get the ftrace messages from it.
Also this might be scenario when we might need to suggest
the early_printk() patchset from Peter Zijlstra.


> logbuf might not be big enough for 4890096 messages (Steven's report
> mentions "Lost 4890096 message(s)!"). we are counting on the fact that
> in case of `nmi_print_seq' bypass some other CPU will call console_unlock()
> and print pending logbuf messages, but this is not guaranteed and the
> messages can be dropped even from logbuf.

Yup. I tested the patch here and I needed to increase the main log buffer
size to see all ftrace messages. Fortunately, it was possible to use a really
huge global buffer. But it is not realistic to use huge per-CPU ones.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 97+ messages in thread

* [PATCH v5 1/4] printk/nmi: generic solution for safe printk in NMI
@ 2017-04-27 13:38                 ` Petr Mladek
  0 siblings, 0 replies; 97+ messages in thread
From: Petr Mladek @ 2017-04-27 13:38 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon 2017-04-24 11:17:47, Sergey Senozhatsky wrote:
> On (04/21/17 14:06), Petr Mladek wrote:
> [..]
> > > I agree that this_cpu_read(printk_context) covers slightly more than
> > > logbuf_lock scope, so we may get positive this_cpu_read(printk_context)
> > > with unlocked logbuf_lock, but I don't tend to think that it's a big
> > > problem.
> > 
> > PRINTK_SAFE_CONTEXT is set also in call_console_drivers().
> > It might take rather long and logbuf_lock is availe. So, it is
> > noticeable source of false positives.
> 
> yes, agree.
> 
> probably we need additional printk_safe annotations for
> 		"logbuf_lock is locked from _this_ CPU"
> 
> false positives there can be very painful.
> 
> [..]
> > 	if (raw_spin_is_locked(&logbuf_lock))
> > 		this_cpu_or(printk_context, PRINTK_NMI_CONTEXT_MASK);
> > 	else
> > 		this_cpu_or(printk_context, PRINTK_NMI_DEFERRED_CONTEXT_MASK);
> 
> well, if everyone is fine with logbuf_lock access from every CPU from every
> NMI then I won't object either. but may be it makes sense to reduce the
> possibility of false positives. Steven is loosing critically important logs,
> after all.
> 
> 
> by the way,
> does this `nmi_print_seq' bypass even fix anything for Steven?

I think that this is the most important question.

Steven, does the patch from
https://lkml.kernel.org/r/20170420131154.GL3452 at pathway.suse.cz
help you to see the debug messages, please?


> it sort of
> can, in theory, but just in theory. so may be we need direct message flush
> from NMI handler (printk->console_unlock), which will be a really big problem.

I thought about it a lot and got scared where this might go.
We need to balance the usefulness and the complexity of the solution.

It took one year to discover this regression. Before it was
suggested to avoid calling printk() in NMI context at all.
Now, we are trying to fix printk() to handle MBs of messages
in NMI context.

If my proposed patch solves the problem for Steven, I would still
like to get similar solution in. It is not that complex and helps
to bypass the limited per-CPU buffer in most cases. I always thought
that 8kB might be not enough in some cases.

Note that my patch is very defensive. It uses the main log buffer
only when it is really safe. It has higher potential for unneeded
fallback but if it works for Steven (really existing usecase), ...

On the other hand, I would prefer to avoid any much more complex
solution until we have a real reports that they are needed.

Also we need to look for alternatives. There is a chance
to create crashdump and get the ftrace messages from it.
Also this might be scenario when we might need to suggest
the early_printk() patchset from Peter Zijlstra.


> logbuf might not be big enough for 4890096 messages (Steven's report
> mentions "Lost 4890096 message(s)!"). we are counting on the fact that
> in case of `nmi_print_seq' bypass some other CPU will call console_unlock()
> and print pending logbuf messages, but this is not guaranteed and the
> messages can be dropped even from logbuf.

Yup. I tested the patch here and I needed to increase the main log buffer
size to see all ftrace messages. Fortunately, it was possible to use a really
huge global buffer. But it is not realistic to use huge per-CPU ones.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH v5 1/4] printk/nmi: generic solution for safe printk in NMI
  2017-04-27 13:38                 ` Petr Mladek
  (?)
@ 2017-04-27 14:31                   ` Steven Rostedt
  -1 siblings, 0 replies; 97+ messages in thread
From: Steven Rostedt @ 2017-04-27 14:31 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, 27 Apr 2017 15:38:19 +0200
Petr Mladek <pmladek@suse.com> wrote:

> > by the way,
> > does this `nmi_print_seq' bypass even fix anything for Steven?  
> 
> I think that this is the most important question.
> 
> Steven, does the patch from
> https://lkml.kernel.org/r/20170420131154.GL3452@pathway.suse.cz
> help you to see the debug messages, please?

You'll have to wait for a bit. The box that I was debugging takes 45
minutes to reboot. And I don't have much more time to play on it before
I have to give it back. I already found the bug I was looking for and
I'm trying not to crash it again (due to the huge bring up time).

When I get a chance, I'll see if I can insert a trigger to crash the
kernel from NMI on another box and see if this patch helps.

Thanks,

-- Steve

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH v5 1/4] printk/nmi: generic solution for safe printk in NMI
@ 2017-04-27 14:31                   ` Steven Rostedt
  0 siblings, 0 replies; 97+ messages in thread
From: Steven Rostedt @ 2017-04-27 14:31 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Andrew Morton, Peter Zijlstra, Russell King,
	Daniel Thompson, Jiri Kosina, Ingo Molnar, Thomas Gleixner,
	Chris Metcalf, linux-kernel, x86, linux-arm-kernel,
	adi-buildroot-devel, linux-cris-kernel, linux-mips, linuxppc-dev,
	linux-s390, linux-sh, sparclinux, Jan Kara, Ralf Baechle,
	Benjamin Herrenschmidt, Martin Schwidefsky, David Miller

On Thu, 27 Apr 2017 15:38:19 +0200
Petr Mladek <pmladek@suse.com> wrote:

> > by the way,
> > does this `nmi_print_seq' bypass even fix anything for Steven?  
> 
> I think that this is the most important question.
> 
> Steven, does the patch from
> https://lkml.kernel.org/r/20170420131154.GL3452@pathway.suse.cz
> help you to see the debug messages, please?

You'll have to wait for a bit. The box that I was debugging takes 45
minutes to reboot. And I don't have much more time to play on it before
I have to give it back. I already found the bug I was looking for and
I'm trying not to crash it again (due to the huge bring up time).

When I get a chance, I'll see if I can insert a trigger to crash the
kernel from NMI on another box and see if this patch helps.

Thanks,

-- Steve

^ permalink raw reply	[flat|nested] 97+ messages in thread

* [PATCH v5 1/4] printk/nmi: generic solution for safe printk in NMI
@ 2017-04-27 14:31                   ` Steven Rostedt
  0 siblings, 0 replies; 97+ messages in thread
From: Steven Rostedt @ 2017-04-27 14:31 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, 27 Apr 2017 15:38:19 +0200
Petr Mladek <pmladek@suse.com> wrote:

> > by the way,
> > does this `nmi_print_seq' bypass even fix anything for Steven?  
> 
> I think that this is the most important question.
> 
> Steven, does the patch from
> https://lkml.kernel.org/r/20170420131154.GL3452 at pathway.suse.cz
> help you to see the debug messages, please?

You'll have to wait for a bit. The box that I was debugging takes 45
minutes to reboot. And I don't have much more time to play on it before
I have to give it back. I already found the bug I was looking for and
I'm trying not to crash it again (due to the huge bring up time).

When I get a chance, I'll see if I can insert a trigger to crash the
kernel from NMI on another box and see if this patch helps.

Thanks,

-- Steve

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH v5 1/4] printk/nmi: generic solution for safe printk in NMI
  2017-04-27 14:31                   ` Steven Rostedt
  (?)
@ 2017-04-27 15:28                     ` Petr Mladek
  -1 siblings, 0 replies; 97+ messages in thread
From: Petr Mladek @ 2017-04-27 15:28 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu 2017-04-27 10:31:18, Steven Rostedt wrote:
> On Thu, 27 Apr 2017 15:38:19 +0200
> Petr Mladek <pmladek@suse.com> wrote:
> 
> > > by the way,
> > > does this `nmi_print_seq' bypass even fix anything for Steven?  
> > 
> > I think that this is the most important question.
> > 
> > Steven, does the patch from
> > https://lkml.kernel.org/r/20170420131154.GL3452@pathway.suse.cz
> > help you to see the debug messages, please?
> 
> You'll have to wait for a bit. The box that I was debugging takes 45
> minutes to reboot. And I don't have much more time to play on it before
> I have to give it back. I already found the bug I was looking for and
> I'm trying not to crash it again (due to the huge bring up time).

I see.

> When I get a chance, I'll see if I can insert a trigger to crash the
> kernel from NMI on another box and see if this patch helps.

I actually tested it here using this hack:

diff --cc lib/nmi_backtrace.c
index d531f85c0c9b,0bc0a3535a8a..000000000000
--- a/lib/nmi_backtrace.c
+++ b/lib/nmi_backtrace.c
@@@ -89,8 -90,7 +90,9 @@@ bool nmi_cpu_backtrace(struct pt_regs *
        int cpu = smp_processor_id();
  
        if (cpumask_test_cpu(cpu, to_cpumask(backtrace_mask))) {
 +              if (in_nmi())
 +                      panic("Simulating panic in NMI\n");
+               arch_spin_lock(&lock);
                if (regs && cpu_in_idle(instruction_pointer(regs))) {
                        pr_warn("NMI backtrace for cpu %d skipped: idling at pc %#lx\n",
                                cpu, instruction_pointer(regs));

and triggered by:

   echo  l > /proc/sysrq-trigger

The patch really helped to see much more (all) messages from the ftrace
buffers in NMI mode.

But the test is a bit artifical. The patch might not help when there
is a big printk() activity on the system when the panic() is
triggered. We might wrongly use the small per-CPU buffer when
the logbuf_lock is tested and taken on another CPU at the same time.
It means that it will not always help.

I personally think that the patch might be good enough. I am not sure
if a perfect (more comlpex) solution is worth it.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH v5 1/4] printk/nmi: generic solution for safe printk in NMI
@ 2017-04-27 15:28                     ` Petr Mladek
  0 siblings, 0 replies; 97+ messages in thread
From: Petr Mladek @ 2017-04-27 15:28 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Sergey Senozhatsky, Andrew Morton, Peter Zijlstra, Russell King,
	Daniel Thompson, Jiri Kosina, Ingo Molnar, Thomas Gleixner,
	Chris Metcalf, linux-kernel, x86, linux-arm-kernel,
	adi-buildroot-devel, linux-cris-kernel, linux-mips, linuxppc-dev,
	linux-s390, linux-sh, sparclinux, Jan Kara, Ralf Baechle,
	Benjamin Herrenschmidt, Martin Schwidefsky, David Miller

On Thu 2017-04-27 10:31:18, Steven Rostedt wrote:
> On Thu, 27 Apr 2017 15:38:19 +0200
> Petr Mladek <pmladek@suse.com> wrote:
> 
> > > by the way,
> > > does this `nmi_print_seq' bypass even fix anything for Steven?  
> > 
> > I think that this is the most important question.
> > 
> > Steven, does the patch from
> > https://lkml.kernel.org/r/20170420131154.GL3452@pathway.suse.cz
> > help you to see the debug messages, please?
> 
> You'll have to wait for a bit. The box that I was debugging takes 45
> minutes to reboot. And I don't have much more time to play on it before
> I have to give it back. I already found the bug I was looking for and
> I'm trying not to crash it again (due to the huge bring up time).

I see.

> When I get a chance, I'll see if I can insert a trigger to crash the
> kernel from NMI on another box and see if this patch helps.

I actually tested it here using this hack:

diff --cc lib/nmi_backtrace.c
index d531f85c0c9b,0bc0a3535a8a..000000000000
--- a/lib/nmi_backtrace.c
+++ b/lib/nmi_backtrace.c
@@@ -89,8 -90,7 +90,9 @@@ bool nmi_cpu_backtrace(struct pt_regs *
        int cpu = smp_processor_id();
  
        if (cpumask_test_cpu(cpu, to_cpumask(backtrace_mask))) {
 +              if (in_nmi())
 +                      panic("Simulating panic in NMI\n");
+               arch_spin_lock(&lock);
                if (regs && cpu_in_idle(instruction_pointer(regs))) {
                        pr_warn("NMI backtrace for cpu %d skipped: idling at pc %#lx\n",
                                cpu, instruction_pointer(regs));

and triggered by:

   echo  l > /proc/sysrq-trigger

The patch really helped to see much more (all) messages from the ftrace
buffers in NMI mode.

But the test is a bit artifical. The patch might not help when there
is a big printk() activity on the system when the panic() is
triggered. We might wrongly use the small per-CPU buffer when
the logbuf_lock is tested and taken on another CPU at the same time.
It means that it will not always help.

I personally think that the patch might be good enough. I am not sure
if a perfect (more comlpex) solution is worth it.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 97+ messages in thread

* [PATCH v5 1/4] printk/nmi: generic solution for safe printk in NMI
@ 2017-04-27 15:28                     ` Petr Mladek
  0 siblings, 0 replies; 97+ messages in thread
From: Petr Mladek @ 2017-04-27 15:28 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu 2017-04-27 10:31:18, Steven Rostedt wrote:
> On Thu, 27 Apr 2017 15:38:19 +0200
> Petr Mladek <pmladek@suse.com> wrote:
> 
> > > by the way,
> > > does this `nmi_print_seq' bypass even fix anything for Steven?  
> > 
> > I think that this is the most important question.
> > 
> > Steven, does the patch from
> > https://lkml.kernel.org/r/20170420131154.GL3452 at pathway.suse.cz
> > help you to see the debug messages, please?
> 
> You'll have to wait for a bit. The box that I was debugging takes 45
> minutes to reboot. And I don't have much more time to play on it before
> I have to give it back. I already found the bug I was looking for and
> I'm trying not to crash it again (due to the huge bring up time).

I see.

> When I get a chance, I'll see if I can insert a trigger to crash the
> kernel from NMI on another box and see if this patch helps.

I actually tested it here using this hack:

diff --cc lib/nmi_backtrace.c
index d531f85c0c9b,0bc0a3535a8a..000000000000
--- a/lib/nmi_backtrace.c
+++ b/lib/nmi_backtrace.c
@@@ -89,8 -90,7 +90,9 @@@ bool nmi_cpu_backtrace(struct pt_regs *
        int cpu = smp_processor_id();
  
        if (cpumask_test_cpu(cpu, to_cpumask(backtrace_mask))) {
 +              if (in_nmi())
 +                      panic("Simulating panic in NMI\n");
+               arch_spin_lock(&lock);
                if (regs && cpu_in_idle(instruction_pointer(regs))) {
                        pr_warn("NMI backtrace for cpu %d skipped: idling at pc %#lx\n",
                                cpu, instruction_pointer(regs));

and triggered by:

   echo  l > /proc/sysrq-trigger

The patch really helped to see much more (all) messages from the ftrace
buffers in NMI mode.

But the test is a bit artifical. The patch might not help when there
is a big printk() activity on the system when the panic() is
triggered. We might wrongly use the small per-CPU buffer when
the logbuf_lock is tested and taken on another CPU at the same time.
It means that it will not always help.

I personally think that the patch might be good enough. I am not sure
if a perfect (more comlpex) solution is worth it.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH v5 1/4] printk/nmi: generic solution for safe printk in NMI
  2017-04-27 15:28                     ` Petr Mladek
  (?)
@ 2017-04-27 15:42                       ` Steven Rostedt
  -1 siblings, 0 replies; 97+ messages in thread
From: Steven Rostedt @ 2017-04-27 15:42 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, 27 Apr 2017 17:28:07 +0200
Petr Mladek <pmladek@suse.com> wrote:


> > When I get a chance, I'll see if I can insert a trigger to crash the
> > kernel from NMI on another box and see if this patch helps.  
> 
> I actually tested it here using this hack:
> 
> diff --cc lib/nmi_backtrace.c
> index d531f85c0c9b,0bc0a3535a8a..000000000000
> --- a/lib/nmi_backtrace.c
> +++ b/lib/nmi_backtrace.c
> @@@ -89,8 -90,7 +90,9 @@@ bool nmi_cpu_backtrace(struct pt_regs *
>         int cpu = smp_processor_id();
>   
>         if (cpumask_test_cpu(cpu, to_cpumask(backtrace_mask))) {
>  +              if (in_nmi())
>  +                      panic("Simulating panic in NMI\n");
> +               arch_spin_lock(&lock);

I was going to create a ftrace trigger, to crash on demand, but this
may do as well.

>                 if (regs && cpu_in_idle(instruction_pointer(regs))) {
>                         pr_warn("NMI backtrace for cpu %d skipped: idling at pc %#lx\n",
>                                 cpu, instruction_pointer(regs));
> 
> and triggered by:
> 
>    echo  l > /proc/sysrq-trigger
> 
> The patch really helped to see much more (all) messages from the ftrace
> buffers in NMI mode.
> 
> But the test is a bit artifical. The patch might not help when there
> is a big printk() activity on the system when the panic() is
> triggered. We might wrongly use the small per-CPU buffer when
> the logbuf_lock is tested and taken on another CPU at the same time.
> It means that it will not always help.
> 
> I personally think that the patch might be good enough. I am not sure
> if a perfect (more comlpex) solution is worth it.

I wasn't asking for perfect, as the previous solutions never were
either. I just want an optimistic dump if possible.

I'll try to get some time today to test this, and let you know. But it
wont be on the machine that I originally had the issue with.

Thanks,

-- Steve

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH v5 1/4] printk/nmi: generic solution for safe printk in NMI
@ 2017-04-27 15:42                       ` Steven Rostedt
  0 siblings, 0 replies; 97+ messages in thread
From: Steven Rostedt @ 2017-04-27 15:42 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Andrew Morton, Peter Zijlstra, Russell King,
	Daniel Thompson, Jiri Kosina, Ingo Molnar, Thomas Gleixner,
	Chris Metcalf, linux-kernel, x86, linux-arm-kernel,
	adi-buildroot-devel, linux-cris-kernel, linux-mips, linuxppc-dev,
	linux-s390, linux-sh, sparclinux, Jan Kara, Ralf Baechle,
	Benjamin Herrenschmidt, Martin Schwidefsky, David Miller

On Thu, 27 Apr 2017 17:28:07 +0200
Petr Mladek <pmladek@suse.com> wrote:


> > When I get a chance, I'll see if I can insert a trigger to crash the
> > kernel from NMI on another box and see if this patch helps.  
> 
> I actually tested it here using this hack:
> 
> diff --cc lib/nmi_backtrace.c
> index d531f85c0c9b,0bc0a3535a8a..000000000000
> --- a/lib/nmi_backtrace.c
> +++ b/lib/nmi_backtrace.c
> @@@ -89,8 -90,7 +90,9 @@@ bool nmi_cpu_backtrace(struct pt_regs *
>         int cpu = smp_processor_id();
>   
>         if (cpumask_test_cpu(cpu, to_cpumask(backtrace_mask))) {
>  +              if (in_nmi())
>  +                      panic("Simulating panic in NMI\n");
> +               arch_spin_lock(&lock);

I was going to create a ftrace trigger, to crash on demand, but this
may do as well.

>                 if (regs && cpu_in_idle(instruction_pointer(regs))) {
>                         pr_warn("NMI backtrace for cpu %d skipped: idling at pc %#lx\n",
>                                 cpu, instruction_pointer(regs));
> 
> and triggered by:
> 
>    echo  l > /proc/sysrq-trigger
> 
> The patch really helped to see much more (all) messages from the ftrace
> buffers in NMI mode.
> 
> But the test is a bit artifical. The patch might not help when there
> is a big printk() activity on the system when the panic() is
> triggered. We might wrongly use the small per-CPU buffer when
> the logbuf_lock is tested and taken on another CPU at the same time.
> It means that it will not always help.
> 
> I personally think that the patch might be good enough. I am not sure
> if a perfect (more comlpex) solution is worth it.

I wasn't asking for perfect, as the previous solutions never were
either. I just want an optimistic dump if possible.

I'll try to get some time today to test this, and let you know. But it
wont be on the machine that I originally had the issue with.

Thanks,

-- Steve

^ permalink raw reply	[flat|nested] 97+ messages in thread

* [PATCH v5 1/4] printk/nmi: generic solution for safe printk in NMI
@ 2017-04-27 15:42                       ` Steven Rostedt
  0 siblings, 0 replies; 97+ messages in thread
From: Steven Rostedt @ 2017-04-27 15:42 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, 27 Apr 2017 17:28:07 +0200
Petr Mladek <pmladek@suse.com> wrote:


> > When I get a chance, I'll see if I can insert a trigger to crash the
> > kernel from NMI on another box and see if this patch helps.  
> 
> I actually tested it here using this hack:
> 
> diff --cc lib/nmi_backtrace.c
> index d531f85c0c9b,0bc0a3535a8a..000000000000
> --- a/lib/nmi_backtrace.c
> +++ b/lib/nmi_backtrace.c
> @@@ -89,8 -90,7 +90,9 @@@ bool nmi_cpu_backtrace(struct pt_regs *
>         int cpu = smp_processor_id();
>   
>         if (cpumask_test_cpu(cpu, to_cpumask(backtrace_mask))) {
>  +              if (in_nmi())
>  +                      panic("Simulating panic in NMI\n");
> +               arch_spin_lock(&lock);

I was going to create a ftrace trigger, to crash on demand, but this
may do as well.

>                 if (regs && cpu_in_idle(instruction_pointer(regs))) {
>                         pr_warn("NMI backtrace for cpu %d skipped: idling at pc %#lx\n",
>                                 cpu, instruction_pointer(regs));
> 
> and triggered by:
> 
>    echo  l > /proc/sysrq-trigger
> 
> The patch really helped to see much more (all) messages from the ftrace
> buffers in NMI mode.
> 
> But the test is a bit artifical. The patch might not help when there
> is a big printk() activity on the system when the panic() is
> triggered. We might wrongly use the small per-CPU buffer when
> the logbuf_lock is tested and taken on another CPU at the same time.
> It means that it will not always help.
> 
> I personally think that the patch might be good enough. I am not sure
> if a perfect (more comlpex) solution is worth it.

I wasn't asking for perfect, as the previous solutions never were
either. I just want an optimistic dump if possible.

I'll try to get some time today to test this, and let you know. But it
wont be on the machine that I originally had the issue with.

Thanks,

-- Steve

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH v5 1/4] printk/nmi: generic solution for safe printk in NMI
  2017-04-20 13:11         ` Petr Mladek
  (?)
@ 2017-04-27 16:14           ` Steven Rostedt
  -1 siblings, 0 replies; 97+ messages in thread
From: Steven Rostedt @ 2017-04-27 16:14 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, 20 Apr 2017 15:11:54 +0200
Petr Mladek <pmladek@suse.com> wrote:



> 
> >From c530d9dee91c74db5e6a198479e2e63b24cb84a2 Mon Sep 17 00:00:00 2001  
> From: Petr Mladek <pmladek@suse.com>
> Date: Thu, 20 Apr 2017 10:52:31 +0200
> Subject: [PATCH] printk: Use the main logbuf in NMI when logbuf_lock is
>  available

I tried this patch. It's better because I get the end of the trace, but
I do lose the beginning of it:

** 196358 printk messages dropped ** [  102.321182]     perf-5981    0.... 12983650us : d_path <-seq_path

The way I tested it was by adding this:

Index: linux-trace.git/kernel/trace/trace_functions.c
=================================--- linux-trace.git.orig/kernel/trace/trace_functions.c
+++ linux-trace.git/kernel/trace/trace_functions.c
@@ -469,8 +469,11 @@ ftrace_cpudump_probe(unsigned long ip, u
 		     struct trace_array *tr, struct ftrace_probe_ops *ops,
 		     void *data)
 {
-	if (update_count(ops, ip, data))
-		ftrace_dump(DUMP_ORIG);
+	char *killer = NULL;
+
+	panic_on_oops = 1;	/* force panic */
+	wmb();
+	*killer = 1;
 }
 
 static int


Then doing the following:

# echo 1 > /proc/sys/kernel/ftrace_dump_on_oops 
# trace-cmd start -p function
# echo nmi_handle:cpudump > /debug/tracing/set_ftrace_filter
# perf record -c 100 -a sleep 1

And that triggers the crash.

-- Steve

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH v5 1/4] printk/nmi: generic solution for safe printk in NMI
@ 2017-04-27 16:14           ` Steven Rostedt
  0 siblings, 0 replies; 97+ messages in thread
From: Steven Rostedt @ 2017-04-27 16:14 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Andrew Morton, Peter Zijlstra, Russell King,
	Daniel Thompson, Jiri Kosina, Ingo Molnar, Thomas Gleixner,
	Chris Metcalf, linux-kernel, x86, linux-arm-kernel,
	adi-buildroot-devel, linux-cris-kernel, linux-mips, linuxppc-dev,
	linux-s390, linux-sh, sparclinux, Jan Kara, Ralf Baechle,
	Benjamin Herrenschmidt, Martin Schwidefsky, David Miller

On Thu, 20 Apr 2017 15:11:54 +0200
Petr Mladek <pmladek@suse.com> wrote:



> 
> >From c530d9dee91c74db5e6a198479e2e63b24cb84a2 Mon Sep 17 00:00:00 2001  
> From: Petr Mladek <pmladek@suse.com>
> Date: Thu, 20 Apr 2017 10:52:31 +0200
> Subject: [PATCH] printk: Use the main logbuf in NMI when logbuf_lock is
>  available

I tried this patch. It's better because I get the end of the trace, but
I do lose the beginning of it:

** 196358 printk messages dropped ** [  102.321182]     perf-5981    0.... 12983650us : d_path <-seq_path

The way I tested it was by adding this:

Index: linux-trace.git/kernel/trace/trace_functions.c
===================================================================
--- linux-trace.git.orig/kernel/trace/trace_functions.c
+++ linux-trace.git/kernel/trace/trace_functions.c
@@ -469,8 +469,11 @@ ftrace_cpudump_probe(unsigned long ip, u
 		     struct trace_array *tr, struct ftrace_probe_ops *ops,
 		     void *data)
 {
-	if (update_count(ops, ip, data))
-		ftrace_dump(DUMP_ORIG);
+	char *killer = NULL;
+
+	panic_on_oops = 1;	/* force panic */
+	wmb();
+	*killer = 1;
 }
 
 static int


Then doing the following:

# echo 1 > /proc/sys/kernel/ftrace_dump_on_oops 
# trace-cmd start -p function
# echo nmi_handle:cpudump > /debug/tracing/set_ftrace_filter
# perf record -c 100 -a sleep 1

And that triggers the crash.

-- Steve

^ permalink raw reply	[flat|nested] 97+ messages in thread

* [PATCH v5 1/4] printk/nmi: generic solution for safe printk in NMI
@ 2017-04-27 16:14           ` Steven Rostedt
  0 siblings, 0 replies; 97+ messages in thread
From: Steven Rostedt @ 2017-04-27 16:14 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, 20 Apr 2017 15:11:54 +0200
Petr Mladek <pmladek@suse.com> wrote:



> 
> >From c530d9dee91c74db5e6a198479e2e63b24cb84a2 Mon Sep 17 00:00:00 2001  
> From: Petr Mladek <pmladek@suse.com>
> Date: Thu, 20 Apr 2017 10:52:31 +0200
> Subject: [PATCH] printk: Use the main logbuf in NMI when logbuf_lock is
>  available

I tried this patch. It's better because I get the end of the trace, but
I do lose the beginning of it:

** 196358 printk messages dropped ** [  102.321182]     perf-5981    0.... 12983650us : d_path <-seq_path

The way I tested it was by adding this:

Index: linux-trace.git/kernel/trace/trace_functions.c
===================================================================
--- linux-trace.git.orig/kernel/trace/trace_functions.c
+++ linux-trace.git/kernel/trace/trace_functions.c
@@ -469,8 +469,11 @@ ftrace_cpudump_probe(unsigned long ip, u
 		     struct trace_array *tr, struct ftrace_probe_ops *ops,
 		     void *data)
 {
-	if (update_count(ops, ip, data))
-		ftrace_dump(DUMP_ORIG);
+	char *killer = NULL;
+
+	panic_on_oops = 1;	/* force panic */
+	wmb();
+	*killer = 1;
 }
 
 static int


Then doing the following:

# echo 1 > /proc/sys/kernel/ftrace_dump_on_oops 
# trace-cmd start -p function
# echo nmi_handle:cpudump > /debug/tracing/set_ftrace_filter
# perf record -c 100 -a sleep 1

And that triggers the crash.

-- Steve

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH v5 1/4] printk/nmi: generic solution for safe printk in NMI
  2017-04-20 13:11         ` Petr Mladek
  (?)
@ 2017-04-28  1:25           ` Sergey Senozhatsky
  -1 siblings, 0 replies; 97+ messages in thread
From: Sergey Senozhatsky @ 2017-04-28  1:25 UTC (permalink / raw)
  To: linux-arm-kernel


On (04/20/17 15:11), Petr Mladek wrote:
[..]
>  void printk_nmi_enter(void)
>  {
> -	this_cpu_or(printk_context, PRINTK_NMI_CONTEXT_MASK);
> +	/*
> +	 * The size of the extra per-CPU buffer is limited. Use it
> +	 * only when really needed.
> +	 */
> +	if (this_cpu_read(printk_context) & PRINTK_SAFE_CONTEXT_MASK ||
> +	    raw_spin_is_locked(&logbuf_lock)) {

can we please have && here?


[..]
> diff --git a/lib/nmi_backtrace.c b/lib/nmi_backtrace.c
> index 4e8a30d1c22f..0bc0a3535a8a 100644
> --- a/lib/nmi_backtrace.c
> +++ b/lib/nmi_backtrace.c
> @@ -86,9 +86,11 @@ void nmi_trigger_cpumask_backtrace(const cpumask_t *mask,
>  
>  bool nmi_cpu_backtrace(struct pt_regs *regs)
>  {
> +	static arch_spinlock_t lock = __ARCH_SPIN_LOCK_UNLOCKED;
>  	int cpu = smp_processor_id();
>  
>  	if (cpumask_test_cpu(cpu, to_cpumask(backtrace_mask))) {
> +		arch_spin_lock(&lock);
>  		if (regs && cpu_in_idle(instruction_pointer(regs))) {
>  			pr_warn("NMI backtrace for cpu %d skipped: idling at pc %#lx\n",
>  				cpu, instruction_pointer(regs));
> @@ -99,6 +101,7 @@ bool nmi_cpu_backtrace(struct pt_regs *regs)
>  			else
>  				dump_stack();
>  		}
> +		arch_spin_unlock(&lock);
>  		cpumask_clear_cpu(cpu, to_cpumask(backtrace_mask));
>  		return true;
>  	}

can the nmi_backtrace part be a patch on its own?

	-ss

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH v5 1/4] printk/nmi: generic solution for safe printk in NMI
@ 2017-04-28  1:25           ` Sergey Senozhatsky
  0 siblings, 0 replies; 97+ messages in thread
From: Sergey Senozhatsky @ 2017-04-28  1:25 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Andrew Morton,
	Peter Zijlstra, Russell King, Daniel Thompson, Jiri Kosina,
	Ingo Molnar, Thomas Gleixner, Chris Metcalf, linux-kernel, x86,
	linux-arm-kernel, adi-buildroot-devel, linux-cris-kernel,
	linux-mips, linuxppc-dev, linux-s390, linux-sh, sparclinux,
	Jan Kara, Ralf Baechle, Benjamin Herrenschmidt,
	Martin Schwidefsky, David Miller


On (04/20/17 15:11), Petr Mladek wrote:
[..]
>  void printk_nmi_enter(void)
>  {
> -	this_cpu_or(printk_context, PRINTK_NMI_CONTEXT_MASK);
> +	/*
> +	 * The size of the extra per-CPU buffer is limited. Use it
> +	 * only when really needed.
> +	 */
> +	if (this_cpu_read(printk_context) & PRINTK_SAFE_CONTEXT_MASK ||
> +	    raw_spin_is_locked(&logbuf_lock)) {

can we please have && here?


[..]
> diff --git a/lib/nmi_backtrace.c b/lib/nmi_backtrace.c
> index 4e8a30d1c22f..0bc0a3535a8a 100644
> --- a/lib/nmi_backtrace.c
> +++ b/lib/nmi_backtrace.c
> @@ -86,9 +86,11 @@ void nmi_trigger_cpumask_backtrace(const cpumask_t *mask,
>  
>  bool nmi_cpu_backtrace(struct pt_regs *regs)
>  {
> +	static arch_spinlock_t lock = __ARCH_SPIN_LOCK_UNLOCKED;
>  	int cpu = smp_processor_id();
>  
>  	if (cpumask_test_cpu(cpu, to_cpumask(backtrace_mask))) {
> +		arch_spin_lock(&lock);
>  		if (regs && cpu_in_idle(instruction_pointer(regs))) {
>  			pr_warn("NMI backtrace for cpu %d skipped: idling at pc %#lx\n",
>  				cpu, instruction_pointer(regs));
> @@ -99,6 +101,7 @@ bool nmi_cpu_backtrace(struct pt_regs *regs)
>  			else
>  				dump_stack();
>  		}
> +		arch_spin_unlock(&lock);
>  		cpumask_clear_cpu(cpu, to_cpumask(backtrace_mask));
>  		return true;
>  	}

can the nmi_backtrace part be a patch on its own?

	-ss

^ permalink raw reply	[flat|nested] 97+ messages in thread

* [PATCH v5 1/4] printk/nmi: generic solution for safe printk in NMI
@ 2017-04-28  1:25           ` Sergey Senozhatsky
  0 siblings, 0 replies; 97+ messages in thread
From: Sergey Senozhatsky @ 2017-04-28  1:25 UTC (permalink / raw)
  To: linux-arm-kernel


On (04/20/17 15:11), Petr Mladek wrote:
[..]
>  void printk_nmi_enter(void)
>  {
> -	this_cpu_or(printk_context, PRINTK_NMI_CONTEXT_MASK);
> +	/*
> +	 * The size of the extra per-CPU buffer is limited. Use it
> +	 * only when really needed.
> +	 */
> +	if (this_cpu_read(printk_context) & PRINTK_SAFE_CONTEXT_MASK ||
> +	    raw_spin_is_locked(&logbuf_lock)) {

can we please have && here?


[..]
> diff --git a/lib/nmi_backtrace.c b/lib/nmi_backtrace.c
> index 4e8a30d1c22f..0bc0a3535a8a 100644
> --- a/lib/nmi_backtrace.c
> +++ b/lib/nmi_backtrace.c
> @@ -86,9 +86,11 @@ void nmi_trigger_cpumask_backtrace(const cpumask_t *mask,
>  
>  bool nmi_cpu_backtrace(struct pt_regs *regs)
>  {
> +	static arch_spinlock_t lock = __ARCH_SPIN_LOCK_UNLOCKED;
>  	int cpu = smp_processor_id();
>  
>  	if (cpumask_test_cpu(cpu, to_cpumask(backtrace_mask))) {
> +		arch_spin_lock(&lock);
>  		if (regs && cpu_in_idle(instruction_pointer(regs))) {
>  			pr_warn("NMI backtrace for cpu %d skipped: idling at pc %#lx\n",
>  				cpu, instruction_pointer(regs));
> @@ -99,6 +101,7 @@ bool nmi_cpu_backtrace(struct pt_regs *regs)
>  			else
>  				dump_stack();
>  		}
> +		arch_spin_unlock(&lock);
>  		cpumask_clear_cpu(cpu, to_cpumask(backtrace_mask));
>  		return true;
>  	}

can the nmi_backtrace part be a patch on its own?

	-ss

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH v5 1/4] printk/nmi: generic solution for safe printk in NMI
  2017-04-27 16:14           ` Steven Rostedt
  (?)
@ 2017-04-28  1:35             ` Sergey Senozhatsky
  -1 siblings, 0 replies; 97+ messages in thread
From: Sergey Senozhatsky @ 2017-04-28  1:35 UTC (permalink / raw)
  To: linux-arm-kernel

On (04/27/17 12:14), Steven Rostedt wrote:
[..]
> I tried this patch. It's better because I get the end of the trace, but
> I do lose the beginning of it:
> 
> ** 196358 printk messages dropped ** [  102.321182]     perf-5981    0.... 12983650us : d_path <-seq_path

many thanks!

so we now drop messages from logbuf, not from per-CPU buffers. that
"queue printk_deferred irq_work on every online CPU when we bypass per-CPU
buffers from NMI" idea *probably* might help here - we need someone to emit
messages from the logbuf while we printk from NMI. there is still a
possibility that we can drop messages, though, since log_store() from NMI
CPU can be much-much faster than call_console_drivers() on other CPU.

	-ss

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH v5 1/4] printk/nmi: generic solution for safe printk in NMI
@ 2017-04-28  1:35             ` Sergey Senozhatsky
  0 siblings, 0 replies; 97+ messages in thread
From: Sergey Senozhatsky @ 2017-04-28  1:35 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Petr Mladek, Sergey Senozhatsky, Andrew Morton, Peter Zijlstra,
	Russell King, Daniel Thompson, Jiri Kosina, Ingo Molnar,
	Thomas Gleixner, Chris Metcalf, linux-kernel, x86,
	linux-arm-kernel, adi-buildroot-devel, linux-cris-kernel,
	linux-mips, linuxppc-dev, linux-s390, linux-sh, sparclinux,
	Jan Kara, Ralf Baechle, Benjamin Herrenschmidt,
	Martin Schwidefsky, David Miller

On (04/27/17 12:14), Steven Rostedt wrote:
[..]
> I tried this patch. It's better because I get the end of the trace, but
> I do lose the beginning of it:
> 
> ** 196358 printk messages dropped ** [  102.321182]     perf-5981    0.... 12983650us : d_path <-seq_path

many thanks!

so we now drop messages from logbuf, not from per-CPU buffers. that
"queue printk_deferred irq_work on every online CPU when we bypass per-CPU
buffers from NMI" idea *probably* might help here - we need someone to emit
messages from the logbuf while we printk from NMI. there is still a
possibility that we can drop messages, though, since log_store() from NMI
CPU can be much-much faster than call_console_drivers() on other CPU.

	-ss

^ permalink raw reply	[flat|nested] 97+ messages in thread

* [PATCH v5 1/4] printk/nmi: generic solution for safe printk in NMI
@ 2017-04-28  1:35             ` Sergey Senozhatsky
  0 siblings, 0 replies; 97+ messages in thread
From: Sergey Senozhatsky @ 2017-04-28  1:35 UTC (permalink / raw)
  To: linux-arm-kernel

On (04/27/17 12:14), Steven Rostedt wrote:
[..]
> I tried this patch. It's better because I get the end of the trace, but
> I do lose the beginning of it:
> 
> ** 196358 printk messages dropped ** [  102.321182]     perf-5981    0.... 12983650us : d_path <-seq_path

many thanks!

so we now drop messages from logbuf, not from per-CPU buffers. that
"queue printk_deferred irq_work on every online CPU when we bypass per-CPU
buffers from NMI" idea *probably* might help here - we need someone to emit
messages from the logbuf while we printk from NMI. there is still a
possibility that we can drop messages, though, since log_store() from NMI
CPU can be much-much faster than call_console_drivers() on other CPU.

	-ss

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH v5 1/4] printk/nmi: generic solution for safe printk in NMI
  2017-04-27 13:38                 ` Petr Mladek
  (?)
@ 2017-04-28  9:02                   ` Peter Zijlstra
  -1 siblings, 0 replies; 97+ messages in thread
From: Peter Zijlstra @ 2017-04-28  9:02 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Apr 27, 2017 at 03:38:19PM +0200, Petr Mladek wrote:
> Also we need to look for alternatives. There is a chance
> to create crashdump and get the ftrace messages from it.
> Also this might be scenario when we might need to suggest
> the early_printk() patchset from Peter Zijlstra.

I'd be happy to repost those. I still carry them in my tree.

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH v5 1/4] printk/nmi: generic solution for safe printk in NMI
@ 2017-04-28  9:02                   ` Peter Zijlstra
  0 siblings, 0 replies; 97+ messages in thread
From: Peter Zijlstra @ 2017-04-28  9:02 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Andrew Morton, Russell King,
	Daniel Thompson, Jiri Kosina, Ingo Molnar, Thomas Gleixner,
	Chris Metcalf, linux-kernel, x86, linux-arm-kernel,
	adi-buildroot-devel, linux-cris-kernel, linux-mips, linuxppc-dev,
	linux-s390, linux-sh, sparclinux, Jan Kara, Ralf Baechle,
	Benjamin Herrenschmidt, Martin Schwidefsky, David Miller

On Thu, Apr 27, 2017 at 03:38:19PM +0200, Petr Mladek wrote:
> Also we need to look for alternatives. There is a chance
> to create crashdump and get the ftrace messages from it.
> Also this might be scenario when we might need to suggest
> the early_printk() patchset from Peter Zijlstra.

I'd be happy to repost those. I still carry them in my tree.

^ permalink raw reply	[flat|nested] 97+ messages in thread

* [PATCH v5 1/4] printk/nmi: generic solution for safe printk in NMI
@ 2017-04-28  9:02                   ` Peter Zijlstra
  0 siblings, 0 replies; 97+ messages in thread
From: Peter Zijlstra @ 2017-04-28  9:02 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Apr 27, 2017 at 03:38:19PM +0200, Petr Mladek wrote:
> Also we need to look for alternatives. There is a chance
> to create crashdump and get the ftrace messages from it.
> Also this might be scenario when we might need to suggest
> the early_printk() patchset from Peter Zijlstra.

I'd be happy to repost those. I still carry them in my tree.

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH v5 1/4] printk/nmi: generic solution for safe printk in NMI
  2017-04-28  1:25           ` Sergey Senozhatsky
  (?)
@ 2017-04-28 12:38             ` Petr Mladek
  -1 siblings, 0 replies; 97+ messages in thread
From: Petr Mladek @ 2017-04-28 12:38 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri 2017-04-28 10:25:30, Sergey Senozhatsky wrote:
> 
> On (04/20/17 15:11), Petr Mladek wrote:
> [..]
> >  void printk_nmi_enter(void)
> >  {
> > -	this_cpu_or(printk_context, PRINTK_NMI_CONTEXT_MASK);
> > +	/*
> > +	 * The size of the extra per-CPU buffer is limited. Use it
> > +	 * only when really needed.
> > +	 */
> > +	if (this_cpu_read(printk_context) & PRINTK_SAFE_CONTEXT_MASK ||
> > +	    raw_spin_is_locked(&logbuf_lock)) {
> 
> can we please have && here?

OK, it sounds reasonable after all.

> [..]
> > diff --git a/lib/nmi_backtrace.c b/lib/nmi_backtrace.c
> > index 4e8a30d1c22f..0bc0a3535a8a 100644
> > --- a/lib/nmi_backtrace.c
> > +++ b/lib/nmi_backtrace.c
> > @@ -86,9 +86,11 @@ void nmi_trigger_cpumask_backtrace(const cpumask_t *mask,
> >  
> >  bool nmi_cpu_backtrace(struct pt_regs *regs)
> >  {
> > +	static arch_spinlock_t lock = __ARCH_SPIN_LOCK_UNLOCKED;
> >  	int cpu = smp_processor_id();
> >  
> >  	if (cpumask_test_cpu(cpu, to_cpumask(backtrace_mask))) {
> > +		arch_spin_lock(&lock);
> >  		if (regs && cpu_in_idle(instruction_pointer(regs))) {
> >  			pr_warn("NMI backtrace for cpu %d skipped: idling at pc %#lx\n",
> >  				cpu, instruction_pointer(regs));
> > @@ -99,6 +101,7 @@ bool nmi_cpu_backtrace(struct pt_regs *regs)
> >  			else
> >  				dump_stack();
> >  		}
> > +		arch_spin_unlock(&lock);
> >  		cpumask_clear_cpu(cpu, to_cpumask(backtrace_mask));
> >  		return true;
> >  	}
> 
> can the nmi_backtrace part be a patch on its own?

I would prefer to keep it in the same patch. The backtrace from
all CPUs is completely unusable when all CPUs push to the global
log buffer in parallel. Single patch might safe hair of some
poor bisectors.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH v5 1/4] printk/nmi: generic solution for safe printk in NMI
@ 2017-04-28 12:38             ` Petr Mladek
  0 siblings, 0 replies; 97+ messages in thread
From: Petr Mladek @ 2017-04-28 12:38 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Steven Rostedt, Andrew Morton, Peter Zijlstra, Russell King,
	Daniel Thompson, Jiri Kosina, Ingo Molnar, Thomas Gleixner,
	Chris Metcalf, linux-kernel, x86, linux-arm-kernel,
	adi-buildroot-devel, linux-cris-kernel, linux-mips, linuxppc-dev,
	linux-s390, linux-sh, sparclinux, Jan Kara, Ralf Baechle,
	Benjamin Herrenschmidt, Martin Schwidefsky, David Miller

On Fri 2017-04-28 10:25:30, Sergey Senozhatsky wrote:
> 
> On (04/20/17 15:11), Petr Mladek wrote:
> [..]
> >  void printk_nmi_enter(void)
> >  {
> > -	this_cpu_or(printk_context, PRINTK_NMI_CONTEXT_MASK);
> > +	/*
> > +	 * The size of the extra per-CPU buffer is limited. Use it
> > +	 * only when really needed.
> > +	 */
> > +	if (this_cpu_read(printk_context) & PRINTK_SAFE_CONTEXT_MASK ||
> > +	    raw_spin_is_locked(&logbuf_lock)) {
> 
> can we please have && here?

OK, it sounds reasonable after all.

> [..]
> > diff --git a/lib/nmi_backtrace.c b/lib/nmi_backtrace.c
> > index 4e8a30d1c22f..0bc0a3535a8a 100644
> > --- a/lib/nmi_backtrace.c
> > +++ b/lib/nmi_backtrace.c
> > @@ -86,9 +86,11 @@ void nmi_trigger_cpumask_backtrace(const cpumask_t *mask,
> >  
> >  bool nmi_cpu_backtrace(struct pt_regs *regs)
> >  {
> > +	static arch_spinlock_t lock = __ARCH_SPIN_LOCK_UNLOCKED;
> >  	int cpu = smp_processor_id();
> >  
> >  	if (cpumask_test_cpu(cpu, to_cpumask(backtrace_mask))) {
> > +		arch_spin_lock(&lock);
> >  		if (regs && cpu_in_idle(instruction_pointer(regs))) {
> >  			pr_warn("NMI backtrace for cpu %d skipped: idling at pc %#lx\n",
> >  				cpu, instruction_pointer(regs));
> > @@ -99,6 +101,7 @@ bool nmi_cpu_backtrace(struct pt_regs *regs)
> >  			else
> >  				dump_stack();
> >  		}
> > +		arch_spin_unlock(&lock);
> >  		cpumask_clear_cpu(cpu, to_cpumask(backtrace_mask));
> >  		return true;
> >  	}
> 
> can the nmi_backtrace part be a patch on its own?

I would prefer to keep it in the same patch. The backtrace from
all CPUs is completely unusable when all CPUs push to the global
log buffer in parallel. Single patch might safe hair of some
poor bisectors.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 97+ messages in thread

* [PATCH v5 1/4] printk/nmi: generic solution for safe printk in NMI
@ 2017-04-28 12:38             ` Petr Mladek
  0 siblings, 0 replies; 97+ messages in thread
From: Petr Mladek @ 2017-04-28 12:38 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri 2017-04-28 10:25:30, Sergey Senozhatsky wrote:
> 
> On (04/20/17 15:11), Petr Mladek wrote:
> [..]
> >  void printk_nmi_enter(void)
> >  {
> > -	this_cpu_or(printk_context, PRINTK_NMI_CONTEXT_MASK);
> > +	/*
> > +	 * The size of the extra per-CPU buffer is limited. Use it
> > +	 * only when really needed.
> > +	 */
> > +	if (this_cpu_read(printk_context) & PRINTK_SAFE_CONTEXT_MASK ||
> > +	    raw_spin_is_locked(&logbuf_lock)) {
> 
> can we please have && here?

OK, it sounds reasonable after all.

> [..]
> > diff --git a/lib/nmi_backtrace.c b/lib/nmi_backtrace.c
> > index 4e8a30d1c22f..0bc0a3535a8a 100644
> > --- a/lib/nmi_backtrace.c
> > +++ b/lib/nmi_backtrace.c
> > @@ -86,9 +86,11 @@ void nmi_trigger_cpumask_backtrace(const cpumask_t *mask,
> >  
> >  bool nmi_cpu_backtrace(struct pt_regs *regs)
> >  {
> > +	static arch_spinlock_t lock = __ARCH_SPIN_LOCK_UNLOCKED;
> >  	int cpu = smp_processor_id();
> >  
> >  	if (cpumask_test_cpu(cpu, to_cpumask(backtrace_mask))) {
> > +		arch_spin_lock(&lock);
> >  		if (regs && cpu_in_idle(instruction_pointer(regs))) {
> >  			pr_warn("NMI backtrace for cpu %d skipped: idling at pc %#lx\n",
> >  				cpu, instruction_pointer(regs));
> > @@ -99,6 +101,7 @@ bool nmi_cpu_backtrace(struct pt_regs *regs)
> >  			else
> >  				dump_stack();
> >  		}
> > +		arch_spin_unlock(&lock);
> >  		cpumask_clear_cpu(cpu, to_cpumask(backtrace_mask));
> >  		return true;
> >  	}
> 
> can the nmi_backtrace part be a patch on its own?

I would prefer to keep it in the same patch. The backtrace from
all CPUs is completely unusable when all CPUs push to the global
log buffer in parallel. Single patch might safe hair of some
poor bisectors.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH v5 1/4] printk/nmi: generic solution for safe printk in NMI
  2017-04-28  1:35             ` Sergey Senozhatsky
  (?)
@ 2017-04-28 12:57               ` Petr Mladek
  -1 siblings, 0 replies; 97+ messages in thread
From: Petr Mladek @ 2017-04-28 12:57 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri 2017-04-28 10:35:32, Sergey Senozhatsky wrote:
> On (04/27/17 12:14), Steven Rostedt wrote:
> [..]
> > I tried this patch. It's better because I get the end of the trace, but
> > I do lose the beginning of it:
> > 
> > ** 196358 printk messages dropped ** [  102.321182]     perf-5981    0.... 12983650us : d_path <-seq_path
>
> many thanks!
> 
> so we now drop messages from logbuf, not from per-CPU buffers. that
> "queue printk_deferred irq_work on every online CPU when we bypass per-CPU
> buffers from NMI" idea *probably* might help here - we need someone to emit
> messages from the logbuf while we printk from NMI. there is still a
> possibility that we can drop messages, though, since log_store() from NMI
> CPU can be much-much faster than call_console_drivers() on other CPU.

ftrace log is dumped via trace_panic_notifier. It is done after
smp_send_stop(). It means that only a single CPU is available and
it is NMI context at the moment.

One possibility might be to put printk into a special mode and
drop the last messages instead of the first ones. But this would
need to be configurable.

Of course, if the problem is reproducible, the easiest solution
is to use bigger main log buffer, for example boot with
log_buf_len2M.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH v5 1/4] printk/nmi: generic solution for safe printk in NMI
@ 2017-04-28 12:57               ` Petr Mladek
  0 siblings, 0 replies; 97+ messages in thread
From: Petr Mladek @ 2017-04-28 12:57 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Steven Rostedt, Andrew Morton, Peter Zijlstra, Russell King,
	Daniel Thompson, Jiri Kosina, Ingo Molnar, Thomas Gleixner,
	Chris Metcalf, linux-kernel, x86, linux-arm-kernel,
	adi-buildroot-devel, linux-cris-kernel, linux-mips, linuxppc-dev,
	linux-s390, linux-sh, sparclinux, Jan Kara, Ralf Baechle,
	Benjamin Herrenschmidt, Martin Schwidefsky, David Miller

On Fri 2017-04-28 10:35:32, Sergey Senozhatsky wrote:
> On (04/27/17 12:14), Steven Rostedt wrote:
> [..]
> > I tried this patch. It's better because I get the end of the trace, but
> > I do lose the beginning of it:
> > 
> > ** 196358 printk messages dropped ** [  102.321182]     perf-5981    0.... 12983650us : d_path <-seq_path
>
> many thanks!
> 
> so we now drop messages from logbuf, not from per-CPU buffers. that
> "queue printk_deferred irq_work on every online CPU when we bypass per-CPU
> buffers from NMI" idea *probably* might help here - we need someone to emit
> messages from the logbuf while we printk from NMI. there is still a
> possibility that we can drop messages, though, since log_store() from NMI
> CPU can be much-much faster than call_console_drivers() on other CPU.

ftrace log is dumped via trace_panic_notifier. It is done after
smp_send_stop(). It means that only a single CPU is available and
it is NMI context at the moment.

One possibility might be to put printk into a special mode and
drop the last messages instead of the first ones. But this would
need to be configurable.

Of course, if the problem is reproducible, the easiest solution
is to use bigger main log buffer, for example boot with
log_buf_len=32M.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 97+ messages in thread

* [PATCH v5 1/4] printk/nmi: generic solution for safe printk in NMI
@ 2017-04-28 12:57               ` Petr Mladek
  0 siblings, 0 replies; 97+ messages in thread
From: Petr Mladek @ 2017-04-28 12:57 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri 2017-04-28 10:35:32, Sergey Senozhatsky wrote:
> On (04/27/17 12:14), Steven Rostedt wrote:
> [..]
> > I tried this patch. It's better because I get the end of the trace, but
> > I do lose the beginning of it:
> > 
> > ** 196358 printk messages dropped ** [  102.321182]     perf-5981    0.... 12983650us : d_path <-seq_path
>
> many thanks!
> 
> so we now drop messages from logbuf, not from per-CPU buffers. that
> "queue printk_deferred irq_work on every online CPU when we bypass per-CPU
> buffers from NMI" idea *probably* might help here - we need someone to emit
> messages from the logbuf while we printk from NMI. there is still a
> possibility that we can drop messages, though, since log_store() from NMI
> CPU can be much-much faster than call_console_drivers() on other CPU.

ftrace log is dumped via trace_panic_notifier. It is done after
smp_send_stop(). It means that only a single CPU is available and
it is NMI context at the moment.

One possibility might be to put printk into a special mode and
drop the last messages instead of the first ones. But this would
need to be configurable.

Of course, if the problem is reproducible, the easiest solution
is to use bigger main log buffer, for example boot with
log_buf_len=32M.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH v5 1/4] printk/nmi: generic solution for safe printk in NMI
  2017-04-28  9:02                   ` Peter Zijlstra
  (?)
@ 2017-04-28 13:44                     ` Petr Mladek
  -1 siblings, 0 replies; 97+ messages in thread
From: Petr Mladek @ 2017-04-28 13:44 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri 2017-04-28 11:02:26, Peter Zijlstra wrote:
> On Thu, Apr 27, 2017 at 03:38:19PM +0200, Petr Mladek wrote:
> > Also we need to look for alternatives. There is a chance
> > to create crashdump and get the ftrace messages from it.
> > Also this might be scenario when we might need to suggest
> > the early_printk() patchset from Peter Zijlstra.
> 
> I'd be happy to repost those. I still carry them in my tree.

You do not need to if they are still the same as
https://lkml.kernel.org/r/20161018170830.405990950@infradead.org

I rather do not promise anything but I would like to look at them
within next few weeks (after the merge window).

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH v5 1/4] printk/nmi: generic solution for safe printk in NMI
@ 2017-04-28 13:44                     ` Petr Mladek
  0 siblings, 0 replies; 97+ messages in thread
From: Petr Mladek @ 2017-04-28 13:44 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Sergey Senozhatsky, Steven Rostedt, Andrew Morton, Russell King,
	Daniel Thompson, Jiri Kosina, Ingo Molnar, Thomas Gleixner,
	Chris Metcalf, linux-kernel, x86, linux-arm-kernel,
	adi-buildroot-devel, linux-cris-kernel, linux-mips, linuxppc-dev,
	linux-s390, linux-sh, sparclinux, Jan Kara, Ralf Baechle,
	Benjamin Herrenschmidt, Martin Schwidefsky, David Miller

On Fri 2017-04-28 11:02:26, Peter Zijlstra wrote:
> On Thu, Apr 27, 2017 at 03:38:19PM +0200, Petr Mladek wrote:
> > Also we need to look for alternatives. There is a chance
> > to create crashdump and get the ftrace messages from it.
> > Also this might be scenario when we might need to suggest
> > the early_printk() patchset from Peter Zijlstra.
> 
> I'd be happy to repost those. I still carry them in my tree.

You do not need to if they are still the same as
https://lkml.kernel.org/r/20161018170830.405990950@infradead.org

I rather do not promise anything but I would like to look at them
within next few weeks (after the merge window).

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 97+ messages in thread

* [PATCH v5 1/4] printk/nmi: generic solution for safe printk in NMI
@ 2017-04-28 13:44                     ` Petr Mladek
  0 siblings, 0 replies; 97+ messages in thread
From: Petr Mladek @ 2017-04-28 13:44 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri 2017-04-28 11:02:26, Peter Zijlstra wrote:
> On Thu, Apr 27, 2017 at 03:38:19PM +0200, Petr Mladek wrote:
> > Also we need to look for alternatives. There is a chance
> > to create crashdump and get the ftrace messages from it.
> > Also this might be scenario when we might need to suggest
> > the early_printk() patchset from Peter Zijlstra.
> 
> I'd be happy to repost those. I still carry them in my tree.

You do not need to if they are still the same as
https://lkml.kernel.org/r/20161018170830.405990950 at infradead.org

I rather do not promise anything but I would like to look at them
within next few weeks (after the merge window).

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH v5 1/4] printk/nmi: generic solution for safe printk in NMI
  2017-04-28 13:44                     ` Petr Mladek
  (?)
@ 2017-04-28 13:58                       ` Peter Zijlstra
  -1 siblings, 0 replies; 97+ messages in thread
From: Peter Zijlstra @ 2017-04-28 13:58 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Apr 28, 2017 at 03:44:23PM +0200, Petr Mladek wrote:
> On Fri 2017-04-28 11:02:26, Peter Zijlstra wrote:
> > On Thu, Apr 27, 2017 at 03:38:19PM +0200, Petr Mladek wrote:
> > > Also we need to look for alternatives. There is a chance
> > > to create crashdump and get the ftrace messages from it.
> > > Also this might be scenario when we might need to suggest
> > > the early_printk() patchset from Peter Zijlstra.
> > 
> > I'd be happy to repost those. I still carry them in my tree.
> 
> You do not need to if they are still the same as
> https://lkml.kernel.org/r/20161018170830.405990950@infradead.org

More or less, I think I fixed an intermediate compile fail reported by
you and the 0-day thing and I wrote a few comments in the last patch.

They live here fwiw:

  git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git debug/tmp

> I rather do not promise anything but I would like to look at them
> within next few weeks (after the merge window).

Sure, I'll post them again during/after the merge window.


^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH v5 1/4] printk/nmi: generic solution for safe printk in NMI
@ 2017-04-28 13:58                       ` Peter Zijlstra
  0 siblings, 0 replies; 97+ messages in thread
From: Peter Zijlstra @ 2017-04-28 13:58 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Andrew Morton, Russell King,
	Daniel Thompson, Jiri Kosina, Ingo Molnar, Thomas Gleixner,
	Chris Metcalf, linux-kernel, x86, linux-arm-kernel,
	adi-buildroot-devel, linux-cris-kernel, linux-mips, linuxppc-dev,
	linux-s390, linux-sh, sparclinux, Jan Kara, Ralf Baechle,
	Benjamin Herrenschmidt, Martin Schwidefsky, David Miller

On Fri, Apr 28, 2017 at 03:44:23PM +0200, Petr Mladek wrote:
> On Fri 2017-04-28 11:02:26, Peter Zijlstra wrote:
> > On Thu, Apr 27, 2017 at 03:38:19PM +0200, Petr Mladek wrote:
> > > Also we need to look for alternatives. There is a chance
> > > to create crashdump and get the ftrace messages from it.
> > > Also this might be scenario when we might need to suggest
> > > the early_printk() patchset from Peter Zijlstra.
> > 
> > I'd be happy to repost those. I still carry them in my tree.
> 
> You do not need to if they are still the same as
> https://lkml.kernel.org/r/20161018170830.405990950@infradead.org

More or less, I think I fixed an intermediate compile fail reported by
you and the 0-day thing and I wrote a few comments in the last patch.

They live here fwiw:

  git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git debug/tmp

> I rather do not promise anything but I would like to look at them
> within next few weeks (after the merge window).

Sure, I'll post them again during/after the merge window.

^ permalink raw reply	[flat|nested] 97+ messages in thread

* [PATCH v5 1/4] printk/nmi: generic solution for safe printk in NMI
@ 2017-04-28 13:58                       ` Peter Zijlstra
  0 siblings, 0 replies; 97+ messages in thread
From: Peter Zijlstra @ 2017-04-28 13:58 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Apr 28, 2017 at 03:44:23PM +0200, Petr Mladek wrote:
> On Fri 2017-04-28 11:02:26, Peter Zijlstra wrote:
> > On Thu, Apr 27, 2017 at 03:38:19PM +0200, Petr Mladek wrote:
> > > Also we need to look for alternatives. There is a chance
> > > to create crashdump and get the ftrace messages from it.
> > > Also this might be scenario when we might need to suggest
> > > the early_printk() patchset from Peter Zijlstra.
> > 
> > I'd be happy to repost those. I still carry them in my tree.
> 
> You do not need to if they are still the same as
> https://lkml.kernel.org/r/20161018170830.405990950 at infradead.org

More or less, I think I fixed an intermediate compile fail reported by
you and the 0-day thing and I wrote a few comments in the last patch.

They live here fwiw:

  git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git debug/tmp

> I rather do not promise anything but I would like to look at them
> within next few weeks (after the merge window).

Sure, I'll post them again during/after the merge window.

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH v5 1/4] printk/nmi: generic solution for safe printk in NMI
  2017-04-28 12:57               ` Petr Mladek
  (?)
@ 2017-04-28 14:16                 ` Steven Rostedt
  -1 siblings, 0 replies; 97+ messages in thread
From: Steven Rostedt @ 2017-04-28 14:16 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, 28 Apr 2017 14:57:25 +0200
Petr Mladek <pmladek@suse.com> wrote:


> Of course, if the problem is reproducible, the easiest solution
> is to use bigger main log buffer, for example boot with
> log_buf_len2M.

Of course that may not be enough. Especially when I have a machine with
240 CPUs. But it also has a ton of RAM, I could easily do
log_buf_len2G

-- Steve

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH v5 1/4] printk/nmi: generic solution for safe printk in NMI
@ 2017-04-28 14:16                 ` Steven Rostedt
  0 siblings, 0 replies; 97+ messages in thread
From: Steven Rostedt @ 2017-04-28 14:16 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Andrew Morton, Peter Zijlstra, Russell King,
	Daniel Thompson, Jiri Kosina, Ingo Molnar, Thomas Gleixner,
	Chris Metcalf, linux-kernel, x86, linux-arm-kernel,
	adi-buildroot-devel, linux-cris-kernel, linux-mips, linuxppc-dev,
	linux-s390, linux-sh, sparclinux, Jan Kara, Ralf Baechle,
	Benjamin Herrenschmidt, Martin Schwidefsky, David Miller

On Fri, 28 Apr 2017 14:57:25 +0200
Petr Mladek <pmladek@suse.com> wrote:


> Of course, if the problem is reproducible, the easiest solution
> is to use bigger main log buffer, for example boot with
> log_buf_len=32M.

Of course that may not be enough. Especially when I have a machine with
240 CPUs. But it also has a ton of RAM, I could easily do
log_buf_len=32G

-- Steve

^ permalink raw reply	[flat|nested] 97+ messages in thread

* [PATCH v5 1/4] printk/nmi: generic solution for safe printk in NMI
@ 2017-04-28 14:16                 ` Steven Rostedt
  0 siblings, 0 replies; 97+ messages in thread
From: Steven Rostedt @ 2017-04-28 14:16 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, 28 Apr 2017 14:57:25 +0200
Petr Mladek <pmladek@suse.com> wrote:


> Of course, if the problem is reproducible, the easiest solution
> is to use bigger main log buffer, for example boot with
> log_buf_len=32M.

Of course that may not be enough. Especially when I have a machine with
240 CPUs. But it also has a ton of RAM, I could easily do
log_buf_len=32G

-- Steve

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH v5 1/4] printk/nmi: generic solution for safe printk in NMI
  2017-04-28 13:58                       ` Peter Zijlstra
  (?)
@ 2017-04-28 14:47                         ` Steven Rostedt
  -1 siblings, 0 replies; 97+ messages in thread
From: Steven Rostedt @ 2017-04-28 14:47 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, 28 Apr 2017 15:58:10 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> On Fri, Apr 28, 2017 at 03:44:23PM +0200, Petr Mladek wrote:
> > On Fri 2017-04-28 11:02:26, Peter Zijlstra wrote:  
> > > On Thu, Apr 27, 2017 at 03:38:19PM +0200, Petr Mladek wrote:  
> > > > Also we need to look for alternatives. There is a chance
> > > > to create crashdump and get the ftrace messages from it.
> > > > Also this might be scenario when we might need to suggest
> > > > the early_printk() patchset from Peter Zijlstra.  
> > > 
> > > I'd be happy to repost those. I still carry them in my tree.  
> > 
> > You do not need to if they are still the same as
> > https://lkml.kernel.org/r/20161018170830.405990950@infradead.org  
> 
> More or less, I think I fixed an intermediate compile fail reported by
> you and the 0-day thing and I wrote a few comments in the last patch.
> 
> They live here fwiw:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git debug/tmp
> 
> > I rather do not promise anything but I would like to look at them
> > within next few weeks (after the merge window).  
> 
> Sure, I'll post them again during/after the merge window.

BTW, this is what I ended up using to get the debug information I
needed.

-- Steve

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH v5 1/4] printk/nmi: generic solution for safe printk in NMI
@ 2017-04-28 14:47                         ` Steven Rostedt
  0 siblings, 0 replies; 97+ messages in thread
From: Steven Rostedt @ 2017-04-28 14:47 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Petr Mladek, Sergey Senozhatsky, Andrew Morton, Russell King,
	Daniel Thompson, Jiri Kosina, Ingo Molnar, Thomas Gleixner,
	Chris Metcalf, linux-kernel, x86, linux-arm-kernel,
	adi-buildroot-devel, linux-cris-kernel, linux-mips, linuxppc-dev,
	linux-s390, linux-sh, sparclinux, Jan Kara, Ralf Baechle,
	Benjamin Herrenschmidt, Martin Schwidefsky, David Miller

On Fri, 28 Apr 2017 15:58:10 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> On Fri, Apr 28, 2017 at 03:44:23PM +0200, Petr Mladek wrote:
> > On Fri 2017-04-28 11:02:26, Peter Zijlstra wrote:  
> > > On Thu, Apr 27, 2017 at 03:38:19PM +0200, Petr Mladek wrote:  
> > > > Also we need to look for alternatives. There is a chance
> > > > to create crashdump and get the ftrace messages from it.
> > > > Also this might be scenario when we might need to suggest
> > > > the early_printk() patchset from Peter Zijlstra.  
> > > 
> > > I'd be happy to repost those. I still carry them in my tree.  
> > 
> > You do not need to if they are still the same as
> > https://lkml.kernel.org/r/20161018170830.405990950@infradead.org  
> 
> More or less, I think I fixed an intermediate compile fail reported by
> you and the 0-day thing and I wrote a few comments in the last patch.
> 
> They live here fwiw:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git debug/tmp
> 
> > I rather do not promise anything but I would like to look at them
> > within next few weeks (after the merge window).  
> 
> Sure, I'll post them again during/after the merge window.

BTW, this is what I ended up using to get the debug information I
needed.

-- Steve

^ permalink raw reply	[flat|nested] 97+ messages in thread

* [PATCH v5 1/4] printk/nmi: generic solution for safe printk in NMI
@ 2017-04-28 14:47                         ` Steven Rostedt
  0 siblings, 0 replies; 97+ messages in thread
From: Steven Rostedt @ 2017-04-28 14:47 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, 28 Apr 2017 15:58:10 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> On Fri, Apr 28, 2017 at 03:44:23PM +0200, Petr Mladek wrote:
> > On Fri 2017-04-28 11:02:26, Peter Zijlstra wrote:  
> > > On Thu, Apr 27, 2017 at 03:38:19PM +0200, Petr Mladek wrote:  
> > > > Also we need to look for alternatives. There is a chance
> > > > to create crashdump and get the ftrace messages from it.
> > > > Also this might be scenario when we might need to suggest
> > > > the early_printk() patchset from Peter Zijlstra.  
> > > 
> > > I'd be happy to repost those. I still carry them in my tree.  
> > 
> > You do not need to if they are still the same as
> > https://lkml.kernel.org/r/20161018170830.405990950 at infradead.org  
> 
> More or less, I think I fixed an intermediate compile fail reported by
> you and the 0-day thing and I wrote a few comments in the last patch.
> 
> They live here fwiw:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git debug/tmp
> 
> > I rather do not promise anything but I would like to look at them
> > within next few weeks (after the merge window).  
> 
> Sure, I'll post them again during/after the merge window.

BTW, this is what I ended up using to get the debug information I
needed.

-- Steve

^ permalink raw reply	[flat|nested] 97+ messages in thread

end of thread, other threads:[~2017-04-28 14:47 UTC | newest]

Thread overview: 97+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-04-21 11:48 [PATCH v5 0/4] Cleaning printk stuff in NMI context Petr Mladek
2016-04-21 11:48 ` Petr Mladek
2016-04-21 11:48 ` Petr Mladek
2016-04-21 11:48 ` [PATCH v5 1/4] printk/nmi: generic solution for safe printk in NMI Petr Mladek
2016-04-21 11:48   ` Petr Mladek
2016-04-21 11:48   ` Petr Mladek
2016-04-27  9:31   ` Russell King - ARM Linux
2016-04-27  9:31     ` Russell King - ARM Linux
2016-04-27  9:31     ` Russell King - ARM Linux
2017-04-19 17:13   ` Steven Rostedt
2017-04-19 17:13     ` Steven Rostedt
2017-04-19 17:13     ` Steven Rostedt
2017-04-19 17:21     ` Peter Zijlstra
2017-04-19 17:21       ` Peter Zijlstra
2017-04-19 17:21       ` Peter Zijlstra
2017-04-20  3:31     ` Sergey Senozhatsky
2017-04-20  3:31       ` Sergey Senozhatsky
2017-04-20  3:31       ` Sergey Senozhatsky
2017-04-20 13:11       ` Petr Mladek
2017-04-20 13:11         ` Petr Mladek
2017-04-20 13:11         ` Petr Mladek
2017-04-20 13:11         ` Petr Mladek
2017-04-21  1:57         ` Sergey Senozhatsky
2017-04-21  1:57           ` Sergey Senozhatsky
2017-04-21  1:57           ` Sergey Senozhatsky
2017-04-21 12:06           ` Petr Mladek
2017-04-21 12:06             ` Petr Mladek
2017-04-21 12:06             ` Petr Mladek
2017-04-24  2:17             ` Sergey Senozhatsky
2017-04-24  2:17               ` Sergey Senozhatsky
2017-04-24  2:17               ` Sergey Senozhatsky
2017-04-27 13:38               ` Petr Mladek
2017-04-27 13:38                 ` Petr Mladek
2017-04-27 13:38                 ` Petr Mladek
2017-04-27 14:31                 ` Steven Rostedt
2017-04-27 14:31                   ` Steven Rostedt
2017-04-27 14:31                   ` Steven Rostedt
2017-04-27 15:28                   ` Petr Mladek
2017-04-27 15:28                     ` Petr Mladek
2017-04-27 15:28                     ` Petr Mladek
2017-04-27 15:42                     ` Steven Rostedt
2017-04-27 15:42                       ` Steven Rostedt
2017-04-27 15:42                       ` Steven Rostedt
2017-04-28  9:02                 ` Peter Zijlstra
2017-04-28  9:02                   ` Peter Zijlstra
2017-04-28  9:02                   ` Peter Zijlstra
2017-04-28 13:44                   ` Petr Mladek
2017-04-28 13:44                     ` Petr Mladek
2017-04-28 13:44                     ` Petr Mladek
2017-04-28 13:58                     ` Peter Zijlstra
2017-04-28 13:58                       ` Peter Zijlstra
2017-04-28 13:58                       ` Peter Zijlstra
2017-04-28 14:47                       ` Steven Rostedt
2017-04-28 14:47                         ` Steven Rostedt
2017-04-28 14:47                         ` Steven Rostedt
2017-04-27 16:14         ` Steven Rostedt
2017-04-27 16:14           ` Steven Rostedt
2017-04-27 16:14           ` Steven Rostedt
2017-04-28  1:35           ` Sergey Senozhatsky
2017-04-28  1:35             ` Sergey Senozhatsky
2017-04-28  1:35             ` Sergey Senozhatsky
2017-04-28 12:57             ` Petr Mladek
2017-04-28 12:57               ` Petr Mladek
2017-04-28 12:57               ` Petr Mladek
2017-04-28 14:16               ` Steven Rostedt
2017-04-28 14:16                 ` Steven Rostedt
2017-04-28 14:16                 ` Steven Rostedt
2017-04-28  1:25         ` Sergey Senozhatsky
2017-04-28  1:25           ` Sergey Senozhatsky
2017-04-28  1:25           ` Sergey Senozhatsky
2017-04-28 12:38           ` Petr Mladek
2017-04-28 12:38             ` Petr Mladek
2017-04-28 12:38             ` Petr Mladek
2016-04-21 11:48 ` [PATCH v5 2/4] printk/nmi: warn when some message has been lost in NMI context Petr Mladek
2016-04-21 11:48   ` Petr Mladek
2016-04-21 11:48   ` Petr Mladek
2016-04-27  9:34   ` Russell King - ARM Linux
2016-04-27  9:34     ` Russell King - ARM Linux
2016-04-27  9:34     ` Russell King - ARM Linux
2016-04-21 11:48 ` [PATCH v5 3/4] printk/nmi: increase the size of NMI buffer and make it configurable Petr Mladek
2016-04-21 11:48   ` Petr Mladek
2016-04-21 11:48   ` Petr Mladek
2016-04-21 11:48 ` [PATCH v5 4/4] printk/nmi: flush NMI messages on the system panic Petr Mladek
2016-04-21 11:48   ` Petr Mladek
2016-04-21 11:48   ` Petr Mladek
2016-04-23  3:49   ` Sergey Senozhatsky
2016-04-23  3:49     ` Sergey Senozhatsky
2016-04-23  3:49     ` Sergey Senozhatsky
2016-04-26 14:21     ` Petr Mladek
2016-04-26 14:21       ` Petr Mladek
2016-04-26 14:21       ` Petr Mladek
2016-04-27  0:34       ` Sergey Senozhatsky
2016-04-27  0:34         ` Sergey Senozhatsky
2016-04-27  0:34         ` Sergey Senozhatsky
2016-04-27  0:36 ` [PATCH v5 0/4] Cleaning printk stuff in NMI context Sergey Senozhatsky
2016-04-27  0:36   ` Sergey Senozhatsky
2016-04-27  0:36   ` Sergey Senozhatsky

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.