linuxppc-dev.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v4 0/5] Cleaning printk stuff in NMI context
@ 2016-03-30 15:53 Petr Mladek
  2016-03-30 15:53 ` [PATCH v4 1/5] printk/nmi: generic solution for safe printk in NMI Petr Mladek
                   ` (4 more replies)
  0 siblings, 5 replies; 13+ messages in thread
From: Petr Mladek @ 2016-03-30 15:53 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Peter Zijlstra, Steven Rostedt, Russell King, Daniel Thompson,
	Jiri Kosina, Ingo Molnar, Thomas Gleixner, Sergey Senozhatsky,
	Chris Metcalf, linux-kernel, x86, linux-arm-kernel,
	adi-buildroot-devel, linux-cris-kernel, linux-mips, linuxppc-dev,
	linux-s390, linux-sh, sparclinux, Petr Mladek

This patch set generalizes the already existing solution for
printing NMI messages. The main idea comes from Peter Zijlstra.

v4 adds one more patch that flushes NNI buffers when the system
is panicing. Otherwise, it is almost the same as the last version
in the -mm tree, see changes below for more details

There is a trivial conflict with Sergey's v8 of async printk,
see http://thread.gmane.org/gmane.linux.kernel/2183611
In short, we must handle console in vprintk_emit() neither in sched
nor in nmi. Therefore we need the line

	if (!in_sched && !deferred_console_in_nmi()) {


There are move conflicts with the nmi_backtrace improvements
from Chris Metcalf, see
 http://marc.info/?l=linux-kernel&m=145866814901740&w=2
Feel free to ask me to resolve them.


Changes against v3:

  + merged all small changes from -mm tree, including commit
    descriptions

  + disabled interrupts when taking the read_lock in __printk_nmi_flush();
    printk_nmi_flush() might be called from any context; reported
    by lockdep

  + never introduce NEED_PRINTK_NMI; in -mm tree was introduced
    in 1st patch and removed in the 4th one

  + flush NMI buffers when the system goes down (new 5th patch);
    addresses Daniel's concerns


Changes against v2:

  + fixed compilation problems reported by 0-day build robot

  + MN10300 and Xtensa architectures will get handled separately

  + dropped the patch that printed NMI messages directly when Oops
    in progress; it made the solution less reliable

  + made the size of the buffer configurable; use real numbers
    instead of PAGE_SIZE


Changes against v1:

  + rebased on top of 4.4-rc2; there the old implementation was
    moved to lib/nmi_backtrace.c and used also on arm; I hope that
    I got the arm side correctly; I was not able to test on arm :-(

  + defined HAVE_NMI on arm for !CPU_V7M instead of !CPU_V7;
    handle_fiq_as_nmi() is called from entry-armv.S that
    is compiled when !CPU_V7M

  + defined HAVE_NMI also on mips; it calls nmi_enter() and
    seems to have real NMIs (or am I wrong?)

  + serialized backtraces when printing directly
    (oops_in_progress)


Petr Mladek (5):
  printk/nmi: generic solution for safe printk in NMI
  printk/nmi: use IRQ work only when ready
  printk/nmi: warn when some message has been lost in NMI context
  printk/nmi: increase the size of NMI buffer and make it configurable
  printk/nmi: flush NMI messages on the system panic

 arch/Kconfig                  |   4 +
 arch/arm/Kconfig              |   1 +
 arch/arm/kernel/smp.c         |   2 +
 arch/avr32/Kconfig            |   1 +
 arch/blackfin/Kconfig         |   1 +
 arch/cris/Kconfig             |   1 +
 arch/mips/Kconfig             |   1 +
 arch/powerpc/Kconfig          |   1 +
 arch/s390/Kconfig             |   1 +
 arch/sh/Kconfig               |   1 +
 arch/sparc/Kconfig            |   1 +
 arch/tile/Kconfig             |   1 +
 arch/x86/Kconfig              |   1 +
 arch/x86/kernel/apic/hw_nmi.c |   1 -
 include/linux/hardirq.h       |   2 +
 include/linux/percpu.h        |   3 -
 include/linux/printk.h        |  16 ++-
 init/Kconfig                  |  27 +++++
 init/main.c                   |   1 +
 kernel/kexec_core.c           |   1 +
 kernel/panic.c                |   6 +-
 kernel/printk/Makefile        |   1 +
 kernel/printk/internal.h      |  57 ++++++++++
 kernel/printk/nmi.c           | 258 ++++++++++++++++++++++++++++++++++++++++++
 kernel/printk/printk.c        |  43 +++----
 lib/nmi_backtrace.c           |  89 +--------------
 26 files changed, 411 insertions(+), 111 deletions(-)
 create mode 100644 kernel/printk/internal.h
 create mode 100644 kernel/printk/nmi.c

-- 
1.8.5.6

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH v4 1/5] printk/nmi: generic solution for safe printk in NMI
  2016-03-30 15:53 [PATCH v4 0/5] Cleaning printk stuff in NMI context Petr Mladek
@ 2016-03-30 15:53 ` Petr Mladek
  2016-04-04  4:49   ` Sergey Senozhatsky
  2016-03-30 15:53 ` [PATCH v4 2/5] printk/nmi: use IRQ work only when ready Petr Mladek
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 13+ messages in thread
From: Petr Mladek @ 2016-03-30 15:53 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Peter Zijlstra, Steven Rostedt, Russell King, Daniel Thompson,
	Jiri Kosina, Ingo Molnar, Thomas Gleixner, Sergey Senozhatsky,
	Chris Metcalf, linux-kernel, x86, linux-arm-kernel,
	adi-buildroot-devel, linux-cris-kernel, linux-mips, linuxppc-dev,
	linux-s390, linux-sh, sparclinux, Petr Mladek, Jan Kara,
	Ralf Baechle, Benjamin Herrenschmidt, Martin Schwidefsky,
	David Miller

printk() takes some locks and could not be used a safe way in NMI context.

The chance of a deadlock is real especially when printing stacks from all
CPUs.  This particular problem has been addressed on x86 by the commit
a9edc8809328 ("x86/nmi: Perform a safe NMI stack trace on all CPUs").

The patchset brings two big advantages.  First, it makes the NMI
backtraces safe on all architectures for free.  Second, it makes all NMI
messages almost safe on all architectures (the temporary buffer is
limited.  We still should keep the number of messages in NMI context at
minimum).

Note that there already are several messages printed in NMI context:
WARN_ON(in_nmi()), BUG_ON(in_nmi()), anything being printed out from MCE
handlers.  These are not easy to avoid.

This patch reuses most of the code and makes it generic.  It is useful for
all messages and architectures that support NMI.

The alternative printk_func is set when entering and is reseted when
leaving NMI context.  It queues IRQ work to copy the messages into the
main ring buffer in a safe context.

__printk_nmi_flush() copies all available messages and reset the buffer.
Then we could use a simple cmpxchg operations to get synchronized with
writers.  There is also used a spinlock to get synchronized with other
flushers.

We do not longer use seq_buf because it depends on external lock.  It
would be hard to make all supported operations safe for a lockless use.
It would be confusing and error prone to make only some operations safe.

The code is put into separate printk/nmi.c as suggested by Steven Rostedt.
It needs a per-CPU buffer and is compiled only on architectures that call
nmi_enter().  This is achieved by the new HAVE_NMI Kconfig flag.

The are MN10300 and Xtensa architectures.  We need to clean up NMI
handling there first.  Let's do it separately.

The patch is heavily based on the draft from Peter Zijlstra, see
https://lkml.org/lkml/2015/6/10/327

[arnd@arndb.de: printk-nmi: use %zu format string for size_t]
[akpm@linux-foundation.org: min_t->min - all types are size_t here]
Signed-off-by: Petr Mladek <pmladek@suse.com>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: Daniel Thompson <daniel.thompson@linaro.org>
Cc: Jiri Kosina <jkosina@suse.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
 arch/Kconfig                  |   4 +
 arch/arm/Kconfig              |   1 +
 arch/arm/kernel/smp.c         |   2 +
 arch/avr32/Kconfig            |   1 +
 arch/blackfin/Kconfig         |   1 +
 arch/cris/Kconfig             |   1 +
 arch/mips/Kconfig             |   1 +
 arch/powerpc/Kconfig          |   1 +
 arch/s390/Kconfig             |   1 +
 arch/sh/Kconfig               |   1 +
 arch/sparc/Kconfig            |   1 +
 arch/tile/Kconfig             |   1 +
 arch/x86/Kconfig              |   1 +
 arch/x86/kernel/apic/hw_nmi.c |   1 -
 include/linux/hardirq.h       |   2 +
 include/linux/percpu.h        |   3 -
 include/linux/printk.h        |  12 ++-
 init/Kconfig                  |   5 +
 init/main.c                   |   1 +
 kernel/printk/Makefile        |   1 +
 kernel/printk/internal.h      |  44 +++++++++
 kernel/printk/nmi.c           | 208 ++++++++++++++++++++++++++++++++++++++++++
 kernel/printk/printk.c        |  19 +---
 lib/nmi_backtrace.c           |  89 +-----------------
 24 files changed, 295 insertions(+), 107 deletions(-)
 create mode 100644 kernel/printk/internal.h
 create mode 100644 kernel/printk/nmi.c

diff --git a/arch/Kconfig b/arch/Kconfig
index 81869a5e7e17..049f243a7af8 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -187,7 +187,11 @@ config HAVE_OPTPROBES
 config HAVE_KPROBES_ON_FTRACE
 	bool
 
+config HAVE_NMI
+	bool
+
 config HAVE_NMI_WATCHDOG
+	depends on HAVE_NMI
 	bool
 #
 # An arch should select this if it provides all these things:
diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index cdfa6c2b7626..259543ec6dc9 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -66,6 +66,7 @@ config ARM
 	select HAVE_KRETPROBES if (HAVE_KPROBES)
 	select HAVE_MEMBLOCK
 	select HAVE_MOD_ARCH_SPECIFIC
+	select HAVE_NMI
 	select HAVE_OPROFILE if (HAVE_PERF_EVENTS)
 	select HAVE_OPTPROBES if !THUMB2_KERNEL
 	select HAVE_PERF_EVENTS
diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c
index baee70267f29..2dfb52a0263b 100644
--- a/arch/arm/kernel/smp.c
+++ b/arch/arm/kernel/smp.c
@@ -645,7 +645,9 @@ void handle_IPI(int ipinr, struct pt_regs *regs)
 
 	case IPI_CPU_BACKTRACE:
 		irq_enter();
+		printk_nmi_enter();
 		nmi_cpu_backtrace(regs);
+		printk_nmi_exit();
 		irq_exit();
 		break;
 
diff --git a/arch/avr32/Kconfig b/arch/avr32/Kconfig
index b6878eb64884..9dc3e2b1180b 100644
--- a/arch/avr32/Kconfig
+++ b/arch/avr32/Kconfig
@@ -17,6 +17,7 @@ config AVR32
 	select GENERIC_CLOCKEVENTS
 	select HAVE_MOD_ARCH_SPECIFIC
 	select MODULES_USE_ELF_RELA
+	select HAVE_NMI
 	help
 	  AVR32 is a high-performance 32-bit RISC microprocessor core,
 	  designed for cost-sensitive embedded applications, with particular
diff --git a/arch/blackfin/Kconfig b/arch/blackfin/Kconfig
index a63c12259e77..28c63fea786d 100644
--- a/arch/blackfin/Kconfig
+++ b/arch/blackfin/Kconfig
@@ -40,6 +40,7 @@ config BLACKFIN
 	select HAVE_MOD_ARCH_SPECIFIC
 	select MODULES_USE_ELF_RELA
 	select HAVE_DEBUG_STACKOVERFLOW
+	select HAVE_NMI
 
 config GENERIC_CSUM
 	def_bool y
diff --git a/arch/cris/Kconfig b/arch/cris/Kconfig
index e086f9e93728..62148eaf0189 100644
--- a/arch/cris/Kconfig
+++ b/arch/cris/Kconfig
@@ -69,6 +69,7 @@ config CRIS
 	select GENERIC_CLOCKEVENTS if ETRAX_ARCH_V32
 	select GENERIC_SCHED_CLOCK if ETRAX_ARCH_V32
 	select HAVE_DEBUG_BUGVERBOSE if ETRAX_ARCH_V32
+	select HAVE_NMI
 
 config HZ
 	int
diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig
index 2018c2b0e078..7de958bb19f1 100644
--- a/arch/mips/Kconfig
+++ b/arch/mips/Kconfig
@@ -62,6 +62,7 @@ config MIPS
 	select HAVE_IRQ_TIME_ACCOUNTING
 	select GENERIC_TIME_VSYSCALL
 	select ARCH_CLOCKSOURCE_DATA
+	select HAVE_NMI
 
 menu "Machine selection"
 
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 7cd32c038286..f99b19d2d572 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -153,6 +153,7 @@ config PPC
 	select NO_BOOTMEM
 	select HAVE_GENERIC_RCU_GUP
 	select HAVE_PERF_EVENTS_NMI if PPC64
+	select HAVE_NMI if PERF_EVENTS
 	select EDAC_SUPPORT
 	select EDAC_ATOMIC_SCRUB
 	select ARCH_HAS_DMA_SET_COHERENT_MASK
diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
index b9df8d11d7a9..46868f5c8ae6 100644
--- a/arch/s390/Kconfig
+++ b/arch/s390/Kconfig
@@ -158,6 +158,7 @@ config S390
 	select TTY
 	select VIRT_CPU_ACCOUNTING
 	select VIRT_TO_BUS
+	select HAVE_NMI
 
 
 config SCHED_OMIT_FRAME_POINTER
diff --git a/arch/sh/Kconfig b/arch/sh/Kconfig
index 7ed20fc3fc81..1725e365d751 100644
--- a/arch/sh/Kconfig
+++ b/arch/sh/Kconfig
@@ -44,6 +44,7 @@ config SUPERH
 	select OLD_SIGSUSPEND
 	select OLD_SIGACTION
 	select HAVE_ARCH_AUDITSYSCALL
+	select HAVE_NMI
 	help
 	  The SuperH is a RISC processor targeted for use in embedded systems
 	  and consumer electronics; it was also used in the Sega Dreamcast
diff --git a/arch/sparc/Kconfig b/arch/sparc/Kconfig
index 57ffaf285c2f..a900e7f35f6a 100644
--- a/arch/sparc/Kconfig
+++ b/arch/sparc/Kconfig
@@ -79,6 +79,7 @@ config SPARC64
 	select NO_BOOTMEM
 	select HAVE_ARCH_AUDITSYSCALL
 	select ARCH_SUPPORTS_ATOMIC_RMW
+	select HAVE_NMI
 
 config ARCH_DEFCONFIG
 	string
diff --git a/arch/tile/Kconfig b/arch/tile/Kconfig
index 81719302b056..cedcbbefcd1c 100644
--- a/arch/tile/Kconfig
+++ b/arch/tile/Kconfig
@@ -29,6 +29,7 @@ config TILE
 	select HAVE_DEBUG_STACKOVERFLOW
 	select ARCH_WANT_FRAME_POINTERS
 	select HAVE_CONTEXT_TRACKING
+	select HAVE_NMI if USE_PMC
 	select EDAC_SUPPORT
 	select GENERIC_STRNCPY_FROM_USER
 	select GENERIC_STRNLEN_USER
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 2dc18605831f..30ea1f834a06 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -130,6 +130,7 @@ config X86
 	select HAVE_MEMBLOCK
 	select HAVE_MEMBLOCK_NODE_MAP
 	select HAVE_MIXED_BREAKPOINTS_REGS
+	select HAVE_NMI
 	select HAVE_OPROFILE
 	select HAVE_OPTPROBES
 	select HAVE_PCSPKR_PLATFORM
diff --git a/arch/x86/kernel/apic/hw_nmi.c b/arch/x86/kernel/apic/hw_nmi.c
index 045e424fb368..7788ce643bf4 100644
--- a/arch/x86/kernel/apic/hw_nmi.c
+++ b/arch/x86/kernel/apic/hw_nmi.c
@@ -18,7 +18,6 @@
 #include <linux/nmi.h>
 #include <linux/module.h>
 #include <linux/delay.h>
-#include <linux/seq_buf.h>
 
 #ifdef CONFIG_HARDLOCKUP_DETECTOR
 u64 hw_nmi_get_sample_period(int watchdog_thresh)
diff --git a/include/linux/hardirq.h b/include/linux/hardirq.h
index dfd59d6bc6f0..a477e0766d2e 100644
--- a/include/linux/hardirq.h
+++ b/include/linux/hardirq.h
@@ -67,10 +67,12 @@ extern void irq_exit(void);
 		preempt_count_add(NMI_OFFSET + HARDIRQ_OFFSET);	\
 		rcu_nmi_enter();				\
 		trace_hardirq_enter();				\
+		printk_nmi_enter();				\
 	} while (0)
 
 #define nmi_exit()						\
 	do {							\
+		printk_nmi_exit();				\
 		trace_hardirq_exit();				\
 		rcu_nmi_exit();					\
 		BUG_ON(!in_nmi());				\
diff --git a/include/linux/percpu.h b/include/linux/percpu.h
index 4bc6dafb703e..56939d3f6e53 100644
--- a/include/linux/percpu.h
+++ b/include/linux/percpu.h
@@ -129,7 +129,4 @@ extern phys_addr_t per_cpu_ptr_to_phys(void *addr);
 	(typeof(type) __percpu *)__alloc_percpu(sizeof(type),		\
 						__alignof__(type))
 
-/* To avoid include hell, as printk can not declare this, we declare it here */
-DECLARE_PER_CPU(printk_func_t, printk_func);
-
 #endif /* __LINUX_PERCPU_H */
diff --git a/include/linux/printk.h b/include/linux/printk.h
index 9ccbdf2c1453..51dd6b824fe2 100644
--- a/include/linux/printk.h
+++ b/include/linux/printk.h
@@ -122,7 +122,17 @@ static inline __printf(1, 2) __cold
 void early_printk(const char *s, ...) { }
 #endif
 
-typedef __printf(1, 0) int (*printk_func_t)(const char *fmt, va_list args);
+#ifdef CONFIG_PRINTK_NMI
+extern void printk_nmi_init(void);
+extern void printk_nmi_enter(void);
+extern void printk_nmi_exit(void);
+extern void printk_nmi_flush(void);
+#else
+static inline void printk_nmi_init(void) { }
+static inline void printk_nmi_enter(void) { }
+static inline void printk_nmi_exit(void) { }
+static inline void printk_nmi_flush(void) { }
+#endif /* PRINTK_NMI */
 
 #ifdef CONFIG_PRINTK
 asmlinkage __printf(5, 0)
diff --git a/init/Kconfig b/init/Kconfig
index e0d26162432e..fd10deeefc22 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1453,6 +1453,11 @@ config PRINTK
 	  very difficult to diagnose system problems, saying N here is
 	  strongly discouraged.
 
+config PRINTK_NMI
+	def_bool y
+	depends on PRINTK
+	depends on HAVE_NMI
+
 config BUG
 	bool "BUG() support" if EXPERT
 	default y
diff --git a/init/main.c b/init/main.c
index b3c6e363ae18..c0d3ff91c8e0 100644
--- a/init/main.c
+++ b/init/main.c
@@ -569,6 +569,7 @@ asmlinkage __visible void __init start_kernel(void)
 	timekeeping_init();
 	time_init();
 	sched_clock_postinit();
+	printk_nmi_init();
 	perf_event_init();
 	profile_init();
 	call_function_init();
diff --git a/kernel/printk/Makefile b/kernel/printk/Makefile
index 85405bdcf2b3..abb0042a427b 100644
--- a/kernel/printk/Makefile
+++ b/kernel/printk/Makefile
@@ -1,2 +1,3 @@
 obj-y	= printk.o
+obj-$(CONFIG_PRINTK_NMI)		+= nmi.o
 obj-$(CONFIG_A11Y_BRAILLE_CONSOLE)	+= braille.o
diff --git a/kernel/printk/internal.h b/kernel/printk/internal.h
new file mode 100644
index 000000000000..2de99faedfc1
--- /dev/null
+++ b/kernel/printk/internal.h
@@ -0,0 +1,44 @@
+/*
+ * internal.h - printk internal definitions
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+#include <linux/percpu.h>
+
+typedef __printf(1, 0) int (*printk_func_t)(const char *fmt, va_list args);
+
+int __printf(1, 0) vprintk_default(const char *fmt, va_list args);
+
+#ifdef CONFIG_PRINTK_NMI
+
+/*
+ * printk() could not take logbuf_lock in NMI context. Instead,
+ * it temporary stores the strings into a per-CPU buffer.
+ * The alternative implementation is chosen transparently
+ * via per-CPU variable.
+ */
+DECLARE_PER_CPU(printk_func_t, printk_func);
+static inline __printf(1, 0) int vprintk_func(const char *fmt, va_list args)
+{
+	return this_cpu_read(printk_func)(fmt, args);
+}
+
+#else /* CONFIG_PRINTK_NMI */
+
+static inline __printf(1, 0) int vprintk_func(const char *fmt, va_list args)
+{
+	return vprintk_default(fmt, args);
+}
+
+#endif /* CONFIG_PRINTK_NMI */
diff --git a/kernel/printk/nmi.c b/kernel/printk/nmi.c
new file mode 100644
index 000000000000..479e0764203c
--- /dev/null
+++ b/kernel/printk/nmi.c
@@ -0,0 +1,208 @@
+/*
+ * nmi.c - Safe printk in NMI context
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/preempt.h>
+#include <linux/spinlock.h>
+#include <linux/smp.h>
+#include <linux/cpumask.h>
+#include <linux/irq_work.h>
+#include <linux/printk.h>
+
+#include "internal.h"
+
+/*
+ * printk() could not take logbuf_lock in NMI context. Instead,
+ * it uses an alternative implementation that temporary stores
+ * the strings into a per-CPU buffer. The content of the buffer
+ * is later flushed into the main ring buffer via IRQ work.
+ *
+ * The alternative implementation is chosen transparently
+ * via @printk_func per-CPU variable.
+ *
+ * The implementation allows to flush the strings also from another CPU.
+ * There are situations when we want to make sure that all buffers
+ * were handled or when IRQs are blocked.
+ */
+DEFINE_PER_CPU(printk_func_t, printk_func) = vprintk_default;
+
+#define NMI_LOG_BUF_LEN (4096 - sizeof(atomic_t) - sizeof(struct irq_work))
+
+struct nmi_seq_buf {
+	atomic_t		len;	/* length of written data */
+	struct irq_work		work;	/* IRQ work that flushes the buffer */
+	unsigned char		buffer[NMI_LOG_BUF_LEN];
+};
+static DEFINE_PER_CPU(struct nmi_seq_buf, nmi_print_seq);
+
+/*
+ * Safe printk() for NMI context. It uses a per-CPU buffer to
+ * store the message. NMIs are not nested, so there is always only
+ * one writer running. But the buffer might get flushed from another
+ * CPU, so we need to be careful.
+ */
+static int vprintk_nmi(const char *fmt, va_list args)
+{
+	struct nmi_seq_buf *s = this_cpu_ptr(&nmi_print_seq);
+	int add = 0;
+	size_t len;
+
+again:
+	len = atomic_read(&s->len);
+
+	if (len >= sizeof(s->buffer))
+		return 0;
+
+	/*
+	 * Make sure that all old data have been read before the buffer was
+	 * reseted. This is not needed when we just append data.
+	 */
+	if (!len)
+		smp_rmb();
+
+	add = vsnprintf(s->buffer + len, sizeof(s->buffer) - len, fmt, args);
+
+	/*
+	 * Do it once again if the buffer has been flushed in the meantime.
+	 * Note that atomic_cmpxchg() is an implicit memory barrier that
+	 * makes sure that the data were written before updating s->len.
+	 */
+	if (atomic_cmpxchg(&s->len, len, len + add) != len)
+		goto again;
+
+	/* Get flushed in a more safe context. */
+	if (add)
+		irq_work_queue(&s->work);
+
+	return add;
+}
+
+/*
+ * printk one line from the temporary buffer from @start index until
+ * and including the @end index.
+ */
+static void print_nmi_seq_line(struct nmi_seq_buf *s, int start, int end)
+{
+	const char *buf = s->buffer + start;
+
+	printk("%.*s", (end - start) + 1, buf);
+}
+
+/*
+ * Flush data from the associated per_CPU buffer. The function
+ * can be called either via IRQ work or independently.
+ */
+static void __printk_nmi_flush(struct irq_work *work)
+{
+	static raw_spinlock_t read_lock =
+		__RAW_SPIN_LOCK_INITIALIZER(read_lock);
+	struct nmi_seq_buf *s = container_of(work, struct nmi_seq_buf, work);
+	unsigned long flags;
+	size_t len, size;
+	int i, last_i;
+
+	/*
+	 * The lock has two functions. First, one reader has to flush all
+	 * available message to make the lockless synchronization with
+	 * writers easier. Second, we do not want to mix messages from
+	 * different CPUs. This is especially important when printing
+	 * a backtrace.
+	 */
+	raw_spin_lock_irqsave(&read_lock, flags);
+
+	i = 0;
+more:
+	len = atomic_read(&s->len);
+
+	/*
+	 * This is just a paranoid check that nobody has manipulated
+	 * the buffer an unexpected way. If we printed something then
+	 * @len must only increase.
+	 */
+	if (i && i >= len)
+		pr_err("printk_nmi_flush: internal error: i=%d >= len=%zu\n",
+		       i, len);
+
+	if (!len)
+		goto out; /* Someone else has already flushed the buffer. */
+
+	/* Make sure that data has been written up to the @len */
+	smp_rmb();
+
+	size = min(len, sizeof(s->buffer));
+	last_i = i;
+
+	/* Print line by line. */
+	for (; i < size; i++) {
+		if (s->buffer[i] == '\n') {
+			print_nmi_seq_line(s, last_i, i);
+			last_i = i + 1;
+		}
+	}
+	/* Check if there was a partial line. */
+	if (last_i < size) {
+		print_nmi_seq_line(s, last_i, size - 1);
+		pr_cont("\n");
+	}
+
+	/*
+	 * Check that nothing has got added in the meantime and truncate
+	 * the buffer. Note that atomic_cmpxchg() is an implicit memory
+	 * barrier that makes sure that the data were copied before
+	 * updating s->len.
+	 */
+	if (atomic_cmpxchg(&s->len, len, 0) != len)
+		goto more;
+
+out:
+	raw_spin_unlock_irqrestore(&read_lock, flags);
+}
+
+/**
+ * printk_nmi_flush - flush all per-cpu nmi buffers.
+ *
+ * The buffers are flushed automatically via IRQ work. This function
+ * is useful only when someone wants to be sure that all buffers have
+ * been flushed at some point.
+ */
+void printk_nmi_flush(void)
+{
+	int cpu;
+
+	for_each_possible_cpu(cpu)
+		__printk_nmi_flush(&per_cpu(nmi_print_seq, cpu).work);
+}
+
+void __init printk_nmi_init(void)
+{
+	int cpu;
+
+	for_each_possible_cpu(cpu) {
+		struct nmi_seq_buf *s = &per_cpu(nmi_print_seq, cpu);
+
+		init_irq_work(&s->work, __printk_nmi_flush);
+	}
+}
+
+void printk_nmi_enter(void)
+{
+	this_cpu_write(printk_func, vprintk_nmi);
+}
+
+void printk_nmi_exit(void)
+{
+	this_cpu_write(printk_func, vprintk_default);
+}
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index bfbf284e4218..71eba0607034 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -55,6 +55,7 @@
 
 #include "console_cmdline.h"
 #include "braille.h"
+#include "internal.h"
 
 int console_printk[4] = {
 	CONSOLE_LOGLEVEL_DEFAULT,	/* console_loglevel */
@@ -1807,14 +1808,6 @@ int vprintk_default(const char *fmt, va_list args)
 }
 EXPORT_SYMBOL_GPL(vprintk_default);
 
-/*
- * This allows printk to be diverted to another function per cpu.
- * This is useful for calling printk functions from within NMI
- * without worrying about race conditions that can lock up the
- * box.
- */
-DEFINE_PER_CPU(printk_func_t, printk_func) = vprintk_default;
-
 /**
  * printk - print a kernel message
  * @fmt: format string
@@ -1838,21 +1831,11 @@ DEFINE_PER_CPU(printk_func_t, printk_func) = vprintk_default;
  */
 asmlinkage __visible int printk(const char *fmt, ...)
 {
-	printk_func_t vprintk_func;
 	va_list args;
 	int r;
 
 	va_start(args, fmt);
-
-	/*
-	 * If a caller overrides the per_cpu printk_func, then it needs
-	 * to disable preemption when calling printk(). Otherwise
-	 * the printk_func should be set to the default. No need to
-	 * disable preemption here.
-	 */
-	vprintk_func = this_cpu_read(printk_func);
 	r = vprintk_func(fmt, args);
-
 	va_end(args);
 
 	return r;
diff --git a/lib/nmi_backtrace.c b/lib/nmi_backtrace.c
index 6019c53c669e..26caf51cc238 100644
--- a/lib/nmi_backtrace.c
+++ b/lib/nmi_backtrace.c
@@ -16,33 +16,14 @@
 #include <linux/delay.h>
 #include <linux/kprobes.h>
 #include <linux/nmi.h>
-#include <linux/seq_buf.h>
 
 #ifdef arch_trigger_all_cpu_backtrace
 /* For reliability, we're prepared to waste bits here. */
 static DECLARE_BITMAP(backtrace_mask, NR_CPUS) __read_mostly;
-static cpumask_t printtrace_mask;
-
-#define NMI_BUF_SIZE		4096
-
-struct nmi_seq_buf {
-	unsigned char		buffer[NMI_BUF_SIZE];
-	struct seq_buf		seq;
-};
-
-/* Safe printing in NMI context */
-static DEFINE_PER_CPU(struct nmi_seq_buf, nmi_print_seq);
 
 /* "in progress" flag of arch_trigger_all_cpu_backtrace */
 static unsigned long backtrace_flag;
 
-static void print_seq_line(struct nmi_seq_buf *s, int start, int end)
-{
-	const char *buf = s->buffer + start;
-
-	printk("%.*s", (end - start) + 1, buf);
-}
-
 /*
  * When raise() is called it will be is passed a pointer to the
  * backtrace_mask. Architectures that call nmi_cpu_backtrace()
@@ -52,8 +33,7 @@ static void print_seq_line(struct nmi_seq_buf *s, int start, int end)
 void nmi_trigger_all_cpu_backtrace(bool include_self,
 				   void (*raise)(cpumask_t *mask))
 {
-	struct nmi_seq_buf *s;
-	int i, cpu, this_cpu = get_cpu();
+	int i, this_cpu = get_cpu();
 
 	if (test_and_set_bit(0, &backtrace_flag)) {
 		/*
@@ -68,17 +48,6 @@ void nmi_trigger_all_cpu_backtrace(bool include_self,
 	if (!include_self)
 		cpumask_clear_cpu(this_cpu, to_cpumask(backtrace_mask));
 
-	cpumask_copy(&printtrace_mask, to_cpumask(backtrace_mask));
-
-	/*
-	 * Set up per_cpu seq_buf buffers that the NMIs running on the other
-	 * CPUs will write to.
-	 */
-	for_each_cpu(cpu, to_cpumask(backtrace_mask)) {
-		s = &per_cpu(nmi_print_seq, cpu);
-		seq_buf_init(&s->seq, s->buffer, NMI_BUF_SIZE);
-	}
-
 	if (!cpumask_empty(to_cpumask(backtrace_mask))) {
 		pr_info("Sending NMI to %s CPUs:\n",
 			(include_self ? "all" : "other"));
@@ -94,73 +63,25 @@ void nmi_trigger_all_cpu_backtrace(bool include_self,
 	}
 
 	/*
-	 * Now that all the NMIs have triggered, we can dump out their
-	 * back traces safely to the console.
+	 * Force flush any remote buffers that might be stuck in IRQ context
+	 * and therefore could not run their irq_work.
 	 */
-	for_each_cpu(cpu, &printtrace_mask) {
-		int len, last_i = 0;
+	printk_nmi_flush();
 
-		s = &per_cpu(nmi_print_seq, cpu);
-		len = seq_buf_used(&s->seq);
-		if (!len)
-			continue;
-
-		/* Print line by line. */
-		for (i = 0; i < len; i++) {
-			if (s->buffer[i] == '\n') {
-				print_seq_line(s, last_i, i);
-				last_i = i + 1;
-			}
-		}
-		/* Check if there was a partial line. */
-		if (last_i < len) {
-			print_seq_line(s, last_i, len - 1);
-			pr_cont("\n");
-		}
-	}
-
-	clear_bit(0, &backtrace_flag);
-	smp_mb__after_atomic();
+	clear_bit_unlock(0, &backtrace_flag);
 	put_cpu();
 }
 
-/*
- * It is not safe to call printk() directly from NMI handlers.
- * It may be fine if the NMI detected a lock up and we have no choice
- * but to do so, but doing a NMI on all other CPUs to get a back trace
- * can be done with a sysrq-l. We don't want that to lock up, which
- * can happen if the NMI interrupts a printk in progress.
- *
- * Instead, we redirect the vprintk() to this nmi_vprintk() that writes
- * the content into a per cpu seq_buf buffer. Then when the NMIs are
- * all done, we can safely dump the contents of the seq_buf to a printk()
- * from a non NMI context.
- */
-static int nmi_vprintk(const char *fmt, va_list args)
-{
-	struct nmi_seq_buf *s = this_cpu_ptr(&nmi_print_seq);
-	unsigned int len = seq_buf_used(&s->seq);
-
-	seq_buf_vprintf(&s->seq, fmt, args);
-	return seq_buf_used(&s->seq) - len;
-}
-
 bool nmi_cpu_backtrace(struct pt_regs *regs)
 {
 	int cpu = smp_processor_id();
 
 	if (cpumask_test_cpu(cpu, to_cpumask(backtrace_mask))) {
-		printk_func_t printk_func_save = this_cpu_read(printk_func);
-
-		/* Replace printk to write into the NMI seq */
-		this_cpu_write(printk_func, nmi_vprintk);
 		pr_warn("NMI backtrace for cpu %d\n", cpu);
 		if (regs)
 			show_regs(regs);
 		else
 			dump_stack();
-		this_cpu_write(printk_func, printk_func_save);
-
 		cpumask_clear_cpu(cpu, to_cpumask(backtrace_mask));
 		return true;
 	}
-- 
1.8.5.6

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v4 2/5] printk/nmi: use IRQ work only when ready
  2016-03-30 15:53 [PATCH v4 0/5] Cleaning printk stuff in NMI context Petr Mladek
  2016-03-30 15:53 ` [PATCH v4 1/5] printk/nmi: generic solution for safe printk in NMI Petr Mladek
@ 2016-03-30 15:53 ` Petr Mladek
  2016-04-04  4:56   ` Sergey Senozhatsky
  2016-03-30 15:53 ` [PATCH v4 3/5] printk/nmi: warn when some message has been lost in NMI context Petr Mladek
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 13+ messages in thread
From: Petr Mladek @ 2016-03-30 15:53 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Peter Zijlstra, Steven Rostedt, Russell King, Daniel Thompson,
	Jiri Kosina, Ingo Molnar, Thomas Gleixner, Sergey Senozhatsky,
	Chris Metcalf, linux-kernel, x86, linux-arm-kernel,
	adi-buildroot-devel, linux-cris-kernel, linux-mips, linuxppc-dev,
	linux-s390, linux-sh, sparclinux, Petr Mladek, Jan Kara,
	Ralf Baechle, Benjamin Herrenschmidt, Martin Schwidefsky,
	David Miller

NMIs could happen at any time.  This patch makes sure that the safe
printk() in NMI will schedule IRQ work only when the related structs are
initialized.

All pending messages are flushed when the IRQ work is being initialized.

Signed-off-by: Petr Mladek <pmladek@suse.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: Daniel Thompson <daniel.thompson@linaro.org>
Cc: Jiri Kosina <jkosina@suse.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
 kernel/printk/nmi.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/kernel/printk/nmi.c b/kernel/printk/nmi.c
index 479e0764203c..303cf0d15e57 100644
--- a/kernel/printk/nmi.c
+++ b/kernel/printk/nmi.c
@@ -38,6 +38,7 @@
  * were handled or when IRQs are blocked.
  */
 DEFINE_PER_CPU(printk_func_t, printk_func) = vprintk_default;
+static int printk_nmi_irq_ready;
 
 #define NMI_LOG_BUF_LEN (4096 - sizeof(atomic_t) - sizeof(struct irq_work))
 
@@ -84,8 +85,11 @@ again:
 		goto again;
 
 	/* Get flushed in a more safe context. */
-	if (add)
+	if (add && printk_nmi_irq_ready) {
+		/* Make sure that IRQ work is really initialized. */
+		smp_rmb();
 		irq_work_queue(&s->work);
+	}
 
 	return add;
 }
@@ -195,6 +199,13 @@ void __init printk_nmi_init(void)
 
 		init_irq_work(&s->work, __printk_nmi_flush);
 	}
+
+	/* Make sure that IRQ works are initialized before enabling. */
+	smp_wmb();
+	printk_nmi_irq_ready = 1;
+
+	/* Flush pending messages that did not have scheduled IRQ works. */
+	printk_nmi_flush();
 }
 
 void printk_nmi_enter(void)
-- 
1.8.5.6

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v4 3/5] printk/nmi: warn when some message has been lost in NMI context
  2016-03-30 15:53 [PATCH v4 0/5] Cleaning printk stuff in NMI context Petr Mladek
  2016-03-30 15:53 ` [PATCH v4 1/5] printk/nmi: generic solution for safe printk in NMI Petr Mladek
  2016-03-30 15:53 ` [PATCH v4 2/5] printk/nmi: use IRQ work only when ready Petr Mladek
@ 2016-03-30 15:53 ` Petr Mladek
  2016-03-30 15:53 ` [PATCH v4 4/5] printk/nmi: increase the size of NMI buffer and make it configurable Petr Mladek
  2016-03-30 15:53 ` [PATCH v4 5/5] printk/nmi: flush NMI messages on the system panic Petr Mladek
  4 siblings, 0 replies; 13+ messages in thread
From: Petr Mladek @ 2016-03-30 15:53 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Peter Zijlstra, Steven Rostedt, Russell King, Daniel Thompson,
	Jiri Kosina, Ingo Molnar, Thomas Gleixner, Sergey Senozhatsky,
	Chris Metcalf, linux-kernel, x86, linux-arm-kernel,
	adi-buildroot-devel, linux-cris-kernel, linux-mips, linuxppc-dev,
	linux-s390, linux-sh, sparclinux, Petr Mladek, Jan Kara,
	Ralf Baechle, Benjamin Herrenschmidt, Martin Schwidefsky,
	David Miller

We could not resize the temporary buffer in NMI context.  Let's warn if a
message is lost.

This is rather theoretical.  printk() should not be used in NMI.  The only
sensible use is when we want to print backtrace from all CPUs.  The
current buffer should be enough for this purpose.

[akpm@linux-foundation.org: whitespace fixlet]
Signed-off-by: Petr Mladek <pmladek@suse.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: Daniel Thompson <daniel.thompson@linaro.org>
Cc: Jiri Kosina <jkosina@suse.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
 kernel/printk/internal.h | 11 +++++++++++
 kernel/printk/nmi.c      |  5 ++++-
 kernel/printk/printk.c   | 10 ++++++++++
 3 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/kernel/printk/internal.h b/kernel/printk/internal.h
index 2de99faedfc1..341bedccc065 100644
--- a/kernel/printk/internal.h
+++ b/kernel/printk/internal.h
@@ -34,6 +34,12 @@ static inline __printf(1, 0) int vprintk_func(const char *fmt, va_list args)
 	return this_cpu_read(printk_func)(fmt, args);
 }
 
+extern atomic_t nmi_message_lost;
+static inline int get_nmi_message_lost(void)
+{
+	return atomic_xchg(&nmi_message_lost, 0);
+}
+
 #else /* CONFIG_PRINTK_NMI */
 
 static inline __printf(1, 0) int vprintk_func(const char *fmt, va_list args)
@@ -41,4 +47,9 @@ static inline __printf(1, 0) int vprintk_func(const char *fmt, va_list args)
 	return vprintk_default(fmt, args);
 }
 
+static inline int get_nmi_message_lost(void)
+{
+	return 0;
+}
+
 #endif /* CONFIG_PRINTK_NMI */
diff --git a/kernel/printk/nmi.c b/kernel/printk/nmi.c
index 303cf0d15e57..572f94922230 100644
--- a/kernel/printk/nmi.c
+++ b/kernel/printk/nmi.c
@@ -39,6 +39,7 @@
  */
 DEFINE_PER_CPU(printk_func_t, printk_func) = vprintk_default;
 static int printk_nmi_irq_ready;
+atomic_t nmi_message_lost;
 
 #define NMI_LOG_BUF_LEN (4096 - sizeof(atomic_t) - sizeof(struct irq_work))
 
@@ -64,8 +65,10 @@ static int vprintk_nmi(const char *fmt, va_list args)
 again:
 	len = atomic_read(&s->len);
 
-	if (len >= sizeof(s->buffer))
+	if (len >= sizeof(s->buffer)) {
+		atomic_inc(&nmi_message_lost);
 		return 0;
+	}
 
 	/*
 	 * Make sure that all old data have been read before the buffer was
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 71eba0607034..e38579d730f4 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -1617,6 +1617,7 @@ asmlinkage int vprintk_emit(int facility, int level,
 	unsigned long flags;
 	int this_cpu;
 	int printed_len = 0;
+	int nmi_message_lost;
 	bool in_sched = false;
 	/* cpu currently holding logbuf_lock in this function */
 	static unsigned int logbuf_cpu = UINT_MAX;
@@ -1667,6 +1668,15 @@ asmlinkage int vprintk_emit(int facility, int level,
 					 strlen(recursion_msg));
 	}
 
+	nmi_message_lost = get_nmi_message_lost();
+	if (unlikely(nmi_message_lost)) {
+		text_len = scnprintf(textbuf, sizeof(textbuf),
+				     "BAD LUCK: lost %d message(s) from NMI context!",
+				     nmi_message_lost);
+		printed_len += log_store(0, 2, LOG_PREFIX|LOG_NEWLINE, 0,
+					 NULL, 0, textbuf, text_len);
+	}
+
 	/*
 	 * The printf needs to come first; we need the syslog
 	 * prefix which might be passed-in as a parameter.
-- 
1.8.5.6

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v4 4/5] printk/nmi: increase the size of NMI buffer and make it configurable
  2016-03-30 15:53 [PATCH v4 0/5] Cleaning printk stuff in NMI context Petr Mladek
                   ` (2 preceding siblings ...)
  2016-03-30 15:53 ` [PATCH v4 3/5] printk/nmi: warn when some message has been lost in NMI context Petr Mladek
@ 2016-03-30 15:53 ` Petr Mladek
  2016-03-30 15:53 ` [PATCH v4 5/5] printk/nmi: flush NMI messages on the system panic Petr Mladek
  4 siblings, 0 replies; 13+ messages in thread
From: Petr Mladek @ 2016-03-30 15:53 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Peter Zijlstra, Steven Rostedt, Russell King, Daniel Thompson,
	Jiri Kosina, Ingo Molnar, Thomas Gleixner, Sergey Senozhatsky,
	Chris Metcalf, linux-kernel, x86, linux-arm-kernel,
	adi-buildroot-devel, linux-cris-kernel, linux-mips, linuxppc-dev,
	linux-s390, linux-sh, sparclinux, Petr Mladek, Jan Kara,
	Ralf Baechle, Benjamin Herrenschmidt, Martin Schwidefsky,
	David Miller

Testing has shown that the backtrace sometimes does not fit into the 4kB
temporary buffer that is used in NMI context.  The warnings are gone when
I double the temporary buffer size.

This patch doubles the buffer size and makes it configurable.

Note that this problem existed even in the x86-specific implementation
that was added by the commit a9edc8809328 ("x86/nmi: Perform a safe NMI
stack trace on all CPUs").  Nobody noticed it because it did not print any
warnings.

Signed-off-by: Petr Mladek <pmladek@suse.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: Daniel Thompson <daniel.thompson@linaro.org>
Cc: Jiri Kosina <jkosina@suse.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
 init/Kconfig        | 22 ++++++++++++++++++++++
 kernel/printk/nmi.c |  3 ++-
 2 files changed, 24 insertions(+), 1 deletion(-)

diff --git a/init/Kconfig b/init/Kconfig
index fd10deeefc22..1c9be9396a4a 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -861,6 +861,28 @@ config LOG_CPU_MAX_BUF_SHIFT
 		     13 =>   8 KB for each CPU
 		     12 =>   4 KB for each CPU
 
+config NMI_LOG_BUF_SHIFT
+	int "Temporary per-CPU NMI log buffer size (12 => 4KB, 13 => 8KB)"
+	range 10 21
+	default 13
+	depends on PRINTK_NMI
+	help
+	  Select the size of a per-CPU buffer where NMI messages are temporary
+	  stored. They are copied to the main log buffer in a safe context
+	  to avoid a deadlock. The value defines the size as a power of 2.
+
+	  NMI messages are rare and limited. The largest one is when
+	  a backtrace is printed. It usually fits into 4KB. Select
+	  8KB if you want to be on the safe side.
+
+	  Examples:
+		     17 => 128 KB for each CPU
+		     16 =>  64 KB for each CPU
+		     15 =>  32 KB for each CPU
+		     14 =>  16 KB for each CPU
+		     13 =>   8 KB for each CPU
+		     12 =>   4 KB for each CPU
+
 #
 # Architectures with an unreliable sched_clock() should select this:
 #
diff --git a/kernel/printk/nmi.c b/kernel/printk/nmi.c
index 572f94922230..bf08557d7e3d 100644
--- a/kernel/printk/nmi.c
+++ b/kernel/printk/nmi.c
@@ -41,7 +41,8 @@ DEFINE_PER_CPU(printk_func_t, printk_func) = vprintk_default;
 static int printk_nmi_irq_ready;
 atomic_t nmi_message_lost;
 
-#define NMI_LOG_BUF_LEN (4096 - sizeof(atomic_t) - sizeof(struct irq_work))
+#define NMI_LOG_BUF_LEN ((1 << CONFIG_NMI_LOG_BUF_SHIFT) -		\
+			 sizeof(atomic_t) - sizeof(struct irq_work))
 
 struct nmi_seq_buf {
 	atomic_t		len;	/* length of written data */
-- 
1.8.5.6

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v4 5/5] printk/nmi: flush NMI messages on the system panic
  2016-03-30 15:53 [PATCH v4 0/5] Cleaning printk stuff in NMI context Petr Mladek
                   ` (3 preceding siblings ...)
  2016-03-30 15:53 ` [PATCH v4 4/5] printk/nmi: increase the size of NMI buffer and make it configurable Petr Mladek
@ 2016-03-30 15:53 ` Petr Mladek
  2016-03-30 16:33   ` kbuild test robot
  4 siblings, 1 reply; 13+ messages in thread
From: Petr Mladek @ 2016-03-30 15:53 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Peter Zijlstra, Steven Rostedt, Russell King, Daniel Thompson,
	Jiri Kosina, Ingo Molnar, Thomas Gleixner, Sergey Senozhatsky,
	Chris Metcalf, linux-kernel, x86, linux-arm-kernel,
	adi-buildroot-devel, linux-cris-kernel, linux-mips, linuxppc-dev,
	linux-s390, linux-sh, sparclinux, Petr Mladek

In NMI context, printk() messages are stored into per-CPU buffers to avoid
a possible deadlock.  They are normally flushed to the main ring buffer via
an IRQ work.  But the work is never called when the system calls panic() in
the very same NMI handler.

This patch tries to flush NMI buffers before the crash dump is generated.
In this case it does not risk a double release and bails out when the
logbuf_lock is already taken.  The aim is to get the messages into the main
ring buffer when possible.  It makes them better accessible in the vmcore.

Then the patch tries to flush the buffers second time when other CPUs are
down.  It might be more aggressive and reset logbuf_lock. The aim is to
get the messages available for the consequent kmsg_dump() and
console_flush_on_panic() calls.

The patch causes vprintk_emit() to be called even in NMI context again. But
we do not want to call consoles in this case.  They might use internal
locks and we could not prevent a deadlock easily.  We only want to have the
messages in the main ring buffer for crash dump and kmsg_dump().  The
consoles are explicitly called later by console_flush_on_panic().

Signed-off-by: Petr Mladek <pmladek@suse.com>
---
 include/linux/printk.h   |  4 ++++
 kernel/kexec_core.c      |  1 +
 kernel/panic.c           |  6 +++++-
 kernel/printk/internal.h |  2 ++
 kernel/printk/nmi.c      | 35 +++++++++++++++++++++++++++++++++++
 kernel/printk/printk.c   | 14 +++++++++++---
 6 files changed, 58 insertions(+), 4 deletions(-)

diff --git a/include/linux/printk.h b/include/linux/printk.h
index 51dd6b824fe2..2da06c2a63c3 100644
--- a/include/linux/printk.h
+++ b/include/linux/printk.h
@@ -123,15 +123,19 @@ void early_printk(const char *s, ...) { }
 #endif
 
 #ifdef CONFIG_PRINTK_NMI
+#define deferred_console_in_nmi() in_nmi()
 extern void printk_nmi_init(void);
 extern void printk_nmi_enter(void);
 extern void printk_nmi_exit(void);
 extern void printk_nmi_flush(void);
+extern void printk_nmi_flush_on_panic(void);
 #else
+#define deferred_console_in_nmi() 0
 static inline void printk_nmi_init(void) { }
 static inline void printk_nmi_enter(void) { }
 static inline void printk_nmi_exit(void) { }
 static inline void printk_nmi_flush(void) { }
+static inline void printk_nmi_flush_on_panic(void) { }
 #endif /* PRINTK_NMI */
 
 #ifdef CONFIG_PRINTK
diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
index 8d34308ea449..1dc3fe8495e0 100644
--- a/kernel/kexec_core.c
+++ b/kernel/kexec_core.c
@@ -893,6 +893,7 @@ void crash_kexec(struct pt_regs *regs)
 	old_cpu = atomic_cmpxchg(&panic_cpu, PANIC_CPU_INVALID, this_cpu);
 	if (old_cpu == PANIC_CPU_INVALID) {
 		/* This is the 1st CPU which comes here, so go ahead. */
+		printk_nmi_flush_on_panic();
 		__crash_kexec(regs);
 
 		/*
diff --git a/kernel/panic.c b/kernel/panic.c
index 535c96510a44..8aa74497cc5a 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -160,8 +160,10 @@ void panic(const char *fmt, ...)
 	 *
 	 * Bypass the panic_cpu check and call __crash_kexec directly.
 	 */
-	if (!crash_kexec_post_notifiers)
+	if (!crash_kexec_post_notifiers) {
+		printk_nmi_flush_on_panic();
 		__crash_kexec(NULL);
+	}
 
 	/*
 	 * Note smp_send_stop is the usual smp shutdown function, which
@@ -176,6 +178,8 @@ void panic(const char *fmt, ...)
 	 */
 	atomic_notifier_call_chain(&panic_notifier_list, 0, buf);
 
+	/* Call flush even twice. It tries harder with a single online CPU */
+	printk_nmi_flush_on_panic();
 	kmsg_dump(KMSG_DUMP_PANIC);
 
 	/*
diff --git a/kernel/printk/internal.h b/kernel/printk/internal.h
index 341bedccc065..7fd2838fa417 100644
--- a/kernel/printk/internal.h
+++ b/kernel/printk/internal.h
@@ -22,6 +22,8 @@ int __printf(1, 0) vprintk_default(const char *fmt, va_list args);
 
 #ifdef CONFIG_PRINTK_NMI
 
+extern raw_spinlock_t logbuf_lock;
+
 /*
  * printk() could not take logbuf_lock in NMI context. Instead,
  * it temporary stores the strings into a per-CPU buffer.
diff --git a/kernel/printk/nmi.c b/kernel/printk/nmi.c
index bf08557d7e3d..69b0c67ca2b2 100644
--- a/kernel/printk/nmi.c
+++ b/kernel/printk/nmi.c
@@ -194,6 +194,41 @@ void printk_nmi_flush(void)
 		__printk_nmi_flush(&per_cpu(nmi_print_seq, cpu).work);
 }
 
+/**
+ * printk_nmi_flush_on_panic - flush all per-cpu nmi buffers when the system
+ *	goes down.
+ *
+ * Similar to printk_nmi_flush() but it can be called even in NMI context when
+ * the system goes down. It does the best effort to get NMI messages into
+ * the main ring buffer.
+ *
+ * Note that it could try harder when there is only one CPU online.
+ */
+void printk_nmi_flush_on_panic(void)
+{
+	if (in_nmi()) {
+		/*
+		 * Make sure that we could access the main ring buffer.
+		 * Do not risk a double release when more CPUs are up.
+		 */
+		if (raw_spin_is_locked(&logbuf_lock)) {
+			if (num_online_cpus() > 1)
+				return;
+
+			debug_locks_off();
+			raw_spin_lock_init(&logbuf_lock);
+		}
+
+		/*
+		 * Flush the messages using the default printk handler
+		 * to store them into the main ring buffer.
+		 */
+		this_cpu_write(printk_func, vprintk_default);
+	}
+
+	printk_nmi_flush();
+}
+
 void __init printk_nmi_init(void)
 {
 	int cpu;
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index e38579d730f4..bf84df2eb3b6 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -245,7 +245,7 @@ __packed __aligned(4)
  * within the scheduler's rq lock. It must be released before calling
  * console_unlock() or anything else that might wake up a process.
  */
-static DEFINE_RAW_SPINLOCK(logbuf_lock);
+DEFINE_RAW_SPINLOCK(logbuf_lock);
 
 #ifdef CONFIG_PRINTK
 DECLARE_WAIT_QUEUE_HEAD(log_wait);
@@ -1764,8 +1764,16 @@ asmlinkage int vprintk_emit(int facility, int level,
 	lockdep_on();
 	local_irq_restore(flags);
 
-	/* If called from the scheduler, we can not call up(). */
-	if (!in_sched) {
+	/*
+	 * Console calls must be deferred when called from the scheduler.
+	 *
+	 * Many architectures never call vprintk_emit() in NMI context,
+	 * see vprintk_nmi(). The only exception is when the NMI buffers
+	 * are flushed on panic. In this case, the consoles are called
+	 * later explicitly only when crashdump does not work, see
+	 * console_flush_on_panic().
+	 */
+	if (!in_sched && !deferred_console_in_nmi()) {
 		lockdep_off();
 		/*
 		 * Try to acquire and then immediately release the console
-- 
1.8.5.6

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH v4 5/5] printk/nmi: flush NMI messages on the system panic
  2016-03-30 15:53 ` [PATCH v4 5/5] printk/nmi: flush NMI messages on the system panic Petr Mladek
@ 2016-03-30 16:33   ` kbuild test robot
  2016-03-31 12:36     ` Petr Mladek
  0 siblings, 1 reply; 13+ messages in thread
From: kbuild test robot @ 2016-03-30 16:33 UTC (permalink / raw)
  To: Petr Mladek
  Cc: kbuild-all, Andrew Morton, Peter Zijlstra, Steven Rostedt,
	Russell King, Daniel Thompson, Jiri Kosina, Ingo Molnar,
	Thomas Gleixner, Sergey Senozhatsky, Chris Metcalf, linux-kernel,
	x86, linux-arm-kernel, adi-buildroot-devel, linux-cris-kernel,
	linux-mips, linuxppc-dev, linux-s390, linux-sh, sparclinux,
	Petr Mladek

[-- Attachment #1: Type: text/plain, Size: 1260 bytes --]

Hi Petr,

[auto build test ERROR on v4.6-rc1]
[cannot apply to tip/x86/core next-20160330]
[if your patch is applied to the wrong git tree, please drop us a note to help improving the system]

url:    https://github.com/0day-ci/linux/commits/Petr-Mladek/Cleaning-printk-stuff-in-NMI-context/20160330-235818
config: i386-randconfig-s1-201613 (attached as .config)
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386 

All errors (new ones prefixed by >>):

   kernel/printk/nmi.c: In function 'printk_nmi_flush_on_panic':
>> kernel/printk/nmi.c:218:4: error: implicit declaration of function 'debug_locks_off' [-Werror=implicit-function-declaration]
       debug_locks_off();
       ^
   cc1: some warnings being treated as errors

vim +/debug_locks_off +218 kernel/printk/nmi.c

   212			 * Do not risk a double release when more CPUs are up.
   213			 */
   214			if (raw_spin_is_locked(&logbuf_lock)) {
   215				if (num_online_cpus() > 1)
   216					return;
   217	
 > 218				debug_locks_off();
   219				raw_spin_lock_init(&logbuf_lock);
   220			}
   221	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/octet-stream, Size: 27512 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v4 5/5] printk/nmi: flush NMI messages on the system panic
  2016-03-30 16:33   ` kbuild test robot
@ 2016-03-31 12:36     ` Petr Mladek
  2016-04-04  4:52       ` Sergey Senozhatsky
  0 siblings, 1 reply; 13+ messages in thread
From: Petr Mladek @ 2016-03-31 12:36 UTC (permalink / raw)
  To: kbuild test robot
  Cc: kbuild-all, Andrew Morton, Peter Zijlstra, Steven Rostedt,
	Russell King, Daniel Thompson, Jiri Kosina, Ingo Molnar,
	Thomas Gleixner, Sergey Senozhatsky, Chris Metcalf, linux-kernel,
	x86, linux-arm-kernel, adi-buildroot-devel, linux-cris-kernel,
	linux-mips, linuxppc-dev, linux-s390, linux-sh, sparclinux

On Thu 2016-03-31 00:33:54, kbuild test robot wrote:
> Hi Petr,
> 
> [auto build test ERROR on v4.6-rc1]
> [cannot apply to tip/x86/core next-20160330]
> [if your patch is applied to the wrong git tree, please drop us a note to help improving the system]
> 
> url:    https://github.com/0day-ci/linux/commits/Petr-Mladek/Cleaning-printk-stuff-in-NMI-context/20160330-235818
> config: i386-randconfig-s1-201613 (attached as .config)
> reproduce:
>         # save the attached .config to linux build tree
>         make ARCH=i386 
> 
> All errors (new ones prefixed by >>):
> 
>    kernel/printk/nmi.c: In function 'printk_nmi_flush_on_panic':
> >> kernel/printk/nmi.c:218:4: error: implicit declaration of function 'debug_locks_off' [-Werror=implicit-function-declaration]
>        debug_locks_off();
>        ^
>    cc1: some warnings being treated as errors

Fixed by adding #include <linux/debug_locks.h> into kernel/printk/nmi.c

Please, find the updated patch below.


>From 52cc5fee3909e8955bd78a8c63b9a610dc5c11fe Mon Sep 17 00:00:00 2001
From: Petr Mladek <pmladek@suse.com>
Date: Thu, 24 Mar 2016 15:10:21 +0100
Subject: [PATCH] printk/nmi: flush NMI messages on the system panic

In NMI context, printk() messages are stored into per-CPU buffers to avoid
a possible deadlock.  They are normally flushed to the main ring buffer via
an IRQ work.  But the work is never called when the system calls panic() in
the very same NMI handler.

This patch tries to flush NMI buffers before the crash dump is generated.
In this case it does not risk a double release and bails out when the
logbuf_lock is already taken.  The aim is to get the messages into the main
ring buffer when possible.  It makes them better accessible in the vmcore.

Then the patch tries to flush the buffers second time when other CPUs are
down.  It might be more aggressive and reset logbuf_lock. The aim is to
get the messages available for the consequent kmsg_dump() and
console_flush_on_panic() calls.

The patch causes vprintk_emit() to be called even in NMI context again. But
we do not want to call consoles in this case.  They might use internal
locks and we could not prevent a deadlock easily.  We only want to have the
messages in the main ring buffer for crash dump and kmsg_dump().  The
consoles are explicitly called later by console_flush_on_panic().

Signed-off-by: Petr Mladek <pmladek@suse.com>
---
 include/linux/printk.h   |  4 ++++
 kernel/kexec_core.c      |  1 +
 kernel/panic.c           |  6 +++++-
 kernel/printk/internal.h |  2 ++
 kernel/printk/nmi.c      | 36 ++++++++++++++++++++++++++++++++++++
 kernel/printk/printk.c   | 14 +++++++++++---
 6 files changed, 59 insertions(+), 4 deletions(-)

diff --git a/include/linux/printk.h b/include/linux/printk.h
index 51dd6b824fe2..2da06c2a63c3 100644
--- a/include/linux/printk.h
+++ b/include/linux/printk.h
@@ -123,15 +123,19 @@ void early_printk(const char *s, ...) { }
 #endif
 
 #ifdef CONFIG_PRINTK_NMI
+#define deferred_console_in_nmi() in_nmi()
 extern void printk_nmi_init(void);
 extern void printk_nmi_enter(void);
 extern void printk_nmi_exit(void);
 extern void printk_nmi_flush(void);
+extern void printk_nmi_flush_on_panic(void);
 #else
+#define deferred_console_in_nmi() 0
 static inline void printk_nmi_init(void) { }
 static inline void printk_nmi_enter(void) { }
 static inline void printk_nmi_exit(void) { }
 static inline void printk_nmi_flush(void) { }
+static inline void printk_nmi_flush_on_panic(void) { }
 #endif /* PRINTK_NMI */
 
 #ifdef CONFIG_PRINTK
diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
index 8d34308ea449..1dc3fe8495e0 100644
--- a/kernel/kexec_core.c
+++ b/kernel/kexec_core.c
@@ -893,6 +893,7 @@ void crash_kexec(struct pt_regs *regs)
 	old_cpu = atomic_cmpxchg(&panic_cpu, PANIC_CPU_INVALID, this_cpu);
 	if (old_cpu == PANIC_CPU_INVALID) {
 		/* This is the 1st CPU which comes here, so go ahead. */
+		printk_nmi_flush_on_panic();
 		__crash_kexec(regs);
 
 		/*
diff --git a/kernel/panic.c b/kernel/panic.c
index 535c96510a44..8aa74497cc5a 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -160,8 +160,10 @@ void panic(const char *fmt, ...)
 	 *
 	 * Bypass the panic_cpu check and call __crash_kexec directly.
 	 */
-	if (!crash_kexec_post_notifiers)
+	if (!crash_kexec_post_notifiers) {
+		printk_nmi_flush_on_panic();
 		__crash_kexec(NULL);
+	}
 
 	/*
 	 * Note smp_send_stop is the usual smp shutdown function, which
@@ -176,6 +178,8 @@ void panic(const char *fmt, ...)
 	 */
 	atomic_notifier_call_chain(&panic_notifier_list, 0, buf);
 
+	/* Call flush even twice. It tries harder with a single online CPU */
+	printk_nmi_flush_on_panic();
 	kmsg_dump(KMSG_DUMP_PANIC);
 
 	/*
diff --git a/kernel/printk/internal.h b/kernel/printk/internal.h
index 341bedccc065..7fd2838fa417 100644
--- a/kernel/printk/internal.h
+++ b/kernel/printk/internal.h
@@ -22,6 +22,8 @@ int __printf(1, 0) vprintk_default(const char *fmt, va_list args);
 
 #ifdef CONFIG_PRINTK_NMI
 
+extern raw_spinlock_t logbuf_lock;
+
 /*
  * printk() could not take logbuf_lock in NMI context. Instead,
  * it temporary stores the strings into a per-CPU buffer.
diff --git a/kernel/printk/nmi.c b/kernel/printk/nmi.c
index bf08557d7e3d..6dc3ff80ae89 100644
--- a/kernel/printk/nmi.c
+++ b/kernel/printk/nmi.c
@@ -17,6 +17,7 @@
 
 #include <linux/preempt.h>
 #include <linux/spinlock.h>
+#include <linux/debug_locks.h>
 #include <linux/smp.h>
 #include <linux/cpumask.h>
 #include <linux/irq_work.h>
@@ -194,6 +195,41 @@ void printk_nmi_flush(void)
 		__printk_nmi_flush(&per_cpu(nmi_print_seq, cpu).work);
 }
 
+/**
+ * printk_nmi_flush_on_panic - flush all per-cpu nmi buffers when the system
+ *	goes down.
+ *
+ * Similar to printk_nmi_flush() but it can be called even in NMI context when
+ * the system goes down. It does the best effort to get NMI messages into
+ * the main ring buffer.
+ *
+ * Note that it could try harder when there is only one CPU online.
+ */
+void printk_nmi_flush_on_panic(void)
+{
+	if (in_nmi()) {
+		/*
+		 * Make sure that we could access the main ring buffer.
+		 * Do not risk a double release when more CPUs are up.
+		 */
+		if (raw_spin_is_locked(&logbuf_lock)) {
+			if (num_online_cpus() > 1)
+				return;
+
+			debug_locks_off();
+			raw_spin_lock_init(&logbuf_lock);
+		}
+
+		/*
+		 * Flush the messages using the default printk handler
+		 * to store them into the main ring buffer.
+		 */
+		this_cpu_write(printk_func, vprintk_default);
+	}
+
+	printk_nmi_flush();
+}
+
 void __init printk_nmi_init(void)
 {
 	int cpu;
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index e38579d730f4..bf84df2eb3b6 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -245,7 +245,7 @@ __packed __aligned(4)
  * within the scheduler's rq lock. It must be released before calling
  * console_unlock() or anything else that might wake up a process.
  */
-static DEFINE_RAW_SPINLOCK(logbuf_lock);
+DEFINE_RAW_SPINLOCK(logbuf_lock);
 
 #ifdef CONFIG_PRINTK
 DECLARE_WAIT_QUEUE_HEAD(log_wait);
@@ -1764,8 +1764,16 @@ asmlinkage int vprintk_emit(int facility, int level,
 	lockdep_on();
 	local_irq_restore(flags);
 
-	/* If called from the scheduler, we can not call up(). */
-	if (!in_sched) {
+	/*
+	 * Console calls must be deferred when called from the scheduler.
+	 *
+	 * Many architectures never call vprintk_emit() in NMI context,
+	 * see vprintk_nmi(). The only exception is when the NMI buffers
+	 * are flushed on panic. In this case, the consoles are called
+	 * later explicitly only when crashdump does not work, see
+	 * console_flush_on_panic().
+	 */
+	if (!in_sched && !deferred_console_in_nmi()) {
 		lockdep_off();
 		/*
 		 * Try to acquire and then immediately release the console
-- 
1.8.5.6

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH v4 1/5] printk/nmi: generic solution for safe printk in NMI
  2016-03-30 15:53 ` [PATCH v4 1/5] printk/nmi: generic solution for safe printk in NMI Petr Mladek
@ 2016-04-04  4:49   ` Sergey Senozhatsky
  2016-04-04  9:38     ` Petr Mladek
  0 siblings, 1 reply; 13+ messages in thread
From: Sergey Senozhatsky @ 2016-04-04  4:49 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Andrew Morton, Peter Zijlstra, Steven Rostedt, Russell King,
	Daniel Thompson, Jiri Kosina, Ingo Molnar, Thomas Gleixner,
	Sergey Senozhatsky, Chris Metcalf, linux-kernel, x86,
	linux-arm-kernel, adi-buildroot-devel, linux-cris-kernel,
	linux-mips, linuxppc-dev, linux-s390, linux-sh, sparclinux,
	Jan Kara, Ralf Baechle, Benjamin Herrenschmidt,
	Martin Schwidefsky, David Miller

Hello,

On (03/30/16 17:53), Petr Mladek wrote:
[..]
> @@ -67,10 +67,12 @@ extern void irq_exit(void);
>  		preempt_count_add(NMI_OFFSET + HARDIRQ_OFFSET);	\
>  		rcu_nmi_enter();				\
>  		trace_hardirq_enter();				\
> +		printk_nmi_enter();				\
>  	} while (0)
>  
>  #define nmi_exit()						\
>  	do {							\
> +		printk_nmi_exit();				\
>  		trace_hardirq_exit();				\
>  		rcu_nmi_exit();					\
>  		BUG_ON(!in_nmi());				\

isn't it a bit too early to printk_nmi_exit()? rcu_nmi_exit() can
WARN_ON_ONCE() in 3 places.

the same goes for printk_nmi_enter(). rcu_nmi_enter() can WARN_ON_ONCE().

seems that in both cases we can endup having WARN_ON_ONCE() from nmi,
but with default printk function.


> +/*
> + * Flush data from the associated per_CPU buffer. The function
> + * can be called either via IRQ work or independently.
> + */
> +static void __printk_nmi_flush(struct irq_work *work)
> +{
> +	static raw_spinlock_t read_lock =
> +		__RAW_SPIN_LOCK_INITIALIZER(read_lock);
> +	struct nmi_seq_buf *s = container_of(work, struct nmi_seq_buf, work);
> +	unsigned long flags;
> +	size_t len, size;
> +	int i, last_i;
> +
> +	/*
> +	 * The lock has two functions. First, one reader has to flush all
> +	 * available message to make the lockless synchronization with
> +	 * writers easier. Second, we do not want to mix messages from
> +	 * different CPUs. This is especially important when printing
> +	 * a backtrace.
> +	 */
> +	raw_spin_lock_irqsave(&read_lock, flags);
> +

hm... so here we have
	for (; i < size; i++)
		printk()

under the spinlock. the thing is that one of printk() can end up
in console_unlock()->call_console_drivers() loop, iterating there
long enough to spinlock lockup other CPUs that might want to flush
NMI buffers (if any), assuming that there are enough printk() (or
may be a slow serial console) happening concurrently on other CPUs
to keep the current ->read_lock busy. async printk can help here,
but user can request sync version of printk.

how about using deferred printk for nmi flush?
print_nmi_seq_line()->printk_deferred() ?

	-ss

> +	i = 0;
> +more:
> +	len = atomic_read(&s->len);
> +
> +	/*
> +	 * This is just a paranoid check that nobody has manipulated
> +	 * the buffer an unexpected way. If we printed something then
> +	 * @len must only increase.
> +	 */
> +	if (i && i >= len)
> +		pr_err("printk_nmi_flush: internal error: i=%d >= len=%zu\n",
> +		       i, len);
> +
> +	if (!len)
> +		goto out; /* Someone else has already flushed the buffer. */
> +
> +	/* Make sure that data has been written up to the @len */
> +	smp_rmb();
> +
> +	size = min(len, sizeof(s->buffer));
> +	last_i = i;
> +
> +	/* Print line by line. */
> +	for (; i < size; i++) {
> +		if (s->buffer[i] == '\n') {
> +			print_nmi_seq_line(s, last_i, i);
> +			last_i = i + 1;
> +		}
> +	}
> +	/* Check if there was a partial line. */
> +	if (last_i < size) {
> +		print_nmi_seq_line(s, last_i, size - 1);
> +		pr_cont("\n");
> +	}
> +
> +	/*
> +	 * Check that nothing has got added in the meantime and truncate
> +	 * the buffer. Note that atomic_cmpxchg() is an implicit memory
> +	 * barrier that makes sure that the data were copied before
> +	 * updating s->len.
> +	 */
> +	if (atomic_cmpxchg(&s->len, len, 0) != len)
> +		goto more;
> +
> +out:
> +	raw_spin_unlock_irqrestore(&read_lock, flags);
> +}

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v4 5/5] printk/nmi: flush NMI messages on the system panic
  2016-03-31 12:36     ` Petr Mladek
@ 2016-04-04  4:52       ` Sergey Senozhatsky
  0 siblings, 0 replies; 13+ messages in thread
From: Sergey Senozhatsky @ 2016-04-04  4:52 UTC (permalink / raw)
  To: Petr Mladek
  Cc: kbuild test robot, kbuild-all, Andrew Morton, Peter Zijlstra,
	Steven Rostedt, Russell King, Daniel Thompson, Jiri Kosina,
	Ingo Molnar, Thomas Gleixner, Sergey Senozhatsky, Chris Metcalf,
	linux-kernel, x86, linux-arm-kernel, adi-buildroot-devel,
	linux-cris-kernel, linux-mips, linuxppc-dev, linux-s390,
	linux-sh, sparclinux

Hello,

On (03/31/16 14:36), Petr Mladek wrote:
[..]
>  #ifdef CONFIG_PRINTK_NMI
> +#define deferred_console_in_nmi() in_nmi()
>  #else
> +#define deferred_console_in_nmi() 0
[..]
> +	if (!in_sched && !deferred_console_in_nmi()) {
>  		lockdep_off();
>  		/*
>  		 * Try to acquire and then immediately release the console

or use SCHED level for nmi messages instead of defining deferred_console_in_nmi()?
so !in_sched will work for both SCHED and NMI messages.

	-ss

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v4 2/5] printk/nmi: use IRQ work only when ready
  2016-03-30 15:53 ` [PATCH v4 2/5] printk/nmi: use IRQ work only when ready Petr Mladek
@ 2016-04-04  4:56   ` Sergey Senozhatsky
  0 siblings, 0 replies; 13+ messages in thread
From: Sergey Senozhatsky @ 2016-04-04  4:56 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Andrew Morton, Peter Zijlstra, Steven Rostedt, Russell King,
	Daniel Thompson, Jiri Kosina, Ingo Molnar, Thomas Gleixner,
	Sergey Senozhatsky, Chris Metcalf, linux-kernel, x86,
	linux-arm-kernel, adi-buildroot-devel, linux-cris-kernel,
	linux-mips, linuxppc-dev, linux-s390, linux-sh, sparclinux,
	Jan Kara, Ralf Baechle, Benjamin Herrenschmidt,
	Martin Schwidefsky, David Miller

Hello,

On (03/30/16 17:53), Petr Mladek wrote:
> NMIs could happen at any time.  This patch makes sure that the safe
> printk() in NMI will schedule IRQ work only when the related structs are
> initialized.
> 
> All pending messages are flushed when the IRQ work is being initialized.

so, does this patch 'fix' 0001 (in a way)? shouldn't it then be folded?
sorry if I'm missing something.

	-ss

>  DEFINE_PER_CPU(printk_func_t, printk_func) = vprintk_default;
> +static int printk_nmi_irq_ready;
>  
>  #define NMI_LOG_BUF_LEN (4096 - sizeof(atomic_t) - sizeof(struct irq_work))
>  
> @@ -84,8 +85,11 @@ again:
>  		goto again;
>  
>  	/* Get flushed in a more safe context. */
> -	if (add)
> +	if (add && printk_nmi_irq_ready) {
> +		/* Make sure that IRQ work is really initialized. */
> +		smp_rmb();
>  		irq_work_queue(&s->work);
> +	}
>  
>  	return add;
>  }
> @@ -195,6 +199,13 @@ void __init printk_nmi_init(void)
>  
>  		init_irq_work(&s->work, __printk_nmi_flush);
>  	}
> +
> +	/* Make sure that IRQ works are initialized before enabling. */
> +	smp_wmb();
> +	printk_nmi_irq_ready = 1;
> +
> +	/* Flush pending messages that did not have scheduled IRQ works. */
> +	printk_nmi_flush();
>  }

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v4 1/5] printk/nmi: generic solution for safe printk in NMI
  2016-04-04  4:49   ` Sergey Senozhatsky
@ 2016-04-04  9:38     ` Petr Mladek
  2016-04-20 13:49       ` Petr Mladek
  0 siblings, 1 reply; 13+ messages in thread
From: Petr Mladek @ 2016-04-04  9:38 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Andrew Morton, Peter Zijlstra, Steven Rostedt, Russell King,
	Daniel Thompson, Jiri Kosina, Ingo Molnar, Thomas Gleixner,
	Chris Metcalf, linux-kernel, x86, linux-arm-kernel,
	adi-buildroot-devel, linux-cris-kernel, linux-mips, linuxppc-dev,
	linux-s390, linux-sh, sparclinux, Jan Kara, Ralf Baechle,
	Benjamin Herrenschmidt, Martin Schwidefsky, David Miller

On Mon 2016-04-04 13:49:28, Sergey Senozhatsky wrote:
> Hello,
> 
> On (03/30/16 17:53), Petr Mladek wrote:
> [..]
> > @@ -67,10 +67,12 @@ extern void irq_exit(void);
> >  		preempt_count_add(NMI_OFFSET + HARDIRQ_OFFSET);	\
> >  		rcu_nmi_enter();				\
> >  		trace_hardirq_enter();				\
> > +		printk_nmi_enter();				\
> >  	} while (0)
> >  
> >  #define nmi_exit()						\
> >  	do {							\
> > +		printk_nmi_exit();				\
> >  		trace_hardirq_exit();				\
> >  		rcu_nmi_exit();					\
> >  		BUG_ON(!in_nmi());				\
> 
> isn't it a bit too early to printk_nmi_exit()? rcu_nmi_exit() can
> WARN_ON_ONCE() in 3 places.
> 
> the same goes for printk_nmi_enter(). rcu_nmi_enter() can WARN_ON_ONCE().
> 
> seems that in both cases we can endup having WARN_ON_ONCE() from nmi,
> but with default printk function.

Great catch! You are right.


> > +/*
> > + * Flush data from the associated per_CPU buffer. The function
> > + * can be called either via IRQ work or independently.
> > + */
> > +static void __printk_nmi_flush(struct irq_work *work)
> > +{
> > +	static raw_spinlock_t read_lock =
> > +		__RAW_SPIN_LOCK_INITIALIZER(read_lock);
> > +	struct nmi_seq_buf *s = container_of(work, struct nmi_seq_buf, work);
> > +	unsigned long flags;
> > +	size_t len, size;
> > +	int i, last_i;
> > +
> > +	/*
> > +	 * The lock has two functions. First, one reader has to flush all
> > +	 * available message to make the lockless synchronization with
> > +	 * writers easier. Second, we do not want to mix messages from
> > +	 * different CPUs. This is especially important when printing
> > +	 * a backtrace.
> > +	 */
> > +	raw_spin_lock_irqsave(&read_lock, flags);
> > +
> 
> hm... so here we have
> 	for (; i < size; i++)
> 		printk()
> 
> under the spinlock. the thing is that one of printk() can end up
> in console_unlock()->call_console_drivers() loop, iterating there
> long enough to spinlock lockup other CPUs that might want to flush
> NMI buffers (if any), assuming that there are enough printk() (or
> may be a slow serial console) happening concurrently on other CPUs
> to keep the current ->read_lock busy. async printk can help here,
> but user can request sync version of printk.

I think that printk() is called on many other locations under
a spinlock and they all are waiting for the async printk.


> how about using deferred printk for nmi flush?
> print_nmi_seq_line()->printk_deferred() ?

But this is great idea. It will help to avoid the ugly macro
deferred_console_in_nmi() as you mentioned in the other mail.

Heh, I remember that I thought about this but I did not want
to override the original log level of the messages. Now, I see
that LOGLEVEL_SCHED is added on top and the original level
is preserved.

Thanks a lot for review. I am going to wait with respin a bit
and give others some time for feedback.


Best Regards,
Petr

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v4 1/5] printk/nmi: generic solution for safe printk in NMI
  2016-04-04  9:38     ` Petr Mladek
@ 2016-04-20 13:49       ` Petr Mladek
  0 siblings, 0 replies; 13+ messages in thread
From: Petr Mladek @ 2016-04-20 13:49 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Andrew Morton, Peter Zijlstra, Steven Rostedt, Russell King,
	Daniel Thompson, Jiri Kosina, Ingo Molnar, Thomas Gleixner,
	Chris Metcalf, linux-kernel, x86, linux-arm-kernel,
	adi-buildroot-devel, linux-cris-kernel, linux-mips, linuxppc-dev,
	linux-s390, linux-sh, sparclinux, Jan Kara, Ralf Baechle,
	Benjamin Herrenschmidt, Martin Schwidefsky, David Miller

On Mon 2016-04-04 11:38:19, Petr Mladek wrote:
> On Mon 2016-04-04 13:49:28, Sergey Senozhatsky wrote:
> > Hello,
> > 
> > On (03/30/16 17:53), Petr Mladek wrote:
> > > +/*
> > > + * Flush data from the associated per_CPU buffer. The function
> > > + * can be called either via IRQ work or independently.
> > > + */
> > > +static void __printk_nmi_flush(struct irq_work *work)
> > > +{
> > > +	static raw_spinlock_t read_lock =
> > > +		__RAW_SPIN_LOCK_INITIALIZER(read_lock);
> > > +	struct nmi_seq_buf *s = container_of(work, struct nmi_seq_buf, work);
> > > +	unsigned long flags;
> > > +	size_t len, size;
> > > +	int i, last_i;
> > > +
> > > +	/*
> > > +	 * The lock has two functions. First, one reader has to flush all
> > > +	 * available message to make the lockless synchronization with
> > > +	 * writers easier. Second, we do not want to mix messages from
> > > +	 * different CPUs. This is especially important when printing
> > > +	 * a backtrace.
> > > +	 */
> > > +	raw_spin_lock_irqsave(&read_lock, flags);
> > > +
> > 
> > hm... so here we have
> > 	for (; i < size; i++)
> > 		printk()
> > 
> > under the spinlock. the thing is that one of printk() can end up
> > in console_unlock()->call_console_drivers() loop, iterating there
> > long enough to spinlock lockup other CPUs that might want to flush
> > NMI buffers (if any), assuming that there are enough printk() (or
> > may be a slow serial console) happening concurrently on other CPUs
> > to keep the current ->read_lock busy. async printk can help here,
> > but user can request sync version of printk.
> 
> I think that printk() is called on many other locations under
> a spinlock and they all are waiting for the async printk.
>
> > how about using deferred printk for nmi flush?
> > print_nmi_seq_line()->printk_deferred() ?

I thought more about it. printk_nmi_flush() is primary called
via irq work. If we use deferred printk() here we will just delay
the console stuff to another irq work.

If we use async printk from your patch set, it will unnecessary
create another irq work and delay the waking of the printk kthread.

If anyone forces sync printk, it will not help much. We will still
call console from the IRQ context and the potential flood of messages
still might cause a soft lookup.

Finally, as I already said. This is not different from any other
printk() call under a spinlock.

After all, I am going to use the normal printk() when flushing
the NMI buffers.

The only exception will be when this is called on panic in NMI
context. In this case, printk_deferred() will do exactly what
we need.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2016-04-20 13:49 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-03-30 15:53 [PATCH v4 0/5] Cleaning printk stuff in NMI context Petr Mladek
2016-03-30 15:53 ` [PATCH v4 1/5] printk/nmi: generic solution for safe printk in NMI Petr Mladek
2016-04-04  4:49   ` Sergey Senozhatsky
2016-04-04  9:38     ` Petr Mladek
2016-04-20 13:49       ` Petr Mladek
2016-03-30 15:53 ` [PATCH v4 2/5] printk/nmi: use IRQ work only when ready Petr Mladek
2016-04-04  4:56   ` Sergey Senozhatsky
2016-03-30 15:53 ` [PATCH v4 3/5] printk/nmi: warn when some message has been lost in NMI context Petr Mladek
2016-03-30 15:53 ` [PATCH v4 4/5] printk/nmi: increase the size of NMI buffer and make it configurable Petr Mladek
2016-03-30 15:53 ` [PATCH v4 5/5] printk/nmi: flush NMI messages on the system panic Petr Mladek
2016-03-30 16:33   ` kbuild test robot
2016-03-31 12:36     ` Petr Mladek
2016-04-04  4:52       ` Sergey Senozhatsky

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).